• [PATCH] libdpkg: Use OpenSSL for hashing.

    From Sebastian Andrzej Siewior@21:1/5 to All on Sat Jul 1 00:10:01 2023
    This a quick hack to get some feedback regarding this change: Use
    OpenSSL crypto library for hashing instead of libmd.
    OpenSSL provides a slightly assembly optimized version for amd64 while
    libmd is pure C. This passes the testsuite and I was able to perform an
    upgrade so it can't be that bad ;) If I read this right, the checksum is computed during package installation.
    Would it be acceptable to to switch it?

    While at it, why do we use md5? I'm asking because a small upgrade to
    sha1 would improve the performance since sha1 performs better on
    architectures that provide optimisation for it which includes a lot.
    _If_ we are changing things here then we could decide if something like checksum (e.g. xxhash) is enough to catch a bitflip or if a
    cryptographic checksum is really required.

    I don't know _why_ we have it: The .deb file was verified by apt after
    the download, the decompressor has also a checksum.
    The md5sum file isn't signed in any way, so whoever modified the binary
    in the system could update md5sum file in case it is verified later.

    Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
    ---
    configure.ac | 6 ++-
    debian/control | 3 +-
    lib/dpkg/buffer.c | 110 ++++++++++++++++++++++++++++++++++++++++------
    3 files changed, 103 insertions(+), 16 deletions(-)

    diff --git a/configure.ac b/configure.ac
    index afc6d2bf7e0bc..72458d50586b8 100644
    --- a/configure.ac
    +++ b/configure.ac
    @@ -96,7 +96,11 @@ AC_SYS_LARGEFILE

    # Checks for libraries.
    DPKG_LIB_RT
    -DPKG_LIB_MD
    +#DPKG_LIB_MD
    +AC_ARG_VAR([MD_LIBS], [linker flags for md library])
    +MD_LIBS=-lcrypto
    +have_libmd=openssl
    +AC_SEARCH_LIBS([EVP_MD_fetch], [crypto])
    DPKG_LIB_Z
    DPKG_LIB_BZ2
    DPKG_LIB_LZMA
    diff --git a/debian/control b/debian/control
    index d1b304546ef45..6fd7c29387bd2 100644
    --- a/debian/control
    +++ b/debian/control
    @@ -16,7 +16,8 @@ Rules-Requires-Root: no
    gettext (>= 0.19.7),
    # Version needed for --porefs defaults, conditional addenda and mode=eof.
    po4a (>= 0.59),
    - libmd-dev,
    +# libmd-dev,
    + libssl-dev,
    zlib1g-dev,
    libbz2-dev,
    # Version needed for multi-threaded decompress
  • From Guillem Jover@21:1/5 to Sebastian Andrzej Siewior on Sat Jul 1 13:20:02 2023
    Hi!

    On Sat, 2023-07-01 at 00:03:53 +0200, Sebastian Andrzej Siewior wrote:
    This a quick hack to get some feedback regarding this change: Use
    OpenSSL crypto library for hashing instead of libmd.
    OpenSSL provides a slightly assembly optimized version for amd64 while
    libmd is pure C. This passes the testsuite and I was able to perform an upgrade so it can't be that bad ;) If I read this right, the checksum is computed during package installation.
    Would it be acceptable to to switch it?

    I gave the reasoning for switching from the embedded MD5
    implementation to libmd at <https://lists.debian.org/debian-devel/2022/07/msg00045.html>, and I
    think it still holds. (This would also imply pulling OpenSSL or any
    other crypto library into the current essential-set, which are not
    small compared to the minimal libmd library.)

    While at it, why do we use md5? I'm asking because a small upgrade to
    sha1 would improve the performance since sha1 performs better on architectures that provide optimisation for it which includes a lot.
    _If_ we are changing things here then we could decide if something like checksum (e.g. xxhash) is enough to catch a bitflip or if a
    cryptographic checksum is really required.

    I don't know _why_ we have it: The .deb file was verified by apt after
    the download, the decompressor has also a checksum.
    The md5sum file isn't signed in any way, so whoever modified the binary
    in the system could update md5sum file in case it is verified later.

    Right, this is not for security reasons, but for integrity ones. This
    part of the interface (debsums assume these exist), and is also what
    dpkg uses to track conffiles changes, and is also part of the interface,
    where packages query those values to check whether files have changed
    from the original for example. There's been talk over the years to
    "upgrade" to stronger digest functions, but it's not been really
    pressing precisely because these are not intended for security purposes,
    and having a stronger digest might make people thing they are intended
    for security. Although using cryptographically broken digests means
    this tends to trigger people's alarm bells, so…

    This is supposedly documented in the deb-md5sums(5) man page, and
    perhaps should be made more clear in the man page documenting the
    «dpkg -V» option, so I'm happy to try to clarify these.


    With the fsys metadata work, it will be easier to add new digests, but
    that implies an increase in db size or control members in .deb, so I'm
    not sure whether it's really worth it. There are people that want to
    also include per-file signatures (such as IMA stuff) in the mtree
    metadata in the .deb, so that would cover the security side of things,
    but that would go counter to reproducibility, so I'm not seeing that
    happening easily, and I expect there will probably be concerns about
    lock-in and similar.

    Thanks,
    Guillem

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Sebastian Andrzej Siewior@21:1/5 to Guillem Jover on Sun Jul 2 11:50:01 2023
    On 2023-07-01 13:16:17 [+0200], Guillem Jover wrote:
    Hi!
    Hi Guillem,

    On Sat, 2023-07-01 at 00:03:53 +0200, Sebastian Andrzej Siewior wrote:
    Would it be acceptable to to switch it?

    I gave the reasoning for switching from the embedded MD5
    implementation to libmd at <https://lists.debian.org/debian-devel/2022/07/msg00045.html>, and I
    think it still holds. (This would also imply pulling OpenSSL or any
    other crypto library into the current essential-set, which are not
    small compared to the minimal libmd library.)

    I can't comment on that. libssl is pretty much part of every
    installation. libgcrypt is part of every debootstrap due to gpg.
    The essential-set seems to be something different and looking at libmd's
    size of 60KiB vs libssl 6MiB it is hard to argue for libssl :)
    As I said, can't comment on that but thanks for the background.

    Right, this is not for security reasons, but for integrity ones. This
    part of the interface (debsums assume these exist), and is also what
    dpkg uses to track conffiles changes, and is also part of the interface, where packages query those values to check whether files have changed
    from the original for example. There's been talk over the years to
    "upgrade" to stronger digest functions, but it's not been really
    pressing precisely because these are not intended for security purposes,
    and having a stronger digest might make people thing they are intended
    for security. Although using cryptographically broken digests means
    this tends to trigger people's alarm bells, so…

    It popped up on my side due to popping in perf testing ;)

    This is supposedly documented in the deb-md5sums(5) man page, and
    perhaps should be made more clear in the man page documenting the
    «dpkg -V» option, so I'm happy to try to clarify these.

    ach. Years ago I used something different for it. Good to know that dpkg
    itself supports it.

    With the fsys metadata work, it will be easier to add new digests, but
    that implies an increase in db size or control members in .deb, so I'm
    not sure whether it's really worth it. There are people that want to
    also include per-file signatures (such as IMA stuff) in the mtree
    metadata in the .deb, so that would cover the security side of things,
    but that would go counter to reproducibility, so I'm not seeing that happening easily, and I expect there will probably be concerns about
    lock-in and similar.

    Oh I see. So based on what I read, it is just a checksum kind of thing
    so xxh128 would be a perfect replacement. But I do understand that you
    need to maintain things and adding an additional digest means adding and keeping the older one for compatibility reasons.

    Since it popped on my perf testing, do we need to verify the md5sum
    during installation? I tried installation of firefox (since it is a
    little big) with libmd, openssl and then telling dpkg to just do nothing
    and compare the runtime. I didn't do that because the installation
    process involved man-db taking some time and I was worried that it might
    fiddle with the results. Then I tried "dpkg-deb -x" but didn't see md5
    in log so it seems to be skipped.
    If adding a fast replacement is difficult could we skip doing the md5
    check during installation?

    Thanks,
    Guillem

    Sebastian

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Guillem Jover@21:1/5 to Sebastian Andrzej Siewior on Mon Jul 3 23:50:02 2023
    Hi!

    On Sun, 2023-07-02 at 11:44:26 +0200, Sebastian Andrzej Siewior wrote:
    On 2023-07-01 13:16:17 [+0200], Guillem Jover wrote:
    On Sat, 2023-07-01 at 00:03:53 +0200, Sebastian Andrzej Siewior wrote:
    Would it be acceptable to to switch it?

    I gave the reasoning for switching from the embedded MD5
    implementation to libmd at <https://lists.debian.org/debian-devel/2022/07/msg00045.html>, and I
    think it still holds. (This would also imply pulling OpenSSL or any
    other crypto library into the current essential-set, which are not
    small compared to the minimal libmd library.)

    I can't comment on that. libssl is pretty much part of every
    installation. libgcrypt is part of every debootstrap due to gpg.
    The essential-set seems to be something different and looking at libmd's
    size of 60KiB vs libssl 6MiB it is hard to argue for libssl :)
    As I said, can't comment on that but thanks for the background.

    The more important issue, which I mentioned on the linked post is that
    AFAIK these crypto libraries can end up refusing to provide
    implementations for things that have been disabled due to some policy
    (for security reasons on unsafe algos, or stuff like FIPS), which is
    not really what is desired here.

    It popped up on my side due to popping in perf testing ;)

    I've pondered about adding assembler optimized versions of some of
    these functions to libmd, but the complexity didn't seem worth it.
    But I think this could be revisited.

    This is supposedly documented in the deb-md5sums(5) man page, and
    perhaps should be made more clear in the man page documenting the
    «dpkg -V» option, so I'm happy to try to clarify these.

    ach. Years ago I used something different for it. Good to know that dpkg itself supports it.

    I've queued the attached patch, hoping that might help a bit with the
    docs.

    With the fsys metadata work, it will be easier to add new digests, but
    that implies an increase in db size or control members in .deb, so I'm
    not sure whether it's really worth it. There are people that want to
    also include per-file signatures (such as IMA stuff) in the mtree
    metadata in the .deb, so that would cover the security side of things,
    but that would go counter to reproducibility, so I'm not seeing that happening easily, and I expect there will probably be concerns about lock-in and similar.

    Oh I see. So based on what I read, it is just a checksum kind of thing
    so xxh128 would be a perfect replacement. But I do understand that you
    need to maintain things and adding an additional digest means adding and keeping the older one for compatibility reasons.

    Yes. It's good though that there are xxhsum CLI tools so that these can
    easily interoperate. If adding something new though, an advantage of
    using a SHA-2 variant, for example, is that those tools are provided
    in coreutils which is also part of the essential set.

    Since it popped on my perf testing, do we need to verify the md5sum
    during installation? I tried installation of firefox (since it is a
    little big) with libmd, openssl and then telling dpkg to just do nothing
    and compare the runtime. I didn't do that because the installation
    process involved man-db taking some time and I was worried that it might fiddle with the results.

    You should be able to disable man-db processing with:

    $ echo 'man-db man-db/auto-update boolean false' | debconf-set-selections

    On mechanical disks I'd expect fsync()/sync_file_range()/etc to
    dominate, on SSD disks I'd expect the decompression to dominate. But
    I might be wrong, so if you feel like digging into this and get some
    numbers that might shed some light. And perhaps adding asm optimized
    variants to libmd might just do it?

    Then I tried "dpkg-deb -x" but didn't see md5
    in log so it seems to be skipped.

    Yeah, you should think about «dpkg-deb -x» to be closer to tar, than
    dpkg in that sense, it will skip many of the actions and checks done
    by the latter.

    If adding a fast replacement is difficult could we skip doing the md5
    check during installation?

    Adding new digests should not be hard, it's more whether it's worth
    it, increased db space usage, potential false sense of security, and
    what it pulls in, and availability etc. For multiarch refcounting we
    must do the digests to make sure whether they are the same files. For unpacking, if there is no md5sums we should do these too to have the
    digests around. We have done the digests for all other files on
    unpacking even if there was an md5sums file because sometimes packages contained incomplete or out-of-sync information in the md5sums files. :/

    Thanks,
    Guillem

    From 03c1cfe858b83db8f02344f095ff4966c1313891 Mon Sep 17 00:00:00 2001
    From: Guillem Jover <guillem@debian.org>
    Date: Mon, 3 Jul 2023 23:18:13 +0200
    Subject: [PATCH] man: Clarify that the md5sums checks as integrity and not
    security checks

    This is hinted in the text and in deb-md5sums(5) man page, but it's
    worth making this point very clear least someone might incorrectly
    assume they can verify their system is fine from attacks or malicious modifications.

    Prompted-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
    ---
    man/dpkg.pod | 4 ++++
    1 file changed, 4 insertions(+)

    diff --git a/man/dpkg.pod b/man/dpkg.pod
    index 20112591c..b7d056fab 100644
    --- a/man/dpkg.pod
    +++ b/man/dpkg.pod
    @@ -338,6 +338,8 @@ of the file contents against the stored value in the files database.
    It will only get checked
    if the database contains the file md5sum. To check for any missing
    metadata in the database, the B<--audit> command can be used.
    +This is only an integrity check and should not be considered as any
    +kind of security verification.

    The output format is selectable with the B<--verify-format>
    option, which by default uses the B<rpm> format, but that might
    @@ -1083,6 +1085,8 @@ information available.
    =item 3 ‘B<5>’

    The digest check failed, which means the file contents have changed.
    +This is only an integrity check and should not be considered as any
    +kind of security verification.

    =item 4-9 ‘B<?>’

    --
    2.40.1

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)