• Re: Validating tarballs against git repositories

    From Guillem Jover@21:1/5 to Antonio Russo on Sat Mar 30 05:50:01 2024
    Hi!

    On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
    This is a vector I've been somewhat paranoid about myself, and I
    typically check the difference between git archive $TAG and the downloaded tar, whenever I package things. Obviously a backdoor could have been inserted into the git repository directly, but there is a culture
    surrounding good hygiene in commits: they ought to be small, focused,
    and well described.
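    (For concreteness, a minimal sketch of that check, with the tag and
    tarball names as placeholders:)

      # Extract both the signed git tag and the release tarball, then compare.
      mkdir /tmp/from-git /tmp/from-tar
      git archive --prefix=pkg-1.0/ "$TAG" | tar -x -C /tmp/from-git
      tar -xzf pkg-1.0.tar.gz -C /tmp/from-tar
      # Tarball-only files show up as "Only in /tmp/from-tar/..." lines.
      diff -r /tmp/from-git /tmp/from-tar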

    But the backdoor was in fact included in a git commit (it's hidden
    inside a compressed test file).

    The part that was only present in the tarball was the code to extract
    and hook the inclusion of the backdoor via the build system.

    People are comfortable discussing and challenging
    a commit that looks fishy, even if that commit is by the main developer
    of a package. I have been assuming tooling existed in package
    maintainers' toolkits to verify the faithful reproduction of the
    published git tag in the downloaded source tarball, beyond a signature
    check by the upstream developer. Apparently, this is not universal.

    Had tooling existed in Debian to automatically validate this faithful reproduction, we might not have been exposed to this issue.

    Given that the autogenerated stuff is not present in the git tree,
    a diff between tarball and git would always generate tons of delta,
    so this would not have helped.

    Having done this myself, it has been my experience that many partial
    build artifacts are captured in source tarballs that are not otherwise maintained in the git repository. For instance, in zfs (which I have contributed to in the past), many automake files are regenerated.
    (I do not believe that specific package is vulnerable to an attack
    on the autoconf/automake files, since the debian package calls the
    upstream tooling to regenerate those files.)

    We already have a policy of not shipping upstream-built artifacts, so
    I am making a proposal that I believe simply takes that one step further:

    1. Move towards allowing, and then favoring, git-tags over source tarballs

    I assume you mean git archives out of git tags? Otherwise how do you
    go from git-tag to a source package in your mind?

    2. Require upstream-built artifacts be removed (instead, generate these
    ab-initio during build)

    The problem here is that the .m4 file to hook into the build system was
    named like one shipped by gnulib (so less suspicious), but xz-utils does
    not use gnulib, and thus the autotools machinery does not know anything
    about it, so even the «autoreconf -f -i» done by debhelper via
    dh-autoreconf, would not regenerate it.

    Removing these might be cumbersome after the fact if upstream includes
    for example their own maintained .m4 files. See dpkg's m4 dir for an
    example of this (although there it's easy as all are namespaced but…).

    Not using an upstream-provided tarball might also mean we stop being
    able to use upstream signatures, which seems worse. The alternative
    might be promoting for upstreams to just do the equivalent of
    «git archive», but that might defeat the portability and dependency
    reduction properties that were designed into the autotools build
    system, or increase the bootstrap set (see for example the pkg.dpkg.author-release build profile used by dpkg).

    (For dpkg at least I'm pondering whether to play with switching to
    doing something equivalent to «git archive» though, but see above, or
    maybe generate two tarballs, a plain «git archive» and a portable one.)

    3. Have tooling that automatically checks the sanitized sources against
    the development RCSs.

    Perhaps we could have a declarative way to state all the autogenerated
    artifacts included in a tarball that need to be cleaned up automatically
    after unpack, similar to how we can automatically exclude stuff when
    repackaging tarballs via uscan?

    (.gitignore, if upstream properly maintains those might be a good
    starting point, but that will tend to include more than necessary.)
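    (For comparison, the uscan mechanism mentioned above is the
    Files-Excluded field in debian/copyright; a declarative cleanup list
    could look much the same. The Files-Autogenerated field below is
    hypothetical:)

      Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
      Files-Excluded: bundled/third-party.js
      Files-Autogenerated: configure config.guess config.sub aclocal.m4
       Makefile.in */Makefile.in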

    4. Look unfavorably on upstreams without RCS.

    Some upstreams have a VCS, but still do massive code drops, or include autogenerated stuff in the VCS, or do not do atomic commits, or in
    addition their commit messages are of the style "fix stuff", "." or
    the like. So while this is something we should encourage, it's not
    sufficient. I think part of this might already be present in our
    Upstream Guidelines in the wiki.

    In the present case, the triggering modification was in a modified .m4 file that injected a snippet into the configure script. That modification
    could have been flagged using this kind of process.

    I don't think this modification would have been spotted, because it
    was not modifying a file that would usually get autogenerated by the
    build system.

    While this would be a lot of work, I believe doing so would force a
    much larger amount of additional complexity onto anyone orchestrating
    attacks against Debian in the future.

    It would certainly make it a bit harder, but I'm afraid that if you
    cannot trust upstream and they are playing a long game, then IMO they
    can still sneak nasty stuff even in plain sight with just code commits,
    unless you are paying extremely close attention. :/

    See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.

    Thanks,
    Guillem

  • From Russ Allbery@21:1/5 to Antonio Russo on Sat Mar 30 07:30:01 2024
    Antonio Russo <antonio.e.russo@gmail.com> writes:

    The way I see it, there are two options in handling a buildable package:

    1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.

    2. The file would not have been scrubbed, and a difference between the
    git version and the released tar version would have been noticed.
    Backdoor found.

    Either of these is, in my mind, dramatically better than what happened.

    I think the point that you're assuming (probably because you quite
    reasonably think it's too obvious to need to be stated, but I'm not sure
    it's that obvious to everyone) is that malicious code injected via a
    commit is significantly easier to detect than malicious code that is only
    in the release tarball.

    This is not *always* correct; it really depends on how many eyes are on
    the upstream repository and how complex or unreadable the code upstream
    writes normally is. (For example, I am far from confident that I can
    eyeball the difference between valid and malicious procmail-style C code
    or random M4 files.) I think it's clearly at least *sometimes* correct, though, so I'm sympathetic, particularly given that it's already Debian practice to regenerate the build system files anyway.

    In other words, we should make sure that breaking the specific tactics
    *this* attacker used truly makes the attacker's life harder, as opposed to making life harder for Debian packagers while only forcing a one-time,
    minor shift in attacker tactics. I *think* I'm mostly convinced that
    forcing the attacker into Git commits is a useful partial defense, but I'm
    not sure this is obviously true.

    Ok, so am I understanding you correctly in that you are saying: we do actually want *some* build artifacts in the source archives?

    If that's the case, could we make those files at packaging time, analogous
    to the DFSG-exclude stripping process?

    If I have followed this all correctly, I believe that in this case the
    exploit is not in a build artifact. It's in a very opaque source artifact
    that is different in the release tarball from the Git archive. Assuming
    that I have that right, stripping build artifacts wouldn't have done
    anything about this exploit, but comparing Git and release tarballs would
    have.

    I think you're here anticipating a *different* exploit that would be
    carried in build artifacts that Debian didn't remove and reconstruct, and
    that we want to remove those from our upstream source archives in order to ensure that we can't accidentally do that.

    On 2024-03-29 22:41, Guillem Jover wrote:

    (For dpkg at least I'm pondering whether to play with switching to
    doing something equivalent to «git archive» though, but see above, or
    maybe generate two tarballs, a plain «git archive» and a portable one.)

    Yeah, with my upstream hat on, I'm considering something similar, but I
    still believe I have users who want to compile from source on systems
    without current autotools, so I still need separate release tarballs.
    Having to generate multiple release artifacts (and document them, and
    explain to people which ones they want, etc.) is certainly doable, but I
    can't say that I'm all that thrilled about it.

    I think with my upstream hat on I'd rather ship a clear manifest (checked
    into Git) that tells distributions which files in the distribution tarball
    are build artifacts, and guarantee that if you delete all of those files,
    the remaining tree should be byte-for-byte identical with the
    corresponding signed Git tag. (In other words, Guillem's suggestion.)
    Then I can continue to ship only one release artifact.
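    (A minimal sketch of the check such a manifest would enable, assuming a
    hypothetical RELEASE-ARTIFACTS file checked into git, listing the
    tarball-only paths one per line:)

      # Unpack the release tarball and the corresponding signed git tag.
      tar -xzf foo-1.0.tar.gz                                # -> foo-1.0/
      git -C foo.git archive --prefix=foo-1.0-git/ v1.0 | tar -x
      # Delete the declared build artifacts; the remaining trees must match.
      while read -r f; do rm -f "foo-1.0/$f"; done < foo-1.0/RELEASE-ARTIFACTS
      diff -r foo-1.0-git foo-1.0 && echo "tarball matches the signed tag"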

    I take a look at these every year or so to keep me terrified of C! If
    it's a single upstream developer, I absolutely agree, but if there's an upstream community reviewing the git commits, I really do believe there
    is hope of them identifying bad(tm) things.

    A single upstream developer is the most common case, though. Perhaps less
    so for core libraries, but, well, there are plenty of examples. (To pick another one that comes readily to mind, zlib appears to only have one
    active maintainer.)

    The reality that we are struggling with is that the free software infrastructure on which much of computing runs is massively and painfully underfunded by society as a whole, and is almost entirely dependent on
    random people maintaining things in their free time because they find it
    fun, many of whom are close to burnout. This is, in many ways, the true
    root cause of this entire event.

    The sad irony here is that the xz maintainer tried to do exactly what we
    advise people in this situation to do: try to add a comaintainer to share
    the work, and don't block work because you don't have time to personally
    vet everything in detail. This is *exactly* why maintainers often don't
    want to do that, and thus force people to fork packages rather than join
    in maintaining the existing package.

    This is an aside, but this is why my personal policy for my own projects
    that I no longer have to maintain is to orphan them and require that
    someone fork them, not add additional contributors to my repository or
    release infrastructure. I do not have the resources to vet new
    maintainers -- if I had that time to spend on the projects, I wouldn't
    have orphaned them -- and therefore I want to explicitly disclaim any responsibility for what the new maintainer may do. Someone else will have
    to judge whether they are trustworthy. But I'm not sure that
    distributions are in a good position to do that *either*.

    But, I will definitely concede that, had I seen a commit that changed
    that line in the m4, there's a good chance my eyes would have glazed
    over it.

    This is why I am somewhat skeptical that forcing everything into Git
    commits is as much of a benefit as people are hoping. This particular
    attacker thought it was better to avoid the Git repository, so that is
    evidence in support of that approach, and it's certainly more helpful,
    once you know something bad has happened, to be able to use all the Git
    tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be
    force-pushed, and if the Git repository is somewhere other than GitHub,
    the malicious possibilities are even broader.

    We could narrow those possibilities somewhat by maintaining
    Debian-controlled mirrors of upstream Git repositories so that we could
    detect rewritten history. (There are a whole lot of reasons why I think
    dgit is a superior model for archive management. One of them is that it captures the full Git history of upstream at the point of the upload on Debian-controlled infrastructure if the maintainer of the package bases it
    on upstream's Git tree.)

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Gioele Barabucci@21:1/5 to Antonio Russo on Sat Mar 30 08:20:01 2024
    On 30/03/24 01:21, Antonio Russo wrote:
    3. Have tooling that automatically checks the sanitized sources against
    the development RCSs.

    git-buildpackage and pristine-tar can be used for that.
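    For example (typical invocations):

      # Record the delta needed to regenerate the upstream tarball bit-for-bit:
      gbp import-orig --pristine-tar ../foo_1.0.orig.tar.gz
      # Later, reproduce the exact tarball from the git repository:
      pristine-tar checkout foo_1.0.orig.tar.gz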

    4. Look unfavorably on upstreams without RCS.

    And look unfavorably on Debian packages without VCS. And, in addition:

    5. Require something like tag2upload to create new releases of Debian
    packages.

    For too many core packages there is an opaque "something happens on the
    Debian maintainer laptop" step that has no place in 2024. We have no
    idea how many Debian DD/DM machines have been compromised because of
    this attack. (Hopefully zero.) Any future upload of source debs may, in principle, contain malicious code.

    The workflow for Debian packages has already gone from:

    1. new upstream release;
    2. something happens on the DD/DM machine;
    3. the DD/DM uploads two non-reviewed-in-practice blobs (source deb,
    binary deb) to unstable.

    to:

    1. new upstream release;
    2. something happens on the DD/DM machine;
    3. the DD/DM uploads a non-reviewed-in-practice blob (source deb) to the buildd;
    4. the buildd compiles the source deb into the binary deb;
    5. the buildd uploads a non-reviewed-in-practice blob (binary deb) to
    unstable.

    This change moved a lot of trust from the hands (and machines) of a
    myriad of DDs/DMs into a handful of closely guarded build machines. A compromised gcc on the DD/DM machine is no longer a problem. But a
    compromised tar/dpkg/debhelper still is.

    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs
    "source deb" and "binary deb" to unstable.

    This change would have three advantages:

    * Make the whole process happen outside the DD/DM computer, so it
    becomes more public and easier to review (commits vs debs), removing
    many chances for compromises.

    * Close two specific attack vectors (hiding code in upstream release
    tarballs and in source debs) that have always existed and for one of
    which we have now proof of exploitation.

    * Force attackers to do their work under public scrutiny, raising the complexity and the cost of carrying out an attack.

    Yes, such a workflow will not stop many other attack vectors, but at
    least _these_ attack vectors will be stopped.

    Regards,

    --
    Gioele Barabucci

  • From Lucas Nussbaum@21:1/5 to Russ Allbery on Sat Mar 30 09:10:02 2024
    On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
    The sad irony here is that the xz maintainer tried to do exactly what we advise people in this situation to do: try to add a comaintainer to share
    the work, and don't block work because you don't have time to personally
    vet everything in detail. This is *exactly* why maintainers often don't
    want to do that, and thus force people to fork packages rather than join
    in maintaining the existing package.

    Yes. In that specific case, the original xz maintainer (Lasse Collin)
    was socially pressured by a likely fake person (Jigar Kumar) to do the
    "right thing" and hand over maintenance.

    https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html

    I wonder if "Dennis Enn" is also a fake person. In retrospect, that
    email looks suspicious:

    On 2022-06-21 Dennis Ens wrote:
    Why not pass on maintainership for XZ for C so you can give XZ for
    Java more attention? Or pass on XZ for Java to someone else to focus
    on XZ for C? Trying to maintain both means that neither are
    maintained well.

    Lucas

  • From Aníbal Monsalve@21:1/5 to Antonio Russo on Sat Mar 30 09:40:01 2024
    On Fri, 2024-03-29 23:53:20 -0600, Antonio Russo wrote:
    On 2024-03-29 22:41, Guillem Jover wrote:
    See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.

    I take a look at these every year or so to keep me terrified of C!
    If it's a single upstream developer, I absolutely agree, but if there's an upstream community reviewing the git commits, I really do believe there is hope of them identifying bad(tm) things.

    Another scary example, "Reflections on Trusting Trust" by Ken Thompson:

    https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html


  • From Ingo Jürgensmann@21:1/5 to Lucas Nussbaum on Sat Mar 30 10:30:01 2024
    Am 30.03.2024 um 08:56 schrieb Lucas Nussbaum <lucas@debian.org>:

    Yes. In that specific case, the original xz maintainer (Lasse Collin)
    was socially pressured by a likely fake person (Jigar Kumar) to do the
    "right thing" and hand over maintenance. https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html

    In his reply to that mail Lasse writes in https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:

    It's also good to keep in mind that this is an unpaid hobby project.


    This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is widely used, insert some malicious code and have fun. Then also imagine stuff that automates builds in
    other ways, like docker containers, Ruby, Rust, or pip, that pull stuff from the network and install it without further checks.

    I hope (and am confident) that Debian as a project will react accordingly to prevent this happening again.

    But as a society (that is widely using FLOSS) I would also hope that our developers will get proper funding instead of requiring them to maintain such software in their spare time.

    --
    Ciao... // Web: http://blog.windfluechter.net
    Ingo \X/ XMPP/Jabber: ij@jhookipa.net

    gpg pubkey: http://www.juergensmann.de/ij_public_key.asc

  • From Andrey Rakhmatullin@21:1/5 to All on Sat Mar 30 10:40:01 2024
    On Sat, Mar 30, 2024 at 09:58:22AM +0100, Ingo Jürgensmann wrote:
    Yes. In that specific case, the original xz maintainer (Lasse Collin)
    was socially pressured by a likely fake person (Jigar Kumar) to do the
    "right thing" and hand over maintenance. https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html

    In his reply to that mail Lasse writes in https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:

    It's also good to keep in mind that this is an unpaid hobby project.


    This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is widely used, insert some malicious code and have fun. Then also imagine stuff that automates builds in
    other ways, like docker containers, Ruby, Rust, or pip, that pull stuff from the network and install it without further checks.

    I hope (and am confident) that Debian as a project will react accordingly to prevent this happening again.
    How?

    --
    WBR, wRAR


  • From Iustin Pop@21:1/5 to Gioele Barabucci on Sat Mar 30 11:00:01 2024
    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    The problem is that this requires functionality testing to be fully
    automated via autopkgtest, and moved off the "update changelog, build
    package, test locally, test some more, upload" cycle.

    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.

    And yes, this should be uniform across all packages stored on Salsa, so
    as to not diverge how the testing is done.

    iustin

  • From Simon Josefsson@21:1/5 to Antonio Russo on Sat Mar 30 10:40:01 2024
    Antonio Russo <aerusso@aerusso.net> writes:

    1. Move towards allowing, and then favoring, git-tags over source tarballs

    Some people have suggested this before -- and I have considered adopting
    that approach myself, but one thing that is often overlooked is that
    building from git usually increases the Build-Depends quite a lot
    compared to building from tarball, and that will more likely trigger
    cyclic dependencies. People who do bootstrapping for new platforms or
    cross-building dislike such added dependencies.

    One response to that may be "sorry, our concerns for supply chain
    security trump your desire for easier building", but so far I believe
    the approach has been to compromise a little on the supply chain side (i.e., building from tarballs) and compromise a little on the
    bootstrap/crossbuild smoothness (e.g., adding nodoc or nocheck targets).

    Moving that needle isn't all that trivial, although I think I'm moving
    myself to a preference that we really need to build everything from
    source code and preferably not even include non-source-code files,
    because they may lie dormant and be activated later, à la the xz attack.

    An old irk of mine is that people seem to believe that running
    'autoreconf -fi' is intended or supposed to combat problems related to
    this: autoreconf was never designed for that purpose, nor does it
    achieve it reliably. Many distributions have adopted a preference to
    run 'autoreconf' to "re-bootstrap" a project from source code. This
    misses a lot of generated files, and sometimes generates incorrect (and
    possibly harmful) newly generated files. For example: https://gitlab.com/libidn/libidn2/-/issues/108

    /Simon


  • From Gioele Barabucci@21:1/5 to Simon Josefsson on Sat Mar 30 11:10:01 2024
    On 30/03/24 10:05, Simon Josefsson wrote:
    Antonio Russo <aerusso@aerusso.net> writes:

    1. Move towards allowing, and then favoring, git-tags over source tarballs

    Some people have suggested this before -- and I have considered adopting
    that approach myself, but one thing that is often overlooked is that
    building from git usually increases the Build-Depends quite a lot
    compared to building from tarball, and that will more likely trigger
    cyclic dependencies. People who do bootstrapping for new platforms or
    cross-building dislike such added dependencies.

    Most of the time such added dependencies could be worked around with
    build profiles and cross building. More widespread support for <nodoc>, <nocheck> and Multi-Arch annotations can greatly reduce the number of
    deps needed to bootstrap an architecture.

    Just as an example, bootstrapping coreutils currently requires
    bootstrapping at least 68 other packages, including libx11-6 [1]. If
    coreutils supported <nodoc> [2], the transitive closure of its
    Build-Depends would be reduced to 20 packages, most of which are in build-essential.

    [1] https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
    [2] https://bugs.debian.org/1057136
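    (For reference, a profile-restricted build dependency and a
    profile-enabled build look roughly like this; texinfo is just an
    illustrative dependency:)

      # debian/control: only needed when documentation is built
      Build-Depends: debhelper-compat (= 13), texinfo <!nodoc>

      # building with the profiles active
      DEB_BUILD_PROFILES="nodoc nocheck" dpkg-buildpackage -us -uc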

    Regards,

    --
    Gioele Barabucci

  • From Sean Whitton@21:1/5 to Antonio Russo on Sat Mar 30 11:50:02 2024
    Hello,

    On Fri 29 Mar 2024 at 06:21pm -06, Antonio Russo wrote:

    1. Move towards allowing, and then favoring, git-tags over source tarballs

    Many of us already do this. dgit maintains an official store of the tags.

    --
    Sean Whitton

  • From Sean Whitton@21:1/5 to Iustin Pop on Sat Mar 30 11:50:02 2024
    Hello,

    On Sat 30 Mar 2024 at 10:56am +01, Iustin Pop wrote:

    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    It is also already fully implemented as tag2upload, and is merely as yet undeployed, for social reasons.

    --
    Sean Whitton

  • From Simon Josefsson@21:1/5 to Gioele Barabucci on Sat Mar 30 12:30:01 2024
    Gioele Barabucci <gioele@svario.it> writes:

    Just as an example, bootstrapping coreutils currently requires
    bootstrapping at least 68 other packages, including libx11-6 [1]. If coreutils supported <nodoc> [2], the transitive closure of its
    Build-Depends would be reduced to 20 packages, most of which are in build-essential.

    [1] https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
    [2] https://bugs.debian.org/1057136

    Coreutils in Debian uses upstream tarballs and does not do a full
    bootstrap build. It does autoreconf instead of ./bootstrap. So the dependencies above are not the entire bootstrapping story to build
    coreutils from git compared to building from tarballs.

    It would help if upstreams would publish PGP-signed 'git-archive'-style tarballs, including content from git submodules in them.

    Relying on signed git tags is not reliable because git is primarily
    SHA-1-based, and in 2019 a SHA-1 collision attack cost $45K.
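    (For what it's worth, newer git can create repositories with the
    SHA-256 object format, although it is still experimental and not
    interoperable with SHA-1 remotes:)

      git init --object-format=sha256 repo
      git -C repo rev-parse --show-object-format   # prints "sha256"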

    /Simon


  • From Luca Boccassi@21:1/5 to Iustin Pop on Sat Mar 30 12:50:01 2024
    On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:

    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
    deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    The problem is that this requires functionality testing to be fully
    automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" cycle.

    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.

    This is all already implemented by Salsa CI? You just need to include
    the yml and enable the CI in the settings.
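    For reference, the usual setup is roughly this (check the
    salsa-ci-team/pipeline documentation for the current recipe path):

      # debian/gitlab-ci.yml
      include:
        - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml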

  • From Luca Boccassi@21:1/5 to Russ Allbery on Sat Mar 30 13:00:01 2024
    On Sat, 30 Mar 2024 at 06:29, Russ Allbery <rra@debian.org> wrote:

    Antonio Russo <antonio.e.russo@gmail.com> writes:

    The way I see it, there are two options in handling a buildable package:

    1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.

    2. The file would not have been scrubbed, and a difference between the
    git version and the released tar version would have been noticed.
    Backdoor found.

    Either of these is, in my mind, dramatically better than what happened.

    I think the point that you're assuming (probably because you quite
    reasonably think it's too obvious to need to be stated, but I'm not sure
    it's that obvious to everyone) is that malicious code injected via a
    commit is significantly easier to detect than malicious code that is only
    in the release tarball.

    This is not *always* correct; it really depends on how many eyes are on
    the upstream repository and how complex or unreadable the code upstream writes normally is. (For example, I am far from confident that I can
    eyeball the difference between valid and malicious procmail-style C code
    or random M4 files.) I think it's clearly at least *sometimes* correct, though, so I'm sympathetic, particularly given that it's already Debian practice to regenerate the build system files anyway.

    In other words, we should make sure that breaking the specific tactics
    *this* attacker used truly makes the attacker's life harder, as opposed to making life harder for Debian packagers while only forcing a one-time,
    minor shift in attacker tactics. I *think* I'm mostly convinced that
    forcing the attacker into Git commits is a useful partial defense, but I'm not sure this is obviously true.

    While it's of course true that avoiding massaged tarballs as orig.tar
    is not a panacea, and that obfuscated malicious code can be and is
    checked into git, I am pretty sure it is undeniable that having
    everything tracked in git makes it _easier_ to audit and investigate.
    Not perfect, not fool-proof, but easier, compared to manually diffing
    tarballs. And given we are talking about malicious actors using
    subterfuge to attack us, I think we could use all the help we can get,
    even if there's no perfect solution.

    In the end, massaged tarballs were needed to avoid rerunning
    autoconfery on twelve thousand different proprietary and
    non-proprietary Unix variants, back in the day. In 2024, we do
    dh_autoreconf by default so it's all moot anyway. When using Meson/CMake/home-grown makefiles there's no meaningful difference on
    average, although I'm sure there are corner cases and exceptions here
    and there.

  • From Sean Whitton@21:1/5 to Simon Josefsson on Sat Mar 30 13:30:01 2024
    Hello,

    On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:

    Relying on signed git tags is not reliable because git is primarily SHA-1-based, and in 2019 a SHA-1 collision attack cost $45K.

    We did some analysis on the SHA1 vulnerabilities and determined that
    they did not meaningfully affect dgit & tag2upload's design.

    --
    Sean Whitton


  • From Jan-Benedict Glaw@21:1/5 to Gioele Barabucci on Sat Mar 30 13:40:01 2024
    On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci <gioele@svario.it> wrote:
    On 30/03/24 01:21, Antonio Russo wrote:
    3. Have tooling that automatically checks the sanitized sources against
    the development RCSs.

    git-buildpackage and pristine-tar can be used for that.

    Would be nice if pristine-tar's data file were reproducible,
    too...

    MfG, JBG


  • From Jonathan Carter@21:1/5 to Simon Josefsson on Sat Mar 30 13:40:01 2024
    On 2024/03/30 11:05, Simon Josefsson wrote:
    1. Move towards allowing, and then favoring, git-tags over source tarballs

    Some people have suggested this before -- and I have considered adopting
    that approach myself, but one thing that is often overlooked is that
    building from git usually increases the Build-Depends quite a lot
    compared to building from tarball

    How in the world do you jump to that conclusion?

    -Jonathan

  • From Bastian Blank@21:1/5 to Jan-Benedict Glaw on Sat Mar 30 14:00:01 2024
    On Sat, Mar 30, 2024 at 01:30:07PM +0100, Jan-Benedict Glaw wrote:
    On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci <gioele@svario.it> wrote:
    On 30/03/24 01:21, Antonio Russo wrote:
    3. Have tooling that automatically checks the sanitized sources against
    the development RCSs.
    git-buildpackage and pristine-tar can be used for that.
    Would be nice if pristine-tar's data file were reproducible,
    too...

    Use pristine-lfs. Or just generate via "git archive".
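    For example, the latter is a one-liner (names are placeholders):

      git archive --format=tar.gz --prefix=foo-1.0/ -o ../foo_1.0.orig.tar.gz v1.0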

    Bastian

    --
    It is undignified for a woman to play servant to a man who is not hers.
    -- Spock, "Amok Time", stardate 3372.7

  • From Jonathan Carter@21:1/5 to Sean Whitton on Sat Mar 30 14:10:01 2024
    Hi Sean

    On 2024/03/30 12:43, Sean Whitton wrote:
    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
    deb" and "binary deb" to unstable.

    This change would have three advantages:
    I think everyone fully agrees this is a good thing, no need to list the
    advantages.

    It is also already fully implemented as tag2upload, and is merely as yet undeployed, for social reasons.

    My understanding is that DSA aren't quite comfortable with it, since it
    would need to archive a GPG signing key (or a keypair trusted by DAK)?

    I did enjoy the tag2upload talk that was given earlier this year at
    miniDebConf Cambridge:

    https://peertube.debian.social/w/pav68XBWdurWzfTYvDgWRM

    One of the things I like most about it is that it doesn't break any
    existing workflow or technical implementation. And it seems like
    something most people would reasonably want to see implemented.

    So I think it boils down to finding some constructive way to engage with ftpmasters to find a solution that they are content with, because
    without that, nothing is going to happen. I'm not 100% sure that I would classify that as a social reason; DSA/ftpmaster is careful out of necessity.

    Any chance we can convince both ftpmaster members and tag2upload team to
    join at DebConf24 in Busan so that an attempt can be made to hash this
    out in person? I'm not sure everyone involved will be motivated enough
    to join a sprint just to work on this, but it tends to work so much
    better when people work on problems together in person rather than over
    email, where people want to reply thoughtfully and then end up taking
    weeks to do so.

    I think it's not so much a question of *if* Debian would ever switch
    to a git-based workflow, but *when*. And tag2upload's opt-in nature
    provides a great bridge to that future, there's clearly been a lot of
    good thought put into it, and there's really no alternative that even
    comes close in either design or being so close to being ready for implementation. However, I think it can only happen if you get all the
    right people in the same room to address the remaining concerns.

    -Jonathan

  • From Simon Josefsson@21:1/5 to Sean Whitton on Sat Mar 30 15:00:01 2024
    Sean Whitton <spwhitton@spwhitton.name> writes:

    Hello,

    On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:

    Relying on signed git tags is not reliable because git is primarily
    SHA-1-based, and in 2019 a SHA-1 collision attack cost $45K.

    We did some analysis on the SHA1 vulnerabilities and determined that
    they did not meaningfully affect dgit & tag2upload's design.

    Can you share that analysis? As far as I understand, it is possible for
    a malicious actor to create a git repository with the same commit id as
    HEAD, with different historic commits and tree content. I thought a
    signed tag is merely a signed reference to a particular commit id. If
    that commit id is a SHA1 reference, that opens up for ambiguity given
    recent (well, 2019) results on SHA1. Of course, I may be wrong in any
    of the chain, so I would appreciate an explanation of how this doesn't work.

    /Simon


  • From Guillem Jover@21:1/5 to Antonio Russo on Sat Mar 30 14:20:02 2024
    Hi!

    On Fri, 2024-03-29 at 23:53:20 -0600, Antonio Russo wrote:
    On 2024-03-29 22:41, Guillem Jover wrote:
    On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
    Had tooling existed in Debian to automatically validate this faithful
    reproduction, we might not have been exposed to this issue.

    Given that the autogenerated stuff is not present in the git tree,
    a diff between tarball and git would always generate tons of delta,
    so this would not have helped.

    I may not have been clear, but I'm suggesting scrubbing all the
    autogenerated stuff, and comparing that against similarly scrubbed
    git tag contents. (But you explain that this is problematic.)

    Yes, the point here is how we determine what is autogenerated stuff
    when confronted with a malicious upstream, so the problem again is
    that if you need to verify everything then you might easily get
    overwhelmed by the sheer amount of autogenerated output. But see below.

    Having done this myself, it has been my experience that many partial
    build artifacts are captured in source tarballs that are not otherwise
    maintained in the git repository. For instance, in zfs (which I have
    contributed to in the past), many automake files are regenerated.
    (I do not believe that specific package is vulnerable to an attack
    on the autoconf/automake files, since the debian package calls the
    upstream tooling to regenerate those files.)

    (Hopefully the above clears up that I at least have some superficial awareness of the build artifacts showing up in the release tarball!)

    (Sorry, I guess my reply might have sounded patronizing? I noticed later
    on that you explicitly mentioned this, but thought that would be clear
    when reading the whole mail; I thought about adding a note to the
    earlier text, but considered it unnecessary. Should probably have added
    it anyway. :)

    1. Move towards allowing, and then favoring, git-tags over source tarballs

    I assume you mean git archives out of git tags? Otherwise how do you
    go from git-tag to a source package in your mind?

    I'm not wed to any specific mechanism, but I'd be content with that. I'd
    be most happy with DD-signed tags that were certified DFSG-free, policy compliant (i.e., lacking build artifacts), and equivalent to scrubbed upstream source (and more on that later, building on what you say).

    Many repositories today already do things close to this with pristine-tar,
    so this seems to me a direction where the tooling already exists.

    I'll add that, if we drop the desire for a signed archive, and instead require a signed git-tag (from which we can generate a source tar on
    demand, as you suggest), we can drop the pristine-tar requirement. If we
    are less progressive but move exclusively to Debian-regenerated
    .tar files, we can probably avoid many of the frustrating edge cases that pristine-tar still struggles with.

    I'm personally not a fan of pristine-tar, and my impression is that it
    is falling out of favor in various corners and big teams within the
    project. And then I'm also not a fan of mixing packaging with
    upstream git history. The non-native packages I maintain only contain
    debian/ directories, which to me have the superior properties (but not tooling), including in a situation like this. I'll expand on this later.

    I've been thinking that perhaps the only thing we'd need is to include
    either a file or a field in some file that refers to the upstream commit
    we think the tarball is derived from. We also have fields that contain
    the upstream VCS repo. Then we could also have tooling that could perform
    such checks, independently from how we transport and pack our sources.
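    (Sketch of what that could look like in debian/upstream/metadata; the
    Repository field already exists there, the commit field is
    hypothetical:)

      Repository: https://git.example.org/foo.git
      Repository-Commit: <commit id the tarball is believed to derive from>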

    2. Require upstream-built artifacts be removed (instead, generate these
    ab-initio during build)

    The problem here is that the .m4 file to hook into the build system was named like one shipped by gnulib (so less suspicious), but xz-utils does not use gnulib, and thus the autotools machinery does not know anything about it, so even the «autoreconf -f -i» done by debhelper via dh-autoreconf, would not regenerate it.

    The way I see it, there are two options in handling a buildable package:

    1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.

    2. The file would not have been scrubbed, and a difference between the
    git version and the released tar version would have been noticed.
    Backdoor found.

    Either of these is, in my mind, dramatically better than what happened.

    Sure, but that relies on knowing for certain what is and what is not autogenerated for 1), and on not getting drowned in autogenerated
    output for 2) so that this cannot be easily missed, and on autoreconf
    doing what we expect! Also important is when this would be done: only
    on initial packaging, on every build? Because 1) has the bad property
    that it might get removed during initial packaging inspection, but
    might then stay latent until surrounding conditions activate it again
    if we are not continuously performing that kind of check (say on every
    build of the package).

    One automatic approach would be to run dh-autoreconf and identify the
    changed files. Remove those files from both the distributed tarball and
    git tag. Check if those differ. (You also suggest something very similar
    to this, and repacking the archive with those debian-generated build artifacts).

    I may be missing something here, though!
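    (A sketch of that automatic approach, with hypothetical tree names:)

      # Regenerate the build system in a copy of the unpacked tarball.
      cp -a pkg-1.0 pkg-1.0.regen
      (cd pkg-1.0.regen && autoreconf -f -i)
      # Files autoreconf rewrote are the "build artifact" candidates.
      diff -rq pkg-1.0 pkg-1.0.regen \
          | sed -n 's/^Files pkg-1.0\/\(.*\) and .* differ$/\1/p' > artifacts.list
      # Scrub them from both the tarball tree and the git tree, then compare.
      while read -r f; do rm -f "pkg-1.0/$f" "pkg-1.0-git/$f"; done < artifacts.list
      diff -r pkg-1.0 pkg-1.0-git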

    In theory this would be an option; I'm not sure how feasible this is
    in practice, though. :/ At least as of now.

    Removing these might be cumbersome after the fact if upstream includes
    for example their own maintained .m4 files. See dpkg's m4 dir for an example of this (although there it's easy as all are namespaced but…).

    I am not an m4 expert (in fact, I have specifically tried to avoid
    learning anything more about auto(make/reconf) than absolutely necessary.)

    My point is just: either those files are needed, or not. If they're
    needed, they need to not differ. And if they're not, they should
    be scrubbed.

    I think you are saying that doing this automatically is going to be hard/impossible. Is that fair?

    Let's try to go in detail on how this was done on the build system
    side (I'm doing this right now, as previously only had skimmed over
    the process).

    The build system hook was planted in the tarball by adding a modified m4/build-to-host.m4 file. This file is originally from gnulib (but
    gettext would usually embed it if it required it). The macros contained
    within are used by m4/gettext.m4 coming from gettext.

    So to start with, this dependency (the AM_GNU_GETTEXT macro uses gl_BUILD_TO_HOST) is only present with newer gettext versions. The
    tarball was autoreconf'ed with gettext 0.22.4, Debian has gettext 0.21,
    which does not pull that dependency in. In that case, if gettext.m4
    were to get refreshed in this build, the hook would be inert,
    but once we update to a newer gettext it would get activated
    again.

    The m4/build-to-host.m4 file in addition to hooking the payload into
    the build system, also got its serial number bumped from 3 to 30.

    And the bigger issue is that «autoreconf -f -i» does not even refresh
    the files (as you'd expect from --force) if the .m4 serial is higher.
    So in Debian currently, the gettext.m4 in the tarball does not get
    refreshed (still pulling in the malicious build-to-host.m4, which
    would not happen with the gettext version from Debian), and if we
    updated to a newer gettext then it would not update build-to-host.m4
    anyway due to its bumped serial.

    This seems like a serious bug in autoreconf, but I've not checked if
    this has been brought up upstream, and whether they consider it's
    working as intended. I expect the serial to be used only when not
    in --force mode though. :/
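    (A crude check along these lines is possible, assuming the copies under
    /usr/share/aclocal installed by the distribution's autotools/gnulib
    packages are trusted:)

      # Flag vendored .m4 files whose serial is higher than the system copy.
      for f in m4/*.m4; do
          sys="/usr/share/aclocal/$(basename "$f")"
          [ -e "$sys" ] || continue
          ship=$(grep -m1 -o 'serial [0-9]*' "$f" | cut -d' ' -f2)
          inst=$(grep -m1 -o 'serial [0-9]*' "$sys" | cut -d' ' -f2)
          [ "${ship:-0}" -gt "${inst:-0}" ] && echo "serial bump: $f ($ship > $inst)"
      done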


    On the other side of the coin, is that the malicious actor added
    precisely that .m4 file into its .gitignore file (as you'd usually
    expect due to the new gettext version pulling that in), so if we were
    ignoring changes based on trusting upstream, then that could slip
    through if we only use this for checking, unless we repackage or always
    clean these before builds:

    https://git.tukaani.org/?p=xz.git;a=commitdiff;h=4323bc3e0c1e1d2037d5e670a3bf6633e8a3031e

    Not using an upstream-provided tarball might also mean we stop being
    able to use upstream signatures, which seems worse. The alternative
    might be promoting for upstreams to just do the equivalent of
    «git archive», but that might defeat the portability and dependency reduction properties that were designed into the autotools build
    system, or increase the bootstrap set (see for example the pkg.dpkg.author-release build profile used by dpkg).

    Ok, so am I understanding you correctly in that you are saying: we do actually want *some* build artifacts in the source archives?

    Upstream might want them. And if we repackage out of principle, then
    we might lose on other properties, such as signatures for code
    provenance trails and similar.

    If that's the case, could we make those files at packaging time, analogous
    to the DFSG-exclude stripping process?

    Ideally we'd remove all autogenerated files, but I'm not sure how
    effective that would be against a determined malicious actor.

    In the present case, the triggering modification was in a modified .m4 file
    that injected a snippet into the configure script. That modification
    could have been flagged using this kind of process.

    I don't think this modification would have been spotted, because it
    was not modifying a file that would usually get autogenerated by the
    build system.

    If we look at what ./autogen.sh would have changed, and scrub those
    files from the release archive, wouldn't that mean that the malicious
    m4 file would have been spotted, since it would NOT have been autogenerated?

    Not in this case no. autoreconf would have left alone both the
    gettext.m4 and the build-to-host.m4 files. I mean you are left with
    those (and other) files for manual review, and while like Russ I think
    I'm fluent in m4 and autotools stuff, if I had to skim review that
    file, it would look very non-suspicious to me, TBH.

    While this would be a lot of work, I believe doing so would force a
    much larger amount of additional complexity onto anyone orchestrating
    attacks against Debian in the future.

    It would certainly make it a bit harder, but I'm afraid that if you
    cannot trust upstream and they are playing a long game, then IMO they
    can still sneak nasty stuff even in plain sight with just code commits, unless you are paying extremely close attention. :/

    See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.

    I take a look at these every year or so to keep me terrified of C!
    If it's a single upstream developer, I absolutely agree, but if there's an upstream community reviewing the git commits, I really do believe there is hope of them identifying bad(tm) things.

    I just want to make sure that what we actually pull in is what the
    community is actually reviewing. I feel like anything less gets dangerous. (Given few enough eyeballs, all bugs are deep!)

    But, I will definitely concede that, had I seen a commit that changed
    that line in the m4, there's a good chance my eyes would have glazed over
    it.

    I think the biggest issue is that we are pretty much based on a model
    that relies on trusting upstreams, for code, for license and copyright compliance, etc. We tend to assume upstreams (and us!) can make
    mistakes, but that in general they are not working against us.

    When confronted with a known hostile (and not necessarily malicious)
    upstream the only winning game is not to play. If we do not even know
    the upstream is hostile and/or malicious that seems like a losing
    prospect to me. There are so many ways such upstream can slip stuff
    through in this model that this gets really nasty really quickly.

    Don't get me wrong, I think we can/should modify our processes and
    tooling somehow to at least try to deter this path as much as possible,
    but it still seems to go counter to our model, and seems like a losing prospect. (You could have an upstream that tries to overwhelm you with
    sheer amount of commits for example. In this case they even included
    the bulk of the backdoor in git, and in the end I guess I don't see
    much difference between smuggling something through git or a tarball.)

    And, coming back to the Debian side of things. To me the most
    important part is that we might be able to close this door a bit with
    upstream, but what about this happening within Debian? I think we have discussed in the past, what would happen if someone tried this kind of
    long term attack on the project, and my feeling is that we have kind
    of shrugged it off as either "it would take too much effort so it's implausible" or "if they want to do it we are lost anyway" but perhaps
    I'm misremembering.

    Related to this, dgit has been brought up as the solution to this, but
    in my mind this incident reinforces my view that precisely storing
    more upstream stuff in git is the opposite of what we'd want, and
    makes reviewing even harder, given that in our context we are on a
    permanent fork against upstream, and if you include merge commits and
    similar, there's lots of places to hide stuff. In contrast storing
    only the packaging bits (debian/ dir alone) like pretty much every
    other downstream is doing with their packaging bits, makes for an
    obviously more manageable thing to review and not get drowned in,
    more so if we have to consider that next time perhaps the long-game
    gets played within Debian. :(

    (An additional bonus of only keeping debian/ directories is that it
    would make it possible to checkout all Debian packaging locally. :)

    Thanks,
    Guillem

  • From Gioele Barabucci@21:1/5 to Jonathan Carter on Sat Mar 30 16:00:01 2024
    On 30/03/24 14:08, Jonathan Carter wrote:
    On 2024/03/30 12:43, Sean Whitton wrote:
    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.

    This change would have three advantages:
    I think everyone fully agrees this is a good thing, no need to list the
    advantages.

    It is also already fully implemented as tag2upload, and is merely as yet
    undeployed, for social reasons.

    My understanding is that DSA aren't quite comfortable with it, since it
    would need to archive a GPG signing key (or a keypair trusted by DAK)?

    Don't the buildds already work in a similar way?

    The source deb is signed by the DD, the buildd checks the signature of
    the source deb, then builds and signs the binary debs.

    In the future the tag is signed by the DD, the buildd checks the
    signature of the tag, then builds and signs the source deb and the
    binary debs.
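
    Roughly, in shell terms (a sketch only; the package name, version and
    tag naming convention here are illustrative, nothing decided):

        # today: the trust anchor is the DD's signature on the .dsc
        dscverify --keyring /usr/share/keyrings/debian-keyring.gpg foo_1.2-3.dsc

        # with tag2upload: the trust anchor becomes the DD's signature on a git tag
        git verify-tag debian/1.2-3 && git checkout debian/1.2-3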

    --
    Gioele Barabucci

  • From Russ Allbery@21:1/5 to Luca Boccassi on Sat Mar 30 16:50:01 2024
    Luca Boccassi <bluca@debian.org> writes:

    In the end, massaged tarballs were needed to avoid rerunning autoconfery
    on twelve thousand different proprietary and non-proprietary Unix
    variants, back in the day. In 2024, we do dh_autoreconf by default so
    it's all moot anyway.

    This is true from Debian's perspective. This is much less obviously true
    from upstream's perspective, and there are some advantages to aligning
    with upstream about what constitutes the release artifact.

    When using Meson/CMake/home-grown makefiles there's no meaningful
    difference on average, although I'm sure there are corner cases and exceptions here and there.

    Yes, perhaps it's time to switch to a different build system, although one
    of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not
    sure how one does that in the non-Autoconf build systems. Meson's Porting
    from Autotools [1] page, for example, doesn't seem to address this use
    case at all.

    [1] https://mesonbuild.com/Porting-from-autotools.html

    Maybe the answer is "you should give up on portability to older systems as
    the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Russ Allbery@21:1/5 to ij@2023.bluespice.org on Sat Mar 30 16:50:01 2024
    Ingo Jürgensmann <ij@2023.bluespice.org> writes:

    This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a
    more common threat vector for FLOSS: pick up some random lib that is
    widely used, insert some malicious code, and have fun. Then also imagine
    stuff that automates builds in other ways, like Docker containers, Ruby,
    Rust, or pip, that pulls stuff from the network and installs it without
    further checks.

    I hope (and am confident) that Debian as a project will react
    accordingly to prevent this happening again.

    Debian has precisely the same problem. We have more work to do than we can
    possibly do with the resources we have; there is some funding, but not a lot,
    so most of the work is hobby work stolen from scarce free time, and we're
    under a lot of pressure to encourage and incorporate the work of new
    maintainers.

    And 99% of the time trusting the people who step up to help works out
    great.

    The hardest part about defending against social engineering is that it
    doesn't attack the weaknesses of a community. It attacks its
    *strengths*: trust, collaboration, and mutual assistance.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Russ Allbery@21:1/5 to Simon Josefsson on Sat Mar 30 17:00:02 2024
    Simon Josefsson <simon@josefsson.org> writes:
    Sean Whitton <spwhitton@spwhitton.name> writes:

    We did some analysis on the SHA1 vulnerabilities and determined that
    they did not meaningfully affect dgit & tag2upload's design.

    Can you share that analysis? As far as I understand, it is possible for
    a malicious actor to create a git repository with the same commit id as
    HEAD, with different historic commits and tree content. I thought a
    signed tag is merely a signed reference to a particular commit id. If
    that commit id is a SHA1 reference, that opens up for ambiguity given
    recent (well, 2019) results on SHA1. Of course, I may be wrong in any
    of the chain, so would appreciate explanation of how this doesn't work.

    I believe you're talking about two different things. I think Sean is
    talking about preimage resistance, which assumes that the known-good
    repository is trusted, and I believe Simon is talking about manufactured collisions where the attacker controls both the good and the bad
    repository.

    The dgit and tag2upload design probably (I'd have to think about it some
    more, ideally while bouncing the problem off of someone else, because I've recycled those brain cells for other things) only needs preimage
    resistance, but the general case of a malicious upstream may be vulnerable
    to manufactured collisions.

    (So far as I know, preimage attacks against *MD5* are still infeasible,
    let alone against SHA-1.)

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Simon Josefsson@21:1/5 to Jonathan Carter on Sat Mar 30 16:30:01 2024
    Jonathan Carter <jcc@debian.org> writes:

    On 2024/03/30 11:05, Simon Josefsson wrote:
    1. Move towards allowing, and then favoring, git-tags over source tarballs
    Some people have suggested this before -- and I have considered adopting
    that approach myself, but one thing that is often overlooked is that
    building from git usually increases the Build-Depends quite a lot
    compared to building from a tarball

    How in the world do you jump to that conclusion?

    By comparing the set of tools required to build from git with the tools installed by Build-Depends* for common projects. I'm thinking of
    projects like coreutils, wget, libidn2, gnutls, gzip, etc.

    /Simon


  • From Jeremy Stanley@21:1/5 to Russ Allbery on Sat Mar 30 17:10:01 2024
    On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
    [...]
    if the Git repository is somewhere other than GitHub, the
    malicious possibilities are even broader.
    [...]

    I would not be so quick to make the same leap of faith. GitHub is
    not itself open source, nor is it transparently operated. It's a
    proprietary commercial service, with all the trust challenges that
    represents. Long, long before XZ was a twinkle in anyone's eye,
    malicious actors were already regularly getting their agents hired
    onto development teams to compromise commercial software. Just look
    at the Juniper VPN backdoor debacle for a fairly well-documented
    example (but there's strong evidence this practice dates back well
    before free/libre open source software even, at least to the 1970s).

    If anything, compromising an open project or transparent service is
    probably considerably harder, these sorts of people thrive in the
    comfort of shadows that the proprietary software world offers them,
    and (thankfully) struggle in the open, like with the rather quick identification and public response demonstrated in this case. I
    would be quite surprised by similarly rapid or open discussion from
    a proprietary service who discovered a saboteur in their ranks.
    --
    Jeremy Stanley


  • From Russ Allbery@21:1/5 to Jeremy Stanley on Sat Mar 30 17:20:01 2024
    Jeremy Stanley <fungi@yuggoth.org> writes:
    On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
    [...]
    if the Git repository is somewhere other than GitHub, the
    malicious possibilities are even broader.
    [...]

    I would not be so quick to make the same leap of faith. GitHub is
    not itself open source, nor is it transparently operated. It's a
    proprietary commercial service, with all the trust challenges that represents. Long, long before XZ was a twinkle in anyone's eye,
    malicious actors were already regularly getting their agents hired
    onto development teams to compromise commercial software. Just look
    at the Juniper VPN backdoor debacle for a fairly well-documented
    example (but there's strong evidence this practice dates back well
    before free/libre open source software even, at least to the 1970s).

    This is a valid point: let me instead say that the malicious possibilities
    are *different*. All of your points about GitHub are valid, but the counterexample I had in mind is one where the malicious upstream runs the entire Git hosting architecture themselves and can make completely
    arbitrary changes to the Git repository freely. I don't think we know everything that is possible to do in that situation. I think it would be difficult (not impossible, but difficult) to get into that position at
    GitHub, whereas it is commonplace among self-hosted projects.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Iustin Pop@21:1/5 to Luca Boccassi on Sat Mar 30 20:50:01 2024
    On 2024-03-30 11:47:56, Luca Boccassi wrote:
    On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:

    On 2024-03-30 08:02:04, Gioele Barabucci wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
    deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    The problem is that this requires functionality testing to be fully automated via autopkgtest, moving away from the "update changelog, build package, test locally, test some more, upload" workflow.

    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.

    This is all already implemented by Salsa CI? You just need to include
    the yml and enable the CI in the settings

    I will be the first to admit I'm not up to date on latest Salsa news,
    but see, what you mention - "include the yml" - is exactly what I don't
    want.

    If maintainers need to include a yaml file, it means it can vary between projects, which means it can either have bugs or be hijacked. In my
    view, there should be no freedom here, just one setting - "enable
    tag2upload with automated autopkg testing", and all packages would
    behave mostly the same way. But there are 2KiB single-binary packages as
    well as 2GB 25 binary packages, so maybe this is too wide scope.

    I just learned about tag2upload, need to look into that.

    (I'm still processing this whole story, and I fear the fallout/impact
    in terms of how development is regarded will be extremely high.)

    regards,
    iustin

  • From Andrey Rakhmatullin@21:1/5 to Iustin Pop on Sat Mar 30 21:00:01 2024
    On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
    deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    The problem is that this requires functionality testing to be fully
    automated via autopkgtest, moving away from the "update changelog, build package, test locally, test some more, upload" workflow.
    Do you mean this theoretical workflow will not have a step of the
    maintainer actually looking at the package and running it locally, or
    running any building or linting locally before pushing the changes?
    Then yeah, looking at some questions in the past years I understand that
    some people are already doing that, powered by Salsa CI (I can think of
    several possible reasons for that workflow but it still frustrates me).

    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.
    AFAIK the currently suggested way of enabling that is putting "recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration
    file" in the salsa settings (no idea where is the page that tells that or
    how to find it even knowing it exists).
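
    For repositories where that setting isn't available or wanted, the
    variant I've seen documented (assuming the salsa-ci-team pipeline URLs
    haven't moved) is to commit a CI file and point the project's CI/CD
    configuration at it:

        mkdir -p debian
        cat > debian/salsa-ci.yml <<'EOF'
        include:
          - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/salsa-ci.yml
          - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/pipeline-jobs.yml
        EOF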

    --
    WBR, wRAR


  • From Robert Edmonds@21:1/5 to Russ Allbery on Sat Mar 30 21:50:01 2024
    Russ Allbery wrote:
    Yes, perhaps it's time to switch to a different build system, although one
    of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
    case at all.

    [1] https://mesonbuild.com/Porting-from-autotools.html

    Have a look at the documentation for the meson "compiler" object [1]. There is a
    lot of functionality in meson that has autoconf analogs but isn't described in the "Porting from Autotools" document.

    [1] https://mesonbuild.com/Reference-manual_returned_compiler.html

    --
    Robert Edmonds
    edmonds@debian.org

  • From Iustin Pop@21:1/5 to Andrey Rakhmatullin on Sat Mar 30 21:30:01 2024
    On 2024-03-31 00:58:49, Andrey Rakhmatullin wrote:
    On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
    Now it is time to take a step forward:

    1. new upstream release;
    2. the DD/DM merges the upstream release VCS into the Debian VCS;
    3. the buildd is notified of the new release;
    4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
    deb" and "binary deb" to unstable.

    This change would have three advantages:

    I think everyone fully agrees this is a good thing, no need to list the advantages.

    The problem is that this requires functionality testing to be fully automated via autopkgtest, moving away from the "update changelog, build package, test locally, test some more, upload" workflow.
    Do you mean this theoretical workflow will not have a step of the
    maintainer actually looking at the package and running it locally, or
    running any building or linting locally before pushing the changes?
    Then yeah, looking at some questions in the past years I understand that
    some people are already doing that, powered by Salsa CI (I can think of several possible reasons for that workflow but it still frustrates me).

    Not that it necessarily won't have that step, but the question is how to
    integrate the testing into the tag signing/pushing step.

    I.e. before moving archive-wide to "sign tag + push", there should be a standard for how this is all tested for a package. Maybe there is one and I'm
    not aware; my Debian activities are very low key (but I try to keep up
    with mailing lists).

    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.
    AFAIK the currently suggested way of enabling that is putting "recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration
    file" in the salsa settings (no idea where is the page that tells that or
    how to find it even knowing it exists).

    Aha, see, this I didn't know. On my list to test once archive is
    unblocked and I have time for packaging.

    regards,
    iustin

  • From Adrian Bunk@21:1/5 to Antonio Russo on Sat Mar 30 22:20:01 2024
    On Fri, Mar 29, 2024 at 06:21:27PM -0600, Antonio Russo wrote:
    ...
    1. Move towards allowing, and then favoring, git-tags over source tarballs
    ...

    git commit IDs, not tags.

    Upstream moving git tags does sometimes happen.

    Usually for bad-but-not-malicious reasons like "add one more last-minute
    fix", but using tags would also invite manipulation similar to what
    happened with xz at any point after the release.


    cu
    Adrian

  • From Simon Josefsson@21:1/5 to Russ Allbery on Sat Mar 30 23:20:01 2024
    Russ Allbery <rra@debian.org> writes:

    Simon Josefsson <simon@josefsson.org> writes:
    Sean Whitton <spwhitton@spwhitton.name> writes:

    We did some analysis on the SHA1 vulnerabilities and determined that
    they did not meaningfully affect dgit & tag2upload's design.

    Can you share that analysis? As far as I understand, it is possible for
    a malicious actor to create a git repository with the same commit id as
    HEAD, with different historic commits and tree content. I thought a
    signed tag is merely a signed reference to a particular commit id. If
    that commit id is a SHA1 reference, that opens up for ambiguity given
    recent (well, 2019) results on SHA1. Of course, I may be wrong in any
    of the chain, so would appreciate explanation of how this doesn't work.

    I believe you're talking about two different things. I think Sean is
    talking about preimage resistance, which assumes that the known-good repository is trusted, and I believe Simon is talking about manufactured collisions where the attacker controls both the good and the bad
    repository.

    Right. I think the latter describes the xz scenario: someone could have
    pushed a maliciously crafted commit with a SHA1 collision commit id, so
    there are two different git repositories with that commit id, and a
    signed git tag on that commit id authenticates both trees, opening up
    for uncertainty about what was intended to be used. Unless I'm missing
    some detail of how git signed tag verification works that would catch
    this.
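
    For reference, dumping a tag object shows what the signature actually
    covers: only the tag object's own bytes, which reference the commit
    purely through a 40-hex-digit SHA-1 (the values below are illustrative):

        git cat-file tag v5.6.1
        # object d5d4a9...            <- the SHA-1 commit id; nothing else
        # type commit                    binds the tree contents behind it
        # tag v5.6.1
        # tagger ...
        #
        # <tag message>
        # -----BEGIN PGP SIGNATURE----- ...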

    The dgit and tag2upload design probably (I'd have to think about it some more, ideally while bouncing the problem off of someone else, because I've recycled those brain cells for other things) only needs preimage
    resistance, but the general case of a malicious upstream may be vulnerable
    to manufactured collisions.

    It is not completely clear to me: How about if some malicious person
    pushed a commit to salsa, asked a DD to "please review this repository
    and sign a tag to make the upload"? The DD would presumably sign a
    commit id that authenticates two different git trees, one with the
    exploit and one without it.

    /Simon


  • From Adrian Bunk@21:1/5 to Russ Allbery on Sat Mar 30 23:20:01 2024
    On Fri, Mar 29, 2024 at 11:29:01PM -0700, Russ Allbery wrote:
    ...
    In other words, we should make sure that breaking the specific tactics
    *this* attacker used truly make the attacker's life harder, as opposed to making life harder for Debian packagers while only forcing a one-time,
    minor shift in attacker tactics. I *think* I'm mostly convinced that
    forcing the attacker into Git commits is a useful partial defense, but I'm not sure this is obviously true.
    ...

    There are also other reasons why using tarballs by default is no longer
    a good option.

    In many cases our upstream source is the unsigned tarball Github
    automatically provides for every tag, which invites MITM attacks.

    The hash of these tarballs is expected to change over time, which makes
    it harder to reliably verify that the upstream sources we have in the
    archive match what is provided upstream.

    cu
    Adrian

  • From Russ Allbery@21:1/5 to Simon Josefsson on Sun Mar 31 00:20:01 2024
    Simon Josefsson <simon@josefsson.org> writes:
    Russ Allbery <rra@debian.org> writes:

    I believe you're talking about two different things. I think Sean is
    talking about preimage resistance, which assumes that the known-good
    repository is trusted, and I believe Simon is talking about
    manufactured collisions where the attacker controls both the good and
    the bad repository.

    Right. I think the latter describes the xz scenario: someone could have pushed a maliciously crafted commit with a SHA1 collision commit id, so
    there are two different git repositories with that commit id, and a
    signed git tag on that commit id authenticates both trees, opening up
    for uncertainty about what was intended to be used. Unless I'm missing
    some detail of how git signed tag verification works that would catch
    this.

    This is also my understanding.

    The dgit and tag2upload design probably (I'd have to think about it
    some more, ideally while bouncing the problem off of someone else,
    because I've recycled those brain cells for other things) only needs
    preimage resistance, but the general case of a malicious upstream may
    be vulnerable to manufactured collisions.

    It is not completely clear to me: How about if some malicious person
    pushed a commit to salsa, asked a DD to "please review this repository
    and sign a tag to make the upload"? The DD would presumably sign a
    commit id that authenticates two different git trees, one with the
    exploit and one without it.

    Oh, hm, yes, this is a good point. I had forgotten that tag2upload was intended to work by pushing a tag to Salsa. This means an attacker can potentially race Salsa CI to move that tag to the malicious tree before
    the tree is fetched by tag from Salsa, or reuse the signed tag with a
    different repository with the same SHA-1.

    The first, most obvious step is that one has to make sure that a signed
    tag is restricted to a specific package and version and not portable to a different package and/or version that has the same SHA-1 hash due to
    attacker construction. There are several obvious ways that could be done;
    the one that comes immediately to mind is to require the tag message be
    the source package name and version number, which is good practice anyway.
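
    As a sketch of that convention (the tag name and message format here
    are just an illustration, not a settled design):

        git tag -s upload/hello_2.10-3 -m 'source: hello, version: 2.10-3' HEAD
        git verify-tag upload/hello_2.10-3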

    I think any remaining issues could be addressed with a fairly simple modification to the protocol: rather than pushing the signed tag to Salsa,
    the DD reviewer should push the signed tag to a separate archive server
    similar to that used by dgit today. As long as the first time the signed
    tag leaves the DD's system is in conjunction with a push of the
    corresponding reviewed tree to secure project systems, this avoids the substitution problem. The tag could then be pushed back to Salsa, either
    by the DD or by the service.

    This unfortunately means that one couldn't use the Salsa CI service to do
    the source package construction, and one has to know about this extra
    server. I think that restriction comes from the fact that we're worried
    an attacker may be able to manipulate the Salsa Git repository (through
    force pushes and tag replacements, for example), whereas the separate
    dedicated archive server can be more restrictive and never allow force
    pushes or tag moves, and reject any attempts to push a SHA-1 hash that has already been seen.

    Another possible option would be to prevent force pushes and tag moves in Salsa, since I think one of those operations would be required to pull off
    this attack, but maybe I'm missing something. One of the things I'm murky
    on is exactly what Git operations are required to substitute the two trees
    with identical SHA-1 hashes. That property is going to break Git in weird ways, and I'm not sure what that means for one's ability to manipulate a
    Git repository over the protocols that Salsa exposes.

    Obviously it would be ideal if Git used stronger hashes than SHA-1 for
    tags, so that one need worry less about all of this.

    Even if my analysis is wrong, I think there are some fairly obvious and
    trivial additions to the tag2upload process that would prevent this
    attack, such as building a Merkle tree of the reviewed source tree using a SHA-256 hash and embedding the top hash of that tree in the body of the
    signed tag where it can be verified by the archive infrastructure. That
    might be a good idea *anyway*, although it does have the unfortunate side effect of requiring a local client to produce a correct tag rather than
    using standard Git signed tags. Uploading to Debian currently already semi-requires a custom local client, so to me this isn't a big deal,
    although I think there was some hope to avoid that.
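
    A minimal sketch of the manifest part (a flat SHA-256 file manifest,
    i.e. a one-level Merkle tree, whose final digest could go into the
    signed tag's body):

        # hash every tracked file, then hash the sorted manifest itself;
        # the resulting digest binds the signature to the exact file contents
        git ls-files -z | sort -z | xargs -0 sha256sum | sha256sum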

    (These variations unfortunately don't help with the upstream problem.)

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Timo Röhling@21:1/5 to All on Sun Mar 31 02:00:01 2024
    Hi,

    * Simon Josefsson <simon@josefsson.org> [2024-03-30 12:19]:
    Relying on signed git tags is not reliable because git is primarily SHA1-based, which in 2019 cost $45K to do a collision attack for.
    FWIW, Gitlab is working on support for SHA 256 hashing [1], and as
    of Git 2.42, the SHA 256 repository format has matured enough that
    backwards incompatible breaks are very unlikely [2].


    Cheers
    Timo


    [1]
    https://about.gitlab.com/blog/2023/08/28/sha256-support-in-gitaly/
    [2] https://lore.kernel.org/lkml/xmqqr0nwp8mv.fsf@gitster.g/
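
    The object format can already be tried out today, with the caveat that
    (as of this writing) SHA-256 repositories don't interoperate with SHA-1
    remotes:

        git init --object-format=sha256 demo && cd demo
        echo test > file && git add file && git commit -m test
        git rev-parse HEAD    # a 64-hex-digit object id instead of 40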


    --
    ⢀⣴⠾⠻⢶⣦⠀ ╭────────────────────────────────────────────────────╮
    ⣾⠁⢠⠒⠀⣿⡁ │ Timo Röhling │
    ⢿⡄⠘⠷⠚⠋⠀ │ 9B03 EBB9 8300 DF97 C2B1 23BF CC8C 6BDD 1403 F4CA │
    ⠈⠳⣄⠀⠀⠀⠀ ╰────────────────────────────────────────────────────╯

  • From Gioele Barabucci@21:1/5 to Iustin Pop on Sun Mar 31 08:10:01 2024
    On 30/03/24 20:43, Iustin Pop wrote:
    On 2024-03-30 11:47:56, Luca Boccassi wrote:
    On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
    Give me good Salsa support for autopkgtest + lintian + piuparts, and
    easy support (so that I just have to toggle one checkbox), and I'm
    happy. Or even better, integrate all that testing with Salsa (I don't
    know if it has "CI tests must pass before merging"), and block tagging
    on the tagged version having been successfully tested.

    This is all already implemented by Salsa CI? You just need to include
    the yml and enable the CI in the settings

    I will be the first to admit I'm not up to date on latest Salsa news,
    but see, what you mention - "include the yml" - is exactly what I don't want.

    Salsa CI is enabled by default for all projects in the debian/ namespace <https://salsa.debian.org/debian/>.

    Adding a yml file or changing the CI settings to reference the Salsa CI pipeline is needed only for projects in team- or maintainer-specific repositories, or when the dev wants to enable additional tests (or configure/block the default tests).

    Regards,

    --
    Gioele Barabucci

  • From Lucas Nussbaum@21:1/5 to Russ Allbery on Sun Mar 31 08:20:01 2024
    On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
    Antonio Russo <antonio.e.russo@gmail.com> writes:
    But, I will definitely concede that, had I seen a commit that changed
    that line in the m4, there's a good chance my eyes would have glazed
    over it.

    This is why I am somewhat skeptical that forcing everything into Git
    commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
    once you know something bad has happened, to be able to use all the Git
    tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be
    force-pushed, and if the Git repository is somewhere other than GitHub,
    the malicious possibilities are even broader.

    We could narrow those possibilities somewhat by maintaining
    Debian-controlled mirrors of upstream Git repositories so that we could detect rewritten history. (There are a whole lot of reasons why I think
    dgit is a superior model for archive management. One of them is that it captures the full Git history of upstream at the point of the upload on Debian-controlled infrastructure if the maintainer of the package bases it
    on upstream's Git tree.)

    I wonder if Software Heritage could help with that part?

    Lucas

  • From Sven Joachim@21:1/5 to Simon Josefsson on Sun Mar 31 09:10:01 2024
    On 2024-03-30 12:19 +0100, Simon Josefsson wrote:

    Gioele Barabucci <gioele@svario.it> writes:

    Just as an example, bootstrapping coreutils currently requires
    bootstrapping at least 68 other packages, including libx11-6 [1]. If
    coreutils supported <nodoc> [2], the transitive closure of its
    Build-Depends would be reduced to 20 packages, most of which in
    build-essential.

    [1]
    https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
    [2] https://bugs.debian.org/1057136

    Coreutils in Debian uses upstream tarballs and does not do a full
    bootstrap build. It does autoreconf instead of ./bootstrap. So the
    dependencies above are not the entire bootstrapping story for building
    coreutils from git compared to building from tarballs.

    The coreutils bootstrap script fetches files over the network, so it is
    not possible to build the Debian package from upstream git tags. At the
    very least it would lack any translations, and there is also the
    problem of the gnulib submodule.
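
    For illustration, a from-git build has to do something like the
    following before autotools even enter the picture (the ./bootstrap
    option is per gnulib's bootstrap script; without it, translations are
    fetched from translationproject.org over the network):

        git clone https://git.savannah.gnu.org/git/coreutils.git
        cd coreutils
        git submodule update --init    # gnulib, pinned at the recorded commit
        ./bootstrap --skip-po          # skip fetching .po translations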

    Cheers,
    Sven

  • From Stefano Zacchiroli@21:1/5 to Lucas Nussbaum on Sun Mar 31 09:40:01 2024
    On Sun, Mar 31, 2024 at 08:16:33AM +0200, Lucas Nussbaum wrote:
    On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
    This is why I am somewhat skeptical that forcing everything into Git commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
    once you know something bad has happened, to be able to use all the Git tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be force-pushed, and if the Git repository is somewhere other than GitHub,
    the malicious possibilities are even broader.

    I wonder if Software Heritage could help with that part?

    Yeah (provided that archival happens at the right moment) you can use
    Software Heritage APIs to detect, for instance, git history rewrites and
    also commits moving from one branch/tag to another.

    It occurs to me that in the Guix/Nix packaging model, where they note
    down the commit of interest in their packaging recipe, you'll also automatically discover if a commit disappeared from upstream repo
    without needing a lot of extra tooling/integration (although not if it
    has moved between branches). However, you need a backup place to
    retrieve the commit from in case it disappears or gets rewritten upstream
    (Guix uses Software Heritage for this).

    Cheers
    --
    Stefano Zacchiroli . zack@upsilon.cc . https://upsilon.cc/zack
    Full professor of Computer Science, Télécom Paris, Polytechnic Institute of Paris
    Co-founder & CTO, Software Heritage
    https://twitter.com/zacchiro . https://mastodon.xyz/@zacchiro

  • From Luca Boccassi@21:1/5 to Russ Allbery on Sun Mar 31 13:00:01 2024
    On Sat, 30 Mar 2024 at 15:44, Russ Allbery <rra@debian.org> wrote:

    Luca Boccassi <bluca@debian.org> writes:

    In the end, massaged tarballs were needed to avoid rerunning autoconfery
    on twelve thousand different proprietary and non-proprietary Unix variants, back in the day. In 2024, we do dh_autoreconf by default so
    it's all moot anyway.

    This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
    with upstream about what constitutes the release artifact.

    My point is that, while there will be for sure exceptions here and
    there, by and large the need for massaged tarballs comes from projects
    using autoconf and wanting to ship source archives that do not require
    running the autoconf machinery. And said upstreams might care about
    this because they support backward compatibility with ancient Unix
    stuff and such like (I mean, I _am_ upstream in one project that does
    exactly this for exactly this reason, zeromq, so I understand that
    requirement perfectly well).
    However, we as in Debian do not have this problem. We can and do
    re-run the autoconf machinery on every build. And at least on the main
    forges, the autogenerated (and thus out of reach from this kind of
    attacks) tarball is always present too - the massaged tarball is an
    _addition_, not a _substitution_. Hence: we should really really think
    about forcing all packages, by policy, to use the autogenerated
    tarball by default instead of the autoconf one, when both are present,
    unless extenuating circumstances (that have to be documented) are
    present.

  • From Iustin Pop@21:1/5 to Gioele Barabucci on Sun Mar 31 13:30:01 2024
    On 2024-03-31 08:03:40, Gioele Barabucci wrote:
    On 30/03/24 20:43, Iustin Pop wrote:
    On 2024-03-30 11:47:56, Luca Boccassi wrote:
    On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
    Give me good Salsa support for autopkgtest + lintian + piuparts, and easy support (so that I just have to toggle one checkbox), and I'm happy. Or even better, integrate all that testing with Salsa (I don't know if it has "CI tests must pass before merging"), and block tagging on the tagged version having been successfully tested.

    This is all already implemented by Salsa CI? You just need to include
    the yml and enable the CI in the settings

    I will be the first to admit I'm not up to date on latest Salsa news,
    but see, what you mention - "include the yml" - is exactly what I don't want.

    Salsa CI is enabled by default for all projects in the debian/ namespace <https://salsa.debian.org/debian/>.

    Adding a yml file or changing the CI settings to reference the Salsa CI pipeline is needed only for projects in team- or maintainer-specific repositories, or when the dev wants to enable additional tests (or configure/block the default tests).

    That sounds good, but are you sure that all /debian/ projects get it?

    I chose one random package of mine, https://salsa.debian.org/debian/python-pyxattr, and on the home page I
    see "Setup CI/CD" (implying it's disabled), and under build, I see
    nothing enabled.

    Is there a howto somewhere? Happy to read/follow.

    iustin

  • From Adrian Bunk@21:1/5 to Luca Boccassi on Sun Mar 31 19:30:01 2024
    On Sat, Mar 30, 2024 at 11:55:04AM +0000, Luca Boccassi wrote:
    ...
    In the end, massaged tarballs were needed to avoid rerunning
    autoconfery on twelve thousand different proprietary and
    non-proprietary Unix variants, back in the day. In 2024, we do
    dh_autoreconf by default so it's all moot anyway.
    ...

    The first step of the xz exploit was in a vendored gnulib m4 file that
    is not (and should not be) in git and that does not get updated by dh_autoreconf.

    cu
    Adrian

  • From Russ Allbery@21:1/5 to Luca Boccassi on Sun Mar 31 19:20:01 2024
    Luca Boccassi <bluca@debian.org> writes:
    On Sat, 30 Mar 2024 at 15:44, Russ Allbery <rra@debian.org> wrote:
    Luca Boccassi <bluca@debian.org> writes:

    In the end, massaged tarballs were needed to avoid rerunning
    autoconfery on twelve thousand different proprietary and
    non-proprietary Unix variants, back in the day. In 2024, we do
    dh_autoreconf by default so it's all moot anyway.

    This is true from Debian's perspective. This is much less obviously
    true from upstream's perspective, and there are some advantages to
    aligning with upstream about what constitutes the release artifact.

    My point is that, while there will be for sure exceptions here and
    there, by and large the need for massaged tarballs comes from projects
    using autoconf and wanting to ship source archives that do not require
    to run the autoconf machinery.

    Just as a data point, literally every C project for which I am upstream
    ships additional files in the release tarballs that are not in Git for
    reasons unrelated to Autoconf and friends.

    Most of this is pregenerated documentation (primarily man pages generated
    from POD), but it also includes generated test data and other things. The reason is similar: regenerating those files requires tools that may not be present on an older system (like a mess of random Perl modules) or, in the
    case of the man pages, may be old and thus produce significantly inferior output.

    However, we as in Debian do not have this problem. We can and do re-run
    the autoconf machinery on every build. And at least on the main forges,
    the autogenerated (and thus out of reach from this kind of attacks)
    tarball is always present too - the massaged tarball is an _addition_,
    not a _substitution_. Hence: we should really really think about forcing
    all packages, by policy, to use the autogenerated tarball by default
    instead of the autoconf one, when both are present, unless extenuating circumstances (that have to be documented) are present.

    I think this is probably right as long as by "autogenerated" you mean
    basing the Debian package on a signed upstream Git tag and *locally*
    generating a tarball to satisfy Debian's .orig.tar.gz requirement, not
    using GitHub's autogenerated tarball that has all sorts of other potential issues.
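
    Something like this, presumably (names and version are illustrative;
    gzip's -n keeps the output reproducible):

        git verify-tag v1.2.3
        git archive --format=tar --prefix=foo-1.2.3/ v1.2.3 \
            | gzip -9n > ../foo_1.2.3.orig.tar.gz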

    Just to note, though, this means that we lose the upstream signature in
    the archive. The only place the upstream signature would then live is in Salsa.

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Marco d'Itri@21:1/5 to Russ Allbery on Mon Apr 1 00:00:02 2024
    On Mar 31, Russ Allbery <rra@debian.org> wrote:

    Most of this is pregenerated documentation (primarily man pages generated from POD), but it also includes generated test data and other things. The reason is similar: regenerating those files requires tools that may not be present on an older system (like a mess of random Perl modules) or, in the case of the man pages, may be old and thus produce significantly inferior output.
    But we do not use older systems to build our packages, so this does not matter.

    Indeed, long ago I started building inn2 from the git tree, no more tarballs...
    I switched long ago all my packages from tar archives to the git
    upstream tree. Not only does this make it much easier to understand the
    changes in a new release, but it also makes it possible to package
    upstream snapshots.

    Just to note, though, this means that we lose the upstream signature in
    the archive. The only place the upstream signature would then live is in Salsa.
    Totally worth it!

    --
    ciao,
    Marco


  • From Stefano Rivera@21:1/5 to All on Mon Apr 1 00:40:01 2024
    Hi Guillem (2024.03.30_04:41:37_+0000)
    1. Move towards allowing, and then favoring, git-tags over source tarballs

    I assume you mean git archives out of git tags? Otherwise how do you
    go from git-tag to a source package in your mind?

    There are some issues with transforming upstream's git-centric world
    into tarballs for Debian source packages that are worth bearing in mind.

    The upstream git repository has some extra metadata available that
    upstream build tools start depending on. Things like: versions, tracked
    files, and ignored files.

    This came up in the Python world, where setuptools-scm has become more
    popular over the years. This is a plugin for setuptools that extracts
    some metadata from the git repository:
    1. Determine the current version. Historically, specified in setup.py.
    2. Determine the data files that should be shipped in the installed
    package. Historically, these were specified in a MANIFEST.in file,
    but developers got lazy and delegated this problem to git.

    Currently we set the version for packages that depend on (1) via an
    environment variable that setuptools-scm will consume.
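
    (Presumably SETUPTOOLS_SCM_PRETEND_VERSION, which setuptools-scm
    documents for overriding its version detection; the version below is
    illustrative:)

        export SETUPTOOLS_SCM_PRETEND_VERSION=1.2.3
        python3 -m build --sdist    # needs the python3-build package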

    For packages that get file lists from git, it's a little more complex. setuptools writes a foo.egg-info/SOURCES.txt into source artifacts that
    it produces (sdists). When this file is present, it's used as a list of
    files. https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest

    So... for Python packages using setuptools-scm, we're pushed towards
    depending on upstream-created source tarballs (sdists), rather than
    upstream git archives, because we don't have the ".git" directory in our
    source packages.

    I can imagine that other ecosystems would run into similar problems and
    solve them by inventing similar protocols, if they solve them at all.
    Upstreams would probably prefer that we used git repositories *directly*
    as source artifacts, but that comes with a whole other can of worms...

    Stefano
    --
    Stefano Rivera
    http://tumbleweed.org.za/
    +1 415 683 3272

  • From gregor herrmann@21:1/5 to Russ Allbery on Mon Apr 1 02:30:01 2024
    On Sun, 31 Mar 2024 10:12:35 -0700, Russ Allbery wrote:

    My point is that, while there will be for sure exceptions here and
    there, by and large the need for massaged tarballs comes from projects using autoconf and wanting to ship source archives that do not require
    to run the autoconf machinery.
    Just as a data point, literally every C project for which I am upstream
    ships additional files in the release tarballs that are not in Git for reasons unrelated to Autoconf and friends.

    This is also true for every perl distribution on the CPAN made with
    the standard build tools (and I write this as a response to a mail of
    yours as I know that you know what I'm talking about :))

    Just to note, though, this means that we lose the upstream signature in
    the archive. The only place the upstream signature would then live is in Salsa.

    This also means that we are, at least in some ecosystems, diverging
    from the preferred way of distribution, and maybe more important, that
    we are adding a new step 0 to our build process, which is: making a
    (fake) upstream release.


    Cheers,
    gregor

    --
    .''`. https://info.comodo.priv.at -- Debian Developer https://www.debian.org
    : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
    `. `' Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
    `-

  • From gregor herrmann@21:1/5 to Marco d'Itri on Mon Apr 1 02:40:01 2024
    On Sun, 31 Mar 2024 23:59:20 +0200, Marco d'Itri wrote:

    I switched long ago all my packages from tar archives to the git
    upstream tree. Not only this makes much easier to understand the changes
    in a new release,

    That's not mutually exclusive. When adding an additional git remote
    and using gbp-import-orig's --upstream-vcs-tag you get the best of
    both worlds.
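
    For example (remote name, URL and tag are illustrative):

        git remote add upstreamvcs https://example.org/foo.git
        git fetch upstreamvcs --tags
        gbp import-orig --upstream-vcs-tag=v1.2.3 ../foo_1.2.3.orig.tar.gz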


    Cheers,
    gregor

    --
    .''`. https://info.comodo.priv.at -- Debian Developer https://www.debian.org
    : :' : OpenPGP fingerprint D1E1 316E 93A7 60A8 104D 85FA BB3A 6801 8649 AA06
    `. `' Member VIBE!AT & SPI Inc. -- Supporter Free Software Foundation Europe
    `-


  • From Marco d'Itri@21:1/5 to gregor herrmann on Mon Apr 1 02:50:01 2024
    On Apr 01, gregor herrmann <gregoa@debian.org> wrote:

    I switched long ago all my packages from tar archives to the git
    upstream tree. Not only this makes much easier to understand the changes in a new release,
    That's not mutually exclusive. When adding an additional git remote
    and using gbp-import-orig's --upstream-vcs-tag you get the best of
    both worlds.
    No: I get nothing of value by doing that and the repository will be
    cluttered by commits that I do not care about.
    Also: upstream VCS snapshots.

    --
    ciao,
    Marco


  • From Simon Josefsson@21:1/5 to G. Branden Robinson on Mon Apr 1 11:40:01 2024
    "G. Branden Robinson" <g.branden.robinson@gmail.com> writes:

    At 2024-03-31T22:32:49+0000, Stefano Rivera wrote:
    Upstreams would probably prefer that we used git repositories
    *directly* as source artifacts, but that comes with a whole other can
    of worms...

    Speaking from my upstream groff perspective, I wouldn't _prefer_ that.

    The distribution archives get build-testing on a much wider variety of systems, thanks to people on the groff@ and platform-testers@gnu mailing lists that help out when a release candidate is announced. They have
    access to platforms more exotic than I and a few other bleeding-edge
    HEAD mavens do. This practice tangibly improved the quality of the
    groff 1.23.0 release, especially on surviving proprietary Unix systems.

    Building from the repo, or using the bootstrap script--which Colin
    Watson just today ensured will be in future distribution archives--is fine.[1] I'm glad some people build the project that way. But I think
    that procedure serves an audience that is distinguishable in some ways.

    Running ./bootstrap in a tarball may lead to different results than the maintainer running ./bootstrap in pristine git. It is the same problem
    as with 'autoreconf -fvi': running it in a tarball does not necessarily
    lead to the same result as the maintainer running 'autoreconf -fvi' from
    pristine git. The difference is what is pulled in from the system
    environment. Neither tool was designed to be run from within a tarball,
    so this is just bad practice that never worked reliably, and without a
    lot of complexity it will likely not become reliable either.

    I have suggested before that upstreams (myself included) should publish
    checkout including submodules, *.po translations, and whatever else is
    required to actually build the project that is normally pulled in from
    external places (autoconf archive macros?). This *-src.tar.gz tarball
    should be possible to ./bootstrap and that would be the intended way to
    build it for people who care about vendored files. Thoughts? Perhaps I
    should formalize this proposal a bit more.
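
    As a sketch, such a tarball could be produced with something like the
    following (this variant packs every tracked file, submodules included,
    but omits the .git metadata itself):

        git clone --recurse-submodules https://example.org/foo.git foo-1.2.3
        cd foo-1.2.3
        git ls-files --recurse-submodules -z \
            | tar --null --create --gzip --files-from=- --file=../foo-1.2.3-src.tar.gz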

    /Simon

    Regards,
    Branden

    [1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=822fef56e9ab7cbe69337b045f6f20e32e25f566



  • From Bastian Blank@21:1/5 to gregor herrmann on Mon Apr 1 12:30:01 2024
    On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
    That's not mutually exclusive. When adding an additional git remote
    and using gbp-import-orig's --upstream-vcs-tag you get the best of
    both worlds.

    And this will error out if there are unexpected changes in the tarball?
    How will it be able to detect those?

    Bastian

    --
    I've already got a female to worry about. Her name is the Enterprise.
    -- Kirk, "The Corbomite Maneuver", stardate 1514.0

  • From Marco d'Itri@21:1/5 to Simon McVittie on Sat Apr 6 09:50:30 2024
    On Apr 05, Simon McVittie <smcv@debian.org> wrote:

    I find that having the upstream source code in git (in the same form that
    we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely
    useful tool, because it lets me trace the history of all of the files
    that we are treating as source - whether hand-written or autogenerated -
    if I want to do that. If we are concerned about defending against actively
    I agree: it would be untinkable for me to not have the complete history immediately available while I am working on a package.

    --
    ciao,
    Marco


  • From Colin Watson@21:1/5 to Simon McVittie on Sat Apr 6 09:50:39 2024
    On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
    I find that having the upstream source code in git (in the same form that
    we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely
    useful tool, because it lets me trace the history of all of the files
    that we are treating as source - whether hand-written or autogenerated -
    if I want to do that. If we are concerned about defending against
    actively malicious upstreams like the recent xz releases, then that's
    already a difficult task and one where it's probably unrealistic to
    expect a high success rate, but I think we are certainly not going to
    be able to achieve
    it if we reject tools like git that could make it easier.

    Strongly agree. For many many things I rely heavily on having the
    upstream source code available in the same working tree when doing any
    kind of archaeology across Debian package versions, which is something I
    do a lot.

    I would hate to see an attacker who relied on an overloaded maintainer
    push us into significantly less convenient development setups, thereby increasing the likelihood of overload.

    In the "debian/ only" workflow, the Debian delta is exactly the contents
    of debian/. There is no redundancy, so every tree is in some sense a
    valid one (although of course sometimes patches will fail to apply, or whatever).

    I'd argue that this, and the similar error case in patches-unapplied, is symmetric with the error case in the patches-applied workflow (although
    it's true that there is redundancy in _commits_ in the latter case).

    --
    Colin Watson (he/him) [cjwatson@debian.org]

  • From Jeremy Stanley@21:1/5 to Russ Allbery on Sat Apr 6 09:51:30 2024
    On 2024-04-02 16:44:54 -0700 (-0700), Russ Allbery wrote:
    [...]
    I think a shallow clone of depth 1 is sufficient, although that's not sufficient to get the correct version number from Git in all cases.
    [...]

    Some tools (python3-reno, for example) want to inspect the commits
    and historical tags on branches, in order to do things like
    assembling release notes documents. I don't know if any reno-using
    projects packaged in Debian get release notes included, but if they
    do then shallow clones would break that process. The python3-pbr
    plugin also wants to look at commit messages on the current branch
    since the most recent tag if its SemVer-based version-guessing kicks
    in (typically if the current commit isn't tagged and the version
    string hasn't been overridden with an envvar).
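
    The failure mode is easy to reproduce (URL hypothetical):

    git clone --depth 1 https://example.org/proj.git
    cd proj
    git describe --tags   # typically fails: tags on older commits were never fetched
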
    --
    Jeremy Stanley

  • From Russ Allbery@21:1/5 to Adrian Bunk on Sat Apr 6 09:51:32 2024
    Adrian Bunk <bunk@debian.org> writes:
    On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:

    Yeah, that too. There are still people building e2fsprogs on AIX,
    Solaris, and other legacy Unix systems, and I'd hate to break them, or
    require a lot of pain for people who are building on MacPorts, et. al.
    ...

    Everything you mention should already be supported by Meson.

    Meson honestly sounds great, and I personally love the idea of using a
    build system whose language is a bit more like Python, since I use that language professionally anyway. (It would be nice if it *was* Python
    rather than yet another ad hoc language, but I also get why they may want
    to restrict it.)

    The prospect of converting 25 years of portability code from M4 into a new language is daunting, however. For folks new to this ecosystem, what
    resources are already available? Are there large libraries of tests
    already out there akin to gnulib and the Autoconf Archive? Is there a
    really good "porting from Autotools" guide for Meson that goes beyond the
    very cursory guide in the Meson documentation?

    The problem with this sort of migration is that it is an immense amount of
    work just to get back to where you started. I look at the amount of
    effort and start thinking things like "well, if I'm going to rewrite a
    bunch of things anyway, maybe I should just rewrite the software in Rust instead."

    --
    Russ Allbery (rra@debian.org) <https://www.eyrie.org/~eagle/>

  • From Bastian Blank@21:1/5 to Bastian Blank on Sat Apr 6 09:51:53 2024
    On Mon, Apr 01, 2024 at 12:03:48PM +0200, Bastian Blank wrote:
    On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
    That's not mutually exclusive. When adding an additional git remote
    and using gbp-import-orig's --upstream-vcs-tag you get the best of
    both worlds.
    And this will error out if there are unexpected changes in the tarball?
    How will it be able to detect those?

    Okay, I looked into what it does. It just adds another parent to the
    commit with the import of the tar. It does nothing else with this
    information.
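
    For reference, the two parents can at least be compared by hand; a
    sketch, with hypothetical tag names (v5.6.1 being upstream's git tag,
    upstream/5.6.1 the tarball import):

    git diff --stat v5.6.1 upstream/5.6.1   # everything the tarball adds or changes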

    So in the end you still need to manually review all the stuff that the
    tarball contains on top of the git tree. And for that I don't see that
    it actually lends a helping hand or makes anything easier.

    So I really don't see how this makes the problem at hand any better.
    Again the workload of review is on the person doing the job, i.e. we do
    fragile manual work instead of possibly failing automatic work.

    Bastian

    --
    Women professionals do tend to over-compensate.
    -- Dr. Elizabeth Dehaver, "Where No Man Has Gone Before",
    stardate 1312.9.

  • From Bastian Blank@21:1/5 to Vincent Bernat on Sat Apr 6 09:51:52 2024
    On Mon, Apr 01, 2024 at 06:36:30PM +0200, Vincent Bernat wrote:
    On 2024-04-01 12:44, Bastian Blank wrote:
    So in the end you still need to manually review all the stuff that the
    tarball contains on top of the git tree. And for that I don't see that
    it actually lends a helping hand or makes anything easier.

    So I really don't see how this makes the problem at hand any better.
    Again the workload of review is on the person doing the job, i.e. we do
    fragile manual work instead of possibly failing automatic work.

    I think that if Debian was using git instead of the generated tarball,
    this part of the backdoor would have just been included in the git
    repository as well. If we were able to magically switch everything to
    git (and we won't, we are not even able to agree on simpler stuff),
    I don't think it would have prevented the attack.

    Nothing prevents such an attack. Prevent would be a 100% fix, which can
    not exist. However what we can do is to make it harder to pull off.

    If they had been forced to commit all the activation code into the repo,
    it would have been directly visible for everyone. But instead, they
    choose to only ship it in the tarballs.

    That's why I asked if this would make it better, by removing this manual
    review task from the maintainer.

    Bastian

    --
    I object to intellect without discipline; I object to power without constructive purpose.
    -- Spock, "The Squire of Gothos", stardate 2124.5

  • From Colin Watson@21:1/5 to Simon Josefsson on Sat Apr 6 09:52:11 2024
    On Mon, Apr 01, 2024 at 05:24:45PM +0200, Simon Josefsson wrote:
    Colin Watson <cjwatson@debian.org> writes:
    On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
    Running ./bootstrap in a tarball may lead to different results than the
    maintainer running ./bootstrap in pristine git. It is the same problem
    as running 'autoreconf -fvi' in a tarball not necessarily leading to
    the same result as the maintainer running 'autoreconf -fvi' from
    pristine git. The difference is what is pulled in from the system
    environment. Neither tool was designed to be run from within a tarball,
    so this is just bad practice that never worked reliably, and without a
    lot of complexity it will likely not become reliable either.

    The practice of running "autoreconf -fi" or similar via dh-autoreconf
    has worked extremely well at scale in Debian. I'm sure there are
    complex edge cases where it's caused problems, but it's far from being a disaster area.

    Agreed. I'm saying it doesn't fix the problem that some people appear
    to believe it does, i.e., that running 'autoreconf -fi' solves the
    re-bootstrapping problem.

    Indeed - I've been pointing this out to people pretty much since the
    xz-utils backdoor was discovered.

    I have suggested before that upstreams (myself included) should
    publish PGP-signed *-src.tar.gz tarballs that contain the entire
    pristine git checkout including submodules,

    A while back I contributed support to Gnulib's bootstrap script to
    allow pinning particular commits without using submodules. I would
    recommend this mode; submodules have a very strange UI.

    I never liked git submodules generally, so I would be happy to work on
    getting that to be supported -- do you have pointers for earlier work
    here?

    https://lists.gnu.org/archive/html/bug-gnulib/2018-04/msg00029.html and
    thread - it's been in gnulib for some years. (I think you may have
    misread me as saying that I'd tried to contribute this and that it never
    made it, or something like that?)

    What is necessary, I think, is having something like this in
    bootstrap.conf:

    gnulib_commit_id = 123abc567...

    This is what I implemented, except I spelled it GNULIB_REVISION. Then
    see e.g.
    https://gitlab.com/libpipeline/libpipeline/-/blob/main/bootstrap.conf.
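
    In other words, with that support the pin is a single assignment in
    bootstrap.conf (the commit id here is a placeholder):

    GNULIB_REVISION=0123456789abcdef0123456789abcdef01234567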

    As I noted in a comment on your blog, I think there is a case to be
    made for .po files being committed to upstream git, and I'm not fond
    of the practice of pulling them in only at bootstrap time (although I
    can understand why that's come to be popular as a result of limited
    maintainer time). I have several reasons to believe this:

    Those are all good arguments, but it still feels backwards to put these
    files into git. It felt so good to externalize all the translation
    churn outside of my git (or then, CVS...) repositories many years ago.

    I would prefer to maintain a po/SHA256SUMS in git and continue to
    download translations but have some mechanism to refuse to continue if
    the hashes differ.
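
    A sketch of that mechanism (file layout assumed), run from the top of
    the source tree:

    sha256sum po/*.po > po/SHA256SUMS        # maintained and committed in git
    sha256sum --check po/SHA256SUMS          # after download; non-zero exit on mismatch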

    I wonder if a middle ground would be automated commits of translations.
    I don't think that's as robust, but a number of projects do it (e.g.
    d-i) and at least it's amenable to having translations go through CI
    rather than just being YOLOed straight into release tarballs.

    --
    Colin Watson (he/him) [cjwatson@debian.org]

  • From Theodore Ts'o@21:1/5 to Russ Allbery on Sat Apr 6 09:52:16 2024
    On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
    Luca Boccassi <bluca@debian.org> writes:

    In the end, massaged tarballs were needed to avoid rerunning autoconfery
    on twelve thousand different proprietary and non-proprietary Unix
    variants, back in the day. In 2024, we do dh_autoreconf by default so
    it's all moot anyway.

    This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
    with upstream about what constitutes the release artifact.

    My upstream perspective is that I've been burned repeatedly by
    incompatible version changes in autotools programs which cause my
    configure.{in,ac} file to no longer create a working configure script,
    or which cause subtle breakages. So my practice is to use autoconf
    on my Debian testing development system before checking in the
    configure.ac and configure files --- but I ship the generated files
    and I don't tell people to run autoreconf before running ./configure.
    And if things break after they run autoreconf, I tell them, "you ran autoreconf; you get to keep both pieces".

    And there *have* been times when autoconf has gotten updated in Debian
    testing, and the resulting configure script has broken, at which point
    I curse at autotools, and fix the configure.ac and/or aclocal.m4
    files, etc., and *then* check in the generated configure file and
    autotool source files.

    Yes, perhaps it's time to switch to a different build system, although one
    of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
    case at all.

    The other problem is that many of the other build systems are much
    slower than autoconf/makefile. (Note: I don't use libtool, because
    it's so d*mn slow.) Or building the alternate system might require a
    major bootstrapping phase, or requires downloading a JVM, etc.

    Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.

    Yeah, that too. There are still people building e2fsprogs on AIX,
    Solaris, and other legacy Unix systems, and I'd hate to break them, or
    require a lot of pain for people who are building on MacPorts, et. al.
    It hasn't been *all* that long since I started requiring C99
    compilers....

    That being said, if someone were worried about a Jia Tan-style
    attack on e2fsprogs: first of all, you can verify that configure
    corresponds to the autoconf in Debian testing at the time when the
    archive was generated, and the officially released tar file is
    generated via:

    git archive --prefix=e2fsprogs-${ver}/ ${commit} | gzip -9n > $fn

    ... and the release tarballs are also in the pristine-tar branch of
    e2fsprogs. So even if the kernel.org (preferred) and sourceforge.net
    (legacy) servers for the e2fsprogs tar files completely implode, and
    you only have access to the git repo, you can still get the original
    e2fsprogs tar files using pristine-tar.
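
    For example (the version here is hypothetical), given only a clone of
    the git repository:

    git clone https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
    cd e2fsprogs
    pristine-tar checkout e2fsprogs-1.47.0.tar.gz   # regenerates the released tarball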

    - Ted

  • From Simon McVittie@21:1/5 to Guillem Jover on Sat Apr 6 09:52:20 2024
    On Sat, 30 Mar 2024 at 14:16:21 +0100, Guillem Jover wrote:
    in my mind this incident reinforces my view that precisely storing
    more upstream stuff in git is the opposite of what we'd want, and
    makes reviewing even harder, given that in our context we are on a
    permanent fork against upstream, and if you include merge commits and similar, there's lots of places to hide stuff. In contrast storing
    only the packaging bits (debian/ dir alone) like pretty much every
    other downstream is doing with their packaging bits, makes for an
    obviously more manageable thing to review and not get drown into,
    more so if we have to consider that next time perhaps the long-game
    gets played within Debian.

    I'd like to push back against this, because I'm not convinced by this reasoning, and I'd like to provide another point of view to consider.

    I find that having the upstream source code in git (in the same form that
    we use for the .orig.tar.*, so including Autotools noise, etc. if present,
    but excluding any files that we exclude by repacking) is an extremely
    useful tool, because it lets me trace the history of all of the files
    that we are treating as source - whether hand-written or autogenerated -
    if I want to do that. If we are concerned about defending against
    actively malicious upstreams like the recent xz releases, then that's
    already a difficult task and one where it's probably unrealistic to
    expect a high success rate, but I think we are certainly not going to
    be able to achieve it if we reject tools like git that could make it
    easier.

    Am I correct to say that you are assuming here that we have a way to
    verify the upstream source code out-of-band (therefore catching the xz
    backdoor is out-of-scope here), and what you are aiming to detect here
    is malicious changes that exist inside the Debian delta, more precisely
    the dpkg-source 1.0 .diff.gz or 3.0 (quilt) .debian.tar.*? If that's your threat model, then I don't think any of the modes that dgit can cope with
    are actually noticeably more difficult than a debian/-only git repo.

    As my example of a project that applies patches, I'm going to use
    bubblewrap, which is a small project and has a long-standing patch that
    changes an error message in bubblewrap.c to point to Debian-specific documentation; this makes it convenient to tell at a glance whether bubblewrap.c is the upstream version or the Debian version.

    There are basically three dgit-compatible workflows, with some minor adjustments around handling of .gitignore files:

    - "patches applied" (git-debrebase, etc.):
    This is the workflow that proponents of dgit sometimes recommend,
    and dgit uses it as its canonicalized internal representation of
    the package.
    The git tree is the same as `dpkg-source -x`, with upstream source code
    included, debian/ also included, and any Debian delta to the upstream
    source pre-applied to those source files.
    In the case of bubblewrap, if we used this workflow, after you clone
    the project, bubblewrap.c would already have the Debian-specific error
    message.
    (dgit --split-view=never or dgit --quilt=dpm)

    - "patches unapplied" (gbp pq, quilt, etc.):
    This is the workflow that many of the big teams use (at least Perl,
    Python, GNOME and systemd), and is the one that bubblewrap really uses.
    The git tree is the same as `dpkg-source -x --skip-patches`, with
    upstream source code included, and debian/ also included.
    Any Debian delta to the upstream source is represented in debian/patches
    but is *not* pre-applied to the source files: for example, in the case
    of bubblewrap, after you clone
    https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
    it still has the upstream error message, not the Debian-specific one.
    (dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)

    - debian/ only:
    This is what you're advocating above.
    The git tree contains only debian/. If there is Debian delta to the
    upstream source, it is in debian/patches/ as usual.
    (dgit --quilt=baredebian* family)

    In the "patches applied" workflow, the Debian delta is something like
    `git diff upstream/VERSION..debian/latest`, where upstream/VERSION must
    match the .orig.tar.* and debian/latest is the packaging you are reviewing.
    Not every tree is a valid one, because if you are using 3.0 (quilt),
    then there is redundancy between the upstream source code and what's in debian/patches: it is an error if the result of reverting all the patches
    does not match the upstream source in the .orig.tar.*, modulo possibly
    some accommodation for changes to **/.gitignore being accepted and ignored.
    To detect malicious Debian changes in 3.0 (quilt) format, you would want
    to either check for that error, or review both the direct diff and the
    patches.

    Checking for that error is something that can be (and is) automated:
    I don't use this workflow myself, but as far as I'm aware, dgit will
    check that invariant, and it will fail to build your source package
    if the invariant doesn't hold. dpkg-source in 3.0 (quilt) format will
    also make your source package fail to build if the desired invariant
    isn't true, except where intentionally ignored (ignoring deleted files, ignoring chmod +x, etc.).
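
    A by-hand version of the same invariant check, on an unpacked source
    package (file names hypothetical):

    tar -xf foo_1.0.orig.tar.gz -C /tmp        # pristine upstream tree
    dpkg-source -x foo_1.0-1.dsc               # unpack, applying the patch series
    cd foo-1.0
    QUILT_PATCHES=debian/patches quilt pop -a  # revert all patches
    diff -ru --exclude=debian --exclude=.pc /tmp/foo-1.0 .   # should report nothing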

    In the "patches unapplied" workflow, the Debian delta is exactly the
    contents of debian/, including debian/patches. There is redundancy between
    the upstream source code and the Debianized branch: it is an error for
    `git diff upstream/VERSION..debian/latest` to show any changes outside
    debian/, except for possibly **/.gitignore.
    To detect malicious Debian changes, you would want to check for
    that error.

    Again, checking for that error is something that can be (and is)
    automated: I use this workflow myself (e.g. in bubblewrap), so I know from experience that dgit *does* check for that error, and will fail to build
    the source package if the invariant does not hold. Again, dpkg-source
    in 3.0 (quilt) format will also make your source package fail to build
    if that error exists, except in the cases that it intentionally ignores.
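
    The same check fits in a few lines of shell (branch names as in the
    prose above; the .gitignore exception is the accommodation mentioned
    earlier):

    if git diff --name-only upstream/VERSION debian/latest \
         | grep -vE '^debian/' | grep -vE '(^|/)\.gitignore$' | grep -q .; then
        echo 'unexpected delta outside debian/' >&2; exit 1
    fi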

    In the "debian/ only" workflow, the Debian delta is exactly the contents
    of debian/. There is no redundancy, so every tree is in some sense a
    valid one (although of course sometimes patches will fail to apply, or whatever).

    In summary, my claim is that having the upstream source code in git is a
    good thing because it makes detecting malicious upstreams less difficult
    than in a debian/-only workflow, while not making detecting malicious
    packagers noticeably harder: the only extra thing that is necessary is to
    carry out a check that can be done automatically, to assert that the tree
    is internally-consistent for whichever workflow is in use.

    smcv

  • From Luca Boccassi@21:1/5 to Colin Watson on Sat Apr 6 09:52:22 2024
    On Fri, 5 Apr 2024 at 16:18, Colin Watson <cjwatson@debian.org> wrote:

    On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
    I find that having the upstream source code in git (in the same form that we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely useful tool, because it lets me trace the history of all of the files
    that we are treating as source - whether hand-written or autogenerated -
    if I want to do that. If we are concerned about defending against actively malicious upstreams like the recent xz releases, then that's already a difficult task and one where it's probably unrealistic to expect a high success rate, but I think we are certainly not going to be able to achieve it if we reject tools like git that could make it easier.

    Strongly agree. For many many things I rely heavily on having the
    upstream source code available in the same working tree when doing any
    kind of archaeology across Debian package versions, which is something I
    do a lot.

    I would hate to see an attacker who relied on an overloaded maintainer
    push us into significantly less convenient development setups, thereby increasing the likelihood of overload.

    +1

    gbp workflow is great, easy to review and very productive

  • From Stefano Rivera@21:1/5 to All on Sat Apr 6 09:52:29 2024
    Hi Thomas (2024.04.02_22:33:47_+0000)
    Anyways, on the 400+ packages that I maintain within the OpenStack
    team, I did come across some upstreams using setuptools-scm. In my
    experience, using the:

    git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
    | xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz

    workflow out of an upstream always works, including for those that are
    using setuptools-scm.

    Then you haven't come across any that are using this mechanism to
    install data yet; you're only seeing the version determination.
    You will, at some point, run into this problem. It's getting more
    popular.
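
    For the version-guessing side there is at least a documented override
    for builds where no git metadata is available, e.g. a git-archive
    export (the version number here is hypothetical; requires the
    python3-build toolchain):

    SETUPTOOLS_SCM_PRETEND_VERSION=1.2.3 python3 -m build --sdist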

    Stefano

    --
    Stefano Rivera
    http://tumbleweed.org.za/
    +1 415 683 3272

  • From Adrian Bunk@21:1/5 to Theodore Ts'o on Sat Apr 6 09:52:31 2024
    On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:
    On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
    ...
    Yes, perhaps it's time to switch to a different build system, although one of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
    case at all.

    The other problem is that many of the other build systems are much
    slower than autoconf/makefile. (Note: I don't use libtool, because
    it's so d*mn slow.) Or building the alternate system might require a
    major bootstrapping phase, or requires downloading a JVM, etc.

    The main selling point of Meson has been that it is a lot faster
    than autotools.

    Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.

    Yeah, that too. There are still people building e2fsprogs on AIX,
    Solaris, and other legacy Unix systems, and I'd hate to break them, or require a lot of pain for people who are building on MacPorts, et. al.
    ...

    Everything you mention should already be supported by Meson.

    - Ted

    cu
    Adrian

  • From Theodore Ts'o@21:1/5 to Vincent Bernat on Sat Apr 6 09:52:42 2024
    On Mon, Apr 01, 2024 at 06:36:30PM +0200, Vincent Bernat wrote:

    I think that if Debian was using git instead of the generated tarball,
    this part of the backdoor would have just been included in the git
    repository as well. If we were able to magically switch everything to
    git (and we won't, we are not even able to agree on simpler stuff),
    I don't think it would have prevented the attack.

    I'm not sure how much it would have helped, but the theory behind
    eliminating the gap between the release tarball and the git tree is
    that in 2024, more developers are likely to be building and testing
    against the git tree, and so it might have been noticed sooner. After
    all, Jia Tan decided it was worthwhile to check in 99% of the exploit
    in git, but to only enable it when it was built from the release
    tarball. If the exploit was always active when built from the git
    tree, perhaps someone might have noticed before Debian uploaded the
    trojan'ed binary package to unstable, and before it was promoted to
    testing a week or so later.

    I'm not sure how likely that would be for the specific case of
    xz-utils, since it appears the number of developers (not just
    Maintainers) was extremely small, but presumably Jia Tan decided to do
    things in that way in the hopes of making it less likely that the
    malware would be noticed.

    - Ted

  • From Jeremy Stanley@21:1/5 to Thomas Goirand on Sat Apr 6 09:52:43 2024
    On 2024-04-03 00:33:47 +0200 (+0200), Thomas Goirand wrote:
    [...]
    Also, sdists are *not* "upstream-created source tarballs". I
    consider them the binary form built for PyPI. Just like we have
    .debs, PyPI has tarballs and wheels, rather than how you describe them.
    [...]

    Upstream in OpenStack we believe we are distributing source tarballs
    in sdist format. We produce and sign them, and serve them from
    multiple locations. When you rebuild from a Git tag of an OpenStack
    repository using a standard Python packaging ecosystem toolchain,
    SetupTools is generating an ephemeral sdist on the fly in order to
    set the metadata PBR and other components need.

    I think it's fine that you'd rather rebuild the source distributions
    from revision control than use the ones published by the OpenStack
    community (we sign our tags with the same OpenPGP key as our
    tarballs anyway), but it's merely your opinion that sdists are *not* "upstream-created source tarballs" (an opinion *not* shared by
    everyone).
    --
    Jeremy Stanley

  • From Sean Whitton@21:1/5 to Simon McVittie on Sat Apr 6 13:30:01 2024
    Hello,

    On Fri 05 Apr 2024 at 03:19pm +01, Simon McVittie wrote:

    There are basically three dgit-compatible workflows, with some minor adjustments around handling of .gitignore files:

    - "patches applied" (git-debrebase, etc.):
    This is the workflow that proponents of dgit sometimes recommend,
    and dgit uses it as its canonicalized internal representation of
    the package.
    The git tree is the same as `dpkg-source -x`, with upstream source code
    included, debian/ also included, and any Debian delta to the upstream
    source pre-applied to those source files.
    In the case of bubblewrap, if we used this workflow, after you clone
    the project, bubblewrap.c would already have the Debian-specific error
    message.
    (dgit --split-view=never or dgit --quilt=dpm)

    - "patches unapplied" (gbp pq, quilt, etc.):
    This is the workflow that many of the big teams use (at least Perl,
    Python, GNOME and systemd), and is the one that bubblewrap really uses.
    The git tree is the same as `dpkg-source -x --skip-patches`, with
    upstream source code included, and debian/ also included.
    Any Debian delta to the upstream source is represented in debian/patches
    but is *not* pre-applied to the source files: for example, in the case
    of bubblewrap, after you clone
    https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
    it still has the upstream error message, not the Debian-specific one.
    (dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)

    - debian/ only:
    This is what you're advocating above.
    The git tree contains only debian/. If there is Debian delta to the
    upstream source, it is in debian/patches/ as usual.
    (dgit --quilt=baredebian* family)

    People interested in these differences may also want to look at:
    <https://wiki.debian.org/GitPackagingSurvey>.


    Again, checking for that error is something that can be (and is)
    automated: I use this workflow myself (e.g. in bubblewrap), so I know from experience that dgit *does* check for that error, and will fail to build
    the source package if the invariant does not hold. Again, dpkg-source
    in 3.0 (quilt) format will also make your source package fail to build
    if that error exists, except in the cases that it intentionally ignores.

    Right, both dgit and tag2upload's client-side wrapper around
    git-tag(1), git-debpush(1), do this check.

    --
    Sean Whitton
