This is a vector I've been somewhat paranoid about myself, and I
typically check the difference between «git archive $TAG» and the downloaded tar whenever I package things. Obviously a backdoor could have been inserted into the git repository directly, but there is a culture
surrounding good hygiene in commits: they ought to be small, focused,
and well described.
People are comfortable discussing and challenging
a commit that looks fishy, even if that commit is by the main developer
of a package. I have been assuming tooling existed in package
maintainers' toolkits to verify the faithful reproduction of the
published git tag in the downloaded source tarball, beyond a signature
check by the upstream developer. Apparently, this is not universal.
Had tooling existed in Debian to automatically validate this faithful reproduction, we might not have been exposed to this issue.
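A minimal sketch of what such a check could look like (illustrative only, not existing Debian tooling; the tag and tarball names are placeholders):

    #!/bin/sh
    # Compare the tree behind an upstream git tag with the distributed tarball.
    set -e
    TAG=v5.6.1                                     # hypothetical upstream tag
    mkdir -p /tmp/from-git /tmp/from-tar
    git archive "$TAG" | tar -x -C /tmp/from-git
    tar -xf ../xz-5.6.1.tar.xz --strip-components=1 -C /tmp/from-tar
    # Every difference is either a known build artifact or a red flag:
    diff -ru /tmp/from-git /tmp/from-tar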
Having done this myself, it has been my experience that many partial
build artifacts are captured in source tarballs that are not otherwise maintained in the git repository. For instance, in zfs (which I have contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)
We already have a policy of not shipping upstream-built artifacts, so
I am making a proposal that I believe simply takes that one step further:
1. Move towards allowing, and then favoring, git-tags over source tarballs
2. Require upstream-built artifacts be removed (instead, generate these
ab-initio during build)
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
4. Look unfavorably on upstreams without RCS.
In the present case, the triggering modification was in a modified .m4 file that injected a snippet into the configure script. That modification
could have been flagged using this kind of process.
While this would be a lot of work, I believe it would force a much
larger amount of additional complexity onto anyone orchestrating attacks
against Debian in the future.
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
On 2024-03-29 22:41, Guillem Jover wrote:
(For dpkg at least I'm pondering whether to play with switching to
doing something equivalent to «git archive» though, but see above, or
maybe generate two tarballs, a plain «git archive» and a portable one.)
The sad irony here is that the xz maintainer tried to do exactly what we advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail. This is *exactly* why maintainers often don't
want to do that, and thus force people to fork packages rather than join
in maintaining the existing package.
Why not pass on maintainership for XZ for C so you can give XZ for
Java more attention? Or pass on XZ for Java to someone else to focus
on XZ for C? Trying to maintain both means that neither are
maintained well.
Yes. In that specific case, the original xz maintainer (Lasse Collin)
was socially pressured by a likely fake persona (Jigar Kumar) into doing the
"right thing" and handing over maintenance. https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html
In his reply to that mail Lasse writes in https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:
It's also good to keep in mind that this is an unpaid hobby project.
This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is widely used, insert some malicious code and have fun. Then also imagine stuff that automates builds in other ways, like docker containers, Ruby, Rust, pip, that pull stuff from the network and install it without further checks.
I hope (and am confident) that Debian as a project will react accordingly to prevent this happening again.
How?
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
Antonio Russo <aerusso@aerusso.net> writes:
1. Move towards allowing, and then favoring, git-tags over source tarballs
Some people have suggested this before -- and I have considered adopting
that approach myself, but one thing that is often overlooked is that
building from git usually increases the Build-Depends quite a lot
compared to building from tarball, and that will more likely trigger
cyclic dependencies. People who do bootstrapping for new platforms, or cross-building, dislike such added dependencies.
Just as an example, bootstrapping coreutils currently requires
bootstrapping at least 68 other packages, including libx11-6 [1]. If coreutils supported <nodoc> [2], the transitive closure of its
Build-Depends would be reduced to 20 packages, most of which are in build-essential.
[1] https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
[2] https://bugs.debian.org/1057136
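As an illustration, a nodoc build would be requested roughly like this (assuming the package honours the profile, which per [2] coreutils currently does not):

    # Sketch: request a documentation-less build via build profiles.
    DEB_BUILD_PROFILES=nodoc DEB_BUILD_OPTIONS=nodoc dpkg-buildpackage -us -uc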
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully
automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
Antonio Russo <antonio.e.russo@gmail.com> writes:
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.
This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream writes normally is. (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.) I think it's clearly at least *sometimes* correct, though, so I'm sympathetic, particularly given that it's already Debian practice to regenerate the build system files anyway.
In other words, we should make sure that breaking the specific tactics
*this* attacker used truly makes the attacker's life harder, as opposed to making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics. I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm not sure this is obviously true.
Relying on signed git tags is not reliable because git is primarily SHA1-based, for which a collision attack cost $45K in 2019.
On 30/03/24 01:21, Antonio Russo wrote:
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
git-buildpackage and pristine-tar can be used for that.
On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci <gioele@svario.it> wrote:
On 30/03/24 01:21, Antonio Russo wrote:
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
git-buildpackage and pristine-tar can be used for that.
Would be nice if pristine-tar's data file would be reproducible, too...
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
It is also already fully implemented as tag2upload, and is merely as yet undeployed, for social reasons.
Hello,
On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:
Relying on signed git tags is not reliable because git is primarily
SHA1-based, for which a collision attack cost $45K in 2019.
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
On 2024-03-29 22:41, Guillem Jover wrote:
On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
Had tooling existed in Debian to automatically validate this faithful
reproduction, we might not have been exposed to this issue.
Given that the autogenerated stuff is not present in the git tree,
a diff between tarball and git would always generate tons of delta,
so this would not have helped.
I may not have been clear, but I'm suggesting scrubbing all the
autogenerated stuff, and comparing that against similarly scrubbed
git tag contents. (But you explain that this is problematic.)
Having done this myself, it has been my experience that many partial
build artifacts are captured in source tarballs that are not otherwise
maintained in the git repository. For instance, in zfs (which I have
contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)
(Hopefully the above clears up that I at least have some superficial awareness of the build artifacts showing up in the release tarball!)
1. Move towards allowing, and then favoring, git-tags over source tarballs
I assume you mean git archives out of git tags? Otherwise how do you
go from git-tag to a source package in your mind?
I'm not wed to any specific mechanism, but I'd be content with that. I'd
be most happy with DD-signed tags that were certified DFSG-compliant, policy compliant (i.e., lacking build artifacts), and equivalent to the scrubbed upstream source (and more on that later, building on what you say).
Many repositories today already do things close to this with pristine-tar,
so this seems to me a direction where the tooling already exists.
I'll add that, if we drop the desire for a signed archive, and instead require a signed git-tag (from which we can generate a source tar on
demand, as you suggest), we can drop the pristine-tar requirement. If we
are less progressive, but move to working exclusively with Debian-regenerated
.tar files, we can probably avoid many of the frustrating edge cases that pristine-tar still struggles with.
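For reference, the pristine-tar round trip mentioned here looks roughly like this (package name, version, and tag are placeholders):

    # Record enough binary delta in git to re-create the exact tarball:
    pristine-tar commit ../foo_1.0.orig.tar.gz upstream/1.0
    # Later, reproduce the byte-identical tarball from git alone:
    pristine-tar checkout ../foo_1.0.orig.tar.gz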
2. Require upstream-built artifacts be removed (instead, generate these
ab-initio during build)
The problem here is that the .m4 file to hook into the build system was named like one shipped by gnulib (so less suspicious), but xz-utils does not use gnulib, and thus the autotools machinery does not know anything about it, so even the «autoreconf -f -i» done by debhelper via dh-autoreconf would not regenerate it.
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
One automatic approach would be to run dh-autoreconf and identify the
changed files, remove those files from both the distributed tarball and
the git tag, and check whether what remains differs. (You also suggest something very similar to this, and repacking the archive with those Debian-generated build artifacts.)
I may be missing something here, though!
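Concretely, the check might look something like this rough sketch (directory names are illustrative, and it ignores files that autoreconf creates anew rather than rewrites):

    # Regenerate the autotools output and treat every rewritten file as a
    # build artifact.
    cp -a upstream upstream.regen
    (cd upstream.regen && autoreconf -f -i)
    diff -rq upstream upstream.regen | awk '$1 == "Files" {print $2}' \
        | sed 's|^upstream/||' > artifacts.list
    # Scrub those paths from both the tarball tree and the git-tag tree;
    # any remaining difference is then suspicious:
    while read -r f; do
        rm -f "tarball-tree/$f" "git-tree/$f"
    done < artifacts.list
    diff -ru tarball-tree git-tree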
Removing these might be cumbersome after the fact if upstream includes
for example their own maintained .m4 files. See dpkg's m4 dir for an example of this (although there it's easy as all are namespaced but…).
I am not an m4 expert (in fact, I have specifically tried to avoid
learning anything more about auto(make/reconf) than absolutely necessary).
My point is just: either those files are needed, or not. If they're
needed, they need to not differ. And if they're not, they should
be scrubbed.
I think you are saying that doing this automatically is going to be hard/impossible. Is that fair?
Not using an upstream provided tarball might also mean we stop being
able to use upstream signatures, which seems worse. The alternative
might be promoting that upstreams just do the equivalent of
«git archive», but that might defeat the portability and dependency reduction properties that were designed into the autotools build
system, or increase the bootstrap set (see for example the pkg.dpkg.author-release build profile used by dpkg).
Ok, so am I understanding you correctly in that you are saying: we do actually want *some* build artifacts in the source archives?
If that's the case, could we make those files at packaging time, analogous
to the DFSG-exclude stripping process?
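Generating them at packaging time could be as simple as this sketch (names are placeholders, and it assumes an upstream ./autogen.sh):

    # Start from the verified git tag, regenerate the build system, repack:
    git archive --prefix=foo-1.0/ v1.0 | tar -x
    (cd foo-1.0 && ./autogen.sh)
    tar -cJf foo_1.0.orig.tar.xz foo-1.0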
In the present case, the triggering modification was in a modified .m4 file
that injected a snippet into the configure script. That modification
could have been flagged using this kind of process.
I don't think this modification would have been spotted, because it
was not modifying a file that would usually get autogenerated by its
build system.
If we look at what ./autogen.sh would have changed, and scrub those
files from the release archive, wouldn't that mean that the malicious
m4 file would have been spotted, since it would NOT have been autogenerated?
While this would be a lot of work, I believe doing so would require a
much larger amount of additional complexity in orchestrating attacks
against Debian in the future.
It would certainly make it a bit harder, but I'm afraid that if you
cannot trust upstream and they are playing a long game, then IMO they
can still sneak nasty stuff in even in plain sight with just code commits, unless you are paying extremely close attention. :/
See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.
I take a look at these every year or so to keep me terrified of C!
If it's a single upstream developer, I absolutely agree, but if there's an upstream community reviewing the git commits, I really do believe there is hope (of them!) identifying bad(tm) things.
I just want to make sure that what we actually pull in is what the
community is actually reviewing. I feel like anything less gets dangerous. (Given few enough eyeballs, all bugs are deep!)
But, I will definitely concede that, had I seen a commit that changed
that line in the m4, there's a good chance my eyes would have glazed over
it.
On 2024/03/30 12:43, Sean Whitton wrote:
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
It is also already fully implemented as tag2upload, and is merely as yet
undeployed, for social reasons.
My understanding is that DSA aren't quite comfortable with it, since it
would need an archive GPG signing key (or a keypair trusted by DAK)?
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix
variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
When using Meson/CMake/home-grown makefiles there's no meaningful
difference on average, although I'm sure there are corner cases and exceptions here and there.
This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is
widely used, insert some malicious code and have fun. Then also imagine
stuff that automates builds in other ways, like docker containers, Ruby,
Rust, pip, that pull stuff from the network and install it without
further checks.
I hope (and am confident) that Debian as a project will react
accordingly to prevent this happening again.
Sean Whitton <spwhitton@spwhitton.name> writes:
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
Can you share that analysis? As far as I understand, it is possible for
a malicious actor to create a git repository with the same commit id as
HEAD, with different historic commits and tree content. I thought a
signed tag is merely a signed reference to a particular commit id. If
that commit id is a SHA1 reference, that opens up for ambiguity given
recent (well, 2019) results on SHA1. Of course, I may be wrong in any
of the chain, so would appreciate explanation of how this doesn't work.
On 2024/03/30 11:05, Simon Josefsson wrote:
1. Move towards allowing, and then favoring, git-tags over source tarballs
Some people have suggested this before -- and I have considered adopting
that approach myself, but one thing that is often overlooked is that
building from git usually increases the Build-Depends quite a lot
compared to building from tarball
How in the world do you jump to that conclusion?
On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
[...] if the Git repository is somewhere other than GitHub, the
malicious possibilities are even broader.
I would not be so quick to make the same leap of faith. GitHub is
not itself open source, nor is it transparently operated. It's a
proprietary commercial service, with all the trust challenges that represents. Long, long before XZ was a twinkle in anyone's eye,
malicious actors were already regularly getting their agents hired
onto development teams to compromise commercial software. Just look
at the Juniper VPN backdoor debacle for a fairly well-documented
example (but there's strong evidence this practice dates back well
before free/libre open source software even, at least to the 1970s).
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
Yes, perhaps it's time to switch to a different build system, although one
of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
[1] https://mesonbuild.com/Porting-from-autotools.html
On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Do you mean this theoretical workflow will not have a step of the
maintainer actually looking at the package and running it locally, or
running any building or linting locally before pushing the changes?
Then yeah, looking at some questions in the past years I understand that
some people are already doing that, powered by Salsa CI (I can think of several possible reasons for that workflow, but it still frustrates me).
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
AFAIK the currently suggested way of enabling that is putting "recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration file" in the salsa settings (no idea where the page is that tells you that, or how to find it even knowing it exists).
Simon Josefsson <simon@josefsson.org> writes:
Sean Whitton <spwhitton@spwhitton.name> writes:
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
Can you share that analysis? As far as I understand, it is possible for
a malicious actor to create a git repository with the same commit id as
HEAD, with different historic commits and tree content. I thought a
signed tag is merely a signed reference to a particular commit id. If
that commit id is a SHA1 reference, that opens up for ambiguity given
recent (well, 2019) results on SHA1. Of course, I may be wrong in any
of the chain, so would appreciate explanation of how this doesn't work.
I believe you're talking about two different things. I think Sean is
talking about preimage resistance, which assumes that the known-good repository is trusted, and I believe Simon is talking about manufactured collisions where the attacker controls both the good and the bad
repository.
The dgit and tag2upload design probably (I'd have to think about it some more, ideally while bouncing the problem off of someone else, because I've recycled those brain cells for other things) only needs preimage
resistance, but the general case of a malicious upstream may be vulnerable
to manufactured collisions.
Russ Allbery <rra@debian.org> writes:
I believe you're talking about two different things. I think Sean is
talking about preimage resistance, which assumes that the known-good
repository is trusted, and I believe Simon is talking about
manufactured collisions where the attacker controls both the good and
the bad repository.
Right. I think the latter describes the xz scenario: someone could have pushed a maliciously crafted commit with a SHA1 collision commit id, so
there are two different git repositories with that commit id, and a
signed git tag on that commit id authenticates both trees, opening up
for uncertainty about what was intended to be used. Unless I'm missing
some detail of how git signed tag verification works that would catch
this.
The dgit and tag2upload design probably (I'd have to think about it
some more, ideally while bouncing the problem off of someone else,
because I've recycled those brain cells for other things) only needs
preimage resistance, but the general case of a malicious upstream may
be vulnerable to manufactured collisions.
It is not completely clear to me: how about if some malicious person
pushed a commit to salsa, and asked a DD to "please review this repository
and sign a tag to make the upload"? The DD would presumably sign a
commit id that authenticates two different git trees, one with the
exploit and one without it.
Relying on signed git tags is not reliable because git is primarily SHA1-based, for which a collision attack cost $45K in 2019.
FWIW, Gitlab is working on support for SHA 256 hashing [1], and as [...]
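(For completeness: git itself can already create SHA-256 repositories, though forge support was still limited at the time of this thread.)

    # Supported since git 2.29 (long marked experimental):
    git init --object-format=sha256 repo-sha256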
On 2024-03-30 11:47:56, Luca Boccassi wrote:
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
I will be the first to admit I'm not up to date on latest Salsa news,
but see, what you mention - "include the yml" - is exactly what I don't want.
Antonio Russo <antonio.e.russo@gmail.com> writes:
But, I will definitely concede that, had I seen a commit that changed
that line in the m4, there's a good chance my eyes would have glazed
over it.
This is why I am somewhat skeptical that forcing everything into Git
commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git
tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be
force-pushed, and if the Git repository is somewhere other than GitHub,
the malicious possibilities are even broader.
We could narrow those possibilities somewhat by maintaining
Debian-controlled mirrors of upstream Git repositories so that we could detect rewritten history. (There are a whole lot of reasons why I think
dgit is a superior model for archive management. One of them is that it captures the full Git history of upstream at the point of the upload on Debian-controlled infrastructure if the maintainer of the package bases it
on upstream's Git tree.)
Gioele Barabucci <gioele@svario.it> writes:
Just as an example, bootstrapping coreutils currently requires
bootstrapping at least 68 other packages, including libx11-6 [1]. If
coreutils supported <nodoc> [2], the transitive closure of its
Build-Depends would be reduced to 20 packages, most of which in
build-essential.
[1]
https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
[2] https://bugs.debian.org/1057136
Coreutils in Debian uses upstream tarballs and does not do a full
bootstrap build. It does autoreconf instead of ./bootstrap. So the dependencies above are not the entire bootstrapping story for building
coreutils from git compared to building from tarballs.
On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
This is why I am somewhat skeptical that forcing everything into Git commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be force-pushed, and if the Git repository is somewhere other than GitHub,
the malicious possibilities are even broader.
I wonder if Software Heritage could help with that part?
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
with upstream about what constitutes the release artifact.
On 30/03/24 20:43, Iustin Pop wrote:
On 2024-03-30 11:47:56, Luca Boccassi wrote:
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
Give me good Salsa support for autopkgtest + lintian + piuparts, and easy support (so that I just have to toggle one checkbox), and I'm happy. Or even better, integrate all that testing with Salsa (I don't know if it has "CI tests must pass before merging"), and block tagging on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
I will be the first to admit I'm not up to date on latest Salsa news,
but see, what you mention - "include the yml" - is exactly what I don't want.
Salsa CI is enabled by default for all projects in the debian/ namespace <https://salsa.debian.org/debian/>.
Adding a yml file or changing the CI settings to reference the Salsa CI pipeline is needed only for projects in team- or maintainer-specific repositories, or when the dev wants to enable additional tests (or configure/block the default tests).
On Sat, 30 Mar 2024 at 15:44, Russ Allbery <rra@debian.org> wrote:
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning
autoconfery on twelve thousand different proprietary and
non-proprietary Unix variants, back in the day. In 2024, we do
dh_autoreconf by default so it's all moot anyway.
This is true from Debian's perspective. This is much less obviously
true from upstream's perspective, and there are some advantages to
aligning with upstream about what constitutes the release artifact.
My point is that, while there will be for sure exceptions here and
there, by and large the need for massaged tarballs comes from projects
using autoconf and wanting to ship source archives that do not require
running the autoconf machinery.
However, we as in Debian do not have this problem. We can and do re-run
the autoconf machinery on every build. And at least on the main forges,
the autogenerated (and thus out of reach of this kind of attack)
tarball is always present too - the massaged tarball is an _addition_,
not a _substitution_. Hence: we should really, really think about forcing
all packages, by policy, to use the autogenerated tarball by default
instead of the autoconf one, when both are present, unless extenuating circumstances (which have to be documented) apply.
Most of this is pregenerated documentation (primarily man pages generated from POD), but it also includes generated test data and other things. The reason is similar: regenerating those files requires tools that may not be present on an older system (like a mess of random Perl modules) or, in the case of the man pages, may be old and thus produce significantly inferior output.
But we do not use older systems to build our packages, so this does not matter.
Just to note, though, this means that we lose the upstream signature in the archive. The only place the upstream signature would then live is in Salsa.
Totally worth it!
My point is that, while there will be for sure exceptions here and
there, by and large the need for massaged tarballs comes from projects using autoconf and wanting to ship source archives that do not require
running the autoconf machinery.
Just as a data point, literally every C project for which I am upstream ships additional files in the release tarballs that are not in Git for reasons unrelated to Autoconf and friends.
Just to note, though, this means that we lose the upstream signature in
the archive. The only place the upstream signature would then live is in Salsa.
I switched long ago all my packages from tar archives to the git
upstream tree. Not only does this make it much easier to understand the changes
in a new release,
That's not mutually exclusive. When adding an additional git remote
and using gbp-import-orig's --upstream-vcs-tag you get the best of
both worlds.
No: I get nothing of value by doing that, and the repository will be [...]
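The workflow gregor describes is roughly the following (remote URL, tag, and tarball name are placeholders):

    # Track upstream's git alongside the packaging branches:
    git remote add upstreamvcs https://example.org/foo.git
    git fetch upstreamvcs 'refs/tags/*:refs/tags/*'
    # Import the tarball while tying it to the matching upstream tag:
    gbp import-orig --upstream-vcs-tag=v1.2.3 ../foo_1.2.3.orig.tar.xz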
At 2024-03-31T22:32:49+0000, Stefano Rivera wrote:
Upstreams would probably prefer that we used git repositories
*directly* as source artifacts, but that comes with a whole other can
of worms...
Speaking from my upstream groff perspective, I wouldn't _prefer_ that.
The distribution archives get build-testing on a much wider variety of systems, thanks to people on the groff@ and platform-testers@gnu mailing lists who help out when a release candidate is announced. They have
access to platforms more exotic than I and a few other bleeding-edge
HEAD mavens do. This practice tangibly improved the quality of the
groff 1.23.0 release, especially on surviving proprietary Unix systems.
Building from the repo, or using the bootstrap script--which Colin
Watson just today ensured will be in future distribution archives--is fine.[1] I'm glad some people build the project that way. But I think
that procedure serves an audience that is distinguishable in some ways.
Regards,
Branden
[1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=822fef56e9ab7cbe69337b045f6f20e32e25f566
I find that having the upstream source code in git (in the same form that
we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely
useful tool, because it lets me trace the history of all of the files
that we are treating as source - whether hand-written or autogenerated -
if I want to do that. If we are concerned about defending against actively malicious upstreams like the recent xz releases, then that's already a difficult task and one where it's probably unrealistic to expect a high success rate, but I think we are certainly not going to be able to achieve
it if we reject tools like git that could make it easier.
I agree: it would be unthinkable for me to not have the complete history immediately available while I am working on a package.
In the "debian/ only" workflow, the Debian delta is exactly the contents
of debian/. There is no redundancy, so every tree is in some sense a
valid one (although of course sometimes patches will fail to apply, or whatever).
I think a shallow clone of depth 1 is sufficient, although that's not sufficient to get the correct version number from Git in all cases.[...]
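For example (URL and tag are illustrative):

    # Shallow clone of a single tagged release:
    git clone --depth=1 --branch v1.2.3 https://example.org/foo.git
    # The caveat above: `git describe` may fail or mislead in such a clone,
    # because the history needed to compute a version number is absent.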
On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:
Yeah, that too. There are still people building e2fsprogs on AIX,
Solaris, and other legacy Unix systems, and I'd hate to break them, or
require a lot of pain for people who are building on MacPorts, et. al.
...
Everything you mention should already be supported by Meson.
On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
That's not mutually exclusive. When adding an additional git remote
and using gbp-import-orig's --upstream-vcs-tag you get the best of
both worlds.
And this will error out if there are unexpected changes in the tarball?
How will it be able to detect those?
On 2024-04-01 12:44, Bastian Blank wrote:
So in the end you still need to manually review all the stuff that the tarball contains beyond what is in git. And for that I don't see that it actually lends a helping hand or makes anything easier.
So I really don't see how this makes the problem at hand any better.
Again the workload of review falls on the person doing the job, i.e. we do fragile manual work instead of possibly-failing automatic work.
I think that if Debian was using git instead of the generated tarball, this part of the backdoor would have just been included in the git repository as well. If we were able to magically switch everything to git (and we won't,
we are not even able to agree on simpler stuff), I don't think it would have prevented the attack.
Colin Watson <cjwatson@debian.org> writes:
On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
Running ./bootstrap in a tarball may lead to different results than the
maintainer running ./bootstrap in pristine git. It is the same problem
as running 'autoreconf -fvi' in a tarball not necessarily leading to
the same result as the maintainer running 'autoreconf -fvi' from
pristine git. The difference is what is pulled in from the system
environment. Neither tool was designed to be run from within a tarball,
so this is just bad practice that never worked reliably, and without a
lot of complexity it will likely not become reliable either.
The practice of running "autoreconf -fi" or similar via dh-autoreconf
has worked extremely well at scale in Debian. I'm sure there are
complex edge cases where it's caused problems, but it's far from being a disaster area.
Agreed. I'm saying it doesn't fix the problem that some people appear to believe it does, i.e., that running 'autoreconf -fi' solves the re-bootstrapping problem.
I have suggested before that upstreams (myself included) should publish
PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
checkout including submodules,
A while back I contributed support to Gnulib's bootstrap script to allow pinning particular commits without using submodules. I would recommend this mode; submodules have very strange UI.
I never liked git submodules generally, so I would be happy to work on getting that to be supported -- do you have pointers to earlier work
here?
What is necessary, I think, is having something like this in
bootstrap.conf:
gnulib_commit_id = 123abc567...
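and a bootstrap wrapper could then consume that setting along these lines (entirely hypothetical, except that --gnulib-srcdir is an existing bootstrap option):

    # Fetch gnulib once, pin it to the recorded commit, then bootstrap:
    git clone https://git.savannah.gnu.org/git/gnulib.git "$GNULIB_SRCDIR"
    git -C "$GNULIB_SRCDIR" checkout "$gnulib_commit_id"
    ./bootstrap --gnulib-srcdir="$GNULIB_SRCDIR"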
As I noted in a comment on your blog, I think there is a case to be made for .po files being committed to upstream git, and I'm not fond of the practice of pulling them in only at bootstrap time (although I can understand why that's come to be popular as a result of limited
maintainer time). I have several reasons to believe this:
Those are all good arguments, but it still feels backwards to put these
files into git. It felt so good to externalize all the translation
churn outside of my git (or then, CVS...) repositories many years ago.
I would prefer to maintain a po/SHA256SUMS in git and continue to
download translations but have some mechanism to refuse to continue if
the hashes differ.
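That mechanism could be as small as this sketch (file layout assumed; not an existing tool):

    # The maintainer records hashes of the downloaded translations in git:
    sha256sum po/*.po > po/SHA256SUMS
    # At bootstrap time, after re-downloading, refuse to continue on mismatch:
    sha256sum -c po/SHA256SUMS || exit 1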
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
with upstream about what constitutes the release artifact.
Yes, perhaps it's time to switch to a different build system, although one
of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.
in my mind this incident reinforces my view that precisely storing
more upstream stuff in git is the opposite of what we'd want, and
makes reviewing even harder, given that in our context we are on a
permanent fork against upstream, and if you include merge commits and
similar, there's lots of places to hide stuff. In contrast, storing
only the packaging bits (the debian/ dir alone), like pretty much every
other downstream does with their packaging bits, makes for an
obviously more manageable thing to review and not get drowned in,
more so if we have to consider that next time perhaps the long game
gets played within Debian.
On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
I find that having the upstream source code in git (in the same form that we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely useful tool, because it lets me trace the history of all of the files
that we are treating as source - whether hand-written or autogenerated -
if I want to do that. If we are concerned about defending against actively malicious upstreams like the recent xz releases, then that's already a difficult task and one where it's probably unrealistic to expect a high success rate, but I think we are certainly not going to be able to achieve it if we reject tools like git that could make it easier.
Strongly agree. For many many things I rely heavily on having the
upstream source code available in the same working tree when doing any
kind of archaeology across Debian package versions, which is something I
do a lot.
I would hate to see an attacker who relied on an overloaded maintainer
push us into significantly less convenient development setups, thereby increasing the likelihood of overload.
Anyway, on the 400+ packages that I maintain within the OpenStack team, I did come across some upstreams using setuptools-scm. In my experience, using the:
git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
| xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
workflow out of an upstream always works, including for those that are using setuptools-scm.
On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
...
Yes, perhaps it's time to switch to a different build system, although one of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
The other problem is that many of the other build systems are much
slower than autoconf/makefile. (Note: I don't use libtool, because
it's so d*mn slow.) Or building with the alternate system might require a
major bootstrapping phase, or require downloading a JVM, etc.
Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.
Yeah, that too. There are still people building e2fsprogs on AIX,
Solaris, and other legacy Unix systems, and I'd hate to break them, or require a lot of pain for people who are building on MacPorts, et. al.
...
- Ted
Also, sdists are *not* "upstream-created source tarballs". I [...]
consider the binary form built for PyPi. Just like we have .debs,
PyPi has tarballs and wheels, rather than how you describe them.
There are basically three dgit-compatible workflows, with some minor adjustments around handling of .gitignore files:
- "patches applied" (git-debrebase, etc.):
This is the workflow that proponents of dgit sometimes recommend,
and dgit uses it as its canonicalized internal representation of
the package.
The git tree is the same as `dpkg-source -x`, with upstream source code
included, debian/ also included, and any Debian delta to the upstream
source pre-applied to those source files.
In the case of bubblewrap, if we used this workflow, after you clone
the project, bubblewrap.c would already have the Debian-specific error
message.
(dgit --split-view=never or dgit --quilt=dpm)
- "patches unapplied" (gbp pq, quilt, etc.):
This is the workflow that many of the big teams use (at least Perl,
Python, GNOME and systemd), and is the one that bubblewrap really uses.
The git tree is the same as `dpkg-source -x --skip-patches`, with
upstream source code included, and debian/ also included.
Any Debian delta to the upstream source is represented in debian/patches
but is *not* pre-applied to the source files: for example, in the case
of bubblewrap, after you clone
https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
it still has the upstream error message, not the Debian-specific one.
(dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)
- debian/ only:
This is what you're advocating above.
The git tree contains only debian/. If there is Debian delta to the
upstream source, it is in debian/patches/ as usual.
(dgit --quilt=baredebian* family)
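For concreteness, the corresponding invocations would be roughly (illustrative; the real workflows are documented in the dgit-maint-*(7) manual pages):

    dgit --quilt=dpm push-source         # patches applied
    dgit --quilt=unapplied push-source   # patches unapplied
    dgit --quilt=baredebian push-source  # debian/ only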
Again, checking for that error is something that can be (and is)
automated: I use this workflow myself (e.g. in bubblewrap), so I know from experience that dgit *does* check for that error, and will fail to build
the source package if the invariant does not hold. Again, dpkg-source
in 3.0 (quilt) format will also make your source package fail to build
if that error exists, except in the cases that it intentionally ignores.