This is a vector I've been somewhat paranoid about myself, and I
typically check the difference between «git archive $TAG» and the downloaded tar whenever I package things. Obviously a backdoor could have been inserted into the git repository directly, but there is a culture
surrounding good hygiene in commits: they ought to be small, focused,
and well described.
People are comfortable discussing and challenging
a commit that looks fishy, even if that commit is by the main developer
of a package. I have been assuming tooling existed in package
maintainers' toolkits to verify the faithful reproduction of the
published git tag in the downloaded source tarball, beyond a signature
check by the upstream developer. Apparently, this is not universal.
Had tooling existed in Debian to automatically validate this faithful reproduction, we might not have been exposed to this issue.
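A minimal sketch of what such a check could look like (illustrative only, not existing Debian tooling; the tag and tarball names are placeholders):

    #!/bin/sh
    # Compare the tree behind an upstream git tag with the distributed tarball.
    set -e
    TAG=v5.6.1                                     # hypothetical upstream tag
    mkdir -p /tmp/from-git /tmp/from-tar
    git archive "$TAG" | tar -x -C /tmp/from-git
    tar -xf ../xz-5.6.1.tar.xz --strip-components=1 -C /tmp/from-tar
    # Every difference is either a known build artifact or a red flag:
    diff -ru /tmp/from-git /tmp/from-tar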
Having done this myself, it has been my experience that many partial
build artifacts are captured in source tarballs that are not otherwise maintained in the git repository. For instance, in zfs (which I have contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)
We already have a policy of not shipping upstream-built artifacts, so
I am making a proposal that I believe simply takes that one step further:
1. Move towards allowing, and then favoring, git-tags over source tarballs
2. Require upstream-built artifacts be removed (instead, generate these
ab-initio during build)
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
4. Look unfavorably on upstreams without RCS.
In the present case, the triggering modification was in a modified .m4 file that injected a snippet into the configure script. That modification
could have been flagged using this kind of process.
While this would be a lot of work, I believe it would force a much
larger amount of additional complexity onto anyone orchestrating attacks
against Debian in the future.
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
On 2024-03-29 22:41, Guillem Jover wrote:
(For dpkg at least I'm pondering whether to play with switching to
doing something equivalent to «git archive» though, but see above, or
maybe generate two tarballs, a plain «git archive» and a portable one.)
The sad irony here is that the xz maintainer tried to do exactly what we advise people in this situation to do: try to add a comaintainer to share
the work, and don't block work because you don't have time to personally
vet everything in detail. This is *exactly* why maintainers often don't
want to do that, and thus force people to fork packages rather than join
in maintaining the existing package.
Why not pass on maintainership for XZ for C so you can give XZ for
Java more attention? Or pass on XZ for Java to someone else to focus
on XZ for C? Trying to maintain both means that neither are
maintained well.
Yes. In that specific case, the original xz maintainer (Lasse Collin)
was socially pressured by a likely fake persona (Jigar Kumar) into doing the
"right thing" and handing over maintenance. https://www.mail-archive.com/xz-devel@tukaani.org/msg00566.html
In his reply to that mail Lasse writes in https://www.mail-archive.com/xz-devel@tukaani.org/msg00567.html:
It's also good to keep in mind that this is an unpaid hobby project.
This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is widely used, insert some malicious code and have fun. Then also imagine stuff that automates builds in other ways, like docker containers, Ruby, Rust, pip, that pull stuff from the network and install it without further checks.
I hope (and am confident) that Debian as a project will react accordingly to prevent this happening again.
How?
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
Antonio Russo <aerusso@aerusso.net> writes:
1. Move towards allowing, and then favoring, git-tags over source tarballs
Some people have suggested this before -- and I have considered adopting
that approach myself, but one thing that is often overlooked is that
building from git usually increases the Build-Depends quite a lot
compared to building from tarball, and that will more likely trigger
cyclic dependencies. People who do bootstrapping for new platforms, or cross-building, dislike such added dependencies.
Just as an example, bootstrapping coreutils currently requires
bootstrapping at least 68 other packages, including libx11-6 [1]. If coreutils supported <nodoc> [2], the transitive closure of its
Build-Depends would be reduced to 20 packages, most of which are in build-essential.
[1] https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
[2] https://bugs.debian.org/1057136
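As an illustration, a nodoc build would be requested roughly like this (assuming the package honours the profile, which per [2] coreutils currently does not):

    # Sketch: request a documentation-less build via build profiles.
    DEB_BUILD_PROFILES=nodoc DEB_BUILD_OPTIONS=nodoc dpkg-buildpackage -us -uc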
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully
automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
Antonio Russo <antonio.e.russo@gmail.com> writes:
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
I think the point that you're assuming (probably because you quite
reasonably think it's too obvious to need to be stated, but I'm not sure
it's that obvious to everyone) is that malicious code injected via a
commit is significantly easier to detect than malicious code that is only
in the release tarball.
This is not *always* correct; it really depends on how many eyes are on
the upstream repository and how complex or unreadable the code upstream writes normally is. (For example, I am far from confident that I can
eyeball the difference between valid and malicious procmail-style C code
or random M4 files.) I think it's clearly at least *sometimes* correct, though, so I'm sympathetic, particularly given that it's already Debian practice to regenerate the build system files anyway.
In other words, we should make sure that breaking the specific tactics
*this* attacker used truly makes the attacker's life harder, as opposed to making life harder for Debian packagers while only forcing a one-time,
minor shift in attacker tactics. I *think* I'm mostly convinced that
forcing the attacker into Git commits is a useful partial defense, but I'm not sure this is obviously true.
Relying on signed git tags is not reliable because git is primarily SHA1-based, for which a collision attack cost $45K in 2019.
On 30/03/24 01:21, Antonio Russo wrote:
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
git-buildpackage and pristine-tar can be used for that.
On Sat, 2024-03-30 08:02:04 +0100, Gioele Barabucci <gioele@svario.it> wrote:
On 30/03/24 01:21, Antonio Russo wrote:
3. Have tooling that automatically checks the sanitized sources against
the development RCSs.
git-buildpackage and pristine-tar can be used for that.
Would be nice if pristine-tar's data file would be reproducible, too...
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
It is also already fully implemented as tag2upload, and is merely as yet undeployed, for social reasons.
Hello,
On Sat 30 Mar 2024 at 12:19pm +01, Simon Josefsson wrote:
Relying on signed git tags is not reliable because git is primarily
SHA1-based, for which a collision attack cost $45K in 2019.
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
On 2024-03-29 22:41, Guillem Jover wrote:
On Fri, 2024-03-29 at 18:21:27 -0600, Antonio Russo wrote:
Had tooling existed in Debian to automatically validate this faithful
reproduction, we might not have been exposed to this issue.
Given that the autogenerated stuff is not present in the git tree,
a diff between tarball and git would always generate tons of delta,
so this would not have helped.
I may not have been clear, but I'm suggesting scrubbing all the
autogenerated stuff, and comparing that against similarly scrubbed
git tag contents. (But you explain that this is problematic.)
Having done this myself, it has been my experience that many partial
build artifacts are captured in source tarballs that are not otherwise
maintained in the git repository. For instance, in zfs (which I have
contributed to in the past), many automake files are regenerated.
(I do not believe that specific package is vulnerable to an attack
on the autoconf/automake files, since the debian package calls the
upstream tooling to regenerate those files.)
(Hopefully the above clears up that I at least have some superficial awareness of the build artifacts showing up in the release tarball!)
1. Move towards allowing, and then favoring, git-tags over source tarballs
I assume you mean git archives out of git tags? Otherwise how do you
go from git-tag to a source package in your mind?
I'm not wed to any specific mechanism, but I'd be content with that. I'd
be most happy with DD-signed tags that were certified DFSG-compliant, policy compliant (i.e., lacking build artifacts), and equivalent to the scrubbed upstream source (and more on that later, building on what you say).
Many repositories today already do things close to this with pristine-tar,
so this seems to me a direction where the tooling already exists.
I'll add that, if we drop the desire for a signed archive, and instead require a signed git-tag (from which we can generate a source tar on
demand, as you suggest), we can drop the pristine-tar requirement. If we
are less progressive, but move to working exclusively with Debian-regenerated
.tar files, we can probably avoid many of the frustrating edge cases that pristine-tar still struggles with.
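For reference, the pristine-tar round trip mentioned here looks roughly like this (package name, version, and tag are placeholders):

    # Record enough binary delta in git to re-create the exact tarball:
    pristine-tar commit ../foo_1.0.orig.tar.gz upstream/1.0
    # Later, reproduce the byte-identical tarball from git alone:
    pristine-tar checkout ../foo_1.0.orig.tar.gz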
2. Require upstream-built artifacts be removed (instead, generate these
ab-initio during build)
The problem here is that the .m4 file to hook into the build system was named like one shipped by gnulib (so less suspicious), but xz-utils does not use gnulib, and thus the autotools machinery does not know anything about it, so even the «autoreconf -f -i» done by debhelper via dh-autoreconf would not regenerate it.
The way I see it, there are two options in handling a buildable package:
1. That file would have been considered a build artifact, consequently removed and then regenerated. No backdoor.
2. The file would not have been scrubbed, and a difference between the
git version and the released tar version would have been noticed.
Backdoor found.
Either of these is, in my mind, dramatically better than what happened.
One automatic approach would be to run dh-autoreconf and identify the
changed files, remove those files from both the distributed tarball and
the git tag, and check whether what remains differs. (You also suggest something very similar to this, and repacking the archive with those Debian-generated build artifacts.)
I may be missing something here, though!
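Concretely, the check might look something like this rough sketch (directory names are illustrative, and it ignores files that autoreconf creates anew rather than rewrites):

    # Regenerate the autotools output and treat every rewritten file as a
    # build artifact.
    cp -a upstream upstream.regen
    (cd upstream.regen && autoreconf -f -i)
    diff -rq upstream upstream.regen | awk '$1 == "Files" {print $2}' \
        | sed 's|^upstream/||' > artifacts.list
    # Scrub those paths from both the tarball tree and the git-tag tree;
    # any remaining difference is then suspicious:
    while read -r f; do
        rm -f "tarball-tree/$f" "git-tree/$f"
    done < artifacts.list
    diff -ru tarball-tree git-tree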
Removing these might be cumbersome after the fact if upstream includes
for example their own maintained .m4 files. See dpkg's m4 dir for an example of this (although there it's easy as all are namespaced but…).
I am not an m4 expert (in fact, I have specifically tried to avoid
learning anything more about auto(make/reconf) than absolutely necessary).
My point is just: either those files are needed, or not. If they're
needed, they need to not differ. And if they're not, they should
be scrubbed.
I think you are saying that doing this automatically is going to be hard/impossible. Is that fair?
Not using an upstream provided tarball might also mean we stop being
able to use upstream signatures, which seems worse. The alternative
might be promoting that upstreams just do the equivalent of
«git archive», but that might defeat the portability and dependency reduction properties that were designed into the autotools build
system, or increase the bootstrap set (see for example the pkg.dpkg.author-release build profile used by dpkg).
Ok, so am I understanding you correctly in that you are saying: we do actually want *some* build artifacts in the source archives?
If that's the case, could we make those files at packaging time, analogous
to the DFSG-exclude stripping process?
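Generating them at packaging time could be as simple as this sketch (names are placeholders, and it assumes an upstream ./autogen.sh):

    # Start from the verified git tag, regenerate the build system, repack:
    git archive --prefix=foo-1.0/ v1.0 | tar -x
    (cd foo-1.0 && ./autogen.sh)
    tar -cJf foo_1.0.orig.tar.xz foo-1.0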
In the present case, the triggering modification was in a modified .m4 file
that injected a snippet into the configure script. That modification
could have been flagged using this kind of process.
I don't think this modification would have been spotted, because it
was not modifying a file that would usually get autogenerated by its
build system.
If we look at what ./autogen.sh would have changed, and scrub those
files from the release archive, wouldn't that mean that the malicious
m4 file would have been spotted, since it would NOT have been autogenerated?
While this would be a lot of work, I believe doing so would require a
much larger amount of additional complexity in orchestrating attacks
against Debian in the future.
It would certainly make it a bit harder, but I'm afraid that if you
cannot trust upstream and they are playing a long game, then IMO they
can still sneak nasty stuff in even in plain sight with just code commits, unless you are paying extremely close attention. :/
See for example <https://en.wikipedia.org/wiki/Underhanded_C_Contest>.
I take a look at these every year or so to keep me terrified of C!
If it's a single upstream developer, I absolutely agree, but if there's an upstream community reviewing the git commits, I really do believe there is hope (of them!) identifying bad(tm) things.
I just want to make sure that what we actually pull in is what the
community is actually reviewing. I feel like anything less gets dangerous. (Given few enough eyeballs, all bugs are deep!)
But, I will definitely concede that, had I seen a commit that changed
that line in the m4, there's a good chance my eyes would have glazed over
it.
On 2024/03/30 12:43, Sean Whitton wrote:
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
It is also already fully implemented as tag2upload, and is merely as yet
undeployed, for social reasons.
My understanding is that DSA aren't quite comfortable with it, since it
would need an archive GPG signing key (or a keypair trusted by DAK)?
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix
variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
When using Meson/CMake/home-grown makefiles there's no meaningful
difference on average, although I'm sure there are corner cases and exceptions here and there.
This reminds me of https://xkcd.com/2347/ - and I think that’s becoming a more common threat vector for FLOSS: pick up some random lib that is
widely used, insert some malicious code and have fun. Then also imagine
stuff that automates builds in other ways, like docker containers, Ruby,
Rust, pip, that pull stuff from the network and install it without
further checks.
I hope (and am confident) that Debian as a project will react
accordingly to prevent this happening again.
Sean Whitton <spwhitton@spwhitton.name> writes:
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
Can you share that analysis? As far as I understand, it is possible for
a malicious actor to create a git repository with the same commit id as
HEAD, with different historic commits and tree content. I thought a
signed tag is merely a signed reference to a particular commit id. If
that commit id is a SHA1 reference, that opens up for ambiguity given
recent (well, 2019) results on SHA1. Of course, I may be wrong in any
of the chain, so would appreciate explanation of how this doesn't work.
On 2024/03/30 11:05, Simon Josefsson wrote:
1. Move towards allowing, and then favoring, git-tags over source tarballs
Some people have suggested this before -- and I have considered adopting
that approach myself, but one thing that is often overlooked is that
building from git usually increases the Build-Depends quite a lot
compared to building from tarball
How in the world do you jump to that conclusion?
On 2024-03-29 23:29:01 -0700 (-0700), Russ Allbery wrote:
[...] if the Git repository is somewhere other than GitHub, the
malicious possibilities are even broader.
I would not be so quick to make the same leap of faith. GitHub is
not itself open source, nor is it transparently operated. It's a
proprietary commercial service, with all the trust challenges that represents. Long, long before XZ was a twinkle in anyone's eye,
malicious actors were already regularly getting their agents hired
onto development teams to compromise commercial software. Just look
at the Juniper VPN backdoor debacle for a fairly well-documented
example (but there's strong evidence this practice dates back well
before free/libre open source software even, at least to the 1970s).
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
On 2024-03-30 08:02:04, Gioele Barabucci wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
Yes, perhaps it's time to switch to a different build system, although one
of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
[1] https://mesonbuild.com/Porting-from-autotools.html
On Sat, Mar 30, 2024 at 10:56:40AM +0100, Iustin Pop wrote:
Now it is time to take a step forward:
1. new upstream release;
2. the DD/DM merges the upstream release VCS into the Debian VCS;
3. the buildd is notified of the new release;
4. the buildd creates and uploads the non-reviewed-in-practice blobs "source
deb" and "binary deb" to unstable.
This change would have three advantages:
I think everyone fully agrees this is a good thing, no need to list the advantages.
The problem is that this requires functionality testing to be fully automated via autopkgtest, and moved off the "update changelog, build package, test locally, test some more, upload" routine.
Do you mean this theoretical workflow will not have a step of the
maintainer actually looking at the package and running it locally, or
running any building or linting locally before pushing the changes?
Then yeah, looking at some questions in the past years I understand that
some people are already doing that, powered by Salsa CI (I can think of several possible reasons for that workflow, but it still frustrates me).
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
AFAIK the currently suggested way of enabling that is putting "recipes/debian.yml@salsa-ci-team/pipeline" into "CI/CD configuration file" in the salsa settings (no idea where the page is that tells you that, or how to find it even knowing it exists).
Simon Josefsson <simon@josefsson.org> writes:
Sean Whitton <spwhitton@spwhitton.name> writes:
We did some analysis on the SHA1 vulnerabilities and determined that
they did not meaningfully affect dgit & tag2upload's design.
Can you share that analysis? As far as I understand, it is possible for
a malicious actor to create a git repository with the same commit id as
HEAD, with different historic commits and tree content. I thought a
signed tag is merely a signed reference to a particular commit id. If
that commit id is a SHA1 reference, that opens up for ambiguity given
recent (well, 2019) results on SHA1. Of course, I may be wrong in any
of the chain, so would appreciate explanation of how this doesn't work.
I believe you're talking about two different things. I think Sean is
talking about preimage resistance, which assumes that the known-good repository is trusted, and I believe Simon is talking about manufactured collisions where the attacker controls both the good and the bad
repository.
The dgit and tag2upload design probably (I'd have to think about it some more, ideally while bouncing the problem off of someone else, because I've recycled those brain cells for other things) only needs preimage
resistance, but the general case of a malicious upstream may be vulnerable
to manufactured collisions.
Russ Allbery <rra@debian.org> writes:
I believe you're talking about two different things. I think Sean is
talking about preimage resistance, which assumes that the known-good
repository is trusted, and I believe Simon is talking about
manufactured collisions where the attacker controls both the good and
the bad repository.
Right. I think the latter describes the xz scenario: someone could have pushed a maliciously crafted commit with a SHA1 collision commit id, so
there are two different git repositories with that commit id, and a
signed git tag on that commit id authenticates both trees, opening up
for uncertainty about what was intended to be used. Unless I'm missing
some detail of how git signed tag verification works that would catch
this.
The dgit and tag2upload design probably (I'd have to think about it
some more, ideally while bouncing the problem off of someone else,
because I've recycled those brain cells for other things) only needs
preimage resistance, but the general case of a malicious upstream may
be vulnerable to manufactured collisions.
It is not completely clear to me: how about if some malicious person
pushed a commit to salsa, and asked a DD to "please review this repository
and sign a tag to make the upload"? The DD would presumably sign a
commit id that authenticates two different git trees, one with the
exploit and one without it.
Relying on signed git tags is not reliable because git is primarily SHA1-based, for which a collision attack cost $45K in 2019.
FWIW, Gitlab is working on support for SHA 256 hashing [1], and as [...]
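(For completeness: git itself can already create SHA-256 repositories, though forge support was still limited at the time of this thread.)

    # Supported since git 2.29 (long marked experimental):
    git init --object-format=sha256 repo-sha256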
On 2024-03-30 11:47:56, Luca Boccassi wrote:
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
Give me good Salsa support for autopkgtest + lintian + piuparts, and
easy support (so that I just have to toggle one checkbox), and I'm
happy. Or even better, integrate all that testing with Salsa (I don't
know if it has "CI tests must pass before merging"), and block tagging
on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
I will be the first to admit I'm not up to date on latest Salsa news,
but see, what you mention - "include the yml" - is exactly what I don't want.
Antonio Russo <antonio.e.russo@gmail.com> writes:
But, I will definitely concede that, had I seen a commit that changed
that line in the m4, there's a good chance my eyes would have glazed
over it.
This is why I am somewhat skeptical that forcing everything into Git
commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git
tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be
force-pushed, and if the Git repository is somewhere other than GitHub,
the malicious possibilities are even broader.
We could narrow those possibilities somewhat by maintaining
Debian-controlled mirrors of upstream Git repositories so that we could detect rewritten history. (There are a whole lot of reasons why I think
dgit is a superior model for archive management. One of them is that it captures the full Git history of upstream at the point of the upload on Debian-controlled infrastructure if the maintainer of the package bases it
on upstream's Git tree.)
Gioele Barabucci <gioele@svario.it> writes:
Just as an example, bootstrapping coreutils currently requires
bootstrapping at least 68 other packages, including libx11-6 [1]. If
coreutils supported <nodoc> [2], the transitive closure of its
Build-Depends would be reduced to 20 packages, most of which in
build-essential.
[1]
https://buildd.debian.org/status/fetch.php?pkg=coreutils&arch=amd64&ver=9.4-3.1&stamp=1710441056&raw=1
[2] https://bugs.debian.org/1057136
Coreutils in Debian uses upstream tarballs and does not do a full
bootstrap build. It does autoreconf instead of ./bootstrap. So the dependencies above are not the entire bootstrapping story for building
coreutils from git compared to building from tarballs.
On 29/03/24 at 23:29 -0700, Russ Allbery wrote:
This is why I am somewhat skeptical that forcing everything into Git commits is as much of a benefit as people are hoping. This particular attacker thought it was better to avoid the Git repository, so that is evidence in support of that approach, and it's certainly more helpful,
once you know something bad has happened, to be able to use all the Git tools to figure out exactly what happened. But I'm not sure we're fully accounting for the fact that tags can be moved, branches can be force-pushed, and if the Git repository is somewhere other than GitHub,
the malicious possibilities are even broader.
I wonder if Software Heritage could help with that part?
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
with upstream about what constitutes the release artifact.
On 30/03/24 20:43, Iustin Pop wrote:
On 2024-03-30 11:47:56, Luca Boccassi wrote:
On Sat, 30 Mar 2024 at 09:57, Iustin Pop <iustin@debian.org> wrote:
Give me good Salsa support for autopkgtest + lintian + piuparts, and easy support (so that I just have to toggle one checkbox), and I'm happy. Or even better, integrate all that testing with Salsa (I don't know if it has "CI tests must pass before merging"), and block tagging on the tagged version having been successfully tested.
This is all already implemented by Salsa CI? You just need to include
the yml and enable the CI in the settings
I will be the first to admit I'm not up to date on latest Salsa news,
but see, what you mention - "include the yml" - is exactly what I don't want.
Salsa CI is enabled by default for all projects in the debian/ namespace <https://salsa.debian.org/debian/>.
Adding a yml file or changing the CI settings to reference the Salsa CI pipeline is needed only for projects in team- or maintainer-specific repositories, or when the dev wants to enable additional tests (or configure/block the default tests).
On Sat, 30 Mar 2024 at 15:44, Russ Allbery <rra@debian.org> wrote:
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning
autoconfery on twelve thousand different proprietary and
non-proprietary Unix variants, back in the day. In 2024, we do
dh_autoreconf by default so it's all moot anyway.
This is true from Debian's perspective. This is much less obviously
true from upstream's perspective, and there are some advantages to
aligning with upstream about what constitutes the release artifact.
My point is that, while there will be for sure exceptions here and
there, by and large the need for massaged tarballs comes from projects
using autoconf and wanting to ship source archives that do not require
running the autoconf machinery.
However, we as in Debian do not have this problem. We can and do re-run
the autoconf machinery on every build. And at least on the main forges,
the autogenerated (and thus out of reach of this kind of attack)
tarball is always present too - the massaged tarball is an _addition_,
not a _substitution_. Hence: we should really, really think about forcing
all packages, by policy, to use the autogenerated tarball by default
instead of the autoconf one, when both are present, unless extenuating circumstances (which have to be documented) apply.
Most of this is pregenerated documentation (primarily man pages generated from POD), but it also includes generated test data and other things. The reason is similar: regenerating those files requires tools that may not be present on an older system (like a mess of random Perl modules) or, in the case of the man pages, may be old and thus produce significantly inferior output.
But we do not use older systems to build our packages, so this does not matter.
Just to note, though, this means that we lose the upstream signature in the archive. The only place the upstream signature would then live is in Salsa.
Totally worth it!
My point is that, while there will be for sure exceptions here and
there, by and large the need for massaged tarballs comes from projects using autoconf and wanting to ship source archives that do not require
running the autoconf machinery.
Just as a data point, literally every C project for which I am upstream ships additional files in the release tarballs that are not in Git for reasons unrelated to Autoconf and friends.
Just to note, though, this means that we lose the upstream signature in
the archive. The only place the upstream signature would then live is in Salsa.
I switched long ago all my packages from tar archives to the git
upstream tree. Not only does this make it much easier to understand the changes
in a new release,
That's not mutually exclusive. When adding an additional git remote
and using gbp-import-orig's --upstream-vcs-tag you get the best of
both worlds.
No: I get nothing of value by doing that, and the repository will be [...]
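The workflow gregor describes is roughly the following (remote URL, tag, and tarball name are placeholders):

    # Track upstream's git alongside the packaging branches:
    git remote add upstreamvcs https://example.org/foo.git
    git fetch upstreamvcs 'refs/tags/*:refs/tags/*'
    # Import the tarball while tying it to the matching upstream tag:
    gbp import-orig --upstream-vcs-tag=v1.2.3 ../foo_1.2.3.orig.tar.xz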
At 2024-03-31T22:32:49+0000, Stefano Rivera wrote:
Upstreams would probably prefer that we used git repositories
*directly* as source artifacts, but that comes with a whole other can
of worms...
Speaking from my upstream groff perspective, I wouldn't _prefer_ that.
The distribution archives get build-testing on a much wider variety of systems, thanks to people on the groff@ and platform-testers@gnu mailing lists who help out when a release candidate is announced. They have
access to platforms more exotic than I and a few other bleeding-edge
HEAD mavens do. This practice tangibly improved the quality of the
groff 1.23.0 release, especially on surviving proprietary Unix systems.
Building from the repo, or using the bootstrap script--which Colin
Watson just today ensured will be in future distribution archives--is fine.[1] I'm glad some people build the project that way. But I think
that procedure serves an audience that is distinguishable in some ways.
Regards,
Branden
[1] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=822fef56e9ab7cbe69337b045f6f20e32e25f566
I find that having the upstream source code in git (in the same form that
we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely
useful tool, because it lets me trace the history of all of the files
that we are treating as source - whether hand-written or autogenerated -
if I want to do that. If we are concerned about defending against actively malicious upstreams like the recent xz releases, then that's already a difficult task and one where it's probably unrealistic to expect a high success rate, but I think we are certainly not going to be able to achieve
it if we reject tools like git that could make it easier.
I agree: it would be unthinkable for me to not have the complete history immediately available while I am working on a package.
In the "debian/ only" workflow, the Debian delta is exactly the contents
of debian/. There is no redundancy, so every tree is in some sense a
valid one (although of course sometimes patches will fail to apply, or whatever).
I think a shallow clone of depth 1 is sufficient, although that's not sufficient to get the correct version number from Git in all cases.[...]
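For example (URL and tag are illustrative):

    # Shallow clone of a single tagged release:
    git clone --depth=1 --branch v1.2.3 https://example.org/foo.git
    # The caveat above: `git describe` may fail or mislead in such a clone,
    # because the history needed to compute a version number is absent.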
On Mon, Apr 01, 2024 at 11:17:21AM -0400, Theodore Ts'o wrote:
Yeah, that too. There are still people building e2fsprogs on AIX,
Solaris, and other legacy Unix systems, and I'd hate to break them, or
require a lot of pain for people who are building on MacPorts, et. al.
...
Everything you mention should already be supported by Meson.
On Mon, Apr 01, 2024 at 02:31:47AM +0200, gregor herrmann wrote:
That's not mutually exclusive. When adding an additional git remote
and using gbp-import-orig's --upstream-vcs-tag you get the best of
both worlds.
And this will error out if there are unexpected changes in the tarball?
How will it be able to detect those?
On 2024-04-01 12:44, Bastian Blank wrote:
So in the end you still need to manually review all the stuff that the tarball contains beyond what is in git. And for that I don't see that it actually lends a helping hand or makes anything easier.
So I really don't see how this makes the problem at hand any better.
Again the workload of review falls on the person doing the job, i.e. we do fragile manual work instead of possibly-failing automatic work.
I think that if Debian was using git instead of the generated tarball, this part of the backdoor would have just been included in the git repository as well. If we were able to magically switch everything to git (and we won't,
we are not even able to agree on simpler stuff), I don't think it would have prevented the attack.
Colin Watson <cjwatson@debian.org> writes:
On Mon, Apr 01, 2024 at 11:33:06AM +0200, Simon Josefsson wrote:
Running ./bootstrap in a tarball may lead to different results than the
maintainer running ./bootstrap in pristine git. It is the same problem
as running 'autoreconf -fvi' in a tarball not necessarily leading to
the same result as the maintainer running 'autoreconf -fvi' from
pristine git. The difference is what is pulled in from the system
environment. Neither tool was designed to be run from within a tarball,
so this is just bad practice that never worked reliably, and without a
lot of complexity it will likely not become reliable either.
The practice of running "autoreconf -fi" or similar via dh-autoreconf
has worked extremely well at scale in Debian. I'm sure there are
complex edge cases where it's caused problems, but it's far from being a disaster area.
Agreed. I'm saying it doesn't fix the problem that some people appear to believe it does, i.e., that running 'autoreconf -fi' solves the re-bootstrapping problem.
I have suggested before that upstreams (myself included) should publish
PGP-signed *-src.tar.gz tarballs that contain the entire pristine git
checkout including submodules,
A while back I contributed support to Gnulib's bootstrap script to allow pinning particular commits without using submodules. I would recommend this mode; submodules have very strange UI.
I never liked git submodules generally, so I would be happy to work on getting that to be supported -- do you have pointers to earlier work
here?
What is necessary, I think, is having something like this in
bootstrap.conf:
gnulib_commit_id = 123abc567...
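and a bootstrap wrapper could then consume that setting along these lines (entirely hypothetical, except that --gnulib-srcdir is an existing bootstrap option):

    # Fetch gnulib once, pin it to the recorded commit, then bootstrap:
    git clone https://git.savannah.gnu.org/git/gnulib.git "$GNULIB_SRCDIR"
    git -C "$GNULIB_SRCDIR" checkout "$gnulib_commit_id"
    ./bootstrap --gnulib-srcdir="$GNULIB_SRCDIR"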
As I noted in a comment on your blog, I think there is a case to be made for .po files being committed to upstream git, and I'm not fond of the practice of pulling them in only at bootstrap time (although I can understand why that's come to be popular as a result of limited
maintainer time). I have several reasons to believe this:
Those are all good arguments, but it still feels backwards to put these
files into git. It felt so good to externalize all the translation
churn outside of my git (or then, CVS...) repositories many years ago.
I would prefer to maintain a po/SHA256SUMS in git and continue to
download translations but have some mechanism to refuse to continue if
the hashes differ.
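That mechanism could be as small as this sketch (file layout assumed; not an existing tool):

    # The maintainer records hashes of the downloaded translations in git:
    sha256sum po/*.po > po/SHA256SUMS
    # At bootstrap time, after re-downloading, refuse to continue on mismatch:
    sha256sum -c po/SHA256SUMS || exit 1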
Luca Boccassi <bluca@debian.org> writes:
In the end, massaged tarballs were needed to avoid rerunning autoconfery
on twelve thousand different proprietary and non-proprietary Unix variants, back in the day. In 2024, we do dh_autoreconf by default so
it's all moot anyway.
This is true from Debian's perspective. This is much less obviously true from upstream's perspective, and there are some advantages to aligning
with upstream about what constitutes the release artifact.
Yes, perhaps it's time to switch to a different build system, although one
of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.
in my mind this incident reinforces my view that precisely storing
more upstream stuff in git is the opposite of what we'd want, and
makes reviewing even harder, given that in our context we are on a
permanent fork against upstream, and if you include merge commits and
similar, there's lots of places to hide stuff. In contrast, storing
only the packaging bits (the debian/ dir alone), like pretty much every
other downstream does with their packaging bits, makes for an
obviously more manageable thing to review and not get drowned in,
more so if we have to consider that next time perhaps the long game
gets played within Debian.
On Fri, Apr 05, 2024 at 03:19:23PM +0100, Simon McVittie wrote:
I find that having the upstream source code in git (in the same form that we use for the .orig.tar.*, so including Autotools noise, etc. if present, but excluding any files that we exclude by repacking) is an extremely useful tool, because it lets me trace the history of all of the files
that we are treating as source - whether hand-written or autogenerated -
if I want to do that. If we are concerned about defending against actively malicious upstreams like the recent xz releases, then that's already a difficult task and one where it's probably unrealistic to expect a high success rate, but I think we are certainly not going to be able to achieve it if we reject tools like git that could make it easier.
Strongly agree. For many many things I rely heavily on having the
upstream source code available in the same working tree when doing any
kind of archaeology across Debian package versions, which is something I
do a lot.
I would hate to see an attacker who relied on an overloaded maintainer
push us into significantly less convenient development setups, thereby increasing the likelihood of overload.
Anyway, on the 400+ packages that I maintain within the OpenStack team, I did come across some upstreams using setuptools-scm. In my experience, using the:
git archive --prefix=$(DEBPKGNAME)-$(VERSION)/ $(GIT_TAG) \
| xz >../$(DEBPKGNAME)_$(VERSION).orig.tar.xz
workflow out of an upstream always works, including for those that are using setuptools-scm.
On Sat, Mar 30, 2024 at 08:44:36AM -0700, Russ Allbery wrote:
...
Yes, perhaps it's time to switch to a different build system, although one of the reasons I've personally been putting this off is that I do a lot of feature probing for library APIs that have changed over time, and I'm not sure how one does that in the non-Autoconf build systems. Meson's Porting from Autotools [1] page, for example, doesn't seem to address this use
case at all.
The other problem is that many of the other build systems are much
slower than autoconf/makefile. (Note: I don't use libtool, because
it's so d*mn slow.) Or building with the alternate system might require a
major bootstrapping phase, or require downloading a JVM, etc.
Maybe the answer is "you should give up on portability to older systems as the cost of having a cleaner build system," and that's not an entirely unreasonable thing to say, but that's going to be a hard sell for a lot of upstreams that care immensely about this.
Yeah, that too. There are still people building e2fsprogs on AIX,
Solaris, and other legacy Unix systems, and I'd hate to break them, or require a lot of pain for people who are building on MacPorts, et. al.
...
- Ted
Also, sdists are *not* "upstream-created source tarballs". I [...]
consider the binary form built for PyPi. Just like we have .debs,
PyPi has tarballs and wheels, rather than how you describe them.
There are basically three dgit-compatible workflows, with some minor adjustments around handling of .gitignore files:
- "patches applied" (git-debrebase, etc.):
This is the workflow that proponents of dgit sometimes recommend,
and dgit uses it as its canonicalized internal representation of
the package.
The git tree is the same as `dpkg-source -x`, with upstream source code
included, debian/ also included, and any Debian delta to the upstream
source pre-applied to those source files.
In the case of bubblewrap, if we used this workflow, after you clone
the project, bubblewrap.c would already have the Debian-specific error
message.
(dgit --split-view=never or dgit --quilt=dpm)
- "patches unapplied" (gbp pq, quilt, etc.):
This is the workflow that many of the big teams use (at least Perl,
Python, GNOME and systemd), and is the one that bubblewrap really uses.
The git tree is the same as `dpkg-source -x --skip-patches`, with
upstream source code included, and debian/ also included.
Any Debian delta to the upstream source is represented in debian/patches
but is *not* pre-applied to the source files: for example, in the case
of bubblewrap, after you clone
https://salsa.debian.org/debian/bubblewrap.git and view bubblewrap.c,
it still has the upstream error message, not the Debian-specific one.
(dgit --quilt=gbp or dgit --quilt=unapplied; I use the latter)
- debian/ only:
This is what you're advocating above.
The git tree contains only debian/. If there is Debian delta to the
upstream source, it is in debian/patches/ as usual.
(dgit --quilt=baredebian* family)
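For concreteness, the corresponding invocations would be roughly (illustrative; the real workflows are documented in the dgit-maint-*(7) manual pages):

    dgit --quilt=dpm push-source         # patches applied
    dgit --quilt=unapplied push-source   # patches unapplied
    dgit --quilt=baredebian push-source  # debian/ only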
Again, checking for that error is something that can be (and is)
automated: I use this workflow myself (e.g. in bubblewrap), so I know from experience that dgit *does* check for that error, and will fail to build
the source package if the invariant does not hold. Again, dpkg-source
in 3.0 (quilt) format will also make your source package fail to build
if that error exists, except in the cases that it intentionally ignores.