Simon Josefsson wrote:
Finally, while this is somewhat gnulib specific, I think the practice
goes beyond gnulib
Yes, gnulib-tool for modules written in C is similar to
* 'npm install' for JavaScript source code packages [1],
* 'cargo fetch' for Rust source code packages [2],
except that gnulib-tool is simpler: it fetches from a single source location only.
How does Debian handle these kinds of source-code dependencies?
I would assume that (some stripped down
version of) git is a requirement to do any useful work on any platform
these days, so maybe it isn't a problem
The current approach of running autoreconf -fi is based on a
misunderstanding: autoreconf -fi is documented to not replace certain
files with newer versions:
https://lists.nongnu.org/archive/html/bug-gnulib/2024-04/msg00052.html
1) Use upstream's PGP signed git-archive tarball.
Here's how I do it in e2fsprogs which (a) makes the git-archive
tarball be bit-for-bit reproducible given a particular git commit ID,
and (b) minimizes the size of the tarball when stored using
pristine-tar:
https://github.com/tytso/e2fsprogs/blob/master/util/gen-git-tarball
To reach our goals in the beginning of this post, this upstream tarball
has to be filtered to remove all pre-generated artifacts and vendored
code. Use some mechanism, like the debian/copyright Files-Excluded
mechanism to remove them. If you used a git-archive upstream tarball,
chances are higher that you won't have to do a lot of work especially
for pre-generated scripts.
Why does it *have* to be filtered? For the purposes of building, if
you really want to nuke all of the pre-generated files, you can just
move them out of the way at the beginning of the debian/rules run, and
then move them back as part of "debian/rules clean". Then you can use
autoreconf -fi to your heart's content in debian/rules (modulo
possibly breaking things if you insist on nuking aclocal.m4 and
regenerating it without taking proper care, as discussed above).
This also allows the *.orig.tar.gz to be the same as the upstream
signed PGP tarball, which you've said is the ideal, no?
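A minimal sketch of that move-aside-and-restore idea, written in Python purely for illustration. A real package would do this from debian/rules, and the file list and stash directory below are hypothetical names, not taken from any actual package:

```python
import shutil
from pathlib import Path

# Hypothetical list of pre-generated files; each package would have its own.
PREGENERATED = ["configure", "Makefile.in"]
# Hypothetical stash location, relative to the source tree root.
STASH = Path("debian/pregenerated-stash")


def stash_pregenerated(root: Path) -> list:
    """Move pre-generated files out of the way before the build starts."""
    stash = root / STASH
    stash.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in PREGENERATED:
        src = root / name
        if src.exists():
            dest = stash / name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dest))
            moved.append(name)
    return moved


def restore_pregenerated(root: Path) -> None:
    """Put the upstream files back, as a clean step would."""
    stash = root / STASH
    if not stash.is_dir():
        return
    for path in sorted(stash.rglob("*")):
        if path.is_file():
            target = root / path.relative_to(stash)
            target.parent.mkdir(parents=True, exist_ok=True)
            # Discard whatever the build regenerated in the meantime.
            if target.exists():
                target.unlink()
            shutil.move(str(path), str(target))
    shutil.rmtree(stash)
```

Calling stash_pregenerated() before the build and restore_pregenerated() during clean leaves the unpacked tree matching the upstream tarball again, which is the property being argued for here.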
There is one design of gnulib that is important to understand: gnulib is
a source-only library and is not versioned and has no release tarballs.
Its release artifact is the git repository containing all the commits.
Packages like coreutils, gzip, tar etc pin to one particular commit of
gnulib.
Note that how we treat gnulib is a bit different from how we treat
other C shared libraries, where we claim that *all* libraries must be
dynamically linked, and that including source code by reference is
against Debian Policy, precisely because of the toil needed to update
all of the binary packages should some security vulnerability get
discovered in the library which is either linked statically or
included by code duplication.
And yet, we seem to have given a pass for gnulib, probably because it
would be too awkward to enforce that rule *everywhere*, so apparently
we've turned a blind eye.
I personally think the "everything must be dynamically linked" rule is
not really workable in real life, and should be an aspirational goal
--- and the fact that we treat gnulib differently is a great proof
point that the current Debian Policy would not really be doable in real
life if it were enforced strictly, everywhere, with no exceptions....
Certainly for languages like Rust, it *can't* be enforced, so again,
that's another place where that rule is not enforced consistently; if
it were, we wouldn't be able to ship Rust programs.
The best solution to this is to try to encourage people to put those
autoconf macros that they are manually maintaining, and that can't be
supplied, in acinclude.m4, which is now included by default by autoconf
in addition to aclocal.m4.
"Theodore Ts'o" <tytso@mit.edu> writes:
And yet, we seem to have given a pass for gnulib, probably because it
would be too awkward to enforce that rule *everywhere*, so apparently
we've turned a blind eye.
No, there's an explicit exception for cases like gnulib. Policy 4.13:
Some software packages include in their distribution convenience
copies of code from other software packages, generally so that users
compiling from source don’t have to download multiple packages. Debian
packages should not make use of these convenience copies unless the
included package is explicitly intended to be used in this way.
In ecosystems like NPM, Cargo, Golang, Python and so on pinning to
specific versions is also "explicitly intended to be used"; they just
sometimes don't include convenience copies directly as they have tooling
to download these (which is not allowed in Debian).
(Arguably Debian should use those more often as keeping all software at
the same dependency version is a futile effort IMHO...)
Going into detail, you use 'gzip -9n' but I use git-archive defaults
which are the same as -n aka --no-name. I agree adding -9 aka --best is
an improvement. Gnulib's maint.mk also adds --rsyncable, would you agree
that this is also an improvement?
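To make the effect of -n concrete, here is a small sketch using Python's standard gzip module rather than the gzip command itself. It assumes that an empty filename and mtime=0 are a fair stand-in for --no-name, and it is not claimed to be byte-identical to what the command-line tool emits; gzip_no_name is a hypothetical helper name:

```python
import gzip
import io


def gzip_no_name(data: bytes, level: int = 9) -> bytes:
    """Compress data roughly like `gzip -9n`: best level, no name, no timestamp."""
    buf = io.BytesIO()
    # filename="" leaves out the original-name field; mtime=0 zeroes the timestamp.
    with gzip.GzipFile(filename="", mode="wb", compresslevel=level,
                       fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()
```

With the name and timestamp left out of the header, compressing the same input twice gives the same bytes, which is what lets a tarball built from a given commit be compared bit for bit.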
Right, there is no requirement for orig.tar.gz to be filtered. But then
the outcome depends on upstream, and I don't think we can convince all
upstreams about these concerns. Most upstreams prefer to ship
pre-generated and vendored files in their tarballs, and will continue to
do so.
free from vendored or pre-generated files. That's the case for most
upstream tarballs in Debian today (including e2fsprogs, openssh,
coreutils). Without filtering that tarball we won't fulfil the goals I
mentioned in the beginning of my post. The downsides with not filtering
include (somewhat repeating myself):
...
On Sun, May 12, 2024 at 04:27:06PM +0200, Simon Josefsson wrote:
Going into detail, you use 'gzip -9n' but I use git-archive defaults
which are the same as -n aka --no-name. I agree adding -9 aka --best is
an improvement. Gnulib's maint.mk also adds --rsyncable, would you agree
that this is also an improvement?
I'm not convinced --rsyncable is an improvement. It makes the
compressed object slightly larger, and in exchange, if the compressed
object changes slightly, it's possible that when you rsync the changed
file, it might be more efficient. But in the case of PGP signed
release tarballs, the file is constant; it's never going to change,
and even if there are slight changes between say, e2fsprogs v1.47.0
and e2fsprogs v1.47.1, in practice, this is not something --rsyncable
can take advantage of, unless you manually copy
e2fsprogs-v1.47.0.tar.gz to e2fsprogs-v1.47.1.tar.gz, and then rsync
e2fsprogs-v1.47.1.tar.g.... and I don't think anyone is doing this,
either automatically or manually.
That being said, --rsyncable is mostly harmless, so I don't have
strong feelings about changing it to add or remove in someone's
release workflow.