• Upstream dist tarball transparency (was Re: Validating tarballs against

    From Guillem Jover@21:1/5 to Russ Allbery on Sat Apr 6 09:50:59 2024
    Hi!

    On Fri, 2024-03-29 at 23:29:01 -0700, Russ Allbery wrote:
    On 2024-03-29 22:41, Guillem Jover wrote:
    (For dpkg at least I'm pondering whether to play with switching to
    doing something equivalent to «git archive» though, but see above, or
    maybe generate two tarballs, a plain «git archive» and a portable one.)

    Yeah, with my upstream hat on, I'm considering something similar, but I
    still believe I have users who want to compile from source on systems
    without current autotools, so I still need separate release tarballs.
    Having to generate multiple release artifacts (and document them, and
    explain to people which ones they want, etc.) is certainly doable, but
    I can't say that I'm all that thrilled about it.

    I think with my upstream hat on I'd rather ship a clear manifest (checked
    into Git) that tells distributions which files in the distribution tarball
    are build artifacts, and guarantee that if you delete all of those files,
    the remaining tree should be byte-for-byte identical with the
    corresponding signed Git tag. (In other words, Guillem's suggestion.)
    Then I can continue to ship only one release artifact.

    I've been pondering this and I think I might have come up with a
    protocol that to me (!) seems safe, even against a malicious upstream.
    It also does not require two tarballs, which as you say seems cumbersome
    and is harder to explain to users. But I'd like to run this through the
    list in case I've missed something obvious.

    I've implemented a prototype for dpkg, in the branch:

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/log/?h=next/dist-transparency

    For context, dpkg dist tarballs have for a long time shipped a
    «.dist-version» file; I think some GNU projects started to do something
    similar but with a different name.

    The relevant commits:

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=54a6ad9db3da335a40fed9020195864c4a87bdc1
    (Add .dist-vcs-id, in git main already)

    At least for dpkg, if «make dist» is run from outside a tag, then
    the version will include the commit and whether the working dir
    was dirty, but from a tag, only the version is included and there's
    no link to what commit that was pointing to at that time. This file
    adds that link, regardless of the current commit, and the id is also
    printed as part of the configure summary.

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=1944a90d13c7c63592c438e550a212ab9e3aad76
    (Remove VCS specific files from dist)

    Simplifies the comparisons.

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=39d181e60b3413c58a72056beec0a5a6f584cd92
    (Add .dist-vcs-url)

    This adds a new file to track the upstream VCS URL, so that it can
    be used from a deterministic place, for verification purposes.

    https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?id=b3d7e0f195bd69b4622121e78fce751ea76dc0bc
    (Add .dist-vcs-files)

    This adds a new file with the list of files *in* the VCS, so that
    we can get back to that clean state, even from a distributed
    tarball, or from an extracted directory with built artifacts.

    I also thought about listing the autogenerated files as Russ
    mentions, but that seems error-prone and non-exhaustive, because
    those might depend on the version of the autotools (or other build
    system) used, and such a list would not include artifacts produced
    during the build phase, which could be used to smuggle things in.
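
    As a rough illustration of the reset idea (this is not the code from
    the branch above), the Python sketch below deletes everything that the
    manifest does not list. It assumes «.dist-vcs-files» holds one
    tree-relative path per line, which the description above does not
    state; the function names and the `keep` parameter are hypothetical.

```python
import os
from pathlib import Path

# Assumption: the manifest holds one tree-relative path per line.
MANIFEST = ".dist-vcs-files"


def read_manifest(root):
    """Return the set of relative paths listed in the manifest."""
    text = (Path(root) / MANIFEST).read_text(encoding="utf-8")
    return {line for line in text.splitlines() if line}


def reset_to_vcs_state(root, keep=frozenset()):
    """Delete every file under root that the manifest does not list.

    `keep` is a hypothetical allowlist of extra paths to preserve.
    The manifest itself is removed too unless it is listed or kept.
    Returns the sorted list of relative paths that were removed.
    """
    root = Path(root)
    listed = read_manifest(root) | set(keep)
    removed = []
    # Walk bottom-up so that directories emptied by the deletions
    # can be pruned once their contents are gone.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            if rel not in listed:
                path.unlink()
                removed.append(rel)
        if Path(dirpath) != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
    return sorted(removed)
```

    Because the decision is driven by what is listed rather than by what
    is known to be generated, anything unexpected is removed by default.
    Symlinks to directories are not handled in this sketch.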

    This last commit lists the three operations that all this makes
    possible:

    * list difference in file lists (should be none)
    * list difference in file contents (should be none)
    * resetting the directory into a state like the VCS (except
    for the VCS tracking/supporting files)

    These operations are fairly generic; the one thing I could see
    being "configurable" is the VCS files to exclude, maybe via
    another file, but I've not thought about the consequences here.
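
    The two comparison operations can be sketched along these lines,
    given a set of manifest paths, an extracted tarball and a checkout of
    the VCS. This is only an illustration under stated assumptions: the
    function names are made up, and the default `exclude_names` is a
    guess at what "VCS tracking/supporting files" might cover, not the
    list dpkg uses.

```python
import hashlib
import os
from pathlib import Path

# Hypothetical default for the VCS files to exclude.
VCS_NAMES = (".git", ".gitignore", ".gitattributes")


def list_files(root, exclude_names=VCS_NAMES):
    """Return relative paths of all files under root, minus VCS files."""
    root = Path(root)
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in exclude_names]
        for name in filenames:
            if name not in exclude_names:
                rel = (Path(dirpath) / name).relative_to(root)
                found.add(rel.as_posix())
    return found


def diff_file_lists(manifest_paths, vcs_root):
    """Return (listed but not in VCS, in VCS but not listed)."""
    in_vcs = list_files(vcs_root)
    listed = set(manifest_paths)
    return sorted(listed - in_vcs), sorted(in_vcs - listed)


def sha256_of(path):
    """Hash a file's bytes so the two trees can be compared."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def diff_file_contents(manifest_paths, dist_root, vcs_root):
    """Return listed paths that are missing or differ between trees."""
    changed = []
    for rel in sorted(manifest_paths):
        a, b = Path(dist_root) / rel, Path(vcs_root) / rel
        if not (a.is_file() and b.is_file()) or sha256_of(a) != sha256_of(b):
            changed.append(rel)
    return changed
```

    Both functions should return empty results for an honest tarball.
    Files that exist only in the tarball (such as a generated
    «configure») are deliberately ignored here, since the reset step is
    what deals with those.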


    I think this is safe (in the sense of detecting smuggled artifacts or
    modifications in the dist tarball not present in the VCS, but certainly
    not against modifications or artifacts smuggled into the VCS itself),
    because a user who wants to verify any of this can make sure the URL is
    the expected one, and everything else seems to follow from there;
    otherwise you should get differences. (Thinking now, perhaps one of the
    checks should be whether the expected tag or branch matches the commit
    id?)
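
    A tag check of that kind could be done without cloning, by parsing the
    output of `git ls-remote --tags <url>`, which prints one
    "<object id><TAB><ref name>" line per ref. The sketch below is a
    hypothetical helper, not part of the prototype; it assumes the
    recorded id is a full commit hash, which the description of
    «.dist-vcs-id» above does not spell out.

```python
def tag_points_at(ls_remote_output, tag, commit_id):
    """Check that `tag` resolves to `commit_id` in `git ls-remote` output.

    For annotated tags, git prints an extra "refs/tags/<tag>^{}" line
    carrying the commit the tag object points at; that peeled line is
    preferred over the id of the tag object itself.
    """
    plain = peeled = None
    for line in ls_remote_output.splitlines():
        if "\t" not in line:
            continue
        oid, ref = line.split("\t", 1)
        if ref == f"refs/tags/{tag}":
            plain = oid
        elif ref == f"refs/tags/{tag}^{{}}":
            peeled = oid
    resolved = peeled or plain
    return resolved is not None and resolved == commit_id.strip()
```

    The tag name to expect would have to come from somewhere the verifier
    already trusts (for example the version being packaged), not from the
    tarball alone.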

    This is currently geared toward a Debian native package, or just
    handling the upstream part with no packaging, but I don't think it would
    be much work to integrate this into packaged upstreams (mostly excluding
    whatever is in the debian.tar parts?), or even to use something like
    this for an upstream that does not provide these files, by adding
    equivalent files or metadata in the packaging.

    The only things that one would need to trust are the invocations to
    perform those actions, which should *not* be part of the distributed
    tarball. I'm thinking of perhaps creating a new git repo containing
    those snippets so that users can use them directly or as reference
    implementations. Or perhaps include some of this within dpkg (but then
    those could not be used to verify dpkg itself :D).

    For dpkg, I'm considering merging this, and then performing the
    resetting during the package build. Eventually perhaps this could be
    added as a feature in dpkg-source?

    Thanks,
    Guillem
