• Bug#1071202: src:gnupg2: upstream tarball ships files not in upstream r

    From Daniel Kahn Gillmor@21:1/5 to All on Thu May 16 03:10:01 2024
    Source: gnupg2
    Severity: minor
    X-Debbugs-Cc: Daniel Kahn Gillmor <dkg@fifthhorseman.net>

    The gnupg2 package is built from source based on the upstream released
    tarball. Upstream also uses git for revision control, and we track
    upstream git as well as the released tarballs. Upstream uses OpenPGP to
    sign both git tags and released tarballs.

    We trim many prebuilt files from the tarball, so what's in our debian
    packaging repositories is pretty close to upstream's git repos. But
    we don't trim quite all of them.

    Inspired by the recent xz mess, where malicious files were slipped into
    a tarball, i'd like to minimize the amount of non-tracked source used in
    GnuPG. I think we should use debian/clean (and gbp import-orig's
    filtering, see #1071200) to trim out all of the generated files before
    build, so that what we're building from source is as close as possible
    to traceable upstream git commits.

    I did a quick scan of what we're shipping in revision control (hence,
    what's in the filtered tarball) that the upstream git tag isn't
    accounting for. Here's what i found:

    $ git diff --stat gnupg-2.2.43..upstream/2.2.43 | grep -e '\+' -e 'Bin 0 ->'
    ChangeLog | 34710 ++++++++++++++++++-
    VERSION | 1 +
    common/audit-events.h | 116 +
    common/status-codes.h | 248 +
    doc/defsincdate | 1 +
    doc/gnupg-card-architecture.pdf | Bin 0 -> 19221 bytes
    doc/gnupg-card-architecture.png | Bin 0 -> 8843 bytes
    doc/gnupg-module-overview.pdf | 408 +
    doc/gnupg-module-overview.png | Bin 0 -> 124560 bytes
    po/ca.po | 2295 +-
    po/cs.po | 2303 +-
    po/da.po | 2299 +-
    po/de.po | 2310 +-
    po/el.po | 2295 +-
    po/en@boldquot.po | 10967 ++++++
    po/en@quot.po | 10951 ++++++
    po/eo.po | 2295 +-
    po/es.po | 2307 +-
    po/et.po | 2299 +-
    po/fi.po | 2295 +-
    po/fr.po | 2299 +-
    po/gl.po | 2303 +-
    po/gnupg2.pot | 10636 ++++++
    po/hu.po | 2295 +-
    po/id.po | 2295 +-
    po/it.po | 2295 +-
    po/ja.po | 2295 +-
    po/nb.po | 2295 +-
    po/pl.po | 2295 +-
    po/pt.po | 2295 +-
    po/ro.po | 2307 +-
    po/ru.po | 2303 +-
    po/sk.po | 2303 +-
    po/sv.po | 2299 +-
    po/tr.po | 2295 +-
    po/uk.po | 2299 +-
    po/zh_CN.po | 2295 +-
    po/zh_TW.po | 2291 +-
    regexp/_unicode_mapping.c | 284 +
    242 files changed, 127919 insertions(+), 42329 deletions(-)
    $

    The doc/*.{pdf,png} stuff is fixed already, as of 2.2.43-3, and will be
    filtered out whenever we move to the next upstream release.

    Here's my attempt at analyzing what remains:

    ChangeLog: this is generated automatically by upstream from upstream git
    history, and we ship it (half a meg after compression!) in all the
    produced packages. This seems like a lot, and we ought to be able to
    drop it from nearly everywhere. What if we just shipped it with
    gnupg2-doc, and left the other packages with a simple text file? Or
    what if we just stopped shipping it altogether? Will anyone mind?
    The details are at developer-level, and it'll still be in the source
    tarballs if anyone wants to read the file.

    VERSION: this contains only the upstream version number. Can we
    generate it manually from debian/changelog?
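
    A minimal sketch of deriving the upstream version from the first line
    of debian/changelog, assuming the usual "package (version) distribution;
    urgency=..." header and the [epoch:]upstream[-revision] version layout.
    The function name and the sample header line are hypothetical, and
    whether the build needs anything more than the bare number in VERSION
    is untested.

```python
import re


def upstream_version(changelog_first_line: str) -> str:
    """Extract the upstream part of the version in a debian/changelog header line."""
    match = re.match(r"^\S+ \(([^)]+)\)", changelog_first_line)
    if match is None:
        raise ValueError("not a debian/changelog header line")
    version = match.group(1)
    # Drop an optional epoch such as "1:".
    if ":" in version:
        version = version.split(":", 1)[1]
    # Drop the Debian revision, which follows the last hyphen.
    if "-" in version:
        version = version.rsplit("-", 1)[0]
    return version


# Hypothetical header line, used only for illustration.
print(upstream_version("gnupg2 (2.2.43-3) unstable; urgency=medium"))
```

    In a real build, dpkg-parsechangelog could supply the full version
    string in place of the hand-rolled header parsing.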

    doc/defsincdate: this file is generated upstream, and can potentially
    introduce non-reproducibility (see
    debian/patches/debian-packaging/avoid-regenerating-defsincdate-use-shipped-file.patch
    for more discussion). If we strip that file, and drop the above patch
    (or tune it so that it only works with $SOURCE_DATE_EPOCH) then we
    should be able to avoid unreproducibility. Doing so would mean that
    generated documentation files would have the timestamp of the changelog
    entry, though, rather than the timestamp of the upstream tarball.
    That might make (for example) a diffoscope comparison of shipped files
    between point releases unnecessarily noisy.
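
    A rough sketch of the "only works with $SOURCE_DATE_EPOCH" idea: take
    the timestamp from the environment when it is set, and otherwise fall
    back to a caller-supplied value. The function names are hypothetical,
    and the assumption that the timestamp is a plain count of epoch seconds
    has not been checked against what doc/defsincdate actually contains.

```python
import os
from datetime import datetime, timezone


def doc_timestamp(environ=os.environ, fallback=None):
    """Return the epoch seconds to stamp generated documentation with."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    if fallback is not None:
        return int(fallback)
    raise RuntimeError("no reproducible timestamp available")


def format_date(epoch):
    """Render the timestamp as a UTC date, independent of the build machine's timezone."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")
```

    Raising an error when neither source is available avoids silently
    falling back to the current time, which is what makes a build
    unreproducible.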

    common/{audit-events,status-codes}.h: these appear to be stripped and
    rebuilt in maintainer-mode. We're currently building (in at least one
    of our builds) in maintainer-mode, so it seems like we ought to be able
    to strip them and ensure that they get rebuilt, but i haven't tested.

    regexp/_unicode_mapping.c: this is another maintainer-mode file,
    generated from UnicodeData.txt. Looks like it contains a mapping
    between upper and lower case codepoints. Debian ships a more up-to-date
    UnicodeData.txt in the unicode-data package, which includes some
    codepoints (like GLAGOLITIC CAPITAL LETTER CAUDATE CHRIVI and GLAGOLITIC
    SMALL LETTER CAUDATE CHRIVI) that are paired casewise, but are not
    represented in this file. Maybe the right (and more up-to-date)
    solution is to build-depend on unicode-data, strip both this file and
    UnicodeData.txt in debian/clean, and patch to generate this file from
    /usr/share/unicode/UnicodeData.txt instead.
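
    To illustrate what such a mapping is derived from, here is a minimal
    sketch that collects upper/lower case pairs from UnicodeData.txt-style
    lines, using the simple uppercase and lowercase mapping fields (the
    13th and 14th semicolon-separated fields). The function name is
    hypothetical, and this does not reproduce the actual layout of
    regexp/_unicode_mapping.c or upstream's generator.

```python
def case_pairs(unicode_data_lines):
    """Return sorted (upper, lower) code point pairs from UnicodeData.txt-style lines."""
    pairs = set()
    for line in unicode_data_lines:
        fields = line.strip().split(";")
        if len(fields) < 15:
            continue  # skip blank or malformed lines
        code = int(fields[0], 16)
        if fields[13]:
            # This code point has a simple lowercase mapping: it is the upper half.
            pairs.add((code, int(fields[13], 16)))
        if fields[12]:
            # This code point has a simple uppercase mapping: it is the lower half.
            pairs.add((int(fields[12], 16), code))
    return sorted(pairs)


# Sample lines written out for illustration, following the documented field layout.
sample = [
    "2C2F;GLAGOLITIC CAPITAL LETTER CAUDATE CHRIVI;Lu;0;L;;;;;N;;;;2C5F;",
    "2C5F;GLAGOLITIC SMALL LETTER CAUDATE CHRIVI;Ll;0;L;;;;;N;;;2C2F;;2C2F",
]
for upper, lower in case_pairs(sample):
    print(f"U+{upper:04X} <-> U+{lower:04X}")
```

    In a Debian build the input would come from
    /usr/share/unicode/UnicodeData.txt rather than an inline sample.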

    I'm not sure what to do about the po/??.po files. They all appear to be
    modified/annotated (adding source code file and line number annotations)
    by upstream during "make dist" (when the tarball is created), and then
    our build process re-annotates them. Seems like it would be nicer to
    work with the unannotated files, because then we could apply patches
    that are simpler to port from version to version.

    I also don't fully understand the l10n mechanism used here: if
    po/en@boldquot.po, po/en@quot.po, and po/gnupg2.pot are generated during
    "make dist", it seems like we ought to be able to generate them
    ourselves directly, but i haven't tested.

    Happy to hear any suggestions about the right way forward to bring GnuPG
    in debian more in line with upstream's revision control, to reduce the
    amount of slippage that can be introduced in a tarball.

    If we could somehow prune to a state where we are building from (a
    subset of) the intersection of the upstream git tag and the released
    tarball, that would give us something concrete to automatically check on
    each version upgrade.
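
    A minimal sketch of what such an automatic check could look like,
    assuming both trees have already been loaded as mappings from relative
    path to file contents (for instance from a "git archive" of the tag and
    from the unpacked tarball). The function names are hypothetical, and
    how the two mappings get populated is left out.

```python
import hashlib


def digest(data: bytes) -> str:
    """Hash file contents so that trees can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def compare_trees(git_tree, tarball_tree):
    """Compare two {relative path: bytes} mappings.

    Returns (paths only in the tarball, paths present in both but differing).
    """
    only_in_tarball = sorted(set(tarball_tree) - set(git_tree))
    differing = sorted(
        path
        for path in set(tarball_tree) & set(git_tree)
        if digest(tarball_tree[path]) != digest(git_tree[path])
    )
    return only_in_tarball, differing


def check(git_tree, tarball_tree):
    """Return True only if everything in the tarball is accounted for by the git tag."""
    only_in_tarball, differing = compare_trees(git_tree, tarball_tree)
    return not only_in_tarball and not differing
```

    Files that exist only in git are deliberately ignored, since the goal
    is a subset of the intersection rather than strict equality.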

    --dkg

    -- System Information:
    Debian Release: trixie/sid
    APT prefers testing-debug
    APT policy: (500, 'testing-debug'), (500, 'testing'), (500, 'stable'), (500, 'oldstable'), (200, 'unstable-debug'), (200, 'unstable'), (1, 'experimental-debug'), (1, 'experimental')
    Architecture: amd64 (x86_64)

    Kernel: Linux 6.7.12-amd64 (SMP w/4 CPU threads; PREEMPT)
    Kernel taint flags: TAINT_FIRMWARE_WORKAROUND
    Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
    Shell: /bin/sh linked to /usr/bin/dash
    Init: systemd (via /run/systemd/system)

    -- no debconf information

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From NIIBE Yutaka@21:1/5 to Daniel Kahn Gillmor on Fri May 17 02:40:01 2024
    Hello,

    For your information, let me explain about regexp support.

    Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:
    > regexp/_unicode_mapping.c | 284 +
    > [...]
    > Maybe the right (and more up-to-date) solution is to build-depend on
    > unicode-data, strip both this file and UnicodeData.txt in
    > debian/clean, and patch to generate this file from
    > /usr/share/unicode/UnicodeData.txt instead.

    The regexp subdirectory was introduced to support POSIX regexp functions
    on Windows. The intention is to provide the same behavior for GnuPG
    across different operating systems. Historically, regexp in OpenPGP was
    not supported by Windows versions of GnuPG; in the past, when a user
    switched operating systems from a common POSIX one to Windows, regexp
    matching stopped working.

    If it is only for Debian, we could perhaps simply use the POSIX regexp
    functions in the C library.

    There are corner cases where regexp matching differs between regexp
    implementations and Unicode versions.

    Strictly speaking, as a matter of data specification, it would be more
    accurate to specify the exact Unicode version explicitly in the OpenPGP
    standard.
  • From Daniel Kahn Gillmor@21:1/5 to NIIBE Yutaka on Wed May 22 06:50:01 2024
    Hi gniibe--

    Thanks for this additional info!

    On Fri 2024-05-17 09:02:40 +0900, NIIBE Yutaka wrote:
    > The regexp subdirectory was introduced to support POSIX regexp functions
    > on Windows. The intention is to provide the same behavior for GnuPG
    > across different operating systems. Historically, regexp in OpenPGP was
    > not supported by Windows versions of GnuPG; in the past, when a user
    > switched operating systems from a common POSIX one to Windows, regexp
    > matching stopped working.
    >
    > If it is only for Debian, we could perhaps simply use the POSIX regexp
    > functions in the C library.

    If GnuPG doesn't use this regexp dir when building on Debian, that
    sounds fine to me :) Then we definitely don't need to use or ship that
    mapping file!

    > There are corner cases where regexp matching differs between regexp
    > implementations and Unicode versions.

    Yes, the regexp support in the standard is ill-specified in a lot of
    ways, and most implementations struggle to implement it properly, for
    all kinds of reasons.

    We don't have good interop tests for it yet because we haven't extended
    sop into certificate management. I should probably try to get that
    under way. :/

    > Strictly speaking, as a matter of data specification, it would be more
    > accurate to specify the exact Unicode version explicitly in the OpenPGP
    > standard.

    Unicode is supposed to evolve in a stable and sane way. I think tying
    OpenPGP to a specific version of Unicode would be a mistake; it's hard
    enough to upgrade OpenPGP as it is, without having to coordinate across
    versions of Unicode in the first place.

    --dkg
