• [gentoo-dev] About EGO_SUM

    From Florian Schmaus@21:1/5 to All on Fri Jun 3 13:20:01 2022
    EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
    acknowledge that there are packages where the usage of EGO_SUM is very
    problematic. However, I wonder if there are packages where using
    dependency tarballs is problematic while using EGO_SUM would not be.

    Take for example an ebuild containing

    SRC_URI="
        https://salsa.debian.org/baz/${PN}/-/archive/v${PV}/${PN}-v${PV}.tar.bz2
            -> ${P}.tar.bz2
        https://personal.site/files/gentoo/${P}-vendor.tar.xz
    "

    where ${P}-vendor.tar.xz is a Go dependency tarball, containing only a
    few Go modules. Hence EGO_SUM would contain only a few entries in this case.

    I see multiple issues of using dependency tarballs in such cases.

    First, my trust in a tarball created by someone and hosted somewhere is
    lower than my trust in the artifacts hosted on an official hub.
    Next, if anyone takes the time to review the contents of the dependency
    tarball, only Gentoo may benefit. On the other hand, if someone
    reviews EGO_SUM artifacts, the whole Go ecosystem benefits.

    I may not know Gentoo's mirror system in detail, but I believe using
    EGO_SUM facilitates cross-package distfile sharing, whereas dependency
    tarballs increase the space requirements and, probably more
    importantly, the load on the mirrors.

    Even more problematic is that dependency tarballs require additional
    steps that would not be required when EGO_SUM is used. While those steps
    appear simple, behavioral theory shows that even the tiniest additional
    steps have a huge impact (e.g., online shops lose a relatively large
    share of customers with each additional checkout step). If we force
    dependency tarballs for Go software, then packaging Go software just
    becomes a little bit harder.

    This leads me to the question: why are we actually deprecating EGO_SUM?
    It seems like a nice alternative for Go packaging that we may want to
    keep. But maybe I am missing something?

    - Flow


    1: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L108
    2: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L349-L352

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ionen Wolkens@21:1/5 to Florian Schmaus on Fri Jun 3 15:00:01 2022
    On Fri, Jun 03, 2022 at 01:18:08PM +0200, Florian Schmaus wrote:
    > EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
    > acknowledge that there are packages where the usage of EGO_SUM is very
    > problematic. However, I wonder if there are packages where using
    > dependency tarballs is problematic while using EGO_SUM would not be.
    >
    > Take for example an ebuild containing
    >
    > SRC_URI="
    >     https://salsa.debian.org/baz/${PN}/-/archive/v${PV}/${PN}-v${PV}.tar.bz2
    >         -> ${P}.tar.bz2
    >     https://personal.site/files/gentoo/${P}-vendor.tar.xz
    > "
    >
    > where ${P}-vendor.tar.xz is a Go dependency tarball, containing only a
    > few Go modules. Hence EGO_SUM would contain only a few entries in this case.
    >
    > I see multiple issues of using dependency tarballs in such cases.
    >
    > First, my trust in a tarball created by someone and hosted somewhere is
    > lower than my trust in the artifacts hosted on an official hub.
    > Next, if anyone takes the time to review the contents of the dependency
    > tarball, only Gentoo may benefit. On the other hand, if someone
    > reviews EGO_SUM artifacts, the whole Go ecosystem benefits.

    I do wonder what degree of verification is being done when these get
    merged at the moment. Ideally, upstream's go.sum would be used for
    verification at build time, but it isn't (I can go and change code in
    the vendor tarball and it still builds just fine at the moment).
    https://github.com/golang/go/issues/27348

    If I start merging these, I guess I'd end up writing myself a script to
    make my own tarball and check that it's identical to the proxied
    maintainer's.


    > I may not know Gentoo's mirror system in detail, but I believe using
    > EGO_SUM facilitates cross-package distfile sharing, whereas dependency
    > tarballs increase the space requirements and, probably more
    > importantly, the load on the mirrors.
    >
    > Even more problematic is that dependency tarballs require additional
    > steps that would not be required when EGO_SUM is used. While those steps
    > appear simple, behavioral theory shows that even the tiniest additional
    > steps have a huge impact (e.g., online shops lose a relatively large
    > share of customers with each additional checkout step). If we force
    > dependency tarballs for Go software, then packaging Go software just
    > becomes a little bit harder.
    >
    > This leads me to the question: why are we actually deprecating EGO_SUM?
    > It seems like a nice alternative for Go packaging that we may want to
    > keep. But maybe I am missing something?

    I missed bits and pieces, but was never quite sure why this went toward
    full deprecation. Just discouraging it may have been fair enough, or
    (maybe?) imposing a limit at which the eclass tells you to use a
    vendor tarball, so this doesn't get constantly ignored and bring us
    back to square one.

    That said, I don't work with Go packages, so I don't have much to say
    here. FWIW, there is one Rust ebuild for which I'm thinking of using a
    vendor tarball due to a ridiculous number of crates, while e.g.
    media-libs/cubeb has only 12. So I'm happy I can choose (not that Rust
    is as bad as Go in that regard).


    > 1: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L108
    > 2: https://github.com/gentoo/gentoo/blob/9fec686abf789fdff36a90c3763d9558203cbf9a/eclass/go-module.eclass#L349-L352


    --
    ionen

  • From Robin H. Johnson@21:1/5 to Florian Schmaus on Wed Jun 8 22:50:01 2022
    On Fri, Jun 03, 2022 at 01:18:08PM +0200, Florian Schmaus wrote:
    > EGO_SUM is marked as 'deprecated' in go-module.eclass [1, 2]. I
    > acknowledge that there are packages where the usage of EGO_SUM is very
    > problematic. However, I wonder if there are packages where using
    > dependency tarballs is problematic while using EGO_SUM would not be.
    > ... [snip all the great points]
    > Even more problematic is that dependency tarballs require additional
    > steps that would not be required when EGO_SUM is used. While those steps
    > appear simple, behavioral theory shows that even the tiniest additional
    > steps have a huge impact (e.g., online shops lose a relatively large
    > share of customers with each additional checkout step). If we force
    > dependency tarballs for Go software, then packaging Go software just
    > becomes a little bit harder.
    Your point above is entirely correct, and I was against the plan to
    introduce dependency tarballs.

    > This leads me to the question: why are we actually deprecating EGO_SUM?
    > It seems like a nice alternative for Go packaging that we may want to
    > keep. But maybe I am missing something?

    EGO_SUM vs dependency tarballs:
    - bloats ebuilds
    - bloats Manifests
    - bloats metadata/md5-cache/ (SRC_URI etc)
    - doesn't bloat mirrors with gentoo-unique distfiles
    - EGO_SUM is verifiable/reproducible from Upstream Go systems
    - fewer downloads on upgrades (only changed Go deps, not entire dep tarballs)

    EGO_SUM data right now adds, to every user's system:
    - 2.6MB of text to ebuilds (340k after de-dupe)
    - 7MB of text to Manifests (2M after de-dupe)
    - 6.4MB+ of text to metadata/md5-cache (I don't have an easy way to
    calculate the deduped amount here)
    On the server side:
    - The sum total of Go distfiles mirrored on Gentoo mirrors right now is only 3.4GB.
    - fewer downloads

    Dependency tarballs:
    - Right now ~15GiB on each mirror, plus storage of the primary copy
    somewhere (dev.g.o right now, but not great)
    - Conservatively, if the remaining EGO_SUM packages were converted to
    dependency tarballs, they would need another 8GB each on the primary
    location and the mirrors.
    - larger downloads for users who DO want to upgrade a Go package (a
    whole new deps tarball even if only one or two deps changed)
    - must be preserved much longer, unless we can introduce a guaranteed
    way to regenerate them for any prior ebuild.

    I was trying to introduce a third option, but I haven't had the time to
    write an entire GLEP.

    The TL;DR is introducing a 2nd-level Manifest+metadata file that tries
    to move just the metadata out of the tree, in a way that can be
    regenerated (specifically, a 1:1 reproducible creation from a given go.sum).
    It DOES need to contain slightly more data than the present Manifest,
    specifically a full SRC_URI entry for each file (the upstream URI plus
    what to rename it to on the Gentoo side).

    The 2nd-level Manifest would be listed in SRC_URI and handled in
    src_fetch/src_unpack: download & verify the extra distfiles against the
    Manifest checksum data (and, for Golang, against the go.sum checksums).

    The Portage mirrordist code needs the most work in this case, as it
    would need to fetch the 2nd-level Manifests so it can populate the Gentoo
    mastermirror with the distfiles mirrored from upstream.

    The storage costs for the proposed idea:
    - same 1:1 base distfile storage as EGO_SUM (i.e. upstream distfiles are
    mirrored with 1:1 content, just under different names)
    - Probably 1 Metadata-Manifest file per ebuild $PVR (conceptually it
    could be split more or shared between some ebuilds/packages)
    - Main tree Manifests: 1 DIST entry per Metadata-Manifest in a given package
    - Main tree ebuilds: 1 line for the Metadata-Manifest in the ebuild.
    - metadata/md5-cache: 1 src_uri line!
    - mirrors: add the Metadata-Manifest

    --
    Robin Hugh Johnson
    Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
    E-Mail : robbat2@gentoo.org
    GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
    GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

  • From Madhu@21:1/5 to All on Thu Jun 9 08:20:01 2022
    * "Robin H. Johnson" <robbat2-20220608T184338-394361540Z@orbis-terrarum.net> : Wrote on Wed, 8 Jun 2022 20:42:48 +0000:
    > EGO_SUM vs dependency tarballs:
    > - bloats ebuilds
    > - bloats Manifests
    > - bloats metadata/md5-cache/ (SRC_URI etc)
    > - doesn't bloat mirrors with gentoo-unique distfiles
    > - EGO_SUM is verifiable/reproducible from Upstream Go systems
    > - fewer downloads on upgrades (only changed Go deps, not entire dep tarballs)
    >
    > EGO_SUM data right now adds, to every user's system:
    > - 2.6MB of text to ebuilds (340k after de-dupe)
    > - 7MB of text to Manifests (2M after de-dupe)
    > - 6.4MB+ of text to metadata/md5-cache (I don't have an easy way to
    > calculate the deduped amount here)
    > On the server side:
    > - The sum total of Go distfiles mirrored on Gentoo mirrors right now
    > is only 3.4GB.
    > - fewer downloads
    >
    > Dependency tarballs:
    > - Right now ~15GiB on each mirror, plus storage of the primary copy
    > somewhere (dev.g.o right now, but not great)
    > - Conservatively, if the remaining EGO_SUM packages were converted to
    > dependency tarballs, they would need another 8GB each on the primary
    > location and the mirrors.
    > - larger downloads for users who DO want to upgrade a Go package (a
    > whole new deps tarball even if only one or two deps changed)
    > - must be preserved much longer, unless we can introduce a guaranteed
    > way to regenerate them for any prior ebuild.
    >
    > I was trying to introduce a third option, but I haven't had the time to
    > write an entire GLEP.
    >
    > The TL;DR is introducing a 2nd-level Manifest+metadata file that tries
    > to move just the metadata out of the tree, in a way that can be
    > regenerated (specifically, a 1:1 reproducible creation from a given go.sum).
    > It DOES need to contain slightly more data than the present Manifest,
    > specifically a full SRC_URI entry for each file (the upstream URI plus
    > what to rename it to on the Gentoo side).
    >
    > The 2nd-level Manifest would be listed in SRC_URI and handled in
    > src_fetch/src_unpack: download & verify the extra distfiles against the
    > Manifest checksum data (and, for Golang, against the go.sum checksums).
    >
    > The Portage mirrordist code needs the most work in this case, as it
    > would need to fetch the 2nd-level Manifests so it can populate the Gentoo
    > mastermirror with the distfiles mirrored from upstream.
    >
    > The storage costs for the proposed idea:
    > - same 1:1 base distfile storage as EGO_SUM (i.e. upstream distfiles are
    > mirrored with 1:1 content, just under different names)
    > - Probably 1 Metadata-Manifest file per ebuild $PVR (conceptually it
    > could be split more or shared between some ebuilds/packages)
    > - Main tree Manifests: 1 DIST entry per Metadata-Manifest in a given package
    > - Main tree ebuilds: 1 line for the Metadata-Manifest in the ebuild.
    > - metadata/md5-cache: 1 src_uri line!
    > - mirrors: add the Metadata-Manifest

    [Without claiming to have fully understood the proposal above: around
    Apr 15th '22 I tried suggesting to WilliamH on IRC that perhaps Portage
    should implement the dirhash approach that Go took to solve the
    problem of authenticating upstream sources when they invented go.sum.

    From hash.go in the Go sources
    (go/src/cmd/vendor/golang.org/x/mod/sumdb/dirhash/hash.go):

    // Hash1 is "h1:" followed by the base64-encoded SHA-256 hash of a
    // summary prepared as if by the Unix command:
    //
    //     find . -type f | sort | sha256sum

    Loosely speaking, the "manifest" could publish this dirhash of the
    contents of go-mod/cache (which would have been bundled in the -deps.tar.xz).

    The immediate motivation was to avoid the network when I already had
    the sources locally: instead of downloading a -deps.tar.xz, I could
    create it locally and drop it into DISTDIR. Portage would check the
    (hypothetically) published dirhash and let it through; the local
    timestamps and UIDs in my tarball and in the upstream tarball wouldn't
    upset it.

    One unchecked assumption is that go-mod/cache can be recreated by
    unpacking the sources. If so, then with a notion of a "second-level
    manifest" (the equivalent of go.sum), the contents can be assembled
    without having to store or download the actual -deps tarball.

    I didn't get very far in convincing WilliamH of my need, so I dropped
    the idea. (I'm not sure if I'm being any clearer; if I'm missing
    something, do let me know.)]

  • From Sebastian Pipping@21:1/5 to Robin H. Johnson on Thu Jun 9 19:50:01 2022
    On 08.06.22 22:42, Robin H. Johnson wrote:
    > EGO_SUM vs dependency tarballs:
    > [..]
    > - EGO_SUM is verifiable/reproducible from Upstream Go systems

    Let's be explicit, there is a _security_ threat here: as a user of an
    ebuild, dependency tarballs now take effort in manual review just to
    confirm that the content fully matches its supposed list of ingredients.
    They are the perfect place to hide malicious code in plain sight. Now
    with dependency tarballs, there is a new layer that by design will
    likely be chronically under-audited. It gives me shivers, frankly.
    Previously, with a manifest and upstream-only URLs, only upstream could
    add malicious code, not downstream in Gentoo.

    Best
    Sebastian

  • From Anna@21:1/5 to Sebastian Pipping on Thu Jun 9 20:20:01 2022
    On 2022-06-09 19:49, Sebastian Pipping wrote:
    > On 08.06.22 22:42, Robin H. Johnson wrote:
    >> EGO_SUM vs dependency tarballs:
    >> [..]
    >> - EGO_SUM is verifiable/reproducible from Upstream Go systems
    >
    > Let's be explicit, there is a _security_ threat here: as a user of an
    > ebuild, dependency tarballs now take effort in manual review just to
    > confirm that the content fully matches its supposed list of ingredients.
    > They are the perfect place to hide malicious code in plain sight. Now
    > with dependency tarballs, there is a new layer that by design will
    > likely be chronically under-audited. It gives me shivers, frankly.
    > Previously, with a manifest and upstream-only URLs, only upstream could
    > add malicious code, not downstream in Gentoo.

    I think dependency tarballs are a temporary solution. Maintainers should
    send upstream patches for their release CI/scripts to include the
    "vendor" directory.

    Seems like there will be an option in goreleaser soon: https://github.com/goreleaser/goreleaser/issues/2911

    I do it with just 'go' and 'tar' for the time being.

  • From John Helmert III@21:1/5 to Sebastian Pipping on Thu Jun 9 20:40:01 2022
    On Thu, Jun 09, 2022 at 07:49:04PM +0200, Sebastian Pipping wrote:
    > On 08.06.22 22:42, Robin H. Johnson wrote:
    >> EGO_SUM vs dependency tarballs:
    >> [..]
    >> - EGO_SUM is verifiable/reproducible from Upstream Go systems
    >
    > Let's be explicit, there is a _security_ threat here: as a user of an
    > ebuild, dependency tarballs now take effort in manual review just to
    > confirm that the content fully matches its supposed list of ingredients.
    > They are the perfect place to hide malicious code in plain sight. Now
    > with dependency tarballs, there is a new layer that by design will
    > likely be chronically under-audited. It gives me shivers, frankly.
    > Previously, with a manifest and upstream-only URLs, only upstream could
    > add malicious code, not downstream in Gentoo.

    There are many packages in ::gentoo that use tarballs of patches
    written and hosted by Gentoo developers, or tarballs of source code
    generated by developers themselves. A (very) rough grep shows this is
    very prevalent:

    ~/gentoo/gentoo $ grep -r 'SRC_URI.*dev.gentoo.org' | wc -l
    2845

    So this problem isn't really new. Users are required to trust Gentoo
    packagers not to do naughty things to the source code, more or less
    just like with any other distribution.