• [gentoo-dev] [GLEP78] Updating specification

    From Ulrich Mueller@21:1/5 to All on Mon Sep 13 12:10:02 2021
    On Mon, 13 Sep 2021, Sheng Yu wrote:

    -The archive contains a number of files, stored in a single directory
    -whose name should match the basename of the package file. However,
    -the implementation must be able to process an archive where
    -the directory name is mismatched. There should be no explicit archive
    -member entry for the directory.
    +The archive contains a number of files. All package-related files
    +should be stored in a single directory whose name matches the CPV of
    +the package file. However, the implementation must be able to process
    +an archive where the directory name is mismatched. There should be no
    +explicit archive member entry for the directory.

    I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
    so it cannot be the name of a directory. Also, what about the package
    revision?

    +6. The package manifest data file ``Manifest`` (required).
    +
    +7. A signature for the package Manifest file ``Manifest.sig``
    + (optional).

    Given that the outer archive is uncompressed tar, every file will be
    zero-padded to a full block which adds some amount of bloat. So, could
    the signature be inlined in the Manifest file? That's also what GLEP 74
    specifies.
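
The padding cost mentioned above can be estimated with a minimal sketch, assuming the plain ustar layout (one 512-byte header block per member, data padded up to the next 512-byte boundary). The member names and the 310-byte signature size are hypothetical, picked only for illustration.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def member_footprint(size: int) -> int:
    """Bytes one regular file occupies in a plain tar: header block plus padded data."""
    data_blocks = -(-size // BLOCK)  # ceiling division
    return BLOCK + data_blocks * BLOCK


def measured_footprint(size: int) -> int:
    """Cross-check the formula by writing two members and comparing their offsets."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in ("pkg/Manifest.sig", "pkg/next"):  # hypothetical names
            info = tarfile.TarInfo(name)
            info.size = size
            tar.addfile(info, io.BytesIO(b"x" * size))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        first, second = tar.getmembers()
        # Distance between the two header offsets is the first member's footprint.
        return second.offset - first.offset


if __name__ == "__main__":
    # A hypothetical 310-byte detached signature occupies two full blocks.
    print(member_footprint(310), measured_footprint(310))
```

Under these assumptions a separate small file always costs at least 1024 bytes in the archive, whereas inlining the signature only grows the Manifest's own data blocks.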

    Also, IIRC one of the goals of the format was to allow partial download
    of metadata. That will only work if the Manifest file will be the first
    file in the archive (or at least appear before the image archive).

    +The implementation follows the Manifest specifications in GLEP 74
    +[#GLEP74]_ and uses the DATA tag for files within the archive.

    AFAICS, GLEP 74 specifies an OpenPGP cleartext signature in the file
    itself, not a detached signature.
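
To make the distinction concrete, here is a rough sketch of how a reader could recognise the OpenPGP cleartext framing and recover the Manifest body from it. It only handles the framing (armor header lines, dash-escaping); it does not verify the signature, which would need a real OpenPGP implementation. The sample Manifest entry and its hash value are made up for illustration.

```python
BEGIN_MSG = "-----BEGIN PGP SIGNED MESSAGE-----"
BEGIN_SIG = "-----BEGIN PGP SIGNATURE-----"


def is_cleartext_signed(text: str) -> bool:
    """True if the text starts with the cleartext signature header line."""
    return text.lstrip().startswith(BEGIN_MSG)


def extract_cleartext(text: str) -> str:
    """Return the signed body with dash-escaping removed. No verification is done."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != BEGIN_MSG:
        raise ValueError("not an OpenPGP cleartext-signed document")
    try:
        body_start = lines.index("", 1) + 1  # a blank line ends the "Hash:" headers
        sig_start = lines.index(BEGIN_SIG, body_start)
    except ValueError:
        raise ValueError("malformed cleartext signature framing") from None
    # Lines that begin with "-" are stored as "- " + line inside the signed text.
    body = [ln[2:] if ln.startswith("- ") else ln for ln in lines[body_start:sig_start]]
    return "\n".join(body)


# Placeholder document: the DATA entry and the signature body are not real.
SAMPLE = """\
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

DATA metadata.tar 1024 SHA512 0123abcd
-----BEGIN PGP SIGNATURE-----

(base64 signature data would go here)
-----END PGP SIGNATURE-----
"""

if __name__ == "__main__":
    print(extract_cleartext(SAMPLE))
```

With a detached signature, by contrast, the Manifest would stay byte-for-byte unmodified and the signature would live in a second archive member.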

    Ulrich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michał Górny@21:1/5 to Ulrich Mueller on Mon Sep 13 23:10:02 2021
    On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
    On Mon, 13 Sep 2021, Sheng Yu wrote:

    -The archive contains a number of files, stored in a single directory
    -whose name should match the basename of the package file. However,
    -the implementation must be able to process an archive where
    -the directory name is mismatched. There should be no explicit archive
    -member entry for the directory.
    +The archive contains a number of files. All package-related files
    +should be stored in a single directory whose name matches the CPV of
    +the package file. However, the implementation must be able to process
    +an archive where the directory name is mismatched. There should be no
    +explicit archive member entry for the directory.

    I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
    so it cannot be the name of a directory. Also, what about the package
    revision?

    Please restore the previous wording. The GLEP deliberately did not
    enforce a specific filename because it's about internal format.


    +6. The package manifest data file ``Manifest`` (required).
    +
    +7. A signature for the package Manifest file ``Manifest.sig``
    + (optional).

    Given that the outer archive is uncompressed tar, every file will be
    zero-padded to a full block which adds some amount of bloat. So, could
    the signature be inlined in the Manifest file? That's also what GLEP 74
    specifies.

    Using inline signature in Manifest makes sense.


    Also, IIRC one of the goals of the format was to allow partial download
    of metadata. That will only work if the Manifest file will be the first
    file in the archive (or at least appear before the image archive).

    I disagree. This is solved by having detached metadata signature -- you
    can do a partial fetch and verify the metadata directly.

    On the other hand, putting Manifest first would make it impossible to
    create the archive from data stream without using temporary files,
    effectively doubling the needed free space. Well, technically you could
    just reserve space and write Manifest later but that would strongly
    depend on the size of PGP signature and that's not something I'd feel
    comfortable relying on.
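
The ordering trade-off can be illustrated with a small sketch: a streaming reader can stop early only if the member it wants precedes the large image. The member names `pkg/metadata.tar` and `pkg/image.tar` and the payload sizes are placeholders, and the in-memory reader merely stands in for a partial network fetch.

```python
import io
import tarfile


class CountingReader:
    """Minimal file-like object that records how many bytes were pulled from it."""

    def __init__(self, raw: bytes):
        self._buf = io.BytesIO(raw)
        self.consumed = 0

    def read(self, n: int = -1) -> bytes:
        chunk = self._buf.read(n)
        self.consumed += len(chunk)
        return chunk

    def close(self) -> None:
        self._buf.close()


def build_archive(members) -> bytes:
    """Write (name, payload) pairs to an uncompressed tar, in the given order."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return out.getvalue()


def fetch_member(raw: bytes, wanted: str):
    """Stream the archive and stop as soon as `wanted` has been read."""
    reader = CountingReader(raw)
    with tarfile.open(fileobj=reader, mode="r|") as tar:  # "r|" = non-seekable stream
        for member in tar:
            if member.name == wanted:
                data = tar.extractfile(member).read()
                return data, reader.consumed
    raise KeyError(wanted)


if __name__ == "__main__":
    meta, image = b"m" * 100, b"\0" * (1 << 20)
    first = build_archive([("pkg/metadata.tar", meta), ("pkg/image.tar", image)])
    last = build_archive([("pkg/image.tar", image), ("pkg/metadata.tar", meta)])
    print(fetch_member(first, "pkg/metadata.tar")[1])  # a few KiB
    print(fetch_member(last, "pkg/metadata.tar")[1])   # more than the whole image
```

The same sketch also shows the writer-side cost raised above: a member that summarises everything else can only be written first if its contents, and therefore all other members, are already known.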

    --
    Best regards,
    Michał Górny

  • From Rich Freeman@21:1/5 to mgorny@gentoo.org on Tue Sep 14 00:10:01 2021
    On Mon, Sep 13, 2021 at 5:02 PM Michał Górny <mgorny@gentoo.org> wrote:

    On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:

    Also, IIRC one of the goals of the format was to allow partial download
    of metadata. That will only work if the Manifest file will be the first
    file in the archive (or at least appear before the image archive).

    I disagree. This is solved by having detached metadata signature -- you
    can do a partial fetch and verify the metadata directly.


    Another option I've tossed out there in the past is having a content
    hash of the metadata and putting that in the filename. That obviously
    won't tell you anything about the contents of the file without reading
    it, but if you're looking for a file with specific metadata you could
    predict its filename. This was intended to work with having multiple
    hashes for the same file using subsets of the metadata, using symbolic
    links.

    The thinking here is that you'd just hash a subset of metadata useful
    for identifying what file you'd want to download, such as CHOST,
    linked dependency versions, use flags, etc. You'd probably hash it
    with/without stuff like use flags so that you could either take a shot
    at getting the file exactly configured how you want, or accepting a
    version with any set of flags.

    Of course, this idea goes in direct opposition to your statement about
    not wanting to specify the filename. I get that argument. The intent
    here was to allow portage to go hunting through trusted repositories
    to find packages it can use without having to sync a lot of data - if
    you know the exact filename then a simple GET tells you if it is there
    or not.

    --
    Rich

  • From Sheng Yu@21:1/5 to mgorny@gentoo.org on Tue Sep 14 01:30:01 2021
    On Monday, September 13th, 2021 at 17:02, Michał Górny <mgorny@gentoo.org> wrote:
    On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:
    On Mon, 13 Sep 2021, Sheng Yu wrote:

    -The archive contains a number of files, stored in a single directory
    -whose name should match the basename of the package file. However,
    -the implementation must be able to process an archive where
    -the directory name is mismatched. There should be no explicit archive
    -member entry for the directory.
    +The archive contains a number of files. All package-related files
    +should be stored in a single directory whose name matches the CPV of
    +the package file. However, the implementation must be able to process
    +an archive where the directory name is mismatched. There should be no
    +explicit archive member entry for the directory.

    I wonder about CPV here. That's ${CATEGORY}/${P} and contains a slash,
    so it cannot be the name of a directory. Also, what about the package
    revision?

    Please restore the previous wording. The GLEP deliberately did not
    enforce a specific filename because it's about internal format.

    Got it, but maybe we should add a requirement for human readability,
    since users should not have to check the data inside the metadata.


    +6. The package manifest data file ``Manifest`` (required).
    +
    +7. A signature for the package Manifest file ``Manifest.sig``
    + (optional).

    Given that the outer archive is uncompressed tar, every file will be
    zero-padded to a full block which adds some amount of bloat. So, could
    the signature be inlined in the Manifest file? That's also what GLEP 74
    specifies.

    Using inline signature in Manifest makes sense.

    This makes sense but leads to another problem: we allow user-defined
    GPG commands, which gives us no control over exactly what format is
    generated. And I do not feel that hard-coding "--clear-sign" and
    "--detach-sign" is good practice.


    Also, IIRC one of the goals of the format was to allow partial download
    of metadata. That will only work if the Manifest file will be the first
    file in the archive (or at least appear before the image archive).

    I disagree. This is solved by having detached metadata signature -- you
    can do a partial fetch and verify the metadata directly.

    On the other hand, putting Manifest first would make it impossible to
    create the archive from data stream without using temporary files,
    effectively doubling the needed free space. Well, technically you could
    just reserve space and write Manifest later but that would strongly
    depend on the size of PGP signature and that's not something I'd feel
    comfortable relying on.


    Reserving space also wastes extra space and needs a padding file.

    Thanks,
    Sheng Yu

  • From Sheng Yu@21:1/5 to Rich Freeman on Tue Sep 14 01:50:01 2021
    On Monday, September 13th, 2021 at 18:04, Rich Freeman <rich0@gentoo.org> wrote:

    On Mon, Sep 13, 2021 at 5:02 PM Michał Górny <mgorny@gentoo.org> wrote:

    On Mon, 2021-09-13 at 12:08 +0200, Ulrich Mueller wrote:

    Also, IIRC one of the goals of the format was to allow partial download
    of metadata. That will only work if the Manifest file will be the first
    file in the archive (or at least appear before the image archive).

    I disagree. This is solved by having detached metadata signature -- you
    can do a partial fetch and verify the metadata directly.


    Another option I've tossed out there in the past is having a content
    hash of the metadata and putting that in the filename. That obviously
    won't tell you anything about the contents of the file without reading
    it, but if you're looking for a file with specific metadata you could
    predict its filename. This was intended to work with having multiple
    hashes for the same file using subsets of the metadata, using symbolic
    links.

    The thinking here is that you'd just hash a subset of metadata useful
    for identifying what file you'd want to download, such as CHOST,
    linked dependency versions, use flags, etc. You'd probably hash it
    with/without stuff like use flags so that you could either take a shot
    at getting the file exactly configured how you want, or accepting a
    version with any set of flags.

    Of course, this idea goes in direct opposition to your statement about
    not wanting to specify the filename. I get that argument. The intent
    here was to allow portage to go hunting through trusted repositories
    to find packages it can use without having to sync a lot of data - if
    you know the exact filename then a simple GET tells you if it is there
    or not.

    Interesting concept, although this should be counted as part of
    binpkg-multi-instance: a predictable configuration hash, rather than
    relying on the index to tell the variants apart.

    Something like:
    bar/foo-1.0-r2-e3b0c44298fc1c149afbf4c8996fb9.gpkg.tar
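
A minimal sketch of how such a name could be derived, assuming SHA-256 over a sorted `key=value` serialization of the selected metadata. The key names, the serialization and the 30-character truncation are all assumptions; the truncation is chosen only because the example suffix above is 30 hex digits long.

```python
import hashlib


def config_hash(metadata: dict, keys, length: int = 30) -> str:
    """Hash the selected metadata keys in a canonical (sorted) order."""
    canonical = "\n".join(f"{k}={metadata[k]}" for k in sorted(keys) if k in metadata)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def binpkg_name(category: str, pf: str, metadata: dict, keys) -> str:
    """Build a predictable file name from the package name and the config hash."""
    return f"{category}/{pf}-{config_hash(metadata, keys)}.gpkg.tar"


if __name__ == "__main__":
    # Hypothetical metadata values, for illustration only.
    meta = {"CHOST": "x86_64-pc-linux-gnu", "USE": "ssl -debug"}
    # One name that pins the USE flags, one that accepts any USE flags.
    print(binpkg_name("bar", "foo-1.0-r2", meta, ["CHOST", "USE"]))
    print(binpkg_name("bar", "foo-1.0-r2", meta, ["CHOST"]))
```

A client that knows its own CHOST and USE flags could compute both names locally and probe a repository with a plain GET, and the less specific name could be a symbolic link to the same file. With an empty key selection this sketch hashes empty input, which yields exactly the suffix in the example name above.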

    Thanks,
    Sheng Yu
