• [RFC PATCH] libdpkg: Sort files within the package.

    From Sebastian Andrzej Siewior@21:1/5 to All on Mon Jul 3 21:40:01 2023
    This started in 2021 and I have no idea what to do with it so here it
    is. Let me get back to how it started:
    I was looking into why it takes some time after apt has downloaded
    the packages before dpkg starts installing them. As it turns out, I had
    apt-listchanges installed, which decompresses all .deb files looking for
    "NEWS". That was the pause. Then dpkg decompresses them again while
    installing. So a bit of waste…

    Anyway. I thought it would speed things up by moving the files of
    interest to the front of the archive. Then I realized that it would
    probably require a new interface to query this sort of information or
    otherwise it would require some heuristic to decide whether or not the
    files of interest are to be expected at the front and then kill dpkg
    midway. Then I kind of stopped playing with it.
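The early-exit idea above can be illustrated with a minimal sketch. It assumes an already decompressed, plain ustar image held in memory, matches on the 100-byte name field only (no prefix field, no GNU long names) and does not verify header checksums. The names `tar_bytes_until` and `tar_put` are hypothetical and are not part of dpkg or apt-listchanges; `tar_put` exists only to build a toy archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TAR_BLOCK 512

/* Round a member size up to a whole number of tar blocks. */
static size_t
tar_padded(size_t size)
{
	return (size + TAR_BLOCK - 1) / TAR_BLOCK * TAR_BLOCK;
}

/* Scan a tar image for a member called "want". Returns how many bytes had
 * to be consumed up to the end of that member, or 0 if it is not found.
 * The earlier the member sits in the archive, the smaller the result. */
static size_t
tar_bytes_until(const unsigned char *buf, size_t len, const char *want)
{
	size_t off = 0;

	assert(buf != NULL && want != NULL);
	/* A header block starting with NUL marks the end of the archive. */
	while (off + TAR_BLOCK <= len && buf[off] != '\0') {
		char name[101], size_field[13];
		size_t next;

		memcpy(name, buf + off, 100);
		name[100] = '\0';
		memcpy(size_field, buf + off + 124, 12);
		size_field[12] = '\0';
		/* The size field is an octal string. */
		next = off + TAR_BLOCK + tar_padded(strtoul(size_field, NULL, 8));
		if (next > len)
			return 0;	/* Truncated archive. */
		if (strcmp(name, want) == 0)
			return next;	/* Stop here, skip the rest. */
		off = next;
	}
	return 0;
}

/* Toy helper: write a header for a member at "off" in a zeroed buffer and
 * return the offset of the next member. No checksum is written. */
static size_t
tar_put(unsigned char *buf, size_t off, const char *name, size_t size)
{
	unsigned char *h = buf + off;

	memset(h, 0, TAR_BLOCK);
	strncpy((char *)h, name, 99);
	snprintf((char *)h + 124, 12, "%011lo", (unsigned long)size);
	return off + TAR_BLOCK + tar_padded(size);
}
```

With the NEWS file as the first member, the scan stops after that member's blocks. With the file at the end, or absent, the whole stream has to be walked, which is why some interface or heuristic is needed to decide when to give up.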
    A few days later I remembered that RAR had (has?) this "solid" kind
    of archive, where it improves compression by grouping files of the same
    kind (file extension) together. So I altered the original patch a bit and
    made this. I tested this by re-compressing a few .debs, and the "ordered"
    .debs got smaller by a few bytes / KiBs but not by an order of magnitude.
    Then I forgot all about it.
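The grouping by file extension described above can be sketched as a `qsort()` comparator over pathnames. This is only an illustration under assumptions: it is not the ordering code from the patch, and `path_ext`, `path_cmp_by_ext` and `sort_paths_by_ext` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the extension of the last path component, or "" if there is none.
 * Dotfiles such as ".profile" are treated as having no extension. */
static const char *
path_ext(const char *path)
{
	const char *base = strrchr(path, '/');
	const char *dot;

	base = base ? base + 1 : path;
	dot = strrchr(base, '.');
	if (dot == NULL || dot == base)
		return "";
	return dot + 1;
}

/* qsort() comparator: order by extension first, then by full pathname,
 * so files of the same kind end up next to each other. */
static int
path_cmp_by_ext(const void *a, const void *b)
{
	const char *pa = *(const char *const *)a;
	const char *pb = *(const char *const *)b;
	int rc = strcmp(path_ext(pa), path_ext(pb));

	return rc ? rc : strcmp(pa, pb);
}

/* Sort an array of pathnames in place. */
static void
sort_paths_by_ext(const char **paths, size_t n)
{
	assert(paths != NULL || n == 0);
	if (n > 1)
		qsort(paths, n, sizeof(*paths), path_cmp_by_ext);
}
```

Falling back to the full pathname keeps the order deterministic, which matters if the resulting archive is meant to be reproducible.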

    I just remembered it all while doing openssl-dpkg patches the other week
    and just cleaned up the branch.

    Now on the serious side:
    - Does this look useful?
    - Any suggestions on how to speed up apt-listchanges? A special interface,
      or better to remove the package?

    Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
    ---
    lib/dpkg/treewalk.c | 11 +++
    lib/dpkg/treewalk.h | 2 +
    src/deb/build.c | 192 ++++++++++++++++++++++++++++++++++++++++----
    3 files changed, 191 insertions(+), 14 deletions(-)

    diff --git a/lib/dpkg/treewalk.c b/lib/dpkg/treewalk.c
    index b9a6207f60f52..6c4bcd6728759 100644
    --- a/lib/dpkg/treewalk.c
    +++ b/lib/dpkg/treewalk.c
    @@ -217,6 +217,17 @@ treenode_cmp(const void *a, const void *b)
    (*(const struct treenode **)b)->name);
    }

    +int treenode_sort_nop(struct treenode *dir)
    +{
    +	size_t i;
    +
    +	/* Relink the nodes. */
    +	for (i = 0; i < dir->down_used - 1; i++)
    +		dir->down[i]->next = dir->down[i + 1];
    +	dir->down[i]->next = NULL;
    +	return 0;
    +}
    +
    static void
    treenode_sort_down(struct treenode *dir)
    {
    diff --git a/lib/dpkg/treewalk.h b/lib/dpkg/treewalk.h
    index 16b7a310cf939..918808c4a02da 100644
    --- a/lib/dpkg/treewalk.h
    +++ b/lib/dpkg/treewalk.h
    @@ -54,6 +54,8 @@ struct treewalk_funcs {
    treenode_skip_func *skip;
    };

    +int treenode_sort_nop(
  • From Guillem Jover@21:1/5 to Sebastian Andrzej Siewior on Sat Jul 22 13:50:01 2023
    Hi!

    On Mon, 2023-07-03 at 21:37:08 +0200, Sebastian Andrzej Siewior wrote:
    > This started in 2021 and I have no idea what to do with it so here it
    > is. Let me get back to how it started:
    > I was looking into why it takes some time after apt has downloaded
    > the packages before dpkg starts installing them. As it turns out, I had
    > apt-listchanges installed, which decompresses all .deb files looking for
    > "NEWS". That was the pause. Then dpkg decompresses them again while
    > installing. So a bit of waste…

    Yes.

    > Now on the serious side:
    > - Does this look useful?

    While I think the current situation is not ideal, I find hardcoding
    pathname assumptions into dpkg-deb or dpkg to be a non-starter.
    This additionally would mean the output from for example «dpkg -L» is
    not "sorted" anymore, and dpkg can currently not sort it further from
    its side because it needs to preserve the property of listing symlinks
    last (which your patch does, but regresses on the non-symlink cases).
    Sorting this from the dpkg side would be possible once it has fsys
    metadata tracking information.
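The ordering property mentioned above (entries sorted, with symlinks listed last) can be sketched as a two-key comparator. The `struct entry` type and both function names are hypothetical stand-ins for illustration; dpkg's own data structures and sorting code differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical file entry; not a dpkg structure. */
struct entry {
	const char *name;
	bool is_symlink;
};

/* qsort() comparator: non-symlinks first, symlinks last, and each of the
 * two groups ordered by pathname. */
static int
entry_cmp(const void *a, const void *b)
{
	const struct entry *ea = a;
	const struct entry *eb = b;

	if (ea->is_symlink != eb->is_symlink)
		return ea->is_symlink ? 1 : -1;
	return strcmp(ea->name, eb->name);
}

/* Sort an array of entries in place. */
static void
sort_entries(struct entry *entries, size_t n)
{
	assert(entries != NULL || n == 0);
	if (n > 1)
		qsort(entries, n, sizeof(*entries), entry_cmp);
}
```

A comparator like this needs to know the file type of every entry, which is in line with the remark that dpkg could only do such sorting once it tracks filesystem metadata.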

    How to select which files to put first could perhaps be done with the
    old proposal for pathname classes (so that the "policy" is in the
    packager's hands and not in the tooling), but even then, placing them at
    the beginning of the data.tar still seems like a hack.

    > - Any suggestions on how to speed up apt-listchanges? A special interface,
    >   or better to remove the package?

    I think the better option would be to ship these files as part of the
    control.tar member, see for example:

    https://lists.debian.org/debian-devel/2012/07/msg00398.html

    which contains further references. Although AFAIR last time this was
    brought up there was no support for this at least in Debian. I've not
    tried to implement or improve support for this in dpkg, because if
    this ends up not being used there, then it would become a rather
    confusing interface. But perhaps there's some way to move the needle
    in that direction w/o making this confusing or forcing Debian's hand
    as a fait accompli kind of thing.

    Thanks,
    Guillem
