• supporting merged-/usr-via-aliased-dirs in dpkg

    From Helmut Grohne@21:1/5 to All on Wed Nov 2 23:40:02 2022
    Hi Guillem,

    please Cc me in replies.

    Disclaimer: I'm doing this on Freexian capacity.

    I'm trying to figure out a way to make dpkg better support the aliasing approach chosen by the CTTE to implement merged /usr (aka merged-/usr-via-aliased-dirs). In order to avoid doing unnecessary work, I'd like to gather requirements first and hope you can help me with that part.

    To that end, I looked through written material by you and identified the following as relevant. Do you spot anything important missing?

    https://wiki.debian.org/Teams/Dpkg/MergedUsr
    Among other things, this wiki page identifies problems arising from the aliasing layout. It seems fairly exhaustive to me and provides a starting
    point as to which aspects may need changes.

    https://lists.debian.org/debian-devel/2021/07/msg00196.html is an earlier, partial summary of the problems, written by you.

    https://lists.debian.org/debian-devel/2020/02/msg00477.html establishes nomenclature for merged-/usr, merged-/usr-via-aliased-dirs and merged-/usr-via-moves-and-symlink-farms.

    https://lists.debian.org/20181223030614.GA8788@gaara.hadrons.org and https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?h=pu/query-map-pathnames&id=b3f56ff6f3eaed17f534d544a3b6f8cc952e49c6
    are starting points towards solving the problems arising from aliasing.

    This latter mail also mentions dpkg-statoverride as a problem area. I am wondering why it is missing from the wiki page. Do you mind me adding it there for completeness?

    In reading all of the above, I had the impression that you spent much thought on explaining why merged-/usr-via-aliased-dirs is bad and what alternatives may look like. Unfortunately, this is the implementation strategy that we're heading for.

    Then we have this proof of concept patch by uau at https://0x0.st/oNFG.diff (and earlier versions https://0x0.st/-7ev.diff and https://0x0.st/-7vq.diff). Evidently, this was discussed on IRC (presumably #debian-dpkg) and categorized as "conceptually broken". While I identified the lack of separation of policy and mechanism, you appear to take more issues with this approach. As I looked through all of this, I failed to identify what other issues you see. It sure is an irreversible operation on the dpkg database. Once performed however, a number of the problems arising from the aliasing disappear.

    Let me sketch a possibly new behaviour of dpkg. In the spirit of dpkg --add-architecture, I propose adding a new option dpkg --add-alias <symlink> <target> (e.g. dpkg --add-alias /lib /usr/lib). This invocation would record the alias in the dpkg database. Any time a dpkg tool operates on a path, it would canonicalize the path using known aliases. This includes dpkg-divert, dpkg-statoverride, triggers and update-alternatives.
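
    A minimal sketch of the canonicalization step described above, assuming the recorded aliases are available as a simple mapping. The table contents and the function name `canonicalize` are illustrative assumptions, not part of dpkg.

```python
import posixpath

# Hypothetical alias table, as `dpkg --add-alias /lib /usr/lib` might record it.
ALIASES = {
    "/bin": "/usr/bin",
    "/sbin": "/usr/sbin",
    "/lib": "/usr/lib",
}

def canonicalize(path, aliases=ALIASES):
    """Rewrite a leading aliased directory to its target; leave other paths alone."""
    path = posixpath.normpath(path)
    for symlink, target in aliases.items():
        # Match whole path components only, so "/library" is not treated as "/lib".
        if path == symlink or path.startswith(symlink + "/"):
            return target + path[len(symlink):]
    return path

print(canonicalize("/bin/sh"))      # /usr/bin/sh
print(canonicalize("/usr/bin/sh"))  # /usr/bin/sh
print(canonicalize("/library/x"))   # /library/x
```

    With such a helper, a diversion or statoverride registered for /bin/sh and one registered for /usr/bin/sh would compare equal once both sides are canonicalized.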

    I'm not sure whether we also need a --remove-alias <symlink> option. If we don't, we can make --add-alias irreversible (much like uau's patch did). Evidently, this simplifies a lot of things - not least lookups of canonicalized paths.
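
    To illustrate why an irreversible --add-alias simplifies lookups but loses information, here is a rough sketch of a one-time rewrite of per-package file lists. The data layout and the names `canonical` and `rewrite_database` are assumptions made for illustration; they do not reflect dpkg's actual database format or uau's patch.

```python
def canonical(path, aliases):
    """Map a path below an aliased directory to its target location."""
    for symlink, target in aliases.items():
        if path == symlink or path.startswith(symlink + "/"):
            return target + path[len(symlink):]
    return path

def rewrite_database(file_lists, aliases):
    """Return a copy of the per-package file lists with every path canonicalized."""
    return {
        pkg: sorted({canonical(p, aliases) for p in paths})
        for pkg, paths in file_lists.items()
    }

aliases = {"/lib": "/usr/lib"}
before = {
    "libfoo1": ["/lib/libfoo.so.1"],      # shipped below the aliased directory
    "libbar1": ["/usr/lib/libbar.so.1"],  # shipped below /usr already
}
after = rewrite_database(before, aliases)
print(after)
# {'libfoo1': ['/usr/lib/libfoo.so.1'], 'libbar1': ['/usr/lib/libbar.so.1']}
```

    After the rewrite, every lookup is a plain comparison of canonical paths, but nothing records that libfoo1 originally shipped its file below /lib. That lost information is what a tool such as dpkg-repack would need in order to reproduce the original package.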

    Which problems would be fixed (+) or not (-) by this approach?
    - dpkg -S (possibly fixable, but maybe not worth it)
    - dpkg-deb -x
    + dpkg file overwrite/delete
    + dpkg-divert
    - dpkg-repack (only fixable, if the operation is reversible)
    + dpkg-statoverride
    + dpkg triggers
    - tar -x
    + update-alternatives

    While this doesn't solve all problems, it does fix a significant fraction that is relevant to upgrading and maintainer scripts. That seems worth exploring to me. What do you think? Also note that once no package ships files in aliased directories, the dpkg-deb -x, dpkg-repack and tar -x issues will have no practical consequences anymore.

    I've talked about this with Simon McVittie and Julian Andres Klode, both
    of whom considered this approach viable (up to colouring the shed).

    Helmut

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Guillem Jover@21:1/5 to Helmut Grohne on Fri Nov 11 13:30:01 2022
    Hi!

    On Wed, 2022-11-02 at 23:35:23 +0100, Helmut Grohne wrote:
    I'm trying to figure out a way to make dpkg better support the aliasing approach chosen by the CTTE to implement merged /usr (aka merged-/usr-via-aliased-dirs). In order to avoid doing unnecessary work, I'd like to gather requirements first and hope you can help me with that part.

    I'm doing a shallow reply over this, can expand further during the
    weekend probably if necessary.

    https://lists.debian.org/20181223030614.GA8788@gaara.hadrons.org and https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?h=pu/query-map-pathnames&id=b3f56ff6f3eaed17f534d544a3b6f8cc952e49c6
    are starting points towards solving the problems arising from aliasing.

    Not really, the first lists things that are *not* proper solutions, the
    branch includes a deadend which I discarded a long time ago. Those two are at least workarounds, and are definitely not on the path to proper solutions.

    This latter mail also mentions dpkg-statoverride as a problem area. I am wondering why it is missing from the wiki page. Do you mind me adding it there for completeness?

    It's probably missing because I ran out of steam, and making the page
    more accurate has seemed pointless, TBH. But sure, go ahead.

    In reading all of the above, I had the impression that you spent much thought on explaining why merged-/usr-via-aliased-dirs is bad and what alternatives may
    look like. Unfortunately, this is the implementation strategy that we're heading for.

    I've also mentioned what a hypothetical solution might be founded on:
    filesystem metadata tracking. But again, I'm not even convinced this
    can solve the issues in a non-interface-breaking way either. :/

    Then we have this proof of concept patch by uau at https://0x0.st/oNFG.diff (and earlier versions https://0x0.st/-7ev.diff and https://0x0.st/-7vq.diff). Evidently, this was discussed on IRC (presumably #debian-dpkg) and categorized as "conceptually broken". While I identified the
    lack of separation of policy and mechanism, you appear to take more issues with
    this approach. As I looked through all of this, I failed to identify what other
    issues you see. It sure is an irreversible operation on the dpkg database. Once
    performed however, a number of the problems arising from the aliasing disappear.

    This would break interfaces, as it introduces change at a distance (as
    packages can expect their paths to match what's shipped from what's on
    the db, as packages are internally coherent).

    Even not updating the db and remapping on the fly or outputting both
    pathnames would be a breaking change.

    Both of these approaches do not really solve the problem, they just shift
    it elsewhere.

    Old package shipping stuff in both aliased directories would also
    still not be installable, even though their deps could be satisfied.
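
    One reading of this remaining problem is a single package whose file list names the same file below both an aliased directory and its target. A small sketch of how such a package could be detected follows, assuming a hypothetical alias table; `alias_collisions` is an illustrative name, not a dpkg interface.

```python
from collections import defaultdict

# Hypothetical alias table; the contents are illustrative only.
ALIASES = {"/bin": "/usr/bin", "/sbin": "/usr/sbin", "/lib": "/usr/lib"}

def canonical(path, aliases=ALIASES):
    """Map a path below an aliased directory to its target location."""
    for symlink, target in aliases.items():
        if path == symlink or path.startswith(symlink + "/"):
            return target + path[len(symlink):]
    return path

def alias_collisions(shipped_paths, aliases=ALIASES):
    """Group shipped paths that end up at the same location once aliases apply."""
    by_canonical = defaultdict(list)
    for path in shipped_paths:
        by_canonical[canonical(path, aliases)].append(path)
    return {c: ps for c, ps in by_canonical.items() if len(ps) > 1}

old_package = ["/lib/foo/helper", "/usr/lib/foo/helper", "/usr/share/doc/foo/README"]
print(alias_collisions(old_package))
# {'/usr/lib/foo/helper': ['/lib/foo/helper', '/usr/lib/foo/helper']}
```

    Recording aliases lets such a package be detected, but it does not by itself decide which of the two colliding files should end up on disk.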

    Not even the pathname filtering support affects the fsys database, for
    example.

    Let me sketch a possibly new behaviour of dpkg. In the spirit of dpkg --add-architecture, I propose adding a new option dpkg --add-alias <symlink> <target> (e.g. dpkg --add-alias /lib /usr/lib). This invocation would record the alias in the dpkg database. Any time a dpkg tool operates on a path, it would canonicalize the path using known aliases. This includes dpkg-divert, dpkg-statoverride, triggers and update-alternatives.

    This is equivalent (although perhaps slightly better as instead of a config this is stored in the db so it can be used by commands that do not parse config files) to the deadend approach from the above branch. This is trying to encode filesystem knowledge that is supposed to be shipped in .deb into an option, that can get out of sync, and still does not cover the change at a distance issues.

    Not to mention interactions with people relocating entire directories
    via symlinks into other fsys points and similar, which is
    unfortunately still a supported setup.

    Which problems would be fixed (+) or not (-) by this approach?
    - dpkg -S (possibly fixable, but maybe not worth it)
    - dpkg-deb -x
    + dpkg file overwrite/delete
    + dpkg-divert
    - dpkg-repack (only fixable, if the operation is reversible)
    + dpkg-statoverride
    + dpkg triggers
    - tar -x
    + update-alternatives

    u-a does not interact with the dpkg fsys database.

    While this doesn't solve all problems, it does fix a significant fraction that
    is relevant to upgrading and maintainer scripts. That seems worth exploring to
    me. What do you think? Also note that once no package ships files in aliased directories, the dpkg-deb -x, dpkg-repack and tar -x issues will have no practical consequences anymore.

    See above, I think this is the wrong way to go.

    Thanks,
    Guillem

  • From Helmut Grohne@21:1/5 to Guillem Jover on Sat Nov 19 11:00:01 2022
    Hi Guillem,

    On Fri, Nov 11, 2022 at 01:21:32PM +0100, Guillem Jover wrote:
    I'm doing a shallow reply over this, can expand further during the
    weekend probably if necessary.

    Thank you for taking the time to reply.

    https://lists.debian.org/20181223030614.GA8788@gaara.hadrons.org and https://git.hadrons.org/cgit/debian/dpkg/dpkg.git/commit/?h=pu/query-map-pathnames&id=b3f56ff6f3eaed17f534d544a3b6f8cc952e49c6
    are starting points towards solving the problems arising from aliasing.

    Not really, the first lists things that are *not* proper solutions, the branch includes a deadend which I discarded a long time ago. Those two are at least workarounds, and are definitely not on the path to proper solutions.

    I fear that the perfect is the enemy of the good here. I appreciate that
    you look for proper solutions, but I think there is another aspect to
    this that makes considering less than ideal solutions relevant.

    Throughout this discussion, you've been helpfully agreeing that the goal
    of having the actual files below /usr is something that can be done in principle. The disagreement arises from the way to get there. Whether we
    like it or not, we will have to deal with the consequences of the
    aliasing approach for quite some time. If we could somehow perform the
    actual move of files from / to /usr in binary packages, we'd mitigate
    much (not all) of the practical downsides of this approach. However, we currently cannot perform that move, because doing it with dpkg is
    unsafe. So while a more complete solution would certainly be desirable,
    a partial solution that allows us to perform these moves would get us a
    long way in eliminating much of the practical issues.

    So yeah, I do want you to consider this workaround, because I believe
    that it is achievable in a more timely manner and mitigating the bad
    effects is something we need sooner rather than later.

    [dpkg-statoverride]

    It's probably missing because I ran out of steam, and making the page
    more accurate has seemed pointless, TBH. But sure, go ahead.

    Added.

    In reading all of the above, I had the impression that you spent much thought
    on explaining why merged-/usr-via-aliased-dirs is bad and what alternatives may
    look like. Unfortunately, this is the implementation strategy that we're heading for.

    I've also mentioned what a hypothetical solution might be founded on:
    filesystem metadata tracking. But again, I'm not even convinced this
    can solve the issues in a non-interface-breaking way either. :/

    Given what you wrote, I'm fairly convinced that there is no solution
    that retains each and every interface assumption. And while building on filesystem metadata tracking sounds really nice, it is not something
    that we can rely on soon given that it entails database and package
    format changes. As such, I believe that we need to consider partial
    solutions to the worst of problems and also accept some form of
    interface breakage.

    Then we have this proof of concept patch by uau at https://0x0.st/oNFG.diff (and earlier versions https://0x0.st/-7ev.diff and https://0x0.st/-7vq.diff). Evidently, this was discussed on IRC (presumably #debian-dpkg) and categorized as "conceptually broken". While I identified the
    lack of separation of policy and mechanism, you appear to take more issues with
    this approach. As I looked through all of this, I failed to identify what other
    issues you see. It sure is an irreversible operation on the dpkg database. Once
    performed however, a number of the problems arising from the aliasing disappear.

    This would break interfaces, as it introduces change at a distance (as packages can expect their paths to match what's shipped from what's on
    the db, as packages are internally coherent).

    In a strict sense, I think we already broke interfaces (in allowing the aliasing to proceed). We now look into mitigating the most pressing
    issues. At the same time, this kind of breakage affects a minority of
    packages. We're talking about less than 2000 packages in unstable. I do
    see how retaining backwards compatibility is important. I don't think
    we'll be able to do that here.

    Even not updating the db and remapping on the fly or outputting both pathnames would be a breaking change.

    While that is true, I think it is a change we can live with whereas not
    being able to move files from / to /usr and never being able to finish
    this transition would be rather bad for the morale of Debian as a whole.

    Both of these approaches do not really solve the problem, they just shift
    it elsewhere.

    I agree. However, they shift the problem sufficiently far away that we
    can start moving files and thus allow finishing the transition. When I
    say finish, I mean that packages stop shipping files in aliased
    directories. Once this state is achieved, many problems lose practical relevance.

    Old package shipping stuff in both aliased directories would also
    still not be installable, even though their deps could be satisfied.

    I agree that this problem remains unsolved. At the same time, the only technical measure I've seen addressing this has been versioned Breaks
    issued by usrmerge. Do you see any better solution to this problem? My understanding is that this would remain unsolved even with a "proper
    solution".

    [dpkg --add-alias proposal]

    This is equivalent (although perhaps slightly better as instead of a config this is stored in the db so it can be used by commands that do not parse config
    files) to the deadend approach from the above branch. This is trying to encode
    filesystem knowledge that is supposed to be shipped in .deb into an option, that can get out of sync, and still does not cover the change at a distance issues.

    I am aware of these limitations. Maybe I should have spelled them out
    more explicitly in my previous mail. I still believe that this is a
    viable way of addressing the worst of practical effects.

    u-a does not interact with the dpkg fsys database.

    That statement is not as obvious as it seems initially. u-a has its own directory inside the admindir. It would become a consumer of the
    recorded aliases, so it would increase that interaction.

    See above, I think this is the wrong way to go.

    Yes, I do see that you very much disagree with this route. What I fail
    to see is a "right way". As much as I trust your expertise, any way that
    starts with "roll back the aliasing" is something I would reject on practicality grounds, but that seems to be your only notion of "right
    way". We both agree that this transition has been improperly planned and executed. However bad that may be, that's where we are. So the question
    no longer is "How do we get from unmerged to merged in a proper way?".
    The question unfortunately has become "How do we mitigate the worst of
    issues resulting from aliasing?". You appear to be rejecting that change
    of question.

    With all that being said, the projected --add-alias approach still
    seems to be the best available trade-off in the solution space to me.
    From my point of view, the most important property is finding a way to actually move files from / to /usr in a reasonably safe way. I think
    this is important, because most of the aliasing problems are only
    relevant to packages shipping files on aliased paths and moving files
    thus reduces the amount of affected packages. If you want to make moves
    safe while retaining the chosen aliasing approach, it becomes fairly
    obvious that dpkg needs to know about such aliases in some way. That
    train of thought appears to leave little room for alternatives, but
    maybe I'm missing something important.
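
    The argument that dpkg needs to know about aliases to make moves safe can be illustrated with a toy model. This is a deliberately simplified sketch, not dpkg's actual unpack or removal logic: it ignores Replaces, conffiles, diversions and ordering details, and the class `ToyDpkg` and the package names are invented for illustration. It models a file that moves from /bin in package "a" to /usr/bin in package "b" on a system where /bin is a symlink to /usr/bin.

```python
ALIASES = {"/bin": "/usr/bin"}  # assumed merged-/usr layout: /bin -> /usr/bin

def resolve(path):
    """Physical location of a path once the alias symlink is followed."""
    for symlink, target in ALIASES.items():
        if path == symlink or path.startswith(symlink + "/"):
            return target + path[len(symlink):]
    return path

class ToyDpkg:
    def __init__(self, alias_aware):
        self.alias_aware = alias_aware
        self.db = {}       # package -> set of recorded pathnames
        self.disk = set()  # physical files currently present

    def _key(self, path):
        # An alias-aware tool compares canonical paths; a naive one compares strings.
        return resolve(path) if self.alias_aware else path

    def _owned_by_other(self, path, pkg):
        key = self._key(path)
        return any(
            key == self._key(p)
            for other, paths in self.db.items() if other != pkg
            for p in paths
        )

    def install(self, pkg, new_paths):
        old_paths = self.db.get(pkg, set())
        for p in new_paths:
            self.disk.add(resolve(p))          # unpack the new files
        self.db[pkg] = set(new_paths)
        new_keys = {self._key(p) for p in new_paths}
        for p in old_paths:                    # clean up files that were dropped
            if self._key(p) in new_keys or self._owned_by_other(p, pkg):
                continue
            self.disk.discard(resolve(p))

def file_survives(alias_aware):
    d = ToyDpkg(alias_aware)
    d.install("a", ["/bin/foo"])      # a 1.0 ships the file below /bin
    d.install("b", ["/usr/bin/foo"])  # b 2.0 takes over the file below /usr/bin
    d.install("a", [])                # a 2.0 no longer ships it
    return "/usr/bin/foo" in d.disk

print(file_survives(alias_aware=False))  # False
print(file_survives(alias_aware=True))   # True
```

    In the naive run, the cleanup of a's old /bin/foo deletes the file that b now owns as /usr/bin/foo, because the two names are never recognized as the same file. With canonicalized ownership checks the file survives.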

    Thank you for considering

    Helmut
