I have been looking into the aliasing problems in dpkg on behalf of
Freexian's Debian funding. To that end I proposed a possible way forward
last year (https://lists.debian.org/debian-dpkg/2022/11/msg00007.html),
but the feedback I got was not particularly helpful in determining
consensus.
A little later, Simon Richter also looked into the problem
(https://lists.debian.org/debian-dpkg/2022/12/msg00023.html), but
remained silent after the initial post. Little has happened since then.
Now Raphael Hertzog has proposed to use the DEP process to get this thing
unstuck, and with the help of Emilio Pozuelo Monfort I created a draft
for discussion. I allocated number 17 via debian-project@l.d.o. What
follows is the draft text. Please consider it a best-intentions attempt
at reconciling feedback wherever I could.
Introduction
============
At its core, `dpkg` assumes that every filename uniquely refers to a
file on disk. The situation where two distinct filenames refer to the
same file on disk is referred to as aliasing.
Proposal
========
In order to handle aliasing efficiently, `dpkg` gains new options
`--add-alias <symlink>`, `--remove-alias <symlink>` and
`--list-aliases`. When creating symbolic links that cause aliasing
effects, the creating entity is supposed to inform `dpkg` using an
appropriate invocation. Doing so records the aliasing information in a
new mapping inside its administrative directory. No existing
administrative files are modified as a result of this operation. When
`dpkg` operates on paths, it can compute a canonicalized version using a
pure function without the need to `stat()` files on disk, thus greatly
improving performance. Canonicalized paths are only needed when
determining whether a file conflict exists. In all other cases,
original paths continue to be used, since symbolic links are followed by
filesystem operations. The `--add-alias` operation records the target
of the symbolic link, which must exist prior to invocation. The
`--remove-alias` operation fails if any files are still installed in the
aliased location.
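The canonicalization described above can be sketched as a pure function over the alias mapping. This is a minimal illustration only, not `dpkg` code: the mapping contents, the function names `canonicalize` and `conflicts`, and the choice to rewrite only paths strictly below an aliased directory are assumptions made for the example. Chained aliases are not handled.

```python
import posixpath

# Hypothetical alias mapping as it might be recorded by --add-alias:
# symlink path -> recorded target (both absolute).
ALIASES = {
    "/bin": "/usr/bin",
    "/sbin": "/usr/sbin",
    "/lib": "/usr/lib",
}


def canonicalize(path, aliases):
    """Return the canonical form of path using only the alias mapping.

    No filesystem access (no stat()) takes place; the result depends
    solely on the arguments.
    """
    path = posixpath.normpath(path)
    for link, target in aliases.items():
        prefix = link + "/"
        # Only paths strictly below the aliased directory are rewritten;
        # the symlink path itself is left alone (an assumption).
        if path.startswith(prefix):
            return target + "/" + path[len(prefix):]
    return path


def conflicts(path_a, path_b, aliases):
    """Two paths conflict if they canonicalize to the same string."""
    return canonicalize(path_a, aliases) == canonicalize(path_b, aliases)
```

With this mapping, `conflicts("/bin/testfile", "/usr/bin/testfile", ALIASES)` is true, while a path such as `/binary/x` is left untouched because the match is made on whole path components.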
Rejected proposals
==================
Hardcoding aliases into dpkg
----------------------------
It was suggested to include a static aliasing mapping into the `dpkg`
source code. Since `dpkg` is used by multiple projects in different
ways (not necessarily Debian-derivatives), this approach would break
other consumers. Also note that Debian's `dpkg` can be used to operate
on an installation using different aliases via the `--root` flag. As
such the alias mapping needs to be a property of the installation.
Modifying package lists in place
--------------------------------
`dpkg` could rewrite the extracted `.list` files from `control.tar` and
store paths in canonicalized form. Canonicalization would happen
when a `control.tar` is extracted. It would also happen either as a
one-time conversion during the upgrade of `dpkg` or whenever a `.list`
file is read. Given canonicalized list files, string comparison on
files would support conflict detection. Other pieces to be updated in a
similar way include `alternatives`, `diversions`, `statoverride`, and
`triggers`.
This would affect the output of `dpkg -S`, which would then output
canonicalized paths. Packages generated by `dpkg-repack` would have
their contents canonicalized as well.
Managing the aliasing mapping using a control file
--------------------------------------------------
It was suggested that the mapping could be managed via a special control
file `canonical`. Given that aliasing is not a common operation, the
benefit of handling it declaratively is minor. Beyond that, aliasing
can also happen as a customization issued by an administrator.
Therefore, a command line based approach is preferred.
Having dpkg move files and create symbolic links
------------------------------------------------
When instructed with `--add-alias`, `dpkg` could also create the
corresponding symbolic links and move the affected files to their new
location. While that would be convenient, doing so atomically is
non-trivial. Sometimes, the underlying filesystem does not fully conform
to POSIX (e.g. `overlayfs`) and such corner cases need to be managed
individually. Since such an implementation already exists outside
`dpkg` and its complexity is non-trivial, the moving of files shall
remain external. In case aliases are set up in a bootstrap setting, no
moves are necessary.
Implement aliasing after metadata tracking
------------------------------------------
The [metadata tracking](https://wiki.debian.org/Teams/Dpkg/Spec/MetadataTracking)
feature enhances `dpkg` with knowledge about filesystem metadata for installed files. This includes knowledge of symbolic links, which would
help with tracking aliasing. Unfortunately, progress on this is fairly
slow and we think that aliasing support is more urgent.
I thought my reply was rather clear, and that we had further clarified
it privately, so at the time I thought no other answer was required,
as (AFAIR) you stated you'd be digging further into it. I mentioned I'd
try to reply to the list, but it didn't feel urgent given the
clarifications already made, nor given the timing during the freeze.
Sigh, a DEP(!?), for a dpkg change? It feels more like a way to exert pressure over this than anything else TBH…
I'm unlikely to discuss this topic on debian-devel, given previous
nastiness and abuse.
The text includes most (but not all) of what I've been saying publicly,
and what I've tried to further clarify to you and Emilio in private.
But I think it ignores the essence of what I've been repeating all along.
I already mentioned this in my reply for the thread you reference. So,
let me repeat and possibly expand to avoid any future doubt. I already considered and discarded something like this (except for using a config option instead of a new command, but that does not really change the substance of the problems).
Let's also get back to the very basics. dpkg manages objects shipped
in binary packages, on the filesystem. It assumes this managing role in exclusivity, it will for example overwrite unmanaged files. It preserves admin changes with interfaces specifically provided for that (diversions, statoverrides, conffile changes) or the unfortunate symlink redirects.
These shipped objects define the filesystem layout (not the other way around). Due to the missing fsys metadata, where it does not have all
such metadata at hand when necessary (it might only have the one for
the currently unpacked .deb), it might use heuristics or check the
filesystem for such metadata, because it does not have anything else,
but that should not be taken to mean that the filesystem is the source
of truth, as most of those will be unnecessary once it has such
metadata at hand.
So the reason this proposal is still conceptually wrong is manifold:
* dpkg cannot safely and atomically perform such switches (and I don't
see it ever being able to portably do so, so I don't see ever
supporting that).
* No package ships those symlinks (and none should! as that would
currently imply having the same pathname contain different file types
on the same system, introducing ordering issues and file type
conflicts).
* This introduces a series of commands to let dpkg know that a
filesystem change that was not shipped in any .deb (even though that
should have been the way to do it), has been done, which:
- Switches the source of truth from the .deb to the fsys.
- Confuses admin initiated changes from distro initiated ones.
* Wants to be a generic change but it is really targeted to this
specific mess. We have been doing similar aliasing transitions for
many doc dirs, by stopping shipping files within, shipping that
pathname as a symlink and then switching the directories to symlinks
to match (via the dpkg-maintscript-helper hack because we miss fsys
metadata). This means we'd need to then register all these directories
too? Meh.
* This information can get out of sync with reality, as it adds an
additional source of truth, unconnected to anything else, that dpkg
cannot do anything about if it diverges (in contrast to diversions
or statoverrides f.ex.). This can never happen when that information
comes from the real source of truth (the fsys metadata via the .deb).
* This also adds undue complexity, by supporting those as admin aliases.
The admin generated redirecting symlinks are already annoying, I'd rather
not add further to that pile. I don't really want to support admins doing
this (dpkg-divert does not even support diverting a directory).
[ As an aside, I think ideally eventually nothing distro provided should
be allowed to be installed within an aliased dir, and dpkg should
eventually just error out in those cases, which eventually would get
rid of the aliasing problems and any such complexity (I'm not sure how
or when that would be feasible though, but obviously in Debian at
least not until nothing ships files there). ]
So this still looks like a terrible interface, like it did at the time
it was discarded; founded on a hack, an interface that seems to want to
be kind of a file-type override but it cannot be, and cannot even
properly act as record tracker, etc…
I thought it would be clear that if there is stuff that depends on
any of this kind of changes to dpkg, relying on those changes in
Debian would not be possible until after trixie+1. Of course there is
always the route to further pile up over the Jenga tower of hacks,
by for example adding huge amounts of Pre-Depends…
So given the above, I don't see why the apparent rush here. And as I've mentioned many times now, I'm planning to continue working on the fsys metadata stuff for 1.22.x, probably at the cost of database duplication
if necessary, if current blockers have not adapted by then. But as I've mentioned before, that might not guarantee this support is sufficient to support fixing this mess. But all other proposed changes I've seen
flying around for changes to dpkg are just conceptually wrong in one way
or another.
Yes, I am quite busy, but it's not forgotten. I keep adding new test cases.
Dpkg already has defined behaviour for directory vs symlink: the directory wins. In principle a future version of dpkg could change that, but /lib/ld-linux.so.2 is just too special, we'd never want to have a package that actually moves it.
That's why I went with "this needs to be a separate mechanism."
The reason to use a control file instead of a tool would be to install the alias from an Essential package, so the old-school "unpack essential packages, then overwrite with dpkg" approach to system installation would work again without special-casing usrmerge in debootstrap&co.
It was suggested that the mapping could be managed via a special control file `canonical`. Given that aliasing is not a common operation, the benefit of handling it declaratively is minor. Beyond that, aliasing
can also happen as a customization issued by an administrator. Therefore, a command line based approach is preferred.
The advantage is that it works for Essential packages, like the one shipping /lib/ld-linux.so.2.
On Sat, Apr 08, 2023 at 04:35:25AM +0200, Guillem Jover wrote:
Let's also get back to the very basics. dpkg manages objects shipped
in binary packages, on the filesystem. It assumes this managing role in exclusivity, it will for example overwrite unmanaged files. It preserves admin changes with interfaces specifically provided for that (diversions, statoverrides, conffile changes) or the unfortunate symlink redirects. These shipped objects define the filesystem layout (not the other way around). Due to the missing fsys metadata, where it does not have all
such metadata at hand when necessary (it might only have the one for
the currently unpacked .deb), it might use heuristics or check the filesystem for such metadata, because it does not have anything else,
but that should not be taken to mean that the filesystem is the source
of truth, as most of those will be unnecessary once it has such
metadata at hand.
This captures an insight I previously didn't have in that clarity and
that I find agreeable conceptually.
So the reason this proposal is still conceptually wrong is manifold:
* dpkg cannot safely and atomically perform such switches (and I don't
see it ever being able to portably do so, so I don't see ever
supporting that).
I agree, but the proposal also does not ask dpkg to perform such
switches, so I kinda fail to see how this is a relevant argument.
* No package ships those symlinks (and none should! as that would
currently imply having the same pathname contain different file types
on the same system, introducing ordering issues and file type
conflicts).
I disagree with this argument on two levels. For one thing, I think that
the transition only is complete once these symlinks are shipped in a
package. In particular, that notion of complete likely encompasses that
no aliasing occurs anymore as all aliased files have been moved to their canonical location somehow (<- and this likely will be a quite difficult thing to do). For another, no package actually ships those symlinks now.
They are created behind dpkg's back in some postinst. This is
unfortunate and I agree with Simon Richter that this kinda is a policy violation, but at this time, it is an aspect we have to deal with
whether we want to or not.
I suspect that you disagree with the notion that we have to deal with
this situation, which I consider to be our fundamental disagreement.
* This introduces a series of commands to let dpkg know that a
filesystem change that was not shipped in any .deb (even though that
should have been the way to do it), has been done, which:
- Switches the source of truth from the .deb to the fsys.
While this is correct on some level, the aim of this change is to put
that truth back into dpkg of course.
- Confuses admin initiated changes from distro initiated ones.
I think we already do this with dpkg-divert, dpkg-statoverride and other tools. While this may not be nice, it certainly has prior art and is
consistent with how we have been doing things in the past.
* Wants to be a generic change but it is really targeted to this
specific mess. We have been doing similar aliasing transitions for
many doc dirs, by stopping shipping files within, shipping that
pathname as a symlink and then switching the directories to symlinks
to match (via the dpkg-maintscript-helper hack because we miss fsys
metadata). This means we'd need to then register all these directories
too? Meh.
I would love to agree with this, but I believe that this ship has
sailed. This likely is part of our fundamental disagreement.
* This information can get out of sync with reality, as it adds an
additional source of truth, unconnected to anything else, that dpkg
cannot do anything about if it diverges (in contrast to diversions
or statoverrides f.ex.). This can never happen when that information
comes from the real source of truth (the fsys metadata via the .deb).
I have difficulties accurately capturing the argument. The problem of information getting out of sync with reality should affect every aspect
of dpkg and indeed, that kinda is the status quo where upgrades can
lose files, because dpkg has an incomplete picture of reality. The aim
of this change is to allow us to re-sync the status quo into dpkg. My
view is that dpkg's information presently is out of sync with reality
and the proposed change partially fixes that.
[ As an aside, I think ideally eventually nothing distro provided should
be allowed to be installed within an aliased dir, and dpkg should
eventually just error out in those cases, which eventually would get
rid of the aliasing problems and any such complexity (I'm not sure how
or when that would be feasible though, but obviously in Debian at
least not until nothing ships files there). ]
It seems to me that this is something everyone agrees on. So our
disagreement resides in the way to get there rather than where to get
to.
So this still looks like a terrible interface, like it did at the time
it was discarded; founded on a hack, an interface that seems to want to
be kind of a file-type override but it cannot be, and cannot even
properly act as record tracker, etc…
I agree that in a perfect world, we would not need this. Let me circle
back to our fundamental disagreement.
My impression is that at this time basically everyone except you agrees
that we have to deal with the aliasing problems that have been rolled
out to users and will be forced in bookworm. I believe that this is the
state that we have to consider as starting point and that we cannot
magically turn this transition back to perform it in a better way. And indeed, I believe that there would have been a better way[1] that no
longer is available to us.
On the other hand, my impression is that you continue to see the
transition as fundamentally broken and in a state that we cannot work
from. You appear to believe that if we want to do it, we must start over
in a better way. That better way must not cause aliasing problems to
dpkg.
I thought it would be clear that if there is stuff that depends on
any of this kind of changes to dpkg, relying on those changes in
Debian would not be possible until after trixie+1. Of course there is always the route to further pile up over the Jenga tower of hacks,
by for example adding huge amounts of Pre-Depends…
I agree that we probably will deal with this until at least trixie+1.
This is precisely why I would like to have a plan to finish it sooner
rather than later.
I don't think we disagree (?), I probably didn't express myself clearly.
The fact that no package ships those symlinks *is* and *has* been a
problem, and what I've been saying all along, this will be the only
correct way to let dpkg know whether there will be aliasing in play.
But given these mentioned constraints
it cannot be made to support (as in accept) unpacking files inside
aliased directories (it should be able to unpack the symlinks creating
those aliased directories though!).
dpkg-divert distinguishes between local and package level changes, it
is true that dpkg-statoverride does not have (currently) that
distinction, although it is primarily an admin tool where I don't
think it makes much sense to support something like declarative
package statoverrides TBH once we can ship fsys metadata (perhaps
conditional one though).
On 6/21/23 20:33, Guillem Jover wrote:
I don't think we disagree (?), I probably didn't express myself clearly. The fact that no package ships those symlinks *is* and *has* been a problem, and what I've been saying all along, this will be the only
correct way to let dpkg know whether there will be aliasing in play.
I've looked into building a dpkg-alias tool that would work similarly to dpkg-divert, and currently that looks like it might be a viable solution.
The package would need to unregister on upgrade in the postrm though, but that is standard for removed diversions.
- dpkg-query returns the package name if any aliased name matches
There should also be a flag whether to report the file name from the
data.tar as well, defaulting to "no", because that's what scripts expect.
But given these mentioned constraints
it cannot be made to support (as in accept) unpacking files inside
aliased directories (it should be able to unpack the symlinks creating those aliased directories though!).
I think that can be done. I have already successfully made it report a conflict between /bin/testfile and /usr/bin/testfile, with a meaningful
error message, and runtime overhead isn't too bad -- a factor of
log_{262144} 2 on the lookup time for a single path, but inserts got a bit more expensive because these now have prefix comparisons on the path. The latter could probably be improved with another hash on the first N bytes of the path.
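To illustrate the kind of conflict report and alias-aware lookup discussed here, the following is a rough sketch, not the actual prototype: the class `FileDatabase`, the exception `FileConflict`, the method names and the error message wording are all hypothetical, and a plain dictionary stands in for dpkg's real file database.

```python
class FileConflict(Exception):
    """Raised when two packages ship paths that alias the same file."""


class FileDatabase:
    """Toy file database keyed by canonical path (illustrative only)."""

    def __init__(self, aliases):
        # aliases: symlink path -> target, e.g. {"/bin": "/usr/bin"}
        self.aliases = dict(aliases)
        # canonical path -> (owning package, path as shipped)
        self.owners = {}

    def _canonical(self, path):
        # Prefix comparison against every registered alias; this is the
        # extra per-insert cost mentioned in the text.
        for link, target in self.aliases.items():
            if path.startswith(link + "/"):
                return target + path[len(link):]
        return path

    def register(self, package, path):
        key = self._canonical(path)
        owner = self.owners.get(key)
        if owner is not None and owner[0] != package:
            raise FileConflict(
                f"{package} ships {path}, which aliases {owner[1]} "
                f"owned by {owner[0]}"
            )
        self.owners[key] = (package, path)

    def search(self, path):
        """Return the owning package if any aliased name matches."""
        owner = self.owners.get(self._canonical(path))
        return owner[0] if owner else None
```

In this sketch, registering `/bin/testfile` for one package and then `/usr/bin/testfile` for another raises `FileConflict`, and `search()` finds the owner under either name.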
I'd like to see a mechanism that ensures that dpkg understands those control files, though -- like a "critical" flag.
I suspect that for trixie, this will have to be an archive side check that any package using one of the declarative interfaces depends on an
appropriate version of dpkg, and/or its use disallowed until trixie+1 for
the convenience of backporters.