Recently the topic of exploiting newer instructions without dropping
support for older machines has come up several times inside Ubuntu engineering. I understand this topic has come up several times in the past for Debian as well, but nothing has really come of it to date.
I've spent a while thinking through the options and coming up with a design and wrote some notes into a wiki page: https://wiki.debian.org/ArchitectureVariants
In terms of building consensus around this design, I thought it makes sense to start at the bottom of the stack and so here I am on this mailing list :-)
I guess in due course this could become a DEP, and would certainly need to be discussed on debian-devel before getting too far.
What do you think? Have I missed any glaring implications?
Is there a better way of doing this?
Hi!
On Fri, 2023-09-01 at 08:43:55 +1200, Michael Hudson-Doyle wrote:
Recently the topic of exploiting newer instructions without dropping support for older machines has come up several times inside Ubuntu engineering. I understand this topic has come up several times in the past
for Debian as well, but nothing has really come of it to date.
I also had a chat about this with Matthias Klose (CCed) around 2022-05.
I've spent a while thinking through the options and coming up with a design
and wrote some notes into a wiki page: https://wiki.debian.org/ArchitectureVariants
I think we are already doing 1, 2 and 3. I agree 4 is just wrong. And something like 5 is what I suggested to Matthias for Ubuntu when we
last discussed it as the best way to go about this.
I'm not sure I entirely agree with the requirements you set forth
though:
- I think such optimized builds might need to be done with "special
toolchains" (these could simply be wrappers over the host compiler
passing the appropriate flags via command-line or via specs or
similar, not necessarily full blown toolchains), passing these via
something like dpkg-buildflags seems currently unreliable, as I don't
think we have full coverage in packages (neither for all compilers
available)? Although it would be better as it would centralize the
management. (For reference this is in part how rpm handles this:
https://github.com/rpm-software-management/rpm/blob/master/rpmrc.in)
- Perhaps that's a limitation from the archive software side, but
requiring to place the binary packages in the same pool seems
rather restrictive (it forces different filenames for example).
- I guess it might be nice for the ISA to be passed down to the
dpkg tools, but I don't think this is strictly necessary? A
frontend like apt could also decide based on metadata in say the
Release file, although not having the actual installed package
metadata on whether it was a different ISA build or not would make
its job more inconvenient. In any case I don't have a big issue
with recording this via dpkg-gencontrol or similar if necessary.
On the specific implementation details:
- Changing the Architecture format (as in adding colons there) seems
like a non-starter, and I expect that would break lots of things
(I mean it could be done but I'm not sure it's worth it for this).
Recording this mostly as a hint than anything else, via another
field (if necessary at all) I think would be best.
- As covered in previous discussions, dpkg could (but I don't think
it's necessary) check whether the .deb is runnable on the current
hw, but that's tricky as chrootless installs need to be taken
into account, etc. It should certainly not be part of dependency
resolution.
- I'm not fond of having to change the binary package name format
either for this (name_version_arch.deb) even if at least dpkg
itself does not care (but I know other tools do care), and
depending on the format I'd expect things to break (this goes
back to the shared pool concern).
- If dpkg-architecture needs to be aware of this, then this might need
to be auto-detectable from just the current toolchain being used.
Some of the above problems could perhaps be avoided if we introduced
a concept of architecture aliases/ISAs (similar to what rpm has), which
would side-step the pool sharing issue, the binary package renaming,
etc. One big issue with this is that it requires for dpkg to have an exhaustive table of all such aliases, and if there's ever a new alias
added, old dpkg versions need to be updated or they will not understand
what they match with. So this does not seem ideal either. So I guess this
is a variation over your proposal, but perhaps this could still be used
in specific contexts, say only at build-time (but not for dependency relationships), for repo management (say binary-arm64v9/Packages.xz),
or binary package names where the field would specify the actual name
for the filename, say:
Architecture: arm64
ArchitectureIsa: arm64v9
or maybe better:
Architecture: arm64
ArchitectureIsa: v9
resulting in dpkg-deb generating:
binpkg_1.0-1_arm64v9.deb
but targeting arm64.
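
A minimal sketch of the naming rule above, assuming the second spelling (the field holds only the suffix, e.g. "v9"). The ArchitectureIsa field is the hypothetical one from this mail and deb_filename() is an illustrative helper, not dpkg-deb code:

```python
def deb_filename(fields):
    """Build name_version_arch.deb with the ISA variant in the <arch> slot.

    `fields` is a dict of control fields. "ArchitectureIsa" is the
    hypothetical field discussed above; when absent, the plain
    architecture name is used.
    """
    arch = fields["Architecture"]
    isa = fields.get("ArchitectureIsa", "")
    # The package still targets `arch`; only the filename carries the variant.
    return "{}_{}_{}{}.deb".format(fields["Package"], fields["Version"], arch, isa)


control = {
    "Package": "binpkg",
    "Version": "1.0-1",
    "Architecture": "arm64",
    "ArchitectureIsa": "v9",
}
print(deb_filename(control))
```

With the first spelling (ArchitectureIsa: arm64v9) the field value would be used as the <arch> slot directly instead of being appended.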
I also think I prefer naming this explicitly as ISA variants, if you
will, than just architecture variants as that gives way too much room
(which perhaps we want, but then that has other implications over
compatibility), and for the field perhaps just Isa is better, to avoid
the implicit repetition of ArchitectureInstructionSetArchitecture :),
but that makes it less easy to associate both as related.
In the end though, I think there are perhaps bigger constraints from
the infra side of things than the package tooling, stuff like archive management software, or binary transition migration and similar.
In terms of building consensus around this design, I thought it makes sense
to start at the bottom of the stack and so here I am on this mailing list :-) I guess in due course this could become a DEP, and would certainly need
to be discussed on debian-devel before getting too far.
I'm not sure there's ever been much of a wide interest in something
like this in Debian TBH. Due to deployment and increased infra
overhead at least?
What do you think? Have I missed any glaring implications?
No, I think the overall picture is about right, and captures most of the things we have discussed at various times and places in the past. :)
Is there a better way of doing this?
I think starting from 5, the rest are probably just details to hammer
out, but not insurmountable things.
Thanks for the considered response. And sorry for the very slow reply.
On Wed, 6 Sept 2023 at 21:27, Guillem Jover wrote:
I'm not sure I entirely agree with the requirements you set forth
though:
- I think such optimized builds might need to be done with "special
toolchains" (these could simply be wrappers over the host compiler
passing the appropriate flags via command-line or via specs or
similar, not necessarily full blown toolchains), passing these via
something like dpkg-buildflags seems currently unreliable, as I don't
think we have full coverage in packages (neither for all compilers
available)? Although it would be better as it would centralize the
management. (For reference this is in part how rpm handles this:
https://github.com/rpm-software-management/rpm/blob/master/rpmrc.in)
I agree that it is not completely clear what the best approach here is: do we change the defaults of gcc or influence things via default buildflags?
I'm sure there are packages that do not respect dpkg-buildflags during
build but the consequences of this do not seem all that great -- such packages would not be optimized for the variant / ISA but if someone
manages to notice this, they can fix the bug.
OTOH, having the compiler default change may be a bit of a surprise for people who build binaries for deployment not via Debian packages. (Do our compilers in general target the same baseline as Debian does for a given architecture?).
- Perhaps that's a limitation from the archive software side, but
requiring to place the binary packages in the same pool seems
rather restrictive (it forces different filenames for example).
We are considering supporting multiple variant/ISAs in the primary Ubuntu archive, so if we get that far then yes, we want to have all the binary packages in the same pool. The first steps don't have to support this I guess.
- I guess it might be nice for the ISA to be passed down to the
dpkg tools, but I don't think this is strictly necessary? A
frontend like apt could also decide based on metadata in say the
Release file, although not having the actual installed package
metadata on whether it was a different ISA build or not would make
its job more inconvenient. In any case I don't have a big issue
with recording this via dpkg-gencontrol or similar if necessary.
I agree, I don't think it's /strictly/ required that the target ISA is recorded in the deb. But I think adding a field for it reduces scope for confusion later.
On the specific implementation details:
- As covered in previous discussions, dpkg could (but I don't think
it's necessary) check whether the .deb is runnable on the current
hw, but that's tricky as chrootless installs need to be taken
into account, etc. It should certainly not be part of dependency
resolution.
I'm sorry, what is a chrootless install? But I think I agree here too:
tricky and just not really worth it.
- I'm not fond of having to change the binary package name format
either for this (name_version_arch.deb) even if at least dpkg
itself does not care (but I know other tools do care), and
depending on the format I'd expect things to break (this goes
back to the shared pool concern).
I don't think this is avoidable in the long run. I must admit I have generally thought of the presence of the architecture name in the .deb file name to be more a convention than part of the format (and the "real" indication of a binary package's architecture is in DEBIAN/control).
- If dpkg-architecture needs to be aware of this, then this might need
to be auto-detectable from just the current toolchain being used.
So you are saying to configure a build environment for, say, x86-64-v3 you would configure gcc with --with-arch64=x86-64-v3 and then dpkg-architecture would parse the output of gcc -Q --help=target to set DEB_HOST_ARCH_VARIANT appropriately? (modulo mistakes in details) Or do you mean something else entirely?
Some of the above problems could perhaps be avoided if we introduced
a concept of architecture aliases/ISAs (similar to what rpm has), which would side-step the pool sharing issue, the binary package renaming,
etc. One big issue with this is that it requires for dpkg to have an exhaustive table of all such aliases, and if there's ever a new alias added, old dpkg versions need to be updated or they will not understand what they match with. So this does not seem ideal either. So I guess this is a variation over your proposal, but perhaps this could still be used
in specific contexts, say only at build-time (but not for dependency relationships), for repo management (say binary-arm64v9/Packages.xz),
or binary package names where the field would specify the actual name
for the filename, say:
Architecture: arm64
ArchitectureIsa: arm64v9
or maybe better:
Architecture: arm64
ArchitectureIsa: v9
resulting in dpkg-deb generating:
binpkg_1.0-1_arm64v9.deb
but targeting arm64.
I'm not sure but I think you have talked yourself into suggesting something very similar to my proposal here?
On Fri, 2023-09-01 at 08:43:55 +1200, Michael Hudson-Doyle wrote:
Is there a better way of doing this?
I think starting from 5, the rest are probably just details to hammer
out, but not insurmountable things.
Great. The things I see as a bit vague at a base level currently are:
* Should the ISA influence the toolchain via toolchain defaults or dpkg-buildflags?
* How is the default ISA for a buildd chroot selected?
There is also the question of whether partial coverage of an ISA is handled by the package publisher or client side in apt but that's at least one
level higher.
Hi!
On Thu, 2023-09-21 at 14:43:42 +1200, Michael Hudson-Doyle wrote:
Thanks for the considered response. And sorry for the very slow reply.
Idem! :)
On Wed, 6 Sept 2023 at 21:27, Guillem Jover wrote:
I'm not sure I entirely agree with the requirements you set forth
though:
- I think such optimized builds might need to be done with "special
toolchains" (these could simply be wrappers over the host compiler
passing the appropriate flags via command-line or via specs or
similar, not necessarily full blown toolchains), passing these via
something like dpkg-buildflags seems currently unreliable, as I don't
think we have full coverage in packages (neither for all compilers
available)? Although it would be better as it would centralize the
management. (For reference this is in part how rpm handles this:
https://github.com/rpm-software-management/rpm/blob/master/rpmrc.in)
I agree that it is not completely clear what the best approach here is: do we
change the defaults of gcc or influence things via default buildflags?
I'm sure there are packages that do not respect dpkg-buildflags during build but the consequences of this do not seem all that great -- such packages would not be optimized for the variant / ISA but if someone manages to notice this, they can fix the bug.
OTOH, having the compiler default change may be a bit of a surprise for people who build binaries for deployment not via Debian packages. (Do our compilers in general target the same baseline as Debian does for a given architecture?).
Right, given that the failure mode would be just "no-optimized-builds",
and should not end up with those packages being broken, at most just redundant with the baseline ones, then I guess controlling it either
way would seem fine, yes.
(Also if the packages are reproducible, and end up being not optimized
this might be detectable as producing identical artifacts as on the baseline.)
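
As a rough illustration of that check, here is a sketch that compares a file from the baseline build with the same file from the variant build by digest. The function names are made up for this example, and it assumes the comparison is done on the built payload (e.g. the extracted binaries) rather than on the .deb itself, since any recorded ISA field would make the .deb differ anyway:

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def variant_is_redundant(baseline_path, variant_path):
    """True if the variant artifact is byte-identical to the baseline one.

    With reproducible builds this suggests the ISA flags never reached
    the compiler for this package.
    """
    return sha256_of(baseline_path) == sha256_of(variant_path)
```

This only flags the "flags were ignored entirely" case; a package where the flags reach just some objects would still produce a different artifact.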
- Perhaps that's a limitation from the archive software side, but
requiring to place the binary packages in the same pool seems
rather restrictive (it forces different filenames for example).
We are considering supporting multiple variant/ISAs in the primary Ubuntu archive, so if we get that far then yes, we want to have all the binary packages in the same pool. The first steps don't have to support this I guess.
Ok. Just a note that even if served from the primary archive, there
could be multiple pools (like the multi-pool setup on debian-ports),
as the entry points are the (In)Release files.
But, yes, the other option would be to use the variant/ISA name as a
"fake arch" just in the binary package name.
- I guess it might be nice for the ISA to be passed down to the
dpkg tools, but I don't think this is strictly necessary? A
frontend like apt could also decide based on metadata in say the
Release file, although not having the actual installed package
metadata on whether it was a different ISA build or not would make
its job more inconvenient. In any case I don't have a big issue
with recording this via dpkg-gencontrol or similar if necessary.
I agree, I don't think it's /strictly/ required that the target ISA is recorded in the deb. But I think adding a field for it reduces scope for confusion later.
Yes, agreed.
On the specific implementation details:
- As covered in previous discussions, dpkg could (but I don't think
it's necessary) check whether the .deb is runnable on the current
hw, but that's tricky as chrootless installs need to be taken
into account, etc. It should certainly not be part of dependency
resolution.
I'm sorry, what is a chrootless install? But I think I agree here too: tricky and just not really worth it.
https://wiki.debian.org/Teams/Dpkg/Spec/InstallBootstrap
This can be used among other things to set up foreign chroots, by
running the host tools, so disallowing installation could be
problematic. Even though I guess there could be a warning about this,
or maybe it could be controlled through a force option, although both
seems like they could be disruptive.
- I'm not fond of having to change the binary package name format
either for this (name_version_arch.deb) even if at least dpkg
itself does not care (but I know other tools do care), and
depending on the format I'd expect things to break (this goes
back to the shared pool concern).
I don't think this is avoidable in the long run. I must admit I have generally thought of the presence of the architecture name in the .deb file
name to be more a convention than part of the format (and the "real" indication of a binary package's architecture is in DEBIAN/control).
Yes and no I guess. In theory the (canonical) information should be
extracted from the DEBIAN/control from inside the .deb, in practice
I think tools (?) (might) try to use heuristics from just the filename
to avoid having to open, uncompress and parse every .deb around, for performance reasons.
If the only change in the package filename format is in the <arch> part
where we'd use a name which would otherwise be valid as an arch name (so,
no weird symbols, or «-» separators that are not intended to split <os>
and <cpu> or similar), then using a name for the variant/ISA would be
fine.
- If dpkg-architecture needs to be aware of this, then this might need
to be auto-detectable from just the current toolchain being used.
So you are saying to configure a build environment for, say, x86-64-v3 you
would configure gcc with --with-arch64=x86-64-v3 and then dpkg-architecture
would parse the output of gcc -Q --help=target to set DEB_HOST_ARCH_VARIANT
appropriately? (modulo mistakes in details) Or do you mean something else entirely?
That would be one solution yes, which could give automatic bijective mappings, although ideally with a machine-readable way to get at it,
which I'm not sure we have currently.
For example code in dpkg-dev
already runs «$CC -dumpmachine» to infer the host architecture to use during builds.
While using a triplet variation could be a way to do that, that would
require such triplet support for each variant/ISA, which tends to be
very painful to introduce if it's not there already, so I'd not
consider this specific way a viable option.
Some of the above problems could perhaps be avoided if we introduced
a concept of architecture aliases/ISAs (similar to what rpm has), which would side-step the pool sharing issue, the binary package renaming,
etc. One big issue with this is that it requires for dpkg to have an exhaustive table of all such aliases, and if there's ever a new alias added, old dpkg versions need to be updated or they will not understand what they match with. So this does not seem ideal either. So I guess this
is a variation over your proposal, but perhaps this could still be used in specific contexts, say only at build-time (but not for dependency relationships), for repo management (say binary-arm64v9/Packages.xz),
or binary package names where the field would specify the actual name
for the filename, say:
Architecture: arm64
ArchitectureIsa: arm64v9
or maybe better:
Architecture: arm64
ArchitectureIsa: v9
resulting in dpkg-deb generating:
binpkg_1.0-1_arm64v9.deb
but targeting arm64.
I'm not sure but I think you have talked yourself into suggesting something
very similar to my proposal here?
Ah sorry, yeah, didn't mean to present it as a new idea, I was mostly
trying to walk over the issues, and refine upon your initial idea,
with those constraints applied. :)
On Fri, 2023-09-01 at 08:43:55 +1200, Michael Hudson-Doyle wrote:
Is there a better way of doing this?
I think starting from 5, the rest are probably just details to hammer out, but not insurmountable things.
Great. The things I see as a bit vague at a base level currently are:
* Should the ISA influence the toolchain via toolchain defaults or dpkg-buildflags?
* How is the default ISA for a buildd chroot selected?
So the clear downsides of either modifying the default toolchain or
having to provide an additional one is that this seems pretty heavy
weight. Also because people might want to build optimized variants
locally w/o having to mess with their already existing toolchains.
(I'm not sure whether something going along the lines of <https://git.hadrons.org/cgit/debian/fakecross.git> could be an
option, although as mentioned above, if that would imply new triplets,
then probably not.)
So the easiest way might indeed be by controlling this via an envvar,
which dpkg-buildpackage could also setup internally via a new option,
say --arch-isa=amd64v3 or similar to make this slightly more
discoverable. Which would be easy to use from the buildds too I guess.
There is also the question of whether partial coverage of an ISA is handled
by the package publisher or client side in apt but that's at least one level higher.
Yeah, that would be of no concern to dpkg, I think.
On Tue, 31 Oct 2023 at 09:21, Guillem Jover wrote:
This can be used among other things to set up foreign chroots, by
running the host tools, so disallowing installation could be
problematic. Even though I guess there could be a warning about this,
or maybe it could be controlled through a force option, although both
seems like they could be disruptive.
Of course in such cases dpkg knows that something funny is going on and
could suppress the warning itself.
I spent a few minutes trying to think hard about this and I honestly don't think I can predict whether trying to prevent installation of incompatible packages is worth it (after all one of the ways users could get into
trouble would be moving an installed system to a different CPU and having binaries start to fail and obviously dpkg can't help there).
One result of this thinking was: I had been thinking/assuming the issue of which variants to consider would be apt configuration, but maybe dpkg configuration would make more sense (after all, --add-architecture is a parameter to dpkg). And in this case, dpkg could object when installing a variant that has not been configured.
If the only change in the package filename format is in the <arch> part where we'd use a name which would otherwise be valid as an arch name (so, no weird symbols, or «-» separators that are not intended to split <os> and <cpu> or similar), then using a name for the variant/ISA would be
fine.
Right. I think that (when possible) pretending e.g. "amd64v3" is a distinct architecture will generally make things easier. E.g. I think britney
wouldn't need to know about the relationship between "amd64" and "amd64v3".
That would be one solution yes, which could give automatic bijective mappings, although ideally with a machine-readable way to get at it,
which I'm not sure we have currently.
I think "gcc -Q --help=target | grep -e '^\s*-march'" is about as machine readable as it gets currently, for better or worse (mostly worse).
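
For what it's worth, a small sketch of that scraping approach. It assumes the default shows up on a line of the form "  -march=<whitespace><value>", which is what the grep above relies on; the helper names are made up here and nothing like this exists in dpkg-dev today:

```python
import re
import subprocess

# Match an indented "-march=" line followed by a value on the SAME line.
# [ \t] is used instead of \s so an empty value cannot swallow the next line.
_MARCH_RE = re.compile(r"^[ \t]*-march=[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def parse_march(help_target_output):
    """Extract the default -march value from `gcc -Q --help=target` output.

    Returns None if no value is reported.
    """
    match = _MARCH_RE.search(help_target_output)
    return match.group(1) if match else None


def compiler_default_march(cc="gcc"):
    """Run the compiler and return its default -march, or None."""
    result = subprocess.run(
        [cc, "-Q", "--help=target"],
        capture_output=True, text=True, check=True,
    )
    return parse_march(result.stdout)
```

Mapping the returned -march string back to a variant name would still need a table on the dpkg side, and this depends on human-oriented output whose layout is not guaranteed to stay stable.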
For example code in dpkg-dev
already runs «$CC -dumpmachine» to infer the host architecture to use during builds.
While using a triplet variation could be a way to do that, that would require such triplet support for each variant/ISA, which tends to be
very painful to introduce if it's not there already, so I'd not
consider this specific way a viable option.
I admit I'm not an expert on triplet intricacies but I think a new triplet
is not appropriate here (a bit like a new Debian architecture for a variant/ISA choice is not the right concept).
On Thu, 2023-09-21 at 14:43:42 +1200, Michael Hudson-Doyle wrote:
* Should the ISA influence the toolchain via toolchain defaults or dpkg-buildflags?
* How is the default ISA for a buildd chroot selected?
So the clear downsides of either modifying the default toolchain or
having to provide an additional one is that this seems pretty heavy
weight. Also because people might want to build optimized variants
locally w/o having to mess with their already existing toolchains.
(I'm not sure whether something going along the lines of <https://git.hadrons.org/cgit/debian/fakecross.git> could be an
option, although as mentioned above, if that would imply new triplets,
then probably not.)
So the easiest way might indeed be by controlling this via an envvar,
DEB_HOST_ARCH_ISA?
which dpkg-buildpackage could also setup internally via a new option,
say --arch-isa=amd64v3 or similar
--host-arch-isa would be more coherent I think.
I guess one could add support for --target-host-arch-isa to build a
toolchain that defaults to a particular ISA. But well.
So to summarise, here are the generic changes that I think need to be made
to src:dpkg to support variant ISAs as a thing:
* add get_host_arch_isa() to Dpkg::Arch
* dpkg-gencontrol records DEB_HOST_ARCH_ISA into DEBIAN/control as ArchitectureIsa
* dpkg-architecture emits DEB_HOST_ARCH_ISA and grows --host-arch-isa flag
* dpkg-buildpackage passes --host-arch-isa to dpkg-architecture
* dpkg-genchanges should record the ISA in the changes file somehow I
guess?
* dpkg-deb records the ISA in the file name
Have I missed anything?
(Hmm does anything need to reject unknown values
found in DEB_HOST_ARCH_ISA / --host-arch-isa? Probably...)
Conceptually slightly separately, it might make sense to add a build "feature" to Dpkg::Vendor::Debian to allow setting -march (and -mtune?)
Then when we want to add support to an ISA, we add a little thing to set_build_features (in either Vendor::Debian or Vendor::Ubuntu or wherever) that maps get_host_arch_isa() to values for the march-influencing feature.
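
To make the idea concrete, a rough sketch of such a mapping, written in Python for brevity (the real hook would live in the Perl vendor modules). ISA_MARCH and apply_isa_flags() are hypothetical names, the ISA names are the ones used as examples in this thread, and only the -march values themselves are existing GCC options:

```python
# Hypothetical table: ISA variant name -> compiler flag selecting it.
ISA_MARCH = {
    "amd64v3": "-march=x86-64-v3",
    "arm64v9": "-march=armv9-a",
}


def apply_isa_flags(build_flags, host_arch_isa):
    """Return a copy of `build_flags` with the ISA's -march appended.

    `build_flags` maps flag variable names (e.g. "CFLAGS") to lists of
    options. Baseline or unknown ISAs leave the flags unchanged.
    """
    result = {name: list(values) for name, values in build_flags.items()}
    march = ISA_MARCH.get(host_arch_isa)
    if march is None:
        return result
    for name in ("CFLAGS", "CXXFLAGS"):
        result.setdefault(name, []).append(march)
    return result


flags = apply_isa_flags({"CFLAGS": ["-g", "-O2"]}, "amd64v3")
print(" ".join(flags["CFLAGS"]))
```

Which flag variables need the option (other compiled languages, link-time flags) and whether -mtune belongs here are left open, as in the text above.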
Hi!
On Thu, 2023-11-02 at 15:27:54 +0000, Michael Hudson-Doyle wrote:
On Tue, 31 Oct 2023 at 09:21, Guillem Jover wrote:
This can be used among other things to set up foreign chroots, by
running the host tools, so disallowing installation could be
problematic. Even though I guess there could be a warning about this,
or maybe it could be controlled through a force option, although both seems like they could be disruptive.
Of course in such cases dpkg knows that something funny is going on and could suppress the warning itself.
Right, also true.
I spent a few minutes trying to think hard about this and I honestly don't
think I can predict whether trying to prevent installation of incompatible
packages is worth it (after all one of the ways users could get into trouble would be moving an installed system to a different CPU and having binaries start to fail and obviously dpkg can't help there).
One result of this thinking was: I had been thinking/assuming the issue of
which variants to consider would be apt configuration, but maybe dpkg configuration would make more sense (after all, --add-architecture is a parameter to dpkg). And in this case, dpkg could object when installing a variant that has not been configured.
Yes, the "plan" has been to add support for run-time CPU attributes,
so that when adding a new arch, for example you can specify whether
that arch is runnable, which could help dpkg decide whether to allow
by default to install M-A:foreign packages.
I guess this is similar, so such future interface should probably take
this into account as something to support too. Will check where this
is tracked and add a note to it.
And of course that is fine as a guardrail, but if a user hit that out
of running a frontend, then that would already be too late, which to
me means that frontends need to be aware of this too (and not pass
packages that dpkg would/could/might refuse to install), when deciding
what to pass to dpkg.
But in any case, as you say, this currently would not be worse than configuring a foreign arch, installing some foreign package and trying
to run it, but it might make it potentially more common. And as
mentioned above the effective layer where this needs to be decided seems
higher anyway (even if dpkg could provide the infra for it).
If the only change in the package filename format is in the <arch> part where we'd use a name which would otherwise be valid as an arch name (so,
no weird symbols, or «-» separators that are not intended to split <os> and <cpu> or similar), then using a name for the variant/ISA would be fine.
Right. I think that (when possible) pretending e.g. "amd64v3" is a distinct
architecture will generally make things easier. E.g. I think britney wouldn't need to know about the relationship between "amd64" and "amd64v3".
I guess that depends on whether the intention is to create a full
optimized archive, or just a partial overlay one. In the latter case
then it might need to know to be able to satisfy dependencies.
That would be one solution yes, which could give automatic bijective mappings, although ideally with a machine-readable way to get at it, which I'm not sure we have currently.
I think "gcc -Q --help=target | grep -e '^\s*-march'" is about as machine readable as it gets currently, for better or worse (mostly worse).
That does not look very satisfactory, though.
And llvm/clang does not support it. :/
For example code in dpkg-dev
already runs «$CC -dumpmachine» to infer the host architecture to use during builds.
While using a triplet variation could be a way to do that, that would require such triplet support for each variant/ISA, which tends to be
very painful to introduce if it's not there already, so I'd not
consider this specific way a viable option.
I admit I'm not an expert on triplet intricacies but I think a new triplet
is not appropriate here (a bit like a new Debian architecture for a variant/ISA choice is not the right concept).
We have i386 or arm (?) as (bad IMO) examples where the triplet can
define the arch baseline. The problem is that this requires updating
the GNU config.git upstream, and then getting that to trickle down into
every package that might be using autotools and not using autoreconf
at build time, or to even update triplet matches in configure scripts
and similar, which might be "acceptable" for a new arch, but seems disproportionate for a new ISA, so yes, as mentioned I agree it's not
viable.
On Thu, 2023-09-21 at 14:43:42 +1200, Michael Hudson-Doyle wrote:
* Should the ISA influence the toolchain via toolchain defaults or dpkg-buildflags?
* How is the default ISA for a buildd chroot selected?
So the clear downsides of either modifying the default toolchain or having to provide an additional one is that this seems pretty heavy weight. Also because people might want to build optimized variants locally w/o having to mess with their already existing toolchains.
(I'm not sure whether something going along the lines of <https://git.hadrons.org/cgit/debian/fakecross.git> could be an
option, although as mentioned above, if that would imply new triplets, then probably not.)
So the easiest way might indeed be by controlling this via an envvar,
DEB_HOST_ARCH_ISA?
Yeah, that works, and follows the current DEB_*_ARCH_ABI lead for
example.
which dpkg-buildpackage could also setup internally via a new option,
say --arch-isa=amd64v3 or similar
--host-arch-isa would be more coherent I think.
Ah absolutely! For some reason had --arch in mind as a valid option
(I only see it now in dpkg-scanpackages :D, or maybe I was thinking
about --host-isa :).
I guess one could add support for --target-host-arch-isa to build a toolchain that defaults to a particular ISA. But well.
Yes, the ISA support in dpkg should be extensive enough (so that if
this needs to be supported in the toolchain, then it is possible).
So to summarise, here are the generic changes that I think need to be made
to src:dpkg to support variant ISAs as a thing:
* add get_host_arch_isa() to Dpkg::Arch
Yes (perhaps as mentioned below also just get_host_isa()).
* dpkg-gencontrol records DEB_HOST_ARCH_ISA into DEBIAN/control as ArchitectureIsa
Probably better Architecture-Isa, otherwise the current automatic
case folding would make it end up as Architectureisa.
* dpkg-architecture emits DEB_HOST_ARCH_ISA and grows --host-arch-isa flag
Also DEB_BUILD_ARCH_ISA and DEB_TARGET_ARCH_ISA, and also grows a --target-arch-isa (but I'm thinking whether the shorter --host-isa would
be nicer, for example the --match-bits are not spelled --match-arch-bits, which would seem also a bit redundant).
* dpkg-buildpackage passes --host-arch-isa to dpkg-architecture
Yes, but only when not the baseline.
* dpkg-genchanges should record the ISA in the changes file somehow I
guess?
Yes, also dpkg-genbuildinfo.
This could be done either from the
envvars, or perhaps through the debian/files attributes support. But
given that this is supposedly build global (I think it would be rather
weird to end up with a .changes including say _amd64.deb and
_amd64v3.deb file references from the same build),
then probably using
the envvar might be the better way.
* dpkg-deb records the ISA in the file name
Yes.
Have I missed anything?
Nothing else comes to mind right now (except what I might have already added).
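The case-folding concern about the field name (ArchitectureIsa versus Architecture-Isa) can be illustrated with a minimal Python sketch. It assumes the normalization lowercases the name and then capitalizes each hyphen-separated part; the function name is borrowed for illustration only, and the real dpkg code is Perl with extra special cases that are not modelled here.

```python
def field_capitalize(name: str) -> str:
    """Lowercase a field name, then capitalize each hyphen-separated part.

    Simplified imitation of the automatic case folding discussed in the
    thread; not the actual dpkg implementation.
    """
    return "-".join(part.capitalize() for part in name.lower().split("-"))


# Without a hyphen the inner capital is lost by the folding.
print(field_capitalize("ArchitectureIsa"))   # Architectureisa
# With a hyphen each component keeps its leading capital.
print(field_capitalize("Architecture-Isa"))  # Architecture-Isa
```

Under this assumption the hyphenated spelling survives a round trip through the normalization, which is the reason given for preferring it.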
(Hmm does anything need to reject unknown values
found in DEB_HOST_ARCH_ISA / --host-arch-isa? Probably...)
I guess it indeed makes sense to define what ISAs are supported, and
either error out or warn and ignore such values. So there might be a
need to add something like a new data/isatable.
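A rough Python sketch of the "error out or warn and ignore" choice follows. Everything in it is hypothetical: the table layout, the `resolve_host_isa` name and the `strict` switch are invented for illustration, and only the ISA names amd64v3 and arm64v9 come from this thread.

```python
import warnings

# Hypothetical stand-in for a data/isatable: architecture -> accepted ISA names.
# Only "amd64v3" and "arm64v9" are taken from the thread; the layout is invented.
ISA_TABLE = {
    "amd64": {"amd64v3"},
    "arm64": {"arm64v9"},
}


def resolve_host_isa(arch, requested, strict=True):
    """Return the ISA to build for, or None to mean the arch baseline."""
    if not requested:
        # Nothing requested (e.g. DEB_HOST_ARCH_ISA unset): use the baseline.
        return None
    if requested in ISA_TABLE.get(arch, set()):
        return requested
    msg = f"unknown ISA {requested!r} for architecture {arch!r}"
    if strict:
        # "Error out" behaviour.
        raise ValueError(msg)
    # "Warn and ignore" behaviour: fall back to the baseline.
    warnings.warn(msg + "; falling back to the baseline")
    return None
```

Which of the two behaviours is the default is exactly the open question above; the sketch only shows that both are cheap to support once a table exists.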
Conceptually slightly separately, it might make sense to add a build "feature" to Dpkg::Vendor::Debian to allow setting -march (and -mtune?)
Then when we want to add support to an ISA, we add a little thing to set_build_features (in either Vendor::Debian or Vendor::Ubuntu or wherever)
that maps get_host_arch_isa() to values for the march-influencing feature.
Hmm, right, how to hook this. I'm not sure the current interface is good enough to describe this via build flags features, because such new feature area would expose arch-specific features. I have been thinking through
the build flags and will try to send a proposal/RFC to revamp parts of
it during the weekend.
But I think the ISA stuff is better just handled
(at least for now) directly by injecting whatever flags when the requested
ISA is different to the baseline.
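To make "injecting whatever flags when the requested ISA is different to the baseline" concrete, here is a minimal Python sketch. The mapping and the `isa_build_flags` helper are assumptions for illustration: `-march=x86-64-v3` is a value GCC and Clang accept, but pairing it with the name amd64v3 is a guess at what such a table might contain, not something dpkg defines.

```python
# Hypothetical mapping from ISA name to extra compiler flags.
# "-march=x86-64-v3" is a real compiler value; tying it to "amd64v3"
# is an assumption made for this example.
ISA_FLAGS = {
    "amd64v3": ["-march=x86-64-v3"],
}


def isa_build_flags(isa, baseline=None):
    """Return extra flags for a non-baseline ISA, or [] for the baseline."""
    if isa is None or isa == baseline:
        # Baseline builds get no additional flags.
        return []
    # An ISA missing from the table raises KeyError here.
    return list(ISA_FLAGS[isa])


cflags = ["-g", "-O2"] + isa_build_flags("amd64v3")
print(" ".join(cflags))  # -g -O2 -march=x86-64-v3
```

A baseline build passes through untouched, so existing packages would see no change unless an ISA is explicitly requested.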
Hi!
On Thu, 2023-11-02 at 15:27:54 +0000, Michael Hudson-Doyle wrote:
On Tue, 31 Oct 2023 at 09:21, Guillem Jover wrote:
This can be used among other things to set up foreign chroots, by
running the host tools, so disallowing installation could be
problematic. Even though I guess there could be a warning about this,
or maybe it could be controlled through a force option, although both seem like they could be disruptive.
Of course in such cases dpkg knows that something funny is going on and could suppress the warning itself.
Right, also true.
I spent a few minutes trying to think hard about this and I honestly don't think I can predict whether trying to prevent installation of incompatible packages is worth it (after all one of the ways users could get into trouble would be moving an installed system to a different CPU and having binaries start to fail and obviously dpkg can't help there).
One result of this thinking was: I had been thinking/assuming the issue of which variants to consider would be apt configuration, but maybe dpkg configuration would make more sense (after all, --add-architecture is a parameter to dpkg). And in this case, dpkg could object when installing a variant that has not been configured.
Yes, the "plan" has been to add support for run-time CPU attributes,
so that when adding a new arch, for example you can specify whether
that arch is runnable, which could help dpkg decide whether to allow
by default to install M-A:foreign packages.
I guess this is similar, so such future interface should probably take
this into account as something to support too. Will check where this
is tracked and add a note to it.
And of course that is fine as a guardrail, but if a user hit that out
of running a frontend, then that would already be too late, which to
me means that frontends need to be aware of this too (and not pass
packages that dpkg would/could/might refuse to install), when deciding
what to pass to dpkg.
But in any case, as you say, this currently would not be worse than configuring a foreign arch, installing some foreign package and trying
to run it, but it might make it potentially more common. And as
mentioned above, the layer where this needs to be decided seems
higher up anyway (even if dpkg could provide the infra for it).
Hi!
On Fri, 2023-09-01 at 08:43:55 +1200, Michael Hudson-Doyle wrote:
Recently the topic of exploiting newer instructions without dropping support for older machines has come up several times inside Ubuntu engineering. I understand this topic has come up several times in the past for Debian as well, but nothing has really come of it to date.
I also had a chat about this with Matthias Klose (CCed) around 2022-05.
I've spent a while thinking through the options and coming up with a design and wrote some notes into a wiki page: https://wiki.debian.org/ArchitectureVariants
I think we are already doing 1, 2 and 3. I agree 4 is just wrong. And something like 5 is what I suggested to Matthias for Ubuntu when we
last discussed it as the best way to go about this.
I'm not sure I entirely agree with the requirements you set forth
though:
- I think such optimized builds might need to be done with "special
toolchains" (these could simply be wrappers over the host compiler
passing the appropriate flags via command-line or via specs or
similar, not necessarily full blown toolchains), passing these via
something like dpkg-buildflags seems currently unreliable, as I don't
think we have full coverage in packages (neither for all compilers
available)? Although it would be better as it would centralize the
management. (For reference this is in part how rpm handles this:
https://github.com/rpm-software-management/rpm/blob/master/rpmrc.in)
- Perhaps that's a limitation from the archive software side, but
requiring to place the binary packages in the same pool seems
rather restrictive (it forces different filenames for example).
- I guess it might be nice for the ISA to be passed down to the
dpkg tools, but I don't think this is strictly necessary? A
frontend like apt could also decide based on metadata in say the
Release file, although not having the actual installed package
metadata on whether it was a different ISA build or not would make
its job more inconvenient. In any case I don't have a big issue
with recording this via dpkg-gencontrol or similar if necessary.
On the specific implementation details:
- Changing the Architecture format (as in adding colons there) seems
like a non-starter, and I expect that would break lots of things
(I mean it could be done but I'm not sure it's worth it for this).
Recording this more as a hint than anything else, via another
field (if necessary at all) I think would be best.
- As covered in previous discussions, dpkg could (but I don't think
it's necessary) check whether the .deb is runnable on the current
hw, but that's tricky as chrootless installs need to be taken
into account, etc. It should certainly not be part of dependency
resolution.
- I'm not fond of having to change the binary package name format
either for this (name_version_arch.deb) even if at least dpkg
itself does not care (but I know other tools do care), and
depending on the format I'd expect things to break (this goes
back to the shared pool concern).
- If dpkg-architecture needs to be aware of this, then this might need
to be auto-detectable from just the current toolchain being used.
Some of the above problems could perhaps be avoided if we introduced
a concept of architecture aliases/ISAs (similar to what rpm has), which
would side-step the pool sharing issue, the binary package renaming,
etc. One big issue with this is that it requires for dpkg to have an exhaustive table of all such aliases, and if there's ever a new alias
added, old dpkg versions need to be updated or they will not understand
what they match with. So this does not seem ideal either. So I guess this
is a variation over your proposal, but perhaps this could still be used
in specific contexts, say only at build-time (but not for dependency relationships), for repo management (say binary-arm64v9/Packages.xz),
or binary package names where the field would specify the actual name
for the filename, say:
Architecture: arm64
ArchitectureIsa: arm64v9
or maybe better:
Architecture: arm64
ArchitectureIsa: v9
resulting in dpkg-deb generating:
binpkg_1.0-1_arm64v9.deb
but targeting arm64. I also think I prefer naming this explicitly as ISA variants, if you will, than just architecture variants as that gives
way too much room (which perhaps we want, but then that has other implications over compatibility), and for the field perhaps just Isa is better, to avoid the implicit repetition of ArchitectureInstructionSetArchitecture :), but that makes it less easy
to associate both as related.
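The file name derivation suggested above (Architecture: arm64 plus ArchitectureIsa: v9 giving binpkg_1.0-1_arm64v9.deb) can be sketched in a few lines of Python. The `binary_filename` helper is hypothetical and follows the "maybe better" variant where the ISA field holds only the suffix; it does not model other details of how dpkg-deb builds names, such as epoch handling.

```python
def binary_filename(package, version, arch, isa=None):
    """Build name_version_arch.deb, appending the ISA suffix when one is set."""
    # With no ISA the name is the same as today's baseline package name.
    arch_part = arch + isa if isa else arch
    return f"{package}_{version}_{arch_part}.deb"


print(binary_filename("binpkg", "1.0-1", "arm64", "v9"))  # binpkg_1.0-1_arm64v9.deb
print(binary_filename("binpkg", "1.0-1", "arm64"))        # binpkg_1.0-1_arm64.deb
```

The Architecture value itself stays arm64 in both cases; only the file name differs, which is what lets both builds share one pool.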
On Wed, Sep 06, 2023 at 11:27:02AM +0200, Guillem Jover wrote:
I also think I prefer naming this explicitly as ISA
variants, if you will, than just architecture variants as that gives
way too much room (which perhaps we want, but then that has other
implications over compatibility), and for the field perhaps just Isa is
better, to avoid the implicit repetition of
ArchitectureInstructionSetArchitecture :), but that makes it less easy
to associate both as related.
I have thought more about this and I'm not particularly fond of the ArchitectureIsa name. While *this specific use case* is a variant of
the architecture instruction set; you could just as well build other
variants such as "compiled with -O3", "compiled with frame pointers", "compiled with -O0", or other shenanigans (I haven't thought about
others outside compiler flags)
Hence I prefer Architecture-Variant, Subarchitecture, or anything
like that rather than have to invent another field or abuse this
one the next time we want to build a special variant of an architecture
with special optimizations for a special customer or whatnot.
On 03.05.24 11:27, Julian Andres Klode wrote:
I have thought more about this and I'm not particularly fond of the ArchitectureIsa name. While *this specific use case* is a variant of
the architecture instruction set; you could just as well build other variants such as "compiled with -O3", "compiled with frame pointers", "compiled with -O0", or other shenanigans (I haven't thought about
others outside compiler flags)
or
- DistroBuiltWithClang, e.g. using libc++ instead of libstdc++.
- distro built with some of the sanitizers turned on by default
Hence I prefer Architecture-Variant, Subarchitecture, or anything
like that rather than have to invent another field or abuse this
one the next time we want to build a special variant of an architecture with special optimizations for a special customer or whatnot.
Yes, it sounds a bit too specific.