• isa-support -- exit strategy?

    From Adam Borowski@21:1/5 to All on Fri Mar 25 23:40:01 2022
    Hi!
    While packages are allowed to not support entire architectures
    outright, there's a problem when some code requires a feature that is
    not present in the arch's baseline. Effectively, this punishes an arch
    for keeping compatibility. The package's maintainers are then required
    to conform to the baseline even when this requires significant work
    and/or is a pointless exercise (e.g. scientific number-crunching code
    makes no sense to run on a 2002 box).

    With that in mind, in 2017 I added "isa-support" which implements
    install-time checks via a dependency. Alas, this doesn't work as well
    as it should:

    * new installs fail quite late into the installation process, leaving
      you with a bunch of packages unpacked but unconfigured; some apt
      frontends don't handle this situation gracefully.

    * upgrades when an existing package drops support for old hardware are
    even worse.

    * while a hard Depends: works for leafy packages, on a library it
      disallows having alternate implementations that don't need the
      library in question. E.g., libvectorscan5 blocks a program that
      uses it from just checking the regexes one by one.

    Suggestions?


    Meow!
    --
    ⢀⣴⠾⠻⢶⣦⠀ Eight legs good, four legs bad! -- when your drider pwns a
    ⣾⠁⢠⠒⠀⣿⡁ smelly goodie centaur.
    ⢿⡄⠘⠷⠚⠋⠀ Rearkick OP -- my grandpa's brother-in-law got one-shotted
    ⠈⠳⣄⠀⠀⠀⠀ from full hp in RL, please nerf!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Wise@21:1/5 to Adam Borowski on Sat Mar 26 02:50:01 2022
    Adam Borowski wrote:

    > * new installs fail quite late into installation process, leaving you
    >   with a bunch of packages unpacked but unconfigured; some apt
    >   frontends don't take this situation gracefully.

    Maybe install isa-support by default and add an apt hook similar to
    the apt-listbugs one that blocks installation of unsupported packages
    before the installation process starts.

    > * upgrades when an existing package drops support for old hardware are
    >   even worse.

    As above, but this approach is pretty terrible with automated upgrades,
    so there probably needs to be a way for hooks to communicate with
    automated upgrade tools that certain packages should not be upgraded.

    > * while a hard Depends: works for leafy packages, on a library it
    >   disallows having alternate implementations that don't need the
    >   library in question.  Eg, libvectorscan5 blocks a program that
    >   uses it from just checking the regexes one by one.

    Libraries really should do runtime CPU detection themselves and return
    failure when the current CPU isn't supported, then applications using
    them can fall back on alternative solutions.
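    In a library this check would live in C (e.g. cpuid or GCC's
    __builtin_cpu_supports); the same detect-then-fall-back pattern can
    be sketched at the shell level like this, with made-up program names:

    ```shell
    #!/bin/sh
    # Detect at runtime, fall back on failure: prefer a hypothetical
    # vectorscan-backed matcher when the CPU has SSE4.2, otherwise use a
    # hypothetical one-regex-at-a-time fallback.
    if grep -qw sse4_2 /proc/cpuinfo 2>/dev/null; then
        matcher="hs-matcher"        # hypothetical vectorscan-based tool
    else
        matcher="pcre-matcher"      # hypothetical serial libpcre fallback
    fi
    echo "selected: $matcher"
    ```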

    > Suggestions?

    Install time is a suboptimal point for deciding whether or not a
    certain package is supported by the CPU present during the current
    boot of a system. Live images run on many different CPUs. I have run
    a regular Debian install from an external USB hard drive on many
    different computers at different internet cafes. People often move
    their hard drive from an old/failed computer to a new computer.

    A better option might be to always allow installation, but have an apt
    hook or feature that warns or asks for confirmation when you install
    packages that are not supported on your current CPU, along with an
    executable that checks whether the CPU is supported, prints an error
    to stderr/X11/Wayland as appropriate, and otherwise executes the
    command.

    An sse4-support symlink could point to an isa-support executable, which
    could check $0 against the CPU and either do the warning or exec "$@",
    then maintainers could call sse4-support from wrapper scripts etc.
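    A rough sketch of that dispatcher, written here as a shell function
    rather than a symlink for readability (all names are illustrative,
    not an existing isa-support interface; a real sse4-support symlink
    would derive the feature from $0 instead of an argument):

    ```shell
    #!/bin/sh
    # Map a feature name to its /proc/cpuinfo flag, check the running CPU,
    # and either run the wrapped command or print an error and refuse.
    isa_support_run() {
        feature="$1"; shift
        case "$feature" in
            sse4) flag="sse4_2" ;;          # /proc/cpuinfo spells it sse4_2
            avx2) flag="avx2" ;;
            *)    echo "unknown feature: $feature" >&2; return 2 ;;
        esac
        if grep -qw "$flag" /proc/cpuinfo 2>/dev/null; then
            "$@"                            # CPU is capable: run the command
        else
            echo "$1 needs $feature, which this CPU lacks" >&2
            return 1
        fi
    }
    ```

    Usage would be e.g. `isa_support_run sse4 /usr/bin/someprog --args`;
    a desktop variant could replace the echo with a dialog.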

    The errors could be done by different executables so that the errors
    fit in with the desktop that is currently in use if any.

    It might be worth looking at how things like Steam and Flatpak/Snap
    solve this issue, I expect games and proprietary apps often use CPU
    features unsupported on old systems.

    I also wonder if qemu could be used to emulate newer CPU features on
    older systems. That would probably be unusably slow though.

    It is probably worth initiating a cross-distro discussion about this.
    The above idea could even become a cross-distro solution to this issue.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise

  • From M. Zhou@21:1/5 to Adam Borowski on Sat Mar 26 02:40:01 2022
    Hi Adam,

    I think the problems that apt/dpkg are trying to deal with are
    already complicated enough, and architecture-specific code is still
    not significant enough to justify a change there.

    Indeed, supporting number-crunching programs on ancient hardware is
    not meaningful, but the demand for number crunching on Debian is not
    that strong, according to my years of observation.

    Popular applications that can take advantage of above-baseline
    instruction sets will eventually gain a dynamic code dispatcher and a
    fallback path.

    Applications that seriously need performance will leave the CPU and
    go to the GPU or other hardware. If the user writes the code
    correctly and fully leverages the GPU, non-optimal CPU code won't
    necessarily be a bottleneck.

    Applications that seriously need CPU performance will probably tell
    their users how to tweak compilation parameters and how to compile
    locally.

    Ultimately, my thoughts on above-baseline support still come down to
    either source-based package distribution like Portage, or a small deb
    repository built with a customized dpkg-dev, as I mentioned in the
    past.

    On Fri, 2022-03-25 at 23:34 +0100, Adam Borowski wrote:
    > Hi!
    > While packages are allowed to not support entire architectures
    > outright, there's a problem when some code requires a feature that is
    > not present in the arch's baseline.  Effectively, this punishes an arch
    > for keeping compatibility.  The package's maintainers are then required
    > to conform to the baseline even when this requires a significant work
    > and/or is a pointless exercise (eg. scientific number-crunching code
    > makes no sense to run on a 2002 box).
    >
    > With that in mind, in 2017 I added "isa-support" which implements
    > install-time checks via a dependency.  Alas, this doesn't work as well
    > as it should:
    >
    > * new installs fail quite late into installation process, leaving you
    >   with a bunch of packages unpacked but unconfigured; some apt
    >   frontends don't take this situation gracefully.
    >
    > * upgrades when an existing package drops support for old hardware are
    >   even worse.
    >
    > * while a hard Depends: works for leafy packages, on a library it
    >   disallows having alternate implementations that don't need the
    >   library in question.  Eg, libvectorscan5 blocks a program that
    >   uses it from just checking the regexes one by one.
    >
    > Suggestions?
    >
    >
    > Meow!

  • From Andrey Rahmatullin@21:1/5 to Adam Borowski on Sat Mar 26 07:30:01 2022
    On Fri, Mar 25, 2022 at 11:34:17PM +0100, Adam Borowski wrote:
    > While packages are allowed to not support entire architectures
    > outright, there's a problem when some code requires a feature that is
    > not present in the arch's baseline. Effectively, this punishes an arch
    > for keeping compatibility. The package's maintainers are then required
    > to conform to the baseline even when this requires a significant work
    > and/or is a pointless exercise (eg. scientific number-crunching code
    > makes no sense to run on a 2002 box).
    A partial arch (whatever that is, yeah) with the x86-64-v3 baseline,
    and optionally raise the main amd64 baseline to x86-64-v2?
    Assuming we don't want to mass-modify software to support code
    separation into hwcaps etc. or runtime detection.

    > With that in mind, in 2017 I added "isa-support" which implements
    > install-time checks via a dependency. Alas, this doesn't work as well
    > as it should:
    (that was expected tbh)

    --
    WBR, wRAR

  • From Stephan Lachnit@21:1/5 to lumin@debian.org on Sat Mar 26 11:50:01 2022
    On Sat, Mar 26, 2022 at 2:36 AM M. Zhou <lumin@debian.org> wrote:

    > Indeed supporting number crunching programs on ancient
    > hardware is not meaningful, but the demand on Debian's
    > support for number crunching is not that strong according
    > to my years of observation.
    >
    > For popular applications that can take advantage of above-baseline
    > instruction sets, they will eventually write the dynamic code
    > dispatcher and add the fallback.
    >
    > For applications seriously need performance, they will
    > leave CPU and go to GPU or other hardware. If the user correctly
    > write the code and fully leverage GPU, the non-optimal CPU
    > code won't necessarily be a bottleneck.
    >
    > For applications seriously need CPU performance, they are
    > possibly going to tell the users how to tweak compiling
    > parameters and how to compile locally.

    I have to disagree on this one. Yes, runtime detection and GPU
    acceleration are great and all, but not every scientific library does
    it, and I think it's unrealistic for us to patch them all up.
    I also don't like the argument "since there is low demand for number
    crunching on Debian, let's just continue to ignore this problem".
    At least I know a decent number of people who use Debian (or
    downstream distros) for scientific number crunching. Compiling with
    optimizations for large workloads will always be a thing no matter
    the baseline, but when getting started, distro packages are just one
    less thing to care about.

    On Sat, Mar 26, 2022 at 7:25 AM Andrey Rahmatullin <wrar@debian.org> wrote:

    > A partial arch (whatever that is, yeah) with the x86-64-v3 baseline,
    > and optionally raise the main amd64 baseline to x86-64-v2?

    +1

  • From Simon McVittie@21:1/5 to Paul Wise on Sat Mar 26 12:30:01 2022
    On Sat, 26 Mar 2022 at 09:43:32 +0800, Paul Wise wrote:
    > It might be worth looking at how things like Steam and Flatpak/Snap
    > solve this issue

    In general they don't, or to put it another way, they "solve" it to the
    same extent that Debian/apt/dpkg currently does. Each binary build has
    a required instruction set (often the default for its SDK's compiler,
    but sometimes higher via command-line options like -msse2) and won't
    work on older CPUs.

    It seems like many upstreams, notably Mozilla and Rust, have effectively
    chosen i686 + MMX + SSE + SSE2 to be their baseline for 32-bit builds,
    because that level of functionality is nearly 20 years old and lets them
    avoid the quirks of the i387 FPU.

    The binaries for Steam itself are documented to require x86_64 with
    CMPXCHG16B and SSE3, so Steam is higher-than-baseline even on x86_64.
    The developers of Steam have no interest in supporting early 2000s
    hardware, or embedded 32-bit x86 variants.

    For most native Linux Steam games, the compilers in the SDK are the
    version of gcc from an old version of Ubuntu (gcc 4.6, 4.8 or 5), or
    one of several approximately contemporary versions of clang, or a
    backport of gcc 9 from Debian, and inherit whatever our baseline was
    at the time (i586 or nearly-i686); but Steam game developers are
    encouraged to compile for x86_64 anyway, so it's mostly older games
    that have 32-bit binaries. For games that still have 32-bit binaries,
    I suspect that using -msse2 is common.

    For Proton and for native Linux Steam games that use the new soldier and
    sniper container runtimes (not really supported yet, but a few games jump through the necessary hoops to do it anyway), the compiler in the SDK
    is the version of gcc or clang from the Debian release that the runtime
    is based on (Debian 10 or 11), or in the case of the Debian-10-based
    soldier runtime, a backport of gcc 9. Again, game developers are encouraged
    to compile for x86_64 when targeting these runtimes.

    Most Flatpak apps are directly or indirectly based on the freedesktop.org runtimes available from Flathub (the GNOME and KDE runtimes are derived
    from the fd.o runtime), and the non-EOL versions of those runtimes no
    longer directly support 32-bit builds, only x86_64 and aarch64.

    Flatpak apps that need biarch x86 (i386 and x86_64 in parallel), like
    the Flatpak version of Steam, use the x86_64 version of the runtime and
    add the "Compat.i386" runtime extension. That extension uses multiarch
    paths, but is conceptually more similar to Debian packages like lib64z1
    than it is to Debian's i386 architecture - it contains 32-bit code,
    but it is only for use by x86_64 CPUs in 32-bit compatibility mode, so
    it can safely assume the presence of all the CPU extensions that are in
    the x86_64 baseline, and in particular SSE2.

    > I expect games and proprietary apps often use CPU
    > features unsupported on old systems.

    Yes, for large values of "old": bear in mind that, for example, SSE2 was introduced by Intel in 2000 and adopted by AMD in 2003. If Wikipedia is
    to be believed, the last new releases of mainstream pre-SSE2 CPUs were
    late models of the Athlon XP (2004) and Pentium III mobile variants (2007).

    (I'm aware that embedded 32-bit x86 CPUs like the AMD Geode series and
    Intel Quark do not have the same functionality as mainstream desktop/laptop CPUs of a comparable age.)

    > I also wonder if qemu could be used to emulate newer CPU features on
    > older systems. That would probably be unusably slow though.

    For opcodes that are used to improve performance, that would be
    completely self-defeating.

    For opcodes that are necessary for correctness of a particular sequence
    of code (like CMPXCHG16B), I don't see how that would even work; at best
    it would be really expensive, and at worst it would not provide the
    required semantics.

    smcv

  • From M. Zhou@21:1/5 to Stephan Lachnit on Sat Mar 26 19:30:01 2022
    On Sat, 2022-03-26 at 11:42 +0100, Stephan Lachnit wrote:
    > On Sat, Mar 26, 2022 at 2:36 AM M. Zhou <lumin@debian.org> wrote:
    >
    >> Indeed supporting number crunching programs on ancient
    >> hardware is not meaningful, but the demand on Debian's
    >> support for number crunching is not that strong according
    >> to my years of observation.
    >>
    >> For popular applications that can take advantage of above-baseline
    >> instruction sets, they will eventually write the dynamic code
    >> dispatcher and add the fallback.
    >>
    >> For applications seriously need performance, they will
    >> leave CPU and go to GPU or other hardware. If the user correctly
    >> write the code and fully leverage GPU, the non-optimal CPU
    >> code won't necessarily be a bottleneck.
    >>
    >> For applications seriously need CPU performance, they are
    >> possibly going to tell the users how to tweak compiling
    >> parameters and how to compile locally.
    >
    > I have to disagree on this one. Yes, runtime detection and GPU
    > acceleration is great and all, but not every scientific library does
    > it and I think it's unrealistic for us to patch them all up.

    Please note that I wrote "they", i.e. the upstream, will implement
    the runtime detection or GPU acceleration, not us (Debian).

    > Also I don't like the point "since there is low demand for number
    > crunching on Debian, so let's just continue to ignore this problem".

    Six years ago I would have disagreed with what I said in the original
    post. Whether you like it or not, what I said reflects how my mind
    changed after closely working on numerics-related libraries in Debian
    for six years. And to be clear, I hold a negative opinion on what we
    in Debian can actually change apart from the upstream.

    If the upstream does not write runtime detection or GPU acceleration,
    then either they are not facing a wide audience, or the problem does
    not matter, or the software simply isn't appropriate for Debian
    packaging.

    I have mentioned countless times that the eigen3 library, which
    implements the core numerical computation part of TensorFlow, does
    not support runtime detection -- because CPU acceleration does not
    matter for most of its users. Sane users who really need CPU
    performance are able to recompile tensorflow themselves.

    > At least I know a decent amount of people that use Debian (or
    > downstream distros) for scientific number crunching. Compiling
    > optimized for large workloads will always be a thing no matter the
    > baseline, but when getting started distro packages are just one less
    > thing to care about.

    I humbly believe over 1/3 of the packages I (co-)maintain for Debian
    are for number crunching. And I INSIST on my NEGATIVE opinion after
    trying some experiments over the years. The number of people who
    really care about the ISA baseline of Debian-distributed packages is
    very likely smaller than you expect.

    I appreciate people who speak up for the ISA baseline, and appreciate
    any actual effort in this regard. But the lack of interest eventually
    changed my mind and made me hold a negative opinion.

    If you think I was simply unsuccessful in promoting a solution for
    the topics in this discussion, please go ahead and I will support you
    in a visible way.

    > On Sat, Mar 26, 2022 at 7:25 AM Andrey Rahmatullin <wrar@debian.org> wrote:
    >
    >> A partial arch (whatever that is, yeah) with the x86-64-v3 baseline,
    >> and optionally raise the main amd64 baseline to x86-64-v2?
    >
    > +1

    So again, that's possibly something like the partial Debian archive
    with a dpkg fork that I mentioned.
    That's probably the same idea as the old SIMDebian proposal.
    See the example patch for dpkg: https://github.com/SIMDebian/dpkg/commit/13b062567ac58dd1fe5395fb003d6230fd99e7c1
    With it, a partial archive of selected source packages can be rebuilt
    automatically with a bumped ISA baseline.

    To be clear, the fact that tensorflow does not support runtime
    detection while the baseline code's performance is poor is the direct
    reason why I proposed SIMDebian. The project is abandoned, and the
    patch is kept only for reference.

  • From Adrian Bunk@21:1/5 to Adam Borowski on Sun Apr 3 13:50:01 2022
    On Fri, Mar 25, 2022 at 11:34:17PM +0100, Adam Borowski wrote:
    > Hi!

    Hi Adam!

    > ...
    > * while a hard Depends: works for leafy packages, on a library it
    >   disallows having alternate implementations that don't need the
    >   library in question. Eg, libvectorscan5 blocks a program that
    >   uses it from just checking the regexes one by one.
    >
    > Suggestions?

    glibc 2.33 added a modernized version of the old hwcaps.
    If a package builds a library several times with different optimizations
    and installs them into the correct directories in the binary package,
    the dynamic linker will automatically select the fastest one supported
    by the hardware.
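    As a sketch, the binary package layout might look like this (the
    library name and paths are made up for illustration; the glibc-hwcaps
    subdirectory names are the ones glibc >= 2.33 probes on x86-64):

    ```shell
    #!/bin/sh
    # Build a mock package tree: ld.so tries the glibc-hwcaps
    # subdirectories first, from the most capable level the CPU supports
    # down to the baseline library next to them.
    mkdir -p pkg/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2
    mkdir -p pkg/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3

    # baseline build, used when no hwcaps variant matches
    touch pkg/usr/lib/x86_64-linux-gnu/libfast.so.1
    # builds compiled with -march=x86-64-v2 and -march=x86-64-v3
    touch pkg/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libfast.so.1
    touch pkg/usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3/libfast.so.1

    find pkg -name 'libfast.so.1' | sort
    ```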

    SIMDe (or similar approaches) could be used to build variant(s) of the
    library that have compile-time emulation of SIMD instructions in the
    lower baseline builds of vectorscan.

    People using libvectorscan5 on modern hardware with SSE 4.2 would then
    get the properly optimized fast version, while people on older hardware
    would get a version that is slow but works.

    For binaries, I have seen packages in the Debian Med (?) team that
    build several variants of a program and have a tiny wrapper program
    that chooses the correct one at startup.
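    A dispatcher of that kind can be sketched as follows (program and
    variant names are made up; the function takes the cpuinfo path as a
    parameter so it can be exercised against canned data):

    ```shell
    #!/bin/sh
    # Pick the most capable build of a program that the CPU can run,
    # falling back to the baseline build.
    select_variant() {
        cpuinfo="$1"                         # normally /proc/cpuinfo
        for v in avx2 sse4_2; do             # most capable first
            if grep -qw "$v" "$cpuinfo" 2>/dev/null; then
                echo "myprog-$v"
                return
            fi
        done
        echo "myprog-baseline"               # always present
    }

    # The real wrapper would then do:
    #   exec "/usr/lib/myprog/$(select_variant /proc/cpuinfo)" "$@"
    ```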

    > Meow!

    cu
    Adrian

  • From Bastian Blank@21:1/5 to Adrian Bunk on Sun Apr 3 15:10:01 2022
    On Sun, Apr 03, 2022 at 02:17:15PM +0300, Adrian Bunk wrote:
    > SIMDe (or similar approaches) could be used to build variant(s) of
    > the library that have compile-time emulation of SIMD instructions in
    > the lower baseline builds of vectorscan.

    But why? Who in their right mind would ever try to use those awfully
    slow implementations?

    Bastian

    --
    I have never understood the female capacity to avoid a direct answer to
    any question.
    -- Spock, "This Side of Paradise", stardate 3417.3

  • From Gard Spreemann@21:1/5 to Bastian Blank on Mon Apr 4 09:40:01 2022
    Bastian Blank <waldi@debian.org> writes:

    > On Sun, Apr 03, 2022 at 02:17:15PM +0300, Adrian Bunk wrote:
    >> SIMDe (or similar approaches) could be used to build variant(s) of
    >> the library that have compile-time emulation of SIMD instructions
    >> in the lower baseline builds of vectorscan.
    >
    > But why? Who in their right mind would ever try to use those aweful
    > slow implementations?

    I don't know in this particular case, but somewhat analogously: Very few
    people in their right minds do deep learning on CPUs. Yet, I'm extremely
    happy that PyTorch is in Debian (thanks to the hard work of Mo Zhou!),
    even if it's CPU-only. It means that I can develop and test-run code on
    my machine using just Debian packages, before shipping it off to the
    actual compute infrastructure where GPUs do the heavy lifting on a
    GPU-enabled non-Debian PyTorch.

    I can imagine that there are people who do the same on hardware
    lacking SIMD instructions.


    Best,
    Gard


  • From Adrian Bunk@21:1/5 to Bastian Blank on Tue Apr 5 11:20:01 2022
    On Sun, Apr 03, 2022 at 02:42:18PM +0200, Bastian Blank wrote:
    > On Sun, Apr 03, 2022 at 02:17:15PM +0300, Adrian Bunk wrote:
    >> SIMDe (or similar approaches) could be used to build variant(s) of
    >> the library that have compile-time emulation of SIMD instructions
    >> in the lower baseline builds of vectorscan.
    >
    > But why? Who in their right mind would ever try to use those aweful
    > slow implementations?

    There are often use cases where speed is critical and use cases where
    speed is not that important.
    E.g. one of the versions of the regex library Adam mentioned as an
    example is used by rspamd.
    On a busy mail server the performance of the regex library might be
    critical; for filtering your personal emails, an awfully slow
    implementation might still be fast enough that you don't care.

    In areas like multimedia it is common to end up with gazillions of
    libraries linked and loaded that you might never use.
    E.g. a program that uses FFmpeg for mp3 decoding is also indirectly
    linked with several libraries for video encoding.
    If a whole library is compiled with some -msse or -march flag, then
    starting the program might fail due to unsupported instructions the
    compiler generated in the init function of a library you would never
    have used.


    cu
    Adrian

  • From Adam Borowski@21:1/5 to Adrian Bunk on Wed Apr 6 13:40:01 2022
    On Sun, Apr 03, 2022 at 02:17:15PM +0300, Adrian Bunk wrote:
    > On Fri, Mar 25, 2022 at 11:34:17PM +0100, Adam Borowski wrote:
    >> * while a hard Depends: works for leafy packages, on a library it
    >>   disallows having alternate implementations that don't need the
    >>   library in question. Eg, libvectorscan5 blocks a program that
    >>   uses it from just checking the regexes one by one.
    >
    > glibc 2.33 added a modernized version of the old hwcaps.
    > If a package builds a library several times with different
    > optimizations and installs them into the correct directories in the
    > binary package, the dynamic linker will automatically select the
    > fastest one supported by the hardware.
    >
    > SIMDe (or similar approaches) could be used to build variant(s) of
    > the library that have compile-time emulation of SIMD instructions in
    > the lower baseline builds of vectorscan.

    In this particular case, it'd probably be faster to use non-SIMD code
    paths instead of emulating SIMD. That means two code paths, which
    upstreams may or may not consider worth the effort to implement.

    > For binaries, I have seen packages in the Debian Med (?) team that
    > build several variants of a program and have a tiny wrapper program
    > that chooses the correct one at startup.

    This may take substantial work to implement, which for typical Debian
    Med packages is an utter waste of time.

    I wonder why vectorscan requires SSE4.2 while hyperscan, which it is
    a fork of, is happy with SSE3; a personal mail server may be
    perfectly adequate on hardware lacking either -- but no SSE4.2 is
    still realistic, while no SSE3 on amd64 requires some pretty specific
    dumpster diving (it's lacking only on early steppings of the
    Athlon 64). Still, by our rules, SSE3 is not in the arch baseline,
    thus it's an RC bug not to run without it.

    And thus, a Debian Med package is required to provide non-SSE3 builds
    that are almost untestable without hard-to-get hardware; that's a
    pure waste of maintainer time.

    Meanwhile, that mail server may be fully happy checking the patterns
    with libpcre one by one. What {vector,hyper}scan are good for is
    matching _many_ regexes against _many_ lines of data; if there are
    few patterns or little data, the slow serial way is as good or
    better.


    Meow!
    --
    ⢀⣴⠾⠻⢶⣦⠀ Eight legs good, four legs bad! -- when your drider pwns a
    ⣾⠁⢠⠒⠀⣿⡁ smelly goodie centaur.
    ⢿⡄⠘⠷⠚⠋⠀ Rearkick OP -- my grandpa's brother-in-law got one-shotted
    ⠈⠳⣄⠀⠀⠀⠀ from full hp in RL, please nerf!

  • From Adrian Bunk@21:1/5 to Adam Borowski on Thu Apr 14 10:30:01 2022
    On Wed, Apr 06, 2022 at 01:38:09PM +0200, Adam Borowski wrote:
    > On Sun, Apr 03, 2022 at 02:17:15PM +0300, Adrian Bunk wrote:
    >> On Fri, Mar 25, 2022 at 11:34:17PM +0100, Adam Borowski wrote:
    >>> * while a hard Depends: works for leafy packages, on a library it
    >>>   disallows having alternate implementations that don't need the
    >>>   library in question. Eg, libvectorscan5 blocks a program that
    >>>   uses it from just checking the regexes one by one.
    >>
    >> glibc 2.33 added a modernized version of the old hwcaps.
    >> If a package builds a library several times with different
    >> optimizations and installs them into the correct directories in the
    >> binary package, the dynamic linker will automatically select the
    >> fastest one supported by the hardware.
    >>
    >> SIMDe (or similar approaches) could be used to build variant(s) of
    >> the library that have compile-time emulation of SIMD instructions
    >> in the lower baseline builds of vectorscan.
    >
    > In this particular case, it'd probably be faster to use non-SIMD ways
    > instead of emulating them. This means two code paths, which particular
    > users may or may not want to do the effort to implement.

    For supporting older baselines my priority would be functionality
    with minimal effort both for upstreams and Debian maintainers, not
    optimal performance on old hardware.

    >> For binaries, I have seen packages in the Debian Med (?) team that
    >> build several variants of a program and have a tiny wrapper program
    >> that chooses the correct one at startup.
    >
    > This may take substantial work to implement, which for typical Debian
    > Med packages is an utter waste of time.
    > ...

    The proper approach would be to have the implementation in debhelper,
    so that the maintainer only has to declare which variants of the
    program to build on $architecture, and then everything, including the
    wrapper, is built by debhelper.

    I am not saying that I plan to implement it, but that's how I would
    design it to avoid the per-package work you are worried about.

    cu
    Adrian

  • From Paul Wise@21:1/5 to Adam Borowski on Mon Sep 5 09:10:01 2022
    On Fri, 2022-03-25 at 23:34 +0100, Adam Borowski wrote:

    > Suggestions?

    FYI, as a result of this coming up on IRC again, I have summarised
    various suggestions on this wiki page for future reference:

    https://wiki.debian.org/InstructionSelection

    Please feel free to update/fix the page as needed.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise

  • From Paul Wise@21:1/5 to Adrian Bunk on Mon Sep 5 09:10:01 2022
    On Sun, 2022-04-03 at 14:17 +0300, Adrian Bunk wrote:

    > For binaries, I have seen packages in the Debian Med (?) team that
    > build several variants of a program and have a tiny wrapper program
    > that chooses the correct one at startup.

    It appears this is called simd-dispatch and the script is available
    in several packages; here is an example copy of it:
    https://sources.debian.org/src/scrappie/latest/debian/bin/simd-dispatch

    Probably something like this should move to isa-support, so that it can
    be shared instead of duplicated.

    --
    bye,
    pabs

    https://wiki.debian.org/PaulWise
