• LTO and ABI compatibility

    From Ian Jackson@21:1/5 to All on Tue Jul 19 19:30:01 2022
    I have just received Bug#1015348 reporting that adns doesn't work with
    LTO (link-time optimisation).

    How does LTO work with ABI compatibility, which we rely on very
    heavily ? Eg, my reading of the spec is as follows: if I add members
    to an enum in a new library version, making a combined program
    containing translation units with the old enum, and ones with the new
    enum, is UB.[1]

    But that is precisely what we do when we run new binaries against old libraries.

    I think src:adns only does things which are justified by traditional
    ABI compatibility assumptions (albeit, that in some parts of the build
    it makes these assumptions when linking statically, as well as when
    linking dynamically).

    So what, precisely and formally, are the rules ?

    I feel entitled to demand a fully precise and formal specification of
    the rules, because it is precisely fully precise and formal and
    expansive readings of C's literally-incomprehensible[2] specifications
    which are being used to justify miscompilations (so-called
    "optimisations").


    Frankly, I think enabling LTO by default is a mistake. The
    performance benefits are not likely to be worth the bugs silently
    introduced across our codebase. If there are particular programs that
    would benefit from it, by all means enable it in those cases.

    Ian.

    [1] Assuming that the enum type is used in a relevant way.

    [2] If anyone doubts that the C specification is literally
    incomprehensible, observe, for example, the existence of research
    papers with titles like "Towards a formal semantics for C", or indeed
    the absolutely hilarious discovery that the specification forgot to
    define the meaning of assignments when the assignment target was
    written in parentheses, and that no-one noticed this for decades.

    --
    Ian Jackson <ijackson@chiark.greenend.org.uk> These opinions are my own.

    Pronouns: they/he. If I emailed you from @fyvzl.net or @evade.org.uk,
    that is a private address which bypasses my fierce spamfilter.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hideki Yamane@21:1/5 to Ian Jackson on Fri Jul 22 19:40:01 2022
    On Tue, 19 Jul 2022 18:24:07 +0100
    Ian Jackson <ijackson@chiark.greenend.org.uk> wrote:
    > Frankly, I think enabling LTO by default is a mistake. The
    > performance benefits are not likely to be worth the bugs silently
    > introduced across our codebase.

    Fedora, openSUSE, and Ubuntu have enabled LTO by default, and I've not
    heard of anything going wrong. Does the situation in Debian differ from
    theirs?

    https://fedoraproject.org/wiki/LTOByDefault
    https://en.opensuse.org/openSUSE:LTO
    https://wiki.ubuntu.com/ToolChain/LTO


    If not, we can enable it by default and let individual packages opt out.


    --
    Hideki Yamane <henrich@iijmio-mail.jp>

  • From Seth Arnold@21:1/5 to Hideki Yamane on Sat Jul 23 02:50:01 2022
    On Fri, Jul 22, 2022 at 03:30:16PM +0200, Hideki Yamane wrote:
    > Fedora, openSUSE, and Ubuntu have enabled LTO by default, and I've not
    > heard of anything going wrong. Does the situation in Debian differ from
    > theirs?

    Not everything works with LTO right away: debugging build problems or
    runtime problems takes effort.

    Ubuntu has kept track of some problematic packages via a new package: https://launchpad.net/ubuntu/+source/lto-disabled-list/+changelog

    Debian supports more architectures than Ubuntu, Fedora, or openSUSE; this
    might mean there are costs unique to Debian. I suspect, without evidence,
    that Debian may have a greater diversity of use cases, and thus might find
    problems in software that go unnoticed in the other distributions.

    So while I think the Ubuntu experience is promising for enabling LTO in
    Debian, I don't think it's necessarily the exact same path.

    (Don't read this as encouragement or discouragement -- I just wanted to
    give a link to the lto-disabled-list package in the hopes that it helps
    inform the discussion.)

    Thanks


  • From Jérémy Lal@21:1/5 to All on Sat Jul 23 08:50:02 2022
    On Sat, 23 Jul 2022 at 02:45, Seth Arnold <seth.arnold@canonical.com> wrote:

    > On Fri, Jul 22, 2022 at 03:30:16PM +0200, Hideki Yamane wrote:
    > > Fedora, openSUSE, and Ubuntu have enabled LTO by default, and I've not
    > > heard of anything going wrong. Does the situation in Debian differ from
    > > theirs?
    >
    > Not everything works with LTO right away: debugging build problems or
    > runtime problems takes effort.
    >
    > Ubuntu has kept track of some problematic packages via a new package:
    > https://launchpad.net/ubuntu/+source/lto-disabled-list/+changelog


    While the list seems big, it looks like it includes packages that don't
    compile anything, like some pure JS node modules. In particular, node-babel
    may not even be an actual package name.

    Jérémy


  • From Hideki Yamane@21:1/5 to Seth Arnold on Sat Jul 23 22:20:01 2022
    On Sat, 23 Jul 2022 00:37:37 +0000
    Seth Arnold <seth.arnold@canonical.com> wrote:
    > Debian supports more architectures than Ubuntu, Fedora, or openSUSE;
    > this might mean there are costs unique to Debian.

    How about applying it to only amd64 and arm64 first, then expanding
    to other archs? It is an efficient way, IMO.


    --
    Hideki Yamane <henrich@iijmio-mail.jp>

  • From Simon Richter@21:1/5 to Ian Jackson on Mon Jul 25 20:30:01 2022
    Hi Ian,

    On 7/19/22 19:24, Ian Jackson wrote:

    > How does LTO work with ABI compatibility, which we rely on very
    > heavily ?

    Symbols that are visible to the dynamic linker or that have their
    address taken are hard borders for optimization, even in non-LTO builds.

    For example,

    int a() { return 1; }
    int b() { return a(); }

    will compile both functions to "mov eax, 1; ret" with -O2, but if you
    also set -fPIC, b() becomes "xor eax, eax; jmp a@PLT" to allow a
    LD_PRELOAD library to override a.

    The compiler is allowed to generate constant-propagated or inlined versions
    of externally visible functions, but these may be called only if the symbol
    has not been overridden. gcc knows to check for this, but the lookup is
    expensive enough that it is only generated as a check before a loop.
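
    As a rough illustration of the visibility point above, the following C
    sketch contrasts a default-visibility function with a hidden one. All
    function names are hypothetical, and the visibility attribute is a
    GCC/Clang extension rather than standard C. The comments describe what GCC
    typically does on ELF targets with -O2 -fPIC; actual code generation may
    differ by compiler, version, and flags.

```c
#include <assert.h>

/* Default visibility: when built into a shared library with -fPIC, GCC
 * assumes another definition may be interposed at run time (for example
 * through LD_PRELOAD), so it normally does not inline this function into
 * its callers in the same library. */
int exported_one(void) { return 1; }

/* Hidden visibility (GCC/Clang extension): the symbol is not visible to
 * the dynamic linker, so it cannot be interposed and the compiler is free
 * to inline it even with -fPIC. */
__attribute__((visibility("hidden")))
int hidden_one(void) { return 1; }

/* With GCC -O2 -fPIC this is expected to become a call through the PLT. */
int call_exported(void) { return exported_one(); }

/* This one may be reduced to a plain "return 1". */
int call_hidden(void) { return hidden_one(); }
```

    Both callers return 1 either way; the difference only shows up in the
    generated assembly (for instance with "gcc -O2 -fPIC -S"). GCC also
    provides -fno-semantic-interposition, which lets it make the same
    assumption for default-visibility symbols.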

    > Eg, my reading of the spec is as follows: if I add members
    > to an enum in a new library version, making a combined program
    > containing translation units with the old enum, and ones with the new
    > enum, is UB.[1]

    If you add members in a way that does not cause existing definitions to
    change, that is supposed to be ABI compatible. LTO builds may emit a
    warning that the types are different because now there is a way to
    detect this case.
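
    A minimal sketch of what "does not cause existing definitions to change"
    can mean in practice, using hypothetical enum and function names: appending
    a member keeps the numeric values that old binaries were compiled against,
    while inserting one at the front renumbers them. This only illustrates
    value stability at the ABI level; it does not settle the standard-level
    question about type compatibility across translation units raised earlier
    in the thread.

```c
#include <assert.h>

/* Hypothetical library enum as shipped in version 1. */
enum color_v1 { COLOR_V1_RED, COLOR_V1_GREEN };

/* Version 2, appended member: RED and GREEN keep the values 0 and 1. */
enum color_v2_ok { COLOR_V2_OK_RED, COLOR_V2_OK_GREEN, COLOR_V2_OK_BLUE };

/* Version 2, inserted member: RED and GREEN are renumbered to 1 and 2. */
enum color_v2_bad { COLOR_V2_BAD_BLUE, COLOR_V2_BAD_RED, COLOR_V2_BAD_GREEN };

/* Returns 1 if the appended revision preserves every version 1 value. */
int appended_revision_keeps_values(void)
{
    return (int)COLOR_V2_OK_RED == (int)COLOR_V1_RED
        && (int)COLOR_V2_OK_GREEN == (int)COLOR_V1_GREEN;
}

/* Returns 1 if the inserted revision preserves every version 1 value. */
int inserted_revision_keeps_values(void)
{
    return (int)COLOR_V2_BAD_RED == (int)COLOR_V1_RED
        && (int)COLOR_V2_BAD_GREEN == (int)COLOR_V1_GREEN;
}
```

    The first function returns 1 and the second returns 0: an old binary
    passing the value 0 for RED would be misread as BLUE by a library built
    from the inserted revision.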

    > Frankly, I think enabling LTO by default is a mistake. The
    > performance benefits are not likely to be worth the bugs silently
    > introduced across our codebase. If there are particular programs that
    > would benefit from it, by all means enable it in those cases.

    I agree that the performance isn't worth it, but I find the extended
    diagnostics useful. I've found quite a few bugs this way, mostly
    inconsistent duplicate declarations in different header files that were
    never included together.

    Simon
