• [gentoo-dev] [PATCH] profiles/arch/amd64: add "-mfpmath=sse" to CFLAGS_

    From Mike Gilbert@21:1/5 to All on Tue Oct 18 04:00:01 2022
    GCC uses x87 floating point instructions when building 32-bit x86
    code by default. This is true even for x86-64 multilib.

    Using the x87 floating point unit can lead to strange behavior when
    calculating intermediate values for single and double precision floats.
    It uses 80 bits for all calculations, which is larger than the 32 or 64
    bits specified for floats and doubles.

    Using the SSE2 instructions available on x86-64 for floating point
    arithmetic leads to more consistent behavior, and is usually faster.

    Reference: https://gcc.gnu.org/wiki/x87note
    Signed-off-by: Mike Gilbert <floppym@gentoo.org>
    ---
    profiles/arch/amd64/make.defaults | 4 ++--
    1 file changed, 2 insertions(+), 2 deletions(-)

    diff --git a/profiles/arch/amd64/make.defaults b/profiles/arch/amd64/make.defaults
    index 0c05dec124e..e7e18ff6a91 100644
    --- a/profiles/arch/amd64/make.defaults
    +++ b/profiles/arch/amd64/make.defaults
    @@ -1,4 +1,4 @@
    -# Copyright 1999-2021 Gentoo Authors
    +# Copyright 1999-2022 Gentoo Authors
     # Distributed under the terms of the GNU General Public License v2

     ARCH="amd64"
    @@ -28,7 +28,7 @@ LDFLAGS_amd64="-m elf_x86_64"
     CHOST_amd64="x86_64-pc-linux-gnu"

     # 32bit specific settings.
    -CFLAGS_x86="-m32"
    +CFLAGS_x86="-m32 -mfpmath=sse"
     LDFLAGS_x86="-m elf_i386"
     CHOST_x86="i686-pc-linux-gnu"

    --
    2.37.3

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
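
The behaviour described in the patch message can be imitated without a 32-bit toolchain. The sketch below is a model of the arithmetic, not a measurement of GCC output: it rounds exact rational results to a 64-bit significand (x87 extended precision) or a 53-bit significand (double) after every operation. The helper names and the operand values are assumptions chosen for illustration, and exponent range limits are ignored.

```python
from fractions import Fraction


def round_to_significand(x, bits):
    """Round x to the nearest value with a `bits`-bit significand.

    Ties go to even, as in the IEEE 754 default rounding mode. Exponent
    range limits (overflow, subnormals) are deliberately ignored.
    """
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    mag = abs(x)
    # Initial guess for e such that 2**(bits-1) <= mag / 2**e < 2**bits.
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - bits
    scaled = mag / Fraction(2) ** e
    while scaled >= 2 ** bits:
        scaled /= 2
        e += 1
    while scaled < 2 ** (bits - 1):
        scaled *= 2
        e -= 1
    # round() on a Fraction rounds half to even.
    return sign * round(scaled) * Fraction(2) ** e


def add_then_subtract(a, b, intermediate_bits):
    """Compute (a + b) - a, rounding each step to `intermediate_bits`,
    then store the final result as a double (53-bit significand)."""
    total = round_to_significand(Fraction(a) + Fraction(b), intermediate_bits)
    diff = round_to_significand(total - Fraction(a), intermediate_bits)
    return float(round_to_significand(diff, 53))


a, b = 1.0, 2.0 ** -60
sse_like = add_then_subtract(a, b, 53)  # every step rounded to double
x87_like = add_then_subtract(a, b, 64)  # steps kept in extended precision
print(sse_like, x87_like)
```

In this model the double-only path loses `b` entirely, while the extended-precision path recovers it. Real x87 code can land on either result depending on whether the compiler spills the intermediate value to memory.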
  • From Ulrich Mueller@21:1/5 to All on Tue Oct 18 10:20:01 2022
    On Tue, 18 Oct 2022, Mike Gilbert wrote:

    > Reference: https://gcc.gnu.org/wiki/x87note

    Which says:

    | ... the amount of worst-case error that could possibly happen using
    | the x87 (with any amount of intermediate rounding) is at worst the
    | same as true 64 or 32 bit arithmetic, and in practice is almost always
    | better.

    and:

    | Note, however, that this greater repeatability comes at the cost of
    | lost precision (i.e. SSE always gets the same precision because it
    | always takes the equivalent of the x87's worst case: a forced round
    | down at each step).

    So, it comes with a price, and I wonder if we shouldn't leave that
    choice to the user, and go with the upstream GCC default?

    > -CFLAGS_x86="-m32"
    > +CFLAGS_x86="-m32 -mfpmath=sse"

    Also, why add the flag only to CFLAGS_x86 but not to CFLAGS_amd64?
    They should have the same single and double precision arithmetic?

    Ulrich
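
The trade-off in the two wiki excerpts can be made concrete with a small model. The sketch below sums ten copies of the double nearest to 0.1, once rounding the running total to 53 bits after every addition (SSE-like) and once keeping a 64-bit significand until the final store (x87-like). The helper names and the input are assumptions for illustration. It shows one case where the extended intermediates end up closer to the exact sum, and does not show that this holds in general.

```python
from fractions import Fraction


def round_to_significand(x, bits):
    """Round x to the nearest value with a `bits`-bit significand
    (ties to even); exponent range limits are ignored."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = -1 if x < 0 else 1
    mag = abs(x)
    e = mag.numerator.bit_length() - mag.denominator.bit_length() - bits
    scaled = mag / Fraction(2) ** e
    while scaled >= 2 ** bits:
        scaled /= 2
        e += 1
    while scaled < 2 ** (bits - 1):
        scaled *= 2
        e -= 1
    return sign * round(scaled) * Fraction(2) ** e


def accumulate(values, intermediate_bits):
    """Sum `values` left to right, rounding the running total to
    `intermediate_bits` after each addition, then store it as a double."""
    total = Fraction(0)
    for v in values:
        total = round_to_significand(total + Fraction(v), intermediate_bits)
    return float(round_to_significand(total, 53))


values = [0.1] * 10
exact = sum(Fraction(v) for v in values)  # exact sum of the stored doubles
sse_like = accumulate(values, 53)
x87_like = accumulate(values, 64)
err_sse = abs(Fraction(sse_like) - exact)
err_x87 = abs(Fraction(x87_like) - exact)
print(sse_like, x87_like, err_x87 < err_sse)
```

The SSE-like result is the same on every run and every build, which is the repeatability the wiki page refers to. The x87-like result is more accurate here, but only as long as the total stays in a register for the whole loop.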

  • From David Seifert@21:1/5 to Ulrich Mueller on Tue Oct 18 12:00:01 2022
    On Tue, 2022-10-18 at 10:14 +0200, Ulrich Mueller wrote:
    > On Tue, 18 Oct 2022, Mike Gilbert wrote:
    >
    >> Reference: https://gcc.gnu.org/wiki/x87note
    >
    > Which says:
    >
    > | ... the amount of worst-case error that could possibly happen using
    > | the x87 (with any amount of intermediate rounding) is at worst the
    > | same as true 64 or 32 bit arithmetic, and in practice is almost always
    > | better.
    >
    > and:
    >
    > | Note, however, that this greater repeatability comes at the cost of
    > | lost precision (i.e. SSE always gets the same precision because it
    > | always takes the equivalent of the x87's worst case: a forced round
    > | down at each step).
    >
    > So, it comes with a price, and I wonder if we shouldn't leave that
    > choice to the user, and go with the upstream GCC default?
    >
    >> -CFLAGS_x86="-m32"
    >> +CFLAGS_x86="-m32 -mfpmath=sse"

    -mfpmath=sse is already the default on amd64.

    > Also, why add the flag only to CFLAGS_x86 but not to CFLAGS_amd64?
    > They should have the same single and double precision arithmetic?
    >
    > Ulrich

  • From Ulrich Mueller@21:1/5 to All on Tue Oct 18 13:50:01 2022
    On Tue, 18 Oct 2022, David Seifert wrote:

    >> -CFLAGS_x86="-m32"
    >> +CFLAGS_x86="-m32 -mfpmath=sse"
    >
    > -mfpmath=sse is already the default on amd64.

    I see. This change makes sense then.

    What about profiles/arch/x86 though? IIUC we'll end up with an
    inconsistency between x86 and multilib amd64.

    Ulrich

  • From David Seifert@21:1/5 to Ulrich Mueller on Tue Oct 18 15:40:01 2022
    On Tue, 2022-10-18 at 13:40 +0200, Ulrich Mueller wrote:
    > On Tue, 18 Oct 2022, David Seifert wrote:
    >
    >>> -CFLAGS_x86="-m32"
    >>> +CFLAGS_x86="-m32 -mfpmath=sse"
    >>
    >> -mfpmath=sse is already the default on amd64.
    >
    > I see. This change makes sense then.
    >
    > What about profiles/arch/x86 though? IIUC we'll end up with an
    > inconsistency between x86 and multilib amd64.
    >
    > Ulrich

    What if I want to build Gentoo on an old AMD Thunderbird which has
    neither SSE1 nor the more important SSE2?

  • From Mike Gilbert@21:1/5 to soap@gentoo.org on Tue Oct 18 18:00:01 2022
    On Tue, Oct 18, 2022 at 5:56 AM David Seifert <soap@gentoo.org> wrote:
    >
    > On Tue, 2022-10-18 at 10:14 +0200, Ulrich Mueller wrote:
    >> On Tue, 18 Oct 2022, Mike Gilbert wrote:
    >>
    >>> Reference: https://gcc.gnu.org/wiki/x87note
    >>
    >> Which says:
    >>
    >> | ... the amount of worst-case error that could possibly happen using
    >> | the x87 (with any amount of intermediate rounding) is at worst the
    >> | same as true 64 or 32 bit arithmetic, and in practice is almost always
    >> | better.
    >>
    >> and:
    >>
    >> | Note, however, that this greater repeatability comes at the cost of
    >> | lost precision (i.e. SSE always gets the same precision because it
    >> | always takes the equivalent of the x87's worst case: a forced round
    >> | down at each step).
    >>
    >> So, it comes with a price, and I wonder if we shouldn't leave that
    >> choice to the user, and go with the upstream GCC default?
    >>
    >>> -CFLAGS_x86="-m32"
    >>> +CFLAGS_x86="-m32 -mfpmath=sse"
    >
    > -mfpmath=sse is already the default on amd64.

    I have amended the first paragraph to make this more clear:

    GCC uses x87 floating point instructions when building 32-bit x86
    code by default. When building 64-bit code, SSE2 instructions are used
    instead.

  • From Mike Gilbert@21:1/5 to soap@gentoo.org on Tue Oct 18 17:20:02 2022
    On Tue, Oct 18, 2022 at 9:37 AM David Seifert <soap@gentoo.org> wrote:
    >
    > On Tue, 2022-10-18 at 13:40 +0200, Ulrich Mueller wrote:
    >> On Tue, 18 Oct 2022, David Seifert wrote:
    >>
    >>>> -CFLAGS_x86="-m32"
    >>>> +CFLAGS_x86="-m32 -mfpmath=sse"
    >>>
    >>> -mfpmath=sse is already the default on amd64.
    >>
    >> I see. This change makes sense then.
    >>
    >> What about profiles/arch/x86 though? IIUC we'll end up with an
    >> inconsistency between x86 and multilib amd64.
    >>
    >> Ulrich
    >
    > What if I want to build Gentoo on an old AMD Thunderbird which has
    > neither SSE1 nor the more important SSE2?

    Right. On amd64, the CPU always supports SSE2, so -mfpmath=sse will
    always work there.

    On x86, we need to consider a more diverse set of supported instructions.

  • From Ulrich Mueller@21:1/5 to All on Tue Oct 18 18:50:01 2022
    On Tue, 18 Oct 2022, David Seifert wrote:

    > What if I want to build Gentoo on an old AMD Thunderbird which has
    > neither SSE1 nor the more important SSE2?

    The -mfpmath=sse option is a no-op if the CPU doesn't support SSE,
    i.e. it will use 387 arithmetics nevertheless.
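
The fallback described here can be written down as a small decision model. This is a sketch of the behaviour as stated in this thread, not GCC's actual option handling. The function name and the `-march` table are assumptions for illustration, and the table lists only a hand-picked subset of values. It also glosses over the fact that SSE without SSE2 covers single precision only.

```python
# SSE levels implied by a few GCC -march values (hand-picked subset).
MARCH_FEATURES = {
    "i686": set(),            # Pentium Pro class, no SSE
    "athlon-tbird": set(),    # 3DNow! era Athlon, no SSE arithmetic
    "pentium3": {"sse"},
    "pentium4": {"sse", "sse2"},
    "x86-64": {"sse", "sse2"},
}


def effective_fpmath(march, requested="sse", extra_flags=()):
    """Return the FP unit a 32-bit build ends up using in this model.

    `extra_flags` may contain "-msse" or "-msse2" to enable the extension
    on top of what -march implies.
    """
    features = set(MARCH_FEATURES[march])
    if "-msse2" in extra_flags:
        features |= {"sse", "sse2"}
    elif "-msse" in extra_flags:
        features.add("sse")
    if requested == "sse" and "sse" not in features:
        return "387"  # the request is ignored and x87 code is generated
    return requested


for march in MARCH_FEATURES:
    print(march, "->", effective_fpmath(march))
```

Under this model the flag is harmless for a target without SSE, and it only changes code generation once SSE is enabled through `-march` or an explicit `-msse`/`-msse2`.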

  • From Mike Gilbert@21:1/5 to ulm@gentoo.org on Tue Oct 18 19:40:01 2022
    On Tue, Oct 18, 2022 at 12:47 PM Ulrich Mueller <ulm@gentoo.org> wrote:
    >
    > On Tue, 18 Oct 2022, David Seifert wrote:
    >
    >> What if I want to build Gentoo on an old AMD Thunderbird which has
    >> neither SSE1 nor the more important SSE2?
    >
    > The -mfpmath=sse option is a no-op if the CPU doesn't support SSE,
    > i.e. it will use 387 arithmetics nevertheless.

    I don't really see an "effective" way to deploy this via profiles on x86.

    We could add it to the default CFLAGS setting in
    profiles/arch/x86/make.defaults. However, we also default to
    -march=i686 there, and that doesn't support SSE or SSE2. Also, the
    entire CFLAGS variable is likely to be overridden by the CFLAGS
    setting in /etc/make.conf.

    The CFLAGS_x86 profile variable is only used by the
    multilib_toolchain_setup function in multilib.eclass. In other words,
    it only affects ebuilds that utilize the multilib eclasses to build
    libraries for multiple ABIs. That covers all 32-bit libraries on
    amd64, but doesn't cover all packages on x86.
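
The difference between the two variables can be sketched as follows. This is a simplified model, not the code in multilib.eclass. It assumes that the per-ABI flags are attached to the compiler command itself, which is why a CFLAGS setting in make.conf does not replace them. The function names, the `CFLAGS_amd64` value, the user CFLAGS and the source file name are hypothetical.

```python
# Per-ABI profile variables. CFLAGS_x86 is taken from the patch in this
# thread; the CFLAGS_amd64 value is an assumption for illustration.
PROFILE = {
    "CFLAGS_amd64": "-m64",
    "CFLAGS_x86": "-m32 -mfpmath=sse",
}


def compiler_for_abi(cc, abi, profile):
    """Return the compiler command with the ABI's flags appended.

    Assumption: the multilib setup attaches CFLAGS_<abi> to the compiler
    command rather than to the user's CFLAGS.
    """
    abi_flags = profile.get(f"CFLAGS_{abi}", "")
    return f"{cc} {abi_flags}".strip()


def compile_command(cc, abi, profile, user_cflags, source):
    """Build the argument list for compiling one source file."""
    command = compiler_for_abi(cc, abi, profile).split()
    return command + user_cflags.split() + ["-c", source]


user_cflags = "-O2 -pipe"  # hypothetical make.conf setting
for abi in ("amd64", "x86"):
    argv = compile_command("x86_64-pc-linux-gnu-gcc", abi, PROFILE,
                           user_cflags, "foo.c")
    print(" ".join(argv))
```

In this model only the 32-bit ABI build on amd64 picks up `-mfpmath=sse`. A native x86 system never consults `CFLAGS_x86` for ordinary packages, which matches the limitation described above.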
