• is there some known regression with gcc on armel?

    From Luca Olivetti@21:1/5 to All on Thu Aug 19 09:10:01 2021
    Hello,

    I upgraded a test machine (Buffalo LinkStation Pro/Marvell Orion5x) from
    buster to bullseye, then rebuilt the dspam[*] deb (Debian has not
    shipped it since buster, so I had rebuilt it there too).
    The newly built binary doesn't start and complains of a configuration
    error; the binary built on buster still works.
    I traced the execution with gdb and it doesn't make sense: a function is
    called with a pointer to the configuration struct, but inside the
    function the pointer is null.
    I then recompiled with -O0 instead of the default -O2 and this time
    it works (I will try -O1 later).
    I'm not very familiar with gcc, so do you know of any regression in
    gcc optimizations on armel (or of other packages that need
    special optimization options)?
    Note that I had to add the "-z muldefs" option to the linker, but I
    don't think that's the problem (the null I saw wasn't a global variable
    but a parameter).
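
    A note on the "-z muldefs" workaround: a common reason for new
    "multiple definition" link errors on bullseye is that GCC 10 defaults
    to -fno-common, so a variable that is defined (rather than declared)
    in a header included by several files now collides at link time.
    Whether that is what happens in dspam is an assumption. The sketch
    below only shows the usual source-level fix, collapsed into one file;
    demo_config_t, demo_agent_config, demo_set_config and
    demo_config_entries are hypothetical stand-ins, not dspam's real
    identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a configuration handle; the real dspam
 * type is different. */
typedef struct demo_config { int entries; } *demo_config_t;

/* What a shared header should contain: a declaration only. */
extern demo_config_t demo_agent_config;

/* What exactly one .c file should contain: the single definition. */
demo_config_t demo_agent_config = NULL;

/* Every user of the header now refers to the same object. */
void demo_set_config(demo_config_t c)
{
    demo_agent_config = c;
}

int demo_config_entries(void)
{
    assert(demo_agent_config != NULL);
    return demo_agent_config->entries;
}
```

    With one declaration in the header and one definition in a single
    source file, the linker sees only one definition and "-z muldefs"
    should no longer be necessary.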

    [*] I know it's a dead project, but it still works and it is
    surprisingly effective and lightweight on such an underpowered machine.

    Bye
    --
    Luca

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arnd Bergmann@21:1/5 to luca@ventoso.org on Thu Aug 19 10:10:01 2021
    The old Marvell CPUs have a known bug when processing the ldrd/strd
    instructions on misaligned pointers: the access silently yields
    incorrect data instead of trapping into the kernel.

    This only happens for incorrect source code that relies on undefined
    behavior, accessing a pointer to a 64-bit 'long long' variable that is
    not naturally aligned. This happens to work on most CPUs including
    all x86 and armv6+, and mostly works on armv5 because the kernel
    works around the undefined behavior by fixing up the load in an
    exception handler.
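
    To illustrate the kind of source bug described above, here is a
    minimal C sketch. It is not taken from dspam; the buffer layout and
    the names read_u64_unsafe, read_u64_safe and misaligned_demo are made
    up for illustration. The first function dereferences a possibly
    misaligned pointer to a 64-bit value, which is undefined behavior and
    may be compiled to ldrd on ARMv5TE; the second uses memcpy, which is
    well defined for any alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Undefined behavior when p is not suitably aligned for uint64_t.
 * Shown only for contrast; it is never called here. */
uint64_t read_u64_unsafe(const unsigned char *p)
{
    return *(const uint64_t *)p;
}

/* Well defined for any alignment: memcpy lets the compiler choose
 * loads that are valid for the alignment it can prove. */
uint64_t read_u64_safe(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Round-trips a value through a deliberately misaligned address. */
void misaligned_demo(void)
{
    _Alignas(8) unsigned char buf[16] = {0};
    const uint64_t value = 0x1122334455667788ULL;

    /* buf is 8-byte aligned, so buf + 1 is guaranteed not to be. */
    memcpy(buf + 1, &value, sizeof value);
    assert(read_u64_safe(buf + 1) == value);
}
```

    Code that follows the second pattern does not depend on the kernel's
    alignment fixup or on how a particular CPU handles misaligned
    ldrd/strd.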

    OpenWrt actually carried a patch against gcc for this[1] in the past,
    though with a misleading description (this is only a bug on Marvell
    CPUs, not ARM926, and gcc doesn't seem to do anything it
    shouldn't be allowed to do for correct source code).

    To confirm that this is the actual problem, can you try building the
    package using '-O2 -march=armv4t' or '-O2 -march=armv5t' to
    override the default 'armv5te'?
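
    When overriding -march in a package build, it can be worth checking
    that the flag actually reached the compiler, since the packaging may
    set its own flags. A minimal sketch, assuming GCC's usual predefined
    macros for these architecture levels (__ARM_ARCH_4T__,
    __ARM_ARCH_5T__, __ARM_ARCH_5TE__); the function names
    target_arch_name and require_arch are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Reports the ARM architecture level this file was compiled for,
 * based on the compiler's predefined macros. */
const char *target_arch_name(void)
{
#if defined(__ARM_ARCH_4T__)
    return "armv4t";
#elif defined(__ARM_ARCH_5TE__)
    return "armv5te";
#elif defined(__ARM_ARCH_5T__)
    return "armv5t";
#elif defined(__arm__)
    return "arm (other)";
#else
    return "not arm";
#endif
}

/* Aborts if the file was built for a different level than expected. */
void require_arch(const char *expected)
{
    assert(strcmp(target_arch_name(), expected) == 0);
}
```

    Calling require_arch("armv4t") early in a test build would abort if
    the override was silently dropped.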

    As for why this showed up only now, I can only guess; it is
    probably a combination of several factors:

    - CPU architectures without native unaligned access, such as
      ARMv5, are much less common than they used to be, as the
      industry is converging on x86/armv6+/riscv, so bugs in
      application source code don't get found as quickly as they
      used to.

    - Any armel binaries from before the Debian Buster release were
      built for ARMv4T rather than ARMv5TE, so they did not use
      ldrd/strd at all.

    - Newer GCC versions tend to find better optimizations, so they
      may use ldrd/strd where old versions did not.

    Arnd

    [1] https://gitce.net/mirrors/openwrt/commit/b050f87d13b5dc7ed82feb9a90f4529de58bdf25

  • From Luca Olivetti@21:1/5 to All on Thu Aug 19 10:10:01 2021
    On 19/8/21 at 9:03, Luca Olivetti wrote:
    I then recompiled it with -O0 instead of the default -O2 and this time
    it works (will try later with -O1).

    Nope, with -O1 same failure.

    Bye
    --
    Luca

  • From Luca Olivetti@21:1/5 to All on Thu Aug 19 11:00:01 2021
    On 19/8/21 at 10:05, Arnd Bergmann wrote:

    To confirm that this is the actual problem, can you try building the
    package using '-O2 -march=armv4t' or '-O2 -march=armv5t' to
    override the default 'armv5te'?

    With -march=armv5t it doesn't work (same failure); I'll try armv4t next.

    FWIW, this is the gdb session:


    (gdb) run
    Starting program: /home/luca/dspam/dspam-3.10.1+dfsg/src/.libs/dspam
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/arm-linux-gnueabi/libthread_db.so.1".

    Breakpoint 1, main (argc=1, argv=0xbefff714) at dspam.c:196
    196       if (apply_defaults(&ATX)) {
    (gdb) step
    apply_defaults (ATX=0xbeffe758) at agent_shared.c:587
    587     int apply_defaults(AGENT_CTX *ATX) {
    (gdb) step
    591       if (!(ATX->flags & DAF_FIXED_TR_MODE)) {
    (gdb) step
    592         char *v = _ds_read_attribute(agent_config, "TrainingMode");
    (gdb) print agent_config
    $1 = (config_t) 0x43fb98
    (gdb) step
    _ds_read_attribute (config=0x0, key=0x417f04 "TrainingMode") at config_shared.c:132
    132       attribute_t attr = _ds_find_attribute(config, key);
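
    Since gdb's view of arguments in an optimized build is not always
    reliable, one way to cross-check the session above is to log the
    pointer on both sides of the call from the program itself. The
    following is only a sketch: demo_config_t, demo_read_attribute and
    demo_apply_defaults are hypothetical stand-ins that mimic the shape of
    the call, not dspam's real code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the configuration handle. */
typedef struct demo_config { const char *training_mode; } *demo_config_t;

/* Callee: logs the pointer it actually received. */
const char *demo_read_attribute(demo_config_t config, const char *key)
{
    assert(key != NULL);
    fprintf(stderr, "callee: config=%p key=%s\n", (void *)config, key);
    if (config == NULL)
        return NULL;
    return config->training_mode;
}

/* Caller: logs the pointer it is about to pass. */
const char *demo_apply_defaults(demo_config_t agent_config)
{
    fprintf(stderr, "caller: agent_config=%p\n", (void *)agent_config);
    return demo_read_attribute(agent_config, "TrainingMode");
}
```

    If the two logged addresses differ in the real program as well, the
    null is not just a debugger display problem.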

    Bye
    --
    Luca

  • From Luca Olivetti@21:1/5 to All on Thu Aug 19 11:50:02 2021
    On 19/8/21 at 10:56, Luca Olivetti wrote:
    On 19/8/21 at 10:05, Arnd Bergmann wrote:

    To confirm that this is the actual problem, can you try building the
    package using '-O2 -march=armv4t' or '-O2 -march=armv5t' to
    override the default 'armv5te'?

    With -march=armv5t it doesn't work (same failure), I'll try with armv4t.

    Nope, same result


    Bye

  • From Luca Olivetti@21:1/5 to All on Thu Aug 19 12:20:01 2021
    I tried again with -O0, and in the last step "config" has the correct
    address (although that address, 0x449b98, differs from the one in every
    previous -O2 run, where agent_config was consistently 0x43fb98):

    (gdb) run
    Starting program: /home/luca/dspam/dspam-3.10.1+dfsg/src/.libs/dspam
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/arm-linux-gnueabi/libthread_db.so.1".

    Breakpoint 1, main (argc=1, argv=0xbefff714) at dspam.c:196
    196       if (apply_defaults(&ATX)) {
    (gdb) step
    apply_defaults (ATX=0xbeffe768) at agent_shared.c:587
    587     int apply_defaults(AGENT_CTX *ATX) {
    (gdb) step
    591       if (!(ATX->flags & DAF_FIXED_TR_MODE)) {
    (gdb) step
    592         char *v = _ds_read_attribute(agent_config, "TrainingMode");
    (gdb) print agent_config
    $1 = (config_t) 0x449b98
    (gdb) step
    _ds_read_attribute (config=0x449b98, key=0x4229a0 "TrainingMode") at config_shared.c:132
    132       attribute_t attr = _ds_find_attribute(config, key);


    Bye
    --
    Luca
