• Update on the glibc segfault issue on Alpha

    From John Paul Adrian Glaubitz@21:1/5 to All on Mon Jan 2 12:50:01 2023
    Hello!

    Just a quick update on the glibc issue we're seeing on Alpha!

    My previous bisecting result was wrong and the segfault I ran into was the result
    of combining ld.so and libc.so files from different builds by not including all necessary paths in my LD_LIBRARY_PATH. So, my previously reported bug report was
    invalid [1].

    However, as can be seen from the build logs, there are still clearly issues with
    the testsuite [2]. Further analysis showed that the problem are static binaries which segfault. Anything dynamically linked, works.

    This becomes obvious when building the glibc Debian package manually with the testsuite
    disabled and then installing these packages inside a test chroot. While the libc-bin
    package is being configured, ldconfig is run - a statically linked binary - and crashes
    with a segfault.

    Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
    to this issue as it needs to be debugged further. I will try to bisect which particular
    introduced the regression and file a new upstream bug report.

    During our discussion, Adhemerval pointed out that this change [3] might be the culprit
    but I have not been able to verify this yet.

    Adrian

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=29899
    [2] https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=alpha&ver=2.36-6&stamp=1669706903&raw=0
    [3] https://sourceware.org/pipermail/glibc-cvs/2020q2/069444.html

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to John Paul Adrian Glaubitz on Mon Jan 2 18:40:01 2023
    Hello!

    On 1/2/23 12:44, John Paul Adrian Glaubitz wrote:
    Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
    to this issue as it needs to be debugged further. I will try to bisect which particular
    introduced the regression and file a new upstream bug report.

    During our discussion, Adhemerval pointed out that this change [3] might be the culprit
    but I have not been able to verify this yet.

    My latest bisecting has shown this change to be the responsible change, CC'ing the author:

    commit 73fc4e28b9464f0e13edc719a5372839970e7ddb (refs/bisect/bad)
    Author: Florian Weimer <fweimer@redhat.com>
    Date: Mon Feb 28 11:50:41 2022 +0100

    Linux: Consolidate auxiliary vector parsing (redo)

    And optimize it slightly.

    This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.

    In _dl_aux_init in elf/dl-support.c, use an explicit loop
    and -fno-tree-loop-distribute-patterns to avoid memset.

    Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Adhemerval Zanella Netto@21:1/5 to John Paul Adrian Glaubitz on Tue Jan 3 13:30:01 2023
    On 02/01/23 14:37, John Paul Adrian Glaubitz wrote:
    Hello!

    On 1/2/23 12:44, John Paul Adrian Glaubitz wrote:
    Adhemerval from glibc upstream is aware of the problem but he has not found yet a solution
    to this issue as it needs to be debugged further. I will try to bisect which particular
    introduced the regression and file a new upstream bug report.

    During our discussion, Adhemerval pointed out that this change [3] might be the culprit
    but I have not been able to verify this yet.

    My latest bisecting has shown this change to be the responsible change, CC'ing the author:

    commit 73fc4e28b9464f0e13edc719a5372839970e7ddb (refs/bisect/bad)
    Author: Florian Weimer <fweimer@redhat.com>
    Date:   Mon Feb 28 11:50:41 2022 +0100

        Linux: Consolidate auxiliary vector parsing (redo)
            And optimize it slightly.
            This is commit 8c8510ab2790039e58995ef3a22309582413d3ff revised.
            In _dl_aux_init in elf/dl-support.c, use an explicit loop     and -fno-tree-loop-distribute-patterns to avoid memset.
            Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

    Thanks, this commits helps narrow down the issue. The 73fc4e28b9464f0e refactor did not
    add the GL(dl_phdr) and GL(dl_phnum) for static case, relying on the __ehdr_start symbol
    to get the correct values.

    The issue is for some archs, alpha for instance, the hidden weak reference is not making
    the static linker to define the __ehdr_start address correctly: it is being set to 0 and
    thus GL(dl_phdr) and GL(dl_phnum) are set to invalid values.

    And I am not sure if the hidden weak __ehdr_start does work on all architectures, so I
    think it would be safer to just restore the previous behavior to setup GL(dl_phdr) and
    GL(dl_phnum) for static and we can simplify __ehdr_start fallback case to not use
    a weak ref (as for PIE). I am checking if the following patch trigger any regression,
    at least for alpha it fixes the static failures:

    diff --git a/csu/libc-start.c b/csu/libc-start.c
    index 543560f36c..63a3eceaea 100644
    --- a/csu/libc-start.c
    +++ b/csu/libc-start.c
    @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
    So we can set up _dl_phdr and _dl_phnum even without any
    information from auxv. */

    - extern const ElfW(Ehdr) __ehdr_start
    -# if BUILD_PIE_DEFAULT
    - __attribute__ ((visibility ("hidden")));
    -# else
    - __attribute__ ((weak, visibility ("hidden")));
    - if (&__ehdr_start != NULL)
    -# endif
    - {
    - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    - GL(dl_phnum) = __ehdr_start.e_phnum;
    - }
    + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    + GL(dl_phnum) = __ehdr_start.e_phnum;
    }

    __tunables_init (__environ);
    diff --git a/sysdeps/unix/sysv
  • From Adhemerval Zanella Netto@21:1/5 to Florian Weimer on Tue Jan 3 15:10:01 2023
    On 03/01/23 10:29, Florian Weimer wrote:
    * Adhemerval Zanella Netto:

    diff --git a/csu/libc-start.c b/csu/libc-start.c
    index 543560f36c..63a3eceaea 100644
    --- a/csu/libc-start.c
    +++ b/csu/libc-start.c
    @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
    So we can set up _dl_phdr and _dl_phnum even without any
    information from auxv. */

    - extern const ElfW(Ehdr) __ehdr_start
    -# if BUILD_PIE_DEFAULT
    - __attribute__ ((visibility ("hidden")));
    -# else
    - __attribute__ ((weak, visibility ("hidden")));
    - if (&__ehdr_start != NULL)
    -# endif
    - {
    - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    - GL(dl_phnum) = __ehdr_start.e_phnum;
    - }
    + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    + GL(dl_phnum) = __ehdr_start.e_phnum;

    There's a separate thread that e.ph_off is not actually correct in this context because it's a file offset that doesn't necessarily match the
    virtual memory offset:>
    [PATCH 0/1] __libc_start_main() now uses auxv to obtain phdr's address
    [BZ #29864]
    <https://sourceware.org/pipermail/libc-alpha/2022-December/143874.html>


    Yes, but this is a fallback case where the kernel does not provide AT_PHDR
    and AT_PHNUM. As I added on the thread, I think it is better to use the
    kernel provided value since they will always present and it leverages a
    kernel fix for a similar issue [1].


    I think this needs to be cleaned up so that the static and dynamic cases
    are aligned. That is, we probably want to do the equivalent of

    GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    GL(dl_phnum) = __ehdr_start.e_phnum;


    I think we should use the __ehdr_start only as fallback.

    in common code. Ideally, we don't use global variables for that because
    in both cases, we only briefly need these variables.

    Right, this make sense.


    Thanks,
    Florian


    [1] https://sourceware.org/pipermail/libc-alpha/2022-December/143942.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Florian Weimer@21:1/5 to All on Tue Jan 3 15:20:01 2023
    * Adhemerval Zanella Netto:

    diff --git a/csu/libc-start.c b/csu/libc-start.c
    index 543560f36c..63a3eceaea 100644
    --- a/csu/libc-start.c
    +++ b/csu/libc-start.c
    @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
    So we can set up _dl_phdr and _dl_phnum even without any
    information from auxv. */

    - extern const ElfW(Ehdr) __ehdr_start
    -# if BUILD_PIE_DEFAULT
    - __attribute__ ((visibility ("hidden")));
    -# else
    - __attribute__ ((weak, visibility ("hidden")));
    - if (&__ehdr_start != NULL)
    -# endif
    - {
    - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff; - GL(dl_phnum) = __ehdr_start.e_phnum;
    - }
    + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    + GL(dl_phnum) = __ehdr_start.e_phnum;

    There's a separate thread that e.ph_off is not actually correct in this
    context because it's a file offset that doesn't necessarily match the
    virtual memory offset:

    [PATCH 0/1] __libc_start_main() now uses auxv to obtain phdr's address
    [BZ #29864]
    <https://sourceware.org/pipermail/libc-alpha/2022-December/143874.html>

    I think this needs to be cleaned up so that the static and dynamic cases
    are aligned. That is, we probably want to do the equivalent of

    GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    GL(dl_phnum) = __ehdr_start.e_phnum;

    in common code. Ideally, we don't use global variables for that because
    in both cases, we only briefly need these variables.

    Thanks,
    Florian

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to Adhemerval Zanella Netto on Wed Jan 4 10:30:01 2023
    Hi Adhemerval!

    On 1/3/23 13:09, Adhemerval Zanella Netto wrote:
    Thanks, this commits helps narrow down the issue. The 73fc4e28b9464f0e refactor did not
    add the GL(dl_phdr) and GL(dl_phnum) for static case, relying on the __ehdr_start symbol
    to get the correct values.

    The issue is for some archs, alpha for instance, the hidden weak reference is not making
    the static linker to define the __ehdr_start address correctly: it is being set to 0 and
    thus GL(dl_phdr) and GL(dl_phnum) are set to invalid values.

    And I am not sure if the hidden weak __ehdr_start does work on all architectures, so I
    think it would be safer to just restore the previous behavior to setup GL(dl_phdr) and
    GL(dl_phnum) for static and we can simplify __ehdr_start fallback case to not use
    a weak ref (as for PIE). I am checking if the following patch trigger any regression,
    at least for alpha it fixes the static failures:

    diff --git a/csu/libc-start.c b/csu/libc-start.c
    index 543560f36c..63a3eceaea 100644
    --- a/csu/libc-start.c
    +++ b/csu/libc-start.c
    @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
    So we can set up _dl_phdr and _dl_phnum even without any
    information from auxv. */

    - extern const ElfW(Ehdr) __ehdr_start
    -# if BUILD_PIE_DEFAULT
    - __attribute__ ((visibility ("hidden")));
    -# else
    - __attribute__ ((weak, visibility ("hidden")));
    - if (&__ehdr_start != NULL)
    -# endif
    - {
    - assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    - GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff; - GL(dl_phnum) = __ehdr_start.e_phnum;
    - }
    + extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    + assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    + GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    + GL(dl_phnum) = __ehdr_start.e_phnum;
    }

    __tunables_init (__environ);
    diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    index bf9374371e..5913c9d6e5 100644
    --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    @@ -56,6 +56,10 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
    if (GLRO(dl_sysinfo_dso) != NULL)
    GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
    #endif
    +#ifndef SHARED
    + GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
    + GL(dl_phnum) = auxv_values[AT_PHENT];
    +#endif

    DL_PLATFORM_AUXV
    }

    I can confirm that this patch fixes the problem for me. I have uploaded a manually built glibc
    package with the patch applied to Debian »unreleased« so that the buildds can resume building
    packages again.

    Would it be possible to backport this patch to 2.36 once it has been merged with the current
    development tree? The reason is that I don't think that Debian is going to switch to anything
    beyond 2.36 anytime soon due to the upcoming feature freeze in January.

    Thanks,
    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Adhemerval Zanella Netto@21:1/5 to John Paul Adrian Glaubitz on Wed Jan 4 13:20:01 2023
    On 04/01/23 06:25, John Paul Adrian Glaubitz wrote:
    Hi Adhemerval!

    On 1/3/23 13:09, Adhemerval Zanella Netto wrote:
    Thanks, this commits helps narrow down the issue.  The 73fc4e28b9464f0e refactor did not
    add the GL(dl_phdr) and GL(dl_phnum) for static case, relying on the __ehdr_start symbol
    to get the correct values.

    The issue is for some archs, alpha for instance, the hidden weak reference is not making
    the static linker to define the __ehdr_start address correctly: it is being set to 0 and
    thus GL(dl_phdr) and GL(dl_phnum) are set to invalid values.

    And I am not sure if the hidden weak __ehdr_start does work on all architectures, so I
    think it would be safer to just restore the previous behavior to setup GL(dl_phdr) and
    GL(dl_phnum) for static and we can simplify __ehdr_start fallback case to not use
    a weak ref (as for PIE). I am checking if the following patch trigger any regression,
    at least for alpha it fixes the static failures:

    diff --git a/csu/libc-start.c b/csu/libc-start.c
    index 543560f36c..63a3eceaea 100644
    --- a/csu/libc-start.c
    +++ b/csu/libc-start.c
    @@ -271,18 +271,10 @@ LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
               So we can set up _dl_phdr and _dl_phnum even without any
               information from auxv.  */

    -      extern const ElfW(Ehdr) __ehdr_start
    -# if BUILD_PIE_DEFAULT
    -       __attribute__ ((visibility ("hidden")));
    -# else
    -       __attribute__ ((weak, visibility ("hidden")));
    -      if (&__ehdr_start != NULL)
    -# endif
    -        {
    -          assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    -          GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    -          GL(dl_phnum) = __ehdr_start.e_phnum;
    -        }
    +      extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    +      assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    +      GL(dl_phdr) = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    +      GL(dl_phnum) = __ehdr_start.e_phnum;
          }

        __tunables_init (__environ);
    diff --git a/sysdeps/unix/sysv/linux/dl-parse_auxv.h b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    index bf9374371e..5913c9d6e5 100644
    --- a/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    +++ b/sysdeps/unix/sysv/linux/dl-parse_auxv.h
    @@ -56,6 +56,10 @@ void _dl_parse_auxv (ElfW(auxv_t) *av, dl_parse_auxv_t auxv_values)
        if (GLRO(dl_sysinfo_dso) != NULL)
          GLRO(dl_sysinfo) = auxv_values[AT_SYSINFO];
      #endif
    +#ifndef SHARED
    +  GL(dl_phdr) = (void*) auxv_values[AT_PHDR];
    +  GL(dl_phnum) = auxv_values[AT_PHENT];
    +#endif

        DL_PLATFORM_AUXV
      }

    I can confirm that this patch fixes the problem for me. I have uploaded a manually built glibc
    package with the patch applied to Debian »unreleased« so that the buildds can resume building
    packages again.

    Would it be possible to backport this patch to 2.36 once it has been merged with the current
    development tree? The reason is that I don't think that Debian is going to switch to anything
    beyond 2.36 anytime soon due to the upcoming feature freeze in January.

    I have sent a more detailed patch [1] and I will probably backport to all affected releases
    once it is accepted.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to Adhemerval Zanella Netto on Thu Jan 5 12:30:01 2023
    Hi Adhemveral!

    On 1/4/23 13:00, Adhemerval Zanella Netto wrote:
    I can confirm that this patch fixes the problem for me. I have uploaded a manually built glibc
    package with the patch applied to Debian »unreleased« so that the buildds can resume building
    packages again.

    Would it be possible to backport this patch to 2.36 once it has been merged with the current
    development tree? The reason is that I don't think that Debian is going to switch to anything
    beyond 2.36 anytime soon due to the upcoming feature freeze in January.

    I have sent a more detailed patch [1] and I will probably backport to all affected releases
    once it is accepted.

    Great, thanks a lot! This means, the fix will eventually be available in Debian and we won't have
    to carry the patch for a long time.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)