• sparc64 pbuilder/cowbuilder setup with qemu

    From Gregor Riepl@21:1/5 to All on Sun Dec 18 18:20:01 2022
    Hi,

    Since the sparc64 build hosts at the GCC Compile Farm are currently a bit
    hard to reach and my Ultra 10 isn't in working order, I was thinking of
    setting up a local QEMU cowbuilder/pbuilder installation.

    But I ran into a problem after following
    https://wiki.debian.org/Sparc64Qemu :

    I: debootstrap finished
    I: copying local configuration
    I: Installing apt-lines
    I: Refreshing the base.tgz
    I: upgrading packages
    I: mounting /proc filesystem
    I: mounting /sys filesystem
    I: creating /{dev,run}/shm
    I: mounting /dev/pts filesystem
    I: redirecting /dev/ptmx to /dev/pts/ptmx
    I: installing dummy policy-rc.d
    Reading package lists...
    E: Method gave invalid 400 URI Failure message: Could not switch saved set-user-ID
    E: Method http has died unexpectedly!
    E: Sub-process http returned an error code (112)

    This happens with both cowbuilder and cowbuilder-dist.

    The command line I used:

    sudo cowbuilder --create --basepath sid-sparc64.cow --distribution sid --debootstrapopts --arch=sparc64 --debootstrapopts --keyring=/usr/share/keyrings/debian-ports-archive-keyring.gpg --mirror http://ftp.ports.debian.org/debian-ports/

    (similar to the cowbuilder-dist line in the wiki)

    What am I missing, or how can I debug this problem?

    Regards,
    Gregor

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to Gregor Riepl on Sun Dec 18 19:00:02 2022
    Hello Gregor!

    On 12/18/22 18:11, Gregor Riepl wrote:
    > E: Method gave invalid 400 URI Failure message: Could not switch saved set-user-ID
    > E: Method http has died unexpectedly!
    > E: Sub-process http returned an error code (112)

    This seems to be the problem and related to apt. For some reason, apt is not able to
    switch its user id to "_apt". I'm not sure yet why.

    However, I could create a chroot with sbuild-createchroot:

    # sbuild-createchroot --keyring=/usr/share/keyrings/debian-ports-archive-keyring.gpg --arch=sparc64 \
    --include=debian-ports-archive-keyring sid /srv/chroot/sid-sparc64-sbuild \
    http://ftp.ports.debian.org/debian-ports/

    and then fix the apt error with:

    # echo "APT::Sandbox::User root;" > /srv/chroot/sid-sparc64-sbuild/etc/apt/apt.conf.d/10userid

    We need to find out what the reason for the apt user-mapping problem is.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Gregor Riepl@21:1/5 to All on Sun Dec 18 23:40:02 2022
    >> E: Method gave invalid 400 URI Failure message: Could not switch saved set-user-ID
    >> E: Method http has died unexpectedly!
    >> E: Sub-process http returned an error code (112)
    >
    > This seems to be the problem and related to apt. For some reason, apt is not able to
    > switch its user id to "_apt". I'm not sure yet why.

    So... next attempt with pbuilder and strace:

    sudo strace -f pbuilder create --loglevel D \
        --debootstrapopts --arch=sparc64 \
        --debootstrapopts --keyring=/usr/share/keyrings/debian-ports-archive-keyring.gpg \
        --buildplace pbuilder/sid-sparc64.cow \
        --mirror http://ftp.ports.debian.org/debian-ports/ \
        --distribution sid --no-targz 2>pbuilder.strace

    What I can gather from the 600MB trace:

    [pid 102267] execve("/usr/lib/apt/methods/http", ["/usr/lib/apt/methods/http"], 0x2674910 /* 28 vars */ <unfinished ...>
    ...
    [pid 102267] write(1, "400 URI Failure\nMessage: Could n"..., 76) = 76
    ...
    [pid 102267] exit_group(112) = ?
    ...
    [pid 102267] +++ exited with 112 +++

    So the failing binary is most likely: /usr/lib/apt/methods/http
    Exit code: 112
    Last thing it was processing before failing: /var/lib/apt/lists/partial/ftp.ports.debian.org_debian-ports_dists_sid_InRelease

    Interestingly, it doesn't seem to have any issue setting the uid and
    euid to _apt:

    [pid 102267] setgroups(1, [0] <unfinished ...>
    [pid 102267] <... setgroups resumed>) = 0
    ...
    [pid 102267] setresgid(65534, 65534, 65534) = 0
    [pid 102267] setresuid(42, 42, 42) = 0
    [pid 102267] getgid() = 65534
    [pid 102267] getegid() = 65534
    [pid 102267] getuid() = 42
    [pid 102267] geteuid() = 42
    [pid 102267] getresuid([42], [42], [42]) = 0
    [pid 102267] write(1, "400 URI Failure\nMessage: Could n"..., 76) = 76

    42 and 65534 are in fact the uid and gid of the _apt user, according to pbuilder/sid-sparc64.cow/etc/passwd. Maybe it expects something else
    instead? But why would it call setresuid(_apt, _apt, _apt) then?

    This could be a red herring, though. Maybe there is really an issue with
    the Release file or an HTTP error, but it's not detected/reported
    correctly. Fun fact: /usr/lib/apt/methods/http seems to be
    multi-threaded. Doesn't make it easier to trace...

  • From Gregor Riepl@21:1/5 to All on Mon Dec 19 22:20:02 2022
    > [pid 102267] setgroups(1, [0] <unfinished ...>
    > [pid 102267] <... setgroups resumed>)   = 0
    > ...
    > [pid 102267] setresgid(65534, 65534, 65534) = 0
    > [pid 102267] setresuid(42, 42, 42)      = 0
    > [pid 102267] getgid()                   = 65534
    > [pid 102267] getegid()                  = 65534
    > [pid 102267] getuid()                   = 42
    > [pid 102267] geteuid()                  = 42
    > [pid 102267] getresuid([42], [42], [42]) = 0
    > [pid 102267] write(1, "400 URI Failure\nMessage: Could n"..., 76) = 76
    >
    > 42 and 65534 are in fact the uid and gid of the _apt user, according to pbuilder/sid-sparc64.cow/etc/passwd. Maybe it expects something else instead? But why would it call setresuid(_apt, _apt, _apt) then?

    Looking at the apt source code now...

    This does not fail:

    https://salsa.debian.org/apt-team/apt/-/blob/main/apt-pkg/contrib/fileutl.cc#L3353

    if (geteuid() != pw->pw_uid)
       return _error->Error("Could not switch effective user");

    Further below, there's the reported error:

    https://salsa.debian.org/apt-team/apt/-/blob/main/apt-pkg/contrib/fileutl.cc#L3361

    if (getresuid(&ruid, &euid, &suid))
       return _error->Errno("getresuid", "Could not get saved set-user-ID");
    if (suid != pw->pw_uid)
       return _error->Error("Could not switch saved set-user-ID");

    This matches what we see in the strace output.

    But here is where it gets murky:

    geteuid() returns 42, which is compared with pw->pw_uid, and since
    there's no error here, I conclude that pw->pw_uid must be 42 at this point.

    Then, getresuid() queries the saved-uid, and according to strace, it's
    also returned as 42. As long as nothing has changed pw->pw_uid in the
    meantime, the following comparison shouldn't fail.

    pw_uid comes from a getpwnam() call further up, by the way, which
    appears in strace as reading from /etc/passwd. (not shown here, the
    whole trace is a bit too long)
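The two checks quoted above can be reproduced natively with a few lines of C. This is a minimal sketch, not apt's code: `check_uid_switch()` is a hypothetical name, it takes the target uid directly instead of looking it up with `getpwnam()`, and it reuses apt's messages only for readability. For an ordinary (non-setuid) process the real, effective and saved uids coincide, so calling it with `geteuid()` should pass both checks; the failure described here means the first check passed while the second saw a different value.

```c
#define _GNU_SOURCE /* getresuid() is a GNU extension in glibc */
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the two checks quoted from apt's
 * DropPrivileges(): after switching to `target`, both the effective
 * user ID and the saved set-user-ID must equal it.
 * Returns NULL if both checks pass, otherwise an apt-style message. */
const char *check_uid_switch(uid_t target)
{
    uid_t ruid, euid, suid;

    /* First check: the effective uid, read with geteuid(). */
    if (geteuid() != target)
        return "Could not switch effective user";

    /* Second check: the saved set-user-ID, read with getresuid(). */
    if (getresuid(&ruid, &euid, &suid) != 0)
        return "Could not get saved set-user-ID";
    if (suid != target)
        return "Could not switch saved set-user-ID";

    return NULL;
}
```

Inside the chroot, the target would be `getpwnam("_apt")->pw_uid`, i.e. 42 according to its /etc/passwd.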

    I don't see any other option than to run this through a debugger, but
    that's a bit of a daunting task. Do you have any hints on how to
    proceed, or does this look familiar somehow?

  • From John Paul Adrian Glaubitz@21:1/5 to Gregor Riepl on Fri Dec 23 14:10:01 2022
    Hi Gregor!

    On 12/23/22 13:52, Gregor Riepl wrote:
    > Comparing these two values:
    >
    > 42       0x2A
    > 2752512  0x2A0000
    >
    > uid_t is defined as:
    > /usr/include/sparc64-linux-gnu/sys/types.h:typedef __uid_t uid_t;
    > /usr/include/sparc64-linux-gnu/bits/types.h:__STD_TYPE __UID_T_TYPE __uid_t; /* Type of user identifications. */
    > /usr/include/sparc64-linux-gnu/bits/typesizes.h:#define __UID_T_TYPE __U32_TYPE
    > /usr/include/sparc64-linux-gnu/bits/types.h:#define __U32_TYPE unsigned int
    >
    > struct passwd uses:
    > __uid_t pw_uid; /* User ID. */

    It looks like another case of values not properly passed between the host and guest
    in a qemu-user setup. Another very prominent case is file handles, see [1].

    Adrian

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=23960

  • From Gregor Riepl@21:1/5 to All on Fri Dec 23 14:00:02 2022
    After a lot of yak shaving, I managed to add some debugging to apt:

    apt/apt-pkg/contrib/fileutl.cc:3364
    return _error->Error("Could not switch saved set-user-ID (expected %lu got %lu)",
                         (unsigned long) pw->pw_uid, (unsigned long) suid);

    (added the typecast to be sure printf isn't doing anything unexpected)

    This returns:
    Reading package lists... Done
    E: Method gave invalid 400 URI Failure message: Could not switch saved set-user-ID (expected 42 got 2752512)

    Comparing these two values:

    42       0x2A
    2752512  0x2A0000

    uid_t is defined as:
    /usr/include/sparc64-linux-gnu/sys/types.h:typedef __uid_t uid_t;
    /usr/include/sparc64-linux-gnu/bits/types.h:__STD_TYPE __UID_T_TYPE __uid_t; /* Type of user identifications. */
    /usr/include/sparc64-linux-gnu/bits/typesizes.h:#define __UID_T_TYPE __U32_TYPE
    /usr/include/sparc64-linux-gnu/bits/types.h:#define __U32_TYPE unsigned int

    struct passwd uses:
    __uid_t pw_uid; /* User ID. */
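One hypothesis that fits these numbers exactly, but which nothing in this thread confirms, is a 16-bit store into a 32-bit big-endian slot: sparc64 is big-endian, so the first two bytes of a 4-byte `uid_t` hold its high half. If the emulator wrote 42 as the 16-bit value 0x002A into those two bytes and left the rest at zero (apt initialises `suid` to 0 before calling `getresuid()`), the guest would read 0x002A0000 = 2752512. The sketch below models that hypothesis with an explicit byte buffer, so it gives the same result on any host; the function names are made up for illustration.

```c
#include <stdint.h>
#include <sys/types.h>

/* Compile-time check matching the header excerpt above: uid_t is 32 bits. */
_Static_assert(sizeof(uid_t) == 4, "uid_t is expected to be 32 bits");

/* Decode four bytes as a big-endian (sparc64 byte order) 32-bit value. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Hypothetical model of the suspected bug: a 16-bit ID is written
 * big-endian into the first two bytes of a zero-initialised 4-byte
 * uid_t slot, and the guest then reads the slot back as 32 bits. */
uint32_t store_u16_into_be32_slot(uint16_t id)
{
    unsigned char slot[4] = { 0, 0, 0, 0 }; /* like `uid_t suid = 0;` */
    slot[0] = (unsigned char)(id >> 8);     /* high byte of the 16-bit ID */
    slot[1] = (unsigned char)(id & 0xFF);   /* low byte of the 16-bit ID */
    return load_be32(slot);
}
```

Under the same model, the gid 65534 (0xFFFE) would read back as 0xFFFE0000.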

  • From Gregor Riepl@21:1/5 to All on Fri Dec 23 17:20:01 2022
    Hi Adrian,

    Where should I report this bug, then?

    On the glibc Bugzilla, or on https://gitlab.com/qemu-project/qemu/-/issues ?

    Or is this a kernel bug?

    > It looks like another case of values not properly passed between the host and guest
    > in a qemu-user setup. Another very prominent case is file handles, see [1].
    >
    > Adrian
    >
    > [1] https://sourceware.org/bugzilla/show_bug.cgi?id=23960


  • From John Paul Adrian Glaubitz@21:1/5 to Gregor Riepl on Sun Dec 25 00:10:01 2022
    Hi Gregor!

    On 12/23/22 17:13, Gregor Riepl wrote:
    > Where should I report this bug, then?
    >
    > On the glibc Bugzilla, or on https://gitlab.com/qemu-project/qemu/-/issues ?
    >
    > Or is this a kernel bug?

    I suggest reporting this as a QEMU bug first and they'll tell you whether you're
    at the right address or need to forward this to the kernel or glibc people.

    Thanks for finding and debugging this!

    Adrian

  • From John Paul Adrian Glaubitz@21:1/5 to Gregor Riepl on Sun Dec 25 14:50:02 2022
    On 12/25/22 14:38, Gregor Riepl wrote:
    >> I suggest reporting this as a QEMU bug first and they'll tell you whether you're
    >> at the right address or need to forward this to the kernel or glibc people.
    >
    > Done: https://gitlab.com/qemu-project/qemu/-/issues/1394

    Exemplary bug report, thanks a lot!

    > I'll create an appropriate Debian bug when I have more information (i.e. which package needs to be fixed).

    Great, thanks!

    > I hacked apt (see below) to get around the issue and started what I was originally trying
    > to do: https://buildd.debian.org/status/logs.php?pkg=nss&ver=2%3A3.85-1&arch=sparc64

    As I mentioned earlier in this thread, you can just configure apt to run as root to avoid
    this problem.

    > Unfortunately, the "uname: Bad address" doesn't appear in qemu. Looks like I'll have to
    > debug this on the actual hardware after all.
    >
    > Is this a known issue?

    Not that I know of. However, I would just bisect the issue in nss.

    FWIW, I know that there is going to be a new SPARC machine in the GCC compile farm based
    on a SPARC T4-2 that should become available soonish.

    > While trying to trace the libnss issue, I also encountered a crash in g++, which I haven't been able to debug yet.

    I bet you will run into some more issues as qemu-user with sparc64 has seen little testing.

    Adrian

  • From Gregor Riepl@21:1/5 to All on Sun Dec 25 14:40:01 2022
    > I suggest reporting this as a QEMU bug first and they'll tell you whether you're
    > at the right address or need to forward this to the kernel or glibc people.

    Done: https://gitlab.com/qemu-project/qemu/-/issues/1394

    I'll create an appropriate Debian bug when I have more information (i.e.
    which package needs to be fixed).


    I hacked apt (see below) to get around the issue and started what I was originally trying to do: https://buildd.debian.org/status/logs.php?pkg=nss&ver=2%3A3.85-1&arch=sparc64

    Unfortunately, the "uname: Bad address" doesn't appear in qemu. Looks
    like I'll have to debug this on the actual hardware after all.
    Is this a known issue?

    While trying to trace the libnss issue, I also encountered a crash in
    g++, which I haven't been able to debug yet.


    ---
    Quick fix for apt:

    $ patch -p1 < getresuid-qemu-sparc64.patch
    $ ninja libapt-pkg.so
    $ cp apt-pkg/libapt-pkg.so.6.0.0 /usr/lib/sparc64-linux-gnu/libapt-pkg.so.6.0.0

    diff --git a/apt-pkg/contrib/fileutl.cc b/apt-pkg/contrib/fileutl.cc
    index 10c656301..88b97ad0d 100644
    --- a/apt-pkg/contrib/fileutl.cc
    +++ b/apt-pkg/contrib/fileutl.cc
    @@ -3360,6 +3360,7 @@ bool DropPrivileges()                                   /*{{{*/
        uid_t suid = 0;
        if (getresuid(&ruid, &euid, &suid))
           return _error->Errno("getresuid", "Could not get saved set-user-ID");
    +   suid = suid >> 16;
        if (suid != pw->pw_uid)
           return _error->Error("Could not switch saved set-user-ID");
     #endif
    @@ -3371,6 +3372,7 @@ bool DropPrivileges()                                   /*{{{*/
        gid_t sgid = 0;
        if (getresgid(&rgid, &egid, &sgid))
           return _error->Errno("getresuid", "Could not get saved set-group-ID");
    +   sgid = sgid >> 16;
        if (sgid != pw->pw_gid)
    r
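The `>> 16` adjustment in the quick fix above only undoes the corruption seen under qemu-user. A minimal sketch of the patched comparison (the function name is hypothetical) shows the trade-off: it accepts the mangled value 0x2A0000 for uid 42, but rejects a correct 42, since 42 >> 16 is 0. A libapt-pkg built with the patch is therefore only suitable inside the emulated chroot; setting `APT::Sandbox::User root;`, as suggested earlier in the thread, sidesteps the problem without rebuilding apt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the comparison as patched in the quick fix: the saved ID
 * returned by getresuid()/getresgid() is shifted right by 16 bits before
 * it is compared with the uid/gid taken from /etc/passwd. */
bool patched_saved_id_matches(uint32_t saved_id, uint32_t expected)
{
    return (saved_id >> 16) == expected;
}
```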
  • From Gregor Riepl@21:1/5 to All on Mon Dec 26 11:10:01 2022
    > As I mentioned earlier in this thread, you can just configure apt to run as root to avoid
    > this problem.

    Ah, I missed that. Thanks for the hint.

    >> Unfortunately, the "uname: Bad address" doesn't appear in qemu. Looks like I'll have to
    >> debug this on the actual hardware after all.
    >>
    >> Is this a known issue?
    >
    > Not that I know of. However, I would just bisect the issue in nss.
    >
    > FWIW, I know that there is going to be a new SPARC machine in the GCC compile farm based
    > on a SPARC T4-2 that should become available soonish.

    Since gcc102 is available right now, I tested the libnss build on that
    machine. libnspr4-dev isn't installed, but the build got sufficiently
    far to compare output with the Debian build machines.

    Unfortunately, the uname issue does not occur there, which leads me to
    believe something is different between gcc102 and the Debian build hosts[1].

    gcc102 has coreutils 9.1-1, libc6 2.36-4 and uname reports:
    Linux gcc102.fsffrance.org 6.1.0+ #1 SMP Tue Dec 13 08:35:29 CST 2022 sparc64 GNU/Linux

    [1] sompek and nvg5120 : https://buildd.debian.org/status/logs.php?pkg=nss&ver=2%3A3.85-1&arch=sparc64

  • From John Paul Adrian Glaubitz@21:1/5 to Gregor Riepl on Mon Dec 26 11:10:02 2022
    Hi Gregor!

    On 12/26/22 11:01, Gregor Riepl wrote:
    > Since gcc102 is available right now, I tested the libnss build on that machine. libnspr4-dev
    > isn't installed, but the build got sufficiently far to compare output with the Debian build
    > machines.

    We should probably set up schroot on these machines so that users can set up their own build
    environments. For gcc102, Zach van Rijn is actually the admin of the machine.

    > Unfortunately, the uname issue does not occur there, which leads me to believe something is
    > different between gcc102 and the Debian build hosts[1].
    >
    > gcc102 has coreutils 9.1-1, libc6 2.36-4 and uname reports:
    > Linux gcc102.fsffrance.org 6.1.0+ #1 SMP Tue Dec 13 08:35:29 CST 2022 sparc64 GNU/Linux

    The buildds normally update their chroots regularly. Maybe that regression hasn't hit
    gcc102 yet?

    Adrian
