• Regression in 028abd92 for Sun UltraSPARC T1

  • From John Paul Adrian Glaubitz@21:1/5 to Frank Scheiner on Mon Mar 22 22:50:02 2021
    Hello!

    On 3/22/21 10:30 PM, Frank Scheiner wrote:
    Riccardo Mottola first recognized a problem with 5.10.x kernels on his
    Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
    the problem also on my Sun T1000 and it looks like this specific issue
    breaks the mounting of the root FS or maybe mounting file systems at
    all. This affects both booting from disk and from network.
    (...)
    ...as first bad commit.

    ```
    commit 028abd9222df0cf5855dab5014a5ebaf06f90565
    Author: Christoph Hellwig <hch@lst.de>
    Date: Thu Sep 17 10:22:34 2020 +0200

    fs: remove compat_sys_mount

    compat_sys_mount is identical to the regular sys_mount now, so
    remove it
    and use the native version everywhere.
    ```

    [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565

    Looking at this change, I think it's rather unexpected that this particular change would break the kernel on a specific CPU target. Are you sure that
    this is the right bad commit?

    If you found the right commit, then I assume there is something wrong with
    the syscall handling on UltraSPARC T1.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Mon Mar 22 23:00:01 2021
    Hi,

    On 22.03.21 22:48, John Paul Adrian Glaubitz wrote:
    On 3/22/21 10:30 PM, Frank Scheiner wrote:
    Riccardo Mottola first recognized a problem with 5.10.x kernels on his
    Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
    the problem also on my Sun T1000 and it looks like this specific issue
    breaks the mounting of the root FS or maybe mounting file systems at
    all. This affects both booting from disk and from network.
    (...)
    ...as first bad commit.

    ```
    commit 028abd9222df0cf5855dab5014a5ebaf06f90565
    Author: Christoph Hellwig <hch@lst.de>
    Date: Thu Sep 17 10:22:34 2020 +0200

    fs: remove compat_sys_mount

    compat_sys_mount is identical to the regular sys_mount now, so
    remove it
    and use the native version everywhere.
    ```

    [1]:
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565

    Looking at this change, I think it's rather unexpected that this particular change would break the kernel on a specific CPU target. Are you sure that this is the right bad commit?

    Well, I strictly followed the `git bisect` process and tested each and
    every proposed revision. It's indeed strange that this only affects
    UltraSPARC T1s, but the changes match the behavior: mounting of (root)
    FS is broken.

    If you found the right commit, then I assume there is something wrong with the syscall handling on UltraSPARC T1.

    Could be, all in all the T1 is a first of its kind.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to All on Mon Mar 22 22:40:02 2021
    Dear all,

    Riccardo Mottola first recognized a problem with 5.10.x kernels on his
    Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
    the problem also on my Sun T1000 and it looks like this specific issue
    breaks the mounting of the root FS or maybe mounting file systems at
    all. This affects both booting from disk and from network.

    [this thread]: https://lists.debian.org/debian-sparc/2021/03/msg00004.html

    I bisected the Linux kernel between:

    bbf5c979011a099af5dc76498918ed7df445635b (good)

    ...and:

    3650b228f83adda7e5ee532e2b90429c03f7b9ec (bad)

    ...and the process identified:

    028abd9222df0cf5855dab5014a5ebaf06f90565 ([1])

    ...as first bad commit.

    ```
    commit 028abd9222df0cf5855dab5014a5ebaf06f90565
    Author: Christoph Hellwig <hch@lst.de>
    Date: Thu Sep 17 10:22:34 2020 +0200

    fs: remove compat_sys_mount

    compat_sys_mount is identical to the regular sys_mount now, so
    remove it
    and use the native version everywhere.
    ```

    [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565

    Details about the bisecting on [2].

    [2]: https://lists.debian.org/debian-sparc/2021/03/msg00042.html
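For readers unfamiliar with the procedure: on a linear history, `git bisect` is essentially a binary search for the first commit at which a test flips from good to bad. The Python sketch below only illustrates that idea. The function name, the `is_bad` predicate (standing in for a boot attempt on the T1000) and the placeholder commit IDs are assumptions, and real `git bisect` also copes with non-linear history.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid      # the first bad commit is at mid or earlier
        else:
            good = mid     # the first bad commit is after mid
    return commits[bad]


# Abbreviated IDs from the thread plus made-up placeholders (c1, c2, c5).
history = ["bbf5c979011a", "c1", "c2", "67e306c69061",
           "028abd9222df", "c5", "3650b228f83a"]
broken_since = history.index("028abd9222df")

# Each call to the predicate corresponds to one build-and-boot test.
print(first_bad_commit(history, lambda c: history.index(c) >= broken_since))
```

The number of test boots grows only logarithmically with the number of commits in the range, which is what makes testing "each and every proposed revision" feasible on slow hardware.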

    So far this only affects UltraSPARC T1 processors. I didn't see that
    problem on a T5220 with UltraSPARC T2 and I also didn't see that problem
    on a Sun Ultra Enterprise 450 with UltraSPARC II when testing a recent
    Debian installation media with 5.10.x kernel some weeks ago. Other
    UltraSPARC processors weren't tested yet. I plan to check UltraSPARC
    IIIi and maybe others if time allows.

    ****

    Do you maybe have an idea, what could go wrong with 028abd92
    specifically on an UltraSPARC T1 processor?

    I can provide a full log of a broken (network) boot process if that's
    useful, I just need to re-create it. IIRC the kernel oopses for each
    hardware thread (similar to what Riccardo wrote on the debian-sparc
    mailing list above) and then stops.

    Cheers,
    Frank

  • From Jan Engelhardt@21:1/5 to Frank Scheiner on Tue Mar 23 18:00:02 2021
    On Monday 2021-03-22 22:55, Frank Scheiner wrote:
    Riccardo Mottola first recognized a problem with 5.10.x kernels on his
    Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
    the problem also on my Sun T1000 and it looks like this specific issue
    breaks the mounting of the root FS or maybe mounting file systems at
    all. This affects both booting from disk and from network.
    (...)
    ...as first bad commit.

    ```
    commit 028abd9222df0cf5855dab5014a5ebaf06f90565
    Author: Christoph Hellwig <hch@lst.de>
    fs: remove compat_sys_mount
    ```

    Some participants in the discussion over at the debian-sparc list mentioned "NFS" and "Invalid argument", which is something I know just too well from iptables. NFS is a filesystem that uses an extra data blob (5th argument to the mount syscall). Such blobs have historically not always been designed to bear the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to this as well.
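To illustrate the layout problem in general terms, here is a small Python sketch using the `struct` module. The two-field blob (`int version; char *hostname;`) and the pointer value are made up and are not the real NFS mount data. The sketch only shows why a blob built by an ILP32 process cannot be read with the LP64 layout unless something converts it first.

```python
import struct

# Hypothetical blob: struct { int version; char *hostname; }
# ">" selects big-endian (as on SPARC) without implicit padding, so the
# padding an LP64 compiler would insert is spelled out explicitly as "4x".
ILP32_LAYOUT = ">iI"    # 4-byte int + 4-byte pointer
LP64_LAYOUT = ">i4xQ"   # 4-byte int + 4 bytes padding + 8-byte pointer


def compat_to_native(blob: bytes) -> bytes:
    """Re-pack an ILP32 blob in the LP64 layout (the job of a compat conversion)."""
    version, hostname_ptr = struct.unpack(ILP32_LAYOUT, blob)
    return struct.pack(LP64_LAYOUT, version, hostname_ptr)


blob32 = struct.pack(ILP32_LAYOUT, 4, 0x00100AA8)  # arbitrary example values

# The same two fields occupy a different number of bytes in each mode.
print(struct.calcsize(ILP32_LAYOUT), struct.calcsize(LP64_LAYOUT))

# After conversion the fields read back correctly with the native layout.
print(struct.unpack(LP64_LAYOUT, compat_to_native(blob32)))
```

Reading `blob32` directly with `LP64_LAYOUT` fails in this sketch because the buffer is too short; in a kernel the equivalent mistake would show up as misinterpreted fields instead.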

    My hypothesis now is that fs/nfs/fs_context.c line 1160:

    if (in_compat_syscall())
    nfs4_compat_mount_data_conv(data);

    and ones similar to it (I didn't look too close where nfs3 gets to do its conversion), no longer trigger as a result of compat_sys_mount being
    wiped from the syscall table:

    +++ arch/sparc/kernel/syscalls/syscall.tbl
    @@ -201,7 +201,7 @@
    164 64 utrap_install sys_utrap_install
    165 common quotactl sys_quotactl
    166 common set_tid_address sys_set_tid_address
    -167 common mount sys_mount compat_sys_mount
    +167 common mount sys_mount
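For context, each `syscall.tbl` row has the columns number, ABI, name, native entry point and an optional compat entry point. The Python sketch below parses the two rows from the hunk above to show what the change means for a 32-bit caller. The helper names are made up for illustration, and the fallback to the native entry when no compat entry is listed is an assumption about how the table is consumed, not something checked against the sparc build scripts.

```python
from typing import NamedTuple, Optional


class SyscallRow(NamedTuple):
    number: int
    abi: str
    name: str
    native: str
    compat: Optional[str]


def parse_row(line: str) -> SyscallRow:
    """Split one syscall.tbl row into its whitespace-separated columns."""
    fields = line.split()
    compat = fields[4] if len(fields) > 4 else None
    return SyscallRow(int(fields[0]), fields[1], fields[2], fields[3], compat)


def entry_for_32bit_caller(row: SyscallRow) -> str:
    # Assumption: without a compat column, 32-bit callers get the native entry.
    return row.compat or row.native


before = parse_row("167 common mount sys_mount compat_sys_mount")
after = parse_row("167 common mount sys_mount")
print(entry_for_32bit_caller(before), "->", entry_for_32bit_caller(after))
```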

    I didn't extract from the debian-sparc discussion whether people were running the all-LP64 userspace, or had some older Debian with an ILP32-on-64-bit-kernel setup.


    [But that's just a theory - a kernel theory!]

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Tue Mar 23 18:40:01 2021
    On 23.03.21 17:57, Christoph Hellwig wrote:
    On Tue, Mar 23, 2021 at 05:50:59PM +0100, Jan Engelhardt wrote:
    Some participants in the discussion over at the debian-sparc list mentioned "NFS" and "Invalid argument", which is something I know just too well from iptables. NFS is a filesystem that uses an extra data blob (5th argument to the
    mount syscall). Such blobs have historically not always been designed to bear
    the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to this as well.

    My hypothesis now is that fs/nfs/fs_context.c line 1160:

    if (in_compat_syscall())
    nfs4_compat_mount_data_conv(data);

    and ones similar to it (I didn't look too close where nfs3 gets to do its
    conversion), no longer trigger as a result of compat_sys_mount being
    wiped from the syscall table:

    No, if in_compat_syscall() syscall doesn't trigger properly the kernel
    would not get this far.

    That being said, the NFS compat code was moved out of the compat mount handler and into nfs and refactored in the commit just before this one.

    Frank, can you double check that commit 67e306c6906137020267eb9bbdbc127034da3627 really still works, and
    only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?

    Indeed, I also expected 67e306c6906137020267eb9bbdbc127034da3627 to fail because of its commit message, but from my log it did work correctly.

    As the T1000 is at home and I don't have another T1 based system in my
    storage location where I am now, I'll double check that in the evening
    and report back.

    Strangely for a V245 (with UltraSPARC IIIi) both commits seem to work
    according to my testing, but 5.10.x (from Debian) doesn't work and
    5.9.15 (also from Debian) does work - tested now both for boot from
    network and boot from disk.

    Possibly unrelated to the problem with the T1000, the V245 emits the
    following for boot from disk with 5.10.x:

    ```
    [...]
    Loading Linux 5.10.0-5-sparc64-smp ...
    Loading initial ramdisk ...

    [ 2.602821] rtc_cmos rtc_cmos: IRQ index 0 not found
    /dev/sda2: clean, 33516/8454144 files, 1105784/33798750 blocks
    [ 13.542728] autofs4:pid:1:autofs_fill_super: called with bogus options
    [ 13.628931] systemd[1]: proc-sys-fs-binfmt_misc.automount: Failed to initialize automounter: Invalid argument
    [ 13.759917] systemd[1]: Failed to set up automount Arbitrary
    Executable File Formats File System Automount Point.
    [FAILED] Failed to set up automount File System Automount Point.
    [ 14.456396] Unable to handle kernel paging request in mna handler
    [ 14.456400] at virtual address da65f2fed110e482
    [ 14.597474] current->{active_,}mm->context = 00000000000000ce
    [ 14.597478] current->{active_,}mm->pgd = fff0000006d5c000
    [ 14.752380] Unable to handle kernel paging request in mna handler
    [ 14.752383] at virtual address da65f2fed110e482
    [ 14.893509] current->{active_,}mm->context = 0000000000000094
    [ 14.969141] current->{active_,}mm->pgd = fff00011010e0000
    [ 15.040554] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
    [ 15.141430] Press Stop-A (L1-A) from sun keyboard or send break
    [ 15.141430] twice on console to return to the boot prom
    [ 15.141459] kernel BUG at kernel/cpu.c:960
    ```

    Cheers,
    Frank

  • From Christoph Hellwig@21:1/5 to Jan Engelhardt on Tue Mar 23 18:30:01 2021
    On Tue, Mar 23, 2021 at 05:50:59PM +0100, Jan Engelhardt wrote:
    Some participants in the discussion over at the debian-sparc list mentioned "NFS" and "Invalid argument", which is something I know just too well from iptables. NFS is a filesystem that uses an extra data blob (5th argument to the
    mount syscall). Such blobs have historically not always been designed to bear the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to this as well.

    My hypothesis now is that fs/nfs/fs_context.c line 1160:

    if (in_compat_syscall())
    nfs4_compat_mount_data_conv(data);

    and ones similar to it (I didn't look too close where nfs3 gets to do its conversion), no longer trigger as a result of compat_sys_mount being
    wiped from the syscall table:

    No, if in_compat_syscall() syscall doesn't trigger properly the kernel
    would not get this far.

    That being said, the NFS compat code was moved out of the compat mount
    handler and into nfs and refactored in the commit just before this one.

    Frank, can you double check that commit 67e306c6906137020267eb9bbdbc127034da3627 really still works, and
    only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Tue Mar 23 23:20:01 2021
    On 23.03.21 17:57, Christoph Hellwig wrote:
    Frank, can you double check that commit
    67e306c6906137020267eb9bbdbc127034da3627 really still works, and
    only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?

    So I manually checked out both 67e306c6906137020267eb9bbdbc127034da3627
    and 028abd9222df0cf5855dab5014a5ebaf06f90565 and recompiled both (doing
    `make [...] mrproper` before each run).

    The results didn't change from the ones from the bisecting process:

    67e306c6906137020267eb9bbdbc127034da3627

    ...is working and:

    028abd9222df0cf5855dab5014a5ebaf06f90565

    ...is broken on my T1000.

    As I don't know how big attachments can be on this list, I put the logs
    on pastebin.

    A log for 028abd9222df is here:

    https://pastebin.com/ApPYsMcu

    A log for 67e306c69061 is here:

    https://pastebin.com/uGLXX7RS

    Cheers,
    Frank

  • From Christoph Hellwig@21:1/5 to Frank Scheiner on Wed Mar 24 09:50:01 2021
    On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
    028abd9222df0cf5855dab5014a5ebaf06f90565

    ...is broken on my T1000.

    As I don't know how big attachments can be on this list, I put the logs
    on pastebin.

    A log for 028abd9222df is here:

    https://pastebin.com/ApPYsMcu

    Just to confirm: in this tree line 304 in mm/slub.c is this BUG_ON:

    BUG_ON(object == fp); /* naive detection of double free or corruption */

    which would mean we have a double free. In that case it would be
    interesting which call to kfree this is, which could be done by
    calling gdb on vmlinux and then typing;

    l *(sys_mount+0x114/0x1e0)

    Not that a double free caused by this conversion makes any sense to me..
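As background on why that check is called naive: it only compares the object being freed with the current head of the freelist. The toy Python model below is not SLUB code, and the class and function names are invented. It shows that freeing the same object twice in a row trips the check, while a double free with another free in between slips through.

```python
class ToyFreelist:
    """Singly linked freelist: each freed object remembers the previous head."""

    def __init__(self) -> None:
        self.head = None       # most recently freed object
        self.next_free = {}    # object -> next object on the freelist

    def free(self, obj) -> None:
        # Mirrors the spirit of BUG_ON(object == fp): the object being freed
        # must not already be the head of the freelist.
        if obj is not None and obj == self.head:
            raise RuntimeError("double free or corruption")
        self.next_free[obj] = self.head
        self.head = obj


def trips_check(sequence) -> bool:
    """Return True if freeing the objects in order triggers the naive check."""
    freelist = ToyFreelist()
    try:
        for obj in sequence:
            freelist.free(obj)
    except RuntimeError:
        return True
    return False


print(trips_check(["a", "a"]))       # back-to-back double free
print(trips_check(["a", "b", "a"]))  # double free with another free in between
```

Because the check also fires when the freelist pointer has merely been overwritten with the object's own address, hitting it does not by itself prove that a second `kfree()` call happened.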

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Wed Mar 24 13:40:02 2021
    On 24.03.21 09:28, Christoph Hellwig wrote:
    On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
    028abd9222df0cf5855dab5014a5ebaf06f90565

    ...is broken on my T1000.

    As I don't know how big attachments can be on this list, I put the logs
    on pastebin.

    A log for 028abd9222df is here:

    https://pastebin.com/ApPYsMcu

    Just to confirm: in this tree line 304 in mm/slub.c is this BUG_ON:

    BUG_ON(object == fp); /* naive detection of double free or corruption */

    which would mean we have a double free. In that case it would be
    interesting which call to kfree this is, which could be done by
    calling gdb on vmlinux and then typing;

    l *(sys_mount+0x114/0x1e0)

    Not that a double free caused by this conversion makes any sense to me..

    Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other architectures, but "libpython3.8" is actually not available for sparc64, "libpython3.9" is available for sparc64 though:

    ```
    root@t1000:~# apt install gdb
    Reading package lists... Done
    Building dependency tree... Done
    Reading state information... Done
    Some packages could not be installed. This may mean that you have
    requested an impossible situation or if you are using the unstable
    distribution that some required packages have not yet been created
    or been moved out of Incoming.
    The following information may help to resolve the situation:

    The following packages have unmet dependencies:
    gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
    Recommends: libc-dbg
    E: Unable to correct problems, you have held broken packages.
    ```

    [1]: https://packages.debian.org/sid/gdb

    Something wrong with the dependencies. Any suggestions?

    Cheers,
    Frank

  • From John Paul Adrian Glaubitz@21:1/5 to Frank Scheiner on Wed Mar 24 13:50:02 2021
    Hello Frank!

    On 3/24/21 1:30 PM, Frank Scheiner wrote:
    Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other architectures, but "libpython3.8" is actually not available for sparc64, "libpython3.9" is available for sparc64 though:

    The reason for this is a bug in gdb [1] and the fact that we don't have cruft in Debian Ports [2]. If someone knows how to disable individual tests in the GDB testsuite, we could just disable the problematic test in src:gdb.

    Adrian

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=26170
    [2] https://lists.debian.org/debian-sparc/2017/12/msg00060.html

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Anatoly Pugachev@21:1/5 to frank.scheiner@web.de on Wed Mar 24 13:50:02 2021
    On Wed, Mar 24, 2021 at 3:31 PM Frank Scheiner <frank.scheiner@web.de> wrote:
    Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other architectures, but "libpython3.8" is actually not available for sparc64, "libpython3.9" is available for sparc64 though:
    ...
    ```
    The following packages have unmet dependencies:
    gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
    Recommends: libc-dbg
    E: Unable to correct problems, you have held broken packages.
    ```
    Something wrong with the dependencies. Any suggestions?

    Frank,

    you could use http://snapshot.debian.org to install old versions of
    packages, i.e. gdb and libpython-3.8

  • From Frank Scheiner@21:1/5 to Anatoly Pugachev on Wed Mar 24 13:50:02 2021
    On 24.03.21 13:42, Anatoly Pugachev wrote:
    On Wed, Mar 24, 2021 at 3:31 PM Frank Scheiner <frank.scheiner@web.de> wrote:
    Sorry, but I can't install `gdb` on my T1000 ATM, because it depends on
    "libpython3.8" for sparc64 (see [1]) and "libpython3.9" for the other
    architectures, but "libpython3.8" is actually not available for sparc64,
    "libpython3.9" is available for sparc64 though:
    ...
    ```
    The following packages have unmet dependencies:
    gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
    Recommends: libc-dbg
    E: Unable to correct problems, you have held broken packages.
    ```
    Something wrong with the dependencies. Any suggestions?

    Frank,

    you could use http://snapshot.debian.org to install old versions of
    packages, i.e. gdb and libpython-3.8

    Of course, didn't think about that. Will try that and report my findings.

    Thanks and cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Wed Mar 24 14:10:01 2021
    On 24.03.21 09:28, Christoph Hellwig wrote:
    On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
    028abd9222df0cf5855dab5014a5ebaf06f90565

    ...is broken on my T1000.

    As I don't know how big attachments can be on this list, I put the logs
    on pastebin.

    A log for 028abd9222df is here:

    https://pastebin.com/ApPYsMcu

    Just to confirm: in this tree line 304 in mm/slub.c is this BUG_ON:

    BUG_ON(object == fp); /* naive detection of double free or corruption */

    which would mean we have a double free. In that case it would be
    interesting which call to kfree this is, which could be done by
    calling gdb on vmlinux and then typing;

    l *(sys_mount+0x114/0x1e0)

    Not that a double free caused by this conversion makes any sense to me..

    This is what I get:

    ```
    root@t1000:~/kernels-in-question# gdb vmlinux-028abd9222df-new
    GNU gdb (Debian 9.2-1+b1) 9.2
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later
    <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "sparc64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from vmlinux-028abd9222df-new...
    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).
    3385 fs/namespace.c: No such file or directory.
    (gdb)
    ```

    Kernel sources are not available on the T1000.

    If need be, where do they need to exist and how should the directory be
    named - `/usr/src/[...]`?

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Wed Mar 24 14:20:01 2021
    On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:
    On 3/24/21 2:09 PM, Frank Scheiner wrote:
    Kernel sources are not available on the T1000.

    If need be, where do they need to exist and how should the directory be
    named - `/usr/src/[...]`?

    Try installing "linux-source" and the "-dbg" package for your Debian kernel.

    But don't I need the source for the kernel at 028abd92? I figured, I
    need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
    "5.9.0-rc1+" is the version the corresponding modules are installed -
    could that be correct?

    Cheers,
    Frank

  • From Anatoly Pugachev@21:1/5 to frank.scheiner@web.de on Wed Mar 24 14:30:02 2021
    On Wed, Mar 24, 2021 at 4:19 PM Frank Scheiner <frank.scheiner@web.de> wrote:
    On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:
    On 3/24/21 2:09 PM, Frank Scheiner wrote:
    Kernel sources are not available on the T1000.

    If need be, where do they need to exist and how should the directory be
    named - `/usr/src/[...]`?

    Try installing "linux-source" and the "-dbg" package for your Debian kernel.

    But don't I need the source for the kernel at 028abd92? I figured, I
    need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
    "5.9.0-rc1+" is the version the corresponding modules are installed -
    could that be correct?

    Frank,

    i'm using gdb from kernel sources directory (from which kernel is
    installed), like:

    $ uname -a
    Linux ttip 5.12.0-rc4 #203 SMP Wed Mar 24 15:50:29 MSK 2021 sparc64 GNU/Linux
    $ cd linux-2.6
    linux-2.6$ git describe
    v5.12-rc4
    linux-2.6$ gdb -q vmlinux
    Reading symbols from vmlinux...
    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6dd7c0 is in __se_sys_mount (fs/namespace.c:3431).
    3426 /* ... and return the root of (sub)tree on it */
    3427 return path.dentry;
    3428 }
    3429 EXPORT_SYMBOL(mount_subtree);
    3430
    3431 SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
    3432 char __user *, type, unsigned long, flags,
    void __user *, data)
    3433 {
    3434 int ret;
    3435 char *kernel_type;
    (gdb)

  • From Frank Scheiner@21:1/5 to Anatoly Pugachev on Wed Mar 24 14:40:01 2021
    On 24.03.21 14:24, Anatoly Pugachev wrote:
    On Wed, Mar 24, 2021 at 4:19 PM Frank Scheiner <frank.scheiner@web.de> wrote:
    On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:
    On 3/24/21 2:09 PM, Frank Scheiner wrote:
    Kernel sources are not available on the T1000.

    If need be, where do they need to exist and how should the directory be
    named - `/usr/src/[...]`?

    Try installing "linux-source" and the "-dbg" package for your Debian kernel.

    But don't I need the source for the kernel at 028abd92? I figured, I
    need the sources in `/usr/src/linux-source-5.9.0-rc1+` because
    "5.9.0-rc1+" is the version the corresponding modules are installed -
    could that be correct?

    Frank,

    i'm using gdb from kernel sources directory (from which kernel is
    installed), like:

    $ uname -a
    Linux ttip 5.12.0-rc4 #203 SMP Wed Mar 24 15:50:29 MSK 2021 sparc64 GNU/Linux
    $ cd linux-2.6
    linux-2.6$ git describe
    v5.12-rc4
    linux-2.6$ gdb -q vmlinux
    Reading symbols from vmlinux...
    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6dd7c0 is in __se_sys_mount (fs/namespace.c:3431).
    3426 /* ... and return the root of (sub)tree on it */
    3427 return path.dentry;
    3428 }
    3429 EXPORT_SYMBOL(mount_subtree);
    3430
    3431 SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
    3432 char __user *, type, unsigned long, flags,
    void __user *, data)
    3433 {
    3434 int ret;
    3435 char *kernel_type;
    (gdb)


    Ok, will try that approach. I'm currently `tar`ing the kernel sources
    @028abd92 on the cross-compiling host and will move them over to the T1000.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Wed Mar 24 15:00:02 2021
    On 24.03.21 09:28, Christoph Hellwig wrote:
    On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
    028abd9222df0cf5855dab5014a5ebaf06f90565

    ...is broken on my T1000.

    As I don't know how big attachments can be on this list, I put the logs
    on pastebin.

    A log for 028abd9222df is here:

    https://pastebin.com/ApPYsMcu

    Just to confirm: in this tree line 304 in mm/slub.c is this BUG_ON:

    BUG_ON(object == fp); /* naive detection of double free or corruption */

    which would mean we have a double free. In that case it would be
    interesting which call to kfree this is, which could be done by
    calling gdb on vmlinux and then typing;

    l *(sys_mount+0x114/0x1e0)

    Not that a double free caused by this conversion makes any sense to me..


    Finally - a T1 thread is so slow (for untaring) that I untared the
    tarball from my X4270 cross-compile host to the T1000's root FS in the end:

    ```
    root@t1000:~/mnt/torvalds-linux# git describe
    v5.9-rc1-3-g028abd9222df
    root@t1000:~/mnt/torvalds-linux# gdb -q vmlinux
    Reading symbols from vmlinux...
    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).
    3385 /* ... and return the root of (sub)tree on it */
    3386 return path.dentry;
    3387 }
    3388 EXPORT_SYMBOL(mount_subtree);
    3389
    3390 SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
    3391 char __user *, type, unsigned long, flags, void __user *, data)
    3392 {
    3393 int ret;
    3394 char *kernel_type;
    (gdb)
    ```

    ...not sure if that adds anything to what Anatoly already provided apart
    from the "correct" line numbers for the actually used kernel.

    Cheers,
    Frank

  • From Jan Engelhardt@21:1/5 to Frank Scheiner on Wed Mar 24 16:30:01 2021
    On Wednesday 2021-03-24 14:57, Frank Scheiner wrote:

    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).

    /0x1e0 does not normally belong there. Just

    l *(sys_mount+0x114)
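The likely reason the first command pointed at the start of the function: the trace prints symbols as `symbol+offset/size`, and in a gdb expression `/` is integer division, so `0x114/0x1e0` evaluates to 0. Below is a small Python sketch of that arithmetic, using the addresses that appear in the gdb output and the log in this thread. The helper name `gdb_list_command` is made up.

```python
import re

SYS_MOUNT = 0x6C6380  # address gdb reported for the start of __se_sys_mount


def gdb_list_command(trace_symbol: str) -> str:
    """Turn 'symbol+0xoff/0xsize' from a kernel trace into a gdb 'l' command."""
    match = re.fullmatch(r"(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", trace_symbol)
    if match is None:
        raise ValueError(f"unexpected trace symbol: {trace_symbol!r}")
    symbol, offset, _size = match.groups()  # the size is informational only
    return f"l *({symbol}+{offset})"


# What gdb computed when given the string verbatim: integer division yields 0.
print(hex(SYS_MOUNT + 0x114 // 0x1E0))
# What was intended: the return address inside the function.
print(hex(SYS_MOUNT + 0x114))
print(gdb_list_command("sys_mount+0x114/0x1e0"))
```

The second value matches the `00000000006c6494` shown in the call trace, which is a quick sanity check that the symbol and offset belong to the same kernel build.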

  • From Frank Scheiner@21:1/5 to Jan Engelhardt on Wed Mar 24 17:00:02 2021
    On 24.03.21 16:22, Jan Engelhardt wrote:

    On Wednesday 2021-03-24 14:57, Frank Scheiner wrote:

    (gdb) l *(sys_mount+0x114/0x1e0)
    0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).

    /0x1e0 does not normally belong there. Just

    l *(sys_mount+0x114)


    I guess this comes from my log on [1]:

    ```
    [...]
    [ 20.089289] RPC: <kfree+0x3ac/0x420>
    [ 20.089415] l0: ffff8001f8885cc8 l1: ffff8001f8881380 l2:
    ffff8001ec434558 l3: 0000000000201db0
    [ 20.089586] l4: 000000000000029c l5: ffff80010000c1a0 l6:
    ffff8001ec79c000 l7: 00000000006c6380
    [ 20.089802] i0: 0000000000001000 i1: ffff8001ec436000 i2:
    00000000006c6494 i3: ffff8001ec436000
    [ 20.089877] i4: ffff800008405340 i5: 00006000045396c0 i6:
    ffff8001ec79f561 i7: 00000000006c6494
    [ 20.090051] I7: <sys_mount+0x114/0x1e0>
    [ 20.090186] Call Trace:
    [ 20.090279] [<00000000006c6494>] sys_mount+0x114/0x1e0
    [ 20.090338] [<00000000006c6454>] sys_mount+0xd4/0x1e0
    [ 20.090499] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
    [ 20.090697] Disabling lock debugging due to kernel taint
    [ 20.090770] Caller[00000000006c6494]: sys_mount+0x114/0x1e0
    [ 20.090926] Caller[00000000006c6454]: sys_mount+0xd4/0x1e0
    [ 20.091133] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
    [ 20.091196] Caller[0000000000100aa8]: 0x100aa8
    [...]
    ```

    [1]: https://pastebin.com/ApPYsMcu

    Here the result for the suggested command:
    ```
    root@t1000:~/mnt/torvalds-linux# gdb -q vmlinux
    Reading symbols from vmlinux...
    (gdb) l *(sys_mount+0x114)
    0x6c6494 is in __se_sys_mount (fs/namespace.c:3415).
    3410 if (IS_ERR(options))
    3411 goto out_data;
    3412
    3413 ret = do_mount(kernel_dev, dir_name, kernel_type, flags, options);
    3414
    3415 kfree(options);
    3416 out_data:
    3417 kfree(kernel_dev);
    3418 out_dev:
    3419 kfree(kernel_type);
    (gdb)
    ```

    Cheers,
    Frank

  • From Christoph Hellwig@21:1/5 to Frank Scheiner on Wed Mar 24 17:30:01 2021
    On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:
    ```
    [ 20.090279] [<00000000006c6494>] sys_mount+0x114/0x1e0
    [ 20.090338] [<00000000006c6454>] sys_mount+0xd4/0x1e0
    [ 20.090499] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
    [ 20.090697] Disabling lock debugging due to kernel taint
    [ 20.090770] Caller[00000000006c6494]: sys_mount+0x114/0x1e0
    [ 20.090926] Caller[00000000006c6454]: sys_mount+0xd4/0x1e0
    [ 20.091133] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
    [ 20.091196] Caller[0000000000100aa8]: 0x100aa8
    [...]
    ```

    [1]: https://pastebin.com/ApPYsMcu

    Here the result for the suggested command:

    Thanks. And very strange, as I can't find what would free options
    before. Does the system boot if you comment out that kfree in line
    3415 (even if that causes a memleak elsewhere)?

  • From Frank Scheiner@21:1/5 to Christoph Hellwig on Wed Mar 24 17:40:01 2021
    On 24.03.21 17:10, Christoph Hellwig wrote:
    On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:
    ```
    [ 20.090279] [<00000000006c6494>] sys_mount+0x114/0x1e0
    [ 20.090338] [<00000000006c6454>] sys_mount+0xd4/0x1e0
    [ 20.090499] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
    [ 20.090697] Disabling lock debugging due to kernel taint
    [ 20.090770] Caller[00000000006c6494]: sys_mount+0x114/0x1e0
    [ 20.090926] Caller[00000000006c6454]: sys_mount+0xd4/0x1e0
    [ 20.091133] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
    [ 20.091196] Caller[0000000000100aa8]: 0x100aa8
    [...]
    ```

    [1]: https://pastebin.com/ApPYsMcu

    Here the result for the suggested command:

    Thanks. And very strange, as I can't find what would free options
    before. Does the system boot if you comment out that kfree in line
    3415 (even if that causes a memleak elsewhere)?

    Unfortunately not, the result with the kfree() commented out in
    fs/namespace.c:3415 looks pretty similar in my eyes. The log is at [2].

    [2]: https://pastebin.com/zmSFpv3R

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Frank Scheiner on Wed Mar 24 17:40:02 2021
    On 24.03.21 17:33, Frank Scheiner wrote:
    On 24.03.21 17:10, Christoph Hellwig wrote:
    On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:
    ```
    [   20.090279] [<00000000006c6494>] sys_mount+0x114/0x1e0
    [   20.090338] [<00000000006c6454>] sys_mount+0xd4/0x1e0
    [   20.090499] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
    [   20.090697] Disabling lock debugging due to kernel taint
    [   20.090770] Caller[00000000006c6494]: sys_mount+0x114/0x1e0
    [   20.090926] Caller[00000000006c6454]: sys_mount+0xd4/0x1e0
    [   20.091133] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
    [   20.091196] Caller[0000000000100aa8]: 0x100aa8
    [...]
    ```

    [1]: https://pastebin.com/ApPYsMcu

    Here the result for the suggested command:

    Thanks.  And very strange, as I can't find what would free options
    before.  Does the system boot if you comment out that kfree in line
    3415 (even if that causes a memleak elsewhere)?

    Unfortunately not, the result with the kfree() commented out in fs/namespace.c:3415 looks pretty similar in my eyes.

    Actually on second view the result looks different. :-/

  • From Christoph Hellwig@21:1/5 to All on Thu Mar 25 09:10:03 2021
    I have to admit I'm completely lost at this point. This new trace looks totally strange to me, and I'm pretty sure whatever symptoms you see are
    due to different alignments / code sections etc. that were just triggered by the
    removal. We need help from the real sparc experts.
