Riccardo Mottola first recognized a problem with 5.10.x kernels on his
Sun T2000 with UltraSPARC T1 (details in [this thread]). I could verify
the problem also on my Sun T1000 and it looks like this specific issue
breaks the mounting of the root FS or maybe mounting file systems at
all. This affects both booting from disk and from network.
(...)
...as first bad commit.
```
commit 028abd9222df0cf5855dab5014a5ebaf06f90565
Author: Christoph Hellwig <hch@lst.de>
Date: Thu Sep 17 10:22:34 2020 +0200
fs: remove compat_sys_mount
compat_sys_mount is identical to the regular sys_mount now, so
remove it
and use the native version everywhere.
```
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=028abd9222df0cf5855dab5014a5ebaf06f90565
On 3/22/21 10:30 PM, Frank Scheiner wrote:
(...)
Looking at this change, I think it's rather unexpected that this particular change would break the kernel on a specific CPU target. Are you sure that this is the right bad commit?
If you found the right commit, then I assume there is something wrong with the syscall handling on UltraSPARC T1.
On Tue, Mar 23, 2021 at 05:50:59PM +0100, Jan Engelhardt wrote:
Some participants in the discussion over at the debian-sparc list mentioned
"NFS" and "Invalid argument", which is something I know just too well from
iptables. NFS is a filesystem that uses an extra data blob (5th argument to the
mount syscall). Such blobs have historically not always been designed to bear
the same layout between ILP32 and LP64 modes, and nfs's structs fell prey to
this as well.
My hypothesis now is that fs/nfs/fs_context.c line 1160:
if (in_compat_syscall())
nfs4_compat_mount_data_conv(data);
and ones similar to it (I didn't look too close where nfs3 gets to do its
conversion), no longer trigger as a result of compat_sys_mount being
wiped from the syscall table:
No, if in_compat_syscall() didn't trigger properly, the kernel
would not get this far.
That being said, the NFS compat code was moved out of the compat mount handler and into nfs and refactored in the commit just before this one.
Frank, can you double-check that commit 67e306c6906137020267eb9bbdbc127034da3627 really still works, and
only 028abd9222df0cf5855dab5014a5ebaf06f90565 broke your setup?
028abd9222df0cf5855dab5014a5ebaf06f90565
...is broken on my T1000.
As I don't know how big attachments can be on this list, I put the logs
on pastebin.
A log for 028abd9222df is here:
https://pastebin.com/ApPYsMcu
On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
(...)
Just to confirm: in this tree, line 304 in mm/slub.c is this BUG_ON:
BUG_ON(object == fp); /* naive detection of double free or corruption */
which would mean we have a double free. In that case it would be
interesting which call to kfree this is, which could be done by
calling gdb on vmlinux and then typing:
l *(sys_mount+0x114/0x1e0)
Not that a double free caused by this conversion makes any sense to me...
Sorry, but I can't install `gdb` on my T1000 at the moment: on sparc64 it depends on "libpython3.8" (see [1]), while on the other architectures it depends on "libpython3.9". "libpython3.8" is not actually available for sparc64, although "libpython3.9" is:
```
...
The following packages have unmet dependencies:
gdb : Depends: libpython3.8 (>= 3.8.2) but it is not installable
Recommends: libc-dbg
E: Unable to correct problems, you have held broken packages.
```
Something wrong with the dependencies. Any suggestions?
[1] https://sourceware.org/bugzilla/show_bug.cgi?id=26170
[2] https://lists.debian.org/debian-sparc/2017/12/msg00060.html
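Jan's hypothesis earlier in the thread depends on the mount data blob having a different layout for a 32-bit (ILP32) caller than for a 64-bit (LP64) kernel. The C sketch below illustrates only that general point. It uses a hypothetical three-member struct: `demo_data_ilp32`, `demo_data_lp64`, `demo_naive_copy()` and `demo_compat_conv()` are made up for illustration and are not the NFS mount structures or any kernel API. Fixed-width types stand in for the two views, so the layout comes out the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical blob, NOT the real NFS structures. Written in C it would be:
 *   struct demo_data { int version; long timeo; char *hostname; };
 * The two structs below spell out how an ILP32 process and an LP64 kernel
 * lay out that same declaration in memory. */

/* ILP32 view: int, long and pointers are all 4 bytes wide. */
struct demo_data_ilp32 {
    int32_t  version;   /* offset 0 */
    int32_t  timeo;     /* offset 4 ('long' in ILP32) */
    uint32_t hostname;  /* offset 8 (32-bit user pointer) */
};                      /* sizeof == 12 */

/* LP64 view: long and pointers are 8 bytes wide and 8-byte aligned. */
struct demo_data_lp64 {
    int32_t              version;   /* offset 0, followed by 4 bytes of padding */
    _Alignas(8) int64_t  timeo;     /* offset 8 */
    _Alignas(8) uint64_t hostname;  /* offset 16 */
};                                  /* sizeof == 24 */

_Static_assert(sizeof(struct demo_data_ilp32) == 12, "ILP32 layout");
_Static_assert(offsetof(struct demo_data_lp64, timeo) == 8, "LP64 layout");

/* Wrong: treat the 32-bit bytes as if they already had the 64-bit layout. */
static struct demo_data_lp64 demo_naive_copy(const struct demo_data_ilp32 *in)
{
    struct demo_data_lp64 out;
    assert(in != NULL);
    memset(&out, 0, sizeof(out));
    memcpy(&out, in, sizeof(*in)); /* 12 bytes land in a 24-byte layout */
    return out;
}

/* Right: convert member by member, the kind of work compat conversion
 * helpers (such as the one Jan quotes) exist for. */
static struct demo_data_lp64 demo_compat_conv(const struct demo_data_ilp32 *in)
{
    struct demo_data_lp64 out;
    assert(in != NULL);
    memset(&out, 0, sizeof(out));
    out.version  = in->version;
    out.timeo    = in->timeo;    /* sign-extends 32 -> 64 bits */
    out.hostname = in->hostname; /* zero-extends the 32-bit user pointer */
    return out;
}
```

With version = 4, timeo = 600 and hostname = 0x1000, `demo_compat_conv()` preserves all three values. `demo_naive_copy()` instead leaves `hostname` at 0 and fills `timeo` with the bytes of the 32-bit hostname. This only shows why such conversions exist. Christoph's reply argues that a compat check silently not firing is not what went wrong here.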
On Wed, Mar 24, 2021 at 3:31 PM Frank Scheiner <frank.scheiner@web.de> wrote:
(...)
Frank,
you could use http://snapshot.debian.org to install old versions of
packages, i.e. gdb and libpython3.8.
On 3/24/21 2:09 PM, Frank Scheiner wrote:
Kernel sources are not available on the T1000.
If need be, where do they need to exist and how should the directory be
named - `/usr/src/[...]`?
Try installing "linux-source" and the "-dbg" package for your Debian kernel.
On 24.03.21 14:16, John Paul Adrian Glaubitz wrote:
(...)
But don't I need the source for the kernel at 028abd92? I figured I
need the sources in `/usr/src/linux-source-5.9.0-rc1+`, because
"5.9.0-rc1+" is the version under which the corresponding modules are installed.
Could that be correct?
On Wed, Mar 24, 2021 at 4:19 PM Frank Scheiner <frank.scheiner@web.de> wrote:
(...)
Frank,
I'm using gdb from the kernel source directory (the one the running kernel
was installed from), like this:
$ uname -a
Linux ttip 5.12.0-rc4 #203 SMP Wed Mar 24 15:50:29 MSK 2021 sparc64 GNU/Linux
$ cd linux-2.6
linux-2.6$ git describe
v5.12-rc4
linux-2.6$ gdb -q vmlinux
Reading symbols from vmlinux...
(gdb) l *(sys_mount+0x114/0x1e0)
0x6dd7c0 is in __se_sys_mount (fs/namespace.c:3431).
3426 /* ... and return the root of (sub)tree on it */
3427 return path.dentry;
3428 }
3429 EXPORT_SYMBOL(mount_subtree);
3430
3431 SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
3432 char __user *, type, unsigned long, flags, void __user *, data)
3433 {
3434 int ret;
3435 char *kernel_type;
(gdb)
On Tue, Mar 23, 2021 at 11:17:41PM +0100, Frank Scheiner wrote:
(...)
(gdb) l *(sys_mount+0x114/0x1e0)
0x6c6380 is in __se_sys_mount (fs/namespace.c:3390).
On Wednesday 2021-03-24 14:57, Frank Scheiner wrote:
(...)
/0x1e0 does not normally belong there. Just use
l *(sys_mount+0x114)
```
[ 20.090279] [<00000000006c6494>] sys_mount+0x114/0x1e0
[ 20.090338] [<00000000006c6454>] sys_mount+0xd4/0x1e0
[ 20.090499] [<0000000000406274>] linux_sparc_syscall+0x34/0x44
[ 20.090697] Disabling lock debugging due to kernel taint
[ 20.090770] Caller[00000000006c6494]: sys_mount+0x114/0x1e0
[ 20.090926] Caller[00000000006c6454]: sys_mount+0xd4/0x1e0
[ 20.091133] Caller[0000000000406274]: linux_sparc_syscall+0x34/0x44
[ 20.091196] Caller[0000000000100aa8]: 0x100aa8
[...]
```
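The backtrace entries above use a `symbol+offset/size` notation: the offset into the function, then the function's total length. That is why Jan says `/0x1e0` does not belong in the gdb command. The C sketch below makes the arithmetic explicit. The helper names are hypothetical, and the sketch assumes that gdb evaluates the `list *` argument as a C expression, so `/` binds tighter than `+`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One backtrace entry such as "sys_mount+0x114/0x1e0", split into parts. */
struct trace_sym {
    char name[64];
    unsigned long offset; /* bytes from the start of the function */
    unsigned long size;   /* total length of the function in bytes */
};

/* Parse "name+0xOFF/0xSIZE"; %lx accepts an optional 0x prefix.
 * Returns 1 on success, 0 otherwise. */
static int parse_trace_sym(const char *s, struct trace_sym *t)
{
    return sscanf(s, "%63[^+]+%lx/%lx", t->name, &t->offset, &t->size) == 3;
}

/* Function start address, given the absolute address printed in [<...>]. */
static unsigned long func_start(unsigned long addr, const struct trace_sym *t)
{
    assert(t->offset < t->size); /* the offset must fall inside the function */
    return addr - t->offset;
}

/* "sym+0x114/0x1e0" read as a C expression: 0x114 / 0x1e0 is 0 in integer
 * arithmetic, so the result is just the function start again. */
static unsigned long as_c_expression(unsigned long start, const struct trace_sym *t)
{
    return start + t->offset / t->size;
}
```

For the first trace line, `[<00000000006c6494>] sys_mount+0x114/0x1e0`, `func_start()` yields 0x6c6380. The `sys_mount+0xd4/0x1e0` frame at 0x6c6454 yields the same start address. This is the address Frank's gdb reported as `0x6c6380 is in __se_sys_mount`, which fits the reading that the `/0x1e0` made gdb list the start of the function. `l *(sys_mount+0x114)` resolves 0x6c6494 instead.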
[1]: https://pastebin.com/ApPYsMcu
Here the result for the suggested command:
On Wed, Mar 24, 2021 at 04:58:39PM +0100, Frank Scheiner wrote:
(...)
Thanks. And very strange, as I can't find what would free options
before. Does the system boot if you comment out that kfree in line
3415 (even if that causes a memleak elsewhere)?
On 24.03.21 17:10, Christoph Hellwig wrote:
(...)
Unfortunately not: with the kfree() in fs/namespace.c:3415 commented out, the result looks pretty similar to me.
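Christoph reads the BUG at mm/slub.c:304 as `BUG_ON(object == fp); /* naive detection of double free or corruption */`. The toy userspace freelist below shows what a check of that shape can and cannot catch. It assumes that `fp` is the current freelist head when an object is freed. `toy_cache`, `toy_free()` and `toy_alloc()` are hypothetical names, and this is not SLUB's code.

```c
#include <assert.h>
#include <stddef.h>

/* Toy freelist (hypothetical). Each free object stores a pointer to the next
 * free object in its first word; 'head' plays the role of "fp". */
struct toy_cache {
    void *head;
};

/* Push an object onto the freelist. Returns -1 instead of BUG()ing when the
 * naive check fires, i.e. the object being freed is already the head. */
static int toy_free(struct toy_cache *c, void *object)
{
    void *fp = c->head;
    assert(object != NULL);
    if (object == fp)
        return -1;         /* naive double-free detection */
    *(void **)object = fp; /* link object in front of the old head */
    c->head = object;
    return 0;
}

/* Pop the most recently freed object, or NULL if the list is empty. */
static void *toy_alloc(struct toy_cache *c)
{
    void *object = c->head;
    if (object != NULL)
        c->head = *(void **)object;
    return object;
}
```

Freeing the same object twice in a row is caught. Freeing `a`, then `b`, then `a` again slips through, because `a` is no longer the head. That limitation is what "naive" refers to.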