Forum: >>> Magnum BBS <<<

Kernel panic not printing a call trace?

From Rich@21:1/5 to All on Thu May 13 04:50:02 2021

Hi all,
So, I got my earlier system running sparc64 using a terrible method
(from inside the existing sparc install, mount -o remount,ro /; nc -l
| dd of=/dev/sda [...] an image generated in a VM, reboot and pray),
but now I'm doing the thing I actually wanted a sparc64 system for
(testing a kernel module on sparc64), and encountering a problem.

While running the module's test suite, the machine dies in the same
annoying fashion every time (so far) it reaches one particular group
of tests:
[ 1435.191913] Kernel panic - not syncing: corrupted stack end detected inside scheduler
[ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P OE 5.10.0-6-sparc64 #1 Debian 5.10.28-1
[ 1435.431126] Call Trace:
[ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
[ 1435.463267] twice on console to return to the boot prom
[ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

RED State Exception

TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606

Watchdog Reset
Externally Initiated Reset
ok

(Sometimes it winds up so disgruntled that the watchdog reset never
triggers and break twice on the console doesn't work, so you need to
physically power cycle it.)
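
For anyone reading a dump like the one above, a minimal sketch of splitting it into one record per trap level may help. The field names (TL, TT, TPC, TnPC, TSTATE) and values are taken from the dump itself; the function name parse_red_state is made up for illustration, and the sketch assumes the dotted groups are just a 64-bit hex value with separators.

```python
import re

# Dump text copied from the console output in this post.
DUMP = """\
TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
"""

# One NAME=dotted-hex pair; the names are exactly those printed in the dump.
FIELD = re.compile(r"(TL|TT|TPC|TnPC|TSTATE)=([0-9a-fA-F.]+)")


def parse_red_state(text):
    """Return one dict per trap level, in the order they appear in the dump."""
    records = []
    for name, raw in FIELD.findall(text):
        value = int(raw.replace(".", ""), 16)  # drop the dot separators
        if name == "TL":
            records.append({})  # every TL= field starts a new record
        if records:
            records[-1][name] = value
    return records


if __name__ == "__main__":
    for rec in parse_red_state(DUMP):
        print(f"TL={rec['TL']} TT={rec['TT']:#x} TPC={rec['TPC']:#x}")
```

Because the parser keys on the field names rather than on line breaks, it also copes with a copy of the dump where the lines have been run together. If the SPARC V9 trap table is being read correctly here, TT=0x10 is illegal_instruction and TT=0x68 falls in the fast data-access MMU miss range on UltraSPARC; both readings are assumptions to check against the CPU manual.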

I'm mostly curious whether anyone knows why the Call Trace might be
empty. I see the message about the corrupted stack end above it, but
from what I can see online, plenty of people get that message with a
call trace printed below it (...on other architectures, at least).
https://lists.debian.org/debian-sparc/2016/09/msg00002.html
is even an example from someone on this very list.

Does anyone have any insights? Or am I going to have to resort to
printks scattered through the thread the panic names (spl_system_task)
and hope I find the problem?
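
One avenue short of printks, sketched here under assumptions: the TPC values in the RED State dump are addresses, so they can be looked up against a System.map that matches the running kernel to see which function each trap level was in. The map excerpt below is invented for illustration (the addresses and the example_* names are hypothetical, not real sparc64 symbols), and load_symbols and resolve are made-up helper names.

```python
import bisect

# Hypothetical System.map excerpt: every address and name here is invented.
SAMPLE_MAP = """\
0000000000404000 T example_func_a
0000000000407000 T example_func_b
0000000000424000 t example_func_c
000000000048b000 T example_func_d
"""


def load_symbols(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return 'name+0xoffset' for the closest symbol at or below addr, else None."""
    addresses = [a for a, _ in symbols]
    i = bisect.bisect_right(addresses, addr) - 1
    if i < 0:
        return None  # addr lies below the first symbol in the map
    base, name = symbols[i]
    return f"{name}+{addr - base:#x}"


if __name__ == "__main__":
    symbols = load_symbols(SAMPLE_MAP)
    # TPC values copied from the RED State dump in this post.
    for tpc in (0x424200, 0x4070D0, 0x48BBA4):
        print(f"{tpc:#x} -> {resolve(symbols, tpc)}")
```

With a real map in place of SAMPLE_MAP, the same lookup would name the functions behind 0x424200, 0x4070d0 and 0x48bba4. Addresses inside a loaded module would need the module's load address as well, which this sketch does not handle.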

Thanks!
- Rich

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From John Paul Adrian Glaubitz@21:1/5 to Rich on Thu May 13 08:10:01 2021

On 5/13/21 4:49 AM, Rich wrote:

> I'm mostly curious about whether anyone knows why the Call Trace might
> be empty - I see the message about corrupted stack end above it, but
> from what I can see online, plenty of people get that message and a
> call trace printout below it (...on other architectures, at least).
> https://lists.debian.org/debian-sparc/2016/09/msg00002.html is even an
> example of someone on this very list.

If the stack is corrupted the backtrace may or may not be affected.
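
For context, the panic text comes from the scheduler's stack-end check (CONFIG_SCHED_STACK_END_CHECK), which compares the word at the far end of the task's stack against STACK_END_MAGIC. The following is a toy user-space sketch of that idea, not kernel code: the magic value is the kernel's, while the 64-byte stack, the byte order, and the helper names are assumptions chosen for illustration.

```python
import struct

STACK_END_MAGIC = 0x57AC6E9D  # value from the kernel's include/uapi/linux/magic.h
STACK_SIZE = 64               # toy size in bytes; real kernel stacks are far larger


def new_stack():
    """Create a toy stack with the magic word at its lowest address (offset 0)."""
    stack = bytearray(STACK_SIZE)
    struct.pack_into(">I", stack, 0, STACK_END_MAGIC)
    return stack


def stack_end_corrupted(stack):
    """Mirror the kernel's check: has the word at the stack end changed?"""
    (word,) = struct.unpack_from(">I", stack, 0)
    return word != STACK_END_MAGIC


def push_frames(stack, nbytes):
    """Simulate a downward-growing stack by filling nbytes from the top down."""
    start = max(STACK_SIZE - nbytes, 0)
    for i in range(start, STACK_SIZE):
        stack[i] = 0xAA


if __name__ == "__main__":
    stack = new_stack()
    push_frames(stack, 60)             # stops just short of the magic word
    print(stack_end_corrupted(stack))  # False
    push_frames(stack, 64)             # runs over the magic word
    print(stack_end_corrupted(stack))  # True
```

The check only fires after the overrun has already happened, so by then the saved frames that an unwinder would walk may have been overwritten too. That would be consistent with an empty Call Trace, though the sketch does not show that this is what happens on sparc64.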

> Does anyone have any insights? Or am I going to have to resort to
> printks in random parts of the thread the panic notes and hope I find
> the problem?

Why not bisect the kernel to find the actual bug?

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - glaubitz@debian.org
`. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

From Rich@21:1/5 to rincebrain@gmail.com on Fri May 14 01:40:01 2021

(Sorry for the messy quoting, I'm not actually on the list so I didn't
see this reply until I thought to check the ML archives)

> If the stack is corrupted the backtrace may or may not be affected.

Sure, but it happening every time is pretty surprising to me.

> Why not bisect the kernel to find the actual bug?

A) I'm going to try booting various older versions of the kernel, but...
B) I don't actually know that there was a version where the problem
I'm encountering didn't exist, so it's a relatively open search, and
C) Actually compiling kernels on this hardware will take an age each
time, so I was hoping to get better insight into the bug through a
stacktrace.

- Rich

