• Kernel panic not printing a call trace?

    From Rich@21:1/5 to All on Thu May 13 04:50:02 2021
    Hi all,
    So, I got my earlier system running sparc64 using a terrible method
    (from inside the existing sparc install, mount -o remount,ro /; nc -l
    | dd of=/dev/sda [...] an image generated in a VM, reboot and pray),
    but now I'm doing the thing I actually wanted a sparc64 system for
    (testing a kernel module on sparc64), and encountering a problem.

    While running the module's test suite, one particular group of tests
    makes it die in the same annoying fashion every time (so far):
    [ 1435.191913] Kernel panic - not syncing: corrupted stack end detected inside scheduler
    [ 1435.294939] CPU: 0 PID: 722 Comm: spl_system_task Tainted: P OE 5.10.0-6-sparc64 #1 Debian 5.10.28-1
    [ 1435.431126] Call Trace:
    [ 1435.463267] Press Stop-A (L1-A) from sun keyboard or send break
    [ 1435.463267] twice on console to return to the boot prom
    [ 1435.609777] ---[ end Kernel panic - not syncing: corrupted stack end detected inside scheduler ]---

    RED State Exception

    TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
    TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
    TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
    TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
    TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
    TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
    TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
    TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606


    Watchdog Reset
    Externally Initiated Reset
    ok
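
The RED State dump above is easier to read once each trap level is split out and the trap type (TT) is named. The following is a minimal sketch, not part of the original post: the helper names `parse_red_state` and `describe` are made up for illustration, and the trap-name table is deliberately partial (only the two TT values seen here, as named in the SPARC V9 / UltraSPARC trap tables), so check it against the manual for the actual CPU. The TPC values can then be looked up in the System.map of the running kernel.

```python
import re

# Partial trap-type table: only the TT values that occur in this dump.
# Names follow the SPARC V9 / UltraSPARC trap tables (assumption: the
# machine in question uses the UltraSPARC meaning of 0x068).
TRAP_TYPES = {
    0x010: "illegal_instruction",
    0x068: "fast_data_access_MMU_miss",
}

# Matches NAME=xxxx.xxxx.xxxx.xxxx fields regardless of line wrapping.
FIELD = re.compile(r"(TL|TT|TPC|TnPC|TSTATE)=([0-9a-fA-F.]+)")

DUMP = """
TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
TPC=0000.0000.0042.4200 TnPC=0000.0000.0042.4204 TSTATE=0000.0000.8000.1506
TL=0000.0000.0000.0002 TT=0000.0000.0000.0010
TPC=0000.0000.0040.70d0 TnPC=0000.0000.0040.70d4 TSTATE=0000.0000.8004.1406
TL=0000.0000.0000.0001 TT=0000.0000.0000.0068
TPC=0000.0000.0048.bba4 TnPC=0000.0000.0048.bba8 TSTATE=0000.0000.8000.1606
"""


def parse_red_state(text):
    """Return one dict per trap level, in the order they appear in the dump."""
    levels = []
    for name, raw in FIELD.findall(text):
        value = int(raw.replace(".", ""), 16)
        if name == "TL":
            levels.append({})  # every TL= field starts a new trap level
        if levels:
            levels[-1][name] = value
    return levels


def describe(level):
    """Format one trap level with its trap type name, if known."""
    name = TRAP_TYPES.get(level["TT"], "unknown (see CPU manual)")
    return f"TL={level['TL']} TT={level['TT']:#05x} ({name}) TPC={level['TPC']:#x}"


if __name__ == "__main__":
    for level in parse_red_state(DUMP):
        print(describe(level))
```

Read from the bottom (TL=1) up, this shows a data MMU miss followed by illegal-instruction traps nesting up to TL=5, the last three at the same TPC.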

    (Sometimes it winds up so disgruntled that the watchdog reset never
    triggers and break twice on the console doesn't work, so you need to
    physically power cycle it.)

    I'm mostly curious about whether anyone knows why the Call Trace might
    be empty - I see the message about corrupted stack end above it, but
    from what I can see online, plenty of people get that message and a
    call trace printout below it (...on other architectures, at least).
    https://lists.debian.org/debian-sparc/2016/09/msg00002.html is even an
    example of someone on this very list.

    Does anyone have any insights? Or am I going to have to resort to
    printks in random parts of the thread the panic notes and hope I find
    the problem?

    Thanks!
    - Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to Rich on Thu May 13 08:10:01 2021
    On 5/13/21 4:49 AM, Rich wrote:
    > I'm mostly curious about whether anyone knows why the Call Trace might
    > be empty - I see the message about corrupted stack end above it, but
    > from what I can see online, plenty of people get that message and a
    > call trace printout below it (...on other architectures, at least).
    > https://lists.debian.org/debian-sparc/2016/09/msg00002.html is even an
    > example of someone on this very list.

    If the stack is corrupted the backtrace may or may not be affected.
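
As a toy illustration of that point (not kernel code, and not from the original message): the panic comes from a check that the magic word at the far end of the task's stack still holds `STACK_END_MAGIC`, while a backtrace depends on the saved frames higher up being intact. The sketch below models a stack as a byte array; the frame layout, `new_stack`, and `walk` are invented for the example, and only the magic value and the idea of the end-of-stack check mirror the kernel.

```python
import struct

STACK_END_MAGIC = 0x57AC6E9D   # canary value the Linux kernel places at the stack end
STACK_SIZE = 64                # toy size in bytes; real kernel stacks are far larger
FRAME = struct.Struct("<II")   # hypothetical frame: (return address, caller frame offset)


def new_stack(return_addrs):
    """Build a toy stack: canary at offset 0, frames growing down from the top.

    return_addrs is ordered outermost call first. Returns (stack, fp), where
    fp is the offset of the innermost frame.
    """
    stack = bytearray(STACK_SIZE)
    struct.pack_into("<I", stack, 0, STACK_END_MAGIC)
    offset = STACK_SIZE
    caller = 0  # 0 marks the outermost frame
    for addr in return_addrs:
        offset -= FRAME.size
        FRAME.pack_into(stack, offset, addr, caller)
        caller = offset
    return stack, offset


def stack_end_corrupted(stack):
    """Same idea as the kernel's end-of-stack check: is the canary still there?"""
    (word,) = struct.unpack_from("<I", stack, 0)
    return word != STACK_END_MAGIC


def walk(stack, fp):
    """Follow the frame chain from fp, stopping at the first implausible frame."""
    trace = []
    while 4 <= fp <= STACK_SIZE - FRAME.size:
        addr, caller = FRAME.unpack_from(stack, fp)
        if addr == 0:          # zeroed frame: nothing sensible to report
            break
        trace.append(addr)
        if caller <= fp:       # chain must move toward the stack base
            break
        fp = caller
    return trace


if __name__ == "__main__":
    # Case 1: only the canary is overwritten; the frames survive.
    stack, fp = new_stack([0x1000, 0x2000, 0x3000])
    stack[0:4] = bytes(4)
    print(stack_end_corrupted(stack), [hex(a) for a in walk(stack, fp)])

    # Case 2: the overwrite also wipes the innermost frame; the walk finds nothing.
    stack, fp = new_stack([0x1000, 0x2000, 0x3000])
    stack[0:48] = bytes(48)
    print(stack_end_corrupted(stack), [hex(a) for a in walk(stack, fp)])
```

In both cases the canary check fires, but only the first leaves a usable call chain. Whether the empty trace in this report has that cause is an open question; the sketch only shows that the two outcomes are compatible with the same panic message.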

    > Does anyone have any insights? Or am I going to have to resort to
    > printks in random parts of the thread the panic notes and hope I find
    > the problem?

    Why not bisect the kernel to find the actual bug?

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Rich@21:1/5 to rincebrain@gmail.com on Fri May 14 01:40:01 2021
    (Sorry for the messy quoting, I'm not actually on the list so I didn't
    see this reply until I thought to check the ML archives)

    > If the stack is corrupted the backtrace may or may not be affected.

    Sure, but it happening every time is pretty surprising to me.

    > Why not bisect the kernel to find the actual bug?

    A) I'm going to try booting variously old versions of the kernel, but...
    B) I don't actually know that there was a version where the problem
    I'm encountering didn't exist, so it's a relatively open search, and
    C) Actually compiling kernels on this hardware will take an age each
    time, so I was hoping to get better insight into the bug through a
    stacktrace.

    - Rich

