• update-grub causes a system lockup

    From John Paul Adrian Glaubitz@21:1/5 to Dennis Clarke on Tue Jan 12 01:50:01 2021
    On 1/12/21 1:39 AM, Dennis Clarke wrote:

    > I made a few minor edits to /etc/default/grub and then:
    >
    > root@ceres:~# update-grub
    > [ 303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
    > [ 303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
    > (...)
    > Also this has been happening for months.

    I would suggest installing a 4.x kernel to see whether that helps. I know that
    5.x kernels can be a bit unstable on certain older SPARC machines.

    If the issue goes away with an older kernel, try bisecting to find the commit that introduced the issue.
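
    As a rough illustration of what bisecting means here: given a chronologically ordered list of kernel versions (or commits) where the oldest is known good and the newest is known bad, a binary search finds the first bad one in roughly log2(n) install-and-test cycles. The version list and the `locks_up` callback below are hypothetical placeholders, not results from this thread; on the actual kernel tree, `git bisect` does this bookkeeping.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is true.

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is also bad (the usual bisect premise).
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return versions[bad]


if __name__ == "__main__":
    # Hypothetical data: pretend the lockup first appears in "5.4".
    candidates = ["4.19", "5.2", "5.4", "5.7", "5.10"]
    tested = []

    def locks_up(version: str) -> bool:
        tested.append(version)  # each call stands for one install-and-test cycle
        return candidates.index(version) >= candidates.index("5.4")

    print(first_bad(candidates, locks_up), "found after testing", tested)
```

    The search only works if the bug reproduces reliably, since a single wrong good/bad verdict sends it down the wrong half.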

    Either way, it's good to have something that allows us to reproduce the bug.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Clarke@21:1/5 to All on Tue Jan 12 01:50:02 2021
    I made a few minor edits to /etc/default/grub and then:

    root@ceres:~# update-grub
    [ 303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
    [ 303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
    [ 303.716582] CPU: 0 PID: 261 Comm: grub-probe Tainted: G E 5.10.0-1-sparc64 #1 Debian 5.10.5-1
    [ 303.845889] TSTATE: 0000000011001606 TPC: 000000000094c4f0 TNPC: 000000000094c4f4 Y: 00000000 Tainted: G E
    [ 303.993559] TPC: <misc_open+0x50/0x180>
    [ 304.043951] g0: fffff800068f5ec0 g1: 0000000000000098 g2: 0000000000000000 g3: 000000000196df50
    [ 304.158439] g4: fffff8000ac388a0 g5: 000000005ff099f6 g6: fffff8000b6fc000 g7: 000000000ef10180
    [ 304.272918] o0: 0000000000f24960 o1: fffff8000b6ff8ec o2: fffff800042833d0 o3: 0000000000000000
    [ 304.387399] o4: 0000000000000000 o5: 0000000000000000 sp: fffff8000b6fef81 ret_pc: 000000000094c4c0
    [ 304.506456] RPC: <misc_open+0x20/0x180>
    [ 304.556875] l0: 0000000000f24800 l1: 0000000000000000 l2: 0000000000664c00 l3: 0000000661c58e90
    [ 304.671360] l4: 0000000000020000 l5: fffff8000b6ff8f0 l6: 0000000000e12000 l7: 0000000000000001
    [ 304.785838] i0: fffff8000ad93048 i1: fffff8000b47b600 i2: 0000000000f24800 i3: 0000000000f24978
    [ 304.900318] i4: 00000000000000ec i5: 0000000010076818 i6: fffff8000b6ff031 i7: 0000000000665838
    [ 305.014814] I7: <chrdev_open+0x98/0x1e0>
    [ 305.066356] Call Trace:
    [ 305.098501] [<0000000000665838>] chrdev_open+0x98/0x1e0
    [ 305.167245] [<000000000065ae30>] do_dentry_open+0x170/0x420
    [ 305.240529] [<000000000065ca68>] vfs_open+0x28/0x40
    [ 305.304691] [<0000000000671348>] path_openat+0x988/0x1100
    [ 305.375707] [<0000000000673dd0>] do_filp_open+0x50/0x100
    [ 305.445573] [<000000000065cd30>] do_sys_openat2+0x70/0x180
    [ 305.517732] [<000000000065d268>] sys_openat+0x48/0xc0
    [ 305.584186] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
    ~

    At this point I have to signal a break to the console.

    I am not yet sure exactly which binary causes this problem, but my
    wild guess is that somewhere in /usr/sbin/grub-mkconfig we hit a
    show-stopping fault. I am walking through it line by line to find
    the culprit.
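
    The trace already names the task that was stuck: both the watchdog line and the `Comm:` field say grub-probe (PID 261), and the program counter (TPC) is in misc_open, reached from sys_openat via chrdev_open. A minimal Python sketch that pulls those fields out of a saved console log follows; the `summarize` function and its returned dictionary layout are illustrative choices, and `SAMPLE` is an excerpt of the lines posted above.

```python
import re

# Excerpt of the console output posted in this message.
SAMPLE = """\
[ 303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
[ 303.993559] TPC: <misc_open+0x50/0x180>
[ 305.066356] Call Trace:
[ 305.098501] [<0000000000665838>] chrdev_open+0x98/0x1e0
[ 305.167245] [<000000000065ae30>] do_dentry_open+0x170/0x420
[ 305.517732] [<000000000065d268>] sys_openat+0x48/0xc0
"""

LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+):(\d+)\]")
TPC = re.compile(r"\bTPC: <(\w+)\+")            # symbolic program counter line
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\w+)\+0x")  # one call-trace frame


def summarize(log: str) -> dict:
    """Extract the stuck task, the symbol at TPC, and the call chain."""
    summary = {"task": None, "pid": None, "cpu": None, "tpc": None, "trace": []}
    for line in log.splitlines():
        if m := LOCKUP.search(line):
            summary["cpu"] = int(m.group(1))
            summary["task"] = m.group(3)
            summary["pid"] = int(m.group(4))
        elif m := TPC.search(line):
            summary["tpc"] = m.group(1)
        elif m := FRAME.search(line):
            summary["trace"].append(m.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

    Read this way, the hang appears to occur while grub-probe opens a character device, which may narrow the search more quickly than stepping through the grub-mkconfig script.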

    Also this has been happening for months.



    --
    Dennis Clarke
    RISC-V/SPARC/PPC/ARM/CISC
    UNIX and Linux spoken
    GreyBeard and suspenders optional

  • From Dennis Clarke@21:1/5 to John Paul Adrian Glaubitz on Tue Jan 12 18:00:03 2021
    On 1/12/21 12:47 AM, John Paul Adrian Glaubitz wrote:
    > On 1/12/21 1:39 AM, Dennis Clarke wrote:
    >
    >> I made a few minor edits to /etc/default/grub and then:
    >>
    >> root@ceres:~# update-grub
    >> [ 303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
    >> [ 303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
    >> (...)
    >> Also this has been happening for months.
    >
    > I would suggest installing a 4.x kernel to see whether that helps. I know that 5.x kernels can be a bit unstable on certain older SPARC machines.
    >
    > If the issue goes away with an older kernel, try bisecting to find the commit that introduced the issue.
    >
    > Either way, it's good to have something that allows us to reproduce the bug.


    I was thinking that the architecture may be the issue. The age I mean.
    So I dragged out a newer Oracle T4 unit to try. I have no idea what will
    happen with the newer unit and have never tried to run the installer via
    the new SP/console serial interface but will give it a try.

    Dennis

  • From John Paul Adrian Glaubitz@21:1/5 to Dennis Clarke on Tue Jan 12 18:20:01 2021
    On 1/12/21 5:58 PM, Dennis Clarke wrote:
    >> Either way, it's good to have something that allows us to reproduce the bug.
    >
    > I was thinking that the architecture may be the issue. The age I mean.
    > So I dragged out a newer Oracle T4 unit to try. I have no idea what will
    > happen with the newer unit and have never tried to run the installer via
    > the new SP/console serial interface but will give it a try.

    There are known issues with kernel stability on older SPARC CPUs which have
    not been resolved yet.

    If you could find a reliable reproducer to trigger the crash, I can later use that to bisect the problem and find which commit introduced the regression.

    So, if you want to help with the SPARC port, this would be an excellent opportunity.
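
    One possible shape for such a reproducer, sketched in Python under the assumption that the lockup is triggered by opening a particular device node (the posted trace ends in sys_openat, chrdev_open and misc_open). Each path is written to a log file and synced to disk before the open is attempted, so that after a hang the last logged path is the suspect. The `probe` helper, the log file name and the choice of device nodes are all hypothetical. Merely opening some nodes (a watchdog device, for example) has side effects, so the list should be chosen by hand.

```python
import os
import subprocess
import sys

# Child process that does nothing except open and close the given path.
OPEN_SNIPPET = (
    "import os, sys\n"
    "fd = os.open(sys.argv[1], os.O_RDONLY | os.O_NONBLOCK)\n"
    "os.close(fd)\n"
)


def probe(path: str, log_path: str, timeout: float = 30.0) -> str:
    """Log the path durably, then try to open it in a child process.

    Returns "ok", "error" (the open failed) or "timeout" (child did not finish).
    On a single-CPU machine a kernel soft lockup may freeze this script as
    well; in that case the synced log is what identifies the culprit.
    """
    with open(log_path, "a") as log:
        log.write(f"opening {path}\n")
        log.flush()
        os.fsync(log.fileno())  # make sure the line survives a forced reset
    try:
        result = subprocess.run(
            [sys.executable, "-c", OPEN_SNIPPET, path],
            timeout=timeout,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Hypothetical usage: python3 probe.py <device-node> [<device-node> ...]
    for node in sys.argv[1:]:
        print(node, probe(node, "open-probe.log"))
```

    If one node hangs the machine every time, that single open call would be a compact test case for bisecting.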

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to John Paul Adrian Glaubitz on Wed Jan 13 01:40:01 2021
    On 1/12/21 6:18 PM, John Paul Adrian Glaubitz wrote:
    > If you could find a reliable reproducer to trigger the crash, I can later use that to bisect the problem and find which commit introduced the regression.
    >
    > So, if you want to help with the SPARC port, this would be an excellent opportunity.

    Alternatively, could you just send me your grub.conf which causes the crash when
    running update-grub?

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From John Paul Adrian Glaubitz@21:1/5 to Dennis Clarke on Tue Jan 19 15:20:02 2021
    Hi Dennis!

    On 1/12/21 5:58 PM, Dennis Clarke wrote:
    > I was thinking that the architecture may be the issue. The age I mean.
    > So I dragged out a newer Oracle T4 unit to try. I have no idea what will
    > happen with the newer unit and have never tried to run the installer via
    > the new SP/console serial interface but will give it a try.

    There are known issues with older CPUs that stem from a bug in the kernel.

    However, I currently don't know how to reproduce the crash. If you have something
    that reproducibly causes the kernel to crash on the old SPARC CPUs and that I can use,
    it would be very helpful for fixing the problem.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer - glaubitz@debian.org
    `. `' Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
