• Booting a T5140 panics

    From Rich@21:1/5 to All on Tue Nov 16 04:20:01 2021
    Hi all,
    So, I came into possession of a T5140, and logically decided to try
    booting Debian on it.

    Unfortunately, this is somewhat complicated by the fact that at least
    my T5140 (and possibly all? I don't have the firmware updates to know)
    still has the "booting an image over 10MB? nah." TFTP issue, so I
    can't just lob netboot.tar.gz over the network and be happy.

    So I broke out my old optical drive from the cobwebs, burned [1] to an
    actual optical disc, and booted it.

    It got to GRUB, I booted Default Install, and immediately got back:
    [ 11.339170] NON-RESUMABLE ERROR: Reporting on cpu 0
    [ 11.339263] NON-RESUMABLE ERROR: TPC [0x00000000008dcd44] <__pci_enable_msix_range+0x364/0x680>
    [ 11.339425] NON-RESUMABLE ERROR: RAW [0001000000000001:0000090b1f1fd77f:0000000202000080:ffffffffffffffff
    [ 11.339563] NON-RESUMABLE ERROR: 0000000800000000:0000000000000000:0000000000000000:0000000000000000]
    [ 11.339700] NON-RESUMABLE ERROR: handle [0x0001000000000001] stick [0x0000090b1f1fd77f]
    [ 11.339758] NON-RESUMABLE ERROR: type [precise nonresumable]
    [ 11.339808] NON-RESUMABLE ERROR: attrs [0x02000080] < ASI sp-faulted priv >
    [ 11.339885] NON-RESUMABLE ERROR: raddr [0xffffffffffffffff]
    [ 11.339938] NON-RESUMABLE ERROR: insn effective address [0x000000c50020000c]
    [ 11.339991] NON-RESUMABLE ERROR: size [0x8]
    [ 11.340025] NON-RESUMABLE ERROR: asi [0x00]
    [ 11.340822] Kernel panic - not syncing: Non-resumable error.
    [ 11.340873] CPU: 0 PID: 90 Comm: systemd-udevd Tainted: G E 5.14.0-3-sparc64 #1 Debian 5.14.12-1
    [ 11.340952] Call Trace:
    [ 11.340981] [<0000000000c19e1c>] panic+0xec/0x340
    [ 11.341028] [<000000000042a628>] sun4v_nonresum_error+0xc8/0xe0
    [ 11.341089] [<0000000000406da0>] sun4v_nonres_mondo+0xc8/0xd8
    [ 11.341158] [<00000000008dcd44>] __pci_enable_msix_range+0x364/0x680
    [ 11.341223] [<00000000008dd080>] pci_enable_msix_range+0x20/0x40
    [ 11.341285] [<00000000101879e8>] niu_try_msix+0xc8/0x1a0 [niu]
    [ 11.341404] [<000000001018f13c>] niu_get_invariants+0x47c/0x2860 [niu]
    [ 11.341524] [<0000000010191774>] niu_pci_init_one+0x254/0x420 [niu]
    [ 11.341642] [<00000000008d3268>] pci_device_probe+0xc8/0x160
    [ 11.341708] [<0000000000971164>] really_probe+0xc4/0x480
    [ 11.341779] [<0000000000971644>] __driver_probe_device+0x124/0x180
    [ 11.341853] [<00000000009716c8>] driver_probe_device+0x28/0xe0
    [ 11.341925] [<0000000000971f24>] __driver_attach+0xc4/0x200
    [ 11.341997] [<000000000096e818>] bus_for_each_dev+0x58/0xa0
    [ 11.342066] [<000000000097079c>] driver_attach+0x1c/0x40
    [ 11.342135] [<00000000009701b0>] bus_add_driver+0x1d0/0x240
    [ 11.342282] Press Stop-A (L1-A) from sun keyboard or send break
    [ 11.342282] twice on console to return to the boot prom
    [ 11.342367] ---[ end Kernel panic - not syncing: Non-resumable error. ]---

    "Neat."

    Sadly, the niu driver doesn't really seem to have any parameters I
    could play with...
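
    As an aside on reading the trace above: each call-trace entry such as
    __pci_enable_msix_range+0x364/0x680 gives a symbol name, the offset
    into that symbol, and the symbol's total size, optionally followed by
    the owning module in brackets. A minimal Python sketch for pulling
    those fields out of a log line follows; the helper name parse_frame
    and the regular expression are illustrative assumptions, not part of
    any kernel tooling.

```python
import re

# Matches trace entries of the form: [<address>] symbol+0xOFFSET/0xSIZE [module]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frame(line):
    """Return a dict describing one call-trace line, or None if it is not a frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),  # None when no module is listed
    }


frame = parse_frame("[ 11.341285] [<00000000101879e8>] niu_try_msix+0xc8/0x1a0 [niu]")
print(frame["symbol"], hex(frame["offset"]), frame["module"])
```

    Run over the whole trace, this makes it easy to see that the frames
    tagged [niu] sit directly above the generic PCI MSI-X code where the
    error was reported.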

    Anybody have a suggestion for how to not get burned by this? I can try
    just booting older and older snapshots, but I only have so many
    burnable discs.

    Booting a wheezy netboot.tar.gz booted without complaint, but that's
    not directly useful at the moment...unless I want to clobber a disk
    with a bootable sparc64 image, I suppose.

    I tried editing the boot command in GRUB on the disc to say
    'set options="priority=high nomsi"', but it didn't affect the
    behavior. I could blacklist the niu driver entirely next (assuming
    I'm not doing the parameter passing wrong), but then I need a
    non-niu NIC to plug in...

    Thanks for any insight anyone can provide,
    - Rich

    [1] - https://cdimage.debian.org/cdimage/ports/snapshots/2021-10-20/debian-11.0.0-sparc64-NETINST-1.iso

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Dennis Clarke@21:1/5 to Rich on Tue Nov 16 05:10:01 2021
    On 11/15/21 22:19, Rich wrote:
    > Hi all,
    > So, I came into possession of a T5140, and logically decided to try
    > booting Debian on it.


    Wish I could help. I just wanted to say WELL DONE !

    I have a Netra sparc unit that seems to run Debian just fine unless I
    try to run update-grub, at which point it just panics.

    I am going to try, next week?, to boot a Fujitsu/Oracle M4000 unit that
    is lying around in my life. Should be a right royal disaster but I will
    certainly report progress.



    --
    Dennis Clarke
    RISC-V/SPARC/PPC/ARM/CISC
    UNIX and Linux spoken
    GreyBeard and suspenders optional

  • From John Paul Adrian Glaubitz@21:1/5 to Rich on Tue Nov 16 09:40:02 2021
    Hello Rich!

    On 11/16/21 04:19, Rich wrote:
    > Anybody have a suggestion for how to not get burned by this? I can try
    > just booting older and older snapshots, but I only have so many
    > burnable discs.

    Try this image, which has a 4.19 kernel that is known to be less
    problematic on older SPARCs:

    https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/debian-10.0-sparc64-NETINST-1.iso

    Adrian

    --
     .''`.  John Paul Adrian Glaubitz
    : :' :  Debian Developer - glaubitz@debian.org
    `. `'   Freie Universitaet Berlin - glaubitz@physik.fu-berlin.de
      `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

  • From Nemo Nusquam@21:1/5 to John Paul Adrian Glaubitz on Tue Nov 16 20:50:02 2021
    Greetings, Adrian.

    On 2021-11-16 03:39, John Paul Adrian Glaubitz wrote:
    > Hello Rich!
    >
    > On 11/16/21 04:19, Rich wrote:
    >> Anybody have a suggestion for how to not get burned by this? I can try
    >> just booting older and older snapshots, but I only have so many
    >> burnable discs.
    > Try this image which has a 4.19 kernel which is known to be less
    > problematic on older SPARCs:
    >
    > https://cdimage.debian.org/cdimage/ports/snapshots/2019-07-16/debian-10.0-sparc64-NETINST-1.iso

    Will this work on an SB2000?  I tried older Debian releases and there
    were problems with FC.

    Sincerely,
    N.

  • From Rich@21:1/5 to glaubitz@physik.fu-berlin.de on Sat Nov 20 07:40:01 2021
    Curiously, I didn't see other people's replies to my message, probably
    due to not being subscribed. I figured at the point where I just
    bought another SPARC, I might as well...

    I've done some digging into it. The short version is that commit
    7d5ec3d3 introduced the readl() at
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/msi.c?h=v5.14#n748
    and, for reasons that are currently opaque to me, not being familiar
    with low-level SPARC details, that dies in a fire.

    Just short-circuiting it to 1 instead of calling readl() seems to
    function fine for my system, but isn't a great general solution...

    (Bonus fun - my T5140 is currently booting from a /boot on a USB drive
    and a / on an LSI SAS2 PCIe controller because A) the onboard SAS1
    card is faulted and not obviously fixable, and B) I only figured out
    how to get the SAS2 disks to show up as a bootable device by loading
    the FCode blob onto the card after I did the install...I should
    probably migrate that at some point.)

    Last bit of fun is that Linux seems to be misinterpreting the
    local-mac-address? setting and just unconditionally overriding
    _everything_'s MAC with the system's, whether it's set to true or
    false. (Amusingly, the "niu" driver has special casing for this and
    instead of setting everything to e.g. 00:aa:aa:aa:aa:aa, sets the
    first port to 00:aa:aa:aa:aa:aa, second to 00:aa:aa:aa:aa:ab, and so
    on.)
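
    The per-port numbering described above can be sketched in a few lines
    of Python. This only illustrates the observed pattern (base address
    for the first port, base plus one for the second, and so on); it is
    not taken from the niu driver source. The function name port_mac is
    made up, and treating the address as a single 48-bit integer with
    carry across bytes is an assumption, since only the first two
    addresses are given in the post.

```python
def port_mac(base_mac, port):
    """Return base_mac plus `port`, treating the address as a 48-bit integer.

    Assumption: carry propagates across all six bytes and wraps at 48 bits.
    The post only shows the first two ports, so the real driver may differ.
    """
    value = int(base_mac.replace(":", ""), 16)
    value = (value + port) & 0xFFFFFFFFFFFF
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


# Placeholder system MAC from the post, numbered for four ports.
for port in range(4):
    print(port, port_mac("00:aa:aa:aa:aa:aa", port))
```

    With the placeholder address from the post, ports 0 and 1 come out as
    00:aa:aa:aa:aa:aa and 00:aa:aa:aa:aa:ab, matching the behavior
    described.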

    - Rich

  • From John Paul Adrian Glaubitz@21:1/5 to Rich on Sat Nov 20 11:10:02 2021
    Hello Rich!

    On 11/20/21 07:38, Rich wrote:
    > Curiously, I didn't see other people's replies to my message, probably
    > due to not being subscribed. I figured at the point where I just
    > bought another SPARC, I might as well...
    >
    > I've done some digging into it. The short version is that commit
    > 7d5ec3d3 introduced the readl() at
    > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/pci/msi.c?h=v5.14#n748
    > and, for reasons that are currently opaque to me, not being familiar
    > with low-level SPARC details, that dies in a fire.
    >
    > Just short-circuiting it to 1 instead of calling readl() seems to
    > function fine for my system, but isn't a great general solution...

    I would suggest bisecting the kernel to the point where the breakage
    was introduced and reporting the bug to the author of the original
    change as well as the SPARC Linux kernel mailing list [1].
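
    For anyone unfamiliar with the term, bisecting is a binary search over
    an ordered range of commits between a known-good and a known-bad
    kernel. In practice git bisect drives the search; the Python sketch
    below only illustrates the idea. The commit names and the is_bad
    predicate (standing in for "build, boot, and see whether it panics")
    are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the first commit for which is_bad() is True.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # breakage is at mid or earlier
        else:
            lo = mid  # breakage is after mid
    return commits[hi]


# Hypothetical history in which the breakage first appears at "c5".
commits = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad(commits, lambda c: commits.index(c) >= 5)
print(culprit)
```

    Each test halves the remaining range, so even a span of several
    thousand commits needs only a dozen or so boots.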

    Adrian

    [1] http://vger.kernel.org/vger-lists.html#sparclinux

  • From Rich@21:1/5 to glaubitz@physik.fu-berlin.de on Sat Nov 20 11:10:01 2021
    I did - that's the commit that I mentioned there, 7d5ec3d3.
    https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit?id=7d5ec3d3

    I reported it to linux-pci@ since it was there; I can easily go CC the
    original author and linux-sparc@, but wasn't sure of the etiquette
    involved, and didn't want to just blast multiple lists.

    - Rich

  • From John Paul Adrian Glaubitz@21:1/5 to Rich on Sat Nov 20 11:20:01 2021
    Hi Rich!

    On 11/20/21 11:08, Rich wrote:
    > I did - that's the commit that I mentioned there, 7d5ec3d3.
    > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit?id=7d5ec3d3

    Perfect, thank you!

    > I reported it to linux-pci@ since it was there; I can easily go CC the
    > original author and linux-sparc@, but wasn't sure of the etiquette
    > involved, and didn't want to just blast multiple lists.

    I fully agree.

    Adrian
