Hi Mark!
On 2/24/21 12:14 PM, Mark Cave-Ayland wrote:
Do people still run newer kernels on older hardware? If there is interest,
I may be able to get some more diagnostic information. In particular I'd be
curious to know if Oracle do any routine testing of newer kernels on machines
such as the U450 and whether anyone there can reproduce the problem.
I think this must be an issue specific to this machine or this model as I haven't
seen such issues myself when testing on older machines.
There is a stability issue on newer kernels on older hardware that is currently
being debugged though [1].
Adrian
[1] https://marc.info/?l=linux-sparc&m=161399891728083&w=2
[...]
I then asked them to work backwards through a collection of historical debian-ports ISOs that I own until we found one that would boot. The
results were as follows:
debian-10.0.0-sparc64-NETINST-1.iso (kernel 5.9.0-1-sparc64, grub) - FAILS
debian-9.0-sparc64-NETINST-1.iso (kernel 4.14.0-3-sparc64, SILO) - FAILS
debian-7.7.0-sparc-netinst.iso (kernel 3.2.0-4-sparc64, SILO) - FAILS
debian-6.0.4-sparc-netinst.iso (kernel 2.6.32-5-sparc64, SILO) - WORKS
Having eliminated the change of bootloader from SILO to grub as the
problem, it really seems as if something in the kernel broke booting on
a U450 between versions 2.6.32 and 3.2.0. I should add that these ISOs
all boot fine under qemu-system-sparc64 which is a U5 machine, so the
newer kernels are not completely broken.
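
To make the regression window explicit, here is a minimal Python sketch that picks the last working and first failing kernel out of the results listed above. The `results` list and the `regression_window` helper are illustrative names, not part of any tool used in this thread, and the sketch assumes the ISOs are listed newest first.

```python
# Boot results on the U450, transcribed from the list above (newest first).
# Each entry: (ISO name, kernel version, booted successfully?)
results = [
    ("debian-10.0.0-sparc64-NETINST-1.iso", "5.9.0-1-sparc64", False),
    ("debian-9.0-sparc64-NETINST-1.iso", "4.14.0-3-sparc64", False),
    ("debian-7.7.0-sparc-netinst.iso", "3.2.0-4-sparc64", False),
    ("debian-6.0.4-sparc-netinst.iso", "2.6.32-5-sparc64", True),
]


def regression_window(results):
    """Return (last_good_kernel, first_bad_kernel) from newest-first results."""
    last_good = None
    # Walk from the oldest image to the newest one.
    for _iso, kernel, boots in reversed(results):
        if boots:
            last_good = kernel
        elif last_good is not None:
            # First failure after a known-good kernel marks the window.
            return last_good, kernel
    return last_good, None


good, bad = regression_window(results)
print(f"last good: {good}, first bad: {bad}")
```

With only four data points the window is still wide (2.6.32 to 3.2.0), so narrowing it further would need intermediate kernels or images that are not listed here.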
Do people still run newer kernels on older hardware? If there is
interest, I may be able to get some more diagnostic information. In particular I'd be curious to know if Oracle do any routine testing of
newer kernels on machines such as the U450 and whether anyone there can reproduce the problem.
Hi Mark,
On 24.02.21 12:14, Mark Cave-Ayland wrote:
[...]
I then asked them to work backwards through a collection of historical
debian-ports ISOs that I own until we found one that would boot. The
results were as follows:
debian-10.0.0-sparc64-NETINST-1.iso (kernel 5.9.0-1-sparc64, grub) - FAILS
debian-9.0-sparc64-NETINST-1.iso (kernel 4.14.0-3-sparc64, SILO) - FAILS
debian-7.7.0-sparc-netinst.iso (kernel 3.2.0-4-sparc64, SILO) - FAILS
debian-6.0.4-sparc-netinst.iso (kernel 2.6.32-5-sparc64, SILO) - WORKS
Having eliminated the change of bootloader from SILO to grub as the
problem, it really seems as if something in the kernel broke booting on
a U450 between versions 2.6.32 and 3.2.0. I should add that these ISOs
all boot fine under qemu-system-sparc64 which is a U5 machine, so the
newer kernels are not completely broken.
I have checked my logs and (probably) the last time I used my Ultra Enterprise 450 - 2018-04-21 - it was running a kernel v4.15.4:
```
root@e450:~# uname -a
Linux e450 4.15.0-1-sparc64-smp #1 SMP Debian 4.15.4-1 (2018-02-18) sparc64 GNU/Linux
```
...successfully (incl. `openssl`, `7za` and STREAM benchmarks for half
an hour or so). And according to my netboot configuration it was booted
with GRUB - from the "[...]2.02+dfsg1-3" package. Looks like I didn't
test with any later GRUB version/package.
From my experience, US II (and derived versions like IIi and IIe)
was still working well at that time, while US III and IIIi sometimes
had problems. I'm not sure whether that is due to the processor or to
other components on the respective system boards.
There is a stability issue on newer kernels on older hardware that is currently
being debugged though [1].
Didn't know of that thread. I wonder if this could be the reason for the crashes on my v480 and v490, though they happened already during kernel
boot.
On 24/02/2021 12:29, Frank Scheiner wrote:
On 24.02.21 12:14, Mark Cave-Ayland wrote:
Thanks for the information! Do you have a display on your U450 at all?
The U450 we were trying to rescue was headless (i.e. connected via serial
only) so the only differences I can see might either be the display or
the fact that the boot was occurring from the CDROM rather than a local
disk installation.
Next time you have the U450 fired up, I'd be interested to find out if
it is possible to boot directly from the latest debian ports CDROM for comparison.
Hi Mark,
On 24.02.21 14:01, Mark Cave-Ayland wrote:
On 24/02/2021 12:29, Frank Scheiner wrote:
On 24.02.21 12:14, Mark Cave-Ayland wrote:
Next time you have the U450 fired up, I'd be interested to find out if
it is possible to boot directly from the latest debian ports CDROM for
comparison.
So I fetched her from (cold) storage this morning and let her warm up in
the morning sun. When ready I booted with the latest image I could find yesterday evening ([1]) and...
[1]: https://cdimage.debian.org/cdimage/ports/snapshots/2021-02-02/debian-10.0.0-sparc64-NETINST-1.iso
...it worked through until the first screen of the rescue mode is shown.
No crashes, no nothing.
Here is the start of the syslog - I didn't have any storage at hand so
copied it from screen directly:
```
Feb 28 10:21:24 syslogd started: BusyBox v1.30.1
Feb 28 10:21:24 kernel: klogd started: BusyBox v1.30.1 (Debian 1:1.30.1-4)
Feb 28 10:21:24 kernel: [ 0.000145] PROMLIB: Sun IEEE Boot Prom 'OBP 3.30.0 2003/11/11 10:41'
Feb 28 10:21:24 kernel: [ 0.000232] PROMLIB: Root node compatible: sun4u
Feb 28 10:21:24 kernel: [ 0.000527] Linux version 5.10.0-3-sparc64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1) #1 Debian 5.10.12-1 (2021-01-30)
Feb 28 10:21:24 kernel: [ 0.000721] Unknown boot switch (--)
Feb 28 10:21:24 kernel: [ 0.000730] Unknown boot switch (--)
Feb 28 10:21:24 kernel: [ 0.000905] printk: bootconsole [earlyprom0] enabled
Feb 28 10:21:24 kernel: [ 0.000914] ARCH: SUN4U
Feb 28 10:21:24 kernel: [ 0.001033] Ethernet address: 08:00:20:a7:5e:0a
Feb 28 10:21:24 kernel: [ 0.001073] MM: PAGE_OFFSET is 0xfffff80000000000 (max_phys_bits == 40)
Feb 28 10:21:24 kernel: [ 0.001084] MM: VMALLOC [0x0000000100000000 0x0000060000000000]
Feb 28 10:21:24 kernel: [ 0.001095] MM: VMEMMAP [0x0000060000000000 0x00000c0000000000]
Feb 28 10:21:24 kernel: [ 0.005132] Kernel: Using 4 locked TLB entries for main kernel image.
Feb 28 10:21:24 kernel: [ 0.005189] Remapping the kernel...
Feb 28 10:21:24 kernel: [ 0.052850] done.
Feb 28 10:21:24 kernel: [ 1.098314] OF stdout device is: /pci@1f,4000/ebus@1/se@14,400000:a
Feb 28 10:21:24 kernel: [ 1.098327] PROM: Built device tree with 139414 bytes of memory.
Feb 28 10:21:24 kernel: [ 1.098734] Top of RAM: 0xffea2000, Total RAM: 0xffe96000
Feb 28 10:21:24 kernel: [ 1.098744] Memory hole size: 0MB
Feb 28 10:21:24 kernel: [ 1.124511] Allocated 16384 bytes for kernel page tables.
Feb 28 10:21:24 kernel: [ 1.124575] Zone ranges:
Feb 28 10:21:24 kernel: [ 1.124586] Normal [mem 0x0000000000000000-0x00000000ffea1fff]
Feb 28 10:21:24 kernel: [ 1.124608] Movable zone start for each node
Feb 28 10:21:24 kernel: [ 1.124616] Early memory node ranges
Feb 28 10:21:24 kernel: [ 1.124628] node 0: [mem 0x0000000000000000-0x00000000ffdfdfff]
Feb 28 10:21:24 kernel: [ 1.124644] node 0: [mem 0x00000000ffe00000-0x00000000ffe81fff]
Feb 28 10:21:24 kernel: [ 1.124656] node 0: [mem 0x00000000ffe8c000-0x00000000ffea1fff]
Feb 28 10:21:24 kernel: [ 1.124746] Zeroed struct page in unavailable ranges: 181 pages
Feb 28 10:21:24 kernel: [ 1.124760] Initmem setup node 0 [mem 0x0000000000000000-0x00000000ffea1fff]
Feb 28 10:21:24 kernel: [ 1.124777] On node 0 totalpages: 524107
Feb 28 10:21:24 kernel: [ 1.124790] Normal zone: 4607 pages used for memmap
Feb 28 10:21:24 kernel: [ 1.124801] Normal zone: 0 pages reserved
Feb 28 10:21:24 kernel: [ 1.124814] Normal zone: 524107 pages, LIFO batch:31
Feb 28 10:21:24 kernel: [ 1.289565] Booting Linux...
Feb 28 10:21:24 kernel: [ 1.289591] CPU CAPS: [flush,stbar,swap,muldiv,v9,mul32,div32,v8plus]
Feb 28 10:21:24 kernel: [ 1.289674] CPU CAPS: [vis]
Feb 28 10:21:24 kernel: [ 1.302223] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
Feb 28 10:21:24 kernel: [ 1.302239] pcpu-alloc: [0] 0
Feb 28 10:21:24 kernel: [ 1.308282] Built 1 zonelists, mobility grouping on. Total pages: 519500
Feb 28 10:21:24 kernel: [ 1.308299] Kernel command line: BOOT_IMAGE=/install/vmlinux rescue/enable=true --- quiet
Feb 28 10:21:24 kernel: [ 1.333950] Dentry cache hash table entries: 524288 (order: 9, 4194304 bytes, linear)
Feb 28 10:21:24 kernel: [ 1.343863] Inode-cache hash table entries: 262144 (order: 8, 2097152 bytes, linear)
Feb 28 10:21:24 kernel: [ 1.343878] Sorting __ex_table...
Feb 28 10:21:24 kernel: [ 1.346444] mem auto-init: stack:off, heap alloc:on, heap free:off
Feb 28 10:21:24 kernel: [ 1.531560] Memory: 4114688K/4192856K available (8081K kernel code, 1417K rwdata, 2152K rodata, 496K init, 405K bss, 78168K reserved, 0K cma-reserved)
[...]
```
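
As a quick sanity check on the memory figures in that log, here is a small Python sketch that sums the three "Early memory node ranges" and compares the result with the "Total RAM", "totalpages" and "Memory:" values. The names `node_ranges`, `PAGE_SIZE` and `total_bytes` are made up for illustration, and the 8 KiB base page size is an assumption about how this sparc64 kernel is configured.

```python
PAGE_SIZE = 8192  # assumed base page size on sparc64 (8 KiB)

# "Early memory node ranges" from the syslog above; end addresses are inclusive.
node_ranges = [
    (0x0000000000000000, 0x00000000FFDFDFFF),
    (0x00000000FFE00000, 0x00000000FFE81FFF),
    (0x00000000FFE8C000, 0x00000000FFEA1FFF),
]


def total_bytes(ranges):
    """Sum the sizes of inclusive [start, end] address ranges."""
    return sum(end - start + 1 for start, end in ranges)


total = total_bytes(node_ranges)
pages = total // PAGE_SIZE

print(hex(total))      # compare with "Total RAM: 0xffe96000"
print(pages)           # compare with "On node 0 totalpages: 524107"
print(total // 1024)   # compare with "Memory: .../4192856K"
```

Under that page-size assumption the three figures agree with the log, which suggests the memory map itself was read consistently on this machine.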
For reference, my machine has four US II running at 400 MHz and 16 x 256
MiB memory modules installed:
```
~ # cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
pmu : ultra12
prom : OBP 3.30.0 2003/11/11 10:41
type : sun4u
ncpus probed : 4
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 0000000017d78400
cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type : Spitfire
MMU PGSZs : 8K,64K,512K,4MB
```
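
For anyone comparing their own machine against these values, here is a short Python sketch that decodes the `Cpu0ClkTck` field and compares the installed module capacity with the memory total the kernel reported. It assumes `Cpu0ClkTck` is the CPU clock in Hz written as hexadecimal; the variable names are illustrative only.

```python
# Cpu0ClkTck from /proc/cpuinfo, assumed to be the clock rate in Hz (hex).
cpu0_clk_tck = int("0000000017d78400", 16)
mhz = cpu0_clk_tck / 1_000_000

# 16 x 256 MiB modules, expressed in KiB.
installed_kib = 16 * 256 * 1024
# Total from the "Memory: 4114688K/4192856K available" syslog line.
reported_kib = 4192856

print(mhz)                           # clock in MHz
print(installed_kib - reported_kib)  # KiB not handed to the kernel
```

The decoded clock matches the 400 MHz stated above, and the kernel sees all but 1448 KiB of the installed 4 GiB. That small remainder is presumably held back by the firmware, though nothing in the log confirms that.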
...and there also was a graphics card installed, but I used the machine
via serial console.
I can't say where our two machines differ (maybe the OBP version?), but it
could be interesting to see whether your client's machine can boot
successfully from a Solaris 10 CDROM. Maybe even before trying that, I
would run the whole hardware with the diag key position enabled and log
and follow that output via the serial console. Maybe some memory modules
need re-seating or are defective, or something is wrong with the
processors, though I never saw anything like the latter in any of the
various US II powered machines I own. In addition, I remember that not
all processor modules were recommended for, or maybe even compatible with,
all machines they could be fitted in. So it could be an idea to also check
that (i.e. the `501-[...]` number and what's recommended in a Sun System
Handbook).