On Mar 12, 2021, at 5:56 AM, Dennis Clarke <dclarke@blastwave.org> wrote:
I have seen this for a few months now. The old old netra machine will
run just fine endlessly but if I attempt to perform a package update
then I am always assured to see :
ceres# apt-get update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
Fetched 30.3 MB in 1min 24s (361 kB/s)
Reading package lists... Done
ceres#
Then try "upgrade" and the machine drops off the network :
Setting up systemd (247.3-1) ...
Timeout, server 172.16.35.61 not responding.
On the serial console we see :
ceres# [2968669.114937] systemd[1]: systemd 247.3-1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD -SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
[2968669.411163] systemd[1]: Detected architecture sparc64.
[2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
[2968696.794780] Modules linked in: drm(E) drm_panel_orientation_quirks(E) i2c_core(E) sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) libata(E) sym53c8xx(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
[2968697.265208] CPU: 0 PID: 1 Comm: systemd Tainted: G E 5.10.0-1-sparc64 #1 Debian 5.10.5-1
[2968697.391074] TSTATE: 0000000011001604 TPC: 000000000094c4f0 TNPC: 000000000094c4f4 Y: 00000000 Tainted: G E
[2968697.541033] TPC: <misc_open+0x50/0x180>
[2968697.593712] g0: fffff800065a1c80 g1: 0000000000000098 g2: 0000000000000000 g3: 0000000000000002
[2968697.710488] g4: fffff80004197020 g5: 0000000000e93214 g6: fffff80004198000 g7: 0000000000500008
[2968697.827256] o0: 0000000000f24960 o1: fffff800049ab110 o2: 0000000000040000 o3: 0000000000000000
[2968697.944022] o4: 0000000000000000 o5: 0000000000000000 sp: fffff8000419af81 ret_pc: 000000000094c4c0
[2968698.065369] RPC: <misc_open+0x20/0x180>
[2968698.118074] l0: 0000000000f24800 l1: fffff800041ce021 l2: 00000003e775fef2 l3: 00000003e775fef2
[2968698.234848] l4: 0000000000020000 l5: fffff8000419b8f0 l6: 0000000000e12000 l7: 0000000000000001
[2968698.351615] i0: fffff8000b791048 i1: fffff800049ab100 i2: 0000000000f24800 i3: 0000000000f24978
[2968698.468381] i4: 00000000000000eb i5: 0000000010040818 i6: fffff8000419b031 i7: 0000000000665838
[2968698.585168] I7: <chrdev_open+0x98/0x1e0>
[2968698.638996] Call Trace:
[2968698.673323] [<0000000000665838>] chrdev_open+0x98/0x1e0
[2968698.744355] [<000000000065ae30>] do_dentry_open+0x170/0x420
[2968698.819928] [<000000000065ca68>] vfs_open+0x28/0x40
[2968698.886379] [<0000000000671348>] path_openat+0x988/0x1100
[2968698.959682] [<0000000000673dd0>] do_filp_open+0x50/0x100
[2968699.031837] [<000000000065cd30>] do_sys_openat2+0x70/0x180
[2968699.106284] [<000000000065d268>] sys_openat+0x48/0xc0
[2968699.175027] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
~
Type 'go' to resume
ok ~
[EOT]
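For anyone comparing this trace against captures from other kernels, the key facts can be pulled out mechanically. What follows is a minimal Python sketch and not part of the original report: the `summarize` function and its regular expressions are assumptions written against the log format shown above, and the sample text is a subset of the captured lines.

```python
import re

# A subset of the serial console capture above, one kernel message per line.
LOG = """\
[2968696.703129] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [systemd:1]
[2968697.541033] TPC: <misc_open+0x50/0x180>
[2968698.673323] [<0000000000665838>] chrdev_open+0x98/0x1e0
[2968698.744355] [<000000000065ae30>] do_dentry_open+0x170/0x420
[2968698.819928] [<000000000065ca68>] vfs_open+0x28/0x40
[2968699.175027] [<0000000000406174>] linux_sparc_syscall+0x34/0x44
"""

# Hypothetical patterns, written only against the format seen in this thread.
LOCKUP = re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]")
TPC = re.compile(r"TPC: <(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+>")
FRAME = re.compile(r"\[<[0-9a-f]{16}>\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize(log):
    """Return the lockup header, the function at TPC, and the call chain."""
    summary = {"lockup": None, "stuck_in": None, "frames": []}
    for line in log.splitlines():
        m = LOCKUP.search(line)
        if m:
            summary["lockup"] = {
                "cpu": int(m.group(1)),
                "seconds": int(m.group(2)),
                "comm": m.group(3),
                "pid": int(m.group(4)),
            }
            continue
        m = TPC.search(line)
        if m:
            summary["stuck_in"] = m.group(1)
            continue
        m = FRAME.search(line)
        if m:
            summary["frames"].append(m.group(1))
    return summary


if __name__ == "__main__":
    print(summarize(LOG))
```

On this capture the summary points at PID 1 (systemd) stuck in `misc_open`, reached through `chrdev_open`, which is the path taken when a misc character device is opened.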
This is pretty consistent behavior. If someone has any ideas that would
be great. I realize that the old old Netra X1 or Netra T1 is well past
its prime but it does run very stable. I would love to fire up a big
Oracle M4000 unit to try but I have not heard from anyone anywhere that
knows if that can work at all. So for now these old netra units are all
that I can test with.
--
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional
On Mar 13, 2021, at 9:29 AM, Mike Tremaine <mgt@stellarcore.net> wrote:
On Mar 12, 2021, at 5:56 AM, Dennis Clarke <dclarke@blastwave.org> wrote:
I have seen this for a few months now. The old old netra machine will
run just fine endlessly but if I attempt to perform a package update
then I am always assured to see :
What kernel are you on? I do not have a Netra handy (but I have one in storage, like everyone ;p ). I have an Ultra 5 here, so an UltraSparc IIi CPU. It does not exhibit this behavior. Any chance the memory modules need to be reseated?
ceres# apt-get update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.6 MB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [8,682 kB]
Fetched 30.3 MB in 1min 24s (361 kB/s)
Reading package lists... Done
ceres#
Then try "upgrade" and the machine drops off the network :
I have unstable in the mix, but as a point of reference:
mgt@xray:~$ uname -a
Linux xray 5.10.0-3-sparc64 #1 Debian 5.10.13-1 (2021-02-06) sparc64 GNU/Linux
mgt@xray:~$ cat /etc/debian_version
bullseye/sid
mgt@xray:~$ cat /proc/cpuinfo
cpu : TI UltraSparc IIi (Sabre)
fpu : UltraSparc IIi integrated FPU
pmu : ultra12
prom : OBP 3.31.0 2001/07/25 20:36
type : sun4u
ncpus probed : 1
ncpus active : 1
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 0000000013d92d40
cpucaps : flush,stbar,swap,muldiv,v9,mul32,div32,v8plus,vis
MMU Type : Spitfire
MMU PGSZs : 8K,64K,512K,4MB
root@xray:/home/users/mgt# apt update
Get:1 http://deb.debian.org/debian-ports sid InRelease [55.3 kB]
Get:2 http://deb.debian.org/debian-ports unreleased InRelease [56.6 kB]
Get:3 http://deb.debian.org/debian-ports sid/main all Packages [9,069 kB]
Get:4 http://deb.debian.org/debian-ports sid/main sparc64 Packages [21.5 MB]
Fetched 30.7 MB in 1min 55s (266 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
111 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@xray:/home/users/mgt# apt list --upgradeable
Listing… Done
.
.
apt upgrade was then run and 111 packages upgraded without issue….
The Netras have a few different devices; I wonder if there is a bug in one of those drivers?
-Mike
On 3/13/21 5:29 PM, Mike Tremaine wrote:
[...]On Mar 12, 2021, at 5:56 AM,
Dennis Clarke <dclarke@blastwave.org> wrote:
I did send a BRK to the serial port and that drops us into the firmware
"ok" prompt. There is a failed fan but in fact the fan is entirely not
there. At all. I removed it because it had failed five or six years ago
and getting another one is just annoying. Also it is not really needed.
We can see that there is 1G of ECC memory and the memory passes all the
basic tests.
Now I set up a few of the firmware variables and reset the unit :
ok printenv
Variable Name Value Default Value
[...]
local-mac-address? false false
[...]
ceres# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
3: enp1s3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
ceres#
However there must be a bug somewhere because the physical MAC address
is the same on both interfaces.
On Mar 13, 2021, at 12:58 PM, Frank Scheiner <frank.scheiner@web.de> wrote:
Hi Dennis,
On 13.03.21 20:21, Dennis Clarke wrote:
On 3/13/21 5:29 PM, Mike Tremaine wrote:
[...]On Mar 12, 2021, at 5:56 AM,
Dennis Clarke <dclarke@blastwave.org> wrote:
I did send a BRK to the serial port and that drops us into the firmware
"ok" prompt. There is a failed fan but in fact the fan is entirely not
there. At all. I removed it because it had failed five or six years ago
and getting another one is just annoying. Also it is not really needed.
Is the heatsink on the board cooled by a chassis then?
We can see that there is 1G of ECC memory and the memory passes all the
basic tests.
Now I set up a few of the firmware variables and reset the unit :
ok printenv
Variable Name Value Default Value
[...]
local-mac-address? false false
[...]
ceres# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode
DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
3: enp1s3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
ceres#
However there must be a bug somewhere because the physical MAC address
is the same on both interfaces.
This is due to `local-mac-address?` set to `false` in OBP. See e.g. [1]
for details.
[1]: https://docs.oracle.com/cd/E36784_01/html/E37475/eyprp.html
Cheers,
Frank
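As Frank notes, identical MACs are expected when `local-mac-address?` is `false`, because OBP then hands the system-wide address to every interface. A minimal Python sketch for spotting this in `ip link show` output follows. The `shared_macs` helper is hypothetical and not from the thread, and the sample text is the output quoted above.

```python
import re
from collections import defaultdict

# `ip link show` output as quoted in the thread.
IP_LINK = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
3: enp1s3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 08:00:20:c2:46:48 brd ff:ff:ff:ff:ff:ff
"""

IFACE = re.compile(r"^\d+: ([^:@]+)[:@]")                # "2: enp1s1f1: <...>"
ETHER = re.compile(r"^\s*link/ether ([0-9a-f:]{17})")    # Ethernet links only


def shared_macs(text):
    """Map each MAC address used by more than one interface to those interfaces."""
    by_mac = defaultdict(list)
    current = None
    for line in text.splitlines():
        m = IFACE.match(line)
        if m:
            current = m.group(1)
            continue
        m = ETHER.match(line)
        if m and current:
            by_mac[m.group(1)].append(current)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


if __name__ == "__main__":
    for mac, names in shared_macs(IP_LINK).items():
        print(mac, "is shared by", ", ".join(names))
```

A shared address only matters if both ports end up on the same Ethernet segment. Here the second interface is down, so it is unlikely to be related to the lockup.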
Let’s assume it’s not hardware: Dennis has posted the tests and states that the machine ran Solaris 10 fine.
On 3/14/21 6:48 PM, Frank Scheiner wrote:
So, if, for example, you want to verify that the memory is okay, you should run
a memtest program.
...the built-in (memory) diagnostics of Sun machines are pretty
thorough. This is not a PC. :-)
I doubt that the hardware runs a thorough memory test by default that
can be compared to a full memtest86 test run.
Either way, if the kernel breaks for someone, they will have to bisect the
issue. I have no means of bisecting a problem if I cannot reproduce
it in the first place.
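Bisecting means repeatedly testing the midpoint between a known-good and a known-bad build until the first bad one is isolated. The search itself can be sketched in a few lines of Python. Everything named here (`first_bad`, `builds`, the `is_bad` predicate) is hypothetical. In practice `git bisect` on the kernel tree drives the same search, and each test would be a boot followed by the systemd upgrade on the affected machine.

```python
def first_bad(candidates, is_bad):
    """Return the first bad entry in an ordered list.

    Assumes candidates[0] is known good, candidates[-1] is known bad,
    and every entry after the first bad one is also bad.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one good and one bad candidate")
    lo, hi = 0, len(candidates) - 1   # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid                  # regression is at mid or earlier
        else:
            lo = mid                  # regression is after mid
    return candidates[hi]


if __name__ == "__main__":
    # Hypothetical build labels; the regression is placed at build-6 for the demo.
    builds = ["build-%d" % n for n in range(10)]
    print(first_bad(builds, lambda b: int(b.split("-")[1]) >= 6))
```

The number of tests grows with the logarithm of the range, which is what makes bisecting practical even when each test is a slow boot on old hardware. It does require a known-good starting point on the machine that shows the problem.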