Forum: >>> Magnum BBS <<<

Ultra5 successful install - PGX64 issues

From Frank Scheiner@21:1/5 to Dennis Clarke on Sat Apr 14 20:20:02 2018

On 04/14/2018 06:11 PM, Dennis Clarke wrote:

Really? Well then .. let me see what I have that is ancient in the
warehouse.

How about PA-RISC? I happen to have some superdomes kicking about but
they truly require a ton of power to operate.

I assume hppa people in Debian (debian-hppa@l.d.o in CC) would
appreciate testing on such gear. Not sure if those superdomes will work
out of the box though. I know from my own testing that the following
"smaller" machines work with Debian GNU/Linux Sid for hppa:

* 712/80
* c3700, c3750, J5600, rp2470
* c8000, rp3440

Apart from the rp3440 - and maybe also the 712/80 which showed some
issue with its built-in NIC after netbooting the Linux kernel and the
OS - all machines also work diskless, which could speed up testing for
you and avoid a manual Debian installation - although this could still
be interesting.

Cheers,
Frank

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Helge Deller@21:1/5 to Frank Scheiner on Sun Apr 15 10:40:01 2018

On 14.04.2018 20:13, Frank Scheiner wrote:

On 04/14/2018 06:11 PM, Dennis Clarke wrote:

Really? Well then .. let me see what I have that is ancient in the
warehouse.

How about PA-RISC? I happen to have some superdomes kicking about but they truly require a ton of power to operate.

I assume hppa people in Debian (debian-hppa@l.d.o in CC) would appreciate testing on such gear.
Not sure if those superdomes will work out of the box though.

It would be really interesting to see whether Linux can boot on such machines.
If it can't, I'm pretty sure that I can finish the firmware support in Linux to be able to boot in a cell. For that I'd need access to such a machine via ssh (to an x86 machine for cross-compiling/tftpboot provisioning) and a serial port to the superdome.

I know from my own testing that the following "smaller" machines work with Debian GNU/Linux Sid for hppa:

* 712/80
* c3700, c3750, J5600, rp2470
* c8000, rp3440

Apart from the rp3440 - and maybe also the 712/80 which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

What kind of problems?

- all machines also work diskless, which could speed up testing for you and avoid a manual Debian installation - although this could still be interesting.

Helge


From Frank Scheiner@21:1/5 to Helge Deller on Thu Apr 19 21:40:02 2018

Hi,

and sorry for the delay, I was a little short of spare time this week. :-/

On 04/15/2018 10:34 AM, Helge Deller wrote:

On 14.04.2018 20:13, Frank Scheiner wrote:

I know from my own testing that the following "smaller" machines work with Debian GNU/Linux Sid for hppa:

* 712/80
* c3700, c3750, J5600, rp2470
* c8000, rp3440

Apart from the rp3440 - and maybe also the 712/80 which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

What kind of problems?

Unfortunately I don't seem to have made any notes on the issue with the
712/80, so earlier this week I retried with the configuration that I
assume caused it:

This configuration was using a Debian Linux kernel 4.9.25-1
(4.9.0-3-parisc from 2017-05-02). When netbooting it, the machine seems
to lose contact with the NFS server shortly after login:

```
[...]
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Started Getty on tty1.
[ OK ] Reached target Login Prompts.

Debian GNU/Linux buster/sid hp-712 ttyS0

hp-712 login: root
Password:
Last login: Thu Sep 18 11:30:50 CET 1902 from 172.16.1.1 on pts/0
Linux hp-712 4.9.0-3-parisc #1 Debian 4.9.25-1 (2017-05-02) parisc

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

[ 232.973913] nfs: server 172.16.0.2 not responding, still trying
[ 233.094265] nfs: server 172.16.0.2 not responding, still trying
[ 233.205127] nfs: server 172.16.0.2 not responding, still trying
[ 233.568429] nfs: server 172.16.0.2 not responding, still trying
[ 233.692383] nfs: server 172.16.0.2 not responding, still trying
[ 233.808818] nfs: server 172.16.0.2 not responding, still trying
[...]
[ 235.179253] nfs: server 172.16.0.2 OK
[ 235.251896] nfs: server 172.16.0.2 not responding, still trying
[...]
```

Although it seems to be able to reconnect from time to time, the machine
is not accessible.

Afterwards I found some older notes about this machine which mention no
issues during diskless operation with the very same configuration
(kernel and possibly also userland), which made me wonder whether there
might be an issue between the machine's built-in NIC and my 1000 Mbit
network switch. And indeed, when I connected another 100 Mbit network
switch between the 712/80 and the 1000 Mbit network switch, the issue
seemed to be gone and the machine stayed accessible.

But later this week I retried the 712/80 with the current Linux kernel
(4.15.x) and Debian userland and the issue hit me again, although much
later and despite the 100 Mbit network switch in between. Looking at it
I could see that the collision indicator was active on the switch for
the port used by the 712/80. I then configured a single port of the
1000 Mbit network switch to 10 Mbit full duplex and attached the 712/80
to it. And then the issue again seemed to be gone. But trying to install
a package or update the package cache quickly triggered it again. Well,
that's not much of an issue, as I can do the package management for the
712/80 with another machine (e.g. c8000).

Also interesting are the kernel messages for 4.15.11. Please note the
time difference between "random: crng init done" and "Key type
asymmetric registered":

```
[ 0.000000] Linux version 4.15.0-2-parisc (debian-kernel@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-12)) #1 Debian 4.15.11-1 (2018-03-20)
[ 0.000000] unwind_init: start = 0x1086e8b4, end = 0x108c5644, entries = 22233
[ 0.000000] FP[0] enabled: Rev 1 Model 13
[ 0.000000] The 32-bit Kernel has started...
[...]
[ 9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
[ 10.168866] zbud: loaded
[ 56.112387] random: crng init done
[ 433.392379] Key type asymmetric registered
[ 433.445502] Asymmetric key parser 'x509' registered
[...]
[ 544.565451] systemd[1]: Detected architecture parisc.

Welcome to Debian GNU/Linux buster/sid!
[...]
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Started Getty on tty1.
[ OK ] Reached target Login Prompts.

Debian GNU/Linux buster/sid hp-712 ttyS0

hp-712 login:

```

...On the first try I assumed the machine or the kernel had hung, but no,
it was still working the whole time.
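A stall like this is easier to spot when the gaps between consecutive timestamps are computed rather than read off by eye. The following is a minimal sketch, not part of the original report: the function name `largest_gaps` and the `SAMPLE` excerpt (copied from the log above) are illustrative. Lines elided with `[...]` carry no timestamp and are skipped, so a gap may span omitted messages.

```python
import re

# Matches the "[  433.392379] message" prefix that the kernel puts on each line.
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def largest_gaps(log_text, top=3):
    """Return the `top` largest (gap_seconds, earlier_msg, later_msg) tuples."""
    events = []
    for line in log_text.splitlines():
        m = STAMP.match(line.strip())
        if m:  # lines without a timestamp (e.g. "[...]") are ignored
            events.append((float(m.group(1)), m.group(2)))
    gaps = [
        (round(later[0] - earlier[0], 6), earlier[1], later[1])
        for earlier, later in zip(events, events[1:])
    ]
    return sorted(gaps, reverse=True)[:top]

# Excerpt of the 4.15.11 boot log quoted above.
SAMPLE = """\
[ 9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
[ 10.168866] zbud: loaded
[ 56.112387] random: crng init done
[ 433.392379] Key type asymmetric registered
[ 433.445502] Asymmetric key parser 'x509' registered
"""

for gap, before, after in largest_gaps(SAMPLE, top=2):
    print(f"{gap:8.1f} s between {before!r} and {after!r}")
```

On this excerpt the largest gap, a bit over six minutes, falls between "random: crng init done" and "Key type asymmetric registered".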

Today I tested it again (with 4.15.11) and the issue this time hit me
already during login, after I entered the username.

So I'm actually back where I started. :-(

I suspect that maybe the built-in 82596 NIC cannot cope with the amount
of traffic that happens during diskless operation - although I then
wonder why it doesn't have a problem during the TFTP operation to load
the lifimage. The next thing I'll examine will be the parameters used for
the NFS mount (especially rsize and wsize) - if I can ever log in to
it again :-). And maybe a fan for the passive heat sink of the CPU, which
gets quite hot during operation.
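For checking which rsize and wsize values are actually in effect, the mount options can be read from `/proc/mounts` on the running system. This is a minimal sketch under stated assumptions: the helper name `nfs_transfer_sizes` is made up, and the `SAMPLE` line is a hypothetical example of the `/proc/mounts` format, not output from the 712/80.

```python
def nfs_transfer_sizes(mounts_text):
    """Map each NFS mount point to its (rsize, wsize) options, None if absent."""
    sizes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        opts = dict(
            opt.split("=", 1) if "=" in opt else (opt, None)
            for opt in fields[3].split(",")
        )
        rsize, wsize = opts.get("rsize"), opts.get("wsize")
        sizes[fields[1]] = (
            int(rsize) if rsize else None,
            int(wsize) if wsize else None,
        )
    return sizes

# Hypothetical example input; the export path and option values are assumptions.
SAMPLE = (
    "172.16.0.2:/srv/nfsroot / nfs rw,relatime,vers=3,rsize=4096,wsize=4096,"
    "hard,nolock,proto=tcp,addr=172.16.0.2 0 0\n"
    "proc /proc proc rw,nosuid 0 0\n"
)

print(nfs_transfer_sizes(SAMPLE))
```

On the live machine the same function could be fed the contents of `/proc/mounts`, e.g. `nfs_transfer_sizes(open("/proc/mounts").read())`.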

Any suggestions on where else to look?

****

For the rp3440 I (also) have to retract my earlier statement as it looks
like my second rp3440 actually **works** diskless. I have to retest with
my first rp3440 (currently in storage) as it seems it behaves
differently in this regard - or maybe I misconfigured something there in
the past. I have to recheck.

But for my second rp3440 I still had to blacklist the `radeon` module to
achieve this, as otherwise the system (console) seems to crash shortly
before the login prompt would have appeared or just after. This is the
kernel command line I used, as configured with palo 1.99 and Linux 4.14.x:

```
Current command line:
0/vmlinux HOME=/ root=/dev/nfs ip=:::::enp32s2:dhcp modprobe.blacklist=radeon initrd=0/ramdisk TERM=vt102 console=ttyS0
0: 0/vmlinux
1: HOME=/
2: root=/dev/nfs
3: ip=:::::enp32s2:dhcp
4: modprobe.blacklist=radeon
5: initrd=0/ramdisk
6: TERM=vt102
7: console=ttyS0
```

Interestingly after upgrading all packages (obviously including palo) on
the NFS root FS and building a new lifimage with Linux 4.15.x,
blacklisting the radeon module seems to be no longer required. Not sure
if this is due to palo 2.00 or Linux 4.15.x. Anyway, the radeon module
is no longer loaded automatically with this configuration.

****

So the rp3440 can actually work diskless as well - good that you
asked, Helge. :-)

Cheers,
Frank


From John David Anglin@21:1/5 to Frank Scheiner on Thu Apr 19 23:40:01 2018

On 2018-04-19 3:29 PM, Frank Scheiner wrote:

Interestingly after upgrading all packages (obviously including palo)
on the NFS root FS and building a new lifimage with Linux 4.15.x, blacklisting the radeon module seems to be no longer required. Not
sure if this is due to palo 2.00 or Linux 4.15.x. Anyway, the radeon
module is no longer loaded automatically with this configuration.

A quirk was added to disable the radeon driver for the builtin RV100 in
the rp3440. It's broken. There should be a message in the dmesg log
about this.

Dave

--
John David Anglin dave.anglin@bell.net


From Frank Scheiner@21:1/5 to John David Anglin on Fri Apr 20 08:00:01 2018

On 04/19/2018 11:33 PM, John David Anglin wrote:

On 2018-04-19 3:29 PM, Frank Scheiner wrote:

Interestingly after upgrading all packages (obviously including palo)
on the NFS root FS and building a new lifimage with Linux 4.15.x,
blacklisting the radeon module seems to be no longer required. Not
sure if this is due to palo 2.00 or Linux 4.15.x. Anyway, the radeon
module is no longer loaded automatically with this configuration.

A quirk was added to disable the radeon driver for the builtin RV100 in
the rp3440. It's broken. There should be a message in the dmesg log
about this.

Indeed, there it is:

```
[...]
[ 3.100350] pci 0000:e0:01.0: Hiding Diva built-in AUX serial device
[ 3.101683] pci 0000:e0:02.0: Hiding Diva built-in ATI card
[...]
```

...I overlooked it. I wonder why this issue didn't hit the rp3440 (not
even my first one) when it was booted from disk, because it really looks
like not loading the radeon module solved the problem for diskless
operation.

Cheers,
Frank


From Helge Deller@21:1/5 to Frank Scheiner on Fri Apr 20 08:40:01 2018

On 19.04.2018 21:29, Frank Scheiner wrote:

Apart from the rp3440 - and maybe also the 712/80 which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

What kind of problems?

Unfortunately I don't seem to have made any notes on the issue with the 712/80, so earlier this week I retried with the configuration that I assume caused it:

This configuration was using a Debian Linux kernel 4.9.25-1 (4.9.0-3-parisc from 2017-05-02). When netbooting it, the machine seems to lose contact with the NFS server shortly after login:

```
[...]
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Started Getty on tty1.
[ OK ] Reached target Login Prompts.

Debian GNU/Linux buster/sid hp-712 ttyS0

hp-712 login: root
Password:
Last login: Thu Sep 18 11:30:50 CET 1902 from 172.16.1.1 on pts/0
Linux hp-712 4.9.0-3-parisc #1 Debian 4.9.25-1 (2017-05-02) parisc

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

[ 232.973913] nfs: server 172.16.0.2 not responding, still trying
[ 233.094265] nfs: server 172.16.0.2 not responding, still trying
[ 233.205127] nfs: server 172.16.0.2 not responding, still trying
[ 233.568429] nfs: server 172.16.0.2 not responding, still trying
[ 233.692383] nfs: server 172.16.0.2 not responding, still trying
[ 233.808818] nfs: server 172.16.0.2 not responding, still trying
[...]
[ 235.179253] nfs: server 172.16.0.2 OK
[ 235.251896] nfs: server 172.16.0.2 not responding, still trying
[...]
```

Although it seems to be able to reconnect from time to time, the machine is not accessible.

Afterwards I found some older notes about this machine which mention
no issues during diskless operation with the very same configuration
(kernel and possibly also userland), which made me wonder, if there's
maybe an issue between the machine's built-in NIC and my 1000
Mbit network switch. And indeed, when connecting another 100 Mbit
network switch in between the 712/80 and the 1000 Mbit network switch
the issue seemed to be gone and the machine stayed accessible.

But later this week I retried the 712/80 with the current Linux
kernel (4.15.x) and Debian userland and the issue hit me again,
although much later and despite the 100 Mbit network switch in
between. Looking at it I could see that the collision indicator was
active on the switch for the port used by the 712/80. I then
configured a single port of the 1000 Mbit network switch to 10 Mbit
full duplex and attached the 712/80 to it. And then the issue again
seemed to be gone. But trying to install a package or update the
package cache quickly triggered it again. Well, that's not much of an
issue, as I can do the package management for the 712/80 with another
machine (e.g. c8000).

Also interesting are the kernel messages for 4.15.11. Please note the
time difference between "random: crng init done" and "Key type
asymmetric registered":

Seems to be a generic issue. https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

My assumption is that the kernel waits until it has
enough randomness for the various encryption algorithms.

```
[    0.000000] Linux version 4.15.0-2-parisc (debian-kernel@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-12)) #1 Debian 4.15.11-1 (2018-03-20)
[    0.000000] unwind_init: start = 0x1086e8b4, end = 0x108c5644, entries = 22233
[    0.000000] FP[0] enabled: Rev 1 Model 13
[    0.000000] The 32-bit Kernel has started...
[...]
[    9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
[   10.168866] zbud: loaded
[   56.112387] random: crng init done
[ 433.392379] Key type asymmetric registered
[ 433.445502] Asymmetric key parser 'x509' registered
[...]
[ 544.565451] systemd[1]: Detected architecture parisc.

Welcome to Debian GNU/Linux buster/sid!
[...]
[ OK ] Started Serial Getty on ttyS0.
[ OK ] Started Getty on tty1.
[ OK ] Reached target Login Prompts.

Debian GNU/Linux buster/sid hp-712 ttyS0

hp-712 login:

```

...On the first try I assumed the machine or the kernel had hung, but no, it was still working the whole time.

Today I tested it again (with 4.15.11) and the issue this time hit me already during login, after I entered the username.

So I'm actually back where I started. :-(

I suspect that maybe the built-in 82596 NIC cannot cope with the
amount of traffic that happens during diskless operation - although I
then wonder why it doesn't have a problem during the TFTP operation
to load the lifimage.

When loading via TFTP not much traffic is generated.

The next thing I'll examine will be the parameters used for the NFS mount (especially rsize and wsize) - if I can ever log in to it again
:-). And maybe a fan for the passive heat sink of the CPU, which gets
quite hot during operation.

Any suggestions on where else to look?

Not really.

****

For the rp3440 I (also) have to retract my earlier statement as it
looks like my second rp3440 actually **works** diskless. I have to
retest with my first rp3440 (currently in storage) as it seems it
behaves differently in this regard - or maybe I misconfigured
something there in the past. I have to recheck.

But for my second rp3440 I still had to blacklist the `radeon` module
to achieve this, as otherwise the system (console) seems to crash
shortly before the login prompt would have appeared or just after.
This is the kernel command line I used, as configured with palo 1.99 and
Linux 4.14.x:

```
Current command line:
0/vmlinux HOME=/ root=/dev/nfs ip=:::::enp32s2:dhcp modprobe.blacklist=radeon initrd=0/ramdisk TERM=vt102 console=ttyS0
0: 0/vmlinux
1: HOME=/
2: root=/dev/nfs
3: ip=:::::enp32s2:dhcp
4: modprobe.blacklist=radeon
5: initrd=0/ramdisk
6: TERM=vt102
7: console=ttyS0
```

Interestingly after upgrading all packages (obviously including palo)
on the NFS root FS and building a new lifimage with Linux 4.15.x, blacklisting the radeon module seems to be no longer required. Not
sure if this is due to palo 2.00 or Linux 4.15.x. Anyway, the radeon
module is no longer loaded automatically with this configuration.

Two issues regarding the rp3440 were fixed:
1. The radeon module for the card on the management board is
automatically disabled by the Linux kernel. This fixes crashes/hangs.
2. The serial port on the management board is disabled by the
Linux kernel.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcf3f1752a622f1372d3252d0fea8855d89812e7

Older versions of palo tried to work around problem #2 by
passing the kernel parameter "console=ttyS1" to the Linux kernel when
booting.
So, since you upgraded both palo and the kernel, neither workaround is
necessary any longer, and rp-class machines should work without
any further quirks.

Helge


From Jeroen Roovers@21:1/5 to Frank Scheiner on Fri Apr 20 11:50:02 2018

On Thu, 19 Apr 2018 21:29:45 +0200
Frank Scheiner <frank.scheiner@web.de> wrote:

Afterwards I found some older notes about this machine which mention
no issues during diskless operation with the very same configuration
(kernel and possibly also userland), which made me wonder, if there's
maybe an issue between the machine's built-in NIC and my 1000
Mbit network switch. And indeed, when connecting another 100 Mbit
network switch in between the 712/80 and the 1000 Mbit network switch
the issue seemed to be gone and the machine stayed accessible.

But later this week I retried the 712/80 with the current Linux
kernel (4.15.x) and Debian userland and the issue hit me again,
although much later and despite the 100 Mbit network switch in
between. Looking at it I could see that the collision indicator was
active on the switch for the port used by the 712/80. I then
configured a single port of the 1000 Mbit network switch to 10 Mbit
full duplex and attached the 712/80 to it. And then the issue again
seemed to be gone. But trying to install a package or updating the
package cache again quickly triggered it.

You could try setting the internal NIC to half-duplex. If you cannot configure that internally or on the kernel command line, or if doing it in userland is too late, perhaps use a (passive) 10BASE-T hub instead of a switch.

Kind regards,
jer


From John David Anglin@21:1/5 to Jeroen Roovers on Sat Apr 21 02:30:02 2018

On 2018-04-20 5:24 AM, Jeroen Roovers wrote:

But later this week I retried the 712/80 with the current Linux
kernel (4.15.x) and Debian userland and the issue hit me again,
although much later and despite the 100 Mbit network switch in
between. Looking at it I could see that the collision indicator was
active on the switch for the port used by the 712/80. I then
configured a single port of the 1000 Mbit network switch to 10 Mbit
full duplex and attached the 712/80 to it. And then the issue again
seemed to be gone. But trying to install a package or updating the
package cache again quickly triggered it.

You could try setting the internal NIC to half-duplex. If you cannot configure that internally or on the kernel command line, or if doing it in userland is too late, perhaps use a (passive) 10BASE-T hub instead of a switch.

From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
The MAU interface is definitely half duplex, and the word duplex is not
mentioned in the manual.

The 10BASE-T port probably doesn't support auto negotiation, so you will
need to manually set the switch port to 10BASE-T half duplex if it
doesn't automatically configure to this mode when auto negotiation fails.

Some switches support a half-duplex back pressure form of flow control.

Setting the switch port is probably easier than finding a passive 10BASE
hub.

Dave

--
John David Anglin dave.anglin@bell.net


From John David Anglin@21:1/5 to Helge Deller on Sat Apr 21 21:20:01 2018

On 2018-04-20 2:37 AM, Helge Deller wrote:

Also interesting, the kernel messages for 4.15.11, please notice the
time difference between "random: crng init done" and "Key type
asymmetric registered":

Seems to be a generic issue. https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

My assumption is that the kernel waits until it has
enough randomness for the various encryption algorithms.

I think this is caused by cryptomgr_test. It can be disabled with "cryptomgr.notests" on the command line.

Dave

--
John David Anglin dave.anglin@bell.net


From John David Anglin@21:1/5 to Helge Deller on Sun Apr 22 00:40:02 2018

On 2018-04-21 6:17 PM, Helge Deller wrote:

It can be disabled with "cryptomgr.notests" on command line.

Did you test this?

Not recently. I found this when I was working on the cache.TLB patch.
It caused a stall in one version.

Unless I typed it wrong, it didn't work on my B160L:
[ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

You typed it wrong.

Dave

--
John David Anglin dave.anglin@bell.net


From Helge Deller@21:1/5 to John David Anglin on Sun Apr 22 00:20:01 2018

On 21.04.2018 21:12, John David Anglin wrote:

On 2018-04-20 2:37 AM, Helge Deller wrote:

Also interesting, the kernel messages for 4.15.11, please notice the
time difference between "random: crng init done" and "Key type
asymmetric registered":

Seems to be a generic issue.
https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

My assumption is that the kernel waits until it has
enough randomness for the various encryption algorithms.

I think this is caused by cryptomgr_test.
It can be disabled with "cryptomgr.notests" on command line.

Did you test this?
Unless I typed it wrong, it didn't work on my B160L:
[ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux
...
[ 15.549370] workingset: timestamp_bits=14 max_order=15 bucket_order=1
[ 15.688261] zbud: loaded
[ 57.608154] random: crng init done
...long delay here...
[ 207.522038] Key type asymmetric registered
[ 207.574154] Asymmetric key parser 'x509' registered
[ 207.635883] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
[ 207.729718] io scheduler noop registered

Helge


From Helge Deller@21:1/5 to John David Anglin on Sun Apr 22 11:10:01 2018

On 22.04.2018 00:36, John David Anglin wrote:

On 2018-04-21 6:17 PM, Helge Deller wrote:

It can be disabled with "cryptomgr.notests" on command line.

Did you test this?

Not recently. I found this when I was working on the cache.TLB patch. It caused a stall in one version.

Unless I typed it wrong, it didn't work on my B160L:
[ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

You typed it wrong.

Yes, my fault.
"cryptomgr.notests" worked as expected.
Thanks!
Helge


From Frank Scheiner@21:1/5 to Helge Deller on Sun Apr 22 21:20:01 2018

On 04/22/2018 11:06 AM, Helge Deller wrote:

On 22.04.2018 00:36, John David Anglin wrote:

On 2018-04-21 6:17 PM, Helge Deller wrote:

It can be disabled with "cryptomgr.notests" on command line.

Did you test this?

Not recently. I found this when I was working on the cache.TLB patch. It caused a stall in one version.

Unless I typed it wrong, it didn't work on my B160L:
[ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

You typed it wrong.

Yes, my fault.
"cryptomgr.notests" worked as expected.

Great, that's pretty useful for slower machines like the 712/80.
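Since the first attempt failed only because of a transposed letter in the parameter name, a quick check of the kernel command line can catch such slips. The following is a minimal sketch, assuming the command line is available as a string (on a running system it can be read from `/proc/cmdline`); the helper name `check_param` and the similarity cutoff of 0.8 are arbitrary choices for illustration.

```python
import difflib

def check_param(cmdline, expected):
    """Say whether `expected` is present, or which token looks like a typo of it."""
    # Compare only the parameter names, i.e. the part before any "=".
    names = [token.split("=", 1)[0] for token in cmdline.split()]
    if expected in names:
        return f"{expected}: present"
    near = difflib.get_close_matches(expected, names, n=1, cutoff=0.8)
    if near:
        return f"{expected}: missing, but found similar {near[0]!r}"
    return f"{expected}: missing"

# Command line from the B160L boot log quoted above, typo included.
B160L = ("root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 "
         "TERM=vt102 palo_kernel=2/vmlinux")

print(check_param(B160L, "cryptomgr.notests"))
```

The kernel does not complain about unknown `module.parameter` style options for a module it never sees, which is why a misspelling like this goes unnoticed at boot.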

Cheers,
Frank


From Frank Scheiner@21:1/5 to Jeroen Roovers on Sun Apr 22 21:20:01 2018

On 04/20/2018 11:24 AM, Jeroen Roovers wrote:

You could try setting the internal NIC to half-duplex. If you cannot configure that internally or on the kernel command line, or if doing it in userland is too late, perhaps use a (passive) 10BASE-T hub instead of a switch.

I actually had the port configured to half-duplex at first. But I was distracted by the high number of collisions taking place and so changed
it to full-duplex.

Cheers,
Frank


From Frank Scheiner@21:1/5 to John David Anglin on Sun Apr 22 21:20:01 2018

On 04/21/2018 02:22 AM, John David Anglin wrote:

From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
The MAU interface is definitely half duplex, and the word duplex is not
mentioned in the manual.

I also didn't find any info about half-/full-duplex in the two manuals I
have at hand for the 712/80 ("Service Handbook" and "Technical Reference Manual"). To be sure, which one did you consult?

The 10BASE-T port probably doesn't support auto negotiation, so you will
need to manually set the switch port to 10BASE-T half duplex if it
doesn't automatically configure to this mode when auto negotiation fails.

I did this at first but then went back to full-duplex. Today I started
with full-duplex and active cooling of the heatsink (now smoothed and
with fresh thermal grease applied) of the 712/80's processor, but that
alone didn't help. The issue hit me after entering the password during
login.

Then I reconfigured the port to half-duplex and tried again. The machine
now worked through the whole login and I could also do an `apt update`
without issues afterwards. Then I left it alone for about twenty minutes
and on my return I did an `apt list --upgradable`, which triggered the
issue again.

:-/

Some switches support a half-duplex back pressure form of flow control.

I'll try that now. According to the documentation my switch can create back-pressure as a form of flow control.

Cheers,
Frank


From John David Anglin@21:1/5 to Frank Scheiner on Sun Apr 22 22:20:01 2018

On 2018-04-22 3:17 PM, Frank Scheiner wrote:

On 04/21/2018 02:22 AM, John David Anglin wrote:

From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
The MAU interface is definitely half duplex, and the word duplex is not
mentioned in the manual.

I also didn't find any info about half-/full-duplex in the two manuals
I have at hand for the 712/80 ("Service Handbook" and "Technical
Reference Manual"). To be sure, which one did you consult?

I looked at the "Technical Reference".

The 10BASE-T port probably doesn't support auto negotiation, so you will
need to manually set the switch port to 10BASE-T half duplex if it
doesn't automatically configure to this mode when auto negotiation fails.

I did this at first but then went back to full-duplex. Today I started
with full-duplex and active cooling of the heatsink (now smoothed and
with fresh thermal grease applied) of the 712/80's processor, but that
alone didn't help. The issue hit me after entering the password during
login.

Then I reconfigured the port to half-duplex and tried again. The machine
now worked through the whole login and I could also do an `apt update`
without issues afterwards. Then I left it alone for about twenty minutes
and on my return I did an `apt list --upgradable`, which triggered the
issue again.

Seems like a hardware problem, probably in the 712. The switch and the
712 need to be in the same mode. If my supposition about the 712 only
supporting half duplex is correct, then the switch will have to be in
half duplex. I think network boot and `apt update` would be a sufficient
test of the network configuration. Without error messages, this is hard.

:-/

Some switches support a half-duplex back pressure form of flow control.

I'll try that now. According to the documentation my switch can create back-pressure as a form of flow control.

It's possible flow control on the server port might help given that the
712 is so slow and probably needs half duplex. The switch might drop
packets as a result. However, IP usually adjusts for slow segments.

Dave

--
John David Anglin dave.anglin@bell.net


From Frank Scheiner@21:1/5 to Frank Scheiner on Mon Apr 23 15:50:01 2018

On 04/22/2018 09:17 PM, Frank Scheiner wrote:

Some switches support a half-duplex back pressure form of flow control.

I'll try that now. According to the documentation my switch can create back-pressure as a form of flow control.

Yesterday, after I activated flow control on the switch, the 712/80 came
back after a while and finished the `apt list --upgradable` command with
output - in between, systemd's journald crashed and restarted.
Reissuing the same `apt [...]` command worked without problems. On the
switch's port summary I could now also see that the host that acts
as NFS server received pause frames sent by the switch - so the flow
control is working.

I then tried to install `joe` and when `update-alternatives` started it
again lost the connection to the NFS server. :-( It didn't recover from
that - at least not during the time I waited for it - so I powered the
712/80 down.

I thought maybe switching back to System V init might ease the load a
little bit for the 712/80, so I upgraded the file system with a c8000
(incl. newer patch level for the kernel) and removed systemd afterwards
(also from initramfs).

I then ran some benchmarks without any issues in between.

Today I still have the problems described in [1] when doing `apt install
[...]` or `apt remove [...]` but now the 712/80 recovered each time so
far after a while, so it looks like an improvement to me. Look at the
timings for `apt remove [...]`:

```
root@hp-712:~# time apt remove -y joe
[ 8794.150750] nfs: server 172.16.0.2 not responding, still trying
[...]
[ 8794.962227] nfs: server 172.16.0.2 not responding, still trying
[ 8795.074226] nfs: server 172.16.0.2 OK
[...]
[ 8797.271834] nfs: server 172.16.0.2 OK
[ 8802.242312] nfs: server 172.16.0.2 not responding, still trying
[...]
[ 9235.937478] nfs: server 172.16.0.2 OK
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
joe
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 2,086 kB disk space will be freed.
(Reading database ... 41128 files and directories currently installed.)
Removing joe (4.6-1) ...
update-alternatives: using /usr/bin/jmacs to provide /usr/bin/editor
(editor) in auto mode
update-alternatives: using /usr/bin/jpico to provide /usr/bin/editor
(editor) in auto mode
update-alternatives: using /bin/nano to provide /usr/bin/editor (editor)
in auto mode
Processing triggers for mime-support (3.60) ...
[ 9357.992385] nfs: server 172.16.0.2 not responding, still trying
[...]
[10055.370493] nfs: server 172.16.0.2 not responding, still trying
[10055.709731] nfs: server 172.16.0.2 OK
[...]
[10057.212469] nfs: server 172.16.0.2 OK

real 22m0.853s
user 1m3.264s
sys 0m43.875s
```

...the `apt install -y joe` done beforehand took about 41 minutes. So
the 712/80 can recover from the described problems, but package
management should really be done from a more powerful machine, at least
when running diskless.
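To put a number on how much of such a run is spent waiting for the NFS server, the "not responding" and "OK" messages can be paired into outage windows. This is a rough sketch, not a measurement tool: the name `nfs_outages` is made up, the sample contains only the timestamps visible in the excerpt above (the `[...]` parts are missing), and the kernel prints these messages per pending request, so the result is an estimate.

```python
import re

# One NFS status message: timestamp, server address, state.
EVENT = re.compile(r"\[\s*(\d+\.\d+)\] nfs: server (\S+) (not responding|OK)")

def nfs_outages(log_text):
    """Return (start, end) pairs from the first 'not responding' to the next 'OK'."""
    outages = []
    down_since = None
    for match in EVENT.finditer(log_text):
        stamp, state = float(match.group(1)), match.group(3)
        if state == "not responding" and down_since is None:
            down_since = stamp          # outage begins
        elif state == "OK" and down_since is not None:
            outages.append((down_since, stamp))  # outage ends
            down_since = None
    return outages

# Timestamps taken from the `apt remove -y joe` log above.
SAMPLE = """\
[ 8794.150750] nfs: server 172.16.0.2 not responding, still trying
[ 8794.962227] nfs: server 172.16.0.2 not responding, still trying
[ 8795.074226] nfs: server 172.16.0.2 OK
[ 8797.271834] nfs: server 172.16.0.2 OK
[ 8802.242312] nfs: server 172.16.0.2 not responding, still trying
[ 9235.937478] nfs: server 172.16.0.2 OK
[ 9357.992385] nfs: server 172.16.0.2 not responding, still trying
[10055.370493] nfs: server 172.16.0.2 not responding, still trying
[10055.709731] nfs: server 172.16.0.2 OK
[10057.212469] nfs: server 172.16.0.2 OK
"""

windows = nfs_outages(SAMPLE)
total = sum(end - start for start, end in windows)
print(f"{len(windows)} outage windows, about {total / 60:.1f} minutes in total")
```

With the visible timestamps alone, the windows add up to roughly 19 of the 22 minutes reported by `time`, which fits the impression that the run is dominated by waiting on NFS rather than by CPU work.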

[1]: https://lists.debian.org/debian-hppa/2018/04/msg00007.html

Cheers,
Frank

