• Ultra5 successful install - PGX64 issues

    From Frank Scheiner@21:1/5 to Dennis Clarke on Sat Apr 14 20:20:02 2018
    On 04/14/2018 06:11 PM, Dennis Clarke wrote:
    Really?  Well then .. let me see what I have that is ancient in the
     warehouse.

    How about PA-RISC?  I happen to have some superdomes kicking about but
    they require truly a ton of power to operate.

    I assume hppa people in Debian (debian-hppa@l.d.o in CC) would
    appreciate testing on such gear. Not sure if those superdomes will work
    out of the box though. I know from my own testing that the following
    "smaller" machines work with Debian GNU/Linux Sid for hppa:

    * 712/80
    * c3700, c3750, J5600, rp2470
    * c8000, rp3440

    Apart from the rp3440 - and maybe also the 712/80, which showed some
    issue with its built-in NIC after netbooting the Linux kernel and the
    OS - all machines also work diskless, which could speed up testing for
    you and avoid a manual Debian installation - although this could still
    be interesting.

    Cheers,
    Frank

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helge Deller@21:1/5 to Frank Scheiner on Sun Apr 15 10:40:01 2018
    On 14.04.2018 20:13, Frank Scheiner wrote:
    On 04/14/2018 06:11 PM, Dennis Clarke wrote:
    Really?  Well then .. let me see what I have that is ancient in the
      warehouse.

    How about PA-RISC?  I happen to have some superdomes kicking about but they require truly a ton of power to operate.

    I assume hppa people in Debian (debian-hppa@l.d.o in CC) would appreciate testing on such gear.
    Not sure if those superdomes will work out of the box though.

    It would be really interesting to see whether Linux can boot on such machines.
    If it can't, I'm pretty sure that I can finish the firmware support in Linux so that it can boot in a cell. For that I'd need access to such a machine via ssh (to an x86 machine for cross-compiling/tftpboot provisioning) and a serial port to the superdome.

    I know from my own testing that the following "smaller" machines work with Debian GNU/Linux Sid for hppa:

    * 712/80
    * c3700, c3750, J5600, rp2470
    * c8000, rp3440

    Apart from the rp3440 - and maybe also the 712/80, which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

    What kind of problems?

    - all machines also work diskless, which could speed up testing for you and avoid a manual Debian installation - although this could still be interesting.

    Helge

  • From Frank Scheiner@21:1/5 to Helge Deller on Thu Apr 19 21:40:02 2018
    Hi,

    and sorry for the delay, I was a little short of spare time this week. :-/

    On 04/15/2018 10:34 AM, Helge Deller wrote:
    On 14.04.2018 20:13, Frank Scheiner wrote:
    I know from my own testing that the following "smaller" machines work with Debian GNU/Linux Sid for hppa:

    * 712/80
    * c3700, c3750, J5600, rp2470
    * c8000, rp3440

    Apart from the rp3440 - and maybe also the 712/80, which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

    What kind of problems?

    Unfortunately I don't seem to have made any notes about the issue with
    the 712/80, so earlier this week I retried with the configuration I
    assumed had caused it.

    This configuration used a Debian Linux kernel 4.9.25-1
    (4.9.0-3-parisc from 2017-05-02). When netbooting it, the machine seems
    to lose contact with the NFS server shortly after login:

    ```
    [...]
    [ OK ] Started Serial Getty on ttyS0.
    [ OK ] Started Getty on tty1.
    [ OK ] Reached target Login Prompts.

    Debian GNU/Linux buster/sid hp-712 ttyS0

    hp-712 login: root
    Password:
    Last login: Thu Sep 18 11:30:50 CET 1902 from 172.16.1.1 on pts/0
    Linux hp-712 4.9.0-3-parisc #1 Debian 4.9.25-1 (2017-05-02) parisc

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.

    [ 232.973913] nfs: server 172.16.0.2 not responding, still trying
    [ 233.094265] nfs: server 172.16.0.2 not responding, still trying
    [ 233.205127] nfs: server 172.16.0.2 not responding, still trying
    [ 233.568429] nfs: server 172.16.0.2 not responding, still trying
    [ 233.692383] nfs: server 172.16.0.2 not responding, still trying
    [ 233.808818] nfs: server 172.16.0.2 not responding, still trying
    [...]
    [ 235.179253] nfs: server 172.16.0.2 OK
    [ 235.251896] nfs: server 172.16.0.2 not responding, still trying
    [...]
    ```

    Although it seems to be able to reconnect from time to time, the machine
    is not accessible.
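For anyone who wants to quantify such outages from a captured console log, here is a minimal Python sketch. It assumes the log lines have the same shape as the excerpt above; the function name `outage_windows` is made up for illustration. It pairs the first "not responding" message with the next "OK" message and reports the time in between.

```python
import re

# Matches kernel NFS status lines such as
# "[ 232.973913] nfs: server 172.16.0.2 not responding, still trying"
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+nfs: server (\S+) (not responding|OK)")


def outage_windows(log_text):
    """Return (start, end) timestamps in seconds for each completed outage."""
    windows = []
    start = None
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        ts, state = float(m.group(1)), m.group(3)
        if state == "not responding" and start is None:
            start = ts  # first complaint opens an outage window
        elif state == "OK" and start is not None:
            windows.append((start, ts))  # first "OK" closes it
            start = None
    return windows


sample = """\
[ 232.973913] nfs: server 172.16.0.2 not responding, still trying
[ 233.094265] nfs: server 172.16.0.2 not responding, still trying
[ 235.179253] nfs: server 172.16.0.2 OK
[ 235.251896] nfs: server 172.16.0.2 not responding, still trying
"""

for begin, end in outage_windows(sample):
    print(f"outage from {begin:.3f}s to {end:.3f}s ({end - begin:.3f}s)")
```

An outage that is still open at the end of the log (like the last line of the sample) is deliberately not reported.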

    Afterwards I found some older notes about this machine which mention no
    issues during diskless operation with the very same configuration
    (kernel and possibly also userland), which made me wonder if there's
    maybe an issue between the machine's built-in NIC and the 1000 Mbit
    network switch I use. And indeed, when I connected another 100 Mbit
    network switch in between the 712/80 and the 1000 Mbit network switch,
    the issue seemed to be gone and the machine stayed accessible.

    But later this week I retried the 712/80 with the current Linux kernel
    (4.15.x) and Debian userland and the issue hit me again, although much
    later and despite the 100 Mbit network switch in between. Looking at it
    I could see that the collision indicator was active on the switch for
    the port used by the 712/80. I then configured a single port of the
    1000 Mbit network switch to 10 Mbit full duplex and attached the 712/80
    to it. And then the issue again seemed to be gone. But trying to install
    a package or update the package cache quickly triggered it again. Well,
    that's not much of an issue, as I can do the package management for the
    712/80 with another machine (e.g. c8000).

    Also interesting are the kernel messages for 4.15.11; please note the
    time difference between "random: crng init done" and "Key type
    asymmetric registered":

    ```
    [ 0.000000] Linux version 4.15.0-2-parisc (debian-kernel@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-12)) #1 Debian 4.15.11-1 (2018-03-20)
    [ 0.000000] unwind_init: start = 0x1086e8b4, end = 0x108c5644, entries = 22233
    [ 0.000000] FP[0] enabled: Rev 1 Model 13
    [ 0.000000] The 32-bit Kernel has started...
    [...]
    [ 9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
    [ 10.168866] zbud: loaded
    [ 56.112387] random: crng init done
    [ 433.392379] Key type asymmetric registered
    [ 433.445502] Asymmetric key parser 'x509' registered
    [...]
    [ 544.565451] systemd[1]: Detected architecture parisc.

    Welcome to Debian GNU/Linux buster/sid!
    [...]
    [ OK ] Started Serial Getty on ttyS0.
    [ OK ] Started Getty on tty1.
    [ OK ] Reached target Login Prompts.

    Debian GNU/Linux buster/sid hp-712 ttyS0

    hp-712 login:

    ```

    ...On the first try I assumed the machine or the kernel had hung, but no,
    it was still working the whole time.
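To spot stalls like this without reading the whole boot log, the gaps between consecutive kernel timestamps can be computed. The following is a small Python sketch, assuming dmesg-style `[ seconds] message` lines as in the excerpt above; the helper name `large_gaps` and the 60 second threshold are arbitrary choices for illustration.

```python
import re

# Matches dmesg-style lines: "[ 56.112387] random: crng init done"
TS_RE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")


def large_gaps(log_text, min_gap=60.0):
    """Return (gap_seconds, previous_message, next_message) for long gaps."""
    entries = []
    for line in log_text.splitlines():
        m = TS_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    gaps = []
    # Compare each timestamped line with the one that follows it.
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        if t1 - t0 >= min_gap:
            gaps.append((t1 - t0, msg0, msg1))
    return gaps


sample = """\
[ 9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
[ 10.168866] zbud: loaded
[ 56.112387] random: crng init done
[ 433.392379] Key type asymmetric registered
[ 433.445502] Asymmetric key parser 'x509' registered
"""

for gap, before, after in large_gaps(sample):
    print(f"{gap:8.1f}s between '{before}' and '{after}'")
```

With the lines quoted from the 712/80 boot, the only gap above the threshold is the one of roughly 377 seconds between "random: crng init done" and "Key type asymmetric registered".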

    Today I tested it again (with 4.15.11) and the issue this time hit me
    already during login, after I entered the username.

    So I'm actually back where I started. :-(

    I suspect that maybe the built-in 82596 NIC cannot cope with the amount
    of traffic that happens during diskless operation - although I then
    wonder why it doesn't have a problem during the TFTP operation to load
    the lifimage. The next thing I'll examine is the parameters used for
    the NFS mount (especially rsize and wsize) - if I can ever log in to
    it again :-). And maybe a fan for the passive heat sink of the CPU, which
    gets quite hot during operation.
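As a starting point for checking rsize and wsize, the effective NFS mount options can be read from `/proc/mounts` on the running system. The Python sketch below parses text in that format. The sample line, including the export path and all option values, is hypothetical and not taken from the 712/80; `nfs_mount_options` is an illustrative name.

```python
def nfs_mount_options(mounts_text):
    """Map each NFS mount point to a dict of its mount options.

    Expects text in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or not fields[2].startswith("nfs"):
            continue
        opts = {}
        for opt in fields[3].split(","):
            key, _, value = opt.partition("=")
            opts[key] = value  # flags such as "rw" get an empty value
        result[fields[1]] = opts
    return result


# Hypothetical sample; on a real system use open("/proc/mounts").read().
sample = (
    "172.16.0.2:/srv/nfs/hp-712 / nfs "
    "rw,relatime,vers=3,rsize=4096,wsize=4096,proto=udp,addr=172.16.0.2 0 0\n"
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
)

for mount_point, opts in nfs_mount_options(sample).items():
    print(mount_point, "rsize =", opts.get("rsize"), "wsize =", opts.get("wsize"))
```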

    Any suggestions on where else to look?

    ****

    For the rp3440 I (also) have to retract my earlier statement as it looks
    like my second rp3440 actually **works** diskless. I have to retest with
    my first rp3440 (currently in storage) as it seems it behaves
    differently in this regard - or maybe I misconfigured something there in
    the past. I have to recheck.

    But for my second rp3440 I still had to blacklist the `radeon` module to
    achieve this, as otherwise the system (console) seems to crash shortly
    before the login prompt would have appeared or just after. This is the
    kernel command line I used, as configured with palo 1.99 and Linux 4.14.x:

    ```
    Current command line:
    0/vmlinux HOME=/ root=/dev/nfs ip=:::::enp32s2:dhcp modprobe.blacklist=radeon initrd=0/ramdisk TERM=vt102 console=ttyS0
    0: 0/vmlinux
    1: HOME=/
    2: root=/dev/nfs
    3: ip=:::::enp32s2:dhcp
    4: modprobe.blacklist=radeon
    5: initrd=0/ramdisk
    6: TERM=vt102
    7: console=ttyS0
    ```

    Interestingly, after upgrading all packages (obviously including palo) on
    the NFS root FS and building a new lifimage with Linux 4.15.x,
    blacklisting the radeon module no longer seems to be required. Not sure
    if this is due to palo 2.00 or Linux 4.15.x. Anyway, the radeon module
    is no longer loaded automatically with this configuration.
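To check quickly whether a given command line still carries the blacklist workaround, the line can be split into its parameters. This is a simplified Python sketch using the command line shown above. It does not handle quoted values containing spaces, and repeated keys overwrite each other; `parse_cmdline` is an illustrative helper, not part of palo or the kernel.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into bare tokens and key=value params."""
    flags, params = [], {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
        else:
            flags.append(token)
    return flags, params


cmdline = (
    "0/vmlinux HOME=/ root=/dev/nfs ip=:::::enp32s2:dhcp "
    "modprobe.blacklist=radeon initrd=0/ramdisk TERM=vt102 console=ttyS0"
)

flags, params = parse_cmdline(cmdline)
# modprobe.blacklist takes a comma-separated list of module names.
blacklisted = params.get("modprobe.blacklist", "").split(",")

print("kernel image:", flags)
print("radeon blacklisted:", "radeon" in blacklisted)
```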

    ****

    So at least the rp3440 can actually work diskless as well - good that
    you asked, Helge. :-)

    Cheers,
    Frank

  • From John David Anglin@21:1/5 to Frank Scheiner on Thu Apr 19 23:40:01 2018
    On 2018-04-19 3:29 PM, Frank Scheiner wrote:
    Interestingly after upgrading all packages (obviously including palo)
    on the NFS root FS and building a new lifimage with Linux 4.15.x,
    blacklisting the radeon module seems to be no longer required. Not
    sure if this is due to palo 2.00 or Linux 4.15.x. Anyways the radeon
    module is no longer loaded automatically with this configuration.

    A quirk was added to disable the radeon driver for the builtin RV100 in
    the rp3440.  It's broken.  There should be a message in the dmesg log
    about this.

    Dave

    --
    John David Anglin dave.anglin@bell.net

  • From Frank Scheiner@21:1/5 to John David Anglin on Fri Apr 20 08:00:01 2018
    On 04/19/2018 11:33 PM, John David Anglin wrote:
    On 2018-04-19 3:29 PM, Frank Scheiner wrote:
    Interestingly after upgrading all packages (obviously including palo)
    on the NFS root FS and building a new lifimage with Linux 4.15.x,
    blacklisting the radeon module seems to be no longer required. Not
    sure if this is due to palo 2.00 or Linux 4.15.x. Anyways the radeon
    module is no longer loaded automatically with this configuration.

    A quirk was added to disable the radeon driver for the builtin RV100 in
    the rp3440.  It's broken.  There should be a message in the dmesg log
    about this.

    Indeed, there it is:

    ```
    [...]
    [ 3.100350] pci 0000:e0:01.0: Hiding Diva built-in AUX serial device
    [ 3.101683] pci 0000:e0:02.0: Hiding Diva built-in ATI card
    [...]
    ```

    ...I overlooked it. I wonder why this issue didn't hit the rp3440 (not
    even my first one) when it was booted from disk, because it really looks
    like not loading the radeon module solved the problem for diskless
    operation.

    Cheers,
    Frank

  • From Helge Deller@21:1/5 to Frank Scheiner on Fri Apr 20 08:40:01 2018
    On 19.04.2018 21:29, Frank Scheiner wrote:
    Apart from the rp3440 - and maybe also the 712/80, which showed some issue with its built-in NIC after netbooting the Linux kernel and the OS

    What kind of problems?

    Unfortunately I don't seem to have made any notes about the issue with the 712/80, so earlier this week I retried with the configuration I assumed had caused it.

    This configuration used a Debian Linux kernel 4.9.25-1 (4.9.0-3-parisc from 2017-05-02). When netbooting it, the machine seems to lose contact with the NFS server shortly after login:

    ```
    [...]
    [  OK  ] Started Serial Getty on ttyS0.
    [  OK  ] Started Getty on tty1.
    [  OK  ] Reached target Login Prompts.

    Debian GNU/Linux buster/sid hp-712 ttyS0

    hp-712 login: root
    Password:
    Last login: Thu Sep 18 11:30:50 CET 1902 from 172.16.1.1 on pts/0
    Linux hp-712 4.9.0-3-parisc #1 Debian 4.9.25-1 (2017-05-02) parisc

    The programs included with the Debian GNU/Linux system are free software;
    the exact distribution terms for each program are described in the
    individual files in /usr/share/doc/*/copyright.

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.

    [  232.973913] nfs: server 172.16.0.2 not responding, still trying
    [  233.094265] nfs: server 172.16.0.2 not responding, still trying
    [  233.205127] nfs: server 172.16.0.2 not responding, still trying
    [  233.568429] nfs: server 172.16.0.2 not responding, still trying
    [  233.692383] nfs: server 172.16.0.2 not responding, still trying
    [  233.808818] nfs: server 172.16.0.2 not responding, still trying
    [...]
    [  235.179253] nfs: server 172.16.0.2 OK
    [  235.251896] nfs: server 172.16.0.2 not responding, still trying
    [...]
    ```

    Although it seems to be able to reconnect from time to time, the machine is not accessible.

    Afterwards I found some older notes about this machine which mention
    no issues during diskless operation with the very same configuration
    (kernel and possibly also userland), which made me wonder, if there's
    maybe an issue between the machine's built-in NIC and my used 1000
    Mbit network switch. And indeed, when connecting another 100 Mbit
    network switch in between the 712/80 and the 1000 Mbit network switch
    the issue seemed to be gone and the machine stayed accessible.

    But later this week I retried the 712/80 with the current Linux
    kernel (4.15.x) and Debian userland and the issue hit me again,
    although much later and despite the 100 Mbit network switch in
    between. Looking at it I could see that the collision indicator was
    active on the switch for the port used by the 712/80. I then
    configured a single port of the 1000 Mbit network switch to 10 Mbit
    full duplex and attached the 712/80 to it. And then the issue again
    seemed to be gone. But trying to install a package or update the
    package cache quickly triggered it again. Well, that's not much of an
    issue, as I can do the package management for the 712/80 with another
    machine (e.g. c8000).

    Also interesting, the kernel messages for 4.15.11, please notice the
    time difference between "random: crng init done" and "Key type
    asymmetric registered":

    Seems to be a generic issue:
    https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

    My assumption is that the kernel waits until it has
    enough randomness for the various encryption algorithms.


    ```
    [    0.000000] Linux version 4.15.0-2-parisc (debian-kernel@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-12)) #1 Debian 4.15.11-1 (2018-03-20)
    [    0.000000] unwind_init: start = 0x1086e8b4, end = 0x108c5644, entries = 22233
    [    0.000000] FP[0] enabled: Rev 1 Model 13
    [    0.000000] The 32-bit Kernel has started...
    [...]
    [    9.919844] workingset: timestamp_bits=14 max_order=15 bucket_order=1
    [   10.168866] zbud: loaded
    [   56.112387] random: crng init done
    [  433.392379] Key type asymmetric registered
    [  433.445502] Asymmetric key parser 'x509' registered
    [...]
    [  544.565451] systemd[1]: Detected architecture parisc.

    Welcome to Debian GNU/Linux buster/sid!
    [...]
    [  OK  ] Started Serial Getty on ttyS0.
    [  OK  ] Started Getty on tty1.
    [  OK  ] Reached target Login Prompts.

    Debian GNU/Linux buster/sid hp-712 ttyS0

    hp-712 login:

    ```

    ...On the first try I assumed the machine or the kernel had hung, but no, it was still working the whole time.

    Today I tested it again (with 4.15.11) and the issue this time hit me already during login, after I entered the username.

    So I'm actually back where I started. :-(

    I suspect that maybe the built-in 82596 NIC cannot cope with the
    amount of traffic that happens during diskless operation - although I
    then wonder why it doesn't have a problem during the TFTP operation
    to load the lifimage.

    When loading via TFTP not much traffic is generated.

    The next thing I'll examine is the parameters used for the NFS mount
    (especially rsize and wsize) - if I can ever log in to it again :-).
    And maybe a fan for the passive heat sink of the CPU, which gets
    quite hot during operation.

    Any suggestions on where else to look?

    Not really.



    ****

    For the rp3440 I (also) have to retract my earlier statement as it
    looks like my second rp3440 actually **works** diskless. I have to
    retest with my first rp3440 (currently in storage) as it seems it
    behaves differently in this regard - or maybe I misconfigured
    something there in the past. I have to recheck.

    But for my second rp3440 I still had to blacklist the `radeon` module
    to achieve this, as otherwise the system (console) seems to crash
    shortly before the login prompt would have appeared or just after.
    This is my used kernel command line as configured with palo 1.99 and
    Linux 4.14.x:

    ```
    Current command line:
    0/vmlinux HOME=/ root=/dev/nfs ip=:::::enp32s2:dhcp modprobe.blacklist=radeon initrd=0/ramdisk TERM=vt102 console=ttyS0
     0: 0/vmlinux
     1: HOME=/
     2: root=/dev/nfs
     3: ip=:::::enp32s2:dhcp
     4: modprobe.blacklist=radeon
     5: initrd=0/ramdisk
     6: TERM=vt102
     7: console=ttyS0
    ```

    Interestingly after upgrading all packages (obviously including palo)
    on the NFS root FS and building a new lifimage with Linux 4.15.x, blacklisting the radeon module seems to be no longer required. Not
    sure if this is due to palo 2.00 or Linux 4.15.x. Anyways the radeon
    module is no longer loaded automatically with this configuration.

    Two issues regarding the rp3440 were fixed:
    1. The radeon module for the card on the management board is now
    automatically disabled by the Linux kernel. This fixes crashes/hangs.
    2. The serial port on the management board is now disabled by the
    Linux kernel.
    https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcf3f1752a622f1372d3252d0fea8855d89812e7

    Older versions of palo tried to work around problem #2 by
    passing the kernel parameter "console=ttyS1" to the Linux kernel when
    booting.
    So, since you upgraded both palo and the kernel, neither workaround
    is necessary any longer, and rp-class machines should work without
    any further quirks.

    Helge

  • From Jeroen Roovers@21:1/5 to Frank Scheiner on Fri Apr 20 11:50:02 2018
    On Thu, 19 Apr 2018 21:29:45 +0200
    Frank Scheiner <frank.scheiner@web.de> wrote:

    Afterwards I found some older notes about this machine which mention
    no issues during diskless operation with the very same configuration
    (kernel and possibly also userland), which made me wonder, if there's
    maybe an issue between the machine's built-in NIC and my used 1000
    Mbit network switch. And indeed, when connecting another 100 Mbit
    network switch in between the 712/80 and the 1000 Mbit network switch
    the issue seemed to be gone and the machine stayed accessible.

    But later this week I retried the 712/80 with the current Linux
    kernel (4.15.x) and Debian userland and the issue hit me again,
    although much later and despite the 100 Mbit network switch in
    between. Looking at it I could see that the collision indicator was
    active on the switch for the port used by the 712/80. I then
    configured a single port of the 1000 Mbit network switch to 10 Mbit
    full duplex and attached the 712/80 to it. And then the issue again
    seemed to be gone. But trying to install a package or updating the
    package cache again quickly triggered it.

    You could try setting the internal NIC to half-duplex, or perhaps use a
    (passive) 10BASE-T hub instead of a switch if you cannot configure that
    internally or on the kernel command line, or if doing it in userland is
    too late.


    Kind regards,
    jer

  • From John David Anglin@21:1/5 to Jeroen Roovers on Sat Apr 21 02:30:02 2018
    On 2018-04-20 5:24 AM, Jeroen Roovers wrote:
    But later this week I retried the 712/80 with the current Linux
    kernel (4.15.x) and Debian userland and the issue hit me again,
    although much later and despite the 100 Mbit network switch in
    between. Looking at it I could see that the collision indicator was
    active on the switch for the port used by the 712/80. I then
    configured a single port of the 1000 Mbit network switch to 10 Mbit
    full duplex and attached the 712/80 to it. And then the issue again
    seemed to be gone. But trying to install a package or updating the
    package cache again quickly triggered it.
    You could try setting the internal NIC to half-duplex, or perhaps use a
    (passive) 10BASE-T hub instead of a switch if you cannot configure that
    internally, on the kernel command line, or doing it in userland is too
    late.

    From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
    The MAU interface is definitely half duplex and the word duplex is not
    mentioned in the manual.

    The 10BASE-T port probably doesn't support auto negotiation, so you will
    need to manually set the switch port to 10BASE-T half duplex if it doesn't
    automatically configure to this mode when auto negotiation fails.

    Some switches support a half-duplex back pressure form of flow control.

    Setting the switch port is probably easier than finding a passive
    10BASE-T hub.

    Dave

    --
    John David Anglin dave.anglin@bell.net

  • From John David Anglin@21:1/5 to Helge Deller on Sat Apr 21 21:20:01 2018
    On 2018-04-20 2:37 AM, Helge Deller wrote:
    Also interesting, the kernel messages for 4.15.11, please notice the
    time difference between "random: crng init done" and "Key type
    asymmetric registered":
    Seems to be a generic issue. https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

    My assumption is, that the kernel waits until it has
    enough randomness for the various encryption algorithms.

    I think this is caused by cryptomgr_test.  It can be disabled with
    "cryptomgr.notests" on the kernel command line.

    Dave

    --
    John David Anglin dave.anglin@bell.net

  • From John David Anglin@21:1/5 to Helge Deller on Sun Apr 22 00:40:02 2018
    On 2018-04-21 6:17 PM, Helge Deller wrote:
    It can be disabled with "cryptomgr.notests" on command line.

    Did you test this?

    Not recently.  I found this when I was working on the cache.TLB patch.
    It caused a stall in one version.

    Unless I typed it wrong it didn't work on my B160L:
    [ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

    You typed it wrong.
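A misspelled parameter like this may simply not take effect, so a fuzzy comparison against the names one intended to use can catch it before booting a slow machine. The Python sketch below uses `difflib` from the standard library. The `KNOWN_PARAMS` list is a hand-picked assumption for this thread, not a registry of valid kernel parameters, and `suspicious_params` is an illustrative name.

```python
import difflib

# Hand-picked list of parameter names we expect to see; an assumption,
# not a complete list of what the kernel accepts.
KNOWN_PARAMS = [
    "cryptomgr.notests", "modprobe.blacklist", "console", "root",
    "HOME", "TERM", "palo_kernel", "initrd", "ip",
]


def suspicious_params(cmdline, known=KNOWN_PARAMS, cutoff=0.8):
    """Return {unknown_name: closest_known_name} for likely typos."""
    suspects = {}
    for token in cmdline.split():
        name = token.partition("=")[0]
        if name in known:
            continue
        close = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
        if close:
            suspects[name] = close[0]
    return suspects


cmdline = (
    "root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 "
    "TERM=vt102 palo_kernel=2/vmlinux"
)

for wrong, right in suspicious_params(cmdline).items():
    print(f"'{wrong}' looks like a typo of '{right}'")
```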

    Dave

    --
    John David Anglin dave.anglin@bell.net

  • From Helge Deller@21:1/5 to John David Anglin on Sun Apr 22 00:20:01 2018
    On 21.04.2018 21:12, John David Anglin wrote:
    On 2018-04-20 2:37 AM, Helge Deller wrote:
    Also interesting, the kernel messages for 4.15.11, please notice the
    time difference between "random: crng init done" and "Key type
    asymmetric registered":
    Seems to be a generic issue.
    https://www.linuxquestions.org/questions/showthread.php?p=5803405#post5803405

    My assumption is, that the kernel waits until it has
    enough randomness for the various encryption algorithms.

    I think this is caused by cryptomgr_test.
    It can be disabled with "cryptomgr.notests" on command line.

    Did you test this?
    Unless I typed it wrong it didn't work on my B160L:
    [ 0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux
    ...
    [ 15.549370] workingset: timestamp_bits=14 max_order=15 bucket_order=1
    [ 15.688261] zbud: loaded
    [ 57.608154] random: crng init done
    ...long delay here...
    [ 207.522038] Key type asymmetric registered
    [ 207.574154] Asymmetric key parser 'x509' registered
    [ 207.635883] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 250)
    [ 207.729718] io scheduler noop registered

    Helge

  • From Helge Deller@21:1/5 to John David Anglin on Sun Apr 22 11:10:01 2018
    On 22.04.2018 00:36, John David Anglin wrote:
    On 2018-04-21 6:17 PM, Helge Deller wrote:
    It can be disabled with "cryptomgr.notests" on command line.

    Did you test this?

    Not recently.  I found this when I was working on the cache.TLB patch.  It caused a stall in one version.

    Unless I typed it wrong it didn't work on my B160L:
    [    0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

    You typed it wrong.

    Yes, my fault.
    "cryptomgr.notests" did work as expected.
    Thanks!
    Helge

  • From Frank Scheiner@21:1/5 to Helge Deller on Sun Apr 22 21:20:01 2018
    On 04/22/2018 11:06 AM, Helge Deller wrote:
    On 22.04.2018 00:36, John David Anglin wrote:
    On 2018-04-21 6:17 PM, Helge Deller wrote:
    It can be disabled with "cryptomgr.notests" on command line.

    Did you test this?

    Not recently.  I found this when I was working on the cache.TLB patch.  It caused a stall in one version.

    Unless I typed it wrong it didn't work on my B160L:
    [    0.000000] Kernel command line: root=/dev/sda5 crpytomgr.notests HOME=/ console=ttyS0 TERM=vt102 palo_kernel=2/vmlinux

    You typed it wrong.

    Yes, my fault.
    "cryptomgr.notests" did work as expected.

    Great, that's pretty useful for slower machines like the 712/80.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to Jeroen Roovers on Sun Apr 22 21:20:01 2018
    On 04/20/2018 11:24 AM, Jeroen Roovers wrote:
    You could try setting the internal NIC to half-duplex, or perhaps use a
    (passive) 10BASE-T hub instead of a switch if you cannot configure that
    internally, on the kernel command line, or doing it in userland is too
    late.

    I actually had the port configured to half-duplex at first. But I was
    distracted by the high number of collisions taking place and so changed
    it to full-duplex.

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to John David Anglin on Sun Apr 22 21:20:01 2018
    On 04/21/2018 02:22 AM, John David Anglin wrote:
    From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
    The MAU interface is definitely half duplex and the word duplex is not
    mentioned in the manual.

    I also didn't find any info about half-/full-duplex in the two manuals I
    have at hand for the 712/80 ("Service Handbook" and "Technical Reference
    Manual"). To be sure, which one did you consult?


    The 10BASE-T port probably doesn't support auto negotiation, so you will
    need to manually set the switch port to 10BASE-T half duplex if it doesn't
    automatically configure to this mode when auto negotiation fails.

    I did this at first but then went for full-duplex again. Today I started
    with full-duplex and active cooling of the heatsink (now smoothed and
    with fresh thermal grease applied) of the 712/80's processor, but that
    alone didn't help. The issue hit me after entering the password during
    login.

    Then I reconfigured the port to half-duplex and tried again. The machine
    now got through the whole login and I could also do an `apt update`
    without issues afterwards. Then I left it alone for about twenty minutes,
    and on my return an `apt list --upgradable` triggered the issue again.

    :-/


    Some switches support a half-duplex back pressure form of flow control.

    I'll try that now. According to the documentation my switch can create back-pressure as a form of flow control.

    Cheers,
    Frank

  • From John David Anglin@21:1/5 to Frank Scheiner on Sun Apr 22 22:20:01 2018
    On 2018-04-22 3:17 PM, Frank Scheiner wrote:
    On 04/21/2018 02:22 AM, John David Anglin wrote:
    From the manual, it seems the 10BASE-T port is half duplex (CSMA/CD).
    The MAU interface is definitely half duplex and the word duplex is not
    mentioned in the manual.

    I also didn't find any info about half-/full-duplex in the two manuals
    I have at hand for the 712/80 ("Service Handbook" and "Technical
    Reference Manual"). To be sure, which one did you consult?

    I looked at the "Technical Reference".


    The 10BASE-T port probably doesn't support auto negotiation, so you
    will need to manually set the switch port to 10BASE-T half duplex if
    it doesn't automatically configure to this mode when auto negotiation
    fails.

    Did this at first but then went for full-duplex again. Today I started
    with full-duplex and actively cooling the heatsink (now smoothed and
    with fresh thermal grease applied) of the 712/80's processor, but that
    didn't help alone. The issue hit me after entering the password during
    login.

    Then I reconfigured half-duplex and tried again. The machine now
    worked through the whole login and I could also do an `apt update`
    without issues afterwards. Then I let it alone for about twenty
    minutes and on return I did an `apt list --upgradable` which triggered
    the issue again.

    Seems like a hardware problem, probably in the 712.  The switch and the
    712 need to be in the same mode.  If my supposition about the 712 only
    supporting half duplex is correct, then the switch will have to be in
    half duplex.  I think network boot and `apt update` would be a sufficient
    test of the network configuration.  Without error messages, this is hard.

    :-/


    Some switches support a half-duplex back pressure form of flow control.

    I'll try that now. According to the documentation my switch can create back-pressure as form of flow control.

    It's possible flow control on the server port might help, given that the
    712 is so slow and probably needs half duplex.  The switch might drop
    packets as a result.  However, IP usually adjusts for slow segments.

    Dave

    --
    John David Anglin dave.anglin@bell.net

  • From Frank Scheiner@21:1/5 to Frank Scheiner on Mon Apr 23 15:50:01 2018
    On 04/22/2018 09:17 PM, Frank Scheiner wrote:
    Some switches support a half-duplex back pressure form of flow control.

    I'll try that now. According to the documentation my switch can create back-pressure as form of flow control.

    Yesterday, after I activated flow control on the switch, the 712/80 came
    back after a while and finished the `apt list --upgradable` command with
    output - in between, systemd's journald crashed and restarted.
    Reissuing the same `apt [...]` command worked without problems. On the
    switch's port summary I could now also see that the host acting as NFS
    server was receiving pause frames sent by the switch - so the flow
    control is working.

    I then tried to install `joe` and when `update-alternatives` started it
    again lost the connection to the NFS server. :-( It didn't recover from
    that - at least not during the time I waited for it - so I powered the
    712/80 down.

    I thought maybe switching back to System V init might ease the load a
    little bit for the 712/80, so I upgraded the file system with a c8000
    (incl. newer patch level for the kernel) and removed systemd afterwards
    (also from initramfs).

    I then ran some benchmarks without any issues in between.

    Today I still have the problems described in [1] when doing
    `apt install [...]` or `apt remove [...]`, but so far the 712/80 has
    recovered each time after a while, so it looks like an improvement to me.
    Look at the timings for `apt remove [...]`:

    ```
    root@hp-712:~# time apt remove -y joe
    [ 8794.150750] nfs: server 172.16.0.2 not responding, still trying
    [...]
    [ 8794.962227] nfs: server 172.16.0.2 not responding, still trying
    [ 8795.074226] nfs: server 172.16.0.2 OK
    [...]
    [ 8797.271834] nfs: server 172.16.0.2 OK
    [ 8802.242312] nfs: server 172.16.0.2 not responding, still trying
    [...]
    [ 9235.937478] nfs: server 172.16.0.2 OK
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages will be REMOVED:
    joe
    0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
    After this operation, 2,086 kB disk space will be freed.
    (Reading database ... 41128 files and directories currently installed.)
    Removing joe (4.6-1) ...
    update-alternatives: using /usr/bin/jmacs to provide /usr/bin/editor (editor) in auto mode
    update-alternatives: using /usr/bin/jpico to provide /usr/bin/editor (editor) in auto mode
    update-alternatives: using /bin/nano to provide /usr/bin/editor (editor) in auto mode
    Processing triggers for mime-support (3.60) ...
    [ 9357.992385] nfs: server 172.16.0.2 not responding, still trying
    [...]
    [10055.370493] nfs: server 172.16.0.2 not responding, still trying
    [10055.709731] nfs: server 172.16.0.2 OK
    [...]
    [10057.212469] nfs: server 172.16.0.2 OK

    real 22m0.853s
    user 1m3.264s
    sys 0m43.875s
    ```

    ...the `apt install -y joe` done beforehand took about 41 minutes. So
    the 712/80 can recover from the described problems, but package
    management should really be done from a more powerful machine, at least
    when running diskless.
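To put a number on how much of that run was spent waiting rather than computing, the `time` summary can be broken down. The Python sketch below parses the three lines from the output above. Treating `real - user - sys` as "time spent waiting" is only a rough approximation, and attributing it to NFS I/O is an assumption based on the kernel messages; `parse_time_output` is an illustrative name.

```python
import re

# Matches bash-style `time` lines such as "real 22m0.853s"
DURATION_RE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$")


def parse_time_output(text):
    """Return a dict mapping real/user/sys to seconds."""
    seconds = {}
    for line in text.splitlines():
        m = DURATION_RE.match(line.strip())
        if m:
            seconds[m.group(1)] = int(m.group(2)) * 60 + float(m.group(3))
    return seconds


sample = """\
real 22m0.853s
user 1m3.264s
sys 0m43.875s
"""

t = parse_time_output(sample)
cpu = t["user"] + t["sys"]
waiting = t["real"] - cpu  # rough proxy for time not spent on the CPU

print(f"CPU time: {cpu:.3f}s, waiting: {waiting:.3f}s "
      f"({100 * waiting / t['real']:.0f}% of wall-clock time)")
```

For this run the CPU was busy for under two minutes of the 22 minutes, which fits the picture of the machine mostly waiting for the NFS server connection to recover.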

    [1]: https://lists.debian.org/debian-hppa/2018/04/msg00007.html

    Cheers,
    Frank
