• Bizarro comms issue

    From Daemon Can@21:1/5 to All on Fri Jul 17 12:24:53 2020
    I have a 44P-170 running AIX 4.3.

    Out of the blue, it started getting "hung" from a communications perspective. After boot, the console and telnet work fine for a time (anywhere from 3 minutes to a couple of hours), and then all connections stop working. The console appears to be connected (electrically: CD, RTS/CTS, etc.), but it stops responding to key presses or attempts to reset it. The system is pingable, and you can get an FTP "connection", but no actual transfer is possible (you can't list files, etc.).

    Occasionally, it'll come back to life for a short while, then it'll be gone again.
    Once in a while, I'll get a "respawning too quickly" message for tty25 on the console (that port is disabled).

    The only way to get it back is to hard reset it, and hope you get access to try a few things before it hangs up again.

    Has anybody ever seen something like this before? (This is my third RISC box in 30 years, and this is a new one for me.)

    Tks

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Daemon Can@21:1/5 to All on Fri Jul 24 08:56:39 2020
    More info:

    Ran diagnostics (pre-boot). System claims everything is OK. Put the system into single user mode overnight, and it was still responding this morning. Rebooted into multi-user, and it locked up shortly after displaying the login banner on the console.

  • From Grant Taylor@21:1/5 to Daemon Can on Fri Jul 24 23:07:45 2020
    On 7/24/20 9:56 AM, Daemon Can wrote:
    Ran diagnostics (pre-boot). System claims everything is OK. Put the
    system into single user mode overnight, and it was still responding
    this morning. Rebooted into multi-user, and it locked up shortly
    after displaying the login banner on the console.

    Try multi-user with the network disconnected. See if the console still responds a day later.

    4.3 is rather long in the tooth. Is there any chance that something is attacking it across the network?
    --
    Grant. . . .
    unix || die

  • From Daemon Can@21:1/5 to Grant Taylor on Sat Jul 25 22:20:15 2020
    On Saturday, 25 July 2020 01:07:44 UTC-4, Grant Taylor wrote:


    Try multi-user with the network disconnected. See if the console still responds a day later.


    Yeah, tried that. Also with the RANs turned off.

    4.3 is rather long in the tooth. Is there any chance that something is attacking it across the network?

    It's possible. We've been adding a fair number of remote users recently, so one of their PCs might be "pinging" the crap out of it.

    Update: Over a few reboots, in the window I had before the system would lock up, I went into the inittab and disabled everything that I thought might cause a comms issue (faxserver, unused serial ports, etc.). I also stopped the Progress database servers from coming up after boot. The system remained up after this (and is still running fine in multi-user). I was even able to start Progress once I was satisfied that things were going to stay up and running.

    At this point, I have to conclude that it was one of the serial port processes that was swamping the machine's ability to communicate (time will tell). I've already advised people that, given this is a nearly 20-year-old machine and that we've been on its successor system (Windows based) for over a year now, perhaps it's time they stopped relying so heavily on its services and data (it could give up the ghost any day).

    Thank you, Grant, for your suggestions and for taking the time to respond.
