• computer (MGA8) randomly freezes

    From William Unruh@2:250/1 to All on Fri Jul 28 02:16:54 2023
    I have a disturbing system, which every once in a while freezes. The
    screen on the monitor is some X scene (usually has Chrome running, but
    then again that is often the case) but the keyboard and mouse do nothing.
    Trying to log on from the network fails with no response from the
    machine. Alt-ctrl-del does nothing. The only way to recover is via the
    power switch on the back.
    Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing
    significant that I can see just before the freeze.

    Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

    The only thing in the journalctl logs from just before is

    -------------------------------
    Jul 24 23:22:34 tunnel.physics.ubc.ca dnf[2573748]: teams 8.9 kB/s | 1.5 kB 00:00
    Jul 24 23:22:35 tunnel.physics.ubc.ca dnf[2573748]: Metadata cache created.

    Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: dnf-makecache.service: Succeeded.
    Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: Finished dnf makecache.
    Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: dnf-makecache.service: Consumed 1.527s CPU time.
    Jul 24 23:28:00 tunnel.physics.ubc.ca kernel: Shorewall:sshd-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=183.106.205.242 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=64660 PROTO=TCP SPT=31651 DPT=22 WINDOW=16988 RES=0x00 SYN URGP=0
    Jul 24 23:35:31 tunnel.physics.ubc.ca kernel: Shorewall:net-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=185.225.74.53 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=33231 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0
    -- Reboot --
    ---------------------------------------


    I recently replaced the power supply thinking it might be the cause.
    But although freezes seem to be occurring less frequently, it is still
    occasionally freezing. (It seems to be occurring every two or three
    months).




    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Bit Twister@2:250/1 to All on Fri Jul 28 07:26:38 2023
    Offhand it seems like the CPU gets into a tight loop and quits processing interrupts.

    I would install lm_sensors, configure and run lm_sensors and sensord, and enable core dumps.

    I have also modified ~/.bash_profile to check for core dump files and
    use xmessage to pop up a notice if any are found.
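
    A minimal sketch of such a login-time check, not the poster's actual
    script. The function names are hypothetical, and the /var/tmp default
    and the *.core suffix are assumptions that must match wherever
    kernel.core_pattern writes its files.

```shell
# Sketch of a login-time check for core dump files.
# CORE_DIR is an assumption; point it at wherever your core_pattern writes.
CORE_DIR="${CORE_DIR:-/var/tmp}"

# Print the path of every *.core file in the given directory, one per line.
list_core_files() {
    dir="${1:-$CORE_DIR}"
    find "$dir" -maxdepth 1 -type f -name '*.core' 2>/dev/null
}

# Pop up a notice (or fall back to stderr) if any core files exist.
notify_core_files() {
    found="$(list_core_files "$@")"
    [ -n "$found" ] || return 0
    if [ -n "$DISPLAY" ] && command -v xmessage >/dev/null 2>&1; then
        # "-file -" makes xmessage read the text from standard input.
        printf 'Core files found:\n%s\n' "$found" | xmessage -file - &
    else
        printf 'Core files found:\n%s\n' "$found" >&2
    fi
}
```

    Calling notify_core_files at the end of ~/.bash_profile would run the
    check at every login.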


  • From Richard Kettlewell@2:250/1 to All on Fri Jul 28 08:14:22 2023
    There’s a chance that the kernel had something to say that couldn’t be
    written to the logs due to the crash; unfortunately that can be rather
    hard to get hold of in this case. If the machine has a serial port then
    you might be able to get it to write kernel logs to that. But you might
    just be out of luck :-(
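
    As a rough illustration of the serial-port idea: the kernel takes
    console= options on its command line, and the helper below (a
    hypothetical name) only prints them. The port ttyS0 and the speed
    115200 are assumed defaults, and a second machine has to be attached
    to the port to capture the output.

```shell
# Build the kernel command-line options that mirror kernel messages to a
# serial port while keeping the normal screen console.
# ttyS0 and 115200 are assumed defaults; adjust to the actual port and speed.
serial_console_args() {
    port="${1:-ttyS0}"
    baud="${2:-115200}"
    # The last console= entry becomes /dev/console, so tty0 is listed last.
    printf 'console=%s,%s console=tty0\n' "$port" "$baud"
}
```

    The printed options would be appended to the kernel command line in the
    boot loader configuration (on GRUB 2 systems that is typically the
    GRUB_CMDLINE_LINUX_DEFAULT setting in /etc/default/grub).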

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Fri Jul 28 09:27:47 2023
    This reminds me of an issue I have had when running two plasma5
    sessions: after a while they eat up all the memory, and when it is
    gone everything freezes. You can't log in remotely, and a local
    login takes so long between entering the username and entering the
    password that the login is canceled.

    I would run a memory check once in a while, for example use crontab to
    run this one-liner:

    { date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
    echo "----"; } >> /var/log/memusage.log

    Then you can see if there is a process that grows.
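
    For use from cron, the same snapshot could be wrapped in a small
    script. This is only a sketch: the function name is hypothetical, and
    the grouping braces are there so that the date and the ps output land
    in the log file together with the separator.

```shell
# Append a timestamped snapshot of the top memory consumers to a log file.
# The default log path follows the post; the function name is hypothetical.
log_memusage() {
    logfile="${1:-/var/log/memusage.log}"
    {
        date
        # Header line plus the ten processes using the most memory.
        ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11
        echo "----"
    } >> "$logfile"
}
```

    Saved as a script that ends with a call to log_memusage (say
    /usr/local/bin/memusage.sh, a made-up path), a crontab line such as
    "*/10 * * * * /usr/local/bin/memusage.sh" would take a snapshot every
    ten minutes.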

    --
    //Aho


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From Bit Twister@2:250/1 to All on Fri Jul 28 10:56:40 2023
    Cute, and with just a little bit of scripting/coding you could automate
    the check by flagging any line with a value above some watermark for CPU
    and memory percentage.

    If it was me, I would print any line greater than 1.0 for either one.

    Since the values have a decimal point, I would guess that I would have
    to use bc to test against the watermark value if using bash.

    A simple case statement could change the watermark values based on the
    command, where needed.
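
    One way this could look, as a sketch only: the command names and
    watermark values are made-up examples, and the floating-point test
    uses awk rather than bc (either can do the comparison; awk is used
    here simply because it is almost always installed).

```shell
# Pick a memory watermark (in percent) for a command name.
# The command names and values here are made-up examples.
watermark_for() {
    case "$1" in
        chrome|firefox) echo "5.0" ;;
        Xorg)           echo "2.0" ;;
        *)              echo "1.0" ;;
    esac
}

# Succeed if value $1 is greater than watermark $2.
# awk does the floating-point comparison here; bc would work as well.
above_watermark() {
    awk -v v="$1" -v w="$2" 'BEGIN { exit !(v + 0 > w + 0) }'
}
```

    A caller would then write something like
    above_watermark "$pmem" "$(watermark_for "$comm")" && echo "$comm is high".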

    If you want a journal entry it would be something like
    echo 'msg_here' | systemd-cat -t app_name_here -p type_msg_here

    Message types: emerg, alert, crit, err, warning, notice, info, debug
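
    A small wrapper around that command might look like the following
    sketch. The function names and the default tag "memwatch" are
    hypothetical; the priority names are the ones listed above.

```shell
# Succeed only for the priority names listed in the post.
valid_priority() {
    case "$1" in
        emerg|alert|crit|err|warning|notice|info|debug) return 0 ;;
        *) return 1 ;;
    esac
}

# Write one message to the journal. The tag "memwatch" is a made-up default.
journal_note() {
    priority="$1"; shift
    valid_priority "$priority" || { echo "bad priority: $priority" >&2; return 2; }
    echo "$*" | systemd-cat -t "${JOURNAL_TAG:-memwatch}" -p "$priority"
}
```

    For example, journal_note warning "chrome at 12.3% memory" writes one
    entry, which can be read back with journalctl -t memwatch.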

    In a multi-node setup I send an email to the LAN admin.

  • From Jim@2:250/1 to All on Fri Jul 28 15:23:27 2023
    Pure speculation, but I think an update to the lib rpms months ago
    introduced something that now and then causes the Gnome desktop
    on Wayland to lock up.

    My main machine and backup both have Intel 64-bit quad-core CPUs, with
    32 GB of RAM for the former and 4 GB for the latter. Loads imposed are
    trivial (mostly the Opera browser, with Firefox the alternative for any
    site Opera does not work well with).

    Now and then the desktop would lock up. When things were worst, the
    backup would lock up every few days or maybe every few weeks, and the
    main machine less often.

    Often, it would prove impossible to kill the browser or desktop.
    A power-switch restart is easier than shifting to the other
    machine and trying to ssh in to reboot, so that was my remedy.

    Being unable to come up with any hint of why, and given the rarity of
    the lock-ups, I have just waited in the expectation that someday
    (Mageia 9 perhaps) such problems will be ironed out.

    Lock-ups have been less frequent in the past couple of months.

    Cheers!

    jim b.




    --
    UNIX is not user-unfriendly, it merely
    expects users to be computer friendly.

  • From TJ@2:250/1 to All on Fri Jul 28 15:28:50 2023
    On 2023-07-27 21:16, William Unruh wrote:

    Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

    Did you mean you updated FROM that kernel?

    I hope so. Latest Mageia 8 kernel is 5.15.122-1.

    TJ

  • From Paul@2:250/1 to All on Fri Jul 28 22:25:12 2023
    This bug does not yield to linear thinking, unfortunately.

    It's not a hardware issue. There's something in the middle
    of the graphics stack, which runs at frame rate (VSYNC), and
    it has debounce for the mouse built into it. (Like, your Razer mouse?
    Bad, bad idea. You want a moldy old mouse with low DPI right now.
    Switch to PS/2 ports, if ya gottem.) It uses a timer. It gets behind.
    And it lunches the input subsystem.

    We're seeing this elsewhere. Or, at least, log messages of the same type
    as the bug. Have a look in your /var/log.

    Try a radical change of DE, and see if stability returns.

    As one developer described it "it's not your CPU which is too slow,
    it is the Mutter architecture which is too slow".

    Now, it's either that, or something has changed which has brought
    an old bug back.

    Paul

  • From J.O. Aho@2:250/1 to All on Fri Jul 28 23:02:20 2023
    On 7/28/23 11:56, Bit Twister wrote:
    Since values have a decimal I would guess that I would have to
    use bc to test greater than watermark value if using bash.

    You could use awk; I always have to point out this great tool, as it
    bears my name.

    date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem |
    awk -v min="0.3" '$6 >= min || $6=="%MEM"' && echo "----"

    Just change the -v min="0.3" to the smallest value you want
    displayed. The column names are still shown. Sure, there are
    better ways of doing this, but my skills are not that great.
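
    A variant of the awk filter that checks both the CPU and the memory
    column could be sketched as below. The function name and the 1.0
    defaults are assumptions. It counts fields from the end of the line,
    because a command name containing a space (for example "Web Content")
    would otherwise shift the fixed column number.

```shell
# Keep the header line plus any process line whose %CPU or %MEM is at or
# above the given minimums. Reads "ps -Ao user,uid,comm,pid,pcpu,pmem"
# style output on stdin.
# Fields are counted from the end of the line because a command name may
# contain spaces, which would shift a fixed column number.
filter_usage() {
    awk -v min_cpu="${1:-1.0}" -v min_mem="${2:-1.0}" '
        NR == 1 { print; next }   # header row
        $(NF-1) + 0 >= min_cpu + 0 || $NF + 0 >= min_mem + 0
    '
}
```

    Usage would be along the lines of
    ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | filter_usage 1.0 1.0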


    In a multi-node set up I send an email to LAN admin.

    In a multi-node setup I would use Icinga or Nagios to monitor things;
    sure, you can do the same with ELK too.

    --
    //Aho

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Fri Jul 28 23:14:16 2023
    On 2023-07-28, Bit Twister <BitTwister@mouse-potato.com> wrote:
    Off hand it seems like the cpu gets into a tight loop and quits processing interrupts.

    I would install lm_sensors, configure/run lm_sensors, sensord and enable core dump.

    I have also modified ~/.bash_profile to check for core dump files and
    uses xmessage to provide a pop up if any are found.


    OK, here are the last two sensord reports from just before the freeze.
    I do not see anything out of the ordinary in them. The last report was
    logged at 23:28:55; the freeze (as inferred from the last entry in
    /var/log/syslog) occurred around 23:35:31.

    --------------------------------
    Jul 24 23:35:31 tunnel kernel: [1181287.585321] Shorewall:net-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=185.225.74.53 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=33231 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0
    Jul 25 09:13:25 tunnel kernel: [ 0.000000] microcode: microcode updated early to revision 0xf0, date = 2021-11-12
    --------------------------------------

    ----------------------------------
    Jul 24 23:08:55 tunnel sensord: Chip: nvme-pci-0600
    Jul 24 23:08:55 tunnel sensord: Adapter: PCI adapter
    Jul 24 23:08:55 tunnel sensord: Composite: 24.9 C (min = -40.1 C, max = 83.8 C)
    Jul 24 23:08:55 tunnel sensord: Sensor 2: 24.9 C (min = -40.1 C, max = 83.8 C)
    Jul 24 23:08:55 tunnel sensord: Chip: coretemp-isa-0000
    Jul 24 23:08:55 tunnel sensord: Adapter: ISA adapter
    Jul 24 23:08:55 tunnel sensord: Package id 0: 29.0 C
    Jul 24 23:08:55 tunnel sensord: Core 0: 28.0 C
    Jul 24 23:08:55 tunnel sensord: Core 1: 27.0 C
    Jul 24 23:08:55 tunnel sensord: Core 2: 28.0 C
    Jul 24 23:08:55 tunnel sensord: Core 3: 27.0 C
    Jul 24 23:08:55 tunnel sensord: Chip: acpitz-acpi-0
    Jul 24 23:08:55 tunnel sensord: Adapter: ACPI interface
    Jul 24 23:08:55 tunnel sensord: temp1: 27.8 C
    Jul 24 23:28:55 tunnel sensord: Chip: nvme-pci-0600
    Jul 24 23:28:55 tunnel sensord: Adapter: PCI adapter
    Jul 24 23:28:55 tunnel sensord: Composite: 24.9 C (min = -40.1 C, max = 83.8 C)
    Jul 24 23:28:55 tunnel sensord: Sensor 2: 24.9 C (min = -40.1 C, max = 83.8 C)
    Jul 24 23:28:55 tunnel sensord: Chip: coretemp-isa-0000
    Jul 24 23:28:55 tunnel sensord: Adapter: ISA adapter
    Jul 24 23:28:55 tunnel sensord: Package id 0: 29.0 C
    Jul 24 23:28:55 tunnel sensord: Core 0: 28.0 C
    Jul 24 23:28:55 tunnel sensord: Core 1: 27.0 C
    Jul 24 23:28:55 tunnel sensord: Core 2: 27.0 C
    Jul 24 23:28:55 tunnel sensord: Core 3: 27.0 C
    Jul 24 23:28:55 tunnel sensord: Chip: acpitz-acpi-0
    Jul 24 23:28:55 tunnel sensord: Adapter: ACPI interface
    Jul 24 23:28:55 tunnel sensord: temp1: 27.8 C
    ----------------------------------------------


    I cannot find any core files, and I do not think I am suppressing them.



  • From William Unruh@2:250/1 to All on Fri Jul 28 23:20:40 2023
    On 2023-07-28, J.O. Aho <user@example.net> wrote:
    This reminds me of the issue I have had with running two plasma5
    sessions, after a while they will just eat up the memory and when all is gone, then everything freezes, you can't login from remote and local
    login takes too long from entering username to entering password that
    the login is canceled.

    Except in my case, nothing I type does anything, the mouse cursor does
    not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, some things still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep.)


  • From Bit Twister@2:250/1 to All on Sat Jul 29 01:02:38 2023
    I suggest that you set the maximum allowed value for the temps.


    I cannot find any core files, and I do not think I am suppressing them.


    Pretty sure they are not enabled by default.
    You have to make configuration file changes.
    Quick scan of my install/change scripts finds this custom
    drop-in file and settings

    # grep core /etc/sysctl.d/xx__sysctl.conf
    # Enabling suid dump PID appending and set core location and name
    kernel.core_uses_pid = 1
    kernel.core_pattern = /var/tmp/%e_%p_%s.core
    # net.core.rmem_max = 1048576
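
    A quick way to see whether core dumps are currently enabled is to look
    at the shell's core size limit and the kernel's core settings. This is
    only a sketch; the function name show_core_settings is made up for
    illustration. Note that these settings govern core files from crashing
    user processes, so a hard kernel freeze would not be expected to leave
    one behind.

```shell
# Print the settings that decide whether (and where) a crashing
# user process leaves a core file. Hypothetical helper name.
show_core_settings() {
    # 0 means core files are suppressed for processes started from this shell
    echo "core file size limit: $(ulimit -c)"
    # Where the kernel writes cores; this may also be a pipe to a handler
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null || echo unknown)"
    echo "core_uses_pid: $(cat /proc/sys/kernel/core_uses_pid 2>/dev/null || echo unknown)"
}

show_core_settings
```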


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Bit Twister@2:250/1 to All on Sat Jul 29 01:17:45 2023
    On Sat, 29 Jul 2023 00:02:20 +0200, J.O. Aho wrote:
    On 7/28/23 11:56, Bit Twister wrote:
    On Fri, 28 Jul 2023 10:27:47 +0200, J.O. Aho wrote:
    On 7/28/23 03:16, William Unruh wrote:
    I have a disturbing system, which every once in a while freezes. The
    sceen on the monitor is some X scene (usually has Chrome running, but
    they again that is often the case) but the keyboard,mouse, do nothing.
    Trying to log on from the network fails with no response from the
    machine. Alt-ctrl-del does nothing. The only way to recover is via the
    power switch on the back.
    Afterwards, looking at /var/log/syslog, or /var/log/messages shows
    nothingsignificant that I can see just before the freeze.

    This reminds me of the issue I have had with running two plasma5
    sessions, after a while they will just eat up the memory and when all is
    gone, then everything freezes, you can't login from remote and local
    login takes too long from entering username to entering password that
    the login is canceled.

    I would run a memory check once in a while, for example use crontab to
    run this one liner:

    date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
    echo "----" >> /var/log/memusage.log

    Then you can see if there is a process that grows.

    Cute, and with just a little bit of scripting/coding you could automate
    the check by flagging any line with value above some watermark for cpu
    and memory percentage.

    If it was me, I would print any greater than 1.0 for ether one.

    Since values have a decimal I would guess that I would have to
    use bc to test greater than watermark value if using bash.

    You could use awk, have to always point out this great tool as it bears
    my name

    date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v
    min="0.3" '$6 >= min || $6=="%MEM"' && echo "----"

    Just change the -v min="0.3" to the value you think it has be or more to
    be displayed, we also keep displaying the column names, sure there is
    better ways of doing this, but my skills not that great.

    Yeah, but different nodes can have different apps needing higher limits.
    I'll be using bash and a case statement to change max values based on app.

    for example on my myth node I have
    # ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11
    USER       UID COMMAND             PID %CPU %MEM
    bittwis+  1500 mythfrontend       4124 15.4  6.8
    mysql      976 mysqld              780  1.1  3.2
    mythtv      90 mythbackend        3804  1.5  3.0
    bittwis+  1500 net_applet         2223  0.2  1.3
    bittwis+  1500 xfwm4              2153  1.6  1.2

    yet my normal web browsing node has
    ]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11
    USER       UID COMMAND             PID %CPU %MEM
    root         0 Xorg               3147  0.4  0.9
    bittwis+  1500 xfwm4              4645  0.1  0.7
    bittwis+  1500 xfdesktop          4661  0.0  0.7
    mysql      978 mysqld             3194  0.0  0.5
    esept     1513 geany              9858  0.1  0.3
    root         0 geany              9953  0.4  0.3
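
    The per-app limits described above could be sketched roughly as
    follows. The helper names (mem_limit_for, exceeds, flag_hogs) and the
    watermark numbers are assumptions made up for illustration. Because
    shell arithmetic is integer-only, the decimal comparison is handed to
    awk instead of bc.

```shell
# Return the %MEM watermark for a command name. The names and numbers
# here are illustrative assumptions, not values from the thread.
mem_limit_for() {
    case "$1" in
        mythfrontend)       echo 10.0 ;;
        mysqld|mythbackend) echo 5.0 ;;
        *)                  echo 1.0 ;;
    esac
}

# Succeed when $1 is greater than $2 (decimal comparison done by awk).
exceeds() {
    awk -v val="$1" -v max="$2" 'BEGIN { exit !(val + 0 > max + 0) }'
}

# Read "%mem command" lines on stdin and report those over their limit.
# read puts the rest of the line into comm, so names with spaces survive.
flag_hogs() {
    while read -r mem comm; do
        _limit=$(mem_limit_for "$comm")
        if exceeds "$mem" "$_limit"; then
            echo "$comm uses ${mem}% of memory (limit ${_limit}%)"
        fi
    done
}

# Empty headers (pmem=, comm=) suppress the header row
ps -A -o pmem= -o comm= --sort=-pmem | flag_hogs
```

    A cpu check would follow the same pattern with pcpu in place of pmem.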



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sat Jul 29 09:18:32 2023
    On 7/29/23 00:20, William Unruh wrote:
    On 2023-07-28, J.O. Aho <user@example.net> wrote:
    On 7/28/23 03:16, William Unruh wrote:
    I have a disturbing system, which every once in a while freezes. The
    sceen on the monitor is some X scene (usually has Chrome running, but
    they again that is often the case) but the keyboard,mouse, do nothing.
    Trying to log on from the network fails with no response from the
    machine. Alt-ctrl-del does nothing. The only way to recover is via the
    power switch on the back.
    Afterwards, looking at /var/log/syslog, or /var/log/messages shows
    nothingsignificant that I can see just before the freeze.

    This reminds me of the issue I have had with running two plasma5
    sessions, after a while they will just eat up the memory and when all is
    gone, then everything freezes, you can't login from remote and local
    login takes too long from entering username to entering password that
    the login is canceled.

    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    That is exactly what I see. As for the swap you mention, I can say that
    even if you disable swap you will get the same behavior.

    I think you should monitor the memory usage, you could add this to your /etc/crontab

    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    This will run every 5 minutes and give you a list of all processes that
    use 10% or more of your memory. Then you take a look at /var/log/mem.log
    from time to time, or after a freeze, and see which processes were there.
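
    Two details of the crontab line are worth checking before relying on it.
    In a crontab, an unescaped % in the command is turned into a newline,
    which would affect the "%MEM" test, and a >> placed inside the
    parentheses applies only to the final echo. One way around both is a
    small wrapper script, sketched here; the script path and the function
    name log_mem_hogs are assumptions, not something from the thread.

```shell
#!/bin/sh
# Hypothetical wrapper script; the path /usr/local/bin/memlog.sh used in the
# crontab line at the bottom is an assumption.

# log_mem_hogs LOGFILE MIN: append a timestamp, the ps header, every process
# using at least MIN percent of memory, and a separator line to LOGFILE.
log_mem_hogs() {
    {
        date
        # comm is printed last so a name with spaces cannot shift %MEM
        ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem |
            awk -v min="$2" 'NR == 1 || $5 >= min'
        echo "----"
    } >> "$1"    # the redirect covers the whole group, not just the echo
}

# Only act when called with arguments, e.g. from /etc/crontab (one line):
#   */5 * * * * root /usr/local/bin/memlog.sh /var/log/mem.log 10.0
if [ "$#" -ge 2 ]; then
    log_mem_hogs "$1" "$2"
fi
```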

    What you can also do is keep an ssh connection from another machine
    always open; that way you may be able to kill processes. You will not
    be able to run ps, top, htop or other tools, as they will be so slow that
    they block your shell, and then all you can do is reset the machine
    by pressing the button. If your issue is the desktop environment, I
    would recommend switching to another one. Enlightenment, I think, may
    have the fewest of these issues, but I'm no fan of it; at the moment I'm
    using lxqt.

    --
    //Aho



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From Richard Kettlewell@2:250/1 to All on Sat Jul 29 10:44:07 2023
    William Unruh <unruh@invalid.ca> writes:
    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    Yes. Normal behavior for running out of RAM is swapping, and for running
    out of RAM+swap is to start killing user processes.

    In this case:

    | Trying to log on from the network fails with no response from the
    | machine.

    Does it respond to ping?

    If so then the kernel’s still working, at least a bit (the lack of
    kernel logs suggests everything above that is dead).

    If it does not ping then the kernel has crashed, either due to a kernel
    bug or a hardware fault.
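
    To have the ping answer on record for the next freeze, a second machine
    could probe the box once a minute and log the result. A minimal sketch,
    assuming Linux iputils ping; the function names, the host name "tunnel"
    as a reachable name, and the log path are placeholders.

```shell
# Run on a second machine. Host name and log path are placeholders.
probe() {
    # -c 1: send one echo request; -W 2: wait at most 2 seconds for a reply
    if ping -c 1 -W 2 "$1" > /dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# Timestamp first, so the last "up" before a freeze is easy to spot
status_line() {
    echo "$(date '+%F %T') $1 $2"
}

# Example loop, commented out so that pasting the block has no side effects:
# while sleep 60; do
#     status_line tunnel "$(probe tunnel)" >> "$HOME/tunnel-ping.log"
# done
```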

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (2:250/1@fidonet)
  • From David W. Hodgins@2:250/1 to All on Sat Jul 29 14:09:32 2023
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces, so by the time
    awk gets it the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
    USER       UID COMMAND             PID %CPU %MEM
    dave       500 opera              6969  2.8  4.7
    ddclient   468 ddclient - slee    5691  0.0  0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    Regards, Dave Hodgins

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sat Jul 29 16:39:48 2023
    On 7/29/23 15:09, David W. Hodgins wrote:
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5  * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces so by time awk
    gets it the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
    USER       UID COMMAND             PID %CPU %MEM
    dave       500 opera              6969  2.8  4.7
    ddclient   468 ddclient - slee    5691  0.0  0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    Regards, Dave Hodgins

    Easy fix:

    ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="3.0" '$5 >= min || $5=="%MEM"'

    --
    //Aho



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Sat Jul 29 18:12:05 2023
    On 2023-07-29, David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces so by time awk gets it
    the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'

    How about $NF instead of $6?

    USER UID COMMAND PID %CPU %MEM
    dave 500 opera 6969 2.8 4.7
    ddclient 468 ddclient - slee 5691 0.0 0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    Regards, Dave Hodgins
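
    The $NF idea can be tried against the sample output quoted above without
    touching a live system. In this sketch the function names are made up;
    the data rows are the ones from the example. Since %MEM is the last
    column in this field order, $NF always picks it up, however many words
    the command name contains.

```shell
# Sample ps output taken from the example quoted above
sample() {
    cat <<'EOF'
USER       UID COMMAND             PID %CPU %MEM
dave       500 opera              6969  2.8  4.7
ddclient   468 ddclient - slee    5691  0.0  0.0
EOF
}

# With $6, the ddclient line is tested on its PID (5691) and slips through.
# $NF is always the last field, which is %MEM in this column order.
filter_by_last_field() {
    awk -v min="$1" 'NR == 1 || $NF >= min'
}

sample | filter_by_last_field 3.0
```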

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Sat Jul 29 18:16:07 2023
    On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:
    William Unruh <unruh@invalid.ca> writes:
    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    Yes. Normal behavior for running out of RAM is swapping, and for running
    out of RAM+swap is to start killing user processes.

    Yes, and there is no evidence for that.


    In this case:

    | Trying to log on from the network fails with no response from the
    | machine.

    Does it respond to ping?

    No.


    If so then the kernel’s still working, at least a bit (the lack of
    kernel logs suggest everything above that is dead).

    If it does not ping then the kernel has crashed, either due to a kernel
    bug or a hardware fault.

    That is sure what it looks like. Weird thing is that the video card is
    still sending out the last image, so it is running.



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Bit Twister@2:250/1 to All on Sat Jul 29 19:52:19 2023
    On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces so by time awk gets it
    the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
    USER UID COMMAND PID %CPU %MEM
    dave 500 opera 6969 2.8 4.7
    ddclient 468 ddclient - slee 5691 0.0 0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    My bash solution
    while read -r line ; do
    set -- $line
    _bin=$3
    shift $(( $# - 2 ))
    _cpu=$1
    _mem=$2
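
    The loop above is cut off in this post. One way it might be completed
    is sketched below; the wrapper function parse_ps, the header check and
    the final echo are assumptions added for illustration, not part of the
    original. It takes the third word as the name and the last two words as
    %CPU and %MEM, so extra words in the command name do no harm.

```shell
# Reads ps output in the order user,uid,comm,pid,pcpu,pmem on stdin and
# prints "first-word-of-comm cpu mem" for each process line.
parse_ps() {
    while read -r line; do
        set -f                          # no filename expansion while splitting
        set -- $line
        set +f
        [ "$#" -ge 6 ] || continue      # ignore blank or short lines
        [ "$1" = "USER" ] && continue   # skip the header row
        _bin=$3
        shift $(( $# - 2 ))             # keep only the last two words
        _cpu=$1
        _mem=$2
        echo "$_bin $_cpu $_mem"
    done
}

ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 | parse_ps
```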

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sat Jul 29 20:45:33 2023
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    Yes. Normal behavior for running out of RAM is swapping, and for running
    out of RAM+swap is to start killing user processes.

    The problem is that swapping Xorg and the desktop environment in and out
    makes things extremely slow. The tradition in Linux is to kill a random
    process (not sure if that has changed nowadays), so the likelihood that
    the right process is killed is slim. If the wrong one is killed, the
    swapping will not end and the system stays slow; and even if the right
    process is killed, in reality it takes hours before anything happens
    because of the swapping in and out.

    My experience is that things do not get better even if you disable swap
    (you seldom really need swap when you have 64GB RAM); for some reason the
    disk seems to be as active as during the swapping in and out. Could it
    have to do with zram?

    Something that makes a difference is setting a hard memory limit on the
    process, then it will be killed when it tries to use more RAM than the
    limit.
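
    As a rough illustration of such a limit, the shell's ulimit can cap the
    address space of one command. This is a sketch under assumptions: the
    function name run_limited and the example program name are made up, and
    with ulimit -v the process gets allocation failures (which usually make
    it abort) rather than being killed by the kernel. A cgroup limit such as
    systemd's MemoryMax= is the other common route.

```shell
# Run a command with its virtual address space capped at $1 KiB.
# The subshell keeps the limit from affecting the calling shell.
run_limited() {
    (
        ulimit -v "$1" || exit 1
        shift
        exec "$@"
    )
}

# Example: cap a hypothetical leaky program at 4 GiB (4194304 KiB)
# run_limited 4194304 some_leaky_program
```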

    Of course at this point we don't know what the issue is for the OP, and
    he doesn't use plasma5, so I doubt it's plasmashell that is his issue.

    --
    //Aho

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From David W. Hodgins@2:250/1 to All on Sat Jul 29 21:09:21 2023
    On Sat, 29 Jul 2023 14:52:19 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:

    On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces so by time awk gets it
    the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
    USER UID COMMAND PID %CPU %MEM
    dave 500 opera 6969 2.8 4.7
    ddclient 468 ddclient - slee 5691 0.0 0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    My bash solution
    while read -r line ; do
    set -- $line
    _bin=$3
    shift $(( $# - 2 ))
    _cpu=$1
    _mem=$2

    The solution of putting the comm field last works fine.

    $ ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="2.0" '$5 >= min || $5=="%MEM"'
    USER       UID     PID %CPU %MEM COMMAND
    dave       500    6969  3.7  4.9 opera
    dave       500   33667  7.5  4.0 firefox
    dave       500    6186  0.7  2.9 plasmashell
    dave       500   34910  1.3  2.3 Isolated Web Co
    dave       500   34975  0.8  2.3 Isolated Web Co
    dave       500   53719  2.1  2.0 Isolated Web Co

    The Isolated Web Content processes are all firefox.

    Regards, Dave Hodgins

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Sat Jul 29 23:39:00 2023
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    Yes. Normal behavior for running out of RAM is swapping, and for running
    out of RAM+swap is to start killing user processes.

    The problem is that the swapping in and out Xorg and the desktop
    environment makes things extremely slow and the tradition in Linux is to
    kill a random process (not sure if it's changed nowadays), the
    likelihood that the right process is killed is slim and if not the right
    one is killed, the swapping will not end and the swap will keep the
    system slow and even if it kills the right process it will in reality
    take hours before anything happens due of the swap in and out.

    My experience is that things do not get better even if you disable swap
    (seldom you really need swap when having 64GB RAM), for some reason it
    seems to be the disk is as active as during the swap in / out. Could it
    have to do with zram?

    Something that makes a difference is setting a hard memory limit on the
    process, then it will be killed when it tries to use more RAM than the
    limit.

    Of course at this point we don't know what the issues is for OP and he
    don't use plasma5, so I doubt it's plasmashell that is his issue.

    Actually I do use Plasma and sddm.
    But as I said there is little indication beforehand that it is swapping
    that is the problem (in the times that it happened to me while I was
    working on the system; as I said, this last time it occurred just before
    bedtime and I discovered it the next morning).



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Bit Twister@2:250/1 to All on Sun Jul 30 04:40:43 2023
    On Sat, 29 Jul 2023 16:09:21 -0400, David W. Hodgins wrote:
    On Sat, 29 Jul 2023 14:52:19 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:

    On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:
    On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:
    */5 * * * * root (date && ps -Ao user,uid,comm,pid,pcpu,pmem
    --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="%MEM"' && echo
    "----" >> /var/log/mem.log)

    That is not reliable. /proc/$PID/comm may contain spaces so by time awk gets it
    the column number is wrong.

    [dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
    USER UID COMMAND PID %CPU %MEM
    dave 500 opera 6969 2.8 4.7
    ddclient 468 ddclient - slee 5691 0.0 0.0
    [dave@x3 ~]$ cat /proc/5691/comm
    ddclient - slee
    [dave@x3 ~]$ cat /proc/5691/cmdline
    ddclient - sleeping for 90 seconds[dave@x3 ~]$

    I don't see a reliable field separator.

    My bash solution
    while read -r line ; do
    set -- $line
    _bin=$3
    shift $(( $# - 2 ))
    _cpu=$1
    _mem=$2

    The solution of putting the comm field last works fine.

    $ ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="2.0" '$5 >= min || $5=="%MEM"'
    USER UID PID %CPU %MEM COMMAND
    dave 500 6969 3.7 4.9 opera
    dave 500 33667 7.5 4.0 firefox
    dave 500 6186 0.7 2.9 plasmashell
    dave 500 34910 1.3 2.3 Isolated Web Co
    dave 500 34975 0.8 2.3 Isolated Web Co
    dave 500 53719 2.1 2.0 Isolated Web Co

    Well, damn, got to go get crowbar to get head out of derriere.
    Nice solution.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From David W. Hodgins@2:250/1 to All on Sun Jul 30 05:09:02 2023
    On Sat, 29 Jul 2023 23:40:43 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:
    Nice solution.

    It was J.O. Aho that posted that solution.

    Regards, Dave Hodgins

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Richard Kettlewell@2:250/1 to All on Sun Jul 30 11:36:00 2023
    William Unruh <unruh@invalid.ca> writes:
    On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Does it respond to ping?

    No.
    [...]
    If it does not ping then the kernel has crashed, either due to a kernel
    bug or a hardware fault.

    That is sure what it looks like. Weird thing is that the video card is
    still sending out the last image, so it is running.

    The video card keeps transmitting to the display under its own steam; it
    doesn’t need the kernel to tell it to do so.

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sun Jul 30 11:54:33 2023
    On 7/30/23 00:39, William Unruh wrote:
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    Except in my case, nothing I type does anything, the mouse cursor does
    not move, Teh Alt-ctrl-F keys do nothing, alt-ctrl-det or alt-ctrl-bksp
    do nothing, gkrellm stops updating. Usually when I have run out of
    memory, somethings still work, and the machine slows down drastically as
    it starts to swap. This is just a complete sudden freeze. (Unlike this
    past time, sometimes this has happened while I am working on the
    machine. This time it happened while I was asleep)

    Yes. Normal behavior for running out of RAM is swapping, and for running
    out of RAM+swap is to start killing user processes.

    The problem is that the swapping in and out Xorg and the desktop
    environment makes things extremely slow and the tradition in Linux is to
    kill a random process (not sure if it's changed nowadays), the
    likelihood that the right process is killed is slim and if not the right
    one is killed, the swapping will not end and the swap will keep the
    system slow and even if it kills the right process it will in reality
    take hours before anything happens due of the swap in and out.

    My experience is that things do not get better even if you disable swap
    (seldom you really need swap when having 64GB RAM), for some reason it
    seems to be the disk is as active as during the swap in / out. Could it
    have to do with zram?

    Something that makes a difference is setting a hard memory limit on the
    process, then it will be killed when it tries to use more RAM than the
    limit.

    Of course at this point we don't know what the issues is for OP and he
    don't use plasma5, so I doubt it's plasmashell that is his issue.

    Actually I do use Plasma and sddm.
    But as I said there is little indication beforehand that is is swapping
    that is the problem (in the times that it happened to me while I was
    working on the system.-- as I said this last time it occured just before bedtime and I discovered it next morning. )

    Not sure about your setup. I have an nVidia RTX 2060 with the closed
    source driver (generally the latest), a kernel that tends to be the
    latest from the distribution, and the latest version of kde/plasma.

    Running two Xorg with their own plasma5, one with weather applet and
    directory display and the other one with just the directory display.
    Fixed image background on both.

    The memory leak starts at quite random times, but at the latest 2 days
    after plasmashell is started it will have begun to eat up a lot of
    memory, first slowly and then faster and faster. When using strace it
    seems plasmashell is trying to connect somewhere and gets a timeout;
    during the timeout period it manages to queue 3-4 more requests (I
    haven't figured out where it tries to connect). Usually it's just one of
    the plasmashells that hogs a lot of memory, like 30-40GB, while the
    other may be just around 10GB (which one uses the most is random).

    When the RAM runs out, the disk activity starts (remember that I do not
    have swap), and once the disk activity has begun Xorg becomes so slow
    that if you move the mouse pointer you will see it move a small distance
    and then freeze. It is now impossible to ssh to the machine; if you
    already have an ssh connection open you can still type something, but it
    will be as slow as a 120 baud modem. If you are already root ('su -'
    before it happens), you can run 'killall -9 plasmashell' and after a
    short while the machine will be somewhat usable again. Then you just
    need to start plasmashell for each user. This tends to be a two step
    process: first "su - username" and run "DISPLAY=:X.0 plasmashell
    --replace", then switch to that user's VT and run "plasmashell
    --replace" from konsole, and then repeat the process with the next user.

    There is a bug report on this. I kept updating it for each version of
    kde/plasma I had installed and from time to time sent in more logs, but
    the progress has been 0%.

    There is a known bug in the wallpaper slideshow, but I never used that
    one, so it's not the one affecting me.

    Nowadays I don't use plasma5 at all. I will give plasma6 a try when
    it's in the distro's repository, but until then I will keep on using
    lxqt, and I suggest you also look for another desktop environment to use
    until plasma6 is out for your distro.

    --
    //Aho



    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From Paul@2:250/1 to All on Sun Jul 30 12:01:12 2023
    On 7/30/2023 6:36 AM, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Does it respond to ping?

    No.
    [...]
    If it does not ping then the kernel has crashed, either due to a kernel
    bug or a hardware fault.

    That is sure what it looks like. Weird thing is that the video card is
    still sending out the last image, so it is running.

    The video card keeps transmits to the display under its own steam, it doesn’t need the kernel to tell it to do so.


    For a non-tearing display, I would expect double buffering, and some
    sort of command issued to aid compositing (Z-axis sorting). That may
    require a minimum of some sort of command list to implement compositing.
    A command list issued at least, 60 times a second.

    And there's more than one level of compositing going on, as a web browser
    also composites, even when its "window" is "stable". It is still composing
    the same stable image, 60 times a second.

    Even when a computer is idle, modern developers have made it busy.
    In ways that were never present in the past. There was a time, when
    a computer was actually idle -- if you moved a window, it required a
    redraw, but at least there was proportionate behavior to user input.
    Whereas the modern way of doing things, is just abusive.

    Making conclusions about how stuff works on computers now, by
    observing failures, is pretty difficult. But there are some
    examples available. For example, when the memory consumption
    of a browser used to shoot up, I managed to trace that one
    day (by noticing a graphical failure-to-update), to a
    breakage in the browser "output" being "consumed" by the
    window manager. It seemed to indicate, that the browser
    process did not recognize an overflow condition and it
    did not "throw away" unconsumed content it was sending
    to the window manager.

    It's not like the old days, where some game would crash,
    and various parts of the mechanical aspects of how graphics
    and sound worked, presented themselves. The failures now,
    are much harder to observe. For example, on an old game
    crash, the sound buffer pointer was reused again and again,
    so a repetitive sound would come out of the speaker until
    the PC was reset.

    Paul

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Sun Jul 30 18:59:22 2023
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 00:39, William Unruh wrote:
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    ....

    Running two Xorg with their own plasma5, one with weather applet and directory display and the other one with just the directory display.
    Fixed image background on both.

    The memory leak is quite random when it starts, but latest 2 days after start of plasmashell it will have begun to eat up a lot of memory, first slowly and then faster and faster. When using strace it seems
    plasmashell is trying to connect somewhere and it gets a timeout, during
    the timeout period it manage to queue 3-4 more requests (haven't figured
    out where it tries to connect). Usually it's just one of the
    plasmashells that hogs a lot of memory, like 30-40GB while the other may just be around 10GB (which one uses the most is random). When the RAM

    How much RAM do you have?
    My plasmashell right now is 2GB VSZ.

    I run Intel onboard graphics. There is no slowdown beforehand. It is just
    as if someone threw a switch to shut everything down, except that the
    image keeps showing (i.e. the graphics card is still sending out a
    signal).
    No disk activity, no slowdown. It just stops.
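
    When comparing figures like these, it may help to look at the resident
    size (RSS) next to VSZ, since VSZ counts reserved address space and is
    usually much larger than the RAM actually in use. A small sketch,
    assuming procps ps; the function names fmt_mem and mem_of are made up.

```shell
# Convert "pid rss_kib vsz_kib" lines to MiB for easier reading
fmt_mem() {
    awk '{ printf "%s rss=%dMiB vsz=%dMiB\n", $1, $2 / 1024, $3 / 1024 }'
}

# Resident (RSS) and virtual (VSZ) size of every process named $1;
# ps reports both values in KiB.
mem_of() {
    ps -C "$1" -o pid= -o rss= -o vsz= | fmt_mem
}

mem_of plasmashell
```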

    runs out, then the disk activity starts (remember that I do not have
    swap), and once the disk activity has begun, Xorg will be so slow that
    if you move the mouse pointer you will see it move a small distance and
    then freeze. It will now be impossible to ssh to the machine; if you
    have a running ssh connection to the machine you can still type
    something, but it will be slow, like a 120 baud modem. If you are already
    root ('su -' before it happens), then you can run 'killall -9
    plasmashell' and after a short while the machine will be somewhat usable
    again. Now you just need to start plasmashell for each user. This
    tends to be a two-step process: first "su - username" and run
    "DISPLAY=:X.0 plasmashell --replace", then switch to that user's
    VT and run "plasmashell --replace" from konsole, and then repeat
    the process for the next user.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Mike Easter@2:250/1 to All on Sun Jul 30 20:11:21 2023
    J.O. Aho wrote:
    Nowadays I don't use plasma5 at all, I will make a try with plasma6 when it's in the distros repository, but until then I will keep on using lxqt
    and I suggest you do also check for another desktop environment to use
    until plasma6 is out for your distro.

    I realize that we are talking about an 'unknown' diagnosis, but it
    sounds like it is closely associated w/ the KDE desktop, that
    theoretically anyone using plasma5 could spring a memory leak, or, is
    there potentially some much more limited concept going on here?

    Like some 'specific' KDE 'gear' that if someone didn't have that KDE
    app, they wouldn't have a leak?

    --
    Mike Easter

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sun Jul 30 21:03:28 2023
    On 7/30/23 19:59, William Unruh wrote:
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 00:39, William Unruh wrote:
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    ...

    Running two Xorg with their own plasma5, one with weather applet and
    directory display and the other one with just the directory display.
    Fixed image background on both.

    When the memory leak starts is quite random, but at the latest 2 days after
    plasmashell starts it will have begun to eat up a lot of memory, first
    slowly and then faster and faster. Under strace it seems that
    plasmashell is trying to connect somewhere and gets a timeout; during
    the timeout period it manages to queue 3-4 more requests (I haven't figured
    out where it tries to connect). Usually it's just one of the
    plasmashells that hogs a lot of memory, like 30-40GB, while the other may
    just be around 10GB (which one uses the most is random). When the RAM

    How much ram do you have?

    64GB. I used to have 32, but I ran into the issue and was hoping that
    doubling the amount of RAM would make it less of a problem. Gosh, I was wrong.

    My plasmashell right now is 2GB VSZ

    I think it usually took ~700MB when it started; once it was up at
    2GB I knew it would start to eat more memory, so a "plasmashell
    --replace" was the only thing that prevented it from going bad, for a
    while. In the best case you buy yourself another day.
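
    The rule of thumb above can be written down as a small check. This is
    only a sketch: the function names are made up for illustration, the 2GB
    figure is the one from this post (taken here to mean resident memory,
    RSS), and the only command used is the "plasmashell --replace" already
    mentioned.

```python
# Hypothetical helper names; the threshold is the 2GB figure from the post,
# assumed here to refer to resident set size (RSS), expressed in KiB.
RESTART_THRESHOLD_KIB = 2 * 1024 * 1024


def needs_restart(rss_kib, threshold_kib=RESTART_THRESHOLD_KIB):
    """Return True once the resident set size has reached the threshold."""
    return rss_kib >= threshold_kib


def restart_command(display):
    """Return (extra_env, argv) for restarting plasmashell on one X display.

    Nothing is executed here; the caller decides how to run the command.
    """
    return {"DISPLAY": display}, ["plasmashell", "--replace"]
```

    The returned argv could be started in the background with
    subprocess.Popen, merging the extra environment into os.environ, from a
    session belonging to the user who owns that display.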


    I run Intel onboard graphics. There is no slowdown beforehand. It is just
    as if someone threw a switch to shut everything down, except that the
    image keeps showing (i.e. the graphics card is still sending out a
    signal).

    You will have a desktop on your screen all the way until you reboot. So
    far I have never seen the plasmashell process get killed automatically; if
    it were, your screen would turn black, but with a mouse pointer that you
    can move around.


    No disk activity, no slowdown. It just stops.

    If you had looked an hour earlier, I think you might have caught the
    slowdown, as it's related to the amount of free RAM you have left; when
    you have none, it's freeze time.

    --
    //Aho


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Sun Jul 30 21:10:56 2023
    On 7/30/23 21:11, Mike Easter wrote:
    J.O. Aho wrote:
    Nowadays I don't use plasma5 at all, I will make a try with plasma6
    when it's in the distros repository, but until then I will keep on
    using lxqt and I suggest you do also check for another desktop
    environment to use until plasma6 is out for your distro.

    I realize that we are talking about an 'unknown' diagnosis, but it
    sounds like it is closely associated w/ the KDE desktop, that
    theoretically anyone using plasma5 could spring a memory leak, or, is
    there potentially some much more limited concept going on here?

    Like some 'specific' KDE 'gear' that if someone didn't have that KDE
    app, they wouldn't have a leak?


    I have had this issue for some years, it's only affected my primary
    desktop where I have two X sessions running.

    I do not have this issue on our laptops, which each run just one X session.
    All run the same distro and all of them are up to date: desktop nVidia
    graphics, laptop 1 Intel, and laptop Intel/nVidia. Applet-wise they are
    configured the same.

    If I understood right from William, he has intel on his desktop and runs
    one X session.

    So we do not know what causes the issue; KDE/plasma5 has some odd
    memory leak that seems to be random. Of course it could be a setting
    that causes it, but in the xsession-errors log there isn't anything
    that would hint at what it could be.

    --
    //Aho

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From Jim Diamond@2:250/1 to All on Mon Jul 31 01:30:05 2023
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 00:39, William Unruh wrote:
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    ...

    Running two Xorg with their own plasma5, one with weather applet and
    directory display and the other one with just the directory display.
    Fixed image background on both.

    When the memory leak starts is quite random, but at the latest 2 days after
    plasmashell starts it will have begun to eat up a lot of memory, first
    slowly and then faster and faster. Under strace it seems that
    plasmashell is trying to connect somewhere and gets a timeout; during
    the timeout period it manages to queue 3-4 more requests (I haven't figured
    out where it tries to connect). Usually it's just one of the
    plasmashells that hogs a lot of memory, like 30-40GB, while the other may
    just be around 10GB (which one uses the most is random). When the RAM

    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    What is the RSS?

    (Right now I have a chromium process whose VSZ and RSS are reported by ps
    to be, respectively, 1185776448 and 63232.)
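
    For anyone who wants the two numbers side by side without reading ps
    columns, here is a minimal sketch that pulls VmSize (what ps calls VSZ)
    and VmRSS out of /proc/<pid>/status. The function names are made up for
    illustration; the field names and the kB unit are what the Linux kernel
    writes into that file.

```python
def parse_status(text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "   250000 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


def read_status(pid="self"):
    """Read and parse the status file of one process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_status(fh.read())
```

    Calling read_status() with the pid of plasmashell (as shown by ps)
    returns both values, so the gap between them is easy to see.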

    Jim

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Richard Kettlewell@2:250/1 to All on Mon Jul 31 08:11:47 2023
    Jim Diamond <JimDiamond@jdvb.ca> writes:
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    Indeed VSZ tells you very little - for instance it includes a 2MB dead
    page in the middle of every shared library, which (in a complex process)
    scales up VSZ to something huge, but consumes almost no real resources.

    What is the RSS?

    I think the theories about user space memory leaks are a red herring.
    The reported behavior is just not consistent with them.

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Mon Jul 31 08:45:27 2023
    On 7/30/23 22:03, J.O. Aho wrote:
    On 7/30/23 19:59, William Unruh wrote:

    My plasmashell right now is 2GB VSZ

    I think it usually took ~700MB when it started; once it was up at
    2GB I knew it would start to eat more memory, so a "plasmashell
    --replace" was the only thing that prevented it from going bad, for a
    while. In the best case you buy yourself another day.

    Sorry, I missed that you were looking at VSZ; that one is irrelevant.
    Look at RSS, or calculate from %MEM.

    --
    //Aho

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Mon Jul 31 08:49:52 2023
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 19:59, William Unruh wrote:
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 00:39, William Unruh wrote:
    On 2023-07-29, J.O. Aho <user@example.net> wrote:
    On 29/07/2023 11:44, Richard Kettlewell wrote:
    William Unruh <unruh@invalid.ca> writes:
    ....


    How much ram do you have?

    64GB. I used to have 32, but I ran into the issue and was hoping that
    doubling the amount of RAM would make it less of a problem. Gosh, I was wrong.

    I have 8 GB. ps says plasmashell right now has VSZ of 1.9GB and %MEM of
    3.1% (which should be about 250MB, I guess).


    My plasmashell right now is 2GB VSZ

    I think it usually took ~700MB when it started; once it was up at
    2GB I knew it would start to eat more memory, so a "plasmashell
    --replace" was the only thing that prevented it from going bad, for a
    while. In the best case you buy yourself another day.


    I run Intel onboard graphics. There is no slowdown beforehand. It is just
    as if someone threw a switch to shut everything down, except that the
    image keeps showing (i.e. the graphics card is still sending out a
    signal).

    You will have a desktop on your screen all the way until you reboot. So
    far I have never seen the plasmashell process get killed automatically; if
    it were, your screen would turn black, but with a mouse pointer that you
    can move around.


    No disk activity, no slowdown. It just stops.

    If you had looked an hour earlier, I think you might have caught the
    slowdown, as it's related to the amount of free RAM you have left; when
    you have none, it's freeze time.

    I have about 8GB of swap, so it should slow down when it starts to use
    swap.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From J.O. Aho@2:250/1 to All on Mon Jul 31 09:37:56 2023
    On 7/31/23 09:49, William Unruh wrote:
    On 2023-07-30, J.O. Aho <user@example.net> wrote:
    On 7/30/23 19:59, William Unruh wrote:

    ...


    How much ram do you have?

    64GB. I used to have 32, but I ran into the issue and was hoping that
    doubling the amount of RAM would make it less of a problem. Gosh, I was wrong.

    I have 8 GB. ps says plasmashell right now has VSZ of 1.9GB and %MEM of
    3.1% (which should be about 250MB, I guess).

    Then your plasmashell is using 250MB at the moment. The VSZ is the
    theoretical amount of RAM the process would use if it had to load everything
    at once, but as you aren't using all the features of plasmashell this
    will not happen.

    The RSS amount includes all the memory that dependencies such as
    libraries may use. As most libraries in Linux are shared, the same shared
    library can be counted multiple times (once in each process that uses
    it); this applies to %MEM too.

    If your %MEM reaches 25%, then it will be using about 2GB of memory, and the
    RSS will give you something in that region too, give or take, as it's a more
    accurate value than %MEM.
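
    The %MEM arithmetic above, spelled out as a short sketch. It assumes
    exactly 8 GiB of RAM; the total the kernel actually reports is usually
    a little lower than the installed amount, so the real figure will
    differ slightly. The function name is made up for illustration.

```python
def rss_from_percent(percent_mem, total_ram_mib):
    """Convert a ps %MEM value into MiB, given the total RAM in MiB."""
    return percent_mem / 100.0 * total_ram_mib


TOTAL_RAM_MIB = 8 * 1024  # assumption: exactly 8 GiB installed

current = rss_from_percent(3.1, TOTAL_RAM_MIB)   # about 254 MiB
worrying = rss_from_percent(25, TOTAL_RAM_MIB)   # 2048 MiB, i.e. 2 GiB
```

    This is only the reverse of what ps does: %MEM is the resident set size
    divided by the physical memory of the machine.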


    If you had looked an hour earlier, I think you might have caught the
    slowdown, as it's related to the amount of free RAM you have left; when
    you have none, it's freeze time.

    I have about 8GB of swap, so it should slow down when it starts to use
    swap.

    It depends on what has been swapped out. If it's just things that are
    seldom used (say, hibernated tabs in your browser), then you will not
    notice anything until you load such a tab. The trouble starts when it
    has to swap data that it needs in and out; then the disk
    light will blink really rapidly and everything goes extremely slowly or
    seems to have frozen.


    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)
  • From Richard Kettlewell@2:250/1 to All on Mon Jul 31 12:28:04 2023
    "J.O. Aho" <user@example.net> writes:
    Then your plasmashell is using 250MB at the moment. The VSZ is the
    theoretical amount of RAM the process would use if it had to load
    everything at once, but as you aren't using all the features of
    plasmashell this will not happen.

    On a 64-bit Linux, VSZ is often way more than the RAM a process could
    possibly use. Each shared library has a 2MB PROT_NONE mapping that
    contributes to VSZ but can never consume any RAM, and GUI processes
    often have huge numbers of shared libraries so this can easily reach
    into the hundreds of megabytes.

    See https://www.greenend.org.uk/rjk/tech/dataseg.html for discussion.
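
    To see how much of a process's VSZ is made up of such inaccessible
    mappings, one can add up the entries in /proc/<pid>/maps whose
    permission field starts with "---". A minimal sketch follows; the
    function name is made up for illustration, and whether a given library
    has such a mapping at all depends on how it was linked.

```python
def prot_none_bytes(maps_text):
    """Sum the sizes of mappings with no read, write or execute permission."""
    total = 0
    for line in maps_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue
        addr_range, perms = parts[0], parts[1]
        if perms.startswith("---"):
            # The address range is "start-end" in hexadecimal.
            start, end = (int(x, 16) for x in addr_range.split("-"))
            total += end - start
    return total
```

    Feeding it the contents of /proc/<pid>/maps for plasmashell and
    comparing the result with the VSZ shown by ps gives a rough idea of how
    much of that number can never turn into RAM use.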

    --
    https://www.greenend.org.uk/rjk/

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: terraraq NNTP server (2:250/1@fidonet)
  • From Jim Diamond@2:250/1 to All on Mon Jul 31 15:24:56 2023
    On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Jim Diamond <JimDiamond@jdvb.ca> writes:
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    Indeed VSZ tells you very little - for instance it includes a 2MB dead
    page in the middle of every shared library, which (in a complex process) scales up VSZ to something huge, but consumes almost no real resources.

    What is the RSS?

    I think the theories about user space memory leaks are a red herring.
    The reported behavior is just not consistent with them.


    I have not followed the thread too closely, but I suspect you are right.

    But when I saw someone reporting VSZ, I thought that they should know that
    it wasn't meaningful, just in case they keep barking up that tree.

    Cheers.
    Jim

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Mon Jul 31 17:47:33 2023
    On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:
    On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Jim Diamond <JimDiamond@jdvb.ca> writes:
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    Indeed VSZ tells you very little - for instance it includes a 2MB dead
    page in the middle of every shared library, which (in a complex process)
    scales up VSZ to something huge, but consumes almost no real resources.

    What is the RSS?

    I think the theories about user space memory leaks are a red herring.
    The reported behavior is just not consistent with them.


    I have not followed the thread too closely, but I suspect you are right.

    But when I saw someone reporting VSZ, I thought that they should know that
    it wasn't meaningful, just in case they keep barking up that tree.

    Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
    could not see anything else in the ps aux report that could be the
    actual memory used (except %MEM, and I had no idea what it was a
    percentage of).

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From Paul@2:250/1 to All on Tue Aug 1 10:15:07 2023
    On 7/31/2023 12:47 PM, William Unruh wrote:
    On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:
    On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Jim Diamond <JimDiamond@jdvb.ca> writes:
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    Indeed VSZ tells you very little - for instance it includes a 2MB dead
    page in the middle of every shared library, which (in a complex process)
    scales up VSZ to something huge, but consumes almost no real resources.

    What is the RSS?

    I think the theories about user space memory leaks are a red herring.
    The reported behavior is just not consistent with them.


    I have not followed the thread too closely, but I suspect you are right.

    But when I saw someone reporting VSZ, I thought that they should know that
    it wasn't meaningful, just in case they keep barking up that tree.

    Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
    could not see anything else in the ps aux report that could be the
    actual memory used (except %MEM, and I had no idea what it was a
    percentage of).


    Only you know the particulars of your system.

    You say you're using Plasma, and "plasma freezes" shows in Google searches.

    If you were to note a rising RAM consumption (say, leave top running
    when freeze occurs), in the example here, one post mentions
    it is possible it's a video card driver issue. The graphics stack
    is producing frames, but there is no consumer.

    https://bugzilla.redhat.com/show_bug.cgi?id=1399396

    In some cases, the NVidia driver has watchdog enabled, the Nouveau
    does not. This makes the Nvidia driver capable of doing a VPU recover,
    and restoring service before it is too late.
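
    If leaving top running is not practical, a small logger that appends
    memory figures to a file does a similar job, and the last line written
    before a freeze survives the power cycle. A minimal sketch, assuming a
    Linux /proc/meminfo; the function names and the log path are made up
    for illustration.

```python
import os
import time


def parse_meminfo(text, keys=("MemAvailable", "SwapFree")):
    """Pick selected fields (values in kB) out of /proc/meminfo text."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in keys:
            out[key] = int(rest.split()[0])
    return out


def log_once(log_path, meminfo_path="/proc/meminfo"):
    """Append one timestamped sample and force it out to disk."""
    with open(meminfo_path) as fh:
        sample = parse_meminfo(fh.read())
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    fields = " ".join(f"{k}={v}kB" for k, v in sorted(sample.items()))
    with open(log_path, "a") as log:
        log.write(f"{stamp} {fields}\n")
        log.flush()
        os.fsync(log.fileno())  # so the line is not lost in the page cache
    return sample
```

    Calling log_once("/var/tmp/memlog.txt") every minute or so, from a loop
    with time.sleep or from cron, is enough to show whether free memory was
    falling before the machine stopped.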

    Paul

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From William Unruh@2:250/1 to All on Tue Aug 1 17:26:32 2023
    On 2023-08-01, Paul <nospam@needed.invalid> wrote:
    On 7/31/2023 12:47 PM, William Unruh wrote:
    On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:
    On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:
    Jim Diamond <JimDiamond@jdvb.ca> writes:
    On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:
    How much ram do you have?
    My plasmashell right now is 2GB VSZ

    I don't think the VSZ is all that significant.

    Indeed VSZ tells you very little - for instance it includes a 2MB dead
    page in the middle of every shared library, which (in a complex process)
    scales up VSZ to something huge, but consumes almost no real resources.

    What is the RSS?

    I think the theories about user space memory leaks are a red herring.
    The reported behavior is just not consistent with them.


    I have not followed the thread too closely, but I suspect you are right.

    But when I saw someone reporting VSZ, I thought that they should know that
    it wasn't meaningful, just in case they keep barking up that tree.

    Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
    could not see anything else in the ps aux report that could be the
    actual memory used (except %MEM, and I had no idea what it was a
    percentage of).


    Only you know the particulars of your system.

    You say you're using Plasma, and "plasma freezes" shows in Google searches.

    As I said, it is not just plasma that freezes but the whole system. I
    cannot log in via ssh on the network, no keyboard entry works (e.g.
    alt-ctrl-del does not reboot the system), and the mouse cursor does not
    move when I move the mouse. In the past (i.e. over a year ago) when this
    happened, there was no precursor. The system did not slow down beforehand. I
    know what swapping feels like, and there was no evidence of that. (I
    have as much swap space as I have RAM, about 8GB each.)


    If you were to note a rising RAM consumption (say, leave top running
    when freeze occurs), in the example here, one post mentions

    The freeze occurs randomly about once every month or so (or rather I
    have had two in the past 2 months and none from last Aug to that time)

    it is possible it's a video card driver issue. The graphics stack
    is producing frames, but there is no consumer.

    I have onboard Intel graphics.
    lspci says
    00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]

    https://bugzilla.redhat.com/show_bug.cgi?id=1399396

    In some cases, the NVidia driver has watchdog enabled, the Nouveau
    does not. This makes the Nvidia driver capable of doing a VPU recover,
    and restoring service before it is too late.

    --- MBSE BBS v1.0.8.4 (Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)