Forum: >>> Magnum BBS <<<

computer (MGA8) randomly freezes

From William Unruh@2:250/1 to All on Fri Jul 28 02:16:54 2023

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

The only thing in the journalctl logs from just before is

-------------------------------
Jul 24 23:22:34 tunnel.physics.ubc.ca dnf[2573748]: teams 8.9 kB/s | 1.5 kB 00:00
Jul 24 23:22:35 tunnel.physics.ubc.ca dnf[2573748]: Metadata cache created.

Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: dnf-makecache.service: Succeeded.
Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: Finished dnf makecache.
Jul 24 23:22:35 tunnel.physics.ubc.ca systemd[1]: dnf-makecache.service: Consumed 1.527s CPU time.
Jul 24 23:28:00 tunnel.physics.ubc.ca kernel: Shorewall:sshd-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=183.106.205.242 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=56 ID=64660 PROTO=TCP SPT=31651 DPT=22 WINDOW=16988 RES=0x00 SYN URGP=0
Jul 24 23:35:31 tunnel.physics.ubc.ca kernel: Shorewall:net-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=185.225.74.53 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=33231 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0
-- Reboot --
---------------------------------------

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still occasionally freezing. (It seems to be occurring every two or three
months.)

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Fri Jul 28 07:26:38 2023

On Fri, 28 Jul 2023 01:16:54 -0000 (UTC), William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still occasionally freezing. (It seems to be occurring every two or three
months.)

Offhand, it seems like the CPU gets into a tight loop and stops processing interrupts.

I would install lm_sensors, configure and run lm_sensors and sensord, and enable core dumps.

I have also modified ~/.bash_profile to check for core dump files and
use xmessage to provide a pop-up if any are found.
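A minimal sketch of such a ~/.bash_profile check, in bash. It assumes core files land directly in /var/tmp with a .core suffix (matching the kernel.core_pattern posted later in this thread); the function names list_core_files and check_core_dumps are hypothetical. Under X it pops up an xmessage window, otherwise it prints to stderr.

```bash
# Sketch for ~/.bash_profile (bash). Assumption: core files are written
# directly into /var/tmp with a .core suffix; pass another directory if not.

# Print the path of every *.core file directly inside the given directory.
list_core_files() {
    local dir=${1:-/var/tmp}
    find "$dir" -maxdepth 1 -type f -name '*.core' -print 2>/dev/null | sort
}

# Warn about core files: an xmessage pop-up under X, stderr otherwise.
# Returns 0 when no core files exist and 1 when some were found.
check_core_dumps() {
    local cores
    cores=$(list_core_files "$1")
    if [ -z "$cores" ]; then
        return 0
    fi
    if [ -n "$DISPLAY" ] && command -v xmessage >/dev/null 2>&1; then
        # "-file -" makes xmessage read the message text from stdin.
        printf 'Core dump files found:\n%s\n' "$cores" | xmessage -center -file - &
    else
        printf 'Core dump files found:\n%s\n' "$cores" >&2
    fi
    return 1
}

# Example use at the end of ~/.bash_profile:
#   check_core_dumps /var/tmp || true
```

The "|| true" in the example keeps a non-zero return from mattering to the login shell; the return code is only there so the check is easy to script against.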

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Richard Kettlewell@2:250/1 to All on Fri Jul 28 08:14:22 2023

William Unruh <unruh@invalid.ca> writes:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

There’s a chance that the kernel had something to say that couldn’t be written to the logs due to the crash; unfortunately that can be rather
hard to get hold of in this case. If the machine has a serial port then
you might be able to get it to write kernel logs to that. But you might
just be out of luck l-(
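A sketch of what routing kernel messages to a serial port involves, under several assumptions: the port is the first one (ttyS0), it runs at 115200 baud, a null-modem cable goes to a second computer, and the machine boots through GRUB (on Mageia, grub.cfg is typically regenerated with grub2-mkconfig -o /boot/grub2/grub.cfg). The console= parameter itself is a standard kernel option; the receiving device /dev/ttyUSB0 is only an example.

```
# Kernel command-line additions, e.g. appended inside the quotes of the
# GRUB_CMDLINE_LINUX line in /etc/default/grub; regenerate grub.cfg and
# reboot afterwards. Kernel messages go to every console= listed, and the
# last one listed also becomes /dev/console.
console=tty0 console=ttyS0,115200n8

# On the second machine, watch the cable (device name is an example):
screen /dev/ttyUSB0 115200
```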

--
https://www.greenend.org.uk/rjk/

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: terraraq NNTP server (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Fri Jul 28 09:27:47 2023

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they eat up all the memory, and when it is
all gone everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

I would run a memory check once in a while, for example by using crontab
to run this one-liner:

{ date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
  echo "----"; } >> /var/log/memusage.log

Then you can see if there is a process that grows.
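For the crontab part, a minimal sketch of a root crontab entry (edited with crontab -e as root, so /var/log is writable); the ten-minute interval is an arbitrary assumption. The braces group the three commands so the >> redirection captures all of their output, not just the final echo.

```
# m    h  dom mon dow  command
*/10   *  *   *   *    { date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 && echo "----"; } >> /var/log/memusage.log
```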

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Fri Jul 28 10:56:40 2023

On Fri, 28 Jul 2023 10:27:47 +0200, J.O. Aho wrote:

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they eat up all the memory, and when it is
all gone everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

I would run a memory check once in a while, for example by using crontab
to run this one-liner:

{ date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
  echo "----"; } >> /var/log/memusage.log

Then you can see if there is a process that grows.

Cute, and with just a little bit of scripting/coding you could automate
the check by flagging any line with a value above some watermark for CPU
and memory percentage.

If it were me, I would print any line greater than 1.0 for either one.

Since the values have a decimal point, I would guess that I would have to
use bc to test against the watermark value if using bash.

A simple case statement could change the watermark values based on the
command, where needed.

If you want a journal entry, it would be something like
echo 'msg_here' | systemd-cat -t app_name_here -p type_msg_here

Message priority levels: emerg, alert, crit, err, warning, notice, info, debug.
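One way this could be sketched in bash, following the suggestions above: a case statement supplies per-command watermarks, and since bash arithmetic is integer-only the decimal comparison is handed to awk instead of bc (a swapped-in choice; bc would work as well). The watermark values and the function names watermarks, gt and flag_hogs are hypothetical. The ps columns are reordered so comm comes last, which lets read keep a command name containing spaces in one variable.

```bash
# Per-command watermarks, printed as "max_cpu max_mem" (percent).
# Hypothetical values: tune the cases for each node's normal workload.
watermarks() {
    case "$1" in
        mythfrontend) echo "20.0 10.0" ;;
        mysqld)       echo "5.0 5.0" ;;
        *)            echo "1.0 1.0" ;;
    esac
}

# Succeed when decimal $1 is greater than decimal $2. awk does the float
# comparison because bash arithmetic only handles integers.
gt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 > b + 0) }'
}

# Read "user uid pid pcpu pmem comm" lines on stdin (header skipped) and
# print each process above its command's CPU or memory watermark.
flag_hogs() {
    local user uid pid cpu mem comm max_cpu max_mem
    while read -r user uid pid cpu mem comm; do
        [ "$user" = USER ] && continue
        read -r max_cpu max_mem <<< "$(watermarks "$comm")"
        if gt "$cpu" "$max_cpu" || gt "$mem" "$max_mem"; then
            printf '%s %s pid=%s cpu=%s mem=%s\n' "$user" "$comm" "$pid" "$cpu" "$mem"
        fi
    done
    return 0
}

# Example use (the journal tag "memwatch" is hypothetical):
#   ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | flag_hogs |
#       systemd-cat -t memwatch -p warning
```

Because awk only runs a BEGIN block in gt, it never reads stdin, so it cannot swallow the ps lines the while loop is consuming.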

In a multi-node setup I send an email to the LAN admin.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Jim@2:250/1 to All on Fri Jul 28 15:23:27 2023

On Fri, 28 Jul 2023 01:16:54 +0000, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still occasionally freezing. (It seems to be occurring every two or three
months.)

Pure speculation, but I think an update to the lib RPMs months ago
introduced something that now and then causes the GNOME desktop
on Wayland to lock up.

My main machine and backup both have 64-bit Intel quad-core CPUs, with
32 GB of RAM in the former and 4 GB in the latter. The loads imposed are
trivial (mostly the Opera browser, with Firefox as the alternative for any
site Opera does not work well with).

Now and then the desktop would lock up. When things were worst, the
backup would lock up every few days or maybe few weeks, and the main
machine less often.

Often it would prove impossible to kill the browser or desktop.
A power-switch restart is easier than moving to the other
machine and trying to ssh in to reboot, so that was my remedy.

Being unable to come up with any hint of why, combined with the rarity
of the lock-ups, has led me to just wait in the expectation that someday
(Mageia 9, perhaps) such problems will be ironed out.

Lock-ups have been less frequent in the past couple of months.

Cheers!

jim b.

--
UNIX is not user-unfriendly, it merely
expects users to be computer friendly.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From TJ@2:250/1 to All on Fri Jul 28 15:28:50 2023

On 2023-07-27 21:16, William Unruh wrote:

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

Did you mean you updated FROM that kernel?

I hope so. Latest Mageia 8 kernel is 5.15.122-1.

TJ

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Paul@2:250/1 to All on Fri Jul 28 22:25:12 2023

On 7/27/2023 9:16 PM, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still occasionally freezing. (It seems to be occurring every two or three
months.)

This bug does not yield to linear thinking, unfortunately.

It's not a hardware issue. There's something in the middle
of the graphics stack, which runs at frame rate (VSYNC), and
it has mouse debounce built into it. (Like your Razer mouse?
Bad, bad idea. You want a moldy old mouse with low DPI right now. Switch to PS/2 ports,
if ya gottem.) It uses a timer. It gets behind. And it lunches the input subsystem.

We're seeing this elsewhere. Or, at least, log messages of the same type
as the bug. Have a look in your /var/log.

Try a radical change of DE, and see if stability returns.

As one developer described it "it's not your CPU which is too slow,
it is the Mutter architecture which is too slow".

Now, it's either that, or something has changed which has brought
an old bug back.

Paul

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Fri Jul 28 23:02:20 2023

On 7/28/23 11:56, Bit Twister wrote:

On Fri, 28 Jul 2023 10:27:47 +0200, J.O. Aho wrote:

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they eat up all the memory, and when it is
all gone everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

I would run a memory check once in a while, for example by using crontab
to run this one-liner:

{ date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
  echo "----"; } >> /var/log/memusage.log

Then you can see if there is a process that grows.

Cute, and with just a little bit of scripting/coding you could automate
the check by flagging any line with a value above some watermark for CPU
and memory percentage.

If it were me, I would print any line greater than 1.0 for either one.

Since the values have a decimal point, I would guess that I would have to
use bc to test against the watermark value if using bash.

You could use awk; I always have to point out this great tool, as it bears
my name.

date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem |
  awk -v min="0.3" '$6 >= min || $6=="%MEM"' && echo "----"

Just change the -v min="0.3" to the minimum value a process must reach to
be displayed. We also keep displaying the column names; I'm sure there are
better ways of doing this, but my skills are not that great.
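One alternative for keeping the column names is to let awk pass the first line through with NR == 1, so the header no longer needs a special-case string match. The sketch below also applies the same minimum to %CPU (field 5); the function name filter_ps is hypothetical, and the + 0 forces numeric comparisons. As with the original, a command name containing spaces would shift the field numbers.

```bash
# Print the ps header (line 1) plus every process whose %CPU (field 5)
# or %MEM (field 6) is at least the minimum given as $1 (default 0.3).
# Assumes the column order user,uid,comm,pid,pcpu,pmem used above.
filter_ps() {
    awk -v min="${1:-0.3}" 'NR == 1 || $5 + 0 >= min + 0 || $6 + 0 >= min + 0'
}

# Example use:
#   { date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | filter_ps 0.3 &&
#     echo "----"; } >> /var/log/memusage.log
```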

In a multi-node setup I send an email to the LAN admin.

In a multi-node setup I would use Icinga or Nagios to monitor things;
you can do the same with ELK, too.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Fri Jul 28 23:14:16 2023

On 2023-07-28, Bit Twister <BitTwister@mouse-potato.com> wrote:

On Fri, 28 Jul 2023 01:16:54 -0000 (UTC), William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still
occasionally freezing. (It seems to be occurring every two or three
months.)

Offhand, it seems like the CPU gets into a tight loop and stops processing interrupts.

I would install lm_sensors, configure and run lm_sensors and sensord, and enable core dumps.

I have also modified ~/.bash_profile to check for core dump files and
use xmessage to provide a pop-up if any are found.

OK, here are the last two sensord reports just before the freeze. I do
not see anything out of the ordinary in them. The last one was at
23:28:55; the freeze (as inferred from the last entry in /var/log/syslog)
occurred around 23:35:31.

--------------------------------
Jul 24 23:35:31 tunnel kernel: [1181287.585321] Shorewall:net-fw:DROP:IN=eno1 OUT= MAC=4c:ed:fb:c2:2a:f3:a0:ab:1b:88:6e:58:08:00 SRC=185.225.74.53 DST=192.168.0.3 LEN=40 TOS=0x00 PREC=0x00 TTL=244 ID=54321 PROTO=TCP SPT=33231 DPT=22 WINDOW=65535 RES=0x00 SYN URGP=0
Jul 25 09:13:25 tunnel kernel: [ 0.000000] microcode: microcode updated early to revision 0xf0, date = 2021-11-12
--------------------------------------

----------------------------------
Jul 24 23:08:55 tunnel sensord: Chip: nvme-pci-0600
Jul 24 23:08:55 tunnel sensord: Adapter: PCI adapter
Jul 24 23:08:55 tunnel sensord: Composite: 24.9 C (min = -40.1 C, max = 83.8 C)
Jul 24 23:08:55 tunnel sensord: Sensor 2: 24.9 C (min = -40.1 C, max = 83.8 C)
Jul 24 23:08:55 tunnel sensord: Chip: coretemp-isa-0000
Jul 24 23:08:55 tunnel sensord: Adapter: ISA adapter
Jul 24 23:08:55 tunnel sensord: Package id 0: 29.0 C
Jul 24 23:08:55 tunnel sensord: Core 0: 28.0 C
Jul 24 23:08:55 tunnel sensord: Core 1: 27.0 C
Jul 24 23:08:55 tunnel sensord: Core 2: 28.0 C
Jul 24 23:08:55 tunnel sensord: Core 3: 27.0 C
Jul 24 23:08:55 tunnel sensord: Chip: acpitz-acpi-0
Jul 24 23:08:55 tunnel sensord: Adapter: ACPI interface
Jul 24 23:08:55 tunnel sensord: temp1: 27.8 C
Jul 24 23:28:55 tunnel sensord: Chip: nvme-pci-0600
Jul 24 23:28:55 tunnel sensord: Adapter: PCI adapter
Jul 24 23:28:55 tunnel sensord: Composite: 24.9 C (min = -40.1 C, max = 83.8 C)
Jul 24 23:28:55 tunnel sensord: Sensor 2: 24.9 C (min = -40.1 C, max = 83.8 C)
Jul 24 23:28:55 tunnel sensord: Chip: coretemp-isa-0000
Jul 24 23:28:55 tunnel sensord: Adapter: ISA adapter
Jul 24 23:28:55 tunnel sensord: Package id 0: 29.0 C
Jul 24 23:28:55 tunnel sensord: Core 0: 28.0 C
Jul 24 23:28:55 tunnel sensord: Core 1: 27.0 C
Jul 24 23:28:55 tunnel sensord: Core 2: 27.0 C
Jul 24 23:28:55 tunnel sensord: Core 3: 27.0 C
Jul 24 23:28:55 tunnel sensord: Chip: acpitz-acpi-0
Jul 24 23:28:55 tunnel sensord: Adapter: ACPI interface
Jul 24 23:28:55 tunnel sensord: temp1: 27.8 C
----------------------------------------------

I cannot find any core files, and I do not think I am suppressing them.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Fri Jul 28 23:20:40 2023

On 2023-07-28, J.O. Aho <user@example.net> wrote:

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they eat up all the memory, and when it is
all gone everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

Except in my case nothing I type does anything: the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del and Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, it has sometimes happened while I was working on the
machine. This time it happened while I was asleep.)

I would run a memory check once in a while, for example by using crontab
to run this one-liner:

{ date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
  echo "----"; } >> /var/log/memusage.log

Then you can see if there is a process that grows.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Sat Jul 29 01:02:38 2023

On Fri, 28 Jul 2023 22:14:16 -0000 (UTC), William Unruh wrote:

On 2023-07-28, Bit Twister <BitTwister@mouse-potato.com> wrote:

On Fri, 28 Jul 2023 01:16:54 -0000 (UTC), William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

Updated Mga8. Kernel 5.15.88-desktop-1.mga8.

I recently replaced the power supply, thinking it might be the cause.
But although freezes seem to be occurring less frequently, it is still
occasionally freezing. (It seems to be occurring every two or three
months.)

Offhand, it seems like the CPU gets into a tight loop and stops processing interrupts.

I would install lm_sensors, configure and run lm_sensors and sensord, and enable core dumps.

I have also modified ~/.bash_profile to check for core dump files and
use xmessage to provide a pop-up if any are found.

OK, here are the last two sensord reports just before the freeze. I do
not see anything out of the ordinary in them. The last one was at
23:28:55; the freeze (as inferred from the last entry in /var/log/syslog)
occurred around 23:35:31.

I suggest that you set maximum allowed values for the temperatures.

I cannot find any core files, and I do not think I am suppressing them.

I'm pretty sure they are not enabled by default; you have to make
configuration file changes. A quick scan of my install/change scripts
finds this custom drop-in file and these settings:

# grep core /etc/sysctl.d/xx__sysctl.conf
# Enabling suid dump PID appending and set core location and name
kernel.core_uses_pid = 1
kernel.core_pattern = /var/tmp/%e_%p_%s.core
# net.core.rmem_max = 1048576
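A small sketch, in bash, for checking what governs core files on the running system. Besides the sysctl settings above (sysctl --system reloads them from /etc/sysctl.d), the core file size limit (ulimit -c) must be non-zero for cores written to files, and when kernel.core_pattern begins with | the kernel pipes cores to a helper program instead; on systemd machines that helper is often systemd-coredump, whose dumps coredumpctl list shows. The function name core_dump_status is hypothetical.

```bash
# Report the settings that decide whether, and where, core files appear.
core_dump_status() {
    local pattern uses_pid limit
    pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || echo unknown)
    uses_pid=$(cat /proc/sys/kernel/core_uses_pid 2>/dev/null || echo unknown)
    limit=$(ulimit -c)
    echo "core_pattern:  $pattern"
    echo "core_uses_pid: $uses_pid"
    echo "ulimit -c:     $limit"
    case "$pattern" in
        '|'*)
            echo "note: cores are piped to a helper program, not written as files" ;;
        *)
            if [ "$limit" = 0 ]; then
                echo "note: core file size limit is 0, so no core files are written"
            fi ;;
    esac
}

# Example use: run it in the shell (or service) whose crashes matter,
# since ulimit -c is a per-process limit:
#   core_dump_status
```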

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Sat Jul 29 01:17:45 2023

On Sat, 29 Jul 2023 00:02:20 +0200, J.O. Aho wrote:

On 7/28/23 11:56, Bit Twister wrote:

On Fri, 28 Jul 2023 10:27:47 +0200, J.O. Aho wrote:

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor shows some X scene (usually with Chrome running, but
then again that is often the case), but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-ctrl-del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they eat up all the memory, and when it is
all gone everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

I would run a memory check once in a while, for example by using crontab
to run this one-liner:

{ date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11 &&
  echo "----"; } >> /var/log/memusage.log

Then you can see if there is a process that grows.

Cute, and with just a little bit of scripting/coding you could automate
the check by flagging any line with a value above some watermark for CPU
and memory percentage.

If it were me, I would print any line greater than 1.0 for either one.

Since the values have a decimal point, I would guess that I would have to
use bc to test against the watermark value if using bash.

You could use awk; I always have to point out this great tool, as it bears
my name

date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="0.3" '$6 >= min || $6=="%MEM"' && echo "----"

Just change the -v min="0.3" to the minimum value a process must have to
be displayed; the column names are still displayed. I'm sure there are
better ways of doing this, but my skills are not that great.

Yeah, but different nodes can have different apps needing higher limits.
I'll be using bash and a case statement to change max values based on app.

for example on my myth node I have
# ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11
USER UID COMMAND PID %CPU %MEM
bittwis+ 1500 mythfrontend 4124 15.4 6.8
mysql 976 mysqld 780 1.1 3.2
mythtv 90 mythbackend 3804 1.5 3.0
bittwis+ 1500 net_applet 2223 0.2 1.3
bittwis+ 1500 xfwm4 2153 1.6 1.2

yet my normal web browsing node has
]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | head -n 11
USER UID COMMAND PID %CPU %MEM
root 0 Xorg 3147 0.4 0.9
bittwis+ 1500 xfwm4 4645 0.1 0.7
bittwis+ 1500 xfdesktop 4661 0.0 0.7
mysql 978 mysqld 3194 0.0 0.5
esept 1513 geany 9858 0.1 0.3
root 0 geany 9953 0.4 0.3
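The per-application thresholds Bit Twister describes could be sketched along these lines. This is only an illustration: the command names and limits in the case statement are placeholder assumptions, not values from any real node, and awk does the decimal comparison, so bc is not needed. Reading COMMAND into the last variable keeps names with spaces intact.

```shell
#!/usr/bin/env bash
# Sketch: flag processes whose %CPU or %MEM is above a per-application limit.
# The limits below are illustrative placeholders, not recommendations.

# exceeds VALUE LIMIT: exit status 0 if VALUE > LIMIT (awk compares decimals)
exceeds() {
    awk -v v="$1" -v l="$2" 'BEGIN { exit !(v > l) }'
}

# limits_for COMMAND: print "MAXCPU MAXMEM" for that command name
limits_for() {
    case "$1" in
        mythfrontend) echo "20.0 10.0" ;;
        mysqld)       echo "5.0 5.0" ;;
        *)            echo "1.0 1.0" ;;
    esac
}

# flag_processes: read "pid pcpu pmem comm" lines and print the offenders
flag_processes() {
    local pid cpu mem comm maxcpu maxmem
    # comm is the last variable, so it receives the rest of the line
    while read -r pid cpu mem comm; do
        read -r maxcpu maxmem <<< "$(limits_for "$comm")"
        if exceeds "$cpu" "$maxcpu" || exceeds "$mem" "$maxmem"; then
            printf '%s %s %s %s\n' "$pid" "$cpu" "$mem" "$comm"
        fi
    done
}

ps -Ao pid,pcpu,pmem,comm --no-headers --sort=-pmem | flag_processes
```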

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sat Jul 29 09:18:32 2023

On 7/29/23 00:20, William Unruh wrote:

On 2023-07-28, J.O. Aho <user@example.net> wrote:

On 7/28/23 03:16, William Unruh wrote:

I have a disturbing system, which every once in a while freezes. The
screen on the monitor is some X scene (usually has Chrome running, but
then again that is often the case) but the keyboard and mouse do nothing.
Trying to log on from the network fails with no response from the
machine. Alt-Ctrl-Del does nothing. The only way to recover is via the
power switch on the back.
Afterwards, looking at /var/log/syslog or /var/log/messages shows
nothing significant that I can see just before the freeze.

This reminds me of the issue I have had with running two plasma5
sessions: after a while they will just eat up the memory, and when all is
gone, everything freezes. You can't log in remotely, and a local
login takes so long between entering the username and entering the
password that the login is canceled.

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

This is exactly the same, and as for the swap you mention: I can say that
even if you disable swap, you will get the same behavior.

I think you should monitor the memory usage; you could add this to your /etc/crontab:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

It will run every 5 minutes and give you a list of all processes that use
10% or more of your memory (in a crontab the % in "%MEM" must be escaped
as \%, otherwise cron turns it into a newline). Then take a look at
/var/log/mem.log from time to time, or after a freeze, and see which
processes you had there.

What you can also do is keep an ssh connection from another machine
always running; this way you may be able to kill processes. You will not
be able to run ps, top, htop or other tools, as they will be so slow that
they block your shell, and then all you can do is reset the machine
by pressing the button. If your issue is the desktop environment, I
would recommend switching to another one. I think Enlightenment may have
the fewest of these issues, but I'm no fan of it; at the moment I'm using
lxqt.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From Richard Kettlewell@2:250/1 to All on Sat Jul 29 10:44:07 2023

William Unruh <unruh@invalid.ca> writes:

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

Yes. Normal behavior for running out of RAM is swapping, and for running
out of RAM+swap is to start killing user processes.

In this case:

| Trying to log on from the network fails with no response from the
| machine.

Does it respond to ping?

If so then the kernel’s still working, at least a bit (the lack of
kernel logs suggests everything above that is dead).

If it does not ping then the kernel has crashed, either due to a kernel
bug or a hardware fault.
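Whether the machine stops answering ping at the moment of the freeze, and roughly when that happens (handy when it freezes overnight), can be recorded from a second machine. A minimal sketch, assuming that machine has iputils ping (for the -W timeout option) and can reach the frozen host by name; the 60-second interval and the helper names are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run on a second machine to log when a host stops (or starts)
# answering ping. Host name, interval and helper names are assumptions.

# ping_state HOST: print "up" if HOST answers one ping within 2 s, else "down"
ping_state() {
    if ping -c 1 -W 2 "$1" >/dev/null 2>&1; then
        echo up
    else
        echo down
    fi
}

# watch_host HOST [INTERVAL]: print a timestamped line on every state change
watch_host() {
    local host=$1 interval=${2:-60} state="" new
    while true; do
        new=$(ping_state "$host")
        if [ "$new" != "$state" ]; then
            echo "$(date '+%F %T') $host is $new"
            state=$new
        fi
        sleep "$interval"
    done
}

# Example (runs until interrupted):
#   watch_host tunnel.physics.ubc.ca 60 >> ping-watch.log
```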

--
https://www.greenend.org.uk/rjk/

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: terraraq NNTP server (2:250/1@fidonet)

From David W. Hodgins@2:250/1 to All on Sat Jul 29 14:09:32 2023

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

Regards, Dave Hodgins

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sat Jul 29 16:39:48 2023

On 7/29/23 15:09, David W. Hodgins wrote:

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

Regards, Dave Hodgins

Easy fix:

ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="3.0" '$5 >= min || $5=="%MEM"'
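Moving the filter into a small script also avoids a crontab pitfall: cron turns an unescaped % in the command into a newline, which breaks the "%MEM" test. A minimal sketch built on the comm-last column order above; the script name memwatch and the 10.0 default are assumptions, and NR == 1 keeps the header instead of matching on "%MEM".

```shell
#!/usr/bin/env bash
# memwatch (hypothetical name): print a timestamp, the ps header, every
# process using at least MIN percent of RAM, and a separator line.
# COMMAND is the last column, so names with spaces cannot shift %MEM ($5).
min=${1:-10.0}

# memwatch_filter MIN: keep the header line plus rows with %MEM >= MIN
memwatch_filter() {
    awk -v min="$1" 'NR == 1 || $5 >= min'
}

date
ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | memwatch_filter "$min"
echo "----"
```

Saved as, say, /usr/local/sbin/memwatch and made executable, the /etc/crontab line then needs no % at all:

*/5 * * * * root /usr/local/sbin/memwatch 10.0 >> /var/log/mem.log 2>&1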

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Sat Jul 29 18:12:05 2023

On 2023-07-29, David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'

How about $NF instead of $6?

USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

Regards, Dave Hodgins
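The $NF suggestion does work with the original column order, since %MEM is always the last field however many words COMMAND splits into ($(NF-1) would likewise be %CPU). A small sketch; mem_over is a made-up helper name.

```shell
#!/usr/bin/env bash
# mem_over MIN: keep the header plus rows whose last field (%MEM) >= MIN.
# Works with comm in the middle because $NF counts from the end of the line.
mem_over() {
    awk -v min="$1" 'NR == 1 || $NF >= min'
}

ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | mem_over 3.0
```

With $6 instead, the ddclient row is compared on its PID field (5691), which is why it passed the 3.0 threshold while showing 0.0 %MEM.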

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Sat Jul 29 18:16:07 2023

On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:

William Unruh <unruh@invalid.ca> writes:

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

Yes. Normal behavior for running out of RAM is swapping, and for running
out of RAM+swap is to start killing user processes.

Yes, and there is no evidence for that.

In this case:

| Trying to log on from the network fails with no response from the
| machine.

Does it respond to ping?

No.

If so then the kernel’s still working, at least a bit (the lack of
kernel logs suggests everything above that is dead).

If it does not ping then the kernel has crashed, either due to a kernel
bug or a hardware fault.

That is sure what it looks like. Weird thing is that the video card is
still sending out the last image, so it is running.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Sat Jul 29 19:52:19 2023

On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

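My bash solution
ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem --no-headers |
while read -r line ; do
    set -- $line            # split the line into words
    _bin=$3                 # first word of COMMAND
    shift $(( $# - 2 ))     # keep only the last two fields
    _cpu=$1
    _mem=$2
    echo "$_bin $_cpu $_mem"
done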

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sat Jul 29 20:45:33 2023

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

Yes. Normal behavior for running out of RAM is swapping, and for running
out of RAM+swap is to start killing user processes.

The problem is that swapping Xorg and the desktop environment in and out
makes things extremely slow, and the tradition in Linux is to kill a
random process (not sure if that has changed nowadays). The likelihood
that the right process is killed is slim; if the wrong one is killed,
the swapping will not end and will keep the system slow, and even if it
kills the right process, in reality it will take hours before anything
happens because of all the swapping in and out.

My experience is that things do not get better even if you disable swap
(you seldom really need swap when you have 64GB RAM); for some reason the
disk seems to be as active as during swapping in and out. Could it
have to do with zram?
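On the zram question, /proc/swaps lists every active swap area, and zram devices show up there as /dev/zramN (zramctl from util-linux gives more detail). A small sketch with made-up helper names. As a general note, not a diagnosis: even with no swap configured, the kernel can still drop file-backed pages such as program code under memory pressure and read them back from disk when they are needed again, which is one plausible source of heavy disk activity without swap.

```shell
#!/usr/bin/env bash
# swap_devices [FILE]: print the active swap devices listed in /proc/swaps
# (or in FILE, for testing), skipping the header line.
swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# uses_zram [FILE]: exit status 0 if any active swap device is a zram device.
uses_zram() {
    swap_devices "$1" | grep -q '^/dev/zram'
}

if uses_zram; then
    echo "zram swap is active:"
else
    echo "no zram swap active; swap devices:"
fi
swap_devices
```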

Something that makes a difference is setting a hard memory limit on the
process; then it will be killed when it tries to use more RAM than the
limit.
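One way to set such a hard limit on a single desktop process is a transient systemd scope with MemoryMax: if the processes in the scope reach the limit and the kernel cannot reclaim enough memory, the OOM killer acts within that scope. A sketch, assuming a unified (cgroup v2) hierarchy with the memory controller delegated to the user's systemd instance, which not every distribution sets up; run_with_memory_max and the 4G figure are illustrative.

```shell
#!/usr/bin/env bash
# run_with_memory_max LIMIT COMMAND [ARGS...]: start COMMAND in a transient
# user scope whose memory is capped at LIMIT (e.g. 4G). If the scope reaches
# LIMIT and memory cannot be reclaimed, the OOM killer acts within the scope.
run_with_memory_max() {
    local limit=$1
    shift
    systemd-run --user --scope -p MemoryMax="$limit" -- "$@"
}

# Example, from a shell inside the user's graphical session:
#   run_with_memory_max 4G plasmashell --replace
```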

Of course at this point we don't know what the issue is for the OP, and he
doesn't use plasma5, so I doubt plasmashell is his issue.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From David W. Hodgins@2:250/1 to All on Sat Jul 29 21:09:21 2023

On Sat, 29 Jul 2023 14:52:19 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:

On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

My bash solution
while read -r line ; do
set -- $line
_bin=$3
shift $(( $# - 2 ))
_cpu=$1
_mem=$2

The solution of putting the comm field last works fine.

$ ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="2.0" '$5 >= min || $5=="%MEM"'
USER UID PID %CPU %MEM COMMAND
dave 500 6969 3.7 4.9 opera
dave 500 33667 7.5 4.0 firefox
dave 500 6186 0.7 2.9 plasmashell
dave 500 34910 1.3 2.3 Isolated Web Co
dave 500 34975 0.8 2.3 Isolated Web Co
dave 500 53719 2.1 2.0 Isolated Web Co

The Isolated Web Content processes are all firefox.
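Because firefox spreads its memory over several processes, a per-process list can hide how much one application uses in total. A rough sketch that adds up %MEM per COMMAND name from the comm-last output above; the helper name is made up, and since %MEM is based on each process's resident set, pages shared between processes are counted once per process, so the totals can overstate real usage.

```shell
#!/usr/bin/env bash
# mem_by_command: total the %MEM column per COMMAND name from
# "ps -Ao user,uid,pid,pcpu,pmem,comm" output and list the largest first.
mem_by_command() {
    awk 'NR > 1 {
            cmd = $6
            # COMMAND may contain spaces (e.g. "Isolated Web Co"), so rejoin
            # fields 6..NF; runs of spaces collapse to a single space
            for (i = 7; i <= NF; i++) cmd = cmd " " $i
            total[cmd] += $5
         }
         END { for (c in total) printf "%.1f %s\n", total[c], c }' |
        LC_ALL=C sort -rn
}

ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | mem_by_command | head -n 10
```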

Regards, Dave Hodgins

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Sat Jul 29 23:39:00 2023

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

Yes. Normal behavior for running out of RAM is swapping, and for running
out of RAM+swap is to start killing user processes.

The problem is that swapping Xorg and the desktop environment in and out
makes things extremely slow, and the tradition in Linux is to kill a
random process (not sure if that has changed nowadays). The likelihood
that the right process is killed is slim; if the wrong one is killed,
the swapping will not end and will keep the system slow, and even if it
kills the right process, in reality it will take hours before anything
happens because of all the swapping in and out.

My experience is that things do not get better even if you disable swap
(you seldom really need swap when you have 64GB RAM); for some reason the
disk seems to be as active as during swapping in and out. Could it
have to do with zram?

Something that makes a difference is setting a hard memory limit on the
process; then it will be killed when it tries to use more RAM than the
limit.

Of course at this point we don't know what the issue is for the OP, and he
doesn't use plasma5, so I doubt plasmashell is his issue.

Actually I do use Plasma and sddm.
But as I said, there is little indication beforehand that swapping
is the problem (in the times that it happened to me while I was
working on the system; as I said, this last time it occurred just before
bedtime and I discovered it the next morning).

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Bit Twister@2:250/1 to All on Sun Jul 30 04:40:43 2023

On Sat, 29 Jul 2023 16:09:21 -0400, David W. Hodgins wrote:

On Sat, 29 Jul 2023 14:52:19 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:

On Sat, 29 Jul 2023 09:09:32 -0400, David W. Hodgins wrote:

On Sat, 29 Jul 2023 04:18:32 -0400, J.O. Aho <user@example.net> wrote:

*/5 * * * * root ( date && ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="10.0" '$6 >= min || $6=="\%MEM"' && echo "----" ) >> /var/log/mem.log

That is not reliable. /proc/$PID/comm may contain spaces, so by the time awk gets it the column number is wrong.

[dave@x3 ~]$ ps -Ao user,uid,comm,pid,pcpu,pmem --sort=-pmem | awk -v min="3.0" '$6 >= min || $6=="%MEM"'
USER UID COMMAND PID %CPU %MEM
dave 500 opera 6969 2.8 4.7
ddclient 468 ddclient - slee 5691 0.0 0.0
[dave@x3 ~]$ cat /proc/5691/comm
ddclient - slee
[dave@x3 ~]$ cat /proc/5691/cmdline
ddclient - sleeping for 90 seconds[dave@x3 ~]$

I don't see a reliable field separator.

My bash solution
while read -r line ; do
set -- $line
_bin=$3
shift $(( $# - 2 ))
_cpu=$1
_mem=$2

The solution of putting the comm field last works fine.

$ ps -Ao user,uid,pid,pcpu,pmem,comm --sort=-pmem | awk -v min="2.0" '$5 >= min || $5=="%MEM"'
USER UID PID %CPU %MEM COMMAND
dave 500 6969 3.7 4.9 opera
dave 500 33667 7.5 4.0 firefox
dave 500 6186 0.7 2.9 plasmashell
dave 500 34910 1.3 2.3 Isolated Web Co
dave 500 34975 0.8 2.3 Isolated Web Co
dave 500 53719 2.1 2.0 Isolated Web Co

Well, damn, got to go get crowbar to get head out of derriere.
Nice solution.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From David W. Hodgins@2:250/1 to All on Sun Jul 30 05:09:02 2023

On Sat, 29 Jul 2023 23:40:43 -0400, Bit Twister <BitTwister@mouse-potato.com> wrote:

Nice solution.

It was J.O. Aho that posted that solution.

Regards, Dave Hodgins

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Richard Kettlewell@2:250/1 to All on Sun Jul 30 11:36:00 2023

William Unruh <unruh@invalid.ca> writes:

On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:

Does it respond to ping?

No.

[...]

If it does not ping then the kernel has crashed, either due to a kernel
bug or a hardware fault.

That is sure what it looks like. Weird thing is that the video card is
still sending out the last image, so it is running.

The video card keeps transmitting to the display under its own steam; it
doesn’t need the kernel to tell it to do so.

--
https://www.greenend.org.uk/rjk/

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: terraraq NNTP server (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sun Jul 30 11:54:33 2023

On 7/30/23 00:39, William Unruh wrote:

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

Except in my case, nothing I type does anything, the mouse cursor does
not move, the Alt-Ctrl-F keys do nothing, Alt-Ctrl-Del or Alt-Ctrl-Backspace
do nothing, and gkrellm stops updating. Usually when I have run out of
memory, some things still work, and the machine slows down drastically as
it starts to swap. This is just a complete, sudden freeze. (Unlike this
past time, sometimes this has happened while I am working on the
machine. This time it happened while I was asleep.)

Yes. Normal behavior for running out of RAM is swapping, and for running
out of RAM+swap is to start killing user processes.

The problem is that swapping Xorg and the desktop environment in and out
makes things extremely slow, and the tradition in Linux is to kill a
random process (not sure if that has changed nowadays). The likelihood
that the right process is killed is slim; if the wrong one is killed,
the swapping will not end and will keep the system slow, and even if it
kills the right process, in reality it will take hours before anything
happens because of all the swapping in and out.

My experience is that things do not get better even if you disable swap
(you seldom really need swap when you have 64GB RAM); for some reason the
disk seems to be as active as during swapping in and out. Could it
have to do with zram?

Something that makes a difference is setting a hard memory limit on the
process; then it will be killed when it tries to use more RAM than the
limit.

Of course at this point we don't know what the issue is for the OP, and he
doesn't use plasma5, so I doubt plasmashell is his issue.

Actually I do use Plasma and sddm.
But as I said, there is little indication beforehand that swapping
is the problem (in the times that it happened to me while I was
working on the system; as I said, this last time it occurred just before
bedtime and I discovered it the next morning).

Not sure about your setup; I have an nVidia RTX 2060 with the closed-source
driver (generally the latest), a kernel that tends to be the latest
from the distribution, and the latest version of kde/plasma.

Running two Xorg with their own plasma5, one with weather applet and
directory display and the other one with just the directory display.
Fixed image background on both.

When the memory leak starts is quite random, but at the latest 2 days after
plasmashell starts it will have begun to eat up a lot of memory, first
slowly and then faster and faster. Using strace, it seems that
plasmashell is trying to connect somewhere and gets a timeout; during
the timeout period it manages to queue 3-4 more requests (I haven't figured
out where it tries to connect). Usually it's just one of the
plasmashells that hogs a lot of memory, like 30-40GB, while the other may
be just around 10GB (which one uses the most is random). When the RAM
runs out, the disk activity starts (remember that I do not have
swap), and once the disk activity has begun, Xorg is so slow that
if you move the mouse pointer you will see it move a small distance and
then freeze. It is now impossible to ssh to the machine; if you
have a running ssh connection to the machine you can still type
something, but it will be as slow as a 120 baud modem. If you are already
root ('su -' before it happens), then you can run 'killall -9 plasmashell',
and after a short while the machine will be somewhat usable
again. Now you just need to start plasmashell for each user; this
tends to be a two-step process: first "su - username" and run
"DISPLAY=:X.0 plasmashell --replace", then switch to that user's
VT and run "plasmashell --replace" from konsole, and then repeat
the process with the next user.

There is a bug report on this. I kept updating it for each
version of kde/plasma I had installed and from time to time sent in more
logs, but the progress has been 0%.

There is a known bug in the wallpaper slideshow, but I never used that
one, so it's not the one affecting me.

Nowadays I don't use plasma5 at all. I will give plasma6 a try when
it's in the distro's repository, but until then I will keep on using lxqt,
and I suggest you also look for another desktop environment to use
until plasma6 is out for your distro.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From Paul@2:250/1 to All on Sun Jul 30 12:01:12 2023

On 7/30/2023 6:36 AM, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

On 2023-07-29, Richard Kettlewell <invalid@invalid.invalid> wrote:

Does it respond to ping?

No.

[...]

If it does not ping then the kernel has crashed, either due to a kernel
bug or a hardware fault.

That is sure what it looks like. Weird thing is that the video card is
still sending out the last image, so it is running.

The video card keeps transmitting to the display under its own steam; it doesn’t need the kernel to tell it to do so.

For a non-tearing display, I would expect double buffering, and some
sort of command issued to aid compositing (Z-axis sorting). That may
require a minimum of some sort of command list to implement compositing.
A command list issued at least 60 times a second.

And there's more than one level of compositing going on, as a web browser
also composites, even when its "window" is "stable". It is still composing
the same stable image, 60 times a second.

Even when a computer is idle, modern developers have made it busy,
in ways that were never present in the past. There was a time when
a computer was actually idle: if you moved a window, it required a
redraw, but at least the behavior was proportionate to user input.
The modern way of doing things, by contrast, is just abusive.

Drawing conclusions about how stuff works on computers now by
observing failures is pretty difficult. But there are some
examples available. For example, when the memory consumption
of a browser used to shoot up, I managed to trace it one
day (by noticing a graphical failure-to-update) to a
breakage in the browser "output" being "consumed" by the
window manager. It seemed to indicate that the browser
process did not recognize an overflow condition and
did not "throw away" unconsumed content it was sending
to the window manager.

It's not like the old days, when some game would crash
and various mechanical aspects of how graphics
and sound worked presented themselves. The failures now
are much harder to observe. For example, on an old game
crash, the sound buffer pointer was reused again and again,
so a repetitive sound would come out of the speaker until
the PC was reset.

Paul

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Sun Jul 30 18:59:22 2023

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 00:39, William Unruh wrote:

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

....

Running two Xorg with their own plasma5, one with weather applet and directory display and the other one with just the directory display.
Fixed image background on both.

When the memory leak starts is quite random, but at the latest 2 days after
plasmashell starts it will have begun to eat up a lot of memory, first
slowly and then faster and faster. Using strace, it seems that
plasmashell is trying to connect somewhere and gets a timeout; during
the timeout period it manages to queue 3-4 more requests (I haven't figured
out where it tries to connect). Usually it's just one of the
plasmashells that hogs a lot of memory, like 30-40GB, while the other may
be just around 10GB (which one uses the most is random). When the RAM

How much ram do you have?
My plasmashell right now is 2GB VSZ

I run Intel onboard graphics. There is no slowdown beforehand. It is just
as if someone threw a switch to shut everything down, except that the
image keeps showing (ie the graphics card is still sending out a
signal).
No disk activity, no slowdown. Just stop.

runs out, the disk activity starts (remember that I do not have
swap), and once the disk activity has begun, Xorg is so slow that
if you move the mouse pointer you will see it move a small distance and
then freeze. It is now impossible to ssh to the machine; if you
have a running ssh connection to the machine you can still type
something, but it will be as slow as a 120 baud modem. If you are already
root ('su -' before it happens), then you can run 'killall -9 plasmashell',
and after a short while the machine will be somewhat usable
again. Now you just need to start plasmashell for each user; this
tends to be a two-step process: first "su - username" and run
"DISPLAY=:X.0 plasmashell --replace", then switch to that user's
VT and run "plasmashell --replace" from konsole, and then repeat
the process with the next user.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Mike Easter@2:250/1 to All on Sun Jul 30 20:11:21 2023

J.O. Aho wrote:

Nowadays I don't use plasma5 at all. I will give plasma6 a try when
it's in the distro's repository, but until then I will keep on using lxqt,
and I suggest you also look for another desktop environment to use
until plasma6 is out for your distro.

I realize that we are talking about an 'unknown' diagnosis, but it
sounds like it is closely associated w/ the KDE desktop. Could
theoretically anyone using plasma5 spring a memory leak, or is
there potentially some much more limited concept going on here?

Like some 'specific' KDE 'gear' that if someone didn't have that KDE
app, they wouldn't have a leak?

--
Mike Easter

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sun Jul 30 21:03:28 2023

On 7/30/23 19:59, William Unruh wrote:

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 00:39, William Unruh wrote:

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

...

Running two Xorg with their own plasma5, one with weather applet and
directory display and the other one with just the directory display.
Fixed image background on both.

When the memory leak starts is quite random, but at the latest 2 days after
plasmashell starts it will have begun to eat up a lot of memory, first
slowly and then faster and faster. Using strace, it seems that
plasmashell is trying to connect somewhere and gets a timeout; during
the timeout period it manages to queue 3-4 more requests (I haven't figured
out where it tries to connect). Usually it's just one of the
plasmashells that hogs a lot of memory, like 30-40GB, while the other may
be just around 10GB (which one uses the most is random). When the RAM

How much ram do you have?

64GB. I used to have 32, but I ran into the issue and was hoping that doubling the amount of RAM would make it less of a problem; gosh, was I wrong.

My plasmashell right now is 2GB VSZ

I think it usually took ~700MB when it started; if it was up at
2GB, I knew it would start to eat more memory, so a "plasmashell --replace"
was the only thing that kept it from going bad, for a
while. In the best case you buy yourself another day.
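The "restart it once it passes about 2GB" routine could at least be partly automated by reporting oversized plasmashell processes. A sketch that only reports and restarts nothing; it assumes resident memory (RSS, which ps reports in KiB) is the figure worth watching and uses 2048 MiB as the threshold, loosely based on the number above. The helper name is made up.

```shell
#!/usr/bin/env bash
# big_plasmashells LIMIT_MIB: read "user pid rss_kib" lines and print the
# processes whose resident size is above LIMIT_MIB.
big_plasmashells() {
    awk -v lim="$1" '$3 / 1024 > lim { printf "%s %s %.0f MiB\n", $1, $2, $3 / 1024 }'
}

# "=" after each column name suppresses the ps header line.
ps -C plasmashell -o user=,pid=,rss= | big_plasmashells 2048
```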

I run Intel onboard graphics. There is no slowdown beforehand. It is just
as if someone threw a switch to shut everything down, except that the
image keeps showing (ie the graphics card is still sending out a
signal).

You will have a desktop on your screen all the way until you reboot. So
far I have never seen the plasmashell process get killed automatically; if it were,
your screen would turn black, but with a mouse pointer that you can move around.

No disk activity, no slowdown. Just stop.

If you had been there an hour earlier, I think you might have caught the
slowdown, as it's related to the amount of free RAM you have left; when
you have none left, it's freeze time.
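To check whether available memory really drops before the freeze, MemAvailable from /proc/meminfo (the kernel's estimate of memory available for new applications without swapping) can be logged at a short interval and read back afterwards. A sketch; the helper name, log path and 60-second interval are assumptions, and because a hard freeze stops everything, the last few lines written before it may never reach the disk (logging to another machine over ssh would make that less likely).

```shell
#!/usr/bin/env bash
# mem_available_mib [FILE]: print MemAvailable (the kernel's estimate of
# memory available without swapping) from /proc/meminfo, or FILE, in MiB.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Example logger (runs until interrupted):
#   while sleep 60; do
#       echo "$(date '+%F %T') $(mem_available_mib) MiB available"
#   done >> /var/tmp/memavail.log
```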

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Sun Jul 30 21:10:56 2023

On 7/30/23 21:11, Mike Easter wrote:

J.O. Aho wrote:

Nowadays I don't use plasma5 at all. I will give plasma6 a try when
it's in the distro's repository, but until then I will keep on using lxqt,
and I suggest you also look for another desktop environment to use
until plasma6 is out for your distro.

I realize that we are talking about an 'unknown' diagnosis, but it
sounds like it is closely associated w/ the KDE desktop. Could
theoretically anyone using plasma5 spring a memory leak, or is
there potentially some much more limited concept going on here?

Like some 'specific' KDE 'gear' that if someone didn't have that KDE
app, they wouldn't have a leak?

I have had this issue for some years; it has only affected my primary
desktop, where I have two X sessions running.

I do not have this issue on our laptops, which each run just one X
session. All of them run the same distro and are up to date: the
desktop has nVidia graphics, one laptop Intel, and the other laptop
Intel/nVidia. Applet-wise they are configured the same.

If I understood William correctly, he has Intel graphics on his desktop
and runs one X session.

So we do not know what causes the issue; KDE/plasma5 has some odd
memory leak that seems to be random. Of course a setting could be
causing it, but there isn't anything in the xsession-errors log that
would hint at what it could be.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From Jim Diamond@2:250/1 to All on Mon Jul 31 01:30:05 2023

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 00:39, William Unruh wrote:

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

...

I'm running two Xorg sessions, each with its own plasma5: one with the
weather applet and directory display, the other with just the directory
display. Fixed image background on both.

When the memory leak starts is quite random, but at the latest 2 days
after plasmashell starts it will have begun to eat up a lot of memory,
first slowly and then faster and faster. Using strace, it seems
plasmashell is trying to connect somewhere and gets a timeout; during
the timeout period it manages to queue 3-4 more requests (I haven't
figured out where it tries to connect). Usually just one of the
plasmashells hogs a lot of memory, like 30-40GB, while the other may
just be around 10GB (which one uses the most is random). When the RAM

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

What is the RSS?

(Right now I have a chromium process whose VSZ and RSS are reported by ps
to be, respectively, 1185776448 and 63232.)
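The following illustration of the VSZ/RSS distinction was not posted in the thread. It is a minimal Python sketch that assumes a Linux /proc filesystem. It reads the VmSize and VmRSS fields from /proc/<pid>/status. These correspond to the VSZ and RSS columns that ps prints, both in KiB. The function names are made up for this example.

```python
from __future__ import annotations

from pathlib import Path


def parse_status(text: str) -> dict[str, int]:
    """Collect the Vm* memory fields (values in KiB) from /proc/<pid>/status text."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        # Memory lines look like "VmRSS:     63232 kB"
        if key.startswith("Vm") and value.strip().endswith("kB"):
            fields[key] = int(value.split()[0])
    return fields


def vsz_and_rss_kib(pid: str = "self") -> tuple[int, int]:
    """Return (VSZ, RSS) in KiB for a process, as ps would report them."""
    fields = parse_status(Path(f"/proc/{pid}/status").read_text())
    # VmSize is the total mapped address space (ps: VSZ);
    # VmRSS is what is actually resident in RAM (ps: RSS).
    return fields["VmSize"], fields["VmRSS"]
```

For example, vsz_and_rss_kib("1234") would report on a hypothetical PID 1234. From the shell, `ps -o pid,vsz,rss,comm -C plasmashell` shows the same two columns.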

Jim

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Richard Kettlewell@2:250/1 to All on Mon Jul 31 08:11:47 2023

Jim Diamond <JimDiamond@jdvb.ca> writes:

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

Indeed VSZ tells you very little - for instance it includes a 2MB dead
page in the middle of every shared library, which (in a complex process)
scales up VSZ to something huge, but consumes almost no real resources.

What is the RSS?

I think the theories about user space memory leaks are a red herring.
The reported behavior is just not consistent with them.

--
https://www.greenend.org.uk/rjk/

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: terraraq NNTP server (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Mon Jul 31 08:45:27 2023

On 7/30/23 22:03, J.O. Aho wrote:

On 7/30/23 19:59, William Unruh wrote:

My plasmashell right now is 2GB VSZ

I think it usually took ~700MB when it started; once it was up at
2GB I knew it would start to eat more memory, so a "plasmashell
--replace" was the only thing that kept it from going bad, for a
while. In the best case you buy yourself another day.

Sorry, I missed that you looked at VSZ; that one is irrelevant. Look at
RSS, or calculate it from %MEM.

--
//Aho

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Mon Jul 31 08:49:52 2023

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 19:59, William Unruh wrote:

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 00:39, William Unruh wrote:

On 2023-07-29, J.O. Aho <user@example.net> wrote:

On 29/07/2023 11:44, Richard Kettlewell wrote:

William Unruh <unruh@invalid.ca> writes:

....

How much ram do you have?

64GB. I used to have 32GB, but I ran into the issue and hoped that doubling the amount of RAM would make it less severe; gosh, I was wrong.

I have 8GB. ps says plasmashell right now has a VSZ of 1.9GB and %MEM of
3.1% (which should be 250MB, I guess).

My plasmashell right now is 2GB VSZ

I think it usually took ~700MB when it started; once it was up at
2GB I knew it would start to eat more memory, so a "plasmashell
--replace" was the only thing that kept it from going bad, for a
while. In the best case you buy yourself another day.

I run Intel onboard graphics. There is no slowdown beforehand. It is
just as if someone threw a switch to shut everything down, except that
the image keeps showing (i.e. the graphics card is still sending out a
signal).

You will have a desktop on your screen all the way until you reboot. So
far I have never seen the plasmashell process get killed automatically;
if it were, your screen would turn black, but with a mouse pointer that
you can still move around.

No disk activity, no slowdown. It just stops.

If you had been there an hour earlier, I think you might have caught
the slowdown, as it's related to the amount of free RAM you have left;
when you run out, it's freeze time.

I have about 8GB of swap, so it should slow down when it starts to use
swap.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From J.O. Aho@2:250/1 to All on Mon Jul 31 09:37:56 2023

On 7/31/23 09:49, William Unruh wrote:

On 2023-07-30, J.O. Aho <user@example.net> wrote:

On 7/30/23 19:59, William Unruh wrote:

...

How much ram do you have?

64GB. I used to have 32GB, but I ran into the issue and hoped that
doubling the amount of RAM would make it less severe; gosh, I was wrong.

I have 8GB. ps says plasmashell right now has a VSZ of 1.9GB and %MEM of
3.1% (which should be 250MB, I guess).

Then your plasmashell is using 250MB at the moment. The VSZ is, in
theory, the amount of RAM the process would use if it had to load
everything at once, but as you aren't using all the features of
plasmashell this will not happen.

The RSS figure includes all the memory the process's dependencies, like
libraries, may use. As most libraries in Linux are shared, the same
shared library can be counted multiple times (once in each process that
uses it); this applies to %MEM too.

If your %MEM reaches 25%, it will be using about 2GB of memory, and RSS
will also give you something in that region, give or take, as it's a
more accurate value than %MEM.
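The arithmetic in the last two posts can be checked with a small Python sketch. It assumes, as the ps man page describes, that %MEM is the process's resident set size as a percentage of physical memory. Here physical memory is taken to be MemTotal from /proc/meminfo. The function names are illustrative only. With a nominal 8GB (8388608 KiB), 3.1% works out to about 254MiB and 25% to 2048MiB. On a real machine MemTotal is a little below the installed RAM, so the figures shift slightly.

```python
def mem_total_kib(meminfo_text: str) -> int:
    """Extract MemTotal (KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def pct_to_mib(pct_mem: float, total_kib: int) -> float:
    """Invert %MEM = RSS / MemTotal * 100 to get an approximate RSS in MiB."""
    return pct_mem / 100 * total_kib / 1024


def rss_to_pct(rss_kib: int, total_kib: int) -> float:
    """The forward direction: RSS (KiB) as a percentage of MemTotal."""
    return 100 * rss_kib / total_kib
```

For instance, pct_to_mib(3.1, mem_total_kib(open("/proc/meminfo").read())) gives the figure for the machine it is run on.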

If you had been there an hour earlier, I think you might have caught
the slowdown, as it's related to the amount of free RAM you have left;
when you run out, it's freeze time.

I have about 8GB of swap, so it should slow down when it starts to use
swap.

That depends on what has been swapped out. If it's just things that are
seldom used (say, hibernated tabs in your browser), you will not notice
anything until you load such a tab. The issue is when it starts swapping
in and out data that it actually needs; then the disk light will blink
really rapidly and everything goes extremely slowly or seems to have
frozen.
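One way to tell harmless swap use from the thrashing described above is to watch the kernel's cumulative swap counters. The following is a hedged Python sketch that assumes Linux's /proc/vmstat. Its pswpin and pswpout fields count pages swapped in and out since boot. The function names and the 5-second interval are arbitrary choices for this example.

```python
from __future__ import annotations

import time
from pathlib import Path


def swap_counters(vmstat_text: str) -> tuple[int, int]:
    """Return cumulative (pswpin, pswpout) page counts from /proc/vmstat text."""
    values = dict(line.split() for line in vmstat_text.splitlines() if line.strip())
    return int(values["pswpin"]), int(values["pswpout"])


def swap_rates(before: tuple[int, int], after: tuple[int, int],
               seconds: float) -> tuple[float, float]:
    """Pages per second swapped in and out between two counter samples."""
    return (after[0] - before[0]) / seconds, (after[1] - before[1]) / seconds


def watch_swap(interval: float = 5.0) -> None:
    """Print swap-in/swap-out rates every `interval` seconds until interrupted."""
    vmstat = Path("/proc/vmstat")
    prev = swap_counters(vmstat.read_text())
    while True:
        time.sleep(interval)
        cur = swap_counters(vmstat.read_text())
        pages_in, pages_out = swap_rates(prev, cur, interval)
        print(f"swap-in {pages_in:8.1f} pages/s  swap-out {pages_out:8.1f} pages/s",
              flush=True)
        prev = cur
```

Calling watch_swap() in a terminal prints one line per interval. Sustained non-zero rates in both directions suggest the "swapping data it needs" case, while a one-off burst of swap-out does not. The si/so columns of `vmstat 5` show much the same information.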

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: Air Applewood, The Linux Gateway to the UK & Eire (2:250/1@fidonet)

From Richard Kettlewell@2:250/1 to All on Mon Jul 31 12:28:04 2023

"J.O. Aho" <user@example.net> writes:

Then your plasmashell is using 250MB at the moment. The VSZ is, in
theory, the amount of RAM the process would use if it had to load
everything at once, but as you aren't using all the features of
plasmashell this will not happen.

On a 64-bit Linux, VSZ is often way more than the RAM a process could
possibly use. Each shared library has a 2MB PROT_NONE mapping that
contributes to VSZ but can never consume any RAM, and GUI processes
often have huge numbers of shared libraries, so this can easily reach
into the hundreds of megabytes.
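One way to check this on a running process is sketched below in Python, assuming Linux's /proc/<pid>/maps format. It totals the address ranges whose permission field begins with --- (no read, write or execute access, i.e. PROT_NONE). Note that this counts every inaccessible mapping, including things like thread-stack guard regions, not only the per-library gaps. The helper name is hypothetical.

```python
from __future__ import annotations


def prot_none_bytes(maps_text: str) -> tuple[int, int]:
    """Return (total mapped bytes, bytes in mappings with no access rights)
    from the text of /proc/<pid>/maps."""
    total = dead = 0
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # First field is "start-end" in hex, second is the permission string
        start, end = (int(addr, 16) for addr in fields[0].split("-"))
        size = end - start
        total += size
        if fields[1].startswith("---"):  # no r, w or x: a PROT_NONE mapping
            dead += size
    return total, dead
```

For example, prot_none_bytes(open("/proc/1234/maps").read()) would report on a hypothetical PID 1234. Both numbers measure address space, roughly what VSZ reflects, not RAM.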

See https://www.greenend.org.uk/rjk/tech/dataseg.html for discussion.

--
https://www.greenend.org.uk/rjk/

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: terraraq NNTP server (2:250/1@fidonet)

From Jim Diamond@2:250/1 to All on Mon Jul 31 15:24:56 2023

On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:

Jim Diamond <JimDiamond@jdvb.ca> writes:

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

Indeed VSZ tells you very little - for instance it includes a 2MB dead
page in the middle of every shared library, which (in a complex process) scales up VSZ to something huge, but consumes almost no real resources.

What is the RSS?

I think the theories about user space memory leaks are a red herring.
The reported behavior is just not consistent with them.

I have not followed the thread too closely, but I suspect you are right.

But when I saw someone reporting VSZ, I thought that they should know that
it wasn't meaningful, just in case they keep barking up that tree.

Cheers.
Jim

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Mon Jul 31 17:47:33 2023

On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:

On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:

Jim Diamond <JimDiamond@jdvb.ca> writes:

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

Indeed VSZ tells you very little - for instance it includes a 2MB dead
page in the middle of every shared library, which (in a complex process)
scales up VSZ to something huge, but consumes almost no real resources.

What is the RSS?

I think the theories about user space memory leaks are a red herring.
The reported behavior is just not consistent with them.

I have not followed the thread too closely, but I suspect you are right.

But when I saw someone reporting VSZ, I thought that they should know that
it wasn't meaningful, just in case they keep barking up that tree.

Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
could not see anything else in the ps aux report that could be the
actual memory used (except %MEM, and I had no idea what it was a
percentage of).

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From Paul@2:250/1 to All on Tue Aug 1 10:15:07 2023

On 7/31/2023 12:47 PM, William Unruh wrote:

On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:

On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:

Jim Diamond <JimDiamond@jdvb.ca> writes:

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

Indeed VSZ tells you very little - for instance it includes a 2MB dead
page in the middle of every shared library, which (in a complex process)
scales up VSZ to something huge, but consumes almost no real resources.

What is the RSS?

I think the theories about user space memory leaks are a red herring.
The reported behavior is just not consistent with them.

I have not followed the thread too closely, but I suspect you are right.

But when I saw someone reporting VSZ, I thought that they should know that
it wasn't meaningful, just in case they keep barking up that tree.

Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
could not see anything else in the ps aux report that could be the
actual memory used (except %MEM, and I had no idea what it was a
percentage of).

Only you know the particulars of your system.

You say you're using Plasma, and "plasma freezes" shows in Google searches.

If you were to see RAM consumption rising (say, by leaving top running
when the freeze occurs), that would match the example here, where one
post mentions it could be a video card driver issue: the graphics stack
is producing frames, but there is no consumer.

https://bugzilla.redhat.com/show_bug.cgi?id=1399396

In some cases the NVidia driver has a watchdog enabled, while Nouveau
does not. This makes the NVidia driver capable of doing a VPU recover
and restoring service before it is too late.
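The screen stops updating during these freezes, so a top window may not show the final state. A hedged alternative is sketched below in Python. The log path /var/tmp/memlog.txt, the 60-second interval and the choice of the five largest processes are all arbitrary assumptions. The sketch appends MemAvailable and the biggest processes by RSS to a file, then calls os.fsync after each snapshot. That makes it more likely that whatever was written before a hard power-off is actually on disk. If the kernel hangs mid-interval, the last snapshot may still be missing.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def meminfo_kib(meminfo_text: str, field: str) -> int:
    """Return one field (in KiB) from /proc/meminfo text, e.g. "MemAvailable"."""
    for line in meminfo_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def rss_kib(status_text: str) -> int:
    """VmRSS in KiB from /proc/<pid>/status text; 0 for kernel threads."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def top_by_rss(procs: dict[str, tuple[str, int]], n: int = 5) -> list[tuple[str, str, int]]:
    """procs maps pid -> (name, RSS KiB); return the n largest as (pid, name, rss)."""
    ranked = sorted(procs.items(), key=lambda item: item[1][1], reverse=True)
    return [(pid, name, rss) for pid, (name, rss) in ranked[:n]]


def snapshot() -> dict[str, tuple[str, int]]:
    """Read the name and RSS of every process currently listed in /proc."""
    procs: dict[str, tuple[str, int]] = {}
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            status = (entry / "status").read_text()
        except OSError:
            continue  # the process exited while we were scanning
        name = status.splitlines()[0].split(":", 1)[1].strip()  # "Name:\t..."
        procs[entry.name] = (name, rss_kib(status))
    return procs


def log_forever(path: str = "/var/tmp/memlog.txt", interval: float = 60.0) -> None:
    """Append a timestamped snapshot every `interval` seconds, fsync'd to disk."""
    with open(path, "a") as log:
        while True:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            avail = meminfo_kib(Path("/proc/meminfo").read_text(), "MemAvailable")
            log.write(f"{stamp} MemAvailable {avail} KiB\n")
            for pid, name, rss in top_by_rss(snapshot()):
                log.write(f"{stamp} {pid:>7} {rss:>10} KiB {name}\n")
            log.flush()
            os.fsync(log.fileno())  # force the data out of the page cache
            time.sleep(interval)
```

Start log_forever() in the background from the user session. After the next freeze, the tail of the log should show whether one process was growing or MemAvailable was shrinking beforehand.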

Paul

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)

From William Unruh@2:250/1 to All on Tue Aug 1 17:26:32 2023

On 2023-08-01, Paul <nospam@needed.invalid> wrote:

On 7/31/2023 12:47 PM, William Unruh wrote:

On 2023-07-31, Jim Diamond <JimDiamond@jdvb.ca> wrote:

On 2023-07-31 at 04:11 ADT, Richard Kettlewell <invalid@invalid.invalid> wrote:

Jim Diamond <JimDiamond@jdvb.ca> writes:

On 2023-07-30 at 14:59 ADT, William Unruh <unruh@invalid.ca> wrote:

How much ram do you have?
My plasmashell right now is 2GB VSZ

I don't think the VSZ is all that significant.

Indeed VSZ tells you very little - for instance it includes a 2MB dead
page in the middle of every shared library, which (in a complex process)
scales up VSZ to something huge, but consumes almost no real resources.

What is the RSS?

I think the theories about user space memory leaks are a red herring.
The reported behavior is just not consistent with them.

I have not followed the thread too closely, but I suspect you are right.
But when I saw someone reporting VSZ, I thought that they should know that
it wasn't meaningful, just in case they keep barking up that tree.

Yes, thanks for that. VSZ seemed much too large to be reasonable, but I
could not see anything else in the ps aux report that could be the
actual memory used (except %MEM, and I had no idea what it was a
percentage of).

Only you know the particulars of your system.

You say you're using Plasma, and "plasma freezes" shows in Google searches.

As I said, it is not just plasma that freezes but the whole system. I
cannot log in via ssh over the network, no keyboard entry works (even
alt-ctrl-del does not reboot the system), and the mouse cursor does not
move when I move the mouse. In the past (i.e. over a year ago) when this
happened, there was no precursor. The system did not slow down
beforehand. I know what swapping feels like, and there was no evidence
of that. (I have as much swap space as I have RAM, about 8GB each.)

If you were to see RAM consumption rising (say, by leaving top running
when the freeze occurs), that would match the example here, where one
post mentions

The freeze occurs randomly, about once a month or so (or rather, I have
had two in the past 2 months and none between last August and then).

it is possible it's a video card driver issue. The graphics stack
is producing frames, but there is no consumer.

I have onboard Intel graphics.
lspci says
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]

https://bugzilla.redhat.com/show_bug.cgi?id=1399396

In some cases the NVidia driver has a watchdog enabled, while Nouveau
does not. This makes the NVidia driver capable of doing a VPU recover
and restoring service before it is too late.

--- MBSE BBS v1.0.8.4 (Linux-x86_64)
* Origin: A noiseless patient Spider (2:250/1@fidonet)
