Forum: >>> Magnum BBS <<<

System freezes: How to get the reason?

From Paulo da Silva@21:1/5 to All on Mon Sep 27 18:22:21 2021

XPost: alt.os.linux

Hi all!

From time to time - it may be after a month or after a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
on-demand mode. No success. Sooner or later it freezes.

Is there a way to get some information about why this is happening?

I am using kubuntu 20.04.

Thank you.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Marco Moock@21:1/5 to All on Mon Sep 27 19:38:14 2021

On Mon, 27 Sep 2021 18:35:37 +0100,
Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

At 18:27 on 27/09/21, Marco Moock wrote:

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in
another computer?

I can't use the nouveau driver because I need the computer (a laptop)
for AI deep learning with tensorflow GPU.

You can test whether it also happens with nouveau, for example in the live system, which doesn't have nvidia-470 installed.

From Paul@21:1/5 to Paulo da Silva on Mon Sep 27 13:59:07 2021

Paulo da Silva wrote:

At 18:27 on 27/09/21, Marco Moock wrote:

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in another computer?

I can't use the nouveau driver because I need the computer (a laptop)
for AI deep learning with tensorflow GPU.

Can you correlate the failure with any particular
activity on the machine?

For example, a more mundane activity on a computer,
is the usage of modern Firefox. While the user is
not viewing a web page, Firefox seems to leak memory
until all available memory in Ring 3 is used up.

But Linux has Out of Memory (OOM) killer, for the
handling of memory exhaustion that way. The system
should not freeze because Firefox happens to be
running.

Whereas, I don't know what happens, if a GPU that
uses shared memory, happens to request more and
more RAM for some GPU activity. An NVidia GPU is
more likely to have its own memory chips, and be
less likely to cause resource exhaustion on its own.

Try running "nvidia-smi" in a terminal window,
selecting the option to have it update the
screen repetitively (like "top" in a sense), and
watch resource consumption listed there. If you're
running the NVidia driver, that program should be
installed for you.

You could run "top" in one terminal window (using
the information near the top of top, for resource info).
And run "nvidia-smi" in a second window, to watch
for dwindling NVidia resources.
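
A minimal sketch of the monitoring suggested above, for the case where nobody is looking at the terminal when the freeze hits. It prints one timestamped line with the available system memory and a few GPU figures, so the last lines of a log show what the resources looked like shortly before the lock-up. The function name `snapshot`, the log path and the 60-second interval in the usage note are assumptions for illustration; the `--query-gpu` fields are standard `nvidia-smi` properties.

```shell
# Print one timestamped line with free system memory and GPU figures.
# "snapshot" is a hypothetical helper name, not something from this thread.
snapshot() {
    # MemAvailable is reported by the kernel in /proc/meminfo (value in kB).
    mem_avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null) || mem_avail="unknown"

    # nvidia-smi ships with the proprietary driver; degrade gracefully without it.
    if command -v nvidia-smi >/dev/null 2>&1; then
        gpu=$(nvidia-smi --query-gpu=memory.used,memory.total,temperature.gpu,utilization.gpu \
                         --format=csv,noheader 2>/dev/null) || gpu="nvidia-smi failed"
    else
        gpu="nvidia-smi not found"
    fi

    printf '%s mem_avail_kB=%s gpu=[%s]\n' "$(date '+%F %T')" "$mem_avail" "$gpu"
}
```

For interactive watching, `nvidia-smi -l 5` refreshes the normal display every five seconds. To keep a record instead, something like `while true; do snapshot >> ~/freeze-watch.log; sync; sleep 60; done` in a spare terminal should do; the `sync` is there so the last line has a chance of reaching the disk.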

Paul

From Paulo da Silva@21:1/5 to All on Mon Sep 27 19:02:28 2021

At 18:38 on 27/09/21, Marco Moock wrote:

On Mon, 27 Sep 2021 18:35:37 +0100,
Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

At 18:27 on 27/09/21, Marco Moock wrote:

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in
another computer?

I can't use the nouveau driver because I need the computer (a laptop)
for AI deep learning with tensorflow GPU.

You can test whether it also happens with nouveau, for example in the live system, which doesn't have nvidia-470 installed.

I only changed to 470 after the problem caused me a small loss of data,
hoping it would be a solution, but it also failed. So far I have been able to
accept a failure once in a while. Most of the time it works without any
problem. It may run for a month, perhaps more, without any problem. That is also
why testing with nouveau is not practical.

I wonder if there is some kind of script or configuration that forces
the logs not to be buffered. I'll search the internet ...

Thanks.
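
On the buffering question, one knob that exists in systemd-journald is `SyncIntervalSec=`. According to the journald.conf man page the journal is synced to disk every 5 minutes by default (immediately only for messages of priority CRIT or higher), so the last few minutes of ordinary messages can be lost in a hard freeze. A minimal sketch that prints a drop-in with a shorter interval follows; the helper name `journald_dropin`, the 10s value and the file name in the usage note are assumptions, and a true hardware lock-up may still leave nothing in the log.

```shell
# Print a journald drop-in that keeps the journal on disk and syncs it often.
# "journald_dropin" is a hypothetical helper; 10s is an arbitrary example value.
journald_dropin() {
    interval="${1:-10s}"
    printf '[Journal]\nStorage=persistent\nSyncIntervalSec=%s\n' "$interval"
}
```

It could be applied with `sudo mkdir -p /etc/systemd/journald.conf.d`, then `journald_dropin | sudo tee /etc/systemd/journald.conf.d/freeze.conf` and `sudo systemctl restart systemd-journald`. After the next freeze, `journalctl --list-boots` lists the recorded boots and `journalctl -b -1 -e` jumps to the end of the previous one.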

From Marco Moock@21:1/5 to All on Mon Sep 27 19:27:24 2021

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in another computer?

From Paulo da Silva@21:1/5 to All on Mon Sep 27 18:35:37 2021

At 18:27 on 27/09/21, Marco Moock wrote:

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in another computer?

I can't use the nouveau driver because I need the computer (a laptop)
for AI deep learning with tensorflow GPU.

From Java Jive@21:1/5 to Paulo da Silva on Mon Sep 27 19:36:08 2021

On 27/09/2021 18:22, Paulo da Silva wrote:

From time to time - it may be after a month or after a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

Sounds as though it might be hardware. At least that could be something
to eliminate. Maybe run a memcheck, and an fsck of the entire disk surface?

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

From Carlos E. R.@21:1/5 to Paulo da Silva on Mon Sep 27 20:34:04 2021

On 27/09/2021 20.02, Paulo da Silva wrote:

At 18:38 on 27/09/21, Marco Moock wrote:

...

I wonder if there is some kind of script or configuration that forces
the logs not to be buffered. I'll search the internet ...

Yes. You can send kernel logs directly to another machine via ethernet,
or even better if available, serial port.

Directly from the kernel, mind.

I may be able to locate the information later, if you are interested. It is hidden
deep in my bug reports somewhere. But I don't have my notes here: they are on the machine I used for this, which is in another city.

--
Cheers,
Carlos E.R.

From Marco Moock@21:1/5 to All on Mon Sep 27 20:42:17 2021

On Mon, 27 Sep 2021 19:36:08 +0100,
Java Jive <java@evij.com.invalid> wrote:

Maybe run a memcheck, and an fsck of the
entire disk surface?

If it is a drive fault, SysRq+R should still work.
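
One thing worth ruling out before reading too much into a dead REISUB: the keys only work if the kernel.sysrq bitmask allows them. The sketch below decodes a mask using the bit values from the kernel's sysrq documentation. The helper name `sysrq_report` is hypothetical, and the value 176 used in the example is what I believe Ubuntu ships by default, which is an assumption to check on the machine itself.

```shell
# Report which REISUB keys a given kernel.sysrq bitmask permits.
# Bit values from the kernel sysrq documentation:
#   4 = keyboard control (R), 64 = signalling of processes (E, I),
#   16 = sync (S), 32 = remount read-only (U), 128 = reboot/poweroff (B).
sysrq_report() {
    mask="$1"
    if [ "$mask" -eq 1 ]; then
        echo "all SysRq functions enabled"
        return 0
    fi
    for pair in "R:4" "E,I:64" "S:16" "U:32" "B:128"; do
        key=${pair%%:*}
        bit=${pair##*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            echo "$key allowed"
        else
            echo "$key blocked"
        fi
    done
}
```

Usage: `sysrq_report "$(cat /proc/sys/kernel/sysrq)"`. With a mask of 176 this reports R, E and I as blocked and S, U and B as allowed. `sudo sysctl kernel.sysrq=1` enables everything until the next reboot. If even B does nothing while it is allowed, the kernel itself is most likely not servicing the keyboard any more.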

From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:29:31 2021

At 19:36 on 27/09/21, Java Jive wrote:

On 27/09/2021 18:22, Paulo da Silva wrote:

From time to time - it may be after a month or after a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

Sounds as though it might be hardware.

Almost for sure ...
...

Maybe run a memcheck,

How? In my boot menu there is no such option :-(

and an fsck of the entire disk surface?

I am running btrfs and I run a scrub after the freezes. I have never had an error
on my SSD.
Also, smartctl -a has reported only one error for a long time:
Error Information Log Entries: 1

From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:32:19 2021

At 19:34 on 27/09/21, Carlos E. R. wrote:

On 27/09/2021 20.02, Paulo da Silva wrote:

At 18:38 on 27/09/21, Marco Moock wrote:

...

I wonder if there is some kind of script or configuration that forces
the logs not to be buffered. I'll search the internet ...

Yes. You can send kernel logs directly to another machine via ethernet,
or even better if available, serial port.

Directly from the kernel, mind.

I may be able to locate information later, if you are interested. Hidden
deep in my bug reports somewhere.

I would be very grateful if you could find them.
I am searching the internet for this stuff but so far I have only found
trivial suggestions about logs.

But I don't have my notes here: they are on the machine I used for this, which is in another city.

From Java Jive@21:1/5 to Paulo da Silva on Mon Sep 27 20:57:46 2021

On 27/09/2021 20:29, Paulo da Silva wrote:

At 19:36 on 27/09/21, Java Jive wrote:

On 27/09/2021 18:22, Paulo da Silva wrote:

From time to time - it may be after a month or after a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

Sounds as though it might be hardware.

Almost for sure ...
....

Maybe run a memcheck,

How? In my boot menu there is no such option :-(

Download an image and boot from it:
https://www.memtest86.com/

and an fsck of the entire disk surface?

I am running btrfs and I run a scrub after the freezes. I have never had an error
on my SSD.
Also, smartctl -a has reported only one error for a long time:
Error Information Log Entries: 1

Fair enough, I didn't realise it was an SSD not a spinner, and it was
just one possible line of enquiry.

--

Fake news kills!

I may be contacted via the contact address given on my website:
www.macfh.co.uk

From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:23:54 2021

At 18:59 on 27/09/21, Paul wrote:

Paulo da Silva wrote:

At 18:27 on 27/09/21, Marco Moock wrote:

Does it also happen with the nouveau driver?
Does it happen in the live system?
Does it happen with another graphics card or with that card in
another computer?

I can't use the nouveau driver because I need the computer (a laptop)
for AI deep learning with tensorflow GPU.

Can you correlate the failure with any particular
activity on the machine?

Certainly not. For example, the last time I had just left the computer
unattended while it made a backup. When I returned to the computer it was
frozen. The backup had finished, however.

For example, a more mundane activity on a computer,
is the usage of modern Firefox. While the user is
not viewing a web page, Firefox seems to leak memory
until all available memory in Ring 3 is used up.

But Linux has Out of Memory (OOM) killer, for the
handling of memory exhaustion that way. The system
should not freeze because Firefox happens to be
running.

I have panel widgets monitoring many things, among them memory. The
laptop has 32GB of RAM. I rarely need all of it, except for some data
processing for AI.
Also, the temperature is kept low because the clock is set to half frequency,
except when I need to run some special tasks, such as training AI
algorithms. This is a very fast machine.
BTW, I have never had a freeze when running these tasks. That is certainly a
coincidence, because the freezes in general are very rare.

Whereas, I don't know what happens, if a GPU that
uses shared memory, happens to request more and
more RAM for some GPU activity. An NVidia GPU is
more likely to have its own memory chips, and be
less likely to cause resource exhaustion on its own.

Try running "nvidia-smi" in a terminal window,
selecting the option to have it update the
screen repetitively (like "top" in a sense), and
watch resource consumption listed there. If you're
running the NVidia driver, that program should be
installed for you.

You could run "top" in one terminal window (using
the information near the top of top, for resource info).
And run "nvidia-smi" in a second window, to watch
for dwindling NVidia resources.

I'll try this. Thanks Paul.

From J.O. Aho@21:1/5 to Paulo da Silva on Mon Sep 27 22:47:16 2021

On 27/09/2021 19.22, Paulo da Silva wrote:

I am using kubuntu 20.04.
From time to time - it may be after a month or after a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

Does your HDD LED flash a lot?
If so, I would bet my money that Plasma 5 has leaked memory, in which
case the following bug could be of interest to you: https://bugs.kde.org/show_bug.cgi?id=436061

There isn't much you can do about this: the machine is so occupied with swapping that you won't be able to ssh into it. It could be wise
to disable swap and thus get the kernel to kill a random process, which
hopefully is plasmashell. I have had times when plasmashell has taken
58G of RAM and there was no other option than to reboot the computer.

After restart, journalctl -b -1 shows nothing at the freeze time.

It tends to be difficult to write to a file when the system is under heavy load.

Is there a way to get some information about why this is happening?

For me it was more about being notified before it gets too bad, for example by
logging the output from top* once every five minutes, so as to be
able to see the memory usage.

* for example use: top -b -n 1 >> /path/to/file/where/you/want/to/log
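
A minimal sketch of that five-minute logging, built around the same `top -b -n 1` call. The function name `log_top`, the 20-line cut-off and the log path in the usage note are assumptions; the timestamp header is added so that the last entry before a freeze can be found quickly.

```shell
# Append a timestamped top snapshot to a log file.
# "log_top" is a hypothetical helper; 20 lines keeps the summary plus the top processes.
log_top() {
    logfile="$1"
    {
        printf '=== %s ===\n' "$(date '+%F %T')"
        # Batch mode, one iteration, as in the command above.
        top -b -n 1 | head -n 20
    } >> "$logfile"
    sync    # push the data to disk before a possible freeze
}
```

Run it in a loop, for example `while true; do log_top ~/top-freeze.log; sleep 300; done`, or call it from cron every five minutes. After a freeze, the last `===` block shows which process was holding the memory shortly before.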

--

//Aho

From Carlos E. R.@21:1/5 to Paulo da Silva on Tue Sep 28 00:58:43 2021

On 27/09/2021 21.29, Paulo da Silva wrote:

At 19:36 on 27/09/21, Java Jive wrote:

and an fsck of the entire disk surface?

I am running btrfs and I run a scrub after the freezes. I have never had an error
on my SSD.
Also, smartctl -a has reported only one error for a long time:
Error Information Log Entries: 1

You should do a smartctl short test, then a long test.

--
Cheers,
Carlos E.R.

From Paulo da Silva@21:1/5 to All on Tue Sep 28 00:36:16 2021

At 23:58 on 27/09/21, Carlos E. R. wrote:

On 27/09/2021 21.29, Paulo da Silva wrote:

At 19:36 on 27/09/21, Java Jive wrote:

and an fsck of the entire disk surface?

I am running btrfs and I run a scrub after the freezes. I have never had an error
on my SSD.
Also, smartctl -a has reported only one error for a long time:
Error Information Log Entries: 1

You should do a smartctl short test, then a long test.

It doesn't work for SSDs, at least not for mine.
Only smartctl -a /dev/... works.

# smartctl -t long /dev/nvme0n1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-88-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

From Carlos E. R.@21:1/5 to Paulo da Silva on Tue Sep 28 02:15:46 2021

On 28/09/2021 01.36, Paulo da Silva wrote:

At 23:58 on 27/09/21, Carlos E. R. wrote:

On 27/09/2021 21.29, Paulo da Silva wrote:

At 19:36 on 27/09/21, Java Jive wrote:

and an fsck of the entire disk surface?

I am running btrfs and I run a scrub after the freezes. I have never had an error
on my SSD.
Also, smartctl -a has reported only one error for a long time:
Error Information Log Entries: 1

You should do a smartctl short test, then a long test.

It doesn't work for SSDs, at least not for mine.

It works on mine. Sigh...

Only smartctl -a /dev/...

# smartctl -t long /dev/nvme0n1

Ah, that's not a SATA SSD but an NVMe drive. It does not have a SATA
connection, so smartctl has to talk to it differently. Thus the SMART self-tests may not work.

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-88-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

NVMe device successfully opened

Use 'smartctl -a' (or '-x') to print SMART (and more) information

Pity.
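
A possible way around this, sketched with the nvme-cli tool instead of smartctl. This assumes the nvme-cli package is installed and that the drive implements the optional NVMe Device Self-test command, which not every drive does. The function names are hypothetical, and /dev/nvme0 is the controller that presumably belongs to the /dev/nvme0n1 shown above. The commands need root.

```shell
# Query health data and start a short self-test on an NVMe drive via nvme-cli.
# Assumption: nvme-cli is installed and the drive supports Device Self-test.
nvme_health() {
    dev="${1:-/dev/nvme0}"
    nvme smart-log "$dev"                 # wear, temperature, media error counters
    nvme error-log "$dev"                 # entries behind "Error Information Log Entries"
    nvme device-self-test "$dev" -s 1     # self-test code 1 = short, 2 = extended
}

# Read the self-test results once the test has had time to finish.
nvme_selftest_result() {
    nvme self-test-log "${1:-/dev/nvme0}"
}
```

Usage would be `nvme_health /dev/nvme0` from a root shell, followed a few minutes later by `nvme_selftest_result /dev/nvme0`. If the drive rejects the self-test command, the smart-log and error-log output is still worth a look.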

--
Cheers,
Carlos E.R.

From William Unruh@21:1/5 to J.O. Aho on Tue Sep 28 01:55:20 2021

On 2021-09-27, J.O. Aho <user@example.net> wrote:

On 27/09/2021 19.22, Paulo da Silva wrote:

I am using kubuntu 20.04.
From time to time - it may be after a month or after a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I would bet on a hardware problem. No warning, random. E.g., the power
supply voltage could drop briefly. The system has no way of recording
it.
Buy a new computer.

Does your HDD LED flash a lot?
If so, I would bet my money that Plasma 5 has leaked memory, in which
case the following bug could be of interest to you: https://bugs.kde.org/show_bug.cgi?id=436061

There isn't much you can do about this: the machine is so occupied with swapping that you won't be able to ssh into it. It could be wise
to disable swap and thus get the kernel to kill a random process, which hopefully is plasmashell. I have had times when plasmashell has taken
58G of RAM and there was no other option than to reboot the computer.

After restart, journalctl -b -1 shows nothing at the freeze time.

It tends to be difficult to write to a file when the system is under heavy load.

Is there a way to get some information about why this is happening?

For me it was more about being notified before it gets too bad, for example by logging the output from top* once every five minutes, so as to be
able to see the memory usage.

* for example use: top -b -n 1 >> /path/to/file/where/you/want/to/log

From stepore@21:1/5 to Paulo da Silva on Mon Sep 27 20:44:24 2021

On 09/27/2021 12:32 PM, Paulo da Silva wrote:

I would be very grateful if you could find them.
I am searching the internet for this stuff but so far I have only found
trivial suggestions about logs.

It's fairly trivial to set up another computer as a remote syslog server
and ship your laptop logs to that. Or if you're really keen, use
something like Graylog, the ELK stack, or even the free version of Splunk to
ship your logs to. They give you great insight into system logs.

On that note, it might be worth your while to set up something like
Grafana on another server (or again Splunk) so you can set up and view
dashboards and a historical overview of all system resources before and after
a freeze.
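
For the plain remote-syslog variant, a minimal sketch assuming rsyslog is the syslog daemon on the laptop. In rsyslog's forwarding syntax a single `@` sends over UDP and `@@` over TCP. The helper name, the address 192.168.1.15 and the file name in the usage note are placeholders, not values from this setup.

```shell
# Print an rsyslog rule that forwards all messages to a remote host.
# "rsyslog_forward_rule" is a hypothetical helper; host and port are placeholders.
rsyslog_forward_rule() {
    host="$1"
    port="${2:-514}"
    proto="${3:-udp}"
    if [ "$proto" = "tcp" ]; then
        at="@@"     # two at-signs: TCP
    else
        at="@"      # one at-sign: UDP
    fi
    printf '*.* %s%s:%s\n' "$at" "$host" "$port"
}
```

For example `rsyslog_forward_rule 192.168.1.15 | sudo tee /etc/rsyslog.d/90-forward.conf`, followed by `sudo systemctl restart rsyslog`. The receiving machine needs its UDP (or TCP) syslog input enabled. Keep in mind that this path depends on userspace still running, so in a hard lock-up it may capture no more than the local journal does.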

From J.O. Aho@21:1/5 to William Unruh on Tue Sep 28 11:03:36 2021

On 28/09/2021 03.55, William Unruh wrote:

On 2021-09-27, J.O. Aho <user@example.net> wrote:

On 27/09/2021 19.22, Paulo da Silva wrote:

I am using kubuntu 20.04.
From time to time - it may be after a month or after a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I would bet on a hardware problem. No warning, random. E.g., the power
supply voltage could drop briefly. The system has no way of recording
it.

The plasmashell issue is quite random: sometimes it can take days before
it happens, sometimes it's just a short while after a fresh reboot, so I
wouldn't jump to a hardware issue before ruling out the plasmashell bug.

It must be quite expensive for you to get a new computer each time you
have a software issue.

--

//Aho

From Paul@21:1/5 to William Unruh on Tue Sep 28 05:47:50 2021

William Unruh wrote:

On 2021-09-27, J.O. Aho <user@example.net> wrote:

On 27/09/2021 19.22, Paulo da Silva wrote:

I am using kubuntu 20.04.
From time to time - it may be after a month or after a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I would bet on a hardware problem. No warning, random. E.g., the power
supply voltage could drop briefly. The system has no way of recording
it.
Buy a new computer.

Among enthusiasts, it is popular to stock a spare
power supply. You can fit your spare supply and
retest, and see if that theory holds water.
Right now, the junk room sports a Seasonic S12
as the "designated hitter".

Running Prime95 (statically compiled Linux version
in "Just Testing" mode), while using the existing
supply, is an acceptance test. It tests machine
cooling is adequate (run something lmsensors based,
to see whether temp overshoots, while you're waiting
for the machine to shut off on CPU THERMTRIP). It draws
max CPU power. My machine, wall power climbs to 180W
while running that CPU integrity test.

https://www.mersenne.org/download/

If you have NVidia driver, you can add in a graphics
test if you want, but I don't have anything for that
in mind. I have a CUDA app, but it would be a pig
to set up due to libs and so on. On my machine, running
the graphics test case while Prime95 is running, raises
machine power to 360W (on a 550W PSU). Modern video
cards have a power limiter, and they also have a
status indicator in software, indicating which limiter is limiting
GPU performance. Running NVENC or NVDEC for example,
the card won't use more than 1/3rd of max power.

Normally, my machine power level doesn't go past 200W
without testing assistance like that. 360W to 400W loading,
is via synthetic (unlikely) tests.

*******

Haswell CPUs, at the time, some power supplies would
become unstable at low load, leading to "Haswell certified"
power supplies. But the most likely reason for that
to happen, was the existence of some older supplies
that have (on the label), a row of numbers for
"minimum consumption". No supply created in at least
the last ten years, has that row of numbers on the label.

The absolute worst situation of that type was one
supply where the 12V rail needed 25% loading
to remain stable. So if the rail was rated at 40 amps, the label would
read as shown on the left below. Naturally, I was careful to never buy a supply
with the two-row MIN/MAX labeling, as it's an admission
of "stupid" in design. You would always be looking over
your shoulder, if you bought the one on the left.

Ancient supply label      Modern supply label (zero amps is OK)

     ...  +12V
Min       10A                  ...  +12V
Max       40A             Max       40A

With lots of computer hardware today, such a guarantee
could not be met in the form of min loading. The idle current
could easily drop below 10A for example. Some modern supplies
have met the "0 amps" requirement, by having a 5W or 10W
load inside the PSU for the purpose of meeting open circuit
stability requirements. It's unlikely an 80+ supply is
doing that.

And here, stability does not mean "oscillation",
stability means remaining in regulation, 12V +/- 5%. If
unloaded, a "MIN/MAX" supply might deviate past 5% by a bit.
12V only gets in trouble, if it drops below 11V, as an example of
how far it can be pushed on overload. Burning might result
(hard drive clamp device activates) at around +15V or so.
There's a bit of headroom on +12V on the high side. Some
other rails don't have that luxury.

A multimeter is recommended, if checking voltages. Do not
trust the ACPI-calibrated voltage readouts for this. The
multimeter might be accurate to around 2% or so. And be careful
with the multimeter probes - one of those modern 1200W supplies,
if you happened to short +12V, it would not be pretty. They
live for the chance to melt wiring. While in theory, individual
wire looms have 20A limiters (PSU shuts off), you don't
want to be testing the cheapness of the company making
the supply, even if you've paid $150 for it. In some ways,
the behavior of the supply, is not adequately captured in
the affixed labeling scheme (specifically, OC protection).
There's been at least one, where it didn't appear
there was adequate loom protection.

In terms of noise patterns, supplies have "ripple". This might
be in the 0.02 to 0.05V range or so. The output capacitors
determine how fast the rail can change instantaneously.

This is a really old schematic now, for PSU education,
but it still illustrates the design principles.
There's 1000uF on the +12V rail for example. Supplies
typically can have 4000-5000 more uF added to the rail
at the load, before it affects oscillation stability.
Precise information of that nature, is hard to get
from a manufacturer, but the designer is aware of
the issue. You can't put 250,000uF across a PC PSU.

http://www.pavouk.org/hw/en_atxps.html

The ATX supply "pushes" but does not "pull". It is
not an op amp or linear amplifier. If the supply
deviates due to transient loading, it likely does
not respond well to energy dumped back into the
supply. Motherboards don't generally do that.

Only one regulator in the whole PC is push/pull. And
that's the regulator for the DIMM terminator resistors,
where the current flow magnitude can be in the +2 amps
to -2 amps range (bus all 0's, bus all 1's). The
regulator must sink the -2 amps, in order to precisely
maintain the terminators at the correct voltage
(otherwise, your PC may suffer the "Photoshop bug").
Most other regulators are the "push only" variety.
A 7805 is a push only regulator. It's not intended
to sink backward current flow.

Summary: I doubt it is the PSU, but... that's why we
test stuff.

Paul

From Carlos@21:1/5 to Paulo da Silva on Tue Sep 28 11:54:29 2021

On Mon, 27 Sep 2021 20:32:19 +0100, Paulo da Silva wrote:

At 19:34 on 27/09/21, Carlos E. R. wrote:

On 27/09/2021 20.02, Paulo da Silva wrote:

At 18:38 on 27/09/21, Marco Moock wrote:

...

I wonder if there is some kind of script or configuration that forces
the logs not to be buffered. I'll search the internet ...

Yes. You can send kernel logs directly to another machine via ethernet,
or even better if available, serial port.

Directly from the kernel, mind.

I may be able to locate information later, if you are interested.
Hidden deep in my bug reports somewhere.

I would thank you very much if you could find them.
I am searching the internet for this stuff but so far I only found
trivial suggestions about logs.

Found the bug report :-)

Or one of them, I'm reading. Dec 2008.

First, I was told to "boot with console=tty0
console=ttyS0,<speed>", then run "klogconsole -r0 -l8" once booted.

Ok, this is not it, this is using an actual serial port.

Continue searching.

Ah, I wrote notes! I copy and translate them.
I'll post using "Pan" because I can disable word wrap. I hope it gets to you with the long lines intact, even if it is not "Usenet valid".

+++================================ kernel messages via serial port.

grub:
console=tty9 console=ttyS1,57600
shell:
klogconsole -r0 -l9

mind, interferes with hibernation

================================---

+++================================ netconsole. Kernel logging on remote machine.

Date: Fri, 23 May 2014 19:19:15 -0400
From: Cristian Rodríguez <...@opensuse.org>
Reply-To: OS-en <opensuse@opensuse.org>
To: opensuse@opensuse.org
Subject: Re: [opensuse] Kernel crash on multiple file write on reiserfs GPT partition.

...

P D O .. that means:

"P" --> proprietary module loaded, developers will most likely ignore
your report if it comes in this form.

"D" --> the kernel has oopsed before, that means what you are showing
in the picture is a secondary oops, not the actual problem.

"O" -> "Out of tree module" is loaded, good luck with getting that fixed.

...

https://www.kernel.org/doc/Documentation/networking/netconsole.txt
...
Ah. Ok it appears to be the same as in "/usr/share/doc/packages/netconsole-tools/netlogging.txt"
...
The documentation is obsolete. The correct syntax appears to be:

modprobe netconsole 6666@192.168.1.14/eth0,6666@192.168.1.15

which I got from "http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html".

not

modprobe netconsole netconsole="...

...

Try with the section "dynamic configuration" from the netconsole.txt doc.

Telcontar:~ # modprobe netconsole
Telcontar:~ # cd /sys/kernel/config/netconsole/
Telcontar:/sys/kernel/config/netconsole # ls
Telcontar:/sys/kernel/config/netconsole # mkdir target1
Telcontar:/sys/kernel/config/netconsole # ls
target1
Telcontar:/sys/kernel/config/netconsole # cd target1/
Telcontar:/sys/kernel/config/netconsole/target1 # ls
dev_name enabled local_ip local_mac local_port remote_ip remote_mac remote_port
Telcontar:/sys/kernel/config/netconsole/target1 # cat local_
local_ip local_mac local_port
Telcontar:/sys/kernel/config/netconsole/target1 # cat local_ip
0.0.0.0
Telcontar:/sys/kernel/config/netconsole/target1 # echo 192.168.1.14 > local_ip
Telcontar:/sys/kernel/config/netconsole/target1 # echo 6666 > local_port

but

Telcontar:/sys/kernel/config/netconsole/target1 # echo "00:21:85:16:2D:0B" > local_mac
-bash: local_mac: Permission denied
Telcontar:/sys/kernel/config/netconsole/target1 # cat local_mac
ff:ff:ff:ff:ff:

weird.

Telcontar:/sys/kernel/config/netconsole/target1 # echo "00:03:0D:05:17:FC" > remote_mac
Telcontar:/sys/kernel/config/netconsole/target1 # echo 6666 > remote_port
Telcontar:/sys/kernel/config/netconsole/target1 # echo 192.168.1.15 > remote_ip
Telcontar:/sys/kernel/config/netconsole/target1 # cat dev_name
eth0
Telcontar:/sys/kernel/config/netconsole/target1 # echo 1 > enabled

It is apparently started:
Telcontar:/sys/kernel/config/netconsole/target1 # tail /var/log/messages
<3.6> 2014-05-24 13:23:01 Telcontar systemd 1 - - Starting Session 78 of user news.
<3.6> 2014-05-24 13:25:01 Telcontar systemd 1 - - Starting Session 79 of user news.
<3.6> 2014-05-24 13:28:01 Telcontar systemd 1 - - Starting Session 80 of user news.
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.603827] netpoll: netconsole: local port 6666
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.609384] netpoll: netconsole: local IPv4 address 192.168.1.14
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.614762] netpoll: netconsole: interface 'eth0'
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.620095] netpoll: netconsole: remote port 6666
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.625373] netpoll: netconsole: remote IPv4 address 192.168.1.15
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.630545] netpoll: netconsole: remote ethernet address 00:03:0d:05:17:fc
<0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.635653] netconsole: network logging started

On the receiving computer, I have:

netcat -u -l 6666 | tee -a remote_log

I plugged a usb stick, and got the messages on the remote, so good!

Now I go for testing and crashing the machine again. Nvidia is not in the list. Wish me luck!

modprobe netconsole
cd /sys/kernel/config/netconsole/
ls
mkdir target1
ls

cd target1/
ls
cat *

echo 192.168.1.14 > local_ip
echo 6666 > local_port

echo "00:03:0D:05:17:FC" > remote_mac
echo 6666 > remote_port
echo 192.168.1.15 > remote_ip
cat dev_name
echo 1 > enabled

------

2015-11-22

6666 - Local port
192.168.1.5 - Local system IP
eth0 - Local system interface
514 - Remote syslogd udp port
192.168.1.100 - Remote syslogd IP
00:19:D1:2A:BA:A8 - Remote syslogd Mac

You can add the above modprobe line to /etc/rc.local to load the module automatically. Another recommended option is to create the /etc/modprobe.d/netconsole file and append the following text:
# echo 'options netconsole netconsole=6666@192.168.1.5/eth0,514@192.168.1.100/00:19:D1:2A:BA:A8 '> /etc/modprobe.d/netconsole

echo 'options netconsole netconsole=6666@192.168.1.14/eth0,514@192.168.1.15/00:03:0d:05:17:fc '> /etc/modprobe.d/netconsole

The log shows:

<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081911] netpoll: netconsole: local port 6666
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081920] netpoll: netconsole: local IPv4 address 192.168.1.14
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081921] netpoll: netconsole: interface 'eth0'
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081922] netpoll: netconsole: remote port 514
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081923] netpoll: netconsole: remote IPv4 address 192.168.1.15
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081925] netpoll: netconsole: remote ethernet address 00:03:0d:05:17:fc
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081949] console [netcon0] enabled
<0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081949] netconsole: network logging started

2015-11-22 On the other side I don't receive anything, once I open the port. Nothing.

This does work:

modprobe netconsole
cd /sys/kernel/config/netconsole/
ls
mkdir target1
ls

cd target1/
ls
cat *

echo 192.168.1.14 > local_ip
echo 6666 > local_port

echo "00:03:0D:05:17:FC" > remote_mac
echo 6666 > remote_port
echo 192.168.1.15 > remote_ip
cat dev_name
echo 1 > enabled

But not on port 514.

This does work.

echo 'options netconsole netconsole=6666@192.168.1.14/eth0,6666@192.168.1.15/00:03:0d:05:17:fc '> /etc/modprobe.d/netconsole.conf

Meaning, sending to the remote syslog daemon (port 514) does not work, while a plain UDP listener on port 6666 does.
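
The option string is easy to get wrong by hand, so here is a sketch of how it is put together. This is an illustration only; the function names are hypothetical. The layout follows the format given in the kernel's netconsole documentation linked below: source port @ source IP / device, then target port @ target IP / target MAC.

```python
def netconsole_option(local_port, local_ip, dev, remote_port, remote_ip, remote_mac):
    """Build the netconsole= parameter: src-port@src-ip/dev,tgt-port@tgt-ip/tgt-mac."""
    return (f"netconsole={local_port}@{local_ip}/{dev},"
            f"{remote_port}@{remote_ip}/{remote_mac}")


def modprobe_line(local_port, local_ip, dev, remote_port, remote_ip, remote_mac):
    """Line for a file such as /etc/modprobe.d/netconsole.conf."""
    option = netconsole_option(local_port, local_ip, dev,
                               remote_port, remote_ip, remote_mac)
    return f"options netconsole {option}"


# The values used in these notes
print(modprobe_line(6666, "192.168.1.14", "eth0",
                    6666, "192.168.1.15", "00:03:0d:05:17:fc"))
```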

netcat -u -l 6666 | tee -a remote_log

<https://www.kernel.org/doc/Documentation/networking/netconsole.txt>
Very good: <http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html#comment-620097>

================================---

HTH :-)

From Carlos E. R.@21:1/5 to stepore on Tue Sep 28 13:33:41 2021

XPost: alt.os.linux

On 28/09/2021 05.44, stepore wrote:

On 09/27/2021 12:32 PM, Paulo da Silva wrote:

I would thank you very much if you could find them.
I am searching the internet for this stuff but so far I only found
trivial suggestions about logs.

It's fairly trivial to set up another computer as a remote syslog server
and ship your laptop logs to that.

That's not the same thing as I proposed, because it runs in userspace.

--
Cheers,
Carlos E.R.

From Jonathan N. Little@21:1/5 to Paul on Tue Sep 28 10:57:24 2021

XPost: alt.os.linux

Paul wrote:

William Unruh wrote:

On 2021-09-27, J.O. Aho <user@example.net> wrote:

On 27/09/2021 19.22, Paulo da Silva wrote:

I am using kubuntu 20.04.
From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I would bet on a hardware problem. No warning, random: e.g., the power
supply voltage could drop briefly. The system has no way of recording
it.
Buy a new computer.

Among enthusiasts, it is popular to stock a spare
power supply. You can fit your spare supply and
retest, and see if that theory holds water.
Right now, the junk room sports a Seasonic S12
as the "designated hitter".

Running Prime95 (statically compiled Linux version
in "Just Testing" mode), while using the existing
supply, is an acceptance test. It tests that machine
cooling is adequate (run something lm-sensors based
to see whether the temperature overshoots, while you're
waiting for the machine to shut off on CPU THERMTRIP).
It draws max CPU power. On my machine, wall power climbs
to 180W while running that CPU integrity test.

   https://www.mersenne.org/download/

If you have NVidia driver, you can add in a graphics
test if you want, but I don't have anything for that
in mind. I have a CUDA app, but it would be a pig
to set up due to libs and so on. On my machine, running
the graphics test case while Prime95 is running, raises
machine power to 360W (on a 550W PSU). Modern video
cards have a power limiter, and they also have a
status indicator in software, indicating which limiter is limiting
GPU performance. Running NVENC or NVDEC for example,
the card won't use more than 1/3rd of max power.

Normally, my machine power level doesn't go past 200W
without testing assistance like that. 360W to 400W loading,
is via synthetic (unlikely) tests.

*******

Haswell CPUs, at the time, some power supplies would
become unstable at low load, leading to "Haswell certified"
power supplies. But the most likely reason for that
to happen, was the existence of some older supplies
that have (on the label), a row of numbers for
"minimum consumption". No supply created in at least
the last ten years, has that row of numbers on the label.

The absolute worst situation of that type, is there
existed one supply, where the 12V rail needed 25% loading
to remain stable. So if the rail was 40 amps, the label would
read:

   Ancient supply label          Modern supply label (zero amps is OK)
   ... +12V                      ... +12V
   Min       10A
   Max       40A                 Max       40A

Naturally, I was careful to never buy a supply
with the two-row MIN/MAX labeling, as it's an admission
of "stupid" in design. You would always be looking over
your shoulder, if you bought the one on the left.

With lots of computer hardware today, such a guarantee
could not be met in the form of min loading. The idle current
could easily drop below 10A for example. Some modern supplies
have met the "0 amps" requirement, by having a 5W or 10W
load inside the PSU for the purpose of meeting open circuit
stability requirements. It's unlikely an 80+ supply is
doing that.

And here, stability does not mean "oscillation",
stability means remaining in regulation, 12V +/- 5%. If
unloaded, a "MIN/MAX" supply might deviate past 5% by a bit.
12V only gets in trouble, if it drops below 11V, as an example of
how far it can be pushed on overload. Burning might result
(hard drive clamp device activates) at around +15V or so.
There's a bit of headroom on +12V on the high side. Some
other rails don't have that luxury.
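
As a worked example of the regulation window mentioned above: 12V +/- 5% means roughly 11.4V to 12.6V. The sketch below is only an illustration (the function names are made up) for checking a multimeter reading against that window.

```python
def regulation_window(nominal: float, tolerance: float = 0.05) -> tuple:
    """Lowest and highest voltage still within nominal +/- tolerance."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def in_regulation(measured: float, nominal: float = 12.0, tolerance: float = 0.05) -> bool:
    """True if a measured rail voltage is inside the regulation window."""
    low, high = regulation_window(nominal, tolerance)
    return low <= measured <= high


# A few multimeter readings on the +12V rail
for reading in (11.0, 11.5, 12.1, 12.7):
    print(reading, "in regulation" if in_regulation(reading) else "OUT of regulation")
```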

A multimeter is recommended, if checking voltages. Do not
trust the ACPI-calibrated voltage readouts for this. The
multimeter might be accurate to around 2% or so. And be careful
with the multimeter probes - one of those modern 1200W supplies,
if you happened to short +12V, it would not be pretty. They
live for the chance to melt wiring. While in theory, individual
wire looms have 20A limiters (PSU shuts off), you don't
want to be testing the cheapness of the company making
the supply, even if you've paid $150 for it. In some ways,
the behavior of the supply, is not adequately captured in
the affixed labeling scheme (specifically, OC protection).
There's been at least one, where it didn't appear
there was adequate loom protection.

In terms of noise patterns, supplies have "ripple". This might
be in the 0.02 to 0.05V range or so. The output capacitors
determine how fast the rail can change instantaneously.

This is a really old schematic now, for PSU education,
but it still illustrates the design principles.
There's 1000uF on the +12V rail for example. Supplies
typically can have 4000-5000 more uF added to the rail
at the load, before it affects oscillation stability.
Precise information of that nature, is hard to get
from a manufacturer, but the designer is aware of
the issue. You can't put 250,000uF across a PC PSU.

http://www.pavouk.org/hw/en_atxps.html

The ATX supply "pushes" but does not "pull". It is
not an op amp or linear amplifier. If the supply
deviates due to transient loading, it likely does
not respond well to energy dumped back into the
supply. Motherboards don't generally do that.

Only one regulator in the whole PC is push/pull. And
that's the regulator for the DIMM terminator resistors,
where the current flow magnitude can be in the +2 amps
to -2 amps range (bus all 0's, bus all 1's). The
regulator must sink the -2 amps, in order to precisely
maintain the terminators at the correct voltage
(otherwise, your PC may suffer the "Photoshop bug").
Most other regulators are the "push only" variety.
A 7805 is a push only regulator. It's not intended
to sink backward current flow.

Summary: I doubt it is the PSU, but... that's why we
         test stuff.

Later in the thread I believe OP said it was a laptop. Swapping the PSU
is not an option. The most likely cause, especially on a laptop, is a
heat-related hardware issue. Laptops make this more difficult to deal
with, but depending on the laptop I would open 'er up and at least blow
out all the dust. On the Dell and Lenovo I have, that is a simple
process; on some other brands, not so much. Look at the mb caps for
crusty corrosion too... My old Latitude D-820 was a lap-roaster with an
nVidia GPU that was notorious for GPU meltdowns. I cleaned and remounted
the heat pipes several times and avoided that fate.

--
Take care,

Jonathan
-------------------
LITTLE WORKS STUDIO
http://www.LittleWorksStudio.com

From Paul@21:1/5 to Jonathan N. Little on Tue Sep 28 11:42:37 2021

XPost: alt.os.linux

Jonathan N. Little wrote:

Later in the thread I believe OP said it was a laptop. Swapping the PSU
is not an option. The most likely cause, especially on a laptop, is a
heat-related hardware issue. Laptops make this more difficult to deal
with, but depending on the laptop I would open 'er up and at least blow
out all the dust. On the Dell and Lenovo I have, that is a simple
process; on some other brands, not so much. Look at the mb caps for
crusty corrosion too... My old Latitude D-820 was a lap-roaster with an
nVidia GPU that was notorious for GPU meltdowns. I cleaned and remounted
the heat pipes several times and avoided that fate.

Laptops are less debug-able.

The posts I read referred to a "computer".

Setting up a serial port, is the best way
to determine if it is really frozen. I prefer
the SuperIO serial port type, to USB serial.

I use this on the boot line of my newest computer:

console=ttyS0,57600n8

I have a serial cable that runs from the other
machine, over to this machine, where I can monitor it.
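
For anyone unsure how to read that boot parameter: the kernel's serial console documentation describes the options as speed, parity (n, o or e), number of bits, and an optional r for RTS flow control, with 9600n8 as the default. The sketch below just splits the string into those parts; it is an illustration with a made-up function name, not something the kernel provides.

```python
import re

# speed, optional parity (n/o/e), optional number of bits, optional 'r' (RTS)
_OPTIONS = re.compile(r"(?P<baud>\d+)(?P<parity>[noe])?(?P<bits>\d)?(?P<flow>r)?$")


def parse_console(param: str) -> dict:
    """Split a parameter such as 'console=ttyS0,57600n8' into its parts."""
    key, _, value = param.partition("=")
    if key != "console" or not value:
        raise ValueError(f"not a console= parameter: {param!r}")
    device, _, options = value.partition(",")
    match = _OPTIONS.match(options or "9600n8")  # documented default
    if match is None:
        raise ValueError(f"unrecognised serial options: {options!r}")
    return {
        "device": device,
        "baud": int(match["baud"]),
        "parity": match["parity"] or "n",
        "bits": int(match["bits"] or 8),
        "rts_flow_control": match["flow"] is not None,
    }


print(parse_console("console=ttyS0,57600n8"))
```

The monitoring machine at the other end of the cable has to be set to the same speed, parity and bit count.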

The nice thing about ttyS0, is it never moves,
whereas if you use USB serial adapters, you
don't know what the identifier for it is. Maybe
plugging in some other stuff, upsets your debug port.

Not that most people like serial ports, but
I like it. Gets the job done. Works good when
the HID stops working on a setup.

Paul

From Jasen Betts@21:1/5 to Paul on Tue Sep 28 20:02:58 2021

XPost: alt.os.linux

On 2021-09-28, Paul <nospam@needed.invalid> wrote:

Jonathan N. Little wrote:

Later in the thread I believe OP said it was a laptop. Swapping the PSU
is not an option. The most likely cause, especially on a laptop, is a
heat-related hardware issue. Laptops make this more difficult to deal
with, but depending on the laptop I would open 'er up and at least blow
out all the dust. On the Dell and Lenovo I have, that is a simple
process; on some other brands, not so much. Look at the mb caps for
crusty corrosion too... My old Latitude D-820 was a lap-roaster with an
nVidia GPU that was notorious for GPU meltdowns. I cleaned and remounted
the heat pipes several times and avoided that fate.

Laptops are less debug-able.

The posts I read referred to a "computer".

Setting up a serial port, is the best way
to determine if it is really frozen. I prefer
the SuperIO serial port type, to USB serial.

I use this on the boot line of my newest computer:

console=ttyS0,57600n8

I have a serial cable that runs from the other
machine, over to this machine, where I can monitor it.

The nice thing about ttyS0, is it never moves,
whereas if you use USB serial adapters, you
don't know what the identifier for it is. Maybe
plugging in some other stuff, upsets your debug port.

USB serial never moves if you use /dev/serial/by-path, then it's tied
to the physical socket you plugged it into (including any intermediate
hubs).
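
To see which device each of those persistent names currently points at, something like the following sketch can be used. It is an illustration only (the function name is made up); it simply resolves the symlinks that udev keeps in /dev/serial/by-path and returns nothing if no USB serial adapter is plugged in.

```python
import os


def stable_serial_names(by_path_dir: str = "/dev/serial/by-path") -> dict:
    """Map each persistent by-path name to the tty device it resolves to right now."""
    if not os.path.isdir(by_path_dir):
        return {}  # directory only exists while an adapter is plugged in
    names = {}
    for entry in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, entry)
        names[link] = os.path.realpath(link)
    return names


for link, device in stable_serial_names().items():
    print(f"{link} -> {device}")
```

The by-path name on the left is the one to use in scripts; the /dev/ttyUSBn on the right is what may change between plug-ins.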

Not that most people like serial ports, but
I like it. Gets the job done. Works good when
the HID stops working on a setup.

Also way better performance than a VNC if you're working on remote
servers

--
Jasen.

From Branimir Maksimovic@21:1/5 to Paulo da Silva on Wed Sep 29 01:09:15 2021

XPost: alt.os.linux

On 2021-09-27, Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

As I said, u use AVAST, it is not working on Linux...

--

7-77-777
Evil Sinner!

From Paulo da Silva@21:1/5 to All on Wed Sep 29 18:45:37 2021

XPost: alt.os.linux

On 28/09/21 at 12:54, Carlos wrote:

On Mon, 27 Sep 2021 20:32:19 +0100, Paulo da Silva wrote:

On 27/09/21 at 19:34, Carlos E. R. wrote:

On 27/09/2021 20.02, Paulo da Silva wrote:

On 27/09/21 at 18:38, Marco Moock wrote:

...

I wonder if there is some kind of script or configuration that forces
the logs not to be buffered. I'll search the internet ...

Yes. You can send kernel logs directly to another machine via ethernet,
or even better if available, serial port.

Directly from the kernel, mind.

...

Found the bug report :-)

...

Thank you very much Carlos.

From Paulo da Silva@21:1/5 to All on Fri Oct 1 04:05:52 2021

XPost: alt.os.linux

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

First let me explain how I use this computer.

I have a startup (boot) command - cpupower - that sets the max. freq. to
2.5 GHz. Its max. value used to be 5.1 GHz. It also sets the governor to
powersave. Let me call this Slow Mode.
This keeps my computer quiet, with very low RPM on both fans. When
unplugged from the charger they are at 0 RPM most of the time. Since the
machine is quite fast, the lower frequency isn't noticeable.
When I need power, which rarely happens - training AIs or processing
large amounts of data - I set it to the max. freq. and the governor to
performance. I also do this for my FS :-) Let me call this Fast Mode.
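
For readers who want to reproduce the two modes, here is a sketch of the corresponding cpupower calls. It is an assumption about how the modes could be scripted, not the actual boot command used here: the `MODES` table and the function name are hypothetical, and the frequencies are the 2.5 GHz and 5.1 GHz mentioned above written in MHz. The block only builds the command lines; running them needs root.

```python
# Hypothetical mode table based on the description above
MODES = {
    "slow": {"max_freq": "2500MHz", "governor": "powersave"},
    "fast": {"max_freq": "5100MHz", "governor": "performance"},
}


def cpupower_command(mode: str) -> list:
    """Return the cpupower argument list for 'slow' or 'fast'."""
    settings = MODES[mode]
    return [
        "cpupower", "frequency-set",
        "--max", settings["max_freq"],
        "--governor", settings["governor"],
    ]


for mode in MODES:
    print(mode, "->", " ".join(cpupower_command(mode)))
```

The returned list can be passed to `subprocess.run(..., check=True)` from a root shell or a boot script.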

Now, about this problem ...

1. I configured nvidia to ondemand. The freeze has not occurred since.
But since it can stay away for a month or more, this is still
inconclusive. Anyway, from lots of things I have been reading, it is
very likely that the BIOS, for some reason, can't cool something that is
going wrong and just freezes the computer. So, no logs. Once again, inconclusive.

2. A new problem
When in Fast Mode, running a job that uses the full CPU causes a shutdown.
This time there is a log entry:
"thermal thermal_zone0: critical temperature reached (110 C), shutting down"
So, I tried to analyze the problem.

I monitored the zone0 temperature and could see it go up to 102C. Then
the computer initiates the emergency shutdown. So, the monitor probably
gets killed. Notice that the critical temp. for this zone is 100C.

I tried again but when the temp. of zone0 reached 99C I put the fans in
boost mode (max. speed) and the temperature dropped and got stable at 97C.
I tried this again, but now I just put the computer in Slow Mode. The
temp. drops to 40-50C!
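
A sketch of how such a zone0 monitor can be written, in case someone wants to repeat the test. This is not the monitor used above; the function names are made up. It relies only on the standard sysfs thermal files, which report temperatures in millidegrees Celsius, and it also lists the trip points so the reading can be compared with the critical value.

```python
from pathlib import Path


def to_celsius(raw: str) -> float:
    """sysfs reports temperatures in millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_zone(zone_dir: str = "/sys/class/thermal/thermal_zone0") -> dict:
    """Return the current temperature and the trip points of one thermal zone."""
    zone = Path(zone_dir)
    trips = []
    for type_file in sorted(zone.glob("trip_point_*_type")):
        temp_file = zone / type_file.name.replace("_type", "_temp")
        if temp_file.exists():
            trips.append((type_file.read_text().strip(),
                          to_celsius(temp_file.read_text())))
    return {"temp_c": to_celsius((zone / "temp").read_text()), "trips": trips}


def log_line(reading: dict, timestamp: str) -> str:
    """Format one reading; the caller decides where to write it."""
    trips = ", ".join(f"{kind}={temp:.0f}C" for kind, temp in reading["trips"])
    return f"{timestamp} temp={reading['temp_c']:.1f}C trips=[{trips}]"
```

Calling `read_zone()` once a second and appending `log_line(...)` to a file, flushing after every write, improves the chance that the last readings before the emergency shutdown survive.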

So, why does neither thermald nor even the BIOS use these resources to
drop the temperature? In fact the fans do rotate at higher speed but do
not reach the 6k RPM of boost mode. I tried several configurations for
thermald, including giving priority to acting on freqs. No success.
thermald doesn't seem to care about its configs at all.

Finally
=======
If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this few times last couple of days.
Notice that all these problems are relatively recent.

By the way ... in windows this problem does not occur.

So:
SW problem, after an upgrade perhaps? HW problem? Both?
I feel lost ...
As soon as I get some time, I'm thinking of installing a new distro in a
different partition to see what happens there.
Until then, I need to reboot before starting a CPU-intensive job.
Not bad ... :-)

Thank you to all who responded, and for any further comments or suggestions.
Paulo

From Paulo da Silva@21:1/5 to All on Fri Oct 1 06:42:22 2021

XPost: alt.os.linux

On 01/10/21 at 06:29, J.O. Aho wrote:

On 01/10/2021 05.05, Paulo da Silva wrote:

If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this few times last couple of days.
Notice that all these problems are relatively recent.

If you mean the thermal issue, maybe you need to restart thermald after
you wake up from suspension. It's not unknown that some programs do not
work well with suspension.

I tried that. No luck!
I also played with some configurations, namely giving priority to freqs.,
because I know that lowering them causes the zone0 temp. to drop quickly.
BTW, in the meantime I remembered that the freeze problem also
occurred, at least once, with the system in "Slow Mode", i.e. half of
max. freq. and powersave governor. That's why I suspect something
related to the GPU - HW or SW.

I would keep an eye open for how much memory plasmashell uses, if you
see it creep over 1G, then it's time to restart it with "plasmashell --replace". Running top/htop once in a while should be ok.

Yes, I had several issues with plasmashell in all my computers :-( . I
have a script to handle them. I don't remember now what it does. Just
keeps working :-)

Thanks

From J.O. Aho@21:1/5 to Paulo da Silva on Fri Oct 1 07:29:50 2021

XPost: alt.os.linux

On 01/10/2021 05.05, Paulo da Silva wrote:

If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I tried this few times last couple of days.
Notice that all these problems are relatively recent.

If you mean the thermal issue, maybe you need to restart thermald after
you wake up from suspension. It's not unknown that some programs do not
work well with suspension.
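
One way to automate that restart is a systemd sleep hook. The sketch below is an assumption about how it could be done, not something tested on the machine in question: systemd-sleep runs every executable in /usr/lib/systemd/system-sleep/ with "pre" or "post" as the first argument, and the hook restarts the service named thermald on "post". The file name and function names are up to you.

```python
#!/usr/bin/env python3
import subprocess
import sys


def restart_needed(argv: list) -> bool:
    """systemd-sleep passes 'pre' before suspending and 'post' after waking up."""
    return len(argv) >= 2 and argv[1] == "post"


def main(argv: list) -> int:
    if restart_needed(argv):
        # Assumes the service is called thermald, as on Ubuntu
        return subprocess.run(["systemctl", "restart", "thermald"]).returncode
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Saved as an executable file in that directory, it runs on every resume without further configuration.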

I would keep an eye open for how much memory plasmashell uses, if you
see it creep over 1G, then it's time to restart it with "plasmashell --replace". Running top/htop once in a while should be ok.
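
Instead of watching top by hand, the check can be scripted. The following is only a sketch with made-up function names: it reads the resident memory (VmRSS, in kB) from /proc for every process named plasmashell and reports the ones above the 1G mark mentioned above. What to do with the result (for example running "plasmashell --replace") is left to the user.

```python
import os


def rss_kib(status_text: str) -> int:
    """Extract VmRSS (resident memory in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return 0


def processes_named(name: str) -> list:
    """PIDs of all processes whose command name matches exactly."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as comm:
                if comm.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while we were looking
    return pids


def over_limit(name: str = "plasmashell", limit_kib: int = 1024 * 1024) -> list:
    """PIDs of processes called name that use more than limit_kib of RAM."""
    heavy = []
    for pid in processes_named(name):
        try:
            with open(f"/proc/{pid}/status") as status:
                if rss_kib(status.read()) > limit_kib:
                    heavy.append(pid)
        except OSError:
            continue
    return heavy
```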

--

//Aho

From Paul@21:1/5 to Paulo da Silva on Fri Oct 1 03:31:25 2021

XPost: alt.os.linux

On 10/1/2021 1:42 AM, Paulo da Silva wrote:

On 01/10/21 at 06:29, J.O. Aho wrote:

On 01/10/2021 05.05, Paulo da Silva wrote:

If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this few times last couple of days.
Notice that all these problems are relatively recent.

If you mean the thermal issue, maybe you need to restart thermald after
you wake up from suspension. It's not unknown that some programs do not
work well with suspension.

I tried that. No luck!
I also played with some configurations, namely giving priority to freqs.,
because I know that lowering them causes the zone0 temp. to drop quickly.
BTW, in the meantime I remembered that the freeze problem also
occurred, at least once, with the system in "Slow Mode", i.e. half of
max. freq. and powersave governor. That's why I suspect something
related to the GPU - HW or SW.

I would keep an eye open for how much memory plasmashell uses, if you
see it creep over 1G, then it's time to restart it with "plasmashell
--replace". Running top/htop once in a while should be ok.

Yes, I had several issues with plasmashell in all my computers :-( . I
have a script to handle them. I don't remember now what it does. Just
keeps working :-)

Thanks

From a hardware perspective, some subsystems share power envelope
because they're in the same package (Intel CPU and Intel HD 630).

Or, they can share a common heatpipe, which means if one gets
hot, both get hot (Intel CPU and NVidia GPU chip share heatpipe).

The NVidia chip, should have an NVidia driver which controls
frequency and voltage as a function of "what limit you're hitting".
On something like Furmark, you would be power limited. Maybe
the GPU driver throttles (turns down clock) when the chip gets
too warm. And this means, you could even be in a situation where
a railed or turboed CPU causes the GPU to slow down.

It's beyond my pay scale, to balance all these things, but from
the looks of it, some feedback loop in your laptop is not working
as expected. When the CPU goes above 100C, it should start throttling.
The NVidia chip should have a throttle temperature too. And the
NVidia throttle point should take the GPU temperature measurement
error into account.
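
To watch what the GPU side is doing while the CPU is loaded, nvidia-smi can be polled. The sketch below is an illustration, not a tested recipe for this laptop: it queries temperature.gpu, power.draw and clocks.sm in CSV form and parses one line of the answer. On some laptop GPUs the power reading is not available, so non-numeric fields are returned as None. The function names are made up.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=temperature.gpu,power.draw,clocks.sm",
    "--format=csv,noheader,nounits",
]


def _number(field: str):
    """Convert a CSV field to float, or None if the driver reports no value."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_gpu_line(line: str) -> dict:
    """Parse one output line such as '67, 45.12, 1350'."""
    temp, power, clock = (_number(field.strip()) for field in line.split(","))
    return {"temp_c": temp, "power_w": power, "sm_clock_mhz": clock}


def sample() -> dict:
    """Run nvidia-smi once and return the reading of the first GPU."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_line(output.splitlines()[0])
```

A falling SM clock while the temperature keeps rising would suggest the GPU is already throttling and the heat is coming from the CPU side.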

https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks

Somehow, you have to get them to agree on when throttling should happen.
The NVidia driver already has this sort of behavior, but something
needs adjustment so that one of the two subsystems does not "hog" the
power envelope and cause the other subsystem to shut down the computer.

For the digital temperature readout on the Intel CPU, it is most
accurate at the high end, where the throttle point is. I do not
know which measurement point on the NVidia, has the least error,
as the method used is not likely to be exactly the one Intel
uses for Core Temp.

Paul

From William Unruh@21:1/5 to Paulo da Silva on Fri Oct 1 13:16:58 2021

XPost: alt.os.linux

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering! Almost complete freeze on my end.
No keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and looks for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonadi(?), then there is another 30 sec pause
starting up something else, and it spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

Dell XPS13-9360 machine, onboard Intel video:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is freezing the system. I
suspect the video driver in my case.

From Paulo da Silva@21:1/5 to All on Fri Oct 1 17:12:44 2021

XPost: alt.os.linux

On 01/10/21 at 14:16, William Unruh wrote:

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering! Almost complete freeze on my end.
No keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and looks for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonadi(?), then there is another 30 sec pause
starting up something else, and it spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

Dell XPS13-9360 machine, onboard Intel video:
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is freezing the system. I
suspect the video driver in my case.

I never had any problem until recently - a couple of months or so. My
computer's 2-year warranty expired in June :-)
Unfortunately in my case nothing works after the freezes.
When it happened while I was listening to music, the sound got stuck in
a loop of about 1 second.

Regards
Paulo

From Paulo da Silva@21:1/5 to All on Fri Oct 1 17:05:58 2021

XPost: alt.os.linux

On 01/10/21 at 08:31, Paul wrote:

On 10/1/2021 1:42 AM, Paulo da Silva wrote:

On 01/10/21 at 06:29, J.O. Aho wrote:

On 01/10/2021 05.05, Paulo da Silva wrote:

If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this few times last couple of days.
Notice that all these problems are relatively recent.

If you mean the thermal issue, maybe you need to restart thermald after
you wake up from suspension. It's not unknown that some programs do not
work well with suspension.

I tried that. No luck!
I also played with some configurations, namely giving priority to freqs.,
because I know that lowering them causes the zone0 temp. to drop quickly.
BTW, in the meantime I remembered that the freeze problem also
occurred, at least once, with the system in "Slow Mode", i.e. half of
max. freq. and powersave governor. That's why I suspect something
related to the GPU - HW or SW.

I would keep an eye open for how much memory plasmashell uses, if you
see it creep over 1G, then it's time to restart it with "plasmashell
--replace". Running top/htop once in a while should be ok.

Yes, I had several issues with plasmashell in all my computers :-( . I
have a script to handle them. I don't remember now what it does. Just
keeps working :-)

Thanks

From a hardware perspective, some subsystems share power envelope
because they're in the same package (Intel CPU and Intel HD 630).

Or, they can share a common heatpipe, which means if one gets
hot, both get hot (Intel CPU and NVidia GPU chip share heatpipe).

Ah, this explains why in ondemand the GPU temperature still rises when
using the CPU! Also, I could see that with powersave mode in NVIDIA
settings, which causes the NVIDIA GPU to shut down (it turns off), the
GPU fan sometimes still starts.

The NVidia chip, should have an NVidia driver which controls
frequency and voltage as a function of "what limit you're hitting".
On something like Furmark, you would be power limited. Maybe
the GPU driver throttles (turns down clock) when the chip gets
too warm. And this means, you could even be in a situation where
a railed or turboed CPU causes the GPU to slow down.

thermald is supposed to take action to lower the temperature. Per
the configuration this should happen at 90C. After boot and before any
suspension it gets stable at 97C. I don't know what is in control - it
may be the BIOS controlling the fans, something in the kernel or
thermald. After suspension, something fails. The temperature rises
to 110C and the emergency shutdown starts.
BTW, as the temperature rises the fans always increase their speed. They
never reach the "boost" RPM however. If I boost them manually, the
temperature drops.

It's beyond my pay scale, to balance all these things, but from
the looks of it, some feedback loop in your laptop is not working
as expected. When the CPU goes above 100C, it should start throttling.
The NVidia chip should have a throttle temperature too. And the
NVidia throttle point should take the GPU temperature measurement
error into account.

As I said before, thermald should have taken actions at 90C. This is the
order of the actions by priority (file /etc/thermald/thermal-cpu-cdev-order.xml):

<CoolingDeviceOrder>
<CoolingDevice>rapl_controller</CoolingDevice>
<CoolingDevice>intel_pstate</CoolingDevice>
<CoolingDevice>intel_powerclamp</CoolingDevice>
<CoolingDevice>cpufreq</CoolingDevice>
<CoolingDevice>Processor</CoolingDevice>
</CoolingDeviceOrder>

I tried to put the cpufreq line in first place, because I know for sure
that lowering the cpufreq causes the temperature to drop quickly, but
nothing happens. I wonder if thermald is doing anything at all ...

https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks

...
Thanks for your enlightenments and comments.

I'm running out of time for this problem.
For now I'm going with a reboot whenever I need intense computation
services. I'm also keeping NVIDIA in ondemand mode.
Later I'll:
1. Try to compile the latest version of thermald.
2. Write a script, as a just-in-case protection, to run as a service and
lower the freqs. once the temperature reaches 99C, since normally it
gets stable at 97C and the critical is 100C. This should be the role of
thermald ...
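
The protection script in item 2 could look something like the Python sketch below. It is only a sketch under assumptions: it reads the standard sysfs file thermal_zone0/temp (millidegrees C) and caps every core's cpufreq scaling_max_freq (kHz), which needs root. The 99C limit is from the post; the 2.5 GHz cap is an illustrative value, and the function names are made up here.

```python
from pathlib import Path

ZONE_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")  # millidegrees C
CPU_ROOT = Path("/sys/devices/system/cpu")


def read_temp_c(temp_file=ZONE_TEMP):
    """Return the zone temperature in degrees C."""
    return int(Path(temp_file).read_text().strip()) / 1000.0


def cap_frequency(max_khz, cpu_root=CPU_ROOT):
    """Write max_khz to each core's scaling_max_freq; return the number of cores capped."""
    capped = 0
    for f in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        f.write_text(f"{max_khz}\n")
        capped += 1
    return capped


def guard_once(limit_c=99.0, safe_khz=2_500_000,
               temp_file=ZONE_TEMP, cpu_root=CPU_ROOT):
    """One polling step: cap the frequency if the zone is at or above limit_c."""
    if read_temp_c(temp_file) >= limit_c:
        cap_frequency(safe_khz, cpu_root)
        return True
    return False
```

Run as a service, this would amount to calling guard_once() in a loop with a short sleep (one second, say). The sketch never raises the frequency again; restoring it is left to the user.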

As time goes by I'll also watch for freezes, in the hope that ondemand mode
avoids them.
This is too confusing to send the computer for repair.

Thank you Paul.
Paulo


From Joerg@21:1/5 to Paulo da Silva on Fri Oct 1 15:44:58 2021

XPost: alt.os.linux

On 10/1/21 9:12 AM, Paulo da Silva wrote:

At 14:16 on 01/10/21, William Unruh wrote:

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering!.

Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
it was 20 minutes before the meeting.

My tiny ARM-based Arch-Linux box also has the occasional failure. It
just freezes and the CPU gets hot. Sometimes it happens after a few
days, sometimes after a month. No cues in the log. So I gave that one up.

... Almost complete freeze on my end. No
keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and look for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonidia(?) and then another 30 sec pause
starting up something else, and spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

(Dell XPS13- 9360 machine, onboard Intel video
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is having problems freezing the system. I
suspect the video driver in my case.

I never had any problem until recently - a couple of months or so. My computer expired the 2 yrs warranty in June :-)

It would not help you anyhow with an OS crash problem.

Regarding the overtemp I assume you have looked whether there is one
particular software that is very wasteful with processor resources. I
had that with a morse code reading software so I no longer use it, and
don't need it anymore.

If something reaches a temperature limit with the fan fully blasting
that is suspicious. I had that about two years ago and then I found the
reason. We had adopted a dog and his fine hair got in there. So I had to
reduce my PC fan cleaning intervals.

Unfortunately in my case there is nothing working after the freezes.
Even when it happens while listen to music, the sound entered in a +-1
second loop.

That almost cannot be hardware.

--
Regards, Joerg

http://www.analogconsultants.com/


From Paulo da Silva@21:1/5 to All on Sat Oct 2 01:52:28 2021

XPost: alt.os.linux

At 23:44 on 01/10/21, Joerg wrote:

On 10/1/21 9:12 AM, Paulo da Silva wrote:

At 14:16 on 01/10/21, William Unruh wrote:

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering!.

Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
it was 20 minutes before the meeting.

My tiny ARM-based Arch-Linux box also has the occasional failure. It
just freezes and the CPU gets hot. Sometimes it happens after a few
days, sometimes after a month. No cues in the log. So I gave that one up.

... Almost complete freeze on my end. No
keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and look for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonidia(?) and then another 30 sec pause
starting up something else, and spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

(Dell XPS13- 9360 machine, onboard Intel video
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
(rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is having problems freezing the system. I
suspect the video driver in my case.

I never had any problem until recently - a couple of months or so. My
computer expired the 2 yrs warranty in June :-)

It would not help you anyhow with an OS crash problem.

Regarding the overtemp I assume you have looked whether there is one particular software that is very wasteful with processor resources. I
had that with a morse code reading software so I no longer use it, and
don't need it anymore.

If something reaches a temperature limit with the fan fully blasting
that is suspicious. I had that about two years ago and then I found the reason. We had adopted a dog and his fine hair got in there. So I had to reduce my PC fan cleaning intervals.

Here I get that CPU situation lots of times.
I have lots of very CPU/GPU-intensive tasks.
Anyway, as soon as I put the PC in Fast Mode (max freqs and governor
performance) almost anything I do, sometimes even scrolling a browser
page like Fb, causes the fan RPM to rise. The fans also come back to almost
idle relatively fast when I just stop.

Unfortunately in my case there is nothing working after the freezes.
Even when it happens while listen to music, the sound entered in a +-1
second loop.

That almost cannot be hardware.

Hopefully not. There is one occurrence that doesn't allow me to rule out
HW: from time to time, the fans go up to high RPM (noisy) for about 1
to 5 seconds and then drop abruptly. The PC is doing nothing. This
also began to occur lately. As far as I know, it is the BIOS that controls
the fans.

Thanks Joerg.


From Joerg@21:1/5 to Paulo da Silva on Sat Oct 2 11:15:05 2021

XPost: alt.os.linux

On 10/1/21 5:52 PM, Paulo da Silva wrote:

At 23:44 on 01/10/21, Joerg wrote:

On 10/1/21 9:12 AM, Paulo da Silva wrote:

At 14:16 on 01/10/21, William Unruh wrote:

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering!.

Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
it was 20 minutes before the meeting.

My tiny ARM-based Arch-Linux box also has the occasional failure. It
just freezes and the CPU gets hot. Sometimes it happens after a few
days, sometimes after a month. No cues in the log. So I gave that one up.

... Almost complete freeze on my end. No
keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and look for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonidia(?) and then another 30 sec pause
starting up something else, and spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

(Dell XPS13- 9360 machine, onboard Intel video
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
(rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is having problems freezing the system. I
suspect the video driver in my case.

I never had any problem until recently - a couple of months or so. My
computer expired the 2 yrs warranty in June :-)

It would not help you anyhow with an OS crash problem.

Regarding the overtemp I assume you have looked whether there is one
particular software that is very wasteful with processor resources. I
had that with a morse code reading software so I no longer use it, and
don't need it anymore.

If something reaches a temperature limit with the fan fully blasting
that is suspicious. I had that about two years ago and then I found the
reason. We had adopted a dog and his fine hair got in there. So I had to
reduce my PC fan cleaning intervals.

Here I got that CPU situation lots of times.
I have lots of tasks very CPU/GPU intensive.
Anyway, as soon as I put the PC in Fast mode (max freqs and governor performance) almost anything I do, sometimes even scrolling a browser
page like Fb, causes the fans to rise RPM. ...

Can you watch the CPU load percentage when that happens? I keep that
reading on the task bar so I can see when something becomes a MIPS
burner. I do the same with memory usage (mainly to see when Firefox has
reached too much memory leakage).
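
For anyone without a task-bar widget, the same CPU load percentage can be derived from two samples of /proc/stat. The Python sketch below is one way to do it, assuming the usual Linux field order on the aggregate "cpu" line (user, nice, system, idle, iowait, irq, softirq, steal); the function names are invented for this illustration.

```python
def cpu_times(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = stat_text.splitlines()[0].split()
    if fields[0] != "cpu":
        raise ValueError("first line of /proc/stat should start with 'cpu'")
    values = [int(v) for v in fields[1:]]
    # Count idle + iowait as not busy; guest time is already included in user/nice.
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    total = sum(values[:8])
    return total - idle, total


def load_percent(before, after):
    """CPU load in percent between two (busy, total) samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return 100.0 * busy / total if total > 0 else 0.0
```

Taking one sample with cpu_times(open("/proc/stat").read()), sleeping a second, taking another and passing both to load_percent() gives the average load over that second. Logging that next to the fan RPM would show whether the fans really spin up at low load.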

... Also they come back to almost
idle relatively fast when I just stop.

That is strange. When I do lengthy SPICE simulations where the CPU goes
to almost 100% workload the fans remain on full for half a minute or so.

But anyhow, if this huge increase and then decay happens with much less
than 100% CPU load that would point to a mechanical problem. Pet hair in
the fan path, thermal paste under the heatsink dried up, something like
that.

Unfortunately in my case there is nothing working after the freezes.
Even when it happens while listen to music, the sound entered in a +-1
second loop.

That almost cannot be hardware.

Hopefully not. There is one occurrence which doesn't allow me to discard
HW: From times to times, the fans go up to big RPM (noisy) for about 1
to 5 seconds and then follow down abruptly.The PC is doing nothing. This
also began to occur lately. As much as I know, is the BIOS that controls
the fans.

I don't know much about Ubuntu flavors (using MX-Linux myself) but the
fan speed can also be controlled by the OS, depending on how your
Kubuntu is configured:

https://askubuntu.com/questions/22108/how-to-control-fan-speed

Sometimes hardware (or a BIOS) does this on purpose. For example, my
DOCSIS modem for internet access has a fan that never needs to come on
because we never stream movies and stuff like that. Very little work for
the processor. Sometimes the fan still goes to full blast for a few
seconds, then off. I guess they programmed it that way to avoid the fan becoming "caked up" and stuck. Just like with a power generator, you
have to run it once a month or it might not start in a crisis situation.

Thanks Joerg.

As a co-worker once said, we are all here to serve :-)

--
Regards, Joerg

http://www.analogconsultants.com/


From Paul@21:1/5 to Joerg on Sat Oct 2 14:27:52 2021

XPost: alt.os.linux

On 10/2/2021 2:15 PM, Joerg wrote:

That is strange. When I do lengthy SPICE simulations where the CPU
goes to almost 100% workload the fans remain on full for half a minute or so.

You have a good eye for time there.

For one of the Intel turbo boost options, the default time constant
is either 28 seconds or 56 seconds. This sounds like the 28 second version.

If you visit one of the enthusiast computer sites, they have
articles on the turbo boost feature. For example, a 65W processor
will jump up to 224W output for 28 seconds, before throttling back.
This accelerates short intense jobs, at the expense of your
nerves :-)
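
On Linux, the power limits and time windows actually programmed can usually be read from the powercap sysfs interface. The Python sketch below assumes the intel_rapl driver is loaded and exposes the package domain at /sys/class/powercap/intel-rapl:0 with the usual constraint_N_name, constraint_N_power_limit_uw and constraint_N_time_window_us files. Whether that path exists and is readable on a given machine is an assumption, and the function name is made up here.

```python
from pathlib import Path


def read_rapl_constraints(domain_dir="/sys/class/powercap/intel-rapl:0"):
    """Return the power constraints of one RAPL domain as a list of dicts."""
    domain = Path(domain_dir)
    constraints = []
    for name_file in sorted(domain.glob("constraint_*_name")):
        prefix = name_file.name[: -len("name")]  # e.g. "constraint_0_"
        try:
            name = name_file.read_text().strip()
            # sysfs reports microwatts and microseconds
            limit_w = int((domain / f"{prefix}power_limit_uw").read_text()) / 1e6
            window_s = int((domain / f"{prefix}time_window_us").read_text()) / 1e6
        except (OSError, ValueError):
            continue  # skip constraints that are missing or unreadable
        constraints.append({"name": name, "limit_w": limit_w, "window_s": window_s})
    return constraints
```

Printing the returned list shows each constraint's name with its limit in watts and its window in seconds, which can be compared against what a wall-plug meter reports.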

On the overclocker machines, there is also an option on a
number of desktop motherboards, to run the CPU constantly
at 125W, running the CPU clock above baseline until the
compute job is finished. The BIOS setting for this, may not
explain at all, what it is doing. This is why you buy motherboards
with more phases or nicer heatsinks, so that the thing will not
be stressed too much by the behavior.

These are the kinds of things, that you put a Kill-O-Watt meter
on the wall plug, so that you can characterize what kind of
policy is being used at the moment.

This is no longer the "it says it's a 65W CPU and it always
draws 65W" era. It is a lot crazier than that. "TDP means
nothing" is the rule of the day.

Paul


From Peter@21:1/5 to Paulo da Silva on Sun Oct 3 19:22:20 2021

XPost: alt.os.linux

On 01.10.2021 05:05, Paulo da Silva wrote:

At 18:22 on 27/09/21, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

First let me explain how I use this computer.

I have a starting (boot) command - cpupower - to set the max. freq. to
2.5 GHz. Its max. value used to be 5.1GHz. It also sets the governor to
powersave. Let me call this Slow Mode.
This causes my computer to be quiet with very low RPM on both fans. When
unplugged from the charger they are most of the time at 0 RPM. Since it
is quite fast, this isn't noticeable.
When I need power, which rarely happens - training AIs or processing
large amount of data, I set it to the max. freq. and governor to
performance. I also do this for my FS :-) Let me call this Fast Mode.

Now, about this problem ...

1. I configured nvidia to ondemand. The freeze problem has not occurred
since. But since it could go a month or more without occurring, it's
still inconclusive. Anyway, from lots of things I have been reading it is
very likely that the BIOS, for some reason, can't cool something going wrong
and just freezes the computer. So, no logs. Once again, inconclusive.

2. A new problem
When in Fast Mode, using a job with fullcpu causes a shutdown.
This time there is a log entry:
"thermal thermal_zone0: critical temperature reached (110 C), shutting down"
So, I tried to analyze the problem.

I monitored zone0 temperature and could see it goes up until 102C. Then
the computer initiates the emergency shutdown. So, the monitor gets
probably killed. Notice that the critical temp. for this zone is 100C.

I tried again but when the temp. of zone0 reached 99C I put the fans in
boost mode (max. speed) and the temperature dropped and got stable at 97C.
I tried this again, but now I just put the computer in Slow Mode. The
temp. drops to 40-50C!

So, why neither thermald or even the BIOS use these resources to drop
the temperature? In fact the fans rotate at higher speed but do not
reach the 6k RPM of boost mode. I tried several configurations for
thermald, including give priority to acting on freqs. No success. It
seems that thermald doesn't seem to care at all with its configs.

Finally
=======
If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this a few times in the last couple of days.
Notice that all these problems are relatively recent.

By the way ... in windows this problem does not occur.

So:
SW problem, after an upgrade perhaps? HW problem? Both?
I feel lost ...
As soon as I get some time, I'm thinking of installing a new distro in a
different partition to see what happens there.
Until then, I need to reboot before I start a CPU-intensive job.
Not bad ... :-)

Thank you to all who responded and for any further comments or
suggestions.
Paulo

Maybe check out GreenWithEnvy (GWE)? It's an Afterburner-like app for
Linux. I do some gaming on my computer that is at times GPU heavy, and I
use GWE to control the GPU fans and temp during heavy GPU load. You set
up a graph for temp and rpm and this controls the fans dynamically.

Peter


From Paulo da Silva@21:1/5 to All on Mon Oct 4 19:58:55 2021

XPost: alt.os.linux

At 19:15 on 02/10/21, Joerg wrote:

On 10/1/21 5:52 PM, Paulo da Silva wrote:

At 23:44 on 01/10/21, Joerg wrote:

On 10/1/21 9:12 AM, Paulo da Silva wrote:

At 14:16 on 01/10/21, William Unruh wrote:

I am getting an occasional freeze as well. Yesterday, in the midst of a
Google Meet seminar I was delivering!.

Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
it was 20 minutes before the meeting.

My tiny ARM-based Arch-Linux box also has the occasional failure. It
just freezes and the CPU gets hot. Sometimes it happens after a few
days, sometimes after a month. No cues in the log. So I gave that one up.

... Almost complete freeze on my end. No
keys worked, screen frozen. Except that the people watching could still
hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
still running in the background. I could not figure out how to unfreeze
the google-meet full screen and had to do the power button thingy. Of
course then another bug showed up. -- I sometimes run my laptop with a
desktop monitor attached. Often the second or third time I reboot, the
system seems to get completely confused and look for that second monitor
as the default after it has run the "new hardware" search. It then times
out (90 sec) on starting up akonidia(?) and then another 30 sec pause
starting up something else, and spews out many pages of error/warning
stuff before the boot process finishes. So it took almost 5 min to reboot
in the midst of my seminar. Sheesh.

(Dell XPS13- 9360 machine, onboard Intel video
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
Processor Host Bridge/DRAM Registers (rev 02)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
(rev 02)

Mageia 8, kernel
Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
2021 x86_64 x86_64 x86_64 GNU/Linux

So yes, something in Linux is having problems freezing the system. I
suspect the video driver in my case.

I never had any problem until recently - a couple of months or so. My
computer expired the 2 yrs warranty in June :-)

It would not help you anyhow with an OS crash problem.

Regarding the overtemp I assume you have looked whether there is one
particular software that is very wasteful with processor resources. I
had that with a morse code reading software so I no longer use it, and
don't need it anymore.

If something reaches a temperature limit with the fan fully blasting
that is suspicious. I had that about two years ago and then I found the
reason. We had adopted a dog and his fine hair got in there. So I had to
reduce my PC fan cleaning intervals.

Here I got that CPU situation lots of times.
I have lots of tasks very CPU/GPU intensive.
Anyway, as soon as I put the PC in Fast mode (max freqs and governor
performance) almost anything I do, sometimes even scrolling a browser
page like Fb, causes the fans to rise RPM. ...

Can you watch the CPU load percentage when that happens? I keep that
reading on the task bar so I can see when something becomes a MIPS
burner. I do the same with memory usage (mainly to see when Firefox has reached too much memory leakage).

This has nothing to do with overload. It largely depends on the clock
frequencies, which are also raised by the "performance" governor.
As soon as the CPUs do any work the fans tend to raise their RPM. It doesn't
take much work. The same always happens with Windows, where I was not able
to control these things.

... Also they come back to almost
idle relatively fast when I just stop.

That is strange. When I do lengthy SPICE simulations where the CPU goes
to almost 100% workload the fans remain on full for half a minute or so.

But anyhow, if this huge increase and then decay happens with much less
than 100% CPU load that would point to a mechanical problem. Pet hair in
the fan path, thermal paste under the heatsink dried up, something like
that.

No HW problems at this level here, for sure.
The system correctly handled all temperature stuff until recently.
Aside from the strange freeze problem - it hasn't occurred again so far! -
the PC correctly handles the fullcpu temperatures except after suspend
to RAM/wake. This didn't happen before. Probably some update broke
the system. It also works fine for fullcpu in Windows.

Unfortunately in my case there is nothing working after the freezes.
Even when it happens while listen to music, the sound entered in a +-1
second loop.

That almost cannot be hardware.

Hopefully not. There is one occurrence which doesn't allow me to discard
HW: From times to times, the fans go up to big RPM (noisy) for about 1
to 5 seconds and then follow down abruptly.The PC is doing nothing. This
also began to occur lately. As much as I know, is the BIOS that controls
the fans.

I don't know much about Ubuntu flavors (using MX-Linux myself) but the
fan speed can also be controlled by the OS, depending on how your
Kubuntu is configured:

https://askubuntu.com/questions/22108/how-to-control-fan-speed

Sometimes hardware (or a BIOS) does this on purpose. For example, my
DOCSIS modem for internet access has a fan that never needs to come on because we never stream movies and stuff like that. Very little work for
the processor. Sometimes the fan still goes to full blast for a few
seconds, then off. I guess they programmed it that way to avoid the fan becoming "caked up" and stuck. Just like with a power generator, you
have to run it once a month or it might not start in a crisis situation.

Maybe. Those situations haven't occurred again! I'm not having RPM
spikes now :-)

Let's see what happens.
So far, I'm running in Slow Mode and need to be careful to reboot first
when going to Fast Mode. As soon as possible, I'll write a small protection
script to lower freqs. when the temperature goes to 99C or more on zone0.

BTW, I tried the latest version of thermald. No success!!! I still can't
understand why thermald "refuses" to work! Looking at the source is a no-go
for me. Too much work ...
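
Short of reading the thermald source, one cheap check is to list the trip points the kernel itself exposes for zone0 and compare them with the 90C/100C/110C figures mentioned in this thread. The Python sketch below assumes the standard sysfs layout (trip_point_N_type and trip_point_N_temp files under the zone directory, temperatures in millidegrees C). It only shows the kernel's view, not whatever thermald has configured internally, and the function name is invented here.

```python
from pathlib import Path


def read_trip_points(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return the zone's trip points as a sorted list of (temp_c, type) tuples."""
    zone = Path(zone_dir)
    trips = []
    for type_file in zone.glob("trip_point_*_type"):
        # trip_point_3_type -> trip_point_3_temp
        temp_file = type_file.with_name(type_file.name[: -len("type")] + "temp")
        try:
            kind = type_file.read_text().strip()
            temp_c = int(temp_file.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # skip trip points that cannot be read
        trips.append((temp_c, kind))
    return sorted(trips)
```

Running it once after a fresh boot and once after a suspend/resume cycle would show whether the trip points change across suspend, which is where the thermal handling reportedly stops working.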

Thanks.
Paulo


From Carlos E.R.@21:1/5 to Paulo da Silva on Sun Oct 17 13:07:54 2021

XPost: alt.os.linux

On 01/10/2021 05.05, Paulo da Silva wrote:

At 18:22 on 27/09/21, Paulo da Silva wrote:

Hi all!

...

First let me explain how I use this computer.

I have a starting (boot) command - cpupower - to set the max. freq. to
2.5 GHz. Its max. value used to be 5.1GHz. It also sets the governor to
powersave. Let me call this Slow Mode.
This causes my computer to be quiet with very low RPM on both fans. When
unplugged from the charger they are most of the time at 0 RPM. Since it
is quite fast, this isn't noticeable.
When I need power, which rarely happens - training AIs or processing
large amount of data, I set it to the max. freq. and governor to
performance. I also do this for my FS :-) Let me call this Fast Mode.

Now, about this problem ...

1. I configured nvidia to ondemand. The freeze problem has not occurred
since. But since it could go a month or more without occurring, it's
still inconclusive. Anyway, from lots of things I have been reading it is
very likely that the BIOS, for some reason, can't cool something going wrong
and just freezes the computer. So, no logs. Once again, inconclusive.

2. A new problem
When in Fast Mode, using a job with fullcpu causes a shutdown.
This time there is a log entry:
"thermal thermal_zone0: critical temperature reached (110 C), shutting down"
So, I tried to analyze the problem.

I monitored zone0 temperature and could see it goes up until 102C. Then
the computer initiates the emergency shutdown. So, the monitor gets
probably killed. Notice that the critical temp. for this zone is 100C.

I tried again but when the temp. of zone0 reached 99C I put the fans in
boost mode (max. speed) and the temperature dropped and got stable at 97C.
I tried this again, but now I just put the computer in Slow Mode. The
temp. drops to 40-50C!

So, why neither thermald or even the BIOS use these resources to drop
the temperature? In fact the fans rotate at higher speed but do not
reach the 6k RPM of boost mode. I tried several configurations for
thermald, including give priority to acting on freqs. No success. It
seems that thermald doesn't seem to care at all with its configs.

Finally
=======
If I reboot the computer:
Then it seems OK.
I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
stable at 97C!
If I suspend the computer, when restarting the problem starts again!!! I
tried this a few times in the last couple of days.
Notice that all these problems are relatively recent.

By the way ... in windows this problem does not occur.

So:
SW problem, after an upgrade perhaps? HW problem? Both?
I feel lost ...
As soon as I get some time, I'm thinking of installing a new distro in a
different partition to see what happens there.
Until then, I need to reboot before I start a CPU-intensive job.
Not bad ... :-)

Thank you to all who responded and for any further comments or
suggestions.
Paulo

I have used two machines with limited cooling; one is a fanless mini
computer box (the idea is to put it in the sitting room by the TV). When
it is doing something intense, it overheats and throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything; it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is "Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

In both cases I installed openSUSE Leap 15

--
Cheers, Carlos.


From Paulo da Silva@21:1/5 to All on Sun Oct 17 19:15:34 2021

XPost: alt.os.linux

At 12:07 on 17/10/21, Carlos E.R. wrote:

On 01/10/2021 05.05, Paulo da Silva wrote:

At 18:22 on 27/09/21, Paulo da Silva wrote:

Hi all!

...

First let me explain how I use this computer.

I have a starting (boot) command - cpupower - to set the max. freq. to
2.5 GHz. Its max. value used to be 5.1GHz. It also sets the governor to
powersave. Let me call this Slow Mode.
This causes my computer to be quiet with very low RPM on both fans. When
unplugged from the charger they are most of the time at 0 RPM. Since it
is quite fast, this isn't noticeable.
When I need power, which rarely happens - training AIs or processing
large amount of data, I set it to the max. freq. and governor to
performance. I also do this for my FS :-) Let me call this Fast Mode.

Now, about this problem ...

1. I configured nvidia to ondemand. The freeze problem has not occurred
since. But since it could go a month or more without occurring, it's
still inconclusive. Anyway, from lots of things I have been reading it is
very likely that the BIOS, for some reason, can't cool something going wrong
and just freezes the computer. So, no logs. Once again, inconclusive.

2. A new problem
When in Fast Mode, running a full-CPU job causes a shutdown.
This time there is a log entry:
"thermal thermal_zone0: critical temperature reached (110 C), shutting down"
So, I tried to analyze the problem.

I monitored the zone0 temperature and saw it climb to 102C. Then
the computer initiates the emergency shutdown, so the monitor probably
gets killed at that point. Notice that the critical temp. for this zone is 100C.

I tried again, but when the temp. of zone0 reached 99C I put the fans in
boost mode (max. speed); the temperature dropped and stabilized at
97C.
I tried this again, but this time I just put the computer in Slow Mode. The
temp. drops to 40-50C!
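
The kind of monitoring described above can be sketched in a few lines. This is only an illustration, assuming the usual sysfs layout under /sys/class/thermal (a `temp` file in millidegrees Celsius and `trip_point_N_type` / `trip_point_N_temp` pairs). The function names and the 3-degree margin are made up for the example, and it is not the monitor actually used in the post.

```python
from pathlib import Path

SYSFS_THERMAL = Path("/sys/class/thermal")  # standard Linux sysfs location


def read_celsius(path):
    """sysfs exposes temperatures in millidegrees Celsius."""
    return int(Path(path).read_text().strip()) / 1000.0


def zone_temp(zone="thermal_zone0", base=SYSFS_THERMAL):
    """Current temperature of a thermal zone, in degrees Celsius."""
    return read_celsius(Path(base) / zone / "temp")


def critical_trip(zone="thermal_zone0", base=SYSFS_THERMAL):
    """Critical trip point of the zone in degrees Celsius, or None if absent."""
    zone_dir = Path(base) / zone
    for type_file in sorted(zone_dir.glob("trip_point_*_type")):
        if type_file.read_text().strip() == "critical":
            temp_name = type_file.name.replace("_type", "_temp")
            return read_celsius(type_file.with_name(temp_name))
    return None


def too_hot(temp, critical, margin=3.0):
    """True once the temperature is within `margin` degrees of the critical trip."""
    return critical is not None and temp >= critical - margin
```

Calling `zone_temp()` and `too_hot(zone_temp(), critical_trip())` in a loop every couple of seconds gives a poor man's early warning, at which point one could switch to Slow Mode or put the fans in boost mode by hand.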

So why does neither thermald nor even the BIOS use these resources to
bring the temperature down? The fans do spin faster, but they do not
reach the 6k RPM of boost mode. I tried several configurations for
thermald, including giving priority to acting on freqs. No success.
thermald doesn't seem to care about its configs at all.

Finally
=======
If I reboot the computer, it seems OK:
I put it in Fast Mode, run a full-CPU job and the zone0 temp. stays
stable at 97C!
If I suspend the computer, the problem starts again when it resumes!!! I
tried this a few times over the last couple of days.
Notice that all these problems are relatively recent.

By the way ... in Windows this problem does not occur.

So:
SW problem, after an upgrade perhaps? HW problem? Both?
I feel lost ...
As soon as I get some time, I'm thinking of installing a new distro in a
different partition to see what happens there.
Until then, I need to reboot before I start a CPU-intensive job.
Not bad ... :-)

Thank you to all who responded and for any further comments or
suggestions.
Paulo

I have used two machines with limited cooling; one is a mini computer
box, fanless (idea is to be put on sitting room by the TV). When it is
doing something intense, it overheats and it throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything, it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is "Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

In both cases I installed openSUSE Leap 15

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development. In Debian-likes I just need to install
<lib name>-dev. One example was libgcrypt20.

Regards
Paulo


From tom@21:1/5 to Paulo da Silva on Mon Oct 18 00:38:11 2021

XPost: alt.os.linux

On Mon, 27 Sep 2021 18:22:21 +0100
Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

Hi all!

From time to time - maybe after a month, maybe after a couple of hours - my
computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works, including the
Alt-PrtScreen (SysRq) combinations like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
on-demand mode. No success. Sooner or later it freezes.

Is there a way to get some information on why this is happening?

I am using kubuntu 20.04.

Thank you.

Sounds like something to do with the RAM. Disable XMP.


From Carlos E.R.@21:1/5 to Paulo da Silva on Mon Oct 18 14:26:29 2021

XPost: alt.os.linux

On 17/10/2021 20.15, Paulo da Silva wrote:

At 12:07 on 17/10/21, Carlos E.R. wrote:

I have used two machines with limited cooling; one is a mini computer
box, fanless (idea is to be put on sitting room by the TV). When it is
doing something intense, it overheats and it throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything, it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
"Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

In both cases I installed openSUSE Leap 15

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

Isengard:~ # ps afx | grep thermal
615 ? I< 0:00 \_ [acpi_thermal_pm]
23830 pts/23 S+ 0:00 \_ grep --color=auto thermal
Isengard:~ #

I'm not running thermald.

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development. In Debian-likes I just need to install
<lib name>-dev. One example was libgcrypt20.

What? All sources are available in openSUSE.

http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm

You just need to activate the sources repo in YaST. If some particular
package is missing the source, declare a bug.

If you just need the files to compile some other thing, you need the libname-devel package instead.

http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

--
Cheers, Carlos.


From J.O. Aho@21:1/5 to Paulo da Silva on Mon Oct 18 15:47:25 2021

XPost: alt.os.linux

On 17/10/2021 20.15, Paulo da Silva wrote:

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

I know I told you to try reloading the thermald service and you
said it didn't make any difference. What about:
- stop thermald
- rmmod the cpu temp module
- modprobe the cpu temp module
- start thermald

I'm not even sure if you can remove the module.
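
The four steps above could be scripted roughly like this. It is a dry-run sketch: by default it only prints the commands. The module name `coretemp` is an assumption (the post only says "the cpu temp module"), and the helper names are invented for the example; running it for real needs root, and rmmod may simply refuse if the module is in use.

```python
import subprocess


def reload_plan(module="coretemp", service="thermald"):
    """Commands for: stop the service, reload the module, start the service."""
    return [
        ["systemctl", "stop", service],
        ["rmmod", module],
        ["modprobe", module],
        ["systemctl", "start", service],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            # check=True stops at the first failing step (e.g. rmmod refused).
            subprocess.run(argv, check=True)
```

`run_plan(reload_plan())` prints the sequence without touching the system; pass `dry_run=False` as root to actually execute it.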

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development.

I ran openSUSE at my two previous jobs. Sure, there were shortcomings
with getting packages, but as Carlos already pointed out, the dev
packages are in a different repository. And of course you can get hold
of all the SRPMs too, in case you want to make some changes to a package.

It's not the distro I would use at home; metadistributions have
been more to my taste, except for the time it takes to build all the packages.

--

//Aho


From Paul@21:1/5 to tom on Mon Oct 18 11:48:36 2021

XPost: alt.os.linux

On 10/18/2021 3:38 AM, tom wrote:

On Mon, 27 Sep 2021 18:22:21 +0100
Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

Hi all!

From time to time - maybe after a month, maybe after a couple of hours - my
computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works, including the
Alt-PrtScreen (SysRq) combinations like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart, journalctl -b -1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
on-demand mode. No success. Sooner or later it freezes.

Is there a way to get some information on why this is happening?

I am using kubuntu 20.04.

Thank you.

Sounds like something to do with the RAM. Disable XMP.

But it's a freezing problem.

If the memory was bad, you'd expect the odd crash. Linux
seems to be pretty resistant to bad memory (like kernel panic),
when I tested with some unstable memory, so I don't think
the symptom description is a good match for it.

*******

To test memory, the latest memory test...
The download is compressed, so it's 9MB or so,
but expands to a larger file in Archive Manager.

https://www.memtest86.com/downloads/memtest86-usb.zip

memtest86-usb.img 500*1048576 bytes, nice for dd to USB stick

It can be "dd" transferred to a USB stick. It's a little
slow at startup, as it sniffs around the hardware, but
the traditional memory test interface eventually appears.
This would be good for that new UEFI-only PC you bought.
(My old copy of memtest would not run, because it got
into a boot loop with the GOP video code. It would
restart every time the screen tried to update.)

My processor draws 30W while that is running,
versus 65W while Prime95 does a thermal test.

Paul


From Paulo da Silva@21:1/5 to All on Tue Oct 19 01:49:32 2021

XPost: alt.os.linux

At 13:26 on 18/10/21, Carlos E.R. wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

At 12:07 on 17/10/21, Carlos E.R. wrote:

I have used two machines with limited cooling; one is a mini computer
box, fanless (idea is to be put on sitting room by the TV). When it is
doing something intense, it overheats and it throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything, it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
"Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
SSD/15.6"

In both cases I installed openSUSE Leap 15

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

Isengard:~ # ps afx | grep thermal
615 ? I< 0:00 \_ [acpi_thermal_pm]
23830 pts/23 S+ 0:00 \_ grep --color=auto thermal
Isengard:~ #

I'm not running thermald.

Yes! The BIOS and/or the kernel should be enough to avoid temperature
problems. thermald should at least be a last-resort protection.
None of them stops the temperature from rising after a suspend!
At least one of them does before any suspend has occurred: the
temperature never rises above 97ºC.

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development. In Debian-likes I just need to install
<lib name>-dev. One example was libgcrypt20.

What? All sources are available in openSUSE.

http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm

You just need to activate the sources repo in YaST. If some particular package is missing the source, declare a bug.

If you just need the files to compile some other thing, you need the libname-devel package instead.

http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

Yes, now they are. But they weren't when I needed them.
Maybe I'll give openSUSE another try.

Regards.
Paulo


From Paulo da Silva@21:1/5 to All on Tue Oct 19 02:14:47 2021

XPost: alt.os.linux

At 14:47 on 18/10/21, J.O. Aho wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

I know I told you to try reloading the thermald service and you
said it didn't make any difference. What about:
- stop thermald
- rmmod the cpu temp module
- modprobe the cpu temp module
- start thermald

I'm not even sure if you can remove the module.

Good idea, but unfortunately it didn't work!
I managed to remove all thermal related modules and installed them
again. No success! Temp keeps rising until I kill the full cpu test script!

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development.

I ran openSUSE at my two previous jobs. Sure, there were shortcomings
with getting packages, but as Carlos already pointed out, the dev
packages are in a different repository. And of course you can get hold
of all the SRPMs too, in case you want to make some changes to a package.

It's not the distro I would use at home; metadistributions have
been more to my taste, except for the time it takes to build all the packages.

A few years ago I used Gentoo for a long time.
Then it became too tedious managing all that configuration stuff, USE
flags, ...
Besides, from time to time I got compilation problems and needed
to make some "hacks".

Thanks.
Paulo


From Paulo da Silva@21:1/5 to All on Tue Oct 19 03:08:11 2021

XPost: alt.os.linux

At 01:49 on 19/10/21, Paulo da Silva wrote:

At 13:26 on 18/10/21, Carlos E.R. wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

At 12:07 on 17/10/21, Carlos E.R. wrote:

I have used two machines with limited cooling; one is a mini computer
box, fanless (idea is to be put on sitting room by the TV). When it is
doing something intense, it overheats and it throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything, it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
"Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
SSD/15.6"

In both cases I installed openSUSE Leap 15

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

Isengard:~ # ps afx | grep thermal
615 ? I< 0:00 \_ [acpi_thermal_pm]
23830 pts/23 S+ 0:00 \_ grep --color=auto thermal
Isengard:~ #

I'm not running thermald.

Yes! The BIOS and/or the kernel should be enough to avoid temperature
problems. thermald should at least be a last-resort protection.
None of them stops the temperature from rising after a suspend!
At least one of them does before any suspend has occurred: the
temperature never rises above 97ºC.

I rebooted, stopped thermald, and the temperature at full CPU
still stabilizes at 97ºC. So thermald seems to be doing nothing at all.

Regards.
Paulo


From J.O. Aho@21:1/5 to Paulo da Silva on Tue Oct 19 07:55:56 2021

XPost: alt.os.linux

On 19/10/2021 03.14, Paulo da Silva wrote:

At 14:47 on 18/10/21, J.O. Aho wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

I know I told you to try reloading the thermald service and you
said it didn't make any difference. What about:
- stop thermald
- rmmod the cpu temp module
- modprobe the cpu temp module
- start thermald

I'm not even sure if you can remove the module.

Good idea, but unfortunately it didn't work!
I managed to remove all thermal related modules and installed them
again. No success! Temp keeps rising until I kill the full cpu test script!

Take a look at this thread on GitHub: https://github.com/intel/thermal_daemon/issues/268

In the comment https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
it's mentioned that thermald works after a suspend once a patched
version is used.

As I understand it, you can increase the debug level to get more info
about what thermald is doing; that might help while trying to
figure it out.
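
One way to get that extra output, as far as I know, is to stop the service and run thermald in the foreground with debug logging. The sketch below only assembles the command line and does not start anything. The options `--no-daemon`, `--loglevel=debug`, `--adaptive` and `--ignore-cpuid-check` are the ones I remember from the thermald man page, so check `thermald --help` on your version; the helper name is made up.

```python
def thermald_debug_argv(adaptive=False, ignore_cpuid_check=False):
    """Build a foreground, debug-level thermald command line.

    Assumes the option spellings from the thermald man page; verify them
    against `thermald --help` on the installed version before relying on this.
    """
    argv = ["thermald", "--no-daemon", "--loglevel=debug"]
    if adaptive:
        argv.append("--adaptive")
    if ignore_cpuid_check:
        argv.append("--ignore-cpuid-check")
    return argv
```

Running the resulting command as root in a terminal (after `systemctl stop thermald`), suspending, resuming and then starting the full-CPU job should show whether thermald reacts at all once the temperature passes its trip points.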

--

//Aho


From Paulo da Silva@21:1/5 to All on Wed Oct 20 00:09:05 2021

XPost: alt.os.linux

At 06:55 on 19/10/21, J.O. Aho wrote:

On 19/10/2021 03.14, Paulo da Silva wrote:

At 14:47 on 18/10/21, J.O. Aho wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

I know I told you to try reloading the thermald service and you
said it didn't make any difference. What about:
  - stop thermald
  - rmmod the cpu temp module
  - modprobe the cpu temp module
  - start thermald

I'm not even sure if you can remove the module.

Good idea, but unfortunately it didn't work!
I managed to remove all thermal related modules and installed them
again. No success! Temp keeps rising until I kill the full cpu test
script!

Take a look at this thread on GitHub: https://github.com/intel/thermal_daemon/issues/268

In the comment https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
it's mentioned that thermald works after a suspend once a patched
version is used.

As I understand it, you can increase the debug level to get more info
about what thermald is doing; that might help while trying to
figure it out.

I'll try that. Not much hope, however.
The patch is included in the latest version.
With the version shipped in Kubuntu 20.04:
- I have tried --adaptive and --ignore-cpuid-check. It didn't
complain, but I could not determine whether they are both active.

One would expect the patch to have been backported to Kubuntu 20.04.
Anyway ... I'll try the latest version again, but this time with both
switches active, to see what happens.

Thank you.


From Paulo da Silva@21:1/5 to All on Wed Oct 20 02:41:46 2021

XPost: alt.os.linux

At 00:09 on 20/10/21, Paulo da Silva wrote:

At 06:55 on 19/10/21, J.O. Aho wrote:

On 19/10/2021 03.14, Paulo da Silva wrote:

At 14:47 on 18/10/21, J.O. Aho wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

I know I told you to try reloading the thermald service and you
said it didn't make any difference. What about:
  - stop thermald
  - rmmod the cpu temp module
  - modprobe the cpu temp module
  - start thermald

I'm not even sure if you can remove the module.

Good idea, but unfortunately it didn't work!
I managed to remove all thermal related modules and installed them
again. No success! Temp keeps rising until I kill the full cpu test
script!

Take a look at this thread on GitHub: https://github.com/intel/thermal_daemon/issues/268

In the comment https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
it's mentioned that thermald works after a suspend once a patched
version is used.

As I understand it, you can increase the debug level to get more info
about what thermald is doing; that might help while trying to
figure it out.

I'll try that. Not much hope, however.
The patch is included in the latest version.
With the version shipped in Kubuntu 20.04:
- I have tried --adaptive and --ignore-cpuid-check. It didn't
complain, but I could not determine whether they are both active.

One would expect the patch to have been backported to Kubuntu 20.04.
Anyway ... I'll try the latest version again, but this time with both
switches active, to see what happens.

And NO :-(
Not working, same symptoms.
For some reason, the BIOS and/or kernel does not stop the temperature from
rising after a suspend, and thermald seems to play no role in this.
Removing it does not change anything.

The log was not very enlightening for me. The only message that makes some
sense says it's too early to act, or something like that.
When I get some patience I'll give it another try.

Thanks anyway.
Paulo


From Carlos E.R.@21:1/5 to Paulo da Silva on Wed Oct 20 14:21:38 2021

XPost: alt.os.linux

On 19/10/2021 02.49, Paulo da Silva wrote:

At 13:26 on 18/10/21, Carlos E.R. wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

At 12:07 on 17/10/21, Carlos E.R. wrote:

I have used two machines with limited cooling; one is a mini computer
box, fanless (idea is to be put on sitting room by the TV). When it is
doing something intense, it overheats and it throttles the CPU down.
Another is a laptop I prepared for another person, with a relatively
fast processor that can overheat if you demand some job for minutes, and
then it throttles down.

Both seem to be designed for this; be running normally with a small
load, but sprint on demand if the user needs to run something. But they
can not keep up the load for a long time because they have no fan, or a
too small fan.

Now, I did not install any daemon or configure anything, it was the
kernel itself doing it all, out of the box.

Both have only Intel graphics.

The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
"Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
SSD/15.6"

In both cases I installed openSUSE Leap 15

That's the main point, Carlos. Why is my PC (kernel, BIOS,
whatever) unable to control the temperature after suspend/wake?
Besides, why does thermald also seem to do nothing to stop the temp rising?
At least the sensors are working - I can monitor them, and
lowering the CPU's freqs does lower the temps. Also the fans are
able to go to higher RPM. If I manually put them in boost mode, they are
able to stop the temp rising!
Immediately after (re)boot the system never goes above 97ºC!

Isengard:~ # ps afx | grep thermal
615 ? I< 0:00 \_ [acpi_thermal_pm]
23830 pts/23 S+ 0:00 \_ grep --color=auto thermal
Isengard:~ #

I'm not running thermald.

Yes! The BIOS and/or the kernel should be enough to avoid temperature
problems. thermald should at least be a last-resort protection.
None of them stops the temperature from rising after a suspend!
At least one of them does before any suspend has occurred: the
temperature never rises above 97ºC.

I have no personal experience with thermald, so I can't offer advice on it.

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development. In Debian-likes I just need to install
<lib name>-dev. One example was libgcrypt20.

What? All sources are available in openSUSE.

http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm

You just need to activate the sources repo in YaST. If some particular
package is missing the source, declare a bug.

If you just need the files to compile some other thing, you need the
libname-devel package instead.

http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

Yes, now they are. But they weren't when I needed them.
Maybe I'll give openSUSE another try.

In case a source package is missing, just declare a bug.

Yesterday I saw this zypper command:

       source-install (si) name...
           Install specified source packages and their build
           dependencies. If the name of a binary package is given, the
           corresponding source package is looked up and installed
           instead.

           This command will try to find the newest available versions
           of the source packages and uses rpm -i to install them,
           optionally together with all the packages that are required
           to build the source package. The default location where rpm
           installs source packages to is
           /usr/src/packages/{SPECS,SOURCES}, but the values can be
           changed in your local rpm configuration. In case of doubt try
           executing rpm --eval "%{_specdir} and %{_sourcedir}".

           Note that the source packages must be available in
           repositories you are using. You can check whether a
           repository contains any source packages using the following
           command:

               $ zypper search -t srcpackage -r alias|name|#|URI

--
Cheers, Carlos.


From Paulo da Silva@21:1/5 to All on Wed Oct 20 18:22:08 2021

XPost: alt.os.linux

At 13:21 on 20/10/21, Carlos E.R. wrote:

On 19/10/2021 02.49, Paulo da Silva wrote:

At 13:26 on 18/10/21, Carlos E.R. wrote:

On 17/10/2021 20.15, Paulo da Silva wrote:

At 12:07 on 17/10/21, Carlos E.R. wrote:

...

About openSUSE ... that was the best and most stable distro I have ever
used. I dropped it because of problems installing certain types of SW:
lack of information or packages, and the unavailability of some library
sources for development. In Debian-likes I just need to install
<lib name>-dev. One example was libgcrypt20.

What? All sources are available in openSUSE.

http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm

You just need to activate the sources repo in YaST. If some particular
package is missing the source, declare a bug.

If you just need the files to compile some other thing, you need the
libname-devel package instead.

http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

Yes, now they are. But they weren't when I needed them.
Maybe I'll give openSUSE another try.

In case a source package is missing, just declare a bug.

Yesterday I saw this zypper command:

       source-install (si) name...
           Install specified source packages and their build
           dependencies. If the name of a binary package is given, the
           corresponding source package is looked up and installed
           instead.

           This command will try to find the newest available versions
           of the source packages and uses rpm -i to install them,
           optionally together with all the packages that are required
           to build the source package. The default location where rpm
           installs source packages to is
           /usr/src/packages/{SPECS,SOURCES}, but the values can be
           changed in your local rpm configuration. In case of doubt try
           executing rpm --eval "%{_specdir} and %{_sourcedir}".

           Note that the source packages must be available in
           repositories you are using. You can check whether a
           repository contains any source packages using the following
           command:

               $ zypper search -t srcpackage -r alias|name|#|URI

OK, let's say I want to give opensuse a try.

Let's say I install it and it still cannot handle my temperature
problem. I need to check this before I go on to install and configure all
the SW I use, which takes a couple of weeks.
How do I delete it?

I know I did it in the past, but just to be sure ... is it:

1. boot into my current system.
2. do grub-install or grub-install /dev/nvme0n1 (disk)?
3. efibootmgr -B -b <bootnum>?
4. Do I need further cleanup in /boot/efi?

Is this enough?

Thank you.


From Carlos E.R.@21:1/5 to Paulo da Silva on Thu Oct 21 00:19:53 2021

XPost: alt.os.linux

On 20/10/2021 19.22, Paulo da Silva wrote:

At 13:21 on 20/10/21, Carlos E.R. wrote:

OK, let's say I want to give opensuse a try.

Let's say I install it and it still cannot handle my temperature
problem. I need to check this before I go on to install and configure all
the SW I use, which takes a couple of weeks.
How do I delete it?

I know I did it in the past, but just to be sure ... is it:

1. boot into my current system.
2. do grub-install or grub-install /dev/nvme0n1 (disk)?

I don't think you need that one.

3. efibootmgr -B -b <bootnum>?

Yes.

4. Do I need further cleans in /boot/efi?

You can erase the directory /boot/efi/EFI/opensuse, and of course the
root partition.
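
As a rough illustration of step 3: the boot number to pass to "efibootmgr -B -b" can be read from the plain "efibootmgr" listing. The sketch below only parses text. The sample listing and the helper name find_boot_numbers are made up for illustration, and the label your firmware shows for openSUSE may well differ.

```python
import re
from typing import List

# Matches entry lines such as "Boot0003* opensuse-secureboot" in efibootmgr
# output; header lines like "BootCurrent:" or "BootOrder:" do not match.
BOOT_LINE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def find_boot_numbers(listing: str, label: str) -> List[str]:
    """Return the 4-digit hex boot numbers whose description contains label."""
    numbers = []
    for line in listing.splitlines():
        match = BOOT_LINE.match(line.strip())
        if match and label.lower() in match.group(3).lower():
            numbers.append(match.group(1))
    return numbers


# Hypothetical listing, NOT taken from the thread.
SAMPLE = """\
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0003,0000
Boot0000* Windows Boot Manager
Boot0001* ubuntu
Boot0003* opensuse-secureboot
"""
```

With that sample, find_boot_numbers(SAMPLE, "opensuse") gives the entry 0003, and the deletion itself would then be the command already mentioned, "efibootmgr -B -b 0003", run as root.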

Maybe you could try one of the live versions, put it under load, and see
what happens with the temps and the fans. It is not fully reliable, but
it is faster.

--
Cheers, Carlos.

From Ordinary Poster@21:1/5 to Paulo da Silva on Thu Oct 21 00:04:44 2021

XPost: alt.os.linux

On 20/10/2021 18:22, Paulo da Silva wrote:

OK, let's say I want to give opensuse a try.

People just use a live Flash drive to try things. They don't install
anything.

From Paulo da Silva@21:1/5 to All on Thu Oct 21 02:25:58 2021

XPost: alt.os.linux

On 21/10/21 at 02:19, Paulo da Silva wrote:

On 20/10/21 at 23:19, Carlos E.R. wrote:

On 20/10/2021 19.22, Paulo da Silva wrote:

On 20/10/21 at 13:21, Carlos E.R. wrote:

...

Maybe you could try one of the live versions, put it under load, and see
what happens with the temps and the fans. It is not fully reliable, but
it is faster.

Is there a simple way to prepare a pen with r/w permissions from the
iso? I remember to use unetbootin, or something like that, to do it, but
it stopped working at a given point. Since then I have been using dd,
but this makes the pen readonly.
I would like to update the system, make some trivial confs, and install
some sw and it would be nice to make them permanent.
Don't take time with this if you don't know. In the meanwhile I'll
search the net and test on a VM.

Just one more question I forgot ...
Is it the same to install from the live image or is it better to
download the installer image? I'm asking because I never found a distro
with both images.

Thank you.

From Paulo da Silva@21:1/5 to All on Thu Oct 21 02:19:20 2021

XPost: alt.os.linux

On 20/10/21 at 23:19, Carlos E.R. wrote:

On 20/10/2021 19.22, Paulo da Silva wrote:

On 20/10/21 at 13:21, Carlos E.R. wrote:

OK, let's say I want to give opensuse a try.

Let's say I install it and it still cannot handle my temperature
problem. I need to check this before I go into install and configure all
SW I use. This takes a couple of weeks.
How to delete it?

I know I did it in the past, but just to be sure ... is it:

1. boot into my actual system.
2. do grub-install or grub-install /dev/nvme0n1 (disk)?

I don't think you need that one.

Are you sure? What if I remove that partition's contents? Doesn't grub need
it? I am asking because I always believed (without any real basis) that there
is always one main system in charge of booting.

3. efibootmgr -B -b <bootnum>?

Yes.

4. Do I need further cleans in /boot/efi?

You can erase the directory /boot/efi/EFI/opensuse, and of course the
root partition.

Maybe you could try one of the live versions, put it under load, and see
what happens with the temps and the fans. It is not fully reliable, but
it is faster.

Is there a simple way to prepare a pen drive with read/write
persistence from the ISO? I remember using unetbootin, or something
like it, but at some point it stopped working. Since then I have been
using dd, but that leaves the pen drive read-only.
I would like to update the system, make some trivial configuration
changes and install some software, and it would be nice to make all of
that permanent.
Don't spend time on this if you don't know. In the meantime I'll
search the net and test in a VM.

Thanks
Paulo

From David W. Hodgins@21:1/5 to All on Wed Oct 20 21:58:24 2021

XPost: alt.os.linux

On Wed, 20 Oct 2021 21:25:58 -0400, Paulo da Silva

Is there a simple way to prepare a pen with r/w permissions from the
iso? I remember to use unetbootin, or something like that, to do it, but
it stopped working at a given point. Since then I have been using dd,
but this makes the pen readonly.
I would like to update the system, make some trivial confs, and install
some sw and it would be nice to make them permanent.
Don't take time with this if you don't know. In the meanwhile I'll
search the net and test on a VM.

For Mageia, the isodumper program/package from the Mageia repos. When writing an
image to a usb stick, with the option to add a persistent partition selected, it
uses dd to write the image, then adds an ext4 partition to the remaining space with
the label mgalive-persist. The Mageia live iso images look for the partition, and if
found mounts it as an overlayfs so all changes made, including installing additional
packages, are stored for later use.

Just one more question I forgot ...
Is it the same to install from the live image or is it better to
download the installer image? I'm asking because I never found a distro
with both images.

When installing from a live iso, the contents of the iso (all files seen when
it's booted, not the iso file itself) are copied to the selected/mounted file
systems. If installing while running in live mode, and selecting the install
from the running live system, the changes made in live mode, including those
stored in the mgalive-persist file system, are included.

I expect other distros that support persistence use similar packages and methods.

Regards, Dave Hodgins

--
Change dwhodgins@nomail.afraid.org to davidwhodgins@teksavvy.com for
email replies.

From Paulo da Silva@21:1/5 to All on Thu Oct 21 03:47:33 2021

XPost: alt.os.linux

On 21/10/21 at 02:58, David W. Hodgins wrote:

On Wed, 20 Oct 2021 21:25:58 -0400, Paulo da Silva

Is there a simple way to prepare a pen with r/w permissions from the
iso? I remember to use unetbootin, or something like that, to do it, but
it stopped working at a given point. Since then I have been using dd,
but this makes the pen readonly.
I would like to update the system, make some trivial confs, and install
some sw and it would be nice to make them permanent.
Don't take time with this if you don't know. In the meanwhile I'll
search the net and test on a VM.

For Mageia, the isodumper program/package from the Mageia repos. When writing an
image to a usb stick, with the option to add a persistent partition selected, it
uses dd to write the image, then adds an ext4 partition to the remaining space with
the label mgalive-persist. The Mageia live iso images look for the partition, and if
found mounts it as an overlayfs so all changes made, including installing additional
packages, are stored for later use.

This is good. I don't know if openSUSE does the same. Most likely not.

Just one more question I forgot ...
Is it the same to install from the live image or is it better to
download the installer image? I'm asking because I never found a distro
with both images.

When installing from a live iso, the contents of the iso (all files seen when
it's booted, not the iso file itself) are copied to the selected/mounted file
systems. If installing while running in live mode, and selecting the install
from the running live system, the changes made in live mode, including those
stored in the mgalive-persist file system, are included.

I expect other distros that support persistence use similar packages and methods.

At least my Wi-Fi network configuration carries over to the newly
installed system. I'm not sure about the other stuff.

Thanks.
Paulo

From Henry Crun@21:1/5 to Paulo da Silva on Thu Oct 21 07:46:02 2021

On 21/10/2021 4:19, Paulo da Silva wrote:

On 20/10/21 at 23:19, Carlos E.R. wrote:

On 20/10/2021 19.22, Paulo da Silva wrote:

On 20/10/21 at 13:21, Carlos E.R. wrote:

OK, let's say I want to give opensuse a try.

Let's say I install it and it still cannot handle my temperature
problem. I need to check this before I go into install and configure all
SW I use. This takes a couple of weeks.
How to delete it?

I know I did it in the past, but just to be sure ... is it:

1. boot into my actual system.
2. do grub-install or grub-install /dev/nvme0n1 (disk)?

I don't think you need that one.

Are you sure? What if I remove that partition content? Doesn't grub need
it? I am asking because I always believed (without fundament) that there
is always a main system for boot.

3. efibootmgr -B -b <bootnum>?

Yes.

4. Do I need further cleans in /boot/efi?

You can erase the directory /boot/efi/EFI/opensuse, and of course the
root partition.

Maybe you could try one of the live versions, put it under load, and see
what happens with the temps and the fans. It is not fully reliable, but
it is faster.

Is there a simple way to prepare a pen with r/w permissions from the
iso? I remember to use unetbootin, or something like that, to do it, but
it stopped working at a given point. Since then I have been using dd,
but this makes the pen readonly.
I would like to update the system, make some trivial confs, and install
some sw and it would be nice to make them permanent.
Don't take time with this if you don't know. In the meanwhile I'll
search the net and test on a VM.

Thanks
Paulo

What distro are you trying?
If you install and use mkusb, there is an option to create a persistent (i.e. read/write) bootable USB pen drive,
but it is limited to Debian or Ubuntu. (This is, after all, an Ubuntu newsgroup.)
See https://help.ubuntu.com/community/mkusb

--
Mike R.
Home: http://alpha.mike-r.com/
QOTD: http://alpha.mike-r.com/qotd.php
No Micro$oft products were used in the URLs above, or in preparing this message.
Recommended reading: http://www.catb.org/~esr/faqs/smart-questions.html#before
and: http://alpha.mike-r.com/jargon/T/top-post.html
Missile address: N31.7624/E34.9691

From Paul@21:1/5 to Ordinary Poster on Thu Oct 21 00:56:50 2021

XPost: alt.os.linux

On 10/20/2021 7:04 PM, Ordinary Poster wrote:

On 20/10/2021 18:22, Paulo da Silva wrote:

OK, let's say I want to give opensuse a try.

People just use a live Flash drive to try things. They don't install anything.

Downloaded the 900MB "LiveDVD" one.

https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

Shows an install icon, but it will probably
be doing some sort of network install, with
some delays while it gets stuff from the network.

Whereas the 4GB version will at least have a few
files onboard.

For a one-off install, the 900MB might be the answer.
If you think you'll be installing more than once,
then it might be more important to get a larger
piece of media.

This is what I see in a VM, when clicking the Install
icon in the 900MB one.

[Picture]

https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

Paul

From Carlos E. R.@21:1/5 to Paul on Thu Oct 21 15:10:20 2021

XPost: alt.os.linux

On 21/10/2021 06.56, Paul wrote:

On 10/20/2021 7:04 PM, Ordinary Poster wrote:

On 20/10/2021 18:22, Paulo da Silva wrote:

OK, let's say I want to give opensuse a try.

People just use a live Flash drive to try things. They don't install
anything.

Downloaded the 900MB "LiveDVD" one.

https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

That one is intended to be run as is, from the USB stick, without
installation, although installation is possible. There should be a KDE
version, a GNOME one, an XFCE one, and another dedicated to rescue
work (the latter two might be the same one).

All of them are intended to be copied with dd from the image to the USB
device (say, /dev/sdb), which destroys all existing partitions and
creates new ones. On the first run they create a read/write partition
where you can save files. It is possible to add some packages with
zypper (not the kernel, though).

Don't try to "make them bootable"; that would destroy them. Just copy
the image to the stick, unmodified, with dd or a dedicated program (as
described in the openSUSE wiki).
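
Copying the image "unmodified" means a byte-for-byte copy onto the whole device, which is what dd does. A minimal Python sketch of the same idea follows. write_image is a hypothetical helper, not an openSUSE tool, and the target path is an assumption you must check yourself: writing to the wrong device destroys its contents.

```python
import os


def write_image(image_path: str, target_path: str,
                chunk_size: int = 4 * 1024 * 1024) -> int:
    """Copy image_path onto target_path byte for byte; return bytes written."""
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of image reached
            dst.write(chunk)
            written += len(chunk)
        dst.flush()
        # Force the data out to the device before returning, so the stick
        # is not unplugged with writes still pending in the page cache.
        os.fsync(dst.fileno())
    return written
```

Called as write_image("some-live.iso", "/dev/sdX") as root, with /dev/sdX replaced by the real stick (the whole device, not a partition), this does roughly what the dd copy does.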

Then there are two other images, one of about 4GB (the DVD) and a
small one for network install. Those are pure installation images and
cannot be "run". That is, of course they boot and run, but what you get
serves only the purpose of installation.

Shows an install icon, but it will probably
be doing some sort of network install, with
some delays while it gets stuff from the network.

Whereas the 4GB version will at least have a few
files onboard.

For a one-off install, the 900MB might be the answer.
If you think you'll be installing more than once,
then it might be more important to get a larger
piece of media.

This is what I see in a VM, when clicking the Install
icon in the 900MB one.

[Picture]

https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

If that's the "Tumbleweed-KDE-Live" you can just cancel the install and
use the system as is, no installation.

--
Cheers,
Carlos E.R.

From Carlos E. R.@21:1/5 to Paulo da Silva on Thu Oct 21 15:24:57 2021

XPost: alt.os.linux

On 21/10/2021 03.19, Paulo da Silva wrote:

On 20/10/21 at 23:19, Carlos E.R. wrote:

On 20/10/2021 19.22, Paulo da Silva wrote:

On 20/10/21 at 13:21, Carlos E.R. wrote:

OK, let's say I want to give opensuse a try.

Let's say I install it and it still cannot handle my temperature
problem. I need to check this before I go into install and configure all
SW I use. This takes a couple of weeks.
How to delete it?

I know I did it in the past, but just to be sure ... is it:

1. boot into my actual system.
2. do grub-install or grub-install /dev/nvme0n1 (disk)?

I don't think you need that one.

Are you sure? What if I remove that partition content? Doesn't grub need
it? I am asking because I always believed (without fundament) that there
is always a main system for boot.

Not if you are using UEFI.

Of course, I'm never completely sure, especially if I did not set up
the system myself ;-)

It is the code in the /boot/efi/EFI/opensuse directory that calls the
grub code or maybe a kernel loader.

And that code is called by the UEFI firmware through a boot entry,
which you remove with "efibootmgr -B -b <bootnum>". Everything else is
not strictly required.

Maybe you could try one of the live versions, put it under load, and see
what happens with the temps and the fans. It is not fully reliable, but
it is faster.

Is there a simple way to prepare a pen with r/w permissions from the
iso? I remember to use unetbootin, or something like that, to do it, but
it stopped working at a given point. Since then I have been using dd,
but this makes the pen readonly.

Just dd, if the ISO was prepared for it. I know the one named "rescue"
is, it is the one I use.

I would like to update the system, make some trivial confs, and install
some sw and it would be nice to make them permanent.
Don't take time with this if you don't know. In the meanwhile I'll
search the net and test on a VM.

The "rescue" ISO should be perfect for testing how the system behaves
when overheating. Just tell it to clone a hard disk partition to a
compressed file with parallelization; that should load the CPU fast.
No need to install your code and things.

I can find the script I use for this later today, different computer.

--
Cheers,
Carlos E.R.

From Paul@21:1/5 to Carlos E. R. on Thu Oct 21 10:43:37 2021

XPost: alt.os.linux

On 10/21/2021 9:10 AM, Carlos E. R. wrote:

On 21/10/2021 06.56, Paul wrote:

This is what I see in a VM, when clicking the Install
icon in the 900MB one.

[Picture]

https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

If that's the "Tumbleweed-KDE-Live" you can just cancel the install and use the system as is, no installation.

When a person wants to run a specific graphics driver,
an install comes in handy for that case. Even a USB stick
with persistence would do, but persistence easily exhausts
the 4GB formulation, and it helps to have a larger
casper-rw than that. I think Rufus can do that (rufus.ie).

You might need a specific graphics driver, to get a machine
hot enough to tip over.

Paul

From Rockinghorse Winner@21:1/5 to Carlos E. R. on Thu Oct 21 18:41:10 2021

XPost: alt.os.linux

On 2021-10-21, Carlos E. R. <robin_listas@es.invalid> wrote:

On 21/10/2021 06.56, Paul wrote:

On 10/20/2021 7:04 PM, Ordinary Poster wrote:

On 20/10/2021 18:22, Paulo da Silva wrote:

OK, let's say I want to give opensuse a try.

People just use a live Flash drive to try things. They don't install
anything.

Downloaded the 900MB "LiveDVD" one.

https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

That one is intended to be run as is, on the USB stick, without
installation, although installation is possible. There should be a KDE
version, another GNOME, another XFCE, and another dedicated to rescue
work (the latter two might be the same one).

All of them are intended to copy with dd from the image to the USB
device (say, /dev/sdb), destroying all the partitions (creates new
ones). On the first run they create a read/write partition where you can
save files. It is possible to add some packages with zypper (not the
kernel, though).

Don't try to "make them bootable", that would destroy them. Just copy to
the stick, unmodified, with dd or dedicated programs (as described in
the openSUSE wiki).

Then there are two other images, one of about 4GB (the DVD) and another
small one for network install. Those are the pure installation images,
can not be "run". That is, of course they boot and run but what you get
has only the purpose of installation.

Shows an install icon, but it will probably
be doing some sort of network install, with
some delays while it gets stuff from the network.

Whereas the 4GB version will at least have a few
files onboard.

For a one-off install, the 900MB might be the answer.
If you think you'll be installing more than once,
then it might be more important to get a larger
piece of media.

This is what I see in a VM, when clicking the Install
icon in the 900MB one.

[Picture]

https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

If that's the "Tumbleweed-KDE-Live" you can just cancel the install and
use the system as is, no installation.

Install on an external SSD drive and get a more realistic experience. If it's a no go, you just rinse and repeat with another distro.

From Paul@21:1/5 to Paul on Thu Oct 21 14:42:42 2021

XPost: alt.os.linux

On 10/21/2021 10:43 AM, Paul wrote:

On 10/21/2021 9:10 AM, Carlos E. R. wrote:

On 21/10/2021 06.56, Paul wrote:

This is what I see in a VM, when clicking the Install
icon in the 900MB one.

    [Picture]

    https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

If that's the "Tumbleweed-KDE-Live" you can just cancel the install and use the system as is, no installation.

When a person wants to run a specific graphics driver,
an install comes in handy for that case. Even a USB stick
with persistence would do, but persistence easily exhausts
the 4GB formulation, and it helps to have a larger
casper-rw than that. I think Rufus can do that (rufus.ie).

You might need a specific graphics driver, to get a machine
hot enough to tip over.

   Paul

For the OP, that distro is using a UEFI-only install,
so it expects GPT partitioning and UEFI boot in the BIOS.
That means it cannot share with a MSDOS partitioned disk
and legacy boot setup.

I had to back up my disk drive (MSDOS partitioned), clean
it off, then allow SUSE to use the whole thing for GPT, to
allow the install to quickly get under way. It says the
install will take 40 minutes. Afterwards, I will restore
from backup, to put the disk back in original condition.

If it supported MSDOS partitioning and legacy (CSM) boot,
I probably would have been able to come up with an install
plan so it would fit alongside UbuntuStudio.

I figured something was up, when I wasn't seeing the word
"hybrid" when scanning the ISO with "disktype" utility.

Paul

From Paulo da Silva@21:1/5 to All on Sat Oct 30 00:36:44 2021

XPost: alt.os.linux

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

The current situation:

1. The sporadic "fan jets" come from the normal fan, not the GPU one.
2. The origin of the "freezes" is still unknown. I am now almost sure
that it does not come from the BIOS. In fact, during one freeze there
was a run of several continuous "fan jets".
3. A couple of "fan jets" also occurred once while in the grub menu!
4. The uncontrollable rise of the temperature in
/sys/devices/virtual/thermal/thermal_zone0/temp at full CPU load after
suspend/wake also occurs with the openSUSE Leap 15.3 live image. Before
suspending, that temperature stays stable at 97ºC.
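
For anyone wanting to log the same reading: the value in that sysfs file is reported in millidegrees Celsius, so it has to be divided by 1000 before comparing it with a figure such as 97ºC. A small sketch, assuming the same thermal_zone0 path as in point 4 (the helper names are only illustrative):

```python
from pathlib import Path

# Path taken from point 4 above; other machines may use a different zone.
ZONE = Path("/sys/devices/virtual/thermal/thermal_zone0")


def millidegrees_to_celsius(raw: str) -> float:
    """Convert the raw sysfs string (millidegrees C) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_zone(zone: Path = ZONE):
    """Return (zone type, temperature in degrees C) for one thermal zone."""
    zone_type = (zone / "type").read_text().strip()
    temp_c = millidegrees_to_celsius((zone / "temp").read_text())
    return zone_type, temp_c
```

Calling read_zone() once a second before and after a suspend/wake cycle, and printing the result with a timestamp, would give a simple record of when the temperature starts to run away.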

I tried several of the kernels available in Kubuntu, including an Intel
version 5.13, but I was unable to get them to boot in graphical mode
with nvidia 470. When I have some more time, I'll try without the
NVIDIA drivers.

Thanks for your attention.
Paulo

From Bobbie Sellers@21:1/5 to Paulo da Silva on Fri Oct 29 17:33:59 2021

On 10/29/21 16:36, Paulo da Silva wrote:

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

The current situation:

1. The sporadic "fan jets" are from the normal fan. Not the GPU one.
2. The "freezes" origin still unknown. Now I am almost sure that it does
not come from the BIOS. In fact, during a freeze, there was one
occurrence of several continuous "fan jets".
3. A couple of "fan jets" also occurred once while in the grub menu!
4. The uncontrollable rising of temperature of /sys/devices/virtual/thermal/thermal_zone0/temp at full cpu after suspend/wake also occurs with opensuse leap 15.3 live. Before
suspending, that temperature is kept stable at 97ºC.

I tried to use several kernels available in kubuntu, including an intel version 5.13, but I was unable to get them boot in graphic mode - nvidia
470. Some more ... time and I'll try it without Nvidia drivers.

Thanks for your attention.
Paulo

Have you opened the case and used compressed air to get the dust out?

How long has the CPU been in place under the heat sink? The grease or thermal paste used can dry out and lose heat conductivity.

Good luck with your machine, Paulo.

bliss - if Linux was truely elitist I could not afford the entry fee.
--

bliss dash SF 4 ever at dslextreme dot com

From Paulo da Silva@21:1/5 to All on Mon Nov 1 20:13:53 2021

On 30/10/21 at 01:33, Bobbie Sellers wrote:

On 10/29/21 16:36, Paulo da Silva wrote:

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

The current situation:

1. The sporadic "fan jets" are from the normal fan. Not the GPU one.
2. The "freezes" origin still unknown. Now I am almost sure that it does
not come from the BIOS. In fact, during a freeze, there was one
occurrence of several continuous "fan jets".
3. A couple of "fan jets" also occurred once while in the grub menu!
4. The uncontrollable rising of temperature of
/sys/devices/virtual/thermal/thermal_zone0/temp at full cpu after
suspend/wake also occurs with opensuse leap 15.3 live. Before
suspending, that temperature is kept stable at 97ºC.

I tried to use several kernels available in kubuntu, including an intel
version 5.13, but I was unable to get them boot in graphic mode - nvidia
470. Some more ... time and I'll try it without Nvidia drivers.

Thanks for your attention.
Paulo

Have you opened the case and used compressed air to get the dust out?

How long has the CPU been in place under the heat sink. The grease or thermal paste used can dry out and lose heat conductivity.

Of course it is very likely that there is some problem with the
sensors or the cooling system. But what I do not understand is why the
temperature is kept under control, by the kernel perhaps, before the
first suspend but not after waking from suspend!
I have tried openSUSE and Clear Linux. Both have the same problem.
I have written a small script that successfully controls the
temperature just by changing the CPU frequencies. I don't know how to
act on the other cooling systems. thermald, which was supposed to do
this, fails miserably.
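
On the question of the other cooling systems: the kernel lists whatever cooling devices it knows about under /sys/class/thermal/cooling_device*, each with a type, cur_state and max_state file. A hedged sketch to see what a given laptop exposes follows. The helper name is made up, and whether the fan shows up there at all depends on the firmware.

```python
from pathlib import Path


def list_cooling_devices(base: Path = Path("/sys/class/thermal")):
    """Return one dict per readable cooling device found under base."""
    devices = []
    for dev in sorted(base.glob("cooling_device*")):
        try:
            devices.append({
                "name": dev.name,
                "type": (dev / "type").read_text().strip(),
                "cur_state": int((dev / "cur_state").read_text()),
                "max_state": int((dev / "max_state").read_text()),
            })
        except (OSError, ValueError):
            continue  # skip entries that are missing files or unreadable
    return devices
```

Printing the result shows which devices (for example "Processor" or a fan entry) exist and how far each is from its max_state. Writing a number to cur_state as root changes the state, but that is best tried with care and only after reading the kernel documentation.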

Regards.
Paulo

From Paul@21:1/5 to Paulo da Silva on Mon Nov 1 22:21:00 2021

On 11/1/2021 4:13 PM, Paulo da Silva wrote:

On 30/10/21 at 01:33, Bobbie Sellers wrote:

On 10/29/21 16:36, Paulo da Silva wrote:

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

The current situation:

1. The sporadic "fan jets" are from the normal fan. Not the GPU one.
2. The "freezes" origin still unknown. Now I am almost sure that it does
not come from the BIOS. In fact, during a freeze, there was one
occurrence of several continuous "fan jets".
3. A couple of "fan jets" also occurred once while in the grub menu!
4. The uncontrollable rising of temperature of
/sys/devices/virtual/thermal/thermal_zone0/temp at full cpu after
suspend/wake also occurs with opensuse leap 15.3 live. Before
suspending, that temperature is kept stable at 97ºC.

I tried to use several kernels available in kubuntu, including an intel
version 5.13, but I was unable to get them boot in graphic mode - nvidia
470. Some more ... time and I'll try it without Nvidia drivers.

Thanks for your attention.
Paulo

Have you opened the case and used compressed air to get the dust out?

How long has the CPU been in place under the heat sink. The grease or
thermal paste used can dry out and lose heat conductivity.

Of course it is very likely there are some problems with the
sensors/cooling system. But what I do not understand is why the
temperature gets controlled, by the kernel perhaps, before first
suspension and not after waking from suspension!
I have tried Opensuse and Clear linux. All have the same problem.
I have written a small script that successfully controls the temperature
just changing the CPU's freqs. I don't know how to act on the other
cooling systems. thermald, which was supposed to do this, fails miserably.

Regards.
Paulo

Find some docs first.

https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon

It is possible the ThermalD package doesn't have sufficient XML files
to control every possible HW config. Maybe some platforms will require
hand programming.

Paul

From Paulo da Silva@21:1/5 to All on Tue Dec 7 22:20:14 2021

XPost: alt.os.linux

On 30/10/21 at 00:36, Paulo da Silva wrote:

On 27/09/21 at 18:22, Paulo da Silva wrote:

Hi all!

From time to time - may be a month or a couple of hours - my computer
completely freezes. Everything stops. The screen shows the last image.
Not even the cursor moves. No keyboard key works including the
Alt-PrtScreen keys, like REISUB.

I need to press the power on/off button for 5 secs to restart it.

After restart the journalctl -b -b1 shows nothing at the freeze time.

I changed my NVIDIA driver to 470. I also tried to put the driver in
ondemand status. No success. Sooner or later it freezes.

Is there a way to get some information on what this is happening?

I am using kubuntu 20.04.

Thank you.

The current situation:

1. The sporadic "fan jets" are from the normal fan. Not the GPU one.
2. The "freezes" origin still unknown. Now I am almost sure that it does
not come from the BIOS. In fact, during a freeze, there was one
occurrence of several continuous "fan jets".
3. A couple of "fan jets" also occurred once while in the grub menu!
4. The uncontrollable rising of temperature of /sys/devices/virtual/thermal/thermal_zone0/temp at full cpu after suspend/wake also occurs with opensuse leap 15.3 live. Before
suspending, that temperature is kept stable at 97ºC.

I tried to use several kernels available in kubuntu, including an intel version 5.13, but I was unable to get them boot in graphic mode - nvidia
470. Some more ... time and I'll try it without Nvidia drivers.

1. The freezes disappeared completely after changing the kernel to the
Ubuntu HWE one - currently 5.11.
2. The "fan jets" also went away, but not at the time of the kernel
change. Maybe something changed in some Windows/PC control software
during a Windows update, or the EC changed state after I kept the PC
disconnected from power with the battery fully discharged for more than
5 hours just to reset it.
3. The uncontrolled rise in temperature in the acpitz zone after waking
from suspend at "full CPU" still remains. I'm controlling it with a
Python script that changes the upper frequency limits of the CPU cores.
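
For anyone wanting to try the same kind of workaround, this is roughly the shape such a script could take. It is not the script used here, only a sketch under assumptions: each core has a cpufreq directory with a scaling_max_freq file (values in kHz), the script runs as root, and the 90/80 degree thresholds and the 200 MHz step are arbitrary choices.

```python
from pathlib import Path

CPU_BASE = Path("/sys/devices/system/cpu")


def next_cap(temp_c, cap_khz, min_khz, max_khz,
             high_c=90.0, low_c=80.0, step_khz=200_000):
    """Lower the cap when hot, raise it when cool, otherwise keep it."""
    if temp_c >= high_c:
        return max(min_khz, cap_khz - step_khz)
    if temp_c <= low_c:
        return min(max_khz, cap_khz + step_khz)
    return cap_khz  # inside the hysteresis band: leave the cap alone


def apply_cap(cap_khz, base: Path = CPU_BASE):
    """Write the cap to every core's scaling_max_freq (needs root)."""
    for policy in sorted(base.glob("cpu[0-9]*/cpufreq")):
        (policy / "scaling_max_freq").write_text(str(cap_khz))
```

A loop would read the zone temperature every few seconds, feed it to next_cap together with the limits from cpuinfo_min_freq and cpuinfo_max_freq, and call apply_cap whenever the cap changes. The gap between the two thresholds keeps the cap from bouncing up and down on every reading.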

Thanks to all interested in this problem.
Paulo
