• System freezes: How to get the reason?

    From Paulo da Silva@21:1/5 to All on Mon Sep 27 18:22:21 2021
    XPost: alt.os.linux

    Hi all!

    From time to time (sometimes after a month, sometimes after a couple of
    hours) my computer freezes completely. Everything stops. The screen shows
    the last image. Not even the cursor moves. No keyboard key works, including
    the Alt+SysRq (PrtScr) combinations such as REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After a restart, journalctl -b -1 (the previous boot) shows nothing at the
    time of the freeze.

    I changed my NVIDIA driver to version 470. I also tried switching the
    driver to on-demand mode. No success: sooner or later it freezes.

    Is there a way to get some information about why this is happening?

    I am using Kubuntu 20.04.

    Thank you.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Mon Sep 27 19:38:14 2021
    XPost: alt.os.linux

    On Mon, 27 Sep 2021 18:35:37 +0100,
    Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

    On 27/09/21 at 18:27, Marco Moock wrote:
    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in
    another computer?
    I can't use the nouveau driver because I need the computer (a laptop)
    for AI deep learning with tensorflow GPU.


    You could test whether it also happens with nouveau, for example in the
    live system, which doesn't have nvidia-470 installed.

  • From Paul@21:1/5 to Paulo da Silva on Mon Sep 27 13:59:07 2021
    XPost: alt.os.linux

    Paulo da Silva wrote:
    On 27/09/21 at 18:27, Marco Moock wrote:
    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in another computer?

    I can't use the nouveau driver because I need the computer (a laptop)
    for AI deep learning with tensorflow GPU.

    Can you correlate the failure with any particular
    activity on the machine?

    For example, a more mundane activity on a computer,
    is the usage of modern Firefox. While the user is
    not viewing a web page, Firefox seems to leak memory
    until all available memory in Ring 3 is used up.

    But Linux has Out of Memory (OOM) killer, for the
    handling of memory exhaustion that way. The system
    should not freeze because Firefox happens to be
    running.

    Whereas, I don't know what happens, if a GPU that
    uses shared memory, happens to request more and
    more RAM for some GPU activity. An NVidia GPU is
    more likely to have its own memory chips, and be
    less likely to cause resource exhaustion on its own.

    Try running "nvidia-smi" in a terminal window,
    selecting the option to have it update the
    screen repetitively (like "top" in a sense), and
    watch resource consumption listed there. If you're
    running the NVidia driver, that program should be
    installed for you.

    You could run "top" in one terminal window (using
    the information near the top of top, for resource info).
    And run "nvidia-smi" in a second window, to watch
    for dwindling NVidia resources.
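
    A rough sketch of that second window: nvidia-smi can repeat its display
    on its own with -l SECONDS (for example "nvidia-smi -l 5"), and its
    --query-gpu mode prints one CSV line per sample, which is easier to append
    to a log that can be read after a freeze. The function name gpu_snapshot,
    the NVIDIA_SMI override variable and the log path are illustrative
    assumptions, not something from this thread.

```shell
#!/bin/sh
# Sketch: print one CSV line of GPU statistics (timestamp, memory, temperature, load).
# gpu_snapshot and NVIDIA_SMI are hypothetical names chosen for this example.
gpu_snapshot() {
    smi="${NVIDIA_SMI:-nvidia-smi}"
    # Bail out cleanly when the NVIDIA tools are not installed.
    if ! command -v "$smi" >/dev/null 2>&1; then
        echo "nvidia-smi not found" >&2
        return 1
    fi
    "$smi" --query-gpu=timestamp,memory.used,memory.total,temperature.gpu,utilization.gpu \
           --format=csv,noheader
}

# Example use (commented out so the file can be sourced safely):
#   while gpu_snapshot >> "$HOME/gpu.log"; do sync; sleep 5; done
```

    The sync after each sample is there so that the last lines have a better
    chance of reaching the disk before a hard freeze; it is not a guarantee.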

    Paul

  • From Paulo da Silva@21:1/5 to All on Mon Sep 27 19:02:28 2021
    XPost: alt.os.linux

    On 27/09/21 at 18:38, Marco Moock wrote:
    On Mon, 27 Sep 2021 18:35:37 +0100,
    Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

    On 27/09/21 at 18:27, Marco Moock wrote:
    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in
    another computer?
    I can't use the nouveau driver because I need the computer (a laptop)
    for AI deep learning with tensorflow GPU.


    You could test whether it also happens with nouveau, for example in the
    live system, which doesn't have nvidia-470 installed.


    I only switched to 470 after the problem caused me a small loss of data,
    hoping it would be a solution, but it froze again. Until then I had been
    able to accept a failure once in a while. Most of the time it works without
    any problem, sometimes for a month or more. That is also why testing with
    nouveau is not practical.

    I wonder if there is some kind of script or configuration that forces
    the logs not to be buffered. I'll search the internet ...

    Thanks.

  • From Marco Moock@21:1/5 to All on Mon Sep 27 19:27:24 2021
    XPost: alt.os.linux

    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in another computer?

  • From Paulo da Silva@21:1/5 to All on Mon Sep 27 18:35:37 2021
    XPost: alt.os.linux

    On 27/09/21 at 18:27, Marco Moock wrote:
    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in another computer?

    I can't use the nouveau driver because I need the computer (a laptop)
    for AI deep learning with tensorflow GPU.

  • From Java Jive@21:1/5 to Paulo da Silva on Mon Sep 27 19:36:08 2021
    XPost: alt.os.linux

    On 27/09/2021 18:22, Paulo da Silva wrote:

    From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    Sounds as though it might be hardware. At least that could be something
    to eliminate. Maybe run a memcheck, and an fsck of the entire disk surface?

    --

    Fake news kills!

    I may be contacted via the contact address given on my website:
    www.macfh.co.uk

  • From Carlos E. R.@21:1/5 to Paulo da Silva on Mon Sep 27 20:34:04 2021
    XPost: alt.os.linux

    On 27/09/2021 20.02, Paulo da Silva wrote:
    On 27/09/21 at 18:38, Marco Moock wrote:
    ...

    I wander if there is some kind of script or configuration that forces
    the logs not to be buffered. I'll search in the internet ...

    Yes. You can send kernel logs directly to another machine via ethernet,
    or even better if available, serial port.

    Directly from the kernel, mind.

    I may be able to locate the information later, if you are interested. It is
    hidden deep in my bug reports somewhere. But I don't have the notes I took
    on the machine I used for this; it is in another city.

    --
    Cheers,
    Carlos E.R.

  • From Marco Moock@21:1/5 to All on Mon Sep 27 20:42:17 2021
    XPost: alt.os.linux

    On Mon, 27 Sep 2021 19:36:08 +0100,
    Java Jive <java@evij.com.invalid> wrote:
    Maybe run a memcheck, and an fsck of the
    entire disk surface?

    If it is a drive fault, SysRq+R should still work.
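
    One thing worth checking before concluding that SysRq is dead: the kernel
    only honours the SysRq functions enabled in /proc/sys/kernel/sysrq, and
    that value is a bitmask (1 enables everything). The sketch below decodes
    the mask for the REISUB keys, using the bit values from the kernel's sysrq
    documentation. The function names are made up for this example, and 176 is
    the value Ubuntu-family installs are believed to ship by default, so treat
    that as an assumption to verify on the machine itself.

```shell
#!/bin/sh
# Sketch: decode the kernel.sysrq bitmask for the REISUB keys.
# Bit values follow the kernel's sysrq documentation:
#   4 = keyboard control (R), 64 = signalling processes (E, I),
#   16 = sync (S), 32 = remount read-only (U), 128 = reboot/poweroff (B).
# sysrq_permits and report_reisub are hypothetical helper names.

sysrq_permits() {
    mask="$1"; bit="$2"
    # A value of 1 enables every SysRq function.
    [ "$mask" -eq 1 ] && return 0
    [ $((mask & bit)) -ne 0 ]
}

report_reisub() {
    mask="$1"
    for entry in R:4 E:64 I:64 S:16 U:32 B:128; do
        key="${entry%%:*}"
        bit="${entry##*:}"
        if sysrq_permits "$mask" "$bit"; then
            echo "$key enabled"
        else
            echo "$key disabled"
        fi
    done
}

# Example use on the live machine (commented out so the file can be sourced):
#   report_reisub "$(cat /proc/sys/kernel/sysrq)"
```

    With a mask of 176 only S, U and B are honoured, so a silent R, E or I
    would not by itself prove that the kernel is dead. Running
    "sudo sysctl kernel.sysrq=1" enables all functions until the next reboot.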

  • From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:29:31 2021
    XPost: alt.os.linux

    On 27/09/21 at 19:36, Java Jive wrote:
    On 27/09/2021 18:22, Paulo da Silva wrote:

     From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    Sounds as though it might be hardware.
    Almost for sure ...
    ...

    Maybe run a memcheck,
    How? In my boot menu there is no such option :-(

    and an fsck of the entire disk
    surface?
    I am running btrfs and I use scrub after the freezes. Never had an error
    on my SSD.
    Also, smartctl -a has reported only one error for a long time:
    Error Information Log Entries: 1

  • From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:32:19 2021
    XPost: alt.os.linux

    On 27/09/21 at 19:34, Carlos E. R. wrote:
    On 27/09/2021 20.02, Paulo da Silva wrote:
    On 27/09/21 at 18:38, Marco Moock wrote:
    ...

    I wander if there is some kind of script or configuration that forces
    the logs not to be buffered. I'll search in the internet ...

    Yes. You can send kernel logs directly to another machine via ethernet,
    or even better if available, serial port.

    Directly from the kernel, mind.

    I may be able to locate information later, if you are interested. Hidden
    deep in my bug reports somewhere.
    I would be very grateful if you could find them.
    I have been searching the internet for this, but so far I have only found
    trivial suggestions about logs.

    But I don't have my notes taken on the
    machine I used for this, it is on another city.


  • From Java Jive@21:1/5 to Paulo da Silva on Mon Sep 27 20:57:46 2021
    XPost: alt.os.linux

    On 27/09/2021 20:29, Paulo da Silva wrote:
    On 27/09/21 at 19:36, Java Jive wrote:
    On 27/09/2021 18:22, Paulo da Silva wrote:

    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    Sounds as though it might be hardware.
    Almost for sure ...
    ....

    Maybe run a memcheck,
    How? In my boot menu there is no such option :-(

    Download an image and boot from it:
    https://www.memtest86.com/

    and an fsck of the entire disk
    surface?
    I am running btrfs and I use scrub after the freezes. Never had an error
    on my SSD.
    Also smartctl -a only reports one error for a long time
    Error Information Log Entries: 1

    Fair enough, I didn't realise it was an SSD not a spinner, and it was
    just one possible line of enquiry.

    --

    Fake news kills!

    I may be contacted via the contact address given on my website:
    www.macfh.co.uk

  • From Paulo da Silva@21:1/5 to All on Mon Sep 27 20:23:54 2021
    XPost: alt.os.linux

    On 27/09/21 at 18:59, Paul wrote:
    Paulo da Silva wrote:
    On 27/09/21 at 18:27, Marco Moock wrote:
    Does it also happen with the nouveau driver?
    Does it happen in the live system?
    Does it happen with another graphics card or with that card in
    another computer?

    I can't use the nouveau driver because I need the computer (a laptop)
    for AI deep learning with tensorflow GPU.

    Do you correlate the failure, with any particular
    activity on the machine ?
    Certainly not. For example, the last time I just left the computer
    unattended making a backup. When I returned to the computer it was
    frozen. The backup had finished, however.

    For example, a more mundane activity on a computer,
    is the usage of modern Firefox. While the user is
    not viewing a web page, Firefox seems to leak memory
    until all available memory in Ring 3 is used up.

    But Linux has Out of Memory (OOM) killer, for the
    handling of memory exhaustion that way. The system
    should not freeze because Firefox happens to be
    running.

    I have panel widgets monitoring many things, memory among them. The
    laptop has 32 GB of RAM, which I rarely need in full except for some AI
    data processing.
    The temperature is also kept low because the clock is set to half
    frequency, except when I need to run special tasks such as training AI
    algorithms. This is a very fast machine.
    BTW, I never get a freeze when running these tasks. That is surely a
    coincidence, because the freezes are very rare in general.

    Whereas, I don't know what happens, if a GPU that
    uses shared memory, happens to request more and
    more RAM for some GPU activity. An NVidia GPU is
    more likely to have its own memory chips, and be
    less likely to cause resource exhaustion on its own.

    Try running "nvidia-smi" in a terminal window,
    selecting the option to have it update the
    screen repetitively (like "top" in a sense), and
    watch resource consumption listed there. If you're
    running the NVidia driver, that program should be
    installed for you.

    You could run "top" in one terminal window (using
    the information near the top of top, for resource info).
    And run "nvidia-smi" in a second window, to watch
    for dwindling NVidia resources.

    I'll try this. Thanks Paul.

  • From J.O. Aho@21:1/5 to Paulo da Silva on Mon Sep 27 22:47:16 2021
    XPost: alt.os.linux

    On 27/09/2021 19.22, Paulo da Silva wrote:

    I am using kubuntu 20.04.
    From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    Does your HDD LED flash a lot?
    If so, I would bet that Plasma 5 has leaked memory, in which case the
    following bug could be of interest to you:
    https://bugs.kde.org/show_bug.cgi?id=436061

    There isn't much you can do about this: the machine is so busy swapping
    that you won't even be able to ssh into it. It could be wise to disable
    swap and thus get the kernel to kill a random process, hopefully
    plasmashell. I have had times when plasmashell had taken 58G of RAM and
    there was no other option than rebooting the computer.



    After restart the journalctl -b -b1 shows nothing at the freeze time.

    It tends to be difficult to write to a file when the system is under heavy load.

    Is there a way to get some information on what this is happening?

    For me it was more about being notified before it gets too bad, for
    example by logging the output from top* once every five minutes so that
    I could see the memory usage.



    * for example use: top -b -n 1 >> /path/to/file/where/you/want/to/log
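
    A slightly fuller sketch of the same idea, assuming a timestamp per
    sample and a sync after each write are wanted so that the last snapshot
    before a freeze is more likely to be on disk. The function name
    log_top_snapshot, the log path and the cron schedule are illustrative
    assumptions.

```shell
#!/bin/sh
# Sketch: append a timestamped "top" snapshot to a log file and flush it to disk.
# log_top_snapshot is a hypothetical helper name.
log_top_snapshot() {
    logfile="$1"
    {
        # Marker line so that individual samples are easy to find later.
        printf '===== %s =====\n' "$(date '+%Y-%m-%d %H:%M:%S')"
        # Batch mode, one iteration; keep the summary and the biggest consumers.
        top -b -n 1 2>/dev/null | head -n 20
    } >> "$logfile"
    # Ask the kernel to write buffered data out now.
    sync
}

# Example use (commented out so the file can be sourced safely):
#   log_top_snapshot "$HOME/top-history.log"
# Possible crontab entry for a five-minute interval, with the function
# saved in a script of your own choosing:
#   */5 * * * * /path/to/your/script
```

    Sorting by memory inside top (Shift+M interactively, or "top -o %MEM" in
    batch mode with procps-ng) would put a leaking process at the head of
    each sample.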


    --

    //Aho

  • From Carlos E. R.@21:1/5 to Paulo da Silva on Tue Sep 28 00:58:43 2021
    XPost: alt.os.linux

    On 27/09/2021 21.29, Paulo da Silva wrote:
    On 27/09/21 at 19:36, Java Jive wrote:


    and an fsck of the entire disk
    surface?
    I am running btrfs and I use scrub after the freezes. Never had an error
    on my SSD.
    Also smartctl -a only reports one error for a long time
    Error Information Log Entries: 1

    You should do a smartctl short test, then a long test.


    --
    Cheers,
    Carlos E.R.

  • From Paulo da Silva@21:1/5 to All on Tue Sep 28 00:36:16 2021
    XPost: alt.os.linux

    On 27/09/21 at 23:58, Carlos E. R. wrote:
    On 27/09/2021 21.29, Paulo da Silva wrote:
    On 27/09/21 at 19:36, Java Jive wrote:


    and an fsck of the entire disk
    surface?
    I am running btrfs and I use scrub after the freezes. Never had an error
    on my SSD.
    Also smartctl -a only reports one error for a long time
    Error Information Log Entries: 1

    You should do a smartctl short test, then a long test.

    It doesn't work for SSD, at least for mine.
    Only smartctl -a /dev/...

    # smartctl -t long /dev/nvme0n1
    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-88-generic] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

    NVMe device successfully opened

    Use 'smartctl -a' (or '-x') to print SMART (and more) information

  • From Carlos E. R.@21:1/5 to Paulo da Silva on Tue Sep 28 02:15:46 2021
    XPost: alt.os.linux

    On 28/09/2021 01.36, Paulo da Silva wrote:
    On 27/09/21 at 23:58, Carlos E. R. wrote:
    On 27/09/2021 21.29, Paulo da Silva wrote:
    On 27/09/21 at 19:36, Java Jive wrote:


    and an fsck of the entire disk
    surface?
    I am running btrfs and I use scrub after the freezes. Never had an error
    on my SSD.
    Also smartctl -a only reports one error for a long time
    Error Information Log Entries: 1

    You should do a smartctl short test, then a long test.

    It doesn't work for SSD, at least for mine.

    It works on mine. Sigh...

    Only smartctl -a /dev/...

    # smartctl -t long /dev/nvme0n1

    Ah, that's not an SSD proper, but an NVMe drive. It does not have a SATA
    connection and has to emulate some things, so SMART may not work.

    smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-88-generic] (local build)
    Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

    NVMe device successfully opened

    Use 'smartctl -a' (or '-x') to print SMART (and more) information


    Pity.


    --
    Cheers,
    Carlos E.R.

  • From William Unruh@21:1/5 to J.O. Aho on Tue Sep 28 01:55:20 2021
    XPost: alt.os.linux

    On 2021-09-27, J.O. Aho <user@example.net> wrote:

    On 27/09/2021 19.22, Paulo da Silva wrote:

    I am using kubuntu 20.04.
    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I would bet on a hardware problem: no warning, random. E.g., the power
    supply voltage could drop briefly. The system has no way of recording
    it.
    Buy a new computer.

    Does your HDD led flash at lot?
    If so, I would bet my money on that Plasma5 has leaked memory, in such
    case the following bug could be of interest for your: https://bugs.kde.org/show_bug.cgi?id=436061

    There ain't much you can do about this, the machine is too occupied with swapping that you won't be able to ssh to the machine. It could be wise
    to disable swap and those get the kernel to kill a random process and hopefully it is plasmashell. I have had times when plasmashell has taken
    58G of RAM and it's no other option than reboot the computer.



    After restart the journalctl -b -b1 shows nothing at the freeze time.

    Tend to be difficult to write to file when system under heavy load.

    Is there a way to get some information on what this is happening?

    For me it was more to try to be notified before it's get too bad, like logging the output from top* once every five minutes and that way be
    able to see memory usage.



    * for example use: top -b -n 1 >> /path/to/file/where/you/want/to/log



  • From stepore@21:1/5 to Paulo da Silva on Mon Sep 27 20:44:24 2021
    XPost: alt.os.linux

    On 09/27/2021 12:32 PM, Paulo da Silva wrote:
    I would thank you very much if you could find them.
    I am searching the internet for this stuff but so far I only found
    trivial suggestions about logs.


    It's fairly trivial to set up another computer as a remote syslog server
    and ship your laptop's logs to it. Or, if you're really keen, use
    something like Graylog, the ELK stack or even the free version of Splunk
    as the destination. They give you great insight into system logs.

    On that note, it might be worth your while to set up something like
    Grafana on another server (or again Splunk) so you can build dashboards
    and see a historical overview of all system resources before and after
    a freeze.

  • From J.O. Aho@21:1/5 to William Unruh on Tue Sep 28 11:03:36 2021
    XPost: alt.os.linux

    On 28/09/2021 03.55, William Unruh wrote:
    On 2021-09-27, J.O. Aho <user@example.net> wrote:

    On 27/09/2021 19.22, Paulo da Silva wrote:

    I am using kubuntu 20.04.
    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I wouod bet on a hardware problem. No warning. random. eg, the power
    supply voltage could drop briefly. The system has not way of recording
    it.

    The plasmashell issue is quite random: sometimes it takes days before
    it happens, sometimes just a short while after a fresh reboot, so I
    wouldn't jump to a hardware issue before ruling out the plasmashell bug.

    It must be quite expensive for you to get a new computer each time you
    have a software issue.

    --

    //Aho

  • From Paul@21:1/5 to William Unruh on Tue Sep 28 05:47:50 2021
    XPost: alt.os.linux

    William Unruh wrote:
    On 2021-09-27, J.O. Aho <user@example.net> wrote:
    On 27/09/2021 19.22, Paulo da Silva wrote:

    I am using kubuntu 20.04.
    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I wouod bet on a hardware problem. No warning. random. eg, the power
    supply voltage could drop briefly. The system has not way of recording
    it.
    Buy a new computer.

    Among enthusiasts, it is popular to stock a spare
    power supply. You can fit your spare supply and
    retest, and see if that theory holds water.
    Right now, the junk room sports a Seasonic S12
    as the "designated hitter".

    Running Prime95 (statically compiled Linux version
    in "Just Testing" mode), while using the existing
    supply, is an acceptance test. It tests machine
    cooling is adequate (run something lmsensors based,
    to see whether temp overshoots, while you're waiting
    for the machine to shut off on CPU THERMTRIP). It draws
    max CPU power. My machine, wall power climbs to 180W
    while running that CPU integrity test.

    https://www.mersenne.org/download/

    If you have NVidia driver, you can add in a graphics
    test if you want, but I don't have anything for that
    in mind. I have a CUDA app, but it would be a pig
    to set up due to libs and so on. On my machine, running
    the graphics test case while Prime95 is running, raises
    machine power to 360W (on a 550W PSU). Modern video
    cards have a power limiter, and they also have a
    status indicator in software, indicating which limiter is limiting
    GPU performance. Running NVENC or NVDEC for example,
    the card won't use more than 1/3rd of max power.

    Normally, my machine power level doesn't go past 200W
    without testing assistance like that. 360W to 400W loading,
    is via synthetic (unlikely) tests.

    *******

    Haswell CPUs, at the time, some power supplies would
    become unstable at low load, leading to "Haswell certified"
    power supplies. But the most likely reason for that
    to happen, was the existence of some older supplies
    that have (on the label), a row of numbers for
    "minimum consumption". No supply created in at least
    the last ten years, has that row of numbers on the label.

    The absolute worst situation of that type, is there
    existed one supply, where the 12V rail needed 25% loading
    to remain stable. So if the rail was 40 amps, the label would
    read:

    Ancient supply label     Modern supply label (zero amps is OK)
         ...  +12V                ...  +12V
         Min  10A
         Max  40A                 Max  40A

    Naturally, I was careful to never buy a supply
    with the two-row MIN/MAX labeling, as it's an admission
    of "stupid" in design. You would always be looking over
    your shoulder, if you bought the one on the left.

    With lots of computer hardware today, such a guarantee
    could not be met in the form of min loading. The idle current
    could easily drop below 10A for example. Some modern supplies
    have met the "0 amps" requirement, by having a 5W or 10W
    load inside the PSU for the purpose of meeting open circuit
    stability requirements. It's unlikely an 80+ supply is
    doing that.

    And here, stability does not mean "oscillation",
    stability means remaining in regulation, 12V +/- 5%. If
    unloaded, a "MIN/MAX" supply might deviate past 5% by a bit.
    12V only gets in trouble, if it drops below 11V, as an example of
    how far it can be pushed on overload. Burning might result
    (hard drive clamp device activates) at around +15V or so.
    There's a bit of headroom on +12V on the high side. Some
    other rails don't have that luxury.

    A multimeter is recommended, if checking voltages. Do not
    trust the ACPI-calibrated voltage readouts for this. The
    multimeter might be accurate to around 2% or so. And be careful
    with the multimeter probes - one of those modern 1200W supplies,
    if you happened to short +12V, it would not be pretty. They
    live for the chance to melt wiring. While in theory, individual
    wire looms have 20A limiters (PSU shuts off), you don't
    want to be testing the cheapness of the company making
    the supply, even if you've paid $150 for it. In some ways,
    the behavior of the supply, is not adequately captured in
    the affixed labeling scheme (specifically, OC protection).
    There's been at least one, where it didn't appear
    there was adequate loom protection.

    In terms of noise patterns, supplies have "ripple". This might
    be in the 0.02 to 0.05V range or so. The output capacitors
    determine how fast the rail can change instantaneously.

    This is a really old schematic now, for PSU education,
    but it still illustrates the design principles.
    There's 1000uF on the +12V rail for example. Supplies
    typically can have 4000-5000 more uF added to the rail
    at the load, before it affects oscillation stability.
    Precise information of that nature, is hard to get
    from a manufacturer, but the designer is aware of
    the issue. You can't put 250,000uF across a PC PSU.

    http://www.pavouk.org/hw/en_atxps.html

    The ATX supply "pushes" but does not "pull". It is
    not an op amp or linear amplifier. If the supply
    deviates due to transient loading, it likely does
    not respond well to energy dumped back into the
    supply. Motherboards don't generally do that.

    Only one regulator in the whole PC is push/pull. And
    that's the regulator for the DIMM terminator resistors,
    where the current flow magnitude can be in the +2 amps
    to -2 amps range (bus all 0's, bus all 1's). The
    regulator must sink the -2 amps, in order to precisely
    maintain the terminators at the correct voltage
    (otherwise, your PC may suffer the "Photoshop bug").
    Most other regulators are the "push only" variety.
    A 7805 is a push only regulator. It's not intended
    to sink backward current flow.

    Summary: I doubt it is the PSU, but... that's why we
    test stuff.

    Paul

  • From Carlos@21:1/5 to Paulo da Silva on Tue Sep 28 11:54:29 2021
    XPost: alt.os.linux

    On Mon, 27 Sep 2021 20:32:19 +0100, Paulo da Silva wrote:

    On 27/09/21 at 19:34, Carlos E. R. wrote:
    On 27/09/2021 20.02, Paulo da Silva wrote:
    On 27/09/21 at 18:38, Marco Moock wrote:
    ...

    I wander if there is some kind of script or configuration that forces
    the logs not to be buffered. I'll search in the internet ...

    Yes. You can send kernel logs directly to another machine via ethernet,
    or even better if available, serial port.

    Directly from the kernel, mind.

    I may be able to locate information later, if you are interested.
    Hidden deep in my bug reports somewhere.
    I would thank you very much if you could find them.
    I am searching the internet for this stuff but so far I only found
    trivial suggestions about logs.

    Found the bug report :-)

    Or one of them, I'm reading. Dec 2008.

    First, I was told to "boot with console=tty0
    console=ttyS0,<speed>", then run "klogconsole -r0 -l8" once booted.

    Ok, this is not it, this is using an actual serial port.


    Continue searching.

    Ah, I wrote notes! I copy and translate them.
    I'll post using "Pan" because I can disable word wrap. I hope it gets to you with the long lines intact, even if it is not "Usenet valid".

    +++================================ kernel messages via serial port.

    grub:
    console=tty9 console=ttyS1,57600
    shell:
    klogconsole -r0 -l9

    mind, interferes with hibernation

    ================================---

    +++================================ netconsole. Kernel logging on remote machine.

    Date: Fri, 23 May 2014 19:19:15 -0400
    From: Cristian Rodríguez <...@opensuse.org>
    Reply-To: OS-en <opensuse@opensuse.org>
    To: opensuse@opensuse.org
    Subject: Re: [opensuse] Kernel crash on multiple file write on reiserfs GPT partition.

    ...

    P D O .. that means:

    "P" --> proprietary module loaded, developers will most likely ignore
    your report if it comes in this form.

    "D" --> the kernel has oopsed before, that means what you are showing
    in the picture is a secondary oops, not the actual problem.

    "O" -> "Out of tree module" is loaded, good luck with getting that fixed.

    ...

    https://www.kernel.org/doc/Documentation/networking/netconsole.txt
    ...
    Ah. Ok it appears to be the same as in "/usr/share/doc/packages/netconsole-tools/netlogging.txt"
    ...
    The documentation is obsolete. The correct syntax appears to be:

    modprobe netconsole 6666@192.168.1.14/eth0,6666@192.168.1.15


    which I got from "http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html".

    not

    modprobe netconsole netconsole="...

    ...

    Try with the section "dynamic configuration" from the netconsole.txt doc.

    Telcontar:~ # modprobe netconsole
    Telcontar:~ # cd /sys/kernel/config/netconsole/ Telcontar:/sys/kernel/config/netconsole # ls Telcontar:/sys/kernel/config/netconsole # mkdir target1 Telcontar:/sys/kernel/config/netconsole # ls
    target1
    Telcontar:/sys/kernel/config/netconsole # cd target1/ Telcontar:/sys/kernel/config/netconsole/target1 # ls
    dev_name enabled local_ip local_mac local_port remote_ip remote_mac remote_port
    Telcontar:/sys/kernel/config/netconsole/target1 # cat local_
    local_ip local_mac local_port Telcontar:/sys/kernel/config/netconsole/target1 # cat local_ip
    0.0.0.0
    Telcontar:/sys/kernel/config/netconsole/target1 # echo 192.168.1.14 > local_ip Telcontar:/sys/kernel/config/netconsole/target1 # echo 6666 > local_port

    but

    Telcontar:/sys/kernel/config/netconsole/target1 # echo "00:21:85:16:2D:0B" > local_mac
    -bash: local_mac: Permission denied Telcontar:/sys/kernel/config/netconsole/target1 # cat local_mac
    ff:ff:ff:ff:ff:

    weird.

    Telcontar:/sys/kernel/config/netconsole/target1 # echo "00:03:0D:05:17:FC" > remote_mac
    Telcontar:/sys/kernel/config/netconsole/target1 # echo 6666 > remote_port
    Telcontar:/sys/kernel/config/netconsole/target1 # echo 192.168.1.15 > remote_ip
    Telcontar:/sys/kernel/config/netconsole/target1 # cat dev_name
    eth0
    Telcontar:/sys/kernel/config/netconsole/target1 # echo 1 > enabled

    It is apparently started:
    Telcontar:/sys/kernel/config/netconsole/target1 # tail /var/log/messages
    <3.6> 2014-05-24 13:23:01 Telcontar systemd 1 - - Starting Session 78 of user news.
    <3.6> 2014-05-24 13:25:01 Telcontar systemd 1 - - Starting Session 79 of user news.
    <3.6> 2014-05-24 13:28:01 Telcontar systemd 1 - - Starting Session 80 of user news.
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.603827] netpoll: netconsole: local port 6666
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.609384] netpoll: netconsole: local IPv4 address 192.168.1.14
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.614762] netpoll: netconsole: interface 'eth0'
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.620095] netpoll: netconsole: remote port 6666
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.625373] netpoll: netconsole: remote IPv4 address 192.168.1.15
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.630545] netpoll: netconsole: remote ethernet address 00:03:0d:05:17:fc
    <0.6> 2014-05-24 13:28:06 Telcontar kernel - - - [10768.635653] netconsole: network logging started


    On the receiving computer, I have:

    netcat -u -l 6666 | tee -a remote_log
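    If netcat is not available on the receiving machine, roughly the same
    thing can be done with a few lines of Python. This is only a sketch under
    assumptions: the function names are invented here, the port (6666) and
    the log file name (remote_log) are taken from the netcat line above, and
    nothing is done about security or log rotation.

```python
import socket
import sys


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on the port netconsole sends to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_datagrams(sock, log_path="remote_log", max_packets=None):
    """Append each received datagram to log_path and echo it to stdout.

    Runs forever unless max_packets is given. Returns the packet count.
    """
    count = 0
    with open(log_path, "ab") as log:
        while max_packets is None or count < max_packets:
            data, _sender = sock.recvfrom(65535)
            log.write(data)
            log.flush()  # keep the file current in case this side dies too
            sys.stdout.write(data.decode("utf-8", errors="replace"))
            sys.stdout.flush()
            count += 1
    return count
```

    Calling log_datagrams(open_listener()) plays the role of the
    "netcat -u -l 6666 | tee -a remote_log" pipeline.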


    I plugged a usb stick, and got the messages on the remote, so good!


    Now I go for testing and crashing the machine again. Nvidia is not in the list. Wish me luck!


    modprobe netconsole
    cd /sys/kernel/config/netconsole/
    ls
    mkdir target1
    ls

    cd target1/
    ls
    cat *

    echo 192.168.1.14 > local_ip
    echo 6666 > local_port

    echo "00:03:0D:05:17:FC" > remote_mac
    echo 6666 > remote_port
    echo 192.168.1.15 > remote_ip
    cat dev_name
    echo 1 > enabled




    ------

    2015-11-22

    6666 - Local port
    192.168.1.5 - Local system IP
    eth0 - Local system interface
    514 - Remote syslogd udp port
    192.168.1.100 - Remote syslogd IP
    00:19:D1:2A:BA:A8 - Remote syslogd Mac

    You can add the above modprobe line to /etc/rc.local to load the module automatically. Another recommended option is to create the /etc/modprobe.d/netconsole file and append the following text:
    # echo 'options netconsole netconsole=6666@192.168.1.5/eth0,514@192.168.1.100/00:19:D1:2A:BA:A8 '> /etc/modprobe.d/netconsole


    echo 'options netconsole netconsole=6666@192.168.1.14/eth0,514@192.168.1.15/00:03:0d:05:17:fc '> /etc/modprobe.d/netconsole

    The log shows:

    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081911] netpoll: netconsole: local port 6666
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081920] netpoll: netconsole: local IPv4 address 192.168.1.14
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081921] netpoll: netconsole: interface 'eth0'
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081922] netpoll: netconsole: remote port 514
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081923] netpoll: netconsole: remote IPv4 address 192.168.1.15
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081925] netpoll: netconsole: remote ethernet address 00:03:0d:05:17:fc
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081949] console [netcon0] enabled
    <0.6> 2015-11-22 14:38:04 Telcontar kernel - - - [ 3495.081949] netconsole: network logging started




    2015-11-22 On the other side I don't receive anything, once I open the port. Nothing.


    This does work:

    modprobe netconsole
    cd /sys/kernel/config/netconsole/
    ls
    mkdir target1
    ls

    cd target1/
    ls
    cat *

    echo 192.168.1.14 > local_ip
    echo 6666 > local_port

    echo "00:03:0D:05:17:FC" > remote_mac
    echo 6666 > remote_port
    echo 192.168.1.15 > remote_ip
    cat dev_name
    echo 1 > enabled

    But not on port 514.

    This does work.

    echo 'options netconsole netconsole=6666@192.168.1.14/eth0,6666@192.168.1.15/00:03:0d:05:17:fc '> /etc/modprobe.d/netconsole.conf

    Meaning, syslog does not work.
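    The option string in the working line above follows the pattern
    local-port@local-ip/device,remote-port@remote-ip/remote-mac. As a small
    sketch (the helper names are hypothetical, not part of any tool), the
    line can be assembled from its parts like this, which makes it harder to
    mix up the local and remote halves:

```python
def netconsole_param(local_port, local_ip, dev,
                     remote_port, remote_ip, remote_mac=""):
    """Build the value of the netconsole= module parameter."""
    return (
        f"{local_port}@{local_ip}/{dev},"
        f"{remote_port}@{remote_ip}/{remote_mac}"
    )


def modprobe_options_line(param):
    """Build the line that goes into a file under /etc/modprobe.d/."""
    return f"options netconsole netconsole={param}"


if __name__ == "__main__":
    # Values taken from the notes above.
    print(modprobe_options_line(netconsole_param(
        6666, "192.168.1.14", "eth0",
        6666, "192.168.1.15", "00:03:0d:05:17:fc")))
```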

    netcat -u -l 6666 | tee -a remote_log


    <https://www.kernel.org/doc/Documentation/networking/netconsole.txt>
    Very good: <http://www.cyberciti.biz/tips/linux-netconsole-log-management-tutorial.html#comment-620097>

    ================================---


    HTH :-)

  • From Carlos E. R.@21:1/5 to stepore on Tue Sep 28 13:33:41 2021
    XPost: alt.os.linux

    On 28/09/2021 05.44, stepore wrote:
    On 09/27/2021 12:32 PM, Paulo da Silva wrote:
    I would thank you very much if you could find them.
    I am searching the internet for this stuff but so far I only found
    trivial suggestions about logs.


    It's fairly trivial to setup another computer as a remote syslog server
    and ship your laptop logs to that.

    That's not the same thing as I proposed, because it runs in userspace.


    --
    Cheers,
    Carlos E.R.

  • From Jonathan N. Little@21:1/5 to Paul on Tue Sep 28 10:57:24 2021
    XPost: alt.os.linux

    Paul wrote:
    William Unruh wrote:
    On 2021-09-27, J.O. Aho <user@example.net> wrote:
    On 27/09/2021 19.22, Paulo da Silva wrote:

    I am using kubuntu 20.04.
    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I would bet on a hardware problem. No warning. Random. E.g., the power
    supply voltage could drop briefly. The system has no way of recording
    it.
    Buy a new computer.

    Among enthusiasts, it is popular to stock a spare
    power supply. You can fit your spare supply and
    retest, and see if that theory holds water.
    Right now, the junk room sports a Seasonic S12
    as the "designated hitter".

    Running Prime95 (statically compiled Linux version
    in "Just Testing" mode), while using the existing
    supply, is an acceptance test. It tests machine
    cooling is adequate (run something lmsensors based,
    to see whether temp overshoots, while you're waiting
    for the machine to shut off on CPU THERMTRIP). It draws
    max CPU power. My machine, wall power climbs to 180W
    while running that CPU integrity test.

       https://www.mersenne.org/download/

    If you have NVidia driver, you can add in a graphics
    test if you want, but I don't have anything for that
    in mind. I have a CUDA app, but it would be a pig
    to set up due to libs and so on. On my machine, running
    the graphics test case while Prime95 is running, raises
    machine power to 360W (on a 550W PSU). Modern video
    cards have a power limiter, and they also have a
    status indicator in software, indicating which limiter is limiting
    GPU performance. Running NVENC or NVDEC for example,
    the card won't use more than 1/3rd of max power.

    Normally, my machine power level doesn't go past 200W
    without testing assistance like that. 360W to 400W loading,
    is via synthetic (unlikely) tests.

    *******

    Haswell CPUs, at the time, some power supplies would
    become unstable at low load, leading to "Haswell certified"
    power supplies. But the most likely reason for that
    to happen, was the existence of some older supplies
    that have (on the label), a row of numbers for
    "minimum consumption". No supply created in at least
    the last ten years, has that row of numbers on the label.

    The absolute worst situation of that type, is there
    existed one supply, where the 12V rail needed 25% loading
    to remain stable. So if the rail was 40 amps, the label would
    read 10A minimum, as on the left below. Naturally, I was careful
    to never buy a supply with the two-row MIN/MAX labeling, as it's
    an admission of "stupid" in design. You would always be looking over
    your shoulder, if you bought the one on the left.

    Ancient supply label          Modern supply label (zero amps is OK)
         ... +12V                      ... +12V
    Min       10A
    Max       40A                 Max       40A

    With lots of computer hardware today, such a guarantee
    could not be met in the form of min loading. The idle current
    could easily drop below 10A for example. Some modern supplies
    have met the "0 amps" requirement, by having a 5W or 10W
    load inside the PSU for the purpose of meeting open circuit
    stability requirements. It's unlikely an 80+ supply is
    doing that.

    And here, stability does not mean "oscillation",
    stability means remaining in regulation, 12V +/- 5%. If
    unloaded, a "MIN/MAX" supply might deviate past 5% by a bit.
    12V only gets in trouble, if it drops below 11V, as an example of
    how far it can be pushed on overload. Burning might result
    (hard drive clamp device activates) at around +15V or so.
    There's a bit of headroom on +12V on the high side. Some
    other rails don't have that luxury.

    A multimeter is recommended, if checking voltages. Do not
    trust the ACPI-calibrated voltage readouts for this. The
    multimeter might be accurate to around 2% or so. And be careful
    with the multimeter probes - one of those modern 1200W supplies,
    if you happened to short +12V, it would not be pretty. They
    live for the chance to melt wiring. While in theory, individual
    wire looms have 20A limiters (PSU shuts off), you don't
    want to be testing the cheapness of the company making
    the supply, even if you've paid $150 for it. In some ways,
    the behavior of the supply, is not adequately captured in
    the affixed labeling scheme (specifically, OC protection).
    There's been at least one, where it didn't appear
    there was adequate loom protection.

    In terms of noise patterns, supplies have "ripple". This might
    be in the 0.02 to 0.05V range or so. The output capacitors
    determine how fast the rail can change instantaneously.

    This is a really old schematic now, for PSU education,
    but it still illustrates the design principles.
    There's 1000uF on the +12V rail for example. Supplies
    typically can have 4000-5000 more uF added to the rail
    at the load, before it affects oscillation stability.
    Precise information of that nature, is hard to get
    from a manufacturer, but the designer is aware of
    the issue. You can't put 250,000uF across a PC PSU.

    http://www.pavouk.org/hw/en_atxps.html

    The ATX supply "pushes" but does not "pull". It is
    not an op amp or linear amplifier. If the supply
    deviates due to transient loading, it likely does
    not respond well to energy dumped back into the
    supply. Motherboards don't generally do that.

    Only one regulator in the whole PC is push/pull. And
    that's the regulator for the DIMM terminator resistors,
    where the current flow magnitude can be in the +2 amps
    to -2 amps range (bus all 0's, bus all 1's). The
    regulator must sink the -2 amps, in order to precisely
    maintain the terminators at the correct voltage
    (otherwise, your PC may suffer the "Photoshop bug").
    Most other regulators are the "push only" variety.
    A 7805 is a push only regulator. It's not intended
    to sink backward current flow.

    Summary: I doubt it is the PSU, but... that's why we
             test stuff.


    Later in the thread I believe OP said it was a laptop. Swapping the PSU
    is not an option. One thing that is most likely the cause, especially on
    a laptop, is a heat-related hardware issue. Laptops make this a more
    difficult issue to deal with, but depending on the laptop I would open
    'er up and at least blow out all the dust. On the Dell and Lenovo I have
    it is a simple process; on some other brands, not so much. Looking at the
    mb caps and crusty corrosion... My old Latitude D-820 was a lap-roaster
    with an nVidia GPU that was notorious for GPU meltdowns. I cleaned and
    remounted the heat pipes several times and avoided that fate.

    --
    Take care,

    Jonathan
    -------------------
    LITTLE WORKS STUDIO
    http://www.LittleWorksStudio.com

  • From Paul@21:1/5 to Jonathan N. Little on Tue Sep 28 11:42:37 2021
    XPost: alt.os.linux

    Jonathan N. Little wrote:


    Later in the thread I believe OP said it was a laptop. Swapping the PSU
    is not an option. One thing that is most likely the cause, especially on
    a laptop, is a heat-related hardware issue. Laptops make this a more
    difficult issue to deal with, but depending on the laptop I would open
    'er up and at least blow out all the dust. On the Dell and Lenovo I have
    it is a simple process; on some other brands, not so much. Looking at the
    mb caps and crusty corrosion... My old Latitude D-820 was a lap-roaster
    with an nVidia GPU that was notorious for GPU meltdowns. I cleaned and
    remounted the heat pipes several times and avoided that fate.


    Laptops are less debug-able.

    The posts I read referred to a "computer".

    Setting up a serial port, is the best way
    to determine if it is really frozen. I prefer
    the SuperIO serial port type, to USB serial.

    I use this on the boot line of my newest computer:

    console=ttyS0,57600n8

    I have a serial cable that runs from the other
    machine, over to this machine, where I can monitor it.

    The nice thing about ttyS0, is it never moves,
    whereas if you use USB serial adapters, you
    don't know what the identifier for it is. Maybe
    plugging in some other stuff, upsets your debug port.

    Not that most people like serial ports, but
    I like it. Gets the job done. Works good when
    the HID stops working on a setup.

    Paul

  • From Jasen Betts@21:1/5 to Paul on Tue Sep 28 20:02:58 2021
    XPost: alt.os.linux

    On 2021-09-28, Paul <nospam@needed.invalid> wrote:
    Jonathan N. Little wrote:


    Later in the thread I believe OP said it was a laptop. Swapping the PSU
    is not an option. One thing that is most likely the cause, especially on
    a laptop, is a heat-related hardware issue. Laptops make this a more
    difficult issue to deal with, but depending on the laptop I would open
    'er up and at least blow out all the dust. On the Dell and Lenovo I have
    it is a simple process; on some other brands, not so much. Looking at the
    mb caps and crusty corrosion... My old Latitude D-820 was a lap-roaster
    with an nVidia GPU that was notorious for GPU meltdowns. I cleaned and
    remounted the heat pipes several times and avoided that fate.


    Laptops are less debug-able.

    The posts I read referred to a "computer".

    Setting up a serial port, is the best way
    to determine if it is really frozen. I prefer
    the SuperIO serial port type, to USB serial.

    I use this on the boot line of my newest computer:

    console=ttyS0,57600n8

    I have a serial cable that runs from the other
    machine, over to this machine, where I can monitor it.

    The nice thing about ttyS0, is it never moves,
    whereas if you use USB serial adapters, you
    don't know what the identifier for it is. Maybe
    plugging in some other stuff, upsets your debug port.

    USB serial never moves if you use /dev/serial/by-path, then it's tied
    to the physical socket you plugged it into (including any intermediate
    hubs).
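    To see which device node each physical socket currently maps to, the
    symlinks under /dev/serial/by-path can simply be resolved. A minimal
    sketch, assuming udev has created that directory (it only exists while
    at least one USB serial adapter is plugged in); the function name is
    made up for illustration:

```python
import os


def stable_serial_ports(base="/dev/serial/by-path"):
    """Map each by-path symlink name to the device node it points at."""
    if not os.path.isdir(base):
        return {}  # no USB serial adapters present, or no udev
    return {
        name: os.path.realpath(os.path.join(base, name))
        for name in sorted(os.listdir(base))
    }


if __name__ == "__main__":
    for name, device in stable_serial_ports().items():
        print(f"{name} -> {device}")
```

    The by-path name stays the same as long as the adapter stays in the
    same socket, even if the ttyUSB number it resolves to changes.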

    Not that most people like serial ports, but
    I like it. Gets the job done. Works good when
    the HID stops working on a setup.

    Also way better performance than a VNC if you're working on remote
    servers



    --
    Jasen.

  • From Branimir Maksimovic@21:1/5 to Paulo da Silva on Wed Sep 29 01:09:15 2021
    XPost: alt.os.linux

    On 2021-09-27, Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:
    Hi all!

    From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    I changed my NVIDIA driver to 470. I also tried to put the driver in
    ondemand status. No success. Sooner or later it freezes.

    Is there a way to get some information on what this is happening?

    I am using kubuntu 20.04.

    Thank you.

    As I said, u use AVAST, it is not working on Linux...


    --

    7-77-777
    Evil Sinner!

  • From Paulo da Silva@21:1/5 to All on Wed Sep 29 18:45:37 2021
    XPost: alt.os.linux

    Às 12:54 de 28/09/21, Carlos escreveu:
    On Mon, 27 Sep 2021 20:32:19 +0100, Paulo da Silva wrote:

    Às 19:34 de 27/09/21, Carlos E. R. escreveu:
    On 27/09/2021 20.02, Paulo da Silva wrote:
    Às 18:38 de 27/09/21, Marco Moock escreveu:
    ...

    I wonder if there is some kind of script or configuration that forces
    the logs not to be buffered. I'll search the internet ...

    Yes. You can send kernel logs directly to another machine via ethernet,
    or even better if available, serial port.

    Directly from the kernel, mind.

    ...


    Found the bug report :-)

    ...

    Thank you very much Carlos.

  • From Paulo da Silva@21:1/5 to All on Fri Oct 1 04:05:52 2021
    XPost: alt.os.linux

    Às 18:22 de 27/09/21, Paulo da Silva escreveu:
    Hi all!

    From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    I changed my NVIDIA driver to 470. I also tried to put the driver in
    ondemand status. No success. Sooner or later it freezes.

    Is there a way to get some information on what this is happening?

    I am using kubuntu 20.04.

    Thank you.


    First let me explain how I use this computer.

    I have a startup (boot) command - cpupower - that sets the max. freq. to
    2.5 GHz. Its max. value used to be 5.1 GHz. It also sets the governor to
    powersave. Let me call this Slow Mode.
    This keeps my computer quiet, with very low RPM on both fans. When
    unplugged from the charger they are at 0 RPM most of the time. Since the
    machine is quite fast, the slowdown isn't noticeable.
    When I need power, which rarely happens - training AIs or processing
    large amounts of data - I set it to the max. freq. and the governor to
    performance. I also do this for my FS :-) Let me call this Fast Mode.
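    To confirm which of the two modes is actually in effect (for instance
    after a resume from suspend), the governor and frequency cap can be read
    back from the cpufreq sysfs files. A minimal sketch, assuming the usual
    /sys/devices/system/cpu/cpuN/cpufreq layout; the function name is made
    up for illustration:

```python
from pathlib import Path


def cpufreq_state(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, max_freq_ghz)} for every CPU found."""
    state = {}
    for policy in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq")):
        governor = (policy / "scaling_governor").read_text().strip()
        # scaling_max_freq is expressed in kHz.
        max_khz = int((policy / "scaling_max_freq").read_text().strip())
        state[policy.parent.name] = (governor, max_khz / 1_000_000)
    return state


if __name__ == "__main__":
    for cpu, (governor, ghz) in cpufreq_state().items():
        print(f"{cpu}: governor={governor} max={ghz:.2f} GHz")
```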

    Now, about this problem ...

    1. I configured nvidia to ondemand. The freeze problem has not occurred
    since. But since it can go a month or more without occurring, this is
    still inconclusive. Anyway, from lots of things I have been reading, it is
    very likely that the BIOS, for some reason, can't cool something that is
    going wrong and just freezes the computer. So, no logs. Once again, inconclusive.

    2. A new problem
    When in Fast Mode, running a job that uses the full CPU causes a shutdown.
    This time there is a log entry:
    "thermal thermal_zone0: critical temperature reached (110 C), shutting down"
    So, I tried to analyze the problem.

    I monitored the zone0 temperature and could see it go up to 102C. Then
    the computer initiates the emergency shutdown, so the monitor probably
    gets killed. Notice that the critical temp. for this zone is 100C.

    I tried again but when the temp. of zone0 reached 99C I put the fans in
    boost mode (max. speed) and the temperature dropped and got stable at 97C.
    I tried this again, but now I just put the computer in Slow Mode. The
    temp. drops to 40-50C!

    So, why does neither thermald nor even the BIOS use these resources to drop
    the temperature? In fact the fans rotate at a higher speed but do not
    reach the 6k RPM of boost mode. I tried several configurations for
    thermald, including giving priority to acting on freqs. No success.
    thermald doesn't seem to care at all about its configs.
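    The zone0 monitoring described above can be done by reading the thermal
    sysfs files directly. This is a sketch only, assuming the standard
    /sys/class/thermal/thermal_zone0 layout, where temperatures are given in
    millidegrees Celsius; the function names are illustrative:

```python
from pathlib import Path


def read_celsius(path):
    """Read a sysfs temperature file (millidegrees C) as degrees C."""
    return int(Path(path).read_text().strip()) / 1000.0


def zone_status(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return (current_temp_c, [(trip_type, trip_temp_c), ...])."""
    zone = Path(zone_dir)
    trips = []
    for type_file in sorted(zone.glob("trip_point_*_type")):
        temp_file = type_file.with_name(
            type_file.name.replace("_type", "_temp"))
        if temp_file.exists():
            trips.append((type_file.read_text().strip(),
                          read_celsius(temp_file)))
    return read_celsius(zone / "temp"), trips


if __name__ == "__main__":
    temp, trips = zone_status()
    print(f"zone0: {temp:.1f} C, trip points: {trips}")
```

    Comparing the reported "critical" trip point with the temperature in the
    shutdown log entry may help explain the 100C / 110C discrepancy.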

    Finally
    =======
    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, the problem starts again after resuming!!! I
    tried this a few times over the last couple of days.
    Notice that all these problems are relatively recent.

    By the way ... in windows this problem does not occur.

    So:
    SW problem, after an upgrade perhaps? HW problem? Both?
    I feel lost ...
    As soon as I get some time, I'm thinking of installing a new distro in a
    different partition to see what happens there.
    Until then, I need to reboot before starting a CPU-intensive job.
    Not bad ... :-)

    Thank you to all who responded and for any further comments or suggestions.
    Paulo

  • From Paulo da Silva@21:1/5 to All on Fri Oct 1 06:42:22 2021
    XPost: alt.os.linux

    Às 06:29 de 01/10/21, J.O. Aho escreveu:
    On 01/10/2021 05.05, Paulo da Silva wrote:

    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, when restarting the problem starts again!!! I
    tried this few times last couple of days.
    Notice that all these problems are relatively recent.

    If you mean the thermal issue, maybe you need to restart thermald after
    you wake up from suspension. It's not unknown that some programs do not
    work well with suspension.
    I tried that. No luck!
    I also played with some configurations, namely giving priority to freqs.,
    because I know that lowering them causes the zone0 temp. to drop quickly.
    BTW, in the meantime I remembered that the freeze problem also
    occurred, at least once, with the system in "Slow Mode", that is, at half
    of max. freq. with the powersave governor. That's why I suspect something
    related to the GPU - HW or SW.


    I would keep an eye open for how much memory plasmashell uses, if you
    see it creep over 1G, then it's time to restart it with "plasmashell --replace". Running top/htop once in a while should be ok.
    Yes, I had several issues with plasmashell on all my computers :-( . I
    have a script to handle them. I don't remember now what it does. It just
    keeps working :-)

    Thanks

  • From J.O. Aho@21:1/5 to Paulo da Silva on Fri Oct 1 07:29:50 2021
    XPost: alt.os.linux

    On 01/10/2021 05.05, Paulo da Silva wrote:

    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, when restarting the problem starts again!!! I tried this few times last couple of days.
    Notice that all these problems are relatively recent.

    If you mean the thermal issue, maybe you need to restart thermald after
    you wake up from suspension. It's not unknown that some programs do not
    work well with suspension.

    I would keep an eye open for how much memory plasmashell uses, if you
    see it creep over 1G, then it's time to restart it with "plasmashell --replace". Running top/htop once in a while should be ok.
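    Instead of checking top/htop by hand, the resident memory of plasmashell
    can be read from /proc. A minimal sketch, assuming a Linux /proc layout
    and taking "1G" to mean 1 GiB of resident memory (VmRSS); the function
    names and the threshold constant are made up for illustration:

```python
import os

ONE_GIB_IN_KIB = 1024 * 1024  # assumed meaning of the "1G" threshold


def rss_kib(status_text):
    """Extract the VmRSS value (in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def rss_by_pid(name, proc="/proc"):
    """Return {pid: rss_kib} for every process whose comm equals name."""
    found = {}
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as handle:
                if handle.read().strip() != name:
                    continue
            with open(os.path.join(proc, entry, "status")) as handle:
                rss = rss_kib(handle.read())
        except OSError:
            continue  # process exited while we were looking
        if rss is not None:
            found[int(entry)] = rss
    return found


def over_limit(name="plasmashell", limit_kib=ONE_GIB_IN_KIB, proc="/proc"):
    """Return the PIDs of `name` whose resident memory exceeds the limit."""
    return [pid for pid, rss in rss_by_pid(name, proc).items()
            if rss > limit_kib]
```

    If over_limit() returns a non-empty list, that is the point at which
    restarting with "plasmashell --replace" is suggested above.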


    --

    //Aho

  • From Paul@21:1/5 to Paulo da Silva on Fri Oct 1 03:31:25 2021
    XPost: alt.os.linux

    On 10/1/2021 1:42 AM, Paulo da Silva wrote:
    Às 06:29 de 01/10/21, J.O. Aho escreveu:
    On 01/10/2021 05.05, Paulo da Silva wrote:

    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, when restarting the problem starts again!!! I
    tried this few times last couple of days.
    Notice that all these problems are relatively recent.

    If you mean the thermal issue, maybe you need to restart thermald after
    you wake up from suspension. It's not unknown that some programs do not
    work well with suspension.
    I tried that. No luck!
    I also played with some configurations, namely giving priority to freqs.
    because I know that lowering them causes the zone0 temp. to drop quickly.
    BTW, in the meanwhile I remembered that the freeze problem also
    occurred, at least once, with the system in "Slow Mode" this half of
    max. freq. and powersave governor. That's why I suspect of something
    related with the GPU - HW or SW.


    I would keep an eye open for how much memory plasmashell uses, if you
    see it creep over 1G, then it's time to restart it with "plasmashell
    --replace". Running top/htop once in a while should be ok.
    Yes, I had several issues with plasmashell in all my computers :-( . I
    have a script to handle them. I don't remember now what it does. Just
    keeps working :-)

    Thanks

    From a hardware perspective, some subsystems share power envelope
    because they're in the same package (Intel CPU and Intel HD 630).

    Or, they can share a common heatpipe, which means if one gets
    hot, both get hot (Intel CPU and NVidia GPU chip share heatpipe).

    The NVidia chip, should have an NVidia driver which controls
    frequency and voltage as a function of "what limit you're hitting".
    On something like Furmark, you would be power limited. Maybe
    the GPU driver throttles (turns down clock) when the chip gets
    too warm. And this means, you could even be in a situation where
    a railed or turboed CPU causes the GPU to slow down.

    It's beyond my pay scale, to balance all these things, but from
    the looks of it, some feedback loop in your laptop is not working
    as expected. When the CPU goes above 100C, it should start throttling.
    The NVidia chip should have a throttle temperature too. And the
    NVidia throttle point should take the GPU temperature measurement
    error into account.

    https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks

    Somehow, you have to get them to agree when throttling should happen.
    The NVidia driver already has this sort of behavior, but something
    needs adjustment so that neither subsystem "hogs" the power
    envelope and causes the other subsystem to shut down the computer.

    For the digital temperature readout on the Intel CPU, it is most
    accurate at the high end, where the throttle point is. I do not
    know which measurement point on the NVidia, has the least error,
    as the method used is not likely to be exactly the one Intel
    uses for Core Temp.

    Paul

  • From William Unruh@21:1/5 to Paulo da Silva on Fri Oct 1 13:16:58 2021
    XPost: alt.os.linux

    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering! Almost complete freeze on my end.
    No keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux
    was still running in the background. I could not figure out how to
    unfreeze the google-meet full screen and had to do the power button
    thingy. Of course then another bug showed up. -- I sometimes run my
    laptop with a desktop monitor attached. Often the second or third time
    I reboot, the system seems to get completely confused and looks for that
    second monitor as the default after it has run the "new hardware"
    search. It then times out (90 sec) on starting up akonidia(?), then
    there is another 30 sec pause starting up something else, and it spews
    out many pages of error/warning stuff before the boot process finishes.
    So it took almost 5 min to reboot in the midst of my seminar. Sheesh.

    Dell XPS13-9360 machine, onboard Intel video:
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is freezing the system. I
    suspect the video driver in my case.

    On 2021-10-01, Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:
    Às 18:22 de 27/09/21, Paulo da Silva escreveu:
    Hi all!

    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    I changed my NVIDIA driver to 470. I also tried to put the driver in
    ondemand status. No success. Sooner or later it freezes.

    Is there a way to get some information on what this is happening?

    I am using kubuntu 20.04.

    Thank you.


    First let me explain how I use this computer.

    I have a starting (boot) command - cpupower - to set the max. freq. to
    2.5 GHz. It's max. value used to be 5.1GHz. Also set the governor to powersave. Let me call this Slow Mode.
    This causes my computer to be quiet with very low RPM on both fans. When unplugged from the charger they are most of the time at 0 RPM. Since it
    is quite fast, this isn't noticeable.
    When I need power, which rarely happens - training AIs or processing
    large amount of data, I set it to the max. freq. and governor to
    performance. I also do this for my FS :-) Let me call this Fast Mode.

    Now, about this problem ...

    1. I configured nvidia to ondemand. The freeze problem never occurred anymore. But since it could not occur for a month or more, its
    inconclusive yet. Anyway, from lots of things I have being reading it is
    very likely that BIOS, for some reason, can't cool something going wrong
    and just freezes the computer. So, no logs. Once again, inconclusive.

    2. A new problem
    When in Fast Mode, using a job with fullcpu causes a shutdown.
    This time there is a log entry:
    "thermal thermal_zone0: critical temperature reached (110 C), shutting down" So, I tried to analyze the problem.

    I monitored zone0 temperature and could see it goes up until 102C. Then
    the computer initiates the emergency shutdown. So, the monitor gets
    probably killed. Notice that the critical temp. for this zone is 100C.

    I tried again but when the temp. of zone0 reached 99C I put the fans in
    boost mode (max. speed) and the temperature dropped and got stable at 97C.
    I tried this again, but now I just put the computer in Slow Mode. The
    temp. drops to 40-50C!

    So, why does neither thermald nor even the BIOS use these resources to
    drop the temperature? The fans do rotate at a higher speed, but they do
    not reach the 6k RPM of boost mode. I tried several configurations for
    thermald, including giving priority to acting on freqs. No success.
    thermald doesn't seem to care about its configs at all.

    Finally
    =======
    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, the problem starts again after resume!!! I
    tried this a few times over the last couple of days.
    Notice that all these problems are relatively recent.

    By the way ... in windows this problem does not occur.

    So:
    SW problem, after an upgrade perhaps? HW problem? Both?
    I feel lost ...
    As soon as I get some time, I'm thinking of installing a new distro in a
    different partition to see what happens there.
    Until then, I need to reboot before I start a CPU-intensive job.
    Not bad ... :-)

    Thank you to all who responded, and for any further comments or suggestions.
    Paulo

  • From Paulo da Silva@21:1/5 to All on Fri Oct 1 17:12:44 2021
    XPost: alt.os.linux

    At 14:16 on 01/10/21, William Unruh wrote:
    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering! Almost complete freeze on my end. No
    keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
    still running in the background. I could not figure out how to unfreeze
    the google-meet full screen and had to do the power button thingy. Of
    course then another bug showed up. -- I sometimes run my laptop with a
    desktop monitor attached. Often the second or third time I reboot, the
    system seems to get completely confused and look for that second monitor
    as the default after it has run the "new hardware" search. It then times
    out (90 sec) on starting up akonidia(?) and then another 30 sec pause
    starting up something else, and spews out many pages of error/warning
    stuff before the boot process finishes. So it took almost 5 min to reboot
    in the midst of my seminar. Sheesh.

    (Dell XPS13- 9360 machine, onboard Intel video
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is having problems freezing the system. I
    suspect the video driver in my case.

    I never had any problem until recently - a couple of months or so. My
    computer's 2-year warranty expired in June :-)
    Unfortunately, in my case nothing works after a freeze.
    Even when it happens while listening to music, the sound gets stuck in a
    loop of about 1 second.

    Regards
    Paulo

  • From Paulo da Silva@21:1/5 to All on Fri Oct 1 17:05:58 2021
    XPost: alt.os.linux

    At 08:31 on 01/10/21, Paul wrote:
    On 10/1/2021 1:42 AM, Paulo da Silva wrote:
    At 06:29 on 01/10/21, J.O. Aho wrote:
    On 01/10/2021 05.05, Paulo da Silva wrote:

    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, when restarting the problem starts again!!! I
    tried this few times last couple of days.
    Notice that all these problems are relatively recent.

    If you mean the thermal issue, maybe you need to restart thermald after
    you wake up from suspension. It's not unknown that some programs do not
    work well with suspension.
    I tried that. No luck!
    I also played with some configurations, namely giving priority to freqs.,
    because I know that lowering them causes the zone0 temp. to drop quickly.
    BTW, in the meantime I remembered that the freeze also occurred, at least
    once, with the system in "Slow Mode", that is, at half of max. freq. with
    the powersave governor. That's why I suspect something related to the
    GPU - HW or SW.


    I would keep an eye open for how much memory plasmashell uses, if you
    see it creep over 1G, then it's time to restart it with "plasmashell
    --replace". Running top/htop once in a while should be ok.
    Yes, I had several issues with plasmashell in all my computers :-( . I
    have a script to handle them. I don't remember now what it does. Just
    keeps working :-)

    Thanks

    From a hardware perspective, some subsystems share power envelope
    because they're in the same package (Intel CPU and Intel HD 630).

    Or, they can share a common heatpipe, which means if one gets
    hot, both get hot (Intel CPU and NVidia GPU chip share heatpipe).
    Ah, this explains why in ondemand the GPU temperature still rises when
    using the CPU! Also, I could see that with powersave mode in the NVIDIA
    settings, which causes the NVIDIA GPU to shut down (it turns off), the
    GPU fan still sometimes starts.


    The NVidia chip, should have an NVidia driver which controls
    frequency and voltage as a function of "what limit you're hitting".
    On something like Furmark, you would be power limited. Maybe
    the GPU driver throttles (turns down clock) when the chip gets
    too warm. And this means, you could even be in a situation where
    a railed or turboed CPU causes the GPU to slow down.
    thermald is supposed to take action to lower the temperature. Per
    the configuration this should happen at 90C. After boot and before any
    suspension the temperature stabilizes at 97C. I don't know what is in
    control - it may be the BIOS controlling the fans, something in the
    kernel, or thermald. After suspension, something fails. The temperature
    rises to 110C and the emergency shutdown starts.
    BTW, as the temperature rises the fans always increase their speed.
    However, they never reach the "boost" RPM. If I boost them manually, the
    temperature drops.


    It's beyond my pay scale, to balance all these things, but from
    the looks of it, some feedback loop in your laptop is not working
    as expected. When the CPU goes above 100C, it should start throttling.
    The NVidia chip should have a throttle temperature too. And the
    NVidia throttle point should take the GPU temperature measurement
    error into account.
    As I said before, thermald should have taken actions at 90C. This is the
    order of the actions by priority (file /etc/thermald/thermal-cpu-cdev-order.xml):
    <!--
    Specifies the order of compensation to cool CPU only.
    There is a default already implemented in the code, but
    this file can be used to change order

    The Following cooling device can present

    -->
    <CoolingDeviceOrder>
    <!-- Specify Cooling device order -->
    <CoolingDevice>rapl_controller</CoolingDevice>
    <CoolingDevice>intel_pstate</CoolingDevice>
    <CoolingDevice>intel_powerclamp</CoolingDevice>
    <CoolingDevice>cpufreq</CoolingDevice>
    <CoolingDevice>Processor</CoolingDevice>
    </CoolingDeviceOrder>

    I tried to put the cpufreq line in first place, because I know for sure
    that lowering the cpufreq causes the temperature to drop quickly, but
    nothing happens. I wonder if thermald is doing anything at all ...
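
    One thing that can at least be ruled out is a malformed or mis-ordered file. The sketch below (the function name is invented here) parses a `<CoolingDeviceOrder>` document like the one above with Python's standard XML parser and returns the devices in the order listed. It only shows what the file says; it cannot tell whether thermald honors it.

```python
import xml.etree.ElementTree as ET


def cooling_device_order(xml_text):
    """Return the <CoolingDevice> names in document order."""
    root = ET.fromstring(xml_text)
    # The order list may be the root element or nested below it.
    if root.tag == "CoolingDeviceOrder":
        node = root
    else:
        node = root.find(".//CoolingDeviceOrder")
    if node is None:
        return []
    return [el.text.strip() for el in node.findall("CoolingDevice") if el.text]
```

    Feeding it the contents of /etc/thermald/thermal-cpu-cdev-order.xml raises `xml.etree.ElementTree.ParseError` if the file is not well-formed, for example when a comment is left unclosed after editing.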


    https://wiki.archlinux.org/title/NVIDIA/Tips_and_tricks

    ...
    Thanks for your insights and comments.

    I'm running out of time for this problem.
    For now I'm going with a reboot whenever I need intense computation.
    I'm also keeping NVIDIA in ondemand mode.
    Later I'll:
    1. Try to compile the latest version of thermald.
    2. Write a script, as just-in-case protection, to run as a service and
    lower the freqs. once the temperature reaches 99C, since normally it
    stabilizes at 97C and the critical temp. is 100C. This should be the role of thermald ...
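
    A protection script along the lines of item 2 might look like the sketch below. It is not a tested replacement for thermald. The 99C trigger and the 2.5 GHz / 5.1 GHz ceilings come from this thread, while the 85C release threshold, the one-second poll and all names are assumptions added for illustration. It has to run as root because it calls `cpupower`.

```python
import subprocess
import time
from pathlib import Path

ZONE_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")  # zone0, as in the thread
TRIGGER_C = 99.0     # act here: load normally settles at 97C, critical is 100C
RELEASE_C = 85.0     # assumption: hysteresis so the cap is not toggled constantly
SLOW_MAX = "2.5GHz"  # Slow Mode ceiling from the thread
FAST_MAX = "5.1GHz"  # Fast Mode ceiling from the thread


def next_state(temp_c, throttled):
    """Decide whether the frequency cap should be active after this reading."""
    if temp_c >= TRIGGER_C:
        return True
    if temp_c <= RELEASE_C:
        return False
    return throttled  # between the thresholds: keep the current state


def set_max_freq(freq):
    """Apply a new frequency ceiling with cpupower (needs root)."""
    subprocess.run(["cpupower", "frequency-set", "--max", freq], check=True)


def main(poll_seconds=1.0):
    """Poll zone0 forever and cap or restore the max frequency as needed."""
    throttled = False
    while True:
        temp_c = int(ZONE_TEMP.read_text()) / 1000.0
        wanted = next_state(temp_c, throttled)
        if wanted != throttled:
            set_max_freq(SLOW_MAX if wanted else FAST_MAX)
            throttled = wanted
        time.sleep(poll_seconds)
```

    A service unit would simply call `main()`. Restoring the Fast Mode ceiling automatically is a design choice; leaving the cap in place until the user lifts it would be the more cautious variant.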

    As time goes by I'll also watch for freezes, in the hope that ondemand mode
    avoids them.
    This is all too unclear to send the computer for repair.

    Thank you Paul.
    Paulo

  • From Joerg@21:1/5 to Paulo da Silva on Fri Oct 1 15:44:58 2021
    XPost: alt.os.linux

    On 10/1/21 9:12 AM, Paulo da Silva wrote:
    At 14:16 on 01/10/21, William Unruh wrote:
    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering!


    Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
    it was 20 minutes before the meeting.

    My tiny ARM-based Arch-Linux box also has the occasional failure. It
    just freezes and the CPU gets hot. Sometimes it happens after a few
    days, sometimes after a month. No cues in the log. So I gave that one up.


    ... Almost complete freeze on my end. No
    keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
    still running in the background. I could not figure out how to unfreeze
    the google-meet full screen and had to do the power button thingy. Of
    course then another bug showed up. -- I sometimes run my laptop with a
    desktop monitor attached. Often the second or third time I reboot, the
    system seems to get completely confused and look for that second monitor
    as the default after it has run the "new hardware" search. It then times
    out (90 sec) on starting up akonidia(?) and then another 30 sec pause
    starting up something else, and spews out many pages of error/warning
    stuff before the boot process finishes. So it took almost 5 min to reboot
    in the midst of my seminar. Sheesh.

    (Dell XPS13- 9360 machine, onboard Intel video
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is having problems freezing the system. I
    suspect the video driver in my case.

    I never had any problem until recently - a couple of months or so. My computer expired the 2 yrs warranty in June :-)


    It would not help you anyhow with an OS crash problem.

    Regarding the overtemp I assume you have looked whether there is one
    particular software that is very wasteful with processor resources. I
    had that with a morse code reading software so I no longer use it, and
    don't need it anymore.

    If something reaches a temperature limit with the fan fully blasting
    that is suspicious. I had that about two years ago and then I found the
    reason. We had adopted a dog and his fine hair got in there. So I had to
    reduce my PC fan cleaning intervals.


    Unfortunately in my case there is nothing working after the freezes.
    Even when it happens while listen to music, the sound entered in a +-1
    second loop.


    That almost cannot be hardware.

    --
    Regards, Joerg

    http://www.analogconsultants.com/

  • From Paulo da Silva@21:1/5 to All on Sat Oct 2 01:52:28 2021
    XPost: alt.os.linux

    At 23:44 on 01/10/21, Joerg wrote:
    On 10/1/21 9:12 AM, Paulo da Silva wrote:
    At 14:16 on 01/10/21, William Unruh wrote:
    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering!


    Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
    it was 20 minutes before the meeting.

    My tiny ARM-based Arch-Linux box also has the occasional failure. It
    just freezes and the CPU gets hot. Sometimes it happens after a few
    days, sometimes after a month. No cues in the log. So I gave that one up.


    ... Almost complete freeze on my end. No
    keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
    still running in the background. I could not figure out how to unfreeze
    the google-meet full screen and had to do the power button thingy. Of
    course then another bug showed up. -- I sometimes run my laptop with a
    desktop monitor attached. Often the second or third time I reboot, the
    system seems to get completely confused and look for that second monitor
    as the default after it has run the "new hardware" search. It then times
    out (90 sec) on starting up akonidia(?) and then another 30 sec pause
    starting up something else, and spews out many pages of error/warning
    stuff before the boot process finishes. So it took almost 5 min to reboot
    in the midst of my seminar. Sheesh.

    (Dell XPS13- 9360 machine, onboard Intel video
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
    Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
    (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
    2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is having problems freezing the system. I
    suspect the video driver in my case.

    I never had any problem until recently - a couple of months or so. My
    computer expired the 2 yrs warranty in June :-)


    It would not help you anyhow with an OS crash problem.

    Regarding the overtemp I assume you have looked whether there is one particular software that is very wasteful with processor resources. I
    had that with a morse code reading software so I no longer use it, and
    don't need it anymore.

    If something reaches a temperature limit with the fan fully blasting
    that is suspicious. I had that about two years ago and then I found the
    reason. We had adopted a dog and his fine hair got in there. So I had to
    reduce my PC fan cleaning intervals.
    Here I get that CPU situation lots of times.
    I have lots of very CPU/GPU-intensive tasks.
    Anyway, as soon as I put the PC in Fast Mode (max freqs and governor
    performance), almost anything I do, sometimes even scrolling a browser
    page like Fb, causes the fans to raise their RPM. They also come back to
    almost idle relatively fast as soon as I stop.


    Unfortunately in my case there is nothing working after the freezes.
    Even when it happens while listen to music, the sound entered in a +-1
    second loop.


    That almost cannot be hardware.

    Hopefully not. There is one thing that doesn't allow me to rule out
    HW: from time to time, the fans go up to high RPM (noisy) for about 1
    to 5 seconds and then drop abruptly. The PC is doing nothing. This
    also began to occur lately. As far as I know, it is the BIOS that controls
    the fans.

    Thanks Joerg.

  • From Joerg@21:1/5 to Paulo da Silva on Sat Oct 2 11:15:05 2021
    XPost: alt.os.linux

    On 10/1/21 5:52 PM, Paulo da Silva wrote:
    At 23:44 on 01/10/21, Joerg wrote:
    On 10/1/21 9:12 AM, Paulo da Silva wrote:
    At 14:16 on 01/10/21, William Unruh wrote:
    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering!


    Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
    it was 20 minutes before the meeting.

    My tiny ARM-based Arch-Linux box also has the occasional failure. It
    just freezes and the CPU gets hot. Sometimes it happens after a few
    days, sometimes after a month. No cues in the log. So I gave that one up.


    ... Almost complete freeze on my end. No
    keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
    still running in the background. I could not figure out how to unfreeze
    the google-meet full screen and had to do the power button thingy. Of
    course then another bug showed up. -- I sometimes run my laptop with a
    desktop monitor attached. Often the second or third time I reboot, the
    system seems to get completely confused and look for that second monitor
    as the default after it has run the "new hardware" search. It then times
    out (90 sec) on starting up akonidia(?) and then another 30 sec pause
    starting up something else, and spews out many pages of error/warning
    stuff before the boot process finishes. So it took almost 5 min to reboot
    in the midst of my seminar. Sheesh.

    (Dell XPS13- 9360 machine, onboard Intel video
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
    Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
    (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
    2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is having problems freezing the system. I
    suspect the video driver in my case.

    I never had any problem until recently - a couple of months or so. My
    computer expired the 2 yrs warranty in June :-)


    It would not help you anyhow with an OS crash problem.

    Regarding the overtemp I assume you have looked whether there is one
    particular software that is very wasteful with processor resources. I
    had that with a morse code reading software so I no longer use it, and
    don't need it anymore.

    If something reaches a temperature limit with the fan fully blasting
    that is suspicious. I had that about two years ago and then I found the
    reason. We had adopted a dog and his fine hair got in there. So I had to
    reduce my PC fan cleaning intervals.
    Here I got that CPU situation lots of times.
    I have lots of tasks very CPU/GPU intensive.
    Anyway, as soon as I put the PC in Fast mode (max freqs and governor performance) almost anything I do, sometimes even scrolling a browser
    page like Fb, causes the fans to rise RPM. ...


    Can you watch the CPU load percentage when that happens? I keep that
    reading on the task bar so I can see when something becomes a MIPS
    burner. I do the same with memory usage (mainly to see when Firefox has
    reached too much memory leakage).
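
    For anyone without such a task bar widget, the load percentage can be approximated from two samples of the first line of /proc/stat. The sketch below assumes the standard field order on that line (user, nice, system, idle, iowait, irq, softirq, steal, ...); the function names are made up for this example.

```python
def parse_cpu_line(line):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Count idle and iowait as "not busy".
    idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
    # user..steal; guest time is already accounted for inside user/nice.
    total = sum(fields[:8])
    return total - idle, total


def load_percent(before, after):
    """CPU load in percent between two samples of the 'cpu' line."""
    busy0, total0 = parse_cpu_line(before)
    busy1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return 100.0 * (busy1 - busy0) / elapsed
```

    Reading the first line of /proc/stat, sleeping a second, reading it again and passing both strings to `load_percent` gives the average load over that second, which can then be logged next to the fan RPM.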


    ... Also they come back to almost
    idle relatively fast when I just stop.


    That is strange. When I do lengthy SPICE simulations where the CPU goes
    to almost 100% workload the fans remain on full for half a minute or so.

    But anyhow, if this huge increase and then decay happens with much less
    than 100% CPU load that would point to a mechanical problem. Pet hair in
    the fan path, thermal paste under the heatsink dried up, something like
    that.



    Unfortunately in my case there is nothing working after the freezes.
    Even when it happens while listen to music, the sound entered in a +-1
    second loop.


    That almost cannot be hardware.

    Hopefully not. There is one occurrence which doesn't allow me to discard
    HW: From times to times, the fans go up to big RPM (noisy) for about 1
    to 5 seconds and then follow down abruptly.The PC is doing nothing. This
    also began to occur lately. As much as I know, is the BIOS that controls
    the fans.


    I don't know much about Ubuntu flavors (using MX-Linux myself) but the
    fan speed can also be controlled by the OS, depending on how your
    Kubuntu is configured:

    https://askubuntu.com/questions/22108/how-to-control-fan-speed

    Sometimes hardware (or a BIOS) does this on purpose. For example, my
    DOCSIS modem for internet access has a fan that never needs to come on
    because we never stream movies and stuff like that. Very little work for
    the processor. Sometimes the fan still goes to full blast for a few
    seconds, then off. I guess they programmed it that way to avoid the fan becoming "caked up" and stuck. Just like with a power generator, you
    have to run it once a month or it might not start in a crisis situation.


    Thanks Joerg.


    As a co-worker once said, we are all here to serve :-)

    --
    Regards, Joerg

    http://www.analogconsultants.com/

  • From Paul@21:1/5 to Joerg on Sat Oct 2 14:27:52 2021
    XPost: alt.os.linux

    On 10/2/2021 2:15 PM, Joerg wrote:

    That is strange. When I do lengthy SPICE simulations where the CPU
    goes to almost 100% workload the fans remain on full for half a minute or so.

    You have a good eye for time there.

    For one of the Intel turbo boost options, the default time constant
    is either 28 seconds or 56 seconds. This sounds like the 28 second version.

    If you visit one of the enthusiast computer sites, they have
    articles on the turbo boost feature. For example, a 65W processor
    will jump up to 224W output for 28 seconds, before throttling back.
    This accelerates short intense jobs, at the expense of your
    nerves :-)
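
    On Linux, the limits and time windows a machine is actually configured with can often be read from the powercap sysfs interface. The sketch below assumes the intel_rapl driver is loaded and exposes a domain directory such as /sys/class/powercap/intel-rapl:0 with `constraint_N_name`, `constraint_N_power_limit_uw` and `constraint_N_time_window_us` files; the function name is invented here, and some systems restrict these files to root.

```python
from pathlib import Path


def rapl_constraints(domain_dir):
    """Return {constraint name: (power limit in W, time window in s)}."""
    domain = Path(domain_dir)
    result = {}
    for name_file in sorted(domain.glob("constraint_*_name")):
        prefix = name_file.name[: -len("name")]  # e.g. "constraint_0_"
        limit_file = domain / (prefix + "power_limit_uw")
        window_file = domain / (prefix + "time_window_us")
        if not (limit_file.exists() and window_file.exists()):
            continue  # some constraints expose no time window
        watts = int(limit_file.read_text()) / 1e6      # microwatts -> watts
        seconds = int(window_file.read_text()) / 1e6   # microseconds -> seconds
        result[name_file.read_text().strip()] = (watts, seconds)
    return result
```

    On a typical Intel system the result contains a `long_term` and a `short_term` entry, which makes it easy to see whether the long-term window is near the 28 or the 56 second value mentioned above.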

    On the overclocker machines, there is also an option on a
    number of desktop motherboards, to run the CPU constantly
    at 125W, running the CPU clock above baseline until the
    compute job is finished. The BIOS setting for this, may not
    explain at all, what it is doing. This is why you buy motherboards
    with more phases or nicer heatsinks, so that the thing will not
    be stressed too much by the behavior.

    These are the kinds of things where you put a Kill A Watt meter
    on the wall plug, so that you can characterize what kind of
    policy is being used at the moment.

    This is no longer the "it says it's a 65W CPU and it always
    draws 65W" era. It is a lot crazier than that. "TDP means
    nothing" is the rule of the day.

    Paul

  • From Peter@21:1/5 to Paulo da Silva on Sun Oct 3 19:22:20 2021
    XPost: alt.os.linux

    On 01.10.2021 05:05, Paulo da Silva wrote:
    At 18:22 on 27/09/21, Paulo da Silva wrote:
    ...

    Maybe check out GreenWithEnvy (GWE)? It's an Afterburner-like app for
    Linux. I do some gaming on my computer that is at times GPU heavy, and I
    use GWE to control the GPU fans and temp during heavy GPU load. You set
    up a temp/RPM curve and it controls the fans dynamically.

    Peter

  • From Paulo da Silva@21:1/5 to All on Mon Oct 4 19:58:55 2021
    XPost: alt.os.linux

    At 19:15 on 02/10/21, Joerg wrote:
    On 10/1/21 5:52 PM, Paulo da Silva wrote:
    At 23:44 on 01/10/21, Joerg wrote:
    On 10/1/21 9:12 AM, Paulo da Silva wrote:
    At 14:16 on 01/10/21, William Unruh wrote:
    I am getting an occasional freeze as well. Yesterday, in the midst of a
    Google Meet seminar I was delivering!


    Interesting. I also had that happen yesterday, on MX-Linux 19. Luckily
    it was 20 minutes before the meeting.

    My tiny ARM-based Arch-Linux box also has the occasional failure. It
    just freezes and the CPU gets hot. Sometimes it happens after a few
    days, sometimes after a month. No cues in the log. So I gave that one up.


    ... Almost complete freeze on my end. No
    keys worked, screen frozen. Except that the people watching could still
    hear me and see me and I could hear them. Alt-ctrl-F2 worked, so Linux was
    still running in the background. I could not figure out how to unfreeze
    the google-meet full screen and had to do the power button thingy. Of
    course then another bug showed up. -- I sometimes run my laptop with a
    desktop monitor attached. Often the second or third time I reboot, the
    system seems to get completely confused and look for that second monitor
    as the default after it has run the "new hardware" search. It then times
    out (90 sec) on starting up akonidia(?) and then another 30 sec pause
    starting up something else, and spews out many pages of error/warning
    stuff before the boot process finishes. So it took almost 5 min to reboot
    in the midst of my seminar. Sheesh.

    (Dell XPS13- 9360 machine, onboard Intel video
    00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v6/7th Gen Core
    Processor Host Bridge/DRAM Registers (rev 02)
    00:02.0 VGA compatible controller: Intel Corporation HD Graphics 620
    (rev 02)

    Mageia 8, kernel
    Linux planet 5.10.37-server-2.mga8 #1 SMP Mon May 17 17:44:38 UTC
    2021 x86_64 x86_64 x86_64 GNU/Linux

    So yes, something in Linux is having problems freezing the system. I
    suspect the video driver in my case.

    I never had any problem until recently - a couple of months or so. My
    computer expired the 2 yrs warranty in June :-)


    It would not help you anyhow with an OS crash problem.

    Regarding the overtemp I assume you have looked whether there is one
    particular software that is very wasteful with processor resources. I
    had that with a morse code reading software so I no longer use it, and
    don't need it anymore.

    If something reaches a temperature limit with the fan fully blasting
    that is suspicious. I had that about two years ago and then I found the
    reason. We had adopted a dog and his fine hair got in there. So I had to
    reduce my PC fan cleaning intervals.
    Here I got that CPU situation lots of times.
    I have lots of tasks very CPU/GPU intensive.
    Anyway, as soon as I put the PC in Fast mode (max freqs and governor
    performance) almost anything I do, sometimes even scrolling a browser
    page like Fb, causes the fans to rise RPM. ...


    Can you watch the CPU load percentage when that happens? I keep that
    reading on the task bar so I can see when something becomes a MIPS
    burner. I do the same with memory usage (mainly to see when Firefox has
    reached too much memory leakage).
    This has nothing to do with overload. It depends largely on the clock
    frequencies, which are also driven by the "performance" governor.
    As soon as the CPUs do any work, the fans tend to raise their RPM. It
    doesn't take much work. The same always happens in Windows, where I was
    not able to control these things.



                                   ... Also they come back to almost
    idle relatively fast when I just stop.


    That is strange. When I do lengthy SPICE simulations where the CPU goes
    to almost 100% workload the fans remain on full for half a minute or so.

    But anyhow, if this huge increase and then decay happens with much less
    than 100% CPU load that would point to a mechanical problem. Pet hair in
    the fan path, thermal paste under the heatsink dried up, something like
    that.

    No HW problems at this level here, for sure.
    The system handled all the temperature stuff correctly until recently.
    Aside from the strange freeze problem - which hasn't occurred again so
    far! - the PC handles the fullcpu temperatures correctly, except after
    suspend to RAM/wake. This didn't happen before. Probably some update
    broke something in the system. It also works fine for fullcpu in Windows.



    Unfortunately in my case there is nothing working after the freezes.
    Even when it happens while listen to music, the sound entered in a +-1
    second loop.


    That almost cannot be hardware.

    Hopefully not. There is one occurrence which doesn't allow me to discard
    HW: From times to times, the fans go up to big RPM (noisy) for about 1
    to 5 seconds and then follow down abruptly.The PC is doing nothing. This
    also began to occur lately. As much as I know, is the BIOS that controls
    the fans.


    I don't know much about Ubuntu flavors (using MX-Linux myself) but the
    fan speed can also be controlled by the OS, depending on how your
    Kubuntu is configured:

    https://askubuntu.com/questions/22108/how-to-control-fan-speed

    Sometimes hardware (or a BIOS) does this on purpose. For example, my
    DOCSIS modem for internet access has a fan that never needs to come on
    because we never stream movies and stuff like that. Very little work for
    the processor. Sometimes the fan still goes to full blast for a few
    seconds, then off. I guess they programmed it that way to avoid the fan
    becoming "caked up" and stuck. Just like with a power generator, you
    have to run it once a month or it might not start in a crisis situation.

    Maybe. Those situations have not occurred again! I'm not having spikes of
    rising RPM now :-)

    Let's see what happens.
    So far I am running in slow mode, and I need to be careful to reboot
    first before going to fast mode. As soon as possible, I'll write a small
    protection script to lower the freqs. when the temperature reaches 99C or
    more on zone0.
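
    A minimal sketch of such a protection script, in Python. It is only an
    illustration under assumptions: the sysfs paths are the standard kernel
    ones, while the 2.5 GHz cap, the 1-second poll and all function names
    are made up here. Writing scaling_max_freq needs root.

```python
import glob
import time

ZONE_TEMP = "/sys/class/thermal/thermal_zone0/temp"  # value is in millidegrees C
MAX_FREQ_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_max_freq"
SLOW_KHZ = 2_500_000  # 2.5 GHz cap, as in Slow Mode (assumed value)
LIMIT_C = 99.0        # act just below the 100 C critical temperature


def read_temp_c(path=ZONE_TEMP):
    """Return the zone temperature in degrees Celsius."""
    with open(path) as f:
        return int(f.read().strip()) / 1000.0


def should_throttle(temp_c, limit_c=LIMIT_C):
    """True once the temperature has reached the limit."""
    return temp_c >= limit_c


def cap_frequency(khz=SLOW_KHZ, pattern=MAX_FREQ_GLOB):
    """Write the cap (in kHz) to every matching scaling_max_freq file.

    Returns the number of files written.
    """
    paths = glob.glob(pattern)
    for path in paths:
        with open(path, "w") as f:
            f.write(str(khz))
    return len(paths)


def guard(poll_seconds=1.0):
    """Poll zone0 and cap the frequency once when the limit is reached."""
    while True:
        if should_throttle(read_temp_c()):
            cap_frequency()
            return
        time.sleep(poll_seconds)
```

    Calling guard() as root would block until zone0 reaches the limit and
    then cap all cores once. Going back to Fast Mode stays a manual step.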

    BTW, I tried the latest version of thermald. No success!!! I still can't
    understand why thermald "refuses" to work! Looking at the source is a
    no-go for me. Too much work ...

    Thanks.
    Paulo

  • From Carlos E.R.@21:1/5 to Paulo da Silva on Sun Oct 17 13:07:54 2021
    XPost: alt.os.linux

    On 01/10/2021 05.05, Paulo da Silva wrote:
    At 18:22 on 27/09/21, Paulo da Silva wrote:
    Hi all!

    ...


    First let me explain how I use this computer.

    I have a startup (boot) command - cpupower - to set the max. freq. to
    2.5 GHz. Its max. value used to be 5.1 GHz. It also sets the governor to
    powersave. Let me call this Slow Mode.
    This causes my computer to be quiet, with very low RPM on both fans. When
    unplugged from the charger they are most of the time at 0 RPM. Since it
    is quite fast, this isn't noticeable.
    When I need power, which rarely happens - training AIs or processing
    large amounts of data - I set it to the max. freq. and the governor to
    performance. I also do this for my FS :-) Let me call this Fast Mode.
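
    For illustration, the two modes could be written down as cpupower
    calls. This is a sketch, not the actual boot command: --max and
    --governor are options of "cpupower frequency-set", while the mode
    table and the function name are invented here. The printed commands
    would have to be run as root.

```python
import shlex

# Values taken from the description above; the dict layout is an assumption.
MODES = {
    "slow": {"max_freq": "2.5GHz", "governor": "powersave"},
    "fast": {"max_freq": "5.1GHz", "governor": "performance"},
}


def cpupower_command(mode):
    """Build the argv list that would switch the CPU to the given mode."""
    cfg = MODES[mode]
    return [
        "cpupower", "frequency-set",
        "--max", cfg["max_freq"],
        "--governor", cfg["governor"],
    ]


# Print the commands instead of running them, so nothing changes by accident.
for mode in MODES:
    print(mode, "->", shlex.join(cpupower_command(mode)))
```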

    Now, about this problem ...

    1. I configured nvidia to ondemand. The freeze problem has not occurred
    since. But since it may not occur for a month or more, it's
    inconclusive yet. Anyway, from lots of things I have been reading, it is
    very likely that the BIOS, for some reason, can't cool something that is
    going wrong and just freezes the computer. So, no logs. Once again,
    inconclusive.

    2. A new problem
    When in Fast Mode, running a full-CPU job causes a shutdown.
    This time there is a log entry:
    "thermal thermal_zone0: critical temperature reached (110 C), shutting
    down"
    So, I tried to analyze the problem.

    I monitored the zone0 temperature and could see it go up to 102C. Then
    the computer initiates the emergency shutdown. So, the monitor probably
    gets killed. Notice that the critical temp. for this zone is 100C.
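
    Since the log mentions 110 C while the zone is said to be critical at
    100C, it may help to read the trip points the kernel itself exposes
    for zone0. A small sketch, assuming the usual sysfs layout
    (trip_point_N_type / trip_point_N_temp); the function names are made
    up for this example.

```python
import glob
import os
import re


def read_trip_points(zone_dir="/sys/class/thermal/thermal_zone0"):
    """Return a sorted list of (index, trip_type, temp_c) for one zone."""
    trips = []
    for type_path in glob.glob(os.path.join(zone_dir, "trip_point_*_type")):
        index = int(re.search(r"trip_point_(\d+)_type$", type_path).group(1))
        temp_path = os.path.join(zone_dir, f"trip_point_{index}_temp")
        with open(type_path) as f:
            trip_type = f.read().strip()
        with open(temp_path) as f:
            temp_c = int(f.read().strip()) / 1000.0  # sysfs uses millidegrees
        trips.append((index, trip_type, temp_c))
    return sorted(trips)


def critical_temp_c(trips):
    """Return the temperature of the first 'critical' trip, or None."""
    for _, trip_type, temp_c in trips:
        if trip_type == "critical":
            return temp_c
    return None
```

    Comparing critical_temp_c(read_trip_points()) right after boot and
    again after a suspend/wake cycle would show whether the kernel's view
    of the limits changes.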

    I tried again but when the temp. of zone0 reached 99C I put the fans in
    boost mode (max. speed) and the temperature dropped and got stable at 97C.
    I tried this again, but now I just put the computer in Slow Mode. The
    temp. drops to 40-50C!

    So, why do neither thermald nor even the BIOS use these resources to drop
    the temperature? In fact the fans rotate at higher speed but do not
    reach the 6k RPM of boost mode. I tried several configurations for
    thermald, including giving priority to acting on freqs. No success. It
    seems that thermald doesn't care at all about its configs.

    Finally
    =======
    If I reboot the computer:
    Then it seems OK.
    I put it in Fast Mode, execute a fullcpu job and zone0 temp. keeps
    stable at 97C!
    If I suspend the computer, the problem starts again after resuming!!! I
    tried this a few times in the last couple of days.
    Notice that all these problems are relatively recent.

    By the way ... in Windows this problem does not occur.

    So:
    SW problem, after an upgrade perhaps? HW problem? Both?
    I feel lost ...
    As soon as I get some time, I'm thinking of installing a new distro in a
    different partition to see what happens there.
    Until then, I need to reboot before I start a CPU-intensive job.
    Not bad ... :-)

    Thank you to all who responded and for any further comments or
    suggestions.
    Paulo

    I have used two machines with limited cooling; one is a mini computer
    box, fanless (idea is to be put on sitting room by the TV). When it is
    doing something intense, it overheats and it throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this; be running normally with a small
    load, but sprint on demand if the user needs to run something. But they
    can not keep up the load for a long time because they have no fan, or a
    too small fan.

    Now, I did not install any daemon or configure anything, it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is "Intel(R) Pentium(R) CPU N3710 @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

    In both cases I installed openSUSE Leap 15

    --
    Cheers, Carlos.

  • From Paulo da Silva@21:1/5 to All on Sun Oct 17 19:15:34 2021
    XPost: alt.os.linux

    At 12:07 on 17/10/21, Carlos E.R. wrote:

    I have used two machines with limited cooling; one is a mini computer
    box, fanless (idea is to be put on sitting room by the TV). When it is
    doing something intense, it overheats and it throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this; be running normally with a small
    load, but sprint on demand if the user needs to run something. But they
    can not keep up the load for a long time because they have no fan, or a
    too small fan.

    Now, I did not install any daemon or configure anything, it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is "Intel(R) Pentium(R) CPU  N3710  @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

    In both cases I installed openSUSE Leap 15

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the problem of installing certain types of
    SW - lack of information or packages, and the unavailability of some
    library sources for development. In Debian-likes I just need to install
    <lib name>-dev. One example was libgcrypt20.

    Regards
    Paulo

  • From tom@21:1/5 to Paulo da Silva on Mon Oct 18 00:38:11 2021
    XPost: alt.os.linux

    On Mon, 27 Sep 2021 18:22:21 +0100
    Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

    Hi all!

    From time to time - may be a month or a couple of hours - my computer completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    I changed my NVIDIA driver to 470. I also tried to put the driver in
    ondemand status. No success. Sooner or later it freezes.

    Is there a way to get some information on what this is happening?

    I am using kubuntu 20.04.

    Thank you.

    sounds like something to do with the ram. Disable XMP.

  • From Carlos E.R.@21:1/5 to Paulo da Silva on Mon Oct 18 14:26:29 2021
    XPost: alt.os.linux

    On 17/10/2021 20.15, Paulo da Silva wrote:
    At 12:07 on 17/10/21, Carlos E.R. wrote:



    I have used two machines with limited cooling; one is a mini computer
    box, fanless (idea is to be put on sitting room by the TV). When it is
    doing something intense, it overheats and it throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this; be running normally with a small
    load, but sprint on demand if the user needs to run something. But they
    can not keep up the load for a long time because they have no fan, or a
    too small fan.

    Now, I did not install any daemon or configure anything, it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
    "Intel(R) Pentium(R) CPU  N3710  @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB SSD/15.6"

    In both cases I installed openSUSE Leap 15

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    Isengard:~ # ps afx | grep thermal
      615 ?        I<     0:00  \_ [acpi_thermal_pm]
    23830 pts/23   S+     0:00          \_ grep --color=auto thermal
    Isengard:~ #

    I'm not running thermald.


    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the problem of installing certain types of
    SW - lack of information or packages, and the unavailability of some
    library sources for development. In Debian-likes I just need to install
    <lib name>-dev. One example was libgcrypt20.

    What? All sources are available in openSUSE.


    http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm


    You just need to activate the sources repo in YaST. If some particular
    package is missing the source, report a bug.


    If you just need the files to compile some other thing, you need the
    libname-devel package instead.

    http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm


    --
    Cheers, Carlos.

  • From J.O. Aho@21:1/5 to Paulo da Silva on Mon Oct 18 15:47:25 2021
    XPost: alt.os.linux

    On 17/10/2021 20.15, Paulo da Silva wrote:

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    I know I did tell you to try reloading the thermald service, and you
    said it didn't make any difference. What about:
    - stop thermald
    - rmmod the cpu temp module
    - modprobe the cpu temp module
    - start thermald

    I'm not even sure if you can remove the module.
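
    The steps above can be sketched as a dry-run helper. The thread does
    not name the CPU temperature module, so "coretemp" below is only an
    assumption, as are the function names and the "thermald" service
    name. With dry_run=True the commands are printed, not executed.

```python
import subprocess


def reload_steps(module, service="thermald"):
    """Return the four commands of the stop/rmmod/modprobe/start sequence."""
    return [
        ["systemctl", "stop", service],
        ["rmmod", module],
        ["modprobe", module],
        ["systemctl", "start", service],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it as well when dry_run is False.

    Stops at the first failing command (check=True raises).
    Returns the number of steps processed.
    """
    count = 0
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
        count += 1
    return count


# Hypothetical module name; check lsmod for the one actually loaded.
run_steps(reload_steps("coretemp"), dry_run=True)
```

    If rmmod refuses because the module is built in or in use, the
    sequence stops there, which already answers whether the module can be
    removed at all.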


    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the problem of installing certain types of
    SW - lack of information or packages, and the unavailability of some
    library sources for development.

    I did run openSUSE at my two previous jobs. Sure, there were shortcomings
    with getting packages, but as Carlos already pointed out the dev
    packages are in a different repository. And of course you can get hold
    of all the SRPMs too in case you want to make some changes to a package.

    It's not the distro I would use at home; for me metadistributions have
    been more to my taste, except for the time it takes to build all the
    packages.


    --

    //Aho

  • From Paul@21:1/5 to tom on Mon Oct 18 11:48:36 2021
    XPost: alt.os.linux

    On 10/18/2021 3:38 AM, tom wrote:
    On Mon, 27 Sep 2021 18:22:21 +0100
    Paulo da Silva <p_d_a_s_i_l_v_a_ns@nonetnoaddress.pt> wrote:

    Hi all!

    From time to time - may be a month or a couple of hours - my computer
    completely freezes. Everything stops. The screen shows the last image.
    Not even the cursor moves. No keyboard key works including the
    Alt-PrtScreen keys, like REISUB.

    I need to press the power on/off button for 5 secs to restart it.

    After restart the journalctl -b -b1 shows nothing at the freeze time.

    I changed my NVIDIA driver to 470. I also tried to put the driver in
    ondemand status. No success. Sooner or later it freezes.

    Is there a way to get some information on what this is happening?

    I am using kubuntu 20.04.

    Thank you.

    sounds like something to do with the ram. Disable XMP.

    But it's a freezing problem.

    If the memory was bad, you'd expect the odd crash. Linux
    seems to be pretty resistant to bad memory (like kernel panic),
    when I tested with some unstable memory, so I don't think
    the symptom description is a good match for it.

    *******

    To test memory, the latest memory test...
    The download is compressed, so it's 9MB or so,
    but expands to a larger file in Archive Manager.

    https://www.memtest86.com/downloads/memtest86-usb.zip

    memtest86-usb.img 500*1048576 bytes, nice for dd to USB stick

    It can be "dd" transferred to a USB stick. It's a little
    slow at startup, as it sniffs around the hardware, but
    the traditional memory test interface eventually appears.
    This would be good for that new UEFI-only PC you bought.
    (My old copy of memtest would not run, because it got
    into a boot loop with the GOP video code. It would
    restart every time the screen tried to update.)

    My processor draws 30W while that is running,
    versus 65W while Prime95 does a thermal test.

    Paul

  • From Paulo da Silva@21:1/5 to All on Tue Oct 19 01:49:32 2021
    XPost: alt.os.linux

    At 13:26 on 18/10/21, Carlos E.R. wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:
    At 12:07 on 17/10/21, Carlos E.R. wrote:



    I have used two machines with limited cooling; one is a mini computer
    box, fanless (idea is to be put on sitting room by the TV). When it is
    doing something intense, it overheats and it throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this; be running normally with a small
    load, but sprint on demand if the user needs to run something. But they
    can not keep up the load for a long time because they have no fan, or a
    too small fan.

    Now, I did not install any daemon or configure anything, it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
    "Intel(R) Pentium(R) CPU  N3710  @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
    SSD/15.6"

    In both cases I installed openSUSE Leap 15

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    Isengard:~ # ps afx | grep thermal
      615 ?        I<     0:00  \_ [acpi_thermal_pm]
    23830 pts/23   S+     0:00          \_ grep --color=auto thermal
    Isengard:~ #

    I'm not running thermald.

    Yes! The BIOS and/or the kernel should be enough to avoid temperature
    problems. thermald should at least be a last-resort protection.
    None of them keeps the temperature from rising after a suspend!
    At least one of them does before any suspend has occurred: the
    temperature never rises above 97ºC.
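
    One way to look for a difference between "fresh boot" and "after
    suspend" is to snapshot the kernel's cooling devices in both
    situations while the full-CPU job runs. This is only a diagnostic
    sketch: the sysfs files (type, cur_state, max_state) are the standard
    ones, the function names are invented here.

```python
import glob
import os


def _read(path):
    """Read one sysfs attribute as stripped text."""
    with open(path) as f:
        return f.read().strip()


def snapshot_cooling_devices(base="/sys/class/thermal"):
    """Return {device_name: (type, cur_state, max_state)}.

    Devices whose attributes cannot be read or parsed are skipped.
    """
    snap = {}
    for dev in sorted(glob.glob(os.path.join(base, "cooling_device*"))):
        try:
            snap[os.path.basename(dev)] = (
                _read(os.path.join(dev, "type")),
                int(_read(os.path.join(dev, "cur_state"))),
                int(_read(os.path.join(dev, "max_state"))),
            )
        except (OSError, ValueError):
            continue
    return snap


def changed_devices(before, after):
    """Names whose entry differs between two snapshots (or is missing)."""
    names = before.keys() | after.keys()
    return sorted(n for n in names if before.get(n) != after.get(n))
```

    If the cur_state values climb under load after a fresh boot but stay
    at 0 after a suspend, that would suggest the kernel has stopped
    throttling, independent of thermald.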



    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the problem of installing certain types of
    SW - lack of information or packages, and the unavailability of some
    library sources for development. In Debian-likes I just need to install
    <lib name>-dev. One example was libgcrypt20.

    What? All sources are available in openSUSE.


    http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm



    You just need to activate the sources repo in YaST. If some particular
    package is missing the source, report a bug.


    If you just need the files to compile some other thing, you need the
    libname-devel package instead.

    http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

    Yes, now they are. But they weren't when I needed them.
    Maybe I'll give openSUSE a try again.

    Regards.
    Paulo

  • From Paulo da Silva@21:1/5 to All on Tue Oct 19 02:14:47 2021
    XPost: alt.os.linux

    At 14:47 on 18/10/21, J.O. Aho wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    I know I did tell you to try reloading the thermald service, and you
    said it didn't make any difference. What about:
     - stop thermald
     - rmmod the cpu temp module
     - modprobe the cpu temp module
     - start thermald

    I'm not even sure if you can remove the module.
    Good idea, but unfortunately it didn't work!
    I managed to remove all thermal-related modules and loaded them
    again. No success! The temp keeps rising until I kill the full-CPU test
    script!



    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the problem of installing certain types of
    SW - lack of information or packages, and the unavailability of some
    library sources for development.

    I did run openSUSE at my two previous jobs. Sure, there were shortcomings
    with getting packages, but as Carlos already pointed out the dev
    packages are in a different repository. And of course you can get hold
    of all the SRPMs too in case you want to make some changes to a package.

    It's not the distro I would use at home; for me metadistributions have
    been more to my taste, except for the time it takes to build all the
    packages.

    A few years ago I used Gentoo for a long time.
    Then it became too tedious managing all that stuff: configurations, USE
    flags, ...
    Besides, from time to time I got some compilation problems and needed
    to make some "hacks".

    Thanks.
    Paulo

  • From Paulo da Silva@21:1/5 to All on Tue Oct 19 03:08:11 2021
    XPost: alt.os.linux

    At 01:49 on 19/10/21, Paulo da Silva wrote:
    At 13:26 on 18/10/21, Carlos E.R. wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:
    At 12:07 on 17/10/21, Carlos E.R. wrote:



    I have used two machines with limited cooling; one is a mini computer
    box, fanless (idea is to be put on sitting room by the TV). When it is
    doing something intense, it overheats and it throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this; be running normally with a small
    load, but sprint on demand if the user needs to run something. But they
    can not keep up the load for a long time because they have no fan, or a
    too small fan.

    Now, I did not install any daemon or configure anything, it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
    "Intel(R) Pentium(R) CPU  N3710  @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
    SSD/15.6"

    In both cases I installed openSUSE Leap 15

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    Isengard:~ # ps afx | grep thermal
      615 ?        I<     0:00  \_ [acpi_thermal_pm]
    23830 pts/23   S+     0:00          \_ grep --color=auto thermal
    Isengard:~ #

    I'm not running thermald.

    Yes! The BIOS and/or the kernel should be enough to avoid temperature
    problems. thermald should at least be a last-resort protection.
    None of them keeps the temperature from rising after a suspend!
    At least one of them does before any suspend has occurred: the
    temperature never rises above 97ºC.

    I rebooted and stopped thermald, and the temperature at full CPU
    still stabilizes at 97ºC. So, thermald seems to be doing nothing at all.

    Regards.
    Paulo

  • From J.O. Aho@21:1/5 to Paulo da Silva on Tue Oct 19 07:55:56 2021
    XPost: alt.os.linux

    On 19/10/2021 03.14, Paulo da Silva wrote:
    At 14:47 on 18/10/21, J.O. Aho wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    I know I did tell you to try reloading the thermald service, and you
    said it didn't make any difference. What about:
     - stop thermald
     - rmmod the cpu temp module
     - modprobe the cpu temp module
     - start thermald

    I'm not even sure if you can remove the module.
    Good idea, but unfortunately it didn't work!
    I managed to remove all thermal-related modules and loaded them
    again. No success! The temp keeps rising until I kill the full-CPU test
    script!

    Take a look at this thread on GitHub:
    https://github.com/intel/thermal_daemon/issues/268

    In the comment
    https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
    it's mentioned that thermald works after a suspend once a patched
    version is used.

    As I understand you can increase the debug information to get more info
    about what thermald is doing, that could maybe help while trying to
    figure it out.

    --

    //Aho

  • From Paulo da Silva@21:1/5 to All on Wed Oct 20 00:09:05 2021
    XPost: alt.os.linux

    At 06:55 on 19/10/21, J.O. Aho wrote:
    On 19/10/2021 03.14, Paulo da Silva wrote:
    At 14:47 on 18/10/21, J.O. Aho wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    I know I did tell you to try reloading the thermald service, and you
    said it didn't make any difference. What about:
      - stop thermald
      - rmmod the cpu temp module
      - modprobe the cpu temp module
      - start thermald

    I'm not even sure if you can remove the module.
    Good idea, but unfortunately it didn't work!
    I managed to remove all thermal-related modules and loaded them
    again. No success! The temp keeps rising until I kill the full-CPU test
    script!

    Take a look at this thread on GitHub:
    https://github.com/intel/thermal_daemon/issues/268

    In the comment
    https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
    it's mentioned that thermald works after a suspend once a patched
    version is used.

    As I understand you can increase the debug information to get more info
    about what thermald is doing, that could maybe help while trying to
    figure it out.

    I'll try that. Not much hope, however.
    The patch is included in the latest version.
    With the version from Kubuntu 20.04:
    - I have tried --adaptive and --ignore-cpuid-check. It didn't
    complain but I could not determine if they are both active.

    I would expect the patch to have been backported to Kubuntu 20.04.
    Anyway ... I'll try the latest version again, but this time with both
    switches active, to see what happens.

    Thank you.

  • From Paulo da Silva@21:1/5 to All on Wed Oct 20 02:41:46 2021
    XPost: alt.os.linux

    At 00:09 on 20/10/21, Paulo da Silva wrote:
    At 06:55 on 19/10/21, J.O. Aho wrote:
    On 19/10/2021 03.14, Paulo da Silva wrote:
    At 14:47 on 18/10/21, J.O. Aho wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the temp rising?
    At least the sensors are working - I can monitor them and, at least,
    lowering the CPU's freqs results in lower temps. Also the fans are
    able to go to higher RPM. If I manually put them in boost mode, they are
    able to stop the temp rising!
    Immediately after (re)boot the system never goes above 97ºC!

    I know I did tell you to test to reload the the thermald service and you >>>> said it didn't make any difference, what about
      - stop thermald
      - rmmod the cpu temp module
      - modprobe the cpu temp module
      - start thermald

    I'm not even sure if you can remove the module.
    Good idea, but unfortunately it didn't work!
    I managed to remove all thermal related modules and installed them
    again. No success! Temp keeps rising until I kill the full cpu test
    script!

    Take a look at this thread at github:
    https://github.com/intel/thermal_daemon/issues/268

    In the comment
    https://github.com/intel/thermal_daemon/issues/268#issuecomment-788709112
    it's mentioned that the thermald works after suspension after a patched
    version was used.

    As I understand you can increase the debug information to get more info
    about what thermald is doing, that could maybe help while trying to
    figure it out.

    I'll try that. Not much hope, however.
    The patch is included in the latest version.
    With the version in Kubuntu 20.04:
    - I have tried --adaptive and --ignore-cpuid-check. It didn't
    complain, but I could not determine whether they are both active.

    I would expect the patch to have been backported to Kubuntu 20.04.
    Anyway ... I'll try the latest version again, but this time with both
    switches active, to see what happens.

    And NO :-(
    Not working, same symptoms.
    For some reason, the BIOS and/or kernel does not stop the temperature
    from rising after suspension, and thermald seems to play no role in this.
    Removing it does not change anything.

    The log was not very enlightening for me. The only message that made
    some sense said it was too early to act, or something like that.
    When I have some patience I'll give it another try.

    Thanks anyway.
    Paulo

  • From Carlos E.R.@21:1/5 to Paulo da Silva on Wed Oct 20 14:21:38 2021
    XPost: alt.os.linux

    On 19/10/2021 02.49, Paulo da Silva wrote:
    At 13:26 on 18/10/21, Carlos E.R. wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:
    At 12:07 on 17/10/21, Carlos E.R. wrote:



    I have used two machines with limited cooling; one is a mini computer
    box, fanless (the idea is to put it in the sitting room by the TV). When
    it is doing something intense, it overheats and throttles the CPU down.
    Another is a laptop I prepared for another person, with a relatively
    fast processor that can overheat if you demand some job for minutes, and
    then it throttles down.

    Both seem to be designed for this: running normally with a small
    load, but sprinting on demand if the user needs to run something. But
    they cannot keep up the load for a long time because they have no fan,
    or too small a fan.

    Now, I did not install any daemon or configure anything; it was the
    kernel itself doing it all, out of the box.

    Both have only Intel graphics.

    The minipc is a "msi CubiN Mini-PC" (I can't find exact model), cpu is
    "Intel(R) Pentium(R) CPU  N3710  @ 1.60GHz" (4 cores)

    The laptop is "Lenovo ThinkPad E15 Intel Core i5-10210U/8GB/512GB
    SSD/15.6"

    In both cases I installed openSUSE Leap 15

    That's the main point, Carlos. Why is my PC (kernel, BIOS,
    whatever) unable to control the temperature after suspend/wake?
    Besides, why does thermald also seem to do nothing to stop the
    temperature rising? At least the sensors are working - I can monitor
    them and, at least, lowering the CPU's frequencies results in lower
    temperatures. Also the fans are able to go to higher RPM. If I
    manually put them in boost mode, they are able to stop the
    temperature rising!
    Immediately after (re)boot the system never goes above 97ºC!

    Isengard:~ # ps afx | grep thermal
      615 ?        I<     0:00  \_ [acpi_thermal_pm]
    23830 pts/23   S+     0:00          \_ grep --color=auto thermal
    Isengard:~ #

    I'm not running thermald.
    Yes! The BIOS and/or the kernel should be enough to avoid temperature
    problems. thermald should at least be a last-resort protection.
    None of them stops the temperature from rising after suspension!
    At least one of them does before any suspension has occurred. The
    temperature never rises above 97ºC.

    I have no personal experience with thermald, so I can't offer advice on it.




    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the difficulty of installing certain types
    of software - lack of information or packages, and the unavailability of
    some library sources for development. In Debian-like distros I just need
    to install <lib name>-dev. One example was libgcrypt20.

    What? All sources are available in openSUSE.


    http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm



    You just need to activate the sources repo in YaST. If some particular
    package is missing the source, declare a bug.


    If you just need the files to compile some other thing, you need the
    libname-devel package instead.

    http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm

    Yes, now they are. But they weren't when I needed them.
    Maybe I'll give openSUSE another try.


    In the case a source package is missing, just declare a bug.


    I saw yesterday this command to zypper:


    source-install (si) name...
    Install specified source packages and their build
    dependencies. If the name of a binary package is given, the
    corresponding source package is looked up and installed instead.

    This command will try to find the newest available versions
    of the source packages and uses rpm -i to install them, optionally
    together with all the packages that are required to build the source
    package. The default location where rpm installs source packages to is /usr/src/packages/{SPECS,SOURCES}, but the values can be changed in your
    local rpm configuration. In case of doubt try executing rpm --eval
    "%{_specdir} and %{_sourcedir}".

    Note that the source packages must be available in
    repositories you are using. You can check whether a repository contains
    any source packages using the following command:

    $ zypper search -t srcpackage -r alias|name|#|URI


    --
    Cheers, Carlos.

  • From Paulo da Silva@21:1/5 to All on Wed Oct 20 18:22:08 2021
    XPost: alt.os.linux

    At 13:21 on 20/10/21, Carlos E.R. wrote:
    On 19/10/2021 02.49, Paulo da Silva wrote:
    At 13:26 on 18/10/21, Carlos E.R. wrote:
    On 17/10/2021 20.15, Paulo da Silva wrote:
    At 12:07 on 17/10/21, Carlos E.R. wrote:

    ...

    About openSUSE ... that was the best and most stable distro I have ever
    used. I dropped it because of the difficulty of installing certain types
    of software - lack of information or packages, and the unavailability of
    some library sources for development. In Debian-like distros I just need
    to install <lib name>-dev. One example was libgcrypt20.

    What? All sources are available in openSUSE.


    http://download.opensuse.org/source/distribution/leap/15.2/repo/oss/src/libgcrypt-1.8.2-lp152.16.8.src.rpm




    You just need to activate the sources repo in YaST. If some particular
    package is missing the source, declare a bug.


    If you just need the files to compile some other thing, you need the
    libname-devel package instead.

    http://download.opensuse.org/distribution/leap/15.2/repo/oss/x86_64/libgcrypt-devel-1.8.2-lp152.16.8.x86_64.rpm


    Yes, now they are. But they weren't when I needed them.
    Maybe I'll give openSUSE another try.


    In the case a source package is missing, just declare a bug.


    I saw yesterday this command to zypper:


           source-install (si) name...
               Install specified source packages and their build dependencies. If the name of a binary package is given, the
    corresponding source package is looked up and installed instead.

               This command will try to find the newest available versions
    of the source packages and uses rpm -i to install them, optionally
    together with all the packages that are required to build the source
    package. The default location where rpm installs source packages to is /usr/src/packages/{SPECS,SOURCES}, but the values can be changed in your local rpm configuration. In case of doubt try executing rpm --eval "%{_specdir} and %{_sourcedir}".

               Note that the source packages must be available in repositories you are using. You can check whether a repository contains
    any source packages using the following command:

                   $ zypper search -t srcpackage -r alias|name|#|URI

    OK, let's say I want to give opensuse a try.

    Let's say I install it and it still cannot handle my temperature
    problem. I need to check this before I go into install and configure all
    SW I use. This takes a couple of weeks.
    How to delete it?

    I know I did it in the past, but just to be sure ... is it:

    1. boot into my actual system.
    2. do grub-install or grub-install /dev/nvme0n1 (disk)?
    3. efibootmgr -B -b <bootnum>?
    4. Do I need further cleans in /boot/efi?

    Is this enough?

    Thank you.

  • From Carlos E.R.@21:1/5 to Paulo da Silva on Thu Oct 21 00:19:53 2021
    XPost: alt.os.linux

    On 20/10/2021 19.22, Paulo da Silva wrote:
    At 13:21 on 20/10/21, Carlos E.R. wrote:


    OK, let's say I want to give opensuse a try.

    Let's say I install it and it still cannot handle my temperature
    problem. I need to check this before I go into install and configure all
    SW I use. This takes a couple of weeks.
    How to delete it?

    I know I did it in the past, but just to be sure ... is it:

    1. boot into my actual system.
    2. do grub-install or grub-install /dev/nvme0n1 (disk)?

    I don't think you need that one.

    3. efibootmgr -B -b <bootnum>?

    Yes.

    4. Do I need further cleans in /boot/efi?

    You can erase the directory /boot/efi/EFI/opensuse, and of course the
    root partition.
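
    To find the <bootnum> for step 3, running efibootmgr without arguments
    lists the existing entries. A minimal sketch, assuming entry lines of
    the form "Boot0003* opensuse": the sample output and the helper name
    find_boot_entries below are illustrative only and not taken from the
    machine discussed in this thread. It only prints the delete command and
    does not run it.

```python
import re

# Illustrative sample of what `efibootmgr` prints; real output will differ.
SAMPLE_OUTPUT = """\
BootCurrent: 0001
Timeout: 1 seconds
BootOrder: 0001,0003,0000
Boot0000* Windows Boot Manager
Boot0001* ubuntu
Boot0003* opensuse
"""

# Entry lines look like "Boot0003* label"; the "*" marks an active entry.
ENTRY_RE = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


def find_boot_entries(efibootmgr_output, label):
    """Return the hex boot numbers whose label contains `label` (case-insensitive)."""
    matches = []
    for line in efibootmgr_output.splitlines():
        m = ENTRY_RE.match(line)
        if m and label.lower() in m.group(3).lower():
            matches.append(m.group(1))
    return matches


if __name__ == "__main__":
    for num in find_boot_entries(SAMPLE_OUTPUT, "opensuse"):
        # Only print the command; deleting the entry is left to the user.
        print(f"efibootmgr -B -b {num}")
```

    On a real system the text would come from the efibootmgr command itself
    (run as root) instead of SAMPLE_OUTPUT.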



    Maybe you could try one of the live versions, put it under load, and see
    what happens with the temps and the fans. It is not fully reliable, but
    it is faster.

    --
    Cheers, Carlos.

  • From Ordinary Poster@21:1/5 to Paulo da Silva on Thu Oct 21 00:04:44 2021
    XPost: alt.os.linux

    On 20/10/2021 18:22, Paulo da Silva wrote:
    OK, let's say I want to give opensuse a try.

    People just use a live Flash drive to try things. They don't install
    anything.

  • From Paulo da Silva@21:1/5 to All on Thu Oct 21 02:25:58 2021
    XPost: alt.os.linux

    At 02:19 on 21/10/21, Paulo da Silva wrote:
    At 23:19 on 20/10/21, Carlos E.R. wrote:
    On 20/10/2021 19.22, Paulo da Silva wrote:
    At 13:21 on 20/10/21, Carlos E.R. wrote:

    ...


    Maybe you could try one of the live versions, put it under load, and see
    what happens with the temps and the fans. It is not fully reliable, but
    it is faster.
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.
    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.
    Just one more question I forgot ...
    Is it the same to install from the live image or is it better to
    download the installer image? I'm asking because I never found a distro
    with both images.

    Thank you.

  • From Paulo da Silva@21:1/5 to All on Thu Oct 21 02:19:20 2021
    XPost: alt.os.linux

    At 23:19 on 20/10/21, Carlos E.R. wrote:
    On 20/10/2021 19.22, Paulo da Silva wrote:
    At 13:21 on 20/10/21, Carlos E.R. wrote:


    OK, let's say I want to give opensuse a try.

    Let's say I install it and it still cannot handle my temperature
    problem. I need to check this before I go into install and configure all
    SW I use. This takes a couple of weeks.
    How to delete it?

    I know I did it in the past, but just to be sure ... is it:

    1. boot into my actual system.
    2. do grub-install or grub-install /dev/nvme0n1 (disk)?

    I don't think you need that one.
    Are you sure? What if I remove that partition content? Doesn't grub need
    it? I am asking because I always believed (without any real basis) that
    there is always a main system for boot.


    3. efibootmgr -B -b <bootnum>?

    Yes.

    4. Do I need further cleans in /boot/efi?

    You can erase the directory /boot/efi/EFI/opensuse, and of course the
    root partition.



    Maybe you could try one of the live versions, put it under load, and see
    what happens with the temps and the fans. It is not fully reliable, but
    it is faster.
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.
    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.

    Thanks
    Paulo

  • From David W. Hodgins@21:1/5 to All on Wed Oct 20 21:58:24 2021
    XPost: alt.os.linux

    On Wed, 20 Oct 2021 21:25:58 -0400, Paulo da Silva
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.
    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.

    For Mageia, the isodumper program/package from the Mageia repos. When writing an
    image to a usb stick, with the option to add a persistent partition selected, it
    uses dd to write the image, then adds an ext4 partition to the remaining space with
    the label mgalive-persist. The Mageia live iso images look for the partition, and if
    found mounts it as an overlayfs so all changes made, including installing additional
    packages, are stored for later use.

    Just one more question I forgot ...
    Is it the same to install from the live image or is it better to
    download the installer image? I'm asking because I never found a distro
    with both images.

    When installing from a live iso, the contents of the iso (all files seen when it's booted, not the iso file itself) are copied to the selected/mounted file systems. If installing while running in live mode, and selecting the install
    from the running live system, the changes made in live mode, including those stored in the mgalive-persist file system, are included.

    I expect other distros that support persistence use similar packages and methods.

    Regards, Dave Hodgins

    --
    Change dwhodgins@nomail.afraid.org to davidwhodgins@teksavvy.com for
    email replies.

  • From Paulo da Silva@21:1/5 to All on Thu Oct 21 03:47:33 2021
    XPost: alt.os.linux

    At 02:58 on 21/10/21, David W. Hodgins wrote:
    On Wed, 20 Oct 2021 21:25:58 -0400, Paulo da Silva
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.
    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.

    For Mageia, the isodumper program/package from the Mageia repos. When
    writing an image to a usb stick, with the option to add a persistent
    partition selected, it uses dd to write the image, then adds an ext4
    partition to the remaining space with the label mgalive-persist. The
    Mageia live iso images look for the partition, and if found mounts it
    as an overlayfs so all changes made, including installing additional
    packages, are stored for later use.
    This is good. I don't know if Opensuse does the same. Most likely not.


    Just one more question I forgot ...
    Is it the same to install from the live image or is it better to
    download the installer image? I'm asking because I never found a distro
    with both images.

    When installing from a live iso, the contents of the iso (all files seen
    when it's booted, not the iso file itself) are copied to the
    selected/mounted file systems. If installing while running in live mode,
    and selecting the install from the running live system, the changes made
    in live mode, including those stored in the mgalive-persist file system,
    are included.

    I expect other distros that support persistence use similar packages and methods.
    At least my network wifi configuration goes to the new installed system.
    I'm not sure about the other stuff.

    Thanks.
    Paulo

  • From Henry Crun@21:1/5 to Paulo da Silva on Thu Oct 21 07:46:02 2021
    On 21/10/2021 4:19, Paulo da Silva wrote:
    At 23:19 on 20/10/21, Carlos E.R. wrote:
    On 20/10/2021 19.22, Paulo da Silva wrote:
    At 13:21 on 20/10/21, Carlos E.R. wrote:


    OK, let's say I want to give opensuse a try.

    Let's say I install it and it still cannot handle my temperature
    problem. I need to check this before I go into install and configure all
    SW I use. This takes a couple of weeks.
    How to delete it?

    I know I did it in the past, but just to be sure ... is it:

    1. boot into my actual system.
    2. do grub-install or grub-install /dev/nvme0n1 (disk)?

    I don't think you need that one.
    Are you sure? What if I remove that partition content? Doesn't grub need
    it? I am asking because I always believed (without any real basis) that
    there is always a main system for boot.


    3. efibootmgr -B -b <bootnum>?

    Yes.

    4. Do I need further cleans in /boot/efi?

    You can erase the directory /boot/efi/EFI/opensuse, and of course the
    root partition.



    Maybe you could try one of the live versions, put it under load, and see
    what happens with the temps and the fans. It is not fully reliable, but
    it is faster.
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.
    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.

    Thanks
    Paulo

    What distro are you trying?
    If you install and use mkusb there is an option of creating a persistent
    (i.e. read/write) bootable USB pen drive, but it is limited to Debian or
    Ubuntu. (This is after all an Ubuntu newsgroup)
    See https://help.ubuntu.com/community/mkusb



    --
    Mike R.
    Home: http://alpha.mike-r.com/
    QOTD: http://alpha.mike-r.com/qotd.php
    No Micro$oft products were used in the URLs above, or in preparing this message.
    Recommended reading: http://www.catb.org/~esr/faqs/smart-questions.html#before
    and: http://alpha.mike-r.com/jargon/T/top-post.html
    Missile address: N31.7624/E34.9691

  • From Paul@21:1/5 to Ordinary Poster on Thu Oct 21 00:56:50 2021
    XPost: alt.os.linux

    On 10/20/2021 7:04 PM, Ordinary Poster wrote:
    On 20/10/2021 18:22, Paulo da Silva wrote:
    OK, let's say I want to give opensuse a try.

    People just use a live Flash drive to try things. They don't install anything.

    Downloaded the 900MB "LiveDVD" one.

    https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

    Shows an install icon, but it will probably
    be doing some sort of network install, with
    some delays while it gets stuff from the network.

    Whereas the 4GB version will at least have a few
    files onboard.

    For a one-off install, the 900MB might be the answer.
    If you think you'll be installing more than once,
    then it might be more important to get a larger
    piece of media.

    This is what I see in a VM, when clicking the Install
    icon in the 900MB one.

    [Picture]

    https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

    Paul

  • From Carlos E. R.@21:1/5 to Paul on Thu Oct 21 15:10:20 2021
    XPost: alt.os.linux

    On 21/10/2021 06.56, Paul wrote:
    On 10/20/2021 7:04 PM, Ordinary Poster wrote:
    On 20/10/2021 18:22, Paulo da Silva wrote:
    OK, let's say I want to give opensuse a try.

    People just use a live Flash drive to try things. They don't install
    anything.

    Downloaded the 900MB "LiveDVD" one.

    https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

    That one is intended to be run as is, on the USB stick, without
    installation, although installation is possible. There should be a KDE
    version, another GNOME, another XFCE, and another dedicated to rescue
    work (the latter two might be the same one).

    All of them are intended to be copied with dd from the image to the USB
    device (say, /dev/sdb), destroying all the partitions (it creates new
    ones). On the first run they create a read/write partition where you can
    save files. It is possible to add some packages with zypper (not the
    kernel, though).

    Don't try to "make them bootable", that would destroy them. Just copy to
    the stick, unmodified, with dd or dedicated programs (as described in
    the openSUSE wiki).
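
    As a rough illustration of the plain dd copy, here is a minimal sketch
    that only assembles and prints the command. The file name image.iso and
    the device /dev/sdX are placeholders, and the helper build_dd_command
    is made up for this example. bs=4M, status=progress and conv=fsync are
    GNU dd options; nothing is written to any device.

```python
import shlex


def build_dd_command(iso_path, device):
    """Build (but do not run) a dd command that copies an image to a device."""
    if not device.startswith("/dev/"):
        raise ValueError("device must be a /dev/ path")
    return [
        "dd",
        f"if={iso_path}",     # source image, copied unmodified
        f"of={device}",       # whole device, not a partition such as /dev/sdX1
        "bs=4M",              # larger block size than the 512-byte default
        "status=progress",    # show progress while copying (GNU dd)
        "conv=fsync",         # flush data to the device before dd exits
    ]


if __name__ == "__main__":
    # Placeholders: replace with the real image and the real USB device.
    print(shlex.join(build_dd_command("image.iso", "/dev/sdX")))
```

    Checking the device name with lsblk before running the printed command
    is worthwhile, since dd overwrites whatever device it is given.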


    Then there are two other images, one of about 4GB (the DVD) and another
    small one for network install. Those are the pure installation images;
    they cannot be "run". That is, of course they boot and run, but what you
    get has only the purpose of installation.



    Shows an install icon, but it will probably
    be doing some sort of network install, with
    some delays while it gets stuff from the network.

    Whereas the 4GB version will at least have a few
    files onboard.

    For a one-off install, the 900MB might be the answer.
    If you think you'll be installing more than once,
    then it might be more important to get a larger
    piece of media.

    This is what I see in a VM, when clicking the Install
    icon in the 900MB one.

       [Picture]

       https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

    If that's the "Tumbleweed-KDE-Live" you can just cancel the install and
    use the system as is, no installation.


    --
    Cheers,
    Carlos E.R.

  • From Carlos E. R.@21:1/5 to Paulo da Silva on Thu Oct 21 15:24:57 2021
    XPost: alt.os.linux

    On 21/10/2021 03.19, Paulo da Silva wrote:
    At 23:19 on 20/10/21, Carlos E.R. wrote:
    On 20/10/2021 19.22, Paulo da Silva wrote:
    At 13:21 on 20/10/21, Carlos E.R. wrote:


    OK, let's say I want to give opensuse a try.

    Let's say I install it and it still cannot handle my temperature
    problem. I need to check this before I go into install and configure all
    SW I use. This takes a couple of weeks.
    How to delete it?

    I know I did it in the past, but just to be sure ... is it:

    1. boot into my actual system.
    2. do grub-install or grub-install /dev/nvme0n1 (disk)?

    I don't think you need that one.
    Are you sure? What if I remove that partition content? Doesn't grub need
    it? I am asking because I always believed (without any real basis) that
    there is always a main system for boot.

    Not if you are using UEFI.

    Of course, I'm never completely sure, especially if I did not set up the
    system myself ;-)

    It is the code in the /boot/efi/EFI/opensuse directory which would call
    the grub code or maybe a kernel loader.

    And this code is called by UEFI code, and you change that with
    "efibootmgr -B -b <bootnum>"; everything else is not strictly required.



    Maybe you could try one of the live versions, put it under load, and see
    what happens with the temps and the fans. It is not fully reliable, but
    it is faster.
    Is there a simple way to prepare a pen with r/w permissions from the
    iso? I remember to use unetbootin, or something like that, to do it, but
    it stopped working at a given point. Since then I have been using dd,
    but this makes the pen readonly.

    Just dd, if the ISO was prepared for it. I know the one named "rescue"
    is, it is the one I use.

    I would like to update the system, make some trivial confs, and install
    some sw and it would be nice to make them permanent.
    Don't take time with this if you don't know. In the meanwhile I'll
    search the net and test on a VM.

    The "rescue" iso should be perfect for testing how the system
    behaves when overheating. Just tell it to clone a hard disk
    partition to a compressed file with parallelization; it should overload
    the CPU fast. No need to install your code and things.
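
    Any sustained all-core load should serve for such a test. As a hedged
    alternative to the disk-cloning job (this is not the script used on my
    machines), a minimal Python sketch that starts one busy-loop process
    per CPU for a fixed time. The names BURN_SNIPPET and load_all_cores are
    invented for this example.

```python
import os
import subprocess
import sys

# Code run by each worker: spin on integer arithmetic until the deadline.
BURN_SNIPPET = (
    "import sys, time\n"
    "end = time.monotonic() + float(sys.argv[1])\n"
    "x = 0\n"
    "while time.monotonic() < end:\n"
    "    x = (x * 31 + 7) % 1000003\n"
)


def load_all_cores(seconds, workers=None):
    """Start one busy-loop process per CPU (or `workers`), wait, return exit codes."""
    workers = workers or os.cpu_count() or 1
    procs = [
        subprocess.Popen([sys.executable, "-c", BURN_SNIPPET, str(seconds)])
        for _ in range(workers)
    ]
    return [p.wait() for p in procs]
```

    Calling load_all_cores(60) keeps every core busy for about a minute,
    which leaves time to watch the temperatures and fans in another
    terminal.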

    I can find the script I use for this later today, different computer.


    --
    Cheers,
    Carlos E.R.

  • From Paul@21:1/5 to Carlos E. R. on Thu Oct 21 10:43:37 2021
    XPost: alt.os.linux

    On 10/21/2021 9:10 AM, Carlos E. R. wrote:
    On 21/10/2021 06.56, Paul wrote:
    This is what I see in a VM, when clicking the Install
    icon in the 900MB one.

        [Picture]

        https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

    If that's the "Tumbleweed-KDE-Live" you can just cancel the install and use the system as is, no installation.

    When a person wants to run a specific graphics driver,
    an install comes in handy for that case. Even a USB stick
    with persistence would do, but persistence easily exhausts
    the 4GB formulation, and it helps to have a larger
    casper-rw than that. I think Rufus can do that (rufus.ie).

    You might need a specific graphics driver, to get a machine
    hot enough to tip over.

    Paul

  • From Rockinghorse Winner@21:1/5 to Carlos E. R. on Thu Oct 21 18:41:10 2021
    XPost: alt.os.linux

    On 2021-10-21, Carlos E. R. <robin_listas@es.invalid> wrote:
    On 21/10/2021 06.56, Paul wrote:
    On 10/20/2021 7:04 PM, Ordinary Poster wrote:
    On 20/10/2021 18:22, Paulo da Silva wrote:
    OK, let's say I want to give opensuse a try.

    People just use a live Flash drive to try things. They don't install
    anything.

    Downloaded the 900MB "LiveDVD" one.

    https://sjc.edge.kernel.org/opensuse/tumbleweed/iso/openSUSE-Tumbleweed-KDE-Live-x86_64-Snapshot20211016-Media.iso

    That one is intended to be run as is, on the USB stick, without
    installation, although installation is possible. There should be a KDE
    version, another GNOME, another XFCE, and another dedicated to rescue
    work (the latter two might be the same one).

    All of them are intended to be copied with dd from the image to the USB
    device (say, /dev/sdb), destroying all the partitions (it creates new
    ones). On the first run they create a read/write partition where you can
    save files. It is possible to add some packages with zypper (not the
    kernel, though).

    Don't try to "make them bootable", that would destroy them. Just copy to
    the stick, unmodified, with dd or dedicated programs (as described in
    the openSUSE wiki).


    Then there are two other images, one of about 4GB (the DVD) and another
    small one for network install. Those are the pure installation images;
    they cannot be "run". That is, of course they boot and run, but what you
    get has only the purpose of installation.



    Shows an install icon, but it will probably
    be doing some sort of network install, with
    some delays while it gets stuff from the network.

    Whereas the 4GB version will at least have a few
    files onboard.

    For a one-off install, the 900MB might be the answer.
    If you think you'll be installing more than once,
    then it might be more important to get a larger
    piece of media.

    This is what I see in a VM, when clicking the Install
    icon in the 900MB one.

       [Picture]

       https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

    If that's the "Tumbleweed-KDE-Live" you can just cancel the install and
    use the system as is, no installation.



    Install on an external SSD drive, and get a more realistic experience....if it's a no go, you just rinse, repeat with another distro....

  • From Paul@21:1/5 to Paul on Thu Oct 21 14:42:42 2021
    XPost: alt.os.linux

    On 10/21/2021 10:43 AM, Paul wrote:
    On 10/21/2021 9:10 AM, Carlos E. R. wrote:
    On 21/10/2021 06.56, Paul wrote:
    This is what I see in a VM, when clicking the Install
    icon in the 900MB one.

        [Picture]

        https://i.postimg.cc/R085hy8s/900-MB-disc-has-install-icon.gif

    If that's the "Tumbleweed-KDE-Live" you can just cancel the install and use the system as is, no installation.

    When a person wants to run a specific graphics driver,
    an install comes in handy for that case. Even a USB stick
    with persistence would do, but persistence easily exhausts
    the 4GB formulation, and it helps to have a larger
    casper-rw than that. I think Rufus can do that (rufus.ie).

    You might need a specific graphics driver, to get a machine
    hot enough to tip over.

       Paul

    For the OP, that distro is using a UEFI-only install,
    so it expects GPT partitioning and UEFI boot in the BIOS.
    That means it cannot share with a MSDOS partitioned disk
    and legacy boot setup.

    I had to back up my disk drive (MSDOS partitioned), clean
    it off, then allow SUSE to use the whole thing for GPT, to
    allow the install to quickly get under way. It says the
    install will take 40 minutes. Afterwards, I will restore
    from backup, to put the disk back in original condition.

    If it supported MSDOS partitioning and legacy (CSM) boot,
    I probably would have been able to come up with an install
    plan so it would fit alongside UbuntuStudio.

    I figured something was up when I wasn't seeing the word
    "hybrid" when scanning the ISO with the "disktype" utility.

    Paul

  • From Paulo da Silva@21:1/5 to All on Sat Oct 30 00:36:44 2021
    XPost: alt.os.linux

    The current situation:

    1. The sporadic "fan jets" come from the normal fan, not the GPU one.
    2. The origin of the freezes is still unknown. I am now almost sure that
    it does not come from the BIOS. In fact, during a freeze, there was one
    occurrence of several continuous "fan jets".
    3. A couple of "fan jets" also occurred once while in the GRUB menu!
    4. The uncontrollable rise of the temperature reported in
    /sys/devices/virtual/thermal/thermal_zone0/temp at full CPU load after
    suspend/wake also occurs with the openSUSE Leap 15.3 live system. Before
    suspending, that temperature is kept stable at 97°C.
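
    For anyone wanting to watch the same value, here is a minimal sketch of reading it. It assumes the sysfs path quoted in item 4 exists on the machine; the function names are made up for illustration. The kernel reports this value in millidegrees Celsius, so 97000 means 97°C.

```python
from pathlib import Path

# Path quoted in the post; present only if the machine exposes thermal_zone0.
ZONE0_TEMP = Path("/sys/devices/virtual/thermal/thermal_zone0/temp")


def parse_millidegrees(raw):
    """Convert the sysfs text value (e.g. "97000\n") to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_temp_c(path=ZONE0_TEMP):
    """Read one temperature sample; raises OSError if the zone is absent."""
    return parse_millidegrees(Path(path).read_text())
```

    Calling read_temp_c() in a loop before and after a suspend/wake cycle would show whether the value keeps climbing.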

    I tried several kernels available in Kubuntu, including an Intel
    version 5.13, but I was unable to get them to boot in graphical mode
    with the nvidia 470 driver. When I have some more time, I'll try it
    without the Nvidia drivers.

    Thanks for your attention.
    Paulo

  • From Bobbie Sellers@21:1/5 to Paulo da Silva on Fri Oct 29 17:33:59 2021
    Have you opened the case and used compressed air to get the dust out?

    How long has the CPU been in place under the heat sink? The grease or thermal paste used can dry out and lose heat conductivity.

    Good luck with your machine, Paulo.

    bliss - if Linux was truly elitist I could not afford the entry fee.
    --

    bliss dash SF 4 ever at dslextreme dot com

  • From Paulo da Silva@21:1/5 to All on Mon Nov 1 20:13:53 2021
    Of course it is very likely that there are some problems with the
    sensors/cooling system. But what I do not understand is why the
    temperature gets controlled, perhaps by the kernel, before the first
    suspend but not after waking from suspend!
    I have tried openSUSE and Clear Linux. All have the same problem.
    I have written a small script that successfully controls the temperature
    just by changing the CPU frequencies. I don't know how to act on the
    other cooling systems. thermald, which was supposed to do this, fails
    miserably.
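
    The script itself was not posted, so the following is only a sketch of the general idea and not the actual script. It assumes the standard cpufreq sysfs files (cpuN/cpufreq/scaling_max_freq, values in kHz); the 90°C limit, the 5°C hysteresis, the 200 MHz step and the function names are all illustrative assumptions.

```python
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")


def pick_max_freq(temp_c, current_khz, min_khz, max_khz,
                  limit_c=90.0, step_khz=200_000):
    """Choose the next frequency cap from the current temperature.

    At or above limit_c the cap goes down one step; once the temperature
    is at least 5 degrees below limit_c it goes back up one step.
    In between, the cap is left alone to avoid oscillating.
    """
    if temp_c >= limit_c:
        return max(min_khz, current_khz - step_khz)
    if temp_c <= limit_c - 5.0:
        return min(max_khz, current_khz + step_khz)
    return current_khz


def apply_max_freq(khz, cpu_root=CPU_ROOT):
    """Write the cap to every core's scaling_max_freq (needs root)."""
    for f in sorted(Path(cpu_root).glob("cpu[0-9]*/cpufreq/scaling_max_freq")):
        f.write_text(f"{khz}\n")
```

    A loop that reads the temperature every few seconds, calls pick_max_freq() and then apply_max_freq() would complete the picture. The hardware limits can be read from cpuinfo_min_freq and cpuinfo_max_freq in the same cpufreq directory.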

    Regards.
    Paulo

  • From Paul@21:1/5 to Paulo da Silva on Mon Nov 1 22:21:00 2021
    Find some docs first.

    https://01.org/linux-thermal-daemon/documentation/introduction-thermal-daemon

    It is possible the ThermalD package doesn't have sufficient XML files
    to control every possible HW config. Maybe some platforms will require
    hand programming.

    Paul

  • From Paulo da Silva@21:1/5 to All on Tue Dec 7 22:20:14 2021
    XPost: alt.os.linux

    1. The freezes completely disappeared after changing the kernel to the
    Ubuntu HWE kernel, currently 5.11.
    2. The "fan jets" also went away, but not when I changed the kernel. Maybe
    something changed in some Windows/PC control software during a Windows
    update, or there was some change in the EC after I kept the PC disconnected
    from power with the battery fully discharged for more than 5 hours just to reset it.
    3. The problem of the uncontrolled temperature rise in the acpitz zone
    after waking from suspend at "full CPU" still remains. I'm
    controlling it with a Python script that changes the upper frequencies of the CPU cores.
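
    Zone numbering can differ between machines and kernels, so it may be safer to look a zone up by its type name than to hard-code thermal_zone0. A minimal sketch, assuming the usual /sys/class/thermal layout in which each thermal_zone* directory has a "type" file; the function name is made up for illustration.

```python
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")


def find_zones(zone_type, root=THERMAL_ROOT):
    """Return the thermal_zone* directories whose 'type' file equals zone_type."""
    matches = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        type_file = zone / "type"
        if type_file.is_file() and type_file.read_text().strip() == zone_type:
            matches.append(zone)
    return matches
```

    For example, find_zones("acpitz") returns the matching directories, and the "temp" file inside each one holds the reading in millidegrees Celsius.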

    Thanks to all interested in this problem.
    Paulo
