On 2021-09-24, at 05:58, Philip Webb <purslow@ca.inter.net> wrote:
While I was asleep yesterday, my machine reported on all 3 Konsoles :
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
-- end of report --
I don't remember seeing this before : how concerned should I be ?
On 2021-09-24, at 05:58, Philip Webb <purslow@ca.inter.net> wrote:
While I was asleep yesterday, my machine reported on all 3 Konsoles :
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
-- end of report --
From the manpage:
Most errors can be corrected by the CPU
by internal error correction mechanisms. Uncorrected errors cause
machine check exceptions which may kill processes or panic the machine.
A small number of corrected errors is usually not a cause for worry,
but a large number can indicate future failure.
When an uncorrected machine check error happens that the kernel
cannot recover from, then it will usually panic the system.
In this case when there was a warm reset after the panic,
mcelog should pick up the machine check errors after reboot.
This is not possible after a cold reset.
If you are overclocking, try disabling it.
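To relate the manpage text to the report quoted above, here is a minimal Python sketch that unpacks a few of the raw fields. It assumes the usual architectural layout of the top bits of an MCi_STATUS register (VAL at bit 63 down to PCC at bit 57), that the value after "Bank 4:" is that status register, and that the number after "PROCESSOR 2:" is a CPUID signature. The function names are made up for illustration, and the sketch does not try to interpret what Bank 4 means on this particular CPU.

```python
from datetime import datetime, timezone

# Flag bits in the upper part of an MCi_STATUS register (assumed layout).
STATUS_FLAGS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was NOT corrected by the hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # MISC register holds valid data
    "ADDRV": 58,  # ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}


def decode_status(status: int) -> dict:
    """Split a raw status value into its flag bits and low 16-bit error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_FLAGS.items()}
    flags["mca_error_code"] = status & 0xFFFF
    return flags


def decode_cpuid(cpuid: int) -> tuple:
    """Return (family, model, stepping) from a CPUID signature."""
    base_family = (cpuid >> 8) & 0xF
    family = base_family
    model = (cpuid >> 4) & 0xF
    if base_family == 0xF:
        family += (cpuid >> 20) & 0xFF          # add extended family
    if base_family in (0x6, 0xF):
        model |= ((cpuid >> 16) & 0xF) << 4     # prepend extended model
    return family, model, cpuid & 0xF


if __name__ == "__main__":
    status = decode_status(0x9D0B4C16001D011B)   # value from the report
    for name, value in status.items():
        print(name, hex(value) if name == "mca_error_code" else value)
    print("family/model/stepping:", decode_cpuid(0x600F20))
    print("TIME:", datetime.fromtimestamp(1632440315, tz=timezone.utc).isoformat())
```

Under those assumptions the UC and PCC bits come out clear for this report, which would put it in the "corrected error" category the manpage describes, and the TIME field converts to 2021-09-23T23:38:35 UTC.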
210924 Andrew Udvare wrote:
On 2021-09-24, at 05:58, Philip Webb <purslow@ca.inter.net> wrote:
While I was asleep yesterday, my machine reported on all 3 Konsoles :
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
-- end of report --
From the manpage:
Which man page is that ?
On 24/09/2021 06:48, Philip Webb wrote:
210924 Andrew Udvare wrote:
On 2021-09-24, at 05:58, Philip Webb <purslow@ca.inter.net> wrote:
While I was asleep yesterday, my machine reported on all 3 Konsoles :
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
-- end of report --
From the manpage:
Which man page is that ?
man mcelog
On 2021-09-24, at 05:58, Philip Webb <purslow@ca.inter.net> wrote:
While I was asleep yesterday, my machine reported on all 3 Konsoles :
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 4: 9d0b4c16001d011b
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: TSC 0 ADDR 19e617980 MISC c01a000001000000
Message from syslogd@ at Thu Sep 23 19:38:11 2021 ...
: mce: [Hardware Error]: PROCESSOR 2:600f20 TIME 1632440315 SOCKET 0 APIC 0 microcode 6000822
I have no direct experience with this error,
however I'd suggest it was most likely an error
reading a block of DRAM and not likely the CPU itself failing.
I periodically get mce errors on my i980 machine
when running big PixInsight jobs and I hit thermal limits.
I'd suggest you run extensive memory tests
and, if you don't see any problems, don't worry too much.
It's always wise to do good backups in case the problem gets worse.
On Fri, Sep 24, 2021 at 8:23 AM Andrew Udvare <audvare@gmail.com> wrote:
man mcelog
man mcelog
'man mcelog' + 'man mce' find nothing. Does it need to be installed ?
Do you also have lm-sensors installed? Running sensord?
Genuine CPU issues seem pretty rare, so I would check for overheating or power issues, and lm-sensors will help with that.
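As a rough illustration of the overheating check suggested above, the following Python sketch reads temperatures straight from the kernel's hwmon sysfs tree, which is the same data source lm-sensors draws on. It assumes the standard layout of /sys/class/hwmon/hwmon*/temp*_input with values in millidegrees Celsius. The function name read_temperatures is hypothetical, and which chips show up depends on the drivers loaded on the machine.

```python
from pathlib import Path


def read_temperatures(root="/sys/class/hwmon"):
    """Return {"chip/tempN": degrees_celsius} for every readable hwmon sensor."""
    readings = {}
    for temp_file in sorted(Path(root).glob("hwmon*/temp*_input")):
        chip_file = temp_file.parent / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        chip = chip_file.read_text().strip() if chip_file.exists() else temp_file.parent.name
        try:
            millidegrees = int(temp_file.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor not readable right now; skip it
        label = temp_file.name[: -len("_input")]
        readings[f"{chip}/{label}"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for sensor, celsius in read_temperatures().items():
        print(f"{sensor}: {celsius:.1f} C")
```

Running it in a loop while the machine is under load gives a crude view of whether temperatures climb before an mce message appears. If two chips report the same name, later readings overwrite earlier ones in this sketch.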