• Unstable system revisited

    From Grimble@2:250/1 to All on Mon Jun 21 15:41:45 2021
    I tried with IOMMU Enabled, and with IOMMU disabled and iommu=soft in
    Grub - no improvement.
    I suspect the problem lies in the graphical environment - the screen
    froze for about 20 minutes but is now responding again. While frozen,
    the system still served files via NFS to other systems. Certainly some of
    the freezes seemed to occur in connection with mouse movements/button
    presses.
    Thought I might follow the wiki "How to set up the graphical server with
    Nouveau as an alternative to the nVidia driver". Now I get an "Invalid
    magic number" message, but there seem to be no posts that correct this.
    Any ideas please?
    --
    Grimble
    Registered Linux User #450547
    Machine 'Bach' running Plasma 5.20.4 on 5.10.43-desktop-1.mga8 kernel.
    Mageia release 8 (Official) for x86_64

    --- MBSE BBS v1.0.7.22 (GNU/Linux-x86_64)
    * Origin: A noiseless patient Spider (2:250/1@fidonet)
  • From grimble@2:250/1 to All on Tue Jun 22 19:19:09 2021
    Seems like an update to kernel 5.10.43 failed part way through, so had
    to drop back to 5.10.41. Change to nouveau worked but I'm not optimistic
    the problem's solved - Chrome bombed as soon as I launched it.

    --
    Grimble
    Registered Linux User #450547
    Machine 'mozart' running Plasma 5.20.4 on 5.10.41-desktop-1.mga8 kernel.
    Mageia release 8 (Official) for x86_64

  • From David W. Hodgins@2:250/1 to All on Tue Jun 22 22:13:34 2021
    On Tue, 22 Jun 2021 14:19:09 -0400, grimble <grimble@nomail.afraid.org> wrote:

    Seems like an update to kernel 5.10.43 failed part way through, so had
    to drop back to 5.10.41. Change to nouveau worked but I'm not optimistic
    the problem's solved - Chrome bombed as soon as I launched it.

    More info about how the update failed is needed. Did it run out of space?

    Regards, Dave Hodgins

    --
    Change dwhodgins@nomail.afraid.org to davidwhodgins@teksavvy.com for
    email replies.

  • From Grimble@2:250/1 to All on Wed Jun 23 14:56:33 2021
    On 22/06/2021 22:13, David W. Hodgins wrote:

    More info about how the update failed is needed. Did it run out of space?

    Regards, Dave Hodgins

    Very unlikely, 9+ GB available on /. I noticed several of the 5.10.43
    files in /boot, such as vmlinuz- and symvers-, were 0 B.

    --
    Grimble
    Registered Linux User #450547
    Machine 'Bach' running Plasma 5.20.4 on 5.10.43-desktop-1.mga8 kernel.
    Mageia release 8 (Official) for x86_64

  • From Bit Twister@2:250/1 to All on Wed Jun 23 17:41:03 2021
    On Wed, 23 Jun 2021 14:56:33 +0100, Grimble wrote:
    I noticed several of the 5.10.43
    files in /boot, such as vmlinuz- and symvers-, were 0 B.

    Then those are failures. /boot should look something like this snippet:

    239K Jun 11 02:33 config-5.10.43-desktop-1.mga8
    4.0K Dec 26 08:34 dracut
    4.0K Jun 14 09:21 grub2
    17M Jun 14 09:12 initrd-5.10.43-desktop-1.mga8.img
    33 Jun 14 09:13 initrd-desktop.img -> initrd-5.10.43-desktop-1.mga8.img
    33 Jun 14 09:13 initrd.img -> initrd-5.10.43-desktop-1.mga8.img
    193K Jun 11 02:33 symvers-5.10.43-desktop-1.mga8.xz
    4.8M Jun 11 02:33 System.map-5.10.43-desktop-1.mga8
    30 Jun 14 09:13 vmlinuz -> vmlinuz-5.10.43-desktop-1.mga8
    7.7M Jun 11 02:33 vmlinuz-5.10.43-desktop-1.mga8
    30 Jun 14 09:13 vmlinuz-desktop -> vmlinuz-5.10.43-desktop-1.mga8
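
One way to spot truncated files like the 0 B ones reported above is to ask find for empty regular files. This is only a sketch: the function name list_empty_files is made up for illustration, and it assumes GNU find.

```shell
#!/bin/bash
# List regular files of size zero directly inside a directory (default: /boot).
# Symlinks such as vmlinuz -> vmlinuz-<version> are skipped by -type f.
list_empty_files() {
    find "${1:-/boot}" -maxdepth 1 -type f -size 0
}
```

Running list_empty_files /boot should print nothing on a healthy system; any name it prints is a candidate leftover from the failed install.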

  • From Grimble@2:250/1 to All on Thu Jun 24 12:55:34 2021
    Thanks, Bit. Fortunately 5.10.45 installed without problems. However,
    we're still trying to find the root cause of this instability.
    G

    --
    Grimble
    Registered Linux User #450547
    Machine 'Bach' running Plasma 5.20.4 on 5.10.45-desktop-2.mga8 kernel.
    Mageia release 8 (Official) for x86_64

  • From Bit Twister@2:250/1 to All on Thu Jun 24 13:13:07 2021
    On Thu, 24 Jun 2021 12:55:34 +0100, Grimble wrote:
    Thanks, Bit. Fortunately 5.10.45 installed without problems.

    Well that is a step forward. Since the old kernels did not compile you
    might consider using mcc to remove them.

    However,
    we're still trying to find the root cause of this instability.

    Yes I have been following the thread. Maybe enabling core dump might give
    you a chance to find out why.

    $ cat enable_coredump.txt
    #*********** start of enable_coredump.txt ****************************
    # following are instructions for enabling core dumps.
    #
    # "ulimit -c unlimited" has to be executed during login, either by
    # each user in $HOME/.bash_profile or $HOME/.bashrc, or globally
    # for everyone. The following does it globally.

    # As root, paste the following in a root terminal:

    echo '#!/bin/bash
    # Enable core dump for each user
    ulimit -c unlimited
    #***** end of /etc/profile.d/xx_enable_coredump.sh
    ' > /etc/profile.d/xx_enable_coredump.sh


    echo '#!/bin/csh
    # Enable core dump for each user (csh/tcsh use "limit", not "ulimit")
    limit coredumpsize unlimited
    #***** end of /etc/profile.d/xx_enable_coredump.csh
    ' > /etc/profile.d/xx_enable_coredump.csh

    chmod +x /etc/profile.d/xx_enable_coredump.sh
    chmod +x /etc/profile.d/xx_enable_coredump.csh

    #****************************************************************
    # you need to make sysctl changes, as root, paste the following:
    #****************************************************************

    mkdir --parents /etc/sysctl.d/

    echo '#**** start of /etc/sysctl.d/xx_enable_coredump.conf
    # enable System Request debugging functionality of the kernel
    kernel.sysrq = 1

    # Enabling suid dump PID appending and set core location and name

    kernel.core_uses_pid = 1
    kernel.core_pattern = /var/tmp/%e_%p_%s.core
    fs.suid_dumpable = 2

    #********* end of /etc/sysctl.d/xx_enable_coredump.conf *********
    ' > /etc/sysctl.d/xx_enable_coredump.conf

    #***********************************************
    # reload all sysctl changes with the following:
    #***********************************************

    sysctl --system

    #**************************************************
    # add any/all users to the systemd-coredump group
    #**************************************************

    while IFS=: read -r _user _rest; do
        usermod --append --groups systemd-coredump "$_user"
    done < <(grep /home/ /etc/passwd)

    #******************************************************************
    # users have to log out/in to pick up the new systemd-coredump group.
    # To verify that a user can create a core dump, run in a user terminal:
    #******************************************************************

    firefox &
    pkill --signal SIGSEGV --full firefox
    ls -Al /var/tmp/
    rm /var/tmp/*firefox*.core

    #*******************************************************************
    # When a user launches a terminal, $HOME/.bashrc is normally executed.
    # I can recommend adding a command to check for core files. I also have
    # an hourly cron job to warn about core files.
    #*******************************************************************


    ------8<------8<------8<--cut below this line ----8<------8<
    #!/bin/bash
    #********************* start ck_4_core *******************************

    # Report any core files found in the usual dump locations.
    for _d in "$HOME" /tmp /var/tmp /var/lib/systemd/coredump ; do
        _cnt=$(ls "$_d"/*core* 2> /dev/null | wc -l)
        if [ "$_cnt" -gt 0 ] ; then
            echo " "
            ls -la "$_d"/*core*
            echo "
    # from $(hostname --short) ck_4_core
    # There is a $_d/*core file. To remove it, run
    rm --force $_d/*core*
    "
        fi
    done

    #*************** end ck_4_core ****************************************
    ------8<------8<------8<--cut above this line ----8<------8<

    Do remember to set execute bit on ck_4_core.
    chmod +x ck_4_core



    #*********** end of enable_coredump.txt ******************************
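
The verification step in the instructions above crashes a browser on purpose. A lighter check, sketched here on the assumption that any short-lived process will do, is to send SIGSEGV to a throwaway sleep. The name segv_probe is hypothetical.

```shell
#!/bin/bash
# Start a throwaway process, kill it with SIGSEGV, and print its exit status.
# bash reports death by signal N as 128 + N, so SIGSEGV (11) shows up as 139.
segv_probe() {
    local pid status
    sleep 60 &
    pid=$!
    sleep 0.2                 # give the child a moment to start
    kill -SEGV "$pid"
    wait "$pid"
    status=$?
    echo "$status"
}
```

If core dumps are enabled as described in the instructions, running segv_probe should also leave a file matching /var/tmp/sleep_*.core, which can be deleted afterwards.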

  • From Grimble@2:250/1 to All on Fri Jun 25 17:18:50 2021
    On 24/06/2021 13:13, Bit Twister wrote:
    Yes I have been following the thread. Maybe enabling core dump might give
    you a chance to find out why.
    <Snipping out all your useful code >
    Did all that, launched firefox, got immediate message:
    Exiting due to channel error
    Crash annotation GraphicsCriticalError Receive IPC close with reason=AbnormalShutdown (t=1.32075)
    So, now I have 550MB of crash data. What to do with it?


    --
    Grimble
    Registered Linux User #450547
    Machine 'Bach' running Plasma 5.20.4 on 5.10.45-desktop-2.mga8 kernel.
    Mageia release 8 (Official) for x86_64

  • From David W. Hodgins@2:250/1 to All on Fri Jun 25 17:47:32 2021
    On Fri, 25 Jun 2021 12:18:50 -0400, Grimble <grimble@nomail.afraid.org> wrote:

    <Snipping out all your useful code >
    Did all that, launched firefox, got immediate message:
    Exiting due to channel error
    Crash annotation GraphicsCriticalError Receive IPC close with reason=AbnormalShutdown (t=1.32075)
    So, now I have 550MB of crash data. What to do with it?

    https://fedoraproject.org/wiki/Debugging_guidelines_for_Mozilla_products#Using_coredumpctl_to_get_backtrace

    Are you using wayland?
    https://bugzilla.mozilla.org/show_bug.cgi?id=1538435
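
Note that coredumpctl only knows about dumps captured by systemd-coredump. If kernel.core_pattern points at a plain file path, as in the sysctl settings posted earlier in this thread, the dumps are ordinary files and have to be opened with gdb directly. A small sketch to tell the two cases apart (core_handler is a made-up helper name):

```shell
#!/bin/bash
# Classify a kernel.core_pattern value: a leading "|" means the kernel pipes
# the dump to a helper program; anything else is a file name template.
core_handler() {
    case "$1" in
        '|'*) echo pipe ;;
        *)    echo file ;;
    esac
}

# Show which case applies on the running system, if /proc is available.
if [ -r /proc/sys/kernel/core_pattern ]; then
    core_handler "$(cat /proc/sys/kernel/core_pattern)"
fi
```

When this prints "pipe" and the helper is systemd-coredump, "coredumpctl list" and "coredumpctl gdb" from the linked guide apply; when it prints "file", look in the directory named by the pattern instead.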

    Regards, Dave Hodgins

    --
    Change dwhodgins@nomail.afraid.org to davidwhodgins@teksavvy.com for
    email replies.

  • From Bit Twister@2:250/1 to All on Sat Jun 26 01:43:44 2021
    On Fri, 25 Jun 2021 17:18:50 +0100, Grimble wrote:


    Did all that, launched firefox, got immediate message:
    Exiting due to channel error
    Crash annotation GraphicsCriticalError Receive IPC close with reason=AbnormalShutdown (t=1.32075)
    So, now I have 550MB of crash data. What to do with it?

    Delete it unless you want to practice on it. The kill command was only
    meant to make firefox dump core, to show that you can create a core file.

    Now all you can do is wait for a crash/core caused by whatever is
    causing your unstable system problem.


    Plug
    analyze core file
    into a search engine to get hints/commands to work on a core file.
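
As a starting point, a common first step is a gdb backtrace. The sketch below assumes core files named by the %e_%p_%s.core pattern from earlier in the thread; parse_core_name and core_backtrace are hypothetical helper names, and the path to the real executable has to be supplied by hand because a launcher in /usr/bin may be only a wrapper script.

```shell
#!/bin/bash
# Split a core file name of the form <exe>_<pid>_<signal>.core into its parts.
# Results are left in the variables core_exe, core_pid and core_sig.
parse_core_name() {
    local base rest
    base=$(basename "$1" .core)
    core_sig=${base##*_}        # text after the last underscore
    rest=${base%_*}
    core_pid=${rest##*_}
    core_exe=${rest%_*}         # may itself contain underscores
}

# Print a backtrace of every thread in the dump.
# Usage: core_backtrace CORE_FILE EXECUTABLE   (requires gdb)
core_backtrace() {
    gdb -batch -ex "thread apply all bt" "$2" "$1"
}
```

The signal number in the file name is a quick first hint (11 is SIGSEGV, 6 is SIGABRT). Backtraces are far more readable with the matching debuginfo packages installed.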

  • From Bit Twister@2:250/1 to All on Sat Jun 26 08:56:11 2021
    On Fri, 25 Jun 2021 17:18:50 +0100, Grimble wrote:

    <Snipping out all your useful code >

    I forgot to add information about running ck_4_core in cron.



    #*******************************************************************
    # If using cron to run ck_4_core you have to monitor root's email.
    # If you have postfix installed/running I can recommend that you
    # have it send root email to your account. As root run
    # cd /etc/postfix
    # tail -12 aliases
    #
    # Example of my aliases change
    #
    # diff /var/local/vorig/etc/postfix/aliases_vinstall /etc/postfix/aliases
    # 80c80
    # < root: postfix
    # ---
    # > root: bittwister
    #
    # After aliases change run
    # postalias aliases
    # systemctl restart postfix
    #
    # and send a test message
    # mail -s "root email test shot" root < /dev/null
    #
    # Now check that the message is in your Linux mail inbox
    #
    # You may want to do a normal shutdown/startup to see if there are
    # any core files generated which you would need to ignore/delete.
    # I find lightdm Display Manager core file after each shutdown.
    #
    #*******************************************************************
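
The notes above do not say how the hourly job is set up. One possibility, assuming the system's cron runs every executable in /etc/cron.hourly (an assumption, not something stated in this thread), is to copy the script there. install_hourly is a hypothetical helper name.

```shell
#!/bin/bash
# Copy a script into a cron drop-in directory with mode 755.
# Usage: install_hourly SCRIPT [DIR]   (DIR defaults to /etc/cron.hourly)
install_hourly() {
    local src=$1 dir=${2:-/etc/cron.hourly}
    install -m 755 "$src" "$dir/$(basename "$src")"
}
```

As root, "install_hourly ck_4_core" would do it. Bear in mind that under cron the script's $HOME is root's home directory, so it checks /root rather than each user's home.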


  • From Grimble@2:250/1 to All on Sat Jun 26 13:49:02 2021
    On 26/06/2021 01:43, Bit Twister wrote:
    On Fri, 25 Jun 2021 17:18:50 +0100, Grimble wrote:


    Did all that, launched firefox, got immediate message:
    Exiting due to channel error
    Crash annotation GraphicsCriticalError Receive IPC close with
    reason=AbnormalShutdown (t=1.32075)
    So, now I have 550MB of crash data. What to do with it?

    Delete it unless you want to just practice on it. The kill command was
    just to cause firefox to create a dump and that you can create a core file.

    Didn't make myself clear, Bit - I didn't give the kill command, so it's a
    real dump to analyse. Thanks for the tips.


    --
    Grimble
    Registered Linux User #450547
    Machine 'Bach' running Plasma 5.20.4 on 5.10.45-desktop-2.mga8 kernel.
    Mageia release 8 (Official) for x86_64
