• systemd 252.6-1 produces an Instruction fault, sysvinit works

    From Frank Scheiner@21:1/5 to All on Thu May 18 13:20:01 2023
    Hi all,

    subject says it all: I yesterday upgraded my root FS(es) on my DS25 and
    noticed the following issue with the systemd version right where the
    login prompt should appear (I seem to remember that I recognized
    something similar already late last year with a self-compiled kernel but attributed it to the kernel being self-compiled):

    ```
    aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408
    aboot: switching to OSF/1 PALcode version 1.92
    aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfffffc00ffacc000
    aboot: starting kernel network with arguments root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [ 0.000000] Linux version 6.1.0-9-alpha-smp
    (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU
    ld (GNU Binutils for Debian) 2.40) #1 SMP Debian
    6.1.27-1 (2023-05-08)
    [ 0.000000] Booting GENERIC on Titan variation Granite using machine
    vector PRIVATEER from SRM
    [ 0.000000] Major Options: SMP MAGIC_SYSRQ
    [ 0.000000] Command line: root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [...]
    Begin: Running /scripts/nfs-bottom ... done.
    Begin: Running /scripts/init-bottom ... done.
    [ 9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM
    +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL
    +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ
    +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    [ 10.202143] systemd[1]: Detected architecture alpha.

    Welcome to Debian GNU/Linux 12 (bookworm)!

    [ 11.864251] systemd[1]: Queued start job for default target graphical.target.
    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps
    = 0000 Not tainted
    [ 12.131829] pc is at 0xfffffc0005163bfc
    [ 12.177728] ra is at 0xfffffc0005163bf8
    [ 12.223626] v0 = 0000000000000000 t0 = 0000000000000023 t1 = fffffc00066eb800
    [ 12.310540] t2 = fffffc000512e680 t3 = 0000000000f00000 t4 = 0000000000000008
    [ 12.398431] t5 = 0000000000000001 t6 = 0000000000000000 t7 = fffffc0005160000
    [ 12.486321] a0 = 0000000000000000 a1 = fffffc0005163bc0 a2 = fffffc0005163bf8
    [ 12.573235] a3 = 0000000000000001 a4 = 00000002c8cf86cc a5 = 0000000000000001
    [ 12.661126] t8 = 0000000000000080 t9 = 0000000000000001 t10= fffffc0002891148
    [ 12.749016] t11= 0000000000000000 pv = fffffc00011d4a40 at = 5f19e10505e118bf
    [ 12.835930] gp = fffffc0002871148 sp = 00000000440a695e
    [ 12.899407] Disabling lock debugging due to kernel taint
    [ 12.962883] Trace:
    [ 12.987298] [<fffffc00011155d8>] cgroup_migrate_execute+0x338/0x600
    [ 13.062493] [<fffffc0001115da8>] cgroup_update_dfl_csses+0x2c8/0x330
    [ 13.138665] [<fffffc000111867c>] cgroup_subtree_control_write+0x56c/0x5e0
    [ 13.219719] [<fffffc000110dc24>] cgroup_file_write+0xa4/0x1a0
    [ 13.288079] [<fffffc0001379cd4>] kernfs_fop_write_iter+0x1a4/0x330
    [ 13.362297] [<fffffc00012a06c0>] vfs_write+0x250/0x4c0
    [ 13.423821] [<fffffc00012a0b1c>] ksys_write+0x8c/0x140
    [ 13.485344] [<fffffc000101158c>] entSys+0xac/0xc0
    [ 13.541985]
    [ 13.559563] Code:
    [ 13.559563] fffffc00
    [ 13.582024] 00000000
    [ 13.610344] 00000000
    [ 13.638664] 05163bfc
    [ 13.666985] fffffc00
    [ 13.695305] 02871148
    [ 13.723625] <fffffc00>
    [ 13.751946] 00000000
    [ 13.779289]
    ```

    With the sysvinit version of the root FS and initramfs everything works
    (did some `7z b` runs and an `openssl speed -elapsed` w/o an issue after booting, too):

    ```
    [...]
    INIT: Entering runlevel: 2
    Using makefile-style concurrent boot in runlevel 2.
    Setting up console font and keymap...done.
    [ 55.348604] loop0: detected capacity change from 0 to 2097152
    [ 55.389620] Adding 1048568k swap on /dev/loop0. Priority:-2
    extents:1 across:1048568k SSFS
    Starting enhanced syslogd: rsyslogd.
    Starting SMP IRQ Balancer: irqbalance.
    Starting periodic command scheduler: cron.
    Loading cpufreq kernel modules...done (none).
    Starting system message bus: dbus.
    Starting OpenBSD Secure Shell server: sshd.
    [ 57.059541] exim4(1425): unaligned trap at 0000020000900088: 000000005f9dc335 28 2
    [ 57.060517] exim4(1425): unaligned trap at 00000200008fff14: 000000005f9dc335 28 1
    [ 57.062470] exim4(1425): unaligned trap at 0000020000900088: 000000005f9dc335 28 2
    Starting MTA: exim4.

    Debian GNU/Linux 12 ds25 ttyS0

    ds25 login: [ 63.064420] do_entUnaUser: 15 callbacks suppressed
    [ 63.064420] exim4(1445): unaligned trap at 0000020000900088: 00000000979d736e 28 2
    [ 63.212858] exim4(1445): unaligned trap at 00000200008fff14: 00000000979d736e 28 1
    [ 63.304655] exim4(1445): unaligned trap at 0000020000900088: 00000000979d736e 28 2
    [ 63.395475] exim4(1445): unaligned trap at 00000200008fff14: 00000000979d736e 28 1
    [ 63.488248] exim4(1445): unaligned trap at 0000020000900088: 00000000979d736e 28 2

    ds25 login:
    ```

    I'll try to downgrade systemd with some versions from snapshot.debian.org.

    UPDATE: I did downgrade to 251.6-1 and 250.4-1, which both didn't help
    with Linux 6.1.0-9, though, Interestingly when booting with Linux
    5.3.0-3 everything works. And as 6.1.0-9 works well with sysvinit
    instead of systemd, it looks like the combination of kernel and systemd
    makes the problems. Looking back to [1], this already seems to happen
    for Linux 6.0.7.

    [1]: https://pastebin.com/FbFG7jHx

    Is an instruction fault (1) like an illegal instruction or (2) something
    else? The question remains, why sysvinit doesn't hit this problem. Does
    someone observe similar problems on their machines or e.g. the buildds
    with systemd and recent kernels (>= 6.x.x)?

    Cheers,
    Frank

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Cree@21:1/5 to Frank Scheiner on Sun May 21 02:30:01 2023
    On Thu, May 18, 2023 at 01:01:25PM +0200, Frank Scheiner wrote:
    Welcome to Debian GNU/Linux 12 (bookworm)!

    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps
    = 0000 Not tainted
    [ 12.131829] pc is at 0xfffffc0005163bfc
    [ 12.177728] ra is at 0xfffffc0005163bf8

    Yeah, I think I have seen this one too when I tried out the new kernel.

    Instruction fault 4 occurs on the execution of a reserved instruction
    (i.e. an invalid opcode) or the execution of a privileged instruction
    in user mode.

    Interestingly the program counter is in kernel space. So raises the
    question is it the kernel that tried to execute an invalid instruction?

    Looking at the code:

    [ 13.559563] Code:
    [ 13.559563] fffffc00
    [ 13.582024] 00000000
    [ 13.610344] 00000000
    [ 13.638664] 05163bfc
    [ 13.666985] fffffc00
    [ 13.695305] 02871148
    [ 13.723625] <fffffc00>
    [ 13.751946] 00000000
    [ 13.779289]

    These do not appear to be valid code. They look more like addresses.
    It has the appearance that the kernel has jumped into data and tried
    to execute it as code!

    Maybe report it to the Linux Kernel Mail List?

    Cheers,
    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From John Paul Adrian Glaubitz@21:1/5 to Frank Scheiner on Sun May 21 10:10:01 2023
    Hello Frank!

    On Thu, 2023-05-18 at 13:01 +0200, Frank Scheiner wrote:
    subject says it all: I yesterday upgraded my root FS(es) on my DS25 and noticed the following issue with the systemd version right where the
    login prompt should appear (I seem to remember that I recognized
    something similar already late last year with a self-compiled kernel but attributed it to the kernel being self-compiled):

    ```
    aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408
    aboot: switching to OSF/1 PALcode version 1.92
    aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfffffc00ffacc000 aboot: starting kernel network with arguments root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [ 0.000000] Linux version 6.1.0-9-alpha-smp (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU
    ld (GNU Binutils for Debian) 2.40) #1 SMP Debian
    6.1.27-1 (2023-05-08)
    [ 0.000000] Booting GENERIC on Titan variation Granite using machine vector PRIVATEER from SRM
    [ 0.000000] Major Options: SMP MAGIC_SYSRQ
    [ 0.000000] Command line: root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [...]
    Begin: Running /scripts/nfs-bottom ... done.
    Begin: Running /scripts/init-bottom ... done.
    [ 9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL
    +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ
    +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    [ 10.202143] systemd[1]: Detected architecture alpha.

    Welcome to Debian GNU/Linux 12 (bookworm)!

    [ 11.864251] systemd[1]: Queued start job for default target graphical.target.
    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps
    = 0000 Not tainted
    [ 12.131829] pc is at 0xfffffc0005163bfc
    [ 12.177728] ra is at 0xfffffc0005163bf8
    [ 12.223626] v0 = 0000000000000000 t0 = 0000000000000023 t1 = fffffc00066eb800
    [ 12.310540] t2 = fffffc000512e680 t3 = 0000000000f00000 t4 = 0000000000000008
    [ 12.398431] t5 = 0000000000000001 t6 = 0000000000000000 t7 = fffffc0005160000
    [ 12.486321] a0 = 0000000000000000 a1 = fffffc0005163bc0 a2 = fffffc0005163bf8
    [ 12.573235] a3 = 0000000000000001 a4 = 00000002c8cf86cc a5 = 0000000000000001
    [ 12.661126] t8 = 0000000000000080 t9 = 0000000000000001 t10= fffffc0002891148
    [ 12.749016] t11= 0000000000000000 pv = fffffc00011d4a40 at = 5f19e10505e118bf
    [ 12.835930] gp = fffffc0002871148 sp = 00000000440a695e
    [ 12.899407] Disabling lock debugging due to kernel taint
    [ 12.962883] Trace:
    [ 12.987298] [<fffffc00011155d8>] cgroup_migrate_execute+0x338/0x600
    [ 13.062493] [<fffffc0001115da8>] cgroup_update_dfl_csses+0x2c8/0x330
    [ 13.138665] [<fffffc000111867c>] cgroup_subtree_control_write+0x56c/0x5e0 [ 13.219719] [<fffffc000110dc24>] cgroup_file_write+0xa4/0x1a0
    [ 13.288079] [<fffffc0001379cd4>] kernfs_fop_write_iter+0x1a4/0x330
    [ 13.362297] [<fffffc00012a06c0>] vfs_write+0x250/0x4c0
    [ 13.423821] [<fffffc00012a0b1c>] ksys_write+0x8c/0x140
    [ 13.485344] [<fffffc000101158c>] entSys+0xac/0xc0
    [ 13.541985]
    [ 13.559563] Code:
    [ 13.559563] fffffc00
    [ 13.582024] 00000000
    [ 13.610344] 00000000
    [ 13.638664] 05163bfc
    [ 13.666985] fffffc00
    [ 13.695305] 02871148
    [ 13.723625] <fffffc00>
    [ 13.751946] 00000000
    [ 13.779289]
    ```

    With the sysvinit version of the root FS and initramfs everything works
    (did some `7z b` runs and an `openssl speed -elapsed` w/o an issue after booting, too):

    As you can see from the backtrace, the issue is triggered somewhere in the CGroup code. Since SysVInit doesn't use CGroups, it's not surprising that
    you don't see the backtrace there.

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank Scheiner@21:1/5 to Michael Cree on Mon May 22 12:10:01 2023
    Dear Michael,

    On 21.05.23 02:11, Michael Cree wrote:
    On Thu, May 18, 2023 at 01:01:25PM +0200, Frank Scheiner wrote:
    Welcome to Debian GNU/Linux 12 (bookworm)!

    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps
    = 0000 Not tainted
    [ 12.131829] pc is at 0xfffffc0005163bfc
    [ 12.177728] ra is at 0xfffffc0005163bf8

    Yeah, I think I have seen this one too when I tried out the new kernel.

    Instruction fault 4 occurs on the execution of a reserved instruction
    (i.e. an invalid opcode) or the execution of a privileged instruction
    in user mode.

    Interestingly the program counter is in kernel space. So raises the
    question is it the kernel that tried to execute an invalid instruction?

    Looking at the code:

    [ 13.559563] Code:
    [ 13.559563] fffffc00
    [ 13.582024] 00000000
    [ 13.610344] 00000000
    [ 13.638664] 05163bfc
    [ 13.666985] fffffc00
    [ 13.695305] 02871148
    [ 13.723625] <fffffc00>
    [ 13.751946] 00000000
    [ 13.779289]

    These do not appear to be valid code. They look more like addresses.
    It has the appearance that the kernel has jumped into data and tried
    to execute it as code!

    Thanks for your explanations and analysis, that's more than helpful.

    Maybe report it to the Linux Kernel Mail List?

    The thread is here:

    https://marc.info/?l=linux-alpha&m=168474811430816&w=2

    I would have posted a link to the debian-alpha mailing list, but the
    assumed greylisting there prevented my message to appear there yet.

    Cheers,
    Frank

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)