• Re: Linux 6.1.27, cgroup: Instruction fault 4 with systemd

    From John Paul Adrian Glaubitz@21:1/5 to Frank Scheiner on Mon May 22 12:00:01 2023
    Hello Frank!

    On Mon, 2023-05-22 at 11:34 +0200, Frank Scheiner wrote:
    > Maybe someone on linux-alpha has an idea what could be the reason?

    Try reproducing it with libcgroup to see if it's a systemd or a kernel bug:

    https://wiki.archlinux.org/title/cgroups#Examples

    Adrian

    --
    .''`. John Paul Adrian Glaubitz
    : :' : Debian Developer
    `. `' Physicist
    `- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank Scheiner@21:1/5 to All on Mon May 22 12:00:01 2023
    Dear all,

    as already outlined on the debian-alpha mailing list ([1]), I get an
    instruction fault 4 with Linux 6.1.27 (6.1.0-9 on Debian, actually) and
    systemd on my DS25:

    ```
    aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408
    aboot: switching to OSF/1 PALcode version 1.92
    aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfffffc00ffacc000
    aboot: starting kernel network with arguments root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [ 0.000000] Linux version 6.1.0-9-alpha-smp (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP Debian 6.1.27-1 (2023-05-08)
    [ 0.000000] Booting GENERIC on Titan variation Granite using machine vector PRIVATEER from SRM
    [ 0.000000] Major Options: SMP MAGIC_SYSRQ
    [ 0.000000] Command line: root=/dev/nfs ip=:::::enP2p2s5:dhcp console=ttyS0,9600n8
    [...]
    Begin: Running /scripts/nfs-bottom ... done.
    Begin: Running /scripts/init-bottom ... done.
    [ 9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
    [ 10.202143] systemd[1]: Detected architecture alpha.

    Welcome to Debian GNU/Linux 12 (bookworm)!

    [ 11.864251] systemd[1]: Queued start job for default target graphical.target.
    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [ 12.032220] pc = [<fffffc0005163bfc>] ra = [<fffffc0005163bf8>] ps = 0000 Not tainted
    [ 12.131829] pc is at 0xfffffc0005163bfc
    [ 12.177728] ra is at 0xfffffc0005163bf8
    [ 12.223626] v0 = 0000000000000000 t0 = 0000000000000023 t1 = fffffc00066eb800
    [ 12.310540] t2 = fffffc000512e680 t3 = 0000000000f00000 t4 = 0000000000000008
    [ 12.398431] t5 = 0000000000000001 t6 = 0000000000000000 t7 = fffffc0005160000
    [ 12.486321] a0 = 0000000000000000 a1 = fffffc0005163bc0 a2 = fffffc0005163bf8
    [ 12.573235] a3 = 0000000000000001 a4 = 00000002c8cf86cc a5 = 0000000000000001
    [ 12.661126] t8 = 0000000000000080 t9 = 0000000000000001 t10= fffffc0002891148
    [ 12.749016] t11= 0000000000000000 pv = fffffc00011d4a40 at = 5f19e10505e118bf
    [ 12.835930] gp = fffffc0002871148 sp = 00000000440a695e
    [ 12.899407] Disabling lock debugging due to kernel taint
    [ 12.962883] Trace:
    [ 12.987298] [<fffffc00011155d8>] cgroup_migrate_execute+0x338/0x600
    [ 13.062493] [<fffffc0001115da8>] cgroup_update_dfl_csses+0x2c8/0x330
    [ 13.138665] [<fffffc000111867c>] cgroup_subtree_control_write+0x56c/0x5e0
    [ 13.219719] [<fffffc000110dc24>] cgroup_file_write+0xa4/0x1a0
    [ 13.288079] [<fffffc0001379cd4>] kernfs_fop_write_iter+0x1a4/0x330
    [ 13.362297] [<fffffc00012a06c0>] vfs_write+0x250/0x4c0
    [ 13.423821] [<fffffc00012a0b1c>] ksys_write+0x8c/0x140
    [ 13.485344] [<fffffc000101158c>] entSys+0xac/0xc0
    [ 13.541985]
    [ 13.559563] Code:
    [ 13.559563] fffffc00
    [ 13.582024] 00000000
    [ 13.610344] 00000000
    [ 13.638664] 05163bfc
    [ 13.666985] fffffc00
    [ 13.695305] 02871148
    [ 13.723625] <fffffc00>
    [ 13.751946] 00000000
    [ 13.779289]
    ```

    [1]: https://lists.debian.org/debian-alpha/2023/05/msg00007.html

    Testing a few other combinations, this already seems to happen with
    Linux 6.0.7 and with systemd 251.6-1 and 250.4-1.

    When using sysvinit, the system comes up OK and runs stable over a few
    runs of `7z b` and `openssl speed -elapsed`.

    It also does not happen when using Linux 5.3.0-3 from Debian with the
    same systemd versions on the same machine.

    ****

    Michael provided a first analysis in [2]; Adrian located the problem in
    the cgroup code.

    [2]: https://lists.debian.org/debian-alpha/2023/05/msg00010.html

    ****

    Maybe someone on linux-alpha has an idea what could be the reason?

    Cheers,
    Frank

  • From Frank Scheiner@21:1/5 to John Paul Adrian Glaubitz on Mon Jun 19 14:20:01 2023
    Hi,

    let me add some additional data point(s):

    After some testing on different machines and with different kernel types,
    it looks like this problem is exclusive to MP kernels. It also occurs when
    running an MP kernel on a single-processor machine (tested on an
    AlphaServer 800 5/400 w/EV56).

    Running an SP kernel does not trigger that problem.

    I posted a diff between the -alpha-generic and -alpha-smp kernel
    configurations on [1].

    [1]: https://pastebin.com/AwZQjHD9
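
    For anyone who wants to regenerate such a comparison locally, here is a
    minimal sketch. It assumes the two configurations are available as plain
    files; the `/boot/config-*` paths in the usage comment follow Debian's
    usual naming and are an assumption, and `config_diff` is a hypothetical
    helper name, not something used in this thread.

    ```shell
    # List CONFIG_ options that are set differently in two kernel config files.
    # Lines of the form "# CONFIG_FOO is not set" are ignored, so an option that
    # is enabled in only one file shows up for that file only.
    config_diff() {
        a=$(mktemp)
        b=$(mktemp)
        grep '^CONFIG_' "$1" | sort > "$a"
        grep '^CONFIG_' "$2" | sort > "$b"
        # comm -3 drops lines common to both inputs:
        # unindented lines exist only in $1, tab-indented lines only in $2.
        comm -3 "$a" "$b"
        rm -f "$a" "$b"
    }

    # Usage (paths are assumptions):
    #   config_diff /boot/config-6.1.0-9-alpha-smp /boot/config-6.1.0-9-alpha-generic
    ```

    Sorting first keeps the output independent of the option order in the
    two files, which a plain `diff` would not.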

    On 22.05.23 11:37, John Paul Adrian Glaubitz wrote:
    > Hello Frank!
    >
    > On Mon, 2023-05-22 at 11:34 +0200, Frank Scheiner wrote:
    > > Maybe someone on linux-alpha has an idea what could be the reason?
    >
    > Try reproducing it with libcgroup to see if it's a systemd or a kernel bug:
    >
    > https://wiki.archlinux.org/title/cgroups#Examples

    Took me a while to get back to this and actually get it working...

    Based on various examples and manpages (e.g. [2] and [3]), I ran the
    commands below to test cgroup functionality with System V init installed
    and running instead of systemd:

    ```
    root@ds25:~# uname -a
    Linux ds25 6.3.0-1-alpha-smp #1 SMP Debian 6.3.7-1 (2023-06-12) alpha
    GNU/Linux

    root@ds25:~# mount
    [...]
    cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,mode=755,inode64)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
    cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
    [...]
    cgroup on /sys/fs/cgroup/rdma type cgroup (rw,relatime,rdma)
    cgroup on /sys/fs/cgroup/misc type cgroup (rw,relatime,misc)

    root@ds25:~# CGROUP=/sys/fs/cgroup

    root@ds25:~# mkdir $CGROUP/red
    root@ds25:~# mount -t cgroup -o cpuset red $CGROUP/red
    root@ds25:~# mkdir -p $CGROUP/red/shells/bash
    root@ds25:~# chown root:root $CGROUP/red/shells/bash/*
    root@ds25:~# id johndoe
    uid=1001(johndoe) gid=1001(johndoe) groups=1001(johndoe),100(users)
    root@ds25:~# chown root:johndoe $CGROUP/red/shells/bash/tasks
    root@ds25:~# echo $(cgget -n -v -r cpuset.mems /) > $CGROUP/red/shells/cpuset.mems
    root@ds25:~# echo $(cgget -n -v -r cpuset.cpus /) > $CGROUP/red/shells/cpuset.cpus
    root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.mems
    root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.cpus

    root@ds25:~# cat /proc/self/cgroup
    13:misc:/
    12:rdma:/
    11:pids:/
    10:net_prio:/
    9:perf_event:/
    8:net_cls:/
    7:freezer:/
    6:devices:/
    5:memory:/
    4:blkio:/
    3:cpuacct:/
    2:cpu:/
    1:cpuset:/

    root@ds25:~# echo $$
    1496

    root@ds25:~# cgexec -g cpuset:shells/bash bash

    root@ds25:~# echo $$
    1695

    root@ds25:~# cat /proc/self/cgroup
    13:misc:/
    [...]
    2:cpu:/
    1:cpuset:/shells/bash
    ```

    [2]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch-using_control_groups

    [3]: https://wiki.archlinux.org/title/cgroups#Examples

    I then ran `7za b` in that shell. Although `7za` started two threads,
    assuming it had access to both CPUs, `htop` showed both of them running
    on the first processor only. So at least this part of the cgroup
    functionality appears to work with Linux 6.3.0-1 from Debian when
    using System V init.
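
    The `htop` observation could be cross-checked against the kernel's own
    view of the allowed CPUs, which is exposed as `Cpus_allowed_list` in
    `/proc/<pid>/status`. The following is only a sketch; `cpus_allowed` is
    a hypothetical helper name and was not run on the DS25.

    ```shell
    # Print the Cpus_allowed_list value from a /proc/<pid>/status style file.
    cpus_allowed() {
        sed -n 's/^Cpus_allowed_list:[[:space:]]*//p' "$1"
    }

    # Usage inside the shell started with cgexec:
    #   cpus_allowed /proc/self/status
    # Per-thread check for a running process (PID is a placeholder):
    #   for t in /proc/PID/task/*/status; do cpus_allowed "$t"; done
    ```

    If the cpuset restriction is in effect, each call should report only the
    CPU written to `cpuset.cpus` for that group.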

    So it could be that this problem is only triggered by one or more
    specific controllers. But I don't know exactly how to determine which
    controller(s) are used for the target "graphical.target", which is where
    this seems to happen according to the following excerpt (more details
    in [4]):

    ```
    [...]
    [ 11.864251] systemd[1]: Queued start job for default target graphical.target.
    [ 11.958978] CPU 1
    [ 11.958978] systemd(1): Instruction fault 4
    [...]
    ```

    [4]: https://lists.debian.org/debian-alpha/2023/05/msg00012.html
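
    One difference between the two setups is worth noting: the System V init
    test above uses cgroup v1 mounts (`type cgroup`), while systemd reports
    `default-hierarchy=unified`, and the functions in the trace
    (`cgroup_subtree_control_write`, `cgroup_update_dfl_csses`) belong to the
    cgroup v2 code path that handles writes to `cgroup.subtree_control`. A
    possible way to narrow down the controller without systemd is to enable
    the v2 controllers one at a time while a process sits in a child group.
    This is an untested sketch: the mount point, the `probe` group name and
    both function names are assumptions. Controllers that are still bound to
    a v1 hierarchy do not appear in `cgroup.controllers`, so the v1 mounts
    would probably have to be removed first, or v1 disabled with the
    `cgroup_no_v1=all` kernel parameter.

    ```shell
    # Print one "+name" token per controller listed in ROOT/cgroup.controllers.
    controller_tokens() {
        for c in $(cat "$1/cgroup.controllers"); do
            printf '+%s\n' "$c"
        done
    }

    # Enable the controllers one by one in ROOT/cgroup.subtree_control.
    # Each controller is announced before the write, so that on a fault the
    # last announcement on the console names the controller being enabled.
    enable_one_by_one() {
        root=$1
        controller_tokens "$root" | while read -r tok; do
            echo "enabling $tok"
            echo "$tok" > "$root/cgroup.subtree_control" || echo "failed: $tok"
        done
    }

    # Usage on the test machine as root (mount point and group name assumed):
    #   CG2=/mnt/cgroup2
    #   mkdir -p "$CG2" && mount -t cgroup2 none "$CG2"
    #   mkdir "$CG2/probe" && echo $$ > "$CG2/probe/cgroup.procs"
    #   enable_one_by_one "$CG2"
    ```

    Placing the shell in a child group first matters here, because enabling a
    controller in the parent should then migrate that process, which is the
    step where the trace shows `cgroup_migrate_execute`.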

    Cheers,
    Frank
