• Debian SID kernel doesn't boot on PowerBook 3400c

    From Christophe Leroy@21:1/5 to All on Sat Jul 31 18:40:02 2021
    Stan Johnson <userm57@yahoo.com> wrote:

    Hello,

    The current Debian SID kernel will not boot on a PowerBook 3400c running
    the latest version of Debian SID. If booted using the BootX extension,
    the kernel hangs immediately:

    "Welcome to Linux, kernel 5.10.0-8-powerpc"

    If booted from Mac OS, the Mac OS screen hangs.

    Booting also hangs if the "No video driver" option is selected in BootX;
    that option causes "video=ofonly" to be passed to the kernel.

    This is the current command line that I'm using in BootX:
    root=/dev/sda13 video=chips65550:vmode:14,cmode:16

    Kernel v5.9 works as expected.

    The config file I'm using is attached.

    Here are the results of a git bisect, marking v5.9 as "good" and the
    most current kernel as "bad":

    $ cd linux
    $ git remote update
    $ git bisect reset
    $ git bisect start
    $ git bisect bad
    $ git bisect good v5.9

    Note: "bad" -> hangs at boot; "good" -> boots to login prompt

     1) 5.11.0-rc5-pmac-00034-g684da7628d9 (bad)
     2) 5.10.0-rc3-pmac-00383-gbb9dd3ce617 (good)
     3) 5.10.0-pmac-06637-g2911ed9f47b (good)
        Note: I had to disable SMP to build this kernel.
     4) 5.10.0-pmac-10584-g9805529ec54 (good)
        Note: I had to disable SMP to build this kernel.
     5) 5.10.0-pmac-12577-g8552d28e140 (bad)
     6) 5.10.0-pmac-11576-g8a5be36b930 (bad)
     7) 5.10.0-pmac-11044-gbe695ee29e8 (good)
        Note: I had to disable SMP to build this kernel.
     8) 5.10.0-rc2-pmac-00288-g59d512e4374 (bad)
     9) 5.10.0-rc2-pmac-00155-gc3d35ddd1ec (good)
    10) 5.10.0-rc2-pmac-00221-g7049b288ea8 (good)
    11) 5.10.0-rc2-pmac-00254-g4b74a35fc7e (bad)
    12) 5.10.0-rc2-pmac-00237-ged22bb8d39f (good)
    13) 5.10.0-rc2-pmac-00245-g87b57ea7e10 (good)
    14) 5.10.0-rc2-pmac-00249-gf10881a46f8 (bad)
    15) 5.10.0-rc2-pmac-00247-gf8a4b277c3c (good)
    16) 5.10.0-rc2-pmac-00248-gdb972a3787d (bad)

    db972a3787d12b1ce9ba7a31ec376d8a79e04c47 is the first bad commit

    Not sure this is really the root of the problem.

    Can you try again without CONFIG_VMAP_STACK ?

    Thanks
    Christophe


    commit db972a3787d12b1ce9ba7a31ec376d8a79e04c47
    Author: Christophe Leroy <christophe.leroy@csgroup.eu>
    Date: Tue Dec 8 05:24:19 2020 +0000

    powerpc/powermac: Fix low_sleep_handler with CONFIG_VMAP_STACK

    low_sleep_handler() can't restore the context from standard
    stack because the stack can hardly be accessed with MMU OFF.

    Store everything in a global storage area instead of storing
    a pointer to the stack in that global storage area.

    To avoid a complete churn of the function, still use r1 as
    the pointer to the storage area during restore.

    Fixes: cd08f109e262 ("powerpc/32s: Enable CONFIG_VMAP_STACK")
    Reported-by: Giuseppe Sacco <giuseppe@sguazz.it>
    Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
    Tested-by: Giuseppe Sacco <giuseppe@sguazz.it>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://lore.kernel.org/r/e3e0d8042a3ba75cb4a9546c19c408b5b5b28994.1607404931.git.christophe.leroy@csgroup.eu

    :040000 040000 d5039513d19748fc13712a2c67ae034371b95fe7 cbbdbdc4b05c713ea2577674260fd37e71306cc0 M arch

    Please let me know if you need more information.

    -Stan Johnson
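
Note: the kernel names in the bisect list embed the commit under test. The suffix follows the "git describe" pattern, ending in -g<abbreviated hash>, which is why step 16 ends in the same digits as the first bad commit. A minimal shell sketch, assuming that suffix format, for recovering the hash from such a version string (commit_from_version is a hypothetical helper name, not an existing tool):

```shell
# Extract the abbreviated commit hash from a kernel version string of the
# form <version>-<count>-g<hash>, as in the bisect list in this message.
# commit_from_version is a hypothetical helper name.
commit_from_version() {
    ver=$1
    case $ver in
        *-g*) printf '%s\n' "${ver##*-g}" ;;  # keep what follows the last "-g"
        *)    return 1 ;;                     # no "-g<hash>" suffix present
    esac
}

# Step 16 of the bisect list:
commit_from_version 5.10.0-rc2-pmac-00248-gdb972a3787d
```

The printed hash can then be passed to "git show" in the kernel tree to see which commit a given test kernel was built from.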

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christophe Leroy@21:1/5 to All on Mon Aug 2 17:10:02 2021
    On 31/07/2021 at 20:24, Stan Johnson wrote:
    Hi Christophe,

    On 7/31/21 9:58 AM, Christophe Leroy wrote:
    Stan Johnson <userm57@yahoo.com> wrote:

    Hello,

    The current Debian SID kernel will not boot on a PowerBook 3400c running
    the latest version of Debian SID. If booted using the BootX extension,
    the kernel hangs immediately:

    "Welcome to Linux, kernel 5.10.0-8-powerpc"

    If booted from Mac OS, the Mac OS screen hangs.

    Booting also hangs if the "No video driver" option is selected in BootX;
    that option causes "video=ofonly" to be passed to the kernel.

    This is the current command line that I'm using in BootX:
    root=/dev/sda13 video=chips65550:vmode:14,cmode:16

    Kernel v5.9 works as expected.

    The config file I'm using is attached.

    Here are the results of a git bisect, marking v5.9 as "good" and the
    most current kernel as "bad":

    $ cd linux
    $ git remote update
    $ git bisect reset
    $ git bisect start
    $ git bisect bad
    $ git bisect good v5.9

    Note: "bad" -> hangs at boot; "good" -> boots to login prompt

     1) 5.11.0-rc5-pmac-00034-g684da7628d9 (bad)
     2) 5.10.0-rc3-pmac-00383-gbb9dd3ce617 (good)
     3) 5.10.0-pmac-06637-g2911ed9f47b (good)
        Note: I had to disable SMP to build this kernel.
     4) 5.10.0-pmac-10584-g9805529ec54 (good)
        Note: I had to disable SMP to build this kernel.
     5) 5.10.0-pmac-12577-g8552d28e140 (bad)
     6) 5.10.0-pmac-11576-g8a5be36b930 (bad)
     7) 5.10.0-pmac-11044-gbe695ee29e8 (good)
        Note: I had to disable SMP to build this kernel.
     8) 5.10.0-rc2-pmac-00288-g59d512e4374 (bad)
     9) 5.10.0-rc2-pmac-00155-gc3d35ddd1ec (good)
    10) 5.10.0-rc2-pmac-00221-g7049b288ea8 (good)
    11) 5.10.0-rc2-pmac-00254-g4b74a35fc7e (bad)
    12) 5.10.0-rc2-pmac-00237-ged22bb8d39f (good)
    13) 5.10.0-rc2-pmac-00245-g87b57ea7e10 (good)
    14) 5.10.0-rc2-pmac-00249-gf10881a46f8 (bad)
    15) 5.10.0-rc2-pmac-00247-gf8a4b277c3c (good)
    16) 5.10.0-rc2-pmac-00248-gdb972a3787d (bad)

    db972a3787d12b1ce9ba7a31ec376d8a79e04c47 is the first bad commit

    Not sure this is really the root of the problem.

    Can you try again without CONFIG_VMAP_STACK ?

    Thanks
    Christophe
    ...


    With CONFIG_VMAP_STACK=y, 5.11.0-rc5-pmac-00034-g684da7628d9 hangs at
    boot on the PB 3400c.

    Without CONFIG_VMAP_STACK, 5.11.0-rc5-pmac-00034-g684da7628d9 boots as expected.

    I didn't re-build the Debian SID kernel, though I confirmed that the
    Debian config file for 5.10.0-8-powerpc includes CONFIG_VMAP_STACK=y.
    It's not clear whether removing CONFIG_VMAP_STACK would be appropriate
    for other powerpc systems.

    Please let me know why removing CONFIG_VMAP_STACK fixed the problem on
    the PB 3400c. Should CONFIG_HAVE_ARCH_VMAP_STACK also be removed?


    When CONFIG_HAVE_ARCH_VMAP_STACK is selected by the architecture, CONFIG_VMAP_STACK is selected by
    default.

    The point is that your config has CONFIG_ADB_PMU.

    A bug with VMAP stack was detected during the 5.9 release cycle for platforms selecting CONFIG_ADB_PMU.
    Because fixing the bug was a heavy change, we preferred at that time to disable VMAP stack, so VMAP
    stack was deselected for CONFIG_ADB_PMU by commit 4a133eb351ccc275683ad49305d0b04dde903733.

    Then as a second step, the proper fix was implemented and then VMAP stack was enabled again by the
    commit you bisected.

    Taking into account that the problem disappears for you when you manually deselect VMAP stacks, it
    means the problem is not the fix itself, but the fact that VMAP stacks are now enabled by default.

    We need to understand why VMAP stack doesn't work on your platform and, more than that, why it
    doesn't boot at all with VMAP stack.

    Could you send me the dmesg output of your system when it properly boots ?

    Did you check with kernel 5.13 ?

    Thanks
    Christophe
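
Note: because CONFIG_VMAP_STACK is enabled by default once the architecture supports it, it is worth confirming what a given .config actually contains before comparing builds. A minimal sketch, assuming the usual .config line formats ("CONFIG_X=y" and "# CONFIG_X is not set"); config_state is a hypothetical helper, not part of the kernel tree:

```shell
# Print the state of one Kconfig symbol in a .config file:
# its value (e.g. "y"), "unset" for "# CONFIG_X is not set", or "absent".
# config_state is a hypothetical helper name.
config_state() {
    file=$1
    sym=$2
    if line=$(grep "^${sym}=" "$file"); then
        printf '%s\n' "${line#*=}"          # text after the first "="
    elif grep -q "^# ${sym} is not set\$" "$file"; then
        echo unset
    else
        echo absent
    fi
}

# Demo on a throwaway file holding the two options discussed above.
demo=$(mktemp)
printf '%s\n' 'CONFIG_ADB_PMU=y' '# CONFIG_VMAP_STACK is not set' > "$demo"
config_state "$demo" CONFIG_ADB_PMU
config_state "$demo" CONFIG_VMAP_STACK
rm -f "$demo"
```

Matching on "SYMBOL=" keeps a query for one symbol from also matching longer names that share the same prefix.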

  • From Christophe Leroy@21:1/5 to All on Tue Aug 3 12:30:01 2021
    On 02/08/2021 at 19:32, Stan Johnson wrote:
    On 8/2/21 8:41 AM, Christophe Leroy wrote:


    On 31/07/2021 at 20:24, Stan Johnson wrote:
    Hi Christophe,

    On 7/31/21 9:58 AM, Christophe Leroy wrote:
    Stan Johnson <userm57@yahoo.com> wrote:

    Hello,

    The current Debian SID kernel will not boot on a PowerBook 3400c running
    the latest version of Debian SID. If booted using the BootX extension,
    the kernel hangs immediately:

    "Welcome to Linux, kernel 5.10.0-8-powerpc"

    If booted from Mac OS, the Mac OS screen hangs.

    Booting also hangs if the "No video driver" option is selected in BootX;
    that option causes "video=ofonly" to be passed to the kernel.

    This is the current command line that I'm using in BootX:
    root=/dev/sda13 video=chips65550:vmode:14,cmode:16

    Kernel v5.9 works as expected.

    The config file I'm using is attached.

    Here are the results of a git bisect, marking v5.9 as "good" and the
    most current kernel as "bad":

    $ cd linux
    $ git remote update
    $ git bisect reset
    $ git bisect start
    $ git bisect bad
    $ git bisect good v5.9

    Note: "bad" -> hangs at boot; "good" -> boots to login prompt

      1) 5.11.0-rc5-pmac-00034-g684da7628d9 (bad)
      2) 5.10.0-rc3-pmac-00383-gbb9dd3ce617 (good)
      3) 5.10.0-pmac-06637-g2911ed9f47b (good)
         Note: I had to disable SMP to build this kernel.
      4) 5.10.0-pmac-10584-g9805529ec54 (good)
         Note: I had to disable SMP to build this kernel.
      5) 5.10.0-pmac-12577-g8552d28e140 (bad)
      6) 5.10.0-pmac-11576-g8a5be36b930 (bad)
      7) 5.10.0-pmac-11044-gbe695ee29e8 (good)
         Note: I had to disable SMP to build this kernel.
      8) 5.10.0-rc2-pmac-00288-g59d512e4374 (bad)
      9) 5.10.0-rc2-pmac-00155-gc3d35ddd1ec (good)
    10) 5.10.0-rc2-pmac-00221-g7049b288ea8 (good)
    11) 5.10.0-rc2-pmac-00254-g4b74a35fc7e (bad)
    12) 5.10.0-rc2-pmac-00237-ged22bb8d39f (good)
    13) 5.10.0-rc2-pmac-00245-g87b57ea7e10 (good)
    14) 5.10.0-rc2-pmac-00249-gf10881a46f8 (bad)
    15) 5.10.0-rc2-pmac-00247-gf8a4b277c3c (good)
    16) 5.10.0-rc2-pmac-00248-gdb972a3787d (bad)

    db972a3787d12b1ce9ba7a31ec376d8a79e04c47 is the first bad commit

    Not sure this is really the root of the problem.

    Can you try again without CONFIG_VMAP_STACK ?

    Thanks
    Christophe
    ...


    With CONFIG_VMAP_STACK=y, 5.11.0-rc5-pmac-00034-g684da7628d9 hangs at
    boot on the PB 3400c.

    Without CONFIG_VMAP_STACK, 5.11.0-rc5-pmac-00034-g684da7628d9 boots as
    expected.

    I didn't re-build the Debian SID kernel, though I confirmed that the
    Debian config file for 5.10.0-8-powerpc includes CONFIG_VMAP_STACK=y.
    It's not clear whether removing CONFIG_VMAP_STACK would be appropriate
    for other powerpc systems.

    Please let me know why removing CONFIG_VMAP_STACK fixed the problem on
    the PB 3400c. Should CONFIG_HAVE_ARCH_VMAP_STACK also be removed?


    When CONFIG_HAVE_ARCH_VMAP_STACK is selected by the architecture,
    CONFIG_VMAP_STACK is selected by default.

    The point is that your config has CONFIG_ADB_PMU.

    A bug with VMAP stack was detected during the 5.9 release cycle for
    platforms selecting CONFIG_ADB_PMU. Because fixing the bug was a heavy
    change, we preferred at that time to disable VMAP stack, so VMAP stack
    was deselected for CONFIG_ADB_PMU by commit
    4a133eb351ccc275683ad49305d0b04dde903733.

    Then as a second step, the proper fix was implemented and then VMAP
    stack was enabled again by the commit you bisected.

    Taking into account that the problem disappears for you when you
    manually deselect VMAP stacks, it means the problem is not the fix
    itself, but the fact that VMAP stacks are now enabled by default.

    We need to understand why VMAP stack doesn't work on your platform and,
    more than that, why it doesn't boot at all with VMAP stack.

    Could you send me the dmesg output of your system when it properly boots?

    Did you check with kernel 5.13?

    Thanks
    Christophe


    Christophe,

    Thanks for your response. It looks like I never tested v5.13 (I was originally just reporting that the default Debian SID kernel, 5.10.0-8-powerpc, hangs at boot on the PB 3400c).

    So I rebuilt the stock v5.13 from kernel.org using Finn's dot-config-powermac-5.13, which got changed slightly at compilation (see dot-config-v5.13-pmac, attached). It has CONFIG_VMAP_STACK and
    CONFIG_ADB_PMU set, and it booted, but there were multiple memory
    errors. So it looks like the hang-at-boot problem was fixed sometime
    after v5.11, but there are now memory errors (similar to Wallstreet).

    With CONFIG_VMAP_STACK not set (CONFIG_ADB_PMU is still set), the
    .config file turns into the attached dot-config-v5.13-pmac_NO_VMAP. And
    there were still memory errors (dmesg output attached).

    The memory errors may be a completely unrelated issue, since they occur regardless of the CONFIG_VMAP_STACK setting.

    To help rule out a hardware issue, I confirmed that memory errors don't
    occur with v5.8.2 (dmesg output attached).

    A useful git bisect might be possible if CONFIG_VMAP_STACK is disabled
    for each build. I would need to determine where the memory errors
    started (v5.9, v5.10, v5.11, or v5.12). There is the complication that
    (at least) several v5.10 kernels won't compile if SMP is set, so I might
    need to disable that everywhere as well, assuming the SMP fix didn't
    cause the memory errors.


    Thanks a lot for the information.

    Looks like the memory errors are linked to KUAP (Kernel Userspace Access Protection). Based on the
    places the problems happen, I don't think there are any invalid accesses, so there must be something
    wrong in the KUAP logic, probably linked to some interrupts happening in kernel mode while the KUAP
    window is opened. And because it is not selected by default on book3s/32 until 5.14, probably nobody
    ever tested it in a real environment before you.

    I think the issue may be linked to commit https://github.com/linuxppc/linux/commit/c16728835 which
    happened between 5.12 and 5.13. Would be nice if you could confirm that 5.12 doesn't have the
    problem (At the same time maybe you can see if 5.12 also boots OK with CONFIG_VMAP_STACK)

    Note that the error detected in the other thread which is being discussed with Finn might also be an
    issue to be checked while we are here.

    Thanks
    Christophe

  • From Finn Thain@21:1/5 to Christophe Leroy on Wed Aug 4 03:00:01 2021
    On Tue, 3 Aug 2021, Christophe Leroy wrote:


    Looks like the memory errors are linked to KUAP (Kernel Userspace Access
    Protection). Based on the places the problems happen, I don't think
    there are any invalid accesses, so there must be something wrong in the
    KUAP logic, probably linked to some interrupts happening in kernel mode
    while the KUAP window is opened. And because it is not selected by default
    on book3s/32 until 5.14, probably nobody ever tested it in a real
    environment before you.

    I think the issue may be linked to commit https://github.com/linuxppc/linux/commit/c16728835 which happened
    between 5.12 and 5.13.

    The messages, "Kernel attempted to write user page (c6207c) - exploit
    attempt? (uid: 0)", appear in the console logs generated by v5.13. Those
    logs come from the Powerbook G3 discussion in the other thread. Could that
    be the same bug?

  • From Finn Thain@21:1/5 to Stan Johnson on Wed Aug 4 02:20:01 2021
    On Tue, 3 Aug 2021, Stan Johnson wrote:


    I'm not sure of the issue you are referencing. If it's the Wallstreet
    issue, I believe we were waiting to hear back from you regarding the
    memory errors that crop up with CONFIG_VMAP_STACK=y and mem >464M.
    Finn, if that is not correct, please let me know.


    No, it's not correct. I sent a message dated 3 Aug 2021 with a patch from Christophe. I also sent (privately) a message with instructions for
    testing that patch. I will resend these now.

  • From Christophe Leroy@21:1/5 to All on Wed Aug 4 08:50:01 2021
    On 04/08/2021 at 02:34, Finn Thain wrote:

    On Tue, 3 Aug 2021, Christophe Leroy wrote:


    Looks like the memory errors are linked to KUAP (Kernel Userspace Access
    Protection). Based on the places the problems happen, I don't think
    there are any invalid accesses, so there must be something wrong in the
    KUAP logic, probably linked to some interrupts happening in kernel mode
    while the KUAP window is opened. And because it is not selected by default
    on book3s/32 until 5.14, probably nobody ever tested it in a real
    environment before you.

    I think the issue may be linked to commit
    https://github.com/linuxppc/linux/commit/c16728835 which happened
    between 5.12 and 5.13.

    The messages, "Kernel attempted to write user page (c6207c) - exploit attempt? (uid: 0)", appear in the console logs generated by v5.13. Those
    logs come from the Powerbook G3 discussion in the other thread. Could that
    be the same bug?


    Yes, most likely.

    So you confirm this appears with 5.13 and not 5.12 ?

    Can you check if they happen at commit c16728835
    Can you check if they DO NOT happen at preceding commit c16728835~

    Could you test without CONFIG_PPC_KUAP
    Could you test with CONFIG_PPC_KUAP and CONFIG_PPC_KUAP_DEBUG

    Thanks
    Christophe

  • From Christophe Leroy@21:1/5 to All on Fri Aug 6 08:40:01 2021
    +nicholas piggin for the C interrupt stuff

    On 06/08/2021 at 03:06, Finn Thain wrote:
    (Christophe, you've seen some of this before, however there are new
    results added at the end. I've Cc'd the mailing lists this time.)

    On Wed, 4 Aug 2021, Stan Johnson wrote:

    On 8/4/21 8:41 PM, Finn Thain wrote:


    $ curl https://lore.kernel.org/lkml/9b64dde3-6ebd-b446-41d9-61e8cb0d8c39@csgroup.eu/raw
    ../message.mbox
    ok

    $ sha1 ../message.mbox
    SHA1 (../message.mbox) = 436ce0adf893c46c84c54607f73c838897caeeea


    On Wed, 4 Aug 2021, Christophe Leroy wrote:

    Can you check if they happen at commit c16728835


    $ git checkout c16728835eec
    Checking out files: 100% (20728/20728), done.
    Note: checking out 'c16728835eec'.

    You are in 'detached HEAD' state. You can look around, make experimental
    changes and commit them, and you can discard any commits you make in this
    state without impacting any branches by performing another checkout.

    If you want to create a new branch to retain commits you create, you may
    do so (now or later) by using -b with the checkout command again. Example:

    git checkout -b <new-branch-name>

    HEAD is now at c16728835eec powerpc/32: Manage KUAP in C
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ strings vmlinux | fgrep 'Linux version'
    Linux version 5.12.0-rc3-pmac-00078-geb51c431b81 (johnson@ThinkPad) (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Wed Aug 4 21:50:47 MDT 2021

    1) PB 3400c
    Hangs at boot (Mac OS screen), no serial console output

    2) Wallstreet
    X fails, errors ("Kernel attempted to write user page", "BUG: Unable to
    handle kernel instruction fetch"), see Wallstreet_console-1.txt.


    The log shows that the error "Kernel attempted to write user page
    (b3399774) - exploit attempt?" happens after commit c16728835eec ("powerpc/32: Manage KUAP in C").

    I think I found a possible cause for this. After the above patch, locking KUAP on interrupt is done
    in interrupt_enter_prepare(). But in the case of an NMI, that function is not called. That means
    that when leaving the interrupt through interrupt_exit_kernel_prepare(), the supposedly saved previous
    KUAP status is garbage.

    An easy way to fix that is to add the missing steps to interrupt_nmi_enter_prepare(). I'll do that at
    least for testing, but in the end it is not so easy, because of booke32 and 40x.

    The problem on booke32 and 40x is that the "critical interrupts" exit goes through interrupt_return
    when they happened in user mode and bypasses interrupt_return when they happened in kernel mode. So it
    is not easy to manage.




    Can you check if they DO NOT happen at preceding commit c16728835~


    $ git checkout c16728835~
    Previous HEAD position was c16728835eec powerpc/32: Manage KUAP in C
    HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap save/restore/check helpers
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux

    Linux version 5.12.0-rc3-pmac-00077-gc9f6e8dd045

    3) PB 3400c
    Hangs at boot (Mac OS screen)

    4) Wallstreet
    X fails, errors in console log (different than test 2), see
    Wallstreet_console-2.txt.


    This log shows that the errors "xfce4-session[1775]: bus error (7)" and "kernel BUG at arch/powerpc/kernel/interrupt.c:49!" happen prior to commit c16728835eec ("powerpc/32: Manage KUAP in C").

    As mentioned by Nic, this is due to r11 being clobbered. For the time being, the only r11 clobber
    identified is the one I have provided a fix for. I'm wondering whether it was applied for all
    further tests or not.



    $ git checkout 0b45359aa2df
    ...
    HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap save/restore/check helpers
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux

    Linux version 5.12.0-rc3-pmac-00077-ge06b29ce146

    5) PB 3400c
    Hangs at boot (Mac OS screen)

    6) Wallstreet
    X failed (X login succeeded, but setting up desktop failed), errors in
    console log, see Wallstreet_console-3.txt.


    (No need for those two tests: it's exactly the same code and almost the
    same failure modes: "kernel BUG at arch/powerpc/kernel/interrupt.c:50".)

    On Thu, 5 Aug 2021, Stan Johnson wrote:

    On 8/5/21 12:47 AM, Finn Thain wrote:

    On Wed, 4 Aug 2021, Christophe Leroy wrote:

    Could you test without CONFIG_PPC_KUAP
    ...

    $ git checkout c16728835eec
    ...
    HEAD is now at c16728835eec powerpc/32: Manage KUAP in C
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ scripts/config -d CONFIG_PPC_KUAP
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ grep CONFIG_PPC_KUAP .config
    # CONFIG_PPC_KUAP is not set

    Linux version 5.12.0-rc3-pmac-00078-g5cac2bc3752

    7) PB 3400c
    Hangs at boot (Mac OS screen)

    8) Wallstreet
    Everything works, no errors (see Wallstreet_console-4.txt).


    That would seem to implicate CONFIG_PPC_KUAP itself. (Note that all builds
    up until this one have CONFIG_PPC_KUAP=y.)

    Yes, I believe so; see the beginning of this mail.





    Could you test with CONFIG_PPC_KUAP and CONFIG_PPC_KUAP_DEBUG
    ...

    $ scripts/config -e CONFIG_PPC_KUAP
    $ scripts/config -e CONFIG_PPC_KUAP_DEBUG
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ grep CONFIG_PPC_KUAP .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y

    Linux version 5.12.0-rc3-pmac-00078-g5cac2bc3752

    9) PB 3400c
    Hangs at boot (Mac OS screen)

    10) Wallstreet
    X failed at first login, worked at second login, one error in console
    log ("BUG: Unable to handle kernel instruction fetch"), see
    Wallstreet_console-5.txt.


    One might expect to see "Kernel attempted to write user page (b3399774) - exploit attempt?" again here (see c16728835eec build above) but instead
    this log says "Oops: Kernel access of bad area, sig: 11".

    Maybe the test should be done a second time. As r11 is garbage it may or may not be a user address.
    If it is a user address then we get "Kernel attempted to write user page". If it is a random kernel
    address, we likely get "Kernel access of bad area" instead.



    BTW, this procedure could be made simpler and easier if I pushed git
    branches to a public repo for Stan to build, which included Christophe's
    fix plus hard-wired Kconfig changes. That way, the .config file could be
    held constant and the commit hash in the serial console log would be more meaningful.


    I like the idea, I think I'm going to provide testing fixes through a git repo, that will for sure
    make things easier.

    Thanks
    Christophe

  • From Finn Thain@21:1/5 to Christophe Leroy on Fri Aug 6 12:10:02 2021
    On Fri, 6 Aug 2021, Christophe Leroy wrote:


    Can you check if they DO NOT happen at preceding commit c16728835~


    $ git checkout c16728835~
    Previous HEAD position was c16728835eec powerpc/32: Manage KUAP in C
    HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap save/restore/check helpers
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be
    lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux

    Linux version 5.12.0-rc3-pmac-00077-gc9f6e8dd045

    3) PB 3400c
    Hangs at boot (Mac OS screen)

    4) Wallstreet
    X fails, errors in console log (different than test 2), see Wallstreet_console-2.txt.


    This log shows that the errors "xfce4-session[1775]: bus error (7)" and "kernel BUG at arch/powerpc/kernel/interrupt.c:49!" happen prior to commit c16728835eec ("powerpc/32: Manage KUAP in C").

    As mentioned by Nic, this is due to r11 being clobbered. For the time being, the only r11 clobber identified is the one I have provided a fix for. I'm wondering whether it was applied for all further tests or not.


    Your fix was applied to this build with "git am ../message.mbox".

    ...


    Could you test with CONFIG_PPC_KUAP and CONFIG_PPC_KUAP_DEBUG
    ...

    $ scripts/config -e CONFIG_PPC_KUAP
    $ scripts/config -e CONFIG_PPC_KUAP_DEBUG
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ grep CONFIG_PPC_KUAP .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y

    Linux version 5.12.0-rc3-pmac-00078-g5cac2bc3752

    9) PB 3400c
    Hangs at boot (Mac OS screen)

    10) Wallstreet
    X failed at first login, worked at second login, one error in console
    log ("BUG: Unable to handle kernel instruction fetch"), see Wallstreet_console-5.txt.


    One might expect to see "Kernel attempted to write user page (b3399774) - exploit attempt?" again here (see c16728835eec build above) but instead this log says "Oops: Kernel access of bad area, sig: 11".

    Maybe the test should be done a second time. As r11 is garbage it may or
    may not be a user address. If it is a user address then we get "Kernel
    attempted to write user page". If it is a random kernel address, we
    likely get "Kernel access of bad area" instead.


    Your fix was applied here also.

  • From Christophe Leroy@21:1/5 to All on Fri Aug 6 12:20:01 2021
    On 06/08/2021 at 11:43, Finn Thain wrote:
    On Fri, 6 Aug 2021, Christophe Leroy wrote:


    Can you check if they DO NOT happen at preceding commit c16728835~

    $ git checkout c16728835~
    Previous HEAD position was c16728835eec powerpc/32: Manage KUAP in C
    HEAD is now at 0b45359aa2df powerpc/8xx: Create C version of kuap
    save/restore/check helpers
    $ git am ../message.mbox
    warning: Patch sent with format=flowed; space at the end of lines might be
    lost.
    Applying: powerpc/32: Dismantle EXC_XFER_STD/LITE/TEMPLATE
    $ cp ../dot-config-powermac-5.13 .config
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean
    olddefconfig vmlinux

    Linux version 5.12.0-rc3-pmac-00077-gc9f6e8dd045

    3) PB 3400c
    Hangs at boot (Mac OS screen)

    4) Wallstreet
    X fails, errors in console log (different than test 2), see
    Wallstreet_console-2.txt.


    This log shows that the errors "xfce4-session[1775]: bus error (7)" and
    "kernel BUG at arch/powerpc/kernel/interrupt.c:49!" happen prior to commit
    c16728835eec ("powerpc/32: Manage KUAP in C").

    As mentioned by Nic, this is due to r11 being clobbered. For the time being,
    the only r11 clobber identified is the one I have provided a fix for. I'm
    wondering whether it was applied for all further tests or not.


    Your fix was applied to this build with "git am ../message.mbox".

    Ok good.


    ...


    Could you test with CONFIG_PPC_KUAP and CONFIG_PPC_KUAP_DEBUG
    ...

    $ scripts/config -e CONFIG_PPC_KUAP
    $ scripts/config -e CONFIG_PPC_KUAP_DEBUG
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean
    olddefconfig vmlinux
    $ grep CONFIG_PPC_KUAP .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y

    Linux version 5.12.0-rc3-pmac-00078-g5cac2bc3752

    9) PB 3400c
    Hangs at boot (Mac OS screen)

    10) Wallstreet
    X failed at first login, worked at second login, one error in console
    log ("BUG: Unable to handle kernel instruction fetch"), see
    Wallstreet_console-5.txt.


    One might expect to see "Kernel attempted to write user page (b3399774) -
    exploit attempt?" again here (see c16728835eec build above) but instead
    this log says "Oops: Kernel access of bad area, sig: 11".

    Maybe the test should be done a second time. As r11 is garbage it may or
    may not be a user address. If it is a user address then we get "Kernel
    attempted to write user page". If it is a random kernel address, we
    likely get "Kernel access of bad area" instead.


    Your fix was applied here also.


    Anyway, it would be worth trying to boot a few times more with the same kernel, because as I said
    the value is random, so it may or may not hit userspace, hence the possible difference of message,
    either "Kernel attempted to write user page" or "Kernel access of bad area" depending on whether the
    address is a user address or not.

    I have cooked a tentative fix for that KUAP stuff.
    Could you try the branch 'bugtest' at https://github.com/chleroy/linux.git

    Thanks
    Christophe
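
Note: the reasoning above says that the message depends only on which side of the user/kernel boundary the stray address happens to fall. A rough shell sketch of that rule, deliberately simplified; the default boundary of 0xc0000000 is an assumption (the real limit depends on the kernel configuration and is not stated in this thread), and expected_message is a hypothetical helper:

```shell
# Given a faulting address in hex (without "0x"), print the message the
# thread associates with it. The second argument is the first kernel address.
# ASSUMPTION: the default boundary c0000000 is illustrative only.
# expected_message is a hypothetical helper name.
expected_message() {
    addr=$1
    boundary=${2:-c0000000}
    if [ "$((0x$addr))" -lt "$((0x$boundary))" ]; then
        echo "Kernel attempted to write user page"
    else
        echo "Kernel access of bad area"
    fi
}

# The two addresses reported in the console logs quoted in this thread:
expected_message b3399774
expected_message c6207c
```

Under that assumed boundary both logged addresses fall on the user side, which is consistent with the messages reported for them.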

  • From Finn Thain@21:1/5 to Christophe Leroy on Sat Aug 7 04:30:02 2021
    On Fri, 6 Aug 2021, Christophe Leroy wrote:


    I have cooked a tentative fix for that KUAP stuff.
    Could you try the branch 'bugtest' at https://github.com/chleroy/linux.git


    Thanks, Christophe.

    Stan, please test the following build.

    $ git remote add chleroy-linux https://github.com/chleroy/linux.git -f -t bugtest
    ...
    $ git checkout chleroy-linux/bugtest
    HEAD is now at 63e3756d1bdf powerpc/interrupts: Also perform KUAP/KUEP lock and usertime accounting on NMI
    $ cp ../dot-config-powermac-5.13 .config
    $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG -e CONFIG_VMAP_STACK
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ egrep "CONFIG_PPC_KUAP|CONFIG_VMAP_STACK" .config
    $ strings vmlinux |grep "Linux version"

    If that kernel produces errors, I'd try a second build as well:

    $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG -d CONFIG_VMAP_STACK
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ egrep "CONFIG_PPC_KUAP|CONFIG_VMAP_STACK" .config
    $ strings vmlinux |grep "Linux version"

    Please boot using the same kernel parameters as last time and capture the serial console logs. In case we're still dealing with intermittent bugs it might be necessary to repeat these tests so I suggest you retain the
    vmlinux files.
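
    When the tests have to be repeated, a small helper can make the captured
    console logs easier to compare. The following is only a sketch: the
    function name count_signatures is hypothetical, and the list of messages is
    limited to the ones quoted in this thread.

```shell
#!/bin/sh
# count_signatures LOGFILE
# Prints one line per known message: the number of matching lines, a tab, the message.
count_signatures() {
    log=$1
    for sig in \
        "Kernel attempted to write user page" \
        "Kernel access of bad area" \
        "Exception in kernel mode" \
        "Unable to handle kernel instruction fetch"
    do
        # -c counts matching lines, -F treats the message as a fixed string.
        # grep exits non-zero when the count is 0, so "|| true" keeps the
        # helper usable in scripts that run under "set -e".
        n=$(grep -c -F -- "$sig" "$log" || true)
        printf '%s\t%s\n' "$n" "$sig"
    done
}
```

    For example, "count_signatures Wallstreet_console-5.txt" would summarize
    one captured log, so that runs of the same vmlinux can be compared side by
    side.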

  • From Finn Thain@21:1/5 to Stan Johnson on Sat Aug 7 06:30:01 2021
    On Fri, 6 Aug 2021, Stan Johnson wrote:

    $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y
    CONFIG_VMAP_STACK=y
    $ strings vmlinux | fgrep "Linux version"
    Linux version 5.13.0-pmac-00004-g63e3756d1bd ...
    $ cp vmlinux ../vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1

    1) PB 3400c
    vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1
    Boots, no errors logging in at (text) fb console. Logging in via ssh and running "ls -Rail /usr/include" generated errors (and a hung ssh
    session). Once errors started, they repeated for almost every command.
    See pb3400c-63e3756d1bdf-1.txt.

    2) Wallstreet
    vmlinux-5.13.0-pmac-00004-g63e3756d1bd-1
    X login failed, there were errors ("Oops: Kernel access of bad area",
    "Oops: Exception in kernel mode"). Logging in via SSH, there were no additional errors after running "ls -Rail /usr/include" -- the errors
    did not escalate as they did on the PB 3400.
    See Wallstreet-63e3756d1bdf-1.txt.

    ...
    $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y
    # CONFIG_VMAP_STACK is not set
    $ strings vmlinux | fgrep "Linux version"
    Linux version 5.13.0-pmac-00004-g63e3756d1bd ...
    $ cp vmlinux ../vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2

    3) PB 3400c
    vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2
    Filesystem was corrupt from the previous test (probably from all the
    errors during shutdown). After fixing the filesystem:
    Boots, no errors logging in at (text) fb console. Logging in via ssh and running "ls -Rail /usr/include" generated a few errors. There didn't
    seem to be as many errors as in the previous test, there were a few
    errors during shutdown but the shutdown was otherwise normal.
    See pb3400c-63e3756d1bdf-2.txt.

    4) Wallstreet
    vmlinux-5.13.0-pmac-00004-g63e3756d1bd-2
    X login worked, and there were no errors. There were no errors during
    ssh access.
    See Wallstreet-63e3756d1bdf-2.txt.


    Thanks for collecting these results, Stan. Do you think that the
    successful result from test 4) could have been just chance?

    It appears that the bug affecting the Powerbook 3400 is unaffected by CONFIG_VMAP_STACK.

    Whereas the bug affecting the Powerbook G3 disappears when
    CONFIG_VMAP_STACK is disabled (assuming the result from 4 is reliable).

    Either way, these results reiterate that "Oops: Kernel access of bad area,
    sig: 11" was not entirely resolved by "powerpc/32s: Fix napping restore in
    data storage interrupt (DSI)".

  • From Christophe Leroy@21:1/5 to All on Sat Aug 7 17:00:02 2021
    On 07/08/2021 at 15:09, Stan Johnson wrote:
    On 8/6/21 10:08 PM, Finn Thain wrote:

    Thanks for collecting these results, Stan. Do you think that the
    successful result from test 4) could have been just chance?

    No. I repeated Test 4 above two more times on the Wallstreet. After
    stomping on it as hard as I could, I didn't see any errors. I ran the following tests simultaneously, with no errors:

    a) Ping flood the Wallstreet
    862132 packets transmitted, 862117 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.316/0.418/12.163/0.143 ms

    b) "ls -Rail /usr" in an ssh window.

    c) "find /usr/include -type f -exec sha1sum {} \;" in a second ssh window.

    d) With a, b and c running, I logged in at the X console (slow but it worked). Load average was 7.0 as reported by uptime.

    So the success seems to be repeatable (or at least the errors are so
    unlikely to happen that I'm not seeing anything).


    It appears that the bug affecting the Powerbook 3400 is unaffected by
    CONFIG_VMAP_STACK.

    Whereas the bug affecting the Powerbook G3 disappears when
    CONFIG_VMAP_STACK is disabled (assuming the result from 4 is reliable).

    Either way, these results reiterate that "Oops: Kernel access of bad area,
    sig: 11" was not entirely resolved by "powerpc/32s: Fix napping restore in
    data storage interrupt (DSI)".


    That sounds right. Thanks for investigating this.



    Thanks a lot for your patience and for the tests.

    I'm still having a hard time understanding what the problem is.

    Could you try the new change I pushed into the git repo? It shouldn't have any effect, but I prefer
    to eliminate all possibilities. The documentation says that the upper bits of SRR1 are 0 on DSI and the
    code relies on that. But if the doc is wrong then that could explain the problem. So now I'm forcing
    them to 0 regardless.

    To get the change, you just have to do 'git pull -r' inside the directory where you checked out the
    sources and build.

    Thanks again
    Christophe

  • From Christophe Leroy@21:1/5 to All on Sat Aug 7 19:40:01 2021
    On 07/08/2021 at 18:26, Stan Johnson wrote:
    On 8/7/21 8:35 AM, Christophe Leroy wrote:


    Thanks a lot for your patience and for the tests.

    I'm still having a hard time understanding what the problem is.

    Could you try the new change I pushed into the git repo? It shouldn't
    have any effect, but I prefer to eliminate all possibilities. The
    documentation says that the upper bits of SRR1 are 0 on DSI and the code
    relies on that. But if the doc is wrong then that could explain the
    problem. So now I'm forcing them to 0 regardless.

    To get the change, you just have to do 'git pull -r' inside the
    directory where you checked out the sources and build.

    Thanks again
    Christophe


    Thanks, Christophe.

    In the same directory as previous builds:

    $ git checkout chleroy-linux/bugtest
    HEAD is now at 63e3756d1bdf powerpc/interrupts: Also perform KUAP/KUEP
    lock and usertime accounting on NMI
    $ git pull -r
    You are not currently on a branch.
    Please specify which branch you want to rebase against.
    ...
    $ git pull -r chleroy-linux
    remote: Enumerating objects: 6, done.
    remote: Counting objects: 100% (6/6), done.
    remote: Compressing objects: 100% (6/6), done.
    remote: Total 6 (delta 0), reused 6 (delta 0), pack-reused 0
    Unpacking objects: 100% (6/6), done.
    From https://github.com/chleroy/linux
    63e3756d1bdf..9023760b1361 bugtest -> chleroy-linux/bugtest
    Updating 63e3756d1bdf..9023760b1361
    Fast-forward
    arch/powerpc/kernel/head_book3s_32.S | 1 +
    1 file changed, 1 insertion(+)
    HEAD is up to date.

    Hopefully I did that right and ended up at the right spot.

    For tests 5 and 6:

    $ cp ../dot-config-powermac-5.13 .config
    $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG -e CONFIG_VMAP_STACK
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y
    CONFIG_VMAP_STACK=y
    $ strings vmlinux | grep "Linux version"
    Linux version 5.13.0-pmac-00005-g9023760b136 (johnson@ThinkPad) (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #3 SMP Sat Aug 7 09:29:11 MDT 2021
    $ cp vmlinux ../vmlinux-5.13.0-pmac-00005-g9023760b136-1
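
    One way to double-check that a build really came from the updated branch is
    to compare the abbreviated commit embedded in the version string with the
    branch tip reported by git pull (9023760b1361 above). The sketch below
    assumes the version string carries a "-g<hex>" suffix with at least seven
    hex digits, as in the strings shown in this thread; the function names
    version_commit and built_from are hypothetical.

```shell
#!/bin/sh
# version_commit VERSION
# Prints the abbreviated commit taken from a "-g<hex>" suffix (7+ hex digits).
version_commit() {
    printf '%s\n' "$1" | sed -n 's/.*-g\([0-9a-f]\{7,\}\).*/\1/p'
}

# built_from VERSION FULLHASH
# Succeeds if the commit in VERSION is a prefix of FULLHASH.
built_from() {
    abbrev=$(version_commit "$1")
    [ -n "$abbrev" ] || return 1
    case $2 in
        "$abbrev"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Version string from the build above against the tip reported by git pull.
built_from "5.13.0-pmac-00005-g9023760b136" 9023760b1361 && echo match
```

    Requiring seven or more hex digits keeps the pattern from latching onto
    unrelated text such as the "-gcc" in the compiler name when the whole
    "Linux version" line is passed in.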


    5) PB 3400c
    vmlinux-5.13.0-pmac-00005-g9023760b136-1
    Boots, no errors logging in at (text) fb console. Logging in via ssh and running "ls -Rail /usr/include" generated errors. As before, once errors started, they seemed to escalate, including errors during "shutdown -r now". See pb3400c-g9023760b136-1.txt.

    6) Wallstreet
    vmlinux-5.13.0-pmac-00005-g9023760b136-1
    X login failed, and there were errors. Logging in via SSH, there were no additional errors after running "ls -Rail /usr/include" -- as before,
    the errors did not escalate as they did on the PB 3400.
    See Wallstreet-g9023760b136-1.txt.

    For tests 7 and 8:

    $ cp ../dot-config-powermac-5.13 .config
    $ scripts/config -e CONFIG_PPC_KUAP -e CONFIG_PPC_KUAP_DEBUG -d CONFIG_VMAP_STACK
    $ make ARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- -j4 clean olddefconfig vmlinux
    $ egrep '(CONFIG_PPC_KUAP|CONFIG_VMAP_STACK)' .config
    CONFIG_PPC_KUAP=y
    CONFIG_PPC_KUAP_DEBUG=y
    # CONFIG_VMAP_STACK is not set
    $ strings vmlinux | grep "Linux version"
    Linux version 5.13.0-pmac-00005-g9023760b136 (johnson@ThinkPad) (powerpc-linux-gnu-gcc (Debian 8.3.0-2) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #4 SMP Sat Aug 7 09:49:03 MDT 2021
    $ cp vmlinux ../vmlinux-5.13.0-pmac-00005-g9023760b136-2


    7) PB 3400c
    vmlinux-5.13.0-pmac-00005-g9023760b136-2
    As before, the filesystem was corrupt from the previous test. After
    fixing that, this kernel boots, and there were no errors from logging in
    at the (text) fb console. Logging in via ssh and running "ls -Rail /usr/include" generated errors. There were a few errors logging in at
    the serial console and during shutdown, but the shutdown was otherwise normal.
    See pb3400c-g9023760b136-2.txt.

    8) Wallstreet
    vmlinux-5.13.0-pmac-00005-g9023760b136-2
    X login worked, and there were no errors. There were also no errors
    during ssh access.
    Simultaneous stress test, also no errors:
    a) Login at X console.
    b) Ping flood the Wallstreet
    359695 packets transmitted, 359688 packets received, 0.0% packet loss
    round-trip min/avg/max/stddev = 0.322/0.428/16.857/0.165 ms
    c) "ls -Rail /usr" in an ssh window.
    d) "find /usr/include -type f -exec sha1sum {} \;" in a second ssh window.
    See Wallstreet-g9023760b136-2.txt.

    As far as I could tell, there were no significant changes from the
    previous four tests.


    Ok, that was expected, but I wanted to be 100% sure to avoid looking in the wrong direction.

    To be honest, I'm running out of ideas.

    We have two remaining independent problems as far as I understand:

    PB3400C (603ev core = No hash table)
    - A KUAP fault, regardless of CONFIG_VMAP_STACK, apparently due to register r11 being clobbered.

    Wallstreet (Hash table)
    - Random faults, only with CONFIG_VMAP_STACK


    One thing I am wondering: could there be a link with SMP?

    Would you mind trying with a kernel built without CONFIG_SMP?

    Thanks
    Christophe
