• Re: signal delivery, was Re: reliable reproducer

    From Finn Thain@21:1/5 to Finn Thain on Tue Apr 25 04:10:01 2023
    On Tue, 25 Apr 2023, Finn Thain wrote:

    On Tue, 25 Apr 2023, Michael Schmitz wrote:

    As to a cause for the corruption: all the calculations in setup_frame
    and sys_sigreturn use fsize, but get_sigframe() masks off the result
    of usp - sizeof(sigframe) - fsize to place the entire frame at a
    quadword boundary.
    ...

    I wonder if we are seeing some fallout from the issue described in do_page_fault() i.e. usp is unreliable.

    /* Accessing the stack below usp is always a bug. The
    "+ 256" is there due to some instructions doing
    pre-decrement on the stack and that doesn't show up
    until later. */
    if (address + 256 < rdusp())
    goto map_err;

    Maybe we should try modifying get_sigframe() to increase the gap between
    the signal and exception frames from 0-1 long words up to 64-65 long
    words.


    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
    {
    unsigned long usp = sigsp(rdusp(), ksig);

    - return (void __user *)((usp - frame_size) & -8UL);
    + return (void __user *)((usp - 256 - frame_size) & -8UL);
    }

    static int setup_frame(struct ksignal *ksig, sigset_t *set,

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 25 04:40:01 2023
    Hi Finn,

    Am 25.04.2023 um 13:55 schrieb Finn Thain:
    On Tue, 25 Apr 2023, Finn Thain wrote:

    On Tue, 25 Apr 2023, Michael Schmitz wrote:

    As to a cause for the corruption: all the calculations in setup_frame
    and sys_sigreturn use fsize, but get_sigframe() masks off the result
    of usp - sizeof(sigframe) - fsize to place the entire frame at a
    quadword boundary.
    ...

    I wonder if we are seeing some fallout from the issue described in
    do_page_fault() i.e. usp is unreliable.

    /* Accessing the stack below usp is always a bug. The
    "+ 256" is there due to some instructions doing
    pre-decrement on the stack and that doesn't show up
    until later. */
    if (address + 256 < rdusp())
    goto map_err;

    Maybe we should try modifying get_sigframe() to increase the gap between
    the signal and exception frames from 0-1 long words up to 64-65 long
    words.


    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    Might happen, if the frame gap isn't actually equal to the exception
    frame extra size anymore? Aligning the start of the signal frame to the
    next lower quadword boundary increases the gap size.

    When setting up the sigframe, the extra is copied to the correct
    location (right past struct sigframe, or into uc_filler). When moving
    that exception frame into place, the assumption is that the gap is the
    extra size, not more.

    I'll try dropping the quadword alignment constraint - the return
    trampoline still ought to remain longword aligned.

    Cheers,

    Michael



    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
    {
    unsigned long usp = sigsp(rdusp(), ksig);

    - return (void __user *)((usp - frame_size) & -8UL);
    + return (void __user *)((usp - 256 - frame_size) & -8UL);
    }

    static int setup_frame(struct ksignal *ksig, sigset_t *set,


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Tue Apr 25 08:00:01 2023
    Hi Finn,

    Am 25.04.2023 um 14:32 schrieb Michael Schmitz:
    Maybe we should try modifying get_sigframe() to increase the gap between >>> the signal and exception frames from 0-1 long words up to 64-65 long
    words.


    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    Might happen, if the frame gap isn't actually equal to the exception
    frame extra size anymore? Aligning the start of the signal frame to the
    next lower quadword boundary increases the gap size.

    When setting up the sigframe, the extra is copied to the correct
    location (right past struct sigframe, or into uc_filler). When moving
    that exception frame into place, the assumption is that the gap is the
    extra size, not more.

    I'll try dropping the quadword alignment constraint - the return
    trampoline still ought to remain longword aligned.

    No luck - still stack corruption.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eero Tamminen@21:1/5 to Finn Thain on Tue Apr 25 11:30:02 2023
    Hi,

    On 25.4.2023 4.55, Finn Thain wrote:
    On Tue, 25 Apr 2023, Finn Thain wrote:
    ...
    I wonder if we are seeing some fallout from the issue described in
    do_page_fault() i.e. usp is unreliable.

    /* Accessing the stack below usp is always a bug. The
    "+ 256" is there due to some instructions doing
    pre-decrement on the stack and that doesn't show up
    until later. */
    if (address + 256 < rdusp())
    goto map_err;

    Maybe we should try modifying get_sigframe() to increase the gap between
    the signal and exception frames from 0-1 long words up to 64-65 long
    words.

    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
    {
    unsigned long usp = sigsp(rdusp(), ksig);

    - return (void __user *)((usp - frame_size) & -8UL);
    + return (void __user *)((usp - 256 - frame_size) & -8UL);
    }

    static int setup_frame(struct ksignal *ksig, sigset_t *set,

    While this is most likely Hatari emulation [1] issue, it has some of the
    same triggering conditions, so I thought to mention it...

    Above patch does not fix kernel panic I'm seen on booting Linux under
    Hatari emulated Atari Falcon, to a small IDE root fs with just (old
    Debian) Busybox and a shell script acting as init: ----------------------------------------
    Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
    CPU: 0 PID: 1 Comm: sh Not tainted 6.2.0hatari-00006-g01793428cbc5-dirty #4 Stack from 00819dc8:
    00819dc8 0032b646 0032b646 00027800 00000001 00819de8 00292878 0032b646
    00819e0c 0028d60e 00027848 0000000b 0081caff 000300d6 00815a90 00000000
    00819f2c 00819e50 00028878 003243fd 0000000b 0000000b 00818007 0081ca30
    000300d6 0081caf8 00000001 00819f18 0081bbd0 00819f2c 00000000 00000000
    00000000 00000000 00819e60 00028ea0 0000000b 0000000b 00819e98 000303c8
    0000000b 00818000 00000002 00000000 00000000 00000000 8017705e 00819f78
    Call Trace: [<00027800>] set_cpu_online+0x1c/0x3e
    [<00292878>] dump_stack+0x10/0x16
    [<0028d60e>] panic+0xc4/0x22a
    [<00027848>] arch_local_irq_enable+0x0/0x22
    [<000300d6>] do_signal_stop+0x0/0x152
    [<00028878>] do_exit+0x138/0x642
    [<000300d6>] do_signal_stop+0x0/0x152
    [<00028ea0>] do_group_exit+0x22/0x62
    [<000303c8>] get_signal+0xf8/0x4ba
    [<00003508>] test_ti_thread_flag+0x0/0x1a
    [<00003f4a>] do_notify_resume+0x36/0x488
    [<00005706>] send_fault_sig+0x28/0x8c
    [<00005888>] do_page_fault+0x11e/0x242
    [<00005814>] do_page_fault+0xaa/0x242
    [<00002814>] do_signal_return+0x10/0x1a
    [<00020007>] _I_CALL_TOP+0xd83/0x1900
    [<0000b280>] sp_over+0x2c/0x3c
    [<00007201>] atari_irq_enable+0x3/0x2a
    [<000066f6>] atari_get_hardware_list+0x33a/0x3e8 ----------------------------------------

    (Only way to get rid of the panic is disabling both CPU cache and
    prefetch emulation.)


    Is it possible that in your case there's also IRQ (exception) happening
    at the same time with page fault and signal?


    - Eero

    [1] 030 MMU vs. cache/prefetch vs. exception handling vs. IDE emulation.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to All on Tue Apr 25 14:10:01 2023
    Please try this patch. Signal delivery should only happen at insn
    boundaries, but due to the way the 030 handles return from bus error
    exceptions (the insn is resumed, not restarted like on the 040/060) the
    kernel may do it in the middle of the faulting insn.

    diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
    index 4dd2fd7acba9..6c09a5710728 100644
    --- a/arch/m68k/kernel/entry.S
    +++ b/arch/m68k/kernel/entry.S
    @@ -117,7 +117,11 @@ ENTRY(buserr)
    movel %sp,%sp@- | stack frame pointer argument
    jbsr buserr_c
    addql #4,%sp
    - jra ret_from_exception
    + | don't do signal delivery when interrupted while insn is in progress
    + | (on the 020/030)
    + tstl %d0
    + jeq ret_from_exception
    + RESTORE_ALL

    ENTRY(trap)
    SAVE_ALL_INT
    diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
    index a700807c9b6d..40fc408d1333 100644
    --- a/arch/m68k/kernel/traps.c
    +++ b/arch/m68k/kernel/traps.c
    @@ -751,8 +751,10 @@ static inline void access_errorcf(unsigned int fs, struct frame *fp)
    }
    #endif /* CONFIG_COLDFIRE CONFIG_MMU */

    -asmlinkage void buserr_c(struct frame *fp)
    +asmlinkage int buserr_c(struct frame *fp)
    {
    + int not_insn_boundary = 0;
    +
    /* Only set esp0 if coming from user mode */
    if (user_mode(&fp->ptregs))
    current->thread.esp0 = (unsigned long) fp;
    @@ -793,8 +795,9 @@ asmlinkage void buserr_c(struct frame *fp)
    brea
  • From Andreas Schwab@21:1/5 to Finn Thain on Tue Apr 25 13:50:01 2023
    On Apr 25 2023, Finn Thain wrote:

    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
    {
    unsigned long usp = sigsp(rdusp(), ksig);

    - return (void __user *)((usp - frame_size) & -8UL);
    + return (void __user *)((usp - 256 - frame_size) & -8UL);

    Probably the issue is that a bus error exception should never start
    signal delivery when returning to user space. On the 030 returning from
    a bus error resumes the execution of the faulting insn (unlike the
    040/060 which restart it), and the saved USP may have the original value
    from before the insn started (modified registers may not be updated
    until the insn is complete or just before the final bus cycle). Signal delivery should only ever happen at insn boundaries.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Andreas Schwab on Tue Apr 25 21:50:01 2023
    Hi Andreas,

    On 25/04/23 23:25, Andreas Schwab wrote:
    On Apr 25 2023, Finn Thain wrote:

    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t frame_size)
    {
    unsigned long usp = sigsp(rdusp(), ksig);

    - return (void __user *)((usp - frame_size) & -8UL);
    + return (void __user *)((usp - 256 - frame_size) & -8UL);
    Probably the issue is that a bus error exception should never start
    signal delivery when returning to user space. On the 030 returning from
    a bus error resumes the execution of the faulting insn (unlike the
    040/060 which restart it), and the saved USP may have the original value
    from before the insn started (modified registers may not be updated
    until the insn is complete or just before the final bus cycle). Signal delivery should only ever happen at insn boundaries.

    Thanks - we had seen evidence that a bus error generated mid-instruction
    did leave the USP at the address where the bus fault happened (not
    before the instruction started, neither what it would have been once the instruction completed), and the operation did not complete normally
    after the bus error (at least the value/address seen in the exception
    frame not stored). Finn had also demonstrated that skipping signal
    delivery on bus errors abolishes the stack corruption.  Your patch
    achieves the same objective in a different way, so I'm sure this will
    work as well.

    I had thought the 030 could resume the interrupted instruction using the information from the exception frame - and that does appear to work in
    all other cases except where signal delivery gets in the way, and it
    also works if moving the exception frame a little bit further down the
    stack. So our treatment of the bus error exception frame during signal
    delivery appears to be incorrect. Wouldn't you agree?

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Michael Schmitz on Wed Apr 26 04:00:01 2023
    Hi Andreas,

    On 26/04/23 07:46, Michael Schmitz wrote:
    Hi Andreas,

    On 25/04/23 23:25, Andreas Schwab wrote:
    On Apr 25 2023, Finn Thain wrote:

    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t
    frame_size)
      {
          unsigned long usp = sigsp(rdusp(), ksig);
      -    return (void __user *)((usp - frame_size) & -8UL);
    +    return (void __user *)((usp - 256 - frame_size) & -8UL);
    Probably the issue is that a bus error exception should never start
    signal delivery when returning to user space.  On the 030 returning from
    a bus error resumes the execution of the faulting insn (unlike the
    040/060 which restart it), and the saved USP may have the original value
    from before the insn started (modified registers may not be updated
    until the insn is complete or just before the final bus cycle). Signal
    delivery should only ever happen at insn boundaries.

    Thanks - we had seen evidence that a bus error generated
    mid-instruction did leave the USP at the address where the bus fault
    happened (not before the instruction started, neither what it would
    have been once the instruction completed), and the operation did not
    complete normally after the bus error (at least the value/address seen
    in the exception frame not stored). Finn had also demonstrated that
    skipping signal delivery on bus errors abolishes the stack
    corruption.  Your patch achieves the same objective in a different
    way, so I'm sure this will work as well.

    I had thought the 030 could resume the interrupted instruction using
    the information from the exception frame - and that does appear to
    work in all other cases except where signal delivery gets in the way,
    and it also works if moving the exception frame a little bit further
    down the stack. So our treatment of the bus error exception frame
    during signal delivery appears to be incorrect. Wouldn't you agree?

    Inspection of the format b frame placed in the signal frame in both rt
    and non-rt cases (at the time the signal handler runs) shows the
    expected contents in the data output buffer, data fault address and ssw.
    At that time, returning to user space with rte would correctly resume
    the instruction execution.  I had previously confirmed that the register contents saved in the rt signal frame is correct also.

    That is with a kernel patched similar to above patch by Finn (using an
    offset of 128 or 64 instead of  256).

    Cheers,

        Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 26 04:20:01 2023
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    Thanks - we had seen evidence that a bus error generated mid-instruction
    did leave the USP at the address where the bus fault happened (not
    before the instruction started, neither what it would have been once the instruction completed), and the operation did not complete normally
    after the bus error (at least the value/address seen in the exception
    frame not stored).

    I'm afraid I still don't fully understand how and why the user stack
    (rather than the supervisor stack) gets used for processing the exception frame.

    Finn had also demonstrated that skipping signal delivery on bus errors abolishes the stack corruption. Your patch achieves the same objective
    in a different way, so I'm sure this will work as well.

    I had thought the 030 could resume the interrupted instruction using the information from the exception frame - and that does appear to work in
    all other cases except where signal delivery gets in the way, and it
    also works if moving the exception frame a little bit further down the
    stack. So our treatment of the bus error exception frame during signal delivery appears to be incorrect.

    Wouldn't that depend on the exception frame format? Perhaps it is unsafe
    to treat any format 0xB exception frame in the way we do. If so, what do
    we do about address error exceptions, which are to produce SIGBUS? The Programmers Reference Manual says "a long bus fault stack frame may be generated" in this case.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Andreas Schwab on Wed Apr 26 04:20:01 2023
    On Tue, 25 Apr 2023, Andreas Schwab wrote:

    Please try this patch.

    Thanks for looking into this. Your patch solves the problem for me and
    doesn't seem to cause any regression.

    Tested-by: Finn Thain <fthain@linux-m68k.org>

    Signal delivery should only happen at insn boundaries, but due to the
    way the 030 handles return from bus error

    The Programmer's Reference Manual says that the format 0xB frame is used
    by '020 as well which is consistent with the ifdef below, so it probably
    should be mentioned in the commit log too. (And I assume that '020 is also affected by this bug.)

    exceptions (the insn is resumed, not restarted like on the 040/060) the kernel may do it in the middle of the faulting insn.

    diff --git a/arch/m68k/kernel/entry.S b/arch/m68k/kernel/entry.S
    index 4dd2fd7acba9..6c09a5710728 100644
    --- a/arch/m68k/kernel/entry.S
    +++ b/arch/m68k/kernel/entry.S
    @@ -117,7 +117,11 @@ ENTRY(buserr)
    movel %sp,%sp@- | stack frame pointer argument
    jbsr buserr_c
    addql #4,%sp
    - jra ret_from_exception
    + | don't do signal delivery when interrupted while insn is in progress

    We might avoid the word "interrupted" like so:

    | deliver no signal if the fault occurred while insn was in progress

    + | (on the 020/030)
    + tstl %d0
    + jeq ret_from_exception
    + RESTORE_ALL

    ENTRY(trap)
    SAVE_ALL_INT
    diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c
    index a700807c9b6d..40fc408d1333 100644
    --- a/arch/m68k/kernel/traps.c
    +++ b/arch/m68k/kernel/traps.c
    @@ -751,8 +751,10 @@ static inline void access_errorcf(unsigned int fs, struct frame *fp)
    }
    #endif /* CONFIG_COLDFIRE CONFIG_MMU */

    -asmlinkage void buserr_c(struct frame *fp)
    +asmlinkage int buserr_c(struct frame *fp)
    {
    + int not_insn_boundary = 0;
    +
    /* Only set esp0 if coming from user mode */
    if (user_mode(&fp->ptregs))
    current->thread.esp0 = (unsigned long) fp;
    @@ -793,8 +795,9 @@ asmlinkage void buserr_c(struct frame *fp)
    break;
    #endif
    #if defined (CPU_M68020_OR_M68030)
    - case 0xa:
    case 0xb:
    + not_insn_boundary = 1;

    The build uses -Wimplicit-fallthrough so I added this:

    fallthrough;

    + case 0xa:
    bus_error030 (fp);
    break;
    #endif
    @@ -803,6 +806,8 @@ asmlinkage void buserr_c(struct frame *fp)
    pr_debug("Unknown SIGSEGV - 4\n");
    force_sig(SIGSEGV);
    }
    +
    + return not_insn_boundary;
    }


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Wed Apr 26 06:10:01 2023
    Hi Finn,

    Am 26.04.2023 um 14:02 schrieb Finn Thain:
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    Thanks - we had seen evidence that a bus error generated mid-instruction
    did leave the USP at the address where the bus fault happened (not
    before the instruction started, neither what it would have been once the
    instruction completed), and the operation did not complete normally
    after the bus error (at least the value/address seen in the exception
    frame not stored).

    I'm afraid I still don't fully understand how and why the user stack
    (rather than the supervisor stack) gets used for processing the exception frame.

    The kernel stack would not be accessible to the signal handler which
    must run in process context (i.e. user space).

    The exception frame is copied to the signal frame for informational
    purposes only (such as examination of processor state when the signal
    was taken - not too useful for SIGCHLD but could be used to interpret
    SIGSEGV).


    Finn had also demonstrated that skipping signal delivery on bus errors
    abolishes the stack corruption. Your patch achieves the same objective
    in a different way, so I'm sure this will work as well.

    I had thought the 030 could resume the interrupted instruction using the
    information from the exception frame - and that does appear to work in
    all other cases except where signal delivery gets in the way, and it
    also works if moving the exception frame a little bit further down the
    stack. So our treatment of the bus error exception frame during signal
    delivery appears to be incorrect.

    It seems I got confused about user and kernel stack there myself. And
    managed to confuse almost everyone else about this bug. Apologies for
    the incessant noise.

    What matters for the return from exception is an intact frame on the
    kernel stack. Anything we do on the user stack (mucking around with the
    offset the sigframe is set up at, copying siginfo, ucontext or
    sigcontext plus exception frame extra) does not change the kernel stack
    one whit.

    The mangle_kernel_stack stuff is needed because sys_sigreturn will place another exception frame on the kernel stack (a four word frame) that
    needs to be replaced by the bus error exception frame (or any other
    frame that caused the kernel mode entry prior to signal delivery) before finally returning from the bus error exception.

    Only at that time will the movel instruction that took the bus fault
    resume (and complete its writes correctly, I hope).

    Our problem may be that, if we take the signal too late and our main
    process inspects the stack that has been left partially saved only (due
    to the bus error processing still in-flight), we appear to be in
    trouble. After completing sys_sigreturn, everything will be OK.

    I can see this cause the stack error in the test case. Not sure it also
    applies to the dash case ...

    Wouldn't that depend on the exception frame format? Perhaps it is unsafe
    to treat any format 0xB exception frame in the way we do. If so, what do
    we do about address error exceptions, which are to produce SIGBUS? The Programmers Reference Manual says "a long bus fault stack frame may be generated" in this case.

    We don't handle access errors (beyond terminating the offending process).

    I hope this makes a little more sense now...

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 26 05:40:01 2023
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    On 26/04/23 07:46, Michael Schmitz wrote:

    I had thought the 030 could resume the interrupted instruction using
    the information from the exception frame - and that does appear to
    work in all other cases except where signal delivery gets in the way,
    and it also works if moving the exception frame a little bit further
    down the stack. So our treatment of the bus error exception frame
    during signal delivery appears to be incorrect. Wouldn't you agree?

    Inspection of the format b frame placed in the signal frame in both rt
    and non-rt cases (at the time the signal handler runs) shows the
    expected contents in the data output buffer, data fault address and ssw.
    At that time, returning to user space with rte would correctly resume
    the instruction execution. I had previously confirmed that the register contents saved in the rt signal frame is correct also.

    That is with a kernel patched similar to above patch by Finn (using an
    offset of 128 or 64 instead of 256).

    That means things go awry during sys_sigreturn or sys_rt_sigreturn. I'm
    not sure what happens to the exception frame:

    1:
    | stack contents now:
    | [original pt_regs address] [original switch_stack address]
    | [unused part of the gap] [moved switch_stack] [moved pt_regs]
    | [replacement exception frame]
    | return value of do_{rt_,}sigreturn() points to moved switch_stack.

    movel %d0,%sp | discard the leftover junk
    RESTORE_SWITCH_STACK
    | stack contents now is just [syscall return address] [pt_regs] [frame]
    | return pt_regs.d0
    movel %sp@(PT_OFF_D0+4),%d0
    rts

    ... but I noticed that the sys_rt_sigreturn entry point does
    SAVE_SWITCH_STACK while the sys_sigreturn entry point does not. Yet both
    jump to label 1 above, so both syscalls do RESTORE_SWITCH_STACK. Hmmm.

    .macro SAVE_SWITCH_STACK
    moveml %a3-%a6/%d6-%d7,%sp@-
    .endm

    .macro RESTORE_SWITCH_STACK
    moveml %sp@+,%a3-%a6/%d6-%d7
    .endm

    Well, that has to corrupt %a3, right?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Wed Apr 26 06:00:01 2023
    Hi Finn,

    Am 26.04.2023 um 15:27 schrieb Finn Thain:
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    On 26/04/23 07:46, Michael Schmitz wrote:

    I had thought the 030 could resume the interrupted instruction using
    the information from the exception frame - and that does appear to
    work in all other cases except where signal delivery gets in the way,
    and it also works if moving the exception frame a little bit further
    down the stack. So our treatment of the bus error exception frame
    during signal delivery appears to be incorrect. Wouldn't you agree?

    Inspection of the format b frame placed in the signal frame in both rt
    and non-rt cases (at the time the signal handler runs) shows the
    expected contents in the data output buffer, data fault address and ssw.
    At that time, returning to user space with rte would correctly resume
    the instruction execution. I had previously confirmed that the register
    contents saved in the rt signal frame is correct also.

    That is with a kernel patched similar to above patch by Finn (using an
    offset of 128 or 64 instead of 256).

    That means things go awry during sys_sigreturn or sys_rt_sigreturn. I'm
    not sure what happens to the exception frame:

    1:
    | stack contents now:
    | [original pt_regs address] [original switch_stack address]
    | [unused part of the gap] [moved switch_stack] [moved pt_regs]
    | [replacement exception frame]
    | return value of do_{rt_,}sigreturn() points to moved switch_stack.

    movel %d0,%sp | discard the leftover junk
    RESTORE_SWITCH_STACK
    | stack contents now is just [syscall return address] [pt_regs] [frame]
    | return pt_regs.d0
    movel %sp@(PT_OFF_D0+4),%d0
    rts

    ... but I noticed that the sys_rt_sigreturn entry point does SAVE_SWITCH_STACK while the sys_sigreturn entry point does not. Yet both
    jump to label 1 above, so both syscalls do RESTORE_SWITCH_STACK. Hmmm.

    In my copy of entry.S, sys_sigreturn has SAVE_SWITCH_STACK, just as sys_rt_sigreturn does.

    Cheers,

    Michael


    .macro SAVE_SWITCH_STACK
    moveml %a3-%a6/%d6-%d7,%sp@-
    .endm

    .macro RESTORE_SWITCH_STACK
    moveml %sp@+,%a3-%a6/%d6-%d7
    .endm

    Well, that has to corrupt %a3, right?


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 26 07:10:01 2023
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    In my copy of entry.S, sys_sigreturn has SAVE_SWITCH_STACK, just as sys_rt_sigreturn does.


    You're right.

    I think I've been at this too long.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Finn Thain@21:1/5 to Michael Schmitz on Wed Apr 26 07:00:01 2023
    On Wed, 26 Apr 2023, Michael Schmitz wrote:

    It seems I got confused about user and kernel stack there myself. And
    managed to confuse almost everyone else about this bug. Apologies for
    the incessant noise.


    Likewise... I think it amuses some readers and annoys others ;-)

    What matters for the return from exception is an intact frame on the
    kernel stack. Anything we do on the user stack (mucking around with the offset the sigframe is set up at, copying siginfo, ucontext or
    sigcontext plus exception frame extra) does not change the kernel stack
    one whit.

    The mangle_kernel_stack stuff is needed because sys_sigreturn will place another exception frame on the kernel stack (a four word frame) that
    needs to be replaced by the bus error exception frame (or any other
    frame that caused the kernel mode entry prior to signal delivery) before finally returning from the bus error exception.

    Only at that time will the movel instruction that took the bus fault
    resume (and complete its writes correctly, I hope).


    If the long format frame was corrupted while on the user stack, the
    partially completed MOVEM won't be resumed correctly. That's why I was concerned about a bug in sys_sigreturn.

    Our problem may be that, if we take the signal too late and our main
    process inspects the stack that has been left partially saved only (due
    to the bus error processing still in-flight), we appear to be in
    trouble. After completing sys_sigreturn, everything will be OK.

    I can see this cause the stack error in the test case. Not sure it also applies to the dash case ...

    Wouldn't that depend on the exception frame format? Perhaps it is
    unsafe to treat any format 0xB exception frame in the way we do. If
    so, what do we do about address error exceptions, which are to produce SIGBUS? The Programmers Reference Manual says "a long bus fault stack
    frame may be generated" in this case.

    We don't handle access errors (beyond terminating the offending
    process).


    SIGBUS could be caught and handled, perhaps followed by a setcontext() or siglongjmp() rather than return. So I don't think we get to disallow frame format 0xB.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Andreas Schwab@21:1/5 to Michael Schmitz on Wed Apr 26 10:50:01 2023
    On Apr 26 2023, Michael Schmitz wrote:

    The exception frame is copied to the signal frame for informational
    purposes only

    This is not for informational purpose. It is the interrupted context
    that is restored by the kernel in the sigreturn syscall.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to All on Wed Apr 26 11:20:01 2023
    Hi Finn,

    Am 26.04.2023 um 16:42 schrieb Finn Thain:
    If the long format frame was corrupted while on the user stack, the
    partially completed MOVEM won't be resumed correctly. That's why I was concerned about a bug in sys_sigreturn.

    Yes, it turns out I hadn't read mangle_kernel_stack() carefully enough.
    I thought the exception frame had remained on the kernel stack to be
    restored, but I'd missed that it is actually being restored from the
    user stack copy to the kernel stack.

    Need to go back and check whether the sigmask (located after uc_filler
    in rt sigframes) ever got clobbered.

    Wouldn't that depend on the exception frame format? Perhaps it is
    unsafe to treat any format 0xB exception frame in the way we do. If
    so, what do we do about address error exceptions, which are to produce
    SIGBUS? The Programmers Reference Manual says "a long bus fault stack
    frame may be generated" in this case.

    We don't handle access errors (beyond terminating the offending

    Meant to say 'address errors' there. These are misaligned instruction
    accesses only, and we take them to indicate corruption of the binary or
    stack, so abort the process.

    process).


    SIGBUS could be caught and handled, perhaps followed by a setcontext() or siglongjmp() rather than return. So I don't think we get to disallow frame format 0xB.

    Yes, we may still need to take these signals. But we don't have to
    deliver the signals to the user process while exception processing is
    still in progress. For bus faults mid-instruction, it isn't enough to
    map in a page so writes can succeed, we also need to resume the
    instruction to carry out the write.

    Come to think of it, the same argument applies for faults taken at an instruction boundary. These need to be resumed as well, they just take
    less information to load back into the processor.

    Now I just wonder why 040 bus errors (exception restart instead of
    resume) aren't a problem. Pending writebacks are handled during signal
    return there, too. But the format 7 frame is only 15 longwords, the
    format b one is 23 longwords. We know increasing the apparent length of
    the signal frame helps ...

    Anyway, Andreas' patch addresses the issue without incurring too much
    delay in signal processing. If we find out what causes corruption of the exception frame, Andreas' patch can be dropped once a fix is in place.

    Cheers,

    Michael

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michael Schmitz@21:1/5 to Michael Schmitz on Wed Apr 26 21:50:01 2023
    Hi Finn,

    On 25/04/23 14:32, Michael Schmitz wrote:
    Hi Finn,

    Am 25.04.2023 um 13:55 schrieb Finn Thain:
    On Tue, 25 Apr 2023, Finn Thain wrote:

    On Tue, 25 Apr 2023, Michael Schmitz wrote:

    As to a cause for the corruption: all the calculations in setup_frame
    and sys_sigreturn use fsize, but get_sigframe() masks off the result
    of usp - sizeof(sigframe) - fsize to place the entire frame at a
    quadword boundary.
    ...

    I wonder if we are seeing some fallout from the issue described in
    do_page_fault() i.e. usp is unreliable.

                    /* Accessing the stack below usp is always a bug.  The
                       "+ 256" is there due to some instructions doing
                       pre-decrement on the stack and that doesn't show up
                       until later.  */
                    if (address + 256 < rdusp())
                            goto map_err;

    Minimal gap that avoids the corruption for me (in your patch below) is
    20 bytes. Still can't explain that (and I won't have time to throw more confusion into the discussion until maybe the weekend).

    Cheers,

        Michael


    Maybe we should try modifying get_sigframe() to increase the gap
    between
    the signal and exception frames from 0-1 long words up to 64-65 long
    words.


    It turns out that doing so (patch below) does make the problem go away.
    Was the exception frame getting clobbered?

    Might happen, if the frame gap isn't actually equal to the exception
    frame extra size anymore? Aligning the start of the signal frame to
    the next lower quadword boundary increases the gap size.

    When setting up the sigframe, the extra is copied to the correct
    location (right past struct sigframe, or into uc_filler). When moving
    that exception frame into place, the assumption is that the gap is the
    extra size, not more.

    I'll try dropping the quadword alignment constraint - the return
    trampoline still ought to remain longword aligned.

    Cheers,

        Michael



    diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
    index b9f6908a31bc..94104699f5a8 100644
    --- a/arch/m68k/kernel/signal.c
    +++ b/arch/m68k/kernel/signal.c
    @@ -862,7 +862,7 @@ get_sigframe(struct ksignal *ksig, size_t
    frame_size)
     {
         unsigned long usp = sigsp(rdusp(), ksig);

    -    return (void __user *)((usp - frame_size) & -8UL);
    +    return (void __user *)((usp - 256 - frame_size) & -8UL);
     }

     static int setup_frame(struct ksignal *ksig, sigset_t *set,


    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Brad Boyer@21:1/5 to Michael Schmitz on Wed Apr 26 23:50:01 2023
    On Wed, Apr 26, 2023 at 09:10:50PM +1200, Michael Schmitz wrote:
    Am 26.04.2023 um 16:42 schrieb Finn Thain:
    If the long format frame was corrupted while on the user stack, the >partially completed MOVEM won't be resumed correctly. That's why I was >concerned about a bug in sys_sigreturn.

    Yes, it turns out I hadn't read mangle_kernel_stack() carefully enough. I thought the exception frame had remained on the kernel stack to be restored, but I'd missed that it is actually being restored from the user stack copy
    to the kernel stack.

    Isn't that a security hole? If we restore the exception frame from
    user memory, doesn't that allow a malicious program to affect the
    internal state of the CPU just by handling a signal?

    Brad Boyer
    flar@allandria.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)