• system hung up when offlining CPUs

    From YASUAKI ISHIMATSU@21:1/5 to Thomas Gleixner on Mon Oct 2 22:50:04 2017
    On 09/16/2017 11:02 AM, Thomas Gleixner wrote:
    On Sat, 16 Sep 2017, Thomas Gleixner wrote:
    On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
    Here is the info for one of the megasas IRQs:

    - Before offline CPU
    /proc/irq/70/smp_affinity_list
    24-29

    /proc/irq/70/effective_affinity
    00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000

    /sys/kernel/debug/irq/irqs/70
    handler: handle_edge_irq
    status: 0x00004000
    istate: 0x00000000
    ddepth: 0
    wdepth: 0
    dstate: 0x00609200
    IRQD_ACTIVATED
    IRQD_IRQ_STARTED
    IRQD_MOVE_PCNTXT
    IRQD_AFFINITY_SET
    IRQD_AFFINITY_MANAGED
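As a cross-check of the dump above, the comma-separated hex bitmap in effective_affinity can be converted back into a CPU list. This is a minimal sketch; the helper names parse_irq_mask and format_cpu_list are hypothetical, and it assumes the usual procfs layout of 32-bit hex words printed most significant first.

```python
def parse_irq_mask(mask):
    """Return the CPU numbers set in a comma-separated hex bitmap."""
    # Assumption: words are printed most significant first, so removing
    # the commas leaves one large hexadecimal number.
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def format_cpu_list(cpus):
    """Render CPU numbers in the range style used by smp_affinity_list."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    # The 16-word mask quoted for irq 70 in this thread.
    mask = "00000000," * 15 + "3f000000"
    print(format_cpu_list(parse_irq_mask(mask)))
```

Under these assumptions the mask for irq 70 decodes to the same CPUs reported by smp_affinity_list.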

    So this uses managed affinity, which means that once the last CPU in the
    affinity mask goes offline, the interrupt is shut down by the irq core
    code, which is the case:

    dstate: 0x00a39000
    IRQD_IRQ_DISABLED
    IRQD_IRQ_MASKED
    IRQD_MOVE_PCNTXT
    IRQD_AFFINITY_SET
    IRQD_AFFINITY_MANAGED
    IRQD_MANAGED_SHUTDOWN <---------------
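The two dstate words can be compared bit by bit. The sketch below uses a hypothetical decode_dstate helper; the bit positions in the table are an assumption, chosen because they reproduce both flag listings in this thread. They may differ in other kernel versions, where include/linux/irq.h is the authoritative source.

```python
# Assumed bit positions, consistent with the two dumps quoted in this thread.
IRQD_FLAGS = {
    9: "IRQD_ACTIVATED",
    12: "IRQD_AFFINITY_SET",
    15: "IRQD_MOVE_PCNTXT",
    16: "IRQD_IRQ_DISABLED",
    17: "IRQD_IRQ_MASKED",
    21: "IRQD_AFFINITY_MANAGED",
    22: "IRQD_IRQ_STARTED",
    23: "IRQD_MANAGED_SHUTDOWN",
}


def decode_dstate(dstate):
    """Return (known flag names, leftover bits not covered by the table)."""
    names = [name for bit, name in IRQD_FLAGS.items() if dstate & (1 << bit)]
    known = sum(1 << bit for bit in IRQD_FLAGS)
    return names, dstate & ~known


if __name__ == "__main__":
    for label, word in (("before", 0x00609200), ("after", 0x00a39000)):
        names, leftover = decode_dstate(word)
        print(label, hex(word), names, "leftover:", hex(leftover))
```

With this table, the "after" word differs from the "before" word by losing the activated and started bits and gaining the disabled, masked and managed-shutdown bits.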

    So the irq core code works as expected, but something in the
    driver/scsi/block stack seems to fiddle with that shut down queue.

    I can only speak to the inner workings of the irq code; I have no
    clue about the rest.

    Though there is something wrong here:

    affinity: 24-29
    effectiv: 24-29

    and after offlining:

    affinity: 29
    effectiv: 29

    But that should be:

    affinity: 24-29
    effectiv: 29

    because the irq core code preserves 'affinity'. It merely updates
    'effective', which is where your interrupts are routed to.
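The expected behaviour described above can be illustrated with a toy model. This is not kernel code: the ManagedIrq class and its cpu_offline method are hypothetical, and the model simply assumes that 'effective' is the intersection of the preserved 'affinity' with the online CPUs, which matches the numbers in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedIrq:
    """Toy model of a managed interrupt (illustration only)."""
    affinity: frozenset                      # assigned once, never rewritten here
    effective: set = field(default_factory=set)
    shutdown: bool = False

    def cpu_offline(self, online):
        # Only 'effective' follows the set of online CPUs.
        self.effective = set(self.affinity) & set(online)
        # Once no CPU of the affinity mask is left, the interrupt is shut down.
        self.shutdown = not self.effective


if __name__ == "__main__":
    irq = ManagedIrq(affinity=frozenset(range(24, 30)))
    irq.cpu_offline(online={29})             # CPUs 24-28 went offline
    print(sorted(irq.affinity), sorted(irq.effective), irq.shutdown)
    irq.cpu_offline(online=set())            # last CPU of the mask goes offline
    print(sorted(irq.affinity), sorted(irq.effective), irq.shutdown)
```

In this model 'affinity' stays at 24-29 throughout, so an 'affinity' value of 29 after offlining points to something outside the model rewriting the mask.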

    Is the driver issuing any set_affinity() calls? If so, that's wrong.

    Which driver are we talking about?

    We are talking about megasas driver.
    So I added linux-scsi and maintainers of megasas into the thread.

    Thanks,
    Yasuaki Ishimatsu


    Thanks,

    tglx

