On Sat, 16 Sep 2017, Thomas Gleixner wrote:
On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
Here is the info for one of the megasas irqs:
- Before offlining the CPUs
/proc/irq/70/smp_affinity_list
24-29
/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
/sys/kernel/debug/irq/irqs/70
handler: handle_edge_irq
status: 0x00004000
istate: 0x00000000
ddepth: 0
wdepth: 0
dstate: 0x00609200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
So this uses managed affinity, which means that once the last CPU in the
affinity mask goes offline, the interrupt is shut down by the irq core
code. That is exactly what happened here:
dstate: 0x00a39000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN <---------------
So the irq core code works as expected, but something in the
driver/scsi/block stack seems to fiddle with that shut down queue.
I only can tell about the inner workings of the irq code, but I have no
clue about the rest.
Though there is something wrong here:
affinity: 24-29
effectiv: 24-29
and after offlining:
affinity: 29
effectiv: 29
But that should be:
affinity: 24-29
effectiv: 29
because the irq core code preserves 'affinity'. It merely updates
'effective', which is where your interrupts are routed to.
Is the driver issuing any set_affinity() calls? If so, that's wrong.
Which driver are we talking about?
Thanks,
tglx