On Sat, 16 Sep 2017, Thomas Gleixner wrote:
On Thu, 14 Sep 2017, YASUAKI ISHIMATSU wrote:
Here is the info for one of the megasas irqs:
- Before offlining the CPUs
/proc/irq/70/smp_affinity_list
24-29
/proc/irq/70/effective_affinity
00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,3f000000
/sys/kernel/debug/irq/irqs/70
handler: handle_edge_irq
status: 0x00004000
istate: 0x00000000
ddepth: 0
wdepth: 0
dstate: 0x00609200
IRQD_ACTIVATED
IRQD_IRQ_STARTED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
So this uses managed affinity, which means that once the last CPU in the
affinity mask goes offline, the interrupt is shut down by the irq core
code. That is exactly what happened here:
dstate: 0x00a39000
IRQD_IRQ_DISABLED
IRQD_IRQ_MASKED
IRQD_MOVE_PCNTXT
IRQD_AFFINITY_SET
IRQD_AFFINITY_MANAGED
IRQD_MANAGED_SHUTDOWN <---------------
So the irq core code works as expected, but something in the
driver/scsi/block stack seems to fiddle with that shut down queue.
I only can tell about the inner workings of the irq code, but I have no
clue about the rest.
Though there is something wrong here:
affinity: 24-29
effectiv: 24-29
and after offlining:
affinity: 29
effectiv: 29
But that should be:
affinity: 24-29
effectiv: 29
because the irq core code preserves 'affinity'. It merely updates
'effective', which is where your interrupts are routed to.
Is the driver issuing any set_affinity() calls? If so, that's wrong.
Which driver are we talking about?
Thanks,
tglx