Forum: >>> Magnum BBS <<<

[v8 0/4] cgroup-aware OOM killer

From Tetsuo Handa@21:1/5 to Shakeel Butt on Mon Oct 2 14:00:01 2017

Shakeel Butt wrote:

> I think Tim has given a very clear explanation of why comparing A & D
> makes perfect sense. However, I think the above example, a single-user
> system where a user has designed and created the whole hierarchy and
> then attaches different jobs/applications to different nodes in this
> hierarchy, is also a valid scenario. One solution I can think of, to
> cater to both scenarios, is to introduce a notion of 'bypass oom': do
> not include a memcg in the oom comparison, and instead include its
> children in the comparison.
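To make the 'bypass oom' idea concrete, here is a minimal userspace model in Python. It is only a sketch under stated assumptions: Memcg, bypass_oom, candidates() and select_victim() are hypothetical names, not kernel interfaces, and "size" is taken to be a group's hierarchical usage (its own charge plus all descendants).

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Memcg:
    """Toy model of a memory cgroup (not kernel code)."""
    name: str
    usage: int = 0            # memory charged to this group's own tasks, arbitrary units
    bypass_oom: bool = False  # hypothetical flag from the proposal above
    children: list[Memcg] = field(default_factory=list)

    def total(self) -> int:
        # Hierarchical usage: own charge plus all descendants.
        return self.usage + sum(c.total() for c in self.children)


def candidates(cg: Memcg) -> list[Memcg]:
    """Return the memcgs that stand in for cg during the OOM comparison."""
    if cg.bypass_oom and cg.children:
        # A bypassed group is not compared as a whole; its children are.
        return [c for child in cg.children for c in candidates(child)]
    # Leaves, and groups without the flag, are compared as one unit.
    return [cg]


def select_victim(root: Memcg) -> Memcg:
    pool = [c for child in root.children for c in candidates(child)]
    return max(pool, key=Memcg.total)  # largest hierarchical usage wins
```

With B=40 and C=30 under A, and D=50, the unflagged A (total 70) is compared against D and selected; once bypass_oom is set on A, B, C and D become the contenders and D is selected.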

I haven't been following this thread closely because I don't use memcg.
But if there are multiple scenarios, what about offloading memcg OOM
handling to loadable kernel modules (much as many filesystems are called
through the VFS interface)? Then we could experiment by trial and error
more casually.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Michal Hocko@21:1/5 to Shakeel Butt on Mon Oct 2 14:30:01 2017

On Sun 01-10-17 16:29:48, Shakeel Butt wrote:

>> Going back to Michal's example, say the user configured the following:
>>
>>        root
>>       /    \
>>      A      D
>>     / \
>>    B   C
>>
>> A global OOM event happens and we find this:
>> - A > D
>> - B, C, D are oomgroups
>>
>> What the user is telling us is that B, C, and D are compound memory
>> consumers. They cannot be divided into their task parts from a memory
>> point of view.
>>
>> However, the user doesn't say the same for A: the A subtree summarizes
>> and controls aggregate consumption of B and C, but without oom_group
>> set on A, the user says that A is in fact divisible into independent
>> memory consumers B and C.
>>
>> If we don't have to kill all of A, but we'd have to kill all of D,
>> does it make sense to compare the two?

> I think Tim has given a very clear explanation of why comparing A & D
> makes perfect sense. However, I think the above example, a single-user
> system where a user has designed and created the whole hierarchy and
> then attaches different jobs/applications to different nodes in this
> hierarchy, is also a valid scenario.

Yes, and nobody is disputing that, really. I guess the main disconnect
here is that different people want more detailed control over the
victim selection, while the patchset tries to handle the simplest
scenario, in which no userspace control over the selection is
required. And I would claim that this will be the vast majority of
setups, and we should address it first.

More fine-grained control needs more thought to come up with a
sensible and sustainable long-term API. Just look back at the
oom_score_adj story and how it ended up unusable in the end (well, apart
from the never/always-kill corner cases). Let's not repeat that mistake now.

I strongly believe that we can come up with something, be it priority
based, BPF based or module based selection. But let's start simple, with
the most basic scenario first and the most sensible semantics implemented.

I believe the latest version (v9) looks sensible from the semantic point
of view, and we should focus on getting it into mergeable shape.
--
Michal Hocko
SUSE Labs


From Shakeel Butt@21:1/5 to All on Mon Oct 2 22:50:13 2017

> Yes, and nobody is disputing that, really. I guess the main disconnect
> here is that different people want more detailed control over the
> victim selection, while the patchset tries to handle the simplest
> scenario, in which no userspace control over the selection is
> required. And I would claim that this will be the vast majority of
> setups, and we should address it first.

IMHO the disconnect/disagreement is over which memcgs should be compared
with each other for oom victim selection. Let's forget about oom
priority and just take size into account. Should the oom selection
algorithm compare the leaves of the hierarchy, or should it compare
siblings? For a single-user system, comparing leaves makes sense,
while in a multi-user system, siblings should be compared for victim
selection.
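The two comparison strategies can be contrasted in a small Python model. The names are hypothetical, and "size" is assumed to be hierarchical usage (a group's own charge plus its descendants'). On the A/B/C/D tree used in this thread, with B=40, C=30 and D=50, comparing siblings picks A (70) while comparing leaves picks D.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Memcg:
    """Toy model of a memory cgroup (not kernel code)."""
    name: str
    usage: int = 0
    children: list[Memcg] = field(default_factory=list)

    def total(self) -> int:
        # Hierarchical usage: own charge plus all descendants.
        return self.usage + sum(c.total() for c in self.children)


def leaves(cg: Memcg) -> list[Memcg]:
    """All leaf memcgs in the subtree rooted at cg."""
    if not cg.children:
        return [cg]
    return [leaf for child in cg.children for leaf in leaves(child)]


def select_among_leaves(root: Memcg) -> Memcg:
    """Single-user view: every leaf competes with every other leaf."""
    return max(leaves(root), key=Memcg.total)


def select_among_siblings(root: Memcg) -> Memcg:
    """Multi-user view: only the children of root compete."""
    return max(root.children, key=Memcg.total)
```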

Coming back to the same example:

       root
      /    \
     A      D
    / \
   B   C

Let's view it as a multi-user system where some central job scheduler
has asked a node controller on this system to start two jobs, 'A' and
'D'. 'A' then went on to create sub-containers. Now, on a system OOM,
IMO the simplest sensible thing to do from the semantic point of view
is to compare 'A' and 'D', and if 'A' has the higher usage, then kill
all of 'A' if it has oom_group set, or otherwise recursively find a
victim memcg taking 'A' as the root.
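The multi-user procedure just described can be sketched recursively: compare siblings by hierarchical usage, kill the whole winner if it has oom_group, otherwise repeat with the winner as the root. This is a Python model under simplifying assumptions, not kernel code: the names are hypothetical, memory charged directly to an intermediate group is ignored once the search descends into it, and a leaf without oom_group is assumed to yield a single-task kill.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Memcg:
    """Toy model of a memory cgroup (not kernel code)."""
    name: str
    usage: int = 0
    oom_group: bool = False  # if selected, kill every task in the group
    children: list[Memcg] = field(default_factory=list)

    def total(self) -> int:
        return self.usage + sum(c.total() for c in self.children)


def select_victim(root: Memcg) -> tuple[Memcg, bool]:
    """Return (victim memcg, whether to kill every task in it)."""
    chosen = max(root.children, key=Memcg.total)  # siblings only
    if chosen.oom_group:
        return chosen, True           # kill the whole group
    if chosen.children:
        return select_victim(chosen)  # take 'chosen' as the new root
    return chosen, False              # plain leaf: kill a single task in it
```

On the example tree with B=40, C=30 and D=50, A (70) beats D; without oom_group on A the search continues inside A and ends at B, while with oom_group on A all of A is killed.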

I have noted before that for single-user systems, comparing 'B', 'C' and
'D' is the most sensible thing to do.

Now, in the multi-user system, I can kind of force the comparison of
'A' and 'D' by setting oom_group on 'A'. IMO that is an abuse of
'oom_group', as it would then carry two meanings: comparison leader and
killall. I would humbly suggest having two separate notions instead.
Let's say oom_gang (if you prefer, just 'oom_group' is fine too) and
killall.

For the single-user system example, 'B', 'C' and 'D' will have
'oom_gang' set, and if the user wants killall semantics too, they can
set it separately.

For the multi-user system, 'A' and 'D' will have 'oom_gang' set. Now,
let's say 'A' was selected on a system OOM: if 'killall' was set on 'A',
then 'A' will be selected as the victim; otherwise the oom selection
algorithm will recursively take 'A' as the root and try to find a victim
memcg.

Another major semantic of 'oom_gang' is that the leaves will always be
treated as 'oom_gang'.
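A sketch of how the proposed oom_gang and killall knobs could interact, again as a Python model rather than kernel code. Both flags are the proposal in this post, not a merged interface, and gangs_below() and select_victim() are hypothetical helpers. Groups without oom_gang are looked through so that their nearest gang descendants compete, leaves always count as gangs, and a selected gang without killall becomes the root of a further search.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Memcg:
    """Toy model of a memory cgroup; oom_gang and killall are the proposed knobs."""
    name: str
    usage: int = 0
    oom_gang: bool = False  # compare this group as one unit against its peers
    killall: bool = False   # if selected, kill every task in the group
    children: list[Memcg] = field(default_factory=list)

    def total(self) -> int:
        return self.usage + sum(c.total() for c in self.children)

    def is_gang(self) -> bool:
        # Leaves are always treated as oom_gang.
        return self.oom_gang or not self.children


def gangs_below(cg: Memcg) -> list[Memcg]:
    """Nearest oom_gang descendants of cg; non-gang groups are looked through."""
    out: list[Memcg] = []
    for child in cg.children:
        out.extend([child] if child.is_gang() else gangs_below(child))
    return out


def select_victim(root: Memcg) -> tuple[Memcg, bool]:
    """Return (victim memcg, kill_all)."""
    chosen = max(gangs_below(root), key=Memcg.total)
    if chosen.killall:
        return chosen, True
    if chosen.children:
        # No killall: search inside the chosen gang, taking it as the new root.
        return select_victim(chosen)
    return chosen, False  # a leaf: kill a single task in it
```

With B=40, C=30 and D=50, setting oom_gang on B, C and D (the single-user setup) selects D, while setting it on A and D (the multi-user setup) selects A and then, absent killall, B inside it.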

