• [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages

    From Michal Hocko@21:1/5 to Alexandru Moise on Mon Oct 2 15:00:01 2017
    On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
    This attempts to bring more flexibility to how hugepages are allocated
    by making it possible to decide whether we want the hugepages to be
    allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore=" boot parameter for non-movable allocations.

    A new boot parameter is introduced, "hugepages_movable=", this sets the default value for the "hugepages_treat_as_movable" sysctl. This allows
    us to determine the zone for hugepages allocated at boot time. It only affects 2M hugepages allocated at boot time for now because 1G
    hugepages are allocated much earlier in the boot process and ignore
    this sysctl completely.

    The "hugepages_treat_as_movable" sysctl is also turned into a mandatory setting that all hugepage allocations at runtime must respect (both
    2M and 1G sized hugepages). The default value is changed to "1" to
    preserve the existing behavior that if hugepage migration is supported,
    then the pages will be allocated from ZONE_MOVABLE.

    Note however if not enough contiguous memory is present in ZONE_MOVABLE
    then the allocation will fallback to the non-movable zone and those
    pages will not be migratable.

    This changelog doesn't explain _why_ we would need something like that.

    The implementation is a bit dirty so obviously I'm open to suggestions
    for a better way to implement this behavior, or comments whether the whole idea is fundamentally __wrong__.

    To be honest I think this is just a wrong approach. hugepages_treat_as_movable is quite questionable to be honest because it breaks the basic semantic
    of the movable zone if the hugetlb pages are not really migratable which
    should be the only criterion. Hugetlb pages are no different from other migratable pages in that regards.
    --
    Michal Hocko
    SUSE Labs

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Michal Hocko@21:1/5 to Alexandru Moise on Mon Oct 2 22:50:15 2017
    On Mon 02-10-17 17:06:38, Alexandru Moise wrote:
    On Mon, Oct 02, 2017 at 04:27:17PM +0200, Michal Hocko wrote:
    On Mon 02-10-17 16:06:33, Alexandru Moise wrote:
    On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
    On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
    This attempts to bring more flexibility to how hugepages are allocated
    by making it possible to decide whether we want the hugepages to be allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
    boot parameter for non-movable allocations.

    A new boot parameter is introduced, "hugepages_movable=", this sets the
    default value for the "hugepages_treat_as_movable" sysctl. This allows
    us to determine the zone for hugepages allocated at boot time. It only
    affects 2M hugepages allocated at boot time for now because 1G hugepages are allocated much earlier in the boot process and ignore this sysctl completely.

    The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
    setting that all hugepage allocations at runtime must respect (both 2M and 1G sized hugepages). The default value is changed to "1" to preserve the existing behavior that if hugepage migration is supported,
    then the pages will be allocated from ZONE_MOVABLE.

    Note however if not enough contiguous memory is present in ZONE_MOVABLE
    then the allocation will fallback to the non-movable zone and those pages will not be migratable.

    This changelog doesn't explain _why_ we would need something like that.


    So people shouldn't be able to choose whether their hugepages should be migratable or not?

    How are hugetlb pages any different from THP wrt. migrateability POV? Or any other mapped memory to the userspace in general?

    THP shares more with regular userspace mapped memory than with hugetlbfs pages.
    They have separate codepaths in migrate_pages().

    That is a mere implementation detail. You are right that THP shares more
    with regular userspace memory because it is transparent from the
    configuration POV but that has nothing to do with page migration AFAICS.

    And no one ever sets the movable
    flag on a hugetlbfs mapping, so even though __PageMovable(hpage) on a hugetlbfs
    page returns false, it will still move.

    __PageMovable is a completely unrelated thing. It is for pages which are
    !LRU but still movable.



    Maybe they consider some of their applications more important than others.

    I do not understand this part.

    Say:
    You have a large number of correctable errors on a subpage of a compound page. So you copy the contents of the page to another hugepage, break the original page and offline the subpage.

    I suspect you have HWPoisoning in mind right?

    No, rather soft offlining.

    I thought this is the same thing.

    But maybe you'd rather that some of
    your hugepages not be broken and moved because you're not that worried about
    memory corruption, but more about availability.

    Could you be more specific please?

    You can have a platform with reliable DIMM modules and a platform with less reliable
    DIMM modules. So you would prefer to inhibit hugepage migration on the platform with
    reliable DIMM modules that you know will behave ok even under a high number of
    correctable memory errors. tools like mcelog however are not hugepage aware and
    cannot be told "if this PFN is part of a hugepage, don't try to soft offline it",
    rather deciding which PFNs should be unmovable should be done in the kernel, but it should still be controllable by the administrator.

    This sounds like a userspace policy that should be handled outside of
    the kernel.

    For hugetlbfs pages in particular, this behavior is not present, without this patch.


    Without this patch even if hugepages are in the non-movable zone, they move.

    which is ok. This is very same with any other movable allocations.

    So you can have movable pages in the non-movable kernel zone?

    yes. Most configuration even do not have any movable zone unless
    explicitly configured.

    The implementation is a bit dirty so obviously I'm open to suggestions
    for a better way to implement this behavior, or comments whether the whole
    idea is fundamentally __wrong__.

    To be honest I think this is just a wrong approach. hugepages_treat_as_movable
    is quite questionable to be honest because it breaks the basic semantic of the movable zone if the hugetlb pages are not really migratable which
    should be the only criterion. Hugetlb pages are no different from other migratable pages in that regards.

    Shouldn't hugepages allocated to unmovable zone, by definition, not be able
    to be migrated? With this patch, hugepages in the movable zone do move, but
    hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
    completely?

    yes. movable zone is only about a guarantee to move memory around.
    Movable allocations are still allowed to use kernel zones (aka non-movable). The main reason for the movable zone these days is memory hotplug which needs a semi-guarantee that the memory used can be
    migrated elsewhere to free up the offlined memory.

    But isn't kernel-zone memory guaranteed not to migrate?

    No.

    I agree that movable allocations are allowed to fallback to kernel zones. i.e. This is behavior is correct:
    Page A is in ZONE_MOVABLE, page B is in kernel zone.
    Page A gets soft-offlined, the contents are moved to page B.

    This behavior is not correct:
    Page C is in kernel zone, page D is also in kernel zone.
    Page C gets soft offlined, contents of page C get moved to page D.

    Why is this incorrect?

    With hugepages, there is no check for whereto the migration goes because
    the pages are pre-allocated and simply dequeued from the hstate freelist.

    true

    Thus hugepages will end up being unreserved and moved to a different
    reserved hugepage, and the administrator has no control over this behavior, even if they're kernel zone pages.

    I really fail to see why kernel vs. movable zones play any role here.
    Zones should be mostly an implementation detail which userspace
    shouldn't really care about.

    --
    Michal Hocko
    SUSE Labs

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)