• msync Multithreaded Locking

    From Maximilian Böther@21:1/5 to All on Wed Mar 24 03:52:13 2021
    Hello,

    I am investigating an application that writes random data in fixed-size chunks (e.g. 4k) to random locations in a large buffer file. I have several processes (not threads) doing this, and each process has its own buffer file.

    If I use mmap+msync to write and persist data to disk, I see a performance spike at 16 processes and a performance drop with more processes (32).
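
    For readers who want to reproduce the pattern, here is a minimal single-process sketch of the mmap+msync variant. It is not the actual benchmark code: the function name, file size, and write count are made up for illustration, and Python's `mmap.flush()` is used because it calls msync on POSIX systems. The flush range is rounded down to a page boundary so the sketch also works when the page size is larger than 4k.

```python
import mmap
import os
import random
import tempfile

CHUNK = 4096  # 4k chunk size, as in the post


def mmap_random_writes(path, file_size, n_writes, seed=0):
    """Write n_writes random chunks at random chunk-aligned offsets
    through a shared mapping, syncing each chunk after it is written.
    Hypothetical helper; file_size must be a multiple of CHUNK."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # size the buffer file up front
        with mmap.mmap(fd, file_size) as mm:
            for _ in range(n_writes):
                off = rng.randrange(file_size // CHUNK) * CHUNK
                mm[off:off + CHUNK] = os.urandom(CHUNK)
                # flush() needs a page-aligned offset, so round down
                page_off = off - (off % mmap.PAGESIZE)
                mm.flush(page_off, off + CHUNK - page_off)  # msync on POSIX
    finally:
        os.close(fd)
    return n_writes * CHUNK  # bytes written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        total = mmap_random_writes(os.path.join(d, "buf.bin"), 256 * CHUNK, 100)
        print(total, "bytes written via mmap+msync")
```

    To approximate the multi-process setup, one would start N such processes, each with its own path, and time them together.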

    If I use open+write+fsync, I do not see such a spike but a performance plateau instead (and mmap is slower than open/write).
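
    The open+write+fsync variant can be sketched the same way. Again this is only an illustration under assumptions: the function name and sizes are hypothetical, and `os.pwrite` (Unix only) stands in for whatever seek+write combination the real application uses.

```python
import os
import random
import tempfile

CHUNK = 4096  # 4k chunk size, as in the post


def write_fsync_random_writes(path, file_size, n_writes, seed=0):
    """Write n_writes random chunks at random chunk-aligned offsets
    with positioned writes, calling fsync after each one.
    Hypothetical helper; file_size must be a multiple of CHUNK."""
    rng = random.Random(seed)
    total = 0
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, file_size)  # size the buffer file up front
        for _ in range(n_writes):
            off = rng.randrange(file_size // CHUNK) * CHUNK
            total += os.pwrite(fd, os.urandom(CHUNK), off)
            os.fsync(fd)  # persist data and metadata for this file
    finally:
        os.close(fd)
    return total  # bytes written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        total = write_fsync_random_writes(os.path.join(d, "buf.bin"), 256 * CHUNK, 100)
        print(total, "bytes written via pwrite+fsync")
```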

    I've read multiple times [1,2] that both mmap and msync can take locks. Profiling with VTune shows that we are indeed spinlocking and spending most of the time in the `clear_page_erms` and `xas_load` functions.

    However, when reading the source code for msync [3], I cannot tell whether these locks are global or per-file. The paper [2] states that the locks are on per-file radix trees within the kernel. However, since I observe some spinlocks in the kernel even though I have one file per process, I believe that some locks may be global.

    Do you have an explanation for why we see such a spike at 16 processes for mmap, and any input on the locking behavior of msync?

    Thank you!

    Best,
    Maximilian Böther

    [1] https://kb.pmem.io/development/100000025-Why-msync-is-less-optimal-for-persistent-memory/
    [2] Optimizing Memory-mapped I/O for Fast Storage Devices, Papagiannis et al., ATC '20
    [3] https://elixir.bootlin.com/linux/latest/source/mm/msync.c
