• [PATCH RFC] mm: implement write-behind policy for sequential file write

    From Konstantin Khlebnikov@21:1/5 to All on Mon Oct 2 12:00:02 2017
    Traditional writeback tries to accumulate as much dirty data as possible.
    This is a worthwhile strategy for extremely short-lived files and for
    batching writes to save battery power. But for workloads where disk
    latency is important, this policy generates periodic disk load spikes,
    which increase latency for concurrent operations.

    The present writeback engine allows tuning only the dirty data size or
    the expiration time. Such tuning cannot eliminate the spikes; it just
    makes them lower and more frequent. The other option is switching to
    sync mode, which flushes written data right after each write; obviously
    this has a significant performance impact. Such tuning is system-wide
    and also affects memory-mapped and randomly written files, which flusher
    threads handle much better.

    This patch implements a write-behind policy which tracks sequential writes
    and starts background writeback when there are enough dirty pages in a row.

    Write-behind tracks the current writing position and looks into two windows
    behind it: the first represents unwritten pages, the second async writeback.

    The next write starts background writeback when the first window exceeds
    the threshold and waits for pages falling behind the async writeback
    window. This makes it possible to combine small writes into bigger
    requests and maintain optimal io-depth.
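
    The two-window behaviour described above can be modelled in userspace.
    The following is only a minimal sketch, not the code from this patch:
    the names wb_model, wb_action and wb_model_write are hypothetical, all
    positions are page indexes, and the rule used to detect a non-sequential
    write (the write does not start where the previous one ended) is an
    assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, in page units (not the patch's data layout). */
struct wb_model {
    unsigned long head; /* first page not yet submitted for writeback */
    unsigned long pos;  /* page index just past the previous write */
};

/* What the model decides to do for one write. */
struct wb_action {
    bool start_writeback;     /* submit [flush_begin, flush_end) in background */
    unsigned long flush_begin;
    unsigned long flush_end;
    unsigned long wait_below; /* wait for pages below this index; 0 = no wait */
};

/*
 * Model one write covering pages [begin, end).
 * min_pages:   threshold for the window of unwritten pages (0 = disabled).
 * async_pages: size of the async writeback window (0 = never wait).
 */
static struct wb_action wb_model_write(struct wb_model *m,
                                       unsigned long begin, unsigned long end,
                                       unsigned long min_pages,
                                       unsigned long async_pages)
{
    struct wb_action act = { false, 0, 0, 0 };

    assert(begin <= end);

    if (min_pages == 0)
        return act;

    /* Assumption: a write that does not continue the previous one
     * restarts tracking at its own start. */
    if (begin != m->pos)
        m->head = begin;
    m->pos = end;

    /* First window: unwritten pages between head and the write position. */
    if (end - m->head < min_pages)
        return act;

    act.start_writeback = true;
    act.flush_begin = m->head;
    act.flush_end = end;

    /* Second window: pages within async_pages behind the submitted range
     * may still be in flight; anything older is waited for. */
    if (async_pages != 0 && act.flush_begin > async_pages)
        act.wait_below = act.flush_begin - async_pages;

    m->head = end;
    return act;
}
```

    Assuming 4 KiB pages, the defaults of 256 KB and 1024 KB correspond to
    min_pages = 64 and async_pages = 256. In this model, four sequential
    16-page writes starting at page 0 trigger one background request for
    pages [0, 64) and no waiting.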

    This affects only writes via syscalls; memory-mapped writes are unchanged.
    Also, write-behind doesn't affect files with fadvise POSIX_FADV_RANDOM.

    If the async window is set to 0, write-behind skips dirty pages for a
    congested disk and never waits for writeback. This is used for files
    with O_NONBLOCK.

    Also, for files with fadvise POSIX_FADV_NOREUSE, write-behind automatically
    evicts completely written pages from the cache. This is perfect for writing
    verbose logs without pushing more important data out of the cache.
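
    From the application side, both hints are set with posix_fadvise(). The
    sketch below shows how a logger might mark its file POSIX_FADV_NOREUSE
    and how another file could be marked POSIX_FADV_RANDOM. The helper names
    open_log_noreuse and mark_random_access are hypothetical; the effect on
    write-behind is as described in this patch and applies only to a kernel
    carrying it.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a log file for appending and mark it POSIX_FADV_NOREUSE.
 * Returns the file descriptor, or -1 if open() fails. The result of
 * posix_fadvise() (0 or an error number) is stored in *advice_err.
 */
static int open_log_noreuse(const char *path, int *advice_err)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    /* offset 0, len 0: the advice covers the whole file. */
    *advice_err = posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
    return fd;
}

/* Mark an already open file as random-access (hypothetical helper). */
static int mark_random_access(int fd)
{
    /* posix_fadvise() returns 0 or an error number; it does not set errno. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);
}
```

    The advice is a hint, so the helper still returns a usable descriptor
    when posix_fadvise() reports an error; the caller can decide whether
    that matters.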

    As a bonus, write-behind makes blkio throttling much smoother for most
    bulk file operations that write sequentially, such as copying or
    downloading.

    The size of the minimal write-behind request is set in:
    /sys/block/$DISK/bdi/min_write_behind_kb
    Default is 256 KB; 0 disables write-behind for this disk.

    The size of the async window is set in:
    /sys/block/$DISK/bdi/async_write_behind_kb
    Default is 1024 KB; 0 disables sync write-behind (write-behind then
    never waits for writeback).

    Write-behind is controlled by sysctl vm.dirty_write_behind:
    =0: disabled, default
    =1: enabled
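
    For completeness, here is a small sketch of setting these tunables from
    a program rather than from the shell. The helpers bdi_knob_path and
    write_knob are hypothetical names; the sysfs files exist only on a
    kernel with this patch applied, and /proc/sys/vm/dirty_write_behind is
    simply the procfs spelling of the vm.dirty_write_behind sysctl.

```c
#include <stdio.h>

/* Build "/sys/block/<disk>/bdi/<knob>" into buf; returns 0 on success. */
static int bdi_knob_path(char *buf, size_t size,
                         const char *disk, const char *knob)
{
    int n = snprintf(buf, size, "/sys/block/%s/bdi/%s", disk, knob);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write an unsigned value followed by a newline to a sysfs or procfs file. */
static int write_knob(const char *path, unsigned long value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "%lu\n", value) > 0;
    /* fclose() flushes the buffer, so a failed write surfaces here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Example (needs root and a patched kernel):
 *   char p[128];
 *   bdi_knob_path(p, sizeof(p), "sda", "min_write_behind_kb");
 *   write_knob(p, 256);
 *   write_knob("/proc/sys/vm/dirty_write_behind", 1);
 */
```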

    Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    ---
    Documentation/ABI/testing/sysfs-class-bdi | 11 ++++
    Documentation/sysctl/vm.txt | 15 +++++
    include/linux/backing-dev-defs.h | 2 +
    include/linux/fs.h | 9 +++
    include/linux/mm.h | 3 +
    kernel/sysctl.c | 9 +++
    mm/backing-dev.c | 46 +++++++++-------
    mm/fadvise.c | 4 +
    mm/page-writeback.c | 84 +++++++++++++++++++++++++++++
    9 files changed, 162 insertions(+), 21 deletions(-)

    diff --git a/Documentation/ABI/testing/sysfs-class-bdi b/Documentation/ABI/testing/sysfs-class-bdi
    index d773d5697cf5..50a8b8750c13 100644
    --- a/Documentation/ABI/testing/sysfs-class-bdi
    +++ b/Documentation/ABI/testing/sysfs-class-bdi
    @@ -30,6 +30,17 @@ read_ahead_kb (read-write)

    Size of the read-ahead window in kilobytes

    +min_write_behind_kb (read-write)
    +
    + Size of minimal write-behind request in kilobytes.
    + 0
  • From Florian Weimer@21:1/5 to Konstantin Khlebnikov on Mon Oct 2 13:30:02 2017
    On 10/02/2017 11:54 AM, Konstantin Khlebnikov wrote:
    > This patch implements write-behind policy which tracks sequential writes
    > and starts background writeback when have enough dirty pages in a row.

    Does this apply to data for files which have never been written to disk
    before?

    I think one of the largest benefits of the extensive write-back caching
    in Linux is that the cache is discarded if the file is deleted before it
    is ever written to disk. (But maybe I'm wrong about this.)

    Thanks,
    Florian

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Konstantin Khlebnikov@21:1/5 to Florian Weimer on Mon Oct 2 14:00:01 2017
    On 02.10.2017 14:23, Florian Weimer wrote:
    > On 10/02/2017 11:54 AM, Konstantin Khlebnikov wrote:
    >> This patch implements write-behind policy which tracks sequential writes
    >> and starts background writeback when have enough dirty pages in a row.
    >
    > Does this apply to data for files which have never been written to disk
    > before?
    >
    > I think one of the largest benefits of the extensive write-back caching
    > in Linux is that the cache is discarded if the file is deleted before it
    > is ever written to disk. (But maybe I'm wrong about this.)

    Yes. I've mentioned that the current policy is good for short-lived files.

    Write-behind keeps small files (<256kB) in cache and writes files smaller
    than 1MB in the background; synchronous writes start only after 1MB.
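
    As a worked illustration of those default thresholds, the sketch below
    classifies a sequentially written file by how much has been written so
    far. The enum and function names are hypothetical, and the handling of
    the exact boundary values (and of a threshold set to 0) is an assumption
    based on the patch description, not taken from the code.

```c
/* Stages a sequential writer passes through (hypothetical names). */
enum wb_stage { WB_CACHED, WB_BACKGROUND, WB_SYNC_WAIT };

static enum wb_stage wb_stage_for_size(unsigned long written_kb,
                                       unsigned long min_kb,
                                       unsigned long async_kb)
{
    /* min_kb == 0: write-behind is disabled for the disk. */
    if (min_kb == 0 || written_kb < min_kb)
        return WB_CACHED;
    /* async_kb == 0: write-behind never waits for writeback. */
    if (async_kb == 0 || written_kb <= async_kb)
        return WB_BACKGROUND;
    return WB_SYNC_WAIT;
}
```

    With the defaults (256 and 1024), a 100 kB file stays in cache, a 512 kB
    file is written in the background, and a 4 MB file reaches the point
    where the writer waits for older pages.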

    On the other hand, such files have to be written anyway if somebody calls
    sync, if metadata changes are serialized by journal transactions, or if
    memory pressure flushes them to disk. So this caching is very unstable
    and uncertain. In some cases caching makes the whole operation much
    slower because the actual disk write starts later than it could.
