• [gentoo-user] [OT] how to delete a directory tree really fast

    From Helmut Jarausch@21:1/5 to All on Fri Oct 22 13:40:02 2021
    Hi,

    Is there anything faster than

    rm -rf <TREE>

    ?

I'm using rsync with the --link-dest=<PREVBackUp> option.

Since this option uses hard links extensively, both <TREE> and
<PREVBackUp> have to be on the same file system. Therefore, just
re-making the file system from scratch is not an option.
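
For reference, a backup run in this scheme looks roughly like the
following (the paths here are just placeholders, not my real layout):

    # hypothetical --link-dest run: files unchanged since <PREVBackUp>
    # are hard-linked against it instead of being copied again
    rsync -aHAX --delete \
          --link-dest=/HBackUp/PREVBackUp \
          /home/ /HBackUp/NewBackUp/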

    There are more than 55,000 files on some <PREVBackUp> which is located
    on a BTRFS file system.
    Standard 'rm -rf' is really slow.

    Is there anything I can do about this?

    Many thanks for a hint,
    Helmut

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Miles Malone@21:1/5 to Rich Freeman on Fri Oct 22 14:40:02 2021
And honestly, expanding on what Rich said... given that your
circumstances, with the extensive number of hard links, are pretty
specific, I reckon you might be best off just setting up a small-scale
test of some options and profiling it. Converting it all to a btrfs
subvolume might be a realistic option, or it might take an order of
magnitude more time than just waiting for it all to delete; the same
goes for the various move tricks mentioned previously.

If this were an "I know I need to do this in the future, what should I
do" question then you'd either put it all in a subvolume to begin
with, or select the file system specifically for its speed at deleting
small files... (Certainly don't quote me here, but wasn't JFS the king
of that back in the day? I can't quite recall)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Freeman@21:1/5 to m.malone@homicidalteddybear.net on Fri Oct 22 15:00:01 2021
    On Fri, Oct 22, 2021 at 8:39 AM Miles Malone
    <m.malone@homicidalteddybear.net> wrote:

small files... (Certainly don't quote me here, but wasn't JFS the king
of that back in the day? I can't quite recall)


    It is lightning fast on lizardfs due to garbage collection, but
    metadata on lizardfs is expensive, requiring RAM on the master server
    for every inode. I'd never use it for lots of small files.

    My lizardfs master is using 609MiB for 1,111,394 files (the bulk of
    which are in snapshots, which create records for every file inside, so
    if you snapshot 100k files you end up with 200k files). Figure 1kB
    per file to be safe. Not a big deal if you're storing large files
    (which is what I'm mostly doing). Performance isn't eye-popping
    either - I have no idea how well it would work for something like a
    build system where IOPS matters. For bulk storage of big stuff though
    it is spectacular, and scales very well.

    Cephfs also uses delayed deletion. I have no idea how well it
    performs, or what the cost of metadata is, though I suspect it is a
    lot smarter about RAM requirements on the metadata server. Well,
    maybe, at least in the past it wasn't all that smart about RAM
    requirements on the object storage daemons. I'd seriously look at it
    if doing anything new.

    Distributed filesystems tend to be garbage collected simply due to
latency. There are data integrity benefits to synchronous writes, but
there is rarely much benefit in blocking on deletions, so why do it?
    These filesystems already need all kinds of synchronization
    capabilities due to node failures, so syncing deletions is just a
    logical design.

Among conventional filesystems, a log-structured filesystem is
naturally garbage-collected, but those can have their own issues.

    --
    Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Freeman@21:1/5 to jarausch@skynet.be on Fri Oct 22 14:30:01 2021
    On Fri, Oct 22, 2021 at 7:36 AM Helmut Jarausch <jarausch@skynet.be> wrote:


    There are more than 55,000 files on some <PREVBackUp> which is located
    on a BTRFS file system.
    Standard 'rm -rf' is really slow.

    Is there anything I can do about this?


    I don't have any solid suggestions as I haven't used btrfs in a while.
    File deletion speed is something that is very filesystem specific, but
    on most it tends to be slow.

An obvious solution would be garbage collection, which is something
used by some filesystems, though I'm not aware of any mainstream ones
that do it.
    You can sort-of get that behavior by renaming a directory before
    deleting it. Suppose you have a directory created by a build system
    and you want to do a new build. Deleting the directory takes a long
    time. So, first you rename it to something else (or move it someplace
    on the same filesystem which is fast), then you kick off your build
    which no longer sees the old directory, and then you can delete the
    old directory slowly at your leisure. Of course, as with all garbage collection, you need to have the spare space to hold the data while it
    gets cleaned up.
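
A minimal sketch of that trick, assuming a scratch directory already
exists on the same filesystem (paths made up):

    # the rename is effectively instant - it is just a metadata update
    mv /HBackUp/old_tree /HBackUp/.trash/old_tree.$$
    # ... kick off the new backup/build here; it no longer sees the old tree ...
    # the slow delete then runs in the background at its leisure
    rm -rf /HBackUp/.trash/old_tree.$$ &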

I'm not sure if btrfs is any faster at deleting snapshots/reflinks
    than hard links. I suspect it wouldn't be, but you could test that.
    Instead of populating a directory with hard links, create a snapshot
of the directory tree, and then rsync over it/etc. The result looks
the same but consists of COW copies. Again, I'm not sure that btrfs
will be
    any faster at deleting reflinks than hard links though - they're both
    similar metadata operations. I see there is a patch in the works for
    rsync that uses reflinks instead of hard links to do it all in one
    command. That has a lot of benefits, but again I'm not sure if it
    will help with deletion.
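
If you want to experiment with that, the snapshot variant would look
something like this (names are illustrative, and backup_prev has to be
a btrfs subvolume or snapshot):

    # snapshot the previous backup, then sync the live data over it
    btrfs subvolume snapshot /HBackUp/backup_prev /HBackUp/backup_new
    rsync -aAHX --delete /home/ /HBackUp/backup_new/
    # retiring an old generation is then a subvolume delete
    btrfs subvolume delete /HBackUp/backup_old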

    You could also explore other filesystems that may or may not have
    faster deletion, or look to see if there is any way to optimize it on
    btrfs.

    If you can spare the space, the option of moving the directory to make
    it look like it was deleted will work on basically any filesystem. If
    you want to further automate it you could move it to a tmp directory
    on the same filesystem and have tmpreaper do your garbage collection.
    Consider using ionice to run it at a lower priority, but I'm not sure
    how much impact that has on metadata operations like deletion.
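
For the low-priority cleanup, something along these lines ought to work
(the path is just an example):

    # idle I/O class plus low CPU priority so the delete doesn't compete
    # with the backup itself
    ionice -c3 nice -n 19 rm -rf /HBackUp/tmp/old_tree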

    --
    Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Vitor Hugo Nunes dos Santos@21:1/5 to All on Fri Oct 22 18:30:03 2021
The real solution would have been having a subvolume for the directory.
Subvolume deletion on BTRFS is near instant.
Same for ZFS with datasets, etc.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helmut Jarausch@21:1/5 to Vitor Hugo Nunes dos Santos on Fri Oct 22 21:30:02 2021
    On 10/22/2021 06:15:58 PM, Vitor Hugo Nunes dos Santos wrote:
    The real solution would have been having a subvolume for the
    directory.
    Subvolume deletion on BTRFS is near instant.
    Same for ZFS with datasets, etc.

    Thanks!
    Is it possible to have a hard link from one subvolume to a different
    one?



    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Freeman@21:1/5 to jarausch@skynet.be on Fri Oct 22 21:30:02 2021
    On Fri, Oct 22, 2021 at 3:21 PM Helmut Jarausch <jarausch@skynet.be> wrote:

    Is it possible to have a hard link from one subvolume to a different
    one?

    You could do a quick test, but I don't think so. I haven't used btrfs
    in years but they're basically separate filesystems as far as most
    commands are concerned. I don't think you can create reflinks between subvolumes either.
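
The quick test itself is cheap; I'd expect the hard link to fail with
"Invalid cross-device link" (subvolume names below are arbitrary):

    btrfs subvolume create /HBackUp/subA
    btrfs subvolume create /HBackUp/subB
    touch /HBackUp/subA/file
    ln /HBackUp/subA/file /HBackUp/subB/file   # expected to fail with EXDEV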

    The files are already reflinked by design though. You'd just make a
    new snapshot and then rsync over it. Anything that doesn't change
    will already share space on disk by virtue of the snapshot. Anything
    that does change will only be modified on the snapshot you target with
    rsync. I'm not sure why you'd want to use a hardlink - it doesn't
    provide the isolation you already get from the snapshot.


    --
    Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Helmut Jarausch@21:1/5 to All on Sun Oct 24 16:50:01 2021
    Many thanks to Rich, Miles, Vitor and Laurence for their hints on this subject.
    Especially the suggestion to use BTRFS Snapshots was very helpful.

Finally, I have come up with the following shell script for backing up
my /home directory; comments are more than welcome, of course.
(/HBackUp contains a BTRFS file system.)


#!/bin/zsh
mount /HBackUp
pushd /HBackUp
NUM_TO_KEEP=5

# Drop the oldest snapshot if it exists.
if [ -d "ohome_$NUM_TO_KEEP" ]
then
    btrfs subvolume delete "ohome_$NUM_TO_KEEP"
fi

# Only rotate the snapshots and take a new one if the previous run
# completed; otherwise just retry the rsync into the existing ohome.
if ! [ -f _RSync_Failed ]
then
    for backup_no in $(seq $((NUM_TO_KEEP-1)) -1 1)
    do
        if [ -d "ohome_$backup_no" ]
        then
            backup_no_plus_1="$((backup_no+1))"
            mv "ohome_$backup_no" "ohome_$backup_no_plus_1"
        fi
    done
    if ! btrfs subvolume snapshot ohome ohome_1
    then
        echo "Sorry, creating snapshot ohome_1 failed" 1>&2
        exit 1
    fi
fi

RSync_OK=False
touch _RSync_Failed

popd
pushd /home

if rsync -aqxAHSW --delete . /HBackUp/ohome/
then
    RSync_OK=True
else
    rsync_rc=$?
    case "$rsync_rc" in
        23)
            # Some files gave "permission denied" - assume that's OK
            RSync_OK=True
            ;;
        24)
            # One or more files disappeared during the transfer
            RSync_OK=True
            ;;
        *)
            echo "rsync failed. Exit code was $rsync_rc" 1>&2
            ;;
    esac
fi

popd

cd /HBackUp

case "$RSync_OK" in
    True)
        rm _RSync_Failed
        ;;
    False)
        cd /
        umount /HBackUp
        exit 1
        ;;
esac

cd /
umount /HBackUp

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rich Freeman@21:1/5 to jarausch@skynet.be on Mon Oct 25 00:20:01 2021
    On Sun, Oct 24, 2021 at 10:48 AM Helmut Jarausch <jarausch@skynet.be> wrote:

    Finally I have come up with the following shell script for backing up
    my /home directory,
    comments are more than welcome, of course.
    (/HBackUp contains a BTRFS file system)


    Is /home on btrfs? If not then something like this is probably your
    best bet (not sure if existing tools work, and I didn't carefully
    check your script for any missed error handling).

    If /home is on btrfs, just on a different filesystem, then you're
    probably much better off using something based on send/receive. This
    can let you very efficiently transfer snapshots incrementally and
    preserve COW between the copies (that is, unchanged data is
    deduplicated). While rsync is very efficient for network transfer or
    amount of data written, in order to detect changes it has to read all
    the inodes in both copies of the data. If you don't trust mtime in
    either copy then you have to further read all the data itself. That
    is a lot of read IO and CPU on both ends, though if they're on
    different hosts the network traffic is very efficient.

    Btrfs send/receive can determine what has changed between two
    snapshots very efficiently, as snapshots are b-trees that are
    cross-linked where there are no changes, so you only have to descend
    and then transfer branches that actually contain changes (similar to
    git). If only one file changes between snapshots the number of reads
    to find it scales logarithmically with the amount of metadata, while
    rsync scales linearly as it has to read all of it. Also, with btrfs
    the incremental change set can be prepared without any reference to
    the target (since both copies are on the source), so latency isn't a
    factor at all. You can just dump the incremental backups to serial
    files if you prefer, though that requires keeping them all as with
    normal incremental backups. If you receive them into a btrfs
    filesystem then you only need to retain one snapshot in common between
    the source and target to allow future incrementals.
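
In case it helps, one incremental cycle is roughly (subvolume names are
illustrative, and snap_prev must already exist on both sides):

    # take a read-only snapshot of the live data
    btrfs subvolume snapshot -r /home /home/snap_new
    # send only the delta against the previous common snapshot
    btrfs send -p /home/snap_prev /home/snap_new | btrfs receive /HBackUp/
    # snap_new then becomes the common parent for the next run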

    I think there are tools out there which already implement this for
    btrfs. I know such tools exist for zfs and I use one myself. I never
    used this with btrfs as it was immature at the time I switched over.

    --
    Rich

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Poncho@21:1/5 to Rich Freeman on Mon Oct 25 09:50:01 2021
    On 25.10.21 00:15, Rich Freeman wrote:
    I think there are tools out there which already implement this for
    btrfs.

    I personally use:

    app-backup/btrbk
    https://digint.ch/btrbk/
    Tool for creating snapshots and remote backups of btrfs subvolumes

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)