On Fri, Oct 22, 2021 at 7:36 AM Helmut Jarausch <jarausch@skynet.be> wrote:
There are more than 55,000 files on some <PREVBackUp> which is located
on a BTRFS file system.
Standard 'rm -rf' is really slow.
Is there anything I can do about this?
I don't have any solid suggestions as I haven't used btrfs in a while.
File deletion speed is something that is very filesystem specific, but
on most it tends to be slow.
An obvious solution would be garbage collection, which some
filesystems use, though I'm not aware of any mainstream ones that do.
You can sort-of get that behavior by renaming a directory before
deleting it. Suppose you have a directory created by a build system
and you want to do a new build. Deleting the directory takes a long
time. So, first you rename it to something else (or move it someplace
on the same filesystem which is fast), then you kick off your build
which no longer sees the old directory, and then you can delete the
old directory slowly at your leisure. Of course, as with all garbage collection, you need to have the spare space to hold the data while it
gets cleaned up.
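A minimal sketch of that rename-then-delete trick, using plain POSIX tools (the `build` / `build.trash` names are hypothetical stand-ins for your real directories):

```shell
#!/bin/sh
# Rename-then-delete: the rename is instant on the same filesystem,
# so the new build starts immediately while the old tree is reclaimed
# out of the critical path.
set -e
mkdir -p build && touch build/old-artifact.o   # stand-in for an old build tree
mv build build.trash     # instant rename, same filesystem
mkdir build              # the new build sees an empty directory right away
rm -rf build.trash &     # the slow deletion happens in the background
wait                     # (only so this sketch exits cleanly)
```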
I'm not sure if btrfs is any faster at deleting snapshots/reflinks
than hard links. I suspect it wouldn't be, but you could test that.
Instead of populating a directory with hard links, create a snapshot
of the directory tree, and then rsync over it/etc. The result looks
the same but is COW copies. Again, I'm not sure that btrfs will be
any faster at deleting reflinks than hard links though - they're both
similar metadata operations. I see there is a patch in the works for
rsync that uses reflinks instead of hard links to do it all in one
command. That has a lot of benefits, but again I'm not sure if it
will help with deletion.
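As a sketch of the COW-copy idea without waiting for that rsync patch: GNU `cp --reflink=auto` makes a reflink copy on filesystems that support it (btrfs, XFS) and silently falls back to a plain copy elsewhere, so this runs anywhere (the `src`/`dst` names are hypothetical):

```shell
#!/bin/sh
# COW copy of a tree: on btrfs the copy shares extents with the
# original, but removing it is still a per-file metadata operation,
# much like removing a tree of hard links.
set -e
mkdir -p src && echo data > src/file
cp -a --reflink=auto src dst   # reflink where supported, plain copy otherwise
rm -rf dst                     # deletion cost is the open question above
```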
You could also explore other filesystems that may or may not have
faster deletion, or look to see if there is any way to optimize it on
btrfs.
If you can spare the space, the option of moving the directory to make
it look like it was deleted will work on basically any filesystem. If
you want to further automate it you could move it to a tmp directory
on the same filesystem and have tmpreaper do your garbage collection. Consider using ionice to run it at a lower priority, but I'm not sure
how much impact that has on metadata operations like deletion.
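A minimal sketch of that automation (the `.trash` location and file names are hypothetical; in real use tmpreaper would replace the final `rm` with age-based reaping from cron):

```shell
#!/bin/sh
# "Delete" by moving the doomed tree into a trash dir on the same
# filesystem, then reap it at idle I/O priority (ionice -c 3).
set -e
TRASH=./.trash
mkdir -p "$TRASH"
mkdir -p olddir && touch olddir/file
mv olddir "$TRASH/olddir.0"          # instant, from the caller's view
if command -v ionice >/dev/null 2>&1; then
    ionice -c 3 rm -rf "$TRASH"/*    # the reaper's job, idle class
else
    rm -rf "$TRASH"/*                # fallback if ionice is unavailable
fi
```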
--
Rich
small files... (Certainly dont quote me here, but wasnt JFS the king
of that back in the day? I cant quite recall)
There are more than 55,000 files on some <PREVBackUp> which is located
on a BTRFS file system.
Standard 'rm -rf' is really slow.
Is there anything I can do about this?
Rich
The real solution would have been having a subvolume for the
directory.
Subvolume deletion on BTRFS is near instant.
Same for ZFS with datasets, etc.
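For reference, the subvolume approach as a command transcript (not runnable as-is: it needs root and an existing btrfs mount, and the path is hypothetical):

```
# create the scratch area as a subvolume instead of a plain mkdir
btrfs subvolume create /mnt/data/PREVBackUp
# ... populate it ...
# deletion is a near-instant metadata operation; the space is
# reclaimed asynchronously by btrfs's cleaner thread
btrfs subvolume delete /mnt/data/PREVBackUp
```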
October 22, 2021 9:50 AM, "Rich Freeman" <rich0@gentoo.org> wrote:
On Fri, Oct 22, 2021 at 8:39 AM Miles Malone <m.malone@homicidalteddybear.net> wrote:
small files... (Certainly dont quote me here, but wasnt JFS the king
of that back in the day? I cant quite recall)
It is lightning fast on lizardfs due to garbage collection, but
metadata on lizardfs is expensive, requiring RAM on the master server
for every inode. I'd never use it for lots of small files.
My lizardfs master is using 609MiB for 1,111,394 files (the bulk of
which are in snapshots, which create records for every file inside, so
if you snapshot 100k files you end up with 200k files). Figure 1kB
per file to be safe. Not a big deal if you're storing large files
though (which is what I'm mostly doing). Performance isn't eye-popping
either - I have no idea how well it would work for something like a
build system where IOPS matters. For bulk storage of big stuff
it is spectacular, and scales very well.
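The 1kB-per-file rule of thumb can be sanity-checked against those figures:

```shell
# 609 MiB of master RAM across 1,111,394 files -> bytes per file
echo $(( 609 * 1024 * 1024 / 1111394 ))   # prints 574, so 1kB/file is a safe ceiling
```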
Cephfs also uses delayed deletion. I have no idea how well it
performs, or what the cost of metadata is, though I suspect it is a
lot smarter about RAM requirements on the metadata server. Well,
maybe, at least in the past it wasn't all that smart about RAM
requirements on the object storage daemons. I'd seriously look at it
if doing anything new.
Distributed filesystems tend to be garbage collected simply due to
latency. There are data integrity benefits to synchronous writes, but
there is rarely much benefit to blocking on deletions, so why do it?
These filesystems already need all kinds of synchronization
capabilities due to node failures, so asynchronous deletion is just a
logical design.
For conventional filesystems a log-based filesystem is naturally garbage-collected, but those can have their own issues.
--
Rich
Is it possible to have a hard link from one subvolume to a different
one?
Finally I have come up with the following shell script for backing up
my /home directory; comments are more than welcome, of course.
(/HBackUp contains a BTRFS file system)
I think there are tools out there which already implement this for
btrfs.
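For comparison, the usual snapshot-based shape of such a script, as a sketch only (it assumes /home is itself a btrfs subvolume and /HBackUp is mounted; the snapshot path and date are hypothetical, and existing tools such as btrbk and snapper implement this far more robustly):

```
# take a read-only snapshot of /home (must itself be a subvolume)
btrfs subvolume snapshot -r /home /home/.snapshots/home-2021-10-22
# replicate it to the backup filesystem
btrfs send /home/.snapshots/home-2021-10-22 | btrfs receive /HBackUp
# later runs can send incrementally: btrfs send -p <parent-snapshot> ...
```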