[2021-09-26 11:57] Peter Humphrey <peter@prh.myzen.co.uk>
> Hello list,
>
> I have an external USB-3 drive with various system backups. There are 350 .tar
> files (not .tar.gz etc.), amounting to 2.5TB. I was sure I wouldn't need to
> compress them, so I didn't, but now I think I'm going to have to. Is there a
> reasonably efficient way to do this? I have 500GB spare space on /dev/sda, and
> the machine runs constantly.

Hi,

Pick your favorite of gzip, bzip2, xz or lzip (I recommend lzip) and
then:
mount USB-3 /mnt; cd /mnt; lzip *
The archiver you chose will compress the file and add the appropriate extension all on its own and tar will use that (and the file magic) to
find the appropriate decompressor when you want to extract files later
(you can use `tar tf' to test if you want).
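(A quick sketch of that round trip, assuming GNU tar with lzip installed; the archive name is a placeholder:)

$ lzip rootfs-backup.tar         # replaces it with rootfs-backup.tar.lz
$ tar tf rootfs-backup.tar.lz    # tar picks the decompressor from the suffix/magic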
--
Simon Thelen
On 26/09/2021 13:36, Simon Thelen wrote:
> [snip]

In addition to this, you may want to use the parallel implementations
of "gzip", "xz", "bzip2" or the new "zstd" (Zstandard), which are
"pigz"[1], "pixz"[2], "pbzip2"[3] and "zstdmt" (within the package
"app-arch/zstd")[4], in order to increase performance:
$ cd <path_to_mounted_backup_partition>
$ for tar_archive in *.tar; do pixz "${tar_archive}"; done
-Ramon
[1]
* https://www.zlib.net/pigz/
[2]
* https://github.com/vasi/pixz
[3]
* https://launchpad.net/pbzip2
* http://compression.ca/pbzip2/
[4]
* https://facebook.github.io/zstd/
Addendum:
To complete the list, here is the parallel implementation of "lzip":
"plzip": https://www.nongnu.org/lzip/plzip.html
-Ramon
On 26/09/2021 14:23, Ramon Fischer wrote:
> [snip]
Or, I could connect a second USB-3 drive to a different interface, then read from one and write to the other, with or without the SATA SSD in between.
On Sun, Sep 26, 2021 at 8:57 PM Peter Humphrey <peter@prh.myzen.co.uk>
wrote:
> Hello list,
> I have an external USB-3 drive with various system backups. There are 350
> .tar files (not .tar.gz etc.), amounting to 2.5TB. I was sure I wouldn't
> need to compress them, so I didn't, but now I think I'm going to have to.
> Is there a reasonably efficient way to do this?
find <mountpoint> -name \*tar -exec zstd -TN {} \;
Where N is the number of cores you want to allocate. zstd -T0 (or just zstdmt) if you want to use all the available cores. I use zstd for
everything now as it's as good as or better than all the others in the general case.
Parallel means it uses more than one core, so on a modern machine it is
much faster.
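(A sketch building on that, with flags per the zstd man page: zstd keeps its input file by default, so compressing 2.5TB in place you'd want --rm to delete each .tar once it compresses successfully, and -t to verify afterwards:)

$ find <mountpoint> -name '*.tar' -exec zstd -T0 --rm {} \;
$ find <mountpoint> -name '*.tar.zst' -exec zstd -t {} \;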
On Monday, 27 September 2021 02:39:19 BST Adam Carter wrote:
> find <mountpoint> -name \*tar -exec zstd -TN {} \;
> [snip]
Thanks to all who've helped. I can't avoid feeling, though, that the main bottleneck has been missed: that I have to read and write on a USB-3 drive. It's just taken 23 minutes to copy the current system backup from USB-3 to SATA SSD: 108GB in 8 .tar files.
On Sun, Sep 26, 2021 at 8:57 PM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> I have an external USB-3 drive with various system backups. There are 350
> .tar files (not .tar.gz etc.), amounting to 2.5TB.
> [snip]
You keep mentioning USB3, but I think the main factor here is that the
external drive is probably a spinning hard drive (I don't think you
explicitly mentioned this, but it seems likely, especially with the volume
of data). The math works out to 78MB/s (108GB in 23 minutes). Hard drive
transfer speeds depend on the drive itself and especially on whether there
is more than one IO task to be performed, so I can't be entirely sure, but
I'm guessing that the USB3 interface itself is having almost no adverse
impact on the transfer rate.
The main thing to avoid is doing other sustained read/writes from the
drive at the same time.
It looks like you ended up doing the bulk of the compression on an
SSD, and obviously those don't care nearly as much about IOPS.
I've been playing around with lizardfs for bulk storage and found that
USB3 hard drives actually work very well, as long as you're mindful
about what physical ports are on what USB hosts and so on. A USB3
host can basically handle two hard drives with no loss of performance.
I'm not dealing with a ton of IO though so I can probably stack more
drives with pretty minimal impact unless there is a rebuild (in which
case the gigabit ethernet is probably still the larger bottleneck).
Even a Raspberry Pi 4 has two USB3 hosts, which means you could stack
4 hard drives on one and get basically the same performance as SATA.
When you couple that with the tendency of manufacturers to charge less
for USB3 drives than SATA drives of the same performance it just
becomes a much simpler solution than messing with HBAs and so on and
limiting yourself to hardware that can actually work with an HBA.
There are also backup tools which will handle the compression step for you.
app-backup/duplicity uses a similar tar-file-and-index system, with periodic fulls and then incremental chains. Plus it keeps a condensed list of file hashes from previous runs, so it doesn't have to re-read the entire archive
to determine what changed, the way rsync does.
app-backup/borgbackup is more complex, but is very, very good at deduplicating file data, which saves even more space. Furthermore, it can store backups for multiple systems and deduplicate between them, so if you have any other machines you can have backups there as well, potentially at negligible space cost if you have a lot of redundancy.
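(For a flavor of borg, a minimal sketch; the repository path and source directories are placeholders:)

$ borg init --encryption=repokey /mnt/backup/borg
$ borg create --stats /mnt/backup/borg::'{hostname}-{now}' /home /etc
$ borg list /mnt/backup/borg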
Thanks Laurence. I've looked at borg before, wondering whether I needed a more sophisticated tool than just tar, but it looked like too much work for little gain. I didn't know about duplicity, but I'm used to my weekly routine and it seems reliable, so I'll stick with it pro tem. I've been keeping a daily KMail archive since the bad old days, and five weekly backups of the whole system, together with 12 monthly backups and, recently, an annual backup. That last may be overkill, I dare say.
On Wed, Sep 29, 2021 at 4:27 AM Peter Humphrey <peter@prh.myzen.co.uk> wrote:
> Thanks Laurence. I've looked at borg before, wondering whether I needed a
> more sophisticated tool than just tar, but it looked like too much work
> for little gain.
> [snip]

I think Restic might be gaining some ground on duplicity. I use
duplicity and it is fine, so I haven't had much need to look at
anything else. Big advantages of duplicity over tar are:
1. It will do all the compression/encryption/etc stuff for you - all controlled via options.
2. It uses librsync, which means if one byte in the middle of a 10GB
file changes, you end up with a few bytes in your archive and not 10GB (pre-compression).
3. It has a ton of cloud/remote backends, so it is real easy to store
the data on AWS/Google/whatever. When operating this way it can keep
local copies of the metadata, and if for some reason those are lost it
can just pull the metadata back down from the cloud to resync without a huge
bill.
4. It can do all the backup rotation logic (fulls, incrementals,
retention, etc).
5. It can prefix files so that on something like AWS you can have the
big data archive files go to glacier (cheap to store, expensive to
restore), and the small metadata stays in a data class that is cheap
to access.
6. By default local metadata is kept unencrypted, and anything on the
cloud is encrypted. This means that you can just keep a public key in
your keyring for completely unattended backups, without fear of access
to the private key. Obviously if you need to restore your metadata
from the cloud you'll need the private key for that.
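(A sketch of points 1 and 4 in practice; the paths and GPG key ID are placeholders:)

$ duplicity full --encrypt-key 0x12345678 /home sftp://user@nas/backups
$ duplicity incremental --encrypt-key 0x12345678 /home sftp://user@nas/backups
$ duplicity remove-all-but-n-full 3 --force sftp://user@nas/backups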
If you like the more tar-like process another tool you might want to
look at is dar. It basically is a near-drop-in replacement for tar
but it stores indexes at the end of every file, which means that you
can view archive contents/etc or restore individual files without
scanning the whole archive. tar was really designed for tape where
random access is not possible.
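(A dar sketch; the archive basename and paths are placeholders:)

$ dar -c monthly -R /home -z              # writes monthly.1.dar, gzip-compressed
$ dar -l monthly                          # list contents via the trailing index
$ dar -x monthly -g Documents/notes.txt   # restore one file without a full scan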
Curious question here. As you may recall, I back up to an external hard
drive. Would it make sense to use that software for an external hard
drive? Right now, I'm just doing file updates with rsync and the drive is
encrypted. Thing is, I'm going to have to split into three drives soon.
So, compressing may help. Since it is video files, it may not help much,
but I'm not sure about that. Just curious.

Dale

:-)  :-)

If I understand correctly you're using rsync+tar and then keeping a set of
copies of various ages.
If you lose a single file that you want to restore and have to go hunting
for it, with tar you can only list the files in the archive by reading
through the entire thing, and only extract by reading from the beginning
until you stumble across the matching filename. So with large archives to
hunt through, that could take... a while...

dar is compatible with tar (pretty sure; I'd have to look again, but I
remember that being one of its main selling points) but adds an index at
the end of the file, allowing listing of the contents and jumping to
particular files without having to read the entire thing. It won't help
with your space shortage, but it will make searching and single-file
restores much faster.

Duplicity and similar tools have the indices, and additionally a
full+incremental scheme. So searching is reasonably quick, and restoring
likewise doesn't have to grovel over all the data. Restores can be slower
than tar or dar, though, because it has to restore first from the full and
then walk through however many incrementals are necessary to get the
version you want. This comes with a substantial space saving, though, as
each set of archive files after the full contains only the pieces that
changed.

Borg and similar tools break the files into variable-size chunks and store
each chunk indexed by its content hash. So each chunk gets stored exactly
once, regardless of how many times it may occur in the data set. Backups
then become simply lists of file attributes and the chunks they contain.
This results both in storing only changes between backup runs and in
deduplication of commonly-occurring data chunks across the entire backup.
The database-like structure also means that all backups can be listed,
restored, or deleted independently.
LMP
Since the drive also uses LVM, someone mentioned using snapshots. I'm
still not real clear on those, even though I've read a bit about them.
Some of the backup techniques are confusing to me. I get plain files, and
even incrementals to an extent, but some of the new stuff just muddies the
water. I really need to just build a file server, RAID or something. :/
An LVM snapshot creates a "copy on write" image. I'm just beginning to
dig into it myself, but I agree it's a bit confusing.
On Wed, Sep 29, 2021 at 5:48 PM Wols Lists <antlists@youngman.org.uk> wrote:
> An LVM snapshot creates a "copy on write" image. I'm just beginning to
> dig into it myself, but I agree it's a bit confusing.
So, snapshots in general are a solution for making backups atomic.
That is, they allow a backup to look as if the entire backup was taken
in an instant.
The simplest way to accomplish that is via offline backups. Unmount
the drive, mount it read-only, then perform a backup. That guarantees
that nothing changes between the time the backup starts/stops. Of
course, it also can mean that for many hours you can't really use the
drive.
Snapshots let you cheat. They create two views of the drive - one
that can be used normally, and one which is a moment-in-time snapshot
of what the drive looked like. You back up the snapshot, and you can
use the regular drive.
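(A sketch of that cheat with LVM; the VG/LV names and snapshot size are hypothetical:)

$ lvcreate --size 10G --snapshot --name data-snap /dev/vg0/data
$ mount -o ro /dev/vg0/data-snap /mnt/snap
$ tar cf /backup/data.tar -C /mnt/snap .
$ umount /mnt/snap
$ lvremove -y /dev/vg0/data-snap    # drop the snapshot once the backup is done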