Can you please share your backup strategies for Linux? I'm curious to know what tools you use and why. How do you automate/schedule backups? Which files/folders do you back up? What is your preferred hardware/cloud storage, and how do you manage storage space?
I use Borg Backup, automated with a bash script that Borg provides. A cron job runs the script at the desired frequency. I keep backups on different computers; ideally, I would recommend one copy in the cloud and one copy on a local machine. Borg compresses and encrypts its backups.
Edit: I migrated a server once using the backups from this system and it worked great.
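For anyone wondering what a cron-driven Borg script roughly looks like, here is a minimal sketch in the spirit of the automation example in Borg's documentation. It is not the poster's actual script: the repository path, passphrase file, source directories and retention numbers are all placeholders.

```shell
#!/bin/bash
# Minimal Borg backup sketch. Paths and retention values are placeholder
# assumptions; adjust them to your own repository.

run_borg_backup() {
    local repo="$1"; shift      # e.g. /mnt/backup/repo or ssh://user@host/./repo
    export BORG_REPO="$repo"

    # Create a compressed archive named after the host and the current time.
    borg create --stats --compression zstd --exclude-caches \
        "::{hostname}-{now:%Y-%m-%dT%H:%M:%S}" "$@" || return 1

    # Thin out old archives so the repository does not grow without bound.
    borg prune --list --glob-archives '{hostname}-*' \
        --keep-daily 7 --keep-weekly 4 --keep-monthly 6 || return 1

    # Reclaim the space freed by prune (needed on Borg 1.2 and later).
    borg compact
}

# Example call, commented out so that sourcing this file has no side effects:
# BORG_PASSCOMMAND="cat /root/.borg-passphrase" run_borg_backup /mnt/backup/repo /etc /home
```

A crontab entry such as `0 * * * * /usr/local/bin/borg-backup.sh` (hypothetical path) would then run it hourly.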
Borg Backup is the gold standard, with Vorta as a very nice GUI on machines that need it. Otherwise, all my other Linux machines run in Proxmox hypervisors and get regular container/snapshot/VM backups through Proxmox Backup Server to another machine. All the backup data is then regularly replicated to a remote location via TrueNAS SCALE replication tasks.
Borg via Vorta handles the hard parts: encryption, compression, deduplication, and archiving. You can mount backup snapshots like drives, without needing to expand them. It splits archives into small chunks so you can easily upload them to your cloud service of choice.
Adding my "Me too" to Vorta/Borg. I use it with Borgbase, which I like because it's legitimately cheap and they support Borg development. As well, you can set Borg backups with Borgbase to "append only," which prevents ransomware or other unexpected "whoopsies" from wiping out your backup history.
I back up most of my computer every hour, but have pruning rules that make sure things don't get too out of hand. I have a second backup that backs everything up to my NAS (using Vorta, again). This is helpful for things like my downloads folder, virtual machines, or Steam library - things I wouldn't want to back up over the network, but on occasion I do find myself going "whoops, I wanted that."
I also have Vorta working on my Mom's Macbook, then have Borgbase send me an email when there isn't any activity for longer than a couple of days. Once I got automatic pruning working right I never had to touch this again.
I plug in an external drive every so often and drag and drop parts of my home dir into it like it's 1997. I'm not running a data center here. The boomer method is good enough and I don't do anything important enough to warrant going all out with professional snapshot based backup solutions and stuff. And I only save personal documents, media, and custom config files. Everything else is replaceable.
I use rsync to incrementally back up / to a separate drive, as well as a drive on another device (my server), which then packs, compresses and encrypts the latest backup of all devices daily, and uploads them to Hetzner as well as GDrive.
I was talking with a techhead from the 80s about what he did when his tape drives failed and the folly that is keeping data alive on a system that doesn't need to be. His foolproof backup storage is as follows.
At Christmas buy a new hard drive. If Moore's law allows, it should be double the capacity of what you currently have.
Put your current backup hard drive into a SATA drive slot. Copy the backup over to the new hard drive.
Write the date this was done on the hard drive with a Sharpie. The new hard drive is your current backup.
Place the now old backup into your drawer and forget about it.
On New Years Day, load each of the drives into a SATA drive slot and fix any filesystem issues.
Shout out to all the homies with nothing, I'm still waiting to buy a larger disk in hopes of rescuing as much data from a failing 3TB disk as I can. I got some read errors and unplugged it about 3 months ago.
Dump configs to backup drive. Pray to the machine spirit that things don't blow up. Only update when I remember. I'm a terrible admin for my own stuff.
Scuse the cut and paste, but this is something I recently thought quite hard about and blogged, so stealing my own content:
What to back up? This is a core question to ask when you start planning. I think it’s quite simply answered by asking the secondary question: “Can I get the data again?” Don’t back up stuff you downloaded from the public internet unless it’s particularly rare. No TV, no Movies, no software installers. Don’t hoard data you can replace.
Do back up stuff you’ve personally created and that doesn’t exist elsewhere, or stuff that would cause you a lot of effort or upset if it wasn’t available. Letters you’ve written, pictures you’ve taken, code you authored, configurations and systems that took you a lot of time to set up and fine tune.
If you want to be able to restore a full system, that’s something else and generally dealt best with imaging – I’m talking about individual file backups here!
Backup Scenario: Multiple household computers. Home Linux servers. Many services running natively and in Docker. A couple of Windows computers.
Daily backups: Once a day, automate backups of your important files.
On my Linux machines, that’s directories like /etc, /root, /docker-data, and some shared files.
On my Windows machines, that’s some mapping data, Word documents, pictures, geocaching files, generated backups and so on.
You work out the files and get an idea of how much space you need to set aside.
Then, with automated methods, have these files copied or zipped up to a common directory on an always-available server. Let’s call that /backup.
These should be versioned, so that older ones get expired automatically. You can do that with bash scripts, or automated backup software (I use backup-manager for local machines, and backuppc or robocopy for windows ones)
How many copies you keep depends on your preferences – 3 is a sound number, but choose what you want and what disk space you have. More than 1 is a good idea since you may not notice the next day if something is missing or broken.
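The "versioned, with automatic expiry" idea above can be sketched in a few lines of bash. This is only an illustration, not what backup-manager does internally; the function name, archive naming scheme and the keep count are made up for the example.

```shell
#!/bin/bash
# Sketch: write a dated tar archive into a common directory and keep only
# the newest N archives with the same name prefix.

make_versioned_backup() {
    local backup_root="$1" name="$2" keep="$3"; shift 3
    local stamp archive
    stamp=$(date +%Y-%m-%d_%H%M%S)
    archive="$backup_root/$name-$stamp.tar.gz"

    mkdir -p "$backup_root" || return 1
    tar -czf "$archive" "$@" || return 1

    # The timestamp sorts lexicographically, so a reverse sort lists the
    # newest archive first; everything after the first $keep gets deleted.
    printf '%s\n' "$backup_root/$name"-*.tar.gz | sort -r |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done

    echo "$archive"
}

# Example call (commented out): keep three versions of /etc in /backup.
# make_versioned_backup /backup etc 3 /etc
```

Three copies matches the number suggested above; pass a different count if you have the disk space.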
Monthly Backups – Make them Offline if possible
I puzzled a long time over the best way to do offline backups. For years I would manually copy the contents of /backup to large HDDs once a month. That took an hour or two for a few terabytes.
Now, I attach an external USB hard drive to my server, with a smart power socket controlled by Home Assistant.
This means it’s “cold storage”. The computer can’t access it unless the switch is turned on – something no ransomware knows about. But I can write a script that turns on the power, waits a minute for it to spin up, then mounts the drive and copies the data. When it’s finished, it’ll then unmount the drive and turn off the switch, and lastly, email me to say “Oi, change the drives, human”.
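A rough sketch of that power-on, mount, copy, unmount, power-off cycle is below. It assumes the smart socket is exposed as a switch entity in Home Assistant and is toggled through the Home Assistant REST API; the URL, token, entity id, disk label and mount point are all hypothetical, and this is not the script from the blog.

```shell
#!/bin/bash
# Sketch of a cold-storage backup run. Every value below is a placeholder.

HA_URL="${HA_URL:-http://homeassistant.local:8123}"
HA_TOKEN="${HA_TOKEN:-changeme}"                 # long-lived access token
SWITCH_ENTITY="${SWITCH_ENTITY:-switch.backup_drive}"
DRIVE="${DRIVE:-/dev/disk/by-label/OFFLINE}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/offline}"
SPINUP_SECONDS="${SPINUP_SECONDS:-60}"

# Call the switch service ("turn_on" or "turn_off") for the smart socket.
ha_switch() {
    curl -fsS -X POST \
        -H "Authorization: Bearer $HA_TOKEN" \
        -H "Content-Type: application/json" \
        -d "{\"entity_id\": \"$SWITCH_ENTITY\"}" \
        "$HA_URL/api/services/switch/$1" > /dev/null
}

offline_backup() {
    local source_dir="$1" status=0
    ha_switch turn_on || return 1
    sleep "$SPINUP_SECONDS"                      # give the disk time to spin up
    if mount "$DRIVE" "$MOUNT_POINT"; then
        rsync -a "$source_dir"/ "$MOUNT_POINT"/ || status=1
        sync
        # If the drive will not unmount, leave it powered rather than cut it off.
        umount "$MOUNT_POINT" || return 1
    else
        status=1
    fi
    ha_switch turn_off || status=1
    return "$status"
}

# Example call (commented out):
# offline_backup /backup && echo "Oi, change the drives, human" | mail -s "Offline backup done" you@example.com
```

The `mail` call in the example assumes a working local mail setup, as in the reminder described above.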
Once I get that email, I open my safe (fireproof and in a different physical building) and take out the oldest of three usb Caddies. Swap that with the one on the server and put that away. Classic Grandfather/Father/Son backups.
Once a year, I change the oldest of those caddies to “Annual backup, 2024” and buy a new one. That way no monthly drive will be older than three years, and I have a (probably still viable) backup by year.
BTW – I use USB3 HDD caddies (and do test for speed – they vary hugely) because I keep a fair bit of data. But you can also use one of the large capacity USB Thumbdrives or MicroSD cards for this. It doesn’t really matter how slowly it writes, since you’ll be asleep when it’s backing up. But you do really want it to be reasonably fast to read data from, and also large enough for your data – the above system gets considerably less simple if you need multiple disks.
Error Check: Of course with automated systems, you need additional automated systems to ensure they’re working! When you complete a backup, touch a file to give you a timestamp of when it was done – online and offline. I find using “tree” to catalogue the files is worthwhile too, so you know what’s on there.
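As a small illustration of the timestamp-and-catalogue idea, something like the following would do. The file names LAST_BACKUP and CATALOGUE.txt are arbitrary choices for this sketch, and it falls back to find when tree is not installed.

```shell
#!/bin/bash
# Sketch: stamp a finished backup, then check later that the stamp is recent.

mark_backup_done() {
    local backup_dir="$1"
    # Catalogue the contents so you know what is on the drive.
    if command -v tree > /dev/null 2>&1; then
        tree -a "$backup_dir" > "$backup_dir/CATALOGUE.txt"
    else
        find "$backup_dir" -print > "$backup_dir/CATALOGUE.txt"
    fi
    # The marker's modification time records when the backup finished.
    touch "$backup_dir/LAST_BACKUP"
}

# Succeeds only if the marker exists and is younger than max_age_minutes.
backup_is_fresh() {
    local backup_dir="$1" max_age_minutes="$2"
    [ -n "$(find "$backup_dir/LAST_BACKUP" -mmin "-$max_age_minutes" 2> /dev/null)" ]
}

# Example check (commented out): complain if the daily backup is over 26 hours old.
# backup_is_fresh /backup 1560 || echo "Backup is stale!"
```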
Lastly – test your backups. Once or twice a year, pick a backup at random and ensure you can copy and unpack the files. Ensure they are what you expect and free from errors.
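One way to make that test less manual, sketched under the assumption that a checksum manifest is written into the source directory before it is archived. The manifest name SHA256SUMS and both function names are inventions of this example.

```shell
#!/bin/bash
# Sketch: unpack an archive into a scratch directory and verify every file
# against a checksum manifest stored inside the archive.

# At backup time: record checksums of every file under source_dir.
write_manifest() {
    local source_dir="$1"
    ( cd "$source_dir" &&
        find . -type f ! -name SHA256SUMS -print0 | sort -z |
        xargs -0 -r sha256sum > SHA256SUMS )
}

# At test time: extract the archive and check each file against the manifest.
verify_archive() {
    local archive="$1" scratch status
    scratch=$(mktemp -d) || return 1
    tar -xzf "$archive" -C "$scratch" &&
        ( cd "$scratch" && sha256sum --quiet -c SHA256SUMS )
    status=$?
    rm -rf -- "$scratch"
    return "$status"
}

# Example (commented out):
# write_manifest /srv/documents
# tar -czf /backup/documents.tar.gz -C /srv/documents .
# verify_archive /backup/documents.tar.gz && echo "Backup unpacks and matches."
```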
I try to be good about everything being installed in packages, even if I'm the one that made the package. That means I only have to worry about backing up my local package archive. But I've never actually recreated a personal system from a backup, and usually end up starting from a fresh install, slowly adding back things from the backup if I missed them. This tends to cut down on cruft and no-longer-needed hacks and fixes. It also makes for a good way to be exposed to new paradigms (desktop environments, shells, etc.)
Something that helps is daily notes: one file for any day I'm working on my system and want to remember what a custom file, config edit, or downloaded/created package does and why. These get saved separately and I try to remember to grep them before asking the internet.
I see the benefit of snapshots, but disk space is expensive, and I'm (usually) careful (enough) not to lock myself out or prevent boots. Anything catastrophic I have to fix is usually seen as a fun, stressful learning experience! That rarely happens anymore, for better or for worse.
I'm using rustic, a lock-free drop-in replacement for restic written in Rust. Like restic, it supports always-encrypted, deduplicated, compressed and easy backups without you needing to worry about whether to do a full or an incremental backup.
All my machines run hourly backups of all mounted partitions to an append-only repo at Borgbase. I have a file with ignore pattern globs to skip unwanted files and dirs (e.g. **/.cache).
While I think Borgbase is OK, they're just using Hetzner storage boxes in the background, which are cheaper if you use them directly. I'm thinking of migrating my backups to a handful of homelabs run by trusted friends and family instead.
The backups have a randomized delay of 5m and typically take about 8-9s each (unless big new files need to be uploaded). They are triggered by persistent systemd-timers.
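For readers who have not used timers for this, here is a sketch that generates a matching pair of unit files: an hourly schedule, a randomized delay of up to 5 minutes, and Persistent=true so a run missed while the machine was off is caught up after boot. The unit name "backup" and the script path are assumptions, not the poster's actual units.

```shell
#!/bin/bash
# Sketch: write a oneshot service and an hourly persistent timer for it.

write_backup_units() {
    local unit_dir="$1" backup_cmd="$2"
    mkdir -p "$unit_dir" || return 1

    cat > "$unit_dir/backup.service" <<EOF
[Unit]
Description=Run the backup job

[Service]
Type=oneshot
ExecStart=$backup_cmd
EOF

    cat > "$unit_dir/backup.timer" <<EOF
[Unit]
Description=Hourly backup

[Timer]
OnCalendar=hourly
RandomizedDelaySec=5m
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Example (commented out): install system-wide, then enable the timer.
# write_backup_units /etc/systemd/system /usr/local/bin/run-backup.sh
# systemctl daemon-reload && systemctl enable --now backup.timer
```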
The backups have been running across my laptop, pc and server for about 6 months now and I'm at ~380 GiB storage usage total.
I've mounted backup snapshots on multiple occasions already to either get an old version of a file, or restore it entirely.
There is a tool called redu which is like ncdu but works on restic/rustic repos. This makes it easy to identify which files blow up your backup size.
Synology NAS. I really love that thing. I use their synology drive software to backup the Linux home folder, as well as windows PCs, iPads, iPhones etc. I use their photos mobile software to automatically backup phone photos and videos. I also synchronize a few select folders between PCs so certain in-use files are always up to date. I set the NAS to keep 30 old versions of every file. This works great for my college kids - dad has a copy of everything in case they nuke a paper or something (which has happened).
I stopped cloning drives long ago. Now I just reinstall the os and packages. With Linux, this is honestly faster than deploying a backup - a single pacman command installs everything I want. Then I just log into things as I open them. Ya I might have to futz around with some settings or redownload some big games on steam - but the eye candy and games can wait - I can be productive pretty quickly after an install.
I DO use btrfs with automatic snapshots (snapper and btrfs assistant). This saves me from myself when I bork an update (which I’ve done more than once). If I make a mistake, I just rollback a snapshot, and try again without my stupid mistakes. This has saved my install 3 or 4 times now.
Lastly, I sneaker net an external hard drive to my office. On it is a manual backup of the NAS. I do this once per month. This protects from catastrophic failures like my house burning down. I might lose a month or so of pictures in the worst case scenario, but I still have my 25+ years of pictures of my kids, wedding videos, etc.
In the end, the only thing that really matters is not losing my lifetime of family pictures and the good memories they provoke.
Synology NAS here also, divided into private (family stuff, docker volumes etc) and public (Linux ISOs and anything that can be redownloaded). Both get backed up weekly to an older NAS with Hyper Backup. Private additionally goes onto a LUKS encrypted drive monthly which is spot-checked, taken offsite, and the previous offsite drive brought back. I don't back up any PC (don't care, just reinstall) or phones (they are backed up on iCloud).
Example of a Bash script that performs the following tasks:
Checks the availability of an important web server.
Checks disk space usage.
Makes a backup of the specified directories.
Sends a report to the administrator's email.
Example script:
#!/bin/bash
# Settings
WEB_SERVER="https://example.com"
BACKUP_DIR="/backup"
TARGET_DIRS=(/var/www /etc)
DISK_USAGE_THRESHOLD=90
ADMIN_EMAIL="admin@example.com"
DATE=$(date +"%Y-%m-%d")
BACKUP_FILE="$BACKUP_DIR/backup-$DATE.tar.gz"

# Checking web server availability
# Ask curl for the numeric status code; HTTP/2 status lines do not contain "200 OK".
echo "Checking web server availability..."
HTTP_STATUS=$(curl -s -o /dev/null --head -w '%{http_code}' "$WEB_SERVER")
if [ "$HTTP_STATUS" = "200" ]; then
    echo "Web server is available."
else
    echo "Warning: Web server is unavailable!" | mail -s "Problem with web server" "$ADMIN_EMAIL"
fi

# Checking disk space
# df -P keeps each filesystem on one line; the second line is the root filesystem.
echo "Checking disk space..."
DISK_USAGE=$(df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
if [ "$DISK_USAGE" -gt "$DISK_USAGE_THRESHOLD" ]; then
    echo "Warning: Disk space usage exceeded $DISK_USAGE_THRESHOLD%!" | mail -s "Problem with disk space" "$ADMIN_EMAIL"
else
    echo "There is enough disk space."
fi

# Creating backup
echo "Creating backup..."
if tar -czf "$BACKUP_FILE" "${TARGET_DIRS[@]}"; then
    echo "Backup created successfully: $BACKUP_FILE"
else
    echo "Error creating backup!" | mail -s "Error creating backup" "$ADMIN_EMAIL"
fi

# Sending report
# printf handles the newlines portably, unlike echo -e on an unquoted variable.
echo "Sending report to $ADMIN_EMAIL..."
{
    printf 'Report for %s\n\n' "$DATE"
    printf 'Web server status: HTTP %s\n' "$HTTP_STATUS"
    printf 'Disk space usage: %s%%\n' "$DISK_USAGE"
    printf 'Backup location: %s\n' "$BACKUP_FILE"
} | mail -s "Daily system report" "$ADMIN_EMAIL"
echo "Done."
Description:
Check web server: Uses curl command to check if the site is available.
Check disk space: Use df and awk to check disk usage. If the threshold (90%) is exceeded, a notification is sent.
Create a backup: The tar command archives and compresses the directories specified in the TARGET_DIRS variable.
Send a report: A report on all operations is sent to the administrator's email using mail.
How to use:
Set the desired parameters, such as the web server address, directories for backup, disk usage threshold and email.
I use OneDrive. I know people will hate but it’s cheap and works on everything (well, it takes a third party tool on Linux). If I care about it it goes in OneDrive, otherwise I don’t need it that much.
May I ask why you prefer that over Google Drive, or others such as Dropbox or Mega? I used it extensively when I used Windows, but that's been several years.
Here's one that probably nobody else here is doing. The backup goes on my mobile device. Yes, the thing in my pocket.
Mount it over SSHFS on the local network
Unlock a LUKS container in the form of a 30GB sparse file on the device
rsync the files across
Lock, unmount
The backup is incremental but the container file never changes size, no matter what's in it. Your data is in two places and always under your physical control. But the key is never stored on the remote device, so you could also do this with a VPS.
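The steps above could be sketched like this. Host name, paths and the mapper name are hypothetical, the whole thing is assumed to run as root, and cryptsetup asks for the passphrase on the local machine, so the key never touches the phone.

```shell
#!/bin/bash
# Sketch of the mount / unlock / rsync / lock / unmount cycle.

REMOTE="${REMOTE:-user@phone.local:/storage/backup}"
SSHFS_MOUNT="${SSHFS_MOUNT:-/mnt/phone}"
CONTAINER="${CONTAINER:-$SSHFS_MOUNT/vault.img}"
MAPPER_NAME="${MAPPER_NAME:-phonevault}"
VAULT_MOUNT="${VAULT_MOUNT:-/mnt/vault}"

# One-time setup, run by hand. A sparse file has a fixed apparent size but
# only occupies real space once blocks are written:
#   truncate -s 30G "$CONTAINER"
#   cryptsetup luksFormat "$CONTAINER"
#   cryptsetup open "$CONTAINER" "$MAPPER_NAME" && mkfs.ext4 "/dev/mapper/$MAPPER_NAME"

phone_backup() {
    local source_dir="$1" status=0
    sshfs "$REMOTE" "$SSHFS_MOUNT" || return 1
    if cryptsetup open "$CONTAINER" "$MAPPER_NAME"; then
        if mount "/dev/mapper/$MAPPER_NAME" "$VAULT_MOUNT"; then
            rsync -a --delete "$source_dir"/ "$VAULT_MOUNT"/ || status=1
            umount "$VAULT_MOUNT" || status=1
        else
            status=1
        fi
        cryptsetup close "$MAPPER_NAME" || status=1   # lock the container again
    else
        status=1
    fi
    fusermount -u "$SSHFS_MOUNT" || status=1
    return "$status"
}

# Example call (commented out):
# phone_backup /home/me/Documents
```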
Dot files on GitHub, an HDD for storing photos, downloads, documents as well as my not-in-use games. I also sync KeePass files across all network devices.
My desktop, laptop and homelab all sync my important stuff over Syncthing. They all keep btrfs snapshots going three months back in case an oopsie propagates.
The homelab additionally fetches deduplicated snapshots of my VPS weekly, before syncing all of the above to an encrypted hetzner storage for those burning-down-the-house events.
I use immutable nixos installs. Everything to redeploy my OS is tracked in git including most app configurations. The one exception are some GUI apps I'd have to do manually on reinstall.
I have a persistence volume for things like:
Rollbacks
Personal files
Git repos
Logs
Caches / Games
I have 30 days (or last 5 minimum) of system rollbacks using BTRFS volumes.
The personal files are backed up hourly to a local server, which then backs up nightly to Backblaze B2 using rclone, in an encrypted volume using my private keys. The local server has a mishmash of drives in a mirrored LVM setup. While it works well for having mixed drives, I'll warn that I haven't had a drive failure yet, so I'm not sure how difficult replacing a drive is.
My phone uses the same flow with RoundSync (rclone + GUI).
Git repos are backed up in git.
Logs aren't backed up. I just persist them for debugging and don't want them lost after every reboot.
Caches/Games are persisted but not backed up. Nixos uses symlinks and BTRFS to be immutable. That paradigm doesn't work well for this case. The one exception is a couple game folders are part of my personal files. WoW plugin folder, EvE online layouts, etc.
I used to use Dropbox (with rclone to encrypt). It was $20/mo for 2 TB. Per terabyte that is cheaper on paper, but I don't back up nearly that much. Backblaze started at $1/mo for what I use. I'm now up to $2/mo. It will be a few years before I need to clean up my backups for cost reasons.
The local server is a PC in a case with 8 drive bays plus some NVMe drives for fast storage. It has a couple of older drives, and for the last couple of years I have typically bought a pair of drives on sale (Black Friday, Prime Day, etc). I have a little over 30TB mirrored, so slightly over 60TB in total. NVMe is not counted in that. One NVMe is for the system, the others are a caching layer (Monero node) or temp storage (transcoding, as it is also my media server).
I like the case, but if I were to do it again, I'd probably get a rack mountable case.
You seem pretty organized in your strategy, I would suggest you just pull a drive in your LVM to check how that goes for you. I've had issues in JBOD style LVM volumes with drive swaps, but YMMV.
Frankly, I use ZFS now in anything that I would have used LVM in before. The feature set is way more robust. Also, an offsite ZFS replication to zfs.rent is a good backup of a backup. But Backblaze is pretty solid too.
Good call on a simulated failure. When I first set it up, it was LVM/BTRFS or ZFS as my top choices. It was a coin toss at the time because I hadn't built this sort of setup before.
When I researched this previously I concluded that there are two very good options for regular backups: Borg and Restic. These are especially efficient at backing up a diff of what has changed since the last backup. So you get snapshots of your filesystem state at each backup point without using a huge amount of space. You can mount any snapshot as a virtual directory. After the initial backup, incremental backups take a minute or two.
I use Borg, and I back up to cloud storage on Borgbase. I use Vorta as a GUI for Borg. I have Vorta start automatically when I start my window manager, and I have it set up for daily backups. I set up the same thing on my kid's computer.
I back up my home directory. I have some excluded directories like ~/.cache, and Steam's data directory. I use Baobab to find large directories that I don't want backed up.
I use the "exclude caches" option in the Borg "create archive" settings. That automatically excludes Rust target/ directories because they follow the Cache Directory Tagging Specification. Not all programming languages' tooling follows that spec so I also use directory name pattern excludes. For example I have an exclude pattern for .*/node_modules/.*
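To make the tagging spec a bit more concrete: a directory counts as a cache if it contains a file named CACHEDIR.TAG whose first 43 bytes are a fixed signature line. The sketch below tags a directory and lists tagged directories under a root, which is handy for seeing what a cache-aware backup tool would skip. The function names are made up for this example.

```shell
#!/bin/bash
# Sketch: create and detect CACHEDIR.TAG files as described by the
# Cache Directory Tagging Specification.

CACHEDIR_SIGNATURE='Signature: 8a477f597d28d172789f06886806bc55'

# Mark a directory as a cache.
tag_cache_dir() {
    printf '%s\n# This directory is tagged as a cache and may be skipped by backup tools.\n' \
        "$CACHEDIR_SIGNATURE" > "$1/CACHEDIR.TAG"
}

# Print every directory under root whose CACHEDIR.TAG starts with the signature.
list_cache_dirs() {
    local root="$1" tag
    find "$root" -type f -name CACHEDIR.TAG -print | sort |
        while IFS= read -r tag; do
            if [ "$(head -c 43 "$tag")" = "$CACHEDIR_SIGNATURE" ]; then
                dirname "$tag"
            fi
        done
}

# Example (commented out): see which directories in your home would be skipped.
# list_cache_dirs "$HOME"
```

On the Borg command line, the corresponding switch is `--exclude-caches` on `borg create`.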
I use NixOS, and I keep my system config in a git repo so I don't need backups for anything outside my home directory.
Currently I use Borg Backup with Vorta as a GUI. I don't really do anything automated/scheduled, I just back it up manually to an external SSD every few days or so. I pretty much do my whole /home folder, except for a couple of subfolders that aren't really necessary (and Videos, which I back up separately.)
I do eventually want to upgrade to a NAS, but I'm waiting until we move to start setting that up. Also I don't really have an off-site plan yet which I know is bad, but I need to figure that out.
Nightly rsync to two NAS boxes in the house (TrueNAS Scale and a Synology). Docs go in NextCloud, hosted on a VM in my basement, which is also backed up to the Synology by Proxmox. Also backing up my main machine (Pop!_OS) and my wife’s laptop (ThinkPad E595, also Pop!_OS) using Spideroak One.
Dotfiles are in git (using stow to recreate), and my documents folder syncs to Nextcloud (self-hosted) and from there to my laptop. This is of course not a "backup" per se, more "multiple copies", but it gets the job done and also fits my workflow.
To be happy with that I want to set up an offsite backup of data from my server to a NAS in my parents place but right now that's just a to-do I haven't put any work in yet ;)
Not only because third world issues, but because I like adrenaline, I don't have any backup strategy but an old external HDD where I haven't copied stuff since 2018.
When I could afford a new PC and tried to rsync my data from my old crappy laptop, much of it was lost.
That being said, I had a backup strategy back in the day, which was burning CDs. I used to have a second HDD (an IDE one), but they were so freaking bad that all of them went bad after a year or so, so I have 3 or 4 of them stored without any chance of recovering their data.
After having recently restored some stuff from an aging external hdd, i'm seriously considering getting a few dvdr discs and burning the important things every now and then.
I know they don't last forever either, but - just as a random example that has definitely never happened to me hahaha - you can drop them from a height of 3 feet and still get files off them!
I use Duplicity to back up my home directory, excluding the Steam and Downloads folders. It is set up to back up weekly to my NAS, mounted over NFS.
The NAS has a weekly cron task to upload the backups to pCloud using rclone.
I back up several computers this way (2 desktops, 2 laptops, and the NAS as well). The files included in this strategy are essentially my photos, documents and configs. My software installations, games and media library are not backed up.
You have loads of options but you need to also start from ... "what if". Work out how important your data really is. Take another look and ask the kids and others if they give a toss. You might find that no one cares about your photo collection in which case if your phone dies ... who cares? If you do care then sync them to a PC or laptop.
Firstly, for my dotfiles, I use home-manager. I keep the config on my git server and in theory I can pull it down and set up a system the way I like it.
In terms of backups, I use Pika to backup my home directory to my hard disk every day, so I can, in theory, pull back files I delete.
I also push a core selection of my files to my server using Pika, just in case my house burns down. Likewise, I pull backups from my server to my desktop (again with Pika) in case Linode starts messing me about.
I also have a 2TiB ssd I keep in a strongbox and some cloud storage which I push bigger things to sporadically.
I also take occasional data exports from online services I use. Because hey, Google or Discord can ban you at any time for no reason. :P
I use syncthing to sync almost everything across my computer, laptop (occasional usage), server (RAID1), old laptop (powered up once every month or so), and a few other devices (that only get a small subset of my data, though). On the computer, laptop, and server, I have btrfs snapshots (snapper). Overall, this works very well, I always have 4+ copies of my data in 2+ geographical locations.
The important stuff is in cloud storage using Cryptomator (I'm hoping that rclone will make sync simple). I should probably set up Timeshift in case things do go wrong.
Timeshift for the system works perfectly: if you screw up the system (a bad update, for instance), just start it and you'll be back up and running in less than ten minutes. Simple cron backups for data, documents etc., just in case you delete a folder, document, image etc. Both of these go to a second internal HD.
I leverage btrfs or ZFS snapshots. I take rolling system level snapshots on a schedule (daily, weekly, monthly and separately before any package upgrades or installs) and user data snapshots every couple of hours. Then I use btrbk to sync those snapshots to an external drive at least once a week. When I have all of my networking gear and home services setup I also sync all of this to storage on my NAS. Any hosts on the network keep rolling snapshots stored on the NAS as well.
Important data also gets shoveled into a B2 bucket and/or Google drive if I need to be able to access it from a phone.
I keep snapshots small by splitting data up into well defined subvolumes, anything that can be reacquired from the cloud (downloads, package caches, steam libraries, movies, music, etc) isn't included in the backup strategy. If I download something and it's hard to find or important I move it out of downloads and into a location that is covered by my backups.
All of my servers make local dumps of their databases and config files to directories owned by unprivileged users. This includes file paths, permissions, and ownerships (so I know how to put them back).
My primary research server at home uses rsync to pull copies of those local backups from my servers.
My primary research server uses Restic to make a daily incremental backup to Backblaze's B2 service.
For my home server, I use Restic and a cronjob to weekly take snapshots of all my services. It then gets synced to a Backblaze B2 bucket (at $6/TB/mo). It's pretty neat, only saving the difference between the previous and current snapshot, removes older snapshots, and encrypts everything.
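A minimal sketch of what such a cron-driven restic run could look like. It assumes the repository and credentials come from the environment (RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE, and for a B2 repository B2_ACCOUNT_ID and B2_ACCOUNT_KEY) and that restic writes straight to the bucket; the include-list path and the retention numbers are placeholders, not the poster's settings.

```shell
#!/bin/bash
# Sketch: back up the paths in an include list, expire old snapshots,
# then verify the repository structure.

restic_backup_and_prune() {
    local include_list="$1"

    # Only changed data is uploaded; everything is encrypted client-side.
    restic backup --files-from "$include_list" || return 1

    # Drop snapshots outside the retention policy and free their space.
    restic forget --keep-weekly 8 --keep-monthly 12 --prune || return 1

    restic check
}

# Example crontab line (hypothetical wrapper script), Sundays at 03:00:
# 0 3 * * 0 /usr/local/bin/restic-weekly.sh
```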
restic -> Wasabi, automated with shell script and cron. Uses an include list to tell it what paths to back up.
Script has Pushover credentials to send me backup alerts. Parses restic log to tell me how much was backed up, removed, success/failure of backup, and current repo size.
To be added: a periodic restore of a random file to have its hash compared to the current version of the file (will happen right after backup, unlikely to have changed in my workload), which will be subsequently deleted, and alert sent letting me know how the restore test went.
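The planned restore test might look something like this. It is only a sketch: it assumes the file was backed up under its absolute path, that the restic repository and password are already set in the environment, and the function names are invented here.

```shell
#!/bin/bash
# Sketch: restore one file from the latest snapshot into a scratch
# directory, compare its SHA-256 with the live copy, then clean up.

# Pick one regular file at random from a directory tree.
pick_random_file() {
    find "$1" -type f | shuf -n 1
}

restore_check() {
    local file="$1" scratch status=1      # file must be an absolute path
    scratch=$(mktemp -d) || return 1
    if restic restore latest --target "$scratch" --include "$file"; then
        # restic recreates the original absolute path below the target directory.
        if [ -f "$scratch$file" ] &&
           [ "$(sha256sum < "$scratch$file")" = "$(sha256sum < "$file")" ]; then
            status=0
        fi
    fi
    rm -rf -- "$scratch"                  # delete the restored copy again
    return "$status"
}

# Example (commented out): test a random file right after a backup run.
# f=$(pick_random_file /home/me/Documents)
# restore_check "$f" && echo "Restore test OK: $f" || echo "Restore test FAILED: $f"
```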
The only thing I use as a backup is a Live CD that's mounted to a USB thumb drive.
I used to use Timeshift but the one time I needed it, it didn't work for some reason. It also had a problem of making my PC temporarily unusable while it was making a backup, so I didn't enable it when I had to reinstall Linux Mint.
Same, Timeshift let me down the one time I needed it. I still use it though, and I'm afraid to upgrade Mint because I don't want to set up my system again if the upgrade fails to keep my configuration and Timeshift fails to take me back.
I sync important files to s3 from a folder with awscli. Dot files and projects are in a private git repos. That's it.
If I maintained a server, I would do something more sophisticated, but installation is so dead simple these days that I could get a daily driver in working order very quickly.
For system files/configuration on my machines, timeshift set to run once a week.
For family photos and shared files, I built a pair of SFTP servers made from old HP thin-client PCs at two different geographic locations which automatically sync to each other once a day via cron job using vsftpd and lftp. Each one has both an NVMe and SATA SSD which run in a software RAID 1 configuration.
For any other files, a second local server also using vsftpd and two SSDs in USB enclosures. I manually back them up using rsync on an irregular basis.
I built a backup server out of my old desktop, running Ubuntu and ZFS
I have a dataset for each of my computers and I back them up to the corresponding datasets in the ZFS pool on the server semi-regularly. The ZFS pool has enough disks for some redundancy, so I can handle occasional drive failures. My other computers run arbitrary filesystems (ext4, btrfs, rarely NTFS).
The only problem with my current setup is that if there is file degradation on my workstation that I don't notice, it might get backed up to the server by mistake. Then a degraded file might overwrite a non-degraded backup. To avoid this, I generally don't overwrite files when I back up. Since 90% of my data is pictures, it's not a big deal, since they don't change.
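One way to get that "never overwrite" behavior is rsync's --ignore-existing flag, which skips any file that already exists on the receiving side. Whether the poster uses rsync for this is not stated, so treat this as one possible sketch; the function name and paths are made up.

```shell
#!/bin/bash
# Sketch: copy new files to the backup but never replace a file that is
# already there, so a silently degraded source cannot clobber a good copy.

copy_without_overwrite() {
    local source_dir="$1" dest_dir="$2"
    rsync -a --ignore-existing "$source_dir"/ "$dest_dir"/
}

# Example call (commented out):
# copy_without_overwrite /home/me/Pictures /tank/workstation/Pictures
```

The trade-off is that legitimate edits to existing files are not picked up either, which is fine for data like photos that does not change.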
Someday I'd like to set up Proxmox and virtualize everything, and I'd also like to set up something offsite I could zfs-send to as a second backup.
Internal RAID1 as first line of defense. Rsync to external drives where at least one is always offsite as second. Rclone to cloud storage for my most important data as the third.
Backups 2 and 3 are manual but I have reminders set and do it about once a month. I don't accrue much new data that I can't easily replace so that's fine for me.