+++
title = 'HowTo backup'
date = 2024-08-10
image = "https://imgs.xkcd.com/comics/backup_batteries.png"
+++

In short: follow the 3-2-1 backup strategy and have a tested disaster recovery plan.

## Backup strategy

You should have:
- 3 copies of your data
- on 2 different types of storage
- including 1 off-site copy

AND you must test your disaster recovery plan.

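
The 3-2-1 rule is easy to check mechanically. Here is a minimal sketch in Python, assuming you keep a small inventory of your copies by hand; the `Copy` class, its fields, and the storage type names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data. All field values here are illustrative."""
    name: str
    storage_type: str  # e.g. "internal-hdd", "external-hdd", "cloud"
    off_site: bool


def check_321(copies: list[Copy]) -> list[str]:
    """Return the violated 3-2-1 rules (an empty list means all good)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need 3 copies, have {len(copies)}")
    if len({c.storage_type for c in copies}) < 2:
        problems.append("need 2 different storage types")
    if not any(c.off_site for c in copies):
        problems.append("need 1 off-site copy")
    return problems


# Hypothetical inventory: original + local backup + cloud backup.
inventory = [
    Copy("original", "internal-hdd", off_site=False),
    Copy("local backup", "external-hdd", off_site=False),
    Copy("cloud backup", "cloud", off_site=True),
]
print(check_321(inventory))      # []
print(check_321(inventory[:2]))  # the copy count and off-site rules fail
```

It only checks what you tell it, so it does not replace actually testing a restore.
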
### Why so many copies?

What if you accidentally delete important files that you edit frequently? That's the reason to have snapshots.

What if the main drive holding your data dies? That's the reason to have a backup nearby.

You think you're smart and have RAID for all those cases? Did you know that in drive arrays, one drive's failure significantly increases the short-term risk of a second drive failing? That's the reason to have an off-site backup.

What if your main storage server dies with all the drives in it because of a power spike, a flood, etc.? So, do off-site backups.

![](https://imgs.xkcd.com/comics/backup_batteries.png)

### 3 copies of data

You should have:
- Original data
- 1 on-site backup (on another drive)
- 1 backup in another place (encrypted in the cloud, or on an HDD stored in a remote location such as a friend's house)

Backups should be made regularly: daily, or more frequently for critical data, depending on how "hot" the data is (how fast it changes).
My take on it: have a trusted source of data (RAID/Ceph) and use snapshots as the extra copy of the data to save some money on backup drives.

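
As a rough illustration of "regular" on-site backups, here is a small Python sketch: one function copies a folder into a timestamped directory on another drive, the other tells you whether the newest backup is too old. The function names, the folder layout, and the age limit are assumptions for the example; real backup tools do this far more efficiently.

```python
import shutil
from datetime import datetime
from pathlib import Path


def make_backup(source: Path, backup_root: Path, now: datetime) -> Path:
    """Copy `source` into a new timestamped folder under `backup_root`."""
    target = backup_root / now.strftime("%Y-%m-%d_%H%M%S")
    shutil.copytree(source, target)  # raises if the target already exists
    return target


def is_stale(last_backup: datetime, now: datetime, max_age_hours: float) -> bool:
    """True if the newest backup is older than the allowed age."""
    return (now - last_backup).total_seconds() > max_age_hours * 3600
```

For "hot" data you would call `is_stale` with a small `max_age_hours`; for rarely changing data a day or more may be fine.
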
### 2 types of storage

You need 2 different types of storage to mitigate the risk that a single fault affects all devices of one type.

Storage type examples:
- Internal HDD/SSD (we will focus on these)
- External HDD (these)
- USB drive/SSD
- Tape library
- Cloud storage (and these)

### 1 off-site copy

It's pretty simple:
- encrypted cloud backup
- encrypted HDD with a backup at a friend's house in another town (secured by bubble wrap)
- or at least an encrypted HDD in another house (also secured by bubble wrap)

The more distant the off-site backup, the better.

## Disaster recovery plan

People fall into 3 categories:
- those who don't do backups yet
- those who already do them
- and those who do them and have tested them

You should be in the 3rd category.

__So what is a disaster recovery plan?__

You must be prepared for the case where your main data and your on-site backup both die. You must simulate the following beforehand:
- accidental data removal (to test on-site snapshots)
- drive failure and replacement (to test the RAID/Ceph solution)
- main storage failure (to test restoring from the on-site backup)
- entire site unavailability (to test the off-site backup)

Ideally, you should write yourself a step-by-step guide on what to do in each of those situations.

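
One way to make a restore test less hand-wavy is to compare the restored files with the originals by checksum. Below is a minimal Python sketch of that idea; the function names are made up, and it reads each file fully into memory, so treat it as a starting point rather than a finished tool.

```python
import hashlib
from pathlib import Path


def tree_hashes(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Whole-file read keeps the sketch short; chunk it for big files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def verify_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Compare two trees and report missing, extra, and changed files."""
    a, b = tree_hashes(original), tree_hashes(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

If all three lists come back empty after a test restore, that scenario of your plan works, at least for the files you checked.
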
{{< spoiler Examples >}}

## Examples

### Enterprise-ish (Expensive at start, hard setup, easy to maintain)

Ceph cluster:
- requires at least 3 servers (OS: Proxmox)
  Ideally a server motherboard, ECC RAM, Intel Xeon E5 v4 CPU family or better / AMD Epyc equivalent
- any number of drives (but at least 3)
  Ideally enterprise-class (or with "RAID support"). The more IOPS, the better
- [automatic snapshots](https://github.com/draga79/cephfs-snp)
- 10Gb network (if you expect 9-ish or more HDDs in total, or some SSDs)
- Set up a Samba/WebDAV/Nextcloud server that shares this storage to your network
- and ideally an SSD cache (at least 2 SSDs with PLP, i.e. power-loss protection; 1TB each is more than enough for 10TB of raw storage)

Off-site backup:

Cloud storage + [Duplicati](https://github.com/duplicati/duplicati)
OR
Proxmox Backup Server in another city (e.g. at a friend's house) with RAID1/5/6
(though you should set it up so that malware or an attacker who gets root access can't overwrite the backups)

#### Pros
- Ideal if you already have a home server and want to expand
- Low chance of losing data because you essentially have 3 copies of the data (the default; minimum 2) + hourly/daily/weekly/monthly snapshots
  So if 2 drives die at the same time, you still won't lose your data
  Essentially, it covers 2 of your 3 copies of data
- If a drive fails, you simply take it out, put a new drive in, and add it to the pool via the web GUI
- With an SSD cache you can throw in any trashy HDDs and use them until they start to fail
- You can add any number of drives
- If you want to be able to freely shut down one of the servers and still access the data, you need to distribute the drives so that raw storage is even across the servers.
  Or just add a few more servers and distribute the drives between them so you can still access this storage
- If your house and servers are destroyed, you won't lose your data (thanks to the off-site backup)
- You can access your storage from any device on your network as if it were local to that device

#### Cons
- Expect about 30% usable space from raw storage (you can use erasure coding, a RAID5 analog, but it will be even slower)
- Bad/slow drives (in terms of IOPS and latency) without a PLP SSD cache can give amazingly bad overall speed
- Power usage might be a burden if you don't have any
- More performance comes with more drives, because speed = available IOPS and average access time of the 2-3 drives that hold that data. So the more drives we have, the more IOPS (excluding the SSD cache case)
- Ceph can be complicated to understand and maintain when failures happen

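
The "about 30%" figure comes from replication: with 3 copies of everything, usable space is at most one third of raw space, before any free-space headroom. Here is a back-of-the-envelope Python sketch; the function names are made up, and the erasure coding parameters `k` (data chunks) and `m` (parity chunks) are example values, not a recommendation.

```python
def replicated_usable(raw_tb: float, copies: int = 3) -> float:
    """Upper bound on usable space when every object is stored `copies` times."""
    return raw_tb / copies


def erasure_coded_usable(raw_tb: float, k: int, m: int) -> float:
    """Upper bound on usable space with k data chunks + m parity chunks."""
    return raw_tb * k / (k + m)


raw = 30.0  # hypothetical cluster: 30 TB of raw storage
print(replicated_usable(raw))             # 10.0
print(erasure_coded_usable(raw, k=2, m=1))  # 20.0
```

Real clusters need spare room for recovery after a drive dies, so plan for less than these upper bounds.
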
### Home-server (Medium cost, medium difficulty, hard to maintain)

CIFS/WebDAV/Nextcloud Share:
- get any PC, install Linux on it, set up a Samba/WebDAV/Nextcloud share
- X drives in RAIDZ (4+ drives of equal size; ideally RAIDZ2)
- ZFS automatic snapshots

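
Automatic snapshot tools usually handle retention for you, but the idea behind hourly/daily/weekly/monthly snapshots is simple: keep the newest snapshot in each of the most recent N hours, days, weeks, and months, and delete the rest. Here is a sketch of that logic in Python. It is not how any particular ZFS tool implements it, and the counts in `DEFAULT_TIERS` are made up.

```python
from datetime import datetime
from typing import Callable

# Each tier: (function mapping a timestamp to its bucket, buckets to keep).
# The counts are hypothetical and must be >= 1; tune them to your data.
Tier = tuple[Callable[[datetime], tuple], int]

DEFAULT_TIERS: list[Tier] = [
    (lambda t: (t.year, t.month, t.day, t.hour), 24),  # hourly
    (lambda t: (t.year, t.month, t.day), 7),           # daily
    (lambda t: tuple(t.isocalendar()[:2]), 4),         # weekly (ISO year, week)
    (lambda t: (t.year, t.month), 12),                 # monthly
]


def snapshots_to_keep(
    snapshots: list[datetime], tiers: list[Tier] = DEFAULT_TIERS
) -> set[datetime]:
    """Return the snapshots that at least one tier wants to keep."""
    keep: set[datetime] = set()
    for bucket_of, count in tiers:
        newest_in_bucket: dict[tuple, datetime] = {}
        for snap in sorted(snapshots):
            # Later snapshots overwrite earlier ones in the same bucket.
            newest_in_bucket[bucket_of(snap)] = snap
        # Buckets were inserted in chronological order; take the newest ones.
        keep.update(list(newest_in_bucket.values())[-count:])
    return keep
```

Everything not in the returned set is a candidate for deletion.
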
Off-site backup:
Cloud storage + [Duplicati](https://github.com/duplicati/duplicati)
OR
Regular (monthly/weekly) manual encrypted backup to an external HDD that is kept at a friend's place.

#### Pros
- It's relatively cheap
- You get the storage space of X-1 (or X-2) drives
- You can access your storage from any device on your network as if it were local to that device
- You can lose 1 drive (2 with RAIDZ2)

#### Cons
- If a drive fails, storage may be slow or inaccessible for some time while the array rebuilds onto the replacement drive.
- If 2 drives (3 with RAIDZ2) fail within a short period of time, you lose data
- Hard to upgrade storage, whether with bigger disks or with more disks
- Drives should be the same size

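
The "X-1 (or X-2) drives" rule and the "same size" requirement can be put into a few lines of Python. This is an approximation for a single RAIDZ group that ignores file system overhead; the function name and the drive sizes are made up for the example.

```python
def raidz_usable(drive_sizes_tb: list[float], parity: int) -> float:
    """Approximate usable space of one RAIDZ group.

    parity is 1 for RAIDZ and 2 for RAIDZ2. Every drive only counts
    as much as the smallest one, which is why equal sizes matter.
    """
    if len(drive_sizes_tb) <= parity:
        raise ValueError("need more drives than parity")
    return (len(drive_sizes_tb) - parity) * min(drive_sizes_tb)


print(raidz_usable([4, 4, 4, 4], parity=1))  # 12
print(raidz_usable([4, 4, 4, 4], parity=2))  # 8
print(raidz_usable([4, 4, 4, 8], parity=1))  # 12, the extra 4 TB is wasted
```

The `parity` value is also the number of drives you can lose without losing data.
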
### Home PC (low cost, low difficulty, easy to maintain)

We will just put 2 (or more) drives in RAID1 in your PC.
Ideally, buy different drives with same-ish specs so they die at different times. And use a file system with snapshot support.

Off-site backup:
Cloud storage + [Duplicati](https://github.com/duplicati/duplicati)
OR
Regular (monthly/weekly) manual encrypted backup to an external HDD that is kept at a friend's place.

#### Pros
- It's cheap
- The setup is easy to understand
- You lose data only if all of the drives die

#### Cons
- 50% usable space from raw storage
- Potentially no snapshots if the file system doesn't support them

### Laptop (High cost, easy setup, easy to maintain)

This time we will do the opposite:
- a laptop with cloud storage synchronized between the laptop and the cloud (so files are stored on both)
- ideally, a file system with snapshot support

Off-site backup:
Regular (monthly/weekly) manual encrypted backup to an external HDD that is kept at a friend's place.

#### Pros
- It's cheap at first, but costly in the long run
- It's easy to set up, and cloud providers give support (not the best, but nevertheless)
- It's much easier to maintain since you don't have to deal with hardware

#### Cons
- It's the most privacy-unfriendly setup because you will have unencrypted data in the cloud, unless you find a way to sync only encrypted data to the cloud
- Cloud subscriptions are costly in the long run
- To have a backup, you must be connected to the internet
- You may be affected by problems on the cloud provider's side

### Laptop+PC (Low cost, easy setup, may be hard to maintain)

We will use the hardware already available and its space: laptop + PC + an off-site (friend's) PC for encrypted backups.
The trick is that we will use [Syncthing](https://github.com/syncthing/syncthing), an amazing tool that provides P2P storage sync.

#### Pros
- P2P, no other servers involved!
- We can specify where data is stored encrypted and where it is freely accessible
- as easy to set up as a cloud provider

#### Cons
- If a file is edited in 2 places before a sync, you get a version conflict
- Another problem is storage space: it's easy to set up, but it may be hard to maintain if the data drives have different amounts of free space.

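
Version conflicts are easy to miss, so it helps to look for them from time to time. The Python sketch below assumes Syncthing's documented behavior of keeping the losing version as a copy with `.sync-conflict-<date>-<time>-<modifiedBy>` in its file name; check this against the version you run. The function name is made up.

```python
from pathlib import Path


def find_sync_conflicts(folder: Path) -> list[Path]:
    """List files under `folder` whose names carry a sync-conflict marker."""
    return sorted(
        p for p in folder.rglob("*.sync-conflict-*") if p.is_file()
    )
```

Run it on your synced folder and merge or delete whatever it finds by hand.
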
{{< /spoiler >}}

{{< source >}}
https://raidz-calculator.com/raidz-types-reference.aspx
https://www.techtarget.com/searchdatabackup/definition/3-2-1-Backup-Strategy
https://en.wikipedia.org/wiki/Hard_disk_drive
{{< /source >}}