new posts

casual 2024-08-10 23:12:34 +03:00
parent 6f63ade83e
commit 27bf55ef45
4 changed files with 213 additions and 78 deletions

View File

@ -1,10 +1,9 @@
+++
title = 'HowTo backup'
date = 2024-08-10
hidden = true
+++
In short: 3-2-1 backup strategy.<!--more-->
In short: 3-2-1 backup strategy + Disaster recovery plan.<!--more-->
@ -15,11 +14,17 @@ You should have:
- on 2 different types of storages
- including 1 off-site copy
AND you must test your disaster recovery plan.
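The 3-2-1 rule can be checked mechanically against an inventory of your copies. The sketch below is only an illustration: the `Copy` record, its fields and the sample inventory are hypothetical names made up for this example, not part of any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical record for this example)."""
    name: str      # free-form label
    media: str     # storage type, e.g. "ssd", "hdd", "cloud"
    offsite: bool  # True if stored away from the main site


def check_321(copies):
    """Return the list of unmet 3-2-1 requirements (empty list means OK)."""
    problems = []
    if len(copies) < 3:
        problems.append("need at least 3 copies")
    if len({c.media for c in copies}) < 2:
        problems.append("need at least 2 different storage types")
    if not any(c.offsite for c in copies):
        problems.append("need at least 1 off-site copy")
    return problems


# Assumed sample inventory: main drive, local backup, encrypted cloud backup.
copies = [
    Copy("main drive", "ssd", offsite=False),
    Copy("local backup drive", "hdd", offsite=False),
    Copy("encrypted cloud backup", "cloud", offsite=True),
]
print(check_321(copies) or "3-2-1 satisfied")
```

Note that this only checks that the copies exist on paper; it says nothing about whether any of them can actually be restored.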
### Why so many copies?
What if you accidentally delete important files that you edit frequently? That's the reason to have snapshots.
What if your main data drive dies? That's the reason to have a backup nearby.
What if your main storage server dies with all the drives in it due to a power spike (flood, etc.)? That's the reason to have an off-site backup.
You think you're smart and have RAID for all those cases? Did you know that in drive arrays, one drive's failure significantly increases the short-term risk of a second drive failing? That's the reason to have an off-site backup.
What if your main storage server dies with all the drives in it due to a power spike (flood, etc.)? So, do off-site backups.
![](https://imgs.xkcd.com/comics/backup_batteries.png)
@ -30,7 +35,8 @@ You should have:
- 1 backup at place (another drive)
- 1 backup in another place (encrypted in cloud, HDD stored in another remote location (friend's house))
Backups should be made regularly (daily or more often for critical data, depending on how "hot" the data is, i.e. how fast it changes)
Backups should be made regularly (daily or more often for critical data, depending on how "hot" the data is, i.e. how fast it changes).
My take: keep a trusted source of data (RAID/Ceph) and use snapshots as one of the copies to save some money on backup drives.
### 2 types of storages
@ -52,6 +58,28 @@ It's pretty simple:
The more distant this off-site backup, the better.
## Disaster recovery plan
People fall into 3 categories:
- those who don't do backups yet
- those who already do them
- and those who do them and have tested them
You should be in the 3rd category.
__So what is a disaster recovery plan?__
You must be prepared for your main data and in-site backup both dying. Simulate the following ahead of time:
- accidental data removal (to test in-site snapshots)
- drive failure and replacement (to test the RAID/Ceph solution)
- main storage failure (to test restoring from the in-site backup)
- entire site unavailability (to test the off-site backup)
Ideally, write yourself a step-by-step guide for what to do in each of those situations.
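One step of such a drill can be automated: after restoring a backup into a scratch directory, compare it with the live data file by file. This is a minimal sketch under assumptions, not a full verification tool; the function names are made up for this example, and it reads each file fully into memory, so very large files would need chunked hashing.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_restore(original, restored):
    """Return (missing, changed) file lists for a restore test.

    missing: files present in `original` but absent from `restored`
    changed: files present in both whose content differs
    """
    src = tree_digests(original)
    dst = tree_digests(restored)
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, changed
```

Usage would look like `missing, changed = compare_restore("/data", "/tmp/restore-test")`, where both paths are placeholders. Files edited since the backup was taken will show up as changed, so run the comparison against a snapshot or a quiet dataset.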
{{< spoiler Examples >}}
&nbsp;
@ -69,6 +97,7 @@ Ceph cluster:
Ideally enterprise-class (or with "RAID support"). The more IOPS, the better
- [automatic snapshots](https://github.com/draga79/cephfs-snp)
- 10Gb network (if you expect a total of 9-ish (or more) HDDs or some SSDs)
- Set up a Samba/WebDAV/Nextcloud server that will share this storage on your network
- and ideally an SSD cache (at least 2 SSDs with PLP) (1TB each is more than enough for 10TB of raw storage)
Off-site backup:
@ -79,6 +108,7 @@ Proxmox Backup Server at another city (e.g. at friend's house) with RAID1/5/6
(though you should set it up so that malware or an attacker who gets root access can't overwrite the backups)
#### Pros
- Ideal if you already have a home server and want to expand
- Low chance of losing data because you essentially have 3 copies (by default, 2 min) of data + hourly/daily/weekly/monthly snapshots
So if 2 drives die at the same time, you still won't lose your data
Essentially it covers 2 copies of data
@ -88,50 +118,108 @@ Proxmox Backup Server at another city (e.g. at friend's house) with RAID1/5/6
- And if you need/want to be able to freely shut down one of the servers and still access the data, you need to distribute the drives so that raw storage is even across servers.
Or just add a few more servers and distribute the drives between them so that you can still access this storage
- If your house and servers get destroyed, you won't lose your data
- You can access your storage from any device in your network as if it were on that device
#### Cons
- Expect 30% usable space from raw storage (you can use Erasure Coding (RAID5 analog) but it will be slow as hell)
- Expect 30% usable space from raw storage (you can use Erasure Coding (RAID5 analog) but it will be even slower)
- Bad/slow drives (in terms of IOPS and latency) without a PLP SSD cache can have amazingly bad total speed
- Power usage might be a burden if you don't have any
- More performance comes with more drives, because speed = available IOPS and average access time of the 2-3 drives that hold the data. So the more drives, the more IOPS we have (excluding the SSD cache case)
- Ceph can be complicated to understand and maintain in case of failures
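The "30% usable space" figure follows from replication: with 3 copies of every object, usable capacity is roughly raw capacity divided by 3. Erasure coding with `k` data chunks and `m` coding chunks keeps `k / (k + m)` of the raw space instead. The sketch below is back-of-the-envelope arithmetic only; it ignores Ceph's own overhead and free-space headroom, and the function names are made up for this example.

```python
def replicated_usable_tb(raw_tb, replicas=3):
    """Usable space when every object is stored `replicas` times."""
    return raw_tb / replicas


def erasure_coded_usable_tb(raw_tb, k, m):
    """Usable space with k data chunks and m coding chunks per object."""
    return raw_tb * k / (k + m)


raw = 10  # TB of raw storage, matching the "10TB of raw storage" example above
print(f"3 replicas: {replicated_usable_tb(raw):.1f} TB usable")        # 3.3 TB
print(f"EC k=2 m=1: {erasure_coded_usable_tb(raw, 2, 1):.1f} TB usable")  # 6.7 TB
```

The `k=2, m=1` profile is picked here only because it tolerates one lost chunk, like RAID5; it is an assumed example, not a recommendation.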
### Home-server (Medium setup, hard to maintain)
### Home-server (Medium cost, medium difficulty, hard to maintain)
CIFS/WebDAV/Nextcloud Share:
- get any PC, install Linux on it, set up a Samba/WebDAV/Nextcloud share
- X drives in RAIDZ (4+ equally sized drives) (ideally RAIDZ2)
- ZFS automatic Snapshots
Off-site backup:
Cloud storage + [Duplicati](https://github.com/duplicati/duplicati)
OR
Regular (monthly/weekly) manual encrypted backup to an external HDD that you give to a friend.
#### Pros
-
-
-
- It's relatively cheap
- You get the storage space of X-1 (or X-2) drives
- You can access your storage from any device in your network as if it were on that device
- You can lose 1 drive (2 with RAIDZ2)
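The "X-1 (or X-2) drives" rule can be written down as a small calculation. This is a rough sketch that assumes equally sized drives and ignores ZFS metadata and padding overhead, so real pools come out somewhat smaller; the function name is made up for this example.

```python
def raidz_usable_tb(drive_count, drive_tb, parity=1):
    """Approximate usable space of one RAIDZ vdev.

    parity=1 is RAIDZ (survives 1 failed drive),
    parity=2 is RAIDZ2 (survives 2 failed drives).
    """
    if drive_count <= parity:
        raise ValueError("need more drives than parity level")
    return (drive_count - parity) * drive_tb


print(raidz_usable_tb(4, 4))            # 4 x 4TB in RAIDZ  -> 12
print(raidz_usable_tb(4, 4, parity=2))  # 4 x 4TB in RAIDZ2 -> 8
```

With 4 drives, RAIDZ2 gives the same 50% usable space as mirroring, but any 2 drives may fail rather than one per mirror pair.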
#### Cons
-
-
-
### Home PC (Medium setup, hard to maintain)
- If a drive fails, expect the storage to be inaccessible for some time after you swap in a new drive for the failed one.
- If 2 (or 3 with RAIDZ2) drives fail within a short period of time, you lose data
- Hard to upgrade storage by using bigger disks, then more disks
- Drives should have same size
### Home PC (low cost, low difficulty, easy to maintain)
We will just put 2 (or more) drives in RAID1 in your PC.
Ideally, buy different drives with same-ish specs so they die at different times. And use a file system with snapshot support.
Off-site backup:
Cloud storage + [Duplicati](https://github.com/duplicati/duplicati)
OR
Regular (monthly/weekly) manual encrypted backup to an external HDD that you give to a friend.
#### Pros
-
-
-
- It's cheap
- The setup is easy to understand
#### Cons
-
-
-
### Portable Laptop (easy setup, hard to maintain)
- 50% usable space from raw storage
- Potentially no snapshots if the file system doesn't support them
- All of the drives have to die for you to lose data
### Laptop (High cost, easy setup, easy to maintain)
This time we will do the opposite:
- laptop with cloud storage kept in sync (so files are stored both on the laptop and in the cloud)
- ideally a file system with snapshot support
Off-site backup:
Regular (monthly/weekly) manual encrypted backup to an external HDD that you give to a friend.
#### Pros
-
-
-
- It's cheap at first, but costly in the long run
- It's easy to set up, and cloud providers give support (not the best, but nevertheless)
- It's much easier to maintain since you don't have to deal with hardware
#### Cons
-
-
-
- It's the most privacy-unfriendly setup because you will have unencrypted data in the cloud, unless you find a way to sync only encrypted data to the cloud
- Cloud subscriptions are costly in the long run
- To have a backup, you need to be connected to the internet
- You may be affected by the cloud provider's problems
### Laptop+PC (Low cost, easy setup, may be hard to maintain)
We will use the available hardware and its space: laptop + PC + off-site (friend's) PC for encrypted backups.
The trick is that we will use [syncthing](https://github.com/syncthing/syncthing), an amazing tool that allows P2P storage sync.
#### Pros
- P2P, no other servers involved!
- We can specify where data is stored encrypted and where it is freely accessible
- As easy to set up as a cloud provider
#### Cons
- If a file is edited in 2 places before a sync, you get a version conflict
- Another problem is storage space: it's easy to set up, but it may be hard to maintain if the data drives have different amounts of free space.
{{< /spoiler >}}
{{< source >}}
https://raidz-calculator.com/raidz-types-reference.aspx
https://www.techtarget.com/searchdatabackup/definition/3-2-1-Backup-Strategy
https://en.wikipedia.org/wiki/Hard_disk_drive
{{< /source >}}

View File

@ -1,38 +0,0 @@
+++
title = 'HowTo Buy HDD'
date = 2024-08-17
hidden = true
+++
##
https://www.extremetech.com/computing/170748-how-long-do-hard-drives-actually-live-for
Consumer HDDs have the following survival chances:
- 92% survival chance for 1.5y (due to manufacturing errors) (5.1% per year)
- 90% survival for 3 years (due to random failure) (1.41% per year)
- -12% every following year (due to wear-out)
For enterprise HDDs survival is a bit higher
https://www.reddit.com/r/DataHoarder/comments/k4rc7a/comment/gedlp75/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
Seller can have 97% positive review while scamming every 30th customer
4tb
Price - 4600-3100p (cost of new / lifetime (3 years) * runtime as % of lifetime) (/2, /3)
Run time - 13000-17000h (1.5-2 years)
2tb
Price 3300-2200p
(01,C7, 07,) No CRC errors, read errors, reassigned sectors (?)
1tb
Price - 1000p
{{< source >}}
https://www.reddit.com/r/DataHoarder/comments/1eg0kpf/brand_preference/
https://www.reddit.com/r/DataHoarder/comments/k4rc7a/comment/gedlp75/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
https://www.extremetech.com/computing/170748-how-long-do-hard-drives-actually-live-for
{{< /source >}}

View File

@ -0,0 +1,90 @@
+++
title = 'HowTo Buy HDD'
date = 2024-08-17
+++
We will talk about buying new drives and used ones<!--more-->
But first:
## Statistics
HDDs have the following survival chances (when running roughly 24/7):
- 92% survival chance for the first 1.5y of operation (losses due to manufacturing errors) (5.1% per year)
- 90% survival for 3 years (losses due to random failure) (1.41% per year)
- -12% every following year (due to wear-out)
For enterprise HDDs the survival rate may be higher.
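These three phases can be turned into a rough survival curve. The sketch below only connects the quoted points (100% at purchase, 92% at 1.5 years, 90% at 3 years, then minus 12 percentage points per year); the straight-line interpolation inside each phase is an assumption of this example, not something the statistics claim.

```python
def survival(years):
    """Approximate share of drives still alive after `years` of 24/7 use."""
    if years < 0:
        raise ValueError("years must be non-negative")
    if years <= 1.5:
        # early failures: 100% -> 92% over the first 1.5 years
        return 1.0 - 0.08 * (years / 1.5)
    if years <= 3.0:
        # random failures: 92% -> 90% between 1.5 and 3 years
        return 0.92 - 0.02 * ((years - 1.5) / 1.5)
    # wear-out: minus 12 percentage points per year after year 3
    return max(0.0, 0.90 - 0.12 * (years - 3.0))


for y in (1.5, 3, 4, 5):
    print(f"after {y} years: {survival(y):.0%}")
# after 1.5 years: 92%
# after 3 years: 90%
# after 4 years: 78%
# after 5 years: 66%
```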
![](https://i.extremetech.com/imagery/content-types/017c7K9UIE7N2VnHK8XqLds/images-5.jpg)
(For at least _some_ Seagate drives, the survival rate is 80% after 2 years, and presumably -10% every following year)
Today there are 3 main vendors that actually produce HDDs: WD, Seagate, Toshiba. Other vendors probably just relabel drives from those 3.
## New
All vendors have good and bad years, batches, etc. So try to pick different drive models or drives from different production years. Beyond that, we will pick solely on price, performance and warranty.
### Size
Before that, you should pick a size. If you are not running a data center, your maximum drive capacity should be 4TB (it still depends on the use case, but that's the general recommendation).
It will cost more, but it's better to pick 2 smaller drives than 1 bigger one to drastically improve data survival chances.
### Performance
That's the most complicated part because it depends heavily on your needs. You can primarily look at RPM.
"Hot" data, which changes frequently or is accessed at random places (databases, RAID5, Ceph, many-user use cases), needs faster drives: 7200RPM+, the more IOPS, the better.
Drives with "RAID support" usually have better IOPS.
"Cold" data, which is rarely written and read, will be OK on slow drives: 5400RPM.
### Warranty / Survival time
That's an important factor when choosing a drive. Usually it's 3y, but it may be 5y. During that time you can expect to get the drive replaced if it dies. Vendors are not stupid: they end the warranty right before the chances of dying go up. So the warranty may indicate how long the drive presumably has its highest survival chances. Be advised that in some countries those warranties may not work, or the vendor may not operate in that country.
Also, it seems that recent helium-filled drives (without the "DO NOT COVER HOLE" label) last somewhat longer.
If it's hard to replace a drive in your use case (e.g. RAID, operation-critical server...), pick an enterprise drive with a longer warranty. (And __maybe__ not Seagate)
### Price
Pick cheapest.
## Used
Here comes the gray area of trust. That's because SMART data can be reset.
Thus your best bets are:
- __drives with 1.5y-2y (13000-17000h) power-on time__ (because of the statistics above)
- __No CRC errors, read errors, reassigned sectors__ in SMART data
- if the drive is statistically expected to survive to its 3y birthday (hereinafter 'lifetime'), then
`price = cost of new drive * (1 - current power-on time / lifetime)`
`(min = new drive/3, max = new drive/2)`
So let's say the cheapest new drive costs $100: the min price would be $33, the max $50.
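One reading of the pricing rule that reproduces the stated bounds (half the new price at 1.5 years, a third at 2 years) is to pay for the share of the 3-year lifetime that is still left. The sketch below assumes that reading and assumes power-on hours map to age at 24/7 operation; the function and constant names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, assuming the drive ran 24/7


def used_drive_price(new_price, power_on_hours, lifetime_years=3.0):
    """Fair price of a used drive: new price times the remaining lifetime share."""
    lifetime_hours = lifetime_years * HOURS_PER_YEAR
    remaining_share = max(0.0, 1.0 - power_on_hours / lifetime_hours)
    return new_price * remaining_share


# A $100 drive at both ends of the recommended 13000-17000h window:
print(f"{used_drive_price(100, 13000):.2f}")  # 50.53
print(f"{used_drive_price(100, 17000):.2f}")  # 35.31
```

Both results land between the $33 and $50 bounds, since 13000h and 17000h are slightly inside the 1.5y-2y range.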
Optional:
- recently produced (<5y)
- WD (that's my personal choice; I think they tend to have better quality control, so fewer drives die due to manufacturing errors)
- Enterprise class (although failure rates are "very much similar", they tend to live in a worse operating environment, e.g. a server room, so ["don't scream at the drives, they'll get discouraged"](https://www.youtube.com/watch?v=tDacjrSCeq4&pp=ygUKaGRkIHNjcmVhbQ%3D%3D))
### Sellers
As you understand, there is no warranty (as SMART doesn't indicate the reliability of a drive) and the drive may fail a few weeks later.
So ask the seller to pack the drive nicely.
A seller can have 97% positive reviews while scamming every 30th customer, so don't look at the positive reviews; look only at the negatives.
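The arithmetic behind that figure is short. The sketch assumes, for illustration only, that every scammed customer leaves a negative review and that nobody else does.

```python
def positive_share(scam_every_nth):
    """Share of positive reviews if exactly every n-th customer is scammed."""
    return 1 - 1 / scam_every_nth


print(f"{positive_share(30):.1%}")  # 96.7%
```

So a rating that looks near-perfect is still compatible with one bad deal in every 30.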
![](https://www.explainxkcd.com/wiki/images/4/4c/a-minus-minus.png)
{{< source >}}
https://www.reddit.com/r/DataHoarder/comments/1eg0kpf/brand_preference/
https://www.reddit.com/r/DataHoarder/comments/k4rc7a/comment/gedlp75/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
https://www.extremetech.com/computing/170748-how-long-do-hard-drives-actually-live-for
https://www.backblaze.com/blog/hard-drive-life-expectancy/
https://www.backblaze.com/blog/helium-filled-hard-drive-failure-rates/
https://darwinsdata.com/are-helium-hard-drives-worth-it/
https://en.wikipedia.org/wiki/Hard_disk_drive
{{< /source >}}

View File

@ -1,10 +1,9 @@
+++
title = 'HowTo Data Hoard'
date = 2024-08-31
hidden = true
+++
![](https://preview.redd.it/this-meme-speaks-to-me-v0-j9dc4klgmw0a1.png?width=640&crop=smart&auto=webp&s=91e23f46de5cbc09861302fcc5b4d00e8192c193)
## Who is a data hoarder?
@ -13,6 +12,8 @@ hidden = true
Data hoarders archive large amounts of digital data (terabytes) that might otherwise be lost, such as old video games, videos and websites.<!--more-->
![](https://preview.redd.it/this-meme-speaks-to-me-v0-j9dc4klgmw0a1.png?width=640&crop=smart&auto=webp&s=91e23f46de5cbc09861302fcc5b4d00e8192c193)
### Why do they do it?
{{< spoiler "Spoiler" >}}
@ -27,26 +28,20 @@ Usually you start becoming data hoarder when something that you expected to be o
## HowTo Data Hoard
1. [Buy terabytes of drives](/tech/HowTo_buy_hdd)
2. Access terabytes of data
- put drives in your PC
1. [Buy a lot of drives](/tech/howto_buy_hdd), raw 10TB would be a good start
- put drives in your PC (not a bad idea)
- build/buy NAS
3. Use [3-2-1 backup](/tech/HowTo_backup) strategy for important data
- build Ceph cluster if you bald enough
3. Use [3-2-1 backup](/tech/howto_backup) strategy for important data
4. Download everything that you've ever needed in life and never delete
- [HowTo download site?](/tech/howto_download_site)
- [HowTo download youtube videos?](/tech/howto_download_youtube_video)
{{< spoiler Spoiler >}}
I forgot to remove it from template
{{< /spoiler >}}
- be a good boy and don't violate any local copyright law
{{< source >}}
Random YouTube videos
My Experience
{{< /source >}}
https://en.wikipedia.org/wiki/Digital_hoarding
https://www.pewresearch.org/data-labs/2024/05/17/when-online-content-disappears/
https://www.reddit.com/r/DataHoarder/
https://www.reddit.com/r/DataHoarder/comments/yzb5m0/this_meme_speaks_to_me/
{{< /source >}}