Sanitized mirror from private repository - 2026-03-25 08:51:57 UTC
docs/infrastructure/backup-strategy.md
# Backup Strategy

Last updated: 2026-03-21

## Overview

The homelab follows a **3-2-1+ backup strategy**: 3 copies of data, 2 different storage types, 1 offsite location, plus cloud backup to Backblaze B2.

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 BACKUP FLOW                                 │
│                                                                             │
│  Atlantis (Primary) ──── Hyper Backup (weekly) ──── Calypso (Local copy)    │
│       │                                                                     │
│       ├── Syncthing (real-time) ──── Setillo (Tucson, offsite)              │
│       │                                                                     │
│       └── Hyper Backup S3 (weekly) ──── Backblaze B2 (cloud)                │
│                                            │                                │
│  Calypso ──── Hyper Backup S3 (daily) ─────┘                                │
│                                                                             │
│  Guava ──── Restic (daily 3AM) ──── Backblaze B2 (vk-guava, encrypted)      │
│  Jellyfish ──── No backup (risk)                                            │
└─────────────────────────────────────────────────────────────────────────────┘
```

## Backup Tasks

### Atlantis → Backblaze B2 (Cloud)

| Setting | Value |
|---------|-------|
| **Task name** | Backblaze b2 |
| **Schedule** | Weekly, Sundays 00:00 |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-atlantis` |
| **Encrypted** | Yes (client-side) |
| **Versioned** | Yes (Smart Recycle) |
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |

**What's backed up:**
- `/archive` — long-term cold storage
- `/documents/msi_uqiyoe` — PC sync documents
- `/documents/pc_sync_documents` — PC sync documents
- `/downloads` — download staging
- `/photo` — Synology Photos library
- `/homes/vish/Photos` — user photo library
- Apps: SynologyPhotos, SynologyDrive, FileStation, HyperBackup, SynoFinder

**What's NOT backed up to cloud:**
- `/volume1/media` (~60 TB) — too large for cloud backup, replicated to Setillo instead
- `/volume1/docker` — container data (stateless, can be redeployed from git)

### Calypso → Backblaze B2 (Cloud)

| Setting | Value |
|---------|-------|
| **Task name** | Backblaze S3 |
| **Schedule** | Daily, 00:00 |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-concord-1` |
| **Encrypted** | Yes (client-side) |
| **Versioned** | Yes (Smart Recycle) |

**What's backed up:**
- `/docker/authentik` — SSO provider data (critical)
- `/docker/gitea` — Git hosting data (critical)
- `/docker/headscale` — VPN control plane (critical)
- `/docker/immich` — Photo management DB
- `/docker/nginx-proxy-manager` — old NPM config (historical)
- `/docker/paperlessngx` — Document management DB
- `/docker/retro_site` — Personal website
- `/docker/seafile` — File storage data
- `/data/media/misc` — miscellaneous media
- `/data/media/music` — music library
- `/data/media/photos` — photo library
- Apps: Gitea, MariaDB10, CloudSync, Authentik, Immich, Paperless, HyperBackup

### Atlantis → Calypso (Local Copy)

| Setting | Value |
|---------|-------|
| **Method** | Hyper Backup |
| **Schedule** | Weekly |
| **Destination** | Calypso `/volume1/backups/` |
| **What** | Media, photos, documents |
| **Encrypted** | Yes |

### Atlantis/Calypso → Setillo (Offsite)

| Setting | Value |
|---------|-------|
| **Method** | Syncthing (real-time replication) |
| **Destination** | Setillo `/volume1/syncthing/` (Tucson, AZ) |
| **Distance** | ~1,000 miles from primary site |
| **What** | Docker configs, critical data |

### Setillo → Backblaze B2 (Cloud)

| Setting | Value |
|---------|-------|
| **Task name** | Backblaze B2 |
| **Schedule** | Scheduled |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-setillo` |
| **Encrypted** | No (client-side data encryption disabled; encrypted in transit only) |
| **Versioned** | Yes (Smart Recycle) |
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |

**What's backed up:**
- `/backups` — backup destination
- `/homes/Setillo/Documents` — Edgar's documents
- `/homes/vish` — vish home directory
- `/PlexMediaServer/2015_2016_crista_green_iphone_5c` — legacy phone photos
- `/PlexMediaServer/other` — other media
- `/PlexMediaServer/photos` — photos
- Apps: DownloadStation, FileStation, HyperBackup, SurveillanceStation, SynoFinder, WebDAVServer

### Guava (TrueNAS) → Backblaze B2 (Cloud)

| Setting | Value |
|---------|-------|
| **Tool** | Restic + Rclone |
| **Schedule** | Daily, 03:00 (TrueNAS cron job ID 1) |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-guava` |
| **Repo path** | `vk-guava/restic` |
| **Encrypted** | Yes (AES-256, restic client-side encryption) |
| **Password file** | `/root/.restic-password` (chmod 600) |
| **Rclone config** | `/root/.config/rclone/rclone.conf` |
| **Retention** | `--keep-daily 7 --keep-weekly 4 --keep-monthly 3 --prune` |

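The table lists the password file with mode 600. A wrapper script could confirm that before it calls restic. The following is a minimal sketch, not the job actually configured on Guava: the function name `check_password_file` is hypothetical, and the code assumes GNU `stat` (as on a Linux-based TrueNAS SCALE install).

```shell
# Preflight sketch: refuse to continue if the restic password file is
# missing or has a mode other than 600.
# Assumption: GNU stat (`stat -c`); BSD stat uses different flags.

check_password_file() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$file")
    if [ "$mode" != "600" ]; then
        echo "insecure mode $mode on $file (expected 600)" >&2
        return 1
    fi
    return 0
}
```

A job could start with `check_password_file /root/.restic-password || exit 1` so that a misconfigured file stops the run before any data is read.
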
**What's backed up:**
- `/mnt/data/photos` (158 GB) — photo library (critical)
- `/mnt/data/cocalc` (323 MB) — CoCalc notebooks and data
- `/mnt/data/medical` (14 MB) — medical records (critical)
- `/mnt/data/website` (58 MB) — website data
- `/mnt/data/openproject` (13 MB) — project management DB
- `/mnt/data/fasten` (5 MB) — health data

**What's NOT backed up:**
- `/mnt/data/guava_turquoise` (3 TB) — large dataset, not yet assessed
- `/mnt/data/jellyfin` (203 GB) — media metadata, re-downloadable
- `/mnt/data/llama` (64 GB) — LLM models, re-downloadable
- `/mnt/data/iso` (556 MB) — ISOs, re-downloadable

**Backup command (manual run):**
```bash
# Back up the six critical datasets to the encrypted restic repository on B2
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  backup /mnt/data/photos /mnt/data/cocalc /mnt/data/medical \
  /mnt/data/website /mnt/data/openproject /mnt/data/fasten
```

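The manual command and the retention flags from the settings table could be combined into a single nightly job. This is a sketch under assumptions rather than the script behind cron job ID 1: the function names and the `RESTIC` override variable are hypothetical, while the repository, password file, paths, and flags are the ones listed on this page.

```shell
# Sketch of a nightly job: back up, then apply the retention policy.
# RESTIC can be overridden (for example RESTIC=echo) to preview the
# commands without touching the repository.
RESTIC="${RESTIC:-restic}"
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"

guava_backup() {
    "$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" \
        backup /mnt/data/photos /mnt/data/cocalc /mnt/data/medical \
        /mnt/data/website /mnt/data/openproject /mnt/data/fasten
}

guava_forget() {
    "$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" \
        forget --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --prune
}

# Expire old snapshots only when the backup itself succeeded.
guava_nightly() {
    guava_backup && guava_forget
}
```

Chaining with `&&` means a failed backup never triggers pruning, so a broken run cannot shrink the set of good snapshots. Setting `RESTIC=echo` before calling `guava_nightly` prints both commands instead of running them.
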
**Restore command:**
```bash
# Restore the latest snapshot under /mnt/data/restore (original absolute paths are recreated below it)
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  restore latest --target /mnt/data/restore
```

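The restore command above pulls back every path in the snapshot, including the 158 GB photo library. When only one dataset is needed, restic's `--include` option limits the restore to a single path. The sketch below illustrates that; the function names and the `RESTIC` override are hypothetical, and the default target simply reuses `/mnt/data/restore` from the command above.

```shell
# Sketch: list snapshots and restore a single directory from the latest one.
RESTIC="${RESTIC:-restic}"
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"

# List available snapshots (IDs, timestamps, paths).
guava_snapshots() {
    "$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" snapshots
}

# restore_path <path-inside-snapshot> [target-dir]
# Restores only the given path; other backed-up paths are skipped.
restore_path() {
    include="$1"
    target="${2:-/mnt/data/restore}"
    "$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" \
        restore latest --target "$target" --include "$include"
}
```

For example, `restore_path /mnt/data/medical` would place the files under `/mnt/data/restore/mnt/data/medical`, because restic recreates the original absolute path below the target directory.
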
**Check integrity:**
```bash
# Verify repository structure and metadata (add --read-data to also verify pack file contents)
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  check
```

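A plain `check` does not download the stored data itself. One way to verify the data without reading the whole repository each time is restic's `--read-data-subset=n/t` option, rotating the slice by week. This is an illustrative sketch, not part of the current setup: the function names, the four-slice default, and the `RESTIC` override are assumptions.

```shell
# Sketch: verify a rotating slice of the repository's pack files each week.
RESTIC="${RESTIC:-restic}"
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"

# subset_for_week <week-number> <total-slices>
# Prints "n/t" in the form expected by --read-data-subset.
subset_for_week() {
    week="${1#0}"    # strip one leading zero so "08" is not parsed as octal
    total="$2"
    echo "$(( (week % total) + 1 ))/$total"
}

# guava_check_subset [total-slices]
# Uses the ISO week number to pick this week's slice.
guava_check_subset() {
    total="${1:-4}"
    subset=$(subset_for_week "$(date +%V)" "$total")
    "$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" \
        check --read-data-subset="$subset"
}
```

With four slices, every pack file is read roughly once a month, which keeps the weekly download from B2 to about a quarter of the repository.
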
### Disabled Tasks

| Task | Host | Reason |
|------|------|--------|
| Backblaze S3 Atlantis (ID 12) | Atlantis | Old task, replaced by "Backblaze b2" (ID 20) |

## Hosts Without Backup

| Host | Data at Risk | Mitigation |
|------|-------------|------------|
| **Jellyfish** (RPi 5) | 1.8 TB photos (LUKS2-encrypted NVMe) | LUKS encryption protects data at rest, but there is no redundancy beyond the single drive. Syncthing from the phone provides the source-of-truth copy. |
| **Homelab VM** | Docker data, monitoring databases | Stateless: all compose files are in git and data is regenerable. The NetBox/Semaphore DBs are the main risk. |
| **Concord NUC** | Home Assistant config, AdGuard | Container data is relatively small and rebuildable. |

**Recommendation:** Set up Backblaze B2 backup for Jellyfish (photo archive), which holds irreplaceable data with no cloud backup. Guava is now covered.

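One possible way to act on the recommendation is to reuse the restic-over-rclone scheme already running on Guava. The sketch below is only a starting point: the bucket name `vk-jellyfish`, the source path `/mnt/photos`, and the function names are hypothetical placeholders, and it assumes restic and an rclone remote named `b2` are set up on Jellyfish as they are on Guava.

```shell
# Sketch: bring Jellyfish under a restic-to-B2 scheme like Guava's.
# Hypothetical values: bucket "vk-jellyfish", source path "/mnt/photos".
RESTIC="${RESTIC:-restic}"
JELLYFISH_REPO="${JELLYFISH_REPO:-rclone:b2:vk-jellyfish/restic}"
JELLYFISH_SRC="${JELLYFISH_SRC:-/mnt/photos}"
PASSWORD_FILE="/root/.restic-password"

# One-time step: create the encrypted repository.
jellyfish_init() {
    "$RESTIC" -r "$JELLYFISH_REPO" --password-file "$PASSWORD_FILE" init
}

# Recurring step: snapshot the photo archive.
jellyfish_backup() {
    "$RESTIC" -r "$JELLYFISH_REPO" --password-file "$PASSWORD_FILE" \
        backup "$JELLYFISH_SRC"
}
```

Because restic encrypts on the client, the photos would remain encrypted in B2 even though they leave the LUKS volume.
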
## Recovery Procedures

### Full NAS Recovery (Atlantis)

1. Replace failed hardware and/or reinstall DSM
2. Restore from Calypso (fastest: local copy, refreshed weekly)
3. Alternatively, restore from Backblaze B2 (slower: downloads over the internet)
4. Redeploy Docker stacks from git (all GitOps-managed)

### Service Recovery (Any Host)

1. All Docker stacks are in git (`hosts/` directory)
2. Portainer GitOps auto-deploys on push
3. Create the Portainer stack and point it at the compose file in the repository
4. Restore service-specific data from backup where needed

### Critical Service Priority

| Priority | Service | Backup Source | Recovery Time |
|----------|---------|--------------|---------------|
| 1 | Authentik (SSO) | Calypso B2 daily | ~30 min |
| 2 | Gitea (Git) | Calypso B2 daily | ~30 min |
| 3 | NPM (Reverse Proxy) | Calypso B2 daily / matrix-ubuntu local | ~5 min (redeploy) |
| 4 | Plex (Media) | Atlantis B2 weekly | ~1 hr (metadata only, media on disk) |
| 5 | Paperless (Documents) | Calypso B2 daily | ~30 min |

## Monitoring

- **DIUN**: Monitors container image updates (weekly, ntfy notification)
- **Uptime Kuma**: Monitors service availability (97 monitors)
- **HyperBackup**: Sends DSM notification on backup success/failure
- **Backblaze B2**: Dashboard at `https://secure.backblaze.com/b2_buckets.htm`

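None of the items above reports on the Guava restic job directly. A freshness check could close that gap by comparing the newest snapshot time with the current time. The following is a hedged sketch rather than an existing monitor: the function names and the 26-hour default are assumptions, and it relies on `jq` and GNU `date` being installed alongside restic.

```shell
# Sketch: flag the Guava restic repository as stale when the newest
# snapshot is older than a threshold.
# Run-time assumptions: restic, jq, and GNU date (`date -d`).
RESTIC="${RESTIC:-restic}"
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"

# is_stale <snapshot-epoch> <now-epoch> <max-age-hours>
# Succeeds (exit 0) when the snapshot is older than the allowed age.
is_stale() {
    [ $(( $2 - $1 )) -gt $(( $3 * 3600 )) ]
}

# Prints the newest snapshot time as Unix seconds; fails if there is none.
latest_snapshot_epoch() {
    ts=$("$RESTIC" -r "$REPO" --password-file "$PASSWORD_FILE" \
            snapshots --json | jq -r 'map(.time) | max // empty')
    [ -n "$ts" ] && date -d "$ts" +%s
}

# check_guava_freshness [max-age-hours]
check_guava_freshness() {
    max_hours="${1:-26}"
    last=$(latest_snapshot_epoch) || { echo "no snapshot found"; return 2; }
    if is_stale "$last" "$(date +%s)" "$max_hours"; then
        echo "STALE: last snapshot is older than ${max_hours}h"
        return 1
    fi
    echo "OK"
}
```

The 26-hour default leaves two hours of slack around the daily 03:00 schedule. The printed message and exit code could be forwarded to the ntfy setup already used by DIUN; the topic URL is not specified on this page.
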
## Related Documentation

- [Storage Topology](../diagrams/storage-topology.md) — detailed storage layout per host
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md) — how services are updated
- [Offline & Remote Access](offline-and-remote-access.md) — accessing services when internet is down
- [Ansible Playbook Guide](../admin/ANSIBLE_PLAYBOOK_GUIDE.md) — `backup_configs.yml` and `backup_databases.yml` playbooks