# Backup Strategy
Last updated: 2026-03-21
## Overview
The homelab follows a **3-2-1+ backup strategy**: 3 copies of data, 2 different storage types, 1 offsite location, plus cloud backup to Backblaze B2.
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                                 BACKUP FLOW                                 │
│                                                                             │
│  Atlantis (Primary) ──── Hyper Backup (weekly) ──── Calypso (Local copy)    │
│       │                                                                     │
│       ├── Syncthing (real-time) ──── Setillo (Tucson, offsite)              │
│       │                                                                     │
│       └── Hyper Backup S3 (weekly) ──── Backblaze B2 (cloud)                │
│                                            │                                │
│  Calypso ──── Hyper Backup S3 (daily) ─────┘                                │
│                                                                             │
│  Guava ──── Restic (daily 3AM) ──── Backblaze B2 (vk-guava, encrypted)      │
│  Jellyfish ──── No backup (risk)                                            │
└─────────────────────────────────────────────────────────────────────────────┘
```
## Backup Tasks
### Atlantis → Backblaze B2 (Cloud)
| Setting | Value |
|---------|-------|
| **Task name** | Backblaze b2 |
| **Schedule** | Weekly, Sundays 00:00 |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-atlantis` |
| **Encrypted** | Yes (client-side) |
| **Versioned** | Yes (Smart Recycle) |
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |
**What's backed up:**
- `/archive` — long-term cold storage
- `/documents/msi_uqiyoe` — PC sync documents
- `/documents/pc_sync_documents` — PC sync documents
- `/downloads` — download staging
- `/photo` — Synology Photos library
- `/homes/vish/Photos` — user photo library
- Apps: SynologyPhotos, SynologyDrive, FileStation, HyperBackup, SynoFinder
**What's NOT backed up to cloud:**
- `/volume1/media` (~60TB) — too large for cloud backup, replicated to Setillo instead
- `/volume1/docker` — container data (stateless, can be redeployed from git)
### Calypso → Backblaze B2 (Cloud)
| Setting | Value |
|---------|-------|
| **Task name** | Backblaze S3 |
| **Schedule** | Daily, 00:00 |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-concord-1` |
| **Encrypted** | Yes (client-side) |
| **Versioned** | Yes (Smart Recycle) |
**What's backed up:**
- `/docker/authentik` — SSO provider data (critical)
- `/docker/gitea` — Git hosting data (critical)
- `/docker/headscale` — VPN control plane (critical)
- `/docker/immich` — Photo management DB
- `/docker/nginx-proxy-manager` — old NPM config (historical)
- `/docker/paperlessngx` — Document management DB
- `/docker/retro_site` — Personal website
- `/docker/seafile` — File storage data
- `/data/media/misc` — miscellaneous media
- `/data/media/music` — music library
- `/data/media/photos` — photo library
- Apps: Gitea, MariaDB10, CloudSync, Authentik, Immich, Paperless, HyperBackup
### Atlantis → Calypso (Local Copy)
| Setting | Value |
|---------|-------|
| **Method** | Hyper Backup |
| **Schedule** | Weekly |
| **Destination** | Calypso `/volume1/backups/` |
| **What** | Media, photos, documents |
| **Encrypted** | Yes |
### Atlantis/Calypso → Setillo (Offsite)
| Setting | Value |
|---------|-------|
| **Method** | Syncthing (real-time replication) |
| **Destination** | Setillo `/volume1/syncthing/` (Tucson, AZ) |
| **Distance** | ~1,000 miles from primary site |
| **What** | Docker configs, critical data |
### Setillo → Backblaze B2 (Cloud)
| Setting | Value |
|---------|-------|
| **Task name** | Backblaze B2 |
| **Schedule** | Scheduled (interval not documented) |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-setillo` |
| **Encrypted** | No (data encryption disabled — transit only) |
| **Versioned** | Yes (Smart Recycle) |
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |
**What's backed up:**
- `/backups` — backup destination
- `/homes/Setillo/Documents` — Edgar's documents
- `/homes/vish` — vish home directory
- `/PlexMediaServer/2015_2016_crista_green_iphone_5c` — legacy phone photos
- `/PlexMediaServer/other` — other media
- `/PlexMediaServer/photos` — photos
- Apps: DownloadStation, FileStation, HyperBackup, SurveillanceStation, SynoFinder, WebDAVServer
### Guava (TrueNAS) → Backblaze B2 (Cloud)
| Setting | Value |
|---------|-------|
| **Tool** | Restic + Rclone |
| **Schedule** | Daily, 03:00 (TrueNAS cron job ID 1) |
| **Destination** | `s3.us-west-004.backblazeb2.com` |
| **Bucket** | `vk-guava` |
| **Repo path** | `vk-guava/restic` |
| **Encrypted** | Yes (AES-256, restic client-side encryption) |
| **Password file** | `/root/.restic-password` (chmod 600) |
| **Rclone config** | `/root/.config/rclone/rclone.conf` |
| **Retention** | `--keep-daily 7 --keep-weekly 4 --keep-monthly 3 --prune` |
**What's backed up:**
- `/mnt/data/photos` (158 GB) — photo library (critical)
- `/mnt/data/cocalc` (323 MB) — CoCalc notebooks and data
- `/mnt/data/medical` (14 MB) — medical records (critical)
- `/mnt/data/website` (58 MB) — website data
- `/mnt/data/openproject` (13 MB) — project management DB
- `/mnt/data/fasten` (5 MB) — health data
**What's NOT backed up:**
- `/mnt/data/guava_turquoise` (3 TB) — large dataset, not yet assessed
- `/mnt/data/jellyfin` (203 GB) — media metadata, re-downloadable
- `/mnt/data/llama` (64 GB) — LLM models, re-downloadable
- `/mnt/data/iso` (556 MB) — ISOs, re-downloadable
**Backup command (manual run):**
```bash
# Back up the six datasets listed above to the B2-hosted restic repository
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  backup /mnt/data/photos /mnt/data/cocalc /mnt/data/medical \
  /mnt/data/website /mnt/data/openproject /mnt/data/fasten
```
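The retention flags in the table above are options to `restic forget`, so pruning presumably runs as a separate step after the backup. A minimal sketch, assuming the same repository and password file as the other commands in this section. The `apply_retention` function and the `RESTIC` and `DRY_RUN` variables are illustrative names, not part of the existing cron job, whose actual command is not shown here.

```bash
#!/bin/sh
# Sketch only: the real TrueNAS cron command is not shown in this document.
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"
# Override RESTIC (e.g. RESTIC="echo restic") to print the command instead of running it.
RESTIC="${RESTIC:-restic}"

apply_retention() {
  # With DRY_RUN=1, restic only reports which snapshots it would remove.
  dry_run_flag=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    dry_run_flag="--dry-run"
  fi
  $RESTIC -r "$REPO" --password-file "$PASSWORD_FILE" \
    forget --keep-daily 7 --keep-weekly 4 --keep-monthly 3 --prune $dry_run_flag
}
```

Running `DRY_RUN=1 apply_retention` as root first shows which snapshots would be dropped before anything is deleted.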
**Restore command:**
```bash
# Restore the latest snapshot; files are recreated under the target with their full original paths
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  restore latest --target /mnt/data/restore
```
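Recovering a single dataset does not require restoring the whole snapshot. The following is a sketch of a partial restore using restic's `--include` filter, assuming the same repository and password file as above. The `restore_path` function name and the `RESTIC` override are hypothetical helpers for illustration.

```bash
#!/bin/sh
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"
# Set RESTIC="echo restic" to preview the command without touching the repository.
RESTIC="${RESTIC:-restic}"

# restore_path SNAPSHOT_PATH TARGET_DIR
# Restores only SNAPSHOT_PATH from the latest snapshot into TARGET_DIR.
restore_path() {
  $RESTIC -r "$REPO" --password-file "$PASSWORD_FILE" \
    restore latest --target "$2" --include "$1"
}
```

For example, `restore_path /mnt/data/medical /mnt/data/restore` would place the files under `/mnt/data/restore/mnt/data/medical`, leaving the live dataset untouched until the contents have been inspected.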
**Check integrity:**
```bash
# Verify repository structure and metadata (does not read back the stored file data)
sudo restic -r rclone:b2:vk-guava/restic \
  --password-file /root/.restic-password \
  check
```
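A plain `check` does not download the stored pack files, so silent corruption of the data itself would go unnoticed. One possible complement is an occasional sampled read with `--read-data-subset`. This is a sketch, not part of the current setup: the `check_sample` name is hypothetical and the 5% default is an arbitrary example. Reading data back downloads it from B2, which may count toward egress.

```bash
#!/bin/sh
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"
# Set RESTIC="echo restic" to preview the command without touching the repository.
RESTIC="${RESTIC:-restic}"

# check_sample [PERCENT]
# Verifies the repository and reads back a random sample of the pack data.
check_sample() {
  $RESTIC -r "$REPO" --password-file "$PASSWORD_FILE" \
    check --read-data-subset="${1:-5%}"
}
```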
### Guava Backup → Moon (Browse Access)
The Guava full backup on atlantis is NFS-mounted on moon for browsing:
| Setting | Value |
|---------|-------|
| **Source** | atlantis `/volume1/archive/guava_full_backup` |
| **Mount** | moon `/home/moon/guava_backup_atlantis` |
| **Protocol** | NFS v3 over Tailscale (`100.83.230.112`) |
| **Access** | Read-only, moon user (uid 1000) |
| **Persistent** | fstab with `_netdev,nofail` |
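The actual fstab line is not recorded here. As a rough sketch, the settings in the table could translate into an entry like the one printed below. It assumes that `100.83.230.112` is the Tailscale address of atlantis and that read-only access and NFS v3 are requested through the `ro` and `vers=3` mount options. The `nfs_fstab_entry` helper is a hypothetical name.

```bash
#!/bin/sh
# nfs_fstab_entry SERVER EXPORT MOUNTPOINT
# Prints an /etc/fstab line for a read-only NFSv3 mount that waits for the
# network (_netdev) and does not block boot if the server is unreachable (nofail).
nfs_fstab_entry() {
  printf '%s:%s %s nfs ro,vers=3,_netdev,nofail 0 0\n' "$1" "$2" "$3"
}

nfs_fstab_entry 100.83.230.112 /volume1/archive/guava_full_backup /home/moon/guava_backup_atlantis
```

After adding the printed line to `/etc/fstab` on moon, `sudo mount /home/moon/guava_backup_atlantis` mounts it without a reboot.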
### Disabled Tasks
| Task | Host | Reason |
|------|------|--------|
| Backblaze S3 Atlantis (ID 12) | Atlantis | Old task, replaced by "Backblaze b2" (ID 20) |
## Hosts Without Backup
| Host | Data at Risk | Mitigation |
|------|-------------|------------|
| **Jellyfish** (RPi 5) | 1.8TB photos (LUKS2 encrypted NVMe) | LUKS encryption protects at rest, but no redundancy beyond the single drive. Syncthing from phone provides source-of-truth copy. |
| **Homelab VM** | Docker data, monitoring databases | Mostly stateless: all compose files are in git and the data is regenerable. The NetBox DB is the main risk. |
| **Concord NUC** | Home Assistant config, AdGuard | Container data is relatively small and rebuildable |
**Recommendation:** Set up Backblaze B2 backup for Jellyfish (photo archive) — irreplaceable data with no cloud backup. Guava is now covered.
## Recovery Procedures
### Full NAS Recovery (Atlantis)
1. Replace failed hardware / reinstall DSM
2. Restore from Calypso (fastest — local, weekly copy)
3. Or restore from Backblaze B2 (slower — download over internet)
4. Redeploy Docker stacks from git (all GitOps-managed)
### Service Recovery (Any Host)
1. All Docker stacks are defined in git (`hosts/` directory).
2. Portainer GitOps redeploys stacks automatically on push.
3. To recover a service, create the Portainer stack pointing to its compose file.
4. Restore service-specific data from backup where needed.
### Critical Service Priority
| Priority | Service | Backup Source | Recovery Time |
|----------|---------|--------------|---------------|
| 1 | Authentik (SSO) | Calypso B2 daily | ~30 min |
| 2 | Gitea (Git) | Calypso B2 daily | ~30 min |
| 3 | NPM (Reverse Proxy) | Calypso B2 daily / matrix-ubuntu local | ~5 min (redeploy) |
| 4 | Plex (Media) | Atlantis B2 weekly | ~1 hr (metadata only, media on disk) |
| 5 | Paperless (Documents) | Calypso B2 daily | ~30 min |
## Monitoring
- **DIUN**: Monitors container image updates (weekly, ntfy notification)
- **Uptime Kuma**: Monitors service availability (97 monitors)
- **HyperBackup**: Sends DSM notification on backup success/failure
- **Backblaze B2**: Dashboard at `https://secure.backblaze.com/b2_buckets.htm`
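The list above does not mention a check for the Guava restic job. One way to cover it would be a freshness test on the newest snapshot. This is a sketch under assumptions: it needs `jq` and GNU `date` on the host, and the function names and the 26-hour threshold in the usage line are illustrative choices rather than existing configuration.

```bash
#!/bin/sh
REPO="rclone:b2:vk-guava/restic"
PASSWORD_FILE="/root/.restic-password"

# Print the timestamp of the most recent snapshot (requires restic and jq).
latest_snapshot_time() {
  restic -r "$REPO" --password-file "$PASSWORD_FILE" snapshots --json \
    | jq -r 'map(.time) | max'
}

# snapshot_is_fresh TIMESTAMP MAX_HOURS [NOW_EPOCH]
# Succeeds when TIMESTAMP (ISO 8601) is at most MAX_HOURS old.
snapshot_is_fresh() {
  # restic prints fractional seconds; strip them before parsing.
  clean="$(printf '%s' "$1" | sed -E 's/\.[0-9]+//')"
  # "date -d" is GNU syntax (assumption: the host provides GNU date).
  snap_epoch="$(date -d "$clean" +%s)" || return 2
  now_epoch="${3:-$(date +%s)}"
  [ $(( now_epoch - snap_epoch )) -le $(( $2 * 3600 )) ]
}
```

A scheduled check could then run `snapshot_is_fresh "$(latest_snapshot_time)" 26 || echo "Guava backup is stale"` and forward the message to whatever notification channel is preferred.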
## Related Documentation
- [Storage Topology](../diagrams/storage-topology.md) — detailed storage layout per host
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md) — how services are updated
- [Offline & Remote Access](offline-and-remote-access.md) — accessing services when internet is down
- [Ansible Playbook Guide](../admin/ANSIBLE_PLAYBOOK_GUIDE.md) — `backup_configs.yml` and `backup_databases.yml` playbooks