# Backup Plan — Decision Document
> **Status**: Planning — awaiting decisions on open questions before implementation
> **Last updated**: 2026-03-13
> **Related**: [backup-strategies.md](backup-strategies.md) (aspirational doc, mostly not yet deployed)
---
## Current State (Honest)
| What | Status |
|---|---|
| Synology Hyper Backup (Atlantis → Calypso) | ✅ Running, configured in DSM GUI |
| Synology Hyper Backup (Atlantis → Setillo) | ✅ Running, configured in DSM GUI |
| Syncthing docker config sync (Atlantis/Calypso/Setillo) | ✅ Running |
| Synology snapshots for media volumes | ✅ Adequate — decided, no change needed |
| Scheduled database backups | ❌ Not deployed (Firefly sidecar is the only exception) |
| Docker volume backups for non-Synology hosts | ❌ Not deployed |
| Cloud (Backblaze B2) | ❌ Account exists, nothing uploading yet |
| Unified backup monitoring / alerting | ❌ Not deployed |

The migration scripts (`backup-matrix.sh`, `backup-mastodon.sh`, `backup.sh`) are
one-off migration artifacts — not scheduled, not monitored.

---
## Recommended Tool: Borgmatic
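Borgmatic wraps BorgBackup (deduplicated, encrypted, compressed backups) with a
single YAML config file that handles retention, database hooks, and alerting.
Scheduling itself comes from cron, a systemd timer, or the cron daemon inside a
borgmatic container.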
| Concern | How Borgmatic addresses it |
|---|---|
| Deduplication | BorgBackup — only changed chunks stored; daily full runs are cheap |
| Encryption | AES-256 at rest, passphrase-protected repo |
| Database backups | Native `postgresql_databases` and `mysql_databases` hooks — calls pg_dump/mysqldump before each run, streams output into the Borg repo |
| Scheduling | Not part of the config file itself: run `borgmatic` from host cron or a systemd timer, or as a container whose image bundles cron (such as the `borgmatic-cron` image) |
| Alerting | Native ntfy / healthchecks.io / email hooks — fires on failure |
| Restoration | `borgmatic extract` or direct `borg extract` — well-documented |
| Complexity | Low — one YAML file per host, one Docker container |
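As a rough illustration of the "one Docker container" row, a Compose service for the Calypso case might look like the sketch below. The image name (the community `ghcr.io/borgmatic-collective/borgmatic` image), the host paths under `/volume1/docker/borgmatic/`, and the container-side mount points are assumptions rather than decided values; `/mnt/docker` matches the mount point used in the config skeleton at the end of this document. The container would also need to share a Docker network with the database containers for the hostname-based DB hooks to resolve.

```yaml
# docker-compose.yml (sketch; image, names, and paths are assumptions)
services:
  borgmatic:
    image: ghcr.io/borgmatic-collective/borgmatic   # assumed image
    container_name: borgmatic
    restart: unless-stopped
    environment:
      # Passphrase injected at deploy time, never committed to git
      BORG_PASSPHRASE: ${BORG_PASSPHRASE}
    volumes:
      # Data to back up, mounted read-only
      - /volume1/docker:/mnt/docker:ro
      # borgmatic config directory (borgmatic reads *.yaml files here)
      - /volume1/docker/borgmatic/config:/etc/borgmatic.d:ro
      # SSH key and known_hosts for reaching the remote repo
      - /volume1/docker/borgmatic/ssh:/root/.ssh:ro
      # Borg cache and repo keys, kept across container restarts
      - /volume1/docker/borgmatic/cache:/root/.cache/borg
      - /volume1/docker/borgmatic/borg-config:/root/.config/borg
```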
### Why not the alternatives
| Tool | Reason not chosen |
|---|---|
| Restic | No built-in DB hooks, no built-in scheduler — needs cron + wrapper scripts |
| Kopia | Newer, less battle-tested at this scale; no native DB hooks |
| Duplicati | History of stability bugs; no DB hooks; configuration is GUI-centric |
| rclone | Sync tool, not a backup tool — no dedup, no versioning, no DB hooks |
| Raw rsync | No dedup, no encryption, no DB hooks, fragile for large trees |

Restic is the closest alternative and would be acceptable if Borgmatic hits issues,
but Borgmatic's native DB hooks are the deciding factor.

---
## Proposed Architecture
### What to back up per host
**Atlantis** (primary NAS, highest value — do first)

- `/volume2/metadata/docker2/` — all container config/data dirs (~194GB used)
- Databases via hooks:
  - `immich-db` (PostgreSQL) — photo metadata
  - `vaultwarden` (SQLite) — passwords, via pre-hook tar
  - `sonarr`, `radarr`, `prowlarr`, `bazarr`, `lidarr` (SQLite) — via pre-hook
  - `tdarr` (SQLite + JSON) — transcode config
- `/volume1/data/media/`: **covered by Synology snapshots, excluded from Borg**

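For the SQLite-backed services, borgmatic also offers a `sqlite_databases` hook, which could stand in for a hand-written pre-hook tar. A minimal sketch, assuming the data directory is mounted at `/mnt/docker2` inside the container and using hypothetical database file paths that need checking against each container's actual layout. The hook relies on the `sqlite3` client being available, and whether it works against a read-only mount is worth testing before relying on it.

```yaml
# Sketch only: the /mnt/docker2 mount point and the database file
# paths are assumptions, not verified against the running containers
sqlite_databases:
  - name: vaultwarden
    path: /mnt/docker2/vaultwarden/db.sqlite3
  - name: sonarr
    path: /mnt/docker2/sonarr/sonarr.db
  - name: radarr
    path: /mnt/docker2/radarr/radarr.db
```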
**Calypso** (secondary NAS)

- `/volume1/docker/` — all container config/data dirs
- Databases via hooks:
  - `paperless-db` (PostgreSQL)
  - `authentik-db` (PostgreSQL)
  - `immich-db` (PostgreSQL, Calypso instance)
  - `seafile-db` (MySQL)
  - `gitea-db` (PostgreSQL) — see open question #5 below

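The Seafile database on Calypso would go through the `mysql_databases` hook rather than the PostgreSQL one. A minimal sketch, assuming the container is reachable as `seafile-db`, a hypothetical `seafile` user, and a password supplied through a hypothetical environment variable. `name: all` asks borgmatic to dump every database on that server, which is a guess at what Seafile's multi-database layout needs.

```yaml
# Sketch only: hostname, username, and variable name are assumptions
mysql_databases:
  - name: all                          # dump every database on the server
    hostname: seafile-db
    username: seafile
    password: ${SEAFILE_DB_PASSWORD}   # resolved from the environment
```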
**homelab-vm** (this machine, `100.67.40.126`)

- Docker named volumes — scrutiny, ntfy, syncthing, archivebox, openhands, hoarder, monitoring stack
- Mostly config-weight data, no large databases

**NUC (concord)**

- Docker named volumes — homeassistant, adguard, syncthing, invidious

**Pi-5**

- Docker named volumes — uptime-kuma (SQLite), glances, diun

**Setillo (Seattle VM)** — lower priority, open question (see below)

---
## Options — Borg Repo Destination
All hosts need a repo to write to. Three options:
### Option A — Atlantis as central repo host (simplest)
```
Atlantis (local) → /volume1/backups/borg/atlantis/
Calypso → SSH → Atlantis:/volume1/backups/borg/calypso/
homelab-vm → SSH → Atlantis:/volume1/backups/borg/homelab-vm/
NUC → SSH → Atlantis:/volume1/backups/borg/nuc/
Pi-5 → SSH → Atlantis:/volume1/backups/borg/rpi5/
```
Pros:
- Atlantis already gets Hyper Backup → Calypso + rsync → Setillo, so all Borg
repos get carried offsite for free with no extra work
- Single place to manage retention policies
- 46TB free on Atlantis — ample room

Cons:
- Atlantis is a single point of failure for all repos
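In borgmatic terms, Option A comes down to a different `repositories` entry per host. The sketch below assumes the `borg` SSH user and the Atlantis address `100.83.230.112` that appear in the config skeleton later in this document; the labels are placeholders, and the local path on Atlantis would be a container mount point if borgmatic runs in Docker there.

```yaml
# On Atlantis itself: local repo, no SSH involved
repositories:
  - path: /volume1/backups/borg/atlantis
    label: atlantis-local        # placeholder label
---
# On homelab-vm: remote repo on Atlantis, reached over SSH
repositories:
  - path: ssh://borg@100.83.230.112/volume1/backups/borg/homelab-vm
    label: atlantis-remote       # placeholder label
```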
### Option B — Atlantis ↔ Calypso cross-backup (more resilient)
```
Atlantis → SSH → Calypso:/volume1/backups/borg/atlantis/
Calypso → SSH → Atlantis:/volume1/backups/borg/calypso/
Other hosts → Atlantis (same as Option A)
```
Pros:
- If Atlantis dies completely, Calypso independently holds Atlantis's backup
- True cross-backup between the two most critical hosts

Cons:
- Two SSH trust relationships to set up and maintain
- The Borg repo holding Atlantis's backups would live on Calypso, not on Atlantis, so it
  doesn't get carried to Setillo via the existing Hyper Backup job unless the job is
  updated to include it
### Option C — Local repo per host, then push to Atlantis
- Each host writes a local repo first, then pushes to Atlantis
- Adds a local copy for fast restores without SSH
- Doubles storage use on each host
- Probably unnecessary given Synology's local snapshot coverage on Atlantis/Calypso

**Recommendation: Option A** if simplicity is the priority; **Option B** if you want
Atlantis and Calypso to be truly independent backup failure domains.

---
## Options — Backblaze B2
B2 account exists. The question is what to push there.
### Option 1 — Borg repos via rclone (recommended)
```
Atlantis (weekly cron):
rclone sync /volume1/backups/borg/ b2:homelab-borg/
```
- BorgBackup's chunk-based dedup means only new/changed chunks upload each week
- Estimated size: initial ~50–200GB (configs + DBs only, media excluded), then small incrementals
- rclone runs as a container or cron job on Atlantis after the daily Borg runs complete
- Cost at B2 rates ($0.006/GB/month): ~$1.20/month for 200GB (200 × $0.006)
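If rclone runs as a container rather than from cron, a one-shot Compose service along these lines could perform the sync. This is only a sketch: the `rclone/rclone` image, the host path holding `rclone.conf`, and a remote named `b2` being configured already are all assumptions, and something external (DSM Task Scheduler or cron) still has to start it on the weekly schedule.

```yaml
# docker-compose.yml (sketch; image, paths, and remote name are assumptions)
services:
  rclone-b2:
    image: rclone/rclone            # image entrypoint is the rclone binary
    command: ["sync", "/data", "b2:homelab-borg/"]
    volumes:
      # Borg repos, read-only: rclone only needs to read them
      - /volume1/backups/borg:/data:ro
      # Directory containing rclone.conf with the B2 remote and app key
      - /volume1/docker/rclone/config:/config/rclone
    restart: "no"                   # run once per invocation, then exit
```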
### Option 2 — DB dumps only to B2
- Simpler — just upload the daily pg_dump files
- No dedup — each upload is a full dump
- Less efficient at scale but trivially easy to implement
### Option 3 — Skip B2 for now
- Setillo offsite rsync is sufficient for current risk tolerance
- Add B2 once monitoring is in place and Borgmatic is proven stable

**Recommendation: Option 1** — the dedup makes it cheap and the full Borg repo in B2
means any host can be restored from cloud without needing Setillo to be online.

---
## Open Questions
These must be answered before implementation starts.
### 1. Which hosts to cover?
- [ ] Atlantis
- [ ] Calypso
- [ ] homelab-vm
- [ ] NUC
- [ ] Pi-5
- [ ] Setillo (Seattle VM)
### 2. Borg repo destination
- [ ] Option A: Atlantis only (simplest)
- [ ] Option B: Atlantis ↔ Calypso cross-backup (more resilient)
- [ ] Option C: Local first, then push to Atlantis
### 3. B2 scope
- [ ] Option 1: Borg repos via rclone (recommended)
- [ ] Option 2: DB dumps only
- [ ] Option 3: Skip for now
### 4. Secrets management
Borgmatic configs need: Borg passphrase, SSH private key (to reach Atlantis repo),
B2 app key (if B2 enabled).

Option A — **Portainer env vars** (consistent with rest of homelab)

- Passphrase injected at deploy time, never in git
- SSH keys stored as host-mounted files, path referenced in config

Option B — **Files on host only**

- Drop secrets to e.g. `/volume1/docker/borgmatic/secrets/` per host
- Mount read-only into borgmatic container
- Nothing in git, nothing in Portainer

Option C — **Ansible vault**

- Encrypt secrets in git — fully tracked and reproducible
- More setup overhead

- [ ] Option A: Portainer env vars
- [ ] Option B: Files on host only
- [ ] Option C: Ansible vault
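Whichever option is chosen, borgmatic can read the passphrase from a command and point SSH at a specific key, so neither secret has to appear in the YAML itself. A sketch for the files-on-host variant (Option B), assuming the secrets directory is mounted read-only at `/secrets` inside the container and using hypothetical file names. SSH generally refuses private keys with loose permissions, so the key file would need to be readable by its owner only.

```yaml
# Sketch only: the /secrets mount point and file names are assumptions
encryption_passcommand: cat /secrets/borg-passphrase   # prints the repo passphrase
ssh_command: ssh -i /secrets/id_ed25519                # key used to reach the repo host
```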
### 5. Gitea chicken-and-egg
CI runs on Gitea. If Borgmatic on Calypso backs up `gitea-db` and Calypso/Gitea
goes down, restoring Gitea is a manual procedure outside of CI — which is acceptable.
The alternative is to exclude `gitea-db` from Borgmatic and back it up separately
(e.g. a simple daily pg_dump cron on Calypso that Hyper Backup then carries).
- [ ] Include gitea-db in Borgmatic (manual restore procedure documented)
- [ ] Exclude from Borgmatic, use separate pg_dump cron
### 6. Alerting ntfy topic
Borgmatic can push failure alerts to the existing ntfy stack on homelab-vm.
- [ ] Confirm ntfy topic name to use (e.g. `homelab-backups` or `homelab`)
- [ ] Confirm ntfy internal URL (e.g. `http://100.67.40.126:<port>`)
---
## Implementation Phases (draft, not yet started)
Once decisions above are made, implementation follows these phases in order:

**Phase 1 — Atlantis**

1. Create `hosts/synology/atlantis/borgmatic.yaml`
2. Config: backs up `/volume2/metadata/docker2`, DB hooks for all postgres/sqlite containers
3. Repo destination per decision on Q2
4. Alert on failure via ntfy

**Phase 2 — Calypso**

1. Create `hosts/synology/calypso/borgmatic.yaml`
2. Config: backs up `/volume1/docker`, DB hooks for paperless/authentik/immich/seafile/(gitea)
3. Repo: SSH to Atlantis (or cross-backup per Q2)

**Phase 3 — homelab-vm, NUC, Pi-5**

1. Create borgmatic stack per host
2. Mount `/var/lib/docker/volumes` read-only into container
3. Repos: SSH to Atlantis
4. Staggered schedule: 02:00 Atlantis / 03:00 Calypso / 04:00 homelab-vm / 04:30 NUC / 05:00 Pi-5

**Phase 4 — B2 cloud egress** (if Option 1 or 2 chosen)

1. Add rclone container or cron on Atlantis
2. Weekly sync of Borg repos → `b2:homelab-borg/`

**Phase 5 — Monitoring**

1. Borgmatic ntfy hook per host — fires on any failure
2. Uptime Kuma push monitor per host — borgmatic pings after each successful run
3. Alert if no ping received in 25h
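Recent borgmatic releases include an `uptime_kuma` hook that could cover steps 2 and 3 of Phase 5, provided the installed version has it. The sketch below uses a placeholder push URL: the host, port, and token are hypothetical and would come from the push monitor created in Uptime Kuma. The 25h threshold would be set on the Uptime Kuma side as that monitor's heartbeat interval, not in borgmatic.

```yaml
# Sketch only: the push URL is a placeholder, not a real endpoint
uptime_kuma:
  push_url: http://uptime-kuma.example:3001/api/push/abcd1234
  states:
    - finish   # ping after a successful run
    - fail     # also report failed runs
```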
---
## Borgmatic Config Skeleton (reference)
```yaml
# /etc/borgmatic/config.yaml (inside container)
# This is illustrative — actual configs will be generated per host
# Written in the flat (section-less) layout used by borgmatic 1.8 and later

repositories:
  - path: ssh://borg@100.83.230.112/volume1/backups/borg/calypso
    label: atlantis-remote

source_directories:
  - /mnt/docker   # host /volume1/docker mounted here

exclude_patterns:
  - '*/cache'
  - '*/transcode'
  - '*/thumbs'
  - '*.tmp'
  - '*.log'

postgresql_databases:
  - name: paperless
    hostname: paperless-db
    username: paperless
    password: "REDACTED_PASSWORD"
    format: custom
  - name: authentik
    hostname: authentik-db
    username: authentik
    password: "REDACTED_PASSWORD"
    format: custom

# Retention: in the flat layout these keys sit at the top level
keep_daily: 14
keep_weekly: 8
keep_monthly: 6

ntfy:
  topic: homelab-backups
  server: http://100.67.40.126:2586
  states:
    - fail

encryption_passphrase: ${BORG_PASSPHRASE}
```
---
## Related Docs
- [backup-strategies.md](backup-strategies.md) — existing aspirational doc (partially outdated)
- [portainer-backup.md](portainer-backup.md) — Portainer-specific backup notes
- [disaster-recovery.md](../troubleshooting/disaster-recovery.md)