# 🔄 Portainer Backup & Recovery Plan

**Last Updated**: 2026-01-27

This document outlines the backup strategy for Portainer and all managed Docker infrastructure.

---

## Overview

Portainer manages **5 endpoints** with **130+ containers** across the homelab. A comprehensive backup strategy ensures quick recovery from failures.

### Current Backup Configuration ✅

| Setting | Value |
|---------|-------|
| **Destination** | Backblaze B2 (`vk-portainer` bucket) |
| **Schedule** | Daily at 3:00 AM |
| **Retention** | 30 days (auto-delete lifecycle rule) |
| **Encryption** | Yes (AES-256) |
| **Backup Size** | ~30 MB per backup |
| **Max Storage** | ~900 MB |
| **Monthly Cost** | ~$0.005 |

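The storage and cost figures in the table follow from the backup size and the retention window. A minimal sketch of that arithmetic is below; the per-GB price is an assumption (it is not stated in this document) and should be checked against current Backblaze B2 pricing.

```bash
#!/usr/bin/env bash
# Rough storage and cost estimate for the B2 backup bucket.
# PRICE_PER_GB_MONTH is an assumed rate, not a value taken from this document.

BACKUP_SIZE_MB=30          # approximate size of one backup
RETENTION_DAYS=30          # lifecycle rule deletes older backups
PRICE_PER_GB_MONTH=0.006   # assumption: USD per GB-month

estimate_storage_mb() {
  # One backup per day, so steady-state storage is size x retention.
  echo $(( $1 * $2 ))
}

estimate_monthly_cost() {
  # $1 = storage in MB, $2 = USD per GB-month (1 GB taken as 1000 MB).
  awk -v mb="$1" -v price="$2" 'BEGIN { printf "%.4f\n", mb / 1000 * price }'
}

storage_mb=$(estimate_storage_mb "$BACKUP_SIZE_MB" "$RETENTION_DAYS")
echo "Max storage: ${storage_mb} MB"
echo "Monthly cost: \$$(estimate_monthly_cost "$storage_mb" "$PRICE_PER_GB_MONTH")"
```

With the values above this gives 900 MB and roughly half a cent per month, which matches the table.
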
### What's Backed Up

| Component | Location | Backup Method | Frequency |
|-----------|----------|---------------|-----------|
| Portainer DB | Atlantis:/portainer | **Backblaze B2** | Daily 3AM |
| Stack definitions | Git repo | Already versioned | On change |
| Container volumes | Per-host | Scheduled rsync | Daily |
| Secrets/Env vars | Portainer | Included in B2 backup | Daily |

---

## Portainer Server Backup

### Active Configuration: Backblaze B2 ✅

Automatic backups are configured via Portainer UI:
- **Settings → Backup configuration → S3 Compatible**

**Current Settings:**
```
S3 Host:     https://s3.us-west-004.backblazeb2.com
Bucket:      vk-portainer
Region:      us-west-004
Schedule:    0 3 * * * (daily at 3 AM)
Encryption:  Enabled
```

### Manual Backup via API

```bash
# Trigger immediate backup
curl -X POST "http://vishinator.synology.me:10000/api/backup/s3/execute" \
  -H "X-API-Key: REDACTED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "accessKeyID": "004d35b7f4bf4300000000001",
    "secretAccessKey": "K004SyhG7s+Xv/LDB32SAJFLKhe5dj0",
    "region": "us-west-004",
    "bucketName": "vk-portainer",
    "password": "portainer-backup-2026",
    "s3CompatibleHost": "https://s3.us-west-004.backblazeb2.com"
  }'

# Download backup locally
# The backup endpoint expects a POST; an empty password requests an unencrypted archive
curl -X POST "http://vishinator.synology.me:10000/api/backup" \
  -H "X-API-Key: REDACTED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"password": ""}' \
  -o "portainer-backup-$(date +%Y%m%d).tar.gz"
```

### Option 2: Volume Backup (Manual)

```bash
# On Atlantis (where Portainer runs)
# Stop Portainer temporarily
docker stop portainer

# Backup the data volume
tar -czvf /volume1/backups/portainer/portainer-$(date +%Y%m%d).tar.gz \
  /volume1/docker/portainer/data

# Restart Portainer
docker start portainer
```

### Option 3: Scheduled Backup Script

Create `/volume1/scripts/backup-portainer.sh`:
```bash
#!/bin/bash
set -euo pipefail  # abort on errors, unset variables, and failed pipelines

BACKUP_DIR="/volume1/backups/portainer"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup Portainer data (hot backup - no downtime)
docker run --rm \
  -v portainer_data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar -czvf "/backup/portainer-$DATE.tar.gz" /data

# Cleanup old backups
find "$BACKUP_DIR" -name "portainer-*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: portainer-$DATE.tar.gz"
```

Add to crontab:
```bash
# Daily at 3 AM
0 3 * * * /volume1/scripts/backup-portainer.sh >> /var/log/portainer-backup.log 2>&1
```

---

## Stack Definitions Backup

All stack definitions are stored in Git (git.vish.gg/Vish/homelab), providing:
- ✅ Version history
- ✅ Change tracking
- ✅ Easy rollback
- ✅ Multi-location redundancy

### Git Repository Structure
```
homelab/
├── Atlantis/           # Atlantis stack configs
├── Calypso/            # Calypso stack configs
├── homelab_vm/         # Homelab VM configs
│   ├── monitoring.yaml
│   ├── openhands.yaml
│   ├── ntfy.yaml
│   └── prometheus_grafana_hub/
│       └── alerting/
├── concord_nuc/        # NUC configs
└── docs/               # Documentation
```

### Backup Git Repo Locally
```bash
# Clone full repo with history
git clone --mirror https://git.vish.gg/Vish/homelab.git homelab-backup.git

# Update existing mirror
cd homelab-backup.git && git remote update
```

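For a scheduled job, the clone and update steps can be folded into one idempotent function. The sketch below is one way to do that; the function name `mirror_repo` and the mirror file name in the commented example are illustrative assumptions, not part of the existing setup.

```bash
#!/usr/bin/env bash
# Clone a mirror on the first run, refresh it on later runs.
# mirror_repo and the example destination are hypothetical names.

mirror_repo() {
  local url="$1" dest="$2"
  if [ -d "$dest" ]; then
    # Existing mirror: fetch all refs and drop ones deleted upstream.
    git -C "$dest" remote update --prune
  else
    git clone --mirror "$url" "$dest"
  fi
}

# Example (commented out so that sourcing this file has no side effects):
# mirror_repo "https://git.vish.gg/Vish/homelab.git" \
#   "/volume1/backups/git-mirrors/homelab-backup.git"
```

Because the function only checks whether the destination exists, the same cron entry can be used for both the initial clone and every later refresh.
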
---

## Container Volume Backup Strategy

### Critical Volumes to Backup

| Service | Volume Path | Priority | Size |
|---------|-------------|----------|------|
| Grafana | /var/lib/grafana | High | ~500MB |
| Prometheus | /prometheus | Medium | ~2GB |
| ntfy | /var/cache/ntfy | Low | ~100MB |
| Alertmanager | /alertmanager | Medium | ~50MB |

### Backup Script for Homelab VM

Create `/home/homelab/scripts/backup-volumes.sh`:
```bash
#!/bin/bash
set -euo pipefail  # abort on errors, unset variables, and failed pipelines

BACKUP_DIR="/home/homelab/backups"
DATE=$(date +%Y%m%d)
REMOTE="atlantis:/volume1/backups/homelab-vm"

# Create local backup
mkdir -p "$BACKUP_DIR/$DATE"

# Backup critical volumes
for vol in grafana prometheus alertmanager; do
  docker run --rm \
    -v "${vol}_data":/data \
    -v "$BACKUP_DIR/$DATE":/backup \
    alpine tar -czvf "/backup/${vol}.tar.gz" /data
done

# Sync to remote (Atlantis NAS)
rsync -av --delete "$BACKUP_DIR/$DATE/" "$REMOTE/$DATE/"

# Keep last 7 days locally (-mindepth 1 keeps $BACKUP_DIR itself out of the match)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

echo "Backup completed: $DATE"
```

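The script above prunes only the local copies, so the dated directories synced to Atlantis keep accumulating unless something removes them. A possible sketch of a matching prune step for the NAS side follows; the function name `prune_dated_dirs` is hypothetical, and running it on Atlantis against `/volume1/backups/homelab-vm` with a 30-day window is an assumption rather than an existing job.

```bash
#!/usr/bin/env bash
# Remove dated backup directories older than a given number of days.
# prune_dated_dirs is an illustrative helper, not part of the current scripts.

prune_dated_dirs() {
  # $1 = directory holding per-day subdirectories, $2 = days to keep.
  local root="$1" keep_days="$2"
  [ -d "$root" ] || return 1
  # -mindepth 1 protects the root itself; -maxdepth 1 limits the match
  # to the per-day directories directly below it.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$keep_days" -exec rm -rf {} +
}

# Example (assumed to run on Atlantis, commented out to avoid side effects):
# prune_dated_dirs /volume1/backups/homelab-vm 30
```

The age check uses each directory's modification time, so it assumes nothing rewrites old backup directories after they are synced.
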
---

## Disaster Recovery Procedures

### Scenario 1: Portainer Server Failure

**Recovery Steps:**
1. Deploy a new Portainer instance on Atlantis
2. Restore from backup
3. Confirm the edge agents reconnect (they should do so automatically) and re-add any that do not

```bash
# Deploy fresh Portainer
docker run -d -p 10000:9000 -p 8000:8000 \
  --name portainer --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ee:latest

# Restore from backup
docker stop portainer
tar -xzvf portainer-backup.tar.gz -C /
docker start portainer
```

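When restoring from the archives written by the scheduled backup script, the first step is picking the newest file. The sketch below shows one way to do that and to unpack the archive back into the `portainer_data` volume; both function names are hypothetical, and the restore step assumes the archive was created by that script (so its paths start with `data/`).

```bash
#!/usr/bin/env bash
# Helpers for restoring from archives named portainer-YYYYMMDD_HHMMSS.tar.gz.
# latest_backup and restore_into_volume are illustrative names.

latest_backup() {
  # Timestamped names sort chronologically, and globs expand in sorted
  # order, so the last match is the newest archive.
  local dir="$1" newest="" f
  for f in "$dir"/portainer-*.tar.gz; do
    [ -e "$f" ] && newest="$f"
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

restore_into_volume() {
  # Unpack an archive into the named volume. The archive stores paths
  # under data/, so extracting at / fills the mounted /data.
  local archive="$1"
  docker run --rm \
    -v portainer_data:/data \
    -v "$(dirname "$archive")":/backup:ro \
    alpine tar -xzf "/backup/$(basename "$archive")" -C /
}

# Example (Portainer should be stopped first; commented out on purpose):
# restore_into_volume "$(latest_backup /volume1/backups/portainer)"
```

`latest_backup` returns a non-zero status when the directory holds no matching archive, which lets a calling script stop before touching the volume.
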
### Scenario 2: Edge Agent Failure (e.g., Homelab VM)

**Recovery Steps:**
1. Reinstall Docker on the host
2. Install Portainer agent
3. Redeploy stacks from Git

```bash
# Install Portainer Edge Agent
docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  -v portainer_agent_data:/data \
  --name portainer_edge_agent \
  --restart always \
  -e EDGE=1 \
  -e EDGE_ID=<edge-id> \
  -e EDGE_KEY=<edge-key> \
  -e EDGE_INSECURE_POLL=1 \
  portainer/agent:latest

# Stacks will auto-deploy from Git (if AutoUpdate enabled)
# Or manually trigger via Portainer API
```

### Scenario 3: Complete Infrastructure Loss

**Recovery Priority:**
1. Network (router, switch)
2. Atlantis NAS (Portainer server)
3. Git server (Gitea on Calypso)
4. Edge agents

**Full Recovery Checklist:**
- [ ] Restore network connectivity
- [ ] Boot Atlantis, restore Portainer backup
- [ ] Boot Calypso, verify Gitea accessible
- [ ] Start edge agents on each host
- [ ] Verify all stacks deployed from Git
- [ ] Test alerting notifications
- [ ] Verify monitoring dashboards

---

## Portainer API Backup Commands

### Export All Stack Definitions
```bash
#!/bin/bash
API_KEY=REDACTED_API_KEY
BASE_URL="http://vishinator.synology.me:10000"
OUTPUT_DIR="./portainer-export-$(date +%Y%m%d)"

mkdir -p "$OUTPUT_DIR"

# Get all stacks
curl -s -H "X-API-Key: $API_KEY" "$BASE_URL/api/stacks" | \
  jq -r '.[] | "\(.Id) \(.Name) \(.EndpointId)"' | \
  while read -r id name endpoint; do
    echo "Exporting stack: $name (ID: $id)"
    # Write each stack's compose file to its own YAML file
    curl -s -H "X-API-Key: $API_KEY" \
      "$BASE_URL/api/stacks/$id/file" | \
      jq -r '.REDACTED_APP_PASSWORD' > "$OUTPUT_DIR/${name}.yaml"
  done

echo "Exported to $OUTPUT_DIR"
```

### Export Endpoint Configuration
```bash
# Reuses API_KEY and BASE_URL from the stack export script above
curl -s -H "X-API-Key: $API_KEY" \
  "$BASE_URL/api/endpoints" | jq '.' > endpoints-backup.json
```

---

## Automated Backup Schedule

| Backup Type | Frequency | Retention | Location |
|-------------|-----------|-----------|----------|
| Portainer DB | Daily 3AM | 30 days | Atlantis NAS |
| Git repo mirror | Daily 4AM | Unlimited | Calypso NAS |
| Container volumes | Daily 5AM | 7 days local, 30 days remote | Atlantis NAS |
| Full export | Weekly Sunday | 4 weeks | Off-site (optional) |

---

## Verification & Testing

### Monthly Backup Test Checklist
- [ ] Verify Portainer backup file integrity
- [ ] Test restore to staging environment
- [ ] Verify Git repo clone works
- [ ] Test volume restore for one service
- [ ] Document any issues found

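The integrity check in the list above can be partly automated for plain `.tar.gz` archives such as the ones produced by the volume scripts. The following is a minimal sketch; `verify_archive` is a hypothetical helper, and it does not apply to password-encrypted Portainer backups, which are not readable as plain tar archives.

```bash
#!/usr/bin/env bash
# Basic integrity check for an unencrypted .tar.gz backup.
# verify_archive is an illustrative helper, not an existing script.

verify_archive() {
  # Succeeds only if the file is non-empty, the gzip stream is intact,
  # and the tar listing can be read to the end.
  local archive="$1"
  [ -s "$archive" ] || return 1
  gzip -t "$archive" 2>/dev/null || return 1
  tar -tzf "$archive" >/dev/null 2>&1
}

# Example (commented out; the path is an assumption):
# verify_archive /volume1/backups/homelab-vm/20260127/grafana.tar.gz \
#   && echo "archive OK" || echo "archive DAMAGED"
```

A passing check shows only that the archive is readable; a restore test is still needed to confirm the contents are usable.
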
### Backup Monitoring
Add to Prometheus alerting:
```yaml
- alert: BackupFailed
  expr: time() - backup_last_success_timestamp > 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Backup hasn't run in 24 hours"
```

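The alert rule depends on a `backup_last_success_timestamp` metric, and this document does not say how that metric is produced. One common approach is to have the backup script write it for the node_exporter textfile collector. The sketch below assumes that setup; the function name and the collector directory in the example are hypothetical.

```bash
#!/usr/bin/env bash
# Publish the time of the last successful backup as a Prometheus metric.
# Assumes node_exporter runs with its textfile collector pointed at the
# directory passed in; write_backup_metric is an illustrative helper.

write_backup_metric() {
  # $1 = textfile collector directory, $2 = Unix timestamp (defaults to now).
  local dir="$1" ts="${2:-$(date +%s)}" tmp
  tmp=$(mktemp "$dir/backup.prom.XXXXXX") || return 1
  {
    echo "# HELP backup_last_success_timestamp Unix time of the last successful backup."
    echo "# TYPE backup_last_success_timestamp gauge"
    echo "backup_last_success_timestamp $ts"
  } > "$tmp"
  chmod 644 "$tmp"
  # Renaming within one directory is atomic, so the collector never
  # reads a half-written file.
  mv "$tmp" "$dir/backup.prom"
}

# Example, as the last line of a backup script (directory is an assumption):
# write_backup_metric /var/lib/node_exporter/textfile_collector
```

Calling the function only after every backup step has succeeded keeps the timestamp from advancing on a failed run, which is what lets the alert fire.
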
---

## Quick Reference

### Backup Locations
```
Atlantis:/volume1/backups/
├── portainer/     # Portainer DB backups
├── homelab-vm/    # Homelab VM volume backups
├── calypso/       # Calypso volume backups
└── git-mirrors/   # Git repository mirrors
```

### Important Files
- Portainer API Key: `ptr_REDACTED_PORTAINER_TOKEN`
- Git repo: `https://git.vish.gg/Vish/homelab`
- Edge agent keys: Stored in Portainer (Settings → Environments)

### Emergency Contacts
- Synology Support: 1-425-952-7900
- Portainer Support: https://www.portainer.io/support