Sanitized mirror from private repository - 2026-04-20 01:32:01 UTC
docs/admin/portainer-backup.md
# 🔄 Portainer Backup & Recovery Plan

**Last Updated**: 2026-01-27

This document outlines the backup strategy for Portainer and all managed Docker infrastructure.

---

## Overview

Portainer manages **5 endpoints** with **130+ containers** across the homelab. A comprehensive backup strategy ensures quick recovery from failures.

### Current Backup Configuration ✅

| Setting | Value |
|---------|-------|
| **Destination** | Backblaze B2 (`vk-portainer` bucket) |
| **Schedule** | Daily at 3:00 AM |
| **Retention** | 30 days (auto-delete lifecycle rule) |
| **Encryption** | Yes (AES-256) |
| **Backup Size** | ~30 MB per backup |
| **Max Storage** | ~900 MB |
| **Monthly Cost** | ~$0.005 |

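
The storage and cost figures in the table follow from the backup size and the retention window. The sketch below reproduces that arithmetic. The function names are hypothetical, and the price per GB-month is an assumed rate that should be checked against current Backblaze B2 pricing.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the storage and cost estimates from the table.

estimate_storage_mb() {
  # $1 = backup size in MB, $2 = retention in days (one backup per day)
  echo $(( $1 * $2 ))
}

estimate_monthly_cost_usd() {
  # $1 = stored MB, $2 = price in USD per GB-month (assumed rate, not from this document)
  awk -v mb="$1" -v price="$2" 'BEGIN { printf "%.4f\n", mb / 1000 * price }'
}

estimate_storage_mb 30 30              # prints 900
estimate_monthly_cost_usd 900 0.006    # prints 0.0054
```

At that assumed rate the result rounds to the roughly half a cent per month quoted in the table.
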
### What's Backed Up

| Component | Location | Backup Method | Frequency |
|-----------|----------|---------------|-----------|
| Portainer DB | Atlantis:/portainer | **Backblaze B2** | Daily 3AM |
| Stack definitions | Git repo | Already versioned | On change |
| Container volumes | Per-host | Scheduled rsync | Daily |
| Secrets/Env vars | Portainer | Included in B2 backup | Daily |

---

## Portainer Server Backup

### Active Configuration: Backblaze B2 ✅

Automatic backups are configured via Portainer UI:
- **Settings → Backup configuration → S3 Compatible**

**Current Settings:**
```
S3 Host: https://s3.us-west-004.backblazeb2.com
Bucket: vk-portainer
Region: us-west-004
Schedule: 0 3 * * * (daily at 3 AM)
Encryption: Enabled
```

### Manual Backup via API

```bash
# Trigger immediate backup
# Replace the <...> placeholders with the B2 key pair and the backup password;
# do not commit real credentials to this file.
curl -X POST "http://vishinator.synology.me:10000/api/backup/s3/execute" \
  -H "X-API-Key: REDACTED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "accessKeyID": "<B2_KEY_ID>",
    "secretAccessKey": "<B2_APPLICATION_KEY>",
    "region": "us-west-004",
    "bucketName": "vk-portainer",
    "password": "<BACKUP_PASSWORD>",
    "s3CompatibleHost": "https://s3.us-west-004.backblazeb2.com"
  }'

# Download backup locally (the backup endpoint is a POST;
# an empty password produces an unencrypted archive)
curl -X POST "http://vishinator.synology.me:10000/api/backup" \
  -H "X-API-Key: REDACTED_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"password": ""}' \
  -o "portainer-backup-$(date +%Y%m%d).tar.gz"
```

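
The request above carries the B2 credentials inline. One way to keep them out of files and shell history is to build the JSON body from environment variables, as in the sketch below. The function name and the variable names `B2_KEY_ID`, `B2_APP_KEY`, `BACKUP_PASSWORD` and `PORTAINER_API_KEY` are hypothetical. The sketch does no JSON escaping, so it assumes the values contain no double quotes or backslashes.

```bash
#!/usr/bin/env bash
# Sketch: build the S3 backup payload from environment variables.
# Assumption: values contain no double quotes or backslashes (no JSON escaping is done).

build_s3_backup_payload() {
  # Abort with a message if any required variable is unset or empty.
  : "${B2_KEY_ID:?set B2_KEY_ID}" "${B2_APP_KEY:?set B2_APP_KEY}" "${BACKUP_PASSWORD:?set BACKUP_PASSWORD}"
  printf '{"accessKeyID":"%s","secretAccessKey":"%s","region":"%s","bucketName":"%s","password":"%s","s3CompatibleHost":"%s"}\n' \
    "$B2_KEY_ID" "$B2_APP_KEY" "us-west-004" "vk-portainer" \
    "$BACKUP_PASSWORD" "https://s3.us-west-004.backblazeb2.com"
}

# Usage: pipe the payload to curl, which reads the body from stdin via "-d @-":
#   build_s3_backup_payload | curl -X POST \
#     "http://vishinator.synology.me:10000/api/backup/s3/execute" \
#     -H "X-API-Key: $PORTAINER_API_KEY" \
#     -H "Content-Type: application/json" -d @-
```
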
### Option 2: Volume Backup (Manual)

```bash
# On Atlantis (where Portainer runs)
# Stop Portainer temporarily
docker stop portainer

# Backup the data volume
tar -czvf /volume1/backups/portainer/portainer-$(date +%Y%m%d).tar.gz \
  /volume1/docker/portainer/data

# Restart Portainer
docker start portainer
```

### Option 3: Scheduled Backup Script

Create `/volume1/scripts/backup-portainer.sh`:
```bash
#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

BACKUP_DIR="/volume1/backups/portainer"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Backup Portainer data (hot backup - no downtime)
docker run --rm \
  -v portainer_data:/data \
  -v "$BACKUP_DIR":/backup \
  alpine tar -czvf "/backup/portainer-$DATE.tar.gz" /data

# Cleanup old backups
find "$BACKUP_DIR" -name "portainer-*.tar.gz" -mtime +"$RETENTION_DAYS" -delete

echo "Backup completed: portainer-$DATE.tar.gz"
```

Add to crontab:
```bash
# Daily at 3 AM
0 3 * * * /volume1/scripts/backup-portainer.sh >> /var/log/portainer-backup.log 2>&1
```

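
Because the cleanup step deletes files, it can help to preview what a retention pass would remove before trusting it in cron. The sketch below lists the matching archives without deleting anything. `list_expired_backups` is a hypothetical helper. Unlike the script's `find`, it only looks at the top level of the directory.

```bash
#!/usr/bin/env bash
# Sketch: list the archives a retention pass would delete, without deleting them.

list_expired_backups() {
  # $1 = backup directory, $2 = retention in days
  local dir="$1" days="$2"
  find "$dir" -maxdepth 1 -type f -name 'portainer-*.tar.gz' -mtime +"$days" -print | sort
}

# Usage:
#   list_expired_backups /volume1/backups/portainer 30
```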

---

## Stack Definitions Backup

All stack definitions are stored in Git (git.vish.gg/Vish/homelab), providing:
- ✅ Version history
- ✅ Change tracking
- ✅ Easy rollback
- ✅ Multi-location redundancy

### Git Repository Structure
```
homelab/
├── Atlantis/                      # Atlantis stack configs
├── Calypso/                       # Calypso stack configs
├── homelab_vm/                    # Homelab VM configs
│   ├── monitoring.yaml
│   ├── openhands.yaml
│   ├── ntfy.yaml
│   └── prometheus_grafana_hub/
│       └── alerting/
├── concord_nuc/                   # NUC configs
└── docs/                          # Documentation
```

### Backup Git Repo Locally
```bash
# Clone full repo with history
git clone --mirror https://git.vish.gg/Vish/homelab.git homelab-backup.git

# Update existing mirror
cd homelab-backup.git && git remote update
```

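
For a scheduled mirror job, the clone and update commands can be folded into one idempotent step: clone on the first run, refresh afterwards. This is a minimal sketch. `mirror_repo` is a hypothetical helper, and the destination path in the usage comment is an assumption based on the `git-mirrors/` directory listed under Quick Reference.

```bash
#!/usr/bin/env bash
# Sketch: create the mirror on first run, refresh it on later runs.

mirror_repo() {
  # $1 = repository URL, $2 = local mirror directory
  local url="$1" dest="$2"
  if [ -d "$dest" ]; then
    # Fetch all refs and drop refs that were deleted upstream.
    git -C "$dest" remote update --prune
  else
    git clone --mirror "$url" "$dest"
  fi
}

# Usage:
#   mirror_repo https://git.vish.gg/Vish/homelab.git /volume1/backups/git-mirrors/homelab.git
```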

---

## Container Volume Backup Strategy

### Critical Volumes to Back Up

| Service | Volume Path | Priority | Size |
|---------|-------------|----------|------|
| Grafana | /var/lib/grafana | High | ~500MB |
| Prometheus | /prometheus | Medium | ~2GB |
| ntfy | /var/cache/ntfy | Low | ~100MB |
| Alertmanager | /alertmanager | Medium | ~50MB |

### Backup Script for Homelab VM

Create `/home/homelab/scripts/backup-volumes.sh`:
```bash
#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

BACKUP_DIR="/home/homelab/backups"
DATE=$(date +%Y%m%d)
REMOTE="atlantis:/volume1/backups/homelab-vm"

# Create local backup
mkdir -p "$BACKUP_DIR/$DATE"

# Backup critical volumes
for vol in grafana prometheus alertmanager; do
  docker run --rm \
    -v "${vol}_data":/data \
    -v "$BACKUP_DIR/$DATE":/backup \
    alpine tar -czvf "/backup/${vol}.tar.gz" /data
done

# Sync to remote (Atlantis NAS)
rsync -av --delete "$BACKUP_DIR/$DATE/" "$REMOTE/$DATE/"

# Keep last 7 days locally (-mindepth 1 keeps find from matching $BACKUP_DIR itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +7 -exec rm -rf {} +

echo "Backup completed: $DATE"
```

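
An rsync that exits cleanly does not prove the remote copies are intact later on. One option is to write a checksum manifest next to the archives before syncing, then verify it on the receiving side. The helpers below are hypothetical and assume GNU coreutils `sha256sum` is available on both hosts.

```bash
#!/usr/bin/env bash
# Sketch: record and verify SHA-256 checksums for one day's archives.

write_checksums() {
  # Writes SHA256SUMS next to the *.tar.gz archives in directory $1.
  ( cd "$1" && sha256sum -- *.tar.gz > SHA256SUMS )
}

verify_checksums() {
  # Returns non-zero if any archive in $1 no longer matches SHA256SUMS.
  ( cd "$1" && sha256sum --check --quiet SHA256SUMS )
}

# Usage: on the Homelab VM, before the rsync step:
#   write_checksums "$BACKUP_DIR/$DATE"
# and on Atlantis, against the synced copy:
#   verify_checksums /volume1/backups/homelab-vm/<date>
```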

---

## Disaster Recovery Procedures

### Scenario 1: Portainer Server Failure

**Recovery Steps:**
1. Deploy new Portainer instance on Atlantis
2. Restore from backup
3. Confirm the edge agents have reconnected (they reconnect automatically)

```bash
# Deploy fresh Portainer
docker run -d -p 10000:9000 -p 8000:8000 \
  --name portainer --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ee:latest

# Restore from backup
docker stop portainer
tar -xzvf portainer-backup.tar.gz -C /
docker start portainer
```

### Scenario 2: Edge Agent Failure (e.g., Homelab VM)

**Recovery Steps:**
1. Reinstall Docker on the host
2. Install Portainer agent
3. Redeploy stacks from Git

```bash
# Install Portainer Edge Agent
docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  -v portainer_agent_data:/data \
  --name portainer_edge_agent \
  --restart always \
  -e EDGE=1 \
  -e EDGE_ID=<edge-id> \
  -e EDGE_KEY=<edge-key> \
  -e EDGE_INSECURE_POLL=1 \
  portainer/agent:latest

# Stacks will auto-deploy from Git (if AutoUpdate enabled)
# Or manually trigger via Portainer API
```

### Scenario 3: Complete Infrastructure Loss

**Recovery Priority:**
1. Network (router, switch)
2. Atlantis NAS (Portainer server)
3. Git server (Gitea on Calypso)
4. Edge agents

**Full Recovery Checklist:**
- [ ] Restore network connectivity
- [ ] Boot Atlantis, restore Portainer backup
- [ ] Boot Calypso, verify Gitea accessible
- [ ] Start edge agents on each host
- [ ] Verify all stacks deployed from Git
- [ ] Test alerting notifications
- [ ] Verify monitoring dashboards

---

## Portainer API Backup Commands

### Export All Stack Definitions
```bash
#!/bin/bash
API_KEY=REDACTED_API_KEY
BASE_URL="http://vishinator.synology.me:10000"
OUTPUT_DIR="./portainer-export-$(date +%Y%m%d)"

mkdir -p "$OUTPUT_DIR"

# Get all stacks
curl -s -H "X-API-Key: $API_KEY" "$BASE_URL/api/stacks" | \
  jq -r '.[] | "\(.Id) \(.Name) \(.EndpointId)"' | \
  while read -r id name endpoint; do
    echo "Exporting stack: $name (ID: $id)"
    # The file endpoint returns JSON; the compose text is in StackFileContent
    curl -s -H "X-API-Key: $API_KEY" \
      "$BASE_URL/api/stacks/$id/file" | \
      jq -r '.StackFileContent' > "$OUTPUT_DIR/${name}.yaml"
  done

echo "Exported to $OUTPUT_DIR"
```

### Export Endpoint Configuration
```bash
# Uses API_KEY and BASE_URL as defined in the stack export script
curl -s -H "X-API-Key: $API_KEY" \
  "$BASE_URL/api/endpoints" | jq '.' > endpoints-backup.json
```

---

## Automated Backup Schedule

| Backup Type | Frequency | Retention | Location |
|-------------|-----------|-----------|----------|
| Portainer DB | Daily 3AM | 30 days | Atlantis NAS |
| Git repo mirror | Daily 4AM | Unlimited | Calypso NAS |
| Container volumes | Daily 5AM | 7 days local, 30 days remote | Atlantis NAS |
| Full export | Weekly Sunday | 4 weeks | Off-site (optional) |

---

## Verification & Testing

### Monthly Backup Test Checklist
- [ ] Verify Portainer backup file integrity
- [ ] Test restore to staging environment
- [ ] Verify Git repo clone works
- [ ] Test volume restore for one service
- [ ] Document any issues found

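
The first checklist item can be partly automated for unencrypted `.tar.gz` archives, such as those written by the volume and script-based backups. The sketch below checks that an archive exists, is non-empty, and can be listed by `tar`. `verify_archive` is a hypothetical helper. It does not apply to the encrypted archives stored in B2, which have to be restored through Portainer with the backup password.

```bash
#!/usr/bin/env bash
# Sketch: check that a backup archive is a readable, non-empty gzip tarball.

verify_archive() {
  local file="$1"
  # -s is true only if the file exists and has a size greater than zero.
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # Listing the archive forces tar to decompress and read every member header.
  tar -tzf "$file" > /dev/null 2>&1 || { echo "corrupt archive: $file" >&2; return 1; }
  echo "ok: $file"
}
```

A monthly check could call it for the newest file in each backup directory, for example `verify_archive /volume1/backups/portainer/<archive>.tar.gz`. A listing that succeeds shows the archive is readable, not that a restore works, so the staging restore in the checklist is still needed.
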
### Backup Monitoring
Add to Prometheus alerting:
```yaml
- alert: BackupFailed
  expr: time() - backup_last_success_timestamp > 86400
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Backup hasn't run in 24 hours"
```

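
The alert rule only fires if something exports `backup_last_success_timestamp`. One possible source, assuming node_exporter runs on the backup host with its textfile collector enabled, is for the backup script to write the metric after a successful run. The function name and the collector directory in the usage comment are assumptions, not part of the current setup.

```bash
#!/usr/bin/env bash
# Sketch: publish the time of the last successful backup for node_exporter's
# textfile collector. The directory must match --collector.textfile.directory.

write_backup_metric() {
  # $1 = textfile collector directory
  local dir="$1" ts tmp
  ts="$(date +%s)"
  # The temporary name does not end in .prom, so the collector ignores it.
  tmp="$(mktemp "$dir/backup.prom.XXXXXX")"
  {
    echo '# HELP backup_last_success_timestamp Unix time of the last successful backup.'
    echo '# TYPE backup_last_success_timestamp gauge'
    echo "backup_last_success_timestamp $ts"
  } > "$tmp"
  chmod 644 "$tmp"
  # A rename within one filesystem is atomic, so a partial file is never scraped.
  mv "$tmp" "$dir/backup.prom"
}

# Usage: as the last step of a backup script (the path is an assumption):
#   write_backup_metric /var/lib/node_exporter/textfile_collector
```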

---

## Quick Reference

### Backup Locations
```
Atlantis:/volume1/backups/
├── portainer/          # Portainer DB backups
├── homelab-vm/         # Homelab VM volume backups
├── calypso/            # Calypso volume backups
└── git-mirrors/        # Git repository mirrors
```

### Important Files
- Portainer API Key: `ptr_REDACTED_PORTAINER_TOKEN`
- Git repo: `https://git.vish.gg/Vish/homelab`
- Edge agent keys: Stored in Portainer (Settings → Environments)

### Emergency Contacts
- Synology Support: 1-425-952-7900
- Portainer Support: https://www.portainer.io/support