# Homelab Disaster Recovery Guide

## 🚨 Avoiding the Chicken and Egg Problem

This guide describes how to recover homelab services when the infrastructure normally used to reach or deploy them (DNS, Nginx Proxy Manager, Gitea) is itself down.

## 🎯 Recovery Priority Order

### Phase 1: Core Infrastructure (No Dependencies)
1. **Router/Network** - Physical access required
2. **Calypso Server** - Direct console/SSH access
3. **Basic Docker** - Local container management

### Phase 2: Essential Services (Minimal Dependencies)
1. **Nginx Proxy Manager** - Enables external access
2. **Gitea** - Code repository access
3. **DNS/DHCP** - Network services

### Phase 3: Application Services (Depends on Phase 1+2)
1. **Reactive Resume v5** - Depends on NPM for external access
2. **Other applications** - Can be restored after core services

## 🔧 Emergency Access Methods

### If Gitea is Down
```bash
# Access via direct IP (bypass DNS)
ssh Vish@192.168.0.250 -p 62000

# Local git clone from backup
git clone /volume1/backups/homelab-repo-backup.git

# Manual deployment from local files
scp -P 62000 docker-compose.yml Vish@192.168.0.250:/volume1/docker/service/
```

### If NPM is Down
```bash
# Direct service access via IP:PORT (or open the same URLs in a browser)
curl -I http://192.168.0.250:9751   # Reactive Resume
curl -I http://192.168.0.250:3000   # Gitea
curl -I http://192.168.0.250:81     # NPM Admin (when working)

# Emergency NPM deployment (no GitOps)
ssh Vish@192.168.0.250 -p 62000
sudo /usr/local/bin/docker run -d \
  --name nginx-proxy-manager-emergency \
  -p 8880:80 -p 8443:443 -p 81:81 \
  -v /volume1/docker/nginx-proxy-manager/data:/data \
  -v /volume1/docker/nginx-proxy-manager/letsencrypt:/etc/letsencrypt \
  jc21/nginx-proxy-manager:latest
```

### If DNS is Down
```bash
# Use IP addresses directly:
#   192.168.0.250  Calypso
#   192.168.0.1    Router
#   8.8.8.8        Google DNS

# Add entries to the local hosts file. Writing /etc/hosts needs root, and
# "sudo echo ... >> /etc/hosts" fails because the redirect runs as the calling user.
echo "192.168.0.250 calypso.local git.local" | sudo tee -a /etc/hosts
```
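
Running the append above more than once leaves duplicate lines in the hosts file. The sketch below adds an entry only when the exact line is missing. `add_host_entry` is a hypothetical helper, not something this guide's repository is known to contain, and it takes the file path as an argument so it can be tried on a scratch file first.

```bash
#!/usr/bin/env bash
# Sketch: append a hosts entry only if the exact line is not already present.
# add_host_entry is a hypothetical helper name.

add_host_entry() {
    local hosts_file="$1" ip="$2"
    shift 2
    local entry="$ip $*"

    # -F fixed string, -x whole-line match, -q quiet
    if grep -Fxq -- "$entry" "$hosts_file" 2>/dev/null; then
        echo "already present: $entry"
        return 0
    fi
    printf '%s\n' "$entry" >> "$hosts_file"
    echo "added: $entry"
}

# Usage (run as root for the real file):
#   add_host_entry /etc/hosts 192.168.0.250 calypso.local git.local
```

The match is on the whole line, so an entry for the same IP with different names is treated as new rather than merged.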

## 📦 Offline Deployment Packages

### Create Emergency Deployment Kit
```bash
# Create offline deployment package
mkdir -p /volume1/backups/emergency-kit
cd /home/homelab/organized/repos/homelab

# Package NPM deployment
# -C Calypso stores paths relative to Calypso/, so the archive unpacks to
# nginx_proxy_manager/ (the directory the "Use Emergency Kit" steps cd into)
tar -czf /volume1/backups/emergency-kit/npm-deployment.tar.gz \
  -C Calypso nginx_proxy_manager/

# Package Reactive Resume deployment (unpacks to reactive_resume_v5/)
tar -czf /volume1/backups/emergency-kit/reactive-resume-deployment.tar.gz \
  -C Calypso reactive_resume_v5/

# Package essential configs
tar -czf /volume1/backups/emergency-kit/essential-configs.tar.gz \
  Calypso/*.yaml Calypso/*.yml
```
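
An emergency kit is only useful if the archives are intact when they are needed. As a minimal sketch, the functions below record SHA-256 checksums next to the archives and verify them later. `write_kit_manifest`, `verify_kit_manifest`, and the `SHA256SUMS` file name are assumptions made for illustration, and the sketch assumes `sha256sum` is available on the host.

```bash
#!/usr/bin/env bash
# Sketch: record and verify checksums for the kit archives.
# Function names and the SHA256SUMS file name are hypothetical.

write_kit_manifest() {
    local kit_dir="$1"
    # Subshell keeps the caller's working directory unchanged.
    ( cd "$kit_dir" && sha256sum *.tar.gz > SHA256SUMS )
}

verify_kit_manifest() {
    local kit_dir="$1"
    # Non-zero exit status if any archive is missing or altered.
    ( cd "$kit_dir" && sha256sum -c SHA256SUMS > /dev/null )
}

# Usage:
#   write_kit_manifest  /volume1/backups/emergency-kit   # after creating the kit
#   verify_kit_manifest /volume1/backups/emergency-kit   # before relying on it
```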

### Use Emergency Kit
```bash
# Extract and deploy without Git
ssh Vish@192.168.0.250 -p 62000
cd /volume1/backups/emergency-kit

# Deploy NPM first
tar -xzf npm-deployment.tar.gz
cd nginx_proxy_manager
chmod +x deploy.sh
./deploy.sh deploy

# Deploy Reactive Resume
cd ../
tar -xzf reactive-resume-deployment.tar.gz
cd reactive_resume_v5
chmod +x deploy.sh
./deploy.sh deploy
```
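
The manual steps above keep going even if the NPM deployment fails. A possible wrapper, sketched here, deploys archives in the order given and stops at the first failure. `deploy_from_kit` is a hypothetical helper; it assumes each archive unpacks to a directory containing `deploy.sh`, and the directory name is passed explicitly as `archive:directory` so it can match however the archive was created.

```bash
#!/usr/bin/env bash
# Sketch: extract and deploy kit archives in order, stopping at the first failure.
# deploy_from_kit is a hypothetical helper name.

deploy_from_kit() {
    local kit_dir="$1"
    shift
    local pair archive dir
    for pair in "$@"; do
        archive="${pair%%:*}"   # text before the first colon
        dir="${pair#*:}"        # text after the first colon
        echo "==> $dir"
        tar -xzf "$kit_dir/$archive" -C "$kit_dir" || return 1
        chmod +x "$kit_dir/$dir/deploy.sh" || return 1
        ( cd "$kit_dir/$dir" && ./deploy.sh deploy ) || {
            echo "deploy failed for $dir; stopping" >&2
            return 1
        }
    done
}

# Usage: NPM first, then Reactive Resume.
#   deploy_from_kit /volume1/backups/emergency-kit \
#       npm-deployment.tar.gz:nginx_proxy_manager \
#       reactive-resume-deployment.tar.gz:reactive_resume_v5
```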

## 🔄 Service Dependencies Map

```
Internet Access
       ↓
Router (Physical)
       ↓
Calypso Server (SSH: 192.168.0.250:62000)
       ↓
Docker Engine (Local)
       ↓
┌───────────────────┬───────────────────┐
│ NPM (Port 81)     │ Gitea (Port 3000) │  ← Independent services
└───────────────────┴───────────────────┘
       ↓                    ↓
External Access      Code Repository
       ↓                    ↓
Reactive Resume v5 ← GitOps Deployment
```
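
Because each layer in the map depends on the one above it, a quick way to find where to start is to probe the layers in order and stop at the first one that fails. The sketch below does that. `first_failing_layer` and the `check_*` functions are hypothetical names; the addresses and ports come from this guide, and the `ping -W` timeout flag assumes Linux `iputils`.

```bash
#!/usr/bin/env bash
# Sketch: probe layers in order and print the first one that fails.
# first_failing_layer and the check_* functions are hypothetical.

first_failing_layer() {
    local spec name check
    for spec in "$@"; do
        name="${spec%%=*}"    # label before "="
        check="${spec#*=}"    # function or command name after "="
        if ! "$check" > /dev/null 2>&1; then
            echo "$name"
            return 1
        fi
    done
    echo "none"
}

# Example checks for the local layers (run from a machine on the LAN).
check_router()  { ping -c 1 -W 2 192.168.0.1; }
check_calypso() { ping -c 1 -W 2 192.168.0.250; }
check_npm()     { curl -fsS -m 5 -o /dev/null http://192.168.0.250:81; }
check_gitea()   { curl -fsS -m 5 -o /dev/null http://192.168.0.250:3000; }

# Usage:
#   first_failing_layer Router=check_router Calypso=check_calypso \
#       NPM=check_npm Gitea=check_gitea
```

The function prints `none` and returns 0 when every check passes, so it can also gate a GitOps deployment step.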

## 🚀 Bootstrap Procedures

### Complete Infrastructure Loss
1. **Physical Access**: Console to Calypso
2. **Network Setup**: Configure static IP if DHCP down
3. **Docker Start**: `sudo systemctl start docker`
4. **Manual NPM**: Deploy NPM container directly
5. **Git Access**: Clone from backup or external source
6. **GitOps Resume**: Use deployment scripts

### Partial Service Loss
```bash
# If only applications are down (NPM working)
cd /home/homelab/organized/repos/homelab/Calypso/reactive_resume_v5
./deploy.sh deploy

# If NPM is down (applications working)
cd /home/homelab/organized/repos/homelab/Calypso/nginx_proxy_manager
./deploy.sh deploy

# If Git is down (use local backup)
mkdir -p /tmp/homelab-recovery   # cp fails if the target directory is missing
cp -r /volume1/backups/homelab-latest/* /tmp/homelab-recovery/
cd /tmp/homelab-recovery/Calypso/reactive_resume_v5
./deploy.sh deploy
```

## 📋 Recovery Checklists

### NPM Recovery Checklist
- [ ] Calypso server accessible via SSH
- [ ] Docker service running
- [ ] Port 81 available for admin UI
- [ ] Ports 8880/8443 available for proxy
- [ ] Data directory exists: `/volume1/docker/nginx-proxy-manager/data`
- [ ] SSL certificates preserved: `/volume1/docker/nginx-proxy-manager/letsencrypt`
- [ ] Router port forwarding: 80→8880, 443→8443

### Reactive Resume Recovery Checklist
- [ ] NPM deployed and healthy
- [ ] Database directory exists: `/volume1/docker/rxv5/db`
- [ ] Storage directory exists: `/volume1/docker/rxv5/seaweedfs`
- [ ] Ollama directory exists: `/volume1/docker/rxv5/ollama`
- [ ] SMTP credentials available
- [ ] External domain resolving: `nslookup rx.vish.gg`
- [ ] NPM proxy hosts configured
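
The directory items in the checklists above can be checked in one pass. The following is a small sketch, with `missing_paths` as a hypothetical helper name; it only tests that each path is a directory and says nothing about whether the contents are valid.

```bash
#!/usr/bin/env bash
# Sketch: report which of the given directories are missing.
# missing_paths is a hypothetical helper name.

missing_paths() {
    local path status=0
    for path in "$@"; do
        if [ ! -d "$path" ]; then
            echo "MISSING: $path"
            status=1
        fi
    done
    return "$status"
}

# Usage with the directories named in the checklists:
#   missing_paths \
#       /volume1/docker/nginx-proxy-manager/data \
#       /volume1/docker/nginx-proxy-manager/letsencrypt \
#       /volume1/docker/rxv5/db \
#       /volume1/docker/rxv5/seaweedfs \
#       /volume1/docker/rxv5/ollama
```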

## 🔐 Emergency Credentials

### Default Service Credentials
```bash
# NPM Default (change immediately)
Email: admin@example.com
Password: "REDACTED_PASSWORD"

# Database Credentials (from compose)
User: resumeuser
Password: "REDACTED_PASSWORD"
Database: resume

# SMTP (from environment)
User: your-email@example.com
Password: "REDACTED_PASSWORD" # Stored in compose file
```

### SSH Access
```bash
# Primary access
ssh Vish@192.168.0.250 -p 62000

# If SSH key fails, use password
# Ensure password auth is enabled in emergency
```

## 📞 Emergency Contacts & Resources

### External Resources (No Local Dependencies)
- **Docker Hub**: https://hub.docker.com/
- **Ollama Models**: https://ollama.ai/library
- **GitHub Backup**: https://github.com/yourusername/homelab-backup
- **Documentation**: This file (print/save offline)

### Recovery Commands Reference
```bash
# Check what's running
sudo /usr/local/bin/docker ps -a

# Emergency container cleanup
# WARNING: removes all stopped containers and all unused images without prompting;
# removed images must be pulled again, which needs working internet access
sudo /usr/local/bin/docker system prune -af

# Network troubleshooting
ping 8.8.8.8
nslookup rx.vish.gg
curl -I http://192.168.0.250:81

# Service health checks
curl http://192.168.0.250:9751/health
curl http://192.168.0.250:11434/api/tags
```

## 🎯 Prevention Strategies

### Regular Backups
```bash
# Weekly automated backup
0 2 * * 0 /usr/local/bin/backup-homelab.sh

# Backup script creates:
# - Git repository backup
# - Docker volume backups
# - Configuration exports
# - Emergency deployment kits
```
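
The guide names `backup-homelab.sh` but does not show it. The sketch below illustrates two building blocks such a script might use: a timestamped archive of a directory and pruning of older archives. `snapshot_dir`, `prune_snapshots`, the file naming scheme, and the retention count are all assumptions, not the contents of the real script.

```bash
#!/usr/bin/env bash
# Sketch of building blocks a backup script could use.
# snapshot_dir and prune_snapshots are hypothetical; the real
# backup-homelab.sh is not shown in this guide.

# Create <dest>/<label>-YYYYMMDD-HHMMSS.tar.gz from directory <src>; print its path.
snapshot_dir() {
    local src="$1" dest="$2" label="$3"
    local stamp out
    stamp=$(date +%Y%m%d-%H%M%S)
    out="$dest/$label-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$out"
}

# Keep only the newest <keep> snapshots for <label>.
# Glob results are sorted by name, and the timestamp makes name order = age order.
prune_snapshots() {
    local dest="$1" label="$2" keep="$3"
    local files=() f count i
    for f in "$dest/$label"-*.tar.gz; do
        [ -e "$f" ] && files+=("$f")
    done
    count=${#files[@]}
    for (( i = 0; i < count - keep; i++ )); do
        rm -f -- "${files[$i]}"
    done
}

# Usage (paths from this guide; label and retention are examples):
#   snapshot_dir /home/homelab/organized/repos/homelab /volume1/backups/snapshots homelab-repo
#   prune_snapshots /volume1/backups/snapshots homelab-repo 4
```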

### Health Monitoring
```bash
# Daily health checks
0 8 * * * /usr/local/bin/health-check.sh

# Alerts on:
# - Service failures
# - Disk space issues
# - Network connectivity problems
# - SSL certificate expiration
```
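
`health-check.sh` is likewise referenced without being shown. As a rough sketch of two of the listed alerts (disk space and service failures), the functions below could be a starting point. The function names, the 90% threshold, and the choice of endpoints are assumptions; the addresses and ports are the ones used elsewhere in this guide.

```bash
#!/usr/bin/env bash
# Sketch of checks a health-check script could run.
# Function names and the 90% threshold are assumptions.

# Succeeds if the filesystem holding <path> is at or above <limit> percent used.
disk_over_limit() {
    local path="$1" limit="$2" used
    # df -P prints POSIX-format output; field 5 of line 2 is usage such as "42%".
    used=$(df -P "$path" 2>/dev/null | awk 'NR == 2 { gsub("%", "", $5); print $5 }')
    [ -n "$used" ] && [ "$used" -ge "$limit" ]
}

# Succeeds if <url> answers without an HTTP error (status below 400) within 5 seconds.
http_ok() {
    curl -fsS -m 5 -o /dev/null "$1"
}

# Print one ALERT line per problem; return non-zero if anything failed.
run_health_checks() {
    local failures=0
    disk_over_limit /volume1 90 && { echo "ALERT: /volume1 at or above 90%"; failures=$((failures + 1)); }
    http_ok http://192.168.0.250:81   || { echo "ALERT: NPM admin unreachable"; failures=$((failures + 1)); }
    http_ok http://192.168.0.250:9751 || { echo "ALERT: Reactive Resume unreachable"; failures=$((failures + 1)); }
    [ "$failures" -eq 0 ]
}
```

Note that `disk_over_limit` treats an unreadable path as "not over the limit", so a missing mount would need its own check.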

### Documentation Maintenance
- Keep this file updated with any infrastructure changes
- Test recovery procedures quarterly
- Maintain offline copies of critical documentation
- Document any custom configurations or passwords

---

**Last Updated**: 2026-02-16
**Tested**: Recovery procedures verified
**Next Review**: 2026-05-16