# Homelab Disaster Recovery Guide
## 🚨 Avoiding the Chicken and Egg Problem
This guide describes how to recover homelab services when part of the infrastructure is down, without relying on a service that is itself still waiting to be recovered.
## 🎯 Recovery Priority Order
### Phase 1: Core Infrastructure (No Dependencies)
1. **Router/Network** - Physical access required
2. **Calypso Server** - Direct console/SSH access
3. **Basic Docker** - Local container management
### Phase 2: Essential Services (Minimal Dependencies)
1. **Nginx Proxy Manager** - Enables external access
2. **Gitea** - Code repository access
3. **DNS/DHCP** - Network services
### Phase 3: Application Services (Depends on Phase 1+2)
1. **Reactive Resume v5** - Depends on NPM for external access
2. **Other applications** - Can be restored after core services
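
The priority order above can be turned into a quick triage script that probes each layer in sequence and reports the first one that does not answer. This is a minimal sketch, not part of the repository: the names `port_open`, `first_failure`, and the `PROBE` override are hypothetical, and it assumes a bash build with `/dev/tcp` support plus the coreutils `timeout` command.

```bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device (assumption: bash, not plain sh).
port_open() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Probe command; override PROBE to plug in ping, curl, or a test stub.
PROBE=${PROBE:-port_open}

# Walk "label|host|port" entries in recovery order.
# Prints the label of the first layer that fails, or "none".
first_failure() {
  local entry label host port
  for entry in "$@"; do
    IFS='|' read -r label host port <<<"$entry"
    if ! "$PROBE" "$host" "$port"; then
      echo "$label"
      return 1
    fi
  done
  echo "none"
}
```

For example, `first_failure 'Calypso SSH|192.168.0.250|62000' 'NPM admin|192.168.0.250|81' 'Gitea|192.168.0.250|3000' 'Reactive Resume|192.168.0.250|9751'` prints the first layer that needs attention, using the addresses and ports listed in this guide.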
## 🔧 Emergency Access Methods
### If Gitea is Down
```bash
# Access via direct IP (bypass DNS)
ssh Vish@192.168.0.250 -p 62000
# Local git clone from backup
git clone /volume1/backups/homelab-repo-backup.git
# Manual deployment from local files
scp -P 62000 docker-compose.yml Vish@192.168.0.250:/volume1/docker/service/
```
### If NPM is Down
```bash
# Direct service access via IP:PORT (open in a browser, or probe with curl)
curl -I http://192.168.0.250:9751   # Reactive Resume
curl -I http://192.168.0.250:3000   # Gitea
curl -I http://192.168.0.250:81     # NPM Admin (when working)
# Emergency NPM deployment (no GitOps)
ssh Vish@192.168.0.250 -p 62000
sudo /usr/local/bin/docker run -d \
  --name nginx-proxy-manager-emergency \
  -p 8880:80 -p 8443:443 -p 81:81 \
  -v /volume1/docker/nginx-proxy-manager/data:/data \
  -v /volume1/docker/nginx-proxy-manager/letsencrypt:/etc/letsencrypt \
  jc21/nginx-proxy-manager:latest
```
### If DNS is Down
```bash
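# Use IP addresses directly
ping -c 3 192.168.0.250   # Calypso
ping -c 3 192.168.0.1     # Router
ping -c 3 8.8.8.8         # Google DNS (confirms internet access without DNS)
# Add local hosts entries (writing /etc/hosts requires root)
echo "192.168.0.250 calypso.local git.local" | sudo tee -a /etc/hosts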
```
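Appending to the hosts file repeatedly during an outage leaves duplicate lines behind. The sketch below adds an entry only when the exact line is missing. The function name `add_host_entry` is hypothetical, and the optional third argument exists only so the function can be tried against a scratch file; for `/etc/hosts` itself the script has to run as root.

```bash
# Append "ip names" to a hosts file unless that exact line already exists.
# Usage: add_host_entry IP "name1 name2" [FILE]   (FILE defaults to /etc/hosts)
add_host_entry() {
  local ip=$1 names=$2 file=${3:-/etc/hosts}
  # -x matches whole lines only, -F treats the pattern as a fixed string.
  if grep -qxF "$ip $names" "$file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$names" >> "$file"
}
```

Running `add_host_entry 192.168.0.250 "calypso.local git.local"` twice leaves a single entry in place.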
## 📦 Offline Deployment Packages
### Create Emergency Deployment Kit
```bash
# Create offline deployment package
mkdir -p /volume1/backups/emergency-kit
cd /home/homelab/organized/repos/homelab
# Package NPM deployment
tar -czf /volume1/backups/emergency-kit/npm-deployment.tar.gz \
  Calypso/nginx_proxy_manager/
# Package Reactive Resume deployment
tar -czf /volume1/backups/emergency-kit/reactive-resume-deployment.tar.gz \
  Calypso/reactive_resume_v5/
# Package essential configs
tar -czf /volume1/backups/emergency-kit/essential-configs.tar.gz \
  Calypso/*.yaml Calypso/*.yml
```
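An emergency kit is only useful if its archives are intact when they are needed. One option is to store a checksum manifest next to the archives and verify it before deploying. This is a sketch under assumptions: `write_manifest`, `verify_manifest`, and the `SHA256SUMS` file name are hypothetical, and it assumes GNU `sha256sum` is available on the host.

```bash
# Record a SHA-256 checksum for every .tar.gz archive in the kit directory.
write_manifest() {
  local kit_dir=$1
  ( cd "$kit_dir" && sha256sum -- *.tar.gz > SHA256SUMS )
}

# Re-compute the checksums and compare them with the manifest.
# Returns non-zero if any archive is missing or has changed.
verify_manifest() {
  local kit_dir=$1
  ( cd "$kit_dir" && sha256sum --check --quiet SHA256SUMS )
}
```

Typical use would be `write_manifest /volume1/backups/emergency-kit` right after creating the archives, and `verify_manifest /volume1/backups/emergency-kit` before extracting them.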
### Use Emergency Kit
```bash
# Extract and deploy without Git
ssh Vish@192.168.0.250 -p 62000
cd /volume1/backups/emergency-kit
# Deploy NPM first
# (archives were created from the repo root, so they unpack under Calypso/)
tar -xzf npm-deployment.tar.gz
cd Calypso/nginx_proxy_manager
chmod +x deploy.sh
./deploy.sh deploy
# Deploy Reactive Resume
cd /volume1/backups/emergency-kit
tar -xzf reactive-resume-deployment.tar.gz
cd Calypso/reactive_resume_v5
chmod +x deploy.sh
./deploy.sh deploy
```
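Because `tar` stores paths relative to the directory it was run from, it can help to list an archive before extracting it and confirm where `deploy.sh` will land. A minimal sketch; the function name `find_deploy_script` is hypothetical.

```bash
# Print the path(s) of deploy.sh inside a gzip-compressed tar archive
# without extracting anything.
find_deploy_script() {
  tar -tzf "$1" | grep -E '(^|/)deploy\.sh$'
}
```

For instance, `cd "$(dirname "$(find_deploy_script npm-deployment.tar.gz)")"` after extraction moves into the directory that contains the script, whatever prefix the archive was created with.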
## 🔄 Service Dependencies Map
```
Internet Access
        ↓
Router (Physical)
        ↓
Calypso Server (SSH: 192.168.0.250:62000)
        ↓
Docker Engine (Local)
        ↓
┌───────────────────┬───────────────────┐
│ NPM (Port 81)     │ Gitea (Port 3000) │  ← Independent services
└───────────────────┴───────────────────┘
        ↓                     ↓
  External Access       Code Repository
        ↓                     ↓
  Reactive Resume v5 ← GitOps Deployment
```
## 🚀 Bootstrap Procedures
### Complete Infrastructure Loss
1. **Physical Access**: Console to Calypso
2. **Network Setup**: Configure static IP if DHCP down
3. **Docker Start**: `sudo systemctl start docker`
4. **Manual NPM**: Deploy NPM container directly
5. **Git Access**: Clone from backup or external source
6. **GitOps Resume**: Use deployment scripts
### Partial Service Loss
```bash
# If only applications are down (NPM working)
cd /home/homelab/organized/repos/homelab/Calypso/reactive_resume_v5
./deploy.sh deploy
# If NPM is down (applications working)
cd /home/homelab/organized/repos/homelab/Calypso/nginx_proxy_manager
./deploy.sh deploy
# If Git is down (use local backup)
mkdir -p /tmp/homelab-recovery
cp -r /volume1/backups/homelab-latest/* /tmp/homelab-recovery/
cd /tmp/homelab-recovery/Calypso/reactive_resume_v5
./deploy.sh deploy
```
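The three cases above differ mainly in where the deployment files come from. A small helper can pick the first usable source automatically. This is a sketch only: `pick_repo_root` is a hypothetical name, and treating a directory as usable when it contains a `Calypso/` subdirectory is an assumption based on the paths used in this guide.

```bash
# Print the first candidate directory that contains a Calypso/ subdirectory.
# Returns 1 if none of the candidates qualifies.
pick_repo_root() {
  local candidate
  for candidate in "$@"; do
    if [ -d "$candidate/Calypso" ]; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

A call such as `pick_repo_root /home/homelab/organized/repos/homelab /volume1/backups/homelab-latest` prefers the live checkout and falls back to the backup copy.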
## 📋 Recovery Checklists
### NPM Recovery Checklist
- [ ] Calypso server accessible via SSH
- [ ] Docker service running
- [ ] Port 81 available for admin UI
- [ ] Ports 8880/8443 available for proxy
- [ ] Data directory exists: `/volume1/docker/nginx-proxy-manager/data`
- [ ] SSL certificates preserved: `/volume1/docker/nginx-proxy-manager/letsencrypt`
- [ ] Router port forwarding: 80→8880, 443→8443
### Reactive Resume Recovery Checklist
- [ ] NPM deployed and healthy
- [ ] Database directory exists: `/volume1/docker/rxv5/db`
- [ ] Storage directory exists: `/volume1/docker/rxv5/seaweedfs`
- [ ] Ollama directory exists: `/volume1/docker/rxv5/ollama`
- [ ] SMTP credentials available
- [ ] External domain resolving: `nslookup rx.vish.gg`
- [ ] NPM proxy hosts configured
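
The directory items in the checklists above can be verified in one pass rather than by hand. A minimal sketch, with `missing_dirs` as a hypothetical helper name:

```bash
# Print "MISSING: <path>" for every argument that is not an existing directory.
# Returns 1 if at least one directory is missing, 0 otherwise.
missing_dirs() {
  local d rc=0
  for d in "$@"; do
    if [ ! -d "$d" ]; then
      echo "MISSING: $d"
      rc=1
    fi
  done
  return "$rc"
}
```

For the NPM checklist this would be `missing_dirs /volume1/docker/nginx-proxy-manager/data /volume1/docker/nginx-proxy-manager/letsencrypt`, and for Reactive Resume `missing_dirs /volume1/docker/rxv5/db /volume1/docker/rxv5/seaweedfs /volume1/docker/rxv5/ollama`.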
## 🔐 Emergency Credentials
### Default Service Credentials
```text
# NPM Default (change immediately)
Email: admin@example.com
Password: "REDACTED_PASSWORD"
# Database Credentials (from compose)
User: resumeuser
Password: "REDACTED_PASSWORD"
Database: resume
# SMTP (from environment)
User: your-email@example.com
Password: "REDACTED_PASSWORD" # Stored in compose file
```
### SSH Access
```bash
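# Primary access
ssh Vish@192.168.0.250 -p 62000
# If key authentication fails, force password authentication
# (the server must allow it: PasswordAuthentication yes in sshd_config)
ssh -o PubkeyAuthentication=no -o PreferredAuthentications=password Vish@192.168.0.250 -p 62000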
```
## 📞 Emergency Contacts & Resources
### External Resources (No Local Dependencies)
- **Docker Hub**: https://hub.docker.com/
- **Ollama Models**: https://ollama.ai/library
- **GitHub Backup**: https://github.com/yourusername/homelab-backup
- **Documentation**: This file (print/save offline)
### Recovery Commands Reference
```bash
# Check what's running
sudo /usr/local/bin/docker ps -a
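# Emergency container cleanup
# WARNING: removes all stopped containers and every image not used by a running
# container; pruned images must be pulled again, which needs internet access
sudo /usr/local/bin/docker system prune -af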
# Network troubleshooting
ping 8.8.8.8
nslookup rx.vish.gg
curl -I http://192.168.0.250:81
# Service health checks
curl http://192.168.0.250:9751/health
curl http://192.168.0.250:11434/api/tags
```
## 🎯 Prevention Strategies
### Regular Backups
```bash
# Weekly automated backup
0 2 * * 0 /usr/local/bin/backup-homelab.sh
# Backup script creates:
# - Git repository backup
# - Docker volume backups
# - Configuration exports
# - Emergency deployment kits
```
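The contents of `backup-homelab.sh` are not shown in this guide. As an illustration only, the sketch below shows one way the archive-and-rotate part of such a script could look; `backup_dir`, `prune_old`, and the retention count are hypothetical, and the rotation assumes archive names without newlines.

```bash
# Create a timestamped .tar.gz of a directory inside the destination folder.
backup_dir() {
  local src=$1 dest=$2 stamp
  stamp=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$dest"
  tar -czf "$dest/$(basename "$src")-$stamp.tar.gz" \
    -C "$(dirname "$src")" "$(basename "$src")"
}

# Keep only the newest $keep archives in the destination folder.
prune_old() {
  local dest=$1 keep=$2
  # ls -t sorts newest first; tail skips the first $keep entries.
  ls -1t "$dest"/*.tar.gz 2>/dev/null | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do rm -f -- "$f"; done
}
```

A weekly run could call `backup_dir /home/homelab/organized/repos/homelab /volume1/backups/repo` followed by `prune_old /volume1/backups/repo 8`, where the destination path and the count of 8 are placeholders.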
### Health Monitoring
```bash
# Daily health checks
0 8 * * * /usr/local/bin/health-check.sh
# Alerts on:
# - Service failures
# - Disk space issues
# - Network connectivity problems
# - SSL certificate expiration
```
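`health-check.sh` is likewise not reproduced here. The sketch below illustrates how two of the listed alerts, disk space and SSL certificate expiration, might be checked. All function names are hypothetical, the thresholds are placeholders, and the certificate check assumes the `openssl` command-line tool is installed.

```bash
# Print the used-space percentage (without the % sign) of the filesystem
# that holds the given path. -P forces one-line POSIX output per filesystem.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage of the filesystem is at or above the threshold percent.
disk_over() {
  local threshold=$1 path=$2 used
  used=$(disk_usage_pct "$path")
  [ "$used" -ge "$threshold" ]
}

# Succeed when the PEM certificate expires within the given number of days.
# openssl -checkend exits 0 if the cert is still valid after N seconds, so the
# result is inverted. An unreadable file also counts as "expiring".
cert_expires_within() {
  local days=$1 cert=$2
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1
}
```

A check such as `disk_over 90 /volume1 && echo "Disk almost full"` could then feed whatever alerting mechanism the real script uses.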
### Documentation Maintenance
- Keep this file updated with any infrastructure changes
- Test recovery procedures quarterly
- Maintain offline copies of critical documentation
- Document any custom configurations, and record where each password is stored
---
**Last Updated**: 2026-02-16
**Tested**: Recovery procedures verified
**Next Review**: 2026-05-16