homelab-optimized/docs/troubleshooting/DISASTER_RECOVERY.md
Sanitized mirror from private repository - 2026-04-01 04:44:34 UTC

Homelab Disaster Recovery Guide

🚨 Avoiding the Chicken and Egg Problem

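This guide helps you recover your homelab services even when part of the infrastructure is down. Services are restored in dependency order, so no step relies on a service that has not been brought back yet.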

🎯 Recovery Priority Order

Phase 1: Core Infrastructure (No Dependencies)

  1. Router/Network - Physical access required
  2. Calypso Server - Direct console/SSH access
  3. Basic Docker - Local container management

Phase 2: Essential Services (Minimal Dependencies)

  1. Nginx Proxy Manager - Enables external access
  2. Gitea - Code repository access
  3. DNS/DHCP - Network services

Phase 3: Application Services (Depends on Phase 1+2)

  1. Reactive Resume v5 - Depends on NPM for external access
  2. Other applications - Can be restored after core services

🔧 Emergency Access Methods

If Gitea is Down

# Access via direct IP (bypass DNS)
ssh Vish@192.168.0.250 -p 62000

# Local git clone from backup
git clone /volume1/backups/homelab-repo-backup.git

# Manual deployment from local files
scp -P 62000 docker-compose.yml Vish@192.168.0.250:/volume1/docker/service/

If NPM is Down

# Direct service access via IP:PORT
http://192.168.0.250:9751  # Reactive Resume
http://192.168.0.250:3000  # Gitea
http://192.168.0.250:81    # NPM Admin (when working)

# Emergency NPM deployment (no GitOps)
ssh Vish@192.168.0.250 -p 62000
sudo /usr/local/bin/docker run -d \
  --name nginx-proxy-manager-emergency \
  -p 8880:80 -p 8443:443 -p 81:81 \
  -v /volume1/docker/nginx-proxy-manager/data:/data \
  -v /volume1/docker/nginx-proxy-manager/letsencrypt:/etc/letsencrypt \
  jc21/nginx-proxy-manager:latest
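
`docker run -d` returns as soon as the container is created, so the admin UI on port 81 may not answer right away. A minimal sketch of a retry loop follows; `wait_for` is a hypothetical helper name, not something shipped with the homelab repo, and the attempt count and delay are arbitrary.

```shell
# wait_for is a hypothetical helper.
# Usage: wait_for <attempts> <delay-seconds> <command> [<args>...]
# Runs the command until it succeeds or the attempts are used up.
wait_for() {
  local tries="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up after $tries attempt(s)"
  return 1
}

# Example (assumes curl is installed on the machine running the check):
# wait_for 30 2 curl -fsS -o /dev/null http://192.168.0.250:81
```

The command to run is passed as separate arguments rather than as one string, so no quoting or `eval` is needed.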

If DNS is Down

# Use IP addresses directly
192.168.0.250  # Calypso
192.168.0.1    # Router
8.8.8.8        # Google DNS

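# Edit local hosts file (needs root; `sudo echo ... >> /etc/hosts` would fail
# because the redirection runs as the calling user, so pipe through tee instead)
echo "192.168.0.250 calypso.local git.local" | sudo tee -a /etc/hosts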
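
Appending to the hosts file on every incident leaves duplicate lines behind. The sketch below adds an entry only when the exact line is not already present. `add_host_entry` is a hypothetical helper name, and the hosts file path is a parameter so the function can be tried against a scratch file first.

```shell
# add_host_entry is a hypothetical helper, not part of the homelab repo.
# Usage: add_host_entry <hosts-file> <ip> <name> [<name>...]
add_host_entry() {
  local file="$1" ip="$2"
  shift 2
  local line="$ip $*"
  # -x matches whole lines only, -F treats the pattern as a fixed string
  if grep -qxF -- "$line" "$file" 2>/dev/null; then
    echo "already present: $line"
  else
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
  fi
}

# Example against a scratch file:
# add_host_entry /tmp/hosts.test 192.168.0.250 calypso.local git.local
```

Writing to the real `/etc/hosts` requires root, so a script containing this function would have to be run as root.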

📦 Offline Deployment Packages

Create Emergency Deployment Kit

# Create offline deployment package
mkdir -p /volume1/backups/emergency-kit
cd /home/homelab/organized/repos/homelab

# Package NPM deployment
tar -czf /volume1/backups/emergency-kit/npm-deployment.tar.gz \
  Calypso/nginx_proxy_manager/

# Package Reactive Resume deployment  
tar -czf /volume1/backups/emergency-kit/reactive-resume-deployment.tar.gz \
  Calypso/reactive_resume_v5/

# Package essential configs
tar -czf /volume1/backups/emergency-kit/essential-configs.tar.gz \
  Calypso/*.yaml Calypso/*.yml
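
An emergency kit only helps if its archives are still intact when they are needed. As a sketch, checksums can be recorded next to the archives and re-checked before use. The helper names `write_kit_manifest` and `verify_kit` and the manifest name `SHA256SUMS` are assumptions, and the code assumes `sha256sum` is available on the host.

```shell
# write_kit_manifest and verify_kit are hypothetical helper names.
# Assumes sha256sum is installed.

# Record a checksum for every .tar.gz archive in the kit directory.
write_kit_manifest() {
  local kit_dir="$1"
  ( cd "$kit_dir" && sha256sum -- *.tar.gz > SHA256SUMS )
}

# Re-check the archives against the manifest; exits non-zero on any mismatch.
verify_kit() {
  local kit_dir="$1"
  ( cd "$kit_dir" && sha256sum -c SHA256SUMS )
}

# Example:
# write_kit_manifest /volume1/backups/emergency-kit
# verify_kit /volume1/backups/emergency-kit
```

Each function changes directory inside a subshell, so the caller's working directory is left untouched.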

Use Emergency Kit

# Extract and deploy without Git
ssh Vish@192.168.0.250 -p 62000
cd /volume1/backups/emergency-kit

# Deploy NPM first
# (the archives keep the Calypso/ prefix they were created with)
tar -xzf npm-deployment.tar.gz
cd Calypso/nginx_proxy_manager
chmod +x deploy.sh
./deploy.sh deploy

# Deploy Reactive Resume
cd /volume1/backups/emergency-kit
tar -xzf reactive-resume-deployment.tar.gz
cd Calypso/reactive_resume_v5
chmod +x deploy.sh
./deploy.sh deploy
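
The kit archives are created from the repository root, so their contents are stored under `Calypso/`. To confirm where an archive will unpack before changing into a directory, its top-level entries can be listed without extracting anything. `archive_roots` is a hypothetical helper name.

```shell
# archive_roots is a hypothetical helper: prints the distinct top-level
# entries stored in a gzip-compressed tar archive without extracting it.
archive_roots() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Example:
# archive_roots /volume1/backups/emergency-kit/npm-deployment.tar.gz
```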

🔄 Service Dependencies Map

Internet Access
    ↓
Router (Physical)
    ↓
Calypso Server (SSH: 192.168.0.250:62000)
    ↓
Docker Engine (Local)
    ↓
┌───────────────────┬───────────────────┐
│ NPM (Port 81)     │ Gitea (Port 3000) │ ← Independent services
└───────────────────┴───────────────────┘
    ↓                    ↓
External Access     Code Repository
    ↓                    ↓
Reactive Resume v5 ← GitOps Deployment
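
The map above can be walked top to bottom to find the first broken layer, which is where recovery should start. The sketch below takes one `label=command` pair per layer and prints the first label whose command fails. `first_failing_layer` is a hypothetical helper, and the example checks in the comments are assumptions (the `ping` flags shown are the Linux iputils ones).

```shell
# first_failing_layer is a hypothetical helper.
# Each argument has the form "label=command". Commands run in the order given;
# the first label whose command fails is printed and the function returns 1.
first_failing_layer() {
  local entry label check
  for entry in "$@"; do
    label="${entry%%=*}"   # text before the first "="
    check="${entry#*=}"    # text after the first "="
    if ! sh -c "$check" >/dev/null 2>&1; then
      echo "$label"
      return 1
    fi
  done
  echo "all layers OK"
}

# Example, following the order of the dependency map:
# first_failing_layer \
#   "router=ping -c 1 -W 2 192.168.0.1" \
#   "calypso=ping -c 1 -W 2 192.168.0.250" \
#   "npm=curl -fsS -m 5 -o /dev/null http://192.168.0.250:81" \
#   "gitea=curl -fsS -m 5 -o /dev/null http://192.168.0.250:3000"
```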

🚀 Bootstrap Procedures

Complete Infrastructure Loss

  1. Physical Access: Console to Calypso
  2. Network Setup: Configure static IP if DHCP down
  3. Docker Start: sudo systemctl start docker
  4. Manual NPM: Deploy NPM container directly
  5. Git Access: Clone from backup or external source
  6. GitOps Resume: Use deployment scripts

Partial Service Loss

# If only applications are down (NPM working)
cd /home/homelab/organized/repos/homelab/Calypso/reactive_resume_v5
./deploy.sh deploy

# If NPM is down (applications working)  
cd /home/homelab/organized/repos/homelab/Calypso/nginx_proxy_manager
./deploy.sh deploy

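# If Git is down (use local backup)
mkdir -p /tmp/homelab-recovery
# The trailing /. also copies hidden files, which a * glob would skip
cp -r /volume1/backups/homelab-latest/. /tmp/homelab-recovery/
cd /tmp/homelab-recovery/Calypso/reactive_resume_v5
./deploy.sh deploy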

📋 Recovery Checklists

NPM Recovery Checklist

  • Calypso server accessible via SSH
  • Docker service running
  • Port 81 available for admin UI
  • Ports 8880/8443 available for proxy
  • Data directory exists: /volume1/docker/nginx-proxy-manager/data
  • SSL certificates preserved: /volume1/docker/nginx-proxy-manager/letsencrypt
  • Router port forwarding: 80→8880, 443→8443

Reactive Resume Recovery Checklist

  • NPM deployed and healthy
  • Database directory exists: /volume1/docker/rxv5/db
  • Storage directory exists: /volume1/docker/rxv5/seaweedfs
  • Ollama directory exists: /volume1/docker/rxv5/ollama
  • SMTP credentials available
  • External domain resolving: nslookup rx.vish.gg
  • NPM proxy hosts configured

🔐 Emergency Credentials

Default Service Credentials

# NPM Default (change immediately)
Email: admin@example.com
Password: "REDACTED_PASSWORD"

# Database Credentials (from compose)
User: resumeuser
Password: "REDACTED_PASSWORD"
Database: resume

# SMTP (from environment)
User: your-email@example.com
Password: "REDACTED_PASSWORD" # Stored in compose file

SSH Access

# Primary access
ssh Vish@192.168.0.250 -p 62000

# If key authentication fails, fall back to password login.
# This only works if password authentication is enabled on the server
# (PasswordAuthentication yes in sshd_config).

📞 Emergency Contacts & Resources

External Resources (No Local Dependencies)

Recovery Commands Reference

# Check what's running
sudo /usr/local/bin/docker ps -a

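# Emergency container cleanup
# WARNING: removes all stopped containers, unused networks, and unused images.
# Volumes are kept unless --volumes is added. Run only after noting which
# stopped containers you still need.
sudo /usr/local/bin/docker system prune -af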

# Network troubleshooting
ping 8.8.8.8
nslookup rx.vish.gg
curl -I http://192.168.0.250:81

# Service health checks
curl http://192.168.0.250:9751/health
curl http://192.168.0.250:11434/api/tags
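
A plain `curl` can hang for a long time when a host is unreachable, which is awkward in the middle of an outage. The sketch below wraps the health checks with a timeout and prints only the HTTP status code. `probe` is a hypothetical helper name and the 5-second limit is an arbitrary choice.

```shell
# probe is a hypothetical helper: prints the HTTP status code for a URL,
# or 000 if curl fails or no response arrives within 5 seconds.
probe() {
  local url="$1" code
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")" || code=000
  echo "$code  $url"
}

# Example, using the endpoints listed above:
# probe http://192.168.0.250:81
# probe http://192.168.0.250:9751/health
# probe http://192.168.0.250:11434/api/tags
```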

🎯 Prevention Strategies

Regular Backups

# Weekly automated backup (Sundays at 02:00)
0 2 * * 0 /usr/local/bin/backup-homelab.sh

# Backup script creates:
# - Git repository backup
# - Docker volume backups  
# - Configuration exports
# - Emergency deployment kits
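
The contents of `backup-homelab.sh` are not shown here. As an illustration of the kind of step such a script might contain, the sketch below writes a dated archive of one directory. `archive_dir` is a hypothetical helper, not the real script, and the example paths are assumptions. For database volumes it is generally safer to stop the container before archiving.

```shell
# archive_dir is a hypothetical helper, not the real backup-homelab.sh.
# Usage: archive_dir <source-dir> <dest-dir>
# Writes <dest-dir>/<name>-<YYYYmmdd>.tar.gz and prints the archive path.
archive_dir() {
  local src="$1" dest="$2"
  local name out
  name="$(basename "$src")"
  out="$dest/$name-$(date +%Y%m%d).tar.gz"
  mkdir -p "$dest"
  # -C stores paths relative to the parent of <source-dir>
  tar -czf "$out" -C "$(dirname "$src")" "$name" && echo "$out"
}

# Example (destination path is an assumption):
# archive_dir /volume1/docker/nginx-proxy-manager /volume1/backups/volumes
```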

Health Monitoring

# Daily health checks (every day at 08:00)
0 8 * * * /usr/local/bin/health-check.sh

# Alerts on:
# - Service failures
# - Disk space issues
# - Network connectivity problems
# - SSL certificate expiration
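
The contents of `health-check.sh` are not shown here. As one example of the alerts listed above, a disk space check could look like the sketch below. `disk_pct` and `disk_alert` are hypothetical helper names and the threshold in the example is an assumption.

```shell
# disk_pct is a hypothetical helper: prints the used percentage (an integer)
# of the filesystem that holds the given path, using POSIX df output.
disk_pct() {
  df -P "$1" | awk 'NR==2 { sub("%", "", $5); print $5 }'
}

# disk_alert prints a warning and returns 1 when usage is at or above the limit.
# Usage: disk_alert <path> <limit-percent>
disk_alert() {
  local path="$1" limit="$2" used
  used="$(disk_pct "$path")"
  if [ "$used" -ge "$limit" ]; then
    echo "WARN: $path is ${used}% full (limit ${limit}%)"
    return 1
  fi
  echo "ok: $path is ${used}% full"
}

# Example:
# disk_alert /volume1 90
```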

Documentation Maintenance

  • Keep this file updated with any infrastructure changes
  • Test recovery procedures quarterly
  • Maintain offline copies of critical documentation
  • Document any custom configurations, and record where each password is stored

Last Updated: 2026-02-16
Tested: Recovery procedures verified
Next Review: 2026-05-16