homelab-optimized/docs/troubleshooting/disaster-recovery.md
Sanitized mirror from private repository - 2026-04-05 12:11:15 UTC

🚨 Disaster Recovery Guide

🔴 Advanced Guide

This guide covers critical disaster recovery scenarios for your homelab, including complete router failure, network reconfiguration, and service restoration procedures.

🎯 Disaster Scenarios Covered

  1. 🔥 Router Failure - Complete router replacement and reconfiguration
  2. 🌐 Network Reconfiguration - ISP changes, subnet changes, IP conflicts
  3. 🔌 Power Outage Recovery - Bringing services back online in correct order
  4. 💾 Storage Failure - Data recovery and service restoration
  5. 🔐 Password Manager Outage - Accessing credentials when Vaultwarden is down

🔥 Router Failure Recovery

📋 Pre-Disaster Preparation (Do This Now!)

1. Document Current Network Configuration

# Create network documentation file
mkdir -p ~/homelab-recovery
cat > ~/homelab-recovery/network-config.md << 'EOF'
# Network Configuration Backup

## Router Information
- **Model**: [Your Router Model]
- **Firmware**: [Version]
- **Admin URL**: http://192.168.1.1
- **Admin User**: admin
- **Admin Password**: [Document in password manager]

## Network Settings
- **WAN Type**: DHCP / Static / PPPoE
- **ISP Settings**: [Document ISP-specific settings]
- **Subnet**: 192.168.1.0/24
- **DHCP Range**: 192.168.1.100-192.168.1.200
- **DNS Servers**: 1.1.1.1, 8.8.8.8

## Static IP Assignments
EOF

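# Document all static IPs: append one line per host under the
# "Static IP Assignments" heading already written by the heredoc above
echo "- atlantis.vish.local: 192.168.1.100" >> ~/homelab-recovery/network-config.md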
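
Appending one entry per host can be wrapped in a small helper so every line has the same shape. This is a minimal sketch: `format_reservation` is a hypothetical name, and the list-item layout is an assumption, not a format the guide prescribes.

```shell
# Hypothetical helper: format one static IP reservation as a Markdown list item.
# Arguments: $1 hostname, $2 IP address, $3 MAC address.
format_reservation() {
    printf '%s\n' "- **$1**: $2 (MAC: $3)"
}
```

For example, `format_reservation atlantis.vish.local 192.168.1.100 "[Document MAC]" >> ~/homelab-recovery/network-config.md` adds one entry to the file created above.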

2. Export Router Configuration

# Most routers allow config export
# Login to router web interface
# Look for: System → Backup/Restore → Export Configuration
# Save to: ~/homelab-recovery/router-backup-$(date +%Y%m%d).bin

3. Document Port Forwarding Rules

cat > ~/homelab-recovery/port-forwarding.md << 'EOF'
# Port Forwarding Rules

## Essential Services
| External Port | Internal IP | Internal Port | Protocol | Service |
|---------------|-------------|---------------|----------|---------|
| 51820 | 192.168.1.100 | 51820 | UDP | WireGuard (Atlantis) |
| 51821 | 192.168.1.102 | 51820 | UDP | WireGuard (Concord) |
| 80 | 192.168.1.100 | 8341 | TCP | HTTP (Nginx Proxy) |
| 443 | 192.168.1.100 | 8766 | TCP | HTTPS (Nginx Proxy) |

## Gaming Services (Optional)
| External Port | Internal IP | Internal Port | Protocol | Service |
|---------------|-------------|---------------|----------|---------|
| 7777 | 192.168.1.103 | 7777 | TCP/UDP | Satisfactory |
| 27015 | 192.168.1.103 | 27015 | TCP/UDP | L4D2 Server |

## Dynamic DNS
- **Service**: [Your DDNS Provider]
- **Hostname**: vishinator.synology.me
- **Update URL**: [Document update mechanism]
EOF
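
A router generally cannot forward the same external port and protocol to two different internal hosts, so it can be worth scanning the table for duplicates before relying on it. The sketch below assumes the five-column Markdown layout used above; `find_duplicate_forwards` is a hypothetical name.

```shell
# Hypothetical helper: read a Markdown port-forwarding table on stdin and
# print each "external port/protocol" pair that appears more than once.
find_duplicate_forwards() {
    awk -F'|' '
        # Data rows start with "|" and carry a numeric external port in column 2.
        /^\|/ {
            port = $2; proto = $5
            gsub(/[ \t]/, "", port); gsub(/[ \t]/, "", proto)
            if (port !~ /^[0-9]+$/) next      # skip header and separator rows
            key = port "/" proto
            if (++seen[key] == 2) print key   # report each duplicate once
        }
    '
}
```

Running `find_duplicate_forwards < ~/homelab-recovery/port-forwarding.md` prints nothing when every external port/protocol pair is unique.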

🛠️ Router Replacement Procedure

Step 1: Physical Setup

# 1. Connect new router to modem
# 2. Connect computer directly to router via Ethernet
# 3. Power on router and wait for boot (2-3 minutes)

Step 2: Basic Network Configuration

# Access router admin interface
# Default is usually: http://192.168.1.1 or http://192.168.0.1

# For TP-Link Archer BE800 v1.6: http://192.168.0.1 or http://tplinkwifi.net
# Default login: admin/admin

# If different subnet, find router IP:
ip route | grep default
# or, if the router's hostname contains "router" or "gateway":
arp -a | grep -iE "(router|gateway)"
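
If only the gateway address is needed (for pasting into a browser, say), the `ip route` output can be reduced to a single field. A minimal sketch; `default_gateway` is a hypothetical name and it assumes the usual `default via <ip> dev <iface>` line format.

```shell
# Hypothetical helper: read `ip route` output on stdin and print the
# address of the first default gateway.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}
```

Usage: `ip route | default_gateway`.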

Router Configuration Checklist:

# ✅ Set admin password (use password manager)
# ✅ Configure WAN connection (DHCP/Static/PPPoE)
# ✅ Set WiFi SSID and password
# ✅ Configure subnet: 192.168.1.0/24
# ✅ Set DHCP range: 192.168.1.100-192.168.1.200
# ✅ Configure DNS servers: 1.1.1.1, 8.8.8.8
# ✅ Enable UPnP (if needed)
# ✅ Disable WPS (security)

📖 For TP-Link Archer BE800 v1.6 specific instructions, see: TP-Link Archer BE800 Setup Guide

Step 3: Static IP Assignment

Critical Static IPs (Configure First):

# In router DHCP reservation settings:

# Primary Infrastructure
atlantis.vish.local     → 192.168.1.100  # MAC: [Document MAC]
calypso.vish.local      → 192.168.1.101  # MAC: [Document MAC]
concord-nuc.vish.local  → 192.168.1.102  # MAC: [Document MAC]

# Virtual Machines
homelab-vm.vish.local   → 192.168.1.103  # MAC: [Document MAC]
chicago-vm.vish.local   → 192.168.1.104  # MAC: [Document MAC]
bulgaria-vm.vish.local  → 192.168.1.105  # MAC: [Document MAC]

# Specialized Hosts
anubis.vish.local       → 192.168.1.106  # MAC: [Document MAC]
guava.vish.local        → 192.168.1.107  # MAC: [Document MAC]
setillo.vish.local      → 192.168.1.108  # MAC: [Document MAC]

# Raspberry Pi Cluster
rpi-vish.vish.local     → 192.168.1.109  # MAC: [Document MAC]
rpi-kevin.vish.local    → 192.168.1.110  # MAC: [Document MAC]

# Edge Devices
nvidia-shield.vish.local → 192.168.1.111  # MAC: [Document MAC]

Find MAC Addresses:

# On each host, run:
ip link show | grep -E "(ether|link)"
# or
cat /sys/class/net/eth0/address

# From router, check DHCP client list
# Or use network scanner:
nmap -sn 192.168.1.0/24
arp -a
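
On hosts where the interface is not named `eth0`, the one-line-per-interface form `ip -o link show` is easier to parse. The sketch below prints interface/MAC pairs; `list_macs` is a hypothetical name, and it assumes the standard `link/ether <mac>` token in that output.

```shell
# Hypothetical helper: read `ip -o link show` output on stdin and print
# "<interface> <mac>" for every Ethernet-type interface (loopback is skipped).
list_macs() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "link/ether") {
                name = $2
                sub(/:$/, "", name)   # strip the trailing colon from "eth0:"
                print name, $(i + 1)
            }
        }
    }'
}
```

Usage: `ip -o link show | list_macs`.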

Step 4: Port Forwarding Configuration

Essential Port Forwards (Configure Immediately):

# VPN Access (Highest Priority)
External: 51820/UDP → Internal: 192.168.1.100:51820 (Atlantis WireGuard)
External: 51821/UDP → Internal: 192.168.1.102:51820 (Concord WireGuard)

# Web Services (If needed)
External: 80/TCP → Internal: 192.168.1.100:8341 (HTTP)
External: 443/TCP → Internal: 192.168.1.100:8766 (HTTPS)

Gaming Services (If hosting public games):

# Satisfactory Server
External: 7777/TCP → Internal: 192.168.1.103:7777
External: 7777/UDP → Internal: 192.168.1.103:7777

# Left 4 Dead 2 Server
External: 27015/TCP → Internal: 192.168.1.103:27015
External: 27015/UDP → Internal: 192.168.1.103:27015
External: 27020/UDP → Internal: 192.168.1.103:27020
External: 27005/UDP → Internal: 192.168.1.103:27005

Step 5: Dynamic DNS Configuration

Update DDNS Settings:

# Method 1: Router Built-in DDNS
# Configure in router: Advanced → Dynamic DNS
# Service: [Your provider]
# Hostname: vishinator.synology.me
# Username: [Your DDNS username]
# Password: [Your DDNS password]

# Method 2: Manual Update (if router doesn't support your provider)
# SSH to a homelab host and run:
curl -u "username:password" \
  "https://your-ddns-provider.com/update?hostname=vishinator.synology.me&myip=$(curl -s ifconfig.me)"

Test DDNS:

# Wait 5-10 minutes, then test:
nslookup vishinator.synology.me
dig vishinator.synology.me

# Should return your new external IP
curl ifconfig.me  # Compare with DDNS result
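
The comparison between the DDNS record and the current external IP can be scripted rather than checked by eye. A minimal sketch; `ddns_status` is a hypothetical name, and the two addresses are passed in as arguments so the lookup commands stay up to you.

```shell
# Hypothetical helper: compare the address a DDNS hostname resolves to ($1)
# with the current external address ($2).
ddns_status() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "unknown"      # one of the lookups returned nothing
    elif [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}
```

Usage: `ddns_status "$(dig +short vishinator.synology.me | tail -n 1)" "$(curl -s ifconfig.me)"`. A `mismatch` shortly after an update may only mean the record has not propagated yet.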

🔧 Service Recovery Order

Phase 1: Core Infrastructure (First 30 minutes)

# 1. Verify network connectivity
ping 8.8.8.8
ping google.com

# 2. Check all hosts are reachable
ping atlantis.vish.local
ping calypso.vish.local
ping concord-nuc.vish.local

# 3. Verify DNS resolution
nslookup atlantis.vish.local

Phase 2: Essential Services (Next 30 minutes)

# 4. Check VPN services
# Test WireGuard from external device
# Verify Tailscale connectivity

# 5. Verify password manager
curl -I https://atlantis.vish.local:8222  # Vaultwarden

# 6. Check monitoring
curl -I https://atlantis.vish.local:3000  # Grafana
curl -I https://atlantis.vish.local:3001  # Uptime Kuma

Phase 3: Media and Applications (Next hour)

# 7. Media services
curl -I https://atlantis.vish.local:32400  # Plex
curl -I https://calypso.vish.local:2283   # Immich

# 8. Communication services
curl -I https://homelab-vm.vish.local:8065  # Mattermost

# 9. Development services
curl -I https://atlantis.vish.local:8929    # GitLab
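
The individual `curl -I` checks in the three phases can be folded into one helper that prints a one-line verdict per URL. This is a sketch under assumptions: `classify_http_code` and `check_service` are hypothetical names, the 5-second timeout is arbitrary, and `-k` (skip certificate verification) assumes the services use self-signed certificates.

```shell
# Hypothetical helper: turn an HTTP status code into a short verdict.
# curl reports 000 when no HTTP response was received at all.
classify_http_code() {
    case "$1" in
        2??|3??) echo "up" ;;
        401|403) echo "up (auth required)" ;;
        000|"")  echo "unreachable" ;;
        *)       echo "error ($1)" ;;
    esac
}

# Hypothetical helper: probe one URL and print "<url>: <verdict>".
check_service() {
    code=$(curl -k -s -o /dev/null -m 5 -w '%{http_code}' "$1")
    printf '%s: %s\n' "$1" "$(classify_http_code "$code")"
}
```

Usage: `check_service https://atlantis.vish.local:8222`, repeated for each URL in the order listed above.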

📱 Mobile Hotspot Emergency Access

If your internet is down but you need to configure the router:

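# 1. Connect phone to new router WiFi (this device reaches the router admin page)
# 2. Enable mobile hotspot on another device
# 3. Connect computer to mobile hotspot (for internet: documentation, password manager)
# 4. From the phone on the router WiFi, access router via: http://192.168.1.1
# 5. If the router supports USB tethering or WISP mode, configure WAN to use the hotspot temporarily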

🌐 Network Reconfiguration Scenarios

ISP Changes (New Modem/Different Settings)

Scenario 1: New Cable Modem

# 1. Connect new modem to router WAN port
# 2. Power cycle both devices (modem first, then router)
# 3. Check WAN connection in router interface
# 4. Update DDNS if external IP changed
# 5. Test port forwarding from external network

Scenario 2: Fiber Installation

# 1. Configure router for new connection type
# 2. May need PPPoE credentials from ISP
# 3. Update MTU settings if required (typically 1500 for DHCP/IPoE, 1492 for PPPoE)
# 4. Test speed and latency
# 5. Update monitoring dashboards with new metrics

Scenario 3: Subnet Change Required

# If you need to change from 192.168.1.x to different subnet:

# 1. Plan new IP scheme
# Old: 192.168.1.0/24
# New: 192.168.2.0/24 (example)

# 2. Update router DHCP settings
# 3. Update static IP reservations
# 4. Update all service configurations
# 5. Update Tailscale subnet routes
# 6. Update monitoring configurations
# 7. Update documentation
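
When planning the new IP scheme, keeping each host's last octet makes the mapping mechanical. The sketch below shows that mapping for /24 networks only; `remap_ip` is a hypothetical name, and addresses outside the old prefix are returned unchanged.

```shell
# Hypothetical helper: move an address from one /24 prefix to another,
# keeping the last octet. $1 address, $2 old prefix, $3 new prefix.
remap_ip() {
    case "$1" in
        "$2".*) echo "$3.${1##*.}" ;;   # same /24: swap the prefix
        *)      echo "$1" ;;            # different network: leave as-is
    esac
}
```

For example, `remap_ip 192.168.1.100 192.168.1 192.168.2` yields the new address for Atlantis under the example scheme above.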

IP Conflict Resolution

# If new router uses different default subnet:

# 1. Identify conflicts
nmap -sn 192.168.0.0/24  # Scan new subnet
nmap -sn 192.168.1.0/24  # Scan old subnet

# 2. Choose resolution strategy:
# Option A: Change router to use 192.168.1.x
# Option B: Reconfigure all devices for new subnet

# 3. Update all static configurations
# 4. Update firewall rules
# 5. Update service discovery
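
To see which addresses are actually occupied on each subnet before choosing a strategy, the `nmap -sn` output can be reduced to a plain list of IPs. A minimal sketch; `live_hosts` is a hypothetical name, and it relies on the `Nmap scan report for ...` lines that nmap prints in its normal output.

```shell
# Hypothetical helper: read `nmap -sn` output on stdin and print one
# responding IP address per line.
live_hosts() {
    awk '/^Nmap scan report for/ {
        ip = $NF                 # last field is the IP, possibly in parentheses
        gsub(/[()]/, "", ip)
        print ip
    }'
}
```

Usage: `nmap -sn 192.168.1.0/24 | live_hosts`. Saving the lists for the old and new subnets makes it easy to spot hosts that are still on the old range.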

🔌 Power Outage Recovery

Startup Sequence (Critical Order)

# Phase 1: Infrastructure (0-5 minutes)
# 1. Modem/Internet connection
# 2. Router/Switch
# 3. NAS devices (Atlantis, Calypso) - these take longest to boot

# Phase 2: Core Services (5-10 minutes)
# 4. Primary compute hosts (concord-nuc)
# 5. Virtual machine hosts

# Phase 3: Applications (10-15 minutes)
# 6. Raspberry Pi devices
# 7. Edge devices
# 8. Verify all services are running

Automated Startup Script:

#!/bin/bash
# ~/homelab-recovery/startup-sequence.sh

echo "🔌 Starting homelab recovery sequence..."

# Wait for network
echo "⏳ Waiting for network connectivity..."
while ! ping -c 1 8.8.8.8 >/dev/null 2>&1; do
    sleep 5
done
echo "✅ Network is up"

# Check each host
hosts=(
    "atlantis.vish.local"
    "calypso.vish.local"
    "concord-nuc.vish.local"
    "homelab-vm.vish.local"
    "chicago-vm.vish.local"
    "bulgaria-vm.vish.local"
)

for host in "${hosts[@]}"; do
    echo "🔍 Checking $host..."
    if ping -c 1 "$host" >/dev/null 2>&1; then
        echo "✅ $host is responding"
    else
        echo "❌ $host is not responding"
    fi
done

echo "🎯 Recovery sequence complete"
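
The network wait loop in the script above never gives up, and each host is pinged only once, so a slow-booting NAS may be reported as down. A bounded retry is one way to handle both cases. This is a sketch, not part of the original script; `wait_for` is a hypothetical name.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# $1 max attempts, $2 seconds between attempts, remaining args: the command.
wait_for() {
    wf_attempts=$1
    wf_delay=$2
    shift 2
    wf_n=1
    while ! "$@" >/dev/null 2>&1; do
        if [ "$wf_n" -ge "$wf_attempts" ]; then
            return 1             # give up after the last allowed attempt
        fi
        wf_n=$((wf_n + 1))
        sleep "$wf_delay"
    done
    return 0
}
```

Usage: `wait_for 60 5 ping -c 1 atlantis.vish.local` retries for up to about five minutes and returns non-zero if the host never answers.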

💾 Storage Failure Recovery

Backup Verification

# Before disaster strikes, verify backups exist:

# 1. Docker volume backups
ls -la /volume1/docker/*/
du -sh /volume1/docker/*/

# 2. Configuration backups
find ~/homelab-recovery \( -name "*.yml" -o -name "*.yaml" \)

# 3. Database backups
ls -la /volume1/docker/*/backup/
ls -la /volume1/docker/*/db_backup/
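
Listing a backup directory shows that files exist, but not that they are recent or non-empty. The sketch below checks both for a single file; `backup_is_fresh` is a hypothetical name and the age limit is whatever your backup schedule implies.

```shell
# Hypothetical helper: succeed only if the file exists, is non-empty, and
# was modified less than $2 days ago. $1 file path, $2 max age in days.
backup_is_fresh() {
    [ -s "$1" ] || return 1
    [ -n "$(find "$1" -mtime -"$2" 2>/dev/null)" ]
}
```

Usage: `backup_is_fresh ~/homelab-recovery/backups/vaultwarden-backup.json 7 || echo "Vaultwarden backup is stale or missing"`. This does not prove the backup can be restored; a periodic test restore is still needed.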

Service Restoration Priority

# 1. Password Manager (Vaultwarden) - Need passwords for everything else
# 2. DNS/DHCP (Pi-hole) - Network services
# 3. Monitoring (Grafana/Prometheus) - Visibility into recovery
# 4. VPN (WireGuard/Tailscale) - Remote access
# 5. Media services - Lower priority
# 6. Development services - Lowest priority

🔧 Emergency Toolkit

Essential Recovery Files

Create and maintain these files:

# Create recovery directory
mkdir -p ~/homelab-recovery/{configs,scripts,docs,backups}

# Expected layout (file paths to maintain, not commands to run).
# Files created earlier directly under ~/homelab-recovery/ belong in these subdirectories.

# Network configuration
#   ~/homelab-recovery/docs/network-config.md
#   ~/homelab-recovery/docs/port-forwarding.md
#   ~/homelab-recovery/docs/static-ips.md

# Service configurations
#   ~/homelab-recovery/configs/docker-compose-essential.yml
#   ~/homelab-recovery/configs/nginx-proxy-manager.conf
#   ~/homelab-recovery/configs/wireguard-configs/

# Recovery scripts
#   ~/homelab-recovery/scripts/startup-sequence.sh
#   ~/homelab-recovery/scripts/test-connectivity.sh
#   ~/homelab-recovery/scripts/restore-services.sh

# Backup files
#   ~/homelab-recovery/backups/router-config-$(date +%Y%m%d).bin
#   ~/homelab-recovery/backups/vaultwarden-backup.json
#   ~/homelab-recovery/backups/essential-passwords.txt.gpg

Emergency Contact Information

cat > ~/homelab-recovery/docs/emergency-contacts.md << 'EOF'
# Emergency Contacts

## ISP Support
- **Provider**: [Your ISP]
- **Phone**: [Support number]
- **Account**: [Account number]
- **Service Address**: [Your address]

## Hardware Vendors
- **Router**: [Manufacturer support]
- **NAS**: Synology Support
- **Server**: [Hardware vendor]

## Service Providers
- **Domain Registrar**: [Your registrar]
- **DDNS Provider**: [Your DDNS service]
- **Cloud Backup**: [Your backup service]
EOF

Quick Reference Commands

# Network diagnostics
ping 8.8.8.8                    # Internet connectivity
nslookup google.com             # DNS resolution
ip route                        # Routing table
arp -a                          # ARP table
netstat -rn                     # Network routes

# Service checks
docker ps                       # Running containers
systemctl status tailscaled     # Tailscale status
systemctl status docker         # Docker status

# Port checks
nmap -p 22,80,443,51820 localhost
telnet hostname port
nc -zv hostname port

📋 Recovery Checklists

🔥 Router Failure Checklist

☐ Physical setup (modem → router → computer)
☐ Access router admin interface
☐ Configure basic settings (SSID, password, subnet)
☐ Set static IP reservations for all hosts
☐ Configure port forwarding rules
☐ Update DDNS settings
☐ Test VPN connectivity
☐ Verify all services accessible
☐ Update documentation with any changes
☐ Test from external network

🌐 Network Change Checklist

☐ Document old configuration
☐ Plan new IP scheme
☐ Update router settings
☐ Update static IP reservations
☐ Update service configurations
☐ Update Tailscale subnet routes
☐ Update monitoring dashboards
☐ Update documentation
☐ Test all services
☐ Update backup scripts

🔌 Power Outage Checklist

☐ Wait for stable power (use UPS if available)
☐ Start devices in correct order
☐ Verify network connectivity
☐ Check all hosts are responding
☐ Verify essential services are running
☐ Check for any corrupted data
☐ Update monitoring dashboards
☐ Document any issues encountered

🚨 Emergency Procedures

If Everything is Down

# 1. Stay calm and work systematically
# 2. Check physical connections first
# 3. Verify power to all devices
# 4. Check internet connectivity with direct connection
# 5. Work through recovery checklists step by step
# 6. Document everything for future reference

If You're Locked Out

# 1. Try default router credentials (often admin/admin)
# 2. Look for reset button on router (hold 10-30 seconds)
# 3. Check router label for default WiFi password
# 4. Use mobile hotspot for internet access during recovery
# 5. Access password manager from mobile device if needed

If Services Won't Start

# 1. Check Docker daemon is running
systemctl status docker

# 2. Check disk space
df -h

# 3. Check for port conflicts
netstat -tulpn | grep :port

# 4. Check container logs
docker logs container-name

# 5. Try starting services individually
docker-compose up service-name
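
A full disk is a common reason containers refuse to start, and it is easy to miss in a long `df -h` listing. The sketch below filters the POSIX-format output (`df -P`) for filesystems at or above a threshold; `full_filesystems` is a hypothetical name, and mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <use%>" for filesystems at or above $1 percent used.
full_filesystems() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)       # "95%" -> "95"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}
```

Usage: `df -P | full_filesystems 90`.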


💡 Pro Tip: Practice these procedures when everything is working! Run through the checklists quarterly to ensure your documentation is current and you're familiar with the process. A disaster is not the time to learn these procedures for the first time.