# Service Migration Runbook

## Overview
This runbook guides you through migrating a containerized service from one host to another in the homelab. The procedure minimizes downtime and ensures data integrity throughout the migration.

## Prerequisites
- [ ] SSH access to both source and target hosts
- [ ] Sufficient disk space on target host
- [ ] Network connectivity between hosts (Tailscale recommended)
- [ ] Service backup completed and verified
- [ ] Maintenance window scheduled (if downtime required)
- [ ] Portainer access for both hosts

## Metadata
- **Estimated Time**: 1-3 hours (depending on data size)
- **Risk Level**: Medium-High (data migration involved)
- **Requires Downtime**: Yes (typically 15-60 minutes)
- **Reversible**: Yes (can roll back to source host)
- **Tested On**: 2026-02-14

## When to Migrate Services

Common reasons for service migration:

| Scenario | Example | Recommended Target |
|----------|---------|-------------------|
| **Resource constraints** | NAS running out of CPU | Move to NUC or VM |
| **Storage constraints** | Running out of disk space | Move to larger NAS |
| **Performance issues** | High I/O affecting other services | Move to dedicated host |
| **Host consolidation** | Reducing number of active hosts | Consolidate to primary hosts |
| **Hardware maintenance** | Planned hardware upgrade | Temporary or permanent move |
| **Improved organization** | Group related services | Move to appropriate host |

## Migration Types

### Type 1: Simple Migration (Stateless Service)
- No persistent data
- Can be redeployed from scratch
- Example: Nginx, static web servers
- **Downtime**: Minimal (5-15 minutes)

### Type 2: Standard Migration (Small Data)
- Persistent data < 10GB
- Configuration and databases
- Example: Uptime Kuma, AdGuard Home
- **Downtime**: 15-30 minutes

### Type 3: Large Data Migration
- Persistent data > 10GB
- Media libraries, large databases
- Example: Plex, Immich, Jellyfin
- **Downtime**: 1-4 hours (depending on size)

## Pre-Migration Planning

### Step 1: Assess the Service

```bash
# SSH to source host
ssh [source-host]

# Identify container and volumes
docker ps | grep [service-name]
# Print every mount as JSON (grep -A 10 can cut the list short)
docker inspect --format '{{json .Mounts}}' [service-name]

# Check data size (adjust the paths to the container's own mount points)
docker exec [service-name] du -sh /config /data

# List all volumes used by service
docker volume ls | grep [service-name]

# Check volume sizes
docker system df -v | grep [service-name]
```

Document findings:
- Container name: ___________
- Image and tag: ___________
- Data size: ___________
- Volume count: ___________
- Network dependencies: ___________
- Port mappings: ___________

### Step 2: Check Target Host Capacity

```bash
# SSH to target host
ssh [target-host]

# Check available resources
df -h                  # Disk space
free -h                # RAM
nproc                  # CPU cores
docker ps -q | wc -l   # Running container count (-q omits the header row)

# Check port conflicts (no output means the port is free;
# use `ss -tlnp` where netstat is not installed)
netstat -tlnp | grep ":[required-port] "
```

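
The disk-space check above can be reduced to a pass/fail test. The following is a minimal sketch rather than part of the tested procedure: `has_capacity` is a hypothetical helper name, and the 20% headroom default is an assumption to tune for your hosts.

```bash
# Hypothetical helper: succeed if the target has room for the data plus headroom.
# Usage: has_capacity REQUIRED_KB AVAILABLE_KB [HEADROOM_PERCENT]
has_capacity() {
  local required_kb=$1 available_kb=$2
  local headroom_pct=${3:-20}   # 20% is an assumed default, not a runbook value
  # Integer arithmetic: required * (100 + headroom) / 100, rounded down
  local needed_kb=$(( required_kb * (100 + headroom_pct) / 100 ))
  [ "$available_kb" -ge "$needed_kb" ]
}
```

One way to feed it: `du -sk /volume1/docker/[service-name] | cut -f1` on the source gives the required size in KiB, and `df -Pk /path/to/docker | awk 'NR==2 {print $4}'` on the target gives the available KiB.
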
### Step 3: Create Migration Plan

**Downtime Window**:
- Start: ___________
- End: ___________
- Duration: ___________

**Dependencies**:
- Services that depend on this: ___________
- Services this depends on: ___________

**Notification**:
- Who to notify: ___________
- When to notify: ___________

## Migration Procedure

### Method A: GitOps Migration (Recommended)

Best for: Most services with proper version control

#### Step 1: Backup Current Service

```bash
# SSH to source host
ssh [source-host]

# Create backup (stop first so the data is consistent)
# Note: docker export captures the container filesystem only, not volume contents
docker stop [service-name]
docker export [service-name] > /tmp/[service-name]-backup.tar

# Backup volumes (mounted read-only so the backup cannot alter them)
for vol in $(docker volume ls -q | grep [service-name]); do
  docker run --rm -v "$vol":/source:ro -v /tmp:/backup alpine \
    tar czf "/backup/$vol.tar.gz" -C /source .
done

# Copy backups to safe location
# (leading * also matches volume names that carry a stack prefix)
scp /tmp/*[service-name]*.tar* [backup-location]:~/backups/
```

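
The prerequisites call for a verified backup, so it may help to confirm each volume archive is readable before moving on. A minimal sketch, assuming the gzip-compressed `.tar.gz` volume archives created above; `verify_archive` is a hypothetical helper name. Listing an archive only shows that the gzip stream and tar structure are intact, not that the contents match the live data.

```bash
# Hypothetical helper: succeed only if the gzip'd tar archive can be listed end to end.
verify_archive() {
  local archive=$1
  [ -s "$archive" ] || return 1          # missing or empty file
  tar tzf "$archive" > /dev/null 2>&1    # non-zero exit on truncation or corruption
}
```

For example, `for f in /tmp/*[service-name]*.tar.gz; do verify_archive "$f" || echo "BAD: $f"; done` flags any archive that fails the check.
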
#### Step 2: Export Configuration

```bash
# On your workstation: save a copy of the current docker-compose configuration
cd ~/Documents/repos/homelab
cp hosts/[source-host]/[service-name].yaml /tmp/service-config.yaml

# On the source host: note environment variables (one per line)
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' [service-name]
```

#### Step 3: Copy Data to Target Host

**For Small Data (< 10GB)**: Use SCP
```bash
# From your workstation
scp -r [source-host]:/volume1/docker/[service-name] /tmp/
scp -r /tmp/[service-name] [target-host]:/path/to/docker/
```

**For Large Data (> 10GB)**: Use Rsync
```bash
# From source host to target host via Tailscale
ssh [source-host]
rsync -avz --progress /volume1/docker/[service-name]/ \
  [target-host-tailscale-ip]:/path/to/docker/[service-name]/

# Monitor progress (run on the target host in a second terminal)
watch -n 5 'du -sh /path/to/docker/[service-name]'
```

**For Very Large Data (> 100GB)**: Consider physical transfer
```bash
# Copy to USB drive, physically move, then copy to target
# Or use network-attached storage as intermediate
```

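
The size tiers above can be written down as a small decision function. This is only an illustrative sketch: `choose_transfer_method` is a hypothetical helper, and because the runbook does not say which tier sizes of exactly 10 GB or 100 GB fall into, placing them in the lower tier is an assumption.

```bash
# Hypothetical helper: map a data size in whole GB to the transfer method named above.
# Sizes of exactly 10 and 100 GB go to the lower tier (an assumption).
choose_transfer_method() {
  local size_gb=$1
  if [ "$size_gb" -le 10 ]; then
    echo "scp"
  elif [ "$size_gb" -le 100 ]; then
    echo "rsync"
  else
    echo "physical"
  fi
}
```

For instance, `choose_transfer_method 42` prints `rsync`.
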
#### Step 4: Stop Service on Source Host

```bash
# SSH to source host
ssh [source-host]

# Stop the container
docker stop [service-name]

# Verify it's stopped
docker ps -a | grep [service-name]
```

#### Step 5: Update Git Configuration

```bash
# On your workstation
cd ~/Documents/repos/homelab

# Move service definition to new host
git mv hosts/[source-host]/[service-name].yaml \
  hosts/[target-host]/[service-name].yaml

# Update paths in the configuration file if needed
nano hosts/[target-host]/[service-name].yaml

# Update volume paths for target host
# Atlantis/Calypso: /volume1/docker/[service-name]
# NUC/VM: /home/user/docker/[service-name]
# Raspberry Pi: /home/pi/docker/[service-name]

# Commit changes
git add hosts/[target-host]/[service-name].yaml
git commit -m "Migrate [service-name] from [source-host] to [target-host]

- Move service configuration
- Update volume paths for target host
- Migration date: $(date +%Y-%m-%d)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

git push origin main
```

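
If you would rather script the volume-path update than edit the file by hand, something like the following could work. It is a sketch under stated assumptions: `rewrite_volume_path` is a hypothetical helper, and it assumes neither path contains `|`, `&`, or sed regex metacharacters (a `.` in the old path will match any character).

```bash
# Hypothetical helper: replace every occurrence of OLD_PATH with NEW_PATH in FILE.
# Assumes the paths contain no '|', '&', or regex metacharacters other than '.'.
rewrite_volume_path() {
  local file=$1 old_path=$2 new_path=$3
  # Write to a temp file and move it into place, because the in-place
  # flag (sed -i) takes different arguments on GNU and BSD sed
  sed "s|$old_path|$new_path|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

A call such as `rewrite_volume_path hosts/[target-host]/[service-name].yaml /volume1/docker /home/user/docker` would cover a move from a Synology host to a VM; review the result with `git diff` before committing.
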
#### Step 6: Deploy on Target Host

**Via Portainer UI**:
1. Open Portainer → Select target host endpoint
2. Go to **Stacks** → **Add stack** → **Git Repository**
3. Configure:
   - Repository URL: Your git repository
   - Compose path: `hosts/[target-host]/[service-name].yaml`
   - Enable GitOps (optional)
4. Click **Deploy the stack**

**Via GitOps Auto-Sync**:
- Wait 5-10 minutes for automatic deployment
- Monitor Portainer for new stack appearance

#### Step 7: Verify Migration

```bash
# SSH to target host
ssh [target-host]

# Check container is running
docker ps | grep [service-name]

# Check logs for errors
docker logs [service-name] --tail 100

# Test service accessibility
curl http://localhost:[port]  # Internal
curl https://[service].vish.gg  # External (if applicable)

# Verify data integrity
docker exec [service-name] ls -lah /config
docker exec [service-name] ls -lah /data

# Check resource usage
docker stats [service-name] --no-stream
```

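
Listing directories confirms that files exist but not that their contents survived the copy. One possible stronger check is to compare a single digest of the data directory on each host. This is a sketch, not part of the tested procedure: `dir_checksum` is a hypothetical helper, and it assumes GNU findutils and coreutils (`find -print0`, `sort -z`, `xargs -0 -r`, `sha256sum`) are available on both hosts.

```bash
# Hypothetical helper: print one SHA-256 digest summarizing every file under DIR.
# Hashes each file, orders the "hash  ./path" lines by path, then hashes that list,
# so equal relative paths with equal contents give equal digests.
dir_checksum() {
  local dir=$1
  ( cd "$dir" &&
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum |
      sha256sum | cut -d' ' -f1 )
}
```

Run `dir_checksum /volume1/docker/[service-name]` on the source and `dir_checksum /path/to/docker/[service-name]` on the target, ideally while the service is stopped on both sides, since a running service modifies its own files. Matching digests indicate the copies are identical.
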
#### Step 8: Update DNS/Reverse Proxy (If Applicable)

```bash
# Update Nginx Proxy Manager or reverse proxy configuration
# Point [service].vish.gg to new host IP

# Update Cloudflare DNS if using Cloudflare Tunnels

# Update local DNS (AdGuard Home) if applicable
```

#### Step 9: Remove from Source Host

**Only after verifying target is working correctly!**

```bash
# SSH to source host
ssh [source-host]

# Remove container
docker stop [service-name]
docker rm [service-name]

# Optional: Remove volumes (only if data copied successfully)
# docker volume rm $(docker volume ls -q | grep [service-name])

# Remove data directory
rm -rf /volume1/docker/[service-name]  # BE CAREFUL!

# Remove from Portainer if manually managed
# Portainer UI → Stacks → Remove stack
```

### Method B: Manual Export/Import

Best for: Quick migrations without git changes, or when testing

#### Step 1: Stop and Export

```bash
# SSH to source host
ssh [source-host]

# Stop service
docker stop [service-name]

# Export volume data
docker run --rm \
  -v [service-name]_data:/source \
  -v /tmp:/backup \
  alpine tar czf /backup/[service-name]-data.tar.gz -C /source .

# Export configuration
docker inspect [service-name] > /tmp/[service-name]-config.json
```

#### Step 2: Transfer to Target

```bash
# Copy data to target host
scp /tmp/[service-name]-data.tar.gz [target-host]:/tmp/
scp /tmp/[service-name]-config.json [target-host]:/tmp/
```

#### Step 3: Import on Target

```bash
# SSH to target host
ssh [target-host]

# Create volume
docker volume create [service-name]_data

# Import data
docker run --rm \
  -v [service-name]_data:/target \
  -v /tmp:/backup \
  alpine tar xzf /backup/[service-name]-data.tar.gz -C /target

# Create and start container using saved configuration
# Adjust paths and ports as needed
docker create --name [service-name] \
  [options-from-config.json] \
  [image:tag]

docker start [service-name]
```

## Post-Migration Tasks

### Update Documentation

```bash
# Update service inventory
nano docs/services/VERIFIED_SERVICE_INVENTORY.md

# Update the host column for migrated service
# | Service | Host | Port | URL | Status |
# | Service | [NEW-HOST] | 8080 | https://service.vish.gg | ✅ Active |
```

### Update Monitoring

```bash
# Update Prometheus configuration if needed
nano prometheus/prometheus.yml

# Update target host IP for scraped metrics
# Restart Prometheus if configuration changed
```

### Test Backups

```bash
# Verify backups work on new host
./backup.sh --test

# Ensure service data is included in backup
ls -lah /path/to/backups/[service-name]
```

### Performance Baseline

```bash
# Document baseline performance on new host
docker stats [service-name] --no-stream

# Monitor for 24 hours to ensure stability
```

## Verification Checklist

- [ ] Service running on target host: `docker ps`
- [ ] All data migrated correctly
- [ ] Configuration preserved
- [ ] Logs show no errors: `docker logs [service]`
- [ ] External access works (if applicable)
- [ ] Internal service connectivity works
- [ ] Reverse proxy updated (if applicable)
- [ ] DNS records updated (if applicable)
- [ ] Monitoring updated
- [ ] Documentation updated
- [ ] Backups include new location
- [ ] Old host cleaned up
- [ ] Users notified of any URL changes

## Rollback Procedure

If migration fails or causes issues:

### Quick Rollback (Within 24 hours)

```bash
# SSH to source host
ssh [source-host]

# Restore from backup (docker import creates an image from the exported
# filesystem; it does not recreate the container or its volumes)
docker import /tmp/[service-name]-backup.tar [service-name]:backup

# Or redeploy from git (on your workstation; assumes the migration
# commit is the most recent one)
cd ~/Documents/repos/homelab
git revert HEAD
git push origin main

# Restart service on source host
# Via Portainer or, if the original container still exists:
docker start [service-name]
```

### Full Rollback (After cleanup)

```bash
# Restore from backup
./restore.sh [backup-date]

# Redeploy to original host
# Follow original deployment procedure
```

## Troubleshooting

### Issue: Data Transfer Very Slow

**Symptoms**: Rsync taking hours for moderate data

**Solutions**:
```bash
# Use compression for better network performance
# (on a fast LAN, dropping -z can be quicker because compression costs CPU)
rsync -avz --compress-level=6 --progress /source/ [target]:/dest/

# Or use parallel transfer tools
# Install: sudo apt-get install parallel
# (rsync -R recreates each file's relative path under /dest/)
cd /source && find . -type f -print0 | parallel -0 -j 4 rsync -aR {} [target]:/dest/

# For extremely large transfers, consider:
# 1. Physical USB drive transfer
# 2. NFS mount between hosts
# 3. Transfer during off-peak hours
```

### Issue: Service Won't Start on Target Host

**Symptoms**: Container starts then immediately exits

**Solutions**:
```bash
# Check logs
docker logs [service-name]

# Common issues:
# 1. Path issues - Update volume paths in compose file
# 2. Permission issues - Check PUID/PGID
# 3. Port conflicts - Check if port already in use
# 4. Missing dependencies - Ensure all required services running

# Fix permissions on the host data directory (docker exec cannot be used
# while the container is not running); match UID:GID to PUID/PGID
sudo chown -R 1000:1000 /path/to/docker/[service-name]
```

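
Before changing ownership recursively, it can be useful to see how many files are actually affected. A minimal sketch, assuming a `find` that supports `-uid` and `-gid` (GNU and BSD do); `count_foreign_owned` is a hypothetical helper name.

```bash
# Hypothetical helper: count entries under DIR (including DIR itself)
# whose owner is not UID or whose group is not GID.
count_foreign_owned() {
  local dir=$1 uid=$2 gid=$3
  find "$dir" \( ! -uid "$uid" -o ! -gid "$gid" \) | wc -l
}
```

For example, `count_foreign_owned /path/to/docker/[service-name] 1000 1000` printing a non-zero number suggests the copy did not preserve the ownership the container expects from its PUID/PGID settings.
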
### Issue: Lost Configuration Data

**Symptoms**: Service starts but settings are default

**Solutions**:
```bash
# Check if volumes mounted correctly
docker inspect [service-name] | grep -A 10 Mounts

# Restore configuration from backup
docker stop [service-name]
docker run --rm -v [service-name]_config:/target -v /tmp:/backup alpine \
  tar xzf /backup/config-backup.tar.gz -C /target
docker start [service-name]
```

### Issue: Network Connectivity Problems

**Symptoms**: Service can't reach other services

**Solutions**:
```bash
# Check network configuration
docker network ls
docker network inspect [network-name]

# Add service to required networks
docker network connect [network-name] [service-name]

# Verify DNS resolution
docker exec [service-name] ping [other-service]
```

## Migration Examples

### Example 1: Migrate Uptime Kuma from Calypso to Homelab VM

```bash
# 1. Backup on Calypso (-C stores paths relative to /volume1/docker)
ssh calypso
docker stop uptime-kuma
tar czf /tmp/uptime-kuma-data.tar.gz -C /volume1/docker uptime-kuma

# 2. Transfer (from Calypso), then unpack on the VM
scp /tmp/uptime-kuma-data.tar.gz homelab-vm:/tmp/
ssh homelab-vm 'mkdir -p /home/user/docker && tar xzf /tmp/uptime-kuma-data.tar.gz -C /home/user/docker'

# 3. Update git (on your workstation)
cd ~/Documents/repos/homelab
git mv hosts/synology/calypso/uptime-kuma.yaml \
  hosts/vms/homelab-vm/uptime-kuma.yaml
# Update paths in file
sed -i 's|/volume1/docker/uptime-kuma|/home/user/docker/uptime-kuma|g' \
  hosts/vms/homelab-vm/uptime-kuma.yaml

# 4. Deploy on target
git add . && git commit -m "Migrate Uptime Kuma to Homelab VM" && git push

# 5. Verify and cleanup Calypso
```

### Example 2: Migrate AdGuard Home between Hosts

```bash
# AdGuard Home requires DNS configuration updates
# 1. Note current DNS settings on clients
# 2. Migrate service (as above)
# 3. Update client DNS to point to new host IP
# 4. Test DNS resolution from clients
```

## Related Documentation

- [Add New Service](add-new-service.md)
- [Infrastructure Overview](../infrastructure/INFRASTRUCTURE_OVERVIEW.md)
- [Backup Strategies](../admin/backup-strategies.md)
- [Deployment Workflow](../admin/DEPLOYMENT_WORKFLOW.md)

## Change Log

- 2026-02-14 - Initial creation with multiple migration methods
- 2026-02-14 - Added large data migration strategies