# Service Migration Runbook

## Overview
This runbook guides you through migrating a containerized service from one host to another in the homelab. The procedure minimizes downtime and ensures data integrity throughout the migration.

## Prerequisites
- [ ] SSH access to both source and target hosts
- [ ] Sufficient disk space on target host
- [ ] Network connectivity between hosts (Tailscale recommended)
- [ ] Service backup completed and verified
- [ ] Maintenance window scheduled (if downtime required)
- [ ] Portainer access for both hosts

## Metadata
- **Estimated Time**: 1-3 hours (depending on data size)
- **Risk Level**: Medium-High (data migration involved)
- **Requires Downtime**: Yes (typically 15-60 minutes)
- **Reversible**: Yes (can roll back to source host)
- **Tested On**: 2026-02-14

## When to Migrate Services

Common reasons for service migration:

| Scenario | Example | Recommended Target |
|----------|---------|-------------------|
| **Resource constraints** | NAS running out of CPU | Move to NUC or VM |
| **Storage constraints** | Running out of disk space | Move to larger NAS |
| **Performance issues** | High I/O affecting other services | Move to dedicated host |
| **Host consolidation** | Reducing number of active hosts | Consolidate to primary hosts |
| **Hardware maintenance** | Planned hardware upgrade | Temporary or permanent move |
| **Improved organization** | Group related services | Move to appropriate host |

## Migration Types

### Type 1: Simple Migration (Stateless Service)
- No persistent data
- Can be redeployed from scratch
- Example: Nginx, static web servers
- **Downtime**: Minimal (5-15 minutes)

### Type 2: Standard Migration (Small Data)
- Persistent data < 10GB
- Configuration and databases
- Example: Uptime Kuma, AdGuard Home
- **Downtime**: 15-30 minutes

### Type 3: Large Data Migration
- Persistent data > 10GB
- Media libraries, large databases
- Example: Plex, Immich, Jellyfin
- **Downtime**: 1-4 hours (depending on size)

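The size thresholds above can be turned into a quick check. The following is a minimal sketch, not part of the original procedure: the function name `classify_migration` and the convention of passing the size in whole gigabytes are assumptions, and because the runbook does not say which type applies at exactly 10GB, this sketch arbitrarily treats it as Type 2.

```bash
# Hypothetical helper: map a persistent-data size (whole GB) to the
# migration types listed above. A size of 0 means "no persistent data".
classify_migration() {
  local size_gb="$1"
  # Accept only non-negative whole numbers.
  case "$size_gb" in
    ""|*[!0-9]*)
      echo "usage: classify_migration <size-in-GB>" >&2
      return 2
      ;;
  esac
  if [ "$size_gb" -eq 0 ]; then
    echo "Type 1: Simple Migration"
  elif [ "$size_gb" -le 10 ]; then
    echo "Type 2: Standard Migration"
  else
    echo "Type 3: Large Data Migration"
  fi
}
```

For example, `classify_migration 250` selects the large-data path, which in turn suggests rsync rather than SCP for the copy step.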
## Pre-Migration Planning

### Step 1: Assess the Service

```bash
# SSH to source host
ssh [source-host]

# Identify container and volumes
docker ps | grep [service-name]
docker inspect [service-name] | grep -A 10 Mounts

# Check data size (adjust /config and /data to the image's mount points)
docker exec [service-name] du -sh /config /data

# List all volumes used by service
docker volume ls | grep [service-name]

# Check volume sizes
docker system df -v | grep [service-name]
```

Document findings:
- Container name: ___________
- Image and tag: ___________
- Data size: ___________
- Volume count: ___________
- Network dependencies: ___________
- Port mappings: ___________

### Step 2: Check Target Host Capacity

```bash
# SSH to target host
ssh [target-host]

# Check available resources
df -h                  # Disk space
free -h                # RAM
nproc                  # CPU cores
docker ps -q | wc -l   # Running container count (-q omits the header row)

# Check port conflicts (use `ss -tlnp` if netstat is not installed)
netstat -tlnp | grep [required-port]
```

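Reading `df -h` by eye works, but the comparison against the service's data size can also be scripted. The sketch below is one possible approach, not something the original runbook prescribes: the function name `has_free_space` and the default 20% safety margin are assumptions chosen for illustration.

```bash
# Hypothetical helper: succeed only if the filesystem holding <path> has at
# least <required_kb> kilobytes free plus a safety margin (default 20%).
has_free_space() {
  local path="$1" required_kb="$2" margin_pct="${3:-20}"
  local available_kb needed_kb
  # df -Pk prints POSIX-format output in 1024-byte units; column 4 is "Available".
  available_kb=$(df -Pk "$path" | awk 'NR==2 {print $4}')
  [ -n "$available_kb" ] || return 1
  needed_kb=$(( required_kb + required_kb * margin_pct / 100 ))
  [ "$available_kb" -ge "$needed_kb" ]
}
```

On the target host this could be called as `has_free_space /path/to/docker 5242880` for a service with roughly 5 GB of data, where the size in kilobytes comes from `du -sk` on the source host.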
### Step 3: Create Migration Plan

**Downtime Window**:
- Start: ___________
- End: ___________
- Duration: ___________

**Dependencies**:
- Services that depend on this: ___________
- Services this depends on: ___________

**Notification**:
- Who to notify: ___________
- When to notify: ___________

## Migration Procedure

### Method A: GitOps Migration (Recommended)

Best for: Most services with proper version control

#### Step 1: Backup Current Service

```bash
# SSH to source host
ssh [source-host]

# Stop the service, then export the container filesystem
# (docker export does not include the contents of volumes)
docker stop [service-name]
docker export [service-name] > /tmp/[service-name]-backup.tar

# Backup volumes
for vol in $(docker volume ls -q | grep [service-name]); do
  docker run --rm -v "$vol":/source -v /tmp:/backup alpine \
    tar czf "/backup/$vol.tar.gz" -C /source .
done

# Copy backups to safe location
scp /tmp/*[service-name]*.tar* [backup-location]:~/backups/
```

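The prerequisites ask for a backup that is "completed and verified", so it can help to test each archive before moving on. The sketch below is one way to do that under the assumption that the backups are gzip-compressed tarballs like those produced by the loop above; the function name `verify_archive` is hypothetical.

```bash
# Hypothetical helper: check that a .tar.gz backup is readable and non-empty.
verify_archive() {
  local archive="$1" listing entries
  [ -s "$archive" ] || { echo "missing or empty: $archive" >&2; return 1; }
  # gzip -t tests the compressed stream without writing anything to disk.
  gzip -t "$archive" || return 1
  # tar -tzf lists the members; a failure here means the tar stream is damaged.
  listing=$(tar -tzf "$archive") || return 1
  entries=$(printf '%s\n' "$listing" | grep -c . || true)
  [ "$entries" -gt 0 ] || { echo "no entries in $archive" >&2; return 1; }
  echo "$archive: $entries entries"
}
```

A loop such as `for f in /tmp/*[service-name]*.tar.gz; do verify_archive "$f" || break; done` would check every volume archive. This confirms the archives can be read, not that the service will accept the restored data, so a test restore is still worthwhile for important services.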
#### Step 2: Export Configuration

```bash
# Get current docker-compose configuration (on your workstation)
cd ~/Documents/repos/homelab
cp hosts/[source-host]/[service-name].yaml /tmp/service-config.yaml

# Note environment variables (on the source host), one per line
docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' [service-name]
```

#### Step 3: Copy Data to Target Host

**For Small Data (< 10GB)**: Use SCP
```bash
# From your workstation (-p preserves modification times and file modes)
scp -rp [source-host]:/volume1/docker/[service-name] /tmp/
scp -rp /tmp/[service-name] [target-host]:/path/to/docker/
```

**For Large Data (> 10GB)**: Use Rsync
```bash
# From source host to target host via Tailscale
ssh [source-host]
rsync -avz --progress /volume1/docker/[service-name]/ \
  [target-host-tailscale-ip]:/path/to/docker/[service-name]/

# Monitor progress (run on the target host in a second terminal)
watch -n 5 'du -sh /path/to/docker/[service-name]'
```

**For Very Large Data (> 100GB)**: Consider physical transfer
```bash
# Copy to USB drive, physically move, then copy to target
# Or use network-attached storage as intermediate
```

#### Step 4: Stop Service on Source Host

```bash
# SSH to source host
ssh [source-host]

# Stop the container (a no-op if it is still stopped from Step 1)
docker stop [service-name]

# Verify it's stopped (STATUS should show "Exited")
docker ps -a | grep [service-name]
```

#### Step 5: Update Git Configuration

```bash
# On your workstation
cd ~/Documents/repos/homelab

# Move service definition to new host
git mv hosts/[source-host]/[service-name].yaml \
  hosts/[target-host]/[service-name].yaml

# Update paths in the configuration file if needed
nano hosts/[target-host]/[service-name].yaml

# Update volume paths for target host
# Atlantis/Calypso: /volume1/docker/[service-name]
# NUC/VM: /home/user/docker/[service-name]
# Raspberry Pi: /home/pi/docker/[service-name]

# Commit changes
git add hosts/[target-host]/[service-name].yaml
git commit -m "Migrate [service-name] from [source-host] to [target-host]

- Move service configuration
- Update volume paths for target host
- Migration date: $(date +%Y-%m-%d)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

git push origin main
```

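Instead of editing volume paths by hand in `nano`, the path prefix can be rewritten in one pass. The sketch below is illustrative: `rewrite_volume_path` is a hypothetical name, and it assumes the paths contain no `|` characters or regular-expression metacharacters that would change how `sed` interprets them.

```bash
# Hypothetical helper: replace every occurrence of one host path prefix with
# another in a compose file. It writes through a temp file so it does not
# depend on the non-portable `sed -i` flag.
rewrite_volume_path() {
  local file="$1" old="$2" new="$3" tmp
  tmp=$(mktemp) || return 1
  # '|' is the sed delimiter, so the paths themselves must not contain '|'.
  sed "s|${old}|${new}|g" "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy the content back with cat so the original file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For a move from a Synology host to a VM, a call might look like `rewrite_volume_path hosts/[target-host]/[service-name].yaml /volume1/docker/[service-name] /home/user/docker/[service-name]`. Reviewing the result with `git diff` before committing is still a good idea.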
#### Step 6: Deploy on Target Host

**Via Portainer UI**:
1. Open Portainer → Select target host endpoint
2. Go to **Stacks** → **Add stack** → **Git Repository**
3. Configure:
   - Repository URL: Your git repository
   - Compose path: `hosts/[target-host]/[service-name].yaml`
   - Enable GitOps (optional)
4. Click **Deploy the stack**

**Via GitOps Auto-Sync**:
- Wait 5-10 minutes for automatic deployment
- Monitor Portainer for new stack appearance

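Rather than watching Portainer for the 5-10 minutes that auto-sync may take, the wait can be scripted as a polling loop. This is a generic sketch, not a Portainer feature: the function name `wait_for` and the timeout and interval values in the usage comment are assumptions.

```bash
# Hypothetical helper: retry a command until it succeeds or a timeout expires.
# Usage: wait_for <timeout-seconds> <interval-seconds> <command> [args...]
wait_for() {
  local timeout="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (on the target host): poll every 15 seconds for up to 10 minutes
# until a container with the expected name is listed as running.
# wait_for 600 15 sh -c 'docker ps --format "{{.Names}}" | grep -qx "[service-name]"'
```

A zero exit status only means the container is listed as running; the log and HTTP checks in the next step are still needed to confirm the service is healthy.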
#### Step 7: Verify Migration

```bash
# SSH to target host
ssh [target-host]

# Check container is running
docker ps | grep [service-name]

# Check logs for errors
docker logs [service-name] --tail 100

# Test service accessibility
curl http://localhost:[port]              # Internal
curl https://[service].vish.gg            # External (if applicable)

# Spot-check that migrated data is present
docker exec [service-name] ls -lah /config
docker exec [service-name] ls -lah /data

# Check resource usage
docker stats [service-name] --no-stream
```

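Listing directories shows that files exist, but not that their contents survived the copy. One way to compare source and target is to reduce each data directory to a single digest. The sketch below assumes GNU `find`, `sort`, `xargs`, and `sha256sum` are available on both hosts; the function name `dir_checksum` is hypothetical. It covers file paths and contents only, not ownership, permissions, or empty directories.

```bash
# Hypothetical helper: print one digest that summarizes the relative path and
# content of every regular file under a directory.
dir_checksum() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # Sort the NUL-separated file list so both hosts hash files in the same order.
    find . -type f -print0 \
      | LC_ALL=C sort -z \
      | xargs -0 -r sha256sum \
      | sha256sum \
      | cut -d' ' -f1
  )
}
```

Run `dir_checksum /volume1/docker/[service-name]` on the source host and `dir_checksum /path/to/docker/[service-name]` on the target host while the service is stopped on both; matching output indicates the file contents are identical. Once the service starts on the target it may modify its data, so the comparison is only meaningful before that point.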
#### Step 8: Update DNS/Reverse Proxy (If Applicable)

```bash
# Update Nginx Proxy Manager or reverse proxy configuration
# Point [service].vish.gg to new host IP

# Update Cloudflare DNS if using Cloudflare Tunnels

# Update local DNS (AdGuard Home) if applicable
```

#### Step 9: Remove from Source Host

**Only after verifying target is working correctly!**

```bash
# SSH to source host
ssh [source-host]

# Remove the container
docker stop [service-name]
docker rm [service-name]

# Optional: Remove volumes (only if data copied successfully)
# docker volume rm $(docker volume ls -q | grep [service-name])

# Remove data directory
rm -rf /volume1/docker/[service-name]  # BE CAREFUL!

# Remove from Portainer if manually managed
# Portainer UI → Stacks → Remove stack
```

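Because a mistyped or empty service name turns `rm -rf /volume1/docker/[service-name]` into something far more destructive, a small guard can reduce the risk. The sketch below is one possible safeguard, not part of the original procedure; `safe_remove_dir` is a hypothetical name, and the rule that the target must be a direct child of a known base directory is an assumption about how the data directories are laid out.

```bash
# Hypothetical guard around `rm -rf`: only delete a directory that sits
# directly under an expected base directory (for example /volume1/docker).
safe_remove_dir() {
  local base="$1" name="$2"
  # Reject empty names and anything that could escape the base directory.
  case "$name" in
    ""|.|..|*/*)
      echo "refusing unsafe name: '$name'" >&2
      return 1
      ;;
  esac
  if [ -z "$base" ] || [ ! -d "$base/$name" ]; then
    echo "not a directory: $base/$name" >&2
    return 1
  fi
  # ${var:?} aborts the command if either variable is unexpectedly empty.
  rm -rf -- "${base:?}/${name:?}"
}
```

With this in place the cleanup becomes `safe_remove_dir /volume1/docker [service-name]`, which refuses to run if the name is empty or contains a slash.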
### Method B: Manual Export/Import

Best for: Quick migrations without git changes, or when testing

#### Step 1: Stop and Export

```bash
# SSH to source host
ssh [source-host]

# Stop service
docker stop [service-name]

# Export the data volume
docker run --rm \
  -v [service-name]_data:/source \
  -v /tmp:/backup \
  alpine tar czf /backup/[service-name]-data.tar.gz -C /source .

# Export configuration
docker inspect [service-name] > /tmp/[service-name]-config.json
```

#### Step 2: Transfer to Target

```bash
# Copy data to target host (run on the source host)
scp /tmp/[service-name]-data.tar.gz [target-host]:/tmp/
scp /tmp/[service-name]-config.json [target-host]:/tmp/
```

#### Step 3: Import on Target

```bash
# SSH to target host
ssh [target-host]

# Create volume
docker volume create [service-name]_data

# Import data
docker run --rm \
  -v [service-name]_data:/target \
  -v /tmp:/backup \
  alpine tar xzf /backup/[service-name]-data.tar.gz -C /target

# Create and start container using saved configuration
# Adjust paths and ports as needed
docker create --name [service-name] \
  [options-from-config.json] \
  [image:tag]

docker start [service-name]
```

## Post-Migration Tasks

### Update Documentation

```bash
# Update service inventory
nano docs/services/VERIFIED_SERVICE_INVENTORY.md

# Update the host column for migrated service
# | Service | Host | Port | URL | Status |
# | Service | [NEW-HOST] | 8080 | https://service.vish.gg | ✅ Active |
```

### Update Monitoring

```bash
# Update Prometheus configuration if needed
nano prometheus/prometheus.yml

# Update target host IP for scraped metrics
# Restart Prometheus if configuration changed
```

### Test Backups

```bash
# Verify backups work on new host
./backup.sh --test

# Ensure service data is included in backup
ls -lah /path/to/backups/[service-name]
```

### Performance Baseline

```bash
# Document baseline performance on new host
docker stats [service-name] --no-stream

# Monitor for 24 hours to ensure stability
```

## Verification Checklist

- [ ] Service running on target host: `docker ps`
- [ ] All data migrated correctly
- [ ] Configuration preserved
- [ ] Logs show no errors: `docker logs [service]`
- [ ] External access works (if applicable)
- [ ] Internal service connectivity works
- [ ] Reverse proxy updated (if applicable)
- [ ] DNS records updated (if applicable)
- [ ] Monitoring updated
- [ ] Documentation updated
- [ ] Backups include new location
- [ ] Old host cleaned up
- [ ] Users notified of any URL changes

## Rollback Procedure

If migration fails or causes issues:

### Quick Rollback (Within 24 hours)

```bash
# SSH to source host
ssh [source-host]

# Restore from backup (docker import creates an image from the exported
# filesystem; it does not restore volumes or settings such as CMD and ENV)
docker import /tmp/[service-name]-backup.tar [service-name]:backup

# Or redeploy from git (assumes the migration commit is the latest commit)
cd ~/Documents/repos/homelab
git revert HEAD
git push origin main

# Restart service on source host
# Via Portainer or:
docker start [service-name]
```

### Full Rollback (After cleanup)

```bash
# Restore from backup
./restore.sh [backup-date]

# Redeploy to original host
# Follow original deployment procedure
```

## Troubleshooting

### Issue: Data Transfer Very Slow

**Symptoms**: Rsync taking hours for moderate data

**Solutions**:
```bash
# Use compression for better network performance
rsync -avz --compress-level=6 --progress /source/ [target]:/dest/

# Or use parallel transfer tools
# Install: sudo apt-get install parallel
# rsync -R recreates each file's relative path under /dest/ on the target
cd /source && find . -type f -print0 | parallel -0 -j 4 rsync -aR {} [target]:/dest/

# For extremely large transfers, consider:
# 1. Physical USB drive transfer
# 2. NFS mount between hosts
# 3. Transfer during off-peak hours
```

### Issue: Service Won't Start on Target Host

**Symptoms**: Container starts then immediately exits

**Solutions**:
```bash
# Check logs
docker logs [service-name]

# Common issues:
# 1. Path issues - Update volume paths in compose file
# 2. Permission issues - Check PUID/PGID
# 3. Port conflicts - Check if port already in use
# 4. Missing dependencies - Ensure all required services running

# Fix permissions on the host data directory (docker exec cannot run in a
# container that has exited); use the UID:GID matching the service's PUID/PGID
sudo chown -R 1000:1000 /path/to/docker/[service-name]
```

### Issue: Lost Configuration Data

**Symptoms**: Service starts but settings are default

**Solutions**:
```bash
# Check if volumes mounted correctly
docker inspect [service-name] | grep -A 10 Mounts

# Restore configuration from backup
docker stop [service-name]
docker run --rm -v [service-name]_config:/target -v /tmp:/backup alpine \
  tar xzf /backup/config-backup.tar.gz -C /target
docker start [service-name]
```

### Issue: Network Connectivity Problems

**Symptoms**: Service can't reach other services

**Solutions**:
```bash
# Check network configuration
docker network ls
docker network inspect [network-name]

# Add service to required networks
docker network connect [network-name] [service-name]

# Verify DNS resolution and reachability (-c 3 stops after three packets)
docker exec [service-name] ping -c 3 [other-service]
```

## Migration Examples

### Example 1: Migrate Uptime Kuma from Calypso to Homelab VM

```bash
# 1. Backup on Calypso (-C stores paths relative to /volume1/docker)
ssh calypso
docker stop uptime-kuma
tar czf /tmp/uptime-kuma-data.tar.gz -C /volume1/docker uptime-kuma

# 2. Transfer and unpack on the target
scp /tmp/uptime-kuma-data.tar.gz homelab-vm:/tmp/
ssh homelab-vm 'mkdir -p /home/user/docker && tar xzf /tmp/uptime-kuma-data.tar.gz -C /home/user/docker'

# 3. Update git (on your workstation)
cd ~/Documents/repos/homelab
git mv hosts/synology/calypso/uptime-kuma.yaml \
  hosts/vms/homelab-vm/uptime-kuma.yaml
# Update paths in file
sed -i 's|/volume1/docker/uptime-kuma|/home/user/docker/uptime-kuma|g' \
  hosts/vms/homelab-vm/uptime-kuma.yaml

# 4. Deploy on target
git add . && git commit -m "Migrate Uptime Kuma to Homelab VM" && git push

# 5. Verify and cleanup Calypso
```

### Example 2: Migrate AdGuard Home between Hosts

```bash
# AdGuard Home requires DNS configuration updates
# 1. Note current DNS settings on clients
# 2. Migrate service (as above)
# 3. Update client DNS to point to new host IP
# 4. Test DNS resolution from clients
```

## Related Documentation

- [Add New Service](add-new-service.md)
- [Infrastructure Overview](../infrastructure/INFRASTRUCTURE_OVERVIEW.md)
- [Backup Strategies](../admin/backup-strategies.md)
- [Deployment Workflow](../admin/DEPLOYMENT_WORKFLOW.md)

## Change Log

- 2026-02-14 - Initial creation with multiple migration methods
- 2026-02-14 - Added large data migration strategies