# ๐Ÿ”ง Maintenance Guide ## Overview This guide covers routine maintenance tasks to keep the homelab running smoothly, including updates, cleanup, and health checks. --- ## ๐Ÿ“… Maintenance Schedule ### Daily (Automated) - [ ] Database backups - [ ] Log rotation - [ ] Container health checks - [ ] Certificate monitoring ### Weekly - [ ] Review container updates (Watchtower reports) - [ ] Check disk space across all hosts - [ ] Review monitoring alerts - [ ] Verify backup integrity ### Monthly - [ ] Apply container updates - [ ] DSM/Proxmox security updates - [ ] Review and prune unused Docker resources - [ ] Test backup restoration - [ ] Review access logs for anomalies ### Quarterly - [ ] Full system health audit - [ ] Review and update documentation - [ ] Capacity planning review - [ ] Security audit - [ ] Test disaster recovery procedures --- ## ๐Ÿณ Docker Maintenance ### Container Updates ```bash # Check for available updates docker images --format "{{.Repository}}:{{.Tag}}" | while read img; do docker pull "$img" 2>/dev/null && echo "Updated: $img" done # Or use Watchtower for automated updates docker run -d \ --name watchtower \ -v /var/run/docker.sock:/var/run/docker.sock \ containrrr/watchtower \ --schedule "0 4 * * 0" \ # Sundays at 4 AM --cleanup ``` ### Prune Unused Resources ```bash # Remove stopped containers docker container prune -f # Remove unused images docker image prune -a -f # Remove unused volumes (CAREFUL!) docker volume prune -f # Remove unused networks docker network prune -f # All-in-one cleanup docker system prune -a --volumes -f # Check space recovered docker system df ``` ### Container Health Checks ```bash # Check all container statuses docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" # Find unhealthy containers docker ps --filter "health=unhealthy" # Restart unhealthy containers docker ps --filter "health=unhealthy" -q | xargs -r docker restart # Check container logs for errors for c in $(docker ps -q); do echo "=== $(docker inspect --format '{{.Name}}' $c) ===" docker logs "$c" --tail 20 2>&1 | grep -i "error\|warn\|fail" || echo "No issues" done ``` --- ## ๐Ÿ’พ Storage Maintenance ### Disk Space Monitoring ```bash # Check disk usage on all volumes df -h | grep -E "^/dev|volume" # Find large files find /volume1/docker -type f -size +1G -exec ls -lh {} \; # Find old log files find /volume1 -name "*.log" -mtime +30 -size +100M # Check Docker disk usage docker system df -v ``` ### Log Management ```bash # Truncate large container logs for log in $(find /var/lib/docker/containers -name "*-json.log" -size +100M); do echo "Truncating: $log" truncate -s 0 "$log" done # Configure log rotation in docker-compose services: myservice: logging: driver: "json-file" options: max-size: "10m" max-file: "3" ``` ### Database Maintenance ```bash # PostgreSQL vacuum and analyze docker exec postgres psql -U postgres -c "VACUUM ANALYZE;" # PostgreSQL reindex docker exec postgres psql -U postgres -c "REINDEX DATABASE postgres;" # Check database size docker exec postgres psql -U postgres -c " SELECT pg_database.datname, pg_size_pretty(pg_database_size(pg_database.datname)) AS size FROM pg_database ORDER BY pg_database_size(pg_database.datname) DESC;" ``` --- ## ๐Ÿ–ฅ๏ธ Synology Maintenance ### DSM Updates ```bash # Check for updates via CLI synoupgrade --check # Or via DSM UI: # Control Panel > Update & Restore > DSM Update ``` ### Storage Health ```bash # Check RAID status cat /proc/mdstat # Check disk health syno_hdd_util --all # Check for bad sectors smartctl -a /dev/sda | grep -E "Reallocated|Current_Pending" ``` ### Package Updates ```bash # List installed packages synopkg list --name # Update all packages synopkg update_all ``` ### Index Optimization ```bash # Rebuild media index (if slow) synoindex -R /volume1/media # Or via DSM: # Control Panel > Indexing Service > Re-index ``` --- ## ๐ŸŒ Network Maintenance ### DNS Cache ```bash # Flush Pi-hole DNS cache docker exec pihole pihole restartdns # Check DNS resolution dig @localhost google.com # Check Pi-hole stats docker exec pihole pihole -c -e ``` ### Certificate Renewal ```bash # Check certificate expiry echo | openssl s_client -servername example.com -connect example.com:443 2>/dev/null | \ openssl x509 -noout -dates # Force Let's Encrypt renewal (NPM) # Login to NPM UI > SSL Certificates > Renew # Wildcard cert renewal (if using DNS challenge) certbot renew --dns-cloudflare ``` ### Tailscale Maintenance ```bash # Check Tailscale status tailscale status # Update Tailscale tailscale update # Check for connectivity issues tailscale netcheck ``` --- ## ๐Ÿ“Š Monitoring Maintenance ### Prometheus ```bash # Check Prometheus targets curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}' # Clean old data (if needed) # Prometheus auto-cleans based on retention settings # Reload configuration curl -X POST http://localhost:9090/-/reload ``` ### Grafana ```bash # Backup Grafana dashboards docker exec grafana grafana-cli admin data-export /var/lib/grafana/dashboards-backup # Check datasource health curl -s http://admin:$GRAFANA_PASSWORD@localhost:3000/api/datasources | jq '.[].name' ``` --- ## ๐Ÿ”„ Update Procedures ### Safe Update Process ```bash # 1. Check current state docker ps -a # 2. Backup critical data ./backup-script.sh # 3. Pull new images docker-compose pull # 4. Stop services gracefully docker-compose down # 5. Start updated services docker-compose up -d # 6. Verify health docker ps docker logs --tail 50 # 7. Monitor for issues # Watch logs for 15-30 minutes ``` ### Rollback Procedure ```bash # If update fails, rollback: # 1. Stop broken containers docker-compose down # 2. Find previous image docker images | grep # 3. Update docker-compose.yml to use old tag # image: service:1.2.3 # Instead of :latest # 4. Restart docker-compose up -d ``` --- ## ๐Ÿงน Cleanup Scripts ### Weekly Cleanup Script ```bash #!/bin/bash # weekly-cleanup.sh echo "=== Weekly Maintenance $(date) ===" # Docker cleanup echo "Cleaning Docker..." docker system prune -f docker volume prune -f # Log cleanup echo "Cleaning logs..." find /var/log -name "*.gz" -mtime +30 -delete find /volume1/docker -name "*.log" -size +100M -exec truncate -s 0 {} \; # Temp file cleanup echo "Cleaning temp files..." find /tmp -type f -mtime +7 -delete 2>/dev/null # Report disk space echo "Disk space:" df -h | grep volume echo "=== Cleanup Complete ===" ``` ### Schedule with Cron ```bash # /etc/crontab # Weekly cleanup - Sundays at 3 AM 0 3 * * 0 root /volume1/scripts/weekly-cleanup.sh >> /var/log/maintenance.log 2>&1 # Monthly maintenance - 1st of month at 2 AM 0 2 1 * * root /volume1/scripts/monthly-maintenance.sh >> /var/log/maintenance.log 2>&1 ``` --- ## ๐Ÿ“‹ Maintenance Checklist Template ```markdown ## Weekly Maintenance - [DATE] ### Pre-Maintenance - [ ] Notify family of potential downtime - [ ] Check current backups are recent - [ ] Review any open issues ### Docker - [ ] Review Watchtower update report - [ ] Check for unhealthy containers - [ ] Prune unused resources ### Storage - [ ] Check disk space (>20% free) - [ ] Review large files/logs - [ ] Verify RAID health ### Network - [ ] Check DNS resolution - [ ] Verify Tailscale connectivity - [ ] Check SSL certificates ### Monitoring - [ ] Review Prometheus alerts - [ ] Check Grafana dashboards - [ ] Verify Uptime Kuma status ### Post-Maintenance - [ ] Document any changes made - [ ] Update maintenance log - [ ] Test critical services ``` --- ## ๐Ÿ”— Related Documentation - [Backup Strategies](backup-strategies.md) - [Monitoring Setup](monitoring.md) - [Performance Troubleshooting](../troubleshooting/performance.md) - [Disaster Recovery](../troubleshooting/disaster-recovery.md)