Files
homelab-optimized/docs/admin/maintenance.md
Gitea Mirror Bot 2931009345
Some checks failed
Documentation / Build Docusaurus (push) Failing after 8s
Documentation / Deploy to GitHub Pages (push) Has been skipped
Sanitized mirror from private repository - 2026-03-16 10:55:35 UTC
2026-03-16 10:55:35 +00:00

7.8 KiB

🔧 Maintenance Guide

Overview

This guide covers routine maintenance tasks to keep the homelab running smoothly, including updates, cleanup, and health checks.


📅 Maintenance Schedule

Daily (Automated)

  • Database backups
  • Log rotation
  • Container health checks
  • Certificate monitoring

Weekly

  • Review container updates (Watchtower reports)
  • Check disk space across all hosts
  • Review monitoring alerts
  • Verify backup integrity

Monthly

  • Apply container updates
  • DSM/Proxmox security updates
  • Review and prune unused Docker resources
  • Test backup restoration
  • Review access logs for anomalies

Quarterly

  • Full system health audit
  • Review and update documentation
  • Capacity planning review
  • Security audit
  • Test disaster recovery procedures

🐳 Docker Maintenance

Container Updates

# Check for available updates
docker images --format "{{.Repository}}:{{.Tag}}" | while read img; do
  docker pull "$img" 2>/dev/null && echo "Updated: $img"
done

# Or use Watchtower for automated updates
docker run -d \
  --name watchtower \
  -v /var/run/docker.sock:/var/run/docker.sock \
  containrrr/watchtower \
  --schedule "0 4 * * 0" \  # Sundays at 4 AM
  --cleanup

Prune Unused Resources

# Remove stopped containers
docker container prune -f

# Remove unused images
docker image prune -a -f

# Remove unused volumes (CAREFUL!)
docker volume prune -f

# Remove unused networks
docker network prune -f

# All-in-one cleanup
docker system prune -a --volumes -f

# Check space recovered
docker system df

Container Health Checks

# Check all container statuses
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"

# Find unhealthy containers
docker ps --filter "health=unhealthy"

# Restart unhealthy containers
docker ps --filter "health=unhealthy" -q | xargs -r docker restart

# Check container logs for errors
for c in $(docker ps -q); do
  echo "=== $(docker inspect --format '{{.Name}}' $c) ==="
  docker logs "$c" --tail 20 2>&1 | grep -i "error\|warn\|fail" || echo "No issues"
done

💾 Storage Maintenance

Disk Space Monitoring

# Check disk usage on all volumes
df -h | grep -E "^/dev|volume"

# Find large files
find /volume1/docker -type f -size +1G -exec ls -lh {} \;

# Find old log files
find /volume1 -name "*.log" -mtime +30 -size +100M

# Check Docker disk usage
docker system df -v

Log Management

# Truncate large container logs
for log in $(find /var/lib/docker/containers -name "*-json.log" -size +100M); do
  echo "Truncating: $log"
  truncate -s 0 "$log"
done

# Configure log rotation in docker-compose
services:
  myservice:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

Database Maintenance

# PostgreSQL vacuum and analyze
docker exec postgres psql -U postgres -c "VACUUM ANALYZE;"

# PostgreSQL reindex
docker exec postgres psql -U postgres -c "REINDEX DATABASE postgres;"

# Check database size
docker exec postgres psql -U postgres -c "
SELECT pg_database.datname, 
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) DESC;"

🖥️ Synology Maintenance

DSM Updates

# Check for updates via CLI
synoupgrade --check

# Or via DSM UI:
# Control Panel > Update & Restore > DSM Update

Storage Health

# Check RAID status
cat /proc/mdstat

# Check disk health
syno_hdd_util --all

# Check for bad sectors
smartctl -a /dev/sda | grep -E "Reallocated|Current_Pending"

Package Updates

# List installed packages
synopkg list --name

# Update all packages
synopkg update_all

Index Optimization

# Rebuild media index (if slow)
synoindex -R /volume1/media

# Or via DSM:
# Control Panel > Indexing Service > Re-index

🌐 Network Maintenance

DNS Cache

# Flush Pi-hole DNS cache
docker exec pihole pihole restartdns

# Check DNS resolution
dig @localhost google.com

# Check Pi-hole stats
docker exec pihole pihole -c -e

Certificate Renewal

# Check certificate expiry
echo | openssl s_client -servername example.com -connect example.com:443 2>/dev/null | \
  openssl x509 -noout -dates

# Force Let's Encrypt renewal (NPM)
# Login to NPM UI > SSL Certificates > Renew

# Wildcard cert renewal (if using DNS challenge)
certbot renew --dns-cloudflare

Tailscale Maintenance

# Check Tailscale status
tailscale status

# Update Tailscale
tailscale update

# Check for connectivity issues
tailscale netcheck

📊 Monitoring Maintenance

Prometheus

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

# Clean old data (if needed)
# Prometheus auto-cleans based on retention settings

# Reload configuration
curl -X POST http://localhost:9090/-/reload

Grafana

# Backup Grafana dashboards
docker exec grafana grafana-cli admin data-export /var/lib/grafana/dashboards-backup

# Check datasource health
curl -s http://admin:$GRAFANA_PASSWORD@localhost:3000/api/datasources | jq '.[].name'

🔄 Update Procedures

Safe Update Process

# 1. Check current state
docker ps -a

# 2. Backup critical data
./backup-script.sh

# 3. Pull new images
docker-compose pull

# 4. Stop services gracefully
docker-compose down

# 5. Start updated services
docker-compose up -d

# 6. Verify health
docker ps
docker logs <container> --tail 50

# 7. Monitor for issues
# Watch logs for 15-30 minutes

Rollback Procedure

# If update fails, rollback:

# 1. Stop broken containers
docker-compose down

# 2. Find previous image
docker images | grep <service>

# 3. Update docker-compose.yml to use old tag
# image: service:1.2.3  # Instead of :latest

# 4. Restart
docker-compose up -d

🧹 Cleanup Scripts

Weekly Cleanup Script

#!/bin/bash
# weekly-cleanup.sh

echo "=== Weekly Maintenance $(date) ==="

# Docker cleanup
echo "Cleaning Docker..."
docker system prune -f
docker volume prune -f

# Log cleanup
echo "Cleaning logs..."
find /var/log -name "*.gz" -mtime +30 -delete
find /volume1/docker -name "*.log" -size +100M -exec truncate -s 0 {} \;

# Temp file cleanup
echo "Cleaning temp files..."
find /tmp -type f -mtime +7 -delete 2>/dev/null

# Report disk space
echo "Disk space:"
df -h | grep volume

echo "=== Cleanup Complete ==="

Schedule with Cron

# /etc/crontab
# Weekly cleanup - Sundays at 3 AM
0 3 * * 0 root /volume1/scripts/weekly-cleanup.sh >> /var/log/maintenance.log 2>&1

# Monthly maintenance - 1st of month at 2 AM
0 2 1 * * root /volume1/scripts/monthly-maintenance.sh >> /var/log/maintenance.log 2>&1

📋 Maintenance Checklist Template

## Weekly Maintenance - [DATE]

### Pre-Maintenance
- [ ] Notify family of potential downtime
- [ ] Check current backups are recent
- [ ] Review any open issues

### Docker
- [ ] Review Watchtower update report
- [ ] Check for unhealthy containers
- [ ] Prune unused resources

### Storage
- [ ] Check disk space (>20% free)
- [ ] Review large files/logs
- [ ] Verify RAID health

### Network
- [ ] Check DNS resolution
- [ ] Verify Tailscale connectivity
- [ ] Check SSL certificates

### Monitoring
- [ ] Review Prometheus alerts
- [ ] Check Grafana dashboards
- [ ] Verify Uptime Kuma status

### Post-Maintenance
- [ ] Document any changes made
- [ ] Update maintenance log
- [ ] Test critical services