homelab-optimized/docs/admin/maintenance-schedule.md
Gitea Mirror Bot 0de1e1dcd6
Sanitized mirror from private repository - 2026-03-21 07:03:28 UTC
2026-03-21 07:03:28 +00:00


Maintenance Calendar & Schedule

Homelab maintenance schedule and recurring tasks


Overview

This document sets out the recurring maintenance schedule for the homelab infrastructure. Following it helps keep services reliable, secure, and performing well.


Daily Tasks (Automated)

| Task | Time | Command/Tool | Owner |
|------|------|--------------|-------|
| Container updates | 02:00 | Watchtower | Automated |
| Backup verification | 03:00 | Ansible | Automated |
| Health checks | Every 15 min | Prometheus | Automated |
| Alert notifications | Real-time | Alertmanager | Automated |

Manual Daily Checks

  • Review ntfy alerts
  • Check Grafana dashboards for issues
  • Verify Uptime Kuma status page

Weekly Tasks

Sunday - Maintenance Day

| Time | Task | Duration | Notes |
|------|------|----------|-------|
| Morning | Review Watchtower updates | 30 min | Check what's new |
| Mid-day | Check disk usage | 15 min | All hosts |
| Afternoon | Test backup restoration | 1 hour | Critical services only |
| Evening | Review logs for errors | 30 min | Focus on alerts |
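For the mid-day disk check, a small filter over `df -P` output can flag the filesystems that need attention on each host. This is a minimal sketch. The 80% threshold is an assumption, and mount points containing spaces are not handled.

```bash
# Flag filesystems at or above a usage threshold (assumed default: 80%).
# Reads POSIX `df -P` output on stdin, e.g. `df -P | check_disk_usage 80`.
check_disk_usage() {
  local threshold="${1:-80}"
  # Skip the header line; column 5 is capacity ("91%"), column 6 the mount point.
  awk -v limit="$threshold" 'NR > 1 {
    used = $5
    sub(/%/, "", used)
    if (used + 0 >= limit) printf "%s %s%%\n", $6, used
  }'
}
```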

Weekly Automation

```bash
# Run Ansible health check
ansible-playbook ansible/automation/playbooks/health_check.yml

# Generate disk usage report
ansible-playbook ansible/automation/playbooks/disk_usage_report.yml

# Check certificate expiration (--check runs in check mode: no changes are made)
ansible-playbook ansible/automation/playbooks/certificate_renewal.yml --check
```
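The evening log review can start from a per-file count of error lines, which shows where to look first. This is a minimal sketch. The keyword list is an assumption. On systemd hosts, `journalctl -p err --since "7 days ago"` is an alternative source of error-level entries.

```bash
# Count lines matching common error keywords (assumed list) in each log file.
count_log_errors() {
  local file count
  for file in "$@"; do
    # grep -c prints 0 and exits 1 on no match; `|| true` keeps `set -e` scripts running.
    count=$(grep -ciE 'error|fatal|panic' "$file" || true)
    printf '%s %s\n' "$file" "$count"
  done
}
```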

Monthly Tasks

First Sunday of Month

| Task | Duration | Notes |
|------|----------|-------|
| Security audit | 1 hour | Run security audit playbook |
| Docker cleanup | 30 min | Prune unused images/containers |
| Update documentation | 1 hour | Review and update docs |
| Review monitoring thresholds | 30 min | Adjust if needed |
| Check SSL certificates | 15 min | Manual review |
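The manual certificate review can use `openssl x509 -checkend SECONDS`, which exits 0 if the certificate will still be valid that many seconds from now. This is a minimal sketch. The certificate path and the 30-day window are assumptions.

```bash
# Succeed (and print the expiry date) if a PEM certificate expires within N days.
cert_expires_within() {
  local cert="$1" days="${2:-30}"
  if openssl x509 -in "$cert" -noout -checkend $(( days * 86400 )) >/dev/null; then
    return 1   # still valid after the window: nothing to report
  fi
  openssl x509 -in "$cert" -noout -enddate   # e.g. notAfter=... for the report
  return 0
}
```

For a live endpoint, the certificate can be fetched first with `openssl s_client -connect host:443 -servername host </dev/null 2>/dev/null | openssl x509 -noout -enddate`.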

Monthly Commands

```bash
# Security audit
ansible-playbook ansible/automation/playbooks/security_audit.yml

# Docker cleanup (all hosts)
ansible-playbook ansible/automation/playbooks/prune_containers.yml

# Log rotation check
ansible-playbook ansible/automation/playbooks/log_rotation.yml

# Full backup of configs
ansible-playbook ansible/automation/playbooks/backup_configs.yml
```

Quarterly Tasks

Months: January, April, July, October (the first month of each quarter)

| Week | Task | Duration |
|------|------|----------|
| Week 1 | Disaster recovery test | 2 hours |
| Week 2 | Infrastructure review | 2 hours |
| Week 3 | Performance optimization | 2 hours |
| Week 4 | Documentation refresh | 1 hour |
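A script or reminder job can map a date to the quarterly task due that week. This is a minimal sketch. It assumes "Week N" means days 7(N-1)+1 to 7N of a quarter-start month, so days 29 to 31 fall outside the four-week plan.

```bash
# Print the quarterly task for a given month/day (defaults: today, UTC).
# Returns non-zero when no quarterly task is scheduled.
quarterly_task() {
  local month="${1:-$(date -u +%m)}" day="${2:-$(date -u +%d)}"
  month=$((10#$month)); day=$((10#$day))   # force base 10 so "08" is not octal
  case "$month" in
    1|4|7|10) ;;
    *) return 1 ;;   # not a quarter-start month
  esac
  case $(( (day - 1) / 7 + 1 )) in
    1) echo "Disaster recovery test" ;;
    2) echo "Infrastructure review" ;;
    3) echo "Performance optimization" ;;
    4) echo "Documentation refresh" ;;
    *) return 1 ;;   # days 29-31
  esac
}
```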

Quarterly Checklist

  • Disaster Recovery Test

    • Restore a critical service from backup
    • Verify backup integrity
    • Document recovery time
  • Infrastructure Review

    • Review resource usage trends
    • Plan capacity upgrades
    • Evaluate new services
  • Performance Optimization

    • Tune Prometheus queries
    • Optimize Docker configurations
    • Review network performance
  • Documentation Refresh

    • Update runbooks
    • Verify links work
    • Update service inventory
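The disaster recovery steps above can be combined into one drill: verify integrity, restore into a scratch directory, and record how long the restore took. This is a minimal sketch. It assumes backups are `.tar.gz` archives with a `sha256sum`-format checksum file alongside them (`backup.tar.gz.sha256`). That naming is hypothetical and not taken from the existing backup scripts.

```bash
# Verify, restore, and time a tar.gz backup (bash: uses the SECONDS counter).
restore_drill() {
  local archive="$1" target="$2" start elapsed
  # 1. Integrity: run sha256sum -c from the archive's directory so the
  #    relative filename in the .sha256 file resolves.
  ( cd "$(dirname "$archive")" && sha256sum -c --quiet "$(basename "$archive").sha256" ) || {
    echo "checksum mismatch: $archive" >&2; return 1; }
  # 2. Restore into a scratch directory and measure elapsed seconds.
  mkdir -p "$target"
  start=$SECONDS
  tar -xzf "$archive" -C "$target" || return 1
  elapsed=$(( SECONDS - start ))
  # 3. Record recovery time for the drill notes.
  echo "restored $(basename "$archive") in ${elapsed}s"
}
```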

Annual Tasks

| Month | Task | Notes |
|-------|------|-------|
| January | Year in review | Review uptime, incidents |
| April | Spring cleaning | Deprecate unused services |
| July | Mid-year capacity check | Plan for growth |
| October | Pre-holiday review | Ensure stability |

Annual Checklist

  • Annual uptime report
  • Hardware inspection
  • Cost/energy analysis
  • Security posture review
  • Disaster recovery drill (full)
  • Backup strategy review
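For the annual uptime report, availability is the share of the period during which a service was up: uptime % = (total minutes − downtime minutes) / total minutes × 100. The sketch below applies that formula. The 60-minute downtime figure is purely illustrative.

```bash
# Uptime percentage from total and downtime minutes, to three decimals.
uptime_percent() {
  local total_min="$1" down_min="$2"
  awk -v t="$total_min" -v d="$down_min" 'BEGIN { printf "%.3f\n", (t - d) / t * 100 }'
}

# A 365-day year has 525600 minutes.
uptime_percent 525600 60   # prints 99.989
```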

Service-Specific Maintenance

Critical Services (Weekly)

| Service | Task | Command |
|---------|------|---------|
| Authentik | Verify SSO flows | Manual login test |
| NPM (Nginx Proxy Manager) | Check proxy hosts | UI review |
| Prometheus | Verify metrics | Query test |
| Vaultwarden | Test backup | Export/import test |

Media Services (Monthly)

| Service | Task | Notes |
|---------|------|-------|
| Plex | Library analysis | Check for issues |
| Sonarr/Radarr | RSS sync test | Verify downloads |
| Immich | Backup verification | Test restore |

Network Services (Monthly)

| Service | Task | Notes |
|---------|------|-------|
| Pi-hole | Filter list update | Check for updates |
| AdGuard | Query log review | Look for issues |
| WireGuard | Check connections | Active peers |

Maintenance Windows

Standard Window

  • Day: Sunday
  • Time: 02:00 - 06:00 UTC
  • Notification: 24 hours advance notice
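The advance notice could go out through the same ntfy server used for alerts. The document does not say which channel is used, so that is an assumption. ntfy accepts a plain HTTP POST, with `Title`, `Priority` and `Tags` headers. In this minimal sketch, `NTFY_URL` is a hypothetical placeholder, and `NTFY_DRY_RUN=1` prints the request instead of sending it.

```bash
# Post a planned-maintenance notice to an ntfy topic (URL is a placeholder).
notify_maintenance() {
  local when="$1" details="$2"
  local url="${NTFY_URL:-https://ntfy.example.com/homelab-maintenance}"
  local -a cmd=(curl -fsS
    -H "Title: Planned maintenance ${when}"
    -H "Priority: default"
    -H "Tags: wrench"
    -d "$details"
    "$url")
  if [ "${NTFY_DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # show the request without sending it
  else
    "${cmd[@]}"
  fi
}
```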

Emergency Window

  • Trigger: Critical security vulnerability
  • Time: As needed
  • Notification: ntfy alert
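Disruptive scripts can refuse to run outside the standard window unless the emergency process is being followed. This is a minimal sketch. It takes the ISO weekday (1 = Monday, 7 = Sunday) and the hour, and defaults both to the current UTC time from `date -u`.

```bash
# True if the given moment falls in the standard window: Sunday 02:00-06:00 UTC.
in_maintenance_window() {
  local dow="${1:-$(date -u +%u)}" hour="${2:-$(date -u +%H)}"
  hour=$((10#$hour))   # "02" -> 2
  [ "$dow" -eq 7 ] && [ "$hour" -ge 2 ] && [ "$hour" -lt 6 ]
}
```

A script might guard itself with `in_maintenance_window || { echo "outside standard window" >&2; exit 1; }`.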

Automation Schedule

Cron Jobs (Homelab VM)

```
# Hourly health checks
0 * * * * /opt/scripts/health_check.sh

# Hourly container stats
0 * * * * /opt/scripts/container_stats.sh

# Weekly backup (Sunday 03:00, host time zone)
0 3 * * 0 /opt/scripts/backup.sh
```
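cron does not stop an hourly job from starting while the previous run is still going. `flock -n` from util-linux fails immediately if a lock is already held. This is a minimal sketch, and the lock directory is an assumption. In a crontab line the same idea reads `0 * * * * flock -n /tmp/health_check.lock /opt/scripts/health_check.sh`.

```bash
# Run a command under a non-blocking lock; exits non-zero if already running.
run_locked() {
  local name="$1"; shift
  local lock="${LOCK_DIR:-/tmp}/${name}.lock"
  flock -n "$lock" "$@"
}
```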

Ansible Tower/Pencil (if configured)

  • Nightly: Container updates
  • Weekly: Full system audit
  • Monthly: Security scan

Incident Response During Maintenance

If an incident occurs during maintenance:

  1. Pause maintenance if a service is impacted
  2. Document the issue in the incident log
  3. Resolve or roll back, depending on severity
  4. Resume once stable
  5. Hold a post-incident review within 48 hours
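Step 2 is easier to review later if each entry has the same shape. This is a minimal sketch that appends a UTC-timestamped, tab-separated line to a log file. The log path and field layout are assumptions, not an existing convention in this homelab.

```bash
# Append "<UTC timestamp>\t<severity>\t<summary>" to the incident log.
log_incident() {
  local severity="$1" summary="$2"
  local log="${INCIDENT_LOG:-$HOME/incident.log}"
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$severity" "$summary" >> "$log"
}
```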

Checklist Template

Pre-Maintenance

  • Notify users (if needed)
  • Verify backups are current
  • Document current state
  • Prepare rollback plan
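The backup check can be automated as "at least one file in the backup directory was modified recently". This is a minimal sketch. The directory and the 24-hour default are assumptions. With only the weekly `backup.sh` job, a window of 168 hours (7 days) may be the right value.

```bash
# Succeed only if DIR contains a file modified within the last N hours.
backups_current() {
  local dir="$1" hours="${2:-24}"
  # -mmin -M matches files modified less than M minutes ago; -quit stops at the first hit.
  [ -n "$(find "$dir" -type f -mmin -$(( hours * 60 )) -print -quit 2>/dev/null)" ]
}
```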

During Maintenance

  • Monitor alerts
  • Document changes
  • Test incrementally

Post-Maintenance

  • Verify all services running
  • Check monitoring
  • Test critical paths
  • Update documentation
  • Close ticket