🏠 Homelab Ansible Playbooks
Comprehensive automation playbooks for managing your homelab infrastructure. These playbooks provide operational automation beyond the existing health monitoring and system management.
📋 Quick Reference
| Category | Playbook | Purpose | Priority |
|---|---|---|---|
| Service Management | service_status.yml | Get status of all services | ⭐⭐⭐ |
| Service Management | restart_service.yml | Restart services with dependencies | ⭐⭐⭐ |
| Service Management | container_logs.yml | Collect logs for troubleshooting | ⭐⭐⭐ |
| Backup & Recovery | backup_databases.yml | Automated database backups | ⭐⭐⭐ |
| Backup & Recovery | backup_configs.yml | Configuration and data backups | ⭐⭐⭐ |
| Backup & Recovery | disaster_recovery_test.yml | Test DR procedures | ⭐⭐ |
| Storage Management | disk_usage_report.yml | Monitor storage usage | ⭐⭐⭐ |
| Storage Management | prune_containers.yml | Clean up Docker resources | ⭐⭐ |
| Storage Management | log_rotation.yml | Manage log files | ⭐⭐ |
| Security | security_updates.yml | Automated security patches | ⭐⭐⭐ |
| Security | certificate_renewal.yml | SSL certificate management | ⭐⭐ |
| Monitoring | service_health_deep.yml | Comprehensive health checks | ⭐⭐ |
🚀 Quick Start
Prerequisites
- Ansible 2.12+
- SSH access to all hosts via Tailscale
- Existing inventory at /home/homelab/organized/repos/homelab/ansible/automation/hosts.ini
Run Your First Playbook
cd /home/homelab/organized/repos/homelab/ansible/automation
# Check status of all services
ansible-playbook playbooks/service_status.yml
# Check disk usage across all hosts
ansible-playbook playbooks/disk_usage_report.yml
# Backup all databases
ansible-playbook playbooks/backup_databases.yml
📦 Service Management Playbooks
service_status.yml - Service Status Check
Get comprehensive status of all services across your homelab.
# Check all hosts
ansible-playbook playbooks/service_status.yml
# Check specific host
ansible-playbook playbooks/service_status.yml --limit atlantis
# Generate JSON reports
ansible-playbook playbooks/service_status.yml
# Reports saved to: /tmp/HOSTNAME_status_TIMESTAMP.json
Features:
- System resource usage
- Container status and health
- Critical service monitoring
- Network connectivity checks
- JSON output for automation
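The report naming pattern noted above (`/tmp/HOSTNAME_status_TIMESTAMP.json`) makes it straightforward to pick up the most recent report from a script. A minimal sketch, assuming the reports are readable in a directory on the machine where you run it; `latest_status_report` is a hypothetical helper name, not something the playbooks provide:

```shell
# Hypothetical helper: print the newest status report for a host.
# Assumes files are named HOSTNAME_status_TIMESTAMP.json, as noted above.
latest_status_report() {
  local host="$1" dir="${2:-/tmp}" newest="" f
  for f in "$dir/${host}_status_"*.json; do
    [ -e "$f" ] || continue              # glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest="$f"                        # keep the most recently modified file
    fi
  done
  if [ -z "$newest" ]; then
    return 1                             # no report found for this host
  fi
  printf '%s\n' "$newest"
}
```

For example, `latest_status_report atlantis` prints the path of the newest report for atlantis, or returns a non-zero status if none exists.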
restart_service.yml - Service Restart with Dependencies
Restart services with proper dependency handling and health checks.
# Restart a service
ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"
# Restart with custom wait time
ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"
# Force restart if graceful stop fails
ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
Features:
- Dependency-aware restart order
- Health check validation
- Graceful stop with force option
- Pre/post restart logging
- Service-specific wait times
container_logs.yml - Log Collection
Collect logs from multiple containers for troubleshooting.
# Collect logs for specific service
ansible-playbook playbooks/container_logs.yml -e "service_name=plex"
# Collect logs matching pattern
ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"
# Collect all container logs
ansible-playbook playbooks/container_logs.yml -e "collect_all=true"
# Custom log parameters
ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
Features:
- Pattern-based container selection
- Error analysis and counting
- Resource usage reporting
- Structured log organization
- Archive option for long-term storage
💾 Backup & Recovery Playbooks
backup_databases.yml - Database Backup Automation
Automated backup of all PostgreSQL and MySQL databases.
# Backup all databases
ansible-playbook playbooks/backup_databases.yml
# Full backup with verification
ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"
# Specific host backup
ansible-playbook playbooks/backup_databases.yml --limit atlantis
# Custom retention
ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
Supported Databases:
- Atlantis: Immich, Vaultwarden, Joplin, Firefly
- Calypso: Authentik, Paperless
- Homelab VM: Mastodon, Matrix
Features:
- Automatic database discovery
- Compression and verification
- Retention management
- Backup integrity testing
- Multiple storage locations
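Retention management usually comes down to removing backup files older than `backup_retention_days`. The sketch below only lists candidates, so you can preview what a given retention value would affect. It assumes backups are plain files under a single directory (how the playbook actually stores and expires backups is not documented here), and `list_expired_backups` is a hypothetical helper:

```shell
# Hypothetical helper: list backup files older than a retention window.
# Usage: list_expired_backups DIRECTORY DAYS
list_expired_backups() {
  local dir="$1" days="$2"
  # -mtime +N matches files last modified more than N full days ago
  find "$dir" -type f -mtime +"$days" -print
}
```

Running `list_expired_backups /volume1/backups 60` would show what a 60-day retention would expire, using the backup path that appears in the Troubleshooting section as an example.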
backup_configs.yml - Configuration Backup
Backup docker-compose files, configs, and important data.
# Backup configurations
ansible-playbook playbooks/backup_configs.yml
# Include secrets (use with caution)
ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"
# Backup without compression
ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
Backup Includes:
- Docker configurations
- SSH configurations
- Service-specific data
- System information snapshots
- Docker-compose files
disaster_recovery_test.yml - DR Testing
Test disaster recovery procedures and validate backup integrity.
# Basic DR test (dry run)
ansible-playbook playbooks/disaster_recovery_test.yml
# Full DR test with restore validation
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"
# Test with failover procedures
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
Test Components:
- Backup validation and integrity
- Database restore testing
- RTO (Recovery Time Objective) analysis
- Service failover procedures
- DR readiness scoring
💿 Storage Management Playbooks
disk_usage_report.yml - Storage Monitoring
Monitor storage usage and generate comprehensive reports.
# Basic disk usage report
ansible-playbook playbooks/disk_usage_report.yml
# Detailed analysis with performance data
ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"
# Set custom alert thresholds
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"
# Send alerts for critical usage
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
Features:
- Filesystem usage monitoring
- Docker storage analysis
- Large file identification
- Temporary file analysis
- Alert thresholds and notifications
- JSON output for automation
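The two thresholds split usage into three bands. A minimal sketch of that classification, assuming a filesystem at or above `alert_threshold` counts as critical and one at or above `warning_threshold` counts as a warning. The playbook's exact comparison is not documented here, `classify_usage` is a hypothetical name, and the defaults simply mirror the example values above:

```shell
# Hypothetical helper: map a usage percentage to ok / warning / critical.
# Usage: classify_usage PERCENT [WARNING_THRESHOLD] [ALERT_THRESHOLD]
classify_usage() {
  local pct="$1" warn="${2:-80}" alert="${3:-90}"
  if [ "$pct" -ge "$alert" ]; then
    echo critical
  elif [ "$pct" -ge "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}
```

With the example thresholds, `classify_usage 85` prints `warning` and `classify_usage 95` prints `critical`.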
prune_containers.yml - Docker Cleanup
Clean up unused containers, images, volumes, and networks.
# Basic cleanup (dry run)
ansible-playbook playbooks/prune_containers.yml
# Live cleanup
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
# Aggressive cleanup (removes old images)
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
# Custom retention and log cleanup
ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
Cleanup Actions:
- Remove stopped containers
- Remove dangling images
- Remove unused volumes (optional)
- Remove unused networks
- Truncate large container logs
- System-wide Docker prune
log_rotation.yml - Log Management
Manage log files across all services and system components.
# Basic log rotation (dry run)
ansible-playbook playbooks/log_rotation.yml
# Live log rotation with compression
ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"
# Aggressive cleanup
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"
# Custom log size limits
ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
Log Management:
- System log rotation
- Docker container log truncation
- Application log cleanup
- Log compression
- Retention policies
- Logrotate configuration
🔒 Security Playbooks
security_updates.yml - Automated Security Updates
Apply security patches and system updates.
# Security updates only
ansible-playbook playbooks/security_updates.yml
# Security updates with reboot if needed
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
# Full system update
ansible-playbook playbooks/security_updates.yml -e "security_only=false"
# Include Docker updates
ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
Features:
- Security-only or full updates
- Pre-update configuration backup
- Kernel update detection
- Automatic reboot handling
- Service verification after updates
- Update reporting and logging
certificate_renewal.yml - SSL Certificate Management
Manage Let's Encrypt certificates and other SSL certificates.
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
# Renew certificates
ansible-playbook playbooks/certificate_renewal.yml
# Force renewal
ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"
# Custom renewal threshold
ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
Certificate Support:
- Let's Encrypt via Certbot
- Nginx Proxy Manager certificates
- Traefik certificates
- Synology DSM certificates
🏥 Monitoring Playbooks
service_health_deep.yml - Comprehensive Health Checks
Deep health monitoring for all homelab services.
# Deep health check
ansible-playbook playbooks/service_health_deep.yml
# Include performance metrics
ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"
# Enable alerting
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
# Custom timeout
ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
Health Checks:
- Container health status
- Service endpoint testing
- Database connectivity
- Redis connectivity
- System performance metrics
- Log error analysis
- Dependency validation
🔧 Advanced Usage
Combining Playbooks
# Complete maintenance routine
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_databases.yml
ansible-playbook playbooks/security_updates.yml
ansible-playbook playbooks/disk_usage_report.yml
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
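Pasted into a shell as a plain list, the routine above keeps going even if an earlier step fails, so cleanup could run after a failed backup. A minimal wrapper sketch that stops at the first failure; `run_maintenance` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical names introduced here for illustration:

```shell
# Command used to run playbooks; override it (e.g. with "echo") to preview.
ANSIBLE_PLAYBOOK="${ANSIBLE_PLAYBOOK:-ansible-playbook}"

run_maintenance() {
  local step
  # Same order as the routine above.
  for step in \
    playbooks/service_status.yml \
    playbooks/backup_databases.yml \
    playbooks/security_updates.yml \
    playbooks/disk_usage_report.yml
  do
    "$ANSIBLE_PLAYBOOK" "$step" || { echo "FAILED: $step" >&2; return 1; }
  done
  # Live cleanup runs last, and only if every earlier step succeeded.
  "$ANSIBLE_PLAYBOOK" playbooks/prune_containers.yml -e "dry_run=false" || {
    echo "FAILED: playbooks/prune_containers.yml" >&2
    return 1
  }
}
```

Setting `ANSIBLE_PLAYBOOK=echo` before calling `run_maintenance` prints the arguments for each step without running anything.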
Scheduling with Cron
# Add to crontab for automated execution
# Daily backups at 2 AM
0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml
# Weekly cleanup on Sundays at 3 AM
0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
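# Monthly DR test on first Sunday at 4 AM
# (cron runs a job when EITHER day-of-month OR day-of-week matches if both are restricted,
#  so the weekday is checked in the command; % must be escaped in a crontab)
0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml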
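Cron treats the day-of-month and day-of-week fields as alternatives when both are restricted, so a "first Sunday" schedule needs an explicit date check. A minimal sketch of such a guard for use in a wrapper script; `is_first_sunday` is a hypothetical helper, and its optional arguments exist only so the logic can be exercised without waiting for the right date:

```shell
# Hypothetical guard: succeed only on the first Sunday of the month.
# Usage: is_first_sunday [DAY_OF_MONTH] [ISO_WEEKDAY]   (defaults: today)
is_first_sunday() {
  local dom="${1:-$(date +%d)}" dow="${2:-$(date +%u)}"
  dom="${dom#0}"                          # "07" -> "7"
  # %u is the ISO weekday: 1 = Monday ... 7 = Sunday
  [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}
```

A script scheduled for every Sunday could then run `is_first_sunday && ansible-playbook playbooks/disaster_recovery_test.yml`.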
Custom Variables
Create host-specific variable files:
# host_vars/atlantis.yml
backup_retention_days: 60
max_log_size: "200M"
alert_threshold: 90
# host_vars/homelab_vm.yml
security_only: false
reboot_if_required: true
📊 Monitoring and Alerting
Integration with Existing Monitoring
These playbooks integrate with your existing Prometheus/Grafana stack:
# Generate metrics for Prometheus
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/disk_usage_report.yml
# JSON outputs can be parsed by monitoring systems
# Reports saved to /tmp/ directories with timestamps
Alert Configuration
# Enable alerts in playbooks
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
🚨 Emergency Procedures
Service Recovery
# Quick service restart
ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"
# Collect logs for troubleshooting
ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"
# Check service health
ansible-playbook playbooks/service_health_deep.yml --limit HOST
Storage Emergency
# Check disk usage immediately
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"
# Emergency cleanup
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
Security Incident
# Apply security updates immediately
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
🔍 Troubleshooting
Common Issues
Playbook Fails with Permission Denied
# Check SSH connectivity
ansible all -m ping
# Verify sudo access (should print "root" for every host)
ansible all -m command -a "whoami" --become
Docker Commands Fail
# Check Docker daemon status
ansible-playbook playbooks/service_status.yml --limit HOSTNAME
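# Verify Docker group membership (look for "docker" in the output)
# Note: "$USER" inside double quotes would be expanded on the control machine, not on the host
ansible HOST -m command -a "groups"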
Backup Failures
# Check backup directory permissions (stat only reads; it does not create or modify the directory)
ansible HOST -m stat -a "path=/volume1/backups" --become
# Test database connectivity
ansible-playbook playbooks/service_health_deep.yml --limit HOST
Debug Mode
# Run with verbose output
ansible-playbook playbooks/PLAYBOOK.yml -vvv
# Check specific tasks
ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
📚 Integration with Existing Automation
These playbooks complement your existing automation:
With Current Health Monitoring
# Existing health checks
ansible-playbook playbooks/synology_health.yml
ansible-playbook playbooks/check_apt_proxy.yml
# New comprehensive checks
ansible-playbook playbooks/service_health_deep.yml
ansible-playbook playbooks/disk_usage_report.yml
With GitOps Deployment
# After GitOps deployment
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_configs.yml
🎯 Best Practices
Regular Maintenance Schedule
- Daily: backup_databases.yml
- Weekly: security_updates.yml, disk_usage_report.yml
- Monthly: disaster_recovery_test.yml, prune_containers.yml
- As Needed: service_health_deep.yml, restart_service.yml
Safety Guidelines
- Always test with dry_run=true first
- Use --limit for single host testing
- Keep backups before major changes
- Monitor service status after automation
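The dry-run-first guideline can be enforced with a small wrapper. A sketch, assuming the playbook accepts a `dry_run` variable (as `prune_containers.yml` and `log_rotation.yml` do in the examples above); `dry_then_live` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical names:

```shell
# Hypothetical wrapper: run a playbook in dry-run mode, then live.
# Usage: dry_then_live PLAYBOOK [extra ansible-playbook arguments...]
dry_then_live() {
  local runner="${ANSIBLE_PLAYBOOK:-ansible-playbook}" playbook="$1"
  shift
  # Pass 1: dry run. Extra arguments (e.g. --limit atlantis) are forwarded.
  "$runner" "$playbook" -e "dry_run=true" "$@" || return 1
  # Pass 2: live run, only reached if the dry run exited successfully.
  "$runner" "$playbook" -e "dry_run=false" "$@"
}
```

For example, `dry_then_live playbooks/prune_containers.yml --limit atlantis`. A successful exit code is a weak check, so in practice you may prefer to review the dry-run output and add a confirmation prompt between the two passes.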
Performance Optimization
- Run resource-intensive playbooks during low-usage hours
- Use --forks to control parallelism
- Monitor system resources during execution
📞 Support
For issues with these playbooks:
- Check the troubleshooting section above
- Review playbook logs in /tmp/ directories
- Use debug mode (-vvv) for detailed output
- Verify integration with existing automation
Last Updated: {{ ansible_date_time.date if ansible_date_time is defined else 'Manual Update Required' }}
Total Playbooks: 10+ comprehensive automation playbooks
Coverage: Complete operational automation for homelab management