Gitea Mirror Bot fb88e1b6d4
Sanitized mirror from private repository - 2026-03-27 11:43:05 UTC

🏠 Homelab Ansible Playbooks

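Automation playbooks for managing your homelab infrastructure. They add operational tasks (service control, backups and DR testing, storage cleanup, security patching, certificate management, and deep health checks) on top of the existing health-monitoring and system-management playbooks.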

📋 Quick Reference

Category            Playbook                     Purpose
Service Management  service_status.yml           Get status of all services
                    restart_service.yml          Restart services with dependencies
                    container_logs.yml           Collect logs for troubleshooting
Backup & Recovery   backup_databases.yml         Automated database backups
                    backup_configs.yml           Configuration and data backups
                    disaster_recovery_test.yml   Test DR procedures
Storage Management  disk_usage_report.yml        Monitor storage usage
                    prune_containers.yml         Clean up Docker resources
                    log_rotation.yml             Manage log files
Security            security_updates.yml         Automated security patches
                    certificate_renewal.yml      SSL certificate management
Monitoring          service_health_deep.yml      Comprehensive health checks

🚀 Quick Start

Prerequisites

  • Ansible 2.12+
  • SSH access to all hosts via Tailscale
  • Existing inventory at /home/homelab/organized/repos/homelab/ansible/automation/hosts.ini (add -i hosts.ini to the commands below if ansible.cfg does not already point to it)

Run Your First Playbook

cd /home/homelab/organized/repos/homelab/ansible/automation

# Check status of all services
ansible-playbook playbooks/service_status.yml

# Check disk usage across all hosts
ansible-playbook playbooks/disk_usage_report.yml

# Backup all databases
ansible-playbook playbooks/backup_databases.yml

📦 Service Management Playbooks

service_status.yml - Service Status Check

Get comprehensive status of all services across your homelab.

# Check all hosts
ansible-playbook playbooks/service_status.yml

# Check specific host
ansible-playbook playbooks/service_status.yml --limit atlantis

# Each run also writes a JSON report per host to:
# /tmp/HOSTNAME_status_TIMESTAMP.json

Features:

  • System resource usage
  • Container status and health
  • Critical service monitoring
  • Network connectivity checks
  • JSON output for automation
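Because each run leaves a timestamped report per host, downstream scripts usually want only the newest one. A minimal sketch, assuming the /tmp/HOSTNAME_status_TIMESTAMP.json naming above and that file modification time reflects run order; latest_status_report is a hypothetical helper, not part of the playbook:

```shell
#!/bin/sh
# latest_status_report HOST [DIR]
# Prints the newest HOST_status_*.json report in DIR (default /tmp), judged by mtime.
# Hypothetical helper; assumes the HOSTNAME_status_TIMESTAMP.json naming.
latest_status_report() {
    host=$1
    dir=${2:-/tmp}
    # ls -t lists newest first; an unmatched glob stays literal and ls fails quietly
    newest=$(ls -t "$dir"/"$host"_status_*.json 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1   # no report for this host yet
    printf '%s\n' "$newest"
}
```

For example, `latest_status_report atlantis` prints the path of the most recent Atlantis report, or returns 1 if none exists.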

restart_service.yml - Service Restart with Dependencies

Restart services with proper dependency handling and health checks.

# Restart a service
ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"

# Restart with custom wait time
ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"

# Force restart if graceful stop fails
ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"

Features:

  • Dependency-aware restart order
  • Health check validation
  • Graceful stop with force option
  • Pre/post restart logging
  • Service-specific wait times
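Dependency-aware ordering is a topological sort: start what a service depends on first, and stop it last. The sketch below uses tsort (coreutils) on a hypothetical dependency list for an Immich stack; the edges and function names are illustrative, not the data or logic inside restart_service.yml:

```shell
#!/bin/sh
# Hypothetical dependency map: one "DEPENDENCY DEPENDENT" pair per line.
# These edges are illustrative only.
deps='immich-database immich-server
immich-redis immich-server'

# start_order PAIRS: tsort prints a topological order in which every
# dependency appears before the services that need it.
start_order() {
    printf '%s\n' "$1" | tsort
}

# stop_order PAIRS: the reverse, so dependents stop before what they rely on.
stop_order() {
    start_order "$1" | tac
}
```

A restart then stops containers in `stop_order "$deps"` order and starts them in `start_order "$deps"` order. tsort also reports a loop and exits non-zero when the map contains a cycle, which catches a mistaken dependency list early.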

container_logs.yml - Log Collection

Collect logs from multiple containers for troubleshooting.

# Collect logs for specific service
ansible-playbook playbooks/container_logs.yml -e "service_name=plex"

# Collect logs matching pattern
ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"

# Collect all container logs
ansible-playbook playbooks/container_logs.yml -e "collect_all=true"

# Custom log parameters
ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"

Features:

  • Pattern-based container selection
  • Error analysis and counting
  • Resource usage reporting
  • Structured log organization
  • Archive option for long-term storage
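The error analysis boils down to counting log lines that match error-like keywords. A minimal sketch, assuming a keyword list (error, fatal, panic, exception) that may differ from the one container_logs.yml uses; count_errors is a hypothetical name:

```shell
#!/bin/sh
# count_errors: read a log on stdin and print how many lines look like errors.
# The keyword list is an assumption; matching is case-insensitive.
count_errors() {
    # grep -c prints 0 and exits 1 when nothing matches, so force success
    grep -ciE 'error|fatal|panic|exception' || true
}
```

Typical use: `docker logs --tail 500 --since 2h plex 2>&1 | count_errors`. The `2>&1` matters because docker logs replays the container's stderr on stderr.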

💾 Backup & Recovery Playbooks

backup_databases.yml - Database Backup Automation

Automated backup of all PostgreSQL and MySQL databases.

# Backup all databases
ansible-playbook playbooks/backup_databases.yml

# Full backup with verification
ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"

# Specific host backup
ansible-playbook playbooks/backup_databases.yml --limit atlantis

# Custom retention
ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"

Supported Databases:

  • Atlantis: Immich, Vaultwarden, Joplin, Firefly
  • Calypso: Authentik, Paperless
  • Homelab VM: Mastodon, Matrix

Features:

  • Automatic database discovery
  • Compression and verification
  • Retention management
  • Backup integrity testing
  • Multiple storage locations
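Compression, verification, and retention map onto standard tools: gzip to compress, `gzip -t` to check integrity, and `find -mtime` to expire old dumps. A sketch under those assumptions; the *.sql.gz naming and helper names are hypothetical, and the pg_dump line in the comments uses placeholder container, user, and database names:

```shell
#!/bin/sh
# verify_backup FILE: gzip -t decompresses in memory and checks the CRC.
verify_backup() {
    gzip -t "$1" 2>/dev/null
}

# prune_backups DIR DAYS: delete *.sql.gz dumps older than DAYS.
# find -mtime +N counts whole 24-hour periods, so +60 matches files at least 61 days old.
prune_backups() {
    find "$1" -maxdepth 1 -type f -name '*.sql.gz' -mtime +"$2" -print -delete
}

# Example (placeholder names, not taken from backup_databases.yml):
# docker exec immich_postgres pg_dump -U postgres immich | gzip > "$backup_dir/immich-$(date +%F).sql.gz"
# verify_backup "$backup_dir/immich-$(date +%F).sql.gz" && prune_backups "$backup_dir" 30
```

Pruning only after a fresh dump verifies avoids deleting the last good backup when the new one is corrupt.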

backup_configs.yml - Configuration Backup

Backup docker-compose files, configs, and important data.

# Backup configurations
ansible-playbook playbooks/backup_configs.yml

# Include secrets (use with caution)
ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"

# Backup without compression
ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"

Backup Includes:

  • Docker configurations
  • SSH configurations
  • Service-specific data
  • System information snapshots
  • Docker-compose files
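The include_secrets switch implies that some files are left out by default. Assuming secrets live in .env files (an assumption; the playbook may define secrets differently), a tar-based sketch of that behaviour could look like this, with archive_configs as a hypothetical helper:

```shell
#!/bin/sh
# archive_configs SRC_DIR OUT_FILE [include_secrets]
# Creates a gzip-compressed tar of SRC_DIR. Unless include_secrets=true,
# files matching *.env (including a bare .env) are excluded.
archive_configs() {
    src=$1
    out=$2
    include_secrets=${3:-false}
    if [ "$include_secrets" = "true" ]; then
        tar -czf "$out" -C "$src" .
    else
        # --exclude comes before the member list so GNU tar applies it; bsdtar accepts it too
        tar --exclude='*.env' -czf "$out" -C "$src" .
    fi
}
```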

disaster_recovery_test.yml - DR Testing

Test disaster recovery procedures and validate backup integrity.

# Basic DR test (dry run)
ansible-playbook playbooks/disaster_recovery_test.yml

# Full DR test with restore validation
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"

# Test with failover procedures
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"

Test Components:

  • Backup validation and integrity
  • Database restore testing
  • RTO (Recovery Time Objective) analysis
  • Service failover procedures
  • DR readiness scoring
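RTO analysis comes down to timing a restore and comparing the result with the objective. A hypothetical helper, measure_rto, that wraps any restore command; its return codes (1 for a missed RTO, 2 for a failed restore) are choices made for this sketch, not the playbook's scoring:

```shell
#!/bin/sh
# measure_rto TARGET_SECONDS COMMAND [ARGS...]
# Runs a restore step, reports how long it took, and checks it against the target.
# Returns 0 if within the RTO, 1 if too slow, 2 if the restore command failed.
measure_rto() {
    target=$1
    shift
    start=$(date +%s)
    "$@" || return 2
    elapsed=$(( $(date +%s) - start ))
    echo "restore took ${elapsed}s (RTO target ${target}s)"
    [ "$elapsed" -le "$target" ]
}
```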

💿 Storage Management Playbooks

disk_usage_report.yml - Storage Monitoring

Monitor storage usage and generate comprehensive reports.

# Basic disk usage report
ansible-playbook playbooks/disk_usage_report.yml

# Detailed analysis with performance data
ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"

# Set custom alert thresholds
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"

# Send alerts for critical usage
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"

Features:

  • Filesystem usage monitoring
  • Docker storage analysis
  • Large file identification
  • Temporary file analysis
  • Alert thresholds and notifications
  • JSON output for automation
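The two thresholds split usage into three bands. A sketch of that classification, assuming that usage at or above a threshold triggers it; usage_level and report_disks are hypothetical helpers, and the 80/90 defaults are borrowed from the example command above:

```shell
#!/bin/sh
# usage_level PERCENT WARNING_THRESHOLD ALERT_THRESHOLD -> ok | warning | critical
usage_level() {
    if [ "$1" -ge "$3" ]; then
        echo critical
    elif [ "$1" -ge "$2" ]; then
        echo warning
    else
        echo ok
    fi
}

# report_disks [WARNING] [ALERT]: classify every mounted filesystem.
# df -P prints one line per filesystem; column 5 is the capacity (e.g. 42%).
report_disks() {
    df -P | awk 'NR > 1 && $5 ~ /^[0-9]+%$/ { sub("%", "", $5); print $5, $6 }' |
    while read -r used mount; do
        printf '%-8s %3s%%  %s\n' "$(usage_level "$used" "${1:-80}" "${2:-90}")" "$used" "$mount"
    done
}
```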

prune_containers.yml - Docker Cleanup

Clean up unused containers, images, volumes, and networks.

# Basic cleanup (dry run)
ansible-playbook playbooks/prune_containers.yml

# Live cleanup
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

# Aggressive cleanup (removes old images)
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"

# Custom retention and log cleanup
ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"

Cleanup Actions:

  • Remove stopped containers
  • Remove dangling images
  • Remove unused volumes (optional)
  • Remove unused networks
  • Truncate large container logs
  • System-wide Docker prune
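Several of these playbooks default to a dry run, a common pattern: print each command unless dry_run=false. A sketch using real docker prune subcommands; the run wrapper, the 7-day keep_images_days default, and mapping aggressive_cleanup to `docker image prune -a` are assumptions, not prune_containers.yml's code:

```shell
#!/bin/sh
dry_run=${dry_run:-true}
keep_images_days=${keep_images_days:-7}   # assumed default

# run CMD...: execute only when dry_run=false, otherwise just show the command
run() {
    if [ "$dry_run" = "false" ]; then
        "$@"
    else
        echo "[dry-run] $*"
    fi
}

prune_docker() {
    run docker container prune -f          # stopped containers
    run docker image prune -f              # dangling images
    run docker network prune -f            # unused networks
    if [ "${aggressive_cleanup:-false}" = "true" ]; then
        # -a widens the prune to all unused images; until= keeps recent ones
        run docker image prune -a -f --filter "until=$((keep_images_days * 24))h"
    fi
}
```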

log_rotation.yml - Log Management

Manage log files across all services and system components.

# Basic log rotation (dry run)
ansible-playbook playbooks/log_rotation.yml

# Live log rotation with compression
ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"

# Aggressive cleanup
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"

# Custom log size limits
ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"

Log Management:

  • System log rotation
  • Docker container log truncation
  • Application log cleanup
  • Log compression
  • Retention policies
  • Logrotate configuration
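Docker container log truncation usually targets the json-file driver's logs, stored as /var/lib/docker/containers/<id>/<id>-json.log. A sketch, assuming that driver and a find-style size such as 50M (matching the max_log_size=50M example); truncate_large_logs is a hypothetical name:

```shell
#!/bin/sh
# truncate_large_logs DIR MAX_SIZE
# Empties *-json.log files larger than MAX_SIZE (a find -size value such as 50M).
truncate_large_logs() {
    find "$1" -type f -name '*-json.log' -size +"$2" -print -exec truncate -s 0 {} \;
}

# Example (needs root): truncate_large_logs /var/lib/docker/containers 50M
```

Truncating in place frees the space immediately; deleting the file instead would leave Docker writing to an unlinked file whose space is not released until the container restarts.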

🔒 Security Playbooks

security_updates.yml - Automated Security Updates

Apply security patches and system updates.

# Security updates only
ansible-playbook playbooks/security_updates.yml

# Security updates with reboot if needed
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Full system update
ansible-playbook playbooks/security_updates.yml -e "security_only=false"

# Include Docker updates
ansible-playbook playbooks/security_updates.yml -e "update_docker=true"

Features:

  • Security-only or full updates
  • Pre-update configuration backup
  • Kernel update detection
  • Automatic reboot handling
  • Service verification after updates
  • Update reporting and logging
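On Ubuntu, kernel update detection commonly relies on the /var/run/reboot-required flag file that package hooks create after updates such as a new kernel. A sketch of that check, assuming the Ubuntu convention (other distributions may not create the file); reboot_required is a hypothetical helper:

```shell
#!/bin/sh
# reboot_required [FLAG_FILE]
# Succeeds when the flag file exists and lists the responsible packages
# from the companion .pkgs file when one is present.
reboot_required() {
    flag=${1:-/var/run/reboot-required}
    if [ -f "$flag" ]; then
        [ -f "$flag.pkgs" ] && sed 's/^/  needs reboot: /' "$flag.pkgs"
        return 0
    fi
    return 1
}
```

The same check can be run ad hoc across hosts with `ansible all -m stat -a "path=/var/run/reboot-required"`.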

certificate_renewal.yml - SSL Certificate Management

Manage Let's Encrypt certificates and other SSL certificates.

# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"

# Renew certificates
ansible-playbook playbooks/certificate_renewal.yml

# Force renewal
ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"

# Custom renewal threshold
ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"

Certificate Support:

  • Let's Encrypt via Certbot
  • Nginx Proxy Manager certificates
  • Traefik certificates
  • Synology DSM certificates
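Whether a certificate falls inside renewal_threshold_days can be checked with `openssl x509 -checkend`, which takes a number of seconds. A sketch; needs_renewal is a hypothetical helper, and it treats an unreadable certificate as due for renewal:

```shell
#!/bin/sh
# needs_renewal CERT_FILE THRESHOLD_DAYS
# Succeeds when the certificate expires within THRESHOLD_DAYS (or cannot be read).
needs_renewal() {
    cert=$1
    days=$2
    # -checkend N exits 0 if the certificate is still valid N seconds from now
    if openssl x509 -checkend $((days * 86400)) -noout -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For a Certbot-managed certificate: `needs_renewal /etc/letsencrypt/live/DOMAIN/fullchain.pem 30 && echo "renew DOMAIN"`, with DOMAIN as a placeholder.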

🏥 Monitoring Playbooks

service_health_deep.yml - Comprehensive Health Checks

Deep health monitoring for all homelab services.

# Deep health check
ansible-playbook playbooks/service_health_deep.yml

# Include performance metrics
ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"

# Enable alerting
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"

# Custom timeout
ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"

Health Checks:

  • Container health status
  • Service endpoint testing
  • Database connectivity
  • Redis connectivity
  • System performance metrics
  • Log error analysis
  • Dependency validation
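Each of these checks needs an upper bound on how long it may take, which is what health_check_timeout controls. A sketch that wraps any probe in coreutils `timeout`; run_check is hypothetical, and the commented probes use placeholder hosts and ports:

```shell
#!/bin/sh
health_check_timeout=${health_check_timeout:-30}

# run_check NAME COMMAND [ARGS...]: run one probe, killing it after the timeout.
# timeout(1) exits 124 when the limit is reached.
run_check() {
    name=$1
    shift
    if timeout "$health_check_timeout" "$@" >/dev/null 2>&1; then
        rc=0
    else
        rc=$?
    fi
    case $rc in
        0)   echo "OK      $name" ;;
        124) echo "TIMEOUT $name" ;;
        *)   echo "FAIL    $name (exit $rc)" ;;
    esac
    return "$rc"
}

# Example probes (HOST and PORT are placeholders):
# run_check web      curl -fsS -o /dev/null http://HOST:PORT/
# run_check redis    redis-cli -h HOST ping
# run_check postgres pg_isready -h HOST -p 5432
```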

🔧 Advanced Usage

Combining Playbooks

# Complete maintenance routine
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_databases.yml
ansible-playbook playbooks/security_updates.yml
ansible-playbook playbooks/disk_usage_report.yml
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
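When the routine runs as a script, stopping at the first failure keeps later steps, such as updates, from running after a failed backup. A sketch; run_maintenance and the PLAYBOOK_CMD override (handy for rehearsing with echo) are hypothetical:

```shell
#!/bin/sh
# run_maintenance: run the routine in order and stop at the first failing playbook.
# Run from the automation directory so the playbooks/ paths resolve.
run_maintenance() {
    cmd=${PLAYBOOK_CMD:-ansible-playbook}
    for pb in service_status backup_databases security_updates disk_usage_report; do
        echo "==> $pb"
        $cmd "playbooks/$pb.yml" || { echo "!! $pb failed, stopping" >&2; return 1; }
    done
    echo "==> prune_containers (live)"
    $cmd playbooks/prune_containers.yml -e "dry_run=false"
}
```

`PLAYBOOK_CMD=echo run_maintenance` prints the commands without running them.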

Scheduling with Cron

# Add to crontab for automated execution
# Daily backups at 2 AM
0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml

# Weekly cleanup on Sundays at 3 AM
0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

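# Monthly DR test on the first Sunday at 4 AM
# (cron ORs day-of-month and day-of-week, so "1-7 * 0" would fire on days 1-7 and every Sunday;
#  check the date in the command instead, escaping % as \% inside crontab)
0 4 * * 0 [ "$(date +\%d)" -le 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml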
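Cron treats day-of-month and day-of-week as alternatives when both are restricted, so a first-Sunday job needs the date checked in the command itself. The same logic as a testable function; is_first_sunday is a hypothetical name, and `date -d` assumes GNU date:

```shell
#!/bin/sh
# is_first_sunday [DATE]: succeed when DATE (default: today) is a Sunday on day 1-7.
is_first_sunday() {
    if [ -n "$1" ]; then
        dow=$(date -d "$1" +%u)   # 1 = Monday ... 7 = Sunday
        dom=$(date -d "$1" +%d)   # zero-padded day of month; [ ] compares it as decimal
    else
        dow=$(date +%u)
        dom=$(date +%d)
    fi
    [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}
```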

Custom Variables

Create host-specific variable files:

# host_vars/atlantis.yml
backup_retention_days: 60
max_log_size: "200M"
alert_threshold: 90

# host_vars/homelab_vm.yml
security_only: false
reboot_if_required: true

📊 Monitoring and Alerting

Integration with Existing Monitoring

The JSON reports these playbooks write can be fed into your existing Prometheus/Grafana stack:

# Generate per-host JSON reports
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/disk_usage_report.yml

# JSON outputs can be parsed by monitoring systems
# Reports saved to /tmp/ directories with timestamps
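One way to get these results into Prometheus is node_exporter's textfile collector, which reads *.prom files from the directory given by --collector.textfile.directory. A sketch that writes one gauge atomically; write_metric and the metric name are hypothetical, and extracting the value from a JSON report is left to the caller:

```shell
#!/bin/sh
# write_metric DIR BASENAME METRIC HOST VALUE
# Writes DIR/BASENAME.prom containing one gauge in Prometheus text format.
write_metric() {
    dir=$1 base=$2 metric=$3 host=$4 value=$5
    tmp="$dir/.$base.prom.$$"          # hidden temp name the collector ignores
    {
        echo "# TYPE $metric gauge"
        echo "$metric{host=\"$host\"} $value"
    } > "$tmp"
    # rename within one filesystem is atomic, so scrapes never see a half-written file
    mv "$tmp" "$dir/$base.prom"
}
```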

Alert Configuration

# Enable alerts in playbooks
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"

🚨 Emergency Procedures

Service Recovery

# Quick service restart
ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"

# Collect logs for troubleshooting
ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"

# Check service health
ansible-playbook playbooks/service_health_deep.yml --limit HOST

Storage Emergency

# Check disk usage immediately
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"

# Emergency cleanup
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"

Security Incident

# Apply security updates immediately
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"

🔍 Troubleshooting

Common Issues

Playbook Fails with Permission Denied

# Check SSH connectivity
ansible all -m ping

# Verify privilege escalation (should print "root" on every host)
ansible all -m command -a "whoami" --become

Docker Commands Fail

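# Check Docker daemon status
ansible-playbook playbooks/service_status.yml --limit HOST

# Verify Docker group membership (groups with no argument reports the remote login user;
# a local $USER inside double quotes would expand on the control node instead)
ansible HOST -m command -a "groups"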

Backup Failures

# Check backup directory ownership and permissions
ansible HOST -m stat -a "path=/volume1/backups" --become

# Test database connectivity
ansible-playbook playbooks/service_health_deep.yml --limit HOST

Debug Mode

# Run with verbose output
ansible-playbook playbooks/PLAYBOOK.yml -vvv

# Check specific tasks
ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"

📚 Integration with Existing Automation

These playbooks complement your existing automation:

With Current Health Monitoring

# Existing health checks
ansible-playbook playbooks/synology_health.yml
ansible-playbook playbooks/check_apt_proxy.yml

# New comprehensive checks
ansible-playbook playbooks/service_health_deep.yml
ansible-playbook playbooks/disk_usage_report.yml

With GitOps Deployment

# After GitOps deployment
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_configs.yml

🎯 Best Practices

Regular Maintenance Schedule

  • Daily: backup_databases.yml
  • Weekly: security_updates.yml, disk_usage_report.yml
  • Monthly: disaster_recovery_test.yml, prune_containers.yml
  • As Needed: service_health_deep.yml, restart_service.yml

Safety Guidelines

  • Run playbooks that support dry_run in their default dry-run mode first, then repeat with dry_run=false
  • Use --limit for single host testing
  • Keep backups before major changes
  • Monitor service status after automation

Performance Optimization

  • Run resource-intensive playbooks during low-usage hours
  • Use --forks to control parallelism
  • Monitor system resources during execution

📞 Support

For issues with these playbooks:

  1. Check the troubleshooting section above
  2. Review playbook logs in /tmp/ directories
  3. Use debug mode (-vvv) for detailed output
  4. Verify integration with existing automation

Last Updated: Manual Update Required
Total Playbooks: 12 documented in this README
Coverage: service management, backup and recovery, storage, security, and monitoring