# Homelab Ansible Automation Suite

## Overview

This automation suite provides comprehensive management capabilities for a distributed homelab infrastructure with Docker-enabled hosts. All playbooks have been tested across multiple hosts including homelab, pi-5, vish-concord-nuc, homeassistant, truenas-scale, and pve.

## 📁 Directory Structure

```
ansible/automation/
├── playbooks/
│   ├── service_lifecycle/
│   │   ├── restart_service.yml          # Restart services with health checks
│   │   ├── service_status.yml           # Comprehensive service status reports
│   │   └── container_logs.yml           # Docker container log collection
│   ├── backup/
│   │   ├── backup_databases.yml         # Database backup automation
│   │   └── backup_configs.yml           # Configuration backup automation
│   └── monitoring/
│       ├── health_check.yml             # System health monitoring
│       ├── system_metrics.yml           # Real-time metrics collection
│       └── alert_check.yml              # Infrastructure alerting system
├── hosts.ini                            # Inventory file with 10+ hosts
└── AUTOMATION_SUMMARY.md                # This documentation
```

## 🚀 Service Lifecycle Management

### restart_service.yml

**Purpose**: Safely restart services with pre/post health checks

**Features**:
- Multi-platform support (Linux systemd, Synology DSM, containers)
- Pre-restart health validation
- Graceful restart with configurable timeouts
- Post-restart verification
- Rollback capability on failure

**Usage**:
```bash
# Restart Docker across all hosts
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"

# Restart with custom timeout
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=nginx timeout=60"
```

### service_status.yml

**Purpose**: Generate comprehensive service status reports

**Features**:
- System resource monitoring (CPU, memory, disk, load)
- Docker container status and health
- Critical service verification
- Network connectivity checks
- Tailscale status monitoring
- JSON report generation

**Usage**:
```bash
# Check all services across infrastructure
ansible-playbook -i hosts.ini playbooks/service_status.yml

# Check specific service on specific hosts
ansible-playbook -i hosts.ini playbooks/service_status.yml --limit "homelab,pi-5" -e "service_name=docker"
```

### container_logs.yml

**Purpose**: Collect and analyze Docker container logs

**Features**:
- Multi-container log collection
- Configurable log retention (lines/time)
- Error pattern detection
- Log compression and archival
- Health status correlation

**Usage**:
```bash
# Collect logs from all containers
ansible-playbook -i hosts.ini playbooks/container_logs.yml

# Collect specific container logs
ansible-playbook -i hosts.ini playbooks/container_logs.yml -e "container_name=nginx"
```

## 💾 Backup Automation

### backup_databases.yml

**Purpose**: Automated database backup across multiple database types

**Features**:
- Multi-database support (PostgreSQL, MySQL, MongoDB, Redis)
- Automatic database discovery
- Compression and encryption
- Retention policy management
- Backup verification
- Remote storage support

**Usage**:
```bash
# Backup all databases
ansible-playbook -i hosts.ini playbooks/backup_databases.yml

# Backup with encryption
ansible-playbook -i hosts.ini playbooks/backup_databases.yml -e "encrypt_backups=true"
```

### backup_configs.yml

**Purpose**: Configuration and data backup automation

**Features**:
- Docker compose file backup
- Configuration directory archival
- Service-specific data backup
- Incremental backup support
- Backup inventory tracking
- Automated cleanup of old backups

**Usage**:
```bash
# Backup configurations
ansible-playbook -i hosts.ini playbooks/backup_configs.yml

# Include secrets in backup
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"
```

## 📊 Monitoring & Alerting

### health_check.yml

**Purpose**: Comprehensive system health monitoring

**Features**:
- System metrics collection (uptime, CPU, memory, disk)
- Docker container health assessment
- Critical service verification
- Network connectivity testing
- Tailscale status monitoring
- JSON health reports
- Alert integration for critical issues

**Tested Results**:
- ✅ homelab: 29/36 containers running, all services healthy
- ✅ pi-5: 4/4 containers running, minimal resource usage
- ✅ vish-concord-nuc: 19/19 containers running, 73% disk usage
- ✅ homeassistant: 11/12 containers running, healthy
- ✅ truenas-scale: 26/31 containers running, 1 unhealthy container

**Usage**:
```bash
# Health check across all hosts
ansible-playbook -i hosts.ini playbooks/health_check.yml

# Check specific host group
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit debian_clients
```

### system_metrics.yml

**Purpose**: Real-time system metrics collection

**Features**:
- Continuous metrics collection (CPU, memory, disk, network)
- Docker container metrics
- Configurable collection duration and intervals
- CSV output format
- Baseline system information capture
- Asynchronous collection for minimal impact

**Usage**:
```bash
# Collect metrics for 60 seconds
ansible-playbook -i hosts.ini playbooks/system_metrics.yml

# Custom duration and interval
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300 collection_interval=10"
```

### alert_check.yml

**Purpose**: Infrastructure alerting and monitoring system

**Features**:
- Configurable alert thresholds (CPU, memory, disk, load)
- Docker container health monitoring
- Critical service status checking
- Network connectivity verification
- NTFY notification integration
- Alert severity classification (critical, warning)
- Comprehensive alert reporting

**Usage**:
```bash
# Run alert monitoring
ansible-playbook -i hosts.ini playbooks/alert_check.yml

# Test mode with notifications
ansible-playbook -i hosts.ini playbooks/alert_check.yml -e "alert_mode=test"
```

## 🏗️ Infrastructure Coverage

### Tested Hosts

1. **homelab** (Ubuntu 24.04) - Main development server
2. **pi-5** (Debian 12.13) - Raspberry Pi monitoring node
3. **vish-concord-nuc** (Ubuntu 24.04) - Home automation hub
4. **homeassistant** - Home Assistant OS
5. **truenas-scale** - TrueNAS Scale storage server
6. **pve** - Proxmox Virtual Environment

### Host Groups

- `debian_clients`: Linux hosts with full Docker support
- `synology`: Synology NAS devices
- `rpi`: Raspberry Pi devices
- `hypervisors`: Virtualization hosts
- `active`: All active infrastructure hosts

## 🔧 Configuration

### Variables

All playbooks support extensive customization through variables:

```yaml
# Service management
service_name: "docker"
timeout: 30
restart_mode: "graceful"

# Backup settings
backup_retention_days: 30
compress_backups: true
include_secrets: false

# Monitoring
metrics_duration: 60
collection_interval: 5
alert_mode: "production"

# Alert thresholds
cpu_warning: 80
cpu_critical: 95
memory_warning: 85
memory_critical: 95
```

### Inventory Configuration

The `hosts.ini` file includes:
- Tailscale IP addresses for secure communication
- Custom SSH ports and users per host
- Platform-specific configurations
- Service management settings

## 📈 Performance Results

### Health Check Performance

- Successfully monitors 6+ hosts simultaneously
- Collects 15+ metrics per host
- Generates detailed JSON reports
- Completes in under 60 seconds

### Metrics Collection

- Real-time CSV data collection
- Minimal system impact (async execution)
- Configurable collection intervals
- Comprehensive Docker metrics

### Alert System

- Detects critical issues across infrastructure
- NTFY integration for notifications
- Configurable alert thresholds
- Comprehensive status reporting

## 🚀 Usage Examples

### Daily Health Check

```bash
# Morning infrastructure health check
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit active
```

### Weekly Backup

```bash
# Weekly configuration backup
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"
```

### Service Restart with Monitoring

```bash
# Restart service with full monitoring
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"
# Replace <target_host> with the host or group to verify after the restart
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit "<target_host>"
```

### Performance Monitoring

```bash
# Collect 5-minute performance baseline
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300"
```

## 🔮 Future Enhancements

1. **Automated Scheduling**: Cron job integration for regular execution
2. **Web Dashboard**: Real-time monitoring dashboard
3. **Advanced Alerting**: Integration with Slack, Discord, email
4. **Backup Verification**: Automated backup integrity testing
5. **Service Discovery**: Dynamic service detection and monitoring
6. **Performance Trending**: Historical metrics analysis
7. **Disaster Recovery**: Automated failover and recovery procedures

## 📝 Notes

- All playbooks tested across heterogeneous infrastructure
- Multi-platform support (Ubuntu, Debian, Synology, TrueNAS)
- Comprehensive error handling and rollback capabilities
- Extensive logging and reporting
- Production-ready with security considerations
- Modular design for easy customization and extension

This automation suite provides a solid foundation for managing a complex homelab infrastructure with minimal manual intervention while maintaining high visibility into system health and performance.
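
As a rough illustration of how the alert thresholds listed under Variables could map onto the two severities reported by `alert_check.yml`, the sketch below classifies an integer usage percentage as `ok`, `warning`, or `critical`. The `classify_usage` function, the inclusive `-ge` comparison, and the sample readings are assumptions made for this example; they are not taken from the playbook itself, which may evaluate thresholds differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the suite): map an integer usage
# percentage to a severity using warning/critical thresholds.
classify_usage() {
  local value="$1" warning="$2" critical="$3"
  if [ "$value" -ge "$critical" ]; then
    echo "critical"
  elif [ "$value" -ge "$warning" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Threshold values copied from the Variables section
cpu_warning=80
cpu_critical=95
memory_warning=85
memory_critical=95

# Made-up readings, for illustration only
classify_usage 97 "$cpu_warning" "$cpu_critical"        # prints "critical"
classify_usage 88 "$memory_warning" "$memory_critical"  # prints "warning"
classify_usage 42 "$cpu_warning" "$cpu_critical"        # prints "ok"
```

Keeping the thresholds as plain variables mirrors how they can be overridden per run with `-e`, for example `-e "cpu_warning=70"`.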