Files
homelab-optimized/ansible/automation/AUTOMATION_SUMMARY.md
Gitea Mirror Bot 29ab8f0ed9
Some checks failed
Documentation / Deploy to GitHub Pages (push) Has been cancelled
Documentation / Build Docusaurus (push) Has started running
Sanitized mirror from private repository - 2026-04-05 10:48:21 UTC
2026-04-05 10:48:21 +00:00

9.4 KiB

Homelab Ansible Automation Suite

Overview

This automation suite provides comprehensive management capabilities for a distributed homelab infrastructure with Docker-enabled hosts. All playbooks have been tested across multiple hosts including homelab, pi-5, vish-concord-nuc, homeassistant, truenas-scale, and pve.

📁 Directory Structure

ansible/automation/
├── playbooks/
│   ├── service_lifecycle/
│   │   ├── restart_service.yml      # Restart services with health checks
│   │   ├── service_status.yml       # Comprehensive service status reports
│   │   └── container_logs.yml       # Docker container log collection
│   ├── backup/
│   │   ├── backup_databases.yml     # Database backup automation
│   │   └── backup_configs.yml       # Configuration backup automation
│   └── monitoring/
│       ├── health_check.yml         # System health monitoring
│       ├── system_metrics.yml       # Real-time metrics collection
│       └── alert_check.yml          # Infrastructure alerting system
├── hosts.ini                        # Inventory file with 10+ hosts
└── AUTOMATION_SUMMARY.md            # This documentation

🚀 Service Lifecycle Management

restart_service.yml

Purpose: Safely restart services with pre/post health checks Features:

  • Multi-platform support (Linux systemd, Synology DSM, containers)
  • Pre-restart health validation
  • Graceful restart with configurable timeouts
  • Post-restart verification
  • Rollback capability on failure

Usage:

# Restart Docker across all hosts
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"

# Restart with custom timeout
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=nginx timeout=60"

service_status.yml

Purpose: Generate comprehensive service status reports Features:

  • System resource monitoring (CPU, memory, disk, load)
  • Docker container status and health
  • Critical service verification
  • Network connectivity checks
  • Tailscale status monitoring
  • JSON report generation

Usage:

# Check all services across infrastructure
ansible-playbook -i hosts.ini playbooks/service_status.yml

# Check specific service on specific hosts
ansible-playbook -i hosts.ini playbooks/service_status.yml --limit "homelab,pi-5" -e "service_name=docker"

container_logs.yml

Purpose: Collect and analyze Docker container logs Features:

  • Multi-container log collection
  • Configurable log retention (lines/time)
  • Error pattern detection
  • Log compression and archival
  • Health status correlation

Usage:

# Collect logs from all containers
ansible-playbook -i hosts.ini playbooks/container_logs.yml

# Collect specific container logs
ansible-playbook -i hosts.ini playbooks/container_logs.yml -e "container_name=nginx"

💾 Backup Automation

backup_databases.yml

Purpose: Automated database backup across multiple database types Features:

  • Multi-database support (PostgreSQL, MySQL, MongoDB, Redis)
  • Automatic database discovery
  • Compression and encryption
  • Retention policy management
  • Backup verification
  • Remote storage support

Usage:

# Backup all databases
ansible-playbook -i hosts.ini playbooks/backup_databases.yml

# Backup with encryption
ansible-playbook -i hosts.ini playbooks/backup_databases.yml -e "encrypt_backups=true"

backup_configs.yml

Purpose: Configuration and data backup automation Features:

  • Docker compose file backup
  • Configuration directory archival
  • Service-specific data backup
  • Incremental backup support
  • Backup inventory tracking
  • Automated cleanup of old backups

Usage:

# Backup configurations
ansible-playbook -i hosts.ini playbooks/backup_configs.yml

# Include secrets in backup
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"

📊 Monitoring & Alerting

health_check.yml

Purpose: Comprehensive system health monitoring Features:

  • System metrics collection (uptime, CPU, memory, disk)
  • Docker container health assessment
  • Critical service verification
  • Network connectivity testing
  • Tailscale status monitoring
  • JSON health reports
  • Alert integration for critical issues

Tested Results:

  • homelab: 29/36 containers running, all services healthy
  • pi-5: 4/4 containers running, minimal resource usage
  • vish-concord-nuc: 19/19 containers running, 73% disk usage
  • homeassistant: 11/12 containers running, healthy
  • truenas-scale: 26/31 containers running, 1 unhealthy container

Usage:

# Health check across all hosts
ansible-playbook -i hosts.ini playbooks/health_check.yml

# Check specific host group
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit debian_clients

system_metrics.yml

Purpose: Real-time system metrics collection Features:

  • Continuous metrics collection (CPU, memory, disk, network)
  • Docker container metrics
  • Configurable collection duration and intervals
  • CSV output format
  • Baseline system information capture
  • Asynchronous collection for minimal impact

Usage:

# Collect metrics for 60 seconds
ansible-playbook -i hosts.ini playbooks/system_metrics.yml

# Custom duration and interval
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300 collection_interval=10"

alert_check.yml

Purpose: Infrastructure alerting and monitoring system Features:

  • Configurable alert thresholds (CPU, memory, disk, load)
  • Docker container health monitoring
  • Critical service status checking
  • Network connectivity verification
  • NTFY notification integration
  • Alert severity classification (critical, warning)
  • Comprehensive alert reporting

Usage:

# Run alert monitoring
ansible-playbook -i hosts.ini playbooks/alert_check.yml

# Test mode with notifications
ansible-playbook -i hosts.ini playbooks/alert_check.yml -e "alert_mode=test"

🏗️ Infrastructure Coverage

Tested Hosts

  1. homelab (Ubuntu 24.04) - Main development server
  2. pi-5 (Debian 12.13) - Raspberry Pi monitoring node
  3. vish-concord-nuc (Ubuntu 24.04) - Home automation hub
  4. homeassistant - Home Assistant OS
  5. truenas-scale - TrueNAS Scale storage server
  6. pve - Proxmox Virtual Environment

Host Groups

  • debian_clients: Linux hosts with full Docker support
  • synology: Synology NAS devices
  • rpi: Raspberry Pi devices
  • hypervisors: Virtualization hosts
  • active: All active infrastructure hosts

🔧 Configuration

Variables

All playbooks support extensive customization through variables:

# Service management
service_name: "docker"
timeout: 30
restart_mode: "graceful"

# Backup settings
backup_retention_days: 30
compress_backups: true
include_secrets: false

# Monitoring
metrics_duration: 60
collection_interval: 5
alert_mode: "production"

# Alert thresholds
cpu_warning: 80
cpu_critical: 95
memory_warning: 85
memory_critical: 95

Inventory Configuration

The hosts.ini file includes:

  • Tailscale IP addresses for secure communication
  • Custom SSH ports and users per host
  • Platform-specific configurations
  • Service management settings

📈 Performance Results

Health Check Performance

  • Successfully monitors 6+ hosts simultaneously
  • Collects 15+ metrics per host
  • Generates detailed JSON reports
  • Completes in under 60 seconds

Metrics Collection

  • Real-time CSV data collection
  • Minimal system impact (async execution)
  • Configurable collection intervals
  • Comprehensive Docker metrics

Alert System

  • Detects critical issues across infrastructure
  • NTFY integration for notifications
  • Configurable alert thresholds
  • Comprehensive status reporting

🚀 Usage Examples

Daily Health Check

# Morning infrastructure health check
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit active

Weekly Backup

# Weekly configuration backup
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"

Service Restart with Monitoring

# Restart service with full monitoring
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit "{{ target_host }}"

Performance Monitoring

# Collect 5-minute performance baseline
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300"

🔮 Future Enhancements

  1. Automated Scheduling: Cron job integration for regular execution
  2. Web Dashboard: Real-time monitoring dashboard
  3. Advanced Alerting: Integration with Slack, Discord, email
  4. Backup Verification: Automated backup integrity testing
  5. Service Discovery: Dynamic service detection and monitoring
  6. Performance Trending: Historical metrics analysis
  7. Disaster Recovery: Automated failover and recovery procedures

📝 Notes

  • All playbooks tested across heterogeneous infrastructure
  • Multi-platform support (Ubuntu, Debian, Synology, TrueNAS)
  • Comprehensive error handling and rollback capabilities
  • Extensive logging and reporting
  • Production-ready with security considerations
  • Modular design for easy customization and extension

This automation suite provides a solid foundation for managing a complex homelab infrastructure with minimal manual intervention while maintaining high visibility into system health and performance.