homelab-optimized/ansible/automation/AUTOMATION_SUMMARY.md
Sanitized mirror from private repository - 2026-04-05 11:04:10 UTC
# Homelab Ansible Automation Suite
## Overview
This suite of Ansible playbooks covers service lifecycle management, backups, and monitoring for a distributed homelab of Docker-enabled hosts. All playbooks have been tested on multiple hosts, including homelab, pi-5, vish-concord-nuc, homeassistant, truenas-scale, and pve.
## 📁 Directory Structure
```
ansible/automation/
├── playbooks/
│   ├── service_lifecycle/
│   │   ├── restart_service.yml      # Restart services with health checks
│   │   ├── service_status.yml       # Comprehensive service status reports
│   │   └── container_logs.yml       # Docker container log collection
│   ├── backup/
│   │   ├── backup_databases.yml     # Database backup automation
│   │   └── backup_configs.yml       # Configuration backup automation
│   └── monitoring/
│       ├── health_check.yml         # System health monitoring
│       ├── system_metrics.yml       # Real-time metrics collection
│       └── alert_check.yml          # Infrastructure alerting system
├── hosts.ini                        # Inventory file with 10+ hosts
└── AUTOMATION_SUMMARY.md            # This documentation
```
## 🚀 Service Lifecycle Management
### restart_service.yml
**Purpose**: Safely restart services with pre/post health checks
**Features**:
- Multi-platform support (Linux systemd, Synology DSM, containers)
- Pre-restart health validation
- Graceful restart with configurable timeouts
- Post-restart verification
- Rollback capability on failure
**Usage**:
```bash
# Restart Docker across all hosts
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"
# Restart with custom timeout
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=nginx timeout=60"
```
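The pre-check, restart, verify, and roll back sequence listed under Features can be sketched in plain shell. This is a minimal illustration, not the playbook's implementation: `restart_with_checks` and its arguments are hypothetical names, each step is passed in as a command string, and aborting when the pre-check fails is an assumption about how the health validation is meant to behave.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the playbook. Every step is a command
# string, so the flow can be exercised without touching a real service.
# Example call (the rollback command is a placeholder):
#   restart_with_checks "systemctl is-active --quiet nginx" \
#     "systemctl restart nginx" "echo 'rollback hook goes here'" 60
restart_with_checks() {
  local health_cmd=$1 restart_cmd=$2 rollback_cmd=$3
  local timeout=${4:-30} interval=${5:-2}
  local waited=0

  # Pre-restart validation (assumption: abort if the service is already unhealthy).
  if ! eval "$health_cmd"; then
    echo "pre-check failed; not restarting" >&2
    return 2
  fi

  if ! eval "$restart_cmd"; then
    echo "restart command failed; rolling back" >&2
    eval "$rollback_cmd"
    return 1
  fi

  # Post-restart verification: poll until healthy or the timeout expires.
  while (( waited < timeout )); do
    if eval "$health_cmd"; then
      echo "healthy after ${waited}s"
      return 0
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done

  echo "still unhealthy after ${timeout}s; rolling back" >&2
  eval "$rollback_cmd"
  return 1
}
```

The return codes distinguish "never attempted" (2) from "attempted and rolled back" (1), which makes it easier for a caller to decide whether a follow-up health check is worth running.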
### service_status.yml
**Purpose**: Generate comprehensive service status reports
**Features**:
- System resource monitoring (CPU, memory, disk, load)
- Docker container status and health
- Critical service verification
- Network connectivity checks
- Tailscale status monitoring
- JSON report generation
**Usage**:
```bash
# Check all services across infrastructure
ansible-playbook -i hosts.ini playbooks/service_status.yml
# Check specific service on specific hosts
ansible-playbook -i hosts.ini playbooks/service_status.yml --limit "homelab,pi-5" -e "service_name=docker"
```
### container_logs.yml
**Purpose**: Collect and analyze Docker container logs
**Features**:
- Multi-container log collection
- Configurable log retention (lines/time)
- Error pattern detection
- Log compression and archival
- Health status correlation
**Usage**:
```bash
# Collect logs from all containers
ansible-playbook -i hosts.ini playbooks/container_logs.yml
# Collect specific container logs
ansible-playbook -i hosts.ini playbooks/container_logs.yml -e "container_name=nginx"
```
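Error pattern detection can be approximated with a case-insensitive `grep` over collected log output. The sketch below is only an illustration: `count_error_lines`, the `ERROR_PATTERN` variable, and the keyword list are assumptions, and the playbook's own patterns may differ.

```bash
#!/usr/bin/env bash
# Count log lines on stdin that match common error keywords.
# The default pattern is an assumption for illustration only.
# Example: docker logs --tail 500 nginx 2>&1 | count_error_lines
count_error_lines() {
  local pattern=${ERROR_PATTERN:-'error|fatal|panic|exception|traceback'}
  # grep -c prints the number of matching lines. It exits 1 when that
  # number is 0, so "|| true" keeps the function usable under "set -e".
  grep -ciE "$pattern" || true
}
```

Because `docker logs` writes container stderr to its own stderr, the `2>&1` redirection in the example matters; without it, errors logged to stderr would not reach the counter.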
## 💾 Backup Automation
### backup_databases.yml
**Purpose**: Automated database backup across multiple database types
**Features**:
- Multi-database support (PostgreSQL, MySQL, MongoDB, Redis)
- Automatic database discovery
- Compression and encryption
- Retention policy management
- Backup verification
- Remote storage support
**Usage**:
```bash
# Backup all databases
ansible-playbook -i hosts.ini playbooks/backup_databases.yml
# Backup with encryption
ansible-playbook -i hosts.ini playbooks/backup_databases.yml -e "encrypt_backups=true"
```
### backup_configs.yml
**Purpose**: Configuration and data backup automation
**Features**:
- Docker Compose file backup
- Configuration directory archival
- Service-specific data backup
- Incremental backup support
- Backup inventory tracking
- Automated cleanup of old backups
**Usage**:
```bash
# Backup configurations
ansible-playbook -i hosts.ini playbooks/backup_configs.yml
# Include secrets in backup
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"
```
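Retention-based cleanup of old backups usually comes down to deleting archives older than a given number of days. The following sketch shows one way to do that with `find`; the function name `prune_old_backups`, the `*.tar.gz` naming, and the 30-day default (mirroring `backup_retention_days`) are assumptions rather than the playbooks' actual logic.

```bash
#!/usr/bin/env bash
# Delete backup archives older than the retention window and print each
# removed path. The *.tar.gz suffix is an assumed naming convention.
# Example: prune_old_backups /path/to/backups 30
prune_old_backups() {
  local backup_dir=$1 retention_days=${2:-30}
  # -mtime +N matches files whose age, counted in whole 24-hour periods,
  # is greater than N. -delete is supported by GNU and BSD find.
  find "$backup_dir" -type f -name '*.tar.gz' \
    -mtime +"$retention_days" -print -delete
}
```

Printing each path before deletion leaves a simple audit trail that can be redirected to a log file.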
## 📊 Monitoring & Alerting
### health_check.yml
**Purpose**: Comprehensive system health monitoring
**Features**:
- System metrics collection (uptime, CPU, memory, disk)
- Docker container health assessment
- Critical service verification
- Network connectivity testing
- Tailscale status monitoring
- JSON health reports
- Alert integration for critical issues
**Test Results**:
- ✅ homelab: 29/36 containers running, all services healthy
- ✅ pi-5: 4/4 containers running, minimal resource usage
- ✅ vish-concord-nuc: 19/19 containers running, 73% disk usage
- ✅ homeassistant: 11/12 containers running, healthy
- ✅ truenas-scale: 26/31 containers running, 1 unhealthy container
**Usage**:
```bash
# Health check across all hosts
ansible-playbook -i hosts.ini playbooks/health_check.yml
# Check specific host group
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit debian_clients
```
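Figures such as "29/36 containers running" can be derived from the container states that Docker reports. The helper below is a rough sketch of that calculation, not the playbook's code; `summarize_container_states` is a hypothetical name, and it expects one state per line on stdin.

```bash
#!/usr/bin/env bash
# Turn a list of container states (one per line) into a "running/total" line.
# Example: docker ps -a --format '{{.State}}' | summarize_container_states
summarize_container_states() {
  awk '
    { total++ }
    $1 == "running" { running++ }
    END { printf "%d/%d containers running\n", running, total }
  '
}
```

Reading states from stdin keeps the helper independent of Docker itself, so the same function works on saved output from any host.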
### system_metrics.yml
**Purpose**: Real-time system metrics collection
**Features**:
- Continuous metrics collection (CPU, memory, disk, network)
- Docker container metrics
- Configurable collection duration and intervals
- CSV output format
- Baseline system information capture
- Asynchronous collection for minimal impact
**Usage**:
```bash
# Collect metrics for 60 seconds (the default duration)
ansible-playbook -i hosts.ini playbooks/system_metrics.yml
# Custom duration and interval
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300 collection_interval=10"
```
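To make the duration and interval settings concrete, here is a small Linux-only sampling loop that writes CSV rows. It is a sketch under stated assumptions: `collect_metrics`, the column names, and the choice of metrics are illustrative and do not reflect the playbook's actual output format.

```bash
#!/usr/bin/env bash
# Sample load, memory, and root-disk usage into a CSV file (Linux only).
# Arguments: duration in seconds, interval in seconds, output file.
collect_metrics() {
  local duration=${1:-60} interval=${2:-5} out=${3:-metrics.csv}
  local end load1 mem_pct disk_pct
  end=$(( $(date +%s) + duration ))

  echo "timestamp,load1,mem_used_pct,root_disk_used_pct" > "$out"
  while (( $(date +%s) < end )); do
    # The 1-minute load average is the first field of /proc/loadavg.
    read -r load1 _ < /proc/loadavg
    # Used memory as a percentage of MemTotal, based on MemAvailable.
    mem_pct=$(awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
                   END {printf "%.1f", (t - a) / t * 100}' /proc/meminfo)
    # Column 5 of POSIX-format df output is the use percentage.
    disk_pct=$(df -P / | awk 'NR == 2 {sub(/%/, "", $5); print $5}')
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$load1,$mem_pct,$disk_pct" >> "$out"
    sleep "$interval"
  done
}
```

Running `collect_metrics 300 10 baseline.csv &` in the background would correspond to the custom duration and interval shown above while leaving the shell free for other work.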
### alert_check.yml
**Purpose**: Infrastructure alerting and monitoring system
**Features**:
- Configurable alert thresholds (CPU, memory, disk, load)
- Docker container health monitoring
- Critical service status checking
- Network connectivity verification
- NTFY notification integration
- Alert severity classification (critical, warning)
- Comprehensive alert reporting
**Usage**:
```bash
# Run alert monitoring
ansible-playbook -i hosts.ini playbooks/alert_check.yml
# Test mode with notifications
ansible-playbook -i hosts.ini playbooks/alert_check.yml -e "alert_mode=test"
```
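NTFY accepts a plain HTTP POST whose body is the message, with optional `Title` and `Priority` headers. The sketch below shows how an alert severity might be mapped to an NTFY priority; the function names, the `NTFY_URL` variable, the `homelab-alerts` topic, and the severity mapping are all assumptions, not values taken from the playbook.

```bash
#!/usr/bin/env bash
# Map an alert severity to an ntfy priority name (assumed mapping).
ntfy_priority() {
  case $1 in
    critical) echo urgent ;;
    warning)  echo high ;;
    *)        echo default ;;
  esac
}

# Publish an alert. NTFY_URL and the default topic are hypothetical.
# Example: send_ntfy_alert critical "Disk usage above 95% on homelab"
send_ntfy_alert() {
  local severity=$1 message=$2
  local url=${NTFY_URL:-https://ntfy.sh/homelab-alerts}
  curl -fsS \
    -H "Title: Homelab alert ($severity)" \
    -H "Priority: $(ntfy_priority "$severity")" \
    -d "$message" \
    "$url"
}
```

Keeping the priority mapping in its own function makes it easy to adjust without touching the code that sends the request.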
## 🏗️ Infrastructure Coverage
### Tested Hosts
1. **homelab** (Ubuntu 24.04) - Main development server
2. **pi-5** (Debian 12.13) - Raspberry Pi monitoring node
3. **vish-concord-nuc** (Ubuntu 24.04) - Home automation hub
4. **homeassistant** - Home Assistant OS
5. **truenas-scale** - TrueNAS SCALE storage server
6. **pve** - Proxmox Virtual Environment
### Host Groups
- `debian_clients`: Linux hosts with full Docker support
- `synology`: Synology NAS devices
- `rpi`: Raspberry Pi devices
- `hypervisors`: Virtualization hosts
- `active`: All active infrastructure hosts
## 🔧 Configuration
### Variables
All playbooks support extensive customization through variables:
```yaml
# Service management
service_name: "docker"
timeout: 30
restart_mode: "graceful"
# Backup settings
backup_retention_days: 30
compress_backups: true
include_secrets: false
# Monitoring
metrics_duration: 60
collection_interval: 5
alert_mode: "production"
# Alert thresholds
cpu_warning: 80
cpu_critical: 95
memory_warning: 85
memory_critical: 95
```
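The warning and critical thresholds above imply a three-way classification of each metric. A minimal sketch of that comparison follows; `classify_metric` is a hypothetical helper, it handles integer percentages only, and treating the thresholds as inclusive is an assumption.

```bash
#!/usr/bin/env bash
# Classify an integer percentage against warning and critical thresholds.
# Thresholds are treated as inclusive (an assumption).
# Example: classify_metric 96 80 95   (value, cpu_warning, cpu_critical)
classify_metric() {
  local value=$1 warning=$2 critical=$3
  if (( value >= critical )); then
    echo critical
  elif (( value >= warning )); then
    echo warning
  else
    echo ok
  fi
}
```

Checking the critical threshold first ensures a value that exceeds both limits is reported once, at the higher severity.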
### Inventory Configuration
The `hosts.ini` file includes:
- Tailscale IP addresses for secure communication
- Custom SSH ports and users per host
- Platform-specific configurations
- Service management settings
## 📈 Performance Results
### Health Check Performance
- Successfully monitors 6+ hosts simultaneously
- Collects 15+ metrics per host
- Generates detailed JSON reports
- Completes in under 60 seconds
### Metrics Collection
- Real-time CSV data collection
- Minimal system impact (async execution)
- Configurable collection intervals
- Comprehensive Docker metrics
### Alert System
- Detects critical issues across infrastructure
- NTFY integration for notifications
- Configurable alert thresholds
- Comprehensive status reporting
## 🚀 Usage Examples
### Daily Health Check
```bash
# Morning infrastructure health check
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit active
```
### Weekly Backup
```bash
# Weekly configuration backup, including secrets
ansible-playbook -i hosts.ini playbooks/backup_configs.yml -e "include_secrets=true"
```
### Service Restart with Monitoring
```bash
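# Restart the service, then run the health check against one host
# TARGET_HOST is a shell variable; set it to the host or group to verify
TARGET_HOST=homelab
ansible-playbook -i hosts.ini playbooks/restart_service.yml -e "service_name=docker"
ansible-playbook -i hosts.ini playbooks/health_check.yml --limit "$TARGET_HOST"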
```
### Performance Monitoring
```bash
# Collect 5-minute performance baseline
ansible-playbook -i hosts.ini playbooks/system_metrics.yml -e "metrics_duration=300"
```
## 🔮 Future Enhancements
1. **Automated Scheduling**: Cron job integration for regular execution
2. **Web Dashboard**: Real-time monitoring dashboard
3. **Advanced Alerting**: Integration with Slack, Discord, email
4. **Backup Verification**: Automated backup integrity testing
5. **Service Discovery**: Dynamic service detection and monitoring
6. **Performance Trending**: Historical metrics analysis
7. **Disaster Recovery**: Automated failover and recovery procedures
## 📝 Notes
- All playbooks tested across heterogeneous infrastructure
- Multi-platform support (Ubuntu, Debian, Synology, TrueNAS)
- Comprehensive error handling and rollback capabilities
- Extensive logging and reporting
- Production-ready with security considerations
- Modular design for easy customization and extension
This automation suite provides a solid foundation for managing a complex homelab infrastructure with minimal manual intervention while maintaining high visibility into system health and performance.