# 🏠 Homelab Ansible Playbooks

Comprehensive automation playbooks for managing your homelab infrastructure. These playbooks provide operational automation beyond the existing health monitoring and system management.

## 📋 Quick Reference

| Category | Playbook | Purpose | Priority |
|----------|----------|---------|----------|
| **Service Management** | `service_status.yml` | Get status of all services | ⭐⭐⭐ |
| | `restart_service.yml` | Restart services with dependencies | ⭐⭐⭐ |
| | `container_logs.yml` | Collect logs for troubleshooting | ⭐⭐⭐ |
| **Backup & Recovery** | `backup_databases.yml` | Automated database backups | ⭐⭐⭐ |
| | `backup_configs.yml` | Configuration and data backups | ⭐⭐⭐ |
| | `disaster_recovery_test.yml` | Test DR procedures | ⭐⭐ |
| **Storage Management** | `disk_usage_report.yml` | Monitor storage usage | ⭐⭐⭐ |
| | `prune_containers.yml` | Clean up Docker resources | ⭐⭐ |
| | `log_rotation.yml` | Manage log files | ⭐⭐ |
| **Security** | `security_updates.yml` | Automated security patches | ⭐⭐⭐ |
| | `certificate_renewal.yml` | SSL certificate management | ⭐⭐ |
| **Monitoring** | `service_health_deep.yml` | Comprehensive health checks | ⭐⭐ |

## 🚀 Quick Start

### Prerequisites
- Ansible 2.12+
- SSH access to all hosts via Tailscale
- Existing inventory from `/home/homelab/organized/repos/homelab/ansible/automation/hosts.ini`

### Run Your First Playbook
```bash
cd /home/homelab/organized/repos/homelab/ansible/automation

# Check status of all services
ansible-playbook playbooks/service_status.yml

# Check disk usage across all hosts
ansible-playbook playbooks/disk_usage_report.yml

# Backup all databases
ansible-playbook playbooks/backup_databases.yml
```

## 📦 Service Management Playbooks

### `service_status.yml` - Service Status Check
Get comprehensive status of all services across your homelab.

```bash
# Check all hosts
ansible-playbook playbooks/service_status.yml

# Check specific host
ansible-playbook playbooks/service_status.yml --limit atlantis

# Generate JSON reports
ansible-playbook playbooks/service_status.yml
# Reports saved to: /tmp/HOSTNAME_status_TIMESTAMP.json
```

**Features:**
- System resource usage
- Container status and health
- Critical service monitoring
- Network connectivity checks
- JSON output for automation

### `restart_service.yml` - Service Restart with Dependencies
Restart services with proper dependency handling and health checks.

```bash
# Restart a service
ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"

# Restart with custom wait time
ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"

# Force restart if graceful stop fails
ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
```

**Features:**
- Dependency-aware restart order
- Health check validation
- Graceful stop with force option
- Pre/post restart logging
- Service-specific wait times

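
At the Docker level, a graceful stop with a forced fallback might look roughly like the sketch below. It assumes each service is a Docker container whose name matches `service_name`; the function name `restart_container` and its arguments are hypothetical, and the playbook's actual tasks are not shown in this README.

```bash
# Hypothetical helper: stop a container gracefully, fall back to a kill
# when forced, then start it again and confirm it is running.
# Usage: restart_container NAME [WAIT_SECONDS] [FORCE]
restart_container() {
  local name="$1" wait_time="${2:-10}" force="${3:-false}"

  # `docker stop` sends SIGTERM, then SIGKILL once the -t timeout expires.
  if ! docker stop -t 10 "$name"; then
    if [ "$force" = "true" ]; then
      docker kill "$name" || return 1
    else
      echo "graceful stop of $name failed; retry with force" >&2
      return 1
    fi
  fi

  docker start "$name" || return 1
  # Give the service time to come up before checking its state.
  sleep "$wait_time"
  [ "$(docker inspect -f '{{.State.Running}}' "$name")" = "true" ]
}
```

For example, `restart_container plex 30 true` would mirror the `wait_time=30 force_restart=true` combination above.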
### `container_logs.yml` - Log Collection
Collect logs from multiple containers for troubleshooting.

```bash
# Collect logs for specific service
ansible-playbook playbooks/container_logs.yml -e "service_name=plex"

# Collect logs matching pattern
ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"

# Collect all container logs
ansible-playbook playbooks/container_logs.yml -e "collect_all=true"

# Custom log parameters
ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
```

**Features:**
- Pattern-based container selection
- Error analysis and counting
- Resource usage reporting
- Structured log organization
- Archive option for long-term storage

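
The `log_lines` and `log_since` variables resemble the `--tail` and `--since` flags of `docker logs`. The sketch below shows that mapping plus a simple error count; it assumes the playbook wraps `docker logs` (not confirmed here), and the helper names and the error pattern are hypothetical.

```bash
# Hypothetical helper: fetch recent logs for one container.
# Usage: fetch_logs NAME [LINES] [SINCE]
fetch_logs() {
  local name="$1" lines="${2:-100}" since="${3:-1h}"
  # Containers write to both stdout and stderr, so merge the streams.
  docker logs --tail "$lines" --since "$since" "$name" 2>&1
}

# Count lines on stdin that mention an error (case-insensitive).
# The pattern is an assumption; adjust it to your services' log formats.
count_errors() {
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -ciE 'error|fatal|exception' || true
}
```

Usage: `fetch_logs plex 500 2h | count_errors` prints the number of matching lines.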
## 💾 Backup & Recovery Playbooks

### `backup_databases.yml` - Database Backup Automation
Automated backup of all PostgreSQL and MySQL databases.

```bash
# Backup all databases
ansible-playbook playbooks/backup_databases.yml

# Full backup with verification
ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"

# Specific host backup
ansible-playbook playbooks/backup_databases.yml --limit atlantis

# Custom retention
ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
```

**Supported Databases:**
- **Atlantis**: Immich, Vaultwarden, Joplin, Firefly
- **Calypso**: Authentik, Paperless
- **Homelab VM**: Mastodon, Matrix

**Features:**
- Automatic database discovery
- Compression and verification
- Retention management
- Backup integrity testing
- Multiple storage locations

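
Retention management of the kind `backup_retention_days` controls is commonly built on `find -mtime`. The following is a minimal sketch under that assumption, not the playbook's own logic; `prune_old_backups` is a hypothetical name and it relies on the `-delete` action found in GNU and BSD `find`.

```bash
# Hypothetical helper: list (or delete) backup files older than N days.
# Usage: prune_old_backups DIR DAYS [delete]
prune_old_backups() {
  local dir="$1" days="$2" action="${3:-list}"
  if [ "$action" = "delete" ]; then
    # Print each file as it is removed so the run leaves a record.
    find "$dir" -type f -mtime +"$days" -print -delete
  else
    # Default is a listing only, so nothing is removed by accident.
    find "$dir" -type f -mtime +"$days" -print
  fi
}
```

Running `prune_old_backups /volume1/backups 60` first and adding `delete` only after reviewing the list keeps the operation safe.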
### `backup_configs.yml` - Configuration Backup
Backup docker-compose files, configs, and important data.

```bash
# Backup configurations
ansible-playbook playbooks/backup_configs.yml

# Include secrets (use with caution)
ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"

# Backup without compression
ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
```

**Backup Includes:**
- Docker configurations
- SSH configurations
- Service-specific data
- System information snapshots
- Docker-compose files

### `disaster_recovery_test.yml` - DR Testing
Test disaster recovery procedures and validate backup integrity.

```bash
# Basic DR test (dry run)
ansible-playbook playbooks/disaster_recovery_test.yml

# Full DR test with restore validation
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"

# Test with failover procedures
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
```

**Test Components:**
- Backup validation and integrity
- Database restore testing
- RTO (Recovery Time Objective) analysis
- Service failover procedures
- DR readiness scoring

## 💿 Storage Management Playbooks

### `disk_usage_report.yml` - Storage Monitoring
Monitor storage usage and generate comprehensive reports.

```bash
# Basic disk usage report
ansible-playbook playbooks/disk_usage_report.yml

# Detailed analysis with performance data
ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"

# Set custom alert thresholds
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"

# Send alerts for critical usage
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
```

**Features:**
- Filesystem usage monitoring
- Docker storage analysis
- Large file identification
- Temporary file analysis
- Alert thresholds and notifications
- JSON output for automation

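
To illustrate how a warning threshold and an alert threshold typically interact, here is a small sketch. The function names, the `OK`/`WARNING`/`CRITICAL` labels, and the 80/90 defaults (borrowed from the example values above) are assumptions; the playbook may classify usage differently.

```bash
# Hypothetical helper: classify a usage percentage against two thresholds.
# Usage: classify_usage PERCENT [WARNING] [ALERT]
classify_usage() {
  local pct="$1" warn="${2:-80}" alert="${3:-90}"
  if [ "$pct" -ge "$alert" ]; then
    echo "CRITICAL"
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

# Print one status line per mounted filesystem from POSIX `df -P` output.
# Usage: report_filesystems [WARNING] [ALERT]
report_filesystems() {
  df -P | awk 'NR > 1 && $5 ~ /^[0-9]+%$/ { sub(/%/, "", $5); print $5, $6 }' |
    while read -r pct mount; do
      printf '%-8s %3s%% %s\n' "$(classify_usage "$pct" "$@")" "$pct" "$mount"
    done
}
```

A filesystem at exactly the alert threshold counts as critical in this sketch, since both comparisons are inclusive.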
### `prune_containers.yml` - Docker Cleanup
Clean up unused containers, images, volumes, and networks.

```bash
# Basic cleanup (dry run)
ansible-playbook playbooks/prune_containers.yml

# Live cleanup
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

# Aggressive cleanup (removes old images)
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"

# Custom retention and log cleanup
ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
```

**Cleanup Actions:**
- Remove stopped containers
- Remove dangling images
- Remove unused volumes (optional)
- Remove unused networks
- Truncate large container logs
- System-wide Docker prune

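
Most of the cleanup actions above have direct Docker CLI counterparts. The sketch below prints or runs them; it is an illustration under the assumption that the playbook calls the standard `docker ... prune` commands, and `prune_docker` with its arguments is a hypothetical name. Volume pruning is left out because it is optional and destructive.

```bash
# Hypothetical helper: print (dry run) or run the Docker prune commands.
# Usage: prune_docker [DRY_RUN] [KEEP_IMAGES_DAYS]
prune_docker() {
  local dry_run="${1:-true}" keep_days="${2:-7}"
  local run=""
  # In dry-run mode each command is echoed instead of executed.
  if [ "$dry_run" = "true" ]; then
    run="echo"
  fi

  $run docker container prune -f
  $run docker image prune -f
  # The `until` filter takes a duration, so convert days to hours.
  $run docker image prune -a -f --filter "until=$((keep_days * 24))h"
  $run docker network prune -f
}
```

With `prune_docker true 14` the third line printed is `docker image prune -a -f --filter until=336h`.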
### `log_rotation.yml` - Log Management
Manage log files across all services and system components.

```bash
# Basic log rotation (dry run)
ansible-playbook playbooks/log_rotation.yml

# Live log rotation with compression
ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"

# Aggressive cleanup
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"

# Custom log size limits
ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
```

**Log Management:**
- System log rotation
- Docker container log truncation
- Application log cleanup
- Log compression
- Retention policies
- Logrotate configuration

## 🔒 Security Playbooks

### `security_updates.yml` - Automated Security Updates
Apply security patches and system updates.

```bash
# Security updates only
ansible-playbook playbooks/security_updates.yml

# Security updates with reboot if needed
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Full system update
ansible-playbook playbooks/security_updates.yml -e "security_only=false"

# Include Docker updates
ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
```

**Features:**
- Security-only or full updates
- Pre-update configuration backup
- Kernel update detection
- Automatic reboot handling
- Service verification after updates
- Update reporting and logging

### `certificate_renewal.yml` - SSL Certificate Management
Manage Let's Encrypt certificates and other SSL certificates.

```bash
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"

# Renew certificates
ansible-playbook playbooks/certificate_renewal.yml

# Force renewal
ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"

# Custom renewal threshold
ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
```

**Certificate Support:**
- Let's Encrypt via Certbot
- Nginx Proxy Manager certificates
- Traefik certificates
- Synology DSM certificates

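
A renewal threshold in days can be checked against a certificate file with `openssl x509 -checkend`. The sketch below is one way to do that by hand; `cert_needs_renewal` is a hypothetical helper, the 30-day default is an assumption, and the playbook may determine expiry differently.

```bash
# Hypothetical helper: succeed when a PEM certificate expires within N days.
# Returns 2 when the file cannot be read.
# Usage: cert_needs_renewal CERT_FILE [THRESHOLD_DAYS]
cert_needs_renewal() {
  local cert="$1" days="${2:-30}"
  [ -r "$cert" ] || return 2
  # -checkend exits 0 when the certificate is still valid after the given
  # number of seconds, so the result is inverted here.
  ! openssl x509 -in "$cert" -noout -checkend "$((days * 86400))" >/dev/null
}
```

Usage: `cert_needs_renewal /path/to/cert.pem 45 && echo "renew soon"`, where the path is a placeholder for your own certificate location.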
## 🏥 Monitoring Playbooks

### `service_health_deep.yml` - Comprehensive Health Checks
Deep health monitoring for all homelab services.

```bash
# Deep health check
ansible-playbook playbooks/service_health_deep.yml

# Include performance metrics
ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"

# Enable alerting
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"

# Custom timeout
ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
```

**Health Checks:**
- Container health status
- Service endpoint testing
- Database connectivity
- Redis connectivity
- System performance metrics
- Log error analysis
- Dependency validation

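
Service endpoint testing with a timeout can be reproduced manually with `curl`. This is a sketch only: `check_endpoint`, its retry count, and its defaults are hypothetical, and the playbook's own probe logic is not documented here.

```bash
# Hypothetical helper: probe an HTTP endpoint with a timeout and retries.
# Usage: check_endpoint URL [TIMEOUT_SECONDS] [ATTEMPTS]
check_endpoint() {
  local url="$1" timeout="${2:-30}" attempts="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP errors (status 400 and above) into a non-zero exit.
    if curl -fsS -o /dev/null --max-time "$timeout" "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}
```

Usage: `check_endpoint http://HOST:PORT/ 60` with the host and port of the service you want to probe.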
## 🔧 Advanced Usage

### Combining Playbooks
```bash
# Complete maintenance routine
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_databases.yml
ansible-playbook playbooks/security_updates.yml
ansible-playbook playbooks/disk_usage_report.yml
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
```

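
When playbooks are chained in a script, a failed backup should usually stop the run before updates or cleanup start. The wrapper below is a minimal sketch of that idea; `run_maintenance` and the `ANSIBLE_PLAYBOOK` override variable are hypothetical names introduced for illustration.

```bash
# Hypothetical wrapper: run playbooks in order and stop at the first failure.
# Each argument is one playbook plus optional extra options, for example
# "prune_containers.yml -e dry_run=false".
run_maintenance() {
  local runner="${ANSIBLE_PLAYBOOK:-ansible-playbook}" step
  for step in "$@"; do
    echo "==> $step"
    # $step is left unquoted on purpose so extra options split into words.
    if ! "$runner" playbooks/$step; then
      echo "FAILED: $step (remaining steps skipped)" >&2
      return 1
    fi
  done
}
```

Run it from the automation directory, for example `run_maintenance service_status.yml backup_databases.yml security_updates.yml "prune_containers.yml -e dry_run=false"`.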
### Scheduling with Cron
```bash
# Add to crontab for automated execution
# Daily backups at 2 AM
0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml

# Weekly cleanup on Sundays at 3 AM
0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

# Monthly DR test on first Sunday at 4 AM
# Cron runs a job when EITHER day-of-month OR day-of-week matches if both are restricted,
# so limit the schedule to days 1-7 and test for Sunday in the command (% must be escaped in crontab)
0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml
```

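
Standard cron treats a restricted day-of-month field and a restricted day-of-week field as alternatives, so a "first Sunday" schedule needs an explicit date test. The guard below is a sketch of that test as a reusable function; `is_first_sunday` is a hypothetical name and it relies on GNU `date` (`-d` and `%-d`).

```bash
# Hypothetical guard: succeed only on the first Sunday of a month.
# Usage: is_first_sunday [YYYY-MM-DD]   (defaults to today; needs GNU date)
is_first_sunday() {
  local day="${1:-$(date +%F)}"
  local dom dow
  dom="$(date -d "$day" +%-d)"   # day of month without zero padding
  dow="$(date -d "$day" +%u)"    # ISO weekday number, 7 = Sunday
  [ "$dow" -eq 7 ] && [ "$dom" -le 7 ]
}
```

A wrapper script called from cron could start with `is_first_sunday || exit 0` and then run the DR test.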
### Custom Variables
Create host-specific variable files:
```yaml
# host_vars/atlantis.yml
backup_retention_days: 60
max_log_size: "200M"
alert_threshold: 90

# host_vars/homelab_vm.yml
security_only: false
reboot_if_required: true
```

## 📊 Monitoring and Alerting

### Integration with Existing Monitoring
These playbooks integrate with your existing Prometheus/Grafana stack:

```bash
# Generate metrics for Prometheus
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/disk_usage_report.yml

# JSON outputs can be parsed by monitoring systems
# Reports saved to /tmp/ directories with timestamps
```

### Alert Configuration
```bash
# Enable alerts in playbooks
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
```

## 🚨 Emergency Procedures

### Service Recovery
```bash
# Quick service restart
ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"

# Collect logs for troubleshooting
ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"

# Check service health
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```

### Storage Emergency
```bash
# Check disk usage immediately
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"

# Emergency cleanup
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
```

### Security Incident
```bash
# Apply security updates immediately
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
```

## 🔍 Troubleshooting

### Common Issues

**Playbook Fails with Permission Denied**
```bash
# Check SSH connectivity
ansible all -m ping

# Verify sudo access (--become already escalates, so no explicit sudo is needed; expect "root")
ansible all -m command -a "whoami" --become
```

**Docker Commands Fail**
```bash
# Check Docker daemon status
ansible-playbook playbooks/service_status.yml --limit HOSTNAME

# Verify Docker group membership of the remote user
# (id runs on the remote host; a double-quoted $USER would be expanded locally instead)
ansible HOST -m command -a "id -nG"
```

**Backup Failures**
```bash
# Check backup directory permissions (stat only reads; it does not create or modify the directory)
ansible HOST -m stat -a "path=/volume1/backups" --become

# Test database connectivity
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```

### Debug Mode
```bash
# Run with verbose output
ansible-playbook playbooks/PLAYBOOK.yml -vvv

# Check specific tasks
ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
```

## 📚 Integration with Existing Automation

These playbooks complement your existing automation:

### With Current Health Monitoring
```bash
# Existing health checks
ansible-playbook playbooks/synology_health.yml
ansible-playbook playbooks/check_apt_proxy.yml

# New comprehensive checks
ansible-playbook playbooks/service_health_deep.yml
ansible-playbook playbooks/disk_usage_report.yml
```

### With GitOps Deployment
```bash
# After GitOps deployment
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_configs.yml
```

## 🎯 Best Practices

### Regular Maintenance Schedule
- **Daily**: `backup_databases.yml`
- **Weekly**: `security_updates.yml`, `disk_usage_report.yml`
- **Monthly**: `disaster_recovery_test.yml`, `prune_containers.yml`
- **As Needed**: `service_health_deep.yml`, `restart_service.yml`

### Safety Guidelines
- Always test with `dry_run=true` first
- Use `--limit` for single host testing
- Keep backups before major changes
- Monitor service status after automation

### Performance Optimization
- Run resource-intensive playbooks during low-usage hours
- Use `--forks` to control parallelism
- Monitor system resources during execution

## 📞 Support

For issues with these playbooks:
1. Check the troubleshooting section above
2. Review playbook logs in `/tmp/` directories
3. Use debug mode (`-vvv`) for detailed output
4. Verify integration with existing automation

---

**Last Updated**: {{ ansible_date_time.date if ansible_date_time is defined else 'Manual Update Required' }}
**Total Playbooks**: 10+ comprehensive automation playbooks
**Coverage**: Complete operational automation for homelab management