Sanitized mirror from private repository - 2026-04-20 00:58:22 UTC
# 🏠 Homelab Ansible Playbooks
Comprehensive automation playbooks for managing your homelab infrastructure. These playbooks provide operational automation beyond the existing health monitoring and system management.
## 📋 Quick Reference
| Category | Playbook | Purpose | Priority |
|----------|----------|---------|----------|
| **Service Management** | `service_status.yml` | Get status of all services | ⭐⭐⭐ |
| | `restart_service.yml` | Restart services with dependencies | ⭐⭐⭐ |
| | `container_logs.yml` | Collect logs for troubleshooting | ⭐⭐⭐ |
| **Backup & Recovery** | `backup_databases.yml` | Automated database backups | ⭐⭐⭐ |
| | `backup_configs.yml` | Configuration and data backups | ⭐⭐⭐ |
| | `disaster_recovery_test.yml` | Test DR procedures | ⭐⭐ |
| **Storage Management** | `disk_usage_report.yml` | Monitor storage usage | ⭐⭐⭐ |
| | `prune_containers.yml` | Clean up Docker resources | ⭐⭐ |
| | `log_rotation.yml` | Manage log files | ⭐⭐ |
| **Security** | `security_updates.yml` | Automated security patches | ⭐⭐⭐ |
| | `certificate_renewal.yml` | SSL certificate management | ⭐⭐ |
| **Monitoring** | `service_health_deep.yml` | Comprehensive health checks | ⭐⭐ |
## 🚀 Quick Start
### Prerequisites
- Ansible 2.12+
- SSH access to all hosts via Tailscale
- Existing inventory from `/home/homelab/organized/repos/homelab/ansible/automation/hosts.ini`
### Run Your First Playbook
```bash
cd /home/homelab/organized/repos/homelab/ansible/automation
# Check status of all services
ansible-playbook playbooks/service_status.yml
# Check disk usage across all hosts
ansible-playbook playbooks/disk_usage_report.yml
# Backup all databases
ansible-playbook playbooks/backup_databases.yml
```
## 📦 Service Management Playbooks
### `service_status.yml` - Service Status Check
Get comprehensive status of all services across your homelab.
```bash
# Check all hosts
ansible-playbook playbooks/service_status.yml
# Check specific host
ansible-playbook playbooks/service_status.yml --limit atlantis
# Every run also writes a JSON report per host:
#   /tmp/HOSTNAME_status_TIMESTAMP.json
```
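Because each run leaves a new timestamped file behind, scripts that consume the reports usually want only the newest one for a given host. The following is a minimal sketch under the `/tmp/HOSTNAME_status_TIMESTAMP.json` naming shown above; the helper name `latest_status_report` is hypothetical, and it compares modification times rather than parsing the timestamp, since the exact timestamp format is not documented here. A call such as `latest_status_report atlantis` prints the path, which can then be handed to a JSON tool of your choice.

```bash
# Hypothetical helper: print the most recently modified status report for a host.
# usage: latest_status_report HOSTNAME [REPORT_DIR]   (REPORT_DIR defaults to /tmp)
latest_status_report() {
  host=$1
  dir=${2:-/tmp}
  newest=""
  for f in "$dir/${host}"_status_*.json; do
    [ -e "$f" ] || continue              # the glob matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  # Print the path; return non-zero if no report exists for this host
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```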
**Features:**
- System resource usage
- Container status and health
- Critical service monitoring
- Network connectivity checks
- JSON output for automation
### `restart_service.yml` - Service Restart with Dependencies
Restart services with proper dependency handling and health checks.
```bash
# Restart a service
ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"
# Restart with custom wait time
ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"
# Force restart if graceful stop fails
ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
```
**Features:**
- Dependency-aware restart order
- Health check validation
- Graceful stop with force option
- Pre/post restart logging
- Service-specific wait times
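
The "health check validation" and "wait time" ideas above boil down to polling a check until it passes or a time budget runs out. The sketch below illustrates that pattern in plain shell; it is not the playbook's implementation, and the helper name `wait_until_ok` and its arguments are hypothetical.

```bash
# Hypothetical helper: poll a check command until it succeeds or the budget is spent.
# usage: wait_until_ok TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_ok() {
  timeout=$1
  interval=$2
  shift 2
  waited=0
  until "$@"; do
    # Give up once the time budget is used up
    [ "$waited" -ge "$timeout" ] && return 1
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}
```

The check itself is supplied by the caller, for example a `curl -f` probe against whatever endpoint the restarted service exposes.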
### `container_logs.yml` - Log Collection
Collect logs from multiple containers for troubleshooting.
```bash
# Collect logs for specific service
ansible-playbook playbooks/container_logs.yml -e "service_name=plex"
# Collect logs matching pattern
ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"
# Collect all container logs
ansible-playbook playbooks/container_logs.yml -e "collect_all=true"
# Custom log parameters
ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
```
**Features:**
- Pattern-based container selection
- Error analysis and counting
- Resource usage reporting
- Structured log organization
- Archive option for long-term storage
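
To make "error analysis and counting" concrete, here is a small sketch that counts error-looking lines in a saved log file. The keyword list and the helper name `count_error_lines` are assumptions for illustration; the playbook may use different patterns.

```bash
# Hypothetical helper: count lines in a log file that look like errors.
# The keyword list is an assumption; adjust it to what your services actually log.
count_error_lines() {
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
  grep -ciE 'error|fatal|exception' "$1" || true
}
```

A log file for this could be produced by hand with `docker logs --tail 500 --since 2h plex > /tmp/plex.log 2>&1`, which presumably mirrors the `log_lines` and `log_since` variables shown above.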
## 💾 Backup & Recovery Playbooks
### `backup_databases.yml` - Database Backup Automation
Automated backup of all PostgreSQL and MySQL databases.
```bash
# Backup all databases
ansible-playbook playbooks/backup_databases.yml
# Full backup with verification
ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"
# Specific host backup
ansible-playbook playbooks/backup_databases.yml --limit atlantis
# Custom retention
ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
```
**Supported Databases:**
- **Atlantis**: Immich, Vaultwarden, Joplin, Firefly
- **Calypso**: Authentik, Paperless
- **Homelab VM**: Mastodon, Matrix
**Features:**
- Automatic database discovery
- Compression and verification
- Retention management
- Backup integrity testing
- Multiple storage locations
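
Retention management typically means deleting backup files older than `backup_retention_days`. The sketch below shows one common way to do that with `find`; the `*.gz` suffix, the flat directory layout, and the helper name `prune_old_backups` are assumptions and are not taken from the playbook.

```bash
# Hypothetical helper: delete compressed backups older than N days from one directory.
# usage: prune_old_backups BACKUP_DIR RETENTION_DAYS
prune_old_backups() {
  backup_dir=$1
  retention_days=$2
  # -mtime +N matches files whose age, rounded down to whole days, exceeds N.
  # -print lists each file as it is deleted so the run leaves an audit trail.
  find "$backup_dir" -maxdepth 1 -type f -name '*.gz' \
    -mtime +"$retention_days" -print -delete
}
```

Dropping `-delete` turns the call into a preview of what would be removed.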
### `backup_configs.yml` - Configuration Backup
Backup docker-compose files, configs, and important data.
```bash
# Backup configurations
ansible-playbook playbooks/backup_configs.yml
# Include secrets (use with caution)
ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"
# Backup without compression
ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
```
**Backup Includes:**
- Docker configurations
- SSH configurations
- Service-specific data
- System information snapshots
- Docker-compose files
### `disaster_recovery_test.yml` - DR Testing
Test disaster recovery procedures and validate backup integrity.
```bash
# Basic DR test (dry run)
ansible-playbook playbooks/disaster_recovery_test.yml
# Full DR test with restore validation
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"
# Test with failover procedures
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
```
**Test Components:**
- Backup validation and integrity
- Database restore testing
- RTO (Recovery Time Objective) analysis
- Service failover procedures
- DR readiness scoring
## 💿 Storage Management Playbooks
### `disk_usage_report.yml` - Storage Monitoring
Monitor storage usage and generate comprehensive reports.
```bash
# Basic disk usage report
ansible-playbook playbooks/disk_usage_report.yml
# Detailed analysis with performance data
ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"
# Set custom alert thresholds
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"
# Send alerts for critical usage
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
```
**Features:**
- Filesystem usage monitoring
- Docker storage analysis
- Large file identification
- Temporary file analysis
- Alert thresholds and notifications
- JSON output for automation
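
The two thresholds split usage into three bands. The sketch below shows that classification; the defaults of 80 and 90 are borrowed from the example command above rather than from the playbook's real defaults, and treating a value equal to a threshold as crossing it is also an assumption. The helper name `classify_usage` is hypothetical.

```bash
# Hypothetical helper: map a usage percentage to a severity level.
# usage: classify_usage PERCENT [WARNING_THRESHOLD] [ALERT_THRESHOLD]
classify_usage() {
  pct=$1
  warn=${2:-80}     # assumed default, taken from the example above
  alert=${3:-90}    # assumed default, taken from the example above
  if [ "$pct" -ge "$alert" ]; then
    echo critical
  elif [ "$pct" -ge "$warn" ]; then
    echo warning
  else
    echo ok
  fi
}
```

On a host with GNU coreutils, the root filesystem could be checked with `classify_usage "$(df --output=pcent / | tail -n 1 | tr -dc '0-9')"`.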
### `prune_containers.yml` - Docker Cleanup
Clean up unused containers, images, volumes, and networks.
```bash
# Basic cleanup (dry run)
ansible-playbook playbooks/prune_containers.yml
# Live cleanup
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
# Aggressive cleanup (removes old images)
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
# Custom retention and log cleanup
ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
```
**Cleanup Actions:**
- Remove stopped containers
- Remove dangling images
- Remove unused volumes (optional)
- Remove unused networks
- Truncate large container logs
- System-wide Docker prune
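
For "truncate large container logs", a common manual equivalent is to empty oversized `*-json.log` files in place. This sketch assumes Docker's `json-file` log driver, whose files normally live under `/var/lib/docker/containers/`; the helper name `truncate_large_logs` is hypothetical and the playbook may work differently.

```bash
# Hypothetical helper: empty every *-json.log under LOG_DIR that is larger than SIZE.
# usage: truncate_large_logs LOG_DIR SIZE     (SIZE uses find syntax, e.g. 50M)
truncate_large_logs() {
  log_dir=$1
  size=$2
  # Truncating in place avoids deleting a file the Docker daemon still has open
  find "$log_dir" -type f -name '*-json.log' -size +"$size" \
    -exec truncate -s 0 {} +
}
```

Run as root, `truncate_large_logs /var/lib/docker/containers 50M` would correspond to `max_log_size=50m` in the example above.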
### `log_rotation.yml` - Log Management
Manage log files across all services and system components.
```bash
# Basic log rotation (dry run)
ansible-playbook playbooks/log_rotation.yml
# Live log rotation with compression
ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"
# Aggressive cleanup
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"
# Custom log size limits
ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
```
**Log Management:**
- System log rotation
- Docker container log truncation
- Application log cleanup
- Log compression
- Retention policies
- Logrotate configuration
## 🔒 Security Playbooks
### `security_updates.yml` - Automated Security Updates
Apply security patches and system updates.
```bash
# Security updates only
ansible-playbook playbooks/security_updates.yml
# Security updates with reboot if needed
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
# Full system update
ansible-playbook playbooks/security_updates.yml -e "security_only=false"
# Include Docker updates
ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
```
**Features:**
- Security-only or full updates
- Pre-update configuration backup
- Kernel update detection
- Automatic reboot handling
- Service verification after updates
- Update reporting and logging
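
On Debian and Ubuntu hosts, package hooks create `/var/run/reboot-required` when an update (typically a kernel) needs a restart, so reboot handling can key off that file. The sketch below assumes Debian-family hosts and is not necessarily how the playbook detects it; the helper name `reboot_required` is hypothetical.

```bash
# Hypothetical helper: succeed if the host has flagged that a reboot is pending.
# usage: reboot_required [FLAG_FILE]    (defaults to the Debian/Ubuntu location)
reboot_required() {
  [ -e "${1:-/var/run/reboot-required}" ]
}
```

For example, `reboot_required && echo "reboot pending"` gives a quick manual check before running with `reboot_if_required=true`.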
### `certificate_renewal.yml` - SSL Certificate Management
Manage Let's Encrypt certificates and other SSL certificates.
```bash
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
# Renew certificates
ansible-playbook playbooks/certificate_renewal.yml
# Force renewal
ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"
# Custom renewal threshold
ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
```
**Certificate Support:**
- Let's Encrypt via Certbot
- Nginx Proxy Manager certificates
- Traefik certificates
- Synology DSM certificates
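
The `renewal_threshold_days` setting implies a simple comparison: renew once the days remaining drop to the threshold. The sketch below shows that arithmetic on epoch timestamps; the helper name `needs_renewal` is hypothetical and the playbook's own logic may differ.

```bash
# Hypothetical helper: succeed if a certificate is inside the renewal window.
# usage: needs_renewal EXPIRY_EPOCH THRESHOLD_DAYS [NOW_EPOCH]
needs_renewal() {
  expiry=$1
  threshold_days=$2
  now=${3:-$(date +%s)}
  # Whole days left; negative once the certificate has already expired
  days_left=$(( (expiry - now) / 86400 ))
  [ "$days_left" -le "$threshold_days" ]
}
```

The expiry date of a certificate file can be read with `openssl x509 -enddate -noout -in CERT.pem`; `openssl x509 -checkend SECONDS` performs a similar check directly.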
## 🏥 Monitoring Playbooks
### `service_health_deep.yml` - Comprehensive Health Checks
Deep health monitoring for all homelab services.
```bash
# Deep health check
ansible-playbook playbooks/service_health_deep.yml
# Include performance metrics
ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"
# Enable alerting
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
# Custom timeout
ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
```
**Health Checks:**
- Container health status
- Service endpoint testing
- Database connectivity
- Redis connectivity
- System performance metrics
- Log error analysis
- Dependency validation
## 🔧 Advanced Usage
### Combining Playbooks
```bash
# Complete maintenance routine
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_databases.yml
ansible-playbook playbooks/security_updates.yml
ansible-playbook playbooks/disk_usage_report.yml
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
```
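Pasted into a shell as-is, the commands above run one after another even if an earlier one fails, so cleanup could proceed after a failed backup. The following sketch stops at the first failure instead. The wrapper name `run_playbooks_in_order` is hypothetical, and the runner is passed as a parameter so the loop can be tried without touching any hosts.

```bash
# Hypothetical wrapper: run playbooks in order and stop at the first failure.
# usage: run_playbooks_in_order RUNNER PLAYBOOK [PLAYBOOK...]
#   RUNNER is normally ansible-playbook
run_playbooks_in_order() {
  runner=$1
  shift
  for playbook in "$@"; do
    echo "==> $playbook"
    "$runner" "playbooks/$playbook" || {
      echo "Stopped: $playbook failed" >&2
      return 1
    }
  done
}
```

A call might look like `run_playbooks_in_order ansible-playbook service_status.yml backup_databases.yml security_updates.yml`. The sketch does not pass extra variables, so a step such as `prune_containers.yml -e "dry_run=false"` would still be run separately.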
### Scheduling with Cron
```bash
# Add to crontab for automated execution
# Daily backups at 2 AM
0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml
# Weekly cleanup on Sundays at 3 AM
0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
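# Monthly DR test on the first Sunday at 4 AM
# (cron ORs day-of-month and day-of-week when both are restricted, so the weekday is checked in the command)
0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml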
```
### Custom Variables
Create host-specific variable files, one per inventory hostname:
```yaml
# host_vars/atlantis.yml
backup_retention_days: 60
max_log_size: "200M"
alert_threshold: 90
# host_vars/homelab_vm.yml
security_only: false
reboot_if_required: true
```
## 📊 Monitoring and Alerting
### Integration with Existing Monitoring
These playbooks write timestamped JSON reports that your existing Prometheus/Grafana stack can consume:
```bash
# Generate the JSON reports
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/disk_usage_report.yml
# JSON outputs can be parsed by monitoring systems
# Reports saved to /tmp/ directories with timestamps
```
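One possible route from a report value into Prometheus is node_exporter's textfile collector, which reads `*.prom` files written in the Prometheus text exposition format. The sketch below only formats a single gauge line. The metric name, the `host` label, and the use of node_exporter at all are assumptions, since the report schema is not documented here; the helper name `prom_gauge` is hypothetical.

```bash
# Hypothetical helper: print one gauge sample in Prometheus text exposition format.
# usage: prom_gauge METRIC_NAME HOST VALUE
prom_gauge() {
  name=$1
  host=$2
  value=$3
  printf '%s{host="%s"} %s\n' "$name" "$host" "$value"
}
```

The output would be redirected into a `.prom` file inside the directory that node_exporter was started with via `--collector.textfile.directory`.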
### Alert Configuration
```bash
# Enable alerts in playbooks
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
```
## 🚨 Emergency Procedures
### Service Recovery
```bash
# Quick service restart
ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"
# Collect logs for troubleshooting
ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"
# Check service health
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```
### Storage Emergency
```bash
# Check disk usage immediately
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"
# Emergency cleanup
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
```
### Security Incident
```bash
# Apply security updates immediately
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
```
## 🔍 Troubleshooting
### Common Issues
**Playbook Fails with Permission Denied**
```bash
# Check SSH connectivity
ansible all -m ping
# Verify privilege escalation (each host should report "root")
ansible all -m command -a "whoami" --become
```
**Docker Commands Fail**
```bash
# Check Docker daemon status
ansible-playbook playbooks/service_status.yml --limit HOSTNAME
# Verify Docker group membership of the remote user (look for "docker" in the output)
ansible HOST -m command -a "id -nG"
```
**Backup Failures**
```bash
# Check backup directory permissions (read-only; does not create the directory)
ansible HOST -m stat -a "path=/volume1/backups" --become
# Test database connectivity
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```
### Debug Mode
```bash
# Run with verbose output
ansible-playbook playbooks/PLAYBOOK.yml -vvv
# Check specific tasks
ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
```
## 📚 Integration with Existing Automation
These playbooks complement your existing automation:
### With Current Health Monitoring
```bash
# Existing health checks
ansible-playbook playbooks/synology_health.yml
ansible-playbook playbooks/check_apt_proxy.yml
# New comprehensive checks
ansible-playbook playbooks/service_health_deep.yml
ansible-playbook playbooks/disk_usage_report.yml
```
### With GitOps Deployment
```bash
# After GitOps deployment
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_configs.yml
```
## 🎯 Best Practices
### Regular Maintenance Schedule
- **Daily**: `backup_databases.yml`
- **Weekly**: `security_updates.yml`, `disk_usage_report.yml`
- **Monthly**: `disaster_recovery_test.yml`, `prune_containers.yml`
- **As Needed**: `service_health_deep.yml`, `restart_service.yml`
### Safety Guidelines
- Run with `dry_run=true` first on playbooks that support it (`prune_containers.yml`, `log_rotation.yml`, `disaster_recovery_test.yml`)
- Use `--limit` for single host testing
- Keep backups before major changes
- Monitor service status after automation
### Performance Optimization
- Run resource-intensive playbooks during low-usage hours
- Use `--forks` to control parallelism (for example, `--forks 2` limits a run to two hosts at a time; Ansible's default is 5)
- Monitor system resources during execution
## 📞 Support
For issues with these playbooks:
1. Check the troubleshooting section above
2. Review playbook logs in `/tmp/` directories
3. Use debug mode (`-vvv`) for detailed output
4. Verify integration with existing automation
---
**Last Updated**: {{ ansible_date_time.date if ansible_date_time is defined else 'Manual Update Required' }}
**Total Playbooks**: 10+ comprehensive automation playbooks
**Coverage**: Complete operational automation for homelab management