Sanitized mirror from private repository - 2026-04-18 11:19:59 UTC
ansible/automation/playbooks/README.md
# 🏠 Homelab Ansible Playbooks

Automation playbooks for managing your homelab infrastructure: service management, backup and recovery, storage, security, and monitoring. They extend the existing health monitoring and system management automation.

## 📋 Quick Reference

| Category | Playbook | Purpose | Priority |
|----------|----------|---------|----------|
| **Service Management** | `service_status.yml` | Get status of all services | ⭐⭐⭐ |
| | `restart_service.yml` | Restart services with dependencies | ⭐⭐⭐ |
| | `container_logs.yml` | Collect logs for troubleshooting | ⭐⭐⭐ |
| **Backup & Recovery** | `backup_databases.yml` | Automated database backups | ⭐⭐⭐ |
| | `backup_configs.yml` | Configuration and data backups | ⭐⭐⭐ |
| | `disaster_recovery_test.yml` | Test DR procedures | ⭐⭐ |
| **Storage Management** | `disk_usage_report.yml` | Monitor storage usage | ⭐⭐⭐ |
| | `prune_containers.yml` | Clean up Docker resources | ⭐⭐ |
| | `log_rotation.yml` | Manage log files | ⭐⭐ |
| **Security** | `security_updates.yml` | Automated security patches | ⭐⭐⭐ |
| | `certificate_renewal.yml` | SSL certificate management | ⭐⭐ |
| **Monitoring** | `service_health_deep.yml` | Comprehensive health checks | ⭐⭐ |

## 🚀 Quick Start

### Prerequisites
- Ansible 2.12+
- SSH access to all hosts via Tailscale
- Existing inventory at `/home/homelab/organized/repos/homelab/ansible/automation/hosts.ini`

### Run Your First Playbook
```bash
cd /home/homelab/organized/repos/homelab/ansible/automation

# Check status of all services
ansible-playbook playbooks/service_status.yml

# Check disk usage across all hosts
ansible-playbook playbooks/disk_usage_report.yml

# Backup all databases
ansible-playbook playbooks/backup_databases.yml
```

## 📦 Service Management Playbooks

### `service_status.yml` - Service Status Check
Get comprehensive status of all services across your homelab.

```bash
# Check all hosts
ansible-playbook playbooks/service_status.yml

# Check specific host
ansible-playbook playbooks/service_status.yml --limit atlantis

# Generate JSON reports
ansible-playbook playbooks/service_status.yml
# Reports saved to: /tmp/HOSTNAME_status_TIMESTAMP.json
```

**Features:**
- System resource usage
- Container status and health
- Critical service monitoring
- Network connectivity checks
- JSON output for automation
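
To pick up the newest report for a host, a small helper along these lines may be enough. It is a sketch that assumes the reports sit in a directory readable from where you run it and follow the `HOSTNAME_status_TIMESTAMP.json` naming shown above; `latest_report` is a hypothetical helper, not part of the playbooks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified status report for a host.
# Assumes file names of the form HOSTNAME_status_TIMESTAMP.json.

latest_report() {
  local dir="$1" host="$2"
  # ls -t lists newest first; the error for "no matching report" is silenced.
  ls -t "$dir/${host}"_status_*.json 2>/dev/null | head -n 1
}
```

For example, `latest_report /tmp atlantis` prints the path of the newest report for `atlantis`, or nothing if none exists.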

### `restart_service.yml` - Service Restart with Dependencies
Restart services with proper dependency handling and health checks.

```bash
# Restart a service
ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"

# Restart with custom wait time
ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"

# Force restart if graceful stop fails
ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
```

**Features:**
- Dependency-aware restart order
- Health check validation
- Graceful stop with force option
- Pre/post restart logging
- Service-specific wait times

### `container_logs.yml` - Log Collection
Collect logs from multiple containers for troubleshooting.

```bash
# Collect logs for specific service
ansible-playbook playbooks/container_logs.yml -e "service_name=plex"

# Collect logs matching pattern
ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"

# Collect all container logs
ansible-playbook playbooks/container_logs.yml -e "collect_all=true"

# Custom log parameters
ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
```

**Features:**
- Pattern-based container selection
- Error analysis and counting
- Resource usage reporting
- Structured log organization
- Archive option for long-term storage

## 💾 Backup & Recovery Playbooks

### `backup_databases.yml` - Database Backup Automation
Automated backup of all PostgreSQL and MySQL databases.

```bash
# Backup all databases
ansible-playbook playbooks/backup_databases.yml

# Full backup with verification
ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"

# Specific host backup
ansible-playbook playbooks/backup_databases.yml --limit atlantis

# Custom retention
ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
```

**Supported Databases:**
- **Atlantis**: Immich, Vaultwarden, Joplin, Firefly
- **Calypso**: Authentik, Paperless
- **Homelab VM**: Mastodon, Matrix

**Features:**
- Automatic database discovery
- Compression and verification
- Retention management
- Backup integrity testing
- Multiple storage locations
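
Retention management means that backups older than `backup_retention_days` are eventually removed. The sketch below shows one way such a rule can be previewed with `find`; it only lists candidates and deletes nothing. The function name and the directory argument are assumptions for illustration, since the playbook's own implementation is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list backup files older than a retention window.
# It prints candidates only; nothing is deleted.

list_expired_backups() {
  local backup_dir="$1" retention_days="$2"
  # -mtime +N matches files whose age, counted in whole days, is greater than N.
  find "$backup_dir" -type f -mtime +"$retention_days" -print
}
```

Calling `list_expired_backups /volume1/backups 60` would print every file under that directory that falls outside a 60-day window, which is a cheap check before lowering the retention value.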

### `backup_configs.yml` - Configuration Backup
Backup docker-compose files, configs, and important data.

```bash
# Backup configurations
ansible-playbook playbooks/backup_configs.yml

# Include secrets (use with caution)
ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"

# Backup without compression
ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
```

**Backup Includes:**
- Docker configurations
- SSH configurations
- Service-specific data
- System information snapshots
- Docker-compose files

### `disaster_recovery_test.yml` - DR Testing
Test disaster recovery procedures and validate backup integrity.

```bash
# Basic DR test (dry run)
ansible-playbook playbooks/disaster_recovery_test.yml

# Full DR test with restore validation
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"

# Test with failover procedures
ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
```

**Test Components:**
- Backup validation and integrity
- Database restore testing
- RTO (Recovery Time Objective) analysis
- Service failover procedures
- DR readiness scoring

## 💿 Storage Management Playbooks

### `disk_usage_report.yml` - Storage Monitoring
Monitor storage usage and generate comprehensive reports.

```bash
# Basic disk usage report
ansible-playbook playbooks/disk_usage_report.yml

# Detailed analysis with performance data
ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"

# Set custom alert thresholds
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"

# Send alerts for critical usage
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
```

**Features:**
- Filesystem usage monitoring
- Docker storage analysis
- Large file identification
- Temporary file analysis
- Alert thresholds and notifications
- JSON output for automation
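
The two thresholds split usage into three bands. As an illustration of that logic (not the playbook's actual code), the function below classifies a usage percentage. Treating a value equal to a threshold as already exceeding it is an assumption, and the defaults simply mirror the example values above.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of warning/alert thresholds for disk usage.
# Assumes integer percentages and that reaching a threshold counts as exceeding it.

classify_usage() {
  local used_pct="$1" warning="${2:-80}" alert="${3:-90}"
  if [ "$used_pct" -ge "$alert" ]; then
    echo "critical"
  elif [ "$used_pct" -ge "$warning" ]; then
    echo "warning"
  else
    echo "ok"
  fi
}
```

With the defaults, `classify_usage 85` prints `warning`. `classify_usage 92 80 95` also prints `warning`, because the alert threshold was raised to 95.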

### `prune_containers.yml` - Docker Cleanup
Clean up unused containers, images, volumes, and networks.

```bash
# Basic cleanup (dry run)
ansible-playbook playbooks/prune_containers.yml

# Live cleanup
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

# Aggressive cleanup (removes old images)
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"

# Custom retention and log cleanup
ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
```

**Cleanup Actions:**
- Remove stopped containers
- Remove dangling images
- Remove unused volumes (optional)
- Remove unused networks
- Truncate large container logs
- System-wide Docker prune

### `log_rotation.yml` - Log Management
Manage log files across all services and system components.

```bash
# Basic log rotation (dry run)
ansible-playbook playbooks/log_rotation.yml

# Live log rotation with compression
ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"

# Aggressive cleanup
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"

# Custom log size limits
ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
```

**Log Management:**
- System log rotation
- Docker container log truncation
- Application log cleanup
- Log compression
- Retention policies
- Logrotate configuration

## 🔒 Security Playbooks

### `security_updates.yml` - Automated Security Updates
Apply security patches and system updates.

```bash
# Security updates only
ansible-playbook playbooks/security_updates.yml

# Security updates with reboot if needed
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Full system update
ansible-playbook playbooks/security_updates.yml -e "security_only=false"

# Include Docker updates
ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
```

**Features:**
- Security-only or full updates
- Pre-update configuration backup
- Kernel update detection
- Automatic reboot handling
- Service verification after updates
- Update reporting and logging

### `certificate_renewal.yml` - SSL Certificate Management
Manage Let's Encrypt certificates and other SSL certificates.

```bash
# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"

# Renew certificates
ansible-playbook playbooks/certificate_renewal.yml

# Force renewal
ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"

# Custom renewal threshold
ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
```

**Certificate Support:**
- Let's Encrypt via Certbot
- Nginx Proxy Manager certificates
- Traefik certificates
- Synology DSM certificates

## 🏥 Monitoring Playbooks

### `service_health_deep.yml` - Comprehensive Health Checks
Deep health monitoring for all homelab services.

```bash
# Deep health check
ansible-playbook playbooks/service_health_deep.yml

# Include performance metrics
ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"

# Enable alerting
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"

# Custom timeout
ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
```

**Health Checks:**
- Container health status
- Service endpoint testing
- Database connectivity
- Redis connectivity
- System performance metrics
- Log error analysis
- Dependency validation

## 🔧 Advanced Usage

### Combining Playbooks
```bash
# Complete maintenance routine
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_databases.yml
ansible-playbook playbooks/security_updates.yml
ansible-playbook playbooks/disk_usage_report.yml
ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
```
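
Run back to back like this, a failure in one playbook does not stop the next. A wrapper along the following lines stops at the first failure, so the destructive prune only runs after everything else succeeded. This is a sketch: `run_maintenance` and the `ANSIBLE_PLAYBOOK_CMD` override are hypothetical names introduced here, and it assumes you run it from the `automation` directory.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run the maintenance routine in order, stop at first failure.
# ANSIBLE_PLAYBOOK_CMD is an illustrative override (e.g. set it to `echo` to
# rehearse the sequence without touching any host).

run_maintenance() {
  local runner="${ANSIBLE_PLAYBOOK_CMD:-ansible-playbook}"
  local playbook
  for playbook in service_status backup_databases security_updates disk_usage_report; do
    echo "==> ${playbook}.yml"
    "$runner" "playbooks/${playbook}.yml" || {
      echo "FAILED: ${playbook}.yml, skipping the remaining steps" >&2
      return 1
    }
  done
  # Pruning is destructive, so it only runs when every earlier step succeeded.
  echo "==> prune_containers.yml"
  "$runner" playbooks/prune_containers.yml -e "dry_run=false"
}
```

`ANSIBLE_PLAYBOOK_CMD=echo run_maintenance` prints the commands that would run, which is a quick way to review the order before a real run.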

### Scheduling with Cron
```bash
# Add to crontab for automated execution
# Daily backups at 2 AM
0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml

# Weekly cleanup on Sundays at 3 AM
0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"

# Monthly DR test on first Sunday at 4 AM
# Cron ORs day-of-month and day-of-week when both are restricted, so the
# Sunday check is done in the command instead (% must be escaped in a crontab).
0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml
```
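
Cron starts a job on schedule even if the previous run has not finished. One way to avoid overlapping runs is a lock directory, sketched below; `run_exclusive` and the lock path are illustrative assumptions, not part of the playbooks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command only if no other run holds the same lock.

run_exclusive() {
  local lock_dir="$1"
  shift
  # mkdir is atomic: it succeeds for exactly one caller at a time.
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "lock $lock_dir is held, skipping: $*" >&2
    return 75   # EX_TEMPFAIL from sysexits.h: try again later
  fi
  local status=0
  "$@" || status=$?
  rmdir "$lock_dir"
  return "$status"
}
```

A wrapper script called from cron could then run `run_exclusive /tmp/backup_databases.lock ansible-playbook playbooks/backup_databases.yml`. If a run is killed before it can clean up, the lock directory has to be removed by hand.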

### Custom Variables
Create host-specific variable files in a `host_vars/` directory next to the inventory or the playbooks. Each file name must match the host's name in the inventory. Values passed with `-e` on the command line still override these, because extra vars have the highest precedence in Ansible.
```yaml
# host_vars/atlantis.yml
backup_retention_days: 60
max_log_size: "200M"
alert_threshold: 90

# host_vars/homelab_vm.yml
security_only: false
reboot_if_required: true
```

## 📊 Monitoring and Alerting

### Integration with Existing Monitoring
These playbooks produce JSON reports that can feed your existing Prometheus/Grafana stack:

```bash
# Generate JSON reports for the monitoring stack
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/disk_usage_report.yml

# JSON outputs can be parsed by monitoring systems
# Reports are saved under /tmp/ with timestamped names
```
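
Prometheus does not read these JSON files directly. One common bridge is node_exporter's textfile collector, which exposes any `*.prom` file found in its configured directory. The sketch below records only the exit code and time of a run, so it does not depend on the report format; the metric names, the function name, and the output path are all assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: record the outcome of a playbook run for node_exporter's
# textfile collector. Metric names are illustrative, not an existing convention.

write_run_metric() {
  local playbook="$1" exit_code="$2" out_file="$3"
  local tmp_file="${out_file}.$$"
  {
    echo "# HELP homelab_playbook_last_exit_code Exit code of the most recent run."
    echo "# TYPE homelab_playbook_last_exit_code gauge"
    echo "homelab_playbook_last_exit_code{playbook=\"${playbook}\"} ${exit_code}"
    echo "# HELP homelab_playbook_last_run_timestamp_seconds Unix time of the most recent run."
    echo "# TYPE homelab_playbook_last_run_timestamp_seconds gauge"
    echo "homelab_playbook_last_run_timestamp_seconds{playbook=\"${playbook}\"} $(date +%s)"
  } > "$tmp_file"
  # Rename into place so the collector never reads a half-written file.
  mv "$tmp_file" "$out_file"
}
```

Usage might look like `ansible-playbook playbooks/service_status.yml; write_run_metric service_status $? /var/lib/node_exporter/textfile/service_status.prom`, where the directory is whatever `--collector.textfile.directory` points to on your host.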

### Alert Configuration
```bash
# Enable alerts in playbooks
ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
```

## 🚨 Emergency Procedures

### Service Recovery
```bash
# Quick service restart
ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"

# Collect logs for troubleshooting
ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"

# Check service health
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```

### Storage Emergency
```bash
# Check disk usage immediately
ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"

# Emergency cleanup
ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
```

### Security Incident
```bash
# Apply security updates immediately
ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"

# Check certificate status
ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
```

## 🔍 Troubleshooting

### Common Issues

**Playbook Fails with Permission Denied**
```bash
# Check SSH connectivity
ansible all -m ping

# Verify privilege escalation (prints "root" when become works)
ansible all -m command -a "whoami" --become
```

**Docker Commands Fail**
```bash
# Check Docker daemon status
ansible-playbook playbooks/service_status.yml --limit HOSTNAME

# Verify Docker group membership of the remote login user
# (a double-quoted $USER would be expanded locally before Ansible runs)
ansible HOST -m command -a "id -nG"
```

**Backup Failures**
```bash
# Check backup directory ownership and permissions (read-only, changes nothing)
ansible HOST -m stat -a "path=/volume1/backups" --become

# Test database connectivity
ansible-playbook playbooks/service_health_deep.yml --limit HOST
```

### Debug Mode
```bash
# Run with verbose output
ansible-playbook playbooks/PLAYBOOK.yml -vvv

# Check specific tasks
ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
```

## 📚 Integration with Existing Automation

These playbooks complement your existing automation:

### With Current Health Monitoring
```bash
# Existing health checks
ansible-playbook playbooks/synology_health.yml
ansible-playbook playbooks/check_apt_proxy.yml

# New comprehensive checks
ansible-playbook playbooks/service_health_deep.yml
ansible-playbook playbooks/disk_usage_report.yml
```

### With GitOps Deployment
```bash
# After GitOps deployment
ansible-playbook playbooks/service_status.yml
ansible-playbook playbooks/backup_configs.yml
```

## 🎯 Best Practices

### Regular Maintenance Schedule
- **Daily**: `backup_databases.yml`
- **Weekly**: `security_updates.yml`, `disk_usage_report.yml`
- **Monthly**: `disaster_recovery_test.yml`, `prune_containers.yml`
- **As Needed**: `service_health_deep.yml`, `restart_service.yml`

### Safety Guidelines
- Always test with `dry_run=true` first on playbooks that support it (`prune_containers.yml`, `log_rotation.yml`, `disaster_recovery_test.yml`)
- Use `--limit` for single host testing
- Keep backups before major changes
- Monitor service status after automation

### Performance Optimization
- Run resource-intensive playbooks during low-usage hours
- Use `--forks N` to control parallelism (Ansible defaults to 5)
- Monitor system resources during execution

## 📞 Support

For issues with these playbooks:
1. Check the troubleshooting section above
2. Review playbook logs in `/tmp/` directories
3. Use debug mode (`-vvv`) for detailed output
4. Verify integration with existing automation

---

**Last Updated**: {{ ansible_date_time.date if ansible_date_time is defined else 'Manual Update Required' }}
**Total Playbooks**: 10+ comprehensive automation playbooks
**Coverage**: Complete operational automation for homelab management