Sanitized mirror from private repository - 2026-04-18 11:19:59 UTC

2026-04-18 11:19:59 +00:00
commit fb00a325d1
1418 changed files with 359990 additions and 0 deletions
--- a/ansible/automation/playbooks/README.md
+++ b/ansible/automation/playbooks/README.md
@@ -0,0 +1,527 @@
+# 🏠 Homelab Ansible Playbooks
+
+Automation playbooks for managing your homelab infrastructure. They cover service management, backup and recovery, storage, security, and deep health checks, extending the existing health monitoring and system management automation.
+
+## 📋 Quick Reference
+
+| Category | Playbook | Purpose | Priority |
+|----------|----------|---------|----------|
+| **Service Management** | `service_status.yml` | Get status of all services | ⭐⭐⭐ |
+| | `restart_service.yml` | Restart services with dependencies | ⭐⭐⭐ |
+| | `container_logs.yml` | Collect logs for troubleshooting | ⭐⭐⭐ |
+| **Backup & Recovery** | `backup_databases.yml` | Automated database backups | ⭐⭐⭐ |
+| | `backup_configs.yml` | Configuration and data backups | ⭐⭐⭐ |
+| | `disaster_recovery_test.yml` | Test DR procedures | ⭐⭐ |
+| **Storage Management** | `disk_usage_report.yml` | Monitor storage usage | ⭐⭐⭐ |
+| | `prune_containers.yml` | Clean up Docker resources | ⭐⭐ |
+| | `log_rotation.yml` | Manage log files | ⭐⭐ |
+| **Security** | `security_updates.yml` | Automated security patches | ⭐⭐⭐ |
+| | `certificate_renewal.yml` | SSL certificate management | ⭐⭐ |
+| **Monitoring** | `service_health_deep.yml` | Comprehensive health checks | ⭐⭐ |
+
+## 🚀 Quick Start
+
+### Prerequisites
+- Ansible 2.12+
+- SSH access to all hosts via Tailscale
+- The existing inventory at `/home/homelab/organized/repos/homelab/ansible/automation/hosts.ini`
+
+### Run Your First Playbook
+```bash
+cd /home/homelab/organized/repos/homelab/ansible/automation
+
+# Check status of all services
+ansible-playbook playbooks/service_status.yml
+
+# Check disk usage across all hosts
+ansible-playbook playbooks/disk_usage_report.yml
+
+# Backup all databases
+ansible-playbook playbooks/backup_databases.yml
+```
+
+## 📦 Service Management Playbooks
+
+### `service_status.yml` - Service Status Check
+Get comprehensive status of all services across your homelab.
+
+```bash
+# Check all hosts
+ansible-playbook playbooks/service_status.yml
+
+# Check specific host
+ansible-playbook playbooks/service_status.yml --limit atlantis
+
+# Generate JSON reports (written on every run; no extra flag is needed)
+ansible-playbook playbooks/service_status.yml
+# Reports saved to: /tmp/HOSTNAME_status_TIMESTAMP.json
+```
+
+**Features:**
+- System resource usage
+- Container status and health
+- Critical service monitoring
+- Network connectivity checks
+- JSON output for automation
+
+### `restart_service.yml` - Service Restart with Dependencies
+Restart services with proper dependency handling and health checks.
+
+```bash
+# Restart a service
+ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"
+
+# Restart with custom wait time
+ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"
+
+# Force restart if graceful stop fails
+ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
+```
+
+**Features:**
+- Dependency-aware restart order
+- Health check validation
+- Graceful stop with force option
+- Pre/post restart logging
+- Service-specific wait times
+
+### `container_logs.yml` - Log Collection
+Collect logs from multiple containers for troubleshooting.
+
+```bash
+# Collect logs for specific service
+ansible-playbook playbooks/container_logs.yml -e "service_name=plex"
+
+# Collect logs matching pattern
+ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"
+
+# Collect all container logs
+ansible-playbook playbooks/container_logs.yml -e "collect_all=true"
+
+# Custom log parameters
+ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
+```
+
+**Features:**
+- Pattern-based container selection
+- Error analysis and counting
+- Resource usage reporting
+- Structured log organization
+- Archive option for long-term storage
+
+## 💾 Backup & Recovery Playbooks
+
+### `backup_databases.yml` - Database Backup Automation
+Automated backup of all PostgreSQL and MySQL databases.
+
+```bash
+# Backup all databases
+ansible-playbook playbooks/backup_databases.yml
+
+# Full backup with verification
+ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"
+
+# Specific host backup
+ansible-playbook playbooks/backup_databases.yml --limit atlantis
+
+# Custom retention
+ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
+```
+
+**Supported Databases:**
+- **Atlantis**: Immich, Vaultwarden, Joplin, Firefly
+- **Calypso**: Authentik, Paperless
+- **Homelab VM**: Mastodon, Matrix
+
+**Features:**
+- Automatic database discovery
+- Compression and verification
+- Retention management
+- Backup integrity testing
+- Multiple storage locations
+
+### `backup_configs.yml` - Configuration Backup
+Backup docker-compose files, configs, and important data.
+
+```bash
+# Backup configurations
+ansible-playbook playbooks/backup_configs.yml
+
+# Include secrets (use with caution)
+ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"
+
+# Backup without compression
+ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
+```
+
+**Backup Includes:**
+- Docker configurations
+- SSH configurations
+- Service-specific data
+- System information snapshots
+- Docker-compose files
+
+### `disaster_recovery_test.yml` - DR Testing
+Test disaster recovery procedures and validate backup integrity.
+
+```bash
+# Basic DR test (dry run)
+ansible-playbook playbooks/disaster_recovery_test.yml
+
+# Full DR test with restore validation
+ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"
+
+# Test with failover procedures
+ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
+```
+
+**Test Components:**
+- Backup validation and integrity
+- Database restore testing
+- RTO (Recovery Time Objective) analysis
+- Service failover procedures
+- DR readiness scoring
+
+## 💿 Storage Management Playbooks
+
+### `disk_usage_report.yml` - Storage Monitoring
+Monitor storage usage and generate comprehensive reports.
+
+```bash
+# Basic disk usage report
+ansible-playbook playbooks/disk_usage_report.yml
+
+# Detailed analysis with performance data
+ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"
+
+# Set custom alert thresholds
+ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"
+
+# Send alerts for critical usage
+ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
+```
+
+**Features:**
+- Filesystem usage monitoring
+- Docker storage analysis
+- Large file identification
+- Temporary file analysis
+- Alert thresholds and notifications
+- JSON output for automation
+
+### `prune_containers.yml` - Docker Cleanup
+Clean up unused containers, images, volumes, and networks.
+
+```bash
+# Basic cleanup (dry run)
+ansible-playbook playbooks/prune_containers.yml
+
+# Live cleanup
+ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+
+# Aggressive cleanup (removes old images)
+ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
+
+# Custom retention and log cleanup (still a dry run unless dry_run=false is added)
+ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
+```
+
+**Cleanup Actions:**
+- Remove stopped containers
+- Remove dangling images
+- Remove unused volumes (optional)
+- Remove unused networks
+- Truncate large container logs
+- System-wide Docker prune
+
+### `log_rotation.yml` - Log Management
+Manage log files across all services and system components.
+
+```bash
+# Basic log rotation (dry run)
+ansible-playbook playbooks/log_rotation.yml
+
+# Live log rotation with compression
+ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"
+
+# Aggressive cleanup (still a dry run unless dry_run=false is added)
+ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"
+
+# Custom log size limits
+ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
+```
+
+**Log Management:**
+- System log rotation
+- Docker container log truncation
+- Application log cleanup
+- Log compression
+- Retention policies
+- Logrotate configuration
+
+## 🔒 Security Playbooks
+
+### `security_updates.yml` - Automated Security Updates
+Apply security patches and system updates.
+
+```bash
+# Security updates only
+ansible-playbook playbooks/security_updates.yml
+
+# Security updates with reboot if needed
+ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
+
+# Full system update
+ansible-playbook playbooks/security_updates.yml -e "security_only=false"
+
+# Include Docker updates
+ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
+```
+
+**Features:**
+- Security-only or full updates
+- Pre-update configuration backup
+- Kernel update detection
+- Automatic reboot handling
+- Service verification after updates
+- Update reporting and logging
+
+### `certificate_renewal.yml` - SSL Certificate Management
+Manage Let's Encrypt certificates and other SSL certificates.
+
+```bash
+# Check certificate status
+ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
+
+# Renew certificates
+ansible-playbook playbooks/certificate_renewal.yml
+
+# Force renewal
+ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"
+
+# Custom renewal threshold
+ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
+```
+
+**Certificate Support:**
+- Let's Encrypt via Certbot
+- Nginx Proxy Manager certificates
+- Traefik certificates
+- Synology DSM certificates
+
+## 🏥 Monitoring Playbooks
+
+### `service_health_deep.yml` - Comprehensive Health Checks
+Deep health monitoring for all homelab services.
+
+```bash
+# Deep health check
+ansible-playbook playbooks/service_health_deep.yml
+
+# Include performance metrics
+ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"
+
+# Enable alerting
+ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
+
+# Custom timeout
+ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
+```
+
+**Health Checks:**
+- Container health status
+- Service endpoint testing
+- Database connectivity
+- Redis connectivity
+- System performance metrics
+- Log error analysis
+- Dependency validation
+
+## 🔧 Advanced Usage
+
+### Combining Playbooks
+```bash
+# Complete maintenance routine
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/backup_databases.yml
+ansible-playbook playbooks/security_updates.yml
+ansible-playbook playbooks/disk_usage_report.yml
+ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+```
+
+### Scheduling with Cron
+```bash
+# Add to crontab for automated execution
+# Daily backups at 2 AM
+0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml
+
+# Weekly cleanup on Sundays at 3 AM
+0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+
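+# Monthly DR test on first Sunday at 4 AM
+# (cron ORs day-of-month with day-of-week when both are restricted, so the Sunday check runs in the command; % must be escaped in a crontab)
+0 4 1-7 * * [ "$(date +\%u)" -eq 7 ] && cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml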
+```
+
+### Custom Variables
+Create host-specific variable files:
+```yaml
+# host_vars/atlantis.yml
+backup_retention_days: 60
+max_log_size: "200M"
+alert_threshold: 90
+
+# host_vars/homelab_vm.yml
+security_only: false
+reboot_if_required: true
+```
+
+## 📊 Monitoring and Alerting
+
+### Integration with Existing Monitoring
+These playbooks write JSON reports that can feed your existing Prometheus/Grafana stack:
+
+```bash
+# Generate JSON reports for the monitoring stack
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/disk_usage_report.yml
+
+# JSON outputs can be parsed by monitoring systems
+# Reports saved to /tmp/ directories with timestamps
+```
+
+### Alert Configuration
+```bash
+# Enable alerts in playbooks
+ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
+ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
+ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
+```
+
+## 🚨 Emergency Procedures
+
+### Service Recovery
+```bash
+# Quick service restart
+ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"
+
+# Collect logs for troubleshooting
+ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"
+
+# Check service health
+ansible-playbook playbooks/service_health_deep.yml --limit HOST
+```
+
+### Storage Emergency
+```bash
+# Check disk usage immediately
+ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"
+
+# Emergency cleanup
+ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
+ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
+```
+
+### Security Incident
+```bash
+# Apply security updates immediately
+ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
+
+# Check certificate status
+ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
+```
+
+## 🔍 Troubleshooting
+
+### Common Issues
+
+**Playbook Fails with Permission Denied**
+```bash
+# Check SSH connectivity
+ansible all -m ping
+
+# Verify privilege escalation (each host should report "root")
+ansible all -m command -a "whoami" --become
+```
+
+**Docker Commands Fail**
+```bash
+# Check Docker daemon status
+ansible-playbook playbooks/service_status.yml --limit HOSTNAME
+
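+# Verify Docker group membership of the remote login user
+# (id runs on the target; "$USER" inside double quotes would be expanded on the control node)
+ansible HOST -m command -a "id -nG"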
+```
+
+**Backup Failures**
+```bash
+# Check backup directory permissions (stat is read-only; the file module with state=directory would create the path)
+ansible HOST -m stat -a "path=/volume1/backups" --become
+
+# Test database connectivity
+ansible-playbook playbooks/service_health_deep.yml --limit HOST
+```
+
+### Debug Mode
+```bash
+# Run with verbose output
+ansible-playbook playbooks/PLAYBOOK.yml -vvv
+
+# Check specific tasks
+ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
+ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
+```
+
+## 📚 Integration with Existing Automation
+
+These playbooks complement your existing automation:
+
+### With Current Health Monitoring
+```bash
+# Existing health checks
+ansible-playbook playbooks/synology_health.yml
+ansible-playbook playbooks/check_apt_proxy.yml
+
+# New comprehensive checks
+ansible-playbook playbooks/service_health_deep.yml
+ansible-playbook playbooks/disk_usage_report.yml
+```
+
+### With GitOps Deployment
+```bash
+# After GitOps deployment
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/backup_configs.yml
+```
+
+## 🎯 Best Practices
+
+### Regular Maintenance Schedule
+- **Daily**: `backup_databases.yml`
+- **Weekly**: `security_updates.yml`, `disk_usage_report.yml`
+- **Monthly**: `disaster_recovery_test.yml`, `prune_containers.yml`
+- **As Needed**: `service_health_deep.yml`, `restart_service.yml`
+
+### Safety Guidelines
+- Always test with `dry_run=true` first
+- Use `--limit` for single host testing
+- Keep backups before major changes
+- Monitor service status after automation
+
+### Performance Optimization
+- Run resource-intensive playbooks during low-usage hours
+- Use `--forks` to control parallelism (Ansible's default is 5; for example, `--forks 2` limits a run to two hosts at a time)
+- Monitor system resources during execution
+
+## 📞 Support
+
+For issues with these playbooks:
+1. Check the troubleshooting section above
+2. Review playbook logs in `/tmp/` directories
+3. Use debug mode (`-vvv`) for detailed output
+4. Verify integration with existing automation
+
+---
+
+**Last Updated**: Manual Update Required  
+**Total Playbooks**: 12 documented in this README  
+**Coverage**: Complete operational automation for homelab management