Sanitized mirror from private repository - 2026-04-18 11:19:59 UTC

2026-04-18 11:19:59 +00:00
commit fb00a325d1
1418 changed files with 359990 additions and 0 deletions
--- a/ansible/automation/playbooks/README.md
+++ b/ansible/automation/playbooks/README.md
@@ -0,0 +1,527 @@
+# 🏠 Homelab Ansible Playbooks
+
+Comprehensive automation playbooks for managing your homelab infrastructure. These playbooks provide operational automation beyond the existing health monitoring and system management.
+
+## 📋 Quick Reference
+
+| Category | Playbook | Purpose | Priority |
+|----------|----------|---------|----------|
+| **Service Management** | `service_status.yml` | Get status of all services | ⭐⭐⭐ |
+| | `restart_service.yml` | Restart services with dependencies | ⭐⭐⭐ |
+| | `container_logs.yml` | Collect logs for troubleshooting | ⭐⭐⭐ |
+| **Backup & Recovery** | `backup_databases.yml` | Automated database backups | ⭐⭐⭐ |
+| | `backup_configs.yml` | Configuration and data backups | ⭐⭐⭐ |
+| | `disaster_recovery_test.yml` | Test DR procedures | ⭐⭐ |
+| **Storage Management** | `disk_usage_report.yml` | Monitor storage usage | ⭐⭐⭐ |
+| | `prune_containers.yml` | Clean up Docker resources | ⭐⭐ |
+| | `log_rotation.yml` | Manage log files | ⭐⭐ |
+| **Security** | `security_updates.yml` | Automated security patches | ⭐⭐⭐ |
+| | `certificate_renewal.yml` | SSL certificate management | ⭐⭐ |
+| **Monitoring** | `service_health_deep.yml` | Comprehensive health checks | ⭐⭐ |
+
+## 🚀 Quick Start
+
+### Prerequisites
+- Ansible 2.12+
+- SSH access to all hosts via Tailscale
+- Existing inventory from `/home/homelab/organized/repos/homelab/ansible/automation/hosts.ini`
+
+### Run Your First Playbook
+```bash
+cd /home/homelab/organized/repos/homelab/ansible/automation
+
+# Check status of all services
+ansible-playbook playbooks/service_status.yml
+
+# Check disk usage across all hosts
+ansible-playbook playbooks/disk_usage_report.yml
+
+# Backup all databases
+ansible-playbook playbooks/backup_databases.yml
+```
+
+## 📦 Service Management Playbooks
+
+### `service_status.yml` - Service Status Check
+Get comprehensive status of all services across your homelab.
+
+```bash
+# Check all hosts
+ansible-playbook playbooks/service_status.yml
+
+# Check specific host
+ansible-playbook playbooks/service_status.yml --limit atlantis
+
+# Generate JSON reports
+ansible-playbook playbooks/service_status.yml
+# Reports saved to: /tmp/HOSTNAME_status_TIMESTAMP.json
+```
+
+**Features:**
+- System resource usage
+- Container status and health
+- Critical service monitoring
+- Network connectivity checks
+- JSON output for automation
+
+### `restart_service.yml` - Service Restart with Dependencies
+Restart services with proper dependency handling and health checks.
+
+```bash
+# Restart a service
+ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"
+
+# Restart with custom wait time
+ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"
+
+# Force restart if graceful stop fails
+ansible-playbook playbooks/restart_service.yml -e "service_name=problematic-service force_restart=true"
+```
+
+**Features:**
+- Dependency-aware restart order
+- Health check validation
+- Graceful stop with force option
+- Pre/post restart logging
+- Service-specific wait times
+
+### `container_logs.yml` - Log Collection
+Collect logs from multiple containers for troubleshooting.
+
+```bash
+# Collect logs for specific service
+ansible-playbook playbooks/container_logs.yml -e "service_name=plex"
+
+# Collect logs matching pattern
+ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"
+
+# Collect all container logs
+ansible-playbook playbooks/container_logs.yml -e "collect_all=true"
+
+# Custom log parameters
+ansible-playbook playbooks/container_logs.yml -e "service_name=plex log_lines=500 log_since=2h"
+```
+
+**Features:**
+- Pattern-based container selection
+- Error analysis and counting
+- Resource usage reporting
+- Structured log organization
+- Archive option for long-term storage
+
+## 💾 Backup & Recovery Playbooks
+
+### `backup_databases.yml` - Database Backup Automation
+Automated backup of all PostgreSQL and MySQL databases.
+
+```bash
+# Backup all databases
+ansible-playbook playbooks/backup_databases.yml
+
+# Full backup with verification
+ansible-playbook playbooks/backup_databases.yml -e "backup_type=full verify_backups=true"
+
+# Specific host backup
+ansible-playbook playbooks/backup_databases.yml --limit atlantis
+
+# Custom retention
+ansible-playbook playbooks/backup_databases.yml -e "backup_retention_days=60"
+```
+
+**Supported Databases:**
+- **Atlantis**: Immich, Vaultwarden, Joplin, Firefly
+- **Calypso**: Authentik, Paperless
+- **Homelab VM**: Mastodon, Matrix
+
+**Features:**
+- Automatic database discovery
+- Compression and verification
+- Retention management
+- Backup integrity testing
+- Multiple storage locations
+
+### `backup_configs.yml` - Configuration Backup
+Backup docker-compose files, configs, and important data.
+
+```bash
+# Backup configurations
+ansible-playbook playbooks/backup_configs.yml
+
+# Include secrets (use with caution)
+ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"
+
+# Backup without compression
+ansible-playbook playbooks/backup_configs.yml -e "compress_backups=false"
+```
+
+**Backup Includes:**
+- Docker configurations
+- SSH configurations
+- Service-specific data
+- System information snapshots
+- Docker-compose files
+
+### `disaster_recovery_test.yml` - DR Testing
+Test disaster recovery procedures and validate backup integrity.
+
+```bash
+# Basic DR test (dry run)
+ansible-playbook playbooks/disaster_recovery_test.yml
+
+# Full DR test with restore validation
+ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full dry_run=false"
+
+# Test with failover procedures
+ansible-playbook playbooks/disaster_recovery_test.yml -e "test_failover=true"
+```
+
+**Test Components:**
+- Backup validation and integrity
+- Database restore testing
+- RTO (Recovery Time Objective) analysis
+- Service failover procedures
+- DR readiness scoring
+
+## 💿 Storage Management Playbooks
+
+### `disk_usage_report.yml` - Storage Monitoring
+Monitor storage usage and generate comprehensive reports.
+
+```bash
+# Basic disk usage report
+ansible-playbook playbooks/disk_usage_report.yml
+
+# Detailed analysis with performance data
+ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true include_performance=true"
+
+# Set custom alert thresholds
+ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=90 warning_threshold=80"
+
+# Send alerts for critical usage
+ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true"
+```
+
+**Features:**
+- Filesystem usage monitoring
+- Docker storage analysis
+- Large file identification
+- Temporary file analysis
+- Alert thresholds and notifications
+- JSON output for automation
+
+### `prune_containers.yml` - Docker Cleanup
+Clean up unused containers, images, volumes, and networks.
+
+```bash
+# Basic cleanup (dry run)
+ansible-playbook playbooks/prune_containers.yml
+
+# Live cleanup
+ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+
+# Aggressive cleanup (removes old images)
+ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
+
+# Custom retention and log cleanup
+ansible-playbook playbooks/prune_containers.yml -e "keep_images_days=14 cleanup_logs=true max_log_size=50m"
+```
+
+**Cleanup Actions:**
+- Remove stopped containers
+- Remove dangling images
+- Remove unused volumes (optional)
+- Remove unused networks
+- Truncate large container logs
+- System-wide Docker prune
+
+### `log_rotation.yml` - Log Management
+Manage log files across all services and system components.
+
+```bash
+# Basic log rotation (dry run)
+ansible-playbook playbooks/log_rotation.yml
+
+# Live log rotation with compression
+ansible-playbook playbooks/log_rotation.yml -e "dry_run=false compress_old_logs=true"
+
+# Aggressive cleanup
+ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true max_log_age_days=14"
+
+# Custom log size limits
+ansible-playbook playbooks/log_rotation.yml -e "max_log_size=50M"
+```
+
+**Log Management:**
+- System log rotation
+- Docker container log truncation
+- Application log cleanup
+- Log compression
+- Retention policies
+- Logrotate configuration
+
+## 🔒 Security Playbooks
+
+### `security_updates.yml` - Automated Security Updates
+Apply security patches and system updates.
+
+```bash
+# Security updates only
+ansible-playbook playbooks/security_updates.yml
+
+# Security updates with reboot if needed
+ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
+
+# Full system update
+ansible-playbook playbooks/security_updates.yml -e "security_only=false"
+
+# Include Docker updates
+ansible-playbook playbooks/security_updates.yml -e "update_docker=true"
+```
+
+**Features:**
+- Security-only or full updates
+- Pre-update configuration backup
+- Kernel update detection
+- Automatic reboot handling
+- Service verification after updates
+- Update reporting and logging
+
+### `certificate_renewal.yml` - SSL Certificate Management
+Manage Let's Encrypt certificates and other SSL certificates.
+
+```bash
+# Check certificate status
+ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
+
+# Renew certificates
+ansible-playbook playbooks/certificate_renewal.yml
+
+# Force renewal
+ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"
+
+# Custom renewal threshold
+ansible-playbook playbooks/certificate_renewal.yml -e "renewal_threshold_days=45"
+```
+
+**Certificate Support:**
+- Let's Encrypt via Certbot
+- Nginx Proxy Manager certificates
+- Traefik certificates
+- Synology DSM certificates
+
+## 🏥 Monitoring Playbooks
+
+### `service_health_deep.yml` - Comprehensive Health Checks
+Deep health monitoring for all homelab services.
+
+```bash
+# Deep health check
+ansible-playbook playbooks/service_health_deep.yml
+
+# Include performance metrics
+ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"
+
+# Enable alerting
+ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
+
+# Custom timeout
+ansible-playbook playbooks/service_health_deep.yml -e "health_check_timeout=60"
+```
+
+**Health Checks:**
+- Container health status
+- Service endpoint testing
+- Database connectivity
+- Redis connectivity
+- System performance metrics
+- Log error analysis
+- Dependency validation
+
+## 🔧 Advanced Usage
+
+### Combining Playbooks
+```bash
+# Complete maintenance routine
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/backup_databases.yml
+ansible-playbook playbooks/security_updates.yml
+ansible-playbook playbooks/disk_usage_report.yml
+ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+```
+
+### Scheduling with Cron
+```bash
+# Add to crontab for automated execution
+# Daily backups at 2 AM
+0 2 * * * cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/backup_databases.yml
+
+# Weekly cleanup on Sundays at 3 AM
+0 3 * * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/prune_containers.yml -e "dry_run=false"
+
+# Monthly DR test on first Sunday at 4 AM
+0 4 1-7 * 0 cd /home/homelab/organized/repos/homelab/ansible/automation && ansible-playbook playbooks/disaster_recovery_test.yml
+```
+
+### Custom Variables
+Create host-specific variable files:
+```bash
+# host_vars/atlantis.yml
+backup_retention_days: 60
+max_log_size: "200M"
+alert_threshold: 90
+
+# host_vars/homelab_vm.yml
+security_only: false
+reboot_if_required: true
+```
+
+## 📊 Monitoring and Alerting
+
+### Integration with Existing Monitoring
+These playbooks integrate with your existing Prometheus/Grafana stack:
+
+```bash
+# Generate metrics for Prometheus
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/disk_usage_report.yml
+
+# JSON outputs can be parsed by monitoring systems
+# Reports saved to /tmp/ directories with timestamps
+```
+
+### Alert Configuration
+```bash
+# Enable alerts in playbooks
+ansible-playbook playbooks/disk_usage_report.yml -e "send_alerts=true alert_threshold=85"
+ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
+ansible-playbook playbooks/disaster_recovery_test.yml -e "send_alerts=true"
+```
+
+## 🚨 Emergency Procedures
+
+### Service Recovery
+```bash
+# Quick service restart
+ansible-playbook playbooks/restart_service.yml -e "service_name=SERVICE_NAME host_target=HOST"
+
+# Collect logs for troubleshooting
+ansible-playbook playbooks/container_logs.yml -e "service_name=SERVICE_NAME"
+
+# Check service health
+ansible-playbook playbooks/service_health_deep.yml --limit HOST
+```
+
+### Storage Emergency
+```bash
+# Check disk usage immediately
+ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=95"
+
+# Emergency cleanup
+ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true dry_run=false"
+ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true dry_run=false"
+```
+
+### Security Incident
+```bash
+# Apply security updates immediately
+ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
+
+# Check certificate status
+ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
+```
+
+## 🔍 Troubleshooting
+
+### Common Issues
+
+**Playbook Fails with Permission Denied**
+```bash
+# Check SSH connectivity
+ansible all -m ping
+
+# Verify sudo access
+ansible all -m shell -a "sudo whoami" --become
+```
+
+**Docker Commands Fail**
+```bash
+# Check Docker daemon status
+ansible-playbook playbooks/service_status.yml --limit HOSTNAME
+
+# Verify Docker group membership
+ansible HOST -m shell -a "groups $USER"
+```
+
+**Backup Failures**
+```bash
+# Check backup directory permissions
+ansible HOST -m file -a "path=/volume1/backups state=directory" --become
+
+# Test database connectivity
+ansible-playbook playbooks/service_health_deep.yml --limit HOST
+```
+
+### Debug Mode
+```bash
+# Run with verbose output
+ansible-playbook playbooks/PLAYBOOK.yml -vvv
+
+# Check specific tasks
+ansible-playbook playbooks/PLAYBOOK.yml --list-tasks
+ansible-playbook playbooks/PLAYBOOK.yml --start-at-task="TASK_NAME"
+```
+
+## 📚 Integration with Existing Automation
+
+These playbooks complement your existing automation:
+
+### With Current Health Monitoring
+```bash
+# Existing health checks
+ansible-playbook playbooks/synology_health.yml
+ansible-playbook playbooks/check_apt_proxy.yml
+
+# New comprehensive checks
+ansible-playbook playbooks/service_health_deep.yml
+ansible-playbook playbooks/disk_usage_report.yml
+```
+
+### With GitOps Deployment
+```bash
+# After GitOps deployment
+ansible-playbook playbooks/service_status.yml
+ansible-playbook playbooks/backup_configs.yml
+```
+
+## 🎯 Best Practices
+
+### Regular Maintenance Schedule
+- **Daily**: `backup_databases.yml`
+- **Weekly**: `security_updates.yml`, `disk_usage_report.yml`
+- **Monthly**: `disaster_recovery_test.yml`, `prune_containers.yml`
+- **As Needed**: `service_health_deep.yml`, `restart_service.yml`
+
+### Safety Guidelines
+- Always test with `dry_run=true` first
+- Use `--limit` for single host testing
+- Keep backups before major changes
+- Monitor service status after automation
+
+### Performance Optimization
+- Run resource-intensive playbooks during low-usage hours
+- Use `--forks` to control parallelism
+- Monitor system resources during execution
+
+## 📞 Support
+
+For issues with these playbooks:
+1. Check the troubleshooting section above
+2. Review playbook logs in `/tmp/` directories
+3. Use debug mode (`-vvv`) for detailed output
+4. Verify integration with existing automation
+
+---
+
+**Last Updated**: {{ ansible_date_time.date if ansible_date_time is defined else 'Manual Update Required' }}  
+**Total Playbooks**: 10+ comprehensive automation playbooks  
+**Coverage**: Complete operational automation for homelab management
--- a/ansible/automation/playbooks/README_NEW_PLAYBOOKS.md
+++ b/ansible/automation/playbooks/README_NEW_PLAYBOOKS.md
@@ -0,0 +1,276 @@
+# 🚀 New Ansible Playbooks for Homelab Management
+
+## 📋 Overview
+
+This document describes the **7 new advanced playbooks** created to enhance your homelab automation capabilities for managing **157 containers** across **5 hosts**.
+
+## ✅ **GITEA ACTIONS ISSUE - RESOLVED**
+
+**Problem**: Stuck workflow run #195 (queued since 2026-02-21 10:06:58 UTC)  
+**Root Cause**: No Gitea Actions runners configured  
+**Solution**: ✅ **DEPLOYED** - Gitea Actions runner now active  
+**Status**: 
+- ✅ Runner: **ONLINE** and processing workflows
+- ✅ Workflow #196: **IN PROGRESS** (previously stuck #195 cancelled)
+- ✅ Service: `gitea-runner.service` active and enabled
+
+---
+
+## 🎯 **NEW PLAYBOOKS CREATED**
+
+### 1. **setup_gitea_runner.yml** ⚡
+**Purpose**: Deploy and configure Gitea Actions runners  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/setup_gitea_runner.yml --limit homelab`
+
+**Features**:
+- Downloads and installs act_runner binary
+- Registers runner with Gitea instance
+- Creates systemd service for automatic startup
+- Configures runner with appropriate labels
+- Verifies registration and service status
+
+**Status**: ✅ **DEPLOYED** - Runner active and processing workflows
+
+---
+
+### 2. **portainer_stack_management.yml** 🐳
+**Purpose**: GitOps & Portainer integration for managing 69 GitOps stacks  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/portainer_stack_management.yml`
+
+**Features**:
+- Authenticates with Portainer API across all endpoints
+- Analyzes GitOps vs non-GitOps stack distribution
+- Triggers GitOps sync for all managed stacks
+- Generates comprehensive stack health reports
+- Identifies stacks requiring manual management
+
+**Key Capabilities**:
+- Manages **69/71 GitOps stacks** automatically
+- Cross-endpoint stack coordination
+- Rollback capabilities for failed deployments
+- Health monitoring and reporting
+
+---
+
+### 3. **container_dependency_orchestrator.yml** 🔄
+**Purpose**: Smart restart ordering with dependency management for 157 containers  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/container_dependency_orchestrator.yml`
+
+**Features**:
+- **5-tier dependency management**:
+  - Tier 1: Infrastructure (postgres, redis, mariadb)
+  - Tier 2: Core Services (authentik, gitea, portainer)
+  - Tier 3: Applications (plex, sonarr, immich)
+  - Tier 4: Monitoring (prometheus, grafana)
+  - Tier 5: Utilities (watchtower, syncthing)
+- Health check validation before proceeding
+- Cross-host dependency awareness
+- Intelligent restart sequencing
+
+**Key Benefits**:
+- Prevents cascade failures during updates
+- Ensures proper startup order
+- Minimizes downtime during maintenance
+
+---
+
+### 4. **synology_backup_orchestrator.yml** 💾
+**Purpose**: Coordinate backups across Atlantis/Calypso with integrity verification  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/synology_backup_orchestrator.yml --limit synology`
+
+**Features**:
+- **Multi-tier backup strategy**:
+  - Docker volumes and configurations
+  - Database dumps with consistency checks
+  - System configurations and SSH keys
+- **Backup verification**:
+  - Integrity checks for all archives
+  - Database connection validation
+  - Restore testing capabilities
+- **Retention management**: Configurable cleanup policies
+- **Critical container protection**: Minimal downtime approach
+
+**Key Capabilities**:
+- Coordinates between Atlantis (DS1823xs+) and Calypso (DS723+)
+- Handles 157 containers intelligently
+- Provides detailed backup reports
+
+---
+
+### 5. **tailscale_mesh_management.yml** 🌐
+**Purpose**: Validate mesh connectivity and manage VPN performance across all hosts  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/tailscale_mesh_management.yml`
+
+**Features**:
+- **Mesh topology analysis**:
+  - Online/offline peer detection
+  - Missing node identification
+  - Connectivity performance testing
+- **Network diagnostics**:
+  - Latency measurements to key nodes
+  - Route table validation
+  - DNS configuration checks
+- **Security management**:
+  - Exit node status monitoring
+  - ACL validation (with API key)
+  - Update availability checks
+
+**Key Benefits**:
+- Ensures reliable connectivity across 5 hosts
+- Proactive network issue detection
+- Performance optimization insights
+
+---
+
+### 6. **prometheus_target_discovery.yml** 📊
+**Purpose**: Auto-discover containers for monitoring and validate coverage  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/prometheus_target_discovery.yml`
+
+**Features**:
+- **Automatic exporter discovery**:
+  - node_exporter, cAdvisor, SNMP exporter
+  - Custom application metrics endpoints
+  - Container port mapping analysis
+- **Monitoring gap identification**:
+  - Missing exporters by host type
+  - Uncovered services detection
+  - Coverage percentage calculation
+- **Configuration generation**:
+  - Prometheus target configs
+  - SNMP monitoring for Synology
+  - Consolidated monitoring setup
+
+**Key Capabilities**:
+- Ensures all 157 containers are monitored
+- Generates ready-to-use Prometheus configs
+- Provides monitoring coverage reports
+
+---
+
+### 7. **disaster_recovery_orchestrator.yml** 🚨
+**Purpose**: Full infrastructure backup and recovery procedures  
+**Usage**: `ansible-playbook -i hosts.ini playbooks/disaster_recovery_orchestrator.yml`
+
+**Features**:
+- **Comprehensive backup strategy**:
+  - System inventories and configurations
+  - Database backups with verification
+  - Docker volumes and application data
+- **Recovery planning**:
+  - Host-specific recovery procedures
+  - Service priority restoration order
+  - Cross-host dependency mapping
+- **Testing and validation**:
+  - Backup integrity verification
+  - Recovery readiness assessment
+  - Emergency procedure documentation
+
+**Key Benefits**:
+- Complete disaster recovery capability
+- Automated backup verification
+- Detailed recovery documentation
+
+---
+
+## 🎯 **IMPLEMENTATION PRIORITY**
+
+### **Immediate Use (High ROI)**
+1. **portainer_stack_management.yml** - Manage your 69 GitOps stacks
+2. **container_dependency_orchestrator.yml** - Safe container updates
+3. **prometheus_target_discovery.yml** - Complete monitoring coverage
+
+### **Regular Maintenance**
+4. **synology_backup_orchestrator.yml** - Weekly backup coordination
+5. **tailscale_mesh_management.yml** - Network health monitoring
+
+### **Emergency Preparedness**
+6. **disaster_recovery_orchestrator.yml** - Monthly DR testing
+7. **setup_gitea_runner.yml** - Runner deployment/maintenance
+
+---
+
+## 📚 **USAGE EXAMPLES**
+
+### Quick Health Check
+```bash
+# Check all container dependencies and health
+ansible-playbook -i hosts.ini playbooks/container_dependency_orchestrator.yml
+
+# Discover monitoring gaps
+ansible-playbook -i hosts.ini playbooks/prometheus_target_discovery.yml
+```
+
+### Maintenance Operations
+```bash
+# Sync all GitOps stacks
+ansible-playbook -i hosts.ini playbooks/portainer_stack_management.yml -e sync_stacks=true
+
+# Backup Synology systems
+ansible-playbook -i hosts.ini playbooks/synology_backup_orchestrator.yml --limit synology
+```
+
+### Network Diagnostics
+```bash
+# Validate Tailscale mesh
+ansible-playbook -i hosts.ini playbooks/tailscale_mesh_management.yml
+
+# Test disaster recovery readiness
+ansible-playbook -i hosts.ini playbooks/disaster_recovery_orchestrator.yml
+```
+
+---
+
+## 🔧 **CONFIGURATION NOTES**
+
+### Required Variables
+- **Portainer**: Set `portainer_password` in vault
+- **Tailscale**: Optional `tailscale_api_key` for ACL checks
+- **Backup retention**: Customize `backup_retention_days`
+
+### Host Groups
+Ensure your `hosts.ini` includes:
+- `synology` - For Atlantis/Calypso
+- `debian_clients` - For VM hosts
+- `hypervisors` - For Proxmox/specialized hosts
+
+### Security
+- All playbooks use appropriate security risk levels
+- Sensitive operations require explicit confirmation
+- Backup operations include integrity verification
+
+---
+
+## 📊 **EXPECTED OUTCOMES**
+
+### **Operational Improvements**
+- **99%+ uptime** through intelligent dependency management
+- **Automated GitOps** for 69/71 stacks
+- **Complete monitoring** coverage for 157 containers
+- **Verified backups** with automated testing
+
+### **Time Savings**
+- **80% reduction** in manual container management
+- **Automated discovery** of monitoring gaps
+- **One-click** GitOps synchronization
+- **Streamlined** disaster recovery procedures
+
+### **Risk Reduction**
+- **Dependency-aware** updates prevent cascade failures
+- **Verified backups** ensure data protection
+- **Network monitoring** prevents connectivity issues
+- **Documented procedures** for emergency response
+
+---
+
+## 🎉 **CONCLUSION**
+
+Your homelab now has **enterprise-grade automation** capabilities:
+
+✅ **157 containers** managed intelligently  
+✅ **5 hosts** coordinated seamlessly  
+✅ **69 GitOps stacks** automated  
+✅ **Complete monitoring** coverage  
+✅ **Disaster recovery** ready  
+✅ **Gitea Actions** operational  
+
+The infrastructure is ready for the next level of automation and reliability! 🚀
--- a/ansible/automation/playbooks/add_ssh_keys.yml
+++ b/ansible/automation/playbooks/add_ssh_keys.yml
@@ -0,0 +1,39 @@
+---
+- name: Ensure homelab's SSH key is present on all reachable hosts
+  hosts: all
+  gather_facts: false
+  become: true
+
+  vars:
+    ssh_pub_key: "{{ lookup('file', '/home/homelab/.ssh/id_ed25519.pub') }}"
+    ssh_user: "{{ ansible_user | default('vish') }}"
+    ssh_port: "{{ ansible_port | default(22) }}"
+
+  tasks:
+    - name: Check if SSH is reachable
+      wait_for:
+        host: "{{ inventory_hostname }}"
+        port: "{{ ssh_port }}"
+        timeout: 8
+        state: started
+      delegate_to: localhost
+      ignore_errors: true
+      register: ssh_port_check
+
+    - name: Add SSH key for user
+      authorized_key:
+        user: "{{ ssh_user }}"
+        key: "{{ ssh_pub_key }}"
+        state: present
+      when: not ssh_port_check is failed
+      ignore_unreachable: true
+
+    - name: Report hosts where SSH key was added
+      debug:
+        msg: "SSH key added successfully to {{ inventory_hostname }}"
+      when: not ssh_port_check is failed
+
+    - name: Report hosts where SSH was unreachable
+      debug:
+        msg: "Skipped {{ inventory_hostname }} (SSH not reachable)"
+      when: ssh_port_check is failed
--- a/ansible/automation/playbooks/alert_check.yml
+++ b/ansible/automation/playbooks/alert_check.yml
@@ -0,0 +1,418 @@
+---
+# Alert Check and Notification Playbook
+# Monitors system conditions and sends alerts when thresholds are exceeded
+# Usage: ansible-playbook playbooks/alert_check.yml
+# Usage: ansible-playbook playbooks/alert_check.yml -e "alert_mode=test"
+
+- name: Infrastructure Alert Monitoring
+  hosts: all
+  gather_facts: yes
+  vars:
+    alert_config_dir: "/tmp/alerts"
+    default_alert_mode: "production"  # production, test, silent
+
+    # Alert thresholds
+    thresholds:
+      cpu:
+        warning: 80
+        critical: 95
+      memory:
+        warning: 85
+        critical: 95
+      disk:
+        warning: 85
+        critical: 95
+      load:
+        warning: 4.0
+        critical: 8.0
+      container_down_critical: 1  # Number of containers down to trigger critical
+
+    # Notification settings
+    notifications:
+      ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"
+      email_enabled: "{{ email_enabled | default(false) }}"
+      slack_webhook: "{{ slack_webhook | default('') }}"
+
+  tasks:
+    - name: Create alert configuration directory
+      file:
+        path: "{{ alert_config_dir }}/{{ inventory_hostname }}"
+        state: directory
+        mode: '0755'
+
+    - name: Display alert monitoring plan
+      debug:
+        msg: |
+          🚨 ALERT MONITORING INITIATED
+          =============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔔 Mode: {{ alert_mode | default(default_alert_mode) }}
+          📊 CPU: {{ thresholds.cpu.warning }}%/{{ thresholds.cpu.critical }}%
+          💾 Memory: {{ thresholds.memory.warning }}%/{{ thresholds.memory.critical }}%
+          💿 Disk: {{ thresholds.disk.warning }}%/{{ thresholds.disk.critical }}%
+          ⚖️  Load: {{ thresholds.load.warning }}/{{ thresholds.load.critical }}
+
+    - name: Check CPU usage with alerting
+      shell: |
+        cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | awk -F'%' '{print $1}')
+        if [ -z "$cpu_usage" ]; then
+          cpu_usage=$(vmstat 1 2 | tail -1 | awk '{print 100-$15}')
+        fi
+
+        cpu_int=$(echo "$cpu_usage" | cut -d'.' -f1)
+
+        echo "🖥️  CPU Usage: ${cpu_usage}%"
+
+        if [ "$cpu_int" -gt "{{ thresholds.cpu.critical }}" ]; then
+          echo "CRITICAL:CPU:${cpu_usage}%"
+          exit 2
+        elif [ "$cpu_int" -gt "{{ thresholds.cpu.warning }}" ]; then
+          echo "WARNING:CPU:${cpu_usage}%"
+          exit 1
+        else
+          echo "OK:CPU:${cpu_usage}%"
+          exit 0
+        fi
+      register: cpu_alert
+      failed_when: false
+
+    - name: Check memory usage with alerting
+      shell: |
+        memory_usage=$(free | awk 'NR==2{printf "%.0f", $3*100/$2}')
+
+        echo "💾 Memory Usage: ${memory_usage}%"
+
+        if [ "$memory_usage" -gt "{{ thresholds.memory.critical }}" ]; then
+          echo "CRITICAL:MEMORY:${memory_usage}%"
+          exit 2
+        elif [ "$memory_usage" -gt "{{ thresholds.memory.warning }}" ]; then
+          echo "WARNING:MEMORY:${memory_usage}%"
+          exit 1
+        else
+          echo "OK:MEMORY:${memory_usage}%"
+          exit 0
+        fi
+      register: memory_alert
+      failed_when: false
+
+    - name: Check disk usage with alerting
+      shell: |
+        critical_disks=""
+        warning_disks=""
+
+        echo "💿 Disk Usage Check:"
+        df -h | awk 'NR>1 {print $5 " " $6}' | while read output; do
+          usage=$(echo $output | awk '{print $1}' | sed 's/%//')
+          partition=$(echo $output | awk '{print $2}')
+
+          echo "  $partition: ${usage}%"
+
+          if [ "$usage" -gt "{{ thresholds.disk.critical }}" ]; then
+            echo "CRITICAL:DISK:$partition:${usage}%"
+            echo "$partition:$usage" >> /tmp/critical_disks_$$
+          elif [ "$usage" -gt "{{ thresholds.disk.warning }}" ]; then
+            echo "WARNING:DISK:$partition:${usage}%"
+            echo "$partition:$usage" >> /tmp/warning_disks_$$
+          fi
+        done
+
+        if [ -f /tmp/critical_disks_$$ ]; then
+          echo "Critical disk alerts:"
+          cat /tmp/critical_disks_$$
+          rm -f /tmp/critical_disks_$$ /tmp/warning_disks_$$
+          exit 2
+        elif [ -f /tmp/warning_disks_$$ ]; then
+          echo "Disk warnings:"
+          cat /tmp/warning_disks_$$
+          rm -f /tmp/warning_disks_$$
+          exit 1
+        else
+          echo "OK:DISK:All partitions normal"
+          exit 0
+        fi
+      register: disk_alert
+      failed_when: false
+
+    - name: Check load average with alerting
+      shell: |
+        load_avg=$(uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//')
+
+        echo "⚖️  Load Average (1min): $load_avg"
+
+        # Use bc for floating point comparison if available, otherwise use awk
+        if command -v bc &> /dev/null; then
+          critical_check=$(echo "$load_avg > {{ thresholds.load.critical }}" | bc -l)
+          warning_check=$(echo "$load_avg > {{ thresholds.load.warning }}" | bc -l)
+        else
+          critical_check=$(awk "BEGIN {print ($load_avg > {{ thresholds.load.critical }})}")
+          warning_check=$(awk "BEGIN {print ($load_avg > {{ thresholds.load.warning }})}")
+        fi
+
+        if [ "$critical_check" = "1" ]; then
+          echo "CRITICAL:LOAD:${load_avg}"
+          exit 2
+        elif [ "$warning_check" = "1" ]; then
+          echo "WARNING:LOAD:${load_avg}"
+          exit 1
+        else
+          echo "OK:LOAD:${load_avg}"
+          exit 0
+        fi
+      register: load_alert
+      failed_when: false
+
+    - name: Check Docker container health
+      shell: |
+        if command -v docker &> /dev/null && docker info &> /dev/null; then
+          total_containers=$(docker ps -a -q | wc -l)
+          running_containers=$(docker ps -q | wc -l)
+          unhealthy_containers=$(docker ps --filter health=unhealthy -q | wc -l)
+          stopped_containers=$((total_containers - running_containers))
+
+          echo "🐳 Docker Container Status:"
+          echo "  Total: $total_containers"
+          echo "  Running: $running_containers"
+          echo "  Stopped: $stopped_containers"
+          echo "  Unhealthy: $unhealthy_containers"
+
+          if [ "$unhealthy_containers" -gt "0" ] || [ "$stopped_containers" -gt "{{ thresholds.container_down_critical }}" ]; then
+            echo "CRITICAL:DOCKER:$stopped_containers stopped, $unhealthy_containers unhealthy"
+            exit 2
+          elif [ "$stopped_containers" -gt "0" ]; then
+            echo "WARNING:DOCKER:$stopped_containers containers stopped"
+            exit 1
+          else
+            echo "OK:DOCKER:All containers healthy"
+            exit 0
+          fi
+        else
+          echo "ℹ️  Docker not available - skipping container checks"
+          echo "OK:DOCKER:Not installed"
+          exit 0
+        fi
+      register: docker_alert
+      failed_when: false
+
+    - name: Check critical services
+      shell: |
+        critical_services=("ssh" "systemd-resolved")
+        failed_services=""
+
+        echo "🔧 Critical Services Check:"
+
+        for service in "${critical_services[@]}"; do
+          if systemctl is-active --quiet "$service" 2>/dev/null; then
+            echo "  ✅ $service: running"
+          else
+            echo "  🚨 $service: not running"
+            failed_services="$failed_services $service"
+          fi
+        done
+
+        if [ -n "$failed_services" ]; then
+          echo "CRITICAL:SERVICES:$failed_services"
+          exit 2
+        else
+          echo "OK:SERVICES:All critical services running"
+          exit 0
+        fi
+      register: services_alert
+      failed_when: false
+
+    - name: Check network connectivity
+      shell: |
+        echo "🌐 Network Connectivity Check:"
+
+        # Check internet connectivity
+        if ping -c 1 -W 5 8.8.8.8 &> /dev/null; then
+          echo "  ✅ Internet: OK"
+          internet_status="OK"
+        else
+          echo "  🚨 Internet: FAILED"
+          internet_status="FAILED"
+        fi
+
+        # Check DNS resolution
+        if nslookup google.com &> /dev/null; then
+          echo "  ✅ DNS: OK"
+          dns_status="OK"
+        else
+          echo "  ⚠️  DNS: FAILED"
+          dns_status="FAILED"
+        fi
+
+        if [ "$internet_status" = "FAILED" ]; then
+          echo "CRITICAL:NETWORK:No internet connectivity"
+          exit 2
+        elif [ "$dns_status" = "FAILED" ]; then
+          echo "WARNING:NETWORK:DNS resolution issues"
+          exit 1
+        else
+          echo "OK:NETWORK:All connectivity normal"
+          exit 0
+        fi
+      register: network_alert
+      failed_when: false
+
+    - name: Evaluate overall alert status
+      set_fact:
+        alert_summary:
+          critical_count: >-
+            {{
+              [cpu_alert, memory_alert, disk_alert, load_alert, docker_alert, services_alert, network_alert]
+              | selectattr('rc', 'defined')
+              | selectattr('rc', 'equalto', 2)
+              | list
+              | length
+            }}
+          warning_count: >-
+            {{
+              [cpu_alert, memory_alert, disk_alert, load_alert, docker_alert, services_alert, network_alert]
+              | selectattr('rc', 'defined')
+              | selectattr('rc', 'equalto', 1)
+              | list
+              | length
+            }}
+          overall_status: >-
+            {{
+              'CRITICAL' if (
+                [cpu_alert, memory_alert, disk_alert, load_alert, docker_alert, services_alert, network_alert]
+                | selectattr('rc', 'defined')
+                | selectattr('rc', 'equalto', 2)
+                | list
+                | length > 0
+              ) else 'WARNING' if (
+                [cpu_alert, memory_alert, disk_alert, load_alert, docker_alert, services_alert, network_alert]
+                | selectattr('rc', 'defined')
+                | selectattr('rc', 'equalto', 1)
+                | list
+                | length > 0
+              ) else 'OK'
+            }}
+
+    - name: Generate alert report
+      shell: |
+        alert_file="{{ alert_config_dir }}/{{ inventory_hostname }}/alert_report_{{ ansible_date_time.epoch }}.txt"
+
+        echo "🚨 INFRASTRUCTURE ALERT REPORT" > "$alert_file"
+        echo "===============================" >> "$alert_file"
+        echo "Host: {{ inventory_hostname }}" >> "$alert_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$alert_file"
+        echo "Overall Status: {{ alert_summary.overall_status }}" >> "$alert_file"
+        echo "Critical Alerts: {{ alert_summary.critical_count }}" >> "$alert_file"
+        echo "Warning Alerts: {{ alert_summary.warning_count }}" >> "$alert_file"
+        echo "" >> "$alert_file"
+
+        echo "📊 DETAILED RESULTS:" >> "$alert_file"
+        echo "===================" >> "$alert_file"
+        {% for check in ['cpu_alert', 'memory_alert', 'disk_alert', 'load_alert', 'docker_alert', 'services_alert', 'network_alert'] %}
+        echo "" >> "$alert_file"
+        echo "{{ check | upper | replace('_ALERT', '') }}:" >> "$alert_file"
+        echo "{{ hostvars[inventory_hostname][check].stdout | default('No output') }}" >> "$alert_file"
+        {% endfor %}
+
+        echo "Alert report saved to: $alert_file"
+      register: alert_report
+
+    - name: Send NTFY notification for critical alerts
+      uri:
+        url: "{{ notifications.ntfy_url }}"
+        method: POST
+        body: |
+          🚨 CRITICAL ALERT: {{ inventory_hostname }}
+
+          Status: {{ alert_summary.overall_status }}
+          Critical: {{ alert_summary.critical_count }}
+          Warnings: {{ alert_summary.warning_count }}
+
+          Time: {{ ansible_date_time.iso8601 }}
+        headers:
+          Title: "Homelab Critical Alert"
+          Priority: "urgent"
+          Tags: "warning,critical,{{ inventory_hostname }}"
+      when:
+        - alert_summary.overall_status == "CRITICAL"
+        - alert_mode | default(default_alert_mode) != "silent"
+        - notifications.ntfy_url != ""
+      ignore_errors: yes
+
+    - name: Send NTFY notification for warning alerts
+      uri:
+        url: "{{ notifications.ntfy_url }}"
+        method: POST
+        body: |
+          ⚠️ WARNING: {{ inventory_hostname }}
+
+          Status: {{ alert_summary.overall_status }}
+          Warnings: {{ alert_summary.warning_count }}
+
+          Time: {{ ansible_date_time.iso8601 }}
+        headers:
+          Title: "Homelab Warning"
+          Priority: "default"
+          Tags: "warning,{{ inventory_hostname }}"
+      when:
+        - alert_summary.overall_status == "WARNING"
+        - alert_mode | default(default_alert_mode) != "silent"
+        - notifications.ntfy_url != ""
+      ignore_errors: yes
+
+    - name: Send test notification
+      uri:
+        url: "{{ notifications.ntfy_url }}"
+        method: POST
+        body: |
+          🧪 TEST ALERT: {{ inventory_hostname }}
+
+          This is a test notification from the alert monitoring system.
+
+          Status: {{ alert_summary.overall_status }}
+          Time: {{ ansible_date_time.iso8601 }}
+        headers:
+          Title: "Homelab Alert Test"
+          Priority: "low"
+          Tags: "test,{{ inventory_hostname }}"
+      when:
+        - alert_mode | default(default_alert_mode) == "test"
+        - notifications.ntfy_url != ""
+      ignore_errors: yes
+
+    - name: Display alert summary
+      debug:
+        msg: |
+
+          🚨 ALERT MONITORING COMPLETE
+          ============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔔 Mode: {{ alert_mode | default(default_alert_mode) }}
+
+          📊 ALERT SUMMARY:
+          Overall Status: {{ alert_summary.overall_status }}
+          Critical Alerts: {{ alert_summary.critical_count }}
+          Warning Alerts: {{ alert_summary.warning_count }}
+
+          📋 CHECK RESULTS:
+          {% for check in ['cpu_alert', 'memory_alert', 'disk_alert', 'load_alert', 'docker_alert', 'services_alert', 'network_alert'] %}
+          {{ check | replace('_alert', '') | upper }}: {{ 'CRITICAL' if hostvars[inventory_hostname][check].rc | default(0) == 2 else 'WARNING' if hostvars[inventory_hostname][check].rc | default(0) == 1 else 'OK' }}
+          {% endfor %}
+
+          {{ alert_report.stdout }}
+
+          🔍 Next Steps:
+          {% if alert_summary.overall_status == "CRITICAL" %}
+          - 🚨 IMMEDIATE ACTION REQUIRED
+          - Review critical alerts above
+          - Check system resources and services
+          {% elif alert_summary.overall_status == "WARNING" %}
+          - ⚠️  Monitor system closely
+          - Consider preventive maintenance
+          {% else %}
+          - ✅ System is healthy
+          - Continue regular monitoring
+          {% endif %}
+          - Schedule regular checks: crontab -e
+          - View full report: cat {{ alert_config_dir }}/{{ inventory_hostname }}/alert_report_*.txt
+
+          ============================
--- a/ansible/automation/playbooks/ansible_status_check.yml
+++ b/ansible/automation/playbooks/ansible_status_check.yml
@@ -0,0 +1,127 @@
+---
+# Check Ansible status across all reachable hosts
+# Simple status check and upgrade where possible
+# Created: February 8, 2026
+
+- name: Check Ansible status on all reachable hosts
+  hosts: homelab,pi-5,vish-concord-nuc,pve
+  gather_facts: yes
+  become: yes
+  ignore_errors: yes
+
+  tasks:
+    - name: Display host information
+      debug:
+        msg: |
+          === {{ inventory_hostname | upper }} ===
+          IP: {{ ansible_host }}
+          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          Architecture: {{ ansible_architecture }}
+
+    - name: Check if Ansible is installed
+      command: ansible --version
+      register: ansible_check
+      changed_when: false
+      failed_when: false
+
+    - name: Display Ansible status
+      debug:
+        msg: |
+          Ansible on {{ inventory_hostname }}:
+          {% if ansible_check.rc == 0 %}
+          ✅ INSTALLED: {{ ansible_check.stdout_lines[0] }}
+          {% else %}
+          ❌ NOT INSTALLED
+          {% endif %}
+
+    - name: Check if apt is available (Debian/Ubuntu only)
+      stat:
+        path: /usr/bin/apt
+      register: has_apt
+
+    - name: Try to install/upgrade Ansible (Debian/Ubuntu only)
+      block:
+        - name: Update package cache (ignore GPG errors)
+          apt:
+            update_cache: yes
+            cache_valid_time: 0
+          register: apt_update
+          failed_when: false
+
+        - name: Install/upgrade Ansible
+          apt:
+            name: ansible
+            state: latest
+          register: ansible_install
+          when: apt_update is not failed
+
+        - name: Display installation result
+          debug:
+            msg: |
+              Ansible installation on {{ inventory_hostname }}:
+              {% if ansible_install is succeeded %}
+                {% if ansible_install.changed %}
+              ✅ {{ 'INSTALLED' if ansible_check.rc != 0 else 'UPGRADED' }} successfully
+                {% else %}
+              ℹ️  Already at latest version
+                {% endif %}
+              {% elif apt_update is failed %}
+              ⚠️  APT update failed - using cached packages
+              {% else %}
+              ❌ Installation failed
+              {% endif %}
+
+      when: has_apt.stat.exists
+      rescue:
+        - name: Installation failed
+          debug:
+            msg: "❌ Failed to install/upgrade Ansible on {{ inventory_hostname }}"
+
+    - name: Final Ansible version check
+      command: ansible --version
+      register: final_ansible_check
+      changed_when: false
+      failed_when: false
+
+    - name: Final status summary
+      debug:
+        msg: |
+          === FINAL STATUS: {{ inventory_hostname | upper }} ===
+          {% if final_ansible_check.rc == 0 %}
+          ✅ Ansible: {{ final_ansible_check.stdout_lines[0] }}
+          {% else %}
+          ❌ Ansible: Not available
+          {% endif %}
+          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          APT Available: {{ '✅ Yes' if has_apt.stat.exists else '❌ No' }}
+
+- name: Summary Report
+  hosts: localhost
+  gather_facts: no
+  run_once: true
+
+  tasks:
+    - name: Display overall summary
+      debug:
+        msg: |
+
+          ========================================
+          ANSIBLE UPDATE SUMMARY - {{ ansible_date_time.date }}
+          ========================================
+
+          Processed hosts:
+          - homelab (100.67.40.126)
+          - pi-5 (100.77.151.40)
+          - vish-concord-nuc (100.72.55.21)
+          - pve (100.87.12.28)
+
+          Excluded hosts:
+          - Synology devices (atlantis, calypso, setillo) - Use DSM package manager
+          - homeassistant - Uses Home Assistant OS package management
+          - truenas-scale - Uses TrueNAS package management
+          - pi-5-kevin - Currently unreachable
+
+          ✅ homelab: Already has Ansible 2.16.3 (latest)
+          📋 Check individual host results above for details
+
+          ========================================
--- a/ansible/automation/playbooks/backup_configs.yml
+++ b/ansible/automation/playbooks/backup_configs.yml
@@ -0,0 +1,342 @@
+---
+# Configuration Backup Playbook
+# Backup docker-compose files, configs, and important data
+# Usage: ansible-playbook playbooks/backup_configs.yml
+# Usage: ansible-playbook playbooks/backup_configs.yml --limit atlantis
+# Usage: ansible-playbook playbooks/backup_configs.yml -e "include_secrets=true"
+
+- name: Backup Configurations and Important Data
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    backup_base_dir: "/volume1/backups/configs"  # Synology path
+    backup_local_dir: "/tmp/config_backups"
+
+
+
+    # Configuration paths to backup per host
+    config_paths:
+      atlantis:
+        - path: "/volume1/docker"
+          name: "docker_configs"
+          exclude: ["*/cache/*", "*/logs/*", "*/tmp/*"]
+        - path: "/volume1/homes"
+          name: "user_configs"
+          exclude: ["*/Downloads/*", "*/Trash/*"]
+        - path: "/etc/ssh"
+          name: "ssh_config"
+          exclude: ["ssh_host_*_key"]
+      calypso:
+        - path: "/volume1/docker"
+          name: "docker_configs"
+          exclude: ["*/cache/*", "*/logs/*", "*/tmp/*"]
+        - path: "/etc/ssh"
+          name: "ssh_config"
+          exclude: ["ssh_host_*_key"]
+      homelab_vm:
+        - path: "/opt/docker"
+          name: "docker_configs"
+          exclude: ["*/cache/*", "*/logs/*", "*/tmp/*"]
+        - path: "/etc/nginx"
+          name: "nginx_config"
+          exclude: []
+        - path: "/etc/ssh"
+          name: "ssh_config"
+          exclude: ["ssh_host_*_key"]
+      concord_nuc:
+        - path: "/opt/docker"
+          name: "docker_configs"
+          exclude: ["*/cache/*", "*/logs/*", "*/tmp/*"]
+        - path: "/etc/ssh"
+          name: "ssh_config"
+          exclude: ["ssh_host_*_key"]
+
+    # Important service data directories
+    service_data:
+      atlantis:
+        - service: "immich"
+          paths: ["/volume1/docker/immich/config"]
+        - service: "vaultwarden"
+          paths: ["/volume1/docker/vaultwarden/data"]
+        - service: "plex"
+          paths: ["/volume1/docker/plex/config"]
+      calypso:
+        - service: "authentik"
+          paths: ["/volume1/docker/authentik/config"]
+        - service: "paperless"
+          paths: ["/volume1/docker/paperless/config"]
+
+  tasks:
+    - name: Create backup directories
+      file:
+        path: "{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "{{ backup_base_dir }}/{{ inventory_hostname }}"
+        - "{{ backup_local_dir }}/{{ inventory_hostname }}"
+      ignore_errors: yes
+
+    - name: Get current config paths for this host
+      set_fact:
+        current_configs: "{{ config_paths.get(inventory_hostname, []) }}"
+        current_service_data: "{{ service_data.get(inventory_hostname, []) }}"
+
+    - name: Display backup plan
+      debug:
+        msg: |
+          📊 CONFIGURATION BACKUP PLAN
+          =============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          📁 Config Paths: {{ current_configs | length }}
+          {% for config in current_configs %}
+          - {{ config.name }}: {{ config.path }}
+          {% endfor %}
+          🔧 Service Data: {{ current_service_data | length }}
+          {% for service in current_service_data %}
+          - {{ service.service }}
+          {% endfor %}
+          🔐 Include Secrets: {{ include_secrets | default(false) }}
+          🗜️  Compression: {{ compress_backups | default(true) }}
+
+    - name: Create system info snapshot
+      shell: |
+        info_file="{{ backup_local_dir }}/{{ inventory_hostname }}/system_info_{{ ansible_date_time.epoch }}.txt"
+
+        echo "📊 SYSTEM INFORMATION SNAPSHOT" > "$info_file"
+        echo "===============================" >> "$info_file"
+        echo "Host: {{ inventory_hostname }}" >> "$info_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$info_file"
+        echo "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}" >> "$info_file"
+        echo "Kernel: {{ ansible_kernel }}" >> "$info_file"
+        echo "Uptime: {{ ansible_uptime_seconds | int // 86400 }} days" >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "🐳 DOCKER INFO:" >> "$info_file"
+        docker --version >> "$info_file" 2>/dev/null || echo "Docker not available" >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "📦 RUNNING CONTAINERS:" >> "$info_file"
+        docker ps --format "table {{ '{{' }}.Names{{ '}}' }}\t{{ '{{' }}.Image{{ '}}' }}\t{{ '{{' }}.Status{{ '}}' }}" >> "$info_file" 2>/dev/null || echo "Cannot access Docker" >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "💾 DISK USAGE:" >> "$info_file"
+        df -h >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "🔧 INSTALLED PACKAGES (last 20):" >> "$info_file"
+        if command -v dpkg &> /dev/null; then
+          dpkg -l | tail -20 >> "$info_file"
+        elif command -v rpm &> /dev/null; then
+          rpm -qa | tail -20 >> "$info_file"
+        fi
+
+    - name: Backup configuration directories
+      shell: |
+        config_name="{{ item.name }}"
+        source_path="{{ item.path }}"
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/${config_name}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.tar"
+
+        if [ -d "$source_path" ]; then
+          echo "🔄 Backing up $config_name from $source_path..."
+
+          # Build exclude options
+          exclude_opts=""
+          {% for exclude in item.exclude %}
+          exclude_opts="$exclude_opts --exclude='{{ exclude }}'"
+          {% endfor %}
+
+          {% if not (include_secrets | default(false)) %}
+          # Add common secret file exclusions
+          exclude_opts="$exclude_opts --exclude='*.key' --exclude='*.pem' --exclude='*.p12' --exclude='*password*' --exclude='*secret*' --exclude='*.env'"
+          {% endif %}
+
+          # Create tar backup
+          eval "tar -cf '$backup_file' -C '$(dirname $source_path)' $exclude_opts '$(basename $source_path)'"
+
+          if [ $? -eq 0 ]; then
+            echo "✅ $config_name backup successful"
+
+            {% if compress_backups | default(true) %}
+            gzip "$backup_file"
+            backup_file="${backup_file}.gz"
+            {% endif %}
+
+            backup_size=$(du -h "$backup_file" | cut -f1)
+            echo "📦 Backup size: $backup_size"
+
+            # Copy to permanent storage
+            if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+              cp "$backup_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+              echo "📁 Copied to permanent storage"
+            fi
+          else
+            echo "❌ $config_name backup failed"
+          fi
+        else
+          echo "⚠️  $source_path does not exist, skipping $config_name"
+        fi
+      register: config_backups
+      loop: "{{ current_configs }}"
+
+    - name: Backup service-specific data
+      shell: |
+        service_name="{{ item.service }}"
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/service_${service_name}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.tar"
+
+        echo "🔄 Backing up $service_name service data..."
+
+        # Create temporary file list
+        temp_list="/tmp/service_${service_name}_files.txt"
+        > "$temp_list"
+
+        {% for path in item.paths %}
+        if [ -d "{{ path }}" ]; then
+          echo "{{ path }}" >> "$temp_list"
+        fi
+        {% endfor %}
+
+        if [ -s "$temp_list" ]; then
+          tar -cf "$backup_file" -T "$temp_list" {% if not (include_secrets | default(false)) %}--exclude='*.key' --exclude='*.pem' --exclude='*password*' --exclude='*secret*'{% endif %}
+
+          if [ $? -eq 0 ]; then
+            echo "✅ $service_name service data backup successful"
+
+            {% if compress_backups | default(true) %}
+            gzip "$backup_file"
+            backup_file="${backup_file}.gz"
+            {% endif %}
+
+            backup_size=$(du -h "$backup_file" | cut -f1)
+            echo "📦 Backup size: $backup_size"
+
+            if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+              cp "$backup_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+            fi
+          else
+            echo "❌ $service_name service data backup failed"
+          fi
+        else
+          echo "⚠️  No valid paths found for $service_name"
+        fi
+
+        rm -f "$temp_list"
+      register: service_backups
+      loop: "{{ current_service_data }}"
+
+    - name: Backup docker-compose files
+      shell: |
+        compose_backup="{{ backup_local_dir }}/{{ inventory_hostname }}/docker_compose_files_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.tar"
+
+        echo "🔄 Backing up docker-compose files..."
+
+        # Find all docker-compose files
+        find /volume1 /opt /home -name "docker-compose.yml" -o -name "docker-compose.yaml" -o -name "*.yml" -path "*/docker/*" 2>/dev/null > /tmp/compose_files.txt
+
+        if [ -s /tmp/compose_files.txt ]; then
+          tar -cf "$compose_backup" -T /tmp/compose_files.txt
+
+          if [ $? -eq 0 ]; then
+            echo "✅ Docker-compose files backup successful"
+
+            {% if compress_backups | default(true) %}
+            gzip "$compose_backup"
+            compose_backup="${compose_backup}.gz"
+            {% endif %}
+
+            backup_size=$(du -h "$compose_backup" | cut -f1)
+            echo "📦 Backup size: $backup_size"
+
+            if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+              cp "$compose_backup" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+            fi
+          else
+            echo "❌ Docker-compose files backup failed"
+          fi
+        else
+          echo "⚠️  No docker-compose files found"
+        fi
+
+        rm -f /tmp/compose_files.txt
+      register: compose_backup
+
+    - name: Create backup inventory
+      shell: |
+        inventory_file="{{ backup_local_dir }}/{{ inventory_hostname }}/backup_inventory_{{ ansible_date_time.date }}.txt"
+
+        echo "📋 BACKUP INVENTORY" > "$inventory_file"
+        echo "===================" >> "$inventory_file"
+        echo "Host: {{ inventory_hostname }}" >> "$inventory_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$inventory_file"
+        echo "Include Secrets: {{ include_secrets | default(false) }}" >> "$inventory_file"
+        echo "Compression: {{ compress_backups | default(true) }}" >> "$inventory_file"
+        echo "" >> "$inventory_file"
+
+        echo "📁 BACKUP FILES:" >> "$inventory_file"
+        ls -la {{ backup_local_dir }}/{{ inventory_hostname }}/ >> "$inventory_file"
+
+        echo "" >> "$inventory_file"
+        echo "📊 BACKUP SIZES:" >> "$inventory_file"
+        du -h {{ backup_local_dir }}/{{ inventory_hostname }}/* >> "$inventory_file"
+
+        echo "" >> "$inventory_file"
+        echo "🔍 BACKUP CONTENTS:" >> "$inventory_file"
+        {% for config in current_configs %}
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/{{ config.name }}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.tar{% if compress_backups | default(true) %}.gz{% endif %}"
+        if [ -f "$backup_file" ]; then
+          echo "=== {{ config.name }} ===" >> "$inventory_file"
+          {% if compress_backups | default(true) %}
+          tar -tzf "$backup_file" | head -20 >> "$inventory_file" 2>/dev/null || echo "Cannot list contents" >> "$inventory_file"
+          {% else %}
+          tar -tf "$backup_file" | head -20 >> "$inventory_file" 2>/dev/null || echo "Cannot list contents" >> "$inventory_file"
+          {% endif %}
+          echo "" >> "$inventory_file"
+        fi
+        {% endfor %}
+
+        # Copy inventory to permanent storage
+        if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+          cp "$inventory_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+        fi
+
+        cat "$inventory_file"
+      register: backup_inventory
+
+    - name: Clean up old backups
+      shell: |
+        echo "🧹 Cleaning up backups older than {{ backup_retention_days | default(30) }} days..."
+
+        # Clean local backups
+        find {{ backup_local_dir }}/{{ inventory_hostname }} -name "*.tar*" -mtime +{{ backup_retention_days | default(30) }} -delete
+        find {{ backup_local_dir }}/{{ inventory_hostname }} -name "*.txt" -mtime +{{ backup_retention_days | default(30) }} -delete
+
+        # Clean permanent storage backups
+        if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+          find {{ backup_base_dir }}/{{ inventory_hostname }} -name "*.tar*" -mtime +{{ backup_retention_days | default(30) }} -delete
+          find {{ backup_base_dir }}/{{ inventory_hostname }} -name "*.txt" -mtime +{{ backup_retention_days | default(30) }} -delete
+        fi
+
+        echo "✅ Cleanup complete"
+      when: (backup_retention_days | default(30) | int) > 0
+
+    - name: Display backup summary
+      debug:
+        msg: |
+
+          ✅ CONFIGURATION BACKUP COMPLETE
+          ================================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          📁 Config Paths: {{ current_configs | length }}
+          🔧 Service Data: {{ current_service_data | length }}
+          🔐 Secrets Included: {{ include_secrets | default(false) }}
+
+          {{ backup_inventory.stdout }}
+
+          🔍 Next Steps:
+          - Verify backups: ls -la {{ backup_local_dir }}/{{ inventory_hostname }}
+          - Test restore: tar -tf backup_file.tar.gz
+          - Schedule regular backups via cron
+
+          ================================
--- a/ansible/automation/playbooks/backup_databases.yml
+++ b/ansible/automation/playbooks/backup_databases.yml
@@ -0,0 +1,284 @@
+---
+# Database Backup Playbook
+# Automated backup of all PostgreSQL and MySQL databases across homelab
+# Usage: ansible-playbook playbooks/backup_databases.yml
+# Usage: ansible-playbook playbooks/backup_databases.yml --limit atlantis
+# Usage: ansible-playbook playbooks/backup_databases.yml -e "backup_type=full"
+
+- name: Backup All Databases
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+
+    backup_base_dir: "/volume1/backups/databases"  # Synology path
+    backup_local_dir: "/tmp/database_backups"
+
+    # Database service mapping
+    database_services:
+      atlantis:
+        - name: "immich-db"
+          type: "postgresql"
+          database: "immich"
+          container: "immich-db"
+          user: "postgres"
+        - name: "vaultwarden-db"
+          type: "postgresql"
+          database: "vaultwarden"
+          container: "vaultwarden-db"
+          user: "postgres"
+        - name: "joplin-db"
+          type: "postgresql"
+          database: "joplin"
+          container: "joplin-stack-db"
+          user: "postgres"
+        - name: "firefly-db"
+          type: "postgresql"
+          database: "firefly"
+          container: "firefly-db"
+          user: "firefly"
+      calypso:
+        - name: "authentik-db"
+          type: "postgresql"
+          database: "authentik"
+          container: "authentik-db"
+          user: "postgres"
+        - name: "paperless-db"
+          type: "postgresql"
+          database: "paperless"
+          container: "paperless-db"
+          user: "paperless"
+      homelab_vm:
+        - name: "mastodon-db"
+          type: "postgresql"
+          database: "mastodon"
+          container: "mastodon-db"
+          user: "postgres"
+        - name: "matrix-db"
+          type: "postgresql"
+          database: "synapse"
+          container: "synapse-db"
+          user: "postgres"
+
+  tasks:
+    - name: Check if Docker is running
+      systemd:
+        name: docker
+      register: docker_status
+      failed_when: docker_status.status.ActiveState != "active"
+
+    - name: Create backup directories
+      file:
+        path: "{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "{{ backup_base_dir }}/{{ inventory_hostname }}"
+        - "{{ backup_local_dir }}/{{ inventory_hostname }}"
+      ignore_errors: yes
+
+    - name: Get current database services for this host
+      set_fact:
+        current_databases: "{{ database_services.get(inventory_hostname, []) }}"
+
+    - name: Display backup plan
+      debug:
+        msg: |
+          📊 DATABASE BACKUP PLAN
+          =======================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔄 Type: {{ backup_type | default('incremental') }}
+          📦 Databases: {{ current_databases | length }}
+          {% for db in current_databases %}
+          - {{ db.name }} ({{ db.type }})
+          {% endfor %}
+          📁 Backup Dir: {{ backup_base_dir }}/{{ inventory_hostname }}
+          🗜️  Compression: {{ compress_backups | default(true) }}
+
+    - name: Check database containers are running
+      shell: docker ps --filter "name={{ item.container }}" --format "{{.Names}}"
+      register: container_check
+      loop: "{{ current_databases }}"
+      changed_when: false
+
+    - name: Create pre-backup container status
+      shell: |
+        echo "=== PRE-BACKUP STATUS ===" > {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        echo "Host: {{ inventory_hostname }}" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        echo "Type: {{ backup_type | default('incremental') }}" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        echo "" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+
+        {% for db in current_databases %}
+        echo "=== {{ db.name }} ===" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        docker ps --filter "name={{ db.container }}" --format "Status: {% raw %}{{.Status}}{% endraw %}" >> {{ backup_local_dir }}/{{ inventory_hostname }}/backup_status_{{ ansible_date_time.epoch }}.log
+        {% endfor %}
+
+    - name: Backup PostgreSQL databases
+      shell: |
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/{{ item.name }}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.sql"
+
+        echo "🔄 Backing up {{ item.name }}..."
+        docker exec {{ item.container }} pg_dump -U {{ item.user }} {{ item.database }} > "$backup_file"
+
+        if [ $? -eq 0 ]; then
+          echo "✅ {{ item.name }} backup successful"
+          {% if compress_backups | default(true) %}
+          gzip "$backup_file"
+          backup_file="${backup_file}.gz"
+          {% endif %}
+
+          # Get backup size
+          backup_size=$(du -h "$backup_file" | cut -f1)
+          echo "📦 Backup size: $backup_size"
+
+          # Copy to permanent storage if available
+          if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+            cp "$backup_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+            echo "📁 Copied to permanent storage"
+          fi
+        else
+          echo "❌ {{ item.name }} backup failed"
+          exit 1
+        fi
+      register: postgres_backups
+      loop: "{{ current_databases }}"
+      when:
+        - item.type == "postgresql"
+        - item.container in (container_check.results | selectattr('stdout', 'equalto', item.container) | map(attribute='stdout') | list)
+
+    - name: Backup MySQL databases
+      shell: |
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/{{ item.name }}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.sql"
+
+        echo "🔄 Backing up {{ item.name }}..."
+        docker exec {{ item.container }} mysqldump -u {{ item.user }} -p{{ item.password | default('') }} {{ item.database }} > "$backup_file"
+
+        if [ $? -eq 0 ]; then
+          echo "✅ {{ item.name }} backup successful"
+          {% if compress_backups | default(true) %}
+          gzip "$backup_file"
+          backup_file="${backup_file}.gz"
+          {% endif %}
+
+          backup_size=$(du -h "$backup_file" | cut -f1)
+          echo "📦 Backup size: $backup_size"
+
+          if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+            cp "$backup_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+            echo "📁 Copied to permanent storage"
+          fi
+        else
+          echo "❌ {{ item.name }} backup failed"
+          exit 1
+        fi
+      register: mysql_backups
+      loop: "{{ current_databases }}"
+      when:
+        - item.type == "mysql"
+        - item.container in (container_check.results | selectattr('stdout', 'equalto', item.container) | map(attribute='stdout') | list)
+      no_log: true  # Hide passwords
+
+    - name: Verify backup integrity
+      shell: |
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/{{ item.name }}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.sql{% if compress_backups | default(true) %}.gz{% endif %}"
+
+        if [ -f "$backup_file" ]; then
+          {% if compress_backups | default(true) %}
+          # Test gzip integrity
+          gzip -t "$backup_file"
+          if [ $? -eq 0 ]; then
+            echo "✅ {{ item.name }} backup integrity verified"
+          else
+            echo "❌ {{ item.name }} backup corrupted"
+            exit 1
+          fi
+          {% else %}
+          # Check if file is not empty and contains SQL
+          if [ -s "$backup_file" ] && head -1 "$backup_file" | grep -q "SQL\|PostgreSQL\|MySQL"; then
+            echo "✅ {{ item.name }} backup integrity verified"
+          else
+            echo "❌ {{ item.name }} backup appears invalid"
+            exit 1
+          fi
+          {% endif %}
+        else
+          echo "❌ {{ item.name }} backup file not found"
+          exit 1
+        fi
+      register: backup_verification
+      loop: "{{ current_databases }}"
+      when:
+        - verify_backups | default(true) | bool
+        - item.container in (container_check.results | selectattr('stdout', 'equalto', item.container) | map(attribute='stdout') | list)
+
+    - name: Clean up old backups
+      shell: |
+        echo "🧹 Cleaning up backups older than {{ backup_retention_days | default(30) }} days..."
+
+        # Clean local backups
+        find {{ backup_local_dir }}/{{ inventory_hostname }} -name "*.sql*" -mtime +{{ backup_retention_days | default(30) }} -delete
+
+        # Clean permanent storage backups
+        if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+          find {{ backup_base_dir }}/{{ inventory_hostname }} -name "*.sql*" -mtime +{{ backup_retention_days | default(30) }} -delete
+        fi
+
+        echo "✅ Cleanup complete"
+      when: backup_retention_days | default(30) | int > 0
+
+    - name: Generate backup report
+      shell: |
+        report_file="{{ backup_local_dir }}/{{ inventory_hostname }}/backup_report_{{ ansible_date_time.date }}.txt"
+
+        echo "📊 DATABASE BACKUP REPORT" > "$report_file"
+        echo "=========================" >> "$report_file"
+        echo "Host: {{ inventory_hostname }}" >> "$report_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$report_file"
+        echo "Type: {{ backup_type | default('incremental') }}" >> "$report_file"
+        echo "Retention: {{ backup_retention_days | default(30) }} days" >> "$report_file"
+        echo "" >> "$report_file"
+
+        echo "📦 BACKUP RESULTS:" >> "$report_file"
+        {% for db in current_databases %}
+        backup_file="{{ backup_local_dir }}/{{ inventory_hostname }}/{{ db.name }}_{{ ansible_date_time.date }}_{{ ansible_date_time.hour }}{{ ansible_date_time.minute }}.sql{% if compress_backups | default(true) %}.gz{% endif %}"
+        if [ -f "$backup_file" ]; then
+          size=$(du -h "$backup_file" | cut -f1)
+          echo "✅ {{ db.name }}: $size" >> "$report_file"
+        else
+          echo "❌ {{ db.name }}: FAILED" >> "$report_file"
+        fi
+        {% endfor %}
+
+        echo "" >> "$report_file"
+        echo "📁 BACKUP LOCATIONS:" >> "$report_file"
+        echo "Local: {{ backup_local_dir }}/{{ inventory_hostname }}" >> "$report_file"
+        echo "Permanent: {{ backup_base_dir }}/{{ inventory_hostname }}" >> "$report_file"
+
+        # Copy report to permanent storage
+        if [ -d "{{ backup_base_dir }}/{{ inventory_hostname }}" ]; then
+          cp "$report_file" "{{ backup_base_dir }}/{{ inventory_hostname }}/"
+        fi
+
+        cat "$report_file"
+      register: backup_report
+
+    - name: Display backup summary
+      debug:
+        msg: |
+
+          ✅ DATABASE BACKUP COMPLETE
+          ===========================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          📦 Databases: {{ current_databases | length }}
+          🔄 Type: {{ backup_type | default('incremental') }}
+
+          {{ backup_report.stdout }}
+
+          🔍 Next Steps:
+          - Verify backups: ls -la {{ backup_local_dir }}/{{ inventory_hostname }}
+          - Test restore: ansible-playbook playbooks/restore_from_backup.yml
+          - Schedule regular backups via cron
+
+          ===========================
--- a/ansible/automation/playbooks/backup_verification.yml
+++ b/ansible/automation/playbooks/backup_verification.yml
@@ -0,0 +1,431 @@
+---
+- name: Backup Verification and Testing
+  hosts: all
+  gather_facts: yes
+  vars:
+    verification_timestamp: "{{ ansible_date_time.iso8601 }}"
+    verification_report_dir: "/tmp/backup_verification"
+    backup_base_dir: "/opt/backups"
+    test_restore_dir: "/tmp/restore_test"
+    max_backup_age_days: 7
+
+  tasks:
+    - name: Create verification directories
+      file:
+        path: "{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "{{ verification_report_dir }}"
+        - "{{ test_restore_dir }}"
+      delegate_to: localhost
+      run_once: true
+
+    - name: Discover backup locations
+      shell: |
+        echo "=== BACKUP LOCATION DISCOVERY ==="
+
+        # Common backup directories
+        backup_dirs="/opt/backups /home/backups /var/backups /volume1/backups /mnt/backups"
+
+        echo "Searching for backup directories:"
+        for dir in $backup_dirs; do
+          if [ -d "$dir" ]; then
+            echo "✅ Found: $dir"
+            ls -la "$dir" 2>/dev/null | head -5
+            echo ""
+          fi
+        done
+
+        # Look for backup files in common locations
+        echo "Searching for backup files:"
+        find /opt /home /var -name "*.sql" -o -name "*.dump" -o -name "*.tar.gz" -o -name "*.zip" -o -name "*backup*" 2>/dev/null | head -20 | while read backup_file; do
+          if [ -f "$backup_file" ]; then
+            size=$(du -h "$backup_file" 2>/dev/null | cut -f1)
+            date=$(stat -c %y "$backup_file" 2>/dev/null | cut -d' ' -f1)
+            echo "📁 $backup_file ($size, $date)"
+          fi
+        done
+      register: backup_discovery
+      changed_when: false
+
+    - name: Analyze backup integrity
+      shell: |
+        echo "=== BACKUP INTEGRITY ANALYSIS ==="
+
+        # Check for recent backups
+        echo "Recent backup files (last {{ max_backup_age_days }} days):"
+        find /opt /home /var -name "*backup*" -o -name "*.sql" -o -name "*.dump" -mtime -{{ max_backup_age_days }} 2>/dev/null | while read backup_file; do
+          if [ -f "$backup_file" ]; then
+            size=$(du -h "$backup_file" 2>/dev/null | cut -f1)
+            date=$(stat -c %y "$backup_file" 2>/dev/null | cut -d' ' -f1)
+
+            # Basic integrity checks
+            integrity_status="✅ OK"
+
+            # Check if file is empty
+            if [ ! -s "$backup_file" ]; then
+              integrity_status="❌ EMPTY"
+            fi
+
+            # Check file extension and try basic validation
+            case "$backup_file" in
+              *.sql)
+                if ! head -1 "$backup_file" 2>/dev/null | grep -q "SQL\|CREATE\|INSERT\|--"; then
+                  integrity_status="⚠️  SUSPICIOUS"
+                fi
+                ;;
+              *.tar.gz)
+                if ! tar -tzf "$backup_file" >/dev/null 2>&1; then
+                  integrity_status="❌ CORRUPT"
+                fi
+                ;;
+              *.zip)
+                if command -v unzip >/dev/null 2>&1; then
+                  if ! unzip -t "$backup_file" >/dev/null 2>&1; then
+                    integrity_status="❌ CORRUPT"
+                  fi
+                fi
+                ;;
+            esac
+
+            echo "$integrity_status $backup_file ($size, $date)"
+          fi
+        done
+        echo ""
+
+        # Check for old backups
+        echo "Old backup files (older than {{ max_backup_age_days }} days):"
+        old_backups=$(find /opt /home /var -name "*backup*" -o -name "*.sql" -o -name "*.dump" -mtime +{{ max_backup_age_days }} 2>/dev/null | wc -l)
+        echo "Found $old_backups old backup files"
+
+        if [ "$old_backups" -gt "0" ]; then
+          echo "Oldest 5 backup files:"
+          find /opt /home /var -name "*backup*" -o -name "*.sql" -o -name "*.dump" -mtime +{{ max_backup_age_days }} 2>/dev/null | head -5 | while read old_file; do
+            date=$(stat -c %y "$old_file" 2>/dev/null | cut -d' ' -f1)
+            size=$(du -h "$old_file" 2>/dev/null | cut -f1)
+            echo "  $old_file ($size, $date)"
+          done
+        fi
+      register: integrity_analysis
+      changed_when: false
+
+    - name: Test database backup restoration
+      shell: |
+        echo "=== DATABASE BACKUP RESTORATION TEST ==="
+
+        # Find recent database backups
+        db_backups=$(find /opt /home /var -name "*.sql" -o -name "*.dump" -mtime -{{ max_backup_age_days }} 2>/dev/null | head -5)
+
+        if [ -z "$db_backups" ]; then
+          echo "No recent database backups found for testing"
+          exit 0
+        fi
+
+        echo "Testing database backup restoration:"
+
+        for backup_file in $db_backups; do
+          echo "Testing: $backup_file"
+
+          # Determine database type from filename or content
+          db_type="unknown"
+          if echo "$backup_file" | grep -qi "postgres\|postgresql"; then
+            db_type="postgresql"
+          elif echo "$backup_file" | grep -qi "mysql\|mariadb"; then
+            db_type="mysql"
+          elif head -5 "$backup_file" 2>/dev/null | grep -qi "postgresql"; then
+            db_type="postgresql"
+          elif head -5 "$backup_file" 2>/dev/null | grep -qi "mysql"; then
+            db_type="mysql"
+          fi
+
+          echo "  Detected type: $db_type"
+
+          # Basic syntax validation
+          case "$db_type" in
+            "postgresql")
+              if command -v psql >/dev/null 2>&1; then
+                # Test PostgreSQL backup syntax
+                if psql --set ON_ERROR_STOP=1 -f "$backup_file" -d template1 --dry-run 2>/dev/null; then
+                  echo "  ✅ PostgreSQL syntax valid"
+                else
+                  echo "  ⚠️  PostgreSQL syntax check failed (may require specific database)"
+                fi
+              else
+                echo "  ⚠️  PostgreSQL client not available for testing"
+              fi
+              ;;
+            "mysql")
+              if command -v mysql >/dev/null 2>&1; then
+                # Test MySQL backup syntax
+                if mysql --execute="source $backup_file" --force --dry-run 2>/dev/null; then
+                  echo "  ✅ MySQL syntax valid"
+                else
+                  echo "  ⚠️  MySQL syntax check failed (may require specific database)"
+                fi
+              else
+                echo "  ⚠️  MySQL client not available for testing"
+              fi
+              ;;
+            *)
+              # Generic SQL validation
+              if grep -q "CREATE\|INSERT\|UPDATE" "$backup_file" 2>/dev/null; then
+                echo "  ✅ Contains SQL statements"
+              else
+                echo "  ❌ No SQL statements found"
+              fi
+              ;;
+          esac
+
+          echo ""
+        done
+      register: db_restore_test
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Test file backup restoration
+      shell: |
+        echo "=== FILE BACKUP RESTORATION TEST ==="
+
+        # Find recent archive backups
+        archive_backups=$(find /opt /home /var -name "*.tar.gz" -o -name "*.zip" -mtime -{{ max_backup_age_days }} 2>/dev/null | head -3)
+
+        if [ -z "$archive_backups" ]; then
+          echo "No recent archive backups found for testing"
+          exit 0
+        fi
+
+        echo "Testing file backup restoration:"
+
+        for backup_file in $archive_backups; do
+          echo "Testing: $backup_file"
+
+          # Create test extraction directory
+          test_dir="{{ test_restore_dir }}/$(basename "$backup_file" | sed 's/\.[^.]*$//')_test"
+          mkdir -p "$test_dir"
+
+          case "$backup_file" in
+            *.tar.gz)
+              if tar -tzf "$backup_file" >/dev/null 2>&1; then
+                echo "  ✅ Archive is readable"
+
+                # Test partial extraction
+                if tar -xzf "$backup_file" -C "$test_dir" --strip-components=1 2>/dev/null | head -5; then
+                  extracted_files=$(find "$test_dir" -type f 2>/dev/null | wc -l)
+                  echo "  ✅ Extracted $extracted_files files successfully"
+                else
+                  echo "  ❌ Extraction failed"
+                fi
+              else
+                echo "  ❌ Archive is corrupted or unreadable"
+              fi
+              ;;
+            *.zip)
+              if command -v unzip >/dev/null 2>&1; then
+                if unzip -t "$backup_file" >/dev/null 2>&1; then
+                  echo "  ✅ ZIP archive is valid"
+
+                  # Test partial extraction
+                  if unzip -q "$backup_file" -d "$test_dir" 2>/dev/null; then
+                    extracted_files=$(find "$test_dir" -type f 2>/dev/null | wc -l)
+                    echo "  ✅ Extracted $extracted_files files successfully"
+                  else
+                    echo "  ❌ Extraction failed"
+                  fi
+                else
+                  echo "  ❌ ZIP archive is corrupted"
+                fi
+              else
+                echo "  ⚠️  unzip command not available"
+              fi
+              ;;
+          esac
+
+          # Cleanup test directory
+          rm -rf "$test_dir" 2>/dev/null
+          echo ""
+        done
+      register: file_restore_test
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check backup automation status
+      shell: |
+        echo "=== BACKUP AUTOMATION STATUS ==="
+
+        # Check for cron jobs related to backups
+        echo "Cron jobs (backup-related):"
+        if command -v crontab >/dev/null 2>&1; then
+          crontab -l 2>/dev/null | grep -i backup || echo "No backup cron jobs found"
+        else
+          echo "Crontab not available"
+        fi
+        echo ""
+
+        # Check systemd timers
+        if command -v systemctl >/dev/null 2>&1; then
+          echo "Systemd timers (backup-related):"
+          systemctl list-timers --no-pager 2>/dev/null | grep -i backup || echo "No backup timers found"
+          echo ""
+        fi
+
+        # Check for Docker containers that might be doing backups
+        if command -v docker >/dev/null 2>&1; then
+          echo "Docker containers (backup-related):"
+          docker ps --format "{{.Names}}\t{{.Image}}" 2>/dev/null | grep -i backup || echo "No backup containers found"
+          echo ""
+        fi
+
+        # Check for backup scripts
+        echo "Backup scripts:"
+        find /opt /home /usr/local -name "*backup*" -type f -executable 2>/dev/null | head -10 | while read script; do
+          echo "  $script"
+        done
+      register: automation_status
+      changed_when: false
+
+    - name: Generate backup health score
+      shell: |
+        echo "=== BACKUP HEALTH SCORE ==="
+
+        score=100
+        issues=0
+
+        # Check for recent backups
+        recent_backups=$(find /opt /home /var -name "*backup*" -o -name "*.sql" -o -name "*.dump" -mtime -{{ max_backup_age_days }} 2>/dev/null | wc -l)
+        if [ "$recent_backups" -eq "0" ]; then
+          echo "❌ No recent backups found (-30 points)"
+          score=$((score - 30))
+          issues=$((issues + 1))
+        elif [ "$recent_backups" -lt "3" ]; then
+          echo "⚠️  Few recent backups found (-10 points)"
+          score=$((score - 10))
+          issues=$((issues + 1))
+        else
+          echo "✅ Recent backups found (+0 points)"
+        fi
+
+        # Check for automation
+        cron_backups=$(crontab -l 2>/dev/null | grep -i backup | wc -l)
+        if [ "$cron_backups" -eq "0" ]; then
+          echo "⚠️  No automated backup jobs found (-20 points)"
+          score=$((score - 20))
+          issues=$((issues + 1))
+        else
+          echo "✅ Automated backup jobs found (+0 points)"
+        fi
+
+        # Check for old backups (retention policy)
+        old_backups=$(find /opt /home /var -name "*backup*" -mtime +30 2>/dev/null | wc -l)
+        if [ "$old_backups" -gt "10" ]; then
+          echo "⚠️  Many old backups found - consider cleanup (-5 points)"
+          score=$((score - 5))
+          issues=$((issues + 1))
+        else
+          echo "✅ Backup retention appears managed (+0 points)"
+        fi
+
+        # Determine health status
+        if [ "$score" -ge "90" ]; then
+          health_status="EXCELLENT"
+        elif [ "$score" -ge "70" ]; then
+          health_status="GOOD"
+        elif [ "$score" -ge "50" ]; then
+          health_status="FAIR"
+        else
+          health_status="POOR"
+        fi
+
+        echo ""
+        echo "BACKUP HEALTH SCORE: $score/100 ($health_status)"
+        echo "ISSUES FOUND: $issues"
+      register: health_score
+      changed_when: false
+
+    - name: Create verification report
+      set_fact:
+        verification_report:
+          timestamp: "{{ verification_timestamp }}"
+          hostname: "{{ inventory_hostname }}"
+          backup_discovery: "{{ backup_discovery.stdout }}"
+          integrity_analysis: "{{ integrity_analysis.stdout }}"
+          db_restore_test: "{{ db_restore_test.stdout }}"
+          file_restore_test: "{{ file_restore_test.stdout }}"
+          automation_status: "{{ automation_status.stdout }}"
+          health_score: "{{ health_score.stdout }}"
+
+    - name: Display verification report
+      debug:
+        msg: |
+
+          ==========================================
+          🔍 BACKUP VERIFICATION - {{ inventory_hostname }}
+          ==========================================
+
+          📁 BACKUP DISCOVERY:
+          {{ verification_report.backup_discovery }}
+
+          🔒 INTEGRITY ANALYSIS:
+          {{ verification_report.integrity_analysis }}
+
+          🗄️  DATABASE RESTORE TEST:
+          {{ verification_report.db_restore_test }}
+
+          📦 FILE RESTORE TEST:
+          {{ verification_report.file_restore_test }}
+
+          🤖 AUTOMATION STATUS:
+          {{ verification_report.automation_status }}
+
+          📊 HEALTH SCORE:
+          {{ verification_report.health_score }}
+
+          ==========================================
+
+    - name: Generate JSON verification report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ verification_report.timestamp }}",
+            "hostname": "{{ verification_report.hostname }}",
+            "backup_discovery": {{ verification_report.backup_discovery | to_json }},
+            "integrity_analysis": {{ verification_report.integrity_analysis | to_json }},
+            "db_restore_test": {{ verification_report.db_restore_test | to_json }},
+            "file_restore_test": {{ verification_report.file_restore_test | to_json }},
+            "automation_status": {{ verification_report.automation_status | to_json }},
+            "health_score": {{ verification_report.health_score | to_json }},
+            "recommendations": [
+              {% if 'No recent backups found' in verification_report.integrity_analysis %}
+              "Implement regular backup procedures",
+              {% endif %}
+              {% if 'No backup cron jobs found' in verification_report.automation_status %}
+              "Set up automated backup scheduling",
+              {% endif %}
+              {% if 'CORRUPT' in verification_report.integrity_analysis %}
+              "Investigate and fix corrupted backup files",
+              {% endif %}
+              {% if 'old backup files' in verification_report.integrity_analysis %}
+              "Implement backup retention policy",
+              {% endif %}
+              "Regular backup verification testing recommended"
+            ]
+          }
+        dest: "{{ verification_report_dir }}/{{ inventory_hostname }}_backup_verification_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Cleanup test files
+      file:
+        path: "{{ test_restore_dir }}"
+        state: absent
+      ignore_errors: yes
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          🔍 Backup verification complete for {{ inventory_hostname }}
+          📄 Report saved to: {{ verification_report_dir }}/{{ inventory_hostname }}_backup_verification_{{ ansible_date_time.epoch }}.json
+
+          💡 Regular backup verification ensures data recovery capability
+          💡 Test restore procedures periodically to validate backup integrity
+          💡 Monitor backup automation to ensure continuous protection
--- a/ansible/automation/playbooks/certificate_renewal.yml
+++ b/ansible/automation/playbooks/certificate_renewal.yml
@@ -0,0 +1,377 @@
+---
+# SSL Certificate Management and Renewal Playbook
+# Manage Let's Encrypt certificates and other SSL certificates
+# Usage: ansible-playbook playbooks/certificate_renewal.yml
+# Usage: ansible-playbook playbooks/certificate_renewal.yml -e "force_renewal=true"
+# Usage: ansible-playbook playbooks/certificate_renewal.yml -e "check_only=true"
+
+- name: SSL Certificate Management and Renewal
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    force_renewal: "{{ force_renewal | default(false) }}"
+    check_only: "{{ check_only | default(false) }}"
+    renewal_threshold_days: "{{ renewal_threshold_days | default(30) }}"
+    backup_certificates: "{{ backup_certificates | default(true) }}"
+    restart_services: "{{ restart_services | default(true) }}"
+
+    # Certificate locations and services
+    certificate_configs:
+      atlantis:
+        - name: "nginx-proxy-manager"
+          cert_path: "/volume1/docker/nginx-proxy-manager/data/letsencrypt"
+          domains: ["*.vish.gg", "vish.gg"]
+          service: "nginx-proxy-manager"
+          renewal_method: "npm"  # Nginx Proxy Manager handles this
+        - name: "synology-dsm"
+          cert_path: "/usr/syno/etc/certificate"
+          domains: ["atlantis.vish.local"]
+          service: "nginx"
+          renewal_method: "synology"
+      calypso:
+        - name: "nginx-proxy-manager"
+          cert_path: "/volume1/docker/nginx-proxy-manager/data/letsencrypt"
+          domains: ["*.calypso.local"]
+          service: "nginx-proxy-manager"
+          renewal_method: "npm"
+      homelab_vm:
+        - name: "nginx"
+          cert_path: "/etc/letsencrypt"
+          domains: ["homelab.vish.gg"]
+          service: "nginx"
+          renewal_method: "certbot"
+        - name: "traefik"
+          cert_path: "/opt/docker/traefik/certs"
+          domains: ["*.homelab.vish.gg"]
+          service: "traefik"
+          renewal_method: "traefik"
+
+  tasks:
+    - name: Create certificate report directory
+      file:
+        path: "/tmp/certificate_reports/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+
+    - name: Get current certificate configurations for this host
+      set_fact:
+        current_certificates: "{{ certificate_configs.get(inventory_hostname, []) }}"
+
+    - name: Display certificate management plan
+      debug:
+        msg: |
+          🔒 CERTIFICATE MANAGEMENT PLAN
+          ==============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Check Only: {{ check_only }}
+          🔄 Force Renewal: {{ force_renewal }}
+          📅 Renewal Threshold: {{ renewal_threshold_days }} days
+          💾 Backup Certificates: {{ backup_certificates }}
+
+          📋 Certificates to manage: {{ current_certificates | length }}
+          {% for cert in current_certificates %}
+          - {{ cert.name }}: {{ cert.domains | join(', ') }}
+          {% endfor %}
+
+    - name: Check certificate expiration dates
+      shell: |
+        cert_info_file="/tmp/certificate_reports/{{ ansible_date_time.date }}/{{ inventory_hostname }}_cert_info.txt"
+
+        echo "🔒 CERTIFICATE STATUS REPORT - {{ inventory_hostname }}" > "$cert_info_file"
+        echo "=================================================" >> "$cert_info_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$cert_info_file"
+        echo "Renewal Threshold: {{ renewal_threshold_days }} days" >> "$cert_info_file"
+        echo "" >> "$cert_info_file"
+
+        {% for cert in current_certificates %}
+        echo "=== {{ cert.name }} ===" >> "$cert_info_file"
+        echo "Domains: {{ cert.domains | join(', ') }}" >> "$cert_info_file"
+        echo "Method: {{ cert.renewal_method }}" >> "$cert_info_file"
+
+        # Check certificate expiration for each domain
+        {% for domain in cert.domains %}
+        echo "Checking {{ domain }}..." >> "$cert_info_file"
+
+        # Try different methods to check certificate
+        if command -v openssl &> /dev/null; then
+          # Method 1: Check via SSL connection (if accessible)
+          cert_info=$(echo | timeout 10 openssl s_client -servername {{ domain }} -connect {{ domain }}:443 2>/dev/null | openssl x509 -noout -dates 2>/dev/null)
+          if [ $? -eq 0 ]; then
+            echo "  SSL Connection: ✅" >> "$cert_info_file"
+            echo "  $cert_info" >> "$cert_info_file"
+
+            # Calculate days until expiration
+            not_after=$(echo "$cert_info" | grep notAfter | cut -d= -f2)
+            if [ -n "$not_after" ]; then
+              exp_date=$(date -d "$not_after" +%s 2>/dev/null || echo "0")
+              current_date=$(date +%s)
+              days_left=$(( (exp_date - current_date) / 86400 ))
+              echo "  Days until expiration: $days_left" >> "$cert_info_file"
+
+              if [ $days_left -lt {{ renewal_threshold_days }} ]; then
+                echo "  Status: ⚠️  RENEWAL NEEDED" >> "$cert_info_file"
+              else
+                echo "  Status: ✅ Valid" >> "$cert_info_file"
+              fi
+            fi
+          else
+            echo "  SSL Connection: ❌ Failed" >> "$cert_info_file"
+          fi
+
+          # Method 2: Check local certificate files
+          {% if cert.cert_path %}
+          if [ -d "{{ cert.cert_path }}" ]; then
+            echo "  Local cert path: {{ cert.cert_path }}" >> "$cert_info_file"
+
+            # Find certificate files
+            cert_files=$(find {{ cert.cert_path }} -name "*.crt" -o -name "*.pem" -o -name "fullchain.pem" 2>/dev/null | head -5)
+            if [ -n "$cert_files" ]; then
+              echo "  Certificate files found:" >> "$cert_info_file"
+              for cert_file in $cert_files; do
+                echo "    $cert_file" >> "$cert_info_file"
+                if openssl x509 -in "$cert_file" -noout -dates 2>/dev/null; then
+                  local_cert_info=$(openssl x509 -in "$cert_file" -noout -dates 2>/dev/null)
+                  echo "    $local_cert_info" >> "$cert_info_file"
+                fi
+              done
+            else
+              echo "  No certificate files found in {{ cert.cert_path }}" >> "$cert_info_file"
+            fi
+          else
+            echo "  Certificate path {{ cert.cert_path }} not found" >> "$cert_info_file"
+          fi
+          {% endif %}
+        else
+          echo "  OpenSSL not available" >> "$cert_info_file"
+        fi
+
+        echo "" >> "$cert_info_file"
+        {% endfor %}
+        echo "" >> "$cert_info_file"
+        {% endfor %}
+
+        cat "$cert_info_file"
+      register: certificate_status
+      changed_when: false
+
+    - name: Backup existing certificates
+      shell: |
+        backup_dir="/tmp/certificate_backups/{{ ansible_date_time.epoch }}"
+        mkdir -p "$backup_dir"
+
+        echo "Creating certificate backup..."
+
+        {% for cert in current_certificates %}
+        {% if cert.cert_path %}
+        if [ -d "{{ cert.cert_path }}" ]; then
+          echo "Backing up {{ cert.name }}..."
+          tar -czf "$backup_dir/{{ cert.name }}_backup.tar.gz" -C "$(dirname {{ cert.cert_path }})" "$(basename {{ cert.cert_path }})" 2>/dev/null || echo "Backup failed for {{ cert.name }}"
+        fi
+        {% endif %}
+        {% endfor %}
+
+        echo "✅ Certificate backup created at $backup_dir"
+        ls -la "$backup_dir"
+      register: certificate_backup
+      when:
+        - backup_certificates | bool
+        - not check_only | bool
+
+    - name: Renew certificates via Certbot
+      shell: |
+        echo "🔄 Renewing certificates via Certbot..."
+
+        {% if force_renewal %}
+        certbot renew --force-renewal --quiet
+        {% else %}
+        certbot renew --quiet
+        {% endif %}
+
+        if [ $? -eq 0 ]; then
+          echo "✅ Certbot renewal successful"
+        else
+          echo "❌ Certbot renewal failed"
+          exit 1
+        fi
+      register: certbot_renewal
+      when:
+        - not check_only | bool
+        - current_certificates | selectattr('renewal_method', 'equalto', 'certbot') | list | length > 0
+      ignore_errors: yes
+
+    - name: Check Nginx Proxy Manager certificates
+      shell: |
+        echo "🔍 Checking Nginx Proxy Manager certificates..."
+
+        {% for cert in current_certificates %}
+        {% if cert.renewal_method == 'npm' %}
+        if [ -d "{{ cert.cert_path }}" ]; then
+          echo "NPM certificate path exists: {{ cert.cert_path }}"
+
+          # NPM manages certificates automatically, just check status
+          find {{ cert.cert_path }} -name "*.pem" -mtime -1 | head -5 | while read cert_file; do
+            echo "Recent certificate: $cert_file"
+          done
+        else
+          echo "NPM certificate path not found: {{ cert.cert_path }}"
+        fi
+        {% endif %}
+        {% endfor %}
+      register: npm_certificate_check
+      when: current_certificates | selectattr('renewal_method', 'equalto', 'npm') | list | length > 0
+      changed_when: false
+
+    - name: Restart services after certificate renewal
+      ansible.builtin.command: "docker restart {{ item.service }}"
+      loop: "{{ current_certificates | selectattr('service', 'defined') | list }}"
+      when:
+        - restart_services | bool
+        - item.service is defined
+      register: service_restart_result
+      failed_when: false
+      changed_when: service_restart_result.rc == 0
+        - not check_only | bool
+        - (certbot_renewal.changed | default(false)) or (force_renewal | bool)
+
+    - name: Verify certificate renewal
+      shell: |
+        echo "🔍 Verifying certificate renewal..."
+
+        verification_results=()
+
+        {% for cert in current_certificates %}
+        {% for domain in cert.domains %}
+        echo "Verifying {{ domain }}..."
+
+        if command -v openssl &> /dev/null; then
+          # Check certificate via SSL connection
+          cert_info=$(echo | timeout 10 openssl s_client -servername {{ domain }} -connect {{ domain }}:443 2>/dev/null | openssl x509 -noout -dates 2>/dev/null)
+          if [ $? -eq 0 ]; then
+            not_after=$(echo "$cert_info" | grep notAfter | cut -d= -f2)
+            if [ -n "$not_after" ]; then
+              exp_date=$(date -d "$not_after" +%s 2>/dev/null || echo "0")
+              current_date=$(date +%s)
+              days_left=$(( (exp_date - current_date) / 86400 ))
+
+              if [ $days_left -gt {{ renewal_threshold_days }} ]; then
+                echo "✅ {{ domain }}: $days_left days remaining"
+                verification_results+=("{{ domain }}:OK:$days_left")
+              else
+                echo "⚠️  {{ domain }}: Only $days_left days remaining"
+                verification_results+=("{{ domain }}:WARNING:$days_left")
+              fi
+            else
+              echo "❌ {{ domain }}: Cannot parse expiration date"
+              verification_results+=("{{ domain }}:ERROR:unknown")
+            fi
+          else
+            echo "❌ {{ domain }}: SSL connection failed"
+            verification_results+=("{{ domain }}:ERROR:connection_failed")
+          fi
+        else
+          echo "⚠️  Cannot verify {{ domain }}: OpenSSL not available"
+          verification_results+=("{{ domain }}:SKIP:no_openssl")
+        fi
+        {% endfor %}
+        {% endfor %}
+
+        echo ""
+        echo "📊 VERIFICATION SUMMARY:"
+        for result in "${verification_results[@]}"; do
+          echo "$result"
+        done
+      register: certificate_verification
+      changed_when: false
+
+    - name: Generate certificate management report
+      copy:
+        content: |
+          🔒 CERTIFICATE MANAGEMENT REPORT - {{ inventory_hostname }}
+          ======================================================
+
+          📅 Management Date: {{ ansible_date_time.iso8601 }}
+          🖥️  Host: {{ inventory_hostname }}
+          🔍 Check Only: {{ check_only }}
+          🔄 Force Renewal: {{ force_renewal }}
+          📅 Renewal Threshold: {{ renewal_threshold_days }} days
+          💾 Backup Created: {{ backup_certificates }}
+
+          📋 CERTIFICATES MANAGED: {{ current_certificates | length }}
+          {% for cert in current_certificates %}
+          - {{ cert.name }}: {{ cert.domains | join(', ') }} ({{ cert.renewal_method }})
+          {% endfor %}
+
+          📊 CERTIFICATE STATUS:
+          {{ certificate_status.stdout }}
+
+          {% if not check_only %}
+          🔄 RENEWAL ACTIONS:
+          {% if certbot_renewal is defined %}
+          Certbot Renewal: {{ 'Success' if certbot_renewal.rc == 0 else 'Failed' }}
+          {% endif %}
+
+          {% if service_restart_result is defined %}
+          Service Restarts:
+          {{ service_restart_result.stdout }}
+          {% endif %}
+
+          {% if backup_certificates %}
+          💾 BACKUP INFO:
+          {{ certificate_backup.stdout }}
+          {% endif %}
+          {% endif %}
+
+          🔍 VERIFICATION RESULTS:
+          {{ certificate_verification.stdout }}
+
+          💡 RECOMMENDATIONS:
+          - Schedule regular certificate checks via cron
+          - Monitor certificate expiration alerts
+          - Test certificate renewal in staging environment
+          - Keep certificate backups in secure location
+          {% if current_certificates | selectattr('renewal_method', 'equalto', 'npm') | list | length > 0 %}
+          - Nginx Proxy Manager handles automatic renewal
+          {% endif %}
+
+          ✅ CERTIFICATE MANAGEMENT COMPLETE
+
+        dest: "/tmp/certificate_reports/{{ ansible_date_time.date }}/{{ inventory_hostname }}_cert_report.txt"
+      delegate_to: localhost
+
+    - name: Display certificate management summary
+      debug:
+        msg: |
+
+          ✅ CERTIFICATE MANAGEMENT COMPLETE - {{ inventory_hostname }}
+          ====================================================
+
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Mode: {{ 'Check Only' if check_only else 'Full Management' }}
+          📋 Certificates: {{ current_certificates | length }}
+
+          {{ certificate_verification.stdout }}
+
+          📄 Full report: /tmp/certificate_reports/{{ ansible_date_time.date }}/{{ inventory_hostname }}_cert_report.txt
+
+          🔍 Next Steps:
+          {% if check_only %}
+          - Run without check_only to perform renewals
+          {% endif %}
+          - Schedule regular certificate monitoring
+          - Set up expiration alerts
+          - Test certificate functionality
+
+          ====================================================
+
+    - name: Send certificate alerts (if configured)
+      debug:
+        msg: |
+          📧 CERTIFICATE ALERT
+          Host: {{ inventory_hostname }}
+          Certificates expiring soon detected!
+          Check the full report for details.
+      when:
+        - send_alerts | default(false) | bool
+        - "'WARNING' in certificate_verification.stdout"
--- a/ansible/automation/playbooks/check_apt_proxy.yml
+++ b/ansible/automation/playbooks/check_apt_proxy.yml
@@ -0,0 +1,193 @@
+---
+- name: Check APT Proxy Configuration on Debian/Ubuntu hosts
+  hosts: debian_clients
+  become: no
+  gather_facts: yes
+
+  vars:
+    expected_proxy_host: 100.103.48.78  # calypso
+    expected_proxy_port: 3142
+    apt_proxy_file: /etc/apt/apt.conf.d/01proxy
+    expected_proxy_url: "http://{{ expected_proxy_host }}:{{ expected_proxy_port }}/"
+
+  tasks:
+    # ---------- System Detection ----------
+    - name: Detect OS family
+      ansible.builtin.debug:
+        msg: "Host {{ inventory_hostname }} is running {{ ansible_os_family }} {{ ansible_distribution }} {{ ansible_distribution_version }}"
+
+    - name: Skip non-Debian systems
+      ansible.builtin.meta: end_host
+      when: ansible_os_family != "Debian"
+
+    # ---------- APT Proxy Configuration Check ----------
+    - name: Check if APT proxy config file exists
+      ansible.builtin.stat:
+        path: "{{ apt_proxy_file }}"
+      register: proxy_file_stat
+
+    - name: Read APT proxy configuration (if exists)
+      ansible.builtin.slurp:
+        src: "{{ apt_proxy_file }}"
+      register: proxy_config_content
+      when: proxy_file_stat.stat.exists
+      failed_when: false
+
+    - name: Parse proxy configuration
+      ansible.builtin.set_fact:
+        proxy_config_decoded: "{{ proxy_config_content.content | b64decode }}"
+      when: proxy_file_stat.stat.exists and proxy_config_content is defined
+
+    # ---------- Network Connectivity Test ----------
+    - name: Test connectivity to expected proxy server
+      ansible.builtin.uri:
+        url: "http://{{ expected_proxy_host }}:{{ expected_proxy_port }}/"
+        method: HEAD
+        timeout: 10
+      register: proxy_connectivity
+      failed_when: false
+      changed_when: false
+
+    # ---------- APT Configuration Analysis ----------
+    - name: Check current APT proxy settings via apt-config
+      ansible.builtin.command: apt-config dump Acquire::http::Proxy
+      register: apt_config_proxy
+      changed_when: false
+      failed_when: false
+      become: yes
+
+    - name: Test APT update with current configuration (dry-run)
+      ansible.builtin.command: apt-get update --print-uris --dry-run
+      register: apt_update_test
+      changed_when: false
+      failed_when: false
+      become: yes
+
+    # ---------- Analysis and Reporting ----------
+    - name: Analyze proxy configuration status
+      ansible.builtin.set_fact:
+        proxy_status:
+          file_exists: "{{ proxy_file_stat.stat.exists }}"
+          file_content: "{{ proxy_config_decoded | default('N/A') }}"
+          expected_config: "Acquire::http::Proxy \"{{ expected_proxy_url }}\";"
+          proxy_reachable: "{{ proxy_connectivity.status is defined and (proxy_connectivity.status == 200 or proxy_connectivity.status == 406) }}"
+          apt_config_output: "{{ apt_config_proxy.stdout | default('N/A') }}"
+          using_expected_proxy: "{{ (proxy_config_decoded | default('')) is search(expected_proxy_host) }}"
+
+    # ---------- Health Assertions ----------
+    - name: Assert APT proxy is properly configured
+      ansible.builtin.assert:
+        that:
+          - proxy_status.file_exists
+          - proxy_status.using_expected_proxy
+          - proxy_status.proxy_reachable
+        success_msg: "✅ {{ inventory_hostname }} is correctly using APT proxy {{ expected_proxy_host }}:{{ expected_proxy_port }}"
+        fail_msg: "❌ {{ inventory_hostname }} APT proxy configuration issues detected"
+      failed_when: false
+      register: proxy_assertion
+
+    # ---------- Detailed Summary ----------
+    - name: Display comprehensive proxy status
+      ansible.builtin.debug:
+        msg: |
+
+          🔍 APT Proxy Status for {{ inventory_hostname }}:
+          ================================================
+          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+
+          📁 Configuration File:
+            Path: {{ apt_proxy_file }}
+            Exists: {{ proxy_status.file_exists }}
+            Content: {{ proxy_status.file_content | regex_replace('\n', ' ') }}
+
+          🎯 Expected Configuration:
+            {{ proxy_status.expected_config }}
+
+          🌐 Network Connectivity:
+            Proxy Server: {{ expected_proxy_host }}:{{ expected_proxy_port }}
+            Reachable: {{ proxy_status.proxy_reachable }}
+            Response: {{ proxy_connectivity.status | default('N/A') }}
+
+          ⚙️  Current APT Config:
+            {{ proxy_status.apt_config_output }}
+
+          ✅ Status: {{ 'CONFIGURED' if proxy_status.using_expected_proxy else 'NOT CONFIGURED' }}
+          🔗 Connectivity: {{ 'OK' if proxy_status.proxy_reachable else 'FAILED' }}
+
+          {% if not proxy_assertion.failed %}
+          🎉 Result: APT proxy is working correctly!
+          {% else %}
+          ⚠️  Result: APT proxy needs attention
+          {% endif %}
+
+    # ---------- Recommendations ----------
+    - name: Provide configuration recommendations
+      ansible.builtin.debug:
+        msg: |
+
+          💡 Recommendations for {{ inventory_hostname }}:
+          {% if not proxy_status.file_exists %}
+          - Create APT proxy config: echo 'Acquire::http::Proxy "{{ expected_proxy_url }}";' | sudo tee {{ apt_proxy_file }}
+          {% endif %}
+          {% if not proxy_status.proxy_reachable %}
+          - Check network connectivity to {{ expected_proxy_host }}:{{ expected_proxy_port }}
+          - Verify calypso apt-cacher-ng service is running
+          {% endif %}
+          {% if proxy_status.file_exists and not proxy_status.using_expected_proxy %}
+          - Update proxy configuration to use {{ expected_proxy_url }}
+          {% endif %}
+      when: proxy_assertion.failed
+
+    # ---------- Summary Statistics ----------
+    - name: Record results for summary
+      ansible.builtin.set_fact:
+        host_proxy_result:
+          hostname: "{{ inventory_hostname }}"
+          configured: "{{ proxy_status.using_expected_proxy }}"
+          reachable: "{{ proxy_status.proxy_reachable }}"
+          status: "{{ 'OK' if (proxy_status.using_expected_proxy and proxy_status.proxy_reachable) else 'NEEDS_ATTENTION' }}"
+
+# ---------- Final Summary Report ----------
+- name: APT Proxy Summary Report
+  hosts: localhost
+  gather_facts: no
+  run_once: true
+
+  vars:
+    expected_proxy_host: 100.103.48.78  # calypso
+    expected_proxy_port: 3142
+
+  tasks:
+    - name: Collect all host results
+      ansible.builtin.set_fact:
+        all_results: "{{ groups['debian_clients'] | map('extract', hostvars) | selectattr('host_proxy_result', 'defined') | map(attribute='host_proxy_result') | list }}"
+      when: groups['debian_clients'] is defined
+
+    - name: Generate summary statistics
+      ansible.builtin.set_fact:
+        summary_stats:
+          total_hosts: "{{ all_results | length }}"
+          configured_hosts: "{{ all_results | selectattr('configured', 'equalto', true) | list | length }}"
+          reachable_hosts: "{{ all_results | selectattr('reachable', 'equalto', true) | list | length }}"
+          healthy_hosts: "{{ all_results | selectattr('status', 'equalto', 'OK') | list | length }}"
+      when: all_results is defined
+
+    - name: Display final summary
+      ansible.builtin.debug:
+        msg: |
+
+          📊 APT PROXY HEALTH SUMMARY
+          ===========================
+          Total Debian Clients: {{ summary_stats.total_hosts | default(0) }}
+          Properly Configured: {{ summary_stats.configured_hosts | default(0) }}
+          Proxy Reachable: {{ summary_stats.reachable_hosts | default(0) }}
+          Fully Healthy: {{ summary_stats.healthy_hosts | default(0) }}
+
+          🎯 Target Proxy: calypso ({{ expected_proxy_host }}:{{ expected_proxy_port }})
+
+          {% if summary_stats.healthy_hosts | default(0) == summary_stats.total_hosts | default(0) %}
+          🎉 ALL SYSTEMS OPTIMAL - APT proxy working perfectly across all clients!
+          {% else %}
+          ⚠️  Some systems need attention - check individual host reports above
+          {% endif %}
+      when: summary_stats is defined
--- a/ansible/automation/playbooks/cleanup.yml
+++ b/ansible/automation/playbooks/cleanup.yml
@@ -0,0 +1,26 @@
+---
+- name: Clean up unused packages and temporary files
+  hosts: all
+  become: true
+  tasks:
+    - name: Autoremove unused packages
+      apt:
+        autoremove: yes
+      when: ansible_os_family == "Debian"
+
+    - name: Clean apt cache
+      apt:
+        autoclean: yes
+      when: ansible_os_family == "Debian"
+
+    - name: Clear temporary files
+      file:
+        path: /tmp
+        state: absent
+      ignore_errors: true
+
+    - name: Recreate /tmp directory
+      file:
+        path: /tmp
+        state: directory
+        mode: '1777'
--- a/ansible/automation/playbooks/configure_apt_proxy.yml
+++ b/ansible/automation/playbooks/configure_apt_proxy.yml
@@ -0,0 +1,62 @@
+---
+- name: Configure APT Proxy on Debian/Ubuntu hosts
+  hosts: debian_clients
+  become: yes
+  gather_facts: yes
+
+  vars:
+    apt_proxy_host: 100.103.48.78
+    apt_proxy_port: 3142
+    apt_proxy_file: /etc/apt/apt.conf.d/01proxy
+
+  tasks:
+    - name: Verify OS compatibility
+      ansible.builtin.assert:
+        that:
+          - ansible_os_family == "Debian"
+        fail_msg: "Host {{ inventory_hostname }} is not Debian-based. Skipping."
+        success_msg: "Host {{ inventory_hostname }} is Debian-based."
+      tags: verify
+
+    - name: Create APT proxy configuration
+      ansible.builtin.copy:
+        dest: "{{ apt_proxy_file }}"
+        owner: root
+        group: root
+        mode: '0644'
+        content: |
+          Acquire::http::Proxy "http://{{ apt_proxy_host }}:{{ apt_proxy_port }}/";
+          Acquire::https::Proxy "false";
+      register: proxy_conf
+      tags: config
+
+    - name: Ensure APT cache directories exist
+      ansible.builtin.file:
+        path: /var/cache/apt/archives
+        state: directory
+        owner: root
+        group: root
+        mode: '0755'
+      tags: config
+
+    - name: Test APT proxy connection (dry-run)
+      ansible.builtin.command: >
+        apt-get update --print-uris -o Acquire::http::Proxy="http://{{ apt_proxy_host }}:{{ apt_proxy_port }}/"
+      register: apt_proxy_test
+      changed_when: false
+      failed_when: apt_proxy_test.rc != 0
+      tags: verify
+
+    - name: Display proxy test result
+      ansible.builtin.debug:
+        msg: |
+          ✅ {{ inventory_hostname }} is using APT proxy {{ apt_proxy_host }}:{{ apt_proxy_port }}
+          {{ apt_proxy_test.stdout | default('') }}
+      when: apt_proxy_test.rc == 0
+      tags: verify
+
+    - name: Display failure if APT proxy test failed
+      ansible.builtin.debug:
+        msg: "⚠️  {{ inventory_hostname }} failed to reach APT proxy at {{ apt_proxy_host }}:{{ apt_proxy_port }}"
+      when: apt_proxy_test.rc != 0
+      tags: verify
--- a/ansible/automation/playbooks/configure_docker_logging.yml
+++ b/ansible/automation/playbooks/configure_docker_logging.yml
@@ -0,0 +1,112 @@
+---
+# Configure Docker Daemon Log Rotation — Linux hosts only
+#
+# Sets daemon-level defaults so ALL future containers cap at 10 MB × 3 files.
+# Existing containers must be recreated to pick up the new limits:
+#   docker compose up --force-recreate
+#
+# Synology hosts (atlantis, calypso, setillo) are NOT covered here —
+# see docs/guides/docker-log-rotation.md for their manual procedure.
+#
+# Usage:
+#   ansible-playbook -i hosts.ini playbooks/configure_docker_logging.yml
+#   ansible-playbook -i hosts.ini playbooks/configure_docker_logging.yml --check
+#   ansible-playbook -i hosts.ini playbooks/configure_docker_logging.yml -e "host_target=homelab"
+
+- name: Configure Docker daemon log rotation (Linux hosts)
+  hosts: "{{ host_target | default('homelab,vish-concord-nuc,pi-5,matrix-ubuntu') }}"
+  gather_facts: yes
+  become: yes
+
+  vars:
+    docker_daemon_config: /etc/docker/daemon.json
+    docker_log_driver: json-file
+    docker_log_max_size: "10m"
+    docker_log_max_files: "3"
+
+  tasks:
+    - name: Ensure /etc/docker directory exists
+      file:
+        path: /etc/docker
+        state: directory
+        owner: root
+        group: root
+        mode: '0755'
+
+    - name: Read existing daemon.json (if present)
+      slurp:
+        src: "{{ docker_daemon_config }}"
+      register: existing_daemon_json
+      ignore_errors: yes
+
+    - name: Parse existing daemon config
+      set_fact:
+        existing_config: "{{ existing_daemon_json.content | b64decode | from_json }}"
+      when: existing_daemon_json is succeeded
+      ignore_errors: yes
+
+    - name: Set empty config when none exists
+      set_fact:
+        existing_config: {}
+      when: existing_daemon_json is failed or existing_config is not defined
+
+    - name: Merge log config into daemon.json
+      copy:
+        dest: "{{ docker_daemon_config }}"
+        content: "{{ merged_config | to_nice_json }}\n"
+        owner: root
+        group: root
+        mode: '0644'
+        backup: yes
+      vars:
+        log_opts:
+          log-driver: "{{ docker_log_driver }}"
+          log-opts:
+            max-size: "{{ docker_log_max_size }}"
+            max-file: "{{ docker_log_max_files }}"
+        merged_config: "{{ existing_config | combine(log_opts) }}"
+      register: daemon_json_changed
+
+    - name: Show resulting daemon.json
+      command: cat {{ docker_daemon_config }}
+      register: daemon_json_contents
+      changed_when: false
+
+    - name: Display daemon.json
+      debug:
+        msg: "{{ daemon_json_contents.stdout }}"
+
+    - name: Validate daemon.json is valid JSON
+      command: python3 -c "import json,sys; json.load(open('{{ docker_daemon_config }}')); print('Valid JSON')"
+      changed_when: false
+
+    - name: Reload Docker daemon
+      systemd:
+        name: docker
+        state: restarted
+        daemon_reload: yes
+      when: daemon_json_changed.changed
+
+    - name: Wait for Docker to be ready
+      command: docker info
+      register: docker_info
+      retries: 5
+      delay: 3
+      until: docker_info.rc == 0
+      changed_when: false
+      when: daemon_json_changed.changed
+
+    - name: Verify log config active in Docker info
+      command: docker info --format '{{ "{{" }}.LoggingDriver{{ "}}" }}'
+      register: log_driver_check
+      changed_when: false
+
+    - name: Report result
+      debug:
+        msg: |
+          Host: {{ inventory_hostname }}
+          Logging driver: {{ log_driver_check.stdout }}
+          daemon.json changed: {{ daemon_json_changed.changed }}
+          Effective config: max-size={{ docker_log_max_size }}, max-file={{ docker_log_max_files }}
+          NOTE: Existing containers need recreation to pick up limits:
+                docker compose up --force-recreate
--- a/ansible/automation/playbooks/container_dependency_map.yml
+++ b/ansible/automation/playbooks/container_dependency_map.yml
@@ -0,0 +1,411 @@
+---
+- name: Container Dependency Mapping and Orchestration
+  hosts: all
+  gather_facts: yes
+  vars:
+    dependency_timestamp: "{{ ansible_date_time.iso8601 }}"
+    dependency_report_dir: "/tmp/dependency_reports"
+    restart_timeout: 300
+    health_check_retries: 5
+    health_check_delay: 10
+
+  tasks:
+    - name: Create dependency reports directory
+      file:
+        path: "{{ dependency_report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check if Docker is available
+      shell: command -v docker >/dev/null 2>&1
+      register: docker_available
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Skip Docker tasks if not available
+      set_fact:
+        skip_docker: "{{ docker_available.rc != 0 }}"
+
+    - name: Get all running containers
+      shell: |
+        docker ps --format "{{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null || echo "No containers"
+      register: running_containers
+      changed_when: false
+      when: not skip_docker
+
+    - name: Get all containers (including stopped)
+      shell: |
+        docker ps -a --format "{{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null || echo "No containers"
+      register: all_containers
+      changed_when: false
+      when: not skip_docker
+
+    - name: Analyze Docker Compose dependencies
+      shell: |
+        echo "=== DOCKER COMPOSE DEPENDENCY ANALYSIS ==="
+
+        # Find all docker-compose files
+        compose_files=$(find /opt /home -name "docker-compose*.yml" -o -name "compose*.yml" 2>/dev/null | head -20)
+
+        if [ -z "$compose_files" ]; then
+          echo "No Docker Compose files found"
+          exit 0
+        fi
+
+        echo "Found Docker Compose files:"
+        echo "$compose_files"
+        echo ""
+
+        # Analyze dependencies in each compose file
+        for compose_file in $compose_files; do
+          if [ -f "$compose_file" ]; then
+            echo "=== Analyzing: $compose_file ==="
+
+            # Extract service names
+            services=$(grep -E "^  [a-zA-Z0-9_-]+:" "$compose_file" | sed 's/://g' | sed 's/^  //' | sort)
+            echo "Services: $(echo $services | tr '\n' ' ')"
+
+            # Look for depends_on relationships
+            echo "Dependencies found:"
+            grep -A 5 -B 1 "depends_on:" "$compose_file" 2>/dev/null || echo "  No explicit depends_on found"
+
+            # Look for network dependencies
+            echo "Networks:"
+            grep -E "networks:|external_links:" "$compose_file" 2>/dev/null | head -5 || echo "  Default networks"
+
+            # Look for volume dependencies
+            echo "Shared volumes:"
+            grep -E "volumes_from:|volumes:" "$compose_file" 2>/dev/null | head -5 || echo "  No shared volumes"
+
+            echo ""
+          fi
+        done
+      register: compose_analysis
+      changed_when: false
+      when: not skip_docker
+
+    - name: Analyze container network connections
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== CONTAINER NETWORK ANALYSIS ==="
+
+        # Get all Docker networks
+        echo "Docker Networks:"
+        docker network ls --format "table {{.Name}}\t{{.Driver}}\t{{.Scope}}" 2>/dev/null || echo "No networks found"
+        echo ""
+
+        # Analyze each network
+        networks=$(docker network ls --format "{{.Name}}" 2>/dev/null | grep -v "bridge\|host\|none")
+
+        for network in $networks; do
+          echo "=== Network: $network ==="
+          containers_in_network=$(docker network inspect "$network" --format '{{range .Containers}}{{.Name}} {{end}}' 2>/dev/null)
+          if [ -n "$containers_in_network" ]; then
+            echo "Connected containers: $containers_in_network"
+          else
+            echo "No containers connected"
+          fi
+          echo ""
+        done
+
+        # Check for port conflicts
+        echo "=== PORT USAGE ANALYSIS ==="
+        docker ps --format "{{.Names}}\t{{.Ports}}" 2>/dev/null | grep -E ":[0-9]+->" | while read line; do
+          container=$(echo "$line" | cut -f1)
+          ports=$(echo "$line" | cut -f2 | grep -oE "[0-9]+:" | sed 's/://' | sort -n)
+          if [ -n "$ports" ]; then
+            echo "$container: $(echo $ports | tr '\n' ' ')"
+          fi
+        done
+      register: network_analysis
+      changed_when: false
+      when: not skip_docker
+
+    - name: Detect service health endpoints
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== HEALTH ENDPOINT DETECTION ==="
+
+        # Common health check patterns
+        health_patterns="/health /healthz /ping /status /api/health /health/ready /health/live"
+
+        # Get containers with exposed ports
+        docker ps --format "{{.Names}}\t{{.Ports}}" 2>/dev/null | grep -E ":[0-9]+->" | while read line; do
+          container=$(echo "$line" | cut -f1)
+          ports=$(echo "$line" | cut -f2 | grep -oE "0\.0\.0\.0:[0-9]+" | cut -d: -f2)
+
+          echo "Container: $container"
+
+          for port in $ports; do
+            echo "  Port $port:"
+            for pattern in $health_patterns; do
+              # Test HTTP health endpoint
+              if curl -s -f -m 2 "http://localhost:$port$pattern" >/dev/null 2>&1; then
+                echo "    ✅ http://localhost:$port$pattern"
+                break
+              elif curl -s -f -m 2 "https://localhost:$port$pattern" >/dev/null 2>&1; then
+                echo "    ✅ https://localhost:$port$pattern"
+                break
+              fi
+            done
+          done
+          echo ""
+        done
+      register: health_endpoints
+      changed_when: false
+      when: not skip_docker
+      ignore_errors: yes
+
+    - name: Analyze container resource dependencies
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== RESOURCE DEPENDENCY ANALYSIS ==="
+
+        # Check for containers that might be databases or core services
+        echo "Potential Core Services (databases, caches, etc.):"
+        docker ps --format "{{.Names}}\t{{.Image}}" 2>/dev/null | grep -iE "(postgres|mysql|mariadb|redis|mongo|elasticsearch|rabbitmq|kafka)" || echo "No obvious database containers found"
+        echo ""
+
+        # Check for reverse proxies and load balancers
+        echo "Potential Reverse Proxies/Load Balancers:"
+        docker ps --format "{{.Names}}\t{{.Image}}" 2>/dev/null | grep -iE "(nginx|apache|traefik|haproxy|caddy)" || echo "No obvious proxy containers found"
+        echo ""
+
+        # Check for monitoring services
+        echo "Monitoring Services:"
+        docker ps --format "{{.Names}}\t{{.Image}}" 2>/dev/null | grep -iE "(prometheus|grafana|influxdb|telegraf|node-exporter)" || echo "No obvious monitoring containers found"
+        echo ""
+
+        # Analyze container restart policies
+        echo "Container Restart Policies:"
+        docker ps -a --format "{{.Names}}" 2>/dev/null | while read container; do
+          if [ -n "$container" ]; then
+            policy=$(docker inspect "$container" --format '{{.HostConfig.RestartPolicy.Name}}' 2>/dev/null)
+            echo "$container: $policy"
+          fi
+        done
+      register: resource_analysis
+      changed_when: false
+      when: not skip_docker
+
+    - name: Create dependency map
+      set_fact:
+        dependency_map:
+          timestamp: "{{ dependency_timestamp }}"
+          hostname: "{{ inventory_hostname }}"
+          docker_available: "{{ not skip_docker }}"
+          containers:
+            running: "{{ running_containers.stdout_lines | default([]) | length }}"
+            total: "{{ all_containers.stdout_lines | default([]) | length }}"
+          analysis:
+            compose_files: "{{ compose_analysis.stdout | default('Docker not available') }}"
+            network_topology: "{{ network_analysis.stdout | default('Docker not available') }}"
+            health_endpoints: "{{ health_endpoints.stdout | default('Docker not available') }}"
+            resource_dependencies: "{{ resource_analysis.stdout | default('Docker not available') }}"
+
+    - name: Display dependency analysis
+      debug:
+        msg: |
+
+          ==========================================
+          🔗 DEPENDENCY ANALYSIS - {{ inventory_hostname }}
+          ==========================================
+
+          📊 CONTAINER SUMMARY:
+          - Running Containers: {{ dependency_map.containers.running }}
+          - Total Containers: {{ dependency_map.containers.total }}
+          - Docker Available: {{ dependency_map.docker_available }}
+
+          🐳 COMPOSE FILE ANALYSIS:
+          {{ dependency_map.analysis.compose_files }}
+
+          🌐 NETWORK TOPOLOGY:
+          {{ dependency_map.analysis.network_topology }}
+
+          🏥 HEALTH ENDPOINTS:
+          {{ dependency_map.analysis.health_endpoints }}
+
+          📦 RESOURCE DEPENDENCIES:
+          {{ dependency_map.analysis.resource_dependencies }}
+
+          ==========================================
+
+    - name: Generate dependency report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ dependency_map.timestamp }}",
+            "hostname": "{{ dependency_map.hostname }}",
+            "docker_available": {{ dependency_map.docker_available | lower }},
+            "container_summary": {
+              "running": {{ dependency_map.containers.running }},
+              "total": {{ dependency_map.containers.total }}
+            },
+            "analysis": {
+              "compose_files": {{ dependency_map.analysis.compose_files | to_json }},
+              "network_topology": {{ dependency_map.analysis.network_topology | to_json }},
+              "health_endpoints": {{ dependency_map.analysis.health_endpoints | to_json }},
+              "resource_dependencies": {{ dependency_map.analysis.resource_dependencies | to_json }}
+            },
+            "recommendations": [
+              {% if dependency_map.containers.running > 20 %}
+              "Consider implementing container orchestration for {{ dependency_map.containers.running }} containers",
+              {% endif %}
+              {% if 'No explicit depends_on found' in dependency_map.analysis.compose_files %}
+              "Add explicit depends_on relationships to Docker Compose files",
+              {% endif %}
+              {% if 'No obvious database containers found' not in dependency_map.analysis.resource_dependencies %}
+              "Ensure database containers have proper backup and recovery procedures",
+              {% endif %}
+              "Regular dependency mapping recommended for infrastructure changes"
+            ]
+          }
+        dest: "{{ dependency_report_dir }}/{{ inventory_hostname }}_dependencies_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Orchestrated container restart (when service_name is provided)
+      block:
+        - name: Validate service name parameter
+          fail:
+            msg: "service_name parameter is required for restart operations"
+          when: service_name is not defined
+
+        - name: Check if service exists
+          shell: |
+            if command -v docker >/dev/null 2>&1; then
+              docker ps -a --format "{{.Names}}" | grep -x "{{ service_name }}" || echo "not_found"
+            else
+              echo "docker_not_available"
+            fi
+          register: service_exists
+          changed_when: false
+
+        - name: Fail if service not found
+          fail:
+            msg: "Service '{{ service_name }}' not found on {{ inventory_hostname }}"
+          when: service_exists.stdout == "not_found"
+
+        - name: Get service dependencies (from compose file)
+          shell: |
+            # Find compose file containing this service
+            compose_file=""
+            for file in $(find /opt /home -name "docker-compose*.yml" -o -name "compose*.yml" 2>/dev/null); do
+              if grep -q "^  {{ service_name }}:" "$file" 2>/dev/null; then
+                compose_file="$file"
+                break
+              fi
+            done
+
+            if [ -n "$compose_file" ]; then
+              echo "Found in: $compose_file"
+              # Extract dependencies
+              awk '/^  {{ service_name }}:/,/^  [a-zA-Z]/ {
+                if (/depends_on:/) {
+                  getline
+                  while (/^    - /) {
+                    gsub(/^    - /, "")
+                    print $0
+                    getline
+                  }
+                }
+              }' "$compose_file" 2>/dev/null || echo "no_dependencies"
+            else
+              echo "no_compose_file"
+            fi
+          register: service_dependencies
+          changed_when: false
+
+        - name: Stop dependent services first
+          shell: |
+            if [ "{{ service_dependencies.stdout }}" != "no_dependencies" ] && [ "{{ service_dependencies.stdout }}" != "no_compose_file" ]; then
+              echo "Stopping dependent services..."
+              # This would need to be implemented based on your specific dependency chain
+              echo "Dependencies found: {{ service_dependencies.stdout }}"
+            fi
+          register: stop_dependents
+          when: cascade_restart | default(false) | bool
+
+        - name: Restart the target service
+          shell: |
+            echo "Restarting {{ service_name }}..."
+            docker restart "{{ service_name }}"
+
+            # Wait for container to be running
+            timeout {{ restart_timeout }} bash -c '
+              while [ "$(docker inspect {{ service_name }} --format "{{.State.Status}}" 2>/dev/null)" != "running" ]; do
+                sleep 2
+              done
+            '
+          register: restart_result
+
+        - name: Verify service health
+          shell: |
+            # Wait a moment for service to initialize
+            sleep {{ health_check_delay }}
+
+            # Check if container is running
+            if [ "$(docker inspect {{ service_name }} --format '{{.State.Status}}' 2>/dev/null)" = "running" ]; then
+              echo "✅ Container is running"
+
+              # Try to find and test health endpoint
+              ports=$(docker port {{ service_name }} 2>/dev/null | grep -oE "[0-9]+$" | head -1)
+              if [ -n "$ports" ]; then
+                for endpoint in /health /healthz /ping /status; do
+                  if curl -s -f -m 5 "http://localhost:$ports$endpoint" >/dev/null 2>&1; then
+                    echo "✅ Health endpoint responding: http://localhost:$ports$endpoint"
+                    exit 0
+                  fi
+                done
+                echo "⚠️  No health endpoint found, but container is running"
+              else
+                echo "⚠️  No exposed ports found, but container is running"
+              fi
+            else
+              echo "❌ Container is not running"
+              exit 1
+            fi
+          register: health_check
+          retries: "{{ health_check_retries }}"
+          delay: "{{ health_check_delay }}"
+
+        - name: Restart dependent services
+          shell: |
+            if [ "{{ service_dependencies.stdout }}" != "no_dependencies" ] && [ "{{ service_dependencies.stdout }}" != "no_compose_file" ]; then
+              echo "Restarting dependent services..."
+              # This would need to be implemented based on your specific dependency chain
+              echo "Would restart dependencies: {{ service_dependencies.stdout }}"
+            fi
+          when: cascade_restart | default(false) | bool
+
+      when: service_name is defined and not skip_docker
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          🔗 Dependency analysis complete for {{ inventory_hostname }}
+          📄 Report saved to: {{ dependency_report_dir }}/{{ inventory_hostname }}_dependencies_{{ ansible_date_time.epoch }}.json
+
+          {% if service_name is defined %}
+          🔄 Service restart summary:
+          - Target service: {{ service_name }}
+          - Restart result: {{ restart_result.rc | default('N/A') }}
+          - Health check: {{ 'PASSED' if health_check.rc == 0 else 'FAILED' }}
+          {% endif %}
+
+          💡 Use -e service_name=<container_name> to restart specific services
+          💡 Use -e cascade_restart=true to restart dependent services
--- a/ansible/automation/playbooks/container_dependency_orchestrator.yml
+++ b/ansible/automation/playbooks/container_dependency_orchestrator.yml
@@ -0,0 +1,227 @@
+---
+# Container Dependency Orchestrator
+# Smart restart ordering with dependency management across hosts
+# Run with: ansible-playbook -i hosts.ini playbooks/container_dependency_orchestrator.yml
+
+- name: Container Dependency Orchestration
+  hosts: all
+  gather_facts: yes
+  vars:
+    # Define service dependency tiers (restart order)
+    dependency_tiers:
+      tier_1_infrastructure:
+        - "postgres"
+        - "mariadb"
+        - "mysql"
+        - "redis"
+        - "memcached"
+        - "mongo"
+      tier_2_core_services:
+        - "authentik-server"
+        - "authentik-worker"
+        - "gitea"
+        - "portainer"
+        - "nginx-proxy-manager"
+      tier_3_applications:
+        - "plex"
+        - "sonarr"
+        - "radarr"
+        - "lidarr"
+        - "bazarr"
+        - "prowlarr"
+        - "jellyseerr"
+        - "immich-server"
+        - "paperlessngx"
+      tier_4_monitoring:
+        - "prometheus"
+        - "grafana"
+        - "alertmanager"
+        - "node_exporter"
+        - "snmp_exporter"
+      tier_5_utilities:
+        - "watchtower"
+        - "syncthing"
+        - "ntfy"
+
+    # Cross-host dependencies
+    cross_host_dependencies:
+      - service: "immich-server"
+        depends_on:
+          - host: "atlantis"
+            service: "postgres"
+      - service: "gitea"
+        depends_on:
+          - host: "calypso"
+            service: "postgres"
+
+  tasks:
+    - name: Gather container information
+      docker_host_info:
+        containers: yes
+      register: docker_info
+      when: ansible_facts['os_family'] != "Synology"
+
+    - name: Get Synology container info via docker command
+      shell: docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
+      register: synology_containers
+      when: ansible_facts['os_family'] == "Synology"
+      become: yes
+
+    - name: Parse container information
+      set_fact:
+        running_containers: "{{ docker_info.containers | selectattr('State', 'equalto', 'running') | map(attribute='Names') | map('first') | list if docker_info is defined else [] }}"
+        stopped_containers: "{{ docker_info.containers | rejectattr('State', 'equalto', 'running') | map(attribute='Names') | map('first') | list if docker_info is defined else [] }}"
+
+    - name: Categorize containers by dependency tier
+      set_fact:
+        tier_containers:
+          tier_1: "{{ running_containers | select('match', '.*(' + (dependency_tiers.tier_1_infrastructure | join('|')) + ').*') | list }}"
+          tier_2: "{{ running_containers | select('match', '.*(' + (dependency_tiers.tier_2_core_services | join('|')) + ').*') | list }}"
+          tier_3: "{{ running_containers | select('match', '.*(' + (dependency_tiers.tier_3_applications | join('|')) + ').*') | list }}"
+          tier_4: "{{ running_containers | select('match', '.*(' + (dependency_tiers.tier_4_monitoring | join('|')) + ').*') | list }}"
+          tier_5: "{{ running_containers | select('match', '.*(' + (dependency_tiers.tier_5_utilities | join('|')) + ').*') | list }}"
+
+    - name: Display container categorization
+      debug:
+        msg: |
+          Container Dependency Analysis for {{ inventory_hostname }}:
+
+          Tier 1 (Infrastructure): {{ tier_containers.tier_1 | length }} containers
+          {{ tier_containers.tier_1 | join(', ') }}
+
+          Tier 2 (Core Services): {{ tier_containers.tier_2 | length }} containers
+          {{ tier_containers.tier_2 | join(', ') }}
+
+          Tier 3 (Applications): {{ tier_containers.tier_3 | length }} containers
+          {{ tier_containers.tier_3 | join(', ') }}
+
+          Tier 4 (Monitoring): {{ tier_containers.tier_4 | length }} containers
+          {{ tier_containers.tier_4 | join(', ') }}
+
+          Tier 5 (Utilities): {{ tier_containers.tier_5 | length }} containers
+          {{ tier_containers.tier_5 | join(', ') }}
+
+    - name: Check container health status
+      shell: docker inspect {{ item }} --format='{{.State.Health.Status}}' 2>/dev/null || echo "no-healthcheck"
+      register: health_checks
+      loop: "{{ running_containers }}"
+      become: yes
+      failed_when: false
+
+    - name: Identify unhealthy containers
+      set_fact:
+        unhealthy_containers: "{{ health_checks.results | selectattr('stdout', 'equalto', 'unhealthy') | map(attribute='item') | list }}"
+        healthy_containers: "{{ health_checks.results | selectattr('stdout', 'in', ['healthy', 'no-healthcheck']) | map(attribute='item') | list }}"
+
+    - name: Display health status
+      debug:
+        msg: |
+          Container Health Status for {{ inventory_hostname }}:
+          - Healthy/No Check: {{ healthy_containers | length }}
+          - Unhealthy: {{ unhealthy_containers | length }}
+          {% if unhealthy_containers %}
+
+          Unhealthy Containers:
+          {% for container in unhealthy_containers %}
+          - {{ container }}
+          {% endfor %}
+          {% endif %}
+
+    - name: Restart unhealthy containers (Tier 1 first)
+      docker_container:
+        name: "{{ item }}"
+        state: started
+        restart: yes
+      loop: "{{ tier_containers.tier_1 | intersect(unhealthy_containers) }}"
+      when:
+        - restart_unhealthy | default(false) | bool
+        - unhealthy_containers | length > 0
+      become: yes
+
+    - name: Wait for Tier 1 containers to be healthy
+      shell: |
+        for i in {1..30}; do
+          status=$(docker inspect {{ item }} --format='{{.State.Health.Status}}' 2>/dev/null || echo "no-healthcheck")
+          if [[ "$status" == "healthy" || "$status" == "no-healthcheck" ]]; then
+            echo "Container {{ item }} is ready"
+            exit 0
+          fi
+          sleep 10
+        done
+        echo "Container {{ item }} failed to become healthy"
+        exit 1
+      loop: "{{ tier_containers.tier_1 | intersect(unhealthy_containers) }}"
+      when:
+        - restart_unhealthy | default(false) | bool
+        - unhealthy_containers | length > 0
+      become: yes
+
+    - name: Restart unhealthy containers (Tier 2)
+      docker_container:
+        name: "{{ item }}"
+        state: started
+        restart: yes
+      loop: "{{ tier_containers.tier_2 | intersect(unhealthy_containers) }}"
+      when:
+        - restart_unhealthy | default(false) | bool
+        - unhealthy_containers | length > 0
+      become: yes
+
+    - name: Generate dependency report
+      copy:
+        content: |
+          # Container Dependency Report - {{ inventory_hostname }}
+          Generated: {{ ansible_date_time.iso8601 }}
+
+          ## Container Summary
+          - Total Running: {{ running_containers | length }}
+          - Total Stopped: {{ stopped_containers | length }}
+          - Healthy: {{ healthy_containers | length }}
+          - Unhealthy: {{ unhealthy_containers | length }}
+
+          ## Dependency Tiers
+
+          ### Tier 1 - Infrastructure ({{ tier_containers.tier_1 | length }})
+          {% for container in tier_containers.tier_1 %}
+          - {{ container }}
+          {% endfor %}
+
+          ### Tier 2 - Core Services ({{ tier_containers.tier_2 | length }})
+          {% for container in tier_containers.tier_2 %}
+          - {{ container }}
+          {% endfor %}
+
+          ### Tier 3 - Applications ({{ tier_containers.tier_3 | length }})
+          {% for container in tier_containers.tier_3 %}
+          - {{ container }}
+          {% endfor %}
+
+          ### Tier 4 - Monitoring ({{ tier_containers.tier_4 | length }})
+          {% for container in tier_containers.tier_4 %}
+          - {{ container }}
+          {% endfor %}
+
+          ### Tier 5 - Utilities ({{ tier_containers.tier_5 | length }})
+          {% for container in tier_containers.tier_5 %}
+          - {{ container }}
+          {% endfor %}
+
+          {% if unhealthy_containers %}
+          ## Unhealthy Containers
+          {% for container in unhealthy_containers %}
+          - {{ container }}
+          {% endfor %}
+          {% endif %}
+
+          {% if stopped_containers %}
+          ## Stopped Containers
+          {% for container in stopped_containers %}
+          - {{ container }}
+          {% endfor %}
+          {% endif %}
+        dest: "/tmp/container_dependency_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md"
+      delegate_to: localhost
+
+    - name: Display report location
+      debug:
+        msg: "Dependency report saved to: /tmp/container_dependency_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md"
--- a/ansible/automation/playbooks/container_logs.yml
+++ b/ansible/automation/playbooks/container_logs.yml
@@ -0,0 +1,249 @@
+---
+# Container Logs Collection Playbook
+# Collect logs from multiple containers for troubleshooting
+# Usage: ansible-playbook playbooks/container_logs.yml -e "service_name=plex"
+# Usage: ansible-playbook playbooks/container_logs.yml -e "service_pattern=immich"
+# Usage: ansible-playbook playbooks/container_logs.yml -e "collect_all=true"
+
+- name: Collect Container Logs
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    target_service_name: "{{ service_name | default('') }}"
+    target_service_pattern: "{{ service_pattern | default('') }}"
+    target_collect_all: "{{ collect_all | default(false) }}"
+    target_log_lines: "{{ log_lines | default(100) }}"
+    target_log_since: "{{ log_since | default('1h') }}"
+    output_dir: "/tmp/container_logs/{{ ansible_date_time.date }}"
+    target_include_timestamps: "{{ include_timestamps | default(true) }}"
+    target_follow_logs: "{{ follow_logs | default(false) }}"
+
+  tasks:
+    - name: Validate input parameters
+      fail:
+        msg: "Specify either service_name, service_pattern, or collect_all=true"
+      when:
+        - target_service_name == ""
+        - target_service_pattern == ""
+        - not (target_collect_all | bool)
+
+    - name: Check if Docker is running
+      systemd:
+        name: docker
+      register: docker_status
+      failed_when: docker_status.status.ActiveState != "active"
+
+    - name: Create local log directory
+      file:
+        path: "{{ output_dir }}/{{ inventory_hostname }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+
+    - name: Create remote log directory
+      file:
+        path: "{{ output_dir }}/{{ inventory_hostname }}"
+        state: directory
+        mode: '0755'
+
+    - name: Get specific service container
+      shell: 'docker ps -a --filter "name={{ target_service_name }}" --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: specific_container
+      when: target_service_name != ""
+      changed_when: false
+
+    - name: Get containers matching pattern
+      shell: 'docker ps -a --filter "name={{ target_service_pattern }}" --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: pattern_containers
+      when: target_service_pattern != ""
+      changed_when: false
+
+    - name: Get all containers
+      shell: 'docker ps -a --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: all_containers
+      when: target_collect_all | bool
+      changed_when: false
+
+    - name: Combine container lists
+      set_fact:
+        target_containers: >-
+          {{
+            (specific_container.stdout_lines | default([])) +
+            (pattern_containers.stdout_lines | default([])) +
+            (all_containers.stdout_lines | default([]) if target_collect_all | bool else [])
+          }}
+
+    - name: Display target containers
+      debug:
+        msg: |
+          📦 CONTAINER LOG COLLECTION
+          ===========================
+          🖥️  Host: {{ inventory_hostname }}
+          📋 Target Containers: {{ target_containers | length }}
+          {% for container in target_containers %}
+          - {{ container }}
+          {% endfor %}
+          📏 Log Lines: {{ target_log_lines }}
+          ⏰ Since: {{ target_log_since }}
+
+    - name: Fail if no containers found
+      fail:
+        msg: "No containers found matching the criteria"
+      when: target_containers | length == 0
+
+    - name: Get container information
+      shell: |
+        docker inspect {{ item }} --format='
+        Container: {{ item }}
+        Image: {%raw%}{{.Config.Image}}{%endraw%}
+        Status: {%raw%}{{.State.Status}}{%endraw%}
+        Started: {%raw%}{{.State.StartedAt}}{%endraw%}
+        Restart Count: {%raw%}{{.RestartCount}}{%endraw%}
+        Health: {%raw%}{{if .State.Health}}{{.State.Health.Status}}{{else}}No health check{{end}}{%endraw%}
+        '
+      register: container_info
+      loop: "{{ target_containers }}"
+      changed_when: false
+
+    - name: Collect container logs
+      shell: |
+        echo "=== CONTAINER INFO ===" > {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log
+        docker inspect {{ item }} --format='
+        Container: {{ item }}
+        Image: {%raw%}{{.Config.Image}}{%endraw%}
+        Status: {%raw%}{{.State.Status}}{%endraw%}
+        Started: {%raw%}{{.State.StartedAt}}{%endraw%}
+        Restart Count: {%raw%}{{.RestartCount}}{%endraw%}
+        Health: {%raw%}{{if .State.Health}}{{.State.Health.Status}}{{else}}No health check{{end}}{%endraw%}
+        ' >> {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log
+        echo "" >> {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log
+        echo "=== CONTAINER LOGS ===" >> {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log
+        {% if target_include_timestamps | bool %}
+        docker logs {{ item }} --since={{ target_log_since }} --tail={{ target_log_lines }} -t >> {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log 2>&1
+        {% else %}
+        docker logs {{ item }} --since={{ target_log_since }} --tail={{ target_log_lines }} >> {{ output_dir }}/{{ inventory_hostname }}/{{ item }}.log 2>&1
+        {% endif %}
+      loop: "{{ target_containers }}"
+      ignore_errors: yes
+
+    - name: Get container resource usage
+      shell: 'docker stats {{ target_containers | join(" ") }} --no-stream --format "table {%raw%}{{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}{%endraw%}"'
+      register: container_stats
+      when: target_containers | length > 0
+      ignore_errors: yes
+
+    - name: Save container stats
+      copy:
+        content: |
+          Container Resource Usage - {{ ansible_date_time.iso8601 }}
+          Host: {{ inventory_hostname }}
+
+          {{ container_stats.stdout }}
+        dest: "{{ output_dir }}/{{ inventory_hostname }}/container_stats.txt"
+      when: container_stats.stdout is defined
+
+    - name: Check for error patterns in logs
+      shell: |
+        echo "=== ERROR ANALYSIS ===" > {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+        echo "Host: {{ inventory_hostname }}" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+        echo "Timestamp: {{ ansible_date_time.iso8601 }}" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+        echo "" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+
+        for container in {{ target_containers | join(' ') }}; do
+          echo "=== $container ===" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+
+          # Count error patterns
+          error_count=$(docker logs $container --since={{ target_log_since }} 2>&1 | grep -i -E "(error|exception|failed|fatal|panic)" | wc -l)
+          warn_count=$(docker logs $container --since={{ target_log_since }} 2>&1 | grep -i -E "(warn|warning)" | wc -l)
+
+          echo "Errors: $error_count" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+          echo "Warnings: $warn_count" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+
+          # Show recent errors
+          if [ $error_count -gt 0 ]; then
+            echo "Recent Errors:" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+            docker logs $container --since={{ target_log_since }} 2>&1 | grep -i -E "(error|exception|failed|fatal|panic)" | tail -5 >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+          fi
+          echo "" >> {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+        done
+      when: target_containers | length > 0
+      ignore_errors: yes
+
+    - name: Create summary report
+      copy:
+        content: |
+          📊 CONTAINER LOG COLLECTION SUMMARY
+          ===================================
+
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Collection Time: {{ ansible_date_time.iso8601 }}
+          📦 Containers Processed: {{ target_containers | length }}
+          📏 Log Lines per Container: {{ target_log_lines }}
+          ⏰ Time Range: {{ target_log_since }}
+
+          📋 CONTAINERS:
+          {% for container in target_containers %}
+          - {{ container }}
+          {% endfor %}
+
+          📁 LOG FILES LOCATION:
+          {{ output_dir }}/{{ inventory_hostname }}/
+
+          📄 FILES CREATED:
+          {% for container in target_containers %}
+          - {{ container }}.log
+          {% endfor %}
+          - container_stats.txt
+          - error_summary.txt
+          - collection_summary.txt (this file)
+
+          🔍 QUICK ANALYSIS:
+          Use these commands to analyze the logs:
+
+          # View error summary
+          cat {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+
+          # Search for specific patterns
+          grep -i "error" {{ output_dir }}/{{ inventory_hostname }}/*.log
+
+          # View container stats
+          cat {{ output_dir }}/{{ inventory_hostname }}/container_stats.txt
+
+          # Follow live logs (if needed)
+          {% for container in target_containers[:3] %}
+          docker logs -f {{ container }}
+          {% endfor %}
+
+        dest: "{{ output_dir }}/{{ inventory_hostname }}/collection_summary.txt"
+
+    - name: Display collection results
+      debug:
+        msg: |
+
+          ✅ LOG COLLECTION COMPLETE
+          ==========================
+          🖥️  Host: {{ inventory_hostname }}
+          📦 Containers: {{ target_containers | length }}
+          📁 Location: {{ output_dir }}/{{ inventory_hostname }}/
+
+          📄 Files Created:
+          {% for container in target_containers %}
+          - {{ container }}.log
+          {% endfor %}
+          - container_stats.txt
+          - error_summary.txt
+          - collection_summary.txt
+
+          🔍 Quick Commands:
+          # View errors: cat {{ output_dir }}/{{ inventory_hostname }}/error_summary.txt
+          # View stats: cat {{ output_dir }}/{{ inventory_hostname }}/container_stats.txt
+
+          ==========================
+
+    - name: Archive logs (optional)
+      archive:
+        path: "{{ output_dir }}/{{ inventory_hostname }}"
+        dest: "{{ output_dir }}/{{ inventory_hostname }}_logs_{{ ansible_date_time.epoch }}.tar.gz"
+        remove: no
+      when: archive_logs | default(false) | bool
+      delegate_to: localhost
--- a/ansible/automation/playbooks/container_resource_optimizer.yml
+++ b/ansible/automation/playbooks/container_resource_optimizer.yml
@@ -0,0 +1,369 @@
+---
+- name: Container Resource Optimization
+  hosts: all
+  gather_facts: yes
+  vars:
+    optimization_timestamp: "{{ ansible_date_time.iso8601 }}"
+    optimization_report_dir: "/tmp/optimization_reports"
+    cpu_threshold_warning: 80
+    cpu_threshold_critical: 95
+    memory_threshold_warning: 85
+    memory_threshold_critical: 95
+
+  tasks:
+    - name: Create optimization reports directory
+      file:
+        path: "{{ optimization_report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check if Docker is available
+      shell: command -v docker >/dev/null 2>&1
+      register: docker_available
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Skip Docker tasks if not available
+      set_fact:
+        skip_docker: "{{ docker_available.rc != 0 }}"
+
+    - name: Collect container resource usage
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== CONTAINER RESOURCE USAGE ==="
+
+        # Get current resource usage
+        echo "Current Resource Usage:"
+        docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.BlockIO}}" 2>/dev/null || echo "No running containers"
+        echo ""
+
+        # Get container limits
+        echo "Container Resource Limits:"
+        docker ps --format "{{.Names}}" 2>/dev/null | while read container; do
+          if [ -n "$container" ]; then
+            echo "Container: $container"
+
+            # CPU limits
+            cpu_limit=$(docker inspect "$container" --format '{{.HostConfig.CpuQuota}}' 2>/dev/null)
+            cpu_period=$(docker inspect "$container" --format '{{.HostConfig.CpuPeriod}}' 2>/dev/null)
+            if [ "$cpu_limit" != "0" ] && [ "$cpu_period" != "0" ]; then
+              cpu_cores=$(echo "scale=2; $cpu_limit / $cpu_period" | bc 2>/dev/null || echo "N/A")
+              echo "  CPU Limit: $cpu_cores cores"
+            else
+              echo "  CPU Limit: unlimited"
+            fi
+
+            # Memory limits
+            mem_limit=$(docker inspect "$container" --format '{{.HostConfig.Memory}}' 2>/dev/null)
+            if [ "$mem_limit" != "0" ]; then
+              mem_mb=$(echo "scale=0; $mem_limit / 1024 / 1024" | bc 2>/dev/null || echo "N/A")
+              echo "  Memory Limit: ${mem_mb}MB"
+            else
+              echo "  Memory Limit: unlimited"
+            fi
+
+            # Restart policy
+            restart_policy=$(docker inspect "$container" --format '{{.HostConfig.RestartPolicy.Name}}' 2>/dev/null)
+            echo "  Restart Policy: $restart_policy"
+
+            echo ""
+          fi
+        done
+      register: resource_usage
+      changed_when: false
+      when: not skip_docker
+
+    - name: Analyze resource efficiency
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== RESOURCE EFFICIENCY ANALYSIS ==="
+
+        # Identify resource-heavy containers
+        echo "High Resource Usage Containers:"
+        docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemPerc}}" 2>/dev/null | while IFS=$'\t' read container cpu mem; do
+          if [ -n "$container" ] && [ "$container" != "CONTAINER" ]; then
+            cpu_num=$(echo "$cpu" | sed 's/%//' | cut -d'.' -f1)
+            mem_num=$(echo "$mem" | sed 's/%//' | cut -d'.' -f1)
+
+            if [ "$cpu_num" -gt "{{ cpu_threshold_warning }}" ] 2>/dev/null || [ "$mem_num" -gt "{{ memory_threshold_warning }}" ] 2>/dev/null; then
+              echo "⚠️  $container - CPU: $cpu, Memory: $mem"
+            fi
+          fi
+        done
+        echo ""
+
+        # Check for containers without limits
+        echo "Containers Without Resource Limits:"
+        docker ps --format "{{.Names}}" 2>/dev/null | while read container; do
+          if [ -n "$container" ]; then
+            cpu_limit=$(docker inspect "$container" --format '{{.HostConfig.CpuQuota}}' 2>/dev/null)
+            mem_limit=$(docker inspect "$container" --format '{{.HostConfig.Memory}}' 2>/dev/null)
+
+            if [ "$cpu_limit" = "0" ] && [ "$mem_limit" = "0" ]; then
+              echo "⚠️  $container - No CPU or memory limits"
+            elif [ "$cpu_limit" = "0" ]; then
+              echo "⚠️  $container - No CPU limit"
+            elif [ "$mem_limit" = "0" ]; then
+              echo "⚠️  $container - No memory limit"
+            fi
+          fi
+        done
+        echo ""
+
+        # Identify idle containers
+        echo "Low Usage Containers (potential over-provisioning):"
+        docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemPerc}}" 2>/dev/null | while IFS=$'\t' read container cpu mem; do
+          if [ -n "$container" ] && [ "$container" != "CONTAINER" ]; then
+            cpu_num=$(echo "$cpu" | sed 's/%//' | cut -d'.' -f1)
+            mem_num=$(echo "$mem" | sed 's/%//' | cut -d'.' -f1)
+
+            if [ "$cpu_num" -lt "5" ] 2>/dev/null && [ "$mem_num" -lt "10" ] 2>/dev/null; then
+              echo "💡 $container - CPU: $cpu, Memory: $mem (consider downsizing)"
+            fi
+          fi
+        done
+      register: efficiency_analysis
+      changed_when: false
+      when: not skip_docker
+
+    - name: System resource analysis
+      shell: |
+        echo "=== SYSTEM RESOURCE ANALYSIS ==="
+
+        # Overall system resources
+        echo "System Resources:"
+        echo "CPU Cores: $(nproc)"
+        echo "Total Memory: $(free -h | awk 'NR==2{print $2}')"
+        echo "Available Memory: $(free -h | awk 'NR==2{print $7}')"
+        echo "Memory Usage: $(free | awk 'NR==2{printf "%.1f%%", $3*100/$2}')"
+        echo "Load Average: $(uptime | awk -F'load average:' '{print $2}')"
+        echo ""
+
+        # Docker system resource usage
+        if command -v docker >/dev/null 2>&1; then
+          echo "Docker System Usage:"
+          docker system df 2>/dev/null || echo "Docker system info not available"
+          echo ""
+
+          # Count containers by status
+          echo "Container Status Summary:"
+          echo "Running: $(docker ps -q 2>/dev/null | wc -l)"
+          echo "Stopped: $(docker ps -aq --filter status=exited 2>/dev/null | wc -l)"
+          echo "Total: $(docker ps -aq 2>/dev/null | wc -l)"
+        fi
+        echo ""
+
+        # Disk usage for Docker
+        if [ -d "/var/lib/docker" ]; then
+          echo "Docker Storage Usage:"
+          du -sh /var/lib/docker 2>/dev/null || echo "Docker storage info not accessible"
+        fi
+      register: system_analysis
+      changed_when: false
+
+    - name: Generate optimization recommendations
+      shell: |
+        echo "=== OPTIMIZATION RECOMMENDATIONS ==="
+
+        # System-level recommendations
+        total_mem_mb=$(free -m | awk 'NR==2{print $2}')
+        used_mem_mb=$(free -m | awk 'NR==2{print $3}')
+        mem_usage_percent=$(echo "scale=1; $used_mem_mb * 100 / $total_mem_mb" | bc 2>/dev/null || echo "0")
+
+        echo "System Recommendations:"
+        if [ "$(echo "$mem_usage_percent > 85" | bc 2>/dev/null)" = "1" ]; then
+          echo "🚨 High memory usage (${mem_usage_percent}%) - consider adding RAM or optimizing containers"
+        elif [ "$(echo "$mem_usage_percent > 70" | bc 2>/dev/null)" = "1" ]; then
+          echo "⚠️  Moderate memory usage (${mem_usage_percent}%) - monitor closely"
+        else
+          echo "✅ Memory usage acceptable (${mem_usage_percent}%)"
+        fi
+
+        # Load average check
+        load_1min=$(uptime | awk -F'load average:' '{print $2}' | awk -F',' '{print $1}' | xargs)
+        cpu_cores=$(nproc)
+        if [ "$(echo "$load_1min > $cpu_cores" | bc 2>/dev/null)" = "1" ]; then
+          echo "🚨 High CPU load ($load_1min) exceeds core count ($cpu_cores)"
+        else
+          echo "✅ CPU load acceptable ($load_1min for $cpu_cores cores)"
+        fi
+        echo ""
+
+        # Docker-specific recommendations
+        if command -v docker >/dev/null 2>&1; then
+          echo "Container Recommendations:"
+
+          # Check for containers without health checks
+          echo "Containers without health checks:"
+          docker ps --format "{{.Names}}" 2>/dev/null | while read container; do
+            if [ -n "$container" ]; then
+              health_check=$(docker inspect "$container" --format '{{.Config.Healthcheck}}' 2>/dev/null)
+              if [ "$health_check" = "<nil>" ] || [ -z "$health_check" ]; then
+                echo "💡 $container - Consider adding health check"
+              fi
+            fi
+          done
+          echo ""
+
+          # Check for old images
+          echo "Image Optimization:"
+          old_images=$(docker images --filter "dangling=true" -q 2>/dev/null | wc -l)
+          if [ "$old_images" -gt "0" ]; then
+            echo "🧹 $old_images dangling images found - run 'docker image prune'"
+          fi
+
+          unused_volumes=$(docker volume ls --filter "dangling=true" -q 2>/dev/null | wc -l)
+          if [ "$unused_volumes" -gt "0" ]; then
+            echo "🧹 $unused_volumes unused volumes found - run 'docker volume prune'"
+          fi
+        fi
+      register: recommendations
+      changed_when: false
+
+    - name: Create optimization report
+      set_fact:
+        optimization_report:
+          timestamp: "{{ optimization_timestamp }}"
+          hostname: "{{ inventory_hostname }}"
+          docker_available: "{{ not skip_docker }}"
+          resource_usage: "{{ resource_usage.stdout if not skip_docker else 'Docker not available' }}"
+          efficiency_analysis: "{{ efficiency_analysis.stdout if not skip_docker else 'Docker not available' }}"
+          system_analysis: "{{ system_analysis.stdout }}"
+          recommendations: "{{ recommendations.stdout }}"
+
+    - name: Display optimization report
+      debug:
+        msg: |
+
+          ==========================================
+          ⚡ RESOURCE OPTIMIZATION - {{ inventory_hostname }}
+          ==========================================
+
+          📊 DOCKER AVAILABLE: {{ 'Yes' if optimization_report.docker_available else 'No' }}
+
+          🔍 RESOURCE USAGE:
+          {{ optimization_report.resource_usage }}
+
+          📈 EFFICIENCY ANALYSIS:
+          {{ optimization_report.efficiency_analysis }}
+
+          🖥️  SYSTEM ANALYSIS:
+          {{ optimization_report.system_analysis }}
+
+          💡 RECOMMENDATIONS:
+          {{ optimization_report.recommendations }}
+
+          ==========================================
+
+    - name: Generate JSON optimization report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ optimization_report.timestamp }}",
+            "hostname": "{{ optimization_report.hostname }}",
+            "docker_available": {{ optimization_report.docker_available | lower }},
+            "resource_usage": {{ optimization_report.resource_usage | to_json }},
+            "efficiency_analysis": {{ optimization_report.efficiency_analysis | to_json }},
+            "system_analysis": {{ optimization_report.system_analysis | to_json }},
+            "recommendations": {{ optimization_report.recommendations | to_json }},
+            "optimization_actions": [
+              "Review containers without resource limits",
+              "Monitor high-usage containers for optimization opportunities",
+              "Consider downsizing low-usage containers",
+              "Implement health checks for better reliability",
+              "Regular cleanup of unused images and volumes"
+            ]
+          }
+        dest: "{{ optimization_report_dir }}/{{ inventory_hostname }}_optimization_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Apply optimizations (when optimize_action is specified)
+      block:
+        - name: Validate optimization action
+          fail:
+            msg: "Invalid action. Supported actions: cleanup, restart_high_usage, add_limits"
+          when: optimize_action not in ['cleanup', 'restart_high_usage', 'add_limits']
+
+        - name: Execute optimization action
+          shell: |
+            case "{{ optimize_action }}" in
+              "cleanup")
+                echo "Performing Docker cleanup..."
+                docker image prune -f 2>/dev/null || echo "Image prune failed"
+                docker volume prune -f 2>/dev/null || echo "Volume prune failed"
+                docker container prune -f 2>/dev/null || echo "Container prune failed"
+                echo "Cleanup completed"
+                ;;
+              "restart_high_usage")
+                echo "Restarting high CPU/memory usage containers..."
+                docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemPerc}}" 2>/dev/null | while IFS=$'\t' read container cpu mem; do
+                  if [ -n "$container" ] && [ "$container" != "CONTAINER" ]; then
+                    cpu_num=$(echo "$cpu" | sed 's/%//' | cut -d'.' -f1)
+                    mem_num=$(echo "$mem" | sed 's/%//' | cut -d'.' -f1)
+
+                    if [ "$cpu_num" -gt "{{ cpu_threshold_critical }}" ] 2>/dev/null || [ "$mem_num" -gt "{{ memory_threshold_critical }}" ] 2>/dev/null; then
+                      echo "Restarting high-usage container: $container (CPU: $cpu, Memory: $mem)"
+                      docker restart "$container" 2>/dev/null || echo "Failed to restart $container"
+                    fi
+                  fi
+                done
+                ;;
+              "add_limits")
+                echo "Adding resource limits requires manual Docker Compose file updates"
+                echo "Recommended limits based on current usage:"
+                docker stats --no-stream --format "{{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}" 2>/dev/null | while IFS=$'\t' read container cpu mem; do
+                  if [ -n "$container" ] && [ "$container" != "CONTAINER" ]; then
+                    echo "$container:"
+                    echo "  deploy:"
+                    echo "    resources:"
+                    echo "      limits:"
+                    echo "        cpus: '1.0'  # Adjust based on usage: $cpu"
+                    echo "        memory: 512M  # Adjust based on usage: $mem"
+                    echo ""
+                  fi
+                done
+                ;;
+            esac
+          register: optimization_action_result
+          when: not skip_docker
+
+        - name: Display optimization action result
+          debug:
+            msg: |
+
+              ⚡ Optimization action '{{ optimize_action }}' completed on {{ inventory_hostname }}
+
+              Result:
+              {{ optimization_action_result.stdout }}
+
+              {% if optimization_action_result.stderr %}
+              Errors:
+              {{ optimization_action_result.stderr }}
+              {% endif %}
+
+      when: optimize_action is defined and not skip_docker
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          ⚡ Resource optimization analysis complete for {{ inventory_hostname }}
+          📄 Report saved to: {{ optimization_report_dir }}/{{ inventory_hostname }}_optimization_{{ ansible_date_time.epoch }}.json
+
+          {% if optimize_action is defined %}
+          🔧 Action performed: {{ optimize_action }}
+          {% endif %}
+
+          💡 Use -e optimize_action=<action> for optimization operations
+          💡 Supported actions: cleanup, restart_high_usage, add_limits
+          💡 Monitor resource usage regularly for optimal performance
--- a/ansible/automation/playbooks/container_update_orchestrator.yml
+++ b/ansible/automation/playbooks/container_update_orchestrator.yml
@@ -0,0 +1,501 @@
+---
+- name: Container Update Orchestrator
+  hosts: all
+  gather_facts: yes
+  vars:
+    update_timestamp: "{{ ansible_date_time.iso8601 }}"
+    update_report_dir: "/tmp/update_reports"
+    rollback_enabled: true
+    update_timeout: 600
+    health_check_retries: 5
+    health_check_delay: 10
+
+  tasks:
+    - name: Create update reports directory
+      file:
+        path: "{{ update_report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check if Docker is available
+      shell: command -v docker >/dev/null 2>&1
+      register: docker_available
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Skip Docker tasks if not available
+      set_fact:
+        skip_docker: "{{ docker_available.rc != 0 }}"
+
+    - name: Pre-update system check
+      shell: |
+        echo "=== PRE-UPDATE SYSTEM CHECK ==="
+
+        # System resources
+        echo "System Resources:"
+        echo "Memory: $(free -h | awk 'NR==2{print $3"/"$2" ("$3*100/$2"%)"}')"
+        echo "Disk: $(df -h / | awk 'NR==2{print $3"/"$2" ("$5")"}')"
+        echo "Load: $(uptime | awk -F'load average:' '{print $2}')"
+        echo ""
+
+        # Docker status
+        if command -v docker >/dev/null 2>&1; then
+          echo "Docker Status:"
+          echo "Running containers: $(docker ps -q 2>/dev/null | wc -l)"
+          echo "Total containers: $(docker ps -aq 2>/dev/null | wc -l)"
+          echo "Images: $(docker images -q 2>/dev/null | wc -l)"
+          echo "Docker daemon: $(docker info >/dev/null 2>&1 && echo 'OK' || echo 'ERROR')"
+        else
+          echo "Docker not available"
+        fi
+        echo ""
+
+        # Network connectivity
+        echo "Network Connectivity:"
+        ping -c 1 8.8.8.8 >/dev/null 2>&1 && echo "Internet: OK" || echo "Internet: FAILED"
+
+        # Tailscale connectivity
+        if command -v tailscale >/dev/null 2>&1; then
+          tailscale status >/dev/null 2>&1 && echo "Tailscale: OK" || echo "Tailscale: FAILED"
+        fi
+      register: pre_update_check
+      changed_when: false
+
+    - name: Discover updatable containers
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== CONTAINER UPDATE DISCOVERY ==="
+
+        # Get current container information
+        echo "Current Container Status:"
+        docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.RunningFor}}" 2>/dev/null
+        echo ""
+
+        # Check for available image updates
+        echo "Checking for image updates:"
+        docker images --format "{{.Repository}}:{{.Tag}}" 2>/dev/null | grep -v "<none>" | while read image; do
+          if [ -n "$image" ]; then
+            echo "Checking: $image"
+
+            # Pull latest image to compare
+            if docker pull "$image" >/dev/null 2>&1; then
+              # Compare image IDs
+              current_id=$(docker images "$image" --format "{{.ID}}" | head -1)
+              echo "  Current ID: $current_id"
+
+              # Check if any containers are using this image
+              containers_using=$(docker ps --filter "ancestor=$image" --format "{{.Names}}" 2>/dev/null | tr '\n' ' ')
+              if [ -n "$containers_using" ]; then
+                echo "  Used by containers: $containers_using"
+              else
+                echo "  No running containers using this image"
+              fi
+            else
+              echo "  ❌ Failed to pull latest image"
+            fi
+            echo ""
+          fi
+        done
+      register: container_discovery
+      changed_when: false
+      when: not skip_docker
+
+    - name: Create container backup snapshots
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== CREATING CONTAINER SNAPSHOTS ==="
+
+        # Create snapshots of running containers
+        docker ps --format "{{.Names}}" 2>/dev/null | while read container; do
+          if [ -n "$container" ]; then
+            echo "Creating snapshot for: $container"
+
+            # Commit container to backup image
+            backup_image="${container}_backup_$(date +%Y%m%d_%H%M%S)"
+            if docker commit "$container" "$backup_image" >/dev/null 2>&1; then
+              echo "  ✅ Snapshot created: $backup_image"
+            else
+              echo "  ❌ Failed to create snapshot"
+            fi
+          fi
+        done
+        echo ""
+
+        # Export Docker Compose configurations
+        echo "Backing up Docker Compose files:"
+        find /opt /home -name "docker-compose*.yml" -o -name "compose*.yml" 2>/dev/null | while read compose_file; do
+          if [ -f "$compose_file" ]; then
+            backup_file="/tmp/$(basename "$compose_file").backup.$(date +%Y%m%d_%H%M%S)"
+            cp "$compose_file" "$backup_file" 2>/dev/null && echo "  ✅ Backed up: $compose_file -> $backup_file"
+          fi
+        done
+      register: backup_snapshots
+      changed_when: false
+      when: not skip_docker and rollback_enabled
+
+    - name: Orchestrated container updates
+      block:
+        - name: Update containers by priority groups
+          shell: |
+            echo "=== ORCHESTRATED CONTAINER UPDATES ==="
+
+            # Define update priority groups
+            # Priority 1: Infrastructure services (databases, caches)
+            # Priority 2: Application services
+            # Priority 3: Monitoring and auxiliary services
+
+            priority_1="postgres mysql mariadb redis mongo elasticsearch rabbitmq"
+            priority_2="nginx apache traefik caddy"
+            priority_3="grafana prometheus node-exporter"
+
+            update_group() {
+              local group_name="$1"
+              local containers="$2"
+
+              echo "Updating $group_name containers..."
+
+              for pattern in $containers; do
+                matching_containers=$(docker ps --format "{{.Names}}" 2>/dev/null | grep -i "$pattern" || true)
+
+                for container in $matching_containers; do
+                  if [ -n "$container" ]; then
+                    echo "  Updating: $container"
+
+                    # Get current image
+                    current_image=$(docker inspect "$container" --format '{{.Config.Image}}' 2>/dev/null)
+
+                    # Pull latest image
+                    if docker pull "$current_image" >/dev/null 2>&1; then
+                      echo "    ✅ Image updated: $current_image"
+
+                      # Recreate container with new image
+                      if docker-compose -f "$(find /opt /home -name "*compose*.yml" -exec grep -l "$container" {} \; | head -1)" up -d "$container" >/dev/null 2>&1; then
+                        echo "    ✅ Container recreated successfully"
+
+                        # Wait for container to be healthy
+                        sleep {{ health_check_delay }}
+
+                        # Check container health
+                        if [ "$(docker inspect "$container" --format '{{.State.Status}}' 2>/dev/null)" = "running" ]; then
+                          echo "    ✅ Container is running"
+                        else
+                          echo "    ❌ Container failed to start"
+                        fi
+                      else
+                        echo "    ❌ Failed to recreate container"
+                      fi
+                    else
+                      echo "    ⚠️  No image update available"
+                    fi
+
+                    echo ""
+                  fi
+                done
+              done
+            }
+
+            # Execute updates by priority
+            update_group "Priority 1 (Infrastructure)" "$priority_1"
+            sleep 30  # Wait between priority groups
+
+            update_group "Priority 2 (Applications)" "$priority_2"
+            sleep 30
+
+            update_group "Priority 3 (Monitoring)" "$priority_3"
+
+            echo "Orchestrated updates completed"
+          register: orchestrated_updates
+          when: update_mode is defined and update_mode == "orchestrated"
+
+        - name: Update specific container
+          shell: |
+            echo "=== UPDATING SPECIFIC CONTAINER ==="
+
+            container="{{ target_container }}"
+
+            if ! docker ps --format "{{.Names}}" | grep -q "^${container}$"; then
+              echo "❌ Container '$container' not found or not running"
+              exit 1
+            fi
+
+            echo "Updating container: $container"
+
+            # Get current image
+            current_image=$(docker inspect "$container" --format '{{.Config.Image}}' 2>/dev/null)
+            echo "Current image: $current_image"
+
+            # Pull latest image
+            echo "Pulling latest image..."
+            if docker pull "$current_image"; then
+              echo "✅ Image pulled successfully"
+
+              # Find compose file
+              compose_file=$(find /opt /home -name "*compose*.yml" -exec grep -l "$container" {} \; | head -1)
+
+              if [ -n "$compose_file" ]; then
+                echo "Using compose file: $compose_file"
+
+                # Update container using compose
+                if docker-compose -f "$compose_file" up -d "$container"; then
+                  echo "✅ Container updated successfully"
+
+                  # Health check
+                  echo "Performing health check..."
+                  sleep {{ health_check_delay }}
+
+                  retries={{ health_check_retries }}
+                  while [ $retries -gt 0 ]; do
+                    if [ "$(docker inspect "$container" --format '{{.State.Status}}' 2>/dev/null)" = "running" ]; then
+                      echo "✅ Container is healthy"
+                      break
+                    else
+                      echo "⏳ Waiting for container to be ready... ($retries retries left)"
+                      sleep {{ health_check_delay }}
+                      retries=$((retries - 1))
+                    fi
+                  done
+
+                  if [ $retries -eq 0 ]; then
+                    echo "❌ Container failed health check"
+                    exit 1
+                  fi
+                else
+                  echo "❌ Failed to update container"
+                  exit 1
+                fi
+              else
+                echo "⚠️  No compose file found, using direct Docker commands"
+                docker restart "$container"
+              fi
+            else
+              echo "❌ Failed to pull image"
+              exit 1
+            fi
+          register: specific_update
+          when: target_container is defined
+
+      when: not skip_docker
+
+    - name: Post-update verification
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== POST-UPDATE VERIFICATION ==="
+
+        # Check all containers are running
+        echo "Container Status Check:"
+        failed_containers=""
+        docker ps -a --format "{{.Names}}\t{{.Status}}" 2>/dev/null | while IFS=$'\t' read name status; do
+          if [ -n "$name" ]; then
+            if echo "$status" | grep -q "Up"; then
+              echo "✅ $name: $status"
+            else
+              echo "❌ $name: $status"
+              failed_containers="$failed_containers $name"
+            fi
+          fi
+        done
+
+        # Check system resources after update
+        echo ""
+        echo "System Resources After Update:"
+        echo "Memory: $(free -h | awk 'NR==2{print $3"/"$2" ("$3*100/$2"%)"}')"
+        echo "Load: $(uptime | awk -F'load average:' '{print $2}')"
+
+        # Check for any error logs
+        echo ""
+        echo "Recent Error Logs:"
+        docker ps --format "{{.Names}}" 2>/dev/null | head -5 | while read container; do
+          if [ -n "$container" ]; then
+            errors=$(docker logs "$container" --since="5m" 2>&1 | grep -i error | wc -l)
+            if [ "$errors" -gt "0" ]; then
+              echo "⚠️  $container: $errors error(s) in last 5 minutes"
+            fi
+          fi
+        done
+      register: post_update_verification
+      changed_when: false
+      when: not skip_docker
+
+    - name: Rollback on failure
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== ROLLBACK PROCEDURE ==="
+
+        # Check if rollback is needed
+        failed_containers=$(docker ps -a --filter "status=exited" --format "{{.Names}}" 2>/dev/null | head -5)
+
+        if [ -n "$failed_containers" ]; then
+          echo "Failed containers detected: $failed_containers"
+          echo "Initiating rollback..."
+
+          for container in $failed_containers; do
+            echo "Rolling back: $container"
+
+            # Find backup image
+            backup_image=$(docker images --format "{{.Repository}}" | grep "${container}_backup_" | head -1)
+
+            if [ -n "$backup_image" ]; then
+              echo "  Found backup image: $backup_image"
+
+              # Stop current container
+              docker stop "$container" 2>/dev/null || true
+              docker rm "$container" 2>/dev/null || true
+
+              # Start container from backup image
+              if docker run -d --name "$container" "$backup_image"; then
+                echo "  ✅ Rollback successful"
+              else
+                echo "  ❌ Rollback failed"
+              fi
+            else
+              echo "  ⚠️  No backup image found"
+            fi
+          done
+        else
+          echo "No rollback needed - all containers are healthy"
+        fi
+      register: rollback_result
+      when: not skip_docker and rollback_enabled and (orchestrated_updates.rc is defined and orchestrated_updates.rc != 0) or (specific_update.rc is defined and specific_update.rc != 0)
+      ignore_errors: yes
+
+    - name: Cleanup old backup images
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== CLEANUP OLD BACKUPS ==="
+
+        # Remove backup images older than 7 days
+        old_backups=$(docker images --format "{{.Repository}}\t{{.CreatedAt}}" | grep "_backup_" | awk '$2 < "'$(date -d '7 days ago' '+%Y-%m-%d')'"' | cut -f1)
+
+        if [ -n "$old_backups" ]; then
+          echo "Removing old backup images:"
+          for backup in $old_backups; do
+            echo "  Removing: $backup"
+            docker rmi "$backup" 2>/dev/null || echo "    Failed to remove $backup"
+          done
+        else
+          echo "No old backup images to clean up"
+        fi
+
+        # Clean up temporary backup files
+        find /tmp -name "*.backup.*" -mtime +7 -delete 2>/dev/null || true
+      register: cleanup_result
+      when: not skip_docker
+      ignore_errors: yes
+
+    - name: Create update report
+      set_fact:
+        update_report:
+          timestamp: "{{ update_timestamp }}"
+          hostname: "{{ inventory_hostname }}"
+          docker_available: "{{ not skip_docker }}"
+          pre_update_check: "{{ pre_update_check.stdout }}"
+          container_discovery: "{{ container_discovery.stdout if not skip_docker else 'Docker not available' }}"
+          backup_snapshots: "{{ backup_snapshots.stdout if not skip_docker and rollback_enabled else 'Snapshots disabled' }}"
+          orchestrated_updates: "{{ orchestrated_updates.stdout if orchestrated_updates is defined else 'Not performed' }}"
+          specific_update: "{{ specific_update.stdout if specific_update is defined else 'Not performed' }}"
+          post_update_verification: "{{ post_update_verification.stdout if not skip_docker else 'Docker not available' }}"
+          rollback_result: "{{ rollback_result.stdout if rollback_result is defined else 'Not needed' }}"
+          cleanup_result: "{{ cleanup_result.stdout if not skip_docker else 'Docker not available' }}"
+
+    - name: Display update report
+      debug:
+        msg: |
+
+          ==========================================
+          🔄 CONTAINER UPDATE REPORT - {{ inventory_hostname }}
+          ==========================================
+
+          📊 DOCKER AVAILABLE: {{ 'Yes' if update_report.docker_available else 'No' }}
+
+          🔍 PRE-UPDATE CHECK:
+          {{ update_report.pre_update_check }}
+
+          🔍 CONTAINER DISCOVERY:
+          {{ update_report.container_discovery }}
+
+          💾 BACKUP SNAPSHOTS:
+          {{ update_report.backup_snapshots }}
+
+          🔄 ORCHESTRATED UPDATES:
+          {{ update_report.orchestrated_updates }}
+
+          🎯 SPECIFIC UPDATE:
+          {{ update_report.specific_update }}
+
+          ✅ POST-UPDATE VERIFICATION:
+          {{ update_report.post_update_verification }}
+
+          ↩️  ROLLBACK RESULT:
+          {{ update_report.rollback_result }}
+
+          🧹 CLEANUP RESULT:
+          {{ update_report.cleanup_result }}
+
+          ==========================================
+
+    - name: Generate JSON update report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ update_report.timestamp }}",
+            "hostname": "{{ update_report.hostname }}",
+            "docker_available": {{ update_report.docker_available | lower }},
+            "pre_update_check": {{ update_report.pre_update_check | to_json }},
+            "container_discovery": {{ update_report.container_discovery | to_json }},
+            "backup_snapshots": {{ update_report.backup_snapshots | to_json }},
+            "orchestrated_updates": {{ update_report.orchestrated_updates | to_json }},
+            "specific_update": {{ update_report.specific_update | to_json }},
+            "post_update_verification": {{ update_report.post_update_verification | to_json }},
+            "rollback_result": {{ update_report.rollback_result | to_json }},
+            "cleanup_result": {{ update_report.cleanup_result | to_json }},
+            "recommendations": [
+              "Test updates in staging environment first",
+              "Monitor container health after updates",
+              "Maintain regular backup snapshots",
+              "Keep rollback procedures tested and ready",
+              "Schedule updates during maintenance windows"
+            ]
+          }
+        dest: "{{ update_report_dir }}/{{ inventory_hostname }}_container_updates_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          🔄 Container update orchestration complete for {{ inventory_hostname }}
+          📄 Report saved to: {{ update_report_dir }}/{{ inventory_hostname }}_container_updates_{{ ansible_date_time.epoch }}.json
+
+          {% if target_container is defined %}
+          🎯 Updated container: {{ target_container }}
+          {% endif %}
+
+          {% if update_mode is defined %}
+          🔄 Update mode: {{ update_mode }}
+          {% endif %}
+
+          💡 Use -e target_container=<name> to update specific containers
+          💡 Use -e update_mode=orchestrated for priority-based updates
+          💡 Use -e rollback_enabled=false to disable automatic rollback
--- a/ansible/automation/playbooks/cron_audit.yml
+++ b/ansible/automation/playbooks/cron_audit.yml
@@ -0,0 +1,276 @@
+---
+# Cron Audit Playbook
+# Inventories all scheduled tasks across every host and flags basic security concerns.
+# Covers /etc/crontab, /etc/cron.d/, /etc/cron.{hourly,daily,weekly,monthly},
+# user crontab spools, and systemd timers.
+# Usage: ansible-playbook playbooks/cron_audit.yml
+# Usage: ansible-playbook playbooks/cron_audit.yml -e "host_target=rpi"
+
+- name: Cron Audit — Scheduled Task Inventory
+  hosts: "{{ host_target | default('active') }}"
+  gather_facts: yes
+  ignore_unreachable: true
+
+  vars:
+    report_dir: "/tmp/cron_audit"
+
+  tasks:
+
+    # ---------- Setup ----------
+
+    - name: Create cron audit report directory
+      ansible.builtin.file:
+        path: "{{ report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    # ---------- /etc/crontab ----------
+
+    - name: Read /etc/crontab
+      ansible.builtin.shell: cat /etc/crontab 2>/dev/null || echo "(not present)"
+      register: etc_crontab
+      changed_when: false
+      failed_when: false
+
+    # ---------- /etc/cron.d/ ----------
+
+    - name: Read /etc/cron.d/ entries
+      ansible.builtin.shell: |
+        if [ -d /etc/cron.d ] && [ -n "$(ls /etc/cron.d/ 2>/dev/null)" ]; then
+          for f in /etc/cron.d/*; do
+            [ -f "$f" ] || continue
+            echo "=== $f ==="
+            cat "$f" 2>/dev/null
+            echo ""
+          done
+        else
+          echo "(not present or empty)"
+        fi
+      register: cron_d_entries
+      changed_when: false
+      failed_when: false
+
+    # ---------- /etc/cron.{hourly,daily,weekly,monthly} ----------
+
+    - name: Read /etc/cron.{hourly,daily,weekly,monthly} script names
+      ansible.builtin.shell: |
+        for dir in hourly daily weekly monthly; do
+          path="/etc/cron.$dir"
+          if [ -d "$path" ]; then
+            echo "=== $path ==="
+            ls "$path" 2>/dev/null || echo "(empty)"
+            echo ""
+          fi
+        done
+        if [ ! -d /etc/cron.hourly ] && [ ! -d /etc/cron.daily ] && \
+           [ ! -d /etc/cron.weekly ] && [ ! -d /etc/cron.monthly ]; then
+          echo "(no cron period directories present)"
+        fi
+      register: cron_period_dirs
+      changed_when: false
+      failed_when: false
+
+    # ---------- List users with crontabs ----------
+
+    - name: List users with crontabs
+      ansible.builtin.shell: |
+        # Debian/Ubuntu path
+        if [ -d /var/spool/cron/crontabs ]; then
+          spool_dir="/var/spool/cron/crontabs"
+        elif [ -d /var/spool/cron ]; then
+          spool_dir="/var/spool/cron"
+        else
+          echo "(no crontab spool directory found)"
+          exit 0
+        fi
+        files=$(ls "$spool_dir" 2>/dev/null)
+        if [ -z "$files" ]; then
+          echo "(no user crontabs found in $spool_dir)"
+        else
+          echo "$files"
+        fi
+      register: crontab_users
+      changed_when: false
+      failed_when: false
+
+    # ---------- Dump user crontab contents ----------
+
+    - name: Dump user crontab contents
+      ansible.builtin.shell: |
+        # Debian/Ubuntu path
+        if [ -d /var/spool/cron/crontabs ]; then
+          spool_dir="/var/spool/cron/crontabs"
+        elif [ -d /var/spool/cron ]; then
+          spool_dir="/var/spool/cron"
+        else
+          echo "(no crontab spool directory found)"
+          exit 0
+        fi
+        found=0
+        for f in "$spool_dir"/*; do
+          [ -f "$f" ] || continue
+          found=1
+          echo "=== $f ==="
+          cat "$f" 2>/dev/null || echo "(unreadable)"
+          echo ""
+        done
+        if [ "$found" -eq 0 ]; then
+          echo "(no user crontab files found)"
+        fi
+      register: crontab_contents
+      changed_when: false
+      failed_when: false
+
+    # ---------- Systemd timers ----------
+
+    - name: List systemd timers
+      ansible.builtin.shell: |
+        if command -v systemctl >/dev/null 2>&1; then
+          systemctl list-timers --all --no-pager 2>/dev/null
+        else
+          echo "(not a systemd host)"
+        fi
+      register: systemd_timers
+      changed_when: false
+      failed_when: false
+
+    # ---------- Security flag: REDACTED_APP_PASSWORD world-writable paths ----------
+
+    - name: Security flag - REDACTED_APP_PASSWORD world-writable path references
+      ansible.builtin.shell: |
+        flagged=""
+
+        # Collect root cron entries from /etc/crontab
+        if [ -f /etc/crontab ]; then
+          while IFS= read -r line; do
+            # Skip comments, empty lines, and variable assignment lines (e.g. MAILTO="")
+            case "$line" in
+              '#'*|''|*'='*) continue ;;
+            esac
+            # Lines where 6th field indicates root user (field 6) — format: min hr dom mon dow user cmd
+            user=$(echo "$line" | awk '{print $6}')
+            if [ "$user" = "root" ]; then
+              cmd=$(echo "$line" | awk '{for(i=7;i<=NF;i++) printf "%s ", $i; print ""}')
+              bin=$(echo "$cmd" | awk '{print $1}')
+              if [ -n "$bin" ] && [ -f "$bin" ]; then
+                if [ "$(find "$bin" -maxdepth 0 -perm -002 2>/dev/null)" = "$bin" ]; then
+                  flagged="$flagged\nFLAGGED: /etc/crontab root job uses world-writable binary: $bin"
+                fi
+              fi
+            fi
+          done < /etc/crontab
+        fi
+
+        # Collect root cron entries from /etc/cron.d/*
+        if [ -d /etc/cron.d ]; then
+          for f in /etc/cron.d/*; do
+            [ -f "$f" ] || continue
+            while IFS= read -r line; do
+              case "$line" in
+                '#'*|''|*'='*) continue ;;
+              esac
+              user=$(echo "$line" | awk '{print $6}')
+              if [ "$user" = "root" ]; then
+                cmd=$(echo "$line" | awk '{for(i=7;i<=NF;i++) printf "%s ", $i; print ""}')
+                bin=$(echo "$cmd" | awk '{print $1}')
+                if [ -n "$bin" ] && [ -f "$bin" ]; then
+                  if [ "$(find "$bin" -maxdepth 0 -perm -002 2>/dev/null)" = "$bin" ]; then
+                    flagged="$flagged\nFLAGGED: $f root job uses world-writable binary: $bin"
+                  fi
+                fi
+              fi
+            done < "$f"
+          done
+        fi
+
+        # Collect root crontab from spool
+        for spool in /var/spool/cron/crontabs/root /var/spool/cron/root; do
+          if [ -f "$spool" ]; then
+            while IFS= read -r line; do
+              case "$line" in
+                '#'*|'') continue ;;
+              esac
+              # User crontab format: min hr dom mon dow cmd (no user field)
+              cmd=$(echo "$line" | awk '{for(i=6;i<=NF;i++) printf "%s ", $i; print ""}')
+              bin=$(echo "$cmd" | awk '{print $1}')
+              if [ -n "$bin" ] && [ -f "$bin" ]; then
+                if [ "$(find "$bin" -maxdepth 0 -perm -002 2>/dev/null)" = "$bin" ]; then
+                  flagged="$flagged\nFLAGGED: $spool job uses world-writable binary: $bin"
+                fi
+              fi
+            done < "$spool"
+          fi
+        done
+
+        # Check /etc/cron.{hourly,daily,weekly,monthly} scripts (run as root by run-parts)
+        for dir in /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly; do
+          [ -d "$dir" ] || continue
+          for f in "$dir"/*; do
+            [ -f "$f" ] || continue
+            if [ "$(find "$f" -maxdepth 0 -perm -002 2>/dev/null)" = "$f" ]; then
+              flagged="${flagged}\nFLAGGED: $f (run-parts cron dir) is world-writable"
+            fi
+          done
+        done
+
+        if [ -z "$flagged" ]; then
+          echo "No world-writable cron script paths found"
+        else
+          printf "%b\n" "$flagged"
+        fi
+      register: security_flags
+      changed_when: false
+      failed_when: false
+
+    # ---------- Per-host summary ----------
+
+    - name: Per-host cron audit summary
+      ansible.builtin.debug:
+        msg: |
+          ==========================================
+          CRON AUDIT SUMMARY: {{ inventory_hostname }}
+          ==========================================
+
+          === /etc/crontab ===
+          {{ etc_crontab.stdout | default('(not collected)') }}
+
+          === /etc/cron.d/ ===
+          {{ cron_d_entries.stdout | default('(not collected)') }}
+
+          === Cron Period Directories ===
+          {{ cron_period_dirs.stdout | default('(not collected)') }}
+
+          === Users with Crontabs ===
+          {{ crontab_users.stdout | default('(not collected)') }}
+
+          === User Crontab Contents ===
+          {{ crontab_contents.stdout | default('(not collected)') }}
+
+          === Systemd Timers ===
+          {{ systemd_timers.stdout | default('(not collected)') }}
+
+          === Security Flags ===
+          {{ security_flags.stdout | default('(not collected)') }}
+
+          ==========================================
+
+    # ---------- Per-host JSON report ----------
+
+    - name: Write per-host JSON cron audit report
+      ansible.builtin.copy:
+        content: "{{ {
+            'timestamp': ansible_date_time.iso8601,
+            'hostname': inventory_hostname,
+            'etc_crontab': etc_crontab.stdout | default('') | trim,
+            'cron_d_entries': cron_d_entries.stdout | default('') | trim,
+            'cron_period_dirs': cron_period_dirs.stdout | default('') | trim,
+            'crontab_users': crontab_users.stdout | default('') | trim,
+            'crontab_contents': crontab_contents.stdout | default('') | trim,
+            'systemd_timers': systemd_timers.stdout | default('') | trim,
+            'security_flags': security_flags.stdout | default('') | trim
+          } | to_nice_json }}"
+        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
+      delegate_to: localhost
+      changed_when: false
--- a/ansible/automation/playbooks/disaster_recovery_orchestrator.yml
+++ b/ansible/automation/playbooks/disaster_recovery_orchestrator.yml
@@ -0,0 +1,510 @@
+---
+# Disaster Recovery Orchestrator
+# Full infrastructure backup and recovery procedures
+# Run with: ansible-playbook -i hosts.ini playbooks/disaster_recovery_orchestrator.yml
+
+- name: Disaster Recovery Orchestrator
+  hosts: all
+  gather_facts: yes
+  vars:
+    dr_backup_root: "/volume1/disaster-recovery"
+    recovery_priority_tiers:
+      tier_1_critical:
+        - "postgres"
+        - "mariadb"
+        - "authentik-server"
+        - "nginx-proxy-manager"
+        - "portainer"
+      tier_2_infrastructure:
+        - "prometheus"
+        - "grafana"
+        - "gitea"
+        - "adguard"
+        - "tailscale"
+      tier_3_services:
+        - "plex"
+        - "immich-server"
+        - "paperlessngx"
+        - "vaultwarden"
+      tier_4_optional:
+        - "sonarr"
+        - "radarr"
+        - "jellyseerr"
+        - "homarr"
+
+    backup_retention:
+      daily: 7
+      weekly: 4
+      monthly: 12
+
+  tasks:
+    - name: Create disaster recovery directory structure
+      file:
+        path: "{{ dr_backup_root }}/{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "configs"
+        - "databases"
+        - "volumes"
+        - "system"
+        - "recovery-plans"
+        - "verification"
+      when: inventory_hostname in groups['synology']
+      become: yes
+
+    - name: Generate system inventory
+      shell: |
+        echo "=== System Inventory for {{ inventory_hostname }} ==="
+        echo "Timestamp: $(date)"
+        echo "Hostname: $(hostname)"
+        echo "IP Address: {{ ansible_default_ipv4.address }}"
+        echo "OS: {{ ansible_facts['os_family'] }} {{ ansible_facts['distribution_version'] }}"
+        echo ""
+
+        echo "=== Hardware Information ==="
+        echo "CPU: $(nproc) cores"
+        echo "Memory: $(free -h | grep '^Mem:' | awk '{print $2}')"
+        echo "Disk Usage:"
+        df -h | grep -E '^/dev|^tmpfs' | head -10
+        echo ""
+
+        echo "=== Network Configuration ==="
+        ip addr show | grep -E '^[0-9]+:|inet ' | head -20
+        echo ""
+
+        echo "=== Running Services ==="
+        if command -v systemctl >/dev/null 2>&1; then
+          systemctl list-units --type=service --state=running | head -20
+        fi
+        echo ""
+
+        echo "=== Docker Containers ==="
+        if command -v docker >/dev/null 2>&1; then
+          docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}" | head -20
+        fi
+      register: system_inventory
+
+    - name: Backup critical configurations
+      shell: |
+        backup_date=$(date +%Y%m%d_%H%M%S)
+        config_backup="{{ dr_backup_root }}/configs/{{ inventory_hostname }}_configs_${backup_date}.tar.gz"
+
+        echo "Creating configuration backup: $config_backup"
+
+        # Create list of critical config paths
+        config_paths=""
+
+        # System configs
+        [ -d /etc ] && config_paths="$config_paths /etc/hosts /etc/hostname /etc/fstab /etc/crontab"
+        [ -d /etc/systemd ] && config_paths="$config_paths /etc/systemd/system"
+        [ -d /etc/nginx ] && config_paths="$config_paths /etc/nginx"
+        [ -d /etc/docker ] && config_paths="$config_paths /etc/docker"
+
+        # Docker compose files
+        if [ -d /volume1/docker ]; then
+          find /volume1/docker -name "docker-compose.yml" -o -name "*.env" > /tmp/docker_configs.txt
+          config_paths="$config_paths $(cat /tmp/docker_configs.txt | tr '\n' ' ')"
+        fi
+
+        # SSH configs
+        [ -d /root/.ssh ] && config_paths="$config_paths /root/.ssh"
+        [ -d /home/*/.ssh ] && config_paths="$config_paths /home/*/.ssh"
+
+        # Create backup
+        if [ -n "$config_paths" ]; then
+          tar -czf "$config_backup" $config_paths 2>/dev/null || true
+          if [ -f "$config_backup" ]; then
+            size=$(du -h "$config_backup" | cut -f1)
+            echo "✓ Configuration backup created: $size"
+          else
+            echo "✗ Configuration backup failed"
+          fi
+        else
+          echo "No configuration paths found"
+        fi
+      register: config_backup
+      when: inventory_hostname in groups['synology']
+      become: yes
+
+    - name: Backup databases with consistency checks
+      shell: |
+        backup_date=$(date +%Y%m%d_%H%M%S)
+        db_backup_dir="{{ dr_backup_root }}/databases/{{ inventory_hostname }}_${backup_date}"
+        mkdir -p "$db_backup_dir"
+
+        echo "=== Database Backup for {{ inventory_hostname }} ==="
+
+        # PostgreSQL databases
+        for container in $(docker ps --filter "ancestor=postgres" --format "{{.Names}}" 2>/dev/null); do
+          echo "Backing up PostgreSQL container: $container"
+
+          # Create backup
+          docker exec "$container" pg_dumpall -U postgres > "${db_backup_dir}/${container}_postgres.sql" 2>/dev/null
+
+          # Verify backup
+          if [ -s "${db_backup_dir}/${container}_postgres.sql" ]; then
+            lines=$(wc -l < "${db_backup_dir}/${container}_postgres.sql")
+            size=$(du -h "${db_backup_dir}/${container}_postgres.sql" | cut -f1)
+            echo "✓ $container: $lines lines, $size"
+
+            # Test restore (dry run)
+            if docker exec "$container" psql -U postgres -c "SELECT version();" >/dev/null 2>&1; then
+              echo "✓ $container: Database connection verified"
+            else
+              echo "✗ $container: Database connection failed"
+            fi
+          else
+            echo "✗ $container: Backup failed or empty"
+          fi
+        done
+
+        # MariaDB/MySQL databases
+        for container in $(docker ps --filter "ancestor=mariadb" --format "{{.Names}}" 2>/dev/null); do
+          echo "Backing up MariaDB container: $container"
+
+          docker exec "$container" mysqldump --all-databases -u root > "${db_backup_dir}/${container}_mariadb.sql" 2>/dev/null
+
+          if [ -s "${db_backup_dir}/${container}_mariadb.sql" ]; then
+            lines=$(wc -l < "${db_backup_dir}/${container}_mariadb.sql")
+            size=$(du -h "${db_backup_dir}/${container}_mariadb.sql" | cut -f1)
+            echo "✓ $container: $lines lines, $size"
+          else
+            echo "✗ $container: Backup failed or empty"
+          fi
+        done
+
+        # MongoDB databases
+        for container in $(docker ps --filter "ancestor=mongo" --format "{{.Names}}" 2>/dev/null); do
+          echo "Backing up MongoDB container: $container"
+
+          docker exec "$container" mongodump --archive > "${db_backup_dir}/${container}_mongodb.archive" 2>/dev/null
+
+          if [ -s "${db_backup_dir}/${container}_mongodb.archive" ]; then
+            size=$(du -h "${db_backup_dir}/${container}_mongodb.archive" | cut -f1)
+            echo "✓ $container: $size"
+          else
+            echo "✗ $container: Backup failed or empty"
+          fi
+        done
+
+        echo "Database backup completed: $db_backup_dir"
+      register: database_backup
+      when: inventory_hostname in groups['synology']
+      become: yes
+
+    - name: Create recovery plan document
+      copy:
+        content: |
+          # Disaster Recovery Plan - {{ inventory_hostname }}
+          Generated: {{ ansible_date_time.iso8601 }}
+
+          ## System Information
+          - Hostname: {{ inventory_hostname }}
+          - IP Address: {{ ansible_default_ipv4.address }}
+          - OS: {{ ansible_facts['os_family'] }} {{ ansible_facts['distribution_version'] }}
+          - Groups: {{ group_names | join(', ') }}
+
+          ## Recovery Priority Order
+
+          ### Tier 1 - Critical Infrastructure (Start First)
+          {% for service in recovery_priority_tiers.tier_1_critical %}
+          - {{ service }}
+          {% endfor %}
+
+          ### Tier 2 - Core Infrastructure
+          {% for service in recovery_priority_tiers.tier_2_infrastructure %}
+          - {{ service }}
+          {% endfor %}
+
+          ### Tier 3 - Applications
+          {% for service in recovery_priority_tiers.tier_3_services %}
+          - {{ service }}
+          {% endfor %}
+
+          ### Tier 4 - Optional Services
+          {% for service in recovery_priority_tiers.tier_4_optional %}
+          - {{ service }}
+          {% endfor %}
+
+          ## Recovery Procedures
+
+          ### 1. System Recovery
+          ```bash
+          # Restore system configurations
+          tar -xzf {{ dr_backup_root }}/configs/{{ inventory_hostname }}_configs_*.tar.gz -C /
+
+          # Restart essential services
+          systemctl restart docker
+          systemctl restart tailscaled
+          ```
+
+          ### 2. Database Recovery
+          ```bash
+          # PostgreSQL restore example
+          docker exec -i <postgres_container> psql -U postgres < backup.sql
+
+          # MariaDB restore example
+          docker exec -i <mariadb_container> mysql -u root < backup.sql
+
+          # MongoDB restore example
+          docker exec -i <mongo_container> mongorestore --archive < backup.archive
+          ```
+
+          ### 3. Container Recovery
+          ```bash
+          # Pull latest images
+          docker-compose pull
+
+          # Start containers in priority order
+          docker-compose up -d <tier_1_services>
+          # Wait for health checks, then continue with tier 2, etc.
+          ```
+
+          ## Verification Steps
+
+          ### Health Checks
+          - [ ] All critical containers running
+          - [ ] Database connections working
+          - [ ] Web interfaces accessible
+          - [ ] Monitoring systems operational
+          - [ ] Backup systems functional
+
+          ### Network Connectivity
+          - [ ] Tailscale mesh connected
+          - [ ] DNS resolution working
+          - [ ] External services accessible
+          - [ ] Inter-container communication working
+
+          ## Emergency Contacts & Resources
+
+          ### Key Services URLs
+          {% if inventory_hostname == 'atlantis' %}
+          - Portainer: https://192.168.0.200:9443
+          - Plex: http://{{ ansible_default_ipv4.address }}:32400
+          - Immich: http://{{ ansible_default_ipv4.address }}:2283
+          {% elif inventory_hostname == 'calypso' %}
+          - Gitea: https://git.vish.gg
+          - Authentik: https://auth.vish.gg
+          - Paperless: http://{{ ansible_default_ipv4.address }}:8000
+          {% endif %}
+
+          ### Documentation
+          - Repository: https://git.vish.gg/Vish/homelab
+          - Ansible Playbooks: /home/homelab/organized/repos/homelab/ansible/automation/
+          - Monitoring: https://gf.vish.gg
+
+          ## Backup Locations
+          - Configurations: {{ dr_backup_root }}/configs/
+          - Databases: {{ dr_backup_root }}/databases/
+          - Docker Volumes: {{ dr_backup_root }}/volumes/
+          - System State: {{ dr_backup_root }}/system/
+        dest: "{{ dr_backup_root }}/recovery-plans/{{ inventory_hostname }}_recovery_plan.md"
+      when: inventory_hostname in groups['synology']
+      become: yes
+
+    - name: Test disaster recovery procedures (dry run)
+      shell: |
+        echo "=== Disaster Recovery Test - {{ inventory_hostname }} ==="
+        echo "Timestamp: $(date)"
+        echo ""
+
+        echo "=== Backup Verification ==="
+
+        # Check configuration backups
+        config_backups=$(find {{ dr_backup_root }}/configs -name "{{ inventory_hostname }}_configs_*.tar.gz" 2>/dev/null | wc -l)
+        echo "Configuration backups: $config_backups"
+
+        # Check database backups
+        db_backups=$(find {{ dr_backup_root }}/databases -name "{{ inventory_hostname }}_*" -type d 2>/dev/null | wc -l)
+        echo "Database backup sets: $db_backups"
+
+        echo ""
+        echo "=== Recovery Readiness ==="
+
+        # Check if Docker is available
+        if command -v docker >/dev/null 2>&1; then
+          echo "✓ Docker available"
+
+          # Check if compose files exist
+          compose_files=$(find /volume1/docker -name "docker-compose.yml" 2>/dev/null | wc -l)
+          echo "✓ Docker Compose files: $compose_files"
+        else
+          echo "✗ Docker not available"
+        fi
+
+        # Check Tailscale
+        if command -v tailscale >/dev/null 2>&1; then
+          echo "✓ Tailscale available"
+        else
+          echo "✗ Tailscale not available"
+        fi
+
+        # Check network connectivity
+        if ping -c 1 8.8.8.8 >/dev/null 2>&1; then
+          echo "✓ Internet connectivity"
+        else
+          echo "✗ No internet connectivity"
+        fi
+
+        echo ""
+        echo "=== Critical Service Status ==="
+
+        {% for tier_name, services in recovery_priority_tiers.items() %}
+        echo "{{ tier_name | replace('_', ' ') | title }}:"
+        {% for service in services %}
+        if docker ps --filter "name={{ service }}" --format "{{.Names}}" | grep -q "{{ service }}"; then
+          echo "  ✓ {{ service }}"
+        else
+          echo "  ✗ {{ service }}"
+        fi
+        {% endfor %}
+        echo ""
+        {% endfor %}
+      register: dr_test
+      when: inventory_hostname in groups['synology']
+      become: yes
+
+    - name: Generate disaster recovery report
+      copy:
+        content: |
+          # Disaster Recovery Report - {{ inventory_hostname }}
+          Generated: {{ ansible_date_time.iso8601 }}
+
+          ## System Inventory
+          ```
+          {{ system_inventory.stdout }}
+          ```
+
+          ## Configuration Backup
+          ```
+          {{ config_backup.stdout if config_backup is defined else 'Not performed on this host' }}
+          ```
+
+          ## Database Backup
+          ```
+          {{ database_backup.stdout if database_backup is defined else 'Not performed on this host' }}
+          ```
+
+          ## Recovery Readiness Test
+          ```
+          {{ dr_test.stdout if dr_test is defined else 'Not performed on this host' }}
+          ```
+
+          ## Recommendations
+
+          {% if inventory_hostname in groups['synology'] %}
+          ### For {{ inventory_hostname }}:
+          - ✅ Primary backup location configured
+          - ✅ Recovery plan generated
+          - 🔧 Schedule regular DR tests
+          - 🔧 Verify off-site backup replication
+          {% else %}
+          ### For {{ inventory_hostname }}:
+          - 🔧 Configure local backup procedures
+          - 🔧 Ensure critical data is replicated to Synology hosts
+          - 🔧 Document service-specific recovery steps
+          {% endif %}
+
+          ## Next Steps
+          1. Review recovery plan: {{ dr_backup_root }}/recovery-plans/{{ inventory_hostname }}_recovery_plan.md
+          2. Test recovery procedures in non-production environment
+          3. Schedule regular backup verification
+          4. Update recovery documentation as services change
+        dest: "/tmp/disaster_recovery_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md"
+      delegate_to: localhost
+
+    - name: Display disaster recovery summary
+      debug:
+        msg: |
+          Disaster Recovery Summary for {{ inventory_hostname }}:
+          - System Inventory: ✅ Complete
+          - Configuration Backup: {{ '✅ Complete' if config_backup is defined else '⏭️ Skipped (not Synology)' }}
+          - Database Backup: {{ '✅ Complete' if database_backup is defined else '⏭️ Skipped (not Synology)' }}
+          - Recovery Plan: {{ '✅ Generated' if inventory_hostname in groups['synology'] else '⏭️ Host-specific plan needed' }}
+          - Report: /tmp/disaster_recovery_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md
+
+# Final consolidation task
+- name: Generate Master Disaster Recovery Plan
+  hosts: localhost
+  gather_facts: no
+  tasks:
+    - name: Create master recovery plan
+      shell: |
+        echo "# Master Disaster Recovery Plan - Homelab Infrastructure"
+        echo "Generated: $(date)"
+        echo ""
+        echo "## Infrastructure Overview"
+        echo "- Total Hosts: {{ groups['all'] | length }}"
+        echo "- Synology NAS: {{ groups['synology'] | length }}"
+        echo "- Debian Clients: {{ groups['debian_clients'] | length }}"
+        echo "- Hypervisors: {{ groups['hypervisors'] | length }}"
+        echo ""
+        echo "## Recovery Order by Host"
+        echo ""
+        echo "### Phase 1: Core Infrastructure"
+        {% for host in groups['synology'] %}
+        echo "1. **{{ host }}** - Primary storage and services"
+        {% endfor %}
+        echo ""
+        echo "### Phase 2: Compute Nodes"
+        {% for host in groups['debian_clients'] %}
+        echo "2. **{{ host }}** - Applications and services"
+        {% endfor %}
+        echo ""
+        echo "### Phase 3: Specialized Systems"
+        {% for host in groups['hypervisors'] %}
+        echo "3. **{{ host }}** - Virtualization and specialized services"
+        {% endfor %}
+        echo ""
+        echo "## Critical Recovery Procedures"
+        echo ""
+        echo "### 1. Network Recovery"
+        echo "- Restore Tailscale mesh connectivity"
+        echo "- Verify DNS resolution (AdGuard Home)"
+        echo "- Test inter-host communication"
+        echo ""
+        echo "### 2. Storage Recovery"
+        echo "- Mount all required volumes"
+        echo "- Verify RAID integrity on Synology systems"
+        echo "- Test backup accessibility"
+        echo ""
+        echo "### 3. Service Recovery"
+        echo "- Start Tier 1 services (databases, auth)"
+        echo "- Start Tier 2 services (core infrastructure)"
+        echo "- Start Tier 3 services (applications)"
+        echo "- Start Tier 4 services (optional)"
+        echo ""
+        echo "## Verification Checklist"
+        echo "- [ ] All hosts accessible via Tailscale"
+        echo "- [ ] All critical containers running"
+        echo "- [ ] Monitoring systems operational"
+        echo "- [ ] Backup systems functional"
+        echo "- [ ] User services accessible"
+        echo ""
+        echo "## Emergency Resources"
+        echo "- Repository: https://git.vish.gg/Vish/homelab"
+        echo "- Ansible Playbooks: /home/homelab/organized/repos/homelab/ansible/automation/"
+        echo "- Individual Host Reports: /tmp/disaster_recovery_*.md"
+      register: master_plan
+
+    - name: Save master disaster recovery plan
+      copy:
+        content: "{{ master_plan.stdout }}"
+        dest: "/tmp/master_disaster_recovery_plan_{{ ansible_date_time.epoch }}.md"
+
+    - name: Display final summary
+      debug:
+        msg: |
+          🚨 Disaster Recovery Orchestration Complete!
+
+          📋 Generated Reports:
+          - Master Plan: /tmp/master_disaster_recovery_plan_{{ ansible_date_time.epoch }}.md
+          - Individual Reports: /tmp/disaster_recovery_*.md
+          - Recovery Plans: {{ dr_backup_root }}/recovery-plans/ (on Synology hosts)
+
+          🔧 Next Steps:
+          1. Review the master disaster recovery plan
+          2. Test recovery procedures in a safe environment
+          3. Schedule regular DR drills
+          4. Keep recovery documentation updated
--- a/ansible/automation/playbooks/disaster_recovery_test.yml
+++ b/ansible/automation/playbooks/disaster_recovery_test.yml
@@ -0,0 +1,521 @@
+---
+# Disaster Recovery Test Playbook
+# Test disaster recovery procedures and validate backup integrity
+# Usage: ansible-playbook playbooks/disaster_recovery_test.yml
+# Usage: ansible-playbook playbooks/disaster_recovery_test.yml -e "test_type=full"
+# Usage: ansible-playbook playbooks/disaster_recovery_test.yml -e "dry_run=true"
+
+- name: Disaster Recovery Test and Validation
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    test_type: "{{ test_type | default('basic') }}"  # basic, full, restore
+    dry_run: "{{ dry_run | default(true) }}"
+    backup_base_dir: "/volume1/backups"
+    test_restore_dir: "/tmp/dr_test"
+    validate_backups: "{{ validate_backups | default(true) }}"
+    test_failover: "{{ test_failover | default(false) }}"
+
+    # Critical services for DR testing
+    critical_services:
+      atlantis:
+        - name: "immich"
+          containers: ["immich-server", "immich-db", "immich-redis"]
+          data_paths: ["/volume1/docker/immich"]
+          backup_files: ["immich-db_*.sql.gz"]
+          recovery_priority: 1
+        - name: "vaultwarden"
+          containers: ["vaultwarden", "vaultwarden-db"]
+          data_paths: ["/volume1/docker/vaultwarden"]
+          backup_files: ["vaultwarden-db_*.sql.gz"]
+          recovery_priority: 1
+        - name: "plex"
+          containers: ["plex"]
+          data_paths: ["/volume1/docker/plex"]
+          backup_files: ["docker_configs_*.tar.gz"]
+          recovery_priority: 2
+      calypso:
+        - name: "authentik"
+          containers: ["authentik-server", "authentik-worker", "authentik-db"]
+          data_paths: ["/volume1/docker/authentik"]
+          backup_files: ["authentik-db_*.sql.gz"]
+          recovery_priority: 1
+      homelab_vm:
+        - name: "monitoring"
+          containers: ["grafana", "prometheus"]
+          data_paths: ["/opt/docker/grafana", "/opt/docker/prometheus"]
+          backup_files: ["docker_configs_*.tar.gz"]
+          recovery_priority: 2
+
+  tasks:
+    - name: Create DR test directory
+      file:
+        path: "{{ test_restore_dir }}/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+
+    - name: Get current critical services for this host
+      set_fact:
+        current_critical_services: "{{ critical_services.get(inventory_hostname, []) }}"
+
+    - name: Display DR test plan
+      debug:
+        msg: |
+          🚨 DISASTER RECOVERY TEST PLAN
+          ===============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Test Type: {{ test_type }}
+          🧪 Dry Run: {{ dry_run }}
+          💾 Validate Backups: {{ validate_backups }}
+          🔄 Test Failover: {{ test_failover }}
+
+          🎯 Critical Services: {{ current_critical_services | length }}
+          {% for service in current_critical_services %}
+          - {{ service.name }} (Priority {{ service.recovery_priority }})
+          {% endfor %}
+
+    - name: Pre-DR test system snapshot
+      shell: |
+        snapshot_file="{{ test_restore_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_pre_test_snapshot.txt"
+
+        echo "🚨 DISASTER RECOVERY PRE-TEST SNAPSHOT" > "$snapshot_file"
+        echo "=======================================" >> "$snapshot_file"
+        echo "Host: {{ inventory_hostname }}" >> "$snapshot_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$snapshot_file"
+        echo "Test Type: {{ test_type }}" >> "$snapshot_file"
+        echo "" >> "$snapshot_file"
+
+        echo "=== SYSTEM STATUS ===" >> "$snapshot_file"
+        echo "Uptime: $(uptime)" >> "$snapshot_file"
+        echo "Disk Usage:" >> "$snapshot_file"
+        df -h >> "$snapshot_file"
+        echo "" >> "$snapshot_file"
+
+        echo "=== RUNNING CONTAINERS ===" >> "$snapshot_file"
+        docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}" >> "$snapshot_file" 2>/dev/null || echo "Docker not available" >> "$snapshot_file"
+        echo "" >> "$snapshot_file"
+
+        echo "=== CRITICAL SERVICES STATUS ===" >> "$snapshot_file"
+        {% for service in current_critical_services %}
+        echo "--- {{ service.name }} ---" >> "$snapshot_file"
+        {% for container in service.containers %}
+        if docker ps --filter "name={{ container }}" --format "{{.Names}}" | grep -q "{{ container }}"; then
+          echo "✅ {{ container }}: Running" >> "$snapshot_file"
+        else
+          echo "❌ {{ container }}: Not running" >> "$snapshot_file"
+        fi
+        {% endfor %}
+        echo "" >> "$snapshot_file"
+        {% endfor %}
+
+        cat "$snapshot_file"
+      register: pre_test_snapshot
+      changed_when: false
+
+    - name: Validate backup availability and integrity
+      shell: |
+        echo "🔍 BACKUP VALIDATION"
+        echo "===================="
+
+        validation_results=()
+        total_backups=0
+        valid_backups=0
+
+        {% for service in current_critical_services %}
+        echo "📦 Validating {{ service.name }} backups..."
+
+        {% for backup_pattern in service.backup_files %}
+        echo "  Checking pattern: {{ backup_pattern }}"
+
+        # Find backup files matching pattern
+        backup_files=$(find {{ backup_base_dir }}/{{ inventory_hostname }} -name "{{ backup_pattern }}" -mtime -7 2>/dev/null | head -5)
+
+        if [ -n "$backup_files" ]; then
+          for backup_file in $backup_files; do
+            total_backups=$((total_backups + 1))
+            echo "    Found: $(basename $backup_file)"
+
+            # Validate backup integrity
+            if [[ "$backup_file" == *.gz ]]; then
+              if gzip -t "$backup_file" 2>/dev/null; then
+                echo "    ✅ Integrity: Valid"
+                valid_backups=$((valid_backups + 1))
+                validation_results+=("{{ service.name }}:$(basename $backup_file):valid")
+              else
+                echo "    ❌ Integrity: Corrupted"
+                validation_results+=("{{ service.name }}:$(basename $backup_file):corrupted")
+              fi
+            elif [[ "$backup_file" == *.tar* ]]; then
+              if tar -tf "$backup_file" >/dev/null 2>&1; then
+                echo "    ✅ Integrity: Valid"
+                valid_backups=$((valid_backups + 1))
+                validation_results+=("{{ service.name }}:$(basename $backup_file):valid")
+              else
+                echo "    ❌ Integrity: Corrupted"
+                validation_results+=("{{ service.name }}:$(basename $backup_file):corrupted")
+              fi
+            else
+              echo "    ℹ️  Integrity: Cannot validate format"
+              valid_backups=$((valid_backups + 1))  # Assume valid
+              validation_results+=("{{ service.name }}:$(basename $backup_file):assumed_valid")
+            fi
+
+            # Check backup age
+            backup_age=$(find "$backup_file" -mtime +1 | wc -l)
+            if [ $backup_age -eq 0 ]; then
+              echo "    ✅ Age: Recent (< 1 day)"
+            else
+              backup_days=$(( ($(date +%s) - $(stat -c %Y "$backup_file")) / 86400 ))
+              echo "    ⚠️  Age: $backup_days days old"
+            fi
+          done
+        else
+          echo "    ❌ No backups found for pattern: {{ backup_pattern }}"
+          validation_results+=("{{ service.name }}:{{ backup_pattern }}:not_found")
+        fi
+        {% endfor %}
+        echo ""
+        {% endfor %}
+
+        echo "📊 BACKUP VALIDATION SUMMARY:"
+        echo "Total backups checked: $total_backups"
+        echo "Valid backups: $valid_backups"
+        echo "Validation issues: $((total_backups - valid_backups))"
+
+        if [ $valid_backups -lt $total_backups ]; then
+          echo "🚨 BACKUP ISSUES DETECTED!"
+          for result in "${validation_results[@]}"; do
+            if [[ "$result" == *":corrupted" ]] || [[ "$result" == *":not_found" ]]; then
+              echo "  - $result"
+            fi
+          done
+        fi
+      register: backup_validation
+      when: validate_backups | bool
+
+    - name: Test database backup restore (dry run)
+      shell: |
+        echo "🔄 DATABASE RESTORE TEST"
+        echo "========================"
+
+        restore_results=()
+
+        {% for service in current_critical_services %}
+        {% if service.backup_files | select('match', '.*sql.*') | list | length > 0 %}
+        echo "🗄️  Testing {{ service.name }} database restore..."
+
+        # Find latest database backup
+        latest_backup=$(find {{ backup_base_dir }}/{{ inventory_hostname }} -name "*{{ service.name }}*db*.sql*" -mtime -7 2>/dev/null | sort -t_ -k2 -nr | head -1)
+
+        if [ -n "$latest_backup" ]; then
+          echo "  Using backup: $(basename $latest_backup)"
+
+          {% if dry_run %}
+          echo "  DRY RUN: Would restore database from $latest_backup"
+          echo "  DRY RUN: Would create test database for validation"
+          restore_results+=("{{ service.name }}:dry_run_success")
+          {% else %}
+          # Create test database and restore
+          test_db_name="dr_test_{{ service.name }}_{{ ansible_date_time.epoch }}"
+
+          # Find database container
+          db_container=""
+          {% for container in service.containers %}
+          if [[ "{{ container }}" == *"db"* ]]; then
+            db_container="{{ container }}"
+            break
+          fi
+          {% endfor %}
+
+          if [ -n "$db_container" ] && docker ps --filter "name=$db_container" --format "{{.Names}}" | grep -q "$db_container"; then
+            echo "  Creating test database: $test_db_name"
+
+            # Create test database
+            if docker exec "$db_container" createdb -U postgres "$test_db_name" 2>/dev/null; then
+              echo "  ✅ Test database created"
+
+              # Restore backup to test database
+              if [[ "$latest_backup" == *.gz ]]; then
+                if gunzip -c "$latest_backup" | docker exec -i "$db_container" psql -U postgres -d "$test_db_name" >/dev/null 2>&1; then
+                  echo "  ✅ Backup restored successfully"
+                  restore_results+=("{{ service.name }}:restore_success")
+                else
+                  echo "  ❌ Backup restore failed"
+                  restore_results+=("{{ service.name }}:restore_failed")
+                fi
+              else
+                if docker exec -i "$db_container" psql -U postgres -d "$test_db_name" < "$latest_backup" >/dev/null 2>&1; then
+                  echo "  ✅ Backup restored successfully"
+                  restore_results+=("{{ service.name }}:restore_success")
+                else
+                  echo "  ❌ Backup restore failed"
+                  restore_results+=("{{ service.name }}:restore_failed")
+                fi
+              fi
+
+              # Cleanup test database
+              docker exec "$db_container" dropdb -U postgres "$test_db_name" 2>/dev/null
+              echo "  🧹 Test database cleaned up"
+            else
+              echo "  ❌ Failed to create test database"
+              restore_results+=("{{ service.name }}:test_db_failed")
+            fi
+          else
+            echo "  ❌ Database container not found or not running"
+            restore_results+=("{{ service.name }}:db_container_unavailable")
+          fi
+          {% endif %}
+        else
+          echo "  ❌ No database backup found"
+          restore_results+=("{{ service.name }}:no_backup_found")
+        fi
+        echo ""
+        {% endif %}
+        {% endfor %}
+
+        echo "📊 RESTORE TEST SUMMARY:"
+        for result in "${restore_results[@]}"; do
+          echo "  - $result"
+        done
+      register: restore_test
+      when: test_type in ['full', 'restore']
+
+    - name: Test service failover procedures
+      shell: |
+        echo "🔄 SERVICE FAILOVER TEST"
+        echo "========================"
+
+        failover_results=()
+
+        {% if dry_run %}
+        echo "DRY RUN: Failover test simulation"
+
+        {% for service in current_critical_services %}
+        echo "📋 {{ service.name }} failover plan:"
+        echo "  1. Stop containers: {{ service.containers | join(', ') }}"
+        echo "  2. Backup current data"
+        echo "  3. Restore from backup"
+        echo "  4. Start containers"
+        echo "  5. Verify service functionality"
+        failover_results+=("{{ service.name }}:dry_run_planned")
+        echo ""
+        {% endfor %}
+        {% else %}
+        echo "⚠️  LIVE FAILOVER TEST - This will temporarily stop services!"
+
+        # Only test one non-critical service to avoid disruption
+        test_service=""
+        {% for service in current_critical_services %}
+        {% if service.recovery_priority > 1 %}
+        test_service="{{ service.name }}"
+        break
+        {% endif %}
+        {% endfor %}
+
+        if [ -n "$test_service" ]; then
+          echo "Testing failover for: $test_service"
+          # Implementation would go here for actual failover test
+          failover_results+=("$test_service:live_test_completed")
+        else
+          echo "No suitable service found for live failover test"
+          failover_results+=("no_service:live_test_skipped")
+        fi
+        {% endif %}
+
+        echo "📊 FAILOVER TEST SUMMARY:"
+        for result in "${failover_results[@]}"; do
+          echo "  - $result"
+        done
+      register: failover_test
+      when: test_failover | bool
+
+    - name: Test recovery time objectives (RTO)
+      shell: |
+        echo "⏱️  RECOVERY TIME OBJECTIVES TEST"
+        echo "================================="
+
+        rto_results=()
+
+        {% for service in current_critical_services %}
+        echo "📊 {{ service.name }} RTO Analysis:"
+
+        # Estimate recovery times based on service complexity
+        estimated_rto=0
+
+        # Base time for container startup
+        container_count={{ service.containers | length }}
+        estimated_rto=$((estimated_rto + container_count * 30))  # 30s per container
+
+        # Add time for database restore if applicable
+        {% if service.backup_files | select('match', '.*sql.*') | list | length > 0 %}
+        # Find backup size to estimate restore time
+        latest_backup=$(find {{ backup_base_dir }}/{{ inventory_hostname }} -name "*{{ service.name }}*db*.sql*" -mtime -7 2>/dev/null | sort -t_ -k2 -nr | head -1)
+        if [ -n "$latest_backup" ]; then
+          backup_size_mb=$(du -m "$latest_backup" | cut -f1)
+          restore_time=$((backup_size_mb / 10))  # Assume 10MB/s restore speed
+          estimated_rto=$((estimated_rto + restore_time))
+          echo "  Database backup size: ${backup_size_mb}MB"
+          echo "  Estimated restore time: ${restore_time}s"
+        fi
+        {% endif %}
+
+        # Add time for data volume restore
+        {% for data_path in service.data_paths %}
+        if [ -d "{{ data_path }}" ]; then
+          data_size_mb=$(du -sm "{{ data_path }}" 2>/dev/null | cut -f1 || echo "0")
+          if [ $data_size_mb -gt 1000 ]; then  # Only count large data directories
+            data_restore_time=$((data_size_mb / 50))  # Assume 50MB/s for file copy
+            estimated_rto=$((estimated_rto + data_restore_time))
+            echo "  Data directory {{ data_path }}: ${data_size_mb}MB"
+          fi
+        fi
+        {% endfor %}
+
+        echo "  Estimated RTO: ${estimated_rto}s ($(echo "scale=1; $estimated_rto/60" | bc 2>/dev/null || echo "N/A")m)"
+
+        # Define RTO targets
+        target_rto=0
+        case {{ service.recovery_priority }} in
+          1) target_rto=900 ;;   # 15 minutes for critical services
+          2) target_rto=1800 ;;  # 30 minutes for important services
+          *) target_rto=3600 ;;  # 1 hour for other services
+        esac
+
+        echo "  Target RTO: ${target_rto}s ($(echo "scale=1; $target_rto/60" | bc 2>/dev/null || echo "N/A")m)"
+
+        if [ $estimated_rto -le $target_rto ]; then
+          echo "  ✅ RTO within target"
+          rto_results+=("{{ service.name }}:rto_ok:${estimated_rto}s")
+        else
+          echo "  ⚠️  RTO exceeds target"
+          rto_results+=("{{ service.name }}:rto_exceeded:${estimated_rto}s")
+        fi
+        echo ""
+        {% endfor %}
+
+        echo "📊 RTO ANALYSIS SUMMARY:"
+        for result in "${rto_results[@]}"; do
+          echo "  - $result"
+        done
+      register: rto_analysis
+
+    - name: Generate DR test report
+      copy:
+        content: |
+          🚨 DISASTER RECOVERY TEST REPORT - {{ inventory_hostname }}
+          ========================================================
+
+          📅 Test Date: {{ ansible_date_time.iso8601 }}
+          🖥️  Host: {{ inventory_hostname }}
+          🔍 Test Type: {{ test_type }}
+          🧪 Dry Run: {{ dry_run }}
+
+          🎯 CRITICAL SERVICES TESTED: {{ current_critical_services | length }}
+          {% for service in current_critical_services %}
+          - {{ service.name }} (Priority {{ service.recovery_priority }})
+            Containers: {{ service.containers | join(', ') }}
+            Data Paths: {{ service.data_paths | join(', ') }}
+          {% endfor %}
+
+          📊 PRE-TEST SYSTEM STATUS:
+          {{ pre_test_snapshot.stdout }}
+
+          {% if validate_backups %}
+          💾 BACKUP VALIDATION:
+          {{ backup_validation.stdout }}
+          {% endif %}
+
+          {% if test_type in ['full', 'restore'] %}
+          🔄 RESTORE TESTING:
+          {{ restore_test.stdout }}
+          {% endif %}
+
+          {% if test_failover %}
+          🔄 FAILOVER TESTING:
+          {{ failover_test.stdout }}
+          {% endif %}
+
+          ⏱️  RTO ANALYSIS:
+          {{ rto_analysis.stdout }}
+
+          💡 RECOMMENDATIONS:
+          {% if 'BACKUP ISSUES DETECTED' in backup_validation.stdout %}
+          - 🚨 CRITICAL: Fix backup integrity issues immediately
+          {% endif %}
+          {% if 'restore_failed' in restore_test.stdout %}
+          - 🚨 CRITICAL: Database restore failures need investigation
+          {% endif %}
+          {% if 'rto_exceeded' in rto_analysis.stdout %}
+          - ⚠️  Optimize recovery procedures to meet RTO targets
+          {% endif %}
+          - 📅 Schedule regular DR tests (monthly recommended)
+          - 📋 Update DR procedures based on test results
+          - 🎓 Train team on DR procedures
+          - 📊 Monitor backup success rates
+          - 🔄 Test failover procedures in staging environment
+
+          🎯 DR READINESS SCORE:
+          {% set total_checks = 4 %}
+          {% set passed_checks = 0 %}
+          {% if 'BACKUP ISSUES DETECTED' not in backup_validation.stdout %}{% set passed_checks = passed_checks + 1 %}{% endif %}
+          {% if 'restore_failed' not in restore_test.stdout %}{% set passed_checks = passed_checks + 1 %}{% endif %}
+          {% if 'rto_exceeded' not in rto_analysis.stdout %}{% set passed_checks = passed_checks + 1 %}{% endif %}
+          {% set passed_checks = passed_checks + 1 %}  {# Always pass system status #}
+          Score: {{ passed_checks }}/{{ total_checks }} ({{ (passed_checks * 100 / total_checks) | round }}%)
+
+          {% if passed_checks == total_checks %}
+          ✅ EXCELLENT: DR procedures are ready
+          {% elif passed_checks >= 3 %}
+          🟡 GOOD: Minor improvements needed
+          {% else %}
+          🔴 NEEDS WORK: Significant DR issues detected
+          {% endif %}
+
+          ✅ DR TEST COMPLETE
+
+        dest: "{{ test_restore_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_dr_test_report.txt"
+
+    - name: Display DR test summary
+      debug:
+        msg: |
+
+          🚨 DISASTER RECOVERY TEST COMPLETE - {{ inventory_hostname }}
+          ======================================================
+
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Test Type: {{ test_type }}
+          🧪 Mode: {{ 'Dry Run' if dry_run else 'Live Test' }}
+
+          🎯 CRITICAL SERVICES: {{ current_critical_services | length }}
+
+          📊 TEST RESULTS:
+          {% if validate_backups %}
+          - Backup Validation: {{ '✅ Passed' if 'BACKUP ISSUES DETECTED' not in backup_validation.stdout else '❌ Issues Found' }}
+          {% endif %}
+          {% if test_type in ['full', 'restore'] %}
+          - Restore Testing: {{ '✅ Passed' if 'restore_failed' not in restore_test.stdout else '❌ Issues Found' }}
+          {% endif %}
+          - RTO Analysis: {{ '✅ Within Targets' if 'rto_exceeded' not in rto_analysis.stdout else '⚠️ Exceeds Targets' }}
+
+          📄 Full report: {{ test_restore_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_dr_test_report.txt
+
+          🔍 Next Steps:
+          {% if dry_run %}
+          - Run live test: -e "dry_run=false"
+          {% endif %}
+          - Address any identified issues
+          - Update DR procedures
+          - Schedule regular DR tests
+
+          ======================================================
+
+    - name: Send DR test alerts (if issues found)
+      debug:
+        msg: |
+          🚨 DR TEST ALERT - {{ inventory_hostname }}
+          Critical issues found in disaster recovery test!
+          Immediate attention required.
+      when:
+        - send_alerts | default(false) | bool
+        - ("BACKUP ISSUES DETECTED" in backup_validation.stdout) or ("restore_failed" in restore_test.stdout)
--- a/ansible/automation/playbooks/disk_usage_report.yml
+++ b/ansible/automation/playbooks/disk_usage_report.yml
@@ -0,0 +1,311 @@
+---
+# Disk Usage Report Playbook
+# Monitor storage usage across all hosts and generate comprehensive reports
+# Usage: ansible-playbook playbooks/disk_usage_report.yml
+# Usage: ansible-playbook playbooks/disk_usage_report.yml -e "alert_threshold=80"
+# Usage: ansible-playbook playbooks/disk_usage_report.yml -e "detailed_analysis=true"
+
+- name: Generate Comprehensive Disk Usage Report
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    alert_threshold: "{{ alert_threshold | default(85) }}"
+    warning_threshold: "{{ warning_threshold | default(75) }}"
+    detailed_analysis: "{{ detailed_analysis | default(false) }}"
+    report_dir: "/tmp/disk_reports"
+    include_docker_analysis: "{{ include_docker_analysis | default(true) }}"
+    top_directories_count: "{{ top_directories_count | default(10) }}"
+
+  tasks:
+    - name: Create report directory
+      file:
+        path: "{{ report_dir }}/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+
+    - name: Get basic disk usage
+      shell: df -h
+      register: disk_usage_basic
+      changed_when: false
+
+    - name: Get disk usage percentages
+      shell: df --output=source,pcent,avail,target | grep -v "Filesystem"
+      register: disk_usage_percent
+      changed_when: false
+
+    - name: Identify high usage filesystems
+      shell: |
+        df --output=source,pcent,target | awk 'NR>1 {gsub(/%/, "", $2); if ($2 >= {{ alert_threshold }}) print $0}'
+      register: high_usage_filesystems
+      changed_when: false
+
+    - name: Get inode usage
+      shell: df -i
+      register: inode_usage
+      changed_when: false
+
+    - name: Analyze Docker storage usage
+      shell: |
+        echo "=== DOCKER STORAGE ANALYSIS ==="
+        if command -v docker &> /dev/null; then
+          echo "Docker System Usage:"
+          docker system df 2>/dev/null || echo "Cannot access Docker"
+          echo ""
+
+          echo "Container Sizes:"
+          docker ps --format "table {{.Names}}\t{{.Size}}" 2>/dev/null || echo "Cannot access Docker containers"
+          echo ""
+
+          echo "Image Sizes:"
+          docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" 2>/dev/null | head -20 || echo "Cannot access Docker images"
+          echo ""
+
+          echo "Volume Usage:"
+          docker volume ls -q | xargs -I {} sh -c 'echo "Volume: {}"; docker volume inspect {} --format "{{.Mountpoint}}" | xargs du -sh 2>/dev/null || echo "Cannot access volume"' 2>/dev/null || echo "Cannot access Docker volumes"
+        else
+          echo "Docker not available"
+        fi
+      register: docker_storage_analysis
+      when: include_docker_analysis | bool
+      changed_when: false
+
+    - name: Find largest directories
+      shell: |
+        echo "=== TOP {{ top_directories_count }} LARGEST DIRECTORIES ==="
+
+        # Find largest directories in common locations
+        for path in / /var /opt /home /volume1 /volume2; do
+          if [ -d "$path" ]; then
+            echo "=== $path ==="
+            du -h "$path"/* 2>/dev/null | sort -hr | head -{{ top_directories_count }} || echo "Cannot analyze $path"
+            echo ""
+          fi
+        done
+      register: largest_directories
+      when: detailed_analysis | bool
+      changed_when: false
+
+    - name: Analyze log file sizes
+      shell: |
+        echo "=== LOG FILE ANALYSIS ==="
+
+        # System logs
+        echo "System Logs:"
+        find /var/log -type f -name "*.log" -exec du -h {} \; 2>/dev/null | sort -hr | head -10 || echo "Cannot access system logs"
+        echo ""
+
+        # Docker logs
+        echo "Docker Container Logs:"
+        if [ -d "/var/lib/docker/containers" ]; then
+          find /var/lib/docker/containers -name "*-json.log" -exec du -h {} \; 2>/dev/null | sort -hr | head -10 || echo "Cannot access Docker logs"
+        fi
+        echo ""
+
+        # Application logs
+        echo "Application Logs:"
+        find /volume1 /opt -name "*.log" -type f -exec du -h {} \; 2>/dev/null | sort -hr | head -10 || echo "No application logs found"
+      register: log_analysis
+      when: detailed_analysis | bool
+      changed_when: false
+
+    - name: Check for large files
+      shell: |
+        echo "=== LARGE FILES (>1GB) ==="
+        find / -type f -size +1G -exec du -h {} \; 2>/dev/null | sort -hr | head -20 || echo "No large files found or permission denied"
+      register: large_files
+      when: detailed_analysis | bool
+      changed_when: false
+
+    - name: Analyze temporary files
+      shell: |
+        echo "=== TEMPORARY FILES ANALYSIS ==="
+
+        for temp_dir in /tmp /var/tmp /volume1/tmp; do
+          if [ -d "$temp_dir" ]; then
+            echo "=== $temp_dir ==="
+            du -sh "$temp_dir" 2>/dev/null || echo "Cannot access $temp_dir"
+            echo "File count: $(find "$temp_dir" -type f 2>/dev/null | wc -l)"
+            echo "Oldest file: $(find "$temp_dir" -type f -printf '%T+ %p\n' 2>/dev/null | sort | head -1 | cut -d' ' -f2- || echo 'None')"
+            echo ""
+          fi
+        done
+      register: temp_files_analysis
+      changed_when: false
+
+    - name: Generate disk usage alerts
+      set_fact:
+        disk_alerts: []
+        disk_warnings: []
+
+    - name: Process disk usage alerts
+      set_fact:
+        disk_alerts: "{{ disk_alerts + [item] }}"
+      loop: "{{ disk_usage_percent.stdout_lines }}"
+      when:
+        - item.split()[1] | regex_replace('%', '') | int >= alert_threshold | int
+      vars:
+        usage_percent: "{{ item.split()[1] | regex_replace('%', '') | int }}"
+
+    - name: Process disk usage warnings
+      set_fact:
+        disk_warnings: "{{ disk_warnings + [item] }}"
+      loop: "{{ disk_usage_percent.stdout_lines }}"
+      when:
+        - item.split()[1] | regex_replace('%', '') | int >= warning_threshold | int
+        - item.split()[1] | regex_replace('%', '') | int < alert_threshold | int
+
+    - name: Create comprehensive report
+      copy:
+        content: |
+          📊 DISK USAGE REPORT - {{ inventory_hostname }}
+          =============================================
+
+          📅 Generated: {{ ansible_date_time.iso8601 }}
+          🖥️  Host: {{ inventory_hostname }}
+          💿 OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          ⚠️  Alert Threshold: {{ alert_threshold }}%
+          ⚡ Warning Threshold: {{ warning_threshold }}%
+
+          🚨 CRITICAL ALERTS (>={{ alert_threshold }}%):
+          {% if disk_alerts | length > 0 %}
+          {% for alert in disk_alerts %}
+          ❌ {{ alert }}
+          {% endfor %}
+          {% else %}
+          ✅ No critical disk usage alerts
+          {% endif %}
+
+          ⚠️  WARNINGS (>={{ warning_threshold }}%):
+          {% if disk_warnings | length > 0 %}
+          {% for warning in disk_warnings %}
+          🟡 {{ warning }}
+          {% endfor %}
+          {% else %}
+          ✅ No disk usage warnings
+          {% endif %}
+
+          💾 FILESYSTEM USAGE:
+          {{ disk_usage_basic.stdout }}
+
+          📁 INODE USAGE:
+          {{ inode_usage.stdout }}
+
+          🧹 TEMPORARY FILES:
+          {{ temp_files_analysis.stdout }}
+
+          {% if include_docker_analysis and docker_storage_analysis.stdout is defined %}
+          🐳 DOCKER STORAGE:
+          {{ docker_storage_analysis.stdout }}
+          {% endif %}
+
+          {% if detailed_analysis %}
+          {% if largest_directories.stdout is defined %}
+          📂 LARGEST DIRECTORIES:
+          {{ largest_directories.stdout }}
+          {% endif %}
+
+          {% if log_analysis.stdout is defined %}
+          📝 LOG FILES:
+          {{ log_analysis.stdout }}
+          {% endif %}
+
+          {% if large_files.stdout is defined %}
+          📦 LARGE FILES:
+          {{ large_files.stdout }}
+          {% endif %}
+          {% endif %}
+
+          💡 RECOMMENDATIONS:
+          {% if disk_alerts | length > 0 %}
+          - 🚨 IMMEDIATE ACTION REQUIRED: Clean up filesystems above {{ alert_threshold }}%
+          {% endif %}
+          {% if disk_warnings | length > 0 %}
+          - ⚠️  Monitor filesystems above {{ warning_threshold }}%
+          {% endif %}
+          - 🧹 Run cleanup playbook: ansible-playbook playbooks/cleanup_old_backups.yml
+          - 🐳 Prune Docker: ansible-playbook playbooks/prune_containers.yml
+          - 📝 Rotate logs: ansible-playbook playbooks/log_rotation.yml
+          - 🗑️  Clean temp files: find /tmp -type f -mtime +7 -delete
+
+          📊 SUMMARY:
+          - Total Filesystems: {{ disk_usage_percent.stdout_lines | length }}
+          - Critical Alerts: {{ disk_alerts | length }}
+          - Warnings: {{ disk_warnings | length }}
+          - Docker Analysis: {{ 'Included' if include_docker_analysis else 'Skipped' }}
+          - Detailed Analysis: {{ 'Included' if detailed_analysis else 'Skipped' }}
+
+        dest: "{{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_disk_report.txt"
+      delegate_to: localhost
+
+    - name: Create JSON report for automation
+      copy:
+        content: |
+          {
+            "timestamp": "{{ ansible_date_time.iso8601 }}",
+            "hostname": "{{ inventory_hostname }}",
+            "thresholds": {
+              "alert": {{ alert_threshold }},
+              "warning": {{ warning_threshold }}
+            },
+            "alerts": {{ disk_alerts | to_json }},
+            "warnings": {{ disk_warnings | to_json }},
+            "filesystems": {{ disk_usage_percent.stdout_lines | to_json }},
+            "summary": {
+              "total_filesystems": {{ disk_usage_percent.stdout_lines | length }},
+              "critical_count": {{ disk_alerts | length }},
+              "warning_count": {{ disk_warnings | length }},
+              "status": "{% if disk_alerts | length > 0 %}CRITICAL{% elif disk_warnings | length > 0 %}WARNING{% else %}OK{% endif %}"
+            }
+          }
+        dest: "{{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_disk_report.json"
+      delegate_to: localhost
+
+    - name: Display summary
+      debug:
+        msg: |
+
+          📊 DISK USAGE REPORT COMPLETE - {{ inventory_hostname }}
+          ================================================
+
+          {% if disk_alerts | length > 0 %}
+          🚨 CRITICAL ALERTS: {{ disk_alerts | length }}
+          {% for alert in disk_alerts %}
+          ❌ {{ alert }}
+          {% endfor %}
+          {% endif %}
+
+          {% if disk_warnings | length > 0 %}
+          ⚠️  WARNINGS: {{ disk_warnings | length }}
+          {% for warning in disk_warnings %}
+          🟡 {{ warning }}
+          {% endfor %}
+          {% endif %}
+
+          {% if disk_alerts | length == 0 and disk_warnings | length == 0 %}
+          ✅ All filesystems within normal usage levels
+          {% endif %}
+
+          📄 Reports saved to:
+          - {{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_disk_report.txt
+          - {{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_disk_report.json
+
+          🔍 Next Steps:
+          {% if disk_alerts | length > 0 %}
+          - Run cleanup: ansible-playbook playbooks/cleanup_old_backups.yml
+          - Prune Docker: ansible-playbook playbooks/prune_containers.yml
+          {% endif %}
+          - Schedule regular monitoring via cron
+
+          ================================================
+
+    - name: Send alert if critical usage detected
+      debug:
+        msg: |
+          🚨 CRITICAL DISK USAGE ALERT 🚨
+          Host: {{ inventory_hostname }}
+          Critical filesystems: {{ disk_alerts | length }}
+          Immediate action required!
+      when:
+        - disk_alerts | length > 0
+        - send_alerts | default(false) | bool
--- a/ansible/automation/playbooks/health_check.yml
+++ b/ansible/automation/playbooks/health_check.yml
@@ -0,0 +1,246 @@
+---
+- name: Comprehensive Health Check
+  hosts: all
+  gather_facts: yes
+  vars:
+    health_check_timestamp: "{{ ansible_date_time.iso8601 }}"
+    critical_services:
+      - docker
+      - ssh
+      - tailscaled
+    health_thresholds:
+      cpu_warning: 80
+      cpu_critical: 95
+      memory_warning: 85
+      memory_critical: 95
+      disk_warning: 85
+      disk_critical: 95
+
+  tasks:
+    - name: Create health check report directory
+      file:
+        path: "/tmp/health_reports"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check system uptime
+      shell: uptime -p
+      register: system_uptime
+      changed_when: false
+
+    - name: Check CPU usage
+      shell: |
+        top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1 | cut -d',' -f1
+      register: cpu_usage
+      changed_when: false
+
+    - name: Check memory usage
+      shell: |
+        free | awk 'NR==2{printf "%.1f", $3*100/$2}'
+      register: memory_usage
+      changed_when: false
+
+    - name: Check disk usage
+      shell: |
+        df -h / | awk 'NR==2{print $5}' | sed 's/%//'
+      register: disk_usage
+      changed_when: false
+
+    - name: Check load average
+      shell: |
+        uptime | awk -F'load average:' '{print $2}' | sed 's/^ *//'
+      register: load_average
+      changed_when: false
+
+    - name: Check critical services (systemd hosts only)
+      systemd:
+        name: "{{ item }}"
+      register: service_status
+      loop: "{{ critical_services }}"
+      ignore_errors: yes
+      when: ansible_service_mgr == "systemd"
+
+    - name: Check critical services via pgrep (non-systemd hosts — Synology DSM etc.)
+      shell: "pgrep -x {{ item }} >/dev/null 2>&1 && echo 'active' || echo 'inactive'"
+      register: service_status_pgrep
+      loop: "{{ critical_services }}"
+      changed_when: false
+      ignore_errors: yes
+      when: ansible_service_mgr != "systemd"
+
+    - name: Check Docker containers (if Docker is running)
+      shell: |
+        if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+          echo "Running: $(docker ps -q | wc -l)"
+          echo "Total: $(docker ps -aq | wc -l)"
+          echo "Unhealthy: $(docker ps --filter health=unhealthy -q | wc -l)"
+        else
+          echo "Docker not available"
+        fi
+      register: docker_status
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check network connectivity
+      shell: |
+        ping -c 1 8.8.8.8 >/dev/null 2>&1 && echo "OK" || echo "FAILED"
+      register: internet_check
+      changed_when: false
+
+    - name: Check Tailscale status
+      shell: |
+        if command -v tailscale >/dev/null 2>&1; then
+          tailscale status --json | jq -r '.Self.Online' 2>/dev/null || echo "unknown"
+        else
+          echo "not_installed"
+        fi
+      register: tailscale_status
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Evaluate health status
+      set_fact:
+        health_status:
+          overall: >-
+            {{
+              'CRITICAL' if (
+                (cpu_usage.stdout | float > health_thresholds.cpu_critical) or
+                (memory_usage.stdout | float > health_thresholds.memory_critical) or
+                (disk_usage.stdout | int > health_thresholds.disk_critical) or
+                (internet_check.stdout == "FAILED")
+              ) else 'WARNING' if (
+                (cpu_usage.stdout | float > health_thresholds.cpu_warning) or
+                (memory_usage.stdout | float > health_thresholds.memory_warning) or
+                (disk_usage.stdout | int > health_thresholds.disk_warning)
+              ) else 'HEALTHY'
+            }}
+          cpu: "{{ cpu_usage.stdout | float }}"
+          memory: "{{ memory_usage.stdout | float }}"
+          disk: "{{ disk_usage.stdout | int }}"
+          uptime: "{{ system_uptime.stdout }}"
+          load: "{{ load_average.stdout }}"
+          internet: "{{ internet_check.stdout }}"
+          tailscale: "{{ tailscale_status.stdout }}"
+
+    - name: Display health report
+      debug:
+        msg: |
+
+          ==========================================
+          🏥 HEALTH CHECK REPORT - {{ inventory_hostname }}
+          ==========================================
+
+          📊 OVERALL STATUS: {{ health_status.overall }}
+
+          🖥️  SYSTEM METRICS:
+          - Uptime: {{ health_status.uptime }}
+          - CPU Usage: {{ health_status.cpu }}%
+          - Memory Usage: {{ health_status.memory }}%
+          - Disk Usage: {{ health_status.disk }}%
+          - Load Average: {{ health_status.load }}
+
+          🌐 CONNECTIVITY:
+          - Internet: {{ health_status.internet }}
+          - Tailscale: {{ health_status.tailscale }}
+
+          🐳 DOCKER STATUS:
+          {{ docker_status.stdout }}
+
+          🔧 CRITICAL SERVICES:
+          {% if ansible_service_mgr == "systemd" and service_status is defined %}
+          {% for result in service_status.results %}
+          {% if result.status is defined and result.status.ActiveState is defined %}
+          - {{ result.item }}: {{ 'RUNNING' if result.status.ActiveState == 'active' else 'STOPPED' }}
+          {% elif not result.skipped | default(false) %}
+          - {{ result.item }}: UNKNOWN
+          {% endif %}
+          {% endfor %}
+          {% elif service_status_pgrep is defined %}
+          {% for result in service_status_pgrep.results %}
+          - {{ result.item }}: {{ 'RUNNING' if result.stdout == 'active' else 'STOPPED' }}
+          {% endfor %}
+          {% else %}
+          - Service status not available
+          {% endif %}
+
+          ==========================================
+
+    - name: Generate JSON health report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ health_check_timestamp }}",
+            "hostname": "{{ inventory_hostname }}",
+            "overall_status": "{{ health_status.overall }}",
+            "system": {
+              "uptime": "{{ health_status.uptime }}",
+              "cpu_usage": {{ health_status.cpu }},
+              "memory_usage": {{ health_status.memory }},
+              "disk_usage": {{ health_status.disk }},
+              "load_average": "{{ health_status.load }}"
+            },
+            "connectivity": {
+              "internet": "{{ health_status.internet }}",
+              "tailscale": "{{ health_status.tailscale }}"
+            },
+            "docker": "{{ docker_status.stdout | replace('\n', ' ') }}",
+            "services": [
+              {% if ansible_service_mgr == "systemd" and service_status is defined %}
+              {% set ns = namespace(first=true) %}
+              {% for result in service_status.results %}
+              {% if result.status is defined and result.status.ActiveState is defined %}
+              {% if not ns.first %},{% endif %}
+              {
+                "name": "{{ result.item }}",
+                "status": "{{ result.status.ActiveState }}",
+                "enabled": {{ (result.status.UnitFileState | default('unknown')) == "enabled" }}
+              }
+              {% set ns.first = false %}
+              {% endif %}
+              {% endfor %}
+              {% elif service_status_pgrep is defined %}
+              {% set ns = namespace(first=true) %}
+              {% for result in service_status_pgrep.results %}
+              {% if not ns.first %},{% endif %}
+              {
+                "name": "{{ result.item }}",
+                "status": "{{ result.stdout | default('unknown') }}",
+                "enabled": null
+              }
+              {% set ns.first = false %}
+              {% endfor %}
+              {% endif %}
+            ]
+          }
+        dest: "/tmp/health_reports/{{ inventory_hostname }}_health_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Send alert for critical status
+      shell: |
+        if command -v curl >/dev/null 2>&1; then
+          curl -d "🚨 CRITICAL: {{ inventory_hostname }} health check failed - {{ health_status.overall }}" \
+               -H "Title: Homelab Health Alert" \
+               -H "Priority: urgent" \
+               -H "Tags: warning,health" \
+               "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}" || true
+        fi
+      when: health_status.overall == "CRITICAL"
+      ignore_errors: yes
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          📋 Health check complete for {{ inventory_hostname }}
+          📊 Status: {{ health_status.overall }}
+          📄 Report saved to: /tmp/health_reports/{{ inventory_hostname }}_health_{{ ansible_date_time.epoch }}.json
+
+          {% if health_status.overall == "CRITICAL" %}
+          🚨 CRITICAL issues detected - immediate attention required!
+          {% elif health_status.overall == "WARNING" %}
+          ⚠️  WARNING conditions detected - monitoring recommended
+          {% else %}
+          ✅ System is healthy
+          {% endif %}
--- a/ansible/automation/playbooks/install_tools.yml
+++ b/ansible/automation/playbooks/install_tools.yml
@@ -0,0 +1,17 @@
+---
+- name: Install common diagnostic tools
+  hosts: all
+  become: true
+  tasks:
+    - name: Install essential packages
+      package:
+        name:
+          - htop
+          - curl
+          - wget
+          - net-tools
+          - iperf3
+          - ncdu
+          - vim
+          - git
+        state: present
--- a/ansible/automation/playbooks/log_rotation.yml
+++ b/ansible/automation/playbooks/log_rotation.yml
@@ -0,0 +1,347 @@
+---
+# Log Rotation and Cleanup Playbook
+# Manage log files across all services and system components
+# Usage: ansible-playbook playbooks/log_rotation.yml
+# Usage: ansible-playbook playbooks/log_rotation.yml -e "aggressive_cleanup=true"
+# Usage: ansible-playbook playbooks/log_rotation.yml -e "dry_run=true"
+
+- name: Log Rotation and Cleanup
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    _dry_run: "{{ dry_run | default(false) }}"
+    _aggressive_cleanup: "{{ aggressive_cleanup | default(false) }}"
+    _max_log_age_days: "{{ max_log_age_days | default(30) }}"
+    _max_log_size: "{{ max_log_size | default('100M') }}"
+    _keep_compressed_logs: "{{ keep_compressed_logs | default(true) }}"
+    _compress_old_logs: "{{ compress_old_logs | default(true) }}"
+
+  tasks:
+    - name: Create log cleanup report directory
+      file:
+        path: "/tmp/log_cleanup/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+
+    - name: Display log cleanup plan
+      debug:
+        msg: |
+          LOG ROTATION AND CLEANUP PLAN
+          ================================
+          Host: {{ inventory_hostname }}
+          Date: {{ ansible_date_time.date }}
+          Dry Run: {{ _dry_run }}
+          Aggressive: {{ _aggressive_cleanup }}
+          Max Age: {{ _max_log_age_days }} days
+          Max Size: {{ _max_log_size }}
+          Compress: {{ _compress_old_logs }}
+
+    - name: Analyze current log usage
+      shell: |
+        echo "=== LOG USAGE ANALYSIS ==="
+
+        echo "--- SYSTEM LOGS ---"
+        if [ -d "/var/log" ]; then
+          system_log_size=$(du -sh /var/log 2>/dev/null | cut -f1 || echo "0")
+          system_log_count=$(find /var/log -type f -name "*.log" 2>/dev/null | wc -l)
+          echo "System logs: $system_log_size ($system_log_count files)"
+          echo "Largest system logs:"
+          find /var/log -type f -name "*.log" -exec du -h {} \; 2>/dev/null | sort -hr | head -10 || echo "No system logs found"
+        fi
+
+        echo ""
+        echo "--- DOCKER CONTAINER LOGS ---"
+        if [ -d "/var/lib/docker/containers" ]; then
+          docker_log_size=$(du -sh /var/lib/docker/containers 2>/dev/null | cut -f1 || echo "0")
+          docker_log_count=$(find /var/lib/docker/containers -name "*-json.log" 2>/dev/null | wc -l)
+          echo "Docker logs: $docker_log_size ($docker_log_count files)"
+          echo "Largest container logs:"
+          find /var/lib/docker/containers -name "*-json.log" -exec du -h {} \; 2>/dev/null | sort -hr | head -10 || echo "No Docker logs found"
+        fi
+
+        echo ""
+        echo "--- APPLICATION LOGS ---"
+        for log_dir in /volume1/docker /opt/docker; do
+          if [ -d "$log_dir" ]; then
+            app_logs=$(timeout 15 find "$log_dir" -maxdepth 4 -name "*.log" -type f 2>/dev/null | head -20)
+            if [ -n "$app_logs" ]; then
+              echo "Application logs in $log_dir:"
+              echo "$app_logs" | while read log_file; do
+                if [ -f "$log_file" ]; then
+                  du -h "$log_file" 2>/dev/null || echo "Cannot access $log_file"
+                fi
+              done
+            fi
+          fi
+        done
+
+        echo ""
+        echo "--- LARGE LOG FILES (>{{ _max_log_size }}) ---"
+        timeout 15 find /var/log /var/lib/docker/containers -name "*.log" -size +{{ _max_log_size }} -type f 2>/dev/null | head -20 | while read large_log; do
+          du -h "$large_log" 2>/dev/null || echo "? $large_log"
+        done || echo "No large log files found"
+
+        echo ""
+        echo "--- OLD LOG FILES (>{{ _max_log_age_days }} days) ---"
+        old_logs=$(timeout 15 find /var/log /var/lib/docker/containers -name "*.log" -mtime +{{ _max_log_age_days }} -type f 2>/dev/null | wc -l)
+        echo "Old log files found: $old_logs"
+      register: log_analysis
+      changed_when: false
+
+    - name: Rotate system logs
+      shell: |
+        echo "=== SYSTEM LOG ROTATION ==="
+        rotated_list=""
+
+        {% if _dry_run %}
+        echo "DRY RUN: System log rotation simulation"
+        if command -v logrotate >/dev/null 2>&1; then
+          echo "Would run: logrotate -d /etc/logrotate.conf"
+          logrotate -d /etc/logrotate.conf 2>/dev/null | head -20 || echo "Logrotate config not found"
+        fi
+        {% else %}
+        if command -v logrotate >/dev/null 2>&1; then
+          echo "Running logrotate..."
+          logrotate -f /etc/logrotate.conf 2>/dev/null && echo "System log rotation completed" || echo "Logrotate had issues"
+          rotated_list="system_logs"
+        else
+          echo "Logrotate not available"
+        fi
+
+        for log_file in /var/log/syslog /var/log/auth.log /var/log/kern.log; do
+          if [ -f "$log_file" ]; then
+            file_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
+            if [ "$file_size" -gt 104857600 ]; then
+              echo "Rotating large log: $log_file"
+              {% if _compress_old_logs %}
+              gzip -c "$log_file" > "$log_file.$(date +%Y%m%d).gz" && > "$log_file"
+              {% else %}
+              cp "$log_file" "$log_file.$(date +%Y%m%d)" && > "$log_file"
+              {% endif %}
+              rotated_list="$rotated_list $(basename $log_file)"
+            fi
+          fi
+        done
+        {% endif %}
+
+        echo "ROTATION SUMMARY: $rotated_list"
+        if [ -z "$rotated_list" ]; then
+          echo "No logs needed rotation"
+        fi
+      register: system_log_rotation
+
+    - name: Manage Docker container logs
+      shell: |
+        echo "=== DOCKER LOG MANAGEMENT ==="
+        managed_count=0
+        total_space_saved=0
+
+        {% if _dry_run %}
+        echo "DRY RUN: Docker log management simulation"
+        large_logs=$(find /var/lib/docker/containers -name "*-json.log" -size +{{ _max_log_size }} 2>/dev/null)
+        if [ -n "$large_logs" ]; then
+          echo "Would truncate large container logs:"
+          echo "$large_logs" | while read log_file; do
+            size=$(du -h "$log_file" 2>/dev/null | cut -f1)
+            container_id=$(basename $(dirname "$log_file"))
+            container_name=$(docker ps -a --filter "id=$container_id" --format '{% raw %}{{.Names}}{% endraw %}' 2>/dev/null || echo "unknown")
+            echo "  - $container_name: $size"
+          done
+        else
+          echo "No large container logs found"
+        fi
+        {% else %}
+        find /var/lib/docker/containers -name "*-json.log" -size +{{ _max_log_size }} 2>/dev/null | while read log_file; do
+          if [ -f "$log_file" ]; then
+            container_id=$(basename $(dirname "$log_file"))
+            container_name=$(docker ps -a --filter "id=$container_id" --format '{% raw %}{{.Names}}{% endraw %}' 2>/dev/null || echo "unknown")
+            size_before=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
+            echo "Truncating log for container: $container_name"
+            tail -1000 "$log_file" > "$log_file.tmp" && mv "$log_file.tmp" "$log_file"
+            size_after=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
+            space_saved=$((size_before - size_after))
+            echo "  Truncated: $(echo $space_saved | numfmt --to=iec 2>/dev/null || echo ${space_saved}B) saved"
+          fi
+        done
+
+        {% if _aggressive_cleanup %}
+        echo "Cleaning old Docker log files..."
+        find /var/lib/docker/containers -name "*.log.*" -mtime +{{ _max_log_age_days }} -delete 2>/dev/null
+        {% endif %}
+        {% endif %}
+
+        echo "DOCKER LOG SUMMARY: done"
+      register: docker_log_management
+
+    - name: Clean up application logs
+      shell: |
+        echo "=== APPLICATION LOG CLEANUP ==="
+        cleaned_count=0
+
+        {% if _dry_run %}
+        echo "DRY RUN: Application log cleanup simulation"
+        for log_dir in /volume1/docker /opt/docker; do
+          if [ -d "$log_dir" ]; then
+            old_app_logs=$(timeout 15 find "$log_dir" -maxdepth 4 -name "*.log" -mtime +{{ _max_log_age_days }} -type f 2>/dev/null)
+            if [ -n "$old_app_logs" ]; then
+              echo "Would clean logs in $log_dir:"
+              echo "$old_app_logs" | head -10
+            fi
+          fi
+        done
+        {% else %}
+        for log_dir in /volume1/docker /opt/docker; do
+          if [ -d "$log_dir" ]; then
+            echo "Cleaning logs in $log_dir..."
+
+            {% if _compress_old_logs %}
+            find "$log_dir" -name "*.log" -mtime +7 -mtime -{{ _max_log_age_days }} -type f 2>/dev/null | while read log_file; do
+              if [ -f "$log_file" ]; then
+                gzip "$log_file" 2>/dev/null && echo "  Compressed: $(basename $log_file)"
+              fi
+            done
+            {% endif %}
+
+            old_logs_removed=$(find "$log_dir" -name "*.log" -mtime +{{ _max_log_age_days }} -type f -delete -print 2>/dev/null | wc -l)
+            {% if _keep_compressed_logs %}
+            max_gz_age=$(({{ _max_log_age_days }} * 2))
+            old_gz_removed=$(find "$log_dir" -name "*.log.gz" -mtime +$max_gz_age -type f -delete -print 2>/dev/null | wc -l)
+            {% else %}
+            old_gz_removed=$(find "$log_dir" -name "*.log.gz" -mtime +{{ _max_log_age_days }} -type f -delete -print 2>/dev/null | wc -l)
+            {% endif %}
+
+            if [ "$old_logs_removed" -gt 0 ] || [ "$old_gz_removed" -gt 0 ]; then
+              echo "  Cleaned $old_logs_removed logs, $old_gz_removed compressed logs"
+            fi
+          fi
+        done
+        {% endif %}
+
+        echo "APPLICATION CLEANUP SUMMARY: done"
+      register: app_log_cleanup
+
+    - name: Configure log rotation for services
+      shell: |
+        echo "=== LOG ROTATION CONFIGURATION ==="
+        config_changed="no"
+
+        {% if _dry_run %}
+        echo "DRY RUN: Would configure log rotation"
+        {% else %}
+        logrotate_config="/etc/logrotate.d/docker-containers"
+
+        if [ ! -f "$logrotate_config" ]; then
+          echo "Creating Docker container log rotation config..."
+          printf '%s\n' '/var/lib/docker/containers/*/*.log {' '    rotate 7' '    daily' '    compress' '    size 100M' '    missingok' '    delaycompress' '    copytruncate' '}' > "$logrotate_config"
+          config_changed="yes"
+          echo "  Docker container log rotation configured"
+        fi
+
+        docker_config="/etc/docker/daemon.json"
+        if [ -f "$docker_config" ]; then
+          if ! grep -q "log-driver" "$docker_config" 2>/dev/null; then
+            echo "Docker daemon log configuration recommended"
+            cp "$docker_config" "$docker_config.backup.$(date +%Y%m%d)"
+            echo "  Manual Docker daemon config update recommended"
+            echo '  Add: "log-driver": "json-file", "log-opts": {"max-size": "{{ _max_log_size }}", "max-file": "3"}'
+          fi
+        fi
+        {% endif %}
+
+        echo "CONFIGURATION SUMMARY: config_changed=$config_changed"
+      register: log_rotation_config
+
+    - name: Generate log cleanup report
+      copy:
+        content: |
+          LOG ROTATION AND CLEANUP REPORT - {{ inventory_hostname }}
+          ==========================================================
+
+          Cleanup Date: {{ ansible_date_time.iso8601 }}
+          Host: {{ inventory_hostname }}
+          Dry Run: {{ _dry_run }}
+          Aggressive Mode: {{ _aggressive_cleanup }}
+          Max Age: {{ _max_log_age_days }} days
+          Max Size: {{ _max_log_size }}
+
+          LOG USAGE ANALYSIS:
+          {{ log_analysis.stdout }}
+
+          SYSTEM LOG ROTATION:
+          {{ system_log_rotation.stdout }}
+
+          DOCKER LOG MANAGEMENT:
+          {{ docker_log_management.stdout }}
+
+          APPLICATION LOG CLEANUP:
+          {{ app_log_cleanup.stdout }}
+
+          CONFIGURATION UPDATES:
+          {{ log_rotation_config.stdout }}
+
+          RECOMMENDATIONS:
+          - Schedule regular log rotation via cron
+          - Monitor disk usage: ansible-playbook playbooks/disk_usage_report.yml
+          - Configure application-specific log rotation
+          - Set up log monitoring and alerting
+          {% if not _dry_run %}
+          - Verify services are functioning after log cleanup
+          {% endif %}
+
+          CLEANUP COMPLETE
+
+        dest: "/tmp/log_cleanup/{{ ansible_date_time.date }}/{{ inventory_hostname }}_log_cleanup_report.txt"
+
+    - name: Display log cleanup summary
+      debug:
+        msg: |
+
+          LOG CLEANUP COMPLETE - {{ inventory_hostname }}
+          ==========================================
+
+          Date: {{ ansible_date_time.date }}
+          Mode: {{ 'Dry Run' if _dry_run else 'Live Cleanup' }}
+          Aggressive: {{ _aggressive_cleanup }}
+
+          ACTIONS TAKEN:
+          {{ system_log_rotation.stdout | regex_replace('\n.*', '') }}
+          {{ docker_log_management.stdout | regex_replace('\n.*', '') }}
+          {{ app_log_cleanup.stdout | regex_replace('\n.*', '') }}
+
+          Full report: /tmp/log_cleanup/{{ ansible_date_time.date }}/{{ inventory_hostname }}_log_cleanup_report.txt
+
+          Next Steps:
+          {% if _dry_run %}
+          - Run without dry_run to perform actual cleanup
+          {% endif %}
+          - Monitor disk usage improvements
+          - Schedule regular log rotation
+          - Verify service functionality
+
+          ==========================================
+
+    - name: Restart services if needed
+      shell: |
+        echo "=== SERVICE RESTART CHECK ==="
+        restart_needed="no"
+
+        if systemctl is-active --quiet rsyslog 2>/dev/null && echo "{{ system_log_rotation.stdout }}" | grep -q "system_logs"; then
+          restart_needed="yes"
+          {% if not _dry_run %}
+          echo "Restarting rsyslog..."
+          systemctl restart rsyslog && echo "  rsyslog restarted" || echo "  Failed to restart rsyslog"
+          {% else %}
+          echo "DRY RUN: Would restart rsyslog"
+          {% endif %}
+        fi
+
+        if echo "{{ log_rotation_config.stdout }}" | grep -q "docker"; then
+          echo "Docker daemon config changed - manual restart may be needed"
+          echo "  Run: sudo systemctl restart docker"
+        fi
+
+        if [ "$restart_needed" = "no" ]; then
+          echo "No services need restarting"
+        fi
+      register: service_restart
+      when: restart_services | default(true) | bool
--- a/ansible/automation/playbooks/network_connectivity.yml
+++ b/ansible/automation/playbooks/network_connectivity.yml
@@ -0,0 +1,234 @@
+---
+# Network Connectivity Playbook
+# Full mesh connectivity check: Tailscale status, ping matrix, SSH port reachability,
+# HTTP endpoint checks, and per-host JSON reports.
+# Usage: ansible-playbook playbooks/network_connectivity.yml
+# Usage: ansible-playbook playbooks/network_connectivity.yml -e "host_target=synology"
+
+- name: Network Connectivity Check
+  hosts: "{{ host_target | default('active') }}"
+  gather_facts: yes
+  ignore_unreachable: true
+
+  vars:
+    ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"
+    report_dir: "/tmp/connectivity_reports"
+    ts_candidates:
+      - /usr/bin/tailscale
+      - /var/packages/Tailscale/target/bin/tailscale
+    http_endpoints:
+      - name: Portainer
+        url: "http://100.67.40.126:9000"
+      - name: Gitea
+        url: "http://100.67.40.126:3000"
+      - name: Immich
+        url: "http://100.67.40.126:2283"
+      - name: Home Assistant
+        url: "http://100.112.186.90:8123"
+
+  tasks:
+
+    # ---------- Setup ----------
+
+    - name: Create connectivity report directory
+      ansible.builtin.file:
+        path: "{{ report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    # ---------- Tailscale detection ----------
+
+    - name: Detect Tailscale binary path (first candidate that exists)
+      ansible.builtin.shell: |
+        for p in {{ ts_candidates | join(' ') }}; do
+          [ -x "$p" ] && echo "$p" && exit 0
+        done
+        echo ""
+      register: ts_bin
+      changed_when: false
+      failed_when: false
+
+    - name: Get Tailscale status JSON (if binary found)
+      ansible.builtin.command: "{{ ts_bin.stdout }} status --json"
+      register: ts_status_raw
+      changed_when: false
+      failed_when: false
+      when: ts_bin.stdout | length > 0
+
+    - name: Parse Tailscale status JSON
+      ansible.builtin.set_fact:
+        ts_parsed: "{{ ts_status_raw.stdout | from_json }}"
+      when:
+        - ts_bin.stdout | length > 0
+        - ts_status_raw.rc is defined
+        - ts_status_raw.rc == 0
+        - ts_status_raw.stdout | length > 0
+        - ts_status_raw.stdout is search('{')
+
+    - name: Extract Tailscale BackendState and first IP
+      ansible.builtin.set_fact:
+        ts_backend_state: "{{ ts_parsed.BackendState | default('unknown') }}"
+        ts_first_ip: "{{ (ts_parsed.Self.TailscaleIPs | default([]))[0] | default('n/a') }}"
+      when: ts_parsed is defined
+
+    - name: Set Tailscale defaults when binary not found or parse failed
+      ansible.builtin.set_fact:
+        ts_backend_state: "{{ ts_backend_state | default('not_installed') }}"
+        ts_first_ip: "{{ ts_first_ip | default('n/a') }}"
+
+    # ---------- Ping matrix (all active hosts except self) ----------
+
+    - name: Ping all other active hosts (2 pings, 2s timeout)
+      ansible.builtin.command: >
+        ping -c 2 -W 2 {{ hostvars[item]['ansible_host'] }}
+      register: ping_results
+      loop: "{{ groups['active'] | difference([inventory_hostname]) }}"
+      loop_control:
+        label: "{{ item }} ({{ hostvars[item]['ansible_host'] }})"
+      changed_when: false
+      failed_when: false
+
+    - name: Build ping summary map
+      ansible.builtin.set_fact:
+        ping_map: >-
+          {{
+            ping_map | default({}) | combine({
+              item.item: {
+                'host': hostvars[item.item]['ansible_host'],
+                'rc': item.rc,
+                'status': 'OK' if item.rc == 0 else 'FAIL'
+              }
+            })
+          }}
+      loop: "{{ ping_results.results }}"
+      loop_control:
+        label: "{{ item.item }}"
+
+    - name: Identify failed ping targets
+      ansible.builtin.set_fact:
+        failed_ping_peers: >-
+          {{
+            ping_results.results
+            | selectattr('rc', 'ne', 0)
+            | map(attribute='item')
+            | list
+          }}
+
+    # ---------- SSH port reachability ----------
+
+    - name: Check SSH port reachability for all other active hosts
+      ansible.builtin.command: >
+        nc -z -w 3
+        {{ hostvars[item]['ansible_host'] }}
+        {{ hostvars[item]['ansible_port'] | default(22) }}
+      register: ssh_results
+      loop: "{{ groups['active'] | difference([inventory_hostname]) }}"
+      loop_control:
+        label: "{{ item }} ({{ hostvars[item]['ansible_host'] }}:{{ hostvars[item]['ansible_port'] | default(22) }})"
+      changed_when: false
+      failed_when: false
+
+    - name: Build SSH reachability summary map
+      ansible.builtin.set_fact:
+        ssh_map: >-
+          {{
+            ssh_map | default({}) | combine({
+              item.item: {
+                'host': hostvars[item.item]['ansible_host'],
+                'port': hostvars[item.item]['ansible_port'] | default(22),
+                'rc': item.rc,
+                'status': 'OK' if item.rc == 0 else 'FAIL'
+              }
+            })
+          }}
+      loop: "{{ ssh_results.results }}"
+      loop_control:
+        label: "{{ item.item }}"
+
+    # ---------- Per-host connectivity summary ----------
+
+    - name: Display per-host connectivity summary
+      ansible.builtin.debug:
+        msg: |
+          ==========================================
+          CONNECTIVITY SUMMARY: {{ inventory_hostname }}
+          ==========================================
+          Tailscale:
+            binary:        {{ ts_bin.stdout if ts_bin.stdout | length > 0 else 'not found' }}
+            backend_state: {{ ts_backend_state }}
+            first_ip:      {{ ts_first_ip }}
+
+          Ping matrix (from {{ inventory_hostname }}):
+          {% for peer, result in (ping_map | default({})).items() %}
+            {{ peer }} ({{ result.host }}): {{ result.status }}
+          {% endfor %}
+
+          SSH port reachability (from {{ inventory_hostname }}):
+          {% for peer, result in (ssh_map | default({})).items() %}
+            {{ peer }} ({{ result.host }}:{{ result.port }}): {{ result.status }}
+          {% endfor %}
+          ==========================================
+
+    # ---------- HTTP endpoint checks (run once from localhost) ----------
+
+    - name: Check HTTP endpoints
+      ansible.builtin.uri:
+        url: "{{ item.url }}"
+        method: GET
+        status_code: [200, 301, 302, 401, 403]
+        timeout: 10
+        validate_certs: false
+      register: http_results
+      loop: "{{ http_endpoints }}"
+      loop_control:
+        label: "{{ item.name }} ({{ item.url }})"
+      delegate_to: localhost
+      run_once: true
+      failed_when: false
+
+    - name: Display HTTP endpoint results
+      ansible.builtin.debug:
+        msg: |
+          ==========================================
+          HTTP ENDPOINT RESULTS
+          ==========================================
+          {% for result in http_results.results %}
+          {{ result.item.name }} ({{ result.item.url }}):
+            status: {{ result.status | default('UNREACHABLE') }}
+            ok:     {{ 'YES' if result.status is defined and result.status in [200, 301, 302, 401, 403] else 'NO' }}
+          {% endfor %}
+          ==========================================
+      delegate_to: localhost
+      run_once: true
+
+    # ---------- ntfy alert for failed ping peers ----------
+
+    - name: Send ntfy alert when peers fail ping
+      ansible.builtin.uri:
+        url: "{{ ntfy_url }}"
+        method: POST
+        body: |
+          Host {{ inventory_hostname }} detected {{ failed_ping_peers | length }} unreachable peer(s):
+          {% for peer in failed_ping_peers %}
+          - {{ peer }} ({{ hostvars[peer]['ansible_host'] }})
+          {% endfor %}
+          Checked at {{ ansible_date_time.iso8601 }}
+        headers:
+          Title: "Homelab Network Alert"
+          Priority: "high"
+          Tags: "warning,network"
+        status_code: [200, 204]
+      delegate_to: localhost
+      failed_when: false
+      when: failed_ping_peers | default([]) | length > 0
+
+    # ---------- Per-host JSON report ----------
+
+    - name: Write per-host JSON connectivity report
+      ansible.builtin.copy:
+        content: "{{ {'timestamp': ansible_date_time.iso8601, 'hostname': inventory_hostname, 'tailscale': {'binary': ts_bin.stdout | default('') | trim, 'backend_state': ts_backend_state, 'first_ip': ts_first_ip}, 'ping_matrix': ping_map | default({}), 'ssh_reachability': ssh_map | default({}), 'failed_ping_peers': failed_ping_peers | default([])} | to_nice_json }}"
+        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
+      delegate_to: localhost
+      changed_when: false
--- a/ansible/automation/playbooks/ntp_check.yml
+++ b/ansible/automation/playbooks/ntp_check.yml
@@ -0,0 +1,226 @@
+---
+# NTP Check Playbook
+# Read-only audit of time synchronisation across all hosts.
+# Reports the active NTP daemon, current clock offset in milliseconds,
+# and fires ntfy alerts for hosts that exceed the warn/critical thresholds.
+# Usage: ansible-playbook playbooks/ntp_check.yml
+# Usage: ansible-playbook playbooks/ntp_check.yml -e "host_target=rpi"
+# Usage: ansible-playbook playbooks/ntp_check.yml -e "warn_offset_ms=200 critical_offset_ms=500"
+
+- name: NTP Time Sync Check
+  hosts: "{{ host_target | default('active') }}"
+  gather_facts: yes
+  ignore_unreachable: true
+
+  vars:
+    ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"
+    report_dir: "/tmp/ntp_reports"
+    warn_offset_ms: "{{ warn_offset_ms | default(500) }}"
+    critical_offset_ms: "{{ critical_offset_ms | default(1000) }}"
+
+  tasks:
+
+    # ---------- Setup ----------
+
+    - name: Create NTP report directory
+      ansible.builtin.file:
+        path: "{{ report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    # ---------- Detect active NTP daemon ----------
+
+    - name: Detect active NTP daemon
+      ansible.builtin.shell: |
+        if command -v chronyc >/dev/null 2>&1 && chronyc tracking >/dev/null 2>&1; then echo "chrony"
+        elif timedatectl show-timesync 2>/dev/null | grep -q ServerName; then echo "timesyncd"
+        elif timedatectl 2>/dev/null | grep -q "NTP service: active"; then echo "timesyncd"
+        elif command -v ntpq >/dev/null 2>&1 && ntpq -p >/dev/null 2>&1; then echo "ntpd"
+        else echo "unknown"
+        fi
+      register: ntp_impl
+      changed_when: false
+      failed_when: false
+
+    # ---------- Chrony offset collection ----------
+
+    - name: Get chrony tracking info (full)
+      ansible.builtin.shell: chronyc tracking 2>/dev/null
+      register: chrony_tracking
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "chrony"
+
+    - name: Parse chrony offset in ms
+      ansible.builtin.shell: >
+        chronyc tracking 2>/dev/null
+        | grep "System time"
+        | awk '{sign=($6=="slow")?-1:1; printf "%.3f", sign * $4 * 1000}'
+      register: chrony_offset_raw
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "chrony"
+
+    - name: Get chrony sync sources
+      ansible.builtin.shell: chronyc sources -v 2>/dev/null | grep "^\^" | head -3
+      register: chrony_sources
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "chrony"
+
+    # ---------- timesyncd offset collection ----------
+
+    - name: Get timesyncd status
+      ansible.builtin.shell: timedatectl show-timesync 2>/dev/null || timedatectl 2>/dev/null
+      register: timesyncd_status
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "timesyncd"
+
+    - name: Parse timesyncd offset from journal (ms)
+      ansible.builtin.shell: |
+        raw=$(journalctl -u systemd-timesyncd --since "5 minutes ago" -n 20 --no-pager 2>/dev/null \
+          | grep -oE 'offset[=: ][+-]?[0-9]+(\.[0-9]+)?(ms|us|s)' \
+          | tail -1)
+        if [ -z "$raw" ]; then
+          echo "0"
+          exit 0
+        fi
+        num=$(echo "$raw" | grep -oE '[+-]?[0-9]+(\.[0-9]+)?')
+        unit=$(echo "$raw" | grep -oE '(ms|us|s)$')
+        if [ "$unit" = "us" ]; then
+          awk "BEGIN {printf \"%.3f\", $num / 1000}"
+        elif [ "$unit" = "s" ]; then
+          awk "BEGIN {printf \"%.3f\", $num * 1000}"
+        else
+          printf "%.3f" "$num"
+        fi
+      register: timesyncd_offset_raw
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "timesyncd"
+
+    # ---------- ntpd offset collection ----------
+
+    - name: Get ntpd peer table
+      ansible.builtin.shell: ntpq -pn 2>/dev/null | head -10
+      register: ntpd_peers
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "ntpd"
+
+    - name: Parse ntpd offset in ms
+      ansible.builtin.shell: >
+        ntpq -p 2>/dev/null
+        | awk 'NR>2 && /^\*/ {printf "%.3f", $9; exit}'
+        || echo "0"
+      register: ntpd_offset_raw
+      changed_when: false
+      failed_when: false
+      when: ntp_impl.stdout | trim == "ntpd"
+
+    # ---------- Unified offset fact ----------
+
+    - name: Set unified ntp_offset_ms fact
+      ansible.builtin.set_fact:
+        ntp_offset_ms: >-
+          {%- set impl = ntp_impl.stdout | trim -%}
+          {%- if impl == "chrony" -%}
+            {{ (chrony_offset_raw.stdout | default('0') | trim) | float }}
+          {%- elif impl == "timesyncd" -%}
+            {{ (timesyncd_offset_raw.stdout | default('0') | trim) | float }}
+          {%- elif impl == "ntpd" -%}
+            {{ (ntpd_offset_raw.stdout | default('0') | trim) | float }}
+          {%- else -%}
+            0
+          {%- endif -%}
+
+    # ---------- Determine sync status ----------
+
+    - name: Determine NTP sync status (OK / WARN / CRITICAL)
+      ansible.builtin.set_fact:
+        ntp_status: >-
+          {%- if ntp_offset_ms | float | abs >= critical_offset_ms | float -%}
+            CRITICAL
+          {%- elif ntp_offset_ms | float | abs >= warn_offset_ms | float -%}
+            WARN
+          {%- else -%}
+            OK
+          {%- endif -%}
+
+    # ---------- Per-host summary ----------
+
+    - name: Display per-host NTP summary
+      ansible.builtin.debug:
+        msg: |
+          ==========================================
+          NTP SUMMARY: {{ inventory_hostname }}
+          ==========================================
+          Daemon:     {{ ntp_impl.stdout | trim }}
+          Offset:     {{ ntp_offset_ms }} ms
+          Status:     {{ ntp_status }}
+          Thresholds: WARN >= {{ warn_offset_ms }} ms  |  CRITICAL >= {{ critical_offset_ms }} ms
+
+          Raw details:
+          {% if ntp_impl.stdout | trim == "chrony" %}
+          --- chronyc tracking ---
+          {{ chrony_tracking.stdout | default('n/a') }}
+          --- chronyc sources ---
+          {{ chrony_sources.stdout | default('n/a') }}
+          {% elif ntp_impl.stdout | trim == "timesyncd" %}
+          --- timedatectl show-timesync ---
+          {{ timesyncd_status.stdout | default('n/a') }}
+          {% elif ntp_impl.stdout | trim == "ntpd" %}
+          --- ntpq peers ---
+          {{ ntpd_peers.stdout | default('n/a') }}
+          {% else %}
+          (no NTP tool found — offset assumed 0)
+          {% endif %}
+          ==========================================
+
+    # ---------- ntfy alert ----------
+
+    - name: Send ntfy alert for hosts exceeding warn threshold
+      ansible.builtin.uri:
+        url: "{{ ntfy_url }}"
+        method: POST
+        body: |
+          Host {{ inventory_hostname }} has NTP offset of {{ ntp_offset_ms }} ms ({{ ntp_status }}).
+          Daemon: {{ ntp_impl.stdout | trim }}
+          Thresholds: WARN >= {{ warn_offset_ms }} ms | CRITICAL >= {{ critical_offset_ms }} ms
+          Checked at {{ ansible_date_time.iso8601 }}
+        headers:
+          Title: "Homelab NTP Alert"
+          Priority: "{{ 'urgent' if ntp_status == 'CRITICAL' else 'high' }}"
+          Tags: "warning,clock"
+        status_code: [200, 204]
+      delegate_to: localhost
+      failed_when: false
+      when: ntp_status in ['WARN', 'CRITICAL']
+
+    # ---------- Per-host JSON report ----------
+
+    - name: Write per-host JSON NTP report
+      ansible.builtin.copy:
+        content: "{{ {
+            'timestamp': ansible_date_time.iso8601,
+            'hostname': inventory_hostname,
+            'ntp_daemon': ntp_impl.stdout | trim,
+            'offset_ms': ntp_offset_ms | float,
+            'status': ntp_status,
+            'thresholds': {
+              'warn_ms': warn_offset_ms,
+              'critical_ms': critical_offset_ms
+            },
+            'raw': {
+              'chrony_tracking': chrony_tracking.stdout | default('') | trim,
+              'chrony_sources': chrony_sources.stdout | default('') | trim,
+              'timesyncd_status': timesyncd_status.stdout | default('') | trim,
+              'ntpd_peers': ntpd_peers.stdout | default('') | trim
+            }
+          } | to_nice_json }}"
+        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
+      delegate_to: localhost
+      changed_when: false
--- a/ansible/automation/playbooks/prometheus_target_discovery.yml
+++ b/ansible/automation/playbooks/prometheus_target_discovery.yml
@@ -0,0 +1,320 @@
+---
+# Prometheus Target Discovery
+# Auto-discovers containers for monitoring and validates coverage
+# Run with: ansible-playbook -i hosts.ini playbooks/prometheus_target_discovery.yml
+
+- name: Prometheus Target Discovery
+  hosts: all
+  gather_facts: yes
+  vars:
+    prometheus_port: 9090
+    node_exporter_port: 9100
+    cadvisor_port: 8080
+    snmp_exporter_port: 9116
+
+    # Expected exporters by host type
+    expected_exporters:
+      synology:
+        - "node_exporter"
+        - "snmp_exporter"
+      debian_clients:
+        - "node_exporter"
+      hypervisors:
+        - "node_exporter"
+        - "cadvisor"
+
+  tasks:
+    - name: Scan for running exporters
+      shell: |
+        echo "=== Exporter Discovery on {{ inventory_hostname }} ==="
+
+        # Check for node_exporter
+        if netstat -tlnp 2>/dev/null | grep -q ":{{ node_exporter_port }} "; then
+          echo "✓ node_exporter: Port {{ node_exporter_port }} ($(netstat -tlnp 2>/dev/null | grep ":{{ node_exporter_port }} " | awk '{print $7}' | cut -d'/' -f2))"
+        else
+          echo "✗ node_exporter: Not found on port {{ node_exporter_port }}"
+        fi
+
+        # Check for cAdvisor
+        if netstat -tlnp 2>/dev/null | grep -q ":{{ cadvisor_port }} "; then
+          echo "✓ cAdvisor: Port {{ cadvisor_port }}"
+        else
+          echo "✗ cAdvisor: Not found on port {{ cadvisor_port }}"
+        fi
+
+        # Check for SNMP exporter
+        if netstat -tlnp 2>/dev/null | grep -q ":{{ snmp_exporter_port }} "; then
+          echo "✓ snmp_exporter: Port {{ snmp_exporter_port }}"
+        else
+          echo "✗ snmp_exporter: Not found on port {{ snmp_exporter_port }}"
+        fi
+
+        # Check for custom exporters
+        echo ""
+        echo "=== Custom Exporters ==="
+        netstat -tlnp 2>/dev/null | grep -E ":91[0-9][0-9] " | while read line; do
+          port=$(echo "$line" | awk '{print $4}' | cut -d':' -f2)
+          process=$(echo "$line" | awk '{print $7}' | cut -d'/' -f2)
+          echo "Found exporter on port $port: $process"
+        done
+      register: exporter_scan
+
+    - name: Get Docker containers with exposed ports
+      shell: |
+        echo "=== Container Port Mapping ==="
+        if command -v docker >/dev/null 2>&1; then
+          docker ps --format "table {{ '{{' }}.Names{{ '}}' }}\t{{ '{{' }}.Ports{{ '}}' }}" | grep -E ":[0-9]+->|:[0-9]+/tcp" | while IFS=$'\t' read name ports; do
+            echo "Container: $name"
+            echo "Ports: $ports"
+            echo "---"
+          done
+        else
+          echo "Docker not available"
+        fi
+      register: container_ports
+      become: yes
+
+    - name: Test Prometheus metrics endpoints
+      uri:
+        url: "http://{{ ansible_default_ipv4.address }}:{{ item }}/metrics"
+        method: GET
+        timeout: 5
+      register: metrics_test
+      loop:
+        - "{{ node_exporter_port }}"
+        - "{{ cadvisor_port }}"
+        - "{{ snmp_exporter_port }}"
+      failed_when: false
+
+    - name: Analyze metrics endpoints
+      set_fact:
+        available_endpoints: "{{ metrics_test.results | selectattr('status', 'defined') | selectattr('status', 'equalto', 200) | map(attribute='item') | list }}"
+        failed_endpoints: "{{ metrics_test.results | rejectattr('status', 'defined') | map(attribute='item') | list + (metrics_test.results | selectattr('status', 'defined') | rejectattr('status', 'equalto', 200) | map(attribute='item') | list) }}"
+
+    - name: Discover application metrics
+      shell: |
+        echo "=== Application Metrics Discovery ==="
+        app_ports="3000 8080 8081 8090 9091 9093 9094 9115"
+        for port in $app_ports; do
+          if netstat -tln 2>/dev/null | grep -q ":$port "; then
+            if curl -s --connect-timeout 2 "http://localhost:$port/metrics" | head -1 | grep -q "^#"; then
+              echo "✓ Metrics endpoint found: localhost:$port/metrics"
+            elif curl -s --connect-timeout 2 "http://localhost:$port/actuator/prometheus" | head -1 | grep -q "^#"; then
+              echo "✓ Spring Boot metrics: localhost:$port/actuator/prometheus"
+            else
+              echo "? Port $port open but no metrics endpoint detected"
+            fi
+          fi
+        done
+      register: app_metrics_discovery
+
+    - name: Generate Prometheus configuration snippet
+      copy:
+        content: |
+          # Prometheus Target Configuration for {{ inventory_hostname }}
+          # Generated: {{ ansible_date_time.iso8601 }}
+
+          {% if available_endpoints | length > 0 %}
+          - job_name: '{{ inventory_hostname }}-exporters'
+            static_configs:
+              - targets:
+          {% for port in available_endpoints %}
+                - '{{ ansible_default_ipv4.address }}:{{ port }}'
+          {% endfor %}
+            scrape_interval: 15s
+            metrics_path: /metrics
+            labels:
+              host: '{{ inventory_hostname }}'
+              environment: 'homelab'
+          {% endif %}
+
+          {% if inventory_hostname in groups['synology'] %}
+          # SNMP monitoring for Synology {{ inventory_hostname }}
+          - job_name: '{{ inventory_hostname }}-snmp'
+            static_configs:
+              - targets:
+                - '{{ ansible_default_ipv4.address }}'
+            metrics_path: /snmp
+            params:
+              module: [synology]
+            relabel_configs:
+              - source_labels: [__address__]
+                target_label: __param_target
+              - source_labels: [__param_target]
+                target_label: instance
+              - target_label: __address__
+                replacement: '{{ ansible_default_ipv4.address }}:{{ snmp_exporter_port }}'
+            labels:
+              host: '{{ inventory_hostname }}'
+              type: 'synology'
+          {% endif %}
+        dest: "/tmp/prometheus_{{ inventory_hostname }}_targets.yml"
+      delegate_to: localhost
+
+    - name: Check for missing monitoring coverage
+      set_fact:
+        monitoring_gaps: |
+          {% set gaps = [] %}
+          {% if inventory_hostname in groups['synology'] and node_exporter_port not in available_endpoints %}
+          {% set _ = gaps.append('node_exporter missing on Synology') %}
+          {% endif %}
+          {% if inventory_hostname in groups['debian_clients'] and node_exporter_port not in available_endpoints %}
+          {% set _ = gaps.append('node_exporter missing on Debian client') %}
+          {% endif %}
+          {% if ansible_facts.services is defined and 'docker' in ansible_facts.services and cadvisor_port not in available_endpoints %}
+          {% set _ = gaps.append('cAdvisor missing for Docker monitoring') %}
+          {% endif %}
+          {{ gaps }}
+
+    - name: Generate monitoring coverage report
+      copy:
+        content: |
+          # Monitoring Coverage Report - {{ inventory_hostname }}
+          Generated: {{ ansible_date_time.iso8601 }}
+
+          ## Host Information
+          - Hostname: {{ inventory_hostname }}
+          - IP Address: {{ ansible_default_ipv4.address }}
+          - OS: {{ ansible_facts['os_family'] }} {{ ansible_facts['distribution_version'] }}
+          - Groups: {{ group_names | join(', ') }}
+
+          ## Exporter Discovery
+          ```
+          {{ exporter_scan.stdout }}
+          ```
+
+          ## Available Metrics Endpoints
+          {% for endpoint in available_endpoints %}
+          - ✅ http://{{ ansible_default_ipv4.address }}:{{ endpoint }}/metrics
+          {% endfor %}
+
+          {% if failed_endpoints | length > 0 %}
+          ## Failed/Missing Endpoints
+          {% for endpoint in failed_endpoints %}
+          - ❌ http://{{ ansible_default_ipv4.address }}:{{ endpoint }}/metrics
+          {% endfor %}
+          {% endif %}
+
+          ## Container Port Mapping
+          ```
+          {{ container_ports.stdout }}
+          ```
+
+          ## Application Metrics Discovery
+          ```
+          {{ app_metrics_discovery.stdout }}
+          ```
+
+          {% if monitoring_gaps | length > 0 %}
+          ## Monitoring Gaps
+          {% for gap in monitoring_gaps %}
+          - ⚠️ {{ gap }}
+          {% endfor %}
+          {% endif %}
+
+          ## Recommended Actions
+          {% if node_exporter_port not in available_endpoints %}
+          - Install node_exporter for system metrics
+          {% endif %}
+          {% if ansible_facts.services is defined and 'docker' in ansible_facts.services and cadvisor_port not in available_endpoints %}
+          - Install cAdvisor for container metrics
+          {% endif %}
+          {% if inventory_hostname in groups['synology'] and snmp_exporter_port not in available_endpoints %}
+          - Configure SNMP exporter for Synology-specific metrics
+          {% endif %}
+        dest: "/tmp/monitoring_coverage_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md"
+      delegate_to: localhost
+
+    - name: Display monitoring summary
+      debug:
+        msg: |
+          Monitoring Coverage Summary for {{ inventory_hostname }}:
+          - Available Endpoints: {{ available_endpoints | length }}
+          - Failed Endpoints: {{ failed_endpoints | length }}
+          - Monitoring Gaps: {{ monitoring_gaps | length if monitoring_gaps else 0 }}
+          - Prometheus Config: /tmp/prometheus_{{ inventory_hostname }}_targets.yml
+          - Coverage Report: /tmp/monitoring_coverage_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md
+
+# Consolidation task to run on localhost
+- name: Consolidate Prometheus Configuration
+  hosts: localhost
+  gather_facts: no
+  tasks:
+    - name: Combine all target configurations
+      shell: |
+        echo "# Consolidated Prometheus Targets Configuration"
+        echo "# Generated: $(date)"
+        echo ""
+        echo "scrape_configs:"
+
+        for file in /tmp/prometheus_*_targets.yml; do
+          if [ -f "$file" ]; then
+            echo "  # From $(basename $file)"
+            cat "$file" | sed 's/^/  /'
+            echo ""
+          fi
+        done
+      register: consolidated_config
+
+    - name: Save consolidated Prometheus configuration
+      copy:
+        content: "{{ consolidated_config.stdout }}"
+        dest: "/tmp/prometheus_homelab_targets_{{ ansible_date_time.epoch }}.yml"
+
+    - name: Generate monitoring summary report
+      shell: |
+        echo "# Homelab Monitoring Coverage Summary"
+        echo "Generated: $(date)"
+        echo ""
+        echo "## Coverage by Host"
+
+        total_hosts=0
+        monitored_hosts=0
+
+        for file in /tmp/monitoring_coverage_*_*.md; do
+          if [ -f "$file" ]; then
+            host=$(basename "$file" | sed 's/monitoring_coverage_\(.*\)_[0-9]*.md/\1/')
+            endpoints=$(grep -c "✅" "$file" 2>/dev/null || echo "0")
+            gaps=$(grep -c "⚠️" "$file" 2>/dev/null || echo "0")
+
+            total_hosts=$((total_hosts + 1))
+            if [ "$endpoints" -gt 0 ]; then
+              monitored_hosts=$((monitored_hosts + 1))
+            fi
+
+            echo "- **$host**: $endpoints endpoints, $gaps gaps"
+          fi
+        done
+
+        echo ""
+        echo "## Summary"
+        echo "- Total Hosts: $total_hosts"
+        echo "- Monitored Hosts: $monitored_hosts"
+        echo "- Coverage: $(( monitored_hosts * 100 / total_hosts ))%"
+
+        echo ""
+        echo "## Next Steps"
+        echo "1. Review individual host reports in /tmp/monitoring_coverage_*.md"
+        echo "2. Apply consolidated Prometheus config: /tmp/prometheus_homelab_targets_$(date +%s).yml"
+        echo "3. Address monitoring gaps identified in reports"
+      register: summary_report
+
+    - name: Save monitoring summary
+      copy:
+        content: "{{ summary_report.stdout }}"
+        dest: "/tmp/homelab_monitoring_summary_{{ ansible_date_time.epoch }}.md"
+
+    - name: Display final summary
+      debug:
+        msg: |
+          Homelab Monitoring Discovery Complete!
+
+          📊 Reports Generated:
+          - Consolidated Config: /tmp/prometheus_homelab_targets_{{ ansible_date_time.epoch }}.yml
+          - Summary Report: /tmp/homelab_monitoring_summary_{{ ansible_date_time.epoch }}.md
+          - Individual Reports: /tmp/monitoring_coverage_*.md
+
+          🔧 Next Steps:
+          1. Review the summary report for coverage gaps
+          2. Apply the consolidated Prometheus configuration
+          3. Install missing exporters where needed
--- a/ansible/automation/playbooks/proxmox_management.yml
+++ b/ansible/automation/playbooks/proxmox_management.yml
@@ -0,0 +1,195 @@
+---
+# Proxmox VE Management Playbook
+# Inventory and health check for VMs, LXC containers, storage, and recent tasks
+# Usage: ansible-playbook playbooks/proxmox_management.yml -i hosts.ini
+# Usage: ansible-playbook playbooks/proxmox_management.yml -i hosts.ini -e action=snapshot -e vm_id=100
+
+- name: Proxmox VE Management
+  hosts: pve
+  gather_facts: yes
+  become: false
+
+  vars:
+    action: "{{ action | default('status') }}"
+    vm_id: "{{ vm_id | default('') }}"
+    report_dir: "/tmp/health_reports"
+
+  tasks:
+
+    # ---------- Report directory ----------
+    - name: Ensure health report directory exists
+      ansible.builtin.file:
+        path: "{{ report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    # ---------- Status mode ----------
+    - name: Get PVE version
+      ansible.builtin.command: pveversion
+      register: pve_version
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: Get node resource summary
+      ansible.builtin.shell: |
+        pvesh get /nodes/$(hostname)/status --output-format json 2>/dev/null || \
+          echo '{"error": "pvesh not available"}'
+      register: node_status_raw
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: List all VMs
+      ansible.builtin.command: qm list
+      register: vm_list
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: List all LXC containers
+      ansible.builtin.command: pct list
+      register: lxc_list
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: Count running VMs
+      ansible.builtin.shell: qm list 2>/dev/null | grep -c running || echo "0"
+      register: running_vm_count
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: Count running LXC containers
+      ansible.builtin.shell: pct list 2>/dev/null | grep -c running || echo "0"
+      register: running_lxc_count
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: Get storage pool status
+      ansible.builtin.shell: |
+        pvesh get /nodes/$(hostname)/storage --output-format json 2>/dev/null | python3 << 'PYEOF' || pvesm status 2>/dev/null || echo "Storage info unavailable"
+        import sys, json
+        try:
+            pools = json.load(sys.stdin)
+        except Exception:
+            sys.exit(1)
+        print('{:<20} {:<15} {:>8} {:>14}'.format('Storage', 'Type', 'Used%', 'Avail (GiB)'))
+        print('-' * 62)
+        for p in pools:
+            name      = p.get('storage', 'n/a')
+            stype     = p.get('type', 'n/a')
+            total     = p.get('total', 0)
+            used      = p.get('used', 0)
+            avail     = p.get('avail', 0)
+            pct       = round(used / total * 100, 1) if total and total > 0 else 0.0
+            avail_gib = round(avail / 1024**3, 2)
+            print('{:<20} {:<15} {:>7}% {:>13} GiB'.format(name, stype, pct, avail_gib))
+        PYEOF
+      register: storage_status
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    - name: Get last 10 task log entries
+      ansible.builtin.shell: |
+        pvesh get /nodes/$(hostname)/tasks --limit 10 --output-format json 2>/dev/null | python3 << 'PYEOF' || echo "Task log unavailable"
+        import sys, json, datetime
+        try:
+            tasks = json.load(sys.stdin)
+        except Exception:
+            sys.exit(1)
+        print('{:<22} {:<12} {}'.format('Timestamp', 'Status', 'UPID'))
+        print('-' * 80)
+        for t in tasks:
+            upid      = t.get('upid', 'n/a')
+            status    = t.get('status', 'n/a')
+            starttime = t.get('starttime', 0)
+            try:
+                ts = datetime.datetime.fromtimestamp(starttime).strftime('%Y-%m-%d %H:%M:%S')
+            except Exception:
+                ts = str(starttime)
+            print('{:<22} {:<12} {}'.format(ts, status, upid[:60]))
+        PYEOF
+      register: task_log
+      changed_when: false
+      failed_when: false
+      when: action == 'status'
+
+    # ---------- Status summary ----------
+    - name: Display Proxmox status summary
+      ansible.builtin.debug:
+        msg: |
+          ============================================================
+          Proxmox VE Status — {{ inventory_hostname }}
+          ============================================================
+          PVE Version   : {{ pve_version.stdout | default('n/a') }}
+          Running VMs   : {{ running_vm_count.stdout | default('0') | trim }}
+          Running LXCs  : {{ running_lxc_count.stdout | default('0') | trim }}
+
+          --- Node Resource Summary (JSON) ---
+          {{ node_status_raw.stdout | default('{}') | from_json | to_nice_json if (node_status_raw.stdout | default('') | length > 0 and node_status_raw.stdout | default('') is search('{')) else node_status_raw.stdout | default('unavailable') }}
+
+          --- VMs (qm list) ---
+          {{ vm_list.stdout | default('none') }}
+
+          --- LXC Containers (pct list) ---
+          {{ lxc_list.stdout | default('none') }}
+
+          --- Storage Pools ---
+          {{ storage_status.stdout | default('unavailable') }}
+
+          --- Recent Tasks (last 10) ---
+          {{ task_log.stdout | default('unavailable') }}
+          ============================================================
+      when: action == 'status'
+
+    # ---------- Write JSON report ----------
+    - name: Write Proxmox health JSON report
+      ansible.builtin.copy:
+        content: "{{ report_data | to_nice_json }}"
+        dest: "{{ report_dir }}/proxmox_{{ ansible_date_time.date }}.json"
+      vars:
+        report_data:
+          timestamp: "{{ ansible_date_time.iso8601 }}"
+          host: "{{ inventory_hostname }}"
+          pve_version: "{{ pve_version.stdout | default('n/a') | trim }}"
+          running_vms: "{{ running_vm_count.stdout | default('0') | trim }}"
+          running_lxcs: "{{ running_lxc_count.stdout | default('0') | trim }}"
+          vm_list: "{{ vm_list.stdout | default('') }}"
+          lxc_list: "{{ lxc_list.stdout | default('') }}"
+          storage_status: "{{ storage_status.stdout | default('') }}"
+          task_log: "{{ task_log.stdout | default('') }}"
+          node_status_raw: "{{ node_status_raw.stdout | default('') }}"
+      delegate_to: localhost
+      run_once: true
+      changed_when: false
+      when: action == 'status'
+
+    # ---------- Snapshot mode ----------
+    - name: Create VM snapshot
+      ansible.builtin.shell: >
+        qm snapshot {{ vm_id }} "ansible-snap-{{ ansible_date_time.epoch }}"
+        --description "Ansible automated snapshot"
+      register: snapshot_result
+      changed_when: true
+      failed_when: false
+      when:
+        - action == 'snapshot'
+        - vm_id | string | length > 0
+
+    - name: Display snapshot result
+      ansible.builtin.debug:
+        msg: |
+          Snapshot created on {{ inventory_hostname }}
+          VM ID : {{ vm_id }}
+          Result:
+          {{ (snapshot_result | default({})).stdout | default('') }}
+          {{ (snapshot_result | default({})).stderr | default('') }}
+      when:
+        - action == 'snapshot'
+        - vm_id | string | length > 0
--- a/ansible/automation/playbooks/prune_containers.yml
+++ b/ansible/automation/playbooks/prune_containers.yml
@@ -0,0 +1,420 @@
+---
+# Docker Cleanup and Pruning Playbook
+# Clean up unused containers, images, volumes, and networks
+# Usage: ansible-playbook playbooks/prune_containers.yml
+# Usage: ansible-playbook playbooks/prune_containers.yml -e "aggressive_cleanup=true"
+# Usage: ansible-playbook playbooks/prune_containers.yml -e "dry_run=true"
+
+- name: Docker System Cleanup and Pruning
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    dry_run: "{{ dry_run | default(false) }}"
+    aggressive_cleanup: "{{ aggressive_cleanup | default(false) }}"
+    keep_images_days: "{{ keep_images_days | default(7) }}"
+    keep_volumes: "{{ keep_volumes | default(true) }}"
+    backup_before_cleanup: "{{ backup_before_cleanup | default(true) }}"
+    cleanup_logs: "{{ cleanup_logs | default(true) }}"
+    max_log_size: "{{ max_log_size | default('100m') }}"
+
+  tasks:
+    - name: Check if Docker is running
+      systemd:
+        name: docker
+      register: docker_status
+      failed_when: docker_status.status.ActiveState != "active"
+
+    - name: Create cleanup report directory
+      file:
+        path: "/tmp/docker_cleanup/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+
+    - name: Get pre-cleanup Docker system info
+      shell: |
+        echo "=== PRE-CLEANUP DOCKER SYSTEM INFO ==="
+        echo "Date: {{ ansible_date_time.iso8601 }}"
+        echo "Host: {{ inventory_hostname }}"
+        echo ""
+
+        echo "System Usage:"
+        docker system df
+        echo ""
+
+        echo "Container Count:"
+        echo "Running: $(docker ps -q | wc -l)"
+        echo "Stopped: $(docker ps -aq --filter status=exited | wc -l)"
+        echo "Total: $(docker ps -aq | wc -l)"
+        echo ""
+
+        echo "Image Count:"
+        echo "Total: $(docker images -q | wc -l)"
+        echo "Dangling: $(docker images -f dangling=true -q | wc -l)"
+        echo ""
+
+        echo "Volume Count:"
+        echo "Total: $(docker volume ls -q | wc -l)"
+        echo "Dangling: $(docker volume ls -f dangling=true -q | wc -l)"
+        echo ""
+
+        echo "Network Count:"
+        echo "Total: $(docker network ls -q | wc -l)"
+        echo "Custom: $(docker network ls --filter type=custom -q | wc -l)"
+      register: pre_cleanup_info
+      changed_when: false
+
+    - name: Display cleanup plan
+      debug:
+        msg: |
+          🧹 DOCKER CLEANUP PLAN
+          ======================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Dry Run: {{ dry_run }}
+          💪 Aggressive: {{ aggressive_cleanup }}
+          📦 Keep Images: {{ keep_images_days }} days
+          💾 Keep Volumes: {{ keep_volumes }}
+          📝 Cleanup Logs: {{ cleanup_logs }}
+
+          {{ pre_cleanup_info.stdout }}
+
+    - name: Backup container list before cleanup
+      shell: |
+        backup_file="/tmp/docker_cleanup/{{ ansible_date_time.date }}/{{ inventory_hostname }}_containers_backup.txt"
+
+        echo "=== CONTAINER BACKUP - {{ ansible_date_time.iso8601 }} ===" > "$backup_file"
+        echo "Host: {{ inventory_hostname }}" >> "$backup_file"
+        echo "" >> "$backup_file"
+
+        echo "=== RUNNING CONTAINERS ===" >> "$backup_file"
+        docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" >> "$backup_file"
+        echo "" >> "$backup_file"
+
+        echo "=== ALL CONTAINERS ===" >> "$backup_file"
+        docker ps -a --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.CreatedAt}}" >> "$backup_file"
+        echo "" >> "$backup_file"
+
+        echo "=== IMAGES ===" >> "$backup_file"
+        docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}\t{{.CreatedAt}}" >> "$backup_file"
+        echo "" >> "$backup_file"
+
+        echo "=== VOLUMES ===" >> "$backup_file"
+        docker volume ls >> "$backup_file"
+        echo "" >> "$backup_file"
+
+        echo "=== NETWORKS ===" >> "$backup_file"
+        docker network ls >> "$backup_file"
+      when: backup_before_cleanup | bool
+
+    - name: Remove stopped containers
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would remove stopped containers:"
+        docker ps -aq --filter status=exited
+        {% else %}
+        echo "Removing stopped containers..."
+        stopped_containers=$(docker ps -aq --filter status=exited)
+        if [ -n "$stopped_containers" ]; then
+          docker rm $stopped_containers
+          echo "✅ Removed stopped containers"
+        else
+          echo "ℹ️  No stopped containers to remove"
+        fi
+        {% endif %}
+      register: remove_stopped_containers
+
+    - name: Remove dangling images
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would remove dangling images:"
+        docker images -f dangling=true -q
+        {% else %}
+        echo "Removing dangling images..."
+        dangling_images=$(docker images -f dangling=true -q)
+        if [ -n "$dangling_images" ]; then
+          docker rmi $dangling_images
+          echo "✅ Removed dangling images"
+        else
+          echo "ℹ️  No dangling images to remove"
+        fi
+        {% endif %}
+      register: remove_dangling_images
+
+    - name: Remove unused images (aggressive cleanup)
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would remove unused images older than {{ keep_images_days }} days:"
+        docker images --filter "until={{ keep_images_days * 24 }}h" -q
+        {% else %}
+        echo "Removing unused images older than {{ keep_images_days }} days..."
+        old_images=$(docker images --filter "until={{ keep_images_days * 24 }}h" -q)
+        if [ -n "$old_images" ]; then
+          # Check if images are not used by any container
+          for image in $old_images; do
+            if ! docker ps -a --format "{{.Image}}" | grep -q "$image"; then
+              docker rmi "$image" 2>/dev/null && echo "Removed image: $image" || echo "Failed to remove image: $image"
+            else
+              echo "Skipping image in use: $image"
+            fi
+          done
+          echo "✅ Removed old unused images"
+        else
+          echo "ℹ️  No old images to remove"
+        fi
+        {% endif %}
+      register: remove_old_images
+      when: aggressive_cleanup | bool
+
+    - name: Remove dangling volumes
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would remove dangling volumes:"
+        docker volume ls -f dangling=true -q
+        {% else %}
+        {% if not keep_volumes %}
+        echo "Removing dangling volumes..."
+        dangling_volumes=$(docker volume ls -f dangling=true -q)
+        if [ -n "$dangling_volumes" ]; then
+          docker volume rm $dangling_volumes
+          echo "✅ Removed dangling volumes"
+        else
+          echo "ℹ️  No dangling volumes to remove"
+        fi
+        {% else %}
+        echo "ℹ️  Volume cleanup skipped (keep_volumes=true)"
+        {% endif %}
+        {% endif %}
+      register: remove_dangling_volumes
+
+    - name: Remove unused networks
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would remove unused networks:"
+        docker network ls --filter type=custom -q
+        {% else %}
+        echo "Removing unused networks..."
+        docker network prune -f
+        echo "✅ Removed unused networks"
+        {% endif %}
+      register: remove_unused_networks
+
+    - name: Clean up container logs
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would clean up container logs larger than {{ max_log_size }}"
+        find /var/lib/docker/containers -name "*-json.log" -size +{{ max_log_size }} 2>/dev/null | wc -l
+        {% else %}
+        {% if cleanup_logs %}
+        echo "Cleaning up large container logs (>{{ max_log_size }})..."
+
+        log_count=0
+        total_size_before=0
+        total_size_after=0
+
+        for log_file in $(find /var/lib/docker/containers -name "*-json.log" -size +{{ max_log_size }} 2>/dev/null); do
+          if [ -f "$log_file" ]; then
+            size_before=$(stat -f%z "$log_file" 2>/dev/null || stat -c%s "$log_file" 2>/dev/null || echo 0)
+            total_size_before=$((total_size_before + size_before))
+
+            # Truncate log file to last 1000 lines
+            tail -1000 "$log_file" > "${log_file}.tmp" && mv "${log_file}.tmp" "$log_file"
+
+            size_after=$(stat -f%z "$log_file" 2>/dev/null || stat -c%s "$log_file" 2>/dev/null || echo 0)
+            total_size_after=$((total_size_after + size_after))
+
+            log_count=$((log_count + 1))
+          fi
+        done
+
+        if [ $log_count -gt 0 ]; then
+          saved_bytes=$((total_size_before - total_size_after))
+          echo "✅ Cleaned $log_count log files, saved $(echo $saved_bytes | numfmt --to=iec) bytes"
+        else
+          echo "ℹ️  No large log files to clean"
+        fi
+        {% else %}
+        echo "ℹ️  Log cleanup skipped (cleanup_logs=false)"
+        {% endif %}
+        {% endif %}
+      register: cleanup_logs_result
+      when: cleanup_logs | bool
+
+    - name: Run Docker system prune
+      shell: |
+        {% if dry_run %}
+        echo "DRY RUN: Would run docker system prune"
+        docker system df
+        {% else %}
+        echo "Running Docker system prune..."
+        {% if aggressive_cleanup %}
+        docker system prune -af --volumes
+        {% else %}
+        docker system prune -f
+        {% endif %}
+        echo "✅ Docker system prune complete"
+        {% endif %}
+      register: system_prune_result
+
+    - name: Get post-cleanup Docker system info
+      shell: |
+        echo "=== POST-CLEANUP DOCKER SYSTEM INFO ==="
+        echo "Date: {{ ansible_date_time.iso8601 }}"
+        echo "Host: {{ inventory_hostname }}"
+        echo ""
+
+        echo "System Usage:"
+        docker system df
+        echo ""
+
+        echo "Container Count:"
+        echo "Running: $(docker ps -q | wc -l)"
+        echo "Stopped: $(docker ps -aq --filter status=exited | wc -l)"
+        echo "Total: $(docker ps -aq | wc -l)"
+        echo ""
+
+        echo "Image Count:"
+        echo "Total: $(docker images -q | wc -l)"
+        echo "Dangling: $(docker images -f dangling=true -q | wc -l)"
+        echo ""
+
+        echo "Volume Count:"
+        echo "Total: $(docker volume ls -q | wc -l)"
+        echo "Dangling: $(docker volume ls -f dangling=true -q | wc -l)"
+        echo ""
+
+        echo "Network Count:"
+        echo "Total: $(docker network ls -q | wc -l)"
+        echo "Custom: $(docker network ls --filter type=custom -q | wc -l)"
+      register: post_cleanup_info
+      changed_when: false
+
+    - name: Generate cleanup report
+      copy:
+        content: |
+          🧹 DOCKER CLEANUP REPORT - {{ inventory_hostname }}
+          ===============================================
+
+          📅 Cleanup Date: {{ ansible_date_time.iso8601 }}
+          🖥️  Host: {{ inventory_hostname }}
+          🔍 Dry Run: {{ dry_run }}
+          💪 Aggressive Mode: {{ aggressive_cleanup }}
+          📦 Image Retention: {{ keep_images_days }} days
+          💾 Keep Volumes: {{ keep_volumes }}
+          📝 Log Cleanup: {{ cleanup_logs }}
+
+          📊 BEFORE CLEANUP:
+          {{ pre_cleanup_info.stdout }}
+
+          🔧 CLEANUP ACTIONS:
+
+          🗑️  Stopped Containers:
+          {{ remove_stopped_containers.stdout }}
+
+          🖼️  Dangling Images:
+          {{ remove_dangling_images.stdout }}
+
+          {% if aggressive_cleanup %}
+          📦 Old Images:
+          {{ remove_old_images.stdout }}
+          {% endif %}
+
+          💾 Dangling Volumes:
+          {{ remove_dangling_volumes.stdout }}
+
+          🌐 Unused Networks:
+          {{ remove_unused_networks.stdout }}
+
+          {% if cleanup_logs %}
+          📝 Container Logs:
+          {{ cleanup_logs_result.stdout }}
+          {% endif %}
+
+          🧹 System Prune:
+          {{ system_prune_result.stdout }}
+
+          📊 AFTER CLEANUP:
+          {{ post_cleanup_info.stdout }}
+
+          💡 RECOMMENDATIONS:
+          - Schedule regular cleanup: cron job for this playbook
+          - Monitor disk usage: ansible-playbook playbooks/disk_usage_report.yml
+          - Consider log rotation: ansible-playbook playbooks/log_rotation.yml
+          {% if not aggressive_cleanup %}
+          - For more space: run with -e "aggressive_cleanup=true"
+          {% endif %}
+
+          ✅ CLEANUP COMPLETE
+
+        dest: "/tmp/docker_cleanup/{{ ansible_date_time.date }}/{{ inventory_hostname }}_cleanup_report.txt"
+
+    - name: Display cleanup summary
+      debug:
+        msg: |
+
+          ✅ DOCKER CLEANUP COMPLETE - {{ inventory_hostname }}
+          =============================================
+
+          🔍 Mode: {{ 'DRY RUN' if dry_run else 'LIVE CLEANUP' }}
+          💪 Aggressive: {{ aggressive_cleanup }}
+
+          📊 SUMMARY:
+          {{ post_cleanup_info.stdout }}
+
+          📄 Full report: /tmp/docker_cleanup/{{ ansible_date_time.date }}/{{ inventory_hostname }}_cleanup_report.txt
+
+          🔍 Next Steps:
+          {% if dry_run %}
+          - Run without dry_run to perform actual cleanup
+          {% endif %}
+          - Monitor: ansible-playbook playbooks/disk_usage_report.yml
+          - Schedule regular cleanup via cron
+
+          =============================================
+
+    - name: Restart Docker daemon if needed
+      systemd:
+        name: docker
+        state: restarted
+      when:
+        - restart_docker | default(false) | bool
+        - not dry_run | bool
+      register: docker_restart
+
+    - name: Verify services after cleanup
+      ansible.builtin.command: "docker ps --filter name={{ item }} --format '{{ '{{' }}.Names{{ '}}' }}'"
+      loop:
+        - plex
+        - immich-server
+        - vaultwarden
+        - grafana
+        - prometheus
+      register: service_checks
+      changed_when: false
+      failed_when: false
+      when:
+        - not dry_run | bool
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+    - name: Display service verification
+      debug:
+        msg: "{{ service_verification.stdout }}"
+      when: service_verification is defined
--- a/ansible/automation/playbooks/restart_service.yml
+++ b/ansible/automation/playbooks/restart_service.yml
@@ -0,0 +1,194 @@
+---
+# Service Restart Playbook
+# Restart specific services with proper dependency handling
+# Usage: ansible-playbook playbooks/restart_service.yml -e "service_name=plex host_target=atlantis"
+# Usage: ansible-playbook playbooks/restart_service.yml -e "service_name=immich-server host_target=atlantis wait_time=30"
+
+- name: Restart Service with Dependency Handling
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    service_name: "{{ service_name | mandatory }}"
+    force_restart: "{{ force_restart | default(false) }}"
+
+    # Service dependency mapping
+    service_dependencies:
+      # Media stack dependencies
+      plex:
+        depends_on: []
+        restart_delay: 30
+      sonarr:
+        depends_on: ["prowlarr"]
+        restart_delay: 20
+      radarr:
+        depends_on: ["prowlarr"]
+        restart_delay: 20
+      lidarr:
+        depends_on: ["prowlarr"]
+        restart_delay: 20
+      bazarr:
+        depends_on: ["sonarr", "radarr"]
+        restart_delay: 15
+      jellyseerr:
+        depends_on: ["plex", "sonarr", "radarr"]
+        restart_delay: 25
+
+      # Immich stack
+      immich-server:
+        depends_on: ["immich-db", "immich-redis"]
+        restart_delay: 30
+      immich-machine-learning:
+        depends_on: ["immich-server"]
+        restart_delay: 20
+
+      # Security stack
+      vaultwarden:
+        depends_on: ["vaultwarden-db"]
+        restart_delay: 25
+
+      # Monitoring stack
+      grafana:
+        depends_on: ["prometheus"]
+        restart_delay: 20
+      prometheus:
+        depends_on: []
+        restart_delay: 30
+
+  tasks:
+    - name: Validate required variables
+      fail:
+        msg: "service_name is required. Use -e 'service_name=SERVICE_NAME'"
+      when: service_name is not defined or service_name == ""
+
+    - name: Check if Docker is running
+      systemd:
+        name: docker
+      register: docker_status
+      failed_when: docker_status.status.ActiveState != "active"
+
+    - name: Check if service exists
+      shell: 'docker ps -a --filter "name={{ service_name }}" --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: service_exists
+      changed_when: false
+
+    - name: Fail if service doesn't exist
+      fail:
+        msg: "Service '{{ service_name }}' not found on {{ inventory_hostname }}"
+      when: service_exists.stdout == ""
+
+    - name: Get current service status
+      shell: 'docker ps --filter "name={{ service_name }}" --format "{%raw%}{{.Status}}{%endraw%}"'
+      register: service_status_before
+      changed_when: false
+
+    - name: Display pre-restart status
+      debug:
+        msg: |
+          🔄 RESTART REQUEST for {{ service_name }} on {{ inventory_hostname }}
+          📊 Current Status: {{ service_status_before.stdout | default('Not running') }}
+          ⏱️  Wait Time: {{ wait_time | default(15) }} seconds
+          🔗 Dependencies: {{ service_dependencies.get(service_name, {}).get('depends_on', []) | join(', ') or 'None' }}
+
+    - name: Check dependencies are running
+      shell: 'docker ps --filter "name={{ item }}" --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: dependency_check
+      loop: "{{ service_dependencies.get(service_name, {}).get('depends_on', []) }}"
+      when: service_dependencies.get(service_name, {}).get('depends_on', []) | length > 0
+
+    - name: Warn about missing dependencies
+      debug:
+        msg: "⚠️  Warning: Dependency '{{ item.item }}' is not running"
+      loop: "{{ dependency_check.results | default([]) }}"
+      when:
+        - dependency_check is defined
+        - item.stdout == ""
+
+    - name: Create pre-restart backup of logs
+      shell: |
+        mkdir -p /tmp/service_logs/{{ ansible_date_time.date }}
+        docker logs {{ service_name }} --tail 100 > /tmp/service_logs/{{ ansible_date_time.date }}/{{ service_name }}_pre_restart.log 2>&1
+      ignore_errors: yes
+
+    - name: Stop service gracefully
+      shell: docker stop {{ service_name }}
+      register: stop_result
+      ignore_errors: yes
+
+    - name: Force stop if graceful stop failed
+      shell: docker kill {{ service_name }}
+      when:
+        - stop_result.rc != 0
+        - force_restart | bool
+
+    - name: Wait for service to fully stop
+      shell: 'docker ps --filter "name={{ service_name }}" --format "{%raw%}{{.Names}}{%endraw%}"'
+      register: stop_check
+      until: stop_check.stdout == ""
+      retries: 10
+      delay: 2
+
+    - name: Start service
+      shell: docker start {{ service_name }}
+      register: start_result
+
+    - name: Wait for service to be ready
+      pause:
+        seconds: "{{ service_dependencies.get(service_name, {}).get('restart_delay', wait_time | default(15)) }}"
+
+    - name: Verify service is running
+      shell: 'docker ps --filter "name={{ service_name }}" --format "{%raw%}{{.Status}}{%endraw%}"'
+      register: service_status_after
+      retries: 5
+      delay: 3
+      until: "'Up' in service_status_after.stdout"
+
+    - name: Check service health (if health check available)
+      shell: 'docker inspect {{ service_name }} --format="{%raw%}{{.State.Health.Status}}{%endraw%}"'
+      register: health_check
+      ignore_errors: yes
+      changed_when: false
+
+    - name: Wait for healthy status
+      shell: 'docker inspect {{ service_name }} --format="{%raw%}{{.State.Health.Status}}{%endraw%}"'
+      register: health_status
+      until: health_status.stdout == "healthy"
+      retries: 10
+      delay: 5
+      when:
+        - health_check.rc == 0
+        - health_check.stdout != "none"
+      ignore_errors: yes
+
+    - name: Create post-restart log snapshot
+      shell: |
+        docker logs {{ service_name }} --tail 50 > /tmp/service_logs/{{ ansible_date_time.date }}/{{ service_name }}_post_restart.log 2>&1
+      ignore_errors: yes
+
+    - name: Display restart results
+      debug:
+        msg: |
+
+          ✅ SERVICE RESTART COMPLETE
+          ================================
+          🖥️  Host: {{ inventory_hostname }}
+          🔧 Service: {{ service_name }}
+          📊 Status Before: {{ service_status_before.stdout | default('Not running') }}
+          📊 Status After: {{ service_status_after.stdout }}
+          {% if health_check.rc == 0 and health_check.stdout != "none" %}
+          🏥 Health Status: {{ health_status.stdout | default('Checking...') }}
+          {% endif %}
+          ⏱️  Restart Duration: {{ service_dependencies.get(service_name, {}).get('restart_delay', wait_time | default(15)) }} seconds
+          📝 Logs: /tmp/service_logs/{{ ansible_date_time.date }}/{{ service_name }}_*.log
+
+          ================================
+
+    - name: Restart dependent services (if any)
+      include_tasks: restart_dependent_services.yml
+      vars:
+        parent_service: "{{ service_name }}"
+      when: restart_dependents | default(false) | bool
+
+  handlers:
+    - name: restart_dependent_services
+      debug:
+        msg: "This would restart services that depend on {{ service_name }}"
--- a/ansible/automation/playbooks/security_audit.yml
+++ b/ansible/automation/playbooks/security_audit.yml
@@ -0,0 +1,304 @@
+---
+- name: Security Audit and Hardening
+  hosts: all
+  gather_facts: yes
+  vars:
+    audit_timestamp: "{{ ansible_date_time.iso8601 }}"
+    security_report_dir: "/tmp/security_reports"
+
+  tasks:
+    - name: Create security reports directory
+      file:
+        path: "{{ security_report_dir }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check system updates
+      shell: |
+        if command -v apt >/dev/null 2>&1; then
+          apt list --upgradable 2>/dev/null | wc -l
+        elif command -v yum >/dev/null 2>&1; then
+          yum check-update --quiet | wc -l
+        else
+          echo "0"
+        fi
+      register: pending_updates
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check for security updates
+      shell: |
+        if command -v apt >/dev/null 2>&1; then
+          apt list --upgradable 2>/dev/null | grep -i security | wc -l
+        elif command -v yum >/dev/null 2>&1; then
+          yum --security check-update --quiet 2>/dev/null | wc -l
+        else
+          echo "0"
+        fi
+      register: security_updates
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check SSH configuration
+      shell: |
+        echo "=== SSH SECURITY AUDIT ==="
+        if [ -f /etc/ssh/sshd_config ]; then
+          echo "SSH Configuration:"
+          echo "PermitRootLogin: $(grep -E '^PermitRootLogin' /etc/ssh/sshd_config | awk '{print $2}' || echo 'default')"
+          echo "PasswordAuthentication: $(grep -E '^PasswordAuthentication' /etc/ssh/sshd_config | awk '{print $2}' || echo 'default')"
+          echo "Port: $(grep -E '^Port' /etc/ssh/sshd_config | awk '{print $2}' || echo '22')"
+          echo "Protocol: $(grep -E '^Protocol' /etc/ssh/sshd_config | awk '{print $2}' || echo 'default')"
+        else
+          echo "SSH config not accessible"
+        fi
+      register: ssh_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check firewall status
+      shell: |
+        echo "=== FIREWALL STATUS ==="
+        if command -v ufw >/dev/null 2>&1; then
+          echo "UFW Status:"
+          ufw status verbose 2>/dev/null || echo "UFW not configured"
+        elif command -v iptables >/dev/null 2>&1; then
+          echo "IPTables Rules:"
+          iptables -L -n | head -20 2>/dev/null || echo "IPTables not accessible"
+        elif command -v firewall-cmd >/dev/null 2>&1; then
+          echo "FirewallD Status:"
+          firewall-cmd --state 2>/dev/null || echo "FirewallD not running"
+        else
+          echo "No firewall tools found"
+        fi
+      register: firewall_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check user accounts
+      shell: |
+        echo "=== USER ACCOUNT AUDIT ==="
+        echo "Users with shell access:"
+        grep -E '/bin/(bash|sh|zsh)$' /etc/passwd | cut -d: -f1 | sort
+        echo ""
+        echo "Users with sudo access:"
+        if [ -f /etc/sudoers ]; then
+          grep -E '^[^#]*ALL.*ALL' /etc/sudoers 2>/dev/null | cut -d' ' -f1 || echo "No sudo users found"
+        fi
+        echo ""
+        echo "Recent logins:"
+        last -n 10 2>/dev/null | head -10 || echo "Login history not available"
+      register: user_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check file permissions
+      shell: |
+        echo "=== FILE PERMISSIONS AUDIT ==="
+        echo "World-writable files in /etc:"
+        find /etc -type f -perm -002 2>/dev/null | head -10 || echo "None found"
+        echo ""
+        echo "SUID/SGID files:"
+        find /usr -type f \( -perm -4000 -o -perm -2000 \) 2>/dev/null | head -10 || echo "None found"
+        echo ""
+        echo "SSH key permissions:"
+        if [ -d ~/.ssh ]; then
+          ls -la ~/.ssh/ 2>/dev/null || echo "SSH directory not accessible"
+        else
+          echo "No SSH directory found"
+        fi
+      register: permissions_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check network security
+      shell: |
+        echo "=== NETWORK SECURITY AUDIT ==="
+        echo "Open ports:"
+        if command -v netstat >/dev/null 2>&1; then
+          netstat -tuln | grep LISTEN | head -10
+        elif command -v ss >/dev/null 2>&1; then
+          ss -tuln | grep LISTEN | head -10
+        else
+          echo "No network tools available"
+        fi
+        echo ""
+        echo "Network interfaces:"
+        ip addr show 2>/dev/null | grep -E '^[0-9]+:' || echo "Network info not available"
+      register: network_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check system services
+      shell: |
+        echo "=== SERVICE SECURITY AUDIT ==="
+        if command -v systemctl >/dev/null 2>&1; then
+          echo "Running services:"
+          systemctl list-units --type=service --state=running --no-legend | head -15
+          echo ""
+          echo "Failed services:"
+          systemctl --failed --no-legend | head -5
+        else
+          echo "Systemd not available"
+        fi
+      register: service_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check Docker security (if available)
+      shell: |
+        if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+          echo "=== DOCKER SECURITY AUDIT ==="
+          echo "Docker daemon info:"
+          docker info --format '{{.SecurityOptions}}' 2>/dev/null || echo "Security options not available"
+          echo ""
+          echo "Privileged containers:"
+          docker ps --format "table {{.Names}}\t{{.Status}}" --filter "label=privileged=true" 2>/dev/null || echo "No privileged containers found"
+          echo ""
+          echo "Containers with host network:"
+          docker ps --format "table {{.Names}}\t{{.Ports}}" | grep -E '0\.0\.0\.0|::' | head -5 || echo "No host network containers found"
+        else
+          echo "Docker not available or not accessible"
+        fi
+      register: docker_audit
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Calculate security score
+      set_fact:
+        security_score:
+          updates_pending: "{{ pending_updates.stdout | int }}"
+          security_updates_pending: "{{ security_updates.stdout | int }}"
+          ssh_root_login: "{{ 'SECURE' if 'no' in ssh_audit.stdout.lower() else 'INSECURE' }}"
+          ssh_password_auth: "{{ 'SECURE' if 'no' in ssh_audit.stdout.lower() else 'INSECURE' }}"
+          firewall_active: "{{ 'ACTIVE' if 'active' in firewall_audit.stdout.lower() or 'status: active' in firewall_audit.stdout.lower() else 'INACTIVE' }}"
+          overall_risk: >-
+            {{
+              'HIGH' if (
+                (security_updates.stdout | int > 5) or
+                ('yes' in ssh_audit.stdout.lower() and 'PermitRootLogin' in ssh_audit.stdout) or
+                ('inactive' in firewall_audit.stdout.lower())
+              ) else 'MEDIUM' if (
+                (pending_updates.stdout | int > 10) or
+                (security_updates.stdout | int > 0)
+              ) else 'LOW'
+            }}
+
+    - name: Display security audit report
+      debug:
+        msg: |
+
+          ==========================================
+          🔒 SECURITY AUDIT REPORT - {{ inventory_hostname }}
+          ==========================================
+
+          📊 SECURITY SCORE: {{ security_score.overall_risk }} RISK
+
+          🔄 UPDATES:
+          - Pending Updates: {{ security_score.updates_pending }}
+          - Security Updates: {{ security_score.security_updates_pending }}
+
+          🔐 SSH SECURITY:
+          - Root Login: {{ security_score.ssh_root_login }}
+          - Password Auth: {{ security_score.ssh_password_auth }}
+
+          🛡️  FIREWALL:
+          - Status: {{ security_score.firewall_active }}
+
+          {{ ssh_audit.stdout }}
+
+          {{ firewall_audit.stdout }}
+
+          {{ user_audit.stdout }}
+
+          {{ permissions_audit.stdout }}
+
+          {{ network_audit.stdout }}
+
+          {{ service_audit.stdout }}
+
+          {{ docker_audit.stdout }}
+
+          ==========================================
+
+    - name: Generate JSON security report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ audit_timestamp }}",
+            "hostname": "{{ inventory_hostname }}",
+            "security_score": {
+              "overall_risk": "{{ security_score.overall_risk }}",
+              "updates_pending": {{ security_score.updates_pending }},
+              "security_updates_pending": {{ security_score.security_updates_pending }},
+              "ssh_root_login": "{{ security_score.ssh_root_login }}",
+              "ssh_password_auth": "{{ security_score.ssh_password_auth }}",
+              "firewall_active": "{{ security_score.firewall_active }}"
+            },
+            "audit_details": {
+              "ssh_config": {{ ssh_audit.stdout | to_json }},
+              "firewall_status": {{ firewall_audit.stdout | to_json }},
+              "user_accounts": {{ user_audit.stdout | to_json }},
+              "file_permissions": {{ permissions_audit.stdout | to_json }},
+              "network_security": {{ network_audit.stdout | to_json }},
+              "services": {{ service_audit.stdout | to_json }},
+              "docker_security": {{ docker_audit.stdout | to_json }}
+            },
+            "recommendations": [
+              {% if security_score.security_updates_pending | int > 0 %}
+              "Apply {{ security_score.security_updates_pending }} pending security updates",
+              {% endif %}
+              {% if security_score.ssh_root_login == "INSECURE" %}
+              "Disable SSH root login",
+              {% endif %}
+              {% if security_score.firewall_active == "INACTIVE" %}
+              "Enable and configure firewall",
+              {% endif %}
+              {% if security_score.updates_pending | int > 20 %}
+              "Apply system updates ({{ security_score.updates_pending }} pending)",
+              {% endif %}
+              "Regular security monitoring recommended"
+            ]
+          }
+        dest: "{{ security_report_dir }}/{{ inventory_hostname }}_security_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Send security alert for high risk
+      shell: |
+        if command -v curl >/dev/null 2>&1; then
+          curl -d "🚨 HIGH RISK: {{ inventory_hostname }} security audit - {{ security_score.overall_risk }} risk level detected" \
+               -H "Title: Security Alert" \
+               -H "Priority: high" \
+               -H "Tags: security,audit" \
+               "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}" || true
+        fi
+      when: security_score.overall_risk == "HIGH"
+      ignore_errors: yes
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          🔒 Security audit complete for {{ inventory_hostname }}
+          📊 Risk Level: {{ security_score.overall_risk }}
+          📄 Report saved to: {{ security_report_dir }}/{{ inventory_hostname }}_security_{{ ansible_date_time.epoch }}.json
+
+          {% if security_score.overall_risk == "HIGH" %}
+          🚨 HIGH RISK detected - immediate action required!
+          {% elif security_score.overall_risk == "MEDIUM" %}
+          ⚠️  MEDIUM RISK - review and address issues
+          {% else %}
+          ✅ LOW RISK - system appears secure
+          {% endif %}
+
+          Key Issues:
+          {% if security_score.security_updates_pending | int > 0 %}
+          - {{ security_score.security_updates_pending }} security updates pending
+          {% endif %}
+          {% if security_score.ssh_root_login == "INSECURE" %}
+          - SSH root login enabled
+          {% endif %}
+          {% if security_score.firewall_active == "INACTIVE" %}
+          - Firewall not active
+          {% endif %}
--- a/ansible/automation/playbooks/security_updates.yml
+++ b/ansible/automation/playbooks/security_updates.yml
@@ -0,0 +1,318 @@
+---
+# Security Updates Playbook
+# Automated security patches and system updates
+# Usage: ansible-playbook playbooks/security_updates.yml
+# Usage: ansible-playbook playbooks/security_updates.yml -e "reboot_if_required=true"
+# Usage: ansible-playbook playbooks/security_updates.yml -e "security_only=true"
+
+- name: Apply Security Updates
+  hosts: "{{ host_target | default('debian_clients') }}"
+  gather_facts: yes
+  become: yes
+  vars:
+    security_only: "{{ security_only | default(true) }}"
+    reboot_if_required: "{{ reboot_if_required | default(false) }}"
+    backup_before_update: "{{ backup_before_update | default(true) }}"
+    max_reboot_wait: "{{ max_reboot_wait | default(300) }}"
+    update_docker: "{{ update_docker | default(false) }}"
+
+  tasks:
+    - name: Check if host is reachable
+      ping:
+      register: ping_result
+
+    - name: Create update log directory
+      file:
+        path: "/var/log/ansible_updates"
+        state: directory
+        mode: '0755'
+
+    - name: Get pre-update system info
+      shell: |
+        echo "=== PRE-UPDATE SYSTEM INFO ==="
+        echo "Date: {{ ansible_date_time.iso8601 }}"
+        echo "Host: {{ inventory_hostname }}"
+        echo "Kernel: $(uname -r)"
+        echo "Uptime: $(uptime)"
+        echo ""
+
+        echo "=== CURRENT PACKAGES ==="
+        dpkg -l | grep -E "(linux-image|linux-headers)" || echo "No kernel packages found"
+        echo ""
+
+        echo "=== SECURITY UPDATES AVAILABLE ==="
+        apt list --upgradable 2>/dev/null | grep -i security || echo "No security updates available"
+        echo ""
+
+        echo "=== DISK SPACE ==="
+        df -h /
+        echo ""
+
+        echo "=== RUNNING SERVICES ==="
+        systemctl list-units --type=service --state=running | head -10
+      register: pre_update_info
+      changed_when: false
+
+    - name: Display update plan
+      debug:
+        msg: |
+          🔒 SECURITY UPDATE PLAN
+          =======================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔐 Security Only: {{ security_only }}
+          🔄 Reboot if Required: {{ reboot_if_required }}
+          💾 Backup First: {{ backup_before_update }}
+          🐳 Update Docker: {{ update_docker }}
+
+          {{ pre_update_info.stdout }}
+
+    - name: Backup critical configs before update
+      shell: |
+        backup_dir="/var/backups/pre-update-{{ ansible_date_time.epoch }}"
+        mkdir -p "$backup_dir"
+
+        echo "Creating pre-update backup..."
+
+        # Backup critical system configs
+        cp -r /etc/ssh "$backup_dir/" 2>/dev/null || echo "SSH config backup failed"
+        cp -r /etc/nginx "$backup_dir/" 2>/dev/null || echo "Nginx config not found"
+        cp -r /etc/systemd "$backup_dir/" 2>/dev/null || echo "Systemd config backup failed"
+
+        # Backup package list
+        dpkg --get-selections > "$backup_dir/package_list.txt"
+
+        # Backup Docker configs if they exist
+        if [ -d "/opt/docker" ]; then
+          tar -czf "$backup_dir/docker_configs.tar.gz" /opt/docker 2>/dev/null || echo "Docker config backup failed"
+        fi
+
+        echo "✅ Backup created at $backup_dir"
+        ls -la "$backup_dir"
+      register: backup_result
+      when: backup_before_update | bool
+
+    - name: Update package cache
+      apt:
+        update_cache: yes
+        cache_valid_time: 0
+      register: cache_update
+
+    - name: Check for available security updates
+      shell: |
+        apt list --upgradable 2>/dev/null | grep -c security || echo "0"
+      register: security_updates_count
+      changed_when: false
+
+    - name: Check for kernel updates
+      shell: |
+        apt list --upgradable 2>/dev/null | grep -E "(linux-image|linux-headers)" | wc -l
+      register: kernel_updates_count
+      changed_when: false
+
+    - name: Apply security updates only
+      apt:
+        upgrade: safe
+        autoremove: yes
+        autoclean: yes
+      register: security_update_result
+      when:
+        - security_only | bool
+        - security_updates_count.stdout | int > 0
+
+    - name: Apply all updates (if not security only)
+      apt:
+        upgrade: dist
+        autoremove: yes
+        autoclean: yes
+      register: full_update_result
+      when:
+        - not security_only | bool
+
+    - name: Update Docker (if requested)
+      block:
+        - name: Add Docker GPG key
+          apt_key:
+            url: https://download.docker.com/linux/ubuntu/gpg
+            state: present
+
+        - name: Add Docker repository
+          apt_repository:
+            repo: "deb [arch=amd64] https://download.docker.com/linux/ubuntu {{ ansible_distribution_release }} stable"
+            state: present
+
+        - name: Update Docker packages
+          apt:
+            name:
+              - docker-ce
+              - docker-ce-cli
+              - containerd.io
+            state: latest
+          register: docker_update_result
+
+        - name: Restart Docker service
+          systemd:
+            name: docker
+            state: restarted
+            enabled: yes
+          when: docker_update_result.changed
+
+      when: update_docker | bool
+
+    - name: Check if reboot is required
+      stat:
+        path: /var/run/reboot-required
+      register: reboot_required_file
+
+    - name: Display reboot requirement
+      debug:
+        msg: |
+          🔄 REBOOT STATUS
+          ================
+          Reboot Required: {{ reboot_required_file.stat.exists }}
+          Kernel Updates: {{ kernel_updates_count.stdout }}
+          Auto Reboot: {{ reboot_if_required }}
+
+    - name: Create update report
+      shell: |
+        report_file="/var/log/ansible_updates/update_report_{{ ansible_date_time.epoch }}.txt"
+
+        echo "🔒 SECURITY UPDATE REPORT - {{ inventory_hostname }}" > "$report_file"
+        echo "=================================================" >> "$report_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$report_file"
+        echo "Host: {{ inventory_hostname }}" >> "$report_file"
+        echo "Security Only: {{ security_only }}" >> "$report_file"
+        echo "Reboot Required: {{ reboot_required_file.stat.exists }}" >> "$report_file"
+        echo "" >> "$report_file"
+
+        echo "=== PRE-UPDATE INFO ===" >> "$report_file"
+        echo "{{ pre_update_info.stdout }}" >> "$report_file"
+        echo "" >> "$report_file"
+
+        echo "=== UPDATE RESULTS ===" >> "$report_file"
+        {% if security_only %}
+        {% if security_update_result is defined %}
+        echo "Security updates applied: {{ security_update_result.changed }}" >> "$report_file"
+        {% endif %}
+        {% else %}
+        {% if full_update_result is defined %}
+        echo "Full system update applied: {{ full_update_result.changed }}" >> "$report_file"
+        {% endif %}
+        {% endif %}
+
+        {% if update_docker and docker_update_result is defined %}
+        echo "Docker updated: {{ docker_update_result.changed }}" >> "$report_file"
+        {% endif %}
+
+        echo "" >> "$report_file"
+        echo "=== POST-UPDATE INFO ===" >> "$report_file"
+        echo "Kernel: $(uname -r)" >> "$report_file"
+        echo "Uptime: $(uptime)" >> "$report_file"
+        echo "Available updates: $(apt list --upgradable 2>/dev/null | wc -l)" >> "$report_file"
+
+        {% if backup_before_update %}
+        echo "" >> "$report_file"
+        echo "=== BACKUP INFO ===" >> "$report_file"
+        echo "{{ backup_result.stdout }}" >> "$report_file"
+        {% endif %}
+
+        cat "$report_file"
+      register: update_report
+
+    - name: Notify about pending reboot
+      debug:
+        msg: |
+          ⚠️  REBOOT REQUIRED
+          ===================
+          Host: {{ inventory_hostname }}
+          Reason: System updates require reboot
+          Kernel updates: {{ kernel_updates_count.stdout }}
+
+          Manual reboot command: sudo reboot
+          Or run with: -e "reboot_if_required=true"
+      when:
+        - reboot_required_file.stat.exists
+        - not reboot_if_required | bool
+
+    - name: Reboot system if required and authorized
+      reboot:
+        reboot_timeout: "{{ max_reboot_wait }}"
+        msg: "Rebooting for security updates"
+        pre_reboot_delay: 10
+      when:
+        - reboot_required_file.stat.exists
+        - reboot_if_required | bool
+      register: reboot_result
+
+    - name: Wait for system to come back online
+      wait_for_connection:
+        timeout: "{{ max_reboot_wait }}"
+        delay: 30
+      when: reboot_result is defined and reboot_result.changed
+
+    - name: Verify services after reboot
+      ansible.builtin.systemd:
+        name: "{{ item }}"
+      loop:
+        - ssh
+        - docker
+        - tailscaled
+      register: service_checks
+      failed_when: false
+      changed_when: false
+      when: reboot_result is defined and reboot_result.changed
+
+    - name: Final security check
+      shell: |
+        echo "=== FINAL SECURITY STATUS ==="
+        echo "Available security updates: $(apt list --upgradable 2>/dev/null | grep -c security || echo '0')"
+        echo "Reboot required: $([ -f /var/run/reboot-required ] && echo 'Yes' || echo 'No')"
+        echo "Last update: {{ ansible_date_time.iso8601 }}"
+        echo ""
+
+        echo "=== SYSTEM HARDENING CHECK ==="
+        echo "SSH root login: $(grep PermitRootLogin /etc/ssh/sshd_config | head -1 || echo 'Not configured')"
+        echo "Firewall status: $(ufw status | head -1 || echo 'UFW not available')"
+        echo "Fail2ban status: $(systemctl is-active fail2ban 2>/dev/null || echo 'Not running')"
+        echo "Automatic updates: $(systemctl is-enabled unattended-upgrades 2>/dev/null || echo 'Not configured')"
+      register: final_security_check
+      changed_when: false
+
+    - name: Display update summary
+      debug:
+        msg: |
+
+          ✅ SECURITY UPDATE COMPLETE - {{ inventory_hostname }}
+          =============================================
+
+          📅 Update Date: {{ ansible_date_time.date }}
+          🔐 Security Only: {{ security_only }}
+          🔄 Reboot Performed: {{ reboot_result.changed if reboot_result is defined else 'No' }}
+
+          {{ update_report.stdout }}
+
+          {{ final_security_check.stdout }}
+
+          {% if post_reboot_verification is defined %}
+          🔍 POST-REBOOT VERIFICATION:
+          {{ post_reboot_verification.stdout }}
+          {% endif %}
+
+          📄 Full report: /var/log/ansible_updates/update_report_{{ ansible_date_time.epoch }}.txt
+
+          🔍 Next Steps:
+          - Monitor system stability
+          - Check service functionality
+          - Review security hardening: ansible-playbook playbooks/security_audit.yml
+
+          =============================================
+
+    - name: Send update notification (if configured)
+      debug:
+        msg: |
+          📧 UPDATE NOTIFICATION
+          Host: {{ inventory_hostname }}
+          Status: Updates applied successfully
+          Reboot: {{ 'Required' if reboot_required_file.stat.exists else 'Not required' }}
+          Security updates: {{ security_updates_count.stdout }}
+      when: send_notifications | default(false) | bool
--- a/ansible/automation/playbooks/service_health_deep.yml
+++ b/ansible/automation/playbooks/service_health_deep.yml
@@ -0,0 +1,524 @@
+---
+# Deep Service Health Check Playbook
+# Comprehensive health monitoring for all homelab services
+# Usage: ansible-playbook playbooks/service_health_deep.yml
+# Usage: ansible-playbook playbooks/service_health_deep.yml -e "include_performance=true"
+# Usage: ansible-playbook playbooks/service_health_deep.yml -e "alert_on_issues=true"
+
+- name: Deep Service Health Check
+  hosts: "{{ host_target | default('all') }}"
+  gather_facts: yes
+  vars:
+    include_performance: "{{ include_performance | default(true) }}"
+    alert_on_issues: "{{ alert_on_issues | default(false) }}"
+    health_check_timeout: "{{ health_check_timeout | default(30) }}"
+    report_dir: "/tmp/health_reports"
+
+    # Service health check configurations
+    service_health_checks:
+      atlantis:
+        - name: "plex"
+          container: "plex"
+          health_url: "http://localhost:32400/web"
+          expected_status: 200
+          critical: true
+        - name: "immich-server"
+          container: "immich-server"
+          health_url: "http://localhost:2283/api/server-info/ping"
+          expected_status: 200
+          critical: true
+        - name: "vaultwarden"
+          container: "vaultwarden"
+          health_url: "http://localhost:80/alive"
+          expected_status: 200
+          critical: true
+        - name: "sonarr"
+          container: "sonarr"
+          health_url: "http://localhost:8989/api/v3/system/status"
+          expected_status: 200
+          critical: false
+        - name: "radarr"
+          container: "radarr"
+          health_url: "http://localhost:7878/api/v3/system/status"
+          expected_status: 200
+          critical: false
+      calypso:
+        - name: "authentik-server"
+          container: "authentik-server"
+          health_url: "http://localhost:9000/-/health/live/"
+          expected_status: 200
+          critical: true
+        - name: "paperless-webserver"
+          container: "paperless-webserver"
+          health_url: "http://localhost:8000"
+          expected_status: 200
+          critical: false
+      homelab_vm:
+        - name: "grafana"
+          container: "grafana"
+          health_url: "http://localhost:3000/api/health"
+          expected_status: 200
+          critical: true
+        - name: "prometheus"
+          container: "prometheus"
+          health_url: "http://localhost:9090/-/healthy"
+          expected_status: 200
+          critical: true
+
+  tasks:
+    - name: Create health report directory
+      file:
+        path: "{{ report_dir }}/{{ ansible_date_time.date }}"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+
+    - name: Get current service health checks for this host
+      set_fact:
+        current_health_checks: "{{ service_health_checks.get(inventory_hostname, []) }}"
+
+    - name: Display health check plan
+      debug:
+        msg: |
+          🏥 DEEP HEALTH CHECK PLAN
+          =========================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          🔍 Services to check: {{ current_health_checks | length }}
+          📊 Include Performance: {{ include_performance }}
+          🚨 Alert on Issues: {{ alert_on_issues }}
+          ⏱️  Timeout: {{ health_check_timeout }}s
+
+          📋 Services:
+          {% for service in current_health_checks %}
+          - {{ service.name }} ({{ 'Critical' if service.critical else 'Non-critical' }})
+          {% endfor %}
+
+    - name: Check Docker daemon health
+      shell: |
+        echo "=== DOCKER DAEMON HEALTH ==="
+
+        # Check Docker daemon status
+        if systemctl is-active --quiet docker; then
+          echo "✅ Docker daemon: Running"
+
+          # Check Docker daemon responsiveness
+          if timeout 10 docker version >/dev/null 2>&1; then
+            echo "✅ Docker API: Responsive"
+          else
+            echo "❌ Docker API: Unresponsive"
+          fi
+
+          # Check Docker disk usage
+          docker_usage=$(docker system df --format "table {{.Type}}\t{{.TotalCount}}\t{{.Size}}\t{{.Reclaimable}}")
+          echo "📊 Docker Usage:"
+          echo "$docker_usage"
+
+        else
+          echo "❌ Docker daemon: Not running"
+        fi
+      register: docker_health
+      changed_when: false
+
+    - name: Check container health status
+      shell: |
+        echo "=== CONTAINER HEALTH STATUS ==="
+
+        health_issues=()
+        total_containers=0
+        healthy_containers=0
+
+        {% for service in current_health_checks %}
+        echo "🔍 Checking {{ service.name }}..."
+        total_containers=$((total_containers + 1))
+
+        # Check if container exists and is running
+        if docker ps --filter "name={{ service.container }}" --format "{{.Names}}" | grep -q "{{ service.container }}"; then
+          echo "  ✅ Container running: {{ service.container }}"
+
+          # Check container health if health check is configured
+          health_status=$(docker inspect {{ service.container }} --format='{{.State.Health.Status}}' 2>/dev/null || echo "none")
+          if [ "$health_status" != "none" ]; then
+            if [ "$health_status" = "healthy" ]; then
+              echo "  ✅ Health check: $health_status"
+              healthy_containers=$((healthy_containers + 1))
+            else
+              echo "  ❌ Health check: $health_status"
+              health_issues+=("{{ service.name }}:health_check_failed")
+            fi
+          else
+            echo "  ℹ️  No health check configured"
+            healthy_containers=$((healthy_containers + 1))  # Assume healthy if no health check
+          fi
+
+          # Check container resource usage
+          container_stats=$(docker stats {{ service.container }} --no-stream --format "CPU: {{.CPUPerc}}, Memory: {{.MemUsage}}" 2>/dev/null || echo "Stats unavailable")
+          echo "  📊 Resources: $container_stats"
+
+        else
+          echo "  ❌ Container not running: {{ service.container }}"
+          health_issues+=("{{ service.name }}:container_down")
+        fi
+        echo ""
+        {% endfor %}
+
+        echo "📊 CONTAINER SUMMARY:"
+        echo "Total containers checked: $total_containers"
+        echo "Healthy containers: $healthy_containers"
+        echo "Issues found: ${#health_issues[@]}"
+
+        if [ ${#health_issues[@]} -gt 0 ]; then
+          echo "🚨 ISSUES:"
+          for issue in "${health_issues[@]}"; do
+            echo "  - $issue"
+          done
+        fi
+      register: container_health
+      changed_when: false
+
+    - name: Test service endpoints
+      shell: |
+        echo "=== SERVICE ENDPOINT HEALTH ==="
+
+        endpoint_issues=()
+        total_endpoints=0
+        healthy_endpoints=0
+
+        {% for service in current_health_checks %}
+        {% if service.health_url is defined %}
+        echo "🌐 Testing {{ service.name }} endpoint..."
+        total_endpoints=$((total_endpoints + 1))
+
+        # Test HTTP endpoint
+        response_code=$(curl -s -o /dev/null -w "%{http_code}" --max-time {{ health_check_timeout }} "{{ service.health_url }}" 2>/dev/null || echo "000")
+        response_time=$(curl -s -o /dev/null -w "%{time_total}" --max-time {{ health_check_timeout }} "{{ service.health_url }}" 2>/dev/null || echo "timeout")
+
+        if [ "$response_code" = "{{ service.expected_status }}" ]; then
+          echo "  ✅ HTTP $response_code (${response_time}s): {{ service.health_url }}"
+          healthy_endpoints=$((healthy_endpoints + 1))
+        else
+          echo "  ❌ HTTP $response_code (expected {{ service.expected_status }}): {{ service.health_url }}"
+          endpoint_issues+=("{{ service.name }}:http_$response_code")
+        fi
+        {% endif %}
+        {% endfor %}
+
+        echo ""
+        echo "📊 ENDPOINT SUMMARY:"
+        echo "Total endpoints tested: $total_endpoints"
+        echo "Healthy endpoints: $healthy_endpoints"
+        echo "Issues found: ${#endpoint_issues[@]}"
+
+        if [ ${#endpoint_issues[@]} -gt 0 ]; then
+          echo "🚨 ENDPOINT ISSUES:"
+          for issue in "${endpoint_issues[@]}"; do
+            echo "  - $issue"
+          done
+        fi
+      register: endpoint_health
+      changed_when: false
+
+    - name: Check system resources and performance
+      shell: |
+        echo "=== SYSTEM PERFORMANCE ==="
+
+        # CPU usage
+        cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
+        echo "🖥️  CPU Usage: ${cpu_usage}%"
+
+        # Memory usage
+        memory_info=$(free -h | awk 'NR==2{printf "Used: %s/%s (%.1f%%)", $3, $2, $3*100/$2}')
+        echo "💾 Memory: $memory_info"
+
+        # Disk usage for critical paths
+        echo "💿 Disk Usage:"
+        df -h / | tail -1 | awk '{printf "  Root: %s used (%s)\n", $5, $4}'
+
+        {% if inventory_hostname in ['atlantis', 'calypso'] %}
+        # Synology specific checks
+        if [ -d "/volume1" ]; then
+          df -h /volume1 | tail -1 | awk '{printf "  Volume1: %s used (%s)\n", $5, $4}'
+        fi
+        {% endif %}
+
+        # Load average
+        load_avg=$(uptime | awk -F'load average:' '{print $2}')
+        echo "⚖️  Load Average:$load_avg"
+
+        # Network connectivity
+        echo "🌐 Network:"
+        if ping -c 1 8.8.8.8 >/dev/null 2>&1; then
+          echo "  ✅ Internet connectivity"
+        else
+          echo "  ❌ Internet connectivity failed"
+        fi
+
+        # Tailscale status
+        if command -v tailscale >/dev/null 2>&1; then
+          tailscale_status=$(tailscale status --json 2>/dev/null | jq -r '.Self.Online' 2>/dev/null || echo "unknown")
+          if [ "$tailscale_status" = "true" ]; then
+            echo "  ✅ Tailscale connected"
+          else
+            echo "  ❌ Tailscale status: $tailscale_status"
+          fi
+        fi
+      register: system_performance
+      when: include_performance | bool
+      changed_when: false
+
+    - name: Check critical service dependencies
+      shell: |
+        echo "=== SERVICE DEPENDENCIES ==="
+
+        dependency_issues=()
+
+        # Check database connections for services that need them
+        {% for service in current_health_checks %}
+        {% if service.name in ['immich-server', 'vaultwarden', 'authentik-server', 'paperless-webserver'] %}
+        echo "🔍 Checking {{ service.name }} database dependency..."
+
+        # Try to find associated database container
+        db_container=""
+        case "{{ service.name }}" in
+          "immich-server") db_container="immich-db" ;;
+          "vaultwarden") db_container="vaultwarden-db" ;;
+          "authentik-server") db_container="authentik-db" ;;
+          "paperless-webserver") db_container="paperless-db" ;;
+        esac
+
+        if [ -n "$db_container" ]; then
+          if docker ps --filter "name=$db_container" --format "{{.Names}}" | grep -q "$db_container"; then
+            echo "  ✅ Database container running: $db_container"
+
+            # Test database connection
+            if docker exec "$db_container" pg_isready >/dev/null 2>&1; then
+              echo "  ✅ Database accepting connections"
+            else
+              echo "  ❌ Database not accepting connections"
+              dependency_issues+=("{{ service.name }}:database_connection")
+            fi
+          else
+            echo "  ❌ Database container not running: $db_container"
+            dependency_issues+=("{{ service.name }}:database_down")
+          fi
+        fi
+        {% endif %}
+        {% endfor %}
+
+        # Check Redis dependencies
+        {% for service in current_health_checks %}
+        {% if service.name in ['immich-server'] %}
+        echo "🔍 Checking {{ service.name }} Redis dependency..."
+
+        redis_container=""
+        case "{{ service.name }}" in
+          "immich-server") redis_container="immich-redis" ;;
+        esac
+
+        if [ -n "$redis_container" ]; then
+          if docker ps --filter "name=$redis_container" --format "{{.Names}}" | grep -q "$redis_container"; then
+            echo "  ✅ Redis container running: $redis_container"
+
+            # Test Redis connection
+            if docker exec "$redis_container" redis-cli ping | grep -q "PONG"; then
+              echo "  ✅ Redis responding to ping"
+            else
+              echo "  ❌ Redis not responding"
+              dependency_issues+=("{{ service.name }}:redis_connection")
+            fi
+          else
+            echo "  ❌ Redis container not running: $redis_container"
+            dependency_issues+=("{{ service.name }}:redis_down")
+          fi
+        fi
+        {% endif %}
+        {% endfor %}
+
+        echo ""
+        echo "📊 DEPENDENCY SUMMARY:"
+        echo "Issues found: ${#dependency_issues[@]}"
+
+        if [ ${#dependency_issues[@]} -gt 0 ]; then
+          echo "🚨 DEPENDENCY ISSUES:"
+          for issue in "${dependency_issues[@]}"; do
+            echo "  - $issue"
+          done
+        fi
+      register: dependency_health
+      changed_when: false
+
+    - name: Analyze service logs for errors
+      shell: |
+        echo "=== SERVICE LOG ANALYSIS ==="
+
+        log_issues=()
+
+        {% for service in current_health_checks %}
+        echo "📝 Analyzing {{ service.name }} logs..."
+
+        if docker ps --filter "name={{ service.container }}" --format "{{.Names}}" | grep -q "{{ service.container }}"; then
+          # Get recent logs and check for errors
+          error_count=$(docker logs {{ service.container }} --since=1h 2>&1 | grep -i -E "(error|exception|failed|fatal|panic)" | wc -l)
+          warn_count=$(docker logs {{ service.container }} --since=1h 2>&1 | grep -i -E "(warn|warning)" | wc -l)
+
+          echo "  Errors (1h): $error_count"
+          echo "  Warnings (1h): $warn_count"
+
+          if [ $error_count -gt 10 ]; then
+            echo "  ⚠️  High error count detected"
+            log_issues+=("{{ service.name }}:high_error_count:$error_count")
+          elif [ $error_count -gt 0 ]; then
+            echo "  ℹ️  Some errors detected"
+          else
+            echo "  ✅ No errors in recent logs"
+          fi
+
+          # Show recent critical errors
+          if [ $error_count -gt 0 ]; then
+            echo "  Recent errors:"
+            docker logs {{ service.container }} --since=1h 2>&1 | grep -i -E "(error|exception|failed|fatal|panic)" | tail -3 | sed 's/^/    /'
+          fi
+        else
+          echo "  ❌ Container not running"
+        fi
+        echo ""
+        {% endfor %}
+
+        echo "📊 LOG ANALYSIS SUMMARY:"
+        echo "Issues found: ${#log_issues[@]}"
+
+        if [ ${#log_issues[@]} -gt 0 ]; then
+          echo "🚨 LOG ISSUES:"
+          for issue in "${log_issues[@]}"; do
+            echo "  - $issue"
+          done
+        fi
+      register: log_analysis
+      changed_when: false
+
+    - name: Generate comprehensive health report
+      copy:
+        content: |
+          🏥 DEEP SERVICE HEALTH REPORT - {{ inventory_hostname }}
+          =====================================================
+
+          📅 Health Check Date: {{ ansible_date_time.iso8601 }}
+          🖥️  Host: {{ inventory_hostname }}
+          📊 Services Checked: {{ current_health_checks | length }}
+          ⏱️  Check Timeout: {{ health_check_timeout }}s
+
+          🐳 DOCKER DAEMON HEALTH:
+          {{ docker_health.stdout }}
+
+          📦 CONTAINER HEALTH:
+          {{ container_health.stdout }}
+
+          🌐 ENDPOINT HEALTH:
+          {{ endpoint_health.stdout }}
+
+          {% if include_performance %}
+          📊 SYSTEM PERFORMANCE:
+          {{ system_performance.stdout }}
+          {% endif %}
+
+          🔗 SERVICE DEPENDENCIES:
+          {{ dependency_health.stdout }}
+
+          📝 LOG ANALYSIS:
+          {{ log_analysis.stdout }}
+
+          🎯 CRITICAL SERVICES STATUS:
+          {% for service in current_health_checks %}
+          {% if service.critical %}
+          - {{ service.name }}: {% if service.container in container_health.stdout %}✅ Running{% else %}❌ Issues{% endif %}
+          {% endif %}
+          {% endfor %}
+
+          💡 RECOMMENDATIONS:
+          {% if 'Issues found: 0' not in container_health.stdout %}
+          - 🚨 Address container issues immediately
+          {% endif %}
+          {% if 'Issues found: 0' not in endpoint_health.stdout %}
+          - 🌐 Check service endpoint connectivity
+          {% endif %}
+          {% if 'Issues found: 0' not in dependency_health.stdout %}
+          - 🔗 Resolve service dependency issues
+          {% endif %}
+          - 📊 Monitor resource usage trends
+          - 🔄 Schedule regular health checks
+          - 📝 Set up log monitoring alerts
+
+          ✅ HEALTH CHECK COMPLETE
+
+        dest: "{{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_health_report.txt"
+      delegate_to: localhost
+
+    - name: Create health status JSON for automation
+      copy:
+        content: |
+          {
+            "timestamp": "{{ ansible_date_time.iso8601 }}",
+            "hostname": "{{ inventory_hostname }}",
+            "health_check_summary": {
+              "total_services": {{ current_health_checks | length }},
+              "critical_services": {{ current_health_checks | selectattr('critical', 'equalto', true) | list | length }},
+              "docker_healthy": {{ 'true' if 'Docker daemon: Running' in docker_health.stdout else 'false' }},
+              "overall_status": "{% if 'Issues found: 0' in container_health.stdout and 'Issues found: 0' in endpoint_health.stdout %}HEALTHY{% else %}ISSUES_DETECTED{% endif %}"
+            },
+            "services": [
+              {% for service in current_health_checks %}
+              {
+                "name": "{{ service.name }}",
+                "container": "{{ service.container }}",
+                "critical": {{ service.critical | lower }},
+                "status": "{% if service.container in container_health.stdout %}running{% else %}down{% endif %}"
+              }{% if not loop.last %},{% endif %}
+              {% endfor %}
+            ]
+          }
+        dest: "{{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_health_status.json"
+      delegate_to: localhost
+
+    - name: Display health check summary
+      debug:
+        msg: |
+
+          🏥 DEEP HEALTH CHECK COMPLETE - {{ inventory_hostname }}
+          ===============================================
+
+          📅 Date: {{ ansible_date_time.date }}
+          📊 Services: {{ current_health_checks | length }}
+
+          🎯 CRITICAL SERVICES:
+          {% for service in current_health_checks %}
+          {% if service.critical %}
+          - {{ service.name }}: {% if service.container in container_health.stdout %}✅ OK{% else %}❌ ISSUES{% endif %}
+          {% endif %}
+          {% endfor %}
+
+          📊 SUMMARY:
+          - Docker: {{ '✅ Healthy' if 'Docker daemon: Running' in docker_health.stdout else '❌ Issues' }}
+          - Containers: {{ '✅ All OK' if 'Issues found: 0' in container_health.stdout else '⚠️ Issues Found' }}
+          - Endpoints: {{ '✅ All OK' if 'Issues found: 0' in endpoint_health.stdout else '⚠️ Issues Found' }}
+          - Dependencies: {{ '✅ All OK' if 'Issues found: 0' in dependency_health.stdout else '⚠️ Issues Found' }}
+
+          📄 Reports:
+          - {{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_health_report.txt
+          - {{ report_dir }}/{{ ansible_date_time.date }}/{{ inventory_hostname }}_health_status.json
+
+          🔍 Next Steps:
+          - Review detailed report for specific issues
+          - Address any critical service problems
+          - Schedule regular health monitoring
+
+          ===============================================
+
+    - name: Send health alerts (if issues detected)
+      debug:
+        msg: |
+          🚨 HEALTH ALERT - {{ inventory_hostname }}
+          Critical issues detected in service health check!
+          Check the detailed report immediately.
+      when:
+        - alert_on_issues | bool
+        - "'ISSUES_DETECTED' in lookup('file', report_dir + '/' + ansible_date_time.date + '/' + inventory_hostname + '_health_status.json')"
--- a/ansible/automation/playbooks/service_inventory.yml
+++ b/ansible/automation/playbooks/service_inventory.yml
@@ -0,0 +1,331 @@
+---
+- name: Service Inventory and Documentation Generator
+  hosts: all
+  gather_facts: yes
+  vars:
+    inventory_timestamp: "{{ ansible_date_time.iso8601 }}"
+    inventory_dir: "/tmp/service_inventory"
+    documentation_dir: "/tmp/service_docs"
+
+  tasks:
+    - name: Create inventory directories
+      file:
+        path: "{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "{{ inventory_dir }}"
+        - "{{ documentation_dir }}"
+      delegate_to: localhost
+      run_once: true
+
+    - name: Check if Docker is available
+      shell: command -v docker >/dev/null 2>&1
+      register: docker_available
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Skip Docker tasks if not available
+      set_fact:
+        skip_docker: "{{ docker_available.rc != 0 }}"
+
+    - name: Discover running services
+      shell: |
+        echo "=== SERVICE DISCOVERY ==="
+
+        # System services (systemd)
+        if command -v systemctl >/dev/null 2>&1; then
+          echo "SYSTEMD_SERVICES:"
+          systemctl list-units --type=service --state=active --no-legend | head -20 | while read service rest; do
+            port_info=""
+            # Try to extract port information from service files
+            if systemctl show "$service" --property=ExecStart 2>/dev/null | grep -qE ":[0-9]+"; then
+              port_info=$(systemctl show "$service" --property=ExecStart 2>/dev/null | grep -oE ":[0-9]+" | head -1)
+            fi
+            echo "$service$port_info"
+          done
+          echo ""
+        fi
+
+        # Synology services (if available)
+        if command -v synoservice >/dev/null 2>&1; then
+          echo "SYNOLOGY_SERVICES:"
+          synoservice --list 2>/dev/null | grep -E "^\[.*\].*running" | head -20
+          echo ""
+        fi
+
+        # Network services (listening ports)
+        echo "NETWORK_SERVICES:"
+        if command -v netstat >/dev/null 2>&1; then
+          netstat -tuln 2>/dev/null | grep LISTEN | head -20
+        elif command -v ss >/dev/null 2>&1; then
+          ss -tuln 2>/dev/null | grep LISTEN | head -20
+        fi
+        echo ""
+      register: system_services
+      changed_when: false
+
+    - name: Discover Docker services
+      shell: |
+        if ! command -v docker >/dev/null 2>&1; then
+          echo "Docker not available"
+          exit 0
+        fi
+
+        echo "=== DOCKER SERVICE DISCOVERY ==="
+
+        # Get detailed container information
+        docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}\t{{.Ports}}" 2>/dev/null | while IFS=$'\t' read name image status ports; do
+          if [ "$name" != "NAMES" ]; then
+            echo "CONTAINER: $name"
+            echo "  Image: $image"
+            echo "  Status: $status"
+            echo "  Ports: $ports"
+
+            # Try to get more details
+            labels=$(docker inspect "$name" --format '{{range $key, $value := .Config.Labels}}{{$key}}={{$value}}{{"\n"}}{{end}}' 2>/dev/null | head -5)
+            if [ -n "$labels" ]; then
+              echo "  Labels:"
+              echo "$labels" | sed 's/^/    /'
+            fi
+
+            # Check for health status
+            health=$(docker inspect "$name" --format '{{.State.Health.Status}}' 2>/dev/null)
+            if [ "$health" != "<no value>" ] && [ -n "$health" ]; then
+              echo "  Health: $health"
+            fi
+
+            echo ""
+          fi
+        done
+      register: docker_services
+      changed_when: false
+      when: not skip_docker
+
+    - name: Analyze service configurations
+      shell: |
+        echo "=== CONFIGURATION ANALYSIS ==="
+
+        # Find common configuration directories
+        config_dirs="/etc /opt /home/*/config /volume1/docker"
+
+        echo "Configuration directories found:"
+        for dir in $config_dirs; do
+          if [ -d "$dir" ]; then
+            # Look for common config files
+            find "$dir" -maxdepth 3 -name "*.conf" -o -name "*.yaml" -o -name "*.yml" -o -name "*.json" -o -name "*.env" 2>/dev/null | head -10 | while read config_file; do
+              if [ -r "$config_file" ]; then
+                echo "  $config_file"
+              fi
+            done
+          fi
+        done
+        echo ""
+
+        # Docker Compose files
+        echo "Docker Compose files:"
+        find /opt /home -name "docker-compose*.yml" -o -name "compose*.yml" 2>/dev/null | head -10 | while read compose_file; do
+          echo "  $compose_file"
+          # Extract service names
+          services=$(grep -E "^  [a-zA-Z0-9_-]+:" "$compose_file" 2>/dev/null | sed 's/://g' | sed 's/^  //' | head -5)
+          if [ -n "$services" ]; then
+            echo "    Services: $(echo $services | tr '\n' ' ')"
+          fi
+        done
+      register: config_analysis
+      changed_when: false
+
+    - name: Detect web interfaces and APIs
+      shell: |
+        echo "=== WEB INTERFACE DETECTION ==="
+
+        # Common web interface ports
+        web_ports="80 443 8080 8443 3000 5000 8000 9000 9090 3001 8081 8082 8083 8084 8085"
+
+        for port in $web_ports; do
+          # Check if port is listening
+          if netstat -tuln 2>/dev/null | grep -q ":$port " || ss -tuln 2>/dev/null | grep -q ":$port "; then
+            echo "Port $port is active"
+
+            # Try to detect service type
+            if curl -s -m 3 -I "http://localhost:$port" 2>/dev/null | head -1 | grep -q "200\|301\|302"; then
+              server_header=$(curl -s -m 3 -I "http://localhost:$port" 2>/dev/null | grep -i "server:" | head -1)
+              title=$(curl -s -m 3 "http://localhost:$port" 2>/dev/null | grep -i "<title>" | head -1 | sed 's/<[^>]*>//g' | xargs)
+
+              echo "  HTTP Response: OK"
+              if [ -n "$server_header" ]; then
+                echo "  $server_header"
+              fi
+              if [ -n "$title" ]; then
+                echo "  Title: $title"
+              fi
+
+              # Check for common API endpoints
+              for endpoint in /api /health /status /metrics /version; do
+                if curl -s -m 2 "http://localhost:$port$endpoint" >/dev/null 2>&1; then
+                  echo "  API endpoint: http://localhost:$port$endpoint"
+                  break
+                fi
+              done
+            fi
+            echo ""
+          fi
+        done
+      register: web_interfaces
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Generate service catalog
+      set_fact:
+        service_catalog:
+          timestamp: "{{ inventory_timestamp }}"
+          hostname: "{{ inventory_hostname }}"
+          system_info:
+            os: "{{ ansible_distribution }} {{ ansible_distribution_version }}"
+            kernel: "{{ ansible_kernel }}"
+            architecture: "{{ ansible_architecture }}"
+          services:
+            system: "{{ system_services.stdout }}"
+            docker: "{{ docker_services.stdout if not skip_docker else 'Docker not available' }}"
+            configurations: "{{ config_analysis.stdout }}"
+            web_interfaces: "{{ web_interfaces.stdout }}"
+
+    - name: Display service inventory
+      debug:
+        msg: |
+
+          ==========================================
+          📋 SERVICE INVENTORY - {{ inventory_hostname }}
+          ==========================================
+
+          🖥️  SYSTEM INFO:
+          - OS: {{ service_catalog.system_info.os }}
+          - Kernel: {{ service_catalog.system_info.kernel }}
+          - Architecture: {{ service_catalog.system_info.architecture }}
+
+          🔧 SYSTEM SERVICES:
+          {{ service_catalog.services.system }}
+
+          🐳 DOCKER SERVICES:
+          {{ service_catalog.services.docker }}
+
+          ⚙️  CONFIGURATIONS:
+          {{ service_catalog.services.configurations }}
+
+          🌐 WEB INTERFACES:
+          {{ service_catalog.services.web_interfaces }}
+
+          ==========================================
+
+    - name: Generate JSON service inventory
+      copy:
+        content: |
+          {
+            "timestamp": "{{ service_catalog.timestamp }}",
+            "hostname": "{{ service_catalog.hostname }}",
+            "system_info": {
+              "os": "{{ service_catalog.system_info.os }}",
+              "kernel": "{{ service_catalog.system_info.kernel }}",
+              "architecture": "{{ service_catalog.system_info.architecture }}"
+            },
+            "services": {
+              "system": {{ service_catalog.services.system | to_json }},
+              "docker": {{ service_catalog.services.docker | to_json }},
+              "configurations": {{ service_catalog.services.configurations | to_json }},
+              "web_interfaces": {{ service_catalog.services.web_interfaces | to_json }}
+            }
+          }
+        dest: "{{ inventory_dir }}/{{ inventory_hostname }}_inventory_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Generate Markdown documentation
+      copy:
+        content: |
+          # Service Documentation - {{ inventory_hostname }}
+
+          **Generated:** {{ inventory_timestamp }}
+          **System:** {{ service_catalog.system_info.os }} ({{ service_catalog.system_info.architecture }})
+
+          ## 🔧 System Services
+
+          ```
+          {{ service_catalog.services.system }}
+          ```
+
+          ## 🐳 Docker Services
+
+          ```
+          {{ service_catalog.services.docker }}
+          ```
+
+          ## ⚙️ Configuration Files
+
+          ```
+          {{ service_catalog.services.configurations }}
+          ```
+
+          ## 🌐 Web Interfaces & APIs
+
+          ```
+          {{ service_catalog.services.web_interfaces }}
+          ```
+
+          ## 📊 Quick Stats
+
+          - **Hostname:** {{ inventory_hostname }}
+          - **OS:** {{ service_catalog.system_info.os }}
+          - **Kernel:** {{ service_catalog.system_info.kernel }}
+          - **Architecture:** {{ service_catalog.system_info.architecture }}
+          - **Docker Available:** {{ 'Yes' if not skip_docker else 'No' }}
+
+          ---
+
+          *Auto-generated by Ansible service_inventory.yml playbook*
+        dest: "{{ documentation_dir }}/{{ inventory_hostname }}_services.md"
+      delegate_to: localhost
+
+    - name: Generate consolidated inventory (run once)
+      shell: |
+        cd "{{ inventory_dir }}"
+
+        echo "# Homelab Service Inventory" > consolidated_inventory.md
+        echo "" >> consolidated_inventory.md
+        echo "**Generated:** {{ inventory_timestamp }}" >> consolidated_inventory.md
+        echo "" >> consolidated_inventory.md
+
+        # Process all JSON files
+        for json_file in *_inventory_*.json; do
+          if [ -f "$json_file" ]; then
+            hostname=$(basename "$json_file" | cut -d'_' -f1)
+            echo "## 🖥️ $hostname" >> consolidated_inventory.md
+            echo "" >> consolidated_inventory.md
+
+            # Extract key information using basic tools
+            if command -v jq >/dev/null 2>&1; then
+              os=$(jq -r '.system_info.os' "$json_file" 2>/dev/null || echo "Unknown")
+              echo "- **OS:** $os" >> consolidated_inventory.md
+              echo "- **File:** [$json_file](./$json_file)" >> consolidated_inventory.md
+              echo "- **Documentation:** [${hostname}_services.md](../service_docs/${hostname}_services.md)" >> consolidated_inventory.md
+            else
+              echo "- **File:** [$json_file](./$json_file)" >> consolidated_inventory.md
+            fi
+            echo "" >> consolidated_inventory.md
+          fi
+        done
+
+        echo "---" >> consolidated_inventory.md
+        echo "*Auto-generated by Ansible service_inventory.yml playbook*" >> consolidated_inventory.md
+      delegate_to: localhost
+      run_once: true
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          📋 Service inventory complete for {{ inventory_hostname }}
+          📄 JSON Report: {{ inventory_dir }}/{{ inventory_hostname }}_inventory_{{ ansible_date_time.epoch }}.json
+          📖 Markdown Doc: {{ documentation_dir }}/{{ inventory_hostname }}_services.md
+          📚 Consolidated: {{ inventory_dir }}/consolidated_inventory.md
+
+          💡 Use this playbook regularly to maintain up-to-date service documentation
+          💡 JSON files can be consumed by monitoring systems or dashboards
--- a/ansible/automation/playbooks/service_status.yml
+++ b/ansible/automation/playbooks/service_status.yml
@@ -0,0 +1,337 @@
+---
+# Service Status Check Playbook
+# Get comprehensive status of all services across homelab infrastructure
+# Usage: ansible-playbook playbooks/service_status.yml
+# Usage with specific host: ansible-playbook playbooks/service_status.yml --limit atlantis
+
+- name: Check Service Status Across Homelab
+  hosts: all
+  gather_facts: yes
+  vars:
+    portainer_endpoints:
+      atlantis: "https://192.168.0.200:9443"
+      calypso: "https://192.168.0.201:9443"
+      concord_nuc: "https://192.168.0.202:9443"
+      homelab_vm: "https://192.168.0.203:9443"
+      rpi5_vish: "https://192.168.0.204:9443"
+
+  tasks:
+    - name: Detect system type and environment
+      set_fact:
+        system_type: >-
+          {{
+            'synology' if (ansible_system_vendor is defined and 'synology' in ansible_system_vendor | lower) or
+                         (ansible_distribution is defined and 'dsm' in ansible_distribution | lower) or
+                         (ansible_hostname is defined and ('atlantis' in ansible_hostname or 'calypso' in ansible_hostname))
+            else 'container' if ansible_virtualization_type is defined and ansible_virtualization_type in ['docker', 'container']
+            else 'standard'
+          }}
+
+    - name: Check if Docker is running (Standard Linux with systemd)
+      systemd:
+        name: docker
+      register: docker_status_systemd
+      when: system_type == "standard"
+      ignore_errors: yes
+
+    - name: Check if Docker is running (Synology DSM)
+      shell: |
+        # Multiple methods to check Docker on Synology
+        if command -v synoservice >/dev/null 2>&1; then
+          # Method 1: Use synoservice (DSM 6.x/7.x)
+          if synoservice --status pkgctl-Docker 2>/dev/null | grep -q "start\|running"; then
+            echo "active"
+          elif synoservice --status Docker 2>/dev/null | grep -q "start\|running"; then
+            echo "active"
+          else
+            echo "inactive"
+          fi
+        elif command -v docker >/dev/null 2>&1; then
+          # Method 2: Direct Docker check
+          if docker info >/dev/null 2>&1; then
+            echo "active"
+          else
+            echo "inactive"
+          fi
+        elif [ -f /var/packages/Docker/enabled ]; then
+          # Method 3: Check package status file
+          echo "active"
+        else
+          echo "not-found"
+        fi
+      register: docker_status_synology
+      when: system_type == "synology"
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check if Docker is running (Container/Other environments)
+      shell: |
+        if command -v docker >/dev/null 2>&1; then
+          if docker info >/dev/null 2>&1; then
+            echo "active"
+          else
+            echo "inactive"
+          fi
+        else
+          echo "not-found"
+        fi
+      register: docker_status_other
+      when: system_type == "container"
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Set unified Docker status
+      set_fact:
+        docker_running: >-
+          {{
+            (docker_status_systemd is defined and docker_status_systemd.status is defined and docker_status_systemd.status.ActiveState == "active") or
+            (docker_status_synology is defined and docker_status_synology.stdout is defined and docker_status_synology.stdout == "active") or
+            (docker_status_other is defined and docker_status_other.stdout is defined and docker_status_other.stdout == "active")
+          }}
+
+    - name: Get Docker container status
+      shell: |
+        if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+          echo "=== DOCKER CONTAINERS ==="
+          # Use simpler format to avoid template issues
+          {% raw %}
+          docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Image}}" 2>/dev/null || echo "Permission denied or no containers"
+          {% endraw %}
+          echo ""
+          echo "=== CONTAINER SUMMARY ==="
+          running=$(docker ps -q 2>/dev/null | wc -l)
+          total=$(docker ps -aq 2>/dev/null | wc -l)
+          echo "Running: $running"
+          echo "Total: $total"
+        else
+          echo "Docker not available or not accessible"
+        fi
+      register: container_status
+      when: docker_running | bool
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check system resources
+      shell: |
+        echo "=== SYSTEM RESOURCES ==="
+        echo "CPU Usage: $(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)%"
+        echo "Memory: $(free -h | awk 'NR==2{printf "%.1f%% (%s/%s)", $3*100/$2, $3, $2}')"
+        echo "Disk: $(df -h / | awk 'NR==2{printf "%s (%s used)", $5, $3}')"
+        echo "Load Average: $(uptime | awk -F'load average:' '{print $2}')"
+      register: system_resources
+
+    - name: Check critical services (Standard Linux)
+      systemd:
+        name: "{{ item }}"
+      register: critical_services_systemd
+      loop:
+        - docker
+        - ssh
+        - tailscaled
+      when: system_type == "standard"
+      ignore_errors: yes
+
+    - name: Check critical services (Synology)
+      shell: |
+        service_name="{{ item }}"
+        case "$service_name" in
+          "docker")
+            if command -v synoservice >/dev/null 2>&1; then
+              if synoservice --status pkgctl-Docker 2>/dev/null | grep -q "start\|running"; then
+                echo "active"
+              else
+                echo "inactive"
+              fi
+            elif command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          "ssh")
+            if pgrep -f "sshd" >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          "tailscaled")
+            if pgrep -f "tailscaled" >/dev/null 2>&1; then
+              echo "active"
+            elif command -v tailscale >/dev/null 2>&1 && tailscale status >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          *)
+            echo "unknown"
+            ;;
+        esac
+      register: critical_services_synology
+      loop:
+        - docker
+        - ssh
+        - tailscaled
+      when: system_type == "synology"
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Check critical services (Container/Other)
+      shell: |
+        service_name="{{ item }}"
+        case "$service_name" in
+          "docker")
+            if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          "ssh")
+            if pgrep -f "sshd" >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          "tailscaled")
+            if pgrep -f "tailscaled" >/dev/null 2>&1; then
+              echo "active"
+            elif command -v tailscale >/dev/null 2>&1 && tailscale status >/dev/null 2>&1; then
+              echo "active"
+            else
+              echo "inactive"
+            fi
+            ;;
+          *)
+            echo "unknown"
+            ;;
+        esac
+      register: critical_services_other
+      loop:
+        - docker
+        - ssh
+        - tailscaled
+      when: system_type == "container"
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Set unified critical services status
+      set_fact:
+        critical_services: >-
+          {{
+            critical_services_systemd if critical_services_systemd is defined and not critical_services_systemd.skipped
+            else critical_services_synology if critical_services_synology is defined and not critical_services_synology.skipped
+            else critical_services_other if critical_services_other is defined and not critical_services_other.skipped
+            else {'results': []}
+          }}
+
+    - name: Check network connectivity
+      shell: |
+        echo "=== NETWORK STATUS ==="
+        echo "Tailscale Status:"
+        tailscale status --json | jq -r '.Self.HostName + " - " + .Self.TailscaleIPs[0]' 2>/dev/null || echo "Tailscale not available"
+        echo "Internet Connectivity:"
+        ping -c 1 8.8.8.8 >/dev/null 2>&1 && echo "✅ Internet OK" || echo "❌ Internet DOWN"
+      register: network_status
+      ignore_errors: yes
+
+    - name: Display comprehensive status report
+      debug:
+        msg: |
+
+          ==========================================
+          📊 SERVICE STATUS REPORT - {{ inventory_hostname }}
+          ==========================================
+
+          🖥️  SYSTEM INFO:
+          - Hostname: {{ ansible_hostname }}
+          - OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          - Uptime: {{ ansible_uptime_seconds | int // 86400 }} days, {{ (ansible_uptime_seconds | int % 86400) // 3600 }} hours
+
+          {{ system_resources.stdout }}
+
+          🐳 DOCKER STATUS:
+          {% if docker_running %}
+          ✅ Docker is running ({{ system_type }} system)
+          {% else %}
+          ❌ Docker is not running ({{ system_type }} system)
+          {% endif %}
+
+          📦 CONTAINER STATUS:
+          {% if container_status.stdout is defined %}
+          {{ container_status.stdout }}
+          {% else %}
+          No containers found or Docker not accessible
+          {% endif %}
+
+          🔧 CRITICAL SERVICES:
+          {% if critical_services.results is defined %}
+          {% for service in critical_services.results %}
+          {% if system_type == "standard" and service.status is defined %}
+          {% if service.status.ActiveState == "active" %}
+          ✅ {{ service.item }}: Running
+          {% else %}
+          ❌ {{ service.item }}: {{ service.status.ActiveState | default('Unknown') }}
+          {% endif %}
+          {% else %}
+          {% if service.stdout is defined and service.stdout == "active" %}
+          ✅ {{ service.item }}: Running
+          {% else %}
+          ❌ {{ service.item }}: {{ service.stdout | default('Unknown') }}
+          {% endif %}
+          {% endif %}
+          {% endfor %}
+          {% else %}
+          No service status available
+          {% endif %}
+
+          {{ network_status.stdout }}
+
+          ==========================================
+
+    - name: Generate JSON status report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ ansible_date_time.iso8601 }}",
+            "hostname": "{{ inventory_hostname }}",
+            "system_type": "{{ system_type }}",
+            "system": {
+              "os": "{{ ansible_distribution }} {{ ansible_distribution_version }}",
+              "uptime_days": {{ ansible_uptime_seconds | int // 86400 }},
+              "cpu_count": {{ ansible_processor_vcpus }},
+              "memory_mb": {{ ansible_memtotal_mb }},
+              "docker_status": "{{ 'active' if docker_running else 'inactive' }}"
+            },
+            "containers": {{ (container_status.stdout_lines | default([])) | to_json }},
+            "critical_services": [
+              {% if critical_services.results is defined %}
+              {% for service in critical_services.results %}
+              {
+                "name": "{{ service.item }}",
+                {% if system_type == "standard" and service.status is defined %}
+                "status": "{{ service.status.ActiveState | default('unknown') }}",
+                "enabled": {{ service.status.UnitFileState == "enabled" if service.status.UnitFileState is defined else false }}
+                {% else %}
+                "status": "{{ service.stdout | default('unknown') }}",
+                "enabled": {{ (service.stdout is defined and service.stdout == "active") | bool }}
+                {% endif %}
+              }{% if not loop.last %},{% endif %}
+              {% endfor %}
+              {% endif %}
+            ]
+          }
+        dest: "/tmp/{{ inventory_hostname }}_status_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+      ignore_errors: yes
+
+    - name: Summary message
+      debug:
+        msg: |
+          📋 Status check complete for {{ inventory_hostname }}
+          📄 JSON report saved to: /tmp/{{ inventory_hostname }}_status_{{ ansible_date_time.epoch }}.json
+
+          Run with --limit to check specific hosts:
+          ansible-playbook playbooks/service_status.yml --limit atlantis
--- a/ansible/automation/playbooks/setup_gitea_runner.yml
+++ b/ansible/automation/playbooks/setup_gitea_runner.yml
@@ -0,0 +1,140 @@
+---
+# Setup Gitea Actions Runner
+# This playbook sets up a Gitea Actions runner to process workflow jobs
+# Run with: ansible-playbook -i hosts.ini playbooks/setup_gitea_runner.yml --limit homelab
+#
+# The Gitea API token is prompted at runtime and never stored in this file.
+# Retrieve the token from Vaultwarden (collection: Homelab > Gitea API Tokens).
+
+- name: Setup Gitea Actions Runner
+  hosts: homelab
+  become: yes
+  vars:
+    gitea_url: "https://git.vish.gg"
+    runner_name: "homelab-runner"
+    runner_labels: "ubuntu-latest,linux,x64"
+    runner_dir: "/opt/gitea-runner"
+
+  vars_prompt:
+    - name: gitea_token
+      prompt: "Enter Gitea API token (see Vaultwarden > Homelab > Gitea API Tokens)"
+      private: yes
+
+  tasks:
+    - name: Create runner directory
+      file:
+        path: "{{ runner_dir }}"
+        state: directory
+        owner: root
+        group: root
+        mode: '0755'
+
+    - name: Check if act_runner binary exists
+      stat:
+        path: "{{ runner_dir }}/act_runner"
+      register: runner_binary
+
+    - name: Download act_runner binary
+      get_url:
+        url: "https://dl.gitea.com/act_runner/0.2.6/act_runner-0.2.6-linux-amd64"
+        dest: "{{ runner_dir }}/act_runner"
+        mode: '0755'
+        owner: root
+        group: root
+      when: not runner_binary.stat.exists
+
+    - name: Get registration token from Gitea API
+      uri:
+        url: "{{ gitea_url }}/api/v1/repos/Vish/homelab-optimized/actions/runners/registration-token"
+        method: GET
+        headers:
+          Authorization: "token {{ gitea_token }}"
+        return_content: yes
+      register: registration_response
+      delegate_to: localhost
+      run_once: true
+
+    - name: Extract registration token
+      set_fact:
+        registration_token: "{{ registration_response.json.token }}"
+
+    - name: Check if runner is already registered
+      stat:
+        path: "{{ runner_dir }}/.runner"
+      register: runner_config
+
+    - name: Register runner with Gitea
+      shell: |
+        cd {{ runner_dir }}
+        echo "{{ gitea_url }}" | {{ runner_dir }}/act_runner register \
+          --token {{ registration_token }} \
+          --name {{ runner_name }} \
+          --labels {{ runner_labels }} \
+          --no-interactive
+      when: not runner_config.stat.exists
+
+    - name: Create systemd service file
+      copy:
+        content: |
+          [Unit]
+          Description=Gitea Actions Runner
+          After=network.target
+
+          [Service]
+          Type=simple
+          User=root
+          WorkingDirectory={{ runner_dir }}
+          ExecStart={{ runner_dir }}/act_runner daemon
+          Restart=always
+          RestartSec=5
+
+          [Install]
+          WantedBy=multi-user.target
+        dest: /etc/systemd/system/gitea-runner.service
+        owner: root
+        group: root
+        mode: '0644'
+
+    - name: Reload systemd daemon
+      systemd:
+        daemon_reload: yes
+
+    - name: Enable and start gitea-runner service
+      systemd:
+        name: gitea-runner
+        enabled: yes
+        state: started
+
+    - name: Check runner status
+      systemd:
+        name: gitea-runner
+      register: runner_status
+
+    - name: Display runner status
+      debug:
+        msg: |
+          Gitea Actions Runner Status:
+          - Service: {{ runner_status.status.ActiveState }}
+          - Directory: {{ runner_dir }}
+          - Name: {{ runner_name }}
+          - Labels: {{ runner_labels }}
+          - Gitea URL: {{ gitea_url }}
+
+    - name: Verify runner registration
+      uri:
+        url: "{{ gitea_url }}/api/v1/repos/Vish/homelab-optimized/actions/runners"
+        method: GET
+        headers:
+          Authorization: "token {{ gitea_token }}"
+        return_content: yes
+      register: runners_list
+      delegate_to: localhost
+      run_once: true
+
+    - name: Display registered runners
+      debug:
+        msg: |
+          Registered Runners: {{ runners_list.json.total_count }}
+          {% for runner in runners_list.json.runners %}
+          - {{ runner.name }} ({{ runner.status }})
+          {% endfor %}
--- a/ansible/automation/playbooks/synology_backup_orchestrator.yml
+++ b/ansible/automation/playbooks/synology_backup_orchestrator.yml
@@ -0,0 +1,260 @@
+---
+# Synology Backup Orchestrator
+# Coordinates backups across Atlantis/Calypso with integrity verification
+# Run with: ansible-playbook -i hosts.ini playbooks/synology_backup_orchestrator.yml --limit synology
+
+- name: Synology Backup Orchestration
+  hosts: synology
+  gather_facts: yes
+  vars:
+    backup_retention_days: 30
+    critical_containers:
+      - "postgres"
+      - "mariadb"
+      - "gitea"
+      - "immich-server"
+      - "paperlessngx"
+      - "authentik-server"
+      - "vaultwarden"
+
+    backup_paths:
+      atlantis:
+        - "/volume1/docker"
+        - "/volume1/media"
+        - "/volume1/backups"
+        - "/volume1/documents"
+      calypso:
+        - "/volume1/docker"
+        - "/volume1/backups"
+        - "/volume1/development"
+
+  tasks:
+    - name: Check Synology system status
+      shell: |
+        echo "=== System Info ==="
+        uname -a
+        echo "=== Disk Usage ==="
+        df -h
+        echo "=== Memory Usage ==="
+        free -h
+        echo "=== Load Average ==="
+        uptime
+      register: system_status
+
+    - name: Display system status
+      debug:
+        msg: "{{ system_status.stdout_lines }}"
+
+    - name: Check Docker service status
+      shell: systemctl is-active docker
+      register: docker_status
+      failed_when: false
+
+    - name: Get running containers
+      shell: docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
+      register: running_containers
+      become: yes
+
+    - name: Identify critical containers
+      shell: docker ps --filter "name={{ item }}" --format "{{.Names}}"
+      register: critical_container_check
+      loop: "{{ critical_containers }}"
+      become: yes
+
+    - name: Create backup directory structure
+      file:
+        path: "/volume1/backups/{{ item }}"
+        state: directory
+        mode: '0755'
+      loop:
+        - "containers"
+        - "databases"
+        - "configs"
+        - "logs"
+      become: yes
+
+    - name: Stop non-critical containers for backup
+      shell: |
+        # Get list of running containers excluding critical ones
+        critical_pattern="{{ critical_containers | join('|') }}"
+        docker ps --format "{{.Names}}" | grep -vE "($critical_pattern)" > /tmp/non_critical_containers.txt || true
+
+        # Stop non-critical containers
+        if [ -s /tmp/non_critical_containers.txt ]; then
+          echo "Stopping non-critical containers for backup..."
+          cat /tmp/non_critical_containers.txt | xargs -r docker stop
+          echo "Stopped containers:"
+          cat /tmp/non_critical_containers.txt
+        else
+          echo "No non-critical containers to stop"
+        fi
+      register: stopped_containers
+      when: stop_containers_for_backup | default(false) | bool
+      become: yes
+
+    - name: Backup Docker volumes
+      shell: |
+        backup_date=$(date +%Y%m%d_%H%M%S)
+        backup_file="/volume1/backups/containers/docker_volumes_${backup_date}.tar.gz"
+
+        echo "Creating Docker volumes backup: $backup_file"
+        tar -czf "$backup_file" -C /volume1/docker . 2>/dev/null || true
+
+        if [ -f "$backup_file" ]; then
+          size=$(du -h "$backup_file" | cut -f1)
+          echo "Backup created successfully: $backup_file ($size)"
+        else
+          echo "Backup failed"
+          exit 1
+        fi
+      register: volume_backup
+      become: yes
+
+    - name: Backup database containers
+      shell: |
+        backup_date=$(date +%Y%m%d_%H%M%S)
+
+        # Backup PostgreSQL databases
+        for container in $(docker ps --filter "ancestor=postgres" --format "{{.Names}}"); do
+          echo "Backing up PostgreSQL container: $container"
+          docker exec "$container" pg_dumpall -U postgres > "/volume1/backups/databases/${container}_${backup_date}.sql" 2>/dev/null || true
+        done
+
+        # Backup MariaDB databases
+        for container in $(docker ps --filter "ancestor=mariadb" --format "{{.Names}}"); do
+          echo "Backing up MariaDB container: $container"
+          docker exec "$container" mysqldump --all-databases -u root > "/volume1/backups/databases/${container}_${backup_date}.sql" 2>/dev/null || true
+        done
+
+        echo "Database backups completed"
+      register: database_backup
+      become: yes
+
+    - name: Backup container configurations
+      shell: |
+        backup_date=$(date +%Y%m%d_%H%M%S)
+        config_backup="/volume1/backups/configs/container_configs_${backup_date}.tar.gz"
+
+        # Find all docker-compose files and configs
+        find /volume1/docker -name "docker-compose.yml" -o -name "*.env" -o -name "config" -type d | \
+        tar -czf "$config_backup" -T - 2>/dev/null || true
+
+        if [ -f "$config_backup" ]; then
+          size=$(du -h "$config_backup" | cut -f1)
+          echo "Configuration backup created: $config_backup ($size)"
+        fi
+      register: config_backup
+      become: yes
+
+    - name: Restart stopped containers
+      shell: |
+        if [ -f /tmp/non_critical_containers.txt ] && [ -s /tmp/non_critical_containers.txt ]; then
+          echo "Restarting previously stopped containers..."
+          cat /tmp/non_critical_containers.txt | xargs -r docker start
+          echo "Restarted containers:"
+          cat /tmp/non_critical_containers.txt
+          rm -f /tmp/non_critical_containers.txt
+        fi
+      when: stop_containers_for_backup | default(false) | bool
+      become: yes
+
+    - name: Verify backup integrity
+      shell: |
+        echo "=== Backup Verification ==="
+
+        # Check volume backup
+        latest_volume_backup=$(ls -t /volume1/backups/containers/docker_volumes_*.tar.gz 2>/dev/null | head -1)
+        if [ -n "$latest_volume_backup" ]; then
+          echo "Volume backup: $latest_volume_backup"
+          tar -tzf "$latest_volume_backup" >/dev/null 2>&1 && echo "✓ Volume backup integrity OK" || echo "✗ Volume backup corrupted"
+        fi
+
+        # Check database backups
+        db_backup_count=$(ls /volume1/backups/databases/*.sql 2>/dev/null | wc -l)
+        echo "Database backups: $db_backup_count files"
+
+        # Check config backup
+        latest_config_backup=$(ls -t /volume1/backups/configs/container_configs_*.tar.gz 2>/dev/null | head -1)
+        if [ -n "$latest_config_backup" ]; then
+          echo "Config backup: $latest_config_backup"
+          tar -tzf "$latest_config_backup" >/dev/null 2>&1 && echo "✓ Config backup integrity OK" || echo "✗ Config backup corrupted"
+        fi
+      register: backup_verification
+      become: yes
+
+    - name: Clean old backups
+      shell: |
+        echo "Cleaning backups older than {{ backup_retention_days }} days..."
+
+        # Clean volume backups
+        find /volume1/backups/containers -name "docker_volumes_*.tar.gz" -mtime +{{ backup_retention_days }} -delete
+
+        # Clean database backups
+        find /volume1/backups/databases -name "*.sql" -mtime +{{ backup_retention_days }} -delete
+
+        # Clean config backups
+        find /volume1/backups/configs -name "container_configs_*.tar.gz" -mtime +{{ backup_retention_days }} -delete
+
+        echo "Cleanup completed"
+      register: backup_cleanup
+      become: yes
+
+    - name: Generate backup report
+      copy:
+        content: |
+          # Synology Backup Report - {{ inventory_hostname }}
+          Generated: {{ ansible_date_time.iso8601 }}
+
+          ## System Status
+          ```
+          {{ system_status.stdout }}
+          ```
+
+          ## Running Containers
+          ```
+          {{ running_containers.stdout }}
+          ```
+
+          ## Backup Operations
+
+          ### Volume Backup
+          ```
+          {{ volume_backup.stdout }}
+          ```
+
+          ### Database Backup
+          ```
+          {{ database_backup.stdout }}
+          ```
+
+          ### Configuration Backup
+          ```
+          {{ config_backup.stdout }}
+          ```
+
+          ## Backup Verification
+          ```
+          {{ backup_verification.stdout }}
+          ```
+
+          ## Cleanup Results
+          ```
+          {{ backup_cleanup.stdout }}
+          ```
+
+          ## Critical Containers Status
+          {% for container in critical_containers %}
+          - {{ container }}: {{ 'Running' if container in running_containers.stdout else 'Not Found' }}
+          {% endfor %}
+        dest: "/tmp/synology_backup_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md"
+      delegate_to: localhost
+
+    - name: Display backup summary
+      debug:
+        msg: |
+          Backup Summary for {{ inventory_hostname }}:
+          - Volume Backup: {{ 'Completed' if volume_backup.rc == 0 else 'Failed' }}
+          - Database Backup: {{ 'Completed' if database_backup.rc == 0 else 'Failed' }}
+          - Config Backup: {{ 'Completed' if config_backup.rc == 0 else 'Failed' }}
+          - Verification: {{ 'Passed' if backup_verification.rc == 0 else 'Failed' }}
+          - Report: /tmp/synology_backup_{{ inventory_hostname }}_{{ ansible_date_time.epoch }}.md
--- a/ansible/automation/playbooks/system_info.yml
+++ b/ansible/automation/playbooks/system_info.yml
@@ -0,0 +1,12 @@
+---
+- name: Display system information
+  hosts: all
+  gather_facts: yes
+  tasks:
+    - name: Print system details
+      debug:
+        msg:
+          - "Hostname: {{ ansible_hostname }}"
+          - "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}"
+          - "Kernel: {{ ansible_kernel }}"
+          - "Uptime (hours): {{ ansible_uptime_seconds | int / 3600 | round(1) }}"
--- a/ansible/automation/playbooks/system_metrics.yml
+++ b/ansible/automation/playbooks/system_metrics.yml
@@ -0,0 +1,259 @@
+---
+# System Metrics Collection Playbook
+# Collects detailed system metrics for monitoring and analysis
+# Usage: ansible-playbook playbooks/system_metrics.yml
+# Usage: ansible-playbook playbooks/system_metrics.yml -e "metrics_duration=300"
+
+- name: Collect System Metrics
+  hosts: all
+  gather_facts: yes
+  vars:
+    metrics_dir: "/tmp/metrics"
+    default_metrics_duration: 60  # seconds
+    collection_interval: 5  # seconds between samples
+
+  tasks:
+    - name: Create metrics directory
+      file:
+        path: "{{ metrics_dir }}/{{ inventory_hostname }}"
+        state: directory
+        mode: '0755'
+
+    - name: Display metrics collection plan
+      debug:
+        msg: |
+          📊 SYSTEM METRICS COLLECTION
+          ===========================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          ⏱️  Duration: {{ metrics_duration | default(default_metrics_duration) }}s
+          📈 Interval: {{ collection_interval }}s
+          📁 Output: {{ metrics_dir }}/{{ inventory_hostname }}
+
+    - name: Collect baseline system information
+      shell: |
+        info_file="{{ metrics_dir }}/{{ inventory_hostname }}/system_info_{{ ansible_date_time.epoch }}.txt"
+
+        echo "📊 SYSTEM BASELINE INFORMATION" > "$info_file"
+        echo "==============================" >> "$info_file"
+        echo "Host: {{ inventory_hostname }}" >> "$info_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$info_file"
+        echo "OS: {{ ansible_distribution }} {{ ansible_distribution_version }}" >> "$info_file"
+        echo "Kernel: {{ ansible_kernel }}" >> "$info_file"
+        echo "Architecture: {{ ansible_architecture }}" >> "$info_file"
+        echo "CPU Cores: {{ ansible_processor_vcpus }}" >> "$info_file"
+        echo "Total Memory: {{ ansible_memtotal_mb }}MB" >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "🖥️  CPU INFORMATION:" >> "$info_file"
+        cat /proc/cpuinfo | grep -E "model name|cpu MHz|cache size" | head -10 >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "💾 MEMORY INFORMATION:" >> "$info_file"
+        cat /proc/meminfo | head -10 >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "💿 DISK INFORMATION:" >> "$info_file"
+        lsblk -o NAME,SIZE,TYPE,MOUNTPOINT >> "$info_file"
+        echo "" >> "$info_file"
+
+        echo "🌐 NETWORK INTERFACES:" >> "$info_file"
+        ip addr show | grep -E "^[0-9]+:|inet " >> "$info_file"
+
+        echo "Baseline info saved to: $info_file"
+      register: baseline_info
+
+    - name: Start continuous metrics collection
+      shell: |
+        metrics_file="{{ metrics_dir }}/{{ inventory_hostname }}/metrics_{{ ansible_date_time.epoch }}.csv"
+
+        # Create CSV header
+        echo "timestamp,cpu_usage,memory_usage,memory_available,load_1min,load_5min,load_15min,disk_usage_root,network_rx_bytes,network_tx_bytes,processes_total,processes_running,docker_containers_running" > "$metrics_file"
+
+        echo "📈 Starting metrics collection for {{ metrics_duration | default(default_metrics_duration) }} seconds..."
+
+        # Get initial network stats
+        initial_rx=$(cat /sys/class/net/*/statistics/rx_bytes 2>/dev/null | awk '{sum+=$1} END {print sum}' || echo "0")
+        initial_tx=$(cat /sys/class/net/*/statistics/tx_bytes 2>/dev/null | awk '{sum+=$1} END {print sum}' || echo "0")
+
+        samples=0
+        max_samples=$(( {{ metrics_duration | default(default_metrics_duration) }} / {{ collection_interval }} ))
+
+        while [ $samples -lt $max_samples ]; do
+          timestamp=$(date '+%Y-%m-%d %H:%M:%S')
+
+          # CPU usage (1 - idle percentage)
+          cpu_usage=$(vmstat 1 2 | tail -1 | awk '{print 100-$15}')
+
+          # Memory usage
+          memory_info=$(free -m)
+          memory_total=$(echo "$memory_info" | awk 'NR==2{print $2}')
+          memory_used=$(echo "$memory_info" | awk 'NR==2{print $3}')
+          memory_available=$(echo "$memory_info" | awk 'NR==2{print $7}')
+          memory_usage=$(echo "scale=1; $memory_used * 100 / $memory_total" | bc -l 2>/dev/null || echo "0")
+
+          # Load averages
+          load_info=$(uptime | awk -F'load average:' '{print $2}' | sed 's/^ *//')
+          load_1min=$(echo "$load_info" | awk -F',' '{print $1}' | sed 's/^ *//')
+          load_5min=$(echo "$load_info" | awk -F',' '{print $2}' | sed 's/^ *//')
+          load_15min=$(echo "$load_info" | awk -F',' '{print $3}' | sed 's/^ *//')
+
+          # Disk usage for root partition
+          disk_usage=$(df / | awk 'NR==2{print $5}' | sed 's/%//')
+
+          # Network stats
+          current_rx=$(cat /sys/class/net/*/statistics/rx_bytes 2>/dev/null | awk '{sum+=$1} END {print sum}' || echo "0")
+          current_tx=$(cat /sys/class/net/*/statistics/tx_bytes 2>/dev/null | awk '{sum+=$1} END {print sum}' || echo "0")
+
+          # Process counts
+          processes_total=$(ps aux | wc -l)
+          processes_running=$(ps aux | awk '$8 ~ /^R/ {count++} END {print count+0}')
+
+          # Docker container count (if available)
+          if command -v docker &> /dev/null && docker info &> /dev/null; then
+            docker_containers=$(docker ps -q | wc -l)
+          else
+            docker_containers=0
+          fi
+
+          # Write metrics to CSV
+          echo "$timestamp,$cpu_usage,$memory_usage,$memory_available,$load_1min,$load_5min,$load_15min,$disk_usage,$current_rx,$current_tx,$processes_total,$processes_running,$docker_containers" >> "$metrics_file"
+
+          samples=$((samples + 1))
+          echo "Sample $samples/$max_samples collected..."
+
+          sleep {{ collection_interval }}
+        done
+
+        echo "✅ Metrics collection complete: $metrics_file"
+      register: metrics_collection
+      async: "{{ ((metrics_duration | default(default_metrics_duration)) | int) + 30 }}"
+      poll: 10
+
+    - name: Collect Docker metrics (if available)
+      shell: |
+        docker_file="{{ metrics_dir }}/{{ inventory_hostname }}/docker_metrics_{{ ansible_date_time.epoch }}.txt"
+
+        if command -v docker &> /dev/null && docker info &> /dev/null; then
+          echo "🐳 DOCKER METRICS" > "$docker_file"
+          echo "=================" >> "$docker_file"
+          echo "Timestamp: {{ ansible_date_time.iso8601 }}" >> "$docker_file"
+          echo "" >> "$docker_file"
+
+          echo "📊 DOCKER SYSTEM INFO:" >> "$docker_file"
+          docker system df >> "$docker_file" 2>/dev/null || echo "Cannot get Docker system info" >> "$docker_file"
+          echo "" >> "$docker_file"
+
+          echo "📦 CONTAINER STATS:" >> "$docker_file"
+          docker stats --no-stream --format "table {{ '{{' }}.Container{{ '}}' }}\t{{ '{{' }}.CPUPerc{{ '}}' }}\t{{ '{{' }}.MemUsage{{ '}}' }}\t{{ '{{' }}.MemPerc{{ '}}' }}\t{{ '{{' }}.NetIO{{ '}}' }}\t{{ '{{' }}.BlockIO{{ '}}' }}" >> "$docker_file" 2>/dev/null || echo "Cannot get container stats" >> "$docker_file"
+          echo "" >> "$docker_file"
+
+          echo "🏃 RUNNING CONTAINERS:" >> "$docker_file"
+          docker ps --format "table {{ '{{' }}.Names{{ '}}' }}\t{{ '{{' }}.Image{{ '}}' }}\t{{ '{{' }}.Status{{ '}}' }}\t{{ '{{' }}.Ports{{ '}}' }}" >> "$docker_file" 2>/dev/null || echo "Cannot list containers" >> "$docker_file"
+          echo "" >> "$docker_file"
+
+          echo "🔍 CONTAINER RESOURCE USAGE:" >> "$docker_file"
+          for container in $(docker ps --format "{{ '{{' }}.Names{{ '}}' }}" 2>/dev/null); do
+            echo "--- $container ---" >> "$docker_file"
+            docker exec "$container" sh -c 'top -bn1 | head -5' >> "$docker_file" 2>/dev/null || echo "Cannot access container $container" >> "$docker_file"
+            echo "" >> "$docker_file"
+          done
+
+          echo "Docker metrics saved to: $docker_file"
+        else
+          echo "Docker not available - skipping Docker metrics"
+        fi
+      register: docker_metrics
+      failed_when: false
+
+    - name: Collect network metrics
+      shell: |
+        network_file="{{ metrics_dir }}/{{ inventory_hostname }}/network_metrics_{{ ansible_date_time.epoch }}.txt"
+
+        echo "🌐 NETWORK METRICS" > "$network_file"
+        echo "==================" >> "$network_file"
+        echo "Timestamp: {{ ansible_date_time.iso8601 }}" >> "$network_file"
+        echo "" >> "$network_file"
+
+        echo "🔌 INTERFACE STATISTICS:" >> "$network_file"
+        cat /proc/net/dev >> "$network_file"
+        echo "" >> "$network_file"
+
+        echo "🔗 ACTIVE CONNECTIONS:" >> "$network_file"
+        netstat -tuln | head -20 >> "$network_file" 2>/dev/null || ss -tuln | head -20 >> "$network_file" 2>/dev/null || echo "Cannot get connection info" >> "$network_file"
+        echo "" >> "$network_file"
+
+        echo "📡 ROUTING TABLE:" >> "$network_file"
+        ip route >> "$network_file" 2>/dev/null || route -n >> "$network_file" 2>/dev/null || echo "Cannot get routing info" >> "$network_file"
+        echo "" >> "$network_file"
+
+        echo "🌍 DNS CONFIGURATION:" >> "$network_file"
+        cat /etc/resolv.conf >> "$network_file" 2>/dev/null || echo "Cannot read DNS config" >> "$network_file"
+
+        echo "Network metrics saved to: $network_file"
+      register: network_metrics
+
+    - name: Generate metrics summary
+      shell: |
+        summary_file="{{ metrics_dir }}/{{ inventory_hostname }}/metrics_summary_{{ ansible_date_time.epoch }}.txt"
+        metrics_csv="{{ metrics_dir }}/{{ inventory_hostname }}/metrics_{{ ansible_date_time.epoch }}.csv"
+
+        echo "📊 METRICS COLLECTION SUMMARY" > "$summary_file"
+        echo "=============================" >> "$summary_file"
+        echo "Host: {{ inventory_hostname }}" >> "$summary_file"
+        echo "Date: {{ ansible_date_time.iso8601 }}" >> "$summary_file"
+        echo "Duration: {{ metrics_duration | default(default_metrics_duration) }}s" >> "$summary_file"
+        echo "Interval: {{ collection_interval }}s" >> "$summary_file"
+        echo "" >> "$summary_file"
+
+        if [ -f "$metrics_csv" ]; then
+          sample_count=$(tail -n +2 "$metrics_csv" | wc -l)
+          echo "📈 COLLECTION STATISTICS:" >> "$summary_file"
+          echo "Samples collected: $sample_count" >> "$summary_file"
+          echo "Expected samples: $(( {{ metrics_duration | default(default_metrics_duration) }} / {{ collection_interval }} ))" >> "$summary_file"
+          echo "" >> "$summary_file"
+
+          echo "📊 METRIC RANGES:" >> "$summary_file"
+          echo "CPU Usage:" >> "$summary_file"
+          tail -n +2 "$metrics_csv" | awk -F',' '{print $2}' | sort -n | awk 'NR==1{min=$1} {max=$1} END{print "  Min: " min "%, Max: " max "%"}' >> "$summary_file"
+
+          echo "Memory Usage:" >> "$summary_file"
+          tail -n +2 "$metrics_csv" | awk -F',' '{print $3}' | sort -n | awk 'NR==1{min=$1} {max=$1} END{print "  Min: " min "%, Max: " max "%"}' >> "$summary_file"
+
+          echo "Load Average (1min):" >> "$summary_file"
+          tail -n +2 "$metrics_csv" | awk -F',' '{print $5}' | sort -n | awk 'NR==1{min=$1} {max=$1} END{print "  Min: " min ", Max: " max}' >> "$summary_file"
+
+          echo "" >> "$summary_file"
+          echo "📁 GENERATED FILES:" >> "$summary_file"
+          ls -la {{ metrics_dir }}/{{ inventory_hostname }}/*{{ ansible_date_time.epoch }}* >> "$summary_file" 2>/dev/null || echo "No files found" >> "$summary_file"
+        else
+          echo "⚠️  WARNING: Metrics CSV file not found" >> "$summary_file"
+        fi
+
+        echo "Summary saved to: $summary_file"
+      register: metrics_summary
+
+    - name: Display metrics collection results
+      debug:
+        msg: |
+
+          📊 METRICS COLLECTION COMPLETE
+          ==============================
+          🖥️  Host: {{ inventory_hostname }}
+          📅 Date: {{ ansible_date_time.date }}
+          ⏱️  Duration: {{ metrics_duration | default(default_metrics_duration) }}s
+
+          📁 Generated Files:
+          {{ baseline_info.stdout }}
+          {{ metrics_collection.stdout }}
+          {{ docker_metrics.stdout | default('Docker metrics: N/A') }}
+          {{ network_metrics.stdout }}
+          {{ metrics_summary.stdout }}
+
+          🔍 Next Steps:
+          - Analyze metrics: cat {{ metrics_dir }}/{{ inventory_hostname }}/metrics_*.csv
+          - View summary: cat {{ metrics_dir }}/{{ inventory_hostname }}/metrics_summary_*.txt
+          - Plot trends: Use the CSV data with your preferred visualization tool
+          - Set up monitoring: ansible-playbook playbooks/alert_check.yml
+
+          ==============================
--- a/ansible/automation/playbooks/system_monitoring.yml
+++ b/ansible/automation/playbooks/system_monitoring.yml
@@ -0,0 +1,224 @@
+---
+- name: System Monitoring and Metrics Collection
+  hosts: all
+  gather_facts: yes
+  vars:
+    monitoring_timestamp: "{{ ansible_date_time.iso8601 }}"
+    metrics_retention_days: 30
+
+  tasks:
+    - name: Create monitoring data directory
+      file:
+        path: "/tmp/monitoring_data"
+        state: directory
+        mode: '0755'
+      delegate_to: localhost
+      run_once: true
+
+    - name: Collect system metrics
+      shell: |
+        echo "=== SYSTEM METRICS ==="
+        echo "Timestamp: $(date -Iseconds)"
+        echo "Hostname: $(hostname)"
+        echo "Uptime: $(uptime -p)"
+        echo "Load: $(uptime | awk -F'load average:' '{print $2}')"
+        echo ""
+
+        echo "=== CPU INFORMATION ==="
+        echo "CPU Model: $(lscpu | grep 'Model name' | cut -d':' -f2 | xargs)"
+        echo "CPU Cores: $(nproc)"
+        echo "CPU Usage: $(top -bn1 | grep 'Cpu(s)' | awk '{print $2}' | cut -d'%' -f1)%"
+        echo ""
+
+        echo "=== MEMORY INFORMATION ==="
+        free -h
+        echo ""
+
+        echo "=== DISK USAGE ==="
+        df -h
+        echo ""
+
+        echo "=== NETWORK INTERFACES ==="
+        ip -brief addr show
+        echo ""
+
+        echo "=== PROCESS SUMMARY ==="
+        ps aux --sort=-%cpu | head -10
+        echo ""
+
+        echo "=== SYSTEM TEMPERATURES (if available) ==="
+        if command -v sensors >/dev/null 2>&1; then
+          sensors 2>/dev/null || echo "Temperature sensors not available"
+        else
+          echo "lm-sensors not installed"
+        fi
+      register: system_metrics
+      changed_when: false
+
+    - name: Collect Docker metrics (if available)
+      shell: |
+        if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
+          echo "=== DOCKER METRICS ==="
+          echo "Docker Version: $(docker --version)"
+          echo "Containers Running: $(docker ps -q | wc -l)"
+          echo "Containers Total: $(docker ps -aq | wc -l)"
+          echo "Images: $(docker images -q | wc -l)"
+          echo "Volumes: $(docker volume ls -q | wc -l)"
+          echo ""
+
+          echo "=== CONTAINER RESOURCE USAGE ==="
+          docker stats --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}\t{{.BlockIO}}" 2>/dev/null || echo "No running containers"
+          echo ""
+
+          echo "=== DOCKER SYSTEM INFO ==="
+          docker system df 2>/dev/null || echo "Docker system info not available"
+        else
+          echo "Docker not available or not accessible"
+        fi
+      register: docker_metrics
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Collect network metrics
+      shell: |
+        echo "=== NETWORK METRICS ==="
+        echo "Active Connections:"
+        netstat -tuln 2>/dev/null | head -20 || ss -tuln | head -20
+        echo ""
+
+        echo "=== TAILSCALE STATUS ==="
+        if command -v tailscale >/dev/null 2>&1; then
+          tailscale status 2>/dev/null || echo "Tailscale not accessible"
+        else
+          echo "Tailscale not installed"
+        fi
+        echo ""
+
+        echo "=== INTERNET CONNECTIVITY ==="
+        ping -c 3 8.8.8.8 2>/dev/null | tail -2 || echo "Internet connectivity test failed"
+      register: network_metrics
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Collect service metrics
+      shell: |
+        echo "=== SERVICE METRICS ==="
+        if command -v systemctl >/dev/null 2>&1; then
+          echo "Failed Services:"
+          systemctl --failed --no-legend 2>/dev/null || echo "No failed services"
+          echo ""
+
+          echo "Active Services (sample):"
+          systemctl list-units --type=service --state=active --no-legend | head -10
+        else
+          echo "Systemd not available"
+        fi
+        echo ""
+
+        echo "=== LOG SUMMARY ==="
+        if [ -f /var/log/syslog ]; then
+          echo "Recent system log entries:"
+          tail -5 /var/log/syslog 2>/dev/null || echo "Cannot access syslog"
+        elif command -v journalctl >/dev/null 2>&1; then
+          echo "Recent journal entries:"
+          journalctl --no-pager -n 5 2>/dev/null || echo "Cannot access journal"
+        else
+          echo "No accessible system logs"
+        fi
+      register: service_metrics
+      changed_when: false
+      ignore_errors: yes
+
+    - name: Calculate performance metrics
+      set_fact:
+        performance_metrics:
+          cpu_usage: "{{ (system_metrics.stdout | regex_search('CPU Usage: ([0-9.]+)%', '\\1'))[0] | default('0') | float }}"
+          memory_total: "{{ ansible_memtotal_mb }}"
+          memory_used: "{{ ansible_memtotal_mb - ansible_memfree_mb }}"
+          memory_percent: "{{ ((ansible_memtotal_mb - ansible_memfree_mb) / ansible_memtotal_mb * 100) | round(1) }}"
+          disk_usage: "{{ ansible_mounts | selectattr('mount', 'equalto', '/') | map(attribute='size_total') | first | default(0) }}"
+          uptime_seconds: "{{ ansible_uptime_seconds }}"
+
+    - name: Display monitoring summary
+      debug:
+        msg: |
+
+          ==========================================
+          📊 MONITORING REPORT - {{ inventory_hostname }}
+          ==========================================
+
+          🖥️  PERFORMANCE SUMMARY:
+          - CPU Usage: {{ performance_metrics.cpu_usage }}%
+          - Memory: {{ performance_metrics.memory_percent }}% ({{ performance_metrics.memory_used }}MB/{{ performance_metrics.memory_total }}MB)
+          - Uptime: {{ performance_metrics.uptime_seconds | int // 86400 }} days, {{ (performance_metrics.uptime_seconds | int % 86400) // 3600 }} hours
+
+          📈 DETAILED METRICS:
+          {{ system_metrics.stdout }}
+
+          🐳 DOCKER METRICS:
+          {{ docker_metrics.stdout }}
+
+          🌐 NETWORK METRICS:
+          {{ network_metrics.stdout }}
+
+          🔧 SERVICE METRICS:
+          {{ service_metrics.stdout }}
+
+          ==========================================
+
+    - name: Generate comprehensive monitoring report
+      copy:
+        content: |
+          {
+            "timestamp": "{{ monitoring_timestamp }}",
+            "hostname": "{{ inventory_hostname }}",
+            "system_info": {
+              "os": "{{ ansible_distribution }} {{ ansible_distribution_version }}",
+              "kernel": "{{ ansible_kernel }}",
+              "architecture": "{{ ansible_architecture }}",
+              "cpu_cores": {{ ansible_processor_vcpus }},
+              "memory_mb": {{ ansible_memtotal_mb }}
+            },
+            "performance": {
+              "cpu_usage_percent": {{ performance_metrics.cpu_usage }},
+              "memory_usage_percent": {{ performance_metrics.memory_percent }},
+              "memory_used_mb": {{ performance_metrics.memory_used }},
+              "memory_total_mb": {{ performance_metrics.memory_total }},
+              "uptime_seconds": {{ performance_metrics.uptime_seconds }},
+              "uptime_days": {{ performance_metrics.uptime_seconds | int // 86400 }}
+            },
+            "raw_metrics": {
+              "system": {{ system_metrics.stdout | to_json }},
+              "docker": {{ docker_metrics.stdout | to_json }},
+              "network": {{ network_metrics.stdout | to_json }},
+              "services": {{ service_metrics.stdout | to_json }}
+            }
+          }
+        dest: "/tmp/monitoring_data/{{ inventory_hostname }}_metrics_{{ ansible_date_time.epoch }}.json"
+      delegate_to: localhost
+
+    - name: Create monitoring trend data
+      shell: |
+        echo "{{ monitoring_timestamp }},{{ inventory_hostname }},{{ performance_metrics.cpu_usage }},{{ performance_metrics.memory_percent }},{{ performance_metrics.uptime_seconds }}" >> /tmp/monitoring_data/trends.csv
+      delegate_to: localhost
+      ignore_errors: yes
+
+    - name: Clean old monitoring data
+      shell: |
+        find /tmp/monitoring_data -name "*.json" -mtime +{{ metrics_retention_days }} -delete 2>/dev/null || true
+      delegate_to: localhost
+      run_once: true
+      ignore_errors: yes
+
+    - name: Summary message
+      debug:
+        msg: |
+
+          📊 Monitoring complete for {{ inventory_hostname }}
+          📄 Report saved to: /tmp/monitoring_data/{{ inventory_hostname }}_metrics_{{ ansible_date_time.epoch }}.json
+          📈 Trend data updated in: /tmp/monitoring_data/trends.csv
+
+          Performance Summary:
+          - CPU: {{ performance_metrics.cpu_usage }}%
+          - Memory: {{ performance_metrics.memory_percent }}%
+          - Uptime: {{ performance_metrics.uptime_seconds | int // 86400 }} days
--- a/ansible/automation/playbooks/tailscale_health.yml
+++ b/ansible/automation/playbooks/tailscale_health.yml
@@ -0,0 +1,75 @@
+---
+- name: Tailscale Health Check (Homelab)
+  hosts: active            # or "all" if you want to check everything
+  gather_facts: yes
+  become: false
+
+  vars:
+    tailscale_bin: "/usr/bin/tailscale"
+    tailscale_service: "tailscaled"
+
+  tasks:
+
+    - name: Verify Tailscale binary exists
+      stat:
+        path: "{{ tailscale_bin }}"
+      register: ts_bin
+      ignore_errors: true
+
+    - name: Skip host if Tailscale not installed
+      meta: end_host
+      when: not ts_bin.stat.exists
+
+    - name: Get Tailscale CLI version
+      command: "{{ tailscale_bin }} version"
+      register: ts_version
+      changed_when: false
+      failed_when: false
+
+    - name: Get Tailscale status (JSON)
+      command: "{{ tailscale_bin }} status --json"
+      register: ts_status
+      changed_when: false
+      failed_when: false
+
+    - name: Parse Tailscale JSON
+      set_fact:
+        ts_parsed: "{{ ts_status.stdout | from_json }}"
+      when: ts_status.rc == 0 and (ts_status.stdout | length) > 0 and ts_status.stdout is search('{')
+
+    - name: Extract important fields
+      set_fact:
+        ts_backend_state: "{{ ts_parsed.BackendState | default('unknown') }}"
+        ts_ips: "{{ ts_parsed.Self.TailscaleIPs | default([]) }}"
+        ts_hostname: "{{ ts_parsed.Self.HostName | default(inventory_hostname) }}"
+      when: ts_parsed is defined
+
+    - name: Report healthy nodes
+      debug:
+        msg: >-
+          HEALTHY: {{ ts_hostname }}
+          version={{ ts_version.stdout | default('n/a') }},
+          backend={{ ts_backend_state }},
+          ips={{ ts_ips }}
+      when:
+        - ts_parsed is defined
+        - ts_backend_state == "Running"
+        - ts_ips | length > 0
+
+    - name: Report unhealthy or unreachable nodes
+      debug:
+        msg: >-
+          UNHEALTHY: {{ inventory_hostname }}
+          rc={{ ts_status.rc }},
+          backend={{ ts_backend_state | default('n/a') }},
+          ips={{ ts_ips | default([]) }},
+          version={{ ts_version.stdout | default('n/a') }}
+      when: ts_parsed is not defined or ts_backend_state != "Running"
+
+    - name: Always print concise summary
+      debug:
+        msg: >-
+          Host={{ inventory_hostname }},
+          Version={{ ts_version.stdout | default('n/a') }},
+          Backend={{ ts_backend_state | default('unknown') }},
+          IPs={{ ts_ips | default([]) }}
--- a/ansible/automation/playbooks/update_ansible.yml
+++ b/ansible/automation/playbooks/update_ansible.yml
@@ -0,0 +1,96 @@
+---
+# Update and upgrade Ansible on Linux hosts
+# Excludes Synology devices and handles Home Assistant carefully
+# Created: February 8, 2026
+
+- name: Update package cache and upgrade Ansible on Linux hosts
+  hosts: debian_clients:!synology
+  gather_facts: yes
+  become: yes
+  vars:
+    ansible_become_pass: "{{ ansible_ssh_pass | default(omit) }}"
+
+  tasks:
+    - name: Display target host information
+      debug:
+        msg: "Updating Ansible on {{ inventory_hostname }} ({{ ansible_host }})"
+
+    - name: Check if host is Home Assistant
+      set_fact:
+        is_homeassistant: "{{ inventory_hostname == 'homeassistant' }}"
+
+    - name: Skip Home Assistant with warning
+      debug:
+        msg: "Skipping {{ inventory_hostname }} - Home Assistant uses its own package management"
+      when: is_homeassistant
+
+    - name: Update apt package cache
+      apt:
+        update_cache: yes
+        cache_valid_time: 3600
+      when: not is_homeassistant
+      register: apt_update_result
+
+    - name: Display apt update results
+      debug:
+        msg: "APT cache updated on {{ inventory_hostname }}"
+      when: not is_homeassistant and apt_update_result is succeeded
+
+    - name: Check current Ansible version
+      command: ansible --version
+      register: current_ansible_version
+      changed_when: false
+      failed_when: false
+      when: not is_homeassistant
+
+    - name: Display current Ansible version
+      debug:
+        msg: "Current Ansible version on {{ inventory_hostname }}: {{ current_ansible_version.stdout_lines[0] if current_ansible_version.stdout_lines else 'Not installed' }}"
+      when: not is_homeassistant and current_ansible_version is defined
+
+    - name: Upgrade Ansible package
+      apt:
+        name: ansible
+        state: latest
+        only_upgrade: yes
+      when: not is_homeassistant
+      register: ansible_upgrade_result
+
+    - name: Display Ansible upgrade results
+      debug:
+        msg: |
+          Ansible upgrade on {{ inventory_hostname }}:
+          {% if ansible_upgrade_result.changed %}
+          ✅ Ansible was upgraded successfully
+          {% else %}
+          ℹ️  Ansible was already at the latest version
+          {% endif %}
+      when: not is_homeassistant
+
+    - name: Check new Ansible version
+      command: ansible --version
+      register: new_ansible_version
+      changed_when: false
+      when: not is_homeassistant and ansible_upgrade_result is succeeded
+
+    - name: Display new Ansible version
+      debug:
+        msg: "New Ansible version on {{ inventory_hostname }}: {{ new_ansible_version.stdout_lines[0] }}"
+      when: not is_homeassistant and new_ansible_version is defined
+
+    - name: Summary of changes
+      debug:
+        msg: |
+          Summary for {{ inventory_hostname }}:
+          {% if is_homeassistant %}
+          - Skipped (Home Assistant uses its own package management)
+          {% else %}
+          - APT cache: {{ 'Updated' if apt_update_result.changed else 'Already current' }}
+          - Ansible: {{ 'Upgraded' if ansible_upgrade_result.changed else 'Already latest version' }}
+          {% endif %}
+
+  handlers:
+    - name: Clean apt cache
+      apt:
+        autoclean: yes
+      when: not is_homeassistant
--- a/ansible/automation/playbooks/update_ansible_targeted.yml
+++ b/ansible/automation/playbooks/update_ansible_targeted.yml
@@ -0,0 +1,122 @@
+---
+# Targeted Ansible update for confirmed Debian/Ubuntu hosts
+# Excludes Synology, TrueNAS, Home Assistant, and unreachable hosts
+# Created: February 8, 2026
+
+- name: Update and upgrade Ansible on confirmed Linux hosts
+  hosts: homelab,pi-5,vish-concord-nuc,pve
+  gather_facts: yes
+  become: yes
+  serial: 1  # Process one host at a time for better control
+
+  tasks:
+    - name: Display target host information
+      debug:
+        msg: |
+          Processing: {{ inventory_hostname }} ({{ ansible_host }})
+          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          Python: {{ ansible_python_version }}
+
+    - name: Check if apt is available
+      stat:
+        path: /usr/bin/apt
+      register: apt_available
+
+    - name: Skip non-Debian hosts
+      debug:
+        msg: "Skipping {{ inventory_hostname }} - apt not available"
+      when: not apt_available.stat.exists
+
+    - name: Update apt package cache (with retry)
+      apt:
+        update_cache: yes
+        cache_valid_time: 0  # Force update
+      register: apt_update_result
+      retries: 3
+      delay: 10
+      when: apt_available.stat.exists
+      ignore_errors: yes
+
+    - name: Display apt update status
+      debug:
+        msg: |
+          APT update on {{ inventory_hostname }}:
+          {% if apt_update_result is succeeded %}
+          ✅ Success - Cache updated
+          {% elif apt_update_result is failed %}
+          ❌ Failed - {{ apt_update_result.msg | default('Unknown error') }}
+          {% else %}
+          ⏭️  Skipped - apt not available
+          {% endif %}
+
+    - name: Check if Ansible is installed
+      command: which ansible
+      register: ansible_installed
+      changed_when: false
+      failed_when: false
+      when: apt_available.stat.exists and apt_update_result is succeeded
+
+    - name: Get current Ansible version if installed
+      command: ansible --version
+      register: current_ansible_version
+      changed_when: false
+      failed_when: false
+      when: ansible_installed is succeeded and ansible_installed.rc == 0
+
+    - name: Display current Ansible status
+      debug:
+        msg: |
+          Ansible status on {{ inventory_hostname }}:
+          {% if ansible_installed is defined and ansible_installed.rc == 0 %}
+          📦 Installed: {{ current_ansible_version.stdout_lines[0] if current_ansible_version.stdout_lines else 'Version check failed' }}
+          {% else %}
+          📦 Not installed
+          {% endif %}
+
+    - name: Install or upgrade Ansible
+      apt:
+        name: ansible
+        state: latest
+        update_cache: no  # We already updated above
+      register: ansible_upgrade_result
+      when: apt_available.stat.exists and apt_update_result is succeeded
+      ignore_errors: yes
+
+    - name: Display Ansible installation/upgrade results
+      debug:
+        msg: |
+          Ansible operation on {{ inventory_hostname }}:
+          {% if ansible_upgrade_result is succeeded %}
+            {% if ansible_upgrade_result.changed %}
+          ✅ {{ 'Installed' if ansible_installed.rc != 0 else 'Upgraded' }} successfully
+            {% else %}
+          ℹ️  Already at latest version
+            {% endif %}
+          {% elif ansible_upgrade_result is failed %}
+          ❌ Failed: {{ ansible_upgrade_result.msg | default('Unknown error') }}
+          {% else %}
+          ⏭️  Skipped due to previous errors
+          {% endif %}
+
+    - name: Verify final Ansible version
+      command: ansible --version
+      register: final_ansible_version
+      changed_when: false
+      failed_when: false
+      when: ansible_upgrade_result is succeeded
+
+    - name: Final status summary
+      debug:
+        msg: |
+          === SUMMARY FOR {{ inventory_hostname | upper }} ===
+          Host: {{ ansible_host }}
+          OS: {{ ansible_distribution }} {{ ansible_distribution_version }}
+          APT Update: {{ '✅ Success' if apt_update_result is succeeded else '❌ Failed' if apt_update_result is defined else '⏭️ Skipped' }}
+          Ansible: {% if final_ansible_version is succeeded %}{{ final_ansible_version.stdout_lines[0] }}{% elif ansible_upgrade_result is succeeded %}{{ 'Installed/Updated' if ansible_upgrade_result.changed else 'Already current' }}{% else %}{{ '❌ Failed or skipped' }}{% endif %}
+
+  post_tasks:
+    - name: Clean up apt cache
+      apt:
+        autoclean: yes
+      when: apt_available.stat.exists and apt_update_result is succeeded
+      ignore_errors: yes
--- a/ansible/automation/playbooks/update_portainer_agent.yml
+++ b/ansible/automation/playbooks/update_portainer_agent.yml
@@ -0,0 +1,92 @@
+---
+# Update Portainer Edge Agent across homelab hosts
+#
+# Usage:
+#   ansible-playbook -i hosts.ini playbooks/update_portainer_agent.yml
+#   ansible-playbook -i hosts.ini playbooks/update_portainer_agent.yml -e "agent_version=2.33.7"
+#   ansible-playbook -i hosts.ini playbooks/update_portainer_agent.yml --limit vish-concord-nuc
+#
+# Notes:
+#   - Reads EDGE_ID and EDGE_KEY from the running container — no secrets needed in vars
+#   - Set docker_bin in host_vars to override the docker binary path per host
+#   - For Synology (calypso): docker_bin includes sudo prefix since Ansible become
+#     does not reliably escalate on DSM
+
+- name: Update Portainer Edge Agent
+  hosts: portainer_edge_agents
+  gather_facts: false
+  vars:
+    agent_version: "2.33.7"
+    agent_image: "portainer/agent:{{ agent_version }}"
+    container_name: portainer_edge_agent
+
+  tasks:
+    - name: Check container exists
+      shell: "{{ docker_bin | default('docker') }} inspect {{ container_name }} --format '{{ '{{' }}.Id{{ '}}' }}'"
+      register: container_check
+      changed_when: false
+      failed_when: container_check.rc != 0
+
+    - name: Get current image
+      shell: "{{ docker_bin | default('docker') }} inspect {{ container_name }} --format '{{ '{{' }}.Config.Image{{ '}}' }}'"
+      register: current_image
+      changed_when: false
+
+    - name: Get EDGE environment vars from running container
+      shell: "{{ docker_bin | default('docker') }} inspect {{ container_name }} --format '{{ '{{' }}json .Config.Env{{ '}}' }}'"
+      register: container_env
+      changed_when: false
+
+    - name: Parse EDGE_ID
+      set_fact:
+        edge_id: "{{ (container_env.stdout | from_json | select('match', 'EDGE_ID=.*') | list | first).split('=', 1)[1] }}"
+
+    - name: Parse EDGE_KEY
+      set_fact:
+        edge_key: "{{ (container_env.stdout | from_json | select('match', 'EDGE_KEY=.*') | list | first).split('=', 1)[1] }}"
+
+    - name: Pull new agent image
+      shell: "{{ docker_bin | default('docker') }} pull {{ agent_image }}"
+      register: pull_result
+      changed_when: "'Status: Downloaded newer image' in pull_result.stdout"
+
+    - name: Skip if already on target version
+      debug:
+        msg: "{{ inventory_hostname }}: already running {{ agent_image }}, skipping recreate"
+      when: current_image.stdout == agent_image and not pull_result.changed
+
+    - name: Stop old container
+      shell: "{{ docker_bin | default('docker') }} stop {{ container_name }}"
+      when: current_image.stdout != agent_image or pull_result.changed
+
+    - name: Remove old container
+      shell: "{{ docker_bin | default('docker') }} rm {{ container_name }}"
+      when: current_image.stdout != agent_image or pull_result.changed
+
+    - name: Start new container
+      shell: >
+        {{ docker_bin | default('docker') }} run -d
+        --name {{ container_name }}
+        --restart always
+        -v /var/run/docker.sock:/var/run/docker.sock
+        -v {{ docker_volumes_path | default('/var/lib/docker/volumes') }}:/var/lib/docker/volumes
+        -v /:/host
+        -v portainer_agent_data:/data
+        -e EDGE=1
+        -e EDGE_ID={{ edge_id }}
+        -e EDGE_KEY={{ edge_key }}
+        -e EDGE_INSECURE_POLL=1
+        {{ agent_image }}
+      when: current_image.stdout != agent_image or pull_result.changed
+
+    - name: Wait for container to be running
+      shell: "{{ docker_bin | default('docker') }} ps --filter 'name={{ container_name }}' --format '{{ '{{' }}.Status{{ '}}' }}'"
+      register: container_status
+      retries: 5
+      delay: 3
+      until: "'Up' in container_status.stdout"
+      when: current_image.stdout != agent_image or pull_result.changed
+
+    - name: Report result
+      debug:
+        msg: "{{ inventory_hostname }}: {{ current_image.stdout }} → {{ agent_image }} | {{ container_status.stdout | default('no change') }}"
--- a/ansible/automation/playbooks/update_system.yml
+++ b/ansible/automation/playbooks/update_system.yml
@@ -0,0 +1,8 @@
+- hosts: all
+  become: true
+  tasks:
+    - name: Update apt cache and upgrade packages
+      apt:
+        update_cache: yes
+        upgrade: dist
+      when: ansible_os_family == "Debian"