# New Playbooks Design — 2026-02-21

## Context

Adding 5 playbooks to fill coverage gaps in the existing 42-playbook homelab automation suite. Infrastructure: 10+ hosts, 200+ containers, Tailscale mesh, mixed platforms (Ubuntu, Debian, Synology DSM, TrueNAS SCALE, Proxmox, Alpine/Home Assistant, Raspberry Pi).

## Approved Playbooks

### 1. `network_connectivity.yml`

**Priority: High (user-requested)**

Full mesh connectivity verification across the tailnet.

- Targets: `all` (unreachable hosts handled gracefully with `ignore_unreachable`)
- Checks per host:
  - Tailscale is running and has a valid IP (`tailscale status --json`)
  - Ping all other inventory hosts by Tailscale IP
  - SSH reachability to each peer
  - HTTP/HTTPS endpoint health for key services (Portainer, Gitea, Immich, Home Assistant, etc.) — defined in group_vars or inline vars
- Output: connectivity matrix table + `/tmp/connectivity_reports/connectivity_.json`
- Alert: ntfy notification on any failed node or endpoint

### 2. `proxmox_management.yml`

**Priority: High**

Proxmox-specific management targeting the `pve` host.

- Checks:
  - VM/LXC inventory: count, names, state (running/stopped)
  - Resource allocation vs actual usage (RAM, CPU per VM)
  - Storage pool status and utilisation
  - Recent Proxmox task log (last 10 tasks)
- Optional action: `-e action=snapshot -e vm_id=100` to snapshot a specific VM
- Output: JSON report at `/tmp/health_reports/proxmox_.json`
- Pattern: mirrors `synology_health.yml` structure

### 3. `truenas_health.yml`

**Priority: High**

TrueNAS SCALE-specific health check targeting the `truenas-scale` host.

- Checks:
  - ZFS pool status (`zpool status`) — flags DEGRADED/FAULTED
  - Pool scrub: last scrub date, status, any errors
  - Dataset disk usage with warnings at 80%/90%
  - SMART status for physical disks
  - TrueNAS apps (k3s-based): running app count, failed apps
- Output: JSON report at `/tmp/health_reports/truenas_.json`
- Complements existing `synology_health.yml`

### 4. `ntp_check.yml`

**Priority: Medium**

Time sync health check across all hosts. Check only — no configuration changes.

- Targets: `all`
- Platform-adaptive daemon detection: `chronyd`, `systemd-timesyncd`, `ntpd`, Synology NTP
- Reports: sync source, current offset (ms), stratum, last sync time
- Thresholds: warn >500 ms, critical >1000 ms
- Alert: ntfy notification for hosts exceeding the warn threshold
- Output: summary table + `/tmp/ntp_reports/ntp_.json`

### 5. `cron_audit.yml`

**Priority: Medium**

Scheduled task inventory and basic security audit across all hosts.

- Inventories:
  - `/etc/crontab`, `/etc/cron.d/*`, `/etc/cron.{hourly,daily,weekly,monthly}/`
  - User crontabs (`crontab -l` for each user with a crontab)
  - `systemd` timer units (`systemctl list-timers --all`)
- Security flags:
  - Cron jobs running as root that reference world-writable paths
  - Cron jobs referencing paths that no longer exist
- Output: per-host JSON at `/tmp/cron_audit/_.json` + summary

## Patterns to Follow

- Use `changed_when: false` on all read-only shell tasks
- Use `ignore_errors: true` / `ignore_unreachable: true` for non-fatal checks
- Platform detection via `ansible_distribution` and custom `system_type` host_vars
- ntfy URL from the `ntfy_url` variable (group_vars with a default fallback)
- JSON reports saved to `/tmp/_reports/` with a timestamp in the filename
- `delegate_to: localhost` + `run_once: true` for report aggregation tasks

## Out of Scope

- NTP configuration/enforcement (check only, per user decision)
- Home Assistant backup (deferred)
- Docker compose drift detection (deferred)
- Gitea health (deferred)
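The shared patterns above can be sketched as one read-only check task, an ntfy alert, and a localhost aggregation task — a minimal illustration only. The task names, the `report_data` variable, the `example_` report prefix, and the fallback ntfy URL are invented for this sketch; real playbooks would substitute their own.

```yaml
- name: Read-only service check (never reports "changed")
  ansible.builtin.command: tailscale status --json
  register: ts_status
  changed_when: false        # read-only task, per the pattern list above
  ignore_errors: true        # a failed check is recorded, not fatal
  ignore_unreachable: true   # offline hosts don't abort the play

- name: Send ntfy alert when the check failed
  ansible.builtin.uri:
    url: "{{ ntfy_url | default('https://ntfy.sh/homelab-alerts') }}"  # fallback URL is hypothetical
    method: POST
    body: "check failed on {{ inventory_hostname }}"
  when: ts_status is failed
  delegate_to: localhost

- name: Aggregate all hosts' results into one timestamped JSON report
  ansible.builtin.copy:
    content: "{{ report_data | to_nice_json }}"  # report_data assembly elided
    dest: "/tmp/health_reports/example_{{ ansible_date_time.iso8601_basic_short }}.json"
  delegate_to: localhost
  run_once: true
```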
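The per-peer mesh checks in `network_connectivity.yml` (playbook 1) could be looped over the inventory like this — a sketch only; the `tailscale_ip` host variable is an assumption about how peer Tailscale IPs are exposed (host_vars or a prior parse of `tailscale status --json`).

```yaml
- name: Ping every other inventory host over its Tailscale IP
  ansible.builtin.command: "ping -c 2 -W 2 {{ hostvars[item].tailscale_ip }}"
  loop: "{{ groups['all'] | difference([inventory_hostname]) }}"
  register: mesh_ping
  changed_when: false
  ignore_errors: true   # record the failure for the matrix, don't abort

- name: Check SSH reachability to each peer
  ansible.builtin.wait_for:
    host: "{{ hostvars[item].tailscale_ip }}"
    port: 22
    timeout: 5
  loop: "{{ groups['all'] | difference([inventory_hostname]) }}"
  ignore_errors: true
```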
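For the world-writable flag in `cron_audit.yml` (playbook 5), one rough approach is to extract path-like tokens from root's crontab and test their permission bits with the `stat` module — a sketch under stated assumptions: the extraction regex is naive (it matches any absolute-path-looking token, including arguments), and the task names are hypothetical.

```yaml
- name: Collect root's crontab (absence of a crontab is not an error)
  ansible.builtin.command: crontab -l -u root
  register: root_cron
  changed_when: false
  failed_when: false   # "no crontab for root" exits non-zero

- name: Stat every path-like token referenced by root cron jobs
  ansible.builtin.stat:
    path: "{{ item }}"
  loop: "{{ root_cron.stdout | default('') | regex_findall('/[\\w./-]+') }}"
  register: cron_paths

- name: Flag world-writable paths (stat's woth = writable by others)
  ansible.builtin.debug:
    msg: "world-writable path in root cron: {{ item.item }}"
  loop: "{{ cron_paths.results }}"
  when: item.stat.exists and item.stat.woth
```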