New Playbooks Design — 2026-02-21

Context

Adding 5 playbooks to fill coverage gaps in the existing 42-playbook homelab automation suite. Infrastructure: 10+ hosts, 200+ containers, Tailscale mesh, mixed platforms (Ubuntu, Debian, Synology DSM, TrueNAS SCALE, Proxmox, Alpine/Home Assistant, Raspberry Pi).

Approved Playbooks

1. `network_connectivity.yml`

Priority: High (user-requested)

Full mesh connectivity verification across the tailnet.

Targets: all (unreachable hosts handled gracefully with ignore_unreachable)
Checks per host:
- Tailscale is running and has a valid IP (tailscale status --json)
- Ping all other inventory hosts by Tailscale IP
- SSH reachability to each peer
- HTTP/HTTPS endpoint health for key services (Portainer, Gitea, Immich, Home Assistant, etc.) — defined in group_vars or inline vars
Output: connectivity matrix table + /tmp/connectivity_reports/connectivity_<timestamp>.json
Alert: ntfy notification on any failed node or endpoint

2. `proxmox_management.yml`

Priority: High

Proxmox-specific management targeting pve host.

Checks:
- VM/LXC inventory: count, names, state (running/stopped)
- Resource allocation vs actual usage (RAM, CPU per VM)
- Storage pool status and utilisation
- Recent Proxmox task log (last 10 tasks)
Optional action: -e action=snapshot -e vm_id=100 to snapshot a specific VM
Output: JSON report at /tmp/health_reports/proxmox_<timestamp>.json
Pattern: mirrors synology_health.yml structure

3. `truenas_health.yml`

Priority: High

TrueNAS SCALE-specific health targeting truenas-scale host.

Checks:
- ZFS pool status (zpool status) — flags DEGRADED/FAULTED
- Pool scrub: last scrub date, status, any errors
- Dataset disk usage with warnings at 80%/90%
- SMART status for physical disks
- TrueNAS apps (k3s-based): running app count, failed apps
Output: JSON report at /tmp/health_reports/truenas_<timestamp>.json
Complements existing synology_health.yml

4. `ntp_check.yml`

Priority: Medium

Time sync health check across all hosts. Check only — no configuration changes.

Targets: all
Platform-adaptive daemon detection: chronyd, systemd-timesyncd, ntpd, Synology NTP
Reports: sync source, current offset (ms), stratum, last sync time
Thresholds: warn >500ms, critical >1000ms
Alert: ntfy notification for hosts exceeding warn threshold
Output: summary table + /tmp/ntp_reports/ntp_<timestamp>.json

5. `cron_audit.yml`

Priority: Medium

Scheduled task inventory and basic security audit across all hosts.

Inventories:
- /etc/crontab, /etc/cron.d/*, /etc/cron.{hourly,daily,weekly,monthly}/
- User crontabs (crontab -l for each user with a crontab)
- systemd timer units (systemctl list-timers --all)
Security flags:
- Cron jobs running as root that reference world-writable paths
- Cron jobs referencing paths that no longer exist
Output: per-host JSON at /tmp/cron_audit/<host>_<timestamp>.json + summary

Patterns to Follow

Use changed_when: false on all read-only shell tasks
Use ignore_errors: true / ignore_unreachable: true for non-fatal checks
Platform detection via ansible_distribution and custom system_type host_vars
ntfy URL from ntfy_url variable (group_vars with default fallback)
JSON reports saved to /tmp/<category>_reports/ with timestamp in filename
delegate_to: localhost + run_once: true for report aggregation tasks

Out of Scope

NTP configuration/enforcement (check only, per user decision)
Home Assistant backup (deferred)
Docker compose drift detection (deferred)
Gitea health (deferred)

3.6 KiB Raw Blame History

New Playbooks Design — 2026-02-21

Context

Approved Playbooks

1. network_connectivity.yml

2. proxmox_management.yml

3. truenas_health.yml

4. ntp_check.yml

5. cron_audit.yml