Sanitized mirror from private repository - 2026-03-18 10:31:50 UTC

2026-03-18 10:31:50 +00:00
commit 8e49624d78
1221 changed files with 304405 additions and 0 deletions
--- a/ansible/automation/docs/plans/2026-02-21-new-playbooks-design.md
+++ b/ansible/automation/docs/plans/2026-02-21-new-playbooks-design.md
@@ -0,0 +1,93 @@
+# New Playbooks Design — 2026-02-21
+
+## Context
+
+This plan adds five playbooks to fill coverage gaps in the existing 42-playbook homelab automation suite.
+Infrastructure: 10+ hosts, 200+ containers, Tailscale mesh, mixed platforms (Ubuntu, Debian,
+Synology DSM, TrueNAS SCALE, Proxmox, Alpine/Home Assistant, Raspberry Pi).
+
+## Approved Playbooks
+
+### 1. `network_connectivity.yml`
+**Priority: High (user-requested)**
+
+Full mesh connectivity verification across the tailnet.
+
+- Targets: `all` (unreachable hosts handled gracefully with `ignore_unreachable`)
+- Checks per host:
+  - Tailscale is running and has a valid IP (`tailscale status --json`)
+  - Ping all other inventory hosts by Tailscale IP
+  - SSH reachability to each peer
+  - HTTP/HTTPS endpoint health for key services (Portainer, Gitea, Immich, Home Assistant, etc.) — defined in group_vars or inline vars
+- Output: connectivity matrix table + `/tmp/connectivity_reports/connectivity_<timestamp>.json`
+- Alert: ntfy notification on any failed node or endpoint
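+
+A minimal sketch of the Tailscale and endpoint checks listed above. It assumes a hypothetical `health_endpoints` list (entries with a `url` key) defined in group_vars; the registered variable names are illustrative and not taken from the playbook itself:
+
+```yaml
+- name: Read Tailscale state (read-only)
+  ansible.builtin.command: tailscale status --json
+  register: ts_status
+  changed_when: false
+  ignore_errors: true
+
+- name: Extract backend state and first Tailscale IP
+  ansible.builtin.set_fact:
+    ts_running: "{{ (ts_status.stdout | from_json).BackendState == 'Running' }}"
+    ts_ip: "{{ (ts_status.stdout | from_json).Self.TailscaleIPs | default([]) | first | default('') }}"
+  when: ts_status is succeeded
+
+# health_endpoints is a hypothetical variable; the accepted status codes are an assumption.
+- name: Probe key service endpoints from the controller
+  ansible.builtin.uri:
+    url: "{{ item.url }}"
+    status_code: [200, 301, 302]
+    timeout: 10
+    validate_certs: false
+  loop: "{{ health_endpoints | default([]) }}"
+  register: endpoint_results
+  delegate_to: localhost
+  run_once: true
+  ignore_errors: true
+```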
+
+### 2. `proxmox_management.yml`
+**Priority: High**
+
+Proxmox-specific management playbook targeting the `pve` host.
+
+- Checks:
+  - VM/LXC inventory: count, names, state (running/stopped)
+  - Resource allocation vs actual usage (RAM, CPU per VM)
+  - Storage pool status and utilisation
+  - Recent Proxmox task log (last 10 tasks)
+- Optional action: `-e action=snapshot -e vm_id=100` to snapshot a specific VM (Ansible may warn that `action` is a reserved variable name, so the flag name may need adjusting)
+- Output: JSON report at `/tmp/health_reports/proxmox_<timestamp>.json`
+- Pattern: mirrors `synology_health.yml` structure
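+
+One way the inventory and storage checks could look, sketched with the standard `pvesh` and `pvesm` CLIs on the Proxmox host. The fact names (`pve_guests`, `pve_summary`) are illustrative assumptions:
+
+```yaml
+- name: List VMs and containers with state and resource usage (read-only)
+  ansible.builtin.command: pvesh get /cluster/resources --type vm --output-format json
+  register: pve_guests_raw
+  changed_when: false
+
+- name: Parse guest inventory
+  ansible.builtin.set_fact:
+    pve_guests: "{{ pve_guests_raw.stdout | from_json }}"
+
+- name: Count running and stopped guests
+  ansible.builtin.set_fact:
+    pve_summary:
+      total: "{{ pve_guests | length }}"
+      running: "{{ pve_guests | selectattr('status', 'equalto', 'running') | list | length }}"
+      stopped: "{{ pve_guests | selectattr('status', 'equalto', 'stopped') | list | length }}"
+
+- name: Storage pool status and utilisation (read-only)
+  ansible.builtin.command: pvesm status
+  register: pve_storage
+  changed_when: false
+```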
+
+### 3. `truenas_health.yml`
+**Priority: High**
+
+TrueNAS SCALE-specific health check targeting the `truenas-scale` host.
+
+- Checks:
+  - ZFS pool status (`zpool status`) — flags DEGRADED/FAULTED
+  - Pool scrub: last scrub date, status, any errors
+  - Dataset disk usage with warnings at 80%/90%
+  - SMART status for physical disks
+  - TrueNAS apps (k3s-based on SCALE releases before 24.10, Docker-based from 24.10): running app count, failed apps
+- Output: JSON report at `/tmp/health_reports/truenas_<timestamp>.json`
+- Complements existing `synology_health.yml`
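+
+A rough sketch of the pool health flagging, using `zpool list` in scripted mode (tab-separated, no header). For brevity it applies the 80%/90% thresholds at pool level as a stand-in for the per-dataset check; the fact names are illustrative assumptions:
+
+```yaml
+- name: List pool health and capacity (read-only)
+  ansible.builtin.command: zpool list -H -o name,health,capacity
+  register: zpool_list
+  changed_when: false
+
+- name: Collect pools that are not ONLINE (e.g. DEGRADED or FAULTED)
+  ansible.builtin.set_fact:
+    unhealthy_pools: "{{ unhealthy_pools | default([]) + [item.split('\t')[0]] }}"
+  loop: "{{ zpool_list.stdout_lines }}"
+  when: item.split('\t')[1] != 'ONLINE'
+
+- name: Classify pool capacity against the 80%/90% thresholds
+  ansible.builtin.set_fact:
+    pool_capacity: "{{ pool_capacity | default({}) | combine({item.split('\t')[0]: level}) }}"
+  vars:
+    # Capacity is reported as a percentage string such as "45%".
+    used_pct: "{{ item.split('\t')[2] | replace('%', '') | int }}"
+    level: "{{ 'critical' if used_pct | int >= 90 else 'warning' if used_pct | int >= 80 else 'ok' }}"
+  loop: "{{ zpool_list.stdout_lines }}"
+```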
+
+### 4. `ntp_check.yml`
+**Priority: Medium**
+
+Time sync health check across all hosts. Check only — no configuration changes.
+
+- Targets: `all`
+- Platform-adaptive daemon detection: `chronyd`, `systemd-timesyncd`, `ntpd`, Synology NTP
+- Reports: sync source, current offset (ms), stratum, last sync time
+- Thresholds (absolute offset): warn > 500 ms, critical > 1000 ms
+- Alert: ntfy notification for hosts exceeding warn threshold
+- Output: summary table + `/tmp/ntp_reports/ntp_<timestamp>.json`
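+
+A sketch of the threshold logic for the `chronyd` case only, assuming `chronyc -c tracking` emits CSV with stratum as the third field and the current system time offset (in seconds) as the fifth. That field order is an assumption worth verifying against the installed chrony version, and the other daemons would need their own parsing:
+
+```yaml
+- name: Query chrony tracking data as CSV (read-only)
+  ansible.builtin.command: chronyc -c tracking
+  register: chrony_tracking
+  changed_when: false
+  ignore_errors: true
+
+- name: Classify offset against warn/critical thresholds
+  ansible.builtin.set_fact:
+    ntp_stratum: "{{ fields[2] | int }}"
+    ntp_offset_ms: "{{ offset_ms }}"
+    ntp_status: "{{ 'critical' if offset_ms | float > 1000 else 'warn' if offset_ms | float > 500 else 'ok' }}"
+  vars:
+    fields: "{{ chrony_tracking.stdout.split(',') }}"
+    # Assumed layout: fields[4] is the system time offset in seconds (signed).
+    offset_ms: "{{ (fields[4] | float | abs) * 1000 }}"
+  when: chrony_tracking is succeeded
+```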
+
+### 5. `cron_audit.yml`
+**Priority: Medium**
+
+Scheduled task inventory and basic security audit across all hosts.
+
+- Inventories:
+  - `/etc/crontab`, `/etc/cron.d/*`, `/etc/cron.{hourly,daily,weekly,monthly}/`
+  - User crontabs (`crontab -l` for each user with a crontab)
+  - `systemd` timer units (`systemctl list-timers --all`)
+- Security flags:
+  - Cron jobs running as root that reference world-writable paths
+  - Cron jobs referencing paths that no longer exist
+- Output: per-host JSON at `/tmp/cron_audit/<host>_<timestamp>.json` + summary
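+
+A simplified sketch of the inventory step. It flags world-writable cron files themselves rather than parsing each job for referenced paths, which the full audit would still need to do; the registered names are illustrative assumptions:
+
+```yaml
+- name: List systemd timers (read-only)
+  ansible.builtin.command: systemctl list-timers --all --no-pager
+  register: systemd_timers
+  changed_when: false
+  ignore_errors: true
+
+- name: Find system cron files
+  ansible.builtin.find:
+    paths:
+      - /etc/cron.d
+      - /etc/cron.hourly
+      - /etc/cron.daily
+      - /etc/cron.weekly
+      - /etc/cron.monthly
+    file_type: file
+  register: cron_files
+
+# The find module reports a boolean "woth" (writable by others) per file.
+- name: Flag cron files that are world-writable
+  ansible.builtin.set_fact:
+    world_writable_cron: "{{ cron_files.files | selectattr('woth') | map(attribute='path') | list }}"
+```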
+
+## Patterns to Follow
+
+- Use `changed_when: false` on all read-only shell tasks
+- Use `ignore_errors: true` / `ignore_unreachable: true` for non-fatal checks
+- Platform detection via `ansible_distribution` and custom `system_type` host_vars
+- ntfy URL from `ntfy_url` variable (group_vars with default fallback)
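+- JSON reports saved to a per-category directory under `/tmp/` (e.g. `/tmp/health_reports/`, `/tmp/ntp_reports/`; `cron_audit.yml` uses `/tmp/cron_audit/`) with timestamp in filename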
+- `delegate_to: localhost` + `run_once: true` for report aggregation tasks
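+
+The patterns above might combine as follows. This is a sketch only: `report_data` and `failed_checks` are hypothetical variables built by earlier tasks, the `example_` filename prefix and the fallback ntfy URL are placeholders, and `ansible_date_time` requires gathered facts:
+
+```yaml
+- name: Ensure the report directory exists on the controller
+  ansible.builtin.file:
+    path: /tmp/health_reports
+    state: directory
+    mode: "0755"
+  delegate_to: localhost
+  run_once: true
+
+- name: Write aggregated JSON report on the controller
+  ansible.builtin.copy:
+    content: "{{ report_data | to_nice_json }}"
+    dest: "/tmp/health_reports/example_{{ ansible_date_time.iso8601_basic_short }}.json"
+    mode: "0644"
+  delegate_to: localhost
+  run_once: true
+
+- name: Send ntfy alert when any check failed
+  ansible.builtin.uri:
+    # Placeholder fallback; the real default would live in group_vars.
+    url: "{{ ntfy_url | default('https://ntfy.sh/REPLACE_ME') }}"
+    method: POST
+    body: "Health check failures: {{ failed_checks | default([]) | join(', ') }}"
+    headers:
+      Title: "Homelab health alert"
+      Priority: "high"
+  delegate_to: localhost
+  run_once: true
+  when: failed_checks | default([]) | length > 0
+  ignore_errors: true
+```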
+
+## Out of Scope
+
+- NTP configuration/enforcement (check only, per user decision)
+- Home Assistant backup (deferred)
+- Docker compose drift detection (deferred)
+- Gitea health (deferred)
--- a/ansible/automation/docs/plans/2026-02-21-new-playbooks-implementation.md
+++ b/ansible/automation/docs/plans/2026-02-21-new-playbooks-implementation.md