# New Playbooks Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Add 5 new Ansible playbooks covering network connectivity health, Proxmox management, TrueNAS health, NTP sync auditing, and cron job inventory.

**Architecture:** Each playbook is standalone, follows existing patterns (read-only shell tasks with `changed_when: false`, `failed_when: false` for non-fatal checks, ntfy alerting via `ntfy_url` var, JSON reports in `/tmp/_reports/`). Platform detection is done inline via command availability checks rather than Ansible facts to keep cross-platform compatibility with Synology/TrueNAS.

**Tech Stack:** Ansible, bash shell commands, Tailscale CLI, Proxmox `qm`/`pct`/`pvesh` CLI, ZFS `zpool`/`zfs` tools, `chronyc`/`timedatectl`, `smartctl`, standard POSIX cron paths.

---

## Conventions to Follow (read this first)

These patterns appear in every existing playbook — match them exactly:

```yaml
# Read-only tasks always have:
changed_when: false
failed_when: false  # (or ignore_errors: yes)

# Report directories:
delegate_to: localhost
run_once: true

# Variable defaults:
my_var: "{{ my_var | default('fallback') }}"

# Module names use fully-qualified form:
ansible.builtin.shell
ansible.builtin.debug
ansible.builtin.assert

# ntfy alerting (used in alert_check.yml — copy that pattern):
ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"
```

Reference files to read before each task:

- `playbooks/synology_health.yml` — pattern for platform-specific health checks
- `playbooks/tailscale_health.yml` — pattern for binary detection + JSON parsing
- `playbooks/disk_usage_report.yml` — pattern for threshold variables + report dirs
- `playbooks/alert_check.yml` — pattern for ntfy notifications

---

## Task 1: `network_connectivity.yml` — Full mesh connectivity check

**Files:**
- Create: `playbooks/network_connectivity.yml`

**What it does:** For every host in inventory, check
Tailscale is Running, ping all other hosts by their `ansible_host` IP, test SSH port reachability, and verify HTTP endpoints for key services. Outputs a connectivity matrix and sends ntfy alert on failures.

**Step 1: Create the playbook file**

```yaml
---
# Network Connectivity Health Check
# Verifies Tailscale mesh connectivity between all inventory hosts
# and checks HTTP/HTTPS endpoints for key services.
#
# Usage: ansible-playbook -i hosts.ini playbooks/network_connectivity.yml
# Usage: ansible-playbook -i hosts.ini playbooks/network_connectivity.yml --limit homelab

- name: Network Connectivity Health Check
  hosts: "{{ host_target | default('active') }}"
  gather_facts: yes
  ignore_unreachable: true

  vars:
    report_dir: "/tmp/connectivity_reports"
    ts_candidates:
      - /usr/bin/tailscale
      - /var/packages/Tailscale/target/bin/tailscale
    warn_on_failure: true
    ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"

    # HTTP endpoints to verify — add/remove per your services
    http_endpoints:
      - name: Portainer (homelab)
        url: "http://100.67.40.126:9000"
      - name: Gitea (homelab)
        url: "http://100.67.40.126:3000"
      - name: Immich (homelab)
        url: "http://100.67.40.126:2283"
      - name: Home Assistant
        url: "http://100.112.186.90:8123"

  tasks:
    - name: Create connectivity report directory
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

    # ── Tailscale status ──────────────────────────────────────────────
    - name: Detect Tailscale binary
      ansible.builtin.shell: |
        for p in {{ ts_candidates | join(' ') }}; do
          [ -x "$p" ] && echo "$p" && exit 0
        done
        echo ""
      register: ts_bin
      changed_when: false
      failed_when: false

    - name: Get Tailscale status JSON
      ansible.builtin.command: "{{ ts_bin.stdout }} status --json"
      register: ts_status_raw
      changed_when: false
      failed_when: false
      when: ts_bin.stdout | length > 0

    - name: Parse Tailscale state
      ansible.builtin.set_fact:
        ts_parsed: "{{ ts_status_raw.stdout | from_json }}"
        ts_backend: "{{ (ts_status_raw.stdout | from_json).BackendState | default('unknown') }}"
        ts_ip: "{{ ((ts_status_raw.stdout | from_json).Self.TailscaleIPs | default([]) | first) | default('n/a') }}"
      when:
        - ts_bin.stdout | length > 0
        - ts_status_raw.rc | default(1) == 0
        - ts_status_raw.stdout | default('') | length > 0
        - ts_status_raw.stdout is search('{')
      failed_when: false

    # ── Peer reachability (ping each inventory host by Tailscale IP) ──
    - name: Ping all inventory hosts
      ansible.builtin.shell: |
        ping -c 2 -W 2 {{ hostvars[item]['ansible_host'] }} > /dev/null 2>&1 && echo "OK" || echo "FAIL"
      register: ping_results
      changed_when: false
      failed_when: false
      loop: "{{ groups['active'] | select('ne', inventory_hostname) | list }}"
      loop_control:
        label: "{{ item }}"

    - name: Summarise ping results
      ansible.builtin.set_fact:
        ping_summary: "{{ ping_summary | default({}) | combine({item.item: item.stdout | trim}) }}"
      loop: "{{ ping_results.results }}"
      loop_control:
        label: "{{ item.item }}"

    # ── SSH port check ────────────────────────────────────────────────
    - name: Check SSH port on all inventory hosts
      ansible.builtin.shell: |
        port="{{ hostvars[item]['ansible_port'] | default(22) }}"
        nc -zw3 {{ hostvars[item]['ansible_host'] }} "$port" > /dev/null 2>&1 && echo "OK" || echo "FAIL"
      register: ssh_port_results
      changed_when: false
      failed_when: false
      loop: "{{ groups['active'] | select('ne', inventory_hostname) | list }}"
      loop_control:
        label: "{{ item }}"

    - name: Summarise SSH port results
      ansible.builtin.set_fact:
        ssh_summary: "{{ ssh_summary | default({}) | combine({item.item: item.stdout | trim}) }}"
      loop: "{{ ssh_port_results.results }}"
      loop_control:
        label: "{{ item.item }}"

    # ── HTTP endpoint checks (run once from localhost) ────────────────
    - name: Check HTTP endpoints
      ansible.builtin.uri:
        url: "{{ item.url }}"
        method: GET
        status_code: [200, 301, 302, 401, 403]
        timeout: 5
        validate_certs: false
      register: http_results
      failed_when: false
      loop: "{{ http_endpoints }}"
      loop_control:
        label: "{{ item.name }}"
      delegate_to: localhost
      run_once: true

    # ── Connectivity summary ──────────────────────────────────────────
    - name: Display connectivity summary per host
      ansible.builtin.debug:
        msg: |
          ═══ {{ inventory_hostname }} ═══
          Tailscale: {{ ts_backend | default('not installed') }} | IP: {{ ts_ip | default('n/a') }}
          Peer ping results:
          {% for host, result in (ping_summary | default({})).items() %}
            {{ host }}: {{ result }}
          {% endfor %}
          SSH port results:
          {% for host, result in (ssh_summary | default({})).items() %}
            {{ host }}: {{ result }}
          {% endfor %}

    - name: Display HTTP endpoint results
      ansible.builtin.debug:
        msg: |
          ═══ HTTP Endpoint Health ═══
          {% for item in http_results.results | default([]) %}
            {{ item.item.name }}: {{ 'OK (' + (item.status | string) + ')' if item.status is defined and item.status > 0 else 'FAIL' }}
          {% endfor %}
      run_once: true
      delegate_to: localhost

    # ── Alert on failures ─────────────────────────────────────────────
    - name: Collect failed peers
      ansible.builtin.set_fact:
        failed_peers: >-
          {{ (ping_summary | default({})).items() | selectattr('1', 'eq', 'FAIL') | map(attribute='0') | list }}

    - name: Send ntfy alert for connectivity failures
      ansible.builtin.uri:
        url: "{{ ntfy_url }}"
        method: POST
        body: "Connectivity failures on {{ inventory_hostname }}: {{ failed_peers | join(', ') }}"
        headers:
          Title: "Homelab Network Alert"
          Priority: "high"
          Tags: "warning,network"
        body_format: raw
        status_code: [200, 204]
      delegate_to: localhost
      failed_when: false
      when:
        - warn_on_failure | bool
        - failed_peers | length > 0

    # ── Write JSON report ─────────────────────────────────────────────
    - name: Write connectivity report
      ansible.builtin.copy:
        content: "{{ {'host': inventory_hostname, 'timestamp': ansible_date_time.iso8601, 'tailscale_state': ts_backend | default('unknown'), 'tailscale_ip': ts_ip | default('n/a'), 'ping': ping_summary | default({}), 'ssh_port': ssh_summary | default({})} | to_nice_json }}"
        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
      delegate_to: localhost
      changed_when: false
```

**Step 2: Validate YAML syntax**

```bash
cd /home/homelab/organized/repos/homelab/ansible/automation
ansible-playbook --syntax-check -i hosts.ini playbooks/network_connectivity.yml
```

Expected: `playbook: playbooks/network_connectivity.yml` with no errors.

**Step 3: Dry-run against one host**

```bash
ansible-playbook -i hosts.ini playbooks/network_connectivity.yml --limit homelab --check
```

Expected: Tasks run, no failures. Some tasks will report `skipped` (when conditions, etc.) — that's fine.

**Step 4: Run for real against one host**

```bash
ansible-playbook -i hosts.ini playbooks/network_connectivity.yml --limit homelab
```

Expected: Connectivity summary printed, report written to `/tmp/connectivity_reports/homelab_.json`.

**Step 5: Run against all active hosts**

```bash
ansible-playbook -i hosts.ini playbooks/network_connectivity.yml
```

Expected: Summary for every host in `[active]` group. Unreachable hosts are handled gracefully (skipped, not errored).

**Step 6: Commit**

```bash
git add playbooks/network_connectivity.yml
git commit -m "feat: add network_connectivity playbook for full mesh health check"
```

---

## Task 2: `proxmox_management.yml` — Proxmox VM/LXC inventory and health

**Files:**
- Create: `playbooks/proxmox_management.yml`

**What it does:** Targets the `pve` host. Reports VM inventory (`qm list`), LXC inventory (`pct list`), node resource summary, storage pool status, and last 10 task log entries. Optional snapshot action via `-e action=snapshot -e vm_id=100`.

**Note:** `pve` uses `ansible_user=root` (see `hosts.ini`), so `become: false` is correct here — root already has all access.

**Step 1: Create the playbook**

```yaml
---
# Proxmox VE Management Playbook
# Reports VM/LXC inventory, resource usage, storage pool status, and recent tasks.
# Optionally creates a snapshot with -e action=snapshot -e vm_id=100
#
# Usage: ansible-playbook -i hosts.ini playbooks/proxmox_management.yml
# Usage: ansible-playbook -i hosts.ini playbooks/proxmox_management.yml -e action=snapshot -e vm_id=100

- name: Proxmox VE Management
  hosts: pve
  gather_facts: yes
  become: false

  vars:
    action: "{{ action | default('status') }}"  # status | snapshot
    vm_id: "{{ vm_id | default('') }}"
    report_dir: "/tmp/health_reports"

  tasks:
    - name: Create report directory
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

    # ── Node overview ─────────────────────────────────────────────────
    - name: Get PVE version
      ansible.builtin.command: pveversion
      register: pve_version
      changed_when: false
      failed_when: false

    - name: Get node resource summary
      ansible.builtin.shell: |
        pvesh get /nodes/$(hostname)/status --output-format json 2>/dev/null || \
          echo '{"error": "pvesh not available"}'
      register: node_status_raw
      changed_when: false
      failed_when: false

    - name: Parse node status
      ansible.builtin.set_fact:
        node_status: "{{ node_status_raw.stdout | from_json }}"
      failed_when: false
      when: node_status_raw.stdout | default('') | length > 0

    # ── VM inventory ──────────────────────────────────────────────────
    - name: List all VMs
      ansible.builtin.command: qm list
      register: vm_list
      changed_when: false
      failed_when: false

    - name: List all LXC containers
      ansible.builtin.command: pct list
      register: lxc_list
      changed_when: false
      failed_when: false

    - name: Count running VMs
      ansible.builtin.shell: |
        # grep -c already prints 0 (and exits 1) when nothing matches,
        # so use "|| true" rather than echoing a second 0
        qm list 2>/dev/null | grep -c "running" || true
      register: vm_running_count
      changed_when: false
      failed_when: false

    - name: Count running LXCs
      ansible.builtin.shell: |
        pct list 2>/dev/null | grep -c "running" || true
      register: lxc_running_count
      changed_when: false
      failed_when: false

    # ── Storage pools ─────────────────────────────────────────────────
    - name: Get storage pool status
      ansible.builtin.shell: |
        pvesh get /nodes/$(hostname)/storage --output-format json 2>/dev/null | \
        python3 -c "
        import json,sys
        data=json.load(sys.stdin)
        for s in data:
            used_pct = round(s.get('used',0) / s.get('total',1) * 100, 1) if s.get('total',0) > 0 else 0
            print(f\"{s.get('storage','?'):20} {s.get('type','?'):10} used={used_pct}% avail={round(s.get('avail',0)/1073741824,1)}GiB\")
        " 2>/dev/null || pvesm status 2>/dev/null || echo "Storage info unavailable"
      register: storage_status
      changed_when: false
      failed_when: false

    # ── Recent task log ───────────────────────────────────────────────
    - name: Get recent PVE tasks
      ansible.builtin.shell: |
        pvesh get /nodes/$(hostname)/tasks \
          --limit 10 \
          --output-format json 2>/dev/null | \
        python3 -c "
        import json,sys,datetime
        tasks=json.load(sys.stdin)
        for t in tasks:
            ts=datetime.datetime.fromtimestamp(t.get('starttime',0)).strftime('%Y-%m-%d %H:%M')
            status=t.get('status','?')
            upid=t.get('upid','?')
            print(f'{ts} {status:12} {upid}')
        " 2>/dev/null || echo "Task log unavailable"
      register: recent_tasks
      changed_when: false
      failed_when: false

    # ── Summary output ────────────────────────────────────────────────
    - name: Display Proxmox summary
      ansible.builtin.debug:
        msg: |
          ═══ Proxmox VE — {{ inventory_hostname }} ═══
          Version: {{ pve_version.stdout | default('unknown') }}

          VMs: {{ vm_running_count.stdout | trim }} running
          {{ vm_list.stdout | default('(no VMs)') | indent(2) }}

          LXCs: {{ lxc_running_count.stdout | trim }} running
          {{ lxc_list.stdout | default('(no LXCs)') | indent(2) }}

          Storage Pools:
          {{ storage_status.stdout | default('n/a') | indent(2) }}

          Recent Tasks (last 10):
          {{ recent_tasks.stdout | default('n/a') | indent(2) }}

    # ── Optional: snapshot a VM ───────────────────────────────────────
    - name: Create VM snapshot
      ansible.builtin.shell: |
        snap_name="ansible-snap-$(date +%Y%m%d-%H%M%S)"
        qm snapshot {{ vm_id }} "$snap_name" --description "Ansible automated snapshot"
        echo "Snapshot created: $snap_name for VM {{ vm_id }}"
      register: snapshot_result
      when:
        - action == "snapshot"
        - vm_id | string | length > 0
      changed_when: true

    - name: Show snapshot result
      ansible.builtin.debug:
        msg: "{{ snapshot_result.stdout | default('No snapshot taken') }}"
      when: action == "snapshot"

    # ── Write JSON report ─────────────────────────────────────────────
    - name: Write Proxmox report
      ansible.builtin.copy:
        content: "{{ {'host': inventory_hostname, 'timestamp': ansible_date_time.iso8601, 'version': pve_version.stdout | default('unknown'), 'vms_running': vm_running_count.stdout | trim, 'lxcs_running': lxc_running_count.stdout | trim, 'storage': storage_status.stdout | default(''), 'tasks': recent_tasks.stdout | default('')} | to_nice_json }}"
        dest: "{{ report_dir }}/proxmox_{{ ansible_date_time.date }}.json"
      delegate_to: localhost
      changed_when: false
```

**Step 2: Validate syntax**

```bash
ansible-playbook --syntax-check -i hosts.ini playbooks/proxmox_management.yml
```

Expected: no errors.

**Step 3: Run against pve**

```bash
ansible-playbook -i hosts.ini playbooks/proxmox_management.yml
```

Expected: Proxmox summary table printed. JSON report written to `/tmp/health_reports/proxmox_.json`.

**Step 4: Test snapshot action (optional — only if you have a test VM)**

```bash
# Replace 100 with a real VM ID from the qm list output above
ansible-playbook -i hosts.ini playbooks/proxmox_management.yml -e action=snapshot -e vm_id=100
```

Expected: `Snapshot created: ansible-snap- for VM 100`

**Step 5: Commit**

```bash
git add playbooks/proxmox_management.yml
git commit -m "feat: add proxmox_management playbook for PVE VM/LXC inventory and health"
```

---

## Task 3: `truenas_health.yml` — TrueNAS SCALE ZFS and app health

**Files:**
- Create: `playbooks/truenas_health.yml`

**What it does:** Targets `truenas-scale`. Checks ZFS pool health, scrub status, dataset usage, SMART disk status, and running TrueNAS apps (k3s-based). Flags degraded/faulted pools. Mirrors `synology_health.yml` structure.

**Note:** TrueNAS SCALE runs on Debian.
The `vish` user needs sudo for `smartctl` and `zpool`. Check `host_vars/truenas-scale.yml` — `ansible_become: true` is set in `group_vars/homelab_linux.yml` which covers all hosts.

**Step 1: Create the playbook**

```yaml
---
# TrueNAS SCALE Health Check
# Checks ZFS pool status, scrub health, dataset usage, SMART disk status, and app state.
# Mirrors synology_health.yml but for TrueNAS SCALE (Debian-based with ZFS).
#
# Usage: ansible-playbook -i hosts.ini playbooks/truenas_health.yml

- name: TrueNAS SCALE Health Check
  hosts: truenas-scale
  gather_facts: yes
  become: true

  vars:
    disk_warn_pct: 80
    disk_critical_pct: 90
    report_dir: "/tmp/health_reports"

  tasks:
    - name: Create report directory
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

    # ── System overview ───────────────────────────────────────────────
    - name: Get system uptime
      ansible.builtin.command: uptime -p
      register: uptime_out
      changed_when: false
      failed_when: false

    - name: Get TrueNAS version
      ansible.builtin.shell: |
        cat /etc/version 2>/dev/null || \
          midclt call system.version 2>/dev/null || \
          echo "version unavailable"
      register: truenas_version
      changed_when: false
      failed_when: false

    # ── ZFS pool health ───────────────────────────────────────────────
    - name: Get ZFS pool status
      ansible.builtin.command: zpool status -v
      register: zpool_status
      changed_when: false
      failed_when: false

    - name: Get ZFS pool list (usage)
      ansible.builtin.command: zpool list -H
      register: zpool_list
      changed_when: false
      failed_when: false

    - name: Check for degraded or faulted pools
      ansible.builtin.shell: |
        zpool status 2>/dev/null | grep -E "state:\s*(DEGRADED|FAULTED|OFFLINE|REMOVED)" | wc -l
      register: pool_errors
      changed_when: false
      failed_when: false

    - name: Assert no degraded pools
      ansible.builtin.assert:
        that:
          - (pool_errors.stdout | trim | int) == 0
        success_msg: "All ZFS pools ONLINE"
        fail_msg: "DEGRADED or FAULTED pool detected — run: zpool status"
      changed_when: false
      ignore_errors: yes

    # ── ZFS scrub status ──────────────────────────────────────────────
    - name: Get last scrub info per pool
      ansible.builtin.shell: |
        for pool in $(zpool list -H -o name 2>/dev/null); do
          echo "Pool: $pool"
          zpool status "$pool" 2>/dev/null | grep -E "scrub|scan" | head -3
          echo "---"
        done
      register: scrub_status
      changed_when: false
      failed_when: false

    # ── Dataset usage ─────────────────────────────────────────────────
    - name: Get dataset usage (top-level datasets)
      ansible.builtin.shell: |
        zfs list -H -o name,used,avail,refer,mountpoint -d 1 2>/dev/null | head -20
      register: dataset_usage
      changed_when: false
      failed_when: false

    # ── SMART disk status ─────────────────────────────────────────────
    - name: List physical disks
      ansible.builtin.shell: |
        lsblk -d -o NAME,SIZE,MODEL,SERIAL 2>/dev/null | grep -v "loop\|sr" || \
          ls /dev/sd? /dev/nvme?n? 2>/dev/null
      register: disk_list
      changed_when: false
      failed_when: false

    - name: Check SMART health for each disk
      ansible.builtin.shell: |
        failed=0
        for disk in $(lsblk -d -n -o NAME 2>/dev/null | grep -v "loop\|sr"); do
          result=$(smartctl -H /dev/$disk 2>/dev/null | grep -E "SMART overall-health|PASSED|FAILED" || echo "n/a")
          echo "$disk: $result"
          echo "$result" | grep -q "FAILED" && failed=$((failed+1))
        done
        exit $failed
      register: smart_results
      changed_when: false
      failed_when: false

    # ── TrueNAS apps (k3s) ────────────────────────────────────────────
    - name: Get TrueNAS app status
      ansible.builtin.shell: |
        if command -v k3s >/dev/null 2>&1; then
          k3s kubectl get pods -A --no-headers 2>/dev/null | \
            awk '{print $4}' | sort | uniq -c | sort -rn
        elif command -v midclt >/dev/null 2>&1; then
          midclt call chart.release.query 2>/dev/null | \
            python3 -c "
        import json,sys
        try:
            apps=json.load(sys.stdin)
            for a in apps:
                print(f\"{a.get('id','?'):30} {a.get('status','?')}\")
        except:
            print('App status unavailable')
        " 2>/dev/null
        else
          echo "App runtime not detected (k3s/midclt not found)"
        fi
      register: app_status
      changed_when: false
      failed_when: false

    # ── Summary output ────────────────────────────────────────────────
    - name: Display TrueNAS health summary
      ansible.builtin.debug:
        msg: |
          ═══ TrueNAS SCALE — {{ inventory_hostname }} ═══
          Version : {{ truenas_version.stdout | default('unknown') | trim }}
          Uptime : {{ uptime_out.stdout | default('n/a') }}
          Pool errors: {{ pool_errors.stdout | trim | default('0') }}

          ZFS Pool List:
          {{ zpool_list.stdout | default('(none)') | indent(2) }}

          ZFS Pool Status (degraded/faulted check):
          Degraded pools found: {{ pool_errors.stdout | trim }}

          Scrub Status:
          {{ scrub_status.stdout | default('n/a') | indent(2) }}

          Dataset Usage (top-level):
          {{ dataset_usage.stdout | default('n/a') | indent(2) }}

          SMART Disk Status:
          {{ smart_results.stdout | default('n/a') | indent(2) }}

          TrueNAS Apps:
          {{ app_status.stdout | default('n/a') | indent(2) }}

    # ── Write JSON report ─────────────────────────────────────────────
    - name: Write TrueNAS health report
      ansible.builtin.copy:
        content: "{{ {'host': inventory_hostname, 'timestamp': ansible_date_time.iso8601, 'version': truenas_version.stdout | default('unknown') | trim, 'pool_errors': pool_errors.stdout | trim, 'zpool_list': zpool_list.stdout | default(''), 'scrub': scrub_status.stdout | default(''), 'smart': smart_results.stdout | default(''), 'apps': app_status.stdout | default('')} | to_nice_json }}"
        dest: "{{ report_dir }}/truenas_{{ ansible_date_time.date }}.json"
      delegate_to: localhost
      changed_when: false
```

**Step 2: Validate syntax**

```bash
ansible-playbook --syntax-check -i hosts.ini playbooks/truenas_health.yml
```

Expected: no errors.

**Step 3: Run against truenas-scale**

```bash
ansible-playbook -i hosts.ini playbooks/truenas_health.yml
```

Expected: Health summary printed, pool status shown, SMART results visible. JSON report at `/tmp/health_reports/truenas_.json`.
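
The playbook above declares `disk_warn_pct` and `disk_critical_pct` but no task reads them. As a rough sketch of how those thresholds could be applied, the Python below classifies pools from the output of `zpool list -H -o name,capacity,health`. The explicit `-o` column list, the `PoolReport` type, the `classify_pools` helper, and the sample pool names are all assumptions made for illustration; none of them are part of the plan.

```python
from typing import List, NamedTuple, Optional


class PoolReport(NamedTuple):
    name: str
    capacity_pct: Optional[int]
    health: str
    level: str


def classify_pools(zpool_output: str, warn_pct: int = 80,
                   critical_pct: int = 90) -> List[PoolReport]:
    """Classify pools from `zpool list -H -o name,capacity,health` output.

    Assumes three whitespace-separated columns per line (hypothetical input
    shape; the playbook itself runs plain `zpool list -H`).
    """
    reports = []
    for line in zpool_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        name, capacity, health = fields
        digits = capacity.rstrip("%")
        pct = int(digits) if digits.isdigit() else None
        if health != "ONLINE" or (pct is not None and pct >= critical_pct):
            level = "CRITICAL"
        elif pct is None:
            level = "UNKNOWN"  # capacity not reported as a number
        elif pct >= warn_pct:
            level = "WARN"
        else:
            level = "OK"
        reports.append(PoolReport(name, pct, health, level))
    return reports


if __name__ == "__main__":
    # Made-up sample data, not output from a real host.
    sample = "boot-pool\t12%\tONLINE\ntank\t85%\tONLINE\nbackup\t40%\tDEGRADED\n"
    for pool in classify_pools(sample):
        print(f"{pool.name:12} {pool.capacity_pct}% {pool.health:10} {pool.level}")
```

A non-`ONLINE` health value is treated as critical regardless of capacity, which matches the intent of the "Assert no degraded pools" task.
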

**Step 4: Commit**

```bash
git add playbooks/truenas_health.yml
git commit -m "feat: add truenas_health playbook for ZFS pool, scrub, SMART, and app status"
```

---

## Task 4: `ntp_check.yml` — Time sync health audit

**Files:**
- Create: `playbooks/ntp_check.yml`

**What it does:** Checks time sync status across all hosts. Detects which NTP daemon is running, extracts current offset in milliseconds, warns at 500ms or more, critical at 1000ms or more. Sends ntfy alert for hosts at or above the warn threshold. Read-only — no config changes.

**Platform notes:**
- Ubuntu/Debian: `systemd-timesyncd` → use `timedatectl show-timesync` or `chronyc tracking`
- Synology: Uses its own NTP, check via `/proc/driver/rtc` or `synoinfo.conf` + `ntpq -p`
- TrueNAS: Debian-based, likely `chrony` or `systemd-timesyncd`
- Proxmox: Debian-based

**Step 1: Create the playbook**

```yaml
---
# NTP Time Sync Health Check
# Audits time synchronization across all hosts. Read-only — no config changes.
# Warns when offset >= 500ms, critical >= 1000ms.
#
# Usage: ansible-playbook -i hosts.ini playbooks/ntp_check.yml
# Usage: ansible-playbook -i hosts.ini playbooks/ntp_check.yml --limit synology

- name: NTP Time Sync Health Check
  hosts: "{{ host_target | default('active') }}"
  gather_facts: yes
  ignore_unreachable: true

  vars:
    warn_offset_ms: 500
    critical_offset_ms: 1000
    ntfy_url: "{{ ntfy_url | default('https://ntfy.sh/REDACTED_TOPIC') }}"
    report_dir: "/tmp/ntp_reports"

  tasks:
    - name: Create report directory
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

    # ── Detect NTP daemon ─────────────────────────────────────────────
    - name: Detect active NTP implementation
      ansible.builtin.shell: |
        if command -v chronyc >/dev/null 2>&1 && chronyc tracking >/dev/null 2>&1; then
          echo "chrony"
        elif timedatectl show-timesync 2>/dev/null | grep -q ServerName; then
          echo "timesyncd"
        elif timedatectl 2>/dev/null | grep -q "NTP service: active"; then
          echo "timesyncd"
        elif command -v ntpq >/dev/null 2>&1 && ntpq -p >/dev/null 2>&1; then
          echo "ntpd"
        else
          echo "unknown"
        fi
      register: ntp_impl
      changed_when: false
      failed_when: false

    # ── Get offset (chrony) ───────────────────────────────────────────
    - name: Get chrony tracking info
      ansible.builtin.shell: chronyc tracking 2>/dev/null
      register: chrony_tracking
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "chrony"

    - name: Parse chrony offset (ms)
      ansible.builtin.shell: |
        chronyc tracking 2>/dev/null | \
          grep "System time" | \
          awk '{printf "%.3f", $4 * 1000}'
      register: chrony_offset_ms
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "chrony"

    - name: Get chrony sync source
      ansible.builtin.shell: |
        chronyc sources -v 2>/dev/null | grep "^\^" | head -3
      register: chrony_sources
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "chrony"

    # ── Get offset (systemd-timesyncd) ────────────────────────────────
    - name: Get timesyncd status
      ansible.builtin.shell: timedatectl show-timesync 2>/dev/null || timedatectl 2>/dev/null
      register: timesyncd_info
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "timesyncd"

    - name: Parse timesyncd offset (ms)
      ansible.builtin.shell: |
        # timesyncd doesn't expose offset cleanly — use systemd journal instead
        # Fall back to 0 if not available
        journalctl -u systemd-timesyncd --since "1 hour ago" --no-pager 2>/dev/null | \
          grep -oE "offset [+-]?[0-9]+(\.[0-9]+)?(ms|us|s)" | tail -1 | \
          awk '{
            # The value and its unit arrive as one token, e.g. "+1.234ms"
            val=$2; unit=$2;
            gsub(/[^0-9.-]/,"",val);
            gsub(/[^a-z]/,"",unit);
            if (unit=="us") printf "%.3f", val/1000;
            else if (unit=="s") printf "%.3f", val*1000;
            else printf "%.3f", val;
          }' || echo "0"
      register: timesyncd_offset_ms
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "timesyncd"

    # ── Get offset (ntpd) ─────────────────────────────────────────────
    - name: Get ntpq peers
      ansible.builtin.shell: ntpq -pn 2>/dev/null | head -10
      register: ntpq_peers
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "ntpd"

    - name: Parse ntpq offset (ms)
      ansible.builtin.shell: |
        # offset is column 9 in ntpq -p output (milliseconds)
        ntpq -p 2>/dev/null | awk 'NR>2 && /^\*/ {printf "%.3f", $9; exit}' || echo "0"
      register: ntpq_offset_ms
      changed_when: false
      failed_when: false
      when: ntp_impl.stdout | trim == "ntpd"

    # ── Consolidate offset ────────────────────────────────────────────
    - name: Set unified offset fact
      ansible.builtin.set_fact:
        ntp_offset_ms: >-
          {{
            (chrony_offset_ms.stdout | default('0')) | float
            if ntp_impl.stdout | trim == 'chrony'
            else (timesyncd_offset_ms.stdout | default('0')) | float
            if ntp_impl.stdout | trim == 'timesyncd'
            else (ntpq_offset_ms.stdout | default('0')) | float
          }}
        ntp_raw_info: >-
          {{
            chrony_tracking.stdout | default('')
            if ntp_impl.stdout | trim == 'chrony'
            else timesyncd_info.stdout | default('')
            if ntp_impl.stdout | trim == 'timesyncd'
            else ntpq_peers.stdout | default('')
          }}

    # set_fact may store the offset as a string, so cast to float before abs
    - name: Determine sync status
      ansible.builtin.set_fact:
        ntp_status: >-
          {{
            'CRITICAL' if (ntp_offset_ms | float | abs) >= critical_offset_ms
            else 'WARN' if (ntp_offset_ms | float | abs) >= warn_offset_ms
            else 'OK'
          }}

    # ── Per-host summary ──────────────────────────────────────────────
    - name: Display NTP summary
      ansible.builtin.debug:
        msg: |
          ═══ {{ inventory_hostname }} ═══
          NTP daemon : {{ ntp_impl.stdout | trim | default('unknown') }}
          Offset : {{ ntp_offset_ms }} ms
          Status : {{ ntp_status }}
          Details :
          {{ ntp_raw_info | indent(2) }}

    # ── Alert on warn/critical ────────────────────────────────────────
    - name: Send ntfy alert for NTP issues
      ansible.builtin.uri:
        url: "{{ ntfy_url }}"
        method: POST
        body: "NTP {{ ntp_status }} on {{ inventory_hostname }}: offset={{ ntp_offset_ms }}ms (threshold={{ warn_offset_ms }}ms)"
        headers:
          Title: "Homelab NTP Alert"
          Priority: "{{ 'urgent' if ntp_status == 'CRITICAL' else 'high' }}"
          Tags: "warning,clock"
        body_format: raw
        status_code: [200, 204]
      delegate_to: localhost
      failed_when: false
      when: ntp_status in ['WARN', 'CRITICAL']

    # ── Write JSON report ─────────────────────────────────────────────
    - name: Write NTP report
      ansible.builtin.copy:
        content: "{{ {'host': inventory_hostname, 'timestamp': ansible_date_time.iso8601, 'ntp_daemon': ntp_impl.stdout | trim, 'offset_ms': ntp_offset_ms, 'status': ntp_status} | to_nice_json }}"
        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
      delegate_to: localhost
      changed_when: false
```

**Step 2: Validate syntax**

```bash
ansible-playbook --syntax-check -i hosts.ini playbooks/ntp_check.yml
```

Expected: no errors.

**Step 3: Run against one host**

```bash
ansible-playbook -i hosts.ini playbooks/ntp_check.yml --limit homelab
```

Expected: NTP daemon detected, offset printed, status OK/WARN/CRITICAL.

**Step 4: Run across all hosts**

```bash
ansible-playbook -i hosts.ini playbooks/ntp_check.yml
```

Expected: Summary for every active host. Synology hosts may report `unknown` for daemon — that's acceptable (they have NTP but expose it differently).
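
The playbook does its unit conversion and threshold comparison in shell and Jinja, which is awkward to test in isolation. The Python sketch below restates the same two steps so they can be checked against sample log lines before a real run. It assumes a log fragment shaped like the `grep -oE` pattern in the timesyncd task (`offset +1.234ms`). The helper names `offset_to_ms` and `classify_offset` are hypothetical, and the comparison uses `>=` to match the playbook's Jinja expression.

```python
import re
from typing import Optional

# Same shape as the grep pattern in the timesyncd task: a signed number
# immediately followed by its unit.
OFFSET_RE = re.compile(r"offset\s+([+-]?\d+(?:\.\d+)?)(ms|us|s)\b")
TO_MS = {"ms": 1.0, "us": 0.001, "s": 1000.0}


def offset_to_ms(log_line: str) -> Optional[float]:
    """Return the offset in milliseconds, or None if the line has no offset."""
    match = OFFSET_RE.search(log_line)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * TO_MS[unit]


def classify_offset(offset_ms: float, warn_ms: float = 500.0,
                    critical_ms: float = 1000.0) -> str:
    """Map an offset to OK / WARN / CRITICAL using its absolute value."""
    magnitude = abs(offset_ms)
    if magnitude >= critical_ms:
        return "CRITICAL"
    if magnitude >= warn_ms:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    for line in ("adjusting clock, offset +1.234ms",
                 "offset -750us",
                 "offset 2s",
                 "no offset information here"):
        ms = offset_to_ms(line)
        status = classify_offset(ms) if ms is not None else "n/a"
        print(f"{line!r}: {ms} ms -> {status}")
```

Taking the absolute value before comparing means a clock that is behind is flagged the same way as one that is ahead.
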
**Step 5: Commit**

```bash
git add playbooks/ntp_check.yml
git commit -m "feat: add ntp_check playbook for time sync drift auditing across all hosts"
```

---

## Task 5: `cron_audit.yml` — Scheduled task inventory

**Files:**
- Create: `playbooks/cron_audit.yml`

**What it does:** Inventories all scheduled tasks across every host: system crontabs, user crontabs, and systemd timer units. Flags potential security issues (root cron jobs referencing world-writable paths, missing-file paths). Outputs per-host JSON.

**Step 1: Create the playbook**

```yaml
---
# Cron and Scheduled Task Audit
# Inventories crontabs and systemd timers across all hosts.
# Flags security concerns: root crons with world-writable path references.
#
# Usage: ansible-playbook -i hosts.ini playbooks/cron_audit.yml
# Usage: ansible-playbook -i hosts.ini playbooks/cron_audit.yml --limit homelab

- name: Cron and Scheduled Task Audit
  hosts: "{{ host_target | default('active') }}"
  gather_facts: yes
  ignore_unreachable: true

  vars:
    report_dir: "/tmp/cron_audit"

  tasks:
    - name: Create audit report directory
      ansible.builtin.file:
        path: "{{ report_dir }}"
        state: directory
        mode: '0755'
      delegate_to: localhost
      run_once: true

    # ── System crontabs ───────────────────────────────────────────────
    - name: Read /etc/crontab
      ansible.builtin.shell: cat /etc/crontab 2>/dev/null || echo "(not present)"
      register: etc_crontab
      changed_when: false
      failed_when: false

    - name: Read /etc/cron.d/ entries
      ansible.builtin.shell: |
        for f in /etc/cron.d/*; do
          [ -f "$f" ] || continue
          echo "=== $f ==="
          cat "$f"
          echo ""
        done
      register: cron_d_entries
      changed_when: false
      failed_when: false

    - name: Read /etc/cron.{hourly,daily,weekly,monthly} scripts
      ansible.builtin.shell: |
        for dir in hourly daily weekly monthly; do
          path="/etc/cron.$dir"
          [ -d "$path" ] || continue
          scripts=$(ls "$path" 2>/dev/null)
          if [ -n "$scripts" ]; then
            echo "=== /etc/cron.$dir ==="
            echo "$scripts"
          fi
        done
      register: cron_dirs
      changed_when: false
      failed_when: false

    # ── User crontabs ─────────────────────────────────────────────────
    - name: List users with crontabs
      ansible.builtin.shell: |
        if [ -d /var/spool/cron/crontabs ]; then
          ls /var/spool/cron/crontabs/ 2>/dev/null
        elif [ -d /var/spool/cron ]; then
          ls /var/spool/cron/ 2>/dev/null | grep -v atjobs
        else
          echo "(crontab spool not found)"
        fi
      register: users_with_crontabs
      changed_when: false
      failed_when: false

    - name: Dump user crontabs
      ansible.builtin.shell: |
        spool_dir=""
        [ -d /var/spool/cron/crontabs ] && spool_dir=/var/spool/cron/crontabs
        [ -d /var/spool/cron ] && [ -z "$spool_dir" ] && spool_dir=/var/spool/cron

        if [ -z "$spool_dir" ]; then
          echo "(no spool directory found)"
          exit 0
        fi

        for user_file in "$spool_dir"/*; do
          [ -f "$user_file" ] || continue
          user=$(basename "$user_file")
          echo "=== crontab for: $user ==="
          cat "$user_file" 2>/dev/null
          echo ""
        done
      register: user_crontabs
      changed_when: false
      failed_when: false

    # ── Systemd timers ────────────────────────────────────────────────
    - name: List systemd timers
      ansible.builtin.shell: |
        if command -v systemctl >/dev/null 2>&1; then
          systemctl list-timers --all --no-pager 2>/dev/null || echo "(systemd not available)"
        else
          echo "(not a systemd host)"
        fi
      register: systemd_timers
      changed_when: false
      failed_when: false

    # ── Security flags ────────────────────────────────────────────────
    - name: Flag cron jobs referencing world-writable paths
      ansible.builtin.shell: |
        # Collect system and root cron entries into a private temp file
        # (mktemp avoids a predictable, attacker-controllable name in /tmp).
        tmp=$(mktemp) || exit 0
        {
          cat /etc/crontab 2>/dev/null
          cat /etc/cron.d/* 2>/dev/null
          spool=""
          [ -d /var/spool/cron/crontabs ] && spool=/var/spool/cron/crontabs
          # Only fall back to the parent directory when the Debian-style spool is absent
          [ -d /var/spool/cron ] && [ -z "$spool" ] && spool=/var/spool/cron
          [ -n "$spool" ] && cat "$spool/root" 2>/dev/null
        } | grep -v "^#" | grep -v "^$" > "$tmp"

        found=0
        while IFS= read -r line; do
          # First absolute path at or after field 6. This skips the user column
          # that /etc/crontab and /etc/cron.d entries carry before the command.
          cmd=$(echo "$line" | awk '{ for (i = 6; i <= NF; i++) if ($i ~ /^\//) { print $i; exit } }')
          if [ -n "$cmd" ] && [ -f "$cmd" ]; then
            perms=$(stat -c "%a" "$cmd" 2>/dev/null || echo "")
            # The last octal digit is the "other" class; 2, 3, 6, 7 include write.
            # Matching only the final digit also covers 4-digit modes such as 4777.
            if echo "$perms" | grep -qE "[2367]$"; then
              echo "FLAGGED: $cmd is world-writable — used in cron: $line"
              found=$((found+1))
            fi
          fi
        done < "$tmp"

        rm -f "$tmp"
        [ "$found" -eq 0 ] && echo "No world-writable cron script paths found"
        exit 0
      register: security_flags
      changed_when: false
      failed_when: false

    # ── Summary ───────────────────────────────────────────────────────
    - name: Display cron audit summary
      ansible.builtin.debug:
        msg: |
          ═══ Cron Audit — {{ inventory_hostname }} ═══

          /etc/crontab:
            {{ etc_crontab.stdout | default('(empty)') | indent(2) }}

          /etc/cron.d/:
            {{ cron_d_entries.stdout | default('(empty)') | indent(2) }}

          Cron directories (/etc/cron.{hourly,daily,weekly,monthly}):
            {{ cron_dirs.stdout | default('(empty)') | indent(2) }}

          Users with crontabs: {{ users_with_crontabs.stdout | default('(none)') | trim }}

          User crontab contents:
            {{ user_crontabs.stdout | default('(none)') | indent(2) }}

          Systemd timers:
            {{ systemd_timers.stdout | default('(none)') | indent(2) }}

          Security flags:
            {{ security_flags.stdout | default('(none)') | indent(2) }}

    # ── Write JSON report ─────────────────────────────────────────────
    - name: Write cron audit report
      ansible.builtin.copy:
        content: "{{ {'host': inventory_hostname, 'timestamp': ansible_date_time.iso8601, 'etc_crontab': etc_crontab.stdout | default(''), 'cron_d': cron_d_entries.stdout | default(''), 'cron_dirs': cron_dirs.stdout | default(''), 'users_with_crontabs': users_with_crontabs.stdout | default(''), 'user_crontabs': user_crontabs.stdout | default(''), 'systemd_timers': systemd_timers.stdout | default(''), 'security_flags': security_flags.stdout | default('')} | to_nice_json }}"
        dest: "{{ report_dir }}/{{ inventory_hostname }}_{{ ansible_date_time.date }}.json"
      delegate_to: localhost
      changed_when: false
```

**Step 2: Validate syntax**

```bash
ansible-playbook --syntax-check -i hosts.ini playbooks/cron_audit.yml
```

Expected: no errors.
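Before pointing the playbook at a real host, the permission test behind the security-flags task can be exercised locally. The sketch below is illustrative only: `is_world_writable` is a hypothetical helper name that does not appear in the playbook, and it assumes GNU `stat` (the `-c "%a"` form), as the playbook itself does. It treats a file as world-writable when the final octal digit of its mode is 2, 3, 6, or 7.

```shell
#!/bin/sh
# Sketch of the world-writable test used by the security-flags task.
# is_world_writable is a hypothetical helper, not part of cron_audit.yml.
is_world_writable() {
    # stat -c is GNU coreutils syntax; BSD stat uses different flags.
    perms=$(stat -c "%a" "$1" 2>/dev/null) || return 2
    # The final octal digit is the "other" class; 2, 3, 6, 7 include write.
    case "$perms" in
        *[2367]) return 0 ;;
        *)       return 1 ;;
    esac
}

# Demonstrate on two throwaway files in a private temp directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/safe.sh" "$demo_dir/risky.sh"
chmod 755 "$demo_dir/safe.sh"
chmod 777 "$demo_dir/risky.sh"

for f in "$demo_dir/safe.sh" "$demo_dir/risky.sh"; do
    if is_world_writable "$f"; then
        echo "FLAGGED: $(basename "$f")"
    else
        echo "ok: $(basename "$f")"
    fi
done

rm -rf "$demo_dir"
```

On a Linux host this should report `safe.sh` as ok and `risky.sh` as flagged. Hosts without GNU `stat` would need a different mode lookup, so treat a clean result there with caution.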
**Step 3: Run against one host**

```bash
ansible-playbook -i hosts.ini playbooks/cron_audit.yml --limit homelab
```

Expected: Cron entries and systemd timers displayed. Security flags report shown.

**Step 4: Run across all hosts**

```bash
ansible-playbook -i hosts.ini playbooks/cron_audit.yml
```

Expected: Summary per host. Reports written to `/tmp/cron_audit/`.

**Step 5: Commit**

```bash
git add playbooks/cron_audit.yml
git commit -m "feat: add cron_audit playbook for scheduled task inventory across all hosts"
```

---

## Task 6: Update README.md

**Files:**
- Modify: `README.md`

**Step 1: Add the 5 new playbooks to the relevant tables in README.md**

Add to the Health & Monitoring table:

```markdown
| **`network_connectivity.yml`** | Full mesh Tailscale + SSH + HTTP endpoint health | Daily | ✅ |
| **`ntp_check.yml`** | Time sync drift audit with ntfy alerts | Daily | ✅ |
```

Add a new "Platform Management" section (after Advanced Container Management):

```markdown
### 🖥️ Platform Management (3 playbooks)

| Playbook | Purpose | Usage | Multi-System |
|----------|---------|-------|--------------|
| `synology_health.yml` | Synology NAS health (DSM, RAID, Tailscale) | Monthly | Synology only |
| **`proxmox_management.yml`** | 🆕 PVE VM/LXC inventory, storage pools, snapshots | Weekly | PVE only |
| **`truenas_health.yml`** | 🆕 ZFS pool health, scrub, SMART, app status | Weekly | TrueNAS only |
```

Add to the Security & Maintenance table:

```markdown
| **`cron_audit.yml`** | 🆕 Scheduled task inventory + security flags | Monthly | ✅ |
```

**Step 2: Update the total playbook count at the bottom**

Change: `33 playbooks` → `38 playbooks`

**Step 3: Commit**

```bash
git add README.md
git commit -m "docs: update README with 5 new playbooks"
```
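As an optional sanity check on the README total, the number of playbook files can be compared against the figure written into the README. This is a rough sketch under assumptions: `count_playbooks` is a hypothetical helper, and it assumes every top-level `.yml` file in `playbooks/` is a playbook, which may not hold if the directory also contains variable or task include files.

```shell
#!/bin/sh
# Hypothetical helper: count top-level .yml files in a directory.
# Assumes each such file is a playbook; adjust if includes live there too.
count_playbooks() {
    find "$1" -maxdepth 1 -type f -name '*.yml' | wc -l | tr -d ' '
}

# Only report when run from the repository root, where playbooks/ exists.
if [ -d playbooks ]; then
    echo "playbooks/ contains $(count_playbooks playbooks) .yml files"
fi
```

If the printed number differs from the README total, check for helper files in `playbooks/` before changing the count.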