homelab-optimized/docs/guides/scrutiny-smart-monitoring.md
Sanitized mirror from private repository - 2026-03-24 12:45:58 UTC


Scrutiny — SMART Disk Health Monitoring

Scrutiny runs SMART health checks on physical drives and presents results in a web UI with historical trending and alerting.

Architecture

                    ┌─────────────────────────────────┐
                    │  homelab-vm (100.67.40.126)     │
                    │  scrutiny-web  :8090             │
                    │  scrutiny-influxdb (internal)    │
                    └──────────────┬──────────────────┘
                                   │ collector API
            ┌──────────────────────┼──────────────────────┐
            │                      │                      │
   atlantis-collector    calypso-collector    setillo-collector
   concord-nuc-collector    pi-5-collector

| Role | Host | Notes |
| --- | --- | --- |
| Hub (web + InfluxDB) | homelab-vm | Port 8090, proxied at scrutiny.vish.gg |
| Collector | atlantis | 8-bay NAS, /dev/sda to /dev/sdh |
| Collector | calypso | 2-bay NAS, /dev/sda to /dev/sdb |
| Collector | setillo | 2-bay NAS, /dev/sda to /dev/sdb |
| Collector | concord-nuc | Intel NUC, /dev/sda (NVMe optional) |
| Collector | pi-5 | /dev/nvme0n1 (M.2 HAT) |
| Skipped | homelab-vm, seattle, matrix-ubuntu | VMs (no physical disks) |
| Skipped | guava (TrueNAS) | Native TrueNAS disk monitoring |

Files

| File | Purpose |
| --- | --- |
| hosts/vms/homelab-vm/scrutiny.yaml | Hub (web + InfluxDB) |
| hosts/synology/atlantis/scrutiny-collector.yaml | Atlantis collector |
| hosts/synology/calypso/scrutiny-collector.yaml | Calypso collector |
| hosts/synology/setillo/scrutiny-collector.yaml | Setillo collector |
| hosts/physical/concord-nuc/scrutiny-collector.yaml | NUC collector |
| hosts/edge/rpi5-vish/scrutiny-collector.yaml | Pi-5 collector |
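
Before deploying, it can help to confirm that every compose file listed above is present in a checkout. The following is a minimal sketch: the function name `missing_compose_files` and the idea of passing a local checkout path are assumptions for illustration, not part of the repository.

```shell
# Compose files copied from the table above (paths relative to the repo root).
SCRUTINY_COMPOSE_FILES='hosts/vms/homelab-vm/scrutiny.yaml
hosts/synology/atlantis/scrutiny-collector.yaml
hosts/synology/calypso/scrutiny-collector.yaml
hosts/synology/setillo/scrutiny-collector.yaml
hosts/physical/concord-nuc/scrutiny-collector.yaml
hosts/edge/rpi5-vish/scrutiny-collector.yaml'

# missing_compose_files REPO_ROOT   (hypothetical helper)
# Prints each expected file that is absent under REPO_ROOT and
# exits non-zero if at least one is missing.
missing_compose_files() (
  root="$1"
  status=0
  for rel in $SCRUTINY_COMPOSE_FILES; do
    if [ ! -f "$root/$rel" ]; then
      echo "$rel"
      status=1
    fi
  done
  exit "$status"
)
```

For example, `missing_compose_files ~/homelab && echo "all present"`. Each file can then be syntax-checked with `docker compose -f FILE config -q`.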

Deployment

Hub (homelab-vm)

Deploy via Portainer GitOps on endpoint 443399:

  1. Portainer → Stacks → Add stack → Git repository
  2. URL: https://git.vish.gg/Vish/homelab
  3. Compose path: hosts/vms/homelab-vm/scrutiny.yaml

Or manually:

ssh homelab
docker compose -f /path/to/scrutiny.yaml up -d

Verify:

curl http://100.67.40.126:8090/api/health
# {"success":true}

Collectors — Synology (Atlantis, Calypso, Setillo)

Synology requires privileged: true (DSM kernel lacks nf_conntrack_netlink).

Deploy via Portainer stacks on each Synology host, or manually:

ssh atlantis
sudo /var/packages/REDACTED_APP_PASSWORD/target/usr/bin/docker compose \
  -f /path/to/scrutiny-collector.yaml up -d

Important — verify drive paths first:

# List block devices on the host
lsblk -o NAME,SIZE,TYPE,MODEL
# Or for Synology:
sudo fdisk -l | grep '^Disk /dev'

Update the devices: list in the collector compose to match actual drives.
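
On hosts where `lsblk` is available, the `devices:` entries can be generated rather than typed by hand. This is a rough sketch: `devices_yaml` is a hypothetical helper, and the `host:container` mapping form and the indentation are assumptions that may need adjusting to match the existing compose file.

```shell
# devices_yaml (hypothetical helper)
# Reads "NAME TYPE" pairs on stdin, as printed by `lsblk -dn -o NAME,TYPE`,
# and prints one compose `devices:` entry for every whole disk.
# Partitions, loop devices and optical drives are skipped.
devices_yaml() {
  awk '$2 == "disk" { printf "      - /dev/%s:/dev/%s\n", $1, $1 }'
}
```

Usage: `lsblk -dn -o NAME,TYPE | devices_yaml`. Review the output before pasting it, since `lsblk` also reports devices without SMART support (for example SD cards or zram) as type `disk`.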

Collectors — Linux (concord-nuc, pi-5)

Deploy via Portainer edge agent or manually:

ssh vish-concord-nuc
docker compose -f scrutiny-collector.yaml up -d

Verify a collector is shipping data:

docker logs scrutiny-collector --tail 20
# Should show: "Sending device summary to Scrutiny API"
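
The log check above can be wrapped so that it returns an exit status and can be used in a script. A minimal sketch, assuming the container is named `scrutiny-collector` as in this guide; the helper names and the 200-line window are arbitrary choices.

```shell
# log_shows_upload (hypothetical helper)
# Succeeds if the log text on stdin contains the success line quoted above.
log_shows_upload() {
  grep -Fq 'Sending device summary to Scrutiny API'
}

# collector_ok [CONTAINER]
# Checks the last 200 log lines of the collector container.
collector_ok() {
  docker logs --tail 200 "${1:-scrutiny-collector}" 2>&1 | log_shows_upload
}
```

Usage: `collector_ok && echo "collector is shipping data"`.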

DNS / Subdomain Setup

scrutiny.vish.gg is already added to the DDNS updater on Atlantis (dynamicdnsupdater.yaml).

Still needed (manual steps):

  1. Cloudflare DNS: add A record scrutiny.vish.gg → current public IP (proxied)
    • Or let the DDNS container create it automatically on next run
  2. NPM proxy host: scrutiny.vish.gg → http://100.67.40.126:8090
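
Once both steps are done, the full path (DNS, Cloudflare, NPM, hub) can be checked with a single request. The sketch below is illustrative only: the helper names are made up, and the mapping from status codes to likely causes is an assumption about how this setup usually fails, not an exhaustive diagnosis.

```shell
# http_status URL (hypothetical helper)
# Prints only the HTTP status code of a GET request; curl prints 000
# when no HTTP response was received at all.
http_status() {
  curl -sS -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# explain_status CODE (hypothetical helper)
# Maps a status code to a likely cause in this setup.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    502|503|504) echo "proxy reached, hub unreachable: check the NPM forward target" ;;
    52[0-6]) echo "Cloudflare could not reach the origin: check the public IP and port forwarding" ;;
    000) echo "no HTTP response: check the DNS record and TLS" ;;
    *) echo "unexpected status $1" ;;
  esac
}
```

Usage: `explain_status "$(http_status https://scrutiny.vish.gg/api/health)"`.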

Validation

# Hub health
curl http://100.67.40.126:8090/api/health

# List all tracked devices after collectors run
# (upstream Scrutiny serves the device list at /api/summary, keyed by WWN)
curl -s http://100.67.40.126:8090/api/summary | jq -r '.data.summary[].device.device_name'

# Check collector logs
docker logs scrutiny-collector

# Open UI
open https://scrutiny.vish.gg

Collector Schedule

By default, collectors run a SMART scan on startup and then hourly. The schedule is controlled inside the container — no cron needed.


Troubleshooting

"permission denied" on /dev/sdX → Use privileged: true on Synology. On Linux, use cap_add: [SYS_RAWIO, SYS_ADMIN].
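
To confirm which of these settings a running collector actually has, the container can be inspected. A small sketch, assuming the container name `scrutiny-collector`; `has_raw_disk_access` is a hypothetical helper that only interprets the text printed by `docker inspect`.

```shell
# collector_privileges [CONTAINER]
# Prints e.g. "true []" or "false [SYS_RAWIO SYS_ADMIN]".
collector_privileges() {
  docker inspect \
    --format '{{.HostConfig.Privileged}} {{.HostConfig.CapAdd}}' \
    "${1:-scrutiny-collector}"
}

# has_raw_disk_access "PRIVILEGED CAPADD" (hypothetical helper)
# Succeeds if the container is privileged or was granted SYS_RAWIO.
has_raw_disk_access() {
  case "$1" in
    "true "*) return 0 ;;       # privileged: true
    *SYS_RAWIO*) return 0 ;;    # capability added explicitly
    *) return 1 ;;
  esac
}
```

Usage: `has_raw_disk_access "$(collector_privileges)" || echo "collector lacks raw disk access"`. The check only looks for SYS_RAWIO, so whether SYS_ADMIN is present still needs to be read from the printed output.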

Device not found in collector → Run lsblk on the host, update devices: list in the compose file, recreate the container.

Hub shows no devices → Check collector logs for API errors. Verify COLLECTOR_API_ENDPOINT is reachable from the collector host via Tailscale (curl http://100.67.40.126:8090/api/health).

InfluxDB fails to start → The influxdb container initialises on first run; scrutiny-web depends on it but may start before it's ready. Wait ~30s and check docker logs scrutiny-influxdb.
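
Instead of waiting a fixed 30 seconds, the health endpoint can be polled until the hub answers. A minimal sketch: the function names, the default of 10 attempts, and the 5-second delay are assumptions chosen to cover the start-up window described above.

```shell
# is_healthy BODY (hypothetical helper)
# Succeeds if the response body contains "success":true.
is_healthy() {
  printf '%s' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# wait_for_hub [URL] [ATTEMPTS] [DELAY_SECONDS]
# Polls URL/api/health until it reports success or attempts run out.
wait_for_hub() (
  url="${1:-http://100.67.40.126:8090}"
  attempts="${2:-10}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    body="$(curl -fsS --max-time 5 "$url/api/health" 2>/dev/null || true)"
    if is_healthy "$body"; then
      echo "hub healthy after $i attempt(s)"
      exit 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "hub not healthy after $attempts attempts" >&2
  exit 1
)
```

Usage: `wait_for_hub || docker logs scrutiny-influxdb --tail 50`.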