Sanitized mirror from private repository - 2026-04-18 11:19:59 UTC
# Scrutiny — SMART Disk Health Monitoring
Scrutiny runs SMART health checks on physical drives and presents results in a web UI with historical trending and alerting.
## Architecture
```
                ┌─────────────────────────────────┐
                │ homelab-vm (100.67.40.126)      │
                │ scrutiny-web      :8090         │
                │ scrutiny-influxdb (internal)    │
                └──────────────┬──────────────────┘
                               │ collector API
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
atlantis-collector     calypso-collector      setillo-collector
concord-nuc-collector  pi-5-collector
```
| Role | Host | Notes |
|------|------|-------|
| Hub (web + InfluxDB) | homelab-vm | Port 8090, proxied at scrutiny.vish.gg |
| Collector | atlantis | 8-bay NAS, /dev/sda to /dev/sdh |
| Collector | calypso | 2-bay NAS, /dev/sda to /dev/sdb |
| Collector | setillo | 2-bay NAS, /dev/sda to /dev/sdb |
| Collector | concord-nuc | Intel NUC, /dev/sda (NVMe optional) |
| Collector | pi-5 | /dev/nvme0n1 (M.2 HAT) |
| Skipped | homelab-vm, seattle, matrix-ubuntu | VMs — no physical disks |
| Skipped | guava (TrueNAS) | Native TrueNAS disk monitoring |
---
## Files
| File | Purpose |
|------|---------|
| `hosts/vms/homelab-vm/scrutiny.yaml` | Hub (web + InfluxDB) |
| `hosts/synology/atlantis/scrutiny-collector.yaml` | Atlantis collector |
| `hosts/synology/calypso/scrutiny-collector.yaml` | Calypso collector |
| `hosts/synology/setillo/scrutiny-collector.yaml` | Setillo collector |
| `hosts/physical/concord-nuc/scrutiny-collector.yaml` | NUC collector |
| `hosts/edge/rpi5-vish/scrutiny-collector.yaml` | Pi-5 collector |
---
## Deployment
### Hub (homelab-vm)
Deploy via Portainer GitOps on endpoint 443399:
1. Portainer → Stacks → Add stack → Git repository
2. URL: `https://git.vish.gg/Vish/homelab`
3. Compose path: `hosts/vms/homelab-vm/scrutiny.yaml`

Or manually:
```bash
ssh homelab
docker compose -f /path/to/scrutiny.yaml up -d
```
Verify:
```bash
curl http://100.67.40.126:8090/api/health
# {"success":true}
```
### Collectors — Synology (Atlantis, Calypso, Setillo)
Synology collectors require `privileged: true` because the DSM kernel lacks `nf_conntrack_netlink`.
Deploy via Portainer stacks on each Synology host, or manually:
```bash
ssh atlantis
sudo /var/packages/REDACTED_APP_PASSWORD/target/usr/bin/docker compose \
-f /path/to/scrutiny-collector.yaml up -d
```
**Important — verify drive paths first:**
```bash
# List block devices on the host
lsblk -o NAME,SIZE,TYPE,MODEL
# Or for Synology:
sudo fdisk -l | grep '^Disk /dev'
```
Update the `devices:` list in the collector compose file so it matches the drives actually present on the host.
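As a rough aid for that step, the sketch below turns `lsblk` output into list entries that can be pasted under `devices:`. The function name `lsblk_to_devices` is hypothetical, and it assumes `lsblk` and `awk` are available on the host (on Synology, fall back to the `fdisk` listing above). Review the result by hand, since virtual block devices can also report the type `disk`.

```bash
# Convert `lsblk -dn -o NAME,TYPE` output (read from stdin) into
# compose-style list entries. Only rows whose TYPE column is "disk"
# are kept; partitions, loop devices, and optical drives are skipped.
lsblk_to_devices() {
  awk '$2 == "disk" { printf "- /dev/%s\n", $1 }'
}
```

Usage: `lsblk -dn -o NAME,TYPE | lsblk_to_devices`, then indent the printed lines to match the compose file.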
### Collectors — Linux (concord-nuc, pi-5)
Deploy via Portainer edge agent or manually:
```bash
ssh vish-concord-nuc
docker compose -f scrutiny-collector.yaml up -d
```
Verify a collector is shipping data:
```bash
docker logs scrutiny-collector --tail 20
# Should show: "Sending device summary to Scrutiny API"
```
---
## DNS / Subdomain Setup
`scrutiny.vish.gg` is already added to the DDNS updater on Atlantis (`dynamicdnsupdater.yaml`).
Still needed (manual steps):
1. **Cloudflare DNS**: add A record `scrutiny.vish.gg → current public IP` (proxied)
   - Or let the DDNS container create it automatically on next run
2. **NPM proxy host**: `scrutiny.vish.gg → http://100.67.40.126:8090`
---
## Validation
```bash
# Hub health
curl http://100.67.40.126:8090/api/health
# List all tracked devices after collectors run
curl http://100.67.40.126:8090/api/devices | jq '.data[].device_name'
# Check collector logs
docker logs scrutiny-collector
# Open UI (macOS; on Linux use xdg-open instead)
open https://scrutiny.vish.gg
```
---
## Collector Schedule
By default, collectors run a SMART scan on startup and then hourly. The schedule is managed inside the container, so no host cron job is needed.
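To force a scan without waiting for the next scheduled run, the collector binary can be invoked directly inside the container. The sketch below assumes the container is named `scrutiny-collector` (as in the log commands in this guide) and that the binary lives at the path used by the upstream collector image; `run_collector_now` is a hypothetical helper name.

```bash
# Trigger one collector run immediately.
# $1 = container name (optional, defaults to scrutiny-collector).
# Assumption: binary path follows the upstream collector image layout.
run_collector_now() {
  docker exec "${1:-scrutiny-collector}" /opt/scrutiny/bin/scrutiny-collector-metrics run
}
```

Usage: `run_collector_now` on a Linux collector host. On Synology, run the equivalent `docker exec` through the full Docker binary path with `sudo`, as in the deployment step.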
---
## Troubleshooting
**"permission denied" on /dev/sdX**
→ Use `privileged: true` on Synology. On Linux, use `cap_add: [SYS_RAWIO, SYS_ADMIN]`.

**Device not found in collector**
→ Run `lsblk` on the host, update the `devices:` list in the compose file, and recreate the container.

**Hub shows no devices**
→ Check collector logs for API errors. Verify `COLLECTOR_API_ENDPOINT` is reachable from the collector host via Tailscale (`curl http://100.67.40.126:8090/api/health`).

**InfluxDB fails to start**
→ The influxdb container initialises on first run; `scrutiny-web` depends on it but may start before it's ready. Wait ~30s and check `docker logs scrutiny-influxdb`.
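
Rather than waiting a fixed 30 seconds, the health endpoint can be polled until the hub answers. This is a sketch only: `wait_for_hub` is a hypothetical helper, the attempt count and delay are arbitrary defaults, and it assumes the health response contains `"success":true` as shown in the deployment check.

```bash
# Poll the hub health endpoint until it reports success or attempts run out.
# $1 = health URL
# $2 = maximum attempts (default 10)
# $3 = seconds to sleep between attempts (default 5)
wait_for_hub() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures, -sS: quiet but still report errors
    if curl -fsS "$url" 2>/dev/null | grep -q '"success":[[:space:]]*true'; then
      echo "hub healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "hub not healthy after $attempts attempts" >&2
  return 1
}
```

Usage: `wait_for_hub http://100.67.40.126:8090/api/health 12 5`. If it still fails, fall back to `docker logs scrutiny-influxdb`.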