152 lines
4.8 KiB
Markdown
152 lines
4.8 KiB
Markdown
# Scrutiny — SMART Disk Health Monitoring
|
||
|
||
Scrutiny runs SMART health checks on physical drives and presents results in a web UI with historical trending and alerting.
|
||
|
||
## Architecture
|
||
|
||
```
|
||
┌─────────────────────────────────┐
|
||
│ homelab-vm (100.67.40.126) │
|
||
│ scrutiny-web :8090 │
|
||
│ scrutiny-influxdb (internal) │
|
||
└──────────────┬──────────────────┘
|
||
│ collector API
|
||
┌──────────────────────┼──────────────────────┐
|
||
│ │ │
|
||
atlantis-collector calypso-collector setillo-collector
|
||
concord-nuc-collector pi-5-collector
|
||
```
|
||
|
||
| Role | Host | Notes |
|
||
|------|------|-------|
|
||
| Hub (web + InfluxDB) | homelab-vm | Port 8090, proxied at scrutiny.vish.gg |
|
||
| Collector | atlantis | 8-bay NAS, /dev/sda–sdh |
|
||
| Collector | calypso | 2-bay NAS, /dev/sda–sdb |
|
||
| Collector | setillo | 2-bay NAS, /dev/sda–sdb |
|
||
| Collector | concord-nuc | Intel NUC, /dev/sda (NVMe optional) |
|
||
| Collector | pi-5 | /dev/nvme0n1 (M.2 HAT) |
|
||
| Skipped | homelab-vm, seattle, matrix-ubuntu | VMs — no physical disks |
|
||
| Skipped | guava (TrueNAS) | Native TrueNAS disk monitoring |
|
||
|
||
---
|
||
|
||
## Files
|
||
|
||
| File | Purpose |
|
||
|------|---------|
|
||
| `hosts/vms/homelab-vm/scrutiny.yaml` | Hub (web + InfluxDB) |
|
||
| `hosts/synology/atlantis/scrutiny-collector.yaml` | Atlantis collector |
|
||
| `hosts/synology/calypso/scrutiny-collector.yaml` | Calypso collector |
|
||
| `hosts/synology/setillo/scrutiny-collector.yaml` | Setillo collector |
|
||
| `hosts/physical/concord-nuc/scrutiny-collector.yaml` | NUC collector |
|
||
| `hosts/edge/rpi5-vish/scrutiny-collector.yaml` | Pi-5 collector |
|
||
|
||
---
|
||
|
||
## Deployment
|
||
|
||
### Hub (homelab-vm)
|
||
|
||
Deploy via Portainer GitOps on endpoint 443399:
|
||
1. Portainer → Stacks → Add stack → Git repository
|
||
2. URL: `https://git.vish.gg/Vish/homelab`
|
||
3. Compose path: `hosts/vms/homelab-vm/scrutiny.yaml`
|
||
|
||
Or manually:
|
||
```bash
|
||
ssh homelab
|
||
docker compose -f /path/to/scrutiny.yaml up -d
|
||
```
|
||
|
||
Verify:
|
||
```bash
|
||
curl http://100.67.40.126:8090/api/health
|
||
# {"success":true}
|
||
```
|
||
|
||
### Collectors — Synology (Atlantis, Calypso, Setillo)
|
||
|
||
Synology requires `privileged: true` (DSM kernel lacks `nf_conntrack_netlink`).
|
||
|
||
Deploy via Portainer stacks on each Synology host, or manually:
|
||
```bash
|
||
ssh atlantis
|
||
sudo /var/packages/REDACTED_APP_PASSWORD/target/usr/bin/docker compose \
|
||
-f /path/to/scrutiny-collector.yaml up -d
|
||
```
|
||
|
||
**Important — verify drive paths first:**
|
||
```bash
|
||
# List block devices on the host
|
||
lsblk -o NAME,SIZE,TYPE,MODEL
|
||
# Or for Synology:
|
||
sudo fdisk -l | grep '^Disk /dev'
|
||
```
|
||
|
||
Update the `devices:` list in the collector compose to match actual drives.
|
||
|
||
### Collectors — Linux (concord-nuc, pi-5)
|
||
|
||
Deploy via Portainer edge agent or manually:
|
||
```bash
|
||
ssh vish-concord-nuc
|
||
docker compose -f scrutiny-collector.yaml up -d
|
||
```
|
||
|
||
Verify a collector is shipping data:
|
||
```bash
|
||
docker logs scrutiny-collector --tail 20
|
||
# Should show: "Sending device summary to Scrutiny API"
|
||
```
|
||
|
||
---
|
||
|
||
## DNS / Subdomain Setup
|
||
|
||
`scrutiny.vish.gg` is already added to the DDNS updater on Atlantis (`dynamicdnsupdater.yaml`).
|
||
|
||
Still needed (manual steps):
|
||
1. **Cloudflare DNS**: add A record `scrutiny.vish.gg → current public IP` (proxied)
|
||
- Or let the DDNS container create it automatically on next run
|
||
2. **NPM proxy host**: `scrutiny.vish.gg → http://100.67.40.126:8090`
|
||
|
||
---
|
||
|
||
## Validation
|
||
|
||
```bash
|
||
# Hub health
|
||
curl http://100.67.40.126:8090/api/health
|
||
|
||
# List all tracked devices after collectors run
|
||
curl http://100.67.40.126:8090/api/devices | jq '.data[].device_name'
|
||
|
||
# Check collector logs
|
||
docker logs scrutiny-collector
|
||
|
||
# Open UI
|
||
open https://scrutiny.vish.gg
|
||
```
|
||
|
||
---
|
||
|
||
## Collector Schedule
|
||
|
||
By default, collectors run a SMART scan on startup and then hourly. The schedule is controlled inside the container — no cron needed.
|
||
|
||
---
|
||
|
||
## Troubleshooting
|
||
|
||
**"permission denied" on /dev/sdX**
|
||
→ Use `privileged: true` on Synology. On Linux, use `cap_add: [SYS_RAWIO, SYS_ADMIN]`.
|
||
|
||
**Device not found in collector**
|
||
→ Run `lsblk` on the host, update `devices:` list in the compose file, recreate the container.
|
||
|
||
**Hub shows no devices**
|
||
→ Check collector logs for API errors. Verify `COLLECTOR_API_ENDPOINT` is reachable from the collector host via Tailscale (`curl http://100.67.40.126:8090/api/health`).
|
||
|
||
**InfluxDB fails to start**
|
||
→ The influxdb container initialises on first run; `scrutiny-web` depends on it but may start before it's ready. Wait ~30s and check `docker logs scrutiny-influxdb`.
|