Sanitized mirror from private repository - 2026-03-18 10:31:50 UTC

This commit is contained in:
Gitea Mirror Bot
2026-03-18 10:31:50 +00:00
commit 8e49624d78
1221 changed files with 304405 additions and 0 deletions

# Scrutiny — SMART Disk Health Monitoring
Scrutiny runs SMART health checks on physical drives and presents results in a web UI with historical trending and alerting.
## Architecture
```
                ┌─────────────────────────────────┐
                │  homelab-vm (100.67.40.126)     │
                │  scrutiny-web      :8090        │
                │  scrutiny-influxdb (internal)   │
                └──────────────┬──────────────────┘
                               │ collector API
        ┌──────────────────────┼──────────────────────┐
        │                      │                      │
atlantis-collector     calypso-collector      setillo-collector
concord-nuc-collector  pi-5-collector
```
| Role | Host | Notes |
|------|------|-------|
| Hub (web + InfluxDB) | homelab-vm | Port 8090, proxied at scrutiny.vish.gg |
| Collector | atlantis | 8-bay NAS, /dev/sda through /dev/sdh |
| Collector | calypso | 2-bay NAS, /dev/sda through /dev/sdb |
| Collector | setillo | 2-bay NAS, /dev/sda through /dev/sdb |
| Collector | concord-nuc | Intel NUC, /dev/sda (NVMe optional) |
| Collector | pi-5 | /dev/nvme0n1 (M.2 HAT) |
| Skipped | homelab-vm, seattle, matrix-ubuntu | VMs — no physical disks |
| Skipped | guava (TrueNAS) | Native TrueNAS disk monitoring |
---
## Files
| File | Purpose |
|------|---------|
| `hosts/vms/homelab-vm/scrutiny.yaml` | Hub (web + InfluxDB) |
| `hosts/synology/atlantis/scrutiny-collector.yaml` | Atlantis collector |
| `hosts/synology/calypso/scrutiny-collector.yaml` | Calypso collector |
| `hosts/synology/setillo/scrutiny-collector.yaml` | Setillo collector |
| `hosts/physical/concord-nuc/scrutiny-collector.yaml` | NUC collector |
| `hosts/edge/rpi5-vish/scrutiny-collector.yaml` | Pi-5 collector |
---
## Deployment
### Hub (homelab-vm)
Deploy via Portainer GitOps on endpoint 443399:
1. Portainer → Stacks → Add stack → Git repository
2. URL: `https://git.vish.gg/Vish/homelab`
3. Compose path: `hosts/vms/homelab-vm/scrutiny.yaml`
Or manually:
```bash
ssh homelab
docker compose -f /path/to/scrutiny.yaml up -d
```
Verify:
```bash
curl http://100.67.40.126:8090/api/health
# {"success":true}
```
### Collectors — Synology (Atlantis, Calypso, Setillo)
Synology requires `privileged: true` (DSM kernel lacks `nf_conntrack_netlink`).
Deploy via Portainer stacks on each Synology host, or manually:
```bash
ssh atlantis
sudo /var/packages/REDACTED_APP_PASSWORD/target/usr/bin/docker compose \
-f /path/to/scrutiny-collector.yaml up -d
```
**Important — verify drive paths first:**
```bash
# List block devices on the host
lsblk -o NAME,SIZE,TYPE,MODEL
# Or for Synology:
sudo fdisk -l | grep '^Disk /dev'
```
Update the `devices:` list in the collector compose to match actual drives.
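As a starting point for that list, the whole-disk names can be turned into compose entries with a small helper. This is a minimal sketch, assuming util-linux `lsblk` is available on the host; the function name `list_device_entries` is hypothetical, and the output should be reviewed by hand because `lsblk` also reports non-SMART devices such as SD cards as type `disk`.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print one `devices:` entry per whole disk.
# Assumes util-linux lsblk; on DSM, use the fdisk command above instead.
list_device_entries() {
  # -d: whole disks only (no partitions), -n: no header row
  lsblk -dn -o NAME,TYPE | while read -r name type; do
    # Skip loop devices, optical drives, and anything else that is not a disk
    if [ "$type" = "disk" ]; then
      printf -- '- /dev/%s\n' "$name"
    fi
  done
}

# Usage: list_device_entries
# Paste the printed lines under `devices:` in the collector compose file.
```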
### Collectors — Linux (concord-nuc, pi-5)
Deploy via Portainer edge agent or manually:
```bash
ssh vish-concord-nuc
docker compose -f scrutiny-collector.yaml up -d
```
Verify a collector is shipping data:
```bash
docker logs scrutiny-collector --tail 20
# Should show: "Sending device summary to Scrutiny API"
```
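The same log check can be scripted so it returns an exit status instead of needing a visual scan. This is a rough sketch: the function name `collector_is_shipping` is hypothetical, and it assumes the container name and log message quoted above.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): succeed if recent collector logs contain
# the message quoted above. Container name and line count are overridable.
collector_is_shipping() {
  local container="${1:-scrutiny-collector}"
  local lines="${2:-20}"
  # docker logs replays the container's stderr on stderr, so merge both streams
  docker logs --tail "$lines" "$container" 2>&1 \
    | grep -q "Sending device summary to Scrutiny API"
}

# Usage:
#   collector_is_shipping && echo "collector OK" || echo "no summary sent yet"
```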
---
## DNS / Subdomain Setup
`scrutiny.vish.gg` is already added to the DDNS updater on Atlantis (`dynamicdnsupdater.yaml`).
Still needed (manual steps):
1. **Cloudflare DNS**: add A record `scrutiny.vish.gg → current public IP` (proxied)
- Or let the DDNS container create it automatically on next run
2. **NPM proxy host**: `scrutiny.vish.gg → http://100.67.40.126:8090`
---
## Validation
```bash
# Hub health
curl http://100.67.40.126:8090/api/health
# List all tracked devices after collectors run
curl http://100.67.40.126:8090/api/devices | jq '.data[].device_name'
# Check collector logs (run on each collector host)
docker logs scrutiny-collector
# Open UI (`open` is macOS; use `xdg-open` on Linux)
open https://scrutiny.vish.gg
```
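To confirm that every collector has reported, the device list can be compared against an expected total. From the host table, a fully populated setup would track 8 + 2 + 2 + 1 + 1 = 14 drives; that figure is an assumption (every bay populated, the NUC's optional NVMe not counted), so adjust it to the real hardware. The helper name `check_device_count` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): read device names on stdin, one per line,
# and compare how many there are with an expected total.
check_device_count() {
  local expected="$1"
  local actual
  # grep -c . counts non-empty lines; it exits 1 on a count of 0, hence || true
  actual="$(grep -c . || true)"
  echo "tracked devices: ${actual} (expected ${expected})"
  [ "$actual" -eq "$expected" ]
}

# Usage (14 is the assumed total from the host table):
#   curl -fsS http://100.67.40.126:8090/api/devices \
#     | jq -r '.data[].device_name' | check_device_count 14
```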
---
## Collector Schedule
By default, collectors run a SMART scan on startup and then hourly. The schedule runs inside the container, so no host cron job is needed.
---
## Troubleshooting
**"permission denied" on /dev/sdX**
→ Use `privileged: true` on Synology. On Linux, use `cap_add: [SYS_RAWIO, SYS_ADMIN]` (`SYS_RAWIO` lets smartctl query SATA drives; NVMe drives need `SYS_ADMIN` as well).
**Device not found in collector**
→ Run `lsblk` on the host, update `devices:` list in the compose file, recreate the container.
**Hub shows no devices**
→ Check collector logs for API errors. Verify `COLLECTOR_API_ENDPOINT` is reachable from the collector host via Tailscale (`curl http://100.67.40.126:8090/api/health`).
**InfluxDB fails to start**
→ The influxdb container initialises on first run; `scrutiny-web` depends on it but may start before it's ready. Wait ~30s and check `docker logs scrutiny-influxdb`.
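
Rather than waiting a fixed 30 seconds, the hub health endpoint can be polled until it answers. This is a minimal sketch, not part of the stack itself: the function name `wait_for_hub` and the defaults (10 attempts, 3 seconds apart, roughly 30 seconds in total) are assumptions chosen to match the guidance above.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): poll the hub health endpoint until it
# reports success or the attempts run out.
wait_for_hub() {
  local url="${1:-http://100.67.40.126:8090/api/health}"
  local attempts="${2:-10}"
  local delay="${3:-3}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP errors, -sS: quiet except errors, --max-time: bound each try
    if curl -fsS --max-time 5 "$url" 2>/dev/null \
        | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'; then
      echo "hub healthy after ${i} attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "hub not healthy after ${attempts} attempts" >&2
  return 1
}

# Usage:
#   wait_for_hub || docker logs scrutiny-influxdb --tail 50
```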