# Homelab Monitoring & Alerting Setup

Last updated: 2026-03-21

## Overview

| Service | Host | Port | URL | Purpose |
|---------|------|------|-----|---------|
| **Grafana** | Homelab VM | 3300 | `https://gf.vish.gg` | Dashboards & visualization |
| **Prometheus** | Homelab VM | 9090 | `http://192.168.0.210:9090` | Metrics collection & storage |
| **Alertmanager** | Homelab VM | 9093 | `http://192.168.0.210:9093` | Alert routing & deduplication |
| **ntfy** | Homelab VM | 8081 | `https://ntfy.vish.gg` | Push notifications |
| **Uptime Kuma** | RPi 5 | 3001 | `http://192.168.0.66:3001` or `https://kuma.vish.gg` | Uptime monitoring (97 monitors) |
| **DIUN** | Atlantis | n/a | ntfy topic `diun` | Docker image update detection |
| **Scrutiny** | Multiple (hub on Homelab VM) | 8090 | `http://192.168.0.210:8090` | SMART disk health |

All monitoring services on homelab-vm are deployed together as the `monitoring-stack` Portainer stack, using GitOps from `hosts/vms/homelab-vm/monitoring.yaml`.

## Prometheus Targets (14 active)

| Job | Target | Type | Status |
|-----|--------|------|--------|
| atlantis-node | atlantis | node_exporter | Up |
| atlantis-snmp | atlantis | SNMP exporter | Up |
| calypso-node | calypso | node_exporter | Up |
| calypso-snmp | calypso | SNMP exporter | Up |
| concord-nuc-node | concord-nuc | node_exporter | Up |
| homelab-node | homelab-vm | node_exporter | Up |
| node_exporter | homelab-vm | node_exporter (self) | Up |
| prometheus | localhost:9090 | self-scrape | Up |
| proxmox-node | proxmox | node_exporter | Up |
| raspberry-pis | pi-5 | node_exporter | Up |
| seattle-node | seattle | node_exporter | Up |
| setillo-node | setillo | node_exporter | Up |
| setillo-snmp | setillo | SNMP exporter | Up |
| truenas-node | guava | node_exporter | Up |

## Grafana

- **URL**: `https://gf.vish.gg` (Authentik SSO) or `http://192.168.0.210:3300`
- **Login**: Authentik SSO (primary), local `admin` account (fallback)
- **Dashboards**: Infrastructure Overview, Node Details, Synology NAS, Node Exporter Full
- **Stack**: `hosts/vms/homelab-vm/monitoring.yaml`

## ntfy Push Notifications

- **Public URL**: `https://ntfy.vish.gg`
- **Local URL**: `http://192.168.0.210:8081`
- **Primary topic**: `homelab-alerts` (the mobile app subscribes to this topic)
- **Other topics**: `diun` (image updates), `homelab-deploys` (CI deployments)

### Send test notification

```bash
# Run on homelab-vm, where ntfy listens on port 8081.
# From another machine, post to https://ntfy.vish.gg/homelab-alerts instead.
# Priority 3 is ntfy's default level (1 = min, 5 = max).
curl -X POST "http://localhost:8081/homelab-alerts" \
  -H "Title: Test Alert" \
  -H "Priority: 3" \
  -d "Test notification from homelab"
```

### Mobile app setup

1. Install the ntfy app (Android/iOS).
2. Add the server `https://ntfy.vish.gg`.
3. Subscribe to the topic `homelab-alerts`.

## Uptime Kuma

- **URL**: `http://192.168.0.66:3001` or `https://kuma.vish.gg` (via Authentik forward auth)
- **Host**: RPi 5 (pi-5)
- **Monitors**: 97 total (27 HTTP + 70 port checks)
- **Status page**: `https://kuma.vish.gg/status/homelab`

### Monitor types

- **HTTP monitors**: check public URLs (`https://*.vish.gg`). This works for every service behind NPM.
- **Port monitors**: check service ports on Tailscale IPs (100.x.x.x) as a direct connectivity test.

### Known limitations

- Port monitors use Tailscale IPs because pi-5 cannot resolve `.tail.vish.gg` names. It does not use AdGuard as its DNS server.
- Port monitors only test whether a TCP connection succeeds. Services that require authentication, which would answer an HTTP request with 401 or 302, therefore still count as "up".

## Alertmanager

- **URL**: `http://192.168.0.210:9093`
- **Routes to**: ntfy via webhook
- **Active alerts**: 0 (normal state)

## Watchtower Notifications

Watchtower runs on 3 hosts with ntfy integration:

```yaml
# Excerpt from the Watchtower service environment in common/watchtower-full.yaml
environment:
  - WATCHTOWER_NOTIFICATIONS=shoutrrr
  # Shoutrrr ntfy URL; scheme=http because the local ntfy endpoint is plain HTTP
  - WATCHTOWER_NOTIFICATION_URL=ntfy://192.168.0.210:8081/homelab-alerts?scheme=http
```

Manual update trigger through the Watchtower HTTP API (host at 192.168.0.200, port 8083):

```bash
curl -X POST http://192.168.0.200:8083/v1/update \
  -H "Authorization: Bearer watchtower-metrics-token"
```

## DIUN (Docker Image Update Notifier)

- **Host**: Atlantis
- **Schedule**: Mondays 09:00 UTC
- **Notifications**: ntfy topic `diun`
- **What it does**: Scans the images of all running containers and reports new upstream digests

## Scrutiny (SMART Monitoring)

Scrutiny collectors run on the hosts with physical drives:

- Atlantis (8x HDD + 4x NVMe)
- Calypso (2x HDD + 2x NVMe)
- RPi 5 (NVMe)

Hub dashboard: `http://192.168.0.210:8090` or `https://scrutiny.vish.gg`

## Related Documentation

- [Image Update Guide](IMAGE_UPDATE_GUIDE.md): Renovate, DIUN, Watchtower
- [Ansible Playbook Guide](ANSIBLE_PLAYBOOK_GUIDE.md): `health_check.yml`, `service_status.yml`
- [Backup Strategy](../infrastructure/backup-strategy.md): backup monitoring
- [Offline & Remote Access](../infrastructure/offline-and-remote-access.md): accessing monitoring when the internet is down
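You can check the homelab-vm side of the monitoring stack quickly from a shell. The sketch below is a hypothetical helper and is not part of the repo. The names `check_endpoint` and `check_monitoring_stack` are illustrative.

The helper queries each service's built-in health endpoint:

- `/-/healthy` on Prometheus and Alertmanager
- `/api/health` on Grafana
- `/v1/health` on ntfy, assuming an ntfy release recent enough to expose it

The base URLs default to the LAN addresses in the Overview table. You can override them with environment variables.

```bash
# Hypothetical helper, not part of the repo: probe the health endpoints of the
# monitoring services on homelab-vm. Source this file, then run
# check_monitoring_stack. Requires curl.

# Base URLs default to the LAN addresses in the Overview table;
# export any of these variables beforehand to override them.
PROM_URL="${PROM_URL:-http://192.168.0.210:9090}"
AM_URL="${AM_URL:-http://192.168.0.210:9093}"
GRAFANA_URL="${GRAFANA_URL:-http://192.168.0.210:3300}"
NTFY_URL="${NTFY_URL:-http://192.168.0.210:8081}"

# check_endpoint NAME URL
# Prints "NAME OK" when the URL answers with an HTTP status below 400 within
# 5 seconds, otherwise "NAME FAIL (URL)". Returns 0 on success, 1 on failure.
check_endpoint() {
  local name="$1" url="$2"
  if curl -fsS --max-time 5 -o /dev/null "$url"; then
    echo "$name OK"
    return 0
  fi
  echo "$name FAIL ($url)"
  return 1
}

# check_monitoring_stack
# Probes all four services, prints one line per service plus a summary line,
# and returns the number of failed checks (0 means everything answered).
check_monitoring_stack() {
  local failed=0
  check_endpoint prometheus "$PROM_URL/-/healthy" || failed=$((failed + 1))
  check_endpoint alertmanager "$AM_URL/-/healthy" || failed=$((failed + 1))
  check_endpoint grafana "$GRAFANA_URL/api/health" || failed=$((failed + 1))
  check_endpoint ntfy "$NTFY_URL/v1/health" || failed=$((failed + 1))
  echo "$failed endpoint(s) failed"
  return "$failed"
}
```

For example, save the helper as `check-monitoring.sh` (a hypothetical file name) and run `source check-monitoring.sh && check_monitoring_stack`. The return code is the number of failed checks, so a non-zero exit status means the stack is degraded. This makes it easy to run from cron, or to chain with the ntfy `curl` test notification when a check fails. The helper only confirms that each service answers HTTP. It complements, rather than replaces, the Uptime Kuma monitors and Prometheus target status.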