Sanitized mirror from private repository - 2026-03-21 11:08:37 UTC
Gitea Mirror Bot
2026-03-21 11:08:37 +00:00
commit 352b75ebcf
1236 changed files with 307401 additions and 0 deletions

# Homelab Monitoring & Alerting Setup
Last updated: 2026-03-21
## Overview
| Service | Host | Port | URL | Purpose |
|---------|------|------|-----|---------|
| **Grafana** | Homelab VM | 3300 | `https://gf.vish.gg` | Dashboards & visualization |
| **Prometheus** | Homelab VM | 9090 | `http://192.168.0.210:9090` | Metrics collection & storage |
| **Alertmanager** | Homelab VM | 9093 | `http://192.168.0.210:9093` | Alert routing & dedup |
| **ntfy** | Homelab VM | 8081 | `https://ntfy.vish.gg` | Push notifications |
| **Uptime Kuma** | RPi 5 | 3001 | `http://192.168.0.66:3001` or `https://kuma.vish.gg` | Uptime monitoring (97 monitors) |
| **DIUN** | Atlantis | — | ntfy topic `diun` | Docker image update detection |
| **Scrutiny** | Multiple | 8090 | `http://192.168.0.210:8090` | SMART disk health |

All monitoring services on homelab-vm are deployed as `monitoring-stack` via Portainer GitOps from `hosts/vms/homelab-vm/monitoring.yaml`.
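
A quick way to confirm the stack is alive is to probe each service's health endpoint. The sketch below uses the addresses from the table above; the endpoint paths are the upstream defaults for Grafana, Prometheus, Alertmanager, and ntfy, and the function names `health_url` and `check_stack` are made up for illustration.

```bash
# Sketch: map each service to its health endpoint.
# Hosts and ports come from the Overview table; the paths are upstream
# defaults and are assumed, not confirmed by this repository.
health_url() {
  case "$1" in
    grafana)      echo "http://192.168.0.210:3300/api/health" ;;
    prometheus)   echo "http://192.168.0.210:9090/-/healthy" ;;
    alertmanager) echo "http://192.168.0.210:9093/-/healthy" ;;
    ntfy)         echo "http://192.168.0.210:8081/v1/health" ;;
    *)            return 1 ;;  # unknown service name
  esac
}

# Probe every service and print "<service> ok" or "<service> FAIL".
check_stack() {
  for svc in grafana prometheus alertmanager ntfy; do
    if curl -fsS -m 5 -o /dev/null "$(health_url "$svc")"; then
      echo "$svc ok"
    else
      echo "$svc FAIL"
    fi
  done
}
```

Running `check_stack` from a machine on the LAN prints one line per service. Nothing is contacted until the function is called.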
## Prometheus Targets (14 active)
| Job | Target | Type | Status |
|-----|--------|------|--------|
| atlantis-node | atlantis | node_exporter | Up |
| atlantis-snmp | atlantis | SNMP exporter | Up |
| calypso-node | calypso | node_exporter | Up |
| calypso-snmp | calypso | SNMP exporter | Up |
| concord-nuc-node | concord-nuc | node_exporter | Up |
| homelab-node | homelab-vm | node_exporter | Up |
| node_exporter | homelab-vm | node_exporter (self) | Up |
| prometheus | localhost:9090 | self-scrape | Up |
| proxmox-node | proxmox | node_exporter | Up |
| raspberry-pis | pi-5 | node_exporter | Up |
| seattle-node | seattle | node_exporter | Up |
| setillo-node | setillo | node_exporter | Up |
| setillo-snmp | setillo | SNMP exporter | Up |
| truenas-node | guava | node_exporter | Up |
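
The Status column is a snapshot. To re-check it, one option is to ask Prometheus which targets currently report `up == 0` through its HTTP query API. The following is a minimal sketch under that assumption; `down_jobs` and `list_down_jobs` are hypothetical helper names, and the parsing relies on Prometheus's compact JSON output rather than a real JSON parser.

```bash
# Sketch: list Prometheus jobs whose targets are currently down.
# down_jobs reads the JSON body of an instant query on stdin and prints
# one job name per line. It uses grep/cut instead of jq to avoid an
# extra dependency.
down_jobs() {
  grep -o '"job":"[^"]*"' | cut -d'"' -f4 | sort -u
}

# Run the query against the Prometheus address from the Overview table.
list_down_jobs() {
  curl -fsS -G -m 5 "http://192.168.0.210:9090/api/v1/query" \
    --data-urlencode 'query=up == 0' | down_jobs
}
```

`list_down_jobs` prints nothing when every target is up, which makes it easy to use in a cron job that only sends a message on non-empty output.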
## Grafana
- **URL**: `https://gf.vish.gg` (Authentik SSO) or `http://192.168.0.210:3300`
- **Login**: Authentik SSO (primary), local `admin` account (fallback)
- **Dashboards**: Infrastructure Overview, Node Details, Synology NAS, Node Exporter Full
- **Stack**: `hosts/vms/homelab-vm/monitoring.yaml`
## ntfy Push Notifications
- **Public URL**: `https://ntfy.vish.gg`
- **Local URL**: `http://192.168.0.210:8081`
- **Primary topic**: `homelab-alerts` (subscribed by mobile app)
- **Other topics**: `diun` (image updates), `homelab-deploys` (CI deployments)
### Send test notification
```bash
curl -X POST "http://localhost:8081/homelab-alerts" \
  -H "Title: Test Alert" \
  -H "Priority: 3" \
  -d "Test notification from homelab"
```
### Mobile app setup
1. Install ntfy app (Android/iOS)
2. Add server: `https://ntfy.vish.gg`
3. Subscribe to topic: `homelab-alerts`
## Uptime Kuma
- **URL**: `http://192.168.0.66:3001` or `https://kuma.vish.gg` (via Authentik forward auth)
- **Host**: RPi 5 (pi-5)
- **Monitors**: 97 total (27 HTTP + 70 port checks)
- **Status page**: `https://kuma.vish.gg/status/homelab`
### Monitor types
- **HTTP monitors**: Check public URLs (`https://*.vish.gg`) — works for all services behind NPM
- **Port monitors**: Check Tailscale IPs (100.x.x.x) on service ports — direct connectivity check
### Known limitations
- Port monitors target Tailscale IPs because pi-5 cannot resolve names under `.tail.vish.gg` (AdGuard is not configured as its DNS server)
- Port monitors only test whether a TCP connection succeeds, so Kuma reports a service as "up" even when it would answer an HTTP request with 401 or 302 (auth required)
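
The difference between the two monitor types can be reproduced by hand. This is an illustrative sketch, not how Uptime Kuma is implemented: `tcp_check`, `http_status`, and `classify_status` are hypothetical helpers, and the status grouping in `classify_status` is one possible policy.

```bash
# Sketch: contrast a TCP port check with an HTTP status check.
# tcp_check succeeds as soon as the TCP handshake completes, which is
# all a port monitor observes; it never reads an HTTP status code.
tcp_check() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# http_status prints the numeric status code (000 if the request failed).
http_status() {
  curl -s -m 5 -o /dev/null -w '%{http_code}' "$1"
}

# Interpret a status code the way a stricter monitor might.
classify_status() {
  case "$1" in
    2??)             echo "up" ;;
    301|302|401|403) echo "reachable, auth or redirect" ;;
    000)             echo "down" ;;
    *)               echo "unexpected ($1)" ;;
  esac
}
```

For a service behind authentication, `tcp_check <tailscale-ip> <port>` returns success while `classify_status "$(http_status <url>)"` reports the auth response, which is the gap described in the limitation above.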
## Alertmanager
- **URL**: `http://192.168.0.210:9093`
- **Routes to**: ntfy via webhook
- **Active alerts**: 0 (normal)
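
With zero active alerts it is hard to tell whether the webhook route to ntfy still works. One way to check is to post a synthetic alert to Alertmanager's v2 API and watch for the notification. The sketch below assumes the default API path; `test_alert_payload` and `send_test_alert` are hypothetical names, and whether the alert reaches ntfy depends on the routing rules in the actual Alertmanager config.

```bash
# Sketch: build a minimal alert for Alertmanager's v2 API.
# Arguments: alert name, severity, summary. No JSON escaping is done,
# so the values must not contain double quotes or backslashes.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"%s"}}]\n' \
    "$1" "$2" "$3"
}

# POST the alert to the Alertmanager address from this section.
send_test_alert() {
  test_alert_payload "$1" "$2" "$3" |
    curl -fsS -m 5 -X POST -H 'Content-Type: application/json' \
      --data-binary @- "http://192.168.0.210:9093/api/v2/alerts"
}
```

For example, `send_test_alert TestAlert info "route check"` submits an alert without an end time, so Alertmanager resolves it on its own after its configured resolve timeout.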
## Watchtower Notifications
Watchtower runs on 3 hosts with ntfy integration:
```yaml
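# In common/watchtower-full.yaml (shown as entries of the service's
# environment list; the surrounding layout is assumed)
environment:
  - WATCHTOWER_NOTIFICATIONS=shoutrrr
  - WATCHTOWER_NOTIFICATION_URL=ntfy://192.168.0.210:8081/homelab-alerts?scheme=http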
```
Manual update trigger:
```bash
curl -X POST http://192.168.0.200:8083/v1/update \
  -H "Authorization: Bearer watchtower-metrics-token"
```
## DIUN (Docker Image Update Notifier)
- **Host**: Atlantis
- **Schedule**: Mondays 09:00 UTC
- **Notifications**: ntfy topic `diun`
- **What it does**: Scans all running container images for new upstream digests
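
Expressed as DIUN environment variables, the settings above might look like the sketch below. The variable names follow DIUN's `DIUN_*` convention, but the values are inferred from the bullets rather than copied from the Atlantis deployment; in particular, the ntfy endpoint is an assumption.

```bash
# Sketch: DIUN settings as environment variables (for example in an env file).
# Values mirror the bullets above; they are not taken from the real stack.
export TZ="UTC"
export DIUN_WATCH_SCHEDULE="0 9 * * 1"              # cron: Mondays at 09:00
export DIUN_PROVIDERS_DOCKER="true"
export DIUN_PROVIDERS_DOCKER_WATCHBYDEFAULT="true"  # watch all running containers
export DIUN_NOTIF_NTFY_ENDPOINT="http://192.168.0.210:8081"  # assumed endpoint
export DIUN_NOTIF_NTFY_TOPIC="diun"
```

The cron fields are minute, hour, day of month, month, and day of week, so `0 9 * * 1` fires once a week on Monday morning.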
## Scrutiny (SMART Monitoring)
Scrutiny collectors run on hosts with physical drives:
- Atlantis (8x HDD + 4x NVMe)
- Calypso (2x HDD + 2x NVMe)
- RPi 5 (NVMe)

Hub dashboard: `http://192.168.0.210:8090` or `https://scrutiny.vish.gg`
## Related Documentation
- [Image Update Guide](IMAGE_UPDATE_GUIDE.md) — Renovate, DIUN, Watchtower
- [Ansible Playbook Guide](ANSIBLE_PLAYBOOK_GUIDE.md) — `health_check.yml`, `service_status.yml`
- [Backup Strategy](../infrastructure/backup-strategy.md) — backup monitoring
- [Offline & Remote Access](../infrastructure/offline-and-remote-access.md) — accessing monitoring when internet is down