# 🚨 Alerting & Notification System

**Last Updated**: 2026-01-27

This document describes the homelab alerting stack that provides dual-channel notifications via **ntfy** (mobile push) and **Signal** (encrypted messaging).

---

## Overview

The alerting system monitors your infrastructure and sends notifications through two channels:

| Channel | Use Case | App Required |
|---------|----------|--------------|
| **ntfy** | All alerts (warnings + critical) | ntfy iOS/Android app |
| **Signal** | Critical alerts only | Signal messenger |

### Alert Severity Routing

```
⚠️ Warning alerts  → ntfy only
🚨 Critical alerts → ntfy + Signal
✅ Resolved alerts → same channel(s) as the firing alert (ntfy + Signal for critical)
```
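The routing rule can be written as a tiny lookup, which helps when reasoning about where a new alert will land. This is a hypothetical shell helper for illustration only: Alertmanager performs the actual routing from `alertmanager.yml`, and the `info` severity in the example is made up to show that an alert matching neither child route falls back to the root `ntfy-all` receiver.

```bash
# Hypothetical helper: mirrors the severity routing in alertmanager.yml.
# Alertmanager does the real routing; this only encodes the same mapping.
channels_for_severity() {
  case "$1" in
    critical) echo "ntfy signal" ;;  # 'critical-alerts' receiver: ntfy-bridge + signal-bridge
    warning)  echo "ntfy" ;;         # 'ntfy-all' receiver
    *)        echo "ntfy" ;;         # no child route matches: root receiver 'ntfy-all'
  esac
}

# Print the channels for a few example severities.
for sev in warning critical info; do
  echo "${sev}: $(channels_for_severity "$sev")"
done
```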
---

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Prometheus    │────▶│   Alertmanager   │────▶│   ntfy-bridge   │───▶ ntfy app
│   (port 9090)   │     │   (port 9093)    │     │   (port 5001)   │
└─────────────────┘     └────────┬─────────┘     └─────────────────┘
                                 │
                                 │ (critical only)
                                 ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  signal-bridge  │────▶│   Signal API    │───▶ Signal app
                        │   (port 5000)   │     │   (port 8080)   │
                        └─────────────────┘     └─────────────────┘
```
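A quick way to check each hop in the diagram is to probe its port. This sketch assumes all five ports are published on the host it runs against (`ALERT_HOST`, default `localhost`), which may not hold for signal-api if it is only reachable on its Docker network. It relies on Bash's `/dev/tcp` redirection and the coreutils `timeout` command; `port_open` and `check_alerting_ports` are hypothetical helper names.

```bash
# Sketch: probe each service in the alerting pipeline by TCP port.
# Ports are taken from the architecture diagram; host defaults to localhost.
port_open() {
  local host="$1" port="$2"
  # Opening /dev/tcp/HOST/PORT succeeds only if a TCP connection is accepted.
  if timeout 2 bash -c ">/dev/tcp/${host}/${port}" 2>/dev/null; then
    return 0
  fi
  return 1
}

check_alerting_ports() {
  local host="${ALERT_HOST:-localhost}" entry name port
  for entry in prometheus:9090 alertmanager:9093 ntfy-bridge:5001 \
               signal-bridge:5000 signal-api:8080; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if port_open "$host" "$port"; then
      echo "${name} (${port}): open"
    else
      echo "${name} (${port}): closed"
    fi
  done
}

# Run the checks when executed directly (not when sourced).
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  check_alerting_ports
fi
```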
---

## Components

### 1. Prometheus (Metrics Collection)
- **Location**: Homelab VM
- **Port**: 9090
- **Config**: `~/docker/monitoring/prometheus/prometheus.yml`
- **Alert Rules**: `~/docker/monitoring/prometheus/alert-rules.yml`

### 2. Alertmanager (Alert Routing)
- **Location**: Homelab VM
- **Port**: 9093
- **Config**: `~/docker/monitoring/alerting/alertmanager/alertmanager.yml`
- **Web UI**: http://homelab-vm:9093

### 3. ntfy-bridge (Notification Formatter)
- **Location**: Homelab VM
- **Port**: 5001
- **Purpose**: Formats Alertmanager webhooks into clean ntfy notifications
- **Source**: `~/docker/monitoring/alerting/ntfy-bridge/`

### 4. signal-bridge (Signal Forwarder)
- **Location**: Homelab VM
- **Port**: 5000
- **Purpose**: Forwards critical alerts to Signal via signal-api
- **Source**: `~/docker/monitoring/alerting/signal-bridge/`

---

## Alert Rules Configured

| Alert | Severity | Threshold | Duration | Notification |
|-------|----------|-----------|----------|--------------|
| **HostDown** | 🔴 Critical | Host unreachable | 2 min | ntfy + Signal |
| **HighCPUUsage** | 🟡 Warning | CPU > 80% | 5 min | ntfy only |
| **CriticalCPUUsage** | 🔴 Critical | CPU > 95% | 2 min | ntfy + Signal |
| **HighMemoryUsage** | 🟡 Warning | Memory > 85% | 5 min | ntfy only |
| **CriticalMemoryUsage** | 🔴 Critical | Memory > 95% | 2 min | ntfy + Signal |
| **HighDiskUsage** | 🟡 Warning | Disk > 85% | 5 min | ntfy only |
| **CriticalDiskUsage** | 🔴 Critical | Disk > 95% | 2 min | ntfy + Signal |
| **DiskWillFillIn24Hours** | 🟡 Warning | Predictive | 5 min | ntfy only |
| **HighNetworkErrors** | 🟡 Warning | Errors > 1% | 5 min | ntfy only |
| **ServiceDown** | 🔴 Critical | Container exited | 1 min | ntfy + Signal |
| **ContainerHighCPU** | 🟡 Warning | Container CPU > 80% | 5 min | ntfy only |
| **ContainerHighMemory** | 🟡 Warning | Container Memory > 80% | 5 min | ntfy only |
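The host-level thresholds in the table amount to a two-level classification. The function below is an illustrative sketch, not the rules Prometheus evaluates: it applies the CPU, memory and disk thresholds to a single integer percentage and ignores the Duration column, whereas the real rules in `alert-rules.yml` only fire once the condition has held for that long. `severity_for_usage` is a hypothetical name.

```bash
# Illustrative sketch of the host CPU/memory/disk thresholds from the table.
# Takes a resource name and an integer percentage; prints ok, warning or critical.
# Real alerts additionally require the condition to persist for the Duration column.
severity_for_usage() {
  local resource="$1" pct="$2" warn crit
  case "$resource" in
    cpu)         warn=80; crit=95 ;;
    memory|disk) warn=85; crit=95 ;;
    *) echo "unknown resource: $resource" >&2; return 2 ;;
  esac
  # Thresholds are strict "greater than", as in the table.
  if (( pct > crit )); then
    echo "critical"
  elif (( pct > warn )); then
    echo "warning"
  else
    echo "ok"
  fi
}
```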
---

## Configuration Files

### Alertmanager Configuration
```yaml
# ~/docker/monitoring/alerting/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'ntfy-all'

  routes:
    # 'match' still works but is deprecated in newer Alertmanager
    # releases in favour of 'matchers'.
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'ntfy-all'

receivers:
  - name: 'ntfy-all'
    webhook_configs:
      - url: 'http://ntfy-bridge:5001/alert'
        send_resolved: true

  - name: 'critical-alerts'
    webhook_configs:
      - url: 'http://ntfy-bridge:5001/alert'
        send_resolved: true
      - url: 'http://signal-bridge:5000/alert'
        send_resolved: true
```
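Before reloading, the configuration can be checked with `amtool`, which ships in the `prom/alertmanager` image. This sketch assumes the container name `alertmanager` and the `/etc/alertmanager` mount from the Docker Compose file in this document; `am_check_config` and `am_route_test` are hypothetical wrapper names. `amtool config routes test` prints the receiver a given label set would be routed to, so `am_route_test severity=critical` should report `critical-alerts`.

```bash
# Sketch: validate alertmanager.yml and dry-run routing with amtool inside the
# running container. Assumes container 'alertmanager' with the config mounted
# at /etc/alertmanager (see the Docker Compose file).
AM_CONTAINER="${AM_CONTAINER:-alertmanager}"
AM_CONFIG="/etc/alertmanager/alertmanager.yml"

# Check that the configuration file parses and is valid.
am_check_config() {
  docker exec "$AM_CONTAINER" amtool check-config "$AM_CONFIG"
}

# Show which receiver a label set would reach, e.g. am_route_test severity=critical
am_route_test() {
  docker exec "$AM_CONTAINER" amtool config routes test \
    --config.file="$AM_CONFIG" "$@"
}
```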
### Docker Compose (Alerting Stack)
```yaml
# ~/docker/monitoring/alerting/docker-compose.alerting.yml

services:
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/etc/alertmanager
    networks:
      - monitoring-stack_default

  ntfy-bridge:
    build: ./ntfy-bridge
    container_name: ntfy-bridge
    ports:
      - "5001:5001"
    environment:
      - NTFY_URL=http://NTFY:80
      # Unquoted: in list form, quotes would become part of the value
      - NTFY_TOPIC=REDACTED_NTFY_TOPIC
    networks:
      - monitoring-stack_default
      - ntfy-stack_default

  signal-bridge:
    build: ./signal-bridge
    container_name: signal-bridge
    ports:
      - "5000:5000"
    environment:
      - SIGNAL_API_URL=http://signal-api:8080
      - SIGNAL_SENDER=+REDACTED_PHONE_NUMBER
      - SIGNAL_RECIPIENTS=+REDACTED_PHONE_NUMBER
    networks:
      - monitoring-stack_default
      - signal-api-stack_default

# These networks belong to the monitoring, ntfy and signal-api stacks;
# declaring them external lets this stack join them.
networks:
  monitoring-stack_default:
    external: true
  ntfy-stack_default:
    external: true
  signal-api-stack_default:
    external: true
```
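The three networks in the compose file belong to other stacks (monitoring, ntfy, signal-api), so they must already exist before this stack starts. A small pre-flight check can list any that are absent; `missing_networks` is a hypothetical helper built on `docker network inspect`, which exits non-zero for a network that does not exist. A missing network usually means the stack that owns it is not running.

```bash
# Pre-flight sketch: print each Docker network this stack joins that does not exist.
# Network names are taken from docker-compose.alerting.yml.
REQUIRED_NETWORKS=(monitoring-stack_default ntfy-stack_default signal-api-stack_default)

missing_networks() {
  local net
  for net in "${REQUIRED_NETWORKS[@]}"; do
    # 'docker network inspect' exits non-zero when the network is absent.
    if ! docker network inspect "$net" >/dev/null 2>&1; then
      echo "$net"
    fi
  done
}

# Usage:
#   missing=$(missing_networks)
#   [ -z "$missing" ] && docker compose -f docker-compose.alerting.yml up -d
```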
---

## iOS ntfy Configuration

For iOS push notifications to work with a self-hosted ntfy server, `upstream-base-url` must point at ntfy.sh:

```yaml
# ~/docker/ntfy/config/server.yml

base-url: "https://ntfy.vish.gg"
upstream-base-url: "https://ntfy.sh"
```

This routes iOS notifications through ntfy.sh's APNs integration while keeping messages on your self-hosted server: ntfy.sh only receives a poll request, not the message content, and the iOS app then fetches the message from your server.
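The iOS troubleshooting steps start by verifying this setting, which can be scripted. A minimal sketch, assuming the config path shown above (pass a different path as the first argument); `check_upstream_configured` is a hypothetical helper that only greps the file, so it cannot tell whether the running container has loaded the setting.

```bash
# Sketch: confirm that ntfy's server.yml forwards to ntfy.sh for iOS (APNs) delivery.
# Only inspects the file; restart the ntfy container after changing it.
check_upstream_configured() {
  local config="${1:-$HOME/docker/ntfy/config/server.yml}"
  # -s silences errors for a missing file; that case is reported as "missing".
  if grep -Eqs '^upstream-base-url:[[:space:]]*"?https://ntfy\.sh/?"?[[:space:]]*$' "$config"; then
    echo "ok: upstream-base-url points at ntfy.sh"
  else
    echo "missing: add upstream-base-url: \"https://ntfy.sh\" to $config" >&2
    return 1
  fi
}
```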
---

## Testing Notifications

### Test ntfy Alert
```bash
curl -X POST http://localhost:5001/alert \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": [{
      "status": "firing",
      "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "test:9100"},
      "annotations": {"summary": "Test alert", "description": "This is a test notification"}
    }]
  }'
```

### Test Signal Alert
```bash
curl -X POST http://localhost:5000/alert \
  -H "Content-Type: application/json" \
  -d '{
    "alerts": [{
      "status": "firing",
      "labels": {"alertname": "TestAlert", "severity": "critical", "instance": "test:9100"},
      "annotations": {"summary": "Test alert", "description": "This is a test notification"}
    }]
  }'
```
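The two test payloads differ only in severity and target port, so a small helper can generate them. This is a hypothetical sketch: it emits only the fields used in the curl examples (`status`, `labels`, `annotations`), not the full Alertmanager webhook body, and assumes the bridges need nothing more. Label values are inserted verbatim, so they must not contain double quotes or backslashes. For example, `send_test_alert 5000 critical` reproduces the Signal test.

```bash
# Hypothetical helpers for the manual webhook tests.
# make_alert_payload NAME SEVERITY [STATUS]: print a minimal webhook-style JSON body
# with only the fields used in the curl examples (values are not escaped).
make_alert_payload() {
  local name="$1" severity="$2" status="${3:-firing}"
  cat <<EOF
{
  "alerts": [{
    "status": "${status}",
    "labels": {"alertname": "${name}", "severity": "${severity}", "instance": "test:9100"},
    "annotations": {"summary": "Test alert", "description": "This is a test notification"}
  }]
}
EOF
}

# send_test_alert PORT SEVERITY [STATUS]: POST the payload to a bridge on localhost
# (5001 = ntfy-bridge, 5000 = signal-bridge).
send_test_alert() {
  local port="$1" severity="$2" status="${3:-firing}"
  make_alert_payload TestAlert "$severity" "$status" |
    curl -sS -X POST "http://localhost:${port}/alert" \
      -H "Content-Type: application/json" --data-binary @-
}
```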
### Test Direct ntfy
```bash
curl -H "Title: Test" -d "Hello from homelab!" https://ntfy.vish.gg/REDACTED_NTFY_TOPIC
```
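ntfy also reads optional `Priority` and `Tags` headers on a publish request, which makes a direct test look closer to a real alert. The severity-to-priority mapping below is an assumption for illustration (the mapping ntfy-bridge uses is not documented here), and `ntfy_priority_for` and `ntfy_send` are hypothetical names; set `NTFY_TOPIC_URL` to your topic URL before calling `ntfy_send`.

```bash
# Sketch: publish directly to ntfy with a priority derived from alert severity.
# The severity -> priority mapping is an assumption, not ntfy-bridge's behaviour.
ntfy_priority_for() {
  case "$1" in
    critical) echo "urgent" ;;   # ntfy priority 5
    warning)  echo "high" ;;     # ntfy priority 4
    *)        echo "default" ;;  # ntfy priority 3
  esac
}

# ntfy_send SEVERITY TITLE MESSAGE; requires NTFY_TOPIC_URL, e.g.
# https://ntfy.vish.gg/REDACTED_NTFY_TOPIC
ntfy_send() {
  local severity="$1" title="$2" message="$3"
  if [ -z "${NTFY_TOPIC_URL:-}" ]; then
    echo "set NTFY_TOPIC_URL to the topic URL" >&2
    return 1
  fi
  curl -sS \
    -H "Title: ${title}" \
    -H "Priority: $(ntfy_priority_for "$severity")" \
    -H "Tags: ${severity}" \
    -d "$message" \
    "$NTFY_TOPIC_URL"
}
```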
---

## Troubleshooting

### Alerts not firing
1. Check Prometheus targets: http://homelab-vm:9090/targets
2. Check alert rules: http://homelab-vm:9090/alerts
3. Check Alertmanager: http://homelab-vm:9093
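To separate Prometheus problems from delivery problems, an alert can be posted straight to Alertmanager's v2 API (`POST /api/v2/alerts`, which takes a JSON array of alerts). If the notification then arrives (after the 30s `group_wait`), Alertmanager and the bridges are working and the issue lies with the Prometheus targets or rules. The helper names and the `manual` instance label are hypothetical; because no `endsAt` is sent, Alertmanager resolves the alert after `resolve_timeout` (5m in this config) unless it is re-sent.

```bash
# Sketch: inject a synthetic alert directly into Alertmanager, bypassing Prometheus.
# am_alert_json NAME SEVERITY: print a one-element alert array for POST /api/v2/alerts.
am_alert_json() {
  local name="$1" severity="$2"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual"},"annotations":{"summary":"Manual test alert"}}]\n' \
    "$name" "$severity"
}

# am_inject NAME SEVERITY: POST the alert; ALERTMANAGER_URL defaults to localhost:9093.
am_inject() {
  local url="${ALERTMANAGER_URL:-http://localhost:9093}"
  am_alert_json "$1" "$2" |
    curl -sS -X POST "${url}/api/v2/alerts" \
      -H "Content-Type: application/json" --data-binary @-
}
```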
### ntfy notifications not received on iOS
1. Verify `upstream-base-url: "https://ntfy.sh"` is set in `server.yml`
2. Restart the ntfy container: `docker restart NTFY`
3. Re-subscribe to the topic in the iOS app

### Signal notifications not working
1. Check that signal-api is registered: `docker logs signal-api`
2. Verify the sender phone number is linked
3. Test signal-bridge health: `curl http://localhost:5000/health`
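Registration can also be checked over HTTP instead of through the logs. This assumes signal-api is the bbernhard/signal-cli-rest-api image (the document does not name the image), whose `GET /v1/accounts` endpoint returns a JSON array of registered numbers, and that it is reachable at `SIGNAL_API_URL` (default `http://localhost:8080`). `account_in_list` and `signal_account_registered` are hypothetical helpers.

```bash
# Sketch: check whether a number is registered with signal-api.
# Assumes the bbernhard/signal-cli-rest-api image and its GET /v1/accounts endpoint.
# account_in_list JSON NUMBER: succeed if NUMBER appears as a quoted string in JSON.
account_in_list() {
  grep -qF "\"$2\"" <<<"$1"
}

# signal_account_registered NUMBER: returns 2 if signal-api cannot be reached.
signal_account_registered() {
  local url="${SIGNAL_API_URL:-http://localhost:8080}" accounts
  accounts="$(curl -fsS "${url}/v1/accounts")" || return 2
  account_in_list "$accounts" "$1"
}
```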
---

## Maintenance

### Restart Alerting Stack
```bash
cd ~/docker/monitoring/alerting
docker compose -f docker-compose.alerting.yml restart
```
`restart` does not apply changes made to the compose file or rebuild the bridge images; use `docker compose -f docker-compose.alerting.yml up -d --build` for that.

### Reload Alertmanager Config
```bash
curl -X POST http://localhost:9093/-/reload
```

### Reload Prometheus Config
The reload endpoint is only enabled when Prometheus is started with `--web.enable-lifecycle`.
```bash
curl -X POST http://localhost:9090/-/reload
```
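Both reload endpoints answer with an HTTP status code, so a small wrapper can report whether a reload was accepted instead of failing silently. A minimal sketch assuming the localhost ports used above; `reload_config` is a hypothetical name. A rejected reload (for example an invalid config file, or Prometheus running without `--web.enable-lifecycle`) shows up as a non-200 status, and `000` means the service could not be reached.

```bash
# Sketch: trigger a config reload and report the HTTP status code.
# reload_config NAME URL, e.g. reload_config alertmanager http://localhost:9093
reload_config() {
  local name="$1" url="$2" code
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST "${url}/-/reload")"
  if [ "$code" = "200" ]; then
    echo "${name}: reloaded"
  else
    echo "${name}: reload failed (HTTP ${code})" >&2
    return 1
  fi
}
```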
### View Current Alerts
```bash
# Alertmanager API: lists current alerts (Alertmanager keeps no long-term alert history)
curl -s http://localhost:9093/api/v2/alerts | jq
```
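For a more compact view than the raw JSON, `jq` can reduce each alert to one line. This filter is a sketch that assumes the v2 response shape (an array of alerts carrying `labels` and a `status.state` field); `summarize_alerts` is a hypothetical name and `jq` must be installed.

```bash
# Sketch: one tab-separated line per alert: severity, alertname, instance, state.
# Reads the JSON array returned by GET /api/v2/alerts on stdin.
summarize_alerts() {
  jq -r '.[] | [.labels.severity // "-", .labels.alertname // "-", .labels.instance // "-", .status.state // "-"] | @tsv'
}

# Usage:
#   curl -s http://localhost:9093/api/v2/alerts | summarize_alerts
```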