Files
homelab-optimized/docs/admin/alerting-setup.md
Gitea Mirror Bot 9fa5b7654e
Some checks failed
Documentation / Deploy to GitHub Pages (push) Has been cancelled
Documentation / Build Docusaurus (push) Has been cancelled
Sanitized mirror from private repository - 2026-04-16 07:18:01 UTC
2026-04-16 07:18:01 +00:00

262 lines
7.6 KiB
Markdown

# 🚨 Alerting & Notification System
**Last Updated**: 2026-01-27
This document describes the homelab alerting stack that provides dual-channel notifications via **ntfy** (mobile push) and **Signal** (encrypted messaging).
---
## Overview
The alerting system monitors your infrastructure and sends notifications through two channels:
| Channel | Use Case | App Required |
|---------|----------|--------------|
| **ntfy** | All alerts (warnings + critical) | ntfy iOS/Android app |
| **Signal** | Critical alerts only | Signal messenger |
### Alert Severity Routing
```
⚠️ Warning alerts → ntfy only
🚨 Critical alerts → ntfy + Signal
✅ Resolved alerts → Both channels (for critical)
```
---
## Architecture
```
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ Prometheus │────▶│ Alertmanager │────▶│ ntfy-bridge │───▶ ntfy app
│ (port 9090) │ │ (port 9093) │ │ (port 5001) │
└─────────────────┘ └────────┬─────────┘ └─────────────────┘
│ (critical only)
┌─────────────────┐ ┌─────────────────┐
│ signal-bridge │────▶│ Signal API │───▶ Signal app
│ (port 5000) │ │ (port 8080) │
└─────────────────┘ └─────────────────┘
```
---
## Components
### 1. Prometheus (Metrics Collection)
- **Location**: Homelab VM
- **Port**: 9090
- **Config**: `~/docker/monitoring/prometheus/prometheus.yml`
- **Alert Rules**: `~/docker/monitoring/prometheus/alert-rules.yml`
### 2. Alertmanager (Alert Routing)
- **Location**: Homelab VM
- **Port**: 9093
- **Config**: `~/docker/monitoring/alerting/alertmanager/alertmanager.yml`
- **Web UI**: http://homelab-vm:9093
### 3. ntfy-bridge (Notification Formatter)
- **Location**: Homelab VM
- **Port**: 5001
- **Purpose**: Formats Alertmanager webhooks into clean ntfy notifications
- **Source**: `~/docker/monitoring/alerting/ntfy-bridge/`
### 4. signal-bridge (Signal Forwarder)
- **Location**: Homelab VM
- **Port**: 5000
- **Purpose**: Forwards critical alerts to Signal via signal-api
- **Source**: `~/docker/monitoring/alerting/signal-bridge/`
---
## Alert Rules Configured
| Alert | Severity | Threshold | Duration | Notification |
|-------|----------|-----------|----------|--------------|
| **HostDown** | 🔴 Critical | Host unreachable | 2 min | ntfy + Signal |
| **HighCPUUsage** | 🟡 Warning | CPU > 80% | 5 min | ntfy only |
| **CriticalCPUUsage** | 🔴 Critical | CPU > 95% | 2 min | ntfy + Signal |
| **HighMemoryUsage** | 🟡 Warning | Memory > 85% | 5 min | ntfy only |
| **CriticalMemoryUsage** | 🔴 Critical | Memory > 95% | 2 min | ntfy + Signal |
| **HighDiskUsage** | 🟡 Warning | Disk > 85% | 5 min | ntfy only |
| **CriticalDiskUsage** | 🔴 Critical | Disk > 95% | 2 min | ntfy + Signal |
| **DiskWillFillIn24Hours** | 🟡 Warning | Predictive | 5 min | ntfy only |
| **HighNetworkErrors** | 🟡 Warning | Errors > 1% | 5 min | ntfy only |
| **ServiceDown** | 🔴 Critical | Container exited | 1 min | ntfy + Signal |
| **ContainerHighCPU** | 🟡 Warning | Container CPU > 80% | 5 min | ntfy only |
| **ContainerHighMemory** | 🟡 Warning | Container Memory > 80% | 5 min | ntfy only |
---
## Configuration Files
### Alertmanager Configuration
```yaml
# ~/docker/monitoring/alerting/alertmanager/alertmanager.yml
global:
resolve_timeout: 5m
route:
group_by: ['alertname', 'severity', 'instance']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receiver: 'ntfy-all'
routes:
- match:
severity: critical
receiver: 'critical-alerts'
- match:
severity: warning
receiver: 'ntfy-all'
receivers:
- name: 'ntfy-all'
webhook_configs:
- url: 'http://ntfy-bridge:5001/alert'
send_resolved: true
- name: 'critical-alerts'
webhook_configs:
- url: 'http://ntfy-bridge:5001/alert'
send_resolved: true
- url: 'http://signal-bridge:5000/alert'
send_resolved: true
```
### Docker Compose (Alerting Stack)
```yaml
# ~/docker/monitoring/alerting/docker-compose.alerting.yml
services:
alertmanager:
image: prom/alertmanager:latest
container_name: alertmanager
ports:
- "9093:9093"
volumes:
- ./alertmanager:/etc/alertmanager
networks:
- monitoring-stack_default
ntfy-bridge:
build: ./ntfy-bridge
container_name: ntfy-bridge
ports:
- "5001:5001"
environment:
- NTFY_URL=http://NTFY:80
- NTFY_TOPIC="REDACTED_NTFY_TOPIC"
networks:
- monitoring-stack_default
- ntfy-stack_default
signal-bridge:
build: ./signal-bridge
container_name: signal-bridge
ports:
- "5000:5000"
environment:
- SIGNAL_API_URL=http://signal-api:8080
- SIGNAL_SENDER=+REDACTED_PHONE_NUMBER
- SIGNAL_RECIPIENTS=+REDACTED_PHONE_NUMBER
networks:
- monitoring-stack_default
- signal-api-stack_default
```
---
## iOS ntfy Configuration
For iOS push notifications to work with self-hosted ntfy, the upstream proxy must be configured:
```yaml
# ~/docker/ntfy/config/server.yml
base-url: "https://ntfy.vish.gg"
upstream-base-url: "https://ntfy.sh"
```
This routes iOS notifications through ntfy.sh's APNs integration while keeping messages on your self-hosted server.
---
## Testing Notifications
### Test ntfy Alert
```bash
curl -X POST http://localhost:5001/alert -H "Content-Type: application/json" -d '{
"alerts": [{
"status": "firing",
"labels": {"alertname": "TestAlert", "severity": "warning", "instance": "test:9100"},
"annotations": {"summary": "Test alert", "description": "This is a test notification"}
}]
}'
```
### Test Signal Alert
```bash
curl -X POST http://localhost:5000/alert -H "Content-Type: application/json" -d '{
"alerts": [{
"status": "firing",
"labels": {"alertname": "TestAlert", "severity": "critical", "instance": "test:9100"},
"annotations": {"summary": "Test alert", "description": "This is a test notification"}
}]
}'
```
### Test Direct ntfy
```bash
curl -H "Title: Test" -d "Hello from homelab!" https://ntfy.vish.gg/REDACTED_NTFY_TOPIC
```
---
## Troubleshooting
### Alerts not firing
1. Check Prometheus targets: http://homelab-vm:9090/targets
2. Check alert rules: http://homelab-vm:9090/alerts
3. Check Alertmanager: http://homelab-vm:9093
### ntfy notifications not received on iOS
1. Verify `upstream-base-url: "https://ntfy.sh"` is set
2. Restart ntfy container: `docker restart NTFY`
3. Re-subscribe in iOS app
### Signal notifications not working
1. Check signal-api is registered: `docker logs signal-api`
2. Verify phone number is linked
3. Test signal-bridge health: `curl http://localhost:5000/health`
---
## Maintenance
### Restart Alerting Stack
```bash
cd ~/docker/monitoring/alerting
docker compose -f docker-compose.alerting.yml restart
```
### Reload Alertmanager Config
```bash
curl -X POST http://localhost:9093/-/reload
```
### Reload Prometheus Config
```bash
curl -X POST http://localhost:9090/-/reload
```
### View Alert History
```bash
# Alertmanager API
curl -s http://localhost:9093/api/v2/alerts | jq
```