# 🚨 Alerting & Notification System

**Last Updated**: 2026-01-27

This document describes the homelab alerting stack that provides dual-channel notifications via **ntfy** (mobile push) and **Signal** (encrypted messaging).

---

## Overview

The alerting system monitors your infrastructure and sends notifications through two channels:

| Channel | Use Case | App Required |
|---------|----------|--------------|
| **ntfy** | All alerts (warnings + critical) | ntfy iOS/Android app |
| **Signal** | Critical alerts only | Signal messenger |

### Alert Severity Routing

```
⚠️ Warning alerts → ntfy only
🚨 Critical alerts → ntfy + Signal
✅ Resolved alerts → Both channels (for critical)
```
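
The routing rules above amount to a lookup on the alert's `severity` label. A minimal sketch in shell, for illustration only: Alertmanager performs the real routing, and `channels_for` is a hypothetical helper name that is not part of the stack. Sending unknown severities to ntfy is an assumption based on the ntfy-only default receiver.

```bash
#!/usr/bin/env bash
# Illustration only: map a severity label to the channels that receive it.
# channels_for is a hypothetical helper; Alertmanager does the real routing.
channels_for() {
  case "$1" in
    critical) echo "ntfy signal" ;;  # firing and resolved go to both channels
    warning)  echo "ntfy" ;;         # warnings never reach Signal
    *)        echo "ntfy" ;;         # assumption: anything else uses the default ntfy receiver
  esac
}

channels_for critical   # prints: ntfy signal
channels_for warning    # prints: ntfy
```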

---

## Architecture

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│   Prometheus    │────▶│   Alertmanager   │────▶│   ntfy-bridge   │───▶ ntfy app
│   (port 9090)   │     │   (port 9093)    │     │   (port 5001)   │
└─────────────────┘     └────────┬─────────┘     └─────────────────┘
                                 │
                                 │ (critical only)
                                 ▼
                        ┌─────────────────┐     ┌─────────────────┐
                        │  signal-bridge  │────▶│   Signal API    │───▶ Signal app
                        │   (port 5000)   │     │   (port 8080)   │
                        └─────────────────┘     └─────────────────┘
```

---

## Components

### 1. Prometheus (Metrics Collection)
- **Location**: Homelab VM
- **Port**: 9090
- **Config**: `~/docker/monitoring/prometheus/prometheus.yml`
- **Alert Rules**: `~/docker/monitoring/prometheus/alert-rules.yml`

### 2. Alertmanager (Alert Routing)
- **Location**: Homelab VM
- **Port**: 9093
- **Config**: `~/docker/monitoring/alerting/alertmanager/alertmanager.yml`
- **Web UI**: http://homelab-vm:9093

### 3. ntfy-bridge (Notification Formatter)
- **Location**: Homelab VM
- **Port**: 5001
- **Purpose**: Formats Alertmanager webhooks into clean ntfy notifications
- **Source**: `~/docker/monitoring/alerting/ntfy-bridge/`

### 4. signal-bridge (Signal Forwarder)
- **Location**: Homelab VM
- **Port**: 5000
- **Purpose**: Forwards critical alerts to Signal via signal-api
- **Source**: `~/docker/monitoring/alerting/signal-bridge/`

---

## Alert Rules Configured

| Alert | Severity | Threshold | Duration | Notification |
|-------|----------|-----------|----------|--------------|
| **HostDown** | 🔴 Critical | Host unreachable | 2 min | ntfy + Signal |
| **HighCPUUsage** | 🟡 Warning | CPU > 80% | 5 min | ntfy only |
| **CriticalCPUUsage** | 🔴 Critical | CPU > 95% | 2 min | ntfy + Signal |
| **HighMemoryUsage** | 🟡 Warning | Memory > 85% | 5 min | ntfy only |
| **CriticalMemoryUsage** | 🔴 Critical | Memory > 95% | 2 min | ntfy + Signal |
| **HighDiskUsage** | 🟡 Warning | Disk > 85% | 5 min | ntfy only |
| **CriticalDiskUsage** | 🔴 Critical | Disk > 95% | 2 min | ntfy + Signal |
| **DiskWillFillIn24Hours** | 🟡 Warning | Predictive | 5 min | ntfy only |
| **HighNetworkErrors** | 🟡 Warning | Errors > 1% | 5 min | ntfy only |
| **ServiceDown** | 🔴 Critical | Container exited | 1 min | ntfy + Signal |
| **ContainerHighCPU** | 🟡 Warning | Container CPU > 80% | 5 min | ntfy only |
| **ContainerHighMemory** | 🟡 Warning | Container Memory > 80% | 5 min | ntfy only |

---

## Configuration Files

### Alertmanager Configuration
```yaml
# ~/docker/monitoring/alerting/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'ntfy-all'

  routes:
    - match:
        severity: critical
      receiver: 'critical-alerts'
    - match:
        severity: warning
      receiver: 'ntfy-all'

receivers:
  - name: 'ntfy-all'
    webhook_configs:
      - url: 'http://ntfy-bridge:5001/alert'
        send_resolved: true

  - name: 'critical-alerts'
    webhook_configs:
      - url: 'http://ntfy-bridge:5001/alert'
        send_resolved: true
      - url: 'http://signal-bridge:5000/alert'
        send_resolved: true
```
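
To confirm that a given label set reaches the intended receiver, `amtool` can evaluate the routing tree without firing a real alert. A sketch, assuming `amtool` is available inside the `alertmanager` container (it ships in the `prom/alertmanager` image) and that the config is visible there as `/etc/alertmanager/alertmanager.yml`. `route_test`, `AMTOOL`, and `AM_CONFIG` are hypothetical names introduced for this example.

```bash
#!/usr/bin/env bash
# Assumption: amtool is run through the alertmanager container.
AMTOOL=${AMTOOL:-"docker exec alertmanager amtool"}
# Assumption: path of the config as seen from inside the container.
AM_CONFIG=${AM_CONFIG:-/etc/alertmanager/alertmanager.yml}

# Print the receiver(s) that the given labels would be routed to.
# Usage: route_test severity=critical
route_test() {
  # $AMTOOL is intentionally unquoted so "docker exec ..." splits into words.
  $AMTOOL config routes test --config.file="$AM_CONFIG" "$@"
}
```

With the configuration above, `route_test severity=critical` should report `critical-alerts`, and `route_test severity=warning` should report `ntfy-all`.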

### Docker Compose (Alerting Stack)
```yaml
# ~/docker/monitoring/alerting/docker-compose.alerting.yml

services:
  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager:/etc/alertmanager
    networks:
      - monitoring-stack_default

  ntfy-bridge:
    build: ./ntfy-bridge
    container_name: ntfy-bridge
    ports:
      - "5001:5001"
    environment:
      - NTFY_URL=http://NTFY:80
      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
    networks:
      - monitoring-stack_default
      - ntfy-stack_default

  signal-bridge:
    build: ./signal-bridge
    container_name: signal-bridge
    ports:
      - "5000:5000"
    environment:
      - SIGNAL_API_URL=http://signal-api:8080
      - SIGNAL_SENDER=+REDACTED_PHONE_NUMBER
      - SIGNAL_RECIPIENTS=+REDACTED_PHONE_NUMBER
    networks:
      - monitoring-stack_default
      - signal-api-stack_default
```

---

## iOS ntfy Configuration

For iOS push notifications to work with a self-hosted ntfy server, the server must be configured with an upstream ntfy server:

```yaml
# ~/docker/ntfy/config/server.yml

base-url: "https://ntfy.vish.gg"
upstream-base-url: "https://ntfy.sh"
```

With this setting, the self-hosted server asks ntfy.sh to trigger the push through its APNs integration, while the message content stays on your self-hosted server.
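
A quick way to confirm the setting is present and not commented out is to grep the config file. A minimal sketch; `has_upstream` is a hypothetical helper name, and the check only looks at the file, not at what the running container has loaded.

```bash
#!/usr/bin/env bash
# Sketch: succeed if a ntfy server.yml sets upstream-base-url.
# has_upstream is a hypothetical helper name.
has_upstream() {
  local config=$1
  # Match an uncommented "upstream-base-url:" key that has a value.
  grep -Eq '^[[:space:]]*upstream-base-url:[[:space:]]*[^[:space:]#]' "$config"
}

# Usage, with the path from this document:
#   has_upstream ~/docker/ntfy/config/server.yml && echo "upstream configured"
```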

---

## Testing Notifications

### Test ntfy Alert
```bash
curl -X POST http://localhost:5001/alert -H "Content-Type: application/json" -d '{
  "alerts": [{
    "status": "firing",
    "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "test:9100"},
    "annotations": {"summary": "Test alert", "description": "This is a test notification"}
  }]
}'
```

### Test Signal Alert
```bash
curl -X POST http://localhost:5000/alert -H "Content-Type: application/json" -d '{
  "alerts": [{
    "status": "firing",
    "labels": {"alertname": "TestAlert", "severity": "critical", "instance": "test:9100"},
    "annotations": {"summary": "Test alert", "description": "This is a test notification"}
  }]
}'
```
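
Both receivers set `send_resolved: true`, so the resolved path is worth testing as well as the firing one. A sketch that builds the same payload shape as the commands above with a configurable status and severity. `alert_payload` is a hypothetical helper, and how the bridges render a resolved alert is not documented here.

```bash
#!/usr/bin/env bash
# Build a minimal Alertmanager-style webhook body on stdout.
# Usage: alert_payload <firing|resolved> <warning|critical>
alert_payload() {
  local status=${1:-firing} severity=${2:-warning}
  printf '{"alerts": [{"status": "%s", "labels": {"alertname": "TestAlert", "severity": "%s", "instance": "test:9100"}, "annotations": {"summary": "Test alert", "description": "This is a test notification"}}]}\n' \
    "$status" "$severity"
}
```

Piping the output into curl sends it to a bridge, for example `alert_payload resolved critical | curl -X POST http://localhost:5000/alert -H "Content-Type: application/json" -d @-`.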

### Test Direct ntfy
```bash
curl -H "Title: Test" -d "Hello from homelab!" https://ntfy.vish.gg/REDACTED_NTFY_TOPIC
```

---

## Troubleshooting

### Alerts not firing
1. Check Prometheus targets: http://homelab-vm:9090/targets
2. Check alert rules: http://homelab-vm:9090/alerts
3. Check Alertmanager: http://homelab-vm:9093
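
The first two checks can also be run from a shell through the Prometheus HTTP API. A sketch, assuming `curl` is installed and Prometheus is reachable at `http://homelab-vm:9090`; `prom_query` and `PROM_URL` are hypothetical names. `up` and `ALERTS` are series that Prometheus generates itself.

```bash
#!/usr/bin/env bash
# Assumption: Prometheus base URL, taken from the links above.
PROM_URL=${PROM_URL:-http://homelab-vm:9090}

# Run an instant PromQL query and print the raw JSON response.
prom_query() {
  curl -fsS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Usage:
#   prom_query 'up == 0'                       # scrape targets that are down
#   prom_query 'ALERTS{alertstate="firing"}'   # alerts Prometheus considers firing
```

If the second query returns a firing alert but no notification arrives, the problem is more likely in Alertmanager or the bridges than in the rules.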

### ntfy notifications not received on iOS
1. Verify `upstream-base-url: "https://ntfy.sh"` is set
2. Restart ntfy container: `docker restart NTFY`
3. Re-subscribe in iOS app

### Signal notifications not working
1. Check signal-api is registered: `docker logs signal-api`
2. Verify phone number is linked
3. Test signal-bridge health: `curl http://localhost:5000/health`

---

## Maintenance

### Restart Alerting Stack
```bash
cd ~/docker/monitoring/alerting
docker compose -f docker-compose.alerting.yml restart
```

### Reload Alertmanager Config
```bash
curl -X POST http://localhost:9093/-/reload
```

### Reload Prometheus Config
```bash
# Requires Prometheus to be started with --web.enable-lifecycle
curl -X POST http://localhost:9090/-/reload
```
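
A reload with a malformed file is not applied, so validating first gives a clearer error than reading the logs afterwards. A sketch using `amtool check-config` and `promtool check config`, both of which ship in the official images. Assumptions not stated in this document: the Prometheus container is named `prometheus` and keeps its config at `/etc/prometheus/prometheus.yml` (the image default). `DOCKER`, `PROM_CONTAINER`, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
DOCKER=${DOCKER:-docker}
PROM_CONTAINER=${PROM_CONTAINER:-prometheus}   # assumption: container name

# Validate the Alertmanager config inside its container.
check_alertmanager() {
  $DOCKER exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml
}

# Validate the Prometheus config and the rule files it references.
check_prometheus() {
  # Assumption: config lives at the image's default path.
  $DOCKER exec "$PROM_CONTAINER" promtool check config /etc/prometheus/prometheus.yml
}

# Reload each service only if its config validates.
safe_reload() {
  check_alertmanager && curl -fsS -X POST http://localhost:9093/-/reload
  check_prometheus && curl -fsS -X POST http://localhost:9090/-/reload
}
```

Running `safe_reload` after editing either file skips the reload for whichever service fails validation.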

### View Alert History
```bash
# Alertmanager API: alerts currently held by Alertmanager (not a long-term history)
curl -s http://localhost:9093/api/v2/alerts | jq
```