Sanitized mirror from private repository - 2026-03-21 07:31:47 UTC
docs/NTFY_NOTIFICATION_SYSTEM.md
# NTFY Notification System Documentation

## Overview

The homelab's notification system is built around NTFY, a simple HTTP-based pub-sub notification service. Bridges connect it to Alertmanager, Signal, and Gitea, so alerts and repository events reach each notification channel.

## Architecture

### Core Components

1. **NTFY Server** - Main notification hub
2. **NTFY Bridge** - Connects Alertmanager to NTFY
3. **Signal Bridge** - Forwards NTFY notifications to Signal messenger
4. **Gitea NTFY Bridge** - Sends Git repository events to NTFY

### Container Stack

All notification components are deployed via Docker Compose in the alerting stack:

```yaml
# Location: /home/homelab/docker/monitoring/homelab_vm/alerting.yaml
services:
  ntfy:
    image: binwiederhier/ntfy:latest
    container_name: ntfy
    command: serve
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/ntfy:/var/lib/ntfy
    ports:
      - "8080:80"
    environment:
      - NTFY_BASE_URL=http://homelab.vish.local:8080
      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
      - NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
    restart: unless-stopped
    networks:
      - alerting

  ntfy-bridge:
    image: xenrox/ntfy-alertmanager:latest
    container_name: ntfy-bridge
    environment:
      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
      - NTFY_URL=http://ntfy:80
      - NTFY_USER=
      - NTFY_PASSWORD="REDACTED_PASSWORD"
    ports:
      - "8081:8080"
    restart: unless-stopped
    networks:
      - alerting

  signal-bridge:
    image: bbernhard/signal-cli-rest-api:latest
    container_name: signal-bridge
    ports:
      - "8082:8080"
    environment:
      - MODE=json-rpc
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/signal-data:/home/.local/share/signal-cli
    restart: unless-stopped
    networks:
      - alerting
```

## Configuration Files

### NTFY Server Configuration

**Location**: `/home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml`

```yaml
# Basic server configuration
base-url: "http://homelab.vish.local:8080"
listen-http: ":80"
cache-file: "/var/lib/ntfy/cache.db"
auth-file: "/var/lib/ntfy/auth.db"
attachment-cache-dir: "/var/lib/ntfy/attachments"

# Authentication and access control
auth-default-access: "deny-all"
enable-signup: false
enable-login: true

# Rate limiting
visitor-request-limit-burst: 60
visitor-request-limit-replenish: "5s"

# Message limits
message-limit: 4096
attachment-file-size-limit: "15M"
attachment-total-size-limit: "100M"

# Retention
cache-duration: "12h"
keepalive-interval: "45s"
manager-interval: "1m"

# Topics and subscriptions
topics:
  - name: "alerts"
    description: "System alerts from Prometheus/Alertmanager"
  - name: "gitea"
    description: "Git repository notifications"
  - name: "monitoring"
    description: "Infrastructure monitoring alerts"
```

### Alertmanager Integration

**Location**: `/home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml`

```yaml
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@homelab.local'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://ntfy-bridge:8080/alerts'
        send_resolved: true
        http_config:
          basic_auth:
            username: ''
            password: ''

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```

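
To exercise the `web.hook` route without waiting for a real failure, a synthetic alert can be posted directly to Alertmanager. The following is a minimal sketch, assuming Alertmanager is reachable at `http://localhost:9093` (the address used by the reload and health commands in this document) and serves its v2 API there. The function names, the `AM_URL` variable, and the `TestAlert` label values are hypothetical.

```bash
# Sketch: build and post a synthetic alert to Alertmanager's v2 API.
# Assumption: label values contain no double quotes or backslashes,
# so they can be embedded in JSON without escaping.

AM_URL="${AM_URL:-http://localhost:9093}"

# Print a one-element JSON array in the shape that /api/v2/alerts accepts.
am_alert_payload() {
  local alertname="$1" severity="$2" instance="$3"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"Synthetic alert for route testing"}}]\n' \
    "$alertname" "$severity" "$instance"
}

# Post the payload; Alertmanager then routes it like any other alert.
am_send_test_alert() {
  am_alert_payload "$@" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$AM_URL/api/v2/alerts"
}
```

Calling `am_send_test_alert TestAlert warning homelab-test` should produce a notification through `ntfy-bridge` shortly after the configured `group_wait` of 10 seconds. Because the payload sets no end time, Alertmanager is expected to mark the alert resolved on its own after its resolve timeout.
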
### Prometheus Alert Rules

**Location**: `/home/homelab/docker/monitoring/alerting/alert-rules.yml`

Key alert rules that trigger NTFY notifications:

```yaml
groups:
  - name: system.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 2 minutes."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90% for more than 2 minutes."

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 10% on root filesystem."
```

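
Before relying on rules like these, it can help to syntax-check the file and to evaluate an expression ad hoc against live data. The sketch below assumes `promtool` is installed on the host and that Prometheus answers on `http://localhost:9090`. The function names and the `RULES_FILE`, `PROM_URL`, and `DRY_RUN` variables are hypothetical helpers, not part of the existing setup.

```bash
# Sketch: validate the rules file and run an instant query against Prometheus.

RULES_FILE="${RULES_FILE:-/home/homelab/docker/monitoring/alerting/alert-rules.yml}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Syntax-check the rules file with promtool, if it is available.
check_rules() {
  command -v promtool >/dev/null 2>&1 || { echo "promtool not found" >&2; return 127; }
  promtool check rules "$RULES_FILE"
}

# Evaluate a PromQL expression through the HTTP API (/api/v1/query).
# With DRY_RUN set, print the curl command instead of running it.
prom_query() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "curl -fsS -G --data-urlencode query=$1 $PROM_URL/api/v1/query"
  else
    curl -fsS -G --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
  fi
}
```

For example, `prom_query 'up == 0'` returns the series that would currently satisfy the `InstanceDown` expression. An empty result set means the rule has nothing to fire on.
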
## Notification Channels

### 1. NTFY Web Interface
- **URL**: http://homelab.vish.local:8080
- **Topics**:
  - `alerts` - System monitoring alerts
  - `gitea` - Git repository events
  - `monitoring` - Infrastructure status

### 2. Signal Messenger Integration
- **Bridge Container**: signal-bridge
- **Port**: 8082 on the host (8080 inside the container)
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/signal-data/`

### 3. Gitea Integration
- **Bridge Container**: gitea-ntfy-bridge
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/gitea-ntfy-bridge/`

## Current Monitoring Targets

The Prometheus instance monitors the following nodes:

```yaml
# From /home/homelab/docker/monitoring/prometheus/prometheus.yml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "alertmanager"
    static_configs:
      - targets: ["alertmanager:9093"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: "calypso-node"
    static_configs:
      - targets: ["100.75.252.64:9100"]

  - job_name: "seattle-node"
    static_configs:
      - targets: ["100.82.197.124:9100"]

  - job_name: "proxmox-node"
    static_configs:
      - targets: ["100.87.12.28:9100"]
```

## How to Modify Notifications

### 1. Adding New Alert Rules

Edit the alert rules file:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alert-rules.yml
```

Example new rule:
```yaml
- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "The service {{ $labels.job }} on {{ $labels.instance }} has been down for more than 30 seconds."
```

### 2. Modifying Notification Routing

Edit Alertmanager configuration:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml
```

### 3. Adding New NTFY Topics

Edit NTFY server configuration:
```bash
sudo nano /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml
```

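
ntfy normally creates a topic the first time something publishes or subscribes to it, so with `auth-default-access: "deny-all"` a new topic is usable only once a user has been granted access to it. The sketch below wraps the `ntfy user` and `ntfy access` commands, assuming they are run inside the `ntfy` container from the compose file. The helper names, the `NTFY_CONTAINER` and `DRY_RUN` variables, and the user and topic names in the usage note are hypothetical.

```bash
# Sketch: manage ntfy users and topic permissions inside the container.

NTFY_CONTAINER="${NTFY_CONTAINER:-ntfy}"

# Run an ntfy CLI command in the container.
# With DRY_RUN set, print the docker command instead of running it.
ntfy_cli() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "docker exec -it $NTFY_CONTAINER ntfy $*"
  else
    docker exec -it "$NTFY_CONTAINER" ntfy "$@"
  fi
}

# Grant USER a permission on TOPIC (rw, ro, wo, or deny; defaults to rw).
ntfy_grant() {
  ntfy_cli access "$1" "$2" "${3:-rw}"
}
```

A typical sequence would be `ntfy_cli user add alertbot`, which prompts for a password, followed by `ntfy_grant alertbot backups wo` to let that user publish to a `backups` topic without being able to read it.
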
### 4. Changing Notification Thresholds

Modify the alert expressions in `alert-rules.yml`. Common patterns:

- **CPU Usage**: `expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > THRESHOLD`
- **Memory Usage**: `expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > THRESHOLD`
- **Disk Usage**: `expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < THRESHOLD`

### 5. Reloading Configuration

After making changes:

```bash
# Reload Prometheus configuration
# (this endpoint only works if Prometheus was started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Reload Alertmanager configuration
curl -X POST http://localhost:9093/-/reload

# Restart NTFY if server config changed
cd /home/homelab/docker/monitoring
docker compose -f homelab_vm/alerting.yaml restart ntfy
```

## Testing Notifications

### Manual Test via NTFY API

```bash
# With auth-default-access set to deny-all, unauthenticated requests are rejected
# unless the topic allows anonymous writes; add -u USER:PASSWORD if needed.

# Send test notification
curl -d "Test notification from homelab" http://homelab.vish.local:8080/alerts

# Send with priority and tags
curl -H "Priority: urgent" -H "Tags: warning,test" -d "High priority test" http://homelab.vish.local:8080/alerts
```

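
For repeated tests, the curl calls above can be wrapped in a small helper. This is a sketch under stated assumptions: the function name and the `NTFY_URL`, `NTFY_USER`, `NTFY_PASS`, and `DRY_RUN` variables are hypothetical, and the account used must already have write access to the topic.

```bash
# Sketch: publish a message to an ntfy topic with optional priority, tags, and credentials.

NTFY_URL="${NTFY_URL:-http://homelab.vish.local:8080}"

# Usage: ntfy_publish TOPIC MESSAGE [PRIORITY] [TAGS]
# With DRY_RUN set, print the curl command (including any credentials) instead of running it.
ntfy_publish() {
  local topic="$1" msg="$2" prio="${3:-default}" tags="${4:-}"
  # Rebuild the positional parameters as the curl argument list.
  set -- -fsS -H "Priority: $prio"
  [ -n "$tags" ] && set -- "$@" -H "Tags: $tags"
  [ -n "${NTFY_USER:-}" ] && set -- "$@" -u "$NTFY_USER:${NTFY_PASS:-}"
  set -- "$@" -d "$msg" "$NTFY_URL/$topic"
  if [ -n "${DRY_RUN:-}" ]; then
    printf 'curl'; printf ' %s' "$@"; printf '\n'
  else
    curl "$@"
  fi
}
```

For example, `NTFY_USER=alice NTFY_PASS=secret ntfy_publish alerts "Disk check" high warning,test` sends an authenticated high-priority message. The user name and password here are placeholders.
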
### Test Alert Rules

```bash
# Trigger a test alert by stopping a service temporarily
sudo systemctl stop node_exporter
# Wait for InstanceDown to fire (its `for: 1m` plus scrape and evaluation delay), then restart
sudo systemctl start node_exporter
```

### Verify Alert Flow

1. **Prometheus** scrapes metrics and evaluates rules
2. **Alertmanager** receives alerts and routes them
3. **NTFY Bridge** converts alerts to NTFY messages
4. **NTFY Server** publishes to subscribed topics
5. **Signal Bridge** forwards to Signal messenger (if configured)

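
The first hops of this flow can be probed in order with the health endpoints used elsewhere in this document. The sketch below is one way to do that; the function names and the overridable `PROBE` variable are hypothetical, and the bridges are left out because this document does not list health endpoints for them.

```bash
# Sketch: probe each hop's health endpoint and report the first failures.

# Command used to probe a URL; overridable, for example in tests.
PROBE="${PROBE:-curl -fsS -o /dev/null --max-time 5}"

# Probe one URL and print a one-line result.
check_hop() {
  local name="$1" url="$2"
  if $PROBE "$url" 2>/dev/null; then
    printf '%s: ok\n' "$name"
  else
    printf '%s: FAILED (%s)\n' "$name" "$url"
    return 1
  fi
}

# Probe Prometheus, Alertmanager, and ntfy; return non-zero if any hop fails.
check_alert_chain() {
  local failed=0
  check_hop prometheus http://localhost:9090/-/healthy || failed=1
  check_hop alertmanager http://localhost:9093/-/healthy || failed=1
  check_hop ntfy http://homelab.vish.local:8080/v1/health || failed=1
  return "$failed"
}
```

Running `check_alert_chain` prints one line per service, which narrows down where in the chain a missing notification was lost.
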
## Troubleshooting

### Common Issues

1. **Alerts not firing**: Check that the Prometheus targets are up and that the rule's `for` duration has elapsed
2. **Notifications not received**: Verify that Alertmanager can reach `ntfy-bridge` and that the bridge can reach NTFY at `http://ntfy:80`
3. **Signal not working**: Check that a Signal account is registered with the Signal bridge

### Useful Commands

```bash
# Check container status
docker ps | grep -E "(ntfy|alert|signal)"

# View logs
docker logs ntfy
docker logs ntfy-bridge
docker logs alertmanager

# Test connectivity
curl http://homelab.vish.local:8080/v1/health
curl http://localhost:9093/-/healthy
curl http://localhost:9090/-/healthy
```

### Log Locations

- **NTFY**: `docker logs ntfy`
- **Alertmanager**: `docker logs alertmanager`
- **Prometheus**: `docker logs prometheus`
- **NTFY Bridge**: `docker logs ntfy-bridge`

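## Security Considerations

1. **Authentication**: The NTFY server uses an auth database (`auth.db`), and sign-up is disabled (`enable-signup: false`)
2. **Network**: Services talk to each other over the internal `alerting` Docker network; ports 8080 (ntfy), 8081 (ntfy-bridge), and 8082 (signal-bridge) are also published on the host
3. **Access Control**: Default access is `deny-all`, so each topic needs an explicit grant
4. **Rate Limiting**: Each visitor gets a burst of 60 requests, replenished at one request every 5 seconds
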
## Backup and Recovery

### Important Files to Backup

- `/home/homelab/docker/monitoring/homelab_vm/ntfy/` - NTFY data
- `/home/homelab/docker/monitoring/alerting/` - Alert configurations
- `/home/homelab/docker/monitoring/prometheus/` - Prometheus config

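
The three directories above can be captured in a single timestamped archive. The following is a minimal sketch, not an existing script in this setup: the function name, the `BASE` variable, and the archive naming are hypothetical. Since the NTFY directory holds SQLite databases, stopping the `ntfy` container first is the cautious choice if a consistent copy matters.

```bash
# Sketch: archive the notification-related directories into one tar.gz file.

BASE="${BASE:-/home/homelab/docker/monitoring}"

# Usage: backup_notifications DEST_DIR
# Creates DEST_DIR if needed and prints the path of the new archive.
backup_notifications() {
  local dest_dir="$1" stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$dest_dir/notification-backup-$stamp.tar.gz"
  mkdir -p "$dest_dir" || return 1
  # -C keeps paths in the archive relative to the monitoring directory.
  tar -czf "$archive" -C "$BASE" homelab_vm/ntfy alerting prometheus || return 1
  printf '%s\n' "$archive"
}
```

Restoring is the reverse: extract the archive with `tar -xzf ARCHIVE -C /home/homelab/docker/monitoring`, then follow the recovery steps.
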
### Recovery Process

1. Restore configuration files
2. Restart containers from `/home/homelab/docker/monitoring`: `docker compose -f homelab_vm/alerting.yaml up -d`
3. Verify all services are healthy
4. Test notification flow

## Maintenance

### Regular Tasks

1. **Weekly**: Check alert rule effectiveness
2. **Monthly**: Review notification volumes
3. **Quarterly**: Update container images
4. **Annually**: Review and update alert thresholds

### Monitoring the Monitoring

- Monitor NTFY server uptime
- Track alert volume and patterns
- Verify notification delivery
- Check for false positives/negatives

---

**Last Updated**: February 15, 2026
**Maintainer**: Homelab Administrator
**Version**: 1.0