# NTFY Notification System Documentation
## Overview
The homelab routes alerts and repository events through NTFY (a simple HTTP-based pub-sub notification service), with bridges that connect it to Alertmanager, Signal, and Gitea.
## Architecture
### Core Components
1. **NTFY Server** - Main notification hub
2. **NTFY Bridge** - Connects Alertmanager to NTFY
3. **Signal Bridge** - Forwards NTFY notifications to Signal messenger
4. **Gitea NTFY Bridge** - Sends Git repository events to NTFY
### Container Stack
All notification components are deployed via Docker Compose in the alerting stack:
```yaml
# Location: /home/homelab/docker/monitoring/homelab_vm/alerting.yaml
services:
  ntfy:
    image: binwiederhier/ntfy:latest
    container_name: ntfy
    command: serve
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/ntfy:/var/lib/ntfy
    ports:
      - "8080:80"
    environment:
      - NTFY_BASE_URL=http://homelab.vish.local:8080
      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
      - NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
    restart: unless-stopped
    networks:
      - alerting
  ntfy-bridge:
    image: xenrox/ntfy-alertmanager:latest
    container_name: ntfy-bridge
    environment:
      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
      - NTFY_URL=http://ntfy:80
      - NTFY_USER=
      - NTFY_PASSWORD="REDACTED_PASSWORD"
    ports:
      - "8081:8080"
    restart: unless-stopped
    networks:
      - alerting
  signal-bridge:
    image: bbernhard/signal-cli-rest-api:latest
    container_name: signal-bridge
    ports:
      - "8082:8080"
    environment:
      - MODE=json-rpc
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/signal-data:/home/.local/share/signal-cli
    restart: unless-stopped
    networks:
      - alerting
```
## Configuration Files
### NTFY Server Configuration
**Location**: `/home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml`
```yaml
# Basic server configuration
base-url: "http://homelab.vish.local:8080"
listen-http: ":80"
cache-file: "/var/lib/ntfy/cache.db"
auth-file: "/var/lib/ntfy/auth.db"
attachment-cache-dir: "/var/lib/ntfy/attachments"
# Authentication and access control
auth-default-access: "deny-all"
enable-signup: false
enable-login: true
# Rate limiting
visitor-request-limit-burst: 60
visitor-request-limit-replenish: "5s"
# Message limits
message-limit: 4096
attachment-file-size-limit: "15M"
attachment-total-size-limit: "100M"
# Retention
cache-duration: "12h"
keepalive-interval: "45s"
manager-interval: "1m"
# Topics and subscriptions
topics:
  - name: "alerts"
    description: "System alerts from Prometheus/Alertmanager"
  - name: "gitea"
    description: "Git repository notifications"
  - name: "monitoring"
    description: "Infrastructure monitoring alerts"
```
### Alertmanager Integration
**Location**: `/home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml`
```yaml
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@homelab.local'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://ntfy-bridge:8080/alerts'
        send_resolved: true
        http_config:
          basic_auth:
            username: ''
            password: ''
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```
### Prometheus Alert Rules
**Location**: `/home/homelab/docker/monitoring/alerting/alert-rules.yml`
Key alert rules that trigger NTFY notifications:
```yaml
groups:
  - name: system.rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 2 minutes."
      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 90% for more than 2 minutes."
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 10% on root filesystem."
```
## Notification Channels
### 1. NTFY Web Interface
- **URL**: http://homelab.vish.local:8080
- **Topics**:
- `alerts` - System monitoring alerts
- `gitea` - Git repository events
- `monitoring` - Infrastructure status
### 2. Signal Messenger Integration
- **Bridge Container**: signal-bridge
- **Port**: 8082
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/signal-data/`
### 3. Gitea Integration
- **Bridge Container**: gitea-ntfy-bridge
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/gitea-ntfy-bridge/`
## Current Monitoring Targets
The Prometheus instance scrapes the following targets (its own services plus four node exporters):
```yaml
# From /home/homelab/docker/monitoring/prometheus/prometheus.yml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "alertmanager"
    static_configs:
      - targets: ["alertmanager:9093"]
  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]
  - job_name: "calypso-node"
    static_configs:
      - targets: ["100.75.252.64:9100"]
  - job_name: "seattle-node"
    static_configs:
      - targets: ["100.82.197.124:9100"]
  - job_name: "proxmox-node"
    static_configs:
      - targets: ["100.87.12.28:9100"]
```
## How to Modify Notifications
### 1. Adding New Alert Rules
Edit the alert rules file:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alert-rules.yml
```
Example new rule:
```yaml
- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "The service {{ $labels.job }} on {{ $labels.instance }} has been down for more than 30 seconds."
```
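Before reloading, the edited file can be checked for syntax errors with `promtool`, which ships in the official Prometheus image. A minimal sketch, assuming the container is named `prometheus` (as in the log commands in this document) and that the rules file is mounted at `/etc/prometheus/alert-rules.yml` inside it; the in-container path and the function name are assumptions, so adjust them to the actual compose file:

```bash
# Validate a Prometheus rules file inside the running container.
# RULES_PATH is an assumed in-container path, not taken from the compose file.
check_alert_rules() {
  local container="${PROM_CONTAINER:-prometheus}"
  local rules_path="${RULES_PATH:-/etc/prometheus/alert-rules.yml}"
  docker exec "$container" promtool check rules "$rules_path"
}
```

Run `check_alert_rules` after saving; a non-zero exit status means the file did not validate and should not be reloaded yet.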
### 2. Modifying Notification Routing
Edit Alertmanager configuration:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml
```
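A routing change can be validated before it is reloaded with `amtool check-config`, which is bundled in the official Alertmanager image. A minimal sketch, assuming the container is named `alertmanager` (as in the log commands in this document) and the configuration is mounted at `/etc/alertmanager/alertmanager.yml` inside it; the in-container path and the function name are assumptions:

```bash
# Validate the Alertmanager configuration inside the running container.
# AM_CONFIG_PATH is an assumed in-container path, not taken from the compose file.
check_alertmanager_config() {
  local container="${AM_CONTAINER:-alertmanager}"
  local config_path="${AM_CONFIG_PATH:-/etc/alertmanager/alertmanager.yml}"
  docker exec "$container" amtool check-config "$config_path"
}
```

If the check fails, fix the reported error before reloading so the running configuration is not affected by a broken file.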
### 3. Adding New NTFY Topics
Edit NTFY server configuration:
```bash
sudo nano /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml
```
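ntfy creates a topic the first time something publishes or subscribes to it, so with `auth-default-access: "deny-all"` a new topic mainly needs an access rule for whoever will use it. A minimal sketch using ntfy's `access` subcommand inside the `ntfy` container; it relies on `NTFY_AUTH_FILE` being set in the container as shown in the compose file, and the function name is illustrative:

```bash
# Grant a user permission on a topic via the ntfy CLI in the container.
# permission is one of: read-write, read-only, write-only, deny
grant_topic_access() {
  local user="$1" topic="$2" permission="${3:-read-write}"
  docker exec ntfy ntfy access "$user" "$topic" "$permission"
}
```

For example, `grant_topic_access alertbot backups write-only` would let a hypothetical user `alertbot` publish to a hypothetical `backups` topic without being able to read it. The user has to exist first (`ntfy user add`).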
### 4. Changing Notification Thresholds
Modify the alert expressions in `alert-rules.yml`. Common patterns:
- **CPU Usage**: `expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > THRESHOLD`
- **Memory Usage**: `expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > THRESHOLD`
- **Disk Space (free)**: `expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < THRESHOLD`
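
Before editing a threshold, it can help to run the candidate expression against the Prometheus HTTP API and see which instances would match right now. A minimal sketch, assuming Prometheus is reachable at `http://localhost:9090` as in the reload commands in this document; the function names and the `PROMETHEUS_URL` variable are illustrative:

```bash
# Render the CPU pattern with a concrete threshold.
cpu_usage_expr() {
  printf '100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > %s\n' "$1"
}

# Run an expression as an instant query; series in the result currently match it.
preview_expr() {
  curl -fsS -G --data-urlencode "query=$1" \
    "${PROMETHEUS_URL:-http://localhost:9090}/api/v1/query"
}
```

`preview_expr "$(cpu_usage_expr 85)"` returns JSON; an empty `result` array means no instance is above the candidate threshold at the moment. This only checks the current value and does not account for the rule's `for:` duration.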
### 5. Reloading Configuration
After making changes:
```bash
# Reload Prometheus configuration (endpoint requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
# Reload Alertmanager configuration
curl -X POST http://localhost:9093/-/reload
# Restart NTFY if server config changed
cd /home/homelab/docker/monitoring
docker compose -f homelab_vm/alerting.yaml restart ntfy
```
## Testing Notifications
### Manual Test via NTFY API
```bash
# Send test notification
curl -d "Test notification from homelab" http://homelab.vish.local:8080/alerts
# Send with priority and tags
curl -H "Priority: urgent" -H "Tags: warning,test" -d "High priority test" http://homelab.vish.local:8080/alerts
```
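Because `auth-default-access` is set to `deny-all`, an anonymous publish is likely to be rejected unless anonymous write access has been granted on the topic. A minimal sketch of an authenticated publish using HTTP basic auth, which ntfy accepts; `NTFY_USER`, `NTFY_PASS`, and `NTFY_ENDPOINT` are hypothetical environment variables, and the function name is illustrative:

```bash
# Publish a message to a topic with basic auth.
# NTFY_USER / NTFY_PASS are hypothetical; set them to a user with write access to the topic.
ntfy_publish() {
  local topic="$1" message="$2" priority="${3:-default}"
  local base="${NTFY_ENDPOINT:-http://homelab.vish.local:8080}"
  curl -fsS -u "${NTFY_USER}:${NTFY_PASS}" \
    -H "Priority: ${priority}" \
    -d "$message" \
    "${base}/${topic}"
}
```

For example, `ntfy_publish alerts "Authenticated test" high`. The `-f` flag makes `curl` exit non-zero on an HTTP error, so a permission problem shows up as a failed command rather than a silent no-op.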
### Test Alert Rules
```bash
# Trigger a test alert by stopping a service temporarily
sudo systemctl stop node_exporter
# Wait for alert to fire, then restart
sudo systemctl start node_exporter
```
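Stopping `node_exporter` exercises the whole path but briefly blinds monitoring for that host. A less disruptive alternative is to post a synthetic alert straight to Alertmanager's v2 API, which skips Prometheus and tests only routing and delivery. A minimal sketch, assuming Alertmanager is reachable at `http://localhost:9093` as in the health check in this document; the alert name `NotificationPipelineTest`, the `manual-test` instance label, and the function names are hypothetical:

```bash
# Build a one-element alert array in the shape accepted by POST /api/v2/alerts.
# "NotificationPipelineTest" is a hypothetical alert name used only for this test.
build_test_alert() {
  local severity="${1:-warning}"
  printf '[{"labels":{"alertname":"NotificationPipelineTest","severity":"%s","instance":"manual-test"},"annotations":{"summary":"Manual test alert","description":"Posted by hand to exercise the Alertmanager to NTFY route."}}]\n' "$severity"
}

# Send the payload to Alertmanager; ALERTMANAGER_URL is an optional override.
send_test_alert() {
  local am_url="${ALERTMANAGER_URL:-http://localhost:9093}"
  build_test_alert "$1" | curl -fsS -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- "${am_url}/api/v2/alerts"
}
```

With the routing shown earlier, `send_test_alert critical` should produce a notification after roughly the 10 second `group_wait`. The payload sets no end time, so Alertmanager should treat the alert as resolved on its own once its resolve timeout passes.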
### Verify Alert Flow
1. **Prometheus** scrapes metrics and evaluates rules
2. **Alertmanager** receives alerts and routes them
3. **NTFY Bridge** converts alerts to NTFY messages
4. **NTFY Server** publishes to subscribed topics
5. **Signal Bridge** forwards to Signal messenger (if configured)
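
The first four stages can be inspected one hop at a time to find where an alert stops. A minimal sketch, assuming the host ports used elsewhere in this document; `NTFY_USER` and `NTFY_PASS` are hypothetical credentials with read access to the topic, and the function name is illustrative:

```bash
# Print what each stage currently holds, in pipeline order.
# NTFY_USER / NTFY_PASS are hypothetical credentials with read access to the topic.
check_alert_flow() {
  local topic="${1:-alerts}"
  echo "== Prometheus: alerts pending or firing =="
  curl -fsS "http://localhost:9090/api/v1/alerts"; echo
  echo "== Alertmanager: alerts received =="
  curl -fsS "http://localhost:9093/api/v2/alerts"; echo
  echo "== NTFY: messages cached on '${topic}' in the last 10 minutes =="
  curl -fsS -u "${NTFY_USER}:${NTFY_PASS}" \
    "http://homelab.vish.local:8080/${topic}/json?poll=1&since=10m"; echo
}
```

An alert that appears in Prometheus but not in Alertmanager points at the Prometheus alerting configuration; one that appears in Alertmanager but not in NTFY points at the bridge. The Signal hop is not covered here and still has to be confirmed on the receiving device.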
## Troubleshooting
### Common Issues
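1. **Alerts not firing**: Check that the Prometheus targets are up (http://localhost:9090/targets) and that the rule's `for:` duration has elapsed
2. **Notifications not received**: Check `docker logs ntfy-bridge` and confirm the bridge can reach `http://ntfy:80`
3. **Signal not working**: Confirm that a number is registered in the `signal-bridge` container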
### Useful Commands
```bash
# Check container status
docker ps | grep -E "(ntfy|alert|signal)"
# View logs
docker logs ntfy
docker logs ntfy-bridge
docker logs alertmanager
# Test connectivity
curl http://homelab.vish.local:8080/v1/health
curl http://localhost:9093/-/healthy
curl http://localhost:9090/-/healthy
```
### Log Locations
- **NTFY**: `docker logs ntfy`
- **Alertmanager**: `docker logs alertmanager`
- **Prometheus**: `docker logs prometheus`
- **NTFY Bridge**: `docker logs ntfy-bridge`
## Security Considerations
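1. **Authentication**: Login is enabled and self-signup is disabled (`enable-login: true`, `enable-signup: false`)
2. **Network**: Services talk to each other over the `alerting` Docker network; ports 8080 (NTFY), 8081 (NTFY bridge), and 8082 (Signal bridge) are also published on the host
3. **Access Control**: Default access is `deny-all`, so topics are readable and writable only where access has been granted
4. **Rate Limiting**: Each visitor gets a burst of 60 requests, replenished at one request every 5 seconds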
## Backup and Recovery
### Important Files to Backup
- `/home/homelab/docker/monitoring/homelab_vm/ntfy/` - NTFY data
- `/home/homelab/docker/monitoring/alerting/` - Alert configurations
- `/home/homelab/docker/monitoring/prometheus/` - Prometheus config
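
The three directories above can be captured in a single timestamped archive. A minimal sketch, assuming `tar` is available on the host; the function name, the `MONITORING_ROOT` override, and the archive naming are illustrative:

```bash
# Archive the NTFY data, alert configuration, and Prometheus config into DEST_DIR.
# Prints the path of the archive it created.
backup_notification_stack() {
  local dest="${1:-}"
  local root="${MONITORING_ROOT:-/home/homelab/docker/monitoring}"
  local archive
  [ -n "$dest" ] || { echo "usage: backup_notification_stack DEST_DIR" >&2; return 1; }
  archive="${dest}/notification-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # Paths are relative to $root so the archive restores cleanly under any root.
  tar -czf "$archive" -C "$root" homelab_vm/ntfy alerting prometheus || return 1
  printf '%s\n' "$archive"
}
```

For example, `backup_notification_stack /mnt/backups/notifications` (a hypothetical destination). `cache.db` and `auth.db` are SQLite files, so an archive taken while the `ntfy` container is writing to them may not be consistent; stopping the container for the duration of the backup is the cautious option.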
### Recovery Process
1. Restore configuration files
2. Restart containers from `/home/homelab/docker/monitoring`: `docker compose -f homelab_vm/alerting.yaml up -d`
3. Verify all services are healthy
4. Test notification flow
## Maintenance
### Regular Tasks
1. **Weekly**: Check alert rule effectiveness
2. **Monthly**: Review notification volumes
3. **Quarterly**: Update container images
4. **Annually**: Review and update alert thresholds
### Monitoring the Monitoring
- Monitor NTFY server uptime
- Track alert volume and patterns
- Verify notification delivery
- Check for false positives/negatives
---
**Last Updated**: February 15, 2026
**Maintainer**: Homelab Administrator
**Version**: 1.0