Sanitized mirror from private repository - 2026-03-21 07:31:47 UTC

2026-03-21 07:31:47 +00:00
commit 82b69ea7e3
1235 changed files with 306089 additions and 0 deletions
--- a/docs/NTFY_NOTIFICATION_SYSTEM.md
+++ b/docs/NTFY_NOTIFICATION_SYSTEM.md
@@ -0,0 +1,404 @@
+# NTFY Notification System Documentation
+
+## Overview
+
+The homelab's notification system is built around NTFY, a simple HTTP-based pub-sub notification service. Bridges connect it to Alertmanager, Signal, and Gitea.
+
+## Architecture
+
+### Core Components
+
+1. **NTFY Server** - Main notification hub
+2. **NTFY Bridge** - Connects Alertmanager to NTFY
+3. **Signal Bridge** - Forwards NTFY notifications to Signal messenger
+4. **Gitea NTFY Bridge** - Sends Git repository events to NTFY
+
+### Container Stack
+
+The notification components are deployed via Docker Compose in the alerting stack. The excerpt below covers the NTFY server, the Alertmanager bridge, and the Signal bridge:
+
+```yaml
+# Location: /home/homelab/docker/monitoring/homelab_vm/alerting.yaml
+services:
+  ntfy:
+    image: binwiederhier/ntfy:latest
+    container_name: ntfy
+    command: serve
+    volumes:
+      - /home/homelab/docker/monitoring/homelab_vm/ntfy:/var/lib/ntfy
+    ports:
+      - "8080:80"
+    environment:
+      - NTFY_BASE_URL=http://homelab.vish.local:8080
+      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
+      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
+      - NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
+    restart: unless-stopped
+    networks:
+      - alerting
+
+  ntfy-bridge:
+    image: xenrox/ntfy-alertmanager:latest
+    container_name: ntfy-bridge
+    environment:
+      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
+      - NTFY_URL=http://ntfy:80
+      - NTFY_USER=
+      - NTFY_PASSWORD=
+    ports:
+      - "8081:8080"
+    restart: unless-stopped
+    networks:
+      - alerting
+
+  signal-bridge:
+    image: bbernhard/signal-cli-rest-api:latest
+    container_name: signal-bridge
+    ports:
+      - "8082:8080"
+    environment:
+      - MODE=json-rpc
+    volumes:
+      - /home/homelab/docker/monitoring/homelab_vm/signal-data:/home/.local/share/signal-cli
+    restart: unless-stopped
+    networks:
+      - alerting
+```
+
+## Configuration Files
+
+### NTFY Server Configuration
+
+**Location**: `/home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml`
+
+```yaml
+# Basic server configuration
+base-url: "http://homelab.vish.local:8080"
+listen-http: ":80"
+cache-file: "/var/lib/ntfy/cache.db"
+auth-file: "/var/lib/ntfy/auth.db"
+attachment-cache-dir: "/var/lib/ntfy/attachments"
+
+# Authentication and access control
+auth-default-access: "deny-all"
+enable-signup: false
+enable-login: true
+
+# Rate limiting
+visitor-request-limit-burst: 60
+visitor-request-limit-replenish: "5s"
+
+# Message limits (ntfy's option for the maximum message body size is message-size-limit)
+message-size-limit: 4096
+attachment-file-size-limit: "15M"
+attachment-total-size-limit: "100M"
+
+# Retention
+cache-duration: "12h"
+keepalive-interval: "45s"
+manager-interval: "1m"
+
+# Topics in use. ntfy creates topics on first publish or subscribe and has no
+# topic-declaration option in server.yml, so this list is documentation only.
+#   alerts     - System alerts from Prometheus/Alertmanager
+#   gitea      - Git repository notifications
+#   monitoring - Infrastructure monitoring alerts
+```
+
+### Alertmanager Integration
+
+**Location**: `/home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml`
+
+```yaml
+global:
+  smtp_smarthost: 'localhost:587'
+  smtp_from: 'alertmanager@homelab.local'
+
+route:
+  group_by: ['alertname']
+  group_wait: 10s
+  group_interval: 10s
+  repeat_interval: 1h
+  receiver: 'web.hook'
+
+receivers:
+- name: 'web.hook'
+  webhook_configs:
+  - url: 'http://ntfy-bridge:8080/alerts'
+    send_resolved: true
+    http_config:
+      basic_auth:
+        username: ''
+        password: ''
+
+inhibit_rules:
+  - source_match:
+      severity: 'critical'
+    target_match:
+      severity: 'warning'
+    equal: ['alertname', 'dev', 'instance']
+```
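+
+Alertmanager 0.22 and later also accept `source_matchers`/`target_matchers` in place of the deprecated `source_match`/`target_match`. As a sketch, the same inhibit rule in that form would look roughly like this (it assumes the deployed Alertmanager is recent enough):
+
+```yaml
+# Equivalent inhibit rule using the matcher syntax (assumes Alertmanager >= 0.22)
+inhibit_rules:
+  - source_matchers:
+      - severity="critical"
+    target_matchers:
+      - severity="warning"
+    equal: ['alertname', 'dev', 'instance']
+```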
+
+### Prometheus Alert Rules
+
+**Location**: `/home/homelab/docker/monitoring/alerting/alert-rules.yml`
+
+Key alert rules that trigger NTFY notifications:
+
+```yaml
+groups:
+- name: system.rules
+  rules:
+  - alert: InstanceDown
+    expr: up == 0
+    for: 1m
+    labels:
+      severity: critical
+    annotations:
+      summary: "Instance {{ $labels.instance }} down"
+      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
+
+  - alert: HighCPUUsage
+    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
+    for: 2m
+    labels:
+      severity: warning
+    annotations:
+      summary: "High CPU usage on {{ $labels.instance }}"
+      description: "CPU usage has been above 80% for more than 2 minutes."
+
+  - alert: HighMemoryUsage
+    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
+    for: 2m
+    labels:
+      severity: critical
+    annotations:
+      summary: "High memory usage on {{ $labels.instance }}"
+      description: "Memory usage has been above 90% for more than 2 minutes."
+
+  - alert: DiskSpaceLow
+    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
+    for: 1m
+    labels:
+      severity: critical
+    annotations:
+      summary: "Low disk space on {{ $labels.instance }}"
+      description: "Disk space is below 10% on root filesystem."
+```
+
+## Notification Channels
+
+### 1. NTFY Web Interface
+- **URL**: http://homelab.vish.local:8080
+- **Topics**:
+  - `alerts` - System monitoring alerts
+  - `gitea` - Git repository events
+  - `monitoring` - Infrastructure status
+
+### 2. Signal Messenger Integration
+- **Bridge Container**: signal-bridge
+- **Port**: 8082
+- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/signal-data/`
+
+### 3. Gitea Integration
+- **Bridge Container**: gitea-ntfy-bridge
+- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/gitea-ntfy-bridge/`
+
+## Current Monitoring Targets
+
+The Prometheus instance scrapes the following targets:
+
+```yaml
+# From /home/homelab/docker/monitoring/prometheus/prometheus.yml
+scrape_configs:
+  - job_name: "prometheus"
+    static_configs:
+      - targets: ["localhost:9090"]
+
+  - job_name: "alertmanager"
+    static_configs:
+      - targets: ["alertmanager:9093"]
+
+  - job_name: "node-exporter"
+    static_configs:
+      - targets: ["localhost:9100"]
+
+  - job_name: "calypso-node"
+    static_configs:
+      - targets: ["100.75.252.64:9100"]
+
+  - job_name: "seattle-node"
+    static_configs:
+      - targets: ["100.82.197.124:9100"]
+
+  - job_name: "proxmox-node"
+    static_configs:
+      - targets: ["100.87.12.28:9100"]
+```
+
+## How to Modify Notifications
+
+### 1. Adding New Alert Rules
+
+Edit the alert rules file:
+```bash
+sudo nano /home/homelab/docker/monitoring/alerting/alert-rules.yml
+```
+
+Example new rule:
+```yaml
+- alert: ServiceDown
+  expr: up{job="my-service"} == 0
+  for: 30s
+  labels:
+    severity: warning
+  annotations:
+    summary: "Service {{ $labels.job }} is down"
+    description: "The service {{ $labels.job }} on {{ $labels.instance }} has been down for more than 30 seconds."
+```
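+
+The `up{job="my-service"}` series only exists once Prometheus scrapes a job with that name. A minimal sketch of the matching entry in `prometheus.yml`, where the target address `my-service:9100` is a placeholder and not taken from the deployed configuration:
+
+```yaml
+scrape_configs:
+  # job_name must match the job label used in the alert expression
+  - job_name: "my-service"
+    static_configs:
+      - targets: ["my-service:9100"]  # hypothetical host:port
+```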
+
+### 2. Modifying Notification Routing
+
+Edit Alertmanager configuration:
+```bash
+sudo nano /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml
+```
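+
+For example, critical alerts could be repeated more often than the one-hour default by adding a child route. This is a sketch only: the 15-minute interval is an arbitrary illustration, and the `matchers` syntax assumes Alertmanager 0.22 or later:
+
+```yaml
+route:
+  group_by: ['alertname']
+  group_wait: 10s
+  group_interval: 10s
+  repeat_interval: 1h
+  receiver: 'web.hook'
+  routes:
+    # Child route: same receiver, shorter repeat interval for critical alerts
+    - matchers:
+        - severity="critical"
+      receiver: 'web.hook'
+      repeat_interval: 15m
+```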
+
+### 3. Adding New NTFY Topics
+
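+ntfy creates a topic the first time something publishes or subscribes to it. Because default access is `deny-all`, a new topic also needs an access grant (`ntfy access`) for the users that use it. Keep the topic list in the server configuration up to date: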
+```bash
+sudo nano /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml
+```
+
+### 4. Changing Notification Thresholds
+
+Modify the alert expressions in `alert-rules.yml`. Common patterns:
+
+- **CPU Usage**: `expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > THRESHOLD`
+- **Memory Usage**: `expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > THRESHOLD`
+- **Disk Usage**: `expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < THRESHOLD`
+
+### 5. Reloading Configuration
+
+After making changes:
+
+```bash
+# Reload Prometheus configuration (requires Prometheus to run with --web.enable-lifecycle)
+curl -X POST http://localhost:9090/-/reload
+
+# Reload Alertmanager configuration
+curl -X POST http://localhost:9093/-/reload
+
+# Restart NTFY if server config changed
+cd /home/homelab/docker/monitoring
+docker compose -f homelab_vm/alerting.yaml restart ntfy
+```
+
+## Testing Notifications
+
+### Manual Test via NTFY API
+
+```bash
+# Send test notification (if access is deny-all, add credentials: -u USER:PASSWORD)
+curl -d "Test notification from homelab" http://homelab.vish.local:8080/alerts
+
+# Send with priority and tags
+curl -H "Priority: urgent" -H "Tags: warning,test" -d "High priority test" http://homelab.vish.local:8080/alerts
+```
+
+### Test Alert Rules
+
+```bash
+# Trigger a test alert by stopping a service temporarily
+sudo systemctl stop node_exporter
+# Wait for the alert to fire (InstanceDown fires after 1 minute of downtime), then restart
+sudo systemctl start node_exporter
+```
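+
+Rules can also be checked offline, without stopping a service, using `promtool test rules`. The following is a sketch of a unit test for `InstanceDown`. The file name `alert-rules.test.yml` and the sample series labels are assumptions, and it would be run as `promtool test rules alert-rules.test.yml` from the directory containing `alert-rules.yml`:
+
+```yaml
+# alert-rules.test.yml (hypothetical file name)
+rule_files:
+  - alert-rules.yml
+
+evaluation_interval: 1m
+
+tests:
+  - interval: 1m
+    # Simulate a target that is down at every sample
+    input_series:
+      - series: 'up{job="node-exporter", instance="localhost:9100"}'
+        values: '0 0 0 0'
+    alert_rule_test:
+      # After 2 minutes the 1m "for" clause has elapsed, so the alert should be firing
+      - eval_time: 2m
+        alertname: InstanceDown
+        exp_alerts:
+          - exp_labels:
+              severity: critical
+              job: node-exporter
+              instance: localhost:9100
+            exp_annotations:
+              summary: "Instance localhost:9100 down"
+              description: "localhost:9100 of job node-exporter has been down for more than 1 minute."
+```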
+
+### Verify Alert Flow
+
+1. **Prometheus** scrapes metrics and evaluates rules
+2. **Alertmanager** receives alerts and routes them
+3. **NTFY Bridge** converts alerts to NTFY messages
+4. **NTFY Server** publishes to subscribed topics
+5. **Signal Bridge** forwards to Signal messenger (if configured)
+
+## Troubleshooting
+
+### Common Issues
+
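+1. **Alerts not firing**: Check that the Prometheus targets are up (`http://localhost:9090/targets`)
+2. **Notifications not received**: Verify NTFY bridge connectivity (`docker logs ntfy-bridge`)
+3. **Signal not working**: Check that the Signal bridge has a registered or linked number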
+
+### Useful Commands
+
+```bash
+# Check container status
+docker ps | grep -E "(ntfy|alert|signal)"
+
+# View logs
+docker logs ntfy
+docker logs ntfy-bridge
+docker logs alertmanager
+
+# Test connectivity
+curl http://homelab.vish.local:8080/v1/health
+curl http://localhost:9093/-/healthy
+curl http://localhost:9090/-/healthy
+```
+
+### Log Locations
+
+- **NTFY**: `docker logs ntfy`
+- **Alertmanager**: `docker logs alertmanager`
+- **Prometheus**: `docker logs prometheus`
+- **NTFY Bridge**: `docker logs ntfy-bridge`
+
+## Security Considerations
+
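+1. **Authentication**: The NTFY server has login enabled (`enable-login: true`) and signup disabled (`enable-signup: false`)
+2. **Network**: All services share the `alerting` Docker network; ports 8080, 8081, and 8082 are also published on the host
+3. **Access Control**: Default access is `deny-all`
+4. **Rate Limiting**: Request bursts are capped at 60 per visitor, replenished at one request every 5 seconds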
+
+## Backup and Recovery
+
+### Important Files to Backup
+
+- `/home/homelab/docker/monitoring/homelab_vm/ntfy/` - NTFY data
+- `/home/homelab/docker/monitoring/alerting/` - Alert configurations
+- `/home/homelab/docker/monitoring/prometheus/` - Prometheus config
+
+### Recovery Process
+
+1. Restore configuration files
+2. Restart containers from `/home/homelab/docker/monitoring`: `docker compose -f homelab_vm/alerting.yaml up -d`
+3. Verify all services are healthy
+4. Test notification flow
+
+## Maintenance
+
+### Regular Tasks
+
+1. **Weekly**: Check alert rule effectiveness
+2. **Monthly**: Review notification volumes
+3. **Quarterly**: Update container images
+4. **Annually**: Review and update alert thresholds
+
+### Monitoring the Monitoring
+
+- Monitor NTFY server uptime
+- Track alert volume and patterns
+- Verify notification delivery
+- Check for false positives/negatives
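+
+One way to cover NTFY server uptime is a Compose healthcheck against the `/v1/health` endpoint. A sketch for the `ntfy` service, assuming `wget` and `grep` are available inside the image; the intervals are illustrative:
+
+```yaml
+services:
+  ntfy:
+    healthcheck:
+      # Healthy only if /v1/health reports "healthy": true
+      test: >-
+        wget -q --tries=1 -O - http://localhost:80/v1/health
+        | grep -Eq '"healthy"\s*:\s*true' || exit 1
+      interval: 60s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+```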
+
+---
+
+**Last Updated**: February 15, 2026
+**Maintainer**: Homelab Administrator
+**Version**: 1.0