Sanitized mirror from private repository - 2026-03-19 02:19:26 UTC

NTFY Notification System Documentation

Overview

The homelab's notification system is built around NTFY, a simple HTTP-based pub-sub notification service, with bridges that connect it to Alertmanager, Signal, and Gitea.

Architecture

Core Components

NTFY Server - Main notification hub
NTFY Bridge - Connects Alertmanager to NTFY
Signal Bridge - Forwards NTFY notifications to Signal messenger
Gitea NTFY Bridge - Sends Git repository events to NTFY

Container Stack

All notification components are deployed via Docker Compose in the alerting stack:

# Location: /home/homelab/docker/monitoring/homelab_vm/alerting.yaml
services:
  ntfy:
    image: binwiederhier/ntfy:latest
    container_name: ntfy
    command: serve
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/ntfy:/var/lib/ntfy
    ports:
      - "8080:80"
    environment:
      - NTFY_BASE_URL=http://homelab.vish.local:8080
      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
      - NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
    restart: unless-stopped
    networks:
      - alerting

  ntfy-bridge:
    image: xenrox/ntfy-alertmanager:latest
    container_name: ntfy-bridge
    environment:
      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
      - NTFY_URL=http://ntfy:80
      - NTFY_USER=
      - NTFY_PASSWORD=REDACTED_PASSWORD
    ports:
      - "8081:8080"
    restart: unless-stopped
    networks:
      - alerting

  signal-bridge:
    image: bbernhard/signal-cli-rest-api:latest
    container_name: signal-bridge
    ports:
      - "8082:8080"
    environment:
      - MODE=json-rpc
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/signal-data:/home/.local/share/signal-cli
    restart: unless-stopped
    networks:
      - alerting

Configuration Files

NTFY Server Configuration

Location: /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml

# Basic server configuration
base-url: "http://homelab.vish.local:8080"
listen-http: ":80"
cache-file: "/var/lib/ntfy/cache.db"
auth-file: "/var/lib/ntfy/auth.db"
attachment-cache-dir: "/var/lib/ntfy/attachments"

# Authentication and access control
auth-default-access: "deny-all"
enable-signup: false
enable-login: true

# Rate limiting
visitor-request-limit-burst: 60
visitor-request-limit-replenish: "5s"

# Message limits
message-limit: 4096
attachment-file-size-limit: "15M"
attachment-total-size-limit: "100M"

# Retention
cache-duration: "12h"
keepalive-interval: "45s"
manager-interval: "1m"

# Topics and subscriptions
topics:
  - name: "alerts"
    description: "System alerts from Prometheus/Alertmanager"
  - name: "gitea"
    description: "Git repository notifications"
  - name: "monitoring"
    description: "Infrastructure monitoring alerts"
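
The two rate-limit settings above work together like a token bucket: each visitor starts with `visitor-request-limit-burst` requests and regains one request every `visitor-request-limit-replenish`. The Python sketch below is a simplified model of that behaviour, not ntfy's actual limiter; the class name `VisitorBucket` and its `allow` method are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VisitorBucket:
    """Simplified token bucket modelling the per-visitor request limit."""

    burst: int = 60           # visitor-request-limit-burst
    replenish_s: float = 5.0  # visitor-request-limit-replenish ("5s")

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # bucket starts full
        self.last = 0.0                  # time of the previous request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at `now` (seconds) is accepted."""
        # Refill one token per replenish interval, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed / self.replenish_s)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Under this model a client can send 60 requests at once, after which it is held to roughly one request every 5 seconds (12 per minute) until the bucket refills.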

Alertmanager Integration

Location: /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml

global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@homelab.local'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://ntfy-bridge:8080/alerts'
    send_resolved: true
    http_config:
      basic_auth:
        username: ''
        password: ''

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
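
The inhibit rule above mutes a `warning` alert while a `critical` alert with the same `alertname`, `dev`, and `instance` labels is firing. The sketch below models that matching logic in Python to make the condition concrete; it is an illustration under those assumptions, not Alertmanager's code, and the function name `is_inhibited` is hypothetical.

```python
def is_inhibited(target: dict, firing: list[dict],
                 equal: tuple[str, ...] = ("alertname", "dev", "instance")) -> bool:
    """Return True if the warning alert `target` is muted by a firing critical alert.

    `target` and each entry of `firing` are flat label dictionaries.
    """
    if target.get("severity") != "warning":
        return False  # the rule only targets warning alerts
    for source in firing:
        if source.get("severity") != "critical":
            continue  # only critical alerts act as a source
        # Every label listed in `equal` must match; a label missing on both
        # sides is treated as an equal (empty) value.
        if all(source.get(k, "") == target.get(k, "") for k in equal):
            return True
    return False
```

Because `alertname` is part of `equal`, a critical `InstanceDown` alert does not mute a warning `HighCPUUsage` alert on the same instance; inhibition only applies when the same alert name exists at both severities.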

Prometheus Alert Rules

Location: /home/homelab/docker/monitoring/alerting/alert-rules.yml

Key alert rules that trigger NTFY notifications:

groups:
- name: system.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% for more than 2 minutes."

  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 90% for more than 2 minutes."

  - alert: DiskSpaceLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Low disk space on {{ $labels.instance }}"
      description: "Disk space is below 10% on root filesystem."
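
Each rule above has a `for:` duration: the expression must stay true for that long before the alert moves from pending to firing, and any evaluation where it is false resets the timer. The Python sketch below simulates this state machine over a list of evaluations; the function name `alert_states` and the sample timestamps are hypothetical, and it is a model of the behaviour rather than Prometheus code.

```python
def alert_states(samples: list[tuple[float, bool]], for_s: float) -> list[str]:
    """Map (timestamp_seconds, condition_true) evaluations to alert states."""
    states = []
    active_since = None  # when the condition first became true in the current run
    for ts, cond in samples:
        if not cond:
            active_since = None  # a false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        # The alert fires once the condition has held for at least `for_s`.
        states.append("firing" if ts - active_since >= for_s else "pending")
    return states


# InstanceDown uses `for: 1m`, so a target that is down at t=0, 30, and 60
# seconds is pending twice and firing on the third evaluation.
print(alert_states([(0, True), (30, True), (60, True)], for_s=60))
```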

Notification Channels

1. NTFY Web Interface

URL: http://homelab.vish.local:8080
Topics:
- alerts - System monitoring alerts
- gitea - Git repository events
- monitoring - Infrastructure status

2. Signal Messenger Integration

Bridge Container: signal-bridge
Port: 8082
Configuration: /home/homelab/docker/monitoring/homelab_vm/signal-data/

3. Gitea Integration

Bridge Container: gitea-ntfy-bridge
Configuration: /home/homelab/docker/monitoring/homelab_vm/gitea-ntfy-bridge/

Current Monitoring Targets

The Prometheus instance monitors the following nodes:

# From /home/homelab/docker/monitoring/prometheus/prometheus.yml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "alertmanager"
    static_configs:
      - targets: ["alertmanager:9093"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: "calypso-node"
    static_configs:
      - targets: ["100.75.252.64:9100"]

  - job_name: "seattle-node"
    static_configs:
      - targets: ["100.82.197.124:9100"]

  - job_name: "proxmox-node"
    static_configs:
      - targets: ["100.87.12.28:9100"]

How to Modify Notifications

1. Adding New Alert Rules

Edit the alert rules file:

sudo nano /home/homelab/docker/monitoring/alerting/alert-rules.yml

Example new rule:

- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "The service {{ $labels.job }} on {{ $labels.instance }} has been down for more than 30 seconds."

2. Modifying Notification Routing

Edit Alertmanager configuration:

sudo nano /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml

3. Adding New NTFY Topics

Edit NTFY server configuration:

sudo nano /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml

4. Changing Notification Thresholds

Modify the alert expressions in alert-rules.yml. Common patterns:

CPU Usage: expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > THRESHOLD
Memory Usage: expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > THRESHOLD
Disk Usage: expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < THRESHOLD
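
To choose a THRESHOLD it helps to see what each expression computes. The sketch below restates the three patterns as plain Python arithmetic; the function names and the sample values are hypothetical, and the inputs stand in for the metric values Prometheus would supply.

```python
def cpu_usage_pct(idle_fraction: float) -> float:
    """CPU usage: 100 minus the average idle fraction (0.0 to 1.0) times 100."""
    return 100 - idle_fraction * 100


def memory_usage_pct(total_bytes: int, available_bytes: int) -> float:
    """Memory usage: share of MemTotal that is not MemAvailable."""
    return (total_bytes - available_bytes) / total_bytes * 100


def disk_free_pct(avail_bytes: int, size_bytes: int) -> float:
    """Disk: share of the root filesystem that is still available."""
    return avail_bytes / size_bytes * 100


GIB = 2**30
# A host with 16 GiB RAM and 2 GiB available is at 87.5%, below the 90% rule.
print(memory_usage_pct(16 * GIB, 2 * GIB) > 90)
# A 128 GiB root filesystem with 8 GiB free is at 6.25%, below the 10% rule.
print(disk_free_pct(8 * GIB, 128 * GIB) < 10)
```

Note that the CPU and memory rules alert when the value rises above the threshold, while the disk rule alerts when free space falls below it.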

5. Reloading Configuration

After making changes:

# Reload Prometheus configuration (Prometheus must be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Reload Alertmanager configuration
curl -X POST http://localhost:9093/-/reload

# Restart NTFY if server config changed
cd /home/homelab/docker/monitoring
docker compose -f homelab_vm/alerting.yaml restart ntfy
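
The two reload calls can also be scripted. The following is a minimal Python sketch using only the standard library, assuming the same localhost URLs as the curl commands; the names `RELOAD_URLS`, `reload_request`, and `reload` are hypothetical.

```python
import urllib.request

# Reload endpoints taken from the curl commands in this section.
RELOAD_URLS = {
    "prometheus": "http://localhost:9090/-/reload",
    "alertmanager": "http://localhost:9093/-/reload",
}


def reload_request(service: str) -> urllib.request.Request:
    """Build the empty POST request that triggers a configuration reload."""
    return urllib.request.Request(RELOAD_URLS[service], data=b"", method="POST")


def reload(service: str, timeout: float = 5.0) -> int:
    """Send the reload request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for non-2xx responses, for example
    when the new configuration is rejected.
    """
    with urllib.request.urlopen(reload_request(service), timeout=timeout) as resp:
        return resp.status
```

Calling `reload("prometheus")` followed by `reload("alertmanager")` mirrors the two curl commands; an exception from either call is a signal to check the edited file before relying on the new rules.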

Testing Notifications

Manual Test via NTFY API

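# Send test notification
# With auth-default-access set to deny-all, anonymous publishing is rejected
# unless the topic was opened explicitly; USER:PASSWORD is a placeholder for
# an NTFY account with write access to the topic.
curl -u USER:PASSWORD -d "Test notification from homelab" http://homelab.vish.local:8080/alerts

# Send with priority and tags
curl -u USER:PASSWORD -H "Priority: urgent" -H "Tags: warning,test" -d "High priority test" http://homelab.vish.local:8080/alerts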

Test Alert Rules

# Trigger a test alert by stopping a service temporarily
sudo systemctl stop node_exporter
# Wait for the alert to fire (InstanceDown uses "for: 1m"), then restart
sudo systemctl start node_exporter

Verify Alert Flow

1. Prometheus scrapes metrics and evaluates rules
2. Alertmanager receives alerts and routes them
3. NTFY Bridge converts alerts to NTFY messages
4. NTFY Server publishes to subscribed topics
5. Signal Bridge forwards to Signal messenger (if configured)
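
The conversion step in this flow can be pictured as follows. The Python sketch below takes a payload shaped like Alertmanager's webhook JSON (a top-level `alerts` list whose entries carry `status`, `labels`, and `annotations`) and derives one NTFY message per alert. It is an illustration only, not the ntfy-bridge container's implementation; the function names and the severity-to-priority mapping are assumptions.

```python
# Assumed mapping from the severity label to an NTFY priority.
SEVERITY_PRIORITY = {"critical": "urgent", "warning": "high"}


def alert_to_ntfy(alert: dict) -> dict:
    """Derive NTFY title, message, priority, and tags from a single alert."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "")
    severity = labels.get("severity", "")
    resolved = status == "resolved"
    title = annotations.get("summary", labels.get("alertname", "alert"))
    return {
        "title": ("[RESOLVED] " if resolved else "") + title,
        "message": annotations.get("description", ""),
        # Resolved alerts drop back to the default priority.
        "priority": "default" if resolved else SEVERITY_PRIORITY.get(severity, "default"),
        "tags": [t for t in (severity, status) if t],
    }


def payload_to_messages(payload: dict) -> list[dict]:
    """Convert a whole webhook payload into a list of NTFY messages."""
    return [alert_to_ntfy(a) for a in payload.get("alerts", [])]
```

Because `send_resolved: true` is set in the Alertmanager receiver, the bridge sees both firing and resolved alerts, which is why the sketch handles the two statuses differently.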

Troubleshooting

Common Issues

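Alerts not firing: Check that the Prometheus targets are up and that the rule's "for" duration has elapsed
Notifications not received: Check the ntfy-bridge logs and confirm the bridge can reach http://ntfy:80
Signal not working: Check that the Signal account is registered in signal-bridge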

Useful Commands

# Check container status
docker ps | grep -E "(ntfy|alert|signal)"

# View logs
docker logs ntfy
docker logs ntfy-bridge
docker logs alertmanager

# Test connectivity
curl http://homelab.vish.local:8080/v1/health
curl http://localhost:9093/-/healthy
curl http://localhost:9090/-/healthy
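
The three connectivity checks can be combined into one script. The sketch below is a minimal Python version that treats HTTP 200 as healthy, assuming the same URLs as the curl commands; the names `HEALTH_URLS`, `http_status`, and `check_health` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

# Health endpoints taken from the curl commands in this section.
HEALTH_URLS = {
    "ntfy": "http://homelab.vish.local:8080/v1/health",
    "alertmanager": "http://localhost:9093/-/healthy",
    "prometheus": "http://localhost:9090/-/healthy",
}


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for `url`, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0         # connection refused, DNS failure, timeout, ...


def check_health(fetch: Callable[[str], int] = http_status) -> dict[str, bool]:
    """Return a service-name to healthy flag mapping (healthy means HTTP 200)."""
    return {name: fetch(url) == 200 for name, url in HEALTH_URLS.items()}
```

Running `check_health()` on the Docker host returns a dictionary with one boolean per service; the `fetch` parameter exists so the function can be exercised without network access.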

Log Locations

NTFY: docker logs ntfy
Alertmanager: docker logs alertmanager
Prometheus: docker logs prometheus
NTFY Bridge: docker logs ntfy-bridge

Security Considerations

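Authentication: NTFY login is enabled and self-signup is disabled
Network: Services share the alerting Docker network; ports 8080, 8081, and 8082 are also published on the host
Access Control: Default access is deny-all
Rate Limiting: Per-visitor request limits (burst of 60, one request replenished every 5s)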

Backup and Recovery

Important Files to Backup

/home/homelab/docker/monitoring/homelab_vm/ntfy/ - NTFY data
/home/homelab/docker/monitoring/alerting/ - Alert configurations
/home/homelab/docker/monitoring/prometheus/ - Prometheus config
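
A backup of these directories can be as simple as a timestamped tarball. The following Python sketch uses only the standard library; the function name `backup`, the archive naming scheme, and the destination directory are hypothetical, and paths that do not exist are skipped rather than treated as errors.

```python
import tarfile
import time
from pathlib import Path

# Directories listed in this section.
BACKUP_PATHS = [
    "/home/homelab/docker/monitoring/homelab_vm/ntfy/",
    "/home/homelab/docker/monitoring/alerting/",
    "/home/homelab/docker/monitoring/prometheus/",
]


def backup(dest_dir: str, paths: list[str] = BACKUP_PATHS) -> Path:
    """Write a gzip-compressed tar archive of `paths` into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest / f"alerting-backup-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            if Path(path).exists():
                tar.add(path)  # directories are added recursively
    return archive
```

The NTFY directory contains SQLite files (`cache.db`, `auth.db`), so stopping the ntfy container before running the backup avoids copying them mid-write.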

Recovery Process

1. Restore configuration files
2. Restart containers from /home/homelab/docker/monitoring: docker compose -f homelab_vm/alerting.yaml up -d
3. Verify all services are healthy
4. Test notification flow

Maintenance

Regular Tasks

Weekly: Check alert rule effectiveness
Monthly: Review notification volumes
Quarterly: Update container images
Annually: Review and update alert thresholds

Monitoring the Monitoring

Monitor NTFY server uptime
Track alert volume and patterns
Verify notification delivery
Check for false positives/negatives

Last Updated: February 15, 2026
Maintainer: Homelab Administrator
Version: 1.0
