# NTFY Notification System Documentation

## Overview

The homelab's notification system is built around NTFY, a simple HTTP-based pub-sub notification service. Bridges connect it to Alertmanager, Signal, and Gitea.

## Architecture

### Core Components

1. **NTFY Server** - Main notification hub
2. **NTFY Bridge** - Connects Alertmanager to NTFY
3. **Signal Bridge** - Forwards NTFY notifications to Signal messenger
4. **Gitea NTFY Bridge** - Sends Git repository events to NTFY

### Container Stack

The notification components are deployed via Docker Compose in the alerting stack. The excerpt below covers the NTFY server, the Alertmanager bridge, and the Signal bridge:

```yaml
# Location: /home/homelab/docker/monitoring/homelab_vm/alerting.yaml
services:
  ntfy:
    image: binwiederhier/ntfy:latest
    container_name: ntfy
    command: serve
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/ntfy:/var/lib/ntfy
    ports:
      - "8080:80"
    environment:
      - NTFY_BASE_URL=http://homelab.vish.local:8080
      - NTFY_CACHE_FILE=/var/lib/ntfy/cache.db
      - NTFY_AUTH_FILE=/var/lib/ntfy/auth.db
      - NTFY_ATTACHMENT_CACHE_DIR=/var/lib/ntfy/attachments
    restart: unless-stopped
    networks:
      - alerting

  ntfy-bridge:
    image: xenrox/ntfy-alertmanager:latest
    container_name: ntfy-bridge
    environment:
      - NTFY_TOPIC="REDACTED_NTFY_TOPIC"
      - NTFY_URL=http://ntfy:80
      - NTFY_USER=
      - NTFY_PASSWORD=
    ports:
      - "8081:8080"
    restart: unless-stopped
    networks:
      - alerting

  signal-bridge:
    image: bbernhard/signal-cli-rest-api:latest
    container_name: signal-bridge
    ports:
      - "8082:8080"
    environment:
      - MODE=json-rpc
    volumes:
      - /home/homelab/docker/monitoring/homelab_vm/signal-data:/home/.local/share/signal-cli
    restart: unless-stopped
    networks:
      - alerting
```

## Configuration Files

### NTFY Server Configuration

**Location**: `/home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml`

```yaml
# Basic server configuration
base-url: "http://homelab.vish.local:8080"
listen-http: ":80"
cache-file: "/var/lib/ntfy/cache.db"
auth-file: "/var/lib/ntfy/auth.db"
attachment-cache-dir: "/var/lib/ntfy/attachments"

# Authentication and access control
auth-default-access: "deny-all"
enable-signup: false
enable-login: true

# Rate limiting
visitor-request-limit-burst: 60
visitor-request-limit-replenish: "5s"

# Message limits
message-limit: 4096
attachment-file-size-limit: "15M"
attachment-total-size-limit: "100M"

# Retention
cache-duration: "12h"
keepalive-interval: "45s"
manager-interval: "1m"

# Topics and subscriptions
topics:
  - name: "alerts"
    description: "System alerts from Prometheus/Alertmanager"
  - name: "gitea"
    description: "Git repository notifications"
  - name: "monitoring"
    description: "Infrastructure monitoring alerts"
```

### Alertmanager Integration

**Location**: `/home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml`

```yaml
global:
  smtp_smarthost: 'localhost:587'
  smtp_from: 'alertmanager@homelab.local'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
- name: 'web.hook'
  webhook_configs:
  - url: 'http://ntfy-bridge:8080/alerts'
    send_resolved: true
    http_config:
      basic_auth:
        username: ''
        password: ''

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
```

### Prometheus Alert Rules

**Location**: `/home/homelab/docker/monitoring/alerting/alert-rules.yml`

Key alert rules that trigger NTFY notifications:

```yaml
groups:
- name: system.rules
  rules:
  - alert: InstanceDown
    expr: up == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

  - alert: HighCPUUsage
    expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is above 80% for more than 2 minutes."

  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 90
    for: 2m
    labels:
      severity: critical
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is above 90% for more than 2 minutes."

  - alert: DiskSpaceLow
    expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 10
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Low disk space on {{ $labels.instance }}"
      description: "Disk space is below 10% on root filesystem."
```

## Notification Channels

### 1. NTFY Web Interface
- **URL**: http://homelab.vish.local:8080
- **Topics**: 
  - `alerts` - System monitoring alerts
  - `gitea` - Git repository events
  - `monitoring` - Infrastructure status

### 2. Signal Messenger Integration
- **Bridge Container**: signal-bridge
- **Port**: 8082
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/signal-data/`

### 3. Gitea Integration
- **Bridge Container**: gitea-ntfy-bridge
- **Configuration**: `/home/homelab/docker/monitoring/homelab_vm/gitea-ntfy-bridge/`

## Current Monitoring Targets

The Prometheus instance scrapes the following targets (itself, Alertmanager, and four node exporters):

```yaml
# From /home/homelab/docker/monitoring/prometheus/prometheus.yml
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "alertmanager"
    static_configs:
      - targets: ["alertmanager:9093"]

  - job_name: "node-exporter"
    static_configs:
      - targets: ["localhost:9100"]

  - job_name: "calypso-node"
    static_configs:
      - targets: ["100.75.252.64:9100"]

  - job_name: "seattle-node"
    static_configs:
      - targets: ["100.82.197.124:9100"]

  - job_name: "proxmox-node"
    static_configs:
      - targets: ["100.87.12.28:9100"]
```

## How to Modify Notifications

### 1. Adding New Alert Rules

Edit the alert rules file:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alert-rules.yml
```

Example new rule:
```yaml
- alert: ServiceDown
  expr: up{job="my-service"} == 0
  for: 30s
  labels:
    severity: warning
  annotations:
    summary: "Service {{ $labels.job }} is down"
    description: "The service {{ $labels.job }} on {{ $labels.instance }} has been down for more than 30 seconds."
```

### 2. Modifying Notification Routing

Edit the Alertmanager configuration:
```bash
sudo nano /home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml
```
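
Before reloading, it can help to check which receiver a given label set would be routed to. The sketch below wraps `amtool config routes test`, which ships with Alertmanager. It assumes `amtool` is installed on the host; the `route_for` function name and the `AM_CONFIG` variable are illustrative, not part of the existing setup.

```bash
# Show which receiver a set of alert labels would be routed to.
# AM_CONFIG is a hypothetical override; the default is the path documented above.
route_for() {
  amtool config routes test \
    --config.file="${AM_CONFIG:-/home/homelab/docker/monitoring/alerting/alertmanager/alertmanager.yml}" \
    "$@"
}

# Example: route_for severity=critical alertname=InstanceDown
```

With the single-receiver configuration shown earlier, every label set should resolve to `web.hook`; the check becomes more useful once child routes are added.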

### 3. Adding New NTFY Topics

Edit the NTFY server configuration:
```bash
sudo nano /home/homelab/docker/monitoring/homelab_vm/ntfy/server.yml
```
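
ntfy itself creates a topic the first time something publishes or subscribes to it, so with `auth-default-access: "deny-all"` a new topic is only usable once a user has been granted access to it. A minimal sketch using the `ntfy access` CLI inside the `ntfy` container; the `ntfy_grant` function name and the user and topic in the example are hypothetical.

```bash
# Grant a user access to a topic via the ntfy CLI in the running container.
# $1 = user, $2 = topic, $3 = permission (rw, ro, wo or deny)
ntfy_grant() {
  docker exec ntfy ntfy access "$1" "$2" "$3"
}

# The user must exist first, for example:
#   docker exec -it ntfy ntfy user add alertbot
# Example grant (hypothetical user and topic):
#   ntfy_grant alertbot backups wo
```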

### 4. Changing Notification Thresholds

Modify the alert expressions in `alert-rules.yml`. Common patterns:

- **CPU Usage**: `expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > THRESHOLD`
- **Memory Usage**: `expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > THRESHOLD`
- **Disk Usage**: `expr: (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < THRESHOLD`
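
Note that the disk pattern measures free space, so its comparison is `<` rather than `>`. As a rough worked example of the arithmetic, the sketch below computes the same percentage from two byte counts; the `disk_free_percent` helper and the sample values are illustrative only.

```bash
# Percentage of free space, computed the same way as the DiskSpaceLow expression.
# $1 = available bytes, $2 = total bytes
disk_free_percent() {
  awk -v avail="$1" -v size="$2" 'BEGIN { printf "%.1f\n", avail / size * 100 }'
}

# A 100 GiB root filesystem with 8 GiB free:
disk_free_percent 8589934592 107374182400   # prints 8.0
```

Since 8.0 is below a `THRESHOLD` of 10, the rule would start its `for` timer for this filesystem.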

### 5. Reloading Configuration

After making changes:

```bash
# Reload Prometheus configuration (requires Prometheus to be started with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload

# Reload Alertmanager configuration
curl -X POST http://localhost:9093/-/reload

# Restart NTFY if server config changed
cd /home/homelab/docker/monitoring
docker compose -f homelab_vm/alerting.yaml restart ntfy
```
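
A reload with a broken rules file leaves Prometheus on its previous configuration, so validating first gives faster feedback. The following is a sketch that assumes `promtool` (bundled with Prometheus) is available on the host; the `validate_and_reload` name is illustrative.

```bash
# Reload Prometheus only if the rules file passes promtool's check.
# $1 = rules file (defaults to the path documented above)
validate_and_reload() {
  rules=${1:-/home/homelab/docker/monitoring/alerting/alert-rules.yml}
  if promtool check rules "$rules"; then
    curl -fsS -X POST http://localhost:9090/-/reload && echo "reloaded"
  else
    echo "rules invalid, not reloading" >&2
    return 1
  fi
}

# Example: validate_and_reload
```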

## Testing Notifications

### Manual Test via NTFY API

```bash
# Send test notification
curl -d "Test notification from homelab" http://homelab.vish.local:8080/alerts

# Send with priority and tags
curl -H "Priority: urgent" -H "Tags: warning,test" -d "High priority test" http://homelab.vish.local:8080/alerts
```
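
Because `auth-default-access` is set to `deny-all`, an anonymous request like the ones above is likely to be rejected unless the topic has been opened up. A minimal sketch of an authenticated publish using HTTP Basic auth, assuming a user with write access already exists; `ntfy_publish`, `NTFY_URL`, `NTFY_USER`, and `NTFY_PASS` are illustrative names.

```bash
# Publish a message to an ntfy topic with HTTP Basic auth.
# $1 = topic, $2 = message, remaining arguments are passed to curl (e.g. -H headers)
ntfy_publish() {
  topic=$1; shift
  message=$1; shift
  curl -fsS -u "${NTFY_USER}:${NTFY_PASS}" "$@" \
    -d "$message" "${NTFY_URL:-http://homelab.vish.local:8080}/${topic}"
}

# Example (hypothetical credentials):
#   NTFY_USER=alertbot NTFY_PASS=secret ntfy_publish alerts "Auth test" -H "Priority: high"
```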

### Test Alert Rules

```bash
# Trigger a test alert by stopping a service temporarily
sudo systemctl stop node_exporter
# Wait for InstanceDown to fire (its rule uses "for: 1m"), then restart
sudo systemctl start node_exporter
```
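
A less disruptive option is to post a synthetic alert straight to Alertmanager's v2 API, which exercises routing, the bridge, and NTFY without stopping anything. This is a sketch under assumptions: the function names, the `AM_URL` variable, and the alert name in the example are made up for illustration.

```bash
# Build a one-element alert array in the format accepted by POST /api/v2/alerts.
# $1 = alertname, $2 = severity, $3 = instance
am_alert_json() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},' "$1" "$2" "$3"
  printf '"annotations":{"summary":"Synthetic test alert"}}]\n'
}

# Post the alert; AM_URL defaults to the Alertmanager address used in this document.
am_send_test_alert() {
  am_alert_json "$1" "$2" "$3" |
    curl -fsS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "${AM_URL:-http://localhost:9093}/api/v2/alerts"
}

# Example: am_send_test_alert NtfyPipelineTest warning test-host
```

Because the payload carries no end time, Alertmanager should resolve the alert on its own once its resolve timeout passes.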

### Verify Alert Flow

1. **Prometheus** scrapes metrics and evaluates rules
2. **Alertmanager** receives alerts and routes them
3. **NTFY Bridge** converts alerts to NTFY messages
4. **NTFY Server** delivers each message to the topic's subscribers
5. **Signal Bridge** forwards to Signal messenger (if configured)
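
To see where an alert stops in this chain, each stage can be queried in order. The sketch below uses the Prometheus and Alertmanager alert APIs and ntfy's JSON polling endpoint; `trace_alert_flow`, `NTFY_USER`, and `NTFY_PASS` are illustrative names, and the hosts are the ones used elsewhere in this document.

```bash
# Query each stage of the pipeline in order and print what it currently holds.
# Leave NTFY_USER unset to poll ntfy anonymously.
trace_alert_flow() {
  echo "== 1. Prometheus: pending and firing alerts"
  curl -fsS "http://localhost:9090/api/v1/alerts" || echo "(request failed)"
  echo
  echo "== 2. Alertmanager: alerts received from Prometheus"
  curl -fsS "http://localhost:9093/api/v2/alerts" || echo "(request failed)"
  echo
  echo "== 3. NTFY: cached messages on the alerts topic"
  curl -fsS ${NTFY_USER:+-u} ${NTFY_USER:+"$NTFY_USER:$NTFY_PASS"} \
    "http://homelab.vish.local:8080/alerts/json?poll=1" || echo "(request failed)"
  echo
}

# Example: NTFY_USER=alertbot NTFY_PASS=secret trace_alert_flow
```

An alert that appears in stage 1 but not stage 2 points at the Prometheus alerting configuration; one that appears in stage 2 but not stage 3 points at the bridge.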

## Troubleshooting

### Common Issues

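1. **Alerts not firing**: Check that the Prometheus targets are up and that the rule's `for` duration has elapsed
2. **Notifications not received**: Verify that Alertmanager can reach `http://ntfy-bridge:8080/alerts` and that the bridge can reach `http://ntfy:80`
3. **Signal not working**: Check that the Signal bridge has a registered or linked account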

### Useful Commands

```bash
# Check container status
docker ps | grep -E "(ntfy|alert|signal)"

# View logs
docker logs ntfy
docker logs ntfy-bridge
docker logs alertmanager

# Test connectivity
curl http://homelab.vish.local:8080/v1/health
curl http://localhost:9093/-/healthy
curl http://localhost:9090/-/healthy
```

### Log Locations

- **NTFY**: `docker logs ntfy`
- **Alertmanager**: `docker logs alertmanager`
- **Prometheus**: `docker logs prometheus`
- **NTFY Bridge**: `docker logs ntfy-bridge`

## Security Considerations

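1. **Authentication**: NTFY login is enabled and self-signup is disabled
2. **Network**: All services share the `alerting` Docker network; ports 8080, 8081, and 8082 are also published on the host
3. **Access Control**: Default access is `deny-all`, so each topic needs an explicit grant
4. **Rate Limiting**: Each visitor gets a burst of 60 requests, replenished at one request every 5 seconds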

## Backup and Recovery

### Important Files to Backup

- `/home/homelab/docker/monitoring/homelab_vm/ntfy/` - NTFY data
- `/home/homelab/docker/monitoring/alerting/` - Alert configurations
- `/home/homelab/docker/monitoring/prometheus/` - Prometheus config
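
One simple way to capture these directories is a dated tarball. The sketch below is an assumption about how a backup could be taken, not an existing job; `backup_notification_stack`, `STACK_ROOT`, and `BACKUP_DIR` are hypothetical names. Since the NTFY directory contains live database files, stopping the `ntfy` container first is the safer choice.

```bash
# Archive the three directories listed above into a dated tarball
# and print the path of the archive.
backup_notification_stack() {
  dest="${BACKUP_DIR:-/tmp}/ntfy-stack-$(date +%Y%m%d).tar.gz"
  tar -czf "$dest" -C "${STACK_ROOT:-/home/homelab/docker/monitoring}" \
    homelab_vm/ntfy alerting prometheus && echo "$dest"
}

# Example: BACKUP_DIR=/mnt/backups backup_notification_stack
```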

### Recovery Process

1. Restore configuration files
2. Restart containers from `/home/homelab/docker/monitoring`: `docker compose -f homelab_vm/alerting.yaml up -d`
3. Verify all services are healthy
4. Test notification flow

## Maintenance

### Regular Tasks

1. **Weekly**: Check alert rule effectiveness
2. **Monthly**: Review notification volumes
3. **Quarterly**: Update container images
4. **Annually**: Review and update alert thresholds

### Monitoring the Monitoring

- Monitor NTFY server uptime
- Track alert volume and patterns
- Verify notification delivery
- Check for false positives/negatives

---

- **Last Updated**: February 15, 2026
- **Maintainer**: Homelab Administrator
- **Version**: 1.0