Sanitized mirror from private repository - 2026-03-07 08:33:21 UTC

Tailscale Host Monitoring Status Report

📊 Current Status

Generated: February 15, 2026

Monitored Tailscale Targets (13 total, on 10 distinct hosts)

Online Targets (10)

  • atlantis-node (100.83.230.112:9100) - Synology NAS
  • atlantis-snmp (100.83.230.112) - SNMP monitoring
  • calypso-node (100.103.48.78:9100) - Node exporter
  • calypso-snmp (100.103.48.78) - SNMP monitoring
  • concord-nuc-node (100.72.55.21:9100) - Intel NUC
  • proxmox-node (100.87.12.28:9100) - Proxmox server
  • raspberry-pis (100.77.151.40:9100) - Pi cluster node
  • setillo-node (100.125.0.20:9100) - Node exporter
  • setillo-snmp (100.125.0.20) - SNMP monitoring
  • truenas-node (100.75.252.64:9100) - TrueNAS server

Offline Hosts (3)

  • homelab-node (100.67.40.126:9100) - Main homelab VM
  • raspberry-pis (100.123.246.75:9100) - Pi cluster node
  • vmi2076105-node (100.99.156.20:9100) - VPS instance
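
The online/offline split above could be reproduced from Prometheus's instant-query API by querying the `up` metric. The sketch below is illustrative only: `fetch_up`, `split_by_up`, and the base URL passed to `fetch_up` are hypothetical names, and the `example` response is hand-written in the shape the API returns rather than captured from this environment.

```python
import json
import urllib.parse
import urllib.request


def fetch_up(base_url: str, timeout: float = 5.0) -> dict:
    """Run the instant query `up` against a Prometheus server and return the decoded JSON.

    base_url is an assumption (for example "http://localhost:9090"); the report
    does not say where Prometheus is reachable.
    """
    query = urllib.parse.urlencode({"query": "up"})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{query}", timeout=timeout) as resp:
        return json.load(resp)


def split_by_up(response: dict) -> tuple[list[str], list[str]]:
    """Split an instant-query result for `up` into (online, offline) labels."""
    online, offline = [], []
    for sample in response["data"]["result"]:
        metric = sample["metric"]
        label = f'{metric.get("job", "?")} ({metric.get("instance", "?")})'
        # Each sample value is [unix_timestamp, "<value as string>"]; up is 1 or 0.
        _, value = sample["value"]
        (online if float(value) == 1.0 else offline).append(label)
    return sorted(online), sorted(offline)


# Hand-written response using two targets from this report (timestamp is arbitrary).
example = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "proxmox-node",
                        "instance": "100.87.12.28:9100"},
             "value": [1771142400, "1"]},
            {"metric": {"__name__": "up", "job": "homelab-node",
                        "instance": "100.67.40.126:9100"},
             "value": [1771142400, "0"]},
        ],
    },
}

online, offline = split_by_up(example)
```

Keeping the parsing in `split_by_up` separate from the HTTP call means the classification can be checked against a canned response without a live Prometheus server.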

🚨 Active Alerts

Critical HostDown Alerts (2 firing)

  1. vmi2076105-node (100.99.156.20:9100)

    • Status: Firing since Feb 14, 07:57 UTC
    • Duration: ~24 hours
    • Notifications: Sent to ntfy + Signal
  2. homelab-node (100.67.40.126:9100)

    • Status: Firing since Feb 14, 09:23 UTC
    • Duration: ~22 hours
    • Notifications: Sent to ntfy + Signal

📬 Notification System Status

Working Notification Channels

Test Results

  • ntfy notification test: PASSED
  • Message delivery: CONFIRMED
  • Alert routing: WORKING

⚙️ Monitoring Configuration

Alert Rules

  • Trigger: Host unreachable for 2+ minutes
  • Severity: Critical (dual-channel notifications)
  • Query: up{job=~".*-node"} == 0
  • Evaluation: Every 30 seconds
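
PromQL regex matchers are fully anchored, so `job=~".*-node"` only matches job labels that end in `-node`. The sketch below checks the names listed in this report against that pattern. It assumes those names are the Prometheus job labels, which the report does not state, and it uses Python's `re.fullmatch` as a stand-in for Prometheus's RE2 engine (the two agree for this simple pattern).

```python
import re

# Names as listed in this report; treating them as job labels is an assumption.
NAMES = [
    "atlantis-node", "atlantis-snmp", "calypso-node", "calypso-snmp",
    "concord-nuc-node", "proxmox-node", "raspberry-pis", "setillo-node",
    "setillo-snmp", "truenas-node", "homelab-node", "vmi2076105-node",
]

JOB_PATTERN = re.compile(r".*-node")


def covered_by_rule(job: str) -> bool:
    """True if the job label would match job=~".*-node" (fully anchored, as in PromQL)."""
    return JOB_PATTERN.fullmatch(job) is not None


covered = [name for name in NAMES if covered_by_rule(name)]
not_covered = [name for name in NAMES if not covered_by_rule(name)]
```

Under that assumption, `raspberry-pis` and the `-snmp` names fall outside the rule. That would be consistent with only two HostDown alerts firing while three targets are offline, though the report itself does not confirm this is the cause.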

Notification Routing

  • Warning alerts → ntfy only
  • Critical alerts → ntfy + Signal
  • Resolved alerts → Both channels
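
The routing list above can be read as a small decision function. This is a minimal sketch of that reading, not the actual routing configuration: `channels_for` and the lowercase channel names are hypothetical, and it takes "Resolved alerts → Both channels" literally, so a resolved warning also goes to Signal.

```python
def channels_for(severity: str, status: str = "firing") -> list[str]:
    """Return the notification channels for an alert, following the routing list."""
    if status == "resolved":
        # Resolved alerts go to both channels regardless of severity (literal reading).
        return ["ntfy", "signal"]
    if severity == "critical":
        return ["ntfy", "signal"]
    if severity == "warning":
        return ["ntfy"]
    # The report defines no route for other severities.
    raise ValueError(f"no route for severity {severity!r}")
```

For example, `channels_for("warning")` returns only `["ntfy"]`, while `channels_for("critical")` and `channels_for("warning", status="resolved")` both return the two channels.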

🔧 Infrastructure Details

Monitoring Stack

Data Collection

  • Node Exporter: System metrics on port 9100
  • SNMP Exporter: Network device metrics on port 9116
  • Scrape Interval: 15 seconds
  • Retention: Prometheus default (15 days unless overridden with --storage.tsdb.retention.time)

📋 Recommendations

Immediate Actions

  1. Investigate offline hosts:

    • Check homelab-node (100.67.40.126) - main VM down
    • Verify vmi2076105-node (100.99.156.20) - VPS status
    • Check raspberry-pis node (100.123.246.75)
  2. Verify notifications:

    • Confirm you're receiving ntfy alerts on mobile
    • Test Signal notifications for critical alerts
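
As a first step when investigating the offline hosts, a plain TCP check against the node exporter port can be run from any machine on the tailnet. The sketch below is one possible way to do that. `port_open` and `check_offline_hosts` are hypothetical helpers, and the check only reports whether the port accepts a connection; it does not distinguish a stopped exporter from an unreachable host.

```python
import socket


def port_open(host: str, port: int = 9100, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Offline targets as listed in this report.
OFFLINE = {
    "homelab-node": "100.67.40.126",
    "vmi2076105-node": "100.99.156.20",
    "raspberry-pis": "100.123.246.75",
}


def check_offline_hosts() -> dict[str, bool]:
    """Probe port 9100 on each offline target and map its name to the result."""
    return {name: port_open(ip) for name, ip in OFFLINE.items()}
```

Calling `check_offline_hosts()` returns a mapping such as `{"homelab-node": False, ...}`; a target that flips to `True` has its exporter port reachable again.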

Maintenance

  • Monitor disk space on active hosts
  • Review alert thresholds if needed
  • Consider adding more monitoring targets

🧪 Testing

Use the test script to verify monitoring:

./scripts/test-tailscale-monitoring.sh

For manual testing:

  1. Stop node_exporter on any host: sudo systemctl stop node_exporter
  2. Wait 2+ minutes for alert to fire
  3. Check ntfy app and Signal for notifications
  4. Restart: sudo systemctl start node_exporter
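
The 2+ minute wait in step 2 follows from how a hold period interacts with the 30-second evaluation interval. The sketch below simulates that, assuming the trigger is implemented as a 120-second hold period on the `up == 0` condition (the report does not show the rule definition). `first_firing_time` is a hypothetical helper, and the simulation ignores scrape timing, so the real delay can be slightly longer.

```python
from typing import Optional


def first_firing_time(up_samples: list[int],
                      eval_interval: int = 30,
                      hold_for: int = 120) -> Optional[int]:
    """Return seconds from the first evaluation until the alert fires, or None.

    up_samples[i] is the value of `up` seen at evaluation i (1 = reachable, 0 = down).
    """
    pending_since = None
    for i, up in enumerate(up_samples):
        now = i * eval_interval
        if up == 0:
            if pending_since is None:
                pending_since = now      # condition just became true: pending
            if now - pending_since >= hold_for:
                return now               # condition held for the full period: firing
        else:
            pending_since = None         # host answered again: reset the timer
    return None
```

With one healthy evaluation followed by continuous failures, `first_firing_time([1, 0, 0, 0, 0, 0])` returns 150: the alert goes pending at 30 seconds and fires 120 seconds later. A single successful evaluation in between resets the timer, which is why restarting node_exporter too early means no notification is sent.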

Last Updated: February 15, 2026
Next Review: Weekly or when infrastructure changes