Tailscale Host Monitoring Status Report
📊 Current Status
Generated: February 15, 2026
Monitored Tailscale Hosts (13 total)
✅ Online Hosts (10)
- atlantis-node (100.83.230.112:9100) - Synology NAS
- atlantis-snmp (100.83.230.112) - SNMP monitoring
- calypso-node (100.103.48.78:9100) - Node exporter
- calypso-snmp (100.103.48.78) - SNMP monitoring
- concord-nuc-node (100.72.55.21:9100) - Intel NUC
- proxmox-node (100.87.12.28:9100) - Proxmox server
- raspberry-pis (100.77.151.40:9100) - Pi cluster node
- setillo-node (100.125.0.20:9100) - Node exporter
- setillo-snmp (100.125.0.20) - SNMP monitoring
- truenas-node (100.75.252.64:9100) - TrueNAS server
❌ Offline Hosts (3)
- homelab-node (100.67.40.126:9100) - Main homelab VM
- raspberry-pis (100.123.246.75:9100) - Pi cluster node
- vmi2076105-node (100.99.156.20:9100) - VPS instance
🚨 Active Alerts
Critical HostDown Alerts (2 firing)
- vmi2076105-node (100.99.156.20:9100)
  - Status: Firing since Feb 14, 07:57 UTC
  - Duration: ~24 hours
  - Notifications: Sent to ntfy + Signal
- homelab-node (100.67.40.126:9100)
  - Status: Firing since Feb 14, 09:23 UTC
  - Duration: ~22 hours
  - Notifications: Sent to ntfy + Signal
📬 Notification System Status
✅ Working Notification Channels
- ntfy: http://192.168.0.210:8081/homelab-alerts ✅
- Signal: Via signal-bridge (critical alerts) ✅
- Alertmanager: http://100.67.40.126:9093 ✅
Test Results
- ntfy notification test: PASSED ✅
- Message delivery: CONFIRMED ✅
- Alert routing: WORKING ✅
⚙️ Monitoring Configuration
Alert Rules
- Trigger: Host unreachable for 2+ minutes
- Severity: Critical (dual-channel notifications)
- Query: up{job=~".*-node"} == 0
- Evaluation: Every 30 seconds
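Prometheus anchors regex label matchers at both ends, so the query above only selects jobs whose name ends in "-node". The sketch below checks the labels from the host list against that pattern. It assumes those labels are the actual Prometheus job names, which this report does not confirm; `covered_by_alert` is a hypothetical helper name.

```python
import re

JOB_PATTERN = r".*-node"  # the job matcher from the alert query

# Labels copied from the host list in this report. Treating them as
# Prometheus job names is an assumption.
LISTED = [
    "atlantis-node", "atlantis-snmp", "calypso-node", "calypso-snmp",
    "concord-nuc-node", "proxmox-node", "raspberry-pis", "setillo-node",
    "setillo-snmp", "truenas-node", "homelab-node", "vmi2076105-node",
]


def covered_by_alert(job: str, pattern: str = JOB_PATTERN) -> bool:
    """Return True if the job name matches the fully anchored pattern."""
    # re.fullmatch mirrors Prometheus's implicit ^...$ anchoring.
    return re.fullmatch(pattern, job) is not None


covered = [job for job in LISTED if covered_by_alert(job)]
not_covered = [job for job in LISTED if not covered_by_alert(job)]

print("covered by HostDown:", covered)
print("not covered:", not_covered)
```

If the job names do match the listed labels, the SNMP jobs and raspberry-pis fall outside the rule, which would be consistent with two HostDown alerts firing while three hosts are offline.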
Notification Routing
- Warning alerts → ntfy only
- Critical alerts → ntfy + Signal
- Resolved alerts → Both channels
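The routing rules above can be read as a small decision table. The following is an illustrative sketch only, not the actual Alertmanager configuration; the function name `channels_for` and the error raised for unlisted severities are assumptions.

```python
def channels_for(severity: str, status: str = "firing") -> list[str]:
    """Return the notification channels for an alert, per the routing list."""
    if status == "resolved":
        # Resolved alerts go to both channels.
        return ["ntfy", "signal"]
    if severity == "critical":
        return ["ntfy", "signal"]
    if severity == "warning":
        return ["ntfy"]
    # Assumption: the report defines no route for other severities.
    raise ValueError(f"no route defined for severity {severity!r}")


print(channels_for("critical"))
print(channels_for("warning"))
print(channels_for("warning", status="resolved"))
```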
🔧 Infrastructure Details
Monitoring Stack
- Prometheus: http://100.67.40.126:9090
- Grafana: http://100.67.40.126:3300
- Alertmanager: http://100.67.40.126:9093
- Bridge Services: ntfy-bridge (5001), signal-bridge (5000)
Data Collection
- Node Exporter: System metrics on port 9100
- SNMP Exporter: Network device metrics on port 9116
- Scrape Interval: 15 seconds
- Retention: Prometheus default (15 days unless overridden with --storage.tsdb.retention.time)
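Taken together, the scrape interval, the rule evaluation interval, and the 2-minute trigger give rough bounds on how long a host is down before HostDown fires. The sketch below works through that arithmetic. It assumes the trigger is implemented as a `for: 2m` clause and ignores scrape timeouts, Alertmanager grouping delays, and bridge latency, none of which are stated in this report.

```python
SCRAPE_INTERVAL_S = 15  # Data Collection: scrape interval
EVAL_INTERVAL_S = 30    # Alert Rules: evaluation interval
FOR_DURATION_S = 120    # Alert Rules: unreachable for 2+ minutes (assumed `for: 2m`)


def firing_delay_bounds(
    scrape_s: int = SCRAPE_INTERVAL_S,
    eval_s: int = EVAL_INTERVAL_S,
    for_s: int = FOR_DURATION_S,
) -> tuple[int, int]:
    """Return (best, worst) seconds from host failure to a firing alert."""
    # Best case: a scrape fails and a rule evaluation runs right after the
    # host goes down, so only the hold duration remains.
    best = for_s
    # Worst case: wait a full scrape interval for up == 0 to be recorded,
    # then a full evaluation interval before the alert turns pending.
    worst = scrape_s + eval_s + for_s
    return best, worst


best, worst = firing_delay_bounds()
print(f"alert fires between {best}s and {worst}s after the host goes down")
```

Under these assumptions the delay falls between 2 minutes and 2 minutes 45 seconds, which matches the "2+ minutes" wait used in manual testing.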
📋 Recommendations
Immediate Actions
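- Investigate offline hosts:
  - Check homelab-node (100.67.40.126) - main VM reported down; Prometheus and Alertmanager are served from the same address, so check node_exporter on port 9100 first
  - Verify vmi2076105-node (100.99.156.20) - VPS status
  - Check raspberry-pis node (100.123.246.75)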
- Verify notifications:
  - Confirm you're receiving ntfy alerts on mobile
  - Test Signal notifications for critical alerts
Maintenance
- Monitor disk space on active hosts
- Review alert thresholds if needed
- Consider adding more monitoring targets
🧪 Testing
Use the test script to verify monitoring:
./scripts/test-tailscale-monitoring.sh
For manual testing:
- Stop node_exporter on any host: sudo systemctl stop node_exporter
- Wait 2+ minutes for alert to fire
- Check ntfy app and Signal for notifications
- Restart: sudo systemctl start node_exporter
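While waiting for the alert during a manual test, it can help to confirm that Prometheus already sees the target as down. Below is a minimal sketch that runs the alert query through the Prometheus HTTP API (`/api/v1/query`) using only the standard library. The base URL comes from the Monitoring Stack section; the function names are hypothetical, and the script is assumed to run from a machine that can reach the tailnet.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS = "http://100.67.40.126:9090"  # from the Monitoring Stack section
QUERY = 'up{job=~".*-node"} == 0'         # the alert query from this report


def build_query_url(base: str, expr: str) -> str:
    """Return the instant-query URL for a PromQL expression."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def down_targets(payload: dict) -> list[tuple[str, str]]:
    """Extract sorted (job, instance) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return sorted(
        (series["metric"].get("job", "?"), series["metric"].get("instance", "?"))
        for series in payload["data"]["result"]
    )


def fetch_down_targets(base: str = PROMETHEUS, expr: str = QUERY) -> list[tuple[str, str]]:
    """Run the query against a live Prometheus and return the down targets."""
    with urllib.request.urlopen(build_query_url(base, expr), timeout=10) as resp:
        return down_targets(json.load(resp))
```

Calling `fetch_down_targets()` after stopping node_exporter should list the stopped host once the next scrape has failed, and the list should shrink again shortly after the restart.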
Last Updated: February 15, 2026
Next Review: Weekly or when infrastructure changes