Vish/homelab-optimized

Sanitized mirror from private repository - 2026-03-09 11:57:39 UTC

Tailscale Host Monitoring Status Report

📊 Current Status

Generated: February 15, 2026

Monitored Tailscale Hosts (13 total)

✅ Online Hosts (10)

atlantis-node (100.83.230.112:9100) - Synology NAS
atlantis-snmp (100.83.230.112) - SNMP monitoring
calypso-node (100.103.48.78:9100) - Node exporter
calypso-snmp (100.103.48.78) - SNMP monitoring
concord-nuc-node (100.72.55.21:9100) - Intel NUC
proxmox-node (100.87.12.28:9100) - Proxmox server
raspberry-pis (100.77.151.40:9100) - Pi cluster node
setillo-node (100.125.0.20:9100) - Node exporter
setillo-snmp (100.125.0.20) - SNMP monitoring
truenas-node (100.75.252.64:9100) - TrueNAS server

❌ Offline Hosts (3)

homelab-node (100.67.40.126:9100) - Main homelab VM
raspberry-pis (100.123.246.75:9100) - Pi cluster node
vmi2076105-node (100.99.156.20:9100) - VPS instance

🚨 Active Alerts

Critical HostDown Alerts (2 firing)

vmi2076105-node (100.99.156.20:9100)
- Status: Firing since Feb 14, 07:57 UTC
- Duration: ~24 hours
- Notifications: Sent to ntfy + Signal
homelab-node (100.67.40.126:9100)
- Status: Firing since Feb 14, 09:23 UTC
- Duration: ~22 hours
- Notifications: Sent to ntfy + Signal

📬 Notification System Status

✅ Working Notification Channels

ntfy: http://192.168.0.210:8081/homelab-alerts ✅
Signal: Via signal-bridge (critical alerts) ✅
Alertmanager: http://100.67.40.126:9093 ✅

Test Results

ntfy notification test: PASSED ✅
Message delivery: CONFIRMED ✅
Alert routing: WORKING ✅

⚙️ Monitoring Configuration

Alert Rules

Trigger: Host unreachable for 2+ minutes
Severity: Critical (dual-channel notifications)
Query: up{job=~".*-node"} == 0
Evaluation: Every 30 seconds
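
Prometheus regex matchers are fully anchored, so the selector in the query only covers jobs whose names end in -node. The sketch below mimics that with Python's re.fullmatch to show which labels from the host lists the rule would cover. It assumes those labels are the Prometheus job names, which this report does not state explicitly; for a pattern this simple, Python's re and Prometheus's RE2 engine behave the same.

```python
import re

# Selector taken from the alert query: up{job=~".*-node"} == 0
JOB_SELECTOR = r".*-node"

# Assumption: the labels in the host lists above are the Prometheus job names.
JOBS = [
    "atlantis-node", "atlantis-snmp", "calypso-node", "calypso-snmp",
    "concord-nuc-node", "proxmox-node", "raspberry-pis", "setillo-node",
    "setillo-snmp", "truenas-node", "homelab-node", "vmi2076105-node",
]


def covered_by_rule(job: str, selector: str = JOB_SELECTOR) -> bool:
    """Return True if the job matches; fullmatch mirrors Prometheus anchoring."""
    return re.fullmatch(selector, job) is not None


covered = [job for job in JOBS if covered_by_rule(job)]
not_covered = [job for job in JOBS if not covered_by_rule(job)]

print("covered:", covered)
print("not covered:", not_covered)
```

Under that assumption, the three -snmp jobs and raspberry-pis fall outside the HostDown rule.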

Notification Routing

Warning alerts → ntfy only
Critical alerts → ntfy + Signal
Resolved alerts → Both channels
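
The routing table above can be read as a small decision function. This is only a sketch of the logic as described here; the real routing presumably lives in the Alertmanager configuration and the bridge services, and the names channels_for, NTFY, and SIGNAL are hypothetical.

```python
from typing import List

NTFY = "ntfy"
SIGNAL = "signal"


def channels_for(severity: str, resolved: bool = False) -> List[str]:
    """Pick notification channels according to the routing table above."""
    if resolved:
        # Resolved alerts go to both channels
        return [NTFY, SIGNAL]
    if severity == "critical":
        return [NTFY, SIGNAL]
    if severity == "warning":
        return [NTFY]
    # Assumption: the report defines no route for other severities
    raise ValueError(f"no route defined for severity {severity!r}")
```

For example, channels_for("warning") yields only ntfy, while channels_for("warning", resolved=True) yields both channels.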

🔧 Infrastructure Details

Monitoring Stack

Prometheus: http://100.67.40.126:9090
Grafana: http://100.67.40.126:3300
Alertmanager: http://100.67.40.126:9093
Bridge Services: ntfy-bridge (port 5001), signal-bridge (port 5000)

Data Collection

Node Exporter: System metrics on port 9100
SNMP Exporter: Network device metrics on port 9116
Scrape Interval: 15 seconds
Retention: Prometheus default (15 days)
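
The scrape interval, together with the 30-second rule evaluation and the 2-minute trigger, gives rough bounds on how long a host can be down before HostDown fires. The sketch below does that arithmetic. It assumes the "2+ minutes" trigger corresponds to a `for: 2m` clause on the rule, and it ignores the scrape timeout and any Alertmanager grouping delay, so real notifications may arrive somewhat later. The Timing class and function name are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Timing:
    scrape_interval_s: int = 15  # "Scrape Interval: 15 seconds"
    eval_interval_s: int = 30    # "Evaluation: Every 30 seconds"
    for_duration_s: int = 120    # assumption: `for: 2m` behind "2+ minutes"


def firing_delay_bounds(t: Timing) -> Tuple[int, int]:
    """Return (best, worst) seconds from host failure to the alert firing."""
    # The alert can only switch to firing on an evaluation tick, so the
    # pending period is rounded up to a whole number of evaluation intervals.
    pending_s = math.ceil(t.for_duration_s / t.eval_interval_s) * t.eval_interval_s
    best = pending_s
    # Worst case: the failure happens just after a scrape, and the failed
    # scrape lands just after a rule evaluation.
    worst = t.scrape_interval_s + t.eval_interval_s + pending_s
    return best, worst


print(firing_delay_bounds(Timing()))  # (120, 165)
```

With the values in this report, the alert should start firing roughly 2 to 2.75 minutes after a host stops answering scrapes.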

📋 Recommendations

Immediate Actions

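Investigate offline hosts:
- Check homelab-node (100.67.40.126): the node_exporter target on port 9100 is down, yet Alertmanager at the same address is still responding, so check the exporter itself before assuming the whole VM is down
- Verify vmi2076105-node (100.99.156.20): confirm the VPS is up
- Check the raspberry-pis node (100.123.246.75): it is offline but has no firing alert, likely because the job name raspberry-pis does not match the rule's .*-node selector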
Verify notifications:
- Confirm you're receiving ntfy alerts on mobile
- Test Signal notifications for critical alerts
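
To verify the ntfy channel by hand, a test message can be published with a plain HTTP POST to the topic URL listed under Working Notification Channels. The sketch below builds such a request with Python's standard library. It assumes the endpoint is a standard ntfy server that accepts unauthenticated publishes and honours the Title and Priority headers; the function names are hypothetical, and the Signal path through signal-bridge is not covered because its API is not described in this report.

```python
import urllib.request

# Topic URL as listed in this report
NTFY_TOPIC_URL = "http://192.168.0.210:8081/homelab-alerts"


def build_ntfy_request(
    message: str,
    title: str = "Monitoring test",
    priority: str = "default",
    url: str = NTFY_TOPIC_URL,
) -> urllib.request.Request:
    """Build (but do not send) a publish request: body = message text."""
    return urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Title": title, "Priority": priority},
        method="POST",
    )


def send_test_notification(message: str) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_ntfy_request(message), timeout=10) as resp:
        return resp.status


# Run on a machine that can reach the ntfy server:
# send_test_notification("Test alert from the monitoring host")
```

A 200 status together with a push notification on the phone confirms the ntfy path end to end.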

Maintenance

Monitor disk space on active hosts
Review the 2-minute HostDown threshold if alerts prove too noisy or too slow
Consider adding more monitoring targets

🧪 Testing

Use the test script to verify monitoring:

./scripts/test-tailscale-monitoring.sh

For manual testing:

1. Stop node_exporter on any monitored host: sudo systemctl stop node_exporter
2. Wait 2+ minutes for the alert to fire
3. Check the ntfy app and Signal for notifications
4. Restart the exporter: sudo systemctl start node_exporter
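
While waiting for the alert during a manual test, it can help to confirm that Prometheus itself sees the target as down. The sketch below runs the alert query against the Prometheus HTTP API (/api/v1/query) and lists the matching targets. The base URL and query come from this report; the function names are hypothetical, and the code assumes the API is reachable without authentication.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://100.67.40.126:9090"  # from this report
DOWN_QUERY = 'up{job=~".*-node"} == 0'        # alert query from this report


def query_url(base: str = PROMETHEUS_URL, query: str = DOWN_QUERY) -> str:
    """Build the instant-query URL with the PromQL expression URL-encoded."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def down_targets(payload: dict) -> list:
    """Extract sorted (job, instance) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return sorted(
        (series["metric"].get("job", "?"), series["metric"].get("instance", "?"))
        for series in payload["data"]["result"]
    )


def fetch_down_targets() -> list:
    """Query Prometheus and return the targets currently reporting up == 0."""
    with urllib.request.urlopen(query_url(), timeout=10) as resp:
        return down_targets(json.load(resp))


# Run on a machine that can reach Prometheus:
# print(fetch_down_targets())
```

The stopped host should appear in the result within a scrape interval or so, and disappear again after node_exporter is restarted.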

Last Updated: February 15, 2026
Next Review: Weekly or when infrastructure changes
