Tailscale Host Monitoring Status Report

⚠️ Historical Snapshot: This document was generated on Feb 15, 2026. The alerts and offline status listed here are no longer current. For live node status, run tailscale status on the homelab VM or check Grafana at http://100.67.40.126:3000.

📊 Status Snapshot

Generated: February 15, 2026

Monitored Tailscale Hosts (13 total)

Online Hosts (10)

  • atlantis-node (100.83.230.112:9100) - Synology NAS
  • atlantis-snmp (100.83.230.112) - SNMP monitoring
  • calypso-node (100.103.48.78:9100) - Node exporter
  • calypso-snmp (100.103.48.78) - SNMP monitoring
  • concord-nuc-node (100.72.55.21:9100) - Intel NUC
  • proxmox-node (100.87.12.28:9100) - Proxmox server
  • raspberry-pis (100.77.151.40:9100) - Pi cluster node
  • setillo-node (100.125.0.20:9100) - Node exporter
  • setillo-snmp (100.125.0.20) - SNMP monitoring
  • truenas-node (100.75.252.64:9100) - TrueNAS server

Offline Hosts (3)

  • homelab-node (100.67.40.126:9100) - Main homelab VM
  • raspberry-pis (100.123.246.75:9100) - Pi cluster node
  • vmi2076105-node (100.99.156.20:9100) - VPS instance

🚨 Active Alerts

Critical HostDown Alerts (2 firing)

  1. vmi2076105-node (100.99.156.20:9100)

    • Status: Firing since Feb 14, 07:57 UTC
    • Duration: ~24 hours
    • Notifications: Sent to ntfy + Signal
  2. homelab-node (100.67.40.126:9100)

    • Status: Firing since Feb 14, 09:23 UTC
    • Duration: ~22 hours
    • Notifications: Sent to ntfy + Signal

📬 Notification System Status

Working Notification Channels

Test Results

  • ntfy notification test: PASSED
  • Message delivery: CONFIRMED
  • Alert routing: WORKING

⚙️ Monitoring Configuration

Alert Rules

  • Trigger: Host unreachable for 2+ minutes
  • Severity: Critical (dual-channel notifications)
  • Query: up{job=~".*-node"} == 0
  • Evaluation: Every 30 seconds
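The job matcher in the query is a regular expression, and Prometheus anchors label regexes at both ends, so `.*-node` selects only jobs whose name ends in `-node`. The following is a minimal sketch of which entries the HostDown query would cover. It assumes the host names listed in this report are the Prometheus `job` labels, which the report does not confirm.

```python
import re

# Names as they appear in this report. Treating them as the Prometheus
# `job` labels is an assumption, not something the report states.
JOBS = [
    "atlantis-node", "atlantis-snmp", "calypso-node", "calypso-snmp",
    "concord-nuc-node", "proxmox-node", "raspberry-pis", "setillo-node",
    "setillo-snmp", "truenas-node", "homelab-node", "vmi2076105-node",
]

# Prometheus label regexes are fully anchored, so re.fullmatch is the
# closest Python analogue of job=~".*-node".
JOB_PATTERN = re.compile(r".*-node")


def covered_by_hostdown(job: str) -> bool:
    """Return True if the HostDown query's job matcher selects this job."""
    return JOB_PATTERN.fullmatch(job) is not None


covered = [j for j in JOBS if covered_by_hostdown(j)]
not_covered = [j for j in JOBS if not covered_by_hostdown(j)]
print("covered:", covered)
print("not covered:", not_covered)
```

Under that assumption, raspberry-pis and the -snmp jobs fall outside the matcher. That would be consistent with three hosts being offline while only two HostDown alerts are firing.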

Notification Routing

  • Warning alerts → ntfy only
  • Critical alerts → ntfy + Signal
  • Resolved alerts → Both channels
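The routing rules above can be read as a small decision function. This is an illustrative sketch only: the function name and the string values are hypothetical, and the real routing lives in the alerting configuration, which this report does not show.

```python
def channels_for(severity: str, status: str = "firing") -> list[str]:
    """Return the notification channels for an alert, per the routing list.

    `severity` and `status` values are illustrative names, not taken from
    the actual alerting configuration.
    """
    # Resolved alerts go to both channels, read literally from the list above.
    if status == "resolved":
        return ["ntfy", "signal"]
    if severity == "critical":
        return ["ntfy", "signal"]
    if severity == "warning":
        return ["ntfy"]
    # The report defines no route for other severities, so refuse to guess.
    raise ValueError(f"no route defined for severity {severity!r}")


print(channels_for("warning"))              # ['ntfy']
print(channels_for("critical"))             # ['ntfy', 'signal']
print(channels_for("warning", "resolved"))  # ['ntfy', 'signal']
```

Whether a resolved warning really goes to Signal depends on how "Resolved alerts → Both channels" is implemented. The sketch follows the wording as written.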

🔧 Infrastructure Details

Monitoring Stack

Data Collection

  • Node Exporter: System metrics on port 9100
  • SNMP Exporter: Network device metrics on port 9116
  • Scrape Interval: 15 seconds
  • Retention: Prometheus default (15 days, unless --storage.tsdb.retention.time is set)

📋 Recommendations

Immediate Actions

  1. Investigate offline hosts:

    • Check homelab-node (100.67.40.126) - main VM down
    • Verify vmi2076105-node (100.99.156.20) - VPS status
    • Check raspberry-pis node (100.123.246.75)
  2. Verify notifications:

    • Confirm you're receiving ntfy alerts on mobile
    • Test Signal notifications for critical alerts
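A quick first check for the offline hosts is whether the node exporter port accepts TCP connections over the tailnet. The sketch below is one way to do that from any machine on the tailnet. The helper names `port_open` and `main` are hypothetical, and a refused or timed-out connection only shows that port 9100 is unreachable, not why.

```python
import socket

# Offline targets and IPs as listed in this report.
OFFLINE_TARGETS = {
    "homelab-node": "100.67.40.126",
    "vmi2076105-node": "100.99.156.20",
    "raspberry-pis": "100.123.246.75",
}


def port_open(host: str, port: int = 9100, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable routes.
        return False


def main() -> None:
    """Probe each offline target's node exporter port and print the result."""
    for name, ip in OFFLINE_TARGETS.items():
        state = "reachable" if port_open(ip) else "unreachable"
        print(f"{name} ({ip}:9100): {state}")
```

Call `main()` to run the probe. If the port is unreachable but the host answers `tailscale ping`, the problem is more likely the node_exporter service than the host itself.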

Maintenance

  • Monitor disk space on active hosts
  • Review alert thresholds if needed
  • Consider adding more monitoring targets

🧪 Testing

Use the test script to verify monitoring:

./scripts/test-tailscale-monitoring.sh

For manual testing:

  1. Stop node_exporter on any host: sudo systemctl stop node_exporter
  2. Wait 2+ minutes for alert to fire
  3. Check ntfy app and Signal for notifications
  4. Restart: sudo systemctl start node_exporter
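How long "2+ minutes" takes in practice depends on the scrape and evaluation intervals listed under Monitoring Configuration. The sketch below works through the arithmetic. It assumes the trigger is implemented as a `for: 2m` clause, and it ignores scrape timeouts and any Alertmanager grouping delay, none of which this report specifies.

```python
import math

SCRAPE_INTERVAL_S = 15  # "Scrape Interval: 15 seconds"
EVAL_INTERVAL_S = 30    # "Evaluation: Every 30 seconds"
FOR_DURATION_S = 120    # assumes "2+ minutes" maps to `for: 2m`


def firing_delay_bounds(scrape: int, evaluation: int, hold: int) -> tuple[int, int]:
    """Return (best, worst) seconds from exporter stop to the alert firing."""
    # An alert only changes state at an evaluation tick, so the hold period
    # effectively rounds up to a whole number of evaluation intervals.
    hold_ticks = math.ceil(hold / evaluation) * evaluation
    # Best case: the failed scrape and the first evaluation follow immediately.
    best = hold_ticks
    # Worst case: wait a full scrape interval for up=0 to be recorded, then a
    # full evaluation interval before the alert enters the pending state.
    worst = scrape + evaluation + hold_ticks
    return best, worst


best, worst = firing_delay_bounds(SCRAPE_INTERVAL_S, EVAL_INTERVAL_S, FOR_DURATION_S)
print(f"alert fires between {best}s and {worst}s after the exporter stops")
```

With the values in this report, the alert should fire roughly 2 to 3 minutes after step 1. Notification delivery adds to that.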

🟢 Verified Online Nodes (March 2026)

As of March 11, 2026, all 16 active nodes verified reachable via ping:

| Node | Tailscale IP | Role |
| --- | --- | --- |
| atlantis | 100.83.230.112 | Primary NAS, exit node |
| calypso | 100.103.48.78 | Secondary NAS, Headscale host |
| setillo | 100.125.0.20 | Remote NAS, Tucson |
| homelab | 100.67.40.126 | Main VM (this host) |
| pve | 100.87.12.28 | Proxmox hypervisor |
| vish-concord-nuc | 100.72.55.21 | Intel NUC, exit node |
| pi-5 | 100.77.151.40 | Raspberry Pi 5 |
| matrix-ubuntu | 100.85.21.51 | Atlantis VM |
| guava | 100.75.252.64 | TrueNAS Scale |
| jellyfish | 100.69.121.120 | Pi 5 media/NAS |
| gl-mt3000 | 100.126.243.15 | GL.iNet router (remote), SSH alias gl-mt3000 |
| gl-be3600 | 100.105.59.123 | GL.iNet router (Concord), exit node |
| homeassistant | 100.112.186.90 | HA Green (via GL-MT3000 subnet) |
| seattle | 100.82.197.124 | Contabo VPS, exit node |
| shinku-ryuu | 100.98.93.15 | Desktop workstation (Windows) |
| moon | 100.64.0.6 | Debian x86_64, GL-MT3000 subnet (192.168.12.223) |
| headscale-test | 100.64.0.1 | Headscale test node |
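The table above records a one-time ping check. The sketch below is one way to repeat it from the JSON output of `tailscale status --json` instead of ping. The field names (`Self`, `Peer`, `HostName`, `Online`) reflect that output as commonly documented and should be verified against the installed client version. The helper names are hypothetical.

```python
import json
import subprocess


def parse_nodes(status: dict) -> dict[str, bool]:
    """Map each node's HostName to its Online flag, including the local node."""
    nodes = list((status.get("Peer") or {}).values())
    if status.get("Self"):
        nodes.append(status["Self"])
    return {n.get("HostName", "?"): bool(n.get("Online")) for n in nodes}


def live_status() -> dict[str, bool]:
    """Run the tailscale CLI on this machine and parse its JSON status."""
    result = subprocess.run(
        ["tailscale", "status", "--json"],
        check=True, capture_output=True, text=True,
    )
    return parse_nodes(json.loads(result.stdout))


# Hand-written sample in the assumed shape, not real output from this tailnet.
sample = {
    "Self": {"HostName": "homelab", "Online": True},
    "Peer": {
        "key1": {"HostName": "atlantis", "Online": True},
        "key2": {"HostName": "moon", "Online": False},
    },
}
print(parse_nodes(sample))
```

`Online` reflects what the coordination server last reported. A ping remains the stronger check for actual reachability.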

Notes

  • moon was migrated from public Tailscale (dvish92@) to Headscale on 2026-03-14. It is on the 192.168.12.0/24 subnet behind the GL-MT3000 router. accept_routes=true is enabled so it can reach 192.168.0.0/24 (home LAN) via Calypso's subnet advertisement.
  • guava has accept_routes=false to prevent Calypso's 192.168.0.0/24 route from overriding its own LAN replies. See docs/troubleshooting/guava-smb-incident-2026-03-14.md.
  • shinku-ryuu also has accept_routes=false for the same reason.
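The difference between the moon and guava settings comes down to whether a host's own LAN address falls inside the advertised route. The sketch below illustrates this with the standard `ipaddress` module. The report does not give guava's LAN address, so `HOME_LAN_HOST_IP` is a hypothetical stand-in for any host sitting directly on the home LAN.

```python
import ipaddress

# Calypso's subnet advertisement, from the notes above.
ADVERTISED_ROUTE = ipaddress.ip_network("192.168.0.0/24")
# moon's LAN address, from the node table.
MOON_LAN_IP = ipaddress.ip_address("192.168.12.223")
# Hypothetical address for a host on the home LAN (not from the report).
HOME_LAN_HOST_IP = ipaddress.ip_address("192.168.0.50")


def route_covers_own_lan(lan_ip, route) -> bool:
    """Return True if the advertised route includes the host's own LAN address."""
    return lan_ip in route


print("moon:", route_covers_own_lan(MOON_LAN_IP, ADVERTISED_ROUTE))
print("home LAN host:", route_covers_own_lan(HOME_LAN_HOST_IP, ADVERTISED_ROUTE))
```

A host outside the advertised range, like moon, can accept the route safely because it has no other path to 192.168.0.0/24. A host already inside that range risks sending LAN replies through the tunnel, which matches the reason given for disabling accept_routes on guava and shinku-ryuu.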

Last Updated: March 2026
Note: The Feb 2026 alerts (homelab-node and vmi2076105-node offline) were resolved. Both nodes are now online.