homelab-optimized/docs/admin/tailscale-monitoring-status.md
Sanitized mirror from private repository - 2026-04-19 09:52:01 UTC

Tailscale Host Monitoring Status Report

⚠️ Historical Snapshot: This document was generated on Feb 15, 2026. The alerts and offline status listed here are no longer current. For live node status, run tailscale status on the homelab VM or check Grafana at http://100.67.40.126:3000.

📊 Status Snapshot

Generated: February 15, 2026

Monitored Tailscale Hosts (13 total)

Online Hosts (10)

  • atlantis-node (100.83.230.112:9100) - Synology NAS
  • atlantis-snmp (100.83.230.112) - SNMP monitoring
  • calypso-node (100.103.48.78:9100) - Node exporter
  • calypso-snmp (100.103.48.78) - SNMP monitoring
  • concord-nuc-node (100.72.55.21:9100) - Intel NUC
  • proxmox-node (100.87.12.28:9100) - Proxmox server
  • raspberry-pis (100.77.151.40:9100) - Pi cluster node
  • setillo-node (100.125.0.20:9100) - Node exporter
  • setillo-snmp (100.125.0.20) - SNMP monitoring
  • truenas-node (100.75.252.64:9100) - TrueNAS server

Offline Hosts (3)

  • homelab-node (100.67.40.126:9100) - Main homelab VM
  • raspberry-pis (100.123.246.75:9100) - Pi cluster node
  • vmi2076105-node (100.99.156.20:9100) - VPS instance

🚨 Active Alerts

Critical HostDown Alerts (2 firing)

  1. vmi2076105-node (100.99.156.20:9100)

    • Status: Firing since Feb 14, 07:57 UTC
    • Duration: ~24 hours
    • Notifications: Sent to ntfy + Signal
  2. homelab-node (100.67.40.126:9100)

    • Status: Firing since Feb 14, 09:23 UTC
    • Duration: ~22 hours
    • Notifications: Sent to ntfy + Signal

📬 Notification System Status

Working Notification Channels

Test Results

  • ntfy notification test: PASSED
  • Message delivery: CONFIRMED
  • Alert routing: WORKING

⚙️ Monitoring Configuration

Alert Rules

  • Trigger: Host unreachable for 2+ minutes
  • Severity: Critical (dual-channel notifications)
  • Query: up{job=~".*-node"} == 0
  • Evaluation: Every 30 seconds
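Prometheus anchors regex matchers at both ends, so the selector in the query above only covers jobs whose entire name ends in -node. The sketch below mimics that behaviour with grep -Ex so a job name can be checked against the pattern. It assumes the host names listed in this report are the Prometheus job labels, which the report does not confirm, and matches_hostdown_selector is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: does a job label fall under the selector up{job=~".*-node"}?
# Prometheus regex matchers are fully anchored, so grep -x (whole-line
# match) is used to imitate that. Assumption: the names tested below
# are the actual job labels in the Prometheus config.

matches_hostdown_selector() {
    # $1: job label to test; returns 0 when the anchored pattern matches
    printf '%s\n' "$1" | grep -Eqx '.*-node'
}

for job in homelab-node vmi2076105-node raspberry-pis atlantis-snmp; do
    if matches_hostdown_selector "$job"; then
        echo "$job: covered"
    else
        echo "$job: not covered"
    fi
done
```

Under that assumption, raspberry-pis and the -snmp jobs fall outside the rule, which would be consistent with three hosts being offline while only two HostDown alerts were firing.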

Notification Routing

  • Warning alerts → ntfy only
  • Critical alerts → ntfy + Signal
  • Resolved alerts → Both channels
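The routing rules above can be read as a small lookup table. The following sketch expresses them as a shell function. It is an illustration of the rules as written, not the actual Alertmanager configuration, and channels_for is a hypothetical helper. It also assumes that "Resolved alerts → Both channels" applies regardless of the original severity.

```shell
#!/bin/sh
# Sketch of the notification routing as a lookup.
# channels_for is hypothetical: $1 = severity, $2 = status (firing|resolved).

channels_for() {
    severity=$1
    status=${2:-firing}
    if [ "$status" = "resolved" ]; then
        echo "ntfy signal"      # resolved alerts go to both channels
        return 0
    fi
    case "$severity" in
        warning)  echo "ntfy" ;;
        critical) echo "ntfy signal" ;;
        *)        echo "unrouted: $severity" >&2; return 1 ;;
    esac
}

channels_for warning            # prints: ntfy
channels_for critical           # prints: ntfy signal
channels_for warning resolved   # prints: ntfy signal
```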

🔧 Infrastructure Details

Monitoring Stack

Data Collection

  • Node Exporter: System metrics on port 9100
  • SNMP Exporter: Network device metrics on port 9116
  • Scrape Interval: 15 seconds
  • Retention: Default Prometheus retention
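Taken together, the 15-second scrape interval, the 30-second rule evaluation and the 2-minute trigger give a rough bound on how long a host can be down before HostDown fires. The arithmetic below is a sketch that assumes the 2-minute threshold is implemented as a for: 2m clause on the rule. It ignores scrape timeouts and any Alertmanager grouping delay.

```shell
#!/bin/sh
# Rough detection-latency estimate from the intervals listed in this report.
scrape_interval=15   # seconds between scrapes
eval_interval=30     # seconds between rule evaluations
for_duration=120     # assumed "for: 2m" hold time before the alert fires

# Best case: the failed scrape and the next evaluation happen almost at once.
best_case=$for_duration
# Worst case: a full scrape interval and a full evaluation interval pass first.
worst_case=$((scrape_interval + eval_interval + for_duration))

echo "best case:  ${best_case}s"
echo "worst case: ${worst_case}s"
```

With these values the estimate is 120 to 165 seconds between a host going down and the alert firing.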

📋 Recommendations

Immediate Actions

  1. Investigate offline hosts:

    • Check homelab-node (100.67.40.126) - main VM down
    • Verify vmi2076105-node (100.99.156.20) - VPS status
    • Check raspberry-pis node (100.123.246.75)
  2. Verify notifications:

    • Confirm you're receiving ntfy alerts on mobile
    • Test Signal notifications for critical alerts
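For the first action above, a quick TCP probe of port 9100 tells whether node_exporter is reachable over Tailscale before logging in to a host. This is a minimal sketch: probe_exporter is a hypothetical helper, and it assumes bash (for /dev/tcp) and coreutils timeout are available on the machine running it. The addresses are the three offline hosts from this report.

```shell
#!/bin/sh
# Sketch: check whether a node_exporter port accepts TCP connections.
# probe_exporter is hypothetical; it relies on bash's /dev/tcp and on
# coreutils timeout being installed.

probe_exporter() {
    host=$1
    port=${2:-9100}
    # bash opens a TCP connection via /dev/tcp; timeout caps the wait at 2 seconds
    timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

for host in 100.67.40.126 100.99.156.20 100.123.246.75; do
    if probe_exporter "$host"; then
        echo "$host:9100 reachable"
    else
        echo "$host:9100 unreachable"
    fi
done
```

A successful connection only shows that the port is open. It does not confirm that Prometheus is scraping the target.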

Maintenance

  • Monitor disk space on active hosts
  • Review alert thresholds if needed
  • Consider adding more monitoring targets

🧪 Testing

Use the test script to verify monitoring:

./scripts/test-tailscale-monitoring.sh

For manual testing:

  1. Stop node_exporter on any host: sudo systemctl stop node_exporter
  2. Wait 2+ minutes for alert to fire
  3. Check ntfy app and Signal for notifications
  4. Restart: sudo systemctl start node_exporter

🟢 Verified Online Nodes (March 2026)

As of March 11, 2026, all 16 active nodes were verified reachable via ping:

| Node | Tailscale IP | Role |
|------|--------------|------|
| atlantis | 100.83.230.112 | Primary NAS, exit node |
| calypso | 100.103.48.78 | Secondary NAS, Headscale host |
| setillo | 100.125.0.20 | Remote NAS, Tucson |
| homelab | 100.67.40.126 | Main VM (this host) |
| pve | 100.87.12.28 | Proxmox hypervisor |
| vish-concord-nuc | 100.72.55.21 | Intel NUC, exit node |
| pi-5 | 100.77.151.40 | Raspberry Pi 5 |
| matrix-ubuntu | 100.85.21.51 | Atlantis VM |
| guava | 100.75.252.64 | TrueNAS Scale |
| jellyfish | 100.69.121.120 | Pi 5 media/NAS |
| gl-mt3000 | 100.126.243.15 | GL.iNet Beryl AX (travel router, repeater behind GL-MT3600BE, exit node) |
| gl-be3600 | 100.105.59.123 | GL.iNet Slate 7 (travel router, exit node) |
| gl-mt3600be | 100.64.0.10 | GL.iNet Beryl 7 (remote primary gateway, subnet + exit node) |
| homeassistant | 100.112.186.90 | HA Green (via remote subnet, behind GL-MT3600BE) |
| seattle | 100.82.197.124 | Contabo VPS, exit node |
| shinku-ryuu | 100.98.93.15 | Desktop workstation (Windows) |
| moon | 100.64.0.6 | Debian x86_64, remote subnet (192.168.12.223, behind GL-MT3600BE) |
| jellyfish | 100.69.121.120 | Remote workstation (behind GL-MT3600BE) |
| headscale-test | 100.64.0.1 | Headscale test node |

Notes

  • moon was migrated from public Tailscale (dvish92@) to Headscale on 2026-03-14. It sits on the 192.168.12.0/24 subnet behind the GL-MT3600BE (Beryl 7) router, which replaced the GL-MT3000 on 2026-04-16. accept_routes=true is enabled so that moon can reach 192.168.0.0/24 (home LAN) via Calypso's subnet advertisement.
  • guava has accept_routes=false to prevent Calypso's 192.168.0.0/24 route from overriding its own LAN replies. See docs/troubleshooting/guava-smb-incident-2026-03-14.md.
  • shinku-ryuu also has accept_routes=false for the same reason.

Last Updated: April 2026
Note: The Feb 2026 alerts (homelab-node and vmi2076105-node offline) were resolved. Both nodes are now online.