Vish/homelab-optimized

Sanitized mirror from private repository - 2026-04-05 10:17:38 UTC

Tailscale Host Monitoring Status Report

⚠️ Historical Snapshot: This document was generated on Feb 15, 2026. The alerts and offline status listed here are no longer current. For live node status, run `tailscale status` on the homelab VM or check Grafana at http://100.67.40.126:3000.

📊 Status Snapshot

Generated: February 15, 2026

Monitored Tailscale Hosts (13 total)

✅ Online Hosts (10)

atlantis-node (100.83.230.112:9100) - Synology NAS
atlantis-snmp (100.83.230.112) - SNMP monitoring
calypso-node (100.103.48.78:9100) - Node exporter
calypso-snmp (100.103.48.78) - SNMP monitoring
concord-nuc-node (100.72.55.21:9100) - Intel NUC
proxmox-node (100.87.12.28:9100) - Proxmox server
raspberry-pis (100.77.151.40:9100) - Pi cluster node
setillo-node (100.125.0.20:9100) - Node exporter
setillo-snmp (100.125.0.20) - SNMP monitoring
truenas-node (100.75.252.64:9100) - TrueNAS server

❌ Offline Hosts (3)

homelab-node (100.67.40.126:9100) - Main homelab VM
raspberry-pis (100.123.246.75:9100) - Pi cluster node
vmi2076105-node (100.99.156.20:9100) - VPS instance

🚨 Active Alerts

Critical HostDown Alerts (2 firing)

vmi2076105-node (100.99.156.20:9100)
- Status: Firing since Feb 14, 07:57 UTC
- Duration: ~24 hours
- Notifications: Sent to ntfy + Signal
homelab-node (100.67.40.126:9100)
- Status: Firing since Feb 14, 09:23 UTC
- Duration: ~22 hours
- Notifications: Sent to ntfy + Signal

📬 Notification System Status

✅ Working Notification Channels

ntfy: http://192.168.0.210:8081/homelab-alerts ✅
Signal: Via signal-bridge (critical alerts) ✅
Alertmanager: http://100.67.40.126:9093 ✅

Test Results

ntfy notification test: PASSED ✅
Message delivery: CONFIRMED ✅
Alert routing: WORKING ✅

⚙️ Monitoring Configuration

Alert Rules

Trigger: Host unreachable for 2+ minutes
Severity: Critical (dual-channel notifications)
Query: up{job=~".*-node"} == 0
Evaluation: Every 30 seconds
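
As a rough illustration of which hosts the alert query covers, the sketch below applies the `.*-node` matcher to the names from the host lists above. It assumes those names are the Prometheus `job` label values, which the report does not state explicitly. Prometheus anchors regex matchers at both ends, so `re.fullmatch` is used as the nearest Python equivalent. The helper names `covered_jobs` and `uncovered_jobs` are hypothetical.

```python
import re

# Names as they appear in the Feb 15 host lists; ASSUMED to be the
# Prometheus `job` label values.
JOBS = [
    "atlantis-node", "atlantis-snmp", "calypso-node", "calypso-snmp",
    "concord-nuc-node", "proxmox-node", "raspberry-pis", "setillo-node",
    "setillo-snmp", "truenas-node", "homelab-node", "vmi2076105-node",
]

JOB_MATCHER = ".*-node"  # from the alert query: up{job=~".*-node"} == 0


def covered_jobs(jobs, pattern=JOB_MATCHER):
    """Return the jobs selected by a Prometheus-style `=~` matcher.

    Prometheus regex matchers are fully anchored, so re.fullmatch is the
    closest Python equivalent for a simple pattern like this one.
    """
    return [job for job in jobs if re.fullmatch(pattern, job)]


def uncovered_jobs(jobs, pattern=JOB_MATCHER):
    """Return the jobs the matcher skips, i.e. those outside this rule."""
    return [job for job in jobs if not re.fullmatch(pattern, job)]


if __name__ == "__main__":
    print("covered:  ", covered_jobs(JOBS))
    print("uncovered:", uncovered_jobs(JOBS))
```

Under that assumption, `raspberry-pis` and the `-snmp` jobs fall outside the rule, which would be consistent with three hosts being offline while only two HostDown alerts fire.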

Notification Routing

Warning alerts → ntfy only
Critical alerts → ntfy + Signal
Resolved alerts → Both channels
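
The routing rules above can be restated as a small lookup. This is only a sketch of the documented behavior, not the actual Alertmanager or bridge configuration; the function name `channels_for` and the channel identifiers are hypothetical.

```python
def channels_for(severity, resolved=False):
    """Pick notification channels for an alert.

    severity: "warning" or "critical" (the two levels named in the report).
    resolved: True for a resolution notice, which goes to both channels.
    """
    if resolved or severity == "critical":
        return ["ntfy", "signal"]
    if severity == "warning":
        return ["ntfy"]
    # The report does not say how other severities are routed.
    raise ValueError(f"no route documented for severity {severity!r}")


if __name__ == "__main__":
    print(channels_for("warning"))
    print(channels_for("critical"))
    print(channels_for("warning", resolved=True))
```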

🔧 Infrastructure Details

Monitoring Stack

Prometheus: http://100.67.40.126:9090
Grafana: http://100.67.40.126:3000
Alertmanager: http://100.67.40.126:9093
Bridge Services: ntfy-bridge (5001), signal-bridge (5000)

Data Collection

Node Exporter: System metrics on port 9100
SNMP Exporter: Network device metrics on port 9116
Scrape Interval: 15 seconds
Retention: Prometheus default (15 days)

📋 Recommendations

Immediate Actions

Investigate offline hosts:
- Check homelab-node (100.67.40.126) - main VM down
- Verify vmi2076105-node (100.99.156.20) - VPS status
- Check raspberry-pis node (100.123.246.75)
Verify notifications:
- Confirm you're receiving ntfy alerts on mobile
- Test Signal notifications for critical alerts

Maintenance

Monitor disk space on active hosts
Review alert thresholds if needed
Consider adding more monitoring targets

🧪 Testing

Use the test script to verify monitoring:

./scripts/test-tailscale-monitoring.sh

For manual testing:

1. Stop node_exporter on any host: `sudo systemctl stop node_exporter`
2. Wait 2+ minutes for the alert to fire
3. Check the ntfy app and Signal for notifications
4. Restart: `sudo systemctl start node_exporter`
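
While waiting for the alert, the down state can also be checked directly in Prometheus. The sketch below runs the alert query through the Prometheus HTTP API endpoint `/api/v1/query`, using the Prometheus URL listed under Monitoring Stack. The function names `down_targets` and `fetch_down_targets` are hypothetical, and the snippet assumes the API is reachable without authentication.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://100.67.40.126:9090"  # from "Monitoring Stack"
DOWN_QUERY = 'up{job=~".*-node"} == 0'        # from "Alert Rules"


def down_targets(payload):
    """Extract (job, instance) pairs from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return [
        (s["metric"].get("job", ""), s["metric"].get("instance", ""))
        for s in payload["data"]["result"]
    ]


def fetch_down_targets(base_url=PROMETHEUS_URL, query=DOWN_QUERY, timeout=5):
    """Run the query against Prometheus (assumes no auth is required)."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return down_targets(json.load(resp))


if __name__ == "__main__":
    for job, instance in fetch_down_targets():
        print(f"DOWN {job} {instance}")
```

An empty result means every matched target is currently being scraped successfully. The stopped host should show up here within roughly one scrape interval, before the alert itself fires.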

🟢 Verified Online Nodes (March 2026)

As of March 11, 2026, all 16 active nodes were verified reachable via ping:

| Node | Tailscale IP | Role |
| --- | --- | --- |
| atlantis | 100.83.230.112 | Primary NAS, exit node |
| calypso | 100.103.48.78 | Secondary NAS, Headscale host |
| setillo | 100.125.0.20 | Remote NAS, Tucson |
| homelab | 100.67.40.126 | Main VM (this host) |
| pve | 100.87.12.28 | Proxmox hypervisor |
| vish-concord-nuc | 100.72.55.21 | Intel NUC, exit node |
| pi-5 | 100.77.151.40 | Raspberry Pi 5 |
| matrix-ubuntu | 100.85.21.51 | Atlantis VM |
| guava | 100.75.252.64 | TrueNAS Scale |
| jellyfish | 100.69.121.120 | Pi 5 media/NAS |
| gl-mt3000 | 100.126.243.15 | GL.iNet router (remote), SSH alias `gl-mt3000` |
| gl-be3600 | 100.105.59.123 | GL.iNet router (Concord), exit node |
| homeassistant | 100.112.186.90 | HA Green (via GL-MT3000 subnet) |
| seattle | 100.82.197.124 | Contabo VPS, exit node |
| shinku-ryuu | 100.98.93.15 | Desktop workstation (Windows) |
| moon | 100.64.0.6 | Debian x86_64, GL-MT3000 subnet (`192.168.12.223`) |
| headscale-test | 100.64.0.1 | Headscale test node |
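
A ping sweep like the one behind the table above could be scripted roughly as follows. This is a sketch, not the procedure actually used: it lists only three of the rows, assumes the Linux iputils `ping` flags (`-c` for count, `-W` for timeout in seconds), and the names `ping_once` and `unreachable` are hypothetical.

```python
import subprocess

# A subset of the table above; extend with the remaining rows as needed.
NODES = {
    "atlantis": "100.83.230.112",
    "calypso": "100.103.48.78",
    "homelab": "100.67.40.126",
}


def ping_once(ip, timeout_s=2):
    """Send one ICMP echo with the system ping; ASSUMES Linux iputils flags."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable(nodes, probe=ping_once):
    """Return the names of nodes whose probe fails, in insertion order."""
    return [name for name, ip in nodes.items() if not probe(ip)]


if __name__ == "__main__":
    down = unreachable(NODES)
    print("all reachable" if not down else f"unreachable: {', '.join(down)}")
```

Passing the probe as a parameter keeps the sweep logic testable without network access, and would allow a different check to be swapped in for hosts that drop ICMP.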

Notes

moon was migrated from public Tailscale (dvish92@) to Headscale on 2026-03-14. It is on the 192.168.12.0/24 subnet behind the GL-MT3000 router. accept_routes=true is enabled so it can reach 192.168.0.0/24 (home LAN) via Calypso's subnet advertisement.
guava has accept_routes=false to prevent Calypso's 192.168.0.0/24 route from overriding its own LAN replies. See docs/troubleshooting/guava-smb-incident-2026-03-14.md.
shinku-ryuu also has accept_routes=false for the same reason.
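
The reasoning behind the differing accept_routes settings can be sketched with a subnet membership check. The example uses moon's LAN address from the table and, as a stand-in for a host on the home LAN, the ntfy endpoint address from this report (192.168.0.210); the LAN addresses of guava and shinku-ryuu are not given here, so they are not used. The function name `needs_subnet_route` is hypothetical.

```python
import ipaddress

HOME_LAN = ipaddress.ip_network("192.168.0.0/24")        # advertised by Calypso
GL_MT3000_LAN = ipaddress.ip_network("192.168.12.0/24")  # behind the GL-MT3000


def needs_subnet_route(local_ip, destination_net=HOME_LAN):
    """True when local_ip is outside destination_net.

    A host outside the network can only reach it through an accepted route.
    A host already inside it is directly attached, and an accepted route for
    the same prefix may override its own LAN path.
    """
    return ipaddress.ip_address(local_ip) not in destination_net


if __name__ == "__main__":
    # moon sits behind the GL-MT3000, so it needs the advertised route.
    print("moon:", needs_subnet_route("192.168.12.223"))
    # A host already on the home LAN does not.
    print("home LAN host:", needs_subnet_route("192.168.0.210"))
```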

Last Updated: March 2026
Note: The Feb 2026 alerts (homelab-node and vmi2076105-node offline) were resolved. Both nodes are now online.
