# Tailscale Mesh Connectivity Test

Last tested: 2026-03-31

## Test Method

A full `tailscale ping` sweep was run from each SSH-accessible host to every other online Headscale node. This tests WireGuard tunnel establishment and reports whether each connection is direct (peer-to-peer) or relayed via DERP.
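
To make the direct-versus-relayed distinction concrete, here is a minimal Python sketch that classifies `tailscale ping` output lines into the `D`/`R` codes used in the matrix. It assumes the usual `pong from <host> (<ip>) via <path> in <latency>` line format, where a relayed path appears as `DERP(<region>)`. The function names and the sample addresses in the comments are illustrative and were not part of the original test run.

```python
import re
from typing import Optional

# Illustrative lines this pattern is meant to match (addresses are made up):
#   pong from atlantis (100.64.0.1) via DERP(sea) in 45ms
#   pong from atlantis (100.64.0.1) via 192.168.0.200:41641 in 2ms
PONG_RE = re.compile(
    r"^pong from (?P<host>\S+) \((?P<ip>[^)]+)\) via (?P<path>\S+) in (?P<latency>\S+)$"
)


def classify_pong(line: str) -> Optional[str]:
    """Return 'R' for a DERP-relayed pong, 'D' for a direct one, None for other lines."""
    match = PONG_RE.match(line.strip())
    if match is None:
        return None
    return "R" if match.group("path").startswith("DERP(") else "D"


def final_path(output: str) -> Optional[str]:
    """Classify the last pong in a `tailscale ping` transcript (the settled path)."""
    results = [c for c in map(classify_pong, output.splitlines()) if c is not None]
    return results[-1] if results else None
```

Using the last pong rather than the first reflects that a session may start on a relay and then upgrade to a direct path while the ping is still running.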

## Results

All 10 online hosts can reach all other online hosts. No failures.

### Connection Type Matrix

`D` = direct, `R` = DERP relay, `—` = self

| From \ To | Atlantis | Calypso | Pi-5 | Homelab-VM | Matrix-Ubuntu | Setillo | NUC | Guava | Seattle | PVE |
|-----------|----------|---------|------|------------|---------------|---------|-----|-------|---------|-----|
| **Atlantis** | — | D | D | D | D | D | D | D | D | D |
| **Calypso** | D | — | D | D | D | R | D | D | D | D |
| **Pi-5** | D | D | — | D | D | D | D | D | D | D |
| **Homelab-VM** | D | D | D | — | D | R | D | D | D | D |
| **Matrix-Ubuntu** | (tested inbound) | | | | — | | | | | |
| **Setillo** | D | D | D | R | R | — | | | | |
| **NUC** | D | D | D | D | R | D | — | R | D | R |
| **Guava** | (no CLI) | | | | | | | — | | |
| **Seattle** | D | D | D | D | D | D | R | D | — | D |
| **PVE** | D | D | D | D | D | D | D | D | D | — |

### Notes

- **Atlantis/Calypso**: Tailscale binary at `/var/packages/Tailscale/target/bin/tailscale` (Synology package).
- **Setillo**: Tailscale binary at `/usr/local/bin/tailscale`.
- **Guava**: Tailscale runs via TrueNAS built-in management, so there is no `tailscale` CLI in PATH. Confirmed reachable via inbound pings from all other hosts.
- **Matrix-Ubuntu**: SSH via the LAN IP (192.168.0.154) was unreliable during testing because of table 52 LAN interception (since fixed). Confirmed reachable via its Tailscale IP and inbound pings.
- **DERP relays** (NUC ↔ some peers, Setillo ↔ Homelab-VM/Matrix-Ubuntu): Normal for nodes behind different NATs. A relay adds 15-60 ms of latency but does not affect reliability. Connections may upgrade to direct over time.

### ICMP Ping Notes

Standard ICMP ping from the OS (not `tailscale ping`) fails for Atlantis, Calypso, and Setillo because those hosts block ICMP at the firewall. This does not indicate a connectivity problem: `tailscale ping` and SSH both work.

## Tailscale Health Warnings

After fixes applied on 2026-03-31:

| Host | Health |
|------|--------|
| Homelab-VM | none |
| Pi-5 | none |
| NUC | none |
| Seattle | none |
| PVE | none (was `--accept-routes is false`, fixed) |
| Matrix-Ubuntu | none (was `--accept-routes is false`, fixed) |

## Fixes Applied (2026-03-31)

### LAN Routing (table 52 interception)

1. **Pi-5**: Enabled `--accept-routes` and added a LAN routing rule (priority 5200), persisted via a dispatcher script and cron.
2. **Matrix-Ubuntu**: Enabled `--accept-routes` and added a LAN routing rule (priority 5200), persisted via a dispatcher script and cron. Enabling `--accept-routes` without the rule broke LAN connectivity (SSH via 192.168.0.154 timed out).
3. **PVE**: Enabled `--accept-routes` and added a LAN routing rule (priority 5200), persisted via a cron `@reboot` entry.

See [LAN Routing Fix](GUAVA_LAN_ROUTING_FIX.md) for full details on the table 52 issue.
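
As a rough illustration of the routing rule mentioned above, the following Python sketch builds an `ip rule` command and checks `ip rule show` output for it. The exact rule used on these hosts is documented in the linked fix, not here. This sketch assumes the rule sends LAN-destined traffic to the `main` table ahead of the table 52 lookup, and assumes the LAN is `192.168.0.0/24` (inferred from the 192.168.0.x addresses in this document). Both function names are hypothetical.

```python
from typing import List

LAN_CIDR = "192.168.0.0/24"  # assumption: inferred from the 192.168.0.x addresses in this document
RULE_PRIORITY = 5200         # priority quoted in the fix list above


def lan_rule_command(cidr: str = LAN_CIDR, priority: int = RULE_PRIORITY) -> List[str]:
    """Build an `ip rule add` command that keeps LAN-destined traffic on the main table."""
    return ["ip", "rule", "add", "to", cidr, "lookup", "main", "priority", str(priority)]


def rule_present(ip_rule_output: str, cidr: str = LAN_CIDR, priority: int = RULE_PRIORITY) -> bool:
    """Scan `ip rule show` output for a rule at this priority that covers the LAN."""
    for line in ip_rule_output.splitlines():
        prio, _, rest = line.partition(":")
        if prio.strip() == str(priority) and f"to {cidr}" in rest and "lookup main" in rest:
            return True
    return False
```

A persistence hook (dispatcher script or cron entry) could call `rule_present` first and only run the command from `lan_rule_command` when the rule is missing, which keeps the hook idempotent.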

### Kuma monitors

- Switched **all 20 Calypso monitors** from the Tailscale IP (`100.103.48.78`) to the LAN IP (`192.168.0.250`) in the Kuma SQLite DB. Pi-5 (where Kuma runs) is on the same LAN, so using Tailscale IPs added unnecessary fragility.
- Added LAN-based monitors for Rustdesk (ID:124) and Rackula (ID:125).
- Fixed a corrupted `accepted_statuscodes_json` field (`[200-299]` → `["200-299"]`) that prevented the Kuma UI from loading.
- Fixed ntfy notifications by setting `primaryBaseURL` to `https://kuma.vish.gg`; the "view" action button was missing a URL.
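
The IP swap and the JSON repair above can be sketched in Python as follows. This is an illustration rather than the commands that were actually run. It assumes a simplified `monitor` table with `url`, `hostname`, and `accepted_statuscodes_json` columns, and both helper names are hypothetical. The JSON check shows why `[200-299]` is invalid: it is neither a number nor a quoted string, so a strict parser rejects it.

```python
import json
import sqlite3


def swap_monitor_host(conn: sqlite3.Connection, old: str, new: str) -> int:
    """Replace `old` with `new` in monitor URLs and hostnames; return the rows changed."""
    cur = conn.execute(
        "UPDATE monitor "
        "SET url = replace(url, ?, ?), hostname = replace(hostname, ?, ?) "
        "WHERE instr(url, ?) > 0 OR instr(hostname, ?) > 0",
        (old, new, old, new, old, old),
    )
    conn.commit()
    return cur.rowcount


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

For example, `is_valid_json("[200-299]")` is false while `is_valid_json('["200-299"]')` is true. Because the swap is a plain substring replacement, it is worth backing up the database and stopping Kuma before running anything like it.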

### Calypso Tailscale 5-minute disconnect (root cause)

**Symptom**: Calypso's disco key rotated every 5 minutes, dropping all peer WireGuard sessions for ~30 seconds.

**Root cause**: A cron job in `/etc/crontab` ran `/usr/local/bin/tailscale-fix.sh` every 5 minutes (`*/5 * * * *`). The script checked for the `tailscale0` TUN device, but Calypso runs Tailscale in `--tun=userspace-networking` mode (Synology DSM7), which has no TUN device. The script also checked `tailscale status --json`, which returned empty state when run as the `tailscale` user. So every 5 minutes:

1. The script detects "tailscale0 missing" or "state empty".
2. It runs `configure-host` and a full service restart via `synosystemctl`.
3. It re-authenticates with `tailscale up --reset`.
4. A new disco key is generated, so all peers tear down and re-establish their connections.

**Fix**: Rewrote `/usr/local/bin/tailscale-fix.sh` to check that the `tailscaled` process is running and that `tailscale ping` reaches a known peer (Atlantis). Both checks work in TUN and userspace-networking modes.

**Additional changes on Calypso** (not the root cause, but good hygiene):

- Disabled Docker IPv6 on all bridge interfaces via `sysctl` (77 routes removed).
- Updated `dockerd.json` with `"ipv6": false, "ip6tables": false` (persists across Docker restarts).
- Added the `TS_DEBUG_NETMON_SKIP_INTERFACE_REGEXPS` and `TS_DEBUG_NETMON_NO_ROUTE_MONITORING` env vars to the Tailscale startup script.
- Added an `/etc/hosts` entry, `192.168.0.250 headscale.vish.gg`, to avoid hairpin NAT for the control plane.
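
The health check described under **Fix** above can be sketched in Python as below. The real script is a shell script whose contents are not reproduced here, so treat this as an illustration of the logic only. It assumes `pgrep` is available on the host, that `tailscale ping` exits non-zero when no pong arrives, and that the peer is addressable as `atlantis`. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List

TAILSCALE = "/var/packages/Tailscale/target/bin/tailscale"  # Synology path from the notes in this document
PEER = "atlantis"  # known peer named in the fix; the exact hostname form is an assumption


def run(argv: List[str]) -> int:
    """Run a command quietly; return its exit code, or 1 on timeout or missing binary."""
    try:
        return subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=30,
        ).returncode
    except (OSError, subprocess.TimeoutExpired):
        return 1


def is_healthy(runner: Callable[[List[str]], int] = run) -> bool:
    """Healthy means tailscaled is running and a single tailscale ping to the peer succeeds."""
    if runner(["pgrep", "-x", "tailscaled"]) != 0:
        return False
    return runner([TAILSCALE, "ping", "--c", "1", PEER]) == 0
```

Neither check depends on the `tailscale0` device, so the result is the same in TUN and userspace-networking modes. A restart would only be triggered when `is_healthy()` returns false, which avoids the needless disco key rotation described in the root cause.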