# Tailscale Mesh Connectivity Test
Last tested: 2026-03-31
## Test Method
Ran `tailscale ping` from each SSH-accessible host to every other online Headscale node. This tests WireGuard tunnel establishment and reports whether each connection is direct (peer-to-peer) or relayed via DERP.
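
The classification step can be sketched as a small shell helper. This is an illustration, not the script used for the test: `classify_pong` and `probe_peer` are hypothetical names, the ping count of 5 is an assumption, and the matched `pong` line format (`via DERP(<region>)` for relayed replies, `via <ip>:<port>` for direct ones) reflects typical `tailscale ping` output and may vary between versions.

```shell
# Classify one line of `tailscale ping` output using the matrix symbols
# D (direct) and R (DERP relay). Prints ? for anything unrecognized,
# such as a timeout line.
classify_pong() {
    case "$1" in
        *"via DERP("*)          echo "R" ;;
        "pong from "*" via "*)  echo "D" ;;
        *)                      echo "?" ;;
    esac
}

# Ping one peer and classify the last line printed to stdout.
# Assumption: up to 5 pings give the path a chance to upgrade to direct.
probe_peer() {
    peer="$1"
    last_line=$(tailscale ping --c 5 --timeout 5s "$peer" 2>/dev/null | tail -n 1)
    printf '%s %s\n' "$peer" "$(classify_pong "$last_line")"
}
```

Running `probe_peer <peer>` for each online node from one host would produce one row of the matrix below.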
## Results
All 10 online hosts can reach all other online hosts. No failures.
### Connection Type Matrix
`D` = direct, `R` = DERP relay, `—` = self
| From \ To | Atlantis | Calypso | Pi-5 | Homelab-VM | Matrix-Ubuntu | Setillo | NUC | Guava | Seattle | PVE |
|-----------|----------|---------|------|---------|---------------|---------|-----|-------|---------|-----|
| **Atlantis** | — | D | D | D | D | D | D | D | D | D |
| **Calypso** | D | — | D | D | D | R | D | D | D | D |
| **Pi-5** | D | D | — | D | D | D | D | D | D | D |
| **Homelab-VM** | D | D | D | — | D | R | D | D | D | D |
| **Matrix-Ubuntu** | (tested inbound) | | | | — | | | | | |
| **Setillo** | D | D | D | R | R | — | | | | |
| **NUC** | D | D | D | D | R | D | — | R | D | R |
| **Guava** | (no CLI) | | | | | | | — | | |
| **Seattle** | D | D | D | D | D | D | R | D | — | D |
| **PVE** | D | D | D | D | D | D | D | D | D | — |
### Notes
- **Atlantis/Calypso**: Tailscale binary at `/var/packages/Tailscale/target/bin/tailscale` (Synology package)
- **Setillo**: Tailscale binary at `/usr/local/bin/tailscale`
- **Guava**: Tailscale runs via TrueNAS built-in management; no `tailscale` CLI in PATH. Confirmed reachable via inbound pings from all other hosts.
- **Matrix-Ubuntu**: SSH via LAN IP (192.168.0.154) was unreliable during testing due to table 52 LAN interception (since fixed). Confirmed reachable via Tailscale IP and inbound pings.
- **DERP relays** (NUC ↔ some peers, Setillo ↔ Homelab-VM/Matrix-Ubuntu): Normal for nodes behind different NATs. Relaying adds 15-60 ms of latency but does not affect reliability. Connections may upgrade to direct over time.
### ICMP Ping Notes
Standard ICMP ping from the OS (not `tailscale ping`) fails for Atlantis, Calypso, and Setillo because those hosts have ICMP blocked at the firewall level. This does not indicate a connectivity problem — `tailscale ping` and SSH both work.
## Tailscale Health Warnings
After fixes applied on 2026-03-31:
| Host | Health |
|------|--------|
| Homelab-VM | none |
| Pi-5 | none |
| NUC | none |
| Seattle | none |
| PVE | none (was `--accept-routes is false`, fixed) |
| Matrix-Ubuntu | none (was `--accept-routes is false`, fixed) |
## Fixes Applied (2026-03-31)
### LAN Routing (table 52 interception)
1. **Pi-5**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via dispatcher script + cron
2. **Matrix-Ubuntu**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via dispatcher script + cron. Enabling `--accept-routes` without the rule broke LAN connectivity (SSH via 192.168.0.154 timed out).
3. **PVE**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via cron @reboot
See [LAN Routing Fix](GUAVA_LAN_ROUTING_FIX.md) for full details on the table 52 issue.
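
A minimal sketch of the LAN routing rule, assuming the LAN is `192.168.0.0/24` (inferred from the `192.168.0.x` addresses in this document) and that the rule sends LAN traffic to the `main` table. The exact rule used on each host is described in the linked fix document. `ensure_lan_rule` is a hypothetical helper name. Priority 5200 sorts ahead of the policy rules Tailscale installs on Linux (typically 5210-5270), so LAN destinations are resolved before table 52 is consulted.

```shell
# Assumption: LAN subnet; override via the environment if yours differs.
LAN_SUBNET="${LAN_SUBNET:-192.168.0.0/24}"
RULE_PRIORITY=5200   # priority stated in this document

# Add the rule only when no rule with this priority exists, so the
# function is safe to call from both a dispatcher script and cron.
ensure_lan_rule() {
    if ip rule show | grep -q "^${RULE_PRIORITY}:"; then
        return 0
    fi
    ip rule add to "$LAN_SUBNET" lookup main priority "$RULE_PRIORITY"
}
```

Because the function is idempotent, the same call can run at boot and on every interface change without stacking duplicate rules.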
### Kuma monitors
- Switched **all 20 Calypso monitors** from Tailscale IP (`100.103.48.78`) to LAN IP (`192.168.0.250`) in the Kuma SQLite DB. Pi-5 (where Kuma runs) is on the same LAN, so using Tailscale IPs added unnecessary fragility.
- Added LAN-based monitors for Rustdesk (ID:124) and Rackula (ID:125).
- Fixed corrupted `accepted_statuscodes_json` field (`[200-299]` → `["200-299"]`) that prevented the Kuma UI from loading.
- Fixed ntfy notifications by setting `primaryBaseURL` to `https://kuma.vish.gg` — the "view" action button was missing a URL.
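
The `accepted_statuscodes_json` repair can be sketched as a single SQL statement. The value `[200-299]` is not valid JSON because `200-299` is neither a number nor a quoted string, while `["200-299"]` is an array holding one string. The table name `monitor` is an assumption about the Kuma schema, and `kuma_statuscode_fix_sql` is a hypothetical helper name.

```shell
# Print SQL that rewrites the invalid value [200-299] as the valid
# JSON array ["200-299"]. Assumption: monitors live in a table
# named `monitor`.
kuma_statuscode_fix_sql() {
    printf '%s\n' \
        "UPDATE monitor" \
        "SET accepted_statuscodes_json = '[\"200-299\"]'" \
        "WHERE accepted_statuscodes_json = '[200-299]';"
}
```

The output could be piped into `sqlite3` against the Kuma database, ideally with Kuma stopped and a backup copy taken first. The database path depends on the install.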
### Calypso Tailscale 5-minute disconnect (root cause)
**Symptom**: Calypso's disco key rotated every 5 minutes, dropping all peer WireGuard sessions for ~30 seconds.
**Root cause**: A cron job in `/etc/crontab` ran `/usr/local/bin/tailscale-fix.sh` every 5 minutes (`*/5 * * * *`). The script checked for the `tailscale0` TUN device, but Calypso runs Tailscale in `--tun=userspace-networking` mode (Synology DSM7), which has no TUN device. The script also checked `tailscale status --json` which returned empty state when run as the `tailscale` user. So every 5 minutes:
1. Script detects "tailscale0 missing" or "state empty"
2. Runs `configure-host` + full service restart via `synosystemctl`
3. Re-authenticates with `tailscale up --reset`
4. New disco key generated → all peers tear down and re-establish connections
**Fix**: Rewrote `/usr/local/bin/tailscale-fix.sh` to check that the `tailscaled` process is running and that `tailscale ping` can reach a known peer (Atlantis). Both checks work in TUN and userspace-networking modes.
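
A health check along these lines might look like the sketch below. It is not the actual contents of `tailscale-fix.sh`: `tailscale_healthy`, `TS_BIN`, and `CHECK_PEER` are hypothetical names, the peer hostname `atlantis` is an assumption, and `pgrep` is assumed to be available on the host. Passing `--until-direct=false` matters here, because otherwise a reply that only arrives via DERP makes `tailscale ping` exit non-zero.

```shell
# Assumption: binary location and peer name; on Synology the binary is
# /var/packages/Tailscale/target/bin/tailscale.
TS_BIN="${TS_BIN:-tailscale}"
CHECK_PEER="${CHECK_PEER:-atlantis}"

# Succeeds only when the daemon is running and a known peer answers.
# Neither check depends on a tailscale0 TUN device.
tailscale_healthy() {
    # 1. Is the daemon process present?
    pgrep -x tailscaled >/dev/null 2>&1 || return 1
    # 2. Does a known peer reply? A relayed pong counts as healthy.
    "$TS_BIN" ping --c 1 --timeout 5s --until-direct=false "$CHECK_PEER" >/dev/null 2>&1
}
```

A cron wrapper would then restart the service only when `tailscale_healthy` fails, rather than on every run.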
**Additional changes on Calypso** (not the root cause but good hygiene):
- Disabled Docker IPv6 on all bridge interfaces via `sysctl` (77 routes removed)
- Updated `dockerd.json` with `"ipv6": false, "ip6tables": false` (persistent after Docker restart)
- Added `TS_DEBUG_NETMON_SKIP_INTERFACE_REGEXPS` and `TS_DEBUG_NETMON_NO_ROUTE_MONITORING` env vars to Tailscale startup script
- Added `/etc/hosts` entry: `192.168.0.250 headscale.vish.gg` to avoid hairpin NAT for control plane
### GL.iNet router Tailscale fixes
**GL-MT3000** (Hawaii, via Jellyfish at `192.168.12.1`):
- Tailscale was pointed at `https://controlplane.tailscale.com` (official) instead of Headscale
- Re-registered with preauth key: `tailscale up --login-server=https://headscale.vish.gg:8443 --authkey=<key> --hostname=gl-mt3000 --accept-routes --accept-dns=false --advertise-routes=192.168.12.0/24 --reset`
- Auto-start verified: `/etc/init.d/tailscale-up` (S81) runs `tailscale up` with correct flags after 10s delay
- State file at `/etc/tailscale/tailscaled.state` has `WantRunning: true`, `ControlURL: headscale.vish.gg:8443`
**GL-BE3600** (Home gateway at `192.168.0.1`):
- Was already connected to Headscale correctly, no fix needed
- SSH not available on LAN port 22 — accessible via Tailscale IP (`100.105.59.123`)
- Auto-start verified: `/etc/init.d/tailscale-up` (S99) waits up to 120s for network, then connects with `--advertise-routes=192.168.68.0/22,192.168.8.0/24`
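
The "wait for network, then connect" pattern used by the auto-start scripts can be sketched as follows. This is not the content of `/etc/init.d/tailscale-up`: `wait_for_network` is a hypothetical name, and treating "a default route exists" as the readiness signal is an assumption.

```shell
# Wait until a default route appears, polling once per second.
# Returns 0 as soon as the route exists, 1 after max_wait seconds.
wait_for_network() {
    max_wait="${1:-120}"
    waited=0
    until ip route show default | grep -q .; do
        if [ "$waited" -ge "$max_wait" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

An init script could call `wait_for_network 120` and run `tailscale up` with the flags listed above only when it succeeds.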
### Speedtest results (Ookla official CLI, staggered)
Tests run sequentially per WAN to avoid bandwidth contention. Tested 2026-03-31.
| Host | Location | NIC | ISP | Download | Upload | Latency | Loss |
|------|----------|-----|-----|----------|--------|---------|------|
| Calypso | Home | 10GbE | Sonic | 8,080 Mbps | 6,847 Mbps | 3 ms | 0% |
| Atlantis | Home | 10GbE | Sonic | 6,079 Mbps | 6,908 Mbps | 3 ms | 0% |
| Matrix-Ubuntu | Home | virtio (on Atlantis) | Sonic | 3,537 Mbps | 3,666 Mbps | 4 ms | 0% |
| GL-MT3000 | Hawaii | 1GbE | Spectrum | 1,051 Mbps | 37 Mbps | 8 ms | 0% |
| NUC | Concord | 1GbE | Comcast | 943 Mbps | 357 Mbps | 14 ms | 3.7% |
| GL-BE3600 | Home (gateway) | 2.5GbE | Comcast | 781 Mbps | 71 Mbps | 25 ms | 0% |
| Homelab-VM | Home | 1GbE virtio | Sonic | 616 Mbps | 933 Mbps | 3 ms | 0% |
| Pi-5 | Home | 1GbE | Sonic | 551 Mbps | 936 Mbps | 3 ms | 0% |
| Seattle VPS | Seattle | VPS | Nubes LLC | 197 Mbps | 197 Mbps | 1 ms | 0% |
| Setillo | Remote | Starlink | Starlink | 72 Mbps | 27 Mbps | 27 ms | 0% |
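
When collecting these figures from the Ookla CLI's JSON output (`speedtest --format=json`), note that the `bandwidth` fields are reported in bytes per second, not Mbps. A minimal conversion helper is shown below; `bytes_per_sec_to_mbps` is a hypothetical name.

```shell
# Convert a bandwidth value in bytes per second to whole megabits
# per second (1 Mbps = 1,000,000 bits per second).
bytes_per_sec_to_mbps() {
    awk -v b="$1" 'BEGIN { printf "%.0f\n", b * 8 / 1000000 }'
}
```

For example, a reading of 1,010,000,000 bytes/s converts to 8,080 Mbps.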
### iperf3 benchmarks (all hosts → Calypso)
All tests against Calypso's iperf3 server (`192.168.0.250:5201` for LAN, `100.103.48.78:5201` for Tailscale). Run staggered to avoid contention. Tested 2026-03-31.
| Source | Link | NIC / Tunnel | Streams | Throughput | Retransmits |
|--------|------|-------------|---------|------------|-------------|
| Atlantis | LAN | 10GbE | 4 | 9.30 Gbps | 0 |
| Guava | LAN | 10GbE | 4 | 8.54 Gbps | 24 |
| Homelab-VM | LAN | 1GbE virtio | 1 | 940 Mbps | 0 |
| PVE | LAN | 1GbE | 1 | 938 Mbps | 0 |
| Pi-5 | LAN | 1GbE | 1 | 887 Mbps | 451 |
| Seattle | Tailscale | WireGuard | 1 | 454 Mbps | 410 |
| NUC | Tailscale | WireGuard | 1 | 252 Mbps | 2 |
Notes:
- 10GbE hosts (Atlantis, Guava) come close to line rate with 4 parallel streams (9.30 and 8.54 Gbps)
- 1GbE hosts cap at ~940 Mbps as expected; Pi-5 slightly lower with retransmits
- Tailscale/WireGuard tunnel adds overhead: Seattle gets ~450 Mbps, NUC ~250 Mbps
- Calypso not tested (it's the server)
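
A sketch of how the client side of these runs might be invoked, using the Calypso addresses and port given above. `run_iperf` is a hypothetical wrapper, and the test duration is left at the iperf3 default since the document does not state one.

```shell
CALYPSO_LAN="192.168.0.250"   # LAN address from this document
CALYPSO_TS="100.103.48.78"    # Tailscale address from this document
IPERF_PORT=5201

# Usage: run_iperf lan|tailscale [streams]
run_iperf() {
    link="$1"
    streams="${2:-1}"
    case "$link" in
        lan)       target="$CALYPSO_LAN" ;;
        tailscale) target="$CALYPSO_TS" ;;
        *) echo "usage: run_iperf lan|tailscale [streams]" >&2; return 2 ;;
    esac
    # -P sets the number of parallel client streams.
    iperf3 -c "$target" -p "$IPERF_PORT" -P "$streams"
}
```

For instance, `run_iperf lan 4` corresponds to the 4-stream 10GbE rows, and `run_iperf tailscale` to the single-stream WireGuard rows.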