# Tailscale Mesh Connectivity Test

Last tested: 2026-03-31

## Test Method

Ran `tailscale ping` from each SSH-accessible host to every other online Headscale node. This exercises WireGuard tunnel establishment and reports whether each connection is direct (peer-to-peer) or relayed via DERP.
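
The `D`/`R` cells can be derived mechanically from `tailscale ping` output. The following is a minimal Python sketch, assuming the reply-line format shown in its comments (confirm it against the Tailscale version in use). It handles only direct and DERP paths.

```python
from __future__ import annotations

import re

# Assumed shape of a reply line printed by `tailscale ping`, for example:
#   pong from atlantis (100.64.0.1) via 192.168.0.200:41641 in 2ms
#   pong from setillo (100.64.0.2) via DERP(sfo) in 38ms
PONG_RE = re.compile(r"^pong from (\S+) \((\S+)\) via (\S+) in ([\d.]+)ms$")


def classify_ping_output(output: str) -> str | None:
    """Return "D" (direct), "R" (DERP relay) or None if no pong was seen.

    `tailscale ping` keeps pinging until a direct path is found or it gives up,
    so the last pong line describes the final path.
    """
    last_path = None
    for line in output.splitlines():
        match = PONG_RE.match(line.strip())
        if match:
            last_path = match.group(3)
    if last_path is None:
        return None
    return "R" if last_path.startswith("DERP(") else "D"
```

Run it over the captured stdout of one `tailscale ping <peer>` per (source, target) pair to fill one cell of the matrix.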

## Results

All 10 online hosts can reach all other online hosts; no ping failed. Blank cells in the matrix were not tested from that source host (Matrix-Ubuntu and Guava were verified through inbound pings only).

### Connection Type Matrix

`D` = direct, `R` = DERP relay, `—` = self, blank = not tested

| From \ To | Atlantis | Calypso | Pi-5 | Homelab-VM | Matrix-Ubuntu | Setillo | NUC | Guava | Seattle | PVE |
|-----------|----------|---------|------|------------|---------------|---------|-----|-------|---------|-----|
| **Atlantis** | — | D | D | D | D | D | D | D | D | D |
| **Calypso** | D | — | D | D | D | R | D | D | D | D |
| **Pi-5** | D | D | — | D | D | D | D | D | D | D |
| **Homelab-VM** | D | D | D | — | D | R | D | D | D | D |
| **Matrix-Ubuntu** | (tested inbound) | | | | — | | | | | |
| **Setillo** | D | D | D | R | R | — | | | | |
| **NUC** | D | D | D | D | R | D | — | R | D | R |
| **Guava** | (no CLI) | | | | | | | — | | |
| **Seattle** | D | D | D | D | D | D | R | D | — | D |
| **PVE** | D | D | D | D | D | D | D | D | D | — |
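
Relayed pairs, and pairs whose path depends on direction (Calypso → Setillo is relayed while Setillo → Calypso is direct), can be listed from the matrix programmatically. A small sketch over a dict-of-dicts encoding chosen here for illustration, with a few rows transcribed from the table:

```python
from __future__ import annotations

from itertools import combinations


def relayed_pairs(matrix: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Every (source, target) pair whose tested path went through DERP."""
    return sorted(
        (src, dst) for src, row in matrix.items() for dst, kind in row.items() if kind == "R"
    )


def asymmetric_pairs(matrix: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Host pairs tested both ways where one direction is direct and the other relayed."""
    pairs = []
    for a, b in combinations(sorted(matrix), 2):
        forward = matrix[a].get(b, "")  # "" = not tested
        backward = matrix[b].get(a, "")
        if {forward, backward} == {"D", "R"}:
            pairs.append((a, b))
    return pairs


# A few rows transcribed from the matrix above.
sample = {
    "Calypso": {"Setillo": "R", "Atlantis": "D"},
    "Setillo": {"Calypso": "D", "Homelab-VM": "R"},
    "Homelab-VM": {"Setillo": "R"},
}
print(relayed_pairs(sample))     # [('Calypso', 'Setillo'), ('Homelab-VM', 'Setillo'), ('Setillo', 'Homelab-VM')]
print(asymmetric_pairs(sample))  # [('Calypso', 'Setillo')]
```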

### Notes

- **Atlantis/Calypso**: Tailscale binary at `/var/packages/Tailscale/target/bin/tailscale` (Synology package)
- **Setillo**: Tailscale binary at `/usr/local/bin/tailscale`
- **Guava**: Tailscale runs via TrueNAS built-in management; no `tailscale` CLI in PATH. Confirmed reachable via inbound pings from all other hosts.
- **Matrix-Ubuntu**: SSH via LAN IP (192.168.0.154) was unreliable during testing due to table 52 LAN interception (since fixed). Confirmed reachable via its Tailscale IP and inbound pings.
- **DERP relays** (NUC → Matrix-Ubuntu/Guava/PVE, Seattle → NUC, Calypso → Setillo, Setillo ↔ Homelab-VM, Setillo → Matrix-Ubuntu): normal for nodes behind different NATs. Relaying adds 15-60 ms of latency but does not affect reliability, and connections may upgrade to direct over time.
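
Because the CLI lives in different places per host, a test script has to look beyond `PATH`. A hypothetical helper sketch; `find_tailscale` and its lookup order are illustrative choices, not something the hosts ship:

```python
from __future__ import annotations

import os
import shutil

# Binary locations listed in the notes above.
KNOWN_LOCATIONS = (
    "/var/packages/Tailscale/target/bin/tailscale",  # Atlantis, Calypso (Synology package)
    "/usr/local/bin/tailscale",  # Setillo
)


def find_tailscale(candidates: tuple[str, ...] = KNOWN_LOCATIONS, path: str | None = None) -> str | None:
    """Return the tailscale CLI path: PATH first, then the known fallbacks.

    `path` is passed to shutil.which; None means the current $PATH.
    Returns None when no CLI exists (as on Guava, where TrueNAS manages Tailscale).
    """
    found = shutil.which("tailscale", path=path)
    if found:
        return found
    for candidate in candidates:
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A `None` result means the host can only be tested inbound, as with Guava.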

### ICMP Ping Notes

Standard ICMP ping from the OS (not `tailscale ping`) fails for Atlantis, Calypso, and Setillo because those hosts block ICMP at the firewall. This does not indicate a connectivity problem: `tailscale ping` and SSH both work.

## Tailscale Health Warnings

Health status after the fixes applied on 2026-03-31:

| Host | Health |
|------|--------|
| Homelab-VM | none |
| Pi-5 | none |
| NUC | none |
| Seattle | none |
| PVE | none (was `--accept-routes is false`, fixed) |
| Matrix-Ubuntu | none (was `--accept-routes is false`, fixed) |
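
Collecting the Health column can be scripted from `tailscale status --json`. A sketch, assuming the JSON has a top-level `Health` list of strings that is absent or `null` when the node is healthy; verify the key on your Tailscale version:

```python
from __future__ import annotations

import json


def health_warnings(status_json: str) -> list[str]:
    """Warnings from `tailscale status --json` output (assumed "Health" key)."""
    status = json.loads(status_json)
    return list(status.get("Health") or [])


def health_cell(status_json: str) -> str:
    """Render the value used in the Health column above."""
    warnings = health_warnings(status_json)
    return "; ".join(warnings) if warnings else "none"
```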

## Fixes Applied (2026-03-31)

### LAN Routing (table 52 interception)

1. **Pi-5**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via dispatcher script + cron
2. **Matrix-Ubuntu**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via dispatcher script + cron. Enabling `--accept-routes` without the rule broke LAN connectivity (SSH via 192.168.0.154 timed out).
3. **PVE**: Enabled `--accept-routes`, added LAN routing rule (priority 5200), persistent via cron @reboot

See [LAN Routing Fix](GUAVA_LAN_ROUTING_FIX.md) for full details on the table 52 issue.
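
The rule itself is not reproduced in these notes. A minimal sketch of an idempotent version, assuming the LAN is `192.168.0.0/24` (inferred from the addresses in this document; it may differ) and that the rule sends LAN destinations to the `main` table ahead of Tailscale's table 52 rules. `ensure_lan_rule` and the other names are hypothetical:

```python
from __future__ import annotations

import ipaddress
import re
import subprocess

# Priority from the fix notes. Tailscale's own policy rules start at 5210 and
# lead to table 52; the kernel evaluates lower priority numbers first.
LAN_RULE_PRIORITY = 5200


def lan_rule_command(lan_cidr: str, priority: int = LAN_RULE_PRIORITY) -> list[str]:
    """`ip rule` invocation that routes LAN destinations via the main table."""
    net = ipaddress.ip_network(lan_cidr)  # ValueError if host bits are set
    return ["ip", "rule", "add", "to", str(net), "lookup", "main", "priority", str(priority)]


def rule_present(ip_rule_show: str, lan_cidr: str, priority: int = LAN_RULE_PRIORITY) -> bool:
    """Check `ip rule show` output for the LAN rule so re-runs stay idempotent."""
    net = re.escape(str(ipaddress.ip_network(lan_cidr)))
    pattern = re.compile(rf"^{priority}:\s+from all to {net} lookup main\b")
    return any(pattern.match(line.strip()) for line in ip_rule_show.splitlines())


def ensure_lan_rule(lan_cidr: str) -> None:
    """Add the rule if missing (needs root); safe to call from dispatcher or cron hooks."""
    shown = subprocess.run(["ip", "rule", "show"], capture_output=True, text=True, check=True).stdout
    if not rule_present(shown, lan_cidr):
        subprocess.run(lan_rule_command(lan_cidr), check=True)
```

Checking before adding matters because the hooks fire repeatedly, and `ip rule add` does not deduplicate on its own.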

### Kuma monitors

- Switched **all 20 Calypso monitors** from the Tailscale IP (`100.103.48.78`) to the LAN IP (`192.168.0.250`) in the Kuma SQLite DB. Pi-5 (where Kuma runs) is on the same LAN, so routing these checks over Tailscale added unnecessary fragility.
- Added LAN-based monitors for Rustdesk (ID:124) and Rackula (ID:125).
- Fixed a corrupted `accepted_statuscodes_json` field (`[200-299]` → `["200-299"]`) that prevented the Kuma UI from loading.
- Fixed ntfy notifications by setting `primaryBaseURL` to `https://kuma.vish.gg`; without it, the notification's "view" action button had no URL.
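
The IP switch and the status-code repair can be applied in one pass. A sketch using Python's `sqlite3`, assuming Kuma's `monitor` table has `hostname`, `url`, and `accepted_statuscodes_json` columns (check with `.schema monitor` first). Stop Kuma and back up its database before writing to it:

```python
from __future__ import annotations

import json
import re
import sqlite3

OLD_HOST, NEW_HOST = "100.103.48.78", "192.168.0.250"  # Calypso: Tailscale IP -> LAN IP


def repair_statuscodes(raw: str) -> str:
    """Return a JSON array of strings, e.g. '[200-299]' -> '["200-299"]'."""
    try:
        value = json.loads(raw)
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            return raw  # already well-formed
    except json.JSONDecodeError:
        pass
    return json.dumps(re.findall(r"\d{3}(?:-\d{3})?", raw))


def fix_monitors(conn: sqlite3.Connection) -> None:
    """Point Calypso monitors at the LAN IP and repair status-code JSON."""
    rows = conn.execute(
        "SELECT id, hostname, url, accepted_statuscodes_json FROM monitor"
    ).fetchall()
    for monitor_id, hostname, url, codes in rows:
        conn.execute(
            "UPDATE monitor SET hostname = ?, url = ?, accepted_statuscodes_json = ? WHERE id = ?",
            (
                hostname.replace(OLD_HOST, NEW_HOST) if hostname else hostname,
                url.replace(OLD_HOST, NEW_HOST) if url else url,
                repair_statuscodes(codes) if codes else codes,
                monitor_id,
            ),
        )
    conn.commit()
```

Matching on Calypso's Tailscale IP limits the rewrite to Calypso monitors; every other row keeps its address.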

### Calypso Tailscale 5-minute disconnect (root cause)

**Symptom**: Calypso's disco key rotated every 5 minutes, dropping all peer WireGuard sessions for ~30 seconds.

**Root cause**: A cron job in `/etc/crontab` ran `/usr/local/bin/tailscale-fix.sh` every 5 minutes (`*/5 * * * *`). The script checked for the `tailscale0` TUN device, but Calypso runs Tailscale in `--tun=userspace-networking` mode (Synology DSM7), which creates no TUN device. The script also checked `tailscale status --json`, which returned an empty state when run as the `tailscale` user. So every 5 minutes:

1. The script detected "tailscale0 missing" or "state empty"
2. It ran `configure-host` plus a full service restart via `synosystemctl`
3. It re-authenticated with `tailscale up --reset`
4. A new disco key was generated, so all peers tore down and re-established their connections

**Fix**: Rewrote `/usr/local/bin/tailscale-fix.sh` to check whether the `tailscaled` process is running and whether `tailscale ping` reaches a known peer (Atlantis). Both checks work in TUN and userspace-networking modes.
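
The decision logic of the rewritten check can be sketched as follows. This is a Python illustration, not the actual shell script. The peer name `atlantis` is a placeholder, and the `tailscale ping` flags (`-c`, `--until-direct`, `--timeout`) are assumed from its `--help` output, so confirm them on the installed version:

```python
import subprocess

TAILSCALE = "/var/packages/Tailscale/target/bin/tailscale"  # Synology package path
KNOWN_PEER = "atlantis"  # placeholder: whatever name or IP the real script pings


def tailscaled_running(run=subprocess.run) -> bool:
    # pgrep -x matches the exact process name and exits 0 when one is found.
    return run(["pgrep", "-x", "tailscaled"], capture_output=True).returncode == 0


def peer_reachable(peer: str = KNOWN_PEER, run=subprocess.run) -> bool:
    # One ping; --until-direct=false so a DERP-relayed reply still counts.
    result = run(
        [TAILSCALE, "ping", "-c", "1", "--until-direct=false", "--timeout", "5s", peer],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def needs_restart(run=subprocess.run) -> bool:
    """True only if tailscaled is down or cannot reach the known peer.

    Nothing here looks for a tailscale0 TUN device, so the check behaves the
    same under --tun=userspace-networking, where that device never exists.
    """
    return not (tailscaled_running(run) and peer_reachable(run=run))
```

Disabling `--until-direct` is deliberate. With the default, a ping answered only via DERP may exit non-zero, and that would bring back the spurious restarts.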

**Additional changes on Calypso** (not the root cause but good hygiene):
- Disabled Docker IPv6 on all bridge interfaces via `sysctl` (77 routes removed)
- Updated `dockerd.json` with `"ipv6": false, "ip6tables": false` so the setting persists across Docker restarts
- Added the `TS_DEBUG_NETMON_SKIP_INTERFACE_REGEXPS` and `TS_DEBUG_NETMON_NO_ROUTE_MONITORING` env vars to the Tailscale startup script
- Added an `/etc/hosts` entry (`192.168.0.250 headscale.vish.gg`) to avoid hairpin NAT when reaching the control plane
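
Both IPv6 steps can be expressed as small pure functions. A sketch, assuming Docker's bridges are the default `docker0` plus user-defined `br-<id>` interfaces (the naming Docker uses on Linux). The function names are illustrative:

```python
from __future__ import annotations

import json
import re

# Docker's default bridge is docker0; user-defined bridges are named br-<12 hex chars>.
BRIDGE_RE = re.compile(r"^(docker\d+|br-[0-9a-f]{12})$")


def ipv6_sysctl_keys(interfaces: list[str]) -> list[str]:
    """sysctl keys that disable IPv6 on Docker bridge interfaces."""
    return [
        f"net.ipv6.conf.{name}.disable_ipv6"
        for name in sorted(interfaces)
        if BRIDGE_RE.match(name)
    ]


def disable_docker_ipv6(config_text: str) -> str:
    """Return dockerd.json content with IPv6 off, keeping all other keys."""
    config = json.loads(config_text) if config_text.strip() else {}
    config["ipv6"] = False
    config["ip6tables"] = False
    return json.dumps(config, indent=2, sort_keys=True)
```

Feed `ipv6_sysctl_keys` the names from `/sys/class/net` and apply each key with `sysctl -w <key>=1`. Write the `disable_docker_ipv6` output back to `dockerd.json` before restarting Docker.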

### GL.iNet router Tailscale fixes

**GL-MT3000** (Hawaii, via Jellyfish at `192.168.12.1`):
- Tailscale was pointed at `https://controlplane.tailscale.com` (the official Tailscale control server) instead of Headscale
- Re-registered with a preauth key: `tailscale up --login-server=https://headscale.vish.gg:8443 --authkey=<key> --hostname=gl-mt3000 --accept-routes --accept-dns=false --advertise-routes=192.168.12.0/24 --reset`
- Auto-start verified: `/etc/init.d/tailscale-up` (S81) runs `tailscale up` with the correct flags after a 10 s delay
- State file at `/etc/tailscale/tailscaled.state` has `WantRunning: true`, `ControlURL: headscale.vish.gg:8443`
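
When re-registering a router, building the command from validated parts makes a typo in an advertised route fail early. A sketch that mirrors the flags used above. `tailscale_up_args` is a hypothetical helper, and the auth key is left out of the printed example:

```python
from __future__ import annotations

import ipaddress
import shlex


def tailscale_up_args(login_server: str, hostname: str, routes: list[str], authkey: str | None = None) -> list[str]:
    """Argument list mirroring the re-registration command above."""
    # Strict parsing rejects entries like 192.168.12.1/24 that have host bits set.
    networks = [str(ipaddress.ip_network(route)) for route in routes]
    args = ["tailscale", "up", f"--login-server={login_server}"]
    if authkey:
        args.append(f"--authkey={authkey}")
    args += [
        f"--hostname={hostname}",
        "--accept-routes",
        "--accept-dns=false",
        "--advertise-routes=" + ",".join(networks),
        "--reset",
    ]
    return args


print(shlex.join(tailscale_up_args("https://headscale.vish.gg:8443", "gl-mt3000", ["192.168.12.0/24"])))
# tailscale up --login-server=https://headscale.vish.gg:8443 --hostname=gl-mt3000 --accept-routes --accept-dns=false --advertise-routes=192.168.12.0/24 --reset
```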

**GL-BE3600** (Home gateway at `192.168.0.1`):
- Already connected to Headscale correctly; no fix needed
- SSH is not available on LAN port 22; connect via its Tailscale IP (`100.105.59.123`) instead
- Auto-start verified: `/etc/init.d/tailscale-up` (S99) waits up to 120 s for the network, then connects with `--advertise-routes=192.168.68.0/22,192.168.8.0/24`
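
The S99 script's wait-then-connect behavior amounts to a bounded polling loop. A Python sketch of that logic. The readiness check used here (a TCP connect to `headscale.vish.gg:8443`) is an assumption, since the script's actual check is not documented here:

```python
from __future__ import annotations

import socket
import time
from typing import Callable


def control_server_reachable(host: str = "headscale.vish.gg", port: int = 8443, timeout: float = 3.0) -> bool:
    """TCP connect test against the Headscale endpoint (one possible readiness check)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_network(
    check: Callable[[], bool] = control_server_reachable,
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it succeeds or `timeout` seconds pass; True on success."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `clock` and `sleep` keeps the loop testable without actually waiting two minutes.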

### Speedtest results (Ookla official CLI, staggered)

Tests were run sequentially per WAN to avoid bandwidth contention. Tested 2026-03-31.

| Host | Location | NIC | ISP | Download | Upload | Latency | Loss |
|------|----------|-----|-----|----------|--------|---------|------|
| Calypso | Home | 10GbE | Sonic | 8,080 Mbps | 6,847 Mbps | 3 ms | 0% |
| Atlantis | Home | 10GbE | Sonic | 6,079 Mbps | 6,908 Mbps | 3 ms | 0% |
| Matrix-Ubuntu | Home | virtio (on Atlantis) | Sonic | 3,537 Mbps | 3,666 Mbps | 4 ms | 0% |
| GL-MT3000 | Hawaii | 1GbE | Spectrum | 1,051 Mbps | 37 Mbps | 8 ms | 0% |
| NUC | Concord | 1GbE | Comcast | 943 Mbps | 357 Mbps | 14 ms | 3.7% |
| GL-BE3600 | Home (gateway) | 2.5GbE | Comcast | 781 Mbps | 71 Mbps | 25 ms | 0% |
| Homelab-VM | Home | 1GbE virtio | Sonic | 616 Mbps | 933 Mbps | 3 ms | 0% |
| Pi-5 | Home | 1GbE | Sonic | 551 Mbps | 936 Mbps | 3 ms | 0% |
| Seattle VPS | Seattle | VPS | Nubes LLC | 197 Mbps | 197 Mbps | 1 ms | 0% |
| Setillo | Remote | Starlink | Starlink | 72 Mbps | 27 Mbps | 27 ms | 0% |
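
Collecting these rows can be automated from the Ookla CLI's JSON output. A sketch, assuming `speedtest --format=json` reports `download.bandwidth` and `upload.bandwidth` in bytes per second, `ping.latency` in ms, and an optional `packetLoss`; verify these field names on your CLI version:

```python
from __future__ import annotations

import json


def _mbps(bytes_per_second: float) -> int:
    # Bandwidth is assumed to be bytes per second; convert to megabits per second.
    return round(bytes_per_second * 8 / 1_000_000)


def speedtest_row(raw: str) -> dict:
    """Pull the table columns out of `speedtest --format=json` output."""
    result = json.loads(raw)
    return {
        "isp": result.get("isp"),
        "download_mbps": _mbps(result["download"]["bandwidth"]),
        "upload_mbps": _mbps(result["upload"]["bandwidth"]),
        "latency_ms": round(result["ping"]["latency"]),
        "loss_pct": result.get("packetLoss"),  # may be absent for some servers
    }
```

To keep the staggering, run one host at a time on each WAN and parse each result as it completes.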

### iperf3 benchmarks (all hosts → Calypso)

All tests ran against Calypso's iperf3 server (`192.168.0.250:5201` over LAN, `100.103.48.78:5201` over Tailscale), staggered to avoid contention. Tested 2026-03-31.

| Source | Link | NIC / Tunnel | Streams | Throughput | Retransmits |
|--------|------|--------------|---------|------------|-------------|
| Atlantis | LAN | 10GbE | 4 | 9.30 Gbps | 0 |
| Guava | LAN | 10GbE | 4 | 8.54 Gbps | 24 |
| Homelab-VM | LAN | 1GbE virtio | 1 | 940 Mbps | 0 |
| PVE | LAN | 1GbE | 1 | 938 Mbps | 0 |
| Pi-5 | LAN | 1GbE | 1 | 887 Mbps | 451 |
| Seattle | Tailscale | WireGuard | 1 | 454 Mbps | 410 |
| NUC | Tailscale | WireGuard | 1 | 252 Mbps | 2 |

Notes:
- 10GbE hosts (Atlantis, Guava) reach 8.5-9.3 Gbps with 4 parallel streams
- 1GbE hosts cap at ~940 Mbps as expected; Pi-5 is slightly lower and shows retransmits
- Remote hosts over the Tailscale/WireGuard tunnel reach ~450 Mbps (Seattle) and ~250 Mbps (NUC); their traffic also crosses the WAN, so these figures are not pure tunnel overhead
- Calypso not tested (it's the server)
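
Each row can be produced from iperf3's JSON mode, for example `iperf3 -c 192.168.0.250 -P 4 -J` for the 4-stream rows. A sketch that reads the `end.sum_received` and `end.sum_sent` totals. The notes above do not say whether the table used sender or receiver figures, so reading the receiver side is an assumption:

```python
from __future__ import annotations

import json


def format_rate(bits_per_second: float) -> str:
    """Match the table style: Gbps with two decimals, otherwise whole Mbps."""
    if bits_per_second >= 1e9:
        return f"{bits_per_second / 1e9:.2f} Gbps"
    return f"{bits_per_second / 1e6:.0f} Mbps"


def iperf3_row(raw: str) -> tuple[str, int | None]:
    """Throughput and retransmits from `iperf3 -c <server> -J` output.

    Uses the receiver-side total. Retransmits are reported only for TCP
    senders on some platforms (notably Linux), hence .get().
    """
    end = json.loads(raw)["end"]
    return format_rate(end["sum_received"]["bits_per_second"]), end["sum_sent"].get("retransmits")
```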