Sanitized mirror from private repository - 2026-03-26 10:25:55 UTC
docs/networking/GUAVA_LAN_ROUTING_FIX.md
# LAN Routing Fix: Tailscale 10GbE Throughput Issue

## Problem

Hosts with host-level Tailscale on the `192.168.0.0/24` LAN have their local traffic intercepted by Tailscale's policy routing table 52. Instead of going directly over the physical 10GbE link, traffic is routed through the WireGuard tunnel via Calypso's advertised `192.168.0.0/24` subnet route.

### Root Cause

Calypso (Headscale node ID 12) advertises `192.168.0.0/24` as a subnet route so remote nodes (Moon, Seattle, NUC) can reach LAN devices over Tailscale. However, machines that are **already on** that LAN also accept this route into Tailscale's routing table 52 (ip rule priority 5270), causing local traffic to hairpin through the tunnel.

Diagnosis:

```bash
# Shows traffic going through tailscale0 instead of the physical NIC
ip route get 192.168.0.200
# → 192.168.0.200 dev tailscale0 table 52 src 100.75.252.64

# Table 52 has the LAN subnet routed through Tailscale
ip route show table 52 | grep 192.168.0
# → 192.168.0.0/24 dev tailscale0
```
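The `ip route get` check can also be scripted. The helper below is a hypothetical sketch (`egress_dev` is not part of the repo): it extracts the egress device from `ip route get` output so a monitoring script could flag when LAN traffic is about to hairpin through `tailscale0`.

```shell
#!/bin/sh
# Hypothetical helper (not from the repo): print the egress device from
# `ip route get <dst>` output read on stdin.
egress_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}

# Demo on canned output from the affected state; live usage would be
# `ip route get 192.168.0.200 | egress_dev`.
printf '192.168.0.200 dev tailscale0 table 52 src 100.75.252.64\n' | egress_dev
# → tailscale0
```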
### Affected Hosts

| Host | LAN IP | Physical NIC | Affected? |
|---|---|---|---|
| Guava (TrueNAS) | 192.168.0.100 | enp1s0f0np0 (10GbE) | **YES** — fixed |
| homelab-vm | 192.168.0.210 | ens18 | **YES** — fixed |
| Atlantis | 192.168.0.200 | eth2/ovs_eth2 (10GbE) | No (Synology OVS) |
| Calypso | 192.168.0.250 | ovs_eth2 | No (Synology OVS) |
| Pi-5 | 192.168.0.66 | eth0 | No (not accepting route) |
| NUC | 192.168.68.100 | eno1 | No (different subnet) |
### Measured Impact (Guava → Atlantis)

| Route | Throughput | Retransmits |
|---|---|---|
| Before fix (via Tailscale) | 1.39 Gbps | 6,891 |
| After fix (direct LAN) | **7.61 Gbps** | 5,066 |

**5.5x improvement** — from WireGuard-encapsulated tunnel to direct 10GbE.
## Fix Applied

Add an ip policy rule at priority 5200 (before Tailscale's table 52 lookup at 5270) that forces LAN traffic to use the main routing table, which routes via the physical NIC:

```bash
sudo ip rule add to 192.168.0.0/24 lookup main priority 5200
```

This means: for any traffic destined to `192.168.0.0/24`, check the main table first. The main table has `192.168.0.0/24 dev <physical-nic>`, so traffic goes direct. All Tailscale traffic to `100.x.x.x` nodes is unaffected.
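Because the rule is re-applied at boot on several hosts, an idempotent wrapper avoids stacking duplicate rules. This is a hypothetical sketch (the canned `listing` stands in for live `ip rule show` output; `rule_present` is not part of the repo):

```shell
#!/bin/sh
# Hypothetical idempotency check (not from the repo): only add the bypass
# rule if priority 5200 is absent from the `ip rule show` listing.
PRIO=5200

# Reads an `ip rule show` listing on stdin; succeeds if PRIO already exists.
rule_present() {
    grep -q "^${PRIO}:"
}

# Demo against a canned listing; a boot script would pipe `ip rule show`
# and run the `ip rule add` when rule_present fails.
listing="0:	from all lookup local
5200:	from all to 192.168.0.0/24 lookup main
5270:	from all lookup 52
32766:	from all lookup main"

if printf '%s\n' "$listing" | rule_present; then
    echo "rule ${PRIO} already present"
else
    echo "would run: ip rule add to 192.168.0.0/24 lookup main priority ${PRIO}"
fi
```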
### Verification

```bash
# Should show the physical NIC, not tailscale0
ip route get 192.168.0.200

# Should get sub-1ms ping
ping -c 3 192.168.0.200

# Confirm the rule is in place
ip rule show | grep 5200
```
### Revert

```bash
sudo ip rule del to 192.168.0.0/24 lookup main priority 5200
```
## Persistence

### Guava (TrueNAS)

Init script added via TrueNAS API (ID: 2):

- **Type:** COMMAND
- **When:** POSTINIT
- **Command:** `ip rule add to 192.168.0.0/24 lookup main priority 5200`
- **Comment:** Bypass Tailscale routing for LAN traffic (direct 10GbE)

Manage via TrueNAS UI: **System → Advanced → Init/Shutdown Scripts**
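The same entry can be created from the Guava shell via the TrueNAS middleware client. This is a hedged sketch: the `initshutdownscript.create` method and its field names are assumptions based on the TrueNAS middleware API and should be checked against your release before use.

```shell
#!/bin/sh
# Sketch: create the POSTINIT entry via midclt (TrueNAS middleware CLI).
# Method and field names are assumed from the initshutdownscript API; verify
# against your TrueNAS version first.
PAYLOAD='{"type": "COMMAND", "command": "ip rule add to 192.168.0.0/24 lookup main priority 5200", "when": "POSTINIT", "enabled": true, "comment": "Bypass Tailscale routing for LAN traffic (direct 10GbE)"}'

# Only attempt the call where midclt actually exists (i.e. on the NAS).
if command -v midclt >/dev/null 2>&1; then
    midclt call initshutdownscript.create "$PAYLOAD"
else
    echo "midclt not found; run this on the TrueNAS host"
fi
```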
### homelab-vm (Ubuntu 24.04)

Systemd service at `/etc/systemd/system/lan-route-fix.service`:

```ini
[Unit]
Description=Ensure LAN traffic bypasses Tailscale routing table
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/sbin/ip rule add to 192.168.0.0/24 lookup main priority 5200
ExecStop=/sbin/ip rule del to 192.168.0.0/24 lookup main priority 5200
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
```

Enabled with `sudo systemctl enable lan-route-fix.service`.
## Notes

- Remote nodes (Moon, Seattle, NUC) that are **not** on `192.168.0.0/24` are unaffected — they correctly use Calypso's subnet route to reach LAN devices via Tailscale.
- If a new host running host-level Tailscale joins the LAN, the same fix needs to be applied.
- The Synology boxes (Atlantis, Calypso) use Open vSwitch bridging and don't exhibit this issue.
docs/networking/SSH_MESH.md
# SSH Mesh — Key-Based Authentication Across All Hosts

All Tailscale-connected hosts can SSH to each other using ed25519 key authentication. No passwords needed.
## Participating Hosts

| Host | User | Tailscale IP | SSH Port | Key |
|------|------|-------------|----------|-----|
| homelab-vm | homelab | 100.67.40.126 | 22 | admin@thevish.io |
| atlantis | vish | 100.83.230.112 | 60000 | vish@atlantis |
| calypso | Vish | 100.103.48.78 | 62000 | calypso access |
| guava | vish | 100.75.252.64 | 22 | vish@guava |
| setillo | vish | 100.125.0.20 | 22 | setillo-key |
| pi-5 | vish | 100.77.151.40 | 22 | vish@pi-5 |
| nuc | vish | 100.72.55.21 | 22 | vish@nuc |
| moon | vish | 100.64.0.6 | 22 | vish@moon |
| seattle | root | 100.82.197.124 | 22 | root@seattle |
| matrix-ubuntu | test | 100.85.21.51 | 22 | test@matrix-ubuntu |
| jellyfish | lulu | 100.69.121.120 | 22 | lulu@jellyfish |
| pve | root | 100.87.12.28 | 22 | root@pve (RSA) |
| gl-mt3000 | root | 100.126.243.15 | 22 | (admin key only) |
| gl-be3600 | root | 100.105.59.123 | 22 | root@gl-be3600 |

The **admin key** (`admin@thevish.io` from homelab-vm) is present on every host.
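The non-default ports and users are easiest to keep in `~/.ssh/config`, so a plain `ssh atlantis` works. A hypothetical fragment for the two hosts with non-standard ports (these entries are not from the repo):

```
Host atlantis
    HostName 100.83.230.112
    User vish
    Port 60000

Host calypso
    HostName 100.103.48.78
    User Vish
    Port 62000
```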
## Ansible Playbook

Manage the mesh with `ansible/playbooks/ssh_mesh.yml`:

```bash
# Distribute keys to all hosts (collect + push)
ansible-playbook -i inventory.yml playbooks/ssh_mesh.yml --tags distribute

# Verify connectivity from localhost
ansible-playbook -i inventory.yml playbooks/ssh_mesh.yml --tags verify

# Generate missing keys + distribute
ansible-playbook -i inventory.yml playbooks/ssh_mesh.yml -e "generate_missing=true"
```

The `ssh_mesh` group in `inventory.yml` defines which hosts participate.
## Adding a New Host

1. Add the host to `ansible/inventory.yml` under the appropriate group and to the `ssh_mesh` children.
2. Run the playbook with key generation:
   ```bash
   ansible-playbook -i inventory.yml playbooks/ssh_mesh.yml -e "generate_missing=true"
   ```
3. This generates a key on the new host if needed, collects all keys, and distributes them everywhere.
## Notes

- **Synology NAS (Atlantis/Calypso/Setillo)**: the home directory must be `chmod 755` or stricter — sshd refuses key auth if the home directory is group- or world-writable. DSM can reset permissions on reboot.
- **OpenWrt routers (MT3000/BE3600)**: use Dropbear SSH, not OpenSSH. Keys must be in both `/etc/dropbear/authorized_keys` AND `/root/.ssh/authorized_keys`. Key auth works, but `ssh -o` flags differ slightly.
- **GL-BE3600 in repeater mode**: SSH port 22 is accessible via Tailscale only — LAN SSH is blocked by the repeater firewall. Use `100.105.59.123`, not `192.168.68.1`.
- **TrueNAS (Guava)**: home directory is at `/mnt/data/vish-home/vish/`, not `/home/vish/`.
- **pi-5-kevin**: frequently offline — will fail verification but has keys distributed.
- **homelab-vm**: SSH config historically uses password auth to itself; key auth works to all other hosts.
- **rsync to Atlantis**: rsync from homelab-vm to Atlantis fails (Synology SSH subsystem issue). Use `scp -O -r -P 60000` instead, or pull from Atlantis.
## Router Tailscale Auto-Start

Both GL.iNet routers have init scripts to auto-connect to Headscale on boot:

**GL-MT3000** (`/etc/init.d/tailscale-up`, START=81):
```sh
tailscale up --accept-routes --login-server=https://headscale.vish.gg:8443 --accept-dns=false --advertise-routes=192.168.12.0/24
```

**GL-BE3600** (`/etc/init.d/tailscale-up`, START=99):
- Waits for network connectivity (repeater mode needs WiFi first)
- Polls every 2s for up to 120s before running `tailscale up`
- Advertises `192.168.68.0/22,192.168.8.0/24`

Update script on both: `/root/update-tailscale.sh` (Admon's GL.iNet updater; use `--force` for non-interactive runs).
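The BE3600's wait-then-connect behavior can be sketched as a generic poll loop. This is an assumed shape, not the actual init script on the router:

```shell
#!/bin/sh
# Sketch of the GL-BE3600 startup gating (assumed shape, not the real
# /etc/init.d/tailscale-up): run a probe command every $2 seconds, give up
# after $3 attempts. 2s x 60 attempts = the 120s budget from the doc.
wait_for_net() {
    probe=$1; interval=$2; tries=$3
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $probe; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1
}

# On the router this would gate tailscale up, e.g.:
# wait_for_net "ping -c 1 -W 2 headscale.vish.gg" 2 60 && \
#     tailscale up --accept-routes --login-server=https://headscale.vish.gg:8443 --accept-dns=false
```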
## Established 2026-03-23, updated 2026-03-24