# Guava SMB Incident — 2026-03-14

**Affected host:** guava (TrueNAS SCALE, `100.75.252.64` / `192.168.0.100`)
**Affected client:** shinku-ryuu (Windows, `192.168.0.3`)
**Symptoms:** All SMB shares on guava unreachable from shinku after guava reboot

---

## Root Causes (two separate issues)

### 1. Tailscale app was STOPPED after reboot

Guava's Tailscale was running as an **orphaned host process** rather than as the managed TrueNAS app. On reboot the orphaned process was not restarted, and the managed app did not start because it was in the `STOPPED` state.

**Why it was stopped:** The app had been upgraded from v1.3.30 → v1.4.2. The new version's startup script ran `tailscale up`, which failed because the stored state had `--accept-dns=false` while the app config had `accept_dns: true`, a mismatch that requires `--reset`. The app exited, leaving the old manually started daemon running until the next reboot.

### 2. Tailscale `accept_routes: true` caused SMB replies to route via tunnel

After fixing the app startup, shinku still couldn't reach guava on the LAN. The cause:

- **Calypso** advertises `192.168.0.0/24` as a subnet route via Tailscale
- Guava had `accept_routes: true`, so it installed Calypso's `192.168.0.0/24` route into Tailscale's policy routing table (table 52, rule priority 5270)
- When shinku sent a TCP SYN to guava port 445, it arrived on `enp1s0f0np0`
- The route lookup for guava's reply to `192.168.0.3` hit table 52 first, so the reply was sent **out via `tailscale0`** instead of the LAN
- The reply never reached shinku; the connection timed out

This also affected shinku: it had `accept_routes: true` as well, so it was routing traffic destined for `192.168.0.100` via Calypso's Tailscale tunnel rather than its local Ethernet interface.
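
The lookup order described above can be sketched as a small model. This is a simplified illustration, not the kernel's or Tailscale's implementation: it assumes only two rules (priority 5270 for table 52, as noted above, and 32766, the default priority of the main-table rule on Linux) and uses a hypothetical `lookup` helper. It suggests why the reply to `192.168.0.3` leaves via `tailscale0` while table 52 holds the subnet route, and via `enp1s0f0np0` once it does not.

```python
import ipaddress

# Simplified model of Linux policy routing: rules are checked in ascending
# priority order, and the first table containing a matching route wins.
# Table 52 / priority 5270 come from the incident notes; 32766 is the
# default priority of the main-table rule. Other rules are omitted.
RULES = [
    (5270, "52"),
    (32766, "main"),
]


def lookup(dst, tables, rules=RULES):
    """Return (table, device) for dst, or None if no table has a match."""
    addr = ipaddress.ip_address(dst)
    for _priority, table in sorted(rules):
        # Within one table, the longest matching prefix wins.
        matches = [
            (net.prefixlen, dev)
            for net, dev in tables.get(table, [])
            if addr in net
        ]
        if matches:
            return table, max(matches)[1]
    return None


LAN = ipaddress.ip_network("192.168.0.0/24")

# accept_routes: true, so Calypso's subnet route is present in table 52.
with_accept_routes = {
    "52": [(LAN, "tailscale0")],
    "main": [(LAN, "enp1s0f0np0")],
}
# accept_routes: false, so table 52 has no entry for the LAN.
without_accept_routes = {
    "52": [],
    "main": [(LAN, "enp1s0f0np0")],
}

print(lookup("192.168.0.3", with_accept_routes))     # ('52', 'tailscale0')
print(lookup("192.168.0.3", without_accept_routes))  # ('main', 'enp1s0f0np0')
```
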

---

## Fixes Applied

### Fix 1 — Tailscale app startup config

Updated the TrueNAS app config to match the node's intended state:

```bash
sudo midclt call app.update tailscale '{"values": {"tailscale": {
  "accept_dns": false,
  "accept_routes": false,
  "advertise_exit_node": true,
  "advertise_routes": [],
  "auth_key": "...",
  "auth_once": true,
  "hostname": "truenas-scale",
  "reset": true
}}}'
```

Key changes:
- `accept_dns: false` matches the running state stored in Tailscale's state dir
- `accept_routes: false` prevents guava from pulling in subnet routes from other nodes (see Fix 2)
- `reset: true` clears the flag mismatch that was causing `tailscale up` to fail

**Saved in:** `/mnt/.ix-apps/app_configs/tailscale/versions/1.4.2/user_config.yaml`

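
Hand-quoting nested JSON inside a shell command is easy to get wrong, so one option is to generate the command line instead. The sketch below is only an illustration: `build_tailscale_update` is a hypothetical helper, it prints the command rather than running it, and it includes just the three keys discussed above. Whether `app.update` accepts a partial `values` object is an assumption that has not been verified here, so the remaining keys from the command above may need to be included.

```python
import json
import shlex


def build_tailscale_update(settings, app="tailscale"):
    """Return a shell-safe `midclt call app.update` command line (not executed)."""
    payload = json.dumps({"values": {"tailscale": settings}})
    return f"sudo midclt call app.update {shlex.quote(app)} {shlex.quote(payload)}"


# Only the keys called out under "Key changes"; this set is illustrative.
settings = {
    "accept_dns": False,
    "accept_routes": False,
    "reset": True,
}

print(build_tailscale_update(settings))
```

`json.dumps` takes care of `true`/`false` spelling and `shlex.quote` of the outer quoting, which leaves fewer places for a typo to slip in.
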
### Fix 2 — Remove stale subnet routes from guava's routing table

After the app config was updated, the stale routes persisted in table 52. They were removed manually:

```bash
sudo ip route del 192.168.0.0/24 dev tailscale0 table 52
sudo ip route del 192.168.12.0/24 dev tailscale0 table 52
sudo ip route del 192.168.68.0/22 dev tailscale0 table 52
sudo ip route del 192.168.69.0/24 dev tailscale0 table 52
```

With `accept_routes: false` now saved, these routes will not reappear on the next reboot.

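
If the same cleanup is ever needed again, the delete commands could be derived from the output of `ip route show table 52` rather than typed by hand. The following is a minimal sketch under stated assumptions: `stale_route_commands` is a hypothetical helper, the sample text is made up for illustration (including the `100.64.0.5` peer entry), and the function only builds command strings for review without running anything.

```python
def stale_route_commands(table_output, device="tailscale0", table="52"):
    """Build `ip route del` commands for 192.168.x routes bound to `device`."""
    commands = []
    for line in table_output.splitlines():
        fields = line.split()
        # Expected line shape: "<prefix> dev <device> ..."
        if len(fields) >= 3 and fields[1] == "dev" and fields[2] == device:
            if fields[0].startswith("192.168."):
                commands.append(
                    f"sudo ip route del {fields[0]} dev {device} table {table}"
                )
    return commands


# Illustrative sample, not captured from guava.
sample = """\
100.64.0.5 dev tailscale0
192.168.0.0/24 dev tailscale0
192.168.12.0/24 dev tailscale0
"""

for cmd in stale_route_commands(sample):
    print(cmd)
```

Printing the commands first keeps a human in the loop before any route is actually removed.
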
### Fix 3 — Disable accept_routes on shinku

Shinku was also accepting Calypso's `192.168.0.0/24` route (metric 0 via Tailscale, beating Ethernet 3's metric 256):

```
# Before fix: traffic to 192.168.0.100 went via Tailscale
192.168.0.0/24    100.100.100.100    0      Tailscale

# After fix: traffic goes via local LAN
192.168.0.0/24    0.0.0.0            256    Ethernet 3
```

Fixed by running on shinku:
```
tailscale up --accept-routes=false --login-server=https://headscale.vish.gg:8443
```
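
The metric comparison above can be illustrated with a short model of route selection: the longest matching prefix wins, and ties are broken by the lowest metric. This is a sketch, not Windows' actual implementation. `pick_route` is a hypothetical helper, and it assumes that both routes were present at the same time before the fix, which the route listing above does not show directly.

```python
import ipaddress


def pick_route(dst, routes):
    """Pick the route for dst: longest prefix first, then lowest metric."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen, r["metric"]),
    )


# Metrics and interface names are taken from the route listing above.
before = [
    {"prefix": "192.168.0.0/24", "metric": 0, "interface": "Tailscale"},
    {"prefix": "192.168.0.0/24", "metric": 256, "interface": "Ethernet 3"},
]
# With --accept-routes=false the Tailscale route is no longer installed.
after = [r for r in before if r["interface"] != "Tailscale"]

print(pick_route("192.168.0.100", before)["interface"])  # Tailscale
print(pick_route("192.168.0.100", after)["interface"])   # Ethernet 3
```
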
### Fix 4 — SMB password reset and credential cache

The SMB password for `vish` on guava was changed via the TrueNAS web UI. Windows still had the stale credentials cached. This was fixed by:

1. Clearing the Windows Credential Manager entry for `192.168.0.100`
2. Re-mapping the shares from an interactive PowerShell session on shinku

---

## SMB Share Layout on Guava

| Windows drive | Share | Path on guava |
|--------------|-------|---------------|
| I: | `guava_turquoise` | `/mnt/data/guava_turquoise` |
| J: | `photos` | `/mnt/data/photos` |
| K: | `data` | `/mnt/data/passionfruit` |
| L: | `website` | `/mnt/data/website` |
| M: | `jellyfin` | `/mnt/data/jellyfin` |
| N: | `truenas-exporters` | `/mnt/data/truenas-exporters` |
| Q: | `iso` | `/mnt/data/iso` |

All shares use `vish` as the SMB user. Credentials are stored in Windows Credential Manager under `192.168.0.100`.

---

## Diagnosis Commands

```bash
# Check Tailscale app state on guava
ssh guava "sudo midclt call app.query '[[\"name\",\"=\",\"tailscale\"]]' | python3 -c 'import sys,json; a=json.load(sys.stdin)[0]; print(a[\"name\"], a[\"state\"])'"

# Check for rogue subnet routes in Tailscale's routing table
ssh guava "ip route show table 52 | grep 192.168"

# Check tailscale container logs
ssh guava "sudo docker logs \$(sudo docker ps | grep tailscale | awk '{print \$1}' | head -1) 2>&1 | tail -20"

# Check SMB audit log for auth failures on guava
ssh guava "sudo journalctl -u smbd --since '1 hour ago' --no-pager | grep -i 'wrong_password\|STATUS'"

# Check which Tailscale peer is advertising a given subnet (run on any node)
tailscale status --json | python3 -c "
import sys, json
d = json.load(sys.stdin)
for peer in d.get('Peer', {}).values():
    routes = peer.get('PrimaryRoutes') or []
    if routes:
        print(peer['HostName'], routes)
"
```
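
To reproduce the timeout symptom from a client without mapping a share, one option is a plain TCP connection attempt to port 445. The sketch below uses only the Python standard library; `tcp_reachable` is a hypothetical helper name, and a `True` result only indicates that the TCP handshake completed, not that SMB authentication would succeed.

```python
import socket


def tcp_reachable(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False


# Example call (guava's LAN address from this report):
#   tcp_reachable("192.168.0.100")
```
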

---

## Prevention

- **Guava:** `accept_routes: false` is now saved in the TrueNAS app config and will survive reboots
- **Shinku:** `--accept-routes=false` was set via `tailscale up` and survives reboots
- **General rule:** Hosts on the same LAN as the subnet-advertising node (Calypso → `192.168.0.0/24`) should have `accept_routes: false`, or the advertised subnet should be scoped to only the nodes that need remote access to that LAN
- **TrueNAS app upgrades:** After upgrading the Tailscale app version, always check the new `user_config.yaml` to ensure `accept_dns`, `accept_routes`, and other flags match the node's actual running state. If unsure, set `reset: true` once to clear any stale state, then set it back to `false`
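
The post-upgrade check in the last bullet amounts to comparing two sets of flags. A minimal sketch of that comparison follows, assuming both sides have already been loaded into plain dictionaries keyed like the app config. How the stored state is read from the node is left out, and `find_flag_drift` is a hypothetical helper name.

```python
def find_flag_drift(stored, desired, keys=("accept_dns", "accept_routes")):
    """Return {key: (stored_value, desired_value)} for every mismatching key."""
    return {
        key: (stored.get(key), desired.get(key))
        for key in keys
        if stored.get(key) != desired.get(key)
    }


# Illustrative values mirroring the incident: the stored state had accept-dns
# disabled, while the upgraded app config enabled it.
stored_state = {"accept_dns": False, "accept_routes": True}
app_config = {"accept_dns": True, "accept_routes": True}

print(find_flag_drift(stored_state, app_config))  # {'accept_dns': (False, True)}
```

An empty result means the listed flags agree; anything else is worth resolving before the next reboot.
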