# Guava SMB Incident — 2026-03-14
**Affected host:** guava (TrueNAS SCALE, `100.75.252.64` / `192.168.0.100`)
**Affected client:** shinku-ryuu (Windows, `192.168.0.3`)
**Symptoms:** All SMB shares on guava unreachable from shinku after guava reboot
---
## Root Causes (two separate issues)
### 1. Tailscale app was STOPPED after reboot
Guava's Tailscale was running as an **orphaned host process** rather than the managed TrueNAS app. On reboot the orphan was gone and the app didn't start because it was in `STOPPED` state.
**Why it was stopped:** The app had been upgraded from v1.3.30 → v1.4.2. The new version's startup script ran `tailscale up` but failed because the stored state had `--accept-dns=false` while the app config had `accept_dns: true` — a mismatch that requires `--reset`. The app exited, leaving the old manually-started daemon running until the next reboot.
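The failure mode can be reproduced by hand inside the app container. A sketch of the mismatch and the one-time recovery (error text paraphrased, not verbatim):

```bash
# tailscale up refuses to change settings when the stored preferences
# disagree with the flags it is given — it asks for either all
# non-default flags to be repeated, or --reset
tailscale up --accept-dns=true
# fails with "... requires mentioning all non-default flags ... or use --reset"

# one-time recovery: --reset discards the stored preferences and applies
# the flags as given, matching the state the app config now declares
tailscale up --reset --accept-dns=false
```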
### 2. Tailscale `accept_routes: true` caused SMB replies to route via tunnel
After fixing the app startup, shinku still couldn't reach guava on the LAN. The cause:
- **Calypso** advertises `192.168.0.0/24` as a subnet route via Tailscale
- Guava had `accept_routes: true` — it installed Calypso's `192.168.0.0/24` route into Tailscale's policy routing table (table 52, priority 5270)
- When shinku sent a TCP SYN to guava port 445, it arrived on `enp1s0f0np0`
- Guava's reply looked up `192.168.0.3` in the routing table — hit table 52 first — and sent the reply **out via `tailscale0`** instead of the LAN
- The reply never reached shinku; the connection timed out
This also affected shinku: it had `accept_routes: true` as well, so it was routing traffic destined for `192.168.0.100` via Calypso's Tailscale tunnel rather than its local Ethernet interface.
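On guava this asymmetric path is visible directly from the shell: Tailscale installs an `ip rule` at priority 5270 pointing at table 52, and `ip route get` shows which interface a reply would take. Roughly (output illustrative):

```bash
# Tailscale's policy-routing rule — consulted before the main table
ip rule show | grep 5270
#   5270: from all lookup 52

# Before the fix: replies to the LAN client resolve via table 52 → tailscale0
ip route get 192.168.0.3
#   192.168.0.3 dev tailscale0 table 52 ...

# After removing the route (Fix 2), the lookup falls through to the main
# table and picks the LAN interface instead
ip route get 192.168.0.3
#   192.168.0.3 dev enp1s0f0np0 ...
```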
---
## Fixes Applied
### Fix 1 — Tailscale app startup config
Updated the TrueNAS app config to match the node's actual desired state:
```bash
sudo midclt call app.update tailscale '{"values": {"tailscale": {
  "accept_dns": false,
  "accept_routes": false,
  "advertise_exit_node": true,
  "advertise_routes": [],
  "auth_key": "...",
  "auth_once": true,
  "hostname": "truenas-scale",
  "reset": true
}}}'
```
Key changes:
- `accept_dns: false` — matches the running state stored in Tailscale's state dir
- `accept_routes: false` — prevents guava from pulling in subnet routes from other nodes (see Fix 2)
- `reset: true` — clears the flag mismatch that was causing `tailscale up` to fail
**Saved in:** `/mnt/.ix-apps/app_configs/tailscale/versions/1.4.2/user_config.yaml`
### Fix 2 — Remove stale subnet routes from guava's routing table
After updating the app config the stale routes persisted in table 52. Removed manually:
```bash
sudo ip route del 192.168.0.0/24 dev tailscale0 table 52
sudo ip route del 192.168.12.0/24 dev tailscale0 table 52
sudo ip route del 192.168.68.0/22 dev tailscale0 table 52
sudo ip route del 192.168.69.0/24 dev tailscale0 table 52
```
With `accept_routes: false` now saved, these routes will not reappear on next reboot.
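To confirm the cleanup holds after the next reboot, table 52 should contain only guava's own Tailscale entries (`100.64.0.0/10` addresses) and no `192.168.*` routes. A quick check:

```bash
ssh guava "ip route show table 52 | grep 192.168 || echo 'no stale LAN routes'"
```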
### Fix 3 — Disable accept_routes on shinku
Shinku was also accepting Calypso's `192.168.0.0/24` route (metric 0 via Tailscale, beating Ethernet 3's metric 256):
```
# Destination       Gateway            Metric   Interface
# Before fix — traffic to 192.168.0.100 went via Tailscale
192.168.0.0/24      100.100.100.100    0        Tailscale
# After fix — traffic goes via the local LAN
192.168.0.0/24      0.0.0.0            256      Ethernet 3
```
Fixed by running on shinku:
```
tailscale up --accept-routes=false --login-server=https://headscale.vish.gg:8443
```
### Fix 4 — SMB password reset and credential cache
The SMB password for `vish` on guava was changed via the TrueNAS web UI. Windows had stale credentials cached. Fixed by:
1. Clearing Windows Credential Manager entry for `192.168.0.100`
2. Re-mapping shares from an interactive PowerShell session on shinku
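A sketch of those two steps from an elevated PowerShell prompt on shinku, using one share from the table below as an example (`cmdkey` and `net use` are the standard Windows tools here; flags worth verifying locally):

```
# 1. Drop the stale cached credential for guava
cmdkey /delete:192.168.0.100

# 2. Re-map a share with the new password (repeat per drive letter)
net use J: \\192.168.0.100\photos /user:vish /persistent:yes
```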
---
## SMB Share Layout on Guava
| Windows drive | Share | Path on guava |
|--------------|-------|---------------|
| I: | `guava_turquoise` | `/mnt/data/guava_turquoise` |
| J: | `photos` | `/mnt/data/photos` |
| K: | `data` | `/mnt/data/passionfruit` |
| L: | `website` | `/mnt/data/website` |
| M: | `jellyfin` | `/mnt/data/jellyfin` |
| N: | `truenas-exporters` | `/mnt/data/truenas-exporters` |
| Q: | `iso` | `/mnt/data/iso` |
All shares use `vish` as the SMB user. Credentials stored in Windows Credential Manager under `192.168.0.100`.
---
## Diagnosis Commands
```bash
# Check Tailscale app state on guava
ssh guava "sudo midclt call app.query '[[\"name\",\"=\",\"tailscale\"]]' | python3 -c 'import sys,json; a=json.load(sys.stdin)[0]; print(a[\"name\"], a[\"state\"])'"
# Check for rogue subnet routes in Tailscale's routing table
ssh guava "ip route show table 52 | grep 192.168"
# Check tailscale container logs
ssh guava "sudo docker logs \$(sudo docker ps | grep tailscale | awk '{print \$1}' | head -1) 2>&1 | tail -20"
# Check SMB audit log for auth failures on guava
ssh guava "sudo journalctl -u smbd --since '1 hour ago' --no-pager | grep -i 'wrong_password\|STATUS'"
# Check which Tailscale peer is advertising a given subnet (run on any node)
tailscale status --json | python3 -c "
import sys, json
d = json.load(sys.stdin)
for peer in d.get('Peer', {}).values():
    routes = peer.get('PrimaryRoutes') or []
    if routes:
        print(peer['HostName'], routes)
"
```
---
## Prevention
- **Guava:** `accept_routes: false` is now saved in the TrueNAS app config — will survive reboots
- **Shinku:** `--accept-routes=false` set via `tailscale up` — survives reboots
- **General rule:** Hosts on the same LAN as the subnet-advertising node (Calypso → `192.168.0.0/24`) should have `accept_routes: false`, or the advertised subnet should be scoped to only nodes that need remote access to that LAN
- **TrueNAS app upgrades:** After upgrading the Tailscale app version, always check the new `user_config.yaml` to ensure `accept_dns`, `accept_routes`, and other flags match the node's actual running state. If unsure, set `reset: true` once to clear any stale state, then set it back to `false`
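The post-upgrade flag check can be done without opening the UI. A minimal sketch of the kind of check, shown here against a sample file (`/tmp/user_config_sample.yaml` is a stand-in; on guava the real file is the `user_config.yaml` path listed under Fix 1):

```bash
# Stand-in for the real user_config.yaml, with the keys that bit us
cat > /tmp/user_config_sample.yaml <<'EOF'
tailscale:
  accept_dns: false
  accept_routes: false
  reset: false
EOF

# Flag check: these lines should match the node's actual running state
grep -E 'accept_dns|accept_routes|reset' /tmp/user_config_sample.yaml
```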