Steam Deck Runbook
SteamOS handheld — tailnet node with self-healing watchdog.
Headscale ID: 29
Status: 🟢 Online
Hardware: Steam Deck (AMD APU, SteamOS Holo, btrfs root)
Access: ssh deck (key-based, via ~/.ssh/config alias → 192.168.0.140)
Tailnet IP: 100.64.0.11 (MagicDNS deck.tail.vish.gg)
Overview
The Steam Deck participates in the homelab tailnet for SSH/remote access. Because SteamOS ships with an immutable `/usr` and a read-only `/usr/local`, all custom configuration lives in `/etc` (writable via overlay) and `/opt` (writable, outside Valve's managed tree). Logs and Tailscale state live under `/var`.
Filesystem layout
| Path | Purpose | Survives SteamOS update? |
|---|---|---|
| `/opt/tailscale/tailscale`, `/opt/tailscale/tailscaled` | Tailscale binaries (standard Steam Deck install location) | Usually yes |
| `/etc/systemd/system/tailscaled.service` + `tailscaled.service.d/override.conf` | systemd unit + override pointing `ExecStart` at `/opt/tailscale/tailscaled` | Overlay; may be wiped on major updates |
| `/etc/tailscale/authkey` (0600 root) | Reusable Headscale preauth key used by the watchdog for re-auth | Overlay; may be wiped |
| `/etc/tailscale/watchdog.sh` (0755 root) | Re-auth watchdog (bash + python3) | Overlay; may be wiped |
| `/etc/systemd/system/tailscale-watchdog.{service,timer}` | Watchdog systemd units | Overlay; may be wiped |
| `/etc/hosts` | Contains a pin `<public-ip> headscale.vish.gg` maintained by the watchdog | Overlay; may be wiped |
| `/var/log/tailscale-watchdog.log` | Watchdog activity log | Yes (on `/var`) |
After any SteamOS upgrade, verify these files still exist with `ls /etc/tailscale/ /etc/systemd/system/tailscale-watchdog.*`. If the overlay was reset, re-run the setup (see `docs/infrastructure/hosts/deck-runbook.md`, section "Recovering after a SteamOS update").
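A scripted version of that check, as a sketch: it walks the paths from the layout table and also looks for the `headscale.vish.gg` pin in `/etc/hosts`. The function name `deck_missing_files` and its optional root-prefix argument are illustrative assumptions, not part of the checked-in tooling.

```shell
#!/usr/bin/env bash
# Post-update check: print every file from the layout table that is missing,
# plus a missing /etc/hosts pin. Optional $1 is a root prefix (default /).
# deck_missing_files is a hypothetical helper, not a checked-in script.
deck_missing_files() {
  local root="${1:-/}" missing=0 p
  root="${root%/}"   # "/" becomes "" so paths join as /etc/...
  for p in \
      etc/systemd/system/tailscaled.service \
      etc/systemd/system/tailscaled.service.d/override.conf \
      etc/tailscale/authkey \
      etc/tailscale/watchdog.sh \
      etc/systemd/system/tailscale-watchdog.service \
      etc/systemd/system/tailscale-watchdog.timer \
      opt/tailscale/tailscale \
      opt/tailscale/tailscaled; do
    if [ ! -e "$root/$p" ]; then
      echo "missing: /$p"
      missing=1
    fi
  done
  # The pin is a line in /etc/hosts rather than a file of its own.
  if ! grep -qsF headscale.vish.gg "$root/etc/hosts"; then
    echo "missing: headscale.vish.gg pin in /etc/hosts"
    missing=1
  fi
  return "$missing"
}
```

Run on the Deck with no argument, a non-zero exit status means at least one item needs restoring from the recovery section.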
Tailscale / Headscale
- Control server: `https://headscale.vish.gg:8443` (migrated off public Tailscale 2026-04-19).
- Preauth key: reusable, 1-year expiry, stored at `/etc/tailscale/authkey`. Reusable so the watchdog can re-authenticate without human intervention.
- Node expiry: registered nodes in Headscale do not auto-expire unless explicitly expired with `headscale nodes expire`. If you want the `0001-01-01` sentinel (node-level "never expires"), that requires Headscale DB manipulation; not currently applied.
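Because unattended re-auth depends entirely on the stored key, a quick permission check is cheap insurance. This is a sketch that assumes GNU `stat -c`; the helper name `check_authkey` is hypothetical, and the expected owner defaults to `root` per the layout table.

```shell
#!/usr/bin/env bash
# Check the stored preauth key: present, non-empty, mode 600, expected owner.
# check_authkey is a hypothetical helper; assumes GNU stat (-c).
check_authkey() {
  local f="${1:-/etc/tailscale/authkey}" want_owner="${2:-root}" mode owner
  if [ ! -s "$f" ]; then
    echo "missing or empty: $f"
    return 1
  fi
  mode=$(stat -c %a "$f")    # octal permission bits, e.g. 600
  owner=$(stat -c %U "$f")   # owning user name
  if [ "$mode" != 600 ] || [ "$owner" != "$want_owner" ]; then
    echo "bad permissions on $f: mode=$mode owner=$owner (want 600, $want_owner)"
    return 1
  fi
  echo "ok: $f"
}
```

`stat` needs only search permission on `/etc/tailscale`, not read access to the key, so the check does not have to print or read the secret.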
Watchdog behavior
`/etc/tailscale/watchdog.sh` runs every 5 minutes via `tailscale-watchdog.timer` (`OnBootSec=2min`, `OnUnitActiveSec=5min`). Each tick:

- Calls `tailscale status --json` and extracts `BackendState` via `python3`.
- If `BackendState` is `Running`, exits silently.
- Otherwise (`NeedsLogin`, `Stopped`, `NoState`, or daemon missing):
  - Refreshes the `/etc/hosts` pin for `headscale.vish.gg` using DNS-over-HTTPS (`dns.google`, fallback `1.1.1.1`). This is needed because the Deck has no `dig`, `nslookup`, or `host` (only `python3`), and because on the LAN the local resolver returns the internal LAN IP for `headscale.vish.gg` (split-horizon DNS), which is useless once the Deck is travelling.
  - Re-runs `tailscale up --login-server=https://headscale.vish.gg:8443 --authkey=<stored> --accept-routes=false --hostname=deck`.
  - Logs to `/var/log/tailscale-watchdog.log`.
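The parsing and pin-refresh steps above can be sketched with the same bash-plus-`python3` split the watchdog uses. This is an assumed reconstruction with hypothetical function names; the checked-in `watchdog.sh` remains the reference. Both `https://dns.google/resolve` and Cloudflare's `/dns-query` endpoint (with `accept: application/dns-json`) return JSON whose `Answer` array is what `doh_first_a` reads.

```shell
#!/usr/bin/env bash
# Sketch of the watchdog's parsing and /etc/hosts-pin steps.
# All function names are hypothetical; the checked-in watchdog.sh is authoritative.
HS_HOST=headscale.vish.gg

# Read `tailscale status --json` on stdin and print BackendState.
# Empty or non-JSON input (e.g. the daemon is not running) prints "Missing".
backend_state() {
  python3 -c '
import json, sys
try:
    print(json.load(sys.stdin).get("BackendState", "Missing"))
except ValueError:
    print("Missing")
'
}

# Read a DoH JSON response on stdin and print the first A record (type 1),
# skipping CNAME (type 5) entries. Prints nothing if there is no A record.
doh_first_a() {
  python3 -c '
import json, sys
try:
    answers = json.load(sys.stdin).get("Answer", [])
except ValueError:
    answers = []
for rr in answers:
    if rr.get("type") == 1:
        print(rr["data"])
        break
'
}

# Print the raw JSON body for DoH URL $1, or nothing on any network/HTTP error.
doh_fetch() {
  python3 - "$1" <<'PY'
import sys, urllib.request
req = urllib.request.Request(sys.argv[1], headers={"accept": "application/dns-json"})
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        sys.stdout.write(resp.read().decode())
except OSError:
    pass
PY
}

# Resolve the public A record for $HS_HOST: dns.google first, then 1.1.1.1.
resolve_public_ip() {
  local url ip
  for url in "https://dns.google/resolve?name=$HS_HOST&type=A" \
             "https://1.1.1.1/dns-query?name=$HS_HOST&type=A"; do
    ip=$(doh_fetch "$url" | doh_first_a)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
  done
  return 1
}

# Drop any non-comment line in hosts file $1 that maps $HS_HOST, then append
# "$2 $HS_HOST". Writing back with cat keeps the file's inode and permissions.
set_hosts_pin() {
  local file="$1" ip="$2" tmp
  tmp=$(mktemp) || return 1
  awk -v h="$HS_HOST" '
    $1 ~ /^#/ { print; next }
    { for (i = 2; i <= NF; i++) if ($i == h) next; print }
  ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  printf '%s %s\n' "$ip" "$HS_HOST" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

As root, a tick would then run roughly `ip=$(resolve_public_ip) && set_hosts_pin /etc/hosts "$ip"` before re-running `tailscale up`. Replacing the line rather than appending keeps the pin idempotent across ticks.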
Verified failure-recovery matrix (2026-04-19)
| Failure | Recovery mechanism | Recovery time |
|---|---|---|
| `kill -9 tailscaled` | `Restart=on-failure` in `tailscaled.service` | ~3 s, PID rotated, state preserved |
| `tailscale down` | Watchdog detects `Stopped`, runs `tailscale up` | ~1 s after next timer tick (≤5 min) |
| `tailscale logout` | Watchdog detects `NeedsLogin`, runs `tailscale up` with stored authkey | ~4 s after next timer tick (≤5 min) |
| Boot | `tailscaled` auto-starts from `/var/lib/tailscale/tailscaled.state`; watchdog fires 2 min after boot as a safety net | not yet validated |
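To re-run a row of the matrix without waiting up to five minutes for the timer, the watchdog service can be started by hand after inducing the failure, and the recovery timed with a small poller. A sketch: `wait_for_running` is a hypothetical helper, and it assumes the CLI lives in `/opt/tailscale` as in the layout table.

```shell
#!/usr/bin/env bash
# Poll until BackendState is Running and report the elapsed time, or give up
# after $1 seconds (default 60). wait_for_running is a hypothetical helper.
export PATH="/opt/tailscale:$PATH"

wait_for_running() {
  local timeout="${1:-60}" start now state
  start=$(date +%s)
  while :; do
    state=$(tailscale status --json 2>/dev/null | python3 -c '
import json, sys
try:
    print(json.load(sys.stdin).get("BackendState", ""))
except ValueError:
    print("")
')
    now=$(date +%s)
    if [ "$state" = "Running" ]; then
      echo "Running after $((now - start)) s"
      return 0
    fi
    if [ $((now - start)) -ge "$timeout" ]; then
      echo "still '$state' after $timeout s"
      return 1
    fi
    sleep 1
  done
}
```

For the `tailscale logout` row, for example: `sudo tailscale logout`, then `sudo systemctl start --no-block tailscale-watchdog.service`, then `wait_for_running 30`. `--no-block` returns immediately, so the poller measures the watchdog's own run rather than starting after it has finished.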
Known gap
If `tailscaled` is stopped cleanly (`systemctl stop tailscaled`), the current watchdog logs "tailscaled not running" and tries `tailscale up`, which fails because the daemon socket is missing. On boot this is a non-issue (systemd starts `tailscaled`). At runtime, however, it would leave the Deck disconnected. If this becomes a problem, extend the watchdog to run `systemctl start tailscaled` when `pidof tailscaled` returns nothing.
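A sketch of that extension: one tick that starts `tailscaled` through systemd when `pidof` finds no process, then falls through to the usual state check and re-auth. Assumptions: `watchdog_tick` and `log` are hypothetical names, the 3-second settle delay is a guess rather than a measured value, the `/etc/hosts` refresh step is omitted for brevity, and the `tailscale up` flags are copied from the watchdog behavior list.

```shell
#!/usr/bin/env bash
# One watchdog tick with the known gap closed. watchdog_tick and log are
# hypothetical names; the tailscale up flags mirror "Watchdog behavior".
export PATH="/opt/tailscale:$PATH"   # the Deck's tailscale/tailscaled binaries
LOG=/var/log/tailscale-watchdog.log

log() { echo "$(date '+%Y-%m-%dT%H:%M:%S%z') $*" >> "$LOG"; }

watchdog_tick() {
  local state
  # Gap fix: a cleanly stopped daemon has no socket, so start it first.
  if [ -z "$(pidof tailscaled)" ]; then
    log "tailscaled not running; starting it via systemd"
    if ! systemctl start tailscaled; then
      log "systemctl start tailscaled failed"
      return 1
    fi
    sleep 3   # assumed settle time before the CLI can reach the daemon
  fi
  state=$(tailscale status --json 2>/dev/null | python3 -c '
import json, sys
try:
    print(json.load(sys.stdin).get("BackendState", ""))
except ValueError:
    print("")
')
  [ "$state" = "Running" ] && return 0
  log "BackendState=${state:-unknown}; re-running tailscale up"
  tailscale up --login-server=https://headscale.vish.gg:8443 \
    --authkey="$(cat /etc/tailscale/authkey)" \
    --accept-routes=false --hostname=deck
}
```

Starting the daemon through systemd rather than launching `tailscaled` directly keeps the override's `--state` and `--socket` paths and the `Restart=on-failure` policy in effect.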
SSH
- Alias on homelab-vm: `~/.ssh/config` entry → `Host deck` / `HostName 192.168.0.140` / `User deck` / `IdentityFile ~/.ssh/id_ed25519`.
- Installed key: `admin@thevish.io` ed25519 pubkey in `/home/deck/.ssh/authorized_keys`.
- Password (for sudo): same as initial login.
- MCP: `deck` is in `SSH_KNOWN_HOSTS` in `scripts/homelab-mcp/server.py`, so `ssh_exec(host="deck", …)` works from the homelab MCP.
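Recreating the alias on a rebuilt homelab-vm can be made idempotent. A sketch with a hypothetical helper name; the host block values are the ones listed above.

```shell
#!/usr/bin/env bash
# Append the deck alias to an ssh config unless a "Host deck" line exists.
# add_deck_alias is a hypothetical helper; values come from the SSH list.
add_deck_alias() {
  local cfg="${1:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$cfg")" && touch "$cfg" && chmod 600 "$cfg" || return 1
  if grep -qE '^[[:space:]]*Host[[:space:]]+deck[[:space:]]*$' "$cfg"; then
    return 0   # already present; leave the existing entry untouched
  fi
  cat >> "$cfg" <<'EOF'

Host deck
    HostName 192.168.0.140
    User deck
    IdentityFile ~/.ssh/id_ed25519
EOF
}
```

Running it twice leaves a single `Host deck` block, so it is safe to include in a provisioning script.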
Recovering after a SteamOS update
If the /etc overlay was wiped:
# 1. Re-install key
cat ~/.ssh/id_ed25519.pub | sshpass -p '<password>' ssh -o StrictHostKeyChecking=accept-new deck@192.168.0.140 \
'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
# 2. Restore systemd override for tailscaled
ssh deck 'echo <pw> | sudo -S mkdir -p /etc/systemd/system/tailscaled.service.d && \
echo -e "[Service]\nExecStartPre=\nExecStartPre=/opt/tailscale/tailscaled --cleanup\nExecStart=\nExecStart=/opt/tailscale/tailscaled --state=/var/lib/tailscale/tailscaled.state --socket=/run/tailscale/tailscaled.sock --port=\${PORT} \$FLAGS\nExecStopPost=\nExecStopPost=/opt/tailscale/tailscaled --cleanup" | sudo tee /etc/systemd/system/tailscaled.service.d/override.conf'
# 3. Restore /etc/hosts pin
ssh deck 'echo <pw> | sudo -S sh -c "grep -q headscale.vish.gg /etc/hosts || echo 184.23.52.14 headscale.vish.gg >> /etc/hosts"'
# 4. Create fresh reusable preauth key (via MCP headscale_create_preauth_key) and store it
AUTHKEY='hskey-auth-…' # pragma: allowlist secret (placeholder)
# Outer double quotes so $AUTHKEY expands locally before ssh sends the command
ssh deck "echo <pw> | sudo -S sh -c 'mkdir -p /etc/tailscale && umask 077 && printf %s \"$AUTHKEY\" > /etc/tailscale/authkey && chmod 600 /etc/tailscale/authkey'"
# 5. Reinstall watchdog (copy from git or re-apply from this runbook's source repo)
scp docs/infrastructure/hosts/deck/watchdog.sh deck:/tmp/
ssh deck 'echo <pw> | sudo -S install -m 0755 /tmp/watchdog.sh /etc/tailscale/watchdog.sh'
# 6. Reinstall + enable systemd units (sources under docs/infrastructure/hosts/deck/)
scp docs/infrastructure/hosts/deck/tailscale-watchdog.{service,timer} deck:/tmp/
ssh deck 'echo <pw> | sudo -S sh -c "install -m 0644 /tmp/tailscale-watchdog.service /etc/systemd/system/ && install -m 0644 /tmp/tailscale-watchdog.timer /etc/systemd/system/ && systemctl daemon-reload && systemctl enable --now tailscaled.service tailscale-watchdog.timer"'
The watchdog script and systemd unit sources are checked in under docs/infrastructure/hosts/deck/ so a recovery doesn't require reconstructing them from memory.
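If those checked-in unit files are ever unavailable, units with the same timing can be regenerated. This is a fallback sketch, not a copy of the checked-in files: `Type=oneshot` and both `Description=` strings are assumptions; only `OnBootSec=2min`, `OnUnitActiveSec=5min`, and the script path come from this runbook.

```shell
#!/usr/bin/env bash
# Fallback: write minimal watchdog units into $1 (default /etc/systemd/system).
# Type=oneshot and the Description strings are assumptions, not the repo copy.
write_watchdog_units() {
  local dir="${1:-/etc/systemd/system}"
  cat > "$dir/tailscale-watchdog.service" <<'EOF'
[Unit]
Description=Tailscale re-auth watchdog

[Service]
Type=oneshot
ExecStart=/etc/tailscale/watchdog.sh
EOF
  cat > "$dir/tailscale-watchdog.timer" <<'EOF'
[Unit]
Description=Run the Tailscale re-auth watchdog every 5 minutes

[Timer]
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
EOF
  chmod 0644 "$dir/tailscale-watchdog.service" "$dir/tailscale-watchdog.timer"
}
```

Writing into a scratch directory first allows a review before installing with `sudo install -m 0644` as in step 6, followed by `sudo systemctl daemon-reload && sudo systemctl enable --now tailscale-watchdog.timer`.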