Steam Deck Runbook

SteamOS handheld — tailnet node with self-healing watchdog.

Headscale ID: 29
Status: 🟢 Online
Hardware: Steam Deck (AMD APU, SteamOS Holo, btrfs root)
Access: ssh deck (key-based, via ~/.ssh/config alias → 192.168.0.140)
Tailnet IP: 100.64.0.11 (MagicDNS deck.tail.vish.gg)

Overview

The Steam Deck participates in the homelab tailnet for SSH/remote access. Because SteamOS ships with an immutable /usr and a read-only /usr/local, all custom state lives in /etc (writable via overlay) and /opt (writable, outside Valve's managed tree).

Filesystem layout

| Path | Purpose | Survives SteamOS update? |
|---|---|---|
| `/opt/tailscale/tailscale`, `/opt/tailscale/tailscaled` | Tailscale binaries (standard Steam Deck install location) | Usually yes |
| `/etc/systemd/system/tailscaled.service` + `tailscaled.service.d/override.conf` | systemd unit + override pointing ExecStart at `/opt/tailscale/tailscaled` | Overlay (may be wiped on major updates) |
| `/etc/tailscale/authkey` (0600 root) | Reusable Headscale preauth key used by the watchdog for re-auth | Overlay (may be wiped) |
| `/etc/tailscale/watchdog.sh` (0755 root) | Re-auth watchdog (bash + python3) | Overlay (may be wiped) |
| `/etc/systemd/system/tailscale-watchdog.{service,timer}` | Watchdog systemd units | Overlay (may be wiped) |
| `/etc/hosts` | Contains a pin `<public-ip> headscale.vish.gg` maintained by the watchdog | Overlay (may be wiped) |
| `/var/log/tailscale-watchdog.log` | Watchdog activity log | Yes (on /var) |

After any SteamOS upgrade, verify that these files still exist by running ls /etc/tailscale/ /etc/systemd/system/tailscale-watchdog.* on the Deck. If the overlay was reset, re-run the setup (see docs/infrastructure/hosts/deck-runbook.md, section "Recovering after a SteamOS update").
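The post-upgrade check can also be scripted. The following is a minimal sketch, not part of the checked-in tooling: the function name deck_overlay_missing and its optional root-prefix argument are hypothetical, and the path list is copied from the overlay rows of the table above.

```shell
#!/bin/sh
# Sketch: print every overlay-managed path from the table that is missing.
# The optional first argument is a root prefix (hypothetical, for dry runs
# against a scratch directory); leave it empty on the Deck itself.
deck_overlay_missing() {
    _root="${1:-}"
    for _path in \
        /etc/systemd/system/tailscaled.service \
        /etc/systemd/system/tailscaled.service.d/override.conf \
        /etc/tailscale/authkey \
        /etc/tailscale/watchdog.sh \
        /etc/systemd/system/tailscale-watchdog.service \
        /etc/systemd/system/tailscale-watchdog.timer
    do
        # Report the path only when nothing exists at that location.
        [ -e "${_root}${_path}" ] || printf '%s\n' "$_path"
    done
}
```

Empty output means the overlay survived. Any printed path points at a recovery step that needs to be re-run. /etc/hosts is left out because the file always exists; checking its pin needs a grep instead.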

Tailscale / Headscale

Control server: https://headscale.vish.gg:8443 (migrated off public Tailscale 2026-04-19).
Preauth key: reusable, 1-year expiry, stored at /etc/tailscale/authkey. Reusable so the watchdog can re-authenticate without human intervention.
Node expiry: registered nodes in Headscale do not auto-expire unless explicitly expired with headscale nodes expire. If you want the 0001-01-01 sentinel (node-level "never expires"), that requires Headscale DB manipulation — not currently applied.

Watchdog behavior

/etc/tailscale/watchdog.sh runs every 5 minutes via the tailscale-watchdog.timer (OnBootSec=2min, OnUnitActiveSec=5min). Each tick:

1. Calls tailscale status --json and extracts BackendState via python3.
2. If BackendState is Running, exits silently.
3. Otherwise (NeedsLogin, Stopped, NoState, or daemon missing):
   - Refreshes the /etc/hosts pin for headscale.vish.gg using DNS-over-HTTPS (dns.google, fallback 1.1.1.1). This is needed because the Deck has no dig/nslookup/host (only python3), and because the local resolver returns the internal LAN IP for headscale.vish.gg when on-LAN (split-horizon DNS), which is useless when the Deck is travelling.
   - Re-runs tailscale up --login-server=https://headscale.vish.gg:8443 --authkey=<stored> --accept-routes=false --hostname=deck.
   - Logs to /var/log/tailscale-watchdog.log.
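The building blocks of a tick can be sketched as below. This is an illustration under stated assumptions, not the checked-in watchdog.sh: the function names backend_state, doh_lookup and pin_host are hypothetical, the "NoState" fallback for unparsable input is an assumption, and only the dns.google resolver is shown (the 1.1.1.1 fallback is omitted).

```shell
#!/bin/sh
# Sketch of three watchdog building blocks. Function names are illustrative.

# Print BackendState from `tailscale status --json` supplied on stdin.
# Falls back to "NoState" when the input is empty or not a JSON object.
backend_state() {
    python3 -c '
import json, sys
try:
    print(json.load(sys.stdin).get("BackendState", "NoState"))
except (ValueError, AttributeError):
    print("NoState")
'
}

# Resolve NAME to its first A record via Google's DNS-over-HTTPS JSON API.
# Exits non-zero when no A record comes back.
doh_lookup() {
    python3 - "$1" <<'PY'
import json, sys, urllib.parse, urllib.request
query = urllib.parse.urlencode({"name": sys.argv[1], "type": "A"})
with urllib.request.urlopen("https://dns.google/resolve?" + query, timeout=10) as resp:
    answers = json.load(resp).get("Answer", [])
ips = [a["data"] for a in answers if a.get("type") == 1]  # type 1 = A record
if not ips:
    sys.exit(1)
print(ips[0])
PY
}

# Replace (or add) the pin for NAME in a hosts file.
# Usage: pin_host IP NAME [HOSTS_FILE]   (runs in a subshell to keep variables local)
pin_host() (
    ip="$1"; name="$2"; hosts="${3:-/etc/hosts}"
    tmp=$(mktemp) || exit 1
    # Keep comments, drop any entry that already names NAME, keep the rest.
    awk -v n="$name" '
        /^[ \t]*#/ { print; next }
        { for (i = 2; i <= NF; i++) if ($i == n) next; print }
    ' "$hosts" > "$tmp" || exit 1
    printf '%s %s\n' "$ip" "$name" >> "$tmp"
    # Overwrite in place so the hosts file keeps its owner and mode.
    cat "$tmp" > "$hosts" && rm -f "$tmp"
)

# A tick could then be wired roughly as:
#   state=$(tailscale status --json 2>/dev/null | backend_state)
#   [ "$state" = "Running" ] && exit 0
#   ip=$(doh_lookup headscale.vish.gg) && pin_host "$ip" headscale.vish.gg
```

Keeping the JSON parsing and the hosts rewrite separate from the network call means both can be exercised offline against canned input.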

Verified failure-recovery matrix (2026-04-19)

| Failure | Recovery mechanism | Recovery time |
|---|---|---|
| `kill -9 tailscaled` | `Restart=on-failure` in tailscaled.service | ~3 s, PID rotated, state preserved |
| `tailscale down` | Watchdog detects `Stopped`, runs `tailscale up` | ~1 s after next timer tick (≤5 min) |
| `tailscale logout` | Watchdog detects `NeedsLogin`, runs `tailscale up` with stored authkey | ~4 s after next timer tick (≤5 min) |
| Boot | tailscaled auto-starts from `/var/lib/tailscale/tailscaled.state`; watchdog fires 2 min after boot as a safety net | not yet validated |

Known gap

If tailscaled is stopped cleanly (systemctl stop tailscaled), the current watchdog logs "tailscaled not running" and tries tailscale up, which fails because the daemon socket is missing. On boot this is a non-issue (systemd starts tailscaled). During runtime, this would leave the Deck disconnected. If this becomes a problem, extend the watchdog to systemctl start tailscaled when pidof tailscaled is empty.
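One possible shape for that extension is sketched below. It is untested on the Deck and makes several assumptions: the function name ensure_tailscaled and the TAILSCALED_SOCK / TAILSCALED_WAIT variables are hypothetical, the socket path is taken from the override in this runbook, and the watchdog is assumed to run as root so that systemctl needs no sudo.

```shell
#!/bin/sh
# Sketch: make sure the daemon is up before the watchdog calls `tailscale up`.
# TAILSCALED_SOCK and TAILSCALED_WAIT are hypothetical knobs for this sketch.
ensure_tailscaled() {
    _sock="${TAILSCALED_SOCK:-/run/tailscale/tailscaled.sock}"
    _wait="${TAILSCALED_WAIT:-10}"

    # Daemon already running: nothing to do.
    if pidof tailscaled >/dev/null 2>&1; then
        return 0
    fi

    echo "tailscaled not running, starting it"
    systemctl start tailscaled.service || return 1

    # Give the daemon a few seconds to create its socket.
    _i=0
    while [ "$_i" -lt "$_wait" ]; do
        [ -e "$_sock" ] && return 0
        sleep 1
        _i=$((_i + 1))
    done
    return 1
}
```

In the watchdog this would run before tailscale up, with a non-zero return logged and the tick abandoned until the next timer run.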

SSH

Alias on homelab-vm: ~/.ssh/config entry → Host deck / HostName 192.168.0.140 / User deck / IdentityFile ~/.ssh/id_ed25519.
Installed key: admin@thevish.io ed25519 pubkey in /home/deck/.ssh/authorized_keys.
Password (for sudo): same as initial login.
MCP: deck is in scripts/homelab-mcp/server.py SSH_KNOWN_HOSTS, so ssh_exec(host="deck", …) works from the homelab MCP.

Recovering after a SteamOS update

If the /etc overlay was wiped:

# 1. Re-install key
cat ~/.ssh/id_ed25519.pub | sshpass -p '<password>' ssh -o StrictHostKeyChecking=accept-new deck@192.168.0.140 \
  'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'

# 2. Restore systemd override for tailscaled
ssh deck 'echo <pw> | sudo -S mkdir -p /etc/systemd/system/tailscaled.service.d && \
  echo -e "[Service]\nExecStartPre=\nExecStartPre=/opt/tailscale/tailscaled --cleanup\nExecStart=\nExecStart=/opt/tailscale/tailscaled --state=/var/lib/tailscale/tailscaled.state --socket=/run/tailscale/tailscaled.sock --port=\${PORT} \$FLAGS\nExecStopPost=\nExecStopPost=/opt/tailscale/tailscaled --cleanup" | sudo tee /etc/systemd/system/tailscaled.service.d/override.conf'

# 3. Restore /etc/hosts pin
ssh deck 'echo <pw> | sudo -S sh -c "grep -q headscale.vish.gg /etc/hosts || echo 184.23.52.14 headscale.vish.gg >> /etc/hosts"'

# 4. Create fresh reusable preauth key (via MCP headscale_create_preauth_key) and store it
AUTHKEY='hskey-auth-…'  # pragma: allowlist secret (placeholder)
# Outer double quotes so $AUTHKEY expands locally; the variable is not set on the Deck.
ssh deck "echo <pw> | sudo -S sh -c 'mkdir -p /etc/tailscale && umask 077 && printf %s \"$AUTHKEY\" > /etc/tailscale/authkey && chmod 600 /etc/tailscale/authkey'"

# 5. Reinstall watchdog (copy from git or re-apply from this runbook's source repo)
scp docs/infrastructure/hosts/deck/watchdog.sh deck:/tmp/
ssh deck 'echo <pw> | sudo -S install -m 0755 /tmp/watchdog.sh /etc/tailscale/watchdog.sh'

# 6. Reinstall + enable systemd units (sources under docs/infrastructure/hosts/deck/)
scp docs/infrastructure/hosts/deck/tailscale-watchdog.{service,timer} deck:/tmp/
ssh deck 'echo <pw> | sudo -S sh -c "install -m 0644 /tmp/tailscale-watchdog.service /etc/systemd/system/ && install -m 0644 /tmp/tailscale-watchdog.timer /etc/systemd/system/ && systemctl daemon-reload && systemctl enable --now tailscaled.service tailscale-watchdog.timer"'

The watchdog script and systemd unit sources are checked in under docs/infrastructure/hosts/deck/ so a recovery doesn't require reconstructing them from memory.
