Homelab Claude Code Instructions

Deployment

When deploying services, always verify the target host before proceeding. Confirm which host a service should run on and check for port conflicts with existing services.
Run ss -tlnp | grep ':<port> ' on the target host before deploying; the leading colon and trailing space keep port 80 from also matching 8080.
Hosts: atlantis (Synology NAS, media/arr), calypso (Synology, DNS/SSO), olares (K3s, GPU), nuc (lightweight), rpi5 (Kuma), homelab-vm (monitoring/dashboard), guava (TrueNAS), seattle (remote), matrix-ubuntu (NPM/CrowdSec).
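
Substring matches are the usual pitfall in a port check (80 also appears in 8080). A minimal sketch of an exact-match helper, assuming the standard ss -tln column layout where the local address is the fourth field; port_listed is a hypothetical name, not an existing script in this repo:

```shell
# port_listed PORT: read `ss -tln` output on stdin and succeed only if some
# local address (4th column) ends in exactly :PORT.
port_listed() {
  awk -v port="$1" '
    $4 ~ (":" port "$") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage (host name from the list above):
#   ssh atlantis 'ss -tln' | port_listed 8080 && echo "port 8080 is taken"
```

Reading from stdin keeps the helper independent of how the output was fetched (local command, MCP SSH, or plain ssh).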

Before modifying config files (YAML, JSON, etc.), always create a backup copy first.
Never use sed for complex YAML edits — use a proper parser or manual editing to avoid duplicate keys and corruption.
For YAML changes, validate with python3 -c "import yaml; yaml.safe_load(open('file.yaml'))" after editing.
Never empty or overwrite a config file without reading it first.
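
The backup-then-validate routine above could be wrapped as two small helpers. This is a sketch, not an existing script: backup_config and validate_yaml are hypothetical names, the .bak.<timestamp> suffix is an assumed convention, and validate_yaml needs PyYAML on the machine where it runs.

```shell
# backup_config FILE: copy FILE to FILE.bak.<UTC timestamp>, preserving mode
# and mtime, and print the backup path. Fails if FILE does not exist.
backup_config() {
  src="$1"
  [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
  dest="$src.bak.$(date -u +%Y%m%dT%H%M%SZ)"
  cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# validate_yaml FILE: the same PyYAML load check as above, with the path
# passed as an argument instead of hard-coded.
validate_yaml() {
  python3 -c 'import sys, yaml; yaml.safe_load(open(sys.argv[1]))' "$1"
}

# Usage:
#   backup_config docker-compose.yaml   # prints the backup path
#   ...edit the file...
#   validate_yaml docker-compose.yaml || echo "restore from the backup"
```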

For homelab SSH operations: if MCP SSH times out on large outputs, fall back to Bash with ssh directly.
Always use the correct Tailscale/LAN IP for each host. When Ollama or other services aren't on localhost, check the memory or ask for the correct endpoint before guessing.
After making infrastructure changes (Tailscale, DNS, networking), always verify connectivity from affected hosts before marking complete.
Never run a second instance of a network daemon (tailscaled, etc.) — it will break host networking.
homelab-vm IS localhost — never SSH into it, use local commands.

Before installing/running anything on a remote host, probe the environment first: uname -a, which <binary>, mount | grep noexec, sudo -n true. Adapt or propose alternatives instead of failing then pivoting.
Tailscale binary paths differ across hosts (Synology, GL.iNet, k3s, standard Linux) — verify with which tailscale before assuming.
Synology /tmp is noexec — stage scripts in /volume1 or user home.
Synology has no git and no SFTP subsystem — use ssh-pipe (cat file | ssh host 'cat > dest') and prefix docker commands with sudo /usr/local/bin/docker.
GL.iNet travel routers wipe config on firmware update — reapply watchdog/Tailscale config after every flash.
uqiyoe is Windows — use dir/del/rmdir, not ls/rm. SSH user is vish, not homelab.
Check architecture (uname -m) before downloading binaries; the fleet has mixed amd64/arm64.
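
The probes above fit in one SSH round trip. A rough sketch, assuming a POSIX shell on the remote side (so not the Windows host); probe_host and normalize_arch are hypothetical helper names, and command -v stands in for which because it is a shell builtin:

```shell
# normalize_arch: map `uname -m` output to the amd64/arm64 names most release
# pages use. Unknown values are printed unchanged and return non-zero.
normalize_arch() {
  case "$1" in
    x86_64|amd64)  echo amd64 ;;
    aarch64|arm64) echo arm64 ;;
    *)             echo "$1"; return 1 ;;
  esac
}

# probe_host HOST: run the environment probes in a single SSH session.
# Each probe reports its own result so one failure does not hide the rest.
probe_host() {
  ssh -o ConnectTimeout=10 "$1" '
    uname -a
    command -v tailscale || echo "tailscale: not on PATH"
    mount | grep noexec || echo "no noexec mounts"
    sudo -n true 2>/dev/null && echo "sudo: passwordless" || echo "sudo: needs password"
  '
}

# Usage:
#   probe_host nuc
#   arch=$(normalize_arch "$(ssh nuc uname -m)")
```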

Set explicit, short timeouts on SSH/Bash commands. Default 30s, max 120s for known-slow ops.
For potentially slow operations (find on NAS, large rsync, apt upgrade): run with run_in_background: true and poll, or scope tightly with -maxdepth/path filters.
Never run unbounded find / on NAS or Synology hosts — always anchor to a specific path.
For destructive/mutating ops (rsync, dd, rm -rf, db edits): dry-run first, verify checksums/counts, take a backup before applying. Don't trust silent successes — rsync once truncated 70 GB to 74 MB without erroring.
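
A sketch of the two guards above: a hard time limit and a post-copy content check. run_bounded and same_content are hypothetical names; run_bounded assumes coreutils timeout is available locally, and same_content uses POSIX cksum (CRC plus byte count), which catches truncation but is not a cryptographic hash.

```shell
# run_bounded SECONDS CMD...: run CMD under `timeout`; exit status 124 means
# the time limit was hit.
run_bounded() {
  secs="$1"; shift
  timeout "$secs" "$@"
}

# same_content A B: succeed only if both files have the same CRC and byte
# count. cksum prints "<crc> <bytes>" when reading from stdin.
same_content() {
  [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Usage:
#   run_bounded 30 ssh nuc 'df -h'
#   rsync -a --dry-run src/ dst/    # inspect the plan, then run without --dry-run
#   same_content src/big.img dst/big.img || echo "copy differs, do not delete the source"
```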

Before changing anything to "fix" an issue, list the top 2–3 candidate root causes ranked by likelihood with one diagnostic per candidate. Run the diagnostics first, share results, then propose a fix. Don't patch the visible symptom (e.g., disabling a Kuma monitor) before confirming the underlying cause.

After deploying or fixing a service, verify end-to-end before declaring done: curl the endpoint, check Kuma status, tail logs for >60s of clean uptime.
Kuma accepted_statuscodes must be quoted strings in JSON: ["200-299"], not [200-299] (parse error otherwise).
Commit and push documentation changes in the same session as the infra change — don't leave docs lagging behind reality.
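
The accepted_statuscodes rule can be checked locally before sending a payload to Kuma. A small sketch, assuming python3 is on PATH; json_ok is a hypothetical helper, and only the accepted_statuscodes key comes from the note above (the rest of a monitor payload is omitted):

```shell
# json_ok: succeed if stdin parses as JSON (python3 stdlib json module).
json_ok() {
  python3 -c 'import json, sys; json.load(sys.stdin)' 2>/dev/null
}

# Quoted range string: valid JSON.
printf '%s' '{"accepted_statuscodes": ["200-299"]}' | json_ok \
  && echo "quoted range: valid"

# Bare range: a number followed by "-299" is not valid JSON.
printf '%s' '{"accepted_statuscodes": [200-299]}' | json_ok \
  || echo "bare range: parse error"
```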

When working with LLM model deployments (Ollama, vLLM), always verify: 1) GPU access, 2) context length meets the consumer's requirements, 3) tool-calling support if needed.
Ollama is at http://192.168.0.145:31434 (Olares LAN NodePort), NOT localhost.
HAMI vGPU on Olares causes ffmpeg segfaults. Do NOT request nvidia.com/gpu resources; set runtimeClassName: nvidia on the pod instead.
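
A sketch of a quick endpoint check against the Ollama address above. ollama_models and ollama_model_info are hypothetical helper names; /api/tags and /api/show are Ollama's HTTP API routes, but the exact fields in the /api/show response (context length, capabilities) vary by Ollama version, so read the output rather than assuming a key name.

```shell
# Endpoint from the note above; override via the environment if it moves.
OLLAMA_URL="${OLLAMA_URL:-http://192.168.0.145:31434}"

# ollama_models: list installed models.
ollama_models() {
  curl -fsS --max-time 10 "$OLLAMA_URL/api/tags"
}

# ollama_model_info MODEL: dump model details; look for context length and
# tool-calling support in the response.
ollama_model_info() {
  curl -fsS --max-time 10 "$OLLAMA_URL/api/show" -d "{\"model\": \"$1\"}"
}

# Usage:
#   ollama_models
#   ollama_model_info <model-name>
```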

Olares admission webhook blocks hostNetwork and reverts custom NetworkPolicies.
Use Calico GlobalNetworkPolicy for LAN access (it can't be overridden by the webhook).
The Olares proxy adds ~100ms latency — use direct LAN NodePorts for streaming/high-throughput services.
Marketplace app patches (NFS mounts, GPU) are lost on app updates — re-apply after updates.

Never add Co-Authored-By lines to git commits.
Always run detect-secrets scan --baseline .secrets.baseline before committing if a secrets baseline exists.
Use pragma: allowlist secret comments for intentional secrets in private repo files.
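
The "if a baseline exists" condition can be made explicit in a pre-commit step. A minimal sketch; scan_secrets_if_baseline is a hypothetical name, and it assumes it is run from the repo root:

```shell
# scan_secrets_if_baseline: run the scan only when a baseline file is present
# in the current directory; otherwise report that and succeed.
scan_secrets_if_baseline() {
  if [ -f .secrets.baseline ]; then
    detect-secrets scan --baseline .secrets.baseline
  else
    echo "no .secrets.baseline here, skipping scan" >&2
  fi
}

# Usage:
#   scan_secrets_if_baseline && git commit -m "descriptive message"
```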

After completing each task, immediately update the relevant documentation in the repo and commit with a descriptive message before moving to the next task.
Key docs: docs/services/individual/dashboard.md, docs/services/individual/olares.md, scripts/README.md.

The Portainer API uses the X-API-Key header (NOT a Bearer token).
Portainer URL: http://100.83.230.112:10000 (Tailscale IP).
Endpoints: atlantis=2, calypso=443397, nuc=443398, homelab=443399, rpi5=443395.
GitOps stacks use Gitea token for auth — if redeploy fails with "authentication required", credentials need re-entry in Portainer UI.
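
A sketch of a Portainer call using the header and endpoint IDs above. portainer_endpoint_id and portainer_containers are hypothetical helpers, PORTAINER_API_KEY is an assumed environment variable name, and the route shown is Portainer's Docker API proxy path:

```shell
PORTAINER_URL="http://100.83.230.112:10000"

# portainer_endpoint_id NAME: map an endpoint name to its Portainer ID
# (values copied from the list above).
portainer_endpoint_id() {
  case "$1" in
    atlantis) echo 2 ;;
    calypso)  echo 443397 ;;
    nuc)      echo 443398 ;;
    homelab)  echo 443399 ;;
    rpi5)     echo 443395 ;;
    *)        echo "unknown endpoint: $1" >&2; return 1 ;;
  esac
}

# portainer_containers NAME: list containers on an endpoint.
# Auth goes in X-API-Key, not an Authorization: Bearer header.
portainer_containers() {
  id=$(portainer_endpoint_id "$1") || return 1
  curl -fsS --max-time 30 \
    -H "X-API-Key: ${PORTAINER_API_KEY:?set PORTAINER_API_KEY first}" \
    "$PORTAINER_URL/api/endpoints/$id/docker/containers/json"
}

# Usage:
#   PORTAINER_API_KEY=... portainer_containers nuc
```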

Dashboard runs at http://homelab.tail.vish.gg:3100 (Next.js on port 3100, FastAPI API on port 18888).
The API is proxied through Next.js rewrites: the frontend calls /api/*, which routes to localhost:18888.
The UI ships 16 glassmorphism themes and uses the Exo 2 font.
To rebuild: cd dashboard/ui && rm -rf .next && BACKEND_URL=http://localhost:18888 npm run build && cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public.