# Homelab Claude Code Instructions

## Deployment

- When deploying services, always verify the target host before proceeding. Confirm which host a service should run on and check for port conflicts with existing services.
- Check `ss -tlnp | grep <port>` on the target host before deploying.
- Hosts: atlantis (Synology NAS, media/arr), calypso (Synology, DNS/SSO), olares (K3s, GPU), nuc (lightweight), rpi5 (Kuma), homelab-vm (monitoring/dashboard), guava (TrueNAS), seattle (remote), matrix-ubuntu (NPM/CrowdSec).

## Configuration Management

- Before modifying config files (YAML, JSON, etc.), always create a backup copy first.
- Never use sed for complex YAML edits; use a proper parser or manual editing to avoid duplicate keys and corruption.
- For YAML changes, validate with `python3 -c "import yaml; yaml.safe_load(open('file.yaml'))"` after editing.
- Never empty or overwrite a config file without reading it first.

## Homelab SSH & Networking

- For homelab SSH operations: if MCP SSH times out on large outputs, fall back to Bash with `ssh` directly.
- Always use the correct Tailscale/LAN IP for each host. When Ollama or other services aren't on localhost, check the memory or ask for the correct endpoint rather than guessing.
- After making infrastructure changes (Tailscale, DNS, networking), always verify connectivity from affected hosts before marking complete.
- Never run a second instance of a network daemon (tailscaled, etc.); it will break host networking.
- homelab-vm IS localhost: never SSH into it, use local commands.

## Heterogeneous Host Awareness

- Before installing/running anything on a remote host, probe the environment first: `uname -a`, `which <binary>`, `mount | grep noexec`, `sudo -n true`. Adapt or propose alternatives instead of failing then pivoting.
- Tailscale binary paths differ across hosts (Synology, GL.iNet, k3s, standard Linux); verify with `which tailscale` before assuming.
- Synology `/tmp` is `noexec`; stage scripts in `/volume1` or user home.
- Synology has no `git` and no SFTP subsystem: use `ssh-pipe` (`cat file | ssh host 'cat > dest'`) and prefix docker commands with `sudo /usr/local/bin/docker`.
- GL.iNet travel routers wipe config on firmware update; reapply watchdog/Tailscale config after every flash.
- uqiyoe is **Windows**: use `dir`/`del`/`rmdir`, not `ls`/`rm`. SSH user is `vish`, not `homelab`.
- Check architecture (`uname -m`) before downloading binaries; the fleet has mixed amd64/arm64.

## Long-Running Commands

- Set explicit, short timeouts on SSH/Bash commands. Default 30s, max 120s for known-slow ops.
- For potentially slow operations (find on NAS, large rsync, apt upgrade): run with `run_in_background: true` and poll, or scope tightly with `-maxdepth`/path filters.
- Never run unbounded `find /` on NAS or Synology hosts; always anchor to a specific path.
- For destructive/mutating ops (rsync, dd, rm -rf, db edits): dry-run first, verify checksums/counts, take a backup before applying. Don't trust silent successes: `rsync` once truncated 70 GB to 74 MB without erroring.

## Debugging Discipline

- Before changing anything to "fix" an issue, list the top 2–3 candidate root causes ranked by likelihood with one diagnostic per candidate. Run the diagnostics first, share results, then propose a fix. Don't patch the visible symptom (e.g., disabling a Kuma monitor) before confirming the underlying cause.

## Verification Discipline

- After deploying or fixing a service, verify end-to-end before declaring done: curl the endpoint, check Kuma status, tail logs for >60s of clean uptime.
- Kuma `accepted_statuscodes` must be quoted strings in JSON: `["200-299"]`, not `[200-299]` (parse error otherwise).
- Commit and push documentation changes in the same session as the infra change; don't leave docs lagging behind reality.
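
The quoting rule for Kuma `accepted_statuscodes` above can be checked locally before a monitor definition is submitted. This is only a minimal Python sketch, not part of Kuma's API: the helper name `check_accepted_statuscodes` is hypothetical, and it assumes the value is available as a raw JSON string.

```python
import json


def check_accepted_statuscodes(raw: str) -> list[str]:
    """Parse a JSON array and confirm every entry is a string such as "200-299".

    Hypothetical helper for illustration; it only validates the shape of the value.
    """
    try:
        codes = json.loads(raw)
    except json.JSONDecodeError as exc:
        # An unquoted range like [200-299] is not valid JSON and lands here.
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(codes, list) or not all(isinstance(c, str) for c in codes):
        # Valid JSON, but the entries are numbers or the value is not an array.
        raise ValueError("accepted_statuscodes must be a JSON array of strings")
    return codes
```

Calling `check_accepted_statuscodes('["200-299"]')` returns the parsed list, while `'[200-299]'` raises `ValueError`, so the mistake surfaces before the monitor is saved.
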
## LLM Services

- When working with LLM model deployments (Ollama, vLLM), always verify: 1) GPU access, 2) context length meets the consumer's requirements, 3) tool-calling support if needed.
- Ollama is at `http://192.168.0.145:31434` (Olares LAN NodePort), NOT localhost.
- HAMI vGPU on Olares causes ffmpeg segfaults: do NOT request `nvidia.com/gpu` resources, use `runtimeClassName: nvidia` directly.

## Olares (K3s)

- Olares admission webhook blocks hostNetwork and reverts custom NetworkPolicies.
- Use Calico GlobalNetworkPolicy for LAN access (it can't be overridden by the webhook).
- The Olares proxy adds ~100ms latency; use direct LAN NodePorts for streaming/high-throughput services.
- Marketplace app patches (NFS mounts, GPU) are lost on app updates; re-apply after updates.

## Git & Commits

- Never add Co-Authored-By lines to git commits.
- Always run `detect-secrets scan --baseline .secrets.baseline` before committing if secrets baseline exists.
- Use `pragma: allowlist secret` comments for intentional secrets in private repo files.

## Documentation

- After completing each task, immediately update the relevant documentation in the repo and commit with a descriptive message before moving to the next task.
- Key docs: `docs/services/individual/dashboard.md`, `docs/services/individual/olares.md`, `scripts/README.md`.

## Portainer

- API uses `X-API-Key` header (NOT Bearer token).
- Portainer URL: `http://100.83.230.112:10000` (Tailscale IP).
- Endpoints: atlantis=2, calypso=443397, nuc=443398, homelab=443399, rpi5=443395.
- GitOps stacks use Gitea token for auth; if redeploy fails with "authentication required", credentials need re-entry in Portainer UI.

## Dashboard

- Dashboard runs at `http://homelab.tail.vish.gg:3100` (Next.js on port 3100, FastAPI API on port 18888).
- API proxied through Next.js rewrites: frontend calls `/api/*`, which routes to localhost:18888.
- 16 glassmorphism themes with Exo 2 font.
- To rebuild: `cd dashboard/ui && rm -rf .next && BACKEND_URL=http://localhost:18888 npm run build && cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`.
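
The rebuild one-liner above can also be sketched as a small Python script that stops at the first failing step. This is an illustrative sketch only: the function name `rebuild_dashboard` and its `dry_run` flag are hypothetical, and it assumes it is run from the repo root with `npm` on the `PATH`.

```python
import os
import shutil
import subprocess
from pathlib import Path


def rebuild_dashboard(ui_dir="dashboard/ui",
                      backend_url="http://localhost:18888",
                      dry_run=True):
    """Return the planned rebuild steps; execute them only when dry_run is False."""
    ui = Path(ui_dir)
    next_dir = ui / ".next"
    standalone = next_dir / "standalone"
    plan = [
        f"remove {next_dir}",
        f"run 'npm run build' in {ui} with BACKEND_URL={backend_url}",
        f"copy {next_dir / 'static'} -> {standalone / '.next' / 'static'}",
        f"copy {ui / 'public'} -> {standalone / 'public'}",
    ]
    if dry_run:
        return plan

    # Counterpart of `rm -rf .next`: a missing directory is not treated as an error.
    shutil.rmtree(next_dir, ignore_errors=True)
    # check=True raises CalledProcessError on a failed build, so nothing is copied afterwards.
    subprocess.run(["npm", "run", "build"], cwd=ui,
                   env={**os.environ, "BACKEND_URL": backend_url}, check=True)
    # dirs_exist_ok=True merges into an existing target; `cp -r` would nest the source inside it.
    shutil.copytree(next_dir / "static", standalone / ".next" / "static", dirs_exist_ok=True)
    shutil.copytree(ui / "public", standalone / "public", dirs_exist_ok=True)
    return plan
```

With the default `dry_run=True` the function only returns the four planned steps, which makes it easy to review them before running `rebuild_dashboard(dry_run=False)`.
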