# Homelab Claude Code Instructions
## Deployment

- When deploying services, always verify the target host before proceeding. Confirm which host a service should run on and check for port conflicts with existing services.
- Check `ss -tlnp | grep <port>` on the target host before deploying.
- Hosts: atlantis (Synology NAS, media/arr), calypso (Synology, DNS/SSO), olares (K3s, GPU), nuc (lightweight), rpi5 (Kuma), homelab-vm (monitoring/dashboard), guava (TrueNAS), seattle (remote), matrix-ubuntu (NPM/CrowdSec).
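
A plain `grep <port>` also matches longer port numbers that contain the same digits (grepping for `80` hits `8080`). The sketch below shows one way to match the port exactly by parsing `ss -tlnp` output. The function name and the sample output are made up for illustration; the column layout assumed is the usual one for `ss -tlnp`.

```python
def listeners_on_port(ss_output: str, port: int) -> list[str]:
    """Return the `ss -tlnp` lines whose local address ends in exactly :<port>."""
    hits = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State, Recv-Q, Send-Q, Local Address:Port,
        # Peer Address:Port, Process. The header row is skipped because
        # its first field is "State", not "LISTEN".
        if len(fields) >= 4 and fields[0] == "LISTEN":
            local_port = fields[3].rsplit(":", 1)[-1]
            if local_port == str(port):
                hits.append(line)
    return hits


# Hypothetical sample output, not captured from a real host.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      4096   0.0.0.0:3100       0.0.0.0:*         users:(("node",pid=812,fd=21))
LISTEN 0      128    [::]:22            [::]:*            users:(("sshd",pid=640,fd=4))
"""

conflicts = listeners_on_port(SAMPLE, 3100)
```

An empty result means nothing is listening on that port; any returned line names the process that would conflict.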
## Configuration Management

- Before modifying config files (YAML, JSON, etc.), always create a backup copy first.
- Never use sed for complex YAML edits; use a proper parser or edit manually to avoid duplicate keys and corruption.
- For YAML changes, validate with `python3 -c "import yaml; yaml.safe_load(open('file.yaml'))"` after editing.
- Never empty or overwrite a config file without reading it first.
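
The backup-then-validate routine above could be wrapped in two small helpers, roughly as follows. This is a sketch: the helper names and the `.bak-<timestamp>` suffix are illustrative choices, and the demo uses JSON so it runs on the standard library alone. For YAML, pass `yaml.safe_load` as `parse` (assuming PyYAML is installed, as the one-liner above already requires).

```python
import json
import shutil
import tempfile
import time
from pathlib import Path


def backup_config(path: str) -> Path:
    """Copy the file next to itself with a timestamped suffix and return the copy."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.bak-{stamp}")  # hypothetical naming scheme
    shutil.copy2(src, dest)  # copy2 also preserves mtime and permissions
    return dest


def parses_cleanly(path: str, parse=json.loads) -> bool:
    """Return True if `parse` accepts the file's text without raising."""
    try:
        parse(Path(path).read_text())
        return True
    except Exception:
        return False


# Demo on a throwaway file: back up, then detect a broken edit.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "example.json"
    cfg.write_text('{"port": 3100}')
    backup = backup_config(str(cfg))
    backup_ok = backup.read_text() == cfg.read_text()
    valid_before = parses_cleanly(str(cfg))
    cfg.write_text('{"port": ')  # simulate a corrupted edit
    valid_after = parses_cleanly(str(cfg))
```

If validation fails after an edit, the timestamped copy is the file to restore from.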
## Homelab SSH & Networking

- For homelab SSH operations: if MCP SSH times out on large outputs, fall back to Bash with `ssh` directly.
- Always use the correct Tailscale/LAN IP for each host. When Ollama or other services aren't on localhost, check the memory or ask for the correct endpoint before guessing.
- After making infrastructure changes (Tailscale, DNS, networking), always verify connectivity from affected hosts before marking the task complete.
- Never run a second instance of a network daemon (tailscaled, etc.); it will break host networking.
- homelab-vm IS localhost: never SSH into it; use local commands.
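
The "homelab-vm is localhost" rule can be encoded so that a command is never routed over SSH by accident. The sketch below is illustrative only: `command_for` is a hypothetical helper, and it assumes the other hosts are reachable through SSH config aliases matching their names.

```python
LOCAL_HOST = "homelab-vm"


def command_for(host: str, remote_cmd: str, connect_timeout: int = 10) -> list[str]:
    """Build an argv that runs locally on homelab-vm and over ssh elsewhere."""
    if host == LOCAL_HOST:
        # Never SSH into the machine we are already on.
        return ["bash", "-c", remote_cmd]
    return [
        "ssh",
        "-o", f"ConnectTimeout={connect_timeout}",  # fail fast if the host is unreachable
        "-o", "BatchMode=yes",                      # never block on a password prompt
        host,  # assumed to be an alias in ~/.ssh/config
        remote_cmd,
    ]


local_argv = command_for("homelab-vm", "uptime")
remote_argv = command_for("atlantis", "uptime")
```

The resulting list can be passed straight to `subprocess.run`.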
## Heterogeneous Host Awareness

- Before installing/running anything on a remote host, probe the environment first: `uname -a`, `which <binary>`, `mount | grep noexec`, `sudo -n true`. Adapt or propose alternatives instead of failing then pivoting.
- Tailscale binary paths differ across hosts (Synology, GL.iNet, k3s, standard Linux); verify with `which tailscale` before assuming.
- Synology `/tmp` is `noexec`; stage scripts in `/volume1` or the user's home directory.
- Synology has no `git` and no SFTP subsystem. Copy files by piping over SSH (`cat file | ssh host 'cat > dest'`) and invoke docker as `sudo /usr/local/bin/docker`.
- GL.iNet travel routers wipe config on firmware update; reapply watchdog/Tailscale config after every flash.
- uqiyoe is **Windows**: use `dir`/`del`/`rmdir`, not `ls`/`rm`. SSH user is `vish`, not `homelab`.
- Check architecture (`uname -m`) before downloading binaries; the fleet has mixed amd64/arm64.
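
Release assets are usually named `amd64`/`arm64`, while `uname -m` reports `x86_64`/`aarch64`. A minimal sketch of the mapping is shown below. `release_arch` is a hypothetical helper, and it deliberately raises on anything it does not recognise rather than guessing.

```python
# Common `uname -m` values mapped to the names used in release asset filenames.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def release_arch(uname_m: str) -> str:
    """Translate `uname -m` output into amd64/arm64, or raise if unknown."""
    key = uname_m.strip().lower()
    if key not in ARCH_ALIASES:
        raise ValueError(f"unrecognised architecture: {uname_m!r}")
    return ARCH_ALIASES[key]


nas_arch = release_arch("x86_64\n")
pi_arch = release_arch("aarch64")
```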
## Long-Running Commands

- Set explicit, short timeouts on SSH/Bash commands. Default 30s, max 120s for known-slow ops.
- For potentially slow operations (find on NAS, large rsync, apt upgrade): run with `run_in_background: true` and poll, or scope tightly with `-maxdepth`/path filters.
- Never run unbounded `find /` on NAS or Synology hosts; always anchor to a specific path.
- For destructive/mutating ops (rsync, dd, rm -rf, db edits): dry-run first, verify checksums/counts, and take a backup before applying. Don't trust silent successes: `rsync` once truncated 70 GB to 74 MB without erroring.
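
The 30s default and 120s ceiling could be enforced in one place instead of per call. The following is a sketch under that assumption; `clamp_timeout` and `run_bounded` are hypothetical names, and returning `None` on timeout is just one possible convention.

```python
import subprocess
import sys

DEFAULT_TIMEOUT = 30   # seconds, used when the caller gives no timeout
MAX_TIMEOUT = 120      # hard ceiling, even for known-slow operations


def clamp_timeout(seconds=None) -> int:
    """Return the default when unset, otherwise clamp into [1, MAX_TIMEOUT]."""
    if seconds is None:
        return DEFAULT_TIMEOUT
    return max(1, min(int(seconds), MAX_TIMEOUT))


def run_bounded(argv, timeout=None):
    """Run argv with a bounded timeout; return None if it does not finish in time."""
    try:
        return subprocess.run(
            argv, capture_output=True, text=True, timeout=clamp_timeout(timeout)
        )
    except subprocess.TimeoutExpired:
        return None


# Demo with the current interpreter so the example does not depend on ssh.
quick = run_bounded([sys.executable, "-c", "print('ok')"])
slow = run_bounded([sys.executable, "-c", "import time; time.sleep(5)"], timeout=1)
```

A `None` result is the signal to switch to a background run and poll, rather than retrying with a longer timeout.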
## Debugging Discipline

- Before changing anything to "fix" an issue, list the top 2–3 candidate root causes ranked by likelihood with one diagnostic per candidate. Run the diagnostics first, share results, then propose a fix. Don't patch the visible symptom (e.g., disabling a Kuma monitor) before confirming the underlying cause.
## Verification Discipline

- After deploying or fixing a service, verify end-to-end before declaring done: curl the endpoint, check Kuma status, tail logs for >60s of clean uptime.
- Kuma `accepted_statuscodes` must be quoted strings in JSON: `["200-299"]`, not `[200-299]` (parse error otherwise).
- Commit and push documentation changes in the same session as the infra change; don't leave docs lagging behind reality.
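
The quoting rule for `accepted_statuscodes` is easy to confirm with any JSON parser. The snippet below uses Python's `json` module; the surrounding one-key object is only an illustration, not the full Kuma monitor payload.

```python
import json

GOOD = '{"accepted_statuscodes": ["200-299"]}'
BAD = '{"accepted_statuscodes": [200-299]}'


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


good_ok = is_valid_json(GOOD)   # the range is a string, so this parses
bad_ok = is_valid_json(BAD)     # `200-299` is not a JSON value, so this fails
codes = json.loads(GOOD)["accepted_statuscodes"]
```

Building the payload with `json.dumps` from a Python list of strings avoids the unquoted form altogether.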
## LLM Services

- When working with LLM model deployments (Ollama, vLLM), always verify: 1) GPU access, 2) context length meets the consumer's requirements, 3) tool-calling support if needed.
- Ollama is at `http://192.168.0.145:31434` (Olares LAN NodePort), NOT localhost.
- HAMI vGPU on Olares causes ffmpeg segfaults. Do NOT request `nvidia.com/gpu` resources; use `runtimeClassName: nvidia` directly.
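
Before checking GPU access or context length, it helps to confirm that the Ollama endpoint above is reachable and has the expected model loaded. The sketch below queries Ollama's `/api/tags` route, which lists local models. The helper names are hypothetical, and this covers reachability only, not the three checks listed above.

```python
import json
import urllib.request

OLLAMA_BASE = "http://192.168.0.145:31434"  # Olares LAN NodePort from this document


def tags_url(base: str = OLLAMA_BASE) -> str:
    """URL of the endpoint that lists locally available models."""
    return base.rstrip("/") + "/api/tags"


def model_names(tags_payload: dict) -> list[str]:
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def list_models(base: str = OLLAMA_BASE, timeout: int = 10) -> list[str]:
    """Fetch and return model names; raises on network errors or timeout."""
    with urllib.request.urlopen(tags_url(base), timeout=timeout) as resp:
        return model_names(json.load(resp))
```

Calling `list_models()` from the consumer's host, not from homelab-vm alone, also confirms that the NodePort is reachable from where it will actually be used.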
## Olares (K3s)

- Olares admission webhook blocks hostNetwork and reverts custom NetworkPolicies.
- Use Calico GlobalNetworkPolicy for LAN access (it can't be overridden by the webhook).
- The Olares proxy adds ~100ms latency; use direct LAN NodePorts for streaming/high-throughput services.
- Marketplace app patches (NFS mounts, GPU) are lost on app updates; re-apply them after each update.
## Git & Commits

- Never add Co-Authored-By lines to git commits.
- Always run `detect-secrets scan --baseline .secrets.baseline` before committing if a secrets baseline exists.
- Use `pragma: allowlist secret` comments for intentional secrets in private repo files.
## Documentation

- After completing each task, immediately update the relevant documentation in the repo and commit with a descriptive message before moving to the next task.
- Key docs: `docs/services/individual/dashboard.md`, `docs/services/individual/olares.md`, `scripts/README.md`.
## Portainer

- API uses `X-API-Key` header (NOT Bearer token).
- Portainer URL: `http://100.83.230.112:10000` (Tailscale IP).
- Endpoints: atlantis=2, calypso=443397, nuc=443398, homelab=443399, rpi5=443395.
- GitOps stacks use a Gitea token for auth. If a redeploy fails with "authentication required", the credentials need to be re-entered in the Portainer UI.
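
A request against the Portainer API with the header and endpoint IDs above might be built as follows. This is a sketch: `containers_request` is a hypothetical helper, and the `/api/endpoints/<id>/docker/containers/json` path assumes Portainer's Docker proxy route, which is worth confirming against the running version.

```python
import urllib.request

PORTAINER_URL = "http://100.83.230.112:10000"  # Tailscale IP from this document
ENDPOINT_IDS = {
    "atlantis": 2,
    "calypso": 443397,
    "nuc": 443398,
    "homelab": 443399,
    "rpi5": 443395,
}


def containers_request(host: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a container-list request for one endpoint."""
    endpoint_id = ENDPOINT_IDS[host]  # KeyError on unknown hosts is intentional
    # Assumed route: Portainer proxies the Docker API under /docker/.
    url = f"{PORTAINER_URL}/api/endpoints/{endpoint_id}/docker/containers/json"
    # Auth goes in X-API-Key, not in an Authorization: Bearer header.
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


req = containers_request("atlantis", "dummy-key")
sent_headers = {name.lower(): value for name, value in req.header_items()}
```

Passing `req` to `urllib.request.urlopen(req, timeout=30)` sends it. The key itself should come from the environment or a secrets store, never from a committed file.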
## Dashboard

- Dashboard runs at `http://homelab.tail.vish.gg:3100` (Next.js on port 3100, FastAPI API on port 18888).
- The API is proxied through Next.js rewrites: the frontend calls `/api/*`, which routes to localhost:18888.
- 16 glassmorphism themes with Exo 2 font.
- To rebuild: `cd dashboard/ui && rm -rf .next && BACKEND_URL=http://localhost:18888 npm run build && cp -r .next/static .next/standalone/.next/static && cp -r public .next/standalone/public`.