# OpenCode

AI-Powered Coding Agent CLI

## Service Overview
| Property | Value |
|---|---|
| Service Name | opencode |
| Category | AI / Development |
| Hosts | homelab VM (192.168.0.210), moon (100.64.0.6) |
| Install | `curl -fsSL https://opencode.ai/install \| bash` |
| Config | `~/.config/opencode/opencode.json` |
| LLM Backend | Olares Ollama (Qwen3-Coder 30B A3B) |
| Agent Name | Vesper |
## Purpose
OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.
## Architecture

```
Developer Host (homelab VM / moon)
└── opencode CLI
    └── HTTPS → Olares (192.168.0.145)
        └── Ollama (RTX 5090 Max-Q, 24GB VRAM)
            └── qwen3-coder-65k:latest (Qwen3-Coder 30B A3B, Q4_K_M)
```
## Ollama Infrastructure

- Host: Olares appliance at 192.168.0.145 (SSH: `ssh olares`)
- Runtime: Kubernetes (k3s), namespace `ollamaserver-shared`
- Pod: `ollama-*` in deployment `ollama`
- API endpoint: `https://a5be22681.vishinator.olares.com`
- GPU: NVIDIA RTX 5090 Laptop GPU, 24GB VRAM, compute capability 12.0
- Flash attention: Enabled (`OLLAMA_FLASH_ATTENTION=1` env var on the deployment)
## Models on Ollama

| Model | Size | Context | VRAM Usage | Notes |
|---|---|---|---|---|
| `qwen3-coder-65k:latest` | 18GB | 65k tokens | ~25.3GB (spills ~1.3GB to system RAM) | Default for everything |
| `qwen3-coder:latest` | 18GB | 32k tokens | ~22GB (fits in VRAM) | Base model, safe fallback |
| `devstral-small-2:latest` | 15GB | 32k tokens | — | Alternative model |
The 65k model is a Modelfile wrapper around `qwen3-coder:latest` with `PARAMETER num_ctx 65536`. Same weights, same quality, just a larger context window. The ~1.3GB VRAM overshoot is handled via resizable BAR / shared memory on the laptop GPU — inference is slightly slower for the spilled portion but negligible for typical use.
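A wrapper like this can be recreated with a two-line Modelfile (a sketch — the actual Modelfile on the appliance may differ in details):

```
FROM qwen3-coder:latest
PARAMETER num_ctx 65536
```

Then build it inside the Ollama pod with `ollama create qwen3-coder-65k -f Modelfile`.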
## Configuration

Config: `~/.config/opencode/opencode.json`

### Default Model

```json
"model": "olares-qwen3-coder//qwen3-coder-65k:latest"
```
### Agent Personality (Vesper)

OpenCode is configured with a personality via both `instructions` in the config and `AGENTS.md` in the repo root:
- Name: Vesper
- Style: Concise, witty, competent; executes commands directly instead of explaining
- Guardian role: Proactively warns about bad practices (secrets in git, missing dry-runs, open permissions)
- Safety practices: Works in branches, dry-runs first, backs up before modifying, verifies after acting
### Configured Providers

| Provider | Model | Context | Status |
|---|---|---|---|
| `olares-qwen3-coder` (default) | Qwen3-Coder 30B | 65k | Active |
| `olares` | Qwen3 30B A3B | 40k | vLLM endpoint down (502) |
| `olares-gptoss` | GPT-OSS 20B | 65k | vLLM endpoint down (502) |
| `olares-qwen35` | Qwen3.5 27B | 65k | Ollama, no tool calling |
| `olares-devstral` | Devstral Small 2 24B | 32k | Ollama, available |
### Permissions (Full Autonomy)

```json
"permission": {
  "bash": "allow",
  "edit": "allow",
  "write": "allow",
  "read": "allow",
  "glob": "allow",
  "grep": "allow",
  "question": "allow",
  "external_directory": "allow",
  "mcp": "allow"
}
```
### Loop Prevention

```json
"mode": {
  "build": { "steps": 50, "permission": { "doom_loop": "deny" } },
  "plan": { "steps": 25, "permission": { "doom_loop": "deny" } }
}
```
## MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/home/homelab/organized/repos/homelab/scripts/homelab-mcp/server.py"],
    "enabled": true
  }
}
```
## Host-Specific Setup

### homelab VM (192.168.0.210)

- User: homelab
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- Backup: `~/.config/opencode/opencode.json.bak.*`
- MCP: homelab MCP server enabled
- All providers configured
### moon (100.64.0.6 via Tailscale)

- User: moon (access via `ssh moon`, then `sudo -i su - moon`)
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- May need config updated to point at active Ollama endpoint
## Switching to 32k Context (Safe Mode)

If the 65k context causes issues (OOM, slow inference, instability), revert to 32k:

1. Update the opencode config (`~/.config/opencode/opencode.json`):
   ```json
   "model": "olares-qwen3-coder//qwen3-coder:latest"
   ```
2. Update the email organizer configs (all 3 files):
   ```bash
   sed -i 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' \
     scripts/gmail-organizer/config.yaml \
     scripts/gmail-organizer-dvish/config.yaml \
     scripts/proton-organizer/config.local.yaml
   ```
3. Unload the 65k model and load the base model:
   ```bash
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder-65k:latest","keep_alive":0}'
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder:latest","prompt":"test","stream":false}'
   ```
4. Optionally disable flash attention (it is only needed to fit the larger context within VRAM):
   ```bash
   ssh olares "sudo k3s kubectl set env deployment/ollama -n ollamaserver-shared -c ollama OLLAMA_FLASH_ATTENTION-"
   ```
5. Verify:
   ```bash
   curl -s https://a5be22681.vishinator.olares.com/api/ps | python3 -m json.tool
   # Should show: ctx=32768, size_vram ~22GB
   ```
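The verification in step 5 can be scripted. A minimal sketch that parses the `/api/ps` JSON and checks the loaded context size — note the `context_length` field is reported by recent Ollama releases; treat its presence as an assumption for older versions:

```python
import json


def check_ctx(ps_json: str, want_ctx: int = 32768) -> bool:
    """Return True if every loaded model reports the expected context size."""
    models = json.loads(ps_json).get("models", [])
    return bool(models) and all(m.get("context_length") == want_ctx for m in models)


# Example payload in the shape /api/ps returns:
sample = '{"models": [{"name": "qwen3-coder:latest", "context_length": 32768}]}'
print(check_ctx(sample))  # True
```

Pipe the real output in with `curl -s .../api/ps | python3 check_ctx.py` (wrapping the above in a small `sys.stdin` reader).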
## Removing the 65k Model Entirely

```bash
ssh olares "sudo k3s kubectl exec -n ollamaserver-shared \$(sudo k3s kubectl get pod -n ollamaserver-shared -l app=ollama -o jsonpath='{.items[0].metadata.name}') -c ollama -- ollama rm qwen3-coder-65k"
```
## Requirements

- Tool calling support required: OpenCode sends tools with every request; models without tool call templates return 400 errors
- Large context needed: the system prompt plus tool definitions use ~15-20k tokens; models with less than 32k context will fail
- Flash attention recommended: `OLLAMA_FLASH_ATTENTION=1` on the Ollama deployment allows larger contexts within VRAM limits
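The context-budget requirement above is simple arithmetic; a trivial helper makes the margins explicit (the 20k overhead figure is the upper end of the range quoted above):

```python
def remaining_ctx(num_ctx: int, overhead: int = 20_000) -> int:
    """Tokens left for conversation after system prompt + tool definitions."""
    return max(0, num_ctx - overhead)


print(remaining_ctx(32_768))  # 12768 tokens of headroom at 32k
print(remaining_ctx(65_536))  # 45536 tokens of headroom at 65k
```

At 32k the headroom is tight, which is why the "constant compaction" failure mode below appears once web fetches start filling the window.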
## Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a model with tool calling support |
| `model not found` | Wrong model name (e.g., `qwen3:coder` vs `qwen3-coder:latest`) | Check `ollama list` for exact names |
| Constant compaction | Context window too small for system prompt + conversation | Increase context or reduce web fetches |
| 502 Bad Gateway | Ollama pod restarting or endpoint down | Check pod: `ssh olares "sudo k3s kubectl get pods -n ollamaserver-shared"` |
| Stuck in loops | Model keeps retrying failed tool calls | `doom_loop: "deny"` and reduce steps |
| Slow responses | VRAM spill to system RAM at 65k ctx | Switch to 32k (see above) |
| Won't run ansible | Model too cautious, AGENTS.md too restrictive | Check `instructions` in config and AGENTS.md |
| Web fetch eating context | Model searching internet for local info | Instructions tell it to read local files first |