# OpenCode

AI-Powered Coding Agent CLI

## Service Overview
| Property | Value |
|---|---|
| Service Name | opencode |
| Category | AI / Development |
| Hosts | homelab VM (192.168.0.210), moon (100.64.0.6) |
| Install | `curl -fsSL https://opencode.ai/install \| bash` |
| Config | `~/.config/opencode/opencode.json` |
| LLM Backend | Olares Ollama (Qwen3-Coder 30B A3B) |
| Agent Name | Vesper |
## Purpose
OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.
## Architecture

```
Developer Host (homelab VM / moon)
└── opencode CLI
    └── HTTPS → Olares (192.168.0.145)
        └── Ollama (RTX 5090 Max-Q, 24GB VRAM)
            └── qwen3-coder-65k:latest (Qwen3-Coder 30B A3B, Q4_K_M)
```
## Ollama Infrastructure

- Host: Olares appliance at 192.168.0.145 (SSH: `ssh olares`)
- Runtime: Kubernetes (k3s), namespace `ollamaserver-shared`
- Pod: `ollama-*` in deployment `ollama`
- API endpoint: `https://a5be22681.vishinator.olares.com`
- GPU: NVIDIA RTX 5090 Laptop GPU, 24GB VRAM, compute capability 12.0
- Flash attention: Enabled (`OLLAMA_FLASH_ATTENTION=1` env var on the deployment)
## Models on Ollama

| Model | Size | Context | VRAM Usage | Notes |
|---|---|---|---|---|
| `qwen3-coder-65k:latest` | 18GB | 65k tokens | ~25.3GB (spills ~1.3GB to system RAM) | Default for everything |
| `qwen3-coder:latest` | 18GB | 32k tokens | ~22GB (fits in VRAM) | Base model, safe fallback |
| `devstral-small-2:latest` | 15GB | 32k tokens | — | Alternative model |
The 65k model is a Modelfile wrapper around `qwen3-coder:latest` with `PARAMETER num_ctx 65536`. Same weights, same quality, just a larger context window. The ~1.3GB VRAM overshoot is handled via resizable BAR / shared memory on the laptop GPU — inference is slightly slower for the spilled portion but negligible for typical use.
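A wrapper like this can be recreated with a two-line Modelfile (a sketch — the actual Modelfile on the appliance may differ in details):

```
FROM qwen3-coder:latest
PARAMETER num_ctx 65536
```

Then build it inside the Ollama pod with `ollama create qwen3-coder-65k -f Modelfile`.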
## Configuration

Config: `~/.config/opencode/opencode.json`

### Default Model

```json
"model": "olares-qwen3-coder//qwen3-coder-65k:latest"
```
### Agent Personality (Vesper)

OpenCode is configured with a personality via both `instructions` in the config and `AGENTS.md` in the repo root:
- Name: Vesper
- Style: Concise, witty, competent; executes commands directly instead of explaining
- Guardian role: Proactively warns about bad practices (secrets in git, missing dry-runs, open permissions)
- Safety practices: Works in branches, dry-runs first, backs up before modifying, verifies after acting
### Configured Providers

| Provider | Model | Context | Status |
|---|---|---|---|
| `olares-qwen3-coder` (default) | Qwen3-Coder 30B | 65k | Active |
| `olares` | Qwen3 30B A3B | 40k | vLLM endpoint down (502) |
| `olares-gptoss` | GPT-OSS 20B | 65k | vLLM endpoint down (502) |
| `olares-qwen35` | Qwen3.5 27B | 65k | Ollama, no tool calling |
| `olares-devstral` | Devstral Small 2 24B | 32k | Ollama, available |
### Permissions (Full Autonomy)

```json
"permission": {
  "bash": "allow",
  "edit": "allow",
  "write": "allow",
  "read": "allow",
  "glob": "allow",
  "grep": "allow",
  "question": "allow",
  "external_directory": "allow",
  "mcp": "allow"
}
```
### Loop Prevention

```json
"mode": {
  "build": { "steps": 50, "permission": { "doom_loop": "deny" } },
  "plan": { "steps": 25, "permission": { "doom_loop": "deny" } }
}
```
## MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/home/homelab/organized/repos/homelab/scripts/homelab-mcp/server.py"],
    "enabled": true
  }
}
```
## Host-Specific Setup

### homelab VM (192.168.0.210)

- User: homelab
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- Backup: `~/.config/opencode/opencode.json.bak.*`
- MCP: homelab MCP server enabled
- All providers configured
### moon (100.64.0.6 via Tailscale)

- User: moon (access via `ssh moon`, then `sudo -i su - moon`)
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- May need config updated to point at active Ollama endpoint
## Switching to 32k Context (Safe Mode)

If the 65k context causes issues (OOM, slow inference, instability), revert to 32k:

1. Update the opencode config (`~/.config/opencode/opencode.json`):
   ```json
   "model": "olares-qwen3-coder//qwen3-coder:latest"
   ```
2. Update the email organizer configs (all 3 files):
   ```bash
   sed -i 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' \
     scripts/gmail-organizer/config.yaml \
     scripts/gmail-organizer-dvish/config.yaml \
     scripts/proton-organizer/config.local.yaml
   ```
3. Unload the 65k model and load the base model:
   ```bash
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder-65k:latest","keep_alive":0}'
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder:latest","prompt":"test","stream":false}'
   ```
4. Optionally disable flash attention (it is only needed to fit the larger context within VRAM):
   ```bash
   ssh olares "sudo k3s kubectl set env deployment/ollama -n ollamaserver-shared -c ollama OLLAMA_FLASH_ATTENTION-"
   ```
5. Verify:
   ```bash
   curl -s https://a5be22681.vishinator.olares.com/api/ps | python3 -m json.tool
   # Should show: ctx=32768, size_vram ~22GB
   ```
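The verification in step 5 can be scripted. A minimal sketch that parses the `/api/ps` JSON and checks the loaded context size — note the `context_length` field is reported by recent Ollama releases; treat its presence as an assumption for older versions:

```python
import json


def check_ctx(ps_json: str, want_ctx: int = 32768) -> bool:
    """Return True if every loaded model reports the expected context size."""
    models = json.loads(ps_json).get("models", [])
    return bool(models) and all(m.get("context_length") == want_ctx for m in models)


# Example payload in the shape /api/ps returns:
sample = '{"models": [{"name": "qwen3-coder:latest", "context_length": 32768}]}'
print(check_ctx(sample))  # True
```

Pipe the real output in with `curl -s .../api/ps | python3 check_ctx.py` (wrapping the above in a small `sys.stdin` reader).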
## Removing the 65k Model Entirely

```bash
ssh olares "sudo k3s kubectl exec -n ollamaserver-shared \$(sudo k3s kubectl get pod -n ollamaserver-shared -l app=ollama -o jsonpath='{.items[0].metadata.name}') -c ollama -- ollama rm qwen3-coder-65k"
```
## Requirements

- Tool calling support required: OpenCode sends tools with every request; models without tool call templates return 400 errors
- Large context needed: the system prompt plus tool definitions use ~15-20k tokens; models with less than 32k context will fail
- Flash attention recommended: `OLLAMA_FLASH_ATTENTION=1` on the Ollama deployment allows larger contexts within VRAM limits
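The context-budget requirement above is simple arithmetic; a trivial helper makes the margins explicit (the 20k overhead figure is the upper end of the range quoted above):

```python
def remaining_ctx(num_ctx: int, overhead: int = 20_000) -> int:
    """Tokens left for conversation after system prompt + tool definitions."""
    return max(0, num_ctx - overhead)


print(remaining_ctx(32_768))  # 12768 tokens of headroom at 32k
print(remaining_ctx(65_536))  # 45536 tokens of headroom at 65k
```

At 32k the headroom is tight, which is why the "constant compaction" failure mode below appears once web fetches start filling the window.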
## Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a model with tool calling support |
| `model not found` | Wrong model name (e.g., `qwen3:coder` vs `qwen3-coder:latest`) | Check `ollama list` for exact names |
| Constant compaction | Context window too small for system prompt + conversation | Increase context or reduce web fetches |
| 502 Bad Gateway | Ollama pod restarting or endpoint down | Check pod: `ssh olares "sudo k3s kubectl get pods -n ollamaserver-shared"` |
| Stuck in loops | Model keeps retrying failed tool calls | `doom_loop: "deny"` and reduce steps |
| Slow responses | VRAM spill to system RAM at 65k ctx | Switch to 32k (see above) |
| Won't run ansible | Model too cautious, AGENTS.md too restrictive | Check `instructions` in config and AGENTS.md |
| Web fetch eating context | Model searching internet for local info | Instructions tell it to read local files first |