
# OpenCode

AI-Powered Coding Agent CLI

## Service Overview

| Property | Value |
| --- | --- |
| Service Name | opencode |
| Category | AI / Development |
| Hosts | homelab VM (192.168.0.210), moon (100.64.0.6) |
| Install | `curl -fsSL https://opencode.ai/install \| bash` |
| Config | `~/.config/opencode/opencode.json` |
| LLM Backend | Olares Ollama (Qwen3-Coder 30B A3B) |
| Agent Name | Vesper |

## Purpose

OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.

## Architecture

```text
Developer Host (homelab VM / moon)
  └── opencode CLI
        └── HTTPS → Olares (192.168.0.145)
              └── Ollama (RTX 5090 Max-Q, 24GB VRAM)
                    └── qwen3-coder-65k:latest (Qwen3-Coder 30B A3B, Q4_K_M)
```

### Ollama Infrastructure

- **Host**: Olares appliance at 192.168.0.145 (SSH: `ssh olares`)
- **Runtime**: Kubernetes (k3s), namespace `ollamaserver-shared`
- **Pod**: `ollama-*` in deployment `ollama`
- **API endpoint**: https://a5be22681.vishinator.olares.com
- **GPU**: NVIDIA RTX 5090 Laptop GPU, 24GB VRAM, compute capability 12.0
- **Flash attention**: Enabled (`OLLAMA_FLASH_ATTENTION=1` env var on the deployment)

### Models on Ollama

| Model | Size | Context | VRAM Usage | Notes |
| --- | --- | --- | --- | --- |
| `qwen3-coder-65k:latest` | 18GB | 65k tokens | ~25.3GB (spills ~1.3GB to system RAM) | Default for everything |
| `qwen3-coder:latest` | 18GB | 32k tokens | ~22GB (fits in VRAM) | Base model, safe fallback |
| `devstral-small-2:latest` | 15GB | 32k tokens | | Alternative model |

The 65k model is a Modelfile wrapper around qwen3-coder:latest with PARAMETER num_ctx 65536. Same weights, same quality, just larger context window. The ~1.3GB VRAM overshoot is handled via resizable BAR / shared memory on the laptop GPU — inference is slightly slower for the spilled portion but negligible for typical use.
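For reference, such a wrapper amounts to a two-line Modelfile. This is a sketch consistent with the description above, not necessarily the exact file used on Olares:

```text
FROM qwen3-coder:latest
PARAMETER num_ctx 65536
```

Registering it is a single `ollama create qwen3-coder-65k -f Modelfile` inside the Ollama container.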

## Configuration

Config: `~/.config/opencode/opencode.json`

### Default Model

```json
"model": "olares-qwen3-coder//qwen3-coder-65k:latest"
```

### Agent Personality (Vesper)

OpenCode is configured with a personality via both the `instructions` field in the config and `AGENTS.md` in the repo root:

- **Name**: Vesper
- **Style**: Concise, witty, competent — executes commands directly instead of explaining
- **Guardian role**: Proactively warns about bad practices (secrets in git, missing dry-runs, open permissions)
- **Safety practices**: Works in branches, dry-runs first, backs up before modifying, verifies after acting

### Configured Providers

| Provider | Model | Context | Status |
| --- | --- | --- | --- |
| `olares-qwen3-coder` (default) | Qwen3-Coder 30B | 65k | Active |
| `olares` | Qwen3 30B A3B | 40k | vLLM endpoint down (502) |
| `olares-gptoss` | GPT-OSS 20B | 65k | vLLM endpoint down (502) |
| `olares-qwen35` | Qwen3.5 27B | 65k | Ollama, no tool calling |
| `olares-devstral` | Devstral Small 2 24B | 32k | Ollama, available |

### Permissions (Full Autonomy)

```json
"permission": {
  "bash": "allow",
  "edit": "allow",
  "write": "allow",
  "read": "allow",
  "glob": "allow",
  "grep": "allow",
  "question": "allow",
  "external_directory": "allow",
  "mcp": "allow"
}
```

### Loop Prevention

```json
"mode": {
  "build": { "steps": 50, "permission": { "doom_loop": "deny" } },
  "plan": { "steps": 25, "permission": { "doom_loop": "deny" } }
}
```

### MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/home/homelab/organized/repos/homelab/scripts/homelab-mcp/server.py"],
    "enabled": true
  }
}
```
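Before enabling the entry, a quick preflight can confirm the `command` is actually runnable. A sketch; the path is taken from the config above and will differ on other hosts:

```python
import os
import shutil

# Preflight check for the MCP entry above: confirm the interpreter is on PATH
# and the server script exists before setting "enabled": true.
command = ["python3", "/home/homelab/organized/repos/homelab/scripts/homelab-mcp/server.py"]

interpreter_ok = shutil.which(command[0]) is not None
script_ok = os.path.exists(command[1])
print(f"interpreter on PATH: {interpreter_ok}, server script present: {script_ok}")
```

If either check fails, OpenCode will report the MCP server as unavailable rather than erroring at startup.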

## Host-Specific Setup

### homelab VM (192.168.0.210)

- **User**: `homelab`
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **Backup**: `~/.config/opencode/opencode.json.bak.*`
- **MCP**: homelab MCP server enabled
- All providers configured

### moon (100.64.0.6 via Tailscale)

- **User**: `moon` (access via `ssh moon`, then `sudo -i su - moon`)
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- May need the config updated to point at the active Ollama endpoint

## Switching to 32k Context (Safe Mode)

If the 65k context causes issues (OOM, slow inference, instability), revert to 32k:

1. Update the opencode config (`~/.config/opencode/opencode.json`):

   ```json
   "model": "olares-qwen3-coder//qwen3-coder:latest"
   ```

2. Update the email organizer configs (all 3 files):

   ```shell
   sed -i 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' \
     scripts/gmail-organizer/config.yaml \
     scripts/gmail-organizer-dvish/config.yaml \
     scripts/proton-organizer/config.local.yaml
   ```
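Since `sed -i` edits in place, it can be worth rehearsing the substitution on a scratch file first (hypothetical scratch path, same pattern as above):

```shell
# Rehearse the substitution on a throwaway copy before touching the real configs.
printf 'model: qwen3-coder-65k:latest\n' > /tmp/organizer-config-test.yaml
sed -i 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' /tmp/organizer-config-test.yaml
cat /tmp/organizer-config-test.yaml   # model: qwen3-coder:latest
```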

3. Unload the 65k model and load the base model:

   ```shell
   # keep_alive: 0 tells Ollama to unload the model immediately
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder-65k:latest","keep_alive":0}'
   curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder:latest","prompt":"test","stream":false}'
   ```

4. Optionally disable flash attention (reduces VRAM overhead further):

   ```shell
   # The trailing "-" on the variable name removes it from the deployment
   ssh olares "sudo k3s kubectl set env deployment/ollama -n ollamaserver-shared -c ollama OLLAMA_FLASH_ATTENTION-"
   ```

5. Verify:

   ```shell
   curl -s https://a5be22681.vishinator.olares.com/api/ps | python3 -m json.tool
   # Should show: ctx=32768, size_vram ~22GB
   ```
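The same verification can be scripted against a captured response. The sketch below parses a saved `/api/ps` payload rather than calling the endpoint; the `size_vram` and `context_length` field names assume a recent Ollama release, so verify them against your version:

```python
import json

# Sample /api/ps payload (shape assumed from recent Ollama; values illustrative).
payload = json.loads("""
{"models": [{"name": "qwen3-coder:latest",
             "size_vram": 23622320128,
             "context_length": 32768}]}
""")

VRAM_BUDGET = 24 * 1024**3  # 24GB card

for m in payload["models"]:
    fits = m["size_vram"] <= VRAM_BUDGET
    print(f'{m["name"]}: ctx={m["context_length"]}, '
          f'vram={m["size_vram"] / 1024**3:.1f}GB, fits_in_vram={fits}')
```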

## Removing the 65k Model Entirely

```shell
ssh olares "sudo k3s kubectl exec -n ollamaserver-shared \$(sudo k3s kubectl get pod -n ollamaserver-shared -l app=ollama -o jsonpath='{.items[0].metadata.name}') -c ollama -- ollama rm qwen3-coder-65k"
```

## Requirements

- **Tool calling support required** — OpenCode sends tools with every request; models without tool-call templates return 400 errors
- **Large context needed** — the system prompt plus tool definitions use ~15-20k tokens; models with less than 32k context will fail
- **Flash attention recommended** — `OLLAMA_FLASH_ATTENTION=1` on the Ollama deployment allows larger contexts within VRAM limits
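To make the context requirement concrete, here is the arithmetic, using the upper overhead estimate (~20k tokens) from above:

```python
# Rough working-room arithmetic for the context requirement above.
# OVERHEAD is the upper estimate for system prompt + tool definitions.
OVERHEAD = 20_000

def working_room(context_window: int) -> int:
    """Tokens left for conversation after the fixed overhead."""
    return context_window - OVERHEAD

for ctx in (32_768, 65_536):
    print(f"{ctx} ctx -> {working_room(ctx)} tokens for conversation")
```

At 32k the conversation budget is tight (~13k tokens), which is why constant compaction shows up as a symptom in the troubleshooting table below.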

## Troubleshooting

| Error | Cause | Fix |
| --- | --- | --- |
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a model with tool calling support |
| `model not found` | Wrong model name (e.g., `qwen3:coder` vs `qwen3-coder:latest`) | Check `ollama list` for exact names |
| Constant compaction | Context window too small for system prompt + conversation | Increase context or reduce web fetches |
| 502 Bad Gateway | Ollama pod restarting or endpoint down | Check pod: `ssh olares "sudo k3s kubectl get pods -n ollamaserver-shared"` |
| Stuck in loops | Model keeps retrying failed tool calls | `doom_loop: "deny"` and reduce steps |
| Slow responses | VRAM spill to system RAM at 65k ctx | Switch to 32k (see above) |
| Won't run ansible | Model too cautious, AGENTS.md too restrictive | Check `instructions` in config and AGENTS.md |
| Web fetch eating context | Model searching the internet for local info | Instructions tell it to read local files first |