Sanitized mirror from private repository - 2026-03-24 12:51:03 UTC
# OpenCode

**AI-Powered Coding Agent CLI**

## Service Overview

| Property | Value |
|----------|-------|
| **Service Name** | opencode |
| **Category** | AI / Development |
| **Hosts** | homelab VM (192.168.0.210), moon (100.64.0.6) |
| **Install** | `curl -fsSL https://opencode.ai/install \| bash` |
| **Config** | `~/.config/opencode/opencode.json` |
| **LLM Backend** | Olares Ollama (Qwen3-Coder 30B A3B) |
| **Agent Name** | Vesper |

## Purpose

OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.

## Architecture

```
Developer Host (homelab VM / moon)
└── opencode CLI
    └── HTTPS → Olares (192.168.0.145)
        └── Ollama (RTX 5090 Max-Q, 24GB VRAM)
            └── qwen3-coder-65k:latest (Qwen3-Coder 30B A3B, Q4_K_M)
```

### Ollama Infrastructure

- **Host**: Olares appliance at 192.168.0.145 (SSH: `ssh olares`)
- **Runtime**: Kubernetes (k3s), namespace `ollamaserver-shared`
- **Pod**: `ollama-*` in deployment `ollama`
- **API endpoint**: `https://a5be22681.vishinator.olares.com`
- **GPU**: NVIDIA RTX 5090 Laptop GPU, 24GB VRAM, compute capability 12.0
- **Flash attention**: Enabled (`OLLAMA_FLASH_ATTENTION=1` env var on the deployment)

### Models on Ollama

| Model | Size | Context | VRAM Usage | Notes |
|-------|------|---------|------------|-------|
| `qwen3-coder-65k:latest` | 18GB | 65k tokens | ~25.3GB (spills ~1.3GB to system RAM) | **Default for everything** |
| `qwen3-coder:latest` | 18GB | 32k tokens | ~22GB (fits in VRAM) | Base model, safe fallback |
| `devstral-small-2:latest` | 15GB | 32k tokens | — | Alternative model |

The 65k model is a Modelfile wrapper around `qwen3-coder:latest` with `PARAMETER num_ctx 65536`. Same weights, same quality, just a larger context window. The ~1.3GB VRAM overshoot is handled via resizable BAR / shared memory on the laptop GPU — inference is slightly slower for the spilled portion but negligible for typical use.

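The original Modelfile isn't shown in this doc, but per the description above it reduces to a two-line wrapper:

```
FROM qwen3-coder:latest
PARAMETER num_ctx 65536
```

Registering it inside the Ollama pod would then be `ollama create qwen3-coder-65k -f Modelfile`.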
## Configuration

Config: `~/.config/opencode/opencode.json`

### Default Model

```json
"model": "olares-qwen3-coder//qwen3-coder-65k:latest"
```

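The model string follows a `provider//model-tag` format, which plain shell parameter expansion can split; a sketch against a scratch copy of the config (paths illustrative, not the real config file):

```bash
# Scratch copy standing in for ~/.config/opencode/opencode.json
printf '{"model": "olares-qwen3-coder//qwen3-coder-65k:latest"}\n' > /tmp/opencode.demo.json

# Pull out the "model" value, then split provider from model tag on "//"
model_field=$(sed -E 's/.*"model": *"([^"]+)".*/\1/' /tmp/opencode.demo.json)
echo "${model_field%%//*}"   # provider: olares-qwen3-coder
echo "${model_field##*//}"   # model tag: qwen3-coder-65k:latest
```
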
### Agent Personality (Vesper)

OpenCode is configured with a personality via both `instructions` in the config and `AGENTS.md` in the repo root:

- **Name**: Vesper
- **Style**: Concise, witty, competent — executes commands directly instead of explaining
- **Guardian role**: Proactively warns about bad practices (secrets in git, missing dry-runs, open permissions)
- **Safety practices**: Works in branches, dry-runs first, backs up before modifying, verifies after acting

### Configured Providers

| Provider | Model | Context | Status |
|----------|-------|---------|--------|
| `olares-qwen3-coder` (default) | Qwen3-Coder 30B | 65k | Active |
| `olares` | Qwen3 30B A3B | 40k | vLLM endpoint down (502) |
| `olares-gptoss` | GPT-OSS 20B | 65k | vLLM endpoint down (502) |
| `olares-qwen35` | Qwen3.5 27B | 65k | Ollama, no tool calling |
| `olares-devstral` | Devstral Small 2 24B | 32k | Ollama, available |

### Permissions (Full Autonomy)

```json
"permission": {
  "bash": "allow",
  "edit": "allow",
  "write": "allow",
  "read": "allow",
  "glob": "allow",
  "grep": "allow",
  "question": "allow",
  "external_directory": "allow",
  "mcp": "allow"
}
```

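Full autonomy only holds while every tool stays on `allow`; a throwaway check like this (a sketch, not an opencode feature) can flag drift after config edits:

```bash
# The permission block above, saved to a scratch file for linting
cat > /tmp/permission.demo.json <<'EOF'
{"bash": "allow", "edit": "allow", "write": "allow", "read": "allow",
 "glob": "allow", "grep": "allow", "question": "allow",
 "external_directory": "allow", "mcp": "allow"}
EOF

# Print any tool whose mode is not "allow"; no matches means full autonomy
grep -oE '"[a-z_]+": "[a-z]+"' /tmp/permission.demo.json | grep -v ': "allow"' || echo "full autonomy"
```
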
### Loop Prevention

```json
"mode": {
  "build": { "steps": 50, "permission": { "doom_loop": "deny" } },
  "plan": { "steps": 25, "permission": { "doom_loop": "deny" } }
}
```

### MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/home/homelab/organized/repos/homelab/scripts/homelab-mcp/server.py"],
    "enabled": true
  }
}
```

## Host-Specific Setup

### homelab VM (192.168.0.210)

- **User**: homelab
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **Backup**: `~/.config/opencode/opencode.json.bak.*`
- **MCP**: homelab MCP server enabled
- **Providers**: all providers configured

### moon (100.64.0.6 via Tailscale)

- **User**: moon (access via `ssh moon`, then `sudo -i su - moon`)
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **Note**: config may need updating to point at the active Ollama endpoint

## Switching to 32k Context (Safe Mode)

If the 65k context causes issues (OOM, slow inference, instability), revert to 32k:

**1. Update opencode config** (`~/.config/opencode/opencode.json`):
```json
"model": "olares-qwen3-coder//qwen3-coder:latest"
```

**2. Update email organizer configs** (all 3 files):

```bash
sed -i 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' \
  scripts/gmail-organizer/config.yaml \
  scripts/gmail-organizer-dvish/config.yaml \
  scripts/proton-organizer/config.local.yaml
```

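In the spirit of the dry-run-first practice described earlier, the substitution can be previewed on a scratch copy before running `sed -i` against the real configs (file contents illustrative):

```bash
# Scratch file standing in for one of the organizer configs
printf 'model: qwen3-coder-65k:latest\n' > /tmp/organizer-config.demo.yaml

# Dry run: no -i flag, so the file is untouched and the result prints to stdout
sed 's/qwen3-coder-65k:latest/qwen3-coder:latest/g' /tmp/organizer-config.demo.yaml
```
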
**3. Unload 65k model and load base**:

```bash
curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder-65k:latest","keep_alive":0}'
curl -s https://a5be22681.vishinator.olares.com/api/generate -d '{"model":"qwen3-coder:latest","prompt":"test","stream":false}'
```

**4. Optionally disable flash attention** (reduces VRAM overhead further):

```bash
ssh olares "sudo k3s kubectl set env deployment/ollama -n ollamaserver-shared -c ollama OLLAMA_FLASH_ATTENTION-"
```

**5. Verify**:

```bash
curl -s https://a5be22681.vishinator.olares.com/api/ps | python3 -m json.tool
# Should show: ctx=32768, size_vram ~22GB
```

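The `size_vram` figure in that output is reported in bytes; a small filter converts it to GiB, sketched here against a canned response (values illustrative, shape as returned by `/api/ps`):

```bash
# Canned /api/ps response standing in for the live endpoint (22 GiB in bytes)
cat > /tmp/ps.demo.json <<'EOF'
{"models": [{"name": "qwen3-coder:latest", "size_vram": 23622320128}]}
EOF

# Convert size_vram bytes to GiB
python3 -c 'import json; m=json.load(open("/tmp/ps.demo.json"))["models"][0]; print(round(m["size_vram"]/2**30,1), "GiB")'
```
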
## Removing the 65k Model Entirely

```bash
ssh olares "sudo k3s kubectl exec -n ollamaserver-shared \$(sudo k3s kubectl get pod -n ollamaserver-shared -l app=ollama -o jsonpath='{.items[0].metadata.name}') -c ollama -- ollama rm qwen3-coder-65k"
```

## Requirements

- **Tool calling support required** — OpenCode sends tools with every request. Models without tool-call templates return 400 errors
- **Large context needed** — System prompt + tool definitions use ~15-20k tokens. Models with less than 32k context will fail
- **Flash attention recommended** — `OLLAMA_FLASH_ATTENTION=1` on the Ollama deployment allows larger contexts within VRAM limits

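As an illustration of the first requirement, every chat request OpenCode sends carries a `tools` array; a minimal sketch of such a payload (the `bash` tool definition here is hypothetical, following the OpenAI-style function schema):

```bash
# A chat request like the ones OpenCode sends: tools ride along every time.
cat > /tmp/chat-request.demo.json <<'EOF'
{
  "model": "qwen3-coder-65k:latest",
  "messages": [{"role": "user", "content": "list files in the repo"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "bash",
      "description": "Run a shell command",
      "parameters": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"]
      }
    }
  }],
  "stream": false
}
EOF

# Models without a tool-call template reject a body like this with HTTP 400, e.g.:
#   curl -s https://a5be22681.vishinator.olares.com/api/chat -d @/tmp/chat-request.demo.json
python3 -m json.tool /tmp/chat-request.demo.json > /dev/null && echo "valid JSON"
```
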
## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a model with tool-calling support |
| `model not found` | Wrong model name (e.g., `qwen3:coder` vs `qwen3-coder:latest`) | Check `ollama list` for exact names |
| Constant compaction | Context window too small for system prompt + conversation | Increase context or reduce web fetches |
| 502 Bad Gateway | Ollama pod restarting or endpoint down | Check pod: `ssh olares "sudo k3s kubectl get pods -n ollamaserver-shared"` |
| Stuck in loops | Model keeps retrying failed tool calls | Set `doom_loop: "deny"` and reduce `steps` |
| Slow responses | VRAM spill to system RAM at 65k ctx | Switch to 32k (see above) |
| Won't run ansible | Model too cautious, AGENTS.md too restrictive | Check instructions in config and AGENTS.md |
| Web fetch eating context | Model searching internet for local info | Ensure instructions tell it to read local files first |