Sanitized mirror from private repository - 2026-03-24 11:56:17 UTC

# OpenCode

**AI-Powered Coding Agent CLI**

## Service Overview

| Property | Value |
|----------|-------|
| **Service Name** | opencode |
| **Category** | AI / Development |
| **Hosts** | homelab VM (192.168.0.210), moon (100.64.0.6) |
| **Install** | `curl -fsSL https://opencode.ai/install \| bash` |
| **Config** | `~/.config/opencode/opencode.json` |
| **LLM Backend** | Olares vLLM (Qwen3 30B) |

## Purpose

OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and talks to the Olares Kubernetes appliance for GPU-accelerated inference.

## Architecture

```
Developer Host (homelab VM / moon)
└── opencode CLI
    └── HTTPS → Olares (192.168.0.145)
        └── vLLM / Ollama (RTX 5090 Max-Q)
            └── Qwen3 30B / GPT-OSS 20B
```

## Installation

```bash
# Install opencode
curl -fsSL https://opencode.ai/install | bash

# Create config directory
mkdir -p ~/.config/opencode

# Run in any project directory
cd ~/my-project
opencode
```

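Once the installer finishes, a quick sanity check confirms the binary is reachable. This sketch assumes the installer drops the binary in `~/.opencode/bin`, matching the host notes later in this document:

```shell
# Confirm the opencode binary is on PATH; ~/.opencode/bin is where the
# installer is assumed to place it (matching the documented hosts).
export PATH="$HOME/.opencode/bin:$PATH"
command -v opencode || echo "opencode not found - check ~/.opencode/bin"
```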
## Configuration

Config location: `~/.config/opencode/opencode.json`

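Since a malformed `opencode.json` will break startup, validating the file after edits is cheap insurance. A minimal sketch using only `python3`; the demo writes a scratch file so it runs anywhere, but in practice point `CONFIG` at `~/.config/opencode/opencode.json`:

```shell
# Validate that an opencode.json parses as JSON before relaunching opencode.
# Demo writes a scratch file; point CONFIG at the real path in practice.
CONFIG=$(mktemp)
printf '{"model": "olares//models/qwen3-30b"}\n' > "$CONFIG"
python3 -m json.tool "$CONFIG" > /dev/null && echo "config OK"
rm -f "$CONFIG"
```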
### Configured Providers

| Provider | Model | Backend | Context | Tool Calling |
|----------|-------|---------|---------|--------------|
| `olares` (default) | Qwen3 30B A3B | vLLM | 40k tokens | Yes (Hermes parser) |
| `olares-gptoss` | GPT-OSS 20B | vLLM | 65k tokens | Yes |
| `olares-qwen35` | Qwen3.5 27B Q4_K_M | Ollama | 65k tokens | No (avoid for OpenCode) |

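For reference, a provider entry behind the table above might look like the sketch below. The field names follow OpenCode's OpenAI-compatible provider convention as best recalled, not copied from the live config, and the `baseURL` port 8000 is an assumed vLLM default; verify both against the actual `opencode.json` before relying on them:

```json
"provider": {
  "olares": {
    "npm": "@ai-sdk/openai-compatible",
    "options": { "baseURL": "http://192.168.0.145:8000/v1" },
    "models": { "/models/qwen3-30b": { "name": "Qwen3 30B A3B" } }
  }
}
```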
### Switching Models

Edit the `"model"` field in `opencode.json`:

```json
"model": "olares//models/qwen3-30b"
```

Available values:

- `olares//models/qwen3-30b` — recommended, supports tool calling
- `olares-gptoss//models/gpt-oss-20b` — larger context, experimental

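The switch can also be scripted. Below is a small sketch using only `python3` to rewrite the `model` field; the demo operates on a scratch copy so the real config stays untouched, but in practice set `CONFIG` to `~/.config/opencode/opencode.json`:

```shell
# Rewrite the "model" field of an opencode.json using only python3.
# Demo writes a scratch config; in practice set CONFIG to
# ~/.config/opencode/opencode.json (the path documented above).
CONFIG=$(mktemp)
printf '{"model": "olares-gptoss//models/gpt-oss-20b"}\n' > "$CONFIG"
python3 - "$CONFIG" <<'EOF'
import json, sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
cfg["model"] = "olares//models/qwen3-30b"  # recommended default per the list above
with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
EOF
grep -q 'qwen3-30b' "$CONFIG" && echo "model switched"
rm -f "$CONFIG"
```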
### Loop Prevention

OpenCode can get stuck in tool-call loops with smaller models. These settings cap iterations:

```json
"mode": {
  "build": {
    "steps": 25,
    "permission": { "doom_loop": "deny" }
  },
  "plan": {
    "steps": 15,
    "permission": { "doom_loop": "deny" }
  }
}
```

- `steps` — max agentic iterations before forcing a text response
- `doom_loop: "deny"` — stop immediately when a loop is detected

### MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/path/to/homelab-mcp/server.py"],
    "enabled": true
  }
}
```

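Before enabling the MCP entry, it is worth checking that the server script actually exists and byte-compiles. The path below is the placeholder from the config above, so substitute the real location:

```shell
# Check the MCP server script exists and compiles before enabling it.
# /path/to/homelab-mcp/server.py is the placeholder path from the config.
SERVER=/path/to/homelab-mcp/server.py
{ [ -f "$SERVER" ] && python3 -m py_compile "$SERVER" && echo "MCP server OK"; } \
  || echo "MCP server missing or broken at $SERVER"
```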
## Host-Specific Setup

### homelab VM (192.168.0.210)

- **User**: homelab
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **MCP**: homelab MCP server enabled
- **Providers**: all 3 configured

### moon (100.64.0.6 via Tailscale)

- **User**: moon (access via `ssh moon`, then `sudo -i su - moon`)
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **Providers**: Qwen3 30B only

## Requirements

- **Tool calling support required** — OpenCode sends tool definitions with every request. Models without a tool-call template (e.g., Ollama Qwen3.5) return 400 errors.
- **Large context needed** — OpenCode's system prompt plus tool definitions consume ~15-20k tokens; models with less than 32k context will fail.
- **vLLM recommended** — run with `--enable-auto-tool-choice --tool-call-parser hermes` for reliable function calling.

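The context requirement is easy to sanity-check with arithmetic. Taking the upper end of the overhead estimate above (~20k tokens), here is what each configured context window leaves for actual conversation and code:

```shell
# Tokens left for conversation after OpenCode's prompt + tool overhead.
overhead=20000                     # upper end of the ~15-20k estimate above
for window in 32000 40000 65000; do
  echo "window ${window}: $((window - overhead)) tokens free"
done
```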
## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a vLLM model with tool calling |
| `max_tokens` negative | Context window too small for the system prompt | Increase `max_model_len` on vLLM |
| Stuck in loops | Model keeps retrying failed tool calls | Set `doom_loop: "deny"` and reduce `steps` |
| Connection refused | LLM endpoint down or auth blocking | Check vLLM pod status, verify auth bypass |
| Slow responses | Dense model on limited GPU | Use the MoE model (Qwen3 30B) for faster inference |