# OpenCode

**AI-Powered Coding Agent CLI**

## Service Overview

| Property | Value |
|----------|-------|
| **Service Name** | opencode |
| **Category** | AI / Development |
| **Hosts** | homelab VM (192.168.0.210), moon (100.64.0.6) |
| **Install** | `curl -fsSL https://opencode.ai/install \| bash` |
| **Config** | `~/.config/opencode/opencode.json` |
| **LLM Backend** | Olares vLLM (Qwen3 30B) |

## Purpose

OpenCode is an interactive CLI coding agent (similar to Claude Code) that uses local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.

## Architecture

```
Developer Host (homelab VM / moon)
└── opencode CLI
    └── HTTPS → Olares (192.168.0.145)
        └── vLLM / Ollama (RTX 5090 Max-Q)
            └── Qwen3 30B / GPT-OSS 20B
```

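As a quick sanity check of the path above, you can hit the backend's model listing. The host comes from the diagram, but port 8000 and the `/v1` path are assumptions (vLLM's OpenAI-compatible defaults); adjust `OLARES_BASE` to the actual Olares ingress:

```shell
# List models on the inference endpoint (host from the diagram;
# port 8000 and the /v1 path are assumptions -- vLLM defaults).
BASE="${OLARES_BASE:-http://192.168.0.145:8000}"
curl -sf -m 5 "$BASE/v1/models" || echo "endpoint unreachable: $BASE"
```

A successful response is a JSON list of served model IDs; the fallback message means the endpoint is down, unreachable, or behind auth (see Troubleshooting).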
## Installation

```bash
# Install opencode
curl -fsSL https://opencode.ai/install | bash

# Create config directory
mkdir -p ~/.config/opencode

# Run in any project directory
cd ~/my-project
opencode
```

## Configuration

Config location: `~/.config/opencode/opencode.json`

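The fragments in the subsections that follow all live in this one file. Assembled, a minimal config might look like the sketch below; the top-level layout is an assumption pieced together from those fragments, not a verified schema:

```json
{
  "model": "olares//models/qwen3-30b",
  "mode": {
    "build": { "steps": 25, "permission": { "doom_loop": "deny" } },
    "plan": { "steps": 15, "permission": { "doom_loop": "deny" } }
  },
  "mcp": {
    "homelab": {
      "type": "local",
      "command": ["python3", "/path/to/homelab-mcp/server.py"],
      "enabled": true
    }
  }
}
```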
### Configured Providers

| Provider | Model | Backend | Context | Tool Calling |
|----------|-------|---------|---------|--------------|
| `olares` (default) | Qwen3 30B A3B | vLLM | 40k tokens | Yes (Hermes parser) |
| `olares-gptoss` | GPT-OSS 20B | vLLM | 65k tokens | Yes |
| `olares-qwen35` | Qwen3.5 27B Q4_K_M | Ollama | 65k tokens | No (avoid for OpenCode) |

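A back-of-envelope view of why the context column matters: OpenCode's own system prompt and tool definitions (see Requirements) consume roughly 15-20k tokens before any conversation starts. Taking the pessimistic 20k figure for the two tool-calling providers:

```shell
# Rough usable-conversation budget per provider. The 20k overhead
# figure comes from the Requirements section; all numbers are approximations.
overhead=20000
for entry in "olares:40000" "olares-gptoss:65000"; do
  name=${entry%%:*}; ctx=${entry##*:}
  echo "$name: $((ctx - overhead)) tokens left for conversation"
done
```

This is also why a 32k-context model is effectively the floor: after ~20k of overhead, anything smaller leaves almost no room for the actual session.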
### Switching Models

Edit the `"model"` field in `opencode.json`:

```json
"model": "olares//models/qwen3-30b"
```

Available values:

- `olares//models/qwen3-30b` — recommended, supports tool calling
- `olares-gptoss//models/gpt-oss-20b` — larger context, experimental

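Since `opencode.json` is plain JSON, the swap can be scripted. A minimal sketch using `sed`, run against a throwaway copy here so the live config is untouched:

```shell
# Switch the configured model by rewriting the "model" field.
# Operates on a temp copy; point at ~/.config/opencode/opencode.json for real.
cfg=$(mktemp)
printf '{ "model": "olares//models/qwen3-30b" }\n' > "$cfg"
sed -i 's#"model": "[^"]*"#"model": "olares-gptoss//models/gpt-oss-20b"#' "$cfg"
grep -o '"model": "[^"]*"' "$cfg"
rm -f "$cfg"
```

`#` is used as the `sed` delimiter because the model IDs contain `/`.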
### Loop Prevention

OpenCode can get stuck in tool-call loops with smaller models. These settings cap iterations:

```json
"mode": {
  "build": {
    "steps": 25,
    "permission": { "doom_loop": "deny" }
  },
  "plan": {
    "steps": 15,
    "permission": { "doom_loop": "deny" }
  }
}
```

- `steps` — max agentic iterations before forcing a text response
- `doom_loop: "deny"` — stop immediately when a loop is detected

### MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/path/to/homelab-mcp/server.py"],
    "enabled": true
  }
}
```

## Host-Specific Setup

### homelab VM (192.168.0.210)

- **User**: homelab
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **MCP**: homelab MCP server enabled
- **Providers**: all 3 configured

### moon (100.64.0.6 via Tailscale)

- **User**: moon (access via `ssh moon`, then `sudo -i su - moon`)
- **Binary**: `~/.opencode/bin/opencode`
- **Config**: `~/.config/opencode/opencode.json`
- **Providers**: Qwen3 30B only

## Requirements

- **Tool calling support required** — OpenCode sends tool definitions with every request. Models without a tool-call chat template (e.g., Ollama Qwen3.5) return 400 errors
- **Large context needed** — OpenCode's system prompt plus tool definitions use ~15-20k tokens; models with less than 32k context will fail
- **vLLM recommended** — use `--enable-auto-tool-choice --tool-call-parser hermes` for reliable function calling

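For reference, the flags above slot into a `vllm serve` invocation roughly like this. The model path and `--max-model-len` value are assumptions for illustration, not the exact Olares deployment, so the command is printed rather than executed:

```shell
# Print the assumed launch command rather than executing it
# (the real deployment runs inside the Olares Kubernetes pod, and
# the model path / --max-model-len here are illustrative guesses).
cat <<'EOF'
vllm serve Qwen/Qwen3-30B-A3B \
  --max-model-len 40960 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
EOF
```

`--enable-auto-tool-choice` lets vLLM decide when to emit tool calls, and `--tool-call-parser hermes` matches the Hermes-style tool-call format that Qwen3 models emit.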
## Troubleshooting

| Error | Cause | Fix |
|-------|-------|-----|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a vLLM model with tool calling |
| `max_tokens negative` | Context window too small for the system prompt | Increase `max_model_len` on vLLM |
| Stuck in loops | Model keeps retrying failed tool calls | Set `doom_loop: "deny"` and reduce `steps` |
| Connection refused | LLM endpoint down or auth blocking | Check vLLM pod status, verify auth bypass |
| Slow responses | Dense model on limited GPU | Use a MoE model (Qwen3 30B) for faster inference |