# OpenCode

AI-Powered Coding Agent CLI

## Service Overview

| Property | Value |
|---|---|
| Service Name | `opencode` |
| Category | AI / Development |
| Hosts | homelab VM (192.168.0.210), moon (100.64.0.6) |
| Install | `curl -fsSL https://opencode.ai/install \| bash` |
| Config | `~/.config/opencode/opencode.json` |
| LLM Backend | Olares vLLM (Qwen3 30B) |

## Purpose

OpenCode is an interactive CLI coding agent (similar to Claude Code) that connects to local LLM backends for AI-assisted software engineering. It runs on developer workstations and connects to the Olares Kubernetes appliance for GPU-accelerated inference.

## Architecture

```
Developer Host (homelab VM / moon)
  └── opencode CLI
        └── HTTPS → Olares (192.168.0.145)
              └── vLLM / Ollama (RTX 5090 Max-Q)
                    └── Qwen3 30B / GPT-OSS 20B
```

## Installation

```bash
# Install opencode
curl -fsSL https://opencode.ai/install | bash

# Create config directory
mkdir -p ~/.config/opencode

# Run in any project directory
cd ~/my-project
opencode
```

## Configuration

Config location: `~/.config/opencode/opencode.json`

### Configured Providers

| Provider | Model | Backend | Context | Tool Calling |
|---|---|---|---|---|
| `olares` (default) | Qwen3 30B A3B | vLLM | 40k tokens | Yes (Hermes parser) |
| `olares-gptoss` | GPT-OSS 20B | vLLM | 65k tokens | Yes |
| `olares-qwen35` | Qwen3.5 27B Q4_K_M | Ollama | 65k tokens | No (avoid for OpenCode) |

### Switching Models

Edit the `"model"` field in `opencode.json`:

```json
"model": "olares//models/qwen3-30b"
```

Available values:

- `olares//models/qwen3-30b` — recommended, supports tool calling
- `olares-gptoss//models/gpt-oss-20b` — larger context, experimental
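For orientation, a minimal sketch of how the default model and its provider entry might fit together in `opencode.json`. This is an assumption-laden illustration, not a copy of the live config: the `baseURL` host/port and the `npm`/`options`/`models` field names follow OpenCode's OpenAI-compatible provider convention and should be checked against the actual file.

```json
{
  "model": "olares//models/qwen3-30b",
  "provider": {
    "olares": {
      "npm": "@ai-sdk/openai-compatible",
      "options": {
        "baseURL": "http://192.168.0.145:8000/v1"
      },
      "models": {
        "/models/qwen3-30b": {}
      }
    }
  }
}
```

The model identifier `olares//models/qwen3-30b` is read as `<provider>//<model-id>`, which is why the provider key and the model key must match the two halves of the string.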

## Loop Prevention

OpenCode can get stuck in tool call loops with smaller models. These settings cap iterations:

```json
"mode": {
  "build": {
    "steps": 25,
    "permission": { "doom_loop": "deny" }
  },
  "plan": {
    "steps": 15,
    "permission": { "doom_loop": "deny" }
  }
}
```

- `steps` — max agentic iterations before forcing a text response
- `doom_loop: "deny"` — stop immediately when a loop is detected

## MCP Integration

The homelab MCP server is configured on the homelab VM:

```json
"mcp": {
  "homelab": {
    "type": "local",
    "command": ["python3", "/path/to/homelab-mcp/server.py"],
    "enabled": true
  }
}
```

## Host-Specific Setup

### homelab VM (192.168.0.210)

- User: `homelab`
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- MCP: homelab MCP server enabled
- All 3 providers configured

### moon (100.64.0.6 via Tailscale)

- User: `moon` (access via `ssh moon`, then `sudo -i su - moon`)
- Binary: `~/.opencode/bin/opencode`
- Config: `~/.config/opencode/opencode.json`
- Qwen3 30B provider only

## Requirements

- **Tool calling support required** — OpenCode sends tool definitions with every request. Models without tool-call chat templates (e.g., Ollama Qwen3.5) return 400 errors.
- **Large context needed** — OpenCode's system prompt plus tool definitions consume roughly 15-20k tokens, so models with less than 32k context will fail.
- **vLLM recommended** — use `--enable-auto-tool-choice --tool-call-parser hermes` for reliable function calling.
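The flags above slot into a vLLM launch command roughly as follows. This is a sketch only: the model path and the `--max-model-len` value (chosen to match the 40k context listed for the `olares` provider) are assumptions and should be taken from the actual deployment on Olares.

```bash
# Hedged example: model path and context length are placeholders,
# not copied from the live Olares deployment.
vllm serve Qwen/Qwen3-30B-A3B \
  --max-model-len 40960 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```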

## Troubleshooting

| Error | Cause | Fix |
|---|---|---|
| `bad request` / 400 | Model doesn't support tools, or context exceeded | Switch to a vLLM model with tool calling |
| `max_tokens` negative | Context window too small for system prompt | Increase `max_model_len` on vLLM |
| Stuck in loops | Model keeps retrying failed tool calls | Set `doom_loop: "deny"` and reduce `steps` |
| Connection refused | LLM endpoint down or auth blocking | Check vLLM pod status, verify auth bypass |
| Slow responses | Dense model on limited GPU | Use a MoE model (Qwen3 30B) for faster inference |