Sanitized mirror from private repository - 2026-04-05 10:17:38 UTC

# AI Integrations

Last updated: 2026-03-20

Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is Olares (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.


## Primary AI Backend — Olares

| Property | Value |
| --- | --- |
| Host | Olares (192.168.0.145) |
| GPU | RTX 5090 Max-Q (24GB VRAM) |
| Active model | `qwen3-coder:latest` (30.5B MoE, Q4_K_M) |
| Ollama endpoint | https://a5be22681.vishinator.olares.com |
| OpenAI-compat endpoint | https://a5be22681.vishinator.olares.com/v1 |
| Native Ollama API | https://a5be22681.vishinator.olares.com/api/... |

Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.

### Check active models

```shell
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps  # currently loaded in VRAM
```

### Switch models

See `docs/services/individual/olares.md` for scaling operations.


## Services Using Olares AI

| Service | Host | Feature | Config |
| --- | --- | --- | --- |
| AnythingLLM | Atlantis | RAG document assistant | `LLM_PROVIDER=generic-openai`, `GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| Perplexica | homelab-vm | AI-powered search engine | `OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com`, model set via UI |
| Reactive Resume v5 | Calypso | AI resume writing assistance | `OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| OpenCode (homelab-vm) | homelab-vm | Coding agent | `~/.config/opencode/opencode.json` → Olares Ollama, model=`qwen3-coder:latest` |
| OpenCode (moon) | moon | Coding agent | `/home/moon/.config/opencode/opencode.json` → Olares Ollama, model=`qwen3-coder:latest` (was: vLLM qwen3-30b, migrated 2026-03-20) |
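Any service that speaks the OpenAI API can be pointed at the `/v1` endpoint the same way. A minimal Python sketch (the helper names are made up for illustration; no auth header is assumed since access goes through the Olares proxy):

```python
import json
import urllib.request

BASE_URL = "https://a5be22681.vishinator.olares.com/v1"

def chat_request(prompt, model="qwen3-coder:latest", base_url=BASE_URL):
    """Build an OpenAI-style chat completion request against the Olares proxy."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def ask(prompt):
    # One-shot completion; generation on the 30B model can take a while.
    with urllib.request.urlopen(chat_request(prompt), timeout=300) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

This is the same request shape every service in the table above ultimately sends, which is why migrations between OpenAI-compatible backends usually reduce to changing a base URL.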

### Perplexica config persistence

Perplexica stores its provider config in a Docker volume at `/home/perplexica/data/config.json`. The `OLLAMA_BASE_URL` env var only sets the default; the UI/DB config takes precedence. The current config uses the olares-ollama provider with `qwen3-coder:latest`.

To reset the config if it gets corrupted:

```shell
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```

## Services Using Other AI Backends

| Service | Host | Backend | Notes |
| --- | --- | --- | --- |
| OpenHands | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | `LLM_MODEL=anthropic/claude-sonnet-4-20250514` — kept on Claude as it is significantly better for agentic coding than local models |
| Paperless-AI | Calypso | LM Studio on Shinku (100.98.93.15:1234) via Tailscale | Auto-tags/classifies Paperless documents. Model: llama-3.2-3b-instruct. Could be switched to Olares for better quality. |
| Hoarder | homelab-vm | OpenAI cloud API (sk-proj-...) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| Home Assistant Voice | Concord NUC | Local Whisper tiny-int8 + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| Ollama + Open WebUI | Atlantis | ROCm GPU (phi3:mini, gemma:2b) | Separate Ollama instance for Atlantis-local use |
| LlamaGPT | Atlantis | llama.cpp (Nous-Hermes-Llama-2-7B) | Legacy — likely unused |
| Reactive Resume (bundled) | Calypso | Bundled Ollama Resume-OLLAMA-V5 (llama3.2:3b) | Still running, but the app now points at Olares |
| Ollama + vLLM | Seattle VPS | CPU-only (llama3.2:3b, Qwen2.5-1.5B) | CPU inference, previously used by Perplexica |
| OpenHands (MSI laptop) | Edge device | LM Studio (devstral-small-2507) | Ad-hoc run config, not a managed stack |

## Candidates to Migrate to Olares

| Service | Effort | Benefit |
| --- | --- | --- |
| Paperless-AI | Low — change `CUSTOM_BASE_URL` in compose | Better model (30B vs 3B) for document classification |
| Hoarder | Low — add `OPENAI_BASE_URL` env var | Eliminates cloud API cost |
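Both migrations should reduce to an env var change in the respective compose file. A hypothetical sketch of what that might look like (variable names taken from the table above; confirm the exact names against each app's docs before changing anything):

```yaml
# Sketch only — not a tested configuration.
services:
  paperless-ai:
    environment:
      # Point at the Olares OpenAI-compatible endpoint instead of LM Studio on Shinku.
      CUSTOM_BASE_URL: https://a5be22681.vishinator.olares.com/v1
  hoarder:
    environment:
      # Redirect Hoarder's OpenAI client to Olares instead of the cloud API.
      OPENAI_BASE_URL: https://a5be22681.vishinator.olares.com/v1
```

Hoarder would likely also need its model-name settings updated to `qwen3-coder:latest`, since the cloud default models won't exist on the local backend.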

## Olares Endpoint Reference

| Protocol | URL | Use for |
| --- | --- | --- |
| OpenAI-compat (Ollama) | https://a5be22681.vishinator.olares.com/v1 | Services expecting the OpenAI API format — primary endpoint |
| Native Ollama | https://a5be22681.vishinator.olares.com | Services with native Ollama support |
| Models list | https://a5be22681.vishinator.olares.com/api/tags | Check available models |
| Active models | https://a5be22681.vishinator.olares.com/api/ps | Check VRAM usage |
| vLLM (legacy) | https://04521407.vishinator.olares.com/v1 | vLLM inference — available but not currently used |

Note: Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check `/api/ps` — another model may be occupying VRAM.
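That VRAM check can be scripted. A small sketch that parses the `/api/ps` response (field names per the Ollama API, where `size_vram` is reported in bytes; the helper names are made up):

```python
import json
import urllib.request

BASE = "https://a5be22681.vishinator.olares.com"

def parse_ps(data):
    """Extract (model name, VRAM in GB) pairs from an Ollama /api/ps response."""
    return [(m["name"], m.get("size_vram", 0) / 1e9) for m in data.get("models", [])]

def loaded_models(base=BASE):
    """Fetch /api/ps and report what is currently occupying VRAM."""
    with urllib.request.urlopen(f"{base}/api/ps", timeout=10) as resp:
        return parse_ps(json.load(resp))

# Usage: for name, gb in loaded_models(): print(f"{name}: {gb:.1f} GB VRAM")
```

If anything other than `qwen3-coder:latest` shows up here, that is the likely cause of slow or failing inference.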

## OpenCode per-host config

OpenCode config lives at `~/.config/opencode/opencode.json` on each machine. All instances use Olares Ollama:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "olares": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Olares Ollama (Qwen3-Coder)",
      "options": {
        "baseURL": "https://a5be22681.vishinator.olares.com/v1"
      },
      "models": {
        "qwen3-coder:latest": {
          "name": "Qwen3 Coder 30.5B Q4_K_M",
          "limit": { "context": 40000, "output": 8192 }
        }
      }
    }
  },
  "model": "olares/qwen3-coder:latest"
}
```

Config locations:

- homelab-vm: `/home/homelab/.config/opencode/opencode.json`
- moon: `/home/moon/.config/opencode/opencode.json` (migrated from vLLM 2026-03-20)
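After editing either file, a quick sanity check avoids a silently broken agent. A sketch (the helper function is hypothetical; assumes `python3` is present on the host):

```shell
check_opencode_config() {
  # Verify a host's OpenCode config points at the expected Olares model.
  # Pass the config path as $1.
  python3 - "$1" <<'EOF'
import json, sys
cfg = json.load(open(sys.argv[1]))
assert cfg["model"] == "olares/qwen3-coder:latest", cfg["model"]
print("ok")
EOF
}
```

Usage: `check_opencode_config ~/.config/opencode/opencode.json` — any JSON syntax error or wrong model name fails loudly instead of surfacing later as a confusing OpenCode error.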