AI Integrations
Last updated: 2026-03-20
Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is Olares (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.
Primary AI Backend — Olares
| Property | Value |
|---|---|
| Host | Olares (192.168.0.145) |
| GPU | RTX 5090 Max-Q (24GB VRAM) |
| Active model | qwen3-coder:latest (30.5B MoE, Q4_K_M) |
| Ollama endpoint | https://a5be22681.vishinator.olares.com |
| OpenAI-compat endpoint | https://a5be22681.vishinator.olares.com/v1 |
| Native Ollama API | https://a5be22681.vishinator.olares.com/api/... |
Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.
Check active models
```
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps   # currently loaded in VRAM
```
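When scripting health checks, the `/api/tags` response can be parsed directly. A minimal sketch — the sample payload below is hypothetical; a real response from the Olares endpoint lists the actual models:

```python
import json

# Hypothetical sample of what `GET /api/tags` returns; the real response
# comes from the Olares endpoint above.
sample = json.dumps({
    "models": [
        {"name": "qwen3-coder:latest", "size": 18600000000},
    ]
})

def model_names(tags_json: str) -> list[str]:
    """Extract model names from an Ollama /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

print(model_names(sample))  # ['qwen3-coder:latest']
```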
Switch models
See docs/services/individual/olares.md for scaling operations.
Services Using Olares AI
| Service | Host | Feature | Config |
|---|---|---|---|
| AnythingLLM | Atlantis | RAG document assistant | LLM_PROVIDER=generic-openai, GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1, model=qwen3-coder:latest |
| Perplexica | homelab-vm | AI-powered search engine | OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com, model set via UI |
| Reactive Resume v5 | Calypso | AI resume writing assistance | OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1, model=qwen3-coder:latest |
| OpenCode (homelab-vm) | homelab-vm | Coding agent | ~/.config/opencode/opencode.json → Olares Ollama, model=qwen3-coder:latest |
| OpenCode (moon) | moon | Coding agent | /home/moon/.config/opencode/opencode.json → Olares Ollama, model=qwen3-coder:latest (was: vLLM qwen3-30b — migrated 2026-03-20) |
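All of the services above that use the OpenAI-compat path send the same request shape. A sketch of what that request looks like — the URL and model name come from the table, the prompt is just an example, and the request is deliberately not sent because the endpoint is only reachable from inside the homelab:

```python
import json
import urllib.request

BASE = "https://a5be22681.vishinator.olares.com/v1"

# Standard OpenAI chat-completions payload pointed at the Olares model.
payload = {
    "model": "qwen3-coder:latest",
    "messages": [{"role": "user", "content": "Summarize this document."}],
}

req = urllib.request.Request(
    f"{BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(req) would send it; omitted here since the proxy
# is private to the homelab network.
print(req.full_url)
```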
Perplexica config persistence
Perplexica stores its provider config in a Docker volume at `/home/perplexica/data/config.json`. The `OLLAMA_BASE_URL` env var sets the default, but the UI/DB config takes precedence. The current config uses the olares-ollama provider with qwen3-coder:latest.
To reset if the config gets corrupted:
```
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```
Services Using Other AI Backends
| Service | Host | Backend | Notes |
|---|---|---|---|
| OpenHands | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | LLM_MODEL=anthropic/claude-sonnet-4-20250514 — kept on Claude as it's significantly better for agentic coding than local models |
| Paperless-AI | Calypso | LM Studio on Shinku (100.98.93.15:1234) via Tailscale | Auto-tags/classifies Paperless documents. Model: llama-3.2-3b-instruct. Could be switched to Olares for better quality. |
| Hoarder | homelab-vm | OpenAI cloud API (sk-proj-...) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| Home Assistant Voice | Concord NUC | Local Whisper tiny-int8 + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| Ollama + Open WebUI | Atlantis | ROCm GPU (phi3:mini, gemma:2b) | Separate Ollama instance for Atlantis-local use |
| LlamaGPT | Atlantis | llama.cpp (Nous-Hermes-Llama-2-7B) | Legacy — likely unused |
| Reactive Resume (bundled) | Calypso | Bundled Ollama Resume-OLLAMA-V5 (llama3.2:3b) | Still running but app is now pointed at Olares |
| Ollama + vLLM | Seattle VPS | CPU-only (llama3.2:3b, Qwen2.5-1.5B) | CPU inference, used previously by Perplexica |
| OpenHands (MSI laptop) | Edge device | LM Studio (devstral-small-2507) | Ad-hoc run config, not a managed stack |
Candidates to Migrate to Olares
| Service | Effort | Benefit |
|---|---|---|
| Paperless-AI | Low — change CUSTOM_BASE_URL in compose | Better model (30B vs 3B) for document classification |
| Hoarder | Low — add OPENAI_BASE_URL env var | Eliminates cloud API cost |
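For the Paperless-AI migration, the compose change would look roughly like the fragment below. This is a sketch: only `CUSTOM_BASE_URL` is named above — the other variable names are assumptions to verify against the Paperless-AI documentation before applying.

```yaml
# Hedged sketch for the Paperless-AI service on Calypso.
services:
  paperless-ai:
    environment:
      # Confirmed variable (see table above):
      - CUSTOM_BASE_URL=https://a5be22681.vishinator.olares.com/v1
      # Assumed variable name — check the Paperless-AI docs:
      - CUSTOM_MODEL=qwen3-coder:latest
```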
Olares Endpoint Reference
| Protocol | URL | Use for |
|---|---|---|
| OpenAI-compat (Ollama) | https://a5be22681.vishinator.olares.com/v1 | Services expecting OpenAI API format — primary endpoint |
| Native Ollama | https://a5be22681.vishinator.olares.com | Services with native Ollama support |
| Models list | https://a5be22681.vishinator.olares.com/api/tags | Check available models |
| Active models | https://a5be22681.vishinator.olares.com/api/ps | Check VRAM usage |
| vLLM (legacy) | https://04521407.vishinator.olares.com/v1 | vLLM inference — available but not currently used |
Note: Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check `/api/ps` — another model may be occupying VRAM.
OpenCode per-host config
OpenCode config lives at ~/.config/opencode/opencode.json on each machine. All instances use Olares Ollama:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "olares": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Olares Ollama (Qwen3-Coder)",
      "options": {
        "baseURL": "https://a5be22681.vishinator.olares.com/v1"
      },
      "models": {
        "qwen3-coder:latest": {
          "name": "Qwen3 Coder 30.5B Q4_K_M",
          "limit": { "context": 40000, "output": 8192 }
        }
      }
    }
  },
  "model": "olares/qwen3-coder:latest"
}
```
Config locations:
- homelab-vm: /home/homelab/.config/opencode/opencode.json
- moon: /home/moon/.config/opencode/opencode.json (migrated from vLLM 2026-03-20)
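A quick sanity check that a host's opencode.json still points at Olares can be scripted. A minimal sketch — `uses_olares` is a hypothetical helper, and the inline example mirrors the config shown above rather than reading a real file:

```python
import json

def uses_olares(config_text: str) -> bool:
    """Return True if an opencode.json targets the Olares /v1 endpoint
    and the Qwen3-Coder model, per the reference config above."""
    cfg = json.loads(config_text)
    base = cfg["provider"]["olares"]["options"]["baseURL"]
    return (
        base == "https://a5be22681.vishinator.olares.com/v1"
        and cfg["model"] == "olares/qwen3-coder:latest"
    )

# Inline example standing in for the file contents on a given host.
example = json.dumps({
    "provider": {"olares": {"options": {
        "baseURL": "https://a5be22681.vishinator.olares.com/v1"}}},
    "model": "olares/qwen3-coder:latest",
})
print(uses_olares(example))  # True
```

On a real host you would pass `open(path).read()` for the path listed above.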