# AI Integrations

Last updated: 2026-03-16
Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is Olares (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.
## Primary AI Backend — Olares
| Property | Value |
|---|---|
| Host | Olares (192.168.0.145) |
| GPU | RTX 5090 Max-Q (24GB VRAM) |
| Active model | qwen3-coder:latest (30.5B MoE, Q4_K_M) |
| Ollama endpoint | https://a5be22681.vishinator.olares.com |
| OpenAI-compat endpoint | https://a5be22681.vishinator.olares.com/v1 |
| Native Ollama API | https://a5be22681.vishinator.olares.com/api/... |
Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.
### Check active models

```shell
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps    # currently loaded in VRAM
```
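These checks can also be scripted. A minimal sketch, assuming the standard Ollama `/api/ps` response shape (`{"models": [{"name": ..., "size_vram": ...}]}`) — the payload below is an illustrative sample, not live output from the endpoint:

```python
import json

def summarize_loaded(ps_json: str) -> list[tuple[str, float]]:
    """Return (model name, VRAM in GiB) for each model in an Ollama /api/ps response."""
    data = json.loads(ps_json)
    return [(m["name"], m.get("size_vram", 0) / 2**30) for m in data.get("models", [])]

# Illustrative payload shaped like Ollama's /api/ps response (not live output)
sample = '{"models": [{"name": "qwen3-coder:latest", "size_vram": 20401094656}]}'
for name, gib in summarize_loaded(sample):
    print(f"{name}: {gib:.1f} GiB VRAM")
```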
### Switch models

See docs/services/individual/olares.md for scaling operations.
## Services Using Olares AI
| Service | Host | Feature | Config |
|---|---|---|---|
| AnythingLLM | Atlantis | RAG document assistant | LLM_PROVIDER=generic-openai, GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1, model=qwen3-coder:latest |
| Perplexica | homelab-vm | AI-powered search engine | OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com, model set via UI |
| Reactive Resume v5 | Calypso | AI resume writing assistance | OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1, model=qwen3-coder:latest |
| OpenCode | homelab-vm + moon | Coding agent (you are here) | ~/.config/opencode/opencode.json → Olares provider |
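All of the services above talk to the same OpenAI-compatible `/v1` endpoint. A minimal sketch of how such a request is shaped, using only the stdlib; `build_chat_request` is a hypothetical helper, and the request is built but deliberately not sent so the sketch stays offline:

```python
import json
import urllib.request

BASE = "https://a5be22681.vishinator.olares.com"

def build_chat_request(prompt: str, model: str = "qwen3-coder:latest") -> urllib.request.Request:
    """Build an OpenAI-compatible chat-completions request for the Olares /v1 endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{BASE}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Say hello")
print(req.full_url)  # https://a5be22681.vishinator.olares.com/v1/chat/completions
# To actually send it: urllib.request.urlopen(req) -- omitted here on purpose
```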
### Perplexica config persistence
Perplexica stores its provider config in a Docker volume at /home/perplexica/data/config.json. The OLLAMA_BASE_URL env var sets the default, but the UI/DB config takes precedence. The current config uses the olares-ollama provider with qwen3-coder:latest.
To reset if the config gets corrupted:

```shell
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```
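Hand-editing config.json makes it easy to leave behind invalid JSON. A small sanity-check sketch that could be run against a copy of the file before restarting; the check is deliberately loose because the exact key names in Perplexica's schema are not documented here:

```python
import json
import sys

def check_perplexica_config(path: str) -> None:
    """Fail loudly if the config file is invalid JSON or no longer references Olares."""
    with open(path) as f:
        cfg = json.load(f)  # raises ValueError on corrupt JSON
    # Substring check is an assumption -- adjust once the real schema is known
    text = json.dumps(cfg)
    if "olares" not in text and "qwen3-coder" not in text:
        sys.exit("config.json no longer references the Olares provider")
    print("config.json parses and references Olares")
```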
## Services Using Other AI Backends
| Service | Host | Backend | Notes |
|---|---|---|---|
| OpenHands | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | LLM_MODEL=anthropic/claude-sonnet-4-20250514 — kept on Claude as it's significantly better for agentic coding than local models |
| Paperless-AI | Calypso | LM Studio on Shinku (100.98.93.15:1234) via Tailscale | Auto-tags/classifies Paperless documents. Model: llama-3.2-3b-instruct. Could be switched to Olares for better quality. |
| Hoarder | homelab-vm | OpenAI cloud API (sk-proj-...) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| Home Assistant Voice | Concord NUC | Local Whisper tiny-int8 + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| Ollama + Open WebUI | Atlantis | ROCm GPU (phi3:mini, gemma:2b) | Separate Ollama instance for Atlantis-local use |
| LlamaGPT | Atlantis | llama.cpp (Nous-Hermes-Llama-2-7B) | Legacy — likely unused |
| Reactive Resume (bundled) | Calypso | Bundled Ollama Resume-OLLAMA-V5 (llama3.2:3b) | Still running, but the app is now pointed at Olares |
| Ollama + vLLM | Seattle VPS | CPU-only (llama3.2:3b, Qwen2.5-1.5B) | CPU inference; previously used by Perplexica |
| OpenHands (MSI laptop) | Edge device | LM Studio (devstral-small-2507) | Ad-hoc run config, not a managed stack |
## Candidates to Migrate to Olares
| Service | Effort | Benefit |
|---|---|---|
| Paperless-AI | Low — change CUSTOM_BASE_URL in compose | Better model (30B vs 3B) for document classification |
| Hoarder | Low — add OPENAI_BASE_URL env var | Eliminates cloud API cost |
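Both migrations amount to repointing an env var at the Olares endpoint. A sketch of the env entries, using the variable names from the table above; whether Paperless-AI's custom endpoint wants the /v1 suffix is an assumption to verify against its docs:

```shell
# Hypothetical compose env entries (verify the /v1 suffix for Paperless-AI)
CUSTOM_BASE_URL=https://a5be22681.vishinator.olares.com/v1    # Paperless-AI
OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1    # Hoarder
```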
## Olares Endpoint Reference
All services should use the same base URL: https://a5be22681.vishinator.olares.com
| Protocol | URL | Use for |
|---|---|---|
| OpenAI-compat | https://a5be22681.vishinator.olares.com/v1 | Services expecting OpenAI API format |
| Native Ollama | https://a5be22681.vishinator.olares.com | Services with native Ollama support |
| Models list | https://a5be22681.vishinator.olares.com/api/tags | Check available models |
| Active models | https://a5be22681.vishinator.olares.com/api/ps | Check VRAM usage |
Note: Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check /api/ps — another model may be occupying VRAM.
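That headroom check can be automated. A minimal sketch that sums the size_vram fields from an /api/ps response against the 24 GiB limit; the response shape is assumed from Ollama's API, and the sample payload is illustrative:

```python
import json

VRAM_LIMIT_GIB = 24  # RTX 5090 Max-Q, per the table above

def vram_headroom(ps_json: str) -> float:
    """GiB of VRAM left after the models reported by /api/ps (shape assumed)."""
    data = json.loads(ps_json)
    used = sum(m.get("size_vram", 0) for m in data.get("models", [])) / 2**30
    return VRAM_LIMIT_GIB - used

# Illustrative payload, not live output
sample = '{"models": [{"name": "qwen3-coder:latest", "size_vram": 20401094656}]}'
print(f"headroom: {vram_headroom(sample):.1f} GiB")
```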