
# AI Integrations

Last updated: 2026-03-16

Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is Olares (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.


## Primary AI Backend — Olares

| Property | Value |
| --- | --- |
| Host | Olares (192.168.0.145) |
| GPU | RTX 5090 Max-Q (24GB VRAM) |
| Active model | `qwen3-coder:latest` (30.5B MoE, Q4_K_M) |
| Ollama endpoint | https://a5be22681.vishinator.olares.com |
| OpenAI-compat endpoint | https://a5be22681.vishinator.olares.com/v1 |
| Native Ollama API | https://a5be22681.vishinator.olares.com/api/... |

Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.

### Check active models

```bash
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps   # currently loaded in VRAM
```
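The same check can be scripted when a one-off curl isn't enough. A minimal stdlib-only sketch, assuming the `/api/tags` response follows the standard Ollama shape (a `models` array with `name` fields); the base URL is the Olares proxy from above:

```python
import json
import urllib.request

OLLAMA_BASE = "https://a5be22681.vishinator.olares.com"  # Olares reverse proxy

def model_names(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = OLLAMA_BASE) -> list[str]:
    """Fetch and return the names of all models known to the endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    print("\n".join(list_models()))
```

Useful as a health probe: an empty list (or a connection error) means the proxy or Ollama itself is down.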

### Switch models

See `docs/services/individual/olares.md` for scaling operations.


## Services Using Olares AI

| Service | Host | Feature | Config |
| --- | --- | --- | --- |
| AnythingLLM | Atlantis | RAG document assistant | `LLM_PROVIDER=generic-openai`, `GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| Perplexica | homelab-vm | AI-powered search engine | `OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com`, model set via UI |
| Reactive Resume v5 | Calypso | AI resume writing assistance | `OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| OpenCode | homelab-vm + moon | Coding agent (you are here) | `~/.config/opencode/opencode.json` → Olares provider |
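All of the OpenAI-compat services above ultimately send the same request shape to `/v1/chat/completions`. A hedged stdlib sketch of that call (model and base URL are the values documented here; the response parsing assumes the standard OpenAI `choices[0].message.content` layout, which Ollama's compat layer follows):

```python
import json
import urllib.request

OPENAI_COMPAT_BASE = "https://a5be22681.vishinator.olares.com/v1"

def chat_payload(prompt: str, model: str = "qwen3-coder:latest") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = OPENAI_COMPAT_BASE) -> str:
    """POST a prompt to the OpenAI-compatible endpoint, return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Handy for verifying end-to-end inference works before debugging an individual service's config.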

### Perplexica config persistence

Perplexica stores its provider config in a Docker volume at `/home/perplexica/data/config.json`. The `OLLAMA_BASE_URL` env var sets the default, but the UI/DB config takes precedence. The current config is set to the `olares-ollama` provider with `qwen3-coder:latest`.

To reset if the config gets corrupted:

```bash
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```
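A restart with a malformed `config.json` just reproduces the broken state, so it's worth validating the file parses before restarting. A small sketch (the container name and path are from above; the JSON check itself is generic):

```python
import json
import subprocess

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def check_perplexica_config() -> bool:
    """Read config.json out of the running container and validate it."""
    out = subprocess.run(
        ["docker", "exec", "perplexica",
         "cat", "/home/perplexica/data/config.json"],
        capture_output=True, text=True, check=True,
    )
    return is_valid_json(out.stdout)
```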

## Services Using Other AI Backends

| Service | Host | Backend | Notes |
| --- | --- | --- | --- |
| OpenHands | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | `LLM_MODEL=anthropic/claude-sonnet-4-20250514` — kept on Claude as it's significantly better for agentic coding than local models |
| Paperless-AI | Calypso | LM Studio on Shinku (100.98.93.15:1234) via Tailscale | Auto-tags/classifies Paperless documents. Model: `llama-3.2-3b-instruct`. Could be switched to Olares for better quality. |
| Hoarder | homelab-vm | OpenAI cloud API (`sk-proj-...`) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| Home Assistant Voice | Concord NUC | Local Whisper tiny-int8 + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| Ollama + Open WebUI | Atlantis | ROCm GPU (`phi3:mini`, `gemma:2b`) | Separate Ollama instance for Atlantis-local use |
| LlamaGPT | Atlantis | llama.cpp (Nous-Hermes-Llama-2-7B) | Legacy — likely unused |
| Reactive Resume (bundled) | Calypso | Bundled Ollama `Resume-OLLAMA-V5` (`llama3.2:3b`) | Still running, but the app is now pointed at Olares |
| Ollama + vLLM | Seattle VPS | CPU-only (`llama3.2:3b`, `Qwen2.5-1.5B`) | CPU inference, used previously by Perplexica |
| OpenHands (MSI laptop) | Edge device | LM Studio (`devstral-small-2507`) | Ad-hoc run config, not a managed stack |

## Candidates to Migrate to Olares

| Service | Effort | Benefit |
| --- | --- | --- |
| Paperless-AI | Low — change `CUSTOM_BASE_URL` in compose | Better model (30B vs 3B) for document classification |
| Hoarder | Low — add `OPENAI_BASE_URL` env var | Eliminates cloud API cost |

## Olares Endpoint Reference

All services should use the same base URL: `https://a5be22681.vishinator.olares.com`

| Protocol | URL | Use for |
| --- | --- | --- |
| OpenAI-compat | https://a5be22681.vishinator.olares.com/v1 | Services expecting OpenAI API format |
| Native Ollama | https://a5be22681.vishinator.olares.com | Services with native Ollama support |
| Models list | https://a5be22681.vishinator.olares.com/api/tags | Check available models |
| Active models | https://a5be22681.vishinator.olares.com/api/ps | Check VRAM usage |

Note: Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check `/api/ps` — another model may be occupying VRAM.
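That check can be automated. A stdlib sketch, assuming the `/api/ps` response follows the standard Ollama shape (a `models` array whose entries carry a `size_vram` byte count); the 24GB limit is the figure documented above:

```python
import json
import urllib.request

OLLAMA_BASE = "https://a5be22681.vishinator.olares.com"
VRAM_LIMIT_GB = 24  # RTX 5090 Max-Q, per the table above

def vram_in_use_gb(ps_response: dict) -> float:
    """Sum size_vram (bytes) across loaded models and convert to GiB."""
    total = sum(m.get("size_vram", 0) for m in ps_response.get("models", []))
    return total / 2**30

def check_vram(base_url: str = OLLAMA_BASE) -> None:
    """Print per-model and total VRAM usage reported by /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        ps = json.load(resp)
    for m in ps.get("models", []):
        print(f"{m['name']}: {m.get('size_vram', 0) / 2**30:.1f} GiB")
    print(f"total: {vram_in_use_gb(ps):.1f} / {VRAM_LIMIT_GB} GiB")
```

If the total is near the limit while your model is missing from the listing, another model is occupying VRAM and needs to be unloaded first.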