
# AI Integrations

Last updated: 2026-03-16

Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is Olares (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.


## Primary AI Backend — Olares

| Property | Value |
| --- | --- |
| Host | Olares (192.168.0.145) |
| GPU | RTX 5090 Max-Q (24GB VRAM) |
| Active model | `qwen3-coder:latest` (30.5B MoE, Q4_K_M) |
| Ollama endpoint | https://a5be22681.vishinator.olares.com |
| OpenAI-compat endpoint | https://a5be22681.vishinator.olares.com/v1 |
| Native Ollama API | https://a5be22681.vishinator.olares.com/api/... |

Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.

### Check active models

```bash
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps   # currently loaded in VRAM
```
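The same check can be scripted when a one-off curl isn't enough. A minimal stdlib-only sketch, assuming the `/api/tags` response follows the standard Ollama shape (a `models` array with `name` fields); the base URL is the Olares proxy from above:

```python
import json
import urllib.request

OLLAMA_BASE = "https://a5be22681.vishinator.olares.com"  # Olares reverse proxy

def model_names(tags_response: dict) -> list[str]:
    """Extract model names from an Ollama /api/tags response."""
    return [m["name"] for m in tags_response.get("models", [])]

def list_models(base_url: str = OLLAMA_BASE) -> list[str]:
    """Fetch and return the names of all models known to the endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    print("\n".join(list_models()))
```

Useful as a health probe: an empty list (or a connection error) means the proxy or Ollama itself is down.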

### Switch models

See `docs/services/individual/olares.md` for scaling operations.


## Services Using Olares AI

| Service | Host | Feature | Config |
| --- | --- | --- | --- |
| AnythingLLM | Atlantis | RAG document assistant | `LLM_PROVIDER=generic-openai`, `GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| Perplexica | homelab-vm | AI-powered search engine | `OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com`, model set via UI |
| Reactive Resume v5 | Calypso | AI resume writing assistance | `OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| OpenCode | homelab-vm + moon | Coding agent (you are here) | `~/.config/opencode/opencode.json` → Olares provider |
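All of the OpenAI-compat services above ultimately send the same request shape to `/v1/chat/completions`. A hedged stdlib sketch of that call (model and base URL are the values documented here; the response parsing assumes the standard OpenAI `choices[0].message.content` layout, which Ollama's compat layer follows):

```python
import json
import urllib.request

OPENAI_COMPAT_BASE = "https://a5be22681.vishinator.olares.com/v1"

def chat_payload(prompt: str, model: str = "qwen3-coder:latest") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, base_url: str = OPENAI_COMPAT_BASE) -> str:
    """POST a prompt to the OpenAI-compatible endpoint, return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Handy for verifying end-to-end inference works before debugging an individual service's config.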

### Perplexica config persistence

Perplexica stores its provider config in a Docker volume at `/home/perplexica/data/config.json`. The `OLLAMA_BASE_URL` env var sets the default, but the UI/DB config takes precedence. The current config is set to the `olares-ollama` provider with `qwen3-coder:latest`.

To reset if the config gets corrupted:

```bash
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```
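A restart with a malformed `config.json` just reproduces the broken state, so it's worth validating the file parses before restarting. A small sketch (the container name and path are from above; the JSON check itself is generic):

```python
import json
import subprocess

def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

def check_perplexica_config() -> bool:
    """Read config.json out of the running container and validate it."""
    out = subprocess.run(
        ["docker", "exec", "perplexica",
         "cat", "/home/perplexica/data/config.json"],
        capture_output=True, text=True, check=True,
    )
    return is_valid_json(out.stdout)
```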

## Services Using Other AI Backends

| Service | Host | Backend | Notes |
| --- | --- | --- | --- |
| OpenHands | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | `LLM_MODEL=anthropic/claude-sonnet-4-20250514` — kept on Claude as it's significantly better for agentic coding than local models |
| Paperless-AI | Calypso | LM Studio on Shinku (100.98.93.15:1234) via Tailscale | Auto-tags/classifies Paperless documents. Model: `llama-3.2-3b-instruct`. Could be switched to Olares for better quality. |
| Hoarder | homelab-vm | OpenAI cloud API (`sk-proj-...`) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| Home Assistant Voice | Concord NUC | Local Whisper tiny-int8 + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| Ollama + Open WebUI | Atlantis | ROCm GPU (`phi3:mini`, `gemma:2b`) | Separate Ollama instance for Atlantis-local use |
| LlamaGPT | Atlantis | llama.cpp (Nous-Hermes-Llama-2-7B) | Legacy — likely unused |
| Reactive Resume (bundled) | Calypso | Bundled Ollama `Resume-OLLAMA-V5` (`llama3.2:3b`) | Still running, but the app is now pointed at Olares |
| Ollama + vLLM | Seattle VPS | CPU-only (`llama3.2:3b`, `Qwen2.5-1.5B`) | CPU inference, used previously by Perplexica |
| OpenHands (MSI laptop) | Edge device | LM Studio (`devstral-small-2507`) | Ad-hoc run config, not a managed stack |

## Candidates to Migrate to Olares

| Service | Effort | Benefit |
| --- | --- | --- |
| Paperless-AI | Low — change `CUSTOM_BASE_URL` in compose | Better model (30B vs 3B) for document classification |
| Hoarder | Low — add `OPENAI_BASE_URL` env var | Eliminates cloud API cost |

## Olares Endpoint Reference

All services should use the same base URL: `https://a5be22681.vishinator.olares.com`

| Protocol | URL | Use for |
| --- | --- | --- |
| OpenAI-compat | https://a5be22681.vishinator.olares.com/v1 | Services expecting OpenAI API format |
| Native Ollama | https://a5be22681.vishinator.olares.com | Services with native Ollama support |
| Models list | https://a5be22681.vishinator.olares.com/api/tags | Check available models |
| Active models | https://a5be22681.vishinator.olares.com/api/ps | Check VRAM usage |

Note: Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check `/api/ps` — another model may be occupying VRAM.
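That check can be automated. A stdlib sketch, assuming the `/api/ps` response follows the standard Ollama shape (a `models` array whose entries carry a `size_vram` byte count); the 24GB limit is the figure documented above:

```python
import json
import urllib.request

OLLAMA_BASE = "https://a5be22681.vishinator.olares.com"
VRAM_LIMIT_GB = 24  # RTX 5090 Max-Q, per the table above

def vram_in_use_gb(ps_response: dict) -> float:
    """Sum size_vram (bytes) across loaded models and convert to GiB."""
    total = sum(m.get("size_vram", 0) for m in ps_response.get("models", []))
    return total / 2**30

def check_vram(base_url: str = OLLAMA_BASE) -> None:
    """Print per-model and total VRAM usage reported by /api/ps."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        ps = json.load(resp)
    for m in ps.get("models", []):
        print(f"{m['name']}: {m.get('size_vram', 0) / 2**30:.1f} GiB")
    print(f"total: {vram_in_use_gb(ps):.1f} / {VRAM_LIMIT_GB} GiB")
```

If the total is near the limit while your model is missing from the listing, another model is occupying VRAM and needs to be unloaded first.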