Sanitized mirror from private repository - 2026-04-05 11:10:41 UTC
docs/admin/ai-integrations.md
# AI Integrations

**Last updated:** 2026-03-20

Overview of all AI/LLM integrations across the homelab. The primary GPU inference backend is **Olares** (RTX 5090 Max-Q, 24GB VRAM) running Qwen3-Coder via Ollama.

---

## Primary AI Backend — Olares

| Property | Value |
|----------|-------|
| **Host** | Olares (`192.168.0.145`) |
| **GPU** | RTX 5090 Max-Q (24GB VRAM) |
| **Active model** | `qwen3-coder:latest` (30.5B MoE, Q4_K_M) |
| **Ollama endpoint** | `https://a5be22681.vishinator.olares.com` |
| **OpenAI-compat endpoint** | `https://a5be22681.vishinator.olares.com/v1` |
| **Native Ollama API** | `https://a5be22681.vishinator.olares.com/api/...` |

> Port 11434 is not directly exposed — all access goes through the Olares reverse proxy at the above URL.

### Check active models

```bash
curl -s https://a5be22681.vishinator.olares.com/api/tags | python3 -m json.tool
curl -s https://a5be22681.vishinator.olares.com/api/ps   # currently loaded in VRAM
```
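
The `/api/tags` listing can be summarized without extra tooling. A minimal sketch, assuming the standard Ollama response shape (`models[].name`, `models[].size` in bytes); the payload below is an illustrative sample, not live output:

```python
import json

# Illustrative sample of an Ollama /api/tags response (not live output).
sample = '''
{"models": [
  {"name": "qwen3-coder:latest", "size": 18600000000},
  {"name": "nomic-embed-text:latest", "size": 274000000}
]}
'''

def summarize_tags(payload: str) -> list[str]:
    """Return a 'name  size GB' line for each installed model."""
    data = json.loads(payload)
    return [
        f"{m['name']}  {m['size'] / 1e9:.1f} GB"
        for m in data.get("models", [])
    ]

for line in summarize_tags(sample):
    print(line)
```

Pipe the real `curl` output into a script like this to get a one-line-per-model view.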

### Switch models

See `docs/services/individual/olares.md` for scaling operations.

---

## Services Using Olares AI

| Service | Host | Feature | Config |
|---------|------|---------|--------|
| **AnythingLLM** | Atlantis | RAG document assistant | `LLM_PROVIDER=generic-openai`, `GENERIC_OPEN_AI_BASE_PATH=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| **Perplexica** | homelab-vm | AI-powered search engine | `OLLAMA_BASE_URL=https://a5be22681.vishinator.olares.com`, model set via UI |
| **Reactive Resume v5** | Calypso | AI resume-writing assistance | `OPENAI_BASE_URL=https://a5be22681.vishinator.olares.com/v1`, model=`qwen3-coder:latest` |
| **OpenCode (homelab-vm)** | homelab-vm | Coding agent | `~/.config/opencode/opencode.json` → Olares Ollama, model=`qwen3-coder:latest` |
| **OpenCode (moon)** | moon | Coding agent | `/home/moon/.config/opencode/opencode.json` → Olares Ollama, model=`qwen3-coder:latest` (was: vLLM `qwen3-30b`; migrated 2026-03-20) |

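Every service in the table ultimately sends the same OpenAI-style request to the `/v1` endpoint. A minimal sketch of that request (built locally; nothing is sent over the network, and the URL and model name come from the tables above):

```python
import json

BASE_URL = "https://a5be22681.vishinator.olares.com/v1"  # Olares OpenAI-compat endpoint
MODEL = "qwen3-coder:latest"

def build_chat_request(prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for an OpenAI-compatible chat completion call."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{BASE_URL}/chat/completions", json.dumps(body)

url, body = build_chat_request("Summarize this document.")
print(url)
print(body)
```

Any HTTP client can then POST the body to the URL with `Content-Type: application/json`.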
### Perplexica config persistence

Perplexica stores its provider config in a Docker volume at `/home/perplexica/data/config.json`. The `OLLAMA_BASE_URL` env var sets the default, but the UI/DB config takes precedence. The current config uses the `olares-ollama` provider with `qwen3-coder:latest`.

To reset the config if it gets corrupted:

```bash
# Inspect the current config
docker exec perplexica cat /home/perplexica/data/config.json
# Edit and update as needed, then restart
docker restart perplexica
```
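
Corruption in practice usually means the file no longer parses as JSON. A quick local check, sketched under the assumption that you have pulled a copy out with `docker cp` (the path argument is whatever you saved it as):

```python
import json
import sys

def check_config(path: str) -> list[str]:
    """Parse a JSON config file and return its sorted top-level keys.

    Raises json.JSONDecodeError if the file is corrupted.
    """
    with open(path) as f:
        data = json.load(f)
    return sorted(data)

if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "config.json"
    try:
        print("top-level keys:", check_config(path))
    except (OSError, json.JSONDecodeError) as err:
        print(f"config unreadable: {err}")
```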

---

## Services Using Other AI Backends

| Service | Host | Backend | Notes |
|---------|------|---------|-------|
| **OpenHands** | homelab-vm | Anthropic Claude Sonnet 4 (cloud) | `LLM_MODEL=anthropic/claude-sonnet-4-20250514` — kept on Claude as it's significantly better for agentic coding than local models |
| **Paperless-AI** | Calypso | LM Studio on Shinku (`100.98.93.15:1234`) via Tailscale | Auto-tags/classifies Paperless documents. Model: `llama-3.2-3b-instruct`. Could be switched to Olares for better quality. |
| **Hoarder** | homelab-vm | OpenAI cloud API (`sk-proj-...`) | AI bookmark tagging/summarization. Could be switched to Olares to save cost. |
| **Home Assistant Voice** | Concord NUC | Local Whisper `tiny-int8` + Piper TTS | Voice command pipeline — fully local, no GPU needed |
| **Ollama + Open WebUI** | Atlantis | ROCm GPU (`phi3:mini`, `gemma:2b`) | Separate Ollama instance for Atlantis-local use |
| **LlamaGPT** | Atlantis | llama.cpp (`Nous-Hermes-Llama-2-7B`) | Legacy — likely unused |
| **Reactive Resume (bundled)** | Calypso | Bundled Ollama `Resume-OLLAMA-V5` (`llama3.2:3b`) | Still running, but the app is now pointed at Olares |
| **Ollama + vLLM** | Seattle VPS | CPU-only (`llama3.2:3b`, `Qwen2.5-1.5B`) | CPU inference, previously used by Perplexica |
| **OpenHands (MSI laptop)** | Edge device | LM Studio (`devstral-small-2507`) | Ad-hoc run config, not a managed stack |

---

## Candidates to Migrate to Olares

| Service | Effort | Benefit |
|---------|--------|---------|
| **Paperless-AI** | Low — change `CUSTOM_BASE_URL` in compose | Better model (30B vs 3B) for document classification |
| **Hoarder** | Low — add `OPENAI_BASE_URL` env var | Eliminates cloud API cost |

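For the Paperless-AI row, the change is a single environment variable. A sketch of the compose fragment, assuming only the `CUSTOM_BASE_URL` variable named in the table (the service name and the old value in the comment are assumptions to verify against the live compose file):

```yaml
# Sketch only: verify names against the actual compose file.
services:
  paperless-ai:
    environment:
      # was LM Studio on Shinku via Tailscale, e.g. http://100.98.93.15:1234/v1
      CUSTOM_BASE_URL: "https://a5be22681.vishinator.olares.com/v1"  # Olares OpenAI-compat
```

Restart the container after the change and confirm in the Paperless-AI logs that it reaches the new endpoint.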
---

## Olares Endpoint Reference
|
||||
|
||||
| Protocol | URL | Use for |
|
||||
|----------|-----|---------|
|
||||
| OpenAI-compat (Ollama) | `https://a5be22681.vishinator.olares.com/v1` | Services expecting OpenAI API format — **primary endpoint** |
|
||||
| Native Ollama | `https://a5be22681.vishinator.olares.com` | Services with native Ollama support |
|
||||
| Models list | `https://a5be22681.vishinator.olares.com/api/tags` | Check available models |
|
||||
| Active models | `https://a5be22681.vishinator.olares.com/api/ps` | Check VRAM usage |
|
||||
| vLLM (legacy) | `https://04521407.vishinator.olares.com/v1` | vLLM inference — available but not currently used |
|
||||
|
||||
> **Note:** Only one large model should be loaded at a time (24GB VRAM limit). If inference is slow or failing, check `api/ps` — another model may be occupying VRAM.
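
The 24GB rule can be checked mechanically from `/api/ps`. A sketch, assuming recent Ollama releases report a per-model `size_vram` field in bytes; the payload below is an illustrative sample, not live output:

```python
import json

VRAM_BUDGET_GB = 24  # RTX 5090 Max-Q limit noted above

# Illustrative /api/ps response (not live output).
sample = '{"models": [{"name": "qwen3-coder:latest", "size_vram": 19000000000}]}'

def vram_headroom_gb(payload: str) -> float:
    """Return the budget minus the VRAM Ollama reports as in use, in GB."""
    used = sum(m.get("size_vram", 0) for m in json.loads(payload)["models"])
    return VRAM_BUDGET_GB - used / 1e9

print(f"headroom: {vram_headroom_gb(sample):.1f} GB")
```

If headroom goes near zero while a second model is loading, that matches the slow-or-failing symptom described in the note.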

### OpenCode per-host config

OpenCode config lives at `~/.config/opencode/opencode.json` on each machine. All instances use Olares Ollama:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "olares": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Olares Ollama (Qwen3-Coder)",
      "options": {
        "baseURL": "https://a5be22681.vishinator.olares.com/v1"
      },
      "models": {
        "qwen3-coder:latest": {
          "name": "Qwen3 Coder 30.5B Q4_K_M",
          "limit": { "context": 40000, "output": 8192 }
        }
      }
    }
  },
  "model": "olares/qwen3-coder:latest"
}
```

Config locations:

- **homelab-vm**: `/home/homelab/.config/opencode/opencode.json`
- **moon**: `/home/moon/.config/opencode/opencode.json` (migrated from vLLM 2026-03-20)