# Perplexica + Seattle Ollama Integration Guide

## Overview

This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).

## Why This Setup?

### Benefits

1. **Load Distribution**: Spread LLM inference across multiple servers
2. **Redundancy**: Backup LLM provider if the primary Ollama fails
3. **Cost Efficiency**: Use self-hosted inference instead of cloud APIs
4. **Privacy**: All inference stays within your infrastructure

### Architecture

```
┌─────────────────┐
│   Perplexica    │
│  192.168.0.210  │
│      :4785      │
└────────┬────────┘
         │
         ├──────────────┐
         │              │
         ▼              ▼
┌────────────┐   ┌────────────┐
│   Ollama   │   │   Ollama   │
│  Atlantis  │   │  Seattle   │
│   :11434   │   │   :11434   │
└────────────┘   └────────────┘
  (Primary)       (Secondary)
```

## Prerequisites

- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on Seattle Ollama

## Step-by-Step Configuration

### 1. Verify Connectivity

First, verify that the homelab can reach Seattle's Ollama:

```bash
# From homelab machine
curl http://100.82.197.124:11434/api/tags

# Should return JSON with available models
```

### 2. Access Perplexica Settings

1. Open your web browser
2. Navigate to: **http://192.168.0.210:4785**
3. Click the **Settings** icon (gear icon) in the top right
4. Or go directly to: **http://192.168.0.210:4785/settings**

### 3. Add Ollama Seattle Provider

1. In Settings, open the **"Model Providers"** section
2. Click the **"Add Provider"** button
3. Fill in the form:

   | Field | Value |
   |-------|-------|
   | **Name** | Ollama Seattle |
   | **Type** | Ollama |
   | **Base URL** | `http://100.82.197.124:11434` |
   | **API Key** | *(leave empty)* |

4. Click **"Save"** or **"Add"**

### 4. Select Model

After adding the provider:

1. Return to the main Perplexica search page
2. Click on the **model selector** dropdown
3.
You should see **"Ollama Seattle"** as an option
4. Expand it to see available models:
   - `qwen2.5:1.5b`
5. Select the model you want to use

### 5. Test the Integration

1. Enter a search query (e.g., "What is machine learning?")
2. Press Enter or click Search
3. Observe the response
4. Verify it's using Seattle Ollama (check response time, which differs from the primary)

## Performance Issues & Solutions

⚠️ **IMPORTANT**: CPU-based Ollama inference on Seattle is very slow for larger models. See [PERPLEXICA_TROUBLESHOOTING.md](./PERPLEXICA_TROUBLESHOOTING.md) for a detailed performance analysis.

### Performance Timeline

- **Qwen2.5:1.5b on Seattle CPU**: 10 minutes per query ❌ (unusable)
- **TinyLlama:1.1b on Seattle CPU**: 12 seconds per query ⚠️ (slow but usable)
- **Groq API (Llama 3.3 70B)**: 0.4 seconds per query ✅ (recommended)

### Recommended Configuration (As of Feb 2026)

- **Primary**: Use the Groq API for chat (fast, free tier available)
- **Secondary**: Use Seattle Ollama for embeddings only
- **Fallback**: TinyLlama on Seattle if Groq is unavailable

## Troubleshooting

### Provider Not Appearing

**Problem**: Seattle Ollama doesn't show up in the provider list

**Solutions**:

1. Refresh the page (Ctrl+F5 or Cmd+Shift+R)
2. Check the browser console for errors (F12)
3. Verify the provider was saved correctly
4. Re-add the provider

### Connection Timeout

**Problem**: Perplexica can't connect to Seattle Ollama

**Check connectivity**:

```bash
# From the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

**Solutions**:

1. Verify Tailscale is running on both machines:
   ```bash
   tailscale status
   ```
2. Check if Seattle Ollama is running:
   ```bash
   ssh seattle-tailscale "docker ps | grep ollama"
   ```
3.
Test from the homelab host:
   ```bash
   curl http://100.82.197.124:11434/api/tags
   ```

### No Models Available

**Problem**: Provider added but no models show up

**Solution**: Pull a model on Seattle:

```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```

### Slow Responses

**Problem**: Seattle Ollama is slower than expected

**Causes**:

- The Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- Model too large for CPU

**Solutions**:

1. Use smaller models (1.5B or 3B)
2. Stick to the primary Ollama for time-sensitive queries
3. Use Seattle Ollama for background/batch queries

## Performance Comparison

### Expected Response Times

| Setup | Tokens/Second | Notes |
|-------|---------------|-------|
| **Atlantis Ollama** (GPU) | 50-100+ | Much faster with GPU |
| **Seattle Ollama** (CPU) | 8-12 | Only small models are practical |
| **Cloud APIs** (OpenAI, etc.) | 30-60 | Fast but costs money |

### When to Use Each

**Use Atlantis Ollama (Primary)**:

- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial

**Use Seattle Ollama (Secondary)**:

- Load balancing during heavy usage
- Backup when the primary is down
- Testing new models
- When the primary is busy

## Advanced Configuration

### Load Balancing Strategy

To distribute load across both instances:

1. Configure both Ollama instances
2. Use smaller models on Seattle (1.5B, 3B)
3. Reserve larger models (7B+) for Atlantis
4.
Manually switch based on load

### Model Recommendations by Instance

**Atlantis Ollama** (GPU):

- `mistral:7b` - Best quality
- `codellama:7b` - Code tasks
- `llama3:8b` - General purpose

**Seattle Ollama** (CPU):

- `qwen2.5:1.5b` - Very fast, light
- `qwen2.5:3b` - Good balance
- `phi3:3.8b` - Efficient

### Monitoring

Track which instance is being used:

```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```

## Cost Analysis

### Before Integration

- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure

### After Integration

- Distributed inference capacity
- No additional ongoing costs (VPS already paid for)
- Redundancy built in
- Can scale by adding more instances

### vs Cloud APIs

| Scenario | Cloud API Cost | Self-Hosted Cost |
|----------|----------------|------------------|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |

## Security Considerations

### Current Setup

- Ollama accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)

### Recommended Enhancements

1. **Tailscale ACLs**: Restrict which devices can access Ollama
2. **Reverse Proxy**: Add Nginx with basic auth
3. **Rate Limiting**: Prevent abuse
4.
**Monitoring**: Alert on unusual usage patterns

## Maintenance

### Regular Tasks

**Weekly**:

- Check Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

**Monthly**:

- Update the Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list`, then `ollama rm <model>`
- Check disk space: `df -h`

**As Needed**:

- Pull new models based on usage patterns
- Adjust resource limits if performance issues arise
- Update Perplexica when new versions are released

## Related Documentation

- [Ollama Seattle Setup](../../hosts/vms/seattle/README-ollama.md) - Full Seattle Ollama documentation
- [Perplexica Service](../services/individual/perplexica.md) - Main Perplexica documentation
- [Seattle VM Overview](../../hosts/vms/seattle/README.md) - Seattle server details

## Changelog

### February 16, 2026

- **Initial setup**: Deployed Ollama on Seattle VM
- **Model**: Pulled `qwen2.5:1.5b`
- **Integration**: Configured Perplexica to use Seattle Ollama
- **Documentation**: Created this guide

### Attempted vLLM (Failed)

- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device detection errors
- vLLM is not suitable for CPU-only systems
- Switched to Ollama successfully

---

**Status:** 🔴 Performance Issues - Use Groq API instead
**Last Updated:** February 16, 2026
**Maintained By:** Manual Configuration

See [PERPLEXICA_STATUS.md](../../PERPLEXICA_STATUS.md) for current operational status.
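## Appendix: Connectivity Check Script

The weekly `curl` checks above can be combined into one script that probes both Ollama instances and lists the models each one reports. This is a sketch, not a tested tool: the Seattle address is from this guide, but the Atlantis URL (`http://localhost:11434`) is an assumption — adjust it to your setup. Model names are pulled out with `grep`/`cut` so the script works without `jq`.

```bash
#!/usr/bin/env bash
# Health check for both Ollama instances used by Perplexica.
# Prints one OK/FAIL line per host, plus the models each instance reports.
set -u

hosts=(
  "atlantis http://localhost:11434"       # assumption: adjust to Atlantis's real address
  "seattle  http://100.82.197.124:11434"  # Seattle VM via Tailscale (from this guide)
)

for entry in "${hosts[@]}"; do
  read -r name url <<< "$entry"
  # -s silent, -f fail on HTTP errors, -m 5 give up after 5 seconds
  if tags=$(curl -sf -m 5 "$url/api/tags"); then
    # Crude extraction of "name" fields from the /api/tags JSON;
    # use jq if it is available for robust parsing.
    models=$(printf '%s' "$tags" | grep -o '"name":"[^"]*"' | cut -d'"' -f4)
    echo "OK   $name: ${models:-<no models pulled>}"
  else
    echo "FAIL $name: unreachable at $url"
  fi
done
```

A `FAIL` line for Seattle usually means Tailscale is down on one side or the `ollama-seattle` container has stopped — the same causes covered in the Connection Timeout section above.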