Perplexica + Seattle Ollama Integration Guide
Overview
This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).
Why This Setup?
Benefits
- Load Distribution: Spread LLM inference across multiple servers
- Redundancy: Backup LLM provider if primary Ollama fails
- Cost Efficiency: Use self-hosted inference instead of cloud APIs
- Privacy: All inference stays within your infrastructure
Architecture
```
┌─────────────────┐
│   Perplexica    │
│ 192.168.0.210   │
│     :4785       │
└────────┬────────┘
         │
    ┌────┴─────┐
    │          │
    ▼          ▼
┌──────────┐ ┌──────────┐
│  Ollama  │ │  Ollama  │
│ Atlantis │ │ Seattle  │
│  :11434  │ │  :11434  │
└──────────┘ └──────────┘
 (Primary)    (Secondary)
```
Prerequisites
- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on Seattle Ollama
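The prerequisite checks can be scripted from the homelab host. This is a sketch, assuming the Seattle address from this guide; `check_ollama` and `list_models` are hypothetical helper names, and it uses plain `grep`/`sed` so `jq` is not required:

```shell
#!/usr/bin/env bash
# Sketch: verify the prerequisites above. The address comes from this guide;
# the function names are illustrative, not part of Perplexica or Ollama.
SEATTLE_OLLAMA="${SEATTLE_OLLAMA:-http://100.82.197.124:11434}"

# Return 0 if the Ollama API answers within 5 seconds
check_ollama() {
  curl -sf -m 5 "$1/api/tags" > /dev/null
}

# Print one model name per line from /api/tags (no jq required)
list_models() {
  curl -sf -m 5 "$1/api/tags" \
    | grep -o '"name":"[^"]*"' \
    | sed 's/"name":"//; s/"$//'
}

# Uncomment to run against the live instance:
# check_ollama "$SEATTLE_OLLAMA" && echo "Seattle Ollama reachable"
# list_models "$SEATTLE_OLLAMA"
```

If `list_models` prints nothing, no model has been pulled yet (see the prerequisites above).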
Step-by-Step Configuration
1. Verify Connectivity
First, verify that the homelab can reach Seattle's Ollama:
```bash
# From homelab machine
curl http://100.82.197.124:11434/api/tags
# Should return JSON with available models
```
2. Access Perplexica Settings
- Open your web browser
- Navigate to: http://192.168.0.210:4785
- Click the Settings icon (gear icon) in the top right
- Or go directly to: http://192.168.0.210:4785/settings
3. Add Ollama Seattle Provider
- In Settings, click "Model Providers" section
- Click "Add Provider" button
- Fill in the form:
| Field | Value |
|---|---|
| Name | Ollama Seattle |
| Type | Ollama |
| Base URL | http://100.82.197.124:11434 |
| API Key | (leave empty) |
- Click "Save" or "Add"
4. Select Model
After adding the provider:
- Return to the main Perplexica search page
- Click on the model selector dropdown
- You should see "Ollama Seattle" as an option
- Expand it to see available models (e.g. `qwen2.5:1.5b`)
- Select the model you want to use
5. Test the Integration
- Enter a search query (e.g., "What is machine learning?")
- Press Enter or click Search
- Observe the response
- Verify it's using Seattle Ollama (response time will differ noticeably from the primary instance)
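To confirm the query really reached Seattle, and to quantify the latency, you can time a direct call to Ollama's `/api/generate` endpoint outside Perplexica. A sketch, using the endpoint and model from this guide; `time_generate` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: time a single non-streaming generation against an Ollama instance.
# Endpoint and model are the ones from this guide; the function name is illustrative.
time_generate() {
  local url="$1" model="$2" start end
  start=$(date +%s)
  curl -sf -m 600 "$url/api/generate" \
    -d "{\"model\":\"$model\",\"prompt\":\"Say hi\",\"stream\":false}" \
    > /dev/null
  end=$(date +%s)
  echo "$((end - start))s"
}

# Uncomment to run:
# time_generate http://100.82.197.124:11434 qwen2.5:1.5b
```

On Seattle's CPU-only VM, expect this to take far longer than on a GPU host, consistent with the performance timeline below.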
Performance Issues & Solutions
⚠️ IMPORTANT: CPU-based Ollama inference on Seattle is very slow for larger models.
See PERPLEXICA_TROUBLESHOOTING.md for detailed performance analysis.
Performance Timeline
- Qwen2.5:1.5b on Seattle CPU: 10 minutes per query ❌ (unusable)
- TinyLlama:1.1b on Seattle CPU: 12 seconds per query ⚠️ (slow but usable)
- Groq API (Llama 3.3 70B): 0.4 seconds per query ✅ (recommended)
Recommended Configuration (As of Feb 2026)
- Primary: Use Groq API for chat (fast, free tier available)
- Secondary: Use Seattle Ollama for embeddings only
- Fallback: TinyLlama on Seattle if Groq unavailable
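The Groq-first, Seattle-fallback policy above can be expressed as a small availability probe. A sketch only: `pick_chat_provider` and the `GROQ_API_KEY` variable are illustrative, and the Groq URL assumed here is its OpenAI-compatible `/v1/models` route:

```shell
#!/usr/bin/env bash
# Sketch of the fallback policy above. GROQ_API_KEY and the function name
# are illustrative; the Groq endpoint is an assumption, not Perplexica config.
pick_chat_provider() {
  # Prefer Groq when it answers; otherwise fall back to TinyLlama on Seattle.
  if curl -sf -m 5 -H "Authorization: Bearer ${GROQ_API_KEY:-}" \
       "https://api.groq.com/openai/v1/models" > /dev/null 2>&1; then
    echo "groq"
  else
    echo "seattle-tinyllama"
  fi
}

# Uncomment to run:
# pick_chat_provider
```

Perplexica itself does not switch providers automatically; a wrapper like this is only useful for scripts that call the backends directly.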
Troubleshooting
Provider Not Appearing
Problem: Seattle Ollama doesn't show up in provider list
Solutions:
- Refresh the page (Ctrl+F5 or Cmd+Shift+R)
- Check browser console for errors (F12)
- Verify provider was saved correctly
- Re-add the provider
Connection Timeout
Problem: Perplexica can't connect to Seattle Ollama
Check connectivity:

```bash
# From the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

Solutions:
- Verify Tailscale is running on both machines: `tailscale status`
- Check if Seattle Ollama is running: `ssh seattle-tailscale "docker ps | grep ollama"`
- Test from the homelab host: `curl http://100.82.197.124:11434/api/tags`
No Models Available
Problem: Provider added but no models show up
Solution: Pull a model on Seattle:
```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```
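This check-then-pull step can be wrapped into one idempotent helper. A sketch using the SSH alias (`seattle-tailscale`) and container name (`ollama-seattle`) from this guide; `ensure_model` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: pull a model on Seattle only if /api/tags does not already list it.
# Host alias and container name are the ones used in this guide;
# the function name is illustrative.
ensure_model() {
  local url="$1" model="$2"
  if curl -sf -m 5 "$url/api/tags" | grep -q "\"name\":\"$model"; then
    echo "model $model already present"
  else
    ssh seattle-tailscale "docker exec ollama-seattle ollama pull $model"
  fi
}

# Uncomment to run:
# ensure_model http://100.82.197.124:11434 qwen2.5:1.5b
```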
Slow Responses
Problem: Seattle Ollama is slower than expected
Causes:
- Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- Model too large for CPU
Solutions:
- Use smaller models (1.5B or 3B)
- Stick to primary Ollama for time-sensitive queries
- Use Seattle Ollama for background/batch queries
Performance Comparison
Expected Response Times
| Setup | Tokens/Second | Notes |
|---|---|---|
| Atlantis Ollama (GPU) | 50-100+ | Much faster with GPU |
| Seattle Ollama (CPU) | 8-12 | Usable for small models only |
| Cloud APIs (OpenAI, etc.) | 30-60 | Fast but costs money |
When to Use Each
Use Atlantis Ollama (Primary):
- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial
Use Seattle Ollama (Secondary):
- Load balancing during heavy usage
- Backup when primary is down
- Testing new models
- When primary is busy
Advanced Configuration
Load Balancing Strategy
To automatically distribute load:
- Configure both Ollama instances
- Use smaller models on Seattle (1.5B, 3B)
- Reserve larger models (7B+) for Atlantis
- Manually switch based on load
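The manual switching in step 4 can be approximated with a reachability probe that returns the first endpoint that answers. A sketch; `pick_endpoint` is a hypothetical name, and the Atlantis URL is a placeholder since this guide does not list it:

```shell
#!/usr/bin/env bash
# Sketch: print the first Ollama endpoint whose /api/tags answers.
# ATLANTIS_OLLAMA is a placeholder (its address is not given in this guide).
pick_endpoint() {
  local ep
  for ep in "$@"; do
    if curl -sf -m 3 "$ep/api/tags" > /dev/null 2>&1; then
      echo "$ep"
      return 0
    fi
  done
  return 1
}

# Uncomment to run (replace the first URL with Atlantis's real address):
# pick_endpoint "$ATLANTIS_OLLAMA" http://100.82.197.124:11434
```

Listing Atlantis first preserves the primary/secondary ordering above: Seattle is only chosen when the primary does not respond.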
Model Recommendations by Instance
Atlantis Ollama (GPU):
- `mistral:7b` - Best quality
- `codellama:7b` - Code tasks
- `llama3:8b` - General purpose
Seattle Ollama (CPU):
- `qwen2.5:1.5b` - Very fast, light
- `qwen2.5:3b` - Good balance
- `phi3:3.8b` - Efficient
Monitoring
Track which instance is being used:
```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```
Cost Analysis
Before Integration
- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure
After Integration
- Distributed inference capacity
- No additional ongoing costs (VPS already paid for)
- Redundancy built in
- Can scale by adding more instances
vs Cloud APIs
| Scenario | Cloud API Cost | Self-Hosted Cost |
|---|---|---|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |
Security Considerations
Current Setup
- Ollama accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)
Recommended Enhancements
- Tailscale ACLs: Restrict which devices can access Ollama
- Reverse Proxy: Add Nginx with basic auth
- Rate Limiting: Prevent abuse
- Monitoring: Alert on unusual usage patterns
Maintenance
Regular Tasks
Weekly:
- Check Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

Monthly:
- Update Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list` and `ollama rm <model>`
- Check disk space: `df -h`
As Needed:
- Pull new models based on usage patterns
- Adjust resource limits if performance issues
- Update Perplexica when new versions release
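The weekly checks above can be collected into one read-only script run from the homelab host. A sketch; `weekly_check` is a hypothetical name, while the host alias and endpoint are the ones used in this guide:

```shell
#!/usr/bin/env bash
# Sketch of the weekly maintenance tasks above; checks only, changes nothing.
weekly_check() {
  # Is the Ollama container up on Seattle?
  ssh seattle-tailscale "docker ps | grep ollama" \
    || echo "WARN: ollama container not listed on Seattle"
  # Is the API reachable over Tailscale?
  if curl -sf -m 5 http://100.82.197.124:11434/api/tags > /dev/null; then
    echo "OK: Ollama API reachable"
  else
    echo "WARN: Ollama API unreachable"
  fi
}

# Uncomment to run:
# weekly_check
```

A cron entry on the homelab host could run this weekly and mail the WARN lines, but that scheduling is left to local preference.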
Related Documentation
- Ollama Seattle Setup - Full Seattle Ollama documentation
- Perplexica Service - Main Perplexica documentation
- Seattle VM Overview - Seattle server details
Changelog
February 16, 2026
- Initial setup: Deployed Ollama on Seattle VM
- Model: Pulled `qwen2.5:1.5b`
- Integration: Configured Perplexica to use Seattle Ollama
- Documentation: Created this guide
Attempted vLLM (Failed)
- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device detection errors
- vLLM not suitable for CPU-only systems
- Switched to Ollama successfully
Status: 🔴 Performance Issues - Use Groq API instead
Last Updated: February 16, 2026
Maintained By: Manual Configuration
See PERPLEXICA_STATUS.md for current operational status.