# Perplexica + Seattle Ollama Integration Guide

## Overview

This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).

## Why This Setup?

### Benefits

1. **Load Distribution**: Spread LLM inference across multiple servers
2. **Redundancy**: Backup LLM provider if the primary Ollama fails
3. **Cost Efficiency**: Use self-hosted inference instead of cloud APIs
4. **Privacy**: All inference stays within your infrastructure

### Architecture

```
┌─────────────────┐
│   Perplexica    │
│ 192.168.0.210   │
│      :4785      │
└────────┬────────┘
         │
         ├──────────┐
         │          │
         ▼          ▼
   ┌────────────┐  ┌────────────┐
   │   Ollama   │  │   Ollama   │
   │  Atlantis  │  │  Seattle   │
   │   :11434   │  │   :11434   │
   └────────────┘  └────────────┘
     (Primary)      (Secondary)
```

## Prerequisites

- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on the Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on the Seattle Ollama instance

## Step-by-Step Configuration

### 1. Verify Connectivity

First, verify that the homelab can reach Seattle's Ollama:

```bash
# From the homelab machine
curl http://100.82.197.124:11434/api/tags

# Should return JSON with available models
```

### 2. Access Perplexica Settings

1. Open your web browser
2. Navigate to: **http://192.168.0.210:4785**
3. Click the **Settings** icon (gear icon) in the top right
4. Or go directly to: **http://192.168.0.210:4785/settings**

### 3. Add Ollama Seattle Provider

1. In Settings, open the **"Model Providers"** section
2. Click the **"Add Provider"** button
3. Fill in the form:

| Field | Value |
|-------|-------|
| **Name** | Ollama Seattle |
| **Type** | Ollama |
| **Base URL** | `http://100.82.197.124:11434` |
| **API Key** | *(leave empty)* |

4. Click **"Save"** or **"Add"**

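If the dropdown doesn't populate after saving, you can check what the Base URL actually serves. A minimal sketch; the `list_models` helper is ours (not part of Perplexica or Ollama) and assumes the JSON shape of Ollama's `/api/tags` response:

```bash
#!/bin/sh
# Extract model names from an Ollama /api/tags response, without needing jq.
list_models() {
  tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# Live check (from the homelab):
#   curl -s http://100.82.197.124:11434/api/tags | list_models
# Offline example against a captured response:
echo '{"models":[{"name":"qwen2.5:1.5b"},{"name":"tinyllama:1.1b"}]}' | list_models
```

Each name printed here should appear as a selectable model under the provider.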
### 4. Select Model

After adding the provider:

1. Return to the main Perplexica search page
2. Click the **model selector** dropdown
3. You should see **"Ollama Seattle"** as an option
4. Expand it to see the available models:
   - `qwen2.5:1.5b`
5. Select the model you want to use

### 5. Test the Integration

1. Enter a search query (e.g., "What is machine learning?")
2. Press Enter or click Search
3. Observe the response
4. Verify the query was served by Seattle Ollama (the response time differs noticeably from the primary)

## Performance Issues & Solutions

⚠️ **IMPORTANT**: CPU-based Ollama inference on Seattle is very slow for larger models.

See [PERPLEXICA_TROUBLESHOOTING.md](./PERPLEXICA_TROUBLESHOOTING.md) for a detailed performance analysis.

### Performance Timeline

- **Qwen2.5:1.5b on Seattle CPU**: 10 minutes per query ❌ (unusable)
- **TinyLlama:1.1b on Seattle CPU**: 12 seconds per query ⚠️ (slow but usable)
- **Groq API (Llama 3.3 70B)**: 0.4 seconds per query ✅ (recommended)

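When comparing instances yourself, Ollama's non-streaming `/api/generate` responses include `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which make throughput easy to compute. A small helper of our own:

```bash
#!/bin/sh
# tokens/second = eval_count / (eval_duration converted from ns to seconds)
tokens_per_second() {
  awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f\n", c / (d / 1e9) }'
}

# e.g. 120 tokens generated in 12 s of eval time:
tokens_per_second 120 12000000000   # prints 10.0
```

Feed it the two fields from a real response to get a like-for-like number across Atlantis, Seattle, and any cloud provider that reports token counts.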
### Recommended Configuration (as of Feb 2026)

- **Primary**: Use the Groq API for chat (fast, free tier available)
- **Secondary**: Use Seattle Ollama for embeddings only
- **Fallback**: TinyLlama on Seattle if Groq is unavailable

## Troubleshooting

### Provider Not Appearing

**Problem**: Seattle Ollama doesn't show up in the provider list

**Solutions**:
1. Refresh the page (Ctrl+F5 or Cmd+Shift+R)
2. Check the browser console for errors (F12)
3. Verify the provider was saved correctly
4. Re-add the provider

### Connection Timeout

**Problem**: Perplexica can't connect to Seattle Ollama

**Check connectivity**:

```bash
# From inside the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

**Solutions**:

1. Verify Tailscale is running on both machines:
   ```bash
   tailscale status
   ```

2. Check that Seattle Ollama is running:
   ```bash
   ssh seattle-tailscale "docker ps | grep ollama"
   ```

3. Test from the homelab host:
   ```bash
   curl http://100.82.197.124:11434/api/tags
   ```

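To speed up diagnosis, the same probe can sweep every endpoint in one pass. A sketch; the `check_ollama` function is ours, and you would add your primary's URL (not listed in this guide) to the call:

```bash
#!/bin/sh
# Print "<url> up" or "<url> down" for each Ollama base URL passed in.
check_ollama() {
  for base in "$@"; do
    if curl -sf -m 5 "$base/api/tags" > /dev/null; then
      echo "$base up"
    else
      echo "$base down"
    fi
  done
}

check_ollama http://100.82.197.124:11434
```

Anything reported `down` here narrows the fault to the network or the Ollama container, rather than Perplexica's provider configuration.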
### No Models Available

**Problem**: The provider was added but no models show up

**Solution**: Pull a model on Seattle:

```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```

### Slow Responses

**Problem**: Seattle Ollama is slower than expected

**Causes**:
- The Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- The model is too large for CPU inference

**Solutions**:
1. Use smaller models (1.5B or 3B parameters)
2. Stick to the primary Ollama for time-sensitive queries
3. Use Seattle Ollama for background/batch queries

## Performance Comparison

### Expected Response Times

| Setup | Tokens/Second | Notes |
|-------|---------------|-------|
| **Atlantis Ollama** (GPU) | 50-100+ | Much faster with GPU acceleration |
| **Seattle Ollama** (CPU) | 8-12 | Small models only; see Performance Timeline above |
| **Cloud APIs** (OpenAI, etc.) | 30-60 | Fast, but costs money |

### When to Use Each

**Use Atlantis Ollama (primary)**:
- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial

**Use Seattle Ollama (secondary)**:
- Load balancing during heavy usage
- Backup when the primary is down
- Testing new models
- When the primary is busy

## Advanced Configuration

### Load Balancing Strategy

To distribute load across the two instances:

1. Configure both Ollama instances as providers
2. Use smaller models on Seattle (1.5B, 3B)
3. Reserve larger models (7B+) for Atlantis
4. Switch manually based on load (Perplexica does not fail over automatically)

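The manual switch can be semi-automated on the shell side: probe the primary first and fall back to Seattle only when it's unreachable. A sketch; the `pick_endpoint` function is ours, and the primary's hostname below is a placeholder to replace with your own:

```bash
#!/bin/sh
# Return the first Ollama endpoint that answers /api/tags; fail if none do.
pick_endpoint() {
  for base in "$@"; do
    if curl -sf -m 3 "$base/api/tags" > /dev/null; then
      echo "$base"
      return 0
    fi
  done
  return 1
}

# Primary first, Seattle as fallback (primary address is a placeholder):
pick_endpoint "http://atlantis.example:11434" "http://100.82.197.124:11434" \
  || echo "no Ollama endpoint reachable"
```

A wrapper like this can feed the chosen base URL into scripts that call Ollama directly, keeping batch jobs running when one instance is down.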
### Model Recommendations by Instance

**Atlantis Ollama** (GPU):
- `mistral:7b` - best quality
- `codellama:7b` - code tasks
- `llama3:8b` - general purpose

**Seattle Ollama** (CPU):
- `qwen2.5:1.5b` - very fast, lightweight
- `qwen2.5:3b` - good balance
- `phi3:3.8b` - efficient

### Monitoring

To track which instance is serving requests, tail the logs on each host:

```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```

## Cost Analysis

### Before Integration
- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure

### After Integration
- Distributed inference capacity
- No additional ongoing costs (the VPS is already paid for)
- Built-in redundancy
- Scales by adding more instances

### vs. Cloud APIs

| Scenario | Cloud API Cost | Self-Hosted Cost |
|----------|----------------|------------------|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |

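The cloud figures in the table follow from a straightforward rate calculation. An illustrative helper (ours; actual per-token rates vary by provider and model):

```bash
#!/bin/sh
# Cloud-API cost for N tokens at a given $/1M-token rate.
api_cost() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "$%.2f\n", t / 1e6 * r }'
}

api_cost 10000000 0.60   # 10M tokens at $0.60/1M -> $6.00
```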
## Security Considerations

### Current Setup
- Ollama is accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)

### Recommended Enhancements
1. **Tailscale ACLs**: Restrict which devices can access Ollama
2. **Reverse Proxy**: Add Nginx with basic auth
3. **Rate Limiting**: Prevent abuse
4. **Monitoring**: Alert on unusual usage patterns

## Maintenance

### Regular Tasks

**Weekly**:
- Check that Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

**Monthly**:
- Update the Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list`, then `ollama rm <model>`
- Check disk space: `df -h`

**As Needed**:
- Pull new models based on usage patterns
- Adjust resource limits if performance degrades
- Update Perplexica when new versions are released

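The disk-space check is worth automating, since pulled models accumulate quickly. A sketch of a threshold alert using only POSIX `df` and `awk` (the helper functions are ours; run it on the Seattle host via cron or ssh):

```bash
#!/bin/sh
# Percent-used for the filesystem containing the given path.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage crosses a percent threshold, otherwise report OK.
check_disk() {
  pct="$(disk_usage_pct "$1")"
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 at ${pct}% (threshold $2%)"
  else
    echo "OK: $1 at ${pct}%"
  fi
}

check_disk / 90
```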
## Related Documentation

- [Ollama Seattle Setup](../../hosts/vms/seattle/README-ollama.md) - full Seattle Ollama documentation
- [Perplexica Service](../services/individual/perplexica.md) - main Perplexica documentation
- [Seattle VM Overview](../../hosts/vms/seattle/README.md) - Seattle server details

## Changelog

### February 16, 2026
- **Initial setup**: Deployed Ollama on the Seattle VM
- **Model**: Pulled `qwen2.5:1.5b`
- **Integration**: Configured Perplexica to use Seattle Ollama
- **Documentation**: Created this guide

### Attempted vLLM (Failed)
- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device-detection errors
- vLLM is not suitable for CPU-only systems
- Switched to Ollama successfully

---

**Status:** 🔴 Performance issues - use the Groq API instead
**Last Updated:** February 16, 2026
**Maintained By:** Manual configuration

See [PERPLEXICA_STATUS.md](../../PERPLEXICA_STATUS.md) for current operational status.