# Perplexica + Seattle Ollama Integration Guide

## Overview

This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).

## Why This Setup?

### Benefits

1. **Load Distribution**: Spread LLM inference across multiple servers
2. **Redundancy**: Backup LLM provider if the primary Ollama fails
3. **Cost Efficiency**: Use self-hosted inference instead of cloud APIs
4. **Privacy**: All inference stays within your infrastructure

### Architecture

```
┌─────────────────┐
│   Perplexica    │
│ 192.168.0.210   │
│      :4785      │
└────────┬────────┘
         │
         ├──────────┐
         │          │
         ▼          ▼
   ┌────────────┐  ┌────────────┐
   │   Ollama   │  │   Ollama   │
   │  Atlantis  │  │  Seattle   │
   │   :11434   │  │   :11434   │
   └────────────┘  └────────────┘
     (Primary)      (Secondary)
```

## Prerequisites

- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on the Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on the Seattle Ollama instance

## Step-by-Step Configuration

### 1. Verify Connectivity

First, verify that the homelab can reach Seattle's Ollama:

```bash
# From the homelab machine
curl http://100.82.197.124:11434/api/tags

# Should return JSON with available models
```

### 2. Access Perplexica Settings

1. Open your web browser
2. Navigate to: **http://192.168.0.210:4785**
3. Click the **Settings** icon (gear icon) in the top right
4. Or go directly to: **http://192.168.0.210:4785/settings**

### 3. Add Ollama Seattle Provider

1. In Settings, open the **"Model Providers"** section
2. Click the **"Add Provider"** button
3. Fill in the form:

| Field | Value |
|-------|-------|
| **Name** | Ollama Seattle |
| **Type** | Ollama |
| **Base URL** | `http://100.82.197.124:11434` |
| **API Key** | *(leave empty)* |

4. Click **"Save"** or **"Add"**

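If the dropdown doesn't populate after saving, you can check what the Base URL actually serves. A minimal sketch; the `list_models` helper is ours (not part of Perplexica or Ollama) and assumes the JSON shape of Ollama's `/api/tags` response:

```bash
#!/bin/sh
# Extract model names from an Ollama /api/tags response, without needing jq.
list_models() {
  tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# Live check (from the homelab):
#   curl -s http://100.82.197.124:11434/api/tags | list_models
# Offline example against a captured response:
echo '{"models":[{"name":"qwen2.5:1.5b"},{"name":"tinyllama:1.1b"}]}' | list_models
```

Each name printed here should appear as a selectable model under the provider.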
### 4. Select Model

After adding the provider:

1. Return to the main Perplexica search page
2. Click the **model selector** dropdown
3. You should see **"Ollama Seattle"** as an option
4. Expand it to see the available models:
   - `qwen2.5:1.5b`
5. Select the model you want to use

### 5. Test the Integration

1. Enter a search query (e.g., "What is machine learning?")
2. Press Enter or click Search
3. Observe the response
4. Verify the query was served by Seattle Ollama (the response time differs noticeably from the primary)

## Performance Issues & Solutions

⚠️ **IMPORTANT**: CPU-based Ollama inference on Seattle is very slow for larger models.

See [PERPLEXICA_TROUBLESHOOTING.md](./PERPLEXICA_TROUBLESHOOTING.md) for a detailed performance analysis.

### Performance Timeline

- **Qwen2.5:1.5b on Seattle CPU**: 10 minutes per query ❌ (unusable)
- **TinyLlama:1.1b on Seattle CPU**: 12 seconds per query ⚠️ (slow but usable)
- **Groq API (Llama 3.3 70B)**: 0.4 seconds per query ✅ (recommended)

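When comparing instances yourself, Ollama's non-streaming `/api/generate` responses include `eval_count` (tokens generated) and `eval_duration` (nanoseconds), which make throughput easy to compute. A small helper of our own:

```bash
#!/bin/sh
# tokens/second = eval_count / (eval_duration converted from ns to seconds)
tokens_per_second() {
  awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f\n", c / (d / 1e9) }'
}

# e.g. 120 tokens generated in 12 s of eval time:
tokens_per_second 120 12000000000   # prints 10.0
```

Feed it the two fields from a real response to get a like-for-like number across Atlantis, Seattle, and any cloud provider that reports token counts.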
### Recommended Configuration (as of Feb 2026)

- **Primary**: Use the Groq API for chat (fast, free tier available)
- **Secondary**: Use Seattle Ollama for embeddings only
- **Fallback**: TinyLlama on Seattle if Groq is unavailable

## Troubleshooting

### Provider Not Appearing

**Problem**: Seattle Ollama doesn't show up in the provider list

**Solutions**:
1. Refresh the page (Ctrl+F5 or Cmd+Shift+R)
2. Check the browser console for errors (F12)
3. Verify the provider was saved correctly
4. Re-add the provider

### Connection Timeout

**Problem**: Perplexica can't connect to Seattle Ollama

**Check connectivity**:

```bash
# From inside the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

**Solutions**:

1. Verify Tailscale is running on both machines:
   ```bash
   tailscale status
   ```

2. Check that Seattle Ollama is running:
   ```bash
   ssh seattle-tailscale "docker ps | grep ollama"
   ```

3. Test from the homelab host:
   ```bash
   curl http://100.82.197.124:11434/api/tags
   ```

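To speed up diagnosis, the same probe can sweep every endpoint in one pass. A sketch; the `check_ollama` function is ours, and you would add your primary's URL (not listed in this guide) to the call:

```bash
#!/bin/sh
# Print "<url> up" or "<url> down" for each Ollama base URL passed in.
check_ollama() {
  for base in "$@"; do
    if curl -sf -m 5 "$base/api/tags" > /dev/null; then
      echo "$base up"
    else
      echo "$base down"
    fi
  done
}

check_ollama http://100.82.197.124:11434
```

Anything reported `down` here narrows the fault to the network or the Ollama container, rather than Perplexica's provider configuration.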
### No Models Available

**Problem**: The provider was added but no models show up

**Solution**: Pull a model on Seattle:

```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```

### Slow Responses

**Problem**: Seattle Ollama is slower than expected

**Causes**:
- The Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- The model is too large for CPU inference

**Solutions**:
1. Use smaller models (1.5B or 3B parameters)
2. Stick to the primary Ollama for time-sensitive queries
3. Use Seattle Ollama for background/batch queries

## Performance Comparison

### Expected Response Times

| Setup | Tokens/Second | Notes |
|-------|---------------|-------|
| **Atlantis Ollama** (GPU) | 50-100+ | Much faster with GPU acceleration |
| **Seattle Ollama** (CPU) | 8-12 | Small models only; see Performance Timeline above |
| **Cloud APIs** (OpenAI, etc.) | 30-60 | Fast, but costs money |

### When to Use Each

**Use Atlantis Ollama (primary)**:
- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial

**Use Seattle Ollama (secondary)**:
- Load balancing during heavy usage
- Backup when the primary is down
- Testing new models
- When the primary is busy

## Advanced Configuration

### Load Balancing Strategy

To distribute load across the two instances:

1. Configure both Ollama instances as providers
2. Use smaller models on Seattle (1.5B, 3B)
3. Reserve larger models (7B+) for Atlantis
4. Switch manually based on load (Perplexica does not fail over automatically)

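The manual switch can be semi-automated on the shell side: probe the primary first and fall back to Seattle only when it's unreachable. A sketch; the `pick_endpoint` function is ours, and the primary's hostname below is a placeholder to replace with your own:

```bash
#!/bin/sh
# Return the first Ollama endpoint that answers /api/tags; fail if none do.
pick_endpoint() {
  for base in "$@"; do
    if curl -sf -m 3 "$base/api/tags" > /dev/null; then
      echo "$base"
      return 0
    fi
  done
  return 1
}

# Primary first, Seattle as fallback (primary address is a placeholder):
pick_endpoint "http://atlantis.example:11434" "http://100.82.197.124:11434" \
  || echo "no Ollama endpoint reachable"
```

A wrapper like this can feed the chosen base URL into scripts that call Ollama directly, keeping batch jobs running when one instance is down.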
### Model Recommendations by Instance

**Atlantis Ollama** (GPU):
- `mistral:7b` - best quality
- `codellama:7b` - code tasks
- `llama3:8b` - general purpose

**Seattle Ollama** (CPU):
- `qwen2.5:1.5b` - very fast, lightweight
- `qwen2.5:3b` - good balance
- `phi3:3.8b` - efficient

### Monitoring

To track which instance is serving requests, tail the logs on each host:

```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```

## Cost Analysis

### Before Integration
- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure

### After Integration
- Distributed inference capacity
- No additional ongoing costs (the VPS is already paid for)
- Built-in redundancy
- Scales by adding more instances

### vs. Cloud APIs

| Scenario | Cloud API Cost | Self-Hosted Cost |
|----------|----------------|------------------|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |

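The cloud figures in the table follow from a straightforward rate calculation. An illustrative helper (ours; actual per-token rates vary by provider and model):

```bash
#!/bin/sh
# Cloud-API cost for N tokens at a given $/1M-token rate.
api_cost() {
  awk -v t="$1" -v r="$2" 'BEGIN { printf "$%.2f\n", t / 1e6 * r }'
}

api_cost 10000000 0.60   # 10M tokens at $0.60/1M -> $6.00
```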
## Security Considerations

### Current Setup
- Ollama is accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)

### Recommended Enhancements
1. **Tailscale ACLs**: Restrict which devices can access Ollama
2. **Reverse Proxy**: Add Nginx with basic auth
3. **Rate Limiting**: Prevent abuse
4. **Monitoring**: Alert on unusual usage patterns

## Maintenance

### Regular Tasks

**Weekly**:
- Check that Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

**Monthly**:
- Update the Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list`, then `ollama rm <model>`
- Check disk space: `df -h`

**As Needed**:
- Pull new models based on usage patterns
- Adjust resource limits if performance degrades
- Update Perplexica when new versions are released

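The disk-space check is worth automating, since pulled models accumulate quickly. A sketch of a threshold alert using only POSIX `df` and `awk` (the helper functions are ours; run it on the Seattle host via cron or ssh):

```bash
#!/bin/sh
# Percent-used for the filesystem containing the given path.
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Warn when usage crosses a percent threshold, otherwise report OK.
check_disk() {
  pct="$(disk_usage_pct "$1")"
  if [ "$pct" -ge "$2" ]; then
    echo "WARNING: $1 at ${pct}% (threshold $2%)"
  else
    echo "OK: $1 at ${pct}%"
  fi
}

check_disk / 90
```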
## Related Documentation

- [Ollama Seattle Setup](../../hosts/vms/seattle/README-ollama.md) - full Seattle Ollama documentation
- [Perplexica Service](../services/individual/perplexica.md) - main Perplexica documentation
- [Seattle VM Overview](../../hosts/vms/seattle/README.md) - Seattle server details

## Changelog

### February 16, 2026
- **Initial setup**: Deployed Ollama on the Seattle VM
- **Model**: Pulled `qwen2.5:1.5b`
- **Integration**: Configured Perplexica to use Seattle Ollama
- **Documentation**: Created this guide

### Attempted vLLM (Failed)
- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device-detection errors
- vLLM is not suitable for CPU-only systems
- Switched to Ollama successfully

---

**Status:** 🔴 Performance issues - use the Groq API instead
**Last Updated:** February 16, 2026
**Maintained By:** Manual configuration

See [PERPLEXICA_STATUS.md](../../PERPLEXICA_STATUS.md) for current operational status.