# Perplexica + Seattle Ollama Integration Guide

## Overview

This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).

## Why This Setup?

### Benefits

1. **Load Distribution**: Spread LLM inference across multiple servers
2. **Redundancy**: Backup LLM provider if the primary Ollama fails
3. **Cost Efficiency**: Use self-hosted inference instead of cloud APIs
4. **Privacy**: All inference stays within your infrastructure

### Architecture

```
┌─────────────────┐
│   Perplexica    │
│  192.168.0.210  │
│      :4785      │
└────────┬────────┘
         │
         ├──────────────┐
         │              │
         ▼              ▼
┌────────────┐   ┌────────────┐
│   Ollama   │   │   Ollama   │
│  Atlantis  │   │  Seattle   │
│   :11434   │   │   :11434   │
└────────────┘   └────────────┘
  (Primary)       (Secondary)
```

## Prerequisites

- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on Seattle Ollama

## Step-by-Step Configuration

### 1. Verify Connectivity

First, verify that the homelab can reach Seattle's Ollama:

```bash
# From homelab machine
curl http://100.82.197.124:11434/api/tags

# Should return JSON with available models
```

### 2. Access Perplexica Settings

1. Open your web browser
2. Navigate to: **http://192.168.0.210:4785**
3. Click the **Settings** icon (gear icon) in the top right
4. Or go directly to: **http://192.168.0.210:4785/settings**

### 3. Add Ollama Seattle Provider

1. In Settings, open the **"Model Providers"** section
2. Click the **"Add Provider"** button
3. Fill in the form:

   | Field | Value |
   |-------|-------|
   | **Name** | Ollama Seattle |
   | **Type** | Ollama |
   | **Base URL** | `http://100.82.197.124:11434` |
   | **API Key** | *(leave empty)* |

4. Click **"Save"** or **"Add"**

### 4. Select Model

After adding the provider:

1. Return to the main Perplexica search page
2. Click on the **model selector** dropdown
3.
You should see **"Ollama Seattle"** as an option
4. Expand it to see available models:
   - `qwen2.5:1.5b`
5. Select the model you want to use

### 5. Test the Integration

1. Enter a search query (e.g., "What is machine learning?")
2. Press Enter or click Search
3. Observe the response
4. Verify it's using Seattle Ollama (check response time, which differs from the primary)

## Performance Issues & Solutions

⚠️ **IMPORTANT**: CPU-based Ollama inference on Seattle is very slow for larger models. See [PERPLEXICA_TROUBLESHOOTING.md](./PERPLEXICA_TROUBLESHOOTING.md) for a detailed performance analysis.

### Performance Timeline

- **Qwen2.5:1.5b on Seattle CPU**: 10 minutes per query ❌ (unusable)
- **TinyLlama:1.1b on Seattle CPU**: 12 seconds per query ⚠️ (slow but usable)
- **Groq API (Llama 3.3 70B)**: 0.4 seconds per query ✅ (recommended)

### Recommended Configuration (As of Feb 2026)

- **Primary**: Use the Groq API for chat (fast, free tier available)
- **Secondary**: Use Seattle Ollama for embeddings only
- **Fallback**: TinyLlama on Seattle if Groq is unavailable

## Troubleshooting

### Provider Not Appearing

**Problem**: Seattle Ollama doesn't show up in the provider list

**Solutions**:

1. Refresh the page (Ctrl+F5 or Cmd+Shift+R)
2. Check the browser console for errors (F12)
3. Verify the provider was saved correctly
4. Re-add the provider

### Connection Timeout

**Problem**: Perplexica can't connect to Seattle Ollama

**Check connectivity**:

```bash
# From the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

**Solutions**:

1. Verify Tailscale is running on both machines:
   ```bash
   tailscale status
   ```
2. Check if Seattle Ollama is running:
   ```bash
   ssh seattle-tailscale "docker ps | grep ollama"
   ```
3.
Test from the homelab host:
   ```bash
   curl http://100.82.197.124:11434/api/tags
   ```

### No Models Available

**Problem**: Provider added but no models show up

**Solution**: Pull a model on Seattle:

```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```

### Slow Responses

**Problem**: Seattle Ollama is slower than expected

**Causes**:

- The Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- Model too large for CPU

**Solutions**:

1. Use smaller models (1.5B or 3B)
2. Stick to the primary Ollama for time-sensitive queries
3. Use Seattle Ollama for background/batch queries

## Performance Comparison

### Expected Response Times

| Setup | Tokens/Second | Notes |
|-------|---------------|-------|
| **Atlantis Ollama** (GPU) | 50-100+ | Much faster with GPU |
| **Seattle Ollama** (CPU) | 8-12 | Only small models are practical |
| **Cloud APIs** (OpenAI, etc.) | 30-60 | Fast but costs money |

### When to Use Each

**Use Atlantis Ollama (Primary)**:

- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial

**Use Seattle Ollama (Secondary)**:

- Load balancing during heavy usage
- Backup when the primary is down
- Testing new models
- When the primary is busy

## Advanced Configuration

### Load Balancing Strategy

To distribute load across both instances:

1. Configure both Ollama instances
2. Use smaller models on Seattle (1.5B, 3B)
3. Reserve larger models (7B+) for Atlantis
4.
Manually switch based on load

### Model Recommendations by Instance

**Atlantis Ollama** (GPU):

- `mistral:7b` - Best quality
- `codellama:7b` - Code tasks
- `llama3:8b` - General purpose

**Seattle Ollama** (CPU):

- `qwen2.5:1.5b` - Very fast, light
- `qwen2.5:3b` - Good balance
- `phi3:3.8b` - Efficient

### Monitoring

Track which instance is being used:

```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```

## Cost Analysis

### Before Integration

- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure

### After Integration

- Distributed inference capacity
- No additional ongoing costs (VPS already paid for)
- Redundancy built in
- Can scale by adding more instances

### vs Cloud APIs

| Scenario | Cloud API Cost | Self-Hosted Cost |
|----------|----------------|------------------|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |

## Security Considerations

### Current Setup

- Ollama accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)

### Recommended Enhancements

1. **Tailscale ACLs**: Restrict which devices can access Ollama
2. **Reverse Proxy**: Add Nginx with basic auth
3. **Rate Limiting**: Prevent abuse
4.
**Monitoring**: Alert on unusual usage patterns

## Maintenance

### Regular Tasks

**Weekly**:

- Check Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

**Monthly**:

- Update the Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list`, then `ollama rm <model>`
- Check disk space: `df -h`

**As Needed**:

- Pull new models based on usage patterns
- Adjust resource limits if performance issues arise
- Update Perplexica when new versions are released

## Related Documentation

- [Ollama Seattle Setup](../../hosts/vms/seattle/README-ollama.md) - Full Seattle Ollama documentation
- [Perplexica Service](../services/individual/perplexica.md) - Main Perplexica documentation
- [Seattle VM Overview](../../hosts/vms/seattle/README.md) - Seattle server details

## Changelog

### February 16, 2026

- **Initial setup**: Deployed Ollama on Seattle VM
- **Model**: Pulled `qwen2.5:1.5b`
- **Integration**: Configured Perplexica to use Seattle Ollama
- **Documentation**: Created this guide

### Attempted vLLM (Failed)

- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device detection errors
- vLLM is not suitable for CPU-only systems
- Switched to Ollama successfully

---

**Status:** 🔴 Performance Issues - Use Groq API instead
**Last Updated:** February 16, 2026
**Maintained By:** Manual Configuration

See [PERPLEXICA_STATUS.md](../../PERPLEXICA_STATUS.md) for current operational status.
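## Appendix: Connectivity Check Script

The weekly `curl` checks above can be combined into one script that probes both Ollama instances and lists the models each one reports. This is a sketch, not a tested tool: the Seattle address is from this guide, but the Atlantis URL (`http://localhost:11434`) is an assumption — adjust it to your setup. Model names are pulled out with `grep`/`cut` so the script works without `jq`.

```bash
#!/usr/bin/env bash
# Health check for both Ollama instances used by Perplexica.
# Prints one OK/FAIL line per host, plus the models each instance reports.
set -u

hosts=(
  "atlantis http://localhost:11434"       # assumption: adjust to Atlantis's real address
  "seattle  http://100.82.197.124:11434"  # Seattle VM via Tailscale (from this guide)
)

for entry in "${hosts[@]}"; do
  read -r name url <<< "$entry"
  # -s silent, -f fail on HTTP errors, -m 5 give up after 5 seconds
  if tags=$(curl -sf -m 5 "$url/api/tags"); then
    # Crude extraction of "name" fields from the /api/tags JSON;
    # use jq if it is available for robust parsing.
    models=$(printf '%s' "$tags" | grep -o '"name":"[^"]*"' | cut -d'"' -f4)
    echo "OK   $name: ${models:-<no models pulled>}"
  else
    echo "FAIL $name: unreachable at $url"
  fi
done
```

A `FAIL` line for Seattle usually means Tailscale is down on one side or the `ollama-seattle` container has stopped — the same causes covered in the Connection Timeout section above.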