
Perplexica + Seattle Ollama Integration Guide

Overview

This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).

Why This Setup?

Benefits

  1. Load Distribution: Spread LLM inference across multiple servers
  2. Redundancy: Backup LLM provider if primary Ollama fails
  3. Cost Efficiency: Use self-hosted inference instead of cloud APIs
  4. Privacy: All inference stays within your infrastructure

Architecture

┌─────────────────┐
│  Perplexica     │
│  192.168.0.210  │
│  :4785          │
└────────┬────────┘
         │
         ├──────────┐
         │          │
         ▼          ▼
┌────────────┐  ┌────────────┐
│ Ollama     │  │ Ollama     │
│ Atlantis   │  │ Seattle    │
│ :11434     │  │ :11434     │
└────────────┘  └────────────┘
   (Primary)       (Secondary)

Prerequisites

  • Perplexica running on homelab-vm (192.168.0.210:4785)
  • Ollama running on Seattle VM (100.82.197.124:11434)
  • Tailscale VPN connecting both machines
  • At least one model pulled on Seattle Ollama

Step-by-Step Configuration

1. Verify Connectivity

First, verify that the homelab can reach Seattle's Ollama:

# From homelab machine
curl http://100.82.197.124:11434/api/tags

# Should return JSON with available models
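To confirm which models Seattle actually has, the `/api/tags` JSON can be reduced to bare model names with standard tools. A minimal sketch — `list_ollama_models` is a hypothetical helper, and it assumes Ollama's usual `{"models":[{"name":...}]}` response shape:

```shell
# List model names from an Ollama /api/tags response.
# Assumes the standard response shape {"models":[{"name":"..."}, ...]}.
list_ollama_models() {
  # $1 = base URL, e.g. http://100.82.197.124:11434
  curl -s -m 5 "$1/api/tags" | tr ',' '\n' | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}

# Example:
#   list_ollama_models http://100.82.197.124:11434
```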

2. Access Perplexica Settings

  1. Open your web browser
  2. Navigate to: http://192.168.0.210:4785
  3. Click the Settings icon (gear icon) in the top right
  4. Or go directly to: http://192.168.0.210:4785/settings

3. Add Ollama Seattle Provider

  1. In Settings, click "Model Providers" section
  2. Click "Add Provider" button
  3. Fill in the form:

     Field     Value
     Name      Ollama Seattle
     Type      Ollama
     Base URL  http://100.82.197.124:11434
     API Key   (leave empty)

  4. Click "Save" or "Add"

4. Select Model

After adding the provider:

  1. Return to the main Perplexica search page
  2. Click on the model selector dropdown
  3. You should see "Ollama Seattle" as an option
  4. Expand it to see available models:
    • qwen2.5:1.5b
  5. Select the model you want to use

5. Test the Integration

  1. Enter a search query (e.g., "What is machine learning?")
  2. Press Enter or click Search
  3. Observe the response
  4. Confirm the query was served by Seattle Ollama (its response time differs noticeably from the primary's)
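The timing check above can be made concrete: Ollama's non-streaming /api/generate response includes eval_count (generated tokens) and eval_duration (in nanoseconds), so a rough tokens-per-second figure can be computed. A sketch, with benchmark_ollama and tps as hypothetical helpers:

```shell
# Rough throughput check: request a non-streaming completion, then compute
# tokens/second from eval_count and eval_duration (nanoseconds), both of
# which Ollama reports in the /api/generate response.

tps() {
  # $1 = eval_count (tokens), $2 = eval_duration (nanoseconds)
  awk -v c="$1" -v d="$2" 'BEGIN { printf "%.1f\n", c / (d / 1e9) }'
}

benchmark_ollama() {
  local base="$1" model="$2" resp count dur
  resp=$(curl -s -m 600 "$base/api/generate" \
    -d "{\"model\":\"$model\",\"prompt\":\"What is machine learning?\",\"stream\":false}")
  count=$(echo "$resp" | grep -o '"eval_count": *[0-9]*' | grep -o '[0-9]*$')
  dur=$(echo "$resp" | grep -o '"eval_duration": *[0-9]*' | grep -o '[0-9]*$')
  tps "$count" "$dur"
}

# Example:
#   benchmark_ollama http://100.82.197.124:11434 qwen2.5:1.5b
```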

Performance Issues & Solutions

⚠️ IMPORTANT: CPU-based Ollama inference on Seattle is very slow for larger models.

See PERPLEXICA_TROUBLESHOOTING.md for detailed performance analysis.

Performance Timeline

  • Qwen2.5:1.5b on Seattle CPU: ~10 minutes per query (unusable)
  • TinyLlama:1.1b on Seattle CPU: ~12 seconds per query ⚠️ (slow but usable)
  • Groq API (Llama 3.3 70B): ~0.4 seconds per query (recommended)

Recommended Strategy

  • Primary: Use the Groq API for chat (fast, free tier available)
  • Secondary: Use Seattle Ollama for embeddings only
  • Fallback: TinyLlama on Seattle if Groq is unavailable

Troubleshooting

Provider Not Appearing

Problem: Seattle Ollama doesn't show up in provider list

Solutions:

  1. Refresh the page (Ctrl+F5 or Cmd+Shift+R)
  2. Check browser console for errors (F12)
  3. Verify provider was saved correctly
  4. Re-add the provider

Connection Timeout

Problem: Perplexica can't connect to Seattle Ollama

Check connectivity:

# From the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags

Solutions:

  1. Verify Tailscale is running on both machines:

    tailscale status
    
  2. Check if Seattle Ollama is running:

    ssh seattle-tailscale "docker ps | grep ollama"
    
  3. Test from homelab host:

    curl http://100.82.197.124:11434/api/tags
    

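The checks above can be bundled into a single reachability probe — a sketch, where check_ollama is a hypothetical helper suitable for cron or an ad-hoc shell:

```shell
# Quick reachability probe: succeeds (exit 0) only if the Ollama API at the
# given base URL answers /api/tags within 5 seconds.
check_ollama() {
  curl -sf -m 5 "$1/api/tags" > /dev/null
}

# Example:
#   check_ollama http://100.82.197.124:11434 && echo reachable || echo unreachable
```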
No Models Available

Problem: Provider added but no models show up

Solution: Pull a model on Seattle:

ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"

Slow Responses

Problem: Seattle Ollama is slower than expected

Causes:

  • Seattle VM uses CPU-only inference (no GPU)
  • Network latency over Tailscale
  • Model too large for CPU

Solutions:

  1. Use smaller models (1.5B or 3B)
  2. Stick to primary Ollama for time-sensitive queries
  3. Use Seattle Ollama for background/batch queries

Performance Comparison

Expected Response Times

Setup                      Tokens/Second  Notes
Atlantis Ollama (GPU)      50-100+        Much faster with GPU
Seattle Ollama (CPU)       8-12           Slow; best for light or background queries
Cloud APIs (OpenAI, etc.)  30-60          Fast, but costs money

When to Use Each

Use Atlantis Ollama (Primary):

  • Real-time searches
  • Large models (7B+)
  • When GPU acceleration is beneficial

Use Seattle Ollama (Secondary):

  • Load balancing during heavy usage
  • Backup when primary is down
  • Testing new models
  • When primary is busy

Advanced Configuration

Load Balancing Strategy

To automatically distribute load:

  1. Configure both Ollama instances
  2. Use smaller models on Seattle (1.5B, 3B)
  3. Reserve larger models (7B+) for Atlantis
  4. Manually switch based on load
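The manual switching in step 4 can be approximated with a tiny helper that probes each instance in priority order and returns the first one that answers. A sketch — pick_ollama is hypothetical, and http://atlantis:11434 stands in for however you reach the primary host:

```shell
# Return the first base URL (in priority order) whose Ollama API responds;
# fail with exit 1 if none do.
pick_ollama() {
  local base
  for base in "$@"; do
    if curl -sf -m 3 "$base/api/tags" > /dev/null; then
      echo "$base"
      return 0
    fi
  done
  return 1
}

# Example:
#   OLLAMA_URL=$(pick_ollama http://atlantis:11434 http://100.82.197.124:11434)
```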

Model Recommendations by Instance

Atlantis Ollama (GPU):

  • mistral:7b - Best quality
  • codellama:7b - Code tasks
  • llama3:8b - General purpose

Seattle Ollama (CPU):

  • qwen2.5:1.5b - Very fast, light
  • qwen2.5:3b - Good balance
  • phi3:3.8b - Efficient

Monitoring

Track which instance is being used:

# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
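In addition to the logs, Ollama's /api/ps endpoint reports which models an instance currently has loaded in memory — a quick way to tell whether it has served anything recently. A sketch, with loaded_models as a hypothetical helper:

```shell
# Show which models an instance currently has loaded in memory.
# /api/ps returns the same {"models":[...]} JSON shape as /api/tags.
loaded_models() {
  curl -s -m 5 "$1/api/ps" | tr ',' '\n' | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
}

# Example:
#   loaded_models http://100.82.197.124:11434
```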

Cost Analysis

Before Integration

  • Single Ollama instance (Atlantis)
  • Risk of overload during heavy usage
  • Single point of failure

After Integration

  • Distributed inference capacity
  • No additional ongoing costs (VPS already paid for)
  • Redundancy built in
  • Can scale by adding more instances

vs Cloud APIs

Scenario           Cloud API Cost  Self-Hosted Cost
1M tokens/month    $0.15-0.60      $0 (already running)
10M tokens/month   $1.50-6.00      $0
100M tokens/month  $15-60          $0

Security Considerations

Current Setup

  • Ollama accessible only via Tailscale
  • No public internet exposure
  • No authentication required (trusted network)

Recommended Hardening

  1. Tailscale ACLs: Restrict which devices can access Ollama
  2. Reverse Proxy: Add Nginx with basic auth
  3. Rate Limiting: Prevent abuse
  4. Monitoring: Alert on unusual usage patterns

Maintenance

Regular Tasks

Weekly:

  • Check Ollama is running: docker ps | grep ollama
  • Verify connectivity: curl http://100.82.197.124:11434/api/tags

Monthly:

  • Update Ollama image: docker pull ollama/ollama:latest
  • Clean up unused models: ollama list and ollama rm <model>
  • Check disk space: df -h
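The monthly tasks can be collected into one pass over the Seattle instance — a sketch, where maintain_seattle is hypothetical and reuses the ssh host and container names from this guide:

```shell
# Monthly maintenance pass for the Seattle instance: stage the latest
# Ollama image, list installed models, and check disk space.
maintain_seattle() {
  ssh seattle-tailscale "docker pull ollama/ollama:latest"
  ssh seattle-tailscale "docker exec ollama-seattle ollama list"
  ssh seattle-tailscale "df -h"
}
```

Note that pulling a newer image only stages it; the running container must be recreated (for example with docker compose up -d) before it uses the new version.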

As Needed:

  • Pull new models based on usage patterns
  • Adjust resource limits if performance issues
  • Update Perplexica when new versions release

Changelog

February 16, 2026

  • Initial setup: Deployed Ollama on Seattle VM
  • Model: Pulled qwen2.5:1.5b
  • Integration: Configured Perplexica to use Seattle Ollama
  • Documentation: Created this guide

Attempted vLLM (Failed)

  • Tried vllm/vllm-openai:latest for CPU inference
  • Failed with device detection errors
  • vLLM not suitable for CPU-only systems
  • Switched to Ollama successfully

Status: 🔴 Performance Issues - Use Groq API instead
Last Updated: February 16, 2026
Maintained By: Manual Configuration

See PERPLEXICA_STATUS.md for current operational status.