Perplexica + Seattle Ollama Integration Guide
Overview
This guide explains how to configure Perplexica (running on homelab-vm at 192.168.0.210) to use the Ollama instance running on the Seattle VM (Contabo VPS at 100.82.197.124 via Tailscale).
Why This Setup?
Benefits
- Load Distribution: Spread LLM inference across multiple servers
- Redundancy: Backup LLM provider if primary Ollama fails
- Cost Efficiency: Use self-hosted inference instead of cloud APIs
- Privacy: All inference stays within your infrastructure
Architecture
```
┌─────────────────┐
│   Perplexica    │
│ 192.168.0.210   │
│     :4785       │
└────────┬────────┘
         │
    ┌────┴─────┐
    │          │
    ▼          ▼
┌──────────┐ ┌──────────┐
│  Ollama  │ │  Ollama  │
│ Atlantis │ │ Seattle  │
│  :11434  │ │  :11434  │
└──────────┘ └──────────┘
 (Primary)    (Secondary)
```
Prerequisites
- Perplexica running on homelab-vm (192.168.0.210:4785)
- Ollama running on Seattle VM (100.82.197.124:11434)
- Tailscale VPN connecting both machines
- At least one model pulled on Seattle Ollama
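The prerequisite checks can be scripted from the homelab host. This is a sketch, assuming the Seattle address from this guide; `check_ollama` and `list_models` are hypothetical helper names, and it uses plain `grep`/`sed` so `jq` is not required:

```shell
#!/usr/bin/env bash
# Sketch: verify the prerequisites above. The address comes from this guide;
# the function names are illustrative, not part of Perplexica or Ollama.
SEATTLE_OLLAMA="${SEATTLE_OLLAMA:-http://100.82.197.124:11434}"

# Return 0 if the Ollama API answers within 5 seconds
check_ollama() {
  curl -sf -m 5 "$1/api/tags" > /dev/null
}

# Print one model name per line from /api/tags (no jq required)
list_models() {
  curl -sf -m 5 "$1/api/tags" \
    | grep -o '"name":"[^"]*"' \
    | sed 's/"name":"//; s/"$//'
}

# Uncomment to run against the live instance:
# check_ollama "$SEATTLE_OLLAMA" && echo "Seattle Ollama reachable"
# list_models "$SEATTLE_OLLAMA"
```

If `list_models` prints nothing, no model has been pulled yet (see the prerequisites above).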
Step-by-Step Configuration
1. Verify Connectivity
First, verify that the homelab can reach Seattle's Ollama:
```bash
# From homelab machine
curl http://100.82.197.124:11434/api/tags
# Should return JSON with available models
```
2. Access Perplexica Settings
- Open your web browser
- Navigate to: http://192.168.0.210:4785
- Click the Settings icon (gear icon) in the top right
- Or go directly to: http://192.168.0.210:4785/settings
3. Add Ollama Seattle Provider
- In Settings, click "Model Providers" section
- Click "Add Provider" button
- Fill in the form:
| Field | Value |
|---|---|
| Name | Ollama Seattle |
| Type | Ollama |
| Base URL | http://100.82.197.124:11434 |
| API Key | (leave empty) |
- Click "Save" or "Add"
4. Select Model
After adding the provider:
- Return to the main Perplexica search page
- Click on the model selector dropdown
- You should see "Ollama Seattle" as an option
- Expand it to see available models (e.g. `qwen2.5:1.5b`)
- Select the model you want to use
5. Test the Integration
- Enter a search query (e.g., "What is machine learning?")
- Press Enter or click Search
- Observe the response
- Verify it's using Seattle Ollama (response time will differ noticeably from the primary instance)
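To confirm the query really reached Seattle, and to quantify the latency, you can time a direct call to Ollama's `/api/generate` endpoint outside Perplexica. A sketch, using the endpoint and model from this guide; `time_generate` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: time a single non-streaming generation against an Ollama instance.
# Endpoint and model are the ones from this guide; the function name is illustrative.
time_generate() {
  local url="$1" model="$2" start end
  start=$(date +%s)
  curl -sf -m 600 "$url/api/generate" \
    -d "{\"model\":\"$model\",\"prompt\":\"Say hi\",\"stream\":false}" \
    > /dev/null
  end=$(date +%s)
  echo "$((end - start))s"
}

# Uncomment to run:
# time_generate http://100.82.197.124:11434 qwen2.5:1.5b
```

On Seattle's CPU-only VM, expect this to take far longer than on a GPU host, consistent with the performance timeline below.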
Performance Issues & Solutions
⚠️ IMPORTANT: CPU-based Ollama inference on Seattle is very slow for larger models.
See PERPLEXICA_TROUBLESHOOTING.md for detailed performance analysis.
Performance Timeline
- Qwen2.5:1.5b on Seattle CPU: 10 minutes per query ❌ (unusable)
- TinyLlama:1.1b on Seattle CPU: 12 seconds per query ⚠️ (slow but usable)
- Groq API (Llama 3.3 70B): 0.4 seconds per query ✅ (recommended)
Recommended Configuration (As of Feb 2026)
- Primary: Use Groq API for chat (fast, free tier available)
- Secondary: Use Seattle Ollama for embeddings only
- Fallback: TinyLlama on Seattle if Groq unavailable
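The Groq-first, Seattle-fallback policy above can be expressed as a small availability probe. A sketch only: `pick_chat_provider` and the `GROQ_API_KEY` variable are illustrative, and the Groq URL assumed here is its OpenAI-compatible `/v1/models` route:

```shell
#!/usr/bin/env bash
# Sketch of the fallback policy above. GROQ_API_KEY and the function name
# are illustrative; the Groq endpoint is an assumption, not Perplexica config.
pick_chat_provider() {
  # Prefer Groq when it answers; otherwise fall back to TinyLlama on Seattle.
  if curl -sf -m 5 -H "Authorization: Bearer ${GROQ_API_KEY:-}" \
       "https://api.groq.com/openai/v1/models" > /dev/null 2>&1; then
    echo "groq"
  else
    echo "seattle-tinyllama"
  fi
}

# Uncomment to run:
# pick_chat_provider
```

Perplexica itself does not switch providers automatically; a wrapper like this is only useful for scripts that call the backends directly.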
Troubleshooting
Provider Not Appearing
Problem: Seattle Ollama doesn't show up in provider list
Solutions:
- Refresh the page (Ctrl+F5 or Cmd+Shift+R)
- Check browser console for errors (F12)
- Verify provider was saved correctly
- Re-add the provider
Connection Timeout
Problem: Perplexica can't connect to Seattle Ollama
Check connectivity:

```bash
# From the Perplexica container
docker exec perplexica curl -m 5 http://100.82.197.124:11434/api/tags
```

Solutions:
- Verify Tailscale is running on both machines: `tailscale status`
- Check if Seattle Ollama is running: `ssh seattle-tailscale "docker ps | grep ollama"`
- Test from the homelab host: `curl http://100.82.197.124:11434/api/tags`
No Models Available
Problem: Provider added but no models show up
Solution: Pull a model on Seattle:
```bash
ssh seattle-tailscale "docker exec ollama-seattle ollama pull qwen2.5:1.5b"
```
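This check-then-pull step can be wrapped into one idempotent helper. A sketch using the SSH alias (`seattle-tailscale`) and container name (`ollama-seattle`) from this guide; `ensure_model` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: pull a model on Seattle only if /api/tags does not already list it.
# Host alias and container name are the ones used in this guide;
# the function name is illustrative.
ensure_model() {
  local url="$1" model="$2"
  if curl -sf -m 5 "$url/api/tags" | grep -q "\"name\":\"$model"; then
    echo "model $model already present"
  else
    ssh seattle-tailscale "docker exec ollama-seattle ollama pull $model"
  fi
}

# Uncomment to run:
# ensure_model http://100.82.197.124:11434 qwen2.5:1.5b
```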
Slow Responses
Problem: Seattle Ollama is slower than expected
Causes:
- Seattle VM uses CPU-only inference (no GPU)
- Network latency over Tailscale
- Model too large for CPU
Solutions:
- Use smaller models (1.5B or 3B)
- Stick to primary Ollama for time-sensitive queries
- Use Seattle Ollama for background/batch queries
Performance Comparison
Expected Response Times
| Setup | Tokens/Second | Notes |
|---|---|---|
| Atlantis Ollama (GPU) | 50-100+ | Much faster with GPU |
| Seattle Ollama (CPU) | 8-12 | Usable for small models only |
| Cloud APIs (OpenAI, etc.) | 30-60 | Fast but costs money |
When to Use Each
Use Atlantis Ollama (Primary):
- Real-time searches
- Large models (7B+)
- When GPU acceleration is beneficial
Use Seattle Ollama (Secondary):
- Load balancing during heavy usage
- Backup when primary is down
- Testing new models
- When primary is busy
Advanced Configuration
Load Balancing Strategy
To automatically distribute load:
- Configure both Ollama instances
- Use smaller models on Seattle (1.5B, 3B)
- Reserve larger models (7B+) for Atlantis
- Manually switch based on load
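The manual switching in step 4 can be approximated with a reachability probe that returns the first endpoint that answers. A sketch; `pick_endpoint` is a hypothetical name, and the Atlantis URL is a placeholder since this guide does not list it:

```shell
#!/usr/bin/env bash
# Sketch: print the first Ollama endpoint whose /api/tags answers.
# ATLANTIS_OLLAMA is a placeholder (its address is not given in this guide).
pick_endpoint() {
  local ep
  for ep in "$@"; do
    if curl -sf -m 3 "$ep/api/tags" > /dev/null 2>&1; then
      echo "$ep"
      return 0
    fi
  done
  return 1
}

# Uncomment to run (replace the first URL with Atlantis's real address):
# pick_endpoint "$ATLANTIS_OLLAMA" http://100.82.197.124:11434
```

Listing Atlantis first preserves the primary/secondary ordering above: Seattle is only chosen when the primary does not respond.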
Model Recommendations by Instance
Atlantis Ollama (GPU):
- `mistral:7b` - Best quality
- `codellama:7b` - Code tasks
- `llama3:8b` - General purpose
Seattle Ollama (CPU):
- `qwen2.5:1.5b` - Very fast, light
- `qwen2.5:3b` - Good balance
- `phi3:3.8b` - Efficient
Monitoring
Track which instance is being used:
```bash
# Watch Atlantis Ollama logs
ssh atlantis "docker logs -f ollama"

# Watch Seattle Ollama logs
ssh seattle-tailscale "docker logs -f ollama-seattle"
```
Cost Analysis
Before Integration
- Single Ollama instance (Atlantis)
- Risk of overload during heavy usage
- Single point of failure
After Integration
- Distributed inference capacity
- No additional ongoing costs (VPS already paid for)
- Redundancy built in
- Can scale by adding more instances
vs Cloud APIs
| Scenario | Cloud API Cost | Self-Hosted Cost |
|---|---|---|
| 1M tokens/month | $0.15-0.60 | $0 (already running) |
| 10M tokens/month | $1.50-6.00 | $0 |
| 100M tokens/month | $15-60 | $0 |
Security Considerations
Current Setup
- Ollama accessible only via Tailscale
- No public internet exposure
- No authentication required (trusted network)
Recommended Enhancements
- Tailscale ACLs: Restrict which devices can access Ollama
- Reverse Proxy: Add Nginx with basic auth
- Rate Limiting: Prevent abuse
- Monitoring: Alert on unusual usage patterns
Maintenance
Regular Tasks
Weekly:
- Check Ollama is running: `docker ps | grep ollama`
- Verify connectivity: `curl http://100.82.197.124:11434/api/tags`

Monthly:
- Update Ollama image: `docker pull ollama/ollama:latest`
- Clean up unused models: `ollama list` and `ollama rm <model>`
- Check disk space: `df -h`
As Needed:
- Pull new models based on usage patterns
- Adjust resource limits if performance issues
- Update Perplexica when new versions release
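The weekly checks above can be collected into one read-only script run from the homelab host. A sketch; `weekly_check` is a hypothetical name, while the host alias and endpoint are the ones used in this guide:

```shell
#!/usr/bin/env bash
# Sketch of the weekly maintenance tasks above; checks only, changes nothing.
weekly_check() {
  # Is the Ollama container up on Seattle?
  ssh seattle-tailscale "docker ps | grep ollama" \
    || echo "WARN: ollama container not listed on Seattle"
  # Is the API reachable over Tailscale?
  if curl -sf -m 5 http://100.82.197.124:11434/api/tags > /dev/null; then
    echo "OK: Ollama API reachable"
  else
    echo "WARN: Ollama API unreachable"
  fi
}

# Uncomment to run:
# weekly_check
```

A cron entry on the homelab host could run this weekly and mail the WARN lines, but that scheduling is left to local preference.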
Related Documentation
- Ollama Seattle Setup - Full Seattle Ollama documentation
- Perplexica Service - Main Perplexica documentation
- Seattle VM Overview - Seattle server details
Changelog
February 16, 2026
- Initial setup: Deployed Ollama on Seattle VM
- Model: Pulled `qwen2.5:1.5b`
- Integration: Configured Perplexica to use Seattle Ollama
- Documentation: Created this guide
Attempted vLLM (Failed)
- Tried `vllm/vllm-openai:latest` for CPU inference
- Failed with device detection errors
- vLLM not suitable for CPU-only systems
- Switched to Ollama successfully
Status: 🔴 Performance Issues - Use Groq API instead
Last Updated: February 16, 2026
Maintained By: Manual Configuration
See PERPLEXICA_STATUS.md for current operational status.