# Perplexica Performance Troubleshooting

## Issue Summary

Perplexica search queries were taking roughly 10 minutes each with CPU-based Ollama inference on the Seattle VM.
## Timeline of Solutions Attempted

### 1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

- Result: 10 minutes per search query
- Problem: CPU inference too slow; Seattle load average 9.82, Ollama using 937% CPU
- Metrics:
  - Chat requests: 16-28 seconds each
  - Generate requests: 2+ minutes each
### 2. Switched to TinyLlama:1.1b

- Model size: 608 MB (vs. 940 MB for Qwen2.5)
- Speed: 12 seconds per response
- Improvement: ~50x faster than Qwen2.5
- Quality: Lower-quality responses
- Status: Works but still slow
### 3. Switched to Groq API (Current)

- Model: llama-3.3-70b-versatile
- Speed: 0.4 seconds per response
- Quality: Excellent (70B model)
- Cost: Free tier (30 requests/min, 14,400/day)
- Status: Configured, but the user reports it is not working
## Current Configuration

### Perplexica Config (config.json)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "gsk_REDACTED"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "REDACTED_APP_PASSWORD": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
### Seattle Ollama Models

```shell
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

- tinyllama:1.1b (608 MB) - Fast CPU inference
- qwen2.5:1.5b (940 MB) - Slow but better quality
- nomic-embed-text:latest (261 MB) - For embeddings
## Performance Comparison
| Configuration | Chat Speed | Quality | Notes |
|---|---|---|---|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |
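The speedup figures above follow directly from the measured times; a quick sanity check, assuming ~600 s per Qwen2.5 query, 12 s for TinyLlama, and 0.4 s for Groq:

```shell
# Speedup arithmetic for the table above, using the measured times:
# ~600 s per query for Qwen2.5 on CPU, 12 s for TinyLlama, 0.4 s for Groq.
awk 'BEGIN {
  qwen = 600; tiny = 12; groq = 0.4
  printf "TinyLlama vs Qwen2.5: %.0fx\n", qwen / tiny
  printf "Groq vs TinyLlama:    %.0fx\n", tiny / groq
  printf "Groq vs Qwen2.5:      %.0fx\n", qwen / groq
}'
```

This also puts Groq at roughly 1500x the original Qwen2.5-on-CPU setup.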
## Common Issues

### Issue: "nomic-embed-text:latest does not support chat"

- Cause: The config lists an embedding model as a chat model
- Fix: Ensure embedding models appear only in the `embeddingModels` array
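One way to catch this misconfiguration is to look for model keys that appear in both arrays. This is a sketch against a made-up sample file (`/tmp/sample-config.json` stands in for the real config):

```shell
# Write a sample config that reproduces the bug, then flag any model key
# that appears in both chatModels and embeddingModels of the same provider.
cat > /tmp/sample-config.json <<'EOF'
{
  "modelProviders": [
    {
      "id": "seattle-ollama",
      "chatModels": [{"key": "nomic-embed-text:latest"}],
      "embeddingModels": [{"key": "nomic-embed-text:latest"}]
    }
  ]
}
EOF
jq -r '.modelProviders[]
       | (.chatModels // [] | map(.key)) as $chat
       | (.embeddingModels // [] | map(.key)) as $embed
       | ($chat - ($chat - $embed))[]
       | "WARNING: \(.) is listed as both a chat and an embedding model"' \
  /tmp/sample-config.json
```

The same jq filter can be pointed at the real config pulled out of the Docker volume.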
### Issue: Browser shows old model selections

- Cause: Browser cache
- Fix: Hard-refresh the page (Ctrl+F5) and close all open Perplexica tabs
### Issue: Database retains old conversations

- Fix: Clear the database:

```shell
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```
### Issue: Config reverts after restart

- Cause: The config lives in a Docker volume, not in a git-tracked file
- Fix: Update the config inside the volume:

```shell
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```
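Since a malformed config in the volume can leave Perplexica unable to start, it may be worth validating the JSON before copying it in. A minimal sketch; `update_perplexica_config` is a hypothetical helper, not part of Perplexica:

```shell
# Hypothetical helper: refuse to overwrite the config in the Docker volume
# unless the new file parses as valid JSON.
update_perplexica_config() {
  local src="$1"
  if ! jq empty "$src" 2>/dev/null; then
    echo "Refusing to copy: $src is not valid JSON" >&2
    return 1
  fi
  docker run --rm -v perplexica-data:/data -v "$(dirname "$src"):/src" alpine \
    cp "/src/$(basename "$src")" /data/config.json
  docker restart perplexica
}

# Usage: update_perplexica_config /tmp/config.json
```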
## Testing

### Test SearXNG (from inside the container)

```shell
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```
### Test Seattle Ollama

```shell
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```
### Test Groq API

```shell
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```
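If the call above prints nothing, the jq path is probably hitting an error payload rather than choices; OpenAI-compatible APIs such as Groq's report failures as `{"error": {"message": ...}}`. A small sketch that surfaces either case (`parse_groq_reply` is a hypothetical helper):

```shell
# Hypothetical helper: print the assistant reply on success, or the API
# error message (e.g. invalid key, rate limit) on failure.
# Reads the raw response body on stdin.
parse_groq_reply() {
  jq -r '.choices[0].message.content // ("ERROR: " + .error.message) // "unexpected response"'
}

# Example: pipe the curl output above through it:
#   curl -s ... | parse_groq_reply
```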
### Check Perplexica Config

```shell
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```
## Recommendations
- Use Groq for chat (0.4s response time, excellent quality)
- Use Seattle Ollama for embeddings (nomic-embed-text:latest)
- Keep TinyLlama as fallback (if Groq rate limits hit)
- Monitor Groq rate limits (30 req/min on free tier)
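To stay under the free-tier limit, failed calls can be retried with exponential backoff rather than hammering the API. A sketch; `retry_with_backoff` is a hypothetical helper, not a Groq-provided tool:

```shell
# Hypothetical helper: retry a command with exponential backoff (1 s, 2 s,
# 4 s, ...), useful when the Groq free tier returns HTTP 429 rate-limit errors.
retry_with_backoff() {
  local max_attempts=$1; shift
  local attempt=1 delay=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (curl -f makes HTTP errors such as 429 count as failures):
#   retry_with_backoff 5 curl -sf https://api.groq.com/openai/v1/models \
#     -H "Authorization: Bearer YOUR_API_KEY"
```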
## Alternative Solutions
If Groq doesn't work:
- OpenRouter API: Similar to Groq, multiple models
- Anthropic Claude: Via API (costs money)
- Local GPU: Move Ollama to GPU-enabled host
- Accept slow performance: Use TinyLlama with 12s responses
## Status
- ✅ Groq API key configured
- ✅ Groq API responding in 0.4s
- ✅ Config updated in Perplexica
- ❌ User reports web UI still not working (needs investigation)
## Next Steps
- Test from web UI and capture exact error message
- Check browser console for JavaScript errors
- Check Perplexica logs during failed search
- Verify Groq API calls in network tab
- Consider switching to a different LLM provider if Groq proves incompatible with Perplexica