Perplexica Performance Troubleshooting

Issue Summary

Perplexica search queries were taking about 10 minutes each with CPU-based Ollama inference on the Seattle VM.

Timeline of Solutions Attempted

1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

  • Result: 10 minutes per search query
  • Problem: CPU inference too slow, Seattle load average 9.82, Ollama using 937% CPU
  • Metrics:
    • Chat requests: 16-28 seconds each
    • Generate requests: 2+ minutes each

2. Switched to TinyLlama:1.1b

  • Model Size: 608MB (vs 940MB for Qwen2.5)
  • Speed: 12 seconds per response
  • Improvement: 50x faster than Qwen2.5
  • Quality: Lower quality responses
  • Status: Works but still slow

3. Switched to Groq API (Current)

  • Model: llama-3.3-70b-versatile
  • Speed: 0.4 seconds per response
  • Quality: Excellent (70B model)
  • Cost: Free tier (30 req/min, 14,400/day)
  • Status: Configured but user reports not working

Current Configuration

Perplexica Config (config.json)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "gsk_REDACTED"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "defaultChatModel": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
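A quick sanity check can catch the misconfigurations covered in Common Issues before restarting the container. This is a sketch that assumes the config shape shown above (`modelProviders` / `chatModels` / `embeddingModels`); it is not based on an official Perplexica schema.

```python
import json


def check_config(cfg: dict) -> list[str]:
    """Return a list of problems in a Perplexica-style config.
    The field names are assumed from the example config above,
    not taken from an official schema."""
    problems = []
    chat_keys, embed_keys = set(), set()
    for provider in cfg.get("modelProviders", []):
        chat_keys |= {m["key"] for m in provider.get("chatModels", [])}
        embed_keys |= {m["key"] for m in provider.get("embeddingModels", [])}
    # An embedding model listed under chatModels triggers the
    # "does not support chat" error.
    for key in chat_keys & embed_keys:
        problems.append(f"{key} listed as both chat and embedding model")
    default_embed = cfg.get("defaultEmbeddingModel")
    if default_embed and default_embed not in embed_keys:
        problems.append(f"defaultEmbeddingModel {default_embed} not configured")
    return problems


if __name__ == "__main__":
    with open("config.json") as f:
        for problem in check_config(json.load(f)):
            print("WARN:", problem)
```

Run it against a copy of the config pulled out of the Docker volume (see the commands further down).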

Seattle Ollama Models

```shell
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

  • tinyllama:1.1b (608MB) - Fast CPU inference
  • qwen2.5:1.5b (940MB) - Slow but better quality
  • nomic-embed-text:latest (261MB) - For embeddings
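The model list above can be verified programmatically through Ollama's `/api/tags` endpoint. A small helper, with the required-model set taken from this guide's setup:

```python
import json
import urllib.request

# Models the setup in this guide expects on the Seattle host.
REQUIRED = {"tinyllama:1.1b", "nomic-embed-text:latest"}


def missing_models(tags: dict) -> set:
    """Given the JSON body of Ollama's /api/tags response,
    return required models that are not installed."""
    installed = {m["name"] for m in tags.get("models", [])}
    return REQUIRED - installed


def check_host(base_url: str = "http://100.82.197.124:11434") -> set:
    """Fetch the tag list from the Seattle Ollama and diff it."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return missing_models(json.load(resp))
```

An empty set from `check_host()` means all three expected models are pulled.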

Performance Comparison

| Configuration | Chat Speed | Quality | Notes |
|---------------|-----------|---------|-------|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |

Common Issues

Issue: "nomic-embed-text:latest does not support chat"

  • Cause: Config has embedding model listed as chat model
  • Fix: Ensure embedding models are only in embeddingModels array

Issue: Browser shows old model selections

  • Cause: Browser cache
  • Fix: Clear browser cache (Ctrl+F5) and close all tabs

Issue: Database retains old conversations

  • Fix: Clear database:

```shell
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```

Issue: Config reverts after restart

  • Cause: Config is in Docker volume, not git-tracked file
  • Fix: Update config in volume:

```shell
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```

Testing

Test SearXNG (from inside container)

```shell
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```

Test Seattle Ollama

```shell
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```

Test Groq API

```shell
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```

Check Perplexica Config

```shell
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```

Recommendations

  1. Use Groq for chat (0.4s response time, excellent quality)
  2. Use Seattle Ollama for embeddings (nomic-embed-text:latest)
  3. Keep TinyLlama as a fallback (in case Groq rate limits are hit)
  4. Monitor Groq rate limits (30 req/min on free tier)
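One way to monitor the free-tier limit client-side is a sliding-window counter sized to the 30 req/min figure above. This is an illustrative sketch, not part of any official Groq SDK; the clock is injectable so the logic can be tested without waiting a real minute.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for Groq's free tier (30 requests/minute,
    per the limits quoted above). Illustrative sketch only."""

    def __init__(self, max_requests: int = 30, window_s: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock       # injectable for testing
        self._sent = deque()     # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request if under the limit. A False result means
        back off, or fall through to the TinyLlama fallback."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

Calling `try_acquire()` before each Groq request keeps the client from ever tripping a 429 on the free tier.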

Alternative Solutions

If Groq doesn't work:

  1. OpenRouter API: Similar to Groq, multiple models
  2. Anthropic Claude: Via API (costs money)
  3. Local GPU: Move Ollama to GPU-enabled host
  4. Accept slow performance: Use TinyLlama with 12s responses
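The fallback order above (Groq first, then Seattle Ollama) can be sketched as a simple try-in-order loop. The provider callables here are placeholders: in practice they would wrap the real Groq and Ollama HTTP calls shown in the Testing section.

```python
def ask_with_fallback(prompt, providers):
    """Try each (name, call) pair in order and return the first
    successful (name, answer). 'providers' would wrap the real Groq
    and Seattle Ollama requests; any exception moves to the next one."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # rate limit, timeout, connection error
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

With Groq first and TinyLlama second, a Groq outage degrades to 12-second responses instead of failing outright.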

Status

  • Groq API key configured
  • Groq API responding in 0.4s
  • Config updated in Perplexica
  • User reports web UI still not working (needs investigation)

Next Steps

  1. Test from web UI and capture exact error message
  2. Check browser console for JavaScript errors
  3. Check Perplexica logs during failed search
  4. Verify Groq API calls in network tab
  5. Consider switching to different LLM provider if Groq incompatible
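For step 3, a small helper can pull recent container logs and keep only failure-looking lines during a failed search. The marker list is a heuristic assumption, not taken from Perplexica's actual log format; adjust it to whatever the container really emits.

```python
import subprocess

# Heuristic markers; adjust to whatever Perplexica actually logs.
ERROR_MARKERS = ("error", "exception", "failed", "timeout")


def error_lines(log_text: str) -> list:
    """Filter raw log text down to lines that look like failures."""
    return [line for line in log_text.splitlines()
            if any(marker in line.lower() for marker in ERROR_MARKERS)]


def perplexica_errors(tail: int = 200) -> list:
    """Fetch recent container logs (docker writes to both streams)."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), "perplexica"],
        capture_output=True, text=True)
    return error_lines(result.stdout + result.stderr)
```

Run `perplexica_errors()` immediately after reproducing a failed search from the web UI, then compare against the browser console output from step 2.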