# Perplexica Performance Troubleshooting

## Issue Summary

Perplexica search queries were taking roughly 10 minutes each with CPU-based Ollama inference on the Seattle VM.
## Timeline of Solutions Attempted

### 1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

- Result: 10 minutes per search query
- Problem: CPU inference too slow; Seattle load average 9.82, Ollama using 937% CPU
- Metrics:
  - Chat requests: 16-28 seconds each
  - Generate requests: 2+ minutes each
### 2. Switched to TinyLlama:1.1b

- Model size: 608 MB (vs. 940 MB for Qwen2.5)
- Speed: 12 seconds per response
- Improvement: ~50x faster than Qwen2.5
- Quality: Lower-quality responses
- Status: Works but still slow
### 3. Switched to Groq API (Current)

- Model: llama-3.3-70b-versatile
- Speed: 0.4 seconds per response
- Quality: Excellent (70B model)
- Cost: Free tier (30 requests/min, 14,400/day)
- Status: Configured, but the user reports it is not working
## Current Configuration

### Perplexica Config (config.json)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "gsk_REDACTED"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "REDACTED_APP_PASSWORD": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
### Seattle Ollama Models

```shell
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

- tinyllama:1.1b (608 MB) - Fast CPU inference
- qwen2.5:1.5b (940 MB) - Slow but better quality
- nomic-embed-text:latest (261 MB) - For embeddings
## Performance Comparison
| Configuration | Chat Speed | Quality | Notes |
|---|---|---|---|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |
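The speedup figures above follow directly from the measured times; a quick sanity check, assuming ~600 s per Qwen2.5 query, 12 s for TinyLlama, and 0.4 s for Groq:

```shell
# Speedup arithmetic for the table above, using the measured times:
# ~600 s per query for Qwen2.5 on CPU, 12 s for TinyLlama, 0.4 s for Groq.
awk 'BEGIN {
  qwen = 600; tiny = 12; groq = 0.4
  printf "TinyLlama vs Qwen2.5: %.0fx\n", qwen / tiny
  printf "Groq vs TinyLlama:    %.0fx\n", tiny / groq
  printf "Groq vs Qwen2.5:      %.0fx\n", qwen / groq
}'
```

This also puts Groq at roughly 1500x the original Qwen2.5-on-CPU setup.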
## Common Issues

### Issue: "nomic-embed-text:latest does not support chat"

- Cause: The config lists an embedding model as a chat model
- Fix: Ensure embedding models appear only in the `embeddingModels` array
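One way to catch this misconfiguration is to look for model keys that appear in both arrays. This is a sketch against a made-up sample file (`/tmp/sample-config.json` stands in for the real config):

```shell
# Write a sample config that reproduces the bug, then flag any model key
# that appears in both chatModels and embeddingModels of the same provider.
cat > /tmp/sample-config.json <<'EOF'
{
  "modelProviders": [
    {
      "id": "seattle-ollama",
      "chatModels": [{"key": "nomic-embed-text:latest"}],
      "embeddingModels": [{"key": "nomic-embed-text:latest"}]
    }
  ]
}
EOF
jq -r '.modelProviders[]
       | (.chatModels // [] | map(.key)) as $chat
       | (.embeddingModels // [] | map(.key)) as $embed
       | ($chat - ($chat - $embed))[]
       | "WARNING: \(.) is listed as both a chat and an embedding model"' \
  /tmp/sample-config.json
```

The same jq filter can be pointed at the real config pulled out of the Docker volume.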
### Issue: Browser shows old model selections

- Cause: Browser cache
- Fix: Hard-refresh the page (Ctrl+F5) and close all open Perplexica tabs
### Issue: Database retains old conversations

- Fix: Clear the database:

```shell
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```
### Issue: Config reverts after restart

- Cause: The config lives in a Docker volume, not in a git-tracked file
- Fix: Update the config inside the volume:

```shell
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```
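Since a malformed config in the volume can leave Perplexica unable to start, it may be worth validating the JSON before copying it in. A minimal sketch; `update_perplexica_config` is a hypothetical helper, not part of Perplexica:

```shell
# Hypothetical helper: refuse to overwrite the config in the Docker volume
# unless the new file parses as valid JSON.
update_perplexica_config() {
  local src="$1"
  if ! jq empty "$src" 2>/dev/null; then
    echo "Refusing to copy: $src is not valid JSON" >&2
    return 1
  fi
  docker run --rm -v perplexica-data:/data -v "$(dirname "$src"):/src" alpine \
    cp "/src/$(basename "$src")" /data/config.json
  docker restart perplexica
}

# Usage: update_perplexica_config /tmp/config.json
```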
## Testing

### Test SearXNG (from inside the container)

```shell
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```
### Test Seattle Ollama

```shell
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```
### Test Groq API

```shell
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```
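If the call above prints nothing, the jq path is probably hitting an error payload rather than choices; OpenAI-compatible APIs such as Groq's report failures as `{"error": {"message": ...}}`. A small sketch that surfaces either case (`parse_groq_reply` is a hypothetical helper):

```shell
# Hypothetical helper: print the assistant reply on success, or the API
# error message (e.g. invalid key, rate limit) on failure.
# Reads the raw response body on stdin.
parse_groq_reply() {
  jq -r '.choices[0].message.content // ("ERROR: " + .error.message) // "unexpected response"'
}

# Example: pipe the curl output above through it:
#   curl -s ... | parse_groq_reply
```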
### Check Perplexica Config

```shell
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```
## Recommendations
- Use Groq for chat (0.4s response time, excellent quality)
- Use Seattle Ollama for embeddings (nomic-embed-text:latest)
- Keep TinyLlama as fallback (if Groq rate limits hit)
- Monitor Groq rate limits (30 req/min on free tier)
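To stay under the free-tier limit, failed calls can be retried with exponential backoff rather than hammering the API. A sketch; `retry_with_backoff` is a hypothetical helper, not a Groq-provided tool:

```shell
# Hypothetical helper: retry a command with exponential backoff (1 s, 2 s,
# 4 s, ...), useful when the Groq free tier returns HTTP 429 rate-limit errors.
retry_with_backoff() {
  local max_attempts=$1; shift
  local attempt=1 delay=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (curl -f makes HTTP errors such as 429 count as failures):
#   retry_with_backoff 5 curl -sf https://api.groq.com/openai/v1/models \
#     -H "Authorization: Bearer YOUR_API_KEY"
```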
## Alternative Solutions
If Groq doesn't work:
- OpenRouter API: Similar to Groq, multiple models
- Anthropic Claude: Via API (costs money)
- Local GPU: Move Ollama to GPU-enabled host
- Accept slow performance: Use TinyLlama with 12s responses
## Status
- ✅ Groq API key configured
- ✅ Groq API responding in 0.4s
- ✅ Config updated in Perplexica
- ❌ User reports web UI still not working (needs investigation)
## Next Steps
- Test from web UI and capture exact error message
- Check browser console for JavaScript errors
- Check Perplexica logs during failed search
- Verify Groq API calls in network tab
- Consider switching to a different LLM provider if Groq proves incompatible with Perplexica