# Perplexica Performance Troubleshooting

## Issue Summary

Perplexica search queries were taking 10 minutes each with CPU-based Ollama inference on the Seattle VM.

## Timeline of Solutions Attempted

### 1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

- **Result**: 10 minutes per search query
- **Problem**: CPU inference too slow; Seattle load average 9.82, Ollama at 937% CPU
- **Metrics**:
  - Chat requests: 16-28 seconds each
  - Generate requests: 2+ minutes each

### 2. Switched to TinyLlama:1.1b

- **Model Size**: 608 MB (vs. 940 MB for Qwen2.5)
- **Speed**: 12 seconds per response
- **Improvement**: 50x faster than Qwen2.5 (600 s → 12 s per response)
- **Quality**: Lower-quality responses
- **Status**: Works, but still slow for interactive search

### 3. Switched to Groq API (Current)

- **Model**: llama-3.3-70b-versatile
- **Speed**: 0.4 seconds per response
- **Quality**: Excellent (70B model)
- **Cost**: Free tier (30 requests/min, 14,400 requests/day)
- **Status**: Configured, but the user reports it is not working

## Current Configuration

### Perplexica Config (`config.json`)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "gsk_YOUR_API_KEY"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "defaultChatModel": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
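
For scripted checks against this config, a small helper can resolve which provider serves the default chat model. This is a hypothetical sketch, not part of Perplexica: it assumes the default chat model is stored under a `defaultChatModel` key, mirroring `defaultEmbeddingModel`.

```python
import json


def resolve_default_chat_provider(config: dict) -> dict:
    """Find the provider that serves the configured default chat model.

    Assumes the config.json layout above; Perplexica's internal
    lookup may differ.
    """
    default = config.get("defaultChatModel")
    for provider in config.get("modelProviders", []):
        for model in provider.get("chatModels", []):
            if model.get("key") == default:
                return {"provider": provider["id"], "model": default}
    raise LookupError(f"No provider serves chat model {default!r}")
```

Feed it the output of the "Check Perplexica Config" command below (e.g. `json.loads(...)`) to confirm the default model is actually routed to Groq.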

### Seattle Ollama Models

```bash
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

- `tinyllama:1.1b` (608 MB) - fast CPU inference
- `qwen2.5:1.5b` (940 MB) - slower, but better quality
- `nomic-embed-text:latest` (261 MB) - for embeddings

## Performance Comparison

| Configuration | Chat Speed | Quality | Notes |
|--------------|------------|---------|-------|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |
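
The table's speed figures imply the following relative speedups, which can be verified with quick arithmetic (times converted to milliseconds so the ratios stay exact):

```python
# Response times from the table, in milliseconds, so the ratios stay exact.
qwen_cpu_ms = 10 * 60 * 1000   # 10 minutes
tinyllama_cpu_ms = 12 * 1000   # 12 seconds
groq_ms = 400                  # 0.4 seconds

print(qwen_cpu_ms // tinyllama_cpu_ms)  # 50   (TinyLlama vs. Qwen2.5 on CPU)
print(tinyllama_cpu_ms // groq_ms)      # 30   (Groq vs. TinyLlama)
print(qwen_cpu_ms // groq_ms)           # 1500 (Groq vs. Qwen2.5 on CPU)
```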

## Common Issues

### Issue: "nomic-embed-text:latest does not support chat"

- **Cause**: The config lists an embedding model as a chat model
- **Fix**: Ensure embedding models appear only in the `embeddingModels` array, never in `chatModels`
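
This check can be automated. The sketch below assumes the `config.json` layout shown in "Current Configuration" and is not part of Perplexica; a non-empty result means some embedding model is misfiled as a chat model:

```python
def misplaced_embedding_models(config: dict) -> list[str]:
    """Return embedding-model keys that also appear in some chatModels list.

    Hypothetical lint for the config.json layout shown earlier; any key it
    returns would trigger the "does not support chat" error.
    """
    embedding_keys = {
        m["key"]
        for p in config.get("modelProviders", [])
        for m in p.get("embeddingModels", [])
    }
    chat_keys = {
        m["key"]
        for p in config.get("modelProviders", [])
        for m in p.get("chatModels", [])
    }
    return sorted(embedding_keys & chat_keys)
```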

### Issue: Browser shows old model selections

- **Cause**: Browser cache
- **Fix**: Hard-refresh (Ctrl+F5) and close all open Perplexica tabs

### Issue: Database retains old conversations

- **Fix**: Clear the database:

```bash
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```

### Issue: Config reverts after restart

- **Cause**: The config lives in a Docker volume, not in a git-tracked file
- **Fix**: Copy the updated config into the volume:

```bash
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```

## Testing

### Test SearXNG (from inside the container)

```bash
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```

### Test Seattle Ollama

```bash
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```

### Test Groq API

```bash
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```
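
For scripting, the same call can be built in Python. The helper below only constructs the request (the endpoint and field names are Groq's OpenAI-compatible ones, matching the curl test above); sending it is left to an HTTP client:

```python
def build_groq_chat_request(api_key: str, prompt: str,
                            model: str = "llama-3.3-70b-versatile",
                            max_tokens: int = 50) -> tuple[str, dict, dict]:
    """Return (url, headers, json_body) for a Groq chat completion."""
    url = "https://api.groq.com/openai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, headers, body
```

Send it with, e.g., `requests.post(url, headers=headers, json=body, timeout=30)` and inspect the JSON response's `choices[0].message.content`.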

### Check Perplexica Config

```bash
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```

## Recommendations

1. **Use Groq for chat** (0.4 s response time, excellent quality)
2. **Use Seattle Ollama for embeddings** (nomic-embed-text:latest)
3. **Keep TinyLlama as a fallback** (in case Groq rate limits are hit)
4. **Monitor Groq rate limits** (30 requests/min on the free tier)
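
A client-side guard can help stay under the free-tier limit. This is an illustrative sliding-window sketch, not Groq's own mechanism (Groq enforces limits server-side):

```python
import time
from collections import deque


class RateLimitGuard:
    """Client-side sliding-window guard (30 requests/min by default).

    Illustrative only: it just avoids tripping the server-side limit
    by refusing to send once the window is full.
    """

    def __init__(self, max_requests: int = 30, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._sent: deque = deque()  # timestamps of recent requests

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

Call `guard.allow()` before each Groq request and back off (or fall back to TinyLlama) when it returns `False`.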

## Alternative Solutions

If Groq doesn't work:

1. **OpenRouter API**: Similar to Groq, offers many models
2. **Anthropic Claude**: Via API (paid)
3. **Local GPU**: Move Ollama to a GPU-enabled host
4. **Accept slow performance**: Use TinyLlama with 12-second responses
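
The fallback idea can be sketched as a simple provider chain. All names here are illustrative (`call` stands in for a real provider client); this is not Perplexica's actual routing logic:

```python
# Try Groq first, then fall back to Seattle TinyLlama.
PROVIDERS = [
    ("groq", "llama-3.3-70b-versatile"),
    ("seattle-ollama", "tinyllama:1.1b"),
]


def chat_with_fallback(prompt: str, call) -> tuple[str, str]:
    """Return (provider_id, response), trying providers in order."""
    last_error = None
    for provider_id, model in PROVIDERS:
        try:
            return provider_id, call(provider_id, model, prompt)
        except Exception as exc:  # rate limit, timeout, network error...
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```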

## Status

- ✅ Groq API key configured
- ✅ Groq API responding in 0.4 s
- ✅ Config updated in Perplexica
- ❌ User reports the web UI is still not working (needs investigation)

## Next Steps

1. Test from the web UI and capture the exact error message
2. Check the browser console for JavaScript errors
3. Check Perplexica logs during a failed search
4. Verify Groq API calls in the browser's network tab
5. Consider switching to a different LLM provider if Groq proves incompatible