Sanitized mirror from private repository - 2026-04-18 11:19:59 UTC
docs/guides/PERPLEXICA_TROUBLESHOOTING.md

# Perplexica Performance Troubleshooting

## Issue Summary

Perplexica search queries were taking roughly 10 minutes each with CPU-based Ollama inference on the Seattle VM.

## Timeline of Solutions Attempted

### 1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

- **Result**: ~10 minutes per search query
- **Problem**: CPU inference was too slow; Seattle's load average hit 9.82, with Ollama using 937% CPU
- **Metrics**:
  - Chat requests: 16-28 seconds each
  - Generate requests: 2+ minutes each

### 2. Switched to TinyLlama:1.1b

- **Model size**: 608MB (vs. 940MB for Qwen2.5)
- **Speed**: ~12 seconds per response
- **Improvement**: roughly 50x faster than Qwen2.5
- **Quality**: noticeably lower-quality responses
- **Status**: works, but still slow

### 3. Switched to Groq API (Current)

- **Model**: llama-3.3-70b-versatile
- **Speed**: ~0.4 seconds per response
- **Quality**: excellent (70B model)
- **Cost**: free tier (30 req/min, 14,400 req/day)
- **Status**: configured, but the user reports it is not working

## Current Configuration

### Perplexica Config (`config.json`)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "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"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "REDACTED_APP_PASSWORD": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
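
Before loading a new `config.json`, it can save a restart cycle to sanity-check the file with `jq`. A minimal sketch, assuming `jq` is available and the candidate file is staged at `/tmp/config.json` (the trimmed sample written below is illustrative, not the full config):

```bash
# Stage a trimmed sample config, then verify syntax and shape with jq.
# In practice you would point these checks at your real staged config.json.
cat > /tmp/config.json <<'EOF'
{
  "modelProviders": [
    {
      "id": "groq-provider",
      "type": "openai",
      "config": { "baseURL": "https://api.groq.com/openai/v1" }
    }
  ],
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
EOF

# jq -e exits non-zero on invalid JSON or a null/false result,
# so this catches both syntax errors and a missing provider entry.
jq -e '.modelProviders[] | select(.id == "groq-provider") | .config.baseURL' /tmp/config.json \
  && echo "config OK"
```

If the provider entry is missing or the JSON is malformed, the `jq -e` step fails and "config OK" is never printed.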

### Seattle Ollama Models

```bash
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

- `tinyllama:1.1b` (608MB) - fast CPU inference
- `qwen2.5:1.5b` (940MB) - slow, but better quality
- `nomic-embed-text:latest` (261MB) - for embeddings

## Performance Comparison

| Configuration | Chat Speed | Quality | Notes |
|--------------|------------|---------|-------|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |
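
As a sanity check, the speedup factors implied by the table can be reproduced with shell arithmetic (the figures are taken from the table above; integer math is done in milliseconds):

```bash
# Rough speedup ratios from the measurements above.
qwen_ms=600000   # ~10 minutes per response
tiny_ms=12000    # ~12 seconds per response
groq_ms=400      # ~0.4 seconds per response

echo "TinyLlama vs Qwen2.5: $(( qwen_ms / tiny_ms ))x"   # 50x
echo "Groq vs TinyLlama:    $(( tiny_ms / groq_ms ))x"   # 30x
echo "Groq vs Qwen2.5:      $(( qwen_ms / groq_ms ))x"   # 1500x
```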

## Common Issues

### Issue: "nomic-embed-text:latest does not support chat"

- **Cause**: the config lists an embedding model as a chat model
- **Fix**: ensure embedding models appear only in the `embeddingModels` array
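
This misconfiguration can be caught mechanically by scanning the config for chat-model keys that look like embedding models. A rough `jq` sketch (the `embed` substring heuristic is an assumption, not an official Perplexica rule, and the sample config below is illustrative):

```bash
# Write a small sample config so the check is self-contained.
cat > /tmp/check-config.json <<'EOF'
{
  "modelProviders": [
    {
      "id": "seattle-ollama",
      "chatModels": [
        { "key": "tinyllama:1.1b" },
        { "key": "nomic-embed-text:latest" }
      ],
      "embeddingModels": [ { "key": "nomic-embed-text:latest" } ]
    }
  ]
}
EOF

# Any output here means an embedding-style model is listed under chatModels.
jq -r '.modelProviders[].chatModels[]?.key | select(test("embed"))' /tmp/check-config.json
```

With this sample, the filter prints `nomic-embed-text:latest`, flagging the bad entry; a clean config produces no output.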

### Issue: Browser shows old model selections

- **Cause**: browser cache
- **Fix**: hard-refresh (Ctrl+F5) and close all open Perplexica tabs

### Issue: Database retains old conversations

- **Fix**: clear the database:

```bash
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```

### Issue: Config reverts after restart

- **Cause**: the config lives in a Docker volume, not in a git-tracked file
- **Fix**: update the config inside the volume:

```bash
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```
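
After copying, it's worth confirming the file inside the volume matches what was staged. A sketch of the check using `diff`, with local directories standing in for the staging area and the volume mount (paths are illustrative; against the real volume you would run `diff` via the same `alpine` container):

```bash
# Local directories simulate /tmp staging and the perplexica-data volume,
# so the check itself is reproducible without Docker.
staging=/tmp/cfg-staging
volume=/tmp/cfg-volume
mkdir -p "$staging" "$volume"

echo '{"version": 1}' > "$staging/config.json"
cp "$staging/config.json" "$volume/config.json"

# diff exits 0 only when the two copies are byte-identical.
diff "$staging/config.json" "$volume/config.json" && echo "config in sync"
```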

## Testing

### Test SearXNG (from inside the container)

```bash
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```

### Test Seattle Ollama

```bash
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```

### Test Groq API

```bash
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```
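
Since the free tier is capped at 30 requests/minute, scripted use of this endpoint should tolerate HTTP 429 responses. A minimal retry wrapper with exponential backoff, sketched around a placeholder command (the try count and delay schedule are arbitrary choices; in practice you would wrap the `curl` call above, run with `--fail` so a 429 yields a non-zero exit):

```bash
# Retry a command with exponential backoff, for transient failures such as
# Groq rate-limit (429) responses when curl is run with --fail.
retry() {
  local max_tries=5 delay=1 n
  for (( n = 1; n <= max_tries; n++ )); do
    "$@" && return 0
    echo "attempt $n failed; sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
  done
  return 1
}

# Placeholder stands in for the real call, e.g.:
#   retry curl -s --fail https://api.groq.com/openai/v1/chat/completions ...
retry true && echo "request succeeded"
```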

### Check Perplexica Config

```bash
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```

## Recommendations

1. **Use Groq for chat** (~0.4s response time, excellent quality)
2. **Use Seattle Ollama for embeddings** (`nomic-embed-text:latest`)
3. **Keep TinyLlama as a fallback** (in case Groq rate limits are hit)
4. **Monitor Groq rate limits** (30 req/min on the free tier)

## Alternative Solutions

If Groq doesn't work:

1. **OpenRouter API**: similar to Groq, with multiple models available
2. **Anthropic Claude**: via API (paid)
3. **Local GPU**: move Ollama to a GPU-enabled host
4. **Accept slow performance**: use TinyLlama with ~12s responses

## Status

- ✅ Groq API key configured
- ✅ Groq API responding in ~0.4s
- ✅ Config updated in Perplexica
- ❌ User reports the web UI still isn't working (needs investigation)

## Next Steps

1. Test from the web UI and capture the exact error message
2. Check the browser console for JavaScript errors
3. Check Perplexica logs during a failed search
4. Verify the Groq API calls in the browser's network tab
5. Consider switching to a different LLM provider if Groq proves incompatible