# Perplexica Performance Troubleshooting

## Issue Summary

Perplexica search queries were taking 10 minutes each with CPU-based Ollama inference on the Seattle VM.

## Timeline of Solutions Attempted

### 1. Initial Setup: Seattle Ollama with Qwen2.5:1.5b

- **Result**: 10 minutes per search query
- **Problem**: CPU inference too slow; Seattle load average 9.82, Ollama at 937% CPU
- **Metrics**:
  - Chat requests: 16-28 seconds each
  - Generate requests: 2+ minutes each

### 2. Switched to TinyLlama:1.1b

- **Model Size**: 608 MB (vs. 940 MB for Qwen2.5)
- **Speed**: 12 seconds per response
- **Improvement**: 50x faster than Qwen2.5 (600 s → 12 s per response)
- **Quality**: Lower-quality responses
- **Status**: Works, but still slow for interactive search

### 3. Switched to Groq API (Current)

- **Model**: llama-3.3-70b-versatile
- **Speed**: 0.4 seconds per response
- **Quality**: Excellent (70B model)
- **Cost**: Free tier (30 requests/min, 14,400 requests/day)
- **Status**: Configured, but the user reports it is not working

## Current Configuration

### Perplexica Config (`config.json`)

```json
{
  "version": 1,
  "setupComplete": true,
  "modelProviders": [
    {
      "id": "groq-provider",
      "name": "Groq",
      "type": "openai",
      "config": {
        "baseURL": "https://api.groq.com/openai/v1",
        "apiKey": "gsk_YOUR_API_KEY"
      },
      "chatModels": [
        {
          "name": "llama-3.3-70b-versatile",
          "key": "llama-3.3-70b-versatile"
        }
      ]
    },
    {
      "id": "seattle-ollama",
      "name": "Seattle Ollama",
      "type": "ollama",
      "config": {
        "baseURL": "http://100.82.197.124:11434"
      },
      "chatModels": [
        {
          "name": "tinyllama:1.1b",
          "key": "tinyllama:1.1b"
        }
      ],
      "embeddingModels": [
        {
          "name": "nomic-embed-text:latest",
          "key": "nomic-embed-text:latest"
        }
      ]
    }
  ],
  "defaultChatModel": "llama-3.3-70b-versatile",
  "defaultEmbeddingModel": "nomic-embed-text:latest"
}
```
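
For scripted checks against this config, a small helper can resolve which provider serves the default chat model. This is a hypothetical sketch, not part of Perplexica: it assumes the default chat model is stored under a `defaultChatModel` key, mirroring `defaultEmbeddingModel`.

```python
import json


def resolve_default_chat_provider(config: dict) -> dict:
    """Find the provider that serves the configured default chat model.

    Assumes the config.json layout above; Perplexica's internal
    lookup may differ.
    """
    default = config.get("defaultChatModel")
    for provider in config.get("modelProviders", []):
        for model in provider.get("chatModels", []):
            if model.get("key") == default:
                return {"provider": provider["id"], "model": default}
    raise LookupError(f"No provider serves chat model {default!r}")
```

Feed it the output of the "Check Perplexica Config" command below (e.g. `json.loads(...)`) to confirm the default model is actually routed to Groq.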

### Seattle Ollama Models

```bash
ssh seattle "docker exec ollama-seattle ollama list"
```

Available models:

- `tinyllama:1.1b` (608 MB) - fast CPU inference
- `qwen2.5:1.5b` (940 MB) - slower, but better quality
- `nomic-embed-text:latest` (261 MB) - for embeddings

## Performance Comparison

| Configuration | Chat Speed | Quality | Notes |
|--------------|------------|---------|-------|
| Qwen2.5 1.5B (Seattle CPU) | 10 minutes | Good | CPU overload, unusable |
| TinyLlama 1.1B (Seattle CPU) | 12 seconds | Basic | Usable but slow |
| Llama 3.3 70B (Groq API) | 0.4 seconds | Excellent | Best option |
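
The table's speed figures imply the following relative speedups, which can be verified with quick arithmetic (times converted to milliseconds so the ratios stay exact):

```python
# Response times from the table, in milliseconds, so the ratios stay exact.
qwen_cpu_ms = 10 * 60 * 1000   # 10 minutes
tinyllama_cpu_ms = 12 * 1000   # 12 seconds
groq_ms = 400                  # 0.4 seconds

print(qwen_cpu_ms // tinyllama_cpu_ms)  # 50   (TinyLlama vs. Qwen2.5 on CPU)
print(tinyllama_cpu_ms // groq_ms)      # 30   (Groq vs. TinyLlama)
print(qwen_cpu_ms // groq_ms)           # 1500 (Groq vs. Qwen2.5 on CPU)
```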

## Common Issues

### Issue: "nomic-embed-text:latest does not support chat"

- **Cause**: The config lists an embedding model as a chat model
- **Fix**: Ensure embedding models appear only in the `embeddingModels` array, never in `chatModels`
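
This check can be automated. The sketch below assumes the `config.json` layout shown in "Current Configuration" and is not part of Perplexica; a non-empty result means some embedding model is misfiled as a chat model:

```python
def misplaced_embedding_models(config: dict) -> list[str]:
    """Return embedding-model keys that also appear in some chatModels list.

    Hypothetical lint for the config.json layout shown earlier; any key it
    returns would trigger the "does not support chat" error.
    """
    embedding_keys = {
        m["key"]
        for p in config.get("modelProviders", [])
        for m in p.get("embeddingModels", [])
    }
    chat_keys = {
        m["key"]
        for p in config.get("modelProviders", [])
        for m in p.get("chatModels", [])
    }
    return sorted(embedding_keys & chat_keys)
```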

### Issue: Browser shows old model selections

- **Cause**: Browser cache
- **Fix**: Hard-refresh (Ctrl+F5) and close all open Perplexica tabs

### Issue: Database retains old conversations

- **Fix**: Clear the database:

```bash
docker run --rm -v perplexica-data:/data alpine rm -f /data/db.sqlite
docker restart perplexica
```

### Issue: Config reverts after restart

- **Cause**: The config lives in a Docker volume, not in a git-tracked file
- **Fix**: Copy the updated config into the volume:

```bash
docker run --rm -v perplexica-data:/data -v /tmp:/tmp alpine cp /tmp/config.json /data/config.json
```

## Testing

### Test SearXNG (from inside the container)

```bash
docker exec perplexica curl -s "http://localhost:8080/search?q=test&format=json" | jq '.results | length'
```

### Test Seattle Ollama

```bash
curl -s http://100.82.197.124:11434/api/tags | jq '.models[].name'
```

### Test Groq API

```bash
curl -s https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Test"}],
    "max_tokens": 50
  }' | jq -r '.choices[0].message.content'
```
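
For scripting, the same call can be built in Python. The helper below only constructs the request (the endpoint and field names are Groq's OpenAI-compatible ones, matching the curl test above); sending it is left to an HTTP client:

```python
def build_groq_chat_request(api_key: str, prompt: str,
                            model: str = "llama-3.3-70b-versatile",
                            max_tokens: int = 50) -> tuple[str, dict, dict]:
    """Return (url, headers, json_body) for a Groq chat completion."""
    url = "https://api.groq.com/openai/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return url, headers, body
```

Send it with, e.g., `requests.post(url, headers=headers, json=body, timeout=30)` and inspect the JSON response's `choices[0].message.content`.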

### Check Perplexica Config

```bash
docker run --rm -v perplexica-data:/data alpine cat /data/config.json | jq .
```

## Recommendations

1. **Use Groq for chat** (0.4 s response time, excellent quality)
2. **Use Seattle Ollama for embeddings** (nomic-embed-text:latest)
3. **Keep TinyLlama as a fallback** (in case Groq rate limits are hit)
4. **Monitor Groq rate limits** (30 requests/min on the free tier)
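
A client-side guard can help stay under the free-tier limit. This is an illustrative sliding-window sketch, not Groq's own mechanism (Groq enforces limits server-side):

```python
import time
from collections import deque


class RateLimitGuard:
    """Client-side sliding-window guard (30 requests/min by default).

    Illustrative only: it just avoids tripping the server-side limit
    by refusing to send once the window is full.
    """

    def __init__(self, max_requests: int = 30, window_s: float = 60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._sent: deque = deque()  # timestamps of recent requests

    def allow(self, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

Call `guard.allow()` before each Groq request and back off (or fall back to TinyLlama) when it returns `False`.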

## Alternative Solutions

If Groq doesn't work:

1. **OpenRouter API**: Similar to Groq, offers many models
2. **Anthropic Claude**: Via API (paid)
3. **Local GPU**: Move Ollama to a GPU-enabled host
4. **Accept slow performance**: Use TinyLlama with 12-second responses
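
The fallback idea can be sketched as a simple provider chain. All names here are illustrative (`call` stands in for a real provider client); this is not Perplexica's actual routing logic:

```python
# Try Groq first, then fall back to Seattle TinyLlama.
PROVIDERS = [
    ("groq", "llama-3.3-70b-versatile"),
    ("seattle-ollama", "tinyllama:1.1b"),
]


def chat_with_fallback(prompt: str, call) -> tuple[str, str]:
    """Return (provider_id, response), trying providers in order."""
    last_error = None
    for provider_id, model in PROVIDERS:
        try:
            return provider_id, call(provider_id, model, prompt)
        except Exception as exc:  # rate limit, timeout, network error...
            last_error = exc
    raise RuntimeError("All providers failed") from last_error
```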

## Status

- ✅ Groq API key configured
- ✅ Groq API responding in 0.4 s
- ✅ Config updated in Perplexica
- ❌ User reports the web UI is still not working (needs investigation)

## Next Steps

1. Test from the web UI and capture the exact error message
2. Check the browser console for JavaScript errors
3. Check Perplexica logs during a failed search
4. Verify Groq API calls in the browser's network tab
5. Consider switching to a different LLM provider if Groq proves incompatible