# ⚡ Performance Troubleshooting Guide

## Overview

This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.

---

## 🔍 Quick Diagnostics Checklist

Before diving deep, run through this checklist:

```bash
# 1. Check system resources
htop                    # CPU, memory usage
docker stats            # Container resource usage
df -h                   # Disk space
iostat -x 1 5           # Disk I/O

# 2. Check network
iperf3 -c <server-ip>   # Network throughput
ping -c 10 <host>       # Latency
netstat -tulpn          # Open ports/connections

# 3. Check containers
docker ps -a                         # Container status
docker logs --tail 100 <container>   # Recent logs
```

---

## 🐌 Slow Container Performance

### Symptoms

- Container takes a long time to respond
- High CPU usage by a specific container
- Container restarts frequently

### Diagnosis

```bash
# Check container resource usage
docker stats <container>

# Check container logs for errors
docker logs --tail 200 <container> | grep -i "error\|warn\|slow"

# Inspect container health
docker inspect <container> | jq '.[0].State'

# Check container processes
docker top <container>
```

### Common Causes & Solutions

#### 1. Memory Limits Too Low

```yaml
# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g        # Increase from default
    memswap_limit: 4g    # Allow swap if needed
```

#### 2. CPU Throttling

```yaml
# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'          # Allow 2 CPU cores
    cpu_shares: 1024     # Higher priority
```

#### 3. Storage I/O Bottleneck

```bash
# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"

# Solution: Move data to faster storage (NVMe cache, SSD)
```
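If one container is saturating the disk and you cannot move its data yet, the Compose Specification's `blkio_config` keys can cap its throughput. A sketch, assuming a `/dev/sda` data disk and a placeholder `myservice` name; the 20 MB/s rate is an arbitrary example to tune for your host:

```yaml
# docker-compose.yml - throttle a noisy container's disk I/O
# (/dev/sda and the 20mb rate are placeholder assumptions)
services:
  myservice:
    blkio_config:
      device_read_bps:
        - path: /dev/sda
          rate: '20mb'
      device_write_bps:
        - path: /dev/sda
          rate: '20mb'
```

This trades that container's burst speed for responsiveness everywhere else, which is usually the right call for backup or indexing jobs.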
#### 4. Database Performance

```bash
# PostgreSQL slow queries
docker exec -it postgres psql -U user -c "
  SELECT query, calls, mean_time, total_time
  FROM pg_stat_statements
  ORDER BY total_time DESC LIMIT 10;"

# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
```

---

## 🌐 Network Performance Issues

### Symptoms

- Slow file transfers between hosts
- High latency to services
- Buffering when streaming media

### Diagnosis

```bash
# Test throughput between hosts
iperf3 -s                      # On server
iperf3 -c <server-ip> -t 30    # On client

# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps

# Check for packet loss
ping -c 100 <host> | tail -3

# Check network interface errors
ip -s link show eth0
```

### Common Causes & Solutions

#### 1. MTU Mismatch

```bash
# Check current MTU
ip link show | grep mtu

# Test for MTU issues (should not fragment)
ping -M do -s 1472 <host>

# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
```

#### 2. Duplex/Speed Mismatch

```bash
# Check link speed
ethtool eth0 | grep -i speed

# Force correct speed (if auto-negotiation fails)
ethtool -s eth0 speed 1000 duplex full autoneg off
```

#### 3. Slow DNS Resolution

```bash
# Test DNS resolution time
time dig google.com

# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or a fast upstream DNS
```

```yaml
# Fix in Docker - docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x    # Local DNS (Pi-hole)
      - 1.1.1.1        # Fallback
```

#### 4. Tailscale Performance

```bash
# Check Tailscale connection type
tailscale status

# If using a DERP relay (slow), check the firewall:
# port 41641/UDP should be open for direct connections

# Check Tailscale latency
tailscale ping <host>
```
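The `-s 1472` probe size used in the MTU test above is not arbitrary: it is the largest ICMP payload that fits in a 1500-byte MTU once the IPv4 and ICMP headers are subtracted. A quick sanity check of that arithmetic, including the jumbo-frame case:

```shell
# Largest non-fragmenting ICMP payload for a given MTU:
# MTU - 20 (IPv4 header) - 8 (ICMP header)
mtu=1500
payload=$((mtu - 20 - 8))
echo "$payload"     # 1472 - matches the ping -M do -s 1472 probe

# Same calculation for jumbo frames (MTU 9000):
jumbo=$((9000 - 20 - 8))
echo "$jumbo"       # 8972
```

If `ping -M do` with that payload fails but a smaller one succeeds, something on the path is using a smaller MTU than you think.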
#### 5. Reverse Proxy Bottleneck

```bash
# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100
```

```
# Increase worker connections in nginx.conf:
worker_processes auto;
events {
    worker_connections 4096;
}
```

---

## 💾 Storage Performance Issues

### Symptoms

- Slow read/write speeds
- High disk I/O wait
- Database queries timing out

### Diagnosis

```bash
# Check disk I/O statistics
iostat -xz 1 10

# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second

# Check for processes doing heavy I/O
iotop -o

# Test disk speed
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct
# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct
```

### Common Causes & Solutions

#### 1. HDD vs SSD/NVMe

```
Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s
```

Move frequently accessed data to faster storage, e.g. an NVMe cache on a Synology NAS.

#### 2. RAID Rebuild in Progress

```bash
# Check Synology RAID status
cat /proc/mdstat
```

During a rebuild, expect a 30-50% performance loss; wait for the rebuild to complete.

#### 3. NVMe Cache Not Working

On Synology, check the cache status in DSM under Storage Manager > SSD Cache. Common issues:

- Cache full (increase size or add more SSDs)
- Wrong cache mode (read-only vs read-write)
- Cache disabled after a DSM update

#### 4. SMB/NFS Performance

```bash
# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"

# For NFS, use NFSv4.1 with larger rsize/wsize
mount -t nfs4 nas:/share /mnt -o rsize=1048576,wsize=1048576
```

```
# Optimize SMB settings in smb.conf:
socket options = TCP_NODELAY IPTOS_LOWDELAY
read raw = yes
write raw = yes
max xmit = 65535
```
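When a large copy feels slow, it helps to know what "fast" would even look like. A back-of-envelope sketch using the practical throughput figures quoted above (940 Mbps for 1GbE, ~550 MB/s for a SATA SSD), with integer math and ignoring the MB/MiB distinction, so treat the results as order-of-magnitude only:

```shell
# Rough seconds to move a 10 GiB file
size_mib=$((10 * 1024))              # 10 GiB expressed in MiB
net_mbps=940                         # practical 1GbE throughput
net_mibps=$((net_mbps / 8))          # ~117 MiB/s on the wire
net_secs=$((size_mib / net_mibps))   # ~87 s over 1GbE
ssd_secs=$((size_mib / 550))         # ~18 s from a SATA SSD
echo "network: ${net_secs}s, ssd: ${ssd_secs}s"
```

If your copy takes far longer than the network estimate, suspect the disks or the protocol overhead rather than the link itself.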
#### 5. Docker Volume Performance

```bash
# Check volume driver
docker volume inspect <volume>

# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes
```

```yaml
# docker-compose.yml
volumes:
  - /fast-ssd/postgres:/var/lib/postgresql/data
```

---

## 📺 Media Streaming Performance

### Symptoms

- Buffering during playback
- Transcoding takes too long
- Multiple streams cause stuttering

### Plex/Jellyfin Optimization

```bash
# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams

# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding

# For Intel QuickSync (Synology), pass the GPU through:
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker
```

### Direct Play vs Transcoding

```
Performance comparison:
- Direct Play: ~5-20 Mbps per stream (no CPU usage)
- Transcoding: ~2000-4000 CPU score per 1080p stream
```

Optimize for Direct Play:

1. Use compatible codecs (H.264, AAC)
2. Match client capabilities
3. Disable transcoding for local clients

### Multiple Concurrent Streams

```
10GbE can handle: ~80 concurrent 4K streams (theoretical)
1GbE can handle:  ~8 concurrent 4K streams
```

If hitting limits:

1. Reduce stream quality for remote users
2. Enable bandwidth limits per user
3. Upgrade network infrastructure

---

## 🖥️ Synology NAS Performance

### Check System Health

```bash
# SSH into Synology
ssh admin@nas

# Check CPU/memory
top

# Check storage health
cat /proc/mdstat
syno_hdd_util --all

# Check Docker performance
docker stats
```

### Common Synology Issues

#### 1. Indexing Slowing the System

```bash
# Check if Synology is indexing
ps aux | grep -i index

# Temporarily stop indexing
synoservicectl --stop synoindexd

# Or schedule indexing for off-hours:
# Control Panel > Indexing Service > Schedule
```
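Before blaming the indexer specifically, it is worth confirming which processes are actually burning CPU or memory; on Linux (including Synology's DSM shell), procps `ps` can sort by usage directly:

```shell
# Top 5 CPU-consuming processes (--sort is a GNU/Linux ps option)
ps aux --sort=-%cpu | head -n 6    # header line + 5 processes

# Same idea for memory hogs:
ps aux --sort=-%mem | head -n 6
```

If `synoindexd` (or an antivirus scanner) is at the top of that list, the sections below apply; otherwise look at whichever container or package actually leads it.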
#### 2. Snapshot Replication Running

```bash
# Check running tasks
synoschedtask --list

# Schedule snapshots during low-usage hours
```

#### 3. Antivirus Scanning

Disable real-time scanning or schedule scans: Security Advisor > Advanced > Scheduled Scan.

#### 4. Memory Pressure

```bash
# Check memory usage
free -h
```

If low on RAM, consider:

- Adding more RAM (the DS1823xs+ supports up to 32GB)
- Reducing the number of running containers
- Disabling unused packages

---

## 📊 Monitoring for Performance

### Set Up Prometheus Alerts

```yaml
# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning

      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning

      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
```

### Grafana Dashboard Panels

Key metrics to monitor:

- CPU usage by core
- Memory usage and swap
- Disk I/O latency (await)
- Network throughput and errors
- Container resource usage
- Docker volume I/O

---

## 🛠️ Performance Tuning Checklist

### System Level

- [ ] Kernel parameters optimized (`/etc/sysctl.conf`)
- [ ] Disk scheduler appropriate for workload (mq-deadline for SSD)
- [ ] Swap configured appropriately
- [ ] File descriptor limits increased

### Docker Level

- [ ] Container resource limits set
- [ ] Logging driver configured (json-file with max-size)
- [ ] Unused containers/images removed
- [ ] Volumes on appropriate storage

### Network Level

- [ ] Jumbo frames enabled (if supported)
- [ ] DNS resolution fast
- [ ] Firewall rules optimized
- [ ] Quality of Service (QoS) configured
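The kernel-parameter and file-descriptor items above can start from a minimal `/etc/sysctl.conf` fragment. These are common homelab starting points, not universal recommendations; tune each value for your workload and apply with `sysctl -p`:

```
# /etc/sysctl.conf - starting points, not definitive values
fs.file-max = 2097152                    # raise system-wide fd limit
net.core.rmem_max = 16777216             # larger socket receive buffers
net.core.wmem_max = 16777216             # larger socket send buffers
net.ipv4.tcp_congestion_control = bbr    # only if the bbr module is loaded
vm.swappiness = 10                       # prefer RAM over swap
```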
### Application Level

- [ ] Database indexes optimized
- [ ] Caching enabled (Redis/Memcached)
- [ ] Connection pooling configured
- [ ] Static assets served efficiently

---

## 🔗 Related Documentation

- [Network Performance Tuning](../infrastructure/network-performance-tuning.md)
- [Monitoring Setup](../admin/monitoring.md)
- [Common Issues](common-issues.md)
- [10GbE Backbone](../diagrams/10gbe-backbone.md)
- [Storage Topology](../diagrams/storage-topology.md)