# ⚡ Performance Troubleshooting Guide
## Overview
This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.
---
## 🔍 Quick Diagnostics Checklist
Before diving deep, run through this checklist:
```bash
# 1. Check system resources
htop # CPU, memory usage
docker stats # Container resource usage
df -h # Disk space
iostat -x 1 5 # Disk I/O
# 2. Check network
iperf3 -c <target-ip> # Network throughput
ping -c 10 <target> # Latency
ss -tulpn                  # Open ports/connections (use netstat -tulpn on older systems)
# 3. Check containers
docker ps -a # Container status
docker logs <container> --tail 100 # Recent logs
```
---
## 🐌 Slow Container Performance
### Symptoms
- Container takes a long time to respond
- High CPU usage by specific container
- Container restarts frequently
### Diagnosis
```bash
# Check container resource usage
docker stats <container_name>
# Check container logs for errors
docker logs <container_name> --tail 200 | grep -i "error\|warn\|slow"
# Inspect container health
docker inspect <container_name> | jq '.[0].State'
# Check container processes
docker top <container_name>
```
### Common Causes & Solutions
#### 1. Memory Limits Too Low
```yaml
# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g        # Increase from default
    memswap_limit: 4g    # Allow swap if needed
```
#### 2. CPU Throttling
```yaml
# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'          # Allow 2 CPU cores
    cpu_shares: 1024     # Higher priority
```
#### 3. Storage I/O Bottleneck
```bash
# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"
# Solution: Move data to faster storage (NVMe cache, SSD)
```
#### 4. Database Performance
```bash
# PostgreSQL slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 13+ the columns are mean_exec_time / total_exec_time)
docker exec -it postgres psql -U user -c "
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;"
# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
```
---
## 🌐 Network Performance Issues
### Symptoms
- Slow file transfers between hosts
- High latency to services
- Buffering when streaming media
### Diagnosis
```bash
# Test throughput between hosts
iperf3 -s # On server
iperf3 -c <server-ip> -t 30 # On client
# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps
# Check for packet loss
ping -c 100 <target> | tail -3
# Check network interface errors
ip -s link show eth0
```
### Common Causes & Solutions
#### 1. MTU Mismatch
```bash
# Check current MTU
ip link show | grep mtu
# Test for MTU issues (should not fragment)
ping -M do -s 1472 <target>
# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
```
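The 1472-byte payload in the ping test comes from subtracting the IPv4 and ICMP headers from the MTU; a quick sketch of the arithmetic:

```shell
# Max unfragmented ICMP payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
awk -v mtu=1500 'BEGIN { print mtu - 28 }'   # standard Ethernet -> 1472
awk -v mtu=9000 'BEGIN { print mtu - 28 }'   # jumbo frames     -> 8972
```

So with jumbo frames enabled, the equivalent test is `ping -M do -s 8972 <target>`.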
#### 2. Duplex/Speed Mismatch
```bash
# Check link speed
ethtool eth0 | grep -i speed
# Force speed/duplex only as a last resort (if auto-negotiation fails);
# gigabit links normally require auto-negotiation, so check cabling/switch first
ethtool -s eth0 speed 1000 duplex full autoneg off
```
#### 3. DNS Resolution Slow
```bash
# Test DNS resolution time
time dig google.com
# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or fast upstream DNS
```

Fix in Docker by pinning DNS servers per service:

```yaml
# docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x   # Local DNS (Pi-hole)
      - 1.1.1.1       # Fallback
```
#### 4. Tailscale Performance
```bash
# Check Tailscale connection type
tailscale status
# If using DERP relay (slow), check firewall
# Port 41641/UDP should be open for direct connections
# Check Tailscale latency
tailscale ping <device>
```
#### 5. Reverse Proxy Bottleneck
```bash
# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100
```

Increase worker connections in `nginx.conf`:

```nginx
worker_processes auto;
events {
    worker_connections 4096;
}
```
---
## 💾 Storage Performance Issues
### Symptoms
- Slow read/write speeds
- High disk I/O wait
- Database queries timing out
### Diagnosis
```bash
# Check disk I/O statistics
iostat -xz 1 10
# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second
# Check for processes doing heavy I/O
iotop -o
# Test disk speed (writes a 1 GiB test file; remove it afterwards)
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct
# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct
rm /volume1/test
```
### Common Causes & Solutions
#### 1. HDD vs SSD/NVMe
```
Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s
# Move frequently accessed data to faster storage
# Use NVMe cache on Synology NAS
```
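To turn the `dd` output into a throughput figure, divide bytes written by elapsed seconds; a sketch using example numbers (the 5.4 s elapsed time is an assumed HDD-class result, not a measurement):

```shell
# MB/s = bytes transferred / seconds elapsed / 1e6
awk -v bytes=1073741824 -v secs=5.4 'BEGIN { printf "%.0f MB/s\n", bytes/secs/1e6 }'
# prints "199 MB/s" - consistent with 7200 RPM HDD sequential speed
```

Compare the result against the expected-speed table above to spot underperforming storage.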
#### 2. RAID Rebuild in Progress
```bash
# Check Synology RAID status
cat /proc/mdstat
# During rebuild, expect 30-50% performance loss
# Wait for rebuild to complete
```
#### 3. NVMe Cache Not Working
```bash
# On Synology, check cache status in DSM
# Storage Manager > SSD Cache
# Common issues:
# - Cache full (increase size or add more SSDs)
# - Wrong cache mode (read-only vs read-write)
# - Cache disabled after DSM update
```
#### 4. SMB/NFS Performance
```bash
# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"
```

Optimize SMB settings in `smb.conf` (note: `read raw`, `write raw`, and `max xmit`
are legacy SMB1 options; modern SMB2/3 clients ignore them):

```ini
socket options = TCP_NODELAY IPTOS_LOWDELAY
```

For NFS, use NFSv4.1 with larger rsize/wsize:

```bash
mount -t nfs4 nas:/share /mnt -o rsize=1048576,wsize=1048576
```
#### 5. Docker Volume Performance
```bash
# Check volume driver
docker volume inspect <volume>
# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes
```

```yaml
# docker-compose.yml - put the database volume on fast local storage
services:
  postgres:
    volumes:
      - /fast-ssd/postgres:/var/lib/postgresql/data
```
---
## 📺 Media Streaming Performance
### Symptoms
- Buffering during playback
- Transcoding takes too long
- Multiple streams cause stuttering
### Plex/Jellyfin Optimization
```bash
# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams
# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding
# For Intel QuickSync (Synology), pass the GPU device through:
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker
```
### Direct Play vs Transcoding
```
Performance comparison:
- Direct Play: ~5-20 Mbps per stream (no CPU usage)
- Transcoding: ~2000-4000 CPU score per 1080p stream
# Optimize for Direct Play:
# 1. Use compatible codecs (H.264, AAC)
# 2. Match client capabilities
# 3. Disable transcoding for local clients
```
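The direct-play decision can be sketched as a simple codec check (real servers also consider container, resolution, bit depth, and per-client profiles; the codec values here are illustrative):

```shell
# Hypothetical media properties from a library scan
codec="h264"; audio="aac"

# Widely supported codec combinations can usually Direct Play
case "$codec/$audio" in
  h264/aac|h264/mp3) echo "direct play likely" ;;
  *)                 echo "transcode expected" ;;
esac
```

Anything falling into the default branch (e.g., HEVC video with TrueHD audio on an older client) is a candidate for re-encoding ahead of time.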
### Multiple Concurrent Streams
```
10GbE can handle: ~80 concurrent 4K streams (theoretical)
1GbE can handle: ~8 concurrent 4K streams
# If hitting limits:
# 1. Reduce stream quality for remote users
# 2. Enable bandwidth limits per user
# 3. Upgrade network infrastructure
```
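The stream counts above follow from dividing usable link bandwidth by per-stream bitrate (assuming ~120 Mbps for a high-bitrate 4K remux; typical 4K streams are much lighter):

```shell
# streams = usable link Mbps / per-stream Mbps
awk -v link=10000 -v per=120 'BEGIN { printf "%.0f\n", link/per }'   # 10GbE
awk -v link=940   -v per=120 'BEGIN { printf "%.0f\n", link/per }'   # 1GbE
```

Plug in your own measured iperf3 throughput and actual media bitrates to get a realistic ceiling.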
---
## 🖥️ Synology NAS Performance
### Check System Health
```bash
# SSH into Synology
ssh admin@nas
# Check CPU/memory
top
# Check storage health
cat /proc/mdstat
syno_hdd_util --all
# Check Docker performance
docker stats
```
### Common Synology Issues
#### 1. Indexing Slowing System
```bash
# Check if Synology is indexing
ps aux | grep -i index
# Temporarily stop indexing (DSM 6.x; the service tooling differs on DSM 7)
synoservicectl --stop synoindexd
# Or schedule indexing for off-hours
# Control Panel > Indexing Service > Schedule
```
#### 2. Snapshot Replication Running
```bash
# Check running tasks
synoschedtask --list
# Schedule snapshots during low-usage hours
```
#### 3. Antivirus Scanning
```bash
# Disable real-time scanning or schedule scans
# Security Advisor > Advanced > Scheduled Scan
```
#### 4. Memory Pressure
```bash
# Check memory usage
free -h
# If low on RAM, consider:
# - Adding more RAM (DS1823xs+ supports up to 32GB)
# - Reducing number of running containers
# - Disabling unused packages
```
---
## 📊 Monitoring for Performance
### Set Up Prometheus Alerts
```yaml
# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning
      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
```
### Grafana Dashboard Panels
Key metrics to monitor:
- CPU usage by core
- Memory usage and swap
- Disk I/O latency (await)
- Network throughput and errors
- Container resource usage
- Docker volume I/O
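Disk I/O latency (await) has no single node_exporter metric; it is typically derived in PromQL like this (metric names assume a standard node_exporter deployment):

```promql
# Average read latency in seconds over 5m (analogous for writes)
rate(node_disk_read_time_seconds_total[5m])
  / rate(node_disk_reads_completed_total[5m])
```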
---
## 🛠️ Performance Tuning Checklist
### System Level
- [ ] Kernel parameters optimized (`/etc/sysctl.conf`)
- [ ] Disk scheduler appropriate for workload (mq-deadline for SSD)
- [ ] Swap configured appropriately
- [ ] File descriptor limits increased
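A starting-point sketch of the system-level items above (values are illustrative, not universal; tune per workload and apply with `sysctl -p`):

```bash
# /etc/sysctl.conf - example tunables (assumed starting values)
vm.swappiness = 10               # prefer RAM over swap
net.core.rmem_max = 16777216     # larger socket buffers help 10GbE
net.core.wmem_max = 16777216
fs.file-max = 2097152            # raise the file descriptor ceiling
```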
### Docker Level
- [ ] Container resource limits set
- [ ] Logging driver configured (json-file with max-size)
- [ ] Unused containers/images removed
- [ ] Volumes on appropriate storage
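The logging-driver item can be expressed in Compose like this (sizes are illustrative):

```yaml
# docker-compose.yml - cap json-file log growth per container
services:
  myservice:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```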
### Network Level
- [ ] Jumbo frames enabled (if supported)
- [ ] DNS resolution fast
- [ ] Firewall rules optimized
- [ ] Quality of Service (QoS) configured
### Application Level
- [ ] Database indexes optimized
- [ ] Caching enabled (Redis/Memcached)
- [ ] Connection pooling configured
- [ ] Static assets served efficiently
---
## 🔗 Related Documentation
- [Network Performance Tuning](../infrastructure/network-performance-tuning.md)
- [Monitoring Setup](../admin/monitoring.md)
- [Common Issues](common-issues.md)
- [10GbE Backbone](../diagrams/10gbe-backbone.md)
- [Storage Topology](../diagrams/storage-topology.md)