# ⚡ Performance Troubleshooting Guide

## Overview

This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.

---

## 🔍 Quick Diagnostics Checklist

Before diving deep, run through this checklist:

```bash
# 1. Check system resources
htop                                # CPU, memory usage
docker stats                        # Container resource usage
df -h                               # Disk space
iostat -x 1 5                       # Disk I/O

# 2. Check network
iperf3 -c <target-ip>               # Network throughput
ping -c 10 <target>                 # Latency
netstat -tulpn                      # Open ports/connections

# 3. Check containers
docker ps -a                        # Container status
docker logs <container> --tail 100  # Recent logs
```
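To turn the `df` check into a quick pass/fail, the output can be filtered for nearly full filesystems. The sketch below is illustrative only: the helper name `full_filesystems` and the 90% default threshold are assumptions, not part of the original checklist.

```bash
# Print "<mount point> <usage>" for filesystems at or above a threshold.
# Reads `df -P` output on stdin (POSIX format: usage in column 5, mount in column 6).
# Hypothetical helper; mount points containing spaces are not handled.
full_filesystems() {
  local threshold="${1:-90}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # strip the percent sign before comparing
    if (use + 0 >= t) print $6, $5
  }'
}
```

Typical use would be `df -P | full_filesystems 85`; no output means nothing crossed the threshold.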

---

## 🐌 Slow Container Performance

### Symptoms
- Container takes a long time to respond
- High CPU usage by a specific container
- Container restarts frequently

### Diagnosis

```bash
# Check container resource usage
docker stats <container_name>

# Check container logs for errors
# (docker logs also writes to stderr, so merge it into the pipe)
docker logs <container_name> --tail 200 2>&1 | grep -iE "error|warn|slow"

# Inspect container health
docker inspect <container_name> | jq '.[0].State'

# Check container processes
docker top <container_name>
```
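When many containers are running, scanning `docker stats` by eye gets tedious. A minimal sketch of a filter that lists only the busy ones follows; the helper name `busy_containers` and the 80% default are assumptions chosen for illustration.

```bash
# Print names of containers whose CPU usage exceeds a threshold (percent).
# Expects lines of "<name> <cpu%>" on stdin, for example from:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'
# Hypothetical helper; the threshold default is an assumption.
busy_containers() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '{
    cpu = $2
    sub(/%/, "", cpu)          # "143.20%" -> "143.20"
    if (cpu + 0 > t) print $1
  }'
}
```

Note that Docker reports CPU relative to one core, so a container using two full cores shows roughly 200%.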

### Common Causes & Solutions

#### 1. Memory Limits Too Low
```yaml
# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g       # Hard RAM limit (no limit applies if unset)
    memswap_limit: 4g   # Total of RAM + swap, so this allows 2g of swap
```

#### 2. CPU Throttling
```yaml
# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'        # Allow up to 2 CPU cores
    cpu_shares: 1024   # Relative weight under contention (1024 is the default; raise it for higher priority)
```
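Before raising CPU limits, it can help to confirm that the container is actually being throttled. One way, assuming a cgroup v2 host where the kernel exposes `cpu.stat` with `nr_periods` and `nr_throttled` counters, is to compute the share of scheduler periods that hit the limit. `throttle_percent` is a hypothetical helper name.

```bash
# Percentage of CPU scheduler periods in which the cgroup was throttled.
# Reads the contents of a cgroup cpu.stat file on stdin.
# Hypothetical helper; prints 0.0 when no periods have been recorded.
throttle_percent() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else print "0.0"
    }'
}
```

A possible invocation is `docker exec <container_name> cat /sys/fs/cgroup/cpu.stat | throttle_percent`. The path is an assumption for cgroup v2; cgroup v1 hosts keep the file in a different directory. A value that keeps climbing under load suggests the `cpus` limit is the bottleneck.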

#### 3. Storage I/O Bottleneck
```bash
# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"

# Solution: Move data to faster storage (NVMe cache, SSD)
```

#### 4. Database Performance
```bash
# PostgreSQL slow queries (requires the pg_stat_statements extension)
# Column names below are for PostgreSQL 13+; older versions use mean_time / total_time
docker exec -it postgres psql -U user -c "
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;"

# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
```

---

## 🌐 Network Performance Issues

### Symptoms
- Slow file transfers between hosts
- High latency to services
- Buffering when streaming media

### Diagnosis

```bash
# Test throughput between hosts
iperf3 -s                      # On server
iperf3 -c <server-ip> -t 30    # On client

# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps

# Check for packet loss
ping -c 100 <target> | tail -3

# Check network interface errors
ip -s link show eth0
```
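The expected speeds above follow from per-packet overhead rather than from any tuning. As a rough worked example, assume 38 bytes of Ethernet overhead per frame (preamble 8, header 14, FCS 4, inter-frame gap 12) and 52 bytes of IPv4 + TCP headers with timestamps enabled. The helper name `tcp_goodput_mbps` is hypothetical.

```bash
# Estimate achievable TCP goodput in Mbps for a given link speed and MTU.
# Assumptions: IPv4, TCP timestamps on (52 header bytes inside the MTU),
# 38 bytes of Ethernet framing outside the MTU. Hypothetical helper.
tcp_goodput_mbps() {
  local link_mbps="$1" mtu="${2:-1500}"
  awk -v l="$link_mbps" -v m="$mtu" 'BEGIN {
    payload = m - 52           # TCP payload bytes per packet
    on_wire = m + 38           # bytes the packet occupies on the wire
    printf "%.0f\n", l * payload / on_wire
  }'
}

tcp_goodput_mbps 1000     # prints 941
tcp_goodput_mbps 2500     # prints 2354
tcp_goodput_mbps 10000    # prints 9415
```

If `iperf3` reports numbers well below these estimates, the causes in the following subsections are worth checking.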

### Common Causes & Solutions

#### 1. MTU Mismatch
```bash
# Check current MTU
ip link show | grep mtu

# Test for MTU issues (should not fragment)
ping -M do -s 1472 <target>

# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
```
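The `1472` in the ping test is the MTU minus 28 bytes of headers (20 for IPv4, 8 for ICMP). A small sketch for deriving the payload size for other MTUs, such as jumbo frames, follows; `icmp_payload_for_mtu` is a hypothetical helper name and IPv4 is assumed.

```bash
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet.
# 28 = 20-byte IPv4 header + 8-byte ICMP header. Hypothetical helper.
icmp_payload_for_mtu() {
  local mtu="${1:-1500}"
  echo $(( mtu - 28 ))
}

icmp_payload_for_mtu 1500   # prints 1472
icmp_payload_for_mtu 9000   # prints 8972
```

For a jumbo-frame path this could be used as `ping -M do -c 3 -s "$(icmp_payload_for_mtu 9000)" <target>`; a "message too long" error suggests a hop with a smaller MTU.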

#### 2. Duplex/Speed Mismatch
```bash
# Check link speed and duplex
ethtool eth0 | grep -iE "speed|duplex"

# Force correct speed (if auto-negotiation picks the wrong mode)
# Gigabit over copper requires auto-negotiation, so restrict the
# advertised mode instead of turning negotiation off
ethtool -s eth0 speed 1000 duplex full autoneg on
```

#### 3. DNS Resolution Slow
```bash
# Test DNS resolution time
time dig google.com

# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or fast upstream DNS

# Fix in Docker
# docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x   # Local DNS (Pi-hole)
      - 1.1.1.1       # Fallback
```
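`time dig` measures the whole command, including process startup. The resolver's own latency is reported in dig's `Query time` line, which can be pulled out with a short filter. This is a sketch; `dig_query_ms` is a hypothetical helper name.

```bash
# Extract the resolver latency in milliseconds from dig output on stdin.
# Relies on dig's ";; Query time: <n> msec" summary line. Hypothetical helper.
dig_query_ms() {
  awk '/Query time:/ { print $4 }'
}
```

For example, `dig google.com | dig_query_ms` prints a single number. Running it twice in a row shows the difference between an uncached and a cached lookup on a local resolver.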

#### 4. Tailscale Performance
```bash
# Check Tailscale connection type
tailscale status

# If using DERP relay (slow), check firewall
# Port 41641/UDP should be open for direct connections

# Check Tailscale latency
tailscale ping <device>
```
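To spot relayed peers quickly, the status output can be filtered. The sketch below assumes the common `tailscale status` layout, where the second column is the hostname and relayed connections are shown as `relay "<region>"`; that layout is an assumption and may differ between Tailscale versions. `relayed_peers` is a hypothetical helper name.

```bash
# Print hostnames of peers whose connection goes through a DERP relay.
# Reads `tailscale status` output on stdin. Assumes column 2 is the
# hostname and relayed peers contain the text: relay "<region>"
relayed_peers() {
  awk '/relay "/ { print $2 }'
}
```

Used as `tailscale status | relayed_peers`; an empty result means every active peer has a direct connection.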

#### 5. Reverse Proxy Bottleneck
```bash
# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100

# Increase worker connections
# In nginx.conf:
worker_processes auto;
events {
    worker_connections 4096;
}
```

---

## 💾 Storage Performance Issues

### Symptoms
- Slow read/write speeds
- High disk I/O wait
- Database queries timing out

### Diagnosis

```bash
# Check disk I/O statistics
iostat -xz 1 10

# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second

# Check for processes doing heavy I/O
iotop -o

# Test disk speed
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct

# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct

# Remove the test file afterwards
rm /volume1/test
```
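The `%util` rule of thumb above can be applied automatically. The sketch below locates the `%util` column from the header line, since its position varies between sysstat versions. `saturated_disks` is a hypothetical helper name and the 90% default mirrors the threshold listed above.

```bash
# Print "<device> <%util>" for devices above a utilisation threshold.
# Reads `iostat -x` output on stdin and finds the %util column by name.
# Hypothetical helper; assumes a locale that uses "." as decimal separator.
saturated_disks() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    $1 ~ /^Device/ {                       # header row of each report
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 > t { print $1, $col }
  '
}
```

A possible invocation is `iostat -xzy 1 5 | saturated_disks`, where `-y` skips the first report (averages since boot), so only live samples are evaluated.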

### Common Causes & Solutions

#### 1. HDD vs SSD/NVMe
```
Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s

# Move frequently accessed data to faster storage
# Use NVMe cache on Synology NAS
```

#### 2. RAID Rebuild in Progress
```bash
# Check Synology RAID status
cat /proc/mdstat

# During rebuild, expect 30-50% performance loss
# Wait for rebuild to complete
```
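To see how long the slowdown will last, the progress line in `/proc/mdstat` can be reduced to a one-line summary per array. This is a sketch assuming the usual Linux md layout, where the progress line contains a percentage and a `finish=` estimate; `md_rebuild_status` is a hypothetical helper name.

```bash
# Print "<array> <percent done> <estimated time left>" for each md array
# that is currently rebuilding, resyncing, reshaping or being checked.
# Reads /proc/mdstat content on stdin. Hypothetical helper.
md_rebuild_status() {
  awk '
    /^md/ { dev = $1 }                     # remember the current array name
    /(recovery|resync|reshape|check) *=/ {
      pct = ""; eta = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /%$/) pct = $i
        if ($i ~ /^finish=/) { eta = $i; sub(/^finish=/, "", eta) }
      }
      if (pct != "") print dev, pct, eta
    }
  '
}
```

Run it as `md_rebuild_status < /proc/mdstat`; no output means no array is rebuilding.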

#### 3. NVMe Cache Not Working
```bash
# On Synology, check cache status in DSM
# Storage Manager > SSD Cache

# Common issues:
# - Cache full (increase size or add more SSDs)
# - Wrong cache mode (read-only vs read-write)
# - Cache disabled after DSM update
```

#### 4. SMB/NFS Performance
```bash
# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"

# Optimize SMB settings in smb.conf:
# Note: read raw, write raw and max xmit only affect the legacy SMB1
# protocol; they have no effect on SMB2/SMB3 connections
socket options = TCP_NODELAY IPTOS_LOWDELAY
read raw = yes
write raw = yes
max xmit = 65535

# For NFS, use NFSv4.1 with larger rsize/wsize
mount -t nfs -o vers=4.1,rsize=1048576,wsize=1048576 nas:/share /mnt
```
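The server may negotiate smaller `rsize`/`wsize` values than requested, so it is worth confirming what the mount actually uses. The sketch below reads the kernel's mount table format; `nfs_mount_option` is a hypothetical helper name.

```bash
# Print "<mount point> <option>=<value>" for every NFS mount that has
# the given option set. Reads /proc/mounts content on stdin, where
# column 2 is the mount point, column 3 the type, column 4 the options.
# Hypothetical helper.
nfs_mount_option() {
  local key="$1"
  awk -v k="$key" '$3 ~ /^nfs/ {
    n = split($4, opts, ",")
    for (i = 1; i <= n; i++)
      if (index(opts[i], k "=") == 1) print $2, opts[i]
  }'
}
```

For example, `nfs_mount_option rsize < /proc/mounts` shows the effective read size per NFS mount.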

#### 5. Docker Volume Performance
```bash
# Check volume driver
docker volume inspect <volume>

# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes

# docker-compose.yml
volumes:
  - /fast-ssd/postgres:/var/lib/postgresql/data
```

---

## 📺 Media Streaming Performance

### Symptoms
- Buffering during playback
- Transcoding takes too long
- Multiple streams cause stuttering

### Plex/Jellyfin Optimization

```bash
# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams

# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding

# For Intel QuickSync (Synology):
# --device passes the GPU into the container
# (a comment after a trailing backslash would break the command)
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker
```

### Direct Play vs Transcoding
```
Performance comparison:
- Direct Play: ~5-20 Mbps per stream (negligible CPU usage)
- Transcoding: ~2000-4000 PassMark CPU score per 1080p stream (software transcoding)

# Optimize for Direct Play:
# 1. Use compatible codecs (H.264, AAC)
# 2. Match client capabilities
# 3. Disable transcoding for local clients
```

### Multiple Concurrent Streams
```
Assuming roughly 100-120 Mbps per high-bitrate 4K stream:
10GbE can handle: ~80 concurrent 4K streams (theoretical)
1GbE can handle: ~8 concurrent 4K streams

# If hitting limits:
# 1. Reduce stream quality for remote users
# 2. Enable bandwidth limits per user
# 3. Upgrade network infrastructure
```
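The stream counts above are simple division of usable link throughput by per-stream bitrate. A sketch for redoing the estimate with your own numbers follows; `max_streams` is a hypothetical helper name, and the 115 Mbps figure in the examples is an assumed bitrate for a high-bitrate 4K file.

```bash
# Estimate how many streams fit on a link.
# Arguments: usable throughput in Mbps, per-stream bitrate in Mbps
# (both integers). Hypothetical helper; ignores other traffic on the link.
max_streams() {
  local usable_mbps="$1" per_stream_mbps="$2"
  if [ "$per_stream_mbps" -le 0 ]; then
    echo "per-stream bitrate must be positive" >&2
    return 1
  fi
  echo $(( usable_mbps / per_stream_mbps ))
}

max_streams 940 115     # prints 8  (1GbE)
max_streams 9400 115    # prints 81 (10GbE)
```

In practice disk throughput and client Wi-Fi usually limit concurrency well before a 10GbE link does.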

---

## 🖥️ Synology NAS Performance

### Check System Health
```bash
# SSH into Synology
ssh admin@nas

# Check CPU/memory
top

# Check storage health
cat /proc/mdstat
syno_hdd_util --all

# Check Docker performance
docker stats
```

### Common Synology Issues

#### 1. Indexing Slowing System
```bash
# Check if Synology is indexing
ps aux | grep -i index

# Temporarily stop indexing
# (DSM 6.x syntax; DSM 7 replaced synoservicectl with systemd-based
# tooling, so check the equivalent command for your version)
synoservicectl --stop synoindexd

# Or schedule indexing for off-hours
# Control Panel > Indexing Service > Schedule
```

#### 2. Snapshot Replication Running
```bash
# Check running tasks
synoschedtask --list

# Schedule snapshots during low-usage hours
```

#### 3. Antivirus Scanning
```bash
# Disable real-time scanning or schedule scans
# Security Advisor > Advanced > Scheduled Scan
```

#### 4. Memory Pressure
```bash
# Check memory usage
free -h

# If low on RAM, consider:
# - Adding more RAM (DS1823xs+ supports up to 32GB)
# - Reducing number of running containers
# - Disabling unused packages
```
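For a single number to compare against a threshold, memory pressure can be computed from `MemAvailable` rather than from "free" memory, which ignores reclaimable cache. A minimal sketch follows; `mem_used_percent` is a hypothetical helper name.

```bash
# Print the percentage of RAM in use, based on MemAvailable.
# Reads /proc/meminfo content on stdin. Hypothetical helper.
mem_used_percent() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END { if (total > 0) printf "%.0f\n", (1 - avail / total) * 100 }
  '
}
```

Run it as `mem_used_percent < /proc/meminfo`. Values that stay above roughly 90% suggest the options listed above are worth considering.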

---

## 📊 Monitoring for Performance

### Set Up Prometheus Alerts

```yaml
# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      # rate() is used instead of irate(): irate() only looks at the last
      # two samples, which makes alerts flap on short spikes
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning

      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning

      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
```

### Grafana Dashboard Panels

Key metrics to monitor:
- CPU usage by core
- Memory usage and swap
- Disk I/O latency (await)
- Network throughput and errors
- Container resource usage
- Docker volume I/O

---

## 🛠️ Performance Tuning Checklist

### System Level
- [ ] Kernel parameters optimized (`/etc/sysctl.conf`)
- [ ] Disk scheduler appropriate for workload (mq-deadline for SSD)
- [ ] Swap configured appropriately
- [ ] File descriptor limits increased
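
To check the disk scheduler item, the kernel lists the available schedulers for each block device and marks the active one in square brackets. The sketch below extracts it; `active_scheduler` is a hypothetical helper name and `sda` in the usage note is an assumed device.

```bash
# Print the active I/O scheduler from a sysfs scheduler line on stdin,
# e.g. "none [mq-deadline] kyber bfq" -> "mq-deadline".
# Hypothetical helper; prints nothing if no bracketed entry is present.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

A possible invocation is `active_scheduler < /sys/block/sda/queue/scheduler`.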

### Docker Level
- [ ] Container resource limits set
- [ ] Logging driver configured (json-file with max-size)
- [ ] Unused containers/images removed
- [ ] Volumes on appropriate storage

### Network Level
- [ ] Jumbo frames enabled (if supported)
- [ ] DNS resolution fast
- [ ] Firewall rules optimized
- [ ] Quality of Service (QoS) configured

### Application Level
- [ ] Database indexes optimized
- [ ] Caching enabled (Redis/Memcached)
- [ ] Connection pooling configured
- [ ] Static assets served efficiently

---

## 🔗 Related Documentation

- [Network Performance Tuning](../infrastructure/network-performance-tuning.md)
- [Monitoring Setup](../admin/monitoring.md)
- [Common Issues](common-issues.md)
- [10GbE Backbone](../diagrams/10gbe-backbone.md)
- [Storage Topology](../diagrams/storage-topology.md)