# ⚡ Performance Troubleshooting Guide
## Overview
This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.
---
## 🔍 Quick Diagnostics Checklist
Before diving deep, run through this checklist:
```bash
# 1. Check system resources
htop # CPU, memory usage
docker stats # Container resource usage
df -h # Disk space
iostat -x 1 5 # Disk I/O
# 2. Check network
iperf3 -c <target-ip> # Network throughput
ping -c 10 <target> # Latency
ss -tulpn                  # Open ports/connections (use netstat -tulpn on older systems)
# 3. Check containers
docker ps -a # Container status
docker logs <container> --tail 100 # Recent logs
```
---
## 🐌 Slow Container Performance
### Symptoms
- Container takes a long time to respond
- High CPU usage by specific container
- Container restarts frequently
### Diagnosis
```bash
# Check container resource usage
docker stats <container_name>
# Check container logs for errors
docker logs <container_name> --tail 200 | grep -i "error\|warn\|slow"
# Inspect container health
docker inspect <container_name> | jq '.[0].State'
# Check container processes
docker top <container_name>
```
### Common Causes & Solutions
#### 1. Memory Limits Too Low
```yaml
# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g        # Increase from default
    memswap_limit: 4g    # Allow swap if needed
```
#### 2. CPU Throttling
```yaml
# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'          # Allow 2 CPU cores
    cpu_shares: 1024     # Higher priority
```
#### 3. Storage I/O Bottleneck
```bash
# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"
# Solution: Move data to faster storage (NVMe cache, SSD)
```
#### 4. Database Performance
```bash
# PostgreSQL slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 13+ the columns are mean_exec_time / total_exec_time)
docker exec -it postgres psql -U user -c "
SELECT query, calls, mean_time, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;"
# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
```
---
## 🌐 Network Performance Issues
### Symptoms
- Slow file transfers between hosts
- High latency to services
- Buffering when streaming media
### Diagnosis
```bash
# Test throughput between hosts
iperf3 -s # On server
iperf3 -c <server-ip> -t 30 # On client
# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps
# Check for packet loss
ping -c 100 <target> | tail -3
# Check network interface errors
ip -s link show eth0
```
### Common Causes & Solutions
#### 1. MTU Mismatch
```bash
# Check current MTU
ip link show | grep mtu
# Test for MTU issues (should not fragment)
ping -M do -s 1472 <target>
# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
```
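The 1472-byte payload in the ping test comes from subtracting the IPv4 and ICMP headers from the MTU; a quick sketch of the arithmetic:

```shell
# Max unfragmented ICMP payload = MTU - 20 (IPv4 header) - 8 (ICMP header)
awk -v mtu=1500 'BEGIN { print mtu - 28 }'   # standard Ethernet -> 1472
awk -v mtu=9000 'BEGIN { print mtu - 28 }'   # jumbo frames     -> 8972
```

So with jumbo frames enabled, the equivalent test is `ping -M do -s 8972 <target>`.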
#### 2. Duplex/Speed Mismatch
```bash
# Check link speed
ethtool eth0 | grep -i speed
# Force speed/duplex only as a last resort (if auto-negotiation fails);
# gigabit links normally require auto-negotiation, so check cabling/switch first
ethtool -s eth0 speed 1000 duplex full autoneg off
```
#### 3. DNS Resolution Slow
```bash
# Test DNS resolution time
time dig google.com
# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or fast upstream DNS
```

Fix in Docker by pinning DNS servers per service:

```yaml
# docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x   # Local DNS (Pi-hole)
      - 1.1.1.1       # Fallback
```
#### 4. Tailscale Performance
```bash
# Check Tailscale connection type
tailscale status
# If using DERP relay (slow), check firewall
# Port 41641/UDP should be open for direct connections
# Check Tailscale latency
tailscale ping <device>
```
#### 5. Reverse Proxy Bottleneck
```bash
# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100
```

Increase worker connections in `nginx.conf`:

```nginx
worker_processes auto;
events {
    worker_connections 4096;
}
```
---
## 💾 Storage Performance Issues
### Symptoms
- Slow read/write speeds
- High disk I/O wait
- Database queries timing out
### Diagnosis
```bash
# Check disk I/O statistics
iostat -xz 1 10
# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second
# Check for processes doing heavy I/O
iotop -o
# Test disk speed (writes a 1 GiB test file; remove it afterwards)
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct
# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct
rm /volume1/test
```
### Common Causes & Solutions
#### 1. HDD vs SSD/NVMe
```
Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s
# Move frequently accessed data to faster storage
# Use NVMe cache on Synology NAS
```
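To turn the `dd` output into a throughput figure, divide bytes written by elapsed seconds; a sketch using example numbers (the 5.4 s elapsed time is an assumed HDD-class result, not a measurement):

```shell
# MB/s = bytes transferred / seconds elapsed / 1e6
awk -v bytes=1073741824 -v secs=5.4 'BEGIN { printf "%.0f MB/s\n", bytes/secs/1e6 }'
# prints "199 MB/s" - consistent with 7200 RPM HDD sequential speed
```

Compare the result against the expected-speed table above to spot underperforming storage.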
#### 2. RAID Rebuild in Progress
```bash
# Check Synology RAID status
cat /proc/mdstat
# During rebuild, expect 30-50% performance loss
# Wait for rebuild to complete
```
#### 3. NVMe Cache Not Working
```bash
# On Synology, check cache status in DSM
# Storage Manager > SSD Cache
# Common issues:
# - Cache full (increase size or add more SSDs)
# - Wrong cache mode (read-only vs read-write)
# - Cache disabled after DSM update
```
#### 4. SMB/NFS Performance
```bash
# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"
```

Optimize SMB settings in `smb.conf` (note: `read raw`, `write raw`, and `max xmit`
are legacy SMB1 options; modern SMB2/3 clients ignore them):

```ini
socket options = TCP_NODELAY IPTOS_LOWDELAY
```

For NFS, use NFSv4.1 with larger rsize/wsize:

```bash
mount -t nfs4 nas:/share /mnt -o rsize=1048576,wsize=1048576
```
#### 5. Docker Volume Performance
```bash
# Check volume driver
docker volume inspect <volume>
# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes
```

```yaml
# docker-compose.yml - put the database volume on fast local storage
services:
  postgres:
    volumes:
      - /fast-ssd/postgres:/var/lib/postgresql/data
```
---
## 📺 Media Streaming Performance
### Symptoms
- Buffering during playback
- Transcoding takes too long
- Multiple streams cause stuttering
### Plex/Jellyfin Optimization
```bash
# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams
# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding
# For Intel QuickSync (Synology), pass the GPU device through:
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker
```
### Direct Play vs Transcoding
```
Performance comparison:
- Direct Play: ~5-20 Mbps per stream (no CPU usage)
- Transcoding: ~2000-4000 CPU score per 1080p stream
# Optimize for Direct Play:
# 1. Use compatible codecs (H.264, AAC)
# 2. Match client capabilities
# 3. Disable transcoding for local clients
```
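The direct-play decision can be sketched as a simple codec check (real servers also consider container, resolution, bit depth, and per-client profiles; the codec values here are illustrative):

```shell
# Hypothetical media properties from a library scan
codec="h264"; audio="aac"

# Widely supported codec combinations can usually Direct Play
case "$codec/$audio" in
  h264/aac|h264/mp3) echo "direct play likely" ;;
  *)                 echo "transcode expected" ;;
esac
```

Anything falling into the default branch (e.g., HEVC video with TrueHD audio on an older client) is a candidate for re-encoding ahead of time.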
### Multiple Concurrent Streams
```
10GbE can handle: ~80 concurrent 4K streams (theoretical)
1GbE can handle: ~8 concurrent 4K streams
# If hitting limits:
# 1. Reduce stream quality for remote users
# 2. Enable bandwidth limits per user
# 3. Upgrade network infrastructure
```
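The stream counts above follow from dividing usable link bandwidth by per-stream bitrate (assuming ~120 Mbps for a high-bitrate 4K remux; typical 4K streams are much lighter):

```shell
# streams = usable link Mbps / per-stream Mbps
awk -v link=10000 -v per=120 'BEGIN { printf "%.0f\n", link/per }'   # 10GbE
awk -v link=940   -v per=120 'BEGIN { printf "%.0f\n", link/per }'   # 1GbE
```

Plug in your own measured iperf3 throughput and actual media bitrates to get a realistic ceiling.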
---
## 🖥️ Synology NAS Performance
### Check System Health
```bash
# SSH into Synology
ssh admin@nas
# Check CPU/memory
top
# Check storage health
cat /proc/mdstat
syno_hdd_util --all
# Check Docker performance
docker stats
```
### Common Synology Issues
#### 1. Indexing Slowing System
```bash
# Check if Synology is indexing
ps aux | grep -i index
# Temporarily stop indexing (DSM 6.x; the service tooling differs on DSM 7)
synoservicectl --stop synoindexd
# Or schedule indexing for off-hours
# Control Panel > Indexing Service > Schedule
```
#### 2. Snapshot Replication Running
```bash
# Check running tasks
synoschedtask --list
# Schedule snapshots during low-usage hours
```
#### 3. Antivirus Scanning
```bash
# Disable real-time scanning or schedule scans
# Security Advisor > Advanced > Scheduled Scan
```
#### 4. Memory Pressure
```bash
# Check memory usage
free -h
# If low on RAM, consider:
# - Adding more RAM (DS1823xs+ supports up to 32GB)
# - Reducing number of running containers
# - Disabling unused packages
```
---
## 📊 Monitoring for Performance
### Set Up Prometheus Alerts
```yaml
# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning
      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
```
### Grafana Dashboard Panels
Key metrics to monitor:
- CPU usage by core
- Memory usage and swap
- Disk I/O latency (await)
- Network throughput and errors
- Container resource usage
- Docker volume I/O
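Disk I/O latency (await) has no single node_exporter metric; it is typically derived in PromQL like this (metric names assume a standard node_exporter deployment):

```promql
# Average read latency in seconds over 5m (analogous for writes)
rate(node_disk_read_time_seconds_total[5m])
  / rate(node_disk_reads_completed_total[5m])
```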
---
## 🛠️ Performance Tuning Checklist
### System Level
- [ ] Kernel parameters optimized (`/etc/sysctl.conf`)
- [ ] Disk scheduler appropriate for workload (mq-deadline for SSD)
- [ ] Swap configured appropriately
- [ ] File descriptor limits increased
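A starting-point sketch of the system-level items above (values are illustrative, not universal; tune per workload and apply with `sysctl -p`):

```bash
# /etc/sysctl.conf - example tunables (assumed starting values)
vm.swappiness = 10               # prefer RAM over swap
net.core.rmem_max = 16777216     # larger socket buffers help 10GbE
net.core.wmem_max = 16777216
fs.file-max = 2097152            # raise the file descriptor ceiling
```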
### Docker Level
- [ ] Container resource limits set
- [ ] Logging driver configured (json-file with max-size)
- [ ] Unused containers/images removed
- [ ] Volumes on appropriate storage
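The logging-driver item can be expressed in Compose like this (sizes are illustrative):

```yaml
# docker-compose.yml - cap json-file log growth per container
services:
  myservice:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```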
### Network Level
- [ ] Jumbo frames enabled (if supported)
- [ ] DNS resolution fast
- [ ] Firewall rules optimized
- [ ] Quality of Service (QoS) configured
### Application Level
- [ ] Database indexes optimized
- [ ] Caching enabled (Redis/Memcached)
- [ ] Connection pooling configured
- [ ] Static assets served efficiently
---
## 🔗 Related Documentation
- [Network Performance Tuning](../infrastructure/network-performance-tuning.md)
- [Monitoring Setup](../admin/monitoring.md)
- [Common Issues](common-issues.md)
- [10GbE Backbone](../diagrams/10gbe-backbone.md)
- [Storage Topology](../diagrams/storage-topology.md)