Performance Troubleshooting Guide

Overview

This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.


🔍 Quick Diagnostics Checklist

Before diving deep, run through this checklist:

# 1. Check system resources
htop                           # CPU, memory usage
docker stats                   # Container resource usage
df -h                          # Disk space
iostat -x 1 5                  # Disk I/O

# 2. Check network
iperf3 -c <target-ip>          # Network throughput
ping -c 10 <target>            # Latency
netstat -tulpn                 # Open ports/connections

# 3. Check containers
docker ps -a                   # Container status
docker logs <container> --tail 100  # Recent logs

🐌 Slow Container Performance

Symptoms

  • Container takes a long time to respond
  • High CPU usage by specific container
  • Container restarts frequently

Diagnosis

# Check container resource usage
docker stats <container_name>

# Check container logs for errors
docker logs <container_name> --tail 200 | grep -i "error\|warn\|slow"

# Inspect container health
docker inspect <container_name> | jq '.[0].State'

# Check container processes
docker top <container_name>
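To spot the offender quickly, the `docker stats` output can be filtered with a small helper. A sketch: `flag_hot_containers` is a hypothetical name, and the 80% threshold is an assumption to adjust.

```shell
# Print containers whose CPU usage exceeds a threshold (default 80%).
# Expects "name cpu%" pairs on stdin, e.g. from:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'
flag_hot_containers() {
  threshold="${1:-80}"
  # strip the trailing % and compare numerically
  awk -v t="$threshold" '{ gsub(/%/, "", $2); if ($2 + 0 > t) print $1 }'
}

# Usage:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | flag_hot_containers 80
```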

Common Causes & Solutions

1. Memory Limits Too Low

# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g        # Increase from default
    memswap_limit: 4g    # Allow swap if needed

2. CPU Throttling

# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'          # Allow 2 CPU cores
    cpu_shares: 1024     # Higher priority

3. Storage I/O Bottleneck

# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"

# Solution: Move data to faster storage (NVMe cache, SSD)

4. Database Performance

# PostgreSQL slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 12 and older the columns are mean_time / total_time)
docker exec -it postgres psql -U user -c "
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;"

# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
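The query above relies on `pg_stat_statements`, which is not loaded by default. A setup sketch, assuming the same containerized Postgres as above (the container name, user, and database are assumptions; adapt to your setup):

```shell
# Load the extension library (takes effect only after a restart)
docker exec -it postgres psql -U user -c \
  "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"
docker restart postgres

# Create the extension in the database you want to profile
docker exec -it postgres psql -U user -c \
  "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
```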

🌐 Network Performance Issues

Symptoms

  • Slow file transfers between hosts
  • High latency to services
  • Buffering when streaming media

Diagnosis

# Test throughput between hosts
iperf3 -s                      # On server
iperf3 -c <server-ip> -t 30    # On client

# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps

# Check for packet loss
ping -c 100 <target> | tail -3

# Check network interface errors
ip -s link show eth0
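The loss percentage can also be pulled out of the ping summary programmatically, which is handy in scripts. A sketch: `packet_loss` is a hypothetical helper and assumes the common Linux iputils summary format.

```shell
# Extract the packet-loss percentage from a `ping` summary on stdin.
packet_loss() {
  awk -F',' '/packet loss/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /packet loss/) { gsub(/[^0-9.]/, "", $i); print $i }
  }'
}

# Usage: ping -c 100 <target> | packet_loss
```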

Common Causes & Solutions

1. MTU Mismatch

# Check current MTU
ip link show | grep mtu

# Test for MTU issues (should not fragment)
ping -M do -s 1472 <target>

# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
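The `-s 1472` in the ping test above is not arbitrary: it is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A small helper makes the arithmetic explicit (`max_ping_payload` is a hypothetical name):

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
max_ping_payload() {
  mtu="${1:-1500}"
  echo $(( mtu - 28 ))
}

# ping -M do -s "$(max_ping_payload 1500)" <target>   # 1472 for standard MTU
# ping -M do -s "$(max_ping_payload 9000)" <target>   # 8972 for jumbo frames
```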

2. Duplex/Speed Mismatch

# Check link speed
ethtool eth0 | grep -i speed

# Force correct speed (only if auto-negotiation fails; note that
# 1000BASE-T requires auto-negotiation, so forcing is mainly for 10/100 links)
ethtool -s eth0 speed 1000 duplex full autoneg off

3. DNS Resolution Slow

# Test DNS resolution time
time dig google.com

# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or fast upstream DNS

# Fix in Docker
# docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x    # Local DNS (Pi-hole)
      - 1.1.1.1        # Fallback

4. Tailscale Performance

# Check Tailscale connection type
tailscale status

# If using DERP relay (slow), check firewall
# Port 41641/UDP should be open for direct connections

# Check Tailscale latency
tailscale ping <device>
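Peers stuck on a DERP relay can be filtered out of the status listing. A rough sketch only: the `tailscale status` layout varies between versions, so treat this as a quick filter rather than a stable interface (`relayed_peers` is a hypothetical name).

```shell
# List peers whose `tailscale status` line mentions a relay
# (i.e. no direct connection has been established).
relayed_peers() {
  awk '/relay/ { print $2 }'
}

# Usage: tailscale status | relayed_peers
```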

5. Reverse Proxy Bottleneck

# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100

# Increase worker connections
# In nginx.conf:
worker_processes auto;
events {
    worker_connections 4096;
}

💾 Storage Performance Issues

Symptoms

  • Slow read/write speeds
  • High disk I/O wait
  • Database queries timing out

Diagnosis

# Check disk I/O statistics
iostat -xz 1 10

# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second

# Check for processes doing heavy I/O
iotop -o

# Test disk speed (delete the test file afterwards)
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct

# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct
rm /volume1/test
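To translate a `dd` result into MB/s (or script the comparison against the expected speeds below), divide bytes written by elapsed seconds. A sketch; `throughput_mbs` is a hypothetical name:

```shell
# Bytes and elapsed seconds -> MiB/s (matching dd's 1G = 1 GiB block size).
throughput_mbs() {
  bytes="$1"; secs="$2"
  awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

# throughput_mbs 1073741824 2    # 1 GiB in 2 s -> 512.0, SATA-SSD territory
```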

Common Causes & Solutions

1. HDD vs SSD/NVMe

Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s

# Move frequently accessed data to faster storage
# Use NVMe cache on Synology NAS

2. RAID Rebuild in Progress

# Check Synology RAID status
cat /proc/mdstat

# During rebuild, expect 30-50% performance loss
# Wait for rebuild to complete

3. NVMe Cache Not Working

# On Synology, check cache status in DSM
# Storage Manager > SSD Cache

# Common issues:
# - Cache full (increase size or add more SSDs)
# - Wrong cache mode (read-only vs read-write)
# - Cache disabled after DSM update

4. SMB/NFS Performance

# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"

# Optimize SMB settings in smb.conf:
socket options = TCP_NODELAY IPTOS_LOWDELAY
# The following apply to legacy SMB1 only; SMB2/3 negotiates these automatically
read raw = yes
write raw = yes
max xmit = 65535

# For NFS, use NFSv4.1 with larger rsize/wsize
mount -t nfs4 nas:/share /mnt -o rsize=1048576,wsize=1048576

5. Docker Volume Performance

# Check volume driver
docker volume inspect <volume>

# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes

# docker-compose.yml
volumes:
  - /fast-ssd/postgres:/var/lib/postgresql/data

📺 Media Streaming Performance

Symptoms

  • Buffering during playback
  • Transcoding takes too long
  • Multiple streams cause stuttering

Plex/Jellyfin Optimization

# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams

# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding

# For Intel QuickSync (Synology), pass the GPU render device through
# (a comment after the line-continuation backslash would break the command):
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker

Direct Play vs Transcoding

Performance comparison:
- Direct Play: ~5-20 Mbps per stream, negligible CPU usage
- Software transcoding: roughly 2000 PassMark CPU score per 1080p stream (Plex's rule of thumb)

# Optimize for Direct Play:
# 1. Use compatible codecs (H.264, AAC)
# 2. Match client capabilities
# 3. Disable transcoding for local clients

Multiple Concurrent Streams

10GbE can handle: ~80 concurrent 4K streams (theoretical, assuming ~100 Mbps remux bitrates plus headroom)
1GbE can handle: ~8 concurrent 4K streams

# If hitting limits:
# 1. Reduce stream quality for remote users
# 2. Enable bandwidth limits per user
# 3. Upgrade network infrastructure
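The stream counts above are simple division: usable link throughput over per-stream bitrate. A sketch, assuming ~94% usable TCP throughput on Ethernet and a ~100 Mbps 4K remux bitrate (both assumptions; the figures above leave extra headroom for bursts):

```shell
# Theoretical concurrent streams = usable link Mbps / per-stream Mbps.
max_streams() {
  link_mbps="$1"; stream_mbps="$2"
  awk -v l="$link_mbps" -v s="$stream_mbps" 'BEGIN { printf "%d\n", (l * 0.94) / s }'
}

# max_streams 10000 100   # 94 theoretical 4K remux streams on 10GbE
# max_streams 1000 100    # 9 on 1GbE
```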

🖥️ Synology NAS Performance

Check System Health

# SSH into Synology
ssh admin@nas

# Check CPU/memory
top

# Check storage health
cat /proc/mdstat
syno_hdd_util --all

# Check Docker performance
docker stats

Common Synology Issues

1. Indexing Slowing System

# Check if Synology is indexing
ps aux | grep -i index

# Temporarily stop indexing
synoservicectl --stop synoindexd

# Or schedule indexing for off-hours
# Control Panel > Indexing Service > Schedule

2. Snapshot Replication Running

# Check running tasks
synoschedtask --list

# Schedule snapshots during low-usage hours

3. Antivirus Scanning

# Disable real-time scanning or schedule scans
# Security Advisor > Advanced > Scheduled Scan

4. Memory Pressure

# Check memory usage
free -h

# If low on RAM, consider:
# - Adding more RAM (DS1823xs+ supports up to 32GB)
# - Reducing number of running containers
# - Disabling unused packages

📊 Monitoring for Performance

Set Up Prometheus Alerts

# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
          
      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning
          
      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning

Grafana Dashboard Panels

Key metrics to monitor:

  • CPU usage by core
  • Memory usage and swap
  • Disk I/O latency (await)
  • Network throughput and errors
  • Container resource usage
  • Docker volume I/O

🛠️ Performance Tuning Checklist

System Level

  • Kernel parameters optimized (/etc/sysctl.conf)
  • Disk scheduler appropriate for workload (mq-deadline for SSD)
  • Swap configured appropriately
  • File descriptor limits increased
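As a concrete example of the kernel-parameter item, a hedged `/etc/sysctl.d/` fragment with commonly tuned values. The numbers are assumptions to adapt to your hardware and workload, not universal recommendations:

```
# /etc/sysctl.d/99-homelab.conf - apply with: sysctl --system
# Larger socket buffers for fast links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Prefer RAM over swap on a server
vm.swappiness = 10
# Raise the system-wide file-descriptor limit
fs.file-max = 2097152
```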

Docker Level

  • Container resource limits set
  • Logging driver configured (json-file with max-size)
  • Unused containers/images removed
  • Volumes on appropriate storage

Network Level

  • Jumbo frames enabled (if supported)
  • DNS resolution fast
  • Firewall rules optimized
  • Quality of Service (QoS) configured

Application Level

  • Database indexes optimized
  • Caching enabled (Redis/Memcached)
  • Connection pooling configured
  • Static assets served efficiently