Performance Troubleshooting Guide

Overview

This guide helps diagnose and resolve performance issues in your homelab, from slow containers to network bottlenecks and storage problems.


🔍 Quick Diagnostics Checklist

Before diving deep, run through this checklist:

# 1. Check system resources
htop                           # CPU, memory usage
docker stats                   # Container resource usage
df -h                          # Disk space
iostat -x 1 5                  # Disk I/O

# 2. Check network
iperf3 -c <target-ip>          # Network throughput
ping -c 10 <target>            # Latency
netstat -tulpn                 # Open ports/connections

# 3. Check containers
docker ps -a                   # Container status
docker logs <container> --tail 100  # Recent logs

🐌 Slow Container Performance

Symptoms

  • Container takes a long time to respond
  • High CPU usage by specific container
  • Container restarts frequently

Diagnosis

# Check container resource usage
docker stats <container_name>

# Check container logs for errors
docker logs <container_name> --tail 200 | grep -i "error\|warn\|slow"

# Inspect container health
docker inspect <container_name> | jq '.[0].State'

# Check container processes
docker top <container_name>
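To spot the offender quickly, the `docker stats` output can be filtered with a small helper. A sketch: `flag_hot_containers` is a hypothetical name, and the 80% threshold is an assumption to adjust.

```shell
# Print containers whose CPU usage exceeds a threshold (default 80%).
# Expects "name cpu%" pairs on stdin, e.g. from:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}'
flag_hot_containers() {
  threshold="${1:-80}"
  # strip the trailing % and compare numerically
  awk -v t="$threshold" '{ gsub(/%/, "", $2); if ($2 + 0 > t) print $1 }'
}

# Usage:
#   docker stats --no-stream --format '{{.Name}} {{.CPUPerc}}' | flag_hot_containers 80
```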

Common Causes & Solutions

1. Memory Limits Too Low

# docker-compose.yml - Increase memory limits
services:
  myservice:
    mem_limit: 2g        # Increase from default
    memswap_limit: 4g    # Allow swap if needed

2. CPU Throttling

# docker-compose.yml - Adjust CPU limits
services:
  myservice:
    cpus: '2.0'          # Allow 2 CPU cores
    cpu_shares: 1024     # Higher priority

3. Storage I/O Bottleneck

# Check if container is doing heavy I/O
docker stats --format "table {{.Name}}\t{{.BlockIO}}"

# Solution: Move data to faster storage (NVMe cache, SSD)

4. Database Performance

# PostgreSQL slow queries (requires the pg_stat_statements extension;
# on PostgreSQL 12 and older the columns are mean_time / total_time)
docker exec -it postgres psql -U user -c "
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;"

# Add indexes for slow queries
# Increase shared_buffers in postgresql.conf
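The query above relies on `pg_stat_statements`, which is not loaded by default. A setup sketch, assuming the same containerized Postgres as above (the container name, user, and database are assumptions; adapt to your setup):

```shell
# Load the extension library (takes effect only after a restart)
docker exec -it postgres psql -U user -c \
  "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';"
docker restart postgres

# Create the extension in the database you want to profile
docker exec -it postgres psql -U user -c \
  "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
```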

🌐 Network Performance Issues

Symptoms

  • Slow file transfers between hosts
  • High latency to services
  • Buffering when streaming media

Diagnosis

# Test throughput between hosts
iperf3 -s                      # On server
iperf3 -c <server-ip> -t 30    # On client

# Expected speeds:
# - 1GbE: ~940 Mbps
# - 2.5GbE: ~2.35 Gbps
# - 10GbE: ~9.4 Gbps

# Check for packet loss
ping -c 100 <target> | tail -3

# Check network interface errors
ip -s link show eth0
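The loss percentage can also be pulled out of the ping summary programmatically, which is handy in scripts. A sketch: `packet_loss` is a hypothetical helper and assumes the common Linux iputils summary format.

```shell
# Extract the packet-loss percentage from a `ping` summary on stdin.
packet_loss() {
  awk -F',' '/packet loss/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /packet loss/) { gsub(/[^0-9.]/, "", $i); print $i }
  }'
}

# Usage: ping -c 100 <target> | packet_loss
```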

Common Causes & Solutions

1. MTU Mismatch

# Check current MTU
ip link show | grep mtu

# Test for MTU issues (should not fragment)
ping -M do -s 1472 <target>

# Fix: Set consistent MTU across network
ip link set eth0 mtu 1500
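The `-s 1472` in the ping test above is not arbitrary: it is the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header. A small helper makes the arithmetic explicit (`max_ping_payload` is a hypothetical name):

```shell
# Largest ICMP payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes (IPv4 header) minus 8 bytes (ICMP header).
max_ping_payload() {
  mtu="${1:-1500}"
  echo $(( mtu - 28 ))
}

# ping -M do -s "$(max_ping_payload 1500)" <target>   # 1472 for standard MTU
# ping -M do -s "$(max_ping_payload 9000)" <target>   # 8972 for jumbo frames
```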

2. Duplex/Speed Mismatch

# Check link speed
ethtool eth0 | grep -i speed

# Force correct speed (only if auto-negotiation fails; note that
# 1000BASE-T requires auto-negotiation, so forcing is mainly for 10/100 links)
ethtool -s eth0 speed 1000 duplex full autoneg off

3. DNS Resolution Slow

# Test DNS resolution time
time dig google.com

# If slow, check /etc/resolv.conf
# Use local Pi-hole/AdGuard or fast upstream DNS

# Fix in Docker
# docker-compose.yml
services:
  myservice:
    dns:
      - 192.168.1.x    # Local DNS (Pi-hole)
      - 1.1.1.1        # Fallback

4. Tailscale Performance

# Check Tailscale connection type
tailscale status

# If using DERP relay (slow), check firewall
# Port 41641/UDP should be open for direct connections

# Check Tailscale latency
tailscale ping <device>
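Peers stuck on a DERP relay can be filtered out of the status listing. A rough sketch only: the `tailscale status` layout varies between versions, so treat this as a quick filter rather than a stable interface (`relayed_peers` is a hypothetical name).

```shell
# List peers whose `tailscale status` line mentions a relay
# (i.e. no direct connection has been established).
relayed_peers() {
  awk '/relay/ { print $2 }'
}

# Usage: tailscale status | relayed_peers
```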

5. Reverse Proxy Bottleneck

# Check Nginx Proxy Manager logs
docker logs nginx-proxy-manager --tail 100

# Increase worker connections
# In nginx.conf:
worker_processes auto;
events {
    worker_connections 4096;
}

💾 Storage Performance Issues

Symptoms

  • Slow read/write speeds
  • High disk I/O wait
  • Database queries timing out

Diagnosis

# Check disk I/O statistics
iostat -xz 1 10

# Key metrics:
# - %util > 90% = disk saturated
# - await > 20ms = slow disk
# - r/s, w/s = operations per second

# Check for processes doing heavy I/O
iotop -o

# Test disk speed (delete the test file afterwards)
# Sequential write
dd if=/dev/zero of=/volume1/test bs=1G count=1 oflag=direct

# Sequential read
dd if=/volume1/test of=/dev/null bs=1G count=1 iflag=direct
rm /volume1/test
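To translate a `dd` result into MB/s (or script the comparison against the expected speeds below), divide bytes written by elapsed seconds. A sketch; `throughput_mbs` is a hypothetical name:

```shell
# Bytes and elapsed seconds -> MiB/s (matching dd's 1G = 1 GiB block size).
throughput_mbs() {
  bytes="$1"; secs="$2"
  awk -v b="$bytes" -v s="$secs" 'BEGIN { printf "%.1f\n", b / s / 1048576 }'
}

# throughput_mbs 1073741824 2    # 1 GiB in 2 s -> 512.0, SATA-SSD territory
```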

Common Causes & Solutions

1. HDD vs SSD/NVMe

Expected speeds:
- HDD (7200 RPM): 100-200 MB/s sequential
- SATA SSD: 500-550 MB/s
- NVMe SSD: 2000-7000 MB/s

# Move frequently accessed data to faster storage
# Use NVMe cache on Synology NAS

2. RAID Rebuild in Progress

# Check Synology RAID status
cat /proc/mdstat

# During rebuild, expect 30-50% performance loss
# Wait for rebuild to complete

3. NVMe Cache Not Working

# On Synology, check cache status in DSM
# Storage Manager > SSD Cache

# Common issues:
# - Cache full (increase size or add more SSDs)
# - Wrong cache mode (read-only vs read-write)
# - Cache disabled after DSM update

4. SMB/NFS Performance

# Test SMB performance
smbclient //nas/share -U user -c "put largefile.bin"

# Optimize SMB settings in smb.conf:
socket options = TCP_NODELAY IPTOS_LOWDELAY
# The following apply to legacy SMB1 only; SMB2/3 negotiates these automatically
read raw = yes
write raw = yes
max xmit = 65535

# For NFS, use NFSv4.1 with larger rsize/wsize
mount -t nfs4 nas:/share /mnt -o rsize=1048576,wsize=1048576

5. Docker Volume Performance

# Check volume driver
docker volume inspect <volume>

# For better performance, use:
# - Bind mounts instead of named volumes for large datasets
# - Local SSD for database volumes

# docker-compose.yml
volumes:
  - /fast-ssd/postgres:/var/lib/postgresql/data

📺 Media Streaming Performance

Symptoms

  • Buffering during playback
  • Transcoding takes too long
  • Multiple streams cause stuttering

Plex/Jellyfin Optimization

# Check transcoding status
# Plex: Settings > Dashboard > Now Playing
# Jellyfin: Dashboard > Active Streams

# Enable hardware transcoding
# Plex: Settings > Transcoder > Hardware Acceleration
# Jellyfin: Dashboard > Playback > Transcoding

# For Intel QuickSync (Synology), pass the GPU render device through
# (a comment after the line-continuation backslash would break the command):
docker run -d \
  --device /dev/dri:/dev/dri \
  -e PLEX_CLAIM="claim-xxx" \
  plexinc/pms-docker

Direct Play vs Transcoding

Performance comparison:
- Direct Play: ~5-20 Mbps per stream, negligible CPU usage
- Software transcoding: roughly 2000 PassMark CPU score per 1080p stream (Plex's rule of thumb)

# Optimize for Direct Play:
# 1. Use compatible codecs (H.264, AAC)
# 2. Match client capabilities
# 3. Disable transcoding for local clients

Multiple Concurrent Streams

10GbE can handle: ~80 concurrent 4K streams (theoretical, assuming ~100 Mbps remux bitrates plus headroom)
1GbE can handle: ~8 concurrent 4K streams

# If hitting limits:
# 1. Reduce stream quality for remote users
# 2. Enable bandwidth limits per user
# 3. Upgrade network infrastructure
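The stream counts above are simple division: usable link throughput over per-stream bitrate. A sketch, assuming ~94% usable TCP throughput on Ethernet and a ~100 Mbps 4K remux bitrate (both assumptions; the figures above leave extra headroom for bursts):

```shell
# Theoretical concurrent streams = usable link Mbps / per-stream Mbps.
max_streams() {
  link_mbps="$1"; stream_mbps="$2"
  awk -v l="$link_mbps" -v s="$stream_mbps" 'BEGIN { printf "%d\n", (l * 0.94) / s }'
}

# max_streams 10000 100   # 94 theoretical 4K remux streams on 10GbE
# max_streams 1000 100    # 9 on 1GbE
```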

🖥️ Synology NAS Performance

Check System Health

# SSH into Synology
ssh admin@nas

# Check CPU/memory
top

# Check storage health
cat /proc/mdstat
syno_hdd_util --all

# Check Docker performance
docker stats

Common Synology Issues

1. Indexing Slowing System

# Check if Synology is indexing
ps aux | grep -i index

# Temporarily stop indexing
synoservicectl --stop synoindexd

# Or schedule indexing for off-hours
# Control Panel > Indexing Service > Schedule

2. Snapshot Replication Running

# Check running tasks
synoschedtask --list

# Schedule snapshots during low-usage hours

3. Antivirus Scanning

# Disable real-time scanning or schedule scans
# Security Advisor > Advanced > Scheduled Scan

4. Memory Pressure

# Check memory usage
free -h

# If low on RAM, consider:
# - Adding more RAM (DS1823xs+ supports up to 32GB)
# - Reducing number of running containers
# - Disabling unused packages

📊 Monitoring for Performance

Set Up Prometheus Alerts

# prometheus/rules/performance.yml
groups:
  - name: performance
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 90
        for: 5m
        labels:
          severity: warning
          
      - alert: DiskIOHigh
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.9
        for: 10m
        labels:
          severity: warning
          
      - alert: NetworkErrors
        expr: rate(node_network_receive_errs_total[5m]) > 10
        for: 5m
        labels:
          severity: warning

Grafana Dashboard Panels

Key metrics to monitor:

  • CPU usage by core
  • Memory usage and swap
  • Disk I/O latency (await)
  • Network throughput and errors
  • Container resource usage
  • Docker volume I/O

🛠️ Performance Tuning Checklist

System Level

  • Kernel parameters optimized (/etc/sysctl.conf)
  • Disk scheduler appropriate for workload (mq-deadline for SSD)
  • Swap configured appropriately
  • File descriptor limits increased
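As a concrete example of the kernel-parameter item, a hedged `/etc/sysctl.d/` fragment with commonly tuned values. The numbers are assumptions to adapt to your hardware and workload, not universal recommendations:

```
# /etc/sysctl.d/99-homelab.conf - apply with: sysctl --system
# Larger socket buffers for fast links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# Prefer RAM over swap on a server
vm.swappiness = 10
# Raise the system-wide file-descriptor limit
fs.file-max = 2097152
```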

Docker Level

  • Container resource limits set
  • Logging driver configured (json-file with max-size)
  • Unused containers/images removed
  • Volumes on appropriate storage

Network Level

  • Jumbo frames enabled (if supported)
  • DNS resolution fast
  • Firewall rules optimized
  • Quality of Service (QoS) configured

Application Level

  • Database indexes optimized
  • Caching enabled (Redis/Memcached)
  • Connection pooling configured
  • Static assets served efficiently