homelab-optimized/docs/getting-started/40-Common-Issues.md
Sanitized mirror from private repository - 2026-04-05 12:11:15 UTC

Common Issues & Troubleshooting

Overview

This guide covers the most frequently encountered issues in the homelab environment and provides step-by-step solutions. Issues are organized by category with diagnostic steps and resolution procedures.

Container & Docker Issues

Container Won't Start

Symptoms

  • Container exits immediately after starting
  • "Container exited with code 1" errors
  • Service unavailable after deployment

Diagnostic Steps

# Check container status
docker ps -a

# View container logs
docker logs container-name

# Inspect container configuration
docker inspect container-name

# Check resource usage
docker stats
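
The exit code reported by docker ps -a often narrows down the cause before you read any logs. The sketch below maps the codes that Docker reserves (125 to 127) and the common 128 + signal values to a likely cause. The function name explain_exit_code is hypothetical, and the hints are rules of thumb, not a definitive diagnosis.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a container exit code into a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited normally" ;;
    1)   echo "application error; check docker logs" ;;
    125) echo "docker run itself failed (bad flag or daemon error)" ;;
    126) echo "container command found but not executable" ;;
    127) echo "container command not found" ;;
    137) echo "killed by SIGKILL (128+9); often the OOM killer or docker kill" ;;
    143) echo "terminated by SIGTERM (128+15); usually docker stop" ;;
    *)   echo "unrecognized code; check docker logs" ;;
  esac
}

# Usage on a live host (container-name is a placeholder):
# explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' container-name)"
```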

Common Causes & Solutions

Port Conflicts

# Check port usage
netstat -tulpn | grep :8080
ss -tulpn | grep :8080

# Solution: Change port in docker-compose.yml
ports:
  - "8081:8080"  # Use different external port
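
To script the port check, one option is to parse the listener table instead of grepping for a substring, since grep :8080 also matches :80800 or a peer port. This is a minimal sketch that assumes the column layout of ss -tln (local address in the fourth column); port_in_use is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if any listener in `ss -tln` output uses port $1.
# Reads the ss output on stdin so it can be tested without a live system.
port_in_use() {
  awk -v port="$1" '
    NR > 1 {                      # skip the header row
      n = split($4, parts, ":")   # Local Address:Port is column 4 for ss -tln
      if (parts[n] == port) found = 1
    }
    END { exit !found }           # exit 0 when a listener was found
  '
}

# Usage on a live host:
# ss -tln | port_in_use 8080 && echo "port 8080 is already taken"
```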

Permission Issues

# Check file ownership
ls -la /mnt/storage/service-name

# Fix ownership
sudo chown -R 1000:1000 /mnt/storage/service-name

# Set proper permissions
sudo chmod -R 755 /mnt/storage/service-name

Missing Environment Variables

# Check environment variables
docker exec container-name env

# Add missing variables to .env file
echo "MISSING_VAR=value" >> .env

# Recreate container
docker-compose up -d --force-recreate
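
Before recreating the container, it can help to confirm which variables are actually absent from the .env file. The sketch below only checks for a KEY= line at the start of a line; it does not validate values. The name missing_vars and the variable names in the usage line are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required variable that has no KEY= line
# in the given env file. Arguments: env file, then the variable names.
missing_vars() {
  local env_file="$1"; shift
  local name
  for name in "$@"; do
    grep -q "^${name}=" "$env_file" || echo "$name"
  done
}

# Usage (variable names are examples only):
# missing_vars .env PUID PGID TZ
```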

Container Memory Issues

Symptoms

  • Container killed by OOM (Out of Memory)
  • Slow performance or timeouts
  • System becomes unresponsive

Diagnostic Steps

# Check memory usage
free -h
docker stats

# Check system logs for OOM kills
dmesg | grep -i "killed process"
journalctl -u docker.service | grep -i oom
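
The kernel log names the process it killed, which tells you which service needs a limit or more memory. A minimal sketch, assuming the usual kernel message form "Killed process PID (name)"; oom_victims is a hypothetical helper that reads log lines on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract "PID name" pairs from kernel OOM-kill messages.
oom_victims() {
  sed -n 's/.*[Kk]illed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage on a live host:
# sudo dmesg | oom_victims
```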

Solutions

# Add memory limits to docker-compose.yml
# (Compose v1 applies deploy limits only when run with --compatibility)
services:
  service-name:
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G

# Increase system swap
sudo fallocate -l 4G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Docker Daemon Issues

Symptoms

  • "Cannot connect to Docker daemon" errors
  • Docker commands hang or timeout
  • Services become unresponsive

Diagnostic Steps

# Check Docker daemon status
systemctl status docker

# Check Docker daemon logs
journalctl -u docker.service -f

# Test Docker connectivity
docker version
docker info

Solutions

# Restart Docker daemon
sudo systemctl restart docker

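# Clean up Docker system
# (removes stopped containers, unused networks, and all images not used by a container)
docker system prune -a

# Reset Docker daemon (last resort: deletes ALL images, containers, and volumes)
sudo systemctl stop docker
sudo rm -rf /var/lib/docker
sudo systemctl start docker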

Network & Connectivity Issues

Service Not Accessible

Symptoms

  • Connection refused errors
  • Timeouts when accessing services
  • Services work internally but not externally

Diagnostic Steps

# Test local connectivity
curl -I http://localhost:8080

# Test network connectivity
curl -I http://server-ip:8080

# Check firewall rules
sudo ufw status
sudo iptables -L -n

# Check port binding
netstat -tulpn | grep :8080

Solutions

# Open firewall ports
sudo ufw allow 8080/tcp

# Check Docker port binding
# Ensure ports are properly exposed in docker-compose.yml
ports:
  - "0.0.0.0:8080:8080"  # Bind to all interfaces

# Restart networking (Debian-style ifupdown hosts; on netplan hosts use: sudo netplan apply)
sudo systemctl restart networking

DNS Resolution Issues

Symptoms

  • Cannot resolve service hostnames
  • "Name or service not known" errors
  • Services can't communicate with each other

Diagnostic Steps

# Test DNS resolution
nslookup service.local
dig service.local

# Check DNS configuration
cat /etc/resolv.conf

# Test container DNS
docker exec container-name nslookup google.com

Solutions

# Update DNS servers
# (temporary if /etc/resolv.conf is managed by systemd-resolved or NetworkManager)
echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf

# Restart systemd-resolved
sudo systemctl restart systemd-resolved

# Configure Docker DNS
# Add to /etc/docker/daemon.json
{
  "dns": ["8.8.8.8", "8.8.4.4"]
}

sudo systemctl restart docker

Reverse Proxy Issues

Symptoms

  • 502 Bad Gateway errors
  • SSL certificate errors
  • Services accessible directly but not through proxy

Diagnostic Steps

# Check proxy container logs
docker logs nginx-proxy-manager

# Test backend connectivity
curl -I http://backend-service:8080

# Check proxy configuration
docker exec nginx-proxy-manager cat /etc/nginx/nginx.conf

Solutions

# Verify backend service is running
docker ps | grep backend-service

# Check network connectivity between proxy and backend
docker exec nginx-proxy-manager ping backend-service

# Regenerate SSL certificates
# Through Nginx Proxy Manager UI or:
certbot renew --force-renewal

Storage & File System Issues

Disk Space Issues

Symptoms

  • "No space left on device" errors
  • Services failing to write data
  • System performance degradation

Diagnostic Steps

# Check disk usage
df -h
du -sh /*

# Check Docker space usage
docker system df

# Find large files
find / -type f -size +1G 2>/dev/null
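
For a quick pass over many mounts, the df output can be filtered to show only filesystems above a usage threshold. This sketch assumes the POSIX column layout of df -P and mount points without spaces; over_threshold and its default of 90 percent are illustrative choices.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "mountpoint usage" for every filesystem at or
# above the given percentage. Reads `df -P` output on stdin.
over_threshold() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    NR > 1 {                       # skip the header row
      used = $5                    # Capacity column, e.g. "95%"
      sub(/%/, "", used)
      if (used + 0 >= limit + 0) print $6, $5
    }
  '
}

# Usage on a live host:
# df -P | over_threshold 85
```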

Solutions

# Clean Docker system
# (volume prune permanently deletes data in volumes not attached to any container)
docker system prune -a
docker volume prune

# Clean log files
sudo journalctl --vacuum-time=7d
sudo find /var/log -name "*.log" -type f -mtime +30 -delete

# Move data to larger partition (stop Docker first so no files are in use)
sudo systemctl stop docker
sudo mv /var/lib/docker /mnt/storage/docker
sudo ln -s /mnt/storage/docker /var/lib/docker
sudo systemctl start docker

Permission Issues

Symptoms

  • "Permission denied" errors
  • Services can't read/write files
  • Configuration files not loading

Diagnostic Steps

# Check file permissions
ls -la /mnt/storage/service-name

# Check user/group IDs
id username
docker exec container-name id

# Check mount points
mount | grep storage

Solutions

# Fix ownership recursively
sudo chown -R 1000:1000 /mnt/storage/service-name

# Set proper permissions
sudo chmod -R 755 /mnt/storage/service-name

# Add user to docker group (log out and back in for the change to take effect)
sudo usermod -aG docker $USER

# Set PUID/PGID in docker-compose.yml
environment:
  - PUID=1000
  - PGID=1000
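
A mismatch between the PUID passed to the container and the owner of the bind-mounted directory is a frequent cause of "Permission denied". The sketch below compares the two; it assumes GNU stat (the -c option), and owner_matches is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the path is owned by the expected numeric UID.
owner_matches() {
  local path="$1" expected_uid="$2"
  [ "$(stat -c '%u' "$path")" = "$expected_uid" ]
}

# Usage on a live host:
# owner_matches /mnt/storage/service-name 1000 || echo "owner differs from PUID"
```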

RAID Array Issues

Symptoms

  • Degraded RAID arrays
  • Disk failure notifications
  • Slow storage performance

Diagnostic Steps

# Check RAID status
cat /proc/mdstat
sudo mdadm --detail /dev/md0

# Check disk health
sudo smartctl -a /dev/sda

# Check system logs
dmesg | grep -i raid
journalctl | grep -i mdadm
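
In /proc/mdstat, each array has a status field such as [UU]; an underscore marks a missing or failed member. A minimal sketch that lists degraded arrays from that field, assuming the standard md0-style array names; degraded_arrays is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the name of every md array whose member status
# (for example [U_]) contains an underscore. Reads /proc/mdstat text on stdin.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }               # remember the current array
    match($0, /\[[U_]+\]/) {                  # find the [UU] / [U_] field
      status = substr($0, RSTART, RLENGTH)
      if (index(status, "_")) print name
    }
  '
}

# Usage on a live host:
# degraded_arrays < /proc/mdstat
```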

Solutions

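# Replace failed disk: mark it failed, then remove it from the array
sudo mdadm --manage /dev/md0 --fail /dev/sdb
sudo mdadm --manage /dev/md0 --remove /dev/sdb
# Physically replace disk, then add the new one (the rebuild starts automatically)
sudo mdadm --manage /dev/md0 --add /dev/sdb

# Re-add a disk that was dropped from the array but is still healthy
sudo mdadm --manage /dev/md0 --re-add /dev/sdb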

# Monitor rebuild progress
watch cat /proc/mdstat

Service-Specific Issues

Database Connection Issues

Symptoms

  • "Connection refused" to database
  • "Too many connections" errors
  • Database corruption warnings

Diagnostic Steps

# Check database container status
docker logs postgres-container

# Test database connectivity
docker exec postgres-container psql -U user -d database -c "SELECT 1;"

# Check connection limits
docker exec postgres-container psql -U user -c "SHOW max_connections;"

Solutions

# Restart database container
docker-compose restart postgres

# Increase connection limits (takes effect only after a PostgreSQL restart)
# In postgresql.conf:
max_connections = 200

# Clean up idle connections
docker exec postgres-container psql -U user -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle';"

Web Service Issues

Symptoms

  • 500 Internal Server Error
  • Slow response times
  • Service timeouts

Diagnostic Steps

# Check service logs
docker logs web-service

# Test service health
curl -I http://localhost:8080/health

# Check resource usage
docker stats web-service

Solutions

# Restart service
docker-compose restart web-service

# Increase resource limits
deploy:
  resources:
    limits:
      memory: 2G
      cpus: '1.0'

# Check application configuration
docker exec web-service cat /config/app.conf
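
A service can take several seconds to answer after a restart, so a single curl right away may report a false failure. One way to handle this is a small retry loop, sketched below; retry is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary examples.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Arguments: attempts, delay in seconds, then the command to run.
retry() {
  local attempts="$1" delay="$2"; shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0                       # stop at the first success
    (( i < attempts )) && sleep "$delay"   # no sleep after the final attempt
  done
  return 1
}

# Usage on a live host (curl -f fails on HTTP 4xx/5xx responses):
# retry 10 3 curl -fsS -o /dev/null http://localhost:8080/health
```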

Authentication Issues

Symptoms

  • Login failures
  • "Unauthorized" errors
  • SSO integration problems

Diagnostic Steps

# Check authentication service logs
docker logs authentik-server

# Test authentication endpoint
curl -X POST http://auth.local/api/v3/auth/login

# Check user database
docker exec authentik-server ak list_users

Solutions

# Reset user password
docker exec authentik-server ak reset_password username

# Restart authentication service
docker-compose restart authentik

# Check LDAP connectivity (if applicable)
docker exec authentik-server ldapsearch -x -H ldap://server

Monitoring & Alerting Issues

Metrics Collection Issues

Symptoms

  • Missing metrics in Grafana
  • Prometheus targets down
  • Exporters not responding

Diagnostic Steps

# Check Prometheus targets
curl http://prometheus:9090/api/v1/targets

# Test exporter endpoints
curl http://node-exporter:9100/metrics

# Check Prometheus configuration
docker exec prometheus cat /etc/prometheus/prometheus.yml

Solutions

# Restart monitoring stack
docker-compose -f monitoring.yml restart

# Reload Prometheus configuration
# (works only if Prometheus was started with --web.enable-lifecycle)
curl -X POST http://prometheus:9090/-/reload

# Check network connectivity
docker exec prometheus ping node-exporter

Alert Manager Issues

Symptoms

  • Alerts not firing
  • Notifications not received
  • Alert routing problems

Diagnostic Steps

# Check AlertManager status (the v1 API was removed in Alertmanager 0.27; use v2)
curl http://alertmanager:9093/api/v2/status

# View active alerts
curl http://alertmanager:9093/api/v2/alerts

# Check routing configuration
docker exec alertmanager cat /etc/alertmanager/alertmanager.yml

Solutions

# Test notification channels by posting a test alert (v2 API)
curl -X POST http://alertmanager:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"test"}}]'

# Restart AlertManager
docker-compose restart alertmanager

# Validate configuration
docker exec alertmanager amtool check-config /etc/alertmanager/alertmanager.yml

Performance Issues

High CPU Usage

Symptoms

  • System sluggishness
  • High load averages
  • Services timing out

Diagnostic Steps

# Check system load
uptime
htop

# Check container CPU usage
docker stats

# Identify CPU-intensive processes
top -o %CPU
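
A load average is easier to judge relative to the number of CPU cores: a 1-minute load well above the core count suggests saturation, although on Linux it also counts tasks blocked on I/O. The comparison can be sketched as follows; load_exceeds_cores is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the load average is greater than the core count.
# awk is used because the shell cannot compare floating-point numbers directly.
load_exceeds_cores() {
  local load="$1" cores="$2"
  awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l > c) }'
}

# Usage on a live host:
# load_exceeds_cores "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "CPU likely saturated"
```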

Solutions

# Limit container CPU usage
deploy:
  resources:
    limits:
      cpus: '0.5'

# Optimize service configuration
# Reduce worker processes, adjust cache settings

# Scale services horizontally
docker-compose up -d --scale web-service=3

High Memory Usage

Symptoms

  • System swapping
  • OOM kills
  • Slow performance

Diagnostic Steps

# Check memory usage
free -h
cat /proc/meminfo

# Check container memory usage
docker stats

# Check for memory leaks
ps aux --sort=-%mem | head

Solutions

# Add memory limits
deploy:
  resources:
    limits:
      memory: 1G

# Increase system memory or swap
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Optimize application memory usage
# Adjust JVM heap size, database buffers, etc.

Network Performance Issues

Symptoms

  • Slow file transfers
  • High network latency
  • Connection timeouts

Diagnostic Steps

# Test network speed
iperf3 -c server-ip

# Check network interface statistics
ip -s link show

# Monitor network traffic
iftop
nethogs

Solutions

# Optimize network settings (sudo tee is needed because the redirect runs as the calling user)
echo 'net.core.rmem_max = 16777216' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max = 16777216' | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

# Check for network congestion
# Upgrade network infrastructure if needed

# Optimize Docker networking
# Use host networking for performance-critical services
network_mode: host

Security Issues

SSL Certificate Issues

Symptoms

  • Certificate expired warnings
  • SSL handshake failures
  • Browser security warnings

Diagnostic Steps

# Check certificate expiration
openssl x509 -in cert.pem -text -noout | grep "Not After"

# Test SSL connectivity
openssl s_client -connect domain.com:443

# Check certificate chain (prints every certificate the server sends)
openssl s_client -connect domain.com:443 -showcerts </dev/null
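
To turn the expiration date into something a script can act on, the notAfter value can be converted to a number of remaining days. This sketch assumes GNU date (the -d option) and the date format that openssl x509 -enddate prints; days_until is a hypothetical helper, and the optional second argument exists only to make the calculation testable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print whole days between now and a certificate's
# notAfter date. Arguments: date string, optional "now" as epoch seconds.
days_until() {
  local not_after="$1" now="${2:-$(date +%s)}"
  local expiry
  expiry=$(date -d "$not_after" +%s) || return 1   # GNU date parses the openssl format
  echo $(( (expiry - now) / 86400 ))
}

# Usage on a live host:
# end=$(openssl x509 -in cert.pem -noout -enddate | cut -d= -f2)
# days_until "$end"
```

For a simple yes/no check, openssl x509 -checkend SECONDS -noout -in cert.pem exits non-zero if the certificate expires within the given number of seconds.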

Solutions

# Renew Let's Encrypt certificates
certbot renew

# Generate new self-signed certificate
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365

# Update certificate in services
# Copy new certificates to appropriate volumes

Authentication Failures

Symptoms

  • Repeated login failures
  • Account lockouts
  • Suspicious access attempts

Diagnostic Steps

# Check authentication logs
journalctl -u ssh.service | grep "Failed password"
docker logs authentik-server | grep "login failed"

# Check fail2ban status
sudo fail2ban-client status
sudo fail2ban-client status sshd
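
Counting failures per source address shows whether the failures come from one client or from a distributed attempt. A minimal sketch, assuming the standard sshd message "Failed password for USER from IP port N"; failed_logins_by_ip is a hypothetical helper that reads log lines on stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "count ip" for each source of failed SSH password
# attempts, most frequent first.
failed_logins_by_ip() {
  awk '
    /Failed password/ {
      for (i = 1; i < NF; i++)
        if ($i == "from") { count[$(i + 1)]++; break }   # field after "from" is the IP
    }
    END { for (ip in count) print count[ip], ip }
  ' | sort -rn
}

# Usage on a live host:
# journalctl -u ssh.service | failed_logins_by_ip | head
```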

Solutions

# Unban IP addresses
sudo fail2ban-client set sshd unbanip IP_ADDRESS

# Strengthen authentication
# Enable 2FA, use SSH keys, implement rate limiting

# Monitor for brute force attacks
# Set up alerting for repeated failures

Emergency Procedures

Complete System Recovery

When to Use

  • Multiple service failures
  • System corruption
  • Hardware failures

Recovery Steps

# 1. Stop all services
docker stop $(docker ps -q)

# 2. Check system integrity (run fsck only on an unmounted filesystem, e.g. from a rescue environment)
sudo fsck /dev/sda1

# 3. Restore from backup
./scripts/restore-system.sh

# 4. Restart critical services
./scripts/deploy-critical.sh

# 5. Verify system health
./scripts/health-check.sh

Data Recovery

When to Use

  • Data corruption
  • Accidental deletion
  • Storage failures

Recovery Steps

# 1. Stop affected services
docker-compose down

# 2. Mount backup storage
sudo mount /dev/backup /mnt/restore

# 3. Restore data
sudo rsync -av /mnt/restore/service-data/ /mnt/storage/service-data/

# 4. Fix permissions
sudo chown -R 1000:1000 /mnt/storage/service-data

# 5. Restart services
docker-compose up -d

Network Recovery

When to Use

  • Network connectivity loss
  • DNS failures
  • Routing issues

Recovery Steps

# 1. Check physical connectivity
ip link show

# 2. Restart networking (Debian-style ifupdown hosts)
sudo systemctl restart networking

# 3. Reapply network configuration (netplan hosts, e.g. Ubuntu)
sudo netplan apply

# 4. Flush DNS cache
sudo systemctl restart systemd-resolved

# 5. Test connectivity
ping 8.8.8.8

Prevention Strategies

Monitoring & Alerting

  • Set up comprehensive monitoring
  • Configure proactive alerts
  • Regular health checks
  • Performance baselines

Backup & Recovery

  • Automated backup schedules
  • Regular restore testing
  • Offsite backup storage
  • Documentation of procedures

Maintenance

  • Regular system updates
  • Capacity planning
  • Performance optimization
  • Security hardening

Documentation

  • Incident response procedures
  • Configuration documentation
  • Change management processes
  • Knowledge sharing

This troubleshooting guide provides comprehensive solutions for common issues encountered in the homelab environment. Keep this guide updated with new issues and solutions as they are discovered.