# 📈 Scaling Guide

## Overview

This guide covers scaling the homelab infrastructure to handle more services, users, and data.


## 🎯 Scaling Dimensions

### 1. Vertical Scaling (Scale Up)

- Add more RAM to existing hosts
- Upgrade CPUs
- Add faster storage (NVMe)
- Upgrade the network (10GbE → 25GbE)

### 2. Horizontal Scaling (Scale Out)

- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations

## 💾 Storage Scaling

### Current Capacity

```text
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+):     2-bay, 24TB raw
───────────────────────────────────────
Total:                152TB raw (~107TB usable)
```

### Expansion Options

#### Option 1: Larger Drives

```yaml
# Replace 16TB drives with 20TB or 24TB drives
current:  8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
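Assuming a RAID 6 / SHR-2 style layout with two drives of parity (the actual array layout isn't stated here), the usable gain from the drive swap can be sketched as:

```python
# Rough usable-capacity estimate for the drive-upgrade option.
# Assumes a single 8-bay array with two drives' worth of parity (RAID 6 /
# SHR-2); real Synology figures also lose some space to filesystem overhead.

def usable_tb(drives: int, size_tb: float, parity_drives: int = 2) -> float:
    """Raw capacity minus parity capacity, before filesystem overhead."""
    return (drives - parity_drives) * size_tb

current = usable_tb(8, 16)   # 96 TB usable from 128 TB raw
upgraded = usable_tb(8, 24)  # 144 TB usable from 192 TB raw

print(current, upgraded, upgraded / current - 1)  # growth fraction: 0.5
```

The 50% raw increase carries through unchanged to usable space, since the parity share stays fixed at 2 of 8 drives.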

#### Option 2: Add Expansion Unit

```yaml
# Synology DX1222 expansion unit
expansion: 12 × 16TB = 192TB additional
total:     128TB + 192TB = 320TB raw
```

#### Option 3: Add Another NAS

```yaml
# A second DS1823xs+ or an RS1221+
benefits:
  - Separate failure domain
  - Independent workloads
  - Location redundancy
```

## 🖥️ Compute Scaling

### Current Resources

| Host       | CPU           | RAM  | Containers |
|------------|---------------|------|------------|
| Atlantis   | Ryzen V1780B  | 32GB | 55         |
| Calypso    | Celeron J4125 | 8GB  | 17         |
| Homelab VM | 4 vCPU        | 8GB  | 36         |
| Proxmox    | Variable      | 32GB | VMs        |

### Adding Compute

#### Mini PC Nodes

```yaml
# Intel NUC or similar
recommended:
  - Intel N100/N305 mini PC
  - 16-32GB RAM
  - NVMe storage
  - Low power draw (~15-25W)
```

#### Proxmox Cluster

```yaml
# Scale VMs across multiple hosts
cluster:
  - Node 1: Proxmox primary
  - Node 2: Proxmox secondary
  - Node 3: Proxmox tertiary

benefits:
  - Live migration
  - High availability
  - Resource pooling
```

## 🌐 Network Scaling

### Bandwidth Growth Path

```text
1GbE → 2.5GbE → 10GbE → 25GbE
```

### Current Bottlenecks

```yaml
bottlenecks:
  - 1GbE devices limit backup throughput
  - Internet upload limits remote backup
  - Cross-VLAN routing overhead

solutions:
  - Upgrade remaining devices to 2.5GbE or faster
  - Link aggregation where supported
  - QoS to prioritize critical traffic
```
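To see why the 1GbE devices dominate backup times, a back-of-the-envelope transfer-time calculation (the 4 TB backup size is a hypothetical figure, not from this homelab):

```python
# Idealized line-rate transfer times at each link speed. Real throughput
# is lower (typically 85-95% of line rate after protocol overhead).

def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Hours to move data_tb terabytes over an ideal link_gbps link."""
    bits = data_tb * 8e12                  # decimal TB -> bits
    return bits / (link_gbps * 1e9) / 3600  # seconds -> hours

for speed_gbps in (1, 2.5, 10):
    print(f"{speed_gbps:>4} GbE: {transfer_hours(4, speed_gbps):.1f} h")
```

At 1GbE a 4 TB backup window is roughly nine hours; moving the same hosts to 2.5GbE cuts that proportionally, which is why upgrading the remaining 1GbE devices is listed first.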

### Multi-Gig Upgrade

```yaml
# Components needed for full 10GbE
switch:
  - TP-Link TL-SX1008 (8-port 10GbE)

nics:
  - Synology E10G22-T1-Mini
  - Intel X550-T2 for PCs

cables:
  - Cat6a minimum (Cat7 preferred)
```

## 📊 Service Scaling

### Database Scaling

```yaml
# When a single PostgreSQL instance isn't enough
options:
  - Read replicas
  - Connection pooling (PgBouncer)
  - Partitioning large tables
  - A separate database per service
```
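If PgBouncer is the chosen route, a minimal `pgbouncer.ini` sketch might look like this — the host and database names are placeholders, not this homelab's actual services:

```ini
; Minimal pgbouncer.ini sketch -- hostnames and database names are
; placeholders; tune pool sizes to your workload.
[databases]
immich = host=postgres port=5432 dbname=immich

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Transaction pooling gives the biggest connection savings, but note that it breaks session-level features such as prepared statements and advisory locks, so check each service's compatibility first.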

### Container Orchestration

```yaml
# Beyond docker-compose
options:
  - Docker Swarm (simple)
  - Kubernetes (complex, powerful)
  - Nomad (middle ground)

# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - More than 50 containers
# - Multi-host orchestration
```
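Docker Swarm is the gentlest step up because it reuses Compose files. A sketch of a Compose fragment whose `deploy:` keys Swarm honors via `docker stack deploy` (plain `docker compose up` ignores them; the service shown is illustrative):

```yaml
# Deployed with: docker stack deploy -c docker-compose.yml mystack
# The whoami service is just a demo image for testing the rollout.
services:
  whoami:
    image: traefik/whoami
    deploy:
      replicas: 3            # three instances spread across the cluster
      update_config:
        parallelism: 1       # rolling update, one container at a time
        delay: 10s
      restart_policy:
        condition: on-failure
```

This covers the "rolling updates" and "multi-host" triggers above without the operational weight of Kubernetes.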

### Load Balancing

```yaml
# Distribute traffic across instances
options:
  - NPM with custom upstream config
  - Traefik
  - HAProxy
```

Example: balancing across multiple Plex transcoder nodes with nginx (`plex1`/`plex2` are placeholder hostnames):

```nginx
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}

server {
    listen 32400;
    location / {
        proxy_pass http://plex_cluster;
    }
}
```

## 👥 User Scaling

### Current Limits

```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```

### Scaling for More Users

```yaml
# Plex: add hardware transcoding
hardware_transcoding:
  - Intel QuickSync (preferred)
  - NVIDIA GPU (higher power draw)
```
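For the QuickSync route, the usual approach is passing the Intel iGPU device node into the container. A sketch of the Compose fragment (the image and device path assume the common linuxserver.io setup on Linux; hardware transcoding in Plex also requires a Plex Pass):

```yaml
# Compose fragment exposing the Intel iGPU to Plex for QuickSync.
# Assumes /dev/dri exists on the host (Intel graphics driver loaded).
services:
  plex:
    image: lscr.io/linuxserver/plex:latest
    devices:
      - /dev/dri:/dev/dri   # Intel QuickSync render device
```

After starting the container, enable hardware acceleration in Plex under Settings → Transcoder to have streams use the iGPU instead of CPU cycles.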

```yaml
# Immich: more workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2

# General: CDN for static assets
cdn:
  - Cloudflare (free tier)
  - Self-hosted cache
```

## 🗺️ Geographic Scaling

### Current Locations

```text
Concord, CA (Primary)     - 150+ services
Tucson, AZ (Backup)       - 4 services
Honolulu, HI (Remote)     - Access only
Seattle, WA (Cloud)       - 1 service
```

### Adding Locations

```yaml
# New remote-site checklist
requirements:
  - Stable internet (50+ Mbps)
  - Synology NAS or similar
  - Tailscale node
  - Basic UPS

services_to_deploy:
  - Pi-hole/AdGuard (local DNS)
  - Syncthing (file sync)
  - Monitoring exporter
```

## 📋 Scaling Checklist

### Before Scaling

- Identify the actual bottleneck
- Check whether optimization can help first
- Calculate cost/benefit
- Plan the migration path

### During Scaling

- Test in parallel first
- Migrate data carefully
- Update monitoring
- Document changes

### After Scaling

- Verify the performance improvement
- Update documentation
- Adjust budgets
- Plan the next scaling point

## 📚 Further Reading