# 📈 Scaling Guide

## Overview

This guide covers scaling the homelab infrastructure to handle more services, users, and data.
## 🎯 Scaling Dimensions

### 1. Vertical Scaling (Scale Up)

- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)

### 2. Horizontal Scaling (Scale Out)

- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations
## 💾 Storage Scaling

### Current Capacity

```text
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso  (DS723+):    2-bay,  24TB raw
───────────────────────────────────────
Total:               152TB raw (~107TB usable)
```
### Expansion Options

#### Option 1: Larger Drives

```yaml
# Replace 16TB drives with 20TB or 24TB drives
current:  8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
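As a quick sanity check on the figures above, the raw-capacity math can be scripted (the `raw_tb` helper is just for illustration; usable capacity after SHR/RAID overhead will be lower):

```python
def raw_tb(bays: int, drive_tb: int) -> int:
    """Raw pool capacity before RAID/SHR overhead."""
    return bays * drive_tb

current = raw_tb(8, 16)    # 128 TB
upgraded = raw_tb(8, 24)   # 192 TB
increase = (upgraded - current) / current * 100
print(f"{current}TB -> {upgraded}TB (+{increase:.0f}%)")  # 128TB -> 192TB (+50%)
```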
#### Option 2: Add Expansion Unit

```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
#### Option 3: Add Another NAS

```yaml
# New DS1823xs+ or RS1221+
benefits:
  - Separate failure domain
  - Independent workloads
  - Location redundancy
```
## 🖥️ Compute Scaling

### Current Resources

| Host       | CPU           | RAM  | Containers |
|------------|---------------|------|------------|
| Atlantis   | Ryzen V1780B  | 32GB | 55         |
| Calypso    | Celeron J4125 | 8GB  | 17         |
| Homelab VM | 4 vCPU        | 8GB  | 36         |
| Proxmox    | Variable      | 32GB | VMs        |
### Adding Compute

#### Mini PC Nodes

```yaml
# Intel NUC or similar
recommended:
  - Intel N100/N305 mini PC
  - 16-32GB RAM
  - NVMe storage
  - Low power (~15-25W)
```
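The low-power point is worth quantifying for an always-on node. A rough sketch of annual energy use and cost (the $0.30/kWh rate is a placeholder; substitute your own tariff):

```python
def annual_kwh(watts: float) -> float:
    """Energy for a 24/7 load over one year."""
    return watts * 24 * 365 / 1000

def annual_cost_usd(watts: float, usd_per_kwh: float = 0.30) -> float:
    return annual_kwh(watts) * usd_per_kwh

print(annual_kwh(20))                  # 175.2 kWh/year for a 20W mini PC
print(round(annual_cost_usd(20), 2))   # 52.56 USD/year at $0.30/kWh
```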
#### Proxmox Cluster

```yaml
# Scale VMs across multiple hosts
cluster:
  - Node 1: Proxmox primary
  - Node 2: Proxmox secondary
  - Node 3: Proxmox tertiary
benefits:
  - Live migration
  - High availability
  - Resource pooling
```
## 🌐 Network Scaling

### Bandwidth Growth Path

```text
1GbE → 2.5GbE → 10GbE → 25GbE
```
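To see what each step of that path buys, here is a rough transfer-time calculation for a bulk copy (the 85% line-rate efficiency is an assumption to account for protocol overhead):

```python
def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 0.85) -> float:
    """Hours to move data_tb terabytes over a link running at link_gbps."""
    bits = data_tb * 8e12                        # decimal TB -> bits
    return bits / (link_gbps * 1e9 * efficiency) / 3600

# Moving a 10TB backup set at each step of the growth path
for gbps in (1, 2.5, 10, 25):
    print(f"{gbps:>4} GbE: {transfer_hours(10, gbps):5.1f} h")
```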
### Current Bottlenecks

```yaml
bottlenecks:
  - 1GbE devices limit backups
  - Internet upload for remote backup
  - Cross-VLAN traffic
solutions:
  - Upgrade remaining devices to 2.5GbE+
  - Link aggregation where supported
  - QoS for prioritization
```
### Multi-Gig Upgrade

```yaml
# Components needed for full 10GbE
switch:
  - TP-Link TL-SX1008 (8-port 10GbE)
nics:
  - Synology E10G22-T1-Mini
  - Intel X550-T2 for PCs
cables:
  - Cat6a minimum (Cat7 preferred)
```
## 📊 Service Scaling

### Database Scaling

```yaml
# When a single PostgreSQL instance isn't enough
options:
  - Read replicas
  - Connection pooling (PgBouncer)
  - Partitioning large tables
  - Separate DB per service
```
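Of these, connection pooling is usually the cheapest win. A minimal PgBouncer sketch, assuming a containerized Postgres reachable as `postgres` and an `immich` database (both names illustrative):

```ini
[databases]
; route clients for the immich DB through the pooler
immich = host=postgres port=5432 dbname=immich

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; a common default for web apps
max_client_conn = 200
default_pool_size = 20
```

Point services at port 6432 instead of 5432. Note that transaction pooling can break session-level features (e.g. session-scoped prepared statements), so some services may need `pool_mode = session`.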
### Container Orchestration

```yaml
# Beyond docker-compose
options:
  - Docker Swarm (simple)
  - Kubernetes (complex, powerful)
  - Nomad (middle ground)

# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
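If Docker Swarm is the chosen step up, existing compose files mostly carry over; a `deploy:` block adds the rolling-update behavior mentioned above (service name and replica count are illustrative):

```yaml
# compose fragment for `docker stack deploy -c compose.yml mystack`
services:
  whoami:
    image: traefik/whoami
    deploy:
      replicas: 3
      update_config:
        parallelism: 1      # replace one task at a time
        delay: 10s
        order: start-first  # start the new task before stopping the old
      restart_policy:
        condition: on-failure
```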
### Load Balancing

```yaml
# Distribute traffic across instances
options:
  - Nginx Proxy Manager (NPM) with upstreams
  - Traefik
  - HAProxy
```

```nginx
# Example: multiple Plex transcoders
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
```
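One caveat with a round-robin upstream like the example above: stateful services often need each client pinned to one backend. nginx's `ip_hash` directive provides simple stickiness (server names are the same placeholders as above):

```nginx
upstream plex_cluster {
    ip_hash;                 # same client IP -> same backend
    server plex1:32400;
    server plex2:32400;
}
```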
## 👥 User Scaling

### Current Limits

```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```
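Whether five concurrent Plex streams actually fit depends mostly on upload bandwidth. A back-of-envelope check (the 20 Mbps per-remote-1080p-stream figure is an assumption; direct-play bitrates vary widely):

```python
def required_upload_mbps(streams: int, mbps_per_stream: float = 20.0) -> float:
    """Aggregate upload needed for simultaneous remote streams."""
    return streams * mbps_per_stream

print(required_upload_mbps(5))  # 100.0 Mbps for 5 remote streams
```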
### Scaling for More Users

```yaml
# Plex: add hardware transcoding
hardware_transcoding:
  - Intel QuickSync (preferred)
  - NVIDIA GPU (more power)

# Immich: more workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2

# General: CDN for static assets
cdn:
  - Cloudflare (free tier)
  - Self-hosted cache
```
## 🗺️ Geographic Scaling

### Current Locations

```text
Concord, CA  (Primary) - 150+ services
Tucson, AZ   (Backup)  - 4 services
Honolulu, HI (Remote)  - Access only
Seattle, WA  (Cloud)   - 1 service
```
### Adding Locations

```yaml
# New remote site checklist
requirements:
  - Stable internet (50+ Mbps)
  - Synology NAS or similar
  - Tailscale node
  - Basic UPS
services_to_deploy:
  - Pi-hole/AdGuard (local DNS)
  - Syncthing (file sync)
  - Monitoring exporter
```
## 📋 Scaling Checklist

### Before Scaling

- Identify the actual bottleneck
- Check if optimization can help first
- Calculate cost/benefit
- Plan the migration path

### During Scaling

- Test in parallel first
- Migrate data carefully
- Update monitoring
- Document changes

### After Scaling

- Verify the performance improvement
- Update documentation
- Adjust budgets
- Plan the next scaling point