# 📈 Scaling Guide

## Overview

This guide covers scaling the homelab infrastructure to handle more services, users, and data.

---

## 🎯 Scaling Dimensions

### 1. Vertical Scaling (Scale Up)

- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)

### 2. Horizontal Scaling (Scale Out)

- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations

---

## 💾 Storage Scaling

### Current Capacity

```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso  (DS723+):    2-bay, 24TB raw
───────────────────────────────────────
Total:                152TB raw (~107TB usable)
```

### Expansion Options

#### Option 1: Larger Drives

```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```

#### Option 2: Add Expansion Unit

```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```

#### Option 3: Add Another NAS

```yaml
# New DS1823xs+ or RS1221+
benefits:
  - Separate failure domain
  - Independent workloads
  - Location redundancy
```

---

## 🖥️ Compute Scaling

### Current Resources

```
Host          | CPU          | RAM   | Containers
──────────────┼──────────────┼───────┼───────────
Atlantis      | Ryzen V1780B | 32GB  | 55
Calypso       | Celeron J4125| 8GB   | 17
Homelab VM    | 4 vCPU       | 8GB   | 36
Proxmox       | Variable     | 32GB  | VMs
```

### Adding Compute

#### Mini PC Nodes

```yaml
# Intel NUC or similar
recommended:
  - Intel N100/N305 mini PC
  - 16-32GB RAM
  - NVMe storage
  - Low power (~15-25W)
```

#### Proxmox Cluster

```yaml
# Scale VMs across multiple hosts
cluster:
  - Node 1: Proxmox primary
  - Node 2: Proxmox secondary
  - Node 3: Proxmox tertiary

benefits:
  - Live migration
  - High availability
  - Resource pooling
```

---

## 🌐 Network Scaling

### Bandwidth Growth Path

```
1GbE → 2.5GbE → 10GbE → 25GbE
```

### Current Bottlenecks

```yaml
bottlenecks:
  - 1GbE devices limit backups
  - Internet upload for remote backup
  - Cross-VLAN traffic

solutions:
  - Upgrade remaining devices to 2.5GbE+
  - Link aggregation where supported
  - QoS for prioritization
```

### Multi-Gig Upgrade

```yaml
# Components needed for full 10GbE
switch:
  - TP-Link TL-SX1008 (8-port 10GbE)

nics:
  - Synology E10G22-T1-Mini
  - Intel X550-T2 for PCs

cables:
  - Cat6a minimum (Cat7 preferred)
```

---

## 📊 Service Scaling

### Database Scaling

```yaml
# When single PostgreSQL isn't enough
options:
  - Read replicas
  - Connection pooling (PgBouncer)
  - Partitioning large tables
  - Separate DB per service
```

### Container Orchestration

```yaml
# Beyond docker-compose
options:
  - Docker Swarm (simple)
  - Kubernetes (complex, powerful)
  - Nomad (middle ground)

# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```

### Load Balancing

```yaml
# Distribute traffic across instances
options:
  - NPM with upstream
  - Traefik
  - HAProxy

# Example: Multiple Plex transcoders
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
```

---

## 👥 User Scaling

### Current Limits

```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```

### Scaling for More Users

```yaml
# Plex: Add hardware transcoding
hardware_transcoding:
  - Intel QuickSync (preferred)
  - NVIDIA GPU (more power)

# Immich: More workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2

# General: CDN for static assets
cdn:
  - Cloudflare (free tier)
  - Self-hosted cache
```

---

## 🗺️ Geographic Scaling

### Current Locations

```
Concord, CA (Primary)  - 150+ services
Tucson, AZ (Backup)    - 4 services
Honolulu, HI (Remote)  - Access only
Seattle, WA (Cloud)    - 1 service
```

### Adding Locations

```yaml
# New remote site checklist
requirements:
  - Stable internet (50+ Mbps)
  - Synology NAS or similar
  - Tailscale node
  - Basic UPS

services_to_deploy:
  - Pi-hole/AdGuard (local DNS)
  - Syncthing (file sync)
  - Monitoring exporter
```

---

## 📋 Scaling Checklist

### Before Scaling

- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path

### During Scaling

- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes

### After Scaling

- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point

---

## 📚 Further Reading

- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)
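
---

## 🧮 Worked Example: Capacity Arithmetic

The raw-capacity figures quoted under Storage Scaling (Options 1 and 2) can be re-derived with a few lines of Python. This is only a sketch for checking the "Calculate cost/benefit" step: the helper names `raw_tb` and `percent_increase` are hypothetical, the drive counts and sizes are taken from this guide, and the result is raw capacity only (RAID and filesystem overhead are not modelled).

```python
# Sketch: re-derive the raw-capacity numbers from the Storage Scaling section.
# Helper names are illustrative; drive counts and sizes come from this guide.


def raw_tb(drives: int, size_tb: int) -> int:
    """Raw capacity in TB, before any RAID or filesystem overhead."""
    return drives * size_tb


def percent_increase(old: float, new: float) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


current = raw_tb(8, 16)                    # Atlantis today: 8 x 16TB
larger_drives = raw_tb(8, 24)              # Option 1: swap in 24TB drives
with_expansion = current + raw_tb(12, 16)  # Option 2: add 12 x 16TB

print(f"Current:        {current} TB raw")
print(f"Larger drives:  {larger_drives} TB raw "
      f"({percent_increase(current, larger_drives):.0f}% increase)")
print(f"With expansion: {with_expansion} TB raw")
```

Changing the arguments to `raw_tb` gives a quick comparison of other drive sizes before committing to a purchase. Usable capacity will be lower and depends on the RAID layout chosen.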