# 📈 Scaling Guide
## Overview
This guide covers scaling the homelab infrastructure to handle more services, users, and data.
---
## 🎯 Scaling Dimensions
### 1. Vertical Scaling (Scale Up)
- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)
### 2. Horizontal Scaling (Scale Out)
- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations
---
## 💾 Storage Scaling
### Current Capacity
```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+): 2-bay, 24TB raw
───────────────────────────────────────
Total: 152TB raw (~107TB usable)
```
### Expansion Options
#### Option 1: Larger Drives
```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
#### Option 2: Add Expansion Unit
```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
#### Option 3: Add Another NAS
```yaml
# New DS1823xs+ or RS1221+
benefits:
- Separate failure domain
- Independent workloads
- Location redundancy
```
---
## 🖥️ Compute Scaling
### Current Resources
```
Host        | CPU           | RAM  | Containers
────────────┼───────────────┼──────┼───────────
Atlantis    | Ryzen V1780B  | 32GB | 55
Calypso     | Celeron J4125 | 8GB  | 17
Homelab VM  | 4 vCPU        | 8GB  | 36
Proxmox     | Variable      | 32GB | VMs
```
### Adding Compute
#### Mini PC Nodes
```yaml
# Intel NUC or similar
recommended:
- Intel N100/N305 mini PC
- 16-32GB RAM
- NVMe storage
- Low power (~15-25W)
```
#### Proxmox Cluster
```yaml
# Scale VMs across multiple hosts
cluster:
- Node 1: Proxmox primary
- Node 2: Proxmox secondary
- Node 3: Proxmox tertiary
benefits:
- Live migration
- High availability
- Resource pooling
```
---
## 🌐 Network Scaling
### Bandwidth Growth Path
```
1GbE → 2.5GbE → 10GbE → 25GbE
```
### Current Bottlenecks
```yaml
bottlenecks:
- 1GbE devices limit backups
- Internet upload for remote backup
- Cross-VLAN traffic
solutions:
- Upgrade remaining devices to 2.5GbE+
- Link aggregation where supported
- QoS for prioritization
```
### Multi-Gig Upgrade
```yaml
# Components needed for full 10GbE
switch:
- TP-Link TL-SX1008 (8-port 10GbE)
nics:
- Synology E10G22-T1-Mini
- Intel X550-T2 for PCs
cables:
- Cat6a minimum (rated for 10GBASE-T runs up to 100m)
```
---
## 📊 Service Scaling
### Database Scaling
```yaml
# When single PostgreSQL isn't enough
options:
- Read replicas
- Connection pooling (PgBouncer)
- Partitioning large tables
- Separate DB per service
```
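Connection pooling is often the first of these options worth reaching for. A minimal sketch, assuming a community PgBouncer image (`edoburu/pgbouncer`; the service names and environment variables are illustrative, so check the image's own docs):

```yaml
services:
  pgbouncer:
    image: edoburu/pgbouncer      # illustrative community image
    environment:
      DB_HOST: postgres           # hypothetical upstream PostgreSQL service
      DB_USER: app
      POOL_MODE: transaction      # reuse server connections per transaction
      MAX_CLIENT_CONN: "500"
    ports:
      - "6432:5432"               # host apps connect here, not to Postgres
```

Services then point their connection strings at PgBouncer instead of PostgreSQL and share a small pool of real server connections.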
### Container Orchestration
```yaml
# Beyond docker-compose
options:
- Docker Swarm (simple)
- Kubernetes (complex, powerful)
- Nomad (middle ground)
# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
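For the Docker Swarm route, rolling updates come from a `deploy:` stanza added to an ordinary compose file. A sketch (service and image names are illustrative):

```yaml
services:
  whoami:
    image: traefik/whoami         # illustrative test service
    deploy:
      replicas: 3                 # spread across swarm nodes
      update_config:
        parallelism: 1            # replace one replica at a time
        delay: 10s                # wait between replacements
      restart_policy:
        condition: on-failure
```

The stanza takes effect under `docker stack deploy -c compose.yml demo`; standalone Compose honors only parts of it.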
### Load Balancing
```yaml
# Distribute traffic across instances
options:
- NPM with upstream
- Traefik
- HAProxy
```

For example, a minimal nginx sketch spreading Plex traffic across two transcoder instances (`plex1`/`plex2` are illustrative hostnames):

```nginx
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
server {
    listen 80;
    location / {
        proxy_pass http://plex_cluster;
    }
}
```
---
## 👥 User Scaling
### Current Limits
```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```
### Scaling for More Users
```yaml
# Plex: Add hardware transcoding
hardware_transcoding:
- Intel QuickSync (preferred)
- NVIDIA GPU (more capacity, higher power draw)
# Immich: More workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2
# General: CDN for static assets
cdn:
- Cloudflare (free tier)
- Self-hosted cache
```
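For the QuickSync option, the main wiring is passing the iGPU device node into the Plex container. A sketch assuming a Linux host and the linuxserver.io image (the image choice is illustrative):

```yaml
services:
  plex:
    image: lscr.io/linuxserver/plex
    devices:
      - /dev/dri:/dev/dri         # Intel iGPU for hardware transcoding
```

Hardware transcoding must also be enabled in Plex's transcoder settings, and it is a Plex Pass feature.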
---
## 🗺️ Geographic Scaling
### Current Locations
```
Concord, CA (Primary) - 150+ services
Tucson, AZ (Backup) - 4 services
Honolulu, HI (Remote) - Access only
Seattle, WA (Cloud) - 1 service
```
### Adding Locations
```yaml
# New remote site checklist
requirements:
- Stable internet (50+ Mbps)
- Synology NAS or similar
- Tailscale node
- Basic UPS
services_to_deploy:
- Pi-hole/AdGuard (local DNS)
- Syncthing (file sync)
- Monitoring exporter
```
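The service list above can be sketched as a single compose file for the new site (images are the upstream projects' official ones; ports and network settings are illustrative):

```yaml
services:
  adguardhome:
    image: adguard/adguardhome
    network_mode: host            # serve DNS on port 53 for the local LAN
  syncthing:
    image: syncthing/syncthing
    ports:
      - "8384:8384"               # web UI
      - "22000:22000"             # sync protocol
  node-exporter:
    image: prom/node-exporter     # scraped by central Prometheus over Tailscale
    ports:
      - "9100:9100"
```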
---
## 📋 Scaling Checklist
### Before Scaling
- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path
### During Scaling
- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes
### After Scaling
- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point
---
## 📚 Further Reading
- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)