# 📈 Scaling Guide

## Overview

This guide covers scaling the homelab infrastructure to handle more services, users, and data.

---

## 🎯 Scaling Dimensions
### 1. Vertical Scaling (Scale Up)

- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)

### 2. Horizontal Scaling (Scale Out)

- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations

---

## 💾 Storage Scaling

### Current Capacity
```
Atlantis (DS1823xs+):  8-bay, 128TB raw
Calypso  (DS723+):     2-bay,  24TB raw
───────────────────────────────────────
Total:                        152TB raw (~107TB usable)
```
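The gap between raw and usable capacity is the space reserved for redundancy. A minimal sketch of that arithmetic, assuming (the guide does not say) that Atlantis uses two-drive parity (RAID 6 / SHR-2) across 8 × 16TB and that Calypso mirrors 2 × 12TB (RAID 1). Under those assumptions it lands at 108TB, near the ~107TB quoted above; the real figure depends on the actual layout and on filesystem overhead.

```python
# Usable-capacity estimate for arrays of equal-size drives.
# ASSUMPTION: the RAID layouts below are illustrative, not the documented config.

def usable_tb(drives: int, size_tb: float, redundancy_drives: int) -> float:
    """Capacity left after reserving `redundancy_drives` worth of space for parity/mirroring."""
    if redundancy_drives >= drives:
        raise ValueError("redundancy must leave at least one data drive")
    return (drives - redundancy_drives) * size_tb

arrays = {
    # name: (drive count, drive size in TB, drives' worth of redundancy)
    "Atlantis (assumed RAID 6 / SHR-2)": (8, 16, 2),
    "Calypso (assumed RAID 1 mirror)": (2, 12, 1),
}

raw_total = sum(n * size for n, size, _ in arrays.values())
usable_total = sum(usable_tb(*spec) for spec in arrays.values())

for name, spec in arrays.items():
    print(f"{name}: {usable_tb(*spec):g} TB usable")
print(f"Total: {raw_total:g} TB raw, {usable_total:g} TB usable")
# Many tools report binary TiB (2**40 bytes), roughly 9% smaller than decimal TB.
print(f"Approx. {usable_total * 1e12 / 2**40:.1f} TiB usable")
```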
### Expansion Options

#### Option 1: Larger Drives

```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
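Replacing every drive scales raw capacity linearly with drive size, so the gain is just the ratio of new to old size. A small sketch that reproduces the 50% figure above and adds the 20TB option; bay count and the current 16TB size come from the guide. In practice an in-place upgrade swaps one drive at a time with a rebuild after each, so plan for several rebuild cycles.

```python
# Raw capacity and percentage gain when replacing all drives in an 8-bay array.

BAYS = 8
CURRENT_TB = 16

def upgrade_gain(new_size_tb: int, bays: int = BAYS, current_tb: int = CURRENT_TB) -> tuple[int, float]:
    """Return (new raw TB, percent increase over the current raw TB)."""
    current_raw = bays * current_tb
    new_raw = bays * new_size_tb
    return new_raw, (new_raw - current_raw) / current_raw * 100

for size in (20, 24):
    raw, pct = upgrade_gain(size)
    print(f"{BAYS} × {size}TB = {raw}TB raw (+{pct:.0f}%)")
```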
#### Option 2: Add Expansion Unit

```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
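Raw capacity is only part of the decision; how the 12 new bays are pooled changes both usable space and blast radius. A sketch comparing two hypothetical layouts, assuming every pool uses two-drive parity (RAID 6 / SHR-2) and the 8 × 16TB + 12 × 16TB drive counts above: one pool spanning NAS and expansion unit, versus the expansion unit as its own pool.

```python
# Usable capacity for two ways of pooling 8 + 12 drives of 16TB.
# ASSUMPTION: every pool reserves two drives' worth of parity (RAID 6 / SHR-2).

DRIVE_TB = 16
PARITY_DRIVES = 2

def pool_usable_tb(drive_count: int) -> int:
    return (drive_count - PARITY_DRIVES) * DRIVE_TB

layouts = {
    "one pool spanning NAS + expansion": [8 + 12],
    "separate pools (NAS, expansion)": [8, 12],
}

for name, pools in layouts.items():
    usable = sum(pool_usable_tb(n) for n in pools)
    print(f"{name}: {usable}TB usable across {len(pools)} pool(s)")
```

The single pool yields 288TB versus 256TB, but losing the expansion unit or its cable then takes out more drives than the parity can cover, whereas separate pools keep the original array independent.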
#### Option 3: Add Another NAS

```yaml
# New DS1823xs+ or RS1221+
benefits:
  - Separate failure domain
  - Independent workloads
  - Location redundancy
```
---

## 🖥️ Compute Scaling

### Current Resources
```
Host       | CPU          | RAM  | Containers
───────────┼──────────────┼──────┼───────────
Atlantis   | Ryzen V1780B | 32GB | 55
Calypso    | Ryzen R1600  | 8GB  | 17
Homelab VM | 4 vCPU       | 8GB  | 36
Proxmox    | Variable     | 32GB | VMs
```
### Adding Compute

#### Mini PC Nodes

```yaml
# Intel NUC or similar
recommended:
  - Intel N100/N305 mini PC
  - 16-32GB RAM
  - NVMe storage
  - Low power (~15-25W)
```
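Low power draw matters because these nodes run 24/7. A rough sketch converting the ~15-25W range above into yearly energy use; the electricity rate is a placeholder assumption, so substitute your own tariff.

```python
# Yearly energy use and cost for an always-on node.
# ASSUMPTION: RATE_PER_KWH is a placeholder tariff, not a figure from this guide.

HOURS_PER_YEAR = 24 * 365
RATE_PER_KWH = 0.30  # hypothetical price per kWh in your local currency

def annual_kwh(watts: float) -> float:
    """Energy for one year of continuous draw at `watts`."""
    return watts * HOURS_PER_YEAR / 1000

for watts in (15, 25):
    kwh = annual_kwh(watts)
    print(f"{watts}W → {kwh:.1f} kWh/year, about {kwh * RATE_PER_KWH:.2f} per year")
```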
#### Proxmox Cluster

```yaml
# Scale VMs across multiple hosts
cluster:
  - Node 1: Proxmox primary
  - Node 2: Proxmox secondary
  - Node 3: Proxmox tertiary

benefits:
  - Live migration
  - High availability
  - Resource pooling
```
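The three-node layout is not arbitrary: Proxmox VE clusters rely on corosync quorum, where a strict majority of votes (⌊n/2⌋ + 1) must be online for the cluster configuration to stay writable and for HA to act. A small sketch of that arithmetic, assuming one vote per node and no external QDevice, shows why two nodes tolerate no failures while three tolerate one.

```python
# Majority quorum for a cluster where each node has one vote (no QDevice).

def quorum(nodes: int) -> int:
    """Votes required for a strict majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while the remaining ones still hold quorum."""
    return nodes - quorum(nodes)

for n in range(2, 6):
    print(f"{n} nodes: quorum={quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```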
---

## 🌐 Network Scaling

### Bandwidth Growth Path

```
1GbE → 2.5GbE → 10GbE → 25GbE
```
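Each step on that path cuts bulk-transfer time roughly in proportion to link speed, provided the disks on both ends can keep up. A back-of-the-envelope sketch for copying a hypothetical 10TB backup set; the dataset size and the 90% efficiency factor are illustrative assumptions.

```python
# Time to move a dataset at each link speed on the growth path.
# ASSUMPTIONS: 10TB dataset and ~90% effective throughput are illustrative only.

DATASET_TB = 10
EFFICIENCY = 0.9  # protocol overhead; disk-bound transfers may be slower still

def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = EFFICIENCY) -> float:
    bits = dataset_tb * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

for gbps in (1, 2.5, 10, 25):
    print(f"{gbps:>4} GbE: {transfer_hours(DATASET_TB, gbps):6.1f} h")
```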
### Current Bottlenecks

```yaml
bottlenecks:
  - 1GbE devices limit backups
  - Internet upload for remote backup
  - Cross-VLAN traffic

solutions:
  - Upgrade remaining devices to 2.5GbE+
  - Link aggregation where supported
  - QoS for prioritization
```
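Link aggregation adds total capacity but does not speed up a single transfer: switches and bonding drivers (for example LACP with a layer3+4 hash policy) place each flow on one member link chosen by hashing its addresses and ports. The sketch below imitates that with a hypothetical hash (CRC32 of the 4-tuple, not any vendor's actual algorithm) to show that one backup stream always lands on the same link, while many flows spread out. The IPs and ports are made up.

```python
import zlib

# Toy model of per-flow link selection in a link aggregation group (LAG).
# ASSUMPTION: CRC32 over the 4-tuple stands in for the switch's real hash function.

def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int, links: int) -> int:
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % links

LINKS = 2
backup_flow = ("10.0.0.20", "10.0.0.10", 50000, 445)  # hypothetical SMB backup stream

# The same flow hashes to the same link every time, so it is capped at one link's speed.
assert len({pick_link(*backup_flow, LINKS) for _ in range(100)}) == 1

# Many distinct flows (different client ports) spread across the group.
usage = [0] * LINKS
for port in range(50000, 50100):
    usage[pick_link("10.0.0.20", "10.0.0.10", port, 445, LINKS)] += 1
print("flows per link:", usage)
```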
### Multi-Gig Upgrade

```yaml
# Components needed for full 10GbE
switch:
  - TP-Link TL-SX1008 (8-port 10GbE)

nics:
  - Synology E10G22-T1-Mini
  - Intel X550-T2 for PCs

cables:
  - Cat6a minimum (Cat7 preferred)
```
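After swapping NICs and cables, it is worth confirming that each Linux host actually negotiated 10GbE rather than falling back to a lower speed over a marginal cable. A minimal sketch that reads the kernel's `/sys/class/net/<iface>/speed` attribute (in Mb/s; it may be unreadable or report -1 for virtual or down interfaces). `ethtool <iface>` shows the same value interactively.

```python
from pathlib import Path
from typing import Dict, List, Optional

# Report the negotiated link speed of each network interface on a Linux host.
# sysfs reports Mb/s; virtual or down interfaces may report nothing or -1.

def link_speeds(sysfs_root: str = "/sys/class/net") -> Dict[str, Optional[int]]:
    root = Path(sysfs_root)
    speeds: Dict[str, Optional[int]] = {}
    if not root.is_dir():
        return speeds  # not Linux, or sysfs not mounted
    for iface in sorted(root.iterdir()):
        if not iface.is_dir():
            continue  # skip plain files such as bonding_masters
        try:
            value = int((iface / "speed").read_text().strip())
        except (OSError, ValueError):
            value = -1  # e.g. loopback, bridges, or a missing attribute
        speeds[iface.name] = value if value > 0 else None
    return speeds

def below_target(speeds: Dict[str, Optional[int]], target_mbps: int = 10_000) -> List[str]:
    """Interfaces that report a speed lower than the target."""
    return [name for name, mbps in speeds.items() if mbps is not None and mbps < target_mbps]

if __name__ == "__main__":
    speeds = link_speeds()
    for name, mbps in speeds.items():
        print(f"{name}: {mbps if mbps is not None else 'n/a'} Mb/s")
    print("below 10GbE:", below_target(speeds))
```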
---

## 📊 Service Scaling

### Database Scaling

```yaml
# When a single PostgreSQL instance isn't enough
options:
  - Read replicas
  - Connection pooling (PgBouncer)
  - Partitioning large tables
  - Separate DB per service
```
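Partitioning is the option that changes the schema, so a concrete shape helps. PostgreSQL's declarative partitioning (version 10 and later) splits a parent table into child tables by range; a common pattern for append-heavy data such as logs or metrics is one partition per month. The sketch below only generates DDL text; the `events` table and the monthly granularity are hypothetical, and the parent is assumed to have been created with `PARTITION BY RANGE (created_at)`.

```python
from datetime import date

# Generate monthly range-partition DDL for PostgreSQL declarative partitioning.
# ASSUMPTION: hypothetical parent table `events` created with
#   CREATE TABLE events (...) PARTITION BY RANGE (created_at);

def next_month(d: date) -> date:
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)

def monthly_partitions(parent: str, start: date, months: int) -> list[str]:
    statements = []
    current = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(current)
        name = f"{parent}_{current:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{current.isoformat()}') TO ('{upper.isoformat()}');"
        )
        current = upper
    return statements

for stmt in monthly_partitions("events", date(2024, 11, 1), 3):
    print(stmt)
```

Range upper bounds are exclusive in PostgreSQL, so adjacent months do not overlap.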
### Container Orchestration

```yaml
# Beyond docker-compose
options:
  - Docker Swarm (simple)
  - Kubernetes (complex, powerful)
  - Nomad (middle ground)

# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
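Rolling updates head that list: instead of restarting every replica at once, the orchestrator replaces a few at a time and stops if a new replica fails its health check. Docker Swarm exposes this through `update_config` (`parallelism`, `failure_action`) in a stack file. The sketch below models only the batching and halt-on-failure idea in plain Python; it is not Swarm's implementation, and the `healthy` callback is a hypothetical stand-in for a real health check.

```python
from typing import Callable, Iterator, List

# Toy model of a rolling update: replace replicas in batches of `parallelism`,
# halting as soon as an updated replica fails its (hypothetical) health check.

def batches(replicas: List[str], parallelism: int) -> Iterator[List[str]]:
    for i in range(0, len(replicas), parallelism):
        yield replicas[i:i + parallelism]

def rolling_update(replicas: List[str], parallelism: int,
                   healthy: Callable[[str], bool]) -> List[str]:
    """Return the replicas that were updated before the rollout finished or halted."""
    if parallelism < 1:
        raise ValueError("parallelism must be >= 1")
    updated: List[str] = []
    for batch in batches(replicas, parallelism):
        updated.extend(batch)  # a real orchestrator would stop old tasks and start new ones here
        if not all(healthy(r) for r in batch):
            print(f"halting: unhealthy replica in {batch}")
            break
    return updated

replicas = [f"web.{n}" for n in range(1, 6)]
print(rolling_update(replicas, parallelism=2, healthy=lambda r: True))
print(rolling_update(replicas, parallelism=2, healthy=lambda r: r != "web.3"))
```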
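### Load Balancing

```nginx
# Distribute traffic across instances
# Options: NPM (Nginx Proxy Manager) with a custom upstream, Traefik, HAProxy

# Example: multiple Plex transcoders behind one nginx upstream
# (both blocks belong inside the http {} context)
upstream plex_cluster {
    ip_hash;  # keep each client on one backend; streaming sessions are stateful
    server plex1:32400;
    server plex2:32400;
}

server {
    listen 80;
    location / {
        proxy_pass http://plex_cluster;
    }
}
```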
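Without `ip_hash`, an nginx `upstream` hands requests to servers in round-robin order and temporarily skips servers that keep failing (governed by `max_fails` and `fail_timeout`). A simplified Python model of that selection logic, for intuition only: real proxies also apply weights and retry failed requests, and the explicit `mark()` call here is a hypothetical stand-in for nginx's passive failure tracking.

```python
from itertools import cycle
from typing import Dict, List

# Simplified round-robin selection that skips backends currently marked down.

class RoundRobin:
    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("need at least one backend")
        self.backends = backends
        self.up: Dict[str, bool] = {b: True for b in backends}
        self._ring = cycle(backends)

    def mark(self, backend: str, healthy: bool) -> None:
        self.up[backend] = healthy

    def next_backend(self) -> str:
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if self.up[candidate]:
                return candidate
        raise RuntimeError("no healthy backends")

lb = RoundRobin(["plex1:32400", "plex2:32400"])
print([lb.next_backend() for _ in range(4)])
lb.mark("plex2:32400", healthy=False)
print([lb.next_backend() for _ in range(3)])
```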
---

## 👥 User Scaling

### Current Limits

```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```
### Scaling for More Users

```yaml
# Plex: add hardware transcoding
hardware_transcoding:
  - Intel QuickSync (preferred)
  - NVIDIA GPU (more power)

# Immich: more workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2

# General: CDN for static assets
cdn:
  - Cloudflare (free tier)
  - Self-hosted cache
```
---

## 🗺️ Geographic Scaling

### Current Locations

```
Concord, CA   (Primary) - 150+ services
Tucson, AZ    (Backup)  - 4 services
Honolulu, HI  (Remote)  - Access only
Seattle, WA   (Cloud)   - 1 service
```
### Adding Locations

```yaml
# New remote site checklist
requirements:
  - Stable internet (50+ Mbps)
  - Synology NAS or similar
  - Tailscale node
  - Basic UPS

services_to_deploy:
  - Pi-hole/AdGuard (local DNS)
  - Syncthing (file sync)
  - Monitoring exporter
```
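The 50+ Mbps requirement sets how much data a site can sync or back up per day, which bounds both the initial seed and the daily change set. A quick sketch of that conversion, assuming the full rate is available for transfers (other traffic and protocol overhead take a share in practice); the 2TB seed size is a hypothetical example.

```python
# Convert a sustained link rate into data moved per day, and seed time for an initial sync.
# ASSUMPTION: the full rate is available for transfers; the 2TB seed is hypothetical.

SECONDS_PER_DAY = 86_400

def gb_per_day(mbps: float) -> float:
    """Decimal gigabytes transferable in 24 hours at `mbps` megabits per second."""
    return mbps * 1e6 / 8 * SECONDS_PER_DAY / 1e9

def seed_days(dataset_tb: float, mbps: float) -> float:
    """Days needed to push `dataset_tb` terabytes at a sustained `mbps`."""
    return dataset_tb * 1000 / gb_per_day(mbps)

for mbps in (50, 100, 500):
    print(f"{mbps} Mbps: {gb_per_day(mbps):.0f} GB/day, 2TB seed in {seed_days(2, mbps):.1f} days")
```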
---

## 📋 Scaling Checklist

### Before Scaling

- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path

### During Scaling

- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes

### After Scaling

- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point
---

## 📚 Further Reading

- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)