Sanitized mirror from private repository - 2026-04-20 01:32:01 UTC
docs/advanced/scaling.md
# 📈 Scaling Guide

## Overview

This guide covers scaling the homelab infrastructure to handle more services, users, and data.

---
## 🎯 Scaling Dimensions

### 1. Vertical Scaling (Scale Up)

- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)

### 2. Horizontal Scaling (Scale Out)

- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations

---
## 💾 Storage Scaling

### Current Capacity

```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+): 2-bay, 24TB raw
───────────────────────────────────────
Total: 152TB raw (~107TB usable)
```

### Expansion Options

#### Option 1: Larger Drives

```yaml
# Replace 16TB drives with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
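Raw capacity overstates what can actually be stored. Redundancy consumes whole drives, and most operating systems report sizes in binary TiB while drives are sold in decimal TB.

The sketch below compares the current and upgraded Atlantis arrays under RAID 5 and RAID 6 parity. The RAID level is an assumption, because this guide does not state it. With equal-size drives, Synology SHR-1 and SHR-2 give the same capacity as RAID 5 and RAID 6. Filesystem overhead is ignored, so real usable space will be somewhat lower.

```python
"""Rough usable-capacity estimate for a drive array (assumed RAID layouts)."""

TB = 10**12   # drive vendors sell decimal terabytes
TIB = 2**40   # most OS tools report binary tebibytes

MIN_DRIVES = {"raid1": 2, "raid5": 3, "raid6": 4}
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def usable_tb(drives: int, size_tb: float, level: str) -> float:
    """Usable capacity in decimal TB before filesystem overhead."""
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level == "raid1":
        return size_tb  # every drive holds a full mirror copy
    return (drives - PARITY_DRIVES[level]) * size_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal TB to the binary TiB figure most tools display."""
    return tb * TB / TIB


if __name__ == "__main__":
    for label, size in (("current", 16), ("upgraded", 24)):
        for level in ("raid5", "raid6"):
            tb = usable_tb(8, size, level)
            print(f"{label:<8} 8 x {size}TB {level}: {tb:.0f} TB (~{tb_to_tib(tb):.1f} TiB)")
```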
#### Option 2: Add Expansion Unit

```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```

#### Option 3: Add Another NAS

```yaml
# New DS1823xs+ or RS1221+
benefits:
  - Separate failure domain
  - Independent workloads
  - Location redundancy (if installed at another site)
```

---
## 🖥️ Compute Scaling

### Current Resources

```
Host          | CPU           | RAM   | Containers
──────────────┼───────────────┼───────┼───────────
Atlantis      | Ryzen V1780B  | 32GB  | 55
Calypso       | Celeron J4125 | 8GB   | 17
Homelab VM    | 4 vCPU        | 8GB   | 36
Proxmox       | Variable      | 32GB  | VMs
```

### Adding Compute

#### Mini PC Nodes

```yaml
# Intel NUC or similar
recommended:
  - Intel N100/N305 mini PC
  - 16-32GB RAM
  - NVMe storage
  - Low power (~15-25W)
```
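The ~15-25W range is easier to weigh against the hardware price when expressed as yearly energy for an always-on node. This is a small sketch; the $0.30/kWh rate is a placeholder assumption, not a figure from this guide.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used by a device drawing `watts` continuously for one year."""
    return watts * HOURS_PER_YEAR / 1000


def annual_cost(watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost at a flat rate (ignores time-of-use pricing)."""
    return annual_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    PRICE = 0.30  # hypothetical $/kWh; substitute your utility's rate
    for watts in (15, 25):
        print(f"{watts} W: {annual_kwh(watts):.0f} kWh/yr, ${annual_cost(watts, PRICE):.0f}/yr")
```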
#### Proxmox Cluster

```yaml
# Scale VMs across multiple hosts
# (Proxmox clusters are multi-master; the node labels below are just names)
cluster:
  - Node 1: Proxmox primary
  - Node 2: Proxmox secondary
  - Node 3: Proxmox tertiary

benefits:
  - Live migration
  - High availability (needs quorum plus shared or replicated storage)
  - Resource pooling
```
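Three nodes is the smallest Proxmox cluster that stays operational after losing a node. The cluster stack (corosync) requires a strict majority of votes, and each node has one vote by default. A two-node cluster therefore loses quorum when either node fails, unless an external QDevice supplies a tie-breaking vote.

The sketch below shows only the majority arithmetic. It is not a Proxmox API.

```python
def quorum(total_votes: int) -> int:
    """Smallest strict majority of total_votes."""
    return total_votes // 2 + 1


def tolerated_failures(nodes: int, votes_per_node: int = 1) -> int:
    """How many nodes can fail while the survivors still hold quorum."""
    total = nodes * votes_per_node
    spare_votes = total - quorum(total)
    return spare_votes // votes_per_node


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} node(s): quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```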
---

## 🌐 Network Scaling

### Bandwidth Growth Path

```
1GbE → 2.5GbE → 10GbE → 25GbE
```
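Each step up the path shortens large copies, such as full backups, roughly in proportion to link speed. The sketch below estimates transfer time for a hypothetical 10 TB backup. It assumes about 90% of line rate survives protocol overhead.

Above 2.5GbE, disk throughput on either end can become the real limit, so treat these as best-case figures.

```python
LINKS_GBPS = {"1GbE": 1.0, "2.5GbE": 2.5, "10GbE": 10.0, "25GbE": 25.0}


def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 0.9) -> float:
    """Hours to move size_tb (decimal TB) at link_gbps, scaled by an assumed efficiency."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for name, gbps in LINKS_GBPS.items():
        print(f"{name:>6}: {transfer_hours(10, gbps):5.1f} h for 10 TB")
```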
### Current Bottlenecks

```yaml
bottlenecks:
  - 1GbE devices limit backups
  - Internet upload for remote backup
  - Cross-VLAN traffic (routed, so capped by the router/firewall's throughput)

solutions:
  - Upgrade remaining devices to 2.5GbE+
  - Link aggregation where supported (raises total throughput across clients; a single transfer still uses one link)
  - QoS for prioritization
```

### Multi-Gig Upgrade

```yaml
# Components needed for full 10GbE
switch:
  - TP-Link TL-SX1008 (8-port 10GbE)

nics:
  - Synology E10G22-T1-Mini
  - Intel X550-T2 for PCs

cables:
  - Cat6a minimum (rated for 10GBASE-T up to 100 m; Cat6 only on short runs, up to ~55 m)
```
---

## 📊 Service Scaling

### Database Scaling

```yaml
# When a single PostgreSQL instance isn't enough
options:
  - Read replicas
  - Connection pooling (PgBouncer)
  - Partitioning large tables
  - Separate DB per service
```
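Connection pooling is usually the first of these to reach for. Every PostgreSQL connection is a separate server process, and the server refuses new ones beyond `max_connections`. That limit is 100 by default, with `superuser_reserved_connections` = 3 held back for superusers.

The sketch below adds up what each client would open on its own; the service names and pool sizes are hypothetical. With PgBouncer in front, those client connections terminate at the pooler, which keeps only a bounded pool of server connections per database/user pair.

```python
from dataclasses import dataclass


@dataclass
class Client:
    name: str
    replicas: int   # container instances of the service
    pool_size: int  # connections each instance keeps open


def connection_demand(clients: list[Client]) -> int:
    """Total server connections if every client connects directly."""
    return sum(c.replicas * c.pool_size for c in clients)


def headroom(clients: list[Client], max_connections: int = 100, reserved: int = 3) -> int:
    """Connections left for ordinary roles; negative means some will be refused."""
    return max_connections - reserved - connection_demand(clients)


if __name__ == "__main__":
    # Hypothetical services sharing one PostgreSQL instance
    clients = [
        Client("app-a", replicas=2, pool_size=20),
        Client("app-b", replicas=1, pool_size=30),
        Client("app-c", replicas=3, pool_size=10),
    ]
    print(f"demand={connection_demand(clients)}, headroom={headroom(clients)}")
```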
### Container Orchestration

```yaml
# Beyond docker-compose
options:
  - Docker Swarm (simple)
  - Kubernetes (complex, powerful)
  - Nomad (middle ground)

# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
### Load Balancing

```yaml
# Distribute traffic across instances
options:
  - NPM (Nginx Proxy Manager) with a custom upstream
  - Traefik
  - HAProxy
```

```nginx
# Example: multiple Plex instances behind one upstream
# (plex1/plex2 are placeholder hostnames)
upstream plex_cluster {
    ip_hash;  # pin each client to one instance; each Plex server keeps its own sessions
    server plex1:32400;
    server plex2:32400;
}

server {
    listen 80;
    location / {
        proxy_pass http://plex_cluster;
    }
}
```
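By default, an nginx `upstream` hands requests to its servers round-robin. Adding the `ip_hash` directive instead routes each client to a fixed server, which matters for stateful backends such as separate Plex instances.

The Python sketch below models the two policies. Per the nginx docs, `ip_hash` keys on the first three octets of an IPv4 address, and the sketch follows that. The SHA-256 hash, however, is a stand-in and not nginx's actual hash function.

```python
import hashlib
from itertools import cycle

# Placeholder backends matching the nginx example's hostnames
BACKENDS = ["plex1:32400", "plex2:32400"]


def round_robin(backends):
    """Yield backends in turn regardless of client (nginx's default policy)."""
    return cycle(backends)


def ip_hash_pick(client_ip: str, backends: list[str]) -> str:
    """Map an IPv4 client to a backend using its first three octets."""
    key = ".".join(client_ip.split(".")[:3])
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]


if __name__ == "__main__":
    rr = round_robin(BACKENDS)
    print([next(rr) for _ in range(4)])  # plex1, plex2, plex1, plex2
    print({ip_hash_pick("192.168.1.50", BACKENDS) for _ in range(4)})  # a single backend
```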
---

## 👥 User Scaling

### Current Limits

```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: no license limit (self-hosted; bounded by host resources)
```

### Scaling for More Users

```yaml
# Plex: add hardware transcoding (requires Plex Pass)
hardware_transcoding:
  - Intel QuickSync (preferred)
  - NVIDIA GPU (more power)

# Immich: more workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2

# General: CDN for static assets
cdn:
  - Cloudflare (free tier)
  - Self-hosted cache
```
---

## 🗺️ Geographic Scaling

### Current Locations

```
Concord, CA (Primary) - 150+ services
Tucson, AZ (Backup) - 4 services
Honolulu, HI (Remote) - Access only
Seattle, WA (Cloud) - 1 service
```

### Adding Locations

```yaml
# New remote site checklist
requirements:
  - Stable internet (50+ Mbps)
  - Synology NAS or similar
  - Tailscale node
  - Basic UPS

services_to_deploy:
  - Pi-hole/AdGuard (local DNS)
  - Syncthing (file sync)
  - Monitoring exporter
```
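A new site becomes useful as a backup target only once the initial copy has landed. Home upload bandwidth sets how long that takes; this is the "Internet upload for remote backup" bottleneck listed under Network Scaling.

The sketch below treats the 50 Mbps requirement as upload speed and assumes 80% efficiency. Both that reading and the 10 TB dataset are assumptions.

```python
def seed_days(data_tb: float, upload_mbps: float, efficiency: float = 0.8) -> float:
    """Days to push data_tb (decimal TB) over an uplink at an assumed efficiency."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (upload_mbps * 1e6 * efficiency)
    return seconds / 86_400


if __name__ == "__main__":
    for mbps in (50, 100, 500):
        print(f"{mbps:>3} Mbps up: {seed_days(10, mbps):4.1f} days for 10 TB")
```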
---

## 📋 Scaling Checklist

### Before Scaling

- [ ] Identify the actual bottleneck
- [ ] Check whether optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan the migration path

### During Scaling

- [ ] Test the new setup in parallel before cutting over
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes

### After Scaling

- [ ] Verify the performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan the next scaling point

---

## 📚 Further Reading

- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)