Sanitized mirror from private repository - 2026-03-26 12:32:56 UTC
docs/advanced/scaling.md
# 📈 Scaling Guide
## Overview
This guide covers scaling the homelab infrastructure to handle more services, users, and data.
---
## 🎯 Scaling Dimensions
### 1. Vertical Scaling (Scale Up)
- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)
### 2. Horizontal Scaling (Scale Out)
- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations
---
## 💾 Storage Scaling
### Current Capacity
```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+): 2-bay, 24TB raw
───────────────────────────────────────
Total: 152TB raw (~107TB usable)
```
### Expansion Options
#### Option 1: Larger Drives
```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
#### Option 2: Add Expansion Unit
```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
#### Option 3: Add Another NAS
```yaml
# New DS1823xs+ or RS1221+
benefits:
- Separate failure domain
- Independent workloads
- Location redundancy
```
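For rough planning, usable capacity under a single-parity layout (SHR-1/RAID-5 style) is roughly (drives − parity) × drive size. A minimal sketch comparing the options above, using the drive counts from this page and ignoring filesystem and unit-conversion overhead:

```python
def usable_tb(drive_tb: float, drives: int, parity: int = 1) -> float:
    """Usable TB for a single-parity array (SHR-1/RAID-5 style),
    before filesystem and unit-conversion overhead."""
    return drive_tb * (drives - parity)

current = usable_tb(16, 8)             # Atlantis today
option1 = usable_tb(24, 8)             # Option 1: larger drives
option2 = current + usable_tb(16, 12)  # Option 2: DX1222 as its own array
print(current, option1, option2)       # 112.0 168.0 288.0
```

Whether the expansion unit forms its own array or extends the existing pool changes the parity math; the figures above assume a separate array.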
---
## 🖥️ Compute Scaling
### Current Resources
```
Host          | CPU           | RAM   | Containers
──────────────┼───────────────┼───────┼───────────
Atlantis      | Ryzen V1780B  | 32GB  | 55
Calypso       | Celeron J4125 | 8GB   | 17
Homelab VM    | 4 vCPU        | 8GB   | 36
Proxmox       | Variable      | 32GB  | VMs
### Adding Compute
#### Mini PC Nodes
```yaml
# Intel NUC or similar
recommended:
- Intel N100/N305 mini PC
- 16-32GB RAM
- NVMe storage
- Low power (~15-25W)
```
#### Proxmox Cluster
```yaml
# Scale VMs across multiple hosts
cluster:
- Node 1: Proxmox primary
- Node 2: Proxmox secondary
- Node 3: Proxmox tertiary
benefits:
- Live migration
- High availability
- Resource pooling
```
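Three nodes rather than two is not arbitrary: Proxmox clustering relies on corosync quorum, which requires a strict majority of votes before HA actions run. A quick sketch of the arithmetic:

```python
def quorum(nodes: int) -> int:
    """Votes needed for a corosync majority."""
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    """Nodes that can fail while the cluster stays quorate."""
    return nodes - quorum(nodes)

print(quorum(3), tolerable_failures(3))  # 2 1 -> one node can fail
print(quorum(2), tolerable_failures(2))  # 2 0 -> two nodes tolerate none
```

A two-node cluster loses quorum if either node fails (a QDevice can work around this), which is why three is the usual minimum for HA.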
---
## 🌐 Network Scaling
### Bandwidth Growth Path
```
1GbE → 2.5GbE → 10GbE → 25GbE
```
### Current Bottlenecks
```yaml
bottlenecks:
- 1GbE devices limit backups
- Internet upload for remote backup
- Cross-VLAN traffic
solutions:
- Upgrade remaining devices to 2.5GbE+
- Link aggregation where supported
- QoS for prioritization
```
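To see why the remaining 1GbE links dominate backup windows, a rough transfer-time estimate (85% link efficiency is an assumption; real throughput also depends on disks and protocol):

```python
def transfer_hours(tb: float, gbps: float, efficiency: float = 0.85) -> float:
    """Hours to move `tb` terabytes (decimal) over a `gbps` link."""
    bits = tb * 8e12                            # TB -> bits
    return bits / (gbps * 1e9 * efficiency) / 3600

for link in (1.0, 2.5, 10.0):
    print(f"{link:>4} GbE: {transfer_hours(10, link):5.1f} h for a 10 TB backup")
```

At 1GbE a 10TB job takes roughly a day; at 10GbE it fits comfortably in an overnight window.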
### Multi-Gig Upgrade
```yaml
# Components needed for full 10GbE
switch:
- TP-Link TL-SX1008 (8-port 10GbE)
nics:
- Synology E10G22-T1-Mini
- Intel X550-T2 for PCs
cables:
- Cat6a (rated for 10GbE at 100m; Cat7 adds no practical benefit over RJ45)
```
---
## 📊 Service Scaling
### Database Scaling
```yaml
# When single PostgreSQL isn't enough
options:
- Read replicas
- Connection pooling (PgBouncer)
- Partitioning large tables
- Separate DB per service
```
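A minimal PgBouncer sketch in transaction-pooling mode, which is usually the first step before replicas. The database name, auth file path, and pool sizes below are illustrative assumptions, not values from this setup:

```ini
[databases]
; hypothetical service database
immich = host=127.0.0.1 port=5432 dbname=immich

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Services then connect to port 6432 instead of 5432; note that transaction pooling breaks session features such as prepared statements unless the client accounts for it.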
### Container Orchestration
```yaml
# Beyond docker-compose
options:
- Docker Swarm (simple)
- Kubernetes (complex, powerful)
- Nomad (middle ground)
# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
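If Docker Swarm is the next step, existing compose files mostly carry over; Swarm mainly adds the `deploy:` section. A hedged fragment (image and counts are placeholders, not services from this stack):

```yaml
# docker-compose.yml fragment for Swarm mode
services:
  whoami:                # placeholder service
    image: traefik/whoami
    deploy:
      replicas: 3        # spread tasks across swarm nodes
      update_config:
        parallelism: 1   # rolling update, one task at a time
        delay: 10s
      restart_policy:
        condition: on-failure
```

Deployed with `docker stack deploy -c docker-compose.yml <stack-name>`; note that `deploy:` keys are ignored by plain `docker compose up`.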
### Load Balancing
```yaml
# Distribute traffic across instances
options:
- NPM with upstream
- Traefik
- HAProxy
```
Example nginx `upstream` block for multiple Plex transcoders:
```nginx
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
```
---
## 👥 User Scaling
### Current Limits
```yaml
users:
plex: 5 concurrent streams
immich: ~10 users
vaultwarden: unlimited (self-hosted)
```
### Scaling for More Users
```yaml
# Plex: Add hardware transcoding
hardware_transcoding:
- Intel QuickSync (preferred)
- NVIDIA GPU (more transcode capacity, higher power draw)
# Immich: More workers
immich:
web_concurrency: 4
machine_learning_workers: 2
# General: CDN for static assets
cdn:
- Cloudflare (free tier)
- Self-hosted cache
```
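For Intel QuickSync, the usual compose change is passing the iGPU render device into the container. A sketch using common defaults (image and device path are typical choices, not confirmed from this stack; Plex requires a Plex Pass for hardware transcoding):

```yaml
services:
  plex:
    image: lscr.io/linuxserver/plex
    devices:
      - /dev/dri:/dev/dri   # Intel iGPU render nodes for QuickSync
```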
---
## 🗺️ Geographic Scaling
### Current Locations
```
Concord, CA (Primary) - 150+ services
Tucson, AZ (Backup) - 4 services
Honolulu, HI (Remote) - Access only
Seattle, WA (Cloud) - 1 service
```
### Adding Locations
```yaml
# New remote site checklist
requirements:
- Stable internet (50+ Mbps)
- Synology NAS or similar
- Tailscale node
- Basic UPS
services_to_deploy:
- Pi-hole/AdGuard (local DNS)
- Syncthing (file sync)
- Monitoring exporter
```
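A new site usually also needs an ACL entry so the remote Tailscale node reaches only what it should. A hypothetical policy sketch (the tags and ports here are invented for illustration):

```json
{
  "tagOwners": {
    "tag:remote-site": ["autogroup:admin"],
    "tag:nas": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["tag:remote-site"],
      "dst": ["tag:nas:443", "tag:nas:5001"]
    }
  ]
}
```

Tagging the remote node keeps its access scoped even if the device is physically out of reach.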
---
## 📋 Scaling Checklist
### Before Scaling
- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path
### During Scaling
- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes
### After Scaling
- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point
---
## 📚 Further Reading
- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)