# 📈 Scaling Guide
## Overview
This guide covers scaling the homelab infrastructure to handle more services, users, and data.
---
## 🎯 Scaling Dimensions
### 1. Vertical Scaling (Scale Up)
- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)
### 2. Horizontal Scaling (Scale Out)
- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations
---
## 💾 Storage Scaling
### Current Capacity
```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+): 2-bay, 24TB raw
───────────────────────────────────────
Total: 152TB raw (~107TB usable)
```
### Expansion Options
#### Option 1: Larger Drives
```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
#### Option 2: Add Expansion Unit
```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
#### Option 3: Add Another NAS
```yaml
# New DS1823xs+ or RS1221+
benefits:
- Separate failure domain
- Independent workloads
- Location redundancy
```
---
## 🖥️ Compute Scaling
### Current Resources
```
Host        | CPU           | RAM  | Containers
────────────┼───────────────┼──────┼───────────
Atlantis    | Ryzen V1780B  | 32GB | 55
Calypso     | Celeron J4125 | 8GB  | 17
Homelab VM  | 4 vCPU        | 8GB  | 36
Proxmox     | Variable      | 32GB | VMs
```
### Adding Compute
#### Mini PC Nodes
```yaml
# Intel NUC or similar
recommended:
- Intel N100/N305 mini PC
- 16-32GB RAM
- NVMe storage
- Low power (~15-25W)
```
#### Proxmox Cluster
```yaml
# Scale VMs across multiple hosts
cluster:
- Node 1: Proxmox primary
- Node 2: Proxmox secondary
- Node 3: Proxmox tertiary
benefits:
- Live migration
- High availability
- Resource pooling
```
---
## 🌐 Network Scaling
### Bandwidth Growth Path
```
1GbE → 2.5GbE → 10GbE → 25GbE
```
### Current Bottlenecks
```yaml
bottlenecks:
- 1GbE devices limit backups
- Internet upload for remote backup
- Cross-VLAN traffic
solutions:
- Upgrade remaining devices to 2.5GbE+
- Link aggregation where supported
- QoS for prioritization
```
### Multi-Gig Upgrade
```yaml
# Components needed for full 10GbE
switch:
- TP-Link TL-SX1008 (8-port 10GbE)
nics:
- Synology E10G22-T1-Mini
- Intel X550-T2 for PCs
cables:
- Cat6a minimum (rated for 10GBASE-T runs up to 100m)
```
---
## 📊 Service Scaling
### Database Scaling
```yaml
# When single PostgreSQL isn't enough
options:
- Read replicas
- Connection pooling (PgBouncer)
- Partitioning large tables
- Separate DB per service
```
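Connection pooling is often the first of these options worth reaching for. A minimal sketch, assuming a community PgBouncer image (`edoburu/pgbouncer`; the service names and environment variables are illustrative, so check the image's own docs):

```yaml
services:
  pgbouncer:
    image: edoburu/pgbouncer      # illustrative community image
    environment:
      DB_HOST: postgres           # hypothetical upstream PostgreSQL service
      DB_USER: app
      POOL_MODE: transaction      # reuse server connections per transaction
      MAX_CLIENT_CONN: "500"
    ports:
      - "6432:5432"               # host apps connect here, not to Postgres
```

Services then point their connection strings at PgBouncer instead of PostgreSQL and share a small pool of real server connections.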
### Container Orchestration
```yaml
# Beyond docker-compose
options:
- Docker Swarm (simple)
- Kubernetes (complex, powerful)
- Nomad (middle ground)
# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
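For the Docker Swarm route, rolling updates come from a `deploy:` stanza added to an ordinary compose file. A sketch (service and image names are illustrative):

```yaml
services:
  whoami:
    image: traefik/whoami         # illustrative test service
    deploy:
      replicas: 3                 # spread across swarm nodes
      update_config:
        parallelism: 1            # replace one replica at a time
        delay: 10s                # wait between replacements
      restart_policy:
        condition: on-failure
```

The stanza takes effect under `docker stack deploy -c compose.yml demo`; standalone Compose honors only parts of it.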
### Load Balancing
```yaml
# Distribute traffic across instances
options:
- NPM with upstream
- Traefik
- HAProxy
```

For example, a minimal nginx sketch spreading Plex traffic across two transcoder instances (`plex1`/`plex2` are illustrative hostnames):

```nginx
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
server {
    listen 80;
    location / {
        proxy_pass http://plex_cluster;
    }
}
```
---
## 👥 User Scaling
### Current Limits
```yaml
users:
  plex: 5 concurrent streams
  immich: ~10 users
  vaultwarden: unlimited (self-hosted)
```
### Scaling for More Users
```yaml
# Plex: Add hardware transcoding
hardware_transcoding:
- Intel QuickSync (preferred)
- NVIDIA GPU (more capacity, higher power draw)
# Immich: More workers
immich:
  web_concurrency: 4
  machine_learning_workers: 2
# General: CDN for static assets
cdn:
- Cloudflare (free tier)
- Self-hosted cache
```
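For the QuickSync option, the main wiring is passing the iGPU device node into the Plex container. A sketch assuming a Linux host and the linuxserver.io image (the image choice is illustrative):

```yaml
services:
  plex:
    image: lscr.io/linuxserver/plex
    devices:
      - /dev/dri:/dev/dri         # Intel iGPU for hardware transcoding
```

Hardware transcoding must also be enabled in Plex's transcoder settings, and it is a Plex Pass feature.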
---
## 🗺️ Geographic Scaling
### Current Locations
```
Concord, CA (Primary) - 150+ services
Tucson, AZ (Backup) - 4 services
Honolulu, HI (Remote) - Access only
Seattle, WA (Cloud) - 1 service
```
### Adding Locations
```yaml
# New remote site checklist
requirements:
- Stable internet (50+ Mbps)
- Synology NAS or similar
- Tailscale node
- Basic UPS
services_to_deploy:
- Pi-hole/AdGuard (local DNS)
- Syncthing (file sync)
- Monitoring exporter
```
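The service list above can be sketched as a single compose file for the new site (images are the upstream projects' official ones; ports and network settings are illustrative):

```yaml
services:
  adguardhome:
    image: adguard/adguardhome
    network_mode: host            # serve DNS on port 53 for the local LAN
  syncthing:
    image: syncthing/syncthing
    ports:
      - "8384:8384"               # web UI
      - "22000:22000"             # sync protocol
  node-exporter:
    image: prom/node-exporter     # scraped by central Prometheus over Tailscale
    ports:
      - "9100:9100"
```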
---
## 📋 Scaling Checklist
### Before Scaling
- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path
### During Scaling
- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes
### After Scaling
- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point
---
## 📚 Further Reading
- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)