Sanitized mirror from private repository - 2026-03-26 12:32:56 UTC
docs/advanced/scaling.md
# 📈 Scaling Guide
## Overview
This guide covers scaling the homelab infrastructure to handle more services, users, and data.
---
## 🎯 Scaling Dimensions
### 1. Vertical Scaling (Scale Up)
- Add more RAM to existing hosts
- Upgrade CPU
- Add faster storage (NVMe)
- Upgrade network (10GbE → 25GbE)
### 2. Horizontal Scaling (Scale Out)
- Add more NAS units
- Add more compute nodes
- Distribute services across hosts
- Add remote locations
---
## 💾 Storage Scaling
### Current Capacity
```
Atlantis (DS1823xs+): 8-bay, 128TB raw
Calypso (DS723+): 2-bay, 24TB raw
───────────────────────────────────────
Total: 152TB raw (~107TB usable)
```
### Expansion Options
#### Option 1: Larger Drives
```yaml
# Replace 16TB with 20TB or 24TB drives
current: 8 × 16TB = 128TB
upgraded: 8 × 24TB = 192TB (50% increase)
```
#### Option 2: Add Expansion Unit
```yaml
# Synology DX1222 expansion
expansion: 12 × 16TB = 192TB additional
total: 128TB + 192TB = 320TB raw
```
#### Option 3: Add Another NAS
```yaml
# New DS1823xs+ or RS1221+
benefits:
- Separate failure domain
- Independent workloads
- Location redundancy
```
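For rough planning, usable capacity under a single-parity layout (SHR-1/RAID-5 style) is roughly (drives − parity) × drive size. A minimal sketch comparing the options above, using the drive counts from this page and ignoring filesystem and unit-conversion overhead:

```python
def usable_tb(drive_tb: float, drives: int, parity: int = 1) -> float:
    """Usable TB for a single-parity array (SHR-1/RAID-5 style),
    before filesystem and unit-conversion overhead."""
    return drive_tb * (drives - parity)

current = usable_tb(16, 8)             # Atlantis today
option1 = usable_tb(24, 8)             # Option 1: larger drives
option2 = current + usable_tb(16, 12)  # Option 2: DX1222 as its own array
print(current, option1, option2)       # 112.0 168.0 288.0
```

Whether the expansion unit forms its own array or extends the existing pool changes the parity math; the figures above assume a separate array.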
---
## 🖥️ Compute Scaling
### Current Resources
```
Host          | CPU           | RAM   | Containers
──────────────┼───────────────┼───────┼───────────
Atlantis      | Ryzen V1780B  | 32GB  | 55
Calypso       | Celeron J4125 | 8GB   | 17
Homelab VM    | 4 vCPU        | 8GB   | 36
Proxmox       | Variable      | 32GB  | VMs
### Adding Compute
#### Mini PC Nodes
```yaml
# Intel NUC or similar
recommended:
- Intel N100/N305 mini PC
- 16-32GB RAM
- NVMe storage
- Low power (~15-25W)
```
#### Proxmox Cluster
```yaml
# Scale VMs across multiple hosts
cluster:
- Node 1: Proxmox primary
- Node 2: Proxmox secondary
- Node 3: Proxmox tertiary
benefits:
- Live migration
- High availability
- Resource pooling
```
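Three nodes rather than two is not arbitrary: Proxmox clustering relies on corosync quorum, which requires a strict majority of votes before HA actions run. A quick sketch of the arithmetic:

```python
def quorum(nodes: int) -> int:
    """Votes needed for a corosync majority."""
    return nodes // 2 + 1

def tolerable_failures(nodes: int) -> int:
    """Nodes that can fail while the cluster stays quorate."""
    return nodes - quorum(nodes)

print(quorum(3), tolerable_failures(3))  # 2 1 -> one node can fail
print(quorum(2), tolerable_failures(2))  # 2 0 -> two nodes tolerate none
```

A two-node cluster loses quorum if either node fails (a QDevice can work around this), which is why three is the usual minimum for HA.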
---
## 🌐 Network Scaling
### Bandwidth Growth Path
```
1GbE → 2.5GbE → 10GbE → 25GbE
```
### Current Bottlenecks
```yaml
bottlenecks:
- 1GbE devices limit backups
- Internet upload for remote backup
- Cross-VLAN traffic
solutions:
- Upgrade remaining devices to 2.5GbE+
- Link aggregation where supported
- QoS for prioritization
```
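To see why the remaining 1GbE links dominate backup windows, a rough transfer-time estimate (85% link efficiency is an assumption; real throughput also depends on disks and protocol):

```python
def transfer_hours(tb: float, gbps: float, efficiency: float = 0.85) -> float:
    """Hours to move `tb` terabytes (decimal) over a `gbps` link."""
    bits = tb * 8e12                            # TB -> bits
    return bits / (gbps * 1e9 * efficiency) / 3600

for link in (1.0, 2.5, 10.0):
    print(f"{link:>4} GbE: {transfer_hours(10, link):5.1f} h for a 10 TB backup")
```

At 1GbE a 10TB job takes roughly a day; at 10GbE it fits comfortably in an overnight window.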
### Multi-Gig Upgrade
```yaml
# Components needed for full 10GbE
switch:
- TP-Link TL-SX1008 (8-port 10GbE)
nics:
- Synology E10G22-T1-Mini
- Intel X550-T2 for PCs
cables:
- Cat6a (rated for 10GbE at 100m; Cat7 adds no practical benefit over RJ45)
```
---
## 📊 Service Scaling
### Database Scaling
```yaml
# When single PostgreSQL isn't enough
options:
- Read replicas
- Connection pooling (PgBouncer)
- Partitioning large tables
- Separate DB per service
```
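A minimal PgBouncer sketch in transaction-pooling mode, which is usually the first step before replicas. The database name, auth file path, and pool sizes below are illustrative assumptions, not values from this setup:

```ini
[databases]
; hypothetical service database
immich = host=127.0.0.1 port=5432 dbname=immich

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 200
default_pool_size = 20
```

Services then connect to port 6432 instead of 5432; note that transaction pooling breaks session features such as prepared statements unless the client accounts for it.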
### Container Orchestration
```yaml
# Beyond docker-compose
options:
- Docker Swarm (simple)
- Kubernetes (complex, powerful)
- Nomad (middle ground)
# When to upgrade:
# - Need rolling updates
# - Need service discovery
# - >50 containers
# - Multi-host orchestration
```
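If Docker Swarm is the next step, existing compose files mostly carry over; Swarm mainly adds the `deploy:` section. A hedged fragment (image and counts are placeholders, not services from this stack):

```yaml
# docker-compose.yml fragment for Swarm mode
services:
  whoami:                # placeholder service
    image: traefik/whoami
    deploy:
      replicas: 3        # spread tasks across swarm nodes
      update_config:
        parallelism: 1   # rolling update, one task at a time
        delay: 10s
      restart_policy:
        condition: on-failure
```

Deployed with `docker stack deploy -c docker-compose.yml <stack-name>`; note that `deploy:` keys are ignored by plain `docker compose up`.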
### Load Balancing
```yaml
# Distribute traffic across instances
options:
- NPM with upstream
- Traefik
- HAProxy
```
Example nginx `upstream` block for multiple Plex transcoders:
```nginx
upstream plex_cluster {
    server plex1:32400;
    server plex2:32400;
}
```
---
## 👥 User Scaling
### Current Limits
```yaml
users:
plex: 5 concurrent streams
immich: ~10 users
vaultwarden: unlimited (self-hosted)
```
### Scaling for More Users
```yaml
# Plex: Add hardware transcoding
hardware_transcoding:
- Intel QuickSync (preferred)
- NVIDIA GPU (more transcode capacity, higher power draw)
# Immich: More workers
immich:
web_concurrency: 4
machine_learning_workers: 2
# General: CDN for static assets
cdn:
- Cloudflare (free tier)
- Self-hosted cache
```
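For Intel QuickSync, the usual compose change is passing the iGPU render device into the container. A sketch using common defaults (image and device path are typical choices, not confirmed from this stack; Plex requires a Plex Pass for hardware transcoding):

```yaml
services:
  plex:
    image: lscr.io/linuxserver/plex
    devices:
      - /dev/dri:/dev/dri   # Intel iGPU render nodes for QuickSync
```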
---
## 🗺️ Geographic Scaling
### Current Locations
```
Concord, CA (Primary) - 150+ services
Tucson, AZ (Backup) - 4 services
Honolulu, HI (Remote) - Access only
Seattle, WA (Cloud) - 1 service
```
### Adding Locations
```yaml
# New remote site checklist
requirements:
- Stable internet (50+ Mbps)
- Synology NAS or similar
- Tailscale node
- Basic UPS
services_to_deploy:
- Pi-hole/AdGuard (local DNS)
- Syncthing (file sync)
- Monitoring exporter
```
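A new site usually also needs an ACL entry so the remote Tailscale node reaches only what it should. A hypothetical policy sketch (the tags and ports here are invented for illustration):

```json
{
  "tagOwners": {
    "tag:remote-site": ["autogroup:admin"],
    "tag:nas": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src": ["tag:remote-site"],
      "dst": ["tag:nas:443", "tag:nas:5001"]
    }
  ]
}
```

Tagging the remote node keeps its access scoped even if the device is physically out of reach.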
---
## 📋 Scaling Checklist
### Before Scaling
- [ ] Identify actual bottleneck
- [ ] Check if optimization can help first
- [ ] Calculate cost/benefit
- [ ] Plan migration path
### During Scaling
- [ ] Test in parallel first
- [ ] Migrate data carefully
- [ ] Update monitoring
- [ ] Document changes
### After Scaling
- [ ] Verify performance improvement
- [ ] Update documentation
- [ ] Adjust budgets
- [ ] Plan next scaling point
---
## 📚 Further Reading
- [Performance Troubleshooting](../troubleshooting/performance.md)
- [Storage Topology](../diagrams/storage-topology.md)
- [Network Topology](../diagrams/network-topology.md)
- [Ansible Automation](ansible.md)