Sanitized mirror from private repository - 2026-04-04 11:24:58 UTC
docs/infrastructure/storage.md
# 💾 Storage Systems

**🟡 Intermediate Guide**

This document covers the storage architecture, RAID configurations, backup strategies, and data management practices for the homelab infrastructure.

---

## 🏗️ Storage Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           STORAGE INFRASTRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   PRIMARY STORAGE                     BACKUP TARGETS                        │
│   ┌─────────────────────┐             ┌─────────────────────┐               │
│   │      ATLANTIS       │             │       CALYPSO       │               │
│   │    Synology NAS     │   ──────►   │    Synology NAS     │               │
│   │                     │    Hyper    │                     │               │
│   │   8x 16TB RAID 6    │   Backup    │   2x 12TB RAID 1    │               │
│   │    ≈96TB usable     │             │    ≈12TB usable     │               │
│   │                     │             │                     │               │
│   │   + 2x 480GB NVMe   │             │   + 2x 480GB NVMe   │               │
│   │     (SSD Cache)     │             │     (SSD Cache)     │               │
│   └─────────────────────┘             └─────────────────────┘               │
│              │                                   │                          │
│              │                                   │                          │
│              ▼                                   ▼                          │
│   ┌─────────────────────────────────────────────────────────┐               │
│   │                      BACKBLAZE B2                       │               │
│   │                  Cloud Offsite Backup                   │               │
│   │              Encrypted, Versioned Storage               │               │
│   └─────────────────────────────────────────────────────────┘               │
│                                                                             │
│   SECONDARY STORAGE                                                         │
│   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐               │
│   │     GUAVA     │    │    SETILLO    │    │    PROXMOX    │               │
│   │  RAID 1 HDD   │    │  Single 1TB   │    │   Local SSD   │               │
│   │  + NVMe SSD   │    │               │    │               │               │
│   └───────────────┘    └───────────────┘    └───────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## 📊 Storage Summary

| Host | Total Raw | Usable | RAID Level | Purpose |
|------|-----------|--------|------------|---------|
| **Atlantis** | 128TB (8x16TB) | ~96TB | RAID 6 | Primary storage, media |
| **Calypso** | 24TB (2x12TB) | ~12TB | RAID 1 | Backup, development |
| **Guava** | 6TB+ | ~3TB | RAID 1 | AI/ML, compute |
| **Setillo** | 1TB | 1TB | Single | Monitoring |
| **Proxmox** | ~500GB | 500GB | Local SSD | VM storage |
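
The usable figures for the two NAS units follow from simple capacity arithmetic. A minimal sketch in Python, assuming nominal drive sizes and ignoring filesystem and DSM overhead (the function name `usable_tb` is illustrative and not part of any tool used here):

```python
def usable_tb(raid_level: str, drive_count: int, drive_tb: float) -> float:
    """Return nominal usable capacity in TB for equally sized drives."""
    minimum_drives = {"Single": 1, "RAID 1": 2, "RAID 5": 3, "RAID 6": 4}
    if raid_level not in minimum_drives:
        raise ValueError(f"unsupported RAID level: {raid_level}")
    if drive_count < minimum_drives[raid_level]:
        raise ValueError(f"{raid_level} needs at least {minimum_drives[raid_level]} drives")

    if raid_level == "RAID 1":
        # Every drive holds a full copy, so capacity equals one drive.
        return drive_tb
    if raid_level == "RAID 5":
        # One drive's worth of space is spent on parity.
        return (drive_count - 1) * drive_tb
    if raid_level == "RAID 6":
        # Two drives' worth of space is spent on dual parity.
        return (drive_count - 2) * drive_tb
    return drive_count * drive_tb  # "Single": no redundancy


for host, level, count, size in [("Atlantis", "RAID 6", 8, 16), ("Calypso", "RAID 1", 2, 12)]:
    raw = count * size
    usable = usable_tb(level, count, size)
    print(f"{host}: {raw} TB raw, {usable} TB usable ({usable / raw:.0%} efficiency)")
```

The capacity DSM reports will be somewhat lower, because it counts in binary units and reserves space for system partitions and filesystem metadata.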

---

## 🏛️ Atlantis - Primary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS1823xs+ |
| **Drive Bays** | 8x 3.5" hot-swap |
| **Drives** | 8x Seagate IronWolf Pro 16TB (ST16000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 6 (dual parity) |
| **Raw Capacity** | 128TB |
| **Usable Capacity** | ~96TB |
| **Fault Tolerance** | 2 drive failures |

### **RAID 6 Benefits**

```
RAID 6 Configuration:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ P1 │ P2 │  ← Data + Dual Parity
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ D5 │ P1 │ P2 │ D6 │  ← Parity distributed
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ P1 │ P2 │ D5 │ D6 │
└────┴────┴────┴────┴────┴────┴────┴────┘

✅ Survives 2 simultaneous drive failures
✅ Good read performance
✅ 6 drives worth of usable space (75% efficiency)
⚠️ Slower writes due to parity calculation
```
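
The rotation shown in the diagram can be reproduced with a few lines of Python. This is only a sketch of the pattern as drawn above: the function `raid6_stripe` is hypothetical, and the actual on-disk layout used by DSM may order the data chunks differently.

```python
def raid6_stripe(row: int, drives: int = 8) -> list[str]:
    """Label each drive's chunk for one stripe, moving both parity chunks one slot left per row."""
    if drives < 4:
        raise ValueError("RAID 6 needs at least 4 drives")
    p1 = (drives - 2 - row) % drives  # slot of the first parity chunk
    p2 = (p1 + 1) % drives            # second parity chunk sits right after it
    labels = [""] * drives
    labels[p1], labels[p2] = "P1", "P2"

    # Fill the remaining slots with data chunks in drive order.
    data_index = 1
    for slot in range(drives):
        if not labels[slot]:
            labels[slot] = f"D{data_index}"
            data_index += 1
    return labels


for row in range(3):
    print(" ".join(raid6_stripe(row)))
```

Because the parity slots move with every stripe, no single drive carries all parity writes, which is why the write penalty is spread across the array.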

### **Volume Layout**

```
/volume1/ (Atlantis - ~96TB usable)
│
├── /docker/              # Container persistent data
│   ├── plex/
│   ├── immich/
│   ├── grafana/
│   └── ... (all stack data)
│
├── /media/               # Media library
│   ├── movies/           # 4K + 1080p movies
│   ├── tv/               # TV series
│   ├── music/            # Music library
│   └── audiobooks/       # Audiobook collection
│
├── /photos/              # Immich photo library
│   ├── library/          # Organized photos
│   └── upload/           # Incoming uploads
│
├── /documents/           # Paperless-NGX
│   ├── consume/          # Incoming documents
│   └── archive/          # Processed documents
│
├── /backups/             # Local backup storage
│   ├── calypso/          # Cross-NAS backups
│   └── vm-snapshots/     # VM backup images
│
└── /archive/             # Long-term cold storage
    └── old-projects/
```

### **NVMe SSD Cache**

- **Type**: Read-write cache
- **Drives**: 2x WD Black SN750 480GB
- **Configuration**: RAID 1 (mirrored for safety)
- **Purpose**: Accelerate frequently accessed data

---

## 🏢 Calypso - Secondary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS723+ |
| **Drive Bays** | 2x 3.5" hot-swap |
| **Drives** | 2x Seagate IronWolf Pro 12TB (ST12000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 1 (mirrored) |
| **Raw Capacity** | 24TB |
| **Usable Capacity** | ~12TB |
| **Fault Tolerance** | 1 drive failure |

### **RAID 1 Benefits**

```
RAID 1 Configuration:
┌────────────────┐   ┌────────────────┐
│    Drive 1     │   │    Drive 2     │
│     (12TB)     │◄─►│     (12TB)     │  ← Mirror
│                │   │                │
│  All data is   │   │   Exact copy   │
│   written to   │   │   of Drive 1   │
│  both drives   │   │                │
└────────────────┘   └────────────────┘

✅ Survives 1 drive failure
✅ Fast read performance (can read from either)
✅ Simple recovery (just replace failed drive)
⚠️ 50% storage efficiency
```

### **Volume Layout**

```
/volume1/ (Calypso - ~12TB usable)
│
├── /docker/              # Container persistent data
│   ├── gitea/
│   ├── firefly/
│   ├── arr-suite/
│   └── ... (dev stacks)
│
├── /apt-cache/           # APT-Cacher-NG
│   └── cache/            # Debian package cache
│
├── /backups/             # Backup destination
│   ├── atlantis/         # Hyper Backup from Atlantis
│   └── databases/        # Database dumps
│
└── /development/         # Development data
    ├── repos/            # Git repositories
    └── projects/         # Project files
```

---

## 🖥️ Other Storage Systems

### **Guava - AI/ML Workstation**

| Component | Specification |
|-----------|--------------|
| **Primary** | 1TB NVMe SSD (OS + fast storage) |
| **Secondary** | 2x HDD in RAID 1 (~3TB usable) |
| **Purpose** | AI model storage, datasets, compute scratch |

### **Setillo - Monitoring**

| Component | Specification |
|-----------|--------------|
| **Storage** | 1TB single drive |
| **Purpose** | Prometheus metrics, AdGuard data |
| **Note** | Non-critical data, can be rebuilt |

### **Proxmox - VM Host**

| Component | Specification |
|-----------|--------------|
| **Storage** | ~500GB local SSD |
| **Purpose** | VM disk images |
| **Backup** | VMs backed up to Atlantis |

---

## 📦 Backup Strategy

### **3-2-1 Rule Implementation**

| Rule | Implementation | Status |
|------|----------------|--------|
| **3 Copies** | Original + Calypso + Backblaze | ✅ |
| **2 Media Types** | NAS HDDs + Cloud | ✅ |
| **1 Offsite** | Backblaze B2 | ✅ |
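
The three rows of the table can be expressed as a single check. The sketch below is illustrative only: the `Copy` class, the medium labels, and `satisfies_3_2_1` are hypothetical names, not part of any backup tool in use here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # where this copy of the data lives
    medium: str    # coarse media class, e.g. "nas-hdd" or "cloud"
    offsite: bool  # True if the copy is stored away from the homelab


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Return True if there are 3+ copies on 2+ media types with 1+ offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    Copy("Atlantis", "nas-hdd", offsite=False),
    Copy("Calypso", "nas-hdd", offsite=False),
    Copy("Backblaze B2", "cloud", offsite=True),
]
print(satisfies_3_2_1(copies))
```

Dropping the Backblaze entry fails all three conditions at once, which shows how much of the rule rests on the cloud copy.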

### **Backup Flow**

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ATLANTIS   │────►│   CALYPSO   │────►│  BACKBLAZE  │
│  (Primary)  │     │   (Local)   │     │     B2      │
│             │     │             │     │  (Offsite)  │
│  Original   │     │    Hyper    │     │    Cloud    │
│    Data     │     │   Backup    │     │   Backup    │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
   Immediate          < 24 hours          < 24 hours
    Access             Recovery            Recovery
```

### **Backup Software**

| Tool | Source | Destination | Schedule |
|------|--------|-------------|----------|
| **Synology Hyper Backup** | Atlantis | Calypso | Daily |
| **Synology Cloud Sync** | Atlantis | Backblaze B2 | Daily |
| **Synology Hyper Backup** | Calypso | Backblaze B2 | Weekly |

### **What Gets Backed Up**

| Data Type | Priority | Frequency | Retention |
|-----------|----------|-----------|-----------|
| **Docker configs** | Critical | Daily | 30 days |
| **Databases** | Critical | Daily | 30 days |
| **Photos (Immich)** | High | Daily | Forever |
| **Documents** | High | Daily | 1 year |
| **Media library** | Medium | Weekly | Latest only |
| **VM snapshots** | Medium | Weekly | 4 versions |
| **Logs** | Low | Not backed up | N/A |
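
The retention column mixes three kinds of rules: age-based, count-based, and keep-forever. Hyper Backup applies its own rotation settings, so the following is only a sketch of the logic, with hypothetical key names and a hypothetical `versions_to_keep` helper:

```python
from datetime import date, timedelta

# Retention rules taken from the table above; the dictionary keys are illustrative.
RETENTION = {
    "docker-configs": ("days", 30),
    "databases": ("days", 30),
    "photos": ("forever", None),
    "documents": ("days", 365),
    "media": ("versions", 1),
    "vm-snapshots": ("versions", 4),
}


def versions_to_keep(data_type: str, backup_dates: list[date], today: date) -> list[date]:
    """Return the backup dates that the retention rule keeps, newest first."""
    kind, limit = RETENTION[data_type]
    newest_first = sorted(backup_dates, reverse=True)
    if kind == "forever":
        return newest_first
    if kind == "versions":
        return newest_first[:limit]
    # Age-based rule: keep everything no older than `limit` days.
    cutoff = today - timedelta(days=limit)
    return [d for d in newest_first if d >= cutoff]


today = date(2026, 4, 4)
weekly = [today - timedelta(weeks=i) for i in range(8)]
print(versions_to_keep("vm-snapshots", weekly, today))
```

With eight weekly VM snapshots available, the rule keeps only the four most recent ones.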

### **Recovery Time Objectives**

| Scenario | RTO Target | Recovery Method |
|----------|------------|-----------------|
| Single file recovery | < 1 hour | Hyper Backup restore |
| Service recovery | < 4 hours | Docker volume restore |
| Full NAS recovery | < 24 hours | Bare metal + B2 restore |
| Disaster recovery | < 48 hours | New hardware + B2 restore |
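
Whether a restore fits inside its target depends largely on how much data has to cross the link. A rough estimate in Python, where the 1 Gbps link speed, the 80% efficiency factor, and the 0.5 TB size for critical data are all assumptions made for illustration rather than measured values:

```python
def restore_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to download data_tb terabytes over a link_mbps link."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency within (0, 1]")
    bits = data_tb * 1e12 * 8                       # decimal terabytes to bits
    seconds = bits / (link_mbps * 1e6 * efficiency)  # effective throughput in bit/s
    return seconds / 3600


scenarios = [
    ("Critical data only (assumed 0.5 TB)", 0.5),
    ("All data currently on Atlantis (~60 TB)", 60),
]
for label, size_tb in scenarios:
    print(f"{label}: {restore_hours(size_tb, link_mbps=1000):.1f} h at 1 Gbps")
```

Under these assumptions the critical data comes back in under two hours, while a complete download of about 60 TB would take several days. That suggests reading the 24 and 48 hour targets as covering the critical and high-priority data first, with the media library following afterwards.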

---

## 📂 Shared Storage (NFS/SMB)

### **Network Shares**

| Share | Protocol | Host | Access | Purpose |
|-------|----------|------|--------|---------|
| `/media` | SMB | Atlantis | Read-only (most), RW (arr) | Media streaming |
| `/photos` | SMB | Atlantis | RW (Immich user) | Photo backup |
| `/docker` | NFS | Atlantis | RW (Docker hosts) | Container data |
| `/backups` | SMB | Calypso | RW (backup service) | Backup destination |
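
### **Docker Volume Mounts**

Containers access NAS storage through bind mounts of paths under `/volume1`. The example uses the paths as they appear on Atlantis itself; other Docker hosts reach the `/docker` share over NFS, as listed in the table above.

```yaml
# Example: Plex accessing media
volumes:
  - /volume1/docker/plex:/config   # Read-write config directory
  - /volume1/media:/media:ro       # Media library, mounted read-only
```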

### **Permission Model**

```
NAS User: docker (UID 1000)
├── Owns /volume1/docker/
├── Read access to /volume1/media/
└── Write access to specific paths

NAS User: media (UID 1001)
├── Write access to /volume1/media/
└── Used by *arr suite for downloads
```

---

## 📈 Storage Monitoring

### **Metrics Collected**

| Metric | Tool | Alert Threshold |
|--------|------|-----------------|
| Disk usage | Prometheus + Node Exporter | > 85% |
| RAID health | Synology DSM | Degraded |
| Drive SMART | Synology DSM | Warning/Critical |
| I/O latency | Prometheus | > 100ms |
| Backup status | Hyper Backup | Failed |
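
The disk usage threshold is the easiest of these to reproduce by hand. The production alert lives in Prometheus, so the snippet below is only a local sanity check using Python's standard library; the names `exceeds_threshold` and `check_path` are illustrative.

```python
import shutil

USAGE_ALERT_THRESHOLD = 0.85  # "> 85%" from the table above


def exceeds_threshold(used: float, total: float, threshold: float = USAGE_ALERT_THRESHOLD) -> bool:
    """Return True when the used fraction is strictly above the threshold."""
    if total <= 0:
        raise ValueError("total must be positive")
    return used / total > threshold


def check_path(path: str) -> tuple[float, bool]:
    """Return (used fraction, alert flag) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total, exceeds_threshold(usage.used, usage.total)


fraction, alert = check_path(".")
print(f"usage: {fraction:.1%}, alert: {alert}")
```

Running it against a mounted share, for example a path under `/volume1`, gives a quick cross-check of what the dashboard reports.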

### **Grafana Dashboard**

Storage dashboard shows:

- Volume utilization trends
- I/O throughput
- RAID rebuild status
- Drive temperatures
- Backup completion status

---

## 🔮 Storage Expansion Plan

### **Current Utilization**

| Host | Used | Total | % Used |
|------|------|-------|--------|
| Atlantis | ~60TB | 96TB | 62% |
| Calypso | ~12TB | 12TB | ~100% |
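
These figures can be turned into a rough runway estimate. The sketch below assumes linear growth and reuses the 85% alert threshold from the monitoring section; the growth rates are placeholders, since this document does not state the actual rate.

```python
def months_until_threshold(
    used_tb: float,
    total_tb: float,
    growth_tb_per_month: float,
    threshold: float = 0.85,
) -> float:
    """Return months until usage reaches the threshold, or 0.0 if it already has."""
    if growth_tb_per_month <= 0:
        raise ValueError("growth rate must be positive")
    headroom_tb = total_tb * threshold - used_tb
    return max(0.0, headroom_tb / growth_tb_per_month)


# Growth rates below are assumptions for illustration only.
print(f"Atlantis: {months_until_threshold(60, 96, growth_tb_per_month=1.0):.1f} months")
print(f"Calypso:  {months_until_threshold(12, 12, growth_tb_per_month=0.5):.1f} months")
```

Calypso returns zero whatever growth rate is assumed, because it is already past the threshold.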

### **Future Expansion Options**

1. **Atlantis**: All 8 bays are populated
   - Replace the 16TB drives with larger ones (24TB+) when available
   - Add an expansion unit (DX517)

2. **Calypso**: At capacity
   - Replace the 12TB drives with 20TB+ drives
   - Consider migrating to a larger NAS

3. **New NAS**: For cold/archive storage
   - Lower-powered unit for infrequent access
   - RAID 5 acceptable for archive data

---

## 🛠️ Maintenance Tasks

### **Regular Maintenance**

| Task | Frequency | Procedure |
|------|-----------|-----------|
| SMART check | Weekly | Review DSM health |
| Data scrubbing | Monthly | Synology scheduled task |
| Backup verification | Monthly | Test restore of random files |
| Capacity review | Quarterly | Plan for growth |

### **Drive Replacement Procedure**

1. **Identify failed drive** via DSM notification
2. **Order replacement** (same or larger capacity)
3. **Hot-swap** the failed drive, then start the repair in DSM Storage Manager if it does not begin automatically
4. **Monitor rebuild** (can take 24-48 hours for large arrays)
5. **Verify RAID health** after rebuild completes
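
The 24-48 hour range in step 4 is consistent with a simple estimate: a rebuild has to write the whole replacement drive. The sustained rebuild rates below are assumptions for illustration, and real rates drop when the array is serving other I/O.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate hours to rewrite a full drive at a sustained rebuild rate."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    megabytes = drive_tb * 1e6  # decimal TB to MB
    return megabytes / rebuild_mb_per_s / 3600


for rate in (100, 150, 185):  # assumed sustained MB/s
    print(f"16 TB at {rate} MB/s: {rebuild_hours(16, rate):.1f} h")
```

During this window Atlantis can still lose one more drive without data loss, whereas Calypso's RAID 1 has no redundancy left until the rebuild finishes.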

---

## 📚 Related Documentation

- **[Host Infrastructure](hosts.md)**: Server specifications
- **[Security Model](security.md)**: Backup encryption details
- **[Network Architecture](networking.md)**: NFS/SMB networking

---

*Storage infrastructure is critical. Regular monitoring and proactive maintenance prevent data loss.*