# 💾 Storage Systems
**🟡 Intermediate Guide**

This document covers the storage architecture, RAID configurations, backup strategies, and data management practices for the homelab infrastructure.

---
## 🏗️ Storage Architecture Overview
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           STORAGE INFRASTRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  PRIMARY STORAGE                           BACKUP TARGETS                   │
│  ┌─────────────────────┐                   ┌─────────────────────┐          │
│  │      ATLANTIS       │                   │       CALYPSO       │          │
│  │    Synology NAS     │   ────────────►   │    Synology NAS     │          │
│  │                     │       Hyper       │                     │          │
│  │   8x 16TB RAID 6    │       Backup      │   2x 12TB RAID 1    │          │
│  │    ≈96TB usable     │                   │    ≈12TB usable     │          │
│  │                     │                   │                     │          │
│  │   + 2x 480GB NVMe   │                   │   + 2x 480GB NVMe   │          │
│  │     (SSD Cache)     │                   │     (SSD Cache)     │          │
│  └─────────────────────┘                   └─────────────────────┘          │
│             │                                         │                     │
│             │                                         │                     │
│             ▼                                         ▼                     │
│  ┌───────────────────────────────────────────────────────────────┐          │
│  │                         BACKBLAZE B2                          │          │
│  │                     Cloud Offsite Backup                      │          │
│  │                 Encrypted, Versioned Storage                  │          │
│  └───────────────────────────────────────────────────────────────┘          │
│                                                                             │
│  SECONDARY STORAGE                                                          │
│  ┌───────────────┐    ┌───────────────┐    ┌───────────────┐                │
│  │     GUAVA     │    │    SETILLO    │    │    PROXMOX    │                │
│  │  RAID 1 HDD   │    │  Single 1TB   │    │   Local SSD   │                │
│  │  + NVMe SSD   │    │               │    │               │                │
│  └───────────────┘    └───────────────┘    └───────────────┘                │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```
---
## 📊 Storage Summary
| Host | Total Raw | Usable | RAID Level | Purpose |
|------|-----------|--------|------------|---------|
| **Atlantis** | 128TB (8x16TB) | ~96TB | RAID 6 | Primary storage, media |
| **Calypso** | 24TB (2x12TB) | ~12TB | RAID 1 | Backup, development |
| **Guava** | 6TB+ | ~3TB | RAID 1 | AI/ML, compute |
| **Setillo** | 1TB | 1TB | Single | Monitoring |
| **Proxmox** | ~500GB | 500GB | Local SSD | VM storage |
---
## 🏛️ Atlantis - Primary Storage
### **Hardware Configuration**
| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS1823xs+ |
| **Drive Bays** | 8x 3.5" hot-swap |
| **Drives** | 8x Seagate IronWolf Pro 16TB (ST16000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 6 (dual parity) |
| **Raw Capacity** | 128TB |
| **Usable Capacity** | ~96TB |
| **Fault Tolerance** | 2 drive failures |
### **RAID 6 Benefits**
```
RAID 6 Configuration:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ P1 │ P2 │ ← Data + Dual Parity
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ D5 │ P1 │ P2 │ D6 │ ← Parity distributed
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ P1 │ P2 │ D5 │ D6 │
└────┴────┴────┴────┴────┴────┴────┴────┘
✅ Survives 2 simultaneous drive failures
✅ Good read performance
✅ 6 drives' worth of usable space (75% efficiency)
⚠️ Slower writes due to parity calculation
```
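
The capacity figures in this section are straightforward arithmetic: RAID 6 gives up two drives' worth of space to parity, RAID 5 gives up one, and a RAID 1 mirror keeps a single drive's worth. The Python sketch below reproduces the Atlantis and Calypso numbers and converts vendor decimal terabytes into the binary tebibytes that many operating-system tools report. The function names are illustrative, not part of DSM or any Synology tool, and filesystem overhead is ignored.

```python
# Usable-capacity arithmetic for the RAID levels discussed in this document.
# Sizes are vendor (decimal) terabytes: 1 TB = 10**12 bytes.
# Filesystem and metadata overhead are ignored.


def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity in decimal TB for an array of equal-size drives."""
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb  # two drives' worth of parity
    if level == "raid5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of parity
    if level == "raid1":
        if drives < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb  # every member holds a full copy
    raise ValueError(f"unsupported RAID level: {level}")


def efficiency(level: str, drives: int) -> float:
    """Fraction of raw capacity that ends up usable."""
    return usable_tb(level, drives, 1.0) / drives


def tb_to_tib(tb: float) -> float:
    """Convert decimal TB (10**12 bytes) to binary TiB (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    atlantis = usable_tb("raid6", 8, 16)
    calypso = usable_tb("raid1", 2, 12)
    print(f"Atlantis: {atlantis} TB = {tb_to_tib(atlantis):.1f} TiB, "
          f"{efficiency('raid6', 8):.0%} efficiency")
    print(f"Calypso:  {calypso} TB = {tb_to_tib(calypso):.1f} TiB, "
          f"{efficiency('raid1', 2):.0%} efficiency")
```

Running it prints 96 TB = 87.3 TiB (75% efficiency) for Atlantis and 12 TB = 10.9 TiB (50% efficiency) for Calypso. That unit gap, before any filesystem overhead, is why tools that report in binary units, such as `df -h`, show less than the drive labels add up to.
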
### **Volume Layout**
```
/volume1/ (Atlantis - ~96TB usable)
├── /docker/ # Container persistent data
│ ├── plex/
│ ├── immich/
│ ├── grafana/
│ └── ... (all stack data)
├── /media/ # Media library
│ ├── movies/ # 4K + 1080p movies
│ ├── tv/ # TV series
│ ├── music/ # Music library
│ └── audiobooks/ # Audiobook collection
├── /photos/ # Immich photo library
│ ├── library/ # Organized photos
│ └── upload/ # Incoming uploads
├── /documents/ # Paperless-NGX
│ ├── consume/ # Incoming documents
│ └── archive/ # Processed documents
├── /backups/ # Local backup storage
│ ├── calypso/ # Cross-NAS backups
│ └── vm-snapshots/ # VM backup images
└── /archive/ # Long-term cold storage
└── old-projects/
```
### **NVMe SSD Cache**
- **Type**: Read-write cache
- **Drives**: 2x WD Black SN750 480GB
- **Configuration**: RAID 1 (mirrored for safety)
- **Purpose**: Accelerate frequently accessed data
---
## 🏢 Calypso - Secondary Storage
### **Hardware Configuration**
| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS723+ |
| **Drive Bays** | 2x 3.5" hot-swap |
| **Drives** | 2x Seagate IronWolf Pro 12TB (ST12000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 1 (mirrored) |
| **Raw Capacity** | 24TB |
| **Usable Capacity** | ~12TB |
| **Fault Tolerance** | 1 drive failure |
### **RAID 1 Benefits**
```
RAID 1 Configuration:
┌────────────────┐     ┌────────────────┐
│    Drive 1     │     │    Drive 2     │
│     (12TB)     │◄───►│     (12TB)     │  ← Mirror
│                │     │                │
│  All data is   │     │  Exact copy    │
│  written to    │     │  of Drive 1    │
│  both drives   │     │                │
└────────────────┘     └────────────────┘
✅ Survives 1 drive failure
✅ Fast read performance (can read from either)
✅ Simple recovery (just replace failed drive)
⚠️ 50% storage efficiency
```
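
To make the mirroring behavior concrete, here is a toy in-memory model in Python. The `Mirror` class is hypothetical and illustrates the idea only, not how DSM or Linux `md` implement RAID 1: every write goes to each healthy member, reads succeed while any member survives, and a replacement drive is resynced from a survivor.

```python
from __future__ import annotations


class Mirror:
    """Toy in-memory RAID 1: each member drive is a dict of block -> bytes."""

    def __init__(self, members: int = 2) -> None:
        self.drives: list[dict[int, bytes]] = [{} for _ in range(members)]
        self.failed: set[int] = set()

    def healthy(self) -> list[int]:
        return [i for i in range(len(self.drives)) if i not in self.failed]

    def write(self, block: int, data: bytes) -> None:
        survivors = self.healthy()
        if not survivors:
            raise OSError("array failed: no healthy members")
        for i in survivors:  # the same data lands on every healthy member
            self.drives[i][block] = data

    def read(self, block: int) -> bytes:
        survivors = self.healthy()
        if not survivors:
            raise OSError("array failed: no healthy members")
        return self.drives[survivors[0]][block]  # any surviving copy will do

    def fail(self, index: int) -> None:
        self.failed.add(index)

    def replace(self, index: int) -> None:
        """Swap in a blank drive, then resync it from a surviving member."""
        survivors = self.healthy()
        if not survivors:
            raise OSError("nothing left to resync from")
        self.drives[index] = dict(self.drives[survivors[0]])
        self.failed.discard(index)


if __name__ == "__main__":
    array = Mirror()
    array.write(0, b"family-photos")
    array.fail(0)              # Drive 1 dies
    print(array.read(0))       # still readable from Drive 2
    array.replace(0)           # new Drive 1 resyncs from Drive 2
    print(array.drives[0][0])
```

Losing a second member before the resync finishes leaves no copy to read from, which is the single-failure tolerance listed in the hardware table.
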
### **Volume Layout**
```
/volume1/ (Calypso - ~12TB usable)
├── /docker/ # Container persistent data
│ ├── gitea/
│ ├── firefly/
│ ├── arr-suite/
│ └── ... (dev stacks)
├── /apt-cache/ # APT-Cacher-NG
│ └── cache/ # Debian package cache
├── /backups/ # Backup destination
│ ├── atlantis/ # Hyper Backup from Atlantis
│ └── databases/ # Database dumps
└── /development/ # Development data
├── repos/ # Git repositories
└── projects/ # Project files
```
---
## 🖥️ Other Storage Systems
### **Guava - AI/ML Workstation**
| Component | Specification |
|-----------|--------------|
| **Primary** | 1TB NVMe SSD (OS + fast storage) |
| **Secondary** | 2x HDD in RAID 1 (~3TB usable) |
| **Purpose** | AI model storage, datasets, compute scratch |
### **Setillo - Monitoring**
| Component | Specification |
|-----------|--------------|
| **Storage** | 1TB single drive |
| **Purpose** | Prometheus metrics, AdGuard data |
| **Note** | Non-critical data, can be rebuilt |
### **Proxmox - VM Host**
| Component | Specification |
|-----------|--------------|
| **Storage** | ~500GB local SSD |
| **Purpose** | VM disk images |
| **Backup** | VMs backed up to Atlantis |
---
## 📦 Backup Strategy
### **3-2-1 Rule Implementation**
| Rule | Implementation | Status |
|------|----------------|--------|
| **3 Copies** | Original + Calypso + Backblaze | ✅ |
| **2 Media Types** | NAS HDDs + Cloud | ✅ |
| **1 Offsite** | Backblaze B2 | ✅ |
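
The rule can also be checked mechanically. A minimal Python sketch, assuming each copy of a dataset is described by its location, media type and whether it is offsite; the `Copy` record and `check_3_2_1` function are hypothetical, populated from the table above. Since the rule applies per dataset, it is worth evaluating each row of the "What Gets Backed Up" table separately rather than once for the whole lab.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str
    media: str      # media type, e.g. "nas-hdd" or "cloud"
    offsite: bool


def check_3_2_1(copies: list[Copy]) -> dict[str, bool]:
    """Evaluate each part of the 3-2-1 rule for one dataset's copies."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
    }


# The copies listed in the 3-2-1 table.
HOMELAB = [
    Copy("Atlantis", "nas-hdd", offsite=False),
    Copy("Calypso", "nas-hdd", offsite=False),
    Copy("Backblaze B2", "cloud", offsite=True),
]

if __name__ == "__main__":
    for rule, ok in check_3_2_1(HOMELAB).items():
        print(("✅ " if ok else "❌ ") + rule)
```
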
### **Backup Flow**
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ATLANTIS   │────►│   CALYPSO   │────►│  BACKBLAZE  │
│  (Primary)  │     │   (Local)   │     │     B2      │
│             │     │             │     │  (Offsite)  │
│  Original   │     │    Hyper    │     │    Cloud    │
│    Data     │     │   Backup    │     │   Backup    │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
   Immediate          < 24 hours          < 24 hours
    Access             Recovery            Recovery
```
### **Backup Software**
| Tool | Source | Destination | Schedule |
|------|--------|-------------|----------|
| **Synology Hyper Backup** | Atlantis | Calypso | Daily |
| **Synology Cloud Sync** | Atlantis | Backblaze B2 | Daily |
| **Synology Hyper Backup** | Calypso | Backblaze B2 | Weekly |
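
A job that silently stops running is as dangerous as one that fails loudly, so it helps to compare each job's last successful run with its schedule. The sketch below is a hypothetical helper: the job names mirror the table, while the 24-hour grace period and the timestamps are assumptions, and collecting real status from Hyper Backup or Cloud Sync is not shown.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Expected gap between successful runs for each schedule in the table.
INTERVALS = {"Daily": timedelta(days=1), "Weekly": timedelta(weeks=1)}
# Extra slack before a job counts as stale (an assumed value).
GRACE = timedelta(hours=24)


def stale_jobs(last_success: dict[str, tuple[str, datetime]],
               now: datetime) -> list[str]:
    """Return jobs whose last success is older than their interval plus grace."""
    return sorted(
        job
        for job, (schedule, when) in last_success.items()
        if now - when > INTERVALS[schedule] + GRACE
    )


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical last-success times; real values would have to be read from DSM.
    jobs = {
        "Hyper Backup: Atlantis -> Calypso": ("Daily", now - timedelta(hours=20)),
        "Cloud Sync: Atlantis -> B2": ("Daily", now - timedelta(days=3)),
        "Hyper Backup: Calypso -> B2": ("Weekly", now - timedelta(days=6)),
    }
    print(stale_jobs(jobs, now))  # only the Cloud Sync job is overdue
```
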
### **What Gets Backed Up**
| Data Type | Priority | Frequency | Retention |
|-----------|----------|-----------|-----------|
| **Docker configs** | Critical | Daily | 30 days |
| **Databases** | Critical | Daily | 30 days |
| **Photos (Immich)** | High | Daily | Forever |
| **Documents** | High | Daily | 1 year |
| **Media library** | Medium | Weekly | Latest only |
| **VM snapshots** | Medium | Weekly | 4 versions |
| **Logs** | Low | Not backed up | N/A |
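
The retention column mixes three kinds of policy: age-based (30 days, 1 year), count-based (4 versions, latest only) and keep-forever. The Python sketch below shows how those policies decide which versions survive a pruning pass. The policy strings come from the table, but `versions_to_keep`, the 365-day year and the nightly example dates are illustrative assumptions; Hyper Backup applies its own rotation rules.

```python
from __future__ import annotations

from datetime import date, timedelta

# Day counts for the age-based policies; a year is approximated as 365 days.
UNIT_DAYS = {"day": 1, "days": 1, "year": 365, "years": 365}


def versions_to_keep(versions: list[date], policy: str, today: date) -> list[date]:
    """Return the backup versions a retention policy keeps, newest first."""
    newest_first = sorted(versions, reverse=True)
    if policy == "Forever":
        return newest_first
    if policy == "Latest only":
        return newest_first[:1]
    amount, unit = policy.split()
    if unit in ("version", "versions"):          # count-based, e.g. "4 versions"
        return newest_first[: int(amount)]
    max_age = timedelta(days=int(amount) * UNIT_DAYS[unit])  # age-based
    return [v for v in newest_first if today - v < max_age]


if __name__ == "__main__":
    today = date(2026, 1, 31)
    nightly = [today - timedelta(days=i) for i in range(40)]
    for policy in ("30 days", "1 year", "4 versions", "Latest only", "Forever"):
        kept = versions_to_keep(nightly, policy, today)
        print(f"{policy:>12}: keep {len(kept)} of {len(nightly)}")
```
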
### **Recovery Time Objectives**
| Scenario | RTO Target | Recovery Method |
|----------|------------|-----------------|
| Single file recovery | < 1 hour | Hyper Backup restore |
| Service recovery | < 4 hours | Docker volume restore |
| Full NAS recovery | < 24 hours | Bare metal + B2 restore |
| Disaster recovery | < 48 hours | New hardware + B2 restore |
---
## 📂 Shared Storage (NFS/SMB)
### **Network Shares**
| Share | Protocol | Host | Access | Purpose |
|-------|----------|------|--------|---------|
| `/media` | SMB | Atlantis | Read-only (most), RW (arr) | Media streaming |
| `/photos` | SMB | Atlantis | RW (Immich user) | Photo backup |
| `/docker` | NFS | Atlantis | RW (Docker hosts) | Container data |
| `/backups` | SMB | Calypso | RW (backup service) | Backup destination |
### **Docker Volume Mounts**
Containers access NAS storage via NFS mounts:
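```yaml
# Example: Plex accessing media (excerpt of a Compose service definition)
services:
  plex:
    volumes:
      - /volume1/docker/plex:/config   # persistent Plex config on the NAS
      - /volume1/media:/media:ro       # media library, mounted read-only
```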
### **Permission Model**
```
NAS User: docker (UID 1000)
├── Owns /volume1/docker/
├── Read access to /volume1/media/
└── Write access to specific paths
NAS User: media (UID 1001)
├── Write access to /volume1/media/
└── Used by *arr suite for downloads
```
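
Ownership drift, for example after restoring a share from backup, is a common reason containers lose write access. A standard-library Python sketch that checks the owner UID from the model above; `ownership_problems` and `EXPECTED_OWNERS` are hypothetical names, only the documented `docker` ownership of `/volume1/docker` is encoded, and DSM shared-folder ACLs are not inspected, so a clean result does not by itself prove containers can write.

```python
from __future__ import annotations

from pathlib import Path

# Expected owner UIDs, taken from the permission model above.
EXPECTED_OWNERS = {"/volume1/docker": 1000}  # docker user


def ownership_problems(expected: dict[str, int]) -> list[str]:
    """Describe each path that is missing or not owned by the expected UID."""
    problems = []
    for path, uid in expected.items():
        p = Path(path)
        if not p.exists():
            problems.append(f"{path}: missing")
            continue
        actual = p.stat().st_uid
        if actual != uid:
            problems.append(f"{path}: owned by UID {actual}, expected {uid}")
    return problems


if __name__ == "__main__":
    for line in ownership_problems(EXPECTED_OWNERS) or ["ownership OK"]:
        print(line)
```
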
---
## 📈 Storage Monitoring
### **Metrics Collected**
| Metric | Tool | Alert Threshold |
|--------|------|-----------------|
| Disk usage | Prometheus + Node Exporter | > 85% |
| RAID health | Synology DSM | Degraded |
| Drive SMART | Synology DSM | Warning/Critical |
| I/O latency | Prometheus | > 100ms |
| Backup status | Hyper Backup | Failed |
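
Prometheus and DSM do the actual alerting, but the disk-usage rule is simple enough to reproduce for an ad-hoc check from a shell. A standard-library Python sketch of the same > 85% test; the function names and the mount-point list are illustrative.

```python
from __future__ import annotations

import shutil

THRESHOLD = 0.85  # the "> 85%" disk-usage alert threshold from the table


def usage_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use.

    df's Use% divides by used + available instead of the total size,
    so it can read slightly higher when blocks are reserved.
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def over_threshold(fractions: dict[str, float],
                   threshold: float = THRESHOLD) -> list[str]:
    """Return the mount points whose usage exceeds the threshold."""
    return sorted(m for m, f in fractions.items() if f > threshold)


if __name__ == "__main__":
    mounts = ["/"]  # e.g. add "/volume1" when running on a NAS
    fractions = {m: usage_fraction(m) for m in mounts}
    for m in over_threshold(fractions):
        print(f"ALERT {m}: {fractions[m]:.0%} used")
```

Fed the figures from the utilization table in the expansion plan (60 of 96 TB and 12 of 12 TB), `over_threshold` flags Calypso but not Atlantis.
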
### **Grafana Dashboard**
The storage dashboard shows:
- Volume utilization trends
- I/O throughput
- RAID rebuild status
- Drive temperatures
- Backup completion status
---
## 🔮 Storage Expansion Plan
### **Current Utilization**
| Host | Used | Total | % Used |
|------|------|-------|--------|
| Atlantis | ~60TB | 96TB | 62% |
| Calypso | ~12TB | 12TB | ~100% |
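
One rough way to turn these figures into a planning date is to divide the remaining headroom below the 85% alert threshold by a monthly growth rate. This document does not record a growth rate, so the 1.5 TB/month and 0.2 TB/month values in the sketch are placeholders to replace with measured numbers, and linear growth is itself an assumption.

```python
from __future__ import annotations

import math


def months_until(used_tb: float, total_tb: float, growth_tb_per_month: float,
                 threshold: float = 0.85) -> float:
    """Months until usage crosses `threshold`, assuming linear growth."""
    limit = total_tb * threshold
    if used_tb >= limit:
        return 0.0  # already past the threshold
    if growth_tb_per_month <= 0:
        return math.inf  # not growing, so it never gets there
    return (limit - used_tb) / growth_tb_per_month


if __name__ == "__main__":
    # Placeholder growth rates, not measured values.
    print(f"Atlantis: {months_until(60, 96, 1.5):.1f} months until 85% full")
    print(f"Calypso:  {months_until(12, 12, 0.2):.1f} months until 85% full")
```
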
### **Future Expansion Options**
1. **Atlantis**: All 8 bays are populated (~62% of capacity used)
   - Replace the 16TB drives with larger (24TB+) drives
   - Add an expansion unit (DX517)
2. **Calypso**: At capacity (~100% used)
   - Replace the 12TB drives with 20TB+ drives
   - Consider migrating to a larger NAS
3. **New NAS**: For cold/archive storage
- Lower-powered unit for infrequent access
- RAID 5 acceptable for archive data
---
## 🛠️ Maintenance Tasks
### **Regular Maintenance**
| Task | Frequency | Procedure |
|------|-----------|-----------|
| SMART check | Weekly | Review DSM health |
| Scrub | Monthly | Synology scheduled task |
| Backup verification | Monthly | Test restore of random files |
| Capacity review | Quarterly | Plan for growth |
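
The monthly "test restore of random files" is easier to repeat consistently if the sample and the comparison are scripted. A Python sketch, assuming the restore has already been run through Hyper Backup into a separate directory; `verify_restore` and its helpers are hypothetical names. It picks random files under the live share and compares SHA-256 digests with the restored copies. Files modified after the backup ran will show up as mismatches, so rarely changing data such as the photo library makes a better sample.

```python
from __future__ import annotations

import hashlib
import random
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large media files are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sample_files(root: Path, count: int, seed: int | None = None) -> list[Path]:
    """Pick up to `count` random files under `root`, relative to `root`."""
    files = [p.relative_to(root) for p in root.rglob("*") if p.is_file()]
    if not files:
        raise ValueError(f"no files found under {root}")
    return random.Random(seed).sample(files, min(count, len(files)))


def verify_restore(source: Path, restored: Path, count: int = 20,
                   seed: int | None = None) -> list[str]:
    """Return sampled relative paths that are missing or differ after restore."""
    failures = []
    for rel in sample_files(source, count, seed):
        copy = restored / rel
        if not copy.is_file() or sha256(copy) != sha256(source / rel):
            failures.append(str(rel))
    return failures
```

For example, `verify_restore(Path("/volume1/photos"), Path("/volume1/restore-test/photos"))` returns an empty list when every sampled file matches; the `restore-test` directory is a placeholder.
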
### **Drive Replacement Procedure**
1. **Identify failed drive** via DSM notification
2. **Order replacement** (same or larger capacity)
3. **Hot-swap** failed drive
4. **Monitor rebuild** (can take 24-48 hours for large arrays)
5. **Verify RAID health** after rebuild completes
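
The 24-48 hour estimate in step 4 is mostly a function of how quickly one drive's worth of data can be rewritten onto the replacement. A back-of-the-envelope Python sketch; `rebuild_hours` is a hypothetical helper, the sustained rates of 100 and 200 MB/s are assumed values, and real rebuilds slow down while the NAS keeps serving normal traffic.

```python
def rebuild_hours(drive_tb: float, rate_mb_s: float) -> float:
    """Hours to rewrite one full drive at a sustained rate (decimal units)."""
    return drive_tb * 10**12 / (rate_mb_s * 10**6) / 3600


if __name__ == "__main__":
    for rate in (200, 100):  # assumed sustained rebuild rates in MB/s
        print(f"16 TB at {rate} MB/s: {rebuild_hours(16, rate):.1f} h")
```

At those rates a 16TB member takes roughly 22 to 44 hours, in line with the estimate above.
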
---
## 📚 Related Documentation
- **[Host Infrastructure](hosts.md)**: Server specifications
- **[Security Model](security.md)**: Backup encryption details
- **[Network Architecture](networking.md)**: NFS/SMB networking
---
*Storage infrastructure is critical. Regular monitoring and proactive maintenance prevent data loss.*