# 💾 Storage Systems

**🟡 Intermediate Guide**

This document covers the storage architecture, RAID configurations, backup strategies, and data management practices for the homelab infrastructure.

---

## 🏗️ Storage Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           STORAGE INFRASTRUCTURE                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  PRIMARY STORAGE                   BACKUP TARGETS                           │
│  ┌─────────────────────┐           ┌─────────────────────┐                  │
│  │      ATLANTIS       │           │      CALYPSO        │                  │
│  │    Synology NAS     │  ──────►  │    Synology NAS     │                  │
│  │                     │   Hyper   │                     │                  │
│  │   8x 16TB RAID 6    │  Backup   │   2x 12TB RAID 1    │                  │
│  │   ≈96TB usable      │           │   ≈12TB usable      │                  │
│  │                     │           │                     │                  │
│  │   + 2x 480GB NVMe   │           │   + 2x 480GB NVMe   │                  │
│  │   (SSD Cache)       │           │   (SSD Cache)       │                  │
│  └─────────────────────┘           └─────────────────────┘                  │
│             │                                 │                             │
│             │                                 │                             │
│             ▼                                 ▼                             │
│  ┌─────────────────────────────────────────────────────────┐                │
│  │                      BACKBLAZE B2                       │                │
│  │                  Cloud Offsite Backup                   │                │
│  │              Encrypted, Versioned Storage               │                │
│  └─────────────────────────────────────────────────────────┘                │
│                                                                             │
│  SECONDARY STORAGE                                                          │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐                    │
│  │     GUAVA     │  │    SETILLO    │  │    PROXMOX    │                    │
│  │  RAID 1 HDD   │  │  Single 1TB   │  │   Local SSD   │                    │
│  │  + NVMe SSD   │  │               │  │               │                    │
│  └───────────────┘  └───────────────┘  └───────────────┘                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## 📊 Storage Summary

| Host | Total Raw | Usable | RAID Level | Purpose |
|------|-----------|--------|------------|---------|
| **Atlantis** | 128TB (8x16TB) | ~96TB | RAID 6 | Primary storage, media |
| **Calypso** | 24TB (2x12TB) | ~12TB | RAID 1 | Backup, development |
| **Guava** | 6TB+ | ~3TB | RAID 1 | AI/ML, compute |
| **Setillo** | 1TB | 1TB | Single | Monitoring |
| **Proxmox** | ~500GB | 500GB | Local SSD | VM storage |
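
The usable figures in the table follow from the RAID level: RAID 6 gives up two drives' worth of space to parity, and RAID 1 stores the same data on every drive. The sketch below reproduces that arithmetic. The function name `usable_tb` and its arguments are hypothetical, and the result ignores filesystem overhead and the TB/TiB difference, so DSM will report somewhat lower numbers.

```python
def usable_tb(drive_count: int, drive_tb: float, raid_level: str) -> float:
    """Approximate usable capacity in TB, ignoring filesystem overhead."""
    if raid_level == "RAID 6":
        if drive_count < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drive_count - 2) * drive_tb  # two drives' worth of parity
    if raid_level == "RAID 1":
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        return drive_tb  # every drive holds the same data
    if raid_level == "Single":
        return drive_count * drive_tb  # no redundancy
    raise ValueError(f"unsupported RAID level: {raid_level}")


print(usable_tb(8, 16, "RAID 6"))  # Atlantis: 8x 16TB
print(usable_tb(2, 12, "RAID 1"))  # Calypso: 2x 12TB
```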

---

## 🏛️ Atlantis - Primary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS1823xs+ |
| **Drive Bays** | 8x 3.5" hot-swap |
| **Drives** | 8x Seagate IronWolf Pro 16TB (ST16000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 6 (dual parity) |
| **Raw Capacity** | 128TB |
| **Usable Capacity** | ~96TB |
| **Fault Tolerance** | 2 drive failures |

### **RAID 6 Benefits**

```
RAID 6 Configuration:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ P1 │ P2 │  ← Data + Dual Parity
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ D5 │ P1 │ P2 │ D6 │  ← Parity distributed
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ P1 │ P2 │ D5 │ D6 │
└────┴────┴────┴────┴────┴────┴────┴────┘

✅ Survives 2 simultaneous drive failures
✅ Good read performance
✅ 6 drives worth of usable space (75% efficiency)
⚠️ Slower writes due to parity calculation
```

### **Volume Layout**

```
/volume1/ (Atlantis - ~96TB usable)
│
├── /docker/                    # Container persistent data
│   ├── plex/
│   ├── immich/
│   ├── grafana/
│   └── ... (all stack data)
│
├── /media/                     # Media library
│   ├── movies/                 # 4K + 1080p movies
│   ├── tv/                     # TV series
│   ├── music/                  # Music library
│   └── audiobooks/             # Audiobook collection
│
├── /photos/                    # Immich photo library
│   ├── library/                # Organized photos
│   └── upload/                 # Incoming uploads
│
├── /documents/                 # Paperless-NGX
│   ├── consume/                # Incoming documents
│   └── archive/                # Processed documents
│
├── /backups/                   # Local backup storage
│   ├── calypso/                # Cross-NAS backups
│   └── vm-snapshots/           # VM backup images
│
└── /archive/                   # Long-term cold storage
    └── old-projects/
```

### **NVMe SSD Cache**
- **Type**: Read-write cache
- **Drives**: 2x WD Black SN750 480GB
- **Configuration**: RAID 1 (mirrored for safety)
- **Purpose**: Accelerate frequently accessed data

---

## 🏢 Calypso - Secondary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS723+ |
| **Drive Bays** | 2x 3.5" hot-swap |
| **Drives** | 2x Seagate IronWolf Pro 12TB (ST12000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 1 (mirrored) |
| **Raw Capacity** | 24TB |
| **Usable Capacity** | ~12TB |
| **Fault Tolerance** | 1 drive failure |

### **RAID 1 Benefits**

```
RAID 1 Configuration:
┌────────────────┐   ┌────────────────┐
│    Drive 1     │   │    Drive 2     │
│    (12TB)      │◄─►│    (12TB)      │  ← Mirror
│                │   │                │
│  All data is   │   │  Exact copy    │
│  written to    │   │  of Drive 1    │
│  both drives   │   │                │
└────────────────┘   └────────────────┘

✅ Survives 1 drive failure
✅ Fast read performance (can read from either)
✅ Simple recovery (just replace failed drive)
⚠️ 50% storage efficiency
```

### **Volume Layout**

```
/volume1/ (Calypso - ~12TB usable)
│
├── /docker/                    # Container persistent data
│   ├── gitea/
│   ├── firefly/
│   ├── arr-suite/
│   └── ... (dev stacks)
│
├── /apt-cache/                 # APT-Cacher-NG
│   └── cache/                  # Debian package cache
│
├── /backups/                   # Backup destination
│   ├── atlantis/               # Hyper Backup from Atlantis
│   └── databases/              # Database dumps
│
└── /development/               # Development data
    ├── repos/                  # Git repositories
    └── projects/               # Project files
```

---

## 🖥️ Other Storage Systems

### **Guava - AI/ML Workstation**

| Component | Specification |
|-----------|--------------|
| **Primary** | 1TB NVMe SSD (OS + fast storage) |
| **Secondary** | 2x HDD in RAID 1 (~3TB usable) |
| **Purpose** | AI model storage, datasets, compute scratch |

### **Setillo - Monitoring**

| Component | Specification |
|-----------|--------------|
| **Storage** | 1TB single drive |
| **Purpose** | Prometheus metrics, AdGuard data |
| **Note** | Non-critical data, can be rebuilt |

### **Proxmox - VM Host**

| Component | Specification |
|-----------|--------------|
| **Storage** | ~500GB local SSD |
| **Purpose** | VM disk images |
| **Backup** | VMs backed up to Atlantis |

---

## 📦 Backup Strategy

### **3-2-1 Rule Implementation**

| Rule | Implementation | Status |
|------|----------------|--------|
| **3 Copies** | Original + Calypso + Backblaze | ✅ |
| **2 Media Types** | NAS HDDs + Cloud | ✅ |
| **1 Offsite** | Backblaze B2 | ✅ |

### **Backup Flow**

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ATLANTIS   │────►│   CALYPSO   │────►│  BACKBLAZE  │
│  (Primary)  │     │   (Local)   │     │     B2      │
│             │     │             │     │  (Offsite)  │
│  Original   │     │    Hyper    │     │    Cloud    │
│    Data     │     │   Backup    │     │   Backup    │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
   Immediate          < 24 hours          < 24 hours
    Access             Recovery            Recovery
```

### **Backup Software**

| Tool | Source | Destination | Schedule |
|------|--------|-------------|----------|
| **Synology Hyper Backup** | Atlantis | Calypso | Daily |
| **Synology Cloud Sync** | Atlantis | Backblaze B2 | Daily |
| **Synology Hyper Backup** | Calypso | Backblaze B2 | Weekly |

### **What Gets Backed Up**

| Data Type | Priority | Frequency | Retention |
|-----------|----------|-----------|-----------|
| **Docker configs** | Critical | Daily | 30 days |
| **Databases** | Critical | Daily | 30 days |
| **Photos (Immich)** | High | Daily | Forever |
| **Documents** | High | Daily | 1 year |
| **Media library** | Medium | Weekly | Latest only |
| **VM snapshots** | Medium | Weekly | 4 versions |
| **Logs** | Low | Not backed up | N/A |
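
The retention column mixes two kinds of rule: age-based ("30 days", "1 year") and count-based ("Latest only", "4 versions"). The following is a minimal sketch of how such rules could be evaluated; it is not how Hyper Backup implements rotation. The `RETENTION` keys and the `versions_to_prune` helper are hypothetical names, and "1 year" is assumed to mean 365 days.

```python
from datetime import datetime, timedelta

# Retention rules taken from the table above (keys are illustrative names).
# ("days", N) keeps versions newer than N days, ("versions", N) keeps the
# N most recent versions, and None keeps everything.
RETENTION = {
    "docker-configs": ("days", 30),
    "databases": ("days", 30),
    "photos": None,
    "documents": ("days", 365),  # assumption: "1 year" = 365 days
    "media": ("versions", 1),
    "vm-snapshots": ("versions", 4),
}


def versions_to_prune(data_type, timestamps, now):
    """Return the backup timestamps that fall outside the retention rule."""
    rule = RETENTION[data_type]
    if rule is None:
        return []
    kind, limit = rule
    newest_first = sorted(timestamps, reverse=True)
    if kind == "versions":
        return newest_first[limit:]
    cutoff = now - timedelta(days=limit)
    return [ts for ts in newest_first if ts < cutoff]


now = datetime(2024, 6, 1)
weekly = [now - timedelta(weeks=i) for i in range(6)]
print(len(versions_to_prune("vm-snapshots", weekly, now)))  # keeps the 4 newest
```

With six weekly backups, the count-based rule for VM snapshots marks the two oldest for pruning, while the photo rule never prunes anything.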

### **Recovery Time Objectives**

| Scenario | RTO Target | Recovery Method |
|----------|------------|-----------------|
| Single file recovery | < 1 hour | Hyper Backup restore |
| Service recovery | < 4 hours | Docker volume restore |
| Full NAS recovery | < 24 hours | Bare metal + B2 restore |
| Disaster recovery | < 48 hours | New hardware + B2 restore |
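
Whether a B2 restore fits inside an RTO window depends mostly on download bandwidth. The sketch below estimates an upper bound on how much data can come back within a given window, which is worth comparing against the amount that actually has to be restored from B2. The 1 Gbps link speed and the 80% efficiency factor are assumptions for illustration, not measured values from this homelab, and `max_restore_tb` is a hypothetical helper.

```python
def max_restore_tb(rto_hours: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Upper bound on data (in TB) that can be downloaded within the RTO window."""
    bytes_per_second = link_mbps * 1_000_000 / 8 * efficiency
    return bytes_per_second * rto_hours * 3600 / 1e12


# Assumed 1 Gbps downlink; adjust to the real connection.
for scenario, hours in [("Full NAS recovery", 24), ("Disaster recovery", 48)]:
    print(f"{scenario}: ~{max_restore_tb(hours, link_mbps=1000):.1f} TB within {hours} h")
```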

---

## 📂 Shared Storage (NFS/SMB)

### **Network Shares**

| Share | Protocol | Host | Access | Purpose |
|-------|----------|------|--------|---------|
| `/media` | SMB | Atlantis | Read-only (most), RW (arr) | Media streaming |
| `/photos` | SMB | Atlantis | RW (Immich user) | Photo backup |
| `/docker` | NFS | Atlantis | RW (Docker hosts) | Container data |
| `/backups` | SMB | Calypso | RW (backup service) | Backup destination |

### **Docker Volume Mounts**

Containers access NAS storage via NFS mounts, which are passed into each container as volume mappings:

```yaml
# Example: Plex accessing media
volumes:
  - /volume1/docker/plex:/config   # persistent Plex configuration (read-write)
  - /volume1/media:/media:ro       # media library, mounted read-only
```

### **Permission Model**

```
NAS User: docker (UID 1000)
├── Owns /volume1/docker/
├── Read access to /volume1/media/
└── Write access to specific paths

NAS User: media (UID 1001)
├── Write access to /volume1/media/
└── Used by *arr suite for downloads
```

---

## 📈 Storage Monitoring

### **Metrics Collected**

| Metric | Tool | Alert Threshold |
|--------|------|-----------------|
| Disk usage | Prometheus + Node Exporter | > 85% |
| RAID health | Synology DSM | Degraded |
| Drive SMART | Synology DSM | Warning/Critical |
| I/O latency | Prometheus | > 100ms |
| Backup status | Hyper Backup | Failed |
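
In this setup the disk usage alert is evaluated by Prometheus from Node Exporter metrics. As an illustration of what the 85% threshold means, here is a standalone sketch using only the Python standard library. The function names are hypothetical, and the `/` path is a placeholder for a real mount point such as `/volume1`.

```python
import shutil

DISK_USAGE_ALERT_PERCENT = 85.0  # threshold from the table above


def disk_usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def should_alert(percent_used: float, threshold: float = DISK_USAGE_ALERT_PERCENT) -> bool:
    """True when usage is strictly above the alert threshold."""
    return percent_used > threshold


if __name__ == "__main__":
    percent = disk_usage_percent("/")  # placeholder; use e.g. /volume1 on the NAS
    status = "ALERT" if should_alert(percent) else "ok"
    print(f"/ is {percent:.1f}% full: {status}")
```

Applied to the utilization figures later in this document, Atlantis at roughly 62% would stay below the threshold, while Calypso at roughly 100% would trigger the alert.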

### **Grafana Dashboard**

Storage dashboard shows:
- Volume utilization trends
- I/O throughput
- RAID rebuild status
- Drive temperatures
- Backup completion status

---

## 🔮 Storage Expansion Plan

### **Current Utilization**

| Host | Used | Total | % Used |
|------|------|-------|--------|
| Atlantis | ~60TB | 96TB | ~62% |
| Calypso | ~12TB | 12TB | ~100% |

### **Future Expansion Options**

1. **Atlantis**: All 8 bays are populated
   - Replace the 16TB drives with larger ones (24TB+) when available
   - Add an expansion unit (DX517)

2. **Calypso**: Volume is at capacity
   - Replace the 12TB drives with 20TB+ drives
   - Consider migrating to a larger NAS

3. **New NAS**: For cold/archive storage
   - Lower-powered unit for infrequent access
   - RAID 5 is acceptable for archive data

---

## 🛠️ Maintenance Tasks

### **Regular Maintenance**

| Task | Frequency | Procedure |
|------|-----------|-----------|
| SMART check | Weekly | Review DSM health |
| Scrub | Monthly | Synology scheduled task |
| Backup verification | Monthly | Test restore of random files |
| Capacity review | Quarterly | Plan for growth |
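
The monthly backup verification ("test restore of random files") can be partly automated by comparing checksums. The sketch below assumes a set of files has already been restored into a scratch directory that mirrors the source layout. The function names and both directory arguments are hypothetical, and nothing here calls Hyper Backup itself.

```python
import hashlib
import random
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sample(source_root: Path, restored_root: Path, sample_size: int = 20) -> list[Path]:
    """Compare a random sample of source files with their restored copies.

    Returns the relative paths that are missing or differ in the restore.
    """
    files = [p for p in source_root.rglob("*") if p.is_file()]
    sample = random.sample(files, min(sample_size, len(files)))
    mismatches = []
    for source_file in sample:
        relative = source_file.relative_to(source_root)
        restored_file = restored_root / relative
        if not restored_file.is_file() or sha256_of(source_file) != sha256_of(restored_file):
            mismatches.append(relative)
    return mismatches
```

A call such as `verify_sample(Path("/volume1/documents"), Path("/volume1/restore-test/documents"))` would return an empty list when every sampled file matches (the second path is made up for the example). Files modified after the backup ran will show up as mismatches, so the check works best on data that rarely changes.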

### **Drive Replacement Procedure**

1. **Identify failed drive** via DSM notification
2. **Order replacement** (same or larger capacity)
3. **Hot-swap** failed drive
4. **Monitor rebuild** (can take 24-48 hours for large arrays)
5. **Verify RAID health** after rebuild completes
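
The 24-48 hour rebuild estimate in step 4 is consistent with simple arithmetic: a rebuild has to rewrite the whole replacement drive, so the duration is roughly drive size divided by sustained rebuild speed. In the sketch below the rates of 100 to 200 MB/s are assumptions, since real rebuild speed depends on DSM settings and on how busy the array is. At those rates a 16TB drive works out to roughly 22 to 44 hours.

```python
def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours needed to rewrite one full drive at a sustained rebuild rate."""
    drive_bytes = drive_tb * 1e12
    return drive_bytes / (rebuild_mb_per_s * 1e6) / 3600


for rate in (100, 150, 200):  # assumed sustained MB/s
    print(f"16TB drive at {rate} MB/s: ~{rebuild_hours(16, rate):.0f} h")
```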

---

## 📚 Related Documentation

- **[Host Infrastructure](hosts.md)**: Server specifications
- **[Security Model](security.md)**: Backup encryption details
- **[Network Architecture](networking.md)**: NFS/SMB networking

---

*Storage infrastructure is critical. Regular monitoring and proactive maintenance prevent data loss.*