# 💾 Storage Systems

**🟡 Intermediate Guide**

This document covers the storage architecture, RAID configurations, backup strategies, and data management practices for the homelab infrastructure.

---

## 🏗️ Storage Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         STORAGE INFRASTRUCTURE                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  PRIMARY STORAGE                    BACKUP TARGETS                           │
│  ┌─────────────────────┐           ┌─────────────────────┐                  │
│  │     ATLANTIS        │           │      CALYPSO        │                  │
│  │   Synology NAS      │ ──────►   │   Synology NAS      │                  │
│  │                     │  Hyper    │                     │                  │
│  │   8x 16TB RAID 6    │  Backup   │   2x 12TB RAID 1    │                  │
│  │   ≈96TB usable      │           │   ≈12TB usable      │                  │
│  │                     │           │                     │                  │
│  │   + 2x 480GB NVMe   │           │   + 2x 480GB NVMe   │                  │
│  │   (SSD Cache)       │           │   (SSD Cache)       │                  │
│  └─────────────────────┘           └─────────────────────┘                  │
│            │                                  │                              │
│            │                                  │                              │
│            ▼                                  ▼                              │
│  ┌─────────────────────────────────────────────────────────┐                │
│  │                    BACKBLAZE B2                          │                │
│  │              Cloud Offsite Backup                        │                │
│  │           Encrypted, Versioned Storage                   │                │
│  └─────────────────────────────────────────────────────────┘                │
│                                                                              │
│  SECONDARY STORAGE                                                          │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐                   │
│  │    GUAVA      │  │   SETILLO     │  │   PROXMOX    │                   │
│  │  RAID 1 HDD   │  │  Single 1TB   │  │   Local SSD  │                   │
│  │  + NVMe SSD   │  │               │  │              │                   │
│  └───────────────┘  └───────────────┘  └───────────────┘                   │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘
```

---

## 📊 Storage Summary

| Host | Total Raw | Usable | RAID Level | Purpose |
|------|-----------|--------|------------|---------|
| **Atlantis** | 128TB (8x16TB) | ~96TB | RAID 6 | Primary storage, media |
| **Calypso** | 24TB (2x12TB) | ~12TB | RAID 1 | Backup, development |
| **Guava** | 6TB+ | ~3TB | RAID 1 | AI/ML, compute |
| **Setillo** | 1TB | 1TB | Single | Monitoring |
| **Proxmox** | ~500GB | ~500GB | Local SSD | VM storage |

---

## 🏛️ Atlantis - Primary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS1823xs+ |
| **Drive Bays** | 8x 3.5" hot-swap |
| **Drives** | 8x Seagate IronWolf Pro 16TB (ST16000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 6 (dual parity) |
| **Raw Capacity** | 128TB |
| **Usable Capacity** | ~96TB |
| **Fault Tolerance** | 2 drive failures |

### **RAID 6 Benefits**

```
RAID 6 Configuration:
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ P1 │ P2 │  ← Data + Dual Parity
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ D5 │ P1 │ P2 │ D6 │  ← Parity distributed
├────┼────┼────┼────┼────┼────┼────┼────┤
│ D1 │ D2 │ D3 │ D4 │ P1 │ P2 │ D5 │ D6 │
└────┴────┴────┴────┴────┴────┴────┴────┘

✅ Survives 2 simultaneous drive failures
✅ Good read performance
✅ 6 drives' worth of usable space (75% efficiency)
⚠️ Slower writes due to parity calculation
```
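
The capacity figures above follow from simple arithmetic: RAID 6 gives up two drives' worth of space to parity, and a RAID 1 mirror keeps one drive's worth. The sketch below reproduces that calculation in Python. The function names are illustrative and not part of any Synology tooling, and the result ignores filesystem overhead and the TB/TiB difference, so DSM will report somewhat less.

```python
def usable_capacity_tb(drive_count: int, drive_tb: float, raid_level: str) -> float:
    """Return the nominal usable capacity in TB for a single array of equal drives."""
    parity_drives = {"raid5": 1, "raid6": 2}
    if raid_level == "raid1":
        if drive_count < 2:
            raise ValueError("RAID 1 needs at least 2 drives")
        # A mirror stores one drive's worth of data regardless of member count.
        return drive_tb
    if raid_level in parity_drives:
        data_drives = drive_count - parity_drives[raid_level]
        if data_drives < 2:
            raise ValueError(f"too few drives for {raid_level}")
        return data_drives * drive_tb
    raise ValueError(f"unsupported RAID level: {raid_level}")


def storage_efficiency(drive_count: int, drive_tb: float, raid_level: str) -> float:
    """Return usable capacity as a fraction of raw capacity."""
    raw_tb = drive_count * drive_tb
    return usable_capacity_tb(drive_count, drive_tb, raid_level) / raw_tb


if __name__ == "__main__":
    # Atlantis: 8x 16TB in RAID 6
    print(usable_capacity_tb(8, 16, "raid6"), storage_efficiency(8, 16, "raid6"))
    # Calypso: 2x 12TB in RAID 1
    print(usable_capacity_tb(2, 12, "raid1"), storage_efficiency(2, 12, "raid1"))
```

For Atlantis this yields 96TB at 75% efficiency, matching the table; for Calypso it yields 12TB at 50%.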

### **Volume Layout**

```
/volume1/ (Atlantis - ~96TB usable)
│
├── /docker/              # Container persistent data
│   ├── plex/
│   ├── immich/
│   ├── grafana/
│   └── ... (all stack data)
│
├── /media/               # Media library
│   ├── movies/           # 4K + 1080p movies
│   ├── tv/               # TV series
│   ├── music/            # Music library
│   └── audiobooks/       # Audiobook collection
│
├── /photos/              # Immich photo library
│   ├── library/          # Organized photos
│   └── upload/           # Incoming uploads
│
├── /documents/           # Paperless-NGX
│   ├── consume/          # Incoming documents
│   └── archive/          # Processed documents
│
├── /backups/             # Local backup storage
│   ├── calypso/          # Cross-NAS backups
│   └── vm-snapshots/     # VM backup images
│
└── /archive/             # Long-term cold storage
    └── old-projects/
```

### **NVMe SSD Cache**
- **Type**: Read-write cache
- **Drives**: 2x WD Black SN750 480GB
- **Configuration**: RAID 1 (mirrored for safety)
- **Purpose**: Accelerate frequently accessed data

---

## 🏢 Calypso - Secondary Storage

### **Hardware Configuration**

| Component | Specification |
|-----------|--------------|
| **NAS Model** | Synology DS723+ |
| **Drive Bays** | 2x 3.5" hot-swap |
| **Drives** | 2x Seagate IronWolf Pro 12TB (ST12000NT001) |
| **Cache** | 2x WD Black SN750 480GB NVMe |
| **RAID Level** | RAID 1 (mirrored) |
| **Raw Capacity** | 24TB |
| **Usable Capacity** | ~12TB |
| **Fault Tolerance** | 1 drive failure |

### **RAID 1 Benefits**

```
RAID 1 Configuration:
┌────────────────┐  ┌────────────────┐
│    Drive 1     │  │    Drive 2     │
│    (12TB)      │◄─►    (12TB)      │  ← Mirror
│                │  │                │
│  All data is   │  │  Exact copy    │
│  written to    │  │  of Drive 1    │
│  both drives   │  │                │
└────────────────┘  └────────────────┘

✅ Survives 1 drive failure
✅ Fast read performance (can read from either)
✅ Simple recovery (just replace failed drive)
⚠️ 50% storage efficiency
```

### **Volume Layout**

```
/volume1/ (Calypso - ~12TB usable)
│
├── /docker/              # Container persistent data
│   ├── gitea/
│   ├── firefly/
│   ├── arr-suite/
│   └── ... (dev stacks)
│
├── /apt-cache/           # APT-Cacher-NG
│   └── cache/            # Debian package cache
│
├── /backups/             # Backup destination
│   ├── atlantis/         # Hyper Backup from Atlantis
│   └── databases/        # Database dumps
│
└── /development/         # Development data
    ├── repos/            # Git repositories
    └── projects/         # Project files
```

---

## 🖥️ Other Storage Systems

### **Guava - AI/ML Workstation**

| Component | Specification |
|-----------|--------------|
| **Primary** | 1TB NVMe SSD (OS + fast storage) |
| **Secondary** | 2x HDD in RAID 1 (~3TB usable) |
| **Purpose** | AI model storage, datasets, compute scratch |

### **Setillo - Monitoring**

| Component | Specification |
|-----------|--------------|
| **Storage** | 1TB single drive |
| **Purpose** | Prometheus metrics, AdGuard data |
| **Note** | Non-critical data, can be rebuilt |

### **Proxmox - VM Host**

| Component | Specification |
|-----------|--------------|
| **Storage** | ~500GB local SSD |
| **Purpose** | VM disk images |
| **Backup** | VMs backed up to Atlantis |

---

## 📦 Backup Strategy

### **3-2-1 Rule Implementation**

| Rule | Implementation | Status |
|------|----------------|--------|
| **3 Copies** | Original + Calypso + Backblaze | ✅ |
| **2 Media Types** | NAS HDDs + Cloud | ✅ |
| **1 Offsite** | Backblaze B2 | ✅ |
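
The three checks in the table can also be expressed as a small script, which is handy when the set of backup targets changes. This is a minimal sketch: the `BackupCopy` type and the media labels (`nas_hdd`, `cloud`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str      # where the copy lives
    media: str     # hypothetical media label, e.g. "nas_hdd" or "cloud"
    offsite: bool  # True if stored outside the homelab location


def check_3_2_1(copies: list[BackupCopy]) -> dict[str, bool]:
    """Evaluate each part of the 3-2-1 rule for a list of data copies."""
    return {
        "3_copies": len(copies) >= 3,
        "2_media_types": len({c.media for c in copies}) >= 2,
        "1_offsite": any(c.offsite for c in copies),
    }


# The copies listed in the table above.
HOMELAB_COPIES = [
    BackupCopy("Atlantis (original)", "nas_hdd", offsite=False),
    BackupCopy("Calypso", "nas_hdd", offsite=False),
    BackupCopy("Backblaze B2", "cloud", offsite=True),
]

if __name__ == "__main__":
    for rule, ok in check_3_2_1(HOMELAB_COPIES).items():
        print(f"{rule}: {'pass' if ok else 'FAIL'}")
```

Dropping the Backblaze entry from the list makes all three checks fail, which shows how much of the strategy depends on the cloud copy.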

### **Backup Flow**

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  ATLANTIS   │────►│   CALYPSO   │────►│ BACKBLAZE   │
│  (Primary)  │     │  (Local)    │     │    B2       │
│             │     │             │     │  (Offsite)  │
│  Original   │     │  Hyper      │     │  Cloud      │
│  Data       │     │  Backup     │     │  Backup     │
└─────────────┘     └─────────────┘     └─────────────┘
     │                    │                    │
     │                    │                    │
     ▼                    ▼                    ▼
  Immediate           < 24 hours          < 24 hours
   Access              Recovery            Recovery
```

### **Backup Software**

| Tool | Source | Destination | Schedule |
|------|--------|-------------|----------|
| **Synology Hyper Backup** | Atlantis | Calypso | Daily |
| **Synology Cloud Sync** | Atlantis | Backblaze B2 | Daily |
| **Synology Hyper Backup** | Calypso | Backblaze B2 | Weekly |

### **What Gets Backed Up**

| Data Type | Priority | Frequency | Retention |
|-----------|----------|-----------|-----------|
| **Docker configs** | Critical | Daily | 30 days |
| **Databases** | Critical | Daily | 30 days |
| **Photos (Immich)** | High | Daily | Forever |
| **Documents** | High | Daily | 1 year |
| **Media library** | Medium | Weekly | Latest only |
| **VM snapshots** | Medium | Weekly | 4 versions |
| **Logs** | Low | Not backed up | N/A |

### **Recovery Time Objectives**

| Scenario | RTO Target | Recovery Method |
|----------|------------|-----------------|
| Single file recovery | < 1 hour | Hyper Backup restore |
| Service recovery | < 4 hours | Docker volume restore |
| Full NAS recovery | < 24 hours | Bare metal + B2 restore |
| Disaster recovery | < 48 hours | New hardware + B2 restore |

---

## 📂 Shared Storage (NFS/SMB)

### **Network Shares**

| Share | Protocol | Host | Access | Purpose |
|-------|----------|------|--------|---------|
| `/media` | SMB | Atlantis | Read-only (most), RW (arr) | Media streaming |
| `/photos` | SMB | Atlantis | RW (Immich user) | Photo backup |
| `/docker` | NFS | Atlantis | RW (Docker hosts) | Container data |
| `/backups` | SMB | Calypso | RW (backup service) | Backup destination |

### **Docker Volume Mounts**

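Containers reach NAS storage through bind mounts. The `/volume1/...` paths below are local when the container runs on Atlantis itself; a remote Docker host would first mount the exported share (NFS for `/docker`, per the table above) and bind-mount that mount point instead: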

```yaml
# Example: Plex accessing media
volumes:
  - /volume1/docker/plex:/config
  - /volume1/media:/media:ro
```

### **Permission Model**

```
NAS User: docker (UID 1000)
├── Owns /volume1/docker/
├── Read access to /volume1/media/
└── Write access to specific paths

NAS User: media (UID 1001)
├── Write access to /volume1/media/
└── Used by *arr suite for downloads
```

---

## 📈 Storage Monitoring

### **Metrics Collected**

| Metric | Tool | Alert Threshold |
|--------|------|-----------------|
| Disk usage | Prometheus + Node Exporter | > 85% |
| RAID health | Synology DSM | Degraded |
| Drive SMART | Synology DSM | Warning/Critical |
| I/O latency | Prometheus | > 100ms |
| Backup status | Hyper Backup | Failed |
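
The disk usage threshold is evaluated by Prometheus in this setup, but the same check is easy to run ad hoc on any host. The sketch below uses only the Python standard library; the function names are illustrative, and it is not a substitute for the Prometheus alert rule.

```python
import shutil

DISK_USAGE_ALERT_PCT = 85.0  # disk usage threshold from the table above


def usage_percent(used_bytes: int, total_bytes: int) -> float:
    """Return used space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def over_threshold(path: str, threshold_pct: float = DISK_USAGE_ALERT_PCT) -> bool:
    """Return True if the filesystem containing `path` exceeds the threshold."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.used, usage.total) > threshold_pct


if __name__ == "__main__":
    # "/" is a placeholder; on the NAS this would be a volume path.
    print("alert" if over_threshold("/") else "ok")
```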

### **Grafana Dashboard**

The storage dashboard shows:
- Volume utilization trends
- I/O throughput
- RAID rebuild status
- Drive temperatures
- Backup completion status

---

## 🔮 Storage Expansion Plan

### **Current Utilization**

| Host | Used | Total | % Used |
|------|------|-------|--------|
| Atlantis | ~60TB | 96TB | 62% |
| Calypso | ~12TB | 12TB | ~100% |
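
To turn these numbers into a planning horizon, the headroom before the 85% disk usage alert can be divided by an expected growth rate. The sketch below does that; the growth rate of 1TB per month is a purely hypothetical input, not a measured value, and the function names are illustrative.

```python
def headroom_tb(total_tb: float, used_tb: float, alert_pct: float = 85.0) -> float:
    """Return TB remaining before usage crosses the alert threshold (may be negative)."""
    return total_tb * alert_pct / 100 - used_tb


def months_until_alert(total_tb: float, used_tb: float,
                       growth_tb_per_month: float, alert_pct: float = 85.0) -> float:
    """Estimate months until the threshold is reached at a constant growth rate."""
    if growth_tb_per_month <= 0:
        raise ValueError("growth_tb_per_month must be positive")
    remaining = max(headroom_tb(total_tb, used_tb, alert_pct), 0.0)
    return remaining / growth_tb_per_month


if __name__ == "__main__":
    # Atlantis: ~60TB used of 96TB; growth rate is a hypothetical 1TB/month.
    print(round(headroom_tb(96, 60), 1))
    print(round(months_until_alert(96, 60, growth_tb_per_month=1.0), 1))
    # Calypso: already past the threshold, so the estimate is 0 months.
    print(months_until_alert(12, 12, growth_tb_per_month=1.0))
```

With these inputs Atlantis has roughly 21.6TB of headroom, while Calypso is already beyond the threshold.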

### **Future Expansion Options**

1. **Atlantis**: All 8 drive bays are populated
   - Replace 16TB drives with larger (24TB+) when available
   - Add expansion unit (DX517)

2. **Calypso**: Volume is full (~12TB of 12TB used)
   - Replace 12TB drives with 20TB+ drives
   - Consider migration to larger NAS

3. **New NAS**: For cold/archive storage
   - Lower-powered unit for infrequent access
   - RAID 5 acceptable for archive data

---

## 🛠️ Maintenance Tasks

### **Regular Maintenance**

| Task | Frequency | Procedure |
|------|-----------|-----------|
| SMART check | Weekly | Review DSM health |
| Data scrubbing | Monthly | Synology scheduled task |
| Backup verification | Monthly | Test restore of random files |
| Capacity review | Quarterly | Plan for growth |

### **Drive Replacement Procedure**

1. **Identify failed drive** via DSM notification
2. **Order replacement** (same or larger capacity)
3. **Hot-swap** failed drive
4. **Monitor rebuild** (can take 24-48 hours for large arrays)
5. **Verify RAID health** after rebuild completes
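
The 24-48 hour figure in step 4 is consistent with a back-of-the-envelope estimate: a rebuild has to write the full capacity of the replacement drive. The sketch below computes that lower bound. The sustained rates are hypothetical inputs, and real rebuilds vary with array load and DSM resync speed settings.

```python
def rebuild_hours(drive_tb: float, avg_mb_per_s: float) -> float:
    """Estimate the hours needed to write one full drive at a sustained rate.

    Uses decimal units (1 TB = 1e12 bytes, 1 MB = 1e6 bytes).
    """
    if avg_mb_per_s <= 0:
        raise ValueError("avg_mb_per_s must be positive")
    seconds = (drive_tb * 1e12) / (avg_mb_per_s * 1e6)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical sustained rebuild rates for a 16TB drive.
    for rate in (100, 150):
        print(f"{rate} MB/s -> {rebuild_hours(16, rate):.1f} h")
```

At an assumed 100-150 MB/s, a 16TB drive takes roughly 30-44 hours, which falls inside the range quoted above.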

---

## 📚 Related Documentation

- **[Host Infrastructure](hosts.md)**: Server specifications
- **[Security Model](security.md)**: Backup encryption details
- **[Network Architecture](networking.md)**: NFS/SMB networking

---

*Storage infrastructure is critical. Regular monitoring and proactive maintenance prevent data loss.*