# 🚨 Synology NAS Disaster Recovery Guide

**🔴 Critical Emergency Procedures**

This guide covers critical disaster recovery scenarios specific to Synology NAS systems, with detailed procedures for the DS1823xs+ and related hardware failures. These procedures can save your data and minimize downtime.

## 🎯 Critical Scenarios Covered

1. **💾 SSD Cache Failure** - Current critical issue with Atlantis
2. **🔥 Complete NAS Failure** - Hardware replacement procedures
3. **⚡ Power Surge Damage** - Recovery from electrical damage
4. **🌊 Water/Physical Damage** - Emergency data extraction
5. **🔒 Encryption Key Loss** - Encrypted volume recovery
6. **📦 DSM Corruption** - Operating system recovery

---

## 💾 SSD Cache Failure Recovery (CURRENT CRITICAL ISSUE)

### **🚨 Current Situation: Atlantis DS1823xs+**
```bash
# CRITICAL STATUS:
# - SSD cache corrupted after DSM update
# - Volume1 is OFFLINE due to cache failure
# - 2x WD Black SN750 SE 500GB drives affected
# - All Docker services down
# - Immediate action required

# Symptoms:
# - Volume1 shows as "Crashed" in Storage Manager
# - SSD cache shows errors or corruption
# - Services fail to start
# - Data appears inaccessible
```

### **⚡ Emergency Recovery Procedure**

#### **Step 1: Immediate Assessment (5 minutes)**
```bash
# SSH into Atlantis
ssh admin@atlantis.vish.local
# or via Tailscale IP
ssh admin@100.83.230.112

# Check system status
sudo -i
cat /proc/mdstat
df -h
dmesg | tail -50

# Check volume status
# (Synology CLI tool names and options vary by DSM version)
synodisk --enum
synovolume --enum
```

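The `/proc/mdstat` output from Step 1 can be long on an 8-bay unit. As a minimal sketch, the helper below flags arrays that have a missing member (an `_` in the `[UUUU_]` status map). The function name `degraded_arrays` is hypothetical and not part of DSM; it only reads text on stdin.

```bash
# Print the name of every md array whose member map contains "_"
# (a missing or failed member). Reads /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/          { name = $1 }                     # remember the current array
        /\[[U_]*_[U_]*\]/      { if (name != "") print name }    # status map with a hole
    '
}
```

Usage on the NAS would be `degraded_arrays < /proc/mdstat`; no output means every array reports all members present.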
#### **Step 2: Disable SSD Cache (10 minutes)**
```bash
# CRITICAL: This will restore Volume1 access
# Navigate via web interface:
# 1. DSM > Storage Manager
# 2. Storage > SSD Cache
# 3. Select corrupted cache
# 4. Click "Remove" or "Disable"
# 5. Confirm removal (data on the volume is preserved; with a read-write
#    cache, writes not yet flushed to the HDDs may be lost)

# Alternative via SSH (if web interface fails):
# No CLI command is given here because the exact commands vary by DSM version.
# Consult Synology documentation for CLI cache management
```

#### **Step 3: Verify Volume1 Recovery (5 minutes)**
```bash
# Check if Volume1 is back online
df -h | grep volume1
ls -la /volume1/

# Report the result
if df -h | grep -q volume1; then
    echo "✅ Volume1 recovered successfully"
else
    echo "❌ Volume1 still offline - proceed to advanced recovery"
fi
```

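Grepping `df` output also matches paths that merely contain `volume1`. A stricter check compares the mount-point column exactly. This is a sketch under the assumption that `/proc/mounts` uses the standard Linux layout; `is_mounted` is a hypothetical helper name.

```bash
# Succeed (exit 0) if the given mount point appears in /proc/mounts-style input.
# $1 = mount point, stdin = mounts table (device, mount point, fstype, ...).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit(found ? 0 : 1) }'
}
```

For example, `is_mounted /volume1 < /proc/mounts && echo "Volume1 online"`.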
#### **Step 4: Emergency Data Backup (30-60 minutes)**
```bash
# IMMEDIATELY backup critical data once Volume1 is accessible
# Priority order:

BACKUP=/volume2/emergency-backup
STAMP=$(date +%Y%m%d)
mkdir -p "$BACKUP/db-backups"

# 1. Docker configurations (highest priority)
rsync -av /volume1/docker/ "$BACKUP/docker-$STAMP/"
tar -czf "$BACKUP/docker-configs-$STAMP.tar.gz" /volume1/docker/

# 2. Critical documents
rsync -av /volume1/documents/ "$BACKUP/documents-$STAMP/"

# 3. Database backups (files with the same name overwrite each other in db-backups/)
find /volume1/docker -name "*backup*" -type f -exec cp {} "$BACKUP/db-backups/" \;

# 4. Configuration files
cp -r /volume1/homelab/ "$BACKUP/homelab-$STAMP/"

# Record checksums of the backup for later verification
# (the manifest is excluded so it is not hashed while it is being written)
echo "Recording backup checksums..."
find "$BACKUP" -type f ! -name "checksums-*.md5" -exec md5sum {} \; > "$BACKUP/checksums-$STAMP.md5"
```

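Generating checksums only records the state of the backup; it does not compare anything. A rough sketch of a record-then-verify pair is shown below. The function names are hypothetical, the manifest path must be absolute and live outside the directory being hashed, and GNU `find`, `sort`, `xargs`, and `md5sum` are assumed to be available.

```bash
# Write an MD5 manifest for every file under directory $1 to the file $2.
# $2 must be an absolute path outside $1 so the manifest never lists itself.
make_manifest() {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r md5sum ) > "$2"
}

# Re-hash the files under $1 and compare them against manifest $2.
# Exits non-zero if any file is missing or its checksum differs.
verify_manifest() {
    ( cd "$1" && md5sum --quiet -c "$2" )
}
```

Running `make_manifest` on the source directory and `verify_manifest` on the backup copy with the same manifest confirms that the copied files match the originals.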
#### **Step 5: Remove Failed SSD Drives (15 minutes)**
```bash
# Physical removal of corrupted SSD drives
# 1. Shutdown Atlantis safely
sudo shutdown -h now

# 2. Wait for complete shutdown (LED off)
# 3. Remove power cable
# 4. Open NAS case
# 5. Remove both WD Black SN750 SE drives from M.2 slots
# 6. Close case and reconnect power
# 7. Power on and verify system boots normally

# After boot, verify no SSD cache references remain
# DSM > Storage Manager > Storage > SSD Cache
# Should show "No SSD cache configured"
```

### **🔧 Permanent Solution: New NVMe Installation**

#### **Hardware Installation (When New Drives Arrive)**
```bash
# New hardware to install:
# - 2x Crucial P310 1TB (CT1000P310SSD801)
# - 1x Synology SNV5420-400G

# Installation procedure:
# 1. Power down Atlantis
# 2. Install Crucial P310 drives in M.2 slots 1 & 2
# 3. Install Synology SNV5420 in E10M20-T1 card M.2 slot
# 4. Power on and wait for drive recognition
```

#### **007revad Script Configuration**
```bash
# After hardware installation, run 007revad scripts
cd /volume1/homelab/synology_scripts/

# 1. Enable M.2 volume support
cd 007revad_enable_m2/
sudo ./syno_enable_m2_volume.sh && echo "✅ M.2 volume support enabled"

# 2. Create M.2 volumes
# (check the script name in your checkout; upstream 007revad publishes it
# as syno_create_m2_volume.sh)
cd ../007revad_m2_volume/
sudo ./syno_m2_volume.sh && echo "✅ M.2 volumes created"

# 3. Update HDD database (for IronWolf Pro drives)
cd ../007revad_hdd_db/
sudo ./syno_hdd_db.sh && echo "✅ HDD database updated"
```

#### **New Cache Configuration**
```bash
# Configure new SSD cache with Crucial P310 drives
# DSM > Storage Manager > Storage > SSD Cache

# Recommended configuration:
# - Cache Type: Read-Write cache
# - RAID Type: RAID 1 (for redundancy)
# - Drives: Both Crucial P310 1TB drives
# - Skip data consistency check: NO (ensure integrity)

# Synology SNV5420 usage:
# - Use as separate high-performance volume
# - Ideal for Docker containers requiring high IOPS
# - Configure as Volume3 for critical services
```

---

## 🔥 Complete NAS Hardware Failure

### **Emergency Data Extraction**
```bash
# If NAS won't boot but drives are intact
# Use Linux PC for data recovery

# 1. Remove drives from failed NAS
# 2. Connect drives to Linux system via USB adapters
# 3. Install mdadm and lvm2 for RAID recovery (SHR volumes sit on LVM)

sudo apt update && sudo apt install mdadm lvm2

# 4. Scan for RAID arrays and activate any LVM volume groups
sudo mdadm --assemble --scan
sudo mdadm --detail --scan
sudo vgchange -ay
cat /proc/mdstat

# 5. Mount the recovered data volume read-only
# Pick the large data array from /proc/mdstat, or the logical volume listed
# by `sudo lvs` if LVM is in use. The small arrays hold the DSM system and
# swap partitions, not user data. /dev/md2 below is only an example.
sudo mkdir -p /mnt/synology-recovery
sudo mount -o ro /dev/md2 /mnt/synology-recovery

# 6. Copy critical data
mkdir -p ~/synology-recovery
rsync -av /mnt/synology-recovery/docker/ ~/synology-recovery/docker/
rsync -av /mnt/synology-recovery/documents/ ~/synology-recovery/documents/
```

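On a recovery PC the arrays are often assembled under unfamiliar names such as `md127`, so the array number alone does not identify the data volume. One possible heuristic, sketched here with a hypothetical helper name, is to pick the array with the largest block count in `/proc/mdstat`. If the volume uses LVM (as SHR does), the filesystem lives on a logical volume on top of that array rather than on the array itself.

```bash
# Print "/dev/mdX" for the array with the most blocks.
# Reads /proc/mdstat-style text on stdin; prints nothing if no array is found.
largest_array() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                       # array header line
        $2 == "blocks" && name != "" {                    # size line for that array
            if ($1 + 0 > max) { max = $1 + 0; best = name }
        }
        END { if (best != "") print "/dev/" best }
    '
}
```

Typical use would be `largest_array < /proc/mdstat`, followed by a manual check with `sudo mdadm --detail` before mounting anything.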
### **NAS Replacement Procedure**
```bash
# Complete DS1823xs+ replacement

# Step 1: Order identical replacement
# - Same model: DS1823xs+
# - Same RAM configuration: 32GB DDR4 ECC
# - Same expansion cards: E10M20-T1

# Step 2: Drive migration
# - Remove all drives from old unit
# - Note drive bay positions (critical!)
# - Install drives in new unit in EXACT same order
# - Install M.2 drives in same slots

# Step 3: First boot
# - Power on new NAS
# - DSM will detect existing configuration
# - Follow migration wizard
# - Do NOT initialize drives (will erase data)

# Step 4: Configuration restoration
# - Restore DSM configuration from backup
# - Reinstall packages and applications
# - Run 007revad scripts
# - Verify all services operational
```

---

## ⚡ Power Surge Recovery

### **Assessment Procedure**
```bash
# After power surge or electrical event

# Step 1: Visual inspection
# - Check for burn marks on power adapter
# - Inspect NAS case for damage
# - Look for LED indicators

# Step 2: Controlled power-on test
# - Use different power outlet
# - Connect only essential cables
# - Power on and observe boot sequence

# Step 3: Component testing
# If NAS powers on:
# - Check all drive recognition
# - Verify network connectivity
# - Test all expansion cards

# If NAS doesn't power on:
# - Try different power adapter (if available)
# - Check fuses in power adapter
# - Consider professional repair
```

### **Data Protection After Surge**
```bash
# If NAS boots but shows errors:

# 1. Immediate backup
# Priority: Get data off potentially damaged system
rsync -av /volume1/critical/ /external-backup/

# 2. Drive health check
# Check all drives for damage. Device names depend on the model and DSM
# version (e.g. /dev/sda or /dev/sata1), so adjust the patterns as needed.
for disk in /dev/sd? /dev/sata?; do
    if [ -e "$disk" ]; then
        sudo smartctl -a "$disk"
    fi
done

# 3. Memory test
# Run memory diagnostic if available
# Check for ECC errors in logs

# 4. Replace damaged components
# Order replacements for any failed components
# Consider UPS installation to prevent future damage
```

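Reading full `smartctl -a` reports for eight drives is slow in an emergency. As a small sketch, the helper below reduces the output of `smartctl -H` to a one-word verdict. The function name is hypothetical; the matched strings are the health lines smartctl prints for ATA/NVMe and SCSI devices, and anything else is reported as `UNKNOWN` rather than guessed. Example use: `sudo smartctl -H /dev/sata1 | smart_verdict`.

```bash
# Classify `smartctl -H` output read from stdin as PASSED, FAILED, or UNKNOWN.
smart_verdict() {
    awk '
        /overall-health self-assessment test result: PASSED/ { v = "PASSED" }
        /SMART Health Status: OK/                             { v = "PASSED" }
        /overall-health self-assessment test result: FAILED/ { v = "FAILED" }
        END { print (v == "" ? "UNKNOWN" : v) }
    '
}
```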
---

## 🌊 Water/Physical Damage Recovery

### **Emergency Response (First 30 minutes)**
```bash
# If NAS exposed to water or physical damage:

# IMMEDIATE ACTIONS:
# 1. POWER OFF IMMEDIATELY - do not attempt to boot
# 2. Disconnect all cables
# 3. Remove drives if possible
# 4. Do not attempt to power on

# Drive preservation:
# - Place drives in anti-static bags
# - Store in dry, cool location
# - Do not attempt to clean or dry
# - Contact professional recovery service if needed
```

### **Professional Recovery Decision**
```bash
# When to contact professional data recovery:
# - Water damage to drives
# - Physical damage to drive enclosures
# - Clicking or grinding noises from drives
# - Drives not recognized by any system
# - Critical data with no backup

# Professional services:
# - DriveSavers: 1-800-440-1904
# - Ontrack: 1-800-872-2599
# - Secure Data Recovery: 1-800-388-1266

# Cost considerations:
# - $500-$5000+ depending on damage
# - Success not guaranteed
# - Weigh cost vs. data value
```

---

## 🔒 Encryption Key Recovery

### **Encrypted Volume Access**
```bash
# If encryption key is lost or corrupted:

# Step 1: Locate backup keys
# Check these locations:
# - Password manager (Vaultwarden)
# - Physical key backup (if created)
# - Email notifications from Synology
# - Configuration backup files

# Step 2: Key recovery attempt
# DSM > Control Panel > Shared Folder
# Select encrypted folder > Encryption > Mount
# Enter the passphrase or import the exported key file
# (menu labels may differ between DSM versions)

# Step 3: If no backup key exists:
# Data is likely unrecoverable without professional help
# Synology uses strong encryption - no backdoors
# Consider professional cryptographic recovery services
```

### **Prevention for Future**
```bash
# Create encryption key backup NOW:
# 1. DSM > Control Panel > Shared Folder
# 2. Select encrypted folder > Encryption
# 3. Export encryption key
# 4. Store in multiple secure locations:
#    - Password manager
#    - Physical printout in safe
#    - Encrypted cloud storage
#    - Secondary NAS location
```

---

## 📦 DSM Operating System Recovery

### **DSM Corruption Recovery**
```bash
# If DSM won't boot or is corrupted:

# Step 1: Download DSM installer
# From Synology website:
# - Find your exact model (DS1823xs+)
# - Download latest DSM .pat file
# - Save to computer

# Step 2: Synology Assistant recovery
# 1. Install Synology Assistant on computer
# 2. Connect NAS and computer to same network
# 3. With the NAS powered on, hold RESET for about 4 seconds until it beeps, then release
# 4. Within 10 seconds, hold RESET again for about 4 seconds until it beeps 3 more times
#    (the STATUS LED blinks orange once the NAS is ready for reinstall)
# 5. Use Synology Assistant to reinstall DSM

# Step 3: Configuration restoration
# After DSM reinstall:
# - Restore from configuration backup
# - Reinstall packages
# - Reconfigure services
# - Run 007revad scripts
```

### **Manual DSM Installation**
```bash
# If Synology Assistant fails:

# 1. Access recovery mode
# - With the NAS powered on, hold reset button about 4 seconds until it beeps
# - Release, then within 10 seconds hold it again until it beeps 3 more times
# - Wait until the STATUS LED blinks orange
# - Release reset button

# 2. Web interface recovery
# - Open browser to NAS IP address
# - Should show recovery interface
# - Upload DSM .pat file
# - Follow installation wizard

# 3. Data preservation
# - Choose "Keep existing data" if option appears
# - Do not format drives unless absolutely necessary
# - Existing volumes should be preserved
```

---

## 🛠️ 007revad Scripts for Disaster Recovery

### **Post-Recovery Script Execution**
```bash
# After any hardware replacement or DSM reinstall:

# 1. Download/update scripts
cd /volume1/homelab/synology_scripts/
git pull origin main  # Update to latest versions

# 2. HDD Database Update (for IronWolf Pro drives)
cd 007revad_hdd_db/
sudo ./syno_hdd_db.sh
# Ensures Seagate IronWolf Pro drives are properly recognized
# Prevents compatibility warnings
# Enables full SMART monitoring

# 3. Enable M.2 Volume Support
cd ../007revad_enable_m2/
sudo ./syno_enable_m2_volume.sh
# Re-enables M.2 volume creation after DSM updates
# Required after any DSM reinstall
# Fixes DSM limitations on M.2 usage

# 4. Create M.2 Volumes
cd ../007revad_m2_volume/
sudo ./syno_m2_volume.sh
# Creates storage volumes on M.2 drives
# Allows M.2 drives to be used for more than just cache
# Essential for high-performance storage setup
```

### **Script Automation for Recovery**
```bash
# Create automated recovery script
mkdir -p /volume1/homelab/scripts
cat > /volume1/homelab/scripts/post-recovery-setup.sh << 'EOF'
#!/bin/bash
# Post-disaster recovery automation script
set -e  # stop at the first failing step

echo "🚀 Starting post-recovery setup..."

# Update 007revad scripts
cd /volume1/homelab/synology_scripts/
git pull origin main

# Run HDD database update
echo "📀 Updating HDD database..."
cd 007revad_hdd_db/
sudo ./syno_hdd_db.sh

# Enable M.2 volumes
echo "💾 Enabling M.2 volume support..."
cd ../007revad_enable_m2/
sudo ./syno_enable_m2_volume.sh

# Create M.2 volumes
echo "🔧 Creating M.2 volumes..."
cd ../007revad_m2_volume/
sudo ./syno_m2_volume.sh

# Restart Docker services
# (DSM runs Docker as a package, so there is typically no systemd unit named
# "docker"; the package is ContainerManager on DSM 7.2+ and Docker before that)
echo "🐳 Restarting Docker services..."
sudo synopkg restart ContainerManager

# Verify services
echo "✅ Verifying critical services..."
docker ps | grep -E "(plex|grafana|vaultwarden)" || echo "⚠️ No critical containers running yet"

echo "🎉 Post-recovery setup complete!"
EOF

chmod +x /volume1/homelab/scripts/post-recovery-setup.sh
```

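When a recovery script stops partway, it helps to know which step failed. One way to get that, sketched below with a hypothetical `run_step` wrapper, is to name each step, print its outcome, and pass the exit code back to the caller. A call would look like `run_step "HDD database update" sudo ./syno_hdd_db.sh`.

```bash
# Run one named recovery step, report the outcome, and propagate the exit code.
# $1 = label shown in the log, remaining arguments = command to run.
run_step() {
    local label=$1
    shift
    echo "▶ ${label}"
    if "$@"; then
        echo "✅ ${label}"
    else
        local rc=$?                       # exit status of the failed command
        echo "❌ ${label} (exit ${rc})" >&2
        return "$rc"
    fi
}
```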
---

## 📋 Recovery Checklists

### **🚨 SSD Cache Failure Checklist**
```text
☐ SSH access to NAS confirmed
☐ Volume status assessed
☐ SSD cache disabled/removed
☐ Volume1 accessibility verified
☐ Emergency backup completed
☐ Failed SSD drives physically removed
☐ System stability confirmed
☐ New drives ordered (if needed)
☐ 007revad scripts prepared
☐ Recovery procedure documented
```

### **🔥 Complete NAS Failure Checklist**
```text
☐ Damage assessment completed
☐ Drives safely removed
☐ Drive order documented
☐ Replacement NAS ordered
☐ Data recovery attempted (if needed)
☐ New NAS configured
☐ Drives installed in correct order
☐ Configuration restored
☐ 007revad scripts executed
☐ All services verified operational
```

### **⚡ Power Surge Recovery Checklist**
```text
☐ Visual damage inspection completed
☐ Power adapter tested/replaced
☐ Controlled power-on test performed
☐ Drive health checks completed
☐ Memory diagnostics run
☐ Network connectivity verified
☐ UPS installation planned
☐ Surge protection upgraded
☐ Insurance claim filed (if applicable)
```

---

## 🚨 Emergency Contacts & Resources

### **Professional Data Recovery Services**
```text
# DriveSavers (24/7 emergency service)
Phone: 1-800-440-1904
Web: https://www.drivesavers.com
Specialties: RAID, NAS, enterprise storage

# Ontrack Data Recovery
Phone: 1-800-872-2599
Web: https://www.ontrack.com
Specialties: Synology NAS, RAID arrays

# Secure Data Recovery Services
Phone: 1-800-388-1266
Web: https://www.securedatarecovery.com
Specialties: Water damage, physical damage
```

### **Synology Support**
```text
# Synology Technical Support
Phone: 1-425-952-7900 (US)
Email: support@synology.com
Web: https://www.synology.com/support
Hours: 24/7 for critical issues

# Synology Community
Forum: https://community.synology.com
Reddit: r/synology
Discord: Synology Community Server
```

### **Hardware Vendors**
```text
# Seagate Support (IronWolf Pro drives)
Phone: 1-800-732-4283
Web: https://www.seagate.com/support/
Warranty: https://www.seagate.com/support/warranty-and-replacements/

# Crucial Support (P310 SSDs)
Phone: 1-800-336-8896
Web: https://www.crucial.com/support
Warranty: https://www.crucial.com/support/warranty
```

---

## 🔄 Prevention & Monitoring

### **Proactive Monitoring Setup**
```bash
# Set up monitoring to prevent disasters:

# 1. SMART monitoring for all drives
# DSM > Storage Manager > Storage > HDD/SSD
# Enable SMART test scheduling

# 2. Temperature monitoring
# Install temperature sensors
# Set up alerts for overheating

# 3. UPS monitoring
# Install Network UPS Tools (NUT)
# Configure automatic shutdown

# 4. Backup verification
# Automated backup integrity checks
# Regular restore testing
```

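For the temperature item, drive temperatures are already exposed through SMART, so a basic overheating alert needs no extra sensors. The sketch below extracts the temperature from `smartctl -A` output. The helper name `drive_temp_c` is hypothetical, and it assumes either the ATA attribute 194 line or the NVMe `Temperature:` line is present; drives that report temperature differently produce no output.

```bash
# Print the drive temperature in Celsius from `smartctl -A` output on stdin.
# Handles the ATA attribute table (ID 194) and the NVMe health log format.
drive_temp_c() {
    awk '
        $1 == "194" && $2 == "Temperature_Celsius" { print $10; exit }   # ATA raw value
        $1 == "Temperature:" && $3 == "Celsius"     { print $2;  exit }   # NVMe
    '
}
```

A scheduled task could compare the result against a threshold, for example `[ "$(sudo smartctl -A /dev/sata1 | drive_temp_c)" -ge 50 ] && echo "Drive running hot"`, where the device path and the 50 °C limit are placeholders.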
### **Regular Maintenance Schedule**
```text
# Monthly tasks:
☐ Check drive health (SMART status)
☐ Verify backup integrity
☐ Test UPS functionality
☐ Update DSM and packages
☐ Run 007revad scripts if needed

# Quarterly tasks:
☐ Full system backup
☐ Configuration export
☐ Hardware inspection
☐ Update disaster recovery documentation
☐ Test recovery procedures

# Annually:
☐ Replace UPS batteries
☐ Review warranty status
☐ Update emergency contacts
☐ Disaster recovery drill
☐ Insurance policy review
```

---

**💡 Critical Reminder**: The current SSD cache failure on Atlantis requires immediate attention. Follow the emergency recovery procedure above to restore Volume1 access and prevent data loss.

**🔄 Update Status**: This document should be updated after resolving the current cache failure and installing the new Crucial P310 and Synology SNV5420 drives.

**📞 Emergency Protocol**: If you cannot resolve issues using this guide, contact professional data recovery services immediately. Time is critical for data preservation.