# 🚨 Synology NAS Disaster Recovery Guide

**🔴 Critical Emergency Procedures**

This guide covers critical disaster recovery scenarios specific to Synology NAS systems, with detailed procedures for the DS1823xs+ and related hardware failures. These procedures can save your data and minimize downtime.

## 🎯 Critical Scenarios Covered

1. **💾 SSD Cache Failure** - Current critical issue with Atlantis
2. **🔥 Complete NAS Failure** - Hardware replacement procedures
3. **⚡ Power Surge Damage** - Recovery from electrical damage
4. **🌊 Water/Physical Damage** - Emergency data extraction
5. **🔒 Encryption Key Loss** - Encrypted volume recovery
6. **📦 DSM Corruption** - Operating system recovery

---

## 💾 SSD Cache Failure Recovery (CURRENT CRITICAL ISSUE)

### **🚨 Current Situation: Atlantis DS1823xs+**

```bash
# CRITICAL STATUS:
# - SSD cache corrupted after DSM update
# - Volume1 is OFFLINE due to cache failure
# - 2x WD Black SN750 SE 500GB drives affected
# - All Docker services down
# - Immediate action required

# Symptoms:
# - Volume1 shows as "Crashed" in Storage Manager
# - SSD cache shows errors or corruption
# - Services fail to start
# - Data appears inaccessible
```

### **⚡ Emergency Recovery Procedure**

#### **Step 1: Immediate Assessment (5 minutes)**

```bash
# SSH into Atlantis
ssh admin@atlantis.vish.local
# or via Tailscale IP
ssh admin@100.83.230.112

# Become root (opens an interactive root shell), then check system status
sudo -i
cat /proc/mdstat
df -h
dmesg | tail -50

# Check volume status
synodisk --enum
synovolume --enum
```

#### **Step 2: Disable SSD Cache (10 minutes)**

```bash
# CRITICAL: This should restore Volume1 access
# Navigate via web interface:
# 1. DSM > Storage Manager
# 2. Storage > SSD Cache
# 3. Select corrupted cache
# 4. Click "Remove" or "Disable"
# 5. Confirm removal (data on Volume1 is preserved; with a read-write cache,
#    writes not yet flushed to the HDDs may be lost)

# Alternative via SSH (if web interface fails):
echo 'Disabling SSD cache via command line...'
# Note: Exact commands vary by DSM version
# Consult Synology documentation for CLI cache management
```

#### **Step 3: Verify Volume1 Recovery (5 minutes)**

```bash
# Check if Volume1 is back online
df -h | grep volume1
ls -la /volume1/

# Report the result (checks that /volume1 is actually mounted)
if grep -q ' /volume1 ' /proc/mounts; then
    echo "✅ Volume1 recovered successfully"
else
    echo "❌ Volume1 still offline - proceed to advanced recovery"
fi
```

#### **Step 4: Emergency Data Backup (30-60 minutes)**

```bash
# IMMEDIATELY back up critical data once Volume1 is accessible
# Priority order:

# 1. Docker configurations (highest priority)
rsync -av /volume1/docker/ /volume2/emergency-backup/docker-$(date +%Y%m%d)/
tar -czf /volume2/emergency-backup/docker-configs-$(date +%Y%m%d).tar.gz /volume1/docker/

# 2. Critical documents
rsync -av /volume1/documents/ /volume2/emergency-backup/documents-$(date +%Y%m%d)/

# 3. Database backups (create the target directory first so cp does not fail)
mkdir -p /volume2/emergency-backup/db-backups/
find /volume1/docker -name "*backup*" -type f -exec cp {} /volume2/emergency-backup/db-backups/ \;

# 4. Configuration files
cp -r /volume1/homelab/ /volume2/emergency-backup/homelab-$(date +%Y%m%d)/

# Record checksums of the backup for later integrity verification
# (the manifest itself is excluded from the scan)
echo "Recording backup checksums..."
find /volume2/emergency-backup/ -type f ! -name 'checksums-*.md5' -exec md5sum {} \; > /volume2/emergency-backup/checksums-$(date +%Y%m%d).md5
```

#### **Step 5: Remove Failed SSD Drives (15 minutes)**

```bash
# Physical removal of corrupted SSD drives

# 1. Shut down Atlantis safely
sudo shutdown -h now

# 2. Wait for complete shutdown (LED off)
# 3. Remove power cable
# 4. Open NAS case
# 5. Remove both WD Black SN750 SE drives from M.2 slots
# 6. Close case and reconnect power
# 7.
#    Power on and verify system boots normally

# After boot, verify no SSD cache references remain
# DSM > Storage Manager > Storage > SSD Cache
# Should show "No SSD cache configured"
```

### **🔧 Permanent Solution: New NVMe Installation**

#### **Hardware Installation (When New Drives Arrive)**

```bash
# New hardware to install:
# - 2x Crucial P310 1TB (CT1000P310SSD801)
# - 1x Synology SNV5420-400G

# Installation procedure:
# 1. Power down Atlantis
# 2. Install Crucial P310 drives in M.2 slots 1 & 2
# 3. Install Synology SNV5420 in E10M20-T1 card M.2 slot
# 4. Power on and wait for drive recognition
```

#### **007revad Script Configuration**

```bash
# After hardware installation, run 007revad scripts
cd /volume1/homelab/synology_scripts/

# 1. Enable M.2 volume support (success message prints only if the script succeeds)
cd 007revad_enable_m2/
sudo ./syno_enable_m2_volume.sh && echo "✅ M.2 volume support enabled"

# 2. Create M.2 volumes
cd ../007revad_m2_volume/
sudo ./syno_m2_volume.sh && echo "✅ M.2 volumes created"

# 3. Update HDD database (for IronWolf Pro drives)
cd ../007revad_hdd_db/
sudo ./syno_hdd_db.sh && echo "✅ HDD database updated"
```

#### **New Cache Configuration**

```bash
# Configure new SSD cache with Crucial P310 drives
# DSM > Storage Manager > Storage > SSD Cache

# Recommended configuration:
# - Cache Type: Read-Write cache
# - RAID Type: RAID 1 (for redundancy)
# - Drives: Both Crucial P310 1TB drives
# - Skip data consistency check: NO (ensure integrity)

# Synology SNV5420 usage:
# - Use as separate high-performance volume
# - Ideal for Docker containers requiring high IOPS
# - Configure as Volume3 for critical services
```

---

## 🔥 Complete NAS Hardware Failure

### **Emergency Data Extraction**

```bash
# If NAS won't boot but drives are intact
# Use Linux PC for data recovery

# 1. Remove drives from failed NAS
# 2. Connect drives to Linux system via USB adapters
# 3. Install mdadm for RAID recovery (lvm2 is needed if the volume sits on LVM, e.g. SHR)
sudo apt update && sudo apt install -y mdadm lvm2

# 4.
#    Scan for RAID arrays
sudo mdadm --assemble --scan
sudo mdadm --detail --scan
cat /proc/mdstat
sudo vgchange -ay   # activates LVM volume groups, if the volume uses LVM

# 5. Mount recovered volume read-only
# Note: /dev/md0 is a placeholder. On DSM, md0 and md1 normally hold the system
# and swap partitions; use the data array (or LVM logical volume) found above.
sudo mkdir -p /mnt/synology-recovery
sudo mount -o ro /dev/md0 /mnt/synology-recovery

# 6. Copy critical data
mkdir -p ~/synology-recovery
rsync -av /mnt/synology-recovery/docker/ ~/synology-recovery/docker/
rsync -av /mnt/synology-recovery/documents/ ~/synology-recovery/documents/
```

### **NAS Replacement Procedure**

```bash
# Complete DS1823xs+ replacement

# Step 1: Order identical replacement
# - Same model: DS1823xs+
# - Same RAM configuration: 32GB DDR4 ECC
# - Same expansion cards: E10M20-T1

# Step 2: Drive migration
# - Remove all drives from old unit
# - Note drive bay positions (critical!)
# - Install drives in new unit in EXACT same order
# - Install M.2 drives in same slots

# Step 3: First boot
# - Power on new NAS
# - DSM will detect existing configuration
# - Follow migration wizard
# - Do NOT initialize drives (will erase data)

# Step 4: Configuration restoration
# - Restore DSM configuration from backup
# - Reinstall packages and applications
# - Run 007revad scripts
# - Verify all services operational
```

---

## ⚡ Power Surge Recovery

### **Assessment Procedure**

```bash
# After power surge or electrical event

# Step 1: Visual inspection
# - Check for burn marks on power adapter
# - Inspect NAS case for damage
# - Check LED indicators

# Step 2: Controlled power-on test
# - Use different power outlet
# - Connect only essential cables
# - Power on and observe boot sequence

# Step 3: Component testing
# If NAS powers on:
# - Check all drive recognition
# - Verify network connectivity
# - Test all expansion cards

# If NAS doesn't power on:
# - Try different power adapter (if available)
# - Check fuses in power adapter
# - Consider professional repair
```

### **Data Protection After Surge**

```bash
# If NAS boots but shows errors:

# 1. Immediate backup
# Priority: Get data off potentially damaged system
rsync -av /volume1/critical/ /external-backup/

# 2.
#    Drive health check
# Check all drives for damage
# (device names vary by model and DSM version; /proc/mdstat shows the names in use)
sudo smartctl -a /dev/sda
sudo smartctl -a /dev/sdb
# Repeat for all drives

# 3. Memory test
# Run memory diagnostic if available
# Check for ECC errors in logs

# 4. Replace damaged components
# Order replacements for any failed components
# Consider UPS installation to prevent future damage
```

---

## 🌊 Water/Physical Damage Recovery

### **Emergency Response (First 30 minutes)**

```bash
# If NAS exposed to water or physical damage:

# IMMEDIATE ACTIONS:
# 1. POWER OFF IMMEDIATELY - do not attempt to boot
# 2. Disconnect all cables
# 3. Remove drives if possible
# 4. Do not attempt to power on

# Drive preservation:
# - Place drives in anti-static bags
# - Store in dry, cool location
# - Do not attempt to clean or dry
# - Contact professional recovery service if needed
```

### **Professional Recovery Decision**

```bash
# When to contact professional data recovery:
# - Water damage to drives
# - Physical damage to drive enclosures
# - Clicking or grinding noises from drives
# - Drives not recognized by any system
# - Critical data with no backup

# Professional services:
# - DriveSavers: 1-800-440-1904
# - Ontrack: 1-800-872-2599
# - Secure Data Recovery: 1-800-388-1266

# Cost considerations:
# - $500-$5000+ depending on damage
# - Success not guaranteed
# - Weigh cost vs.
#     data value
```

---

## 🔒 Encryption Key Recovery

### **Encrypted Volume Access**

```bash
# If encryption key is lost or corrupted:

# Step 1: Locate backup keys
# Check these locations:
# - Password manager (Vaultwarden)
# - Physical key backup (if created)
# - Email notifications from Synology
# - Configuration backup files

# Step 2: Key recovery attempt
# DSM > Control Panel > Shared Folder
# Select encrypted folder > Edit > Security
# Try "Recover" option with backup key

# Step 3: If no backup key exists:
# Data is likely unrecoverable without professional help
# Synology uses strong encryption - no backdoors
# Consider professional cryptographic recovery services
```

### **Prevention for Future**

```bash
# Create encryption key backup NOW:

# 1. DSM > Control Panel > Shared Folder
# 2. Select encrypted folder > Edit > Security
# 3. Export encryption key
# 4. Store in multiple secure locations:
#    - Password manager
#    - Physical printout in safe
#    - Encrypted cloud storage
#    - Secondary NAS location
```

---

## 📦 DSM Operating System Recovery

### **DSM Corruption Recovery**

```bash
# If DSM won't boot or is corrupted:

# Step 1: Download DSM installer
# From Synology website:
# - Find your exact model (DS1823xs+)
# - Download latest DSM .pat file
# - Save to computer

# Step 2: Synology Assistant recovery
# 1. Install Synology Assistant on computer
# 2. Connect NAS and computer to same network
# 3. With the NAS powered on, press and hold RESET for about 4 seconds until it
#    beeps, then release; within 10 seconds, press and hold RESET again for about
#    4 seconds until it beeps three times (Synology's "Mode 2" reset)
# 4. Wait until the STATUS LED blinks orange (this can take a couple of minutes)
# 5. Use Synology Assistant to reinstall DSM

# Step 3: Configuration restoration
# After DSM reinstall:
# - Restore from configuration backup
# - Reinstall packages
# - Reconfigure services
# - Run 007revad scripts
```

### **Manual DSM Installation**

```bash
# If Synology Assistant fails:

# 1. Access recovery mode
# - With the NAS powered on, press and hold RESET for about 4 seconds until it beeps
# - Release, then within 10 seconds press and hold RESET again for about 4 seconds
# - Keep holding until it beeps three times, then release
# - Wait until the STATUS LED blinks orange

# 2.
#    Web interface recovery
# - Open browser to NAS IP address
# - Should show recovery interface
# - Upload DSM .pat file
# - Follow installation wizard

# 3. Data preservation
# - Choose "Keep existing data" if option appears
# - Do not format drives unless absolutely necessary
# - Existing volumes should be preserved
```

---

## 🛠️ 007revad Scripts for Disaster Recovery

### **Post-Recovery Script Execution**

```bash
# After any hardware replacement or DSM reinstall:

# 1. Download/update scripts
cd /volume1/homelab/synology_scripts/
git pull origin main  # Update to latest versions

# 2. HDD Database Update (for IronWolf Pro drives)
cd 007revad_hdd_db/
sudo ./syno_hdd_db.sh
# Ensures Seagate IronWolf Pro drives are properly recognized
# Prevents compatibility warnings
# Enables full SMART monitoring

# 3. Enable M.2 Volume Support
cd ../007revad_enable_m2/
sudo ./syno_enable_m2_volume.sh
# Re-enables M.2 volume creation after DSM updates
# Required after any DSM reinstall
# Fixes DSM limitations on M.2 usage

# 4. Create M.2 Volumes
cd ../007revad_m2_volume/
sudo ./syno_m2_volume.sh
# Creates storage volumes on M.2 drives
# Allows M.2 drives to be used for more than just cache
# Essential for high-performance storage setup
```

### **Script Automation for Recovery**

```bash
# Create automated recovery script
mkdir -p /volume1/homelab/scripts
cat > /volume1/homelab/scripts/post-recovery-setup.sh << 'EOF'
#!/bin/bash
# Post-disaster recovery automation script

echo "🚀 Starting post-recovery setup..."

# Update 007revad scripts (stop if the script directory is missing)
cd /volume1/homelab/synology_scripts/ || exit 1
git pull origin main

# Run HDD database update
echo "📀 Updating HDD database..."
cd 007revad_hdd_db/ || exit 1
sudo ./syno_hdd_db.sh

# Enable M.2 volumes
echo "💾 Enabling M.2 volume support..."
cd ../007revad_enable_m2/ || exit 1
sudo ./syno_enable_m2_volume.sh

# Create M.2 volumes
echo "🔧 Creating M.2 volumes..."
cd ../007revad_m2_volume/ || exit 1
sudo ./syno_m2_volume.sh

# Restart Docker services
echo "🐳 Restarting Docker services..."
# Assumption: DSM has no plain "docker" systemd unit, so restart the package instead.
# The package is "ContainerManager" on DSM 7.2+ and "Docker" on older releases.
sudo synopkg restart ContainerManager || sudo synopkg restart Docker

# Verify services
echo "✅ Verifying critical services..."
sudo docker ps | grep -E "(plex|grafana|vaultwarden)"

echo "🎉 Post-recovery setup complete!"
EOF

chmod +x /volume1/homelab/scripts/post-recovery-setup.sh
```

---

## 📋 Recovery Checklists

### **🚨 SSD Cache Failure Checklist**

```bash
☐ SSH access to NAS confirmed
☐ Volume status assessed
☐ SSD cache disabled/removed
☐ Volume1 accessibility verified
☐ Emergency backup completed
☐ Failed SSD drives physically removed
☐ System stability confirmed
☐ New drives ordered (if needed)
☐ 007revad scripts prepared
☐ Recovery procedure documented
```

### **🔥 Complete NAS Failure Checklist**

```bash
☐ Damage assessment completed
☐ Drives safely removed
☐ Drive order documented
☐ Replacement NAS ordered
☐ Data recovery attempted (if needed)
☐ New NAS configured
☐ Drives installed in correct order
☐ Configuration restored
☐ 007revad scripts executed
☐ All services verified operational
```

### **⚡ Power Surge Recovery Checklist**

```bash
☐ Visual damage inspection completed
☐ Power adapter tested/replaced
☐ Controlled power-on test performed
☐ Drive health checks completed
☐ Memory diagnostics run
☐ Network connectivity verified
☐ UPS installation planned
☐ Surge protection upgraded
☐ Insurance claim filed (if applicable)
```

---

## 🚨 Emergency Contacts & Resources

### **Professional Data Recovery Services**

```bash
# DriveSavers (24/7 emergency service)
Phone: 1-800-440-1904
Web: https://www.drivesavers.com
Specialties: RAID, NAS, enterprise storage

# Ontrack Data Recovery
Phone: 1-800-872-2599
Web: https://www.ontrack.com
Specialties: Synology NAS, RAID arrays

# Secure Data Recovery Services
Phone: 1-800-388-1266
Web: https://www.securedatarecovery.com
Specialties: Water damage, physical damage
```

### **Synology Support**

```bash
# Synology Technical Support
Phone: 1-425-952-7900 (US)
Email: support@synology.com
Web: https://www.synology.com/support
Hours: 24/7 for critical issues

# Synology
Community Forum: https://community.synology.com
Reddit: r/synology
Discord: Synology Community Server
```

### **Hardware Vendors**

```bash
# Seagate Support (IronWolf Pro drives)
Phone: 1-800-732-4283
Web: https://www.seagate.com/support/
Warranty: https://www.seagate.com/support/warranty-and-replacements/

# Crucial Support (P310 SSDs)
Phone: 1-800-336-8896
Web: https://www.crucial.com/support
Warranty: https://www.crucial.com/support/warranty
```

---

## 🔄 Prevention & Monitoring

### **Proactive Monitoring Setup**

```bash
# Set up monitoring to prevent disasters:

# 1. SMART monitoring for all drives
# DSM > Storage Manager > Storage > HDD/SSD
# Enable SMART test scheduling

# 2. Temperature monitoring
# Install temperature sensors
# Set up alerts for overheating

# 3. UPS monitoring
# Install Network UPS Tools (NUT)
# Configure automatic shutdown

# 4. Backup verification
# Automated backup integrity checks
# Regular restore testing
```

### **Regular Maintenance Schedule**

```bash
# Monthly tasks:
☐ Check drive health (SMART status)
☐ Verify backup integrity
☐ Test UPS functionality
☐ Update DSM and packages
☐ Run 007revad scripts if needed

# Quarterly tasks:
☐ Full system backup
☐ Configuration export
☐ Hardware inspection
☐ Update disaster recovery documentation
☐ Test recovery procedures

# Annually:
☐ Replace UPS batteries
☐ Review warranty status
☐ Update emergency contacts
☐ Disaster recovery drill
☐ Insurance policy review
```

---

**💡 Critical Reminder**: The current SSD cache failure on Atlantis requires immediate attention. Follow the emergency recovery procedure above to restore Volume1 access and prevent data loss.

**🔄 Update Status**: This document should be updated after resolving the current cache failure and installing the new Crucial P310 and Synology SNV5420 drives.

**📞 Emergency Protocol**: If you cannot resolve issues using this guide, contact professional data recovery services immediately. Time is critical for data preservation.
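
### **Backup Verification Sketch**

The "Backup verification" item under Proactive Monitoring Setup and the monthly "Verify backup integrity" task could be automated roughly as follows. This is a minimal sketch, not part of the existing tooling: the function name `verify_manifest` is hypothetical, and it assumes a manifest in the `md5sum` output format, such as the `checksums-YYYYMMDD.md5` file written in Step 4 of the SSD cache procedure, plus an `md5sum` that supports `-c` (GNU coreutils and BusyBox both do).

```bash
#!/bin/bash
# Hypothetical helper: re-check every file listed in an md5sum manifest.
# Return codes: 0 = all files match, 1 = mismatch or missing file,
#               2 = the manifest itself is missing or empty.
verify_manifest() {
    local manifest="$1"

    # A missing or empty manifest means there is nothing to verify against.
    if [ ! -s "$manifest" ]; then
        echo "MISSING: $manifest"
        return 2
    fi

    # md5sum -c exits non-zero if any listed file is changed or unreadable.
    if md5sum -c "$manifest" > /dev/null 2>&1; then
        echo "OK: $manifest"
        return 0
    fi

    echo "FAILED: $manifest"
    return 1
}

# Example call (path taken from Step 4; adjust the date stamp):
#   verify_manifest /volume2/emergency-backup/checksums-20250101.md5
```

Because the manifest from Step 4 stores absolute paths, the check can run from any working directory. A scheduled task could call the function and send a DSM notification on a non-zero return code; that wiring is left out here because it depends on the local setup.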