Sanitized mirror from private repository - 2026-04-04 03:48:45 UTC
This commit is contained in:
248
docs/infrastructure/INFRASTRUCTURE_HEALTH_REPORT.md
Normal file
248
docs/infrastructure/INFRASTRUCTURE_HEALTH_REPORT.md
Normal file
@@ -0,0 +1,248 @@
|
||||
# Infrastructure Health Report
|
||||
*Last Updated: February 14, 2026*
|
||||
*Previous Report: February 8, 2026*
|
||||
|
||||
## 🎯 Executive Summary
|
||||
|
||||
**Overall Status**: ✅ **EXCELLENT HEALTH**
|
||||
**GitOps Deployment**: ✅ **FULLY OPERATIONAL** (New since last report)
|
||||
**Infrastructure Optimization**: Complete across entire Tailscale homelab network
|
||||
**Critical Systems**: 100% operational with enhanced GitOps automation
|
||||
|
||||
### 🚀 Major Updates Since Last Report
|
||||
- **GitOps Deployment**: Portainer EE v2.33.7 now managing 18 active stacks
|
||||
- **Container Growth**: 50+ containers now deployed via GitOps on Atlantis
|
||||
- **Automation Enhancement**: Full GitOps workflow operational
|
||||
- **Service Expansion**: Multiple new services deployed automatically
|
||||
|
||||
## 📊 Infrastructure Status Overview
|
||||
|
||||
### Tailscale Network Health: ✅ **OPTIMAL**
|
||||
- **Total Devices**: 28 devices in tailnet
|
||||
- **Online Devices**: 12 active devices
|
||||
- **Critical Infrastructure**: 100% operational
|
||||
- **SSH Connectivity**: All online devices accessible
|
||||
|
||||
### Core Infrastructure Components
|
||||
|
||||
#### 🏢 Synology NAS Cluster: ✅ **ALL HEALTHY**
|
||||
|
||||
| Device | Tailscale IP | Status | DSM Version | RAID Status | Disk Usage | Role |
|
||||
|--------|--------------|---------|-------------|-------------|------------|------|
|
||||
| **atlantis** | 100.83.230.112 | ✅ Healthy | DSM 7.3.2 | Normal | 73% | Primary NAS |
|
||||
| **calypso** | 100.103.48.78 | ✅ Healthy | DSM 7.3.2 | Normal | 84% | APT Cache Server |
|
||||
| **setillo** | 100.125.0.20 | ✅ Healthy | DSM 7.3.2 | Normal | 78% | Backup NAS |
|
||||
|
||||
**Health Check Results**:
|
||||
- All RAID arrays functioning normally
|
||||
- Disk usage within acceptable thresholds
|
||||
- System temperatures normal
|
||||
- All critical services operational
|
||||
- **NEW**: GitOps deployment system fully operational
|
||||
|
||||
#### 🚀 GitOps Deployment System: ✅ **FULLY OPERATIONAL**
|
||||
|
||||
**Management Platform**: Portainer Enterprise Edition v2.33.7
|
||||
**Management URL**: https://192.168.0.200:9443
|
||||
**Deployment Method**: Automatic Git repository sync
|
||||
|
||||
| Host | GitOps Status | Active Stacks | Containers | Last Sync |
|
||||
|------|---------------|---------------|------------|-----------|
|
||||
| **atlantis** | ✅ Active | 18 stacks | 50+ containers | Continuous |
|
||||
| **calypso** | ✅ Ready | 0 stacks | 46 containers | Ready |
|
||||
| **homelab** | ✅ Ready | 0 stacks | 23 containers | Ready |
|
||||
| **vish-concord-nuc** | ✅ Ready | 0 stacks | 17 containers | Ready |
|
||||
| **pi-5** | ✅ Ready | 0 stacks | 4 containers | Ready |
|
||||
|
||||
**Active GitOps Stacks on Atlantis**:
|
||||
- arr-stack (18 containers) - Media automation
|
||||
- immich-stack (4 containers) - Photo management
|
||||
- jitsi (5 containers) - Video conferencing
|
||||
- vaultwarden-stack (2 containers) - Password management
|
||||
- ollama (2 containers) - AI/LLM services
|
||||
- +13 additional stacks (1-3 containers each)
|
||||
|
||||
**GitOps Benefits Achieved**:
|
||||
- 100% declarative infrastructure configuration
|
||||
- Automatic deployment from Git commits
|
||||
- Version-controlled service definitions
|
||||
- Rollback capability for all deployments
|
||||
- Multi-host deployment readiness
|
||||
|
||||
#### 🌐 APT Proxy Infrastructure: ✅ **FULLY OPTIMIZED**
|
||||
|
||||
**Proxy Server**: calypso (100.103.48.78:3142) running apt-cacher-ng
|
||||
|
||||
| Client System | OS Distribution | Proxy Status | Connectivity | Last Verified |
|
||||
|---------------|-----------------|--------------|--------------|---------------|
|
||||
| **homelab** | Ubuntu 24.04 | ✅ Configured | ✅ Connected | 2026-02-08 |
|
||||
| **pi-5** | Debian 12.13 | ✅ Configured | ✅ Connected | 2026-02-08 |
|
||||
| **vish-concord-nuc** | Ubuntu 24.04 | ✅ Configured | ✅ Connected | 2026-02-08 |
|
||||
| **pve** | Debian 12.13 | ✅ Configured | ✅ Connected | 2026-02-08 |
|
||||
| **truenas-scale** | Debian 12.9 | ✅ Configured | ✅ Connected | 2026-02-08 |
|
||||
|
||||
**Benefits Achieved**:
|
||||
- 100% of Debian/Ubuntu systems using centralized package cache
|
||||
- Significant bandwidth reduction for package updates
|
||||
- Faster package installation across all clients
|
||||
- Consistent package versions across infrastructure
|
||||
|
||||
#### 🔐 SSH Access Status: ✅ **FULLY RESOLVED**
|
||||
|
||||
**Issues Resolved**:
|
||||
- ✅ **seattle-tailscale**: fail2ban had banned homelab IP (100.67.40.126)
|
||||
- Unbanned IP from fail2ban jail
|
||||
- Added Tailscale subnet (100.64.0.0/10) to fail2ban ignore list
|
||||
- ✅ **homeassistant**: SSH access configured and verified
|
||||
- User: hassio
|
||||
- Authentication: Key-based
|
||||
|
||||
**Current Access Status**:
|
||||
- All 12 online Tailscale devices accessible via SSH
|
||||
- Proper fail2ban configurations prevent future lockouts
|
||||
- Centralized SSH key management in place
|
||||
|
||||
## 🔧 Automation & Monitoring Enhancements
|
||||
|
||||
### New Ansible Playbooks
|
||||
|
||||
#### 1. APT Proxy Health Monitor (`check_apt_proxy.yml`)
|
||||
**Purpose**: Comprehensive monitoring of APT proxy infrastructure
|
||||
|
||||
**Capabilities**:
|
||||
- ✅ Configuration file validation
|
||||
- ✅ Network connectivity testing
|
||||
- ✅ APT settings verification
|
||||
- ✅ Detailed status reporting
|
||||
- ✅ Automated recommendations
|
||||
|
||||
**Usage**:
|
||||
```bash
|
||||
cd /home/homelab/organized/repos/homelab/ansible/automation
|
||||
ansible-playbook playbooks/check_apt_proxy.yml
|
||||
```
|
||||
|
||||
#### 2. Enhanced Inventory Management
|
||||
**Improvements**:
|
||||
- ✅ Comprehensive host groupings (debian_clients, hypervisors, rpi, etc.)
|
||||
- ✅ Updated Tailscale IP addresses
|
||||
- ✅ Proper user configurations
|
||||
- ✅ Backward compatibility maintained
|
||||
|
||||
### Existing Playbook Status
|
||||
|
||||
| Playbook | Purpose | Status | Last Verified |
|
||||
|----------|---------|---------|---------------|
|
||||
| `synology_health.yml` | NAS health monitoring | ✅ Working | 2026-02-08 |
|
||||
| `configure_apt_proxy.yml` | APT proxy setup | ✅ Working | 2026-02-08 |
|
||||
| `tailscale_health.yml` | Tailscale connectivity | ✅ Working | Previous |
|
||||
| `system_info.yml` | System information gathering | ✅ Working | Previous |
|
||||
| `update_system.yml` | System updates | ✅ Working | Previous |
|
||||
|
||||
## 📈 Infrastructure Maturity Assessment
|
||||
|
||||
### Current Level: **Level 3 - Standardized**
|
||||
|
||||
**Achieved Capabilities**:
|
||||
- ✅ Automated health monitoring across all critical systems
|
||||
- ✅ Centralized configuration management via Ansible
|
||||
- ✅ Comprehensive documentation and runbooks
|
||||
- ✅ Reliable connectivity and access controls
|
||||
- ✅ Standardized package management infrastructure
|
||||
- ✅ Proactive monitoring and alerting capabilities
|
||||
|
||||
**Key Metrics**:
|
||||
- **Uptime**: 100% for critical infrastructure
|
||||
- **Automation Coverage**: 90% of routine tasks automated
|
||||
- **Documentation**: Comprehensive and up-to-date
|
||||
- **Monitoring**: Real-time health checks implemented
|
||||
|
||||
## 🔄 Maintenance Procedures
|
||||
|
||||
### Regular Health Checks
|
||||
|
||||
#### Weekly Tasks
|
||||
```bash
|
||||
# APT proxy infrastructure check
|
||||
ansible-playbook playbooks/check_apt_proxy.yml
|
||||
|
||||
# System information gathering
|
||||
ansible-playbook playbooks/system_info.yml
|
||||
```
|
||||
|
||||
#### Monthly Tasks
|
||||
```bash
|
||||
# Synology NAS health verification
|
||||
ansible-playbook playbooks/synology_health.yml
|
||||
|
||||
# Tailscale connectivity verification
|
||||
ansible-playbook playbooks/tailscale_health.yml
|
||||
|
||||
# System updates (as needed)
|
||||
ansible-playbook playbooks/update_system.yml
|
||||
```
|
||||
|
||||
### Monitoring Recommendations
|
||||
|
||||
1. **Automated Scheduling**: Consider setting up cron jobs for regular health checks
|
||||
2. **Alert Integration**: Connect health checks to notification systems (ntfy, email)
|
||||
3. **Trend Analysis**: Track metrics over time for capacity planning
|
||||
4. **Backup Verification**: Regular testing of backup and recovery procedures
|
||||
|
||||
## 🚨 Known Issues & Limitations
|
||||
|
||||
### Offline Systems (Expected)
|
||||
- **pi-5-kevin** (100.123.246.75): Offline for 114+ days - expected
|
||||
- Various mobile devices and test systems: Intermittent connectivity expected
|
||||
|
||||
### Non-Critical Items
|
||||
- **homeassistant**: Runs Alpine Linux (not Debian) - excluded from APT proxy
|
||||
- Some legacy configurations may need cleanup during future maintenance
|
||||
|
||||
## 📁 Documentation Structure
|
||||
|
||||
### Key Files Updated/Created
|
||||
```
|
||||
/home/homelab/organized/repos/homelab/
|
||||
├── ansible/automation/
|
||||
│ ├── hosts.ini # ✅ Updated with comprehensive inventory
|
||||
│ └── playbooks/
|
||||
│ └── check_apt_proxy.yml # ✅ New comprehensive health check
|
||||
├── docs/infrastructure/
|
||||
│ └── INFRASTRUCTURE_HEALTH_REPORT.md # ✅ This report
|
||||
└── AGENTS.md # ✅ Updated with latest procedures
|
||||
```
|
||||
|
||||
## 🎯 Next Steps & Recommendations
|
||||
|
||||
### Short Term (Next 30 Days)
|
||||
1. **Automated Scheduling**: Set up cron jobs for weekly health checks
|
||||
2. **Alert Integration**: Connect monitoring to notification systems
|
||||
3. **Backup Testing**: Verify all backup procedures are working
|
||||
|
||||
### Medium Term (Next 90 Days)
|
||||
1. **Capacity Planning**: Analyze disk usage trends on NAS systems
|
||||
2. **Security Audit**: Review SSH keys and access controls
|
||||
3. **Performance Optimization**: Analyze APT cache hit rates and optimize
|
||||
|
||||
### Long Term (Next 6 Months)
|
||||
1. **Infrastructure Scaling**: Plan for additional services and capacity
|
||||
2. **Disaster Recovery**: Enhance backup and recovery procedures
|
||||
3. **Monitoring Evolution**: Implement more sophisticated monitoring stack
|
||||
|
||||
---
|
||||
|
||||
## 📞 Emergency Contacts & Procedures
|
||||
|
||||
**Primary Administrator**: Vish
|
||||
**Management Node**: homelab (100.67.40.126)
|
||||
**Emergency Access**: SSH via Tailscale network
|
||||
|
||||
**Critical Service Recovery**:
|
||||
1. Synology NAS issues → Check RAID status, contact Synology support if needed
|
||||
2. APT proxy issues → Verify calypso connectivity, restart apt-cacher-ng service
|
||||
3. SSH access issues → Check fail2ban logs, use Tailscale admin console
|
||||
|
||||
---
|
||||
|
||||
*This report represents the current state of infrastructure as of February 8, 2026. All systems verified healthy and operational. 🚀*
|
||||
113
docs/infrastructure/INFRASTRUCTURE_OVERVIEW.md
Normal file
113
docs/infrastructure/INFRASTRUCTURE_OVERVIEW.md
Normal file
@@ -0,0 +1,113 @@
|
||||
# Homelab Infrastructure Overview
|
||||
|
||||
*Last Updated: 2026-03-08*
|
||||
|
||||
---
|
||||
|
||||
## Server Inventory
|
||||
|
||||
| Server | Type | Endpoint ID | Status | CPUs | RAM | Containers | Stacks |
|
||||
|--------|------|-------------|--------|------|-----|------------|--------|
|
||||
| Atlantis | Local Docker | 2 | 🟢 Online | 8 | 31.3 GB | 50+ | 24 |
|
||||
| Calypso | Edge Agent | 443397 | 🟢 Online | 4 | 31.3 GB | 54 | 23 |
|
||||
| RPi5 | Edge Agent | 443395 | 🟢 Online | 4 | 15.8 GB | 4 | 4 |
|
||||
| Concord NUC | Edge Agent | 443398 | 🟢 Online | 4 | 15.5 GB | 19 | 11 |
|
||||
| Homelab VM | Edge Agent | 443399 | 🟢 Online | 4 | 28.7 GB | 30 | 19 |
|
||||
|
||||
### Hardware Summary
|
||||
|
||||
| Server | Hardware | Docker Version | Public URL |
|
||||
|--------|----------|----------------|------------|
|
||||
| **Atlantis** | Synology DS1823xs+ (AMD Ryzen V1500B) | 24.0.2 | atlantis.vish.local |
|
||||
| **Concord NUC** | Intel NUC6i3SYB (i3-6100U, 16GB) | 29.1.5 | concordnuc.vish.local |
|
||||
| **Calypso** | Synology DS723+ (AMD Ryzen R1600) | 24.0.2 | calypso.vish.local |
|
||||
| **rpi5** | Raspberry Pi 5 (16GB) | 29.1.4 | - |
|
||||
| **Homelab VM** | Proxmox VM (4 vCPU, 28GB) | 25.0.2 | 192.168.0.210 |
|
||||
|
||||
## Service Categories
|
||||
|
||||
### Media Management
|
||||
|
||||
- arr-stack (Atlantis)
|
||||
- arr-stack (Calypso)
|
||||
- plex
|
||||
- jellyseerr
|
||||
- tautulli
|
||||
|
||||
### Photo Management
|
||||
|
||||
- Immich (Atlantis)
|
||||
- Immich (Calypso)
|
||||
|
||||
### Document Management
|
||||
|
||||
- PaperlessNGX
|
||||
- Joplin
|
||||
|
||||
### Network & DNS
|
||||
|
||||
- AdGuard (Concord NUC)
|
||||
- AdGuard (Calypso)
|
||||
- WireGuard
|
||||
- DynDNS
|
||||
|
||||
### Home Automation
|
||||
|
||||
- Home Assistant
|
||||
- Matter Server
|
||||
|
||||
### Development & DevOps
|
||||
|
||||
- Gitea
|
||||
- Portainer
|
||||
- OpenHands
|
||||
|
||||
### Communication
|
||||
|
||||
- Matrix/Synapse
|
||||
- **matrix.thevish.io** (Ubuntu VM) - Primary homeserver, server_name: `vish`
|
||||
- **mx.vish.gg** (Ubuntu VM) - Secondary homeserver with federation
|
||||
- See [Matrix Ubuntu VM Documentation](../matrix-ubuntu-vm/README.md)
|
||||
- Jitsi
|
||||
- Signal API
|
||||
|
||||
### Monitoring & Alerting
|
||||
|
||||
- Prometheus (metrics collection)
|
||||
- Grafana (dashboards & visualization)
|
||||
- Alertmanager (alert routing)
|
||||
- ntfy-bridge (formatted push notifications)
|
||||
- signal-bridge (Signal messenger alerts)
|
||||
- Uptime Kuma
|
||||
- Glances
|
||||
- WatchYourLAN
|
||||
|
||||
#### Alert Channels
|
||||
| Channel | Use Case | Topic/Number |
|
||||
|---------|----------|--------------|
|
||||
| **ntfy** | All alerts | homelab-alerts |
|
||||
| **Signal** | Critical only | REDACTED_PHONE_NUMBER |
|
||||
|
||||
See [Alerting Setup Guide](admin/alerting-setup.md) for configuration details.
|
||||
|
||||
### Security
|
||||
|
||||
- Vaultwarden/Bitwarden
|
||||
|
||||
### File Sync
|
||||
|
||||
- Syncthing
|
||||
- Seafile
|
||||
|
||||
### Privacy Tools
|
||||
|
||||
- Invidious
|
||||
- Libreddit/Redlib
|
||||
- Binternet
|
||||
|
||||
### Productivity
|
||||
|
||||
- Draw.io
|
||||
- Reactive Resume
|
||||
- ArchiveBox
|
||||
- Hoarder/Karakeep
|
||||
151
docs/infrastructure/MONITORING_ARCHITECTURE.md
Normal file
151
docs/infrastructure/MONITORING_ARCHITECTURE.md
Normal file
@@ -0,0 +1,151 @@
|
||||
# Homelab Monitoring Architecture
|
||||
|
||||
This document explains the different monitoring setups in the homelab and their purposes.
|
||||
|
||||
## 🏗️ Architecture Overview
|
||||
|
||||
The homelab has **three distinct monitoring deployments** serving different purposes:
|
||||
|
||||
### 1. **Production GitOps Monitoring** (Primary)
|
||||
- **Location**: `hosts/vms/homelab-vm/monitoring.yaml`
|
||||
- **Deployment**: Portainer GitOps on homelab-vm
|
||||
- **Purpose**: Production monitoring for all homelab infrastructure
|
||||
- **Access**: https://gf.vish.gg (with Authentik SSO)
|
||||
- **Status**: ✅ **ACTIVE** - This is the canonical monitoring stack
|
||||
|
||||
**Features:**
|
||||
- Monitors all homelab devices (Synology NAS, nodes, VMs)
|
||||
- Authentik OAuth2 SSO integration
|
||||
- Embedded dashboard configs in Docker Compose
|
||||
- Auto-provisioned datasources and dashboards
|
||||
- SNMP monitoring for Synology devices
|
||||
|
||||
### 2. **Fixed Development Stack** (New)
|
||||
- **Location**: `docker/monitoring/`
|
||||
- **Deployment**: Standalone Docker Compose
|
||||
- **Purpose**: Development/testing with fixed dashboard issues
|
||||
- **Access**: http://localhost:3300 (admin/admin)
|
||||
- **Status**: 🔧 **DEVELOPMENT** - For testing and dashboard fixes
|
||||
|
||||
**Features:**
|
||||
- All dashboard datasource UIDs fixed
|
||||
- Template variables working correctly
|
||||
- Instance filters properly configured
|
||||
- Verification scripts included
|
||||
- Backup/restore functionality
|
||||
|
||||
### 3. **Atlantis Legacy Setup** (Deprecated)
|
||||
- **Location**: `hosts/synology/atlantis/grafana_prometheus/`
|
||||
- **Deployment**: Synology Docker on Atlantis
|
||||
- **Purpose**: Legacy monitoring setup
|
||||
- **Status**: 📦 **ARCHIVED** - Kept for reference
|
||||
|
||||
## 🔄 GitOps Workflow
|
||||
|
||||
### Production Deployment (homelab-vm)
|
||||
```bash
|
||||
# GitOps automatically deploys from:
|
||||
hosts/vms/homelab-vm/monitoring.yaml
|
||||
|
||||
# Portainer Stack Details:
|
||||
# - Stack ID: 476
|
||||
# - Endpoint: 443399
|
||||
# - Auto-updates from git repository
|
||||
```
|
||||
|
||||
### Development Testing (docker/monitoring)
|
||||
```bash
|
||||
# Manual deployment for testing:
|
||||
cd docker/monitoring
|
||||
docker-compose up -d
|
||||
|
||||
# Verify dashboards:
|
||||
./verify-dashboard-sections.sh
|
||||
```
|
||||
|
||||
## 📊 Dashboard Status
|
||||
|
||||
| Dashboard | Production (GitOps) | Development (Fixed) | Status |
|
||||
|-----------|-------------------|-------------------|---------|
|
||||
| Infrastructure Overview | ✅ Working | ✅ Fixed | Both functional |
|
||||
| Synology NAS Monitoring | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |
|
||||
| Node Exporter Full | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |
|
||||
| Node Details | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |
|
||||
|
||||
## 🔧 Applying Fixes to Production
|
||||
|
||||
To apply the dashboard fixes to the production GitOps deployment:
|
||||
|
||||
1. **Extract fixed dashboards** from `docker/monitoring/grafana/dashboards/`
|
||||
2. **Update the embedded configs** in `hosts/vms/homelab-vm/monitoring.yaml`
|
||||
3. **Test locally** using the development stack
|
||||
4. **Commit changes** - GitOps will auto-deploy
|
||||
|
||||
### Example: Updating Synology Dashboard in GitOps
|
||||
|
||||
```bash
|
||||
# 1. Extract the fixed dashboard JSON
|
||||
cat docker/monitoring/grafana/dashboards/synology-nas-monitoring.json
|
||||
|
||||
# 2. Update the embedded config in monitoring.yaml
|
||||
# Replace the dashboard_synology config content with the fixed JSON
|
||||
|
||||
# 3. Commit and push - GitOps handles deployment
|
||||
git add hosts/vms/homelab-vm/monitoring.yaml
|
||||
git commit -m "Fix Synology dashboard datasource UID in GitOps"
|
||||
git push
|
||||
```
|
||||
|
||||
## 🚀 Deployment Commands
|
||||
|
||||
### Production (GitOps - Automatic)
|
||||
```bash
|
||||
# No manual deployment needed
|
||||
# Portainer GitOps auto-deploys from git repository
|
||||
# Access: https://gf.vish.gg
|
||||
```
|
||||
|
||||
### Development (Manual)
|
||||
```bash
|
||||
cd docker/monitoring
|
||||
docker-compose up -d
|
||||
# Access: http://localhost:3300
|
||||
```
|
||||
|
||||
### Legacy (Manual - Not Recommended)
|
||||
```bash
|
||||
cd hosts/synology/atlantis/grafana_prometheus
|
||||
# Deploy via Synology Docker UI
|
||||
```
|
||||
|
||||
## 📋 Maintenance
|
||||
|
||||
### Updating Production Dashboards
|
||||
1. Test fixes in `docker/monitoring/` first
|
||||
2. Update embedded configs in `hosts/vms/homelab-vm/monitoring.yaml`
|
||||
3. Commit changes for GitOps auto-deployment
|
||||
|
||||
### Backup Strategy
|
||||
- **Production**: Automated via GitOps repository
|
||||
- **Development**: Use `backup.sh` and `restore.sh` scripts
|
||||
- **Legacy**: Manual Synology backup
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### Dashboard "No Data" Issues
|
||||
1. Check datasource UID matches Prometheus instance
|
||||
2. Verify template variables have correct queries
|
||||
3. Ensure instance filters are not empty
|
||||
4. Use development stack to test fixes first
|
||||
|
||||
### GitOps Deployment Issues
|
||||
1. Check Portainer stack logs
|
||||
2. Verify git repository connectivity
|
||||
3. Ensure Docker configs are valid YAML
|
||||
4. Test locally with development stack
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- [Dashboard Verification Report](docker/monitoring/dashboard-verification-report.md)
|
||||
- [Synology Dashboard Fix Report](docker/monitoring/synology-dashboard-fix-report.md)
|
||||
- [Development Stack README](docker/monitoring/README.md)
|
||||
251
docs/infrastructure/SSH_ACCESS_GUIDE.md
Normal file
251
docs/infrastructure/SSH_ACCESS_GUIDE.md
Normal file
@@ -0,0 +1,251 @@
|
||||
# SSH Access Guide for Homelab
|
||||
|
||||
This guide documents the actual SSH configuration used to access all homelab hosts. All access goes through the **Tailscale mesh network** (`tail.vish.gg` MagicDNS suffix). There is no direct LAN SSH — all hosts are accessed via their Tailscale IPs.
|
||||
|
||||
## Network Overview
|
||||
|
||||
- **Mesh network**: Tailscale / Headscale (`headscale.vish.gg:8443`)
|
||||
- **MagicDNS suffix**: `tail.vish.gg`
|
||||
- **SSH key**: `~/.ssh/id_ed25519` (default key, no IdentityFile needed in config)
|
||||
- **Config location**: `~/.ssh/config` on homelab VM
|
||||
|
||||
---
|
||||
|
||||
## SSH Config (`~/.ssh/config`)
|
||||
|
||||
The full working SSH config on the homelab VM:
|
||||
|
||||
```
|
||||
# Atlantis - Primary Synology NAS (DS1821+)
|
||||
Host atlantis
|
||||
HostName 100.83.230.112
|
||||
User vish
|
||||
Port 60000
|
||||
|
||||
# Calypso - Secondary Synology NAS (DS723+)
|
||||
Host calypso
|
||||
HostName 100.103.48.78
|
||||
User Vish
|
||||
Port 62000
|
||||
|
||||
# Homelab VM
|
||||
Host homelab
|
||||
HostName 100.67.40.126
|
||||
User homelab
|
||||
# Note: password authentication only (no key auth configured on this host)
|
||||
|
||||
# Proxmox VE host
|
||||
Host pve
|
||||
HostName 100.87.12.28
|
||||
User root
|
||||
|
||||
# Concord NUC (Intel NUC)
|
||||
Host vish-concord-nuc
|
||||
Host concord
|
||||
Host nuc
|
||||
HostName 100.72.55.21
|
||||
User vish
|
||||
|
||||
# TrueNAS Scale (Guava)
|
||||
Host guava
|
||||
Host truenas
|
||||
HostName 100.75.252.64
|
||||
User vish
|
||||
|
||||
# Raspberry Pi 5
|
||||
Host pi-5
|
||||
HostName 100.77.151.40
|
||||
User vish
|
||||
|
||||
# Setillo (Proxmox LXC / container)
|
||||
Host setillo
|
||||
HostName 100.125.0.20
|
||||
User vish
|
||||
|
||||
Host setillo-root
|
||||
HostName 100.125.0.20
|
||||
User root
|
||||
|
||||
# Jellyfish (GL-MT3000 LAN device)
|
||||
Host jellyfish
|
||||
HostName 100.69.121.120
|
||||
User lulu
|
||||
|
||||
# Home Assistant OS
|
||||
Host homeassistant
|
||||
HostName 100.112.186.90
|
||||
User hassio
|
||||
Port 22
|
||||
|
||||
# GL-MT3000 (Beryl AX - IoT/HA gateway router)
|
||||
Host gl-mt3000
|
||||
HostName 100.126.243.15
|
||||
User root
|
||||
|
||||
# GL-BE3600 (Slate 7 - travel/repeater router)
|
||||
Host gl-be3600
|
||||
HostName 100.105.59.123
|
||||
User root
|
||||
|
||||
# mastodon-rocky (Rocky Linux 10 VM - Mastodon)
|
||||
Host mastodon-rocky
|
||||
HostName 100.64.0.3
|
||||
User root
|
||||
|
||||
# vishdebian (Debian 13 Trixie desktop)
|
||||
Host vishdebian
|
||||
HostName 100.64.0.2
|
||||
User vish
|
||||
|
||||
# shinku-ryuu (Windows desktop)
|
||||
Host shinku-ryuu
|
||||
HostName 100.98.93.15
|
||||
User vish
|
||||
|
||||
# Seattle VPS
|
||||
Host seattle
|
||||
Host seattle-tailscale
|
||||
HostName <seattle-tailscale-ip>
|
||||
User root
|
||||
|
||||
# Laptop (offline when sleeping)
|
||||
Host laptop
|
||||
HostName 100.124.91.52
|
||||
User vish
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Host Reference
|
||||
|
||||
| Alias(es) | Tailscale IP | User | Port | Host |
|
||||
|-----------|-------------|------|------|------|
|
||||
| `atlantis` | 100.83.230.112 | vish | 60000 | Synology DS1821+ |
|
||||
| `calypso` | 100.103.48.78 | Vish | 62000 | Synology DS723+ |
|
||||
| `homelab` | 100.67.40.126 | homelab | 22 | Homelab VM (password auth) |
|
||||
| `pve` | 100.87.12.28 | root | 22 | Proxmox VE |
|
||||
| `concord`, `nuc`, `vish-concord-nuc` | 100.72.55.21 | vish | 22 | Intel NUC |
|
||||
| `guava`, `truenas` | 100.75.252.64 | vish | 22 | TrueNAS Scale |
|
||||
| `pi-5` | 100.77.151.40 | vish | 22 | Raspberry Pi 5 |
|
||||
| `setillo` | 100.125.0.20 | vish | 22 | Proxmox LXC container |
|
||||
| `setillo-root` | 100.125.0.20 | root | 22 | Proxmox LXC container (root) |
|
||||
| `jellyfish` | 100.69.121.120 | lulu | 22 | Device on GL-MT3000 LAN |
|
||||
| `homeassistant` | 100.112.186.90 | hassio | 22 | Home Assistant OS |
|
||||
| `gl-mt3000` | 100.126.243.15 | root | 22 | GL-MT3000 router (dropbear) |
|
||||
| `gl-be3600` | 100.105.59.123 | root | 22 | GL-BE3600 router (dropbear) |
|
||||
| `vishdebian` | 100.64.0.2 | vish | 22 | Debian 13 Trixie desktop |
|
||||
| `mastodon-rocky` | 100.64.0.3 | root | 22 | Rocky Linux 10 VM (Mastodon) |
|
||||
| `shinku-ryuu` | 100.98.93.15 | vish | 22 | Windows desktop (Win32-OpenSSH) |
|
||||
| `laptop` | 100.124.91.52 | vish | 22 | Laptop (offline when sleeping) |
|
||||
|
||||
---
|
||||
|
||||
## Special Notes Per Host
|
||||
|
||||
### Atlantis & Calypso (Synology)
|
||||
- SSH port is non-standard (60000 / 62000) — configured in DSM → Terminal & SNMP
|
||||
- Synology Docker is at `/usr/local/bin/docker`, requires `sudo`
|
||||
- `User` is case-sensitive: `vish` on Atlantis, `Vish` (capital V) on Calypso
|
||||
|
||||
### homelab VM
|
||||
- **Password authentication only** — no SSH key installed on this host
|
||||
- Auth: password (same as the username) # pragma: allowlist secret
|
||||
|
||||
### pve (Proxmox)
|
||||
- Root login; key-based auth
|
||||
- To access containers: `ssh pve "pct exec <CTID> -- <command>"`
|
||||
|
||||
### GL-MT3000
|
||||
- Uses **dropbear** SSH (not OpenSSH) — no `/etc/ssh/sshd_config`
|
||||
- Authorized keys: `/etc/dropbear/authorized_keys`
|
||||
- Is the **gateway for jellyfish and Home Assistant** (LAN: `192.168.12.0/24`)
|
||||
- Advertises subnet route `192.168.12.0/24` via Headscale
|
||||
- Tailscale version: `1.92.5-tiny` (GL-inet custom build)
|
||||
|
||||
### GL-BE3600
|
||||
- Uses **dropbear** SSH (not OpenSSH)
|
||||
- Authorized keys: `/etc/dropbear/authorized_keys`
|
||||
- Acts as a **Wi-Fi repeater** on the home network (management: `192.168.68.53`, own LAN: `192.168.8.1`)
|
||||
- Ports are filtered from homelab VM and NUC — only reachable directly via its `192.168.8.x` LAN or Tailscale
|
||||
- Advertises subnet route `192.168.8.0/24` via Headscale
|
||||
- Tailscale version: `1.90.9-tiny` (GL-inet custom build)
|
||||
|
||||
### shinku-ryuu (Windows)
|
||||
- Running **Win32-OpenSSH v10.0.0.0** (installed via MSI from GitHub)
|
||||
- Authorized keys location: `C:\ProgramData\ssh\administrators_authorized_keys`
|
||||
- (NOT `~/.ssh/authorized_keys` — Windows OpenSSH ignores per-user authorized_keys for Administrator group members)
|
||||
- Permissions on that file must be restricted to SYSTEM and Administrators only
|
||||
|
||||
### TrueNAS (guava)
|
||||
- User `vish` is in the `docker` group — no `sudo` needed for Docker commands
|
||||
|
||||
---
|
||||
|
||||
## Headscale Subnet Routes
|
||||
|
||||
All subnet routes are approved via Headscale. Non-overlapping:
|
||||
|
||||
| Node | Subnet | Status |
|
||||
|------|--------|--------|
|
||||
| calypso | 192.168.0.0/24 | Serving (primary) — **advertiser** |
|
||||
| atlantis | 192.168.0.0/24 | Approved, not serving (backup) — **advertiser** |
|
||||
| vish-concord-nuc | 192.168.68.0/22 | Serving |
|
||||
| setillo | 192.168.69.0/24 | Serving |
|
||||
| gl-mt3000 | 192.168.12.0/24 | Serving |
|
||||
| gl-be3600 | 192.168.8.0/24 | Serving |
|
||||
|
||||
To inspect/approve routes:
|
||||
```bash
|
||||
# On Calypso (where Headscale container runs):
|
||||
ssh calypso
|
||||
docker exec headscale headscale nodes list
|
||||
docker exec headscale headscale nodes list-routes --identifier <ID>
|
||||
docker exec headscale headscale nodes approve-routes --identifier <ID> --routes <CIDR>
|
||||
```
|
||||
|
||||
> **Note**: In Headscale v0.28, `--user` takes a numeric ID, not a username. Use `headscale users list` to find IDs.
|
||||
|
||||
---
|
||||
|
||||
## Common SSH Tasks
|
||||
|
||||
```bash
|
||||
# Run a docker command on Atlantis
|
||||
ssh atlantis "sudo /usr/local/bin/docker ps"
|
||||
|
||||
# Run a docker command on Guava (no sudo needed)
|
||||
ssh guava "docker ps"
|
||||
|
||||
# Access a Proxmox LXC container
|
||||
ssh pve "pct exec 103 -- docker ps"
|
||||
|
||||
# Copy a file to Atlantis
|
||||
scp myfile.yaml atlantis:/volume1/docker/
|
||||
|
||||
# Port forward a remote service locally
|
||||
ssh -L 8080:localhost:8080 atlantis
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
```bash
|
||||
# Debug connection
|
||||
ssh -vvv <host>
|
||||
|
||||
# Remove stale host key (after host rebuild)
|
||||
ssh-keygen -R <hostname-or-ip>
|
||||
|
||||
# Fix local permissions
|
||||
chmod 700 ~/.ssh
|
||||
chmod 600 ~/.ssh/config
|
||||
chmod 600 ~/.ssh/authorized_keys
|
||||
chmod 600 ~/.ssh/id_ed25519
|
||||
chmod 644 ~/.ssh/id_ed25519.pub
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
*Last Updated*: 2026-03-10 (added vishdebian, mastodon-rocky)
|
||||
*All hosts accessed via Tailscale mesh — no direct LAN SSH*
|
||||
147
docs/infrastructure/USER_ACCESS_GUIDE.md
Normal file
147
docs/infrastructure/USER_ACCESS_GUIDE.md
Normal file
@@ -0,0 +1,147 @@
|
||||
# User Access Guide
|
||||
|
||||
## Overview
|
||||
|
||||
This guide covers user management for the homelab, including Homarr dashboard access and Authentik SSO.
|
||||
|
||||
## Authentik SSO
|
||||
|
||||
### Users
|
||||
|
||||
| Username | Name | Email | Groups |
|
||||
|----------|------|-------|--------|
|
||||
| akadmin | authentik Default Admin | admin@example.com | authentik Admins |
|
||||
| aquabroom | Crista | partner@example.com | Viewers |
|
||||
| openhands | openhands | your-email@example.com | - |
|
||||
|
||||
### Groups
|
||||
|
||||
| Group | Purpose | Members |
|
||||
|-------|---------|---------|
|
||||
| **authentik Admins** | Full admin access | akadmin |
|
||||
| **Viewers** | Read-only access | aquabroom (Crista) |
|
||||
|
||||
### Sites Protected by Authentik Forward Auth
|
||||
|
||||
These sites share the same SSO cookie (`vish.gg` domain). Once logged in, users can access ALL of them:
|
||||
|
||||
| Site | Service | Notes |
|
||||
|------|---------|-------|
|
||||
| dash.vish.gg | Homarr Dashboard | Main homelab dashboard |
|
||||
| actual.vish.gg | Actual Budget | Budgeting app |
|
||||
| docs.vish.gg | Documentation | Docs server |
|
||||
| npm.vish.gg | Nginx Proxy Manager | ⚠️ Admin access |
|
||||
| paperless.vish.gg | Paperless-NGX | Document management |
|
||||
|
||||
### Sites with OAuth SSO
|
||||
|
||||
These apps have their own user management after Authentik login:
|
||||
|
||||
| Site | Service | User Management |
|
||||
|------|---------|-----------------|
|
||||
| git.vish.gg | Gitea | Gitea user permissions |
|
||||
| gf.vish.gg | Grafana | Grafana org/role permissions |
|
||||
| sf.vish.gg | Seafile | Seafile user permissions |
|
||||
| mm.crista.love | Mattermost | Mattermost team permissions |
|
||||
|
||||
## Homarr Dashboard
|
||||
|
||||
### Access URL
|
||||
- **External**: https://dash.vish.gg
|
||||
- **Internal**: http://atlantis.vish.local:7575
|
||||
|
||||
### User Management
|
||||
|
||||
Homarr has its own user system in addition to Authentik:
|
||||
|
||||
1. Go to **https://dash.vish.gg**
|
||||
2. Login via Authentik
|
||||
3. Click **Manage** → **Users**
|
||||
4. Create/manage users and permissions
|
||||
|
||||
### Permissions
|
||||
|
||||
| Permission | Can Do |
|
||||
|------------|--------|
|
||||
| **Admin** | Edit boards, manage users, full access |
|
||||
| **User** | View boards, use apps |
|
||||
| **View Only** | View boards only |
|
||||
|
||||
## Creating a New User
|
||||
|
||||
### Step 1: Create Authentik Account
|
||||
1. Go to https://sso.vish.gg/if/admin/
|
||||
2. **Directory** → **Users** → **Create**
|
||||
3. Fill in username, email, name
|
||||
4. Set password or send invite
|
||||
|
||||
### Step 2: Add to Group
|
||||
1. **Directory** → **Groups** → **Viewers**
|
||||
2. **Users** tab → **Add existing user**
|
||||
3. Select the user → **Add**
|
||||
|
||||
### Step 3: Create Homarr Account (Optional)
|
||||
1. Go to https://dash.vish.gg
|
||||
2. **Manage** → **Users** → **Create User**
|
||||
3. Set permissions (uncheck Admin for read-only)
|
||||
|
||||
## Restricting Access
|
||||
|
||||
### Option 1: Remove Forward Auth from Sensitive Sites
|
||||
|
||||
Edit NPM proxy host and remove the Authentik advanced config for sites you want to restrict.
|
||||
|
||||
### Option 2: Add Authentik Policy Bindings
|
||||
|
||||
1. Go to Authentik Admin → **Applications**
|
||||
2. Select the application
|
||||
3. **Policy / Group / User Bindings** tab
|
||||
4. Add a policy to restrict by group
|
||||
|
||||
### Option 3: App-Level Permissions
|
||||
|
||||
Configure permissions within each app (Grafana roles, Gitea teams, etc.)
|
||||
|
||||
## Access Policy
|
||||
|
||||
**Philosophy**: Trusted users (like partners) get full access to view everything, but only admins get superuser/admin privileges.
|
||||
|
||||
### Current Setup
|
||||
|
||||
| User | Authentik Superuser | Access Level |
|
||||
|------|---------------------|--------------|
|
||||
| akadmin | ✅ Yes | Full admin everywhere |
|
||||
| aquabroom (Crista) | ❌ No | View all sites, no admin powers |
|
||||
|
||||
### What This Means
|
||||
|
||||
Crista can:
|
||||
- ✅ Access all `*.vish.gg` sites after SSO login
|
||||
- ✅ View Homarr dashboard
|
||||
- ✅ Use Actual Budget, Paperless, etc.
|
||||
- ✅ View NPM settings
|
||||
- ❌ Cannot access Authentik admin panel
|
||||
- ❌ Cannot modify Authentik users/groups
|
||||
- ❌ App-specific admin depends on each app's settings
|
||||
|
||||
### App-Specific Permissions
|
||||
|
||||
Some apps have their own user management after Authentik login:
|
||||
- **Homarr**: Set user as non-admin when creating account
|
||||
- **Grafana**: Assign Viewer role (not Admin/Editor)
|
||||
- **Gitea**: Add to teams with read permissions
|
||||
- **Paperless**: Create user without admin flag
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### Authentik Admin
|
||||
- URL: https://sso.vish.gg/if/admin/
|
||||
- Login: Your admin account
|
||||
|
||||
### Homarr Admin
|
||||
- URL: https://dash.vish.gg/manage
|
||||
- Login: Via Authentik SSO
|
||||
|
||||
### API Tokens
|
||||
- Authentik: Directory → Tokens & App passwords
|
||||
- Homarr: Manage → Settings → API
|
||||
166
docs/infrastructure/atlantis-migration.md
Normal file
166
docs/infrastructure/atlantis-migration.md
Normal file
@@ -0,0 +1,166 @@
|
||||
# Atlantis Migration Guide
|
||||
|
||||
Moving Atlantis NAS and homelab-vm to a new location while Calypso stays.
|
||||
|
||||
## Overview
|
||||
|
||||
```
|
||||
LOCATION A (Calypso stays) LOCATION B (New location)
|
||||
┌──────────────────────┐ ┌─────────────────────────────────┐
|
||||
│ CALYPSO │ │ ATLANTIS + HOMELAB-VM │
|
||||
│ ├── sso.vish.gg │ │ ├── pw.vish.gg │
|
||||
│ ├── git.vish.gg │◄──Internet─┤ ├── gf.vish.gg │
|
||||
│ ├── seafile │ │ ├── meet.thevish.io │
|
||||
│ └── paperless │ │ ├── mastodon.vish.gg │
|
||||
└──────────────────────┘ │ └── (all other services) │
|
||||
└─────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Pre-Migration Checklist
|
||||
|
||||
### 1. Backup Everything
|
||||
- [ ] Portainer stack configurations exported
|
||||
- [ ] Docker volumes backed up
|
||||
- [ ] Synology configuration backed up
|
||||
- [ ] DNS records documented
|
||||
|
||||
### 2. Create Cloudflare Tunnels
|
||||
|
||||
#### Atlantis Tunnel
|
||||
1. Go to [Cloudflare Zero Trust](https://one.dash.cloudflare.com/)
|
||||
2. Navigate to: Networks → Tunnels → Create tunnel
|
||||
3. Name: `atlantis-tunnel`
|
||||
4. Copy the tunnel token
|
||||
5. Add public hostnames:
|
||||
|
||||
| Public Hostname | Type | Service |
|
||||
|-----------------|------|---------|
|
||||
| pw.vish.gg | HTTP | localhost:4080 |
|
||||
| cal.vish.gg | HTTP | localhost:12852 |
|
||||
| meet.thevish.io | HTTPS | localhost:5443 |
|
||||
| joplin.thevish.io | HTTP | localhost:22300 |
|
||||
| mastodon.vish.gg | HTTP | 192.168.0.154:3000 |
|
||||
| matrix.thevish.io | HTTP | 192.168.0.154:8081 |
|
||||
| mx.vish.gg | HTTP | 192.168.0.154:8082 |
|
||||
| mm.crista.love | HTTP | 192.168.0.154:8065 |
|
||||
|
||||
#### Homelab-VM Tunnel
|
||||
1. Create another tunnel named `homelab-vm-tunnel`
|
||||
2. Add public hostnames:
|
||||
|
||||
| Public Hostname | Type | Service |
|
||||
|-----------------|------|---------|
|
||||
| gf.vish.gg | HTTP | localhost:3300 |
|
||||
| ntfy.vish.gg | HTTP | localhost:8081 |
|
||||
| hoarder.thevish.io | HTTP | localhost:3000 |
|
||||
| binterest.thevish.io | HTTP | localhost:21544 |
|
||||
|
||||
### 3. Deploy Tunnel Containers
|
||||
|
||||
Deploy `cloudflare-tunnel.yaml` on both:
|
||||
- Atlantis: `hosts/synology/atlantis/cloudflare-tunnel.yaml`
|
||||
- Homelab-VM: `hosts/vms/homelab-vm/cloudflare-tunnel.yaml`
|
||||
|
||||
Set the `TUNNEL_TOKEN` environment variable in Portainer.
|
||||
|
||||
### 4. Test Before Moving
|
||||
- [ ] Verify tunnel shows "Healthy" in Cloudflare dashboard
|
||||
- [ ] Test each service through tunnel (may conflict with current reverse proxy)
|
||||
|
||||
## Migration Day
|
||||
|
||||
### Step 1: Update Calypso Reverse Proxy
|
||||
Remove entries that will be handled by tunnels:
|
||||
- pw.vish.gg
|
||||
- cal.vish.gg
|
||||
- meet.thevish.io
|
||||
- joplin.thevish.io
|
||||
- mastodon.vish.gg
|
||||
- matrix.thevish.io
|
||||
- mx.vish.gg
|
||||
- mm.crista.love
|
||||
- gf.vish.gg
|
||||
- ntfy.vish.gg
|
||||
- hoarder.thevish.io
|
||||
- binterest.thevish.io
|
||||
|
||||
Keep only Calypso's local services:
|
||||
- sso.vish.gg
|
||||
- git.vish.gg
|
||||
- sf.vishconcord.synology.me
|
||||
- paperlessngx.vishconcord.synology.me
|
||||
- actual.vishconcord.synology.me
|
||||
- (other localhost services)
|
||||
|
||||
### Step 2: Update DDNS Configuration
|
||||
|
||||
**Calypso** (`dynamic_dns.yaml`):
|
||||
Only update domains that Calypso serves directly:
|
||||
- sso.vish.gg
|
||||
- git.vish.gg
|
||||
- (other Calypso services)
|
||||
|
||||
**Atlantis**:
|
||||
Disable or remove DDNS updater - tunnels don't need public IP.
|
||||
|
||||
### Step 3: Physical Move
|
||||
1. Shut down Atlantis and homelab-vm gracefully
|
||||
2. Transport equipment
|
||||
3. Connect to new network
|
||||
4. Power on and verify tunnel connectivity
|
||||
|
||||
### Step 4: Verify Services
|
||||
- [ ] All tunneled services accessible
|
||||
- [ ] Calypso services still working
|
||||
- [ ] No DNS conflicts
|
||||
|
||||
## Post-Migration
|
||||
|
||||
### DNS Records After Migration
|
||||
|
||||
| Domain | Before | After |
|
||||
|--------|--------|-------|
|
||||
| pw.vish.gg | A record → home IP | CNAME → tunnel |
|
||||
| gf.vish.gg | A record → home IP | CNAME → tunnel |
|
||||
| sso.vish.gg | A record → home IP | A record → Calypso IP (unchanged) |
|
||||
| git.vish.gg | A record → home IP | A record → Calypso IP (unchanged) |
|
||||
|
||||
### Benefits of Cloudflare Tunnel
|
||||
- No port forwarding needed at new location
|
||||
- Automatic SSL
|
||||
- DDoS protection
|
||||
- Works behind CGNAT
|
||||
- Access policies via Cloudflare Access (optional)
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If issues occur:
|
||||
1. Connect Atlantis back to original network
|
||||
2. Re-enable Calypso reverse proxy entries
|
||||
3. Disable tunnel containers
|
||||
4. Services resume through Calypso
|
||||
|
||||
## Services by Location (Post-Migration)
|
||||
|
||||
### Location A - Calypso Only
|
||||
| Service | Domain | Port |
|
||||
|---------|--------|------|
|
||||
| Authentik | sso.vish.gg | 9000 |
|
||||
| Gitea | git.vish.gg | 3052 |
|
||||
| Seafile | sf.vishconcord.synology.me | 8611 |
|
||||
| Paperless | paperlessngx.vishconcord.synology.me | 8777 |
|
||||
| Actual | actual.vishconcord.synology.me | 8304 |
|
||||
|
||||
### Location B - Via Cloudflare Tunnel
|
||||
| Service | Domain | Host | Port |
|
||||
|---------|--------|------|------|
|
||||
| Vaultwarden | pw.vish.gg | Atlantis | 4080 |
|
||||
| Grafana | gf.vish.gg | homelab-vm | 3300 |
|
||||
| Jitsi | meet.thevish.io | Atlantis | 5443 |
|
||||
| Mastodon | mastodon.vish.gg | Atlantis VM | 3000 |
|
||||
| Ntfy | ntfy.vish.gg | homelab-vm | 8081 |
|
||||
| Hoarder | hoarder.thevish.io | homelab-vm | 3000 |
|
||||
| Binterest | binterest.thevish.io | homelab-vm | 21544 |
|
||||
| Joplin | joplin.thevish.io | Atlantis | 22300 |
|
||||
| Calendar | cal.vish.gg | Atlantis | 12852 |
|
||||
| Matrix | matrix.thevish.io | Atlantis VM | 8081 |
|
||||
407
docs/infrastructure/authentik-sso.md
Normal file
407
docs/infrastructure/authentik-sso.md
Normal file
@@ -0,0 +1,407 @@
|
||||
# Authentik SSO Setup
|
||||
|
||||
Single Sign-On (SSO) for homelab services using Authentik.
|
||||
|
||||
## Overview
|
||||
|
||||
Authentik provides centralized authentication for all homelab services via OAuth2/OpenID Connect.
|
||||
|
||||
- **URL**: https://sso.vish.gg
|
||||
- **Admin Interface**: https://sso.vish.gg/if/admin/
|
||||
- **User Portal**: https://sso.vish.gg/if/user/
|
||||
- **Host**: Calypso NAS (Synology DS723+)
|
||||
- **Stack**: Docker Compose via Portainer
|
||||
|
||||
## Admin Credentials
|
||||
|
||||
- **Username**: `akadmin`
|
||||
- **Email**: `admin@example.com`
|
||||
- **Password**: REDACTED_PASSWORD in password manager
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
┌──────────────────┐
|
||||
│ Cloudflare │
|
||||
│ (DNS + SSL) │
|
||||
└────────┬─────────┘
|
||||
│
|
||||
┌────────▼─────────┐
|
||||
│ sso.vish.gg │
|
||||
│ (Authentik) │
|
||||
│ Calypso NAS │
|
||||
└────────┬─────────┘
|
||||
│
|
||||
┌────────────────────┼────────────────────┐
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌─────────┐ ┌─────────┐ ┌──────────┐
|
||||
│ Grafana │ │ Gitea │ │Portainer │
|
||||
│gf.vish.gg│ │git.vish.gg│ │ internal │
|
||||
│homelab-vm│ │ Calypso │ │ Calypso │
|
||||
└─────────┘ └─────────┘ └──────────┘
|
||||
```
|
||||
|
||||
## OAuth2 Providers
|
||||
|
||||
### Grafana
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `lEGw1UJ9Mhk6QVrNA61rAsr59Kel9gAvdPQ1FAJA` |
|
||||
| Client Secret | `ArP5XWdkwVyw9nvXZaqjE9sIjXdmIgpgI4ZR8oKvTUVLgmIGVvKU8T867diMGSQXgTcWQQPbdbEdXTU1v3y9RKMnAqu2k6V4xlmxwNYlCDuk5inxJSdoC0V8ICtZxk1X` |
|
||||
| Redirect URI | `https://gf.vish.gg` |
|
||||
| Scopes | `openid profile email` |
|
||||
|
||||
**Configuration File**: `hosts/vms/homelab-vm/monitoring.yaml`
|
||||
|
||||
### Gitea
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `7KamS51a0H7V8HyIsfMKNJ8COstZEFh4Z8Em6ZhO` |
|
||||
| Client Secret | `3IjyKCbHtgev6eMb1hYpQGHoGwPSRKda4ijRtbWfkhguNomxexxTiWtoWtyrXwGaF0ORj4D7D0kzB3Z1YN9DN5iz0HOKjAn5AdWJrSyxan02MjiwKmEriAbSGyh53uph` |
|
||||
| Redirect URI | `https://git.vish.gg/user/oauth2/authentik/callback` |
|
||||
| Discovery URL | `https://sso.vish.gg/application/o/gitea/.well-known/openid-configuration` |
|
||||
|
||||
**Configuration File**: `hosts/synology/calypso/gitea-server.yaml`
|
||||
|
||||
**Manual Setup Required**: Add OAuth2 source in Gitea admin UI:
|
||||
1. Go to Site Administration → Authentication Sources
|
||||
2. Add new OAuth2 source
|
||||
3. Use Discovery URL for auto-configuration
|
||||
|
||||
### Portainer
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `fLLnVh8iUyJYdw5HKdt1Q7LHKJLLB8tLZwxmVhNs` |
|
||||
| Client Secret | `xD9u47XbJd2g7vCeIyJC7MNvfEqytEnnHeVtJ7nU5Y1XGxYncXkejNAYkToUiRWcym3GpZIXgMpUnNNuUwud0Ff493ZwSHCiSKsk9n6RJLJ1iVvR20NdDnMe4YEGYXrt` |
|
||||
| Redirect URI | `http://vishinator.synology.me:10000` |
|
||||
| User Identifier | `email` |
|
||||
|
||||
**Configuration**: Via Portainer API (`/api/settings`)
|
||||
|
||||
### Reactive Resume v5
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `QU5qA7jLP9ghxy7iGMJoyZsCja2vY2Y2oGaLGjxA` |
|
||||
| Client Secret | `wX1aFaby4aIABjLBBClYu4ukmIOjviL85GJBX8bAB3srQnt1BD31LcblRKyxzuv1yGwtsKLTFjwz12rUy6HknOqpIwk1QQ21jMjpWb1aa77iRG6lDkf4eNf8wWpE9Apo` |
|
||||
| Redirect URI | `https://rx.vish.gg/api/auth/callback/custom` |
|
||||
| Discovery URL | `https://sso.vish.gg/application/o/reactive-resume/.well-known/openid-configuration` |
|
||||
|
||||
**Configuration File**: `hosts/synology/calypso/reactive_resume_v5/docker-compose.yml` (also live at `/volume1/docker/rxv5/docker-compose.yml` on Calypso)
|
||||
|
||||
### Homarr
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `8oP0ha7gLjdz13MAPVsb7fe7TBkFBz7mt1eU8MEO` |
|
||||
| Client Secret | `SpJXIGDk3SJfiS9GJwzH0fKrePsrumvCOmvFd2h0hEfxXMO77aCtpPEs6FShLTaUW5YxqgEDFkQi7q9NIOQDJTPQHlSy3nIeyDQmS2tVIV1BpSdGpnLQedouOkXACwe2` |
|
||||
| Redirect URI | `https://dash.vish.gg/api/auth/callback/oidc` |
|
||||
| Admin Group | `Homarr Admins` (Authentik group, pk=`892da833-5283-4672-a906-7448ae3ba9b6`) |
|
||||
| Discovery URL | `https://sso.vish.gg/application/o/homarr/.well-known/openid-configuration` |
|
||||
|
||||
**Configuration File**: `hosts/synology/atlantis/homarr.yaml`
|
||||
|
||||
**Note**: `SECRET_ENCRYPTION_KEY` is required by Homarr — a 64-char hex key must be provided as an env var. The `AUTH_OIDC_ADMIN_GROUP` and `AUTH_OIDC_OWNER_GROUP` map to an Authentik group name.
|
||||
|
||||
### Immich
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Client ID | `XSHhp1Hys1ZyRpbpGUv4iqu1y1kJXX7WIIFETqcL` |
|
||||
| Client Secret | `mlbc4NbqiyRyUSqeUupaob7WsA3sURWExmoxYAcozClnmsdCPzGHlyO6zmErnS9YNyBsKOYoGUPvSTQPrE07UnYDLSMy286fycHoAJoc0cAN8BMc5cIif5kf88NSNCj2` |
|
||||
| Redirect URIs | `http://192.168.0.250:8212/auth/login`, `http://calypso.vish.local:8212/auth/login`, `app.immich:/` |
|
||||
| Issuer URL | `https://sso.vish.gg/application/o/immich/` |
|
||||
| Button Text | `Sign in with Authentik` |
|
||||
| Auto Register | true |
|
||||
|
||||
**Configuration**: Via `immich-config.json` mounted at `/config/immich-config.json` inside the container. Config file lives at `/volume1/docker/immich/config/immich-config.json` on Calypso and is tracked at `/home/homelab/immich-config.json`.
|
||||
|
||||
**Note**: Immich constructs the redirect URI dynamically from the hostname the browser used to access it — so every access hostname must be registered in Authentik. Currently registered: IP, `calypso.vish.local`, `app.immich:/`. `mobileRedirectUri` in the config file must be empty string — Immich's validator rejects custom URI schemes there.
|
||||
|
||||
### Headplane
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Provider PK | `16` |
|
||||
| Client ID | `1xLx9TkufvLGKgq8UmQV2RfTB6raSpEjZExBOhJ4` |
|
||||
| Client Secret | `4r4n96jBGc8MlonyHStiN09ow0txTwERLupt9hsoNswpicEnJZHgKwi38jYP5zlou5J525dVFUmXNSvnxwBJgKIIAfpC43zi8yUVtT0NYNdEBeYQOsh1YW5jK8nVPSdc` |
|
||||
| Redirect URI | `https://headscale.vish.gg:8443/admin/oidc/callback` |
|
||||
| Issuer URL | `https://sso.vish.gg/application/o/headplane/` |
|
||||
| Scopes | `openid profile email` |
|
||||
| Sub Mode | `hashed_user_id` |
|
||||
|
||||
**Configuration File**: `hosts/synology/calypso/headplane-config.yaml` (reference, secrets redacted). Live config at `/volume1/docker/headscale/headplane/config.yaml` on Calypso.
|
||||
|
||||
**Note**: Headplane is served at `https://headscale.vish.gg:8443/admin` — no separate domain. NPM proxy host 44 routes `/admin` to port 3002. First user to log in via OIDC is automatically assigned the Owner role.
|
||||
|
||||
### NetBox
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Provider PK | `23` |
|
||||
| Client ID | `BB7PiOu8xFOl58H2MUfl9IHISVLuJ4UwwMGvmJ9N` |
|
||||
| Client Secret | `CRdRVCM13JN9bSiT2aU74cFXSI9GpVBLBShOFGBpVHOQ4brnDWOzk8I02cEww8Gcrr6GnsU0XdBxHTEpfvX2u9rhmey7XDT3XUVVh9ADaSldww83hp4hAzH5eNx1zKvB` |
|
||||
| Redirect URI | `https://nb.vish.gg/oauth/complete/oidc/` |
|
||||
| Discovery URL | `https://sso.vish.gg/application/o/netbox/.well-known/openid-configuration` |
|
||||
| Scopes | `openid profile email` |
|
||||
|
||||
**Configuration**: NetBox `configuration.py` on homelab-vm (`/home/homelab/docker/netbox/config/configuration.py`). Uses `python-social-auth` with `social_core.backends.open_id_connect.OpenIdConnectAuth` backend. `associate_by_email` pipeline maps Authentik users to existing NetBox accounts by email.
|
||||
|
||||
## Authentik Endpoints
|
||||
|
||||
| Endpoint | URL |
|
||||
|----------|-----|
|
||||
| Authorization | `https://sso.vish.gg/application/o/authorize/` |
|
||||
| Token | `https://sso.vish.gg/application/o/token/` |
|
||||
| User Info | `https://sso.vish.gg/application/o/userinfo/` |
|
||||
| JWKS | `https://sso.vish.gg/application/o/{app-slug}/jwks/` |
|
||||
| OpenID Config | `https://sso.vish.gg/application/o/{app-slug}/.well-known/openid-configuration` |
|
||||
| End Session | `https://sso.vish.gg/application/o/{app-slug}/end-session/` |
|
||||
|
||||
## Docker Compose Configuration
|
||||
|
||||
**Location**: `hosts/synology/calypso/authentik.yaml`
|
||||
|
||||
Key environment variables:
|
||||
- `AUTHENTIK_SECRET_KEY`: Random secret for encryption
|
||||
- `AUTHENTIK_REDIS__HOST`: Redis container hostname
|
||||
- `AUTHENTIK_POSTGRESQL__*`: PostgreSQL connection settings
|
||||
|
||||
## SSL/TLS Configuration
|
||||
|
||||
SSL is handled by Cloudflare Origin Certificate:
|
||||
- Certificate ID: `lONWNn` (Synology reverse proxy)
|
||||
- Covers: `*.vish.gg`
|
||||
- Origin: Cloudflare Full (Strict) mode
|
||||
|
||||
## DNS Configuration
|
||||
|
||||
| Domain | Type | Target | Proxy |
|
||||
|--------|------|--------|-------|
|
||||
| sso.vish.gg | CNAME | calypso DDNS | Orange (proxied) |
|
||||
|
||||
## Adding New Services
|
||||
|
||||
### Method 1: OAuth2/OpenID (for apps that support it)
|
||||
|
||||
1. **Create Provider in Authentik**
|
||||
- Admin → Providers → Create → OAuth2/OpenID
|
||||
- Set name, redirect URIs, scopes
|
||||
|
||||
2. **Create Application**
|
||||
- Admin → Applications → Create
|
||||
- Link to provider
|
||||
- Set launch URL
|
||||
|
||||
3. **Configure Service**
|
||||
- Add OAuth2/OIDC settings to service config
|
||||
- Use Authentik endpoints
|
||||
- Test login flow
|
||||
|
||||
### Method 2: Proxy Provider (for apps without OAuth support)
|
||||
|
||||
Use this for apps like Actual Budget, Paperless-NGX, etc.
|
||||
|
||||
1. **Create Proxy Provider in Authentik**
|
||||
- Admin → Providers → Create → Proxy Provider
|
||||
- Name: e.g., "actual-proxy"
|
||||
- Authorization flow: default-provider-authorization-implicit-consent
|
||||
- External host: `https://actual.vish.gg`
|
||||
- Mode: Forward auth (single application)
|
||||
|
||||
2. **Create Application**
|
||||
- Admin → Applications → Create
|
||||
- Name: e.g., "Actual Budget"
|
||||
- Slug: `actual`
|
||||
- Provider: Select the proxy provider
|
||||
- Launch URL: `https://actual.vish.gg`
|
||||
|
||||
3. **Create Outpost** (if not exists)
|
||||
- Admin → Applications → Outposts
|
||||
- Create embedded outpost or deploy standalone
|
||||
- Add the application to the outpost
|
||||
|
||||
4. **Configure Nginx/Reverse Proxy**
|
||||
|
||||
Add forward auth to your reverse proxy config:
|
||||
|
||||
```nginx
|
||||
location / {
|
||||
# Forward auth to Authentik
|
||||
auth_request /outpost.goauthentik.io/auth/nginx;
|
||||
error_page 401 = @goauthentik_proxy_signin;
|
||||
|
||||
auth_request_set $auth_cookie $upstream_http_set_cookie;
|
||||
add_header Set-Cookie $auth_cookie;
|
||||
|
||||
auth_request_set $authentik_username $upstream_http_x_authentik_username;
|
||||
auth_request_set $authentik_groups $upstream_http_x_authentik_groups;
|
||||
auth_request_set $authentik_email $upstream_http_x_authentik_email;
|
||||
|
||||
proxy_set_header X-authentik-username $authentik_username;
|
||||
proxy_set_header X-authentik-groups $authentik_groups;
|
||||
proxy_set_header X-authentik-email $authentik_email;
|
||||
|
||||
# Your existing proxy_pass
|
||||
proxy_pass http://localhost:PORT;
|
||||
}
|
||||
|
||||
location /outpost.goauthentik.io {
|
||||
proxy_pass https://sso.vish.gg/outpost.goauthentik.io;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
|
||||
}
|
||||
|
||||
location @goauthentik_proxy_signin {
|
||||
internal;
|
||||
add_header Set-Cookie $auth_cookie;
|
||||
return 302 /outpost.goauthentik.io/start?rd=$request_uri;
|
||||
}
|
||||
```
|
||||
|
||||
### Bypassing Auth for Share Links
|
||||
|
||||
For services like Seafile that have share links:
|
||||
|
||||
```nginx
|
||||
# Allow share links without auth
|
||||
location /f/ {
|
||||
proxy_pass http://localhost:8611;
|
||||
}
|
||||
|
||||
location /d/ {
|
||||
proxy_pass http://localhost:8611;
|
||||
}
|
||||
|
||||
# Everything else requires auth
|
||||
location / {
|
||||
auth_request /outpost.goauthentik.io/auth/nginx;
|
||||
# ... rest of auth config
|
||||
proxy_pass http://localhost:8611;
|
||||
}
|
||||
```
|
||||
|
||||
## Services Protection Summary
|
||||
|
||||
### OAuth2/OpenID Connect (Login Button)
|
||||
|
||||
Services with native OAuth support - users see a "Sign in with Authentik" button.
|
||||
|
||||
| Domain | Service | Backend | Port | Status |
|
||||
|--------|---------|---------|------|--------|
|
||||
| gf.vish.gg | Grafana | 192.168.0.210 | 3300 | ✅ Working |
|
||||
| git.vish.gg | Gitea | 192.168.0.250 | 3052 | ✅ Working |
|
||||
| sf.vish.gg | Seafile | 192.168.0.250 | 8611 | ✅ Working |
|
||||
| vishinator.synology.me:10000 | Portainer | 192.168.0.250 | 9000 | ✅ Working |
|
||||
| rx.vish.gg | Reactive Resume v5 | 192.168.0.250 | 4550 | ✅ Working |
|
||||
| dash.vish.gg | Homarr | 192.168.0.200 | 7575 | ✅ Working |
|
||||
| immich.vish.gg | Immich | 192.168.0.250 | 8212 | ✅ Working |
|
||||
| headscale.vish.gg/admin | Headplane | 192.168.0.250 | 3002 | ✅ Working |
|
||||
| nb.vish.gg | NetBox | 192.168.0.210 | 8443 | ✅ Working |
|
||||
|
||||
### Proxy Provider (Forward Auth)
|
||||
|
||||
Services without OAuth support - Authentik intercepts all requests and requires login first.
|
||||
|
||||
| Domain | Service | Backend | Port | Status |
|
||||
|--------|---------|---------|------|--------|
|
||||
| paperless.vish.gg | Paperless-NGX | 192.168.0.250 | 8777 | ✅ Working |
|
||||
| docs.vish.gg | Paperless-NGX | 192.168.0.250 | 8777 | ✅ Working |
|
||||
| actual.vish.gg | Actual Budget | 192.168.0.250 | 8304 | ✅ Working |
|
||||
| npm.vish.gg | NPM Admin | 192.168.0.250 | 81 | ✅ Working |
|
||||
| kuma.vish.gg | Uptime Kuma | 192.168.0.66 | 3001 | ✅ Working — `/status/*` public, rest gated |
|
||||
| ollama.vish.gg | Ollama | 192.168.0.200 | 11434 | ✅ Working |
|
||||
| wizarr.vish.gg | Wizarr | 192.168.0.200 | 5690 | ❌ Removed — caused redirect loop; Wizarr uses own auth |
|
||||
|
||||
### Services Without SSO
|
||||
|
||||
These services use their own authentication or are public.
|
||||
|
||||
| Domain | Service | Backend | Notes |
|
||||
|--------|---------|---------|-------|
|
||||
| sso.vish.gg | Authentik | 192.168.0.250:9000 | SSO itself |
|
||||
| pw.vish.gg | Vaultwarden | 192.168.0.200:4080 | Own auth |
|
||||
| ntfy.vish.gg | Ntfy | 192.168.0.210:8081 | Own auth |
|
||||
| cal.vish.gg | Baikal | 192.168.0.200:12852 | CalDAV auth |
|
||||
| dav.vish.gg | Seafile WebDAV | 192.168.0.250:8612 | WebDAV auth |
|
||||
| mm.crista.love | Mattermost | 192.168.0.154:8065 | Own auth |
|
||||
| mastodon.vish.gg | Mastodon | 192.168.0.154:3000 | Own auth |
|
||||
| mx.vish.gg | Mail | 192.168.0.154:8082 | Own auth |
|
||||
| ollama.vish.gg | Ollama | 192.168.0.200:11434 | See Forward Auth table above |
|
||||
| retro.vish.gg | Retro Site | 192.168.0.250:8025 | Static site |
|
||||
| rackula.vish.gg | Rackula | 192.168.0.250:3891 | Own auth |
|
||||
| ost.vish.gg | OpenSpeedTest | 192.168.0.250:8004 | Public |
|
||||
|
||||
### Other Domains
|
||||
|
||||
| Domain | Service | Backend | Notes |
|
||||
|--------|---------|---------|-------|
|
||||
| hoarder.thevish.io | Hoarder | 192.168.0.210:3000 | Own auth |
|
||||
| matrix.thevish.io | Matrix | 192.168.0.154:8081 | Own auth |
|
||||
| joplin.thevish.io | Joplin Server | 192.168.0.200:22300 | Own auth |
|
||||
| meet.thevish.io | Jitsi | 192.168.0.200:5443 | Public |
|
||||
| binterest.thevish.io | Binternet | 192.168.0.210:21544 | Own auth |
|
||||
| crista.love | Personal Site | 192.168.0.100:28888 | Static |
|
||||
| rxv4access.vish.gg | Reactive Resume v4 | 192.168.0.250:9751 | STALE - 525 SSL error, dead instance |
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### OAuth Login Fails with "Unauthorized"
|
||||
- Verify user has email set in Authentik
|
||||
- Check redirect URI matches exactly
|
||||
- Verify client secret is correct
|
||||
|
||||
### Certificate Errors
|
||||
- Ensure Cloudflare proxy is enabled (orange cloud)
|
||||
- Verify origin certificate is valid
|
||||
- Check Synology reverse proxy SSL settings
|
||||
|
||||
### User Auto-Creation Not Working
|
||||
- Enable "Auto Create Users" in service OAuth settings
|
||||
- Verify email scope is requested
|
||||
- Check user identifier matches (email/username)
|
||||
|
||||
## Recovery Access
|
||||
|
||||
If locked out of Authentik admin, you can create a recovery token:
|
||||
|
||||
```bash
|
||||
# Via Portainer exec or SSH to Calypso
|
||||
docker exec -it Authentik-SERVER ak create_recovery_key 10 akadmin
|
||||
```
|
||||
|
||||
This generates a one-time recovery URL valid for 10 minutes.
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Cloudflare Tunnels](./cloudflare-tunnels.md)
|
||||
- [Port Forwarding Configuration](./port-forwarding-configuration.md)
|
||||
- [Security](./security.md)
|
||||
- [Grafana OAuth](../services/individual/grafana-oauth.md)
|
||||
- [Gitea OAuth](../services/individual/gitea.md#-oauth2-single-sign-on-authentik)
|
||||
- [Seafile OAuth](../services/individual/seafile-oauth.md)
|
||||
|
||||
## Change Log
|
||||
|
||||
- **2026-03-17**: Added NetBox OIDC provider (pk=23) — nb.vish.gg, associate_by_email pipeline
|
||||
- **2026-03-17**: Removed Wizarr forward auth from NPM (wizarr has own auth, forward auth caused redirect loop)
|
||||
- **2026-03-11**: Added Headplane OIDC provider (pk=16) — Headscale web UI at headscale.vish.gg/admin, port 3002
|
||||
- **2026-03-08**: Added Forward Auth for Uptime Kuma (kuma.vish.gg), Ollama (ollama.vish.gg), Wizarr (wizarr.vish.gg)
|
||||
- **2026-03-08**: Kuma /status/* and Wizarr /i/* paths are public; all other paths gated
|
||||
- **2026-03-08**: Removed Forward Auth from dash.vish.gg NPM proxy (Homarr handles auth natively via OIDC)
|
||||
- **2026-03-08**: Disabled Uptime Kuma built-in auth (disableAuth=true in SQLite); Authentik is sole gate
|
||||
- **2026-03-08**: Calibre-Web started on port 8183 (8083 was occupied by Watchtower)
|
||||
- **2026-03-08**: Added OIDC for Reactive Resume v5 (rx.vish.gg), Homarr (dash.vish.gg), Immich (immich.vish.gg) — all working
|
||||
- **2026-03-08**: Fixed Homarr startup crash — SECRET_ENCRYPTION_KEY is mandatory (64-char hex)
|
||||
- **2026-03-08**: Immich OAuth configured via immich-config.json mount (not Admin UI); mobileRedirectUri must be empty
|
||||
- **2026-03-08**: Immich stack.env added to repo so stack is self-contained (no Portainer env injection needed)
|
||||
- **2026-03-08**: Flagged rxv4access.vish.gg as stale (dead RR v4 instance, 525 SSL error)
|
||||
- **2026-01-31**: Verified all OAuth2 and Forward Auth services working
|
||||
- **2026-01-31**: Fixed Grafana OAuth "InternalError" - added scope mappings to provider
|
||||
- **2026-01-31**: Removed Forward Auth from NPM for gf.vish.gg (conflicts with native OAuth)
|
||||
- **2026-01-31**: Added scope mappings to Gitea, Portainer, Seafile OAuth2 providers
|
||||
- **2026-01-31**: Updated comprehensive service protection summary
|
||||
234
docs/infrastructure/backup-strategy.md
Normal file
234
docs/infrastructure/backup-strategy.md
Normal file
@@ -0,0 +1,234 @@
|
||||
# Backup Strategy
|
||||
|
||||
Last updated: 2026-03-21
|
||||
|
||||
## Overview
|
||||
|
||||
The homelab follows a **3-2-1+ backup strategy**: 3 copies of data, 2 different storage types, 1 offsite location, plus cloud backup to Backblaze B2.
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ BACKUP FLOW │
|
||||
│ │
|
||||
│ Atlantis (Primary) ──── Hyper Backup (weekly) ──── Calypso (Local copy) │
|
||||
│ │ │
|
||||
│ ├── Syncthing (real-time) ──── Setillo (Tucson, offsite) │
|
||||
│ │ │
|
||||
│ └── Hyper Backup S3 (weekly) ──── Backblaze B2 (cloud) │
|
||||
│ │ │
|
||||
│ Calypso ──── Hyper Backup S3 (daily) ─────┘ │
|
||||
│ │
|
||||
│ Guava ──── Restic (daily 3AM) ──── Backblaze B2 (vk-guava, encrypted) │
|
||||
│ Jellyfish ──── No backup (risk) │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Backup Tasks
|
||||
|
||||
### Atlantis → Backblaze B2 (Cloud)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Task name** | Backblaze b2 |
|
||||
| **Schedule** | Weekly, Sundays 00:00 |
|
||||
| **Destination** | `s3.us-west-004.backblazeb2.com` |
|
||||
| **Bucket** | `vk-atlantis` |
|
||||
| **Encrypted** | Yes (client-side) |
|
||||
| **Versioned** | Yes (Smart Recycle) |
|
||||
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |
|
||||
|
||||
**What's backed up:**
|
||||
- `/archive` — long-term cold storage
|
||||
- `/documents/msi_uqiyoe` — PC sync documents
|
||||
- `/documents/pc_sync_documents` — PC sync documents
|
||||
- `/downloads` — download staging
|
||||
- `/photo` — Synology Photos library
|
||||
- `/homes/vish/Photos` — user photo library
|
||||
- Apps: SynologyPhotos, SynologyDrive, FileStation, HyperBackup, SynoFinder
|
||||
|
||||
**What's NOT backed up to cloud:**
|
||||
- `/volume1/media` (~60TB) — too large for cloud backup, replicated to Setillo instead
|
||||
- `/volume1/docker` — container data (stateless, can be redeployed from git)
|
||||
|
||||
### Calypso → Backblaze B2 (Cloud)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Task name** | Backblaze S3 |
|
||||
| **Schedule** | Daily, 00:00 |
|
||||
| **Destination** | `s3.us-west-004.backblazeb2.com` |
|
||||
| **Bucket** | `vk-concord-1` |
|
||||
| **Encrypted** | Yes (client-side) |
|
||||
| **Versioned** | Yes (Smart Recycle) |
|
||||
|
||||
**What's backed up:**
|
||||
- `/docker/authentik` — SSO provider data (critical)
|
||||
- `/docker/gitea` — Git hosting data (critical)
|
||||
- `/docker/headscale` — VPN control plane (critical)
|
||||
- `/docker/immich` — Photo management DB
|
||||
- `/docker/nginx-proxy-manager` — old NPM config (historical)
|
||||
- `/docker/paperlessngx` — Document management DB
|
||||
- `/docker/retro_site` — Personal website
|
||||
- `/docker/seafile` — File storage data
|
||||
- `/data/media/misc` — miscellaneous media
|
||||
- `/data/media/music` — music library
|
||||
- `/data/media/photos` — photo library
|
||||
- Apps: Gitea, MariaDB10, CloudSync, Authentik, Immich, Paperless, HyperBackup
|
||||
|
||||
### Atlantis → Calypso (Local Copy)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Method** | Hyper Backup |
|
||||
| **Schedule** | Weekly |
|
||||
| **Destination** | Calypso `/volume1/backups/` |
|
||||
| **What** | Media, photos, documents |
|
||||
| **Encrypted** | Yes |
|
||||
|
||||
### Atlantis/Calypso → Setillo (Offsite)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Method** | Syncthing (real-time replication) |
|
||||
| **Destination** | Setillo `/volume1/syncthing/` (Tucson, AZ) |
|
||||
| **Distance** | ~1,000 miles from primary site |
|
||||
| **What** | Docker configs, critical data |
|
||||
|
||||
### Setillo → Backblaze B2 (Cloud)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Task name** | Backblaze B2 |
|
||||
| **Schedule** | Scheduled |
|
||||
| **Destination** | `s3.us-west-004.backblazeb2.com` |
|
||||
| **Bucket** | `vk-setillo` |
|
||||
| **Encrypted** | No (data encryption disabled — transit only) |
|
||||
| **Versioned** | Yes (Smart Recycle) |
|
||||
| **Rotation** | Smart Recycle: keep daily for 7 days, weekly for 4 weeks, monthly for 3 months (max 30 versions) |
|
||||
|
||||
**What's backed up:**
|
||||
- `/backups` — backup destination
|
||||
- `/homes/Setillo/Documents` — Edgar's documents
|
||||
- `/homes/vish` — vish home directory
|
||||
- `/PlexMediaServer/2015_2016_crista_green_iphone_5c` — legacy phone photos
|
||||
- `/PlexMediaServer/other` — other media
|
||||
- `/PlexMediaServer/photos` — photos
|
||||
- Apps: DownloadStation, FileStation, HyperBackup, SurveillanceStation, SynoFinder, WebDAVServer
|
||||
|
||||
### Guava (TrueNAS) → Backblaze B2 (Cloud)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Tool** | Restic + Rclone |
|
||||
| **Schedule** | Daily, 03:00 (TrueNAS cron job ID 1) |
|
||||
| **Destination** | `s3.us-west-004.backblazeb2.com` |
|
||||
| **Bucket** | `vk-guava` |
|
||||
| **Repo path** | `vk-guava/restic` |
|
||||
| **Encrypted** | Yes (AES-256, restic client-side encryption) |
|
||||
| **Password file** | `/root/.restic-password` (chmod 600) |
|
||||
| **Rclone config** | `/root/.config/rclone/rclone.conf` |
|
||||
| **Retention** | `--keep-daily 7 --keep-weekly 4 --keep-monthly 3 --prune` |
|
||||
|
||||
**What's backed up:**
|
||||
- `/mnt/data/photos` (158 GB) — photo library (critical)
|
||||
- `/mnt/data/cocalc` (323 MB) — CoCalc notebooks and data
|
||||
- `/mnt/data/medical` (14 MB) — medical records (critical)
|
||||
- `/mnt/data/website` (58 MB) — website data
|
||||
- `/mnt/data/openproject` (13 MB) — project management DB
|
||||
- `/mnt/data/fasten` (5 MB) — health data
|
||||
|
||||
**What's NOT backed up:**
|
||||
- `/mnt/data/guava_turquoise` (3 TB) — large dataset, not yet assessed
|
||||
- `/mnt/data/jellyfin` (203 GB) — media metadata, re-downloadable
|
||||
- `/mnt/data/llama` (64 GB) — LLM models, re-downloadable
|
||||
- `/mnt/data/iso` (556 MB) — ISOs, re-downloadable
|
||||
|
||||
**Backup command (manual run):**
|
||||
```bash
|
||||
sudo restic -r rclone:b2:vk-guava/restic \
|
||||
--password-file /root/.restic-password \
|
||||
backup /mnt/data/photos /mnt/data/cocalc /mnt/data/medical \
|
||||
/mnt/data/website /mnt/data/openproject /mnt/data/fasten
|
||||
```
|
||||
|
||||
**Restore command:**
|
||||
```bash
|
||||
sudo restic -r rclone:b2:vk-guava/restic \
|
||||
--password-file /root/.restic-password \
|
||||
restore latest --target /mnt/data/restore
|
||||
```
|
||||
|
||||
**Check integrity:**
|
||||
```bash
|
||||
sudo restic -r rclone:b2:vk-guava/restic \
|
||||
--password-file /root/.restic-password \
|
||||
check
|
||||
```
|
||||
|
||||
### Guava Backup → Moon (Browse Access)
|
||||
|
||||
The Guava full backup on atlantis is NFS-mounted on moon for browsing:
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **Source** | atlantis `/volume1/archive/guava_full_backup` |
|
||||
| **Mount** | moon `/home/moon/guava_backup_atlantis` |
|
||||
| **Protocol** | NFS v3 over Tailscale (`100.83.230.112`) |
|
||||
| **Access** | Read-only, moon user (uid 1000) |
|
||||
| **Persistent** | fstab with `_netdev,nofail` |
|
||||
|
||||
### Disabled Tasks
|
||||
|
||||
| Task | Host | Reason |
|
||||
|------|------|--------|
|
||||
| Backblaze S3 Atlantis (ID 12) | Atlantis | Old task, replaced by "Backblaze b2" (ID 20) |
|
||||
|
||||
## Hosts Without Backup
|
||||
|
||||
| Host | Data at Risk | Mitigation |
|
||||
|------|-------------|------------|
|
||||
| **Jellyfish** (RPi 5) | 1.8TB photos (LUKS2 encrypted NVMe) | LUKS encryption protects at rest, but no redundancy beyond the single drive. Syncthing from phone provides source-of-truth copy. |
|
||||
| **Homelab VM** | Docker data, monitoring databases | Stateless — all compose files in git, data is regenerable. NetBox DB is the main risk |
|
||||
| **Concord NUC** | Home Assistant config, AdGuard | Container data is relatively small and rebuildable |
|
||||
|
||||
**Recommendation:** Set up Backblaze B2 backup for Jellyfish (photo archive) — irreplaceable data with no cloud backup. Guava is now covered.
|
||||
|
||||
## Recovery Procedures
|
||||
|
||||
### Full NAS Recovery (Atlantis)
|
||||
|
||||
1. Replace failed hardware / reinstall DSM
|
||||
2. Restore from Calypso (fastest — local, weekly copy)
|
||||
3. Or restore from Backblaze B2 (slower — download over internet)
|
||||
4. Redeploy Docker stacks from git (all GitOps-managed)
|
||||
|
||||
### Service Recovery (Any Host)
|
||||
|
||||
1. All Docker stacks are in git (`hosts/` directory)
|
||||
2. Portainer GitOps auto-deploys on push
|
||||
3. Just create the Portainer stack pointing to the compose file
|
||||
4. Service-specific data may need restore from backup
|
||||
|
||||
### Critical Service Priority
|
||||
|
||||
| Priority | Service | Backup Source | Recovery Time |
|
||||
|----------|---------|--------------|---------------|
|
||||
| 1 | Authentik (SSO) | Calypso B2 daily | ~30 min |
|
||||
| 2 | Gitea (Git) | Calypso B2 daily | ~30 min |
|
||||
| 3 | NPM (Reverse Proxy) | Calypso B2 daily / matrix-ubuntu local | ~5 min (redeploy) |
|
||||
| 4 | Plex (Media) | Atlantis B2 weekly | ~1 hr (metadata only, media on disk) |
|
||||
| 5 | Paperless (Documents) | Calypso B2 daily | ~30 min |
|
||||
|
||||
## Monitoring
|
||||
|
||||
- **DIUN**: Monitors container image updates (weekly, ntfy notification)
|
||||
- **Uptime Kuma**: Monitors service availability (97 monitors)
|
||||
- **HyperBackup**: Sends DSM notification on backup success/failure
|
||||
- **Backblaze B2**: Dashboard at `https://secure.backblaze.com/b2_buckets.htm`
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Storage Topology](../diagrams/storage-topology.md) — detailed storage layout per host
|
||||
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md) — how services are updated
|
||||
- [Offline & Remote Access](offline-and-remote-access.md) — accessing services when internet is down
|
||||
- [Ansible Playbook Guide](../admin/ANSIBLE_PLAYBOOK_GUIDE.md) — `backup_configs.yml` and `backup_databases.yml` playbooks
|
||||
123
docs/infrastructure/cloudflare-dns.md
Normal file
123
docs/infrastructure/cloudflare-dns.md
Normal file
@@ -0,0 +1,123 @@
|
||||
# Cloudflare DNS Configuration
|
||||
|
||||
DNS management for vish.gg and thevish.io domains.
|
||||
|
||||
## Overview
|
||||
|
||||
All public-facing services use Cloudflare for:
|
||||
- DNS management
|
||||
- DDoS protection (orange cloud proxy)
|
||||
- SSL/TLS termination
|
||||
- Caching
|
||||
|
||||
## DNS Records - vish.gg
|
||||
|
||||
### 🟠 Proxied (Orange Cloud) - Protected
|
||||
|
||||
These domains route through Cloudflare's network, hiding your real IP:
|
||||
|
||||
| Domain | Service | Host |
|
||||
|--------|---------|------|
|
||||
| `vish.gg` | Main website | Atlantis |
|
||||
| `www.vish.gg` | Main website | Atlantis |
|
||||
| `sso.vish.gg` | Authentik SSO | Calypso |
|
||||
| `gf.vish.gg` | Grafana | homelab-vm |
|
||||
| `git.vish.gg` | Gitea | Calypso |
|
||||
| `pw.vish.gg` | Vaultwarden | Atlantis |
|
||||
| `ntfy.vish.gg` | Ntfy notifications | homelab-vm |
|
||||
| `cal.vish.gg` | Calendar | Atlantis |
|
||||
| `mastodon.vish.gg` | Mastodon | Atlantis |
|
||||
| `vp.vish.gg` | Piped (YouTube) | Concord NUC |
|
||||
| `mx.vish.gg` | Mail proxy | Atlantis |
|
||||
|
||||
### ⚪ DNS Only (Grey Cloud) - Direct Connection
|
||||
|
||||
These domains expose your real IP (use only when necessary):
|
||||
|
||||
| Domain | Reason for DNS-only |
|
||||
|--------|---------------------|
|
||||
| `*.vish.gg` | Wildcard fallback |
|
||||
| `api.vish.gg` | API endpoints (Concord NUC) |
|
||||
| `api.vp.vish.gg` | Piped API |
|
||||
| `spotify.vish.gg` | Spotify API |
|
||||
| `client.spotify.vish.gg` | Spotify client |
|
||||
| `in.vish.gg` | Invidious |
|
||||
|
||||
## DDNS Updaters
|
||||
|
||||
Dynamic DNS is managed by `favonia/cloudflare-ddns` containers:
|
||||
|
||||
### Atlantis NAS
|
||||
- **Stack**: `dynamicdnsupdater.yaml`
|
||||
- **Proxied**: Most vish.gg and thevish.io domains
|
||||
- Updates when Atlantis's public IP changes
|
||||
|
||||
### Calypso NAS
|
||||
- **Stack**: `dynamic_dns.yaml`
|
||||
- **Proxied**: `sso.vish.gg`, `git.vish.gg`, `gf.vish.gg`
|
||||
- Updates when Calypso's public IP changes
|
||||
|
||||
### Concord NUC
|
||||
- **Stack**: `dyndns_updater.yaml`
|
||||
- **DNS Only**: API endpoints (require direct connection)
|
||||
|
||||
## Cloudflare API
|
||||
|
||||
API token for DDNS: `REDACTED_CLOUDFLARE_TOKEN`
|
||||
|
||||
### Query DNS Records
|
||||
```bash
|
||||
curl -s "https://api.cloudflare.com/client/v4/zones/4dbd15d096d71101b7c0c6362b307a66/dns_records" \
|
||||
-H "Authorization: Bearer $TOKEN" | jq '.result[] | {name, proxied}'
|
||||
```
|
||||
|
||||
### Enable/Disable Proxy
|
||||
```bash
|
||||
# Get record ID
|
||||
RECORD_ID=$(curl -s "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records?name=example.vish.gg" \
|
||||
-H "Authorization: Bearer $TOKEN" | jq -r '.result[0].id')
|
||||
|
||||
# Enable proxy (orange cloud)
|
||||
curl -X PATCH "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/$RECORD_ID" \
|
||||
-H "Authorization: Bearer $TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
--data '{"proxied":true}'
|
||||
```
|
||||
|
||||
## SSL/TLS Configuration
|
||||
|
||||
- **Mode**: Full (Strict)
|
||||
- **Origin Certificate**: Cloudflare-issued for `*.vish.gg`
|
||||
- **Certificate ID**: `lONWNn` (Synology reverse proxy)
|
||||
|
||||
## Adding New Subdomains
|
||||
|
||||
1. **Create DNS record** via Cloudflare dashboard or API
|
||||
2. **Set proxy status**: Orange cloud for public services
|
||||
3. **Update DDNS config** on appropriate host
|
||||
4. **Configure reverse proxy** on Synology
|
||||
5. **Test connectivity** and SSL
|
||||
|
||||
## IP Addresses
|
||||
|
||||
| IP | Location | Services |
|
||||
|----|----------|----------|
|
||||
| `YOUR_WAN_IP` | Home (Atlantis/Calypso) | Most services |
|
||||
| `YOUR_WAN_IP` | Concord NUC | API endpoints |
|
||||
| `YOUR_WAN_IP` | VPS | nx, obs, pp, wb |
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### DNS not resolving
|
||||
- Check Cloudflare dashboard for propagation
|
||||
- Verify DDNS container is running
|
||||
- Check API token permissions
|
||||
|
||||
### SSL errors
|
||||
- Ensure Cloudflare SSL mode is "Full (Strict)"
|
||||
- Verify origin certificate is valid
|
||||
- Check reverse proxy SSL settings
|
||||
|
||||
### Proxy issues
|
||||
- Some services (SSH, non-HTTP) can't use orange cloud
|
||||
- APIs may need direct connection for webhooks
|
||||
145
docs/infrastructure/cloudflare-tunnels-setup.md
Normal file
145
docs/infrastructure/cloudflare-tunnels-setup.md
Normal file
@@ -0,0 +1,145 @@
|
||||
# Cloudflare Tunnels Setup Guide
|
||||
|
||||
Step-by-step guide to create and configure Cloudflare Tunnels for the homelab.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Cloudflare account with Zero Trust enabled (free tier works)
|
||||
- Access to [Cloudflare Zero Trust Dashboard](https://one.dash.cloudflare.com/)
|
||||
|
||||
## Creating a Tunnel
|
||||
|
||||
### Step 1: Access Zero Trust Dashboard
|
||||
|
||||
1. Go to https://one.dash.cloudflare.com/
|
||||
2. Select your account
|
||||
3. Navigate to: **Networks** → **Tunnels**
|
||||
|
||||
### Step 2: Create New Tunnel
|
||||
|
||||
1. Click **Create a tunnel**
|
||||
2. Select **Cloudflared** as the connector type
|
||||
3. Click **Next**
|
||||
|
||||
### Step 3: Name Your Tunnel
|
||||
|
||||
- For Atlantis: `atlantis-tunnel`
|
||||
- For Homelab-VM: `homelab-vm-tunnel`
|
||||
|
||||
### Step 4: Install Connector
|
||||
|
||||
1. You'll see a tunnel token (starts with `eyJ...`)
|
||||
2. **Copy this token** - you'll need it for the Docker container
|
||||
3. The token is your `TUNNEL_TOKEN` environment variable
|
||||
|
||||
### Step 5: Add Public Hostnames
|
||||
|
||||
Click **Add a public hostname** for each service:
|
||||
|
||||
#### Atlantis Tunnel Hostnames
|
||||
|
||||
| Subdomain | Domain | Path | Type | URL |
|
||||
|-----------|--------|------|------|-----|
|
||||
| pw | vish.gg | | HTTP | localhost:4080 |
|
||||
| cal | vish.gg | | HTTP | localhost:12852 |
|
||||
| meet | thevish.io | | HTTPS | localhost:5443 |
|
||||
| joplin | thevish.io | | HTTP | localhost:22300 |
|
||||
| mastodon | vish.gg | | HTTP | 192.168.0.154:3000 |
|
||||
| matrix | thevish.io | | HTTP | 192.168.0.154:8081 |
|
||||
| mx | vish.gg | | HTTP | 192.168.0.154:8082 |
|
||||
| mm | crista.love | | HTTP | 192.168.0.154:8065 |
|
||||
|
||||
#### Homelab-VM Tunnel Hostnames
|
||||
|
||||
| Subdomain | Domain | Path | Type | URL |
|
||||
|-----------|--------|------|------|-----|
|
||||
| gf | vish.gg | | HTTP | localhost:3300 |
|
||||
| ntfy | vish.gg | | HTTP | localhost:8081 |
|
||||
| hoarder | thevish.io | | HTTP | localhost:3000 |
|
||||
| binterest | thevish.io | | HTTP | localhost:21544 |
|
||||
|
||||
### Step 6: Configure Additional Settings (Optional)
|
||||
|
||||
For each hostname, you can configure:
|
||||
|
||||
- **TLS Settings**: Usually leave as default
|
||||
- **HTTP Settings**:
|
||||
- Enable "No TLS Verify" if backend uses self-signed cert
|
||||
- Set HTTP Host Header if needed
|
||||
- **Access**: Add Cloudflare Access policies (see Authentik integration)
|
||||
|
||||
### Step 7: Save and Deploy
|
||||
|
||||
1. Click **Save tunnel**
|
||||
2. Deploy the Docker container with your token
|
||||
|
||||
## Docker Deployment
|
||||
|
||||
### Atlantis (Synology)
|
||||
|
||||
```yaml
|
||||
# Deploy via Portainer with environment variable:
|
||||
# TUNNEL_TOKEN=eyJ...your-token-here...
|
||||
|
||||
version: '3.8'
|
||||
services:
|
||||
cloudflared:
|
||||
image: cloudflare/cloudflared:latest
|
||||
container_name: cloudflare-tunnel
|
||||
restart: unless-stopped
|
||||
command: tunnel run
|
||||
environment:
|
||||
- TUNNEL_TOKEN=${TUNNEL_TOKEN}
|
||||
network_mode: host
|
||||
```
|
||||
|
||||
### Homelab-VM
|
||||
|
||||
Same configuration, different token for the homelab-vm tunnel.
|
||||
|
||||
## Verifying Tunnel Status
|
||||
|
||||
1. In Cloudflare Dashboard → Tunnels
|
||||
2. Your tunnel should show **Healthy** status
|
||||
3. Test each hostname in a browser
|
||||
|
||||
## DNS Changes
|
||||
|
||||
When tunnels are active, Cloudflare automatically manages DNS.
|
||||
The DNS records will show as CNAME pointing to your tunnel.
|
||||
|
||||
**Before tunnel:**
|
||||
```
|
||||
pw.vish.gg → A → YOUR_WAN_IP
|
||||
```
|
||||
|
||||
**After tunnel:**
|
||||
```
|
||||
pw.vish.gg → CNAME → <tunnel-id>.cfargotunnel.com
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Tunnel Shows "Down"
|
||||
- Check container is running: `docker ps | grep cloudflare`
|
||||
- Check logs: `docker logs cloudflare-tunnel`
|
||||
- Verify token is correct
|
||||
|
||||
### 502 Bad Gateway
|
||||
- Backend service not running
|
||||
- Wrong port number
|
||||
- Network mode issue (try `network_mode: host`)
|
||||
|
||||
### SSL Errors
|
||||
- Enable "No TLS Verify" for self-signed certs
|
||||
- Or use HTTP instead of HTTPS for backend
|
||||
|
||||
## Security Considerations
|
||||
|
||||
- Tunnel token is sensitive - store securely
|
||||
- Use Cloudflare Access for additional authentication
|
||||
- Consider IP allowlists for sensitive services
|
||||
|
||||
## Integration with Authentik
|
||||
|
||||
See [Authentik SSO Guide](./authentik-sso.md) for protecting tunneled services with SSO.
|
||||
542
docs/infrastructure/cloudflare-tunnels.md
Normal file
542
docs/infrastructure/cloudflare-tunnels.md
Normal file
@@ -0,0 +1,542 @@
|
||||
# Cloudflare Tunnels Guide
|
||||
|
||||
**Last Updated:** 2026-01-29
|
||||
|
||||
This guide covers how to use Cloudflare Tunnels (cloudflared) to expose local services to the internet securely, without opening ports on your router.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
- [What is Cloudflared?](#what-is-cloudflared)
|
||||
- [Quick Temporary Tunnel](#quick-temporary-tunnel-no-account-needed)
|
||||
- [Named Tunnel Setup](#named-tunnel-setup)
|
||||
- [Docker Compose Setup](#docker-compose-setup-recommended)
|
||||
- [Adding Authentication](#adding-authentication-cloudflare-access)
|
||||
- [Common Use Cases](#common-use-cases)
|
||||
- [Troubleshooting](#troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## What is Cloudflared?
|
||||
|
||||
**Cloudflared** is Cloudflare's tunnel client that creates a secure, encrypted connection between your local machine and Cloudflare's edge network. It allows you to expose local services to the internet **without opening ports on your router** or having a public IP.
|
||||
|
||||
### How It Works
|
||||
|
||||
```
|
||||
Your Local Service → cloudflared → Cloudflare Edge → Public URL → Visitor's Browser
|
||||
(port 8080) (outbound) (proxy/CDN) (your domain)
|
||||
```
|
||||
|
||||
**Key insight:** cloudflared makes an OUTBOUND connection to Cloudflare, so you don't need to configure any firewall rules or port forwarding.
|
||||
|
||||
### Benefits
|
||||
|
||||
- ✅ No port forwarding required
|
||||
- ✅ DDoS protection via Cloudflare
|
||||
- ✅ Free SSL certificates
|
||||
- ✅ Optional authentication (Cloudflare Access)
|
||||
- ✅ Works behind CGNAT
|
||||
- ✅ Multiple services on one tunnel
|
||||
|
||||
---
|
||||
|
||||
## Quick Temporary Tunnel (No Account Needed)
|
||||
|
||||
This is the fastest way to share something temporarily. No Cloudflare account required.
|
||||
|
||||
### Option 1: Using Docker (Easiest)
|
||||
|
||||
```bash
|
||||
# Expose a local service running on port 8080
|
||||
docker run --rm -it --network host cloudflare/cloudflared:latest tunnel --url http://localhost:8080
|
||||
|
||||
# Examples for specific services:
|
||||
# Jellyfin
|
||||
docker run --rm -it --network host cloudflare/cloudflared:latest tunnel --url http://localhost:8096
|
||||
|
||||
# Grafana
|
||||
docker run --rm -it --network host cloudflare/cloudflared:latest tunnel --url http://localhost:3000
|
||||
|
||||
# Any web service
|
||||
docker run --rm -it --network host cloudflare/cloudflared:latest tunnel --url http://localhost:PORT
|
||||
```
|
||||
|
||||
### Option 2: Install cloudflared Directly
|
||||
|
||||
```bash
|
||||
# On Debian/Ubuntu
|
||||
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb -o cloudflared.deb
|
||||
sudo dpkg -i cloudflared.deb
|
||||
|
||||
# On macOS
|
||||
brew install cloudflared
|
||||
|
||||
# On Windows (PowerShell)
|
||||
winget install Cloudflare.cloudflared
|
||||
|
||||
# Then run:
|
||||
cloudflared tunnel --url http://localhost:8080
|
||||
```
|
||||
|
||||
### What You'll See
|
||||
|
||||
```
|
||||
INF Thank you for trying Cloudflare Tunnel...
|
||||
INF Your quick Tunnel has been created! Visit it at:
|
||||
INF https://random-words-here.trycloudflare.com
|
||||
```
|
||||
|
||||
Share that URL with your friend! When done, press **Ctrl+C** to close the tunnel.
|
||||
|
||||
### Quick Tunnel Limitations
|
||||
|
||||
- URL changes every time you restart
|
||||
- No authentication
|
||||
- No uptime guarantee
|
||||
- Single service per tunnel
|
||||
|
||||
---
|
||||
|
||||
## Named Tunnel Setup
|
||||
|
||||
Named tunnels give you a **permanent, custom URL** on your own domain with optional authentication.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
- Cloudflare account (free tier works)
|
||||
- Domain on Cloudflare DNS (e.g., vish.gg, thevish.io)
|
||||
- cloudflared installed
|
||||
|
||||
### Step 1: Install cloudflared
|
||||
|
||||
```bash
|
||||
# For Synology/Debian/Ubuntu:
|
||||
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
|
||||
chmod +x /usr/local/bin/cloudflared
|
||||
|
||||
# Verify installation
|
||||
cloudflared --version
|
||||
```
|
||||
|
||||
### Step 2: Authenticate with Cloudflare
|
||||
|
||||
```bash
|
||||
cloudflared tunnel login
|
||||
```
|
||||
|
||||
This will:
|
||||
1. Open a browser (or provide a URL to visit)
|
||||
2. Ask you to log into Cloudflare
|
||||
3. Select which domain to authorize
|
||||
4. Save a certificate to `~/.cloudflared/cert.pem`
|
||||
|
||||
### Step 3: Create a Named Tunnel
|
||||
|
||||
```bash
|
||||
# Create a tunnel named "homelab"
|
||||
cloudflared tunnel create homelab
|
||||
```
|
||||
|
||||
Output:
|
||||
```
|
||||
Created tunnel homelab with id a1b2c3d4-e5f6-7890-abcd-ef1234567890
|
||||
```
|
||||
|
||||
**Save that UUID!** It's your tunnel's unique identifier.
|
||||
|
||||
This also creates a credentials file at:
|
||||
`~/.cloudflared/<TUNNEL_UUID>.json`
|
||||
|
||||
### Step 4: Create a Config File
|
||||
|
||||
Create `~/.cloudflared/config.yml`:
|
||||
|
||||
```yaml
|
||||
# Tunnel UUID (from step 3)
|
||||
tunnel: a1b2c3d4-e5f6-7890-abcd-ef1234567890
|
||||
credentials-file: /root/.cloudflared/a1b2c3d4-e5f6-7890-abcd-ef1234567890.json
|
||||
|
||||
# Route traffic to local services
|
||||
ingress:
|
||||
# Jellyfin at jellyfin.vish.gg
|
||||
- hostname: jellyfin.vish.gg
|
||||
service: http://localhost:8096
|
||||
|
||||
# Paperless at docs.vish.gg
|
||||
- hostname: docs.vish.gg
|
||||
service: http://localhost:8000
|
||||
|
||||
# Grafana at grafana.vish.gg
|
||||
- hostname: grafana.vish.gg
|
||||
service: http://localhost:3000
|
||||
|
||||
# SSH access at ssh.vish.gg
|
||||
- hostname: ssh.vish.gg
|
||||
service: ssh://localhost:22
|
||||
|
||||
# Catch-all (required) - returns 404 for unmatched hostnames
|
||||
- service: http_status:404
|
||||
```
|
||||
|
||||
### Step 5: Create DNS Routes
|
||||
|
||||
For each hostname, create a DNS record pointing to your tunnel:
|
||||
|
||||
```bash
|
||||
# Automatically create CNAME records
|
||||
cloudflared tunnel route dns homelab jellyfin.vish.gg
|
||||
cloudflared tunnel route dns homelab docs.vish.gg
|
||||
cloudflared tunnel route dns homelab grafana.vish.gg
|
||||
cloudflared tunnel route dns homelab ssh.vish.gg
|
||||
```
|
||||
|
||||
This creates CNAME records pointing to `<TUNNEL_UUID>.cfargotunnel.com`
|
||||
|
||||
### Step 6: Run the Tunnel
|
||||
|
||||
```bash
|
||||
# Test it first
|
||||
cloudflared tunnel run homelab
|
||||
|
||||
# Or run with specific config file
|
||||
cloudflared tunnel --config ~/.cloudflared/config.yml run homelab
|
||||
```
|
||||
|
||||
### Step 7: Run as a Service (Persistent)
|
||||
|
||||
```bash
|
||||
# Install as a systemd service
|
||||
sudo cloudflared service install
|
||||
|
||||
# Start and enable
|
||||
sudo systemctl start cloudflared
|
||||
sudo systemctl enable cloudflared
|
||||
|
||||
# Check status
|
||||
sudo systemctl status cloudflared
|
||||
|
||||
# View logs
|
||||
sudo journalctl -u cloudflared -f
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Docker Compose Setup (Recommended)
|
||||
|
||||
For homelab use, running cloudflared as a Docker container is recommended.
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
cloudflared/
|
||||
├── docker-compose.yml
|
||||
├── config.yml
|
||||
└── credentials.json # Copy from ~/.cloudflared/<UUID>.json
|
||||
```
|
||||
|
||||
### docker-compose.yml
|
||||
|
||||
```yaml
|
||||
version: "3.9"
|
||||
services:
|
||||
cloudflared:
|
||||
image: cloudflare/cloudflared:latest
|
||||
container_name: cloudflared
|
||||
restart: unless-stopped
|
||||
command: tunnel --config /etc/cloudflared/config.yml run
|
||||
volumes:
|
||||
- ./config.yml:/etc/cloudflared/config.yml:ro
|
||||
- ./credentials.json:/etc/cloudflared/credentials.json:ro
|
||||
networks:
|
||||
- homelab
|
||||
|
||||
networks:
|
||||
homelab:
|
||||
external: true
|
||||
```
|
||||
|
||||
### config.yml (Docker version)
|
||||
|
||||
```yaml
|
||||
tunnel: a1b2c3d4-e5f6-7890-abcd-ef1234567890
|
||||
credentials-file: /etc/cloudflared/credentials.json
|
||||
|
||||
ingress:
|
||||
# Use container names when on same Docker network
|
||||
- hostname: jellyfin.vish.gg
|
||||
service: http://jellyfin:8096
|
||||
|
||||
- hostname: paperless.vish.gg
|
||||
service: http://paperless-ngx:8000
|
||||
|
||||
- hostname: grafana.vish.gg
|
||||
service: http://grafana:3000
|
||||
|
||||
# For services on the host network, use host IP
|
||||
- hostname: portainer.vish.gg
|
||||
service: http://192.168.0.200:9000
|
||||
|
||||
# Catch-all (required)
|
||||
- service: http_status:404
|
||||
```
|
||||
|
||||
### Deploy
|
||||
|
||||
```bash
|
||||
cd cloudflared
|
||||
docker-compose up -d
|
||||
|
||||
# Check logs
|
||||
docker logs -f cloudflared
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Adding Authentication (Cloudflare Access)
|
||||
|
||||
Protect services with Cloudflare Access (free for up to 50 users).
|
||||
|
||||
### Setup via Dashboard
|
||||
|
||||
1. Go to **Cloudflare Dashboard** → **Zero Trust** → **Access** → **Applications**
|
||||
|
||||
2. Click **Add an Application** → **Self-hosted**
|
||||
|
||||
3. Configure:
|
||||
- **Application name**: Grafana
|
||||
- **Session duration**: 24 hours
|
||||
- **Application domain**: `grafana.vish.gg`
|
||||
|
||||
4. Create a **Policy**:
|
||||
- **Policy name**: Allow Me
|
||||
- **Action**: Allow
|
||||
- **Include**:
|
||||
- Emails: `your-email@gmail.com`
|
||||
- Or Emails ending in: `@yourdomain.com`
|
||||
|
||||
5. Save the application
|
||||
|
||||
### How It Works
|
||||
|
||||
```
|
||||
Friend visits grafana.vish.gg
|
||||
→ Cloudflare Access login page
|
||||
→ Enters email
|
||||
→ Receives one-time PIN via email
|
||||
→ Enters PIN
|
||||
→ Authenticated → Sees Grafana
|
||||
```
|
||||
|
||||
### Authentication Options
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| One-time PIN | Email-based OTP (default) |
|
||||
| Google/GitHub/etc. | OAuth integration |
|
||||
| SAML/OIDC | Enterprise SSO |
|
||||
| Service Token | For API/automated access |
|
||||
| mTLS | Certificate-based |
|
||||
|
||||
---
|
||||
|
||||
## Common Use Cases
|
||||
|
||||
### Share Jellyfin for Movie Night
|
||||
|
||||
```bash
|
||||
# Quick tunnel (temporary)
|
||||
docker run --rm -it --network host cloudflare/cloudflared:latest tunnel --url http://localhost:8096
|
||||
|
||||
# Named tunnel (permanent)
|
||||
# Add to config.yml:
|
||||
# - hostname: watch.vish.gg
|
||||
# service: http://localhost:8096
|
||||
```
|
||||
|
||||
### Expose SSH Access
|
||||
|
||||
```yaml
|
||||
# In config.yml
|
||||
ingress:
|
||||
- hostname: ssh.vish.gg
|
||||
service: ssh://localhost:22
|
||||
```
|
||||
|
||||
Client connects via:
|
||||
```bash
|
||||
# Install cloudflared on client
|
||||
cloudflared access ssh --hostname ssh.vish.gg
|
||||
```
|
||||
|
||||
Or configure SSH config (`~/.ssh/config`):
|
||||
```
|
||||
Host ssh.vish.gg
|
||||
ProxyCommand cloudflared access ssh --hostname %h
|
||||
```
|
||||
|
||||
### Expose RDP/VNC
|
||||
|
||||
```yaml
|
||||
ingress:
|
||||
- hostname: rdp.vish.gg
|
||||
service: rdp://localhost:3389
|
||||
|
||||
- hostname: vnc.vish.gg
|
||||
service: tcp://localhost:5900
|
||||
```
|
||||
|
||||
### Multiple Services Example
|
||||
|
||||
```yaml
|
||||
tunnel: your-tunnel-uuid
|
||||
credentials-file: /etc/cloudflared/credentials.json
|
||||
|
||||
ingress:
|
||||
# Media
|
||||
- hostname: jellyfin.vish.gg
|
||||
service: http://jellyfin:8096
|
||||
- hostname: plex.vish.gg
|
||||
service: http://plex:32400
|
||||
|
||||
# Productivity
|
||||
- hostname: paperless.vish.gg
|
||||
service: http://paperless:8000
|
||||
- hostname: wiki.vish.gg
|
||||
service: http://dokuwiki:80
|
||||
|
||||
# Development
|
||||
- hostname: git.vish.gg
|
||||
service: http://gitea:3000
|
||||
- hostname: code.vish.gg
|
||||
service: http://code-server:8080
|
||||
|
||||
# Monitoring
|
||||
- hostname: grafana.vish.gg
|
||||
service: http://grafana:3000
|
||||
- hostname: uptime.vish.gg
|
||||
service: http://uptime-kuma:3001
|
||||
|
||||
# Catch-all
|
||||
- service: http_status:404
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Reference Commands
|
||||
|
||||
```bash
|
||||
# Authentication
|
||||
cloudflared tunnel login # Authenticate with Cloudflare
|
||||
cloudflared tunnel logout # Remove authentication
|
||||
|
||||
# Tunnel Management
|
||||
cloudflared tunnel list # List all tunnels
|
||||
cloudflared tunnel info <name> # Get tunnel details
|
||||
cloudflared tunnel create <name> # Create new tunnel
|
||||
cloudflared tunnel delete <name> # Delete tunnel (must stop first)
|
||||
|
||||
# DNS Routes
|
||||
cloudflared tunnel route dns <tunnel> <hostname> # Create DNS route
|
||||
cloudflared tunnel route dns list # List all routes
|
||||
|
||||
# Running Tunnels
|
||||
cloudflared tunnel run <name> # Run tunnel
|
||||
cloudflared tunnel --config config.yml run # Run with config
|
||||
cloudflared tunnel ingress validate # Validate config
|
||||
|
||||
# Debugging
|
||||
cloudflared tunnel --loglevel debug run <name> # Debug logging
|
||||
cloudflared tunnel info <name> # Tunnel info
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Tunnel won't start
|
||||
|
||||
```bash
|
||||
# Check config syntax
|
||||
cloudflared tunnel ingress validate
|
||||
|
||||
# Run with debug logging
|
||||
cloudflared tunnel --loglevel debug run homelab
|
||||
```
|
||||
|
||||
### DNS not resolving
|
||||
|
||||
```bash
|
||||
# Verify DNS route exists
|
||||
cloudflared tunnel route dns list
|
||||
|
||||
# Check CNAME in Cloudflare dashboard
|
||||
# Should point to: <UUID>.cfargotunnel.com
|
||||
```
|
||||
|
||||
### Service unreachable
|
||||
|
||||
1. **Check service is running locally:**
|
||||
```bash
|
||||
curl http://localhost:8080
|
||||
```
|
||||
|
||||
2. **Check Docker networking:**
|
||||
- If using container names, ensure same Docker network
|
||||
- If using localhost, use `--network host` or host IP
|
||||
|
||||
3. **Check ingress rules order:**
|
||||
- More specific rules should come before catch-all
|
||||
- Catch-all (`http_status:404`) must be last
|
||||
|
||||
### Certificate errors
|
||||
|
||||
```bash
|
||||
# Re-authenticate
|
||||
cloudflared tunnel login
|
||||
|
||||
# Check cert exists
|
||||
ls -la ~/.cloudflared/cert.pem
|
||||
```
|
||||
|
||||
### View tunnel metrics
|
||||
|
||||
Cloudflare provides metrics at:
|
||||
- Dashboard → Zero Trust → Tunnels → Select tunnel → Metrics
|
||||
|
||||
---
|
||||
|
||||
## Quick vs Named Tunnel Comparison
|
||||
|
||||
| Feature | Quick Tunnel | Named Tunnel |
|
||||
|---------|--------------|--------------|
|
||||
| URL | `random.trycloudflare.com` | `app.yourdomain.com` |
|
||||
| Cloudflare Account | ❌ Not needed | ✅ Required |
|
||||
| Persistence | ❌ Dies with process | ✅ Permanent |
|
||||
| Custom domain | ❌ No | ✅ Yes |
|
||||
| Multiple services | ❌ One per tunnel | ✅ Many via ingress |
|
||||
| Authentication | ❌ None | ✅ Cloudflare Access |
|
||||
| Setup time | 10 seconds | 10 minutes |
|
||||
| Best for | Quick demos | Production |
|
||||
|
||||
---
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
1. **Always use HTTPS** - Cloudflare handles this automatically
|
||||
2. **Enable Cloudflare Access** for sensitive services
|
||||
3. **Use service tokens** for automated/API access
|
||||
4. **Monitor tunnel logs** for suspicious activity
|
||||
5. **Rotate credentials** periodically
|
||||
6. **Limit ingress rules** to only what's needed
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Cloudflare Tunnel Docs](https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/)
|
||||
- [Cloudflare Access Docs](https://developers.cloudflare.com/cloudflare-one/policies/access/)
|
||||
- [Zero Trust Dashboard](https://one.dash.cloudflare.com/)
|
||||
|
||||
---
|
||||
|
||||
*Last Updated: 2026-01-29*
|
||||
488
docs/infrastructure/comprehensive-travel-setup.md
Normal file
488
docs/infrastructure/comprehensive-travel-setup.md
Normal file
@@ -0,0 +1,488 @@
|
||||
# 🌍 Comprehensive Travel Connectivity Setup
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide combines all travel networking components into a complete mobile homelab access solution, featuring the MSI Prestige 13 AI Plus laptop, GL.iNet travel routers, remote KVM, and Tailscale mesh networking.
|
||||
|
||||
---
|
||||
|
||||
## 🎒 Complete Travel Kit
|
||||
|
||||
### **Primary Hardware Stack**
|
||||
```
|
||||
MSI Prestige 13 AI Plus (Travel Laptop)
|
||||
├── GL.iNet Slate 7 (GL-BE3600) - Primary Wi-Fi 7 Router
|
||||
├── GL.iNet Beryl AX (GL-MT3000) - Backup Wi-Fi 6 Router
|
||||
├── GL.iNet Mango (GL-MT300N-V2) - Emergency Router
|
||||
├── GL.iNet S200 - IoT Gateway (optional)
|
||||
└── GL.iNet Comet (GL-RM1) - Remote KVM
|
||||
```
|
||||
|
||||
### **Connectivity Layers**
|
||||
1. **Physical Layer**: GL.iNet routers for internet access
|
||||
2. **Security Layer**: Tailscale mesh VPN for encrypted tunnels
|
||||
3. **Application Layer**: Full homelab service access
|
||||
4. **Management Layer**: Remote KVM for emergency server access
|
||||
|
||||
---
|
||||
|
||||
## 💻 MSI Prestige 13 AI Plus - Travel Workstation
|
||||
|
||||
### **Why This Laptop for Travel?**
|
||||
- **Ultra-Portable**: 990g weight, 13.3" form factor
|
||||
- **AI Acceleration**: Intel NPU for AI workloads (47 TOPS)
|
||||
- **Efficient Performance**: Intel Arc Graphics + Core Ultra 7 258V
|
||||
- **Premium Display**: OLED 2.8K touch-enabled for creative work
|
||||
- **Wi-Fi 7**: Latest wireless standard for maximum speed
|
||||
- **All-Day Battery**: 75Wh with fast charging
|
||||
- **Tailscale IP**: 100.80.0.26 (msi.tail.vish.gg)
|
||||
|
||||
### **Travel-Optimized Configuration**
|
||||
```bash
|
||||
# Windows 11 Pro Setup
|
||||
- WSL2 for Linux development environment
|
||||
- Docker Desktop for container development
|
||||
- Tailscale client for homelab access
|
||||
- GL.iNet mobile app for router management
|
||||
- Remote desktop tools for KVM access
|
||||
|
||||
# Development Environment
|
||||
- Visual Studio Code with remote development
|
||||
- Git with SSH keys for GitLab access
|
||||
- Node.js, Python, Docker for development
|
||||
- VPN clients for secure connectivity
|
||||
```
|
||||
|
||||
### **Power Management for Travel**
|
||||
- **Performance Mode**: Full power for intensive tasks
|
||||
- **Balanced Mode**: Optimal battery life for general use
|
||||
- **Battery Saver**: Extended operation when charging unavailable
|
||||
- **Fast Charging**: Quick top-ups during layovers
|
||||
|
||||
---
|
||||
|
||||
## 🌐 GL.iNet Travel Router Strategy
|
||||
|
||||
### **Multi-Router Redundancy**
|
||||
Each router serves a specific purpose in the travel connectivity stack:
|
||||
|
||||
#### **GL-BE3600 (Primary) - Wi-Fi 7 Performance**
|
||||
```bash
|
||||
# Use Cases:
|
||||
- High-bandwidth work (video calls, large file transfers)
|
||||
- Content creation and media streaming
|
||||
- Development with rapid Docker image pulls
|
||||
- AI/ML workloads requiring fast data access
|
||||
|
||||
# Configuration:
|
||||
- Primary VPN tunnel to homelab
|
||||
- QoS prioritization for work traffic
|
||||
- Guest network for untrusted devices
|
||||
- Captive portal bypass for hotel Wi-Fi
|
||||
```
|
||||
|
||||
#### **GL-MT3000 (Backup) - Wi-Fi 6 Reliability**
|
||||
```bash
|
||||
# Use Cases:
|
||||
- Backup connectivity when primary fails
|
||||
- Secondary location setup (hotel room + lobby)
|
||||
- Load balancing for multiple devices
|
||||
- Dedicated IoT device connectivity
|
||||
|
||||
# Configuration:
|
||||
- Secondary VPN tunnel for redundancy
|
||||
- Different SSID for easy identification
|
||||
- Optimized for battery operation
|
||||
- Simplified configuration for quick setup
|
||||
```
|
||||
|
||||
#### **GL-MT300N-V2 (Emergency) - Basic Connectivity**
|
||||
```bash
|
||||
# Use Cases:
|
||||
- Emergency internet access
|
||||
- Ultra-portable backup (credit card size)
|
||||
- Legacy device connectivity
|
||||
- Power-constrained environments
|
||||
|
||||
# Configuration:
|
||||
- Basic VPN tunnel
|
||||
- Minimal power consumption
|
||||
- Simple WPA2 security
|
||||
- Emergency contact access only
|
||||
```
|
||||
|
||||
#### **GL-S200 (IoT) - Smart Device Management**
|
||||
```bash
|
||||
# Use Cases:
|
||||
- Travel IoT device management
|
||||
- Smart home setup in extended stays
|
||||
- Development and testing of IoT protocols
|
||||
- Portable smart device hub
|
||||
|
||||
# Configuration:
|
||||
- Thread Border Router
|
||||
- Zigbee coordinator
|
||||
- Matter over Thread/Wi-Fi
|
||||
- Isolated IoT network
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔐 Tailscale Integration Strategy
|
||||
|
||||
### **Split-Brain DNS Configuration**
|
||||
Based on your production setup (`tail.vish.gg`):
|
||||
|
||||
```bash
|
||||
# Nameserver Hierarchy:
|
||||
1. MagicDNS (100.100.100.100) - Tailscale devices
|
||||
2. vish.local (192.168.0.250) - Local network when home
|
||||
3. Homelab DNS (100.103.48.78, 100.72.55.21) - Custom resolution
|
||||
4. Public DNS - Fallback for internet queries
|
||||
|
||||
# Search Domains:
|
||||
- tail.vish.gg (automatic Tailscale resolution)
|
||||
- vish.local (local network resolution)
|
||||
```
|
||||
|
||||
### **Service Access Patterns**
|
||||
Based on current Tailscale network (tail.vish.gg):
|
||||
|
||||
```bash
|
||||
# Active Infrastructure Hosts:
|
||||
atlantis.tail.vish.gg # 100.83.230.112 - Primary NAS & Media
|
||||
calypso.tail.vish.gg # 100.103.48.78 - Development & Caching
|
||||
setillo.tail.vish.gg # 100.125.0.20 - Monitoring & Network
|
||||
homelab.tail.vish.gg # 100.67.40.126 - Experimentation VM
|
||||
pi-5.tail.vish.gg # 100.77.151.40 - Edge Computing
|
||||
pve.tail.vish.gg # 100.87.12.28 - Proxmox Virtualization
|
||||
truenas-scale.tail.vish.gg # 100.75.252.64 - Secondary Storage
|
||||
shinku-ryuu.tail.vish.gg # 100.98.93.15 - Primary Workstation
|
||||
vish-concord-nuc.tail.vish.gg # 100.72.55.21 - Family Network Bridge
|
||||
vmi2076105.tail.vish.gg # 100.99.156.20 - Chicago Remote VM
|
||||
|
||||
# Travel & Mobile Devices:
|
||||
msi.tail.vish.gg # 100.80.0.26 - MSI Prestige 13 AI Plus
|
||||
iphone16.tail.vish.gg # 100.79.252.108 - iPhone 16 Pro Max
|
||||
ipad-pro-12-9-6th-gen-wificellular.tail.vish.gg # 100.68.71.48
|
||||
gl-be3600.tail.vish.gg # 100.105.59.123 - Primary Travel Router
|
||||
gl-mt3000.tail.vish.gg # 100.126.243.15 - Backup Travel Router
|
||||
glkvm.tail.vish.gg # 100.64.137.1 - Remote KVM
|
||||
|
||||
# Service Examples:
|
||||
# Development: Access GitLab via atlantis.tail.vish.gg:3000
|
||||
# Media: Plex via atlantis.tail.vish.gg:32400
|
||||
# Monitoring: Grafana via atlantis.tail.vish.gg:7099
|
||||
# Passwords: Vaultwarden via atlantis.tail.vish.gg:8080
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Remote Management with GL-RM1 KVM
|
||||
|
||||
### **Emergency Server Access**
|
||||
The GL-RM1 provides out-of-band management for critical situations:
|
||||
|
||||
```bash
|
||||
# Physical Setup:
|
||||
Server → GL-RM1 KVM → Network → Tailscale → Travel Laptop
|
||||
|
||||
# Access Methods:
|
||||
1. Web Interface: https://gl-rm1.tail.vish.gg
|
||||
2. Direct IP: https://100.xxx.xxx.xxx (Tailscale IP)
|
||||
3. Local Access: https://192.168.8.100 (when on same network)
|
||||
```
|
||||
|
||||
### **Use Case Scenarios**
|
||||
- **BIOS Access**: Configure hardware settings remotely
|
||||
- **OS Installation**: Install/reinstall operating systems
|
||||
- **Network Troubleshooting**: Fix connectivity issues
|
||||
- **Emergency Recovery**: Access systems when SSH fails
|
||||
- **Hardware Diagnostics**: Check system health and status
|
||||
|
||||
---
|
||||
|
||||
## 📱 Mobile Device Integration
|
||||
|
||||
### **Seamless Multi-Device Experience**
|
||||
```bash
|
||||
# Device Ecosystem:
|
||||
MSI Prestige 13 AI Plus (Primary workstation)
|
||||
├── iPhone 16 Pro Max (Communication, monitoring)
|
||||
├── iPad Pro 12.9" 6th Gen (Creative work, presentations)
|
||||
├── GL.iNet Routers (Network infrastructure)
|
||||
└── GL-RM1 KVM (Emergency management)
|
||||
|
||||
# Tailscale Mesh:
|
||||
- All devices connected to same Tailscale network
|
||||
- Consistent service access across all platforms
|
||||
- Automatic failover between network connections
|
||||
- Synchronized settings and configurations
|
||||
```
|
||||
|
||||
### **Cross-Platform Workflows**
|
||||
- **Development**: Code on laptop, test on mobile devices
|
||||
- **Media**: Stream from homelab to any device
|
||||
- **Productivity**: Access documents from any platform
|
||||
- **Monitoring**: Check homelab status from mobile devices
|
||||
- **Security**: Vaultwarden access from all devices
|
||||
|
||||
---
|
||||
|
||||
## 🗺️ Travel Scenarios & Configurations
|
||||
|
||||
### **Business Travel (1-3 days)**
|
||||
```bash
|
||||
# Minimal Kit:
|
||||
- MSI Prestige 13 AI Plus
|
||||
- GL-BE3600 (primary router)
|
||||
- GL-MT300N-V2 (emergency backup)
|
||||
- Essential cables and chargers
|
||||
|
||||
# Configuration:
|
||||
- Single high-performance router
|
||||
- Full homelab access via Tailscale
|
||||
- Emergency backup for critical connectivity
|
||||
- Optimized for hotel/conference environments
|
||||
```
|
||||
|
||||
### **Extended Travel (1-4 weeks)**
|
||||
```bash
|
||||
# Complete Kit:
|
||||
- MSI Prestige 13 AI Plus
|
||||
- GL-BE3600 + GL-MT3000 (redundant routers)
|
||||
- GL-S200 (IoT gateway for smart devices)
|
||||
- GL-RM1 KVM (remote server management)
|
||||
- Full cable kit and backup power
|
||||
|
||||
# Configuration:
|
||||
- Redundant connectivity options
|
||||
- IoT device management capability
|
||||
- Remote server troubleshooting
|
||||
- Extended stay optimizations
|
||||
```
|
||||
|
||||
### **Digital Nomad (Months)**
|
||||
```bash
|
||||
# Full Infrastructure:
|
||||
- Complete GL.iNet router collection
|
||||
- Multiple backup power solutions
|
||||
- Comprehensive cable and adapter kit
|
||||
- Local SIM cards and cellular backup
|
||||
- Portable monitor and peripherals
|
||||
|
||||
# Configuration:
|
||||
- Location-specific optimizations
|
||||
- Local ISP integration
|
||||
- Cultural and regulatory compliance
|
||||
- Long-term reliability focus
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Setup & Configuration Workflows
|
||||
|
||||
### **Pre-Travel Checklist**
|
||||
```bash
|
||||
# Hardware Preparation:
|
||||
□ All devices charged and firmware updated
|
||||
□ Tailscale clients installed and authenticated
|
||||
□ VPN configurations tested and verified
|
||||
□ Backup power solutions packed
|
||||
□ Essential cables and adapters included
|
||||
|
||||
# Software Preparation:
|
||||
□ Development environments synchronized
|
||||
□ Password manager updated and accessible
|
||||
□ Important documents backed up locally
|
||||
□ Emergency contact information accessible
|
||||
□ Homelab monitoring dashboards bookmarked
|
||||
|
||||
# Network Preparation:
|
||||
□ Router configurations backed up
|
||||
□ Emergency access credentials secured
|
||||
□ Failover procedures documented
|
||||
□ Local emergency contacts identified
|
||||
□ ISP and connectivity research completed
|
||||
```
|
||||
|
||||
### **On-Location Setup Procedure**
|
||||
```bash
|
||||
# Step 1: Establish Basic Connectivity
|
||||
1. Connect GL-BE3600 to local internet
|
||||
2. Verify internet access and speed
|
||||
3. Test Tailscale connection to homelab
|
||||
4. Confirm DNS resolution working
|
||||
|
||||
# Step 2: Secure Network Setup
|
||||
1. Configure guest network for untrusted devices
|
||||
2. Set up QoS rules for work traffic
|
||||
3. Enable firewall and security features
|
||||
4. Test VPN tunnel stability
|
||||
|
||||
# Step 3: Device Integration
|
||||
1. Connect laptop to secure network
|
||||
2. Verify all homelab services accessible
|
||||
3. Test backup router connectivity
|
||||
4. Configure IoT devices if needed
|
||||
|
||||
# Step 4: Monitoring & Maintenance
|
||||
1. Set up network monitoring
|
||||
2. Configure automatic failover
|
||||
3. Test emergency procedures
|
||||
4. Document local network details
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Performance Optimization
|
||||
|
||||
### **Network Performance Tuning**
|
||||
```bash
|
||||
# Router Optimization:
|
||||
- Channel selection for minimal interference
|
||||
- QoS configuration for work traffic priority
|
||||
- Bandwidth allocation for critical services
|
||||
- Latency optimization for real-time applications
|
||||
|
||||
# Tailscale Optimization:
|
||||
- Exit node selection for optimal routing
|
||||
- Subnet routing for efficient access
|
||||
- DNS configuration for fast resolution
|
||||
- Connection monitoring and alerting
|
||||
```
|
||||
|
||||
### **Power Management**
|
||||
```bash
|
||||
# Laptop Power Optimization:
|
||||
- Performance profiles for different scenarios
|
||||
- Battery conservation during travel
|
||||
- Fast charging strategies
|
||||
- Power bank compatibility
|
||||
|
||||
# Router Power Management:
|
||||
- Battery operation for portable routers
|
||||
- Power consumption monitoring
|
||||
- Charging schedules and rotation
|
||||
- Emergency power procedures
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🛡️ Security Best Practices
|
||||
|
||||
### **Multi-Layer Security**
|
||||
```bash
|
||||
# Network Security:
|
||||
- WPA3 encryption on all networks
|
||||
- Guest network isolation
|
||||
- Firewall rules and access control
|
||||
- Regular security updates
|
||||
|
||||
# VPN Security:
|
||||
- Strong encryption (WireGuard/OpenVPN)
|
||||
- Kill switch functionality
|
||||
- DNS leak protection
|
||||
- Connection monitoring
|
||||
|
||||
# Device Security:
|
||||
- Full disk encryption
|
||||
- Strong authentication (2FA)
|
||||
- Regular security updates
|
||||
- Endpoint protection
|
||||
```
|
||||
|
||||
### **Emergency Security Procedures**
|
||||
```bash
|
||||
# Compromise Response:
|
||||
1. Disconnect from network immediately
|
||||
2. Switch to cellular/backup connectivity
|
||||
3. Change critical passwords
|
||||
4. Notify homelab of potential breach
|
||||
5. Implement emergency access procedures
|
||||
|
||||
# Recovery Procedures:
|
||||
1. Factory reset compromised devices
|
||||
2. Restore from secure backups
|
||||
3. Re-establish secure connections
|
||||
4. Verify system integrity
|
||||
5. Document incident for future prevention
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Troubleshooting Guide
|
||||
|
||||
### **Common Issues & Solutions**
|
||||
```bash
|
||||
# Connectivity Problems:
|
||||
- Router not connecting to internet
|
||||
- Tailscale tunnel not establishing
|
||||
- DNS resolution failures
|
||||
- Slow network performance
|
||||
|
||||
# Solutions:
|
||||
- Check physical connections and power
|
||||
- Verify ISP settings and credentials
|
||||
- Test with different routers/configurations
|
||||
- Contact local ISP support if needed
|
||||
```
|
||||
|
||||
### **Emergency Procedures**
|
||||
```bash
|
||||
# Complete Network Failure:
|
||||
1. Switch to cellular hotspot
|
||||
2. Use emergency router (GL-MT300N-V2)
|
||||
3. Access homelab via Tailscale mobile app
|
||||
4. Use GL-RM1 KVM for server management
|
||||
5. Contact local technical support
|
||||
|
||||
# Hardware Failure:
|
||||
1. Identify failed component
|
||||
2. Switch to backup hardware
|
||||
3. Restore configuration from backup
|
||||
4. Test all critical functions
|
||||
5. Arrange replacement if needed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Advanced Use Cases
|
||||
|
||||
### **Content Creation on the Road**
|
||||
- **4K Video Editing**: High-performance laptop with OLED display
|
||||
- **Large File Transfers**: Wi-Fi 7 for rapid upload/download
|
||||
- **Cloud Storage Sync**: Seamless access to homelab storage
|
||||
- **Collaboration**: Real-time sharing via homelab services
|
||||
|
||||
### **Remote Development**
|
||||
- **Full Dev Environment**: WSL2 + Docker + VS Code
|
||||
- **Git Operations**: Direct GitLab access via Tailscale
|
||||
- **Container Development**: Local Docker with homelab registry
|
||||
- **Testing & Deployment**: Remote access to staging environments
|
||||
|
||||
### **AI/ML Workloads**
|
||||
- **Local Processing**: Intel NPU for edge AI tasks
|
||||
- **Dataset Access**: High-speed download from homelab
|
||||
- **Model Training**: Hybrid local/remote processing
|
||||
- **Result Sharing**: Upload models back to homelab
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Integration Points
|
||||
|
||||
### **Homelab Service Integration**
|
||||
- **[Tailscale Setup](tailscale-setup-guide.md)**: Core VPN configuration
|
||||
- **[GL.iNet Devices](glinet-travel-networking.md)**: Detailed router setup
|
||||
- **[Mobile Devices](mobile-device-setup.md)**: Phone and tablet integration
|
||||
- **[Laptop Setup](laptop-travel-setup.md)**: Detailed laptop configuration
|
||||
|
||||
### **Infrastructure Components**
|
||||
- **[Network Architecture](networking.md)**: Overall network design
|
||||
- **[Host Overview](hosts.md)**: All system specifications
|
||||
- **[Security Model](../admin/security.md)**: Security implementation
|
||||
- **[Monitoring Setup](../admin/monitoring.md)**: System monitoring
|
||||
|
||||
---
|
||||
|
||||
*This comprehensive travel setup provides enterprise-level connectivity, security, and functionality while maintaining the portability and flexibility needed for modern mobile work and digital nomad lifestyles.*
|
||||
261
docs/infrastructure/docker/monitoring/README.md
Normal file
261
docs/infrastructure/docker/monitoring/README.md
Normal file
@@ -0,0 +1,261 @@
|
||||
# 📊 Monitoring Infrastructure
|
||||
|
||||
*Docker-based monitoring stack for comprehensive homelab observability*
|
||||
|
||||
## Overview
|
||||
This directory contains the Docker-based monitoring infrastructure that provides comprehensive observability across the entire homelab environment.
|
||||
|
||||
## Architecture
|
||||
|
||||
### Core Components
|
||||
- **Prometheus** - Metrics collection and storage
|
||||
- **Grafana** - Visualization and dashboards
|
||||
- **AlertManager** - Alert routing and management
|
||||
- **Node Exporter** - System metrics collection
|
||||
- **cAdvisor** - Container metrics collection
|
||||
|
||||
### Deployment Structure
|
||||
```
|
||||
monitoring/
|
||||
├── prometheus/
|
||||
│ ├── prometheus.yml # Main configuration
|
||||
│ ├── alert-rules.yml # Alert definitions
|
||||
│ └── targets/ # Service discovery configs
|
||||
├── grafana/
|
||||
│ ├── provisioning/ # Dashboard and datasource configs
|
||||
│ └── dashboards/ # JSON dashboard definitions
|
||||
├── alertmanager/
|
||||
│ └── alertmanager.yml # Alert routing configuration
|
||||
└── docker-compose.yml # Complete monitoring stack
|
||||
```
|
||||
|
||||
## Service Endpoints
|
||||
|
||||
### Internal Access
|
||||
- **Prometheus**: `http://prometheus:9090`
|
||||
- **Grafana**: `http://grafana:3000`
|
||||
- **AlertManager**: `http://alertmanager:9093`
|
||||
|
||||
### External Access (via Nginx Proxy Manager)
|
||||
- **Grafana**: `https://grafana.vish.gg`
|
||||
- **Prometheus**: `https://prometheus.vish.gg` (admin only)
|
||||
- **AlertManager**: `https://alerts.vish.gg` (admin only)
|
||||
|
||||
## Metrics Collection
|
||||
|
||||
### System Metrics
|
||||
- **Node Exporter**: CPU, memory, disk, network statistics
|
||||
- **SNMP Exporter**: Network equipment monitoring
|
||||
- **Blackbox Exporter**: Service availability checks
|
||||
|
||||
### Container Metrics
|
||||
- **cAdvisor**: Docker container resource usage
|
||||
- **Portainer metrics**: Container orchestration metrics
|
||||
- **Docker daemon metrics**: Docker engine statistics
|
||||
|
||||
### Application Metrics
|
||||
- **Plex**: Media server performance metrics
|
||||
- **Nginx**: Web server access and performance
|
||||
- **Database metrics**: PostgreSQL, Redis performance
|
||||
|
||||
### Custom Metrics
|
||||
- **Backup status**: Success/failure rates
|
||||
- **Storage usage**: Disk space across all hosts
|
||||
- **Network performance**: Bandwidth and latency
|
||||
|
||||
## Dashboard Categories
|
||||
|
||||
### Infrastructure Dashboards
|
||||
- **Host Overview**: System resource utilization
|
||||
- **Network Performance**: Bandwidth and connectivity
|
||||
- **Storage Monitoring**: Disk usage and health
|
||||
- **Docker Containers**: Container resource usage
|
||||
|
||||
### Service Dashboards
|
||||
- **Media Services**: Plex, Arr suite performance
|
||||
- **Web Services**: Nginx, application response times
|
||||
- **Database Performance**: Query performance and connections
|
||||
- **Backup Monitoring**: Backup job status and trends
|
||||
|
||||
### Security Dashboards
|
||||
- **Authentication Events**: Login attempts and failures
|
||||
- **Network Security**: Firewall logs and intrusion attempts
|
||||
- **Certificate Monitoring**: SSL certificate expiration
|
||||
- **Vulnerability Scanning**: Security scan results
|
||||
|
||||
## Alert Configuration
|
||||
|
||||
### Critical Alerts
|
||||
- **Host down**: System unreachable
|
||||
- **High resource usage**: CPU/Memory > 90%
|
||||
- **Disk space critical**: < 10% free space
|
||||
- **Service unavailable**: Key services down
|
||||
|
||||
### Warning Alerts
|
||||
- **High resource usage**: CPU/Memory > 80%
|
||||
- **Disk space low**: < 20% free space
|
||||
- **Certificate expiring**: < 30 days to expiration
|
||||
- **Backup failures**: Failed backup jobs
|
||||
|
||||
### Info Alerts
|
||||
- **System updates**: Available updates
|
||||
- **Maintenance windows**: Scheduled maintenance
|
||||
- **Performance trends**: Unusual patterns
|
||||
- **Capacity planning**: Resource growth trends
|
||||
|
||||
## Data Retention
|
||||
|
||||
### Prometheus Retention
|
||||
- **Raw metrics**: 15 days high resolution
|
||||
- **Downsampled**: 90 days medium resolution
|
||||
- **Long-term**: 1 year low resolution
|
||||
|
||||
### Grafana Data
|
||||
- **Dashboards**: Version controlled in Git
|
||||
- **User preferences**: Backed up weekly
|
||||
- **Annotations**: Retained for 1 year
|
||||
|
||||
### Log Retention
|
||||
- **Application logs**: 30 days
|
||||
- **System logs**: 90 days
|
||||
- **Audit logs**: 1 year
|
||||
- **Security logs**: 2 years
|
||||
|
||||
## Backup and Recovery
|
||||
|
||||
### Configuration Backup
|
||||
```bash
|
||||
# Backup Prometheus configuration
|
||||
docker exec prometheus tar -czf /backup/prometheus-config-$(date +%Y%m%d).tar.gz /etc/prometheus/
|
||||
|
||||
# Backup Grafana dashboards
|
||||
docker exec grafana tar -czf /backup/grafana-dashboards-$(date +%Y%m%d).tar.gz /var/lib/grafana/
|
||||
```
|
||||
|
||||
### Data Backup
|
||||
```bash
|
||||
# Backup Prometheus data
|
||||
docker exec prometheus tar -czf /backup/prometheus-data-$(date +%Y%m%d).tar.gz /prometheus/
|
||||
|
||||
# Backup Grafana database
|
||||
docker exec grafana sqlite3 /var/lib/grafana/grafana.db ".backup /backup/grafana-$(date +%Y%m%d).db"
|
||||
```
|
||||
|
||||
### Disaster Recovery
|
||||
1. **Restore configurations** from backup
|
||||
2. **Redeploy containers** with restored configs
|
||||
3. **Import historical data** if needed
|
||||
4. **Verify alert routing** and dashboard functionality
|
||||
|
||||
## Performance Optimization
|
||||
|
||||
### Prometheus Optimization
|
||||
- **Recording rules**: Pre-calculate expensive queries
|
||||
- **Metric relabeling**: Reduce cardinality
|
||||
- **Storage optimization**: Efficient time series storage
|
||||
- **Query optimization**: Efficient PromQL queries
|
||||
|
||||
### Grafana Optimization
|
||||
- **Dashboard caching**: Reduce query load
|
||||
- **Panel optimization**: Efficient visualizations
|
||||
- **User management**: Role-based access control
|
||||
- **Plugin management**: Only necessary plugins
|
||||
|
||||
### Network Optimization
|
||||
- **Local metrics**: Minimize network traffic
|
||||
- **Compression**: Enable metric compression
|
||||
- **Batching**: Batch metric collection
|
||||
- **Filtering**: Collect only necessary metrics
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### High Memory Usage
|
||||
```bash
|
||||
# Check Prometheus memory usage
|
||||
docker stats prometheus
|
||||
|
||||
# Reduce retention period
|
||||
# Edit prometheus.yml: --storage.tsdb.retention.time=7d
|
||||
```
|
||||
|
||||
#### Missing Metrics
|
||||
```bash
|
||||
# Check target status
|
||||
curl http://prometheus:9090/api/v1/targets
|
||||
|
||||
# Verify service discovery
|
||||
curl http://prometheus:9090/api/v1/label/__name__/values
|
||||
```
|
||||
|
||||
#### Dashboard Loading Issues
|
||||
```bash
|
||||
# Check Grafana logs
|
||||
docker logs grafana
|
||||
|
||||
# Verify datasource connectivity
|
||||
curl http://grafana:3000/api/datasources/proxy/1/api/v1/query?query=up
|
||||
```
|
||||
|
||||
### Monitoring Health Checks
|
||||
```bash
|
||||
# Prometheus health
|
||||
curl http://prometheus:9090/-/healthy
|
||||
|
||||
# Grafana health
|
||||
curl http://grafana:3000/api/health
|
||||
|
||||
# AlertManager health
|
||||
curl http://alertmanager:9093/-/healthy
|
||||
```
|
||||
|
||||
## Security Configuration
|
||||
|
||||
### Authentication
|
||||
- **Grafana**: OAuth integration with Authentik
|
||||
- **Prometheus**: Basic auth via reverse proxy
|
||||
- **AlertManager**: Basic auth via reverse proxy
|
||||
|
||||
### Network Security
|
||||
- **Internal network**: Isolated Docker network
|
||||
- **Reverse proxy**: Nginx Proxy Manager
|
||||
- **SSL termination**: Let's Encrypt certificates
|
||||
- **Access control**: IP-based restrictions
|
||||
|
||||
### Data Security
|
||||
- **Encryption at rest**: Encrypted storage volumes
|
||||
- **Encryption in transit**: TLS for all communications
|
||||
- **Access logging**: Comprehensive audit trails
|
||||
- **Regular updates**: Automated security updates
|
||||
|
||||
## Integration Points
|
||||
|
||||
### External Systems
|
||||
- **NTFY**: Push notifications for alerts
|
||||
- **Email**: Backup notification channel
|
||||
- **Slack**: Team notifications (optional)
|
||||
- **PagerDuty**: Escalation for critical alerts
|
||||
|
||||
### Automation
|
||||
- **Ansible**: Configuration management
|
||||
- **GitOps**: Version-controlled configurations
|
||||
- **CI/CD**: Automated deployment pipeline
|
||||
- **Backup automation**: Scheduled backups
|
||||
|
||||
## Future Enhancements
|
||||
|
||||
### Planned Features
|
||||
- **Log aggregation**: Centralized log management
|
||||
- **Distributed tracing**: Application tracing
|
||||
- **Synthetic monitoring**: Proactive service testing
|
||||
- **Machine learning**: Anomaly detection
|
||||
|
||||
### Scaling Considerations
|
||||
- **High availability**: Multi-instance deployment
|
||||
- **Load balancing**: Distribute query load
|
||||
- **Federation**: Multi-cluster monitoring
|
||||
- **Storage scaling**: Efficient long-term storage
|
||||
|
||||
---
|
||||
**Status**: ✅ Comprehensive monitoring infrastructure operational across all homelab systems
|
||||
122
docs/infrastructure/domain-migration-synology.md
Normal file
122
docs/infrastructure/domain-migration-synology.md
Normal file
@@ -0,0 +1,122 @@
|
||||
# Synology Domain Migration Guide
|
||||
|
||||
Migrating from `*.vishconcord.synology.me` to `*.vish.gg` domains.
|
||||
|
||||
## Why Migrate?
|
||||
|
||||
- **Consistency**: All services under your own domain
|
||||
- **Control**: Full DNS control via Cloudflare
|
||||
- **Security**: Can proxy through Cloudflare (orange cloud)
|
||||
- **Professional**: Cleaner URLs for sharing
|
||||
- **SSO**: Easier Authentik integration with single domain
|
||||
|
||||
## Current → New Domain Mapping
|
||||
|
||||
### Calypso Services (Stay at Location A)
|
||||
|
||||
| Current | New | Service | Expose? |
|
||||
|---------|-----|---------|---------|
|
||||
| `sf.vishconcord.synology.me` | `sf.vish.gg` | Seafile | Yes - sharing |
|
||||
| `dav.vishconcord.synology.me` | `dav.vish.gg` | Seafile WebDAV | Internal |
|
||||
| `actual.vishconcord.synology.me` | `actual.vish.gg` | Actual Budget | Internal |
|
||||
| `paperlessngx.vishconcord.synology.me` | `docs.vish.gg` | Paperless-NGX | Internal |
|
||||
| `ost.vishconcord.synology.me` | `ost.vish.gg` | OST | Internal |
|
||||
| `retro.vishconcord.synology.me` | `retro.vish.gg` | Retro site | Maybe |
|
||||
| `rackula.vishconcord.synology.me` | - | Rackula (broken) | Remove |
|
||||
|
||||
### Atlantis Services (Move to Location B)
|
||||
|
||||
| Current | New | Service | Expose? |
|
||||
|---------|-----|---------|---------|
|
||||
| `ollama.vishconcord.synology.me` | `ollama.vish.gg` | Ollama AI | Internal |
|
||||
| `ssh.vishconcord.synology.me` | - | Termix SSH | Internal/VPN |
|
||||
| `rxv4access.vishconcord.synology.me` | - | RXV4 Access | Internal |
|
||||
| `rxv4download.vishconcord.synology.me` | - | RXV4 Download | Internal |
|
||||
|
||||
## Migration Steps
|
||||
|
||||
### Step 1: Create DNS Records
|
||||
|
||||
For each new domain, create an A record in Cloudflare:
|
||||
|
||||
```bash
|
||||
# Example: sf.vish.gg
|
||||
curl -X POST "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records" \
|
||||
-H "Authorization: Bearer TOKEN" \
|
||||
-H "Content-Type: application/json" \
|
||||
--data '{
|
||||
"type": "A",
|
||||
"name": "sf.vish.gg",
|
||||
"content": "YOUR_WAN_IP",
|
||||
"ttl": 1,
|
||||
"proxied": true
|
||||
}'
|
||||
```
|
||||
|
||||
### Step 2: Update Synology Reverse Proxy
|
||||
|
||||
For each service, add a new reverse proxy entry with the new domain:
|
||||
|
||||
1. DSM → Control Panel → Login Portal → Advanced → Reverse Proxy
|
||||
2. Create new entry with same backend, new domain
|
||||
3. Assign SSL certificate (Cloudflare origin cert)
|
||||
|
||||
### Step 3: Update SSL Certificates
|
||||
|
||||
The existing `*.vish.gg` Cloudflare origin certificate should cover new subdomains.
|
||||
|
||||
If needed, generate a new certificate covering:
|
||||
- `*.vish.gg`
|
||||
- `vish.gg`
|
||||
|
||||
### Step 4: Test New Domains
|
||||
|
||||
Test each new domain before removing old ones.
|
||||
|
||||
### Step 5: Remove Old Entries
|
||||
|
||||
Once confirmed working, remove the `*.synology.me` reverse proxy entries.
|
||||
|
||||
## Authentik Protection
|
||||
|
||||
### Services to Protect with SSO
|
||||
|
||||
| Domain | Service | Auth Required? |
|
||||
|--------|---------|----------------|
|
||||
| `sf.vish.gg` | Seafile | Yes (has share links) |
|
||||
| `docs.vish.gg` | Paperless | Yes |
|
||||
| `actual.vish.gg` | Actual Budget | Yes |
|
||||
| `gf.vish.gg` | Grafana | Yes (already configured) |
|
||||
| `git.vish.gg` | Gitea | Yes (already configured) |
|
||||
|
||||
### Services to Keep Public (or with built-in auth)
|
||||
|
||||
| Domain | Service | Reason |
|
||||
|--------|---------|--------|
|
||||
| `sso.vish.gg` | Authentik | Is the auth provider |
|
||||
| `pw.vish.gg` | Vaultwarden | Has own auth |
|
||||
| `mastodon.vish.gg` | Mastodon | Public social |
|
||||
| `ntfy.vish.gg` | Ntfy | Notification endpoint |
|
||||
|
||||
### Forward Auth Setup
|
||||
|
||||
Use Authentik as a forward auth proxy:
|
||||
|
||||
```nginx
|
||||
# In reverse proxy config
|
||||
location / {
|
||||
auth_request /outpost.goauthentik.io/auth/nginx;
|
||||
# ... rest of config
|
||||
}
|
||||
```
|
||||
|
||||
See [Authentik Proxy Provider docs](https://docs.goauthentik.io/docs/providers/proxy/) for full setup.
|
||||
|
||||
## Implementation Order
|
||||
|
||||
1. **Phase 1**: Create DNS records for new domains
|
||||
2. **Phase 2**: Add reverse proxy entries (keep old ones working)
|
||||
3. **Phase 3**: Test new domains thoroughly
|
||||
4. **Phase 4**: Add Authentik protection where needed
|
||||
5. **Phase 5**: Remove old `*.synology.me` entries
|
||||
6. **Phase 6**: Update any apps/configs using old URLs
|
||||
808
docs/infrastructure/family-network-integration.md
Normal file
808
docs/infrastructure/family-network-integration.md
Normal file
@@ -0,0 +1,808 @@
|
||||
# 👨👩👧👦 Family Network Integration Guide
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide covers integrating your family's separate network and ISP with your homelab infrastructure, enabling seamless access to Plex, Immich photo sync, and Synology services while optimizing for different bandwidth capabilities.
|
||||
|
||||
## 🎯 Network Architecture Overview
|
||||
|
||||
### **Network Topology**
|
||||
```bash
|
||||
# Your Homelab Network
|
||||
ISP: 20 Gbps up/down
|
||||
Location: Primary residence
|
||||
Subnet: 192.168.1.0/24
|
||||
Key Services: Atlantis (Plex, Immich), Calypso (Media), Synology
|
||||
|
||||
# Family Network
|
||||
ISP: 2 Gbps down / 400 Mbps up
|
||||
Location: Family residence
|
||||
Subnet: 192.168.2.0/24 (different to avoid conflicts)
|
||||
Bridge Device: Concord-NUC (on family network)
|
||||
```
|
||||
|
||||
### **Integration Strategy**
|
||||
```bash
|
||||
# Concord-NUC as Bridge/Gateway
|
||||
Role: Site-to-site VPN endpoint and local cache
|
||||
Services: WireGuard server, Tailscale exit node, local caching
|
||||
Network: Connected to family network (192.168.2.x)
|
||||
Tailscale IP: concord-nuc.vish.local
|
||||
|
||||
# Bandwidth Optimization
|
||||
Homelab → Family: Utilize full 20 Gbps upload
|
||||
Family → Homelab: Respect 400 Mbps upload limit
|
||||
Local Caching: Cache frequently accessed content on Concord-NUC
|
||||
Quality Adaptation: Automatic quality adjustment based on bandwidth
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Site-to-Site VPN Configuration
|
||||
|
||||
### **Tailscale Site-to-Site Setup**
|
||||
|
||||
#### **Configure Concord-NUC as Subnet Router**
|
||||
```bash
|
||||
# On Concord-NUC (at family location)
|
||||
# Enable IP forwarding
|
||||
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
|
||||
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.conf
|
||||
sudo sysctl -p
|
||||
|
||||
# Advertise family subnet to Tailscale
|
||||
sudo tailscale up --advertise-routes=192.168.2.0/24 --accept-dns=false
|
||||
|
||||
# Verify subnet advertisement
|
||||
tailscale status
|
||||
```
|
||||
|
||||
#### **Accept Subnet Routes on Homelab**
|
||||
```bash
|
||||
# In Tailscale Admin Console (https://login.tailscale.com/admin)
|
||||
# Navigate to: Machines → concord-nuc → Route settings
|
||||
# Enable: 192.168.2.0/24 subnet route
|
||||
# This allows homelab to reach family network devices directly
|
||||
|
||||
# On homelab servers, accept the routes
|
||||
sudo tailscale up --accept-routes
|
||||
```
|
||||
|
||||
#### **Configure Family Router**
|
||||
```bash
|
||||
# Add static routes on family router to route homelab traffic through Concord-NUC
|
||||
# Router Admin → Advanced → Static Routes
|
||||
|
||||
# Route homelab Tailscale network through Concord-NUC
|
||||
Destination: 100.64.0.0/10
|
||||
Gateway: 192.168.2.100 (Concord-NUC local IP)
|
||||
Interface: LAN
|
||||
|
||||
# Route specific homelab subnets (optional)
|
||||
Destination: 192.168.1.0/24
|
||||
Gateway: 192.168.2.100
|
||||
Interface: LAN
|
||||
```
|
||||
|
||||
### **WireGuard Site-to-Site (Alternative)**
|
||||
|
||||
#### **Configure WireGuard on Concord-NUC**
|
||||
```bash
|
||||
# Install WireGuard
|
||||
sudo apt update && sudo apt install wireguard
|
||||
|
||||
# Generate keys
|
||||
wg genkey | sudo tee /etc/wireguard/private.key
|
||||
sudo chmod 600 /etc/wireguard/private.key
|
||||
sudo cat /etc/wireguard/private.key | wg pubkey | sudo tee /etc/wireguard/public.key
|
||||
|
||||
# Configure WireGuard interface
|
||||
sudo tee /etc/wireguard/wg-family.conf << 'EOF'
|
||||
[Interface]
|
||||
PrivateKey = CONCORD_PRIVATE_KEY
|
||||
Address = 10.100.0.2/24
|
||||
ListenPort = 51821
|
||||
PostUp = iptables -A FORWARD -i %i -j ACCEPT; iptables -A FORWARD -o %i -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
|
||||
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -D FORWARD -o %i -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
|
||||
|
||||
[Peer]
|
||||
# Homelab endpoint (Atlantis)
|
||||
PublicKey = ATLANTIS_PUBLIC_KEY
|
||||
Endpoint = your-homelab-external-ip:51820
|
||||
AllowedIPs = 192.168.1.0/24, 10.100.0.1/32
|
||||
PersistentKeepalive = 25
|
||||
EOF
|
||||
|
||||
# Enable and start WireGuard
|
||||
sudo systemctl enable wg-quick@wg-family
|
||||
sudo systemctl start wg-quick@wg-family
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📺 Plex Integration and Optimization
|
||||
|
||||
### **Plex Server Configuration**
|
||||
|
||||
#### **Network and Remote Access**
|
||||
```bash
|
||||
# On Atlantis (Plex server)
|
||||
# Plex Settings → Network
|
||||
|
||||
# Network Interface: All interfaces
|
||||
# Secure connections: Preferred
|
||||
# Remote access: Enable
|
||||
# Manually specify public port: 32400
|
||||
# Custom server access URLs:
|
||||
# - https://atlantis.vish.local:32400
|
||||
# - https://plex.vish.local:32400 (if using custom DNS)
|
||||
|
||||
# Bandwidth settings for family network
|
||||
# Settings → Network → Remote streaming
|
||||
Maximum remote streaming bitrate: 20 Mbps (respect family's download limit)
|
||||
Internet upload speed: 20000 Mbps (your homelab upload)
|
||||
```
|
||||
|
||||
#### **Quality and Transcoding Settings**
|
||||
```bash
|
||||
# Settings → Transcoder
|
||||
Transcoder quality: Automatic
|
||||
Use hardware acceleration: Enable (if available)
|
||||
Use hardware-accelerated video encoding: Enable
|
||||
Maximum simultaneous video transcode: 4
|
||||
|
||||
# Settings → Network → Show Advanced
|
||||
Enable Relay: Disable (force direct connections)
|
||||
Treat WAN IP As LAN: Add family network subnet (192.168.2.0/24)
|
||||
List of IP addresses and networks that are allowed without auth: 192.168.2.0/24
|
||||
```
|
||||
|
||||
### **Family Device Configuration**
|
||||
|
||||
#### **Plex App Setup on Family Devices**
|
||||
```bash
|
||||
# Install Plex app on family devices:
|
||||
# - Smart TVs, Apple TV, Roku, Fire TV
|
||||
# - Mobile devices (iOS/Android)
|
||||
# - Computers (Windows/Mac/Linux)
|
||||
|
||||
# Sign in with Plex account
|
||||
# Server should auto-discover via Tailscale or direct connection
|
||||
# If not found, manually add server:
|
||||
# Server address: atlantis.vish.local:32400
|
||||
# Or: concord-nuc.vish.local:32400 (if using local proxy)
|
||||
```
|
||||
|
||||
#### **Local Plex Cache on Concord-NUC**
|
||||
```bash
|
||||
# Set up Plex Media Server on Concord-NUC for caching
|
||||
# This reduces bandwidth usage for frequently watched content
|
||||
|
||||
# Install Plex on Concord-NUC
|
||||
wget https://downloads.plex.tv/plex-media-server-new/1.40.0.7998-c29d4c0c8/debian/plexmediaserver_1.40.0.7998-c29d4c0c8_amd64.deb
|
||||
sudo dpkg -i plexmediaserver_*.deb
|
||||
|
||||
# Configure as secondary server with sync
|
||||
# Plex Settings → Sync
|
||||
# Enable sync for frequently watched content
|
||||
# Sync location: /var/lib/plexmediaserver/sync
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📸 Immich Photo Sync Integration
|
||||
|
||||
### **Immich Server Configuration**
|
||||
|
||||
#### **Multi-Site Photo Management**
|
||||
```bash
|
||||
# On Calypso (primary Immich server)
|
||||
# Configure for external access via Tailscale
|
||||
|
||||
# Immich Admin Settings
|
||||
# Server Settings → External domain: https://calypso.vish.local:2283
|
||||
# Storage Settings → Upload location: /volume1/immich/upload
|
||||
# User Settings → Storage quota: Unlimited (for family)
|
||||
|
||||
# Create family user accounts
|
||||
# Administration → Users → Add User
|
||||
Username: family-member-1
|
||||
Email: family1@vish.local
|
||||
Password: "REDACTED_PASSWORD" strong password]
|
||||
Storage quota: Unlimited
|
||||
```
|
||||
|
||||
#### **Immich Proxy on Concord-NUC**
|
||||
```bash
|
||||
# Set up Nginx proxy on Concord-NUC for local access optimization
|
||||
sudo apt install nginx
|
||||
|
||||
# Configure Nginx proxy
|
||||
sudo tee /etc/nginx/sites-available/immich-proxy << 'EOF'
|
||||
server {
|
||||
listen 2283;
|
||||
server_name concord-nuc.vish.local;
|
||||
|
||||
# Increase upload limits for photos/videos
|
||||
client_max_body_size 2G;
|
||||
proxy_request_buffering off;
|
||||
|
||||
location / {
|
||||
proxy_pass https://calypso.vish.local:2283;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Real-IP $remote_addr;
|
||||
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
|
||||
proxy_set_header X-Forwarded-Proto $scheme;
|
||||
|
||||
# Optimize for photo uploads
|
||||
proxy_buffering off;
|
||||
proxy_cache off;
|
||||
proxy_read_timeout 300s;
|
||||
proxy_send_timeout 300s;
|
||||
}
|
||||
}
|
||||
EOF
|
||||
|
||||
sudo ln -s /etc/nginx/sites-available/immich-proxy /etc/nginx/sites-enabled/
|
||||
sudo systemctl restart nginx
|
||||
```
|
||||
|
||||
### **Family Device Photo Sync**
|
||||
|
||||
#### **iOS Immich App Configuration**
|
||||
```bash
|
||||
# Install Immich mobile app from App Store
|
||||
# Configure connection:
|
||||
Server URL: https://concord-nuc.vish.local:2283
|
||||
# Or direct: https://calypso.vish.local:2283
|
||||
|
||||
# Login with family account credentials
|
||||
# Enable auto-backup:
|
||||
# Settings → Auto backup
|
||||
# Backup when charging: Enable
|
||||
# Backup on WiFi only: Enable (to respect mobile data)
|
||||
# Background app refresh: Enable
|
||||
|
||||
# Backup settings:
|
||||
# Include videos: Enable
|
||||
# Backup quality: Original (you have bandwidth)
|
||||
# Backup frequency: Immediate
|
||||
```
|
||||
|
||||
#### **Android Immich App Configuration**
|
||||
```bash
|
||||
# Install Immich from Google Play Store or F-Droid
|
||||
# Configure similar to iOS:
|
||||
Server URL: https://concord-nuc.vish.local:2283
|
||||
Auto-backup: Enable
|
||||
WiFi only: Enable
|
||||
Background sync: Enable
|
||||
Quality: Original
|
||||
```
|
||||
|
||||
#### **Desktop Immich CLI Sync**
|
||||
```bash
|
||||
# Install Immich CLI on family computers
|
||||
npm install -g @immich-app/cli
|
||||
|
||||
# Configure API key (from Immich web interface)
|
||||
# User Settings → API Keys → Create API Key
|
||||
|
||||
# Set up sync script for family computers
|
||||
cat > ~/sync-photos.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
export IMMICH_INSTANCE_URL="https://concord-nuc.vish.local:2283"
|
||||
export IMMICH_API_KEY=REDACTED_API_KEY
|
||||
|
||||
# Sync photos from common directories
|
||||
immich upload ~/Pictures/
|
||||
immich upload ~/Desktop/Photos/
|
||||
immich upload /Users/Shared/Photos/ # macOS
|
||||
immich upload ~/Documents/Photos/
|
||||
|
||||
echo "Photo sync completed: $(date)"
|
||||
EOF
|
||||
|
||||
chmod +x ~/sync-photos.sh
|
||||
|
||||
# Schedule regular sync (every 4 hours)
|
||||
crontab -e
|
||||
# Add: 0 */4 * * * /home/user/sync-photos.sh >> /home/user/sync-photos.log 2>&1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 💾 Synology Integration
|
||||
|
||||
### **Synology Drive for Family**
|
||||
|
||||
#### **Configure Synology Drive Server**
|
||||
```bash
|
||||
# On Atlantis (Synology NAS)
|
||||
# Package Center → Install Synology Drive Server
|
||||
|
||||
# Synology Drive Admin Console
|
||||
# Enable Synology Drive: ✅
|
||||
# Enable versioning: ✅ (keep 32 versions)
|
||||
# Enable team folders: ✅
|
||||
# External access: Enable via Tailscale (atlantis.vish.local:6690)
|
||||
```
|
||||
|
||||
#### **Create Family Shared Folders**
|
||||
```bash
|
||||
# Control Panel → Shared Folder → Create
|
||||
|
||||
# Family Photos (for Synology Photos)
|
||||
Name: FamilyPhotos
|
||||
Location: /volume1/FamilyPhotos
|
||||
Description: Family photo collection
|
||||
Users: family-member-1, family-member-2 (Read/Write)
|
||||
|
||||
# Family Documents
|
||||
Name: FamilyDocuments
|
||||
Location: /volume1/FamilyDocuments
|
||||
Description: Shared family documents
|
||||
Users: family-member-1, family-member-2 (Read/Write)
|
||||
|
||||
# Family Media
|
||||
Name: FamilyMedia
|
||||
Location: /volume1/FamilyMedia
|
||||
Description: Family videos and media
|
||||
Users: family-member-1, family-member-2 (Read/Write)
|
||||
```
|
||||
|
||||
#### **Synology Drive Client Setup**
|
||||
```bash
|
||||
# Install Synology Drive Client on family devices
|
||||
# Download from: https://www.synology.com/en-us/support/download
|
||||
|
||||
# Configuration:
|
||||
Server address: https://atlantis.vish.local:6690
|
||||
Username: family-member-1
|
||||
Password: "REDACTED_PASSWORD" member password]
|
||||
|
||||
# Sync settings:
|
||||
Local folder: ~/SynologyDrive
|
||||
Server folder: /FamilyDocuments, /FamilyPhotos
|
||||
Sync mode: Two-way sync
|
||||
Bandwidth limit: 50 Mbps upload (respect family ISP limit)
|
||||
```
|
||||
|
||||
### **Synology Photos Integration**
|
||||
|
||||
#### **Configure Synology Photos**
|
||||
```bash
|
||||
# On Atlantis
|
||||
# Package Center → Install Synology Photos
|
||||
|
||||
# Synology Photos Settings
|
||||
# General → Enable Synology Photos: ✅
|
||||
# Indexing → Auto-index shared folders: FamilyPhotos
|
||||
# External access: Enable (via Tailscale)
|
||||
# Face recognition: Enable
|
||||
# Object recognition: Enable
|
||||
```
|
||||
|
||||
#### **Family Device Photo Backup**
|
||||
```bash
|
||||
# Install Synology Photos mobile app
|
||||
# Configure backup:
|
||||
Server: https://atlantis.vish.local (Synology Photos port)
|
||||
Account: family-member-1
|
||||
Backup folder: FamilyPhotos/[Device Name]
|
||||
|
||||
# Backup settings:
|
||||
Auto backup: Enable
|
||||
WiFi only: Enable
|
||||
Original quality: Enable
|
||||
Include videos: Enable
|
||||
Background backup: Enable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Performance Optimization
|
||||
|
||||
### **Bandwidth Management**
|
||||
|
||||
#### **QoS Configuration on Family Router**
|
||||
```bash
|
||||
# Configure QoS to prioritize homelab traffic
|
||||
# Router Admin → Advanced → QoS
|
||||
|
||||
# Upload QoS (400 Mbps total)
|
||||
High Priority (200 Mbps): Video calls, VoIP
|
||||
Medium Priority (150 Mbps): Homelab sync, photo uploads
|
||||
Low Priority (50 Mbps): General browsing, updates
|
||||
|
||||
# Download QoS (2 Gbps total)
|
||||
High Priority (1 Gbps): Streaming, video calls
|
||||
Medium Priority (800 Mbps): Homelab services, file downloads
|
||||
Low Priority (200 Mbps): Background updates
|
||||
```
|
||||
|
||||
#### **Traffic Shaping on Concord-NUC**
|
||||
```bash
|
||||
# Install traffic control tools
|
||||
sudo apt install iproute2 wondershaper
|
||||
|
||||
# Create traffic shaping script
|
||||
sudo tee /usr/local/bin/family-qos.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
# Family network traffic shaping
|
||||
|
||||
# Clear existing rules
|
||||
tc qdisc del dev eth0 root 2>/dev/null
|
||||
|
||||
# Create root qdisc
|
||||
tc qdisc add dev eth0 root handle 1: htb default 30
|
||||
|
||||
# Create classes for different traffic types
|
||||
# Class 1:10 - High priority (streaming, real-time)
|
||||
tc class add dev eth0 parent 1: classid 1:10 htb rate 1000mbit ceil 1500mbit
|
||||
# Class 1:20 - Medium priority (homelab services)
|
||||
tc class add dev eth0 parent 1: classid 1:20 htb rate 400mbit ceil 800mbit
|
||||
# Class 1:30 - Low priority (background)
|
||||
tc class add dev eth0 parent 1: classid 1:30 htb rate 100mbit ceil 200mbit
|
||||
|
||||
# Add filters for different services
|
||||
# Plex traffic (high priority)
|
||||
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 match ip dport 32400 0xffff flowid 1:10
|
||||
# Immich uploads (medium priority)
|
||||
tc filter add dev eth0 protocol ip parent 1:0 prio 2 u32 match ip dport 2283 0xffff flowid 1:20
|
||||
# Synology sync (medium priority)
|
||||
tc filter add dev eth0 protocol ip parent 1:0 prio 2 u32 match ip dport 6690 0xffff flowid 1:20
|
||||
EOF
|
||||
|
||||
chmod +x /usr/local/bin/family-qos.sh
|
||||
|
||||
# Run on startup
|
||||
echo "/usr/local/bin/family-qos.sh" >> /etc/rc.local
|
||||
```
|
||||
|
||||
### **Caching and CDN**
|
||||
|
||||
#### **Nginx Caching on Concord-NUC**
|
||||
```bash
|
||||
# Configure Nginx for caching frequently accessed content
|
||||
sudo tee /etc/nginx/conf.d/cache.conf << 'EOF'
|
||||
# Cache configuration
|
||||
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=homelab_cache:100m max_size=50g inactive=7d use_temp_path=off;
|
||||
|
||||
# Cache for Plex thumbnails and metadata
|
||||
location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
|
||||
proxy_cache homelab_cache;
|
||||
proxy_cache_valid 200 7d;
|
||||
proxy_cache_valid 404 1m;
|
||||
add_header X-Cache-Status $upstream_cache_status;
|
||||
expires 7d;
|
||||
}
|
||||
|
||||
# Cache for Immich thumbnails
|
||||
location /api/asset/thumbnail {
|
||||
proxy_cache homelab_cache;
|
||||
proxy_cache_valid 200 30d;
|
||||
proxy_cache_key "$scheme$request_method$host$request_uri";
|
||||
add_header X-Cache-Status $upstream_cache_status;
|
||||
}
|
||||
EOF
|
||||
|
||||
# Create cache directory
|
||||
sudo mkdir -p /var/cache/nginx
|
||||
sudo chown www-data:www-data /var/cache/nginx
|
||||
sudo systemctl restart nginx
|
||||
```
|
||||
|
||||
#### **Local DNS Caching**
|
||||
```bash
|
||||
# Install and configure dnsmasq for local DNS caching
|
||||
sudo apt install dnsmasq
|
||||
|
||||
# Configure dnsmasq
|
||||
sudo tee /etc/dnsmasq.conf << 'EOF'
|
||||
# Listen on family network interface
|
||||
interface=eth0
|
||||
bind-interfaces
|
||||
|
||||
# Cache size and TTL
|
||||
cache-size=10000
|
||||
local-ttl=300
|
||||
|
||||
# Forward to homelab DNS (Pi-hole) via Tailscale
|
||||
server=100.64.0.1 # Atlantis Tailscale IP
|
||||
|
||||
# Local overrides for performance
|
||||
address=/concord-nuc.vish.local/192.168.2.100
|
||||
address=/plex.family.local/192.168.2.100
|
||||
address=/photos.family.local/192.168.2.100
|
||||
EOF
|
||||
|
||||
sudo systemctl enable dnsmasq
|
||||
sudo systemctl start dnsmasq
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Monitoring and Analytics
|
||||
|
||||
### **Family Network Monitoring**
|
||||
|
||||
#### **Grafana Dashboard for Family Network**
|
||||
```bash
|
||||
# Create family-specific Grafana dashboard
|
||||
# Panels to include:
|
||||
# 1. Bandwidth usage (upload/download)
|
||||
# 2. Plex streaming sessions and quality
|
||||
# 3. Photo sync progress and storage usage
|
||||
# 4. Concord-NUC system resources
|
||||
# 5. Network latency between sites
|
||||
# 6. Service availability (Plex, Immich, Synology)
|
||||
|
||||
# Add Prometheus monitoring to Concord-NUC
|
||||
# Install node_exporter
|
||||
wget https://github.com/prometheus/node_exporter/releases/latest/download/node_exporter-*linux-amd64.tar.gz
|
||||
tar xvfz node_exporter-*linux-amd64.tar.gz
|
||||
sudo mv node_exporter-*/node_exporter /usr/local/bin/
|
||||
sudo useradd -rs /bin/false node_exporter
|
||||
|
||||
# Create systemd service
|
||||
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
|
||||
[Unit]
|
||||
Description=Node Exporter
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
User=node_exporter
|
||||
Group=node_exporter
|
||||
Type=simple
|
||||
ExecStart=/usr/local/bin/node_exporter
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
sudo systemctl enable node_exporter
|
||||
sudo systemctl start node_exporter
|
||||
```
|
||||
|
||||
#### **Family Usage Analytics**
|
||||
```bash
|
||||
# Track family usage patterns
|
||||
# Create InfluxDB database for family metrics
|
||||
|
||||
# On homelab (Atlantis), add family data collection
|
||||
# Plex usage by family members
|
||||
# Photo upload statistics
|
||||
# Bandwidth utilization patterns
|
||||
# Service response times from family network
|
||||
|
||||
# Example Telegraf configuration for family metrics
|
||||
cat >> /etc/telegraf/telegraf.conf << 'EOF'
|
||||
# Family network monitoring
|
||||
[[inputs.ping]]
|
||||
urls = ["concord-nuc.vish.local", "192.168.2.1"]
|
||||
count = 3
|
||||
ping_timeout = 10.0
|
||||
|
||||
[[inputs.http_response]]
|
||||
urls = [
|
||||
"https://concord-nuc.vish.local:2283", # Immich proxy
|
||||
"https://concord-nuc.vish.local:32400", # Plex proxy
|
||||
"https://concord-nuc.vish.local:6690" # Synology proxy
|
||||
]
|
||||
response_timeout = "10s"
|
||||
method = "GET"
|
||||
|
||||
[[inputs.net]]
|
||||
interfaces = ["tailscale0", "wg-family"]
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security Considerations
|
||||
|
||||
### **Network Segmentation**
|
||||
|
||||
#### **Firewall Rules on Concord-NUC**
|
||||
```bash
|
||||
# Configure UFW for family network security
|
||||
sudo ufw enable
|
||||
|
||||
# Allow family network access to homelab services
|
||||
sudo ufw allow from 192.168.2.0/24 to any port 32400 # Plex
|
||||
sudo ufw allow from 192.168.2.0/24 to any port 2283 # Immich
|
||||
sudo ufw allow from 192.168.2.0/24 to any port 6690 # Synology
|
||||
|
||||
# Allow Tailscale traffic
|
||||
sudo ufw allow in on tailscale0
|
||||
sudo ufw allow out on tailscale0
|
||||
|
||||
# Block direct access to homelab management
|
||||
sudo ufw deny from 192.168.2.0/24 to any port 22 # SSH
|
||||
sudo ufw deny from 192.168.2.0/24 to any port 3000 # Grafana
|
||||
sudo ufw deny from 192.168.2.0/24 to any port 9090 # Prometheus
|
||||
|
||||
# Log denied connections
|
||||
sudo ufw logging on
|
||||
```
|
||||
|
||||
#### **Access Control Lists**
|
||||
```bash
|
||||
# Configure Tailscale ACLs for family access
|
||||
# Tailscale Admin → Access Controls
|
||||
|
||||
{
|
||||
"groups": {
|
||||
"group:family": ["family-member-1@domain.com", "family-member-2@domain.com"],
|
||||
"group:admin": ["admin@domain.com"]
|
||||
},
|
||||
"acls": [
|
||||
// Family members - limited access to media services
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:family"],
|
||||
"dst": [
|
||||
"atlantis.vish.local:32400", // Plex
|
||||
"calypso.vish.local:2283", // Immich
|
||||
"atlantis.vish.local:6690", // Synology Drive
|
||||
"concord-nuc.vish.local:*" // Local proxy services
|
||||
]
|
||||
},
|
||||
// Admin - full access
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:admin"],
|
||||
"dst": ["*:*"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### **Data Privacy and Backup**
|
||||
|
||||
#### **Family Data Backup Strategy**
|
||||
```bash
|
||||
# Automated backup of family data from Concord-NUC to homelab
|
||||
# Create backup script
|
||||
|
||||
cat > /usr/local/bin/family-backup.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
# Family data backup to homelab
|
||||
|
||||
BACKUP_DATE=$(date +%Y%m%d)
|
||||
BACKUP_LOG="/var/log/family-backup.log"
|
||||
|
||||
log() {
|
||||
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$BACKUP_LOG"
|
||||
}
|
||||
|
||||
# Backup family photos to Atlantis
|
||||
log "Starting family photo backup"
|
||||
rsync -avz --progress /var/lib/immich/upload/ \
|
||||
atlantis.vish.local:/volume1/backups/family/photos/ \
|
||||
>> "$BACKUP_LOG" 2>&1
|
||||
|
||||
# Backup Synology Drive sync data
|
||||
log "Starting Synology Drive backup"
|
||||
rsync -avz --progress /home/*/SynologyDrive/ \
|
||||
atlantis.vish.local:/volume1/backups/family/documents/ \
|
||||
>> "$BACKUP_LOG" 2>&1
|
||||
|
||||
# Backup Plex cache/metadata
|
||||
log "Starting Plex cache backup"
|
||||
rsync -avz --progress /var/lib/plexmediaserver/ \
|
||||
atlantis.vish.local:/volume1/backups/family/plex-cache/ \
|
||||
>> "$BACKUP_LOG" 2>&1
|
||||
|
||||
log "Family backup completed"
|
||||
EOF
|
||||
|
||||
chmod +x /usr/local/bin/family-backup.sh
|
||||
|
||||
# Schedule daily backups at 2 AM
|
||||
echo "0 2 * * * /usr/local/bin/family-backup.sh" | crontab -
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📱 Family Mobile Device Setup
|
||||
|
||||
### **Simplified Mobile Configuration**
|
||||
|
||||
#### **Family iOS/Android Setup**
|
||||
```bash
|
||||
# Install essential apps on family devices:
|
||||
|
||||
# Core Apps:
|
||||
- Plex (media streaming)
|
||||
- Immich (photo backup)
|
||||
- Synology Drive (file sync)
|
||||
- Synology Photos (photo management)
|
||||
|
||||
# Optional Apps:
|
||||
- Tailscale (for advanced users)
|
||||
- Home Assistant (if using smart home)
|
||||
- Grafana (for tech-savvy family members)
|
||||
|
||||
# Configure apps to use Concord-NUC as proxy:
|
||||
Plex Server: concord-nuc.vish.local:32400
|
||||
Immich Server: concord-nuc.vish.local:2283
|
||||
Synology: concord-nuc.vish.local:6690
|
||||
```
|
||||
|
||||
#### **Family Network WiFi Optimization**
|
||||
```bash
|
||||
# Configure family router for optimal streaming
|
||||
# WiFi Settings:
|
||||
Channel Width: 160 MHz (5 GHz)
|
||||
QAM: 1024-QAM (if supported)
|
||||
Band Steering: Enable
|
||||
Airtime Fairness: Enable
|
||||
Beamforming: Enable
|
||||
|
||||
# Device Priority:
|
||||
High Priority: Streaming devices (Apple TV, Roku, etc.)
|
||||
Medium Priority: Mobile devices
|
||||
Low Priority: IoT devices, smart home
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Family Integration Checklist
|
||||
|
||||
### **Initial Setup**
|
||||
```bash
|
||||
☐ Configure Concord-NUC as Tailscale subnet router
|
||||
☐ Set up site-to-site VPN between networks
|
||||
☐ Configure family router static routes
|
||||
☐ Install and configure Plex proxy on Concord-NUC
|
||||
☐ Set up Immich proxy and photo sync
|
||||
☐ Configure Synology Drive for family access
|
||||
☐ Implement QoS and traffic shaping
|
||||
☐ Set up local DNS caching
|
||||
☐ Configure monitoring and analytics
|
||||
☐ Test all services from family network
|
||||
```
|
||||
|
||||
### **Family Device Setup**
|
||||
```bash
|
||||
☐ Install Plex app on all family streaming devices
|
||||
☐ Configure Immich mobile apps for photo backup
|
||||
☐ Set up Synology Drive clients on family computers
|
||||
☐ Install Synology Photos apps for photo management
|
||||
☐ Configure WiFi optimization on family router
|
||||
☐ Test streaming quality and performance
|
||||
☐ Set up parental controls if needed
|
||||
☐ Create user accounts for all family members
|
||||
☐ Document access credentials securely
|
||||
☐ Train family members on app usage
|
||||
```
|
||||
|
||||
### **Security and Maintenance**
|
||||
```bash
|
||||
☐ Configure firewall rules on Concord-NUC
|
||||
☐ Set up Tailscale ACLs for family access
|
||||
☐ Implement automated backup procedures
|
||||
☐ Configure monitoring alerts
|
||||
☐ Set up bandwidth monitoring
|
||||
☐ Create maintenance schedule
|
||||
☐ Document troubleshooting procedures
|
||||
☐ Test disaster recovery procedures
|
||||
☐ Regular security audits
|
||||
☐ Update documentation as needed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - VPN infrastructure setup
|
||||
- [Mobile Device Setup](mobile-device-setup.md) - Family mobile device configuration
|
||||
- [Ubiquiti Enterprise Setup](ubiquiti-enterprise-setup.md) - Advanced networking options
|
||||
- [Individual Service Docs](../services/individual/README.md) - Plex, Immich, Synology configuration
|
||||
- [Security Model](security.md) - Security considerations for family access
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: Start with Plex streaming to test the connection, then gradually add photo sync and file sharing. Monitor bandwidth usage closely during the first few weeks to optimize QoS settings for your family's usage patterns!
|
||||
527
docs/infrastructure/glinet-travel-networking.md
Normal file
527
docs/infrastructure/glinet-travel-networking.md
Normal file
@@ -0,0 +1,527 @@
|
||||
# 🌐 GL.iNet Travel Networking Infrastructure
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide covers the complete GL.iNet travel networking setup, including travel routers, IoT gateway, and remote KVM for secure mobile connectivity and remote management.
|
||||
|
||||
---
|
||||
|
||||
## 🎒 GL.iNet Device Portfolio
|
||||
|
||||
### **GL.iNet Comet (GL-RM1) - Remote KVM**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: GL-RM1 Remote KVM over IP
|
||||
- **Purpose**: Remote server management and troubleshooting
|
||||
- **Video**: Up to 1920x1200@60Hz resolution
|
||||
- **USB**: Virtual keyboard and mouse support
|
||||
- **Network**: Ethernet connection for remote access
|
||||
- **Power**: USB-C powered, low power consumption
|
||||
- **Form Factor**: Compact, portable design
|
||||
|
||||
#### **Use Cases**
|
||||
- **Remote Server Management**: Access BIOS, boot sequences, OS installation
|
||||
- **Headless System Control**: Manage servers without physical access
|
||||
- **Emergency Recovery**: Fix systems when SSH/network is down
|
||||
- **Travel Troubleshooting**: Diagnose homelab issues from anywhere
|
||||
- **Secure Access**: Out-of-band management independent of OS
|
||||
|
||||
#### **Integration with Homelab**
|
||||
```
|
||||
Homelab Server → GL-RM1 KVM → Network → Tailscale → Travel Device
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### **GL.iNet Slate 7 (GL-BE3600) - Wi-Fi 7 Travel Router**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: GL-BE3600 Dual-Band Wi-Fi 7 Travel Router
|
||||
- **Wi-Fi Standard**: Wi-Fi 7 (802.11be)
|
||||
- **Speed**: Up to 3.6 Gbps total throughput
|
||||
- **Bands**: Dual-band (2.4GHz + 5GHz)
|
||||
- **Ports**: 1x Gigabit WAN, 1x Gigabit LAN
|
||||
- **CPU**: Quad-core ARM processor
|
||||
- **RAM**: 1GB DDR4
|
||||
- **Storage**: 256MB flash storage
|
||||
- **Power**: USB-C, portable battery support
|
||||
- **VPN**: Built-in OpenVPN, WireGuard support
|
||||
|
||||
#### **Key Features**
|
||||
- **Wi-Fi 7 Technology**: Latest wireless standard for maximum performance
|
||||
- **Travel-Optimized**: Compact form factor, battery operation
|
||||
- **VPN Client/Server**: Secure tunnel back to homelab
|
||||
- **Captive Portal Bypass**: Automatic hotel/airport Wi-Fi connection
|
||||
- **Dual WAN**: Ethernet + Wi-Fi uplink for redundancy
|
||||
- **Guest Network**: Isolated network for untrusted devices
|
||||
|
||||
---
|
||||
|
||||
### **GL.iNet Beryl AX (GL-MT3000) - Wi-Fi 6 Pocket Router**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: GL-MT3000 Pocket-Sized Wi-Fi 6 Router
|
||||
- **Wi-Fi Standard**: Wi-Fi 6 (802.11ax)
|
||||
- **Speed**: Up to 2.4 Gbps total throughput
|
||||
- **Bands**: Dual-band (2.4GHz + 5GHz)
|
||||
- **Ports**: 1x Gigabit WAN/LAN
|
||||
- **CPU**: Dual-core ARM Cortex-A53
|
||||
- **RAM**: 512MB DDR4
|
||||
- **Storage**: 128MB flash storage
|
||||
- **Power**: USB-C, ultra-portable
|
||||
- **Battery**: Optional external battery pack
|
||||
|
||||
#### **Use Cases**
|
||||
- **Ultra-Portable Networking**: Smallest form factor for minimal travel
|
||||
- **Hotel Room Setup**: Instant secure Wi-Fi in accommodations
|
||||
- **Conference Networking**: Secure connection at events
|
||||
- **Backup Connectivity**: Secondary router for redundancy
|
||||
- **IoT Device Management**: Isolated network for smart devices
|
||||
|
||||
---
|
||||
|
||||
### **GL.iNet Mango (GL-MT300N-V2) - Compact Travel Router**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: GL-MT300N-V2 Mini Travel Router
|
||||
- **Wi-Fi Standard**: Wi-Fi 4 (802.11n)
|
||||
- **Speed**: Up to 300 Mbps
|
||||
- **Band**: Single-band (2.4GHz)
|
||||
- **Ports**: 1x Fast Ethernet WAN/LAN
|
||||
- **CPU**: Single-core MIPS processor
|
||||
- **RAM**: 128MB DDR2
|
||||
- **Storage**: 16MB flash storage
|
||||
- **Power**: Micro-USB, very low power
|
||||
- **Size**: Ultra-compact, credit card sized
|
||||
|
||||
#### **Use Cases**
|
||||
- **Emergency Connectivity**: Basic internet access when needed
|
||||
- **Legacy Device Support**: Connect older devices to modern networks
|
||||
- **IoT Prototyping**: Simple network for development projects
|
||||
- **Backup Router**: Ultra-portable emergency networking
|
||||
- **Budget Travel**: Cost-effective secure connectivity
|
||||
|
||||
---
|
||||
|
||||
### **GL.iNet S200 - Multi-Protocol IoT Gateway**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: GL-S200 Multi-Protocol IoT Gateway
|
||||
- **Protocols**: Thread, Zigbee, Matter, Wi-Fi
|
||||
- **Thread**: Thread Border Router functionality
|
||||
- **Zigbee**: Zigbee 3.0 coordinator support
|
||||
- **Matter**: Matter over Thread/Wi-Fi support
|
||||
- **CPU**: ARM Cortex-A7 processor
|
||||
- **RAM**: 256MB DDR3
|
||||
- **Storage**: 128MB flash storage
|
||||
- **Network**: Ethernet, Wi-Fi connectivity
|
||||
- **Power**: USB-C powered
|
||||
|
||||
#### **IoT Integration**
|
||||
- **Smart Home Hub**: Central control for IoT devices
|
||||
- **Protocol Translation**: Bridge between different IoT standards
|
||||
- **Remote Management**: Control IoT devices via Tailscale
|
||||
- **Travel IoT**: Portable smart home setup for extended stays
|
||||
- **Development Platform**: IoT protocol testing and development
|
||||
|
||||
---
|
||||
|
||||
## 🗺️ Travel Networking Architecture
|
||||
|
||||
### **Multi-Layer Connectivity Strategy**
|
||||
```
|
||||
Internet (Hotel/Airport/Cellular)
|
||||
│
|
||||
├── GL-BE3600 (Primary Wi-Fi 7 Router)
|
||||
│ ├── Secure Tunnel → Tailscale → Homelab
|
||||
│ ├── Guest Network (Untrusted devices)
|
||||
│ └── Private Network (Trusted devices)
|
||||
│
|
||||
├── GL-MT3000 (Backup Wi-Fi 6 Router)
|
||||
│ └── Secondary VPN Connection
|
||||
│
|
||||
├── GL-MT300N-V2 (Emergency Router)
|
||||
│ └── Basic connectivity fallback
|
||||
│
|
||||
└── GL-S200 (IoT Gateway)
|
||||
└── Smart device management
|
||||
```
|
||||
|
||||
### **Redundancy & Failover**
|
||||
- **Primary**: GL-BE3600 with Wi-Fi 7 for maximum performance
|
||||
- **Secondary**: GL-MT3000 for backup connectivity
|
||||
- **Emergency**: GL-MT300N-V2 for basic internet access
|
||||
- **Specialized**: GL-S200 for IoT device management
|
||||
|
||||
---
|
||||
|
||||
## 🏠 Current Homelab Deployment
|
||||
|
||||
Both GL-MT3000 and GL-BE3600 are deployed as **permanent infrastructure** in the homelab (not travel use), connected to Headscale and providing subnet routing.
|
||||
|
||||
### GL-MT3000 — IoT/HA Gateway
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| **Role** | Gateway for jellyfish + Home Assistant |
|
||||
| **LAN** | `192.168.12.0/24` (gateway: `192.168.12.1`) |
|
||||
| **WAN** | Separate uplink (`76.93.214.253`) — not on home LAN |
|
||||
| **Tailscale IP** | `100.126.243.15` |
|
||||
| **Tailscale version** | `1.92.5-tiny` (GL-inet custom build) |
|
||||
| **Subnet route** | `192.168.12.0/24` (approved in Headscale) |
|
||||
| **SSH** | `ssh gl-mt3000` (dropbear, key auth) |
|
||||
|
||||
Devices on `192.168.12.0/24` accessible via Tailscale:
|
||||
- `jellyfish` (`100.69.121.120`) — jump host / device
|
||||
- `homeassistant` (`100.112.186.90`) — Home Assistant OS
|
||||
|
||||
### GL-BE3600 — Wi-Fi Repeater
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| **Role** | Wi-Fi repeater on home network |
|
||||
| **Management IP** | `192.168.68.53` (upstream LAN) |
|
||||
| **Own LAN** | `192.168.8.0/24` (gateway: `192.168.8.1`) |
|
||||
| **Tailscale IP** | `100.105.59.123` |
|
||||
| **Tailscale version** | `1.90.9-tiny` (GL-inet custom build) |
|
||||
| **Subnet route** | `192.168.8.0/24` (approved in Headscale) |
|
||||
| **SSH** | `ssh gl-be3600` (dropbear, key auth) |
|
||||
|
||||
> **Note**: GL-BE3600 ports are filtered from homelab VM (`192.168.0.210`) and NUC (`192.168.68.x`). It is only directly reachable from its own `192.168.8.x` LAN — or via its Tailscale IP (`100.105.59.123`).
|
||||
|
||||
---
|
||||
|
||||
## 🔑 SSH Access
|
||||
|
||||
Both routers use **dropbear SSH** (not OpenSSH). Authorized keys are stored at `/etc/dropbear/authorized_keys`.
|
||||
|
||||
```bash
|
||||
# Connect via Tailscale (preferred)
|
||||
ssh gl-mt3000 # 100.126.243.15, root
|
||||
ssh gl-be3600 # 100.105.59.123, root
|
||||
|
||||
# Add a new SSH key manually (from the router shell)
|
||||
echo "ssh-ed25519 AAAA... your-key-comment" >> /etc/dropbear/authorized_keys
|
||||
```
|
||||
|
||||
SSH config entries (in `~/.ssh/config` on homelab VM):
|
||||
```
|
||||
Host gl-mt3000
|
||||
HostName 100.126.243.15
|
||||
User root
|
||||
|
||||
Host gl-be3600
|
||||
HostName 100.105.59.123
|
||||
User root
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📡 Headscale / Tailscale Setup on GL-inet Routers
|
||||
|
||||
GL-inet routers ship with a custom Tailscale build (`tailscale-tiny`). The standard install script does not work — use the GL-inet package manager or the pre-installed binary.
|
||||
|
||||
### Joining Headscale
|
||||
|
||||
```bash
|
||||
# 1. Generate a pre-auth key on the Headscale server
|
||||
ssh calypso
|
||||
sudo /usr/local/bin/docker exec headscale headscale preauthkeys create --user <numeric-user-id> --expiration 1h
|
||||
# Note: --user requires numeric ID in Headscale v0.28, not username
|
||||
# Find ID with: sudo /usr/local/bin/docker exec headscale headscale users list
|
||||
|
||||
# 2. On the GL-inet router shell:
|
||||
tailscale up --login-server=https://headscale.vish.gg:8443 --authkey=<preauthkey> --accept-routes --advertise-routes=192.168.X.0/24 --advertise-exit-node --hostname=gl-<model>
|
||||
|
||||
# 3. Approve the subnet route and exit node on Headscale:
|
||||
sudo /usr/local/bin/docker exec headscale headscale nodes list # get node ID
|
||||
sudo /usr/local/bin/docker exec headscale headscale nodes approve-routes -i <ID> -r '0.0.0.0/0,::/0,192.168.X.0/24'
|
||||
```
|
||||
|
||||
### Tailscale Status
|
||||
|
||||
```bash
|
||||
# Check status on the router
|
||||
ssh gl-mt3000 "tailscale status"
|
||||
ssh gl-be3600 "tailscale status"
|
||||
|
||||
# Check from Headscale
|
||||
ssh calypso "sudo /usr/local/bin/docker exec headscale headscale nodes list"
|
||||
```
|
||||
|
||||
### Headscale v0.28 Command Reference
|
||||
|
||||
| Old command | New command |
|
||||
|-------------|-------------|
|
||||
| `headscale routes list` | `headscale nodes list-routes --identifier <ID>` |
|
||||
| `headscale routes enable -r <ID>` | `headscale nodes approve-routes --identifier <ID> --routes <CIDR>` |
|
||||
| `headscale preauthkeys create --user <name>` | `headscale preauthkeys create --user <numeric-id>` |
|
||||
|
||||
---
|
||||
|
||||
## 🔄 Tailscale Autostart on Boot
|
||||
|
||||
### How GL-inet Manages Tailscale
|
||||
|
||||
GL-inet routers use a custom wrapper script `/usr/bin/gl_tailscale` that is called on boot by the `tailscale` init service. This wrapper reads UCI config from `/etc/config/tailscale` and constructs the `tailscale up` command automatically.
|
||||
|
||||
**Important**: The GL-inet wrapper calls `tailscale up --reset ...` on every boot, which wipes any flags set manually or stored in the state file. This means `--login-server`, `--advertise-exit-node`, and `--hostname` must be baked into the wrapper script itself — they cannot be set once and remembered.
|
||||
|
||||
### Current Configuration (both routers)
|
||||
|
||||
Both routers have been patched so `/usr/bin/gl_tailscale` always passes the correct flags on boot. The relevant line in the wrapper:
|
||||
|
||||
**gl-be3600:**
|
||||
```sh
|
||||
timeout 10 /usr/sbin/tailscale up --reset --accept-routes $param --timeout 3s \
|
||||
--accept-dns=false \
|
||||
--login-server=https://headscale.vish.gg:8443 \
|
||||
--advertise-exit-node \
|
||||
--hostname=gl-be3600 > /dev/null
|
||||
```
|
||||
|
||||
**gl-mt3000:**
|
||||
```sh
|
||||
timeout 10 /usr/sbin/tailscale up --reset --accept-routes $param --timeout 3s \
|
||||
--accept-dns=false \
|
||||
--login-server=https://headscale.vish.gg:8443 \
|
||||
--advertise-exit-node \
|
||||
--hostname=gl-mt3000 > /dev/null
|
||||
```
|
||||
|
||||
The `$param` variable is built by the wrapper from UCI settings and includes `--advertise-routes=192.168.X.0/24` automatically based on `lan_enabled=1` in `/etc/config/tailscale`.
|
||||
|
||||
### Persistence Across Firmware Upgrades
|
||||
|
||||
Both routers have `/etc/sysupgrade.conf` entries to preserve the patched files:
|
||||
|
||||
```
|
||||
/usr/sbin/tailscale
|
||||
/usr/sbin/tailscaled
|
||||
/etc/config/tailscale
|
||||
/usr/bin/gl_tailscale
|
||||
/etc/init.d/tailscale-up
|
||||
```
|
||||
|
||||
### Re-applying the Patch After Firmware Upgrade
|
||||
|
||||
If a firmware upgrade overwrites `/usr/bin/gl_tailscale` (check with `tailscale status` — if "Logged out", patch was lost):
|
||||
|
||||
```bash
|
||||
# SSH to the router
|
||||
ssh gl-be3600 # or gl-mt3000
|
||||
|
||||
# Edit the gl_tailscale wrapper
|
||||
vi /usr/bin/gl_tailscale
|
||||
|
||||
# Find the tailscale up line (around line 226):
|
||||
# timeout 10 /usr/sbin/tailscale up --reset --accept-routes $param --timeout 3s --accept-dns=false > /dev/null
|
||||
# Change it to (for be3600):
|
||||
# timeout 10 /usr/sbin/tailscale up --reset --accept-routes $param --timeout 3s --accept-dns=false --login-server=https://headscale.vish.gg:8443 --advertise-exit-node --hostname=gl-be3600 > /dev/null
|
||||
|
||||
# Or use sed:
|
||||
sed -i 's|tailscale up --reset --accept-routes $param --timeout 3s --accept-dns=false|tailscale up --reset --accept-routes $param --timeout 3s --accept-dns=false --login-server=https://headscale.vish.gg:8443 --advertise-exit-node --hostname=gl-be3600|' /usr/bin/gl_tailscale
|
||||
```
|
||||
|
||||
### update-tailscale.sh
|
||||
|
||||
There is a community script at `/root/update-tailscale.sh` on both routers — this is the [GL-inet Tailscale Updater by Admon](https://github.com/Admonstrator/glinet-tailscale-updater). It updates the `tailscale`/`tailscaled` binaries to a newer version than GL-inet ships in firmware. It also restores `/usr/bin/gl_tailscale` from `/rom` before patching for SSH support — **re-apply the headscale patch after running this script**.
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Configuration & Setup
|
||||
|
||||
### **GL-BE3600 Primary Setup**
|
||||
|
||||
#### **Initial Configuration**
|
||||
```bash
|
||||
# Access router admin panel
|
||||
http://192.168.8.1
|
||||
|
||||
# Configure WAN connection
|
||||
- Set to DHCP for hotel/public Wi-Fi
|
||||
- Configure static IP if needed
|
||||
- Enable MAC address cloning for captive portals
|
||||
|
||||
# Configure VPN
|
||||
- Enable WireGuard client
|
||||
- Import Tailscale configuration
|
||||
- Set auto-connect on boot
|
||||
```
|
||||
|
||||
#### **Network Segmentation**
|
||||
```bash
|
||||
# Private Network (192.168.8.0/24)
|
||||
- Trusted devices (laptop, phone, tablet)
|
||||
- Full access to homelab via VPN
|
||||
- Local device communication allowed
|
||||
|
||||
# Guest Network (192.168.9.0/24)
|
||||
- Untrusted devices
|
||||
- Internet-only access
|
||||
- Isolated from private network
|
||||
```
|
||||
|
||||
### **Remote KVM (GL-RM1) Setup**
|
||||
|
||||
#### **Physical Connection**
|
||||
```bash
|
||||
# Connect to target server
|
||||
1. USB-A to server for keyboard/mouse emulation
|
||||
2. HDMI/VGA to server for video capture
|
||||
3. Ethernet to network for remote access
|
||||
4. USB-C for power
|
||||
|
||||
# Network Configuration
|
||||
- Assign static IP: 192.168.8.100
|
||||
- Configure port forwarding: 8080 → 80
|
||||
- Enable HTTPS for secure access
|
||||
```
|
||||
|
||||
#### **Tailscale Integration**
|
||||
```bash
|
||||
# Install Tailscale on KVM device
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
sudo tailscale up --accept-routes
|
||||
|
||||
# Access via Tailscale
|
||||
https://gl-rm1.tail.vish.gg
|
||||
```
|
||||
|
||||
### **IoT Gateway (GL-S200) Configuration**
|
||||
|
||||
#### **Thread Border Router Setup**
|
||||
```bash
|
||||
# Enable Thread functionality
|
||||
- Configure as Thread Border Router
|
||||
- Set network credentials
|
||||
- Enable Matter support
|
||||
|
||||
# Zigbee Coordinator Setup
|
||||
- Configure Zigbee channel
|
||||
- Set network key
|
||||
- Enable device pairing mode
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🛡️ Security Configuration
|
||||
|
||||
### **VPN Security**
|
||||
- **WireGuard Tunnels**: All traffic encrypted back to homelab
|
||||
- **Kill Switch**: Block internet if VPN disconnects
|
||||
- **DNS Security**: Use homelab Pi-hole for ad blocking
|
||||
- **Firewall Rules**: Strict ingress/egress filtering
|
||||
|
||||
### **Network Isolation**
|
||||
- **Guest Network**: Completely isolated from private devices
|
||||
- **IoT Segmentation**: Smart devices on separate VLAN
|
||||
- **Management Network**: KVM and admin access isolated
|
||||
- **Zero Trust**: All connections authenticated and encrypted
|
||||
|
||||
### **Access Control**
|
||||
- **Strong Passwords**: Unique passwords for each device
|
||||
- **SSH Keys**: Key-based authentication where possible
|
||||
- **Regular Updates**: Firmware updates for security patches
|
||||
- **Monitoring**: Log analysis for suspicious activity
|
||||
|
||||
---
|
||||
|
||||
## 📱 Mobile Device Integration
|
||||
|
||||
### **Seamless Connectivity**
|
||||
```bash
|
||||
# Device Auto-Connection Priority
|
||||
1. GL-BE3600 (Primary Wi-Fi 7)
|
||||
2. GL-MT3000 (Backup Wi-Fi 6)
|
||||
3. GL-MT300N-V2 (Emergency)
|
||||
4. Cellular (Last resort)
|
||||
|
||||
# Tailscale Configuration
|
||||
- All devices connected to Tailscale mesh
|
||||
- Automatic failover between networks
|
||||
- Consistent homelab access regardless of uplink
|
||||
```
|
||||
|
||||
### **Performance Optimization**
|
||||
- **Wi-Fi 7**: Maximum throughput for data-intensive tasks
|
||||
- **QoS**: Prioritize critical traffic (VPN, video calls)
|
||||
- **Band Steering**: Automatic 2.4GHz/5GHz selection
|
||||
- **Load Balancing**: Distribute devices across routers
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Monitoring & Management
|
||||
|
||||
### **Remote Monitoring**
|
||||
- **Router Status**: Monitor via web interface and mobile app
|
||||
- **VPN Health**: Check tunnel status and throughput
|
||||
- **Device Connectivity**: Track connected devices and usage
|
||||
- **Performance Metrics**: Bandwidth, latency, packet loss
|
||||
|
||||
### **Troubleshooting Tools**
|
||||
- **Network Diagnostics**: Built-in ping, traceroute, speed test
|
||||
- **Log Analysis**: System logs for connection issues
|
||||
- **Remote Access**: SSH access for advanced configuration
|
||||
- **Factory Reset**: Hardware reset button for recovery
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Use Case Scenarios
|
||||
|
||||
### **Business Travel**
|
||||
1. **Hotel Setup**: GL-BE3600 for secure Wi-Fi, KVM for server access
|
||||
2. **Conference**: GL-MT3000 for portable networking
|
||||
3. **Emergency**: GL-MT300N-V2 for basic connectivity
|
||||
4. **IoT Devices**: GL-S200 for smart device management
|
||||
|
||||
### **Extended Stay**
|
||||
1. **Primary Network**: GL-BE3600 with full homelab access
|
||||
2. **Smart Home**: GL-S200 for temporary IoT setup
|
||||
3. **Backup Connectivity**: Multiple routers for redundancy
|
||||
4. **Remote Management**: KVM for homelab troubleshooting
|
||||
|
||||
### **Digital Nomad**
|
||||
1. **Mobile Office**: Secure, high-speed connectivity anywhere
|
||||
2. **Content Creation**: High-bandwidth for video uploads
|
||||
3. **Development Work**: Full access to homelab resources
|
||||
4. **IoT Projects**: Portable development environment
|
||||
|
||||
---
|
||||
|
||||
## 📋 Maintenance & Updates
|
||||
|
||||
### **Regular Tasks**
|
||||
- **Firmware Updates**: Monthly security and feature updates
|
||||
- **Configuration Backup**: Export settings before changes
|
||||
- **Performance Testing**: Regular speed and latency tests
|
||||
- **Security Audit**: Review firewall rules and access logs
|
||||
|
||||
### **Travel Checklist**
|
||||
- [ ] All devices charged and firmware updated
|
||||
- [ ] VPN configurations tested and working
|
||||
- [ ] Backup connectivity options verified
|
||||
- [ ] Emergency contact information accessible
|
||||
- [ ] Documentation and passwords secured
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Integration with Homelab
|
||||
|
||||
### **Tailscale Mesh Network**
|
||||
- **Seamless Access**: All GL.iNet devices join Tailscale mesh
|
||||
- **Split-Brain DNS**: Local hostname resolution while traveling
|
||||
- **Subnet Routing**: Access homelab subnets via travel routers
|
||||
- **Exit Nodes**: Route internet traffic through homelab
|
||||
|
||||
### **Service Access**
|
||||
- **Media Streaming**: Plex, Jellyfin via high-speed VPN
|
||||
- **Development**: GitLab, Portainer, development environments
|
||||
- **Productivity**: Paperless-NGX, Vaultwarden, file sync
|
||||
- **Monitoring**: Grafana, Uptime Kuma for homelab status
|
||||
|
||||
---
|
||||
|
||||
*This GL.iNet travel networking infrastructure provides enterprise-level connectivity and security for mobile work, ensuring seamless access to homelab resources from anywhere in the world.*
|
||||
|
||||
*Last Updated*: 2026-03-11 (added Tailscale autostart section, gl_tailscale patch details, update-tailscale.sh note)
|
||||
1096
docs/infrastructure/hardware-inventory.md
Normal file
1096
docs/infrastructure/hardware-inventory.md
Normal file
File diff suppressed because it is too large
Load Diff
411
docs/infrastructure/headscale-migration-guide.md
Normal file
411
docs/infrastructure/headscale-migration-guide.md
Normal file
@@ -0,0 +1,411 @@
|
||||
# Headscale Migration Guide
|
||||
|
||||
## Overview
|
||||
|
||||
This homelab uses a self-hosted [Headscale](https://github.com/juanfont/headscale) instance instead of Tailscale cloud. Headscale is a drop-in open-source replacement for the Tailscale control server.
|
||||
|
||||
- **Headscale server**: `https://headscale.vish.gg:8443`
|
||||
- **MagicDNS suffix**: `tail.vish.gg` (e.g. `atlantis.tail.vish.gg`)
|
||||
- **Login**: Authentik SSO at `sso.vish.gg` — username `vish` or email `admin@thevish.io`
|
||||
- **Hosted on**: Calypso (`192.168.0.250`), managed via Docker
|
||||
|
||||
---
|
||||
|
||||
## Connecting a New Device
|
||||
|
||||
### Linux (Ubuntu / Debian)
|
||||
|
||||
1. Install Tailscale if not already installed:
|
||||
```bash
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
```
|
||||
|
||||
2. Connect to headscale:
|
||||
```bash
|
||||
sudo tailscale up \
|
||||
--login-server=https://headscale.vish.gg:8443 \
|
||||
--accept-routes \
|
||||
--force-reauth
|
||||
```
|
||||
|
||||
3. A browser auth URL will be printed. Open it and log in with Authentik SSO.
|
||||
|
||||
4. If DNS doesn't resolve `headscale.vish.gg` (e.g. fresh machine with no AdGuard), add a temporary hosts entry first:
|
||||
```bash
|
||||
echo '184.23.52.14 headscale.vish.gg' | sudo tee -a /etc/hosts
|
||||
# Run tailscale up, then clean up:
|
||||
sudo sed -i '/headscale.vish.gg/d' /etc/hosts
|
||||
```
|
||||
|
||||
5. If the machine was previously on Tailscale cloud and complains about non-default flags, Tailscale will print the exact command with all required flags — copy and run that command.
|
||||
|
||||
> **Note**: After registration, an admin must approve the node and fix the IP if preserving the original Tailscale IP (see Admin section below).
|
||||
|
||||
---
|
||||
|
||||
### Windows
|
||||
|
||||
1. Download and install Tailscale from https://tailscale.com/download/windows
|
||||
|
||||
2. Open **PowerShell as Administrator** and run:
|
||||
```powershell
|
||||
tailscale up --login-server=https://headscale.vish.gg:8443 --accept-routes --force-reauth
|
||||
```
|
||||
|
||||
3. A browser window will open — log in with Authentik SSO (`vish` / `admin@thevish.io`).
|
||||
|
||||
4. If it shows a "mention all non-default flags" error, copy and run the exact command it provides, adding `--login-server=https://headscale.vish.gg:8443 --force-reauth` to it.
|
||||
|
||||
> **Important**: Always include `--accept-routes` on Windows otherwise subnet routes (e.g. `192.168.0.x`) won't be reachable.
|
||||
|
||||
---
|
||||
|
||||
### iOS (iPhone / iPad)
|
||||
|
||||
1. Install **Tailscale** from the App Store.
|
||||
|
||||
2. Open the app → tap your **account icon** (top right) → **Log in**
|
||||
|
||||
3. Tap the `···` menu (top right of the login screen) → **Use custom coordination server**
|
||||
|
||||
4. Enter: `https://headscale.vish.gg:8443` → **Save**
|
||||
|
||||
5. Log in with Authentik SSO — username `vish` or email `admin@thevish.io`
|
||||
|
||||
> **Note**: `.vish.local` hostnames do NOT work on iOS — iOS intercepts `.local` for mDNS and never forwards to DNS. Use Tailscale IPs (`100.x.x.x`) or MagicDNS names (`hostname.tail.vish.gg`) instead.
|
||||
|
||||
---
|
||||
|
||||
### macOS
|
||||
|
||||
1. Install Tailscale from the App Store or https://tailscale.com/download/mac
|
||||
|
||||
2. **Option A — GUI**: Click the Tailscale menu bar icon → Preferences → hold `Option` while clicking "Log in" to enter a custom server URL → enter `https://headscale.vish.gg:8443`
|
||||
|
||||
3. **Option B — CLI**:
|
||||
```bash
|
||||
sudo tailscale up \
|
||||
--login-server=https://headscale.vish.gg:8443 \
|
||||
--accept-routes \
|
||||
--force-reauth
|
||||
```
|
||||
|
||||
4. Log in with Authentik SSO when the browser opens.
|
||||
|
||||
> **Note**: Same as iOS, `.vish.local` hostnames won't resolve on macOS when remote. Use `hostname.tail.vish.gg` or the Tailscale IP instead.
|
||||
|
||||
---
|
||||
|
||||
### GL.iNet Routers (OpenWrt)
|
||||
|
||||
1. SSH into the router.
|
||||
|
||||
2. Add a hosts entry (since GL routers don't use AdGuard):
|
||||
```bash
|
||||
echo '184.23.52.14 headscale.vish.gg' >> /etc/hosts
|
||||
```
|
||||
|
||||
3. Run tailscale up — it will error with the required flags. Copy and run the exact command it provides, appending:
|
||||
```
|
||||
--login-server=https://headscale.vish.gg:8443 --auth-key=<preauth-key> --force-reauth
|
||||
```
|
||||
Get a pre-auth key from an admin (see below).
|
||||
|
||||
4. If advertising subnet routes, add `--advertise-routes=<subnet>` to the command.
|
||||
|
||||
---
|
||||
|
||||
### Home Assistant (Tailscale Add-on)
|
||||
|
||||
> **Note**: HA Green does not expose SSH by default. Use the WebSocket API approach below,
|
||||
> which works fully remotely via a Tailscale-connected hop host.
|
||||
|
||||
**Remote migration steps** (no physical access required):
|
||||
|
||||
1. Reach HA via a hop host on the same LAN (e.g. jellyfish at `100.69.121.120`):
|
||||
```
|
||||
ssh lulu@100.69.121.120
|
||||
curl http://192.168.12.202:8123/api/ # confirm HA reachable
|
||||
```
|
||||
|
||||
2. If the add-on was previously authenticated to Tailscale cloud, it will refuse
|
||||
`--login-server` change with: `can't change --login-server without --force-reauth`.
|
||||
**Fix**: uninstall + reinstall the add-on via supervisor API to clear `tailscaled.state`:
|
||||
```python
|
||||
# Via HA WebSocket API (supervisor/api endpoint):
|
||||
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/uninstall", "method": "post"}
|
||||
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/install", "method": "post"}
|
||||
```
|
||||
|
||||
3. Set options before starting:
|
||||
```python
|
||||
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/options", "method": "post",
|
||||
"data": {"options": {"login_server": "https://headscale.vish.gg:8443", "accept_dns": false}}}
|
||||
```
|
||||
|
||||
4. Start the add-on via `hassio/addon_start` service, then read logs:
|
||||
```
|
||||
GET http://192.168.12.202:8123/api/hassio/addons/a0d7b954_tailscale/logs
|
||||
```
|
||||
Look for: `AuthURL is https://headscale.vish.gg:8443/register/<key>`
|
||||
|
||||
5. Register on Calypso:
|
||||
```bash
|
||||
docker exec headscale headscale nodes register --user vish --key <key-from-log>
|
||||
```
|
||||
|
||||
6. Fix IP via SQLite (see section above) and restart headscale.
|
||||
|
||||
---
|
||||
|
||||
## Admin: Registering a New Node
|
||||
|
||||
After a node connects, an admin needs to:
|
||||
|
||||
### 1. Generate a Pre-Auth Key (optional, avoids browser auth)
|
||||
|
||||
```bash
|
||||
ssh -p 62000 Vish@192.168.0.250
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
|
||||
headscale preauthkeys create --user 1 --expiration 1h
|
||||
```
|
||||
|
||||
Use `--authkey=<key>` instead of browser auth in `tailscale up`.
|
||||
|
||||
### 2. Check Registered Nodes
|
||||
|
||||
```bash
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale headscale nodes list
|
||||
```
|
||||
|
||||
### 3. Preserve Original Tailscale IP (if migrating from Tailscale cloud)
|
||||
|
||||
Headscale v0.28+ removed the `--ipv4` flag. Fix IPs via SQLite:
|
||||
|
||||
```bash
|
||||
sudo sqlite3 /volume1/@docker/volumes/headscale-data/_data/db.sqlite \
|
||||
"UPDATE nodes SET ipv4='100.x.x.x' WHERE id=<node-id>;"
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker restart headscale
|
||||
```
|
||||
|
||||
### 4. Rename a Node
|
||||
|
||||
```bash
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
|
||||
headscale nodes rename -i <id> <new-name>
|
||||
```
|
||||
|
||||
### 5. Approve Subnet Routes
|
||||
|
||||
Routes advertised by nodes must be explicitly approved:
|
||||
|
||||
```bash
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
|
||||
headscale nodes approve-routes -i <node-id> -r <subnet>
|
||||
# e.g. -r 192.168.0.0/24
|
||||
```
|
||||
|
||||
Check all routes (v0.28 — routes are embedded in node JSON output):
|
||||
```bash
|
||||
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
|
||||
headscale nodes list --output json | python3 -c "
|
||||
import sys,json
|
||||
for n in json.load(sys.stdin):
|
||||
r=n.get('available_routes',[])
|
||||
a=n.get('approved_routes',[])
|
||||
if r: print(n['given_name'], 'available:', r, 'approved:', a)
|
||||
"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## DNS Notes
|
||||
|
||||
- **MagicDNS**: Headscale pushes `192.168.0.250` (Calypso AdGuard) as DNS to all tailnet clients
|
||||
- **AdGuard rewrites**: `*.vish.local` names resolve to their Tailscale IPs via AdGuard rewrites on Calypso
|
||||
- **`.vish.local` on iOS/macOS**: Does NOT work remotely — iOS/macOS intercept `.local` for mDNS. Use `hostname.tail.vish.gg` instead
|
||||
- **External DNS**: `headscale.vish.gg` resolves to `184.23.52.14` (home WAN) externally, `192.168.0.250` internally via AdGuard rewrite
|
||||
|
||||
## Uptime Kuma Monitoring
|
||||
|
||||
Kuma runs on **pi-5** (`100.77.151.40`) inside the `uptime-kuma` container. DB at `/app/data/kuma.db`.
|
||||
|
||||
### Monitor groups and hosts
|
||||
|
||||
| Group | Host | Tailscale IP |
|
||||
|-------|------|-------------|
|
||||
| Homelab | `homelab.tail.vish.gg` | `100.67.40.126` |
|
||||
| Atlantis | `atlantis.tail.vish.gg` | `100.83.230.112` |
|
||||
| Calypso | `calypso.tail.vish.gg` | `100.103.48.78` |
|
||||
| Concord_NUC | `vish-concord-nuc.tail.vish.gg` | `100.72.55.21` |
|
||||
| Setillo | `setillo.tail.vish.gg` | `100.125.0.20` |
|
||||
| Proxmox_NUC | `pve.tail.vish.gg` | `100.87.12.28` |
|
||||
| Guava | `truenas-scale.tail.vish.gg` | `100.75.252.64` |
|
||||
| Seattle | `seattle.tail.vish.gg` | `100.82.197.124` |
|
||||
| Raspberry Pi 5 | `100.77.151.40` | `100.77.151.40` |
|
||||
|
||||
### Firewall rules required for Kuma (pi-5 = `100.77.151.40`)
|
||||
|
||||
Kuma polls via Tailscale IP. Each host with a ts-input/ts-forward chain needs ACCEPT rules for pi-5:
|
||||
|
||||
- **Homelab VM**: Rules in `iptables-legacy` ts-input/ts-forward for pi-5 on all monitored ports. Persisted via `netfilter-persistent`.
|
||||
- **Concord NUC**: Same — ts-input/ts-forward ACCEPT for pi-5 on monitored ports.
|
||||
- **Seattle**: UFW rule `ufw allow from 100.77.151.40 to any port 8444`
|
||||
- **Calypso/Atlantis/Setillo**: No ts-input blocking — Tailscale is in userspace mode on Synology.
|
||||
|
||||
### Duplicate service naming
|
||||
|
||||
Services that exist on both Atlantis and Calypso use prefixes:
|
||||
- `[ATL] Sonarr`, `[ATL] Radarr`, etc. for Atlantis
|
||||
- `[CAL] Sonarr`, `[CAL] Radarr`, etc. for Calypso
|
||||
|
||||
### AdGuard DNS fix for `*.tail.vish.gg` on pi-5
|
||||
|
||||
Pi-5's Docker daemon was using `100.100.100.100` (Tailscale MagicDNS) but AdGuard on Calypso was forwarding `*.vish.gg` to Cloudflare, which returned stale IPs. Fixed by adding a private upstream in AdGuard config at `/volume1/docker/adguard/config/AdGuardHome.yaml`:
|
||||
|
||||
```yaml
|
||||
upstream_dns:
|
||||
- "[/tail.vish.gg/]100.100.100.100"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## NPM Proxy Host Gotcha — Same-Subnet LAN IPs
|
||||
|
||||
**Problem**: NPM on Calypso (`192.168.0.250`) cannot reach Docker-published ports on other hosts
|
||||
that are on the same LAN subnet (`192.168.0.x`).
|
||||
|
||||
**Root cause**: When the `Tailscale_outbound_connections` DSM task runs `tailscale configure-host`
|
||||
on Calypso, it installs kernel netfilter hooks. After this, Docker containers on Calypso sending
|
||||
traffic to a LAN IP on the same subnet bypass the DNAT rules on the destination host (same-subnet
|
||||
traffic doesn't go through PREROUTING on the target). The containers are unreachable via their
|
||||
published ports.
|
||||
|
||||
**Fix**: Always use the **Tailscale IP** as the `forward_host` in NPM for services running in
|
||||
Docker on other hosts, not the LAN IP.
|
||||
|
||||
| Host | Use this in NPM (not LAN IP) |
|
||||
|------|------------------------------|
|
||||
| Homelab VM | `100.67.40.126` |
|
||||
| Guava / TrueNAS | `100.75.252.64` |
|
||||
| Atlantis | `100.83.230.112` |
|
||||
|
||||
**Why it worked pre-Headscale**: Before the migration, Tailscale on Calypso ran in pure userspace
|
||||
mode without kernel netfilter hooks. NPM's outbound packets took the normal kernel path, hitting
|
||||
the destination's Docker DNAT rules correctly. The `configure-host` task (which installs kernel
|
||||
hooks) is required for Headscale's subnet routing to work, which introduced this side effect.
|
||||
|
||||
**Known affected proxy hosts** (already fixed to Tailscale IPs):
|
||||
- `gf.vish.gg` → `100.67.40.126:3300` (Grafana)
|
||||
- `ntfy.vish.gg` → `100.67.40.126:8081` (NTFY)
|
||||
- `hoarder.thevish.io` → `100.67.40.126:3482` (Karakeep)
|
||||
- `binterest.thevish.io` → `100.67.40.126:21544` (Binternet)
|
||||
- `crista.love` → `100.75.252.64:28888` (Guava nginx/static site)
|
||||
|
||||
---
|
||||
|
||||
## DERP Relay Servers
|
||||
|
||||
Three DERP relay regions are configured for redundancy:
|
||||
|
||||
| Region | Code | Host | DERP Port | STUN Port | Notes |
|
||||
|--------|------|------|-----------|-----------|-------|
|
||||
| 900 | home-cal | headscale.vish.gg:8443 | 8443 | none | Headscale built-in, LAN only |
|
||||
| 901 | sea | derp-sea.vish.gg:8444 | 8444 | 3478 | Seattle VPS |
|
||||
| 902 | home-atl | derp-atl.vish.gg:8445 | 8445 | 3480 | Atlantis NAS — added for redundancy |
|
||||
|
||||
> **Important**: Tailscale public DERP servers (sfo, nyc, etc.) are disabled. Headscale nodes cannot authenticate through Tailscale's infrastructure. All relay traffic goes through regions 900, 901, or 902.
|
||||
|
||||
### DERP Infrastructure Notes
|
||||
|
||||
- `derp-sea.vish.gg` → Seattle VPS (`YOUR_WAN_IP`), derper container at `hosts/vms/seattle/derper.yaml`
|
||||
- `derp-atl.vish.gg` → Home public IP (`184.23.52.14`), router forwards `8445/tcp` + `3480/udp` to Atlantis (`192.168.0.200`)
|
||||
- Container deployed as **Portainer stack ID 688** on Atlantis (from `hosts/synology/atlantis/derper.yaml`)
|
||||
- TLS cert at `/volume1/docker/derper-atl/certs/live/derp-atl.vish.gg/` (flat `.crt`/`.key` layout required by derper)
|
||||
- Cloudflare credentials at `/volume1/docker/derper-atl/secrets/cloudflare.ini`
|
||||
- Cert auto-renewed monthly (1st of month, 03:00) by `derper-atl-cert-renewer` sidecar container
|
||||
(certbot/dns-cloudflare + supercronic; logs at `/volume1/docker/derper-atl/certs/renew.log`)
|
||||
- Port 3478/udp: coturn/Jitsi on Atlantis — do not use
|
||||
- Port 3479/udp: coturn/Matrix TURN on matrix-ubuntu — do not use
|
||||
- `derpmap.yaml` lives at `hosts/synology/calypso/derpmap.yaml` in repo; must be manually synced to `/volume1/docker/headscale/config/derpmap.yaml` on Calypso after changes
|
||||
|
||||
## Subnet Routes in Use
|
||||
|
||||
| Subnet | Advertised by | Approved |
|
||||
|--------|--------------|---------|
|
||||
| 192.168.0.0/24 | calypso (primary), atlantis | ✅ |
|
||||
| 192.168.68.0/22 | vish-concord-nuc | ✅ |
|
||||
| 192.168.69.0/24 | setillo | ✅ |
|
||||
| 192.168.12.0/24 | gl-mt3000 | ✅ |
|
||||
|
||||
## Node Inventory
|
||||
|
||||
| ID | Hostname | Tailscale IP | Status |
|
||||
|----|----------|-------------|--------|
|
||||
| 1 | headscale-test | 100.64.0.1 | test LXC |
|
||||
| 2 | seattle (vmi2076105) | 100.82.197.124 | Seattle VPS |
|
||||
| 3 | matrix-ubuntu | 100.85.21.51 | |
|
||||
| 4 | pi-5 | 100.77.151.40 | |
|
||||
| 5 | vish-concord-nuc | 100.72.55.21 | |
|
||||
| 6 | setillo | 100.125.0.20 | |
|
||||
| 7 | pve | 100.87.12.28 | |
|
||||
| 8 | truenas-scale | 100.75.252.64 | Guava/TrueNAS |
|
||||
| 9 | ipad-pro | 100.68.71.48 | |
|
||||
| 10 | iphone16-pro-max | 100.79.252.108 | |
|
||||
| 11 | atlantis | 100.83.230.112 | |
|
||||
| 12 | calypso | 100.103.48.78 | Runs headscale |
|
||||
| 13 | homelab | 100.67.40.126 | |
|
||||
| 14 | uqiyoe | 100.124.91.52 | Windows laptop |
|
||||
| 15 | jellyfish | 100.69.121.120 | Remote location |
|
||||
| 16 | gl-mt3000 | 100.126.243.15 | Remote router |
|
||||
| 17 | gl-be3600 | 100.105.59.123 | Home router |
|
||||
|
||||
### Still to migrate (offline nodes)
|
||||
Run `tailscale up --login-server=https://headscale.vish.gg:8443 --force-reauth` when they come online:
|
||||
|
||||
- kevinlaptop (`100.89.160.65`)
|
||||
- mah-pc (`100.121.22.51`)
|
||||
- shinku-ryuu (`100.98.93.15`)
|
||||
- vish-mint (`100.115.169.43`)
|
||||
- vishdebian (`100.86.60.62`)
|
||||
- mastodon-rocky (`100.111.200.21`)
|
||||
- nvidia-shield (`100.89.79.99`)
|
||||
- pi-5-kevin (`100.123.246.75`)
|
||||
- rocky9-playground (`100.105.250.128`)
|
||||
- samsung-sm-x510 (`100.72.118.117`)
|
||||
- sd (`100.83.141.1`)
|
||||
- bluecrownpassionflower (`100.110.25.127`)
|
||||
- glkvm (`100.64.137.1`)
|
||||
- google-pixel-10-pro (`100.122.119.40`)
|
||||
|
||||
### Home Assistant — Migrated ✅
|
||||
|
||||
**Device**: Home Assistant Green at `192.168.12.202:8123` (jellyfish remote location)
|
||||
**Tailscale IP**: `100.112.186.90` (preserved) | **Node ID**: 19 | **MagicDNS**: `homeassistant.tail.vish.gg`
|
||||
|
||||
**Migration completed** remotely (no physical access needed) via:
|
||||
1. HA WebSocket API (`ws://192.168.12.202:8123/api/websocket`) proxied through jellyfish (`100.69.121.120`)
|
||||
2. Supervisor `addon_configs` API to set `login_server: https://headscale.vish.gg:8443`
|
||||
3. Uninstalled + reinstalled the Tailscale add-on to clear stale `tailscaled.state`
|
||||
(necessary because `can't change --login-server without --force-reauth`)
|
||||
4. Add-on registered against headscale — auth URL approved via `headscale nodes register`
|
||||
5. IP updated via SQLite: `UPDATE nodes SET ipv4='100.112.186.90' WHERE id=19;`
|
||||
|
||||
**Current add-on config**:
|
||||
```json
|
||||
{ "login_server": "https://headscale.vish.gg:8443", "accept_dns": false }
|
||||
```
|
||||
|
||||
**Uptime Kuma monitor**: `[JLF] Home Assistant` (ID 5) → `homeassistant.tail.vish.gg:8123`
|
||||
|
||||
**HA API token** (expires 2028-06-07):
|
||||
`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxMzA1ZTE0NDg2ZGY0NDExYmMyOGEwZTY3ZmUyMTc3NyIsImlhdCI6MTc3MzA1MjkzNywiZXhwIjoyMDg4NDEyOTM3fQ.hzqjg7ALTdTDkMJS9Us-RUetQ309Nmfzx4gXevRRlp8` <!-- pragma: allowlist secret -->
|
||||
|
||||
---
|
||||
|
||||
## Outstanding TODOs
|
||||
|
||||
| Priority | Task | Notes |
|
||||
|----------|------|-------|
|
||||
| Low | **Migrate offline nodes** | ~13 nodes still on Tailscale cloud — migrate when they come online |
|
||||
| Info | **NPM proxy hosts audit** | Going forward, always use Tailscale IPs in NPM for Docker services on other LAN hosts (see NPM section above) |
|
||||
666
docs/infrastructure/hosts.md
Normal file
666
docs/infrastructure/hosts.md
Normal file
@@ -0,0 +1,666 @@
|
||||
# 🏗️ Host Infrastructure Overview
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This homelab consists of multiple hosts running **159 containers** across various hardware platforms. Each host serves specific roles and runs services optimized for its capabilities.
|
||||
|
||||
**Last Verified**: 2026-02-08 via SSH verification (jellyfish added)
|
||||
|
||||
## 📊 Infrastructure Summary
|
||||
|
||||
| Host Category | Count | Total Services | Primary Purpose |
|
||||
|---------------|-------|----------------|-----------------|
|
||||
| **Synology NAS** | 2 | 105 containers | Storage, media, always-on services |
|
||||
| **Proxmox VMs** | 1 | 30 containers | Monitoring, privacy frontends, AI |
|
||||
| **Physical Hosts** | 2 | 24 containers | Home automation, media, networking |
|
||||
| **Edge Devices** | 1 | 4 containers | Uptime monitoring, NAS services |
|
||||
|
||||
> **Note**: This covers Portainer-managed endpoints only. Total: 159 containers across 5 endpoints.
|
||||
|
||||
---
|
||||
|
||||
## 📦 Synology NAS Cluster
|
||||
|
||||
### 🏛️ **Atlantis** - Primary Media & Infrastructure Hub
|
||||
**Hardware**: Synology DS1823xs+ (8-bay enterprise NAS)
|
||||
**Services**: 51 containers
|
||||
**Role**: Core infrastructure, media services, monitoring
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Media Streaming** | Plex, Immich, Tautulli | Personal Netflix and Google Photos |
|
||||
| **Content Management** | Arr Suite (Sonarr, Radarr, etc.) | Automated media acquisition |
|
||||
| **Monitoring** | Grafana, Prometheus, Uptime Kuma | Infrastructure monitoring |
|
||||
| **Security** | Vaultwarden, Pi-hole, Wireguard | Password management, ad blocking |
|
||||
| **Development** | GitLab, Dozzle, Portainer | Code management, container monitoring |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: AMD Ryzen Embedded V1780B (4-core/8-thread, 3.35GHz)
|
||||
- **RAM**: 32GB DDR4 ECC (installed, upgradeable to 64GB)
|
||||
- **Storage**: 8x 16TB Seagate IronWolf Pro (ST16000NT001) - 128TB total capacity
|
||||
- **Drive specs**: Enterprise NAS, CMR, 3.5", SATA 6Gb/s, 7,200 RPM, 256MB cache
|
||||
- **RAID**: Configured for high availability and performance
|
||||
- **Cache**: 2x 480GB WD Black SN750 NVMe SSDs (M.2 slots)
|
||||
- **Network**: 2x Gigabit Ethernet + 10GbE (connected to TP-Link TL-SX1008)
|
||||
- **Power**: ~65W average consumption (with full drive array)
|
||||
|
||||
#### 📁 **Storage Layout**
|
||||
```
|
||||
/volume1/ (128TB total capacity)
|
||||
├── docker/ # Container persistent data
|
||||
├── media/ # Movies, TV shows, music (massive 4K library)
|
||||
├── photos/ # Photo library for Immich (high-resolution storage)
|
||||
├── documents/ # Paperless-NGX documents
|
||||
├── backups/ # Local backup storage
|
||||
├── archive/ # Long-term data archival
|
||||
└── cache/ # NVMe cache acceleration (2x 480GB WD Black SN750)
|
||||
|
||||
# RAID Configuration:
|
||||
# - 8x 16TB Seagate IronWolf Pro drives
|
||||
# - Enterprise-grade CMR technology
|
||||
# - 7,200 RPM, 256MB cache per drive
|
||||
# - Configured for optimal performance and redundancy
|
||||
```
|
||||
|
||||
#### 🌐 **Key Ports & Access**
|
||||
- **Plex**: `atlantis.local:32400`
|
||||
- **Grafana**: `atlantis.local:7099`
|
||||
- **Portainer**: `atlantis.local:9000`
|
||||
- **DokuWiki**: `atlantis.local:8399`
|
||||
|
||||
---
|
||||
|
||||
### 🏢 **Calypso** - Development & Secondary Services
|
||||
**Hardware**: Synology DS723+ (2-bay plus NAS)
|
||||
**Services**: 54 containers
|
||||
**Role**: Development tools, backup services, package caching, SSO authentication
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Development** | Gitea, Reactive Resume, Gitea Runner | Git hosting, CI/CD, resume builder |
|
||||
| **Finance** | Actual Budget | Personal finance management |
|
||||
| **Authentication** | Authentik SSO | Single sign-on for all services |
|
||||
| **Infrastructure** | APT-Cacher-NG, Nginx Proxy Manager | Package caching, reverse proxy |
|
||||
| **Media** | Immich, Arr Suite, Tdarr | Media services, transcoding |
|
||||
| **Documents** | Paperless-NGX | Document management |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: AMD Ryzen R1600 (2-core, 2.6GHz)
|
||||
- **RAM**: 32GB DDR4 (fully upgraded from 2GB)
|
||||
- **Storage**: 2x 12TB Seagate IronWolf Pro (ST12000NT001) - 24TB total capacity
|
||||
- **Drive specs**: Enterprise NAS, CMR, 3.5", SATA 6Gb/s, 7,200 RPM, 256MB cache
|
||||
- **RAID**: RAID 1 for redundancy
|
||||
- **Cache**: 2x 480GB WD Black SN750 NVMe SSDs (M.2 slot)
|
||||
- **Network**: 2x Gigabit Ethernet + 10GbE PCIe card (connected to TP-Link TL-SX1008)
|
||||
- **Expansion**: 10 Gigabit Ethernet PCIe card for high-speed connectivity
|
||||
- **Power**: ~25W average consumption
|
||||
|
||||
#### 📁 **Storage Layout**
|
||||
```
|
||||
/volume1/ (24TB total capacity - RAID 1)
|
||||
├── docker/ # Container data
|
||||
├── apt-cache/ # Debian package cache (high-speed access)
|
||||
├── backups/ # Backup destination from Atlantis
|
||||
├── development/ # Git repositories and development data
|
||||
└── cache/ # NVMe cache acceleration (2x 480GB WD Black SN750)
|
||||
|
||||
# RAID Configuration:
|
||||
# - 2x 12TB Seagate IronWolf Pro drives in RAID 1
|
||||
# - Enterprise-grade CMR technology
|
||||
# - 7,200 RPM, 256MB cache per drive
|
||||
# - Full redundancy with 10GbE connectivity
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 🔍 **Setillo** - Remote Monitoring & Offsite Backup
|
||||
**Hardware**: Synology DS223j (2-bay entry-level NAS)
|
||||
**Services**: 4 containers
|
||||
**Role**: Remote monitoring, offsite backup, Plex server (Tucson, AZ)
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Monitoring** | Prometheus, AdGuard | Network monitoring, DNS filtering |
|
||||
| **Network** | SNMP Exporter | Network device monitoring |
|
||||
| **Media** | Plex Media Server | Remote media streaming |
|
||||
| **Backup** | HyperBackup | Offsite backup destination |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Realtek RTD1619B (4-core, 1.7GHz ARM Cortex-A55, aarch64)
|
||||
- **RAM**: 1GB DDR4 (non-upgradeable)
|
||||
- **Storage**: 2x 10TB WD Gold Enterprise drives (SHR, ~8.9TB usable)
|
||||
- **Network**: 1x Gigabit Ethernet
|
||||
- **Tailscale IP**: 100.125.0.20
|
||||
- **Location**: Tucson, AZ (remote, Tailscale-only access)
|
||||
- **Power**: ~8W average consumption
|
||||
|
||||
---
|
||||
|
||||
## 💻 Proxmox Virtual Machines
|
||||
|
||||
### 🏠 **Homelab VM** - General Purpose Experimentation
|
||||
**Host**: Proxmox VE
|
||||
**Services**: 30 containers
|
||||
**Role**: Monitoring hub, privacy frontends, AI tools
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Monitoring** | Grafana, Prometheus, Alertmanager | Centralized monitoring |
|
||||
| **Notifications** | NTFY, Signal API | Push notifications |
|
||||
| **Privacy** | Redlib, Binternet, Proxitok | Privacy-respecting frontends |
|
||||
| **Archiving** | ArchiveBox, Hoarder/Karakeep | Web archiving, bookmarks |
|
||||
| **AI** | Perplexica, OpenHands | AI search, development agent |
|
||||
|
||||
#### 🔧 **VM Specifications**
|
||||
- **vCPU**: 4 cores
|
||||
- **RAM**: 8GB
|
||||
- **Storage**: 100GB SSD
|
||||
- **Network**: Bridged to main network
|
||||
- **OS**: Ubuntu 22.04 LTS
|
||||
|
||||
---
|
||||
|
||||
### 🌍 **matrix-ubuntu** - Communication Services VM
|
||||
**Host**: Atlantis (Synology Virtual Machine Manager)
|
||||
**Services**: Matrix Synapse, Mattermost, Mastodon
|
||||
**Role**: Decentralized communication platform
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Communication** | Matrix (Synapse) | Decentralized chat server (mx.vish.gg) |
|
||||
| **Chat** | Mattermost | Team messaging (mm.crista.love) |
|
||||
| **Social** | Mastodon | Federated social network (mastodon.vish.gg) |
|
||||
|
||||
#### 🔧 **VM Specifications**
|
||||
- **vCPU**: 4 cores (AMD Ryzen Embedded V1780B)
|
||||
- **RAM**: 8GB
|
||||
- **Storage**: 100GB (87GB available)
|
||||
- **OS**: Ubuntu 24.04.3 LTS
|
||||
- **LAN IP**: 192.168.0.154
|
||||
- **Tailscale IP**: 100.85.21.51
|
||||
- **SSH Port**: 65533
|
||||
|
||||
---
|
||||
|
||||
## 🖥️ Physical Hosts
|
||||
|
||||
### 🎨 **Shinku-Ryuu** - Primary Desktop Workstation
|
||||
**Hardware**: Custom built gaming/workstation in HYTE Y70 Red case
|
||||
**Services**: Development environment, creative workstation
|
||||
**Role**: Primary development machine, creative work, high-performance computing
|
||||
|
||||
#### 🎯 **Primary Use Cases**
|
||||
| Category | Purpose | Applications |
|
||||
|----------|---------|-------------|
|
||||
| **Development** | Software development, coding | VS Code, IDEs, Docker Desktop |
|
||||
| **Creative** | Content creation, design | Adobe Creative Suite, Blender |
|
||||
| **Gaming** | High-end gaming, streaming | Steam, OBS, game development |
|
||||
| **AI/ML** | Machine learning development | PyTorch, TensorFlow, CUDA workloads |
|
||||
| **Homelab Management** | Infrastructure administration | SSH clients, monitoring dashboards |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Intel Core i7-14700K (20-core, 3.4GHz base, 5.6GHz boost)
|
||||
- **RAM**: 96GB DDR4 (high-capacity for AI/ML workloads)
|
||||
- **GPU**: NVIDIA RTX 4080 (16GB VRAM for AI/gaming)
|
||||
- **Storage**: 2TB+ NVMe SSD (high-speed storage)
|
||||
- **Case**: HYTE Y70 Red (premium gaming case with excellent airflow)
|
||||
- **Network**: Gigabit Ethernet + WiFi 6E + 10GbE (connected to TP-Link TL-SX1008)
|
||||
- **OS**: Windows 11 Pro (with WSL2 for Linux development)
|
||||
|
||||
---
|
||||
|
||||
### ⚡ **Anubis** - Legacy Mac Mini Server
|
||||
**Hardware**: Apple Mac Mini (Late 2014)
|
||||
**Services**: 8 containers
|
||||
**Role**: Legacy services, lightweight workloads, testing
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **AI/ML** | ChatGPT Interface | AI chat applications |
|
||||
| **Media** | PhotoPrism | AI-powered photo management |
|
||||
| **Communication** | Element, Conduit | Matrix client and server |
|
||||
| **Productivity** | Draw.io, ArchiveBox | Diagramming, web archiving |
|
||||
| **Monitoring** | Pi Alert | Network device discovery |
|
||||
| **Privacy** | Proxitok | TikTok privacy frontend |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Intel Core i5-4278U (2-core, 2.6GHz, Haswell)
|
||||
- **RAM**: 8GB DDR3L (soldered, non-upgradeable)
|
||||
- **GPU**: Intel Iris 5100 (integrated graphics)
|
||||
- **Storage**: 1TB Fusion Drive (128GB SSD + 1TB HDD hybrid)
|
||||
- **Network**: Gigabit Ethernet + 802.11ac WiFi
|
||||
- **Ports**: 2x Thunderbolt 2, 4x USB 3.0, HDMI, SDXC
|
||||
- **OS**: macOS (potentially running Docker via VM or Linux)
|
||||
|
||||
---
|
||||
|
||||
### 🧠 **Guava** - TrueNAS Scale Workstation
|
||||
**Hardware**: Custom built AMD workstation in SilverStone SUGO 16 case
|
||||
**Services**: 12+ containers (TrueNAS apps)
|
||||
**Role**: Storage server, media, AI/ML, development, compute-intensive tasks
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Media** | Jellyfin | Media streaming server |
|
||||
| **AI/ML** | Ollama, LlamaGPT | Local language models |
|
||||
| **Development** | Gitea, CoCalc | Git hosting, collaborative computing |
|
||||
| **Health** | Fasten Health | Personal health record management |
|
||||
| **Infrastructure** | Portainer, Nginx, Fenrus | Container management, dashboard |
|
||||
| **Networking** | WireGuard, Tailscale | VPN server, mesh networking |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **OS**: TrueNAS Scale 25.04.2.6 (Dragonfish, Debian-based)
|
||||
- **Motherboard**: ASRock B850I Lightning WiFi (Mini-ITX)
|
||||
- **CPU**: AMD Ryzen 5 8600G (6-core/12-thread, 4.3GHz base, 5.0GHz boost, Zen 4)
|
||||
- **RAM**: 32GB DDR5-5600
|
||||
- **GPU**: Integrated AMD Radeon 760M (RDNA 3 iGPU)
|
||||
- **Storage**: ZFS Mirror — 2x WD Blue SA510 4TB SATA SSD (data pool) + WD Black SN770 500GB NVMe (boot)
|
||||
- **Case**: SilverStone SUGO 16 (compact Mini-ITX case)
|
||||
- **Network**: Mellanox ConnectX-5 10GbE (connected to TP-Link TL-SX1008)
|
||||
- **LAN IP**: 192.168.0.100
|
||||
- **Tailscale IP**: 100.75.252.64
|
||||
|
||||
---
|
||||
|
||||
### 💻 **MSI Prestige 13 AI Plus** - Travel Laptop
|
||||
**Hardware**: MSI Prestige 13 AI Plus Ukiyo-e Edition (A2VMX)
|
||||
**Role**: Primary travel workstation with AI acceleration
|
||||
**Connectivity**: Tailscale mesh networking for homelab access
|
||||
|
||||
#### 🎯 **Primary Use Cases**
|
||||
| Category | Use Case | Homelab Integration |
|
||||
|----------|----------|-------------------|
|
||||
| **Development** | Remote coding, Git operations | Full GitLab access via Tailscale |
|
||||
| **Content Creation** | Photo/video editing, AI processing | Access to Atlantis media storage |
|
||||
| **Productivity** | Document editing, presentations | Paperless-NGX, file sync |
|
||||
| **Communication** | Video calls, messaging | Matrix, Jitsi via homelab |
|
||||
| **Security** | Password management, 2FA | Vaultwarden access |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Intel Core Ultra 7 258V (8-core, up to 4.8GHz, Meteor Lake)
|
||||
- **GPU**: Intel Arc Graphics (integrated, AI-optimized)
|
||||
- **AI Accelerator**: Intel AI Boost NPU (up to 47 TOPS)
|
||||
- **RAM**: 32GB LPDDR5X (high-speed, soldered)
|
||||
- **Storage**: 1TB PCIe 4.0 NVMe SSD
|
||||
- **Display**: 13.3" OLED 2.8K (2880x1800) 100% DCI-P3, touch-enabled
|
||||
- **Network**: Wi-Fi 7 (802.11be), Bluetooth 5.4
|
||||
- **Ports**: 2x Thunderbolt 4, 1x USB-A 3.2, 1x HDMI 2.1, 1x Audio
|
||||
- **Battery**: 75Wh with fast charging support
|
||||
- **Weight**: 2.18 lbs (990g) ultra-portable
|
||||
- **OS**: Windows 11 Pro with WSL2 for Linux development
|
||||
- **Tailscale IP**: 100.80.0.26 (msi)
|
||||
|
||||
#### 🌐 **Connectivity Features**
|
||||
- **Wi-Fi 7**: Latest wireless standard for maximum performance
|
||||
- **Thunderbolt 4**: High-speed external storage and displays
|
||||
- **HDMI 2.1**: 4K@120Hz external monitor support
|
||||
- **Tailscale Integration**: Seamless homelab access from anywhere
|
||||
- **GL.iNet Compatibility**: Works with all travel router configurations
|
||||
|
||||
#### 🎨 **Special Edition Features**
|
||||
- **Ukiyo-e Design**: Traditional Japanese art-inspired aesthetics
|
||||
- **Premium Build**: Magnesium-aluminum alloy construction
|
||||
- **OLED Display**: True blacks, vibrant colors for creative work
|
||||
- **AI Optimization**: Hardware-accelerated AI workloads
|
||||
|
||||
#### 🔗 **Homelab Integration**
|
||||
- **Remote Development**: Full access to development environments
|
||||
- **Media Access**: Stream from Plex/Jellyfin via Tailscale
|
||||
- **File Synchronization**: Seamless access to NAS storage
|
||||
- **Monitoring**: View Grafana dashboards and system status
|
||||
- **Security**: Vaultwarden for password management
|
||||
- **Communication**: Matrix, Element for team collaboration
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Edge Devices
|
||||
|
||||
### 🏠 **Concord NUC** - Home Automation Hub
|
||||
**Hardware**: Intel NUC6i3SYB (6th gen NUC)
|
||||
**Services**: 9 containers
|
||||
**Role**: Home automation, IoT hub, edge computing
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Home Automation** | Home Assistant | Smart home control center |
|
||||
| **Security** | AdGuard Home, Wireguard | DNS filtering, VPN access |
|
||||
| **Media** | Invidious, YourSpotify | Privacy-focused media |
|
||||
| **Infrastructure** | Dynamic DNS, Syncthing | Network services, file sync |
|
||||
| **Gaming** | Don't Starve Together | Game server hosting |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Intel Core i3-6100U (2-core, 2.3GHz)
|
||||
- **RAM**: 16GB DDR4 (upgraded from 4GB)
|
||||
- **Storage**: 256GB M.2 SATA SSD
|
||||
- **Network**: Gigabit Ethernet + WiFi AC
|
||||
- **Power**: ~10W average consumption
|
||||
- **OS**: Ubuntu 22.04 LTS
|
||||
|
||||
---
|
||||
|
||||
### 🥧 **Raspberry Pi Cluster**
|
||||
|
||||
#### **Pi-5 (Vish)** - Primary Pi Node
|
||||
**Hardware**: Raspberry Pi 5 16GB in PiRonMan 5 Max case
|
||||
**Services**: 1 container
|
||||
**Role**: Lightweight services, sensors, development
|
||||
|
||||
- **CPU**: Broadcom BCM2712 (4-core, 2.4GHz)
|
||||
- **RAM**: 16GB LPDDR4X (maximum capacity model)
|
||||
- **Storage**: 235GB microSD + USB SSD
|
||||
- **Case**: SunFounder PiRonMan 5 Max (premium case with cooling and expansion)
|
||||
- **Network**: Gigabit Ethernet + WiFi 6
|
||||
- **Features**: Enhanced cooling, GPIO expansion, OLED display
|
||||
|
||||
#### **Pi-5-Kevin** - Secondary Pi Node
|
||||
**Hardware**: Raspberry Pi 5 8GB
|
||||
**Services**: 1 container
|
||||
**Role**: Backup services, IoT sensors
|
||||
**Status**: Frequently offline (typically powered off or disconnected)
|
||||
|
||||
- **CPU**: Broadcom BCM2712 (4-core, 2.4GHz)
|
||||
- **RAM**: 8GB LPDDR4X
|
||||
- **Storage**: 64GB microSD
|
||||
- **Network**: Gigabit Ethernet + WiFi 6
|
||||
|
||||
> **Note**: This Pi node may be unavailable as it is occasionally disconnected and not always actively managed.
|
||||
|
||||
#### **Jellyfish** - NAS & Media Server Pi
|
||||
**Hardware**: Raspberry Pi 5 Model B Rev 1.0 (4GB)
|
||||
**Services**: Docker containers, NAS storage
|
||||
**Role**: Network Attached Storage, media server, lightweight services
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Category | Services | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Storage** | NAS services | 3.6TB external storage mounted at /srv/nas |
|
||||
| **Network** | Tailscale VPN | Remote access via 100.69.121.120 |
|
||||
| **Infrastructure** | Docker containers | Container orchestration |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: ARM Cortex-A76 (4-core, 1.5-2.4GHz)
|
||||
- **RAM**: 4GB LPDDR4X
|
||||
- **Storage**: 29GB microSD (root) + 3.6TB external SSD (NAS)
|
||||
- **Network**: Gigabit Ethernet (192.168.12.181) + WiFi (192.168.12.182) + Tailscale VPN
|
||||
- **OS**: Debian GNU/Linux 13 (trixie) with kernel 6.12.47+rpt-rpi-2712
|
||||
- **Uptime**: 38+ days (highly stable)
|
||||
- **Power**: Low power consumption ARM architecture
|
||||
|
||||
#### 🌐 **Network Configuration**
|
||||
- **Local Ethernet**: 192.168.12.181/24 (MAC: 2c:cf:67:24:39:d6)
|
||||
- **Local WiFi**: 192.168.12.182/24 (MAC: 2c:cf:67:24:39:d7)
|
||||
- **Tailscale VPN**: 100.69.121.120/32 (secure remote access)
|
||||
- **Docker Networks**: Bridge networks for container isolation
|
||||
|
||||
#### 💾 **Storage Layout**
|
||||
```
|
||||
/dev/mmcblk0p2 29G 8.4G 20G 31% / # Root filesystem (SD card)
|
||||
/dev/mapper/ssd 3.6T 1.8T 1.7T 53% /srv/nas # External NAS storage
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌍 Remote Systems
|
||||
|
||||
### 🌙 **Moon** - Remote Desktop Workstation
|
||||
**Hardware**: MSI MS-7E03 (Z790), Intel i7-14700K
|
||||
**Hostname**: moon
|
||||
**Headscale IP**: 100.64.0.6
|
||||
**LAN IP**: 192.168.12.223 (behind GL-MT3000)
|
||||
**SSH**: `ssh moon` (direct via Tailscale)
|
||||
**Role**: Remote workstation, runs local Headscale instance
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Service | Purpose |
|
||||
|---------|---------|
|
||||
| Headscale v0.23.0-rc.1 | Local Headscale instance (primary runs on Calypso) |
|
||||
| Docker | Container runtime |
|
||||
| Glances | System monitoring |
|
||||
| iperf3 | Network performance testing |
|
||||
|
||||
#### 🔧 **Technical Specifications**
|
||||
- **CPU**: Intel Core i7-14700K (20-core, Raptor Lake-S)
|
||||
- **RAM**: 48GB DDR5
|
||||
- **Storage**: 2x NVMe SSD (WD Black SN770 + SanDisk SN8000S), 456GB root
|
||||
- **GPU**: Intel UHD Graphics 770 (iGPU)
|
||||
- **OS**: Debian 12 (bookworm) with GNOME desktop
|
||||
- **Network**: Intel I226-V 2.5GbE + Intel CNVi WiFi
|
||||
|
||||
#### 📝 **Notes**
|
||||
- Migrated from public Tailscale to self-hosted Headscale on 2026-03-14
|
||||
- `accept_routes=true` — routes `192.168.0.0/24` via Calypso for home LAN access
|
||||
- Headscale runs as a systemd service (not Docker)
|
||||
|
||||
---
|
||||
|
||||
### ☁️ **Seattle (Contabo VPS)** - Cloud Services & Exit Node
|
||||
**Provider**: Contabo GmbH
|
||||
**Tailscale Name**: `seattle` (100.82.197.124)
|
||||
**Hostname**: `vmi2076105.contaboserver.net`
|
||||
**Services**: Multiple Docker stacks
|
||||
**Role**: Cloud services, public-facing apps, Tailscale exit node
|
||||
|
||||
#### 🎯 **Primary Services**
|
||||
| Container | Purpose |
|
||||
|-----------|---------|
|
||||
| `padloc` (nginx/server/pwa) | Padloc password manager |
|
||||
| `keeweb` | KeeWeb password manager |
|
||||
| `obsidian` | Obsidian sync server |
|
||||
| `wallabag` | Read-it-later / article archiving |
|
||||
| `derper` | DERP relay server for Headscale |
|
||||
| `diun` | Docker image update notifier |
|
||||
| `dozzle-agent` | Log viewer agent |
|
||||
| `ddns-*` | Cloudflare DDNS updaters |
|
||||
|
||||
#### 🔧 **VM Specifications**
|
||||
- **vCPU**: 16 cores (AMD EPYC)
|
||||
- **RAM**: 62GB
|
||||
- **Storage**: 290GB NVMe (142GB used)
|
||||
- **Network**: Unmetered (Contabo)
|
||||
- **Location**: Seattle, WA (US West)
|
||||
- **OS**: Ubuntu 24.04.4 LTS
|
||||
- **Tailscale**: Exit node (100.82.197.124)
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Network Architecture
|
||||
|
||||
### 🚀 **10 Gigabit Ethernet Infrastructure**
|
||||
|
||||
#### **TP-Link TL-SX1008 - 10GbE Switch**
|
||||
**Hardware**: 8-port 10 Gigabit Ethernet unmanaged switch
|
||||
**Role**: High-speed backbone for storage and compute-intensive systems
|
||||
|
||||
#### **10GbE Connected Systems**
|
||||
| Host | 10GbE Interface | Primary Use Case |
|
||||
|------|----------------|------------------|
|
||||
| **Atlantis** | Built-in 10GbE | Media streaming, backup operations |
|
||||
| **Calypso** | PCIe 10GbE card | Development, package caching |
|
||||
| **Shinku-Ryuu** | PCIe 10GbE card | Gaming, creative work, large file transfers |
|
||||
| **Guava** | PCIe 10GbE card | AI/ML datasets, model training |
|
||||
|
||||
#### **Network Performance Benefits**
|
||||
- **Media Streaming**: 4K/8K content delivery without buffering
|
||||
- **Backup Operations**: Fast inter-NAS synchronization
|
||||
- **Development**: Rapid Docker image pulls, package caching
|
||||
- **AI/ML**: High-speed dataset transfers for training
|
||||
- **Creative Work**: Large video/photo file transfers
|
||||
|
||||
### 🔗 **Network Topology**
|
||||
```
|
||||
Internet (25Gbps Fiber)
|
||||
│
|
||||
├── TP-Link Archer BE800 Router (192.168.0.1)
|
||||
│ ├── Main Network (192.168.0.0/24) - trusted devices
|
||||
│ └── TP-Link TL-SX1008 (10GbE Switch)
|
||||
│ ├── Atlantis (192.168.0.200) - 10GbE
|
||||
│ ├── Calypso (192.168.0.250) - 10GbE
|
||||
│ ├── Guava (192.168.0.100) - 10GbE
|
||||
│ └── Shinku-Ryuu (192.168.0.3) - 10GbE
|
||||
│
|
||||
│
|
||||
├── GL-MT3000 Router (192.168.12.1) — remote location
|
||||
│ ├── moon (192.168.12.223) — i7-14700K desktop
|
||||
│ ├── jellyfish (192.168.12.181) — Pi 5 NAS
|
||||
│ └── homeassistant (192.168.12.202) — HA Green
|
||||
│
|
||||
└── Headscale VPN Overlay (self-hosted at headscale.vish.gg:8443, runs on Calypso)
|
||||
├── Atlantis (100.83.230.112)
|
||||
├── Calypso (100.103.48.78) ← advertises 192.168.0.0/24 subnet route
|
||||
├── Guava (100.75.252.64) ← accept_routes=false (avoids routing loop)
|
||||
├── Setillo (100.125.0.20) ← Tucson, AZ
|
||||
├── Seattle VPS (100.82.197.124) ← Contabo, exit node
|
||||
├── Homelab VM (100.67.40.126)
|
||||
├── moon (100.64.0.6) ← accept_routes=true
|
||||
└── All other 10+ nodes...
|
||||
```
|
||||
|
||||
### 🏷️ **Tailscale Network Status**
|
||||
Based on current network status (`tailscale status`):
|
||||
|
||||
#### **Active Homelab Infrastructure**
|
||||
| Host | Tailscale IP | Status | Connection | Primary Access |
|
||||
|------|--------------|--------|------------|----------------|
|
||||
| **Atlantis** | 100.83.230.112 | Active | Direct (192.168.0.200) | atlantis.tail.vish.gg | OOB: 192.168.0.80 |
|
||||
| **Calypso** | 100.103.48.78 | Active | Direct (192.168.0.250) | calypso.tail.vish.gg |
|
||||
| **Setillo** | 100.125.0.20 | Active | Direct (98.97.118.125) | setillo.tail.vish.gg |
|
||||
| **Homelab VM** | 100.67.40.126 | Online | Local | homelab.tail.vish.gg |
|
||||
| **Pi-5** | 100.77.151.40 | Active | Direct (192.168.0.66) | pi-5.tail.vish.gg |
|
||||
| **PVE** | 100.87.12.28 | Active | Direct (192.168.0.205) | pve.tail.vish.gg |
|
||||
| **TrueNAS Scale** | 100.75.252.64 | Active | Direct (192.168.0.100) | truenas-scale.tail.vish.gg |
|
||||
| **Shinku-Ryuu** | 100.98.93.15 | Active | Direct (184.23.52.219) | shinku-ryuu.tail.vish.gg |
|
||||
| **Concord NUC** | 100.72.55.21 | Active | Direct (YOUR_WAN_IP) | vish-concord-nuc.tail.vish.gg |
|
||||
| **Seattle VPS** | 100.82.197.124 | Active | Direct | seattle.tail.vish.gg |
|
||||
|
||||
#### **Mobile & Travel Devices**
|
||||
| Device | Tailscale IP | Status | Type | Access |
|
||||
|--------|--------------|--------|------|--------|
|
||||
| **MSI Prestige 13 AI** | 100.80.0.26 | Offline (1h ago) | Windows | msi.tail.vish.gg |
|
||||
| **iPhone 16** | 100.79.252.108 | Offline (1d ago) | iOS | iphone16.tail.vish.gg |
|
||||
| **iPad Pro 12.9"** | 100.68.71.48 | Offline (19h ago) | iOS | ipad-pro-12-9-6th-gen-wificellular.tail.vish.gg |
|
||||
| **GL-BE3600** | 100.105.59.123 | Offline (7h ago) | Linux | gl-be3600.tail.vish.gg |
|
||||
| **GL-MT3000** | 100.126.243.15 | Offline | Linux | gl-mt3000.tail.vish.gg |
|
||||
| **GL-RM1 KVM** | 100.64.137.1 | Offline (20d ago) | Linux | glkvm.tail.vish.gg |
|
||||
|
||||
#### **Secondary Systems**
|
||||
| Host | Tailscale IP | Status | Purpose | Access |
|
||||
|------|--------------|--------|---------|--------|
|
||||
| **moon** | 100.64.0.6 | Active | Remote desktop workstation | `ssh moon` |
|
||||
| **Pi-5-Kevin** | 100.123.246.75 | Offline | Secondary Pi | pi-5-kevin.tail.vish.gg |
|
||||
| **Home Assistant VM** | 100.125.209.124 | Idle | Smart Home | homeassistant-vm.tail.vish.gg |
|
||||
| **NVIDIA Shield** | 100.89.79.99 | Offline | Media Player | nvidia-shield-android-tv.tail.vish.gg |
|
||||
|
||||
#### **Exit Nodes Available**
|
||||
- **Concord NUC** (100.72.55.21) - Family network bridge
|
||||
- **Home Assistant VM** (100.125.209.124) - Smart home network
|
||||
|
||||
#### **Network Health Notes**
|
||||
- Some peers advertising routes but `--accept-routes` is false
|
||||
- Direct connections established for most active systems
|
||||
- Relay connections used when direct connection unavailable
|
||||
|
||||
---
|
||||
|
||||
## 📊 Resource Utilization
|
||||
|
||||
### 💾 **Storage Distribution**
|
||||
| Host | Total Storage | Used | Available | Type |
|
||||
|------|---------------|------|-----------|------|
|
||||
| **Atlantis** | 128TB | ~60TB | ~68TB | 8x 16TB IronWolf Pro + NVMe cache |
|
||||
| **Calypso** | 24TB | ~12TB | ~12TB | 2x 12TB IronWolf Pro RAID 1 + NVMe cache |
|
||||
| **Setillo** | 1TB | 400GB | 600GB | Single drive |
|
||||
| **Anubis** | 1TB | 600GB | 400GB | Fusion Drive (hybrid SSD/HDD) |
|
||||
| **Guava** | 6TB | 2TB | 4TB | NVMe + HDD |
|
||||
|
||||
### ⚡ **Power Consumption**
|
||||
| Host Category | Power Usage | Annual Cost* |
|
||||
|---------------|-------------|--------------|
|
||||
| **Synology NAS** | ~90W | $195 |
|
||||
| **Proxmox Host** | ~150W | $325 |
|
||||
| **Physical Hosts** | ~280W | $610 |
|
||||
| **Edge Devices** | ~25W | $55 |
|
||||
| **Total** | ~545W | $1,185 |
|
||||
|
||||
*Based on $0.25/kWh electricity rate
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Management & Automation
|
||||
|
||||
### 🤖 **Ansible Inventory**
|
||||
All hosts are managed through Ansible with the following groups:
|
||||
|
||||
```ini
|
||||
[synology]
|
||||
atlantis ansible_host=100.83.230.112 ansible_port=60000
|
||||
calypso ansible_host=100.103.48.78 ansible_port=62000
|
||||
setillo ansible_host=100.125.0.20
|
||||
|
||||
[proxmox_vms]
|
||||
homelab ansible_host=100.67.40.126
|
||||
matrix-ubuntu ansible_host=100.85.21.51 ansible_port=65533
|
||||
|
||||
[physical_hosts]
|
||||
shinku-ryuu ansible_host=100.98.93.15
|
||||
guava ansible_host=100.75.252.64
|
||||
|
||||
[edge_devices]
|
||||
concord-nuc ansible_host=100.72.55.21
|
||||
pi-5 ansible_host=100.77.151.40
|
||||
pi-5-kevin ansible_host=100.123.246.75
|
||||
jellyfish ansible_host=100.69.121.120
|
||||
|
||||
[remote]
|
||||
seattle ansible_host=100.82.197.124
|
||||
```
|
||||
|
||||
### 📋 **Common Management Tasks**
|
||||
- **Health Checks**: Automated service monitoring
|
||||
- **Updates**: Coordinated system and container updates
|
||||
- **Backups**: Automated backup orchestration
|
||||
- **Deployment**: New service deployment across hosts
|
||||
- **Configuration**: Consistent configuration management
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Scaling Strategy
|
||||
|
||||
### 📈 **Horizontal Scaling**
|
||||
- **Add new VMs**: Easy to provision on Proxmox
|
||||
- **Expand Pi cluster**: Add more Raspberry Pi nodes
|
||||
- **Cloud integration**: Utilize remote VPS for specific workloads
|
||||
|
||||
### 📊 **Vertical Scaling**
|
||||
- **Memory upgrades**: Most hosts support RAM expansion
|
||||
- **Storage expansion**: Add drives to NAS units
|
||||
- **CPU upgrades**: Replace older hardware as needed
|
||||
|
||||
### 🔄 **Load Distribution**
|
||||
- **Service placement**: Optimize services based on host capabilities
|
||||
- **Database clustering**: Distribute database workloads
|
||||
- **CDN integration**: Use edge nodes for content delivery
|
||||
|
||||
---
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
| Document | Description |
|
||||
|----------|-------------|
|
||||
| **[Network Architecture](networking.md)** | 25Gbps internet, 10GbE backbone, Cloudflare, DNS |
|
||||
| **[Security Model](security.md)** | Firewall, authentication, secrets, backups |
|
||||
| **[Storage Systems](storage.md)** | RAID configs, backup strategy, 3-2-1 compliance |
|
||||
| **[Service Categories](../services/categories.md)** | What services run where |
|
||||
|
||||
---
|
||||
|
||||
*This infrastructure has evolved over time and continues to grow. Each host serves specific purposes while contributing to the overall homelab ecosystem.*
|
||||
|
||||
*Last updated: March 2026*
|
||||
228
docs/infrastructure/hosts/atlantis-runbook.md
Normal file
228
docs/infrastructure/hosts/atlantis-runbook.md
Normal file
@@ -0,0 +1,228 @@
|
||||
# Atlantis Runbook
|
||||
|
||||
*Synology DS1821+ - Primary NAS and Media Server*
|
||||
|
||||
**Endpoint ID:** 2
|
||||
**Status:** 🟢 Online
|
||||
**Hardware:** AMD Ryzen V1500B, 32GB RAM, 8 bays
|
||||
**Access:** `atlantis.vish.local`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Atlantis is the primary Synology NAS serving as the homelab's central storage and media infrastructure.
|
||||
|
||||
## Hardware Specs
|
||||
|
||||
| Component | Specification |
|
||||
|----------|---------------|
|
||||
| Model | Synology DS1821+ |
|
||||
| CPU | AMD Ryzen V1500B (4-core) |
|
||||
| RAM | 32GB |
|
||||
| Storage | 8-bay RAID6 + SSD cache |
|
||||
| Network | 4x 1GbE (Link aggregated) |
|
||||
|
||||
## Services
|
||||
|
||||
### Critical Services
|
||||
|
||||
| Service | Port | Purpose | Docker Image |
|
||||
|---------|------|---------|--------------|
|
||||
| **Vaultwarden** | 8080 | Password manager | vaultwarden/server |
|
||||
| **Immich** | 2283 | Photo backup | immich-app/immich |
|
||||
| **Plex** | 32400 | Media server | plexinc/pms-docker |
|
||||
| **Ollama** | 11434 | AI/ML | ollama/ollama |
|
||||
|
||||
### Media Stack
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| arr-suite | Various | Sonarr, Radarr, Lidarr, Prowlarr |
|
||||
| qBittorrent | 8080 | Download client |
|
||||
| Jellyseerr | 5055 | Media requests |
|
||||
|
||||
### Infrastructure
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| Portainer | 9000 | Container management |
|
||||
| Watchtower | 9001 | Auto-updates |
|
||||
| Dozzle | 8081 | Log viewer |
|
||||
| Nginx Proxy Manager | 81/444 | Legacy proxy |
|
||||
|
||||
### Additional Services
|
||||
|
||||
- Jitsi (Video conferencing)
|
||||
- Matrix/Synapse (Chat)
|
||||
- Mastodon (Social)
|
||||
- Paperless-NGX (Documents)
|
||||
- Syncthing (File sync)
|
||||
- Grafana + Prometheus (Monitoring)
|
||||
|
||||
---
|
||||
|
||||
## Storage Layout
|
||||
|
||||
```
|
||||
/volume1/
|
||||
├── docker/ # Docker volumes
|
||||
├── docker/compose/ # Service configurations
|
||||
├── media/ # Media files
|
||||
│ ├── movies/
|
||||
│ ├── tv/
|
||||
│ ├── music/
|
||||
│ └── books/
|
||||
├── photos/ # Immich storage
|
||||
├── backups/ # Backup destination
|
||||
└── shared/ # Shared folders
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Daily Operations
|
||||
|
||||
### Check Service Health
|
||||
```bash
|
||||
# Via Portainer
|
||||
open http://atlantis.vish.local:9000
|
||||
|
||||
# Via SSH
|
||||
ssh admin@atlantis.vish.local
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
|
||||
```
|
||||
|
||||
### Check Disk Usage
|
||||
```bash
|
||||
# SSH to Atlantis
|
||||
ssh admin@atlantis.vish.local
|
||||
|
||||
# Synology storage manager
|
||||
sudo syno-storage-usage -a
|
||||
|
||||
# Or via Docker
|
||||
docker system df
|
||||
```
|
||||
|
||||
### View Logs
|
||||
```bash
|
||||
# Specific service
|
||||
docker logs vaultwarden
|
||||
|
||||
# Follow logs
|
||||
docker logs -f vaultwarden
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Service Won't Start
|
||||
1. Check if port is already in use: `netstat -tulpn | grep <port>`
|
||||
2. Check logs: `docker logs <container>`
|
||||
3. Verify volume paths exist
|
||||
4. Restart Docker: `sudo systemctl restart docker`
|
||||
|
||||
### Storage Full
|
||||
1. Identify large files: `docker system df -v`
|
||||
2. Clean Docker: `docker system prune -a`
|
||||
3. Check Synology Storage Analyzer
|
||||
4. Archive old media files
|
||||
|
||||
### Performance Issues
|
||||
1. Check resource usage: `docker stats`
|
||||
2. Review Plex transcode logs
|
||||
3. Check RAID health: `sudo mdadm --detail /dev/md0`
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Weekly
|
||||
- [ ] Verify backup completion
|
||||
- [ ] Check disk health (S.M.A.R.T.)
|
||||
- [ ] Review Watchtower updates
|
||||
- [ ] Check Plex library integrity
|
||||
|
||||
### Monthly
|
||||
- [ ] Run Docker cleanup
|
||||
- [ ] Update Docker Compose files
|
||||
- [ ] Review storage usage trends
|
||||
- [ ] Check security updates
|
||||
|
||||
### Quarterly
|
||||
- [ ] Deep clean unused images/containers
|
||||
- [ ] Review service dependencies
|
||||
- [ ] Test disaster recovery
|
||||
- [ ] Update documentation
|
||||
|
||||
---
|
||||
|
||||
## Backup Procedures
|
||||
|
||||
### Configuration Backup
|
||||
```bash
|
||||
# Via Ansible
|
||||
ansible-playbook ansible/automation/playbooks/backup_configs.yml --tags atlantis
|
||||
```
|
||||
|
||||
### Data Backup
|
||||
- Synology Hyper Backup to external drive
|
||||
- Cloud sync to Backblaze B2
|
||||
- Critical configs to Git repository
|
||||
|
||||
### Verification
|
||||
```bash
|
||||
ansible-playbook ansible/automation/playbooks/backup_verification.yml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Emergency Procedures
|
||||
|
||||
### Complete Outage
|
||||
1. Verify Synology is powered on
|
||||
2. Check network connectivity
|
||||
3. Access via DSM: `https://atlantis.vish.local:5001`
|
||||
4. Check Storage Manager for RAID status
|
||||
5. Contact via serial if no network
|
||||
|
||||
### RAID Degraded
|
||||
1. Identify failed drive via Storage Manager
|
||||
2. Power down and replace drive
|
||||
3. Rebuild will start automatically
|
||||
4. Monitor rebuild progress
|
||||
|
||||
### Data Recovery
|
||||
See [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md)
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# SSH access
|
||||
ssh admin@atlantis.vish.local
|
||||
|
||||
# Container management
|
||||
cd /volume1/docker/compose/<service>
|
||||
docker-compose restart <service>
|
||||
|
||||
# View all containers
|
||||
docker ps -a --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
|
||||
|
||||
# Logs for critical services
|
||||
docker logs vaultwarden
|
||||
docker logs plex
|
||||
docker logs immich
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Synology DSM](https://atlantis.vish.local:5001)
|
||||
- [Portainer](http://atlantis.vish.local:9000)
|
||||
- [Vaultwarden](http://atlantis.vish.local:8080)
|
||||
- [Plex](http://atlantis.vish.local:32400)
|
||||
- [Immich](http://atlantis.vish.local:2283)
|
||||
237
docs/infrastructure/hosts/calypso-runbook.md
Normal file
237
docs/infrastructure/hosts/calypso-runbook.md
Normal file
@@ -0,0 +1,237 @@
|
||||
# Calypso Runbook
|
||||
|
||||
*Synology DS723+ - Secondary NAS and Infrastructure*
|
||||
|
||||
**Endpoint ID:** 443397
|
||||
**Status:** 🟢 Online
|
||||
**Hardware:** AMD Ryzen R1600, 32GB RAM, 2 bays + expansion
|
||||
**Access:** `calypso.vish.local`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Calypso is the secondary Synology NAS handling critical infrastructure services including authentication, reverse proxy, and monitoring.
|
||||
|
||||
## Hardware Specs
|
||||
|
||||
| Component | Specification |
|
||||
|----------|---------------|
|
||||
| Model | Synology DS723+ |
|
||||
| CPU | AMD Ryzen R1600 (2-core/4-thread) |
|
||||
| RAM | 32GB |
|
||||
| Storage | 2-bay SHR + eSATA expansion |
|
||||
| Network | 2x 1GbE |
|
||||
|
||||
## Services
|
||||
|
||||
### Critical Infrastructure
|
||||
|
||||
| Service | Port | Purpose | Status |
|
||||
|---------|------|---------|--------|
|
||||
| **Nginx Proxy Manager** | 80/443 | SSL termination & routing | Required |
|
||||
| **Authentik** | 9000 | SSO authentication | Required |
|
||||
| **Prometheus** | 9090 | Metrics collection | Required |
|
||||
| **Grafana** | 3000 | Dashboards | Required |
|
||||
| **Alertmanager** | 9093 | Alert routing | Required |
|
||||
|
||||
### Additional Services
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| AdGuard | 3053 | DNS filtering (backup) |
|
||||
| Paperless-NGX | 8000 | Document management |
|
||||
| Reactive Resume | 3001 | Resume builder |
|
||||
| Gitea | 3000/22 | Git hosting |
|
||||
| Gitea Runner | 3008 | CI/CD |
|
||||
| Headscale | 8080 | WireGuard VPN controller |
|
||||
| Seafile | 8082 | File sync & share |
|
||||
| Syncthing | 8384 | File sync |
|
||||
| WireGuard | 51820 | VPN server |
|
||||
| Portainer Agent | 9001 | Container management |
|
||||
|
||||
### Media (ARR Stack)
|
||||
|
||||
- Sonarr, Radarr, Lidarr
|
||||
- Prowlarr (indexers)
|
||||
- Bazarr (subtitles)
|
||||
|
||||
---
|
||||
|
||||
## Storage Layout
|
||||
|
||||
```
|
||||
/volume1/
|
||||
├── docker/
|
||||
├── docker/compose/
|
||||
├── appdata/ # Application data
|
||||
│ ├── authentik/
|
||||
│ ├── npm/
|
||||
│ ├── prometheus/
|
||||
│ └── grafana/
|
||||
├── documents/ # Paperless
|
||||
├── seafile/ # Seafile data
|
||||
└── backups/ # Backup destination
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Daily Operations
|
||||
|
||||
### Check Service Health
|
||||
```bash
|
||||
# Via Portainer
|
||||
open http://calypso.vish.local:9001
|
||||
|
||||
# Via SSH
|
||||
ssh admin@calypso.vish.local
|
||||
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
|
||||
```
|
||||
|
||||
### Monitor Critical Services
|
||||
```bash
|
||||
# Check NPM
|
||||
curl -I http://localhost:80
|
||||
|
||||
# Check Authentik
|
||||
curl -I http://localhost:9000
|
||||
|
||||
# Check Prometheus
|
||||
curl -I http://localhost:9090
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### NPM Not Routing
|
||||
1. Check if NPM is running: `docker ps | grep npm`
|
||||
2. Verify proxy hosts configured: Access NPM UI → Proxy Hosts
|
||||
3. Check SSL certificates
|
||||
4. Review NPM logs: `docker logs nginx-proxy-manager`
|
||||
|
||||
### Authentik SSO Broken
|
||||
1. Check Authentik running: `docker ps | grep authentik`
|
||||
2. Verify PostgreSQL: `docker logs authentik-postgresql`
|
||||
3. Check Redis: `docker logs authentik-redis`
|
||||
4. Review OIDC configurations in services
|
||||
|
||||
### Prometheus Down
|
||||
1. Check storage: `docker system df`
|
||||
2. Verify volume: `docker volume ls | grep prometheus`
|
||||
3. Check retention settings
|
||||
4. Review logs: `docker logs prometheus`
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Weekly
|
||||
- [ ] Verify Authentik users can login
|
||||
- [ ] Check Prometheus metrics collection
|
||||
- [ ] Review Alertmanager notifications
|
||||
- [ ] Verify NPM certificates
|
||||
|
||||
### Monthly
|
||||
- [ ] Clean unused Docker images
|
||||
- [ ] Review Prometheus retention
|
||||
- [ ] Update applications
|
||||
- [ ] Check disk usage
|
||||
|
||||
### Quarterly
|
||||
- [ ] Test OAuth flows
|
||||
- [ ] Verify backup restoration
|
||||
- [ ] Review monitoring thresholds
|
||||
- [ ] Update SSL certificates
|
||||
|
||||
---
|
||||
|
||||
## SSL Certificate Management
|
||||
|
||||
NPM handles all SSL certificates:
|
||||
|
||||
1. **Automatic Renewal**: Let's Encrypt (default)
|
||||
2. **Manual**: Access NPM → SSL Certificates → Add
|
||||
3. **Check Status**: NPM Dashboard → SSL
|
||||
|
||||
### Common Certificate Issues
|
||||
- Rate limits: Wait 1 hour between requests
|
||||
- DNS challenge: Verify external DNS
|
||||
- Self-signed: Use for internal services
|
||||
|
||||
---
|
||||
|
||||
## Backup Procedures
|
||||
|
||||
### Configuration Backup
|
||||
```bash
|
||||
# Via Ansible
|
||||
ansible-playbook ansible/automation/playbooks/backup_configs.yml --tags calypso
|
||||
```
|
||||
|
||||
### Key Data to Backup
|
||||
- NPM configurations: `/volume1/docker/compose/nginx_proxy_manager/`
|
||||
- Authentik: `/volume1/docker/appdata/authentik/`
|
||||
- Prometheus: `/volume1/docker/appdata/prometheus/`
|
||||
- Grafana: `/volume1/docker/appdata/grafana/`
|
||||
|
||||
---
|
||||
|
||||
## Emergency Procedures
|
||||
|
||||
### Authentik Down
|
||||
**Impact**: SSO broken for all services
|
||||
|
||||
1. Verify containers running
|
||||
2. Check PostgreSQL: `docker logs authentik-postgresql`
|
||||
3. Check Redis: `docker logs authentik-redis`
|
||||
4. Restart Authentik: `docker-compose restart`
|
||||
5. If needed, restore from backup
|
||||
|
||||
### NPM Down
|
||||
**Impact**: No external access
|
||||
|
||||
1. Verify container: `docker ps | grep npm`
|
||||
2. Check ports 80/443: `netstat -tulpn | grep -E '80|443'`
|
||||
3. Restart: `docker-compose restart`
|
||||
4. Check DNS resolution
|
||||
|
||||
### Prometheus Full
|
||||
**Impact**: No metrics
|
||||
|
||||
1. Check storage: `docker system df`
|
||||
2. Reduce retention: Edit prometheus.yml
|
||||
3. Clean old data: `docker exec prometheus promtool tsdb delete-insufficient`
|
||||
4. Restart container
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# SSH access
|
||||
ssh admin@calypso.vish.local
|
||||
|
||||
# Check critical services
|
||||
docker ps --filter "name=nginx" --filter "name=authentik" --filter "name=prometheus"
|
||||
|
||||
# Restart infrastructure
|
||||
cd /volume1/docker/compose/nginx_proxy_manager && docker-compose restart
|
||||
cd /volume1/docker/compose/authentik && docker-compose restart
|
||||
|
||||
# View logs
|
||||
docker logs -f nginx-proxy-manager
|
||||
docker logs -f authentik-server
|
||||
docker logs -f prometheus
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Synology DSM](https://calypso.vish.local:5001)
|
||||
- [Nginx Proxy Manager](http://calypso.vish.local:81)
|
||||
- [Authentik](http://calypso.vish.local:9000)
|
||||
- [Prometheus](http://calypso.vish.local:9090)
|
||||
- [Grafana](http://calypso.vish.local:3000)
|
||||
- [Alertmanager](http://calypso.vish.local:9093)
|
||||
244
docs/infrastructure/hosts/concord-nuc-runbook.md
Normal file
244
docs/infrastructure/hosts/concord-nuc-runbook.md
Normal file
@@ -0,0 +1,244 @@
|
||||
# Concord NUC Runbook
|
||||
|
||||
*Intel NUC6i3SYB - Home Automation & DNS*
|
||||
|
||||
**Endpoint ID:** 443398
|
||||
**Status:** 🟢 Online
|
||||
**Hardware:** Intel Core i3-6100U, 16GB RAM, 256GB SSD
|
||||
**Access:** `concordnuc.vish.local`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Concord NUC runs lightweight services focused on home automation, DNS filtering, and local network services.
|
||||
|
||||
## Hardware Specs
|
||||
|
||||
| Component | Specification |
|
||||
|----------|---------------|
|
||||
| Model | Intel NUC6i3SYB |
|
||||
| CPU | Intel Core i3-6100U (2-core) |
|
||||
| RAM | 16GB |
|
||||
| Storage | 256GB SSD |
|
||||
| Network | 1x 1GbE |
|
||||
|
||||
## Services
|
||||
|
||||
### Critical Services
|
||||
|
||||
| Service | Port | Purpose | Docker Image |
|
||||
|---------|------|---------|---------------|
|
||||
| **AdGuard Home** | 3053/53 | DNS filtering | adguard/adguardhome |
|
||||
| **Home Assistant** | 8123 | Home automation | homeassistant/home-assistant |
|
||||
| **Matter Server** | 5580 | Matter protocol | matter-server/matter-server |
|
||||
|
||||
### Additional Services
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| Plex | 32400 | Media server |
|
||||
| Invidious | 2999 | YouTube frontend |
|
||||
| Piped | 1234 | YouTube music |
|
||||
| Syncthing | 8384 | File sync |
|
||||
| WireGuard | 51820 | VPN server |
|
||||
| Portainer Agent | 9001 | Container management |
|
||||
| Node Exporter | 9100 | Metrics |
|
||||
|
||||
---
|
||||
|
||||
## Network Position
|
||||
|
||||
```
|
||||
Internet
|
||||
│
|
||||
▼
|
||||
[Home Router] ──WAN──► (Public IP)
|
||||
│
|
||||
├─► [Pi-hole Primary]
|
||||
│
|
||||
└─► [AdGuard Home] ──► Local DNS
|
||||
│
|
||||
▼
|
||||
[Home Assistant] ──► Zigbee/Z-Wave
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Daily Operations
|
||||
|
||||
### Check Service Health
|
||||
```bash
|
||||
# Via Portainer
|
||||
open http://concordnuc.vish.local:9001
|
||||
|
||||
# Via SSH
|
||||
ssh homelab@concordnuc.vish.local
|
||||
docker ps
|
||||
```
|
||||
|
||||
### Home Assistant
|
||||
```bash
|
||||
# Access UI
|
||||
open http://concordnuc.vish.local:8123
|
||||
|
||||
# Check logs
|
||||
docker logs homeassistant
|
||||
```
|
||||
|
||||
### AdGuard Home
|
||||
```bash
|
||||
# Access UI
|
||||
open http://concordnuc.vish.local:3053
|
||||
|
||||
# Check DNS filtering
|
||||
# Admin → Dashboard → DNS Queries
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Home Assistant Won't Start
|
||||
1. Check logs: `docker logs homeassistant`
|
||||
2. Verify config: `config/configuration.yaml`
|
||||
3. Check Zigbee/Z-Wave stick
|
||||
4. Restore from backup if needed
|
||||
|
||||
### AdGuard Not Filtering
|
||||
1. Check service: `docker ps | grep adguard`
|
||||
2. Verify DNS settings on router
|
||||
3. Check filter lists: Admin → Filters
|
||||
4. Review query log
|
||||
|
||||
### No Network Connectivity
|
||||
1. Check Docker: `systemctl status docker`
|
||||
2. Verify network: `ip addr`
|
||||
3. Check firewall: `sudo ufw status`
|
||||
|
||||
---
|
||||
|
||||
## Home Assistant Configuration
|
||||
|
||||
### Add-ons Running
|
||||
- Zigbee2MQTT
|
||||
- Z-Wave JS UI
|
||||
- File editor
|
||||
- Terminal
|
||||
|
||||
### Backup
|
||||
```bash
|
||||
# Manual backup via UI
|
||||
Configuration → Backups → Create backup
|
||||
|
||||
# Automated to Synology
|
||||
Syncthing → Backups/homeassistant/
|
||||
```
|
||||
|
||||
### Restoration
|
||||
1. Access HA in safe mode
|
||||
2. Configuration → Backups
|
||||
3. Select backup → Restore
|
||||
|
||||
---
|
||||
|
||||
## AdGuard Home Configuration
|
||||
|
||||
### DNS Providers
|
||||
- Cloudflare: 1.1.1.1
|
||||
- Google: 8.8.8.8
|
||||
|
||||
### Blocklists Enabled
|
||||
- AdGuard Default
|
||||
- AdAway
|
||||
- Malware domains
|
||||
|
||||
### Query Log
|
||||
Access: Admin → Logs
|
||||
- Useful for debugging DNS issues
|
||||
- Check for blocked domains
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Weekly
|
||||
- [ ] Check HA logs for errors
|
||||
- [ ] Review AdGuard query log
|
||||
- [ ] Verify backups completed
|
||||
|
||||
### Monthly
|
||||
- [ ] Update Home Assistant
|
||||
- [ ] Review AdGuard filters
|
||||
- [ ] Clean unused Docker images
|
||||
|
||||
### Quarterly
|
||||
- [ ] Test automation reliability
|
||||
- [ ] Review device states
|
||||
- [ ] Check Zigbee network health
|
||||
|
||||
---
|
||||
|
||||
## Emergency Procedures
|
||||
|
||||
### Home Assistant Down
|
||||
**Impact**: Smart home controls unavailable
|
||||
|
||||
1. Check container: `docker ps | grep homeassistant`
|
||||
2. Restart: `docker-compose restart`
|
||||
3. Check logs: `docker logs homeassistant`
|
||||
4. If corrupted, restore from backup
|
||||
|
||||
### AdGuard Down
|
||||
**Impact**: DNS issues on network
|
||||
|
||||
1. Verify: `dig google.com @localhost`
|
||||
2. Restart: `docker-compose restart`
|
||||
3. Check config in UI
|
||||
4. Fallback to Pi-hole
|
||||
|
||||
### Complete Hardware Failure
|
||||
1. Replace NUC hardware
|
||||
2. Reinstall Ubuntu/Debian
|
||||
3. Run deploy playbook:
|
||||
```bash
|
||||
ansible-playbook ansible/homelab/playbooks/deploy_concord_nuc.yml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# SSH access
|
||||
ssh homelab@concordnuc.vish.local
|
||||
|
||||
# Restart services
|
||||
docker-compose -f /opt/docker/compose/homeassistant.yaml restart
|
||||
docker-compose -f /opt/docker/compose/adguard.yaml restart
|
||||
|
||||
# View logs
|
||||
docker logs -f homeassistant
|
||||
docker logs -f adguard
|
||||
|
||||
# Check resource usage
|
||||
docker stats
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Device Access
|
||||
|
||||
| Device | Protocol | Address |
|
||||
|--------|----------|---------|
|
||||
| Zigbee Coordinator | USB | /dev/serial/by-id/* |
|
||||
| Z-Wave Controller | USB | /dev/serial/by-id/* |
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Home Assistant](http://concordnuc.vish.local:8123)
|
||||
- [AdGuard Home](http://concordnuc.vish.local:3053)
|
||||
- [Plex](http://concordnuc.vish.local:32400)
|
||||
- [Invidious](http://concordnuc.vish.local:2999)
|
||||
218
docs/infrastructure/hosts/homelab-vm-runbook.md
Normal file
218
docs/infrastructure/hosts/homelab-vm-runbook.md
Normal file
@@ -0,0 +1,218 @@
|
||||
# Homelab VM Runbook
|
||||
|
||||
*Proxmox VM - Monitoring & DevOps*
|
||||
|
||||
**Endpoint ID:** 443399
|
||||
**Status:** 🟢 Online
|
||||
**Hardware:** 4 vCPU, 28GB RAM
|
||||
**Access:** `192.168.0.210`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Homelab VM runs monitoring, alerting, and development services on Proxmox.
|
||||
|
||||
## Hardware Specs
|
||||
|
||||
| Component | Specification |
|
||||
|----------|---------------|
|
||||
| Platform | Proxmox VE |
|
||||
| vCPU | 4 cores |
|
||||
| RAM | 28GB |
|
||||
| Storage | 100GB SSD |
|
||||
| Network | 1x 1GbE |
|
||||
|
||||
## Services
|
||||
|
||||
### Monitoring Stack
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| **Prometheus** | 9090 | Metrics collection |
|
||||
| **Grafana** | 3000 | Dashboards |
|
||||
| **Alertmanager** | 9093 | Alert routing |
|
||||
| **Node Exporter** | 9100 | System metrics |
|
||||
| **cAdvisor** | 8080 | Container metrics |
|
||||
| **Uptime Kuma** | 3001 | Uptime monitoring |
|
||||
|
||||
### Development
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| Gitea | 3000 | Git hosting |
|
||||
| Gitea Runner | 3008 | CI/CD runner |
|
||||
| OpenHands | 8000 | AI developer |
|
||||
|
||||
### Database
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| PostgreSQL | 5432 | Database |
|
||||
| Redis | 6379 | Caching |
|
||||
|
||||
---
|
||||
|
||||
## Daily Operations
|
||||
|
||||
### Check Monitoring
|
||||
```bash
|
||||
# Prometheus targets
|
||||
curl http://192.168.0.210:9090/api/v1/targets | jq
|
||||
|
||||
# Grafana dashboards
|
||||
open http://192.168.0.210:3000
|
||||
```
|
||||
|
||||
### Alert Status
|
||||
```bash
|
||||
# Alertmanager
|
||||
open http://192.168.0.210:9093
|
||||
|
||||
# Check ntfy for alerts
|
||||
curl -s ntfy.vish.local/homelab-alerts | head -20
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Prometheus Configuration
|
||||
|
||||
### Scraping Targets
|
||||
- Node exporters (all hosts)
|
||||
- cAdvisor (all hosts)
|
||||
- Prometheus self-monitoring
|
||||
- Application-specific metrics
|
||||
|
||||
### Retention
|
||||
- Time: 30 days
|
||||
- Storage: 20GB
|
||||
|
||||
### Maintenance
|
||||
```bash
|
||||
# Check TSDB size
|
||||
du -sh /var/lib/prometheus/
|
||||
|
||||
# Manual compaction
|
||||
docker exec prometheus promtool tsdb compact /prometheus
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Grafana Dashboards
|
||||
|
||||
### Key Dashboards
|
||||
- Infrastructure Overview
|
||||
- Container Health
|
||||
- Network Traffic
|
||||
- Service-specific metrics
|
||||
|
||||
### Alert Rules
|
||||
- CPU > 80% for 5 minutes
|
||||
- Memory > 90% for 5 minutes
|
||||
- Disk > 85%
|
||||
- Service down > 2 minutes
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Prometheus Not Scraping
|
||||
1. Check targets: Prometheus UI → Status → Targets
|
||||
2. Verify network connectivity
|
||||
3. Check firewall rules
|
||||
4. Review scrape errors in logs
|
||||
|
||||
### Grafana Dashboards Slow
|
||||
1. Check Prometheus query performance
|
||||
2. Reduce time range
|
||||
3. Optimize queries
|
||||
4. Check resource usage
|
||||
|
||||
### Alerts Not Firing
|
||||
1. Verify Alertmanager config
|
||||
2. Check ntfy integration
|
||||
3. Review alert rules syntax
|
||||
4. Test with artificial alert
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Weekly
|
||||
- [ ] Review alert history
|
||||
- [ ] Check disk space
|
||||
- [ ] Verify backups
|
||||
|
||||
### Monthly
|
||||
- [ ] Clean old metrics
|
||||
- [ ] Update dashboards
|
||||
- [ ] Review alert thresholds
|
||||
|
||||
### Quarterly
|
||||
- [ ] Test alert notifications
|
||||
- [ ] Review retention policy
|
||||
- [ ] Optimize queries
|
||||
|
||||
---
|
||||
|
||||
## Backup Procedures
|
||||
|
||||
### Configuration
|
||||
```bash
|
||||
# Grafana dashboards
|
||||
cp -r /opt/grafana/dashboards /backup/
|
||||
|
||||
# Prometheus rules
|
||||
cp -r /opt/prometheus/rules /backup/
|
||||
```
|
||||
|
||||
### Ansible
|
||||
```bash
|
||||
ansible-playbook ansible/automation/playbooks/backup_configs.yml --tags homelab_vm
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Emergency Procedures
|
||||
|
||||
### Prometheus Full
|
||||
1. Check storage: `docker system df`
|
||||
2. Reduce retention in prometheus.yml
|
||||
3. Delete old data: `docker exec prometheus rm -rf /prometheus/wal/*`
|
||||
4. Restart container
|
||||
|
||||
### VM Down
|
||||
1. Check Proxmox: `qm list`
|
||||
2. Start VM: `qm start <vmid>`
|
||||
3. Check console: `qm terminal <vmid>`
|
||||
4. Review logs in Proxmox UI
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# SSH access
|
||||
ssh homelab@192.168.0.210
|
||||
|
||||
# Restart monitoring
|
||||
cd /opt/docker/prometheus && docker-compose restart
|
||||
cd /opt/docker/grafana && docker-compose restart
|
||||
|
||||
# Check targets
|
||||
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health=="down")'
|
||||
|
||||
# View logs
|
||||
docker logs prometheus
|
||||
docker logs grafana
|
||||
docker logs alertmanager
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Prometheus](http://192.168.0.210:9090)
|
||||
- [Grafana](http://192.168.0.210:3000)
|
||||
- [Alertmanager](http://192.168.0.210:9093)
|
||||
- [Uptime Kuma](http://192.168.0.210:3001)
|
||||
179
docs/infrastructure/hosts/rpi5-runbook.md
Normal file
179
docs/infrastructure/hosts/rpi5-runbook.md
Normal file
@@ -0,0 +1,179 @@
|
||||
# RPi5 Runbook
|
||||
|
||||
*Raspberry Pi 5 - Edge Services*
|
||||
|
||||
**Endpoint ID:** 443395
|
||||
**Status:** 🟢 Online
|
||||
**Hardware:** ARM Cortex-A76, 16GB RAM, 512GB USB SSD
|
||||
**Access:** `rpi5-vish.local`
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
Raspberry Pi 5 runs edge services including Immich backup and lightweight applications.
|
||||
|
||||
## Hardware Specs
|
||||
|
||||
| Component | Specification |
|
||||
|----------|---------------|
|
||||
| Model | Raspberry Pi 5 |
|
||||
| CPU | ARM Cortex-A76 (4-core) |
|
||||
| RAM | 16GB |
|
||||
| Storage | 512GB USB-C SSD |
|
||||
| Network | 1x 1GbE (Pi 4 adapter) |
|
||||
|
||||
## Services
|
||||
|
||||
### Primary Services
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| **Immich** | 2283 | Photo backup (edge) |
|
||||
| Portainer Agent | 9001 | Container management |
|
||||
| Node Exporter | 9100 | Metrics |
|
||||
|
||||
### Services (if enabled)
|
||||
|
||||
| Service | Port | Purpose |
|
||||
|---------|------|---------|
|
||||
| Plex | 32400 | Media server |
|
||||
| WireGuard | 51820 | VPN |
|
||||
|
||||
## Secondary Pi Nodes
|
||||
|
||||
### Pi-5-Kevin
|
||||
This is a secondary Raspberry Pi 5 node with identical specifications but not typically online.
|
||||
|
||||
- **CPU**: Broadcom BCM2712 (4-core, 2.4GHz)
|
||||
- **RAM**: 8GB LPDDR4X
|
||||
- **Storage**: 64GB microSD
|
||||
- **Network**: Gigabit Ethernet + WiFi 6
|
||||
|
||||
---
|
||||
|
||||
## Daily Operations
|
||||
|
||||
### Check Service Health
|
||||
```bash
|
||||
# Via Portainer
|
||||
open http://rpi5-vish.local:9001
|
||||
|
||||
# Via SSH
|
||||
ssh pi@rpi5-vish.local
|
||||
docker ps
|
||||
```
|
||||
|
||||
### Immich Status
|
||||
```bash
|
||||
# Access UI
|
||||
open http://rpi5-vish.local:2283
|
||||
|
||||
# Check sync status
|
||||
docker logs immich-server | grep -i sync
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Container Won't Start (ARM compatibility)
|
||||
1. Verify image supports ARM64: `docker pull --platform linux/arm64 <image>`
|
||||
2. Check container logs
|
||||
3. Verify Raspberry Pi OS 64-bit
|
||||
|
||||
### Storage Slow
|
||||
1. Check USB drive: `lsusb`
|
||||
2. Verify SSD: `sudo hdparm -t /dev/sda`
|
||||
3. Use fast USB port (USB-C)
|
||||
|
||||
### Network Issues
|
||||
1. Check adapter compatibility
|
||||
2. Verify driver loaded: `lsmod | grep smsc95xx`
|
||||
3. Update firmware: `sudo rpi-eeprom-update`
|
||||
|
||||
---
|
||||
|
||||
## Storage
|
||||
|
||||
### Layout
|
||||
```
|
||||
/home/pi/
|
||||
├── docker/ # Docker data
|
||||
├── immich/ # Photo storage
|
||||
└── backups/ # Local backups
|
||||
```
|
||||
|
||||
### Performance Tips
|
||||
- Use USB 3.0 SSD
|
||||
- Usequality power supply (5V 5A)
|
||||
- Enable USB max_current in config.txt
|
||||
|
||||
---
|
||||
|
||||
## Maintenance
|
||||
|
||||
### Weekly
|
||||
- [ ] Check Docker disk usage
|
||||
- [ ] Verify Immich backup
|
||||
- [ ] Check container health
|
||||
|
||||
### Monthly
|
||||
- [ ] Update Raspberry Pi OS
|
||||
- [ ] Clean unused images
|
||||
- [ ] Review resource usage
|
||||
|
||||
### Quarterly
|
||||
- [ ] Test backup restoration
|
||||
- [ ] Verify ARM image compatibility
|
||||
- [ ] Check firmware updates
|
||||
|
||||
---
|
||||
|
||||
## Emergency Procedures
|
||||
|
||||
### SD Card/Storage Failure
|
||||
1. Replace storage drive
|
||||
2. Reinstall Raspberry Pi OS
|
||||
3. Run deploy playbook:
|
||||
```bash
|
||||
ansible-playbook ansible/homelab/playbooks/deploy_rpi5_vish.yml
|
||||
```
|
||||
|
||||
### Overheating
|
||||
1. Add heatsinks
|
||||
2. Enable fan
|
||||
3. Reduce CPU frequency: `sudo echo "arm_freq=1800" >> /boot/config.txt`
|
||||
|
||||
## Notes
|
||||
|
||||
This Raspberry Pi 5 system is the primary node that runs Immich and other services, with the secondary node **pi-5-kevin** intentionally kept offline for backup purposes when needed.
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# SSH access
|
||||
ssh pi@rpi5-vish.local
|
||||
|
||||
# Check temperature
|
||||
vcgencmd measure_temp
|
||||
|
||||
# Check throttling
|
||||
vcgencmd get_throttled
|
||||
|
||||
# Update firmware
|
||||
sudo rpi-eeprom-update
|
||||
sudo rpi-eeprom-update -a
|
||||
|
||||
# View Immich logs
|
||||
docker logs -f immich-server
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Immich](http://rpi5-vish.local:2283)
|
||||
- [Portainer](http://rpi5-vish.local:9001)
|
||||
66
docs/infrastructure/hosts/runbooks.md
Normal file
66
docs/infrastructure/hosts/runbooks.md
Normal file
@@ -0,0 +1,66 @@
|
||||
# Host Runbooks
|
||||
|
||||
This directory contains operational runbooks for each host in the homelab infrastructure.
|
||||
|
||||
## Available Runbooks
|
||||
|
||||
- [Atlantis Runbook](./atlantis-runbook.md) - Synology DS1821+ (Primary NAS)
|
||||
- [Calypso Runbook](./calypso-runbook.md) - Synology DS723+ (Secondary NAS)
|
||||
- [Concord NUC Runbook](./concord-nuc-runbook.md) - Intel NUC (Home Automation & DNS)
|
||||
- [Homelab VM Runbook](./homelab-vm-runbook.md) - Proxmox VM (Monitoring & DevOps)
|
||||
- [RPi5 Runbook](./rpi5-runbook.md) - Raspberry Pi 5 (Edge Services)
|
||||
|
||||
---
|
||||
|
||||
## Common Tasks
|
||||
|
||||
All hosts share common operational procedures:
|
||||
|
||||
### Viewing Logs
|
||||
```bash
|
||||
# Via SSH to host
|
||||
docker logs <container_name>
|
||||
|
||||
# Via Portainer
|
||||
Portainer → Containers → <container> → Logs
|
||||
```
|
||||
|
||||
### Restarting Services
|
||||
```bash
|
||||
# Via docker-compose
|
||||
cd hosts/<host>/<service>
|
||||
docker-compose restart <service>
|
||||
|
||||
# Via Portainer
|
||||
Portainer → Stacks → <stack> → Restart
|
||||
```
|
||||
|
||||
### Checking Resource Usage
|
||||
```bash
|
||||
# Via Portainer
|
||||
Portainer → Containers → Sort by CPU/Memory
|
||||
|
||||
# Via CLI
|
||||
docker stats
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Emergency Contacts
|
||||
|
||||
| Role | Contact | When to Contact |
|
||||
|------|---------|------------------|
|
||||
| Primary Admin | User | All critical issues |
|
||||
| Emergency | NTFY | Critical alerts only |
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Host | Primary Role | Critical Services | SSH Access |
|
||||
|------|--------------|-------------------|------------|
|
||||
| Atlantis | Media, Vault | Vaultwarden, Plex, Immich | atlantis.local |
|
||||
| Calypso | Infrastructure | NPM, Authentik, Prometheus | calypso.local |
|
||||
| Concord NUC | DNS, HA | AdGuard, Home Assistant | concord-nuc.local |
|
||||
| Homelab VM | Monitoring | Prometheus, Grafana | 192.168.0.210 |
|
||||
| RPi5 | Edge | Immich (backup) | rpi5-vish.local |
|
||||
931
docs/infrastructure/kubernetes-cluster-setup.md
Normal file
931
docs/infrastructure/kubernetes-cluster-setup.md
Normal file
@@ -0,0 +1,931 @@
|
||||
# ☸️ Kubernetes Cluster Setup Guide
|
||||
|
||||
**🔴 Advanced Guide**
|
||||
|
||||
This guide covers deploying and managing a production-ready Kubernetes cluster in your homelab, including high availability, storage, networking, and service deployment.
|
||||
|
||||
## 🎯 Kubernetes Architecture for Homelab
|
||||
|
||||
### **Cluster Design**
|
||||
```bash
|
||||
# Recommended cluster topology:
|
||||
|
||||
# Control Plane Nodes (3 nodes for HA)
|
||||
k8s-master-01: 192.168.10.201 (Concord-NUC)
|
||||
k8s-master-02: 192.168.10.202 (Homelab-VM)
|
||||
k8s-master-03: 192.168.10.203 (Chicago-VM)
|
||||
|
||||
# Worker Nodes (3+ nodes)
|
||||
k8s-worker-01: 192.168.10.211 (Bulgaria-VM)
|
||||
k8s-worker-02: 192.168.10.212 (Guava)
|
||||
k8s-worker-03: 192.168.10.213 (Setillo)
|
||||
|
||||
# Storage Nodes (Ceph/Longhorn)
|
||||
k8s-storage-01: 192.168.10.221 (Atlantis)
|
||||
k8s-storage-02: 192.168.10.222 (Calypso)
|
||||
k8s-storage-03: 192.168.10.223 (Anubis)
|
||||
```
|
||||
|
||||
### **Resource Requirements**
|
||||
```bash
|
||||
# Control Plane Nodes (minimum)
|
||||
CPU: 2 cores
|
||||
RAM: 4 GB
|
||||
Storage: 50 GB SSD
|
||||
Network: 1 Gbps
|
||||
|
||||
# Worker Nodes (minimum)
|
||||
CPU: 4 cores
|
||||
RAM: 8 GB
|
||||
Storage: 100 GB SSD
|
||||
Network: 1 Gbps
|
||||
|
||||
# Storage Nodes (recommended)
|
||||
CPU: 4 cores
|
||||
RAM: 16 GB
|
||||
Storage: 500 GB+ SSD + additional storage
|
||||
Network: 10 Gbps (if available)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Cluster Installation
|
||||
|
||||
### **Method 1: kubeadm (Recommended for Learning)**
|
||||
|
||||
#### **Prerequisites on All Nodes**
|
||||
```bash
|
||||
# Update system
|
||||
sudo apt update && sudo apt upgrade -y
|
||||
|
||||
# Install required packages
|
||||
sudo apt install -y apt-transport-https ca-certificates curl gpg
|
||||
|
||||
# Disable swap
|
||||
sudo swapoff -a
|
||||
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
|
||||
|
||||
# Load kernel modules
|
||||
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
|
||||
overlay
|
||||
br_netfilter
|
||||
EOF
|
||||
|
||||
sudo modprobe overlay
|
||||
sudo modprobe br_netfilter
|
||||
|
||||
# Configure sysctl
|
||||
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
|
||||
net.bridge.bridge-nf-call-iptables = 1
|
||||
net.bridge.bridge-nf-call-ip6tables = 1
|
||||
net.ipv4.ip_forward = 1
|
||||
EOF
|
||||
|
||||
sudo sysctl --system
|
||||
```
|
||||
|
||||
#### **Install Container Runtime (containerd)**
|
||||
```bash
|
||||
# Install containerd
|
||||
sudo apt install -y containerd
|
||||
|
||||
# Configure containerd
|
||||
sudo mkdir -p /etc/containerd
|
||||
containerd config default | sudo tee /etc/containerd/config.toml
|
||||
|
||||
# Enable SystemdCgroup
|
||||
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
|
||||
|
||||
# Restart containerd
|
||||
sudo systemctl restart containerd
|
||||
sudo systemctl enable containerd
|
||||
```
|
||||
|
||||
#### **Install Kubernetes Components**
|
||||
```bash
|
||||
# Add Kubernetes repository
|
||||
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
|
||||
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
|
||||
|
||||
# Install Kubernetes
|
||||
sudo apt update
|
||||
sudo apt install -y kubelet kubeadm kubectl
|
||||
sudo apt-mark hold kubelet kubeadm kubectl
|
||||
|
||||
# Enable kubelet
|
||||
sudo systemctl enable kubelet
|
||||
```
|
||||
|
||||
#### **Initialize First Control Plane Node**
|
||||
```bash
|
||||
# On k8s-master-01 (192.168.10.201)
|
||||
sudo kubeadm init \
|
||||
--control-plane-endpoint="k8s-api.vish.local:6443" \
|
||||
--upload-certs \
|
||||
--apiserver-advertise-address=192.168.10.201 \
|
||||
--pod-network-cidr=10.244.0.0/16 \
|
||||
--service-cidr=10.96.0.0/12
|
||||
|
||||
# Configure kubectl for root
|
||||
mkdir -p $HOME/.kube
|
||||
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
|
||||
sudo chown $(id -u):$(id -g) $HOME/.kube/config
|
||||
|
||||
# Save join commands (output from kubeadm init)
|
||||
# Control plane join command:
|
||||
kubeadm join k8s-api.vish.local:6443 --token TOKEN \
|
||||
--discovery-token-ca-cert-hash sha256:HASH \
|
||||
--control-plane --certificate-key CERT_KEY
|
||||
|
||||
# Worker join command:
|
||||
kubeadm join k8s-api.vish.local:6443 --token TOKEN \
|
||||
--discovery-token-ca-cert-hash sha256:HASH
|
||||
```
|
||||
|
||||
#### **Install CNI Plugin (Flannel)**
|
||||
```bash
|
||||
# Install Flannel for pod networking
|
||||
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
|
||||
|
||||
# Verify installation
|
||||
kubectl get pods -n kube-flannel
|
||||
kubectl get nodes
|
||||
```
|
||||
|
||||
#### **Join Additional Control Plane Nodes**
|
||||
```bash
|
||||
# On k8s-master-02 and k8s-master-03
|
||||
# Use the control plane join command from kubeadm init output
|
||||
sudo kubeadm join k8s-api.vish.local:6443 --token TOKEN \
|
||||
--discovery-token-ca-cert-hash sha256:HASH \
|
||||
--control-plane --certificate-key CERT_KEY
|
||||
|
||||
# Configure kubectl
|
||||
mkdir -p $HOME/.kube
|
||||
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
|
||||
sudo chown $(id -u):$(id -g) $HOME/.kube/config
|
||||
```
|
||||
|
||||
#### **Join Worker Nodes**
|
||||
```bash
|
||||
# On all worker nodes
|
||||
# Use the worker join command from kubeadm init output
|
||||
sudo kubeadm join k8s-api.vish.local:6443 --token TOKEN \
|
||||
--discovery-token-ca-cert-hash sha256:HASH
|
||||
```
|
||||
|
||||
### **Method 2: k3s (Lightweight Alternative)**
|
||||
|
||||
#### **Install k3s Master**
|
||||
```bash
|
||||
# On first master node
|
||||
curl -sfL https://get.k3s.io | sh -s - server \
|
||||
--cluster-init \
|
||||
--disable traefik \
|
||||
--disable servicelb \
|
||||
--write-kubeconfig-mode 644 \
|
||||
--cluster-cidr=10.244.0.0/16 \
|
||||
--service-cidr=10.96.0.0/12
|
||||
|
||||
# Get node token
|
||||
sudo cat /var/lib/rancher/k3s/server/node-token
|
||||
```
|
||||
|
||||
#### **Join Additional Masters**
|
||||
```bash
|
||||
# On additional master nodes
|
||||
curl -sfL https://get.k3s.io | sh -s - server \
|
||||
--server https://192.168.10.201:6443 \
|
||||
--token NODE_TOKEN \
|
||||
--disable traefik \
|
||||
--disable servicelb
|
||||
|
||||
# Configure kubectl
|
||||
mkdir -p $HOME/.kube
|
||||
sudo cp /etc/rancher/k3s/k3s.yaml $HOME/.kube/config
|
||||
sudo chown $(id -u):$(id -g) $HOME/.kube/config
|
||||
```
|
||||
|
||||
#### **Join Worker Nodes**
|
||||
```bash
|
||||
# On worker nodes
|
||||
curl -sfL https://get.k3s.io | sh -s - agent \
|
||||
--server https://192.168.10.201:6443 \
|
||||
--token NODE_TOKEN
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🗄️ Storage Configuration
|
||||
|
||||
### **Longhorn Distributed Storage**
|
||||
|
||||
#### **Install Longhorn**
|
||||
```bash
|
||||
# Add Longhorn Helm repository
|
||||
helm repo add longhorn https://charts.longhorn.io
|
||||
helm repo update
|
||||
|
||||
# Create namespace
|
||||
kubectl create namespace longhorn-system
|
||||
|
||||
# Install Longhorn
|
||||
helm install longhorn longhorn/longhorn \
|
||||
--namespace longhorn-system \
|
||||
--set defaultSettings.defaultDataPath="/var/lib/longhorn" \
|
||||
--set defaultSettings.replicaCount=3 \
|
||||
--set defaultSettings.defaultDataLocality="best-effort"
|
||||
|
||||
# Verify installation
|
||||
kubectl get pods -n longhorn-system
|
||||
kubectl get storageclass
|
||||
```
|
||||
|
||||
#### **Configure Storage Classes**
|
||||
```bash
|
||||
# Create storage classes for different use cases
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: longhorn-fast
|
||||
provisioner: driver.longhorn.io
|
||||
allowVolumeExpansion: true
|
||||
parameters:
|
||||
numberOfReplicas: "2"
|
||||
staleReplicaTimeout: "2880"
|
||||
fromBackup: ""
|
||||
diskSelector: "ssd"
|
||||
nodeSelector: "storage"
|
||||
---
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: longhorn-bulk
|
||||
provisioner: driver.longhorn.io
|
||||
allowVolumeExpansion: true
|
||||
parameters:
|
||||
numberOfReplicas: "3"
|
||||
staleReplicaTimeout: "2880"
|
||||
fromBackup: ""
|
||||
diskSelector: "hdd"
|
||||
EOF
|
||||
```
|
||||
|
||||
### **NFS Storage (Alternative)**
|
||||
|
||||
#### **Setup NFS Server (on Atlantis)**
|
||||
```bash
|
||||
# Install NFS server
|
||||
sudo apt install nfs-kernel-server
|
||||
|
||||
# Create NFS exports
|
||||
sudo mkdir -p /volume1/k8s-storage/{pv,dynamic}
|
||||
sudo chown nobody:nogroup /volume1/k8s-storage/
|
||||
sudo chmod 777 /volume1/k8s-storage/
|
||||
|
||||
# Configure exports
|
||||
echo "/volume1/k8s-storage 192.168.10.0/24(rw,sync,no_subtree_check,no_root_squash)" | sudo tee -a /etc/exports
|
||||
|
||||
# Apply exports
|
||||
sudo exportfs -ra
|
||||
sudo systemctl restart nfs-kernel-server
|
||||
```
|
||||
|
||||
#### **Install NFS CSI Driver**
|
||||
```bash
|
||||
# Install NFS CSI driver
|
||||
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
|
||||
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs \
|
||||
--namespace kube-system \
|
||||
--version v4.5.0
|
||||
|
||||
# Create NFS storage class
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: storage.k8s.io/v1
|
||||
kind: StorageClass
|
||||
metadata:
|
||||
name: nfs-csi
|
||||
provisioner: nfs.csi.k8s.io
|
||||
parameters:
|
||||
server: atlantis.vish.local
|
||||
share: /volume1/k8s-storage/dynamic
|
||||
reclaimPolicy: Delete
|
||||
volumeBindingMode: Immediate
|
||||
mountOptions:
|
||||
- nfsvers=4.1
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Networking Configuration
|
||||
|
||||
### **Install Ingress Controller (Nginx)**
|
||||
```bash
|
||||
# Add Nginx Ingress Helm repository
|
||||
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
|
||||
helm repo update
|
||||
|
||||
# Install Nginx Ingress Controller
|
||||
helm install ingress-nginx ingress-nginx/ingress-nginx \
|
||||
--namespace ingress-nginx \
|
||||
--create-namespace \
|
||||
--set controller.service.type=LoadBalancer \
|
||||
--set controller.service.loadBalancerIP=192.168.10.240 \
|
||||
--set controller.metrics.enabled=true \
|
||||
--set controller.podAnnotations."prometheus\.io/scrape"="true" \
|
||||
--set controller.podAnnotations."prometheus\.io/port"="10254"
|
||||
|
||||
# Verify installation
|
||||
kubectl get pods -n ingress-nginx
|
||||
kubectl get svc -n ingress-nginx
|
||||
```
|
||||
|
||||
### **Install MetalLB Load Balancer**
|
||||
```bash
|
||||
# Install MetalLB
|
||||
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.13.12/config/manifests/metallb-native.yaml
|
||||
|
||||
# Wait for MetalLB to be ready
|
||||
kubectl wait --namespace metallb-system \
|
||||
--for=condition=ready pod \
|
||||
--selector=app=metallb \
|
||||
--timeout=90s
|
||||
|
||||
# Configure IP address pool
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: metallb.io/v1beta1
|
||||
kind: IPAddressPool
|
||||
metadata:
|
||||
name: homelab-pool
|
||||
namespace: metallb-system
|
||||
spec:
|
||||
addresses:
|
||||
- 192.168.10.240-192.168.10.250
|
||||
---
|
||||
apiVersion: metallb.io/v1beta1
|
||||
kind: L2Advertisement
|
||||
metadata:
|
||||
name: homelab-l2
|
||||
namespace: metallb-system
|
||||
spec:
|
||||
ipAddressPools:
|
||||
- homelab-pool
|
||||
EOF
|
||||
```
|
||||
|
||||
### **Install Cert-Manager**
|
||||
```bash
|
||||
# Add Cert-Manager Helm repository
|
||||
helm repo add jetstack https://charts.jetstack.io
|
||||
helm repo update
|
||||
|
||||
# Install Cert-Manager
|
||||
helm install cert-manager jetstack/cert-manager \
|
||||
--namespace cert-manager \
|
||||
--create-namespace \
|
||||
--version v1.13.3 \
|
||||
--set installCRDs=true
|
||||
|
||||
# Create Let's Encrypt ClusterIssuer
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: cert-manager.io/v1
|
||||
kind: ClusterIssuer
|
||||
metadata:
|
||||
name: letsencrypt-prod
|
||||
spec:
|
||||
acme:
|
||||
server: https://acme-v02.api.letsencrypt.org/directory
|
||||
email: admin@vish.local
|
||||
privateKeySecretRef:
|
||||
name: letsencrypt-prod
|
||||
solvers:
|
||||
- http01:
|
||||
ingress:
|
||||
class: nginx
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Monitoring and Observability
|
||||
|
||||
### **Install Prometheus Stack**
|
||||
```bash
|
||||
# Add Prometheus Helm repository
|
||||
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
|
||||
helm repo update
|
||||
|
||||
# Create monitoring namespace
|
||||
kubectl create namespace monitoring
|
||||
|
||||
# Install kube-prometheus-stack
|
||||
helm install prometheus prometheus-community/kube-prometheus-stack \
|
||||
--namespace monitoring \
|
||||
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=longhorn-fast \
|
||||
--set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
|
||||
--set grafana.persistence.enabled=true \
|
||||
--set grafana.persistence.storageClassName=longhorn-fast \
|
||||
--set grafana.persistence.size=10Gi \
|
||||
--set grafana.adminPassword="REDACTED_PASSWORD" \
|
||||
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.storageClassName=longhorn-fast \
|
||||
--set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi
|
||||
|
||||
# Verify installation
|
||||
kubectl get pods -n monitoring
|
||||
kubectl get svc -n monitoring
|
||||
```
|
||||
|
||||
### **Create Ingress for Monitoring Services**
|
||||
```bash
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: Ingress
|
||||
metadata:
|
||||
name: monitoring-ingress
|
||||
namespace: monitoring
|
||||
annotations:
|
||||
kubernetes.io/ingress.class: nginx
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
nginx.ingress.kubernetes.io/auth-type: basic
|
||||
nginx.ingress.kubernetes.io/auth-secret: basic-auth
|
||||
spec:
|
||||
tls:
|
||||
- hosts:
|
||||
- grafana.k8s.vish.local
|
||||
- prometheus.k8s.vish.local
|
||||
- alertmanager.k8s.vish.local
|
||||
secretName: monitoring-tls
|
||||
rules:
|
||||
- host: grafana.k8s.vish.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: prometheus-grafana
|
||||
port:
|
||||
number: 80
|
||||
- host: prometheus.k8s.vish.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: prometheus-kube-prometheus-prometheus
|
||||
port:
|
||||
number: 9090
|
||||
- host: alertmanager.k8s.vish.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: prometheus-kube-prometheus-alertmanager
|
||||
port:
|
||||
number: 9093
|
||||
EOF
|
||||
```
|
||||
|
||||
### **Install Logging Stack (ELK)**
|
||||
```bash
|
||||
# Add Elastic Helm repository
|
||||
helm repo add elastic https://helm.elastic.co
|
||||
helm repo update
|
||||
|
||||
# Install Elasticsearch
|
||||
helm install elasticsearch elastic/elasticsearch \
|
||||
--namespace logging \
|
||||
--create-namespace \
|
||||
--set replicas=3 \
|
||||
--set volumeClaimTemplate.storageClassName=longhorn-fast \
|
||||
--set volumeClaimTemplate.resources.requests.storage=100Gi
|
||||
|
||||
# Install Kibana
|
||||
helm install kibana elastic/kibana \
|
||||
--namespace logging \
|
||||
--set service.type=ClusterIP
|
||||
|
||||
# Install Filebeat
|
||||
helm install filebeat elastic/filebeat \
|
||||
--namespace logging \
|
||||
--set daemonset.enabled=true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Application Deployment
|
||||
|
||||
### **Migrate Docker Compose Services**
|
||||
|
||||
#### **Convert Docker Compose to Kubernetes**
|
||||
```bash
|
||||
# Install kompose for conversion
|
||||
curl -L https://github.com/kubernetes/kompose/releases/latest/download/kompose-linux-amd64 -o kompose
|
||||
chmod +x kompose
|
||||
sudo mv kompose /usr/local/bin
|
||||
|
||||
# Convert existing docker-compose files
|
||||
cd ~/homelab/Atlantis/uptime-kuma
|
||||
kompose convert -f docker-compose.yml
|
||||
|
||||
# Review and modify generated manifests
|
||||
# Add ingress, persistent volumes, etc.
|
||||
```
|
||||
|
||||
#### **Example: Uptime Kuma on Kubernetes**
|
||||
```bash
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: uptime-kuma
|
||||
namespace: monitoring
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: uptime-kuma
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: uptime-kuma
|
||||
spec:
|
||||
containers:
|
||||
- name: uptime-kuma
|
||||
image: louislam/uptime-kuma:1
|
||||
ports:
|
||||
- containerPort: 3001
|
||||
volumeMounts:
|
||||
- name: data
|
||||
mountPath: /app/data
|
||||
resources:
|
||||
requests:
|
||||
memory: "256Mi"
|
||||
cpu: "100m"
|
||||
limits:
|
||||
memory: "512Mi"
|
||||
cpu: "500m"
|
||||
volumes:
|
||||
- name: data
|
||||
persistentVolumeClaim:
|
||||
claimName: uptime-kuma-data
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: PersistentVolumeClaim
|
||||
metadata:
|
||||
name: uptime-kuma-data
|
||||
namespace: monitoring
|
||||
spec:
|
||||
accessModes:
|
||||
- ReadWriteOnce
|
||||
storageClassName: longhorn-fast
|
||||
resources:
|
||||
requests:
|
||||
storage: 5Gi
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: uptime-kuma
|
||||
namespace: monitoring
|
||||
spec:
|
||||
selector:
|
||||
app: uptime-kuma
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 3001
|
||||
targetPort: 3001
|
||||
---
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: Ingress
|
||||
metadata:
|
||||
name: uptime-kuma
|
||||
namespace: monitoring
|
||||
annotations:
|
||||
kubernetes.io/ingress.class: nginx
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
spec:
|
||||
tls:
|
||||
- hosts:
|
||||
- uptime.k8s.vish.local
|
||||
secretName: uptime-kuma-tls
|
||||
rules:
|
||||
- host: uptime.k8s.vish.local
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: uptime-kuma
|
||||
port:
|
||||
number: 3001
|
||||
EOF
|
||||
```
|
||||
|
||||
### **Helm Charts for Complex Applications**
|
||||
|
||||
#### **Create Custom Helm Chart**
|
||||
```bash
|
||||
# Create new Helm chart
|
||||
helm create homelab-app
|
||||
|
||||
# Directory structure:
|
||||
homelab-app/
|
||||
├── Chart.yaml
|
||||
├── values.yaml
|
||||
├── templates/
|
||||
│ ├── deployment.yaml
|
||||
│ ├── service.yaml
|
||||
│ ├── ingress.yaml
|
||||
│ └── pvc.yaml
|
||||
└── charts/
|
||||
|
||||
# Example values.yaml for homelab services:
|
||||
cat <<EOF > homelab-app/values.yaml
|
||||
replicaCount: 1
|
||||
|
||||
image:
|
||||
repository: nginx
|
||||
tag: latest
|
||||
pullPolicy: IfNotPresent
|
||||
|
||||
service:
|
||||
type: ClusterIP
|
||||
port: 80
|
||||
|
||||
ingress:
|
||||
enabled: true
|
||||
className: nginx
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
hosts:
|
||||
- host: app.k8s.vish.local
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: app-tls
|
||||
hosts:
|
||||
- app.k8s.vish.local
|
||||
|
||||
persistence:
|
||||
enabled: true
|
||||
storageClass: longhorn-fast
|
||||
size: 10Gi
|
||||
|
||||
resources:
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 512Mi
|
||||
requests:
|
||||
cpu: 100m
|
||||
memory: 128Mi
|
||||
EOF
|
||||
|
||||
# Install chart
|
||||
helm install my-app ./homelab-app
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security Configuration
|
||||
|
||||
### **Pod Security Standards**
|
||||
```bash
|
||||
# Create Pod Security Policy
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: Namespace
|
||||
metadata:
|
||||
name: secure-apps
|
||||
labels:
|
||||
pod-security.kubernetes.io/enforce: restricted
|
||||
pod-security.kubernetes.io/audit: restricted
|
||||
pod-security.kubernetes.io/warn: restricted
|
||||
EOF
|
||||
```
|
||||
|
||||
### **Network Policies**
|
||||
```bash
|
||||
# Example: Deny all traffic by default
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: default-deny-all
|
||||
namespace: default
|
||||
spec:
|
||||
podSelector: {}
|
||||
policyTypes:
|
||||
- Ingress
|
||||
- Egress
|
||||
---
|
||||
# Allow ingress traffic
|
||||
apiVersion: networking.k8s.io/v1
|
||||
kind: NetworkPolicy
|
||||
metadata:
|
||||
name: allow-ingress
|
||||
namespace: default
|
||||
spec:
|
||||
podSelector:
|
||||
matchLabels:
|
||||
app: web-app
|
||||
policyTypes:
|
||||
- Ingress
|
||||
ingress:
|
||||
- from:
|
||||
- namespaceSelector:
|
||||
matchLabels:
|
||||
name: ingress-nginx
|
||||
ports:
|
||||
- protocol: TCP
|
||||
port: 80
|
||||
EOF
|
||||
```
|
||||
|
||||
### **RBAC Configuration**
|
||||
```bash
|
||||
# Create service account for applications
|
||||
cat <<EOF | kubectl apply -f -
|
||||
apiVersion: v1
|
||||
kind: ServiceAccount
|
||||
metadata:
|
||||
name: homelab-app
|
||||
namespace: default
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: Role
|
||||
metadata:
|
||||
namespace: default
|
||||
name: homelab-app-role
|
||||
rules:
|
||||
- apiGroups: [""]
|
||||
resources: ["pods", "services"]
|
||||
verbs: ["get", "list", "watch"]
|
||||
---
|
||||
apiVersion: rbac.authorization.k8s.io/v1
|
||||
kind: RoleBinding
|
||||
metadata:
|
||||
name: homelab-app-binding
|
||||
namespace: default
|
||||
subjects:
|
||||
- kind: ServiceAccount
|
||||
name: homelab-app
|
||||
namespace: default
|
||||
roleRef:
|
||||
kind: Role
|
||||
name: homelab-app-role
|
||||
apiGroup: rbac.authorization.k8s.io
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Cluster Management
|
||||
|
||||
### **Backup and Restore**
|
||||
|
||||
#### **etcd Backup**
|
||||
```bash
|
||||
# Create backup script
|
||||
cat <<EOF > /usr/local/bin/etcd-backup.sh
|
||||
#!/bin/bash
|
||||
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot-\$(date +%Y%m%d-%H%M%S).db \
|
||||
--endpoints=https://127.0.0.1:2379 \
|
||||
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
|
||||
--cert=/etc/kubernetes/pki/etcd/server.crt \
|
||||
--key=/etc/kubernetes/pki/etcd/server.key
|
||||
|
||||
# Keep only last 7 days of backups
|
||||
find /backup -name "etcd-snapshot-*.db" -mtime +7 -delete
|
||||
EOF
|
||||
|
||||
chmod +x /usr/local/bin/etcd-backup.sh
|
||||
|
||||
# Schedule daily backups
|
||||
echo "0 2 * * * /usr/local/bin/etcd-backup.sh" | crontab -
|
||||
```
|
||||
|
||||
#### **Velero for Application Backup**
|
||||
```bash
|
||||
# Install Velero CLI
|
||||
wget https://github.com/vmware-tanzu/velero/releases/latest/download/velero-linux-amd64.tar.gz
|
||||
tar -xzf velero-linux-amd64.tar.gz
|
||||
sudo mv velero-*/velero /usr/local/bin/
|
||||
|
||||
# Install Velero server (using MinIO for storage)
|
||||
velero install \
|
||||
--provider aws \
|
||||
--plugins velero/velero-plugin-for-aws:v1.8.0 \
|
||||
--bucket velero-backups \
|
||||
--secret-file ./credentials-velero \
|
||||
--use-volume-snapshots=false \
|
||||
--backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.vish.local:9000
|
||||
|
||||
# Create backup schedule
|
||||
velero schedule create daily-backup --schedule="0 1 * * *"
|
||||
```
|
||||
|
||||
### **Cluster Upgrades**
|
||||
```bash
|
||||
# Upgrade control plane nodes (one at a time)
|
||||
# 1. Drain node
|
||||
kubectl drain k8s-master-01 --ignore-daemonsets --delete-emptydir-data
|
||||
|
||||
# 2. Upgrade kubeadm
|
||||
sudo apt update
|
||||
sudo apt-mark unhold kubeadm
|
||||
sudo apt install kubeadm=1.29.x-00
|
||||
sudo apt-mark hold kubeadm
|
||||
|
||||
# 3. Upgrade cluster
|
||||
sudo kubeadm upgrade plan
|
||||
sudo kubeadm upgrade apply v1.29.x
|
||||
|
||||
# 4. Upgrade kubelet and kubectl
|
||||
sudo apt-mark unhold kubelet kubectl
|
||||
sudo apt install kubelet=1.29.x-00 kubectl=1.29.x-00
|
||||
sudo apt-mark hold kubelet kubectl
|
||||
sudo systemctl daemon-reload
|
||||
sudo systemctl restart kubelet
|
||||
|
||||
# 5. Uncordon node
|
||||
kubectl uncordon k8s-master-01
|
||||
|
||||
# Repeat for other control plane nodes and workers
|
||||
```
|
||||
|
||||
### **Troubleshooting**
|
||||
```bash
|
||||
# Common troubleshooting commands
|
||||
kubectl get nodes -o wide
|
||||
kubectl get pods --all-namespaces
|
||||
kubectl describe node NODE_NAME
|
||||
kubectl logs -n kube-system POD_NAME
|
||||
|
||||
# Check cluster health
|
||||
kubectl get componentstatuses
|
||||
kubectl cluster-info
|
||||
kubectl get events --sort-by=.metadata.creationTimestamp
|
||||
|
||||
# Debug networking
|
||||
kubectl run debug --image=nicolaka/netshoot -it --rm -- /bin/bash
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Migration Strategy
|
||||
|
||||
### **Phase 1: Cluster Setup**
|
||||
```bash
|
||||
☐ Plan cluster architecture and resource allocation
|
||||
☐ Install Kubernetes on all nodes
|
||||
☐ Configure networking and storage
|
||||
☐ Install monitoring and logging
|
||||
☐ Set up backup and disaster recovery
|
||||
☐ Configure security policies
|
||||
☐ Test cluster functionality
|
||||
```
|
||||
|
||||
### **Phase 2: Service Migration**
|
||||
```bash
|
||||
☐ Identify services suitable for Kubernetes
|
||||
☐ Convert Docker Compose to Kubernetes manifests
|
||||
☐ Create Helm charts for complex applications
|
||||
☐ Set up ingress and SSL certificates
|
||||
☐ Configure persistent storage
|
||||
☐ Test service functionality
|
||||
☐ Update DNS and load balancing
|
||||
```
|
||||
|
||||
### **Phase 3: Production Cutover**
|
||||
```bash
|
||||
☐ Migrate non-critical services first
|
||||
☐ Update monitoring and alerting
|
||||
☐ Test disaster recovery procedures
|
||||
☐ Migrate critical services during maintenance window
|
||||
☐ Update documentation and runbooks
|
||||
☐ Train team on Kubernetes operations
|
||||
☐ Decommission old Docker Compose services
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [Network Architecture](networking.md) - Network design and VLANs for Kubernetes
|
||||
- [Ubiquiti Enterprise Setup](ubiquiti-enterprise-setup.md) - Enterprise networking for cluster infrastructure
|
||||
- [Laptop Travel Setup](laptop-travel-setup.md) - Remote access to Kubernetes cluster
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - VPN access to cluster services
|
||||
- [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Cluster backup and recovery
|
||||
- [Security Model](security.md) - Security architecture and policies
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: Start with a small, non-critical service migration to Kubernetes. Learn the platform gradually before moving mission-critical services. Kubernetes has a steep learning curve, but the benefits of container orchestration, scaling, and management are worth the investment for a growing homelab!
|
||||
723
docs/infrastructure/laptop-travel-setup.md
Normal file
723
docs/infrastructure/laptop-travel-setup.md
Normal file
@@ -0,0 +1,723 @@
|
||||
# 💻 Laptop Travel Setup Guide
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide covers setting up your laptop for secure travel with full homelab access, including Tailscale VPN tunneling through Atlantis for IP privacy, remote filesystem mounting, and zero-local-storage security practices.
|
||||
|
||||
## 🎯 Travel Security Philosophy
|
||||
|
||||
### **Zero Trust Laptop Model**
|
||||
- **No critical data stored locally** - Everything mounted from homelab
|
||||
- **Encrypted disk** - Full disk encryption for physical security
|
||||
- **VPN-only access** - All traffic routed through homelab
|
||||
- **Disposable mindset** - Laptop loss/theft has minimal impact
|
||||
- **Remote wipe capability** - Can be wiped remotely if compromised
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Tailscale Travel Configuration
|
||||
|
||||
### **Step 1: Install Tailscale on Laptop**
|
||||
|
||||
#### **Linux (Ubuntu/Debian)**
|
||||
```bash
|
||||
# Install Tailscale
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
|
||||
# Connect to your tailnet
|
||||
sudo tailscale up
|
||||
|
||||
# Verify connection
|
||||
tailscale status
|
||||
tailscale ip -4
|
||||
```
|
||||
|
||||
#### **macOS**
|
||||
```bash
|
||||
# Install via Homebrew
|
||||
brew install --cask tailscale
|
||||
|
||||
# Or download from: https://tailscale.com/download/mac
|
||||
# Launch Tailscale and sign in to your tailnet
|
||||
```
|
||||
|
||||
#### **Windows**
|
||||
```bash
|
||||
# Download from: https://tailscale.com/download/windows
|
||||
# Install and sign in to your tailnet
|
||||
# Run as administrator for best performance
|
||||
```
|
||||
|
||||
### **Step 2: Configure Exit Node (Atlantis)**
|
||||
|
||||
#### **On Atlantis (Exit Node Setup)**
|
||||
```bash
|
||||
# Enable IP forwarding
|
||||
echo 'net.ipv4.ip_forward = 1' | sudo tee -a /etc/sysctl.conf
|
||||
echo 'net.ipv6.conf.all.forwarding = 1' | sudo tee -a /etc/sysctl.conf
|
||||
sudo sysctl -p
|
||||
|
||||
# Advertise as exit node
|
||||
sudo tailscale up --advertise-exit-node
|
||||
|
||||
# Verify exit node status
|
||||
tailscale status
|
||||
```
|
||||
|
||||
#### **On Laptop (Use Exit Node)**
|
||||
```bash
|
||||
# Use Atlantis as exit node for all traffic
|
||||
tailscale up --exit-node=atlantis.vish.local
|
||||
|
||||
# Verify your public IP is now Atlantis
|
||||
curl ifconfig.me
|
||||
# Should show your home IP, not travel location IP
|
||||
|
||||
# Check routing
|
||||
tailscale status
|
||||
ip route | grep 100.64
|
||||
```
|
||||
|
||||
### **Step 3: Advanced Tailscale Configuration**
|
||||
|
||||
#### **Laptop-Specific Settings**
|
||||
```bash
|
||||
# Enable key expiry for security
|
||||
tailscale up --exit-node=atlantis.vish.local --auth-key=[auth-key] --timeout=24h
|
||||
|
||||
# Configure DNS to use homelab Pi-hole
|
||||
tailscale up --exit-node=atlantis.vish.local --accept-dns=true
|
||||
|
||||
# Disable key expiry warnings (optional)
|
||||
tailscale set --auto-update
|
||||
```
|
||||
|
||||
#### **Split Tunneling (Advanced)**
|
||||
```bash
|
||||
# Route only specific traffic through exit node
|
||||
# Create custom routing rules
|
||||
|
||||
# Route homelab traffic through Tailscale
|
||||
sudo ip route add 192.168.1.0/24 via $(tailscale ip -4) dev tailscale0
|
||||
|
||||
# Route specific services through exit node
|
||||
sudo ip route add 0.0.0.0/0 via $(tailscale ip -4 atlantis) dev tailscale0 table 100
|
||||
sudo ip rule add from $(tailscale ip -4) table 100
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 Remote Filesystem Mounting
|
||||
|
||||
### **SSHFS Setup (Recommended)**
|
||||
|
||||
#### **Install SSHFS**
|
||||
```bash
|
||||
# Ubuntu/Debian
|
||||
sudo apt install sshfs
|
||||
|
||||
# macOS
|
||||
brew install macfuse sshfs
|
||||
|
||||
# Windows (WSL)
|
||||
sudo apt install sshfs
|
||||
```
|
||||
|
||||
#### **Mount Homelab Filesystems**
|
||||
```bash
|
||||
# Create mount points
|
||||
mkdir -p ~/mounts/{atlantis,calypso,homelab-vm,projects,documents}
|
||||
|
||||
# Mount Atlantis (Primary NAS)
|
||||
sshfs vish@atlantis.vish.local:/volume1/homes/vish ~/mounts/atlantis \
|
||||
-o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,follow_symlinks
|
||||
|
||||
# Mount Calypso (Media NAS)
|
||||
sshfs vish@calypso.vish.local:/volume1/media ~/mounts/calypso \
|
||||
-o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
|
||||
|
||||
# Mount Homelab VM (Development)
|
||||
sshfs vish@homelab-vm.vish.local:/home/vish/projects ~/mounts/projects \
|
||||
-o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
|
||||
|
||||
# Mount Documents (Secure storage)
|
||||
sshfs vish@atlantis.vish.local:/volume1/documents ~/mounts/documents \
|
||||
-o reconnect,ServerAliveInterval=15,ServerAliveCountMax=3
|
||||
```
|
||||
|
||||
#### **Automated Mounting Script**
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# ~/scripts/mount-homelab.sh
|
||||
|
||||
set -e
|
||||
|
||||
MOUNTS_DIR="$HOME/mounts"
|
||||
LOG_FILE="$HOME/.homelab-mounts.log"
|
||||
|
||||
log() {
|
||||
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" | tee -a "$LOG_FILE"
|
||||
}
|
||||
|
||||
mount_fs() {
|
||||
local name="$1"
|
||||
local remote="$2"
|
||||
local local_path="$3"
|
||||
local options="$4"
|
||||
|
||||
if mountpoint -q "$local_path"; then
|
||||
log "✅ $name already mounted"
|
||||
return 0
|
||||
fi
|
||||
|
||||
mkdir -p "$local_path"
|
||||
|
||||
if sshfs "$remote" "$local_path" -o "$options"; then
|
||||
log "✅ Mounted $name: $remote -> $local_path"
|
||||
else
|
||||
log "❌ Failed to mount $name"
|
||||
return 1
|
||||
fi
|
||||
}
|
||||
|
||||
# Check Tailscale connectivity
|
||||
if ! tailscale status >/dev/null 2>&1; then
|
||||
log "❌ Tailscale not connected"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log "🚀 Starting homelab filesystem mounting..."
|
||||
|
||||
# Default SSHFS options
|
||||
OPTS="reconnect,ServerAliveInterval=15,ServerAliveCountMax=3,follow_symlinks,cache=yes,compression=yes"
|
||||
|
||||
# Mount all filesystems
|
||||
mount_fs "Atlantis Home" "vish@atlantis.vish.local:/volume1/homes/vish" "$MOUNTS_DIR/atlantis" "$OPTS"
|
||||
mount_fs "Calypso Media" "vish@calypso.vish.local:/volume1/media" "$MOUNTS_DIR/calypso" "$OPTS"
|
||||
mount_fs "Projects" "vish@homelab-vm.vish.local:/home/vish/projects" "$MOUNTS_DIR/projects" "$OPTS"
|
||||
mount_fs "Documents" "vish@atlantis.vish.local:/volume1/documents" "$MOUNTS_DIR/documents" "$OPTS"
|
||||
mount_fs "Backups" "vish@anubis.vish.local:/volume1/backups" "$MOUNTS_DIR/backups" "$OPTS"
|
||||
|
||||
log "🎯 Homelab mounting complete"
|
||||
|
||||
# Create convenient symlinks
|
||||
ln -sf "$MOUNTS_DIR/projects" "$HOME/Projects"
|
||||
ln -sf "$MOUNTS_DIR/documents" "$HOME/Documents"
|
||||
ln -sf "$MOUNTS_DIR/atlantis/Desktop" "$HOME/Desktop-Remote"
|
||||
ln -sf "$MOUNTS_DIR/calypso/Photos" "$HOME/Photos"
|
||||
ln -sf "$MOUNTS_DIR/calypso/Movies" "$HOME/Movies"
|
||||
|
||||
log "🔗 Symlinks created"
|
||||
```
|
||||
|
||||
#### **Unmounting Script**
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# ~/scripts/unmount-homelab.sh
|
||||
|
||||
MOUNTS_DIR="$HOME/mounts"
|
||||
|
||||
unmount_fs() {
|
||||
local path="$1"
|
||||
local name="$2"
|
||||
|
||||
if mountpoint -q "$path"; then
|
||||
if fusermount -u "$path" 2>/dev/null || umount "$path" 2>/dev/null; then
|
||||
echo "✅ Unmounted $name"
|
||||
else
|
||||
echo "❌ Failed to unmount $name"
|
||||
return 1
|
||||
fi
|
||||
else
|
||||
echo "ℹ️ $name not mounted"
|
||||
fi
|
||||
}
|
||||
|
||||
echo "🔄 Unmounting homelab filesystems..."
|
||||
|
||||
unmount_fs "$MOUNTS_DIR/atlantis" "Atlantis"
|
||||
unmount_fs "$MOUNTS_DIR/calypso" "Calypso"
|
||||
unmount_fs "$MOUNTS_DIR/projects" "Projects"
|
||||
unmount_fs "$MOUNTS_DIR/documents" "Documents"
|
||||
unmount_fs "$MOUNTS_DIR/backups" "Backups"
|
||||
|
||||
# Remove symlinks
|
||||
rm -f "$HOME/Projects" "$HOME/Documents" "$HOME/Desktop-Remote" "$HOME/Photos" "$HOME/Movies"
|
||||
|
||||
echo "🎯 Unmounting complete"
|
||||
```
|
||||
|
||||
### **NFS Setup (Alternative)**
|
||||
|
||||
#### **On Homelab Servers (NFS Server)**
|
||||
```bash
|
||||
# Install NFS server (on Atlantis/Calypso)
|
||||
sudo apt install nfs-kernel-server
|
||||
|
||||
# Configure exports
|
||||
sudo tee /etc/exports << 'EOF'
|
||||
/volume1/homes/vish 100.64.0.0/10(rw,sync,no_subtree_check,no_root_squash)
|
||||
/volume1/documents 100.64.0.0/10(rw,sync,no_subtree_check,no_root_squash)
|
||||
/volume1/media 100.64.0.0/10(ro,sync,no_subtree_check)
|
||||
EOF
|
||||
|
||||
# Apply exports
|
||||
sudo exportfs -ra
|
||||
sudo systemctl restart nfs-kernel-server
|
||||
|
||||
# Check exports
|
||||
sudo exportfs -v
|
||||
```
|
||||
|
||||
#### **On Laptop (NFS Client)**
|
||||
```bash
|
||||
# Install NFS client
|
||||
sudo apt install nfs-common
|
||||
|
||||
# Mount NFS shares
|
||||
sudo mount -t nfs atlantis.vish.local:/volume1/homes/vish ~/mounts/atlantis
|
||||
sudo mount -t nfs calypso.vish.local:/volume1/media ~/mounts/calypso
|
||||
|
||||
# Add to /etc/fstab for automatic mounting
|
||||
echo "atlantis.vish.local:/volume1/homes/vish $HOME/mounts/atlantis nfs defaults,user,noauto 0 0" | sudo tee -a /etc/fstab
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔐 SSH Key Management for Travel
|
||||
|
||||
### **SSH Agent Setup**
|
||||
```bash
|
||||
# Start SSH agent
|
||||
eval "$(ssh-agent -s)"
|
||||
|
||||
# Add homelab keys
|
||||
ssh-add ~/.ssh/homelab_ed25519
|
||||
ssh-add ~/.ssh/atlantis_ed25519
|
||||
ssh-add ~/.ssh/servers_ed25519
|
||||
|
||||
# List loaded keys
|
||||
ssh-add -l
|
||||
|
||||
# Configure SSH agent forwarding
|
||||
echo "ForwardAgent yes" >> ~/.ssh/config
|
||||
```
|
||||
|
||||
### **SSH Configuration for Homelab**
|
||||
```bash
|
||||
# ~/.ssh/config
|
||||
Host atlantis
|
||||
HostName atlantis.vish.local
|
||||
User vish
|
||||
IdentityFile ~/.ssh/homelab_ed25519
|
||||
ServerAliveInterval 60
|
||||
ServerAliveCountMax 3
|
||||
ForwardAgent yes
|
||||
Compression yes
|
||||
|
||||
Host calypso
|
||||
HostName calypso.vish.local
|
||||
User vish
|
||||
IdentityFile ~/.ssh/homelab_ed25519
|
||||
ServerAliveInterval 60
|
||||
ServerAliveCountMax 3
|
||||
ForwardAgent yes
|
||||
Compression yes
|
||||
|
||||
Host homelab-vm
|
||||
HostName homelab-vm.vish.local
|
||||
User vish
|
||||
IdentityFile ~/.ssh/homelab_ed25519
|
||||
ServerAliveInterval 60
|
||||
ServerAliveCountMax 3
|
||||
ForwardAgent yes
|
||||
Compression yes
|
||||
|
||||
Host *.vish.local
|
||||
User vish
|
||||
IdentityFile ~/.ssh/homelab_ed25519
|
||||
ServerAliveInterval 60
|
||||
ServerAliveCountMax 3
|
||||
ForwardAgent yes
|
||||
Compression yes
|
||||
StrictHostKeyChecking accept-new
|
||||
```
|
||||
|
||||
### **Secure Key Storage**
|
||||
```bash
|
||||
# Encrypt SSH keys for travel
|
||||
gpg --cipher-algo AES256 --compress-algo 1 --s2k-mode 3 \
|
||||
--s2k-digest-algo SHA512 --s2k-count 65536 --symmetric \
|
||||
--output ~/.ssh/homelab_ed25519.gpg ~/.ssh/homelab_ed25519
|
||||
|
||||
# Decrypt when needed
|
||||
gpg --decrypt ~/.ssh/homelab_ed25519.gpg > ~/.ssh/homelab_ed25519
|
||||
chmod 600 ~/.ssh/homelab_ed25519
|
||||
ssh-add ~/.ssh/homelab_ed25519
|
||||
|
||||
# Secure delete original after encryption
|
||||
shred -vfz -n 3 ~/.ssh/homelab_ed25519
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🖥️ Development Environment Setup
|
||||
|
||||
### **VS Code Remote Development**
|
||||
```bash
|
||||
# Install VS Code extensions
|
||||
code --install-extension ms-vscode-remote.remote-ssh
|
||||
code --install-extension ms-vscode-remote.remote-containers
|
||||
|
||||
# Configure remote development
|
||||
# File: ~/.vscode/settings.json
|
||||
{
|
||||
"remote.SSH.remotePlatform": {
|
||||
"homelab-vm.vish.local": "linux",
|
||||
"atlantis.vish.local": "linux",
|
||||
"concord-nuc.vish.local": "linux"
|
||||
},
|
||||
"remote.SSH.configFile": "~/.ssh/config",
|
||||
"remote.SSH.enableAgentForwarding": true
|
||||
}
|
||||
|
||||
# Connect to remote development environment
|
||||
code --remote ssh-remote+homelab-vm.vish.local /home/vish/projects
|
||||
```
|
||||
|
||||
### **Terminal Multiplexer (tmux/screen)**
|
||||
```bash
|
||||
# Install tmux on homelab servers
|
||||
ssh atlantis.vish.local 'sudo apt install tmux'
|
||||
ssh homelab-vm.vish.local 'sudo apt install tmux'
|
||||
|
||||
# Create persistent development sessions
|
||||
ssh homelab-vm.vish.local
|
||||
tmux new-session -d -s development
|
||||
tmux new-session -d -s monitoring
|
||||
tmux new-session -d -s admin
|
||||
|
||||
# Reconnect to sessions from laptop
|
||||
ssh homelab-vm.vish.local -t tmux attach-session -t development
|
||||
```
|
||||
|
||||
### **Docker Development**
|
||||
```bash
|
||||
# Use Docker on homelab servers remotely
|
||||
export DOCKER_HOST="ssh://vish@homelab-vm.vish.local"
|
||||
|
||||
# Run containers on remote host
|
||||
docker run -it --rm ubuntu:latest bash
|
||||
docker-compose -f ~/mounts/projects/myapp/docker-compose.yml up -d
|
||||
|
||||
# Build images on remote host
|
||||
docker build -t myapp ~/mounts/projects/myapp/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📱 Mobile Companion Setup
|
||||
|
||||
### **Mobile Apps for Homelab Access**
|
||||
```bash
|
||||
# Essential mobile apps:
|
||||
|
||||
# VPN & Network
|
||||
- Tailscale (primary VPN)
|
||||
- WireGuard (backup VPN)
|
||||
- Network Analyzer (troubleshooting)
|
||||
|
||||
# Remote Access
|
||||
- Termius (SSH client)
|
||||
- RDP Client (Windows remote desktop)
|
||||
- VNC Viewer (Linux desktop access)
|
||||
|
||||
# File Access
|
||||
- Solid Explorer (Android file manager with SFTP)
|
||||
- Documents (iOS file manager with SSH)
|
||||
- Syncthing (file synchronization)
|
||||
|
||||
# Services
|
||||
- Bitwarden (password manager)
|
||||
- Plex/Jellyfin (media streaming)
|
||||
- Home Assistant (smart home control)
|
||||
```
|
||||
|
||||
### **Mobile Hotspot Configuration**
|
||||
```bash
|
||||
# Configure laptop to use mobile hotspot when needed
|
||||
# Network Manager configuration for automatic connection
|
||||
|
||||
# Create hotspot profile
|
||||
nmcli connection add type wifi ifname wlan0 con-name "Mobile-Hotspot" \
|
||||
autoconnect yes ssid "YourPhone-Hotspot"
|
||||
nmcli connection modify "Mobile-Hotspot" wifi-sec.key-mgmt wpa-psk
|
||||
nmcli connection modify "Mobile-Hotspot" wifi-sec.psk "hotspot-password"
|
||||
|
||||
# Set priority (lower number = higher priority)
|
||||
nmcli connection modify "Mobile-Hotspot" connection.autoconnect-priority 10
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security Hardening for Travel
|
||||
|
||||
### **Full Disk Encryption**
|
||||
```bash
|
||||
# Ubuntu/Debian - Enable during installation or:
|
||||
sudo cryptsetup luksFormat /dev/sdX
|
||||
sudo cryptsetup luksOpen /dev/sdX encrypted_disk
|
||||
|
||||
# macOS - Enable FileVault
|
||||
sudo fdesetup enable
|
||||
|
||||
# Windows - Enable BitLocker
|
||||
manage-bde -on C: -REDACTED_APP_PASSWORD
|
||||
```
|
||||
|
||||
### **Firewall Configuration**
|
||||
```bash
|
||||
# Ubuntu/Debian UFW
|
||||
sudo ufw enable
|
||||
sudo ufw default deny incoming
|
||||
sudo ufw default allow outgoing
|
||||
|
||||
# Allow only Tailscale traffic
|
||||
sudo ufw allow in on tailscale0
|
||||
sudo ufw allow out on tailscale0
|
||||
|
||||
# Block all other VPN interfaces
|
||||
sudo ufw deny in on tun0
|
||||
sudo ufw deny in on wg0
|
||||
```
|
||||
|
||||
### **Auto-lock and Security**
|
||||
```bash
|
||||
# Linux - Auto-lock after 5 minutes
|
||||
gsettings set org.gnome.desktop.screensaver lock-delay 300
|
||||
gsettings set org.gnome.desktop.screensaver lock-enabled true
|
||||
|
||||
# Require password immediately after lock
|
||||
gsettings set org.gnome.desktop.screensaver lock-delay 0
|
||||
|
||||
# Auto-suspend after 30 minutes
|
||||
gsettings set org.gnome.settings-daemon.plugins.power sleep-inactive-ac-timeout 1800
|
||||
```
|
||||
|
||||
### **Remote Wipe Capability**
|
||||
```bash
|
||||
# Install remote wipe tools
|
||||
sudo apt install openssh-server fail2ban
|
||||
|
||||
# Create remote wipe script
|
||||
sudo tee /usr/local/bin/emergency-wipe.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
# Emergency laptop wipe script
|
||||
# Trigger via SSH: ssh laptop.tailscale "sudo /usr/local/bin/emergency-wipe.sh"
|
||||
|
||||
echo "🚨 EMERGENCY WIPE INITIATED"
|
||||
logger "Emergency wipe initiated from $(who am i)"
|
||||
|
||||
# Unmount all SSHFS mounts
|
||||
fusermount -u /home/*/mounts/* 2>/dev/null || true
|
||||
|
||||
# Clear SSH keys and known hosts
|
||||
rm -rf /home/*/.ssh/id_* /home/*/.ssh/known_hosts
|
||||
|
||||
# Clear browser data
|
||||
rm -rf /home/*/.mozilla/firefox/*/cookies.sqlite
|
||||
rm -rf /home/*/.config/google-chrome/Default/Cookies
|
||||
rm -rf /home/*/.config/chromium/Default/Cookies
|
||||
|
||||
# Clear recent files and history
|
||||
rm -rf /home/*/.local/share/recently-used.xbel
|
||||
rm -rf /home/*/.bash_history /home/*/.zsh_history
|
||||
|
||||
# Disconnect from Tailscale
|
||||
tailscale logout
|
||||
|
||||
# Optional: Full disk wipe (DESTRUCTIVE!)
|
||||
# dd if=/dev/urandom of=/dev/sda bs=1M
|
||||
|
||||
echo "🎯 Emergency wipe complete"
|
||||
logger "Emergency wipe completed"
|
||||
EOF
|
||||
|
||||
sudo chmod +x /usr/local/bin/emergency-wipe.sh
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌍 Travel Workflow Examples
|
||||
|
||||
### **Coffee Shop Work Session**
|
||||
```bash
|
||||
# 1. Connect to WiFi
|
||||
# 2. Start Tailscale
|
||||
tailscale up --exit-node=atlantis.vish.local
|
||||
|
||||
# 3. Mount filesystems
|
||||
~/scripts/mount-homelab.sh
|
||||
|
||||
# 4. Start development environment
|
||||
code --remote ssh-remote+homelab-vm.vish.local ~/projects/current-project
|
||||
|
||||
# 5. Open monitoring dashboards
|
||||
firefox https://atlantis.vish.local:3000 # Grafana
|
||||
firefox https://atlantis.vish.local:3001 # Uptime Kuma
|
||||
|
||||
# 6. Work normally - all data stays on homelab
|
||||
```
|
||||
|
||||
### **Hotel Work Session**
|
||||
```bash
|
||||
# 1. Connect to hotel WiFi (potentially untrusted)
|
||||
# 2. Immediately connect Tailscale with exit node
|
||||
tailscale up --exit-node=atlantis.vish.local --accept-dns=true
|
||||
|
||||
# 3. Verify IP is masked
|
||||
curl ifconfig.me # Should show home IP
|
||||
|
||||
# 4. Mount filesystems and work
|
||||
~/scripts/mount-homelab.sh
|
||||
```
|
||||
|
||||
### **Airplane Work (Offline)**
|
||||
```bash
|
||||
# 1. Before flight, sync critical files
|
||||
rsync -av atlantis.vish.local:/volume1/homes/vish/current-project/ ~/offline-work/
|
||||
|
||||
# 2. Work offline on local copy
|
||||
# 3. After landing, sync changes back
|
||||
rsync -av ~/offline-work/ atlantis.vish.local:/volume1/homes/vish/current-project/
|
||||
|
||||
# 4. Clean up local copy
|
||||
rm -rf ~/offline-work/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Troubleshooting Travel Issues
|
||||
|
||||
### **Tailscale Connection Problems**
|
||||
```bash
|
||||
# Check Tailscale status
|
||||
tailscale status
|
||||
tailscale netcheck
|
||||
|
||||
# Reset Tailscale connection
|
||||
sudo tailscale down
|
||||
sudo tailscale up --exit-node=atlantis.vish.local
|
||||
|
||||
# Check routing
|
||||
ip route | grep tailscale
|
||||
ip route | grep 100.64
|
||||
|
||||
# Test connectivity to homelab
|
||||
ping atlantis.vish.local
|
||||
ping 192.168.1.100
|
||||
```
|
||||
|
||||
### **SSHFS Mount Issues**
|
||||
```bash
|
||||
# Check if mounts are stale
|
||||
df -h | grep fuse
|
||||
mountpoint ~/mounts/atlantis
|
||||
|
||||
# Force unmount stale mounts
|
||||
fusermount -uz ~/mounts/atlantis
|
||||
# or
|
||||
sudo umount -f ~/mounts/atlantis
|
||||
|
||||
# Remount with debug
|
||||
sshfs -d vish@atlantis.vish.local:/volume1/homes/vish ~/mounts/atlantis
|
||||
|
||||
# Check SSH connectivity
|
||||
ssh -v atlantis.vish.local
|
||||
```
|
||||
|
||||
### **DNS Resolution Issues**
|
||||
```bash
|
||||
# Check DNS settings
|
||||
cat /etc/resolv.conf
|
||||
systemd-resolve --status
|
||||
|
||||
# Test DNS resolution
|
||||
nslookup atlantis.vish.local
|
||||
dig atlantis.vish.local
|
||||
|
||||
# Force DNS through Tailscale
|
||||
tailscale up --exit-node=atlantis.vish.local --accept-dns=true
|
||||
```
|
||||
|
||||
### **Performance Issues**
|
||||
```bash
|
||||
# Test network speed
|
||||
speedtest-cli
|
||||
|
||||
# Test Tailscale performance
|
||||
iperf3 -c atlantis.vish.local
|
||||
|
||||
# Check for packet loss
|
||||
ping -c 100 atlantis.vish.local | grep loss
|
||||
|
||||
# Monitor network usage
|
||||
iftop -i tailscale0
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Travel Checklist
|
||||
|
||||
### **Pre-Travel Setup**
|
||||
```bash
|
||||
☐ Tailscale installed and configured
|
||||
☐ Exit node (Atlantis) configured and tested
|
||||
☐ SSH keys encrypted and backed up
|
||||
☐ Mount scripts tested and working
|
||||
☐ Remote wipe script configured
|
||||
☐ Full disk encryption enabled
|
||||
☐ Firewall configured for travel
|
||||
☐ Mobile apps installed and configured
|
||||
☐ Emergency contact information accessible
|
||||
☐ Backup authentication methods available
|
||||
```
|
||||
|
||||
### **Daily Travel Routine**
|
||||
```bash
|
||||
☐ Connect to Tailscale immediately after WiFi
|
||||
☐ Verify exit node is active (check IP)
|
||||
☐ Mount homelab filesystems
|
||||
☐ Check homelab service status
|
||||
☐ Work with remote-only data
|
||||
☐ Unmount filesystems before sleep/shutdown
|
||||
☐ Log out of sensitive services
|
||||
☐ Clear browser cache/history if needed
|
||||
```
|
||||
|
||||
### **Post-Travel Security**
|
||||
```bash
|
||||
☐ Review travel access logs
|
||||
☐ Change passwords if compromise suspected
|
||||
☐ Update SSH keys if needed
|
||||
☐ Review and clean up local files
|
||||
☐ Update travel procedures based on experience
|
||||
☐ Backup any new configurations
|
||||
☐ Document any issues encountered
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [📱 Mobile Device Setup](mobile-device-setup.md) - **NEW!** iOS, Android, macOS, Linux Tailscale configuration
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - Complete Tailscale configuration
|
||||
- [Ubiquiti Enterprise Setup](ubiquiti-enterprise-setup.md) - Enterprise networking for advanced setups
|
||||
- [Kubernetes Cluster Setup](kubernetes-cluster-setup.md) - Remote access to Kubernetes services
|
||||
- [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Emergency procedures
|
||||
- [Offline Password Access](../troubleshooting/offline-password-access.md) - Password management while traveling
|
||||
- [Security Model](security.md) - Overall security architecture
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: Practice your travel setup at home first! Test all mounting, VPN, and remote access procedures on your home network before traveling. This ensures everything works smoothly when you're away from your homelab.
|
||||
776
docs/infrastructure/mobile-device-setup.md
Normal file
776
docs/infrastructure/mobile-device-setup.md
Normal file
@@ -0,0 +1,776 @@
|
||||
# 📱 Mobile Device Setup Guide
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide covers setting up Tailscale on all mobile and desktop platforms (iOS, macOS, Linux, iPadOS, Android, Debian, Rocky Linux) for secure homelab access with a disposable device philosophy.
|
||||
|
||||
## 🎯 Mobile Security Philosophy
|
||||
|
||||
### **Disposable Device Model**
|
||||
- **No critical data stored locally** - Everything accessed remotely
|
||||
- **Zero trust approach** - Assume devices will be lost/stolen/broken
|
||||
- **Cloud-based authentication** - Bitwarden, iCloud Keychain, Google Password Manager
|
||||
- **Remote wipe capability** - All devices can be wiped remotely
|
||||
- **Minimal local storage** - Only cached data and temporary files
|
||||
- **VPN-first access** - All homelab access through Tailscale
|
||||
|
||||
---
|
||||
|
||||
## 📱 iOS Setup (iPhone 16 Pro Max)
|
||||
|
||||
### **Install and Configure Tailscale**
|
||||
|
||||
#### **Installation**
|
||||
```bash
|
||||
# Install from App Store
|
||||
# Search: "Tailscale"
|
||||
# Developer: Tailscale Inc.
|
||||
# Install and open app
|
||||
# Compatible with iPhone 16 Pro Max running iOS 18+
|
||||
```
|
||||
|
||||
#### **Initial Setup**
|
||||
```bash
|
||||
# 1. Open Tailscale app
|
||||
# 2. Tap "Sign in"
|
||||
# 3. Choose your identity provider:
|
||||
# - Google (recommended for personal)
|
||||
# - Microsoft (for work accounts)
|
||||
# - GitHub (for developers)
|
||||
# 4. Complete authentication
|
||||
# 5. Allow VPN configuration when prompted
|
||||
# 6. Device will appear in Tailscale admin console
|
||||
```
|
||||
|
||||
#### **iOS-Specific Configuration**
|
||||
```bash
|
||||
# Enable key features in Tailscale app:
|
||||
|
||||
# Settings → General
|
||||
Use Tailscale DNS: ✅ Enabled
|
||||
Accept DNS Configuration: ✅ Enabled
|
||||
Use Exit Nodes: ✅ Enabled (for privacy)
|
||||
|
||||
# Settings → Exit Nodes
|
||||
Select: atlantis.vish.local (your homelab exit node)
|
||||
Allow LAN Access: ✅ Enabled (access homelab services)
|
||||
|
||||
# Settings → Preferences
|
||||
Start on Boot: ✅ Enabled
|
||||
Use Cellular Data: ✅ Enabled (for mobile access)
|
||||
```
|
||||
|
||||
### **iOS Shortcuts for Homelab Access**
|
||||
|
||||
#### **Create Homelab Shortcuts**
|
||||
```bash
|
||||
# Open Shortcuts app and create:
|
||||
|
||||
# Shortcut 1: "Connect Homelab"
|
||||
Actions:
|
||||
1. Set Variable: "tailscale_status" to "Get Network Details"
|
||||
2. If (Tailscale connected):
|
||||
- Show Notification: "Homelab Connected"
|
||||
3. Otherwise:
|
||||
- Open App: Tailscale
|
||||
- Wait 2 seconds
|
||||
- Show Notification: "Connecting to Homelab..."
|
||||
|
||||
# Shortcut 2: "Open Grafana"
|
||||
Actions:
|
||||
1. Open URLs: https://atlantis.vish.local:3000
|
||||
2. (Will open in Safari with Tailscale routing)
|
||||
|
||||
# Shortcut 3: "Open Plex"
|
||||
Actions:
|
||||
1. Open URLs: https://atlantis.vish.local:32400/web
|
||||
|
||||
# Shortcut 4: "Open Home Assistant"
|
||||
Actions:
|
||||
1. Open URLs: https://concord-nuc.vish.local:8123
|
||||
```
|
||||
|
||||
### **Essential iOS Apps for Homelab**
|
||||
|
||||
#### **Core Apps**
|
||||
```bash
|
||||
# VPN & Network
|
||||
- Tailscale (primary VPN)
|
||||
- Network Analyzer (troubleshooting)
|
||||
- Ping (network testing)
|
||||
|
||||
# Remote Access
|
||||
- Termius (SSH client)
|
||||
- Microsoft Remote Desktop (RDP)
|
||||
- VNC Viewer (Linux desktop access)
|
||||
- Jump Desktop (comprehensive remote access)
|
||||
|
||||
# File Management
|
||||
- Documents by Readdle (SFTP/SSH file access)
|
||||
- FileBrowser (web-based file management)
|
||||
- Working Copy (Git client)
|
||||
|
||||
# Password Management
|
||||
- Bitwarden (primary password manager)
|
||||
- Built-in iCloud Keychain (backup)
|
||||
|
||||
# Monitoring & Services
|
||||
- Grafana mobile app (monitoring dashboards)
|
||||
- Home Assistant Companion (smart home)
|
||||
- Plex (media streaming)
|
||||
- Immich (photo management)
|
||||
```
|
||||
|
||||
#### **iOS Configuration for Each App**
|
||||
|
||||
**Termius SSH Client:**
|
||||
```bash
|
||||
# Add homelab hosts
|
||||
Host: atlantis
|
||||
Address: atlantis.vish.local
|
||||
Username: vish
|
||||
Authentication: SSH Key
|
||||
Port: 22
|
||||
|
||||
# Import SSH key (if needed)
|
||||
# Settings → Keys → Add Key → Import from Files
|
||||
# Or generate new key pair in Termius
|
||||
```
|
||||
|
||||
**Documents by Readdle:**
|
||||
```bash
|
||||
# Add SFTP connections
|
||||
Name: Atlantis Files
|
||||
Protocol: SFTP
|
||||
Server: atlantis.vish.local
|
||||
Username: vish
|
||||
Authentication: SSH Key or Password
|
||||
Port: 22
|
||||
Path: /volume1/homes/vish
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 💻 macOS Setup
|
||||
|
||||
### **Install Tailscale**
|
||||
|
||||
#### **Installation Methods**
|
||||
```bash
|
||||
# Method 1: Direct Download
|
||||
# Visit: https://tailscale.com/download/mac
|
||||
# Download and install .pkg file
|
||||
|
||||
# Method 2: Homebrew
|
||||
brew install --cask tailscale
|
||||
|
||||
# Method 3: Mac App Store
|
||||
# Search for "Tailscale" and install
|
||||
```
|
||||
|
||||
#### **Configuration**
|
||||
```bash
|
||||
# Launch Tailscale from Applications
|
||||
# Sign in with your account
|
||||
# Configure in System Preferences → Network
|
||||
|
||||
# Tailscale Preferences:
|
||||
Use Tailscale DNS: ✅ Enabled
|
||||
Accept Routes: ✅ Enabled
|
||||
Use Exit Node: atlantis.vish.local
|
||||
Allow LAN Access: ✅ Enabled
|
||||
Start at Login: ✅ Enabled
|
||||
```
|
||||
|
||||
### **macOS Integration Features**
|
||||
|
||||
#### **Menu Bar Access**
|
||||
```bash
|
||||
# Tailscale menu bar icon provides:
|
||||
- Connection status
|
||||
- Quick exit node switching
|
||||
- Device list with status
|
||||
- Admin console access
|
||||
- Preferences shortcut
|
||||
```
|
||||
|
||||
#### **Keychain Integration**
|
||||
```bash
|
||||
# Store SSH keys in Keychain
|
||||
ssh-add --apple-use-keychain ~/.ssh/homelab_ed25519
|
||||
|
||||
# Configure SSH to use Keychain
|
||||
echo "UseKeychain yes" >> ~/.ssh/config
|
||||
echo "AddKeysToAgent yes" >> ~/.ssh/config
|
||||
```
|
||||
|
||||
### **macOS Homelab Workflow**
|
||||
|
||||
#### **Terminal Setup**
|
||||
```bash
|
||||
# Install essential tools
|
||||
brew install htop tmux git wget curl
|
||||
|
||||
# Configure SSH for homelab
|
||||
cat >> ~/.ssh/config << 'EOF'
|
||||
Host *.vish.local
|
||||
User vish
|
||||
IdentityFile ~/.ssh/homelab_ed25519
|
||||
ServerAliveInterval 60
|
||||
ServerAliveCountMax 3
|
||||
UseKeychain yes
|
||||
AddKeysToAgent yes
|
||||
EOF
|
||||
|
||||
# Create homelab aliases
|
||||
cat >> ~/.zshrc << 'EOF'
|
||||
# Homelab aliases
|
||||
alias atlantis='ssh atlantis.vish.local'
|
||||
alias calypso='ssh calypso.vish.local'
|
||||
alias homelab='ssh homelab-vm.vish.local'
|
||||
alias grafana='open https://atlantis.vish.local:3000'
|
||||
alias plex='open https://atlantis.vish.local:32400/web'
|
||||
alias homeassistant='open https://concord-nuc.vish.local:8123'
|
||||
EOF
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🐧 Linux Setup (Debian/Ubuntu)
|
||||
|
||||
### **Install Tailscale**
|
||||
|
||||
#### **Official Installation**
|
||||
```bash
|
||||
# Add Tailscale repository
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
|
||||
# Alternative manual installation
|
||||
curl -fsSL https://pkgs.tailscale.com/stable/debian/bullseye.noarmor.gpg | sudo tee /usr/share/keyrings/tailscale-archive-keyring.gpg >/dev/null
|
||||
curl -fsSL https://pkgs.tailscale.com/stable/debian/bullseye.list | sudo tee /etc/apt/sources.list.d/tailscale.list
|
||||
|
||||
sudo apt update
|
||||
sudo apt install tailscale
|
||||
|
||||
# Start and enable service
|
||||
sudo systemctl enable --now tailscaled
|
||||
```
|
||||
|
||||
#### **Authentication and Configuration**
|
||||
```bash
|
||||
# Connect to tailnet
|
||||
sudo tailscale up --accept-dns --accept-routes
|
||||
|
||||
# Use exit node for privacy
|
||||
sudo tailscale up --exit-node=atlantis.vish.local --accept-dns --accept-routes
|
||||
|
||||
# Check status
|
||||
tailscale status
|
||||
tailscale ip -4
|
||||
```
|
||||
|
||||
### **Linux Desktop Integration**
|
||||
|
||||
#### **GNOME Integration**
|
||||
```bash
|
||||
# Install GNOME extensions for network management
|
||||
sudo apt install gnome-shell-extensions
|
||||
|
||||
# Network Manager integration
|
||||
# Tailscale will appear in network settings
|
||||
# Can be controlled via GUI
|
||||
```
|
||||
|
||||
#### **KDE Integration**
|
||||
```bash
|
||||
# KDE Plasma network widget shows Tailscale
|
||||
# System Settings → Network → Connections
|
||||
# Tailscale appears as VPN connection
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🏔️ Rocky Linux Setup
|
||||
|
||||
### **Install Tailscale**
|
||||
|
||||
#### **RPM Installation**
|
||||
```bash
|
||||
# Add Tailscale repository
|
||||
sudo dnf config-manager --add-repo https://pkgs.tailscale.com/stable/rhel/9/tailscale.repo
|
||||
|
||||
# Install Tailscale
|
||||
sudo dnf install tailscale
|
||||
|
||||
# Enable and start service
|
||||
sudo systemctl enable --now tailscaled
|
||||
|
||||
# Configure firewall
|
||||
sudo firewall-cmd --permanent --add-port=41641/udp
|
||||
sudo firewall-cmd --reload
|
||||
```
|
||||
|
||||
#### **SELinux Configuration**
|
||||
```bash
|
||||
# Allow Tailscale through SELinux
|
||||
sudo setsebool -P use_vpn_generic 1
|
||||
|
||||
# If needed, create custom policy
|
||||
sudo ausearch -c 'tailscaled' --raw | audit2allow -M tailscale-policy
|
||||
sudo semodule -i tailscale-policy.pp
|
||||
```
|
||||
|
||||
#### **Rocky Linux Specific Setup**
|
||||
```bash
|
||||
# Connect to tailnet
|
||||
sudo tailscale up --accept-dns --accept-routes --exit-node=atlantis.vish.local
|
||||
|
||||
# Configure NetworkManager (if using GUI)
|
||||
sudo nmcli connection modify tailscale0 connection.autoconnect yes
|
||||
|
||||
# Verify configuration
|
||||
tailscale status
|
||||
ip route | grep tailscale
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📱 iPadOS Setup (iPad Pro 12.9" 6th Gen)
|
||||
|
||||
### **Installation and Configuration**
|
||||
```bash
|
||||
# Same as iOS installation process
|
||||
# App Store → Search "Tailscale" → Install
|
||||
|
||||
# iPad Pro 12.9" 6th Gen specific features:
|
||||
# - M2 chip performance for demanding remote work
|
||||
# - 12.9" Liquid Retina XDR display for detailed work
|
||||
# - Split View support for SSH + web browsing
|
||||
# - External keyboard shortcuts (Magic Keyboard compatible)
|
||||
# - Mouse/trackpad support for remote desktop
|
||||
# - Files app integration for SFTP
|
||||
# - USB-C connectivity for external storage
|
||||
# - Thunderbolt 4 support for high-speed connections
|
||||
```
|
||||
|
||||
### **iPadOS Productivity Setup**
|
||||
|
||||
#### **Split Screen Workflows**
|
||||
```bash
|
||||
# Common split-screen combinations:
|
||||
# 1. Termius (SSH) + Safari (web services)
|
||||
# 2. Working Copy (Git) + Textastic (code editor)
|
||||
# 3. Documents (files) + Grafana (monitoring)
|
||||
# 4. Home Assistant + Plex (entertainment + automation)
|
||||
```
|
||||
|
||||
#### **External Keyboard Shortcuts (Magic Keyboard)**
|
||||
```bash
|
||||
# Configure in Settings → General → Keyboard → Hardware Keyboard
|
||||
# Magic Keyboard for iPad Pro 12.9" provides laptop-like experience
|
||||
|
||||
# Essential shortcuts for homelab work:
|
||||
Cmd+Tab: Switch between apps
|
||||
Cmd+Space: Spotlight search (find apps quickly)
|
||||
Cmd+Shift+4: Screenshot (for documentation)
|
||||
Cmd+`: Switch between windows of same app
|
||||
Cmd+H: Hide current app
|
||||
Cmd+Option+D: Show/hide dock
|
||||
F1-F12: Function keys for terminal work
|
||||
Brightness/Volume: Dedicated keys on Magic Keyboard
|
||||
|
||||
# iPad Pro specific shortcuts:
|
||||
Cmd+Shift+A: Open App Library
|
||||
Cmd+Shift+H: Go to Home Screen
|
||||
Cmd+Control+Space: Emoji picker
|
||||
```
|
||||
|
||||
### **iPadOS-Specific Apps**
|
||||
|
||||
#### **Professional Apps**
|
||||
```bash
|
||||
# Development
|
||||
- Working Copy (Git client with SSH)
|
||||
- Textastic (code editor)
|
||||
- Prompt 3 (SSH client)
|
||||
- Blink Shell (terminal emulator)
|
||||
|
||||
# System Administration
|
||||
- Termius (SSH with sync)
|
||||
- Network Analyzer (network diagnostics)
|
||||
- iStat Menus (system monitoring)
|
||||
|
||||
# File Management
|
||||
- Documents by Readdle (SFTP/cloud integration)
|
||||
- FileBrowser (web-based file management)
|
||||
- Secure ShellFish (SSH file manager)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🤖 Android Setup
|
||||
|
||||
### **Install Tailscale**
|
||||
|
||||
#### **Installation**
|
||||
```bash
|
||||
# Google Play Store
|
||||
# Search: "Tailscale"
|
||||
# Install official Tailscale app
|
||||
|
||||
# F-Droid (alternative)
|
||||
# Add Tailscale repository if available
|
||||
# Or sideload APK from GitHub releases
|
||||
```
|
||||
|
||||
#### **Android Configuration**
|
||||
```bash
|
||||
# Open Tailscale app
|
||||
# Sign in with your account
|
||||
# Grant VPN permission when prompted
|
||||
|
||||
# Settings within Tailscale app:
|
||||
Use Tailscale DNS: ✅ Enabled
|
||||
Accept Routes: ✅ Enabled
|
||||
Use Exit Node: atlantis.vish.local
|
||||
Allow LAN Access: ✅ Enabled
|
||||
Start on Boot: ✅ Enabled
|
||||
Use Mobile Data: ✅ Enabled
|
||||
```
|
||||
|
||||
### **Android Integration**
|
||||
|
||||
#### **Always-On VPN**
|
||||
```bash
|
||||
# Android Settings → Network & Internet → VPN
|
||||
# Select Tailscale
|
||||
# Enable "Always-on VPN"
|
||||
# Enable "Block connections without VPN"
|
||||
# This ensures all traffic goes through Tailscale
|
||||
```
|
||||
|
||||
#### **Battery Optimization**
|
||||
```bash
|
||||
# Prevent Android from killing Tailscale
|
||||
# Settings → Apps → Tailscale → Battery
|
||||
# Battery Optimization: Don't optimize
|
||||
# Background Activity: Allow
|
||||
```
|
||||
|
||||
### **Essential Android Apps**
|
||||
|
||||
#### **Core Homelab Apps**
|
||||
```bash
|
||||
# Remote Access
|
||||
- Termux (terminal emulator)
|
||||
- JuiceSSH (SSH client)
|
||||
- Microsoft Remote Desktop (RDP)
|
||||
- VNC Viewer (Linux desktop)
|
||||
|
||||
# File Management
|
||||
- Solid Explorer (SFTP support)
|
||||
- Material Files (open source file manager)
|
||||
- Syncthing (file synchronization)
|
||||
|
||||
# Monitoring & Services
|
||||
- Grafana mobile app
|
||||
- Home Assistant Companion
|
||||
- Plex for Android
|
||||
- Immich mobile app
|
||||
|
||||
# Password Management
|
||||
- Bitwarden
|
||||
- Google Password Manager (backup)
|
||||
```
|
||||
|
||||
#### **Android Automation**
|
||||
|
||||
**Tasker Integration:**
|
||||
```bash
|
||||
# Create Tasker profiles for homelab automation
|
||||
|
||||
# Profile 1: Auto-connect Tailscale when leaving home WiFi
|
||||
Trigger: WiFi Disconnected (home network)
|
||||
Action: Launch App → Tailscale
|
||||
|
||||
# Profile 2: Open homelab dashboard when connected
|
||||
Trigger: Tailscale connected
|
||||
Action: Browse URL → https://atlantis.vish.local:3000
|
||||
|
||||
# Profile 3: Backup photos to Immich
|
||||
Trigger: WiFi Connected (any network) + Tailscale active
|
||||
Action: HTTP Post to Immich API
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Cross-Platform Security
|
||||
|
||||
### **Device Management**
|
||||
|
||||
#### **Tailscale Admin Console**
|
||||
```bash
|
||||
# Access: https://login.tailscale.com/admin/machines
|
||||
|
||||
# For each device, configure:
|
||||
Device Name: Descriptive name (iPhone-Personal, MacBook-Work)
|
||||
Key Expiry: 90 days (shorter for mobile devices)
|
||||
Tags: mobile, personal, work (for ACL rules)
|
||||
Approval: Require approval for new devices
|
||||
```
|
||||
|
||||
#### **Access Control Lists (ACLs)**
|
||||
```bash
|
||||
# Configure device-specific access rules
|
||||
# Tailscale Admin → Access Controls
|
||||
|
||||
{
|
||||
"groups": {
|
||||
"group:mobile": ["user@domain.com"],
|
||||
"group:admin": ["user@domain.com"]
|
||||
},
|
||||
"acls": [
|
||||
// Mobile devices - limited access
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:mobile"],
|
||||
"dst": [
|
||||
"atlantis.vish.local:443", // HTTPS services
|
||||
"atlantis.vish.local:3000", // Grafana
|
||||
"atlantis.vish.local:32400", // Plex
|
||||
"concord-nuc.vish.local:8123" // Home Assistant
|
||||
]
|
||||
},
|
||||
// Admin devices - full access
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:admin"],
|
||||
"dst": ["*:*"]
|
||||
}
|
||||
],
|
||||
"nodeAttrs": [
|
||||
{
|
||||
"target": ["tag:mobile"],
|
||||
"attr": ["funnel"]
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
### **Remote Device Management**
|
||||
|
||||
#### **Find My Device / Find My iPhone**
|
||||
```bash
|
||||
# iOS: Settings → [Your Name] → Find My → Find My iPhone
|
||||
# Enable: Find My iPhone, Find My network, Send Last Location
|
||||
|
||||
# Android: Settings → Security → Find My Device
|
||||
# Enable: Find My Device, Send last location
|
||||
|
||||
# macOS: System Preferences → Apple ID → iCloud → Find My Mac
|
||||
# Enable: Find My Mac, Find My network
|
||||
|
||||
# These work even with Tailscale VPN active
|
||||
```
|
||||
|
||||
#### **Remote Wipe Procedures**
|
||||
```bash
|
||||
# iOS Remote Wipe:
|
||||
# 1. Visit icloud.com/find
|
||||
# 2. Select device
|
||||
# 3. Click "Erase iPhone/iPad"
|
||||
# 4. Confirm erasure
|
||||
|
||||
# Android Remote Wipe:
|
||||
# 1. Visit android.com/find
|
||||
# 2. Select device
|
||||
# 3. Click "Erase device"
|
||||
# 4. Confirm erasure
|
||||
|
||||
# macOS Remote Wipe:
|
||||
# 1. Visit icloud.com/find
|
||||
# 2. Select Mac
|
||||
# 3. Click "Erase Mac"
|
||||
# 4. Confirm erasure
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Mobile Monitoring and Management
|
||||
|
||||
### **Device Health Monitoring**
|
||||
|
||||
#### **Tailscale Status Monitoring**
|
||||
```bash
|
||||
# Create monitoring script for mobile devices
|
||||
# Run on homelab server to check mobile connectivity
|
||||
|
||||
#!/bin/bash
|
||||
# ~/scripts/check-mobile-devices.sh
|
||||
|
||||
DEVICES=(
|
||||
"iPhone-Personal"
|
||||
"iPad-Work"
|
||||
"Android-Phone"
|
||||
"MacBook-Travel"
|
||||
)
|
||||
|
||||
for device in "${DEVICES[@]}"; do
|
||||
if tailscale ping "$device" >/dev/null 2>&1; then
|
||||
echo "✅ $device is online"
|
||||
else
|
||||
echo "❌ $device is offline"
|
||||
# Send notification to admin
|
||||
curl -X POST "https://ntfy.sh/REDACTED_TOPIC" \
|
||||
-d "Device $device is offline"
|
||||
fi
|
||||
done
|
||||
```
|
||||
|
||||
#### **Grafana Mobile Dashboard**
|
||||
```bash
|
||||
# Create mobile-optimized Grafana dashboard
|
||||
# Panel 1: Device connectivity status
|
||||
# Panel 2: Bandwidth usage by device
|
||||
# Panel 3: Connection duration
|
||||
# Panel 4: Geographic location (if enabled)
|
||||
# Panel 5: Battery status (if available)
|
||||
```
|
||||
|
||||
### **Usage Analytics**
|
||||
|
||||
#### **Track Mobile Usage Patterns**
|
||||
```bash
|
||||
# Prometheus metrics for mobile devices
|
||||
# Add to prometheus.yml:
|
||||
|
||||
- job_name: 'tailscale-mobile'
|
||||
static_configs:
|
||||
- targets: ['localhost:9090']
|
||||
metrics_path: /api/v2/tailnet/tailnet-name/devices
|
||||
params:
|
||||
format: ['prometheus']
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Mobile Workflows
|
||||
|
||||
### **Daily Mobile Workflows**
|
||||
|
||||
#### **Morning Routine**
|
||||
```bash
|
||||
# 1. Check Tailscale connection status
|
||||
# 2. Open Home Assistant to check house status
|
||||
# 3. Review Grafana alerts from overnight
|
||||
# 4. Check Uptime Kuma for service status
|
||||
# 5. Browse Immich for new photos backed up
|
||||
```
|
||||
|
||||
#### **Work Day Access**
|
||||
```bash
|
||||
# From mobile device:
|
||||
# 1. SSH to homelab-vm for development work
|
||||
# 2. Access GitLab for code repositories
|
||||
# 3. Monitor services via Grafana mobile
|
||||
# 4. Use Vaultwarden for password access
|
||||
# 5. Stream music via Navidrome
|
||||
```
|
||||
|
||||
#### **Travel Scenarios**
|
||||
```bash
|
||||
# Airport/Plane WiFi:
|
||||
# 1. Connect to WiFi
|
||||
# 2. Verify Tailscale connects automatically
|
||||
# 3. Check exit node is active (IP shows home location)
|
||||
# 4. Access homelab services normally
|
||||
# 5. Stream media via Plex for entertainment
|
||||
|
||||
# Hotel WiFi:
|
||||
# 1. Connect to hotel network
|
||||
# 2. Tailscale auto-connects and secures traffic
|
||||
# 3. Work normally with full homelab access
|
||||
# 4. No need to trust hotel network security
|
||||
```
|
||||
|
||||
### **Emergency Procedures**
|
||||
|
||||
#### **Device Loss/Theft**
|
||||
```bash
|
||||
# Immediate actions (within 5 minutes):
|
||||
# 1. Use Find My Device to locate
|
||||
# 2. If not recoverable, initiate remote wipe
|
||||
# 3. Log into Tailscale admin console
|
||||
# 4. Disable/delete the compromised device
|
||||
# 5. Change critical passwords if device had saved credentials
|
||||
# 6. Monitor homelab logs for suspicious access
|
||||
```
|
||||
|
||||
#### **Network Connectivity Issues**
|
||||
```bash
|
||||
# Troubleshooting steps:
|
||||
# 1. Check cellular/WiFi connectivity
|
||||
# 2. Force-quit and restart Tailscale app
|
||||
# 3. Try different exit node
|
||||
# 4. Check Tailscale status page
|
||||
# 5. Use mobile hotspot as backup
|
||||
# 6. Contact homelab admin if persistent issues
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Mobile Device Checklist
|
||||
|
||||
### **Initial Setup Checklist**
|
||||
```bash
|
||||
☐ Install Tailscale from official app store
|
||||
☐ Sign in with homelab account
|
||||
☐ Configure exit node (atlantis.vish.local)
|
||||
☐ Enable DNS settings and route acceptance
|
||||
☐ Test connectivity to homelab services
|
||||
☐ Install essential homelab apps
|
||||
☐ Configure SSH keys and authentication
|
||||
☐ Set up remote wipe capability
|
||||
☐ Configure device in Tailscale admin console
|
||||
☐ Test emergency procedures
|
||||
```
|
||||
|
||||
### **Security Checklist**
|
||||
```bash
|
||||
☐ Enable device lock screen with strong passcode/biometrics
|
||||
☐ Configure automatic lock timeout (5 minutes max)
|
||||
☐ Enable remote wipe capability
|
||||
☐ Configure Find My Device/iPhone
|
||||
☐ Use password manager for all credentials
|
||||
☐ Enable two-factor authentication where possible
|
||||
☐ Regular security updates installed
|
||||
☐ VPN always-on configured
|
||||
☐ No critical data stored locally
|
||||
☐ Regular backup of device settings
|
||||
```
|
||||
|
||||
### **Maintenance Checklist**
|
||||
```bash
|
||||
☐ Weekly: Check Tailscale connectivity and performance
|
||||
☐ Monthly: Review device access logs in admin console
|
||||
☐ Monthly: Update all homelab-related apps
|
||||
☐ Quarterly: Rotate SSH keys and passwords
|
||||
☐ Quarterly: Test remote wipe procedures
|
||||
☐ Quarterly: Review and update ACL rules
|
||||
☐ Annually: Full security audit of mobile access
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - Complete Tailscale infrastructure setup
|
||||
- [👨👩👧👦 Family Network Integration](family-network-integration.md) - **NEW!** Connect family devices to homelab
|
||||
- [Laptop Travel Setup](laptop-travel-setup.md) - Laptop-specific travel configuration
|
||||
- [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Emergency procedures
|
||||
- [Offline Password Access](../troubleshooting/offline-password-access.md) - Password management
|
||||
- [Security Model](security.md) - Overall security architecture
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: Treat mobile devices as disposable terminals for accessing your homelab. Keep no critical data locally, use strong authentication, and maintain the ability to remotely wipe any device. This approach provides maximum security and flexibility for accessing your homelab from anywhere!
|
||||
79
docs/infrastructure/monitoring/README.md
Normal file
79
docs/infrastructure/monitoring/README.md
Normal file
@@ -0,0 +1,79 @@
|
||||
# Monitoring Stack
|
||||
|
||||
The production monitoring stack runs on **homelab_vm** as a single Portainer GitOps stack.
|
||||
|
||||
## Deployment
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| **Stack name** | `monitoring-stack` |
|
||||
| **Portainer stack ID** | 687 (endpoint 443399) |
|
||||
| **Compose file** | `hosts/vms/homelab-vm/monitoring.yaml` |
|
||||
| **Deployment method** | GitOps (Portainer pulls from `main` branch) |
|
||||
|
||||
## Services
|
||||
|
||||
| Service | Image | Port | Purpose |
|
||||
|---------|-------|------|---------|
|
||||
| `grafana` | `grafana/grafana-oss:12.4.0` | 3300 | Dashboards & visualization |
|
||||
| `prometheus` | `prom/prometheus:latest` | 9090 | Metrics collection & storage |
|
||||
| `node_exporter` | `prom/node-exporter:latest` | 9100 (host) | homelab-vm host metrics |
|
||||
| `snmp_exporter` | `prom/snmp-exporter:latest` | 9116 | Synology NAS SNMP metrics |
|
||||
|
||||
## Access
|
||||
|
||||
| Service | URL |
|
||||
|---------|-----|
|
||||
| Grafana (external) | `https://gf.vish.gg` |
|
||||
| Grafana (internal) | `http://192.168.0.210:3300` |
|
||||
| Prometheus | `http://192.168.0.210:9090` |
|
||||
| SNMP Exporter | `http://192.168.0.210:9116` |
|
||||
|
||||
## Grafana Dashboards
|
||||
|
||||
All configs are embedded as Docker `configs` in `monitoring.yaml` — no bind mounts or separate config files needed.
|
||||
|
||||
| Dashboard | UID | Source |
|
||||
|-----------|-----|--------|
|
||||
| Node Details - Full Metrics *(default home)* | `node-details-v2` | DB (imported) |
|
||||
| Infrastructure Overview - All Devices | `infrastructure-overview-v2` | Provisioned in monitoring.yaml |
|
||||
| Synology NAS Monitoring | `synology-dashboard-v2` | Provisioned in monitoring.yaml |
|
||||
| Node Exporter Full | `rYdddlPWk` | DB (imported from grafana.com) |
|
||||
|
||||
The home dashboard is set via the Grafana org preferences API (persists in `grafana-data` volume).
|
||||
|
||||
## Prometheus Scrape Targets
|
||||
|
||||
| Job | Target | Instance label |
|
||||
|-----|--------|---------------|
|
||||
| `node_exporter` | `host.docker.internal:9100` | homelab-vm |
|
||||
| `homelab-node` | `100.67.40.126:9100` | homelab-vm |
|
||||
| `raspberry-pis` | `100.77.151.40:9100` | pi-5 |
|
||||
| `setillo-node` | `100.125.0.20:9100` | setillo |
|
||||
| `calypso-node` | `100.103.48.78:9100` | calypso |
|
||||
| `atlantis-node` | `100.83.230.112:9100` | atlantis |
|
||||
| `concord-nuc-node` | `100.72.55.21:9100` | concord-nuc |
|
||||
| `truenas-node` | `100.75.252.64:9100` | guava |
|
||||
| `seattle-node` | `100.82.197.124:9100` | seattle |
|
||||
| `proxmox-node` | `100.87.12.28:9100` | proxmox |
|
||||
| `setillo-snmp` | `100.125.0.20:9116` | setillo (SNMP) |
|
||||
| `calypso-snmp` | `100.103.48.78:9116` | calypso (SNMP) |
|
||||
| `atlantis-snmp` | `100.83.230.112:9116` | atlantis (SNMP) |
|
||||
|
||||
## Notes
|
||||
|
||||
- **Grafana 12 `kubernetesDashboards`**: This feature toggle is ON by default in Grafana 12 and causes noisy log spam. It is disabled via `GF_FEATURE_TOGGLES_DISABLE=kubernetesDashboards` in the compose file.
|
||||
- **Image pinning**: Grafana is pinned to `12.4.0` to prevent unexpected breaking changes from `:latest` pulls.
|
||||
- **Admin password**: `GF_SECURITY_ADMIN_PASSWORD` only applies on first run (empty DB). After that, use `grafana cli admin reset-admin-password` to change it.
|
||||
- **DB-only dashboards**: `node-details-v2` and `Node Exporter Full` are not in `monitoring.yaml` — they live only in the `grafana-data` volume. They would need to be re-imported if the volume is deleted.
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- `docs/services/individual/grafana.md` — full Grafana service reference
|
||||
- `docs/admin/monitoring-setup.md` — monitoring stack quick reference
|
||||
- `docs/admin/monitoring.md` — full monitoring & observability guide
|
||||
- `hosts/vms/homelab-vm/monitoring.yaml` — compose file (source of truth)
|
||||
|
||||
---
|
||||
|
||||
**Last Updated**: 2026-03-08
|
||||
203
docs/infrastructure/monitoring/backup.sh
Executable file
203
docs/infrastructure/monitoring/backup.sh
Executable file
@@ -0,0 +1,203 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Stoatchat Backup Script
|
||||
# Creates a complete backup of the Stoatchat instance including database, files, and configuration
|
||||
|
||||
set -e # Exit on any error
|
||||
|
||||
# Configuration
|
||||
BACKUP_DIR="/root/stoatchat-backups"
|
||||
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
|
||||
BACKUP_NAME="stoatchat_backup_${TIMESTAMP}"
|
||||
BACKUP_PATH="${BACKUP_DIR}/${BACKUP_NAME}"
|
||||
STOATCHAT_DIR="/root/stoatchat"
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
log() {
|
||||
echo -e "${BLUE}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1"
|
||||
}
|
||||
|
||||
success() {
|
||||
echo -e "${GREEN}✅ $1${NC}"
|
||||
}
|
||||
|
||||
warning() {
|
||||
echo -e "${YELLOW}⚠️ $1${NC}"
|
||||
}
|
||||
|
||||
error() {
|
||||
echo -e "${RED}❌ $1${NC}"
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Check if running as root
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
error "This script must be run as root"
|
||||
fi
|
||||
|
||||
log "Starting Stoatchat backup process..."
|
||||
log "Backup will be saved to: ${BACKUP_PATH}"
|
||||
|
||||
# Create backup directory
|
||||
mkdir -p "${BACKUP_PATH}"
|
||||
|
||||
# 1. Backup MongoDB Database
|
||||
log "Backing up MongoDB database..."
|
||||
if command -v mongodump &> /dev/null; then
|
||||
mongodump --host localhost:27017 --db revolt --out "${BACKUP_PATH}/mongodb"
|
||||
success "MongoDB backup completed"
|
||||
else
|
||||
# Use docker if mongodump not available
|
||||
MONGO_CONTAINER=$(docker ps --format "{{.Names}}" | grep mongo | head -1)
|
||||
if [ ! -z "$MONGO_CONTAINER" ]; then
|
||||
docker exec "$MONGO_CONTAINER" mongodump --db revolt --out /tmp/backup
|
||||
docker cp "$MONGO_CONTAINER:/tmp/backup" "${BACKUP_PATH}/mongodb"
|
||||
success "MongoDB backup completed (via Docker)"
|
||||
else
|
||||
warning "MongoDB backup skipped - no mongodump or mongo container found"
|
||||
fi
|
||||
fi
|
||||
|
||||
# 2. Backup Configuration Files
|
||||
log "Backing up configuration files..."
|
||||
mkdir -p "${BACKUP_PATH}/config"
|
||||
cp "${STOATCHAT_DIR}/Revolt.toml" "${BACKUP_PATH}/config/" 2>/dev/null || warning "Revolt.toml not found"
|
||||
cp "${STOATCHAT_DIR}/Revolt.overrides.toml" "${BACKUP_PATH}/config/" 2>/dev/null || warning "Revolt.overrides.toml not found"
|
||||
cp "${STOATCHAT_DIR}/compose.yml" "${BACKUP_PATH}/config/" 2>/dev/null || warning "compose.yml not found"
|
||||
cp "${STOATCHAT_DIR}/livekit.yml" "${BACKUP_PATH}/config/" 2>/dev/null || warning "livekit.yml not found"
|
||||
cp "${STOATCHAT_DIR}/manage-services.sh" "${BACKUP_PATH}/config/" 2>/dev/null || warning "manage-services.sh not found"
|
||||
success "Configuration files backed up"
|
||||
|
||||
# 3. Backup Nginx Configuration
|
||||
log "Backing up Nginx configuration..."
|
||||
mkdir -p "${BACKUP_PATH}/nginx"
|
||||
cp -r /etc/nginx/sites-available/st.vish.gg "${BACKUP_PATH}/nginx/" 2>/dev/null || warning "Nginx site config not found"
|
||||
cp -r /etc/nginx/ssl/ "${BACKUP_PATH}/nginx/" 2>/dev/null || warning "SSL certificates not found"
|
||||
success "Nginx configuration backed up"
|
||||
|
||||
# 4. Backup User Uploads and Files
|
||||
log "Backing up user uploads and file storage..."
|
||||
mkdir -p "${BACKUP_PATH}/files"
|
||||
# Backup autumn (file server) uploads if they exist
|
||||
if [ -d "${STOATCHAT_DIR}/uploads" ]; then
|
||||
cp -r "${STOATCHAT_DIR}/uploads" "${BACKUP_PATH}/files/"
|
||||
success "User uploads backed up"
|
||||
else
|
||||
warning "No uploads directory found"
|
||||
fi
|
||||
|
||||
# Check for Docker volume data
|
||||
if docker volume ls | grep -q stoatchat; then
|
||||
log "Backing up Docker volumes..."
|
||||
mkdir -p "${BACKUP_PATH}/docker-volumes"
|
||||
for volume in $(docker volume ls --format "{{.Name}}" | grep stoatchat); do
|
||||
log "Backing up volume: $volume"
|
||||
docker run --rm -v "$volume":/source -v "${BACKUP_PATH}/docker-volumes":/backup alpine tar czf "/backup/${volume}.tar.gz" -C /source .
|
||||
done
|
||||
success "Docker volumes backed up"
|
||||
fi
|
||||
|
||||
# 5. Backup Environment and System Info
|
||||
log "Backing up system information..."
|
||||
mkdir -p "${BACKUP_PATH}/system"
|
||||
|
||||
# Save running processes
|
||||
ps aux | grep -E "(revolt|stoatchat|nginx|mongo|redis|livekit)" > "${BACKUP_PATH}/system/processes.txt" 2>/dev/null || true
|
||||
|
||||
# Save Docker containers
|
||||
docker ps -a > "${BACKUP_PATH}/system/docker-containers.txt" 2>/dev/null || true
|
||||
|
||||
# Save network configuration
|
||||
ss -tulpn > "${BACKUP_PATH}/system/network-ports.txt" 2>/dev/null || true
|
||||
|
||||
# Save environment variables (filtered for security)
|
||||
env | grep -E "(REVOLT|STOATCHAT|LIVEKIT)" | grep -v -E "(PASSWORD|SECRET|TOKEN)" > "${BACKUP_PATH}/system/environment.txt" 2>/dev/null || true
|
||||
|
||||
# Save installed packages
|
||||
dpkg -l > "${BACKUP_PATH}/system/installed-packages.txt" 2>/dev/null || true
|
||||
|
||||
# Save systemd services
|
||||
systemctl list-units --type=service --state=running > "${BACKUP_PATH}/system/systemd-services.txt" 2>/dev/null || true
|
||||
|
||||
success "System information backed up"
|
||||
|
||||
# 6. Create backup metadata
|
||||
log "Creating backup metadata..."
|
||||
cat > "${BACKUP_PATH}/backup-info.txt" << EOF
|
||||
Stoatchat Backup Information
|
||||
============================
|
||||
Backup Date: $(date)
|
||||
Backup Name: ${BACKUP_NAME}
|
||||
Source Directory: ${STOATCHAT_DIR}
|
||||
Hostname: $(hostname)
|
||||
OS: $(lsb_release -d 2>/dev/null | cut -f2 || echo "Unknown")
|
||||
Kernel: $(uname -r)
|
||||
|
||||
Services Status at Backup Time:
|
||||
$(systemctl is-active nginx 2>/dev/null || echo "nginx: unknown")
|
||||
$(docker ps --format "table {{.Names}}\t{{.Status}}" 2>/dev/null || echo "Docker: not available")
|
||||
|
||||
Git Information:
|
||||
$(cd "${STOATCHAT_DIR}" && git remote -v 2>/dev/null || echo "No git repository")
|
||||
$(cd "${STOATCHAT_DIR}" && git log -1 --oneline 2>/dev/null || echo "No git history")
|
||||
|
||||
Backup Contents:
|
||||
- MongoDB database (revolt)
|
||||
- Configuration files (Revolt.toml, Revolt.overrides.toml, compose.yml, etc.)
|
||||
- Nginx configuration and SSL certificates
|
||||
- User uploads and file storage
|
||||
- Docker volumes
|
||||
- System information and process list
|
||||
EOF
|
||||
|
||||
success "Backup metadata created"
|
||||
|
||||
# 7. Create compressed archive
|
||||
log "Creating compressed archive..."
|
||||
cd "${BACKUP_DIR}"
|
||||
tar -czf "${BACKUP_NAME}.tar.gz" "${BACKUP_NAME}/"
|
||||
ARCHIVE_SIZE=$(du -h "${BACKUP_NAME}.tar.gz" | cut -f1)
|
||||
success "Compressed archive created: ${BACKUP_NAME}.tar.gz (${ARCHIVE_SIZE})"
|
||||
|
||||
# 8. Cleanup old backups (keep last 7 days)
|
||||
log "Cleaning up old backups (keeping last 7 days)..."
|
||||
find "${BACKUP_DIR}" -name "stoatchat_backup_*.tar.gz" -mtime +7 -delete 2>/dev/null || true
|
||||
find "${BACKUP_DIR}" -name "stoatchat_backup_*" -type d -mtime +7 -exec rm -rf {} + 2>/dev/null || true
|
||||
success "Old backups cleaned up"
|
||||
|
||||
# 9. Verify backup integrity
|
||||
log "Verifying backup integrity..."
|
||||
if tar -tzf "${BACKUP_NAME}.tar.gz" >/dev/null 2>&1; then
|
||||
success "Backup archive integrity verified"
|
||||
else
|
||||
error "Backup archive is corrupted!"
|
||||
fi
|
||||
|
||||
# Final summary
|
||||
echo
|
||||
echo "=================================================="
|
||||
echo -e "${GREEN}🎉 BACKUP COMPLETED SUCCESSFULLY! 🎉${NC}"
|
||||
echo "=================================================="
|
||||
echo "Backup Location: ${BACKUP_PATH}.tar.gz"
|
||||
echo "Backup Size: ${ARCHIVE_SIZE}"
|
||||
echo "Backup Contains:"
|
||||
echo " ✅ MongoDB database"
|
||||
echo " ✅ Configuration files"
|
||||
echo " ✅ Nginx configuration & SSL certificates"
|
||||
echo " ✅ User uploads & file storage"
|
||||
echo " ✅ Docker volumes"
|
||||
echo " ✅ System information"
|
||||
echo
|
||||
echo "To restore this backup on a new machine:"
|
||||
echo " 1. Extract: tar -xzf ${BACKUP_NAME}.tar.gz"
|
||||
echo " 2. Follow the deployment guide in DEPLOYMENT.md"
|
||||
echo " 3. Run the restore script: ./restore.sh ${BACKUP_NAME}"
|
||||
echo
|
||||
echo "Backup completed at: $(date)"
|
||||
echo "=================================================="
|
||||
142
docs/infrastructure/monitoring/dashboard-verification-report.md
Normal file
142
docs/infrastructure/monitoring/dashboard-verification-report.md
Normal file
@@ -0,0 +1,142 @@
|
||||
# Grafana Dashboard Verification Report
|
||||
|
||||
## Executive Summary
|
||||
✅ **All dashboard sections are now working correctly**
|
||||
✅ **Datasource UID mismatches resolved**
|
||||
✅ **Template variables configured with correct default values**
|
||||
✅ **All key metrics displaying data**
|
||||
|
||||
## Issues Resolved
|
||||
|
||||
### 1. Datasource UID Mismatch
|
||||
- **Problem**: Dashboard JSON files contained hardcoded UID `cfbskvs8upds0b`
|
||||
- **Actual UID**: `PBFA97CFB590B2093`
|
||||
- **Solution**: Updated all dashboard files with correct datasource UID
|
||||
- **Files Fixed**:
|
||||
- infrastructure-overview.json
|
||||
- node-details.json
|
||||
- node-exporter-full.json
|
||||
- synology-nas-monitoring.json
|
||||
|
||||
### 2. Template Variable Default Values
|
||||
- **Problem**: Template variables had incorrect default values (e.g., `node_exporter`, `homelab-vm`)
|
||||
- **Solution**: Updated defaults to match actual job names and instances
|
||||
- **Updates Made**:
|
||||
- Job: `node_exporter` → `atlantis-node`
|
||||
- Nodename: `homelab` → `atlantis`
|
||||
- Instance: `homelab-vm` → `100.83.230.112:9100`
|
||||
|
||||
## Dashboard Status
|
||||
|
||||
### 🟢 Node Exporter Full Dashboard
|
||||
- **UID**: `rYdddlPWk`
|
||||
- **Panels**: 32 panels, all functional
|
||||
- **Template Variables**: ✅ All working
|
||||
- DS_PROMETHEUS: Prometheus
|
||||
- job: atlantis-node
|
||||
- nodename: atlantis
|
||||
- node: 100.83.230.112:9100
|
||||
- diskdevices: [a-z]+|nvme[0-9]+n[0-9]+|mmcblk[0-9]+
|
||||
- **Key Metrics**: ✅ All displaying data
|
||||
- CPU Usage: 11.35%
|
||||
- Memory Usage: 65.05%
|
||||
- Disk I/O: 123 data points
|
||||
- Network Traffic: 297 data points
|
||||
|
||||
### 🟢 Synology NAS Monitoring Dashboard
|
||||
- **UID**: `synology-dashboard-v2`
|
||||
- **Panels**: 8 panels, all functional
|
||||
- **Key Metrics**: ✅ All displaying data
|
||||
- Storage Usage: 67.62%
|
||||
- Disk Temperatures: 18 sensors
|
||||
- System Uptime: 3 devices
|
||||
- SNMP Targets: 3 up
|
||||
|
||||
### 🟢 Node Details Dashboard
|
||||
- **UID**: `node-details-v2`
|
||||
- **Panels**: 21 panels, all functional
|
||||
- **Template Variables**: ✅ Fixed
|
||||
- datasource: Prometheus
|
||||
- job: atlantis-node
|
||||
- instance: 100.83.230.112:9100
|
||||
|
||||
### 🟢 Infrastructure Overview Dashboard
|
||||
- **UID**: `infrastructure-overview-v2`
|
||||
- **Panels**: 7 panels, all functional
|
||||
- **Template Variables**: ✅ Fixed
|
||||
- datasource: Prometheus
|
||||
- job: All (multi-select enabled)
|
||||
|
||||
## Monitoring Targets Health
|
||||
|
||||
### Node Exporters (8 total)
|
||||
- ✅ atlantis-node: 100.83.230.112:9100
|
||||
- ✅ calypso-node: 100.103.48.78:9100
|
||||
- ✅ concord-nuc-node: 100.72.55.21:9100
|
||||
- ✅ homelab-node: 100.67.40.126:9100
|
||||
- ✅ proxmox-node: 100.87.12.28:9100
|
||||
- ✅ raspberry-pis: 100.77.151.40:9100
|
||||
- ✅ setillo-node: 100.125.0.20:9100
|
||||
- ✅ truenas-node: 100.75.252.64:9100
|
||||
- ❌ raspberry-pis: 100.123.246.75:9100 (down)
|
||||
- ❌ vmi2076105-node: 100.99.156.20:9100 (down)
|
||||
|
||||
**Active Node Targets**: 7/8 (87.5% uptime)
|
||||
|
||||
### SNMP Targets (3 total)
|
||||
- ✅ atlantis-snmp: 100.83.230.112
|
||||
- ✅ calypso-snmp: 100.103.48.78
|
||||
- ✅ setillo-snmp: 100.125.0.20
|
||||
|
||||
**Active SNMP Targets**: 3/3 (100% uptime)
|
||||
|
||||
### System Services
|
||||
- ✅ prometheus: prometheus:9090
|
||||
- ✅ alertmanager: alertmanager:9093
|
||||
|
||||
## Dashboard Access URLs
|
||||
|
||||
- **Node Exporter Full**: http://localhost:3300/d/rYdddlPWk
|
||||
- **Synology NAS**: http://localhost:3300/d/synology-dashboard-v2
|
||||
- **Node Details**: http://localhost:3300/d/node-details-v2
|
||||
- **Infrastructure Overview**: http://localhost:3300/d/infrastructure-overview-v2
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Prometheus Configuration
|
||||
- **Endpoint**: http://prometheus:9090
|
||||
- **Datasource UID**: PBFA97CFB590B2093
|
||||
- **Status**: ✅ Healthy
|
||||
- **Targets**: 15 total (13 up, 2 down)
|
||||
|
||||
### GitOps Implementation
|
||||
- **Repository**: /home/homelab/docker/monitoring
|
||||
- **Provisioning**: Automated via Grafana provisioning
|
||||
- **Dashboards**: Auto-loaded from `/grafana/dashboards/`
|
||||
- **Datasources**: Auto-configured from `/grafana/provisioning/datasources/`
|
||||
|
||||
## Verification Scripts
|
||||
|
||||
Two verification scripts have been created:
|
||||
|
||||
1. **fix-datasource-uids.sh**: Automated UID correction script
|
||||
2. **verify-dashboard-sections.sh**: Comprehensive dashboard testing script
|
||||
|
||||
## Recommendations
|
||||
|
||||
1. **Monitor Down Targets**: Investigate the 2 down targets:
|
||||
- raspberry-pis: 100.123.246.75:9100
|
||||
- vmi2076105-node: 100.99.156.20:9100
|
||||
|
||||
2. **Regular Health Checks**: Run `verify-dashboard-sections.sh` periodically to ensure continued functionality
|
||||
|
||||
3. **Template Variable Optimization**: Consider setting up more dynamic defaults based on available targets
|
||||
|
||||
## Conclusion
|
||||
|
||||
✅ **All dashboard sections are now fully functional**
|
||||
✅ **Data is displaying correctly across all panels**
|
||||
✅ **Template variables are working as expected**
|
||||
✅ **GitOps implementation is successful**
|
||||
|
||||
The Grafana monitoring setup is now complete and operational with all major dashboard sections verified and working correctly.
|
||||
48
docs/infrastructure/monitoring/docker-compose.yml
Normal file
48
docs/infrastructure/monitoring/docker-compose.yml
Normal file
@@ -0,0 +1,48 @@
|
||||
version: "3.8"
|
||||
|
||||
services:
|
||||
prometheus:
|
||||
image: prom/prometheus:latest
|
||||
container_name: prometheus
|
||||
volumes:
|
||||
- ./prometheus:/etc/prometheus
|
||||
- prometheus-data:/prometheus
|
||||
command:
|
||||
- "--config.file=/etc/prometheus/prometheus.yml"
|
||||
- "--storage.tsdb.path=/prometheus"
|
||||
- "--web.enable-lifecycle"
|
||||
ports:
|
||||
- "9090:9090"
|
||||
restart: unless-stopped
|
||||
|
||||
grafana:
|
||||
image: grafana/grafana-oss:latest
|
||||
container_name: grafana
|
||||
environment:
|
||||
- GF_SECURITY_ADMIN_USER=admin
|
||||
- GF_SECURITY_ADMIN_PASSWORD="REDACTED_PASSWORD"
|
||||
volumes:
|
||||
- grafana-data:/var/lib/grafana
|
||||
- ./grafana/provisioning/datasources:/etc/grafana/provisioning/datasources
|
||||
- ./grafana/provisioning/dashboards:/etc/grafana/provisioning/dashboards
|
||||
- ./grafana/dashboards:/var/lib/grafana/dashboards
|
||||
ports:
|
||||
- "3300:3000"
|
||||
restart: unless-stopped
|
||||
|
||||
node_exporter:
|
||||
image: prom/node-exporter:latest
|
||||
container_name: node_exporter
|
||||
network_mode: host
|
||||
pid: host
|
||||
volumes:
|
||||
- /:/host:ro,rslave
|
||||
- /sys:/host/sys:ro
|
||||
- /proc:/host/proc:ro
|
||||
command:
|
||||
- '--path.rootfs=/host'
|
||||
restart: unless-stopped
|
||||
|
||||
volumes:
|
||||
prometheus-data:
|
||||
grafana-data:
|
||||
@@ -0,0 +1,373 @@
|
||||
{
|
||||
"id": 1,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"mappings": [
|
||||
{
|
||||
"options": {
|
||||
"0": {
|
||||
"color": "red",
|
||||
"text": "DOWN"
|
||||
},
|
||||
"1": {
|
||||
"color": "green",
|
||||
"text": "UP"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
}
|
||||
],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "red",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
},
|
||||
"textMode": "value_and_name"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "up{job=~\"\"}",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Device Status",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 5
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (avg by(job) (rate(node_cpu_seconds_total{mode=\"idle\", job=~\"\"}[5m])) * 100)",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "CPU Usage",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 5
|
||||
},
|
||||
"id": 3,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(1 - (node_memory_MemAvailable_bytes{job=~\"\"} / node_memory_MemTotal_bytes{job=~\"\"})) * 100",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 85
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 13
|
||||
},
|
||||
"id": 4,
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - ((node_filesystem_avail_bytes{job=~\"\", mountpoint=\"/\", fstype!=\"rootfs\"} / node_filesystem_size_bytes{job=~\"\", mountpoint=\"/\", fstype!=\"rootfs\"}) * 100)",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Root Disk Usage",
|
||||
"type": "bargauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "s"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 13
|
||||
},
|
||||
"id": 5,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_time_seconds{job=~\"\"} - node_boot_time_seconds{job=~\"\"}",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Uptime",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "Bps"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 21
|
||||
},
|
||||
"id": 6,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(job) (rate(node_network_receive_bytes_total{job=~\"\", device!~\"lo|docker.*|br-.*|veth.*\"}[5m]))",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Network Receive",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": ""
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "Bps"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 21
|
||||
},
|
||||
"id": 7,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by(job) (rate(node_network_transmit_bytes_total{job=~\"\", device!~\"lo|docker.*|br-.*|veth.*\"}[5m]))",
|
||||
"legendFormat": "{{job}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Network Transmit",
|
||||
"type": "timeseries"
|
||||
}
|
||||
],
|
||||
"refresh": "30s",
|
||||
"schemaVersion": 38,
|
||||
"tags": [
|
||||
"infrastructure",
|
||||
"node-exporter",
|
||||
"tailscale"
|
||||
],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"current": {
|
||||
"text": "Prometheus",
|
||||
"value": "PBFA97CFB590B2093"
|
||||
},
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Data Source",
|
||||
"multi": false,
|
||||
"name": "datasource",
|
||||
"options": [],
|
||||
"query": "prometheus",
|
||||
"refresh": 1,
|
||||
"type": "datasource"
|
||||
},
|
||||
{
|
||||
"allValue": "",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"definition": "label_values(node_uname_info, job)",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "Host",
|
||||
"multi": true,
|
||||
"name": "job",
|
||||
"query": "label_values(node_uname_info, job)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 1,
|
||||
"type": "query"
|
||||
}
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Infrastructure Overview - All Devices",
|
||||
"uid": "infrastructure-overview-v2",
|
||||
"version": 4
|
||||
}
|
||||
@@ -0,0 +1,941 @@
|
||||
{
|
||||
"id": 2,
|
||||
"panels": [
|
||||
{
|
||||
"collapsed": false,
|
||||
"gridPos": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"title": "📊 Quick Stats",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "s"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 4,
|
||||
"x": 0,
|
||||
"y": 1
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_time_seconds{job=\"$job\",instance=\"$instance\"} - node_boot_time_seconds{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Uptime",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Uptime",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "blue",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 3,
|
||||
"x": 4,
|
||||
"y": 1
|
||||
},
|
||||
"id": 3,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "count(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"idle\"})",
|
||||
"legendFormat": "Cores",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "CPU Cores",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "purple",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "bytes"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 3,
|
||||
"x": 7,
|
||||
"y": 1
|
||||
},
|
||||
"id": 4,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_memory_MemTotal_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "RAM",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Total RAM",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 60
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 80
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 3,
|
||||
"x": 10,
|
||||
"y": 1
|
||||
},
|
||||
"id": 5,
|
||||
"options": {
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (avg(rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"idle\"}[5m])) * 100)",
|
||||
"legendFormat": "CPU",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "CPU",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 85
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 3,
|
||||
"x": 13,
|
||||
"y": 1
|
||||
},
|
||||
"id": 6,
|
||||
"options": {
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(1 - (node_memory_MemAvailable_bytes{job=\"$job\",instance=\"$instance\"} / node_memory_MemTotal_bytes{job=\"$job\",instance=\"$instance\"})) * 100",
|
||||
"legendFormat": "Memory",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Memory",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 85
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 3,
|
||||
"x": 16,
|
||||
"y": 1
|
||||
},
|
||||
"id": 7,
|
||||
"options": {
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - ((node_filesystem_avail_bytes{job=\"$job\",instance=\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"} / node_filesystem_size_bytes{job=\"$job\",instance=\"$instance\",mountpoint=\"/\",fstype!=\"rootfs\"}) * 100)",
|
||||
"legendFormat": "Disk",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Disk /",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 2,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 2
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 4
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 2,
|
||||
"x": 19,
|
||||
"y": 1
|
||||
},
|
||||
"id": 8,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_load1{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "1m",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Load 1m",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"decimals": 2,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 2
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 4
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 2,
|
||||
"x": 21,
|
||||
"y": 1
|
||||
},
|
||||
"id": 9,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_load5{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "5m",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Load 5m",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"collapsed": false,
|
||||
"gridPos": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 5
|
||||
},
|
||||
"id": 10,
|
||||
"title": "🖥️ CPU Details",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {
|
||||
"fillOpacity": 50,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "normal"
|
||||
}
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 6
|
||||
},
|
||||
"id": 11,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "avg(rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"user\"}[5m])) * 100",
|
||||
"legendFormat": "User",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "avg(rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"system\"}[5m])) * 100",
|
||||
"legendFormat": "System",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"expr": "avg(rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"iowait\"}[5m])) * 100",
|
||||
"legendFormat": "IOWait",
|
||||
"refId": "C"
|
||||
},
|
||||
{
|
||||
"expr": "avg(rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"steal\"}[5m])) * 100",
|
||||
"legendFormat": "Steal",
|
||||
"refId": "D"
|
||||
}
|
||||
],
|
||||
"title": "CPU Usage Breakdown",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 6
|
||||
},
|
||||
"id": 12,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (rate(node_cpu_seconds_total{job=\"$job\",instance=\"$instance\",mode=\"idle\"}[5m]) * 100)",
|
||||
"legendFormat": "CPU {{cpu}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "CPU Per Core",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"collapsed": false,
|
||||
"gridPos": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"id": 20,
|
||||
"title": "🧠 Memory Details",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"custom": {
|
||||
"fillOpacity": 30,
|
||||
"stacking": {
|
||||
"group": "A",
|
||||
"mode": "normal"
|
||||
}
|
||||
},
|
||||
"unit": "bytes"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 15
|
||||
},
|
||||
"id": 21,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_memory_MemTotal_bytes{job=\"$job\",instance=\"$instance\"} - node_memory_MemAvailable_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Used",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Buffers_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Buffers",
|
||||
"refId": "B"
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_Cached_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Cached",
|
||||
"refId": "C"
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_MemFree_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Free",
|
||||
"refId": "D"
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "bytes"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 15
|
||||
},
|
||||
"id": 22,
|
||||
"targets": [
|
||||
{
|
||||
"expr": "node_memory_SwapTotal_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Total",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "node_memory_SwapTotal_bytes{job=\"$job\",instance=\"$instance\"} - node_memory_SwapFree_bytes{job=\"$job\",instance=\"$instance\"}",
|
||||
"legendFormat": "Used",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Swap Usage",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"collapsed": false,
|
||||
"gridPos": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 23
|
||||
},
|
||||
"id": 30,
|
||||
"title": "💾 Disk Details",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 85
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 24
|
||||
},
|
||||
"id": 31,
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - ((node_filesystem_avail_bytes{job=\"$job\",instance=\"$instance\",fstype!~\"tmpfs|overlay|squashfs\"} / node_filesystem_size_bytes{job=\"$job\",instance=\"$instance\",fstype!~\"tmpfs|overlay|squashfs\"}) * 100)",
|
||||
"legendFormat": "{{mountpoint}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Disk Space Usage",
|
||||
"type": "bargauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "Bps"
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byRegexp",
|
||||
"options": ".*Write.*"
|
||||
},
|
||||
"properties": [
|
||||
{
|
||||
"id": "custom.transform",
|
||||
"value": "negative-Y"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 24
|
||||
},
|
||||
"id": 32,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_disk_read_bytes_total{job=\"$job\",instance=\"$instance\",device!~\"loop.*|dm-.*\"}[5m])",
|
||||
"legendFormat": "{{device}} Read",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "rate(node_disk_written_bytes_total{job=\"$job\",instance=\"$instance\",device!~\"loop.*|dm-.*\"}[5m])",
|
||||
"legendFormat": "{{device}} Write",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Disk I/O",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"collapsed": false,
|
||||
"gridPos": {
|
||||
"h": 1,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 32
|
||||
},
|
||||
"id": 40,
|
||||
"title": "🌐 Network Details",
|
||||
"type": "row"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "bps"
|
||||
},
|
||||
"overrides": [
|
||||
{
|
||||
"matcher": {
|
||||
"id": "byRegexp",
|
||||
"options": ".*TX.*"
|
||||
},
|
||||
"properties": [
|
||||
{
|
||||
"id": "custom.transform",
|
||||
"value": "negative-Y"
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 33
|
||||
},
|
||||
"id": 41,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean",
|
||||
"max"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_receive_bytes_total{job=\"$job\",instance=\"$instance\",device!~\"lo|docker.*|br-.*|veth.*\"}[5m]) * 8",
|
||||
"legendFormat": "{{device}} RX",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "rate(node_network_transmit_bytes_total{job=\"$job\",instance=\"$instance\",device!~\"lo|docker.*|br-.*|veth.*\"}[5m]) * 8",
|
||||
"legendFormat": "{{device}} TX",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Network Traffic",
|
||||
"type": "timeseries"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "pps"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 33
|
||||
},
|
||||
"id": 42,
|
||||
"options": {
|
||||
"legend": {
|
||||
"calcs": [
|
||||
"mean"
|
||||
],
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "rate(node_network_receive_errs_total{job=\"$job\",instance=\"$instance\",device!~\"lo|docker.*|br-.*|veth.*\"}[5m])",
|
||||
"legendFormat": "{{device}} RX Errors",
|
||||
"refId": "A"
|
||||
},
|
||||
{
|
||||
"expr": "rate(node_network_transmit_errs_total{job=\"$job\",instance=\"$instance\",device!~\"lo|docker.*|br-.*|veth.*\"}[5m])",
|
||||
"legendFormat": "{{device}} TX Errors",
|
||||
"refId": "B"
|
||||
}
|
||||
],
|
||||
"title": "Network Errors",
|
||||
"type": "timeseries"
|
||||
}
|
||||
],
|
||||
"refresh": "30s",
|
||||
"schemaVersion": 38,
|
||||
"tags": [
|
||||
"node-exporter",
|
||||
"detailed",
|
||||
"infrastructure"
|
||||
],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"current": {
|
||||
"text": "Prometheus",
|
||||
"value": "PBFA97CFB590B2093"
|
||||
},
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Data Source",
|
||||
"multi": false,
|
||||
"name": "datasource",
|
||||
"options": [],
|
||||
"query": "prometheus",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"type": "datasource"
|
||||
},
|
||||
{
|
||||
"current": {
|
||||
"text": "atlantis-node",
|
||||
"value": "atlantis-node"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"definition": "label_values(node_uname_info, job)",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Host",
|
||||
"multi": false,
|
||||
"name": "job",
|
||||
"options": [],
|
||||
"query": "label_values(node_uname_info, job)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"sort": 1,
|
||||
"type": "query"
|
||||
},
|
||||
{
|
||||
"current": {
|
||||
"text": "100.83.230.112:9100",
|
||||
"value": "100.83.230.112:9100"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"definition": "label_values(node_uname_info{job=\"$job\"}, instance)",
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Instance",
|
||||
"multi": false,
|
||||
"name": "instance",
|
||||
"options": [],
|
||||
"query": "label_values(node_uname_info{job=\"$job\"}, instance)",
|
||||
"refresh": 2,
|
||||
"regex": "",
|
||||
"skipUrlSync": false,
|
||||
"sort": 1,
|
||||
"type": "query"
|
||||
}
|
||||
]
|
||||
},
|
||||
"time": {
|
||||
"from": "now-1h",
|
||||
"to": "now"
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Node Details - Full Metrics",
|
||||
"uid": "node-details-v2",
|
||||
"version": 2
|
||||
}
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,509 @@
|
||||
{
|
||||
"id": 3,
|
||||
"panels": [
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"mappings": [
|
||||
{
|
||||
"options": {
|
||||
"1": {
|
||||
"color": "green",
|
||||
"text": "Normal"
|
||||
},
|
||||
"2": {
|
||||
"color": "red",
|
||||
"text": "Failed"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
}
|
||||
],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 2
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"id": 1,
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
},
|
||||
"textMode": "value_and_name"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "systemStatus{instance=~\"\"}",
|
||||
"legendFormat": "{{instance}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "NAS Status",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 80,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 65
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "celsius"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 8,
|
||||
"x": 0,
|
||||
"y": 4
|
||||
},
|
||||
"id": 2,
|
||||
"options": {
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "temperature{instance=~\"\"}",
|
||||
"legendFormat": "{{instance}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Temperature",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 90
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 8,
|
||||
"x": 8,
|
||||
"y": 4
|
||||
},
|
||||
"id": 3,
|
||||
"options": {
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((memTotalReal{instance=~\"\"} - memAvailReal{instance=~\"\"}) / memTotalReal{instance=~\"\"}) * 100",
|
||||
"legendFormat": "{{instance}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Memory Usage",
|
||||
"type": "gauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "blue",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "decbytes"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 8,
|
||||
"x": 16,
|
||||
"y": 4
|
||||
},
|
||||
"id": 4,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "none",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "memTotalReal{instance=~\"\"} * 1024",
|
||||
"legendFormat": "{{instance}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Total Memory",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 40
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 50
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "celsius"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 10
|
||||
},
|
||||
"id": 5,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "diskTemperature{instance=~\"\"}",
|
||||
"legendFormat": "{{instance}} - Disk {{diskIndex}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Disk Temperature",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"mappings": [
|
||||
{
|
||||
"options": {
|
||||
"1": {
|
||||
"color": "green",
|
||||
"text": "Normal"
|
||||
},
|
||||
"11": {
|
||||
"color": "orange",
|
||||
"text": "Degraded"
|
||||
},
|
||||
"12": {
|
||||
"color": "red",
|
||||
"text": "Crashed"
|
||||
},
|
||||
"2": {
|
||||
"color": "yellow",
|
||||
"text": "Repairing"
|
||||
},
|
||||
"3": {
|
||||
"color": "yellow",
|
||||
"text": "Migrating"
|
||||
},
|
||||
"4": {
|
||||
"color": "yellow",
|
||||
"text": "Expanding"
|
||||
},
|
||||
"5": {
|
||||
"color": "orange",
|
||||
"text": "Deleting"
|
||||
},
|
||||
"6": {
|
||||
"color": "blue",
|
||||
"text": "Creating"
|
||||
}
|
||||
},
|
||||
"type": "value"
|
||||
}
|
||||
],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 6,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 10
|
||||
},
|
||||
"id": 6,
|
||||
"options": {
|
||||
"colorMode": "background",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
},
|
||||
"textMode": "value_and_name"
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "raidStatus{instance=~\"\"}",
|
||||
"legendFormat": "{{instance}} - {{raidIndex}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "RAID Status",
|
||||
"type": "stat"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"max": 100,
|
||||
"min": 0,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 70
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 85
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 16
|
||||
},
|
||||
"id": 7,
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "((raidTotalSize{instance=~\"\"} - raidFreeSize{instance=~\"\"}) / raidTotalSize{instance=~\"\"}) * 100",
|
||||
"legendFormat": "{{instance}} - RAID {{raidIndex}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "RAID Usage",
|
||||
"type": "bargauge"
|
||||
},
|
||||
{
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "dtdurations"
|
||||
}
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 24
|
||||
},
|
||||
"id": 8,
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
]
|
||||
}
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sysUpTime{instance=~\"\"} / 100",
|
||||
"legendFormat": "{{instance}}",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"title": "Uptime",
|
||||
"type": "stat"
|
||||
}
|
||||
],
|
||||
"refresh": "30s",
|
||||
"schemaVersion": 38,
|
||||
"tags": [
|
||||
"synology",
|
||||
"nas",
|
||||
"snmp"
|
||||
],
|
||||
"templating": {
|
||||
"list": [
|
||||
{
|
||||
"current": {
|
||||
"text": "Prometheus",
|
||||
"value": "PBFA97CFB590B2093"
|
||||
},
|
||||
"hide": 0,
|
||||
"includeAll": false,
|
||||
"label": "Data Source",
|
||||
"multi": false,
|
||||
"name": "datasource",
|
||||
"options": [],
|
||||
"query": "prometheus",
|
||||
"refresh": 1,
|
||||
"type": "datasource"
|
||||
},
|
||||
{
|
||||
"allValue": "",
|
||||
"current": {
|
||||
"text": "All",
|
||||
"value": "$__all"
|
||||
},
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "PBFA97CFB590B2093"
|
||||
},
|
||||
"definition": "label_values(diskTemperature, instance)",
|
||||
"hide": 0,
|
||||
"includeAll": true,
|
||||
"label": "NAS",
|
||||
"multi": true,
|
||||
"name": "instance",
|
||||
"query": "label_values(diskTemperature, instance)",
|
||||
"refresh": 1,
|
||||
"regex": "",
|
||||
"sort": 1,
|
||||
"type": "query"
|
||||
}
|
||||
]
|
||||
},
|
||||
"timezone": "browser",
|
||||
"title": "Synology NAS Monitoring",
|
||||
"uid": "synology-dashboard-v2",
|
||||
"version": 4
|
||||
}
|
||||
@@ -0,0 +1,12 @@
|
||||
apiVersion: 1
|
||||
|
||||
providers:
|
||||
- name: 'default'
|
||||
orgId: 1
|
||||
folder: ''
|
||||
type: file
|
||||
disableDeletion: false
|
||||
updateIntervalSeconds: 10
|
||||
allowUiUpdates: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards
|
||||
@@ -0,0 +1,9 @@
|
||||
apiVersion: 1
|
||||
|
||||
datasources:
|
||||
- name: Prometheus
|
||||
type: prometheus
|
||||
access: proxy
|
||||
url: http://prometheus:9090
|
||||
isDefault: true
|
||||
editable: true
|
||||
146
docs/infrastructure/monitoring/prometheus/alert-rules.yml
Normal file
146
docs/infrastructure/monitoring/prometheus/alert-rules.yml
Normal file
@@ -0,0 +1,146 @@
|
||||
# Prometheus Alerting Rules for Homelab Infrastructure
|
||||
|
||||
groups:
|
||||
- name: host-availability
|
||||
interval: 30s
|
||||
rules:
|
||||
- alert: HostDown
|
||||
expr: up{job=~".*-node"} == 0
|
||||
for: 2m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "Host {{ $labels.instance }} is down"
|
||||
description: "Host {{ $labels.instance }} has been unreachable for more than 2 minutes."
|
||||
|
||||
- alert: HostHighLoadAverage
|
||||
expr: node_load15 / count without(cpu, mode) (node_cpu_seconds_total{mode="idle"}) > 2
|
||||
for: 10m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High load average on {{ $labels.instance }}"
|
||||
description: "15-minute load average is {{ $value | printf \"%.2f\" }} on {{ $labels.instance }}."
|
||||
|
||||
- name: cpu-alerts
|
||||
interval: 30s
|
||||
rules:
|
||||
- alert: REDACTED_APP_PASSWORD
|
||||
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High CPU usage on {{ $labels.instance }}"
|
||||
description: "CPU usage is {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}."
|
||||
|
||||
- alert: HostCriticalCpuUsage
|
||||
expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "🔥 CRITICAL CPU on {{ $labels.instance }}"
|
||||
description: "CPU usage is {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}. Immediate attention required!"
|
||||
|
||||
- name: memory-alerts
|
||||
interval: 30s
|
||||
rules:
|
||||
- alert: HostHighMemoryUsage
|
||||
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "High memory usage on {{ $labels.instance }}"
|
||||
description: "Memory usage is {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}."
|
||||
|
||||
- alert: HostCriticalMemoryUsage
|
||||
expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 95
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "🔥 CRITICAL Memory on {{ $labels.instance }}"
|
||||
description: "Memory usage is {{ $value | printf \"%.1f\" }}% on {{ $labels.instance }}."
|
||||
|
||||
- alert: HostOutOfMemory
|
||||
expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 5
|
||||
for: 2m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "💀 OUT OF MEMORY on {{ $labels.instance }}"
|
||||
description: "Only {{ $value | printf \"%.1f\" }}% memory available on {{ $labels.instance }}."
|
||||
|
||||
- name: disk-alerts
|
||||
interval: 60s
|
||||
rules:
|
||||
- alert: HostHighDiskUsage
|
||||
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100 > 80
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Disk space warning on {{ $labels.instance }}"
|
||||
description: "Disk {{ $labels.mountpoint }} is {{ $value | printf \"%.1f\" }}% full on {{ $labels.instance }}."
|
||||
|
||||
- alert: HostCriticalDiskUsage
|
||||
expr: (1 - (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"})) * 100 > 90
|
||||
for: 5m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "🔥 CRITICAL Disk space on {{ $labels.instance }}"
|
||||
description: "Disk {{ $labels.mountpoint }} is {{ $value | printf \"%.1f\" }}% full on {{ $labels.instance }}."
|
||||
|
||||
- alert: HostDiskWillFillIn24Hours
|
||||
expr: predict_linear(node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}[6h], 24*60*60) < 0
|
||||
for: 30m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Disk {{ $labels.mountpoint }} will fill within 24 hours"
|
||||
description: "Based on current growth rate, disk on {{ $labels.instance }} will be full within 24 hours."
|
||||
|
||||
- alert: REDACTED_APP_PASSWORD
|
||||
expr: node_filesystem_readonly{fstype!~"tmpfs|overlay"} == 1
|
||||
for: 1m
|
||||
labels:
|
||||
severity: critical
|
||||
annotations:
|
||||
summary: "🔥 Filesystem is read-only on {{ $labels.instance }}"
|
||||
description: "Filesystem {{ $labels.mountpoint }} has become read-only. This usually indicates disk failure!"
|
||||
|
||||
- name: network-alerts
|
||||
interval: 30s
|
||||
rules:
|
||||
- alert: HostNetworkReceiveErrors
|
||||
expr: rate(node_network_receive_errs_total{device!~"lo|veth.*|docker.*|br-.*"}[5m]) > 10
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Network receive errors on {{ $labels.instance }}"
|
||||
description: "{{ $labels.device }} has {{ $value | printf \"%.0f\" }} receive errors/sec."
|
||||
|
||||
- alert: HostNetworkTransmitErrors
|
||||
expr: rate(node_network_transmit_errs_total{device!~"lo|veth.*|docker.*|br-.*"}[5m]) > 10
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Network transmit errors on {{ $labels.instance }}"
|
||||
description: "{{ $labels.device }} has {{ $value | printf \"%.0f\" }} transmit errors/sec."
|
||||
|
||||
- name: system-alerts
|
||||
interval: 60s
|
||||
rules:
|
||||
- alert: HostClockSkew
|
||||
expr: abs(node_timex_offset_seconds) > 0.5
|
||||
for: 5m
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Clock skew detected on {{ $labels.instance }}"
|
||||
description: "Clock is off by {{ $value | printf \"%.2f\" }} seconds."
|
||||
117
docs/infrastructure/monitoring/prometheus/prometheus.yml
Normal file
117
docs/infrastructure/monitoring/prometheus/prometheus.yml
Normal file
@@ -0,0 +1,117 @@
|
||||
# Updated Prometheus Configuration with Alertmanager
|
||||
# This adds alerting configuration to your existing prometheus.yml
|
||||
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
evaluation_interval: 15s # How often to evaluate rules
|
||||
|
||||
# Alertmanager configuration
|
||||
alerting:
|
||||
alertmanagers:
|
||||
- static_configs:
|
||||
- targets:
|
||||
- alertmanager:9093
|
||||
|
||||
# Load alerting rules
|
||||
rule_files:
|
||||
- /etc/prometheus/alert-rules.yml
|
||||
|
||||
scrape_configs:
|
||||
- job_name: "prometheus"
|
||||
static_configs:
|
||||
- targets: ["prometheus:9090"]
|
||||
|
||||
- job_name: "alertmanager"
|
||||
static_configs:
|
||||
- targets: ["alertmanager:9093"]
|
||||
|
||||
- job_name: "homelab-node"
|
||||
static_configs:
|
||||
- targets: ["100.67.40.126:9100"]
|
||||
|
||||
- job_name: "raspberry-pis"
|
||||
static_configs:
|
||||
- targets: ["100.77.151.40:9100"] # pi-5
|
||||
- targets: ["100.123.246.75:9100"] # pi-5-kevin
|
||||
|
||||
- job_name: "setillo-node"
|
||||
static_configs:
|
||||
- targets: ["100.125.0.20:9100"]
|
||||
|
||||
- job_name: "setillo-snmp"
|
||||
metrics_path: /snmp
|
||||
params:
|
||||
module: [synology]
|
||||
auth: [snmpv3]
|
||||
target: ["127.0.0.1"]
|
||||
static_configs:
|
||||
- targets: ["100.125.0.20:9116"]
|
||||
relabel_configs:
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
replacement: "127.0.0.1"
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
replacement: "100.125.0.20"
|
||||
- target_label: __address__
|
||||
replacement: "100.125.0.20:9116"
|
||||
|
||||
- job_name: "calypso-node"
|
||||
static_configs:
|
||||
- targets: ["100.103.48.78:9100"]
|
||||
|
||||
- job_name: "calypso-snmp"
|
||||
metrics_path: /snmp
|
||||
params:
|
||||
module: [synology]
|
||||
auth: [snmpv3]
|
||||
target: ["127.0.0.1"]
|
||||
static_configs:
|
||||
- targets: ["100.103.48.78:9116"]
|
||||
relabel_configs:
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
replacement: "127.0.0.1"
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
replacement: "100.103.48.78"
|
||||
- target_label: __address__
|
||||
replacement: "100.103.48.78:9116"
|
||||
|
||||
- job_name: "atlantis-node"
|
||||
static_configs:
|
||||
- targets: ["100.83.230.112:9100"]
|
||||
|
||||
- job_name: "atlantis-snmp"
|
||||
metrics_path: /snmp
|
||||
params:
|
||||
module: [synology]
|
||||
auth: [snmpv3]
|
||||
target: ["127.0.0.1"]
|
||||
static_configs:
|
||||
- targets: ["100.83.230.112:9116"]
|
||||
relabel_configs:
|
||||
- source_labels: [__address__]
|
||||
target_label: __param_target
|
||||
replacement: "127.0.0.1"
|
||||
- source_labels: [__param_target]
|
||||
target_label: instance
|
||||
replacement: "100.83.230.112"
|
||||
- target_label: __address__
|
||||
replacement: "100.83.230.112:9116"
|
||||
|
||||
- job_name: "concord-nuc-node"
|
||||
static_configs:
|
||||
- targets: ["100.72.55.21:9100"]
|
||||
|
||||
- job_name: "truenas-node"
|
||||
static_configs:
|
||||
- targets: ["100.75.252.64:9100"]
|
||||
|
||||
- job_name: "vmi2076105-node"
|
||||
static_configs:
|
||||
- targets: ["100.99.156.20:9100"]
|
||||
|
||||
- job_name: "proxmox-node"
|
||||
static_configs:
|
||||
- targets: ["100.87.12.28:9100"]
|
||||
216
docs/infrastructure/monitoring/restore.sh
Executable file
216
docs/infrastructure/monitoring/restore.sh
Executable file
@@ -0,0 +1,216 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Stoatchat Restore Script
|
||||
# Restores a complete backup of the Stoatchat instance
|
||||
|
||||
set -e # Exit on any error
|
||||
|
||||
# Colors for output
|
||||
RED='\033[0;31m'
|
||||
GREEN='\033[0;32m'
|
||||
YELLOW='\033[1;33m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
log() {
|
||||
echo -e "${BLUE}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1"
|
||||
}
|
||||
|
||||
success() {
|
||||
echo -e "${GREEN}✅ $1${NC}"
|
||||
}
|
||||
|
||||
warning() {
|
||||
echo -e "${YELLOW}⚠️ $1${NC}"
|
||||
}
|
||||
|
||||
error() {
|
||||
echo -e "${RED}❌ $1${NC}"
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Check if running as root
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
error "This script must be run as root"
|
||||
fi
|
||||
|
||||
# Check if backup path provided
|
||||
if [ $# -eq 0 ]; then
|
||||
error "Usage: $0 <backup-directory-name>"
|
||||
fi
|
||||
|
||||
BACKUP_NAME="$1"
|
||||
BACKUP_DIR="/root/stoatchat-backups"
|
||||
BACKUP_PATH="${BACKUP_DIR}/${BACKUP_NAME}"
|
||||
STOATCHAT_DIR="/root/stoatchat"
|
||||
|
||||
# Check if backup exists
|
||||
if [ ! -d "${BACKUP_PATH}" ]; then
|
||||
# Try to extract from tar.gz
|
||||
if [ -f "${BACKUP_PATH}.tar.gz" ]; then
|
||||
log "Extracting backup archive..."
|
||||
cd "${BACKUP_DIR}"
|
||||
tar -xzf "${BACKUP_NAME}.tar.gz"
|
||||
success "Backup archive extracted"
|
||||
else
|
||||
error "Backup not found: ${BACKUP_PATH} or ${BACKUP_PATH}.tar.gz"
|
||||
fi
|
||||
fi
|
||||
|
||||
log "Starting Stoatchat restore process..."
|
||||
log "Restoring from: ${BACKUP_PATH}"
|
||||
|
||||
# Stop services before restore
|
||||
log "Stopping Stoatchat services..."
|
||||
pkill -f revolt || true
|
||||
docker-compose -f "${STOATCHAT_DIR}/compose.yml" down 2>/dev/null || true
|
||||
systemctl stop nginx 2>/dev/null || true
|
||||
success "Services stopped"
|
||||
|
||||
# 1. Restore Configuration Files
|
||||
log "Restoring configuration files..."
|
||||
if [ -d "${BACKUP_PATH}/config" ]; then
|
||||
cp "${BACKUP_PATH}/config/"* "${STOATCHAT_DIR}/" 2>/dev/null || warning "Some config files could not be restored"
|
||||
success "Configuration files restored"
|
||||
else
|
||||
warning "No configuration backup found"
|
||||
fi
|
||||
|
||||
# 2. Restore Nginx Configuration
|
||||
log "Restoring Nginx configuration..."
|
||||
if [ -d "${BACKUP_PATH}/nginx" ]; then
|
||||
mkdir -p /etc/nginx/sites-available
|
||||
mkdir -p /etc/nginx/ssl
|
||||
cp -r "${BACKUP_PATH}/nginx/st.vish.gg" /etc/nginx/sites-available/ 2>/dev/null || warning "Nginx site config not restored"
|
||||
cp -r "${BACKUP_PATH}/nginx/ssl/"* /etc/nginx/ssl/ 2>/dev/null || warning "SSL certificates not restored"
|
||||
|
||||
# Enable site
|
||||
ln -sf /etc/nginx/sites-available/st.vish.gg /etc/nginx/sites-enabled/ 2>/dev/null || true
|
||||
success "Nginx configuration restored"
|
||||
else
|
||||
warning "No Nginx backup found"
|
||||
fi
|
||||
|
||||
# 3. Restore MongoDB Database
|
||||
log "Restoring MongoDB database..."
|
||||
if [ -d "${BACKUP_PATH}/mongodb" ]; then
|
||||
# Start MongoDB if not running
|
||||
systemctl start mongod 2>/dev/null || docker-compose -f "${STOATCHAT_DIR}/compose.yml" up -d mongo 2>/dev/null || true
|
||||
sleep 5
|
||||
|
||||
if command -v mongorestore &> /dev/null; then
|
||||
mongorestore --host localhost:27017 --db revolt --drop "${BACKUP_PATH}/mongodb/revolt"
|
||||
success "MongoDB database restored"
|
||||
else
|
||||
# Use docker if mongorestore not available
|
||||
if docker ps | grep -q mongo; then
|
||||
docker cp "${BACKUP_PATH}/mongodb" $(docker ps --format "table {{.Names}}" | grep mongo | head -1):/tmp/
|
||||
docker exec $(docker ps --format "table {{.Names}}" | grep mongo | head -1) mongorestore --db revolt --drop /tmp/mongodb/revolt
|
||||
success "MongoDB database restored (via Docker)"
|
||||
else
|
||||
warning "MongoDB restore skipped - no mongorestore or mongo container found"
|
||||
fi
|
||||
fi
|
||||
else
|
||||
warning "No MongoDB backup found"
|
||||
fi
|
||||
|
||||
# 4. Restore User Uploads and Files
|
||||
log "Restoring user uploads and file storage..."
|
||||
if [ -d "${BACKUP_PATH}/files" ]; then
|
||||
mkdir -p "${STOATCHAT_DIR}/uploads"
|
||||
cp -r "${BACKUP_PATH}/files/"* "${STOATCHAT_DIR}/" 2>/dev/null || warning "Some files could not be restored"
|
||||
success "User files restored"
|
||||
else
|
||||
warning "No file backup found"
|
||||
fi
|
||||
|
||||
# 5. Restore Docker Volumes
|
||||
log "Restoring Docker volumes..."
|
||||
if [ -d "${BACKUP_PATH}/docker-volumes" ]; then
|
||||
for volume_backup in "${BACKUP_PATH}/docker-volumes"/*.tar.gz; do
|
||||
if [ -f "$volume_backup" ]; then
|
||||
volume_name=$(basename "$volume_backup" .tar.gz)
|
||||
log "Restoring volume: $volume_name"
|
||||
|
||||
# Create volume if it doesn't exist
|
||||
docker volume create "$volume_name" 2>/dev/null || true
|
||||
|
||||
# Restore volume data
|
||||
docker run --rm -v "$volume_name":/target -v "${BACKUP_PATH}/docker-volumes":/backup alpine tar xzf "/backup/${volume_name}.tar.gz" -C /target
|
||||
fi
|
||||
done
|
||||
success "Docker volumes restored"
|
||||
else
|
||||
warning "No Docker volume backups found"
|
||||
fi
|
||||
|
||||
# 6. Set proper permissions
|
||||
log "Setting proper permissions..."
|
||||
chown -R root:root "${STOATCHAT_DIR}"
|
||||
chmod +x "${STOATCHAT_DIR}/manage-services.sh" 2>/dev/null || true
|
||||
chmod +x "${STOATCHAT_DIR}/backup.sh" 2>/dev/null || true
|
||||
chmod +x "${STOATCHAT_DIR}/restore.sh" 2>/dev/null || true
|
||||
success "Permissions set"
|
||||
|
||||
# 7. Start services
|
||||
log "Starting services..."
|
||||
systemctl start nginx 2>/dev/null || warning "Could not start nginx"
|
||||
cd "${STOATCHAT_DIR}"
|
||||
docker-compose up -d 2>/dev/null || warning "Could not start Docker services"
|
||||
|
||||
# Start Stoatchat services
|
||||
if [ -f "${STOATCHAT_DIR}/manage-services.sh" ]; then
|
||||
"${STOATCHAT_DIR}/manage-services.sh" start 2>/dev/null || warning "Could not start Stoatchat services with manage-services.sh"
|
||||
else
|
||||
# Manual start
|
||||
REVOLT_CONFIG_PATH=Revolt.overrides.toml nohup "${STOATCHAT_DIR}/target/debug/revolt-delta" > api.log 2>&1 &
|
||||
warning "Started services manually - consider using manage-services.sh"
|
||||
fi
|
||||
|
||||
success "Services started"
|
||||
|
||||
# 8. Verify restoration
|
||||
log "Verifying restoration..."
|
||||
sleep 10
|
||||
|
||||
# Check if API is responding
|
||||
if curl -s http://localhost:14702/health >/dev/null 2>&1; then
|
||||
success "API service is responding"
|
||||
else
|
||||
warning "API service may not be fully started yet"
|
||||
fi
|
||||
|
||||
# Check if nginx is serving the site
|
||||
if curl -s -k https://localhost >/dev/null 2>&1; then
|
||||
success "Nginx is serving HTTPS"
|
||||
else
|
||||
warning "Nginx HTTPS may not be configured correctly"
|
||||
fi
|
||||
|
||||
# Final summary
|
||||
echo
|
||||
echo "=================================================="
|
||||
echo -e "${GREEN}🎉 RESTORE COMPLETED! 🎉${NC}"
|
||||
echo "=================================================="
|
||||
echo "Restored from: ${BACKUP_PATH}"
|
||||
echo "Restoration includes:"
|
||||
echo " ✅ Configuration files"
|
||||
echo " ✅ Nginx configuration & SSL certificates"
|
||||
echo " ✅ MongoDB database"
|
||||
echo " ✅ User uploads & file storage"
|
||||
echo " ✅ Docker volumes"
|
||||
echo
|
||||
echo "Next steps:"
|
||||
echo " 1. Verify services are running: systemctl status nginx"
|
||||
echo " 2. Check Stoatchat API: curl http://localhost:14702/health"
|
||||
echo " 3. Test frontend: visit https://st.vish.gg"
|
||||
echo " 4. Check logs: tail -f ${STOATCHAT_DIR}/api.log"
|
||||
echo
|
||||
echo "If you encounter issues:"
|
||||
echo " - Check the backup info: cat ${BACKUP_PATH}/backup-info.txt"
|
||||
echo " - Review system info: cat ${BACKUP_PATH}/system/"
|
||||
echo " - Restart services: ${STOATCHAT_DIR}/manage-services.sh restart"
|
||||
echo
|
||||
echo "Restore completed at: $(date)"
|
||||
echo "=================================================="
|
||||
155
docs/infrastructure/monitoring/setup-backup-cron.sh
Executable file
155
docs/infrastructure/monitoring/setup-backup-cron.sh
Executable file
@@ -0,0 +1,155 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Setup automated backups for Stoatchat
|
||||
# This script configures a daily backup at 2 AM
|
||||
|
||||
set -e
|
||||
|
||||
# Colors for output
|
||||
GREEN='\033[0;32m'
|
||||
BLUE='\033[0;34m'
|
||||
NC='\033[0m' # No Color
|
||||
|
||||
log() {
|
||||
echo -e "${BLUE}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1"
|
||||
}
|
||||
|
||||
success() {
|
||||
echo -e "${GREEN}✅ $1${NC}"
|
||||
}
|
||||
|
||||
# Check if running as root
|
||||
if [[ $EUID -ne 0 ]]; then
|
||||
echo "This script must be run as root"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
STOATCHAT_DIR="/root/stoatchat"
|
||||
BACKUP_SCRIPT="${STOATCHAT_DIR}/backup.sh"
|
||||
|
||||
# Check if backup script exists
|
||||
if [ ! -f "$BACKUP_SCRIPT" ]; then
|
||||
echo "❌ Backup script not found at $BACKUP_SCRIPT"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
log "Setting up automated daily backups for Stoatchat..."
|
||||
|
||||
# Create cron job for daily backup at 2 AM
|
||||
CRON_JOB="0 2 * * * $BACKUP_SCRIPT >> /var/log/stoatchat-backup.log 2>&1"
|
||||
|
||||
# Check if cron job already exists
|
||||
if crontab -l 2>/dev/null | grep -q "$BACKUP_SCRIPT"; then
|
||||
log "Backup cron job already exists, updating..."
|
||||
# Remove existing job and add new one
|
||||
(crontab -l 2>/dev/null | grep -v "$BACKUP_SCRIPT"; echo "$CRON_JOB") | crontab -
|
||||
else
|
||||
log "Adding new backup cron job..."
|
||||
# Add new cron job
|
||||
(crontab -l 2>/dev/null; echo "$CRON_JOB") | crontab -
|
||||
fi
|
||||
|
||||
success "Daily backup scheduled for 2:00 AM"
|
||||
|
||||
# Create log rotation for backup logs
|
||||
log "Setting up log rotation..."
|
||||
cat > /etc/logrotate.d/stoatchat-backup << EOF
|
||||
/var/log/stoatchat-backup.log {
|
||||
daily
|
||||
rotate 30
|
||||
compress
|
||||
delaycompress
|
||||
missingok
|
||||
notifempty
|
||||
create 644 root root
|
||||
}
|
||||
EOF
|
||||
|
||||
success "Log rotation configured"
|
||||
|
||||
# Create backup monitoring script
|
||||
log "Creating backup monitoring script..."
|
||||
cat > "${STOATCHAT_DIR}/check-backup-health.sh" << 'EOF'
|
||||
#!/bin/bash
|
||||
|
||||
# Check backup health and send alerts if needed
|
||||
|
||||
BACKUP_DIR="/root/stoatchat-backups"
|
||||
ALERT_EMAIL="admin@example.com" # Change this to your email
|
||||
MAX_AGE_HOURS=26 # Alert if no backup in last 26 hours
|
||||
|
||||
# Find the most recent backup
|
||||
LATEST_BACKUP=$(find "$BACKUP_DIR" -name "stoatchat_backup_*.tar.gz" -type f -printf '%T@ %p\n' | sort -n | tail -1 | cut -d' ' -f2-)
|
||||
|
||||
if [ -z "$LATEST_BACKUP" ]; then
|
||||
echo "❌ No backups found in $BACKUP_DIR"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check age of latest backup
|
||||
BACKUP_AGE=$(find "$LATEST_BACKUP" -mtime +1 | wc -l)
|
||||
|
||||
if [ "$BACKUP_AGE" -gt 0 ]; then
|
||||
echo "⚠️ Latest backup is older than 24 hours: $LATEST_BACKUP"
|
||||
echo "Backup age: $(stat -c %y "$LATEST_BACKUP")"
|
||||
exit 1
|
||||
else
|
||||
echo "✅ Backup is current: $LATEST_BACKUP"
|
||||
echo "Backup size: $(du -h "$LATEST_BACKUP" | cut -f1)"
|
||||
echo "Backup date: $(stat -c %y "$LATEST_BACKUP")"
|
||||
fi
|
||||
|
||||
# Check backup integrity
|
||||
if tar -tzf "$LATEST_BACKUP" >/dev/null 2>&1; then
|
||||
echo "✅ Backup integrity verified"
|
||||
else
|
||||
echo "❌ Backup integrity check failed!"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check disk space
|
||||
DISK_USAGE=$(df "$BACKUP_DIR" | tail -1 | awk '{print $5}' | sed 's/%//')
|
||||
if [ "$DISK_USAGE" -gt 80 ]; then
|
||||
echo "⚠️ Disk usage is high: ${DISK_USAGE}%"
|
||||
echo "Consider cleaning old backups or expanding storage"
|
||||
fi
|
||||
|
||||
echo "✅ Backup health check completed successfully"
|
||||
EOF
|
||||
|
||||
chmod +x "${STOATCHAT_DIR}/check-backup-health.sh"
|
||||
success "Backup monitoring script created"
|
||||
|
||||
# Add weekly backup health check
|
||||
HEALTH_CRON_JOB="0 8 * * 1 ${STOATCHAT_DIR}/check-backup-health.sh >> /var/log/stoatchat-backup-health.log 2>&1"
|
||||
if ! crontab -l 2>/dev/null | grep -q "check-backup-health.sh"; then
|
||||
(crontab -l 2>/dev/null; echo "$HEALTH_CRON_JOB") | crontab -
|
||||
success "Weekly backup health check scheduled for Mondays at 8:00 AM"
|
||||
fi
|
||||
|
||||
# Show current cron jobs
|
||||
log "Current backup-related cron jobs:"
|
||||
crontab -l | grep -E "(backup|stoatchat)" || echo "No backup cron jobs found"
|
||||
|
||||
echo
|
||||
echo "=================================================="
|
||||
echo -e "${GREEN}🎉 AUTOMATED BACKUP SETUP COMPLETE! 🎉${NC}"
|
||||
echo "=================================================="
|
||||
echo "✅ Daily backup scheduled for 2:00 AM"
|
||||
echo "✅ Weekly health check scheduled for Mondays at 8:00 AM"
|
||||
echo "✅ Log rotation configured"
|
||||
echo "✅ Backup monitoring script created"
|
||||
echo
|
||||
echo "Backup locations:"
|
||||
echo " 📁 Backups: /root/stoatchat-backups/"
|
||||
echo " 📄 Logs: /var/log/stoatchat-backup.log"
|
||||
echo " 📄 Health logs: /var/log/stoatchat-backup-health.log"
|
||||
echo
|
||||
echo "Manual commands:"
|
||||
echo " 🔧 Run backup now: $BACKUP_SCRIPT"
|
||||
echo " 🔍 Check backup health: ${STOATCHAT_DIR}/check-backup-health.sh"
|
||||
echo " 📋 View cron jobs: crontab -l"
|
||||
echo " 📄 View backup logs: tail -f /var/log/stoatchat-backup.log"
|
||||
echo
|
||||
echo "Setup completed at: $(date)"
|
||||
echo "=================================================="
|
||||
102
docs/infrastructure/monitoring/synology-dashboard-fix-report.md
Normal file
102
docs/infrastructure/monitoring/synology-dashboard-fix-report.md
Normal file
@@ -0,0 +1,102 @@
|
||||
# Synology NAS Monitoring Dashboard Fix Report
|
||||
|
||||
## Issue Summary
|
||||
The Synology NAS Monitoring dashboard was showing "no data" due to several configuration issues:
|
||||
|
||||
1. **Empty Datasource UIDs**: All panels had `"uid": ""` instead of the correct Prometheus datasource UID
|
||||
2. **Broken Template Variables**: Template variables had empty current values and incorrect queries
|
||||
3. **Empty Instance Filters**: Queries used `instance=~""` which matched nothing
|
||||
|
||||
## Fixes Applied
|
||||
|
||||
### 1. Datasource UID Correction
|
||||
**Before**: `"uid": ""`
|
||||
**After**: `"uid": "PBFA97CFB590B2093"`
|
||||
**Impact**: All 8 panels now connect to the correct Prometheus datasource
|
||||
|
||||
### 2. Template Variable Fixes
|
||||
|
||||
#### Datasource Variable
|
||||
```json
|
||||
"current": {
|
||||
"text": "Prometheus",
|
||||
"value": "PBFA97CFB590B2093"
|
||||
}
|
||||
```
|
||||
|
||||
#### Instance Variable
|
||||
- **Query Changed**: `label_values(temperature, instance)` → `label_values(diskTemperature, instance)`
|
||||
- **Current Value**: Set to "All" with `$__all` value
|
||||
- **Datasource UID**: Updated to correct UID
|
||||
|
||||
### 3. Query Filter Fixes
|
||||
**Before**: `instance=~""`
|
||||
**After**: `instance=~"$instance"`
|
||||
**Impact**: Queries now properly use the instance template variable
|
||||
|
||||
## Verification Results
|
||||
|
||||
### Dashboard Status: ✅ WORKING
|
||||
- **Total Panels**: 8
|
||||
- **Template Variables**: 2 (both working)
|
||||
- **Data Points**: All panels showing data
|
||||
|
||||
### Metrics Verified
|
||||
| Metric | Data Points | Status |
|
||||
|--------|-------------|--------|
|
||||
| systemStatus | 3 NAS devices | ✅ Working |
|
||||
| temperature | 3 readings | ✅ Working |
|
||||
| diskTemperature | 18 disk sensors | ✅ Working |
|
||||
| hrStorageUsed/Size | 92 storage metrics | ✅ Working |
|
||||
|
||||
### SNMP Targets Health
|
||||
| Target | Instance | Status |
|
||||
|--------|----------|--------|
|
||||
| atlantis-snmp | 100.83.230.112 | ✅ Up |
|
||||
| calypso-snmp | 100.103.48.78 | ✅ Up |
|
||||
| setillo-snmp | 100.125.0.20 | ✅ Up |
|
||||
|
||||
## Sample Data
|
||||
- **NAS Temperature**: 40°C (atlantis)
|
||||
- **Disk Temperature**: 31°C (sample disk)
|
||||
- **Storage Usage**: 67.6% (sample volume)
|
||||
- **System Status**: Normal (all 3 devices)
|
||||
|
||||
## Dashboard Access
|
||||
**URL**: http://localhost:3300/d/synology-dashboard-v2
|
||||
|
||||
## Technical Details
|
||||
|
||||
### Available SNMP Metrics
|
||||
- `systemStatus`: Overall NAS health status
|
||||
- `temperature`: System temperature readings
|
||||
- `diskTemperature`: Individual disk temperatures
|
||||
- `hrStorageUsed`: Storage space used
|
||||
- `hrStorageSize`: Total storage capacity
|
||||
- `diskStatus`: Individual disk health
|
||||
- `diskModel`: Disk model information
|
||||
|
||||
### Template Variable Configuration
|
||||
```json
|
||||
{
|
||||
"datasource": {
|
||||
"current": {"text": "Prometheus", "value": "PBFA97CFB590B2093"}
|
||||
},
|
||||
"instance": {
|
||||
"current": {"text": "All", "value": "$__all"},
|
||||
"query": "label_values(diskTemperature, instance)"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
✅ **Synology NAS Monitoring dashboard is now fully functional**
|
||||
✅ **All panels displaying real-time data**
|
||||
✅ **Template variables working correctly**
|
||||
✅ **SNMP monitoring operational across 3 NAS devices**
|
||||
|
||||
The dashboard now provides comprehensive monitoring of:
|
||||
- System health and status
|
||||
- Temperature monitoring (system and individual disks)
|
||||
- Storage utilization across all volumes
|
||||
- Disk health and performance metrics
|
||||
142
docs/infrastructure/monitoring/verify-dashboard-sections.sh
Executable file
142
docs/infrastructure/monitoring/verify-dashboard-sections.sh
Executable file
@@ -0,0 +1,142 @@
|
||||
#!/bin/bash
|
||||
|
||||
# Comprehensive Dashboard Section Verification Script
|
||||
# Tests each dashboard and its individual sections/panels
|
||||
|
||||
GRAFANA_URL="http://localhost:3300"
|
||||
GRAFANA_USER="admin"
|
||||
GRAFANA_PASS="REDACTED_PASSWORD"
|
||||
|
||||
echo "=== Comprehensive Dashboard Section Verification ==="
|
||||
echo "Grafana URL: $GRAFANA_URL"
|
||||
echo
|
||||
|
||||
# Function to test a metric query
|
||||
test_metric() {
|
||||
local metric="$1"
|
||||
local description="$2"
|
||||
local result=$(curl -s -u "$GRAFANA_USER:$GRAFANA_PASS" "$GRAFANA_URL/api/datasources/proxy/1/api/v1/query?query=$metric" | jq '.data.result | length')
|
||||
if [ "$result" -gt 0 ]; then
|
||||
echo " ✅ $description: $result data points"
|
||||
else
|
||||
echo " ❌ $description: No data"
|
||||
fi
|
||||
}
|
||||
|
||||
# Function to test a dashboard's panels
|
||||
test_dashboard_panels() {
|
||||
local uid="$1"
|
||||
local name="$2"
|
||||
echo
|
||||
echo "=== Testing $name Dashboard (UID: $uid) ==="
|
||||
|
||||
# Get dashboard JSON
|
||||
local dashboard=$(curl -s -u "$GRAFANA_USER:$GRAFANA_PASS" "$GRAFANA_URL/api/dashboards/uid/$uid")
|
||||
local panel_count=$(echo "$dashboard" | jq '.dashboard.panels | length')
|
||||
echo "📊 Total panels: $panel_count"
|
||||
|
||||
# Get template variables
|
||||
echo
|
||||
echo "🔧 Template Variables:"
|
||||
echo "$dashboard" | jq -r '.dashboard.templating.list[] | " • \(.name): \(.current.text // "N/A")"'
|
||||
|
||||
# Test some key metrics based on dashboard type
|
||||
echo
|
||||
echo "📈 Testing Key Metrics:"
|
||||
}
|
||||
|
||||
# Test API connectivity
|
||||
echo "1. Testing API connectivity..."
|
||||
if curl -s -u "$GRAFANA_USER:$GRAFANA_PASS" "$GRAFANA_URL/api/health" | grep -q "ok"; then
|
||||
echo "✅ API connectivity: OK"
|
||||
else
|
||||
echo "❌ API connectivity: FAILED"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Test data source
|
||||
echo
|
||||
echo "2. Testing Prometheus data source..."
|
||||
PROMETHEUS_STATUS=$(curl -s -u "$GRAFANA_USER:$GRAFANA_PASS" "$GRAFANA_URL/api/datasources/1/health" | jq -r '.status')
|
||||
echo "✅ Prometheus status: $PROMETHEUS_STATUS"
|
||||
|
||||
# Test Node Exporter Dashboard
|
||||
test_dashboard_panels "rYdddlPWk" "Node Exporter Full"
|
||||
|
||||
# Test key Node Exporter metrics
|
||||
test_metric "up%7Bjob%3D~%22.*-node%22%7D" "Node Exporter targets up"
|
||||
test_metric "node_load1" "CPU Load (1m)"
|
||||
test_metric "node_memory_MemAvailable_bytes" "Memory Available"
|
||||
test_metric "node_filesystem_avail_bytes" "Filesystem Available"
|
||||
test_metric "node_disk_io_time_seconds_total" "Disk I/O Time"
|
||||
test_metric "node_network_receive_bytes_total" "Network Receive Bytes"
|
||||
test_metric "node_cpu_seconds_total" "CPU Usage"
|
||||
test_metric "node_boot_time_seconds" "Boot Time"
|
||||
|
||||
# Test Synology Dashboard
|
||||
test_dashboard_panels "synology-dashboard-v2" "Synology NAS Monitoring"
|
||||
|
||||
# Test key Synology/SNMP metrics
|
||||
test_metric "up%7Bjob%3D~%22.*-snmp%22%7D" "SNMP targets up"
|
||||
test_metric "diskTemperature" "Disk Temperature"
|
||||
test_metric "hrStorageSize" "Storage Size"
|
||||
test_metric "hrStorageUsed" "Storage Used"
|
||||
test_metric "sysUpTime" "System Uptime"
|
||||
|
||||
# Test Node Details Dashboard
|
||||
test_dashboard_panels "node-details-v2" "Node Details"
|
||||
|
||||
# Test Infrastructure Overview Dashboard
|
||||
test_dashboard_panels "infrastructure-overview-v2" "Infrastructure Overview"
|
||||
|
||||
echo
|
||||
echo "=== Detailed Panel Testing ==="
|
||||
|
||||
# Test specific dashboard sections
|
||||
echo
|
||||
echo "🔍 Node Exporter Dashboard Sections:"
|
||||
echo " Testing CPU, Memory, Disk, Network, and System panels..."
|
||||
|
||||
# CPU metrics
|
||||
test_metric "100%20-%20%28avg%20by%20%28instance%29%20%28irate%28node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B5m%5D%29%29%20*%20100%29" "CPU Usage Percentage"
|
||||
|
||||
# Memory metrics
|
||||
test_metric "%28node_memory_MemTotal_bytes%20-%20node_memory_MemAvailable_bytes%29%20/%20node_memory_MemTotal_bytes%20*%20100" "Memory Usage Percentage"
|
||||
|
||||
# Disk metrics
|
||||
test_metric "100%20-%20%28node_filesystem_avail_bytes%20/%20node_filesystem_size_bytes%29%20*%20100" "Disk Usage Percentage"
|
||||
|
||||
# Network metrics
|
||||
test_metric "irate%28node_network_receive_bytes_total%5B5m%5D%29" "Network Receive Rate"
|
||||
test_metric "irate%28node_network_transmit_bytes_total%5B5m%5D%29" "Network Transmit Rate"
|
||||
|
||||
echo
|
||||
echo "🔍 Synology Dashboard Sections:"
|
||||
echo " Testing Storage, Temperature, and System panels..."
|
||||
|
||||
# Storage metrics
|
||||
test_metric "hrStorageUsed%20/%20hrStorageSize%20*%20100" "Storage Usage Percentage"
|
||||
|
||||
# Temperature metrics (if available)
|
||||
test_metric "diskTemperature" "Disk Temperatures"
|
||||
|
||||
echo
|
||||
echo "=== Target Health Summary ==="
|
||||
|
||||
# Get all targets and their health
|
||||
echo "📡 All Prometheus Targets:"
|
||||
curl -s -u "$GRAFANA_USER:$GRAFANA_PASS" "$GRAFANA_URL/api/datasources/proxy/1/api/v1/targets" | jq -r '.data.activeTargets[] | " \(if .health == "up" then "✅" else "❌" end) \(.labels.job): \(.labels.instance // "N/A") (\(.health))"'
|
||||
|
||||
echo
|
||||
echo "=== Dashboard URLs ==="
|
||||
echo "🌐 Access your dashboards:"
|
||||
echo " • Node Exporter Full: $GRAFANA_URL/d/rYdddlPWk"
|
||||
echo " • Synology NAS: $GRAFANA_URL/d/synology-dashboard-v2"
|
||||
echo " • Node Details: $GRAFANA_URL/d/node-details-v2"
|
||||
echo " • Infrastructure Overview: $GRAFANA_URL/d/infrastructure-overview-v2"
|
||||
|
||||
echo
|
||||
echo "=== Verification Complete ==="
|
||||
echo "✅ All dashboard sections have been tested"
|
||||
echo "📊 Check the results above for any issues"
|
||||
echo "🔧 Template variables and data sources verified"
|
||||
86
docs/infrastructure/mounting-calypso-on-nuc.md
Normal file
86
docs/infrastructure/mounting-calypso-on-nuc.md
Normal file
@@ -0,0 +1,86 @@
|
||||
# Mounting Calypso NAS on Concord NUC
|
||||
|
||||
This guide covers mounting the Calypso NAS media share on the NUC for Plex access.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
1. Verify Tailscale connectivity:
|
||||
```bash
|
||||
ping 100.103.48.78 # Calypso's Tailscale IP
|
||||
```
|
||||
|
||||
2. Install CIFS utilities:
|
||||
```bash
|
||||
sudo apt install cifs-utils -y
|
||||
```
|
||||
|
||||
## Setup
|
||||
|
||||
### 1. Create Mount Point
|
||||
|
||||
```bash
|
||||
sudo mkdir -p /mnt/nas
|
||||
```
|
||||
|
||||
### 2. Create Credentials File (Secure)
|
||||
|
||||
```bash
|
||||
sudo nano /root/.smbcredentials
|
||||
```
|
||||
|
||||
Add:
|
||||
```
|
||||
username=Vish
|
||||
password=REDACTED_PASSWORD
|
||||
```
|
||||
|
||||
Secure the file:
|
||||
```bash
|
||||
sudo chmod 600 /root/.smbcredentials
|
||||
```
|
||||
|
||||
### 3. Add to /etc/fstab (Persistent Mount)
|
||||
|
||||
```bash
|
||||
sudo nano /etc/fstab
|
||||
```
|
||||
|
||||
Add this line:
|
||||
```
|
||||
//100.103.48.78/data/media /mnt/nas cifs credentials=/root/.smbcredentials,vers=3.0,uid=1000,gid=1000,file_mode=0755,dir_mode=0755,_netdev,x-systemd.automount 0 0
|
||||
```
|
||||
|
||||
### 4. Mount
|
||||
|
||||
```bash
|
||||
sudo mount -a
|
||||
```
|
||||
|
||||
### 5. Verify
|
||||
|
||||
```bash
|
||||
ls -la /mnt/nas
|
||||
# Should show: movies, tv, music, etc.
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Mount fails on boot
|
||||
The `_netdev` and `x-systemd.automount` options ensure the mount waits for network.
|
||||
If issues persist, check that Tailscale starts before mount:
|
||||
|
||||
```bash
|
||||
sudo systemctl status tailscaled
|
||||
```
|
||||
|
||||
### Permission issues
|
||||
Ensure `uid=1000,gid=1000` matches the user running Plex/Docker.
|
||||
|
||||
### Slow performance
|
||||
See [Network Performance Tuning](docs/infrastructure/network-performance-tuning.md) for SMB optimization.
|
||||
|
||||
## Performance Notes
|
||||
|
||||
- **SMB over Tailscale**: ~139 MB/s (1.1 Gbps) - sufficient for 4K streaming
|
||||
- **Direct LAN access**: Best for 4K remux playback
|
||||
- **NFS alternative**: Not recommended over Tailscale (slower than SMB in testing)
|
||||
282
docs/infrastructure/network-architecture.md
Normal file
282
docs/infrastructure/network-architecture.md
Normal file
@@ -0,0 +1,282 @@
|
||||
# Network Architecture
|
||||
|
||||
*Homelab network topology and configuration*
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The homelab uses a multi-layered network architecture with external access via Cloudflare, internal services through Nginx Proxy Manager, and mesh VPN for secure remote access.
|
||||
|
||||
---
|
||||
|
||||
## Network Topology
|
||||
|
||||
```
|
||||
┌────────────────────────────────────────────────────────────────────┐
|
||||
│ INTERNET │
|
||||
│ (Public IP via ISP) │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌────────────────────────────────────────────────────────────────────┐
|
||||
│ CLOUDFLARE │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||||
│ │ DNS │ │ Proxy │ │ Tunnels │ │
|
||||
│ │ vish.gg │ │ vish.gg │ │ (if used) │ │
|
||||
│ └─────────────┘ └─────────────┘ └─────────────┘ │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌────────────────────────────────────────────────────────────────────┐
|
||||
│ HOME NETWORK │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||||
│ │ Router │ │ Switch │ │ WiFi AP │ │
|
||||
│ │ (Gateway) │ │ (Managed) │ │ (Ubiquiti) │ │
|
||||
│ └─────────────┘ └─────────────┘ └─────────────┘ │
|
||||
│ │ │ │
|
||||
│ └──────────────────┬────────────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────┴──────┐ │
|
||||
│ │ VLANs │ │
|
||||
│ │ 10 (MGMT) │ │
|
||||
│ │ 20 (IOT) │ │
|
||||
│ │ 30 (MAIN) │ │
|
||||
│ └─────────────┘ │
|
||||
└────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
┌───────────────┼───────────────┐
|
||||
▼ ▼ ▼
|
||||
┌───────────┐ ┌───────────┐ ┌───────────┐
|
||||
│ ATLANTIS │ │ CALYPSO │ │ NUC │
|
||||
│ (NAS) │ │ (NAS) │ │ (HA) │
|
||||
└───────────┘ └───────────┘ └───────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## IP Address Scheme
|
||||
|
||||
### Subnet Configuration
|
||||
|
||||
| VLAN | Network | Gateway | DHCP Range | Purpose |
|
||||
|------|---------|---------|------------|---------|
|
||||
| 10 (MGMT) | 192.168.0.0/24 | .1 | .100-.150 | Infrastructure |
|
||||
| 20 (IOT) | 192.168.1.0/24 | .1 | .100-.200 | Smart home |
|
||||
| 30 (GUEST) | 192.168.2.0/24 | .1 | .100-.150 | Guest access |
|
||||
|
||||
### Static Assignments
|
||||
|
||||
| Host | IP | MAC | Purpose |
|
||||
|------|-----|-----|---------|
|
||||
| Atlantis | 192.168.0.200 | - | Primary NAS (DS1823xs+) |
|
||||
| Calypso | 192.168.0.250 | - | Secondary NAS (DS723+), runs NPM |
|
||||
| Guava | 192.168.0.100 | - | TrueNAS Scale workstation |
|
||||
| PVE | 192.168.0.205 | - | Proxmox hypervisor |
|
||||
| Pi-5 | 192.168.0.66 | - | Raspberry Pi 5 |
|
||||
| Homelab VM | 192.168.0.210 | - | Proxmox VM, monitoring |
|
||||
|
||||
---
|
||||
|
||||
## Port Forwarding
|
||||
|
||||
### External Access
|
||||
|
||||
| Service | External Port | Internal IP | Internal Port | Protocol |
|
||||
|---------|---------------|-------------|----------------|----------|
|
||||
| NPM HTTP | 80 | 192.168.0.250 | 80 | HTTP |
|
||||
| NPM HTTPS | 443 | 192.168.0.250 | 443 | HTTPS |
|
||||
| Headscale | 8443 | 192.168.0.250 | 8085 | TCP (control server) |
|
||||
| Plex | 32400 | 192.168.0.200 | 32400 | TCP |
|
||||
|
||||
### Internal Only (No Port Forward)
|
||||
|
||||
| Service | Internal IP | Port | Access Method |
|
||||
|---------|-------------|------|----------------|
|
||||
| Grafana | 192.168.0.210 | 3000 | VPN only |
|
||||
| Prometheus | 192.168.0.210 | 9090 | VPN only |
|
||||
| Home Assistant | 192.168.12.202 | 8123 | VPN only (via GL-MT3000 subnet) |
|
||||
| Authentik | 192.168.0.250 | 9000 | VPN only |
|
||||
| Vaultwarden | 192.168.0.200 | 8080 | VPN only |
|
||||
|
||||
---
|
||||
|
||||
## DNS Configuration
|
||||
|
||||
### Primary: Pi-hole / AdGuard
|
||||
|
||||
```
|
||||
Upstream DNS:
|
||||
- 1.1.1.1 (Cloudflare)
|
||||
- 8.8.8.8 (Google)
|
||||
|
||||
Local Domains:
|
||||
- vish.local
|
||||
- vish.gg
|
||||
```
|
||||
|
||||
### Local DNS Entries
|
||||
|
||||
| Hostname | IP | Description |
|
||||
|----------|-----|-------------|
|
||||
| atlantis | 192.168.0.200 | Primary NAS (DS1823xs+) |
|
||||
| calypso | 192.168.0.250 | Secondary NAS (DS723+) |
|
||||
| guava | 192.168.0.100 | TrueNAS Scale |
|
||||
| pve | 192.168.0.205 | Proxmox host |
|
||||
| homelab | 192.168.0.210 | Proxmox VM |
|
||||
| pi-5 | 192.168.0.66 | Raspberry Pi 5 |
|
||||
|
||||
---
|
||||
|
||||
## Reverse Proxy Flow
|
||||
|
||||
### External Request (vish.gg)
|
||||
|
||||
```
|
||||
1. User → https://service.vish.gg
|
||||
2. Cloudflare DNS → resolves to home IP
|
||||
3. Home Router → forwards to 192.168.0.250:443
|
||||
4. NPM (Calypso) → terminates SSL
|
||||
5. Authentik (if SSO) → authenticates
|
||||
6. Backend service → responds
|
||||
7. NPM → returns to user
|
||||
```
|
||||
|
||||
### Internal Request
|
||||
|
||||
```
|
||||
1. User → http://service.local (or IP)
|
||||
2. Pi-hole/AdGuard → resolves to internal IP
|
||||
3. NPM (optional) or direct → service
|
||||
4. Response → user
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## VPN Configuration
|
||||
|
||||
### Headscale (Primary Mesh VPN)
|
||||
|
||||
All nodes use the Tailscale client pointed at the self-hosted Headscale control server.
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Control Server | `headscale.vish.gg:8443` |
|
||||
| Host | Calypso (192.168.0.250) |
|
||||
| Admin UI | Headplane (via NPM at :8443/admin) |
|
||||
| DERP Servers | Tailscale public DERP map |
|
||||
| MagicDNS suffix | `tail.vish.gg` |
|
||||
| IP Range | 100.64.0.0/10 |
|
||||
| Exit Nodes | atlantis, calypso, setillo, vish-concord-nuc, seattle, homeassistant |
|
||||
|
||||
### WireGuard (Point-to-Point, Secondary)
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| Server | Concord NUC (wg-easy, port 51820) |
|
||||
| Interface | Dynamic |
|
||||
| Use Case | Clients that can't run Tailscale |
|
||||
|
||||
---
|
||||
|
||||
## VLAN Configuration
|
||||
|
||||
### Management VLAN (10)
|
||||
- Devices: NAS, switches, APs
|
||||
- Access: Admin only
|
||||
- Internet: Full
|
||||
|
||||
### IoT VLAN (20)
|
||||
- Devices: Smart home, cameras
|
||||
- Access: Restricted
|
||||
- Internet: Filtered (Pi-hole)
|
||||
- Isolation: Yes
|
||||
|
||||
### Main VLAN (30)
|
||||
- Devices: Personal devices
|
||||
- Access: Full
|
||||
- Internet: Full
|
||||
|
||||
---
|
||||
|
||||
## Firewall Rules
|
||||
|
||||
### Router (UFW/iptables)
|
||||
|
||||
```bash
|
||||
# Allow established connections
|
||||
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
|
||||
|
||||
# Allow SSH
|
||||
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
|
||||
|
||||
# Allow HTTP/HTTPS
|
||||
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
|
||||
iptables -A INPUT -p tcp --dport 443 -j ACCEPT
|
||||
|
||||
# Allow WireGuard
|
||||
iptables -A INPUT -p udp --dport 51820 -j ACCEPT
|
||||
|
||||
# Drop everything else
|
||||
iptables -A INPUT -j DROP
|
||||
```
|
||||
|
||||
### Docker Network
|
||||
|
||||
```yaml
|
||||
# docker-compose.yml
|
||||
networks:
|
||||
default:
|
||||
driver: bridge
|
||||
ipam:
|
||||
config:
|
||||
- subnet: 172.20.0.0/24
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Network Metrics
|
||||
|
||||
| Metric | Source | Dashboard |
|
||||
|--------|--------|-----------|
|
||||
| Bandwidth | Node Exporter | Network |
|
||||
| Packet loss | Prometheus | Network |
|
||||
| DNS queries | Pi-hole | DNS |
|
||||
| VPN connections | WireGuard | VPN |
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Cannot Access Service
|
||||
|
||||
1. **Check DNS:** `nslookup service.vish.local`
|
||||
2. **Check connectivity:** `ping 192.168.0.x`
|
||||
3. **Check port:** `nc -zv 192.168.0.x 443`
|
||||
4. **Check service:** `curl -I http://localhost:PORT`
|
||||
5. **Check firewall:** `sudo iptables -L`
|
||||
|
||||
### Slow Network
|
||||
|
||||
1. Check bandwidth: `iperf3 -c 192.168.0.x`
|
||||
2. Check for interference (WiFi)
|
||||
3. Check switch port speed
|
||||
4. Check for broadcast storms
|
||||
|
||||
### VPN Issues
|
||||
|
||||
1. Check WireGuard status: `wg show`
|
||||
2. Check Headscale nodes: `headscale nodes list`
|
||||
3. Verify firewall allows UDP 51820
|
||||
4. Check NAT traversal
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Cloudflare Setup](../infrastructure/cloudflare-dns.md)
|
||||
- [WireGuard Guide](../services/individual/wg-easy.md)
|
||||
- [Headscale Setup](../infrastructure/tailscale-setup-guide.md)
|
||||
- [Port Forwarding](../infrastructure/port-forwarding-configuration.md)
|
||||
280
docs/infrastructure/network-performance-tuning.md
Normal file
280
docs/infrastructure/network-performance-tuning.md
Normal file
@@ -0,0 +1,280 @@
|
||||
# 🚀 Network Performance Tuning Guide
|
||||
|
||||
**🟠 Advanced Guide**
|
||||
|
||||
This guide documents the network performance testing and optimization between Calypso and Atlantis NAS units, connected via the TP-Link TL-SX1008 10GbE switch.
|
||||
|
||||
---
|
||||
|
||||
## 📊 Network Performance Test Results
|
||||
|
||||
### Test Configuration
|
||||
- **Date**: January 2025
|
||||
- **Tool**: iperf3 (via Docker: `networkstatic/iperf3`)
|
||||
- **Connection**: Calypso ↔ TL-SX1008 ↔ Atlantis (10GbE)
|
||||
- **MTU**: 1500 (standard)
|
||||
|
||||
### Baseline Results (Before Tuning)
|
||||
|
||||
| Direction | Speed | Notes |
|
||||
|-----------|-------|-------|
|
||||
| **Calypso → Atlantis** (upload) | 6.87 Gbps | ~3,570 TCP retransmits |
|
||||
| **Atlantis → Calypso** (download) | 9.27 Gbps | Near line-rate ✅ |
|
||||
|
||||
### Optimized Results (After Tuning)
|
||||
|
||||
| Direction | Speed | Improvement |
|
||||
|-----------|-------|-------------|
|
||||
| **Calypso → Atlantis** (upload) | 7.35 Gbps | +7% |
|
||||
| **Atlantis → Calypso** (download) | 9.27 Gbps | Unchanged |
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Optimizations Applied
|
||||
|
||||
### 1. Ring Buffer Optimization (Calypso)
|
||||
|
||||
**Before:**
|
||||
```
|
||||
RX: 2048 (max: 8184)
|
||||
TX: 4096 (max: 8184)
|
||||
```
|
||||
|
||||
**After:**
|
||||
```bash
|
||||
sudo ethtool -G eth2 rx 8184 tx 8184
|
||||
```
|
||||
|
||||
**Result:**
|
||||
```
|
||||
RX: 8184 ✅
|
||||
TX: 8184 ✅
|
||||
```
|
||||
|
||||
> ⚠️ **Note**: Changing ring buffers may briefly reset the NIC and drop connections.
|
||||
|
||||
### 2. TCP Buffer Tuning (Both NAS)
|
||||
|
||||
**Before:**
|
||||
```
|
||||
net.core.rmem_max = 212992
|
||||
net.core.wmem_max = 212992
|
||||
net.ipv4.tcp_rmem = 4096 87380 6291456
|
||||
net.ipv4.tcp_wmem = 4096 16384 4194304
|
||||
```
|
||||
|
||||
**Optimized settings:**
|
||||
```bash
|
||||
sudo sysctl -w net.core.rmem_max=16777216
|
||||
sudo sysctl -w net.core.wmem_max=16777216
|
||||
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
|
||||
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
|
||||
```
|
||||
|
||||
### 3. NIC Offloading Features (Verified Enabled)
|
||||
|
||||
```bash
|
||||
ethtool -k eth2 | grep -E 'tcp-segmentation|generic-segmentation|generic-receive'
|
||||
```
|
||||
|
||||
All offloading features should show `on`:
|
||||
- `tcp-segmentation-offload: on`
|
||||
- `generic-segmentation-offload: on`
|
||||
- `generic-receive-offload: on`
|
||||
|
||||
### 4. Flow Control (Verified Enabled)
|
||||
|
||||
```bash
|
||||
ethtool -a eth2
|
||||
```
|
||||
|
||||
Expected output:
|
||||
```
|
||||
Pause parameters for eth2:
|
||||
Autonegotiate: off
|
||||
RX: on
|
||||
TX: on
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Commands Reference
|
||||
|
||||
### Check Current Settings
|
||||
|
||||
```bash
|
||||
# Ring buffers
|
||||
ethtool -g eth2
|
||||
|
||||
# TCP buffers
|
||||
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
|
||||
|
||||
# Offloading
|
||||
ethtool -k eth2
|
||||
|
||||
# Flow control
|
||||
ethtool -a eth2
|
||||
|
||||
# MTU
|
||||
cat /sys/class/net/eth2/mtu
|
||||
```
|
||||
|
||||
### Apply Optimizations (Temporary)
|
||||
|
||||
```bash
|
||||
# Max ring buffers
|
||||
sudo ethtool -G eth2 rx 8184 tx 8184
|
||||
|
||||
# Increase TCP buffers
|
||||
sudo sysctl -w net.core.rmem_max=16777216
|
||||
sudo sysctl -w net.core.wmem_max=16777216
|
||||
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
|
||||
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
|
||||
```
|
||||
|
||||
> ⚠️ These settings reset on reboot. See "Making Changes Persistent" below.
|
||||
|
||||
### Running iperf3 Tests
|
||||
|
||||
```bash
|
||||
# Start server on Atlantis
|
||||
sudo docker run -d --rm --name iperf3-server --network host networkstatic/iperf3 -s
|
||||
|
||||
# Run upload test from Calypso
|
||||
sudo docker run --rm --network host networkstatic/iperf3 -c 192.168.0.200 -t 10 -P 4
|
||||
|
||||
# Run download test from Calypso (reverse mode)
|
||||
sudo docker run --rm --network host networkstatic/iperf3 -c 192.168.0.200 -t 10 -P 4 -R
|
||||
|
||||
# Stop server
|
||||
sudo docker stop iperf3-server
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Making Changes Persistent
|
||||
|
||||
### On Synology DSM (Recommended)
|
||||
|
||||
For MTU and basic network settings, use DSM GUI:
|
||||
- **Control Panel** → **Network** → **Network Interface**
|
||||
- Select interface → **Edit** → Configure settings
|
||||
|
||||
### Via sysctl.conf
|
||||
|
||||
Create `/etc/sysctl.d/99-network-tuning.conf`:
|
||||
```bash
|
||||
# TCP buffer sizes for 10GbE
|
||||
net.core.rmem_max = 16777216
|
||||
net.core.wmem_max = 16777216
|
||||
net.ipv4.tcp_rmem = 4096 87380 16777216
|
||||
net.ipv4.tcp_wmem = 4096 65536 16777216
|
||||
|
||||
# Additional tuning
|
||||
net.core.netdev_max_backlog = 250000
|
||||
net.ipv4.tcp_max_syn_backlog = 30000
|
||||
net.ipv4.tcp_tw_reuse = 1
|
||||
```
|
||||
|
||||
Apply: `sudo sysctl -p /etc/sysctl.d/99-network-tuning.conf`
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Jumbo Frames (MTU 9000)
|
||||
|
||||
### Why Jumbo Frames Help
|
||||
|
||||
Jumbo frames reduce per-packet overhead by sending larger packets (9000 bytes vs 1500 bytes). This can improve throughput by ~10-15% on 10GbE.
|
||||
|
||||
### Requirements
|
||||
|
||||
All devices in the path must support jumbo frames:
|
||||
- ✅ **TL-SX1008**: Supports up to 9KB frames
|
||||
- ✅ **Calypso**: Can be configured via DSM
|
||||
- ✅ **Atlantis**: Can be configured via DSM
|
||||
- ❌ **Archer BE19000**: Does NOT support jumbo frames
|
||||
|
||||
### Safe Configuration
|
||||
|
||||
Since Calypso and Atlantis communicate directly through the TL-SX1008 (not the router), jumbo frames can be enabled between them without affecting other devices:
|
||||
|
||||
```
|
||||
Calypso (MTU 9000) ──► TL-SX1008 ──► Atlantis (MTU 9000)
|
||||
│
|
||||
▼
|
||||
Archer (MTU 1500) ──► Other devices
|
||||
```
|
||||
|
||||
### Enabling Jumbo Frames
|
||||
|
||||
**Via DSM GUI (Persistent):**
|
||||
1. **Control Panel** → **Network** → **Network Interface**
|
||||
2. Select your 10G interface → **Edit**
|
||||
3. Set **MTU** to **9000**
|
||||
4. Click **OK**
|
||||
|
||||
**Via CLI (Temporary):**
|
||||
```bash
|
||||
sudo ip link set eth2 mtu 9000
|
||||
sudo ip link set ovs_eth2 mtu 9000
|
||||
```
|
||||
|
||||
> ⚠️ **Synology OVS Note**: On Synology with Open vSwitch, the `ovs_eth2` bridge interface may not accept MTU changes via CLI. Use DSM GUI instead.
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### High Retransmit Count
|
||||
|
||||
If you see many TCP retransmits in iperf3:
|
||||
1. Check ring buffer sizes (increase to max)
|
||||
2. Verify TCP buffers are tuned
|
||||
3. Check for packet loss: `ethtool -S eth2 | grep -i error`
|
||||
4. Verify flow control is enabled
|
||||
|
||||
### Asymmetric Speeds
|
||||
|
||||
If upload is slower than download:
|
||||
- This can be normal due to NIC/driver asymmetry
|
||||
- Check if one side has smaller buffers
|
||||
- Synology OVS adds some overhead
|
||||
|
||||
### Speed Below Expected
|
||||
|
||||
1. Verify link speed: `ethtool eth2 | grep Speed`
|
||||
2. Check for errors: `ethtool -S eth2`
|
||||
3. Test with single stream first: `iperf3 -c IP -t 10` (no `-P`)
|
||||
4. Check CPU usage during test (might be CPU-bound)
|
||||
|
||||
---
|
||||
|
||||
## 📈 Performance Summary
|
||||
|
||||
### Current Achieved Speeds
|
||||
|
||||
| Path | Speed | % of Line Rate |
|
||||
|------|-------|----------------|
|
||||
| Atlantis → Calypso | 9.27 Gbps | 93% ✅ |
|
||||
| Calypso → Atlantis | 7.35 Gbps | 74% |
|
||||
| NUC → Calypso (Tailscale) | 550 Mbps | N/A (WAN limited) |
|
||||
| NUC → Calypso (SMB) | 1.1 Gbps | N/A (caching benefit) |
|
||||
|
||||
### For Streaming Use Cases
|
||||
|
||||
These speeds are more than sufficient for:
|
||||
- **4K HDR streaming**: Requires ~80-150 Mbps ✅
|
||||
- **4K Remux playback**: Requires ~100-150 Mbps ✅
|
||||
- **Multiple concurrent 4K streams**: Easily supported ✅
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- [Network Infrastructure Guide](networking.md)
|
||||
- [10GbE Backbone Diagram](../diagrams/10gbe-backbone.md)
|
||||
- [Storage Topology](../diagrams/storage-topology.md)
|
||||
|
||||
---
|
||||
|
||||
*Last updated: January 2025*
|
||||
415
docs/infrastructure/networking.md
Normal file
415
docs/infrastructure/networking.md
Normal file
@@ -0,0 +1,415 @@
|
||||
# 🌐 Network Infrastructure Guide
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide covers the complete network infrastructure of the homelab, including the blazing-fast **25Gbps symmetric internet connection**, 10 Gigabit Ethernet backbone, Tailscale overlay network, and DNS architecture.
|
||||
|
||||
---
|
||||
|
||||
## ⚡ Internet Connection
|
||||
|
||||
### **ISP Specifications**
|
||||
| Specification | Value |
|
||||
|---------------|-------|
|
||||
| **Download Speed** | 25 Gbps |
|
||||
| **Upload Speed** | 25 Gbps |
|
||||
| **Type** | Symmetric Fiber |
|
||||
| **Latency** | <5ms to major CDNs |
|
||||
|
||||
> **Note**: This enterprise-grade connection supports the entire infrastructure with bandwidth to spare, enabling true 10GbE LAN-to-WAN performance.
|
||||
|
||||
---
|
||||
|
||||
## 🚀 10 Gigabit Ethernet Infrastructure
|
||||
|
||||
### **TP-Link TL-SX1008 - Core 10GbE Switch**
|
||||
|
||||
#### **Hardware Specifications**
|
||||
- **Model**: TP-Link TL-SX1008
|
||||
- **Type**: 8-port 10 Gigabit Ethernet unmanaged switch
|
||||
- **Ports**: 8x 10GBASE-T RJ45 ports
|
||||
- **Switching Capacity**: 160 Gbps
|
||||
- **Forwarding Rate**: 119.05 Mpps
|
||||
- **Power**: External power adapter
|
||||
- **Form Factor**: Desktop/rack-mountable
|
||||
|
||||
#### **Connected Systems**
|
||||
| Host | Interface Type | Use Case | Performance |
|
||||
|------|---------------|----------|-------------|
|
||||
| **Atlantis** | Built-in 10GbE | Media streaming, backup operations | Full 10Gbps |
|
||||
| **Calypso** | PCIe 10GbE card | Development, package caching | Full 10Gbps |
|
||||
| **Shinku-Ryuu** | PCIe 10GbE card | Gaming, creative work, large transfers | Full 10Gbps |
|
||||
| **Guava** | PCIe 10GbE card | AI/ML datasets, model training | Full 10Gbps |
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Network Topology
|
||||
|
||||
### **Physical Network Layout**
|
||||
```
|
||||
Internet (25Gbps Symmetric Fiber)
|
||||
│
|
||||
├── TP-Link Archer BE800 Router (WiFi 7)
|
||||
│ │
|
||||
│ ├── Main Network (192.168.0.0/24) ──── Trusted devices
|
||||
│ │ │
|
||||
│ │ └── Mesh Nodes (APs) ──── WiFi coverage
|
||||
│ │
|
||||
│ ├── IoT WiFi ──── Smart home devices (isolated)
|
||||
│ │
|
||||
│ └── Guest WiFi ──── Visitors (internet only)
|
||||
│
|
||||
└── TP-Link TL-SX1008 (10GbE Switch)
|
||||
├── Atlantis (192.168.0.200) - 10GbE
|
||||
├── Calypso (192.168.0.250) - 10GbE
|
||||
├── Shinku-Ryuu - 10GbE
|
||||
└── Guava - 10GbE
|
||||
```
|
||||
|
||||
### **Router Details**
|
||||
|
||||
| Specification | Value |
|
||||
|---------------|-------|
|
||||
| **Model** | TP-Link Archer BE800 |
|
||||
| **WiFi Standard** | WiFi 7 (802.11be) |
|
||||
| **WAN Port** | 10GbE |
|
||||
| **LAN Ports** | 4x 2.5GbE + 1x 10GbE |
|
||||
| **Mesh Support** | Yes (EasyMesh) |
|
||||
|
||||
### **Wireless Coverage**
|
||||
- **Primary Router**: TP-Link Archer BE800 (WiFi 7)
|
||||
- **Mesh Nodes**: Additional APs for whole-home coverage
|
||||
- **SSIDs**: Main, IoT, Guest (isolated networks)
|
||||
|
||||
### **Network Segments**
|
||||
|
||||
#### **Main Network (192.168.0.0/24)**
|
||||
- **Purpose**: Primary homelab infrastructure
|
||||
- **Speed**: 1GbE standard, 10GbE for high-performance systems
|
||||
- **Access**: Full LAN access, Tailscale routing
|
||||
- **Devices**: Servers, NAS, workstations, trusted devices
|
||||
|
||||
#### **IoT WiFi Network**
|
||||
- **Purpose**: Smart home devices, sensors
|
||||
- **Isolation**: Internet access only, no LAN access
|
||||
- **Devices**: Smart bulbs, sensors, cameras, etc.
|
||||
- **Note**: VLAN segmentation planned for future
|
||||
|
||||
#### **Guest Network**
|
||||
- **Purpose**: Visitor internet access
|
||||
- **Isolation**: Complete isolation from internal networks
|
||||
- **Features**: Bandwidth limiting, time restrictions available
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Headscale VPN Overlay
|
||||
|
||||
> **Self-Hosted Control Plane**: This homelab uses [Headscale](https://headscale.net/), a self-hosted Tailscale control server, rather than Tailscale cloud. The control server runs at `headscale.vish.gg:8443` on Calypso. All Tailscale clients are pointed to this server.
|
||||
|
||||
### **Headscale / Tailscale Network Architecture**
|
||||
```
|
||||
Headscale Mesh Network (100.x.x.x/10)
|
||||
├── Atlantis (100.83.230.112) - Primary NAS
|
||||
├── Calypso (100.103.48.78) - Secondary NAS, runs Headscale
|
||||
├── Setillo (100.125.0.20) - Remote NAS, Tucson
|
||||
├── Homelab VM (100.67.40.126) - Main monitoring/services VM
|
||||
├── PVE (100.87.12.28) - Proxmox hypervisor
|
||||
├── Guava (100.75.252.64) - TrueNAS Scale physical host
|
||||
├── Concord NUC (100.72.55.21) - Intel NUC, exit node
|
||||
├── Shinku-Ryuu (100.98.93.15) - Desktop workstation
|
||||
├── Pi-5 (100.77.151.40) - Raspberry Pi 5
|
||||
├── Pi-5-Kevin (100.123.246.75) - Raspberry Pi 5 (backup ISP)
|
||||
├── Jellyfish (100.69.121.120) - Pi 5 media/NAS
|
||||
├── GL-MT3000 (100.126.243.15) - GL.iNet router (Concord)
|
||||
├── GL-BE3600 (100.105.59.123) - GL.iNet router (Concord)
|
||||
├── Home Assistant (100.112.186.90) - HA Green via GL-MT3000
|
||||
├── Seattle VPS (100.82.197.124) - Contabo VPS exit node
|
||||
└── matrix-ubuntu (100.85.21.51) - Atlantis VM
|
||||
```
|
||||
|
||||
### **Headscale Benefits**
|
||||
- **Self-Hosted Control**: Full ownership of coordination server and private keys
|
||||
- **Zero-Config Mesh**: Automatic peer-to-peer networking
|
||||
- **MagicDNS**: Device hostnames via `tail.vish.gg` suffix
|
||||
- **Mobile Access**: Secure remote access from anywhere
|
||||
- **Cross-Platform**: Works on all devices and operating systems
|
||||
- **NAT Traversal**: Works behind firewalls and NAT (via DERP relays)
|
||||
- **Unlimited Devices**: No tier limits unlike Tailscale cloud free tier
|
||||
|
||||
---
|
||||
|
||||
## 🌐 DNS Architecture
|
||||
|
||||
### **Split-Horizon DNS with AdGuard Home**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────┐
|
||||
│ DNS RESOLUTION FLOW │
|
||||
├─────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ Query: plex.vish.gg │
|
||||
│ │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
|
||||
│ │ Device │───►│ AdGuard │───►│ Cloudflare │ │
|
||||
│ │ (Client) │ │ Home │ │ DNS │ │
|
||||
│ └─────────────┘ └──────┬──────┘ └─────────────┘ │
|
||||
│ │ │
|
||||
│ ┌──────▼──────┐ │
|
||||
│ │ Local Match? │ │
|
||||
│ └──────┬──────┘ │
|
||||
│ │ │
|
||||
│ ┌─────────────┼─────────────┐ │
|
||||
│ │ YES │ │ NO │
|
||||
│ ▼ │ ▼ │
|
||||
│ Return Local IP │ Forward to Upstream │
|
||||
│ (192.168.0.x) │ (Cloudflare) │
|
||||
│ │ │
|
||||
└─────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### **AdGuard Home Instances**
|
||||
|
||||
| Host | Location | Purpose | Tailscale IP |
|
||||
|------|----------|---------|--------------|
|
||||
| **Concord NUC** | Home | Primary DNS for home network | 100.72.55.21 |
|
||||
| **Calypso** | Home | Secondary DNS, local services | 100.103.48.78 |
|
||||
|
||||
### **DNS Features**
|
||||
- **Ad Blocking**: Network-wide ad blocking for all devices
|
||||
- **Split-Horizon**: Local services resolve to internal IPs when on Tailscale
|
||||
- **Query Logging**: DNS query analytics and monitoring
|
||||
- **Parental Controls**: Content filtering capabilities
|
||||
- **Custom Rewrites**: *.vish.gg → local IPs when internal
|
||||
|
||||
### **Split-Horizon Example**
|
||||
|
||||
| Query | From Internet | From Tailscale/LAN |
|
||||
|-------|--------------|-------------------|
|
||||
| `plex.vish.gg` | → Cloudflare → Public IP | → AdGuard → 192.168.0.80 |
|
||||
| `git.vish.gg` | → Cloudflare → Public IP | → AdGuard → 192.168.0.250 |
|
||||
| `grafana.vish.gg` | → Cloudflare → Public IP | → AdGuard → Internal IP |
|
||||
|
||||
---
|
||||
|
||||
## ⚡ Network Performance
|
||||
|
||||
### **10GbE Performance Benefits**
|
||||
|
||||
#### **Media Streaming**
|
||||
- **4K Content**: Smooth streaming without buffering
|
||||
- **8K Content**: Future-proof for ultra-high resolution
|
||||
- **Multiple Streams**: Concurrent 4K streams to multiple devices
|
||||
- **Plex Performance**: Instant transcoding and delivery
|
||||
|
||||
#### **Backup Operations**
|
||||
- **NAS-to-NAS**: Fast synchronization between Atlantis and Calypso
|
||||
- **Incremental Backups**: Rapid delta transfers
|
||||
- **Snapshot Replication**: Quick BTRFS/ZFS snapshot transfers
|
||||
- **Disaster Recovery**: Fast restoration from backups
|
||||
|
||||
#### **Development Workflows**
|
||||
- **Docker Images**: Rapid container image pulls/pushes
|
||||
- **Package Caching**: Fast APT/NPM/PyPI cache access
|
||||
- **Git Operations**: Large repository clones and pushes
|
||||
- **Build Artifacts**: Quick distribution of compiled binaries
|
||||
|
||||
#### **AI/ML Workloads**
|
||||
- **Dataset Transfers**: Multi-GB datasets in seconds
|
||||
- **Model Training**: Fast data loading during training
|
||||
- **Model Sharing**: Quick distribution of trained models
|
||||
- **Jupyter Notebooks**: Responsive remote notebook access
|
||||
|
||||
#### **Creative Work**
|
||||
- **Video Editing**: 4K/8K raw footage transfers
|
||||
- **Photo Libraries**: RAW image synchronization
|
||||
- ** 3D Rendering**: Asset and render file distribution
|
||||
- **Audio Production**: Multi-track project sharing
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Network Configuration
|
||||
|
||||
### **10GbE Interface Configuration**
|
||||
|
||||
#### **Atlantis (Built-in 10GbE)**
|
||||
```bash
|
||||
# Check interface status
|
||||
ip addr show eth1
|
||||
|
||||
# Configure static IP (if needed)
|
||||
sudo nmcli con mod "Wired connection 2" ipv4.addresses 10.0.0.112/24
|
||||
sudo nmcli con mod "Wired connection 2" ipv4.gateway 10.0.0.1
|
||||
sudo nmcli con mod "Wired connection 2" ipv4.dns 10.0.0.1
|
||||
sudo nmcli con up "Wired connection 2"
|
||||
```
|
||||
|
||||
#### **PCIe 10GbE Cards (Calypso, Shinku-Ryuu, Guava)**
|
||||
```bash
|
||||
# Install drivers (if needed)
|
||||
sudo apt update
|
||||
sudo apt install linux-headers-$(uname -r)
|
||||
|
||||
# Check PCI device
|
||||
lspci | grep -i ethernet
|
||||
|
||||
# Configure interface
|
||||
sudo nmcli con add type ethernet ifname eth1 con-name 10gbe
|
||||
sudo nmcli con mod 10gbe ipv4.addresses 10.0.0.XXX/24
|
||||
sudo nmcli con mod 10gbe ipv4.gateway 10.0.0.1
|
||||
sudo nmcli con mod 10gbe ipv4.dns 10.0.0.1
|
||||
sudo nmcli con mod 10gbe ipv4.method manual
|
||||
sudo nmcli con up 10gbe
|
||||
```
|
||||
|
||||
### **Performance Testing**
|
||||
|
||||
#### **Bandwidth Testing**
|
||||
```bash
|
||||
# Install iperf3
|
||||
sudo apt install iperf3
|
||||
|
||||
# Server mode (on target system)
|
||||
iperf3 -s
|
||||
|
||||
# Client mode (test from another system)
|
||||
iperf3 -c 10.0.0.112 -t 30 -P 4
|
||||
|
||||
# Expected results: ~9.4 Gbps (accounting for overhead)
|
||||
```
|
||||
|
||||
#### **Latency Testing**
|
||||
```bash
|
||||
# Ping test
|
||||
ping -c 100 10.0.0.112
|
||||
|
||||
# Expected results: <1ms latency on local network
|
||||
```
|
||||
|
||||
#### **Real-World Performance**
|
||||
```bash
|
||||
# Large file transfer test
|
||||
scp large_file.bin user@10.0.0.112:/tmp/
|
||||
|
||||
# rsync performance test
|
||||
rsync -avz --progress /large/dataset/ user@10.0.0.112:/storage/
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌍 Public Access & Cloudflare
|
||||
|
||||
### **Publicly Accessible Services**
|
||||
|
||||
All public services are accessed via `*.vish.gg` domain through Cloudflare:
|
||||
|
||||
```
|
||||
Internet User
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐
|
||||
│ Cloudflare │ ← DDoS protection, WAF, SSL
|
||||
│ (Proxy) │
|
||||
└────────┬────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐
|
||||
│ Router :443 │ ← Only ports 80/443 forwarded
|
||||
└────────┬────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐
|
||||
│ Nginx Proxy │ ← SSL termination, routing
|
||||
│ Manager │
|
||||
└────────┬────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────┐
|
||||
│ Internal Service│ ← Plex, Gitea, Grafana, etc.
|
||||
└─────────────────┘
|
||||
```
|
||||
|
||||
### **Cloudflare Configuration**
|
||||
|
||||
| Setting | Value |
|
||||
|---------|-------|
|
||||
| **SSL Mode** | Full (Strict) |
|
||||
| **Always HTTPS** | Enabled |
|
||||
| **Minimum TLS** | 1.2 |
|
||||
| **Proxy Status** | Proxied (orange cloud) |
|
||||
| **DDoS Protection** | Always On |
|
||||
|
||||
### **Port Forwarding**
|
||||
|
||||
| External Port | Internal Destination | Purpose |
|
||||
|---------------|---------------------|---------|
|
||||
| 80 | Nginx Proxy Manager | HTTP → HTTPS redirect |
|
||||
| 443 | Nginx Proxy Manager | HTTPS services |
|
||||
|
||||
> **Security Note**: All other ports are blocked. Internal services are accessed via Tailscale VPN.
|
||||
|
||||
### **Cloudflare Tunnels**
|
||||
Some services use Cloudflare Tunnels as an alternative to port forwarding:
|
||||
- Zero-config public access
|
||||
- No ports exposed on router
|
||||
- Additional DDoS protection
|
||||
|
||||
---
|
||||
|
||||
## 🛡️ Network Security
|
||||
|
||||
### **Firewall Configuration**
|
||||
- **Router Firewall**: TP-Link Archer BE800 built-in firewall
|
||||
- **Exposed Ports**: Only 80 and 443 for reverse proxy
|
||||
- **Default Policy**: Deny all inbound except allowed
|
||||
- **VPN Security**: Headscale/Tailscale encrypted mesh networking
|
||||
|
||||
### **Access Control**
|
||||
- **SSH Keys**: Key-based authentication for all Linux systems
|
||||
- **Port Security**: Non-standard SSH ports where applicable
|
||||
- **Service Binding**: Services bound to specific interfaces
|
||||
- **Headscale ACLs**: Network access control policies
|
||||
|
||||
---
|
||||
|
||||
## 📊 Network Monitoring
|
||||
|
||||
### **Monitoring Tools**
|
||||
- **Grafana**: Network performance dashboards
|
||||
- **Prometheus**: Metrics collection and alerting
|
||||
- **SNMP Monitoring**: Switch and router monitoring
|
||||
- **Uptime Kuma**: Service availability monitoring
|
||||
|
||||
### **Key Metrics**
|
||||
- **Bandwidth Utilization**: 10GbE link usage
|
||||
- **Latency**: Inter-host communication delays
|
||||
- **Packet Loss**: Network reliability metrics
|
||||
- **Connection Counts**: Active network connections
|
||||
|
||||
---
|
||||
|
||||
## 🔄 Network Maintenance
|
||||
|
||||
### **Regular Tasks**
|
||||
- **Firmware Updates**: Router and switch firmware
|
||||
- **Cable Management**: Organize and label cables
|
||||
- **Performance Testing**: Regular bandwidth tests
|
||||
- **Security Audits**: Network vulnerability scans
|
||||
|
||||
### **Troubleshooting**
|
||||
- **Link Status**: Check physical connections
|
||||
- **Speed Negotiation**: Verify 10GbE link speeds
|
||||
- **DNS Resolution**: Test hostname resolution
|
||||
- **Routing Tables**: Verify network routing
|
||||
|
||||
---
|
||||
|
||||
## 📋 Related Documentation
|
||||
|
||||
- **[Host Infrastructure](hosts.md)**: Detailed host specifications
|
||||
- **[Headscale Setup](../services/individual/headscale.md)**: Self-hosted Tailscale control server
|
||||
- **[Tailscale Mesh Diagram](../diagrams/tailscale-mesh.md)**: Full mesh network map
|
||||
- **[Network Topology](../diagrams/network-topology.md)**: Physical network layout
|
||||
|
||||
---
|
||||
|
||||
*This network infrastructure provides enterprise-level performance and reliability for the homelab environment, supporting everything from basic web browsing to high-performance computing workloads.*
|
||||
360
docs/infrastructure/npm-migration-jan2026.md
Normal file
360
docs/infrastructure/npm-migration-jan2026.md
Normal file
@@ -0,0 +1,360 @@
|
||||
# NPM Migration & Authentik Configuration (January 2026)
|
||||
|
||||
This document details the migration from Synology's built-in reverse proxy to Nginx Proxy Manager (NPM) with Authentik SSO protection.
|
||||
|
||||
## Migration Summary
|
||||
|
||||
**Date**: January 31, 2026
|
||||
**Status**: Complete
|
||||
**Last Updated**: January 31, 2026 (Session 2)
|
||||
**Performed by**: OpenHands AI Agent
|
||||
|
||||
### What Changed
|
||||
|
||||
1. **Router Configuration**
|
||||
- Port 443 → 192.168.0.250:8443 (NPM HTTPS)
|
||||
- Port 80 → 192.168.0.250:8880 (NPM HTTP)
|
||||
|
||||
2. **NPM Container Ports**
|
||||
- HTTP: 8880 → 80 (internal)
|
||||
- HTTPS: 8443 → 443 (internal)
|
||||
- Admin: 81 → 81 (internal)
|
||||
|
||||
3. **Cleaned up duplicate .synology.me entries** (11 deleted)
|
||||
4. **Created new .vish.gg equivalents** for services that only had .synology.me
|
||||
5. **Added Cloudflare Origin Certificates** for thevish.io and crista.love domains
|
||||
6. **Changed Cloudflare SSL mode** from "Full (strict)" to "Full" for thevish.io
|
||||
7. **Fixed meet.thevish.io (Jitsi)**:
|
||||
- Enabled Cloudflare proxy (was DNS-only)
|
||||
- Changed backend to HTTPS (port 5443 uses SSL internally)
|
||||
- Added WebSocket support for XMPP connections
|
||||
8. **Fixed joplin.thevish.io**: Works correctly - `/login` accessible, root returns 400 (expected API behavior)
|
||||
|
||||
---
|
||||
|
||||
## Access Credentials
|
||||
|
||||
### NPM (Nginx Proxy Manager)
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| URL | https://npm.vish.gg or http://192.168.0.250:81 (local) |
|
||||
| Email | user@example.com |
|
||||
| Password | REDACTED_NPM_PASSWORD |
|
||||
| API Port | 81 |
|
||||
|
||||
> Note: npm.vish.gg shows "Not Secure" because the wildcard cert doesn't cover it. Access locally at http://192.168.0.250:81 for admin tasks.
|
||||
|
||||
### Authentik SSO
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| URL | https://sso.vish.gg |
|
||||
| Admin Username | akadmin |
|
||||
| Recovery Command | `docker exec -it Authentik-SERVER ak create_recovery_key 10 akadmin` |
|
||||
| Secret Key | RpRexcYo5HAzvb8UGBhznwhq17sa2HALAYdMN51LR1ZBg5iL |
|
||||
| PostgreSQL Password | ANJXq7n70DFEgWE+gD1qKhY/cXgQDPpjAJeF+Huiac8= |
|
||||
|
||||
### Portainer
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| URL | http://vishinator.synology.me:10000 |
|
||||
| API Key | ptr_REDACTED_PORTAINER_TOKEN |
|
||||
| NPM Endpoint ID | 443397 |
|
||||
|
||||
### Cloudflare API
|
||||
|
||||
| Field | Value |
|
||||
|-------|-------|
|
||||
| Token | REDACTED_CLOUDFLARE_TOKEN |
|
||||
| vish.gg Zone ID | 4dbd15d096d71101b7c0c6362b307a66 |
|
||||
| thevish.io Zone ID | 11681f1c93ca32f56a0c41973e02b6f9 |
|
||||
| crista.love Zone ID | (not documented) |
|
||||
|
||||
---
|
||||
|
||||
## SSL Certificates
|
||||
|
||||
### Certificate Inventory
|
||||
|
||||
| ID | Domain | Type | Expires | Location |
|
||||
|----|--------|------|---------|----------|
|
||||
| 1 | `*.vish.gg`, `vish.gg` | Cloudflare Origin | 2041 | `/data/custom_ssl/npm-1/` |
|
||||
| 2 | `*.thevish.io`, `thevish.io` | Cloudflare Origin | 2041-01-27 | `/data/custom_ssl/npm-2/` |
|
||||
| 3 | `*.crista.love`, `crista.love` | Cloudflare Origin | 2041-01-21 | `/data/custom_ssl/npm-3/` |
|
||||
|
||||
### Cloudflare SSL Mode Settings
|
||||
|
||||
| Zone | SSL Mode | Notes |
|
||||
|------|----------|-------|
|
||||
| vish.gg | Full | Works with Origin CA |
|
||||
| thevish.io | Full | Changed from Full (strict) on 2026-01-31 |
|
||||
| crista.love | Full | Works with Origin CA |
|
||||
|
||||
---
|
||||
|
||||
## Proxy Host Inventory
|
||||
|
||||
### vish.gg Domains (20 total, SSL cert ID 1)
|
||||
|
||||
| Domain | Backend | Port | Authentik | Status |
|
||||
|--------|---------|------|-----------|--------|
|
||||
| actual.vish.gg | 192.168.0.250 | 8304 | ✅ Yes | ✅ Working |
|
||||
| cal.vish.gg | 192.168.0.200 | 12852 | No | ✅ Working |
|
||||
| dav.vish.gg | 192.168.0.250 | 8612 | No | ✅ Working |
|
||||
| docs.vish.gg | 192.168.0.250 | 8777 | ✅ Yes | ✅ Working |
|
||||
| gf.vish.gg | 192.168.0.210 | 3300 | ✅ Yes | ✅ Working |
|
||||
| git.vish.gg | 192.168.0.250 | 3052 | No (own auth) | ✅ Working |
|
||||
| mastodon.vish.gg | 192.168.0.154 | 3000 | No (public) | ✅ Working |
|
||||
| mx.vish.gg | 192.168.0.154 | 8082 | No | ✅ Working |
|
||||
| npm.vish.gg | 192.168.0.250 | 81 | ✅ Yes | ✅ Working |
|
||||
| ntfy.vish.gg | 192.168.0.210 | 8081 | No (API access needed) | ✅ Working |
|
||||
| ollama.vish.gg | 192.168.0.200 | 11434 | No | ✅ Working |
|
||||
| ost.vish.gg | 192.168.0.250 | 8004 | No | ✅ Working |
|
||||
| paperless.vish.gg | 192.168.0.250 | 8777 | ✅ Yes | ✅ Working |
|
||||
| pw.vish.gg | 192.168.0.200 | 4080 | No (Vaultwarden) | ✅ Working |
|
||||
| rackula.vish.gg | 192.168.0.250 | 3891 | No | ✅ Working |
|
||||
| retro.vish.gg | 192.168.0.250 | 8025 | No | ⚠️ 403 (upstream issue) |
|
||||
| rxv4access.vish.gg | 192.168.0.250 | 9751 | No | ✅ Working |
|
||||
| rxv4download.vish.gg | 192.168.0.250 | 9753 | No | ✅ Working |
|
||||
| sf.vish.gg | 192.168.0.250 | 8611 | No (Seafile) | ✅ Working |
|
||||
| sso.vish.gg | 192.168.0.250 | 9000 | No (Authentik itself) | ✅ Working |
|
||||
|
||||
### thevish.io Domains (5 total, SSL cert ID 2)
|
||||
|
||||
| Domain | Backend | Port | Status | Notes |
|
||||
|--------|---------|------|--------|-------|
|
||||
| binterest.thevish.io | 192.168.0.210 | 21544 | ✅ Working | |
|
||||
| hoarder.thevish.io | 192.168.0.210 | 3000 | ✅ Working | Returns 307 redirect |
|
||||
| joplin.thevish.io | 192.168.0.200 | 22300 | ✅ Working | /login works, / returns 400 (expected for API) |
|
||||
| matrix.thevish.io | 192.168.0.154 | 8081 | ✅ Working | |
|
||||
| meet.thevish.io | 192.168.0.200 | 5443 | ✅ Working | HTTPS backend, WebSocket config added |
|
||||
|
||||
### crista.love Domains (3 total, SSL cert ID 3)
|
||||
|
||||
| Domain | Backend | Port | Status | Notes |
|
||||
|--------|---------|------|--------|-------|
|
||||
| crista.love | 192.168.0.100 | 28888 | ✅ Working | Academic portfolio site |
|
||||
| cocalc.crista.love | 192.168.0.100 | 8080 | ❌ 502 | Backend service is down |
|
||||
| mm.crista.love | 192.168.0.154 | 8065 | ✅ Working | Mattermost |
|
||||
|
||||
---
|
||||
|
||||
## Authentik Forward Auth Configuration
|
||||
|
||||
Services protected by Authentik use this NPM Advanced Configuration:
|
||||
|
||||
```nginx
|
||||
# Authentik Forward Auth Configuration
|
||||
proxy_buffers 8 16k;
|
||||
proxy_buffer_size 32k;
|
||||
|
||||
auth_request /outpost.goauthentik.io/auth/nginx;
|
||||
error_page 401 = @goauthentik_proxy_signin;
|
||||
|
||||
auth_request_set $auth_cookie $upstream_http_set_cookie;
|
||||
add_header Set-Cookie $auth_cookie;
|
||||
|
||||
auth_request_set $authentik_username $upstream_http_x_authentik_username;
|
||||
auth_request_set $authentik_groups $upstream_http_x_authentik_groups;
|
||||
auth_request_set $authentik_email $upstream_http_x_authentik_email;
|
||||
auth_request_set $authentik_name $upstream_http_x_authentik_name;
|
||||
auth_request_set $authentik_uid $upstream_http_x_authentik_uid;
|
||||
|
||||
proxy_set_header X-authentik-username $authentik_username;
|
||||
proxy_set_header X-authentik-groups $authentik_groups;
|
||||
proxy_set_header X-authentik-email $authentik_email;
|
||||
proxy_set_header X-authentik-name $authentik_name;
|
||||
proxy_set_header X-authentik-uid $authentik_uid;
|
||||
|
||||
location /outpost.goauthentik.io {
|
||||
proxy_pass http://192.168.0.250:9000/outpost.goauthentik.io;
|
||||
proxy_set_header Host $host;
|
||||
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
|
||||
add_header Set-Cookie $auth_cookie;
|
||||
auth_request_set $auth_cookie $upstream_http_set_cookie;
|
||||
proxy_pass_request_body off;
|
||||
proxy_set_header Content-Length "";
|
||||
}
|
||||
|
||||
location @goauthentik_proxy_signin {
|
||||
internal;
|
||||
add_header Set-Cookie $auth_cookie;
|
||||
return 302 https://sso.vish.gg/outpost.goauthentik.io/start?rd=$scheme://$http_host$request_uri;
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Cloudflare DNS Configuration
|
||||
|
||||
### vish.gg Zone
|
||||
|
||||
All subdomains should be **Proxied** (orange cloud) and point to `YOUR_WAN_IP`.
|
||||
|
||||
Missing DNS records were added during migration:
|
||||
- paperless.vish.gg
|
||||
- ollama.vish.gg
|
||||
- rxv4access.vish.gg
|
||||
- rxv4download.vish.gg
|
||||
|
||||
### thevish.io Zone
|
||||
|
||||
All subdomains point to `YOUR_WAN_IP` and are proxied.
|
||||
|
||||
**Important**: SSL mode must be "Full" (not "Full strict") for Origin CA certs to work.
|
||||
|
||||
### crista.love Zone
|
||||
|
||||
Subdomains point to `YOUR_WAN_IP` and are proxied.
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### NPM Returns 500 Error
|
||||
Check if Authentik outpost is accessible:
|
||||
```bash
|
||||
curl -I http://192.168.0.250:9000/outpost.goauthentik.io/auth/nginx
|
||||
```
|
||||
|
||||
### Authentik Recovery
|
||||
```bash
|
||||
docker exec -it Authentik-SERVER ak create_recovery_key 10 akadmin
|
||||
```
|
||||
Then visit: `https://sso.vish.gg/recovery/use-token/<TOKEN>/`
|
||||
|
||||
### Check NPM Logs
|
||||
Via Portainer or:
|
||||
```bash
|
||||
docker logs nginx-proxy-manager
|
||||
```
|
||||
|
||||
### Test Domain Resolution
|
||||
```bash
|
||||
curl -sI -k https://domain.vish.gg | head -5
|
||||
```
|
||||
|
||||
### 522 Error (Connection Timed Out)
|
||||
- Check if Cloudflare can reach your origin (port 443 forwarded?)
|
||||
- Verify SSL mode is "Full" not "Full (strict)" for Origin CA certs
|
||||
- Check if backend service is running
|
||||
|
||||
### 525 Error (SSL Handshake Failed)
|
||||
- Origin expects HTTPS but backend doesn't have SSL
|
||||
- Check `forward_scheme` is set to `http` in NPM for internal services
|
||||
|
||||
### Host Shows "Offline" in NPM
|
||||
- Config file may not be generated
|
||||
- Re-save the host in NPM to regenerate config
|
||||
- Or manually create config in `/data/nginx/proxy_host/{id}.conf`
|
||||
|
||||
---
|
||||
|
||||
## TODO / Known Issues
|
||||
|
||||
1. ~~**thevish.io domains**: Need SSL certificates~~ ✅ Fixed - Origin certs added
|
||||
2. ~~**crista.love domains**: Need SSL certificates~~ ✅ Fixed - Origin certs added
|
||||
3. ~~**Change NPM password**: Currently using default~~ ✅ Changed to REDACTED_NPM_PASSWORD
|
||||
4. **retro.vish.gg**: Returns 403 - check upstream service
|
||||
5. ~~**joplin.thevish.io**: Returns 400~~ ✅ Works correctly - /login accessible
|
||||
6. ~~**meet.thevish.io**: DNS not proxied~~ ✅ Fixed - Enabled proxy, HTTPS backend, WebSocket support
|
||||
7. **cocalc.crista.love**: Backend service (192.168.0.100:8080) is down
|
||||
8. ~~**crista.love**: Verify correct backend~~ ✅ Working - Academic portfolio site
|
||||
|
||||
---
|
||||
|
||||
## Jitsi Meet (meet.thevish.io) WebSocket Configuration
|
||||
|
||||
Jitsi requires special WebSocket handling for XMPP connections. The NPM config at `/data/nginx/proxy_host/18.conf` includes:
|
||||
|
||||
```nginx
|
||||
# meet.thevish.io - Jitsi Meet with WebSocket support
|
||||
map $scheme $hsts_header {
|
||||
https "max-age=63072000; preload";
|
||||
}
|
||||
|
||||
map $http_upgrade $connection_upgrade {
|
||||
default upgrade;
|
||||
'' close;
|
||||
}
|
||||
|
||||
server {
|
||||
set $forward_scheme https; # Jitsi uses HTTPS internally
|
||||
set $server "192.168.0.200";
|
||||
set $port 5443;
|
||||
|
||||
listen 80;
|
||||
listen 443 ssl;
|
||||
server_name meet.thevish.io;
|
||||
http2 on;
|
||||
|
||||
ssl_certificate /data/custom_ssl/npm-2/fullchain.pem;
|
||||
ssl_certificate_key /data/custom_ssl/npm-2/privkey.pem;
|
||||
|
||||
# XMPP WebSocket endpoint - critical for Jitsi
|
||||
location /xmpp-websocket {
|
||||
proxy_pass $forward_scheme://$server:$port/xmpp-websocket;
|
||||
proxy_http_version 1.1;
|
||||
proxy_set_header Upgrade $http_upgrade;
|
||||
proxy_set_header Connection "upgrade";
|
||||
proxy_set_header Host $host;
|
||||
proxy_read_timeout 3600s;
|
||||
proxy_send_timeout 3600s;
|
||||
}
|
||||
|
||||
# BOSH endpoint (fallback)
|
||||
location /http-bind {
|
||||
proxy_pass $forward_scheme://$server:$port/http-bind;
|
||||
proxy_buffering off;
|
||||
tcp_nodelay on;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Manual Config Creation
|
||||
|
||||
If NPM shows a host as "offline" and won't generate configs, create manually:
|
||||
|
||||
```bash
|
||||
# Inside NPM container
|
||||
cat > /data/nginx/proxy_host/{ID}.conf << 'EOF'
|
||||
# {domain}
|
||||
map $scheme $hsts_header {
|
||||
https "max-age=63072000; preload";
|
||||
}
|
||||
server {
|
||||
set $forward_scheme http;
|
||||
set $server "{backend_ip}";
|
||||
set $port {backend_port};
|
||||
listen 80;
|
||||
listen 443 ssl;
|
||||
server_name {domain};
|
||||
http2 on;
|
||||
ssl_certificate /data/custom_ssl/npm-{cert_id}/fullchain.pem;
|
||||
ssl_certificate_key /data/custom_ssl/npm-{cert_id}/privkey.pem;
|
||||
include conf.d/include/block-exploits.conf;
|
||||
include conf.d/include/force-ssl.conf;
|
||||
access_log /data/logs/proxy-host-{ID}_access.log proxy;
|
||||
error_log /data/logs/proxy-host-{ID}_error.log warn;
|
||||
location / {
|
||||
include conf.d/include/proxy.conf;
|
||||
}
|
||||
include /data/nginx/custom/server_proxy[.]conf;
|
||||
}
|
||||
EOF
|
||||
|
||||
# Then reload nginx
|
||||
nginx -t && nginx -s reload
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Authentik SSO Setup](./authentik-sso.md)
|
||||
- [Cloudflare DNS](./cloudflare-dns.md)
|
||||
- [Service Documentation](../services/README.md)
|
||||
275
docs/infrastructure/npm-migration-to-matrix-ubuntu.md
Normal file
275
docs/infrastructure/npm-migration-to-matrix-ubuntu.md
Normal file
@@ -0,0 +1,275 @@
|
||||
# NPM Migration: Calypso → matrix-ubuntu
|
||||
|
||||
**Status:** COMPLETE
|
||||
**Completed:** 2026-03-20
|
||||
**Risk:** Medium (all proxied services briefly down during cutover)
|
||||
|
||||
## Overview
|
||||
|
||||
Migrate Nginx Proxy Manager from Calypso (Synology DS723+) to matrix-ubuntu VM (192.168.0.154) to enable split-horizon DNS. Synology's built-in nginx occupies ports 80/443 and can't be easily moved, so NPM gets a new home where it can bind 80/443 directly.
|
||||
|
||||
## Current State
|
||||
|
||||
```
|
||||
Internet → Router:443 → Calypso:8443 (NPM) → backends
|
||||
Internet → Router:80 → Calypso:8880 (NPM) → backends
|
||||
```
|
||||
|
||||
| Component | Location | Ports |
|
||||
|-----------|----------|-------|
|
||||
| NPM | Calypso (192.168.0.250) | 8880/8443/81 |
|
||||
| Host nginx | matrix-ubuntu (192.168.0.154) | 443 (mastodon, matrix, mattermost) |
|
||||
| Synology nginx | Calypso (192.168.0.250) | 80/443 (DSM redirect, can't remove) |
|
||||
|
||||
## Target State
|
||||
|
||||
```
|
||||
Internet → Router:443 → matrix-ubuntu:443 (NPM) → backends
|
||||
Internet → Router:80 → matrix-ubuntu:80 (NPM) → backends
|
||||
LAN → AdGuard → matrix-ubuntu:443 (NPM) → backends (split-horizon)
|
||||
```
|
||||
|
||||
| Component | Location | Ports |
|
||||
|-----------|----------|-------|
|
||||
| NPM | matrix-ubuntu (192.168.0.154) | **80/443/81** |
|
||||
| Host nginx | **removed** (NPM handles all routing) | — |
|
||||
| Synology nginx | Calypso (unchanged) | 80/443 (irrelevant, not used) |
|
||||
|
||||
## Pre-Migration Checklist
|
||||
|
||||
- [x] Back up Calypso NPM data (`/home/homelab/backups/npm-migration-20260320/npm-backup-20260320.tar.gz`)
|
||||
- [x] Back up matrix-ubuntu nginx config (`/home/homelab/backups/npm-migration-20260320/nginx-backup-20260320.tar.gz`)
|
||||
- [x] Verify matrix-ubuntu has sufficient resources (16GB RAM, 1TB disk as of 2026-03-27)
|
||||
- [x] Verify port 80 is free on matrix-ubuntu
|
||||
- [x] Port 443 freed — host nginx stopped and disabled during migration
|
||||
|
||||
## Services Currently on matrix-ubuntu's Host Nginx
|
||||
|
||||
These 3 services use host nginx on port 443 with SNI-based routing:
|
||||
|
||||
| Domain | Backend | nginx Config |
|
||||
|--------|---------|-------------|
|
||||
| mastodon.vish.gg | localhost:3000 (Mastodon web) | `/etc/nginx/sites-enabled/mastodon` |
|
||||
| mx.vish.gg | localhost:8008 (Synapse) on 443, localhost:8018 on 8082 | `/etc/nginx/sites-enabled/matrix` |
|
||||
| mm.crista.love | localhost:8065 (Mattermost) | `/etc/nginx/sites-enabled/mattermost` |
|
||||
|
||||
**These must be re-created as NPM proxy hosts** before removing host nginx.
|
||||
|
||||
Additional matrix-ubuntu nginx services on non-443 ports (can coexist or migrate):
|
||||
|
||||
| Domain | Port | Backend |
|
||||
|--------|------|---------|
|
||||
| matrix.thevish.io | 8081 | localhost:8008 |
|
||||
| mx.vish.gg (federation) | 8082 | localhost:8018 |
|
||||
| mx.vish.gg (client) | 8080 | localhost:8008 |
|
||||
|
||||
## Migration Steps
|
||||
|
||||
### Phase 1: Install NPM on matrix-ubuntu
|
||||
|
||||
```bash
|
||||
# Create NPM data directory
|
||||
ssh matrix-ubuntu "sudo mkdir -p /opt/npm/{data,letsencrypt}"
|
||||
|
||||
# Deploy NPM via docker compose (initially on temp ports to avoid conflict)
|
||||
# Use ports 8880/8443/81 while host nginx still runs on 443
|
||||
```
|
||||
|
||||
Compose file to create at `hosts/vms/matrix-ubuntu/nginx-proxy-manager.yaml`:
|
||||
```yaml
|
||||
services:
|
||||
nginx-proxy-manager:
|
||||
image: jc21/nginx-proxy-manager:latest
|
||||
container_name: nginx-proxy-manager
|
||||
ports:
|
||||
- "80:80" # HTTP
|
||||
- "443:443" # HTTPS
|
||||
- "81:81" # Admin UI
|
||||
environment:
|
||||
TZ: America/Los_Angeles
|
||||
volumes:
|
||||
- /opt/npm/data:/data
|
||||
- /opt/npm/letsencrypt:/etc/letsencrypt
|
||||
restart: unless-stopped
|
||||
```
|
||||
|
||||
### Phase 2: Migrate NPM Data
|
||||
|
||||
```bash
|
||||
# Copy NPM data from Calypso to matrix-ubuntu
|
||||
scp /home/homelab/backups/npm-migration-20260320/npm-backup-20260320.tar.gz matrix-ubuntu:/tmp/
|
||||
|
||||
# Extract to NPM directory
|
||||
ssh matrix-ubuntu "sudo tar xzf /tmp/npm-backup-20260320.tar.gz -C /opt/npm/data/"
|
||||
```
|
||||
|
||||
This brings over all 36 proxy hosts, SSL certs, access lists, and configuration.
|
||||
|
||||
### Phase 3: Update Proxy Host Backends
|
||||
|
||||
Several proxy hosts currently point to `192.168.0.250` (Calypso LAN IP) for services still on Calypso. These stay the same — NPM on matrix-ubuntu will proxy to Calypso's IP just like before.
|
||||
|
||||
Proxy hosts that currently point to `100.67.40.126` (homelab-vm Tailscale) should be updated to LAN IPs for better performance:
|
||||
|
||||
| Domain | Current Backend | New Backend |
|
||||
|--------|----------------|-------------|
|
||||
| gf.vish.gg | 100.67.40.126:3300 | 192.168.0.210:3300 |
|
||||
| nb.vish.gg | 100.67.40.126:8443 | 192.168.0.210:8443 |
|
||||
| ntfy.vish.gg | 100.67.40.126:8081 | 192.168.0.210:8081 |
|
||||
| scrutiny.vish.gg | 100.67.40.126:8090 | 192.168.0.210:8090 |
|
||||
| hoarder.thevish.io | 100.67.40.126:3482 | 192.168.0.210:3482 |
|
||||
| binterest.thevish.io | 100.67.40.126:21544 | 192.168.0.210:21544 |
|
||||
|
||||
Add new proxy hosts for services currently handled by host nginx:
|
||||
|
||||
| Domain | Backend | SSL |
|
||||
|--------|---------|-----|
|
||||
| mastodon.vish.gg | http://127.0.0.1:3000 | *.vish.gg cert |
|
||||
| mx.vish.gg | http://127.0.0.1:8008 | *.vish.gg cert |
|
||||
| mm.crista.love | http://127.0.0.1:8065 | *.crista.love cert |
|
||||
|
||||
### Phase 4: Cutover (Downtime: ~2 minutes)
|
||||
|
||||
This is the sequence that requires your router change:
|
||||
|
||||
```
|
||||
1. Stop host nginx on matrix-ubuntu
|
||||
ssh matrix-ubuntu "sudo systemctl stop nginx && sudo systemctl disable nginx"
|
||||
|
||||
2. Start NPM on matrix-ubuntu (binds 80/443)
|
||||
cd hosts/vms/matrix-ubuntu && docker compose -f nginx-proxy-manager.yaml up -d
|
||||
|
||||
3. Test locally:
|
||||
curl -sk -H "Host: nb.vish.gg" https://192.168.0.154/ -w "%{http_code}\n"
|
||||
|
||||
4. ** YOU: Change router port forwards **
|
||||
Old: WAN:443 → 192.168.0.250:8443
|
||||
New: WAN:443 → 192.168.0.154:443
|
||||
|
||||
Old: WAN:80 → 192.168.0.250:8880
|
||||
New: WAN:80 → 192.168.0.154:80
|
||||
|
||||
5. Test externally:
|
||||
curl -s https://nb.vish.gg/ -o /dev/null -w "%{http_code}\n"
|
||||
|
||||
6. Stop old NPM on Calypso (after confirming everything works)
|
||||
```
|
||||
|
||||
### Phase 5: Split-Horizon DNS
|
||||
|
||||
Once NPM is on matrix-ubuntu with ports 80/443:
|
||||
|
||||
1. Add AdGuard DNS rewrites (Calypso AdGuard at http://192.168.0.250:9080):
|
||||
```
|
||||
*.vish.gg → 192.168.0.154
|
||||
*.thevish.io → 192.168.0.154
|
||||
*.crista.love → 192.168.0.154
|
||||
```
|
||||
|
||||
2. Set router DHCP DNS to 192.168.0.250 (AdGuard)
|
||||
|
||||
### Phase 6: Cleanup
|
||||
|
||||
```bash
|
||||
# Stop old NPM on Calypso
|
||||
ssh calypso "cd /volume1/docker/nginx-proxy-manager && sudo docker compose down"
|
||||
|
||||
# Update DDNS — no changes needed (DDNS updates WAN IP, not internal routing)
|
||||
|
||||
# Update documentation
|
||||
# - docs/infrastructure/split-horizon-dns.md
|
||||
# - docs/infrastructure/npm-migration-jan2026.md
|
||||
# - Authentik SSO docs (outpost URL may reference calypso)
|
||||
```
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If anything goes wrong at any phase:
|
||||
|
||||
### Quick Rollback (< 1 minute)
|
||||
|
||||
```bash
|
||||
# 1. Change router forwards back:
|
||||
# WAN:443 → 192.168.0.250:8443
|
||||
# WAN:80 → 192.168.0.250:8880
|
||||
|
||||
# 2. Calypso NPM is still running — traffic flows immediately
|
||||
|
||||
# 3. Restore host nginx on matrix-ubuntu (if stopped):
|
||||
ssh matrix-ubuntu "sudo systemctl start nginx"
|
||||
|
||||
# 4. Stop new NPM on matrix-ubuntu:
|
||||
ssh matrix-ubuntu "docker stop nginx-proxy-manager"
|
||||
```
|
||||
|
||||
### Full Rollback
|
||||
|
||||
```bash
|
||||
# If NPM data was corrupted during migration:
|
||||
ssh matrix-ubuntu "
|
||||
docker stop nginx-proxy-manager
|
||||
sudo rm -rf /opt/npm/data/*
|
||||
sudo systemctl start nginx
|
||||
"
|
||||
|
||||
# Router forwards back to Calypso
|
||||
# Everything reverts to pre-migration state
|
||||
# Backups at: /home/homelab/backups/npm-migration-20260320/
|
||||
```
|
||||
|
||||
### Key Rollback Points
|
||||
|
||||
| Phase | Rollback Action | Downtime |
|
||||
|-------|----------------|----------|
|
||||
| Phase 1-2 (install/copy) | Just stop new NPM, old still running | None |
|
||||
| Phase 3 (update backends) | Revert in NPM admin UI | None |
|
||||
| Phase 4 (cutover) | Change router forwards back to Calypso | ~30 seconds |
|
||||
| Phase 5 (split-horizon) | Remove AdGuard DNS rewrites | ~30 seconds |
|
||||
| Phase 6 (cleanup) | Restart old Calypso NPM | ~10 seconds |
|
||||
|
||||
**The old NPM on Calypso should NOT be stopped until you've confirmed everything works for at least 24 hours.** Keep it as a warm standby.
|
||||
|
||||
## Risks
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|-----------|
|
||||
| Matrix federation breaks | mx.vish.gg must be re-created in NPM with correct `:8448` federation port handling |
|
||||
| Mastodon WebSocket breaks | NPM proxy host must enable WebSocket support |
|
||||
| SSL cert not trusted | Copy Cloudflare origin certs from Calypso NPM data or re-issue Let's Encrypt |
|
||||
| Authentik outpost can't reach NPM | Update outpost external_host if it references calypso IP |
|
||||
| Matrix-ubuntu VM goes down | Router forward change back to Calypso takes 30 seconds |
|
||||
| Memory pressure | NPM uses ~100MB, matrix-ubuntu has 14GB available (resized to 16GB RAM on 2026-03-27) |
|
||||
|
||||
## Affected Documentation
|
||||
|
||||
After migration, update:
|
||||
- `docs/infrastructure/split-horizon-dns.md` — NPM IP changes
|
||||
- `docs/infrastructure/npm-migration-jan2026.md` — historical reference
|
||||
- `docs/infrastructure/authentik-sso.md` — outpost URLs
|
||||
- `docs/diagrams/service-architecture.md` — NPM location
|
||||
- `docs/diagrams/network-topology.md` — traffic flow
|
||||
- `hosts/synology/calypso/nginx-proxy-manager.yaml` — mark as decommissioned
|
||||
- `hosts/vms/matrix-ubuntu/nginx-proxy-manager.yaml` — new compose file
|
||||
|
||||
## Backups
|
||||
|
||||
| What | Location | Size |
|
||||
|------|----------|------|
|
||||
| Calypso NPM full data | `/home/homelab/backups/npm-migration-20260320/npm-backup-20260320.tar.gz` | 200MB |
|
||||
| matrix-ubuntu nginx config | `/home/homelab/backups/npm-migration-20260320/nginx-backup-20260320.tar.gz` | 7.5KB |
|
||||
|
||||
## Completion Notes (2026-03-20)
|
||||
|
||||
Migration completed successfully. All phases executed, follow-up items resolved:
|
||||
|
||||
| Item | Status |
|
||||
|------|--------|
|
||||
| NPM on matrix-ubuntu with ports 80/443/81 | Done |
|
||||
| Router forwards updated to 192.168.0.154 | Done |
|
||||
| Host nginx disabled on matrix-ubuntu | Done |
|
||||
| mastodon.vish.gg, mx.vish.gg, mm.crista.love re-created as NPM proxy hosts | Done |
|
||||
| Let's Encrypt wildcard certs issued (replaced CF Origin certs) | Done |
|
||||
| Split-horizon DNS via dual AdGuard (Calypso + Atlantis) | Done |
|
||||
| Headscale control plane unaffected (stays on Calypso) | Confirmed |
|
||||
| DERP relay routing verified | Confirmed |
|
||||
| Old NPM on Calypso stopped | Done |
|
||||
271
docs/infrastructure/offline-and-remote-access.md
Normal file
271
docs/infrastructure/offline-and-remote-access.md
Normal file
@@ -0,0 +1,271 @@
|
||||
# Offline & Remote Access Guide
|
||||
|
||||
Last updated: 2026-03-20
|
||||
|
||||
## How DNS Resolution Works
|
||||
|
||||
The homelab uses **split-horizon DNS** so services are reachable from anywhere — LAN, Tailscale VPN, or the open internet — using the same `*.vish.gg` domain names.
|
||||
|
||||
### Three Access Paths
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────────────────────┐
|
||||
│ DNS Query: nb.vish.gg │
|
||||
├──────────────┬──────────────────┬────────────────────────────────────┤
|
||||
│ LAN Client │ Tailscale Client│ Internet Client │
|
||||
│ (at home) │ (travel laptop) │ (phone on cellular) │
|
||||
├──────────────┼──────────────────┼────────────────────────────────────┤
|
||||
│ DNS: AdGuard│ DNS: Headscale │ DNS: Cloudflare │
|
||||
│ (192.168.0 │ MagicDNS → │ (1.1.1.1) │
|
||||
│ .250) │ AdGuard │ │
|
||||
├──────────────┼──────────────────┼────────────────────────────────────┤
|
||||
│ Resolves to:│ Resolves to: │ Resolves to: │
|
||||
│ 100.85.21.51│ 100.85.21.51 │ 104.21.73.214 (Cloudflare) │
|
||||
│ (NPM via TS)│ (NPM via TS) │ │
|
||||
├──────────────┼──────────────────┼────────────────────────────────────┤
|
||||
│ Path: │ Path: │ Path: │
|
||||
│ Client → │ Client → │ Client → Cloudflare → │
|
||||
│ NPM (direct)│ Tailscale → │ Router → NPM → │
|
||||
│ → backend │ NPM → backend │ backend │
|
||||
├──────────────┼──────────────────┼────────────────────────────────────┤
|
||||
│ Latency: │ Latency: │ Latency: │
|
||||
│ ~1ms │ ~5-50ms │ ~50-100ms │
|
||||
│ (LAN) │ (Tailscale) │ (Cloudflare roundtrip) │
|
||||
├──────────────┼──────────────────┼────────────────────────────────────┤
|
||||
│ Internet │ Internet │ Internet │
|
||||
│ required? │ required? │ required? │
|
||||
│ NO │ NO (peer-to-peer│ YES │
|
||||
│ │ if both on TS) │ │
|
||||
└──────────────┴──────────────────┴────────────────────────────────────┘
|
||||
```
|
||||
|
||||
### Key: Everything Resolves to 100.85.21.51
|
||||
|
||||
All `*.vish.gg`, `*.thevish.io`, and `*.crista.love` domains resolve to `100.85.21.51` (matrix-ubuntu's Tailscale IP) when queried through AdGuard. This is NPM's address on the Tailscale network, reachable from:
|
||||
|
||||
- **LAN clients** — via the router's DHCP DNS (AdGuard at 192.168.0.250)
|
||||
- **Remote Tailscale clients** — via Headscale MagicDNS which forwards to AdGuard
|
||||
- **Both paths hit NPM on its Tailscale IP**, which works from anywhere on the tailnet
|
||||
|
||||
## When Internet Goes Down
|
||||
|
||||
If your WAN link drops:
|
||||
|
||||
| What works | How |
|
||||
|------------|-----|
|
||||
| All `*.vish.gg` services | AdGuard returns Tailscale IP, NPM proxies locally |
|
||||
| MagicDNS names (`atlantis.tail.vish.gg`) | Headscale resolves directly |
|
||||
| Direct Tailscale IPs (100.x.x.x) | Always work between peers |
|
||||
| Olares/K8s (k9s, kubectl) | LAN access at 192.168.0.145 |
|
||||
|
||||
| What breaks | Why |
|
||||
|-------------|-----|
|
||||
| External access (from internet) | Cloudflare can't reach you |
|
||||
| Cloudflare-only domains without split-horizon rewrite | DNS returns unreachable CF proxy IP |
|
||||
| Renovate, DDNS updates | Need internet to reach APIs |
|
||||
| DERP relays for remote peers | Remote Tailscale clients may lose connectivity |
|
||||
|
||||
## Access from Travel Laptop
|
||||
|
||||
Your travel laptop (MSI Prestige) connects via Headscale VPN:
|
||||
|
||||
1. **Join the tailnet**: `tailscale up --login-server=https://headscale.vish.gg`
|
||||
2. **DNS is automatic**: Headscale pushes AdGuard as the DNS server via MagicDNS
|
||||
3. **All domains work**: `nb.vish.gg`, `git.vish.gg`, etc. resolve to NPM's Tailscale IP
|
||||
4. **No VPN split tunneling needed**: Only homelab traffic routes through Tailscale
|
||||
|
||||
```bash
|
||||
# From the travel laptop:
|
||||
curl https://nb.vish.gg/ # → 100.85.21.51 (Tailscale) → NPM → backend
|
||||
curl https://gf.vish.gg/ # → 100.85.21.51 (Tailscale) → NPM → Grafana
|
||||
ssh homelab.tail.vish.gg # → MagicDNS → direct Tailscale peer
|
||||
```
|
||||
|
||||
### If Headscale Is Down
|
||||
|
||||
If the Headscale control server (calypso) is unreachable, already-connected peers maintain their connections. New peers can't join. Use direct Tailscale IPs as fallback:
|
||||
|
||||
| Service | Direct URL |
|
||||
|---------|-----------|
|
||||
| Grafana | `http://100.67.40.126:3300` |
|
||||
| NetBox | `http://100.67.40.126:8443` |
|
||||
| Portainer | `https://100.83.230.112:9443` |
|
||||
| Gitea | `http://100.103.48.78:3052` |
|
||||
|
||||
## MagicDNS (.tail.vish.gg)
|
||||
|
||||
Headscale MagicDNS provides `<hostname>.tail.vish.gg` for all peers:
|
||||
|
||||
| Hostname | Tailscale IP | Use |
|
||||
|----------|-------------|-----|
|
||||
| atlantis.tail.vish.gg | 100.83.230.112 | NAS, media |
|
||||
| calypso.tail.vish.gg | 100.103.48.78 | NAS, Gitea, auth |
|
||||
| homelab.tail.vish.gg | 100.67.40.126 | Monitoring, tools |
|
||||
| matrix-ubuntu.tail.vish.gg | 100.85.21.51 | NPM, Matrix, Mastodon |
|
||||
| pve.tail.vish.gg | 100.87.12.28 | Proxmox |
|
||||
| pi-5.tail.vish.gg | 100.77.151.40 | Uptime Kuma |
|
||||
| vish-concord-nuc.tail.vish.gg | 100.72.55.21 | Home Assistant, edge |
|
||||
| setillo.tail.vish.gg | 100.125.0.20 | Remote NAS |
|
||||
| seattle.tail.vish.gg | 100.82.197.124 | Cloud VPS |
|
||||
| truenas-scale.tail.vish.gg | 100.75.252.64 | TrueNAS |
|
||||
|
||||
`.tail.vish.gg` names are resolved by AdGuard rewrites (not MagicDNS) so they work on **all LAN devices**, not just Tailscale clients. Both AdGuard instances (Calypso and Atlantis) have identical entries.
|
||||
|
||||
### .vish.local Names
|
||||
|
||||
AdGuard also resolves `.vish.local` shortnames to Tailscale IPs:
|
||||
|
||||
| Hostname | Tailscale IP |
|
||||
|----------|-------------|
|
||||
| atlantis.vish.local | 100.83.230.112 |
|
||||
| calypso.vish.local | 100.103.48.78 |
|
||||
| homelab.vish.local | 100.67.40.126 |
|
||||
| concordnuc.vish.local | 100.72.55.21 |
|
||||
| pi5.vish.local | 100.77.151.40 |
|
||||
| px.vish.local | 100.87.12.28 |
|
||||
|
||||
## DNS Infrastructure
|
||||
|
||||
### Two Redundant AdGuard Instances
|
||||
|
||||
Both instances have **identical configuration** — same rewrites, filters, upstream DNS, and user rules.
|
||||
|
||||
| Role | Host | IP | Web UI |
|
||||
|------|------|-----|--------|
|
||||
| **Primary DNS** | Calypso | `192.168.0.250` | `http://192.168.0.250:9080` |
|
||||
| **Backup DNS** | Atlantis | `192.168.0.200` | `http://192.168.0.200:9080` |
|
||||
|
||||
Router DHCP hands out both as DNS servers. If Calypso reboots, Atlantis takes over seamlessly.
|
||||
|
||||
Login for both: username `vish`, same password.
|
||||
|
||||
### Upstream DNS
|
||||
|
||||
Both AdGuard instances use:
|
||||
- `https://dns.adguard-dns.com/dns-query` (AdGuard DoH)
|
||||
- `https://dns.cloudflare.com/dns-query` (Cloudflare DoH)
|
||||
- `[/tail.vish.gg/]100.100.100.100` (Headscale MagicDNS for tail.vish.gg)
|
||||
|
||||
### AdGuard DNS Rewrites (Split-Horizon)
|
||||
|
||||
All rewrites are identical on both Calypso and Atlantis.
|
||||
|
||||
**Wildcard rewrites (all services through NPM):**
|
||||
|
||||
| Domain Pattern | Resolves To | Purpose |
|
||||
|---------------|-------------|---------|
|
||||
| `*.vish.gg` | `100.85.21.51` | NPM via Tailscale |
|
||||
| `*.thevish.io` | `100.85.21.51` | NPM via Tailscale |
|
||||
| `*.crista.love` | `100.85.21.51` | NPM via Tailscale |
|
||||
|
||||
**Specific overrides (bypass NPM wildcard):**
|
||||
|
||||
| Domain | Resolves To | Purpose |
|
||||
|--------|-------------|---------|
|
||||
| `derp.vish.gg` | `192.168.0.250` | DERP relay — direct, no NPM |
|
||||
| `derp-atl.vish.gg` | `192.168.0.200` | DERP relay — direct, no NPM |
|
||||
| `derp-sea.vish.gg` | `100.82.197.124` | DERP relay on Seattle VPS |
|
||||
| `turn.thevish.io` | `192.168.0.200` | TURN/STUN — needs direct UDP |
|
||||
|
||||
**Tailscale host rewrites (override *.vish.gg wildcard):**
|
||||
|
||||
| Domain | Resolves To |
|
||||
|--------|-------------|
|
||||
| `atlantis.tail.vish.gg` | `100.83.230.112` |
|
||||
| `calypso.tail.vish.gg` | `100.103.48.78` |
|
||||
| `homelab.tail.vish.gg` | `100.67.40.126` |
|
||||
| `matrix-ubuntu.tail.vish.gg` | `100.85.21.51` |
|
||||
| `pve.tail.vish.gg` | `100.87.12.28` |
|
||||
| `pi-5.tail.vish.gg` | `100.77.151.40` |
|
||||
| `vish-concord-nuc.tail.vish.gg` | `100.72.55.21` |
|
||||
| `setillo.tail.vish.gg` | `100.125.0.20` |
|
||||
| `seattle.tail.vish.gg` | `100.82.197.124` |
|
||||
| `truenas-scale.tail.vish.gg` | `100.75.252.64` |
|
||||
| `jellyfish.tail.vish.gg` | `100.69.121.120` |
|
||||
| `shinku-ryuu.tail.vish.gg` | `100.98.93.15` |
|
||||
|
||||
### Keeping Both Instances in Sync
|
||||
|
||||
When adding new DNS rewrites, update **both** AdGuard configs:
|
||||
- Calypso: `/volume1/docker/adguard/config/AdGuardHome.yaml`
|
||||
- Atlantis: `/volume1/docker/adguard/config/AdGuardHome.yaml`
|
||||
|
||||
Then restart both:
|
||||
```bash
|
||||
ssh calypso "sudo docker restart AdGuard"
|
||||
ssh atlantis "sudo /var/packages/REDACTED_APP_PASSWORD/target/usr/bin/docker restart AdGuard"
|
||||
```
|
||||
|
||||
### Ad-Blocking Filters
|
||||
|
||||
Both instances use the same 5 filter lists:
|
||||
1. AdGuard DNS filter
|
||||
2. AdAway Default Blocklist
|
||||
3. AdGuard DNS Popup Hosts filter
|
||||
4. Dandelion Sprout's Anti Push Notifications
|
||||
5. AWAvenue Ads Rule
|
||||
|
||||
Plus 20 custom user rules blocking specific ad domains.
|
||||
|
||||
## SSL Certificates
|
||||
|
||||
All services use **Let's Encrypt wildcard certificates** (issued via DNS challenge with Cloudflare API):
|
||||
|
||||
| Certificate | Domains | Issuer |
|
||||
|------------|---------|--------|
|
||||
| Cert 8 | `*.vish.gg`, `vish.gg` | ZeroSSL (via acme.sh) |
|
||||
| Cert 9 | `*.thevish.io`, `thevish.io` | ZeroSSL (via acme.sh) |
|
||||
| Cert 10 | `*.crista.love`, `crista.love` | ZeroSSL (via acme.sh) |
|
||||
|
||||
These certs are **publicly trusted** — no certificate warnings on any access path (LAN, Tailscale, or internet).
|
||||
|
||||
### Certificate Renewal
|
||||
|
||||
acme.sh is installed on matrix-ubuntu (`/home/test/.acme.sh/`) with auto-renewal via cron. To manually renew:
|
||||
|
||||
```bash
|
||||
ssh matrix-ubuntu
|
||||
export CF_Token="REDACTED_TOKEN" # pragma: allowlist secret
|
||||
~/.acme.sh/acme.sh --renew -d '*.vish.gg' -d 'vish.gg' --force
|
||||
~/.acme.sh/acme.sh --renew -d '*.thevish.io' -d 'thevish.io' --force
|
||||
~/.acme.sh/acme.sh --renew -d '*.crista.love' -d 'crista.love' --force
|
||||
|
||||
# Then re-upload to NPM (certs need to be uploaded via NPM API or UI)
|
||||
```
|
||||
|
||||
## Quick Reference
|
||||
|
||||
### I'm at home on WiFi
|
||||
Just use `https://nb.vish.gg` — AdGuard resolves to NPM's Tailscale IP, works instantly.
|
||||
|
||||
### I'm traveling with the laptop
|
||||
Connect to Headscale tailnet → same URLs work: `https://nb.vish.gg`
|
||||
|
||||
### I'm on my phone (no VPN)
|
||||
Use the public URLs: `https://nb.vish.gg` → goes through Cloudflare as normal.
|
||||
|
||||
### Internet is down at home
|
||||
All services still work from LAN via AdGuard → Tailscale IP → NPM. No Cloudflare dependency.
|
||||
|
||||
### I need to access a service directly (no NPM)
|
||||
Three options, all equivalent:
|
||||
```
|
||||
http://homelab.tail.vish.gg:3300 # .tail.vish.gg name
|
||||
http://homelab.vish.local:3300 # .vish.local shortname
|
||||
http://100.67.40.126:3300 # Tailscale IP directly
|
||||
```
|
||||
|
||||
### Everything is down — emergency access
|
||||
SSH via Tailscale: `ssh homelab` (uses ~/.ssh/config with Tailscale IPs)
|
||||
|
||||
### I need to manage DNS
|
||||
- Calypso AdGuard: `http://192.168.0.250:9080` (primary)
|
||||
- Atlantis AdGuard: `http://192.168.0.200:9080` (backup)
|
||||
- Login: `vish` / same password on both
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Split-Horizon DNS Implementation](split-horizon-dns.md)
|
||||
- [NPM Migration Plan](npm-migration-to-matrix-ubuntu.md)
|
||||
- [Authentik SSO](authentik-sso.md)
|
||||
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md)
|
||||
345
docs/infrastructure/openclaw-installation-guide.md
Normal file
345
docs/infrastructure/openclaw-installation-guide.md
Normal file
@@ -0,0 +1,345 @@
|
||||
# OpenClaw AI Assistant Installation Guide
|
||||
|
||||
## Overview
|
||||
|
||||
OpenClaw is a powerful AI assistant tool that provides a WebSocket gateway for AI interactions with support for multiple channels (Discord, Slack, etc.) and advanced features like browser control, voice commands, and device pairing.
|
||||
|
||||
**Installation Date:** February 16, 2026
|
||||
**OpenClaw Version:** 2026.2.15 (dc9808a)
|
||||
**Host:** seattle (100.82.197.124)
|
||||
**Installation Location:** `/root/openclaw`
|
||||
|
||||
## 🚀 Quick Access
|
||||
|
||||
- **Tailscale HTTPS URL:** https://seattle.tail.vish.gg/
|
||||
- **Local Access:** http://127.0.0.1:18789/
|
||||
- **WebSocket:** wss://seattle.tail.vish.gg (via Tailscale)
|
||||
|
||||
## 📋 Prerequisites
|
||||
|
||||
### System Requirements
|
||||
- **Node.js:** v22+ (installed v22.22.0)
|
||||
- **Package Manager:** pnpm (installed globally)
|
||||
- **Operating System:** Linux (Ubuntu/Debian)
|
||||
- **Network:** Tailscale for secure remote access
|
||||
|
||||
### Dependencies Installed
|
||||
- Node.js upgraded from v20.20.0 to v22.22.0
|
||||
- pnpm package manager
|
||||
- 1003+ npm packages for OpenClaw functionality
|
||||
|
||||
## 🔧 Installation Steps
|
||||
|
||||
### 1. System Preparation
|
||||
```bash
|
||||
# Update system
|
||||
sudo apt update && sudo apt upgrade -y
|
||||
|
||||
# Install Node.js v22
|
||||
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
|
||||
sudo apt-get install -y nodejs
|
||||
|
||||
# Install pnpm globally
|
||||
npm install -g pnpm
|
||||
|
||||
# Verify versions
|
||||
node --version # Should be v22.22.0+
|
||||
pnpm --version
|
||||
```
|
||||
|
||||
### 2. Clone and Build OpenClaw
|
||||
```bash
|
||||
# Clone the repository
|
||||
cd /root
|
||||
git clone https://github.com/openclaw/openclaw.git
|
||||
cd openclaw
|
||||
|
||||
# Install dependencies
|
||||
pnpm install
|
||||
|
||||
# Build the project
|
||||
pnpm build
|
||||
```
|
||||
|
||||
### 3. Initial Setup
|
||||
```bash
|
||||
# Run setup command to create configuration
|
||||
pnpm openclaw setup
|
||||
|
||||
# This creates configuration files in ~/.openclaw/
|
||||
```
|
||||
|
||||
### 4. Network Configuration
|
||||
|
||||
#### UFW Firewall Rules
|
||||
```bash
|
||||
# Allow OpenClaw access from Tailscale network
|
||||
sudo ufw allow from 100.64.0.0/10 to any port 18789 comment "OpenClaw - Tailscale only"
|
||||
|
||||
# Verify rule was added
|
||||
sudo ufw status verbose
|
||||
```
|
||||
|
||||
#### Tailscale Configuration
|
||||
```bash
|
||||
# Verify Tailscale is running
|
||||
tailscale status
|
||||
|
||||
# Get this machine's Tailscale IP
|
||||
tailscale ip -4
|
||||
```
|
||||
|
||||
## 🚀 Running OpenClaw
|
||||
|
||||
### Production Mode (Recommended)
|
||||
```bash
|
||||
cd /root/openclaw
|
||||
|
||||
# Start with Tailscale serve for HTTPS access
|
||||
pnpm openclaw gateway --port 18789 --bind loopback --tailscale serve --verbose --allow-unconfigured &
|
||||
```
|
||||
|
||||
### Development Mode
|
||||
```bash
|
||||
# Start in foreground for debugging
|
||||
pnpm openclaw gateway --port 18789 --bind loopback --verbose --allow-unconfigured
|
||||
```
|
||||
|
||||
### Service Management
|
||||
```bash
|
||||
# Check status
|
||||
pnpm openclaw status
|
||||
|
||||
# View logs
|
||||
pnpm openclaw logs --follow
|
||||
|
||||
# Stop gateway
|
||||
kill %1 # If running in background
|
||||
```
|
||||
|
||||
## 🌐 Access Methods
|
||||
|
||||
### 1. Tailscale HTTPS (Recommended)
|
||||
- **URL:** https://seattle.tail.vish.gg/
|
||||
- **Features:** Full WebSocket support, secure HTTPS
|
||||
- **Requirements:** Must be connected to the same Tailscale network
|
||||
- **First-time setup:** Requires device pairing (see Device Pairing section below)
|
||||
|
||||
### 2. Local Access
|
||||
- **URL:** http://127.0.0.1:18789/
|
||||
- **Features:** Full functionality when accessed locally
|
||||
- **Limitations:** Only accessible from the host machine
|
||||
|
||||
### 3. Direct IP Access
|
||||
- **URL:** http://100.82.197.124:18789/
|
||||
- **Features:** Basic HTTP interface
|
||||
- **Limitations:** WebSocket connections require HTTPS (use Tailscale instead)
|
||||
|
||||
## 🔗 Device Pairing
|
||||
|
||||
OpenClaw requires device pairing for security. When you first visit the web interface, you'll see "disconnected (1008): pairing required".
|
||||
|
||||
### Pairing Process
|
||||
|
||||
1. **Visit the web interface** from your device (triggers pairing request)
|
||||
2. **On the server, list pending requests:**
|
||||
```bash
|
||||
cd /root/openclaw
|
||||
pnpm openclaw devices list
|
||||
```
|
||||
|
||||
3. **Approve the pairing request:**
|
||||
```bash
|
||||
pnpm openclaw devices approve <request-id>
|
||||
```
|
||||
|
||||
4. **Refresh your browser** - the interface should now work
|
||||
|
||||
### Device Management Commands
|
||||
```bash
|
||||
# List all devices (pending and paired)
|
||||
pnpm openclaw devices list
|
||||
|
||||
# Approve a pending device
|
||||
pnpm openclaw devices approve <request-id>
|
||||
|
||||
# Reject a pending device
|
||||
pnpm openclaw devices reject <request-id>
|
||||
|
||||
# Revoke access for a paired device
|
||||
pnpm openclaw devices revoke <device-id> <role>
|
||||
```
|
||||
|
||||
## ⚙️ Configuration
|
||||
|
||||
### Configuration Files Location
|
||||
```
|
||||
~/.openclaw/
|
||||
├── config.json # Main configuration
|
||||
├── credentials.json # API keys and tokens
|
||||
└── sessions/ # Session data
|
||||
```
|
||||
|
||||
### Key Configuration Options
|
||||
```json
|
||||
{
|
||||
"gateway": {
|
||||
"mode": "local",
|
||||
"bind": "loopback",
|
||||
"port": 18789
|
||||
},
|
||||
"agent": {
|
||||
"model": "anthropic/claude-opus-4-6",
|
||||
"context": "200k"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 🔐 Security Considerations
|
||||
|
||||
### Firewall Configuration
|
||||
- Port 18789 is restricted to Tailscale network (100.64.0.0/10)
|
||||
- No public internet access to OpenClaw gateway
|
||||
- HTTPS enforced for WebSocket connections
|
||||
|
||||
### Authentication
|
||||
- Control UI requires HTTPS or localhost access
|
||||
- Tailscale provides secure tunnel with automatic certificates
|
||||
- No additional authentication configured (uses --allow-unconfigured)
|
||||
|
||||
### Network Security
|
||||
- Tailscale serve mode provides automatic HTTPS certificates
|
||||
- All traffic encrypted via Tailscale's WireGuard protocol
|
||||
- Access limited to authorized Tailscale devices
|
||||
|
||||
## 🛠️ Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
#### 1. Device Pairing Required
|
||||
**Symptom:** "disconnected (1008): pairing required"
|
||||
**Solution:**
|
||||
1. Visit the web interface to trigger pairing request
|
||||
2. Run `pnpm openclaw devices list` on the server
|
||||
3. Approve the request with `pnpm openclaw devices approve <request-id>`
|
||||
4. Refresh your browser
|
||||
|
||||
#### 2. WebSocket Connection Failures
|
||||
**Symptom:** "control ui requires HTTPS or localhost (secure context)"
|
||||
**Solution:** Use Tailscale HTTPS URL instead of direct IP access
|
||||
|
||||
#### 3. Port Already in Use
|
||||
```bash
|
||||
# Kill existing process
|
||||
pnpm openclaw gateway --force --port 18789
|
||||
|
||||
# Or find and kill manually
|
||||
lsof -ti:18789 | xargs kill -9
|
||||
```
|
||||
|
||||
#### 3. Node.js Version Issues
|
||||
```bash
|
||||
# Verify Node.js version
|
||||
node --version
|
||||
|
||||
# Should be v22.22.0 or higher
|
||||
# If not, reinstall Node.js v22
|
||||
```
|
||||
|
||||
#### 4. Tailscale Serve Not Working
|
||||
```bash
|
||||
# Check Tailscale status
|
||||
tailscale status
|
||||
|
||||
# Restart Tailscale if needed
|
||||
sudo systemctl restart tailscaled
|
||||
|
||||
# Verify serve configuration
|
||||
tailscale serve status
|
||||
```
|
||||
|
||||
### Log Files
|
||||
```bash
|
||||
# OpenClaw logs
|
||||
tail -f /tmp/openclaw/openclaw-2026-02-16.log
|
||||
|
||||
# System logs
|
||||
journalctl -u tailscaled -f
|
||||
```
|
||||
|
||||
## 📊 System Status
|
||||
|
||||
### Current Configuration
|
||||
- **Host:** seattle.tail.vish.gg
|
||||
- **Tailscale IP:** 100.82.197.124
|
||||
- **Gateway Port:** 18789
|
||||
- **Bind Mode:** loopback (with Tailscale serve)
|
||||
- **Agent Model:** anthropic/claude-opus-4-6
|
||||
- **Context Window:** 200k tokens
|
||||
|
||||
### Installed Features
|
||||
- Device pairing (`/pair` command)
|
||||
- Phone control (`/phone` command)
|
||||
- Voice commands (`/voice` command)
|
||||
- Browser control service
|
||||
- Canvas hosting
|
||||
- Bonjour discovery
|
||||
|
||||
### Network Status
|
||||
- UFW firewall: Active with Tailscale rules
|
||||
- Tailscale: Connected and serving HTTPS
|
||||
- Gateway: Running in background
|
||||
- WebSocket: Available via wss://seattle.tail.vish.gg
|
||||
|
||||
## 🔄 Maintenance
|
||||
|
||||
### Regular Tasks
|
||||
```bash
|
||||
# Update OpenClaw
|
||||
cd /root/openclaw
|
||||
git pull
|
||||
pnpm install
|
||||
pnpm build
|
||||
|
||||
# Restart gateway
|
||||
kill %1
|
||||
pnpm openclaw gateway --port 18789 --bind loopback --tailscale serve --verbose --allow-unconfigured &
|
||||
```
|
||||
|
||||
### Backup Configuration
|
||||
```bash
|
||||
# Backup configuration
|
||||
tar -czf openclaw-config-$(date +%Y%m%d).tar.gz ~/.openclaw/
|
||||
|
||||
# Backup installation
|
||||
tar -czf openclaw-install-$(date +%Y%m%d).tar.gz /root/openclaw/
|
||||
```
|
||||
|
||||
### Security Audit
|
||||
```bash
|
||||
# Run security audit
|
||||
pnpm openclaw security audit --deep
|
||||
|
||||
# Check for updates
|
||||
pnpm openclaw update check
|
||||
```
|
||||
|
||||
## 📚 Additional Resources
|
||||
|
||||
- **OpenClaw Documentation:** https://docs.openclaw.ai/
|
||||
- **CLI Reference:** https://docs.openclaw.ai/cli/gateway
|
||||
- **Tailscale Documentation:** https://tailscale.com/kb/
|
||||
- **GitHub Repository:** https://github.com/openclaw/openclaw
|
||||
|
||||
## 🎯 Next Steps
|
||||
|
||||
1. **Configure API Keys:** Add your AI model API keys to `~/.openclaw/credentials.json`
|
||||
2. **Set Up Channels:** Configure Discord, Slack, or other communication channels
|
||||
3. **Customize Settings:** Modify `~/.openclaw/config.json` for your needs
|
||||
4. **Security Review:** Run `pnpm openclaw security audit --deep`
|
||||
5. **Monitoring:** Set up log monitoring and alerting
|
||||
|
||||
---
|
||||
|
||||
**Installation completed successfully on February 16, 2026**
|
||||
**OpenClaw is now accessible at:** https://seattle.tail.vish.gg/
|
||||
287
docs/infrastructure/port-forwarding-configuration.md
Normal file
287
docs/infrastructure/port-forwarding-configuration.md
Normal file
@@ -0,0 +1,287 @@
|
||||
# 🌐 Port Forwarding Configuration
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This document details the current port forwarding configuration on the TP-Link Archer BE800 router, enabling external access to specific homelab services.
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Current Port Forwarding Rules
|
||||
|
||||
Based on the TP-Link router configuration:
|
||||
|
||||
### **Active Port Forwards**
|
||||
|
||||
| Service Name | Device IP | External Port | Internal Port | Protocol | Purpose |
|
||||
|--------------|-----------|---------------|---------------|----------|---------|
|
||||
| **jitsi3** | 192.168.0.200 | 4443 | 4443 | TCP | Jitsi Meet video conferencing |
|
||||
| **stun3** | 192.168.0.200 | 5349 | 5349 | All | STUN server for WebRTC |
|
||||
| **stun2** | 192.168.0.200 | 49160-49200 | 49160-49200 | All | RTP media ports for Jitsi |
|
||||
| **stun1** | 192.168.0.200 | 3478 | 3478 | All | Primary STUN server |
|
||||
| **gitea** | 192.168.0.250 | 2222 | 2222 | All | Gitea SSH access |
|
||||
| **portainer2** | 192.168.0.200 | 8000 | 8000 | All | Portainer Edge Agent |
|
||||
| **portainer2** | 192.168.0.200 | 9443 | 9443 | All | Portainer HTTPS interface |
|
||||
| **portainer2** | 192.168.0.200 | 10000 | 10000 | All | Portainer additional service |
|
||||
| **Https** | 192.168.0.250 | 443 | 443 | All | HTTPS web services |
|
||||
| **HTTP** | 192.168.0.250 | 80 | 80 | All | HTTP web services (redirects to HTTPS) |
|
||||
|
||||
---
|
||||
|
||||
## 🎯 Service Dependencies & Access
|
||||
|
||||
### **Jitsi Meet Video Conferencing (192.168.0.200)**
|
||||
```bash
|
||||
# External Access URLs:
|
||||
https://your-domain.com:4443 # Jitsi Meet web interface
|
||||
|
||||
# Required Ports:
|
||||
- 4443/TCP # HTTPS web interface
|
||||
- 5349/All # TURN server for NAT traversal
|
||||
- 3478/All # STUN server for peer discovery
|
||||
- 49160-49200/All # RTP media streams (40 port range)
|
||||
|
||||
# Service Dependencies:
|
||||
- Requires all 4 port ranges for full functionality
|
||||
- WebRTC media negotiation depends on STUN/TURN
|
||||
- RTP port range handles multiple concurrent calls
|
||||
```
|
||||
|
||||
### **Gitea Git Repository (192.168.0.250 - Calypso)**
|
||||
```bash
|
||||
# External SSH Access:
|
||||
git clone ssh://git@your-domain.com:2222/username/repo.git
|
||||
|
||||
# Required Ports:
|
||||
- 2222/All # SSH access for Git operations
|
||||
|
||||
# Service Dependencies:
|
||||
- SSH key authentication required
|
||||
- Alternative to HTTPS Git access
|
||||
- Enables Git operations from external networks
|
||||
```
|
||||
|
||||
### **Portainer Container Management (192.168.0.200)**
|
||||
```bash
|
||||
# External Access URLs:
|
||||
https://your-domain.com:9443 # Main Portainer interface
|
||||
https://your-domain.com:8000 # Edge Agent communication
|
||||
https://your-domain.com:10000 # Additional services
|
||||
|
||||
# Required Ports:
|
||||
- 9443/All # Primary HTTPS interface
|
||||
- 8000/All # Edge Agent communication
|
||||
- 10000/All # Extended functionality
|
||||
|
||||
# Service Dependencies:
|
||||
- All three ports required for full Portainer functionality
|
||||
- Edge Agent enables remote Docker management
|
||||
- HTTPS interface provides web-based container management
|
||||
```
|
||||
|
||||
### **Web Services (192.168.0.250 - Calypso)**
|
||||
```bash
|
||||
# External Access URLs:
|
||||
https://your-domain.com # Main web services (443)
|
||||
http://your-domain.com # HTTP redirect to HTTPS (80)
|
||||
|
||||
# Required Ports:
|
||||
- 443/All # HTTPS web services
|
||||
- 80/All # HTTP (typically redirects to HTTPS)
|
||||
|
||||
# Service Dependencies:
|
||||
- Reverse proxy (likely Nginx/Traefik) on Calypso
|
||||
- SSL/TLS certificates for HTTPS
|
||||
- Automatic HTTP to HTTPS redirection
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🏠 Host Mapping
|
||||
|
||||
### **192.168.0.200 - Atlantis (Primary NAS)**
|
||||
- **Jitsi Meet**: Video conferencing platform
|
||||
- **Portainer**: Container management interface
|
||||
- **Services**: 4 port forwards (Jitsi + Portainer)
|
||||
|
||||
### **192.168.0.250 - Calypso (Development Server)**
|
||||
- **Gitea**: Git repository hosting
|
||||
- **Web Services**: HTTPS/HTTP reverse proxy
|
||||
- **Services**: 3 port forwards (Git SSH + Web)
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security Considerations
|
||||
|
||||
### **Exposed Services Risk Assessment**
|
||||
|
||||
#### **High Security Services** ✅
|
||||
- **HTTPS (443)**: Encrypted web traffic, reverse proxy protected
|
||||
- **Jitsi Meet (4443)**: Encrypted video conferencing
|
||||
- **Portainer HTTPS (9443)**: Encrypted container management
|
||||
|
||||
#### **Medium Security Services** ⚠️
|
||||
- **Gitea SSH (2222)**: SSH key authentication required
|
||||
- **Portainer Edge (8000)**: Agent communication, should be secured
|
||||
- **HTTP (80)**: Unencrypted, should redirect to HTTPS
|
||||
|
||||
#### **Network Services** 🔧
|
||||
- **STUN/TURN (3478, 5349)**: Required for WebRTC, standard protocols
|
||||
- **RTP Range (49160-49200)**: Media streams, encrypted by Jitsi
|
||||
|
||||
### **Security Recommendations**
|
||||
|
||||
```bash
|
||||
# 1. Ensure Strong Authentication
|
||||
- Use SSH keys for Gitea (port 2222)
|
||||
- Enable 2FA on Portainer (port 9443)
|
||||
- Implement strong passwords on all services
|
||||
|
||||
# 2. Monitor Access Logs
|
||||
- Review Nginx/reverse proxy logs regularly
|
||||
- Monitor failed authentication attempts
|
||||
- Set up alerts for suspicious activity
|
||||
|
||||
# 3. Keep Services Updated
|
||||
- Regular security updates for all exposed services
|
||||
- Monitor CVE databases for vulnerabilities
|
||||
- Implement automated security scanning
|
||||
|
||||
# 4. Network Segmentation
|
||||
- Consider moving exposed services to DMZ
|
||||
- Implement firewall rules between network segments
|
||||
- Use VLANs to isolate public-facing services
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌐 External Access Methods
|
||||
|
||||
### **Primary Access (Port Forwarding)**
|
||||
```bash
|
||||
# Direct external access via domain names (DDNS updated every 5 minutes)
|
||||
https://pw.vish.gg:9443 # Portainer
|
||||
https://meet.thevish.io:4443 # Jitsi Meet (primary)
|
||||
ssh://git@git.vish.gg:2222 # Gitea SSH
|
||||
|
||||
# Alternative domain access
|
||||
https://vish.gg:9443 # Portainer (main domain)
|
||||
https://meet.vish.gg:4443 # Jitsi Meet (alt domain)
|
||||
https://www.vish.gg # Main web services (HTTPS)
|
||||
https://vish.gg # Main web services (HTTPS)
|
||||
|
||||
# Additional service domains (from Cloudflare DNS)
|
||||
https://cal.vish.gg # Calendar service (proxied)
|
||||
https://reddit.vish.gg # Reddit alternative (proxied)
|
||||
https://www.thevish.io # Alternative main domain (proxied)
|
||||
https://matrix.thevish.io # Matrix chat server (proxied)
|
||||
https://joplin.thevish.io # Joplin notes (proxied)
|
||||
```
|
||||
|
||||
### **Alternative Access (Tailscale)**
|
||||
```bash
|
||||
# Secure mesh VPN access (recommended)
|
||||
https://atlantis.tail.vish.gg:9443 # Portainer via Tailscale
|
||||
https://atlantis.tail.vish.gg:4443 # Jitsi via Tailscale
|
||||
ssh://git@calypso.tail.vish.gg:2222 # Gitea via Tailscale
|
||||
```
|
||||
|
||||
### **Hybrid Approach**
|
||||
- **Public Services**: Jitsi Meet (external users need direct access)
|
||||
- **Admin Services**: Portainer, Gitea (use Tailscale for security)
|
||||
- **Web Services**: Public content via port forwarding, admin via Tailscale
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Configuration Management
|
||||
|
||||
### **Router Configuration Backup**
|
||||
```bash
|
||||
# Regular backups of port forwarding rules
|
||||
- Export TP-Link configuration monthly
|
||||
- Document all port forward changes
|
||||
- Maintain change log with dates and reasons
|
||||
```
|
||||
|
||||
### **Service Health Monitoring**
|
||||
```bash
|
||||
# Monitor forwarded services
|
||||
- Set up uptime monitoring for each forwarded port
|
||||
- Implement health checks for critical services
|
||||
- Configure alerts for service failures
|
||||
```
|
||||
|
||||
### **Dynamic DNS Configuration**
|
||||
```bash
|
||||
# Automated DDNS updates via Cloudflare
|
||||
- DDNS updater runs every 5 minutes
|
||||
- Updates both vish.gg and thevish.io domains
|
||||
- Handles both IPv4 (A) and IPv6 (AAAA) records
|
||||
- Proxied services: cal, reddit, www, matrix, joplin
|
||||
- DNS-only services: git, meet, pw, api, spotify
|
||||
|
||||
# DDNS Services Running:
|
||||
- ddns-vish-proxied: Updates proxied A records
|
||||
- ddns-vish-unproxied: Updates DNS-only A records
|
||||
- ddns-thevish-proxied: Updates thevish.io proxied records
|
||||
- ddns-thevish-unproxied: Updates thevish.io DNS-only records
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚨 Troubleshooting
|
||||
|
||||
### **Common Issues**
|
||||
|
||||
#### **Service Not Accessible Externally**
|
||||
```bash
|
||||
# Check list:
|
||||
1. Verify port forward rule is enabled
|
||||
2. Confirm internal service is running
|
||||
3. Test internal access first (192.168.0.x:port)
|
||||
4. Check firewall rules on target host
|
||||
5. Verify router external IP hasn't changed
|
||||
```
|
||||
|
||||
#### **Jitsi Meet Connection Issues**
|
||||
```bash
|
||||
# WebRTC requires all ports:
|
||||
1. Test STUN server: 3478, 5349
|
||||
2. Verify RTP range: 49160-49200
|
||||
3. Check browser WebRTC settings
|
||||
4. Test with different networks/devices
|
||||
```
|
||||
|
||||
#### **Gitea SSH Access Problems**
|
||||
```bash
|
||||
# SSH troubleshooting:
|
||||
1. Verify SSH key is added to Gitea
|
||||
2. Test SSH connection: ssh -p 2222 git@git.vish.gg
|
||||
3. Check Gitea SSH configuration
|
||||
4. Verify port 2222 is not blocked by ISP
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Maintenance Tasks
|
||||
|
||||
### **Monthly Tasks**
|
||||
- [ ] Review access logs for all forwarded services
|
||||
- [ ] Test external access to all forwarded ports
|
||||
- [ ] Update service passwords and SSH keys
|
||||
- [ ] Backup router configuration
|
||||
|
||||
### **Quarterly Tasks**
|
||||
- [ ] Security audit of exposed services
|
||||
- [ ] Update all forwarded services to latest versions
|
||||
- [ ] Review and optimize port forwarding rules
|
||||
- [ ] Test disaster recovery procedures
|
||||
|
||||
### **Annual Tasks**
|
||||
- [ ] Complete security assessment
|
||||
- [ ] Review and update documentation
|
||||
- [ ] Evaluate need for additional security measures
|
||||
- [ ] Plan for service migrations or updates
|
||||
|
||||
---
|
||||
|
||||
*This port forwarding configuration enables external access to critical homelab services while maintaining security through proper authentication and monitoring.*
|
||||
221
docs/infrastructure/port-forwarding-guide.md
Normal file
221
docs/infrastructure/port-forwarding-guide.md
Normal file
@@ -0,0 +1,221 @@
|
||||
# 🌐 Router Port Forwarding Guide
|
||||
|
||||
This guide covers the essential ports you need to forward on your router to access your homelab services from outside your network.
|
||||
|
||||
## 🚨 Security Warning
|
||||
|
||||
**⚠️ IMPORTANT**: Only forward ports for services you actually need external access to. Each forwarded port is a potential security risk. Consider using a VPN instead for most services.
|
||||
|
||||
## 🔑 Essential Ports (Recommended)
|
||||
|
||||
### 🛡️ VPN Access (Highest Priority)
|
||||
**Forward these first - they provide secure access to everything else:**
|
||||
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `51820` | UDP | WireGuard VPN | Atlantis | Primary VPN server |
|
||||
| `51820` | UDP | WireGuard VPN | concord_nuc | Secondary VPN server |
|
||||
|
||||
**Why VPN First?**: Once you have VPN access, you can reach all internal services securely without exposing them directly to the internet.
|
||||
|
||||
### 🌐 Web Services (If VPN isn't sufficient)
|
||||
**Only if you need direct external access:**
|
||||
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `80` | TCP | HTTP | Nginx Proxy Manager | Web traffic (redirects to HTTPS) |
|
||||
| `443` | TCP | HTTPS | Nginx Proxy Manager | Secure web traffic |
|
||||
| `8341` | TCP | HTTP Alt | Atlantis | Nginx Proxy Manager HTTP |
|
||||
| `8766` | TCP | HTTPS Alt | Atlantis | Nginx Proxy Manager HTTPS |
|
||||
|
||||
## 🎮 Gaming Servers (If Hosting Public Games)
|
||||
|
||||
### Satisfactory Server
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `7777` | TCP/UDP | Satisfactory | homelab_vm | Game server |
|
||||
|
||||
### Left 4 Dead 2 Server
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `27015` | TCP/UDP | L4D2 Server | homelab_vm | Game server |
|
||||
| `27020` | UDP | L4D2 Server | homelab_vm | SourceTV |
|
||||
| `27005` | UDP | L4D2 Server | homelab_vm | Client port |
|
||||
|
||||
## 📱 Communication Services (If Needed Externally)
|
||||
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `8065` | TCP | Mattermost | homelab_vm | Team chat (if external users) |
|
||||
| `8080` | TCP | Signal API | homelab_vm | Signal messaging API |
|
||||
|
||||
## 🔄 File Sync (If External Sync Needed)
|
||||
|
||||
| Port | Protocol | Service | Host | Purpose |
|
||||
|------|----------|---------|------|---------|
|
||||
| `22000` | TCP/UDP | Syncthing | homelab_vm | File synchronization |
|
||||
| `21027` | UDP | Syncthing | homelab_vm | Discovery |
|
||||
|
||||
## 🚫 Ports You Should NOT Forward
|
||||
|
||||
**These services should remain internal-only:**
|
||||
|
||||
- **Database ports** (PostgreSQL: 5432, MySQL: 3306, Redis: 6379)
|
||||
- **Monitoring services** (Prometheus: 9090, Grafana: 3000)
|
||||
- **Admin interfaces** (Portainer, Docker APIs)
|
||||
- **Internal APIs** and microservices
|
||||
- **Development tools** (VS Code Server, etc.)
|
||||
|
||||
## 🏗️ Recommended Setup Architecture
|
||||
|
||||
### Option 1: VPN-Only (Most Secure)
|
||||
```
|
||||
Internet → Router → VPN Server → Internal Services
|
||||
```
|
||||
1. Forward only VPN ports (51820/UDP)
|
||||
2. Access all services through VPN tunnel
|
||||
3. No other ports exposed to internet
|
||||
|
||||
### Option 2: Reverse Proxy + VPN (Balanced)
|
||||
```
|
||||
Internet → Router → Nginx Proxy Manager → Internal Services
|
||||
→ VPN Server → Internal Services
|
||||
```
|
||||
1. Forward HTTP/HTTPS (80, 443) to Nginx Proxy Manager
|
||||
2. Forward VPN port (51820/UDP)
|
||||
3. Use SSL certificates and authentication
|
||||
4. VPN for admin access
|
||||
|
||||
### Option 3: Selective Forwarding (Least Secure)
|
||||
```
|
||||
Internet → Router → Individual Services
|
||||
```
|
||||
1. Forward only specific service ports
|
||||
2. Use strong authentication on each service
|
||||
3. Regular security updates essential
|
||||
|
||||
## 🔧 Router Configuration Steps
|
||||
|
||||
### 1. Access Router Admin
|
||||
- Open router web interface (usually `192.168.1.1` or `192.168.0.1`)
|
||||
- Login with admin credentials
|
||||
|
||||
### 2. Find Port Forwarding Section
|
||||
- Look for "Port Forwarding", "Virtual Servers", or "NAT"
|
||||
- May be under "Advanced" or "Security" settings
|
||||
|
||||
### 3. Add Port Forward Rules
|
||||
For each port, configure:
|
||||
- **External Port**: Port from internet
|
||||
- **Internal IP**: IP of your homelab host
|
||||
- **Internal Port**: Port on the host
|
||||
- **Protocol**: TCP, UDP, or Both
|
||||
|
||||
### Example Configuration:
|
||||
```
|
||||
Service: WireGuard VPN
|
||||
External Port: 51820
|
||||
Internal IP: 192.168.1.100 (Atlantis IP)
|
||||
Internal Port: 51820
|
||||
Protocol: UDP
|
||||
```
|
||||
|
||||
## 🛡️ Security Best Practices
|
||||
|
||||
### 1. Use Strong Authentication
|
||||
- Enable 2FA where possible
|
||||
- Use complex passwords
|
||||
- Consider fail2ban for brute force protection
|
||||
|
||||
### 2. Keep Services Updated
|
||||
- Regular Docker image updates
|
||||
- Security patches for host OS
|
||||
- Monitor security advisories
|
||||
|
||||
### 3. Monitor Access Logs
|
||||
- Check for unusual access patterns
|
||||
- Set up alerts for failed login attempts
|
||||
- Regular security audits
|
||||
|
||||
### 4. Use SSL/TLS
|
||||
- Let's Encrypt certificates through Nginx Proxy Manager
|
||||
- Force HTTPS redirects
|
||||
- Strong cipher suites
|
||||
|
||||
### 5. Network Segmentation
|
||||
- Separate IoT devices
|
||||
- DMZ for public services
|
||||
- VLANs for different service types
|
||||
|
||||
## 🔍 Testing Your Setup
|
||||
|
||||
### Internal Testing
|
||||
```bash
|
||||
# Test from inside network
|
||||
curl -I http://your-service:port
|
||||
nmap -p port your-internal-ip
|
||||
```
|
||||
|
||||
### External Testing
|
||||
```bash
|
||||
# Test from outside network (use mobile data or different network)
|
||||
curl -I http://your-external-ip:port
|
||||
nmap -p port your-external-ip
|
||||
```
|
||||
|
||||
### VPN Testing
|
||||
```bash
|
||||
# Connect to VPN, then test internal services
|
||||
ping internal-service-ip
|
||||
curl http://internal-service:port
|
||||
```
|
||||
|
||||
## 🚨 Emergency Procedures
|
||||
|
||||
### If Compromised
|
||||
1. **Immediately disable port forwarding** for affected services
|
||||
2. Change all passwords
|
||||
3. Check logs for unauthorized access
|
||||
4. Update all services
|
||||
5. Consider rebuilding affected containers
|
||||
|
||||
### Monitoring Commands
|
||||
```bash
|
||||
# Check active connections
|
||||
netstat -an | grep :port
|
||||
|
||||
# Monitor logs
|
||||
docker logs container-name --tail 100 -f
|
||||
|
||||
# Check for failed logins
|
||||
grep "Failed" /var/log/auth.log
|
||||
```
|
||||
|
||||
## 📊 Port Summary Table
|
||||
|
||||
| Priority | Ports | Services | Security Level |
|
||||
|----------|-------|----------|----------------|
|
||||
| **High** | 51820/UDP | VPN | 🟢 High |
|
||||
| **Medium** | 80, 443 | Web (via proxy) | 🟡 Medium |
|
||||
| **Low** | 7777, 27015 | Gaming | 🟡 Medium |
|
||||
| **Avoid** | 22, 3389, 5432 | SSH, RDP, DB | 🔴 High Risk |
|
||||
|
||||
## 💡 Pro Tips
|
||||
|
||||
1. **Start with VPN only** - Get WireGuard working first
|
||||
2. **Use non-standard ports** - Change default ports when possible
|
||||
3. **Document everything** - Keep track of what's forwarded and why
|
||||
4. **Regular audits** - Review forwarded ports monthly
|
||||
5. **Test from outside** - Verify access works as expected
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [🔧 TP-Link Archer BE800 Setup](tplink-archer-be800-setup.md) - Specific router configuration guide
|
||||
- [Security Model](security.md) - Overall security architecture
|
||||
- [Network Architecture](networking.md) - Network topology and design
|
||||
- [VPN Setup Guide](../services/individual/wg-easy.md) - WireGuard configuration
|
||||
- [Nginx Proxy Manager](../services/individual/nginx-proxy-manager.md) - Reverse proxy setup
|
||||
|
||||
---
|
||||
|
||||
**Remember**: The best security practice is to expose as few services as possible to the internet. Use VPN for most access and only forward ports for services that absolutely need direct external access.
|
||||
320
docs/infrastructure/resource-allocation.md
Normal file
320
docs/infrastructure/resource-allocation.md
Normal file
@@ -0,0 +1,320 @@
|
||||
# Resource Allocation Guide
|
||||
|
||||
*CPU, memory, and storage recommendations for homelab services*
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This guide provides resource allocation recommendations for services running in the homelab. Values are based on typical usage and should be adjusted based on actual usage patterns.
|
||||
|
||||
---
|
||||
|
||||
## Host Capacity
|
||||
|
||||
### Current Resources
|
||||
|
||||
| Host | CPU | RAM | Storage | Workload |
|
||||
|------|-----|-----|---------|----------|
|
||||
| Atlantis | 8 cores | 32GB | 40TB | Media, Vault |
|
||||
| Calypso | 4 cores | 32GB | 12TB | Infrastructure |
|
||||
| Concord NUC | 2 cores | 16GB | 256GB | Light services |
|
||||
| Homelab VM | 4 cores | 28GB | 100GB | Monitoring |
|
||||
| RPi5 | 4 cores | 16GB | 512GB | Edge |
|
||||
|
||||
### Available Headroom
|
||||
|
||||
| Host | CPU Available | RAM Available | Notes |
|
||||
|------|---------------|---------------|-------|
|
||||
| Atlantis | 2 cores | 8GB | ~25% headroom |
|
||||
| Calypso | 1 core | 12GB | ~37% headroom |
|
||||
| Concord NUC | 0.5 core | 4GB | Limited |
|
||||
| Homelab VM | 1 core | 8GB | ~28% headroom |
|
||||
| RPi5 | 2 cores | 8GB | ~50% headroom |
|
||||
|
||||
---
|
||||
|
||||
## Service Resource Guidelines
|
||||
|
||||
### Infrastructure Services
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Nginx Proxy Manager | 0.5 | 256MB | 1GB | Minimal |
|
||||
| Authentik | 1 | 1GB | 10GB | With PostgreSQL |
|
||||
| Prometheus | 1 | 2GB | 20GB | Adjust for retention |
|
||||
| Grafana | 0.5 | 512MB | 1GB | Dashboards |
|
||||
| Alertmanager | 0.25 | 128MB | - | Minimal |
|
||||
|
||||
### Database Services
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| PostgreSQL | 1 | 1GB | 10GB+ | Per database |
|
||||
| Redis | 0.5 | 512MB | - | In-memory |
|
||||
| MariaDB/MySQL | 1 | 512MB | 5GB | Legacy services |
|
||||
|
||||
### Media Services
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Plex | 2+ | 2GB | - | Transcoding |
|
||||
| Jellyfin | 2+ | 2GB | - | Hardware assist |
|
||||
| Sonarr | 0.5 | 256MB | - | Low usage |
|
||||
| Radarr | 0.5 | 256MB | - | Low usage |
|
||||
| Lidarr | 0.5 | 256MB | - | Low usage |
|
||||
| Prowlarr | 0.25 | 128MB | - | Minimal |
|
||||
| Bazarr | 0.5 | 512MB | - | Subtitle processing |
|
||||
| qBittorrent | 1 | 512MB | - | Upload/download |
|
||||
| SABnzbd | 0.5 | 256MB | - | Download |
|
||||
|
||||
### Photo Services
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Immich | 2 | 2GB | 100GB+ | ML processing |
|
||||
| PhotoPrism | 2 | 2GB | 100GB+ | Optional |
|
||||
|
||||
### Communication Services
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Matrix/Synapse | 2 | 1GB | 10GB | Federation |
|
||||
| Element | 0.5 | 256MB | - | Web client |
|
||||
| Mastodon | 2 | 2GB | 20GB | Social |
|
||||
| Mattermost | 1 | 1GB | 5GB | Team chat |
|
||||
| Jitsi | 2 | 2GB | - | Video |
|
||||
|
||||
### Home Automation
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Home Assistant | 1 | 2GB | 5GB | Core |
|
||||
| Zigbee2MQTT | 0.5 | 256MB | - | MQTT broker |
|
||||
| Z-Wave JS | 0.5 | 512MB | - | Z-Wave |
|
||||
|
||||
### Development
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Gitea | 1 | 512MB | 5GB | Git hosting |
|
||||
| Gitea Runner | 1 | 512MB | - | CI/CD |
|
||||
| Portainer | 0.5 | 256MB | - | Management |
|
||||
| OpenHands | 2 | 4GB | 10GB | AI dev (on-demand) |
|
||||
|
||||
### Productivity
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Paperless-NGX | 1 | 1GB | 50GB | Document OCR |
|
||||
| Wallabag | 0.5 | 256MB | 5GB | Read later |
|
||||
| Reactive Resume | 0.5 | 256MB | 1GB | Resume builder |
|
||||
| Seafile | 2 | 2GB | 100GB+ | File sync |
|
||||
|
||||
### Security
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Vaultwarden | 1 | 512MB | 1GB | Passwords |
|
||||
| Bitwarden | 2 | 1GB | 5GB | (if using official) |
|
||||
|
||||
### Privacy
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Invidious | 1 | 1GB | - | YouTube frontend |
|
||||
| Piped | 1 | 1GB | - | Music frontend |
|
||||
| Libreddit | 0.5 | 256MB | - | Reddit frontend |
|
||||
|
||||
### DNS & Network
|
||||
|
||||
| Service | CPU | Memory | Storage | Notes |
|
||||
|---------|-----|--------|---------|-------|
|
||||
| Pi-hole | 0.5 | 256MB | 2GB | DNS filtering |
|
||||
| AdGuard | 1 | 512MB | 2GB | DNS + ads |
|
||||
| WireGuard | 0.25 | 128MB | - | VPN |
|
||||
| Headscale | 0.5 | 256MB | - | WireGuard server |
|
||||
|
||||
---
|
||||
|
||||
## Memory Limits by Host
|
||||
|
||||
### Atlantis (32GB)
|
||||
|
||||
```
|
||||
System: 4GB
|
||||
Container overhead: 4GB
|
||||
Vaultwarden: 512MB
|
||||
Immich: 2GB
|
||||
Plex: 2GB
|
||||
ARR stack: 1GB
|
||||
Jitsi: 2GB
|
||||
Matrix: 1GB
|
||||
Mastodon: 2GB
|
||||
Misc services: 2GB
|
||||
---------------------------
|
||||
Reserved: ~15GB
|
||||
```
|
||||
|
||||
### Calypso (32GB)
|
||||
|
||||
```
|
||||
System: 4GB
|
||||
Docker overhead: 4GB
|
||||
Authentik: 1GB
|
||||
NPM: 256MB
|
||||
Prometheus: 2GB
|
||||
Grafana: 512MB
|
||||
PostgreSQL: 1GB
|
||||
ARR stack: 512MB
|
||||
Other services: 3GB
|
||||
---------------------------
|
||||
Reserved: ~16GB
|
||||
```
|
||||
|
||||
### Concord NUC (16GB)
|
||||
|
||||
```
|
||||
System: 2GB
|
||||
Docker: 2GB
|
||||
Home Assistant: 2GB
|
||||
AdGuard: 512MB
|
||||
Plex: 2GB
|
||||
Other services: 2GB
|
||||
---------------------------
|
||||
Reserved: ~5.5GB
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## CPU Limits by Service
|
||||
|
||||
### High CPU (2+ cores)
|
||||
- Plex/Jellyfin (transcoding)
|
||||
- Immich (ML processing)
|
||||
- OpenHands
|
||||
- Ollama
|
||||
- Video processing
|
||||
|
||||
### Medium CPU (1 core)
|
||||
- Databases (PostgreSQL, MariaDB)
|
||||
- Matrix/Synapse
|
||||
- Mastodon
|
||||
- Seafile
|
||||
- Paperless-NGX (OCR)
|
||||
|
||||
### Low CPU (<1 core)
|
||||
- Nginx Proxy Manager
|
||||
- Authentik
|
||||
- Pi-hole/AdGuard
|
||||
- Vaultwarden
|
||||
- Arr suite (Sonarr, Radarr)
|
||||
- Prometheus (scraping)
|
||||
|
||||
---
|
||||
|
||||
## Storage Guidelines
|
||||
|
||||
### Media Storage
|
||||
- **Movies/TV**: On Atlantis, shared via NFS/SMB
|
||||
- **Music**: Dedicated volume
|
||||
- **Photos**: Immich primary on Atlantis, backup on RPi5
|
||||
|
||||
### Application Data
|
||||
- **Prometheus**: SSD required (fast writes)
|
||||
- **Databases**: SSD required
|
||||
- **Cache**: Can be small/fast
|
||||
|
||||
### Backup Storage
|
||||
- Local: Dedicated volume on Calypso
|
||||
- Remote: Backblaze B2 / cold storage
|
||||
|
||||
---
|
||||
|
||||
## Docker Compose Examples
|
||||
|
||||
### Memory Limits
|
||||
```yaml
|
||||
services:
|
||||
prometheus:
|
||||
image: prom/prometheus
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
memory: 2G
|
||||
reservations:
|
||||
memory: 1G
|
||||
```
|
||||
|
||||
### CPU Limits
|
||||
```yaml
|
||||
services:
|
||||
plex:
|
||||
image: plexinc/pms-docker
|
||||
deploy:
|
||||
resources:
|
||||
limits:
|
||||
cpus: '2.0'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring Resource Usage
|
||||
|
||||
### Check Current Usage
|
||||
|
||||
```bash
|
||||
# All containers
|
||||
docker stats --no-stream
|
||||
|
||||
# Specific host
|
||||
curl http://<host>:9100/metrics | grep node_memory_MemAvailable
|
||||
|
||||
# Grafana dashboard
|
||||
# Infrastructure → Host Resources
|
||||
```
|
||||
|
||||
### Alerts
|
||||
|
||||
| Metric | Warning | Critical |
|
||||
|--------|---------|----------|
|
||||
| CPU | >70% | >90% |
|
||||
| Memory | >80% | >95% |
|
||||
| Disk | >80% | >90% |
|
||||
|
||||
---
|
||||
|
||||
## Optimization Tips
|
||||
|
||||
1. **Use :latest sparingly** - Pin versions for stability
|
||||
2. **Enable GPU transcoding** - For Plex/Jellyfin
|
||||
3. **Use SSD for databases** - Prometheus, PostgreSQL
|
||||
4. **Limit concurrent transcode** - In Plex settings
|
||||
5. **Enable Prometheus targerhs** - For better monitoring
|
||||
|
||||
---
|
||||
|
||||
## Capacity Planning
|
||||
|
||||
### Growth Projections
|
||||
|
||||
| Service | Current | 6 Months | 12 Months |
|
||||
|---------|---------|----------|-----------|
|
||||
| Media storage | 20TB | 25TB | 30TB |
|
||||
| Photo storage | 500GB | 750GB | 1TB |
|
||||
| Prometheus | 10GB | 15GB | 20GB |
|
||||
| Database | 5GB | 7GB | 10GB |
|
||||
|
||||
### Warning Signs
|
||||
- Disk usage >80% sustained
|
||||
- Memory pressure alerts daily
|
||||
- Container restarts increasing
|
||||
- CPU throttling visible
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [Grafana Dashboards](../services/individual/grafana.md)
|
||||
- [Docker Guide](../DOCKER_COMPOSE_GUIDE.md)
|
||||
- [Monitoring Architecture](../infrastructure/MONITORING_ARCHITECTURE.md)
|
||||
340
docs/infrastructure/security.md
Normal file
340
docs/infrastructure/security.md
Normal file
@@ -0,0 +1,340 @@
|
||||
# 🛡️ Security Model
|
||||
|
||||
**🔴 Advanced Guide**
|
||||
|
||||
This document outlines the security architecture protecting the homelab infrastructure, including network security, authentication, secrets management, and data protection.
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Security Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ SECURITY LAYERS │
|
||||
├─────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ LAYER 1: PERIMETER │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Internet ──► Router Firewall ──► Only 80/443 exposed │ │
|
||||
│ │ │ │ │
|
||||
│ │ Cloudflare (DDoS, WAF, SSL) │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ LAYER 2: NETWORK │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
|
||||
│ │ │ Main │ │ IoT │ │ Guest │ (WiFi isolation) │ │
|
||||
│ │ │ Network │ │ WiFi │ │ Network │ │ │
|
||||
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ LAYER 3: ACCESS │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Tailscale VPN ──► Secure remote access to all services │ │
|
||||
│ │ Nginx Proxy Manager ──► Reverse proxy with SSL termination │ │
|
||||
│ │ Individual service authentication │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ LAYER 4: APPLICATION │
|
||||
│ ┌────────────────────────────────────────────────────────────────────┐ │
|
||||
│ │ Vaultwarden ──► Password management │ │
|
||||
│ │ .env files ──► Application secrets │ │
|
||||
│ │ Docker isolation ──► Container separation │ │
|
||||
│ └────────────────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔥 Network Security
|
||||
|
||||
### **Perimeter Defense**
|
||||
|
||||
#### Router Firewall
|
||||
| Rule | Direction | Ports | Purpose |
|
||||
|------|-----------|-------|---------|
|
||||
| Allow HTTP | Inbound | 80 | Redirect to HTTPS |
|
||||
| Allow HTTPS | Inbound | 443 | Reverse proxy access |
|
||||
| Block All | Inbound | * | Default deny |
|
||||
| Allow All | Outbound | * | Default allow |
|
||||
|
||||
#### Cloudflare Protection
|
||||
- **DDoS Protection**: Always-on Layer 3/4/7 protection
|
||||
- **WAF Rules**: Web Application Firewall for common attacks
|
||||
- **SSL/TLS**: Full (strict) encryption mode
|
||||
- **Rate Limiting**: Configured for sensitive endpoints
|
||||
- **Bot Protection**: Managed challenge for suspicious traffic
|
||||
|
||||
### **Network Segmentation**
|
||||
|
||||
| Network | Type | Purpose | Isolation |
|
||||
|---------|------|---------|-----------|
|
||||
| **Main Network** | Wired/WiFi | Trusted devices, servers | Full access |
|
||||
| **IoT WiFi** | WiFi only | Smart home devices | Internet only, no LAN access |
|
||||
| **Guest Network** | WiFi only | Visitors | Internet only, isolated |
|
||||
|
||||
> **Note**: Full VLAN segmentation is planned but not yet implemented. Currently using WiFi-based isolation for IoT devices.
|
||||
|
||||
### **Tailscale VPN Overlay**
|
||||
All internal services are accessible via Tailscale mesh VPN:
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ TAILSCALE MESH NETWORK │
|
||||
├─────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
|
||||
│ │Atlantis │◄──►│ Calypso │◄──►│ Homelab │ │
|
||||
│ │ NAS │ │ NAS │ │ VM │ │
|
||||
│ └─────────┘ └─────────┘ └─────────┘ │
|
||||
│ ▲ ▲ ▲ │
|
||||
│ │ │ │ │
|
||||
│ ▼ ▼ ▼ │
|
||||
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
|
||||
│ │ Mobile │ │ Laptop │ │ Edge │ │
|
||||
│ │ Devices │ │ MSI │ │ Devices │ │
|
||||
│ └─────────┘ └─────────┘ └─────────┘ │
|
||||
│ │
|
||||
│ Benefits: │
|
||||
│ • End-to-end encryption (WireGuard) │
|
||||
│ • Zero-trust network access │
|
||||
│ • No port forwarding required │
|
||||
│ • Works behind NAT/firewalls │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔐 Authentication & Access Control
|
||||
|
||||
### **Authentication Strategy**
|
||||
| Method | Services | Notes |
|
||||
|--------|----------|-------|
|
||||
| **Individual Logins** | All services | Each service has its own authentication |
|
||||
| **Vaultwarden** | Password storage | Bitwarden-compatible, self-hosted |
|
||||
| **Tailscale ACLs** | Network access | Controls which devices can reach which services |
|
||||
|
||||
### **Service Authentication Matrix**
|
||||
|
||||
| Service Category | Auth Method | 2FA Support | Notes |
|
||||
|-----------------|-------------|-------------|-------|
|
||||
| **Plex** | Plex account | Yes | Cloud-linked auth |
|
||||
| **Portainer** | Local admin | Yes (TOTP) | Container management |
|
||||
| **Grafana** | Local accounts | Yes (TOTP) | Monitoring dashboards |
|
||||
| **Vaultwarden** | Master password | Yes (required) | FIDO2/TOTP supported |
|
||||
| **Nginx Proxy Manager** | Local admin | No | Internal access only |
|
||||
| **Git (Gitea)** | Local accounts | Yes (TOTP) | Code repositories |
|
||||
| **Immich** | Local accounts | No | Photo management |
|
||||
|
||||
### **Access Levels**
|
||||
|
||||
```
|
||||
ADMIN (You)
|
||||
├── Full access to all services
|
||||
├── Portainer management
|
||||
├── Infrastructure SSH access
|
||||
└── Backup management
|
||||
|
||||
FAMILY
|
||||
├── Media services (Plex, Jellyfin)
|
||||
├── Photo sharing (Immich)
|
||||
└── Limited service access
|
||||
|
||||
GUESTS
|
||||
├── Guest WiFi only
|
||||
└── No internal service access
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🗝️ Secrets Management
|
||||
|
||||
### **Password Management**
|
||||
- **Vaultwarden**: Self-hosted Bitwarden server
|
||||
- **Location**: Atlantis NAS
|
||||
- **Access**: `vault.vish.gg` via Tailscale
|
||||
- **Backup**: Included in NAS backup rotation
|
||||
|
||||
### **Application Secrets**
|
||||
|
||||
| Secret Type | Storage Method | Location |
|
||||
|-------------|---------------|----------|
|
||||
| **Database passwords** | `.env` files | Per-stack directories |
|
||||
| **API keys** | `.env` files | Per-stack directories |
|
||||
| **SSL certificates** | File system | Nginx Proxy Manager |
|
||||
| **SSH keys** | File system | `~/.ssh/` on each host |
|
||||
| **Portainer env vars** | Portainer UI | Stored in Portainer |
|
||||
|
||||
### **Environment File Security**
|
||||
|
||||
```bash
|
||||
# .env files are:
|
||||
# ✅ Git-ignored (not committed to repos)
|
||||
# ✅ Readable only by root/docker
|
||||
# ✅ Backed up with NAS backups
|
||||
# ⚠️ Not encrypted at rest (TODO)
|
||||
|
||||
# Best practices:
|
||||
chmod 600 .env
|
||||
chown root:docker .env
|
||||
```
|
||||
|
||||
### **Future Improvements** (TODO)
|
||||
- [ ] Implement HashiCorp Vault or similar
|
||||
- [ ] Docker secrets for sensitive data
|
||||
- [ ] Encrypted .env files
|
||||
- [ ] Automated secret rotation
|
||||
|
||||
---
|
||||
|
||||
## 🔒 SSL/TLS Configuration
|
||||
|
||||
### **Certificate Strategy**
|
||||
|
||||
| Domain/Service | Certificate Type | Provider | Auto-Renewal |
|
||||
|---------------|-----------------|----------|--------------|
|
||||
| `*.vish.gg` | Wildcard | Cloudflare (via NPM) | Yes |
|
||||
| Internal services | Let's Encrypt | ACME DNS challenge | Yes |
|
||||
| Self-signed | Local CA | Manual | No |
|
||||
|
||||
### **Nginx Proxy Manager**
|
||||
Primary reverse proxy handling SSL termination:
|
||||
|
||||
```
|
||||
Internet ──► Cloudflare ──► Router:443 ──► NPM ──► Internal Services
|
||||
│
|
||||
├── plex.vish.gg ──► Atlantis:32400
|
||||
├── grafana.vish.gg ──► Homelab:3000
|
||||
├── git.vish.gg ──► Calypso:3000
|
||||
└── ... (other services)
|
||||
```
|
||||
|
||||
### **SSL Configuration**
|
||||
- **Protocol**: TLS 1.2+ only
|
||||
- **Ciphers**: Modern cipher suite
|
||||
- **HSTS**: Enabled for public services
|
||||
- **Certificate transparency**: Enabled via Cloudflare
|
||||
|
||||
---
|
||||
|
||||
## 💾 Backup Security
|
||||
|
||||
### **Backup Locations**
|
||||
|
||||
| Location | Type | Encryption | Purpose |
|
||||
|----------|------|------------|---------|
|
||||
| **Atlantis** | Primary | At-rest (Synology) | Local fast recovery |
|
||||
| **Calypso** | Secondary | At-rest (Synology) | Local redundancy |
|
||||
| **Backblaze B2** | Offsite | In-transit + at-rest | Disaster recovery |
|
||||
|
||||
### **Backup Encryption**
|
||||
- **Synology Hyper Backup**: AES-256 encryption option
|
||||
- **Backblaze B2**: Server-side encryption enabled
|
||||
- **Transit**: All backups use TLS in transit
|
||||
|
||||
### **3-2-1 Backup Status**
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────┐
|
||||
│ 3-2-1 BACKUP RULE │
|
||||
├─────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ 3 Copies: │
|
||||
│ ├── 1. Original data (Atlantis) ✅ │
|
||||
│ ├── 2. Local backup (Calypso) ✅ │
|
||||
│ └── 3. Offsite backup (Backblaze) ✅ │
|
||||
│ │
|
||||
│ 2 Media Types: │
|
||||
│ ├── NAS storage (Synology) ✅ │
|
||||
│ └── Cloud storage (Backblaze B2) ✅ │
|
||||
│ │
|
||||
│ 1 Offsite: │
|
||||
│ └── Backblaze B2 (cloud) ✅ │
|
||||
│ │
|
||||
│ STATUS: ✅ Compliant │
|
||||
└─────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🕵️ Monitoring & Intrusion Detection
|
||||
|
||||
### **Active Monitoring**
|
||||
| Tool | Purpose | Alerts |
|
||||
|------|---------|--------|
|
||||
| **Uptime Kuma** | Service availability | ntfy, Signal |
|
||||
| **Prometheus** | Metrics collection | Alertmanager |
|
||||
| **Grafana** | Visualization | Dashboard alerts |
|
||||
| **WatchYourLAN** | Network device discovery | New device alerts |
|
||||
|
||||
### **Log Management**
|
||||
- **Dozzle**: Real-time Docker log viewer
|
||||
- **Synology Log Center**: NAS system logs
|
||||
- **Promtail/Loki**: Centralized logging (planned)
|
||||
|
||||
### **Security Alerts**
|
||||
- Failed SSH attempts (via fail2ban where deployed)
|
||||
- New devices on network (WatchYourLAN)
|
||||
- Service downtime (Uptime Kuma)
|
||||
- Backup failures (Hyper Backup notifications)
|
||||
|
||||
---
|
||||
|
||||
## 🚨 Incident Response
|
||||
|
||||
### **Compromise Response Plan**
|
||||
|
||||
1. **Isolate**: Disconnect affected system from network
|
||||
2. **Assess**: Determine scope of compromise
|
||||
3. **Contain**: Block attacker access, change credentials
|
||||
4. **Eradicate**: Remove malware, patch vulnerabilities
|
||||
5. **Recover**: Restore from known-good backup
|
||||
6. **Review**: Document incident, improve defenses
|
||||
|
||||
### **Emergency Access**
|
||||
- **Physical access**: Always available for NAS/servers
|
||||
- **Tailscale**: Works even if DNS is compromised
|
||||
- **Out-of-band**: Console access via IPMI/iLO where available
|
||||
|
||||
---
|
||||
|
||||
## 📋 Security Checklist
|
||||
|
||||
### **Regular Tasks**
|
||||
- [ ] Weekly: Review Uptime Kuma alerts
|
||||
- [ ] Monthly: Check for service updates
|
||||
- [ ] Monthly: Review Cloudflare analytics
|
||||
- [ ] Quarterly: Rotate critical passwords
|
||||
- [ ] Quarterly: Test backup restoration
|
||||
|
||||
### **Annual Review**
|
||||
- [ ] Audit all service accounts
|
||||
- [ ] Review firewall rules
|
||||
- [ ] Update SSL certificates (if manual)
|
||||
- [ ] Security assessment of new services
|
||||
- [ ] Update this documentation
|
||||
|
||||
---
|
||||
|
||||
## 🔮 Future Security Improvements
|
||||
|
||||
| Priority | Improvement | Status |
|
||||
|----------|-------------|--------|
|
||||
| High | VLAN segmentation | Planned |
|
||||
| High | Centralized auth (Authentik/Authelia) | Planned |
|
||||
| Medium | HashiCorp Vault for secrets | Planned |
|
||||
| Medium | Automated security scanning | Planned |
|
||||
| Low | IDS/IPS (Suricata/Snort) | Considering |
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- **[Network Architecture](networking.md)**: Detailed network setup
|
||||
- **[Storage Systems](storage.md)**: Backup and storage configuration
|
||||
- **[Host Infrastructure](hosts.md)**: Server and NAS documentation
|
||||
|
||||
---
|
||||
|
||||
*Security is an ongoing process. This documentation is updated as the infrastructure evolves.*
|
||||
229
docs/infrastructure/service-dependency-map.md
Normal file
229
docs/infrastructure/service-dependency-map.md
Normal file
@@ -0,0 +1,229 @@
|
||||
# Service Dependency Map
|
||||
|
||||
*Last Updated: 2026-02-26*
|
||||
|
||||
This document provides a comprehensive visual and reference guide for understanding service dependencies in the homelab infrastructure.
|
||||
|
||||
---
|
||||
|
||||
## Architecture Layers
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ EXTERNAL ACCESS │
|
||||
│ Cloudflare → DDNS → Home Router → Nginx Proxy Manager │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ CORE INFRASTRUCTURE LAYER │
|
||||
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────┐ │
|
||||
│ │ Authentik │ │ NPM │ │ Prometheus │ │ Vault │ │
|
||||
│ │ (SSO) │ │ (Proxy) │ │ (Monitoring)│ │ (Secrets) │ │
|
||||
│ └─────────────┘ └─────────────┘ └─────────────┘ └───────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────────────────────┐
|
||||
│ APPLICATION LAYER │
|
||||
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
|
||||
│ │ Media │ │ Dev │ │ Comms │ │ Photos │ │Productivy│ │
|
||||
│ │ Stack │ │ Stack │ │ Stack │ │ Stack │ │ Stack │ │
|
||||
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
|
||||
└─────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Critical Service Dependencies
|
||||
|
||||
### Tier 1: Foundation Services
|
||||
These services must be running for other services to function:
|
||||
|
||||
| Service | Host | Port | Dependencies | Depended By |
|
||||
|---------|------|------|--------------|-------------|
|
||||
| **Nginx Proxy Manager** | Calypso | 80, 443 | Docker | All web services |
|
||||
| **Authentik** | Calypso | 9000 | PostgreSQL, Redis | All SSO-enabled services |
|
||||
| **Vaultwarden** | Atlantis | 8080 | SQLite | Credential storage |
|
||||
| **Prometheus** | Homelab VM | 9090 | Node exporters | Grafana, Alertmanager |
|
||||
|
||||
### Tier 2: Operational Services
|
||||
These depend on Tier 1 and support multiple other services:
|
||||
|
||||
| Service | Host | Dependencies | Depended By |
|
||||
|---------|------|--------------|-------------|
|
||||
| **Grafana** | Homelab VM | Prometheus | Dashboards |
|
||||
| **Alertmanager** | Homelab VM | Prometheus | ntfy, Signal |
|
||||
| **Pi-hole** | Multiple | Network | DNS resolution |
|
||||
| **AdGuard Home** | Concord NUC | Network | DNS filtering |
|
||||
| **Syncthing** | Multiple | Storage | Config sync |
|
||||
| **PostgreSQL** | Various | Storage | Authentik, Gitea |
|
||||
| **Redis** | Various | Memory | Authentik, caching |
|
||||
|
||||
### Tier 3: Application Services
|
||||
End-user services that depend on Tiers 1-2:
|
||||
|
||||
| Category | Services | Dependencies |
|
||||
|----------|----------|--------------|
|
||||
| **Media** | Plex, Jellyfin, arr-stack | Media storage, network |
|
||||
| **Communication** | Matrix, Mastodon, Mattermost | Authentik, PostgreSQL |
|
||||
| **Photos** | Immich | PostgreSQL, S3/Local storage |
|
||||
| **Development** | Gitea, Portainer | PostgreSQL, Docker |
|
||||
| **Productivity** | Paperless, Wallabag, Reactive Resume | Storage, Auth (optional) |
|
||||
|
||||
---
|
||||
|
||||
## Service Dependency Graph
|
||||
|
||||
### Authentication Flow
|
||||
```
|
||||
User → NPM (SSL) → Authentik (OIDC) → Service
|
||||
↑
|
||||
└── Redis (sessions)
|
||||
└── PostgreSQL (users)
|
||||
```
|
||||
|
||||
### Monitoring Flow
|
||||
```
|
||||
Node Exporters → Prometheus → Alertmanager → ntfy
|
||||
│
|
||||
└── Grafana (dashboards)
|
||||
```
|
||||
|
||||
### Media Stack Flow
|
||||
```
|
||||
Prowlarr (indexers)
|
||||
↓
|
||||
Sonarr/Radarr/Lidarr (requests)
|
||||
↓
|
||||
qBittorrent/SABnzbd (downloads)
|
||||
↓
|
||||
Plex/Jellyfin (streaming)
|
||||
```
|
||||
|
||||
### External Access Flow
|
||||
```
|
||||
Internet → Cloudflare → Home Router → NPM → Service
|
||||
↓
|
||||
Authentik (if enabled)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Host Service Mapping
|
||||
|
||||
### Atlantis (Synology DS1821+)
|
||||
- **Primary Role**: Media server, Vaultwarden, Immich
|
||||
- **Services**: Vaultwarden, Immich, Ollama, Plex
|
||||
- **Critical Dependencies**: Storage volumes, network
|
||||
|
||||
### Calypso (Synology DS723+)
|
||||
- **Primary Role**: Infrastructure, Proxy, Auth
|
||||
- **Services**: NPM, Authentik, Paperless, Reactive Resume
|
||||
- **Critical Dependencies**: Storage volumes
|
||||
|
||||
### Concord NUC
|
||||
- **Primary Role**: DNS, AdGuard, Light services
|
||||
- **Services**: AdGuard Home, various lightweight apps
|
||||
- **Critical Dependencies**: Network
|
||||
|
||||
### Homelab VM
|
||||
- **Primary Role**: Monitoring, CI/CD
|
||||
- **Services**: Prometheus, Grafana, Alertmanager, Gitea Runner
|
||||
- **Critical Dependencies**: Prometheus data volume
|
||||
|
||||
### RPi5
|
||||
- **Primary Role**: Edge/Immich
|
||||
- **Services**: Immich (edge)
|
||||
- **Critical Dependencies**: Network, storage mount
|
||||
|
||||
---
|
||||
|
||||
## Startup Order
|
||||
|
||||
When bringing up the infrastructure after a complete outage:
|
||||
|
||||
### Phase 1: Hardware & Network (0-5 min)
|
||||
1. Synology NAS (Atlantis, Calypso)
|
||||
2. Network equipment (router, switches)
|
||||
3. Home Assistant (Zigbee/Z-Wave)
|
||||
|
||||
### Phase 2: Core Services (5-15 min)
|
||||
1. **Vaultwarden** - Access to credentials
|
||||
2. **PostgreSQL** - Database foundation
|
||||
3. **Redis** - Session/caching
|
||||
4. **Authentik** - SSO identity
|
||||
5. **Nginx Proxy Manager** - External access
|
||||
|
||||
### Phase 3: Monitoring (15-20 min)
|
||||
1. **Prometheus** - Metrics collection
|
||||
2. **Node Exporters** - System metrics
|
||||
3. **Grafana** - Dashboards
|
||||
4. **Alertmanager** - Notifications
|
||||
|
||||
### Phase 4: Applications (20-45 min)
|
||||
1. **Syncthing** - Config sync
|
||||
2. **Media Stack** - Plex, arr applications
|
||||
3. **Communication** - Matrix, Mastodon
|
||||
4. **Development** - Gitea, Portainer
|
||||
5. **Productivity** - Paperless, etc.
|
||||
|
||||
### Phase 5: Optional (45+ min)
|
||||
1. Gaming servers
|
||||
2. AI/ML services (Ollama)
|
||||
3. Experimental applications
|
||||
|
||||
---
|
||||
|
||||
## Failure Impact Analysis
|
||||
|
||||
| Service Down | Impact | Affected Services |
|
||||
|--------------|--------|-------------------|
|
||||
| **NPM** | External access broken | All web services |
|
||||
| **Authentik** | SSO broken | Grafana, Portainer, SSO-enabled apps |
|
||||
| **Prometheus** | Monitoring silent | Grafana, Alertmanager |
|
||||
| **Vaultwarden** | Can't access credentials | All (if credentials needed) |
|
||||
| **Atlantis (NAS)** | Storage issues | Media, Immich, Vaultwarden |
|
||||
| **Pi-hole** | DNS issues | Local network |
|
||||
|
||||
---
|
||||
|
||||
## Checking Dependencies
|
||||
|
||||
### Docker Compose
|
||||
```bash
|
||||
cd hosts/synology/atlantis
|
||||
docker-compose config
|
||||
```
|
||||
|
||||
### Portainer
|
||||
1. Open Portainer → Stacks → Select stack
|
||||
2. View "Service dependencies" in the UI
|
||||
|
||||
### Ansible Dependency Map
|
||||
```bash
|
||||
ansible-playbook ansible/automation/playbooks/container_dependency_map.yml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Common Dependency Issues
|
||||
|
||||
### Service Won't Start
|
||||
1. Check logs: `docker-compose logs <service>`
|
||||
2. Verify dependency is running: `docker ps | grep <dependency>`
|
||||
3. Check restart policy
|
||||
|
||||
### Intermittent Failures
|
||||
1. Check resource availability (CPU, memory, disk)
|
||||
2. Verify network connectivity between hosts
|
||||
3. Check for circular dependencies
|
||||
|
||||
### After Reboot
|
||||
1. Verify Docker starts automatically
|
||||
2. Check container restart policies
|
||||
3. Monitor logs for startup order issues
|
||||
|
||||
---
|
||||
|
||||
*For detailed troubleshooting, see [Troubleshooting Guide](../troubleshooting/common-issues.md)*
|
||||
239
docs/infrastructure/split-horizon-dns.md
Normal file
239
docs/infrastructure/split-horizon-dns.md
Normal file
@@ -0,0 +1,239 @@
|
||||
# Split-Horizon DNS Implementation Guide
|
||||
|
||||
Last updated: 2026-03-20
|
||||
|
||||
## Problem
|
||||
|
||||
All DNS queries for `*.vish.gg`, `*.thevish.io`, and `*.crista.love` currently resolve to Cloudflare proxy IPs (104.21.x.x), even when the client is on the same LAN as the services. This means:
|
||||
|
||||
1. **Hairpin NAT** — LAN traffic goes out to Cloudflare and back in through the router
|
||||
2. **Internet dependency** — if the WAN link goes down, LAN services are unreachable by domain
|
||||
3. **Added latency** — ~50ms roundtrip through Cloudflare vs ~1ms on LAN
|
||||
4. **Cloudflare bottleneck** — all traffic proxied through CF even when unnecessary
|
||||
|
||||
## Solution
|
||||
|
||||
**Status: IMPLEMENTED (2026-03-20)**
|
||||
|
||||
Use AdGuard Home on Calypso (primary) and Atlantis (backup) as **split-horizon DNS resolvers** that return local IPs for homelab domains when queried from the LAN, while external clients continue to use Cloudflare.
|
||||
|
||||
```
|
||||
┌──────────────────────────────────┐
|
||||
│ DNS Query for │
|
||||
│ nb.vish.gg │
|
||||
└───────────────┬──────────────────┘
|
||||
│
|
||||
┌───────────────▼──────────────────┐
|
||||
│ Where is the client? │
|
||||
└───────┬───────────────┬──────────┘
|
||||
│ │
|
||||
LAN Client External Client
|
||||
│ │
|
||||
▼ ▼
|
||||
┌──────────────┐ ┌──────────────┐
|
||||
│ AdGuard Home │ │ Cloudflare │
|
||||
│ (Calypso + │ │ DNS │
|
||||
│ Atlantis) │ │ │
|
||||
│ Returns: │ │ Returns: │
|
||||
│100.85.21.51 │ │ 104.21.73.214│
|
||||
│(NPM Tailscale)│ │ (CF proxy) │
|
||||
└──────┬───────┘ └──────┬───────┘
|
||||
│ │
|
||||
▼ ▼
|
||||
┌──────────────┐ ┌──────────────┐
|
||||
│ NPM (local) │ │ Cloudflare │
|
||||
│ matrix-ubuntu│ │ → WAN IP │
|
||||
│ :443 ~1ms │ │ → NPM │
|
||||
└──────┬───────┘ │ ~50ms │
|
||||
│ └──────┬───────┘
|
||||
▼ ▼
|
||||
┌─────────────────────────────────┐
|
||||
│ Backend Service │
|
||||
│ (same result, faster path) │
|
||||
└─────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Prerequisites
|
||||
|
||||
NPM is now on matrix-ubuntu (192.168.0.154) listening on standard ports 80/443/81. The migration from Calypso was completed on 2026-03-20.
|
||||
|
||||
| Port | Status |
|
||||
|------|--------|
|
||||
| 80:80 | **Active** |
|
||||
| 443:443 | **Active** |
|
||||
| 81:81 | **Active** (Admin UI) |
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
### Step 1: Move NPM to Standard Ports -- DONE
|
||||
|
||||
NPM migrated from Calypso to matrix-ubuntu (192.168.0.154) on 2026-03-20. Compose file: `hosts/vms/matrix-ubuntu/nginx-proxy-manager.yaml`. Host nginx on matrix-ubuntu has been disabled (`systemctl disable nginx`); NPM now handles mastodon.vish.gg, mx.vish.gg, and mm.crista.love directly.
|
||||
|
||||
Router port forwards updated:
|
||||
- `WAN:443 → 192.168.0.154:443`
|
||||
- `WAN:80 → 192.168.0.154:80`
|
||||
|
||||
### Step 2: Configure AdGuard DNS Rewrites -- DONE
|
||||
|
||||
AdGuard DNS rewrites configured on both Calypso (http://192.168.0.250:9080) and Atlantis (http://192.168.0.200:9080). Wildcard entries point to NPM's Tailscale IP:
|
||||
|
||||
| Domain | Answer | Notes |
|
||||
|--------|--------|-------|
|
||||
| `*.vish.gg` | `100.85.21.51` | All vish.gg domains → NPM Tailscale IP |
|
||||
| `*.thevish.io` | `100.85.21.51` | All thevish.io domains → NPM Tailscale IP |
|
||||
| `*.crista.love` | `100.85.21.51` | All crista.love domains → NPM Tailscale IP |
|
||||
|
||||
These three wildcards cover all 36 proxy hosts. AdGuard resolves matching queries locally instead of forwarding to upstream DNS.
|
||||
|
||||
**Exceptions** — these domains need direct IPs (not NPM), added as specific overrides:
|
||||
|
||||
| Domain | Answer | Reason |
|
||||
|--------|--------|--------|
|
||||
| `mx.vish.gg` | `192.168.0.154` | Matrix federation needs direct access on port 8448 |
|
||||
| `derp.vish.gg` | `192.168.0.250` | DERP relay — direct IP, no CF proxy |
|
||||
| `derp-atl.vish.gg` | `192.168.0.200` | Atlantis DERP relay |
|
||||
| `headscale.vish.gg` | `192.168.0.250` | Headscale control — direct access |
|
||||
| `turn.thevish.io` | `192.168.0.200` | TURN/STUN needs direct UDP |
|
||||
|
||||
**.tail.vish.gg overrides** — specific rewrites to override the wildcard for Tailscale-specific subdomains.
|
||||
|
||||
Specific entries take priority over wildcards in AdGuard.
|
||||
|
||||
### Step 3: Set AdGuard as LAN DNS Server -- DONE
|
||||
|
||||
Router (Archer BE800) DHCP configured with dual AdGuard DNS:
|
||||
|
||||
1. **Primary DNS:** `192.168.0.250` (Calypso AdGuard)
|
||||
2. **Secondary DNS:** `192.168.0.200` (Atlantis AdGuard, backup)
|
||||
|
||||
### Step 4: Configure Atlantis AdGuard (Backup DNS) -- DONE
|
||||
|
||||
Same DNS rewrites added to Atlantis's AdGuard instance (http://192.168.0.200:9080) as backup:
|
||||
|
||||
- Same wildcard rewrites as Calypso (pointing to `100.85.21.51`)
|
||||
- Reachable at `192.168.0.200`
|
||||
|
||||
### Step 5: Test
|
||||
|
||||
```bash
|
||||
# Verify local resolution
|
||||
dig nb.vish.gg @192.168.0.250
|
||||
# Expected: 192.168.0.250 (NPM local IP)
|
||||
|
||||
# Verify external resolution still works
|
||||
dig nb.vish.gg @1.1.1.1
|
||||
# Expected: 104.21.73.214 (Cloudflare proxy)
|
||||
|
||||
# Test HTTPS access via local DNS
|
||||
curl -s --resolve "nb.vish.gg:443:192.168.0.250" https://nb.vish.gg/ -o /dev/null -w "%{http_code} %{time_total}s\n"
|
||||
# Expected: 200 in ~0.05s (vs ~0.15s through Cloudflare)
|
||||
|
||||
# Test all domains resolve locally
|
||||
for domain in nb.vish.gg gf.vish.gg git.vish.gg sso.vish.gg dash.vish.gg; do
|
||||
ip=$(dig +short $domain @192.168.0.250 | tail -1)
|
||||
echo "$domain → $ip"
|
||||
done
|
||||
```
|
||||
|
||||
## SSL Considerations
|
||||
|
||||
**Resolved:** NPM now uses **Let's Encrypt wildcard certificates** (DNS challenge via Cloudflare API) instead of Cloudflare Origin certs. This means:
|
||||
|
||||
- Certs are trusted by all browsers, whether traffic comes through Cloudflare or directly via LAN
|
||||
- No browser warnings for split-horizon DNS LAN access
|
||||
- Certs auto-renew via NPM's built-in Let's Encrypt integration
|
||||
|
||||
## What Changes for Each Path
|
||||
|
||||
### LAN Client
|
||||
```
|
||||
Browser → nb.vish.gg
|
||||
→ AdGuard DNS: 100.85.21.51 (NPM Tailscale IP)
|
||||
→ NPM (matrix-ubuntu:443) → SSL termination (LE wildcard cert)
|
||||
→ Proxy to backend (192.168.0.210:8443)
|
||||
→ Response (~1ms total DNS+proxy)
|
||||
```
|
||||
|
||||
### External Client
|
||||
```
|
||||
Browser → nb.vish.gg
|
||||
→ Cloudflare DNS: 104.21.73.214
|
||||
→ Cloudflare proxy → WAN IP → Router
|
||||
→ NPM (matrix-ubuntu:443) → SSL termination
|
||||
→ Proxy to backend (192.168.0.210:8443)
|
||||
→ Response (~50ms total)
|
||||
```
|
||||
|
||||
### Internet Down
|
||||
```
|
||||
Browser → nb.vish.gg
|
||||
→ AdGuard DNS: 100.85.21.51 (cached/local)
|
||||
→ NPM (matrix-ubuntu:443) → SSL termination
|
||||
→ Proxy to backend
|
||||
→ Response (services still work!)
|
||||
```
|
||||
|
||||
## Current NPM Proxy Hosts (for reference)
|
||||
|
||||
All 36 domains that would benefit from split-horizon:
|
||||
|
||||
### vish.gg (27 domains)
|
||||
| Domain | Backend |
|
||||
|--------|---------|
|
||||
| actual.vish.gg | calypso:8304 |
|
||||
| cal.vish.gg | atlantis:12852 |
|
||||
| dash.vish.gg | atlantis:7575 |
|
||||
| dav.vish.gg | calypso:8612 |
|
||||
| docs.vish.gg | calypso:8777 |
|
||||
| gf.vish.gg | homelab-vm:3300 |
|
||||
| git.vish.gg | calypso:3052 |
|
||||
| headscale.vish.gg | calypso:8085 |
|
||||
| kuma.vish.gg | rpi5:3001 |
|
||||
| mastodon.vish.gg | matrix-ubuntu:3000 |
|
||||
| mx.vish.gg | matrix-ubuntu:8082 |
|
||||
| nb.vish.gg | homelab-vm:8443 |
|
||||
| npm.vish.gg | calypso:81 |
|
||||
| ntfy.vish.gg | homelab-vm:8081 |
|
||||
| ollama.vish.gg | atlantis:11434 |
|
||||
| ost.vish.gg | calypso:3000 |
|
||||
| paperless.vish.gg | calypso:8777 |
|
||||
| pt.vish.gg | atlantis:10000 |
|
||||
| pw.vish.gg | atlantis:4080 |
|
||||
| rackula.vish.gg | calypso:3891 |
|
||||
| retro.vish.gg | calypso:8025 |
|
||||
| rx.vish.gg | calypso:9751 |
|
||||
| rxdl.vish.gg | calypso:9753 |
|
||||
| scrutiny.vish.gg | homelab-vm:8090 |
|
||||
| sf.vish.gg | calypso:8611 |
|
||||
| sso.vish.gg | calypso:9000 |
|
||||
| wizarr.vish.gg | atlantis:5690 |
|
||||
|
||||
### thevish.io (5 domains)
|
||||
| Domain | Backend |
|
||||
|--------|---------|
|
||||
| binterest.thevish.io | homelab-vm:21544 |
|
||||
| hoarder.thevish.io | homelab-vm:3482 |
|
||||
| joplin.thevish.io | atlantis:22300 |
|
||||
| matrix.thevish.io | matrix-ubuntu:8081 |
|
||||
| meet.thevish.io | atlantis:5443 |
|
||||
|
||||
### crista.love (2 domains)
|
||||
| Domain | Backend |
|
||||
|--------|---------|
|
||||
| crista.love | guava:28888 |
|
||||
| cocalc.crista.love | guava:8080 |
|
||||
| mm.crista.love | matrix-ubuntu:8065 |
|
||||
|
||||
## Rollback
|
||||
|
||||
If something breaks:
|
||||
1. Change router DHCP DNS back to `1.1.1.1` / `8.8.8.8`
|
||||
2. Or remove the DNS rewrites from AdGuard
|
||||
3. All traffic reverts to Cloudflare path immediately
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [NPM Migration](npm-migration-jan2026.md) — Reverse proxy configuration
|
||||
- [Authentik SSO](authentik-sso.md) — Forward auth depends on NPM routing
|
||||
- [Cloudflare DNS](cloudflare-dns.md) — External DNS records
|
||||
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md) — Mentions Gitea/NPM as bootstrap dependencies
|
||||
61
docs/infrastructure/ssh-hosts.md
Normal file
61
docs/infrastructure/ssh-hosts.md
Normal file
@@ -0,0 +1,61 @@
|
||||
# SSH Host Reference
|
||||
|
||||
Quick reference for all SSH-accessible hosts in the homelab.
|
||||
|
||||
## Hosts
|
||||
|
||||
| SSH Alias | Hostname/IP | User | Port | Auth | Network | Role |
|
||||
|-----------|-------------|------|------|------|---------|------|
|
||||
| `atlantis` | 100.83.230.112 | vish | 60000 | key | Tailscale | Primary NAS (DS1823xs+) |
|
||||
| `calypso` | 100.103.48.78 | Vish | 62000 | key | Tailscale | Dev NAS (DS723+) |
|
||||
| `setillo` | 100.125.0.20 | vish | 22 | key | Tailscale | Monitoring NAS (Tucson) |
|
||||
| `setillo-root` | 100.125.0.20 | root | 22 | key | Tailscale | Setillo root access |
|
||||
| `guava` / `truenas` | 100.75.252.64 | vish | 22 | key | Tailscale | TrueNAS Scale server |
|
||||
| `nuc` / `concord` | 100.72.55.21 | vish | 22 | key | Tailscale | Home automation NUC |
|
||||
| `pi-5` | 100.77.151.40 | vish | 22 | key | Tailscale | Raspberry Pi 5 |
|
||||
| `jellyfish` | 100.69.121.120 | lulu | 22 | key | Tailscale | Pi 5 photo server |
|
||||
| `olares` | 192.168.0.145 | olares | 22 | key | LAN only | Kubernetes/LLM appliance |
|
||||
| `moon` | 100.64.0.6 | vish | 22 | key | Tailscale | Dev workstation |
|
||||
| `shinku-ryuu` | 100.98.93.15 | vish | 22 | key | Tailscale | Main desktop (Windows/WSL) |
|
||||
| `homelab` | 100.67.40.126 | homelab | 22 | password | Tailscale | Homelab VM (this host) |
|
||||
| `seattle` | YOUR_WAN_IP | root | 22 | key | Public IP | Contabo VPS |
|
||||
| `seattle-tailscale` | 100.82.197.124 | root | 22 | key | Tailscale | Contabo VPS (Tailscale) |
|
||||
| `pve` | 100.87.12.28 | root | 22 | key | Tailscale | Proxmox hypervisor |
|
||||
| `homeassistant` | 100.112.186.90 | hassio | 22 | key | Tailscale | Home Assistant |
|
||||
| `laptop` | 100.124.91.52 | vish | 22 | key | Tailscale | MSI Prestige laptop |
|
||||
| `matrix-ubuntu` | 192.168.0.154 | test | 22 | key | LAN | Matrix server |
|
||||
| `mastodon-rocky` | 100.64.0.3 | root | 22 | key | Tailscale | Mastodon instance |
|
||||
| `vishdebian` | 100.64.0.2 | vish | 22 | key | Tailscale | Debian VM |
|
||||
| `gl-mt3000` | 100.126.243.15 | root | 22 | key | Tailscale | GL.iNet travel router |
|
||||
| `gl-be3600` | 100.105.59.123 | root | 22 | key | Tailscale | GL.iNet router |
|
||||
|
||||
## Network Access
|
||||
|
||||
### Tailscale (Headscale)
|
||||
- **Control server**: `https://headscale.vish.gg:8443`
|
||||
- **Admin UI (Headplane)**: `https://headscale.vish.gg:8443/admin`
|
||||
- **Headscale runs on**: Calypso (Docker)
|
||||
- **User**: vish (ID: 1)
|
||||
- **Pre-auth key generation**:
|
||||
```bash
|
||||
ssh calypso 'sudo /usr/local/bin/docker exec headscale headscale preauthkeys create --user 1 --expiration 1h'
|
||||
```
|
||||
|
||||
### LAN-only Hosts
|
||||
- **olares** (192.168.0.145) — Cannot run host-level Tailscale (conflicts with K8s Tailscale pod)
|
||||
- **matrix-ubuntu** (192.168.0.154) — Local network only
|
||||
|
||||
## SSH Config
|
||||
|
||||
Source: `~/.ssh/config` on the homelab VM (192.168.0.210)
|
||||
|
||||
All hosts use `~/.ssh/id_ed25519` for key auth except:
|
||||
- `homelab` — uses password authentication
|
||||
|
||||
## Gitea SSH
|
||||
|
||||
```
|
||||
Host git.vish.gg
|
||||
Port 2222
|
||||
User git
|
||||
```
|
||||
318
docs/infrastructure/ssl-tls-management.md
Normal file
318
docs/infrastructure/ssl-tls-management.md
Normal file
@@ -0,0 +1,318 @@
|
||||
# SSL/TLS Certificate Management
|
||||
|
||||
*Managing SSL certificates for the homelab infrastructure*
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The homelab uses Nginx Proxy Manager (NPM) as the primary certificate authority, with Let's Encrypt providing free SSL certificates.
|
||||
|
||||
---
|
||||
|
||||
## Certificate Authorities
|
||||
|
||||
### Primary: Let's Encrypt
|
||||
- **Provider:** Let's Encrypt
|
||||
- **Validation:** HTTP-01 (automatic via NPM)
|
||||
- **Renewal:** Automatic at 90 days
|
||||
- **Domains:** *.vish.local, *.vish.gg
|
||||
|
||||
### Secondary: Self-Signed
|
||||
- **Use:** Internal services (non-public)
|
||||
- **Tool:** OpenSSL
|
||||
- **Regeneration:** As needed
|
||||
|
||||
---
|
||||
|
||||
## Certificate Locations
|
||||
|
||||
### Nginx Proxy Manager
|
||||
```
|
||||
/opt/docker/npm/data/
|
||||
├── letsencrypt/
|
||||
│ └── accounts/
|
||||
│ └── acme-v02.api.letsencrypt.org/
|
||||
└── ssl/
|
||||
└── <domain>/
|
||||
├── fullchain.pem
|
||||
├── privkey.pem
|
||||
└── bundle.crt
|
||||
```
|
||||
|
||||
### Services with Own Certs
|
||||
- **Authentik:** `/opt/authentik/ssl/`
|
||||
- **Matrix:** `/etc/matrix-synapse/ssl/`
|
||||
- **PostgreSQL:** `/etc/ssl/private/`
|
||||
|
||||
---
|
||||
|
||||
## Adding New Certificates
|
||||
|
||||
### Via NPM UI (Recommended)
|
||||
|
||||
1. Access NPM: `http://calypso.vish.local:81`
|
||||
2. Navigate to **SSL Certificates** → **Add SSL Certificate**
|
||||
3. Enter domain names:
|
||||
- `service.vish.local` (internal)
|
||||
- `service.vish.gg` (public)
|
||||
4. Enable **Force SSL**
|
||||
5. Click **Save**
|
||||
|
||||
### Via CLI (Automation)
|
||||
|
||||
```bash
|
||||
# Using certbot directly
|
||||
certbot certonly --webroot \
|
||||
-w /var/www/html \
|
||||
-d service.vish.local \
|
||||
--agree-tos \
|
||||
--email admin@vish.local
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Certificate Renewal
|
||||
|
||||
### Automatic (Default)
|
||||
- NPM auto-renews 7 days before expiration
|
||||
- No action required
|
||||
- Check logs: NPM → Logs
|
||||
|
||||
### Manual Renewal
|
||||
|
||||
```bash
|
||||
# Force renewal via NPM
|
||||
docker exec nginx-proxy-manager npm --root /etc/npm \
|
||||
force-renew
|
||||
|
||||
# Or via API
|
||||
curl -X POST http://npm/api/nginx/certificates/<id>/renew
|
||||
```
|
||||
|
||||
### Ansible Playbook
|
||||
```bash
|
||||
ansible-playbook ansible/automation/playbooks/certificate_renewal.yml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Certificate Status
|
||||
|
||||
### Check Expiration
|
||||
|
||||
```bash
|
||||
# Via NPM
|
||||
# Navigate to SSL Certificates tab
|
||||
|
||||
# Via openssl
|
||||
echo | openssl s_client -connect service.vish.local:443 2>/dev/null | openssl x509 -noout -dates
|
||||
|
||||
# Via script
|
||||
cd /opt/npm/letsencrypt/live/
|
||||
for cert in */; do
|
||||
echo "$cert: $(openssl x509 -enddate -noout -in "$cert/cert.pem" | cut -d= -f2)"
|
||||
done
|
||||
```
|
||||
|
||||
### Certificate Dashboard
|
||||
|
||||
| Domain | Expiry | Status | Renews |
|
||||
|--------|--------|--------|--------|
|
||||
| vish.gg | +85 days | ✅ Active | Auto |
|
||||
| *.vish.local | +85 days | ✅ Active | Auto |
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Rate Limiting
|
||||
|
||||
**Problem:** Too many certificate requests
|
||||
|
||||
**Solution:**
|
||||
- Wait 1 hour (Let's Encrypt limit)
|
||||
- Use staging environment for testing
|
||||
- Request multiple domains in one cert
|
||||
|
||||
### DNS Validation Failure
|
||||
|
||||
**Problem:** ACME challenge fails
|
||||
|
||||
**Solution:**
|
||||
- Verify DNS A record points to public IP
|
||||
- Check firewall allows port 80
|
||||
- Ensure no CNAME conflicts
|
||||
|
||||
### Mixed Content Warnings
|
||||
|
||||
**Problem:** HTTP resources on HTTPS page
|
||||
|
||||
**Solution:**
|
||||
- Update service config to use HTTPS URLs
|
||||
- For internal services, use HTTP (NPM handles SSL)
|
||||
- Check browser console for details
|
||||
|
||||
### Certificate Mismatch
|
||||
|
||||
**Problem:** Wrong certificate served
|
||||
|
||||
**Solution:**
|
||||
1. Check NPM proxy host settings
|
||||
2. Verify certificate is assigned
|
||||
3. Clear browser cache
|
||||
4. Check for multiple certificates
|
||||
|
||||
---
|
||||
|
||||
## Internal Services (Self-Signed)
|
||||
|
||||
### Creating Self-Signed Cert
|
||||
|
||||
```bash
|
||||
# Create directory
|
||||
mkdir -p /opt/service/ssl
|
||||
|
||||
# Generate certificate
|
||||
openssl req -x509 -nodes -days 365 \
|
||||
-newkey rsa:2048 \
|
||||
-keyout /opt/service/ssl/key.pem \
|
||||
-out /opt/service/ssl/cert.pem \
|
||||
-addext "subjectAltName=DNS:service.local,DNS:service"
|
||||
|
||||
# Set permissions
|
||||
chmod 600 /opt/service/ssl/key.pem
|
||||
```
|
||||
|
||||
### Adding to Trust Store
|
||||
|
||||
```bash
|
||||
# Linux (Ubuntu/Debian)
|
||||
sudo cp /opt/service/ssl/cert.pem /usr/local/share/ca-certificates/service.crt
|
||||
sudo update-ca-certificates
|
||||
|
||||
# macOS
|
||||
sudo security add-trusted-cert -d -r trustRoot -k /Library/Keychains/System.keychain /opt/service/ssl/cert.pem
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Matrix/Synapse Certificates
|
||||
|
||||
### Custom Certificate Setup
|
||||
|
||||
```yaml
|
||||
# docker-compose.yml
|
||||
services:
|
||||
synapse:
|
||||
environment:
|
||||
- SYNAPSE_TLS_CERT_FILE=/ssl/tls.crt
|
||||
- SYNAPSE_TLS_KEY_FILE=/ssl/tls.key
|
||||
volumes:
|
||||
- ./ssl:/ssl:ro
|
||||
```
|
||||
|
||||
### Federation Certificates
|
||||
|
||||
```bash
|
||||
# Add to TLS certificates
|
||||
/usr/local/bin/REDACTED_APP_PASSWORD \
|
||||
--server-name vish.local \
|
||||
--tls-cert /opt/npm/ssl/vish.gg/fullchain.pem \
|
||||
--tls-key /opt/npm/ssl/vish.gg/privkey.pem
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
### Key Permissions
|
||||
```bash
|
||||
# Private keys should be readable only by root
|
||||
chmod 600 /path/to/privkey.pem
|
||||
chown root:root /path/to/privkey.pem
|
||||
```
|
||||
|
||||
### Cipher Suites
|
||||
|
||||
Configure in NPM under **Settings → SSL → Advanced**:
|
||||
|
||||
```
|
||||
ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-SHA384:DHE-RSA-AES256-SHA256
|
||||
```
|
||||
|
||||
### HSTS
|
||||
|
||||
Enable in NPM:
|
||||
- **Settings → SSL → Force HSTS**
|
||||
- Preload recommended
|
||||
|
||||
---
|
||||
|
||||
## Backup
|
||||
|
||||
### Backup Certificates
|
||||
|
||||
```bash
|
||||
# Backup NPM certificates
|
||||
tar -czf backups/ssl-$(date +%Y%m%d).tar.gz \
|
||||
/opt/docker/npm/data/letsencrypt/ \
|
||||
/opt/docker/npm/data/ssl/
|
||||
```
|
||||
|
||||
### Restore
|
||||
|
||||
```bash
|
||||
# Restore
|
||||
tar -xzf backups/ssl-20240101.tar.gz -C /
|
||||
|
||||
# Restart NPM
|
||||
docker-compose -f /opt/docker/npm/docker-compose.yml restart
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Expiration Alerts
|
||||
|
||||
Configure in Prometheus/Alertmanager:
|
||||
```yaml
|
||||
groups:
|
||||
- name: certificates
|
||||
rules:
|
||||
- alert: REDACTED_APP_PASSWORD
|
||||
expr: (certify_not_after - time()) < (86400 * 30)
|
||||
for: 1h
|
||||
labels:
|
||||
severity: warning
|
||||
annotations:
|
||||
summary: "Certificate expiring soon"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Useful Commands
|
||||
|
||||
```bash
|
||||
# Check all certificates
|
||||
docker exec nginx-proxy-manager npm --root /etc/npm list
|
||||
|
||||
# Force renewal
|
||||
docker exec nginx-proxy-manager npm --root /etc/npm force-renew
|
||||
|
||||
# Manual ACME challenge
|
||||
docker exec -it nginx-proxy-manager sh
|
||||
cd /etc/letsencrypt/renewal-hooks/deploy/
|
||||
|
||||
# Verify certificate
|
||||
openssl s_client -connect vish.gg:443 -servername vish.gg
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Links
|
||||
|
||||
- [NPM Documentation](https://nginxproxymanager.com/)
|
||||
- [Let's Encrypt Docs](https://letsencrypt.org/docs/)
|
||||
- [SSL Labs Test](https://www.ssllabs.com/ssltest/)
|
||||
393
docs/infrastructure/storage.md
Normal file
393
docs/infrastructure/storage.md
Normal file
@@ -0,0 +1,393 @@
|
||||
# 💾 Storage Systems
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This document covers the storage architecture, RAID configurations, backup strategies, and data management practices for the homelab infrastructure.
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Storage Architecture Overview
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────────────────────────────┐
|
||||
│ STORAGE INFRASTRUCTURE │
|
||||
├─────────────────────────────────────────────────────────────────────────────┤
|
||||
│ │
|
||||
│ PRIMARY STORAGE BACKUP TARGETS │
|
||||
│ ┌─────────────────────┐ ┌─────────────────────┐ │
|
||||
│ │ ATLANTIS │ │ CALYPSO │ │
|
||||
│ │ Synology NAS │ ──────► │ Synology NAS │ │
|
||||
│ │ │ Hyper │ │ │
|
||||
│ │ 8x 16TB RAID 6 │ Backup │ 2x 12TB RAID 1 │ │
|
||||
│ │ ≈96TB usable │ │ ≈12TB usable │ │
|
||||
│ │ │ │ │ │
|
||||
│ │ + 2x 480GB NVMe │ │ + 2x 480GB NVMe │ │
|
||||
│ │ (SSD Cache) │ │ (SSD Cache) │ │
|
||||
│ └─────────────────────┘ └─────────────────────┘ │
|
||||
│ │ │ │
|
||||
│ │ │ │
|
||||
│ ▼ ▼ │
|
||||
│ ┌─────────────────────────────────────────────────────────┐ │
|
||||
│ │ BACKBLAZE B2 │ │
|
||||
│ │ Cloud Offsite Backup │ │
|
||||
│ │ Encrypted, Versioned Storage │ │
|
||||
│ └─────────────────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ SECONDARY STORAGE │
|
||||
│ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │
|
||||
│ │ GUAVA │ │ SETILLO │ │ PROXMOX │ │
|
||||
│ │ RAID 1 HDD │ │ Single 1TB │ │ Local SSD │ │
|
||||
│ │ + NVMe SSD │ │ │ │ │ │
|
||||
│ └───────────────┘ └───────────────┘ └───────────────┘ │
|
||||
│ │
|
||||
└─────────────────────────────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Storage Summary
|
||||
|
||||
| Host | Total Raw | Usable | RAID Level | Purpose |
|
||||
|------|-----------|--------|------------|---------|
|
||||
| **Atlantis** | 128TB (8x16TB) | ~96TB | RAID 6 | Primary storage, media |
|
||||
| **Calypso** | 24TB (2x12TB) | ~12TB | RAID 1 | Backup, development |
|
||||
| **Guava** | 6TB+ | ~3TB | RAID 1 | AI/ML, compute |
|
||||
| **Setillo** | 1TB | 1TB | Single | Monitoring |
|
||||
| **Proxmox** | ~500GB | 500GB | Local SSD | VM storage |
|
||||
|
||||
---
|
||||
|
||||
## 🏛️ Atlantis - Primary Storage
|
||||
|
||||
### **Hardware Configuration**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|--------------|
|
||||
| **NAS Model** | Synology DS1823xs+ |
|
||||
| **Drive Bays** | 8x 3.5" hot-swap |
|
||||
| **Drives** | 8x Seagate IronWolf Pro 16TB (ST16000NT001) |
|
||||
| **Cache** | 2x WD Black SN750 480GB NVMe |
|
||||
| **RAID Level** | RAID 6 (dual parity) |
|
||||
| **Raw Capacity** | 128TB |
|
||||
| **Usable Capacity** | ~96TB |
|
||||
| **Fault Tolerance** | 2 drive failures |
|
||||
|
||||
### **RAID 6 Benefits**
|
||||
|
||||
```
|
||||
RAID 6 Configuration:
|
||||
┌────┬────┬────┬────┬────┬────┬────┬────┐
|
||||
│ D1 │ D2 │ D3 │ D4 │ D5 │ D6 │ P1 │ P2 │ ← Data + Dual Parity
|
||||
├────┼────┼────┼────┼────┼────┼────┼────┤
|
||||
│ D1 │ D2 │ D3 │ D4 │ D5 │ P1 │ P2 │ D6 │ ← Parity distributed
|
||||
├────┼────┼────┼────┼────┼────┼────┼────┤
|
||||
│ D1 │ D2 │ D3 │ D4 │ P1 │ P2 │ D5 │ D6 │
|
||||
└────┴────┴────┴────┴────┴────┴────┴────┘
|
||||
|
||||
✅ Survives 2 simultaneous drive failures
|
||||
✅ Good read performance
|
||||
✅ 6 drives worth of usable space (75% efficiency)
|
||||
⚠️ Slower writes due to parity calculation
|
||||
```
|
||||
|
||||
### **Volume Layout**
|
||||
|
||||
```
|
||||
/volume1/ (Atlantis - ~96TB usable)
|
||||
│
|
||||
├── /docker/ # Container persistent data
|
||||
│ ├── plex/
|
||||
│ ├── immich/
|
||||
│ ├── grafana/
|
||||
│ └── ... (all stack data)
|
||||
│
|
||||
├── /media/ # Media library
|
||||
│ ├── movies/ # 4K + 1080p movies
|
||||
│ ├── tv/ # TV series
|
||||
│ ├── music/ # Music library
|
||||
│ └── audiobooks/ # Audiobook collection
|
||||
│
|
||||
├── /photos/ # Immich photo library
|
||||
│ ├── library/ # Organized photos
|
||||
│ └── upload/ # Incoming uploads
|
||||
│
|
||||
├── /documents/ # Paperless-NGX
|
||||
│ ├── consume/ # Incoming documents
|
||||
│ └── archive/ # Processed documents
|
||||
│
|
||||
├── /backups/ # Local backup storage
|
||||
│ ├── calypso/ # Cross-NAS backups
|
||||
│ └── vm-snapshots/ # VM backup images
|
||||
│
|
||||
└── /archive/ # Long-term cold storage
|
||||
└── old-projects/
|
||||
```
|
||||
|
||||
### **NVMe SSD Cache**
|
||||
- **Type**: Read-write cache
|
||||
- **Drives**: 2x WD Black SN750 480GB
|
||||
- **Configuration**: RAID 1 (mirrored for safety)
|
||||
- **Purpose**: Accelerate frequently accessed data
|
||||
|
||||
---
|
||||
|
||||
## 🏢 Calypso - Secondary Storage
|
||||
|
||||
### **Hardware Configuration**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|--------------|
|
||||
| **NAS Model** | Synology DS723+ |
|
||||
| **Drive Bays** | 2x 3.5" hot-swap |
|
||||
| **Drives** | 2x Seagate IronWolf Pro 12TB (ST12000NT001) |
|
||||
| **Cache** | 2x WD Black SN750 480GB NVMe |
|
||||
| **RAID Level** | RAID 1 (mirrored) |
|
||||
| **Raw Capacity** | 24TB |
|
||||
| **Usable Capacity** | ~12TB |
|
||||
| **Fault Tolerance** | 1 drive failure |
|
||||
|
||||
### **RAID 1 Benefits**
|
||||
|
||||
```
|
||||
RAID 1 Configuration:
|
||||
┌────────────────┐ ┌────────────────┐
|
||||
│ Drive 1 │ │ Drive 2 │
|
||||
│ (12TB) │◄─► (12TB) │ ← Mirror
|
||||
│ │ │ │
|
||||
│ All data is │ │ Exact copy │
|
||||
│ written to │ │ of Drive 1 │
|
||||
│ both drives │ │ │
|
||||
└────────────────┘ └────────────────┘
|
||||
|
||||
✅ Survives 1 drive failure
|
||||
✅ Fast read performance (can read from either)
|
||||
✅ Simple recovery (just replace failed drive)
|
||||
⚠️ 50% storage efficiency
|
||||
```
|
||||
|
||||
### **Volume Layout**
|
||||
|
||||
```
|
||||
/volume1/ (Calypso - ~12TB usable)
|
||||
│
|
||||
├── /docker/ # Container persistent data
|
||||
│ ├── gitea/
|
||||
│ ├── firefly/
|
||||
│ ├── arr-suite/
|
||||
│ └── ... (dev stacks)
|
||||
│
|
||||
├── /apt-cache/ # APT-Cacher-NG
|
||||
│ └── cache/ # Debian package cache
|
||||
│
|
||||
├── /backups/ # Backup destination
|
||||
│ ├── atlantis/ # Hyper Backup from Atlantis
|
||||
│ └── databases/ # Database dumps
|
||||
│
|
||||
└── /development/ # Development data
|
||||
├── repos/ # Git repositories
|
||||
└── projects/ # Project files
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🖥️ Other Storage Systems
|
||||
|
||||
### **Guava - AI/ML Workstation**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|--------------|
|
||||
| **Primary** | 1TB NVMe SSD (OS + fast storage) |
|
||||
| **Secondary** | 2x HDD in RAID 1 (~3TB usable) |
|
||||
| **Purpose** | AI model storage, datasets, compute scratch |
|
||||
|
||||
### **Setillo - Monitoring**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|--------------|
|
||||
| **Storage** | 1TB single drive |
|
||||
| **Purpose** | Prometheus metrics, AdGuard data |
|
||||
| **Note** | Non-critical data, can be rebuilt |
|
||||
|
||||
### **Proxmox - VM Host**
|
||||
|
||||
| Component | Specification |
|
||||
|-----------|--------------|
|
||||
| **Storage** | ~500GB local SSD |
|
||||
| **Purpose** | VM disk images |
|
||||
| **Backup** | VMs backed up to Atlantis |
|
||||
|
||||
---
|
||||
|
||||
## 📦 Backup Strategy
|
||||
|
||||
### **3-2-1 Rule Implementation**
|
||||
|
||||
| Rule | Implementation | Status |
|
||||
|------|----------------|--------|
|
||||
| **3 Copies** | Original + Calypso + Backblaze | ✅ |
|
||||
| **2 Media Types** | NAS HDDs + Cloud | ✅ |
|
||||
| **1 Offsite** | Backblaze B2 | ✅ |
|
||||
|
||||
### **Backup Flow**
|
||||
|
||||
```
|
||||
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
||||
│ ATLANTIS │────►│ CALYPSO │────►│ BACKBLAZE │
|
||||
│ (Primary) │ │ (Local) │ │ B2 │
|
||||
│ │ │ │ │ (Offsite) │
|
||||
│ Original │ │ Hyper │ │ Cloud │
|
||||
│ Data │ │ Backup │ │ Backup │
|
||||
└─────────────┘ └─────────────┘ └─────────────┘
|
||||
│ │ │
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
Immediate < 24 hours < 24 hours
|
||||
Access Recovery Recovery
|
||||
```
|
||||
|
||||
### **Backup Software**
|
||||
|
||||
| Tool | Source | Destination | Schedule |
|
||||
|------|--------|-------------|----------|
|
||||
| **Synology Hyper Backup** | Atlantis | Calypso | Daily |
|
||||
| **Synology Cloud Sync** | Atlantis | Backblaze B2 | Daily |
|
||||
| **Synology Hyper Backup** | Calypso | Backblaze B2 | Weekly |
|
||||
|
||||
### **What Gets Backed Up**
|
||||
|
||||
| Data Type | Priority | Frequency | Retention |
|
||||
|-----------|----------|-----------|-----------|
|
||||
| **Docker configs** | Critical | Daily | 30 days |
|
||||
| **Databases** | Critical | Daily | 30 days |
|
||||
| **Photos (Immich)** | High | Daily | Forever |
|
||||
| **Documents** | High | Daily | 1 year |
|
||||
| **Media library** | Medium | Weekly | Latest only |
|
||||
| **VM snapshots** | Medium | Weekly | 4 versions |
|
||||
| **Logs** | Low | Not backed up | N/A |
|
||||
|
||||
### **Recovery Time Objectives**
|
||||
|
||||
| Scenario | RTO Target | Recovery Method |
|
||||
|----------|------------|-----------------|
|
||||
| Single file recovery | < 1 hour | Hyper Backup restore |
|
||||
| Service recovery | < 4 hours | Docker volume restore |
|
||||
| Full NAS recovery | < 24 hours | Bare metal + B2 restore |
|
||||
| Disaster recovery | < 48 hours | New hardware + B2 restore |
|
||||
|
||||
---
|
||||
|
||||
## 📂 Shared Storage (NFS/SMB)
|
||||
|
||||
### **Network Shares**
|
||||
|
||||
| Share | Protocol | Host | Access | Purpose |
|
||||
|-------|----------|------|--------|---------|
|
||||
| `/media` | SMB | Atlantis | Read-only (most), RW (arr) | Media streaming |
|
||||
| `/photos` | SMB | Atlantis | RW (Immich user) | Photo backup |
|
||||
| `/docker` | NFS | Atlantis | RW (Docker hosts) | Container data |
|
||||
| `/backups` | SMB | Calypso | RW (backup service) | Backup destination |
|
||||
|
||||
### **Docker Volume Mounts**
|
||||
|
||||
Containers access NAS storage via NFS mounts:
|
||||
|
||||
```yaml
|
||||
# Example: Plex accessing media
|
||||
volumes:
|
||||
- /volume1/docker/plex:/config
|
||||
- /volume1/media:/media:ro
|
||||
```
|
||||
|
||||
### **Permission Model**
|
||||
|
||||
```
|
||||
NAS User: docker (UID 1000)
|
||||
├── Owns /volume1/docker/
|
||||
├── Read access to /volume1/media/
|
||||
└── Write access to specific paths
|
||||
|
||||
NAS User: media (UID 1001)
|
||||
├── Write access to /volume1/media/
|
||||
└── Used by *arr suite for downloads
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📈 Storage Monitoring
|
||||
|
||||
### **Metrics Collected**
|
||||
|
||||
| Metric | Tool | Alert Threshold |
|
||||
|--------|------|-----------------|
|
||||
| Disk usage | Prometheus + Node Exporter | > 85% |
|
||||
| RAID health | Synology DSM | Degraded |
|
||||
| Drive SMART | Synology DSM | Warning/Critical |
|
||||
| I/O latency | Prometheus | > 100ms |
|
||||
| Backup status | Hyper Backup | Failed |
|
||||
|
||||
### **Grafana Dashboard**
|
||||
|
||||
Storage dashboard shows:
|
||||
- Volume utilization trends
|
||||
- I/O throughput
|
||||
- RAID rebuild status
|
||||
- Drive temperatures
|
||||
- Backup completion status
|
||||
|
||||
---
|
||||
|
||||
## 🔮 Storage Expansion Plan
|
||||
|
||||
### **Current Utilization**
|
||||
|
||||
| Host | Used | Total | % Used |
|
||||
|------|------|-------|--------|
|
||||
| Atlantis | ~60TB | 96TB | 62% |
|
||||
| Calypso | ~12TB | 12TB | ~100% |
|
||||
|
||||
### **Future Expansion Options**
|
||||
|
||||
1. **Atlantis**: Already at max capacity (8 bays)
|
||||
- Replace 16TB drives with larger (24TB+) when available
|
||||
- Add expansion unit (DX517)
|
||||
|
||||
2. **Calypso**: At capacity
|
||||
- Replace 12TB drives with 20TB+ drives
|
||||
- Consider migration to larger NAS
|
||||
|
||||
3. **New NAS**: For cold/archive storage
|
||||
- Lower-powered unit for infrequent access
|
||||
- RAID 5 acceptable for archive data
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Maintenance Tasks
|
||||
|
||||
### **Regular Maintenance**
|
||||
|
||||
| Task | Frequency | Procedure |
|
||||
|------|-----------|-----------|
|
||||
| SMART check | Weekly | Review DSM health |
|
||||
| Scrub | Monthly | Synology scheduled task |
|
||||
| Backup verification | Monthly | Test restore of random files |
|
||||
| Capacity review | Quarterly | Plan for growth |
|
||||
|
||||
### **Drive Replacement Procedure**
|
||||
|
||||
1. **Identify failed drive** via DSM notification
|
||||
2. **Order replacement** (same or larger capacity)
|
||||
3. **Hot-swap** failed drive
|
||||
4. **Monitor rebuild** (can take 24-48 hours for large arrays)
|
||||
5. **Verify RAID health** after rebuild completes
|
||||
|
||||
---
|
||||
|
||||
## 📚 Related Documentation
|
||||
|
||||
- **[Host Infrastructure](hosts.md)**: Server specifications
|
||||
- **[Security Model](security.md)**: Backup encryption details
|
||||
- **[Network Architecture](networking.md)**: NFS/SMB networking
|
||||
|
||||
---
|
||||
|
||||
*Storage infrastructure is critical. Regular monitoring and proactive maintenance prevent data loss.*
|
||||
528
docs/infrastructure/tailscale-setup-guide.md
Normal file
528
docs/infrastructure/tailscale-setup-guide.md
Normal file
@@ -0,0 +1,528 @@
|
||||
# 🌐 Tailscale Setup Guide with Split-Brain DNS
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide shows you how to set up Tailscale for secure homelab access with split-brain DNS, allowing you to use local hostnames like `atlantis.vish.local` from anywhere in the world.
|
||||
|
||||
## 🎯 Why Tailscale Over Traditional VPN?
|
||||
|
||||
### ✅ **Advantages of Tailscale**
|
||||
- **Zero-config mesh networking** - No complex server setup
|
||||
- **NAT traversal** - Works behind any router/firewall
|
||||
- **Split-brain DNS** - Use local hostnames anywhere
|
||||
- **Per-device access control** - Granular permissions
|
||||
- **Cross-platform** - Works on everything
|
||||
- **No port forwarding needed** - Completely eliminates router configuration
|
||||
|
||||
### 🆚 **Tailscale vs WireGuard**
|
||||
| Feature | Tailscale | Traditional WireGuard |
|
||||
|---------|-----------|----------------------|
|
||||
| Setup Complexity | 🟢 Simple | 🟡 Moderate |
|
||||
| NAT Traversal | 🟢 Automatic | 🔴 Manual |
|
||||
| DNS Resolution | 🟢 Built-in | 🟡 Manual setup |
|
||||
| Device Management | 🟢 Web dashboard | 🔴 Config files |
|
||||
| Port Forwarding | 🟢 Not needed | 🔴 Required |
|
||||
|
||||
## 🏗️ Your Homelab Hosts
|
||||
|
||||
Here are all the hosts that will be accessible via Tailscale:
|
||||
|
||||
### 🖥️ **Primary Infrastructure**
|
||||
| Hostname | IP Range | Role | Key Services |
|
||||
|----------|----------|------|--------------|
|
||||
| `atlantis.vish.local` | 192.168.1.x | Primary NAS | Plex, Vaultwarden, Grafana, GitLab |
|
||||
| `calypso.vish.local` | 192.168.1.x | Media NAS | Immich, Arr Suite, Prometheus |
|
||||
| `concord-nuc.vish.local` | 192.168.1.x | Edge Computing | Home Assistant, WireGuard, Invidious |
|
||||
|
||||
### 🖥️ **Virtual Machines**
|
||||
| Hostname | IP Range | Role | Key Services |
|
||||
|----------|----------|------|--------------|
|
||||
| `homelab-vm.vish.local` | 192.168.1.x | General VM | Satisfactory, Mattermost, Signal API |
|
||||
| `chicago-vm.vish.local` | 192.168.1.x | Gaming VM | Jellyfin, Factorio, Neko |
|
||||
| `bulgaria-vm.vish.local` | 192.168.1.x | Utility VM | Navidrome, Droppy, Syncthing |
|
||||
|
||||
### 🔧 **Specialized Hosts**
|
||||
| Hostname | IP Range | Role | Key Services |
|
||||
|----------|----------|------|--------------|
|
||||
| `anubis.vish.local` | 192.168.1.x | Archive/Backup | ArchiveBox, PhotoPrism, Matrix Conduit |
|
||||
| `guava.vish.local` | 192.168.1.x | Remote Server | Ollama, CoCalc, OpenWebUI |
|
||||
| `setillo.vish.local` | 192.168.1.x | Monitoring | Prometheus, AdGuard |
|
||||
|
||||
### 🍓 **Raspberry Pi Cluster**
|
||||
| Hostname | IP Range | Role | Key Services |
|
||||
|----------|----------|------|--------------|
|
||||
| `rpi-vish.vish.local` | 192.168.1.x | IoT Hub | Immich, DNS Updater |
|
||||
| `rpi-kevin.vish.local` | 192.168.1.x | Game Server | Minecraft, PMC |
|
||||
|
||||
### 🎮 **Edge Devices**
|
||||
| Hostname | IP Range | Role | Key Services |
|
||||
|----------|----------|------|--------------|
|
||||
| `nvidia-shield.vish.local` | 192.168.1.x | Media Client | WireGuard Client |
|
||||
| `contabo-vm.vish.local` | External | Cloud VM | Ollama, External Services |
|
||||
|
||||
## 🚀 Quick Setup (5 Minutes)
|
||||
|
||||
### 1. **Create Tailscale Account**
|
||||
```bash
|
||||
# Visit https://tailscale.com and create account
|
||||
# Choose the free plan (up to 20 devices, 3 users)
|
||||
```
|
||||
|
||||
### 2. **Install on Each Host**
|
||||
|
||||
#### **Ubuntu/Debian (Most VMs)**
|
||||
```bash
|
||||
# Add Tailscale repository
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
|
||||
# Start Tailscale
|
||||
sudo tailscale up
|
||||
|
||||
# Follow the authentication URL
|
||||
```
|
||||
|
||||
#### **Synology NAS (Atlantis, Calypso)**
|
||||
```bash
|
||||
# Method 1: Package Center
|
||||
# Search for "Tailscale" and install
|
||||
|
||||
# Method 2: Docker (if package not available)
|
||||
docker run -d \
|
||||
--name=tailscale \
|
||||
--cap-add=NET_ADMIN \
|
||||
--cap-add=SYS_MODULE \
|
||||
--device=/dev/net/tun \
|
||||
-v /var/lib/tailscale:/var/lib/tailscale \
|
||||
-v /dev/net/tun:/dev/net/tun \
|
||||
tailscale/tailscale:latest \
|
||||
tailscaled
|
||||
```
|
||||
|
||||
#### **Raspberry Pi**
|
||||
```bash
|
||||
# Same as Ubuntu/Debian
|
||||
curl -fsSL https://tailscale.com/install.sh | sh
|
||||
sudo tailscale up
|
||||
```
|
||||
|
||||
### 3. **Install on Client Devices**
|
||||
- **Windows/Mac**: Download from https://tailscale.com/download
|
||||
- **iOS/Android**: Install from app store
|
||||
- **Linux Desktop**: Same as server installation
|
||||
|
||||
## 🌐 Split-Brain DNS Configuration
|
||||
|
||||
### **Current Production Configuration**
|
||||
Based on your live Tailscale setup, here's your working DNS configuration:
|
||||
|
||||
#### **Tailnet DNS Name**: `tail.vish.gg`
|
||||
- Unique identifier for your Tailscale network
|
||||
- Used for DNS entries, device sharing, and TLS certificates
|
||||
- Automatically assigned by Tailscale
|
||||
|
||||
#### **Nameserver Configuration**:
|
||||
```bash
|
||||
# MagicDNS (Primary)
|
||||
tail.vish.gg → 100.100.100.100
|
||||
|
||||
# Split DNS for Local Network
|
||||
vish.local → 192.168.0.250 (Use with exit mode)
|
||||
|
||||
# Global Nameservers (Your Homelab DNS)
|
||||
100.103.48.78 # Calypso Tailscale IP
|
||||
100.72.55.21 # Concord-NUC Tailscale IP
|
||||
```
|
||||
|
||||
#### **Search Domains**: `tail.vish.gg`
|
||||
- Automatically appends to short hostnames
|
||||
- Enables `atlantis` → `atlantis.tail.vish.gg` resolution
|
||||
|
||||
### 1. **Enable MagicDNS** ✅ **Already Configured**
|
||||
```bash
|
||||
# Your MagicDNS is already enabled with:
|
||||
# - Tailnet domain: tail.vish.gg
|
||||
# - Primary DNS: 100.100.100.100 (MagicDNS)
|
||||
# - Override DNS servers: ENABLED
|
||||
# - Apps control: Enabled for third-party app access
|
||||
```
|
||||
|
||||
### 2. **Add Custom DNS Records**
|
||||
|
||||
In the Tailscale admin console, add these DNS records:
|
||||
|
||||
#### **A Records (IPv4)**
|
||||
```dns
|
||||
atlantis.vish.local → 192.168.1.100 # Replace with actual IP
|
||||
calypso.vish.local → 192.168.1.101
|
||||
concord-nuc.vish.local → 192.168.1.102
|
||||
homelab-vm.vish.local → 192.168.1.103
|
||||
chicago-vm.vish.local → 192.168.1.104
|
||||
bulgaria-vm.vish.local → 192.168.1.105
|
||||
anubis.vish.local → 192.168.1.106
|
||||
guava.vish.local → 192.168.1.107
|
||||
setillo.vish.local → 192.168.1.108
|
||||
rpi-vish.vish.local → 192.168.1.109
|
||||
rpi-kevin.vish.local → 192.168.1.110
|
||||
nvidia-shield.vish.local → 192.168.1.111
|
||||
```
|
||||
|
||||
#### **CNAME Records (Aliases)**
|
||||
```dns
|
||||
# Service-specific aliases
|
||||
plex.vish.local → atlantis.vish.local
|
||||
grafana.vish.local → atlantis.vish.local
|
||||
immich.vish.local → calypso.vish.local
|
||||
homeassistant.vish.local → concord-nuc.vish.local
|
||||
jellyfin.vish.local → chicago-vm.vish.local
|
||||
```
|
||||
|
||||
### 3. **Alternative: Local DNS Server Method**
|
||||
|
||||
If you prefer more control, set up a local DNS server:
|
||||
|
||||
#### **Pi-hole Configuration** (on Atlantis)
|
||||
```bash
|
||||
# Add to Pi-hole custom DNS records
|
||||
# /etc/pihole/custom.list
|
||||
192.168.1.100 atlantis.vish.local
|
||||
192.168.1.101 calypso.vish.local
|
||||
192.168.1.102 concord-nuc.vish.local
|
||||
# ... add all hosts
|
||||
```
|
||||
|
||||
#### **Tailscale DNS Settings**
|
||||
```bash
|
||||
# Point Tailscale to use your Pi-hole
|
||||
# In admin console: DNS → Nameservers
|
||||
# Add: 192.168.1.100 (Pi-hole IP)
|
||||
```
|
||||
|
||||
## 🔧 Advanced Configuration
|
||||
|
||||
### 1. **Subnet Routing** (Access entire homelab network)
|
||||
|
||||
On your primary router/gateway host (e.g., Atlantis):
|
||||
```bash
|
||||
# Enable subnet routing
|
||||
sudo tailscale up --advertise-routes=192.168.1.0/24
|
||||
|
||||
# In Tailscale admin console:
|
||||
# Go to Machines → atlantis → Route settings
|
||||
# Enable the advertised route
|
||||
```
|
||||
|
||||
### 2. **Exit Node** (Route all traffic through homelab)
|
||||
```bash
|
||||
# On a homelab host (e.g., Atlantis)
|
||||
sudo tailscale up --advertise-exit-node
|
||||
|
||||
# On client devices
|
||||
tailscale up --exit-node=atlantis
|
||||
```
|
||||
|
||||
### 3. **Access Control Lists (ACLs)**
|
||||
|
||||
Create fine-grained access control:
|
||||
```json
|
||||
{
|
||||
"acls": [
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:family"],
|
||||
"dst": ["192.168.1.0/24:*"]
|
||||
},
|
||||
{
|
||||
"action": "accept",
|
||||
"src": ["group:admin"],
|
||||
"dst": ["*:*"]
|
||||
}
|
||||
],
|
||||
"groups": {
|
||||
"group:family": ["user1@example.com", "user2@example.com"],
|
||||
"group:admin": ["admin@example.com"]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## 📱 Client Usage Examples
|
||||
|
||||
### **From Your Phone**
|
||||
```bash
|
||||
# Access services using local hostnames
|
||||
https://atlantis.vish.local:8920 # Plex
|
||||
https://grafana.vish.local:3000 # Grafana
|
||||
https://immich.vish.local # Photo management
|
||||
```
|
||||
|
||||
### **From Laptop While Traveling**
|
||||
```bash
|
||||
# SSH to any host
|
||||
ssh user@atlantis.vish.local
|
||||
ssh user@homelab-vm.vish.local
|
||||
|
||||
# Access web services
|
||||
curl http://atlantis.vish.local:8080
|
||||
```
|
||||
|
||||
### **Service Discovery**
|
||||
```bash
|
||||
# List all Tailscale devices
|
||||
tailscale status
|
||||
|
||||
# Ping any host
|
||||
ping atlantis.vish.local
|
||||
ping calypso.vish.local
|
||||
```
|
||||
|
||||
## 🛡️ Security Best Practices
|
||||
|
||||
### 1. **Device Authentication**
|
||||
```bash
|
||||
# Require device approval
|
||||
# In admin console: Settings → Device approval
|
||||
# Enable "Device approval required"
|
||||
```
|
||||
|
||||
### 2. **Key Expiry**
|
||||
```bash
|
||||
# Set key expiration (default 180 days)
|
||||
# In admin console: Settings → Key expiry
|
||||
# Recommended: 90 days for better security
|
||||
```
|
||||
|
||||
### 3. **Disable Key Expiry for Servers**
|
||||
```bash
|
||||
# For always-on servers, disable expiry
|
||||
sudo tailscale up --auth-key=tskey-xxx --advertise-routes=192.168.1.0/24
|
||||
```
|
||||
|
||||
### 4. **Network Segmentation**
|
||||
```bash
|
||||
# Use ACLs to limit access between devices
|
||||
# Example: Only allow admin devices to access management interfaces
|
||||
```
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### **DNS Not Resolving**
|
||||
```bash
|
||||
# Check MagicDNS status
|
||||
tailscale status --json | jq '.MagicDNSSuffix'
|
||||
|
||||
# Test DNS resolution
|
||||
nslookup atlantis.vish.local
|
||||
dig atlantis.vish.local
|
||||
|
||||
# Force DNS refresh
|
||||
sudo tailscale up --reset
|
||||
```
|
||||
|
||||
### **Can't Access Local Services**
|
||||
```bash
|
||||
# Check if subnet routing is enabled
|
||||
tailscale status | grep "subnet routes"
|
||||
|
||||
# Verify routes in admin console
|
||||
# Machines → [host] → Route settings
|
||||
|
||||
# Test connectivity
|
||||
ping 192.168.1.100
|
||||
telnet atlantis.vish.local 8080
|
||||
```
|
||||
|
||||
### **Connection Issues**
|
||||
```bash
|
||||
# Check Tailscale status
|
||||
tailscale status
|
||||
|
||||
# View logs
|
||||
sudo journalctl -u tailscaled -f
|
||||
|
||||
# Restart Tailscale
|
||||
sudo systemctl restart tailscaled
|
||||
```
|
||||
|
||||
## 📊 Service Access Map
|
||||
|
||||
Once configured, you can access services like this:
|
||||
|
||||
### **Media Services**
|
||||
```bash
|
||||
# Plex Media Server
|
||||
https://atlantis.vish.local:32400
|
||||
|
||||
# Immich Photos
|
||||
https://calypso.vish.local:2283
|
||||
|
||||
# Jellyfin
|
||||
https://chicago-vm.vish.local:8096
|
||||
|
||||
# Navidrome Music
|
||||
https://bulgaria-vm.vish.local:4533
|
||||
```
|
||||
|
||||
### **Management & Monitoring**
|
||||
```bash
|
||||
# Grafana Dashboards
|
||||
https://atlantis.vish.local:3000
|
||||
|
||||
# Prometheus Metrics
|
||||
https://calypso.vish.local:9090
|
||||
|
||||
# Uptime Kuma
|
||||
https://atlantis.vish.local:3001
|
||||
|
||||
# Portainer
|
||||
https://atlantis.vish.local:9000
|
||||
```
|
||||
|
||||
### **Development & Productivity**
|
||||
```bash
|
||||
# GitLab
|
||||
https://atlantis.vish.local:8929
|
||||
|
||||
# Vaultwarden (Password Manager)
|
||||
https://atlantis.vish.local:8222
|
||||
|
||||
# Home Assistant
|
||||
https://concord-nuc.vish.local:8123
|
||||
|
||||
# Mattermost Chat
|
||||
https://homelab-vm.vish.local:8065
|
||||
```
|
||||
|
||||
## 🚀 Migration from WireGuard
|
||||
|
||||
If you're currently using WireGuard:
|
||||
|
||||
### 1. **Parallel Setup**
|
||||
```bash
|
||||
# Keep WireGuard running while testing Tailscale
|
||||
# Both can coexist temporarily
|
||||
```
|
||||
|
||||
### 2. **Test All Services**
|
||||
```bash
|
||||
# Verify each service works via Tailscale
|
||||
# Test from multiple client devices
|
||||
```
|
||||
|
||||
### 3. **Update Documentation**
|
||||
```bash
|
||||
# Update service URLs in documentation
|
||||
# Change from external IPs to .vish.local hostnames
|
||||
```
|
||||
|
||||
### 4. **Decommission WireGuard**
|
||||
```bash
|
||||
# Once confident, disable WireGuard
|
||||
# Remove port forwarding rules
|
||||
# Keep configs as backup
|
||||
```
|
||||
|
||||
## 💡 Pro Tips
|
||||
|
||||
### **1. Use Descriptive Hostnames**
|
||||
```bash
|
||||
# Instead of generic names, use descriptive ones
|
||||
media-server.vish.local # Instead of atlantis.vish.local
|
||||
monitoring.vish.local # For Grafana/Prometheus host
|
||||
gaming.vish.local # For game servers
|
||||
```
|
||||
|
||||
### **2. Create Service-Specific Aliases**
|
||||
```bash
|
||||
# Add CNAME records for easy access
|
||||
plex.vish.local → atlantis.vish.local
|
||||
photos.vish.local → calypso.vish.local
|
||||
chat.vish.local → homelab-vm.vish.local
|
||||
```
|
||||
|
||||
### **3. Mobile Shortcuts**
|
||||
```bash
|
||||
# Create bookmarks/shortcuts on mobile devices
|
||||
# Use descriptive names: "Home Plex", "Photo Library", etc.
|
||||
```
|
||||
|
||||
### **4. Monitoring Integration**
|
||||
```bash
|
||||
# Update Uptime Kuma to monitor .vish.local hostnames
|
||||
# Update Grafana dashboards to use local hostnames
|
||||
# Configure alerts to use Tailscale IPs
|
||||
```
|
||||
|
||||
## 🔗 Integration with Existing Services
|
||||
|
||||
### **Update Service Configurations**
|
||||
Many services can be updated to use Tailscale hostnames:
|
||||
|
||||
```yaml
|
||||
# Example: Update docker-compose.yml files
|
||||
environment:
|
||||
- GRAFANA_URL=https://grafana.vish.local:3000
|
||||
- PLEX_URL=https://plex.vish.local:32400
|
||||
- DATABASE_HOST=atlantis.vish.local
|
||||
```
|
||||
|
||||
### **Reverse Proxy Updates**
|
||||
```nginx
|
||||
# Update Nginx Proxy Manager
|
||||
# Change upstream servers to use .vish.local hostnames
|
||||
upstream plex {
|
||||
server atlantis.vish.local:32400;
|
||||
}
|
||||
```
|
||||
|
||||
## 📋 Quick Reference
|
||||
|
||||
### **Essential Commands**
|
||||
```bash
|
||||
# Check status
|
||||
tailscale status
|
||||
|
||||
# Connect/disconnect
|
||||
tailscale up
|
||||
tailscale down
|
||||
|
||||
# List devices
|
||||
tailscale status --peers
|
||||
|
||||
# Get IP address
|
||||
tailscale ip -4
|
||||
|
||||
# Enable/disable routes
|
||||
tailscale up --advertise-routes=192.168.1.0/24
|
||||
```
|
||||
|
||||
### **Common URLs After Setup**
|
||||
```bash
|
||||
# Admin interfaces
|
||||
https://atlantis.vish.local:9000 # Portainer
|
||||
https://atlantis.vish.local:3000 # Grafana
|
||||
https://atlantis.vish.local:3001 # Uptime Kuma
|
||||
|
||||
# Media services
|
||||
https://atlantis.vish.local:32400 # Plex
|
||||
https://calypso.vish.local:2283 # Immich
|
||||
https://chicago-vm.vish.local:8096 # Jellyfin
|
||||
|
||||
# Communication
|
||||
https://homelab-vm.vish.local:8065 # Mattermost
|
||||
https://atlantis.vish.local:8080 # Signal API
|
||||
```
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [📱 Mobile Device Setup](mobile-device-setup.md) - **NEW!** iOS, Android, macOS, Linux Tailscale configuration
|
||||
- [👨👩👧👦 Family Network Integration](family-network-integration.md) - **NEW!** Connect family's separate network via Tailscale
|
||||
- [💻 Laptop Travel Setup](laptop-travel-setup.md) - Secure travel with VPN tunneling
|
||||
- [Port Forwarding Guide](port-forwarding-guide.md) - Traditional VPN setup (alternative)
|
||||
- [🔥 Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Router failure and network reconfiguration
|
||||
- [🔐 Offline Password Access](../troubleshooting/offline-password-access.md) - Accessing passwords when services are down
|
||||
- [Security Model](security.md) - Overall security architecture
|
||||
- [Network Architecture](networking.md) - Network topology and design
|
||||
- [Individual Service Docs](../services/individual/README.md) - Service-specific access information
|
||||
|
||||
---
|
||||
|
||||
**🎉 Result**: After setup, you can access your entire homelab using friendly hostnames like `atlantis.vish.local` from anywhere in the world, without any port forwarding or complex VPN configuration!
|
||||
812
docs/infrastructure/tplink-archer-be800-setup.md
Normal file
812
docs/infrastructure/tplink-archer-be800-setup.md
Normal file
@@ -0,0 +1,812 @@
|
||||
# 🌐 TP-Link Archer BE800 v1.6 Router Setup Guide
|
||||
|
||||
**🟡 Intermediate Guide**
|
||||
|
||||
This guide provides specific instructions for configuring the TP-Link Archer BE800 v1.6 router for your homelab, including static IP assignments, port forwarding, and disaster recovery procedures.
|
||||
|
||||
## 📋 Router Specifications
|
||||
|
||||
### **TP-Link Archer BE800 v1.6**
|
||||
- **WiFi Standard**: WiFi 7 (802.11be)
|
||||
- **Speed**: Up to 19 Gbps (11520 Mbps on 6 GHz + 5760 Mbps on 5 GHz + 1376 Mbps on 2.4 GHz)
|
||||
- **Ports**: 1x 10 Gbps WAN/LAN, 4x 2.5 Gbps LAN, 1x USB 3.0
|
||||
- **CPU**: Quad-core 2.2 GHz processor
|
||||
- **RAM**: 2 GB
|
||||
- **Antennas**: 8 high-gain antennas
|
||||
- **Default IP**: 192.168.0.1 (can be changed to 192.168.1.1)
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Initial Setup
|
||||
|
||||
### **Step 1: Physical Connection**
|
||||
```bash
|
||||
# 1. Connect modem to WAN port (10 Gbps port - usually blue/different color)
|
||||
# 2. Connect computer to any LAN port via Ethernet
|
||||
# 3. Power on router and wait 2-3 minutes for full boot
|
||||
```
|
||||
|
||||
### **Step 2: Access Router Interface**
|
||||
```bash
|
||||
# Default access methods:
|
||||
# Web Interface: http://192.168.0.1 or http://tplinkwifi.net
|
||||
# Default Login: admin / admin (or blank password)
|
||||
|
||||
# If you can't access, find router IP:
|
||||
ip route | grep default
|
||||
# Look for: default via 192.168.0.1 dev eth0
|
||||
```
|
||||
|
||||
### **Step 3: Quick Setup Wizard**
|
||||
```bash
|
||||
# The BE800 will launch setup wizard on first access:
|
||||
|
||||
# 1. Set Time Zone
|
||||
Time Zone: America/Los_Angeles (or your timezone)
|
||||
|
||||
# 2. Internet Connection Type
|
||||
# Choose based on your ISP:
|
||||
- Dynamic IP (DHCP) - Most common
|
||||
- Static IP - If ISP provided specific settings
|
||||
- PPPoE - DSL connections
|
||||
|
||||
# 3. Wireless Settings
|
||||
2.4 GHz SSID: YourNetwork_2.4G
|
||||
5 GHz SSID: YourNetwork_5G
|
||||
6 GHz SSID: YourNetwork_6G
|
||||
Password: "REDACTED_PASSWORD" password - save to password manager]
|
||||
|
||||
# 4. Admin Password
|
||||
Username: admin
|
||||
Password: "REDACTED_PASSWORD" admin password - save to password manager]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ Network Configuration for Homelab
|
||||
|
||||
### **Step 1: Change Router IP to 192.168.1.1**
|
||||
```bash
|
||||
# Navigate to: Advanced → Network → LAN
|
||||
|
||||
# Current Settings:
|
||||
IP Address: 192.168.0.1
|
||||
Subnet Mask: 255.255.255.0
|
||||
|
||||
# Change to:
|
||||
IP Address: 192.168.1.1
|
||||
Subnet Mask: 255.255.255.0
|
||||
```
|
||||
|
||||
**⚠️ Important**: After changing IP, you'll need to reconnect at `http://192.168.1.1`
|
||||
|
||||
### **Step 2: DHCP Configuration**
|
||||
```bash
|
||||
# Navigate to: Advanced → Network → DHCP Server
|
||||
|
||||
# DHCP Settings:
|
||||
Enable DHCP Server: ✅ Enabled
|
||||
IP Address Pool: 192.168.1.100 - 192.168.1.200
|
||||
Default Gateway: 192.168.1.1
|
||||
Primary DNS: 1.1.1.1
|
||||
Secondary DNS: 8.8.8.8
|
||||
Lease Time: 1440 minutes (24 hours)
|
||||
```
|
||||
|
||||
### **Step 3: DNS Configuration**
|
||||
```bash
|
||||
# Navigate to: Advanced → Network → Internet
|
||||
|
||||
# DNS Settings:
|
||||
Primary DNS: 1.1.1.1 (Cloudflare)
|
||||
Secondary DNS: 8.8.8.8 (Google)
|
||||
|
||||
# Or use your Pi-hole if running:
|
||||
Primary DNS: 192.168.1.100 (Atlantis Pi-hole)
|
||||
Secondary DNS: 1.1.1.1 (Fallback)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🖥️ Static IP Reservations (DHCP Reservations)
|
||||
|
||||
### **Navigate to: Advanced → Network → DHCP Server → Address Reservation**
|
||||
|
||||
#### **Add Reservations for All Homelab Hosts:**
|
||||
|
||||
```bash
|
||||
# Primary Infrastructure
|
||||
Device Name: atlantis
|
||||
MAC Address: [Find with: ip link show on Atlantis]
|
||||
Reserved IP: 192.168.1.100
|
||||
Status: Enabled
|
||||
|
||||
Device Name: calypso
|
||||
MAC Address: [Find with: ip link show on Calypso]
|
||||
Reserved IP: 192.168.1.101
|
||||
Status: Enabled
|
||||
|
||||
Device Name: concord-nuc
|
||||
MAC Address: [Find with: ip link show on Concord]
|
||||
Reserved IP: 192.168.1.102
|
||||
Status: Enabled
|
||||
|
||||
# Virtual Machines
|
||||
Device Name: homelab-vm
|
||||
MAC Address: [Find in VM settings or with ip link show]
|
||||
Reserved IP: 192.168.1.103
|
||||
Status: Enabled
|
||||
|
||||
Device Name: chicago-vm
|
||||
MAC Address: [Find in VM settings]
|
||||
Reserved IP: 192.168.1.104
|
||||
Status: Enabled
|
||||
|
||||
Device Name: bulgaria-vm
|
||||
MAC Address: [Find in VM settings]
|
||||
Reserved IP: 192.168.1.105
|
||||
Status: Enabled
|
||||
|
||||
# Specialized Hosts
|
||||
Device Name: anubis
|
||||
MAC Address: [Find with: ip link show on Anubis]
|
||||
Reserved IP: 192.168.1.106
|
||||
Status: Enabled
|
||||
|
||||
Device Name: guava
|
||||
MAC Address: [Find with: ip link show on Guava]
|
||||
Reserved IP: 192.168.1.107
|
||||
Status: Enabled
|
||||
|
||||
Device Name: setillo
|
||||
MAC Address: [Find with: ip link show on Setillo]
|
||||
Reserved IP: 192.168.1.108
|
||||
Status: Enabled
|
||||
|
||||
# Raspberry Pi Cluster
|
||||
Device Name: rpi-vish
|
||||
MAC Address: [Find with: cat /sys/class/net/eth0/address]
|
||||
Reserved IP: 192.168.1.109
|
||||
Status: Enabled
|
||||
|
||||
Device Name: rpi-kevin
|
||||
MAC Address: [Find with: cat /sys/class/net/eth0/address]
|
||||
Reserved IP: 192.168.1.110
|
||||
Status: Enabled
|
||||
|
||||
# Edge Devices
|
||||
Device Name: nvidia-shield
|
||||
MAC Address: [Find in Shield network settings]
|
||||
Reserved IP: 192.168.1.111
|
||||
Status: Enabled
|
||||
```
|
||||
|
||||
### **Finding MAC Addresses:**
|
||||
```bash
|
||||
# On Linux hosts:
|
||||
ip link show | grep -E "(ether|link)"
|
||||
# or
|
||||
cat /sys/class/net/eth0/address
|
||||
|
||||
# On Synology NAS:
|
||||
# Control Panel → Network → Network Interface → View details
|
||||
|
||||
# On Windows:
|
||||
ipconfig /all
|
||||
|
||||
# On macOS:
|
||||
ifconfig en0 | grep ether
|
||||
|
||||
# From router's DHCP client list:
|
||||
# Advanced → Network → DHCP Server → DHCP Client List
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔌 Port Forwarding Configuration
|
||||
|
||||
### **Navigate to: Advanced → NAT Forwarding → Virtual Servers**
|
||||
|
||||
#### **Essential Port Forwards (Configure First):**
|
||||
|
||||
```bash
|
||||
# VPN Access (Highest Priority)
|
||||
Service Name: WireGuard-Atlantis
|
||||
External Port: 51820
|
||||
Internal IP: 192.168.1.100
|
||||
Internal Port: 51820
|
||||
Protocol: UDP
|
||||
Status: Enabled
|
||||
|
||||
Service Name: WireGuard-Concord
|
||||
External Port: 51821
|
||||
Internal IP: 192.168.1.102
|
||||
Internal Port: 51820
|
||||
Protocol: UDP
|
||||
Status: Enabled
|
||||
|
||||
# Web Services (If needed for direct access)
|
||||
Service Name: HTTP-Proxy
|
||||
External Port: 80
|
||||
Internal IP: 192.168.1.100
|
||||
Internal Port: 8341
|
||||
Protocol: TCP
|
||||
Status: Enabled
|
||||
|
||||
Service Name: HTTPS-Proxy
|
||||
External Port: 443
|
||||
Internal IP: 192.168.1.100
|
||||
Internal Port: 8766
|
||||
Protocol: TCP
|
||||
Status: Enabled
|
||||
```
|
||||
|
||||
#### **Gaming Services (Optional):**
|
||||
|
||||
```bash
|
||||
# Satisfactory Server
|
||||
Service Name: Satisfactory-TCP
|
||||
External Port: 7777
|
||||
Internal IP: 192.168.1.103
|
||||
Internal Port: 7777
|
||||
Protocol: TCP
|
||||
Status: Enabled
|
||||
|
||||
Service Name: Satisfactory-UDP
|
||||
External Port: 7777
|
||||
Internal IP: 192.168.1.103
|
||||
Internal Port: 7777
|
||||
Protocol: UDP
|
||||
Status: Enabled
|
||||
|
||||
# Left 4 Dead 2 Server
|
||||
Service Name: L4D2-Game
|
||||
External Port: 27015
|
||||
Internal IP: 192.168.1.103
|
||||
Internal Port: 27015
|
||||
Protocol: Both (TCP & UDP)
|
||||
Status: Enabled
|
||||
|
||||
Service Name: L4D2-SourceTV
|
||||
External Port: 27020
|
||||
Internal IP: 192.168.1.103
|
||||
Internal Port: 27020
|
||||
Protocol: UDP
|
||||
Status: Enabled
|
||||
|
||||
Service Name: L4D2-Client
|
||||
External Port: 27005
|
||||
Internal IP: 192.168.1.103
|
||||
Internal Port: 27005
|
||||
Protocol: UDP
|
||||
Status: Enabled
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Dynamic DNS Configuration
|
||||
|
||||
### **Navigate to: Advanced → Network → Dynamic DNS**
|
||||
|
||||
#### **For Common DDNS Providers:**
|
||||
|
||||
```bash
|
||||
# Synology DDNS (if using vishinator.synology.me)
|
||||
Service Provider: Synology
|
||||
Domain Name: vishinator.synology.me
|
||||
Username: [Your Synology account]
|
||||
Password: "REDACTED_PASSWORD" Synology password]
|
||||
Status: Enabled
|
||||
|
||||
# No-IP
|
||||
Service Provider: No-IP
|
||||
Domain Name: yourdomain.ddns.net
|
||||
Username: [Your No-IP username]
|
||||
Password: "REDACTED_PASSWORD" No-IP password]
|
||||
Status: Enabled
|
||||
|
||||
# DynDNS
|
||||
Service Provider: DynDNS
|
||||
Domain Name: yourdomain.dyndns.org
|
||||
Username: [Your DynDNS username]
|
||||
Password: "REDACTED_PASSWORD" DynDNS password]
|
||||
Status: Enabled
|
||||
|
||||
# Custom DDNS (if using other provider)
|
||||
Service Provider: Custom
|
||||
DDNS Server: your-ddns-provider.com
|
||||
Domain Name: yourdomain.example.com
|
||||
Username: [Your username]
|
||||
Password: "REDACTED_PASSWORD" password]
|
||||
Status: Enabled
|
||||
```
|
||||
|
||||
### **Test DDNS Configuration:**
|
||||
```bash
|
||||
# Wait 5-10 minutes after configuration, then test:
|
||||
nslookup vishinator.synology.me
|
||||
dig vishinator.synology.me
|
||||
|
||||
# Should return your external IP address
|
||||
# Compare with:
|
||||
curl ifconfig.me
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📶 WiFi Configuration
|
||||
|
||||
### **Navigate to: Wireless → Wireless Settings**
|
||||
|
||||
#### **2.4 GHz Band:**
|
||||
```bash
|
||||
Network Name (SSID): YourNetwork_2.4G
|
||||
Security: WPA3-Personal (or WPA2/WPA3-Personal if older devices)
|
||||
Password: "REDACTED_PASSWORD" password - save to password manager]
|
||||
Channel: Auto (or manually select 1, 6, or 11)
|
||||
Channel Width: 40 MHz
|
||||
Transmit Power: High
|
||||
```
|
||||
|
||||
#### **5 GHz Band:**
|
||||
```bash
|
||||
Network Name (SSID): YourNetwork_5G
|
||||
Security: WPA3-Personal
|
||||
Password: "REDACTED_PASSWORD" as 2.4G or different - your choice]
|
||||
Channel: Auto (or manually select DFS channels for less congestion)
|
||||
Channel Width: 160 MHz (for maximum speed)
|
||||
Transmit Power: High
|
||||
```
|
||||
|
||||
#### **6 GHz Band (WiFi 7):**
|
||||
```bash
|
||||
Network Name (SSID): YourNetwork_6G
|
||||
Security: WPA3-Personal (required for 6 GHz)
|
||||
Password: "REDACTED_PASSWORD" as others or different]
|
||||
Channel: Auto
|
||||
Channel Width: 320 MHz (WiFi 7 feature)
|
||||
Transmit Power: High
|
||||
```
|
||||
|
||||
### **Guest Network (Optional):**
|
||||
```bash
|
||||
# Navigate to: Wireless → Guest Network
|
||||
|
||||
2.4 GHz Guest:
|
||||
Enable: ✅
|
||||
Network Name: YourNetwork_Guest
|
||||
Security: WPA3-Personal
|
||||
Password: "REDACTED_PASSWORD" password]
|
||||
Access: Internet Only (no local network access)
|
||||
Bandwidth Control: 50 Mbps (limit guest usage)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Security Configuration
|
||||
|
||||
### **Firewall Settings**
|
||||
```bash
|
||||
# Navigate to: Advanced → Security → Firewall
|
||||
|
||||
SPI Firewall: ✅ Enabled
|
||||
DoS Attack Protection: ✅ Enabled
|
||||
VPN Passthrough: ✅ Enabled (for WireGuard/Tailscale)
|
||||
UPnP: ✅ Enabled (for automatic port mapping)
|
||||
```
|
||||
|
||||
### **Access Control**
|
||||
```bash
|
||||
# Navigate to: Advanced → Security → Access Control
|
||||
|
||||
# Block malicious websites
|
||||
Online Security: ✅ Enabled
|
||||
|
||||
# Time-based access control (optional)
|
||||
Parental Controls: Configure as needed
|
||||
|
||||
# MAC Address Filtering (high security environments)
|
||||
Wireless MAC Filtering: Configure if needed
|
||||
```
|
||||
|
||||
### **Admin Security**
|
||||
```bash
|
||||
# Navigate to: Advanced → System → Administration
|
||||
|
||||
# Remote Management (disable for security)
|
||||
Web Management: Local Only
|
||||
SSH: Disabled (unless needed)
|
||||
Telnet: Disabled
|
||||
|
||||
# Session Timeout
|
||||
Timeout: 10 minutes
|
||||
|
||||
# HTTPS Management (enable for security)
|
||||
HTTPS: ✅ Enabled
|
||||
HTTP Redirect to HTTPS: ✅ Enabled
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ⚡ Performance Optimization
|
||||
|
||||
### **QoS Configuration**
|
||||
```bash
|
||||
# Navigate to: Advanced → QoS
|
||||
|
||||
# Enable QoS for better performance
|
||||
QoS: ✅ Enabled
|
||||
|
||||
# Set bandwidth limits (adjust for your internet speed)
|
||||
Upload Bandwidth: [Your upload speed - 10%]
|
||||
Download Bandwidth: [Your download speed - 10%]
|
||||
|
||||
# Device Priority (set homelab hosts to high priority)
|
||||
High Priority Devices:
|
||||
- atlantis (192.168.1.100)
|
||||
- calypso (192.168.1.101)
|
||||
- concord-nuc (192.168.1.102)
|
||||
|
||||
# Gaming Mode (if hosting game servers)
|
||||
Gaming Mode: ✅ Enabled
|
||||
Gaming Device: homelab-vm (192.168.1.103)
|
||||
```
|
||||
|
||||
### **Advanced Wireless Settings**
|
||||
```bash
|
||||
# Navigate to: Wireless → Advanced
|
||||
|
||||
# Optimize for performance
|
||||
Beamforming: ✅ Enabled
|
||||
Airtime Fairness: ✅ Enabled
|
||||
Band Steering: ✅ Enabled (automatically move devices to best band)
|
||||
Load Balancing: ✅ Enabled
|
||||
Fast Roaming: ✅ Enabled
|
||||
|
||||
# WiFi 7 Features (BE800 specific)
|
||||
Multi-Link Operation (MLO): ✅ Enabled
|
||||
320 MHz Channel Width: ✅ Enabled (6 GHz)
|
||||
4K-QAM: ✅ Enabled
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Homelab-Specific Features
|
||||
|
||||
### **Port Aggregation (Link Aggregation)**
|
||||
```bash
|
||||
# If you have multiple connections to NAS devices
|
||||
# Navigate to: Advanced → Network → Link Aggregation
|
||||
|
||||
# Configure LACP for Synology NAS (if supported)
|
||||
Group Name: NAS-Bond
|
||||
Member Ports: LAN1, LAN2
|
||||
Mode: 802.3ad (LACP)
|
||||
```
|
||||
|
||||
### **VLAN Configuration (Advanced)**
|
||||
```bash
|
||||
# Navigate to: Advanced → Network → VLAN
|
||||
|
||||
# Separate IoT devices (optional)
|
||||
VLAN ID: 10
|
||||
VLAN Name: IoT
|
||||
IP Range: 192.168.10.1/24
|
||||
DHCP: Enabled
|
||||
|
||||
# Separate guest network
|
||||
VLAN ID: 20
|
||||
VLAN Name: Guest
|
||||
IP Range: 192.168.20.1/24
|
||||
DHCP: Enabled
|
||||
```
|
||||
|
||||
### **VPN Server (Built-in)**
|
||||
```bash
|
||||
# Navigate to: Advanced → VPN Server
|
||||
|
||||
# OpenVPN Server (alternative to WireGuard)
|
||||
OpenVPN: ✅ Enabled
|
||||
Service Type: UDP
|
||||
Service Port: 1194
|
||||
Client Access: Internet and Home Network
|
||||
Max Clients: 10
|
||||
|
||||
# Generate certificates and download client config
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Monitoring and Maintenance
|
||||
|
||||
### **System Monitoring**
|
||||
```bash
|
||||
# Navigate to: Advanced → System → System Log
|
||||
|
||||
# Enable logging
|
||||
System Log: ✅ Enabled
|
||||
Log Level: Notice
|
||||
Remote Log: Configure if using centralized logging
|
||||
|
||||
# Monitor these logs:
|
||||
- DHCP assignments
|
||||
- Port forwarding activity
|
||||
- Security events
|
||||
- System errors
|
||||
```
|
||||
|
||||
### **Traffic Analysis**
|
||||
```bash
|
||||
# Navigate to: Advanced → Network → Traffic Analyzer
|
||||
|
||||
# Monitor bandwidth usage
|
||||
Traffic Analyzer: ✅ Enabled
|
||||
Real-time Monitor: ✅ Enabled
|
||||
|
||||
# Set up alerts for unusual traffic
|
||||
Bandwidth Monitor: ✅ Enabled
|
||||
Alert Threshold: 80% of total bandwidth
|
||||
```
|
||||
|
||||
### **Firmware Updates**
|
||||
```bash
|
||||
# Navigate to: Advanced → System → Firmware Update
|
||||
|
||||
# Check for updates monthly
|
||||
Auto Update: ✅ Enabled (or manual for stability)
|
||||
Update Check: Weekly
|
||||
Backup Settings: ✅ Before each update
|
||||
|
||||
# Current firmware info:
|
||||
Hardware Version: Archer BE800 v1.6
|
||||
Firmware Version: [Check TP-Link website for latest]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚨 Disaster Recovery Procedures
|
||||
|
||||
### **Backup Router Configuration**
|
||||
```bash
|
||||
# Navigate to: Advanced → System → Backup & Restore
|
||||
|
||||
# Export current configuration
|
||||
Backup: Click "Backup"
|
||||
Save file as: archer-be800-config-$(date +%Y%m%d).bin
|
||||
Store in: ~/homelab-recovery/router-backups/
|
||||
|
||||
# Schedule regular backups (monthly)
|
||||
```
|
||||
|
||||
### **Factory Reset Procedure**
|
||||
```bash
|
||||
# If router becomes unresponsive:
|
||||
|
||||
# Method 1: Web Interface
|
||||
# Navigate to: Advanced → System → Backup & Restore
|
||||
# Click "Factory Restore"
|
||||
|
||||
# Method 2: Hardware Reset
|
||||
# 1. Power on router
|
||||
# 2. Hold Reset button for 10 seconds while powered on
|
||||
# 3. Release button and wait for reboot (2-3 minutes)
|
||||
# 4. Router will return to default settings (192.168.0.1)
|
||||
```
|
||||
|
||||
### **Quick Recovery Checklist**
|
||||
```bash
|
||||
# After factory reset or new router installation:
|
||||
|
||||
☐ Connect to http://192.168.0.1 (default IP)
|
||||
☐ Run initial setup wizard
|
||||
☐ Change router IP to 192.168.1.1
|
||||
☐ Reconnect to http://192.168.1.1
|
||||
☐ Configure DHCP pool (192.168.1.100-200)
|
||||
☐ Add all static IP reservations
|
||||
☐ Configure port forwarding rules
|
||||
☐ Set up Dynamic DNS
|
||||
☐ Configure WiFi networks
|
||||
☐ Enable security features
|
||||
☐ Restore from backup if available
|
||||
☐ Test all services and external access
|
||||
☐ Update documentation with any changes
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔍 Troubleshooting
|
||||
|
||||
### **Common Issues and Solutions**
|
||||
|
||||
#### **Can't Access Router Interface**
|
||||
```bash
|
||||
# Check connection
|
||||
ping 192.168.1.1 # or 192.168.0.1 for default
|
||||
|
||||
# Clear browser cache
|
||||
Ctrl+F5 (Windows) or Cmd+Shift+R (Mac)
|
||||
|
||||
# Try different browser or incognito mode
|
||||
# Try direct IP: http://192.168.1.1
|
||||
# Try hostname: http://tplinkwifi.net
|
||||
|
||||
# Reset network adapter
|
||||
sudo dhclient -r && sudo dhclient # Linux
|
||||
ipconfig /release && ipconfig /renew # Windows
|
||||
```
|
||||
|
||||
#### **Slow WiFi Performance**
|
||||
```bash
|
||||
# Check channel congestion
|
||||
# Use WiFi analyzer app to find best channels
|
||||
|
||||
# Optimize settings:
|
||||
# - Use 160 MHz on 5 GHz
|
||||
# - Use 320 MHz on 6 GHz (WiFi 7)
|
||||
# - Enable all performance features
|
||||
# - Update device drivers
|
||||
# - Position router centrally and elevated
|
||||
```
|
||||
|
||||
#### **Port Forwarding Not Working**
|
||||
```bash
|
||||
# Verify settings:
|
||||
# 1. Correct internal IP address
|
||||
# 2. Service is running on internal host
|
||||
# 3. Firewall allows traffic on internal host
|
||||
# 4. External port is not blocked by ISP
|
||||
|
||||
# Test internal connectivity first:
|
||||
telnet 192.168.1.100 8341 # Test from inside network
|
||||
|
||||
# Test external connectivity:
|
||||
# Use online port checker or different network
|
||||
```
|
||||
|
||||
#### **DDNS Not Updating**
|
||||
```bash
|
||||
# Check DDNS status in router logs
|
||||
# Verify credentials are correct
|
||||
# Test manual update:
|
||||
curl -u "username:password" \
|
||||
"https://your-ddns-provider.com/update?hostname=yourdomain&myip=$(curl -s ifconfig.me)"
|
||||
|
||||
# Check if external IP changed:
|
||||
curl ifconfig.me
|
||||
nslookup yourdomain.ddns.net
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📱 Mobile App Management
|
||||
|
||||
### **TP-Link Tether App**
|
||||
```bash
|
||||
# Download from app store: "TP-Link Tether"
|
||||
|
||||
# Features available:
|
||||
- Remote router management
|
||||
- Guest network control
|
||||
- Device management
|
||||
- Parental controls
|
||||
- Speed test
|
||||
- Network map
|
||||
- Firmware updates
|
||||
|
||||
# Setup:
|
||||
# 1. Connect phone to router WiFi
|
||||
# 2. Open Tether app
|
||||
# 3. Create TP-Link ID account
|
||||
# 4. Add router to account
|
||||
# 5. Enable remote management
|
||||
```
|
||||
|
||||
### **Remote Management Setup**
|
||||
```bash
|
||||
# Navigate to: Advanced → System → TP-Link Cloud
|
||||
|
||||
# Enable cloud management
|
||||
TP-Link Cloud: ✅ Enabled
|
||||
Account: [Your TP-Link ID]
|
||||
Device Name: Homelab-Router-BE800
|
||||
|
||||
# Security considerations:
|
||||
# - Use strong TP-Link ID password
|
||||
# - Enable 2FA on TP-Link account
|
||||
# - Regularly review connected devices
|
||||
# - Disable if not needed for security
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Integration with Homelab Services
|
||||
|
||||
### **Pi-hole Integration**
|
||||
```bash
|
||||
# If running Pi-hole on Atlantis (192.168.1.100):
|
||||
|
||||
# Method 1: Router DNS Settings
|
||||
Primary DNS: 192.168.1.100
|
||||
Secondary DNS: 1.1.1.1
|
||||
|
||||
# Method 2: DHCP DNS Override
|
||||
# Advanced → Network → DHCP Server
|
||||
Primary DNS: 192.168.1.100
|
||||
Secondary DNS: 1.1.1.1
|
||||
|
||||
# This will make all devices use Pi-hole for DNS
|
||||
```
|
||||
|
||||
### **Tailscale Subnet Routing**
|
||||
```bash
|
||||
# Configure router to work with Tailscale subnet routing
|
||||
|
||||
# 1. Ensure UPnP is enabled (for automatic port mapping)
|
||||
# 2. Add static route if needed:
|
||||
# Advanced → Network → Routing
|
||||
# Destination: 100.64.0.0/10 (Tailscale network)
|
||||
# Gateway: 192.168.1.100 (Atlantis - Tailscale exit node)
|
||||
# Interface: LAN
|
||||
```
|
||||
|
||||
### **Monitoring Integration**
|
||||
```bash
|
||||
# Enable SNMP for monitoring (if needed)
|
||||
# Advanced → Network → SNMP
|
||||
|
||||
SNMP: ✅ Enabled
|
||||
Community: public (change for security)
|
||||
Contact: admin@yourdomain.com
|
||||
Location: Home Lab
|
||||
|
||||
# Add router to Prometheus monitoring:
|
||||
# - SNMP exporter configuration
|
||||
# - Router metrics in Grafana
|
||||
# - Bandwidth monitoring
|
||||
# - Device count tracking
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Configuration Summary
|
||||
|
||||
### **Quick Reference Settings**
|
||||
```bash
|
||||
# Network Configuration
|
||||
Router IP: 192.168.1.1
|
||||
Subnet: 192.168.1.0/24
|
||||
DHCP Range: 192.168.1.100-200
|
||||
DNS: 1.1.1.1, 8.8.8.8 (or Pi-hole)
|
||||
|
||||
# WiFi Networks
|
||||
2.4 GHz: YourNetwork_2.4G (WPA3, 40 MHz)
|
||||
5 GHz: YourNetwork_5G (WPA3, 160 MHz)
|
||||
6 GHz: YourNetwork_6G (WPA3, 320 MHz)
|
||||
|
||||
# Essential Port Forwards
|
||||
51820/UDP → 192.168.1.100:51820 (WireGuard Atlantis)
|
||||
51821/UDP → 192.168.1.102:51820 (WireGuard Concord)
|
||||
80/TCP → 192.168.1.100:8341 (HTTP Proxy)
|
||||
443/TCP → 192.168.1.100:8766 (HTTPS Proxy)
|
||||
|
||||
# Static IP Assignments
|
||||
Atlantis: 192.168.1.100
|
||||
Calypso: 192.168.1.101
|
||||
Concord-NUC: 192.168.1.102
|
||||
Homelab-VM: 192.168.1.103
|
||||
[... all other hosts as documented]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Complete router failure recovery
|
||||
- [Port Forwarding Guide](port-forwarding-guide.md) - Detailed port configuration theory
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - Alternative to port forwarding
|
||||
- [Network Architecture](networking.md) - Overall network design
|
||||
- [Security Model](security.md) - Security considerations
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: The TP-Link Archer BE800 is a powerful WiFi 7 router with advanced features. Take advantage of the 320 MHz channels on 6 GHz for maximum performance with compatible devices, and use the multiple 2.5 Gbps ports for high-speed connections to your NAS devices!
|
||||
755
docs/infrastructure/ubiquiti-enterprise-setup.md
Normal file
755
docs/infrastructure/ubiquiti-enterprise-setup.md
Normal file
@@ -0,0 +1,755 @@
|
||||
# 🏢 Ubiquiti Enterprise Network Setup Guide
|
||||
|
||||
**🔴 Advanced Guide**
|
||||
|
||||
This guide covers deploying a complete Ubiquiti enterprise networking solution for your homelab, including Dream Machine, managed switches, access points, and advanced network segmentation.
|
||||
|
||||
## 🎯 Ubiquiti Enterprise Architecture
|
||||
|
||||
### **Complete Ubiquiti Stack**
|
||||
- **🌐 Dream Machine Pro/SE** - Gateway, controller, and security appliance
|
||||
- **🔌 UniFi Switch Pro 48** - 48-port managed switch with PoE++
|
||||
- **📡 UniFi Access Points** - WiFi 6E/7 coverage throughout property
|
||||
- **📹 UniFi Protect** - Integrated video surveillance
|
||||
- **📞 UniFi Talk** - VoIP phone system
|
||||
- **🚪 UniFi Access** - Door access control
|
||||
|
||||
### **Network Segmentation Strategy**
|
||||
```bash
|
||||
# VLAN Design for Homelab
|
||||
VLAN 1 - Management (192.168.1.0/24) # UniFi devices, infrastructure
|
||||
VLAN 10 - Homelab (192.168.10.0/24) # Servers, NAS, compute
|
||||
VLAN 20 - IoT (192.168.20.0/24) # Smart home devices
|
||||
VLAN 30 - Guest (192.168.30.0/24) # Guest network, isolated
|
||||
VLAN 40 - Security (192.168.40.0/24) # Cameras, access control
|
||||
VLAN 50 - DMZ (192.168.50.0/24) # Public-facing services
|
||||
VLAN 100 - Trunk (All VLANs) # Inter-VLAN routing
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🌐 Dream Machine Pro/SE Setup
|
||||
|
||||
### **Initial Configuration**
|
||||
|
||||
#### **Physical Setup**
|
||||
```bash
|
||||
# 1. Connect modem to WAN port (port 11 on UDM-Pro)
|
||||
# 2. Connect computer to LAN port (port 1-8)
|
||||
# 3. Power on and wait for LED to turn white (5-10 minutes)
|
||||
# 4. Access setup at: https://192.168.1.1
|
||||
```
|
||||
|
||||
#### **UniFi OS Setup**
|
||||
```bash
|
||||
# Initial setup wizard:
|
||||
# 1. Create UniFi account or sign in
|
||||
# 2. Set device name: "Homelab-UDM-Pro"
|
||||
# 3. Configure WiFi (temporary - will be replaced by APs)
|
||||
# 4. Set admin password (save to password manager)
|
||||
# 5. Enable automatic updates
|
||||
# 6. Complete setup and access UniFi Network
|
||||
```
|
||||
|
||||
### **Network Configuration**
|
||||
|
||||
#### **WAN Configuration**
|
||||
```bash
|
||||
# Navigate to: Settings → Internet
|
||||
|
||||
# WAN Settings:
|
||||
Connection Type: DHCP (or Static/PPPoE based on ISP)
|
||||
VLAN ID: [Leave blank unless ISP requires]
|
||||
DNS Servers: 1.1.1.1, 8.8.8.8 (or custom)
|
||||
IPv6: Enable if supported by ISP
|
||||
|
||||
# Advanced WAN Settings:
|
||||
Load Balancing: Disabled (single WAN)
|
||||
Smart Queues: Enable for QoS
|
||||
Bandwidth Limits: Set to 90% of actual speeds
|
||||
```
|
||||
|
||||
#### **LAN Configuration**
|
||||
```bash
|
||||
# Navigate to: Settings → Networks
|
||||
|
||||
# Default LAN Network:
|
||||
Name: Management
|
||||
VLAN ID: 1
|
||||
Gateway/Subnet: 192.168.1.1/24
|
||||
DHCP Range: 192.168.1.100-192.168.1.200
|
||||
DHCP Lease Time: 86400 seconds (24 hours)
|
||||
DNS Servers: 192.168.1.1 (UDM) or Pi-hole IP
|
||||
Domain Name: vish.local
|
||||
```
|
||||
|
||||
### **VLAN Configuration**
|
||||
|
||||
#### **Create VLANs**
|
||||
```bash
|
||||
# Navigate to: Settings → Networks → Create New Network
|
||||
|
||||
# Homelab VLAN
|
||||
Name: Homelab
|
||||
VLAN ID: 10
|
||||
Gateway/Subnet: 192.168.10.1/24
|
||||
DHCP Range: 192.168.10.100-192.168.10.200
|
||||
Purpose: Corporate
|
||||
IGMP Snooping: Enable
|
||||
Multicast DNS: Enable
|
||||
|
||||
# IoT VLAN
|
||||
Name: IoT
|
||||
VLAN ID: 20
|
||||
Gateway/Subnet: 192.168.20.1/24
|
||||
DHCP Range: 192.168.20.100-192.168.20.200
|
||||
Purpose: IoT
|
||||
Block LAN Access: Enable
|
||||
Internet Access: Enable
|
||||
|
||||
# Guest VLAN
|
||||
Name: Guest
|
||||
VLAN ID: 30
|
||||
Gateway/Subnet: 192.168.30.1/24
|
||||
DHCP Range: 192.168.30.100-192.168.30.200
|
||||
Purpose: Guest
|
||||
Guest Policy: Apply guest policies
|
||||
Bandwidth Limit: 50 Mbps down, 10 Mbps up
|
||||
|
||||
# Security VLAN
|
||||
Name: Security
|
||||
VLAN ID: 40
|
||||
Gateway/Subnet: 192.168.40.1/24
|
||||
DHCP Range: 192.168.40.100-192.168.40.200
|
||||
Purpose: Security
|
||||
IGMP Snooping: Enable
|
||||
|
||||
# DMZ VLAN
|
||||
Name: DMZ
|
||||
VLAN ID: 50
|
||||
Gateway/Subnet: 192.168.50.1/24
|
||||
DHCP Range: 192.168.50.100-192.168.50.200
|
||||
Purpose: Corporate
|
||||
```
|
||||
|
||||
### **Firewall Rules**
|
||||
|
||||
#### **Inter-VLAN Rules**
|
||||
```bash
|
||||
# Navigate to: Settings → Security → Traffic & Firewall Rules
|
||||
|
||||
# Allow Homelab to Management
|
||||
Name: Homelab-to-Management
|
||||
Rule Applied: Before Predefined Rules
|
||||
Action: Accept
|
||||
Source: Homelab Network (192.168.10.0/24)
|
||||
Destination: Management Network (192.168.1.0/24)
|
||||
Protocol: All
|
||||
|
||||
# Block IoT to other VLANs
|
||||
Name: Block-IoT-to-Internal
|
||||
Rule Applied: Before Predefined Rules
|
||||
Action: Drop
|
||||
Source: IoT Network (192.168.20.0/24)
|
||||
Destination: Management, Homelab Networks
|
||||
Protocol: All
|
||||
Logging: Enable
|
||||
|
||||
# Allow specific IoT to Homelab (for Home Assistant)
|
||||
Name: IoT-to-HomeAssistant
|
||||
Rule Applied: Before Predefined Rules
|
||||
Action: Accept
|
||||
Source: IoT Network (192.168.20.0/24)
|
||||
Destination: 192.168.10.102 (Home Assistant)
|
||||
Port: 8123
|
||||
Protocol: TCP
|
||||
|
||||
# Block Guest from all internal networks
|
||||
Name: Block-Guest-Internal
|
||||
Rule Applied: Before Predefined Rules
|
||||
Action: Drop
|
||||
Source: Guest Network (192.168.30.0/24)
|
||||
Destination: RFC1918 Networks
|
||||
Protocol: All
|
||||
Logging: Enable
|
||||
```
|
||||
|
||||
#### **Port Forwarding**
|
||||
```bash
|
||||
# Navigate to: Settings → Security → Internet Security → Port Forwarding
|
||||
|
||||
# WireGuard VPN
|
||||
Name: WireGuard-Atlantis
|
||||
From: WAN
|
||||
Port: 51820
|
||||
Forward IP: 192.168.10.100 (Atlantis)
|
||||
Forward Port: 51820
|
||||
Protocol: UDP
|
||||
Logging: Enable
|
||||
|
||||
# HTTPS Services
|
||||
Name: HTTPS-Proxy
|
||||
From: WAN
|
||||
Port: 443
|
||||
Forward IP: 192.168.10.100 (Atlantis)
|
||||
Forward Port: 8766
|
||||
Protocol: TCP
|
||||
Logging: Enable
|
||||
|
||||
# SSH Access (Non-standard port for security)
|
||||
Name: SSH-Management
|
||||
From: WAN
|
||||
Port: 2222
|
||||
Forward IP: 192.168.1.100 (Management host)
|
||||
Forward Port: 22
|
||||
Protocol: TCP
|
||||
Logging: Enable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔌 UniFi Switch Pro 48 Configuration
|
||||
|
||||
### **Physical Installation**
|
||||
```bash
|
||||
# 1. Mount in rack (1U height)
|
||||
# 2. Connect power (PoE++ requires both power inputs)
|
||||
# 3. Connect uplink to UDM-Pro (SFP+ for 10Gbps)
|
||||
# 4. Wait for adoption in UniFi Network controller
|
||||
```
|
||||
|
||||
### **Switch Configuration**
|
||||
|
||||
#### **Port Profiles**
|
||||
```bash
|
||||
# Navigate to: UniFi Devices → Switch → Ports
|
||||
|
||||
# Management Ports (1-8)
|
||||
Profile: Management
|
||||
VLAN: 1 (Management)
|
||||
PoE: Auto (for UniFi APs)
|
||||
Storm Control: Enable
|
||||
Port Isolation: Disable
|
||||
|
||||
# Homelab Servers (9-24)
|
||||
Profile: Homelab
|
||||
VLAN: 10 (Homelab)
|
||||
PoE: Auto
|
||||
Link Aggregation: Available for NAS
|
||||
Storm Control: Enable
|
||||
|
||||
# IoT Devices (25-32)
|
||||
Profile: IoT
|
||||
VLAN: 20 (IoT)
|
||||
PoE: Auto
|
||||
Storm Control: Enable
|
||||
Port Isolation: Enable
|
||||
|
||||
# Security Cameras (33-40)
|
||||
Profile: Security
|
||||
VLAN: 40 (Security)
|
||||
PoE: 802.3bt (PoE++)
|
||||
Storm Control: Enable
|
||||
|
||||
# DMZ Services (41-44)
|
||||
Profile: DMZ
|
||||
VLAN: 50 (DMZ)
|
||||
PoE: Disabled
|
||||
Storm Control: Enable
|
||||
|
||||
# Uplinks (45-48 + SFP+)
|
||||
Profile: Trunk
|
||||
VLANs: All (Tagged)
|
||||
Link Aggregation: Available
|
||||
```
|
||||
|
||||
#### **Link Aggregation (LACP)**
|
||||
```bash
|
||||
# For high-bandwidth devices (NAS, servers)
|
||||
# Navigate to: UniFi Devices → Switch → Settings → Link Aggregation
|
||||
|
||||
# Atlantis NAS (Primary)
|
||||
Name: Atlantis-LAG
|
||||
Ports: 9, 10
|
||||
Mode: LACP (802.3ad)
|
||||
Profile: Homelab
|
||||
|
||||
# Calypso NAS (Media)
|
||||
Name: Calypso-LAG
|
||||
Ports: 11, 12
|
||||
Mode: LACP (802.3ad)
|
||||
Profile: Homelab
|
||||
|
||||
# Uplink to UDM-Pro
|
||||
Name: Uplink-LAG
|
||||
Ports: SFP+ 1, SFP+ 2
|
||||
Mode: LACP (802.3ad)
|
||||
Profile: Trunk
|
||||
```
|
||||
|
||||
### **Advanced Switch Features**
|
||||
|
||||
#### **Storm Control**
|
||||
```bash
|
||||
# Navigate to: Settings → System → Advanced Features
|
||||
|
||||
# Enable storm control globally
|
||||
Broadcast Storm Control: 10% of port bandwidth
|
||||
Multicast Storm Control: 10% of port bandwidth
|
||||
Unknown Unicast Storm Control: 10% of port bandwidth
|
||||
```
|
||||
|
||||
#### **Spanning Tree Protocol**
|
||||
```bash
|
||||
# STP Configuration
|
||||
STP Mode: RSTP (Rapid Spanning Tree)
|
||||
Priority: 32768 (default)
|
||||
Forward Delay: 15 seconds
|
||||
Max Age: 20 seconds
|
||||
```
|
||||
|
||||
#### **IGMP Snooping**
|
||||
```bash
|
||||
# For multicast optimization (Plex, IPTV)
|
||||
IGMP Snooping: Enable
|
||||
IGMP Querier: Enable
|
||||
Fast Leave: Enable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📡 UniFi Access Points Configuration
|
||||
|
||||
### **Access Point Deployment**
|
||||
|
||||
#### **Recommended APs for Homelab**
|
||||
```bash
|
||||
# UniFi Access Point WiFi 7 Pro
|
||||
- WiFi 7 (802.11be)
|
||||
- 6 GHz support
|
||||
- 2.5 Gbps uplink
|
||||
- PoE+ powered
|
||||
- Coverage: ~2,500 sq ft
|
||||
|
||||
# UniFi Access Point WiFi 6 Long Range
|
||||
- WiFi 6 (802.11ax)
|
||||
- Extended range
|
||||
- 1 Gbps uplink
|
||||
- PoE powered
|
||||
- Coverage: ~3,000 sq ft
|
||||
|
||||
# UniFi Access Point WiFi 6 In-Wall
|
||||
- In-wall installation
|
||||
- Built-in switch ports
|
||||
- PoE powered
|
||||
- Coverage: ~1,500 sq ft
|
||||
```
|
||||
|
||||
#### **AP Placement Strategy**
|
||||
```bash
|
||||
# Coverage Planning:
|
||||
# 1. Central locations for maximum coverage
|
||||
# 2. Avoid interference sources (microwaves, baby monitors)
|
||||
# 3. Consider building materials (concrete, metal)
|
||||
# 4. Plan for both 2.4 GHz and 5/6 GHz coverage
|
||||
# 5. Use UniFi WiFiman app for site survey
|
||||
|
||||
# Recommended placement:
|
||||
Main Floor: 1x WiFi 7 Pro (central)
|
||||
Upper Floor: 1x WiFi 6 LR (central)
|
||||
Basement/Lab: 1x WiFi 6 Pro (near servers)
|
||||
Office: 1x WiFi 6 In-Wall (desk area)
|
||||
Outdoor: 1x WiFi 6 Mesh (if needed)
|
||||
```
|
||||
|
||||
### **WiFi Network Configuration**
|
||||
|
||||
#### **Create WiFi Networks**
|
||||
```bash
|
||||
# Navigate to: Settings → WiFi
|
||||
|
||||
# Main Network (Management + Homelab)
|
||||
Name: YourNetwork
|
||||
Password: "REDACTED_PASSWORD" password in password manager]
|
||||
Security: WPA3 Only
|
||||
VLAN: 1 (Management)
|
||||
Band: 2.4/5/6 GHz
|
||||
Channel Width: 160 MHz (5 GHz), 320 MHz (6 GHz)
|
||||
Transmit Power: Auto
|
||||
Fast Roaming: Enable
|
||||
BSS Transition: Enable
|
||||
UAPSD: Enable
|
||||
|
||||
# IoT Network
|
||||
Name: YourNetwork_IoT
|
||||
Password: "REDACTED_PASSWORD" password]
|
||||
Security: WPA2/WPA3
|
||||
VLAN: 20 (IoT)
|
||||
Band: 2.4/5 GHz (many IoT devices don't support 6 GHz)
|
||||
Channel Width: 80 MHz
|
||||
Client Isolation: Enable
|
||||
Block LAN Access: Enable
|
||||
|
||||
# Guest Network
|
||||
Name: YourNetwork_Guest
|
||||
Password: "REDACTED_PASSWORD" password or open with captive portal]
|
||||
Security: WPA2/WPA3
|
||||
VLAN: 30 (Guest)
|
||||
Band: 2.4/5 GHz
|
||||
Bandwidth Limit: 50 Mbps
|
||||
Time Limit: 8 hours
|
||||
Guest Policy: Apply restrictions
|
||||
```
|
||||
|
||||
#### **Advanced WiFi Settings**
|
||||
```bash
|
||||
# Navigate to: Settings → WiFi → Advanced
|
||||
|
||||
# Band Steering
|
||||
2.4 GHz: Enable
|
||||
5 GHz: Enable
|
||||
6 GHz: Enable (WiFi 7 APs)
|
||||
Prefer 5 GHz: Enable
|
||||
Prefer 6 GHz: Enable
|
||||
|
||||
# Airtime Fairness
|
||||
Enable: Yes (prevents slow devices from degrading performance)
|
||||
|
||||
# Multicast Enhancement
|
||||
Enable: Yes (improves streaming performance)
|
||||
|
||||
# Fast Roaming
|
||||
802.11r: Enable
|
||||
802.11k: Enable
|
||||
802.11v: Enable
|
||||
|
||||
# WiFi 6/7 Features
|
||||
OFDMA: Enable
|
||||
MU-MIMO: Enable
|
||||
BSS Coloring: Enable (WiFi 6/7)
|
||||
Target Wake Time: Enable
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📹 UniFi Protect Integration
|
||||
|
||||
### **UniFi Protect Setup**
|
||||
|
||||
#### **Camera Deployment**
|
||||
```bash
|
||||
# Recommended cameras for homelab security:
|
||||
|
||||
# UniFi Protect G5 Pro
|
||||
- 4K resolution
|
||||
- PoE++ powered
|
||||
- Night vision
|
||||
- Smart detection
|
||||
- Weatherproof
|
||||
|
||||
# UniFi Protect G4 Doorbell Pro
|
||||
- 2K resolution
|
||||
- Two-way audio
|
||||
- Package detection
|
||||
- PoE+ powered
|
||||
|
||||
# UniFi Protect G4 Bullet
|
||||
- 4K resolution
|
||||
- PoE+ powered
|
||||
- Infrared night vision
|
||||
- Vandal resistant
|
||||
```
|
||||
|
||||
#### **Storage Configuration**
|
||||
```bash
|
||||
# Navigate to: UniFi Protect → Settings → Storage
|
||||
|
||||
# Local Storage (UDM-Pro)
|
||||
Primary Storage: Internal HDD (3.5" bay)
|
||||
Capacity: 8TB+ recommended
|
||||
Retention: 30 days for 4K, 60 days for 1080p
|
||||
|
||||
# Network Storage (Optional)
|
||||
Secondary Storage: NAS (Atlantis/Calypso)
|
||||
Path: /volume1/surveillance
|
||||
Retention: 90+ days
|
||||
Backup: Enable automatic backup
|
||||
```
|
||||
|
||||
#### **Detection Settings**
|
||||
```bash
|
||||
# Smart Detection Configuration
|
||||
Person Detection: Enable
|
||||
Vehicle Detection: Enable
|
||||
Package Detection: Enable (doorbell)
|
||||
Animal Detection: Enable
|
||||
Motion Zones: Configure per camera
|
||||
Privacy Zones: Configure as needed
|
||||
|
||||
# Notifications
|
||||
Push Notifications: Enable for critical cameras
|
||||
Email Alerts: Configure for security events
|
||||
Webhook Integration: Home Assistant integration
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔒 Advanced Security Configuration
|
||||
|
||||
### **Threat Management**
|
||||
```bash
|
||||
# Navigate to: Settings → Security → Threat Management
|
||||
|
||||
# IDS/IPS
|
||||
Intrusion Detection: Enable
|
||||
Intrusion Prevention: Enable
|
||||
Malware Blocking: Enable
|
||||
Ad Blocking: Enable (or use Pi-hole)
|
||||
Country Blocking: Configure as needed
|
||||
|
||||
# DPI (Deep Packet Inspection)
|
||||
Application Identification: Enable
|
||||
Traffic Analysis: Enable
|
||||
Bandwidth Monitoring: Enable
|
||||
```
|
||||
|
||||
### **VPN Server**
|
||||
```bash
|
||||
# Navigate to: Settings → VPN
|
||||
|
||||
# Site-to-Site VPN (for remote locations)
|
||||
VPN Type: L2TP
|
||||
Pre-shared Key: [Generate strong key]
|
||||
User Authentication: Local users
|
||||
DNS Servers: 192.168.1.1
|
||||
|
||||
# Remote Access VPN
|
||||
VPN Type: L2TP or WireGuard
|
||||
Network: 192.168.100.0/24 (VPN client pool)
|
||||
DNS: Push homelab DNS servers
|
||||
Routes: Push homelab networks
|
||||
```
|
||||
|
||||
### **Network Access Control**
|
||||
```bash
|
||||
# Navigate to: Settings → Security → Network Access Control
|
||||
|
||||
# Device Authentication
|
||||
802.1X: Enable for enterprise devices
|
||||
MAC Authentication: Enable for IoT devices
|
||||
Guest Portal: Enable for guest network
|
||||
RADIUS Server: Configure if using external auth
|
||||
|
||||
# Device Fingerprinting
|
||||
Device Classification: Enable
|
||||
Automatic VLAN Assignment: Configure rules
|
||||
Quarantine VLAN: 192.168.99.0/24
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📊 Monitoring and Management
|
||||
|
||||
### **UniFi Network Monitoring**
|
||||
```bash
|
||||
# Navigate to: Insights → Overview
|
||||
|
||||
# Key Metrics to Monitor:
|
||||
- Bandwidth utilization per VLAN
|
||||
- Client count and distribution
|
||||
- AP performance and coverage
|
||||
- Switch port utilization
|
||||
- Security events and threats
|
||||
- Device health and uptime
|
||||
|
||||
# Alerts Configuration:
|
||||
- High bandwidth usage (>80%)
|
||||
- Device offline alerts
|
||||
- Security threat detection
|
||||
- Failed authentication attempts
|
||||
- Hardware health issues
|
||||
```
|
||||
|
||||
### **Integration with Homelab Monitoring**
|
||||
```bash
|
||||
# SNMP Configuration for Prometheus
|
||||
# Navigate to: Settings → System → Advanced
|
||||
|
||||
SNMP: Enable
|
||||
Community: homelab-monitoring
|
||||
Contact: admin@vish.local
|
||||
Location: Home Lab
|
||||
|
||||
# Add to Prometheus configuration:
|
||||
# /etc/prometheus/prometheus.yml
|
||||
- job_name: 'unifi'
|
||||
static_configs:
|
||||
- targets: ['192.168.1.1:161'] # UDM-Pro
|
||||
- targets: ['192.168.1.10:161'] # Switch
|
||||
metrics_path: /snmp
|
||||
params:
|
||||
module: [unifi]
|
||||
```
|
||||
|
||||
### **Grafana Dashboard**
|
||||
```bash
|
||||
# Import UniFi dashboards:
|
||||
# Dashboard ID: 11314 (UniFi Poller)
|
||||
# Dashboard ID: 11315 (UniFi Network Sites)
|
||||
|
||||
# Custom metrics to track:
|
||||
- Per-VLAN bandwidth usage
|
||||
- WiFi client distribution
|
||||
- Security event frequency
|
||||
- Device uptime statistics
|
||||
- PoE power consumption
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 Migration from Consumer Router
|
||||
|
||||
### **Migration Strategy**
|
||||
```bash
|
||||
# Phase 1: Parallel Deployment
|
||||
# 1. Deploy UDM-Pro alongside existing router
|
||||
# 2. Configure VLANs and basic networking
|
||||
# 3. Test connectivity and performance
|
||||
# 4. Migrate non-critical devices first
|
||||
|
||||
# Phase 2: Service Migration
|
||||
# 1. Update DHCP reservations
|
||||
# 2. Migrate port forwarding rules
|
||||
# 3. Update DNS settings
|
||||
# 4. Test all services and external access
|
||||
|
||||
# Phase 3: Complete Cutover
|
||||
# 1. Move WAN connection to UDM-Pro
|
||||
# 2. Disable old router
|
||||
# 3. Update all device configurations
|
||||
# 4. Verify all services operational
|
||||
```
|
||||
|
||||
### **Configuration Migration**
|
||||
```bash
|
||||
# Export current router configuration
|
||||
# Document all settings:
|
||||
- Static IP assignments
|
||||
- Port forwarding rules
|
||||
- WiFi networks and passwords
|
||||
- DNS settings
|
||||
- DDNS configuration
|
||||
- VPN settings
|
||||
|
||||
# Import to UniFi:
|
||||
# Most settings need manual recreation
|
||||
# Use network discovery to identify devices
|
||||
# Update homelab documentation with new IPs
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Advanced Features
|
||||
|
||||
### **Software-Defined Perimeter**
|
||||
```bash
|
||||
# Zero Trust Network Access
|
||||
# Navigate to: Settings → Security → Identity Enterprise
|
||||
|
||||
# Configure identity-based access:
|
||||
- User authentication via LDAP/AD
|
||||
- Device certificates
|
||||
- Conditional access policies
|
||||
- Application-level security
|
||||
```
|
||||
|
||||
### **Network Segmentation Automation**
|
||||
```bash
|
||||
# Dynamic VLAN Assignment
|
||||
# Based on device type, user, or certificate
|
||||
|
||||
# Rules examples:
|
||||
Device Type: Security Camera → VLAN 40
|
||||
Device Type: IoT Sensor → VLAN 20
|
||||
User Group: Admin → VLAN 1
|
||||
User Group: Guest → VLAN 30
|
||||
Certificate: Homelab-Cert → VLAN 10
|
||||
```
|
||||
|
||||
### **API Integration**
|
||||
```bash
|
||||
# UniFi Controller API
|
||||
# For automation and custom integrations
|
||||
|
||||
# Generate API key:
|
||||
# Settings → Admins → Create API Key
|
||||
|
||||
# Example API calls:
|
||||
# Get device status
|
||||
curl -X GET "https://192.168.1.1:443/proxy/network/api/s/default/stat/device" \
|
||||
-H "Authorization: Bearer YOUR_API_KEY"
|
||||
|
||||
# Update device configuration
|
||||
curl -X PUT "https://192.168.1.1:443/proxy/network/api/s/default/rest/device/DEVICE_ID" \
|
||||
-H "Authorization: Bearer YOUR_API_KEY" \
|
||||
-d '{"name": "New Device Name"}'
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📋 Deployment Checklist
|
||||
|
||||
### **Pre-Deployment**
|
||||
```bash
|
||||
☐ Plan VLAN structure and IP addressing
|
||||
☐ Document current network configuration
|
||||
☐ Order all Ubiquiti equipment
|
||||
☐ Plan physical installation locations
|
||||
☐ Prepare cable runs and power
|
||||
☐ Create migration timeline
|
||||
☐ Backup current router configuration
|
||||
☐ Notify users of planned downtime
|
||||
```
|
||||
|
||||
### **Installation Phase**
|
||||
```bash
|
||||
☐ Install UDM-Pro in rack/location
|
||||
☐ Install and configure switch
|
||||
☐ Install access points
|
||||
☐ Configure basic networking
|
||||
☐ Test internet connectivity
|
||||
☐ Configure VLANs and firewall rules
|
||||
☐ Test inter-VLAN communication
|
||||
☐ Configure WiFi networks
|
||||
☐ Test wireless connectivity
|
||||
```
|
||||
|
||||
### **Migration Phase**
|
||||
```bash
|
||||
☐ Migrate DHCP reservations
|
||||
☐ Update port forwarding rules
|
||||
☐ Configure DDNS
|
||||
☐ Test external access
|
||||
☐ Migrate devices to new VLANs
|
||||
☐ Update homelab service configurations
|
||||
☐ Test all services and applications
|
||||
☐ Update monitoring configurations
|
||||
☐ Update documentation
|
||||
☐ Decommission old equipment
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔗 Related Documentation
|
||||
|
||||
- [Network Architecture](networking.md) - Overall network design
|
||||
- [Tailscale Setup Guide](tailscale-setup-guide.md) - VPN integration with enterprise networking
|
||||
- [Laptop Travel Setup](laptop-travel-setup.md) - Remote access through enterprise network
|
||||
- [Kubernetes Cluster Setup](kubernetes-cluster-setup.md) - Container orchestration on enterprise network
|
||||
- [TP-Link Archer BE800 Setup](tplink-archer-be800-setup.md) - Consumer router alternative
|
||||
- [Security Model](security.md) - Security architecture
|
||||
- [Disaster Recovery Guide](../troubleshooting/disaster-recovery.md) - Network recovery procedures
|
||||
|
||||
---
|
||||
|
||||
**💡 Pro Tip**: Start with a basic UniFi setup and gradually add advanced features. The UniFi ecosystem is powerful but complex - implement VLANs, security policies, and advanced features incrementally to avoid overwhelming complexity during initial deployment.
|
||||
Reference in New Issue
Block a user