Sanitized mirror from private repository - 2026-04-05 12:11:15 UTC
151
docs/infrastructure/MONITORING_ARCHITECTURE.md
Normal file
@@ -0,0 +1,151 @@

# Homelab Monitoring Architecture

This document explains the different monitoring setups in the homelab and their purposes.

## 🏗️ Architecture Overview

The homelab has **three distinct monitoring deployments** serving different purposes:

### 1. **Production GitOps Monitoring** (Primary)
- **Location**: `hosts/vms/homelab-vm/monitoring.yaml`
- **Deployment**: Portainer GitOps on homelab-vm
- **Purpose**: Production monitoring for all homelab infrastructure
- **Access**: https://gf.vish.gg (with Authentik SSO)
- **Status**: ✅ **ACTIVE** - This is the canonical monitoring stack

**Features:**
- Monitors all homelab devices (Synology NAS, nodes, VMs)
- Authentik OAuth2 SSO integration
- Dashboard configs embedded in the Docker Compose file
- Auto-provisioned datasources and dashboards
- SNMP monitoring for Synology devices

### 2. **Fixed Development Stack** (New)
- **Location**: `docker/monitoring/`
- **Deployment**: Standalone Docker Compose
- **Purpose**: Development and testing; carries the dashboard fixes that production does not have yet
- **Access**: http://localhost:3300 (admin/admin)
- **Status**: 🔧 **DEVELOPMENT** - For testing and dashboard fixes

**Features:**
- All dashboard datasource UIDs fixed
- Template variables working correctly
- Instance filters properly configured
- Verification scripts included
- Backup/restore functionality

### 3. **Atlantis Legacy Setup** (Deprecated)
- **Location**: `hosts/synology/atlantis/grafana_prometheus/`
- **Deployment**: Synology Docker on Atlantis
- **Purpose**: Legacy monitoring setup
- **Status**: 📦 **ARCHIVED** - Kept for reference

## 🔄 GitOps Workflow

### Production Deployment (homelab-vm)
```bash
# GitOps automatically deploys from this file (a path, not a command):
#   hosts/vms/homelab-vm/monitoring.yaml

# Portainer Stack Details:
# - Stack ID: 476
# - Endpoint: 443399
# - Auto-updates from git repository
```

### Development Testing (docker/monitoring)
```bash
# Manual deployment for testing:
cd docker/monitoring
docker-compose up -d

# Verify dashboards:
./verify-dashboard-sections.sh
```

## 📊 Dashboard Status

| Dashboard | Production (GitOps) | Development (Fixed) | Status |
|-----------|---------------------|---------------------|--------|
| Infrastructure Overview | ✅ Working | ✅ Fixed | Both functional |
| Synology NAS Monitoring | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |
| Node Exporter Full | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |
| Node Details | ⚠️ Needs UID fix | ✅ Fixed | Dev has fixes |

## 🔧 Applying Fixes to Production

To apply the dashboard fixes to the production GitOps deployment:

1. **Extract fixed dashboards** from `docker/monitoring/grafana/dashboards/`
2. **Update the embedded configs** in `hosts/vms/homelab-vm/monitoring.yaml`
3. **Test locally** using the development stack
4. **Commit changes** - GitOps will auto-deploy

### Example: Updating Synology Dashboard in GitOps

```bash
# 1. Extract the fixed dashboard JSON
cat docker/monitoring/grafana/dashboards/synology-nas-monitoring.json

# 2. Update the embedded config in monitoring.yaml
# Replace the dashboard_synology config content with the fixed JSON

# 3. Commit and push - GitOps handles deployment
git add hosts/vms/homelab-vm/monitoring.yaml
git commit -m "Fix Synology dashboard datasource UID in GitOps"
git push
```
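
Step 2 above is the manual part. If the dashboard is embedded as a YAML block scalar (an assumption; the layout of `monitoring.yaml` is not shown in this document), every line of the JSON has to be indented to sit under a key such as `content: |`. A minimal sketch of a helper for that, where the name `indent_for_yaml` and the default width of 6 spaces are hypothetical:

```bash
# Hypothetical helper: indent every line of a file by WIDTH spaces (default 6)
# so the output can be pasted under a YAML block scalar such as `content: |`.
indent_for_yaml() {
  file=$1
  width=${2:-6}
  pad=$(printf "%${width}s" '')   # a string of WIDTH spaces
  sed "s/^/${pad}/" "$file"
}
```

For example, `indent_for_yaml docker/monitoring/grafana/dashboards/synology-nas-monitoring.json 6` prints the JSON ready to paste; adjust the width to match the indentation of the surrounding YAML.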

## 🚀 Deployment Commands

### Production (GitOps - Automatic)
```bash
# No manual deployment needed
# Portainer GitOps auto-deploys from git repository
# Access: https://gf.vish.gg
```

### Development (Manual)
```bash
cd docker/monitoring
docker-compose up -d
# Access: http://localhost:3300
```

### Legacy (Manual - Not Recommended)
```bash
cd hosts/synology/atlantis/grafana_prometheus
# Deploy via Synology Docker UI
```

## 📋 Maintenance

### Updating Production Dashboards
1. Test fixes in `docker/monitoring/` first
2. Update embedded configs in `hosts/vms/homelab-vm/monitoring.yaml`
3. Commit changes for GitOps auto-deployment

### Backup Strategy
- **Production**: Automated via the GitOps repository
- **Development**: Use the `backup.sh` and `restore.sh` scripts
- **Legacy**: Manual Synology backup

## 🔍 Troubleshooting

### Dashboard "No Data" Issues
1. Check that the dashboard's datasource UID matches the UID of the Prometheus datasource
2. Verify that template variables have correct queries
3. Ensure instance filters are not empty
4. Test fixes in the development stack first
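
For step 1, it helps to see which datasource UIDs a dashboard actually references. A minimal sketch using only `grep`, `sed`, and `sort`; the function name `list_dashboard_uids` is hypothetical, and the pattern assumes UIDs are stored as `"uid": "<value>"` pairs, so the dashboard's own top-level `uid` appears in the output as well:

```bash
# Hypothetical helper: list the distinct "uid" values found in a Grafana
# dashboard JSON file. This matches every "uid" key, including the
# dashboard's own top-level uid, not only datasource references.
list_dashboard_uids() {
  grep -o '"uid"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}
```

Run it against a file in `docker/monitoring/grafana/dashboards/` and compare the result with the UID of the Prometheus datasource. A value such as `${DS_PROMETHEUS}` usually points to an import placeholder that was never resolved.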

### GitOps Deployment Issues
1. Check Portainer stack logs
2. Verify git repository connectivity
3. Ensure Docker configs are valid YAML
4. Test locally with development stack
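
For step 3, one common way to break YAML when pasting dashboard JSON is a line that starts with a tab, since YAML does not accept tabs as indentation. The sketch below is only a quick partial check, not a full YAML validation, and the function name `check_no_tab_indent` is hypothetical:

```bash
# Hypothetical helper: print (with line numbers) any lines that begin with a
# tab character. Returns 0 when none are found, 1 otherwise.
check_no_tab_indent() {
  tab=$(printf '\t')
  if grep -n "^${tab}" "$1"; then
    return 1
  fi
  return 0
}
```

Running `check_no_tab_indent hosts/vms/homelab-vm/monitoring.yaml` before committing catches this one class of error early; a clean result does not guarantee that the file is valid.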

## 📚 Related Documentation

- [Dashboard Verification Report](docker/monitoring/dashboard-verification-report.md)
- [Synology Dashboard Fix Report](docker/monitoring/synology-dashboard-fix-report.md)
- [Development Stack README](docker/monitoring/README.md)