Sanitized mirror from private repository - 2026-03-24 11:56:17 UTC
docs/MONITORING_UPDATE_SEATTLE.md
# Seattle Machine Monitoring Update

## Summary

The homelab monitoring system has been updated to replace the decommissioned VMI (100.99.156.20) with the reprovisioned Seattle machine (100.82.197.124).

## Changes Made

### 1. Prometheus Configuration Update

**File**: `/home/homelab/docker/monitoring/prometheus/prometheus.yml`

**Before**:
```yaml
- job_name: "vmi2076105-node"
  static_configs:
    - targets: ["100.99.156.20:9100"]
```

**After**:
```yaml
- job_name: "seattle-node"
  static_configs:
    - targets: ["100.82.197.124:9100"]
```

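A quick way to confirm that an edit like this took effect is to check that the old target is gone and the new one is present. The following is a minimal sketch, not the procedure that was actually used; the function name `check_targets` is hypothetical, and the two addresses are taken from the Before/After snippets.

```bash
#!/usr/bin/env bash
# Sanity-check a Prometheus config file after swapping a scrape target.
# OLD_TARGET / NEW_TARGET are the addresses from the Before/After snippets.
OLD_TARGET="100.99.156.20:9100"
NEW_TARGET="100.82.197.124:9100"

# check_targets is a hypothetical helper: it succeeds only if the old
# target is absent and the new target is present in the given file.
check_targets() {
  local config_file="$1"
  if grep -qF "$OLD_TARGET" "$config_file"; then
    echo "stale target $OLD_TARGET still present in $config_file"
    return 1
  fi
  if ! grep -qF "$NEW_TARGET" "$config_file"; then
    echo "new target $NEW_TARGET missing from $config_file"
    return 1
  fi
  echo "ok: $config_file"
}
```

It could be run against the production file, for example `check_targets /home/homelab/docker/monitoring/prometheus/prometheus.yml`. This only checks the target strings; if `promtool` is available, `promtool check config <file>` validates the full syntax.
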
### 2. Seattle Machine Configuration

#### Node Exporter Installation
- Node exporter was already running on the Seattle machine
- Service status: `active (running)` on port 9100
- Binary location: `/usr/local/bin/node_exporter`

#### Firewall Configuration
A UFW rule was added to allow access to node_exporter from the Tailscale network:
```bash
sudo ufw allow from 100.64.0.0/10 to any port 9100 comment 'Allow Tailscale to node_exporter'
```

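The rule works because every Tailscale address in this setup falls inside 100.64.0.0/10. A /10 prefix fixes the first octet at 100 and the top two bits of the second octet, so the second octet ranges from 64 to 127. The sketch below illustrates that arithmetic; `in_tailscale_range` is a hypothetical helper, and it does not validate that each octet is at most 255.

```bash
#!/usr/bin/env bash
# in_tailscale_range (hypothetical helper): succeeds if the IPv4 address
# lies inside 100.64.0.0/10, the range allowed by the UFW rule.
in_tailscale_range() {
  local ip="$1" o1 o2 o3 o4
  # Reject anything that is not four dot-separated decimal numbers.
  [[ "$ip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
  IFS=. read -r o1 o2 o3 o4 <<< "$ip"
  # /10 means: first octet is exactly 100, second octet is 64..127.
  # The 10# prefix forces base 10 so a leading zero is not read as octal.
  (( 10#$o1 == 100 && 10#$o2 >= 64 && 10#$o2 <= 127 ))
}
```

For example, `in_tailscale_range 100.82.197.124 && echo allowed` prints `allowed`, since 82 lies between 64 and 127.
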
#### SSH Access
- Accessible via `ssh seattle-tailscale` (configured in SSH config)
- Tailscale IP: 100.82.197.124
- Standard SSH key authentication

### 3. Monitoring Verification

#### Prometheus Targets Status
All monitoring targets are now healthy:
- **prometheus**: localhost:9090 ✅ UP
- **alertmanager**: alertmanager:9093 ✅ UP
- **node-exporter**: localhost:9100 ✅ UP
- **calypso-node**: 100.75.252.64:9100 ✅ UP
- **seattle-node**: 100.82.197.124:9100 ✅ UP
- **proxmox-node**: 100.87.12.28:9100 ✅ UP

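Target health like the list above can also be read from the Prometheus HTTP API at `/api/v1/targets`. The following is a sketch under assumptions: it requires `jq`, the function names `summarize_targets` and `unhealthy_targets` are hypothetical, and the Prometheus address in the usage line may differ on the homelab host.

```bash
#!/usr/bin/env bash
# Summarise target health from the JSON returned by /api/v1/targets.
# Reads the API response on stdin and prints one line per active target:
# job<TAB>instance<TAB>health
summarize_targets() {
  jq -r '.data.activeTargets[] | [.labels.job, .labels.instance, .health] | @tsv'
}

# Print only targets whose health is not "up".
# Prints nothing when every target is healthy.
unhealthy_targets() {
  summarize_targets | awk -F'\t' '$3 != "up"'
}
```

Assuming Prometheus is reachable at `localhost:9090`, a check could look like `curl -s http://localhost:9090/api/v1/targets | unhealthy_targets`. Empty output means all targets report `up`.
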
#### Metrics Collection
- Seattle machine metrics are being successfully scraped
- CPU, memory, disk, and network metrics available
- Historical data collection started immediately after configuration

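Scraping can be spot-checked by reading the node_exporter endpoint directly, since it serves plain-text samples of the form `name value`. This is a minimal sketch; `metric_value` is a hypothetical helper, and it only matches samples without labels, such as `node_load1`.

```bash
#!/usr/bin/env bash
# metric_value (hypothetical helper): read Prometheus text-format metrics
# on stdin and print the value of the first unlabelled sample named $1.
# "# HELP" / "# TYPE" lines and labelled samples (name{...}) do not match,
# because their first field is not exactly the metric name.
metric_value() {
  local name="$1"
  awk -v name="$name" '$1 == name { print $2; exit }'
}
```

From a host on the Tailscale network, usage could look like `curl -s http://100.82.197.124:9100/metrics | metric_value node_load1`, which prints the 1-minute load average if the endpoint is reachable.
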
## Technical Details

### Network Configuration
- **Tailscale Network**: 100.64.0.0/10
- **Seattle IP**: 100.82.197.124
- **Monitoring Port**: 9100 (node_exporter)
- **Protocol**: HTTP (internal network)

### Service Architecture
```
Prometheus (homelab) → Tailscale Network → Seattle Machine:9100 (node_exporter)
```

### Configuration Files Updated
1. `/home/homelab/docker/monitoring/prometheus/prometheus.yml` - Production config
2. `/home/homelab/organized/repos/homelab/prometheus/prometheus.yml` - Repository config

The update also fixed YAML indentation issues in the alertmanager targets.

## Verification Steps Completed

1. ✅ SSH connectivity to Seattle machine
2. ✅ Node exporter service running and accessible
3. ✅ Firewall rules configured for Tailscale access
4. ✅ Prometheus configuration updated and reloaded
5. ✅ Target health verification (UP status)
6. ✅ Metrics scraping confirmed
7. ✅ Repository configuration synchronized
8. ✅ Git commit with detailed change log

## Monitoring Capabilities

The Seattle machine now provides the following metrics:
- **System**: CPU usage, load average, uptime
- **Memory**: Total, available, used, cached
- **Disk**: Usage, I/O statistics, filesystem metrics
- **Network**: Interface statistics, traffic counters
- **Process**: Running processes, file descriptors

## Alert Coverage

The Seattle machine is now covered by all existing alert rules:
- **InstanceDown**: Fires if node_exporter becomes unavailable
- **HighCPUUsage**: Fires when CPU usage stays above 80% for 2+ minutes
- **HighMemoryUsage**: Fires when memory usage stays above 90% for 2+ minutes
- **DiskSpaceLow**: Fires when the root filesystem has less than 10% free space

## Next Steps

1. **Monitor Performance**: Watch the Seattle machine's metrics to establish a baseline
2. **Alert Tuning**: Adjust thresholds if needed based on the Seattle machine's characteristics
3. **Documentation**: This update is documented in the homelab repository
4. **Backup Verification**: Ensure the Seattle machine is included in backup monitoring

## Rollback Plan

If issues arise, the configuration can be quickly reverted:

```bash
# Revert the Prometheus config to the previous commit's version.
# HEAD~1 assumes the monitoring update is the most recent commit.
cd /home/homelab/docker/monitoring
git checkout HEAD~1 -- prometheus/prometheus.yml
# Restart Prometheus so it loads the restored config
docker compose restart prometheus
```

## Contact Information

- **Updated By**: OpenHands Agent
- **Date**: February 15, 2026
- **Commit**: fee90008 - "Update monitoring: Replace VMI with Seattle machine"
- **Repository**: homelab.git

---

**Status**: ✅ COMPLETED SUCCESSFULLY
**Monitoring**: ✅ ACTIVE AND HEALTHY
**Documentation**: ✅ UPDATED