# Homelab VM Runbook

*Proxmox VM - Monitoring & DevOps*

**Endpoint ID:** 443399

**Status:** 🟢 Online

**Hardware:** 4 vCPU, 28GB RAM

**Access:** `192.168.0.210`

---

## Overview

The Homelab VM runs the monitoring, alerting, and development services as a guest on Proxmox VE.

## Hardware Specs

| Component | Specification |
|----------|---------------|
| Platform | Proxmox VE |
| vCPU | 4 cores |
| RAM | 28GB |
| Storage | 100GB SSD |
| Network | 1x 1GbE |

## Services

### Monitoring Stack

| Service | Port | Purpose |
|---------|------|---------|
| **Prometheus** | 9090 | Metrics collection |
| **Grafana** | 3000 | Dashboards |
| **Alertmanager** | 9093 | Alert routing |
| **Node Exporter** | 9100 | System metrics |
| **cAdvisor** | 8080 | Container metrics |
| **Uptime Kuma** | 3001 | Uptime monitoring |

### Development

| Service | Port | Purpose |
|---------|------|---------|
| Gitea | 3000 | Git hosting |
| Gitea Runner | 3008 | CI/CD runner |
| OpenHands | 8000 | AI developer |

### Database

| Service | Port | Purpose |
|---------|------|---------|
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Caching |

---

## Daily Operations

### Check Monitoring

```bash
# Prometheus targets
curl -s http://192.168.0.210:9090/api/v1/targets | jq .

# Grafana dashboards
open http://192.168.0.210:3000
```

### Alert Status

```bash
# Alertmanager
open http://192.168.0.210:9093

# Check ntfy for alerts (poll cached messages as JSON)
curl -s "ntfy.vish.local/homelab-alerts/json?poll=1" | head -20
```

---

## Prometheus Configuration

### Scraping Targets

- Node exporters (all hosts)
- cAdvisor (all hosts)
- Prometheus self-monitoring
- Application-specific metrics

### Retention

- Time: 30 days
- Storage: 20GB

### Maintenance

```bash
# Check TSDB size
du -sh /var/lib/prometheus/

# Inspect TSDB blocks and cardinality (compaction itself runs automatically)
docker exec prometheus promtool tsdb analyze /prometheus
```

---

## Grafana Dashboards

### Key Dashboards

- Infrastructure Overview
- Container Health
- Network Traffic
- Service-specific metrics

### Alert Rules

- CPU > 80% for 5 minutes
- Memory > 90% for 5 minutes
- Disk > 85%
- Service down > 2 minutes

---

## Common Issues

### Prometheus Not Scraping

1. Check targets: Prometheus UI → Status → Targets
2. Verify network connectivity
3. Check firewall rules
4. Review scrape errors in logs

### Grafana Dashboards Slow

1. Check Prometheus query performance
2. Reduce time range
3. Optimize queries
4. Check resource usage

### Alerts Not Firing

1. Verify Alertmanager config
2. Check ntfy integration
3. Review alert rule syntax
4. Test with an artificial alert

---

## Maintenance

### Weekly

- [ ] Review alert history
- [ ] Check disk space
- [ ] Verify backups

### Monthly

- [ ] Clean old metrics
- [ ] Update dashboards
- [ ] Review alert thresholds

### Quarterly

- [ ] Test alert notifications
- [ ] Review retention policy
- [ ] Optimize queries

---

## Backup Procedures

### Configuration

```bash
# Grafana dashboards
cp -r /opt/grafana/dashboards /backup/

# Prometheus rules
cp -r /opt/prometheus/rules /backup/
```

### Ansible

```bash
ansible-playbook ansible/automation/playbooks/backup_configs.yml --tags homelab_vm
```

---

## Emergency Procedures

### Prometheus Full

1. Check storage: `docker system df`
2. Reduce retention. This is normally set with the `--storage.tsdb.retention.time` and `--storage.tsdb.retention.size` startup flags rather than in prometheus.yml.
3. If space is still short, delete the write-ahead log: `docker exec prometheus rm -rf /prometheus/wal/*`. This discards the most recent samples that have not yet been written to a block, not old data.
4. Restart container

### VM Down

1. Check Proxmox: `qm list`
2. Start VM: `qm start <vmid>`
3. Check console: `qm terminal <vmid>`
4. Review logs in Proxmox UI

---

## Useful Commands

```bash
# SSH access
ssh homelab@192.168.0.210

# Restart monitoring
cd /opt/docker/prometheus && docker-compose restart
cd /opt/docker/grafana && docker-compose restart

# Check targets (list only those that are down)
curl -s http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.health=="down")'

# View logs
docker logs prometheus
docker logs grafana
docker logs alertmanager
```

---

## Links

- [Prometheus](http://192.168.0.210:9090)
- [Grafana](http://192.168.0.210:3000)
- [Alertmanager](http://192.168.0.210:9093)
- [Uptime Kuma](http://192.168.0.210:3001)
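
## Appendix: Disk Check Sketch

The weekly "Check disk space" item and the "Disk > 85%" alert rule can be checked by hand from a shell on the VM. The following is a minimal sketch, not part of the deployed stack: the function names `usage_percent`, `over_limit`, and `check_disk` and the `DISK_LIMIT` variable are hypothetical, and the 85 value simply mirrors the alert rule above. It assumes a POSIX `df` that supports `-P`.

```bash
#!/usr/bin/env bash
# Sketch only: compare filesystem usage against the 85% disk alert threshold.
# DISK_LIMIT and all function names are assumptions made for this example.

DISK_LIMIT=85

# usage_percent PATH: print the used percentage (integer, no % sign)
# of the filesystem that holds PATH, taken from POSIX-format df output.
usage_percent() {
  df -P "$1" | awk 'NR==2 { gsub(/%/, "", $5); print $5 }'
}

# over_limit USED LIMIT: succeed only when USED is strictly greater than LIMIT.
over_limit() {
  [ "$1" -gt "$2" ]
}

# check_disk PATH: print a one-line status.
# Returns 0 when within the limit, 1 when over it, 2 when usage cannot be read.
check_disk() {
  local used
  used="$(usage_percent "$1")"
  case "$used" in
    ''|*[!0-9]*)
      echo "ERROR cannot read usage for $1" >&2
      return 2
      ;;
  esac
  if over_limit "$used" "$DISK_LIMIT"; then
    echo "WARN $1 at ${used}% (limit ${DISK_LIMIT}%)"
    return 1
  fi
  echo "OK $1 at ${used}% (limit ${DISK_LIMIT}%)"
}
```

After sourcing the file, `check_disk /` or `check_disk /var/lib/prometheus` prints one status line per path. The non-zero return code on a warning makes it easy to chain into a cron job or a notification step.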