# Operational Notes & Known Issues

*Last Updated: 2026-01-26*

This document contains important operational notes, known issues, and fixes for the homelab infrastructure.

---

## Server-Specific Notes

### Concord NUC (100.72.55.21)

#### Node Exporter

- **Runs on bare metal** (not containerized)
- Port: 9100
- Prometheus scrapes it successfully at `100.72.55.21:9100`
- Do NOT deploy a containerized node_exporter on this host; it will conflict with the host service on port 9100

#### Watchtower

- Requires the `DOCKER_API_VERSION=1.44` environment variable
- Without it, Watchtower fails with: `client version 1.25 is too old`
- The error is returned by the host's Docker daemon: Watchtower's Docker client requests API version 1.25, which is below the oldest version the daemon accepts. Pinning the variable to 1.44 makes Watchtower request a version the daemon supports.

#### Invidious

- The health check reports "unhealthy", but the application works fine
- The health check calls `/api/v1/trending`, which returns HTTP 500
- This is a known upstream issue caused by YouTube's API changes
- **Workaround**: Ignore the unhealthy status, or point the health check at a different endpoint

---

## Prometheus Monitoring

### Active Targets (as of 2026-01-26)

| Job | Target | Status |
|-----|--------|--------|
| prometheus | prometheus:9090 | 🟢 UP |
| homelab-node | 100.67.40.126:9100 | 🟢 UP |
| atlantis-node | 100.83.230.112:9100 | 🟢 UP |
| atlantis-snmp | 100.83.230.112:9116 | 🟢 UP |
| calypso-node | 100.103.48.78:9100 | 🟢 UP |
| calypso-snmp | 100.103.48.78:9116 | 🟢 UP |
| concord-nuc-node | 100.72.55.21:9100 | 🟢 UP |
| setillo-node | 100.125.0.20:9100 | 🟢 UP |
| setillo-snmp | 100.125.0.20:9116 | 🟢 UP |
| truenas-node | 100.75.252.64:9100 | 🟢 UP |
| proxmox-node | 100.87.12.28:9100 | 🟢 UP |
| raspberry-pis (pi-5) | 100.77.151.40:9100 | 🟢 UP |

### Intentionally Offline Targets

| Job | Target | Reason |
|-----|--------|--------|
| raspberry-pis (pi-5-kevin) | 100.123.246.75:9100 | Intentionally offline |
| vmi2076105-node | 100.99.156.20:9100 | Intentionally offline |

---

## Deployment Architecture

### Git-Linked Stacks

- Most stacks are deployed from Gitea (`git.vish.gg/Vish/homelab`)
- Branch: `wip`
- Portainer pulls configs directly from the repo
- Changes to configs in the repo take effect on a deployed stack at its next redeploy/update

### Standalone Containers

The following containers are managed directly in Portainer (NOT Git-linked):

- `portainer` / `portainer_edge_agent`: infrastructure
- `watchtower`: auto-updates (on some servers)
- `node-exporter` containers (on hosts where node_exporter does not run on bare metal)
- Various testing/temporary containers

### Bare Metal Services

Some services run directly on hosts, not in containers:

- **Concord NUC**: node_exporter (port 9100)

---

## Common Issues & Solutions

### Issue: Watchtower restart loop on Edge Agent hosts

**Symptom**: Watchtower continuously restarts with an API version error (`client version 1.25 is too old`)

**Cause**: Watchtower's Docker client requests API version 1.25, which the host's Docker daemon rejects as too old

**Solution**: Add `DOCKER_API_VERSION=1.44` to the Watchtower container's environment

### Issue: Port 9100 already in use for node_exporter container

**Symptom**: Container fails to start with "address already in use"

**Cause**: node_exporter is already running on bare metal and holds port 9100

**Solution**: Don't run a containerized node_exporter; use the bare metal instance

### Issue: Invidious health check failing

**Symptom**: Container shows "unhealthy" but works fine

**Cause**: YouTube API changes cause `/api/v1/trending` to return HTTP 500

**Solution**: This is cosmetic; the app works. Consider pointing the health check at a different endpoint.

---

## Maintenance Checklist

- [ ] Check Prometheus targets regularly for DOWN status
- [ ] Monitor Watchtower logs for update failures
- [ ] Review Portainer for containers in restart loops
- [ ] Keep Git repo configs in sync with running stacks
- [ ] Document any manual container changes in this file
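
The "check Prometheus targets for DOWN status" item can be scripted against the Prometheus HTTP API (`GET /api/v1/targets`). This is a minimal sketch, not an existing tool in the repo. It assumes Prometheus is reachable at `http://prometheus:9090` from wherever the script runs (a hypothetical address; adjust to the host's Tailscale IP) and that each target's `instance` label is left at its default `host:port` value, so it can be matched against the Intentionally Offline Targets table. `PROMETHEUS_URL`, `INTENTIONALLY_OFFLINE`, `fetch_targets`, and `unexpected_down` are illustrative names.

```python
import json
import urllib.request

# Assumed address of the Prometheus server (hypothetical; adjust per host).
PROMETHEUS_URL = "http://prometheus:9090"

# Targets from the "Intentionally Offline Targets" table. Assumes the
# instance label is the default host:port of each scrape target.
INTENTIONALLY_OFFLINE = frozenset({
    "100.123.246.75:9100",  # raspberry-pis (pi-5-kevin)
    "100.99.156.20:9100",   # vmi2076105-node
})


def fetch_targets(base_url: str = PROMETHEUS_URL, timeout: float = 10.0) -> dict:
    """Fetch the active scrape targets from the Prometheus HTTP API."""
    url = f"{base_url}/api/v1/targets?state=active"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def unexpected_down(payload: dict, ignored: frozenset = INTENTIONALLY_OFFLINE) -> list:
    """Return (job, instance, last_error) for every target whose health is not
    "up", skipping targets that are known to be offline on purpose."""
    problems = []
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        instance = labels.get("instance", target.get("scrapeUrl", "?"))
        if target["health"] != "up" and instance not in ignored:
            problems.append((labels.get("job", "?"), instance, target.get("lastError", "")))
    return problems
```

A usage sketch: `for job, instance, error in unexpected_down(fetch_targets()): print(job, instance, error)`. Keeping the offline list in the script, rather than removing those jobs from Prometheus, leaves them visible in the UI without reporting them as failures.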
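
Before deploying a `node-exporter` container on a host, it can help to confirm that nothing already holds port 9100, which is the "address already in use" failure described above. A minimal sketch, assuming it runs directly on the target host (not inside a container); `port_in_use` is a hypothetical helper name. It attempts the same kind of bind that publishing the port would need; if the bind fails, some process (such as the bare metal node_exporter on Concord NUC) is most likely already listening.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if host:port cannot be bound, which usually means another
    process (for example a bare metal node_exporter) is already listening."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE is the common case; other bind errors are also
            # treated as "not safe to deploy".
            return True
    return False
```

On Concord NUC, `port_in_use(9100)` is expected to return `True` while the bare metal node_exporter is running, which is the signal to skip the containerized exporter there.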
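
The Watchtower error `client version 1.25 is too old` is the Docker daemon refusing a client API version outside the range it supports. To see that range before pinning `DOCKER_API_VERSION`, the Docker Engine API's `GET /version` endpoint reports both `ApiVersion` (newest supported) and `MinAPIVersion` (oldest accepted). This minimal sketch assumes the daemon socket is at `/var/run/docker.sock` and that the script runs on the host with permission to read it; `UnixHTTPConnection`, `daemon_api_range`, `parse_api_version`, and `client_version_supported` are hypothetical helpers, not part of any library. Versions are compared as integer tuples because plain string comparison would rank `"1.9"` above `"1.44"`.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that sends requests over a Unix domain socket."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def parse_api_version(version: str) -> tuple:
    """Turn "1.44" into (1, 44) so versions compare numerically."""
    major, minor = version.split(".")
    return int(major), int(minor)


def client_version_supported(client: str, daemon_min: str, daemon_max: str) -> bool:
    """True if a client pinned to `client` falls inside the daemon's API range."""
    v = parse_api_version(client)
    return parse_api_version(daemon_min) <= v <= parse_api_version(daemon_max)


def daemon_api_range(socket_path: str = "/var/run/docker.sock") -> tuple:
    """Return (MinAPIVersion, ApiVersion) as reported by the daemon's /version endpoint."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", "/version")
        body = conn.getresponse().read()
    finally:
        conn.close()
    info = json.loads(body)
    return info["MinAPIVersion"], info["ApiVersion"]
```

For example, after `min_api, max_api = daemon_api_range()`, calling `client_version_supported("1.25", min_api, max_api)` should be `False` on a host that produces the Watchtower error, and `"1.44"` should be accepted only if it lies between the two reported versions.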