homelab-optimized/docs/admin/OPERATIONAL_NOTES.md
Sanitized mirror from private repository - 2026-04-20 00:58:22 UTC

Operational Notes & Known Issues

Last Updated: 2026-01-26

This document contains important operational notes, known issues, and fixes for the homelab infrastructure.


Server-Specific Notes

Concord NUC (100.72.55.21)

Node Exporter

  • Runs on bare metal (not containerized)
  • Port: 9100
  • Prometheus scrapes successfully from 100.72.55.21:9100
  • Do NOT deploy containerized node_exporter - it will conflict with the host service

Watchtower

  • Requires DOCKER_API_VERSION=1.44 environment variable
  • This is because the Portainer Edge Agent uses an older Docker API version
  • Without this env var, watchtower fails with: client version 1.25 is too old
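The error above is a version comparison failing: the client announces API 1.25 and the daemon rejects it. A minimal sketch of that comparison follows. It assumes 1.44 (the value set in the environment variable) is also the minimum the daemon accepts, which these notes do not state explicitly; the function names are hypothetical.

```python
def parse_api_version(version: str) -> tuple[int, int]:
    """Split a Docker API version such as "1.44" into (major, minor) integers."""
    major, minor = version.strip().split(".")
    return int(major), int(minor)


def is_too_old(client_version: str, minimum_version: str) -> bool:
    """Return True when the client version is below the required minimum.

    Versions are compared as integer pairs, not as floats or strings,
    so "1.9" is correctly treated as older than "1.44".
    """
    return parse_api_version(client_version) < parse_api_version(minimum_version)


def watchtower_environment(api_version: str = "1.44") -> dict[str, str]:
    """Environment entries to add to the watchtower container."""
    return {"DOCKER_API_VERSION": api_version}
```

Comparing integer pairs matters because API minor versions go past 9, so a float or string comparison would order them incorrectly.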

Invidious

  • Health check reports "unhealthy" but the application works fine
  • The health check calls /api/v1/trending which returns HTTP 500
  • This is a known upstream issue with YouTube's API changes
  • Workaround: Ignore the unhealthy status or modify the health check endpoint
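One way to modify the health check is to probe an endpoint that does not depend on the trending feed. The sketch below is illustrative only: the base URL (port 3000) and the replacement path `/api/v1/stats` are assumptions and should be verified against the deployed Invidious instance before use.

```python
import urllib.error
import urllib.request
from typing import Callable

# Assumed values, not taken from the deployed stack.
BASE_URL = "http://localhost:3000"
HEALTH_PATH = "/api/v1/stats"


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def is_healthy(
    base_url: str = BASE_URL,
    path: str = HEALTH_PATH,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Report healthy only when the probed endpoint returns HTTP 200."""
    return fetch(base_url + path) == 200
```

A container health check could call `is_healthy()` and exit non-zero on `False`. The `fetch` parameter exists so the logic can be exercised without a running instance.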

Prometheus Monitoring

Active Targets (as of 2026-01-26)

| Job | Target | Status |
|-----|--------|--------|
| prometheus | prometheus:9090 | 🟢 UP |
| homelab-node | 100.67.40.126:9100 | 🟢 UP |
| atlantis-node | 100.83.230.112:9100 | 🟢 UP |
| atlantis-snmp | 100.83.230.112:9116 | 🟢 UP |
| calypso-node | 100.103.48.78:9100 | 🟢 UP |
| calypso-snmp | 100.103.48.78:9116 | 🟢 UP |
| concord-nuc-node | 100.72.55.21:9100 | 🟢 UP |
| setillo-node | 100.125.0.20:9100 | 🟢 UP |
| setillo-snmp | 100.125.0.20:9116 | 🟢 UP |
| truenas-node | 100.75.252.64:9100 | 🟢 UP |
| proxmox-node | 100.87.12.28:9100 | 🟢 UP |
| raspberry-pis (pi-5) | 100.77.151.40:9100 | 🟢 UP |
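The target list above can be regenerated from the `/api/v1/targets` endpoint of the Prometheus HTTP API instead of being maintained by hand. The following is a sketch, assuming Prometheus is reachable at `http://prometheus:9090` (a name that may only resolve inside the Docker network); the function names are hypothetical.

```python
import json
import urllib.request

# Assumed address; adjust to wherever Prometheus is reachable from.
PROMETHEUS_URL = "http://prometheus:9090"


def fetch_targets(base_url: str = PROMETHEUS_URL) -> dict:
    """Download the raw target listing from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return sorted (job, instance, health) rows for every active target."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        labels = target.get("labels", {})
        rows.append((
            labels.get("job", "?"),
            labels.get("instance", "?"),
            target.get("health", "unknown"),
        ))
    return sorted(rows)


def down_targets(payload: dict) -> list[tuple[str, str, str]]:
    """Return only the rows whose health is anything other than "up"."""
    return [row for row in summarize_targets(payload) if row[2] != "up"]
```

Calling `down_targets(fetch_targets())` gives the rows that need attention; parsing is kept separate from fetching so it can be checked against a saved response.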

Intentionally Offline Targets

| Job | Target | Reason |
|-----|--------|--------|
| raspberry-pis (pi-5-kevin) | 100.123.246.75:9100 | Intentionally offline |
| vmi2076105-node | 100.99.156.20:9100 | Intentionally offline |
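When checking for DOWN targets, the two entries above are expected and can be filtered out so that only real failures remain. A minimal sketch, assuming the down instances have already been collected as `host:port` strings; the function name is hypothetical.

```python
# Targets listed in this document as intentionally offline.
INTENTIONALLY_OFFLINE = frozenset({
    "100.123.246.75:9100",  # raspberry-pis (pi-5-kevin)
    "100.99.156.20:9100",   # vmi2076105-node
})


def unexpected_down(down_instances, allowed=INTENTIONALLY_OFFLINE):
    """Return the down instances that are not on the allow-list, in input order."""
    return [instance for instance in down_instances if instance not in allowed]
```

For example, `unexpected_down(["100.99.156.20:9100", "100.72.55.21:9100"])` leaves only the Concord NUC entry, which would indicate a real outage.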

Deployment Architecture

Git-Linked Stacks

  • Most stacks are deployed from Gitea (git.vish.gg/Vish/homelab)
  • Branch: wip
  • Portainer pulls configs directly from the repo
  • Changes to repo configs will affect deployed stacks on next redeploy/update

Standalone Containers

The following containers are managed directly in Portainer (NOT Git-linked):

  • portainer / portainer_edge_agent - Infrastructure
  • watchtower - Auto-updates (on some servers)
  • node-exporter containers (where not bare metal)
  • Various testing/temporary containers

Bare Metal Services

Some services run directly on hosts, not in containers:

  • Concord NUC: node_exporter (port 9100)

Common Issues & Solutions

Issue: Watchtower restart loop on Edge Agent hosts

  • Symptom: Watchtower restarts continuously with the error "client version 1.25 is too old"
  • Cause: Portainer Edge Agent uses an older Docker API version
  • Solution: Add DOCKER_API_VERSION=1.44 to the watchtower container environment

Issue: Port 9100 already in use for node_exporter container

  • Symptom: Container fails to start, "address already in use"
  • Cause: node_exporter running on bare metal
  • Solution: Don't run containerized node_exporter; use the bare metal instance
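Before deploying a node_exporter container on a host, it is possible to check whether port 9100 is already taken. The sketch below tries to bind the port and reports failure; it is an approximation, since a bind can also fail for other reasons (for example, insufficient privileges). The function name is hypothetical.

```python
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if host:port cannot be bound, most often because it is taken."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return True
    return False
```

Running `port_in_use(9100)` on the Concord NUC should return `True` while the bare metal node_exporter is active. The check must run on the host itself (or in the host network namespace) to be meaningful.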

Issue: Invidious health check failing

  • Symptom: Container shows "unhealthy" but works fine
  • Cause: YouTube API changes causing /api/v1/trending to return 500
  • Solution: This is cosmetic; the app works. Consider updating the health check endpoint.


Maintenance Checklist

  • Check Prometheus targets regularly for DOWN status
  • Monitor watchtower logs for update failures
  • Review Portainer for containers in restart loops
  • Keep Git repo configs in sync with running stacks
  • Document any manual container changes in this file