Sanitized mirror from private repository - 2026-04-05 09:28:04 UTC
commit cd1988a21b
# Operational Notes & Known Issues
*Last Updated: 2026-01-26*
This document contains important operational notes, known issues, and fixes for the homelab infrastructure.
---
## Server-Specific Notes
### Concord NUC (100.72.55.21)
#### Node Exporter
- **Runs on bare metal** (not containerized)
- Port: 9100
- Prometheus scrapes successfully from `100.72.55.21:9100`
- Do NOT deploy a containerized node_exporter on this host - it will conflict with the host service on port 9100
#### Watchtower
- Requires the `DOCKER_API_VERSION=1.44` environment variable on the watchtower container
- Reason: the Portainer Edge Agent uses an older Docker API version
- Without this variable, watchtower fails with: `client version 1.25 is too old`
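The error message is the result of a version comparison: the client asked for API 1.25, which is below the minimum that was accepted. A minimal Python sketch of that comparison follows. The function names are hypothetical, and treating 1.44 as the minimum is an assumption based on the value pinned in these notes, not something read from Docker itself.

```python
def parse_api_version(version: str) -> tuple[int, int]:
    """Split a Docker API version such as '1.44' into (major, minor)."""
    major, minor = version.split(".")
    return int(major), int(minor)


def is_client_too_old(client_version: str, minimum_version: str) -> bool:
    """Return True when the client API version is below the minimum.

    Versions are compared as integer tuples, not as floats or strings,
    so that '1.9' correctly sorts before '1.25'.
    """
    return parse_api_version(client_version) < parse_api_version(minimum_version)


# Assumed values: 1.25 from the error message, 1.44 from the documented fix.
print(is_client_too_old("1.25", "1.44"))  # True
print(is_client_too_old("1.44", "1.44"))  # False
```

Setting `DOCKER_API_VERSION=1.44` corresponds to the second case: the client then requests a version that is no longer below the minimum.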
#### Invidious
- Health check reports "unhealthy" but the application works fine
- The health check calls `/api/v1/trending` which returns HTTP 500
- This is a known upstream issue caused by YouTube API changes
- **Workaround**: Ignore the unhealthy status, or point the health check at an endpoint that does not depend on trending data
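To compare candidate health check endpoints before changing the container definition, a small probe can help. The sketch below is only an illustration: `status_is_healthy` and `probe` are hypothetical helper names, and treating any 2xx response as healthy is an assumption about what the check should accept.

```python
import urllib.error
import urllib.request


def status_is_healthy(status_code: int) -> bool:
    """Treat any 2xx response as healthy (an assumed policy)."""
    return 200 <= status_code < 300


def probe(url: str, timeout: float = 5.0) -> bool:
    """Send a GET request to `url` and report whether the answer counts as healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return status_is_healthy(response.status)
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx, e.g. the HTTP 500 from /api/v1/trending.
        return status_is_healthy(exc.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: not healthy.
        return False


print(status_is_healthy(500))  # False
print(status_is_healthy(200))  # True
```

For example, `probe("http://localhost:3000/api/v1/trending")` would be expected to return `False` while the upstream issue persists. Both the host and the port in that URL are assumptions; adjust them to the actual deployment.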
---
## Prometheus Monitoring
### Active Targets (as of 2026-01-26)
| Job | Target | Status |
|-----|--------|--------|
| prometheus | prometheus:9090 | 🟢 UP |
| homelab-node | 100.67.40.126:9100 | 🟢 UP |
| atlantis-node | 100.83.230.112:9100 | 🟢 UP |
| atlantis-snmp | 100.83.230.112:9116 | 🟢 UP |
| calypso-node | 100.103.48.78:9100 | 🟢 UP |
| calypso-snmp | 100.103.48.78:9116 | 🟢 UP |
| concord-nuc-node | 100.72.55.21:9100 | 🟢 UP |
| setillo-node | 100.125.0.20:9100 | 🟢 UP |
| setillo-snmp | 100.125.0.20:9116 | 🟢 UP |
| truenas-node | 100.75.252.64:9100 | 🟢 UP |
| proxmox-node | 100.87.12.28:9100 | 🟢 UP |
| raspberry-pis (pi-5) | 100.77.151.40:9100 | 🟢 UP |
### Intentionally Offline Targets
| Job | Target | Reason |
|-----|--------|--------|
| raspberry-pis (pi-5-kevin) | 100.123.246.75:9100 | Intentionally offline |
| vmi2076105-node | 100.99.156.20:9100 | Intentionally offline |
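The two tables above can be turned into a check that reports only unexpected outages. The following Python sketch filters the JSON returned by Prometheus's `/api/v1/targets` endpoint. It assumes each target's `instance` label equals the address in the Target column (relabeling could change that), and the helper names and sample payload are illustrative rather than taken from this setup.

```python
import json
import urllib.request

# Instances these notes list as intentionally offline.
INTENTIONALLY_OFFLINE = {"100.123.246.75:9100", "100.99.156.20:9100"}


def unexpected_down(payload: dict, ignored: set[str] = INTENTIONALLY_OFFLINE) -> list[str]:
    """Return 'job instance' strings for targets that are not up and not ignored."""
    down = []
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        if target["health"] != "up" and labels["instance"] not in ignored:
            down.append(f'{labels["job"]} {labels["instance"]}')
    return sorted(down)


def fetch_targets(base_url: str) -> dict:
    """Fetch /api/v1/targets from a Prometheus server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as response:
        return json.load(response)


# Hand-written sample in the shape of the API response (not real output).
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "concord-nuc-node", "instance": "100.72.55.21:9100"}, "health": "up"},
            {"labels": {"job": "raspberry-pis", "instance": "100.123.246.75:9100"}, "health": "down"},
            {"labels": {"job": "truenas-node", "instance": "100.75.252.64:9100"}, "health": "down"},
        ]
    },
}

print(unexpected_down(sample))  # ['truenas-node 100.75.252.64:9100']
```

Against a live server this would be called as `unexpected_down(fetch_targets("http://prometheus:9090"))`, where the base URL is an assumption derived from the `prometheus:9090` target listed above.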
---
## Deployment Architecture
### Git-Linked Stacks
- Most stacks are deployed from Gitea (`git.vish.gg/Vish/homelab`)
- Branch: `wip`
- Portainer pulls configs directly from the repo
- Changes to repo configs will affect deployed stacks on next redeploy/update
### Standalone Containers
The following containers are managed directly in Portainer (NOT Git-linked):
- `portainer` / `portainer_edge_agent` - Infrastructure
- `watchtower` - Auto-updates (on some servers)
- `node-exporter` containers (where not bare metal)
- Various testing/temporary containers
### Bare Metal Services
Some services run directly on hosts, not in containers:
- **Concord NUC**: node_exporter (port 9100)
---
## Common Issues & Solutions
### Issue: Watchtower restart loop on Edge Agent hosts
**Symptom**: Watchtower continuously restarts with an API version error (`client version 1.25 is too old`)
**Cause**: Portainer Edge Agent uses an older Docker API version
**Solution**: Add `DOCKER_API_VERSION=1.44` to the watchtower container environment
### Issue: Port 9100 already in use for node_exporter container
**Symptom**: Container fails to start, "address already in use"
**Cause**: node_exporter is already running on bare metal and bound to port 9100
**Solution**: Don't run a containerized node_exporter on that host; use the bare metal instance
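Before deploying a node_exporter container to a host, it can be worth checking whether something already listens on the port. This is a minimal Python sketch of such a check; `port_in_use` is a hypothetical helper, and it only detects listeners reachable at the given address (localhost by default).

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 9100 is the node_exporter port used throughout these notes.
    if port_in_use(9100):
        print("Port 9100 is taken; a node_exporter is probably already running here.")
    else:
        print("Port 9100 is free.")
```

A connection test was chosen over a bind test because it does not briefly occupy the port itself, at the cost of missing listeners bound to a different interface.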
### Issue: Invidious health check failing
**Symptom**: Container shows "unhealthy" but works fine
**Cause**: YouTube API changes causing /api/v1/trending to return 500
**Solution**: This is cosmetic; the app works. Consider pointing the health check at a different endpoint.
---
## Maintenance Checklist
- [ ] Check Prometheus targets regularly for DOWN status
- [ ] Monitor watchtower logs for update failures
- [ ] Review Portainer for containers in restart loops
- [ ] Keep Git repo configs in sync with running stacks
- [ ] Document any manual container changes in this file