Homelab Dashboard
Full-stack infrastructure monitoring and management UI
Service Overview
| Property | Value |
|---|---|
| Host | homelab-vm (192.168.0.210) |
| UI Port | 3100 |
| API Port | 18888 |
| URL | http://homelab.tail.vish.gg:3100 |
| Tech | Next.js 16 (standalone) + FastAPI + shadcn/ui + Tailwind CSS |
| Font | Exo 2 (bundled, weights 300-700) |
| Themes | 16 glassmorphism themes with localStorage persistence |
| AI Chat | Ollama-powered with live infrastructure data + repo doc search |
| Source | dashboard/ directory in homelab repo |
Architecture
The dashboard is two separate processes:
- FastAPI backend (dashboard/api/, port 18888) -- aggregates data from Portainer, Prometheus, Ollama, SQLite databases, log files, and external APIs (Jellyfin, Sonarr, Radarr, etc.)
- Next.js frontend (dashboard/ui/, port 3100) -- SPA with SSE for real-time updates and SWR polling for periodic data refresh
Data flow
Browser --> Next.js (port 3100) --rewrites /api/*--> FastAPI (port 18888)
                                                              |
                                                    +---------+---------+
                                                    |         |         |
                                               Portainer  Prometheus  SSH(olares)
                                            (5 endpoints)  (PromQL)   (kubectl, nvidia-smi)
                                                    |         |         |
                                              SQLite DBs  Log files   External APIs
                                        (email, restarts) (/tmp/*.log) (Sonarr, Radarr, etc.)
- The backend reuses scripts/lib/ modules (portainer.py, ollama.py, prometheus.py) via lib_bridge.py
- The frontend uses Next.js rewrites (next.config.ts) to proxy /api/* to the backend, so the browser never contacts port 18888 directly
- Real-time activity feed uses Server-Sent Events (SSE) with automatic reconnection (5s backoff)
- SWR polling with configurable intervals handles periodic data refresh
- Theme selection stored in localStorage under key homelab-theme
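To make the SSE part of the data flow concrete, here is a minimal sketch of how a raw SSE stream breaks down into events, using only the Python standard library. It is not the dashboard's own client (that lives in the Next.js frontend). The event names init and update match the /api/activity description, but the JSON payloads shown are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from the lines of an SSE stream."""
    event, data = "message", []  # "message" is the SSE default event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one optional space after the colon
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical payloads; the real event bodies are not documented here.
stream = [
    "event: init\n",
    'data: {"events": []}\n',
    "\n",
    ": keep-alive\n",
    "event: update\n",
    'data: {"type": "backup"}\n',
    "\n",
]
events = list(parse_sse(stream))
```

A reconnecting client would wrap this parser in a loop that reopens the connection after the 5s delay whenever the stream ends or errors.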
Pages (7 Tabs)
1. Dashboard (/)
Overview page with aggregate stats:
- Stat cards: total containers, emails processed today, unhealthy containers, Ollama status
- Activity feed: real-time SSE stream of parsed log events (email classifications, container restarts, backup results, drift detection, etc.)
- Calendar: upcoming events from Baikal CalDAV (192.168.0.200:12852)
- Jellyfin card: server status, active sessions, libraries
- Ollama/GPU card: GPU temp, VRAM usage, utilization %, power draw
- Host row: container counts per endpoint (Atlantis, Calypso, NUC, homelab, RPi5)
- Quick actions: restart Jellyfin, restart Ollama, pause/resume email organizers, run backup
- Disk usage: top disks by usage % from Prometheus
- Health score: 0-100 gauge with letter grade (A-F), computed from container health, GPU availability, Ollama status, backup status, and config drift
2. Infrastructure (/infrastructure)
- Container table with search, filter by endpoint, state badges
- Container log viewer modal
- Container restart button
- Olares K3s pod listing (all namespaces or filtered)
- GPU status card (nvidia-smi via SSH)
- Uptime Kuma monitors with up/down counts
- Disk usage from Prometheus
3. Media (/media)
- Jellyfin: now playing sessions, recently added items
- Plex: server status for Calypso and Atlantis instances, active sessions
- Tdarr Cluster: live worker progress bars with fps/ETA, node hardware info, error count, total space saved, files processed
- Sonarr: download queue, recent grab/import history
- Radarr: download queue, recent grab/import history
- SABnzbd: NZB download queue
- Deluge: torrent client status (active, downloading, seeding counts)
- Prowlarr: indexer stats (total, enabled, indexer list)
- Bazarr: subtitle status, SignalR connection state, wanted episodes/movies
- Audiobookshelf: library stats (audiobooks, ebooks, podcasts)
4. Automations (/automations)
- Email organizer stats: per-account (Gmail lz, Gmail dvish, Proton) with today's category breakdown, sender cache size
- Backup status: today's backup log entries, error detection
- Config drift: last drift check result, drift count
- Stack restarts: unhealthy container tracking entries from stack-restart.db
- Automation timeline: last run time for all 11 automation scripts
5. Expenses (/expenses)
- Monthly summary (total, count, month selector)
- Transactions table from data/expenses.csv
- Top 10 vendors by spend amount
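The monthly summary can be pictured with a short sketch like the one below. The column names date, vendor, and amount are assumptions about data/expenses.csv, and summarize is a hypothetical helper, not the actual router code.

```python
import csv
import io
from collections import defaultdict


def summarize(csv_text: str, month: str) -> dict:
    """Total, count, and top 10 vendors for one YYYY-MM month."""
    total = 0.0
    count = 0
    by_vendor = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed column names: date (YYYY-MM-DD), vendor, amount
        if not row["date"].startswith(month):
            continue
        amount = float(row["amount"])
        total += amount
        count += 1
        by_vendor[row["vendor"]] += amount
    top = sorted(by_vendor.items(), key=lambda kv: kv[1], reverse=True)[:10]
    return {"month": month, "total": round(total, 2), "count": count, "top_vendors": top}


# Made-up sample rows for illustration only
sample = """date,vendor,amount
2025-01-03,Grocer,42.50
2025-01-09,Hosting,12.00
2025-01-20,Grocer,17.25
2025-02-01,Hosting,12.00
"""
summary = summarize(sample, "2025-01")
```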
6. Network (/network)
- AdGuard DNS: total queries, blocked count, average processing time
- AdGuard rewrites: full list of DNS rewrites
- Headscale: node list with online/offline status, IPs, last seen
- Cloudflare: full DNS record table (80 records) with name, type, content, proxied/DNS-only status. Summary badges show proxied vs DNS-only counts.
- Authentik SSO: users, active sessions, recent events
- Gitea: recent commits on homelab/main, open PRs
7. Logs (/logs)
Unified log viewer for all automation log files:
- List of available logs with file sizes
- Tail view (configurable line count, max 2000)
- Text search within log files
Available logs: stack-restart, backup, gmail-lz, gmail-dvish, proton, receipt, drift, digest, disk, changelog, subscription, pr-review
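As an illustration of the tail and search behavior, the following sketch clamps the requested line count to the 2000-line maximum and filters by a search term. The default of 200 lines, the case-insensitive match, and filtering before tailing are all assumptions; the real handler may differ.

```python
from collections import deque
from typing import Iterable, List, Optional

MAX_TAIL = 2000  # upper bound stated above


def tail_log(lines: Iterable[str], tail: int = 200, search: Optional[str] = None) -> List[str]:
    """Return the last `tail` lines, optionally only those containing `search`."""
    limit = max(1, min(tail, MAX_TAIL))  # clamp to 1..MAX_TAIL
    needle = search.lower() if search else None
    matching = (
        line.rstrip("\n")
        for line in lines
        if needle is None or needle in line.lower()
    )
    # deque(maxlen=...) keeps only the newest `limit` items in memory
    return list(deque(matching, maxlen=limit))
```

Because the function accepts any iterable of lines, it can be fed an open file handle directly, so a large log never has to be loaded in full.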
Features
AI Chat Widget
Bottom-left floating chat powered by Ollama (qwen3-coder). Enriches every query with:
- Live container counts, GPU status, email stats, Ollama status
- Contextual enrichment based on keywords (Headscale nodes for network questions, Jellyfin status for media questions, AdGuard stats for DNS questions)
- Repo doc search -- keyword-matches against docs/ and scripts/ markdown and Python files
- Responds with current data, not cached/stale answers
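The keyword-based enrichment could look roughly like this. Only the topic-to-source pairing (network to Headscale, media to Jellyfin, DNS to AdGuard) comes from the description above; the keyword lists and the pick_context_sources helper are hypothetical.

```python
import re
from typing import Dict, List

# Keyword lists are illustrative assumptions, not the dashboard's actual lists.
TOPIC_KEYWORDS: Dict[str, List[str]] = {
    "headscale": ["network", "vpn", "tailscale", "headscale"],
    "jellyfin": ["media", "movie", "stream", "jellyfin"],
    "adguard": ["dns", "blocked", "adguard"],
}


def pick_context_sources(query: str) -> List[str]:
    """Return the extra data sources worth fetching for this query."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    return [
        source
        for source, keywords in TOPIC_KEYWORDS.items()
        if tokens & set(keywords)
    ]
```

Selecting sources up front means a question about DNS does not pay for an SSH round trip to fetch Headscale nodes.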
Quick Actions
Buttons on the Dashboard page that trigger backend actions:
- Restart Jellyfin -- kubectl rollout restart on Olares
- Restart Ollama -- kubectl rollout restart on Olares
- Pause organizers -- stops all 3 email organizer cron jobs via gmail-organizer-ctl.sh stop
- Resume organizers -- starts all 3 via gmail-organizer-ctl.sh start
- Run backup -- triggers gmail-backup-daily.sh
Health Score
Scored 0-100 based on:
- Container health: -4 per non-running container (max -40)
- Unhealthy containers: -10 per unhealthy (max -20)
- GPU available: -10 if unavailable
- Ollama available: -10 if offline
- Backup status: -10 if errors, -5 if no log
- Config drift: -10 if drift detected
Grades: A (90+), B (80+), C (70+), D (60+), F (<60)
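The scoring rules above translate into a short function. This is a sketch of the arithmetic only: the parameter names and the backup status labels ("ok", "errors", "no_log") are assumptions, and the real endpoint also returns a detail breakdown.

```python
from typing import Tuple


def health_score(
    not_running: int,
    unhealthy: int,
    gpu_available: bool,
    ollama_online: bool,
    backup_status: str,  # "ok", "errors", or "no_log" (labels are assumptions)
    drift_detected: bool,
) -> Tuple[int, str]:
    """Apply the documented deductions and map the result to a letter grade."""
    score = 100
    score -= min(4 * not_running, 40)   # -4 per non-running container, capped at -40
    score -= min(10 * unhealthy, 20)    # -10 per unhealthy container, capped at -20
    if not gpu_available:
        score -= 10
    if not ollama_online:
        score -= 10
    score -= {"errors": 10, "no_log": 5}.get(backup_status, 0)
    if drift_detected:
        score -= 10
    score = max(score, 0)

    for threshold, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return score, grade
    return score, "F"
```

For example, three stopped containers, one unhealthy container, Ollama offline, and a missing backup log give 100 - 12 - 10 - 10 - 5 = 63, a D. The deductions sum to at most 100, so the score never goes below 0.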
Cmd+K Global Search
Press Cmd+K (or Ctrl+K) to open a command palette that searches across all pages, containers, services, and actions. Fuzzy matching with keyboard navigation.
Click-to-Copy
IP addresses, hostnames, and other copyable values show a copy icon on hover. Clicking copies to clipboard with a toast confirmation.
Loading Skeletons and Empty States
All data cards show animated skeleton placeholders while loading. When a service is unavailable or returns no data, a descriptive empty state is shown instead of a blank card.
Custom Favicon
Custom homelab favicon (dashboard/ui/app/favicon.ico).
Keyboard Shortcuts
| Key | Action |
|---|---|
| Cmd+K / Ctrl+K | Global search |
| 1 | Dashboard tab |
| 2 | Infrastructure tab |
| 3 | Media tab |
| 4 | Automations tab |
| 5 | Expenses tab |
| 6 | Network tab |
| 7 | Logs tab |
| r | Reload page |
Disabled when focus is in an input or textarea.
Other Features
- Auto-refresh countdown indicator
- Toast notifications for actions
- Mobile responsive
- Glassmorphism card styling with backdrop blur
- Sticky nav with accent gradient line
- Theme switcher in nav bar
API Endpoints
All endpoints are prefixed with /api/.
Overview Router (routers/overview.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/health | Health check (returns {"status": "ok"}) |
| GET | /api/stats/overview | Aggregate stats: container counts, GPU, emails today, unhealthy, Ollama |
| GET | /api/activity | SSE stream of today's automation events (init + update events) |
| GET | /api/calendar | Upcoming events from Baikal CalDAV |
| GET | /api/health-score | Health score 0-100 with grade and detail breakdown |
| GET | /api/automation-timeline | Last run times for all 11 automation scripts |
| GET | /api/disk-usage | Disk usage from Prometheus (top 20 by usage %) |
| POST | /api/chat | Chat with Ollama using live context + doc search |
| POST | /api/actions/restart-jellyfin | Restart Jellyfin on Olares via kubectl |
| POST | /api/actions/restart-ollama | Restart Ollama on Olares via kubectl |
| POST | /api/actions/pause-organizers | Pause all email organizer cron jobs |
| POST | /api/actions/resume-organizers | Resume all email organizer cron jobs |
| GET | /api/actions/organizer-status | Check organizer running/paused status |
| POST | /api/actions/run-backup | Trigger Gmail backup (up to 300s timeout) |
Containers Router (routers/containers.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/containers | List all containers, optional ?endpoint= filter |
| GET | /api/containers/{id}/logs | Container logs (requires ?endpoint= query param) |
| POST | /api/containers/{id}/restart | Restart container (requires ?endpoint= query param) |
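Since the logs and restart routes require the endpoint query parameter, a client helper can enforce it before any request is sent. The sketch below only builds the URL; container_url is a hypothetical helper, and abc123 is a placeholder container ID.

```python
from urllib.parse import quote, urlencode

# UI origin; /api/* is proxied to the backend by the Next.js rewrite
BASE_URL = "http://homelab.tail.vish.gg:3100"


def container_url(container_id: str, action: str, endpoint: str) -> str:
    """Build the URL for a container logs or restart call."""
    if action not in ("logs", "restart"):
        raise ValueError(f"unsupported action: {action!r}")
    if not endpoint:
        raise ValueError("endpoint is required, e.g. 'atlantis'")
    query = urlencode({"endpoint": endpoint})
    return f"{BASE_URL}/api/containers/{quote(container_id, safe='')}/{action}?{query}"


logs_url = container_url("abc123", "logs", "atlantis")
```

Remember that logs is a GET while restart is a POST, so the caller still has to pick the matching HTTP method.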
Media Router (routers/media.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/jellyfin/status | Jellyfin server info, libraries, active sessions |
| GET | /api/jellyfin/latest | Recently added items (last 10) |
| GET | /api/plex/status | Plex server status for Calypso + Atlantis |
| GET | /api/sonarr/queue | Sonarr download queue |
| GET | /api/sonarr/history | Recent Sonarr grabs/imports (last 10) |
| GET | /api/radarr/queue | Radarr download queue |
| GET | /api/radarr/history | Recent Radarr grabs/imports (last 10) |
| GET | /api/sabnzbd/queue | SABnzbd NZB download queue |
| GET | /api/prowlarr/stats | Prowlarr indexer status (total, enabled, list) |
| GET | /api/bazarr/status | Bazarr version, SignalR state, wanted counts |
| GET | /api/audiobookshelf/stats | Library stats (items per library, total) |
| GET | /api/tdarr/cluster | Tdarr cluster status: nodes, workers, progress, fps, space saved |
| GET | /api/deluge/status | Deluge torrent status (total, active, downloading, seeding) |
Automations Router (routers/automations.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/automations/email | Email organizer status for all 3 accounts |
| GET | /api/automations/restarts | Recent unhealthy container tracking entries |
| GET | /api/automations/backup | Today's backup log status and entries |
| GET | /api/automations/drift | Config drift detection last result |
Expenses Router (routers/expenses.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/expenses | List expenses, optional ?month=YYYY-MM filter |
| GET | /api/expenses/summary | Monthly total, count, top 10 vendors |
Olares Router (routers/olares.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/olares/pods | List K3s pods, optional ?namespace= filter |
| GET | /api/olares/gpu | GPU status from nvidia-smi via SSH |
Network Router (routers/network.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/network/headscale | Headscale node list with online status |
| GET | /api/network/adguard | AdGuard DNS stats (queries, blocked, avg time) |
| GET | /api/network/adguard/rewrites | AdGuard DNS rewrite list |
| GET | /api/network/cloudflare | Cloudflare DNS records with name, type, content, proxied status |
| GET | /api/network/authentik | Authentik users, sessions, recent events |
| GET | /api/network/gitea | Recent commits and open PRs |
Kuma Router (routers/kuma.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/kuma/monitors | All Uptime Kuma monitors with up/down status |
Logs Router (routers/logs.py)
| Method | Path | Description |
|---|---|---|
| GET | /api/logs | List available log files with sizes |
| GET | /api/logs/{name} | Get log contents, optional ?tail=N&search=term |
Themes
All 16 themes use glassmorphism (semi-transparent cards with backdrop blur) and CSS custom properties. Stored in dashboard/ui/lib/themes.ts.
| # | Theme | Style | Swatch |
|---|---|---|---|
| 1 | Midnight (default) | Dark blue-violet | #3b82f6 / #8b5cf6 |
| 2 | Light | Clean light mode | #2563eb / #e2e8f0 |
| 3 | Cyberpunk | Neon pink-cyan on dark | #ec4899 / #06b6d4 |
| 4 | Steampunk | Warm amber-copper on dark | #d4a76a / #b87333 |
| 5 | Portland | Forest green-teal | #15803d / #0e7490 |
| 6 | Racing | Red-zinc motorsport | #dc2626 / #a1a1aa |
| 7 | Ocean | Sky blue-teal depths | #0284c7 / #2dd4bf |
| 8 | Aurora | Green-violet northern lights | #4ade80 / #a78bfa |
| 9 | Sakura | Pink-rose cherry blossom | #f472b6 / #fda4af |
| 10 | Emerald | Deep emerald green | #34d399 / #059669 |
| 11 | Sunset | Orange-red warm tones | #f97316 / #dc2626 |
| 12 | Arctic | Ice blue-white frost | #38bdf8 / #e0f2fe |
| 13 | Crimson | Deep red on near-black | #ef4444 / #1a1a1a |
| 14 | Trinidad | Red-gold Caribbean | #ef4444 / #fbbf24 |
| 15 | Samurai | Red-white Japanese | #dc2626 / #fafafa |
| 16 | Supra | Orange on dark carbon | #f97316 / #18181b |
All themes are dark except Light (#2). The theme switcher is in the nav bar. Selection persists across sessions via localStorage.
How to Run
Prerequisites
- Python 3.12+ with pip
- Node.js 22+
- SSH access to olares, calypso, pi-5 (for GPU, Headscale, Kuma queries)
- Access to Portainer API (192.168.0.200:9443)
- Access to Prometheus (for disk usage queries)
Start API (development)
cd dashboard/api
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 18888 &
Start UI (development)
cd dashboard/ui
npm install
BACKEND_URL=http://localhost:18888 npm run dev -- -p 3100
Start UI (production)
cd dashboard/ui
BACKEND_URL=http://localhost:18888 npm run build
cp -r .next/static .next/standalone/.next/static
cp -r public .next/standalone/public
BACKEND_URL=http://localhost:18888 HOSTNAME=0.0.0.0 PORT=3100 node .next/standalone/server.js
Docker deployment
cd dashboard
docker compose up -d
Note: The docker-compose.yml uses network_mode: host and maps ports 8888 (API) and 3000 (UI). For production use, override the ports via environment variables or edit the compose file to use 18888/3100.
Backend Dependencies
From dashboard/api/requirements.txt:
fastapi==0.115.12
uvicorn[standard]==0.34.2
httpx==0.28.1
pyyaml>=6.0
sse-starlette==2.3.3
Plus the shared scripts/lib/ modules (mounted as a volume in Docker, or on the Python path when running directly).
Key Files
| Path | Purpose |
|---|---|
| dashboard/api/main.py | FastAPI app entry point, router registration |
| dashboard/api/lib_bridge.py | Bridges scripts/lib/ modules into the API |
| dashboard/api/log_parser.py | Parses automation logs into structured events for SSE |
| dashboard/api/routers/*.py | API route handlers (9 routers) |
| dashboard/ui/app/layout.tsx | Root layout with nav, theme provider, chat, shortcuts |
| dashboard/ui/app/page.tsx | Dashboard (overview) page |
| dashboard/ui/app/*/page.tsx | Tab pages (infrastructure, media, automations, expenses, network, logs) |
| dashboard/ui/components/ | 17 UI components + shadcn/ui primitives |
| dashboard/ui/lib/themes.ts | All 16 theme definitions |
| dashboard/ui/lib/api.ts | API client (fetchAPI, postAPI) |
| dashboard/ui/lib/use-sse.ts | SSE hook with auto-reconnect |
| dashboard/ui/next.config.ts | Next.js config with /api/* rewrite to backend |
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| No data loading | API not running on port 18888 | Start uvicorn: uvicorn main:app --host 0.0.0.0 --port 18888 |
| "Invalid Date" in UI | API returning unexpected date format | Check backend response format, look at the specific router |
| Text hard to read on a theme | CSS custom property issue | Check dashboard/ui/lib/themes.ts for the theme's vars block, verify --foreground and --muted-foreground contrast |
| SSE not connecting | /api/activity endpoint not responding | Check API is running, check Next.js rewrite in next.config.ts, check browser console for EventSource errors |
| Calendar empty | Baikal unreachable | Verify Baikal at http://192.168.0.200:12852 is running |
| GPU card shows unavailable | SSH to olares failing | Test ssh -o ConnectTimeout=3 olares nvidia-smi manually |
| Headscale shows empty | SSH to calypso failing | Test ssh calypso "sudo docker exec headscale headscale nodes list -o json" |
| Kuma monitors empty | SSH to pi-5 or sqlite3 query failing | Test ssh pi-5 "docker exec uptime-kuma sqlite3 /app/data/kuma.db 'SELECT COUNT(*) FROM monitor'" |
| Chat returns "Ollama is currently offline" | Ollama not running on Olares | Check Ollama pod: kubectl get pods -n ollamaserver-shared |
| Container logs failing | Wrong endpoint param | Ensure ?endpoint=atlantis (or other valid endpoint) is passed |
Tdarr Version Sync
All 5 Tdarr instances must run the same version. Images are pinned by SHA digest, not :latest tag.
| Host | Role | Hardware | Deployment |
|---|---|---|---|
| Atlantis | Server + Node | CPU (Xeon) | hosts/synology/atlantis/arr-suite/docker-compose.yml |
| Calypso | Node | CPU (Ryzen R1600) | hosts/synology/calypso/tdarr-node/docker-compose.yaml |
| Guava | Node | VAAPI (Radeon 760M) | hosts/truenas/guava/tdarr-node/docker-compose.yaml |
| PVE LXC 103 | Node | QSV (Intel) | hosts/proxmox/lxc/tdarr-node/docker-compose.yaml |
| Olares | Node | NVENC (RTX 5090) | olares/tdarr-node.yaml (K8s manifest) |
Olares Node (fastest)
- RTX 5090 with NVENC: h264_nvenc, hevc_nvenc, av1_nvenc all working
- Deployed as K8s Deployment in tdarr-node namespace on Olares
- HAMI bypass: runtimeClassName: nvidia, no nvidia.com/gpu resource requests
- NFS mounts: /mnt/atlantis_media (media, read-only) + /mnt/atlantis_cache (cache, read-write)
- Calico GlobalNetworkPolicy allow-lan-to-tdarr for LAN ingress + all egress
- Recommended workers: GPU=2, CPU=0, Health check=1
- Custom Olares chart available at olares/tdarr-node-chart.tgz
Auto-update prevention
- All Docker nodes have com.centurylinklabs.watchtower.enable=false label
- PVE LXC cron (/etc/cron.d/tdarr-update) was removed
- Guava Watchtower label flipped from true to false
- Olares node uses pinned digest in K8s manifest
To update all nodes
- Get new digest: check ghcr.io/haveagitgat/tdarr and tdarr_node latest digest
- Update all 5 files (4 compose + 1 K8s manifest) with the new digest
- Push to git -- Atlantis/Calypso auto-deploy via GitOps
- Manually redeploy Guava: ssh guava "cd /mnt/data/tdarr-node && sudo docker compose pull && sudo docker compose up -d"
- Manually redeploy PVE: ssh pve "pct exec 103 -- bash -c 'cd /opt/tdarr && docker compose pull && docker compose up -d'"
- Redeploy Olares: ssh olares "kubectl apply -f -" < olares/tdarr-node.yaml
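Before pushing, it can help to confirm that every file pins the same digest per image. The sketch below assumes image references use the plain image@sha256:... form (no tag before the digest); collect_digests and mismatched are hypothetical helpers that take file contents already read into memory.

```python
import re
from typing import Dict, Set

# Matches references such as ghcr.io/haveagitgat/tdarr_node@sha256:<64 hex chars>
PINNED = re.compile(r"ghcr\.io/haveagitgat/(tdarr(?:_node)?)@(sha256:[0-9a-f]{64})")


def collect_digests(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map image name -> set of digests found across all file contents."""
    found: Dict[str, Set[str]] = {}
    for text in files.values():
        for image, digest in PINNED.findall(text):
            found.setdefault(image, set()).add(digest)
    return found


def mismatched(files: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return only the images that are pinned to more than one digest."""
    return {image: digests for image, digests in collect_digests(files).items() if len(digests) > 1}
```

To use it, read the four compose files and the K8s manifest into a dict keyed by path and pass that to mismatched; an empty result means every file agrees for each image.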