# Homelab Dashboard

**Full-stack infrastructure monitoring and management UI**

## Service Overview

| Property | Value |
|----------|-------|
| **Host** | homelab-vm (192.168.0.210) |
| **UI Port** | 3100 |
| **API Port** | 18888 |
| **URL** | http://homelab.tail.vish.gg:3100 |
| **Tech** | Next.js 16 (standalone) + FastAPI + shadcn/ui + Tailwind CSS |
| **Font** | Exo 2 (bundled, weights 300-700) |
| **Themes** | 16 glassmorphism themes with localStorage persistence |
| **AI Chat** | Ollama-powered with live infrastructure data + repo doc search |
| **Source** | `dashboard/` directory in homelab repo |

## Architecture

The dashboard is two separate processes:

1. **FastAPI backend** (`dashboard/api/`, port 18888) -- aggregates data from Portainer, Prometheus, Ollama, SQLite databases, log files, and external APIs (Jellyfin, Sonarr, Radarr, etc.)
2. **Next.js frontend** (`dashboard/ui/`, port 3100) -- SPA with SSE for real-time updates and SWR polling for periodic data refresh

### Data flow

```
Browser --> Next.js (port 3100) --rewrites /api/*--> FastAPI (port 18888)
                                                              |
                                              +---------------+---------------+
                                              |               |               |
                                          Portainer      Prometheus      SSH(olares)
                                        (5 endpoints)     (PromQL)  (kubectl, nvidia-smi)
                                              |               |               |
                                         SQLite DBs       Log files     External APIs
                                      (email, restarts) (/tmp/*.log) (Sonarr, Radarr, etc.)
```

- The backend reuses `scripts/lib/` modules (`portainer.py`, `ollama.py`, `prometheus.py`) via `lib_bridge.py`
- The frontend uses Next.js rewrites (`next.config.ts`) to proxy `/api/*` to the backend, so the browser never contacts port 18888 directly
- Real-time activity feed uses Server-Sent Events (SSE) with automatic reconnection (5s backoff)
- SWR polling with configurable intervals handles periodic data refresh
- Theme selection stored in `localStorage` under key `homelab-theme`

## Pages (7 Tabs)

### 1. Dashboard (`/`)

Overview page with aggregate stats:

- **Stat cards**: total containers, emails processed today, unhealthy containers, Ollama status
- **Activity feed**: real-time SSE stream of parsed log events (email classifications, container restarts, backup results, drift detection, etc.)
- **Calendar**: upcoming events from Baikal CalDAV (192.168.0.200:12852)
- **Jellyfin card**: server status, active sessions, libraries
- **Ollama/GPU card**: GPU temp, VRAM usage, utilization %, power draw
- **Host row**: container counts per endpoint (Atlantis, Calypso, NUC, homelab, RPi5)
- **Quick actions**: restart Jellyfin, restart Ollama, pause/resume email organizers, run backup
- **Disk usage**: top disks by usage % from Prometheus
- **Health score**: 0-100 gauge with letter grade (A-F), computed from container health, GPU availability, Ollama status, backup status, and config drift

### 2. Infrastructure (`/infrastructure`)

- Container table with search, filter by endpoint, state badges
- Container log viewer modal
- Container restart button
- Olares K3s pod listing (all namespaces or filtered)
- GPU status card (nvidia-smi via SSH)
- Uptime Kuma monitors with up/down counts
- Disk usage from Prometheus

### 3. Media (`/media`)

- **Jellyfin**: now playing sessions, recently added items
- **Plex**: server status for Calypso and Atlantis instances, active sessions
- **Tdarr Cluster**: live worker progress bars with fps/ETA, node hardware info, error count, total space saved, files processed
- **Sonarr**: download queue, recent grab/import history
- **Radarr**: download queue, recent grab/import history
- **SABnzbd**: NZB download queue
- **Deluge**: torrent client status (active, downloading, seeding counts)
- **Prowlarr**: indexer stats (total, enabled, indexer list)
- **Bazarr**: subtitle status, SignalR connection state, wanted episodes/movies
- **Audiobookshelf**: library stats (audiobooks, ebooks, podcasts)

### 4. Automations (`/automations`)

- **Email organizer stats**: per-account (Gmail lz, Gmail dvish, Proton) with today's category breakdown, sender cache size
- **Backup status**: today's backup log entries, error detection
- **Config drift**: last drift check result, drift count
- **Stack restarts**: unhealthy container tracking entries from `stack-restart.db`
- **Automation timeline**: last run time for all 11 automation scripts

### 5. Expenses (`/expenses`)

- Monthly summary (total, count, month selector)
- Transactions table from `data/expenses.csv`
- Top 10 vendors by spend amount

### 6. Network (`/network`)

- **AdGuard DNS**: total queries, blocked count, average processing time
- **AdGuard rewrites**: full list of DNS rewrites
- **Headscale**: node list with online/offline status, IPs, last seen
- **Cloudflare**: full DNS record table (80 records) with name, type, content, proxied/DNS-only status. Summary badges show proxied vs DNS-only counts.
- **Authentik SSO**: users, active sessions, recent events
- **Gitea**: recent commits on homelab/main, open PRs

### 7. Logs (`/logs`)

Unified log viewer for all automation log files:

- List of available logs with file sizes
- Tail view (configurable line count, max 2000)
- Text search within log files

Available logs: `stack-restart`, `backup`, `gmail-lz`, `gmail-dvish`, `proton`, `receipt`, `drift`, `digest`, `disk`, `changelog`, `subscription`, `pr-review`

## Features

### AI Chat Widget

Bottom-left floating chat powered by Ollama (`qwen3:32b`).
Enriches every query with:

- Live container counts, GPU status, email stats, Ollama status
- Contextual enrichment based on keywords (Headscale nodes for network questions, Jellyfin status for media questions, AdGuard stats for DNS questions)
- Repo doc search -- keyword-matches against `docs/` and `scripts/` markdown and Python files
- Responds with current data, not cached/stale answers

### Quick Actions

Buttons on the Dashboard page that trigger backend actions:

- **Restart Jellyfin** -- `kubectl rollout restart` on Olares
- **Restart Ollama** -- `kubectl rollout restart` on Olares
- **Pause organizers** -- stops all 3 email organizer cron jobs via `gmail-organizer-ctl.sh stop`
- **Resume organizers** -- starts all 3 via `gmail-organizer-ctl.sh start`
- **Run backup** -- triggers `gmail-backup-daily.sh`

### Health Score

Scored 0-100 based on:

- Container health: -4 per non-running container (max -40)
- Unhealthy containers: -10 per unhealthy (max -20)
- GPU available: -10 if unavailable
- Ollama available: -10 if offline
- Backup status: -10 if errors, -5 if no log
- Config drift: -10 if drift detected

Grades: A (90+), B (80+), C (70+), D (60+), F (<60)

### Cmd+K Global Search

Press `Cmd+K` (or `Ctrl+K`) to open a command palette that searches across all pages, containers, services, and actions. Fuzzy matching with keyboard navigation.

### Click-to-Copy

IP addresses, hostnames, and other copyable values show a copy icon on hover. Clicking copies to clipboard with a toast confirmation.

### Loading Skeletons and Empty States

All data cards show animated skeleton placeholders while loading. When a service is unavailable or returns no data, a descriptive empty state is shown instead of a blank card.

### Custom Favicon

Custom homelab favicon (`dashboard/ui/app/favicon.ico`).
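Returning to the Health Score rules listed earlier in this section, the deductions and grade bands can be sketched in a few lines of Python. This is an illustration only, not the backend's actual code: the `Snapshot` dataclass, its field names, the `health_score` and `grade` helpers, and the starting value of 100 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    """Hypothetical input shape; the real backend's data model is not documented here."""
    non_running: int        # containers not in the "running" state
    unhealthy: int          # containers reporting an unhealthy health check
    gpu_available: bool
    ollama_online: bool
    backup: Optional[str]   # "ok", "errors", or None when no backup log exists
    drift_detected: bool


def health_score(s: Snapshot) -> int:
    """Apply the documented deductions to an assumed starting score of 100."""
    score = 100
    score -= min(4 * s.non_running, 40)   # -4 per non-running container, capped at -40
    score -= min(10 * s.unhealthy, 20)    # -10 per unhealthy container, capped at -20
    if not s.gpu_available:
        score -= 10
    if not s.ollama_online:
        score -= 10
    if s.backup == "errors":
        score -= 10
    elif s.backup is None:                # no backup log found
        score -= 5
    if s.drift_detected:
        score -= 10
    return max(score, 0)


def grade(score: int) -> str:
    """Map a 0-100 score to the documented letter grades."""
    for threshold, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= threshold:
            return letter
    return "F"
```

For instance, three stopped containers, one unhealthy container, Ollama offline, and a missing backup log give 100 - 12 - 10 - 10 - 5 = 63, which falls in the D band. The caps add up to exactly 100 points of possible deductions, so the score cannot go below 0 under these rules.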
### Keyboard Shortcuts

| Key | Action |
|-----|--------|
| `Cmd+K` / `Ctrl+K` | Global search |
| `1` | Dashboard tab |
| `2` | Infrastructure tab |
| `3` | Media tab |
| `4` | Automations tab |
| `5` | Expenses tab |
| `6` | Network tab |
| `7` | Logs tab |
| `r` | Reload page |

Disabled when focus is in an input or textarea.

### Other Features

- Auto-refresh countdown indicator
- Toast notifications for actions
- Mobile responsive
- Glassmorphism card styling with backdrop blur
- Sticky nav with accent gradient line
- Theme switcher in nav bar

## API Endpoints

All endpoints are prefixed with `/api/`.

### Overview Router (`routers/overview.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/health` | Health check (returns `{"status": "ok"}`) |
| GET | `/api/stats/overview` | Aggregate stats: container counts, GPU, emails today, unhealthy, Ollama |
| GET | `/api/activity` | SSE stream of today's automation events (init + update events) |
| GET | `/api/calendar` | Upcoming events from Baikal CalDAV |
| GET | `/api/health-score` | Health score 0-100 with grade and detail breakdown |
| GET | `/api/automation-timeline` | Last run times for all 11 automation scripts |
| GET | `/api/disk-usage` | Disk usage from Prometheus (top 20 by usage %) |
| POST | `/api/chat` | Chat with Ollama using live context + doc search |
| POST | `/api/actions/restart-jellyfin` | Restart Jellyfin on Olares via kubectl |
| POST | `/api/actions/restart-ollama` | Restart Ollama on Olares via kubectl |
| POST | `/api/actions/pause-organizers` | Pause all email organizer cron jobs |
| POST | `/api/actions/resume-organizers` | Resume all email organizer cron jobs |
| GET | `/api/actions/organizer-status` | Check organizer running/paused status |
| POST | `/api/actions/run-backup` | Trigger Gmail backup (up to 300s timeout) |

### Containers Router (`routers/containers.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/containers` | List all containers, optional `?endpoint=` filter |
| GET | `/api/containers/{id}/logs` | Container logs (requires `?endpoint=` query param) |
| POST | `/api/containers/{id}/restart` | Restart container (requires `?endpoint=` query param) |

### Media Router (`routers/media.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/jellyfin/status` | Jellyfin server info, libraries, active sessions |
| GET | `/api/jellyfin/latest` | Recently added items (last 10) |
| GET | `/api/plex/status` | Plex server status for Calypso + Atlantis |
| GET | `/api/sonarr/queue` | Sonarr download queue |
| GET | `/api/sonarr/history` | Recent Sonarr grabs/imports (last 10) |
| GET | `/api/radarr/queue` | Radarr download queue |
| GET | `/api/radarr/history` | Recent Radarr grabs/imports (last 10) |
| GET | `/api/sabnzbd/queue` | SABnzbd NZB download queue |
| GET | `/api/prowlarr/stats` | Prowlarr indexer status (total, enabled, list) |
| GET | `/api/bazarr/status` | Bazarr version, SignalR state, wanted counts |
| GET | `/api/audiobookshelf/stats` | Library stats (items per library, total) |
| GET | `/api/tdarr/cluster` | Tdarr cluster status: nodes, workers, progress, fps, space saved |
| GET | `/api/deluge/status` | Deluge torrent status (total, active, downloading, seeding) |

### Automations Router (`routers/automations.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/automations/email` | Email organizer status for all 3 accounts |
| GET | `/api/automations/restarts` | Recent unhealthy container tracking entries |
| GET | `/api/automations/backup` | Today's backup log status and entries |
| GET | `/api/automations/drift` | Config drift detection last result |

### Expenses Router (`routers/expenses.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/expenses` | List expenses, optional `?month=YYYY-MM` filter |
| GET | `/api/expenses/summary` | Monthly total, count, top 10 vendors |

### Olares Router (`routers/olares.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/olares/pods` | List K3s pods, optional `?namespace=` filter |
| GET | `/api/olares/gpu` | GPU status from nvidia-smi via SSH |

### Network Router (`routers/network.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/network/headscale` | Headscale node list with online status |
| GET | `/api/network/adguard` | AdGuard DNS stats (queries, blocked, avg time) |
| GET | `/api/network/adguard/rewrites` | AdGuard DNS rewrite list |
| GET | `/api/network/cloudflare` | Cloudflare DNS records with name, type, content, proxied status |
| GET | `/api/network/authentik` | Authentik users, sessions, recent events |
| GET | `/api/network/gitea` | Recent commits and open PRs |

### Kuma Router (`routers/kuma.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/kuma/monitors` | All Uptime Kuma monitors with up/down status |

### Logs Router (`routers/logs.py`)

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/logs` | List available log files with sizes |
| GET | `/api/logs/{name}` | Get log contents, optional `?tail=N&search=term` |

## Themes

All 16 themes use glassmorphism (semi-transparent cards with backdrop blur) and CSS custom properties. Stored in `dashboard/ui/lib/themes.ts`.

| # | Theme | Style | Swatch |
|---|-------|-------|--------|
| 1 | **Midnight** (default) | Dark blue-violet | `#3b82f6` / `#8b5cf6` |
| 2 | **Light** | Clean light mode | `#2563eb` / `#e2e8f0` |
| 3 | **Cyberpunk** | Neon pink-cyan on dark | `#ec4899` / `#06b6d4` |
| 4 | **Steampunk** | Warm amber-copper on dark | `#d4a76a` / `#b87333` |
| 5 | **Portland** | Forest green-teal | `#15803d` / `#0e7490` |
| 6 | **Racing** | Red-zinc motorsport | `#dc2626` / `#a1a1aa` |
| 7 | **Ocean** | Sky blue-teal depths | `#0284c7` / `#2dd4bf` |
| 8 | **Aurora** | Green-violet northern lights | `#4ade80` / `#a78bfa` |
| 9 | **Sakura** | Pink-rose cherry blossom | `#f472b6` / `#fda4af` |
| 10 | **Emerald** | Deep emerald green | `#34d399` / `#059669` |
| 11 | **Sunset** | Orange-red warm tones | `#f97316` / `#dc2626` |
| 12 | **Arctic** | Ice blue-white frost | `#38bdf8` / `#e0f2fe` |
| 13 | **Crimson** | Deep red on near-black | `#ef4444` / `#1a1a1a` |
| 14 | **Trinidad** | Red-gold Caribbean | `#ef4444` / `#fbbf24` |
| 15 | **Samurai** | Red-white Japanese | `#dc2626` / `#fafafa` |
| 16 | **Supra** | Orange on dark carbon | `#f97316` / `#18181b` |

All themes are dark except Light (#2). The theme switcher is in the nav bar. Selection persists across sessions via `localStorage`.

## How to Run

### Prerequisites

- Python 3.12+ with pip
- Node.js 22+
- SSH access to `olares`, `calypso`, `pi-5` (for GPU, Headscale, Kuma queries)
- Access to Portainer API (192.168.0.200:9443)
- Access to Prometheus (for disk usage queries)

### Start API (development)

```bash
cd dashboard/api
pip install -r requirements.txt
# Run in the background, listening on all interfaces
uvicorn main:app --host 0.0.0.0 --port 18888 &
```

### Start API (production, homelab-vm)

Use the launcher. It sources `dashboard/api/.env` (which holds `BAIKAL_PASS` and other secrets) before handing off to uvicorn:

```bash
# Detach from the shell and send all output to a log file
nohup /home/homelab/organized/repos/homelab/dashboard/api/start.sh \
  > /tmp/dashboard-api.log 2>&1 &
disown
```

`dashboard/api/.env` is gitignored and must be created manually with (at least):

- `BAIKAL_PASS="REDACTED_PASSWORD"` -- Baikal user password for the calendar widget. Without this, the `/api/calendar` endpoint returns an empty list with `error: "BAIKAL_PASS not set"`.
- `BAIKAL_URL`, `BAIKAL_USER` -- optional overrides; defaults point at `vish/default` on Atlantis LAN.

To restart: `pkill -f 'uvicorn main:app'` then re-run the launcher.

### Start UI (development)

```bash
cd dashboard/ui
npm install
BACKEND_URL=http://localhost:18888 npm run dev -- -p 3100
```

### Start UI (production)

```bash
cd dashboard/ui
BACKEND_URL=http://localhost:18888 npm run build
# The standalone output does not include static assets; copy them next to server.js
cp -r .next/static .next/standalone/.next/static
cp -r public .next/standalone/public
BACKEND_URL=http://localhost:18888 HOSTNAME=0.0.0.0 PORT=3100 node .next/standalone/server.js
```

### Docker deployment

```bash
cd dashboard
docker compose up -d
```

Note: The `docker-compose.yml` uses `network_mode: host` and maps ports 8888 (API) and 3000 (UI). For production use, override the ports via environment variables or edit the compose file to use 18888/3100.
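Once the API has been started by any of the methods in this section, a quick way to confirm it is reachable is to request `/api/health`, which is documented to return `{"status": "ok"}`. The following Python sketch uses only the standard library. `is_ok` and `api_is_up` are hypothetical names chosen for this example, and the default base URL assumes the API is listening locally on port 18888.

```python
import http.client
import json
import urllib.request


def is_ok(body: bytes) -> bool:
    """Return True when the body is exactly the JSON object {"status": "ok"}."""
    try:
        return json.loads(body) == {"status": "ok"}
    except ValueError:  # covers invalid JSON and undecodable bytes
        return False


def api_is_up(base_url: str = "http://localhost:18888", timeout: float = 3.0) -> bool:
    """Request /api/health and report whether the backend answered as documented."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
            return resp.status == 200 and is_ok(resp.read())
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, or a malformed response
        return False


if __name__ == "__main__":
    print("API reachable" if api_is_up() else "API not reachable on port 18888")
```

Pointing `base_url` at `http://localhost:3100` instead would exercise the Next.js `/api/*` rewrite as well, assuming the UI is running.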

## Backend Dependencies

From `dashboard/api/requirements.txt`:

- `fastapi==0.115.12`
- `uvicorn[standard]==0.34.2`
- `httpx==0.28.1`
- `pyyaml>=6.0`
- `sse-starlette==2.3.3`

Plus the shared `scripts/lib/` modules (mounted as a volume in Docker, or on the Python path when running directly).

## Key Files

| Path | Purpose |
|------|---------|
| `dashboard/api/main.py` | FastAPI app entry point, router registration |
| `dashboard/api/lib_bridge.py` | Bridges `scripts/lib/` modules into the API |
| `dashboard/api/log_parser.py` | Parses automation logs into structured events for SSE |
| `dashboard/api/routers/*.py` | API route handlers (9 routers) |
| `dashboard/ui/app/layout.tsx` | Root layout with nav, theme provider, chat, shortcuts |
| `dashboard/ui/app/page.tsx` | Dashboard (overview) page |
| `dashboard/ui/app/*/page.tsx` | Tab pages (infrastructure, media, automations, expenses, network, logs) |
| `dashboard/ui/components/` | 17 UI components + shadcn/ui primitives |
| `dashboard/ui/lib/themes.ts` | All 16 theme definitions |
| `dashboard/ui/lib/api.ts` | API client (fetchAPI, postAPI) |
| `dashboard/ui/lib/use-sse.ts` | SSE hook with auto-reconnect |
| `dashboard/ui/next.config.ts` | Next.js config with `/api/*` rewrite to backend |

## Troubleshooting

| Problem | Cause | Fix |
|---------|-------|-----|
| No data loading | API not running on port 18888 | Start uvicorn: `uvicorn main:app --host 0.0.0.0 --port 18888` |
| "Invalid Date" in UI | API returning unexpected date format | Check backend response format, look at the specific router |
| Text hard to read on a theme | CSS custom property issue | Check `dashboard/ui/lib/themes.ts` for the theme's `vars` block, verify `--foreground` and `--muted-foreground` contrast |
| SSE not connecting | `/api/activity` endpoint not responding | Check API is running, check Next.js rewrite in `next.config.ts`, check browser console for EventSource errors |
| Calendar empty | Baikal unreachable | Verify Baikal at `http://192.168.0.200:12852` is running |
| GPU card shows unavailable | SSH to olares failing | Test `ssh -o ConnectTimeout=3 olares nvidia-smi` manually |
| Headscale shows empty | SSH to calypso failing | Test `ssh calypso "sudo docker exec headscale headscale nodes list -o json"` |
| Kuma monitors empty | SSH to pi-5 or sqlite3 query failing | Test `ssh pi-5 "docker exec uptime-kuma sqlite3 /app/data/kuma.db 'SELECT COUNT(*) FROM monitor'"` |
| Chat returns "Ollama is currently offline" | Ollama not running on Olares | Check Ollama pod: `kubectl get pods -n ollamaserver-shared` |
| Container logs failing | Wrong endpoint param | Ensure `?endpoint=atlantis` (or other valid endpoint) is passed |

## Tdarr Version Sync

All 5 Tdarr instances must run the same version. Images are pinned by SHA digest, not `:latest` tag.

| Host | Role | Hardware | Deployment |
|------|------|----------|------------|
| Atlantis | Server + Node | CPU (Xeon) | `hosts/synology/atlantis/arr-suite/docker-compose.yml` |
| Calypso | Node | CPU (Ryzen R1600) | `hosts/synology/calypso/tdarr-node/docker-compose.yaml` |
| Guava | Node | VAAPI (Radeon 760M) | `hosts/truenas/guava/tdarr-node/docker-compose.yaml` |
| PVE LXC 103 | Node | QSV (Intel) | `hosts/proxmox/lxc/tdarr-node/docker-compose.yaml` |
| **Olares** | **Node** | **NVENC (RTX 5090)** | `olares/tdarr-node.yaml` (K8s manifest) |

### Olares Node (fastest)

- RTX 5090 with NVENC: h264_nvenc, hevc_nvenc, av1_nvenc all working
- Deployed as K8s Deployment in `tdarr-node` namespace on Olares
- HAMI bypass: `runtimeClassName: nvidia`, no `nvidia.com/gpu` resource requests
- NFS mounts: `/mnt/atlantis_media` (media, read-only) + `/mnt/atlantis_cache` (cache, read-write)
- Calico GlobalNetworkPolicy `allow-lan-to-tdarr` for LAN ingress + all egress
- Recommended workers: GPU=2, CPU=0, Health check=1
- Custom Olares chart available at `olares/tdarr-node-chart.tgz`

### Auto-update prevention

- All Docker nodes have `com.centurylinklabs.watchtower.enable=false` label
- PVE LXC cron (`/etc/cron.d/tdarr-update`) was removed
- Guava Watchtower label flipped from `true` to `false`
- Olares node uses pinned digest in K8s manifest

### To update all nodes

1. Get new digest: check `ghcr.io/haveagitgat/tdarr` and `tdarr_node` latest digest
2. Update all 5 files (4 compose + 1 K8s manifest) with the new digest
3. Push to git -- Atlantis/Calypso auto-deploy via GitOps
4. Manually redeploy Guava: `ssh guava "cd /mnt/data/tdarr-node && sudo docker compose pull && sudo docker compose up -d"`
5. Manually redeploy PVE: `ssh pve "pct exec 103 -- bash -c 'cd /opt/tdarr && docker compose pull && docker compose up -d'"`
6. Redeploy Olares: `ssh olares "kubectl apply -f -" < olares/tdarr-node.yaml`

## YourSpotify DNS Migration

Migrated `client.spotify.vish.gg` → `spotify-client.vish.gg` to work with the `*.vish.gg` wildcard SSL cert (wildcard only covers one subdomain level).

| Domain | Purpose | Route |
|--------|---------|-------|
| `spotify-client.vish.gg` | Frontend (web UI) | NPM → NUC 100.72.55.21:4000 |
| `spotify.vish.gg` | Backend (API) | NPM → NUC 100.72.55.21:15000 |

Both proxied through Cloudflare → matrix-ubuntu NPM → NUC (Tailscale IP). Removed from NUC DYNDNS updater (no longer needs direct public IP).

**Note**: Spotify Developer Dashboard redirect URI must be updated to `https://spotify-client.vish.gg/callback`.
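The one-level limit of the wildcard certificate mentioned in this section can be illustrated with a short Python sketch. `wildcard_matches` is a hypothetical helper written for this example, not part of the homelab repo, and it is deliberately simplified: it only handles a `*` that stands for the whole left-most label, which is the form `*.vish.gg` takes.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as "*.vish.gg" covers hostname.

    The wildcard stands for exactly one left-most DNS label, so the pattern
    and the hostname must contain the same number of labels.
    """
    p_labels = pattern.lower().split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    if p_labels[0] == "*":
        # Only the first label may differ; everything to the right must match exactly
        return p_labels[1:] == h_labels[1:]
    return p_labels == h_labels


# The old frontend name has two labels in front of vish.gg, so the cert does not cover it
print(wildcard_matches("*.vish.gg", "client.spotify.vish.gg"))   # False
# The new frontend name and the backend name each have one label in front of vish.gg
print(wildcard_matches("*.vish.gg", "spotify-client.vish.gg"))   # True
print(wildcard_matches("*.vish.gg", "spotify.vish.gg"))          # True
```

Flattening the name to a single label is what lets both the frontend and the backend share the existing `*.vish.gg` certificate.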