28 KiB
28 KiB
2026-04-16
Infrastructure
- Deploy GL-MT3600BE (Beryl 7) as primary gateway replacing GL-MT3000 — Headscale node ID:28, exit node enabled, subnet route 192.168.12.0/24, watchdog cron, SSH key auth
- Jellyfish now on Beryl 7 LAN (192.168.12.181), moon confirmed at 192.168.12.223
Features
- Enhanced HTML email template for all automated emails — color-coded headers per script, status indicators, keyword highlighting (ERROR/WARNING/OK), monospace formatting for plain-text reports
- All automated emails (digest, backup, disk, drift, stack, receipt, subscription) now file directly into Proton Bridge "Digests" IMAP folder via APPEND instead of SMTP (no more Inbox clutter or mis-categorization by email organizer)
- Email digest now uses lib/notify.py shared email infrastructure with bar chart visualizations per category
- Refactored email-digest.py to use shared lib/notify.py instead of its own SMTP/IMAP code
Fixes
- Fix NFS stale mount on atlantis_archive caused by Tailscale table 52 routing (LAN traffic routed through WireGuard tunnel instead of physical NIC)
- Use Folders/Digests IMAP path for Proton Bridge compatibility (top-level folders silently ignored)
- Add Date header to emails for Proton Bridge IMAP APPEND RFC 5322 compliance
2026-04-13
Features
- Switch to qwen3:32b-fast (thinking disabled) for faster response times (13s vs 206s)
- Migrate all qwen3-coder references to qwen3:32b across services and documentation
- Switch all Ollama usage from qwen3-coder (30B MoE) to qwen3:32b (dense) for better reasoning
- Add CLAUDE.md with deployment, config, networking, LLM, and Olares guidelines
Fixes
- Remove watchtower from guava tdarr-node, switch hoarder to ollama
- Standardize AnythingLLM and Perplexica to qwen3-coder:latest to avoid VRAM swap
- Revert standardization to qwen3-coder:latest to avoid VRAM swap cycles
- Switch ollama lib to /api/chat endpoint with 4000 token budget for qwen3:32b thinking mode
- Revert back to qwen3:32b (with thinking) as -fast variant breaks tool calling
- Rename client.spotify.vish.gg to spotify-client.vish.gg for wildcard SSL routing
- Add dedup check (date+vendor+amount) and skip $0 entries in receipt-tracker
Documentation
- Document qwen3-coder to qwen3:32b migration with verification results
- Update fluxer deployment docs to reflect current unified server architecture
- Add YourSpotify DNS migration documentation
- Session changelog including YourSpotify migration, Portainer auth issue, Tailscale/arr fixes
2026-04-06
Features
-
Dashboard
- Added temperature widget and enriched Plex Now Playing
- Added Tdarr cluster widget with live worker progress, fps, node status, and stats
- Added color legend to automations page, fixed timeline schedule thresholds
- Added health score, Kuma monitors, disk usage, recently added media, Cloudflare/Authentik/Gitea, quick actions, automation timeline
- Added Kuma monitors, health score, disk usage, Cloudflare, Authentik, Gitea, media history, quick actions, automation timeline APIs
- Added color-coded text throughout - hosts, categories, statuses, amounts, server names
- Overhauled media page with Plex integration, automations redesign, larger fonts
- Added Baikal calendar card with upcoming events
- Added AI chat with live Headscale/Jellyfin/AdGuard data, keyword aliases, repo doc search (max 2K tokens), smart chat with live homelab context, quick prompts
- Richer activity feed with more event types and better formatting
- Port mockup styling - colored stats, glowing dots, gradient bg, ring gauge
- Glassmorphism redesign inspired by dashdot
- Added network/logs pages, chat widget, toasts, keyboard shortcuts, sparklines, responsive design
- Added Prowlarr, Bazarr, Audiobookshelf, Deluge to media page
- Added 8 themes — midnight, light, cyberpunk, steampunk, portland, racing, ocean, aurora
- Added Exo 2 custom font for the entire dashboard
- Added loading skeletons, empty states, favicon, global search (Cmd+K), click-to-copy
- Visual flair effects (sparkles, card glow, gradient text) + 4 new themes (crimson, trinidad, samurai, supra)
- Layout overhaul + more glass transparency + 4 new themes (sakura, emerald, sunset, arctic)
- True frosted glass effect with visible background gradients bleeding through semi-transparent cards
- Complete Next.js frontend with all 5 pages and components
- Implemented all backend API routes
- Project scaffolding with FastAPI + Docker Compose
-
Tdarr
- Deployed Tdarr node on Olares with RTX 5090 GPU transcoding
- Updated all 4 instances to v2.67.01 (same digest)
- Pinned all tdarr images to v2.66.01 digest, disabled auto-updates
-
Automations
- Added 11 Ollama-powered homelab automations
- Added AI-powered PR reviewer using Ollama
- Added daily email organizer digest
- Enhanced Gmail organizer with 10 improvements
Fixes
-
Dashboard
- Accurate disk usage — filter NFS/CIFS, aggregate ZFS pools, deduplicate
- Container logs modal uses correct API URL format (query param not path)
- Timeline interval lookup order - specific keys before generic (digest before email)
- Timeline falls back to file mtime when log has no timestamps
- Health score only penalizes crashed containers, not cleanly stopped ones
- Replace all escaped unicode across entire codebase with plain text symbols
- Replace escaped unicode symbols with plain text in command search and nav
- Headscale protobuf timestamp conversion, expenses defaults to current month
- Headscale uses snake_case field names (ip_addresses, given_name, last_seen)
- AuthentikStats type to match API (created not timestamp, optional users)
- TypeScript error in authentik user rendering
- Headscale uses sudo docker, authentik shows users + filters noise events
- Make ollama chat opaque so text is readable over glass background
- Replace broken unicode emoji icons with clean text badges in quick actions
- Remove card inner glow pseudo-elements
- Remove choppy border-glow animation, use smooth transition-only hover
- Solid opaque theme dropdown, polished nav with gradient accent line
- Force white text in dark mode with CSS specificity overrides
- Restore .dark CSS defaults so text is visible before JS hydration
- Solid dark card backgrounds for all themes - no more invisible cards
- Remove CSS defaults that override theme vars, fix nav-bg, add missing shadcn vars
- Major contrast boost - semi-opaque dark cards, brighter text across all themes
- Boost card opacity, text brightness, nav contrast across all themes
- Align network/logs pages with actual API response shapes
- Align frontend with API, enhance UI visuals
- Align frontend types with actual API field names
- Compact nav tabs to prevent scrollbar overflow
- Jellyfin API auth via query param (nested SSH quoting fix)
- Wrap MCP tool functions with @_safe to prevent server crashes
-
Tdarr
- Remove read-only flag on media mount — Tdarr needs write access to replace transcoded files
Infrastructure
-
MCP
- Optimized MCP server + added Jellyfin/Olares/Ollama tools
- Rate-limit Ollama calls and cap receipt-tracker to prevent overload
-
Notifications
- Switched all notifications from ntfy to email (admin@thevish.io)
Documentation
-
Dashboard
- Comprehensive session documentation — dashboard, Tdarr Olares, automations, MCP enhancements
- Add Tdarr Olares node documentation with GPU transcoding details
- Add Tdarr version sync documentation
- Update dashboard docs with expanded Cloudflare DNS record table
- Add Fenrus font customization notes for later
- Comprehensive dashboard documentation with all endpoints, themes, and setup instructions
- Add homelab dashboard implementation plan
- Add homelab dashboard design spec
-
Tdarr
- Add Tdarr Olares node documentation with GPU transcoding details
-
Automations
- Add comprehensive README for all automation scripts
- Add Jellyfin on Olares, Plex chart, update Olares docs
- Add AdGuard DNS mesh rollout, switch Headscale to Tailscale IPs
- Add iperf3 benchmarks for all hosts against Calypso
- Add staggered speedtest results for all 10 nodes
- Add GL.iNet router fixes, speedtest results, iperf3 benchmarks
- Document Calypso 5-minute Tailscale disconnect root cause and fix
- Update LAN routing fix for all hosts, add Tailscale mesh test
- Add DERP connectivity diagnosis and fix script
- Update NetBox with MAC addresses for all reachable nodes
-
Miscellaneous
- Tighten backup-validator LLM prompt to stop hallucinating concerns
Changelog
2026-03-27
Security
- crowdsec: Deployed CrowdSec intrusion detection + prevention on matrix-ubuntu, co-located with NPM. Engine parses all 36 NPM proxy host logs + host syslog. Firewall bouncer (nftables) blocks banned IPs at the network layer — avoids nginx
auth_requestconflicts with Authentik SSO. Kuma monitor added (ID 121,/healthendpoint). Prometheus metrics on:6060.
Monitoring
- grafana dashboards: Complete overhaul — 6 dashboards auto-provisioned from bind-mounted JSON files (
/home/homelab/docker/grafana-dashboards/). Removed 900+ lines of embedded dashboard JSON from monitoring.yaml. Pinned Prometheus datasource UID (cfbskvs8upds0b). - grafana new dashboards: Added Synology NAS Monitoring (SNMP disk temps/status, CPU, memory, volumes, network for Atlantis + Calypso), TrueNAS Guava Monitoring (CPU, RAM, ZFS pools, disk I/O), Tailscale Bandwidth (per-host TX/RX rates).
- grafana fixes: Fixed Infrastructure Overview + old Synology dashboard empty datasource UIDs. Fixed
$jobvariableallValue(was empty string, now.*). Cleaned up duplicate provisionedsynology-dashboard-v2ghost dashboard (required Grafana volume wipe). Setillo (DS223j) now showing in Synology dashboard after restarting stopped exporters. - kuma: Added Setillo Node Exporter (ID 122) and SNMP Exporter (ID 123) monitors under Setillo group.
- frigate: Tested Frigate NVR on Seattle with Tapo camera (192.168.68.67) via Tailscale subnet routing. CPU detection working, go2rtc restreaming confirmed. Removed after validation — docs saved for future permanent deployment.
- tailscale: Enabled
--accept-routes=trueon Seattle to allow access to NUC's192.168.68.0/22subnet. NUC route was already advertised and approved in Headscale. - tdarr: Synced all nodes to v2.66.01 (server was 2.65.01, Calypso node was 2.64.02). Redeployed arr-stack on Atlantis, tdarr-node on Calypso, Guava, PVE LXC. Expanded PVE LXC disk 16GB→32GB (was 100% full), pruned 2.86GB old images.
Fixes
- immich (calypso): Fixed Immich-SERVER crash (
getaddrinfo ENOTFOUND database). Portainer git deploy does not loadenv_filereferences — all env vars (DB_HOSTNAME, DB_PASSWORD, etc.) added as Portainer stack environment overrides via API. - kuma: Fixed broken monitor list caused by malformed
accepted_statuscodes_jsonfield ([200-299]→["200-299"]) in CrowdSec monitor entry. Fixed CrowdSec health check URL from/v1/heartbeat(requires auth, returns 401) to/health(unauthenticated, returns 200).
Infrastructure
- setillo: Configured
vishuser for docker access — added towheelgroup (NOPASSWD sudo), added/usr/local/binto PATH via.profile. Docker (Synology ContainerManager) now accessible without full path or root login. - matrix-ubuntu: VM resized — 16GB RAM (was ~8GB), 1TB disk (was smaller). LV extended online from 97GB to 1005GB via
growpart+pvresize+lvextend -r. Now 893GB free (8% used). - mcp: Added
seattleas SSH host alias in homelab MCP server (alongside existingseattle-tailscale). - photoprism (jellyfish): Started PhotoPrism container on jellyfish (
/srv/nas/ametrine/Docker/photoprism/, port 2342).
Container Inventory (2026-03-27)
| Host | Running | Stopped | Total |
|---|---|---|---|
| Atlantis | 59 | 0 | 59 |
| Calypso | 62 | 0 | 62 |
| Homelab-VM | 37 | 1 | 38 |
| Concord NUC | 22 | 0 | 22 |
| Matrix-Ubuntu | 12 | 0 | 12 |
| Guava | 28 | 6 | 34 |
| Seattle | 19 | 1 | 20 |
| RPi5 | 7 | 0 | 7 |
| Jellyfish | 1 | 1 | 2 |
| Total | 247 | 9 | 256 |
2026-03-25
Infrastructure
- portainer: Updated server 2.39.0 → 2.39.1 LTS on atlantis. Updated edge agents to 2.39.1 on all 4 endpoints (homelab-vm, calypso, nuc, rpi5).
- portainer stacks: Fixed stale git credentials across atlantis and calypso. Cleaned up orphan Docker Compose projects (containers created outside Portainer with mismatched project labels) on atlantis, calypso, and homelab-vm.
- netbox: Migrated from standalone
docker composeto Portainer GitOps stack (ID 738) on homelab-vm. - semaphore: Removed — replaced by CLI + cron + MCP workflow. Compose archived.
Features
- AGENTS.md: Overhauled Vesper agent identity — structured priorities, multi-host task guidance, failure handling, context budget, known footguns, tailscale mesh runbook.
- MCP tools: Added 5 Authentik SSO tools —
create_proxy_provider,create_application,list_sessions,delete_session,get_events. Service onboarding is now 2 MCP calls. - email backup: Daily incremental backup of 3 email accounts (dvish92, lzbellina92, admin@thevish.io) to atlantis NFS mount at
/volume1/archive/old_emails/. IMAP auto-reconnect on Gmail throttling. Cron at 3 AM.
Fixes
- NFS mount: Fixed atlantis
/volume1/archiveNFS export — removed krb5i (no Kerberos configured), added LAN routing rule to bypass Tailscale for 192.168.0.0/24. - ansible inventory: Commented out offline hosts (pi-5-kevin, moon) to prevent exit code 4 on every playbook run.
- image update docs: Added step-by-step walkthrough, orphan container gotcha, and git auth troubleshooting.
- moon jellyfish mount: Added
noserverinoto CIFS mount — fixed "folder contents cannot be displayed" error in GUI file manager. - moon guava backup: NFS mount from atlantis (
100.83.230.112:/volume1/archive/guava_full_backup→/home/moon/guava_backup_atlantis), read-only over Tailscale. Added100.64.0.6to atlantis NFS export, persisted in fstab. - olares investigation: Documented Olares internal Headscale/Tailscale architecture — runs its own coordination server inside k3s for reverse proxy tunneling. Cannot be replaced with external Headscale without breaking
*.olares.comremote access.
Stable Diffusion Forge (shinku-ryuu)
- Forge WebUI: Installed Stable Diffusion WebUI Forge on shinku-ryuu (RTX 4080, 16GB VRAM, i7-14700K, 96GB RAM). Conda env with Python 3.10, SDXL Base 1.0 model. Access at
http://100.98.93.15:7860orhttp://localhost:7860. Launcher:C:\stable-diffusion-webui-forge\run-forge.bat. - Guava Gitea: Increased avatar max file size from 1MB to 10MB in
/etc/gitea/app.ini.
Git Migration
- playgrounds → Guava Gitea: Migrated 35 git repos from moon (
~/Documents/playgrounds/) to Guava Gitea (http://guava.crista.home:30008) under thelulupearluser. Sources: 8 bitbucket, 26 gitlab, 1 lulupearl_gitea. All repos private, commit history preserved. Cloned all 34 repos to homelab-vm at/home/homelab/organized/repos/.
Tailscale Mesh Verification
- Verified full 30-path mesh across 6 SSH-accessible hosts. All direct connections. Setillo uses DERP initially but hole-punches to direct (~55ms WAN latency). Documented Synology-specific tailscale CLI paths and
pinglimitations.
[Unreleased] (2026-02-27)
Bug Fixes
-
credentials: Restored all credentials broken by sanitization commit
037d766a- Affected stacks: authentik-sso, paperless, wireguard (calypso+nuc), monitoring, dyndns (atlantis+nuc), watchtower, yourspotify, paperless-ai, alerting
- Root cause: sanitization commit replaced real values with
REDACTED_PASSWORDplaceholders across 14+ compose files; containers redeployed with broken env vars - Fix: recovered original values from git history (
037d766a^) and pushed as commits50d8eca8and4e5607b7; all 11 affected stacks redeployed via API
-
portainer: Updated
portainer-homelabsaved Git credential with new Gitea token- Previously expired token caused all 43 stacks using
credId:1to fail git pulls - Fixed via
PUT /api/users/1/gitcredentials/1
- Previously expired token caused all 43 stacks using
-
portainer-api-guide: Corrected authentication docs —
ptr_*tokens requireX-API-Keyheader, notAuthorization: Bearer; updated version 2.33.7 → 2.39.0
[Unreleased] (2025-02-12)
Features
- arr-suite: Implement Trash Guides language configuration for Radarr and Sonarr
- Added 4 custom formats: Language Not English (-10000), Anime Dual Audio (+500), Multi (+500), Language Not Original (0)
- Updated quality profiles to prioritize English content while allowing foreign films in original language
- Enhanced anime support with dual audio preference
- Enables proper handling of foreign films like "Cold War" in Polish
- Documentation:
docs/services/arr-suite-language-configuration.md
0.10.3 (2026-02-07)
Bug Fixes
0.10.2 (2026-01-25)
Bug Fixes
0.10.1 (2026-01-25)
Bug Fixes
0.10.0 (2026-01-25)
Features
- allow kicking members from voice channels (#495) (0dc5442)
- repository architecture for files crate w. added tests (#498) (01ded20)
Bug Fixes
0.9.4 (2026-01-10)
Bug Fixes
- checkout repo. before bumping lock (#490) (b2da2a8)
- persist credentials for git repo (#492) (c674a9f)
0.9.3 (2026-01-10)
Bug Fixes
0.9.2 (2026-01-10)
Bug Fixes
0.9.1 (2026-01-10)
Bug Fixes
0.9.0 (2026-01-10)
Features
- add id field to role (#470) (2afea56)
- add ratelimits to gifbox (1542047)
- include groups and dms in fetch mutuals (caa8607)
- include member payload in REDACTED_APP_PASSWORD event (480f210)
- initial work on tenor gif searching (b0c977b)
- make message lexer use unowned string (1561481)
- ready payload field customisation (db57706)
- require auth for search (b5cd5e3)
- trending and categories routes (5885e06)
- voice chats v2 (#414) (d567155)
Bug Fixes
- add license to revolt-parser (5335124)
- allow for disabling default features (65fbd36)
- apple music to use original url instead of metadata url (bfe4018)
- apply uname fix to january and autumn (8f9015a)
- ci: publish images under stoatchat and remove docker hub (d65c1a1)
- correct miniz_oxide in lockfile (#478) (5d27a91)
- correct shebang for try-tag-and-release (050ba16)
- correct string_cache in lockfile (#479) (0b178fc)
- don't remove timeouts when a member leaves a server (#409) (e635bc2)
- don't update the same field while trying to remove it (f4ee35f), closes #392
- github webhook incorrect payload and formatting (#468) (dc9c82a)
- implement Serialize to ClientMessage (dea0f67)
- newly created roles should be ranked the lowest (947eb15)
- permit empty
removearray in edit requests (6ad3da5) - preserve order of replies in message (#447) (657a3f0)
- prevent timing out members which have TimeoutMembers permission (e36fc97)
- relax settings name regex (3a34159)
- remove authentication tag bytes from attachment download (32e6600)
- rename openapi operation ids (6048587), closes #406
- respond with 201 if no body in requests (#465) (24fedf8)
- swap to using reqwest for query building (38dd4d1)
- use
trust_cloudflareconfig value instead of env var (cc7a796) - use our own result types instead of tenors types (a92152d)
2026-04-07 — Session: Infrastructure Fixes
YourSpotify DNS Migration
- Renamed
client.spotify.vish.gg→spotify-client.vish.gg(wildcard SSL cert compatibility) - Created NPM proxy hosts:
spotify-client.vish.gg→ NUC:4000,spotify.vish.gg→ NUC:15000 - Updated Cloudflare DNS to proxied through matrix-ubuntu
- Removed from NUC DYNDNS updater
- Updated compose env vars and Spotify Developer Dashboard redirect URI
Portainer Git Auth Issue
- 71 of 94 GitOps stacks lost git credentials (likely from Portainer upgrade)
- Stacks continue running but cannot pull/redeploy until credentials are re-entered
- Credentials: Username=
vish, Token=Gitea service account token - Fix: Re-enter via Portainer UI → Stack → Editor → Repository → Authentication
- 22 stacks already have working auth
Tailscale Fixes
- NUC (vish-concord-nuc): Tailscale daemon was stale, restarted with
sudo systemctl restart tailscaled - Now active with direct connection to all home nodes
Arr Suite Recovery
- 6 containers stuck in "Created" state after Portainer pull-and-redeploy on Atlantis
- Affected: audiobookshelf, deluge, prowlarr, radarr, sonarr, tdarr
- Fixed by restarting each container via Portainer API
- Plex also restarted (had exited during arr-suite redeploy)
Kuma Monitor Fix
- Plex monitor (ID 60) under Setillo group was pointing to wrong IP (Atlantis instead of Setillo)
- Updated to correct IP: 100.125.0.20:32400
Jitsi Network Conflict
- Orphaned Docker network
jitsi-stack_meet.jitsi(172.30.0.0/16) conflicted with newturn_net(172.30.0.0/24) - Removed orphaned network, redeploy requires git auth re-entry
2026-04-07 — LLM Model Migration: qwen3-coder → qwen3:32b
Why
- qwen3-coder (30B MoE, 3.3B active params) caused OpenCode to drift, plan instead of act, and stall after context compaction
- qwen3:32b (dense 32B, all params active every token) provides dramatically better instruction following and reasoning
What changed
- All 13 config/compose files updated from qwen3-coder to qwen3:32b
- All documentation updated (AGENTS.md, CLAUDE.md, 8 service docs, scripts/README.md)
- OpenCode config: new model + "always respond with results" instruction + step limits reduced (50→20)
- OpenCode config: full host inventory, SSH aliases, service URLs added to instructions
- Perplexica Docker volume config updated
- AnythingLLM and Reactive Resume stacks redeployed on Portainer
- VRAM usage: 22.9/24.5 GB (similar to qwen3-coder)
Verified working
- Ollama direct, Dashboard AI chat, Perplexica, AnythingLLM, Reactive Resume, Gmail organizers (3 accounts), MCP server