Matrix SSL + Authentik + Portainer OAuth Incidents — 2026-03-19/21
Issues Addressed
1. mx.vish.gg "Not Secure" Warning
Symptom: Browser showed "Not Secure" on https://mx.vish.gg.
Root cause: NPM was serving the Cloudflare Origin Certificate (cert ID 1, *.vish.gg) for mx.vish.gg. Cloudflare Origin certs are only trusted by Cloudflare's edge — since mx.vish.gg is unproxied (required for Matrix federation), browsers hit the origin directly and don't trust the cert.
Fix:
- Got a proper Let's Encrypt cert for mx.vish.gg via Cloudflare DNS challenge on matrix-ubuntu:
    sudo certbot certonly --dns-cloudflare \
      --dns-cloudflare-credentials /etc/cloudflare.ini \
      -d mx.vish.gg --email your-email@example.com --agree-tos
- Copied cert to NPM as npm-6:
    /volume1/docker/nginx-proxy-manager/data/custom_ssl/npm-6/fullchain.pem
    /volume1/docker/nginx-proxy-manager/data/custom_ssl/npm-6/privkey.pem
- Updated NPM proxy host 10 (mx.vish.gg) to use cert ID 6
- Set up renewal hook: /etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh
Same fix applied for: livekit.mx.vish.gg (cert npm-7, proxy host 47)
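The trust problem described above can be checked from any client with a handshake against the system's public CA store: a Cloudflare Origin cert on an unproxied host should fail verification, and the Let's Encrypt cert should pass. A minimal sketch, assuming Python 3.9+ on the checking machine; the function name is illustrative and not part of the original setup.

```python
import socket
import ssl


def trusted_by_system_store(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bool, str]:
    """Attempt a TLS handshake using the default (public CA) trust store."""
    context = ssl.create_default_context()  # verifies both chain and hostname
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return True, f"trusted, expires {cert['notAfter']}"
    except ssl.SSLCertVerificationError as exc:
        # Raised when the chain does not lead to a CA the client trusts,
        # which is what a browser reports as "Not Secure".
        return False, f"not trusted: {exc.verify_message}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
```

Calling `trusted_by_system_store("mx.vish.gg")` and `trusted_by_system_store("livekit.mx.vish.gg")` from a machine outside the LAN gives a quick yes/no without opening a browser.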
2. kuma.vish.gg Redirect Loop (ERR_TOO_MANY_REDIRECTS)
Symptom: kuma.vish.gg (Uptime Kuma) caused infinite redirect loop via Authentik Forward Auth.
Root cause (two issues):
Issue A — Missing X-Original-URL header:
The Authentik outpost returned 500 for Forward Auth requests because NPM wasn't passing the X-Original-URL header. The outpost log showed:
failed to detect a forward URL from nginx
Fix: Added to NPM advanced config for kuma.vish.gg (proxy host 41):
auth_request /outpost.goauthentik.io/auth/nginx;
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
Issue B — Empty cookie_domain on all Forward Auth providers:
After login, Authentik couldn't set the session cookie correctly because cookie_domain was empty on all proxy providers. This caused the auth loop to continue even after successful authentication.
Fix: Set cookie_domain: vish.gg on all proxy providers via Authentik API:
| PK | Provider | Was | Now |
|---|---|---|---|
| 4 | Paperless Forward Auth | '' | vish.gg |
| 5 | vish.gg Domain Forward Auth | vish.gg | ✅ already set |
| 8 | Scrutiny Forward Auth | '' | vish.gg |
| 12 | Uptime Kuma Forward Auth | '' | vish.gg |
| 13 | Ollama Forward Auth | '' | vish.gg |
| 14 | Wizarr Forward Auth | '' | vish.gg |
AK_TOKEN="..."
for pk in 4 8 12 13 14; do
  PROVIDER=$(curl -s "https://sso.vish.gg/api/v3/providers/proxy/$pk/" -H "Authorization: Bearer $AK_TOKEN")
  UPDATED=$(echo "$PROVIDER" | python3 -c "import sys,json; d=json.load(sys.stdin); d['cookie_domain']='vish.gg'; print(json.dumps(d))")
  curl -s -X PUT "https://sso.vish.gg/api/v3/providers/proxy/$pk/" \
    -H "Authorization: Bearer $AK_TOKEN" -H "Content-Type: application/json" -d "$UPDATED"
done
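The same change can be expressed as a partial update that sends only the field being changed, instead of fetching and re-sending the whole provider object. This is a sketch under an assumption: that the Authentik instance accepts PATCH on the same endpoint the loop above uses with PUT. The function names are illustrative.

```python
import json
import urllib.request


def build_cookie_domain_request(base_url: str, pk: int, token: str,
                                cookie_domain: str = "vish.gg") -> urllib.request.Request:
    """Build (but do not send) a PATCH that sets only cookie_domain."""
    body = json.dumps({"cookie_domain": cookie_domain}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v3/providers/proxy/{pk}/",
        data=body,
        method="PATCH",  # assumption: partial updates are enabled on this endpoint
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Building the request separately from sending it makes it possible to print and review each request before anything is changed on the server.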
3. TURN Server External Verification
coturn was verified working externally from Seattle VPS (different network):
| Test | Result |
|---|---|
| UDP port 3479 reachable | ✅ |
| STUN Binding request | ✅ 0x0101 success, returns 184.23.52.14:3479 |
| TURN Allocate (auth required) | ✅ 0x0113 (401) — server responds, relay functional |
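The STUN row in the table can be reproduced with a few lines of code. A Binding request is a 20-byte header (type 0x0001, zero length, the fixed magic cookie 0x2112A442, and a 12-byte transaction ID), and a healthy server answers with type 0x0101. The sketch below is illustrative and is not the tool used for the original test; the function names are assumptions.

```python
import os
import socket
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442   # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a 20-byte STUN Binding request carrying no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: type (2 bytes), attribute length (2 bytes, zero here), magic cookie (4 bytes)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def message_type(packet: bytes) -> int:
    """Extract the 16-bit message type from a STUN packet."""
    if len(packet) < 20:
        raise ValueError("a STUN header is 20 bytes")
    return struct.unpack("!H", packet[:2])[0]


def probe(host: str, port: int = 3479, timeout: float = 3.0) -> int:
    """Send one Binding request over UDP and return the response type."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(), (host, port))
        data, _ = sock.recvfrom(2048)
    return message_type(data)
```

Running `probe(...)` against the server's public address from an outside network should return `BINDING_SUCCESS`; a timeout suggests the UDP port is filtered somewhere on the path.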
Config: /etc/turnserver.conf on matrix-ubuntu
- listening-port=3479
- use-auth-secret
- static-auth-secret= same as turn_shared_secret in Synapse homeserver.yaml
- realm=matrix.thevish.io
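With use-auth-secret, coturn does not store per-user passwords. Clients present a time-limited username and a password derived from it with HMAC-SHA1 over the shared secret, which is why the secret must match on both sides. A minimal sketch of that derivation, useful for generating test credentials for an authenticated Allocate; the user ID shown is hypothetical.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def turn_credentials(shared_secret: str, user_id: str, ttl_seconds: int = 3600,
                     now: Optional[float] = None) -> tuple[str, str]:
    """Return (username, password) for a TURN server running with use-auth-secret."""
    current = time.time() if now is None else now
    expiry = int(current + ttl_seconds)
    username = f"{expiry}:{user_id}"  # expiry timestamp, then an arbitrary user label
    digest = hmac.new(shared_secret.encode("utf-8"),
                      username.encode("utf-8"),
                      hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Hypothetical values; use the real static-auth-secret when testing.
    print(turn_credentials("replace-with-shared-secret", "@probe:example.org"))
```

If credentials generated this way are rejected while the server is reachable, the most likely cause is a mismatch between static-auth-secret and turn_shared_secret.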
NPM Certificate Reference
| Cert ID | Nice Name | Domain | Type | Expires | Notes |
|---|---|---|---|---|---|
| 1 | Cloudflare Origin - vish.gg | *.vish.gg, vish.gg | Cloudflare Origin | 2041 | Only trusted by CF edge; don't use for unproxied domains |
| 2 | Cloudflare Origin - thevish.io | *.thevish.io | Cloudflare Origin | 2026 | Same caveat |
| 3 | Cloudflare Origin - crista.love | *.crista.love | Cloudflare Origin | 2026 | Same caveat |
| 4 | git.vish.gg (LE) | git.vish.gg | Let's Encrypt | 2026-05 | |
| 5 | headscale.vish.gg (LE) | headscale.vish.gg | Let's Encrypt | 2026-06 | |
| 6 | mx.vish.gg (LE) | mx.vish.gg | Let's Encrypt | 2026-06 | Added 2026-03-19 |
| 7 | livekit.mx.vish.gg (LE) | livekit.mx.vish.gg | Let's Encrypt | 2026-06 | Added 2026-03-19 |
Rule: Any domain that is unproxied in Cloudflare (DNS-only, orange cloud off) must use a real Let's Encrypt cert, not the Cloudflare Origin cert.
Renewal Automation
Certs 6 and 7 are issued by certbot on matrix-ubuntu and auto-renewed via systemd timer. Deploy hooks copy renewed certs to NPM on Calypso:
/etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh
To manually renew and deploy:
ssh matrix-ubuntu
# certbot renew selects a certificate with --cert-name, not -d
sudo certbot renew --force-renewal --cert-name mx.vish.gg
# hook runs automatically and copies to NPM
4. Portainer OAuth Hanging (2026-03-21)
Symptom: Clicking "Sign in with SSO" on https://pt.vish.gg would redirect to Authentik, authenticate successfully, but then hang on https://pt.vish.gg/?code=...&state=...#!/auth.
Root causes (three layered issues):
A — NPM migrated to matrix-ubuntu (missed in session context)
NPM was migrated from Calypso to matrix-ubuntu (192.168.0.154) on 2026-03-20. All cert and proxy operations needed to target the new NPM instance.
B — AdGuard wildcard DNS *.vish.gg → 100.85.21.51 (matrix-ubuntu Tailscale IP)
The Calypso AdGuard had a wildcard rewrite *.vish.gg → 100.85.21.51 (matrix-ubuntu's Tailscale IP) intended for LAN clients. This caused:
- pt.vish.gg → 100.85.21.51: Portainer OAuth redirect went to matrix-ubuntu instead of Atlantis
- sso.vish.gg → 100.85.21.51: Portainer's token exchange request to Authentik timed out
- git.vish.gg → 100.85.21.51: Portainer GitOps stack polling timed out
Fix: Added specific overrides before the wildcard in AdGuard (/opt/adguardhome/conf/AdGuardHome.yaml):
- domain: pt.vish.gg
  answer: 192.168.0.154 # NPM on matrix-ubuntu (proxies to Atlantis:10000)
  enabled: true
- domain: sso.vish.gg
  answer: 192.168.0.154 # NPM on matrix-ubuntu (proxies to Authentik)
  enabled: true
- domain: git.vish.gg
  answer: 192.168.0.154 # NPM on matrix-ubuntu (proxies to Gitea)
  enabled: true
- domain: '*.vish.gg'
  answer: 100.85.21.51 # wildcard: matrix-ubuntu for everything else
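The effect of these entries can be sanity-checked with a small model of the lookup: an exact entry wins over a wildcard, and anything else under the domain falls through to the wildcard. This is a simplified sketch of the precedence the fix relies on, not AdGuard Home's actual matching code; the rewrite list mirrors the YAML above.

```python
from typing import Optional

REWRITES = [
    ("pt.vish.gg", "192.168.0.154"),
    ("sso.vish.gg", "192.168.0.154"),
    ("git.vish.gg", "192.168.0.154"),
    ("*.vish.gg", "100.85.21.51"),
]


def resolve(name: str, rewrites=REWRITES) -> Optional[str]:
    """Return the rewrite answer for a name, preferring exact entries over wildcards."""
    name = name.lower().rstrip(".")
    best_wildcard = None
    for pattern, answer in rewrites:
        if pattern == name:
            return answer  # an exact entry always wins
        # "*.vish.gg" matches any name ending in ".vish.gg", but not "vish.gg" itself
        if pattern.startswith("*.") and name.endswith(pattern[1:]):
            if best_wildcard is None or len(pattern) > len(best_wildcard[0]):
                best_wildcard = (pattern, answer)
    return best_wildcard[1] if best_wildcard else None
```

Here `resolve("pt.vish.gg")` gives the LAN address of NPM, while a host without an override, such as `resolve("kuma.vish.gg")`, still gets the Tailscale IP from the wildcard.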
C — Cloudflare Origin certs not trusted by Synology/Atlantis
Even with correct DNS, Atlantis couldn't verify the Cloudflare Origin cert on sso.vish.gg and pt.vish.gg since they're unproxied (DNS-only in Cloudflare).
Fix: Issued Let's Encrypt certs for each domain via Cloudflare DNS challenge on matrix-ubuntu:
| Domain | NPM cert ID | Expires |
|---|---|---|
| sso.vish.gg | npm-8 | 2026-06 |
| pt.vish.gg | npm-11 | 2026-06 |
All certs auto-renew via certbot on matrix-ubuntu with deploy hook at:
/etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh
The hook copies renewed certs to /opt/npm/data/custom_ssl/npm-N/ and reloads nginx.
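The copy step of such a hook can be sketched as follows. Certbot exports RENEWED_LINEAGE (the live directory of the renewed cert) to deploy hooks. The mapping from lineage name to NPM directory is taken from the tables in this document and assumes each lineage is named after its domain. The real hook is a shell script whose contents are not shown here, so this is an illustration, not a copy of it.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Lineage name -> NPM custom cert directory (assumed mapping, from this document's tables)
NPM_DIRS = {
    "mx.vish.gg": "npm-6",
    "livekit.mx.vish.gg": "npm-7",
    "sso.vish.gg": "npm-8",
    "pt.vish.gg": "npm-11",
}


def deploy(lineage: Path, npm_root: Path) -> Optional[Path]:
    """Copy fullchain.pem and privkey.pem for one renewed lineage into NPM's directory."""
    target_name = NPM_DIRS.get(lineage.name)
    if target_name is None:
        return None  # not a certificate that NPM serves
    target = npm_root / target_name
    target.mkdir(parents=True, exist_ok=True)
    for filename in ("fullchain.pem", "privkey.pem"):
        # copyfile follows symlinks, so certbot's live/ links resolve to real file contents
        shutil.copyfile(lineage / filename, target / filename)
    return target


if __name__ == "__main__":
    renewed = os.environ.get("RENEWED_LINEAGE")
    if renewed:
        deploy(Path(renewed), Path("/opt/npm/data/custom_ssl"))
        # Reloading nginx is omitted: how NPM is run is not stated in this document.
```

Keeping the mapping in one table means that adding a new cert to NPM only requires one new entry rather than a new hook.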
Current NPM cert reference (matrix-ubuntu)
| Cert ID | Domain | Type | Expires |
|---|---|---|---|
| npm-1 | *.vish.gg (CF Origin) | Cloudflare Origin | 2041 |
| npm-2 | *.thevish.io (CF Origin) | Cloudflare Origin | 2026 |
| npm-3 | *.crista.love (CF Origin) | Cloudflare Origin | 2026 |
| npm-6 | mx.vish.gg | Let's Encrypt | 2026-06 |
| npm-7 | livekit.mx.vish.gg | Let's Encrypt | 2026-06 |
| npm-8 | sso.vish.gg | Let's Encrypt | 2026-06 |
| npm-9 | *.thevish.io | Let's Encrypt | 2026-06 |
| npm-10 | *.crista.love | Let's Encrypt | 2026-06 |
| npm-11 | pt.vish.gg | Let's Encrypt | 2026-06 |
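Because the Let's Encrypt entries above all expire in the same month, a small helper that turns a certificate's notAfter value into days remaining can make a failed renewal visible early. A minimal sketch; the dates in the usage note are hypothetical, since the table records only the expiry month.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until expiry; not_after uses the format of getpeercert()['notAfter']."""
    expires = ssl.cert_time_to_seconds(not_after)  # e.g. "Jun 17 00:00:00 2026 GMT"
    current = time.time() if now is None else now
    return int((expires - current) // 86400)
```

For example, `days_until_expiry("Jun 17 00:00:00 2026 GMT")` returns a negative number once that date has passed, which is a convenient condition to alert on.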
Rule: Any unproxied domain accessed by internal services (Portainer, Synology) needs a real LE cert.
Last updated: 2026-03-21