# Matrix SSL + Authentik Proxy Incident — 2026-03-19
---
## Issues Addressed
### 1. mx.vish.gg "Not Secure" Warning
**Symptom:** Browser showed "Not Secure" on `https://mx.vish.gg`.
**Root cause:** NPM was serving the **Cloudflare Origin Certificate** (cert ID 1, `*.vish.gg`) for `mx.vish.gg`. Cloudflare Origin certs are only trusted by Cloudflare's edge — since `mx.vish.gg` is **unproxied** (required for Matrix federation), browsers hit the origin directly and don't trust the cert.
**Fix:**
1. Issued a Let's Encrypt cert for `mx.vish.gg` on matrix-ubuntu using certbot's Cloudflare DNS challenge:
```bash
sudo certbot certonly --dns-cloudflare \
--dns-cloudflare-credentials /etc/cloudflare.ini \
-d mx.vish.gg --email your-email@example.com --agree-tos
```
2. Copied `fullchain.pem` and `privkey.pem` into NPM's custom SSL directory on Calypso as `npm-6`:
```
/volume1/docker/nginx-proxy-manager/data/custom_ssl/npm-6/fullchain.pem
/volume1/docker/nginx-proxy-manager/data/custom_ssl/npm-6/privkey.pem
```
3. Updated NPM proxy host 10 (`mx.vish.gg`) to use cert ID 6
4. Added a certbot deploy hook that copies renewed certs to NPM: `/etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh`
**Same fix applied for:** `livekit.mx.vish.gg` (cert `npm-7`, proxy host 47)
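The hook's contents are not recorded here. A minimal sketch of what `copy-to-npm.sh` could look like, assuming matrix-ubuntu reaches Calypso over SSH with key auth under the host alias `calypso`, and that the NPM container is named `nginx-proxy-manager` (both hypothetical). Certbot exports `RENEWED_LINEAGE` (the cert's `live/` directory) to deploy hooks; the lineage-to-`npm-N` mapping mirrors cert IDs 6 and 7.

```bash
#!/usr/bin/env bash
# Sketch of /etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh
# ASSUMPTIONS (hypothetical, not taken from the real hook):
#   - matrix-ubuntu can SSH to Calypso as "$NPM_HOST" with key auth
#   - the NPM container on Calypso is named "nginx-proxy-manager"
set -euo pipefail

NPM_HOST="${NPM_HOST:-calypso}"        # hypothetical SSH target
NPM_SSL_DIR="/volume1/docker/nginx-proxy-manager/data/custom_ssl"
NPM_CONTAINER="nginx-proxy-manager"    # hypothetical container name

# Map a certbot lineage name to its NPM custom cert directory.
npm_dir_for() {
  case "$1" in
    mx.vish.gg)         echo "npm-6" ;;
    livekit.mx.vish.gg) echo "npm-7" ;;
    *)                  return 1 ;;
  esac
}

deploy() {
  local name dir
  name="$(basename "$RENEWED_LINEAGE")"
  if ! dir="$(npm_dir_for "$name")"; then
    echo "copy-to-npm: no NPM mapping for $name, skipping" >&2
    return 0
  fi
  # scp follows the live/ symlinks and copies the actual PEM contents
  scp -q "$RENEWED_LINEAGE/fullchain.pem" "$RENEWED_LINEAGE/privkey.pem" \
    "$NPM_HOST:$NPM_SSL_DIR/$dir/"
  # Reload nginx inside NPM so it serves the new files
  ssh "$NPM_HOST" "docker exec $NPM_CONTAINER nginx -s reload"
}

# certbot sets RENEWED_LINEAGE for deploy hooks; do nothing when it is unset
if [[ -n "${RENEWED_LINEAGE:-}" ]]; then
  deploy
fi
```

Certbot runs every executable file in `renewal-hooks/deploy/`, so the script needs `chmod +x`. On Synology, `docker` over SSH may need `sudo` or a full path, and the SSH user needs write access to `custom_ssl/`; adjust those lines to match the real setup.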
---
### 2. kuma.vish.gg Redirect Loop (`ERR_TOO_MANY_REDIRECTS`)
**Symptom:** `kuma.vish.gg` (Uptime Kuma, behind Authentik Forward Auth) failed with an infinite redirect loop.
**Root cause (two issues):**
**Issue A — Missing `X-Original-URL` header:**
The Authentik outpost returned `500` for Forward Auth requests because NPM wasn't passing the `X-Original-URL` header. The outpost log showed:
```
failed to detect a forward URL from nginx
```
**Fix:** Added to NPM advanced config for `kuma.vish.gg` (proxy host 41):
```nginx
auth_request /outpost.goauthentik.io/auth/nginx;
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
```
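One way to spot-check this from outside is to request the outpost endpoint through NPM without a session cookie. Assuming NPM exposes `/outpost.goauthentik.io/` on the proxied host and applies the same `X-Original-URL` header there (an approximation of what the `auth_request` subrequest sees), a working setup should answer `401`, which nginx turns into a redirect to the login flow, while the broken config produced `500`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: probe the Authentik outpost through NPM and interpret the status code.
# ASSUMPTION: /outpost.goauthentik.io/ is reachable on the proxied host.
set -euo pipefail

# Map the outpost's HTTP status to a diagnosis.
classify_outpost_status() {
  case "$1" in
    401) echo "ok: outpost reached, request is unauthenticated" ;;
    200) echo "ok: outpost reached, request is authenticated" ;;
    500) echo "broken: outpost could not detect the forward URL (check X-Original-URL)" ;;
    *)   echo "unexpected: HTTP $1" ;;
  esac
}

probe_outpost() {
  local host="$1" code
  # -o /dev/null discards the body; -w prints only the status code
  code="$(curl -s -o /dev/null -w '%{http_code}' \
    "https://$host/outpost.goauthentik.io/auth/nginx")"
  classify_outpost_status "$code"
}

# Only run the network probe when a host is given, e.g. ./probe.sh kuma.vish.gg
if [[ $# -gt 0 ]]; then
  probe_outpost "$1"
fi
```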
**Issue B — Empty `cookie_domain` on all Forward Auth providers:**
After login, Authentik couldn't set the session cookie correctly because `cookie_domain` was empty on every proxy provider except the domain-level one (PK 5). This kept the auth loop going even after successful authentication.
**Fix:** Set `cookie_domain` to `vish.gg` on the remaining proxy providers via the Authentik API:
| PK | Provider | Was | Now |
|----|----------|-----|-----|
| 4 | Paperless Forward Auth | `''` | `vish.gg` |
| 5 | vish.gg Domain Forward Auth | `vish.gg` | ✅ already set |
| 8 | Scrutiny Forward Auth | `''` | `vish.gg` |
| 12 | Uptime Kuma Forward Auth | `''` | `vish.gg` |
| 13 | Ollama Forward Auth | `''` | `vish.gg` |
| 14 | Wizarr Forward Auth | `''` | `vish.gg` |
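```bash
AK_TOKEN="..."   # Authentik API token with permission to edit providers
for pk in 4 8 12 13 14; do
  # Fetch the full provider object
  PROVIDER=$(curl -sf "https://sso.vish.gg/api/v3/providers/proxy/$pk/" \
    -H "Authorization: Bearer $AK_TOKEN")
  # Set cookie_domain and PUT the modified object back
  UPDATED=$(echo "$PROVIDER" | python3 -c "import sys,json; d=json.load(sys.stdin); d['cookie_domain']='vish.gg'; print(json.dumps(d))")
  curl -sf -X PUT "https://sso.vish.gg/api/v3/providers/proxy/$pk/" \
    -H "Authorization: Bearer $AK_TOKEN" -H "Content-Type: application/json" \
    -d "$UPDATED" > /dev/null && echo "provider $pk: cookie_domain set" \
    || echo "provider $pk: update failed" >&2
done
```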
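To confirm nothing was missed (for example, a provider created later), a sketch that lists proxy providers whose `cookie_domain` is still empty. It assumes the Authentik list endpoint `/api/v3/providers/proxy/` returns a paginated object with a `results` array and accepts `page_size`; the function name `empty_cookie_domains` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: report Authentik proxy providers with an empty cookie_domain.
set -euo pipefail

# Reads an Authentik paginated list response on stdin; prints "pk name" per offender.
empty_cookie_domains() {
  python3 -c '
import json, sys
data = json.load(sys.stdin)
for p in data.get("results", []):
    if not p.get("cookie_domain"):
        print(p["pk"], p.get("name", ""))
'
}

# Only query the API when a token is available in the environment
if [[ -n "${AK_TOKEN:-}" ]]; then
  curl -sf "https://sso.vish.gg/api/v3/providers/proxy/?page_size=100" \
    -H "Authorization: Bearer $AK_TOKEN" | empty_cookie_domains
fi
```

An empty output means every proxy provider has a cookie domain set.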
---
### 3. TURN Server External Verification
**coturn** was verified reachable and responding from an external network (Seattle VPS):
| Test | Result |
|------|--------|
| UDP port 3479 reachable | ✅ |
| STUN Binding request | ✅ `0x0101` success, returns `184.23.52.14:3479` |
| TURN Allocate (no credentials) | ✅ `0x0113` error response (401 Unauthorized): server handles TURN requests and enforces auth |
Config: `/etc/turnserver.conf` on matrix-ubuntu
- `listening-port=3479`
- `use-auth-secret`
- `static-auth-secret` = same value as `turn_shared_secret` in Synapse's `homeserver.yaml`
- `realm=matrix.thevish.io`
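The Allocate `401` above is expected without credentials. To exercise an authenticated allocation, credentials can be minted the way Synapse does for `use-auth-secret` (the TURN REST API scheme): username `<expiry unix time>:<user>`, password `base64(HMAC-SHA1(secret, username))`. A sketch assuming `openssl` and coreutils `base64`; the user name `test-user` and the 24-hour default TTL are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: derive time-limited TURN credentials for coturn's use-auth-secret.
set -euo pipefail

# Username is "<expiry-unix-time>:<user>"; TTL in seconds (default 24h, arbitrary).
turn_username() {
  local user="$1" ttl="${2:-86400}"
  echo "$(( $(date +%s) + ttl )):$user"
}

# Password is base64(HMAC-SHA1(static-auth-secret, username)).
turn_password() {
  local secret="$1" username="$2"
  printf '%s' "$username" | openssl dgst -sha1 -hmac "$secret" -binary | base64
}
```

Usage: `u=$(turn_username test-user); p=$(turn_password "$TURN_SECRET" "$u")`, where `TURN_SECRET` holds the `static-auth-secret` value. The pair can then be handed to a TURN client to attempt a real Allocate against port 3479.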
---
## NPM Certificate Reference
| Cert ID | Nice Name | Domain | Type | Expires | Notes |
|---------|-----------|--------|------|---------|-------|
| 1 | Cloudflare Origin - vish.gg | `*.vish.gg`, `vish.gg` | Cloudflare Origin | 2041 | Only trusted by CF edge — don't use for unproxied |
| 2 | Cloudflare Origin - thevish.io | `*.thevish.io` | Cloudflare Origin | 2026 | Same caveat |
| 3 | Cloudflare Origin - crista.love | `*.crista.love` | Cloudflare Origin | 2026 | Same caveat |
| 4 | git.vish.gg (LE) | `git.vish.gg` | Let's Encrypt | 2026-05 | |
| 5 | headscale.vish.gg (LE) | `headscale.vish.gg` | Let's Encrypt | 2026-06 | |
| 6 | mx.vish.gg (LE) | `mx.vish.gg` | Let's Encrypt | 2026-06 | Added 2026-03-19 |
| 7 | livekit.mx.vish.gg (LE) | `livekit.mx.vish.gg` | Let's Encrypt | 2026-06 | Added 2026-03-19 |
> **Rule:** Any domain that is **unproxied** in Cloudflare (DNS-only, orange cloud off) must use a real Let's Encrypt cert, not the Cloudflare Origin cert.
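To audit which certificate a host actually presents, a sketch that fetches the leaf cert with `openssl s_client` (SNI set) and matches on the issuer. The issuer substrings are assumptions based on how Cloudflare's Origin CA and Let's Encrypt name themselves. For a proxied (orange-cloud) host this sees Cloudflare's edge certificate, so the check is only meaningful for unproxied hosts.

```bash
#!/usr/bin/env bash
# Sketch: flag unproxied hosts that still serve a Cloudflare Origin cert.
set -euo pipefail

# Classify an issuer line as printed by `openssl x509 -noout -issuer`.
classify_issuer() {
  case "$1" in
    *"CloudFlare Origin SSL Certificate Authority"*) echo "cloudflare-origin" ;;
    *"Let's Encrypt"*)                               echo "letsencrypt" ;;
    *)                                               echo "other" ;;
  esac
}

# Issuer of the leaf certificate the host presents (SNI via -servername).
served_issuer() {
  local host="$1"
  openssl s_client -connect "$host:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -issuer
}

# Usage: ./check-issuer.sh mx.vish.gg livekit.mx.vish.gg
for host in "$@"; do
  echo "$host: $(classify_issuer "$(served_issuer "$host")")"
done
```

Any unproxied host reporting `cloudflare-origin` violates the rule above and will show "Not Secure" in browsers.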
---
## Renewal Automation
Certs 6 and 7 are issued by certbot on `matrix-ubuntu` and renewed automatically by certbot's systemd timer. A deploy hook copies each renewed cert to NPM on Calypso:
```
/etc/letsencrypt/renewal-hooks/deploy/copy-to-npm.sh
```
To manually renew and deploy:
```bash
ssh matrix-ubuntu
sudo certbot renew --force-renewal --cert-name mx.vish.gg
# hook runs automatically and copies to NPM
```
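After a renewal, a sketch to confirm NPM picked up the new cert by comparing the `notAfter` date it serves with the one in certbot's `live/` directory. Matching dates are a practical proxy for "same cert", not a strict identity check. It assumes GNU `date`, that it runs on matrix-ubuntu with sudo, and that the lineage name equals the hostname (true for certs 6 and 7); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare the cert NPM serves with certbot's local copy (run on matrix-ubuntu).
# Requires GNU date for `date -d`.
set -euo pipefail

# Whole days from epoch $2 (default: now) until an openssl "notAfter=..." line.
days_left() {
  local end now
  end="$(date -u -d "${1#notAfter=}" +%s)"
  now="${2:-$(date -u +%s)}"
  echo $(( (end - now) / 86400 ))
}

# notAfter of the leaf cert the host presents (SNI via -servername).
served_enddate() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -enddate
}

# notAfter of certbot's current cert for the lineage (live/ is typically root-only).
local_enddate() {
  sudo openssl x509 -noout -enddate -in "/etc/letsencrypt/live/$1/fullchain.pem"
}

# Usage: ./check-renewal.sh mx.vish.gg
if [[ $# -gt 0 ]]; then
  host="$1"
  served="$(served_enddate "$host")"
  current="$(local_enddate "$host")"
  echo "served: $served ($(days_left "$served") days left)"
  echo "local:  $current"
  if [[ "$served" == "$current" ]]; then
    echo "OK: NPM is serving the current certbot cert"
  else
    echo "MISMATCH: NPM may still be serving the old cert" >&2
  fi
fi
```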
**Last updated:** 2026-03-19