# Split-Horizon DNS Implementation Guide
Last updated: 2026-03-20
## Problem
All DNS queries for `*.vish.gg`, `*.thevish.io`, and `*.crista.love` currently resolve to Cloudflare proxy IPs (104.21.x.x), even when the client is on the same LAN as the services. This means:
1. **Hairpin NAT** — LAN traffic goes out to Cloudflare and back in through the router
2. **Internet dependency** — if the WAN link goes down, LAN services are unreachable by domain
3. **Added latency** — ~50ms roundtrip through Cloudflare vs ~1ms on LAN
4. **Cloudflare bottleneck** — all traffic proxied through CF even when unnecessary
## Solution
Use AdGuard Home on Calypso as a **split-horizon DNS resolver** that returns local IPs for homelab domains when queried from the LAN, while external clients continue to use Cloudflare.
```
┌────────────────────────────────┐
│         DNS Query for          │
│           nb.vish.gg           │
└────────────────┬───────────────┘
┌────────────────▼───────────────┐
│      Where is the client?      │
└──────┬─────────────────┬───────┘
       │                 │
  LAN Client      External Client
       │                 │
       ▼                 ▼
┌──────────────┐  ┌──────────────┐
│ AdGuard Home │  │  Cloudflare  │
│  (Calypso)   │  │     DNS      │
│              │  │              │
│ Returns:     │  │ Returns:     │
│ 192.168.0.250│  │ 104.21.73.214│
│ (NPM local)  │  │ (CF proxy)   │
└──────┬───────┘  └──────┬───────┘
       │                 │
       ▼                 ▼
┌──────────────┐  ┌──────────────┐
│ NPM (local)  │  │  Cloudflare  │
│ calypso:443  │  │  → WAN IP    │
│ ~1ms         │  │  → NPM       │
└──────┬───────┘  │  ~50ms       │
       │          └──────┬───────┘
       ▼                 ▼
┌────────────────────────────────┐
│        Backend Service         │
│   (same result, faster path)   │
└────────────────────────────────┘
```
## Prerequisites
Before implementing split-horizon DNS, NPM must listen on standard ports (80/443) so that LAN clients can reach it without specifying a port. Currently NPM uses temporary ports from the migration:
| Current (host:container) | Target (host:container) |
|---------|--------|
| 8880:80 | **80:80** |
| 8443:443 | **443:443** |
| 81:81 | 81:81 (unchanged) |
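
Before remapping, it can help to confirm what currently holds ports 80 and 443 on Calypso. The sketch below assumes `ss` (from iproute2) is available on the host and prints its usual `ss -tln` column layout, with the local address in the fourth column. `listeners_on_port` is a hypothetical helper name, not part of any tool.

```bash
# Print the local addresses listening on a TCP port.
# Reads `ss -tln` output on stdin; the port number is the first argument.
listeners_on_port() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            n = split($4, parts, ":")      # last ":"-separated field is the port
            if (parts[n] == port) print $4
        }'
}

# Usage (run on Calypso):
#   ss -tln | listeners_on_port 80
#   ss -tln | listeners_on_port 443
# No output means nothing is bound to that port.
```

If either command prints an address while NPM is still on 8880/8443, another service (most likely the built-in nginx) owns the port and has to be moved first.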
## Implementation Steps
### Step 1: Move NPM to Standard Ports
The NPM compose file at `hosts/synology/calypso/nginx-proxy-manager.yaml` has a comment noting the ports are temporary. To change them:
1. **Stop Synology's built-in nginx** from binding ports 80/443 (if active):
   - DSM → Control Panel → Login Portal → Web Services → change port from 80/443 to 5000/5001
   - Or via SSH: `sudo synosystemctl stop nginx` (temporary: nginx starts again on reboot, and because DSM serves its own web UI through nginx, stopping it may also take the DSM UI offline)
2. **Update the compose file:**
   ```yaml
   ports:
     - "80:80"     # HTTP
     - "443:443"   # HTTPS
     - "81:81"     # Admin UI
   ```
3. **Update the router port forwarding:**
   - Change `WAN:443 → 192.168.0.250:8443` to `WAN:443 → 192.168.0.250:443`
   - Change `WAN:80 → 192.168.0.250:8880` to `WAN:80 → 192.168.0.250:80`
4. **Redeploy NPM**: push the compose change to git; CI deploys it automatically.
### Step 2: Configure AdGuard DNS Rewrites
In AdGuard Home on Calypso (http://192.168.0.250:9080), go to **Filters → DNS rewrites** and add wildcard entries:
| Domain | Answer | Notes |
|--------|--------|-------|
| `*.vish.gg` | `192.168.0.250` | All vish.gg domains → NPM on Calypso |
| `*.thevish.io` | `192.168.0.250` | All thevish.io domains → NPM on Calypso |
| `*.crista.love` | `192.168.0.250` | All crista.love domains → NPM on Calypso |
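These three wildcards cover every proxy-host subdomain. AdGuard answers matching queries locally instead of forwarding them to upstream DNS. A wildcard such as `*.crista.love` matches subdomains only, so the bare `crista.love` host needs its own rewrite to `192.168.0.250`.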
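
The rewrites can also be scripted instead of entered by hand. This is a minimal sketch, assuming the AdGuard Home HTTP API exposes `POST /control/rewrite/add` with a `{"domain", "answer"}` JSON body and HTTP basic auth; check this against the API reference for the installed version. `ADGUARD_URL`, `ADGUARD_USER`, and `ADGUARD_PASS` are hypothetical environment variables, and the function names are illustrative.

```bash
# Build the JSON body for one rewrite (domain and answer contain no quotes).
rewrite_payload() {
    printf '{"domain":"%s","answer":"%s"}\n' "$1" "$2"
}

# Send one rewrite to AdGuard Home. Endpoint and auth scheme are assumptions
# to verify against your AdGuard Home version before relying on them.
add_rewrite() {
    rewrite_payload "$1" "$2" |
        curl -fsS -u "${ADGUARD_USER}:${ADGUARD_PASS}" \
             -H 'Content-Type: application/json' \
             --data-binary @- \
             "${ADGUARD_URL:-http://192.168.0.250:9080}/control/rewrite/add"
}

# Usage:
#   add_rewrite '*.vish.gg'     192.168.0.250
#   add_rewrite '*.thevish.io'  192.168.0.250
#   add_rewrite '*.crista.love' 192.168.0.250
```

Keeping the list in a script also makes it easier to apply the identical set to a second AdGuard instance.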
**Exceptions**: these domains need direct IPs rather than the NPM proxy path. Add them as specific overrides:
| Domain | Answer | Reason |
|--------|--------|--------|
| `mx.vish.gg` | `192.168.0.154` | Matrix federation needs direct access on port 8448 |
| `derp.vish.gg` | `192.168.0.250` | DERP relay — direct IP, no CF proxy |
| `derp-atl.vish.gg` | `192.168.0.200` | Atlantis DERP relay |
| `headscale.vish.gg` | `192.168.0.250` | Headscale control — direct access |
| `turn.thevish.io` | `192.168.0.200` | TURN/STUN needs direct UDP |
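Specific entries take priority over wildcards in AdGuard. The `derp.vish.gg` and `headscale.vish.gg` entries return the same address as the wildcard, so they change nothing functionally and are listed only to make the intent explicit.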
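
The precedence rule can be modelled in a few lines of shell. This is an illustrative sketch of the matching order (exact name first, then wildcard), not AdGuard's implementation; `RULES` holds a subset of the tables above and `resolve_rewrite` is a hypothetical name.

```bash
# Rewrite rules as "pattern answer" lines (subset of the tables above).
RULES='*.vish.gg 192.168.0.250
*.thevish.io 192.168.0.250
*.crista.love 192.168.0.250
mx.vish.gg 192.168.0.154
derp-atl.vish.gg 192.168.0.200
turn.thevish.io 192.168.0.200'

# Print the rewrite answer for a name: an exact rule wins, otherwise the
# first matching wildcard. Prints nothing and returns 1 when no rule matches.
resolve_rewrite() (
    name=$1
    wildcard_answer=
    while read -r pattern answer; do
        if [ "$pattern" = "$name" ]; then
            printf '%s\n' "$answer"
            exit 0                          # leaves the function's subshell
        fi
        case $pattern in
            '*.'*)
                suffix=${pattern#\*}        # "*.vish.gg" -> ".vish.gg"
                case $name in
                    *"$suffix")
                        if [ -z "$wildcard_answer" ]; then
                            wildcard_answer=$answer
                        fi
                        ;;
                esac
                ;;
        esac
    done <<EOF
$RULES
EOF
    if [ -n "$wildcard_answer" ]; then
        printf '%s\n' "$wildcard_answer"
    else
        exit 1
    fi
)

# Usage:
#   resolve_rewrite nb.vish.gg    # wildcard match
#   resolve_rewrite mx.vish.gg    # exact override
#   resolve_rewrite crista.love   # no match: the wildcard needs a leading label
```

In this model the bare `crista.love` falls through to upstream DNS, which is why an apex domain served by NPM would need its own rewrite entry.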
### Step 3: Set AdGuard as LAN DNS Server
Configure the router (Archer BE800) to hand out AdGuard's IP as the DNS server via DHCP:
1. **Router admin** → DHCP Settings → DNS Server
2. Set Primary DNS: `192.168.0.250` (Calypso/AdGuard)
3. Set Secondary DNS: `192.168.68.100` (NUC/AdGuard, backup)
Or per-device: point `/etc/resolv.conf` or network settings to `192.168.0.250`.
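
To confirm that a Linux client actually picked up the new resolver, the configured nameservers can be read back. A minimal sketch, assuming the client uses a plain `/etc/resolv.conf`; hosts running a local stub resolver such as systemd-resolved will show `127.0.0.53` there instead. The function names are hypothetical.

```bash
# Print the nameserver addresses from a resolv.conf-style file.
# Defaults to /etc/resolv.conf when no path is given.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if Calypso's AdGuard (192.168.0.250) is among the nameservers.
uses_adguard() {
    nameservers "$@" | grep -qx '192\.168\.0\.250'
}

# Usage:
#   nameservers
#   uses_adguard && echo "client is using AdGuard on Calypso"
```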
### Step 4: Configure NUC AdGuard (Backup DNS)
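Add the same DNS rewrites to the NUC's AdGuard instance so it works as a backup. Clients may query the secondary server at any time, not only when the primary is down, so its rewrites must match Calypso's:
- Same wildcard rewrites and exceptions as Calypso
- Reachable at `192.168.68.100` or `100.72.55.21` (Tailscale)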
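
One way to catch drift between the two instances is to ask both resolvers for the same names and compare the answers. This is a sketch under the addresses given above; `lookup` and `check_sync` are hypothetical helpers, and `lookup` can be replaced with any resolver command.

```bash
PRIMARY=192.168.0.250     # Calypso AdGuard
BACKUP=192.168.68.100     # NUC AdGuard

# Ask one resolver for a name; print the last line of the short answer
# (the final address after any CNAME chain).
lookup() {
    dig +short "$1" "@$2" | tail -n 1
}

# Print every name whose answer differs between the two resolvers.
# Returns 0 only when all names agree.
check_sync() (
    status=0
    for name in "$@"; do
        a=$(lookup "$name" "$PRIMARY")
        b=$(lookup "$name" "$BACKUP")
        if [ "$a" != "$b" ]; then
            printf '%s: primary=%s backup=%s\n' "$name" "${a:-none}" "${b:-none}"
            status=1
        fi
    done
    exit "$status"
)

# Usage:
#   check_sync nb.vish.gg mx.vish.gg turn.thevish.io matrix.thevish.io
```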
### Step 5: Test
```bash
# Verify local resolution
dig nb.vish.gg @192.168.0.250
# Expected: 192.168.0.250 (NPM local IP)

# Verify external resolution still works
dig nb.vish.gg @1.1.1.1
# Expected: 104.21.73.214 (Cloudflare proxy)

# Test HTTPS access via the local IP
# (add -k while NPM still serves the Cloudflare Origin cert; see SSL Considerations)
curl -s --resolve "nb.vish.gg:443:192.168.0.250" https://nb.vish.gg/ -o /dev/null -w "%{http_code} %{time_total}s\n"
# Expected: 200 in ~0.05s (vs ~0.15s through Cloudflare)

# Test that several domains resolve locally
for domain in nb.vish.gg gf.vish.gg git.vish.gg sso.vish.gg dash.vish.gg; do
    ip=$(dig +short "$domain" @192.168.0.250 | tail -n 1)
    echo "$domain → $ip"
done
```
## SSL Considerations
NPM serves the **Cloudflare Origin Certificate** for `*.vish.gg` (valid until 2041). That certificate is signed by Cloudflare's Origin CA, which Cloudflare's proxy accepts but which is not in public trust stores. The two paths therefore behave differently:
- **Through Cloudflare:** the browser sees Cloudflare's publicly trusted edge certificate, and Cloudflare accepts the origin cert on the back end, so nothing changes for external clients.
- **Direct LAN connection (bypassing CF):** the browser receives the origin cert itself and shows an untrusted-certificate warning.
**Fix options:**
1. **Use Let's Encrypt certs in NPM** instead of Cloudflare Origin — trusted everywhere, works for both paths
2. **Accept the warning** for LAN-only access (add exception in browser)
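3. **Trust the Cloudflare Origin CA on LAN clients**: keeps the existing origin cert, but every device must import the Origin CA root into its trust store. Cloudflare's "Full" vs "Full (strict)" setting only controls how Cloudflare validates the origin and does not change what LAN browsers trust.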
**Recommended:** Switch to Let's Encrypt with DNS challenge (Cloudflare API) for the wildcard certs. NPM supports this natively. This gives you certs trusted by both Cloudflare and direct LAN connections.
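
To see which certificate NPM presents on the direct path, the issuer can be read with `openssl`. A minimal sketch: the issuer substrings matched below (`Origin SSL` for the Cloudflare Origin CA, `Let's Encrypt` for Let's Encrypt) are assumptions about how those issuers appear in `openssl x509 -issuer` output, and both function names are hypothetical.

```bash
# Print issuer and expiry of the cert served for a hostname when connecting
# straight to the NPM LAN IP (second argument, default 192.168.0.250).
npm_cert_info() {
    openssl s_client -connect "${2:-192.168.0.250}:443" -servername "$1" \
        </dev/null 2>/dev/null |
        openssl x509 -noout -issuer -enddate
}

# Classify an issuer line. The matched substrings are assumptions; adjust
# them to whatever npm_cert_info prints in your environment.
issuer_kind() {
    case $1 in
        *"Origin SSL"*)    echo origin-ca ;;     # not in public trust stores
        *"Let's Encrypt"*) echo lets-encrypt ;;  # publicly trusted
        *)                 echo other ;;
    esac
}

# Usage:
#   npm_cert_info nb.vish.gg
#   issuer_kind "$(npm_cert_info nb.vish.gg | head -n 1)"
```

An `origin-ca` result on the LAN path means browsers will warn until one of the fix options is applied.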
## What Changes for Each Path
### LAN Client (after implementation)
```
Browser → nb.vish.gg
→ AdGuard DNS: 192.168.0.250
→ NPM (calypso:443) → SSL termination
→ Proxy to backend (192.168.0.210:8443)
→ Response (~1ms total DNS+proxy)
```
### External Client (unchanged)
```
Browser → nb.vish.gg
→ Cloudflare DNS: 104.21.73.214
→ Cloudflare proxy → WAN IP → Router
→ NPM (calypso:443) → SSL termination
→ Proxy to backend (192.168.0.210:8443)
→ Response (~50ms total)
```
### Internet Down (new capability)
```
Browser → nb.vish.gg
→ AdGuard DNS: 192.168.0.250 (cached/local)
→ NPM (calypso:443) → SSL termination
→ Proxy to backend
→ Response (services still work!)
```
## Current NPM Proxy Hosts (for reference)
All 36 domains that would benefit from split-horizon:
### vish.gg (27 domains)
| Domain | Backend |
|--------|---------|
| actual.vish.gg | calypso:8304 |
| cal.vish.gg | atlantis:12852 |
| dash.vish.gg | atlantis:7575 |
| dav.vish.gg | calypso:8612 |
| docs.vish.gg | calypso:8777 |
| gf.vish.gg | homelab-vm:3300 |
| git.vish.gg | calypso:3052 |
| headscale.vish.gg | calypso:8085 |
| kuma.vish.gg | rpi5:3001 |
| mastodon.vish.gg | matrix-ubuntu:3000 |
| mx.vish.gg | matrix-ubuntu:8082 |
| nb.vish.gg | homelab-vm:8443 |
| npm.vish.gg | calypso:81 |
| ntfy.vish.gg | homelab-vm:8081 |
| ollama.vish.gg | atlantis:11434 |
| ost.vish.gg | calypso:3000 |
| paperless.vish.gg | calypso:8777 |
| pt.vish.gg | atlantis:10000 |
| pw.vish.gg | atlantis:4080 |
| rackula.vish.gg | calypso:3891 |
| retro.vish.gg | calypso:8025 |
| rx.vish.gg | calypso:9751 |
| rxdl.vish.gg | calypso:9753 |
| scrutiny.vish.gg | homelab-vm:8090 |
| sf.vish.gg | calypso:8611 |
| sso.vish.gg | calypso:9000 |
| wizarr.vish.gg | atlantis:5690 |
### thevish.io (5 domains)
| Domain | Backend |
|--------|---------|
| binterest.thevish.io | homelab-vm:21544 |
| hoarder.thevish.io | homelab-vm:3482 |
| joplin.thevish.io | atlantis:22300 |
| matrix.thevish.io | matrix-ubuntu:8081 |
| meet.thevish.io | atlantis:5443 |
### crista.love (3 domains)
| Domain | Backend |
|--------|---------|
| crista.love | guava:28888 |
| cocalc.crista.love | guava:8080 |
| mm.crista.love | matrix-ubuntu:8065 |
## Rollback
If something breaks:
1. Change router DHCP DNS back to `1.1.1.1` / `8.8.8.8`
2. Or remove the DNS rewrites from AdGuard
3. Traffic reverts to the Cloudflare path once clients renew their DHCP lease (option 1) or their cached DNS answers expire (option 2)
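
To check which path a given client is on after a rollback (or after the initial rollout), the address it resolves can be classified: a private address means the AdGuard rewrite is still in effect, a public one means Cloudflare. This sketch treats only the RFC 1918 ranges as private; the function names are hypothetical.

```bash
# Succeed if the IPv4 address is in an RFC 1918 private range.
is_private_ipv4() {
    case $1 in
        10.*|192.168.*)                        return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *)                                     return 1 ;;
    esac
}

# Name the path that a DNS answer implies for the client.
path_for_answer() {
    if is_private_ipv4 "$1"; then
        echo "local (AdGuard rewrite)"
    else
        echo "external (Cloudflare)"
    fi
}

# Usage (uses the client's default resolver):
#   path_for_answer "$(dig +short nb.vish.gg | tail -n 1)"
```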
## Related Documentation
- [NPM Migration](npm-migration-jan2026.md) — Reverse proxy configuration
- [Authentik SSO](authentik-sso.md) — Forward auth depends on NPM routing
- [Cloudflare DNS](cloudflare-dns.md) — External DNS records
- [Image Update Guide](../admin/IMAGE_UPDATE_GUIDE.md) — Mentions Gitea/NPM as bootstrap dependencies