homelab-optimized/docs/infrastructure/headscale-migration-guide.md
Sanitized mirror from private repository - 2026-03-31 11:35:19 UTC

# Headscale Migration Guide
## Overview
This homelab uses a self-hosted [Headscale](https://github.com/juanfont/headscale) instance instead of Tailscale cloud. Headscale is a drop-in open-source replacement for the Tailscale control server.
- **Headscale server**: `https://headscale.vish.gg:8443`
- **MagicDNS suffix**: `tail.vish.gg` (e.g. `atlantis.tail.vish.gg`)
- **Login**: Authentik SSO at `sso.vish.gg` — username `vish` or email `admin@thevish.io`
- **Hosted on**: Calypso (`192.168.0.250`), managed via Docker
---
## Connecting a New Device
### Linux (Ubuntu / Debian)
1. Install Tailscale if not already installed:
```bash
curl -fsSL https://tailscale.com/install.sh | sh
```
2. Connect to headscale:
```bash
sudo tailscale up \
--login-server=https://headscale.vish.gg:8443 \
--accept-routes \
--force-reauth
```
3. A browser auth URL will be printed. Open it and log in with Authentik SSO.
4. If DNS doesn't resolve `headscale.vish.gg` (e.g. fresh machine with no AdGuard), add a temporary hosts entry first:
```bash
echo '184.23.52.14 headscale.vish.gg' | sudo tee -a /etc/hosts
# Run tailscale up, then clean up:
sudo sed -i '/headscale.vish.gg/d' /etc/hosts
```
5. If the machine was previously on Tailscale cloud and complains about non-default flags, Tailscale will print the exact command with all required flags — copy and run that command.
> **Note**: After registration, an admin must approve the node and fix the IP if preserving the original Tailscale IP (see Admin section below).
---
### Windows
1. Download and install Tailscale from https://tailscale.com/download/windows
2. Open **PowerShell as Administrator** and run:
```powershell
tailscale up --login-server=https://headscale.vish.gg:8443 --accept-routes --force-reauth
```
3. A browser window will open — log in with Authentik SSO (`vish` / `admin@thevish.io`).
4. If it shows a "mention all non-default flags" error, copy and run the exact command it provides, adding `--login-server=https://headscale.vish.gg:8443 --force-reauth` to it.
> **Important**: Always include `--accept-routes` on Windows; otherwise subnet routes (e.g. `192.168.0.x`) won't be reachable.
---
### iOS (iPhone / iPad)
1. Install **Tailscale** from the App Store.
2. Open the app → tap your **account icon** (top right) → **Log in**
3. Tap the `···` menu (top right of the login screen) → **Use custom coordination server**
4. Enter: `https://headscale.vish.gg:8443` → **Save**
5. Log in with Authentik SSO — username `vish` or email `admin@thevish.io`
> **Note**: `.vish.local` hostnames do NOT work on iOS — iOS intercepts `.local` for mDNS and never forwards to DNS. Use Tailscale IPs (`100.x.x.x`) or MagicDNS names (`hostname.tail.vish.gg`) instead.
---
### macOS
1. Install Tailscale from the App Store or https://tailscale.com/download/mac
2. **Option A — GUI**: Click the Tailscale menu bar icon → Preferences → hold `Option` while clicking "Log in" to enter a custom server URL → enter `https://headscale.vish.gg:8443`
3. **Option B — CLI**:
```bash
sudo tailscale up \
--login-server=https://headscale.vish.gg:8443 \
--accept-routes \
--force-reauth
```
4. Log in with Authentik SSO when the browser opens.
> **Note**: Same as iOS, `.vish.local` hostnames won't resolve on macOS when remote. Use `hostname.tail.vish.gg` or the Tailscale IP instead.
---
### GL.iNet Routers (OpenWrt)
1. SSH into the router.
2. Add a hosts entry (since GL routers don't use AdGuard):
```bash
echo '184.23.52.14 headscale.vish.gg' >> /etc/hosts
```
3. Run `tailscale up`. It will fail and print the required flags. Copy and run the exact command it provides, appending:
```
--login-server=https://headscale.vish.gg:8443 --auth-key=<preauth-key> --force-reauth
```
Get a pre-auth key from an admin (see below).
4. If advertising subnet routes, add `--advertise-routes=<subnet>` to the command.
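
The flag combinations used across the platform sections above can be assembled in one place. The following is a minimal sketch, not part of the original tooling: `build_up_command` is a hypothetical helper, and it only covers the flags named in this guide. On a device that reports a "non-default flags" error, the flags Tailscale prints still have to be added by hand.

```python
import shlex

LOGIN_SERVER = "https://headscale.vish.gg:8443"  # from the Overview section


def build_up_command(auth_key=None, advertise_routes=(), accept_routes=True):
    """Return the `tailscale up` argument list for joining this Headscale server.

    Hypothetical helper: it only knows the flags mentioned in this guide.
    """
    args = ["tailscale", "up", f"--login-server={LOGIN_SERVER}"]
    if accept_routes:
        args.append("--accept-routes")
    if auth_key:
        # A pre-auth key replaces the browser login step.
        args.append(f"--auth-key={auth_key}")
    if advertise_routes:
        args.append("--advertise-routes=" + ",".join(advertise_routes))
    args.append("--force-reauth")
    return args


# Example: a router advertising its LAN (subnet taken from the routes table).
print(shlex.join(build_up_command(advertise_routes=["192.168.12.0/24"])))
```

Returning a list rather than a string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.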
---
### Home Assistant (Tailscale Add-on)
> **Note**: HA Green does not expose SSH by default. Use the WebSocket API approach below,
> which works fully remotely via a Tailscale-connected hop host.
**Remote migration steps** (no physical access required):
1. Reach HA via a hop host on the same LAN (e.g. jellyfish at `100.69.121.120`):
```bash
ssh lulu@100.69.121.120
curl http://192.168.12.202:8123/api/ # confirm HA reachable
```
2. If the add-on was previously authenticated to Tailscale cloud, it will refuse
`--login-server` change with: `can't change --login-server without --force-reauth`.
**Fix**: uninstall and reinstall the add-on to clear `tailscaled.state`, by sending these messages over the HA WebSocket API (`supervisor/api` message type):
```json
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/uninstall", "method": "post"}
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/install", "method": "post"}
```
3. Set options before starting:
```json
{"type": "supervisor/api", "endpoint": "/addons/a0d7b954_tailscale/options", "method": "post",
"data": {"options": {"login_server": "https://headscale.vish.gg:8443", "accept_dns": false}}}
```
4. Start the add-on via `hassio/addon_start` service, then read logs:
```
GET http://192.168.12.202:8123/api/hassio/addons/a0d7b954_tailscale/logs
```
Look for: `AuthURL is https://headscale.vish.gg:8443/register/<key>`
5. Register on Calypso:
```bash
docker exec headscale headscale nodes register --user vish --key <key-from-log>
```
6. Fix the IP via SQLite (see "Preserve Original Tailscale IP" in the Admin section below) and restart headscale.
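
The messages in steps 2 and 3 are shown without the per-command `id` field, which the HA WebSocket API expects on every command sent after authentication. The sketch below builds the frames with increasing IDs and extracts the registration key from the add-on log line quoted in step 4. It is an illustration only: `supervisor_call`, `migration_frames` and `extract_register_key` are hypothetical names, and actually sending the frames (plus the initial `auth` message carrying a long-lived token) is left to whatever WebSocket client you use.

```python
import itertools
import json
import re

ADDON = "a0d7b954_tailscale"   # add-on slug used throughout this guide
_ids = itertools.count(1)      # each command frame gets a unique, increasing id


def supervisor_call(endpoint, method="post", data=None):
    """Build one supervisor/api command frame as a JSON string."""
    msg = {"id": next(_ids), "type": "supervisor/api",
           "endpoint": endpoint, "method": method}
    if data is not None:
        msg["data"] = data
    return json.dumps(msg)


def migration_frames():
    """Frames for steps 2 and 3: uninstall, reinstall, then set options."""
    return [
        supervisor_call(f"/addons/{ADDON}/uninstall"),
        supervisor_call(f"/addons/{ADDON}/install"),
        supervisor_call(f"/addons/{ADDON}/options", data={"options": {
            "login_server": "https://headscale.vish.gg:8443",
            "accept_dns": False,
        }}),
    ]


# Matches the log line quoted in step 4.
AUTH_URL = re.compile(r"AuthURL is https://headscale\.vish\.gg:8443/register/(\S+)")


def extract_register_key(log_text):
    """Return the registration key from the add-on log, or None if absent."""
    match = AUTH_URL.search(log_text)
    return match.group(1) if match else None
```

The key returned by `extract_register_key` is the value passed to `headscale nodes register --key` in step 5.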
---
## Admin: Registering a New Node
After a node connects, an admin needs to:
### 1. Generate a Pre-Auth Key (optional, avoids browser auth)
```bash
ssh -p 62000 Vish@192.168.0.250
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
headscale preauthkeys create --user 1 --expiration 1h
```
Pass `--auth-key=<key>` to `tailscale up` to skip browser auth.
### 2. Check Registered Nodes
```bash
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale headscale nodes list
```
### 3. Preserve Original Tailscale IP (if migrating from Tailscale cloud)
Headscale v0.28+ removed the `--ipv4` flag. Fix IPs via SQLite:
```bash
sudo sqlite3 /volume1/@docker/volumes/headscale-data/_data/db.sqlite \
"UPDATE nodes SET ipv4='100.x.x.x' WHERE id=<node-id>;"
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker restart headscale
```
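
Because this edit bypasses Headscale's own checks, it can help to validate the address before writing it. The sketch below is one possible guard, under stated assumptions: it only relies on the `nodes` table and the `id` / `ipv4` columns used in the command above, the in-memory table is a stand-in for the real `db.sqlite` (which has more columns), and `set_node_ipv4` is a hypothetical helper.

```python
import ipaddress
import sqlite3

# Tailscale-style addresses come from the CGNAT block 100.64.0.0/10.
CGNAT = ipaddress.ip_network("100.64.0.0/10")


def set_node_ipv4(conn, node_id, ipv4):
    """Validate the address, then update one node. Returns rows changed."""
    addr = ipaddress.ip_address(ipv4)
    if addr not in CGNAT:
        raise ValueError(f"{ipv4} is outside {CGNAT}")
    clash = conn.execute(
        "SELECT id FROM nodes WHERE ipv4 = ? AND id != ?", (ipv4, node_id)
    ).fetchone()
    if clash:
        raise ValueError(f"{ipv4} is already assigned to node {clash[0]}")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE nodes SET ipv4 = ? WHERE id = ?", (ipv4, node_id)
        )
    return cur.rowcount


# Demo against an in-memory stand-in table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, ipv4 TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(19, "100.64.0.5"), (11, "100.83.230.112")])
print(set_node_ipv4(conn, 19, "100.112.186.90"))
```

Against the real database, open the `db.sqlite` path shown above instead of `:memory:`, and restart the headscale container afterwards as in the original command.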
### 4. Rename a Node
```bash
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
headscale nodes rename -i <id> <new-name>
```
### 5. Approve Subnet Routes
Routes advertised by nodes must be explicitly approved:
```bash
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
headscale nodes approve-routes -i <node-id> -r <subnet>
# e.g. -r 192.168.0.0/24
```
Check all routes (v0.28 — routes are embedded in node JSON output):
```bash
sudo /volume1/@appstore/REDACTED_APP_PASSWORD/usr/bin/docker exec headscale \
headscale nodes list --output json | python3 -c "
import sys, json
for n in json.load(sys.stdin):
    r = n.get('available_routes', [])
    a = n.get('approved_routes', [])
    if r:
        print(n['given_name'], 'available:', r, 'approved:', a)
"
```
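
To list only the routes that still need approval, the same JSON can be filtered further. This is a small sketch using the field names from the one-liner above (`given_name`, `available_routes`, `approved_routes`); `pending_routes` is a hypothetical helper and the sample data is made up for illustration.

```python
def pending_routes(nodes):
    """Map node name -> routes that are advertised but not yet approved."""
    pending = {}
    for node in nodes:
        available = node.get("available_routes") or []
        approved = set(node.get("approved_routes") or [])
        missing = [route for route in available if route not in approved]
        if missing:
            pending[node["given_name"]] = missing
    return pending


# Hypothetical sample in the shape of `headscale nodes list --output json`.
sample = [
    {"given_name": "calypso",
     "available_routes": ["192.168.0.0/24"],
     "approved_routes": ["192.168.0.0/24"]},
    {"given_name": "gl-mt3000",
     "available_routes": ["192.168.12.0/24"],
     "approved_routes": []},
    {"given_name": "ipad-pro"},  # advertises nothing
]
print(pending_routes(sample))
```

With real data, replace `sample` with `json.load(sys.stdin)` and pipe the `nodes list --output json` output into the script.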
---
## DNS Notes
- **MagicDNS**: Headscale pushes `192.168.0.250` (Calypso AdGuard) as DNS to all tailnet clients
- **AdGuard rewrites**: `*.vish.local` names resolve to their Tailscale IPs via AdGuard rewrites on Calypso
- **`.vish.local` on iOS/macOS**: Does NOT work remotely — iOS/macOS intercept `.local` for mDNS. Use `hostname.tail.vish.gg` instead
- **External DNS**: `headscale.vish.gg` resolves to `184.23.52.14` (home WAN) externally, `192.168.0.250` internally via AdGuard rewrite
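
When configs or bookmarks still reference `.vish.local` names, they can be rewritten to MagicDNS names that also work on iOS and macOS. The sketch below assumes the label before `.vish.local` matches the node's Headscale hostname, which this guide does not guarantee; `remote_safe_name` is a hypothetical helper.

```python
MAGICDNS_SUFFIX = "tail.vish.gg"   # from the Overview section
LOCAL_SUFFIX = ".vish.local"


def remote_safe_name(hostname):
    """Rewrite `<host>.vish.local` to `<host>.tail.vish.gg`.

    Assumes the short name is identical in AdGuard and Headscale.
    Other names are returned unchanged (lower-cased, trailing dot removed).
    """
    name = hostname.rstrip(".").lower()
    if name.endswith(LOCAL_SUFFIX):
        short = name[: -len(LOCAL_SUFFIX)]
        return f"{short}.{MAGICDNS_SUFFIX}"
    return name


print(remote_safe_name("atlantis.vish.local"))
```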
## Uptime Kuma Monitoring
Kuma runs on **pi-5** (`100.77.151.40`) inside the `uptime-kuma` container. DB at `/app/data/kuma.db`.
### Monitor groups and hosts
| Group | Host | Tailscale IP |
|-------|------|-------------|
| Homelab | `homelab.tail.vish.gg` | `100.67.40.126` |
| Atlantis | `atlantis.tail.vish.gg` | `100.83.230.112` |
| Calypso | `calypso.tail.vish.gg` | `100.103.48.78` |
| Concord_NUC | `vish-concord-nuc.tail.vish.gg` | `100.72.55.21` |
| Setillo | `setillo.tail.vish.gg` | `100.125.0.20` |
| Proxmox_NUC | `pve.tail.vish.gg` | `100.87.12.28` |
| Guava | `truenas-scale.tail.vish.gg` | `100.75.252.64` |
| Seattle | `seattle.tail.vish.gg` | `100.82.197.124` |
| Raspberry Pi 5 | `100.77.151.40` | `100.77.151.40` |
### Firewall rules required for Kuma (pi-5 = `100.77.151.40`)
Kuma polls via Tailscale IP. Each host with a ts-input/ts-forward chain needs ACCEPT rules for pi-5:
- **Homelab VM**: Rules in `iptables-legacy` ts-input/ts-forward for pi-5 on all monitored ports. Persisted via `netfilter-persistent`.
- **Concord NUC**: Same — ts-input/ts-forward ACCEPT for pi-5 on monitored ports.
- **Seattle**: UFW rule `ufw allow from 100.77.151.40 to any port 8444`
- **Calypso/Atlantis/Setillo**: No ts-input blocking — Tailscale is in userspace mode on Synology.
### Duplicate service naming
Services that exist on both Atlantis and Calypso use prefixes:
- `[ATL] Sonarr`, `[ATL] Radarr`, etc. for Atlantis
- `[CAL] Sonarr`, `[CAL] Radarr`, etc. for Calypso
### AdGuard DNS fix for `*.tail.vish.gg` on pi-5
Pi-5's Docker daemon was using `100.100.100.100` (Tailscale MagicDNS) but AdGuard on Calypso was forwarding `*.vish.gg` to Cloudflare, which returned stale IPs. Fixed by adding a private upstream in AdGuard config at `/volume1/docker/adguard/config/AdGuardHome.yaml`:
```yaml
upstream_dns:
- "[/tail.vish.gg/]100.100.100.100"
```
---
## NPM Proxy Host Gotcha — Same-Subnet LAN IPs
**Problem**: NPM on Calypso (`192.168.0.250`) cannot reach Docker-published ports on other hosts
that are on the same LAN subnet (`192.168.0.x`).
**Root cause**: When the `Tailscale_outbound_connections` DSM task runs `tailscale configure-host`
on Calypso, it installs kernel netfilter hooks. After this, Docker containers on Calypso sending
traffic to a LAN IP on the same subnet bypass the DNAT rules on the destination host (same-subnet
traffic doesn't go through PREROUTING on the target). The containers are unreachable via their
published ports.
**Fix**: Always use the **Tailscale IP** as the `forward_host` in NPM for services running in
Docker on other hosts, not the LAN IP.
| Host | Use this in NPM (not LAN IP) |
|------|------------------------------|
| Homelab VM | `100.67.40.126` |
| Guava / TrueNAS | `100.75.252.64` |
| Atlantis | `100.83.230.112` |
**Why it worked pre-Headscale**: Before the migration, Tailscale on Calypso ran in pure userspace
mode without kernel netfilter hooks. NPM's outbound packets took the normal kernel path, hitting
the destination's Docker DNAT rules correctly. The `configure-host` task (which installs kernel
hooks) is required for Headscale's subnet routing to work, which introduced this side effect.
**Known affected proxy hosts** (already fixed to Tailscale IPs):
- `gf.vish.gg` → `100.67.40.126:3300` (Grafana)
- `ntfy.vish.gg` → `100.67.40.126:8081` (NTFY)
- `hoarder.thevish.io` → `100.67.40.126:3482` (Karakeep)
- `binterest.thevish.io` → `100.67.40.126:21544` (Binternet)
- `crista.love` → `100.75.252.64:28888` (Guava nginx/static site)
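
For the audit, a quick way to spot proxy hosts that would hit this problem is to flag any `forward_host` that is an IP on Calypso's own subnet. The sketch below assumes the LAN is `192.168.0.0/24` (as in the subnet routes table) and takes a plain domain-to-host mapping; how that mapping is exported from NPM is not covered here, and `risky_forward_hosts` and `example.vish.gg` are hypothetical.

```python
import ipaddress

LAN = ipaddress.ip_network("192.168.0.0/24")      # Calypso's subnet
NPM_HOST = ipaddress.ip_address("192.168.0.250")  # Calypso itself is fine


def risky_forward_hosts(proxy_hosts):
    """Return domains whose forward_host is a same-subnet LAN IP."""
    risky = []
    for domain, forward_host in proxy_hosts.items():
        try:
            addr = ipaddress.ip_address(forward_host)
        except ValueError:
            continue  # a hostname rather than an IP literal; not checked here
        if addr in LAN and addr != NPM_HOST:
            risky.append(domain)
    return risky


# Hypothetical sample: one fixed entry, one still pointing at a LAN IP.
hosts = {
    "gf.vish.gg": "100.67.40.126",
    "example.vish.gg": "192.168.0.200",
}
print(risky_forward_hosts(hosts))
```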
---
## DERP Relay Servers
Three DERP relay regions are configured for redundancy:
| Region | Code | Host | DERP Port | STUN Port | Notes |
|--------|------|------|-----------|-----------|-------|
| 900 | home-cal | headscale.vish.gg:8443 | 8443 | none | Headscale built-in, LAN only |
| 901 | sea | derp-sea.vish.gg:8444 | 8444 | 3478 | Seattle VPS |
| 902 | home-atl | derp-atl.vish.gg:8445 | 8445 | 3480 | Atlantis NAS — added for redundancy |
> **Important**: Tailscale public DERP servers (sfo, nyc, etc.) are disabled. Headscale nodes cannot authenticate through Tailscale's infrastructure. All relay traffic goes through regions 900, 901, or 902.
### DERP Infrastructure Notes
- `derp-sea.vish.gg` → Seattle VPS (`YOUR_WAN_IP`), derper container at `hosts/vms/seattle/derper.yaml`
- `derp-atl.vish.gg` → Home public IP (`184.23.52.14`), router forwards `8445/tcp` + `3480/udp` to Atlantis (`192.168.0.200`)
- Container deployed as **Portainer stack ID 688** on Atlantis (from `hosts/synology/atlantis/derper.yaml`)
- TLS cert at `/volume1/docker/derper-atl/certs/live/derp-atl.vish.gg/` (flat `.crt`/`.key` layout required by derper)
- Cloudflare credentials at `/volume1/docker/derper-atl/secrets/cloudflare.ini`
- Cert auto-renewed monthly (1st of month, 03:00) by `derper-atl-cert-renewer` sidecar container
(certbot/dns-cloudflare + supercronic; logs at `/volume1/docker/derper-atl/certs/renew.log`)
- Port 3478/udp: coturn/Jitsi on Atlantis — do not use
- Port 3479/udp: coturn/Matrix TURN on matrix-ubuntu — do not use
- `derpmap.yaml` lives at `hosts/synology/calypso/derpmap.yaml` in repo; must be manually synced to `/volume1/docker/headscale/config/derpmap.yaml` on Calypso after changes
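
Since the `derpmap.yaml` sync is manual, a checksum comparison is one way to notice drift between the repo copy and the deployed copy. This is a minimal sketch, assuming both files are readable from the machine running it (for example a repo checkout on Calypso); the demo uses throwaway files in place of the two real paths.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def derpmap_in_sync(repo_copy, deployed_copy):
    """True when both derpmap.yaml copies are byte-identical."""
    return sha256_of(repo_copy) == sha256_of(deployed_copy)


# Demo with temporary stand-ins for the repo and deployed files.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp, "repo-derpmap.yaml")
    live = Path(tmp, "live-derpmap.yaml")
    repo.write_text("regions: {}\n")
    live.write_text("regions: {}\n")
    same_result = derpmap_in_sync(repo, live)
    live.write_text("regions: {900: {}}\n")  # simulate an unsynced change
    drift_result = derpmap_in_sync(repo, live)

print(same_result, drift_result)
```

A byte-level comparison also flags whitespace-only differences, which is usually acceptable for a file that is copied verbatim.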
## Subnet Routes in Use
| Subnet | Advertised by | Approved |
|--------|--------------|---------|
| 192.168.0.0/24 | calypso (primary), atlantis | ✅ |
| 192.168.68.0/22 | vish-concord-nuc | ✅ |
| 192.168.69.0/24 | setillo | ✅ |
| 192.168.12.0/24 | gl-mt3000 | ✅ |
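
One detail worth knowing about this table: `192.168.68.0/22` spans `192.168.68.0` through `192.168.71.255`, so it contains setillo's `192.168.69.0/24`. Routing normally prefers the most specific prefix, so the `/24` should still win for those addresses, but the overlap is easy to miss. The sketch below, using only the subnets listed above, shows how to detect such overlaps; `overlapping_routes` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations

# Subnets and advertisers copied from the table above.
ROUTES = {
    "192.168.0.0/24": "calypso, atlantis",
    "192.168.68.0/22": "vish-concord-nuc",
    "192.168.69.0/24": "setillo",
    "192.168.12.0/24": "gl-mt3000",
}


def overlapping_routes(routes):
    """Return pairs of subnets that share at least one address."""
    nets = [ipaddress.ip_network(r) for r in routes]
    return [(str(a), str(b)) for a, b in combinations(nets, 2) if a.overlaps(b)]


print(overlapping_routes(ROUTES))
# → [('192.168.68.0/22', '192.168.69.0/24')]
```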
## Node Inventory
| ID | Hostname | Tailscale IP | Status |
|----|----------|-------------|--------|
| 1 | headscale-test | 100.64.0.1 | test LXC |
| 2 | seattle (vmi2076105) | 100.82.197.124 | Seattle VPS |
| 3 | matrix-ubuntu | 100.85.21.51 | |
| 4 | pi-5 | 100.77.151.40 | |
| 5 | vish-concord-nuc | 100.72.55.21 | |
| 6 | setillo | 100.125.0.20 | |
| 7 | pve | 100.87.12.28 | |
| 8 | truenas-scale | 100.75.252.64 | Guava/TrueNAS |
| 9 | ipad-pro | 100.68.71.48 | |
| 10 | iphone16-pro-max | 100.79.252.108 | |
| 11 | atlantis | 100.83.230.112 | |
| 12 | calypso | 100.103.48.78 | Runs headscale |
| 13 | homelab | 100.67.40.126 | |
| 14 | uqiyoe | 100.124.91.52 | Windows laptop |
| 15 | jellyfish | 100.69.121.120 | Remote location |
| 16 | gl-mt3000 | 100.126.243.15 | Remote router |
| 17 | gl-be3600 | 100.105.59.123 | Home router |
### Still to migrate (offline nodes)
Run `tailscale up --login-server=https://headscale.vish.gg:8443 --force-reauth` when they come online:
- kevinlaptop (`100.89.160.65`)
- mah-pc (`100.121.22.51`)
- shinku-ryuu (`100.98.93.15`)
- vish-mint (`100.115.169.43`)
- vishdebian (`100.86.60.62`)
- mastodon-rocky (`100.111.200.21`)
- nvidia-shield (`100.89.79.99`)
- pi-5-kevin (`100.123.246.75`)
- rocky9-playground (`100.105.250.128`)
- samsung-sm-x510 (`100.72.118.117`)
- sd (`100.83.141.1`)
- bluecrownpassionflower (`100.110.25.127`)
- glkvm (`100.64.137.1`)
- google-pixel-10-pro (`100.122.119.40`)
### Home Assistant — Migrated ✅
**Device**: Home Assistant Green at `192.168.12.202:8123` (jellyfish remote location)
**Tailscale IP**: `100.112.186.90` (preserved) | **Node ID**: 19 | **MagicDNS**: `homeassistant.tail.vish.gg`
**Migration completed** remotely (no physical access needed) via:
1. HA WebSocket API (`ws://192.168.12.202:8123/api/websocket`) proxied through jellyfish (`100.69.121.120`)
2. Supervisor `addon_configs` API to set `login_server: https://headscale.vish.gg:8443`
3. Uninstalled + reinstalled the Tailscale add-on to clear stale `tailscaled.state`
(necessary because `can't change --login-server without --force-reauth`)
4. Add-on registered against headscale — auth URL approved via `headscale nodes register`
5. IP updated via SQLite: `UPDATE nodes SET ipv4='100.112.186.90' WHERE id=19;`
**Current add-on config**:
```json
{ "login_server": "https://headscale.vish.gg:8443", "accept_dns": false }
```
**Uptime Kuma monitor**: `[JLF] Home Assistant` (ID 5) → `homeassistant.tail.vish.gg:8123`
**HA API token** (expires 2028-06-07):
`eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiIxMzA1ZTE0NDg2ZGY0NDExYmMyOGEwZTY3ZmUyMTc3NyIsImlhdCI6MTc3MzA1MjkzNywiZXhwIjoyMDg4NDEyOTM3fQ.hzqjg7ALTdTDkMJS9Us-RUetQ309Nmfzx4gXevRRlp8` <!-- pragma: allowlist secret -->
---
## Outstanding TODOs
| Priority | Task | Notes |
|----------|------|-------|
| Low | **Migrate offline nodes** | 14 nodes still on Tailscale cloud (listed under "Still to migrate"); migrate when they come online |
| Info | **NPM proxy hosts audit** | Going forward, always use Tailscale IPs in NPM for Docker services on other LAN hosts (see NPM section above) |