# 🚨 Common Issues & Solutions
**🟢 Beginner-Friendly Troubleshooting Guide**
This guide covers the most frequent problems encountered in the homelab and their solutions. Issues are organized by category with step-by-step resolution instructions.
## 🎯 Quick Diagnosis
### 🔍 **First Steps for Any Problem**
1. **Check service status**: `docker ps` or `docker-compose ps`
2. **Review logs**: `docker-compose logs service-name`
3. **Verify connectivity**: Can you reach the service URL?
4. **Check resources**: `docker stats` for CPU/memory usage
5. **Test network**: `ping` and `curl` commands
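The steps above can be wrapped in a small helper (a sketch; `triage` is a hypothetical name, `docker` and `docker compose` are assumed on PATH, and steps 3 and 5 still need a URL so they stay manual):

```shell
# triage: run the first-steps checks for one service in order (sketch)
triage() {
  svc="$1"
  if [ -z "$svc" ]; then
    echo "usage: triage <service-name>"
    return 1
  fi
  docker ps --filter "name=$svc"        # 1. service status
  docker compose logs --tail 20 "$svc"  # 2. recent logs
  docker stats --no-stream "$svc"       # 4. CPU/memory usage
}
```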
---
## 🐳 Container Issues
### ❌ **Container Won't Start**
#### **Symptoms**
- Service shows as "Exited" in `docker ps`
- Error messages in logs about startup failures
- Service unreachable despite being "running"
#### **Common Causes & Solutions**
**🔧 Port Already in Use**
```bash
# Check what's using the port
sudo netstat -tulpn | grep :8080
# or
sudo lsof -i :8080
```
Solution: change the external port in `docker-compose.yml`:
```yaml
ports:
  - "8081:8080" # Use a different external port
```
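When several stacks compete for ports, picking a free one can be automated (a sketch; `find_free_port` is a hypothetical helper and assumes the `ss` utility found on most modern Linux hosts — swap in `netstat -tln` if it is missing):

```shell
# find_free_port: print the first TCP port at or above a starting port
# that no local process is listening on (sketch)
find_free_port() {
  port="${1:-8080}"
  while ss -ltn 2>/dev/null | grep -q ":$port "; do
    port=$((port + 1))
  done
  echo "$port"
}
```

Use the result as the external half of the `ports:` mapping.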
**🔧 Permission Issues (Synology)**
```bash
# Fix ownership for Synology NAS
sudo chown -R 1026:100 /volume1/docker/service-name
sudo chmod -R 755 /volume1/docker/service-name
# For other systems
sudo chown -R 1000:1000 ./service-data
```
**🔧 Missing Environment Variables**
```bash
# Check if .env file exists
ls -la .env
# Verify environment variables are set
docker-compose config
# Create missing .env file
cat > .env << 'EOF'
TZ=America/Los_Angeles
PUID=1026
PGID=100
EOF
```
**🔧 Image Pull Failures**
```bash
# Manually pull the image
docker pull image:tag
# Check if image exists
docker images | grep image-name
```
Try a different image tag in `docker-compose.yml`:
```yaml
image: service:stable # Instead of :latest
```
---
### 🔄 **Container Keeps Restarting**
#### **Symptoms**
- Container status shows "Restarting"
- High restart count in `docker ps`
- Service intermittently available
#### **Solutions**
**🔧 Check Resource Limits**
```bash
# Monitor resource usage
docker stats --no-stream
```
Then raise the limit in `docker-compose.yml`:
```yaml
deploy:
  resources:
    limits:
      memory: 2G # Increase from 1G
```
**🔧 Fix Health Check Issues**
```bash
# Test health check manually
docker exec container-name curl -f http://localhost:8080/health
```
Then relax the health check timing in `docker-compose.yml`:
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 60s # Increase from 30s
  timeout: 30s # Increase from 10s
  start_period: 120s # Allow more startup time
```
**🔧 Database Connection Issues**
```bash
# Check database connectivity
docker exec app-container ping database-container
# Verify database is ready
docker exec db-container pg_isready -U username
```
Then make the app wait for a healthy database in `docker-compose.yml`:
```yaml
depends_on:
  database:
    condition: service_healthy
```
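The health check and `depends_on` fixes above combine into one compose sketch (service and image names are placeholders, not the homelab's actual stack):

```yaml
services:
  database:
    image: postgres:16 # placeholder image
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "username"]
      interval: 10s
      timeout: 5s
      retries: 5
  app:
    image: your-app:stable # placeholder image
    depends_on:
      database:
        condition: service_healthy
```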
---
## 🌐 Network & Connectivity Issues
### 🚫 **Service Not Accessible**
#### **Symptoms**
- "Connection refused" or "Site can't be reached"
- Service running but not responding to requests
- Timeout errors when accessing web interface
#### **Solutions**
**🔧 Check Port Binding**
```bash
# Verify port is bound
docker port container-name
# Check if service is listening
docker exec container-name netstat -tulpn
# Test internal connectivity
docker exec container-name curl http://localhost:8080
```
**🔧 Firewall Issues**
```bash
# Check firewall status (Ubuntu/Debian)
sudo ufw status
# Allow port through firewall
sudo ufw allow 8080
# For Synology, check Control Panel > Security > Firewall
```
**🔧 Network Configuration**
```bash
# Check Docker networks
docker network ls
# Inspect network configuration
docker network inspect network-name
# Recreate network if needed
docker-compose down
docker network prune
docker-compose up -d
```
---
### 🔗 **Services Can't Communicate**
#### **Symptoms**
- App can't connect to database
- API calls between services fail
- "Name resolution failure" errors
#### **Solutions**
**🔧 Network Isolation**
```yaml
# Ensure services are on the same network
networks:
  app-network:
    name: app-network
services:
  app:
    networks:
      - app-network
  database:
    networks:
      - app-network
```
**🔧 Service Discovery**
```bash
# Use container names for internal communication
DATABASE_HOST=database-container # Not localhost
# Test name resolution
docker exec app-container nslookup database-container
```
---
### 🔴 **AdGuard Crash-Loop (bind: cannot assign requested address)**
#### **Symptoms**
- AdGuard container shows "Restarting" or "Up Less than a second" in `docker ps`
- Logs contain: `fatal] starting dns server: configuring listeners: ... bind: cannot assign requested address`
#### **Cause**
AdGuard binds its DNS listener to a specific IP address stored in `AdGuardHome.yaml`. If the host's IP changes (DHCP reassignment, netplan change, or AdGuard briefly starts and rewrites the config to the current IP), the stored IP won't match the host and AdGuard will fail to bind.
#### **Diagnose**
```bash
# See what IP AdGuard is trying to bind to
docker logs AdGuard --tail 20
# See what IP the interface actually has
ip addr show eno1 | grep 'inet '
# See what's in the config file
sudo grep -A3 'bind_hosts' /home/vish/docker/adguard/config/AdGuardHome.yaml
```
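The "what IP does the interface actually have" step can be scripted so the value written into `AdGuardHome.yaml` is derived rather than typed (a sketch; `iface_ip` is a hypothetical helper):

```shell
# iface_ip: extract the first IPv4 address from `ip addr show` output
# piped into it, e.g.  ip addr show eno1 | iface_ip   (sketch)
iface_ip() {
  sed -n 's/.*inet \([0-9][0-9.]*\)\/.*/\1/p' | head -1
}
```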
#### **Fix**
```bash
# Update the config to match the actual interface IP
sudo sed -i 's/- 192.168.68.XXX/- 192.168.68.100/' /home/vish/docker/adguard/config/AdGuardHome.yaml
# Restart AdGuard
docker restart AdGuard
```
> **On concord-nuc**: `eno1` must have static IP `192.168.68.100`. If it reverted to DHCP, re-apply the static config with `sudo netplan apply`. See [concord-nuc README](../../hosts/physical/concord-nuc/README.md) for full details.
---
## 💾 Storage & Data Issues
### 📁 **Data Not Persisting**
#### **Symptoms**
- Configuration lost after container restart
- Uploaded files disappear
- Database data resets
#### **Solutions**
**🔧 Volume Mounting**
```yaml
# Ensure proper volume mounting
volumes:
  - /volume1/docker/service:/data:rw # Host path:Container path
  - ./config:/app/config:rw # Relative path
```
```bash
# Check the host directory exists
ls -la /volume1/docker/service
```
**🔧 Permission Issues**
```bash
# Fix volume permissions
sudo chown -R 1026:100 /volume1/docker/service
sudo chmod -R 755 /volume1/docker/service
# Check container user
docker exec container-name id
```
---
### 💿 **Disk Space Issues**
#### **Symptoms**
- "No space left on device" errors
- Services failing to write data
- Slow performance
#### **Solutions**
**🔧 Check Disk Usage**
```bash
# Check overall disk usage
df -h
# Check Docker space usage
docker system df
# Check specific directory
du -sh /volume1/docker/*
```
**🔧 Clean Up Docker**
```bash
# Remove unused containers, networks, images
docker system prune -a
# Remove unused volumes (CAUTION: This deletes data!)
docker volume prune
# Clean up logs
sudo truncate -s 0 /var/lib/docker/containers/*/*-json.log
```
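Log growth can also be capped at the Docker daemon level so the truncate step above is rarely needed. A hedged `/etc/docker/daemon.json` sketch (values are examples; restart the Docker daemon after editing):

```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```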
---
## 🔐 Authentication & Access Issues
### 🚪 **Can't Login to Services**
#### **Symptoms**
- "Invalid credentials" errors
- Login page not loading
- Authentication timeouts
#### **Solutions**
**🔧 Default Credentials**
```bash
# Check service documentation for defaults
# Common defaults:
# Username: admin, Password: "REDACTED_PASSWORD"
# Username: admin, Password: "REDACTED_PASSWORD"
# Username: admin, Password: "REDACTED_PASSWORD"
# Check logs for generated passwords
docker-compose logs service-name | grep -i password
```
**🔧 Reset Admin Password**
```bash
# For many services, delete the config directory and restart
# CAUTION: this wipes ALL settings, not just the password
docker-compose down
rm -rf ./config/
docker-compose up -d
# Check service-specific reset procedures, e.g.:
docker exec container-name reset-password admin
```
---
### 🔑 **SSL/TLS Certificate Issues**
#### **Symptoms**
- "Certificate not trusted" warnings
- HTTPS not working
- Mixed content errors
#### **Solutions**
**🔧 Nginx Proxy Manager**
```bash
# Access Nginx Proxy Manager at http://host-ip:81
# Add an SSL certificate for the domain there
# Use Let's Encrypt for automatic certificates
```
**🔧 Self-Signed Certificates**
```bash
# Generate self-signed certificate
openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes
# Add to browser certificate store
# Or use HTTP instead of HTTPS for internal services
```
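Expiring certificates are a common cause of sudden "not trusted" warnings, so a quick expiry check helps (a sketch; `cert_days_left` is a hypothetical helper, assuming `openssl` and GNU `date` with a BSD `date` fallback):

```shell
# cert_days_left: print whole days remaining before a PEM cert expires (sketch)
cert_days_left() {
  end=$(openssl x509 -in "$1" -noout -enddate | cut -d= -f2)
  end_s=$(date -d "$end" +%s 2>/dev/null \
    || date -j -f '%b %e %T %Y %Z' "$end" +%s)
  echo $(( (end_s - $(date +%s)) / 86400 ))
}
```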
---
## 📊 Performance Issues
### 🐌 **Slow Service Response**
#### **Symptoms**
- Web interfaces load slowly
- API calls timeout
- High CPU/memory usage
#### **Solutions**
**🔧 Resource Allocation**
```yaml
# Increase resource limits
deploy:
  resources:
    limits:
      memory: 4G # Increase memory
      cpus: '2.0' # Increase CPU
```
**🔧 Database Optimization**
```bash
# Check active queries (pg_stat_activity is a view, so query it via psql)
docker exec db-container psql -U username -c "SELECT pid, state, query FROM pg_stat_activity;"
# Optimize database configuration:
# add indexes, tune memory settings
```
**🔧 Storage Performance**
```bash
# Check disk I/O
iostat -x 1
# If I/O-bound, move to faster storage (SSD)
```
Use tmpfs for temporary data:
```yaml
tmpfs:
  - /tmp:size=1G
```
---
## 🔄 Update & Maintenance Issues
### 📦 **Update Failures**
#### **Symptoms**
- Container won't start after update
- New version missing features
- Configuration incompatibility
#### **Solutions**
**🔧 Rollback to Previous Version**
```bash
# Pin a known-good version tag in docker-compose.yml first:
#   image: service:v1.2.3 # Instead of :latest
# Then pull the pinned image and restart
docker-compose down
docker-compose pull
docker-compose up -d
```
**🔧 Backup Before Updates**
```bash
# Backup configuration
cp -r ./config ./config.backup
# Backup database
docker exec db-container pg_dump -U user dbname > backup.sql
# Test update on copy first
cp -r service-dir service-dir-test
cd service-dir-test
# Test update here
```
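The backup steps above can be folded into one dated-tarball helper (a sketch; `backup_service` is a hypothetical name — adapt paths to your layout):

```shell
# backup_service: snapshot a service's config dir into a dated tarball (sketch)
backup_service() {
  src="$1"
  dest="${2:-./backups}"
  mkdir -p "$dest"
  stamp=$(date +%Y%m%d-%H%M%S)
  out="$dest/$(basename "$src")-$stamp.tar.gz"
  tar -czf "$out" -C "$(dirname "$src")" "$(basename "$src")"
  echo "$out"
}
```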
---
### 🔄 **Watchtower Not Running**
#### **Symptoms**
- Containers not updating automatically
- Watchtower container in "Created" state
- No Watchtower logs or activity
#### **Solutions**
**🔧 Check Container Status**
```bash
# Check if Watchtower container exists
sudo docker ps -a | grep watchtower
# Check container state
sudo docker inspect watchtower --format '{{.State.Status}}'
```
**🔧 Start Watchtower Container**
```bash
# Start the container if it's stopped
sudo docker start watchtower
# Verify it's running
sudo docker ps | grep watchtower
# Check logs for startup
sudo docker logs watchtower --tail 20
```
**🔧 Test Watchtower API (if enabled)**
```bash
# Test API endpoint (should return 401 if secured)
curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8082/v1/update
# Test with authentication token
curl -H "Authorization: Bearer your-token" http://localhost:8082/v1/update
```
**🔧 Automated Fix Script**
```bash
# Use the automated fix script
./scripts/fix-watchtower-atlantis.sh
```
**📋 Related Documentation**
- Incident Report: `docs/troubleshooting/watchtower-atlantis-incident-2026-02-09.md`
- Fix Script: `scripts/fix-watchtower-atlantis.sh`
- Status Check: `scripts/check-watchtower-status.sh`
---
## 🌐 Tailscale Issues
### LAN host unreachable despite being on the same subnet
**Symptoms:**
- Can ping the gateway but not a specific LAN host
- SMB/NFS mounts time out silently
- `tracert`/`traceroute` to the host loops or times out immediately
- `Find-NetRoute` on Windows shows traffic routing via Tailscale instead of the local interface
**Cause:** A Tailscale node is advertising a subnet route that overlaps your local LAN (e.g. Calypso advertises `192.168.0.0/24`). Any node with `accept_routes: true` installs that route at a lower metric than the local interface, so traffic meant for LAN hosts goes into the Tailscale tunnel instead.
**Diagnose (Linux):**
```bash
# Check policy routing table for Tailscale-installed routes
ip route show table 52 | grep 192.168
# Check which peer is advertising the subnet
tailscale status --json | python3 -c "
import sys, json
d = json.load(sys.stdin)
for peer in d.get('Peer', {}).values():
    routes = peer.get('PrimaryRoutes') or []
    if routes: print(peer['HostName'], routes)
"
```
**Diagnose (Windows):**
```powershell
# Check which interface Windows uses to reach the host
Find-NetRoute -RemoteIPAddress 192.168.0.100 | Select-Object InterfaceAlias, NextHop
# Check route table for the subnet
Get-NetRoute -AddressFamily IPv4 | Where-Object { $_.DestinationPrefix -like '192.168.0*' } |
Select-Object DestinationPrefix, NextHop, RouteMetric, InterfaceAlias
```
**Fix (Linux — immediate):**
```bash
sudo ip route del 192.168.0.0/24 dev tailscale0 table 52
```
**Fix (Linux — permanent):**
Set `accept_routes: false` in Tailscale config. For TrueNAS SCALE app:
```bash
sudo midclt call app.update tailscale '{"values": {"tailscale": {"accept_routes": false, "reset": true}}}'
```
**Fix (Windows — permanent):**
```powershell
tailscale up --accept-routes=false --login-server=https://headscale.vish.gg:8443
```
> **Note:** Nodes that genuinely need remote access to the `192.168.0.0/24` LAN (e.g. off-site VPS, remote laptop) should keep `accept_routes: true`. Nodes that are physically on that LAN should use `accept_routes: false`.
See full incident report: `docs/troubleshooting/guava-smb-incident-2026-03-14.md`
---
### TrueNAS Tailscale app stuck in STOPPED / DEPLOYING after upgrade
**Symptoms:**
- App shows `STOPPED` state after a version upgrade
- App starts deploying but immediately exits
- Container logs show: `Error: changing settings via 'tailscale up' requires mentioning all non-default flags`
**Cause:** After a TrueNAS app version upgrade, the new container's startup script runs `tailscale up` with flags from the app config. If any flag in the stored Tailscale state differs from the app config (e.g. `accept_dns` was `false` at runtime but `true` in the app UI), `tailscale up` refuses to proceed.
**Fix:**
1. Set `reset: true` in the app config to clear the flag mismatch
2. Ensure all app config flags match the intended running state (especially `accept_dns`)
3. Start the app — it will apply a clean `tailscale up --reset ...`
4. Set `reset: false` after the app is running (optional, reset is idempotent)
```bash
sudo midclt call app.update tailscale '{"values": {"tailscale": {
"accept_dns": false,
"accept_routes": false,
"advertise_exit_node": true,
"hostname": "truenas-scale",
"reset": true
}}}'
sudo midclt call app.start tailscale
```
---
## 🔐 Authentik SSO Issues
### Forward Auth redirect loop (`ERR_TOO_MANY_REDIRECTS`)
**Symptoms:** Browser shows infinite redirect loop or `ERR_TOO_MANY_REDIRECTS` when accessing a service protected by Authentik Forward Auth.
**Cause 1 — Missing `X-Original-URL` header in NPM:**
The Authentik outpost returns `500` because it can't detect the original URL. Check the Authentik server logs:
```
failed to detect a forward URL from nginx
```
**Fix:** Add to NPM advanced config for the affected proxy host:
```nginx
auth_request /outpost.goauthentik.io/auth/nginx;
proxy_set_header X-Original-URL $scheme://$http_host$request_uri;
```
**Cause 2 — Empty `cookie_domain` on proxy provider:**
After successful login, Authentik can't set the session cookie correctly so the redirect loop continues.
**Fix:** Set `cookie_domain` on the provider via Authentik API or UI (**Admin → Providers → [provider] → Advanced → Cookie Domain = `vish.gg`**):
```bash
AK_TOKEN="<your-token>"
PK=12 # provider PK
PROVIDER=$(curl -s "https://sso.vish.gg/api/v3/providers/proxy/$PK/" -H "Authorization: Bearer $AK_TOKEN")
UPDATED=$(echo "$PROVIDER" | python3 -c "import sys,json; d=json.load(sys.stdin); d['cookie_domain']='vish.gg'; print(json.dumps(d))")
curl -s -X PUT "https://sso.vish.gg/api/v3/providers/proxy/$PK/" \
-H "Authorization: Bearer $AK_TOKEN" -H "Content-Type: application/json" -d "$UPDATED"
```
> **Rule:** All Forward Auth proxy providers should have `cookie_domain: vish.gg`. If adding a new Forward Auth provider, always set this.
### SSL "not secure" for unproxied domains
Services that need direct internet access (Matrix federation, DERP relays, headscale) must be **unproxied in Cloudflare** (orange cloud off). The Cloudflare Origin Certificate (cert ID 1 in NPM) is only trusted by Cloudflare's edge — direct connections will show "not secure".
**Fix:** Issue a Let's Encrypt cert via Cloudflare DNS challenge:
```bash
ssh matrix-ubuntu # or any host with certbot + cloudflare.ini
sudo certbot certonly --dns-cloudflare \
--dns-cloudflare-credentials /etc/cloudflare.ini \
-d your.domain.vish.gg --email your-email@example.com --agree-tos
```
Then import into NPM as a custom cert and update the proxy host.
See `docs/troubleshooting/matrix-ssl-authentik-incident-2026-03-19.md` for full details.
---
## 🤖 Ansible & Automation Issues
### 📋 **Playbook Failures**
#### **Symptoms**
- Ansible tasks fail with permission errors
- SSH connection failures
- Tasks timeout or hang
#### **Solutions**
**🔧 SSH Connectivity**
```bash
# Test SSH connection
ssh -i ~/.ssh/key user@host
# Check SSH key permissions
chmod 600 ~/.ssh/private_key
# Verify host in known_hosts
ssh-keyscan -H hostname >> ~/.ssh/known_hosts
```
**🔧 Permission Issues**
```bash
# Check sudo permissions
ansible host -m shell -a "sudo whoami"
# Add user to docker group
sudo usermod -aG docker username
```
Fix the Ansible inventory:
```ini
[hosts]
hostname ansible_user=correct_user ansible_become=yes
```
---
## 🔍 Diagnostic Commands
### 🛠️ **Essential Commands**
**Container Diagnostics**
```bash
# List all containers
docker ps -a
# Check container logs
docker logs container-name --tail 50 -f
# Execute commands in container
docker exec -it container-name /bin/bash
# Check container resource usage
docker stats container-name
# Inspect container configuration
docker inspect container-name
```
**Network Diagnostics**
```bash
# Test connectivity
ping hostname
curl -I http://hostname:port
# Check DNS resolution
nslookup hostname
dig hostname
# Check port availability
telnet hostname port
nc -zv hostname port
```
**System Diagnostics**
```bash
# Check system resources
htop
free -h
df -h
# Check service status
systemctl status docker
systemctl status service-name
# Check logs
journalctl -u docker -f
tail -f /var/log/syslog
```
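When filing an issue, the checks above are easier to attach as one file. A sketch (`diag_snapshot` is a hypothetical helper name):

```shell
# diag_snapshot: capture basic system checks into one timestamped file (sketch)
diag_snapshot() {
  out="${1:-/tmp/diag-$(date +%Y%m%d-%H%M%S).txt}"
  {
    echo "== Disk =="
    df -h
    echo "== Memory =="
    free -h 2>/dev/null || echo "free not available"
    echo "== Uptime =="
    uptime
  } > "$out"
  echo "$out"
}
```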
---
## 🆘 Emergency Procedures
### 🚨 **Service Down - Critical**
1. **Immediate Assessment**
   ```bash
   docker ps | grep service-name
   docker logs service-name --tail 20
   ```
2. **Quick Restart**
   ```bash
   docker-compose restart service-name
   # or
   docker-compose down && docker-compose up -d
   ```
3. **Check Dependencies**
   ```bash
   # Verify database is running
   docker ps | grep database
   # Check network connectivity
   docker exec service-name ping database
   ```
4. **Rollback if Needed**
   ```bash
   # Use last known good configuration
   git checkout HEAD~1 -- service-directory/
   docker-compose up -d
   ```
### 🔥 **Multiple Services Down**
1. **Check Host Status**
   ```bash
   # Check system resources
   free -h && df -h
   # Check Docker daemon
   systemctl status docker
   ```
2. **Restart Docker if Needed**
   ```bash
   sudo systemctl restart docker
   docker-compose up -d
   ```
3. **Check Network Issues**
   ```bash
   # Test internet connectivity
   ping 8.8.8.8
   # Check local network
   ping gateway-ip
   ```
---
## 📞 Getting Help
### 🔍 **Where to Look**
1. **Service logs**: Always check container logs first
2. **Official documentation**: Check the service's official docs
3. **GitHub issues**: Search for similar problems
4. **Community forums**: Reddit, Discord, forums
5. **This documentation**: Check other sections
### 📝 **Information to Gather**
- Container logs (`docker logs container-name`)
- System information (`uname -a`, `docker version`)
- Configuration files (sanitized)
- Error messages (exact text)
- Steps to reproduce the issue
### 🏷️ **Common Log Locations**
```bash
# Docker logs
docker logs container-name
# System logs
/var/log/syslog
/var/log/docker.log
# Service-specific logs
/volume1/docker/service/logs/
./logs/
```
---
## 📋 Prevention Tips
### ✅ **Best Practices**
- **Regular backups**: Automate configuration and data backups
- **Monitoring**: Set up alerts for service failures
- **Documentation**: Keep notes on configuration changes
- **Testing**: Test updates in non-production first
- **Version control**: Track configuration changes in Git
### 🔄 **Maintenance Schedule**
- **Daily**: Check service status, review alerts
- **Weekly**: Review logs, check disk space
- **Monthly**: Update containers, review security
- **Quarterly**: Full system backup, disaster recovery test
---
## 🐳 Synology DSM — Docker / gluetun Issues
### gluetun crashes immediately on Synology (`flushing conntrack` error)
**Symptoms**
- gluetun container exits with exit code 1 seconds after starting
- Logs show: `ERROR flushing conntrack: netfilter query: netlink receive: invalid argument`
- Any container using `network_mode: "service:gluetun"` fails with `namespace path: lstat /proc/<PID>/ns/net: no such file or directory`
**Cause**
Synology DSM kernels do not ship the `nf_conntrack_netlink` module (`modprobe nf_conntrack_netlink` fails with "not found"). The gluetun `latest` Docker image (from ~2026-02-23, commit `625a63e`) introduced fatal conntrack flushing that requires this module.
**Fix**
Pin gluetun to `v3.38.0` (last known good version on Synology) and use `privileged: true`:
```yaml
gluetun:
  image: qmcgaw/gluetun:v3.38.0 # do NOT use latest
  privileged: true # replaces cap_add: NET_ADMIN
  devices:
    - /dev/net/tun:/dev/net/tun
  healthcheck:
    test: ["CMD-SHELL", "wget -qO /dev/null http://127.0.0.1:9999 2>/dev/null || exit 1"]
    interval: 10s
    timeout: 5s
    retries: 6
    start_period: 30s
```
For containers sharing gluetun's network (e.g. deluge), use `condition: service_healthy` to avoid the race condition:
```yaml
deluge:
  network_mode: "service:gluetun"
  depends_on:
    gluetun:
      condition: service_healthy
```
**Notes**
- The healthcheck hits gluetun's built-in health server at `127.0.0.1:9999` which returns 200 when the VPN tunnel is up
- The gluetun volume mount (`/gluetun`) overwrites the container's `/gluetun` dir — do **not** use `["CMD", "/gluetun/healthcheck"]` as that binary gets hidden by the mount
- With the WireGuard SPK installed (see below), v3.38.0 uses kernel WireGuard (`Using available kernelspace implementation`); interface is still `tun0` in this version
- `latest` gluetun still crashes even with kernel WireGuard — the `nf_conntrack_netlink` missing module issue is unrelated to WireGuard
---
### Installing native kernel WireGuard on Synology (WireGuard SPK)
Installing the 3rd-party WireGuard SPK gives Docker containers native kernel WireGuard support instead of the slower userspace implementation.
**Atlantis status:** WireGuard SPK v1.0.20220627 installed and running (Feb 2026). No reboot required — loaded cleanly via `synopkg start`.
**Steps for v1000 platform (DS1823xs+), DSM 7.3:**
```bash
# Download SPK
wget 'https://www.blackvoid.club/content/files/2026/02/WireGuard-v1000-73-1.0.20220627.spk' -O /tmp/wireguard.spk
# Install (do NOT check "run after installation" if using DSM UI)
sudo /usr/syno/bin/synopkg install /tmp/wireguard.spk
# Start (fixes privilege and loads kernel module)
sudo /usr/syno/bin/synopkg start WireGuard
# Verify module loaded
lsmod | grep wireguard
```
**Make persistent on boot** — add to `esynoscheduler` DB (or DSM Task Scheduler UI), depends on `Docker mount propagation`:
```sql
INSERT INTO task (task_name, event, enable, owner, operation_type, operation, depend_on_task)
VALUES ('WireGuard module', 'bootup', 1, 0, 'script', '#!/bin/sh
/usr/syno/bin/synopkg start WireGuard', 'Docker mount propagation');
```
**Boot task chain on Atlantis:**
`VPNTUN` (modprobe tun) → `Docker mount propagation` (mount --make-shared /) → `WireGuard module` (synopkg start WireGuard)
**Platform SPK URLs (DSM 7.3):** replace `v1000` with your platform (`r1000`, `geminilake`, `apollolake`, `denverton`, etc.):
`https://www.blackvoid.club/content/files/2026/02/WireGuard-{platform}-73-1.0.20220627.spk`
To find your platform: `cat /etc.defaults/synoinfo.conf | grep platform_name`
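The platform lookup and URL construction can be combined (a sketch; `spk_url` is a hypothetical helper using the DSM 7.3 URL scheme listed above):

```shell
# spk_url: print the WireGuard SPK download URL for the platform named in a
# synoinfo.conf-style file (sketch; defaults to the DSM path)
spk_url() {
  conf="${1:-/etc.defaults/synoinfo.conf}"
  platform=$(sed -n 's/^platform_name="\(.*\)"$/\1/p' "$conf")
  echo "https://www.blackvoid.club/content/files/2026/02/WireGuard-${platform}-73-1.0.20220627.spk"
}
```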
---
### Docker containers fail with `path / is mounted on / but it is not a shared or slave mount`
**Cause**
Synology DSM boots with the root filesystem mount as `private` (no propagation). Docker requires `shared` propagation for containers that use network namespaces or VPN tunnels (e.g. gluetun).
**Fix — temporary (lost on reboot)**
```bash
mount --make-shared /
```
**Fix — permanent (via DSM Task Scheduler)**
Create a new triggered task in DSM → Control Panel → Task Scheduler:
- Type: Triggered (bootup)
- User: root
- Script:
```sh
#!/bin/sh
mount --make-shared /
```
This has been applied to **Atlantis** and **Calypso** via the `esynoscheduler` DB directly. Task name: `Docker mount propagation`.
**Setillo**: must be added manually via the DSM UI (SSH sudo requires interactive terminal).
---
## arr-scripts / Lidarr / Deezer {#arr-scripts-lidarr-deezer}
arr-scripts runs as s6 services inside the Lidarr container. See [lidarr.md](../services/individual/lidarr.md) for the full setup.
### Scripts stuck in "is not ready, sleeping until valid response..." loop
**Cause**: `getArrAppInfo()` reads `arrApiKey` and `arrUrl` from `config.xml` using `xq | jq`. If `xq` was broken when the container first started, the variables are set to empty/wrong values and the `verifyApiAccess()` loop retries forever with stale values — it never re-reads them.
**Fix**: Restart the container. The scripts reinitialize with fresh variable state. If the restart loop persists, check the `xq` issue below first.
### Alpine `xq` vs Python yq `xq` conflict
**Cause**: Alpine's `xq` package (v1.x) outputs XML passthrough instead of converting to JSON. arr-scripts need `cat config.xml | xq | jq -r .Config.ApiKey` to work, which requires Python yq's `xq`.
**Symptom**: `cat /config/config.xml | xq | jq -r .Config.ApiKey` returns a parse error or empty string instead of the API key.
**Check**: `xq --version` inside the container — should show `3.x.x` (Python yq), not `1.x.x` (Alpine).
**Fix** (persistent via scripts_init.bash):
```bash
uv pip install --system --upgrade --break-system-packages yq
```
This installs Python yq's `xq` entry point at `/usr/bin/xq`, overriding Alpine's version.
### "ERROR :: Invalid audioFormat and audioBitrate options set..."
**Cause**: When `audioFormat="native"`, `audioBitrate` must be a word, not a number.
| audioBitrate value | Result |
|---|---|
| `"low"` | 128kbps MP3 (Deezer Free) |
| `"high"` | 320kbps MP3 (Deezer Premium) |
| `"lossless"` | FLAC (Deezer HiFi) |
| `"master"` | MQA (Tidal Master) |
| `"320"` | **INVALID** — causes this error |
**Fix**: In `/volume2/metadata/docker2/lidarr/extended.conf`, set `audioBitrate="high"`.
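The table above can be encoded as a quick pre-restart check (a sketch; `valid_bitrate` is a hypothetical helper, not part of arr-scripts):

```shell
# valid_bitrate: accept only the audioBitrate words the table allows (sketch)
valid_bitrate() {
  case "$1" in
    low|high|lossless|master) return 0 ;;
    *) echo "invalid audioBitrate: $1 (use low|high|lossless|master)" >&2
       return 1 ;;
  esac
}
```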
### "ERROR :: download failed, missing tracks..."
**Cause**: `deemix` is not installed (setup.bash fails silently on Alpine). The script finds a Deezer match but can't execute the download.
**Check**: `which deemix` inside the container — should return `/usr/bin/deemix`.
**Fix** (persistent via scripts_init.bash):
```bash
uv pip install --system --upgrade --break-system-packages deemix
```
### Album title matching always fails — "Calculated Difference () greater than 3"
**Cause**: `pyxdameraulevenshtein` is not installed. The distance calculation in `python -c "from pyxdameraulevenshtein import damerau_levenshtein_distance; ..."` fails silently, leaving `$diff` empty. Every `[ "$diff" -le "$matchDistance" ]` comparison then fails with `[: : integer expected`.
**Check**: `python -c "from pyxdameraulevenshtein import damerau_levenshtein_distance; print(damerau_levenshtein_distance('hello','hello'))"` — should print `0`.
**Fix** (persistent via scripts_init.bash):
```bash
uv pip install --system --upgrade --break-system-packages pyxdameraulevenshtein
```
### Why setup.bash fails to install packages
`setup.bash` uses `uv pip install` to install Python dependencies. On the Alpine version used by the linuxserver/lidarr image, some packages (yq, deemix, pyxdameraulevenshtein) fail to build due to missing setuptools or C extension issues. The failure is silent — setup.bash exits 0 regardless.
**Fix**: `scripts_init.bash` explicitly reinstalls all critical packages after setup.bash runs. This runs on every container start (it's in `custom-cont-init.d`), so it survives container recreates.
### ARL token expired
Deezer ARL tokens expire approximately every 3 months. Symptoms: downloads fail silently or deemix returns 0 tracks.
**Get a new token**:
1. Log in to deezer.com in a browser
2. DevTools → Application → Cookies → `arl` value
3. Update in `/volume2/metadata/docker2/lidarr/extended.conf`: `arlToken="..."`
4. Restart the lidarr container
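Step 3 can be done with a one-line in-place edit (a sketch; `set_arl` is a hypothetical helper — back the file up first, and note it assumes the token contains no `/` characters, which holds for hex ARL values):

```shell
# set_arl: replace the arlToken line in an extended.conf-style file (sketch)
set_arl() {
  sed -i "s/^arlToken=.*/arlToken=\"$2\"/" "$1"
}
# e.g. set_arl /volume2/metadata/docker2/lidarr/extended.conf "NEW_ARL_VALUE"
```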
### Checking arr-scripts service status
```bash
# Via Portainer console exec or SSH into container:
s6-svstat /run/service/custom-svc-Audio
s6-svstat /run/service/custom-svc-ARLChecker
# View live logs
docker logs lidarr -f
# Per-service log files inside container
ls /config/logs/Audio-*.txt
tail -f "$(ls -t /config/logs/Audio-*.txt | head -1)"
```
---
## 📋 Next Steps
- **[Diagnostic Tools](diagnostics.md)**: Advanced troubleshooting tools
- **[Performance Tuning](performance.md)**: Optimize your services
- **[Emergency Procedures](emergency.md)**: Handle critical failures
- **[Monitoring Setup](../admin/monitoring.md)**: Prevent issues with monitoring
---
*Remember: Most issues have simple solutions. Start with the basics (logs, connectivity, resources) before diving into complex troubleshooting.*