# Tdarr
**🟢 Media Service**
## 📋 Service Overview
| Property | Value |
|----------|-------|
| **Service Name** | tdarr |
| **Host** | Atlantis (Synology) - Server |
| **Category** | Media |
| **Difficulty** | 🟡 |
| **Docker Image** | `ghcr.io/haveagitgat/tdarr:latest` |
| **Compose File** | `hosts/synology/atlantis/arr-suite/docker-compose.yml` |
| **Directory** | `hosts/synology/atlantis/arr-suite` |
## 🎯 Purpose
Tdarr is a distributed transcoding system for automating media library optimization. It can automatically convert your media files to preferred codecs (like H.265/HEVC), remove unwanted audio tracks, and optimize file sizes while maintaining quality.
## 🖥️ Multi-Node Architecture
The Tdarr setup uses distributed worker nodes for parallel transcoding:
| Node | Host | Type | Hardware | Workers |
|------|------|------|----------|---------|
| **TdarrInternalNode** | Atlantis (Synology DS1621+) | CPU | AMD Ryzen | 1 transcode, 2 healthcheck |
| **NUC-QSV** | Proxmox LXC 103 | GPU | Intel QSV | 1 GPU transcode, 1 healthcheck |
| **Calypso-CPU** | Calypso (Synology DS723+) | CPU | AMD Ryzen R1600 | 2 transcode |
| **Guava-VAAPI** | Guava (TrueNAS Scale) | GPU | AMD Radeon 760M (VAAPI, hevc only) | 1 GPU transcode, 1 healthcheck |
### Node Configuration Files
- **Server**: `hosts/synology/atlantis/arr-suite/docker-compose.yml`
- **NUC-QSV**: `hosts/proxmox/lxc/tdarr-node/docker-compose.yaml`
- **Calypso-CPU**: `hosts/synology/calypso/tdarr-node/docker-compose.yaml`
- **Guava-VAAPI**: `hosts/truenas/guava/tdarr-node/docker-compose.yaml`
### Cache Path Configuration (Critical!)
All nodes **must** mount both `/temp` and `/cache` to the same cache directory to avoid path mismatch errors:
```yaml
volumes:
  - /path/to/cache:/temp
  - /path/to/cache:/cache   # Both must point to the same location!
```
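A quick way to verify the two mounts actually share one backing directory is to compare device and inode numbers. This is a minimal sketch (run it inside a node container against `/temp` and `/cache`; the demo below uses the current directory so it runs anywhere):

```python
import os

def same_backing(path_a: str, path_b: str) -> bool:
    """True when both paths resolve to the same filesystem object
    (same device number and same inode)."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

# Inside a running node container, the cache mounts should satisfy:
#   same_backing('/temp', '/cache')  -> True
print(same_backing(os.getcwd(), os.getcwd()))  # -> True (same directory)
```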
## 🚀 Quick Start
### Prerequisites
- Docker and Docker Compose installed
- Media library accessible
- Fast storage for transcoding cache (NVMe recommended)
- Access to the host system (Atlantis)
### Deployment
```bash
# Navigate to service directory
cd hosts/synology/atlantis/arr-suite
# Start the service
docker-compose -f docker-compose.yml up -d tdarr
# Check service status
docker-compose -f docker-compose.yml ps
# View logs
docker-compose -f docker-compose.yml logs -f tdarr
```
## 🔧 Configuration
### Docker Compose Configuration
```yaml
tdarr:
  image: ghcr.io/haveagitgat/tdarr:latest
  container_name: tdarr
  environment:
    - PUID=1029
    - PGID=100
    - TZ=America/Los_Angeles
    - UMASK=022
    - serverIP=0.0.0.0
    - serverPort=8266
    - webUIPort=8265
    - internalNode=true
    - inContainer=true
    - ffmpegVersion=6
    - nodeName=TdarrInternalNode
  volumes:
    - /volume2/metadata/docker2/tdarr/server:/app/server
    - /volume2/metadata/docker2/tdarr/configs:/app/configs
    - /volume2/metadata/docker2/tdarr/logs:/app/logs
    - /volume1/data/media:/media
    - /volume3/usenet/tdarr_cache:/temp
  ports:
    - "8265:8265"
    - "8266:8266"
  networks:
    media2_net:
      ipv4_address: 172.24.0.15
  security_opt:
    - no-new-privileges:true
  restart: always
```
### Environment Variables
| Variable | Value | Description |
|----------|-------|-------------|
| `PUID` | `1029` | User ID for file permissions |
| `PGID` | `100` | Group ID for file permissions |
| `TZ` | `America/Los_Angeles` | Timezone setting |
| `UMASK` | `022` | File permission mask |
| `serverIP` | `0.0.0.0` | Server bind address |
| `serverPort` | `8266` | Server communication port |
| `webUIPort` | `8265` | Web UI port |
| `internalNode` | `true` | Enable built-in transcoding node |
| `inContainer` | `true` | Running in container mode |
| `ffmpegVersion` | `6` | FFmpeg version to use |
| `nodeName` | `TdarrInternalNode` | Name for the internal node |
### Port Mappings
| Host Port | Container Port | Protocol | Purpose |
|-----------|----------------|----------|----------|
| 8265 | 8265 | TCP | Web UI |
| 8266 | 8266 | TCP | Server communication |
### Volume Mappings
| Host Path | Container Path | Type | Purpose |
|-----------|----------------|------|----------|
| `/volume2/metadata/docker2/tdarr/server` | `/app/server` | bind | Server data |
| `/volume2/metadata/docker2/tdarr/configs` | `/app/configs` | bind | Configuration files |
| `/volume2/metadata/docker2/tdarr/logs` | `/app/logs` | bind | Log files |
| `/volume1/data/media` | `/media` | bind | Media library |
| `/volume3/usenet/tdarr_cache` | `/temp` | bind | Transcode cache (NVMe) |
## 🌐 Access Information
| Interface | URL |
|-----------|-----|
| Web UI | `http://192.168.0.200:8265` |
| Server | `http://192.168.0.200:8266` |
## 🔒 Security Considerations
- ✅ Security options configured (no-new-privileges)
- ✅ Running with specific user/group IDs
- ⚠️ Ensure media permissions are correctly set
## 📊 Resource Requirements
### Recommended Resources
- **Minimum RAM**: 4GB (8GB+ recommended for transcoding)
- **Recommended RAM**: 8GB+
- **CPU**: 4+ cores (transcoding is CPU-intensive)
- **GPU**: Optional but highly recommended for hardware transcoding
- **Storage**: Fast NVMe for cache (improves transcode speed significantly)
### Resource Monitoring
```bash
docker stats tdarr
```
## 🔍 Health Monitoring
### Manual Health Checks
Each host requires a different method to access Docker. Synology NAS systems have Docker at `/usr/local/bin/docker` and require `sudo`:
```bash
# Check tdarr server (Atlantis - Synology)
ssh atlantis "sudo /usr/local/bin/docker ps --filter name=tdarr --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'"
# Check node on Calypso (Synology)
ssh calypso "sudo /usr/local/bin/docker ps --filter name=tdarr --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'"
# Check node on Guava (TrueNAS Scale) — user is in docker group, no sudo needed
ssh guava "docker ps --filter name=tdarr --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'"
# Check node on NUC-QSV — runs inside Proxmox LXC 103, not on the NUC host directly
ssh pve "pct exec 103 -- docker ps --filter name=tdarr --format 'table {{.Names}}\t{{.Status}}\t{{.RunningFor}}'"
# Check web UI is responding
curl -s http://192.168.0.200:8265/api/v2/status
```
### Expected Container Names and Uptime
| Host | Container Name | SSH Alias | Notes |
|------|---------------|-----------|-------|
| Atlantis | `tdarr` | `atlantis` | Server + TdarrInternalNode |
| Calypso | `tdarr-node-calypso` | `calypso` | CPU-only node |
| Guava | `tdarr-node-guava` | `guava` | VAAPI GPU node |
| NUC (Proxmox LXC 103) | `tdarr-node` | `pve` + `pct exec 103` | Intel QSV GPU node |
### Portainer Access
- URL: `https://192.168.0.200:9443`
- Endpoint IDs: Atlantis=2, vish-concord-nuc=443395, Calypso=443397
- Note: Guava (TrueNAS) and NUC LXC 103 are not managed by Portainer
- NUC's Portainer endpoint (443395) shows the NUC host Docker — the tdarr-node runs in Proxmox LXC 103 with its own Docker socket, which is not visible there
### Node Worker Configuration
| Node | HC-CPU | TC-CPU | TC-GPU | Notes |
|------|--------|--------|--------|-------|
| TdarrInternalNode | 1 | 0 | 0 | Health checks only |
| NUC-QSV | 1 | 0 | 1 | Intel QSV GPU transcoding |
| Calypso-CPU | 1 | 0 | 0 | Health checks only (CPU transcode disabled) |
| Guava-VAAPI | 1 | 0 | 1 | AMD VAAPI GPU transcoding |
## 🚨 Troubleshooting
### Common Issues
**Transcoding stuck or slow**
- Check cache disk space: `df -h /volume3/usenet/tdarr_cache`
- Verify media permissions
- Check CPU/memory usage: `docker stats tdarr`
- Consider adding GPU transcoding
**Files not appearing**
- Verify media library path is correct
- Check file permissions (PUID/PGID)
- Scan library manually in Tdarr UI
**Node offline**
- Check container status on each host using the SSH commands in the Health Monitoring section above
- Atlantis/Calypso (Synology): `ssh <host> "sudo /usr/local/bin/docker logs tdarr-node-<name>"`
- Guava: `ssh guava "docker logs tdarr-node-guava"`
- NUC: `ssh pve "pct exec 103 -- docker logs tdarr-node"`
- All nodes lose connection when the Tdarr server restarts; they reconnect automatically within ~30 seconds
**Node schedule showing all zeros (no workers active)**
- The per-node schedule overrides worker limits. If all 24 hours are set to 0 workers, nothing will process even if the base worker limits are configured.
- Fix via Tdarr UI (Nodes tab → schedule grid) or via API:
```bash
curl -s -X POST http://192.168.0.200:8265/api/v2/update-node \
  -H 'Content-Type: application/json' \
  -d '{"data":{"nodeID":"<NODE_ID>","nodeUpdates":{"schedule":[...]}}}'
```
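Building the request body in a script is less error-prone than hand-editing JSON. The sketch below assumes the schedule is a 24-element array (one entry per hour) with per-hour worker counts; the exact entry shape is an assumption, so fetch an existing node object from your server and mirror its schedule entries before POSTing anything:

```python
import json

# Hypothetical helper -- entry fields ("hour", "transcode", "healthcheck")
# are an assumed shape, not a documented Tdarr schema. Verify against a
# real node object returned by your server first.
def build_schedule(transcode_workers: int, healthcheck_workers: int) -> list:
    """24 entries, one per hour, all enabling the given worker counts."""
    return [
        {"hour": h, "transcode": transcode_workers,
         "healthcheck": healthcheck_workers}
        for h in range(24)
    ]

payload = {
    "data": {
        "nodeID": "<NODE_ID>",  # placeholder, as in the curl example above
        "nodeUpdates": {"schedule": build_schedule(2, 1)},
    }
}
print(len(payload["data"]["nodeUpdates"]["schedule"]))  # -> 24
print(json.dumps(payload)[:60], "...")
```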
**Transcodes complete but file not replaced (NFS nodes)**
- Remote nodes (Guava, Calypso) access media via NFS from Atlantis
- Atlantis NFS export for `/volume1/data` uses `all_squash,anonuid=1024` by default for 192.168.0.0/24, which maps all writes to uid 1024
- Media files are owned by uid 1029 — uid 1024 cannot write to them
- **Fix**: In DSM → Shared Folder → data → NFS Permissions, add a host-specific rule for the node's IP (e.g., 192.168.0.100 for Guava) with **No mapping** (no_all_squash) so the node can write as its actual UID
- The cache export (`/volume3/usenet`) already has `no_all_squash` so cache writes always work
**Flow edge missing — transcodes succeed but marked as error**
- If a flow plugin's output has no outgoing edge AND the working file is still in the transcode cache, Tdarr marks the job as "Transcode error"
- The fix: wire unconnected outputs to `setOriginalFile` — this resets the working file pointer to the original library path (not in cache), letting the job close cleanly as "Transcode success"
- See "Resetting errored files" in the Flows section to re-queue affected files
**Guava-VAAPI encoder limitations**
- AMD Radeon 760M (Phoenix) only supports `hevc_vaapi` encoding
- `h264_vaapi` and `av1_vaapi` both fail (exit code 228)
- The flow uses `hardwareType: auto` — Tdarr probes available hardware per node and selects the appropriate encoder automatically (VAAPI on Guava, QSV on NUC, CPU on others)
- Guava logs may show some encoder probe failures before settling on `hevc_vaapi` — this is expected
**Server/node version mismatch (Homarr widget ETIMEDOUT / nodes offline)**
- Tdarr server and all nodes must run the **same version**. A mismatch causes nodes to fail to connect, and the Homarr widget returns `ETIMEDOUT` because the server is effectively unreachable.
- The image tag is `:latest` on all compose files — but `docker pull` alone is not enough if the old container is still running the old image.
- **Fix**: Pull + stop + remove + recreate. On each affected host:
```bash
# On Atlantis (server)
ssh atlantis "sudo /usr/local/bin/docker pull ghcr.io/haveagitgat/tdarr:latest && \
  sudo /usr/local/bin/docker stop tdarr && \
  sudo /usr/local/bin/docker rm tdarr && \
  sudo /usr/local/bin/docker compose -f /volume1/docker/arr-suite/docker-compose.yml up -d tdarr"
# On Calypso (node)
ssh calypso "sudo /usr/local/bin/docker pull ghcr.io/haveagitgat/tdarr_node:latest && \
  sudo /usr/local/bin/docker stop tdarr-node-calypso && \
  sudo /usr/local/bin/docker rm tdarr-node-calypso && \
  sudo /usr/local/bin/docker compose -f /volume1/docker/tdarr-node/docker-compose.yaml up -d"
```
- After updating, all 4 nodes reconnect within ~30 seconds.
### Useful Commands
```bash
# View real-time logs (run on Atlantis via SSH)
ssh atlantis "sudo /usr/local/bin/docker logs -f tdarr"
# Restart server
ssh atlantis "sudo /usr/local/bin/docker restart tdarr"
# Restart a node
ssh calypso "sudo /usr/local/bin/docker restart tdarr-node-calypso"
ssh guava "docker restart tdarr-node-guava"
ssh pve "pct exec 103 -- docker restart tdarr-node"
# Update server image (pull + stop + rm + recreate to ensure new image is used)
ssh atlantis "sudo /usr/local/bin/docker pull ghcr.io/haveagitgat/tdarr:latest && \
  sudo /usr/local/bin/docker stop tdarr && sudo /usr/local/bin/docker rm tdarr && \
  sudo /usr/local/bin/docker compose -f /volume1/docker/arr-suite/docker-compose.yml up -d tdarr"
# Access server shell (ssh -t allocates the TTY that docker exec -it needs)
ssh -t atlantis "sudo /usr/local/bin/docker exec -it tdarr /bin/bash"
```
## 🔄 Tdarr Flows
The system uses **Tdarr Flows** (instead of the Classic Plugin Stack) for GPU/CPU fallback:
### Flow: "HEVC GPU with CPU Fallback"
Flow ID: `IMZomXmXOI` | DB: `flowsjsondb` on Atlantis
```
Input File
  ↓
Begin Command (ffmpegCommandStart)
  ↓
Set Video Encoder — Auto GPU, CRF/QP 20 (ffmpegCommandSetVideoEncoder)
  ↓  hardwareType: auto, hardwareEncoding: true
Execute (ffmpegCommandExecute)
  ↓
Compare File Size Ratio (compareFileSizeRatio) — 50-100% of original
  ├─ output 1: within range (file shrank) → Replace Original File
  └─ output 2: out of range (file grew)   → Set Original File
                                            (resets working file to original,
                                             discards transcode from cache,
                                             completes as Transcode success)
```
**Flow Plugins:**
| Plugin | ID | Key Settings |
|--------|----|-------------|
| `inputFile` | `7IuuWhx9FF` | — |
| `ffmpegCommandStart` | `cmd_start` | — |
| `ffmpegCommandSetVideoEncoder` | `cmd_encoder` | `outputCodec: hevc`, `hardwareType: auto`, `hardwareEncoding: true`, `hardwareDecoding: true`, `ffmpegQuality: 20`, `forceEncoding: false` |
| `ffmpegCommandExecute` | `cmd_execute` | — |
| `compareFileSizeRatio` | `size_check` | `greaterThan: 50`, `lessThan: 100` |
| `replaceOriginalFile` | `JhCV_UZp7` | output 1 path (file smaller) |
| `setOriginalFile` | `keep_original` | output 2 path (file larger) |
**How it works:**
1. `hardwareType: auto` — Tdarr selects the best available encoder per node: QSV on NUC, VAAPI on Guava, CPU on Atlantis/Calypso
2. `ffmpegQuality: 20` — maps to `-global_quality 20` (QSV), `-qp 20` (VAAPI), `-crf 20` (CPU) — high quality, visually lossless
3. `forceEncoding: false` — files already in HEVC are skipped (marked Not Required)
4. `compareFileSizeRatio` — only replaces if output is 50-100% of original size
5. `setOriginalFile` on output 2 — when the encode is larger, resets the working file pointer back to the original library file (not in cache), allowing the job to complete as "Transcode success" instead of "Transcode error". The oversized transcode in cache is discarded.
> **Note:** Do NOT leave output 2 of `compareFileSizeRatio` unconnected. Tdarr requires that the final working file not be in the cache — an unconnected output 2 leaves the transcode in the cache, and Tdarr marks the job as "Transcode error" even though the original file was never replaced.
> **Warning:** Do NOT set `hardwareDecoding: false`. VAAPI encoding requires hardware context (`-hwaccel vaapi -hwaccel_device ...`) that the plugin only adds when `hardwareDecoding: true`. Disabling it breaks Guava entirely — all jobs fail immediately with empty video output. The NUC's QSV decode also works better with it enabled. To handle codecs the hardware decoder can't decode (e.g., AV1 on QSV), exclude them at the library level instead (see Library Codec Filter below).
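The size gate at the heart of the flow amounts to a simple ratio check. This sketch mirrors the `greaterThan: 50` / `lessThan: 100` settings above (it is an illustration of the decision, not Tdarr's actual implementation; strict inequality at the bounds is an assumption):

```python
def size_ratio_decision(original_bytes: int, transcoded_bytes: int,
                        greater_than: float = 50, less_than: float = 100) -> str:
    """Mirror the compareFileSizeRatio gate: replace only when the new file
    lands between greater_than% and less_than% of the original size."""
    ratio = transcoded_bytes / original_bytes * 100
    if greater_than < ratio < less_than:
        return "replace original"  # output 1: file shrank, within range
    return "keep original"         # output 2: grew, or suspiciously tiny

print(size_ratio_decision(1000, 700))   # 70%  -> replace original
print(size_ratio_decision(1000, 1200))  # 120% -> keep original
print(size_ratio_decision(1000, 300))   # 30%  -> keep original (likely broken encode)
```

The lower bound matters: an encode that comes out at 30% of the original is more likely a broken stream than a miracle, so it is kept out of range too.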
**Modifying the flow:** The flow is stored directly in the SQLite DB. To update it:
```bash
ssh atlantis "python3 << 'PYEOF'
import sqlite3, json, time
db = '/volume2/metadata/docker2/tdarr/server/Tdarr/DB2/SQL/database.db'
# ... build new_flow dict ...
conn = sqlite3.connect(db)
cur = conn.cursor()
cur.execute('UPDATE flowsjsondb SET json_data = ?, timestamp = ? WHERE id = ?',
            (json.dumps(new_flow), int(time.time()*1000), 'IMZomXmXOI'))
conn.commit()
conn.close()
PYEOF
"
ssh atlantis "sudo /usr/local/bin/docker restart tdarr"
```
> **Warning:** `checkFileSize` (the similarly-named built-in plugin) checks absolute file size (0-10,000 GB by default), NOT size relative to the original. It will pass every file and always replace. Use `compareFileSizeRatio` instead.
### Library Codec Filter
All three libraries (other, tv, anime) have a Classic Plugin Stack pre-filter that prevents certain codecs from being queued for transcoding:
| Library | `codecsToNotProcess` |
|---------|----------------------|
| other | `hevc,h265,av1` |
| tv | `hevc,h265,av1` |
| anime | `hevc,h265,av1` |
**Why AV1 is excluded:** AV1 is a newer, more efficient codec than HEVC. Converting AV1 → HEVC would produce larger files at the same quality. AV1 hardware decoding is also not supported on the NUC's Intel QSV, causing exit code 69 failures. AV1 files are left as-is.
To inspect the current codec filter on each library (the script below reads and prints; it does not modify anything):
```bash
ssh atlantis "python3 << 'PYEOF'
import sqlite3, json, time
db = '/volume2/metadata/docker2/tdarr/server/Tdarr/DB2/SQL/database.db'
conn = sqlite3.connect(db)
cur = conn.cursor()
cur.execute('SELECT id, json_data FROM librarysettingsjsondb')
for row in cur.fetchall():
    lib = json.loads(row[1])
    for p in lib.get('pluginIDs', []):
        if p.get('id') == 'Tdarr_Plugin_00td_filter_by_codec':
            print(lib['name'], '->', p['InputsDB']['codecsToNotProcess'])
conn.close()
PYEOF
"
```
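Actually updating the filter follows the same pattern, with an UPDATE added. The sketch below is hedged: it assumes the two-column layout (`id`, `json_data`) the inspection script uses, and it is exercised here against a disposable in-memory database rather than the live one — back up `database.db` and stop the server before running anything like this against the real file:

```python
import json
import sqlite3

def add_skip_codec(conn: sqlite3.Connection, codec: str) -> None:
    """Append `codec` to codecsToNotProcess for every library carrying the
    filter plugin. Assumes the (id, json_data) schema shown above."""
    cur = conn.cursor()
    cur.execute('SELECT id, json_data FROM librarysettingsjsondb')
    for lib_id, raw in cur.fetchall():
        lib = json.loads(raw)
        for p in lib.get('pluginIDs', []):
            if p.get('id') == 'Tdarr_Plugin_00td_filter_by_codec':
                codecs = p['InputsDB']['codecsToNotProcess'].split(',')
                if codec not in codecs:
                    codecs.append(codec)
                    p['InputsDB']['codecsToNotProcess'] = ','.join(codecs)
        cur.execute('UPDATE librarysettingsjsondb SET json_data = ? WHERE id = ?',
                    (json.dumps(lib), lib_id))
    conn.commit()

# Demo against a throwaway in-memory DB shaped like the real table:
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE librarysettingsjsondb (id TEXT, json_data TEXT)')
conn.execute('INSERT INTO librarysettingsjsondb VALUES (?, ?)', ('lib1', json.dumps({
    'name': 'tv',
    'pluginIDs': [{'id': 'Tdarr_Plugin_00td_filter_by_codec',
                   'InputsDB': {'codecsToNotProcess': 'hevc,h265,av1'}}],
})))
add_skip_codec(conn, 'vp9')
row = conn.execute('SELECT json_data FROM librarysettingsjsondb').fetchone()
print(json.loads(row[0])['pluginIDs'][0]['InputsDB']['codecsToNotProcess'])
# -> hevc,h265,av1,vp9
```

Restart the Tdarr server after any direct DB edit so it reloads library settings.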
### Resetting errored files
After fixing a flow or filter issue, reset all "Transcode error" files back to Queued:
```bash
ssh atlantis "python3 << 'PYEOF'
import sqlite3
db = '/volume2/metadata/docker2/tdarr/server/Tdarr/DB2/SQL/database.db'
conn = sqlite3.connect(db)
cur = conn.cursor()
cur.execute(\"UPDATE filejsondb SET json_data = json_set(json_data, '$.TranscodeDecisionMaker', 'Queued') WHERE json_extract(json_data, '$.TranscodeDecisionMaker') = 'Transcode error'\")
print(f'Reset {cur.rowcount} files')
conn.commit()
conn.close()
PYEOF
"
```
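The UPDATE above relies on SQLite's JSON1 functions (`json_set`, `json_extract`), which are standard in recent Python builds. You can sanity-check the exact statement against a scratch table with made-up rows before touching the live DB:

```python
import json
import sqlite3

# Scratch DB mimicking filejsondb: json_data holds a per-file JSON blob.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE filejsondb (json_data TEXT)')
for status in ('Transcode error', 'Transcode success', 'Transcode error'):
    conn.execute('INSERT INTO filejsondb VALUES (?)',
                 (json.dumps({'TranscodeDecisionMaker': status}),))

# The same statement the reset script runs on Atlantis:
cur = conn.execute(
    "UPDATE filejsondb SET json_data = json_set(json_data, "
    "'$.TranscodeDecisionMaker', 'Queued') WHERE "
    "json_extract(json_data, '$.TranscodeDecisionMaker') = 'Transcode error'")
print(f'Reset {cur.rowcount} files')  # -> Reset 2 files

queued = conn.execute(
    "SELECT COUNT(*) FROM filejsondb WHERE "
    "json_extract(json_data, '$.TranscodeDecisionMaker') = 'Queued'").fetchone()[0]
print(queued)  # -> 2
```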
### Enabling Flows on a Library
1. Go to **Libraries** → Select library
2. **Transcode Options** tab
3. Toggle **Flows: ON**
4. Toggle **Classic Plugin Stack: OFF**
5. Select the flow: "HEVC GPU with CPU Fallback"
## 🔧 Adding New Worker Nodes
### NFS Permissions for Remote Nodes
Remote nodes access media and cache via NFS from Atlantis. The NFS export settings on Atlantis must allow the node to write:
| Export | Path | Squash | Notes |
|--------|------|--------|-------|
| `/volume1/data` | Media library | `all_squash,anonuid=1024` (subnet default) | Must add per-host rule with **No mapping** for nodes that need write access |
| `/volume3/usenet` | Transcode cache | `no_all_squash` | Writable by all nodes |
To add write access for a new node: DSM → Control Panel → Shared Folder → data → Edit → NFS Permissions → Create rule for the node's IP with **No mapping**, **Read/Write**, **async**, **non-privileged ports allowed**.
### Adding a CPU-only Node (e.g., Synology NAS)
1. **Set up NFS mounts** to access media and cache:
```bash
# Create mount points
mkdir -p /mnt/atlantis_media /mnt/atlantis_cache
# Mount NFS shares
mount -t nfs 192.168.0.200:/volume1/data/media /mnt/atlantis_media -o rw,soft,nfsvers=3
mount -t nfs 192.168.0.200:/volume3/usenet/tdarr_cache /mnt/atlantis_cache -o rw,soft,nfsvers=3
```
2. **Create docker-compose.yaml**:
```yaml
services:
  tdarr-node:
    image: ghcr.io/haveagitgat/tdarr_node:latest
    container_name: tdarr-node-<hostname>
    environment:
      - PUID=1029
      - PGID=100
      - TZ=America/Los_Angeles
      - nodeName=<NodeName>
      - serverIP=192.168.0.200
      - serverPort=8266
      - inContainer=true
      - ffmpegVersion=6
    volumes:
      - ./configs:/app/configs
      - ./logs:/app/logs
      - /mnt/atlantis_media:/media
      - /mnt/atlantis_cache:/temp
      - /mnt/atlantis_cache:/cache
    restart: always
```
3. **Configure workers** in Tdarr UI:
- Go to **Nodes** tab
- Set transcode CPU workers (2-4 recommended)
- Set healthcheck CPU workers (1-2 recommended)
### Adding a GPU Node (Intel QSV)
Same as above, but add device passthrough:
```yaml
devices:
  - /dev/dri:/dev/dri  # Intel QSV
```
## 📚 Additional Resources
- **Official Documentation**: [Tdarr Wiki](https://docs.tdarr.io/)
- **GitHub**: [HaveAGitGat/Tdarr](https://github.com/HaveAGitGat/Tdarr)
- **Discord**: Active community support
## 🔗 Related Services
Services that work alongside Tdarr:
- Plex
- Jellyfin
- Sonarr
- Radarr
---
*Last Updated*: 2026-03-10 (version sync troubleshooting, update procedure)
*Configuration Source*: `hosts/synology/atlantis/arr-suite/docker-compose.yml`