
Watchtower Atlantis Incident Report - February 9, 2026

📋 Incident Summary

| Field | Value |
|---|---|
| Date | February 9, 2026 |
| Time | 01:45 PST |
| Severity | Medium |
| Status | RESOLVED |
| Affected Service | Watchtower (Atlantis) |
| Duration | ~15 minutes |
| Reporter | User |
| Resolver | OpenHands Agent |

🚨 Problem Description

Issue: The Watchtower container on the Atlantis server was not running, so no automatic Docker container updates were taking place.

Symptoms:

  • Watchtower container in "Created" state but not running
  • No automatic container updates occurring
  • Container logs empty (never started)

🔍 Root Cause Analysis

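Primary Cause: The container had been created but never started. The exact trigger was not confirmed. Likely candidates:

  • A system restart after which the container was never started
  • A Docker daemon restart that did not auto-start the container
  • A manual container stop without a restart (this fits the evidence less well: a container that has run and then been stopped normally reports "Exited", not "Created")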

Contributing Factors:

  • Docker commands on Atlantis required sudo for this user (the first "docker ps" attempt failed with "permission denied")
  • The container was properly configured. It simply was not running.

🛠️ Resolution Steps

1. Initial Diagnosis

# Connected to Atlantis server via SSH
ssh atlantis

# Attempted to check container status (permission denied)
docker ps -a | grep -i watchtower
# Error: permission denied while trying to connect to Docker daemon socket

# Used sudo to check container status
sudo docker ps -a | grep -i watchtower
# Found: Container in "Created" state, not running

2. Container Analysis

# Checked container logs (empty - never started)
sudo docker logs watchtower

# Verified container configuration
sudo docker inspect watchtower | grep -A 5 -B 5 "RestartPolicy"
# Confirmed: RestartPolicy set to "always" (correct)
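Grepping "docker inspect" with context lines works, but a Go template passed to --format returns only the field you need. The sketch below is minimal and assumes the container is named watchtower. DOCKER is a hypothetical override that defaults to "sudo docker", because Docker needed sudo on Atlantis.

```bash
# Print the restart policy name: "no", "always", "unless-stopped" or "on-failure".
# DOCKER is a hypothetical override; it defaults to "sudo docker".
restart_policy() {
    ${DOCKER:-sudo docker} inspect --format '{{.HostConfig.RestartPolicy.Name}}' "$1"
}

# Succeed only when the policy is the expected "always".
check_restart_policy() {
    local policy
    policy=$(restart_policy "$1") || return 1
    if [ "$policy" != "always" ]; then
        echo "WARN: $1 restart policy is '$policy', expected 'always'"
        return 1
    fi
    echo "OK: $1 restart policy is always"
}
```

For the container described here, "check_restart_policy watchtower" should print "OK: watchtower restart policy is always".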

3. Resolution Implementation

# Started the Watchtower container
sudo docker start watchtower
# Result: watchtower (container started successfully)

# Verified container is running
sudo docker ps | grep watchtower
# Result: Container running and healthy

4. Functionality Verification

# Checked container logs for proper startup
sudo docker logs watchtower --tail 20
# Confirmed: Watchtower 1.7.1 started successfully
# Confirmed: HTTP API enabled on port 8080 (mapped to 8082)
# Confirmed: Checking all containers enabled

# Tested HTTP API (without authentication)
curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8082/v1/update
# Result: HTTP 401 (expected - API requires authentication)

# Verified API token configuration
sudo docker inspect watchtower | grep -i "api\|token\|auth" -A 2 -B 2
# Found: WATCHTOWER_HTTP_API_TOKEN="REDACTED_HTTP_TOKEN"
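The same 401 check can be scripted to print only the status code, which makes it easy to reuse in monitoring. This is a minimal sketch. The default URL uses host port 8082 from this report, and the function names are hypothetical. A code of 000 means nothing answered on the port.

```bash
# Print the HTTP status code the Watchtower update endpoint returns for a
# request sent WITHOUT a token. The expected answer is 401.
api_status_code() {
    local url="${1:-http://localhost:8082/v1/update}"
    # -s hides progress, -o discards the body, -w prints only the status code,
    # --max-time stops the check from hanging on an unresponsive port.
    curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url"
}

# Succeed only when unauthenticated requests are rejected with 401.
check_api_requires_auth() {
    local code
    # curl exits non-zero but still prints "000" when nothing answers.
    code=$(api_status_code "${1:-}") || true
    if [ "$code" = "401" ]; then
        echo "OK: API rejects unauthenticated requests"
    else
        echo "WARN: expected HTTP 401, got $code"
        return 1
    fi
}
```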

Current Status

Container Status: Running and Healthy

  • Container ID: 9f8fee3fbcea
  • Status: Up and running (healthy)
  • Uptime: Stable since fix
  • Port Mapping: 8082:8080 (HTTP API accessible)

Configuration Verified:

  • Restart Policy: always (will auto-start on reboot)
  • HTTP API: Enabled with authentication token
  • Cleanup: Enabled (removes old images)
  • Rolling Restart: Enabled (minimizes disruption)
  • Timeout: 10s in the running container (graceful shutdown; the repository config specifies 30s)

API Access:

  • URL: http://atlantis:8082/v1/update
  • Authentication: Bearer token watchtower-update-token
  • Status: Functional and secured

🔧 Configuration Details

Current Watchtower Configuration

# From running container inspection
Environment:
  - WATCHTOWER_POLL_INTERVAL=3600
  - WATCHTOWER_TIMEOUT=10s
  - WATCHTOWER_HTTP_API_UPDATE=true
  - WATCHTOWER_HTTP_API_TOKEN="REDACTED_HTTP_TOKEN"
  - TZ=America/Los_Angeles

Restart Policy: always
Port Mapping: 8082:8080
Volume Mounts: /var/run/docker.sock:/var/run/docker.sock:ro
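Grepping "docker inspect" output for keywords, as in step 4, pulls unrelated fields into the result. The minimal sketch below prints only the environment, one variable per line. A second variant masks token values so the output can be shared safely. DOCKER is a hypothetical override that defaults to "sudo docker".

```bash
# Print the container's environment, one KEY=VALUE per line, using a Go
# template instead of grepping the full JSON.
# DOCKER is a hypothetical override; it defaults to "sudo docker".
running_env() {
    ${DOCKER:-sudo docker} inspect \
        --format '{{range .Config.Env}}{{println .}}{{end}}' "$1"
}

# Same listing with the value of any variable whose name contains TOKEN masked.
running_env_masked() {
    running_env "$1" | sed -E 's/^([^=]*TOKEN[^=]*)=.*/\1=****/'
}
```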

Differences from Repository Configuration

The running container configuration differs from the repository watchtower.yml:

| Setting | Repository Config | Running Container |
|---|---|---|
| API Token | REDACTED_WATCHTOWER_TOKEN | watchtower-update-token |
| Poll Interval | Not set (uses schedule) | 3600 seconds |
| Timeout | 30s | 10s |
| Schedule | "0 0 */2 * * *" | Not visible (may use polling) |

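Recommendation: Reconcile the two so they stay consistent. Either update the repository configuration to match the running container, or redeploy the container from the repository. While doing so, check polling behavior. According to the Watchtower documentation, WATCHTOWER_HTTP_API_UPDATE=true stops periodic updates (interval or schedule) unless WATCHTOWER_HTTP_API_PERIODIC_POLLS=true is also set. That variable does not appear in the running environment listed above.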
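The drift in the table above can be checked mechanically by comparing the running environment with the values the repository expects. This minimal sketch assumes the expected values are passed as KEY=VALUE arguments. It does not parse watchtower.yml itself. check_env_drift and DOCKER are hypothetical names.

```bash
# Report environment variables whose running value differs from the expected one.
# Usage: check_env_drift CONTAINER KEY=VALUE [KEY=VALUE ...]
# Returns 0 when everything matches, 1 on drift, 2 if inspect fails.
check_env_drift() {
    local container="$1"; shift
    local env_dump pair key want have drift=0
    env_dump=$(${DOCKER:-sudo docker} inspect \
        --format '{{range .Config.Env}}{{println .}}{{end}}' "$container") || return 2
    for pair in "$@"; do
        key="${pair%%=*}"
        want="${pair#*=}"
        # Take the first line starting with KEY= and strip the "KEY=" prefix.
        have=$(printf '%s\n' "$env_dump" | grep -m1 "^${key}=" | cut -d= -f2-) || true
        if [ "$have" != "$want" ]; then
            echo "DRIFT: $key expected '$want', running '${have:-<unset>}'"
            drift=1
        fi
    done
    return "$drift"
}
```

For example, "check_env_drift watchtower WATCHTOWER_TIMEOUT=30s" should report the 30s/10s timeout difference from the table and return 1.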

🚀 Prevention Measures

Immediate Actions Taken

  1. Container restarted and verified functional
  2. Confirmed restart policy is set to "always"
  3. Verified API functionality and security

1. Monitoring Enhancement

# Add to monitoring stack
# Monitor Watchtower container health
# Alert on container state changes
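For the monitoring items above, a check that exits non-zero when Watchtower is not running would catch the failure seen in this incident. This minimal sketch assumes a Nagios-style convention (exit 0 = OK, 2 = CRITICAL), which the monitoring stack may not use. check_watchtower and DOCKER are hypothetical names. The health field exists only when the image defines a HEALTHCHECK, so the template falls back to "none".

```bash
# Nagios-style check: exit 0 when Watchtower is running (and not unhealthy),
# exit 2 otherwise. The exit-code convention is an assumption.
# DOCKER is a hypothetical override; it defaults to "sudo docker".
check_watchtower() {
    local name="${1:-watchtower}" info status health
    # Prints e.g. "running healthy" or "created none".
    info=$(${DOCKER:-sudo docker} inspect --format \
        '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
        "$name" 2>/dev/null) || { echo "CRITICAL: $name not found"; return 2; }
    status="${info%% *}"
    health="${info#* }"
    if [ "$status" != "running" ]; then
        echo "CRITICAL: $name is $status"
        return 2
    fi
    if [ "$health" = "unhealthy" ]; then
        echo "CRITICAL: $name is running but unhealthy"
        return 2
    fi
    echo "OK: $name is running (health: $health)"
    return 0
}
```

A cron job or monitoring agent can run this and alert on a non-zero exit. For the state found in this incident, it would print "CRITICAL: watchtower is created".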

2. Documentation Updates

  • Update service documentation with correct API token
  • Document troubleshooting steps for similar issues
  • Create runbook for Watchtower maintenance

3. Automation Improvements

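#!/bin/bash
# Health check script: start Watchtower if it is not running.
# Reads the exact container state instead of grepping "docker ps" output,
# which would match any line that merely contains "watchtower".
set -euo pipefail

state=$(sudo docker inspect --format '{{.State.Status}}' watchtower 2>/dev/null || echo "missing")
if [ "$state" != "running" ]; then
    echo "Watchtower state is '$state', starting..."
    sudo docker start watchtower
fi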

4. Configuration Synchronization

  • Reconcile differences between repository config and running container
  • Implement configuration management to prevent drift

Related files:

  • Service Config: /home/homelab/organized/repos/homelab/Atlantis/watchtower.yml
  • Status Script: /home/homelab/organized/repos/homelab/scripts/check-watchtower-status.sh
  • Emergency Script: /home/homelab/organized/repos/homelab/scripts/emergency-fix-watchtower-crash.sh
  • Service Docs: /home/homelab/organized/repos/homelab/docs/services/individual/watchtower.md

🔗 Useful Commands

Status Checking

# Check container status
sudo docker ps | grep watchtower

# View container logs
sudo docker logs watchtower --tail 20

# Check container health
sudo docker inspect watchtower --format='{{.State.Health.Status}}'

API Testing

# Test API without authentication (should return 401)
curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8082/v1/update

# Trigger an update via the API with authentication (this starts a real update run, not a dry check)
curl -s -H "Authorization: Bearer watchtower-update-token" http://localhost:8082/v1/update
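The authenticated call above puts the token on the command line, where it ends up in shell history and in the process list. This minimal sketch reads the token from an environment variable and passes the header to curl on stdin. WATCHTOWER_TOKEN and trigger_update are hypothetical names. A successful call starts a real update run.

```bash
# Trigger a Watchtower update run through the HTTP API and print the status code.
# WATCHTOWER_TOKEN is a hypothetical variable holding the API token.
trigger_update() {
    local url="${1:-http://localhost:8082/v1/update}"
    if [ -z "${WATCHTOWER_TOKEN:-}" ]; then
        echo "WATCHTOWER_TOKEN is not set" >&2
        return 1
    fi
    # "-H @-" (curl 7.55.0 or later) reads the header lines from stdin,
    # so the token never appears in curl's argument list.
    printf 'Authorization: Bearer %s\n' "$WATCHTOWER_TOKEN" |
        curl -s -o /dev/null -w '%{http_code}\n' -H @- "$url"
}
```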

Container Management

# Start container
sudo docker start watchtower

# Restart container
sudo docker restart watchtower

# View container configuration
sudo docker inspect watchtower

📊 Lessons Learned

  1. Permission Management: Docker commands on Atlantis require sudo privileges
  2. Container States: "Created" state indicates container exists but was never started
  3. Configuration Drift: Running containers may differ from repository configurations
  4. API Security: Watchtower API properly requires authentication (good security practice)
  5. Restart Policies: "always" restart policy doesn't help if container was never started initially

🎯 Action Items

  • Update repository configuration to match running container
  • Implement automated health checks for Watchtower
  • Add Watchtower monitoring to existing monitoring stack
  • Create user permissions documentation for Docker access
  • Schedule regular configuration drift checks

Incident Closed: February 9, 2026 02:00 PST
Resolution Time: 15 minutes
Next Review: February 16, 2026 (1 week follow-up)