Watchtower Atlantis Incident Report - February 9, 2026
📋 Incident Summary
| Field | Value |
|---|---|
| Date | February 9, 2026 |
| Time | 01:45 PST |
| Severity | Medium |
| Status | ✅ RESOLVED |
| Affected Service | Watchtower (Atlantis) |
| Duration | ~15 minutes |
| Reporter | User |
| Resolver | OpenHands Agent |
🚨 Problem Description
Issue: Watchtower container on Atlantis server was not running, preventing automatic Docker container updates.
Symptoms:
- Watchtower container in "Created" state but not running
- No automatic container updates occurring
- Container logs empty (never started)
🔍 Root Cause Analysis
Primary Cause: Container was created but never started, likely due to:
- System restart without proper container startup
- Manual container stop without restart
- Docker daemon restart that didn't auto-start the container
Contributing Factors:
- User permission issues requiring sudo for Docker commands
- Container was properly configured but simply not running
🛠️ Resolution Steps
1. Initial Diagnosis
# Connected to Atlantis server via SSH
ssh atlantis
# Attempted to check container status (permission denied)
docker ps -a | grep -i watchtower
# Error: permission denied while trying to connect to Docker daemon socket
# Used sudo to check container status
sudo docker ps -a | grep -i watchtower
# Found: Container in "Created" state, not running
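The permission error in this step usually means the SSH user is not a member of the group that owns the Docker socket. A minimal sketch of that check follows. It assumes the group is named docker (the usual default; this report does not confirm how Atlantis is set up), and needs_sudo_for_docker is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only. Assumption: the Docker socket is owned by a group named
# "docker" (the usual default); the report does not state this for Atlantis.

# needs_sudo_for_docker GROUP...
# Returns 0 (true) when "docker" is not among the given group names.
needs_sudo_for_docker() {
    local group
    for group in "$@"; do
        if [ "$group" = "docker" ]; then
            return 1
        fi
    done
    return 0
}

# Root never needs sudo; otherwise inspect the current user's groups.
# Word splitting of `id -nG` is intended here.
# shellcheck disable=SC2046
if [ "$(id -u)" -ne 0 ] && needs_sudo_for_docker $(id -nG); then
    echo "sudo required for docker commands"
else
    echo "docker usable without sudo"
fi
```

Membership in the docker group grants root-equivalent access to the host, so continuing to use sudo, as this report does, is a reasonable choice.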
2. Container Analysis
# Checked container logs (empty - never started)
sudo docker logs watchtower
# Verified container configuration
sudo docker inspect watchtower | grep -A 5 -B 5 "RestartPolicy"
# Confirmed: RestartPolicy set to "always" (correct)
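The grep-based checks above can also be expressed as a single templated docker inspect call, which returns exactly the two fields of interest. This is a sketch, not the procedure used during the incident. The DOCKER variable and the two function names are hypothetical; the template fields .State.Status and .HostConfig.RestartPolicy.Name are standard docker inspect output.

```shell
#!/bin/bash
# Sketch only. DOCKER is a hypothetical override hook; it defaults to
# "sudo docker", matching the commands used in this report.
DOCKER="${DOCKER:-sudo docker}"

# container_summary NAME
# Prints "<state> <restart-policy>", for example "created always".
container_summary() {
    $DOCKER inspect \
        --format '{{.State.Status}} {{.HostConfig.RestartPolicy.Name}}' "$1"
}

# explain_state STATE
# Maps a container state to a short diagnostic hint.
explain_state() {
    case "$1" in
        created) echo "never started: restart policy has not taken effect yet" ;;
        exited)  echo "ran and stopped: check exit code and logs" ;;
        running) echo "running" ;;
        *)       echo "other state: $1" ;;
    esac
}
```

For example, `explain_state "$(container_summary watchtower | cut -d' ' -f1)"` would have printed the "never started" hint during this incident, since a restart policy only applies once a container has started successfully.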
3. Resolution Implementation
# Started the Watchtower container
sudo docker start watchtower
# Result: watchtower (container started successfully)
# Verified container is running
sudo docker ps | grep watchtower
# Result: Container running and healthy
4. Functionality Verification
# Checked container logs for proper startup
sudo docker logs watchtower --tail 20
# Confirmed: Watchtower 1.7.1 started successfully
# Confirmed: HTTP API enabled on port 8080 (mapped to 8082)
# Confirmed: Checking all containers enabled
# Tested HTTP API (without authentication)
curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8082/v1/update
# Result: HTTP 401 (expected - API requires authentication)
# Verified API token configuration
sudo docker inspect watchtower | grep -i "api\|token\|auth" -A 2 -B 2
# Found: WATCHTOWER_HTTP_API_TOKEN="REDACTED_HTTP_TOKEN"
✅ Current Status
Container Status: ✅ Running and Healthy
- Container ID: 9f8fee3fbcea
- Status: Up and running (healthy)
- Uptime: Stable since fix
- Port Mapping: 8082:8080 (HTTP API accessible)
Configuration Verified:
- ✅ Restart Policy: always (will auto-start on reboot)
- ✅ HTTP API: Enabled with authentication token
- ✅ Cleanup: Enabled (removes old images)
- ✅ Rolling Restart: Enabled (minimizes disruption)
- ✅ Timeout: 10s on the running container (graceful shutdown; the repository config specifies 30s)
API Access:
- URL: http://atlantis:8082/v1/update
- Authentication: Bearer token watchtower-update-token
- Status: Functional and secured
🔧 Configuration Details
Current Watchtower Configuration
# From running container inspection
Environment:
- WATCHTOWER_POLL_INTERVAL=3600
- WATCHTOWER_TIMEOUT=10s
- WATCHTOWER_HTTP_API_UPDATE=true
- WATCHTOWER_HTTP_API_TOKEN="REDACTED_HTTP_TOKEN"
- TZ=America/Los_Angeles
Restart Policy: always
Port Mapping: 8082:8080
Volume Mounts: /var/run/docker.sock:/var/run/docker.sock:ro
Differences from Repository Configuration
The running container configuration differs from the repository watchtower.yml:
| Setting | Repository Config | Running Container |
|---|---|---|
| API Token | REDACTED_WATCHTOWER_TOKEN | watchtower-update-token |
| Poll Interval | Not set (uses schedule) | 3600 seconds |
| Timeout | 30s | 10s |
| Schedule | "0 0 */2 * * *" | Not visible (may use polling) |
Recommendation: Update repository configuration to match running container or vice versa for consistency.
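One way to make this comparison repeatable is to diff the WATCHTOWER_* variables of the running container against a list of expected values. The sketch below assumes a hypothetical plain-text file with one KEY=VALUE per line, maintained alongside watchtower.yml. It does not parse the YAML itself, and the function names and the DOCKER variable are illustrative.

```shell
#!/bin/bash
# Sketch only. DOCKER is a hypothetical override hook; it defaults to
# "sudo docker", matching the commands used in this report.
DOCKER="${DOCKER:-sudo docker}"

# running_env NAME
# Prints the container's WATCHTOWER_* environment variables, sorted.
running_env() {
    $DOCKER inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$1" \
        | grep '^WATCHTOWER_' | sort
}

# report_drift NAME EXPECTED_FILE
# EXPECTED_FILE is a hypothetical file with one KEY=VALUE per line.
# Prints a unified diff (expected vs. running) and returns non-zero on drift.
report_drift() {
    local expected actual status=0
    expected=$(mktemp)
    actual=$(mktemp)
    grep '^WATCHTOWER_' "$2" | sort > "$expected"
    running_env "$1" > "$actual"
    diff -u "$expected" "$actual" || status=$?
    rm -f "$expected" "$actual"
    return "$status"
}
```

A call such as `report_drift watchtower expected.env` prints nothing when the two agree. The diff output includes the API token value, so it should not be written to shared logs.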
🚀 Prevention Measures
Immediate Actions Taken
- ✅ Container started and verified functional
- ✅ Confirmed restart policy is set to "always"
- ✅ Verified API functionality and security
Recommended Long-term Improvements
1. Monitoring Enhancement
# Add to monitoring stack
# Monitor Watchtower container health
# Alert on container state changes
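The placeholder above could be filled with a small check that a monitoring stack runs on a schedule. The following is a sketch under stated assumptions: it uses the 0/1/2 exit-code convention of Nagios-compatible plugins, which may or may not suit the monitoring stack on Atlantis, and check_container and DOCKER are hypothetical names.

```shell
#!/bin/bash
# Sketch only. DOCKER is a hypothetical override hook; it defaults to
# "sudo docker", matching the commands used in this report.
DOCKER="${DOCKER:-sudo docker}"

# check_container NAME
# Prints one status line and returns 0 (OK), 1 (WARNING) or 2 (CRITICAL).
check_container() {
    local name="$1" info state health
    # The template prints "none" when the image defines no health check.
    if ! info=$($DOCKER inspect --format \
        '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}' \
        "$name" 2>/dev/null); then
        echo "CRITICAL: container $name not found or Docker unreachable"
        return 2
    fi
    state=${info%% *}
    health=${info#* }
    if [ "$state" != "running" ]; then
        echo "CRITICAL: $name is $state"
        return 2
    fi
    if [ "$health" = "unhealthy" ]; then
        echo "WARNING: $name is running but unhealthy"
        return 1
    fi
    echo "OK: $name is running (health: $health)"
    return 0
}
```

Running `check_container watchtower` during this incident would have reported the "Created" state as CRITICAL, which is the condition that went unnoticed.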
2. Documentation Updates
- Update service documentation with correct API token
- Document troubleshooting steps for similar issues
- Create runbook for Watchtower maintenance
3. Automation Improvements
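#!/bin/bash
# Health check script: start Watchtower if it is not running.
# Inspecting the container by name avoids false matches that a plain
# grep over `docker ps` output can produce (for example on image names).
if [ "$(sudo docker inspect --format '{{.State.Running}}' watchtower 2>/dev/null)" != "true" ]; then
    echo "Watchtower not running, starting..."
    sudo docker start watchtower
fi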
4. Configuration Synchronization
- Reconcile differences between repository config and running container
- Implement configuration management to prevent drift
📚 Related Documentation
- Service Config: /home/homelab/organized/repos/homelab/Atlantis/watchtower.yml
- Status Script: /home/homelab/organized/repos/homelab/scripts/check-watchtower-status.sh
- Emergency Script: /home/homelab/organized/repos/homelab/scripts/emergency-fix-watchtower-crash.sh
- Service Docs: /home/homelab/organized/repos/homelab/docs/services/individual/watchtower.md
🔗 Useful Commands
Status Checking
# Check container status
sudo docker ps | grep watchtower
# View container logs
sudo docker logs watchtower --tail 20
# Check container health
sudo docker inspect watchtower --format='{{.State.Health.Status}}'
API Testing
# Test API without authentication (should return 401)
curl -s -w "\nHTTP Status: %{http_code}\n" http://localhost:8082/v1/update
# Test API with authentication
curl -s -H "Authorization: Bearer watchtower-update-token" http://localhost:8082/v1/update
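The two curl calls above can be wrapped so that only the HTTP status code is reported and interpreted. This is a sketch: the function names and the WATCHTOWER_URL variable are hypothetical, the URL default is the one used in this report, and the token is passed as an argument so that it is not hardcoded in the script.

```shell
#!/bin/bash
# Sketch only. WATCHTOWER_URL is a hypothetical override; the default is the
# endpoint used elsewhere in this report.
WATCHTOWER_URL="${WATCHTOWER_URL:-http://localhost:8082/v1/update}"

# api_status [TOKEN]
# Prints only the HTTP status code of the update endpoint.
# NOTE: a request carrying a valid token asks Watchtower to run an update.
api_status() {
    if [ -n "${1:-}" ]; then
        curl -s -o /dev/null -w '%{http_code}' \
            -H "Authorization: Bearer $1" "$WATCHTOWER_URL"
    else
        curl -s -o /dev/null -w '%{http_code}' "$WATCHTOWER_URL"
    fi
}

# describe_status CODE
# Maps a status code to a short explanation.
describe_status() {
    case "$1" in
        200) echo "authorized: update request accepted" ;;
        401) echo "unauthorized: token missing or rejected" ;;
        000) echo "no response: container down or port unreachable" ;;
        *)   echo "unexpected HTTP status $1" ;;
    esac
}
```

For a check with no side effects, `describe_status "$(api_status)"` sends no token and should report the 401 case while the container is up; curl reports 000 when no connection could be made.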
Container Management
# Start container
sudo docker start watchtower
# Restart container
sudo docker restart watchtower
# View container configuration
sudo docker inspect watchtower
📊 Lessons Learned
- Permission Management: Docker commands on Atlantis require sudo privileges
- Container States: "Created" state indicates container exists but was never started
- Configuration Drift: Running containers may differ from repository configurations
- API Security: Watchtower API properly requires authentication (good security practice)
- Restart Policies: "always" restart policy doesn't help if container was never started initially
🎯 Action Items
- Update repository configuration to match running container
- Implement automated health checks for Watchtower
- Add Watchtower monitoring to existing monitoring stack
- Create user permissions documentation for Docker access
- Schedule regular configuration drift checks
Incident Closed: February 9, 2026 02:00 PST
Resolution Time: 15 minutes
Next Review: February 16, 2026 (1 week follow-up)