Sanitized mirror from private repository - 2026-03-11 09:26:00 UTC
docs/runbooks/add-new-service.md
# Add New Service Runbook

> **Looking for the step-by-step deployment guide?**
> See [Deploy a New Service — End-to-End](../guides/deploy-new-service-gitops.md) for a
> complete walkthrough from compose file to live container, including CI pipeline verification.
> This runbook covers the extended checklist (monitoring, backups, SSO) for production readiness.

## Overview
This runbook guides you through deploying a new containerized service to the homelab using GitOps with Portainer. The procedure ensures proper configuration, monitoring, and documentation.

## Prerequisites
- [ ] Git access to the homelab repository
- [ ] Portainer access (https://192.168.0.200:9443) - Portainer EE v2.33.7
- [ ] Target host selected and available
- [ ] Service Docker Compose file prepared
- [ ] Required environment variables identified
- [ ] Network requirements understood (ports, domains, etc.)

## Current GitOps Status
- **Active Deployments**: 18 compose stacks on Atlantis (verified Feb 14, 2026)
- **Total Containers**: 50+ containers across infrastructure
- **GitOps Method**: Automatic sync from Git repository via Portainer EE

## Metadata
- **Estimated Time**: 30-60 minutes
- **Risk Level**: Low (if following proper testing)
- **Requires Downtime**: No (for new services)
- **Reversible**: Yes (can remove stack)
- **Tested On**: 2026-02-14

## Decision: Which Host?

Choose the appropriate host based on service requirements:

| Host | Best For | Available Resources | GitOps Status |
|------|----------|-------------------|---------------|
| **Atlantis** (DS1823xs+) | Media services, high I/O, primary storage | 8 CPU, 31GB RAM, 50+ containers | ✅ 18 Active Stacks |
| **Calypso** (DS723+) | Secondary media, backup services | 4 CPU, 31GB RAM, 46 containers | ✅ GitOps Ready |
| **Concord NUC** | Network services, DNS, VPN | 4 CPU, 15.5GB RAM, 17 containers | ✅ GitOps Ready |
| **Homelab VM** | Development, monitoring, testing | 4 CPU, 28.7GB RAM, 23 containers | ✅ GitOps Ready |
| **Raspberry Pi 5** | IoT, edge computing, lightweight services | 4 CPU, 15.8GB RAM, 4 containers | ✅ GitOps Ready |

## Procedure

### Step 1: Create Docker Compose Configuration

Create a new compose file in the appropriate host directory:

```bash
cd ~/Documents/repos/homelab
# Choose the appropriate path:
# - hosts/synology/atlantis/
# - hosts/synology/calypso/
# - hosts/physical/concord-nuc/
# - hosts/vms/homelab-vm/
# - hosts/edge/raspberry-pi-5/

# Create new service file
nano hosts/[host]/[service-name].yaml
```

Example Docker Compose structure:

```yaml
version: '3.8'

services:
  service-name:
    image: organization/image:tag
    container_name: service-name
    restart: unless-stopped

    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
      # Add service-specific variables

    volumes:
      - /path/to/config:/config
      - /path/to/data:/data

    ports:
      - "8080:8080"  # external:internal

    networks:
      - service-network

    # Optional: health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

    # Optional: resource limits
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G

networks:
  service-network:
    driver: bridge

# Optional: named volumes
volumes:
  service-data:
    driver: local
```

### Step 2: Configure Environment Variables

If your service requires sensitive data, create a `.env` file and make sure it is listed in `.gitignore`. Commit only a `.env.example` template that contains placeholder values:

```bash
# Create .env file (DO NOT commit to git)
nano .env

# Create .env.example as a template for others (placeholders only, safe to commit)
nano .env.example

# Example .env content:
# SERVICE_API_KEY=REDACTED_API_KEY
# SERVICE_SECRET=your_secret_here
# DATABASE_PASSWORD="REDACTED_PASSWORD"
```
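
A common slip at this step is adding a variable to the template and forgetting to set it in the real file. The helper below is a minimal sketch of one way to spot that, assuming both files use plain `KEY=value` lines. The function names `env_keys` and `missing_env_keys` are hypothetical and not part of the homelab repository.

```bash
# Print the variable names defined in a dotenv-style file, one per line.
# Blank lines and comment lines (starting with '#') are ignored.
env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u
}

# Print every key that appears in the template ($1) but not in the real file ($2).
missing_env_keys() {
  for key in $(env_keys "$1"); do
    if ! grep -q -E "^${key}=" "$2"; then
      echo "$key"
    fi
  done
}
```

Running `missing_env_keys .env.example .env` prints nothing when every template key is present in `.env`. It only compares key names and never prints values, so it is safe to run in a shared terminal.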

### Step 3: Validate Configuration Locally (Optional but Recommended)

Test the compose file syntax:

```bash
# Validate syntax
docker-compose -f hosts/[host]/[service-name].yaml config

# Expected output: the resolved configuration printed as YAML, with no errors
```

### Step 4: Commit and Push to Git Repository

```bash
# Add the new service file
git add hosts/[host]/[service-name].yaml

# If adding .env.example template
git add .env.example

# Commit with descriptive message
git commit -m "Add [service-name] deployment for [host]

- Add Docker Compose configuration
- Configure environment variables
- Set resource limits and health checks
- Documentation: [purpose of service]

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"

# Push to remote
git push origin main
```

Expected output:
```
[main abc1234] Add service-name deployment for host
 1 file changed, 45 insertions(+)
 create mode 100644 hosts/host/service-name.yaml
```
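
Because a real `.env` file must never reach the repository, it can help to scan the staged file list before committing. The filter below is a sketch under assumptions: the function name `find_env_files` is hypothetical, and it treats any file named `.env` or `.env.<suffix>` as a secret file, except the `.env.example` template.

```bash
# Read file paths on stdin and print those that look like real dotenv files.
# The .env.example template is allowed through.
find_env_files() {
  while IFS= read -r path; do
    case "$(basename "$path")" in
      .env.example) ;;                       # template, safe to commit
      .env|.env.*) printf '%s\n' "$path" ;;  # likely contains secrets
    esac
  done
}
```

One way to use it is `git diff --cached --name-only | find_env_files` just before `git commit`. Any output means a secret file is staged and should be unstaged first.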

### Step 5: Deploy via Portainer GitOps

#### Option A: CI Auto-Deploy (for existing stacks)
Once a stack is registered in Portainer, every subsequent `git push` to `main` triggers
`portainer-deploy.yml` in Gitea CI, which redeploys matching stacks automatically within
~30 seconds. Watch the run at `https://git.vish.gg/Vish/homelab/actions`.

For a **new** stack (first deploy), you must register it in Portainer manually via Option B below.
The CI can only redeploy stacks that already exist in Portainer.

#### Option B: Manual Deployment via Portainer UI
1. Open Portainer: http://vishinator.synology.me:10000
2. Navigate to the target endpoint (e.g., "Atlantis", "Calypso")
3. Click **Stacks** → **Add stack**
4. Configure stack:
   - **Name**: `service-name`
   - **Build method**: **Git Repository**
   - **Repository URL**: Your git repository URL
   - **Repository reference**: `refs/heads/main`
   - **Compose path**: `hosts/[host]/[service-name].yaml`
   - **GitOps updates**: ✅ Enable (optional)
5. Add environment variables if needed
6. Click **Deploy the stack**

### Step 6: Verify Deployment

Check container status:

```bash
# SSH to the host
ssh atlantis  # or appropriate host

# Check container is running
docker ps | grep service-name

# Check logs for errors
docker logs service-name --tail 50

# Check resource usage
docker stats service-name --no-stream
```

Expected output:
```
CONTAINER ID   IMAGE           STATUS         PORTS
abc123def456   org/image:tag   Up 2 minutes   0.0.0.0:8080->8080/tcp
```
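
A container that shows `Up` may still be inside its health check `start_period`. Instead of re-running `docker ps` by hand, you can poll until it reports healthy. The following is a minimal sketch, assuming the compose file defines a `healthcheck`; the helper names `wait_for_output` and `container_health` are hypothetical and not part of the repository.

```bash
# Poll a command until its output equals the wanted value or attempts run out.
# Usage: wait_for_output WANTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as the output matches, 1 if it never does.
wait_for_output() {
  wanted=$1
  attempts=$2
  delay=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    # A failing command counts as "no match yet" rather than aborting the loop.
    current=$("$@" 2>/dev/null) || current=""
    if [ "$current" = "$wanted" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Print the Docker health status of a container.
# Fails when the container has no healthcheck configured.
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}
```

For example, `wait_for_output healthy 10 6 container_health service-name` checks up to ten times, six seconds apart, which comfortably covers the 40 second `start_period` used in the Step 1 example.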

### Step 7: Configure Networking (If External Access Needed)

#### For Services Behind Reverse Proxy:
1. Add to Nginx Proxy Manager or configure reverse proxy
2. Create DNS record (Cloudflare or local DNS)
3. Configure SSL certificate (Let's Encrypt)

#### For Services Using Authentik SSO:
1. Add application in Authentik
2. Configure OAuth2/SAML provider
3. Update service with Authentik integration

See [Authentik SSO Guide](../infrastructure/authentik-sso.md) for details.

### Step 8: Add to Monitoring (Optional but Recommended)

Add service to monitoring stack:

```yaml
# Add to prometheus/prometheus.yml
scrape_configs:
  - job_name: 'service-name'
    static_configs:
      - targets: ['service-name:8080']
```

Update Grafana dashboards if needed.

### Step 9: Document the Service

Update service inventory:

```bash
# Edit service documentation
nano docs/services/VERIFIED_SERVICE_INVENTORY.md

# Add entry:
# | Service Name | Host | Port | URL | Status |
# |--------------|------|------|-----|--------|
# | Service Name | Atlantis | 8080 | https://service.vish.gg | ✅ Active |
```

### Step 10: Configure Backups (If Storing Important Data)

Add service to backup scripts:

```bash
# Edit backup configuration
nano backup.sh

# Add service data directory to backup list
BACKUP_DIRS=(
  # ... existing dirs ...
  "/path/to/service-name/data"
)
```

Test backup:
```bash
./backup.sh --test
```

## Verification Checklist

After deployment, verify the following:

- [ ] Container is running: `docker ps | grep service-name`
- [ ] Logs show no critical errors: `docker logs service-name`
- [ ] Service responds on expected port: `curl http://localhost:8080`
- [ ] Health check passes (if configured): `docker inspect service-name | grep Health`
- [ ] Resource usage is reasonable: `docker stats service-name --no-stream`
- [ ] External access works (if configured): `curl https://service.vish.gg`
- [ ] SSO authentication works (if using Authentik)
- [ ] Service is added to documentation
- [ ] Monitoring is configured (if applicable)
- [ ] Backups include service data (if applicable)
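
The command-based items in this checklist can be wrapped so that each one reports a clear pass or fail. The helper below is only a sketch; the name `check` is hypothetical, and it judges success purely by the exit status of the command it is given.

```bash
# Run one checklist item and report PASS or FAIL with its description.
# Usage: check DESCRIPTION COMMAND [ARGS...]
# Command output is discarded; only the exit status decides the result.
check() {
  description=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $description"
  else
    echo "FAIL: $description"
    return 1
  fi
}
```

For instance, `check "service responds on port 8080" curl -fsS http://localhost:8080` fails on connection errors and on HTTP error responses, because `curl -f` exits non-zero for those. Items that need human judgment, such as reading logs or testing SSO, still have to be checked manually.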

## Rollback Procedure

If the deployment fails or causes issues:

### Via Portainer UI:
1. Go to **Stacks** → Select the problematic stack
2. Click **Stop** to stop the stack
3. Click **Remove** to delete the stack
4. Delete associated volumes if needed

### Via Command Line:
```bash
# SSH to host
ssh [host]

# Stop and remove container
docker stop service-name
docker rm service-name

# Remove associated volumes (if needed)
docker volume ls | grep service-name
docker volume rm [volume-name]

# Remove from git
cd ~/Documents/repos/homelab
git rm hosts/[host]/[service-name].yaml
git commit -m "Rollback: Remove service-name deployment"
git push origin main
```

## Troubleshooting

### Issue: Container Fails to Start

**Symptoms**: Container status shows "Exited" or "Restarting"

**Solution**:
```bash
# Check logs for error messages
docker logs service-name --tail 100

# Common issues:
# - Port already in use: Change port mapping
# - Permission denied: Check PUID/PGID
# - Missing env variables: Add to compose file
# - Volume mount issues: Verify paths exist
```

### Issue: Container Starts But Service Unreachable

**Symptoms**: Container running but can't access service

**Solution**:
```bash
# Check if service is listening on correct port
docker exec service-name netstat -tlnp

# Check container network
docker network inspect [network-name]

# Test from within container
docker exec service-name curl localhost:8080

# Check firewall rules on host
sudo ufw status
```

### Issue: GitOps Auto-Deploy Not Working

**Symptoms**: Pushed changes but Portainer doesn't update

**Solution**:
1. Check `https://git.vish.gg/Vish/homelab/actions`: did the `portainer-deploy.yml` run trigger?
2. If it ran but shows "No stacks matched", the **Compose path** in Portainer doesn't exactly match the repo file path. Check Stacks → your stack → Editor tab.
3. If the CI run didn't trigger at all, the changed file path isn't covered by the workflow's `paths:` filter (only `hosts/**`, `common/**`, `Calypso/**`, and `Atlantis/**` trigger it).
4. Manual fallback: Portainer → Stacks → your stack → Pull and redeploy
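
To test the path filter from item 3 without waiting for CI, you can check changed paths locally. This is a rough sketch that mirrors the four filters named above with shell `case` patterns. The function names `triggers_deploy` and `filter_triggering_paths` are hypothetical, and the workflow file itself remains the source of truth for what actually triggers a run.

```bash
# Return 0 if a repo-relative path falls under one of the path filters
# listed in this runbook, 1 otherwise.
# In a case pattern, '*' also matches '/', so nested paths are covered.
triggers_deploy() {
  case "$1" in
    hosts/*|common/*|Calypso/*|Atlantis/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read paths on stdin and print only those that would match a filter.
filter_triggering_paths() {
  while IFS= read -r path; do
    if triggers_deploy "$path"; then
      printf '%s\n' "$path"
    fi
  done
}
```

Running `git diff --name-only HEAD~1 HEAD | filter_triggering_paths` lists the files from the last commit that fall under a filter. Empty output suggests the push changed only files outside the filtered directories, which would explain a missing CI run.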

### Issue: High Resource Usage

**Symptoms**: Container using too much CPU/RAM

**Solution**:
```bash
# Add resource limits to the service in the compose file:
#
#   deploy:
#     resources:
#       limits:
#         cpus: '1.0'
#         memory: 2G

# Redeploy with limits
docker-compose up -d
```

## Post-Deployment Tasks

After successful deployment:

1. **Test the service thoroughly** - Ensure all features work as expected
2. **Set up monitoring alerts** - Configure Grafana alerts for the service
3. **Document usage** - Add user guide if others will use the service
4. **Schedule maintenance** - Add to maintenance calendar for updates
5. **Test backups** - Verify backup includes service data
6. **Update runbook** - Note any deviations or improvements

## Related Documentation

- [Deploy a New Service — End-to-End](../guides/deploy-new-service-gitops.md) ⭐ Complete step-by-step guide
- [GitOps Deployment Guide](../GITOPS_DEPLOYMENT_GUIDE.md)
- [Infrastructure Overview](../infrastructure/INFRASTRUCTURE_OVERVIEW.md)
- [Service Inventory](../services/VERIFIED_SERVICE_INVENTORY.md)
- [Monitoring Setup](../admin/monitoring-setup.md)

## Examples

### Example 1: Adding Uptime Kuma
```yaml
version: '3.8'

services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped

    volumes:
      - /volume1/docker/uptime-kuma:/app/data

    ports:
      - "3001:3001"

    networks:
      - monitoring

networks:
  monitoring:
    external: true
```

### Example 2: Adding a Service with Database
```yaml
version: '3.8'

services:
  app:
    image: myapp:latest
    depends_on:
      - postgres
    environment:
      - DATABASE_URL=postgresql://user:REDACTED_PASSWORD@postgres:5432/dbname
    ports:
      - "8080:8080"
    networks:
      - app-network

  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD="REDACTED_PASSWORD"
      - POSTGRES_DB=dbname
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - app-network

networks:
  app-network:
    driver: bridge

volumes:
  postgres-data:
```

## Change Log

- 2026-02-14 - Initial creation with GitOps workflow
- 2026-02-14 - Added examples and troubleshooting section