homelab-optimized/docs/runbooks/add-new-service.md
Sanitized mirror from private repository - 2026-03-09 11:57:39 UTC
# Add New Service Runbook
> **Looking for the step-by-step deployment guide?**
> See [Deploy a New Service — End-to-End](../guides/deploy-new-service-gitops.md) for a
> complete walkthrough from compose file to live container, including CI pipeline verification.
> This runbook covers the extended checklist (monitoring, backups, SSO) for production readiness.
## Overview
This runbook guides you through deploying a new containerized service to the homelab using GitOps with Portainer. The procedure ensures proper configuration, monitoring, and documentation.
## Prerequisites
- [ ] Git access to the homelab repository
- [ ] Portainer access (https://192.168.0.200:9443) - Portainer EE v2.33.7
- [ ] Target host selected and available
- [ ] Service Docker Compose file prepared
- [ ] Required environment variables identified
- [ ] Network requirements understood (ports, domains, etc.)
## Current GitOps Status
- **Active Deployments**: 18 compose stacks on Atlantis (verified Feb 14, 2026)
- **Total Containers**: 50+ containers across infrastructure
- **GitOps Method**: Automatic sync from Git repository via Portainer EE
## Metadata
- **Estimated Time**: 30-60 minutes
- **Risk Level**: Low (if following proper testing)
- **Requires Downtime**: No (for new services)
- **Reversible**: Yes (can remove stack)
- **Tested On**: 2026-02-14
## Decision: Which Host?
Choose the appropriate host based on service requirements:
| Host | Best For | Available Resources | GitOps Status |
|------|----------|-------------------|---------------|
| **Atlantis** (DS1823xs+) | Media services, high I/O, primary storage | 8 CPU, 31GB RAM, 50+ containers | ✅ 18 Active Stacks |
| **Calypso** (DS723+) | Secondary media, backup services | 4 CPU, 31GB RAM, 46 containers | ✅ GitOps Ready |
| **Concord NUC** | Network services, DNS, VPN | 4 CPU, 15.5GB RAM, 17 containers | ✅ GitOps Ready |
| **Homelab VM** | Development, monitoring, testing | 4 CPU, 28.7GB RAM, 23 containers | ✅ GitOps Ready |
| **Raspberry Pi 5** | IoT, edge computing, lightweight services | 4 CPU, 15.8GB RAM, 4 containers | ✅ GitOps Ready |
## Procedure
### Step 1: Create Docker Compose Configuration
Create a new compose file in the appropriate host directory:
```bash
cd ~/Documents/repos/homelab
# Choose the appropriate path:
# - hosts/synology/atlantis/
# - hosts/synology/calypso/
# - hosts/physical/concord-nuc/
# - hosts/vms/homelab-vm/
# - hosts/edge/raspberry-pi-5/
# Create new service file
nano hosts/[host]/[service-name].yaml
```
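The path choice above can be sketched as a small helper, assuming the directory layout listed in the comments. The function names `host_dir` and `compose_path` are hypothetical and not part of the repository; the host keys simply reuse the last component of each directory.

```bash
# Map a host key to its directory in the repo (layout taken from the list above).
host_dir() {
    case "$1" in
        atlantis)       echo "hosts/synology/atlantis" ;;
        calypso)        echo "hosts/synology/calypso" ;;
        concord-nuc)    echo "hosts/physical/concord-nuc" ;;
        homelab-vm)     echo "hosts/vms/homelab-vm" ;;
        raspberry-pi-5) echo "hosts/edge/raspberry-pi-5" ;;
        *)              echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Build the compose file path for a service on a host.
compose_path() {
    dir=$(host_dir "$1") || return 1
    echo "$dir/$2.yaml"
}
```

For example, `nano "$(compose_path atlantis uptime-kuma)"` would open `hosts/synology/atlantis/uptime-kuma.yaml`.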
Example Docker Compose structure:
```yaml
version: '3.8'
services:
  service-name:
    image: organization/image:tag
    container_name: service-name
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Los_Angeles
      # Add service-specific variables
    volumes:
      - /path/to/config:/config
      - /path/to/data:/data
    ports:
      - "8080:8080" # external:internal
    networks:
      - service-network
    # Optional: health check
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    # Optional: resource limits
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '1.0'
          memory: 2G
networks:
  service-network:
    driver: bridge
# Optional: named volumes
volumes:
  service-data:
    driver: local
```
### Step 2: Configure Environment Variables
If your service requires sensitive data, create a `.env` file (ensure it's listed in `.gitignore`) and commit only a sanitized `.env.example` template:
```bash
# Create the real .env file (DO NOT commit to git)
nano .env
# Create a template with placeholder values for others (safe to commit)
nano .env.example
# Example .env content:
# SERVICE_API_KEY=REDACTED_API_KEY
# SERVICE_SECRET=your_secret_here
# DATABASE_PASSWORD="REDACTED_PASSWORD"
```
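Because the real `.env` never reaches Git, it can drift from the template. A minimal sketch for spotting variables that the template declares but the real file lacks is shown below; `env_keys` and `missing_env_keys` are hypothetical helper names, and the sketch assumes plain `KEY=value` lines.

```bash
# List the variable names defined in an env file (comments and blank lines are skipped).
env_keys() {
    grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1
}

# Print every variable that is in the template but missing from the real env file.
# Usage: missing_env_keys .env.example .env
missing_env_keys() {
    for key in $(env_keys "$1"); do
        env_keys "$2" | grep -qx "$key" || echo "$key"
    done
}
```

An empty result means every templated variable has a value slot in `.env`; it does not check that the values themselves are sensible.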
### Step 3: Validate Configuration Locally (Optional but Recommended)
Test the compose file syntax:
```bash
# Validate syntax
docker-compose -f hosts/[host]/[service-name].yaml config
# Expected output: the fully resolved configuration is printed, with no errors
```
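When several files change at once, the same check can be looped. The sketch below assumes the `config -q` (quiet) flag, which validates without printing the resolved file; `validate_compose` and the `COMPOSE_CMD` override are hypothetical names introduced here so the function works with either `docker-compose` or `docker compose`.

```bash
# Validate each compose file given as an argument.
# Prints OK/FAIL per file and returns non-zero if any file is invalid.
validate_compose() {
    failed=0
    for file in "$@"; do
        # COMPOSE_CMD is left unquoted on purpose so "docker compose" splits into two words.
        if ${COMPOSE_CMD:-docker-compose} -f "$file" config -q; then
            echo "OK   $file"
        else
            echo "FAIL $file"
            failed=1
        fi
    done
    return "$failed"
}
```

A call such as `validate_compose hosts/synology/atlantis/*.yaml` would then check every stack file for one host before committing.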
### Step 4: Commit and Push to Git Repository
```bash
# Add the new service file
git add hosts/[host]/[service-name].yaml
# If adding .env.example template
git add .env.example
# Commit with descriptive message
git commit -m "Add [service-name] deployment for [host]

- Add Docker Compose configuration
- Configure environment variables
- Set resource limits and health checks
- Documentation: [purpose of service]

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>"
# Push to remote
git push origin main
```
Expected output:
```
[main abc1234] Add service-name deployment for host
1 file changed, 45 insertions(+)
create mode 100644 hosts/host/service-name.yaml
```
### Step 5: Deploy via Portainer GitOps
#### Option A: CI Auto-Deploy (for existing stacks)
Once a stack is registered in Portainer, every future `git push` to `main` triggers
`portainer-deploy.yml` in Gitea CI, which redeploies matching stacks automatically within
~30 seconds. Watch the run at `https://git.vish.gg/Vish/homelab/actions`.
For a **new** stack (first deploy), you must register it in Portainer manually via Option B below —
the CI can only redeploy stacks that already exist in Portainer.
#### Option B: Manual Deployment via Portainer UI
1. Open Portainer: http://vishinator.synology.me:10000
2. Navigate to the target endpoint (e.g., "Atlantis", "Calypso")
3. Click **Stacks** → **Add stack**
4. Configure stack:
   - **Name**: `service-name`
   - **Build method**: **Git Repository**
   - **Repository URL**: Your git repository URL
   - **Repository reference**: `refs/heads/main`
   - **Compose path**: `hosts/[host]/[service-name].yaml`
   - **GitOps updates**: ✅ Enable (optional)
5. Add environment variables if needed
6. Click **Deploy the stack**
### Step 6: Verify Deployment
Check container status:
```bash
# SSH to the host
ssh atlantis # or appropriate host
# Check container is running
docker ps | grep service-name
# Check logs for errors
docker logs service-name --tail 50
# Check resource usage
docker stats service-name --no-stream
```
Expected output:
```
CONTAINER ID IMAGE STATUS PORTS
abc123def456 org/image:tag Up 2 minutes 0.0.0.0:8080->8080/tcp
```
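If the compose file defines a `healthcheck`, a status of `Up` alone does not show that the service is ready. The following is a minimal sketch that polls Docker's `.State.Health.Status` field until it reads `healthy`; the function name `wait_healthy` and its default of 12 attempts at 5-second intervals are assumptions, not values from this repository.

```bash
# Poll a container's health status until it is "healthy" or attempts run out.
# Usage: wait_healthy <container> [attempts] [delay-seconds]
# Containers without a healthcheck have no Health state, so the inspect call
# fails and the status is reported as "unknown".
wait_healthy() {
    name=$1
    attempts=${2:-12}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status=unknown
        if [ "$status" = "healthy" ]; then
            echo "$name is healthy"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$name did not become healthy (last status: ${status:-unknown})" >&2
    return 1
}
```

With the example `start_period: 40s` from Step 1, the defaults give the container about a minute to settle before the function gives up.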
### Step 7: Configure Networking (If External Access Needed)
#### For Services Behind Reverse Proxy:
1. Add to Nginx Proxy Manager or configure reverse proxy
2. Create DNS record (Cloudflare or local DNS)
3. Configure SSL certificate (Let's Encrypt)
#### For Services Using Authentik SSO:
1. Add application in Authentik
2. Configure OAuth2/SAML provider
3. Update service with Authentik integration
See [Authentik SSO Guide](../infrastructure/authentik-sso.md) for details.
### Step 8: Add to Monitoring (Optional but Recommended)
Add service to monitoring stack:
```yaml
# Add to prometheus/prometheus.yml
scrape_configs:
  - job_name: 'service-name'
    static_configs:
      - targets: ['service-name:8080']
```
Update Grafana dashboards if needed.
### Step 9: Document the Service
Update service inventory:
```bash
# Edit service documentation
nano docs/services/VERIFIED_SERVICE_INVENTORY.md
# Add entry:
# | Service Name | Host | Port | URL | Status |
# |--------------|------|------|-----|--------|
# | Service Name | Atlantis | 8080 | https://service.vish.gg | ✅ Active |
```
### Step 10: Configure Backups (If Storing Important Data)
Add service to backup scripts:
```bash
# Edit backup configuration
nano backup.sh
# Add service data directory to backup list
BACKUP_DIRS=(
# ... existing dirs ...
"/path/to/service-name/data"
)
```
Test backup:
```bash
./backup.sh --test
```
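Before relying on the test run, it may help to confirm that the data directory was actually added to the script. This is a rough sketch that only looks for the quoted path anywhere in the file, assuming entries are written in the `"/path"` form shown above; `backup_includes` is a hypothetical helper name.

```bash
# Succeed if the directory appears as a double-quoted entry in the backup script.
# Usage: backup_includes <backup-script> <directory>
# Note: this is a plain text match; it does not parse the BACKUP_DIRS array.
backup_includes() {
    grep -qF "\"$2\"" "$1"
}
```

For example, `backup_includes backup.sh /path/to/service-name/data || echo "not backed up"` prints a warning when the entry is missing.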
## Verification Checklist
After deployment, verify the following:
- [ ] Container is running: `docker ps | grep service-name`
- [ ] Logs show no critical errors: `docker logs service-name`
- [ ] Service responds on expected port: `curl http://localhost:8080`
- [ ] Health check passes (if configured): `docker inspect --format '{{.State.Health.Status}}' service-name` reports `healthy`
- [ ] Resource usage is reasonable: `docker stats service-name --no-stream`
- [ ] External access works (if configured): `curl https://service.vish.gg`
- [ ] SSO authentication works (if using Authentik)
- [ ] Service is added to documentation
- [ ] Monitoring is configured (if applicable)
- [ ] Backups include service data (if applicable)
## Rollback Procedure
If the deployment fails or causes issues:
### Via Portainer UI:
1. Go to **Stacks** → Select the problematic stack
2. Click **Stop** to stop the stack
3. Click **Remove** to delete the stack
4. Delete associated volumes if needed
### Via Command Line:
```bash
# SSH to host
ssh [host]
# Stop and remove container
docker stop service-name
docker rm service-name
# Remove associated volumes (if needed)
docker volume ls | grep service-name
docker volume rm [volume-name]
# Remove from git
cd ~/Documents/repos/homelab
git rm hosts/[host]/[service-name].yaml
git commit -m "Rollback: Remove service-name deployment"
git push origin main
```
## Troubleshooting
### Issue: Container Fails to Start
**Symptoms**: Container status shows "Exited" or "Restarting"
**Solution**:
```bash
# Check logs for error messages
docker logs service-name --tail 100
# Common issues:
# - Port already in use: Change port mapping
# - Permission denied: Check PUID/PGID
# - Missing env variables: Add to compose file
# - Volume mount issues: Verify paths exist
```
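The "port already in use" case can often be caught before deploying by comparing the host ports published by the compose files for one host. The sketch below assumes the short quoted `"host:container"` form used in this runbook; IP-prefixed and long-form port entries are not handled, and `host_ports` and `port_conflicts` are hypothetical helper names.

```bash
# Print the host-side port of every quoted "host:container" mapping in a compose file.
host_ports() {
    sed -n 's/^[[:space:]]*-[[:space:]]*"\([0-9][0-9]*\):[0-9][0-9]*".*/\1/p' "$1"
}

# Print host ports that are published more than once across the given files.
port_conflicts() {
    for file in "$@"; do
        host_ports "$file"
    done | sort -n | uniq -d
}
```

Running `port_conflicts hosts/synology/atlantis/*.yaml` and getting no output suggests the new service does not collide with an existing stack on that host, though ports held by processes outside Docker are not covered.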
### Issue: Container Starts But Service Unreachable
**Symptoms**: Container running but can't access service
**Solution**:
```bash
# Check if service is listening on correct port
docker exec service-name netstat -tlnp
# Check container network
docker network inspect [network-name]
# Test from within container
docker exec service-name curl localhost:8080
# Check firewall rules on host
sudo ufw status
```
### Issue: GitOps Auto-Deploy Not Working
**Symptoms**: Pushed changes but Portainer doesn't update
**Solution**:
1. Check `https://git.vish.gg/Vish/homelab/actions` — did the `portainer-deploy.yml` run trigger?
2. If it ran but shows "No stacks matched": the **Compose path** in Portainer doesn't exactly match the repo file path — check Stacks → your stack → Editor tab
3. If the CI run didn't trigger at all: the changed file path isn't in the workflow's `paths:` filter (only `hosts/**`, `common/**`, `Calypso/**`, `Atlantis/**` trigger it)
4. Manual fallback: Portainer → Stacks → your stack → Pull and redeploy
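
The path filter in item 3 can be checked locally before pushing. This is a minimal sketch that mirrors the four patterns listed above with a shell `case` statement; `triggers_deploy` is a hypothetical name, and the match is assumed to be case-sensitive, as in the patterns themselves.

```bash
# Return 0 if a changed file path falls under one of the workflow's path filters
# (hosts/**, common/**, Calypso/**, Atlantis/**), 1 otherwise.
triggers_deploy() {
    case "$1" in
        hosts/*|common/*|Calypso/*|Atlantis/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (not run here): list the files in the last commit that would trigger CI.
# git diff --name-only HEAD~1 | while read -r f; do
#     triggers_deploy "$f" && echo "triggers deploy: $f"
# done
```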
### Issue: High Resource Usage
**Symptoms**: Container using too much CPU/RAM
**Solution**:
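```yaml
# Add resource limits to the service in the compose file
services:
  service-name:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 2G
# Then commit and push so the stack is redeployed with the limits,
# or redeploy manually: docker-compose -f hosts/[host]/[service-name].yaml up -d
```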
## Post-Deployment Tasks
After successful deployment:
1. **Test the service thoroughly** - Ensure all features work as expected
2. **Set up monitoring alerts** - Configure Grafana alerts for the service
3. **Document usage** - Add user guide if others will use the service
4. **Schedule maintenance** - Add to maintenance calendar for updates
5. **Test backups** - Verify backup includes service data
6. **Update runbook** - Note any deviations or improvements
## Related Documentation
- [Deploy a New Service — End-to-End](../guides/deploy-new-service-gitops.md) ⭐ Complete step-by-step guide
- [GitOps Deployment Guide](../GITOPS_DEPLOYMENT_GUIDE.md)
- [Infrastructure Overview](../infrastructure/INFRASTRUCTURE_OVERVIEW.md)
- [Service Inventory](../services/VERIFIED_SERVICE_INVENTORY.md)
- [Monitoring Setup](../admin/monitoring-setup.md)
## Examples
### Example 1: Adding Uptime Kuma
```yaml
version: '3.8'
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    container_name: uptime-kuma
    restart: unless-stopped
    volumes:
      - /volume1/docker/uptime-kuma:/app/data
    ports:
      - "3001:3001"
    networks:
      - monitoring
networks:
  monitoring:
    external: true
```
### Example 2: Adding a Service with Database
```yaml
version: '3.8'
services:
  app:
    image: myapp:latest
    depends_on:
      - postgres
    environment:
      - DATABASE_URL=postgresql://user:REDACTED_PASSWORD@postgres:5432/dbname
    ports:
      - "8080:8080"
    networks:
      - app-network
  postgres:
    image: postgres:15
    environment:
      - POSTGRES_USER=user
      # No quotes here: in list form they would become part of the password
      - POSTGRES_PASSWORD=REDACTED_PASSWORD
      - POSTGRES_DB=dbname
    volumes:
      - postgres-data:/var/lib/postgresql/data
    networks:
      - app-network
networks:
  app-network:
    driver: bridge
volumes:
  postgres-data:
```
## Change Log
- 2026-02-14 - Initial creation with GitOps workflow
- 2026-02-14 - Added examples and troubleshooting section