Sanitized mirror from private repository - 2026-04-05 12:45:10 UTC
# 🚀 GitOps Deployment Guide

*Comprehensive guide for GitOps-based deployments using Portainer and Git integration*

## Overview

This guide covers the GitOps deployment methodology used throughout the homelab infrastructure, enabling automated, version-controlled, and auditable deployments.

## GitOps Architecture

### Core Components

- **Git Repository**: `https://git.vish.gg/Vish/homelab.git`
- **Portainer**: Container orchestration and GitOps automation
- **Docker Compose**: Service definition and configuration
- **Nginx Proxy Manager**: Reverse proxy and SSL termination

### Workflow Overview

```mermaid
graph LR
    A[Developer] --> B[Git Commit]
    B --> C[Git Repository]
    C --> D[Portainer GitOps]
    D --> E[Docker Deployment]
    E --> F[Service Running]
    F --> G[Monitoring]
```

## Repository Structure

### Host-Based Organization

```
homelab/
├── Atlantis/              # Primary NAS services
├── Calypso/               # Secondary NAS services
├── homelab_vm/            # Main VM services
├── concord_nuc/           # Intel NUC services
├── raspberry-pi-5-vish/   # Raspberry Pi services
├── common/                # Shared configurations
└── docs/                  # Documentation
```

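The layout above lends itself to a quick inventory of what each host is expected to run. The sketch below assumes that every host directory holds one `.yml` file per service directly beneath it; `list_stacks` is a hypothetical helper name, not a script from the repository.

```bash
# list_stacks ROOT
# Print "host service" for every ROOT/<host>/<service>.yml file.
# Assumption: compose files sit directly inside each host directory.
list_stacks() {
  root=${1%/}
  for file in "$root"/*/*.yml; do
    [ -f "$file" ] || continue      # the glob matched nothing
    rel=${file#"$root"/}            # e.g. Atlantis/plex.yml
    host=${rel%%/*}                 # e.g. Atlantis
    service=${rel#*/}               # e.g. plex.yml
    printf '%s %s\n' "$host" "${service%.yml}"
  done
}
```

Running `list_stacks .` from the repository root prints one line per compose file, which can then be compared against the stacks registered in Portainer.
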
### Service File Standards

```yaml
# Standard docker-compose.yml structure
version: '3.8'

services:
  service-name:
    image: official/image:tag
    container_name: service-name-hostname
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
    volumes:
      - service-data:/app/data
    ports:
      - "8080:8080"
    networks:
      - default
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.service.rule=Host(`service.local`)"

volumes:
  service-data:
    driver: local

networks:
  default:
    name: service-network
```

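A few of these conventions can be checked mechanically before a file is pushed. The following is a rough sketch only: it uses plain-text `grep` matching rather than a YAML parser, and `check_compose_file` is a hypothetical helper name. It looks for a `container_name`, the `unless-stopped` restart policy, and a `TZ` entry, mirroring the template above.

```bash
# check_compose_file FILE
# Return non-zero and report on stderr when FILE misses one of the
# conventions from the template (text heuristics, not a YAML parser).
check_compose_file() {
  file=$1
  status=0
  if ! grep -Eq '^[[:space:]]*container_name:[[:space:]]*[^[:space:]]' "$file"; then
    echo "$file: missing container_name" >&2
    status=1
  fi
  if ! grep -Eq '^[[:space:]]*restart:[[:space:]]*unless-stopped[[:space:]]*$' "$file"; then
    echo "$file: restart policy is not unless-stopped" >&2
    status=1
  fi
  if ! grep -Eq '^[[:space:]]*-[[:space:]]*TZ=' "$file"; then
    echo "$file: TZ is not set" >&2
    status=1
  fi
  return "$status"
}
```

For a full syntax check, `docker compose -f FILE config --quiet` validates the file against the Compose schema.
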
## Portainer GitOps Configuration

### Stack Creation

1. **Navigate to Stacks** in Portainer
2. **Create new stack** with descriptive name
3. **Select Git repository** as source
4. **Configure repository settings**:
   - Repository URL: `https://git.vish.gg/Vish/homelab.git`
   - Reference: `refs/heads/main`
   - Compose path: `hostname/service-name.yml`

### Authentication Setup

```bash
# Generate Gitea access token
# Gitea accepts only HTTP basic authentication on this endpoint,
# so curl prompts for the account password
curl -X POST "https://git.vish.gg/api/v1/users/username/tokens" \
  -u "username" \
  -H "Content-Type: application/json" \
  -d '{"name": "portainer-gitops", "scopes": ["read:repository"]}'

# Configure in Portainer
# Settings > Git credentials > Add credential
# Username: gitea-username
# Password: "REDACTED_PASSWORD"
```

### Auto-Update Configuration

- **Polling interval**: 5 minutes
- **Webhook support**: Enabled for immediate updates
- **Rollback capability**: Previous version retention
- **Health checks**: Automated deployment verification

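When a webhook is enabled for a stack, a push can trigger a redeployment without waiting for the next poll. A minimal sketch, assuming the webhook URL has the form `<portainer-url>/api/stacks/webhooks/<id>`; the authoritative URL is the one Portainer shows in the stack's settings. The host name, the ID, and both function names below are placeholders.

```bash
# stack_webhook_url BASE_URL WEBHOOK_ID
# Build the webhook URL (assumed format; copy the real one from Portainer).
stack_webhook_url() {
  printf '%s/api/stacks/webhooks/%s\n' "${1%/}" "$2"
}

# trigger_stack_redeploy BASE_URL WEBHOOK_ID
# POST to the webhook; --fail makes curl return non-zero on HTTP errors.
trigger_stack_redeploy() {
  curl --fail --silent --show-error -X POST "$(stack_webhook_url "$1" "$2")"
}
```

A call such as `trigger_stack_redeploy https://portainer.example:9443 <webhook-id>` could be placed in a Git server push hook or a CI job.
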
## Deployment Workflow

### Development Process

1. **Local development**: Test changes locally
2. **Git commit**: Commit changes with descriptive messages
3. **Git push**: Push to main branch
4. **Automatic deployment**: Portainer detects changes
5. **Health verification**: Automated health checks
6. **Monitoring**: Continuous monitoring and alerting

### Commit Message Standards

```bash
# Feature additions
git commit -m "feat(plex): add hardware transcoding support"

# Bug fixes
git commit -m "fix(nginx): resolve SSL certificate renewal issue"

# Configuration updates
git commit -m "config(monitoring): update Prometheus retention policy"

# Documentation
git commit -m "docs(readme): update service deployment instructions"
```

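The `type(scope): description` pattern in these examples can be enforced locally with a `commit-msg` hook. The sketch below is one possible check, not an existing script in the repository: the allowed types are taken from the four examples above, and the function names are hypothetical.

```bash
# is_valid_commit_subject SUBJECT
# Succeed when SUBJECT looks like "type(scope): description".
# Assumption: only the types shown in the examples are allowed.
is_valid_commit_subject() {
  printf '%s\n' "$1" | grep -Eq '^(feat|fix|config|docs)\([a-z0-9_-]+\): .+'
}

# check_commit_msg_file FILE
# Validate the first line of a commit message file.
check_commit_msg_file() {
  subject=$(head -n 1 "$1")
  if ! is_valid_commit_subject "$subject"; then
    echo "commit subject must look like: type(scope): description" >&2
    return 1
  fi
}
```

To use it as a hook, save the functions in `.git/hooks/commit-msg`, add a final line `check_commit_msg_file "$1"`, and make the file executable. Git passes the path of the message file as the first argument and aborts the commit when the hook exits non-zero.
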
### Branch Strategy

- **main**: Production deployments
- **develop**: Development and testing (future)
- **feature/***: Feature development branches (future)
- **hotfix/***: Emergency fixes (future)

## Environment Management

### Environment Variables

```yaml
# .env file structure (not in Git)
PUID=1000
PGID=1000
TZ=America/New_York
SERVICE_PORT=8080
DATABASE_PASSWORD="REDACTED_PASSWORD"
API_KEY=secret-api-key
```

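Because the `.env` file stays out of Git, it can help to commit a value-free template that documents which variables a stack expects. A small sketch under that assumption; `make_env_example` and the `.env.example` file name are illustrative and do not come from the repository.

```bash
# make_env_example FILE
# Print FILE with every value removed: comments and blank lines pass
# through, and KEY=value lines are reduced to KEY=.
make_env_example() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=/' "$1"
}
```

For example, `make_env_example .env > .env.example` produces a file that is safe to commit. Separately, `git check-ignore -q .env` exits with status 0 only when `.env` is covered by an ignore rule, which makes it a convenient pre-push sanity check.
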
### Secrets Management

```yaml
# Using Docker secrets
secrets:
  db_password:
    external: true
    name: postgres_password

  api_key:
    external: true
    name: service_api_key

services:
  app:
    secrets:
      - db_password
      - api_key
```

### Configuration Templates

```yaml
# Template with environment substitution
services:
  app:
    image: app:${APP_VERSION:-latest}
    environment:
      - DATABASE_URL=postgres://user:${DB_PASSWORD}@db:5432/app
      - API_KEY=${API_KEY}
    ports:
      - "${APP_PORT:-8080}:8080"
```

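The `${VAR:-default}` form in the template follows shell parameter expansion: the default applies when the variable is unset or empty. The sketch below reproduces two lines of the template in plain shell to show the effect; the function names are illustrative only.

```bash
# Mirror of the template's image and port lines, using the same
# ${VAR:-default} syntax that Compose interpolation supports.
render_image() {
  printf 'app:%s\n' "${APP_VERSION:-latest}"
}

render_port_mapping() {
  printf '%s:8080\n' "${APP_PORT:-8080}"
}
```

With `APP_PORT` unset or empty, `render_port_mapping` prints `8080:8080`; after `APP_PORT=9090` it prints `9090:8080`. To see the values Compose itself resolves, `docker compose config` prints the file with all variables substituted.
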
## Service Categories

### Infrastructure Services

- **Monitoring**: Prometheus, Grafana, AlertManager
- **Networking**: Nginx Proxy Manager, Pi-hole, WireGuard
- **Storage**: MinIO, Syncthing, backup services
- **Security**: Vaultwarden, Authentik, fail2ban

### Media Services

- **Streaming**: Plex, Jellyfin, Navidrome
- **Management**: Sonarr, Radarr, Lidarr, Prowlarr
- **Tools**: Tdarr, Calibre, YouTube-DL

### Development Services

- **Version Control**: Gitea, GitLab (archived)
- **CI/CD**: Gitea Runner, Jenkins (planned)
- **Tools**: Code Server, Jupyter, Draw.io

### Communication Services

- **Chat**: Matrix Synapse, Mattermost
- **Social**: Mastodon, Element
- **Notifications**: NTFY, Gotify

## Monitoring and Observability

### Deployment Monitoring

```yaml
# Prometheus monitoring for GitOps
- job_name: 'portainer'
  static_configs:
    - targets: ['portainer:9000']
  metrics_path: '/api/endpoints/1/docker/containers/json'

- job_name: 'docker-daemon'
  static_configs:
    - targets: ['localhost:9323']
```

### Health Checks

```yaml
# Service health check configuration
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s
```

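Once a container defines a health check, its state can be polled after a deployment. The following is a sketch rather than part of the existing pipeline: `wait_for_healthy` is a hypothetical helper that reads the status Docker reports through `docker inspect` and gives up after a fixed number of attempts.

```bash
# wait_for_healthy CONTAINER [RETRIES] [DELAY_SECONDS]
# Poll the container's health status until it reports "healthy".
# Containers without a healthcheck never report it, so they time out.
wait_for_healthy() {
  container=$1
  retries=${2:-10}
  delay=${3:-5}
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    state=$(docker inspect --format '{{.State.Health.Status}}' "$container" 2>/dev/null) || state=unknown
    if [ "$state" = "healthy" ]; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "$container did not become healthy (last status: $state)" >&2
  return 1
}
```

For the configuration above, `wait_for_healthy service-name 12 10` would allow up to two minutes, which covers the 60-second `start_period` plus a few check intervals.
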
### Alerting Rules

```yaml
# Deployment failure alerts
- alert: REDACTED_APP_PASSWORD
  expr: increase(portainer_stack_deployment_failures_total[5m]) > 0
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: "Stack deployment failed"
    description: "Stack {{ $labels.stack_name }} deployment failed"

- alert: REDACTED_APP_PASSWORD
  expr: container_health_status{health_status!="healthy"} == 1
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "Service health check failing"
```

## Security Best Practices

### Access Control

- **Git repository**: Private repository with access controls
- **Portainer access**: Role-based access control
- **Service isolation**: Network segmentation
- **Secrets management**: External secret storage

### Security Scanning

```yaml
# Security scanning in CI/CD pipeline
security_scan:
  stage: security
  script:
    - docker run --rm -v "$(pwd)":/app clair-scanner:latest
    # Current Trivy releases use --scanners in place of the deprecated --security-checks
    - trivy fs --scanners vuln,misconfig .
    - hadolint Dockerfile
```

### Network Security

```yaml
# Network isolation
networks:
  frontend:
    driver: bridge
    internal: false
  backend:
    driver: bridge
    internal: true
  database:
    driver: bridge
    internal: true
```

## Backup and Recovery

### Configuration Backup

```bash
# Backup Portainer configuration
# The Portainer image ships without a shell or tar, so archive its /data
# volume from a helper container (assumes /backup is a directory on the host)
docker run --rm --volumes-from portainer -v /backup:/backup alpine \
  tar -czf "/backup/portainer-config-$(date +%Y%m%d).tar.gz" /data

# Backup Git repository
git clone --mirror https://git.vish.gg/Vish/homelab.git /backup/homelab-mirror
```

### Disaster Recovery

1. **Repository restoration**: Clone from backup or remote
2. **Portainer restoration**: Restore configuration and stacks
3. **Service redeployment**: Automatic redeployment from Git
4. **Data restoration**: Restore persistent volumes
5. **Verification**: Comprehensive service testing

### Recovery Testing

```bash
# Regular disaster recovery testing
./scripts/test-disaster-recovery.sh
```

## Troubleshooting

### Common Issues

#### Deployment Failures

```bash
# Check Portainer logs
docker logs portainer

# Verify Git connectivity
git ls-remote https://git.vish.gg/Vish/homelab.git

# Check Docker daemon
docker system info
```

#### Service Health Issues

```bash
# Check container status
docker ps -a

# View service logs
docker logs service-name

# Inspect container configuration
docker inspect service-name
```

#### Network Connectivity

```bash
# Test network connectivity
docker network ls
docker network inspect network-name

# Check port bindings
netstat -tulpn | grep :8080
```

### Debugging Tools

```bash
# Docker system information
docker system df
docker system events

# Container resource usage
docker stats

# Network troubleshooting
docker exec container-name ping other-container
```

## Performance Optimization

### Resource Management

```yaml
# Resource limits and reservations
deploy:
  resources:
    limits:
      memory: 1G
      cpus: '1.0'
    reservations:
      memory: 512M
      cpus: '0.5'
```

### Storage Optimization

```yaml
# Efficient volume management
volumes:
  app-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /opt/app/data
```

### Network Optimization

```yaml
# Optimized network configuration
networks:
  app-network:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: app-br0
      com.docker.network.driver.mtu: 1500
```

## Future Enhancements

### Planned Features

- **Multi-environment support**: Development, staging, production
- **Advanced rollback**: Automated rollback on failure
- **Blue-green deployments**: Zero-downtime deployments
- **Canary releases**: Gradual rollout strategy

### Integration Improvements

- **Webhook automation**: Immediate deployment triggers
- **Slack notifications**: Deployment status updates
- **Automated testing**: Pre-deployment validation
- **Security scanning**: Automated vulnerability assessment

---

**Status**: ✅ GitOps deployment pipeline operational with 67+ active stacks