Sanitized mirror from private repository - 2026-04-05 05:34:18 UTC
docs/infrastructure/hosts/calypso-runbook.md
# Calypso Runbook

*Synology DS723+ - Secondary NAS and Infrastructure*

**Endpoint ID:** 443397
**Status:** 🟢 Online
**Hardware:** AMD Ryzen R1600, 32GB RAM, 2 bays + expansion
**Access:** `calypso.vish.local`

---

## Overview

Calypso is the secondary Synology NAS handling critical infrastructure services including authentication, reverse proxy, and monitoring.

## Hardware Specs

| Component | Specification |
|-----------|---------------|
| Model | Synology DS723+ |
| CPU | AMD Ryzen R1600 (2-core/4-thread) |
| RAM | 32GB |
| Storage | 2-bay SHR + eSATA expansion |
| Network | 2x 1GbE |

## Services

### Critical Infrastructure

| Service | Port | Purpose | Status |
|---------|------|---------|--------|
| **Nginx Proxy Manager** | 80/443 | SSL termination & routing | Required |
| **Authentik** | 9000 | SSO authentication | Required |
| **Prometheus** | 9090 | Metrics collection | Required |
| **Grafana** | 3000 | Dashboards | Required |
| **Alertmanager** | 9093 | Alert routing | Required |

### Additional Services

| Service | Port | Purpose |
|---------|------|---------|
| AdGuard | 3053 | DNS filtering (backup) |
| Paperless-NGX | 8000 | Document management |
| Reactive Resume | 3001 | Resume builder |
| Gitea | 3000/22 | Git hosting |
| Gitea Runner | 3008 | CI/CD |
| Headscale | 8080 | WireGuard VPN controller |
| Seafile | 8082 | File sync & share |
| Syncthing | 8384 | File sync |
| WireGuard | 51820 | VPN server |
| Portainer Agent | 9001 | Container management |

### Media (ARR Stack)

- Sonarr, Radarr, Lidarr
- Prowlarr (indexers)
- Bazarr (subtitles)

---

## Storage Layout

```
/volume1/
├── docker/
│   ├── compose/
│   └── appdata/          # Application data
│       ├── authentik/
│       ├── npm/
│       ├── prometheus/
│       └── grafana/
├── documents/            # Paperless
├── seafile/              # Seafile data
└── backups/              # Backup destination
```

---

## Daily Operations

### Check Service Health

```bash
# Via Portainer
# Note: 9001 is the Portainer Agent port. The agent serves no web UI of its own,
# so if this page does not load, use the Portainer server that manages this endpoint.
open http://calypso.vish.local:9001

# Via SSH (run docker ps on Calypso after logging in)
ssh admin@calypso.vish.local
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
```

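
The manual `docker ps` check above can be scripted. The following is a minimal sketch rather than established tooling for this host: it assumes the container names `nginx-proxy-manager`, `authentik-server`, and `prometheus` (taken from the log commands elsewhere in this runbook), and the function name `missing_containers` is hypothetical.

```bash
# Sketch: print the expected containers that are NOT currently running.
# The names below are assumptions; adjust them to match `docker ps` on Calypso.
CRITICAL_CONTAINERS="nginx-proxy-manager authentik-server prometheus"

missing_containers() {
  local running name
  # Names of all running containers, one per line.
  running="$(docker ps --format '{{.Names}}')"
  for name in $CRITICAL_CONTAINERS; do
    # -x matches the whole line, -F treats the name as a literal string.
    if ! printf '%s\n' "$running" | grep -qxF "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

Running `missing_containers` over SSH prints nothing when every listed container is up, so an empty result is the healthy case.
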
### Monitor Critical Services

```bash
# Check NPM
curl -I http://localhost:80

# Check Authentik
curl -I http://localhost:9000

# Check Prometheus
curl -I http://localhost:9090
```

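
The individual `curl` probes above can be wrapped in one pass over all critical services. This is a sketch under stated assumptions: the ports come from the Critical Infrastructure table, the 5-second timeout is arbitrary, and the function names `check_http` and `check_critical` are hypothetical. Any HTTP response, including a redirect or 404, is treated as "the service is answering". Call `check_critical` on Calypso; it returns non-zero if any service is down.

```bash
# Sketch: probe one URL and print a status line for it.
check_http() {
  local name="$1" url="$2" code
  # -s silent, -o discard body, -m 5 timeout, -w print only the status code.
  # curl prints 000 when no HTTP response was received at all.
  code="$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")"
  if [ -n "$code" ] && [ "$code" != "000" ]; then
    printf '%s: OK (HTTP %s)\n' "$name" "$code"
  else
    printf '%s: DOWN\n' "$name"
    return 1
  fi
}

# Sketch: check every service from the Critical Infrastructure table.
check_critical() {
  local failed=0
  check_http "NPM"          "http://localhost:80"   || failed=1
  check_http "Authentik"    "http://localhost:9000" || failed=1
  check_http "Prometheus"   "http://localhost:9090" || failed=1
  check_http "Grafana"      "http://localhost:3000" || failed=1
  check_http "Alertmanager" "http://localhost:9093" || failed=1
  return "$failed"
}
```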
---

## Common Issues

### NPM Not Routing

1. Check if NPM is running: `docker ps | grep nginx-proxy-manager`
2. Verify proxy hosts are configured: NPM UI → Proxy Hosts
3. Check SSL certificates: NPM UI → SSL Certificates
4. Review NPM logs: `docker logs nginx-proxy-manager`

### Authentik SSO Broken

1. Check Authentik is running: `docker ps | grep authentik`
2. Verify PostgreSQL: `docker logs authentik-postgresql`
3. Check Redis: `docker logs authentik-redis`
4. Review the OIDC configuration in each affected service

### Prometheus Down

1. Check storage: `docker system df`
2. Verify volume: `docker volume ls | grep prometheus`
3. Check retention settings
4. Review logs: `docker logs prometheus`

---

## Maintenance

### Weekly

- [ ] Verify Authentik users can log in
- [ ] Check Prometheus metrics collection
- [ ] Review Alertmanager notifications
- [ ] Verify NPM certificates

### Monthly

- [ ] Clean unused Docker images
- [ ] Review Prometheus retention
- [ ] Update applications
- [ ] Check disk usage

### Quarterly

- [ ] Test OAuth flows
- [ ] Verify backup restoration
- [ ] Review monitoring thresholds
- [ ] Update SSL certificates

---

## SSL Certificate Management

NPM handles all SSL certificates:

1. **Automatic Renewal**: Let's Encrypt (default)
2. **Manual**: Access NPM → SSL Certificates → Add
3. **Check Status**: NPM Dashboard → SSL

### Common Certificate Issues

- Rate limits: Let's Encrypt throttles repeated failed or duplicate requests; wait at least 1 hour before retrying
- DNS challenge: Verify that the external DNS records resolve correctly
- Self-signed: Use only for internal services

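
Certificate status can also be checked from a shell instead of the NPM dashboard. The sketch below assumes `openssl` is available on the machine running it; the function name `cert_valid_for`, the 14-day default, and the hostname in the usage comment are all hypothetical. A non-zero result means the certificate expires within the window or could not be fetched at all.

```bash
# Sketch: succeed if the certificate served by HOST on port 443 remains valid
# for at least DAYS more days (default 14, an arbitrary assumption).
cert_valid_for() {
  local host="$1" days="${2:-14}"
  # s_client fetches the served certificate; x509 -checkend exits non-zero
  # when the certificate expires within the given number of seconds.
  openssl s_client -connect "${host}:443" -servername "$host" </dev/null 2>/dev/null \
    | openssl x509 -noout -checkend "$(( days * 86400 ))" >/dev/null 2>&1
}

# Usage (hypothetical hostname):
#   cert_valid_for git.example.com 14 || echo "certificate needs attention"
```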
---

## Backup Procedures

### Configuration Backup

```bash
# Via Ansible
ansible-playbook ansible/automation/playbooks/backup_configs.yml --tags calypso
```

### Key Data to Back Up

- NPM configurations: `/volume1/docker/compose/nginx_proxy_manager/`
- Authentik: `/volume1/docker/appdata/authentik/`
- Prometheus: `/volume1/docker/appdata/prometheus/`
- Grafana: `/volume1/docker/appdata/grafana/`

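
If the Ansible playbook is unavailable, the directories listed above can be archived by hand. This is a rough sketch, not a description of what the playbook does: the function name `backup_key_data` and the archive naming scheme are assumptions, and `/volume1/backups` is taken from the storage layout. Databases and the Prometheus TSDB are only copied consistently if their containers are stopped first.

```bash
# Sketch: archive the given directories into one timestamped tarball under
# DEST and print the path of the archive that was written.
backup_key_data() {
  local dest="$1"
  shift
  local archive
  archive="${dest}/calypso-config-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  # The remaining arguments are the directories to include.
  tar -czf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}

# Usage:
#   backup_key_data /volume1/backups \
#     /volume1/docker/compose/nginx_proxy_manager \
#     /volume1/docker/appdata/authentik \
#     /volume1/docker/appdata/prometheus \
#     /volume1/docker/appdata/grafana
```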
---

## Emergency Procedures

### Authentik Down

**Impact**: SSO broken for all services

1. Verify containers are running: `docker ps | grep authentik`
2. Check PostgreSQL: `docker logs authentik-postgresql`
3. Check Redis: `docker logs authentik-redis`
4. Restart Authentik: `cd /volume1/docker/compose/authentik && docker-compose restart`
5. If needed, restore from backup

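
After a restart it helps to wait until Authentik actually answers before declaring the incident over. The sketch below polls a URL until it responds successfully. The function name `wait_for_http` and the default attempt count and delay are hypothetical, and the health path in the usage note is an assumption based on Authentik's documented health endpoints, so verify it against the deployed version.

```bash
# Sketch: poll URL until it answers successfully or the attempts run out.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP 4xx/5xx, so only a healthy reply passes.
    if curl -fsS -o /dev/null -m 5 "$url" 2>/dev/null; then
      printf 'up after %s attempt(s)\n' "$i"
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  printf 'still down after %s attempts\n' "$attempts" >&2
  return 1
}
```

For example, `wait_for_http http://localhost:9000/-/health/ready/ 30 5` would retry for roughly two and a half minutes; the same function works for the other critical services on their own ports.
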
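### NPM Down

**Impact**: No external access

1. Verify container: `docker ps | grep nginx-proxy-manager`
2. Check ports 80/443: `netstat -tulpn | grep -E ':(80|443) '`
3. Restart: `cd /volume1/docker/compose/nginx_proxy_manager && docker-compose restart`
4. Check DNS resolution

### Prometheus Full

**Impact**: No metrics

1. Check storage: `docker system df`
2. Reduce retention: lower `--storage.tsdb.retention.time` (or set `--storage.tsdb.retention.size`) in the Prometheus container's command; retention is normally controlled by these flags rather than by `prometheus.yml`
3. Clean old data: Prometheus removes expired blocks on its own once retention is lowered. `promtool` has no subcommand for deleting data; to drop specific series sooner, use the TSDB admin API (`delete_series`, then `clean_tombstones`), which requires `--web.enable-admin-api`
4. Restart container: `docker restart prometheus`
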
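
`docker system df` reports Docker's own usage, so it can be paired with a check of the volume itself. The following is a small sketch: the function names `volume_usage_pct` and `check_volume` and the 85% default threshold are assumptions, while `/volume1` comes from the storage layout. `check_volume /volume1 85` returns non-zero once the volume is at or above the limit.

```bash
# Sketch: print the usage percentage of the filesystem holding PATH.
volume_usage_pct() {
  # df -P prints one POSIX-format line per filesystem; field 5 is "NN%".
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Sketch: report usage and fail when it reaches LIMIT percent (default 85).
check_volume() {
  local path="$1" limit="${2:-85}" pct
  pct="$(volume_usage_pct "$path")"
  # An empty value means df produced no usable output for this path.
  [ -n "$pct" ] || return 2
  if [ "$pct" -ge "$limit" ]; then
    printf '%s is %s%% full (limit %s%%)\n' "$path" "$pct" "$limit"
    return 1
  fi
  printf '%s is %s%% full\n' "$path" "$pct"
}
```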
---

## Useful Commands

```bash
# SSH access
ssh admin@calypso.vish.local

# Check critical services
docker ps --filter "name=nginx" --filter "name=authentik" --filter "name=prometheus"

# Restart infrastructure
cd /volume1/docker/compose/nginx_proxy_manager && docker-compose restart
cd /volume1/docker/compose/authentik && docker-compose restart

# View logs
docker logs -f nginx-proxy-manager
docker logs -f authentik-server
docker logs -f prometheus
```

---

## Links

- [Synology DSM](https://calypso.vish.local:5001)
- [Nginx Proxy Manager](http://calypso.vish.local:81)
- [Authentik](http://calypso.vish.local:9000)
- [Prometheus](http://calypso.vish.local:9090)
- [Grafana](http://calypso.vish.local:3000)
- [Alertmanager](http://calypso.vish.local:9093)