Sanitized mirror from private repository - 2026-03-21 06:02:48 UTC
docs/admin/REPOSITORY_SANITIZATION.md

# Repository Sanitization

This document describes the sanitization process used to create a safe public mirror of the private homelab repository.

## Overview

The `.gitea/sanitize.py` script automatically removes sensitive information before pushing content to the public repository ([homelab-optimized](https://git.vish.gg/Vish/homelab-optimized)). This ensures that while the public repo contains useful configuration examples, no actual secrets, passwords, or private keys are exposed.

## How It Works

The sanitization script runs as part of the [Mirror to Public Repository](../../.gitea/workflows/mirror-to-public.yaml) Gitea Actions workflow. It performs three main operations:

1. **Remove sensitive files completely** - Files containing only secrets are deleted
2. **Remove entire directories** - Directories that shouldn't be public are deleted
3. **Redact sensitive patterns** - Secrets in file contents are found and replaced with placeholders
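The three operations above could be sketched roughly as follows. This is an illustrative outline, not the actual contents of `.gitea/sanitize.py`: `FILES_TO_REMOVE` and `SENSITIVE_PATTERNS` are names this document mentions under Troubleshooting, while `DIRS_TO_REMOVE`, the `sanitize` function, and the single example regex are assumptions made for the sketch.

```python
import re
import shutil
from pathlib import Path

# Illustrative values only; the real lists live in .gitea/sanitize.py.
FILES_TO_REMOVE = ["hosts/synology/atlantis/jitsi/.env"]
DIRS_TO_REMOVE = [".gitea"]  # hypothetical name
SENSITIVE_PATTERNS = [
    # (regex, replacement): keep the key name, replace only the value
    (re.compile(r"(?i)(password\s*[:=]\s*)\S+"), r"\1REDACTED_PASSWORD"),
]


def sanitize(repo_root: Path) -> None:
    # 1. Remove files that contain only secrets.
    for rel in FILES_TO_REMOVE:
        (repo_root / rel).unlink(missing_ok=True)

    # 2. Remove directories that should not be public.
    for rel in DIRS_TO_REMOVE:
        shutil.rmtree(repo_root / rel, ignore_errors=True)

    # 3. Redact sensitive patterns in the remaining text files.
    for path in repo_root.rglob("*"):
        if not path.is_file() or ".git" in path.relative_to(repo_root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8 text, leave untouched
        redacted = text
        for pattern, replacement in SENSITIVE_PATTERNS:
            redacted = pattern.sub(replacement, redacted)
        if redacted != text:
            path.write_text(redacted, encoding="utf-8")
```

Running deletions before redaction means the script never reads files that are going to be dropped anyway.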

## Files Removed Completely

The following categories of files are completely removed from the public mirror:

| Category | Examples |
|----------|----------|
| Private keys/certificates | `.pem` private keys, WireGuard configs |
| Environment files | `.env` files with secrets |
| Token files | API token text files |
| CI/CD workflows | `.gitea/` directory |

### Specific Files Removed

- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem`
- `hosts/edge/nvidia_shield/wireguard/*.conf`
- `hosts/synology/atlantis/jitsi/.env`
- `hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf`
- `.gitea/` directory (entire CI/CD configuration)
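Because one of the entries above is a glob (`wireguard/*.conf`), the removal list presumably needs pattern matching rather than exact string comparison. A minimal sketch of such a check, assuming `fnmatch`-style matching on repo-relative POSIX paths (the helper `should_remove` is hypothetical, and the real script may resolve globs differently):

```python
from fnmatch import fnmatchcase

# Illustrative subset of the entries listed above.
FILES_TO_REMOVE = [
    "hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem",
    "hosts/edge/nvidia_shield/wireguard/*.conf",
    "hosts/synology/atlantis/jitsi/.env",
]


def should_remove(rel_path: str) -> bool:
    """Return True if a repo-relative POSIX path matches any removal entry."""
    return any(fnmatchcase(rel_path, entry) for entry in FILES_TO_REMOVE)
```

Note that in `fnmatch`, `*` also matches `/`, so the WireGuard entry in this sketch would match `.conf` files in subdirectories as well.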

## Redacted Patterns

The script searches for and redacts the following types of sensitive data:

### Passwords

- Generic `password`, `PASSWORD`, `PASSWD` values
- Service-specific passwords (Jitsi, SNMP, etc.)

### API Keys & Tokens

- Portainer tokens (`ptr_...`)
- OpenAI API keys (`sk-...`)
- Cloudflare API tokens
- Generic API keys and secrets
- JWT secrets and private keys

### Authentication

- WireGuard private keys
- Authentik secrets and passwords
- Matrix/Synapse registration secrets
- OAuth client secrets

### Personal Information

- Personal email addresses replaced with examples
- SSH public key comments

### Database Credentials

- PostgreSQL/MySQL connection strings with embedded passwords
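Connection strings are a case where only part of the value should change: the scheme, user, host, and database name are useful as examples, while the password is not. One way to do this is with capture groups, sketched below. The regex and the function name `redact_connection_strings` are assumptions for illustration and may not match what `sanitize.py` actually uses.

```python
import re

# Matches scheme://user:password@ and captures everything except the password.
# Hypothetical pattern; the real script's regex may differ.
CONNECTION_STRING = re.compile(
    r"\b((?:postgres(?:ql)?|mysql)://[^:/\s@]+:)[^@\s]+(@)"
)


def redact_connection_strings(text: str) -> str:
    # Keep group 1 (scheme and user) and group 2 ("@"), replace the password.
    return CONNECTION_STRING.sub(r"\1REDACTED_PASSWORD\2", text)
```

URLs without an embedded password (for example `postgresql://db:5432/synapse`) do not match and are left unchanged.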

## Replacement Values

All sensitive data is replaced with descriptive placeholder text:

| Data type | Replacement |
|-----------|-------------|
| Passwords | `REDACTED_PASSWORD` |
| API Keys | `REDACTED_API_KEY` |
| Tokens | `REDACTED_TOKEN` |
| Private Keys | `REDACTED_PRIVATE_KEY` |
| Email addresses | `your-email@example.com` |
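As a rough illustration of how the placeholders in this table might be paired with patterns, the sketch below applies a list of `(regex, replacement)` pairs in order. The placeholder strings come from the table; every regex here is a guess at a plausible shape (token lengths, the `PrivateKey = ...` WireGuard form, and treating `gmail.com` addresses as personal are all assumptions), and `redact` is a hypothetical helper name.

```python
import re

# (pattern, replacement) pairs; the regexes are illustrative guesses.
SENSITIVE_PATTERNS = [
    (re.compile(r"ptr_[A-Za-z0-9+/=]{20,}"), "REDACTED_TOKEN"),
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "REDACTED_API_KEY"),
    (re.compile(r"(PrivateKey\s*=\s*)\S+"), r"\1REDACTED_PRIVATE_KEY"),
    (re.compile(r"(?i)(pass(?:word|wd)\s*[:=]\s*)\S+"), r"\1REDACTED_PASSWORD"),
    (re.compile(r"[\w.+-]+@gmail\.com"), "your-email@example.com"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Patterns that keep the key name (via `\1`) leave the configuration readable, so the public mirror still shows which settings exist.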

## Files Skipped

The following are not processed during content redaction (mostly binary assets and Git internals):

- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.ico`, `.svg`)
- Fonts (`.woff`, `.woff2`, `.ttf`, `.eot`)
- Git metadata (`.git/` directory)
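A skip check along these lines could be written as below. The extension list is taken from this section; the names `SKIP_EXTENSIONS`, `SKIP_DIRS`, and `should_skip` are hypothetical and chosen only for the sketch.

```python
from pathlib import PurePosixPath

SKIP_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",  # images
    ".woff", ".woff2", ".ttf", ".eot",                # fonts
}
SKIP_DIRS = {".git"}


def should_skip(rel_path: str) -> bool:
    """Return True if a repo-relative path should not be scanned for secrets."""
    path = PurePosixPath(rel_path)
    # Skip anything located under a skipped directory such as .git/.
    if SKIP_DIRS.intersection(path.parts):
        return True
    # Compare extensions case-insensitively so LOGO.PNG is skipped too.
    return path.suffix.lower() in SKIP_EXTENSIONS
```

Matching on whole path components keeps `.gitea/` and `.gitignore` from being mistaken for `.git/`.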

## Running Sanitization Manually

To run the sanitization script locally:

```bash
cd /path/to/homelab
python3 .gitea/sanitize.py
```

The script will:

1. Remove sensitive files
2. Remove sensitive directories
3. Sanitize file contents across the entire repository

## Verification

After sanitization, you can verify the public repository contains no secrets by:

1. Searching for common secret patterns:

```bash
# Password assignments (YAML "password: x" or env "PASSWORD=x"); only placeholder values should appear
grep -rniE "pass(word|wd)\s*[:=]" --include="*.yml" --include="*.yaml" --include="*.env" .
# OpenAI-style keys; \b avoids matching "sk-" inside words such as "task-"
grep -rn "\bsk-" --include="*.yml" --include="*.yaml" .
# Placeholders: matches here are expected and confirm that redaction ran
grep -rn "REDACTED" .
```

2. Checking that `.gitea/` directory is not present
3. Verifying no `.env` files with secrets exist
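The manual checks above can also be approximated in a script that flags password-like assignments whose value is not a placeholder. This is a sketch under assumptions, not part of the documented tooling: the `ASSIGNMENT` regex and the `find_unredacted` helper are hypothetical, and values such as `${DB_PASSWORD}` references would be reported as well and need manual review.

```python
import re
from pathlib import Path

# Password-like assignment; the captured value is compared against the placeholder prefix.
ASSIGNMENT = re.compile(r"(?i)pass(?:word|wd)\s*[:=]\s*(\S+)")


def find_unredacted(repo_root: Path) -> list[tuple[str, int]]:
    """Return (relative path, line number) for password values that are not placeholders."""
    hits = []
    for path in sorted(repo_root.rglob("*")):
        rel = path.relative_to(repo_root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except UnicodeDecodeError:
            continue  # not valid UTF-8 text
        for number, line in enumerate(lines, start=1):
            match = ASSIGNMENT.search(line)
            if match and not match.group(1).strip("\"'").startswith("REDACTED_"):
                hits.append((rel.as_posix(), number))
    return hits
```

An empty result does not prove the mirror is clean; it only means this one pattern found nothing outside the placeholders.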

## Public Repository

The sanitized public mirror is available at:

- **URL**: https://git.vish.gg/Vish/homelab-optimized
- **Purpose**: Share configuration examples without exposing secrets
- **Update Frequency**: Automatically synced on every push to the main branch

## Troubleshooting

### Sensitive Data Still Appearing

If you find sensitive data in the public mirror:

1. If the whole file is sensitive, add it to `FILES_TO_REMOVE` in `.gitea/sanitize.py`
2. If only specific values are sensitive, add a new regex pattern to `SENSITIVE_PATTERNS`
3. Run the workflow manually to re-push

### False Positives

If legitimate content is being redacted incorrectly:

1. Identify the pattern causing the issue
2. Modify the regex to be more specific
3. Test locally before pushing
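As an example of making a regex more specific, the sketch below compares a broad `sk-` pattern with a narrower one and checks both against a legitimate string and a fake key. Both patterns are hypothetical (the narrower one assumes keys are at least 20 characters long after the prefix); the point is the habit of keeping a known-good and a known-bad sample next to each pattern when testing locally.

```python
import re

# Too broad: matches "sk-" anywhere, including inside ordinary words.
BROAD = re.compile(r"sk-\S+")
# Narrower (assumed key shape): word boundary plus a long run of key characters.
SPECIFIC = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")

legit = "image: ghcr.io/example/task-runner:latest"
secret = "OPENAI_API_KEY=sk-" + "a1B2" * 8  # fake key, for testing only


def is_redacted_by(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern would trigger a redaction in the text."""
    return pattern.search(text) is not None


# The broad pattern corrupts the image name; the specific one leaves it alone.
assert is_redacted_by(BROAD, legit)
assert not is_redacted_by(SPECIFIC, legit)
# The specific pattern still catches the key-shaped value.
assert is_redacted_by(SPECIFIC, secret)
```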

---

**Last Updated**: February 17, 2026