# Repository Sanitization

This document describes the sanitization process used to create a safe public mirror of the private homelab repository.

## Overview

The `.gitea/sanitize.py` script automatically removes sensitive information before content is pushed to the public repository ([homelab-optimized](https://git.vish.gg/Vish/homelab-optimized)). The public repo therefore contains useful configuration examples, but no actual secrets, passwords, or private keys.

## How It Works

The sanitization script runs as part of the [Mirror to Public Repository](../.gitea/workflows/mirror-to-public.yaml) Gitea Actions workflow. It performs three main operations:

1. **Remove sensitive files completely** - Files containing only secrets are deleted
2. **Remove entire directories** - Directories that shouldn't be public are deleted
3. **Redact sensitive patterns** - Secrets in file contents are found and replaced with placeholders

## Files Removed Completely

The following categories of files are completely removed from the public mirror:

| Category | Examples |
|----------|----------|
| Private keys/certificates | `.pem` private keys, WireGuard configs |
| Environment files | `.env` files with secrets |
| Token files | API token text files |
| CI/CD workflows | `.gitea/` directory |

### Specific Files Removed

- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem`
- `hosts/edge/nvidia_shield/wireguard/*.conf`
- `hosts/synology/atlantis/jitsi/.env`
- `hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf`
- `.gitea/` directory (entire CI/CD configuration)

## Redacted Patterns

The script searches for and redacts the following types of sensitive data:

### Passwords

- Generic `password`, `PASSWORD`, `PASSWD` values
- Service-specific passwords (Jitsi, SNMP, etc.)

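As an illustration of how a generic password rule might work, here is a minimal sketch. It is not the code in `sanitize.py`: the names `PASSWORD_PATTERN` and `redact_passwords` are hypothetical, and the regex is an assumption about what a "generic password value" looks like (a `password`/`passwd` key, a `:` or `=` separator, and a value without spaces, optionally quoted).

```python
import re

# Hypothetical rule, not the pattern used by sanitize.py.
# Matches keys such as password, PASSWORD, DB_PASSWORD or PASSWD followed by
# ':' or '=' and a value without spaces, optionally wrapped in quotes.
PASSWORD_PATTERN = re.compile(
    r"""(?P<key>password|passwd)(?P<sep>\s*[:=]\s*)(?P<quote>["']?)(?P<value>[^\s"']+)(?P=quote)""",
    re.IGNORECASE,
)


def redact_passwords(text: str) -> str:
    """Replace password values with a placeholder, keeping key, separator and quotes."""
    return PASSWORD_PATTERN.sub(
        lambda m: f"{m['key']}{m['sep']}{m['quote']}REDACTED_PASSWORD{m['quote']}",
        text,
    )


if __name__ == "__main__":
    sample = 'JITSI_PASSWORD=hunter2\ndb_password: "s3cret"\nimage: nginx:latest\n'
    print(redact_passwords(sample))
```

Because the key, separator, and quoting are preserved, the redacted file keeps its original shape and still works as a configuration example. This sketch deliberately ignores quoted values that contain spaces; a real rule would need to decide how to handle them.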
### API Keys & Tokens

- Portainer tokens (`ptr_...`)
- OpenAI API keys (`sk-...`)
- Cloudflare API tokens
- Generic API keys and secrets
- JWT secrets and private keys

### Authentication

- WireGuard private keys
- Authentik secrets and passwords
- Matrix/Synapse registration secrets
- OAuth client secrets

### Personal Information

- Personal email addresses (replaced with example addresses)
- SSH public key comments

### Database Credentials

- PostgreSQL/MySQL connection strings with embedded passwords

## Replacement Values

All sensitive data is replaced with descriptive placeholder text:

| Original | Replacement |
|----------|-------------|
| Passwords | `REDACTED_PASSWORD` |
| API Keys | `REDACTED_API_KEY` |
| Tokens | `REDACTED_TOKEN` |
| Private Keys | `REDACTED_PRIVATE_KEY` |
| Email addresses | `your-email@example.com` |

## Files Skipped

The following file types are not processed (binary files, etc.):

- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.ico`, `.svg`)
- Fonts (`.woff`, `.woff2`, `.ttf`, `.eot`)
- Git metadata (`.git/` directory)

## Running Sanitization Manually

To run the sanitization script locally:

```bash
cd /path/to/homelab
python3 .gitea/sanitize.py
```

The script will:

1. Remove sensitive files
2. Remove sensitive directories
3. Sanitize file contents across the entire repository

## Verification

After sanitization, you can verify that the public repository contains no secrets by:

1. Searching for common secret patterns:

   ```bash
   # Look for password assignments left in config and env files
   grep -r "password\s*=" --include="*.yml" --include="*.yaml" --include="*.env" .
   # Look for OpenAI-style keys
   grep -r "sk-" --include="*.yml" --include="*.yaml" .
   # List the placeholders that were written
   grep -r "REDACTED" .
   ```

2. Checking that the `.gitea/` directory is not present
3. Verifying that no `.env` files with secrets exist

## Public Repository

The sanitized public mirror is available at:

- **URL**: https://git.vish.gg/Vish/homelab-optimized
- **Purpose**: Share configuration examples without exposing secrets
- **Update Frequency**: Automatically synced on every push to the `main` branch

## Troubleshooting

### Sensitive Data Still Appearing

If you find sensitive data in the public mirror:

1. Add the file to `FILES_TO_REMOVE` in `sanitize.py`
2. Add a new regex pattern to `SENSITIVE_PATTERNS`
3. Run the workflow manually to re-push

### False Positives

If legitimate content is being redacted incorrectly:

1. Identify the pattern causing the issue
2. Modify the regex to be more specific
3. Test locally before pushing

---

**Last Updated**: February 17, 2026