Repository Sanitization
This document describes the sanitization process used to create a safe public mirror of the private homelab repository.
Overview
The .gitea/sanitize.py script automatically removes sensitive information before pushing content to the public repository (homelab-optimized). This ensures that while the public repo contains useful configuration examples, no actual secrets, passwords, or private keys are exposed.
How It Works
The sanitization script runs as part of the Mirror to Public Repository GitHub Actions workflow. It performs three main operations:
- Remove sensitive files completely - Files containing only secrets are deleted
- Remove entire directories - Directories that shouldn't be public are deleted
- Redact sensitive patterns - Searches and replaces secrets in file contents
Files Removed Completely
The following categories of files are completely removed from the public mirror:
| Category | Examples |
|---|---|
| Private keys/certificates | .pem private keys, WireGuard configs |
| Environment files | .env files with secrets |
| Token files | API token text files |
| CI/CD workflows | .gitea/ directory |
Specific Files Removed
- hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem
- hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem
- hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem
- hosts/edge/nvidia_shield/wireguard/*.conf
- hosts/synology/atlantis/jitsi/.env
- hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf
- .gitea/ directory (entire CI/CD configuration)
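The removal step could look roughly like the sketch below. FILES_TO_REMOVE is the list named in the Troubleshooting section; DIRS_TO_REMOVE and remove_sensitive_paths are hypothetical names, and the entries shown are only a subset of the paths above, so the real sanitize.py may be organized differently.

```python
import shutil
from pathlib import Path

# Illustrative subset of the paths listed above; the real list lives in sanitize.py.
FILES_TO_REMOVE = [
    "hosts/synology/atlantis/jitsi/.env",
    "hosts/edge/nvidia_shield/wireguard/*.conf",  # glob patterns are expanded
]
# Hypothetical name: the source does not say how directories are configured.
DIRS_TO_REMOVE = [".gitea"]


def remove_sensitive_paths(repo_root):
    """Delete matching files and directories; return the removed relative paths."""
    root = Path(repo_root)
    removed = []
    for pattern in FILES_TO_REMOVE:
        for path in root.glob(pattern):
            if path.is_file():
                path.unlink()
                removed.append(path.relative_to(root).as_posix())
    for rel_dir in DIRS_TO_REMOVE:
        path = root / rel_dir
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(rel_dir)
    return sorted(removed)
```

Returning the removed paths makes it easy to log what the run deleted. Missing paths are ignored, so the function is safe to run twice.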
Redacted Patterns
The script searches for and redacts the following types of sensitive data:
Passwords
- Generic password, PASSWORD, PASSWD values
- Service-specific passwords (Jitsi, SNMP, etc.)
API Keys & Tokens
- Portainer tokens (ptr_...)
- OpenAI API keys (sk-...)
- Cloudflare API tokens
- Generic API keys and secrets
- JWT secrets and private keys
Authentication
- WireGuard private keys
- Authentik secrets and passwords
- Matrix/Synapse registration secrets
- OAuth client secrets
Personal Information
- Personal email addresses replaced with examples
- SSH public key comments
Database Credentials
- PostgreSQL/MySQL connection strings with embedded passwords
Replacement Values
All sensitive data is replaced with descriptive placeholder text:
| Original | Replacement |
|---|---|
| Passwords | REDACTED_PASSWORD |
| API Keys | REDACTED_API_KEY |
| Tokens | REDACTED_TOKEN |
| Private Keys | REDACTED_PRIVATE_KEY |
| Email addresses | your-email@example.com |
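As a rough illustration of how redaction patterns map to the placeholders in this table, here is a minimal sketch. SENSITIVE_PATTERNS is the name used in the Troubleshooting section, but the regexes themselves and the redact helper are assumptions made for this example, not the patterns shipped in sanitize.py.

```python
import re

# Illustrative (pattern, replacement) pairs; the real SENSITIVE_PATTERNS may differ.
SENSITIVE_PATTERNS = [
    # KEY=value or key: value where the key ends in password/passwd.
    # Keeps the key and separator, replaces an unquoted value with no spaces.
    (re.compile(r"(password|passwd)(\s*[:=]\s*)\S+", re.IGNORECASE),
     r"\1\2REDACTED_PASSWORD"),
    # Portainer-style tokens beginning with ptr_
    (re.compile(r"\bptr_[A-Za-z0-9+/=_-]+"), "REDACTED_TOKEN"),
    # OpenAI-style keys beginning with sk- (length bound is an assumption)
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "REDACTED_API_KEY"),
]


def redact(text):
    """Apply every pattern in order and return the sanitized text."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(redact("JITSI_PASSWORD=hunter2"))  # JITSI_PASSWORD=REDACTED_PASSWORD
```

Keeping the key name and replacing only the value means the public file still works as a configuration example, which is the stated purpose of the mirror.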
Files Skipped
The script does not process the following (binary files and Git metadata):
- Images (.png, .jpg, .jpeg, .gif, .ico, .svg)
- Fonts (.woff, .woff2, .ttf, .eot)
- Git metadata (.git/ directory)
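A skip check along these lines would cover the list above. SKIP_EXTENSIONS, SKIP_DIRS, and should_skip are hypothetical names chosen for this sketch; only the extensions and the .git/ directory come from this document.

```python
from pathlib import PurePosixPath

# Extensions taken from the list above; the constant names are assumptions.
SKIP_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",
    ".woff", ".woff2", ".ttf", ".eot",
}
SKIP_DIRS = {".git"}


def should_skip(rel_path):
    """Return True if a repository-relative path should not be sanitized."""
    path = PurePosixPath(rel_path)
    # Skip anything located inside a skipped directory such as .git/.
    if SKIP_DIRS.intersection(path.parts[:-1]):
        return True
    # Compare extensions case-insensitively so LOGO.PNG is skipped too.
    return path.suffix.lower() in SKIP_EXTENSIONS
```

Matching on whole path components rather than string prefixes keeps files such as .gitignore, and directories such as .gitea/, from being mistaken for Git metadata.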
Running Sanitization Manually
To run the sanitization script locally:
cd /path/to/homelab
python3 .gitea/sanitize.py
The script will:
- Remove sensitive files
- Remove sensitive directories
- Sanitize file contents across the entire repository
Verification
After sanitization, you can verify the public repository contains no secrets by:
- Searching for common secret patterns:
  grep -r "password\s*=" --include="*.yml" --include="*.yaml" --include="*.env" .
  grep -r "sk-" --include="*.yml" --include="*.yaml" .
  grep -r "REDACTED" .
- Checking that the .gitea/ directory is not present
- Verifying that no .env files with secrets exist
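The same checks can be scripted so they return structured results instead of raw grep output. The sketch below is one possible approach, not part of sanitize.py: find_leaks, LEAK_PATTERNS, and CHECKED_SUFFIXES are hypothetical names, and the regexes are assumptions modeled on the grep commands above. Unlike the plain grep, it ignores values that are already REDACTED_ placeholders.

```python
import re
from pathlib import Path

# Assumed patterns, loosely mirroring the grep commands above.
LEAK_PATTERNS = {
    # A password assignment whose value is not already a placeholder
    "password assignment": re.compile(
        r"password\s*=\s*(?!REDACTED_)\S", re.IGNORECASE
    ),
    "sk- style key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
}
CHECKED_SUFFIXES = {".yml", ".yaml", ".env"}


def find_leaks(repo_root):
    """Return (relative path, line number, label) for each suspicious line."""
    root = Path(repo_root)
    findings = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        # A file named exactly ".env" has no suffix, so check the name too.
        if path.suffix not in CHECKED_SUFFIXES and path.name != ".env":
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            for label, pattern in LEAK_PATTERNS.items():
                if pattern.search(line):
                    findings.append((rel.as_posix(), number, label))
    return findings
```

An empty list means none of the assumed patterns matched. It does not prove the mirror is free of secrets.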
Public Repository
The sanitized public mirror is available at:
- URL: https://git.vish.gg/Vish/homelab-optimized
- Purpose: Share configuration examples without exposing secrets
- Update Frequency: Automatically synced on every push to main branch
Troubleshooting
Sensitive Data Still Appearing
If you find sensitive data in the public mirror:
- Add the file to FILES_TO_REMOVE in sanitize.py
- Add a new regex pattern to SENSITIVE_PATTERNS
- Run the workflow manually to re-push
False Positives
If legitimate content is being redacted incorrectly:
- Identify the pattern causing the issue
- Modify the regex to be more specific
- Test locally before pushing
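One way to test a pattern locally is to run it against samples that must be redacted and samples that must survive. The helper below is a hypothetical sketch (check_pattern is not part of sanitize.py), and the two regexes are examples rather than the script's actual patterns.

```python
import re


def check_pattern(pattern, replacement, should_redact, should_keep):
    """Return a list of failure messages; an empty list means the pattern behaves."""
    compiled = re.compile(pattern)
    failures = []
    for sample in should_redact:
        if compiled.sub(replacement, sample) == sample:
            failures.append(f"missed: {sample!r}")
    for sample in should_keep:
        if compiled.sub(replacement, sample) != sample:
            failures.append(f"false positive: {sample!r}")
    return failures


secrets = ["key: sk-" + "a" * 24]
legit = ["image: task-runner"]  # contains "sk-" inside an ordinary word

# An overly broad pattern also rewrites the legitimate line.
print(check_pattern(r"sk-\w+", "REDACTED_API_KEY", secrets, legit))
# A word boundary and a minimum length make it more specific.
print(check_pattern(r"\bsk-[A-Za-z0-9]{20,}", "REDACTED_API_KEY", secrets, legit))
```

The first call reports a false positive for the task-runner line, while the second returns an empty list. Once a false positive is found, adding its sample to should_keep guards against the same regression later.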
Last Updated: February 17, 2026