# Repository Sanitization
This document describes the sanitization process used to create a safe public mirror of the private homelab repository.
## Overview
The `.gitea/sanitize.py` script automatically removes sensitive information before pushing content to the public repository ([homelab-optimized](https://git.vish.gg/Vish/homelab-optimized)). This ensures that while the public repo contains useful configuration examples, no actual secrets, passwords, or private keys are exposed.
## How It Works
The sanitization script runs as part of the [Mirror to Public Repository](../.gitea/workflows/mirror-to-public.yaml) Gitea Actions workflow. It performs three main operations:
1. **Remove sensitive files completely** - Files containing only secrets are deleted
2. **Remove entire directories** - Directories that shouldn't be public are deleted
3. **Redact sensitive patterns** - Secrets found in file contents are replaced with placeholders
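
The three operations above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `sanitize.py`: `DIRS_TO_REMOVE`, `PASSWORD_RE`, and `sanitize()` are hypothetical names, and a single password pattern stands in for the full pattern list.

```python
import re
import shutil
from pathlib import Path

# Hypothetical configuration; the real lists live in .gitea/sanitize.py.
FILES_TO_REMOVE = ["hosts/synology/atlantis/jitsi/.env"]
DIRS_TO_REMOVE = [".gitea"]
PASSWORD_RE = re.compile(r"(?i)(password\s*[:=]\s*)\S+")


def sanitize(repo: Path) -> None:
    # 1. Delete files that contain only secrets.
    for rel in FILES_TO_REMOVE:
        (repo / rel).unlink(missing_ok=True)

    # 2. Delete directories that should not be public.
    for rel in DIRS_TO_REMOVE:
        shutil.rmtree(repo / rel, ignore_errors=True)

    # 3. Redact secrets in the remaining text files.
    for path in repo.rglob("*"):
        if not path.is_file() or ".git" in path.relative_to(repo).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # binary file: leave untouched
        redacted = PASSWORD_RE.sub(r"\1REDACTED_PASSWORD", text)
        if redacted != text:
            path.write_text(redacted, encoding="utf-8")
```

Running the deletions before the redaction pass means files that are removed anyway are never read or rewritten.
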
## Files Removed Completely
The following categories of files are completely removed from the public mirror:
| Category | Examples |
|----------|----------|
| Private keys/certificates | `.pem` private keys, WireGuard configs |
| Environment files | `.env` files with secrets |
| Token files | API token text files |
| CI/CD workflows | `.gitea/` directory |
### Specific Files Removed
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem`
- `hosts/edge/nvidia_shield/wireguard/*.conf`
- `hosts/synology/atlantis/jitsi/.env`
- `hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf`
- `.gitea/` directory (entire CI/CD configuration)
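
Because the list above mixes literal paths with a wildcard entry (`wireguard/*.conf`), the removal step has to expand patterns. A minimal sketch, assuming the entries are stored as repository-relative glob strings; `remove_listed_files()` is a hypothetical helper, and only a few of the entries are shown:

```python
from pathlib import Path

# Entries copied from the list above; the variable name matches the
# FILES_TO_REMOVE setting referenced in the Troubleshooting section.
FILES_TO_REMOVE = [
    "hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem",
    "hosts/edge/nvidia_shield/wireguard/*.conf",
    "hosts/synology/atlantis/jitsi/.env",
]


def remove_listed_files(repo: Path, patterns: list[str]) -> list[str]:
    """Delete every file matching a pattern; return the removed relative paths."""
    removed = []
    for pattern in patterns:
        # Path.glob handles both literal paths and wildcards such as *.conf.
        for path in sorted(repo.glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.relative_to(repo).as_posix())
    return removed
```

Returning the removed paths makes it easy to log what was deleted on each mirror run.
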
## Redacted Patterns
The script searches for and redacts the following types of sensitive data:
### Passwords
- Generic `password`, `PASSWORD`, `PASSWD` values
- Service-specific passwords (Jitsi, SNMP, etc.)
### API Keys & Tokens
- Portainer tokens (`ptr_...`)
- OpenAI API keys (`sk-...`)
- Cloudflare API tokens
- Generic API keys and secrets
- JWT secrets and private keys
### Authentication
- WireGuard private keys
- Authentik secrets and passwords
- Matrix/Synapse registration secrets
- OAuth client secrets
### Personal Information
- Personal email addresses replaced with examples
- SSH public key comments
### Database Credentials
- PostgreSQL/MySQL connection strings with embedded passwords
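
For connection strings, only the password component needs to change; the scheme, user, and host can stay readable. The pattern below is an assumption about how such a rule might look, not the one used in `sanitize.py`, and `redact_db_urls()` is a hypothetical name:

```python
import re

# Hypothetical pattern: scheme://user:password@host for PostgreSQL and MySQL URLs.
DB_URL_RE = re.compile(
    r"(?P<prefix>\b(?:postgres(?:ql)?|mysql)://[^:/@\s]+:)"  # scheme and user
    r"(?P<password>[^@\s]+)"                                 # embedded password
    r"(?P<suffix>@)"                                         # start of host part
)


def redact_db_urls(text: str) -> str:
    """Replace only the password component of database connection URLs."""
    return DB_URL_RE.sub(r"\g<prefix>REDACTED_PASSWORD\g<suffix>", text)
```

A URL without an embedded password, such as `postgresql://db:5432/synapse`, contains no `@` and is left unchanged. Passwords that themselves contain `@` are not handled by this sketch.
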
## Replacement Values
All sensitive data is replaced with descriptive placeholder text:
| Data type | Replacement |
|----------|-------------|
| Passwords | `REDACTED_PASSWORD` |
| API Keys | `REDACTED_API_KEY` |
| Tokens | `REDACTED_TOKEN` |
| Private Keys | `REDACTED_PRIVATE_KEY` |
| Email addresses | `your-email@example.com` |
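
The mapping in the table can be expressed as a list of (pattern, replacement) pairs applied in order. The sketch below is illustrative only: the real `SENSITIVE_PATTERNS` in `sanitize.py` may differ in both shape and content, and the regexes and the `redact()` helper here are assumptions.

```python
import re

# Hypothetical (pattern, replacement) pairs; the real SENSITIVE_PATTERNS list
# in sanitize.py may differ in both shape and content.
SENSITIVE_PATTERNS = [
    # Keep the key name and separator, replace only the value.
    (re.compile(r"(?i)\b(\w*(?:password|passwd)\w*\s*[:=]\s*)[^\s#]+"), r"\1REDACTED_PASSWORD"),
    # Portainer-style tokens with the ptr_ prefix.
    (re.compile(r"\bptr_[A-Za-z0-9+/=_-]{8,}"), "REDACTED_TOKEN"),
    # OpenAI-style keys with the sk- prefix; the length floor avoids words like "task-1".
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "REDACTED_API_KEY"),
    # Any email address not already on the example.com domain.
    (re.compile(r"\b[\w.+-]+@(?!example\.com\b)[\w-]+(?:\.[\w-]+)+"), "your-email@example.com"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

For example, `redact("JICOFO_AUTH_PASSWORD=abc123")` keeps the variable name and yields `JICOFO_AUTH_PASSWORD=REDACTED_PASSWORD`. Running `redact()` on already-sanitized text leaves it unchanged, so repeated runs are safe.
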
## Files Skipped
The script does not scan the contents of the following file types and paths:
- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.ico`, `.svg`)
- Fonts (`.woff`, `.woff2`, `.ttf`, `.eot`)
- Git metadata (`.git/` directory)
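
A skip check along these lines would cover the list above. `SKIP_EXTENSIONS` and `should_skip()` are hypothetical names chosen for illustration:

```python
from pathlib import Path

# Extensions taken from the list above.
SKIP_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",
    ".woff", ".woff2", ".ttf", ".eot",
}


def should_skip(path: Path) -> bool:
    """Return True for Git metadata and for file types that are never scanned."""
    if ".git" in path.parts:
        return True
    return path.suffix.lower() in SKIP_EXTENSIONS
```

Comparing whole path components means `.gitea/` and `.gitignore` are not mistaken for the `.git/` directory.
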
## Running Sanitization Manually
To run the sanitization script locally:
```bash
cd /path/to/homelab
python3 .gitea/sanitize.py
```
The script will:
1. Remove sensitive files
2. Remove sensitive directories
3. Sanitize file contents across the entire repository
## Verification
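After sanitization, verify that the public repository contains no secrets:
1. Search for common secret patterns. The first command should show only placeholder values, the second should print nothing, and the third lists the redactions that were applied:
   ```bash
   # Password assignments; any hit should show a REDACTED_ placeholder
   grep -rniE "password\s*=" --include="*.yml" --include="*.yaml" --include="*.env" .
   # OpenAI-style keys; the length floor avoids matches such as "task-" or "disk-"
   grep -rnE "sk-[A-Za-z0-9_-]{20,}" --include="*.yml" --include="*.yaml" .
   # Placeholders written by the sanitizer; confirms that redaction ran
   grep -rn "REDACTED" .
   ```
2. Check that the `.gitea/` directory is not present
3. Verify that no `.env` files containing secrets remain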
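
The same checks can be scripted so that a mirror run fails when something slips through. This is a sketch under assumptions: `LEAK_PATTERNS` and `find_leaks()` are hypothetical, and the detectors cover only a few of the secret types listed earlier.

```python
import re
from pathlib import Path

# Hypothetical leak detectors; placeholders written by the sanitizer are allowed.
LEAK_PATTERNS = {
    "password": re.compile(r"(?i)password\s*[:=]\s*(?!REDACTED_)[^\s#]+"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_leaks(repo: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, detector name) for each suspicious line."""
    findings = []
    for path in sorted(repo.rglob("*")):
        rel = path.relative_to(repo)
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except UnicodeDecodeError:
            continue  # binary file
        for number, line in enumerate(lines, start=1):
            for name, pattern in LEAK_PATTERNS.items():
                if pattern.search(line):
                    findings.append((rel.as_posix(), number, name))
    return findings
```

An empty result means none of the detectors fired. It does not prove the repository is clean, only that these particular patterns were not found.
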
## Public Repository
The sanitized public mirror is available at:
- **URL**: https://git.vish.gg/Vish/homelab-optimized
- **Purpose**: Share configuration examples without exposing secrets
- **Update Frequency**: Automatically synced on every push to the main branch
## Troubleshooting
### Sensitive Data Still Appearing
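If you find sensitive data in the public mirror:
1. If the whole file is sensitive, add its path to `FILES_TO_REMOVE` in `sanitize.py`
2. If only specific values are sensitive, add a new regex pattern to `SENSITIVE_PATTERNS`
3. Run the workflow manually to re-push the sanitized content
4. Rotate the exposed secret, since anything that reached the public mirror should be treated as compromised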
### False Positives
If legitimate content is being redacted incorrectly:
1. Identify the pattern causing the issue
2. Modify the regex to be more specific
3. Test locally before pushing
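
One way to test a narrowed pattern locally is to keep two small sample lists: strings that must be redacted and strings that must survive. The patterns, samples, and `check()` helper below are hypothetical and only illustrate the approach:

```python
import re

# Hypothetical before/after of a pattern that was too broad.
TOO_BROAD = re.compile(r"sk-\S+")
NARROWED = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")

MUST_REDACT = ["OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwx"]
MUST_KEEP = ["container_name: task-runner", "device: /dev/disk-by-id"]


def check(pattern: re.Pattern) -> list[str]:
    """Return a description of every sample the pattern handles incorrectly."""
    problems = [f"missed: {s}" for s in MUST_REDACT if not pattern.search(s)]
    problems += [f"false positive: {s}" for s in MUST_KEEP if pattern.search(s)]
    return problems
```

Here `check(TOO_BROAD)` reports both `MUST_KEEP` samples as false positives, while `check(NARROWED)` returns an empty list. Adding each newly discovered false positive to `MUST_KEEP` keeps later pattern changes from reintroducing it.
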
---
**Last Updated**: February 17, 2026