Repository Sanitization

This document describes the sanitization process used to create a safe public mirror of the private homelab repository.

Overview

The .gitea/sanitize.py script automatically removes sensitive information before content is pushed to the public repository (homelab-optimized). The aim is for the public repo to keep useful configuration examples while exposing no actual secrets, passwords, or private keys.

How It Works

The sanitization script runs as part of the Mirror to Public Repository workflow (Gitea Actions, configured under .gitea/). It performs three main operations:

  1. Remove sensitive files completely - Files containing only secrets are deleted
  2. Remove entire directories - Directories that shouldn't be public are deleted
  3. Redact sensitive patterns - Searches and replaces secrets in file contents

Files Removed Completely

The following categories of files are completely removed from the public mirror:

| Category | Examples |
|---|---|
| Private keys/certificates | .pem private keys, WireGuard configs |
| Environment files | .env files with secrets |
| Token files | API token text files |
| CI/CD workflows | .gitea/ directory |

Specific Files and Directories Removed

  • hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem
  • hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem
  • hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem
  • hosts/edge/nvidia_shield/wireguard/*.conf
  • hosts/synology/atlantis/jitsi/.env
  • hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf
  • .gitea/ directory (entire CI/CD configuration)
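Below is a minimal Python sketch of the removal step. It assumes the paths above are stored as glob patterns relative to the repository root. FILES_TO_REMOVE is the list name used under Troubleshooting. DIRS_TO_REMOVE and remove_sensitive_paths are hypothetical names, and the real sanitize.py may be organized differently.

```python
import shutil
from pathlib import Path

# Hypothetical layout of the removal lists; entries mirror the paths documented
# above and are interpreted as globs relative to the repository root.
FILES_TO_REMOVE = [
    "hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem",
    "hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem",
    "hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem",
    "hosts/edge/nvidia_shield/wireguard/*.conf",
    "hosts/synology/atlantis/jitsi/.env",
    "hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf",
]
DIRS_TO_REMOVE = [".gitea"]  # hypothetical name for the directory list


def remove_sensitive_paths(root: Path) -> list[str]:
    """Delete the listed files and directories under root.

    Returns the removed paths (relative, POSIX-style) for logging.
    """
    removed = []
    for pattern in FILES_TO_REMOVE:
        # Literal paths and wildcards both go through glob; missing files are skipped.
        for path in sorted(root.glob(pattern)):
            if path.is_file():
                path.unlink()
                removed.append(path.relative_to(root).as_posix())
    for name in DIRS_TO_REMOVE:
        target = root / name
        if target.is_dir():
            shutil.rmtree(target)
            removed.append(name + "/")
    return removed
```

Running every entry through Path.glob means literal paths and the WireGuard *.conf wildcard can share one list.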

Redacted Patterns

The script searches for and redacts the following types of sensitive data:

Passwords

  • Generic password, PASSWORD, PASSWD values
  • Service-specific passwords (Jitsi, SNMP, etc.)
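The password rule can be sketched as one case-insensitive regex over KEY=value and key: value lines. This is an assumption about how matching might work, not the script's actual pattern. PASSWORD_RE and redact_passwords are hypothetical names. The sketch deliberately leaves two kinds of value alone: ${VAR} references, so Compose interpolation survives, and values that already start with REDACTED_, so the rule can be rerun safely.

```python
import re

# Hypothetical password rule: a key ending in PASSWORD or PASSWD (any case),
# optionally prefixed by a YAML list dash, then "=" or ":" and a value.
PASSWORD_RE = re.compile(
    r"^([ \t]*(?:-[ \t]*)?[\w.]*pass(?:word|wd)[ \t]*[:=][ \t]*)"  # key and separator
    r"(?!\$|REDACTED_)"   # skip ${VAR} references and values already redacted
    r"\S[^\r\n]*",        # the value, up to the end of the line
    re.IGNORECASE | re.MULTILINE,
)


def redact_passwords(text: str) -> str:
    """Replace password values with the REDACTED_PASSWORD placeholder."""
    return PASSWORD_RE.sub(r"\1REDACTED_PASSWORD", text)
```

Keys such as PASSWORD_FILE are not matched, because the separator must follow PASSWORD or PASSWD directly.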

API Keys & Tokens

  • Portainer tokens (ptr_...)
  • OpenAI API keys (sk-...)
  • Cloudflare API tokens
  • Generic API keys and secrets
  • JWT secrets and private keys
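Prefixed tokens can be matched more precisely than generic secrets. The sketch below makes two assumptions. Portainer tokens are taken to be ptr_ followed by base64-style characters. OpenAI keys are taken to be sk- followed by at least 20 key characters, and that length threshold is a heuristic. Cloudflare tokens have no fixed prefix, so they would have to be matched by the key name they are assigned to. TOKEN_PATTERNS and redact_tokens are hypothetical names.

```python
import re

# Assumed token shapes. The \b boundary keeps matches from starting inside
# words such as "disk-usage"; the minimum lengths skip short strings.
TOKEN_PATTERNS = [
    (re.compile(r"\bptr_[A-Za-z0-9+/=_-]{20,}"), "REDACTED_TOKEN"),   # Portainer API token
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "REDACTED_API_KEY"),     # OpenAI-style API key
]


def redact_tokens(text: str) -> str:
    """Apply each token pattern in order, replacing matches with its placeholder."""
    for pattern, placeholder in TOKEN_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```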

Authentication

  • WireGuard private keys
  • Authentik secrets and passwords
  • Matrix/Synapse registration secrets
  • OAuth client secrets
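WireGuard keys are 32 bytes encoded as base64: 43 characters plus one "=" of padding. The sketch below assumes standard wg-quick config syntax. It redacts PrivateKey and PresharedKey lines and keeps PublicKey lines, because public keys are not secret. WG_SECRET_RE and redact_wireguard are hypothetical names.

```python
import re

# Assumes wg-quick syntax: "PrivateKey = <43 base64 chars>=".
WG_SECRET_RE = re.compile(
    r"^([ \t]*(?:PrivateKey|PresharedKey)[ \t]*=[ \t]*)[A-Za-z0-9+/]{43}=",
    re.MULTILINE,
)


def redact_wireguard(text: str) -> str:
    """Redact private and preshared keys; PublicKey lines are left unchanged."""
    return WG_SECRET_RE.sub(r"\1REDACTED_PRIVATE_KEY", text)
```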

Personal Information

  • Personal email addresses, replaced with a placeholder address
  • Comments on SSH public keys (typically user@host)
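A sketch for personal information. It assumes the script keeps an explicit list of the owner's addresses rather than matching every email, since a generic email regex would also hit strings such as git@github.com. PERSONAL_EMAILS and its entry are hypothetical. The SSH rule keeps the key type and base64 body and drops the trailing comment.

```python
import re

# Hypothetical list of the owner's addresses (the real list is not documented).
PERSONAL_EMAILS = ["alice@home.test"]
PLACEHOLDER_EMAIL = "your-email@example.com"

# Group 1 keeps the key type and base64 body; the trailing comment is dropped.
SSH_KEY_COMMENT_RE = re.compile(
    r"((?:ssh-(?:ed25519|rsa)|ecdsa-sha2-nistp(?:256|384|521))[ \t]+[A-Za-z0-9+/]+={0,2})"
    r"[ \t]+[^\r\n\"']+"  # the comment, stopping before any closing quote
)


def redact_personal_info(text: str) -> str:
    """Swap known personal addresses for the placeholder and strip SSH key comments."""
    for address in PERSONAL_EMAILS:
        text = re.sub(re.escape(address), PLACEHOLDER_EMAIL, text, flags=re.IGNORECASE)
    return SSH_KEY_COMMENT_RE.sub(r"\1", text)
```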

Database Credentials

  • PostgreSQL/MySQL connection strings with embedded passwords
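Connection strings place the password between user: and @host, so the password can be replaced without touching the rest of the URL. The schemes covered are assumptions: postgres, postgresql, mysql and mariadb, each optionally with a +driver suffix. DB_URL_RE and redact_db_urls are hypothetical names.

```python
import re

# Assumed schemes; "+driver" suffixes such as postgresql+psycopg2 are allowed.
DB_URL_RE = re.compile(
    r"\b((?:postgres(?:ql)?|mysql|mariadb)(?:\+\w+)?://[^:/\s@]+:)"  # scheme://user:
    r"(?!REDACTED_)[^@\s]+"                                          # the password
    r"(?=@)"                                                         # only if @host follows
)


def redact_db_urls(text: str) -> str:
    """Replace only the password portion of database connection URLs."""
    return DB_URL_RE.sub(r"\1REDACTED_PASSWORD", text)
```

URLs without an embedded password, such as postgres://app@db/app, do not match and are left as they are.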

Replacement Values

All sensitive data is replaced with descriptive placeholder text:

| Data type | Replacement |
|---|---|
| Passwords | REDACTED_PASSWORD |
| API keys | REDACTED_API_KEY |
| Tokens | REDACTED_TOKEN |
| Private keys | REDACTED_PRIVATE_KEY |
| Email addresses | your-email@example.com |

Files Skipped

The following are skipped during content redaction (mostly binary formats):

  • Images (.png, .jpg, .jpeg, .gif, .ico, .svg)
  • Fonts (.woff, .woff2, .ttf, .eot)
  • Git metadata (.git/ directory)
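Here is a sketch of how the skip list might combine with redaction while walking the tree. It skips .git/ and the listed extensions, and it rewrites only files whose content changed. Treating anything that does not decode as UTF-8 as binary is an assumption that goes beyond the documented list. Reading and writing bytes keeps each file's line endings intact. SENSITIVE_PATTERNS is the list name used under Troubleshooting, but its real structure is not documented here. SKIP_EXTENSIONS, sanitize_tree and the single example pattern are hypothetical.

```python
import re
from pathlib import Path

# Extensions from the "Files Skipped" list above.
SKIP_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",
                   ".woff", ".woff2", ".ttf", ".eot"}

# Placeholder list; a real one would hold every rule from "Redacted Patterns"
# as (compiled regex, replacement) pairs.
SENSITIVE_PATTERNS = [
    (re.compile(r"\bptr_[A-Za-z0-9+/=_-]{20,}"), "REDACTED_TOKEN"),
]


def sanitize_tree(root: Path) -> list[Path]:
    """Redact text files under root in place; return the relative paths changed."""
    changed = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue  # directories and git metadata
        if path.suffix.lower() in SKIP_EXTENSIONS:
            continue  # documented binary/asset types
        raw = path.read_bytes()
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # assumption: anything not valid UTF-8 is treated as binary
        new_text = text
        for pattern, replacement in SENSITIVE_PATTERNS:
            new_text = pattern.sub(replacement, new_text)
        if new_text != text:
            # Writing bytes back preserves the file's original line endings.
            path.write_bytes(new_text.encode("utf-8"))
            changed.append(rel)
    return changed
```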

Running Sanitization Manually

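To run the sanitization script locally from the repository root (it deletes and rewrites files in place, so use a throwaway clone rather than your working copy):

    cd /path/to/homelab
    python3 .gitea/sanitize.py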

The script will:

  1. Remove sensitive files
  2. Remove sensitive directories
  3. Sanitize file contents across the entire repository

Verification

After sanitization, you can spot-check the public repository for leftover secrets by:

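  1. Searching for common secret patterns (the last command lists the placeholders, which confirms that redaction ran):

    grep -rniE "passw(or)?d\s*[:=]" --include="*.yml" --include="*.yaml" --include="*.env" --include=".env" . | grep -v REDACTED_
    grep -rnE "sk-[A-Za-z0-9_-]{20,}" --include="*.yml" --include="*.yaml" .
    grep -rn "REDACTED" .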
    
  2. Checking that the .gitea/ directory is not present

  3. Verifying that no .env files containing secrets remain
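These checks can also be scripted. The minimal sketch below reports findings for review; it does not prove the mirror is clean. It flags a leftover .gitea/ directory, lists every .env file for manual review, and searches for a few token shapes that should never survive sanitization. verify_mirror and LEAK_PATTERNS are hypothetical, and the patterns are assumptions you would extend to match SENSITIVE_PATTERNS.

```python
import re
from pathlib import Path

# Assumed leak signatures: shapes that should never survive sanitization.
LEAK_PATTERNS = {
    "Portainer token": re.compile(r"\bptr_[A-Za-z0-9+/=_-]{20,}"),
    "OpenAI key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def verify_mirror(root: Path) -> list[str]:
    """Return findings for manual review; an empty list means no known pattern hit."""
    findings = []
    if (root / ".gitea").exists():
        findings.append(".gitea/ directory is still present")
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        name = rel.as_posix()
        if path.name == ".env" or path.suffix == ".env":
            findings.append(f"{name}: .env file present, review manually")
        text = path.read_bytes().decode("utf-8", errors="ignore")
        for label, pattern in LEAK_PATTERNS.items():
            if pattern.search(text):
                findings.append(f"{name}: possible {label}")
    return findings
```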

Public Repository

The sanitized public mirror is available at:

Troubleshooting

Sensitive Data Still Appearing

If you find sensitive data in the public mirror:

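  1. If the whole file is sensitive, add its path to FILES_TO_REMOVE in .gitea/sanitize.py
  2. If only specific values are sensitive, add a regex pattern for them to SENSITIVE_PATTERNS
  3. Run the workflow manually to re-push
  4. Rotate the exposed credential, since it has already been public and may have been copied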

False Positives

If legitimate content is being redacted incorrectly:

  1. Identify the pattern causing the issue
  2. Modify the regex to be more specific
  3. Test locally before pushing
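One way to test a pattern locally is to keep a few strings it must redact and a few it must leave alone, then check both lists. The candidate pattern, sample strings and check_pattern helper below are hypothetical. The samples show why a bare sk- search is too broad.

```python
import re

# Hypothetical candidate rule and test samples.
CANDIDATE = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")
MUST_REDACT = ["OPENAI_API_KEY=sk-" + "a1B2" * 8]
MUST_KEEP = [
    "volumes: [/mnt/disk-1/data]",
    "image: ghcr.io/example/task-runner:latest",
]


def check_pattern(pattern: re.Pattern, must_redact: list[str], must_keep: list[str]) -> list[str]:
    """List misses (secrets not matched) and false positives (safe text matched)."""
    failures = [f"missed: {s}" for s in must_redact if not pattern.search(s)]
    failures += [f"false positive: {s}" for s in must_keep if pattern.search(s)]
    return failures
```

An empty result means the candidate behaves as intended on these samples. A too-broad rule such as re.compile("sk-") reports both keep samples as false positives, because disk-1 and task-runner each contain sk-.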

Last Updated: February 17, 2026