# Repository Sanitization

This document describes the sanitization process used to create a safe public mirror of the private homelab repository.

## Overview

The `.gitea/sanitize.py` script automatically removes sensitive information before content is pushed to the public repository ([homelab-optimized](https://git.vish.gg/Vish/homelab-optimized)). The goal is a public repo that keeps useful configuration examples while exposing no actual secrets, passwords, or private keys.

## How It Works

The sanitization script runs as part of the [Mirror to Public Repository](../.gitea/workflows/mirror-to-public.yaml) Gitea Actions workflow. It performs three main operations:

1. **Remove sensitive files completely** - Files containing only secrets are deleted
2. **Remove entire directories** - Directories that shouldn't be public are deleted
3. **Redact sensitive patterns** - Secrets inside file contents are matched and replaced with placeholders
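
The three operations above can be sketched as a single function. This is a minimal illustration, not the contents of `sanitize.py`: the `DIRS_TO_REMOVE` name, the example entries, and the single password regex are assumptions made for the example.

```python
import re
import shutil
from pathlib import Path

# Hypothetical values for illustration; the real lists live in sanitize.py.
FILES_TO_REMOVE = ["secrets/token.txt"]
DIRS_TO_REMOVE = [".gitea"]
SENSITIVE_PATTERNS = [
    (re.compile(r"(?i)(password\s*[:=]\s*)\S+"), r"\1REDACTED_PASSWORD"),
]


def sanitize(root: Path) -> None:
    """Run the three phases in order against the tree at `root`."""
    # Phase 1: delete files that contain only secrets.
    for rel in FILES_TO_REMOVE:
        (root / rel).unlink(missing_ok=True)

    # Phase 2: delete directories that should not be public.
    for rel in DIRS_TO_REMOVE:
        shutil.rmtree(root / rel, ignore_errors=True)

    # Phase 3: redact secrets in the remaining text files.
    for path in root.rglob("*"):
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # treat undecodable files as binary and skip them
        redacted = text
        for pattern, replacement in SENSITIVE_PATTERNS:
            redacted = pattern.sub(replacement, redacted)
        if redacted != text:
            path.write_text(redacted, encoding="utf-8")
```

Deleting before redacting means the redaction pass never reads files that are going to be dropped anyway.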

## Files Removed Completely

The following categories of files are completely removed from the public mirror:

| Category | Examples |
|----------|----------|
| Private keys/certificates | `.pem` private keys, WireGuard configs |
| Environment files | `.env` files with secrets |
| Token files | API token text files |
| CI/CD workflows | `.gitea/` directory |

### Specific Files Removed

- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/RSA-privkey.pem`
- `hosts/synology/atlantis/matrix_synapse_docs/turn_cert/ECC-privkey.pem`
- `hosts/edge/nvidia_shield/wireguard/*.conf`
- `hosts/synology/atlantis/jitsi/.env`
- `hosts/synology/atlantis/matrix_synapse_docs/turnserver.conf`
- `.gitea/` directory (entire CI/CD configuration)
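
Because the list mixes literal paths with a wildcard entry (`wireguard/*.conf`), removal needs some form of glob expansion. The sketch below shows one way to do that with `pathlib`; the `remove_listed_files` helper is hypothetical, and whether `sanitize.py` expands globs this way is an assumption.

```python
from pathlib import Path

# Two entries copied from the list above. FILES_TO_REMOVE is the name used in
# Troubleshooting; the glob handling below is an assumption.
FILES_TO_REMOVE = [
    "hosts/edge/nvidia_shield/wireguard/*.conf",
    "hosts/synology/atlantis/jitsi/.env",
]


def remove_listed_files(root: Path, entries: list[str]) -> list[Path]:
    """Delete every file matching an entry and return the removed paths."""
    removed = []
    for entry in entries:
        # Path.glob accepts both literal relative paths and wildcard patterns.
        for match in sorted(root.glob(entry)):
            if match.is_file():
                match.unlink()
                removed.append(match.relative_to(root))
    return removed
```

Returning the removed paths makes it easy to log what was deleted on each mirror run.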

## Redacted Patterns

The script searches for and redacts the following types of sensitive data:

### Passwords
- Generic `password`, `PASSWORD`, `PASSWD` values
- Service-specific passwords (Jitsi, SNMP, etc.)

### API Keys & Tokens
- Portainer tokens (`ptr_...`)
- OpenAI API keys (`sk-...`)
- Cloudflare API tokens
- Generic API keys and secrets
- JWT secrets and private keys

### Authentication
- WireGuard private keys
- Authentik secrets and passwords
- Matrix/Synapse registration secrets
- OAuth client secrets

### Personal Information
- Personal email addresses replaced with examples
- SSH public key comments

### Database Credentials
- PostgreSQL/MySQL connection strings with embedded passwords
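
To make the pattern categories concrete, here is a small sketch of how a few of them could be expressed as (regex, replacement) pairs. The regexes are illustrative assumptions and may differ from those in `sanitize.py`; only the `SENSITIVE_PATTERNS` name and the `REDACTED_*` placeholders come from this document.

```python
import re

# Illustrative (pattern, replacement) pairs; the real regexes may differ.
SENSITIVE_PATTERNS = [
    # Portainer tokens start with "ptr_".
    (re.compile(r"\bptr_[A-Za-z0-9+/=_-]+"), "REDACTED_TOKEN"),
    # OpenAI-style keys start with "sk-"; the long tail avoids words like "disk-1".
    (re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"), "REDACTED_API_KEY"),
    # password = value, PASSWORD: value, PASSWD=value; key and separator are kept.
    (re.compile(r"(?i)(pass(?:word|wd)\s*[:=]\s*)[^\s#]+"), r"\1REDACTED_PASSWORD"),
    # postgres://user:secret@host keeps the user and host, drops the password.
    (
        re.compile(r"\b((?:postgres(?:ql)?|mysql)://[^:/\s]+:)[^@\s]+(@)"),
        r"\1REDACTED_PASSWORD\2",
    ),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the redacted text."""
    for pattern, replacement in SENSITIVE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Capture groups keep the key name and separator intact, so the redacted file still reads as a valid configuration example.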

## Replacement Values

All sensitive data is replaced with descriptive placeholder text:

| Original | Replacement |
|----------|-------------|
| Passwords | `REDACTED_PASSWORD` |
| API Keys | `REDACTED_API_KEY` |
| Tokens | `REDACTED_TOKEN` |
| Private Keys | `REDACTED_PRIVATE_KEY` |
| Email addresses | `your-email@example.com` |

## Files Skipped

The content redaction pass skips the following (mostly binary assets, plus Git metadata):
- Images (`.png`, `.jpg`, `.jpeg`, `.gif`, `.ico`, `.svg`)
- Fonts (`.woff`, `.woff2`, `.ttf`, `.eot`)
- Git metadata (`.git/` directory)
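
A skip check along these lines could implement the list above. It is a sketch under the assumption that skipping is decided by file extension and path component; the `SKIP_EXTENSIONS` and `should_skip` names are hypothetical.

```python
from pathlib import Path

# Extensions taken from the list above.
SKIP_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".ico", ".svg",
    ".woff", ".woff2", ".ttf", ".eot",
}


def should_skip(path: Path) -> bool:
    """Return True for files the redaction pass should leave untouched."""
    # Anything under a .git/ directory is Git metadata.
    if ".git" in path.parts:
        return True
    # Compare case-insensitively so LOGO.PNG is treated like logo.png.
    return path.suffix.lower() in SKIP_EXTENSIONS
```

Matching on whole path components rather than substrings keeps `.gitea/` and `.gitignore` from being mistaken for `.git/`.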

## Running Sanitization Manually

To run the sanitization script locally:

```bash
cd /path/to/homelab
python3 .gitea/sanitize.py
```

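The script modifies the working tree in place, so run it in a disposable clone rather than your main checkout. It will: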
1. Remove sensitive files
2. Remove sensitive directories
3. Sanitize file contents across the entire repository

## Verification

After sanitization, you can verify the public repository contains no secrets by:

1. Searching for common secret patterns:
   ```bash
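   # Remaining password assignments should show only REDACTED_PASSWORD values
   grep -r "password\s*=" --include="*.yml" --include="*.yaml" --include="*.env" .
   # OpenAI-style keys; a bare "sk-" also matches words such as "disk-", so review each hit
   grep -r "sk-" --include="*.yml" --include="*.yaml" .
   # Lists every placeholder the script inserted, confirming redaction ran
   grep -r "REDACTED" .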
   ```

2. Checking that `.gitea/` directory is not present
3. Verifying no `.env` files with secrets exist
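
The same checks can be scripted so they run identically every time. The sketch below is an assumption-laden example, not part of the repository: `LEAK_PATTERNS` and `find_leaks` are hypothetical names, and the regexes only cover a few of the categories listed earlier.

```python
import re
from pathlib import Path

# Hypothetical leak detectors: each flags a value that is not a REDACTED_* placeholder.
LEAK_PATTERNS = {
    "password": re.compile(
        r"(?i)pass(?:word|wd)\s*[:=]\s*(?![\"']?REDACTED_)\S+"
    ),
    "openai key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}"),
    "portainer token": re.compile(r"\bptr_[A-Za-z0-9+/=_-]+"),
}


def find_leaks(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, label) for every suspicious line."""
    findings = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except UnicodeDecodeError:
            continue  # binary file, nothing to scan
        for lineno, line in enumerate(lines, start=1):
            for label, pattern in LEAK_PATTERNS.items():
                if pattern.search(line):
                    findings.append((rel.as_posix(), lineno, label))
    return findings
```

An empty result means none of these particular patterns matched; it does not prove the tree is free of secrets.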

## Public Repository

The sanitized public mirror is available at:
- **URL**: https://git.vish.gg/Vish/homelab-optimized
- **Purpose**: Share configuration examples without exposing secrets
- **Update Frequency**: Automatically synced on every push to the main branch

## Troubleshooting

### Sensitive Data Still Appearing

If you find sensitive data in the public mirror:

1. If the whole file is sensitive, add its path to `FILES_TO_REMOVE` in `sanitize.py`
2. If only specific values are sensitive, add a new regex pattern to `SENSITIVE_PATTERNS`
3. Run the workflow manually to re-push

### False Positives

If legitimate content is being redacted incorrectly:

1. Identify the pattern causing the issue
2. Modify the regex to be more specific
3. Test locally before pushing
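
One way to test a pattern locally is to run it against a handful of strings that must match and a handful that must not. The helper below is a hypothetical sketch of that idea; the sample strings and both regexes are made up for illustration.

```python
import re


def check_pattern(pattern, should_match, should_not_match):
    """Return (missed secrets, false positives) for a candidate regex."""
    regex = re.compile(pattern)
    missed = [s for s in should_match if not regex.search(s)]
    false_positives = [s for s in should_not_match if regex.search(s)]
    return missed, false_positives


# A loose pattern flags ordinary words that merely contain "sk-".
loose = r"sk-\S+"
# A tighter candidate: word boundary plus a long key-like tail.
tight = r"\bsk-[A-Za-z0-9_-]{20,}"

# Made-up samples: one fake key and two benign configuration lines.
secrets = ["OPENAI_API_KEY=sk-" + "x" * 32]
benign = ["image: task-runner:latest", "device: /dev/disk-by-id/ata-1"]

loose_result = check_pattern(loose, secrets, benign)
tight_result = check_pattern(tight, secrets, benign)
```

Here `loose_result` reports both benign lines as false positives, while `tight_result` reports none and still catches the fake key.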

---

**Last Updated**: February 17, 2026