Sanitization
Overview
Sanitization applies SQL transformations to a snapshot to remove or replace sensitive data. It runs as a three-step pipeline:
- Restore the source snapshot to a dedicated sanitization database
- Execute the sanitization SQL query against that database
- Dump the sanitized database to a new snapshot
The result is a new snapshot of type `sanitized`. As long as your queries cover every sensitive column, it can be shared with developers, used in CI/CD, or loaded into non-production environments.
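This page doesn't say which tools clonit uses internally. Assuming a PostgreSQL target and snapshots stored as custom-format dumps (`pg_dump -Fc`), the three steps roughly correspond to the standard client commands below. This is a hypothetical sketch to make the pipeline concrete, not clonit's implementation. The function name `sanitize_snapshot`, its arguments, and the `RUN` dry-run variable are illustrative.

```shell
#!/bin/sh
# Hypothetical sketch of the three sanitization steps with PostgreSQL client tools.
# Assumes snapshots are custom-format dumps (pg_dump -Fc); clonit may work differently.
set -eu  # stop on the first failing command or unset variable

# Set RUN=echo to print the commands instead of executing them (dry run).
RUN="${RUN:-}"

sanitize_snapshot() {
  local snapshot="$1"      # source snapshot file (hypothetical path)
  local sanitize_url="$2"  # value of sanitize_dst_url
  local query_file="$3"    # value of sanitize_query_file
  local output="$4"        # file for the new sanitized snapshot

  # 1. Restore the snapshot into the dedicated sanitization database,
  #    dropping existing objects first so repeated runs start from a clean state.
  $RUN pg_restore --clean --if-exists --no-owner --dbname="$sanitize_url" "$snapshot"

  # 2. Execute the sanitization SQL, aborting at the first error.
  $RUN psql --dbname="$sanitize_url" -v ON_ERROR_STOP=1 --file="$query_file"

  # 3. Dump the sanitized database to a new custom-format snapshot.
  $RUN pg_dump --format=custom --file="$output" "$sanitize_url"
}
```

Setting `RUN=echo` prints the three commands instead of running them, which is a cheap way to inspect what would execute. The `--clean` restore also illustrates why the sanitization database should be dedicated: its contents are replaced on every run.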
Prerequisites
The target must have two additional fields configured:
| Field | Description |
|---|---|
| `sanitize_dst_url` | Connection URL for the dedicated sanitization database |
| `sanitize_query_file` | Path to a SQL file containing the sanitization queries |
When adding a target, include the sanitization fields:
```shell
clonit targets add \
  --name mydb \
  --src-url "postgres://user:pass@prod:5432/mydb" \
  --dst-url "postgres://user:pass@localhost:5432/mydb_dev" \
  --sanitize-dst-url "postgres://user:pass@localhost:5432/mydb_sanitize" \
  --sanitize-query-file /path/to/sanitize.sql
```
Writing Sanitization Queries
The sanitization query file is a plain SQL file that runs against the sanitization database after the source snapshot has been restored into it. You can use any valid SQL statements, such as UPDATE, DELETE, or TRUNCATE.
Example sanitize.sql
```sql
-- Replace email addresses with deterministic fake values
UPDATE users SET email = concat('user', id, '@example.com');

-- Replace password hashes with a known dummy value
UPDATE users SET password_hash = '$2a$10$fake_hash';

-- Remove audit logs entirely
TRUNCATE TABLE audit_logs;
```
Tips for writing sanitization queries
- Use deterministic replacements (e.g., based on `id`) so the sanitized data is consistent across runs.
- Truncate large tables that contain only operational data (logs, events, sessions) to reduce snapshot size.
- Test your queries against a copy of the database before using them in production workflows.
- Order matters: queries execute sequentially in the order they appear in the file.
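One way to follow the testing tip with PostgreSQL is to run the query file inside a transaction that is always rolled back, so syntax and constraint errors surface without keeping any changes. This is a sketch assuming `psql` access to a copy of the data; `wrap_in_rollback` and `SANITIZE_URL` are hypothetical names. It works because `UPDATE`, `DELETE`, and `TRUNCATE` are all transactional in PostgreSQL, but it won't protect you if the file issues its own `COMMIT`, and statements that cannot run inside a transaction block (such as `VACUUM`) will fail.

```shell
#!/bin/sh
# Sketch: dry-run a sanitization file inside a transaction that is always
# rolled back. Assumes PostgreSQL and psql; names here are hypothetical.
set -eu

# Print the SQL file wrapped in BEGIN/ROLLBACK. The newline before ROLLBACK
# guards against files that don't end with a newline.
# Caveat: any COMMIT inside the file itself WILL persist changes.
wrap_in_rollback() {
  printf 'BEGIN;\n'
  cat "$1"
  printf '\nROLLBACK;\n'
}

# Usage against a copy of the data (never production):
#   wrap_in_rollback sanitize.sql | psql -v ON_ERROR_STOP=1 "$SANITIZE_URL"
```

With `ON_ERROR_STOP=1`, psql exits at the first failing statement, which also shows you exactly where in the file's execution order the problem is.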
Running Sanitization
Sanitize using the latest snapshot
```shell
clonit sanitize mydb
```
Sanitize a specific snapshot by index
```shell
clonit sanitize mydb 0
```
Use `clonit snapshots mydb` to see available snapshot indices.
Output
After sanitization completes, a new snapshot of type `sanitized` is created. You can see it in the snapshots list:
```shell
clonit snapshots mydb
```
Example output:
```
Index  Name                             Type       Size    Created
0      mydb-20260208T120000             snapshot   245 MB  2026-02-08 12:00:00
1      mydb-20260208T120000-sanitized   sanitized  198 MB  2026-02-08 12:05:00
```
The sanitized snapshot can then be used like any other snapshot: load it into a destination database, push it to cloud storage, or share it with your team.
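Scripts that load or push the sanitized snapshot need its index, which won't always be `1`. A small filter over `clonit snapshots` output can look it up. This sketch assumes the layout of the example output (a header row, then whitespace-separated columns starting with Index, Name, Type) and that newer snapshots appear later in the list; `latest_sanitized_index` is a hypothetical helper, and the output format may differ between clonit versions.

```shell
#!/bin/sh
# Read `clonit snapshots` output on stdin and print the index of the last row
# whose Type column is "sanitized". Exits with status 1 if there is none.
# Assumes the layout of the example output: a header row, then whitespace-
# separated columns starting with Index, Name, Type.
latest_sanitized_index() {
  awk 'NR > 1 && $3 == "sanitized" { idx = $1 }
       END { if (idx == "") exit 1; print idx }'
}

# Usage:
#   idx=$(clonit snapshots mydb | latest_sanitized_index)
#   clonit load mydb "$idx"
```

Multi-word values such as `245 MB` only shift the columns after Type, so the first three fields stay stable under this assumption.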
Workflow Example
A typical sanitization workflow looks like this:
```shell
# 1. Build a snapshot from production
clonit build mydb

# 2. Sanitize the snapshot
clonit sanitize mydb

# 3. Load the sanitized snapshot to a dev database
#    (index 1 matches the sanitized snapshot in the example output;
#    confirm with `clonit snapshots mydb`)
clonit load mydb 1

# 4. Optionally push the sanitized snapshot to cloud storage
clonit push mydb 1
```
See Also
- Sanitization Analysis – AI-powered PII detection and auto-generated sanitization queries
- Cloud Storage – push and pull snapshots with S3 or Cloudflare R2
- Snapshot Workflow – build, load, and manage snapshots