Sanitization Analysis
Overview
Clonit includes an AI-powered sanitization analysis feature that uses a large language model (LLM) to automatically detect sensitive columns in your database and generate sanitization SQL queries. This significantly reduces the manual effort required to write comprehensive sanitization scripts.
The analysis examines your database schema (and optionally sample data) to identify columns containing personally identifiable information (PII), credentials, financial data, and other sensitive information.
Prerequisites
To use sanitization analysis, you need:
- An Anthropic API key – the analysis uses Claude as the LLM
- A target with a source database URL configured
- The web UI running (`clonit serve`)
Configuration
Add your Anthropic API key to the config file:

```yaml
analysis:
  api_key: "sk-ant-..."
  model: "claude-sonnet-4-20250514"  # optional, defaults to Claude Sonnet
  max_tokens: 8192                   # optional
```

Or via environment variables:

```shell
CLONIT_ANALYSIS__API_KEY=sk-ant-...
CLONIT_ANALYSIS__MODEL=claude-sonnet-4-20250514
```

Running an Analysis
From the Web UI
- Navigate to a target’s detail page
- Click Analyze to trigger a new analysis
- Optionally configure:
- Include samples – send sample rows to the LLM for better detection (disabled by default for privacy)
- Model – override the default LLM model
- Sample rows – number of sample rows to include (default: 3)
- Wait for the analysis to complete (typically 30-120 seconds)
Analysis Results
The analysis produces two outputs:
Column Sensitivity Report
A table of all detected sensitive columns with:
| Field | Description |
|---|---|
| Schema | Database schema name |
| Table | Table name |
| Column | Column name |
| Data Type | PostgreSQL data type |
| Sensitive | Whether the column contains sensitive data |
| Category | Type of sensitivity (PII, credentials, financial, etc.) |
| Confidence | Detection confidence (high, medium, low) |
| Reason | Explanation of why the column was flagged |
| Sanitize Hint | Suggested sanitization approach |
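
As a rough illustration of how the report could be consumed, the sketch below models one report row and picks out the flagged columns that deserve a manual look. The `ColumnFinding` class, the `needs_review` helper, and the sample rows are all hypothetical; the field names simply mirror the table above and are not a confirmed Clonit schema or export format.

```python
from dataclasses import dataclass

# Hypothetical model of one report row. Field names mirror the table above;
# they are not taken from Clonit's actual data model.
@dataclass(frozen=True)
class ColumnFinding:
    schema: str
    table: str
    column: str
    data_type: str
    sensitive: bool
    category: str
    confidence: str  # "high", "medium", or "low"
    reason: str
    sanitize_hint: str

# Ordering used to compare confidence levels.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}

def needs_review(findings, threshold="medium"):
    """Return sensitive findings whose confidence is below the threshold."""
    limit = CONFIDENCE_RANK[threshold]
    return [
        f for f in findings
        if f.sensitive and CONFIDENCE_RANK[f.confidence] < limit
    ]

# Made-up rows for illustration only.
findings = [
    ColumnFinding("public", "users", "email", "text", True, "PII", "high",
                  "Column name indicates an email address",
                  "Replace with a synthetic address"),
    ColumnFinding("public", "users", "notes", "text", True, "PII", "low",
                  "Free text may contain personal details", "Set to NULL"),
    ColumnFinding("public", "users", "id", "integer", False, "", "high",
                  "Surrogate key", ""),
]

review = needs_review(findings)
```

Low-confidence findings are the ones most likely to be false positives or to need a hand-written rule, so it can help to review them before relying on the generated query.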
Generated Sanitization Query
An auto-generated SQL query that sanitizes all detected sensitive columns. The query is created as a new sanitization query with `source: generated` and is linked to the analysis.
Query Management
Viewing Queries
The target detail page shows all sanitization queries. Each query displays:
- The SQL text
- Whether it is active (used for sanitization runs)
- Its source (`generated` from analysis or `imported` manually)
- Version number
Editing Queries
Click the edit button on any query to modify its SQL in place. Save to apply your changes, or cancel to discard them.
Creating Manual Queries
Click New Query to write a custom sanitization query from scratch. Manual queries are created with `source: imported`.
Applying Debug Suggestions
When an ephemeral sanitization run fails, the system can provide AI-generated debug suggestions including a fixed query. Click Apply Fix to create a new query from the suggested fix.
Activating Queries
Only one query per target can be active at a time. Use the Activate button to switch which query is used for sanitization runs.
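
The one-active-query rule can be pictured as a small invariant: activating a query deactivates every other query for the same target. The sketch below is only an illustration of that rule; `SanitizationQuery` and `activate` are hypothetical names, and nothing here reflects how Clonit stores queries internally.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for a target's sanitization queries.
@dataclass
class SanitizationQuery:
    id: int
    sql: str
    source: str          # "generated" or "imported"
    active: bool = False

def activate(queries, query_id):
    """Mark exactly one query active and deactivate all the others."""
    if not any(q.id == query_id for q in queries):
        raise KeyError(f"no query with id {query_id}")
    for q in queries:
        q.active = (q.id == query_id)

# Illustrative data: query 1 starts out active, then query 2 takes over.
queries = [
    SanitizationQuery(1, "UPDATE users SET email = NULL;", "generated", active=True),
    SanitizationQuery(2, "UPDATE users SET email = 'x@example.com';", "imported"),
]
activate(queries, 2)
```

Checking that the id exists before touching any flags means a mistyped id cannot leave the target with no active query.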
Ephemeral Sanitization
Ephemeral sanitization runs the full sanitize pipeline in a Docker container, providing a safe and isolated environment for testing sanitization queries.
How It Works
- Restore – the source snapshot is restored to a PostgreSQL container
- Apply – the active sanitization query is executed against the restored database
- Dump – the sanitized database is dumped to a new snapshot
- Cleanup – the Docker container is removed
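
The four steps above can be sketched as a function whose cleanup runs no matter which earlier step fails. This is a minimal illustration under stated assumptions, not Clonit's implementation: `run_ephemeral` and `FakeContainer` are hypothetical, the container is a stand-in that only records calls, and the `fast` flag mirrors the documented fast mode that skips the final dump.

```python
class FakeContainer:
    """Hypothetical stand-in for a PostgreSQL container; it only records calls."""

    def __init__(self, fail_on=None):
        self.fail_on = fail_on   # name of a step that should raise, if any
        self.calls = []

    def _step(self, name):
        self.calls.append(name)
        if name == self.fail_on:
            raise RuntimeError(f"{name} failed")

    def restore(self, snapshot):
        self._step("restore")

    def apply(self, query):
        self._step("apply")

    def dump(self):
        self._step("dump")

    def remove(self):
        self._step("cleanup")


def run_ephemeral(container, snapshot, query, fast=False):
    """Restore, apply, optionally dump, and always remove the container."""
    try:
        container.restore(snapshot)
        container.apply(query)
        if not fast:
            container.dump()
    finally:
        # Cleanup runs even when restore, apply, or dump raised.
        container.remove()


ok = FakeContainer()
run_ephemeral(ok, "latest", "UPDATE users SET email = NULL;")

broken = FakeContainer(fail_on="apply")
try:
    run_ephemeral(broken, "latest", "UPDATE users SET emial = NULL;")
except RuntimeError:
    pass  # the failure propagates, but the container was still removed
```

Putting the removal in a `finally` block is what keeps a failed query from leaving a stray container behind.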
Running an Ephemeral Sanitize
From the target detail page:
- Click Ephemeral Sanitize
- Configure options:
- Snapshot – select which snapshot to sanitize (defaults to latest)
- Query – select which sanitization query to apply (defaults to active)
- Fast mode – skip the final dump step (useful for testing queries)
- Build new – build a fresh snapshot before sanitizing
- Docker image – override the PostgreSQL Docker image
- Click Start
Real-Time Progress
The ephemeral sanitize dialog shows step-by-step progress via server-sent events (SSE):
- Container creation
- Snapshot restore
- Query execution
- Sanitized dump
- Cleanup
Each step shows its status (pending, running, completed, failed) with timing information.
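
For readers unfamiliar with SSE, the sketch below parses a captured event stream into step updates. The framing it handles (`event:` and `data:` fields, comment lines starting with `:`, a blank line ending each event) is the standard SSE wire format. The event name `step` and the JSON fields `step` and `status` are assumptions made for illustration; the actual payload Clonit sends is not documented here.

```python
import json

def parse_sse(stream_text):
    """Parse SSE-formatted text into a list of (event, data) tuples."""
    events = []
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line dispatches the event collected so far.
            if data_lines:
                events.append((event, "\n".join(data_lines)))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
    return events

# Hypothetical stream: the event name and JSON fields are illustrative only.
sample = (
    'event: step\n'
    'data: {"step": "restore", "status": "running"}\n'
    '\n'
    ': keep-alive comment\n'
    'event: step\n'
    'data: {"step": "restore", "status": "completed"}\n'
    '\n'
)

updates = [json.loads(data) for _, data in parse_sse(sample)]
```

An event that is not yet terminated by a blank line is deliberately left undelivered, which matches how SSE clients treat a stream that is cut off mid-event.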
Cancelling a Run
Click Cancel to abort a running or pending ephemeral sanitization.
Debug Suggestions
When a run fails (typically due to a SQL error in the sanitization query), the system can generate AI-powered debug suggestions:
- Explanation of what went wrong
- Suggestions for how to fix the issue
- Fixed query – a corrected version of the SQL that can be applied with one click
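
The run, fail, apply fix, rerun cycle can be pictured as a bounded loop. In the UI each step is a manual click, so the sketch below is purely conceptual: `sanitize_until_pass`, `fake_run`, and `fake_fix` are hypothetical stand-ins, not Clonit functions, and the attempt limit is an assumption added so the loop cannot run forever.

```python
def sanitize_until_pass(run, suggest_fix, query, max_attempts=3):
    """Run the query; on failure, swap in the suggested fix and try again."""
    for attempt in range(1, max_attempts + 1):
        error = run(query)  # None on success, an error message otherwise
        if error is None:
            return query, attempt
        query = suggest_fix(query, error)
    raise RuntimeError(f"still failing after {max_attempts} attempts")

# Hypothetical stand-ins for an ephemeral run and a debug suggestion.
def fake_run(query):
    return 'column "emial" does not exist' if "emial" in query else None

def fake_fix(query, error):
    return query.replace("emial", "email")

final_query, attempts = sanitize_until_pass(
    fake_run, fake_fix, "UPDATE users SET emial = NULL;"
)
```

A suggested fix is still model output, so reviewing each corrected query before rerunning it is the safer habit, even though the sketch applies it automatically.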
Run History
The target detail page shows a history of all ephemeral sanitization runs with status, timing, and access to debug suggestions.
Workflow Example
A typical AI-assisted sanitization workflow:
1. Add a target with source and destination URLs
2. Build a snapshot from the source database
3. Run sanitization analysis to detect sensitive columns
4. Review the generated query and edit if needed
5. Run ephemeral sanitize to test the query
6. If it fails, apply the AI-suggested fix
7. Repeat until the sanitization passes
8. Use the query for production sanitization runs

See Also
- Sanitization – CLI-based sanitization workflow
- serve command – Start the web UI
- Snapshot Workflow – Build, load, push, pull