Sanitization Analysis

Clonit includes an AI-powered sanitization analysis feature that uses a large language model (LLM) to detect sensitive columns in your database and generate sanitization SQL queries. This reduces the manual effort of writing comprehensive sanitization scripts, although the generated query should still be reviewed before use.

The analysis examines your database schema (and optionally sample data) to identify columns containing personally identifiable information (PII), credentials, financial data, and other sensitive information.

To use sanitization analysis, you need:

  1. An Anthropic API key – the analysis uses Claude as the LLM
  2. A target with a source database URL configured
  3. The web UI running (clonit serve)

Add your Anthropic API key to the config file:

analysis:
  api_key: "sk-ant-..."
  model: "claude-sonnet-4-20250514" # optional, defaults to Claude Sonnet
  max_tokens: 8192 # optional

Or via environment variables:

CLONIT_ANALYSIS__API_KEY=sk-ant-...
CLONIT_ANALYSIS__MODEL=claude-sonnet-4-20250514
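
The variable names above suggest a naming scheme: a CLONIT_ prefix, then the config section and key joined by a double underscore. The following Python sketch illustrates that mapping only. The function name env_to_nested is hypothetical, and how Clonit actually parses its environment (including whether environment variables take precedence over the config file) is not stated here.

```python
PREFIX = "CLONIT_"  # prefix seen in the variable names above


def env_to_nested(environ):
    """Map PREFIX + SECTION__KEY variables onto a nested dict (illustrative only)."""
    nested = {}
    for name, value in environ.items():
        if not name.startswith(PREFIX):
            continue  # ignore unrelated variables
        path = name[len(PREFIX):].lower().split("__")
        node = nested
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return nested


# A plain dict stands in for os.environ so the example is reproducible.
example = {
    "CLONIT_ANALYSIS__API_KEY": "sk-ant-...",
    "CLONIT_ANALYSIS__MODEL": "claude-sonnet-4-20250514",
    "HOME": "/home/user",
}
print(env_to_nested(example))
```

Under this assumed scheme, CLONIT_ANALYSIS__API_KEY corresponds to the api_key entry under analysis in the config file.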

To run an analysis from the web UI:

  1. Navigate to a target’s detail page
  2. Click Analyze to trigger a new analysis
  3. Optionally configure:
    • Include samples – send sample rows to the LLM for better detection (disabled by default for privacy)
    • Model – override the default LLM model
    • Sample rows – number of sample rows to include (default: 3)
  4. Wait for the analysis to complete (typically 30-120 seconds)

The analysis produces two outputs:

A table of all detected sensitive columns with:

Field          Description
Schema         Database schema name
Table          Table name
Column         Column name
Data Type      PostgreSQL data type
Sensitive      Whether the column contains sensitive data
Category       Type of sensitivity (PII, credentials, financial, etc.)
Confidence     Detection confidence (high, medium, low)
Reason         Explanation of why the column was flagged
Sanitize Hint  Suggested sanitization approach
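
When reviewing the results, it can help to look at the least confident detections first. The Python sketch below models one row of the table as a dataclass and orders flagged columns by confidence. The class name Finding, the function review_queue, and the sample rows are hypothetical; Clonit's real export format, if any, is not described here.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Field names mirror the table columns above.
    schema: str
    table: str
    column: str
    data_type: str
    sensitive: bool
    category: str
    confidence: str  # "high", "medium", or "low"
    reason: str
    sanitize_hint: str


CONFIDENCE_ORDER = {"low": 0, "medium": 1, "high": 2}


def review_queue(findings):
    """Return sensitive findings, least confident first."""
    flagged = [f for f in findings if f.sensitive]
    return sorted(flagged, key=lambda f: CONFIDENCE_ORDER[f.confidence])


# Made-up sample rows for illustration.
findings = [
    Finding("public", "users", "email", "text", True, "PII", "high",
            "Email address", "Replace with fake email"),
    Finding("public", "users", "notes", "text", True, "PII", "low",
            "Free text may contain names", "Set to NULL"),
    Finding("public", "users", "id", "integer", False, "", "high",
            "Surrogate key", ""),
]
for f in review_queue(findings):
    print(f"{f.schema}.{f.table}.{f.column}: {f.category} ({f.confidence})")
```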

An auto-generated SQL query that sanitizes all detected sensitive columns. The query is created as a new sanitization query with source: generated and is linked to the analysis.
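
To give a sense of the kind of statement such a query might contain, the Python sketch below assembles a PostgreSQL UPDATE from a mapping of columns to replacement expressions. It is purely illustrative: the helper names, the sample table, and the replacement expressions are assumptions, and the SQL that Clonit actually generates may look quite different.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_update(schema, table, assignments):
    """Build one UPDATE statement from {column: SQL expression}."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    sets = ", ".join(
        f"{quote_ident(col)} = {expr}" for col, expr in assignments.items()
    )
    return f"UPDATE {target} SET {sets};"


# Hypothetical table and replacement expressions.
sql = build_update("public", "users", {
    "email": "'user_' || id || '@example.com'",
    "password_hash": "NULL",
})
print(sql)
```

Deriving the fake email from the primary key keeps values unique, which matters if the column has a unique constraint.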

The target detail page shows all sanitization queries. Each query displays:

  • The SQL text
  • Whether it is active (used for sanitization runs)
  • Its source (generated from analysis or imported manually)
  • Version number

Click the edit button on any query to modify its SQL in place. Click Save to apply your changes, or Cancel to discard them.

Click New Query to write a custom sanitization query from scratch. Manual queries are created with source: imported.

When an ephemeral sanitization run fails, the system can provide AI-generated debug suggestions including a fixed query. Click Apply Fix to create a new query from the suggested fix.

Only one query per target can be active at a time. Use the Activate button to switch which query is used for sanitization runs.
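
The one-active-query rule can be pictured as an invariant that activation maintains: activating a query deactivates its siblings on the same target and leaves other targets alone. The Python sketch below is a minimal in-memory model of that rule. The Query class, the activate function, and the target names are hypothetical and do not reflect Clonit's internal data model.

```python
from dataclasses import dataclass


@dataclass
class Query:
    id: int
    target: str
    source: str  # "generated" or "imported"
    active: bool = False


def activate(queries, query_id):
    """Make query_id the only active query for its target."""
    chosen = next((q for q in queries if q.id == query_id), None)
    if chosen is None:
        raise ValueError(f"no query with id {query_id}")
    for q in queries:
        if q.target == chosen.target:
            q.active = q.id == query_id


queries = [
    Query(1, "orders-db", "generated", active=True),
    Query(2, "orders-db", "imported"),
    Query(3, "billing-db", "generated", active=True),
]
activate(queries, 2)
print([(q.id, q.active) for q in queries])
```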

Ephemeral sanitization runs the full sanitize pipeline in a Docker container, providing a safe and isolated environment for testing sanitization queries. The pipeline has four steps:

  1. Restore – the source snapshot is restored to a PostgreSQL container
  2. Apply – the active sanitization query is executed against the restored database
  3. Dump – the sanitized database is dumped to a new snapshot
  4. Cleanup – the Docker container is removed
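
The steps above can be sketched as a small runner in which cleanup always happens, even when an earlier step fails. This is an assumption-laden Python illustration, not Clonit's implementation: the function run_pipeline, the "skipped" status, and the guarantee that cleanup runs after a failure are all hypothetical, and the step callables are stand-ins for the real Docker and PostgreSQL operations.

```python
def run_pipeline(steps, cleanup, fast_mode=False):
    """Run (name, callable) steps in order; always run cleanup at the end."""
    results = []
    try:
        for name, step in steps:
            if fast_mode and name == "dump":
                results.append((name, "skipped"))  # fast mode skips the dump
                continue
            try:
                step()
            except Exception as exc:
                results.append((name, f"failed: {exc}"))
                break  # later steps are not attempted
            results.append((name, "completed"))
    finally:
        cleanup()
        results.append(("cleanup", "completed"))
    return results


def ok():
    pass


def bad_query():
    raise RuntimeError("syntax error")


failed_run = run_pipeline(
    [("restore", ok), ("apply", bad_query), ("dump", ok)], cleanup=ok
)
fast_run = run_pipeline(
    [("restore", ok), ("apply", ok), ("dump", ok)], cleanup=ok, fast_mode=True
)
print(failed_run)
print(fast_run)
```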

From the target detail page:

  1. Click Ephemeral Sanitize
  2. Configure options:
    • Snapshot – select which snapshot to sanitize (defaults to latest)
    • Query – select which sanitization query to apply (defaults to active)
    • Fast mode – skip the final dump step (useful for testing queries)
    • Build new – build a fresh snapshot before sanitizing
    • Docker image – override the PostgreSQL Docker image
  3. Click Start

The ephemeral sanitize dialog shows step-by-step progress via server-sent events (SSE):

  • Container creation
  • Snapshot restore
  • Query execution
  • Sanitized dump
  • Cleanup

Each step shows its status (pending, running, completed, failed) with timing information.
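
For readers unfamiliar with SSE, the Python sketch below parses the standard text format (event and data fields, with a blank line ending each event). The parsing follows the general SSE wire format; the event name "step" and the JSON payload shape are assumptions made for illustration, since the actual messages Clonit emits are not documented here.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream; the payload fields are assumed, not taken from Clonit.
sample = [
    "event: step\n",
    'data: {"step": "restore", "status": "running"}\n',
    "\n",
    ": keep-alive\n",
    "event: step\n",
    'data: {"step": "restore", "status": "completed"}\n',
    "\n",
]
updates = [(name, json.loads(body)) for name, body in parse_sse(sample)]
for name, payload in updates:
    print(name, payload["step"], payload["status"])
```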

Click Cancel to abort a running or pending ephemeral sanitization.

When a run fails (typically due to a SQL error in the sanitization query), the system can generate AI-powered debug suggestions:

  • Explanation of what went wrong
  • Suggestions for how to fix the issue
  • Fixed query – a corrected version of the SQL that can be applied with one click

The target detail page shows a history of all ephemeral sanitization runs with status, timing, and access to debug suggestions.

A typical AI-assisted sanitization workflow:

1. Add a target with source and destination URLs
2. Build a snapshot from the source database
3. Run sanitization analysis to detect sensitive columns
4. Review the generated query and edit if needed
5. Run ephemeral sanitize to test the query
6. If it fails, apply the AI-suggested fix
7. Repeat until the sanitization passes
8. Use the query for production sanitization runs
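
Steps 5 to 7 form a test-and-fix loop. The Python sketch below shows the shape of that loop with a bounded number of attempts. Everything in it is hypothetical: run and suggest_fix are stand-ins for an ephemeral sanitize run and the AI-suggested fix, and in the web UI each suggested fix is applied manually after review rather than automatically.

```python
def sanitize_until_passing(query, run, suggest_fix, max_attempts=5):
    """Re-run the query, applying a suggested fix after each failure."""
    for attempt in range(1, max_attempts + 1):
        error = run(query)  # None means the run passed
        if error is None:
            return query, attempt
        query = suggest_fix(query, error)
    raise RuntimeError(f"still failing after {max_attempts} attempts")


# Stand-ins: the run fails while the query contains a misspelled column.
def run(query):
    return 'column "emial" does not exist' if "emial" in query else None


def suggest_fix(query, error):
    return query.replace("emial", "email")


fixed, attempts = sanitize_until_passing(
    "UPDATE users SET emial = NULL;", run, suggest_fix
)
print(fixed, attempts)
```

Capping the number of attempts avoids looping indefinitely when a suggested fix does not resolve the error.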