Sanitization Analysis

Overview

Clonit includes an AI-powered sanitization analysis feature that uses a large language model (LLM) to detect sensitive columns in your database and generate SQL queries that sanitize them. Instead of writing a comprehensive sanitization script from scratch, you review and refine a generated draft.

The analysis examines your database schema (and optionally sample data) to identify columns containing personally identifiable information (PII), credentials, financial data, and other sensitive information.

Prerequisites

To use sanitization analysis, you need:

An Anthropic API key – the analysis uses Claude as the LLM
A target with a source database URL configured
The web UI running (clonit serve)

Configuration

Add your Anthropic API key to the config file:

analysis:
  api_key: "sk-ant-..."
  model: "claude-sonnet-4-20250514"    # optional, defaults to Claude Sonnet
  max_tokens: 8192                     # optional

Or via environment variables:

CLONIT_ANALYSIS__API_KEY=sk-ant-...
CLONIT_ANALYSIS__MODEL=claude-sonnet-4-20250514
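The double underscore in these variable names appears to separate a config section from a key, so CLONIT_ANALYSIS__API_KEY would correspond to analysis.api_key in the file. The sketch below shows that mapping, assuming Clonit follows this common convention. env_to_config is a hypothetical helper written for illustration, not part of Clonit, and it leaves every value as a string.

```python
from typing import Any, Dict, Mapping


def env_to_config(env: Mapping[str, str], prefix: str = "CLONIT_",
                  delimiter: str = "__") -> Dict[str, Any]:
    """Fold PREFIX + SECTION__KEY=value variables into a nested dict.

    Hypothetical illustration of the naming convention; not Clonit's code.
    """
    config: Dict[str, Any] = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as HOME or PATH
        # "ANALYSIS__API_KEY" -> ["analysis", "api_key"]
        path = [part.lower() for part in name[len(prefix):].split(delimiter)]
        node = config
        for key in path[:-1]:
            node = node.setdefault(key, {})  # create intermediate sections
        node[path[-1]] = value  # values stay strings; no type coercion
    return config
```

For example, env_to_config({"CLONIT_ANALYSIS__MODEL": "claude-sonnet-4-20250514"}) returns {"analysis": {"model": "claude-sonnet-4-20250514"}}; passing os.environ would pick up every CLONIT_ variable in the current shell.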

Running an Analysis

From the Web UI

Navigate to a target’s detail page
Click Analyze to trigger a new analysis
Optionally configure:
- Include samples – send sample rows to the LLM for better detection (disabled by default for privacy)
- Model – override the default LLM model
- Sample rows – number of sample rows to include (default: 3)
Wait for the analysis to complete (typically 30-120 seconds)
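Read together, the three options interact simply: sample rows are only sent when Include samples is on, and a per-run model override takes precedence over the configured analysis.model. The dataclass below models that reading as a hedged sketch; AnalysisOptions and its method names are hypothetical and do not reflect Clonit's internal API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisOptions:
    """Hypothetical model of the Analyze dialog options."""

    include_samples: bool = False  # off by default: only the schema is sent
    model: Optional[str] = None    # None means "use the configured model"
    sample_rows: int = 3           # ignored unless include_samples is True

    def effective_model(self, configured_model: str) -> str:
        # A per-run override wins over analysis.model from config/env.
        return self.model or configured_model

    def rows_to_send(self) -> int:
        # With samples disabled, no row data goes to the LLM at all.
        return self.sample_rows if self.include_samples else 0
```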

Analysis Results

The analysis produces two outputs:

Column Sensitivity Report

A table of the analyzed columns, with the following fields for each:

Field	Description
Schema	Database schema name
Table	Table name
Column	Column name
Data Type	PostgreSQL data type
Sensitive	Whether the column contains sensitive data
Category	Type of sensitivity (PII, credentials, financial, etc.)
Confidence	Detection confidence (high, medium, low)
Reason	Explanation of why the column was flagged
Sanitize Hint	Suggested sanitization approach
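Each row of the report maps naturally onto a record with the fields above. The sketch below assumes that shape (ColumnFinding and filter_findings are hypothetical names, not Clonit's API) and shows one way to triage findings: keep sensitive columns at or above a confidence threshold, most confident first.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Ordering for the confidence levels used in the report.
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class ColumnFinding:
    schema: str
    table: str
    column: str
    data_type: str      # PostgreSQL type, e.g. "text"
    sensitive: bool
    category: str       # e.g. "PII", "credentials", "financial"
    confidence: str     # "high", "medium" or "low"
    reason: str
    sanitize_hint: str

    @property
    def qualified_name(self) -> str:
        return f"{self.schema}.{self.table}.{self.column}"


def filter_findings(findings: Iterable[ColumnFinding],
                    min_confidence: str = "medium") -> List[ColumnFinding]:
    """Sensitive columns at or above min_confidence, most confident first."""
    threshold = CONFIDENCE_RANK[min_confidence]
    hits = [f for f in findings
            if f.sensitive and CONFIDENCE_RANK[f.confidence] >= threshold]
    # sorted() is stable, so ties keep their original report order.
    return sorted(hits, key=lambda f: CONFIDENCE_RANK[f.confidence], reverse=True)
```

Passing min_confidence="low" returns every sensitive finding.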

Generated Sanitization Query

An auto-generated SQL query that sanitizes all detected sensitive columns. The query is created as a new sanitization query with source: generated and is linked to the analysis.
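Because the LLM writes this query, its exact shape can vary. As a rough illustration of what a query covering the report might look like, the sketch below turns (schema, table, column, hint) tuples into one UPDATE per table with PostgreSQL-quoted identifiers. The hint names and SQL expressions in HINT_TEMPLATES are hypothetical placeholders, not Clonit's actual sanitize hints, and this is not how Clonit itself builds the query.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quote_ident(name: str) -> str:
    # PostgreSQL identifier quoting: wrap in double quotes, double embedded quotes.
    return '"' + name.replace('"', '""') + '"'


# Hypothetical hint -> SQL expression templates; {col} is the quoted column.
HINT_TEMPLATES: Dict[str, str] = {
    "hash": "md5({col}::text)",
    "null": "NULL",
    "fake_email": "'user_' || md5({col}::text) || '@example.com'",
}


def build_sanitize_sql(columns: List[Tuple[str, str, str, str]]) -> str:
    """columns: (schema, table, column, hint) tuples -> one UPDATE per table."""
    by_table: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for schema, table, column, hint in columns:
        col = quote_ident(column)
        expr = HINT_TEMPLATES[hint].format(col=col)  # KeyError on unknown hint
        by_table[(schema, table)].append(f"{col} = {expr}")
    statements = []
    for (schema, table), assignments in by_table.items():
        target = f"{quote_ident(schema)}.{quote_ident(table)}"
        statements.append(f"UPDATE {target} SET " + ", ".join(assignments) + ";")
    return "\n".join(statements)
```

Grouping assignments into one UPDATE per table, rather than one statement per column, means each table is rewritten once instead of once per sensitive column.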

Query Management

Viewing Queries

The target detail page shows all sanitization queries. Each query displays:

The SQL text
Whether it is active (used for sanitization runs)
Its source (generated from analysis or imported manually)
Its version number

Editing Queries

Click the edit button on any query to modify its SQL in place. Click Save to apply your changes or Cancel to discard them.

Creating Manual Queries

Click New Query to write a custom sanitization query from scratch. Manual queries are created with source: imported.

Applying Debug Suggestions

When an ephemeral sanitization run fails, the system can provide AI-generated debug suggestions, including a fixed query. Click Apply Fix to create a new query from the suggested fix.

Activating Queries

Only one query per target can be active at a time. Use the Activate button to switch which query is used for sanitization runs.
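The one-active-query rule is an invariant: activating a query must deactivate whichever query was active before. A minimal in-memory sketch follows, operating on the list of queries for a single target. SanitizationQuery and activate are hypothetical names, and the fields simply mirror what the query list displays (SQL text, active flag, source, version).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SanitizationQuery:
    id: int
    sql: str
    source: str          # "generated" or "imported"
    version: int
    active: bool = False


def activate(queries: List[SanitizationQuery], query_id: int) -> None:
    """Make query_id the only active query among one target's queries."""
    if not any(q.id == query_id for q in queries):
        # Validate first so a bad ID never leaves the target with no active query.
        raise KeyError(f"no query with id {query_id}")
    for q in queries:
        q.active = q.id == query_id
```

A database-backed store would typically perform both updates in one transaction, or enforce the rule directly with a PostgreSQL partial unique index on the target ID restricted to active rows.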

Ephemeral Sanitization

Ephemeral sanitization runs the full sanitize pipeline in a disposable Docker container, so you can test sanitization queries against a restored copy of a snapshot in an isolated environment.

How It Works

Restore – the source snapshot is restored to a PostgreSQL container
Apply – the selected sanitization query (the active query by default) is executed against the restored database
Dump – the sanitized database is dumped to a new snapshot (skipped in fast mode)
Cleanup – the Docker container is removed
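A natural way to structure these stages is to make Cleanup run whether or not an earlier stage fails, so a broken query does not leave a container behind. The sketch below expresses that with try/finally and also honors the fast mode option by skipping Dump. The callables are hypothetical stand-ins for whatever Clonit actually does at each stage (for example, starting the container or invoking PostgreSQL's restore and dump tools).

```python
from typing import Callable, List, Tuple

Step = Callable[[], None]


def run_ephemeral(restore: Step, apply: Step, dump: Step, cleanup: Step,
                  fast_mode: bool = False) -> List[str]:
    """Run Restore -> Apply -> Dump, always finishing with Cleanup.

    Returns the names of the stages that completed. If a stage raises,
    cleanup still runs and the exception propagates to the caller.
    """
    stages: List[Tuple[str, Step]] = [("restore", restore), ("apply", apply)]
    if not fast_mode:
        stages.append(("dump", dump))  # fast mode skips the final dump
    completed: List[str] = []
    try:
        for name, stage in stages:
            stage()
            completed.append(name)
    finally:
        cleanup()  # remove the container even if a stage failed
    completed.append("cleanup")
    return completed
```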

Running an Ephemeral Sanitize

From the target detail page:

Click Ephemeral Sanitize
Configure options:
- Snapshot – select which snapshot to sanitize (defaults to latest)
- Query – select which sanitization query to apply (defaults to active)
- Fast mode – skip the final dump step (useful for testing queries)
- Build new – build a fresh snapshot before sanitizing
- Docker image – override the PostgreSQL Docker image
Click Start
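When Snapshot and Query are left at their defaults, the run uses the latest snapshot and the active query. The sketch below spells out that resolution, including the failure cases where no snapshot exists yet or no query is active. Snapshot, QueryRef, and resolve_run_inputs are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Snapshot:
    id: str
    created_at: datetime


@dataclass
class QueryRef:
    id: int
    active: bool


def resolve_run_inputs(snapshots: List[Snapshot], queries: List[QueryRef],
                       snapshot_id: Optional[str] = None,
                       query_id: Optional[int] = None) -> Tuple[Snapshot, QueryRef]:
    """Apply the dialog defaults: latest snapshot, active query."""
    if snapshot_id is not None:
        snaps = [s for s in snapshots if s.id == snapshot_id]
    else:
        # Default: newest snapshot by creation time.
        snaps = sorted(snapshots, key=lambda s: s.created_at, reverse=True)[:1]
    if query_id is not None:
        picked = [q for q in queries if q.id == query_id]
    else:
        picked = [q for q in queries if q.active]  # default: the active query
    if not snaps:
        raise ValueError("no matching snapshot (Build new can create one)")
    if not picked:
        raise ValueError("no matching query (activate one or pick one explicitly)")
    return snaps[0], picked[0]
```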

Real-Time Progress

The ephemeral sanitize dialog shows step-by-step progress via server-sent events (SSE):

Container creation
Snapshot restore
Query execution
Sanitized dump
Cleanup

Each step shows its status (pending, running, completed, failed) with timing information.
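Server-sent events are a plain-text stream in which each message is a block of field lines (such as event: and data:) ended by a blank line. The sketch below encodes and decodes step-progress messages in that format. The event name step and the JSON payload fields (step, status, duration_ms) are assumptions; Clonit's actual event schema is not documented here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class StepEvent:
    step: str                          # e.g. "snapshot_restore" (hypothetical id)
    status: str                        # "pending", "running", "completed" or "failed"
    duration_ms: Optional[int] = None  # timing, once known


def encode_sse(event: StepEvent, event_name: str = "step") -> str:
    # One SSE message: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event_name}\ndata: {json.dumps(asdict(event))}\n\n"


def _field_value(line: str, field: str) -> str:
    # Per the SSE format, drop the field name, the colon, and at most one space.
    value = line[len(field) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream: str) -> Iterator[Tuple[str, Dict]]:
    """Yield (event name, decoded JSON) per message; assumes LF line endings."""
    for block in stream.split("\n\n"):
        name = "message"  # the SSE default event name
        data_lines: List[str] = []
        for line in block.split("\n"):
            if line.startswith("event:"):
                name = _field_value(line, "event")
            elif line.startswith("data:"):
                data_lines.append(_field_value(line, "data"))
        if data_lines:
            yield name, json.loads("\n".join(data_lines))
```

A browser client would normally rely on the built-in EventSource API rather than a hand-written parser; the parser here only makes the wire format concrete.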

Cancelling a Run

Click Cancel to abort a running or pending ephemeral sanitization run.
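Only runs that have not yet finished can be cancelled. A small status guard captures that rule. Pending and running come from the text above; the completed, failed, and cancelled run states, the RunStatus enum, and the cancel function are hypothetical additions for illustration.

```python
from enum import Enum


class RunStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"  # hypothetical terminal state for aborted runs


# Per the docs, only pending or running runs can be cancelled.
CANCELLABLE = {RunStatus.PENDING, RunStatus.RUNNING}


def cancel(status: RunStatus) -> RunStatus:
    """Return the new status, or raise if the run has already finished."""
    if status not in CANCELLABLE:
        raise ValueError(f"cannot cancel a run that is {status.value}")
    return RunStatus.CANCELLED
```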

Debug Suggestions

When a run fails (typically due to a SQL error in the sanitization query), the system can generate AI-powered debug suggestions:

Explanation of what went wrong
Suggestions for how to fix the issue
Fixed query – a corrected version of the SQL that can be applied with one click
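Applying a fix adds a new query rather than overwriting the one that failed, which keeps the original available for comparison. The sketch below models that under two assumptions this page does not confirm: the new query starts inactive (so it still has to be activated before sanitization runs use it), and IDs are assigned sequentially. DebugSuggestion, StoredQuery, and apply_fix are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DebugSuggestion:
    explanation: str                   # what went wrong
    suggestions: List[str]             # how to fix it
    fixed_query: Optional[str] = None  # corrected SQL, if one was proposed


@dataclass
class StoredQuery:
    id: int
    sql: str
    active: bool = False


def apply_fix(queries: List[StoredQuery], suggestion: DebugSuggestion) -> StoredQuery:
    """Store the fixed SQL as a new, inactive query; existing queries are untouched."""
    if not suggestion.fixed_query:
        raise ValueError("suggestion has no fixed query to apply")
    new = StoredQuery(
        id=max((q.id for q in queries), default=0) + 1,  # assumed sequential IDs
        sql=suggestion.fixed_query,
    )
    queries.append(new)
    return new
```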

Run History

The target detail page shows a history of all ephemeral sanitization runs with status, timing, and access to debug suggestions.

Workflow Example

A typical AI-assisted sanitization workflow:

1. Add a target with source and destination URLs
2. Build a snapshot from the source database
3. Run sanitization analysis to detect sensitive columns
4. Review the generated query and edit if needed
5. Run ephemeral sanitize to test the query
6. If it fails, apply the AI-suggested fix
7. Repeat steps 5 and 6 until the ephemeral run succeeds
8. Activate the query and use it for production sanitization runs
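The test-and-fix part of this workflow is a loop. The sketch below adds a hypothetical max_attempts cap, since an AI-suggested fix is not guaranteed to converge, and treats the ephemeral run and the fix suggestion as caller-supplied functions; iterate_until_passes and both callables are illustrative names, not Clonit APIs. A passing run only shows that the SQL executed without error, so reviewing which columns it actually touches still matters.

```python
from typing import Callable, Optional


def iterate_until_passes(sql: str,
                         run_ephemeral: Callable[[str], Optional[str]],
                         suggest_fix: Callable[[str, str], str],
                         max_attempts: int = 5) -> str:
    """run_ephemeral returns None on success or an error message on failure."""
    for _ in range(max_attempts):
        error = run_ephemeral(sql)
        if error is None:
            return sql                 # passed: ready to activate
        sql = suggest_fix(sql, error)  # apply the suggested fix and retry
    raise RuntimeError(f"still failing after {max_attempts} attempts; review by hand")
```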