Cloud AI Analysis
clonit analyze uses AI to read a target’s schema, flag columns that look
sensitive (PII, credentials, financial data, and so on), and generate a
versioned sanitization SQL query you can review and run. When Clonit Cloud is
connected, that analysis runs server-side, per organization, instead of on
your machine. A single shared analysis and generated query are then available to
your whole team, billed against the organization rather than each person’s own
Anthropic key.
The command and its results are identical either way. What changes is where the LLM runs and whose Anthropic key pays for it.
What leaves your machine — and what never does
This is the most important thing to understand about cloud analysis:
- What is sent: your target’s schema metadata (table, column, and data type names) and, unless you turn sampling off, a few sample rows per table to improve detection accuracy. The agent introspects your source database locally and ships only this metadata to the cloud.
- What is never sent: your database connection URLs and credentials. The agent connects to your database itself, on your machine. The cloud runs the LLM with its own Anthropic key and never sees how to reach your database.
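As a rough illustration of that boundary, the sketch below builds the kind of payload an agent might submit. The field names (tables, columns, samples) and the build_payload helper are assumptions made for this example, not Clonit's actual wire format. The point is only that the connection URL is used locally and never copied into what is sent.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str


@dataclass
class Table:
    name: str
    columns: list[Column]
    rows: list[dict] = field(default_factory=list)  # read locally by the agent


def build_payload(tables, sample_rows=3, no_samples=False):
    """Hypothetical payload builder: schema metadata plus optional sample rows."""
    payload = {"tables": []}
    for table in tables:
        entry = {
            "name": table.name,
            "columns": [
                {"name": c.name, "data_type": c.data_type} for c in table.columns
            ],
        }
        if not no_samples:
            # Only the first few rows per table are included.
            entry["samples"] = table.rows[:sample_rows]
        payload["tables"].append(entry)
    return payload


# The connection URL stays in local scope; it is never added to the payload.
database_url = "postgres://user:secret@localhost/mydb"  # illustrative value
users = Table(
    name="users",
    columns=[Column("id", "integer"), Column("email", "text")],
    rows=[{"id": i, "email": f"u{i}@example.com"} for i in range(10)],
)
payload = build_payload([users], sample_rows=3)
schema_only = build_payload([users], no_samples=True)
```

With no_samples=True the payload carries only table, column, and data type names, which mirrors what --no-samples is described as doing.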
How Clonit chooses cloud vs. local
The routing is automatic and follows one rule:
- Cloud is connected (you’ve run clonit login or set a cloud.api_key) → analysis runs in the cloud. The CLI tells you so: Cloud connected: analysis will run server-side (use --local to force local).
- Cloud is not connected → analysis runs locally with your own analysis.api_key.
- Cloud is connected but you pass --local → analysis is forced to run locally with your own key, skipping the cloud entirely.
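The routing rule above, together with the unreachable-cloud fallback described next, can be sketched as a small decision function. This is an illustrative model only: choose_backend, Backend, and NoAnalysisKeyError are hypothetical names, and the sketch assumes that any local run without a local key fails the same way as the documented no-key case.

```python
from enum import Enum


class Backend(Enum):
    CLOUD = "cloud"
    LOCAL = "local"


class NoAnalysisKeyError(Exception):
    """Raised when neither a reachable cloud nor a local key is available."""


def choose_backend(cloud_connected, force_local,
                   cloud_reachable=True, has_local_key=False):
    """Return where the analysis would run under the documented rule."""
    # --local always wins, and a missing cloud connection means local too.
    if force_local or not cloud_connected:
        if not has_local_key:
            # Assumption: a local run without analysis.api_key is an error.
            raise NoAnalysisKeyError("analysis API key not configured")
        return Backend.LOCAL
    if cloud_reachable:
        return Backend.CLOUD
    # Connected but unreachable: fall back only if a local key exists.
    if has_local_key:
        return Backend.LOCAL
    raise NoAnalysisKeyError("analysis API key not configured")
```

For example, choose_backend(True, False) yields Backend.CLOUD, while choose_backend(True, False, cloud_reachable=False, has_local_key=True) yields Backend.LOCAL.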
If the cloud is connected but momentarily unreachable, Clonit falls back to a
local analysis — provided you have a local analysis.api_key configured. With no
local key and no reachable cloud, the command stops with:

    analysis API key not configured; set analysis.api_key (or CLONIT_ANALYSIS__API_KEY),
    or connect a cloud that runs analysis (clonit login)

Running a cloud analysis
The command is the same one you already know:

    clonit analyze mydb

From clonit analyze --help:

    Analyzes the target database schema using AI to identify sensitive columns
    (PII, credentials, financial data, etc.) and generates a sanitization SQL query.

    Usage:
      clonit analyze <target> [flags]

    Flags:
      -h, --help              help for analyze
          --local             force local LLM analysis even when cloud is configured
          --model string      override Claude model
          --no-samples        skip data sampling (schema only)
          --sample-rows int   rows per table for sampling (default from config)

| Flag | What it does in the cloud flow |
|---|---|
| --local | Bypass the cloud and run the analysis on this machine with your own analysis.api_key. |
| --no-samples | Send schema metadata only; no sample rows leave your machine. |
| --sample-rows <n> | Number of rows per table to sample (default 3 from config). |
| --model <name> | Suggest a Claude model for this run. |
Async: submit, then poll
Cloud analysis is asynchronous. From your point of view at the keyboard, the command:
- Introspects the schema (and samples rows, unless --no-samples) locally.
- Submits the metadata to the cloud, which accepts it and queues the work.
- Polls the cloud until the analysis reaches a terminal state — completed or failed — then prints the results.
You run one command and wait; the submit-and-poll handshake happens for you. A typical analysis completes in roughly 30–120 seconds depending on schema size and how many sample rows you send. Because the work is queued server-side, multiple analyses across your organization can be in flight at once without interfering with each other.
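The submit-then-poll handshake can be pictured with a minimal sketch like the one below. It is not Clonit's client code: the submit and get_status callables, the job identifier, the intermediate state names (queued, running), and the interval and timeout values are all assumptions for illustration. Only the two terminal states, completed and failed, come from the description above.

```python
import time


def run_cloud_analysis(submit, get_status, metadata,
                       interval=2.0, timeout=300.0, sleep=time.sleep):
    """Submit metadata, then poll until the job is completed or failed."""
    job_id = submit(metadata)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status["state"] in ("completed", "failed"):
            return status  # terminal state reached
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"analysis {job_id} still {status['state']} after {timeout}s"
            )
        sleep(interval)


# Stand-in cloud that walks through hypothetical states.
states = iter(["queued", "running", "completed"])
result = run_cloud_analysis(
    submit=lambda metadata: "job-1",
    get_status=lambda job_id: {"state": next(states)},
    metadata={"tables": []},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
```

Injecting submit, get_status, and sleep keeps the loop testable without a network, which is why the sketch takes them as parameters.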
What you get back
A successful analysis produces the same two outputs as the standalone flow, now shared at the organization level:
- A column sensitivity report — each flagged column with its table, data type, sensitivity category, confidence, the reason it was flagged, and a suggested sanitization approach.
- A generated, versioned sanitization query — saved as the active query version for the target, ready to review, edit, or run.
Manage those query versions exactly as you would locally:

    clonit analyze queries mydb          # list every generated/imported version
    clonit analyze show mydb             # show the active query (or: show mydb 2)
    clonit analyze activate mydb 2       # choose which version is active
    clonit analyze import mydb fix.sql   # bring in your own SQL as a new version

See the analyze command reference for full details on these subcommands.
Per-organization Anthropic key and metering
Cloud analysis is billed against the organization, not against each member’s personal key. There are two ways the LLM call gets paid for:
- Your organization’s own Anthropic key. An owner or admin can provide a per-organization Anthropic key. When present, the cloud uses it for that org’s analyses. The key is stored encrypted, used only server-side, and is never returned by any API or shown back to you.
- The deployment’s shared key. If your organization hasn’t provided its own key, the cloud falls back to the deployment’s shared Anthropic key — unless the deployment is configured to require an org-supplied key, in which case you’ll need to add one before cloud analysis will run.
Either way, token usage is metered per organization. Each analysis records how many tokens it consumed against your org’s usage, so a team can see what its analyses cost without sharing a single personal API key around. Providing your own org key gives you direct control over that spend on your own Anthropic account.
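The key selection described in this section can be modeled roughly as follows. The function and parameter names (resolve_anthropic_key, require_org_key) are hypothetical, and the behavior when no deployment key exists at all is an assumption, since the text above does not cover that case.

```python
class AnalysisKeyUnavailableError(Exception):
    """Raised when the cloud has no key it is allowed to use for this org."""


def resolve_anthropic_key(org_key, deployment_key, require_org_key=False):
    """Return (source, key) for an organization's cloud analysis."""
    if org_key:
        # An org-supplied key always takes precedence.
        return ("organization", org_key)
    if require_org_key:
        # The deployment insists on an org-supplied key.
        raise AnalysisKeyUnavailableError(
            "this deployment requires an organization Anthropic key"
        )
    if deployment_key:
        return ("deployment", deployment_key)
    # Assumption: with no key of either kind, cloud analysis cannot run.
    raise AnalysisKeyUnavailableError("no Anthropic key available")
```

Returning the source alongside the key makes it easy to see which account a given run would be charged to.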
Sampling and privacy
By default Clonit samples a few rows per table and includes them in the analysis to improve accuracy. You control this per run:

    # Schema only: no row data leaves your machine
    clonit analyze mydb --no-samples

    # Send more rows per table for harder-to-spot columns
    clonit analyze mydb --sample-rows 25

| Setting | Default | Effect |
|---|---|---|
| --no-samples | off (sampling on) | Send schema metadata only; no sample rows transmitted. |
| --sample-rows <n> | 3 | Rows per table to sample and send when sampling is on. |
Pick the level of sharing that fits your data. --no-samples is the most
conservative: the cloud and the LLM provider see only table, column, and data
type names — never any row values. More sample rows generally means better
detection, at the cost of sending more real data off the machine.
Quick reference
| Situation | Where it runs | Whose Anthropic key | Sample rows sent? |
|---|---|---|---|
| Cloud connected, default | Cloud (async) | Org key, else deployment key | Yes, unless --no-samples |
| Cloud connected, --local | This machine | Your analysis.api_key | Yes, unless --no-samples |
| No cloud connection | This machine | Your analysis.api_key | Yes, unless --no-samples |
| Cloud connected but unreachable | Falls back to this machine (needs analysis.api_key) | Your analysis.api_key | Yes, unless --no-samples |
See also
- analyze command — full flag and subcommand reference.
- Sanitization Analysis guide — the standalone analysis flow and how generated queries feed sanitization.
- Clonit Cloud — what the optional cloud extension adds overall.
- Sign In & Login — connect this machine so analysis runs server-side.
- sanitize command — run the sanitization pipeline using the generated query.