Cloud AI Analysis

clonit analyze uses AI to read a target’s schema, flag columns that look sensitive (PII, credentials, financial data, and so on), and generate a versioned sanitization SQL query you can review and run. When Clonit Cloud is connected, that analysis runs server-side and per organization instead of on your machine. A single shared analysis and generated query are then available to your whole team, billed against the organization rather than against each person’s own Anthropic key.

The command and its results are identical either way. What changes is where the LLM runs and whose Anthropic key pays for it.

What leaves your machine — and what never does

This is the most important thing to understand about cloud analysis:

What is sent: your target’s schema metadata (table, column, and data type names) and, unless you turn sampling off, a few sample rows per table to improve detection accuracy. The agent introspects your source database locally and ships only this metadata, plus any sample rows, to the cloud.
What is never sent: your database connection URLs and credentials. The agent connects to your database itself, on your machine. The cloud runs the LLM with its own Anthropic key and never sees how to reach your database.

How Clonit chooses cloud vs. local

The routing is automatic and follows one rule:

Cloud is connected (you’ve run clonit login or set a cloud.api_key) → analysis runs in the cloud. The CLI tells you so:
```
Cloud connected: analysis will run server-side (use --local to force local).
```
Cloud is not connected → analysis runs locally with your own analysis.api_key.
Cloud is connected but you pass --local → analysis is forced to run locally with your own key, skipping the cloud entirely.

If the cloud is connected but momentarily unreachable, Clonit falls back to a local analysis — provided you have a local analysis.api_key configured. With no local key and no reachable cloud, the command stops with:

```
analysis API key not configured; set analysis.api_key (or CLONIT_ANALYSIS__API_KEY),
or connect a cloud that runs analysis (clonit login)
```
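
The routing rule and its fallback can be summarized as a small decision function. This is an illustrative sketch only, not Clonit's actual implementation: the function name `route_analysis` and its four 0/1 arguments are hypothetical and exist only to make the rule concrete.

```shell
#!/bin/sh
# Illustrative sketch only: mirrors the routing rule described above.
# route_analysis and its arguments are hypothetical, not part of the clonit CLI.
# Args (each 0 or 1): cloud_connected force_local cloud_reachable has_local_key
route_analysis() {
    cloud_connected=$1 force_local=$2 cloud_reachable=$3 has_local_key=$4

    # Connected, reachable, and not overridden by --local: run server-side.
    if [ "$cloud_connected" -eq 1 ] && [ "$force_local" -eq 0 ] \
        && [ "$cloud_reachable" -eq 1 ]; then
        echo "cloud"
        return 0
    fi

    # Every other path runs on this machine and needs analysis.api_key.
    if [ "$has_local_key" -eq 1 ]; then
        echo "local"
        return 0
    fi

    echo "error: analysis API key not configured" >&2
    return 1
}

route_analysis 1 0 1 0   # prints "cloud"
route_analysis 1 0 0 1   # prints "local" (cloud unreachable, local key present)
```

The only case that fails is the one with no usable cloud and no local key, which matches the error shown above.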

Running a cloud analysis

The command is the same one you already know:

```
clonit analyze mydb
```

From clonit analyze --help:

```
Analyzes the target database schema using AI to identify sensitive columns
(PII, credentials, financial data, etc.) and generates a sanitization SQL query.

Usage:
  clonit analyze <target> [flags]

Flags:
  -h, --help              help for analyze
      --local             force local LLM analysis even when cloud is configured
      --model string      override Claude model
      --no-samples        skip data sampling (schema only)
      --sample-rows int   rows per table for sampling (default from config)
```

| Flag | What it does in the cloud flow |
| --- | --- |
| `--local` | Bypass the cloud and run the analysis on this machine with your own `analysis.api_key`. |
| `--no-samples` | Send schema metadata only; no sample rows leave your machine. |
| `--sample-rows <n>` | Number of rows per table to sample (default `3` from config). |
| `--model <name>` | Suggest a Claude model for this run. |

Async: submit, then poll

Cloud analysis is asynchronous. From your point of view at the keyboard, the command:

1. Introspects the schema (and samples rows, unless --no-samples) locally.
2. Submits the metadata, plus any sample rows, to the cloud, which accepts it and queues the work.
3. Polls the cloud until the analysis reaches a terminal state (completed or failed), then prints the results.

You run one command and wait; the submit-and-poll handshake happens for you. A typical analysis completes in roughly 30–120 seconds depending on schema size and how many sample rows you send. Because the work is queued server-side, multiple analyses across your organization can be in flight at once without interfering with each other.
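
The submit-then-poll pattern itself is generic and can be sketched in a few lines. This is not how the CLI is implemented and it does not call any Clonit endpoint: `fetch_status` is a hypothetical stand-in that replays a canned sequence of states so the loop runs without network access. Only the terminal states `completed` and `failed` come from the description above; the intermediate state names are assumptions.

```shell
#!/bin/sh
# Illustrative sketch only: a generic poll-until-terminal-state loop.
# fetch_status is a hypothetical stub; "queued" and "running" are assumed names.
STATES="queued running running completed"

fetch_status() {
    # Set STATUS to the next canned state and drop it from the list.
    # The last state is kept so it repeats if polled again.
    STATUS=${STATES%% *}
    case $STATES in
        *" "*) STATES=${STATES#* } ;;
    esac
}

poll_until_done() {
    interval=$1   # seconds to wait between polls
    while :; do
        fetch_status
        case $STATUS in
            completed) echo "completed"; return 0 ;;
            failed)    echo "failed";    return 1 ;;
        esac
        sleep "$interval"
    done
}

poll_until_done 0   # prints "completed"
```

A real client would also want an overall timeout so that a stuck job cannot block the terminal forever; that is omitted here to keep the loop readable.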

What you get back

A successful analysis produces the same two outputs as the standalone flow, now shared at the organization level:

A column sensitivity report — each flagged column with its table, data type, sensitivity category, confidence, the reason it was flagged, and a suggested sanitization approach.
A generated, versioned sanitization query — saved as the active query version for the target, ready to review, edit, or run.

Manage those query versions exactly as you would locally:

```
clonit analyze queries mydb        # list every generated/imported version
clonit analyze show mydb           # show the active query (or: show mydb 2)
clonit analyze activate mydb 2     # choose which version is active
clonit analyze import mydb fix.sql # bring in your own SQL as a new version
```

See the analyze command reference for full details on these subcommands.

Per-organization Anthropic key and metering

Cloud analysis is billed against the organization, not against each member’s personal key. There are two ways the LLM call gets paid for:

Your organization’s own Anthropic key. An owner or admin can provide a per-organization Anthropic key. When present, the cloud uses it for that org’s analyses. The key is stored encrypted, used only server-side, and is never returned by any API or shown back to you.
The deployment’s shared key. If your organization hasn’t provided its own key, the cloud falls back to the deployment’s shared Anthropic key — unless the deployment is configured to require an org-supplied key, in which case you’ll need to add one before cloud analysis will run.

Either way, token usage is metered per organization. Each analysis records the tokens it consumed against your org’s usage, so a team can see what its analyses cost without passing anyone’s personal API key around. Providing your own org key gives you direct control over that spend on your own Anthropic account.
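
The order in which the cloud picks a key can be written out as a short function. This is a sketch of the precedence described above, not the server's code: `select_key`, its arguments, and the error wording are hypothetical. It prints which source would be used and never prints the key itself.

```shell
#!/bin/sh
# Illustrative sketch only: key precedence as described above.
# select_key and its arguments are hypothetical; the real logic runs server-side.
# Args: org_key deployment_key require_org_key(0|1)
select_key() {
    org_key=$1 deployment_key=$2 require_org_key=$3

    if [ -n "$org_key" ]; then
        # An org-supplied key always wins when present.
        echo "org"
    elif [ "$require_org_key" -eq 1 ]; then
        # Deployment insists on an org key, so the shared key is not used.
        echo "error: this deployment requires an org-supplied key" >&2
        return 1
    elif [ -n "$deployment_key" ]; then
        echo "deployment"
    else
        echo "error: no Anthropic key available" >&2
        return 1
    fi
}

select_key "org-secret" "shared-secret" 0   # prints "org"
select_key "" "shared-secret" 0             # prints "deployment"
```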

Sampling and privacy

By default Clonit samples a few rows per table and includes them in the analysis to improve accuracy. You control this per run:

```
# Schema only: no row data leaves your machine
clonit analyze mydb --no-samples

# Send more rows per table for harder-to-spot columns
clonit analyze mydb --sample-rows 25
```

| Setting | Default | Effect |
| --- | --- | --- |
| `--no-samples` | off (sampling on) | Send schema metadata only; no sample rows transmitted. |
| `--sample-rows <n>` | `3` | Rows per table to sample and send when sampling is on. |

Pick the level of sharing that fits your data. `--no-samples` is the most conservative: the cloud and the LLM provider see only table, column, and data type names, never any row values. More sample rows generally mean better detection, at the cost of sending more real data off the machine.

Quick reference

| Situation | Where it runs | Whose Anthropic key | Sample rows sent? |
| --- | --- | --- | --- |
| Cloud connected, default | Cloud (async) | Org key, else deployment key | Yes, unless `--no-samples` |
| Cloud connected, `--local` | This machine | Your `analysis.api_key` | Yes, unless `--no-samples` |
| No cloud connection | This machine | Your `analysis.api_key` | Yes, unless `--no-samples` |
| Cloud connected but unreachable | Falls back to this machine (needs `analysis.api_key`) | Your `analysis.api_key` | Yes, unless `--no-samples` |