API Documentation

Quick Start

Get up and running in four steps. All examples use curl.

1. Create an account

cleanthis.io uses anonymous accounts — no email or password required. A random account number is your only credential.

curl -s -X POST https://cleanthis.io/api/v1/account | jq

Response:

{
  "accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2",
  "prefix": "CT-7K9M",
  "warning": "Save this account number now. It cannot be recovered — we do not store it."
}

2. Log in and create an API key

# Log in (sets session cookie)
curl -s -X POST https://cleanthis.io/api/v1/account/login \
  -H "Content-Type: application/json" \
  -d '{"accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2"}' \
  -c cookies.txt | jq

# Create an API key
curl -s -X POST https://cleanthis.io/api/v1/account/keys \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "mode": "live"}' \
  -b cookies.txt | jq

Response:

{
  "rawKey": "ct_live_Ab3xK9mP...",
  "keyId": "key_01",
  "prefix": "ct_live",
  "last4": "9mPq",
  "label": "production",
  "warning": "Save this API key now. It will not be shown again."
}

3. Sanitize a file (upload → poll → download)

# Upload (default: standard level)
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -F "file=@document.pdf" | jq

# Upload with a specific sanitization level
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -F "file=@document.pdf" \
  -F "level=aggressive" | jq

# Response:
# {
#   "jobId": "a1b2c3d4-...",
#   "status": "queued",
#   "level": "aggressive",
#   "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
# }

# Poll until complete
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." | jq

# Response when complete includes a signed downloadUrl:
# {
#   "status": "completed",
#   "level": "aggressive",
#   "downloadUrl": "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..."
# }

# Download using the signed URL from the response
curl -o clean.pdf "DOWNLOAD_URL_FROM_RESPONSE" \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..."

4. Sanitize from URL

# Default level (standard)
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf"}' | jq

# With explicit level
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf", "level": "light"}' | jq

# Same poll → download flow as above

Authentication

Anonymous Accounts

cleanthis.io uses anonymous accounts. No email, no password — just a random account number (CT-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX) generated at signup. This number is shown once and cannot be recovered — store it securely.

API Keys

API keys are created via the dashboard or the POST /api/v1/account/keys endpoint. Two modes are available:

ct_live_* — Production keys. Requests count against your quota.
ct_test_* — Test keys. Requests are fully validated but no file is processed and no real scan runs (you get a deterministic mock response), and they don't count against your quota — ideal for wiring up an integration.

Your account number is not an API key. The CT-XXXX-XXXX-… value is how you log in; it is not a Bearer token. API keys always start with ct_live_ or ct_test_ followed by 36 characters. Sending the account number to an API endpoint returns 401 {"error":"Invalid API key format."}.
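Because an account number sent as a Bearer token is rejected with 401, a client can catch the mix-up before making a request. The sketch below is a client-side pre-check only. The helper name `classify_credential` is hypothetical, the allowed characters in the 36-character suffix are not documented (so any non-whitespace character is accepted), and the uppercase-alphanumeric account-number groups are inferred from the example value.

```python
import re

# Prefix plus 36 characters, as described above. The character set is not
# documented, so any non-whitespace character is accepted here (assumption).
API_KEY_RE = re.compile(r"ct_(live|test)_\S{36}")
# CT- followed by six groups of four; uppercase alphanumerics are assumed
# from the example account number.
ACCOUNT_RE = re.compile(r"CT(-[A-Z0-9]{4}){6}")


def classify_credential(value):
    """Return 'live_key', 'test_key', 'account_number', or 'unknown'."""
    match = API_KEY_RE.fullmatch(value)
    if match:
        return f"{match.group(1)}_key"
    if ACCOUNT_RE.fullmatch(value):
        return "account_number"
    return "unknown"
```

A caller might refuse to send anything that does not classify as `live_key` or `test_key`, and point the user to the login flow when it sees `account_number`.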

Getting an API key

Create an account (if you don't have one) — you'll be given a CT-… account number. Save it; it's shown only once and can't be recovered.
Log in at My Dashboard with that account number.
Create a key from the dashboard's API-keys section. Pick test while you're integrating, live for real scans. The full key (ct_live_… / ct_test_…) is shown once — copy it immediately.
Use it as a Bearer token (see below).

The dashboard is the simplest path because login + key creation are protected against automated abuse. Prefer the command line? The same steps map to POST /api/v1/account → POST /api/v1/account/login → POST /api/v1/account/keys (see Quick Start).

Bearer Token Auth

All sanitization endpoints use Bearer token authentication:

Authorization: Bearer ct_live_YOUR_KEY

Session Auth

Dashboard and account management endpoints use cookie-based session authentication. Sessions are HttpOnly, Secure, and SameSite=Strict. Log in via POST /api/v1/account/login to receive the session cookie.

Sanitization Levels

Every sanitization request accepts an optional level parameter that controls how aggressively the file is processed. If omitted, standard is used.

Level	Behaviour	Output Format
`light`	Virus scan + privacy metadata removal only. File content stays intact — no re-encoding, no macro removal. For images, preserves ICC color profile and orientation for correct display. Accepts virtually any file type that carries readable metadata.	Original format preserved
`standard`	Default. Full Content Disarm & Reconstruction — re-encode, strip macros, flatten scripts, remove all metadata and hidden content.	Original format preserved
`aggressive`	Everything in Standard, plus conversion to the safest possible format. Destroys editability for maximum protection.	Converted: Office→PDF, Images→PNG, Audio→WAV, Video→MP4, HTML→TXT

How to specify

File upload (multipart/form-data) — send level as a form field:

curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F "file=@photo.jpg" \
  -F "level=light"

URL fetch (application/json) — include level in the JSON body:

curl -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf", "level": "aggressive"}'

Level in responses

The chosen level is echoed back in both the initial 202 Accepted response and every subsequent poll response, so you can always confirm which level was applied:

{
  "jobId": "a1b2c3d4-...",
  "status": "queued",
  "level": "aggressive",
  "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
}

What each level removes

Light — GPS coordinates, author/creator names, timestamps, camera info, software version, comments. File content (macros, scripts, embedded objects) is left intact. Light mode accepts virtually any file type (RAW photos, PSD, archives, fonts, 3D models, and hundreds more). Only executable/script formats (.exe, .dll, .bat, etc.) are blocked.
Standard — Everything in Light, plus: VBA macros, ActiveX controls, JavaScript (PDF), embedded attachments, OLE objects, encryption, formula injection (CSV), script tags (HTML/SVG), event handlers, external references.
Aggressive — Everything in Standard, plus: format conversion to eliminate any format-specific attack surface. The output file type may differ from the input.
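Since Aggressive output may have a different file type, a client that writes results to disk may want to anticipate the new extension. The following is a rough sketch: the conversion targets come from the level table, but which input extensions belong to each family is an illustrative guess, and `expected_extension` is a hypothetical helper rather than part of the API.

```python
from pathlib import PurePosixPath

# Conversion targets from the level table. The extensions listed for each
# family are illustrative guesses, not the service's actual list.
AGGRESSIVE_TARGETS = {
    ".pdf": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},  # Office -> PDF
    ".png": {".jpg", ".jpeg", ".gif", ".bmp", ".webp"},           # Images -> PNG
    ".wav": {".mp3", ".flac", ".ogg"},                            # Audio -> WAV
    ".mp4": {".mov", ".avi", ".mkv"},                             # Video -> MP4
    ".txt": {".html", ".htm"},                                    # HTML -> TXT
}


def expected_extension(filename, level="standard"):
    """Guess the extension of the sanitized file for a given level."""
    ext = PurePosixPath(filename).suffix.lower()
    if level != "aggressive":
        return ext  # light and standard preserve the original format
    for target, sources in AGGRESSIVE_TARGETS.items():
        if ext in sources:
            return target
    return ext  # not in the illustrative table: assume unchanged
```

In practice the `downloadName` field of the completed job response is the reliable source for the output filename; a guess like this is only useful for planning ahead of the upload.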

Archives (ZIP, 7z, TAR family, single-file compression)

Uploading an archive triggers recursive sanitization: each member is sanitized for its own format (PDF, Office, images, etc.) using the chosen level, then everything is re-packaged.

Supported containers:

.zip — output is a fresh .zip
.7z — output is a fresh .7z (LZMA2)
.tar, .tar.gz / .tgz, .tar.bz2 / .tbz / .tbz2, .tar.xz / .txz — output is always .tar.gz regardless of input compression
.gz, .bz2, .xz (single-file, not wrapping a tar) — decompress → sanitize the inner file → recompress back to the same format. The inner filename must carry an extension hint (e.g. report.csv.gz → inner .csv) so the decompressed payload can be routed to the right pipeline. Standard/Aggressive reject single-file archives without an inner ext; Light accepts them.

Tar and 7z archives must contain only regular files and directories. Symbolic links, device files, named pipes, and sockets are rejected at the bomb-check stage — they have no place in a sanitization payload and represent attack surface on extraction.

Limits

Maximum uncompressed size: 100 MB across all members
Maximum member count: 100 files (single-file compression: one member by definition)
Maximum compression ratio: 100:1 (archive-bomb guard)
Maximum nesting depth: 1 level (an archive inside an archive is allowed; deeper is rejected — applies uniformly across all archive types, including mixed nesting like a .tar.gz inside a .zip or a .zip inside a .7z)

Archives exceeding any of these are rejected at the bomb-check stage with a 400 response and an actionable error message (e.g. "Archive expands to 487MB, above our 100MB limit").
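A client can check a .zip against these limits locally before spending an upload on it. The sketch below uses only the Python standard library. It assumes binary megabytes for "100 MB" and a ratio computed as total uncompressed size over total compressed size; the server's exact definitions are not stated here. It does not check nesting depth, and `preflight_zip` is a hypothetical helper name.

```python
import io
import zipfile

MAX_TOTAL_BYTES = 100 * 1024 * 1024  # "100 MB"; binary megabytes assumed
MAX_MEMBERS = 100
MAX_RATIO = 100  # uncompressed : compressed


def preflight_zip(data):
    """Return a list of limit violations; an empty list means none were found."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
    total = sum(info.file_size for info in files)       # sizes from zip headers
    packed = sum(info.compress_size for info in files)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"expands to {total} bytes, above the 100 MB limit")
    if len(files) > MAX_MEMBERS:
        problems.append(f"{len(files)} files, above the 100-file limit")
    if packed and total / packed > MAX_RATIO:
        problems.append(f"compression ratio {total / packed:.0f}:1, above 100:1")
    return problems
```

The sizes are read from the archive's own headers, so this is a convenience check for well-formed archives, not a substitute for the server-side bomb check.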

Content that can't be fully sanitized — `archiveUnsupported`

Standard and Aggressive levels cover 100+ file extensions with format-specific sanitization. When an archive member has an extension outside that set (rare formats, RAW photos, 3D models, …), the archiveUnsupported parameter controls what happens. Default is drop. This applies to both multi-member archives and single-file compression:

Value	Multi-member archive (.zip / .7z / tar family)	Single-file compression (.gz / .bz2 / .xz)
`drop` (default)	Members that can't be fully sanitized are removed from the output. The output contains only files that received full `level`-grade sanitization. Removed paths are listed in the report under `archive_dropped`.	With only one inner file, "drop" means rejecting the whole upload — the response is `400` with an actionable message that suggests Light mode or `light_fallback`.
`light_fallback`	Members that can't be fully sanitized fall back to Light treatment instead (virus scan + metadata strip). May contain files where macros / scripts / embedded objects were not removed. Listed in the report under `archive_fallback`.	The inner file falls back to Light treatment, then is recompressed in the same wrapper. The report's `innerTreatment` field is set to `"light_fallback"` so consumers can detect the downgrade without parsing labels.

When level=aggressive, archiveUnsupported is forced to drop: Aggressive mode's promise is that no original bytes survive, which Light treatment would contradict.

Executable/script extensions (.exe, .dll, .bat, etc.) inside Standard/Aggressive archive jobs are always rejected regardless of archiveUnsupported — both as archive members and as the inner file of a single-compressed upload. Every upload — including light_fallback payloads — is virus-scanned at the outer archive level before processing, with nested members (.zip / .7z / .gz / .tar etc.) covered recursively.

Example

curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F "file=@bundle.zip" \
  -F "level=standard" \
  -F "archiveUnsupported=light_fallback"

Report

Archive jobs return a roll-up report with per-member tallies plus full per-file detail. The changes array includes summary entries with these type values:

archive — summary count (e.g. "Archive contained 12 files: 10 sanitized at Standard, 2 removed")
archive_cleaned — list of members that were processed at the requested level
archive_fallback — list of members that received Light treatment instead (only present when archiveUnsupported=light_fallback)
archive_dropped — list of members removed (with reason: unsupported, blocked executable, bomb check)
archive_errored — list of members the sanitizer failed on (with reason)

The job's report.memberReports array contains one entry per cleaned/fallback member with that member's own full sanitization report (the same shape as a single-file job), so API consumers can list exactly what was stripped from each file. Each entry has path, cleanedPath, treatment (the level it was actually processed at: "light" / "standard" / "aggressive" / "light_fallback"), and report.

The job's report.archiveStats object contains numeric counts: totalMembers, cleanedCount, fallbackCount, droppedCount, erroredCount, nestedArchiveCount, and policy.
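As a rough illustration of consuming these fields, the sketch below lists the members that only received Light treatment and builds a one-line roll-up from `archiveStats`. It takes the parsed JSON of a completed job as a plain dict; the function names are hypothetical, and missing fields are treated as empty or zero, which is a defensive choice rather than documented behaviour.

```python
def fallback_members(job):
    """Paths of archive members that only received Light treatment."""
    reports = job.get("report", {}).get("memberReports", [])
    return [m["path"] for m in reports if m.get("treatment") == "light_fallback"]


def archive_summary(job):
    """One-line roll-up built from report.archiveStats."""
    stats = job.get("report", {}).get("archiveStats", {})
    return (
        f"{stats.get('cleanedCount', 0)} cleaned, "
        f"{stats.get('fallbackCount', 0)} fallback, "
        f"{stats.get('droppedCount', 0)} dropped, "
        f"{stats.get('erroredCount', 0)} errored "
        f"of {stats.get('totalMembers', 0)}"
    )
```

A pipeline that must not pass on files with macros or scripts intact could treat a non-empty `fallback_members(job)` result as a reason to quarantine those paths.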

Nested archives are flattened

When an archive contains a nested archive (zip or tar), that wrapper is not represented as a single entry in memberReports. Instead, its contents appear in the parent's memberReports / droppedMembers / erroredMembers arrays with paths prefixed by the wrapper (e.g. inner.tar.gz/photo.png). The result is one flat list of actual files, regardless of how deeply nested they were. The archiveStats.nestedArchiveCount field counts how many wrappers were flattened. Mixed nesting is fine (a .tar.gz inside a .zip or vice versa).

Endpoints — Sanitization

Method	Endpoint	Auth	Rate Limit	Description
POST	`/api/v1/sanitize`	Bearer	20/min	Upload file for sanitization. Optional `level` and `webhook` fields.
POST	`/api/v1/sanitize-url`	Bearer	20/min	Fetch from URL and sanitize. Optional `level` and `webhook` fields in JSON body.
GET	`/api/v1/job/:id`	Bearer	120/min	Poll job status. Returns `downloadUrl` when complete.
GET	`/api/v1/download/:id`	Bearer	30/min	Download sanitized file. Requires valid signed URL params (`expires`, `sig`).
POST	`/api/v1/cancel/:id`	Bearer	20/min	Cancel a queued or running job.

POST `/api/v1/sanitize`

Upload a file for sanitization. Optionally include a webhook URL, a level, and (for archive / single-file-compressed uploads) an archiveUnsupported policy. See Sanitization Levels and Archives for details.

POST /api/v1/sanitize
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: multipart/form-data

file: (binary)
level: standard                             (optional: light | standard | aggressive)
archiveUnsupported: drop                    (optional, archive + single-compressed: drop | light_fallback)
webhook: https://your-server.com/callback   (optional)

Response 202 Accepted:

{
  "jobId": "a1b2c3d4-...",
  "status": "queued",
  "level": "standard",
  "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-...",
  "message": "File accepted for sanitization. Poll the statusUrl for progress.",
  "webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}

POST `/api/v1/sanitize-url`

Provide a URL to fetch and sanitize. The server downloads the file, then processes it. Accepts the same level and archiveUnsupported parameters as the file upload endpoint.

POST /api/v1/sanitize-url
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: application/json

{
  "url": "https://example.com/bundle.zip",
  "level": "standard",
  "archiveUnsupported": "light_fallback",
  "webhook": "https://your-server.com/callback"
}

Response 202 Accepted:

{
  "jobId": "e5f6g7h8-...",
  "status": "queued",
  "level": "standard",
  "statusUrl": "https://cleanthis.io/api/v1/job/e5f6g7h8-...",
  "message": "URL accepted for sanitization. Poll the statusUrl for progress.",
  "webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}
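A small helper can validate the documented options before the request is sent, so that typos surface locally instead of as a 400. This is a sketch under the assumption that the only accepted values are the ones listed in this document; `build_sanitize_url_body` is a hypothetical name, and the https check mirrors the webhook requirement stated in the Webhooks section.

```python
import json

LEVELS = {"light", "standard", "aggressive"}
ARCHIVE_POLICIES = {"drop", "light_fallback"}


def build_sanitize_url_body(url, level=None, archive_unsupported=None, webhook=None):
    """Validate the documented options and return the JSON request body."""
    body = {"url": url}
    if level is not None:
        if level not in LEVELS:
            raise ValueError(f"level must be one of {sorted(LEVELS)}")
        body["level"] = level
    if archive_unsupported is not None:
        if archive_unsupported not in ARCHIVE_POLICIES:
            raise ValueError(f"archiveUnsupported must be one of {sorted(ARCHIVE_POLICIES)}")
        body["archiveUnsupported"] = archive_unsupported
    if webhook is not None:
        if not webhook.startswith("https://"):
            raise ValueError("webhook URLs must use https://")
        body["webhook"] = webhook
    return json.dumps(body)
```

The returned string can be passed as the request body with `Content-Type: application/json`; omitted options are left out so the server applies its defaults.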

GET `/api/v1/job/:id`

Poll job status. The response changes as the job progresses. The level field is always included so you can confirm which sanitization mode was applied.

Queued / In progress:

{
  "jobId": "a1b2c3d4-...",
  "status": "processing",
  "level": "standard"
}

Completed:

{
  "jobId": "a1b2c3d4-...",
  "status": "completed",
  "level": "standard",
  "downloadName": "doc_cleaned.pdf",
  "downloadUrl": "https://cleanthis.io/api/v1/download/a1b2c3d4-...?expires=...&sig=...",
  "report": {
    "changes": [
      { "type": "level", "label": "Sanitization level: Standard — full CDR pipeline" },
      { "type": "macros", "label": "Removed 1 macro component: VBA Project" },
      { "type": "metadata", "label": "Stripped 3 metadata fields: Author: \"John\"; Creator: \"Word\"..." }
    ],
    "summary": "3 changes made during sanitization."
  }
}

See Sanitization Report & Metadata for full details on the report object.
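The polling step can be sketched as a small loop that is independent of any HTTP library. Here `get_status` is any callable that returns the parsed JSON of GET /api/v1/job/:id as a dict. Only `queued`, `processing`, and `completed` appear in this document as poll statuses; treating `failed` and `cancelled` as terminal is an assumption, as are the default interval and timeout.

```python
import time

# "completed" is documented; "failed" and "cancelled" are assumed names
# for the other terminal states of a polled job.
TERMINAL = {"completed", "failed", "cancelled"}


def wait_for_job(get_status, job_id, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll get_status(job_id) until the job reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status(job_id)
        if job.get("status") in TERMINAL:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r}")
        sleep(interval)
```

A 2-second interval works out to 30 polls per minute for one job, which stays well inside the 120/min poll limit even with a few jobs in flight.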

GET `/api/v1/download/:id`

Download the sanitized file. Use the downloadUrl from the poll or webhook response — it includes the required expires and sig query parameters.

curl -o clean.pdf "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..." \
  -H "Authorization: Bearer ct_live_YOUR_KEY"

POST `/api/v1/cancel/:id`

Cancel a queued or in-progress job.

curl -X POST https://cleanthis.io/api/v1/cancel/a1b2c3d4-... \
  -H "Authorization: Bearer ct_live_YOUR_KEY"

Response 200 OK:

{ "status": "cancelled" }

Endpoints — Webpage Scanner BETA

Scan a web page for threats, trackers, and brand-impersonation — the same engine behind the Webpage Scanner page, over the API. Three tiers:

Quick (light) — reputation & blocklists only; never loads the page.
Standard (standard) — reputation + a single page fetch with static analysis (trackers, vulnerable libraries, AI-prompt-injection, TLS, redirects, brand-clone).
Deep (aggressive) — loads the page in a real sandboxed browser (runtime connections, behaviour, screenshot, cloaking). Asynchronous — returns a jobId you poll.

Method	Endpoint	Auth	Rate Limit	Description
POST	`/api/v1/scan-url`	Bearer	10/min · 50/day	Scan a URL. Body `{ url, tier, bypassCache? }`. `light`/`standard` return the verdict directly; `aggressive` returns a `jobId`.
GET	`/api/v1/scan-job/:id`	Bearer	120/min	Poll a Deep (`aggressive`) scan. Returns the result when complete.
POST	`/api/v1/scan-job/:id/cancel`	Bearer	120/min	Cancel an in-flight Deep scan.

POST `/api/v1/scan-url`

Quick / Standard (synchronous):

curl -s -X POST https://cleanthis.io/api/v1/scan-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "tier": "standard" }' | jq

Response 200 OK:

{
  "ok": true,
  "url": "https://example.com",
  "tier": "standard",
  "verdict": "clean",            // clean | suspicious | malicious | unreachable | unknown
  "findings": [ /* per-source/analyzer results */ ],
  "scores": { "security": { "value": 92, "band": "green" }, "privacy": {…}, "legitimacy": {…} },
  "cached": false,
  "computedAt": "2026-05-31T18:00:00.000Z"
}

Deep scan (asynchronous — submit, then poll):

curl -s -X POST https://cleanthis.io/api/v1/scan-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "tier": "aggressive" }' | jq
# → { "ok": true, "jobId": "a1b2c3d4-...", "state": "processing" }

GET `/api/v1/scan-job/:id`

Poll until state is completed or failed. On completion, if a screenshot was captured, result.screenshot.url is a short-lived (15 min) signed link.

curl -s https://cleanthis.io/api/v1/scan-job/a1b2c3d4-... \
  -H "Authorization: Bearer ct_live_YOUR_KEY" | jq
# { "ok": true, "state": "processing" }
# … then …
# { "ok": true, "state": "completed", "result": { "verdict": "clean", "findings": […], "scores": {…}, "screenshot": { "url": "https://cleanthis.io/api/scan-shot/…?expires=…&sig=…" } } }
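Because the same endpoint answers either with an inline verdict or with a job to poll, client code usually needs to branch on the response shape. The sketch below does that using only the fields shown in the examples above; the function names are hypothetical.

```python
def interpret_scan_response(resp):
    """Return ('verdict', <verdict>) or ('poll', <jobId>) for a scan-url reply."""
    if "jobId" in resp:
        return ("poll", resp["jobId"])       # Deep tier: poll /api/v1/scan-job/:id
    if "verdict" in resp:
        return ("verdict", resp["verdict"])  # Quick / Standard: result is inline
    raise ValueError("unrecognised scan-url response")


def scan_job_verdict(resp):
    """Return the verdict once a polled Deep scan completes, else None."""
    if resp.get("state") == "completed":
        return resp["result"]["verdict"]
    if resp.get("state") == "failed":
        raise RuntimeError("deep scan failed")
    return None  # still processing: poll again
```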

Note: ct_test_* keys return a deterministic mock verdict (no real scan runs) and don't count against your quota.

Endpoints — Account Management

Method	Endpoint	Auth	Rate Limit	Description
POST	`/api/v1/account`	—	5/hr	Create anonymous account
POST	`/api/v1/account/login`	—	10/15min	Log in (sets session cookie)
POST	`/api/v1/account/logout`	Session	—	Clear session
GET	`/api/v1/account`	Session	30/min	Get account info + usage
DELETE	`/api/v1/account`	Session	30/min	Delete account and all data
GET	`/api/v1/account/keys`	Session	30/min	List API keys
POST	`/api/v1/account/keys`	Session	30/min	Create API key
DELETE	`/api/v1/account/keys/:id`	Session	30/min	Revoke API key
GET	`/api/v1/account/usage`	Session	30/min	Usage stats
GET	`/api/v1/account/webhook-secret`	Session	30/min	Get webhook signing secret
POST	`/api/v1/account/webhook-secret/rotate`	Session	30/min	Rotate webhook secret

Webhooks

How to use

Include a webhook URL in your sanitize request. When the job completes (or fails), cleanthis.io sends a POST request to that URL with the result.

# Upload with webhook
curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F "file=@document.pdf" \
  -F "webhook=https://your-server.com/callback"

Payload

The webhook POST body is JSON:

{
  "event": "job.completed",
  "jobId": "a1b2c3d4-...",
  "status": "completed",
  "downloadUrl": "https://cleanthis.io/api/v1/download/...?expires=...&sig=...",
  "downloadName": "doc_cleaned.pdf",
  "report": {
    "changes": [
      { "type": "level", "label": "Sanitization level: Aggressive — full CDR + format conversion" },
      { "type": "macros", "label": "Removed 1 macro component: VBA Project" },
      { "type": "conversion", "label": "Converted to PDF for maximum safety" }
    ],
    "summary": "3 changes made during sanitization (Aggressive mode)."
  },
  "timestamp": "2026-04-23T01:23:00.000Z"
}

The report field contains the full sanitization report including any detected metadata and the sanitization level applied. See Sanitization Report & Metadata for the complete structure.

Signature Verification

Every webhook includes two headers for verification:

X-CleanThis-Signature — HMAC-SHA256 of the raw request body, prefixed with sha256=
X-CleanThis-Timestamp — ISO 8601 timestamp of when the webhook was sent

Always verify the signature before trusting the payload. Use a timing-safe comparison to prevent timing attacks.

Node.js

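const crypto = require('crypto');
const express = require('express');

const app = express();
// Signing secret from the dashboard or GET /api/v1/account/webhook-secret.
// The environment variable name is illustrative.
const WEBHOOK_SECRET = process.env.CLEANTHIS_WEBHOOK_SECRET;

function verifyWebhook(body, signature, secret) {
  // Reject a missing header or an unparsed body up front
  if (typeof signature !== 'string' || typeof body !== 'string') return false;
  const expected = Buffer.from('sha256=' + crypto
    .createHmac('sha256', secret)
    .update(body, 'utf8')
    .digest('hex'));
  const received = Buffer.from(signature);
  // timingSafeEqual throws if the lengths differ, so compare them first
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// Express handler: express.text() keeps the body as the raw string that was signed
app.post('/callback', express.text({ type: '*/*' }), (req, res) => {
  const sig = req.headers['x-cleanthis-signature'];
  if (!verifyWebhook(req.body, sig, WEBHOOK_SECRET)) {
    return res.status(403).send('Invalid signature');
  }
  const event = JSON.parse(req.body);
  console.log('Completed:', event.downloadUrl);
  res.sendStatus(200);
});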

Python

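import hashlib
import hmac
import os

from flask import Flask, request

app = Flask(__name__)
# Signing secret from the dashboard or GET /api/v1/account/webhook-secret.
# The environment variable name is illustrative.
WEBHOOK_SECRET = os.environ['CLEANTHIS_WEBHOOK_SECRET']

def verify_webhook(body: bytes, signature: str, secret: str) -> bool:
    expected = 'sha256=' + hmac.new(
        secret.encode(), body, hashlib.sha256
    ).hexdigest()
    # Compare as bytes: compare_digest() raises TypeError on non-ASCII str input
    return hmac.compare_digest(signature.encode(), expected.encode())

# Flask handler: request.data is the raw body that was signed
@app.route('/callback', methods=['POST'])
def webhook():
    sig = request.headers.get('X-CleanThis-Signature', '')
    if not verify_webhook(request.data, sig, WEBHOOK_SECRET):
        return 'Invalid signature', 403
    event = request.get_json(force=True)
    print('Completed:', event['downloadUrl'])
    return '', 200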
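The X-CleanThis-Timestamp header can also be used to discard stale deliveries. This section does not say whether the timestamp is covered by the signature, so the check below is only a supplement to signature verification, not a replacement. The five-minute tolerance and the helper name `is_fresh` are assumptions.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(timestamp_header, now=None, tolerance=timedelta(minutes=5)):
    """True if the X-CleanThis-Timestamp value is within tolerance of now."""
    try:
        # fromisoformat() rejects a trailing "Z" before Python 3.11
        sent = datetime.fromisoformat(timestamp_header.replace("Z", "+00:00"))
    except ValueError:
        return False
    if sent.tzinfo is None:
        return False  # refuse timestamps without an explicit UTC offset
    now = now or datetime.now(timezone.utc)
    return abs(now - sent) <= tolerance
```

A handler could call `is_fresh(request.headers.get('X-CleanThis-Timestamp', ''))` after the signature check and answer with a 4xx status when it returns False.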

Retry Behavior

If your endpoint returns a non-2xx status code or doesn't respond within 10 seconds, cleanthis.io retries up to 3 times (4 attempts in total) with exponential backoff:

Attempt 1 — immediate
Attempt 2 — after 2 seconds
Attempt 3 — after 8 seconds
Attempt 4 (final) — after 32 seconds
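One consequence of this schedule is that a handler can receive the same event more than once, for example when it processed the first delivery but took longer than 10 seconds to respond. A minimal de-duplication sketch follows; it keeps state in process memory, which is an assumption suitable only for a single-process server, and `first_delivery` is a hypothetical helper name.

```python
# (event, jobId) pairs already handled. In-memory only: a multi-process
# deployment would need a shared store instead.
_seen = set()


def first_delivery(event):
    """Return True the first time an (event, jobId) pair is seen."""
    key = (event.get("event", ""), event.get("jobId", ""))
    if key in _seen:
        return False
    _seen.add(key)
    return True
```

A handler would still answer a repeated delivery with a 2xx status so that no further retries are sent; it would simply skip the side effects.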

Requirements

HTTPS only — webhook URLs must use https://
Public IP — the URL must resolve to a public IP address (SSRF-protected; private/loopback ranges are rejected)

Managing Your Secret

Retrieve your webhook signing secret from the dashboard or programmatically:

GET /api/v1/account/webhook-secret
Cookie: session=...

To rotate your secret (the old secret is immediately invalidated):

POST /api/v1/account/webhook-secret/rotate
Cookie: session=...

Signed Download URLs

All download URLs are signed with HMAC-SHA256 and carry an expiry timestamp. You never need to construct them yourself: the downloadUrl returned by the poll endpoint and the webhook payload is ready to use.

API downloads — signed URLs expire in 15 minutes
Web UI downloads — signed URLs expire in 5 minutes

Expired & Tampered URLs

Expired URLs return 410 Gone with a descriptive message
Tampered URLs (modified sig or expires) return 403 Forbidden

If your download URL has expired, simply poll GET /api/v1/job/:id again to receive a fresh signed URL.
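The refresh-on-expiry behaviour can be sketched without committing to an HTTP library. In the example below, `get_job` returns the parsed poll response and `http_get` returns a `(status_code, body_bytes)` pair; both are hypothetical callables supplied by the caller, and retrying only once by default is an arbitrary choice.

```python
def download_with_refresh(get_job, http_get, job_id, max_refreshes=1):
    """Download a sanitized file, polling again if the signed URL has expired.

    get_job(job_id) -> dict   parsed GET /api/v1/job/:id response
    http_get(url)   -> (status_code, body_bytes)
    """
    job = get_job(job_id)
    for _ in range(max_refreshes + 1):
        status, body = http_get(job["downloadUrl"])
        if status == 200:
            return body
        if status != 410:
            # 403 (tampered URL) and other errors are not fixed by re-polling
            raise RuntimeError(f"download failed with HTTP {status}")
        job = get_job(job_id)  # 410 Gone: poll again for a fresh signed URL
    raise RuntimeError("signed URL kept expiring")
```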

Sanitization Report & Metadata

Every completed job includes a report object in both the poll response (GET /api/v1/job/:id) and the webhook payload. This report describes exactly what was found and removed during sanitization — including embedded metadata.

Report Structure

{
  "report": {
    "changes": [
      {
        "type": "macros",
        "label": "Removed 1 macro component: VBA Project"
      },
      {
        "type": "exif",
        "label": "Stripped 14 EXIF/metadata fields including: GPS Location data; Model: \"iPhone 15 Pro\"",
        "details": [
          { "key": "GPSLatitude", "value": "40.7128 N" },
          { "key": "GPSLongitude", "value": "74.0060 W" },
          { "key": "Make", "value": "Apple" },
          { "key": "Model", "value": "iPhone 15 Pro" },
          { "key": "DateTimeOriginal", "value": "2025:03:15 14:30:00" },
          { "key": "Software", "value": "17.4" }
        ]
      }
    ],
    "summary": "2 changes made during sanitization."
  }
}

Change Types

Each entry in the changes array has a type, a human-readable label, and optionally a details array for metadata fields.

Type	Description	Has Details
`level`	Indicates which sanitization level was applied (always present)	No
`conversion`	File was converted to a safer format (aggressive mode only)	No
`size`	File size changed significantly during sanitization	No
`macros`	VBA macros, ActiveX controls, or OLE objects removed	No
`attachments`	Embedded file attachments removed (PDF)	No
`encryption`	PDF encryption/password protection removed	No
`javascript`	JavaScript code removed (PDF)	No
`csv_formulas`	Dangerous formula cells neutralised (CSV/TSV)	No
`scripts`	Script tags, iframes, objects, forms removed (HTML/SVG/EPUB/subtitles)	No
`event_handlers`	Inline event handlers removed (onclick, onload, etc.)	No
`external_refs`	External URL references removed	No
`exif`	EXIF/metadata stripped from images, audio, and video	Yes
`metadata`	Document metadata stripped (author, creator, software, etc.)	Yes

Metadata Details

When type is "exif" or "metadata", the change object includes a details array — the complete list of metadata fields that were detected and stripped. Each entry has a key (field name) and value (original value, truncated to 120 characters).

This is the same data the web UI offers as CSV/JSON export. As an API consumer, you receive it directly in the JSON response and can format it however you need.

Extracting Metadata — Examples

Node.js

const res = await fetch(`https://cleanthis.io/api/v1/job/${jobId}`, {
  headers: { 'Authorization': `Bearer ${API_KEY}` }
});
const data = await res.json();

if (data.status === 'completed' && data.report) {
  // Find metadata changes (exif or metadata type)
  const metaChanges = data.report.changes.filter(
    c => c.details && c.details.length > 0
  );

  for (const change of metaChanges) {
    console.log(change.label);
    for (const field of change.details) {
      console.log(`  ${field.key}: ${field.value}`);
    }
  }
}

Python

import requests

res = requests.get(
    f'https://cleanthis.io/api/v1/job/{job_id}',
    headers={'Authorization': f'Bearer {api_key}'}
)
data = res.json()

if data['status'] == 'completed' and data.get('report'):
    for change in data['report']['changes']:
        if 'details' in change:
            print(change['label'])
            for field in change['details']:
                print(f"  {field['key']}: {field['value']}")

curl + jq

# Get full report
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
  -H "Authorization: Bearer ct_live_YOUR_KEY" | jq '.report'

# Extract just the metadata fields
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  | jq '[.report.changes[] | select(.details) | .details[]] | from_entries'

# Save metadata as CSV
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  | jq -r '.report.changes[] | select(.details) | .details[] | [.key, .value] | @csv' \
  > metadata.csv

Clean Files

If the file was already clean, the report will have an empty changes array:

{
  "report": {
    "changes": [],
    "summary": "No significant threats or changes detected. The file was already clean."
  }
}

Rate Limits

Endpoint Group	Limit	Keyed By
API v1 sanitize / cancel	20/min	Account ID
API v1 job poll	120/min	Account ID
API v1 download	30/min	Account ID
Account create	5/hr	IP
Account login	10/15min	IP
Account management	30/min	Session
Web UI upload	10/min	IP
Web UI poll	120/min	IP
Web UI download	30/min	IP

API v1 endpoints are keyed by account ID (not IP), so multiple servers behind NAT share the per-account limit. When you are rate-limited, the response includes a Retry-After header with the number of seconds to wait.
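Honouring Retry-After can be wrapped around any request function. In this sketch `send` is a hypothetical callable returning `(status_code, headers_dict, body)`; the one-second fallback for a missing or unparsable header and the default of three attempts are assumptions.

```python
import time


def call_with_retry_after(send, max_attempts=3, sleep=time.sleep):
    """Call send() and wait as instructed by Retry-After on 429 responses."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, body
        # Header names are case-insensitive; fall back to 1 s if it is absent
        lowered = {k.lower(): v for k, v in headers.items()}
        try:
            wait = max(0.0, float(lowered.get("retry-after", 1)))
        except ValueError:
            wait = 1.0
        sleep(wait)
```

If the final attempt is still rate-limited, the 429 is returned to the caller rather than raised, so the caller decides whether to queue the work or give up.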

Error Responses

All errors return JSON with a single error field:

{ "error": "File type not allowed: .exe" }

Common Status Codes

Code	Meaning
`400`	Bad request — missing or invalid parameters
`401`	Unauthorized — missing or invalid API key / session
`403`	Forbidden — tampered signed URL or invalid webhook signature
`404`	Not found — job ID doesn't exist or doesn't belong to your account
`410`	Gone — download URL has expired
`413`	Payload too large — file exceeds the size limit
`415`	Unsupported media type — file type not allowed
`429`	Too many requests — rate limit exceeded (check `Retry-After` header)
`500`	Internal server error — something went wrong on our end
