API Documentation

Programmatic file sanitization — uploads, URL fetching, webhooks, signed downloads, and anonymous accounts.

Quick Start

Get up and running in four steps. All examples use curl.

1 Create an account

cleanthis.io uses anonymous accounts — no email or password required. A random account number is your only credential.

curl -s -X POST https://cleanthis.io/api/v1/account | jq

Response:

{
  "accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2",
  "prefix": "CT-7K9M",
  "warning": "Save this account number now. It cannot be recovered — we do not store it."
}

2 Log in and create an API key

# Log in (sets session cookie)
curl -s -X POST https://cleanthis.io/api/v1/account/login \
  -H "Content-Type: application/json" \
  -d '{"accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2"}' \
  -c cookies.txt | jq

# Create an API key
curl -s -X POST https://cleanthis.io/api/v1/account/keys \
  -H "Content-Type: application/json" \
  -d '{"label": "production", "mode": "live"}' \
  -b cookies.txt | jq

Response:

{
  "rawKey": "ct_live_Ab3xK9mP...",
  "keyId": "key_01",
  "prefix": "ct_live",
  "last4": "9mPq",
  "label": "production",
  "warning": "Save this API key now. It will not be shown again."
}

3 Sanitize a file (upload → poll → download)

# Upload (default: standard level)
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -F "file=@document.pdf" | jq

# Upload with a specific sanitization level
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -F "file=@document.pdf" \
  -F "level=aggressive" | jq

# Response:
# {
#   "jobId": "a1b2c3d4-...",
#   "status": "queued",
#   "level": "aggressive",
#   "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
# }

# Poll until complete
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." | jq

# Response when complete includes a signed downloadUrl:
# {
#   "status": "completed",
#   "level": "aggressive",
#   "downloadUrl": "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..."
# }

# Download using the signed URL from the response
curl -o clean.pdf "DOWNLOAD_URL_FROM_RESPONSE" \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..."
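
The same upload and poll steps can be sketched in Node.js. This is a minimal sketch, not an official client: it assumes Node.js 18+ (global fetch, FormData, and Blob), and the helper names submitFile and pollJob, the 2-second interval, and the 60-attempt budget are illustrative choices rather than values from this API.

```javascript
// Sketch only: assumes Node.js 18+ (global fetch, FormData, Blob).
const fs = require('fs');
const path = require('path');

const BASE = 'https://cleanthis.io/api/v1';

// Upload a file; resolves to the 202 body ({ jobId, status, level, statusUrl }).
async function submitFile(apiKey, filePath, level = 'standard', fetchFn = fetch) {
  const form = new FormData();
  form.append('file', new Blob([fs.readFileSync(filePath)]), path.basename(filePath));
  form.append('level', level);
  const res = await fetchFn(`${BASE}/sanitize`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}` }, // fetch sets the multipart boundary itself
    body: form,
  });
  if (!res.ok) throw new Error(`Upload failed with HTTP ${res.status}`);
  return res.json();
}

// Poll statusUrl until the job leaves the queued/processing states.
// intervalMs and maxAttempts are hypothetical defaults, not documented values.
async function pollJob(apiKey, statusUrl, options = {}) {
  const { intervalMs = 2000, maxAttempts = 60, fetchFn = fetch } = options;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchFn(statusUrl, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`Poll failed with HTTP ${res.status}`);
    const job = await res.json();
    if (job.status !== 'queued' && job.status !== 'processing') return job;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('Job did not finish within the polling budget');
}
```

A caller would typically chain the two: pass the statusUrl from submitFile into pollJob, then fetch the returned downloadUrl when status is "completed".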

4 Sanitize from URL

# Default level (standard)
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf"}' | jq

# With explicit level
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_Ab3xK9mP..." \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf", "level": "light"}' | jq

# Same poll → download flow as above

Authentication

Anonymous Accounts

cleanthis.io uses anonymous accounts. No email, no password — just a random account number (CT-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX) generated at signup. This number is shown once and cannot be recovered — store it securely.

API Keys

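API keys are created via the dashboard or the POST /api/v1/account/keys endpoint. Two modes are available: live (keys prefixed ct_live_) for real scans, and test (keys prefixed ct_test_) for use while integrating.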

Your account number is not an API key. The CT-XXXX-XXXX-… value is how you log in; it is not a Bearer token. API keys always start with ct_live_ or ct_test_ followed by 36 characters. Sending the account number to an API endpoint returns 401 {"error":"Invalid API key format."}.
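
A client can catch the account-number mix-up before sending a request. The check below is a sketch: the function name classifyCredential is hypothetical, and because the documentation states only the length of the key suffix, the pattern assumes nothing about its character set beyond "36 non-whitespace characters".

```javascript
// Assumption: the 36 trailing characters are non-whitespace; only the length is documented.
const API_KEY_PATTERN = /^ct_(live|test)_\S{36}$/;
// Account numbers start with the CT- prefix.
const ACCOUNT_NUMBER_PATTERN = /^CT-/;

// Returns 'api-key', 'account-number', or 'unknown'.
function classifyCredential(value) {
  if (API_KEY_PATTERN.test(value)) return 'api-key';
  if (ACCOUNT_NUMBER_PATTERN.test(value)) return 'account-number';
  return 'unknown';
}
```

Refusing to send anything that is not classified as 'api-key' avoids a round trip that would end in the 401 described above.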

Getting an API key

  1. Create an account (if you don't have one) — you'll be given a CT-… account number. Save it; it's shown only once and can't be recovered.
  2. Log in at My Dashboard with that account number.
  3. Create a key from the dashboard's API-keys section. Pick test while you're integrating, live for real scans. The full key (ct_live_… / ct_test_…) is shown once — copy it immediately.
  4. Use it as a Bearer token (see below).

The dashboard is the simplest path because login + key creation are protected against automated abuse. Prefer the command line? The same steps map to POST /api/v1/account, then POST /api/v1/account/login, then POST /api/v1/account/keys (see Quick Start).

Bearer Token Auth

All sanitization endpoints use Bearer token authentication:

Authorization: Bearer ct_live_YOUR_KEY

Session Auth

Dashboard and account management endpoints use cookie-based session authentication. Sessions are HttpOnly, Secure, and SameSite=Strict. Log in via POST /api/v1/account/login to receive the session cookie.

Sanitization Levels

Every sanitization request accepts an optional level parameter that controls how aggressively the file is processed. If omitted, standard is used.

Level | Behaviour | Output Format
light | Virus scan + privacy metadata removal only. File content stays intact: no re-encoding, no macro removal. For images, preserves ICC color profile and orientation for correct display. Accepts virtually any file type that carries readable metadata. | Original format preserved
standard | Default. Full Content Disarm & Reconstruction: re-encode, strip macros, flatten scripts, remove all metadata and hidden content. | Original format preserved
aggressive | Everything in Standard, plus conversion to the safest possible format. Destroys editability for maximum protection. | Converted: Office→PDF, Images→PNG, Audio→WAV, Video→MP4, HTML→TXT

How to specify

File upload (multipart/form-data) — send level as a form field:

curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F "file=@photo.jpg" \
  -F "level=light"

URL fetch (application/json) — include level in the JSON body:

curl -X POST https://cleanthis.io/api/v1/sanitize-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/doc.pdf", "level": "aggressive"}'

Level in responses

The chosen level is echoed back in both the initial 202 Accepted response and every subsequent poll response, so you can always confirm which level was applied:

{
  "jobId": "a1b2c3d4-...",
  "status": "queued",
  "level": "aggressive",
  "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
}

What each level removes

Archives (ZIP, 7z, TAR family, single-file compression)

Uploading an archive triggers recursive sanitization: each member is sanitized for its own format (PDF, Office, images, etc.) using the chosen level, then everything is re-packaged.

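Supported containers: multi-member archives (.zip, .7z, and the tar family) and single-file compression (.gz, .bz2, .xz).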

Tar and 7z archives must contain only regular files and directories. Symbolic links, device files, named pipes, and sockets are rejected at the bomb-check stage — they have no place in a sanitization payload and represent attack surface on extraction.

Limits

Archives exceeding any of these are rejected at the bomb-check stage with a 400 response and an actionable error message (e.g. "Archive expands to 487MB, above our 100MB limit").

Content that can't be fully sanitized — archiveUnsupported

Standard and Aggressive levels cover 100+ file extensions with format-specific sanitization. When an archive member has an extension outside that set (rare formats, RAW photos, 3D models, …), the archiveUnsupported parameter controls what happens. Default is drop. This applies to both multi-member archives and single-file compression:

Value | Multi-member archive (.zip / .7z / tar family) | Single-file compression (.gz / .bz2 / .xz)
drop (default) | Members that can't be fully sanitized are removed from the output. The output contains only files that received full level-grade sanitization. Removed paths are listed in the report under archive_dropped. | With only one inner file, "drop" means rejecting the whole upload: the response is 400 with an actionable message that suggests Light mode or light_fallback.
light_fallback | Members that can't be fully sanitized fall back to Light treatment instead (virus scan + metadata strip). May contain files where macros / scripts / embedded objects were not removed. Listed in the report under archive_fallback. | The inner file falls back to Light treatment, then is recompressed in the same wrapper. The report's innerTreatment field is set to "light_fallback" so consumers can detect the downgrade without parsing labels.

archiveUnsupported is forced to drop when level=aggressive, even if light_fallback was requested. Aggressive mode's promise is that no original bytes survive, which Light treatment would contradict.

Executable/script extensions (.exe, .dll, .bat, etc.) inside Standard/Aggressive archive jobs are always rejected regardless of archiveUnsupported — both as archive members and as the inner file of a single-compressed upload. Every upload — including light_fallback payloads — is virus-scanned at the outer archive level before processing, with nested members (.zip / .7z / .gz / .tar etc.) covered recursively.
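
The aggressive-mode override described above can be mirrored client-side, so that a caller knows which policy will apply before uploading. This is a sketch under assumptions: effectiveArchivePolicy is a hypothetical helper, and the server's own behaviour remains authoritative.

```javascript
const ARCHIVE_POLICIES = ['drop', 'light_fallback'];

// Returns the archiveUnsupported policy expected to apply for a given level.
function effectiveArchivePolicy(level, requested = 'drop') {
  if (!ARCHIVE_POLICIES.includes(requested)) {
    throw new RangeError(`Unknown archiveUnsupported value: ${requested}`);
  }
  // Documented rule: aggressive always forces drop.
  if (level === 'aggressive') return 'drop';
  return requested;
}
```

Logging a warning when the effective policy differs from the requested one makes the downgrade visible in integration tests.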

Example

curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F "file=@bundle.zip" \
  -F "level=standard" \
  -F "archiveUnsupported=light_fallback"

Report

Archive jobs return a roll-up report with per-member tallies plus full per-file detail. The changes array includes summary entries with these type values:

The job's report.memberReports array contains one entry per cleaned/fallback member with that member's own full sanitization report (the same shape as a single-file job), so API consumers can list exactly what was stripped from each file. Each entry has path, cleanedPath, treatment (the level it was actually processed at: "light" / "standard" / "aggressive" / "light_fallback"), and report.

The job's report.archiveStats object contains numeric counts: totalMembers, cleanedCount, fallbackCount, droppedCount, erroredCount, nestedArchiveCount, and policy.
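
As an illustration of consuming these fields, the sketch below tallies members by treatment and lists the paths that were downgraded to Light. The helper name summarizeArchiveReport is hypothetical; only the field names (memberReports, path, treatment, archiveStats) come from the description above.

```javascript
// Summarize an archive job report: counts per treatment plus downgraded paths.
function summarizeArchiveReport(report) {
  const members = report.memberReports ?? [];
  const byTreatment = {};
  for (const member of members) {
    byTreatment[member.treatment] = (byTreatment[member.treatment] ?? 0) + 1;
  }
  const downgraded = members
    .filter((member) => member.treatment === 'light_fallback')
    .map((member) => member.path);
  return { byTreatment, downgraded, stats: report.archiveStats ?? null };
}
```

A non-empty downgraded list is a natural trigger for extra review, since those files may still carry macros or scripts.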

Nested archives are flattened

When an archive contains a nested archive (zip or tar), that wrapper is not represented as a single entry in memberReports. Instead, its contents appear in the parent's memberReports / droppedMembers / erroredMembers arrays with paths prefixed by the wrapper (e.g. inner.tar.gz/photo.png). The result is one flat list of actual files, regardless of how deeply nested they were. The archiveStats.nestedArchiveCount field counts how many wrappers were flattened. Mixed nesting is fine (a .tar.gz inside a .zip or vice versa).

Endpoints — Sanitization

Method | Endpoint | Auth | Rate Limit | Description
POST | /api/v1/sanitize | Bearer | 20/min | Upload file for sanitization. Optional level and webhook fields.
POST | /api/v1/sanitize-url | Bearer | 20/min | Fetch from URL and sanitize. Optional level and webhook fields in JSON body.
GET | /api/v1/job/:id | Bearer | 120/min | Poll job status. Returns downloadUrl when complete.
GET | /api/v1/download/:id | Bearer | 30/min | Download sanitized file. Requires valid signed URL params (expires, sig).
POST | /api/v1/cancel/:id | Bearer | 20/min | Cancel a queued or running job.

POST /api/v1/sanitize

Upload a file for sanitization. Optionally include a webhook URL, a level, and (for archive / single-file-compressed uploads) an archiveUnsupported policy. See Sanitization Levels and Archives for details.

POST /api/v1/sanitize
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: multipart/form-data

file: (binary)
level: standard                             (optional: light | standard | aggressive)
archiveUnsupported: drop                    (optional, archive + single-compressed: drop | light_fallback)
webhook: https://your-server.com/callback   (optional)

Response 202 Accepted:

{
  "jobId": "a1b2c3d4-...",
  "status": "queued",
  "level": "standard",
  "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-...",
  "message": "File accepted for sanitization. Poll the statusUrl for progress.",
  "webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}

POST /api/v1/sanitize-url

Provide a URL to fetch and sanitize. The server downloads the file, then processes it. Accepts the same level and archiveUnsupported parameters as the file upload endpoint.

POST /api/v1/sanitize-url
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: application/json

{
  "url": "https://example.com/bundle.zip",
  "level": "standard",
  "archiveUnsupported": "light_fallback",
  "webhook": "https://your-server.com/callback"
}

Response 202 Accepted:

{
  "jobId": "e5f6g7h8-...",
  "status": "queued",
  "level": "standard",
  "statusUrl": "https://cleanthis.io/api/v1/job/e5f6g7h8-...",
  "message": "URL accepted for sanitization. Poll the statusUrl for progress.",
  "webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}

GET /api/v1/job/:id

Poll job status. The response changes as the job progresses. The level field is always included so you can confirm which sanitization mode was applied.

Queued / In progress:

{
  "jobId": "a1b2c3d4-...",
  "status": "processing",
  "level": "standard"
}

Completed:

{
  "jobId": "a1b2c3d4-...",
  "status": "completed",
  "level": "standard",
  "downloadName": "doc_cleaned.pdf",
  "downloadUrl": "https://cleanthis.io/api/v1/download/a1b2c3d4-...?expires=...&sig=...",
  "report": {
    "changes": [
      { "type": "level", "label": "Sanitization level: Standard — full CDR pipeline" },
      { "type": "macros", "label": "Removed 1 macro component: VBA Project" },
      { "type": "metadata", "label": "Stripped 3 metadata fields: Author: \"John\"; Creator: \"Word\"..." }
    ],
    "summary": "3 changes made during sanitization."
  }
}

See Sanitization Report & Metadata for full details on the report object.

GET /api/v1/download/:id

Download the sanitized file. Use the downloadUrl from the poll or webhook response — it includes the required expires and sig query parameters.

curl -o clean.pdf "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..." \
  -H "Authorization: Bearer ct_live_YOUR_KEY"

POST /api/v1/cancel/:id

Cancel a queued or in-progress job.

curl -X POST https://cleanthis.io/api/v1/cancel/a1b2c3d4-... \
  -H "Authorization: Bearer ct_live_YOUR_KEY"

Response 200 OK:

{ "status": "cancelled" }

Endpoints — Webpage Scanner BETA

Scan a web page for threats, trackers, and brand-impersonation using the same engine behind the Webpage Scanner page, over the API. Three tiers: light (Quick), standard (Standard), and aggressive (Deep).

Method | Endpoint | Auth | Rate Limit | Description
POST | /api/v1/scan-url | Bearer | 10/min · 50/day | Scan a URL. Body { url, tier, bypassCache? }. light/standard return the verdict directly; aggressive returns a jobId.
GET | /api/v1/scan-job/:id | Bearer | 120/min | Poll a Deep (aggressive) scan. Returns the result when complete.
POST | /api/v1/scan-job/:id/cancel | Bearer | 120/min | Cancel an in-flight Deep scan.

POST /api/v1/scan-url

Quick / Standard (synchronous):

curl -s -X POST https://cleanthis.io/api/v1/scan-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "tier": "standard" }' | jq

Response 200 OK:

{
  "ok": true,
  "url": "https://example.com",
  "tier": "standard",
  "verdict": "clean",            // clean | suspicious | malicious | unreachable | unknown
  "findings": [ /* per-source/analyzer results */ ],
  "scores": { "security": { "value": 92, "band": "green" }, "privacy": {…}, "legitimacy": {…} },
  "cached": false,
  "computedAt": "2026-05-31T18:00:00.000Z"
}

Deep scan (asynchronous — submit, then poll):

curl -s -X POST https://cleanthis.io/api/v1/scan-url \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "url": "https://example.com", "tier": "aggressive" }' | jq
# → { "ok": true, "jobId": "a1b2c3d4-...", "state": "processing" }

GET /api/v1/scan-job/:id

Poll until state is completed or failed. On completion, if a screenshot was captured, result.screenshot.url is a short-lived (15 min) signed link.

curl -s https://cleanthis.io/api/v1/scan-job/a1b2c3d4-... \
  -H "Authorization: Bearer ct_live_YOUR_KEY" | jq
# { "ok": true, "state": "processing" }
# … then …
# { "ok": true, "state": "completed", "result": { "verdict": "clean", "findings": […], "scores": {…}, "screenshot": { "url": "https://cleanthis.io/api/scan-shot/…?expires=…&sig=…" } } }

Note: ct_test_* keys return a deterministic mock verdict (no real scan runs) and don't count against your quota.
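
Because light/standard scans answer directly while aggressive scans return a jobId, a client can hide the difference behind one call. The sketch below assumes Node.js 18+ for the global fetch; scanUrl, the polling interval, and the attempt budget are illustrative names and values, not part of the API.

```javascript
// Sketch only: assumes Node.js 18+ for the global fetch.
const BASE = 'https://cleanthis.io/api/v1';
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves to the scan verdict object for any tier.
async function scanUrl(apiKey, url, tier = 'standard', options = {}) {
  const { intervalMs = 2000, maxAttempts = 60, fetchFn = fetch } = options;
  const auth = { Authorization: `Bearer ${apiKey}` };

  const submit = await fetchFn(`${BASE}/scan-url`, {
    method: 'POST',
    headers: { ...auth, 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, tier }),
  });
  if (!submit.ok) throw new Error(`scan-url failed with HTTP ${submit.status}`);
  const first = await submit.json();

  // light/standard: the verdict is already in this response.
  if (!first.jobId) return first;

  // aggressive: poll the scan job until state is completed or failed.
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const poll = await fetchFn(`${BASE}/scan-job/${first.jobId}`, { headers: auth });
    if (!poll.ok) throw new Error(`scan-job poll failed with HTTP ${poll.status}`);
    const body = await poll.json();
    if (body.state === 'completed') return body.result;
    if (body.state === 'failed') throw new Error('Deep scan failed');
    await sleep(intervalMs);
  }
  throw new Error('Deep scan did not finish within the polling budget');
}
```

Keep the polling interval well inside the 120/min limit on the scan-job endpoint.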

Endpoints — Account Management

Method | Endpoint | Auth | Rate Limit | Description
POST | /api/v1/account | None | 5/hr | Create anonymous account
POST | /api/v1/account/login | None | 10/15min | Log in (sets session cookie)
POST | /api/v1/account/logout | Session | - | Clear session
GET | /api/v1/account | Session | 30/min | Get account info + usage
DELETE | /api/v1/account | Session | 30/min | Delete account and all data
GET | /api/v1/account/keys | Session | 30/min | List API keys
POST | /api/v1/account/keys | Session | 30/min | Create API key
DELETE | /api/v1/account/keys/:id | Session | 30/min | Revoke API key
GET | /api/v1/account/usage | Session | 30/min | Usage stats
GET | /api/v1/account/webhook-secret | Session | 30/min | Get webhook signing secret
POST | /api/v1/account/webhook-secret/rotate | Session | 30/min | Rotate webhook secret
Webhooks

How to use

Include a webhook URL in your sanitize request. When the job completes (or fails), cleanthis.io sends a POST request to that URL with the result.

# Upload with webhook
curl -X POST https://cleanthis.io/api/v1/sanitize \
  -H "Authorization: Bearer ct_live_YOUR_KEY" \
  -F file=@document.pdf \
  -F webhook=https://your-server.com/callback

Payload

The webhook POST body is JSON:

{
  "event": "job.completed",
  "jobId": "a1b2c3d4-...",
  "status": "completed",
  "downloadUrl": "https://cleanthis.io/api/v1/download/...?expires=...&sig=...",
  "downloadName": "doc_cleaned.pdf",
  "report": {
    "changes": [
      { "type": "level", "label": "Sanitization level: Aggressive — full CDR + format conversion" },
      { "type": "macros", "label": "Removed 1 macro component: VBA Project" },
      { "type": "conversion", "label": "Converted to PDF for maximum safety" }
    ],
    "summary": "3 changes made during sanitization (Aggressive mode)."
  },
  "timestamp": "2026-04-23T01:23:00.000Z"
}

The report field contains the full sanitization report including any detected metadata and the sanitization level applied. See Sanitization Report & Metadata for the complete structure.

Signature Verification

Every webhook includes two headers for verification:

Always verify the signature before trusting the payload. Use a timing-safe comparison to prevent timing attacks.

Node.js

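const crypto = require('crypto');
const express = require('express');

const app = express();
// Assumption: the signing secret is supplied via an environment variable.
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;

function verifyWebhook(body, signature, secret) {
  // Reject a missing header or an unparsed body outright.
  if (typeof signature !== 'string' || typeof body !== 'string') return false;
  const expected = Buffer.from('sha256=' + crypto
    .createHmac('sha256', secret)
    .update(body, 'utf8')
    .digest('hex'));
  const received = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// Express handler; express.text() keeps the raw body so the HMAC matches.
app.post('/callback', express.text({ type: '*/*' }), (req, res) => {
  const sig = req.headers['x-cleanthis-signature'];
  if (!verifyWebhook(req.body, sig, WEBHOOK_SECRET)) {
    return res.status(403).send('Invalid signature');
  }
  const event = JSON.parse(req.body);
  console.log('Completed:', event.downloadUrl);
  res.sendStatus(200);
});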
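
To exercise a handler locally without waiting for a real job, a test can sign its own payload. The sketch below infers the scheme from the verification example above ("sha256=" followed by the hex HMAC-SHA256 of the raw body); signPayload is a hypothetical helper and the secret shown in the usage comment is a placeholder.

```javascript
const crypto = require('crypto');

// Produce the signature header value for a raw JSON body.
// Assumption: the scheme matches the verification example ('sha256=' + hex HMAC-SHA256).
function signPayload(rawBody, secret) {
  return 'sha256=' + crypto
    .createHmac('sha256', secret)
    .update(rawBody, 'utf8')
    .digest('hex');
}

// Usage in a test: send rawBody to the handler with this header set.
// const rawBody = JSON.stringify({ event: 'job.completed', jobId: 'test-job' });
// const header = signPayload(rawBody, 'placeholder-secret');
```

Sign the exact string that is sent; re-serializing the JSON on either side changes the bytes and therefore the signature.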

Retry Behavior

If your endpoint returns a non-2xx status code or doesn't respond within 10 seconds, cleanthis.io retries up to 3 times with exponential backoff:

Requirements

Managing Your Secret

Retrieve your webhook signing secret from the dashboard or programmatically:

GET /api/v1/account/webhook-secret
Cookie: session=...

To rotate your secret (the old secret is immediately invalidated):

POST /api/v1/account/webhook-secret/rotate
Cookie: session=...

Signed Download URLs

All download URLs use HMAC-SHA256 signed URLs with an expiry timestamp. You never need to construct these yourself — the downloadUrl returned by the poll endpoint and webhook payload is ready to use.

Expired & Tampered URLs

If your download URL has expired, simply poll GET /api/v1/job/:id again to receive a fresh signed URL.
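
The refresh step can be folded into the download call itself. The sketch below retries once when the download answers 410 (the expired-URL status listed under Error Responses); it assumes Node.js 18+ for the global fetch, and downloadWithRefresh is a hypothetical helper name.

```javascript
// Sketch only: assumes Node.js 18+ for the global fetch.
const BASE = 'https://cleanthis.io/api/v1';

// Download the sanitized file, fetching a fresh signed URL once if the first has expired.
async function downloadWithRefresh(apiKey, jobId, downloadUrl, fetchFn = fetch) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  let res = await fetchFn(downloadUrl, { headers });

  if (res.status === 410) {
    // Expired: poll the job again to obtain a new downloadUrl.
    const poll = await fetchFn(`${BASE}/job/${jobId}`, { headers });
    if (!poll.ok) throw new Error(`Poll failed with HTTP ${poll.status}`);
    const job = await poll.json();
    res = await fetchFn(job.downloadUrl, { headers });
  }

  if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}
```

A 403 is deliberately not retried here, since it indicates a tampered URL rather than an expired one.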

Sanitization Report & Metadata

Every completed job includes a report object in both the poll response (GET /api/v1/job/:id) and the webhook payload. This report describes exactly what was found and removed during sanitization — including embedded metadata.

Report Structure

{
  "report": {
    "changes": [
      {
        "type": "macros",
        "label": "Removed 1 macro component: VBA Project"
      },
      {
        "type": "exif",
        "label": "Stripped 14 EXIF/metadata fields including: GPS Location data; Model: \"iPhone 15 Pro\"",
        "details": [
          { "key": "GPSLatitude", "value": "40.7128 N" },
          { "key": "GPSLongitude", "value": "74.0060 W" },
          { "key": "Make", "value": "Apple" },
          { "key": "Model", "value": "iPhone 15 Pro" },
          { "key": "DateTimeOriginal", "value": "2025:03:15 14:30:00" },
          { "key": "Software", "value": "17.4" }
        ]
      }
    ],
    "summary": "2 changes made during sanitization."
  }
}

Change Types

Each entry in the changes array has a type, a human-readable label, and optionally a details array for metadata fields.

Type | Description | Has Details
level | Indicates which sanitization level was applied (always present) | No
conversion | File was converted to a safer format (aggressive mode only) | No
size | File size changed significantly during sanitization | No
macros | VBA macros, ActiveX controls, or OLE objects removed | No
attachments | Embedded file attachments removed (PDF) | No
encryption | PDF encryption/password protection removed | No
javascript | JavaScript code removed (PDF) | No
csv_formulas | Dangerous formula cells neutralised (CSV/TSV) | No
scripts | Script tags, iframes, objects, forms removed (HTML/SVG/EPUB/subtitles) | No
event_handlers | Inline event handlers removed (onclick, onload, etc.) | No
external_refs | External URL references removed | No
exif | EXIF/metadata stripped from images, audio, and video | Yes
metadata | Document metadata stripped (author, creator, software, etc.) | Yes

Metadata Details

When type is "exif" or "metadata", the change object includes a details array — the complete list of metadata fields that were detected and stripped. Each entry has a key (field name) and value (original value, truncated to 120 characters).

This is the same data the web UI offers as CSV/JSON export. As an API consumer, you receive it directly in the JSON response and can format it however you need.

Extracting Metadata — Examples

Node.js

const res = await fetch(`https://cleanthis.io/api/v1/job/${jobId}`, {
  headers: { 'Authorization': `Bearer ${API_KEY}` }
});
const data = await res.json();

if (data.status === 'completed' && data.report) {
  // Find metadata changes (exif or metadata type)
  const metaChanges = data.report.changes.filter(
    c => c.details && c.details.length > 0
  );

  for (const change of metaChanges) {
    console.log(change.label);
    for (const field of change.details) {
      console.log(`  ${field.key}: ${field.value}`);
    }
  }
}
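
The same details arrays can be flattened into CSV if that is the format a downstream tool expects. This is a sketch: metadataToCsv and the type,key,value column layout are illustrative choices and are not claimed to match the web UI's export.

```javascript
// Quote a CSV cell only when it contains a quote, comma, or line break.
function csvEscape(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Flatten every details entry in report.changes into one CSV string.
function metadataToCsv(changes) {
  const rows = [['type', 'key', 'value']];
  for (const change of changes) {
    for (const field of change.details ?? []) {
      rows.push([change.type, field.key, field.value]);
    }
  }
  return rows.map((row) => row.map(csvEscape).join(',')).join('\n');
}
```

Changes without a details array contribute no rows, so a report with only macros or conversion entries yields just the header line.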

Clean Files

If the file was already clean, the report will have an empty changes array:

{
  "report": {
    "changes": [],
    "summary": "No significant threats or changes detected. The file was already clean."
  }
}

Rate Limits

Endpoint Group | Limit | Keyed By
API v1 sanitize / cancel | 20/min | Account ID
API v1 job poll | 120/min | Account ID
API v1 download | 30/min | Account ID
Account create | 5/hr | IP
Account login | 10/15min | IP
Account management | 30/min | Session
Web UI upload | 10/min | IP
Web UI poll | 120/min | IP
Web UI download | 30/min | IP

API v1 endpoints are keyed by account ID (not IP), so multiple servers behind NAT share a generous per-account limit. When rate-limited, responses include a Retry-After header with the number of seconds to wait.
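
A client can honour Retry-After with a small wrapper around fetch. The sketch below assumes Node.js 18+ for the global fetch; fetchWithRetryAfter, the limit of three retries, and the 1-second fallback when the header is missing or unparseable are illustrative choices.

```javascript
// Sketch only: assumes Node.js 18+ for the global fetch.
// Retries a request on 429, waiting the number of seconds given in Retry-After.
async function fetchWithRetryAfter(url, init = {}, options = {}) {
  const { maxRetries = 3, fetchFn = fetch } = options;
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn(url, init);
    if (res.status !== 429 || attempt >= maxRetries) return res;

    const header = res.headers.get('Retry-After');
    const seconds = header === null ? NaN : Number(header);
    // Hypothetical fallback: wait 1 second if the header is absent or not a number.
    const waitMs = Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```

After the retry budget is spent the last 429 response is returned unchanged, so the caller still decides how to surface it.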

Error Responses

All errors return JSON with a single error field:

{ "error": "File type not allowed: .exe" }

Common Status Codes

Code | Meaning
400 | Bad request: missing or invalid parameters
401 | Unauthorized: missing or invalid API key / session
403 | Forbidden: tampered signed URL or invalid webhook signature
404 | Not found: job ID doesn't exist or doesn't belong to your account
410 | Gone: download URL has expired
413 | Payload too large: file exceeds the size limit
415 | Unsupported media type: file type not allowed
429 | Too many requests: rate limit exceeded (check Retry-After header)
500 | Internal server error: something went wrong on our end
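
Since every error carries a single error field, a client can turn failed responses into one exception type. This is a sketch, not an official client: ApiError and parseResponse are hypothetical names, and the fallback message covers the case where the body is not JSON.

```javascript
// Error type carrying the HTTP status alongside the API's error message.
class ApiError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'ApiError';
    this.status = status;
  }
}

// Resolve to the parsed JSON body on success; throw ApiError otherwise.
async function parseResponse(res) {
  if (res.ok) return res.json();
  let message = `HTTP ${res.status}`;
  try {
    const body = await res.json();
    if (body && typeof body.error === 'string') message = body.error;
  } catch {
    // Non-JSON error body: keep the generic message.
  }
  throw new ApiError(res.status, message);
}
```

Callers can then branch on err.status, for example treating 410 as "poll for a fresh URL" and 429 as "wait and retry".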