API Documentation
Programmatic file sanitization — uploads, URL fetching, webhooks, signed downloads, and anonymous accounts.
Quick Start
Get up and running in four steps. All examples use curl.
1 Create an account
cleanthis.io uses anonymous accounts — no email or password required. A random account number is your only credential.
curl -s -X POST https://cleanthis.io/api/v1/account | jq
Response:
{
"accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2",
"prefix": "CT-7K9M",
"warning": "Save this account number now. It cannot be recovered — we do not store it."
}
2 Log in and create an API key
# Log in (sets session cookie)
curl -s -X POST https://cleanthis.io/api/v1/account/login \
-H "Content-Type: application/json" \
-d '{"accountNumber": "CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2"}' \
-c cookies.txt | jq
# Create an API key
curl -s -X POST https://cleanthis.io/api/v1/account/keys \
-H "Content-Type: application/json" \
-d '{"label": "production", "mode": "live"}' \
-b cookies.txt | jq
Response:
{
"rawKey": "ct_live_Ab3xK9mP...",
"keyId": "key_01",
"prefix": "ct_live",
"last4": "9mPq",
"label": "production",
"warning": "Save this API key now. It will not be shown again."
}
3 Sanitize a file (upload → poll → download)
# Upload (default: standard level)
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
-H "Authorization: Bearer ct_live_Ab3xK9mP..." \
-F "file=@document.pdf" | jq
# Upload with a specific sanitization level
curl -s -X POST https://cleanthis.io/api/v1/sanitize \
-H "Authorization: Bearer ct_live_Ab3xK9mP..." \
-F "file=@document.pdf" \
-F "level=aggressive" | jq
# Response:
# {
# "jobId": "a1b2c3d4-...",
# "status": "queued",
# "level": "aggressive",
# "statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
# }
# Poll until complete
curl -s https://cleanthis.io/api/v1/job/JOB_ID \
-H "Authorization: Bearer ct_live_Ab3xK9mP..." | jq
# Response when complete includes a signed downloadUrl:
# {
# "status": "completed",
# "level": "aggressive",
# "downloadUrl": "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..."
# }
# Download using the signed URL from the response
curl -o clean.pdf "DOWNLOAD_URL_FROM_RESPONSE" \
-H "Authorization: Bearer ct_live_Ab3xK9mP..."
4 Sanitize from URL
# Default level (standard)
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
-H "Authorization: Bearer ct_live_Ab3xK9mP..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/doc.pdf"}' | jq
# With explicit level
curl -s -X POST https://cleanthis.io/api/v1/sanitize-url \
-H "Authorization: Bearer ct_live_Ab3xK9mP..." \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/doc.pdf", "level": "light"}' | jq
# Same poll → download flow as above
Authentication
Anonymous Accounts
cleanthis.io uses anonymous accounts. No email, no password — just a random account number (CT-XXXX-XXXX-XXXX-XXXX-XXXX-XXXX) generated at signup. This number is shown once and cannot be recovered — store it securely.
API Keys
API keys are created via the dashboard or the POST /api/v1/account/keys endpoint. Two modes are available:
- ct_live_* — Production keys. Requests count against your quota.
- ct_test_* — Test keys. Requests are fully validated, but no file is processed and no real scan runs (you get a deterministic mock response). They don't count against your quota, which makes them ideal for wiring up an integration.
Your account number (the CT-XXXX-XXXX-… value) is how you log in; it is not a Bearer token. API keys always start with ct_live_ or ct_test_ followed by 36 characters. Sending the account number to an API endpoint returns 401 {"error":"Invalid API key format."}.
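A client can catch the account-number mix-up before a request leaves the machine. The sketch below relies only on what this section states (the ct_live_ / ct_test_ prefix followed by 36 characters). The function name is hypothetical, and because the allowed character set is not documented here, only the length is checked.

```javascript
// Hypothetical helper: classify a credential string before sending it.
// Based only on the documented shape: "ct_live_" or "ct_test_" + 36 characters.
function classifyCredential(value) {
  if (typeof value !== 'string') return 'invalid';
  // Account numbers are login credentials, not Bearer tokens.
  if (value.startsWith('CT-')) return 'account-number';
  const match = /^ct_(live|test)_(.+)$/.exec(value);
  if (!match || match[2].length !== 36) return 'invalid';
  return match[1] === 'live' ? 'live-key' : 'test-key';
}

console.log(classifyCredential('CT-7K9M-X2P4-R8N1-Q5W3-J6L0-V4T2')); // account-number
console.log(classifyCredential('ct_test_' + 'a'.repeat(36)));         // test-key
```

Rejecting an 'account-number' result locally avoids a round trip that would end in the 401 described above.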
Getting an API key
- Create an account (if you don't have one) — you'll be given a CT-… account number. Save it; it's shown only once and can't be recovered.
- Log in at My Dashboard with that account number.
- Create a key from the dashboard's API-keys section. Pick test while you're integrating, live for real scans. The full key (ct_live_… / ct_test_…) is shown once — copy it immediately.
- Use it as a Bearer token (see below).
The dashboard is the simplest path because login + key creation are protected against automated abuse. Prefer the command line? The same steps map to POST /api/v1/account → POST /api/v1/account/login → POST /api/v1/account/keys (see Quick Start).
Bearer Token Auth
All sanitization endpoints use Bearer token authentication:
Authorization: Bearer ct_live_YOUR_KEY
Session Auth
Dashboard and account management endpoints use cookie-based session authentication. Sessions are HttpOnly, Secure, and SameSite=Strict. Log in via POST /api/v1/account/login to receive the session cookie.
Sanitization Levels
Every sanitization request accepts an optional level parameter that controls how aggressively the file is processed. If omitted, standard is used.
| Level | Behaviour | Output Format |
|---|---|---|
| light | Virus scan + privacy metadata removal only. File content stays intact — no re-encoding, no macro removal. For images, preserves ICC color profile and orientation for correct display. Accepts virtually any file type that carries readable metadata. | Original format preserved |
| standard | Default. Full Content Disarm & Reconstruction — re-encode, strip macros, flatten scripts, remove all metadata and hidden content. | Original format preserved |
| aggressive | Everything in Standard, plus conversion to the safest possible format. Destroys editability for maximum protection. | Converted: Office→PDF, Images→PNG, Audio→WAV, Video→MP4, HTML→TXT |
How to specify
File upload (multipart/form-data) — send level as a form field:
curl -X POST https://cleanthis.io/api/v1/sanitize \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-F "file=@photo.jpg" \
-F "level=light"
URL fetch (application/json) — include level in the JSON body:
curl -X POST https://cleanthis.io/api/v1/sanitize-url \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/doc.pdf", "level": "aggressive"}'
Level in responses
The chosen level is echoed back in both the initial 202 Accepted response and every subsequent poll response, so you can always confirm which level was applied:
{
"jobId": "a1b2c3d4-...",
"status": "queued",
"level": "aggressive",
"statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-..."
}
What each level removes
- Light — GPS coordinates, author/creator names, timestamps, camera info, software version, comments. File content (macros, scripts, embedded objects) is left intact. Light mode accepts virtually any file type (RAW photos, PSD, archives, fonts, 3D models, and hundreds more). Only executable/script formats (.exe, .dll, .bat, etc.) are blocked.
- Standard — Everything in Light, plus: VBA macros, ActiveX controls, JavaScript (PDF), embedded attachments, OLE objects, encryption, formula injection (CSV), script tags (HTML/SVG), event handlers, external references.
- Aggressive — Everything in Standard, plus: format conversion to eliminate any format-specific attack surface. The output file type may differ from the input.
Archives (ZIP, 7z, TAR family, single-file compression)
Uploading an archive triggers recursive sanitization: each member is sanitized for its own format (PDF, Office, images, etc.) using the chosen level, then everything is re-packaged.
Supported containers:
- .zip — output is a fresh .zip
- .7z — output is a fresh .7z (LZMA2)
- .tar, .tar.gz / .tgz, .tar.bz2 / .tbz / .tbz2, .tar.xz / .txz — output is always .tar.gz regardless of input compression
- .gz, .bz2, .xz (single-file, not wrapping a tar) — decompress → sanitize the inner file → recompress back to the same format. The inner filename must carry an extension hint (e.g. report.csv.gz → inner .csv) so the decompressed payload can be routed to the right pipeline. Standard/Aggressive reject single-file archives without an inner extension; Light accepts them.
Tar and 7z archives must contain only regular files and directories. Symbolic links, device files, named pipes, and sockets are rejected at the bomb-check stage — they have no place in a sanitization payload and represent attack surface on extraction.
Limits
- Maximum uncompressed size: 100 MB across all members
- Maximum member count: 100 files (single-file compression: one member by definition)
- Maximum compression ratio: 100:1 (archive-bomb guard)
- Maximum nesting depth: 1 level (an archive inside an archive is allowed; deeper is rejected — applies uniformly across all archive types, including mixed nesting like a .tar.gz inside a .zip or a .zip inside a .7z)
Archives exceeding any of these are rejected at the bomb-check stage with a 400 response and an actionable error message (e.g. "Archive expands to 487MB, above our 100MB limit").
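The limits above can be mirrored in a client-side pre-check so that oversized archives are caught before upload. This is a minimal sketch, not the server's actual bomb check: the function name and the member shape are hypothetical, "100 MB" is assumed to mean 100 × 1024 × 1024 bytes, and whether nested members count toward the 100-file limit is not stated here.

```javascript
// Hypothetical client-side pre-check mirroring the documented archive limits.
// Assumption: "100 MB" is treated as 100 * 1024 * 1024 bytes; the server may count differently.
const LIMITS = {
  maxUncompressedBytes: 100 * 1024 * 1024,
  maxMembers: 100,
  maxRatio: 100,
  maxNestingDepth: 1,
};

// members: [{ path, uncompressedBytes, depth }]
// depth 0 = directly in the uploaded archive, 1 = inside one nested archive, and so on.
function checkArchiveLimits(compressedBytes, members) {
  const problems = [];
  const total = members.reduce((sum, m) => sum + m.uncompressedBytes, 0);
  if (total > LIMITS.maxUncompressedBytes) problems.push('uncompressed size');
  if (members.length > LIMITS.maxMembers) problems.push('member count');
  if (compressedBytes > 0 && total / compressedBytes > LIMITS.maxRatio) {
    problems.push('compression ratio');
  }
  if (members.some(m => m.depth > LIMITS.maxNestingDepth)) problems.push('nesting depth');
  return problems; // empty array means no documented limit is exceeded
}

console.log(checkArchiveLimits(1000, [{ path: 'a.pdf', uncompressedBytes: 5000, depth: 0 }])); // []
```

A non-empty result only predicts a likely 400; the server remains the authority on what it accepts.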
Content that can't be fully sanitized — archiveUnsupported
Standard and Aggressive levels cover 100+ file extensions with format-specific sanitization. When an archive member has an extension outside that set (rare formats, RAW photos, 3D models, …), the archiveUnsupported parameter controls what happens. Default is drop. This applies to both multi-member archives and single-file compression:
| Value | Multi-member archive (.zip / .7z / tar family) | Single-file compression (.gz / .bz2 / .xz) |
|---|---|---|
| drop (default) | Members that can't be fully sanitized are removed from the output. The output contains only files that received full level-grade sanitization. Removed paths are listed in the report under archive_dropped. | With only one inner file, "drop" means rejecting the whole upload — the response is 400 with an actionable message that suggests Light mode or light_fallback. |
| light_fallback | Members that can't be fully sanitized fall back to Light treatment instead (virus scan + metadata strip). The output may contain files where macros / scripts / embedded objects were not removed. Listed in the report under archive_fallback. | The inner file falls back to Light treatment, then is recompressed in the same wrapper. The report's innerTreatment field is set to "light_fallback" so consumers can detect the downgrade without parsing labels. |
light_fallback is forced to drop when level=aggressive: Aggressive mode's promise is that no original bytes survive, which Light treatment would contradict.
Executable/script extensions (.exe, .dll, .bat, etc.) inside Standard/Aggressive archive jobs are always rejected regardless of archiveUnsupported — both as archive members and as the inner file of a single-compressed upload. Every upload — including light_fallback payloads — is virus-scanned at the outer archive level before processing, with nested members (.zip / .7z / .gz / .tar etc.) covered recursively.
Example
curl -X POST https://cleanthis.io/api/v1/sanitize \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-F "file=@bundle.zip" \
-F "level=standard" \
-F "archiveUnsupported=light_fallback"
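The interaction between level and archiveUnsupported can also be made explicit in client code. The sketch below builds a JSON body for POST /api/v1/sanitize-url and applies the documented rule that light_fallback is forced to drop at the aggressive level. The function name is hypothetical, and the server applies the same rule on its own; the point is only to keep client expectations in line with it.

```javascript
// Hypothetical helper: build the JSON body for POST /api/v1/sanitize-url.
const LEVELS = ['light', 'standard', 'aggressive'];
const POLICIES = ['drop', 'light_fallback'];

function buildSanitizeUrlBody({ url, level = 'standard', archiveUnsupported = 'drop', webhook }) {
  if (!LEVELS.includes(level)) throw new Error(`Unknown level: ${level}`);
  if (!POLICIES.includes(archiveUnsupported)) {
    throw new Error(`Unknown archiveUnsupported: ${archiveUnsupported}`);
  }
  // Documented rule: light_fallback is forced to drop when level=aggressive.
  // Applying it here avoids waiting for fallback members that will never appear.
  const effectivePolicy = level === 'aggressive' ? 'drop' : archiveUnsupported;
  const body = { url, level, archiveUnsupported: effectivePolicy };
  if (webhook) body.webhook = webhook;
  return body;
}

console.log(buildSanitizeUrlBody({
  url: 'https://example.com/bundle.zip',
  level: 'aggressive',
  archiveUnsupported: 'light_fallback',
}).archiveUnsupported); // drop
```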
Report
Archive jobs return a roll-up report with per-member tallies plus full per-file detail. The changes array includes summary entries with these type values:
- archive — summary count (e.g. "Archive contained 12 files: 10 sanitized at Standard, 2 removed")
- archive_cleaned — list of members that were processed at the requested level
- archive_fallback — list of members that received Light treatment instead (only present when archiveUnsupported=light_fallback)
- archive_dropped — list of members removed (with reason: unsupported, blocked executable, bomb check)
- archive_errored — list of members the sanitizer failed on (with reason)
The job's report.memberReports array contains one entry per cleaned/fallback member with that member's own full sanitization report (the same shape as a single-file job), so API consumers can list exactly what was stripped from each file. Each entry has path, cleanedPath, treatment (the level it was actually processed at: "light" / "standard" / "aggressive" / "light_fallback"), and report.
The job's report.archiveStats object contains numeric counts: totalMembers, cleanedCount, fallbackCount, droppedCount, erroredCount, nestedArchiveCount, and policy.
Nested archives are flattened
When an archive contains a nested archive (zip or tar), that wrapper is not represented as a single entry in memberReports. Instead, its contents appear in the parent's memberReports / droppedMembers / erroredMembers arrays with paths prefixed by the wrapper (e.g. inner.tar.gz/photo.png). The result is one flat list of actual files, whether they sat at the top level or inside a nested wrapper. The archiveStats.nestedArchiveCount field counts how many wrappers were flattened. Mixed nesting is fine (a .tar.gz inside a .zip or vice versa).
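Because memberReports is a flat list, grouping it by treatment is a short loop. The following sketch uses only the fields named in this section (path, treatment, archiveStats.nestedArchiveCount); the function name and the sample data are hypothetical.

```javascript
// Hypothetical helper: tally an archive job's memberReports by treatment.
function summarizeMembers(report) {
  const byTreatment = {};
  for (const member of report.memberReports || []) {
    if (!byTreatment[member.treatment]) byTreatment[member.treatment] = [];
    byTreatment[member.treatment].push(member.path);
  }
  return {
    byTreatment,
    // True when at least one member received Light treatment instead of the requested level.
    downgraded: (byTreatment.light_fallback || []).length > 0,
    nestedArchives: report.archiveStats ? report.archiveStats.nestedArchiveCount : 0,
  };
}

// Illustrative data only; real reports carry a full per-member report object as well.
const sample = {
  memberReports: [
    { path: 'contract.docx', treatment: 'standard' },
    { path: 'inner.tar.gz/photo.png', treatment: 'standard' },
    { path: 'model.obj', treatment: 'light_fallback' },
  ],
  archiveStats: { nestedArchiveCount: 1 },
};
console.log(summarizeMembers(sample).downgraded); // true
```

Checking the downgraded flag is one way to decide whether an output archive needs extra handling before it is passed on.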
Endpoints — Sanitization
| Method | Endpoint | Auth | Rate Limit | Description |
|---|---|---|---|---|
| POST | /api/v1/sanitize | Bearer | 20/min | Upload file for sanitization. Optional level and webhook fields. |
| POST | /api/v1/sanitize-url | Bearer | 20/min | Fetch from URL and sanitize. Optional level and webhook fields in JSON body. |
| GET | /api/v1/job/:id | Bearer | 120/min | Poll job status. Returns downloadUrl when complete. |
| GET | /api/v1/download/:id | Bearer | 30/min | Download sanitized file. Requires valid signed URL params (expires, sig). |
| POST | /api/v1/cancel/:id | Bearer | 20/min | Cancel a queued or running job. |
POST /api/v1/sanitize
Upload a file for sanitization. Optionally include a webhook URL, a level, and (for archive / single-file-compressed uploads) an archiveUnsupported policy. See Sanitization Levels and Archives for details.
POST /api/v1/sanitize
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: multipart/form-data
file: (binary)
level: standard (optional: light | standard | aggressive)
archiveUnsupported: drop (optional, archive + single-compressed: drop | light_fallback)
webhook: https://your-server.com/callback (optional)
Response 202 Accepted:
{
"jobId": "a1b2c3d4-...",
"status": "queued",
"level": "standard",
"statusUrl": "https://cleanthis.io/api/v1/job/a1b2c3d4-...",
"message": "File accepted for sanitization. Poll the statusUrl for progress.",
"webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}
POST /api/v1/sanitize-url
Provide a URL to fetch and sanitize. The server downloads the file, then processes it. Accepts the same level and archiveUnsupported parameters as the file upload endpoint.
POST /api/v1/sanitize-url
Authorization: Bearer ct_live_YOUR_KEY
Content-Type: application/json
{
"url": "https://example.com/bundle.zip",
"level": "standard",
"archiveUnsupported": "light_fallback",
"webhook": "https://your-server.com/callback"
}
Response 202 Accepted:
{
"jobId": "e5f6g7h8-...",
"status": "queued",
"level": "standard",
"statusUrl": "https://cleanthis.io/api/v1/job/e5f6g7h8-...",
"message": "URL accepted for sanitization. Poll the statusUrl for progress.",
"webhook": { "url": "https://your-server.com/callback", "status": "registered" }
}
GET /api/v1/job/:id
Poll job status. The response changes as the job progresses. The level field is always included so you can confirm which sanitization mode was applied.
Queued / In progress:
{
"jobId": "a1b2c3d4-...",
"status": "processing",
"level": "standard"
}
Completed:
{
"jobId": "a1b2c3d4-...",
"status": "completed",
"level": "standard",
"downloadName": "doc_cleaned.pdf",
"downloadUrl": "https://cleanthis.io/api/v1/download/a1b2c3d4-...?expires=...&sig=...",
"report": {
"changes": [
{ "type": "level", "label": "Sanitization level: Standard — full CDR pipeline" },
{ "type": "macros", "label": "Removed 1 macro component: VBA Project" },
{ "type": "metadata", "label": "Stripped 3 metadata fields: Author: \"John\"; Creator: \"Word\"..." }
],
"summary": "3 changes made during sanitization."
}
}
See Sanitization Report & Metadata for full details on the report object.
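The poll step can be wrapped in a small loop. This is a sketch under stated assumptions, not an official client: the function name, the 2-second interval, and the attempt cap are hypothetical, and the status strings "failed" and "cancelled" are treated as terminal as a guess (only "completed" is shown as a terminal poll status on this page). The fetch implementation is injectable so the loop can be exercised without network access; the default is the global fetch available in Node.js 18 and later.

```javascript
// Hypothetical polling helper for GET /api/v1/job/:id.
// Assumption: "failed" and "cancelled" are terminal statuses; only "completed" is documented here.
async function waitForJob(jobId, apiKey, { intervalMs = 2000, maxAttempts = 60, fetchImpl = fetch } = {}) {
  const url = `https://cleanthis.io/api/v1/job/${encodeURIComponent(jobId)}`;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await fetchImpl(url, { headers: { Authorization: `Bearer ${apiKey}` } });
    if (!res.ok) throw new Error(`Poll failed with HTTP ${res.status}`);
    const job = await res.json();
    if (job.status === 'completed') return job; // includes downloadUrl and report
    if (job.status === 'failed' || job.status === 'cancelled') {
      throw new Error(`Job ended with status ${job.status}`);
    }
    // Still queued or processing: wait before the next poll.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error('Gave up waiting for job');
}
```

A 2-second interval works out to about 30 polls per minute for a single job, which leaves headroom under the 120/min poll limit when several jobs are tracked at once.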
GET /api/v1/download/:id
Download the sanitized file. Use the downloadUrl from the poll or webhook response — it includes the required expires and sig query parameters.
curl -o clean.pdf "https://cleanthis.io/api/v1/download/JOB_ID?expires=...&sig=..." \
-H "Authorization: Bearer ct_live_YOUR_KEY"
POST /api/v1/cancel/:id
Cancel a queued or in-progress job.
curl -X POST https://cleanthis.io/api/v1/cancel/a1b2c3d4-... \
-H "Authorization: Bearer ct_live_YOUR_KEY"
Response 200 OK:
{ "status": "cancelled" }
Endpoints — Webpage Scanner (BETA)
Scan a web page for threats, trackers, and brand-impersonation — the same engine behind the Webpage Scanner page, over the API. Three tiers:
- Quick (light) — reputation & blocklists only; never loads the page.
- Standard (standard) — reputation + a single page fetch with static analysis (trackers, vulnerable libraries, AI-prompt-injection, TLS, redirects, brand-clone).
- Deep (aggressive) — loads the page in a real sandboxed browser (runtime connections, behaviour, screenshot, cloaking). Asynchronous — returns a jobId you poll.
| Method | Endpoint | Auth | Rate Limit | Description |
|---|---|---|---|---|
| POST | /api/v1/scan-url | Bearer | 10/min · 50/day | Scan a URL. Body { url, tier, bypassCache? }. light/standard return the verdict directly; aggressive returns a jobId. |
| GET | /api/v1/scan-job/:id | Bearer | 120/min | Poll a Deep (aggressive) scan. Returns the result when complete. |
| POST | /api/v1/scan-job/:id/cancel | Bearer | 120/min | Cancel an in-flight Deep scan. |
POST /api/v1/scan-url
Quick / Standard (synchronous):
curl -s -X POST https://cleanthis.io/api/v1/scan-url \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com", "tier": "standard" }' | jq
Response 200 OK:
{
"ok": true,
"url": "https://example.com",
"tier": "standard",
"verdict": "clean", // clean | suspicious | malicious | unreachable | unknown
"findings": [ /* per-source/analyzer results */ ],
"scores": { "security": { "value": 92, "band": "green" }, "privacy": {…}, "legitimacy": {…} },
"cached": false,
"computedAt": "2026-05-31T18:00:00.000Z"
}
Deep scan (asynchronous — submit, then poll):
curl -s -X POST https://cleanthis.io/api/v1/scan-url \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{ "url": "https://example.com", "tier": "aggressive" }' | jq
# → { "ok": true, "jobId": "a1b2c3d4-...", "state": "processing" }
GET /api/v1/scan-job/:id
Poll until state is completed or failed. On completion, if a screenshot was captured, result.screenshot.url is a short-lived (15 min) signed link.
curl -s https://cleanthis.io/api/v1/scan-job/a1b2c3d4-... \
-H "Authorization: Bearer ct_live_YOUR_KEY" | jq
# { "ok": true, "state": "processing" }
# … then …
# { "ok": true, "state": "completed", "result": { "verdict": "clean", "findings": […], "scores": {…}, "screenshot": { "url": "https://cleanthis.io/api/scan-shot/…?expires=…&sig=…" } } }
Note: ct_test_* keys return a deterministic mock verdict (no real scan runs) and don't count against your quota.
Endpoints — Account Management
| Method | Endpoint | Auth | Rate Limit | Description |
|---|---|---|---|---|
| POST | /api/v1/account | — | 5/hr | Create anonymous account |
| POST | /api/v1/account/login | — | 10/15min | Log in (sets session cookie) |
| POST | /api/v1/account/logout | Session | — | Clear session |
| GET | /api/v1/account | Session | 30/min | Get account info + usage |
| DELETE | /api/v1/account | Session | 30/min | Delete account and all data |
| GET | /api/v1/account/keys | Session | 30/min | List API keys |
| POST | /api/v1/account/keys | Session | 30/min | Create API key |
| DELETE | /api/v1/account/keys/:id | Session | 30/min | Revoke API key |
| GET | /api/v1/account/usage | Session | 30/min | Usage stats |
| GET | /api/v1/account/webhook-secret | Session | 30/min | Get webhook signing secret |
| POST | /api/v1/account/webhook-secret/rotate | Session | 30/min | Rotate webhook secret |
Webhooks
How to use
Include a webhook URL in your sanitize request. When the job completes (or fails), cleanthis.io sends a POST request to that URL with the result.
# Upload with webhook
curl -X POST https://cleanthis.io/api/v1/sanitize \
-H "Authorization: Bearer ct_live_YOUR_KEY" \
-F file=@document.pdf \
-F webhook=https://your-server.com/callback
Payload
The webhook POST body is JSON:
{
"event": "job.completed",
"jobId": "a1b2c3d4-...",
"status": "completed",
"downloadUrl": "https://cleanthis.io/api/v1/download/...?expires=...&sig=...",
"downloadName": "doc_cleaned.pdf",
"report": {
"changes": [
{ "type": "level", "label": "Sanitization level: Aggressive — full CDR + format conversion" },
{ "type": "macros", "label": "Removed 1 macro component: VBA Project" },
{ "type": "conversion", "label": "Converted to PDF for maximum safety" }
],
"summary": "3 changes made during sanitization (Aggressive mode)."
},
"timestamp": "2026-04-23T01:23:00.000Z"
}
The report field contains the full sanitization report including any detected metadata and the sanitization level applied. See Sanitization Report & Metadata for the complete structure.
Signature Verification
Every webhook includes two headers for verification:
- X-CleanThis-Signature — HMAC-SHA256 of the raw request body, prefixed with sha256=
- X-CleanThis-Timestamp — ISO 8601 timestamp of when the webhook was sent
Always verify the signature before trusting the payload. Use a timing-safe comparison to prevent timing attacks.
Node.js
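const crypto = require('crypto');
const express = require('express');

const app = express();
// Signing secret from the dashboard or GET /api/v1/account/webhook-secret.
// Reading it from an environment variable is an assumption of this example.
const WEBHOOK_SECRET = process.env.WEBHOOK_SECRET;

function verifyWebhook(body, signature, secret) {
  // A missing header or an unparsed body can never match.
  if (typeof body !== 'string' || typeof signature !== 'string') return false;
  const expected = Buffer.from('sha256=' + crypto
    .createHmac('sha256', secret)
    .update(body, 'utf8')
    .digest('hex'));
  const received = Buffer.from(signature);
  // timingSafeEqual throws when lengths differ, so compare lengths first.
  return received.length === expected.length &&
    crypto.timingSafeEqual(received, expected);
}

// Express handler. express.text keeps the raw body as a string so the HMAC matches.
app.post('/callback', express.text({ type: '*/*' }), (req, res) => {
  const sig = req.headers['x-cleanthis-signature'];
  if (!verifyWebhook(req.body, sig, WEBHOOK_SECRET)) {
    return res.status(403).send('Invalid signature');
  }
  const event = JSON.parse(req.body);
  console.log(event.status, event.downloadUrl);
  res.sendStatus(200);
});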
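A freshness check can complement signature verification by rejecting old payloads that are replayed later. The sketch below is an assumption-laden example: the function name and the 5-minute tolerance are hypothetical, and this page does not specify a tolerance. Since the signature is described as covering the raw request body, the timestamp field inside the verified JSON payload is the value protected by the HMAC; the X-CleanThis-Timestamp header can be checked the same way, but nothing on this page says it is signed.

```javascript
// Hypothetical replay guard: accept only timestamps close to the current time.
// Assumption: a 5-minute tolerance; the documentation does not specify one.
function isFreshTimestamp(isoValue, nowMs = Date.now(), toleranceMs = 5 * 60 * 1000) {
  const sentMs = Date.parse(isoValue);
  if (Number.isNaN(sentMs)) return false; // missing or malformed timestamp
  return Math.abs(nowMs - sentMs) <= toleranceMs;
}

// Fixed clock for the demonstration: two minutes after the sample payload's timestamp.
const demoNow = Date.parse('2026-04-23T01:25:00.000Z');
console.log(isFreshTimestamp('2026-04-23T01:23:00.000Z', demoNow)); // true
```

In a handler, this would run after the signature check, for example on event.timestamp from the parsed body.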
Retry Behavior
If your endpoint returns a non-2xx status code or doesn't respond within 10 seconds, cleanthis.io retries up to 3 times with exponential backoff, for at most 4 delivery attempts in total:
- Attempt 1 — immediate (the original delivery)
- Attempt 2 — after 2 seconds
- Attempt 3 — after 8 seconds
- Attempt 4 (final) — after 32 seconds
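One consequence of the retry rule is that a handler which processes the payload but answers after the 10-second timeout will receive the same payload again. A receiver can guard against that by remembering what it has already handled. The sketch below is a minimal in-memory version: the function name is hypothetical, the event + jobId pair is assumed to identify a delivery (this page defines no delivery ID), and the 10-minute retention is an arbitrary choice that comfortably exceeds the retry schedule above.

```javascript
// Hypothetical idempotency guard for webhook receivers.
// Assumption: event + jobId identifies a delivery; the docs define no delivery ID.
const seen = new Map(); // key -> first-seen time in ms

function isDuplicateDelivery(event, nowMs = Date.now(), ttlMs = 10 * 60 * 1000) {
  // Forget entries older than the retention window so the map stays small.
  for (const [key, firstSeenMs] of seen) {
    if (nowMs - firstSeenMs > ttlMs) seen.delete(key);
  }
  const key = `${event.event}:${event.jobId}`;
  if (seen.has(key)) return true;
  seen.set(key, nowMs);
  return false;
}
```

A handler would call isDuplicateDelivery(event) after verifying the signature and return 200 immediately for duplicates. With more than one server process, a shared store would be needed in place of the in-memory Map.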
Requirements
- HTTPS only — webhook URLs must use https://
- Public IP — the URL must resolve to a public IP address (SSRF-protected; private/loopback ranges are rejected)
Managing Your Secret
Retrieve your webhook signing secret from the dashboard or programmatically:
GET /api/v1/account/webhook-secret
Cookie: session=...
To rotate your secret (the old secret is immediately invalidated):
POST /api/v1/account/webhook-secret/rotate
Cookie: session=...
Signed Download URLs
All download URLs use HMAC-SHA256 signed URLs with an expiry timestamp. You never need to construct these yourself — the downloadUrl returned by the poll endpoint and webhook payload is ready to use.
- API downloads — signed URLs expire in 15 minutes
- Web UI downloads — signed URLs expire in 5 minutes
Expired & Tampered URLs
- Expired URLs return 410 Gone with a descriptive message
- Tampered URLs (modified sig or expires) return 403 Forbidden
If your download URL has expired, simply poll GET /api/v1/job/:id again to receive a fresh signed URL.
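The refresh-on-expiry advice can be folded into the download call itself. This is a sketch with a hypothetical function name and a single retry, chosen for simplicity; the fetch implementation is injectable so the logic can be tested offline, and the default is the global fetch in Node.js 18 and later.

```javascript
// Hypothetical helper: download once, and on 410 Gone fetch a fresh signed URL and retry once.
async function downloadWithRefresh(jobId, downloadUrl, apiKey, fetchImpl = fetch) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  let res = await fetchImpl(downloadUrl, { headers });
  if (res.status === 410) {
    // Expired: poll the job again to obtain a fresh downloadUrl.
    const pollUrl = `https://cleanthis.io/api/v1/job/${encodeURIComponent(jobId)}`;
    const poll = await fetchImpl(pollUrl, { headers });
    if (!poll.ok) throw new Error(`Poll failed with HTTP ${poll.status}`);
    const job = await poll.json();
    res = await fetchImpl(job.downloadUrl, { headers });
  }
  // A 403 (tampered URL) is deliberately not retried.
  if (!res.ok) throw new Error(`Download failed with HTTP ${res.status}`);
  return Buffer.from(await res.arrayBuffer());
}
```

The returned Buffer can be written to disk with fs.promises.writeFile. For large files, streaming the response body would be the better choice.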
Sanitization Report & Metadata
Every completed job includes a report object in both the poll response (GET /api/v1/job/:id) and the webhook payload. This report describes exactly what was found and removed during sanitization — including embedded metadata.
Report Structure
{
"report": {
"changes": [
{
"type": "macros",
"label": "Removed 1 macro component: VBA Project"
},
{
"type": "exif",
"label": "Stripped 14 EXIF/metadata fields including: GPS Location data; Model: \"iPhone 15 Pro\"",
"details": [
{ "key": "GPSLatitude", "value": "40.7128 N" },
{ "key": "GPSLongitude", "value": "74.0060 W" },
{ "key": "Make", "value": "Apple" },
{ "key": "Model", "value": "iPhone 15 Pro" },
{ "key": "DateTimeOriginal", "value": "2025:03:15 14:30:00" },
{ "key": "Software", "value": "17.4" }
]
}
],
"summary": "2 changes made during sanitization."
}
}
Change Types
Each entry in the changes array has a type, a human-readable label, and optionally a details array for metadata fields.
| Type | Description | Has Details |
|---|---|---|
| level | Indicates which sanitization level was applied (always present) | No |
| conversion | File was converted to a safer format (aggressive mode only) | No |
| size | File size changed significantly during sanitization | No |
| macros | VBA macros, ActiveX controls, or OLE objects removed | No |
| attachments | Embedded file attachments removed (PDF) | No |
| encryption | PDF encryption/password protection removed | No |
| javascript | JavaScript code removed (PDF) | No |
| csv_formulas | Dangerous formula cells neutralised (CSV/TSV) | No |
| scripts | Script tags, iframes, objects, forms removed (HTML/SVG/EPUB/subtitles) | No |
| event_handlers | Inline event handlers removed (onclick, onload, etc.) | No |
| external_refs | External URL references removed | No |
| exif | EXIF/metadata stripped from images, audio, and video | Yes |
| metadata | Document metadata stripped (author, creator, software, etc.) | Yes |
Metadata Details
When type is "exif" or "metadata", the change object includes a details array — the complete list of metadata fields that were detected and stripped. Each entry has a key (field name) and value (original value, truncated to 120 characters).
This is the same data the web UI offers as CSV/JSON export. As an API consumer, you receive it directly in the JSON response and can format it however you need.
Extracting Metadata — Examples
Node.js
// Requires Node.js 18+ for the built-in fetch.
async function printStrippedMetadata(jobId, apiKey) {
  const res = await fetch(`https://cleanthis.io/api/v1/job/${jobId}`, {
    headers: { 'Authorization': `Bearer ${apiKey}` }
  });
  const data = await res.json();
  if (data.status !== 'completed' || !data.report) return;
  // Only exif and metadata changes carry a details array
  const metaChanges = data.report.changes.filter(
    c => c.details && c.details.length > 0
  );
  for (const change of metaChanges) {
    console.log(change.label);
    for (const field of change.details) {
      console.log(`  ${field.key}: ${field.value}`);
    }
  }
}
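Since the details arrays carry the same data the web UI exports, a CSV export can be rebuilt on the client. The sketch below is one possible layout, not the web UI's format: the function names and the type,key,value column order are hypothetical. Quoting follows the usual CSV convention of wrapping cells that contain commas, quotes, or line breaks and doubling embedded quotes.

```javascript
// Hypothetical exporter: flatten report.changes[].details into CSV text.
function csvCell(value) {
  const text = String(value);
  // Quote cells containing commas, quotes, or line breaks; double embedded quotes.
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function metadataToCsv(report) {
  const rows = [['type', 'key', 'value']];
  for (const change of report.changes) {
    // Changes without a details array (macros, scripts, ...) contribute no rows.
    for (const field of change.details || []) {
      rows.push([change.type, field.key, field.value]);
    }
  }
  return rows.map(row => row.map(csvCell).join(',')).join('\n');
}

console.log(metadataToCsv({
  changes: [{ type: 'exif', label: 'Stripped 1 field', details: [{ key: 'Make', value: 'Apple' }] }],
}));
// type,key,value
// exif,Make,Apple
```

The values are original metadata from untrusted files, so a spreadsheet may interpret a cell that starts with = as a formula. Neutralising such cells before opening the export in a spreadsheet is worth considering.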
Clean Files
If the file was already clean, the report will have an empty changes array:
{
"report": {
"changes": [],
"summary": "No significant threats or changes detected. The file was already clean."
}
}
Rate Limits
| Endpoint Group | Limit | Keyed By |
|---|---|---|
| API v1 sanitize / cancel | 20/min | Account ID |
| API v1 job poll | 120/min | Account ID |
| API v1 download | 30/min | Account ID |
| Account create | 5/hr | IP |
| Account login | 10/15min | IP |
| Account management | 30/min | Session |
| Web UI upload | 10/min | IP |
| Web UI poll | 120/min | IP |
| Web UI download | 30/min | IP |
API v1 endpoints are keyed by account ID (not IP), so multiple servers behind the same NAT share one per-account limit instead of being throttled by their shared IP address. When a request is rate-limited, the response includes a Retry-After header with the number of seconds to wait.
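Honouring Retry-After can be handled in one wrapper around fetch. This is a minimal sketch: the function names, the retry cap of 3, and the 1-second fallback for a missing or unparseable header are all hypothetical choices. The header is read as whole seconds, as described above; the fetch and sleep functions are injectable for testing.

```javascript
// Hypothetical wrapper: retry a request when the API answers 429.
// Assumption: Retry-After holds a number of seconds, as described above.
function retryAfterMs(res, fallbackMs = 1000) {
  const raw = res.headers.get('retry-after');
  const seconds = raw ? Number(raw) : NaN;
  return Number.isFinite(seconds) && seconds >= 0 ? seconds * 1000 : fallbackMs;
}

async function fetchWithRateLimit(url, options = {}, { maxRetries = 3, fetchImpl = fetch, sleep } = {}) {
  const wait = sleep || (ms => new Promise(resolve => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl(url, options);
    // Return anything that is not a 429, or the last 429 once retries are used up.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await wait(retryAfterMs(res));
  }
}
```

Callers still need to check the returned status, since the final response may itself be a 429.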
Error Responses
All errors return JSON with a single error field:
{ "error": "File type not allowed: .exe" }
Common Status Codes
| Code | Meaning |
|---|---|
| 400 | Bad request — missing or invalid parameters |
| 401 | Unauthorized — missing or invalid API key / session |
| 403 | Forbidden — tampered signed URL or invalid webhook signature |
| 404 | Not found — job ID doesn't exist or doesn't belong to your account |
| 410 | Gone — download URL has expired |
| 413 | Payload too large — file exceeds the size limit |
| 415 | Unsupported media type — file type not allowed |
| 429 | Too many requests — rate limit exceeded (check Retry-After header) |
| 500 | Internal server error — something went wrong on our end |