MIME Type Detector

Pricing

Pay per event

Detect MIME types from file extensions, URLs, or magic bytes (base64). Batch process thousands of files. Uses mime-types + file-type packages. Zero proxy, pure utility.

Developer

Stas Persiianenko

Maintained by Community

Last modified: 8 days ago

Detect MIME types from file extensions, URLs, or raw file bytes (base64-encoded magic bytes). Supports batch processing — analyze thousands of files in a single run.

Uses the mime-types npm package for extension-based lookup and file-type for magic-byte detection. Zero proxy, no HTTP requests — pure utility.

What does MIME Type Detector do?

MIME Type Detector takes a list of filenames, URLs, or base64-encoded file content and returns the correct MIME type for each input. It uses three detection strategies in order of confidence:

  1. Magic bytes (base64 content) — reads the file's binary signature for definitive identification (e.g., PDF files always start with %PDF-)
  2. Filename extension — maps extensions like .pdf, .xlsx, .mp4 to their MIME types using the mime-types database
  3. URL path — extracts the file extension from the URL pathname for fast lookup

Each result includes the MIME type, character set (for text formats), canonical extension, detection method, and confidence level.
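
The fallback order described above can be sketched in Python. This is a rough illustration and not the Actor's source code: the signature list and extension map are tiny hypothetical stand-ins for the file-type and mime-types databases, and the `detect` function name is invented here.

```python
import base64
import posixpath
from urllib.parse import urlsplit

# Hypothetical, heavily abbreviated stand-ins for the file-type and
# mime-types databases; the real packages cover far more formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]
EXTENSION_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp4": "video/mp4",
}


def _by_extension(name):
    """Look up the lower-cased extension of a filename or URL path."""
    ext = posixpath.splitext(name)[1].lower()
    return EXTENSION_TYPES.get(ext)


def detect(item):
    """Try magic bytes, then the filename, then the URL path, in that order."""
    if item.get("base64Content"):
        head = base64.b64decode(item["base64Content"])
        for signature, mime in MAGIC_SIGNATURES:
            if head.startswith(signature):
                return {"mimeType": mime, "method": "magic-bytes", "confidence": "high"}
    if item.get("filename"):
        mime = _by_extension(item["filename"])
        if mime:
            return {"mimeType": mime, "method": "extension", "confidence": "medium"}
    if item.get("url"):
        # Only the pathname is inspected, so query strings are ignored.
        mime = _by_extension(urlsplit(item["url"]).path)
        if mime:
            return {"mimeType": mime, "method": "url", "confidence": "medium"}
    return {"mimeType": "application/octet-stream", "method": "unknown", "confidence": "low"}
```

The point of the ordering is that a byte signature overrides whatever the name claims, so an item with both a `.txt` filename and PDF bytes is reported as a PDF.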

Who is MIME Type Detector for?

Backend developers building file upload services — validate that uploaded files match their declared content type before processing or storing them.

  • Verify file types without running a browser or calling an external API
  • Batch-validate hundreds of file names from a database in one run
  • Cross-check user-provided content types against actual file signatures

Data pipeline engineers processing mixed media — when pulling files from S3, FTP, or crawl datasets, MIME types determine how to route, transform, or index each file.

  • Classify thousands of file records from a manifest or URL list
  • Route images, documents, and videos to different processing queues
  • Generate accurate Content-Type headers for re-serving files

DevOps and infosec teams auditing file uploads — identify files that have been renamed to evade extension-based filters by checking their actual magic bytes.

  • Detect .exe files disguised as .txt or .jpg
  • Audit S3 buckets or CDN origins for mismatched content types
  • Integrate MIME verification into CI/CD pipelines via the Apify API

Automation builders and no-code users — use MIME detection as a step in larger workflows without writing any code.

  • Connect to Zapier or Make to add MIME detection to file processing automations
  • Export results to Google Sheets for team review
  • Schedule recurring audits of file repositories

Why use MIME Type Detector?

  • 🔍 Three detection strategies — magic bytes are the gold standard; extension fallback handles the common case; URL extraction works for cloud storage links
  • Extremely fast — pure CPU, no network requests. Thousands of items process in seconds
  • 📦 Batch processing — submit up to thousands of items in one run
  • 💯 High accuracy for common formats: PDF, JPEG, PNG, GIF, WebP, MP4, ZIP, DOCX, XLSX, and many more (750+ MIME types via mime-types extension lookup, 150+ binary formats via file-type magic bytes)
  • 🔧 Structured output — every result includes method, confidence, charset, and extension — not just the MIME string
  • 💰 Pay-per-item pricing — no monthly subscription; you pay only for items analyzed
  • 🔌 Apify API + scheduling — automate MIME checks on a schedule, trigger via webhook, or integrate into any tech stack
  • 🤖 MCP-ready — use directly from Claude Code, Claude Desktop, or any MCP-enabled AI agent

What data can you extract?

Each result contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| input | string | The original input value (filename, URL, or truncated base64 label) |
| mimeType | string \| null | The detected MIME type, e.g. application/pdf |
| charset | string \| null | Character set for text formats (e.g. UTF-8 for text/html) |
| extension | string \| null | Canonical file extension including the dot (e.g. .pdf) |
| method | string | Detection method: magic-bytes, extension, url, or unknown |
| confidence | string | Confidence level: high (magic bytes), medium (extension/URL), low (unknown) |
| error | string \| null | Error message if detection failed, null otherwise |

Supported input types:

| Input type | Field | Example |
| --- | --- | --- |
| Filename | filename | "report.pdf", "photo.JPEG", "archive.tar.gz" |
| URL | url | "https://cdn.example.com/video.mp4" |
| Base64 content | base64Content | First 4 KB of any file encoded as base64 |

How much does it cost to detect MIME types?

This Actor uses pay-per-event pricing — you pay only for what you detect. No monthly subscription. All platform costs are included.

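| Usage | Free | Starter ($29/mo) | Scale ($199/mo) | Business ($999/mo) |
| --- | --- | --- | --- | --- |
| Per detection | $0.00115 | $0.001 | $0.00078 | $0.0006 |
| 1,000 detections | $1.15 | $1.00 | $0.78 | $0.60 |
| 10,000 detections | $11.50 | $10.00 | $7.80 | $6.00 |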

Plus a one-time start fee of $0.005 per run (same across all tiers).

Real-world cost examples:

| Input | Items | Duration | Cost (Free tier) |
| --- | --- | --- | --- |
| 10 filenames | 10 | < 1s | ~$0.017 |
| 100 URLs | 100 | < 1s | ~$0.12 |
| 1,000 mixed items | 1,000 | ~2s | ~$1.16 |
| 10,000 filenames | 10,000 | ~5s | ~$11.51 |
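
The figures above follow from one formula: the per-run start fee plus the per-detection price times the number of items. A minimal sketch using the Free-tier prices quoted on this page (the function names are illustrative, and `Decimal` is used only to avoid floating-point rounding noise):

```python
from decimal import Decimal

# Free-tier prices quoted in the pricing section of this page.
START_FEE = Decimal("0.005")         # charged once per run
PER_DETECTION = Decimal("0.00115")   # charged per item analyzed


def estimate_cost(items, per_detection=PER_DETECTION, start_fee=START_FEE):
    """Return the estimated cost in USD of one run with `items` detections."""
    return start_fee + per_detection * items


def max_detections(budget, per_detection=PER_DETECTION, start_fee=START_FEE):
    """Return how many detections fit into `budget` USD within a single run."""
    return int((Decimal(str(budget)) - start_fee) // per_detection)
```

For example, `estimate_cost(1000)` comes to 1.155 USD, which the table rounds to about $1.16. Pass a different `per_detection` value to model the paid tiers.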

With the free $5 credit Apify gives every new account, you can run over 4,000 detections at no cost.

How to detect MIME types with this Actor

  1. Open MIME Type Detector on Apify Store
  2. Click Try for free
  3. In the Items to detect field, enter your list of filenames, URLs, or base64 content
  4. Click Start — the run completes in seconds
  5. View results in the Dataset tab, or export to JSON/CSV/Excel

Example: detect from filenames only

{
  "items": [
    { "filename": "invoice.pdf" },
    { "filename": "product_photo.jpg" },
    { "filename": "data_export.csv" },
    { "filename": "backup.tar.gz" }
  ]
}

Example: detect from URLs

{
  "items": [
    { "url": "https://cdn.example.com/assets/logo.svg" },
    { "url": "https://files.example.com/report_2024.xlsx" },
    { "url": "https://storage.googleapis.com/bucket/video.webm" }
  ]
}

Example: magic-byte detection from base64 content

{
  "items": [
    { "base64Content": "JVBERi0xLjQKJeLjz9MKCg==" },
    { "base64Content": "iVBORw0KGgoAAAANSUhEUgA=" },
    { "filename": "unknown.bin", "base64Content": "UEsDBBQAAAAIA..." }
  ]
}

When both filename and base64Content are provided, magic-byte detection runs first — the filename is used as a label if magic-byte detection fails.

Input parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| items | array | Yes | List of items to analyze. Each item must have at least one of: filename, url, base64Content |
| items[].filename | string | No | A filename (e.g. photo.jpg) or just the extension (e.g. .pdf) |
| items[].url | string | No | A full URL; the path component is parsed for the file extension |
| items[].base64Content | string | No | Base64-encoded file bytes. The first 4,100 bytes are sufficient for magic-byte detection |
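
Because every item needs at least one of the three fields, it can be worth checking a batch locally before submitting it. The helper below is a hypothetical pre-flight check, not part of the Actor. It also treats empty strings as missing, which is an assumption about how the Actor behaves.

```python
ALLOWED_KEYS = ("filename", "url", "base64Content")


def invalid_items(items):
    """Return the indexes of items that carry none of the accepted fields."""
    bad = []
    for index, item in enumerate(items):
        # An item is usable if any accepted key holds a non-empty string.
        has_value = any(
            isinstance(item.get(key), str) and item[key] for key in ALLOWED_KEYS
        )
        if not has_value:
            bad.append(index)
    return bad
```

Dropping or repairing the reported indexes before the run avoids rows that come back with only an error message.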

Tips for inputs:

  • Filenames are case-insensitive: .JPEG, .jpeg, .Jpeg all resolve to image/jpeg
  • URLs with query parameters (e.g. ?token=abc) are parsed correctly — only the pathname is used
  • When providing base64 content, you don't need the full file — the first 512 bytes are enough for most formats. Use 4,100 bytes to cover all formats supported by file-type
  • You can mix all three input types in the same batch
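
To produce base64Content without shipping whole files, read only the leading bytes and encode those. A minimal sketch, assuming local files; the helper names are made up for this example and the 4,100-byte default mirrors the tip above:

```python
import base64
import os


def head_as_base64(path, sample_size=4100):
    """Read at most `sample_size` leading bytes of a file and base64-encode them."""
    with open(path, "rb") as handle:
        head = handle.read(sample_size)
    return base64.b64encode(head).decode("ascii")


def to_item(path):
    """Build one input item carrying both the filename and the sampled bytes."""
    return {"filename": os.path.basename(path), "base64Content": head_as_base64(path)}
```

A list such as `[to_item(p) for p in paths]` can then be passed as the `items` input.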

Output examples

Successful detection from filename:

{
  "input": "invoice.pdf",
  "mimeType": "application/pdf",
  "charset": null,
  "extension": ".pdf",
  "method": "extension",
  "confidence": "medium",
  "error": null
}

Magic-byte detection (highest confidence):

{
  "input": "base64:JVBERi0xLjQKJe...",
  "mimeType": "application/pdf",
  "charset": null,
  "extension": ".pdf",
  "method": "magic-bytes",
  "confidence": "high",
  "error": null
}

Text format with charset:

{
  "input": "styles.css",
  "mimeType": "text/css",
  "charset": "UTF-8",
  "extension": ".css",
  "method": "extension",
  "confidence": "medium",
  "error": null
}

Unknown format (fallback):

{
  "input": "unknown_file",
  "mimeType": "application/octet-stream",
  "charset": null,
  "extension": null,
  "method": "unknown",
  "confidence": "low",
  "error": null
}

Malformed item (error):

{
  "input": "(empty)",
  "mimeType": null,
  "charset": null,
  "extension": null,
  "method": "unknown",
  "confidence": "low",
  "error": "Each item must have at least one of: filename, url, base64Content"
}
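
Records shaped like the examples above are easy to post-process once downloaded from the dataset. The sketch below is one possible way to do it; the `summarize` helper is hypothetical. It tallies results per detection method and separates failures from high-confidence hits.

```python
from collections import Counter


def summarize(results):
    """Count results per method and collect failed and high-confidence records."""
    by_method = Counter(result["method"] for result in results)
    failed = [result["input"] for result in results if result.get("error")]
    high_confidence = [r for r in results if r["confidence"] == "high"]
    return {
        "by_method": dict(by_method),
        "failed": failed,
        "high_confidence": high_confidence,
    }
```

A large `unknown` count in `by_method` usually means many inputs lacked a recognizable extension and would benefit from base64Content.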

Tips for best results

  • 🎯 Use magic bytes for security-critical checks — extension-based detection can be spoofed by renaming files. Always use base64Content when verifying file uploads in security-sensitive contexts
  • 📉 Keep base64 content short — you only need the first 4,100 bytes for reliable detection. Sending full file contents wastes bandwidth and makes no difference to accuracy
  • 🚀 Batch all items in one run — the actor is optimized for bulk processing. Submitting 10,000 items in one run is more efficient than 10,000 separate runs
  • 📊 Use the method field to filter results — if you need only high-confidence detections, filter for method === "magic-bytes". For general use, extension results are reliable for well-known formats
  • ⚠️ application/octet-stream means unknown — this is the RFC 2046 fallback when no MIME type could be determined. Check the method field: unknown means no detection succeeded
  • 🔗 URLs must have a file extension in the path — URLs like https://api.example.com/files/12345 (no extension) cannot be detected by extension lookup. Provide base64Content instead
  • 📋 Start small — test with a handful of items first to verify the results match your expectations before submitting large batches
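
Since URL lookup depends on an extension in the pathname, a quick local pass can show which URLs will need base64Content instead. A minimal sketch with an invented helper name; it only inspects the URL string and makes no network requests:

```python
import posixpath
from urllib.parse import urlsplit


def split_by_extension(urls):
    """Separate URLs whose pathname ends in an extension from those that do not."""
    with_ext, without_ext = [], []
    for url in urls:
        # urlsplit drops the query string, so ?token=abc does not interfere.
        path = urlsplit(url).path
        ext = posixpath.splitext(path)[1]
        (with_ext if ext else without_ext).append(url)
    return with_ext, without_ext
```

URLs in the second list are the ones to download a few kilobytes of and submit as bytes.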

Integrations

MIME Type Detector → Google Sheets — export results to Google Sheets for team review of a file repository audit. Use Apify's native Google Sheets integration or the Google Sheets API Actor.

MIME Type Detector → Make (Integromat) — trigger MIME detection in a Make scenario when new files arrive in Dropbox, S3, or Google Drive. Route files to different processing flows based on the detected MIME type.

MIME Type Detector → Zapier — chain with Zapier's file processing actions to validate uploads before storing them in Airtable, Notion, or your CRM.

Scheduled audit — set a daily or weekly schedule to run MIME detection on a list of URLs from your CDN or file storage. Get alerts when file types change unexpectedly.

Webhook-triggered validation — call the Actor via Apify API webhook whenever a new file is uploaded to your system. Return the MIME type to your backend for routing decisions without running a Node.js process yourself.

CI/CD pipeline integration — call the actor from a GitHub Actions workflow or Jenkins job to validate that build artifacts have the expected MIME types before deployment.

Using the Apify API

Run MIME Type Detector programmatically from any language using the Apify API.

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your personal API token.
const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('automation-lab/mime-type-detector').call({
    items: [
        { filename: 'document.pdf' },
        { filename: 'photo.jpg' },
        { url: 'https://cdn.example.com/video.mp4' },
        { base64Content: 'JVBERi0xLjQKJeLjz9MKCg==' },
    ],
});

// Fetch the detection results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

# Authenticate with your personal API token.
client = ApifyClient(token='YOUR_APIFY_TOKEN')

# Start the Actor and wait for the run to finish.
run = client.actor('automation-lab/mime-type-detector').call(run_input={
    'items': [
        {'filename': 'document.pdf'},
        {'filename': 'photo.jpg'},
        {'url': 'https://cdn.example.com/video.mp4'},
        {'base64Content': 'JVBERi0xLjQKJeLjz9MKCg=='},
    ],
})

# Fetch the detection results from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~mime-type-detector/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"filename": "document.pdf"},
      {"filename": "photo.jpg"},
      {"url": "https://cdn.example.com/video.mp4"}
    ]
  }'

Use with AI agents via MCP

MIME Type Detector is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

Setup for Claude Code

$ claude mcp add --transport http apify "https://mcp.apify.com"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}

Your AI assistant will use OAuth to authenticate with your Apify account on first use.

Example prompts

Once connected, try asking your AI assistant:

  • "Use automation-lab/mime-type-detector to detect the MIME types of: document.pdf, photo.jpg, archive.tar.gz, and script.js"
  • "I have a list of CDN URLs from our S3 bucket — use the MIME Type Detector to classify each file by type"
  • "Check what MIME type this base64-encoded file header corresponds to: JVBERi0xLjQKJeLjz9MKCg=="

Learn more in the Apify MCP documentation.

Is it legal to use MIME Type Detector?

Yes. This actor performs no web scraping and makes no HTTP requests to third-party websites. It processes only the data you supply (filenames, URL strings, and base64 content) inside the Actor run. There are no terms of service concerns, no robots.txt considerations, and no privacy implications unless you supply personally identifiable filenames.

MIME type detection is a standard software operation, equivalent to running the file command on Linux or reading URLResponse.mimeType on iOS. Use it responsibly as part of lawful file processing pipelines.

FAQ

What MIME types does it support? Extension-based detection uses the mime-types package, which covers 750+ MIME types including all common document, image, video, audio, archive, and code formats. Magic-byte detection uses the file-type package, which supports 150+ binary formats including PDF, JPEG, PNG, GIF, WebP, HEIC, MP4, ZIP, RAR, DOCX, XLSX, and more.

How many items can I process in one run? There is no hard limit set by the actor. In practice, runs with tens of thousands of items complete in under a minute. Very large batches (100K+ items) can take a few minutes, which is longer than the 60-second default timeout, so increase timeoutSecs before submitting batches of that size.

How much does it cost per item? On the Free plan, each detection costs $0.00115 plus a $0.005 start fee per run. For 1,000 items that's about $1.16. Paid plans (Starter, Scale, Business) offer significant discounts — see the pricing table above.

Is extension-based detection reliable? For well-known formats (PDF, JPEG, MP4, DOCX, ZIP), yes: extensions are standardized and the mime-types database is comprehensive. For security-sensitive use cases (e.g., blocking malicious uploads), always use magic-byte detection via base64Content, because a file can be renamed with any extension.

Why does my URL return application/octet-stream? URLs that don't have a file extension in the path (e.g., https://api.example.com/files/12345) can't be detected by extension lookup. Provide the file's bytes as base64Content for reliable detection.

Why does the actor return method: "unknown" for some filenames? Files with no extension (e.g., Makefile, .gitignore, README) or unrecognized extensions don't have a MIME type in the database. The actor returns application/octet-stream as the RFC 2046 safe default.

Can it detect files disguised with wrong extensions? Yes — if you provide base64Content, the actor checks magic bytes first regardless of the filename. A .txt file that is actually a PDF will be detected as application/pdf with confidence: "high".
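
One way to act on this is to compare the type implied by each filename with the magic-byte result the Actor returns. The sketch below makes two assumptions. The caller keeps its own (filename, result) pairs. Python's standard `mimetypes` module stands in for the extension lookup, and it may name a few types differently than the Actor does, so treat flagged entries as candidates for review rather than verdicts. The `find_disguised` helper is hypothetical.

```python
import mimetypes


def find_disguised(pairs):
    """Flag files whose extension implies a different type than their magic bytes.

    `pairs` is an iterable of (filename, result) tuples kept by the caller,
    where `result` is one output record from the Actor.
    """
    flagged = []
    for filename, result in pairs:
        if result.get("method") != "magic-bytes":
            continue  # only signature-based results can contradict an extension
        claimed, _ = mimetypes.guess_type(filename)
        if claimed != result["mimeType"]:
            flagged.append((filename, claimed, result["mimeType"]))
    return flagged
```

Each flagged tuple holds the filename, the type its extension suggests, and the type the bytes indicate.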

Other utility actors

Looking for more file and data processing tools? Check out these automation-lab actors: