Pricing

Pay per event

MIME Type Detector

Detect MIME types from file extensions, URLs, or magic bytes (base64). Batch process thousands of files. Uses mime-types + file-type packages. Zero proxy, pure utility.

Developer

Stas Persiianenko

What does MIME Type Detector do?

MIME Type Detector takes a list of filenames, URLs, or base64-encoded file content and returns the correct MIME type for each input. It uses three detection strategies in order of confidence:

Magic bytes (base64 content) — reads the file's binary signature for definitive identification (e.g., PDF files always start with %PDF-)
Filename extension — maps extensions like .pdf, .xlsx, .mp4 to their MIME types using the mime-types database
URL path — extracts the file extension from the URL pathname for fast lookup

Each result includes the MIME type, character set (for text formats), canonical extension, detection method, and confidence level.
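
The ordering of the three strategies can be illustrated with a minimal Python sketch. It is not the Actor's implementation (which uses the Node.js mime-types and file-type packages); the `detect` function and the tiny `MAGIC_SIGNATURES` table are assumptions made for illustration, and the standard-library `mimetypes` module stands in for the mime-types database.

```python
import base64
import mimetypes
from urllib.parse import urlparse

# Tiny, hand-picked signature table for illustration only; the file-type
# package used by the Actor recognizes far more formats.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF8", "image/gif"),
]


def detect(item: dict) -> dict:
    """Return a {mimeType, method, confidence} dict for one input item."""
    # 1. Magic bytes: the most trustworthy signal when content is supplied.
    content = item.get("base64Content")
    if content:
        try:
            head = base64.b64decode(content)
        except ValueError:  # malformed base64 (binascii.Error is a ValueError)
            head = b""
        for signature, mime in MAGIC_SIGNATURES:
            if head.startswith(signature):
                return {"mimeType": mime, "method": "magic-bytes", "confidence": "high"}

    # 2. Filename extension, then 3. URL path: both are extension lookups.
    candidates = (
        ("extension", item.get("filename")),
        ("url", urlparse(item.get("url") or "").path),  # query string is ignored
    )
    for method, name in candidates:
        if name:
            mime, _ = mimetypes.guess_type(name)
            if mime:
                return {"mimeType": mime, "method": method, "confidence": "medium"}

    # Nothing matched: fall back to the generic binary type.
    return {"mimeType": "application/octet-stream", "method": "unknown", "confidence": "low"}
```

Calling `detect({"filename": "notes.txt", "base64Content": "JVBERi0xLjQKJeLjz9MKCg=="})` reports `application/pdf` in this sketch, because the content signature is checked before the filename.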

Who is MIME Type Detector for?

Backend developers building file upload services — validate that uploaded files match their declared content type before processing or storing them.

Verify file types without running a browser or calling an external API
Batch-validate hundreds of file names from a database in one run
Cross-check user-provided content types against actual file signatures

Data pipeline engineers processing mixed media — when pulling files from S3, FTP, or crawl datasets, MIME types determine how to route, transform, or index each file.

Classify thousands of file records from a manifest or URL list
Route images, documents, and videos to different processing queues
Generate accurate Content-Type headers for re-serving files
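
As a rough illustration of the routing and header use cases above, the following Python sketch consumes result records shaped like the Actor's output. The queue names and the helper names `content_type_header` and `pick_queue` are hypothetical and exist only for this example.

```python
# Queue names are hypothetical; substitute the ones your pipeline uses.
QUEUE_BY_TOP_LEVEL_TYPE = {
    "image": "images",
    "video": "videos",
    "audio": "audio",
    "text": "documents",
}
DOCUMENT_TYPES = {"application/pdf"}


def content_type_header(result: dict) -> str:
    """Build a Content-Type header value from one result record."""
    mime = result.get("mimeType") or "application/octet-stream"
    charset = result.get("charset")
    return f"{mime}; charset={charset}" if charset else mime


def pick_queue(result: dict) -> str:
    """Choose a processing queue from the detected MIME type."""
    mime = result.get("mimeType") or ""
    if mime in DOCUMENT_TYPES:
        return "documents"
    top_level = mime.split("/", 1)[0]
    return QUEUE_BY_TOP_LEVEL_TYPE.get(top_level, "other")
```

For a result with `mimeType` `text/css` and `charset` `UTF-8`, the header helper yields `text/css; charset=UTF-8`.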

DevOps and infosec teams auditing file uploads — identify files that have been renamed to evade extension-based filters by checking their actual magic bytes.

Detect .exe files disguised as .txt or .jpg
Audit S3 buckets or CDN origins for mismatched content types
Integrate MIME verification into CI/CD pipelines via the Apify API

Automation builders and no-code users — use MIME detection as a step in larger workflows without writing any code.

Connect to Zapier or Make to add MIME detection to file processing automations
Export results to Google Sheets for team review
Schedule recurring audits of file repositories

Why use MIME Type Detector?

🔍 Three detection strategies — magic bytes are the gold standard; extension fallback handles the common case; URL extraction works for cloud storage links
⚡ Extremely fast — pure CPU, no network requests. Thousands of items process in seconds
📦 Batch processing: submit thousands of items in a single run
💯 High accuracy for common formats: PDF, JPEG, PNG, GIF, WebP, MP4, ZIP, DOCX, XLSX, and the other formats covered by the mime-types (750+ MIME types) and file-type (150+ binary formats) packages
🔧 Structured output — every result includes method, confidence, charset, and extension — not just the MIME string
💰 Pay-per-item pricing — no monthly subscription; you pay only for items analyzed
🔌 Apify API + scheduling — automate MIME checks on a schedule, trigger via webhook, or integrate into any tech stack
🤖 MCP-ready — use directly from Claude Code, Claude Desktop, or any MCP-enabled AI agent

What data can you extract?

Each result contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| `input` | string | The original input value (filename, URL, or truncated base64 label) |
| `mimeType` | string \| null | The detected MIME type, e.g. `application/pdf` |
| `charset` | string \| null | Character set for text formats (e.g. `UTF-8` for `text/html`) |
| `extension` | string \| null | Canonical file extension including the dot (e.g. `.pdf`) |
| `method` | string | Detection method: `magic-bytes`, `extension`, `url`, or `unknown` |
| `confidence` | string | Confidence level: `high` (magic bytes), `medium` (extension/URL), `low` (unknown) |
| `error` | string \| null | Error message if detection failed, null otherwise |
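
If you consume results from Python, a typed record can make the fields easier to work with. The sketch below mirrors the documented fields; the `DetectionResult` class, its snake_case attribute names, and the `ok` property are illustrative assumptions, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DetectionResult:
    """One dataset item, with the documented fields in snake_case."""

    input: str
    mime_type: Optional[str]
    charset: Optional[str]
    extension: Optional[str]
    method: str
    confidence: str
    error: Optional[str]

    @classmethod
    def from_item(cls, item: dict) -> "DetectionResult":
        """Build a record from a raw dataset item (camelCase keys)."""
        return cls(
            input=item["input"],
            mime_type=item.get("mimeType"),
            charset=item.get("charset"),
            extension=item.get("extension"),
            method=item["method"],
            confidence=item["confidence"],
            error=item.get("error"),
        )

    @property
    def ok(self) -> bool:
        """True when detection succeeded by some method."""
        return self.error is None and self.method != "unknown"
```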

Supported input types:

| Input type | Field | Example |
| --- | --- | --- |
| Filename | `filename` | `"report.pdf"`, `"photo.JPEG"`, `"archive.tar.gz"` |
| URL | `url` | `"https://cdn.example.com/video.mp4"` |
| Base64 content | `base64Content` | First 4,100 bytes (about 4 KB) of any file, encoded as base64 |

How much does it cost to detect MIME types?

This Actor uses pay-per-event pricing — you pay only for what you detect. No monthly subscription. All platform costs are included.

|  | Free | Starter ($29/mo) | Scale ($199/mo) | Business ($999/mo) |
| --- | --- | --- | --- | --- |
| Per detection | $0.00115 | $0.001 | $0.00078 | $0.0006 |
| 1,000 detections | $1.15 | $1.00 | $0.78 | $0.60 |
| 10,000 detections | $11.50 | $10.00 | $7.80 | $6.00 |

Each run also incurs a start fee of $0.005 (the same on all tiers).

Real-world cost examples:

| Input | Items | Duration | Cost (Free tier) |
| --- | --- | --- | --- |
| 10 filenames | 10 | < 1s | ~$0.017 |
| 100 URLs | 100 | < 1s | ~$0.12 |
| 1,000 mixed items | 1,000 | ~2s | ~$1.16 |
| 10,000 filenames | 10,000 | ~5s | ~$11.51 |

With the free $5 credit Apify gives every new account, you can run over 4,000 detections at no cost.
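
The arithmetic behind these figures is the per-run start fee plus the per-detection rate times the number of items. A small Python sketch, using the prices listed above (the function names and plan keys are made up for this example):

```python
START_FEE_USD = 0.005  # charged once per run, same on every tier

# Per-detection prices from the pricing table above.
PER_DETECTION_USD = {
    "free": 0.00115,
    "starter": 0.001,
    "scale": 0.00078,
    "business": 0.0006,
}


def estimate_cost(detections: int, plan: str = "free") -> float:
    """Estimated cost in USD of one run with the given number of detections."""
    return START_FEE_USD + detections * PER_DETECTION_USD[plan]


def detections_within_budget(budget_usd: float, plan: str = "free") -> int:
    """How many detections a single run can afford within a budget."""
    remaining = budget_usd - START_FEE_USD
    return max(0, int(remaining / PER_DETECTION_USD[plan]))
```

For example, `estimate_cost(1000)` comes to about 1.155, which rounds to the ~$1.16 shown in the table, and a single run on the Free tier fits 4,343 detections into a $5 budget.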

How to detect MIME types with this Actor

Open MIME Type Detector on Apify Store
Click Try for free
In the Items to detect field, enter your list of filenames, URLs, or base64 content
Click Start — the run completes in seconds
View results in the Dataset tab, or export to JSON/CSV/Excel

Example: detect from filenames only

{
    "items": [
        { "filename": "invoice.pdf" },
        { "filename": "product_photo.jpg" },
        { "filename": "data_export.csv" },
        { "filename": "backup.tar.gz" }
    ]
}

Example: detect from URLs

{
    "items": [
        { "url": "https://cdn.example.com/assets/logo.svg" },
        { "url": "https://files.example.com/report_2024.xlsx" },
        { "url": "https://storage.googleapis.com/bucket/video.webm" }
    ]
}

Example: magic-byte detection from base64 content

{
    "items": [
        { "base64Content": "JVBERi0xLjQKJeLjz9MKCg==" },
        { "base64Content": "iVBORw0KGgoAAAANSUhEUgA=" },
        { "filename": "unknown.bin", "base64Content": "UEsDBBQAAAAIA..." }
    ]
}

When both filename and base64Content are provided, magic-byte detection runs first — the filename is used as a label if magic-byte detection fails.

Input parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `items` | array | Yes | List of items to analyze. Each item must have at least one of: `filename`, `url`, `base64Content` |
| `items[].filename` | string | No | A filename (e.g. `photo.jpg`) or just the extension (e.g. `.pdf`) |
| `items[].url` | string | No | A full URL; the path component is parsed for the file extension |
| `items[].base64Content` | string | No | Base64-encoded file bytes. The first 4,100 bytes are sufficient for magic-byte detection |
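
Because every item needs at least one of the three fields, it can be worth checking a batch locally before starting a run. A minimal sketch follows; the `invalid_items` helper is hypothetical, and treating empty strings as missing is an assumption of this example rather than documented Actor behavior.

```python
SUPPORTED_FIELDS = ("filename", "url", "base64Content")


def invalid_items(items: list[dict]) -> list[int]:
    """Return the indexes of items that have none of the supported fields.

    Assumption: an empty string counts as missing.
    """
    return [
        index
        for index, item in enumerate(items)
        if not any(item.get(field) for field in SUPPORTED_FIELDS)
    ]
```

An empty result means every item carries at least one usable field.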

Tips for inputs:

Filenames are case-insensitive: .JPEG, .jpeg, .Jpeg all resolve to image/jpeg
URLs with query parameters (e.g. ?token=abc) are parsed correctly — only the pathname is used
When providing base64 content, you don't need the full file — the first 512 bytes are enough for most formats. Use 4,100 bytes to cover all formats supported by file-type
You can mix all three input types in the same batch
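
One way to prepare `base64Content` without sending whole files is to read only the leading bytes. The sketch below uses the 4,100-byte figure from the tips above; `header_as_base64` and `to_item` are illustrative helper names.

```python
import base64
from pathlib import Path

HEADER_BYTES = 4100  # enough for magic-byte detection, per the tips above


def header_as_base64(path, limit: int = HEADER_BYTES) -> str:
    """Read at most `limit` leading bytes of a file and base64-encode them."""
    with open(path, "rb") as handle:
        return base64.b64encode(handle.read(limit)).decode("ascii")


def to_item(path) -> dict:
    """Build one input item carrying both the filename and the file header."""
    file_path = Path(path)
    return {
        "filename": file_path.name,
        "base64Content": header_as_base64(file_path),
    }
```

A list of such items can be passed as the `items` input.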

Output examples

Successful detection from filename:

{
    "input": "invoice.pdf",
    "mimeType": "application/pdf",
    "charset": null,
    "extension": ".pdf",
    "method": "extension",
    "confidence": "medium",
    "error": null
}

Magic-byte detection (highest confidence):

{
    "input": "base64:JVBERi0xLjQKJe...",
    "mimeType": "application/pdf",
    "charset": null,
    "extension": ".pdf",
    "method": "magic-bytes",
    "confidence": "high",
    "error": null
}

Text format with charset:

{
    "input": "styles.css",
    "mimeType": "text/css",
    "charset": "UTF-8",
    "extension": ".css",
    "method": "extension",
    "confidence": "medium",
    "error": null
}

Unknown format (fallback):

{
    "input": "unknown_file",
    "mimeType": "application/octet-stream",
    "charset": null,
    "extension": null,
    "method": "unknown",
    "confidence": "low",
    "error": null
}

Malformed item (error):

{
    "input": "(empty)",
    "mimeType": null,
    "charset": null,
    "extension": null,
    "method": "unknown",
    "confidence": "low",
    "error": "Each item must have at least one of: filename, url, base64Content"
}

Tips for best results

🎯 Use magic bytes for security-critical checks — extension-based detection can be spoofed by renaming files. Always use base64Content when verifying file uploads in security-sensitive contexts
📉 Keep base64 content short — you only need the first 4,100 bytes for reliable detection. Sending full file contents wastes bandwidth and makes no difference to accuracy
🚀 Batch all items in one run — the actor is optimized for bulk processing. Submitting 10,000 items in one run is more efficient than 10,000 separate runs
📊 Use the method field to filter results — if you need only high-confidence detections, filter for method === "magic-bytes". For general use, extension results are reliable for well-known formats
⚠️ application/octet-stream means unknown — this is the RFC 2046 fallback when no MIME type could be determined. Check the method field: unknown means no detection succeeded
🔗 URLs must have a file extension in the path — URLs like https://api.example.com/files/12345 (no extension) cannot be detected by extension lookup. Provide base64Content instead
📋 Start small — test with a handful of items first to verify the results match your expectations before submitting large batches
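
Several of these tips come down to post-processing the dataset by `method` and `confidence`. A short sketch, assuming `results` is a list of dataset items shaped like the output examples above (the helper names are made up for this example):

```python
from collections import Counter


def count_by_method(results: list[dict]) -> Counter:
    """Count how many results each detection method produced."""
    return Counter(result["method"] for result in results)


def high_confidence(results: list[dict]) -> list[dict]:
    """Keep only error-free results confirmed by magic bytes."""
    return [
        result
        for result in results
        if result.get("error") is None and result.get("confidence") == "high"
    ]


def needs_content(results: list[dict]) -> list[str]:
    """Inputs that could not be identified and may need base64Content."""
    return [
        result["input"]
        for result in results
        if result.get("error") is None and result.get("method") == "unknown"
    ]
```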

Integrations

MIME Type Detector → Google Sheets — export results to Google Sheets for team review of a file repository audit. Use Apify's native Google Sheets integration or the Google Sheets API Actor.

MIME Type Detector → Make (Integromat) — trigger MIME detection in a Make scenario when new files arrive in Dropbox, S3, or Google Drive. Route files to different processing flows based on the detected MIME type.

MIME Type Detector → Zapier — chain with Zapier's file processing actions to validate uploads before storing them in Airtable, Notion, or your CRM.

Scheduled audit — set a daily or weekly schedule to run MIME detection on a list of URLs from your CDN or file storage. Get alerts when file types change unexpectedly.

Webhook-triggered validation — call the Actor via Apify API webhook whenever a new file is uploaded to your system. Return the MIME type to your backend for routing decisions without running a Node.js process yourself.

CI/CD pipeline integration — call the actor from a GitHub Actions workflow or Jenkins job to validate that build artifacts have the expected MIME types before deployment.
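
For the CI/CD case, the check itself can be a small comparison of results against expected types. The sketch below assumes you keep a mapping from each input to its expected MIME type; `find_mismatches` and the mapping are hypothetical, not something the Actor provides.

```python
def find_mismatches(results: list[dict], expected: dict[str, str]) -> list[str]:
    """Describe every result whose MIME type differs from the expected one."""
    problems = []
    for result in results:
        want = expected.get(result["input"])
        if want is None:
            continue  # no expectation recorded for this input
        if result.get("error") or result.get("mimeType") != want:
            got = result.get("error") or result.get("mimeType")
            problems.append(f"{result['input']}: expected {want}, got {got}")
    return problems
```

In a pipeline step, printing the returned list and exiting with a non-zero status when it is non-empty is enough to fail the build.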

Using the Apify API

Run MIME Type Detector programmatically from any language using the Apify API.

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('automation-lab/mime-type-detector').call({
    items: [
        { filename: 'document.pdf' },
        { filename: 'photo.jpg' },
        { url: 'https://cdn.example.com/video.mp4' },
        { base64Content: 'JVBERi0xLjQKJeLjz9MKCg==' },
    ],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient(token='YOUR_APIFY_TOKEN')

run = client.actor('automation-lab/mime-type-detector').call(run_input={
    'items': [
        {'filename': 'document.pdf'},
        {'filename': 'photo.jpg'},
        {'url': 'https://cdn.example.com/video.mp4'},
        {'base64Content': 'JVBERi0xLjQKJeLjz9MKCg=='},
    ],
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~mime-type-detector/runs?token=YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      {"filename": "document.pdf"},
      {"filename": "photo.jpg"},
      {"url": "https://cdn.example.com/video.mp4"}
    ]
  }'

Use with AI agents via MCP

MIME Type Detector is available as a tool for AI assistants that support the Model Context Protocol (MCP).

Add the Apify MCP server to your AI client — this gives you access to all Apify actors, including this one:

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com"
        }
    }
}

Your AI assistant will use OAuth to authenticate with your Apify account on first use.

Example prompts

Once connected, try asking your AI assistant:

"Use automation-lab/mime-type-detector to detect the MIME types of: document.pdf, photo.jpg, archive.tar.gz, and script.js"
"I have a list of CDN URLs from our S3 bucket — use the MIME Type Detector to classify each file by type"
"Check what MIME type this base64-encoded file header corresponds to: JVBERi0xLjQKJeLjz9MKCg=="

Learn more in the Apify MCP documentation.

Is it legal to use MIME Type Detector?

Yes. This actor performs no web scraping and makes no HTTP requests to third-party websites. It processes only the data you supply — filenames, URL strings, and base64 content — entirely locally. There are no terms of service concerns, no robots.txt considerations, and no privacy implications unless you supply personally identifiable filenames.

MIME type detection is a standard software operation, equivalent to running the file command on Linux or reading URLResponse.mimeType on iOS. Use it responsibly as part of lawful file processing pipelines.

FAQ

What MIME types does it support? Extension-based detection uses the mime-types package, which covers 750+ MIME types including all common document, image, video, audio, archive, and code formats. Magic-byte detection uses the file-type package, which supports 150+ binary formats including PDF, JPEG, PNG, GIF, WebP, HEIC, MP4, ZIP, RAR, DOCX, XLSX, and more.

How many items can I process in one run? There is no hard limit set by the actor. In practice, runs with tens of thousands of items complete in under a minute. Very large batches (100K+ items) can take a few minutes, which is longer than the 60-second default timeout, so increase timeoutSecs in the input when processing them.

How much does it cost per item? On the Free plan, each detection costs $0.00115 plus a $0.005 start fee per run. For 1,000 items that's about $1.16. Paid plans (Starter, Scale, Business) offer significant discounts — see the pricing table above.

Is extension-based detection reliable? For well-known formats (PDF, JPEG, MP4, DOCX, ZIP), yes — extensions are standardized and the mime-types database is comprehensive. For security-sensitive use cases (e.g., blocking malicious uploads), always use magic-byte detection via base64Content because extensions can be renamed.

Why does my URL return application/octet-stream? URLs that don't have a file extension in the path (e.g., https://api.example.com/files/12345) can't be detected by extension lookup. Provide the file's bytes as base64Content for reliable detection.

Why does the actor return method: "unknown" for some filenames? Files with no extension (e.g., Makefile, .gitignore, README) or unrecognized extensions don't have a MIME type in the database. The actor returns application/octet-stream as the RFC 2046 safe default.

Can it detect files disguised with wrong extensions? Yes — if you provide base64Content, the actor checks magic bytes first regardless of the filename. A .txt file that is actually a PDF will be detected as application/pdf with confidence: "high".
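
On the consuming side, a disguised file shows up as a disagreement between the type implied by the declared filename and the type found by magic bytes. A minimal sketch, using Python's standard `mimetypes` module for the declared side (the `is_disguised` helper is hypothetical, and MIME aliases for the same format could produce false positives):

```python
import mimetypes


def is_disguised(declared_filename: str, result: dict) -> bool:
    """True when magic bytes contradict the type implied by the filename."""
    if result.get("method") != "magic-bytes":
        return False  # no content-based evidence to compare against
    declared_mime, _ = mimetypes.guess_type(declared_filename)
    if declared_mime is None:
        return False  # the filename declares nothing to contradict
    return declared_mime != result.get("mimeType")
```

Here `is_disguised("notes.txt", result)` is true for a result with `method` `magic-bytes` and `mimeType` `application/pdf`.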

Other utility actors

Looking for more file and data processing tools? Check out these automation-lab actors:

Color Contrast Checker (WCAG) — validate color pairs against WCAG 2.1 AA/AAA accessibility standards
JSON Schema Generator — generate JSON Schema from sample JSON data
Lazada Scraper — extract product listings and reviews from Lazada
