Vision OCR MCP

Pricing

from $0.99 / 1,000 results

Extract text from images instantly. Turn receipts, invoices, documents, and handwritten notes into structured data.

Rating

5.0

(1)

Developer

Acceleration

Maintained by Community

Vision OCR MCP Server

A Model Context Protocol server for extracting text from images. This server enables LLMs to read invoices, receipts, and documents in 100+ languages while preserving the original script.

About this MCP Server: For details on connecting to and using this MCP server, see the MCP documentation at mcp.apify.com.


Connection URL

MCP clients can connect to this server at:

https://accelerationengg--vision-ocr-mcp.apify.actor/mcp

Client Configuration

To connect to this MCP server, use the following configuration in your MCP client:

{
  "mcpServers": {
    "vision-ocr": {
      "url": "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Note: Replace YOUR_APIFY_TOKEN with your actual Apify API token. You can find your token in the Apify Console.
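
For scripted setups, the same configuration can be generated rather than edited by hand. The following is a minimal sketch, not part of the server itself: the build_client_config helper and the APIFY_TOKEN environment variable name are assumptions made for illustration.

```python
import json
import os

SERVER_URL = "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp"


def build_client_config(token: str, name: str = "vision-ocr") -> dict:
    """Return the mcpServers entry shown above with the token filled in."""
    if not token or token == "YOUR_APIFY_TOKEN":
        raise ValueError("set a real Apify API token")
    return {
        "mcpServers": {
            name: {
                "url": SERVER_URL,
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


if __name__ == "__main__":
    # APIFY_TOKEN is an assumed environment variable name.
    token = os.environ.get("APIFY_TOKEN", "")
    if token:
        print(json.dumps(build_client_config(token), indent=2))
```

Reading the token from the environment keeps it out of files that might be committed or shared.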


Claude Desktop Configuration

To use this MCP server with Claude Desktop, add the following configuration to your Claude Desktop settings:

Location: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)

{
  "mcpServers": {
    "apifyVisionOCR": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp",
        "--header",
        "Authorization: Bearer YOUR_APIFY_TOKEN"
      ]
    }
  }
}

Steps:

  1. Open the Claude Desktop configuration file at the location above
  2. Add the configuration with your Apify API token (replace YOUR_APIFY_TOKEN)
  3. Save the file
  4. Restart Claude Desktop
  5. The vision_ocr tool will now be available in your conversations
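
If the configuration file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The steps above could be sketched as follows; the add_vision_ocr_server helper is hypothetical, and the path you pass in is the platform-specific location given above.

```python
import json
from pathlib import Path

SERVER_URL = "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp"


def add_vision_ocr_server(config_path: Path, token: str) -> dict:
    """Merge the apifyVisionOCR entry into a Claude Desktop config file."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["apifyVisionOCR"] = {
        "command": "npx",
        "args": [
            "-y",
            "mcp-remote",
            SERVER_URL,
            "--header",
            f"Authorization: Bearer {token}",
        ],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Claude Desktop still needs a restart after the file is written.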

Available Tools

vision_ocr - Extracts structured data from images with language detection and price extraction.

Parameters:

  • images (array, required) - List of image URLs, file paths, or base64 strings (max 15)
  • output_format (string, optional) - "json" (default) or "toon" for compact output

Returns:

{
  "language_detected": "ur",
  "description_text": "رسید | تاریخ: ۲۰۲۶-۰۱-۰۴ | چائے",
  "price_1": "₨۵۰",
  "price_2": "₨۱۲۵"
}
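
Because prices arrive as separate numbered keys, client code usually wants them as a list. A minimal sketch, assuming the fields are always named price_1, price_2, and so on as in the example above (the collect_prices helper is illustrative, not part of the server):

```python
import re

_PRICE_KEY = re.compile(r"^price_(\d+)$")


def collect_prices(data: dict) -> list[str]:
    """Return the price_N values ordered by N, leaving other keys alone."""
    numbered = []
    for key, value in data.items():
        match = _PRICE_KEY.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    # Sort on the numeric suffix so price_10 comes after price_2.
    return [value for _, value in sorted(numbered, key=lambda pair: pair[0])]
```

The values are returned as the original strings, so currency symbols and native digits are preserved exactly as the server produced them.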

Features

  • Multilingual OCR - Urdu (اردو), Arabic (العربية), English, Chinese (中文), and 100+ languages
  • Price Detection - Automatically extracts prices from invoices/receipts
  • Layout Preservation - Maintains tables and columns with "|" separators
  • Batch Processing - Process up to 15 images in parallel
  • Fast - 12-15 seconds per image


Supported Formats

  • Images: PNG, JPG, JPEG, WEBP (GIF not supported)
  • Languages: 100+ including Urdu, Arabic, English, Chinese, Hindi, Spanish, French, German
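
A batch can be checked against these limits before it is sent. The sketch below is a client-side heuristic only: it looks at file extensions, which says nothing about how the server itself validates input, and the check_images helper is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

MAX_IMAGES = 15
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def check_images(images: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the batch looks fine."""
    problems = []
    if not images:
        problems.append("at least one image is required")
    if len(images) > MAX_IMAGES:
        problems.append(f"{len(images)} images given, limit is {MAX_IMAGES}")
    for image in images:
        # Only URLs and file paths carry an extension; base64 strings have
        # no dot in them, so they pass through unchecked.
        suffix = PurePosixPath(urlparse(image).path).suffix.lower()
        if suffix and suffix not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported format {suffix}: {image}")
    return problems
```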

Output Formats

The server supports two output formats optimized for different use cases:

JSON Format (Default)

Standard structured output - easiest to parse and integrate with applications.

Example Output:

{
  "model": "Qwen/Qwen3-VL-30B",
  "image_count": 1,
  "total_time_seconds": 3.91,
  "results": [
    {
      "index": 0,
      "data": {
        "language_detected": "ar",
        "description_text": "TURKISH CORNER Date:6/10/2019 Time:6:56 PM Table:B12 Ticket No:243 -1Homus حمص 1-Mutabel متبل 1-Baba Ghanouj بابا غنوج 1-Fatoush فتوش 1-Olive Salad سلطة زيتون 1-Green Salad سلطة خضراء 1-Grapes Leaves ورق عنب 1-Tabouleh تبولة 1-Vegetable with Youghurt Salad سلطة خضار باللبن 1-Hot Salad سلطة حارة Total: 8.00 Cash 8.00 THANK YOU",
        "price_1": "8.00",
        "price_2": "8.00"
      },
      "processing_time": 3.91
    }
  ]
}
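
For application code, the JSON output can be loaded into typed records instead of being indexed by string keys throughout. This is a sketch based only on the fields visible in the example above; the OcrResult class and parse_output function are illustrative names, and the real schema may contain more fields.

```python
import json
from dataclasses import dataclass


@dataclass
class OcrResult:
    index: int
    language: str
    text: str
    processing_time: float


def parse_output(raw: str) -> list[OcrResult]:
    """Turn the JSON output shown above into one record per image."""
    payload = json.loads(raw)
    return [
        OcrResult(
            index=entry["index"],
            language=entry["data"]["language_detected"],
            text=entry["data"]["description_text"],
            processing_time=entry["processing_time"],
        )
        for entry in payload["results"]
    ]
```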

TOON Format (Token-Efficient)

Compact notation that uses about 30% fewer tokens than JSON - ideal for LLM processing and cost optimization.

Example Output:

model: Qwen/Qwen3-VL-30B-A3B-Instruct
image_count: 1
total_time_seconds: 3.76
results:
[1]{index,data,processing_time}:
0,{'language_detected': 'ar', 'description_text': 'TURKISH CORNER Date:6/10/2019 Time:6:56 PM Table:B12 Ticket No:243 -1Homus حمص 1-Mutabel متبل 1-Baba Ghanouj بابا غنوج 1-Fatoush فتوش 1-Olive Salad سلطة زيتون 1-Green Salad سلطة خضراء 1-Grapes Leaves ورق عنب 1-Tabouleh تبولة 1-Vegetable with Youghurt Salad سلطة خضار باللبن 1-Hot Salad سلطة حارة Total: 8.00 Cash 8.00 THANK YOU', 'price_1': '8.00', 'price_2': '8.00'},3.76

When to use each format:

  • JSON: Standard API integration, automated parsing, strict schema validation
  • TOON: Sending to LLMs for analysis, reducing token costs, human-readable logs
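
The actual saving depends on the content, so it is worth measuring on your own documents. A rough sketch, assuming you have the same result in both formats: it compares character counts, which are only a proxy for tokens, and the relative_saving helper is a hypothetical name.

```python
def relative_saving(json_text: str, toon_text: str) -> float:
    """Fraction by which the TOON text is shorter than the JSON text."""
    if not json_text:
        raise ValueError("json_text must not be empty")
    return 1 - len(toon_text) / len(json_text)
```

For exact figures, count tokens with the tokenizer of the model you intend to call instead of counting characters.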

Use Cases

  • Financial documents: Invoices, receipts, bills
  • Multi-column tables: Spreadsheets, reports
  • Multilingual documents: Documents with Arabic, Urdu, Chinese, and other scripts
  • Form extraction: Structured data from forms

Python API Usage

Installation

$ pip install apify-client

Basic Example

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Extract text from an image
run_input = {
    "images": ["https://example.com/receipt.jpg"],
    "output_format": "json",  # or "toon"
}
run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

# Get results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    text = item['results'][0]['data']['description_text']
    language = item['results'][0]['data']['language_detected']
    print(f"Language: {language}\nText: {text}")

Batch Processing

# Process multiple images (reuses the client from the basic example)
run_input = {
    "images": [
        "https://example.com/invoice1.jpg",
        "https://example.com/invoice2.jpg",
        "https://example.com/invoice3.jpg",
    ],
    "output_format": "json",
}
run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    for result in item['results']:
        print(f"Image {result['index']}: {result['data']['description_text'][:100]}...")
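
Since a single call accepts at most 15 images, a longer list has to be split across several runs. A minimal sketch of that splitting step; the batches helper is illustrative and makes no network calls itself.

```python
from typing import Iterator

MAX_IMAGES_PER_RUN = 15


def batches(images: list[str], size: int = MAX_IMAGES_PER_RUN) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` images."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(images), size):
        yield images[start:start + size]
```

Each yielded list can be used as the "images" value of a separate run_input, with the results collected per run.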

TOON Format for LLM Processing

# Use TOON format to save ~30% tokens (reuses the client from the basic example)
run_input = {
    "images": ["https://example.com/receipt.jpg"],
    "output_format": "toon",
}
run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    toon_output = item['content']
    # Send directly to Claude for analysis
    # Uses ~30% fewer tokens than JSON
    # response = claude.messages.create(
    #     model="claude-3-5-sonnet-20241022",
    #     messages=[{
    #         "role": "user",
    #         "content": f"Analyze this receipt:\n{toon_output}"
    #     }]
    # )

Example Usage

Single Image

Extract text from this receipt:
https://example.com/receipt.jpg

Multiple Images

Process these invoices:
- https://example.com/invoice1.jpg
- https://example.com/invoice2.jpg
- https://example.com/invoice3.jpg

Built with Qwen-VL, FastMCP, Apify