Pricing

from $0.99 / 1,000 results

Vision OCR MCP

Extract text from images instantly. Turn receipts, invoices, documents, and handwritten notes into structured data.

Rating: 5.0 (1)

Developer: Acceleration

Last modified: 2 months ago

Vision OCR MCP Server

A Model Context Protocol server for extracting text from images. This server enables LLMs to read invoices, receipts, and documents in 100+ languages while preserving the original script.

About this MCP Server: For instructions on connecting to and using this MCP server, refer to the Apify MCP documentation at mcp.apify.com.

Connection URL

MCP clients can connect to this server at:

https://accelerationengg--vision-ocr-mcp.apify.actor/mcp

Client Configuration

To connect to this MCP server, use the following configuration in your MCP client:

{
  "mcpServers": {
    "vision-ocr": {
      "url": "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Note: Replace YOUR_APIFY_TOKEN with your actual Apify API token. You can find your token in the Apify Console.
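To keep the token out of a hand-edited file, the same configuration can be generated from code. The following is a minimal sketch, not part of the server itself: the function name `build_client_config` and the environment variable `APIFY_TOKEN` are assumptions chosen for illustration, while the URL and JSON shape are copied from the configuration above.

```python
import json
import os

# Endpoint taken from the Connection URL section above.
SERVER_URL = "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp"


def build_client_config(token: str, server_name: str = "vision-ocr") -> dict:
    """Return an mcpServers entry shaped like the JSON configuration above."""
    if not token:
        raise ValueError("an Apify API token is required")
    return {
        "mcpServers": {
            server_name: {
                "url": SERVER_URL,
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


if __name__ == "__main__":
    # APIFY_TOKEN is an assumed variable name; the placeholder keeps the sketch runnable.
    token = os.environ.get("APIFY_TOKEN", "YOUR_APIFY_TOKEN")
    print(json.dumps(build_client_config(token), indent=2))
```

Paste the printed JSON into your MCP client's configuration, or write it to the file the client reads.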

Claude Desktop Configuration

To use this MCP server with Claude Desktop, add the following configuration to your Claude Desktop settings:

Location: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows)

{
  "mcpServers": {
    "apifyVisionOCR": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp",
        "--header",
        "Authorization: Bearer YOUR_APIFY_TOKEN"
      ]
    }
  }
}

Steps:

1. Open the Claude Desktop configuration file at the location above.
2. Add the configuration, replacing YOUR_APIFY_TOKEN with your Apify API token.
3. Save the file.
4. Restart Claude Desktop.
5. After the restart, the vision_ocr tool is available in your conversations.
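If the configuration file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that with the standard library. The helper names `add_vision_ocr_server` and `update_config_file` are hypothetical; the entry itself mirrors the Claude Desktop configuration shown above.

```python
import json
from pathlib import Path

SERVER_URL = "https://accelerationengg--vision-ocr-mcp.apify.actor/mcp"


def add_vision_ocr_server(config: dict, token: str) -> dict:
    """Return a copy of a Claude Desktop config with the apifyVisionOCR entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep any existing servers
    servers["apifyVisionOCR"] = {
        "command": "npx",
        "args": [
            "-y",
            "mcp-remote",
            SERVER_URL,
            "--header",
            f"Authorization: Bearer {token}",
        ],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, token: str) -> None:
    """Read the config file if it exists, add the entry, and write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    updated = add_vision_ocr_server(existing, token)
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Call `update_config_file` with the path for your platform from the Location line above, then restart Claude Desktop.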

Available Tools

vision_ocr - Extracts structured data from images with language detection and price extraction.

Parameters:

images (array, required) - List of image URLs, file paths, or base64 strings (max 15)
output_format (string, optional) - "json" (default) or "toon" for compact output
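Checking the arguments before sending them avoids spending a paid call on a request that breaks the documented limits. This is a minimal client-side sketch; `build_tool_arguments` is a hypothetical helper, and the server may apply additional checks of its own.

```python
MAX_IMAGES = 15                      # limit stated in the parameter list
OUTPUT_FORMATS = ("json", "toon")    # values stated in the parameter list


def build_tool_arguments(images, output_format="json"):
    """Validate and return the arguments for a vision_ocr call."""
    images = list(images)
    if not images:
        raise ValueError("images is required and must not be empty")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images per call, got {len(images)}")
    if not all(isinstance(item, str) and item for item in images):
        raise ValueError("each image must be a non-empty string")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    return {"images": images, "output_format": output_format}
```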

Returns:

{
  "language_detected": "ur",
  "description_text": "رسید | تاریخ: ۲۰۲۶-۰۱-۰۴ | چائے",
  "price_1": "₨۵۰",
  "price_2": "₨۱۲۵"
}
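The result is a flat object, so the prices have to be gathered from the numbered keys. The sketch below assumes that price fields always follow the `price_N` pattern seen in the example and that "|" separates cells in `description_text`; neither is a documented guarantee. Both helper names are hypothetical.

```python
import re

_PRICE_KEY = re.compile(r"^price_(\d+)$")


def split_cells(description_text: str) -> list[str]:
    """Split the extracted text on the "|" separators and trim each cell."""
    return [cell.strip() for cell in description_text.split("|")]


def collect_prices(data: dict) -> list[str]:
    """Return the price_N values in numeric order of N."""
    numbered = []
    for key, value in data.items():
        match = _PRICE_KEY.match(key)
        if match:
            numbered.append((int(match.group(1)), value))
    numbered.sort(key=lambda pair: pair[0])
    return [value for _, value in numbered]
```

Prices are returned as the strings the server produced, including any currency symbol and native digits, so convert them to numbers only after deciding how to handle each script.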

Features

✅ Multilingual OCR - Urdu (اردو), Arabic (العربية), English, Chinese (中文), and 100+ languages
✅ Price Detection - Automatically extracts prices from invoices/receipts
✅ Layout Preservation - Maintains tables and columns with "|" separators
✅ Batch Processing - Process up to 15 images in parallel
✅ Fast - 12-15 seconds per image
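Because one call accepts at most 15 images, a larger collection has to be split into several calls. A minimal sketch of that split follows; `batched` is a hypothetical helper, and how the batches are then submitted is left to the caller.

```python
from typing import Iterator, Sequence

MAX_IMAGES_PER_CALL = 15  # batch limit stated in the feature list


def batched(images: Sequence[str], size: int = MAX_IMAGES_PER_CALL) -> Iterator[list[str]]:
    """Yield consecutive slices of `images`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(images), size):
        yield list(images[start:start + size])
```

Each yielded list can be passed as the `images` value of a separate call.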

Supported Formats

Images: PNG, JPG, JPEG, WEBP (GIF not supported)
Languages: 100+ including Urdu, Arabic, English, Chinese, Hindi, Spanish, French, German
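A quick extension check can filter out unsupported files such as GIFs before they are submitted. This sketch only inspects the file name, which is an assumption about how you want to pre-filter: base64 strings and URLs without an extension cannot be judged this way and will be reported as unsupported. `has_supported_extension` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the Images line above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def has_supported_extension(image_ref: str) -> bool:
    """Return True if the URL or path ends in a supported image extension."""
    path = urlparse(image_ref).path  # drops any query string or fragment
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```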

Output Formats

The server supports two output formats optimized for different use cases:

JSON Format (Default)

Standard structured output - easiest to parse and integrate with applications.

Example Output:

{
  "model": "Qwen/Qwen3-VL-30B",
  "image_count": 1,
  "total_time_seconds": 3.91,
  "results": [
    {
      "index": 0,
      "data": {
        "language_detected": "ar",
        "description_text": "TURKISH CORNER Date:6/10/2019 Time:6:56 PM Table:B12 Ticket No:243 -1Homus حمص 1-Mutabel متبل 1-Baba Ghanouj بابا غنوج 1-Fatoush فتوش 1-Olive Salad سلطة زيتون 1-Green Salad سلطة خضراء 1-Grapes Leaves ورق عنب 1-Tabouleh تبولة 1-Vegetable with Youghurt Salad سلطة خضار باللبن 1-Hot Salad سلطة حارة Total: 8.00 Cash 8.00 THANK YOU",
        "price_1": "8.00",
        "price_2": "8.00"
      },
      "processing_time": 3.91
    }
  ]
}
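A response in this shape can be flattened into one record per image with the standard `json` module. The sketch below assumes the field names shown in the example output; `summarize_response` and the keys of the returned records are hypothetical.

```python
import json


def summarize_response(raw: str) -> list[dict]:
    """Parse a JSON response and return one summary record per image."""
    payload = json.loads(raw)
    summaries = []
    for result in payload.get("results", []):
        data = result.get("data", {})
        summaries.append({
            "index": result.get("index"),
            "language": data.get("language_detected"),
            "text": data.get("description_text", ""),
            "seconds": result.get("processing_time"),
        })
    return summaries
```

Using `.get` keeps the function from raising if a field is missing from a particular result.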

TOON Format (Token-Efficient)

Compact notation that uses roughly 30% fewer tokens than JSON. Suited to LLM processing and cost optimization.

Example Output:

model: Qwen/Qwen3-VL-30B-A3B-Instruct
image_count: 1
total_time_seconds: 3.76
results:
  [1]{index,data,processing_time}:
    0,{'language_detected': 'ar', 'description_text': 'TURKISH CORNER Date:6/10/2019 Time:6:56 PM Table:B12 Ticket No:243 -1Homus حمص 1-Mutabel متبل 1-Baba Ghanouj بابا غنوج 1-Fatoush فتوش 1-Olive Salad سلطة زيتون 1-Green Salad سلطة خضراء 1-Grapes Leaves ورق عنب 1-Tabouleh تبولة 1-Vegetable with Youghurt Salad سلطة خضار باللبن 1-Hot Salad سلطة حارة Total: 8.00 Cash 8.00 THANK YOU', 'price_1': '8.00', 'price_2': '8.00'},3.76
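If only the top-level fields are needed, they can be read from the TOON text without a full parser. The sketch below is not a TOON parser: it assumes the layout of the example above, where scalar `key: value` lines come first and a bare `results:` line opens the nested block. `read_toon_header` is a hypothetical helper, and all values are returned as strings.

```python
def read_toon_header(toon_text: str) -> dict:
    """Collect the leading top-level `key: value` lines of a TOON document."""
    header = {}
    for line in toon_text.splitlines():
        if not line.strip():
            continue
        if line.startswith((" ", "\t")):
            break  # indented rows belong to a nested block
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if not value:
            break  # a bare "key:" line (such as "results:") opens a nested block
        header[key.strip()] = value
    return header
```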

When to use each format:

JSON: Standard API integration, automated parsing, strict schema validation
TOON: Sending to LLMs for analysis, reducing token costs, human-readable logs

Use Cases

Financial documents: Invoices, receipts, bills
Multi-column tables: Spreadsheets, reports
Multilingual documents: Documents with Arabic, Urdu, Chinese, and other scripts
Form extraction: Structured data from forms

Python API Usage

Installation

pip install apify-client

Basic Example

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Extract text from image
run_input = {
    "images": ["https://example.com/receipt.jpg"],
    "output_format": "json"  # or "toon"
}

run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

# Get results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    text = item['results'][0]['data']['description_text']
    language = item['results'][0]['data']['language_detected']
    print(f"Language: {language}\nText: {text}")

Batch Processing

# Process multiple images (reuses the `client` created in the Basic Example)
run_input = {
    "images": [
        "https://example.com/invoice1.jpg",
        "https://example.com/invoice2.jpg",
        "https://example.com/invoice3.jpg"
    ],
    "output_format": "json"
}

run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    for result in item['results']:
        print(f"Image {result['index']}: {result['data']['description_text'][:100]}...")

TOON Format for LLM Processing

# Use TOON format to save ~30% tokens (reuses the `client` created in the Basic Example)
run_input = {
    "images": ["https://example.com/receipt.jpg"],
    "output_format": "toon"
}

run = client.actor("accelerationengg/vision-ocr-mcp").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    toon_output = item['content']
    
    # Send directly to Claude for analysis
    # Uses ~30% fewer tokens than JSON
    # response = claude.messages.create(
    #     model="claude-3-5-sonnet-20241022",
    #     messages=[{
    #         "role": "user",
    #         "content": f"Analyze this receipt:\n{toon_output}"
    #     }]
    # )

Example Usage

Single Image

Extract text from this receipt:
https://example.com/receipt.jpg

Multiple Images

Process these invoices:
- https://example.com/invoice1.jpg
- https://example.com/invoice2.jpg
- https://example.com/invoice3.jpg

Built with Qwen-VL, FastMCP, Apify
