Pricing

from $30.00 / 1,000 images processed

Image OCR Scraper

Extract text from any image. Bulk OCR for screenshots, scanned documents, receipts, signs, and photos. Supports 109 languages and outputs clean Markdown or structured JSON with bounding boxes.

Rating

0.0 (0 ratings)

Developer

Andrew

Last modified

5 days ago

What you get

Clean extracted text in Markdown format (preserves headings, lists, tables) or JSON (with bounding boxes for each text block)
Accurate recognition powered by PaddleOCR - handles printed and handwritten text, multiple languages mixed in one image, rotated text, and low-quality scans
One output record per image with the extracted text and source URL
Bulk processing - pass up to 50 image URLs and get all text in a single run
Export to JSON or CSV directly from the Apify console

Use cases

Document digitization - turn scanned pages (saved as images), receipts, and invoices into searchable text
Data entry automation - pull structured data from forms, business cards, and ID documents
Screenshot analysis - extract text from app screenshots, error messages, or social media images
Translation pipelines - OCR a foreign-language image, then run the text through a translator
Accessibility - generate text alternatives for image-based content
Compliance & search - make image-heavy archives full-text searchable

How to use

Paste one or more image URLs into the Images field (or upload files directly)
Choose Output Format - Markdown for human-readable text, JSON for structured data with positions
Click Run - extracted text appears in the Dataset tab when complete
Export results as JSON or CSV, or connect to downstream actors via the Apify API
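The API route in the last step can be sketched in Python with only the standard library. This is a minimal sketch, not a documented client for this actor: it targets Apify's generic run-sync-get-dataset-items endpoint, while the actor ID and the input keys `images` and `outputFormat` are assumptions, so check the actor's input schema before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Apify's synchronous endpoint: starts the actor, waits for it, returns dataset items.
APIFY_RUN_SYNC = "https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"


def build_run_request(actor_id: str, token: str, image_urls: list[str],
                      output_format: str = "markdown") -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor synchronously.

    ASSUMPTION: the input keys "images" and "outputFormat" are hypothetical;
    the real field names come from the actor's input schema.
    """
    payload = {"images": image_urls, "outputFormat": output_format}
    # Apify actor IDs use "username~actor-name"; "/" would be percent-encoded here.
    url = APIFY_RUN_SYNC.format(actor_id=urllib.parse.quote(actor_id, safe="~"))
    url += "?" + urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(request: urllib.request.Request) -> list[dict]:
    """Send the request and decode the JSON array of dataset records."""
    with urllib.request.urlopen(request, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_actor(build_run_request(...))` blocks until the run finishes, which suits batches within the 50-image limit. Keep the API token out of source control.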

Output format

Each dataset record:

{
  "inputImageUrl": "https://example.com/receipt.jpg",
  "text": "# Receipt\n\n**Store:** Example Mart\n**Date:** 2026-05-10\n\n| Item | Price |\n|------|-------|\n| Coffee | $4.50 |\n| Bagel | $3.25 |\n\n**Total:** $7.75",
  "outputFormat": "markdown",
  "status": "success",
  "error": null
}

When outputFormat is json, the text field contains structured JSON with text blocks and bounding-box coordinates instead of Markdown.
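A sketch for post-processing downloaded records: it groups them by `inputImageUrl` and separates successful extractions from failures, using only the fields in the sample record. Treating any `status` other than `success` as a failure, with the message in `error`, is an assumption inferred from the field names; the listing does not document the failure values.

```python
def split_results(records: list[dict]) -> tuple[dict[str, str], dict[str, str]]:
    """Group dataset records by source URL into (texts, errors).

    ASSUMPTION: any status other than "success" is treated as a failure,
    with the message taken from the "error" field.
    """
    texts: dict[str, str] = {}
    errors: dict[str, str] = {}
    for rec in records:
        url = rec.get("inputImageUrl", "")
        if rec.get("status") == "success":
            texts[url] = rec.get("text") or ""
        else:
            errors[url] = rec.get("error") or "unknown error"
    return texts, errors


sample = [{
    "inputImageUrl": "https://example.com/receipt.jpg",
    "text": "# Receipt\n\n**Total:** $7.75",
    "outputFormat": "markdown",
    "status": "success",
    "error": None,
}]
texts, errors = split_results(sample)
print(texts["https://example.com/receipt.jpg"].splitlines()[0])  # prints "# Receipt"
```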

Input options

Field	Type	Description
Images	URL list	One or more `http/https` image URLs or base64 data URIs
Upload Images	File upload	Upload images directly from your computer
Output Format	Select	`Markdown` (clean readable text) or `JSON` (structured with bounding boxes) - default: Markdown
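For images stored locally rather than at a public URL, the Images field also accepts base64 data URIs. A minimal Python sketch for producing one from raw bytes or a local file; the helper names are hypothetical, and the MIME type is guessed from the file extension.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw image bytes as a base64 data URI (RFC 2397 form)."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read a local image and encode it, guessing the MIME type from the extension."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image type: {path}")
    return to_data_uri(Path(path).read_bytes(), mime_type)
```

Data URIs grow the payload by roughly a third compared to the raw bytes, so public URLs are usually the lighter option for large batches.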

Limits

Maximum 50 images per run
Each image must be a publicly accessible URL or a base64 data URI
Processing time is typically under 1 second per image
Supported languages: 109, including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hindi, and more
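Because each run accepts at most 50 images and each entry must be an http/https URL or a base64 data URI, a longer list has to be filtered and split across several runs. A minimal sketch under those two limits; the helper names are hypothetical, and the check is purely syntactic, so it cannot confirm that a URL is actually publicly accessible.

```python
from urllib.parse import urlparse

MAX_IMAGES_PER_RUN = 50  # limit stated for this actor


def is_valid_image_ref(ref: str) -> bool:
    """Accept http/https URLs that have a host, or base64 data URIs."""
    if ref.startswith("data:"):
        return ";base64," in ref
    parsed = urlparse(ref)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def batch_images(refs: list[str], size: int = MAX_IMAGES_PER_RUN) -> list[list[str]]:
    """Drop invalid references and split the rest into chunks of at most `size`."""
    valid = [r for r in refs if is_valid_image_ref(r)]
    return [valid[i:i + size] for i in range(0, len(valid), size)]
```

Each chunk can then be submitted as a separate run, and the resulting datasets concatenated.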

Part of a complete AI image toolkit - explore the rest of the suite:

AI Image Background Remover - Remove backgrounds to clean transparent PNGs
AI Image Upscaler - Batch-upscale images to 4K or 8K
AI Image Watermark Remover - Remove text and logo watermarks from images
AI Image Captioner - Generate text descriptions for any image
Photo Location Finder - Find where a photo was taken - no EXIF needed
