Image OCR Scraper
Pricing: from $30.00 / 1,000 images processed
Status: Under maintenance
Developer: Andrew
Last modified: 2 days ago
Extract text from any image — bulk OCR for screenshots, scanned documents, receipts, signs, menus, business cards, and photos. Supports 109 languages with text, table, formula, and chart recognition.
What you get
- Clean extracted text in Markdown format (preserves headings, lists, tables) or JSON (with bounding boxes for each text block)
- Accurate recognition powered by PaddleOCR — handles printed and handwritten text, multiple languages mixed in one image, rotated text, and low-quality scans
- One output record per image with the extracted text and source URL
- Bulk processing — pass up to 50 image URLs and get all text in a single run
- Export to JSON or CSV directly from the Apify console
Use cases
- Document digitization — turn scanned PDFs, receipts, and invoices into searchable text
- Data entry automation — pull structured data from forms, business cards, and ID documents
- Screenshot analysis — extract text from app screenshots, error messages, or social media images
- Translation pipelines — OCR a foreign-language image, then run the text through a translator
- Accessibility — generate text alternatives for image-based content
- Compliance & search — make image-heavy archives full-text searchable
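For the data-entry use case, the Markdown output can be post-processed locally. The sketch below pulls the first pipe table out of a record's text field into a list of row dictionaries, using the receipt text from the sample record in the Output format section. The helper name parse_markdown_table is hypothetical, and it assumes simple single-line pipe tables with no escaped | characters inside cells.

```python
from __future__ import annotations

from decimal import Decimal


def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Return rows of the first pipe table in `markdown` as dicts keyed by header cell.

    Hypothetical helper: assumes single-line rows that start and end with "|"
    and no escaped pipes inside cells.
    """
    rows: list[list[str]] = []
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("|") and line.endswith("|"):
            rows.append([cell.strip() for cell in line.strip("|").split("|")])
        elif rows:
            break  # the first table has ended; ignore anything after it
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Skip the header delimiter row, e.g. |------|-------| or |:---|---:|
    body = [r for r in body if not all(c and set(c) <= set("-:") for c in r)]
    return [dict(zip(header, r)) for r in body]


receipt_text = (
    "# Receipt\n\n**Store:** Example Mart\n**Date:** 2026-05-10\n\n"
    "| Item | Price |\n|------|-------|\n| Coffee | $4.50 |\n| Bagel | $3.25 |\n\n"
    "**Total:** $7.75"
)
items = parse_markdown_table(receipt_text)
# Decimal avoids floating-point rounding when summing currency strings.
total = sum(Decimal(row["Price"].lstrip("$")) for row in items)
print(items)
print(total)  # 7.75
```

Cross-checking the parsed total against the **Total:** line in the same text is a cheap way to flag OCR misreads of digits.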
How to use
- Paste one or more image URLs into the Images field, or upload files through the Upload Images field
- Choose Output Format — Markdown for human-readable text, JSON for structured data with positions
- Click Run — extracted text appears in the Dataset tab when complete
- Export results as JSON or CSV, or connect to downstream actors via the Apify API
Output format
Each dataset record:
{
  "inputImageUrl": "https://example.com/receipt.jpg",
  "text": "# Receipt\n\n**Store:** Example Mart\n**Date:** 2026-05-10\n\n| Item | Price |\n|------|-------|\n| Coffee | $4.50 |\n| Bagel | $3.25 |\n\n**Total:** $7.75",
  "outputFormat": "markdown",
  "status": "success",
  "error": null
}
When outputFormat is json, the text field contains structured JSON with text blocks and bounding-box coordinates instead of Markdown.
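A minimal sketch of consuming an exported dataset: it maps each inputImageUrl to its extracted text and collects the remaining records for a retry. It relies only on the fields shown in the sample record. The assumptions are that the JSON export is an array of such records and that any status other than "success" means the image failed; this page documents no failure values, so the "failed" status and "HTTP 404" error below are invented for illustration.

```python
from __future__ import annotations

import json
from typing import Any


def index_results(export_json: str) -> tuple[dict[str, str], list[dict[str, Any]]]:
    """Map inputImageUrl -> text for successful records and collect the rest.

    Assumes `export_json` is a JSON array of records shaped like the sample record.
    Only "success" is a documented status value, so anything else counts as a failure.
    """
    records: list[dict[str, Any]] = json.loads(export_json)
    texts: dict[str, str] = {}
    failed: list[dict[str, Any]] = []
    for record in records:
        if record.get("status") == "success" and record.get("text") is not None:
            texts[record["inputImageUrl"]] = record["text"]
        else:
            failed.append(record)
    return texts, failed


export = json.dumps([
    {"inputImageUrl": "https://example.com/receipt.jpg", "text": "# Receipt",
     "outputFormat": "markdown", "status": "success", "error": None},
    # Hypothetical failure record: the status and error values are invented.
    {"inputImageUrl": "https://example.com/missing.jpg", "text": None,
     "outputFormat": "markdown", "status": "failed", "error": "HTTP 404"},
])
texts, failed = index_results(export)
print(texts)
print([r["inputImageUrl"] for r in failed])
```

Keying by inputImageUrl assumes each URL appears once per run; if the same image is submitted twice, the later record overwrites the earlier one.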
Input options
| Field | Type | Description |
|---|---|---|
| Images | URL list | One or more http/https image URLs or base64 data URIs |
| Upload Images | File upload | Upload images directly from your computer |
| Output Format | Select | Markdown (clean readable text) or JSON (structured with bounding boxes) — default: Markdown |
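The Images field also accepts base64 data URIs, which is useful when images are local files rather than hosted URLs. The sketch below encodes raw bytes as a data URI and assembles a run-input dictionary. The input keys images and outputFormat are assumptions: the JSON key behind the Images field is not documented here, and outputFormat with the lowercase values markdown and json is borrowed from the output record.

```python
from __future__ import annotations

import base64
import mimetypes


def to_data_uri(data: bytes, filename: str) -> str:
    """Encode image bytes as a base64 data URI, inferring the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def build_run_input(images: list[str], output_format: str = "markdown") -> dict:
    """Assemble a run input. The key names "images" and "outputFormat" are hypothetical."""
    if output_format not in ("markdown", "json"):
        raise ValueError("output_format must be 'markdown' or 'json'")
    return {"images": list(images), "outputFormat": output_format}


# In practice the bytes would come from open("receipt.png", "rb").read().
run_input = build_run_input([
    "https://example.com/receipt.jpg",
    to_data_uri(b"abc", "receipt.png"),
])
print(run_input)
```

Data URIs grow the payload by roughly a third compared with the raw bytes, so hosted URLs are usually the lighter option for large scans.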
Limits
- Maximum 50 images per run
- Each image must be a publicly accessible URL or a base64 data URI
- Processing time is typically under 1 second per image
- Supported languages: 109, including English, Spanish, French, German, Chinese, Japanese, Korean, Arabic, Hindi, and more
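Because a run accepts at most 50 images, a longer list has to be split across several runs. This sketch checks each source against the two documented forms (http/https URL or base64 data URI) and chunks the list into batches of 50. It checks syntax only, not whether a URL is actually publicly reachable, and the helper names is_supported_source and batch_sources are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlparse

MAX_IMAGES_PER_RUN = 50  # documented per-run limit


def is_supported_source(src: str) -> bool:
    """True for http/https URLs with a host, or base64 image data URIs (syntax check only)."""
    if src.startswith("data:image/"):
        return ";base64," in src
    parsed = urlparse(src)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def batch_sources(sources: list[str], size: int = MAX_IMAGES_PER_RUN) -> list[list[str]]:
    """Validate every source, then split the list into chunks of at most `size`."""
    if not 1 <= size <= MAX_IMAGES_PER_RUN:
        raise ValueError(f"size must be between 1 and {MAX_IMAGES_PER_RUN}")
    bad = [s for s in sources if not is_supported_source(s)]
    if bad:
        raise ValueError(f"unsupported image sources: {bad}")
    return [sources[i:i + size] for i in range(0, len(sources), size)]


urls = [f"https://example.com/page-{n}.png" for n in range(120)]
batches = batch_sources(urls)
print([len(b) for b in batches])  # [50, 50, 20]
```

Validating the whole list before splitting means a single bad entry fails fast, instead of surfacing partway through a sequence of runs.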