Image to Text (OCR)

Extract text from images using the Tesseract.js OCR engine. Supports 100+ languages and bulk image processing.

Pricing: $5.20 / 1,000 image OCRs

Developer: junipr (Maintained by Community)

Image to Text API (OCR)

Introduction

Extract text from images using Tesseract OCR — fully self-contained, no external API keys required. This actor processes single images or batches of up to 500, returning clean extracted text with confidence scores at the word, line, and block level. Supported formats include JPEG, PNG, WebP, TIFF, BMP, and GIF.

Primary use cases: document digitization, receipt and invoice processing, screenshot text extraction, product label reading, and automated data entry from image-based sources.

Key differentiators: 100+ languages supported out of the box, word-level confidence scoring and bounding boxes for spatial analysis, automatic image preprocessing (deskew, contrast enhancement, binarization) for improved accuracy on low-quality scans, and zero infrastructure setup — just provide image URLs.


Why Use This Actor

No external API key required. Unlike Google Vision or AWS Textract, this actor uses Tesseract.js — a fully self-contained WASM OCR engine. There's no GCP project, no AWS account, no credit card required beyond your Apify plan.

Cost comparison: At $5.20 per 1,000 images, pricing is flat and all-inclusive. Raw per-image rates elsewhere can be lower (Google Vision and AWS Textract are around $1.50/1K; OCR.space starts at $3/1K), but those rates exclude account setup, separate cloud billing, and the infrastructure to call them. This actor's price covers everything on your existing Apify plan.

100+ languages included. Most online OCR tools support 25 or fewer languages. This actor ships with Tesseract's full language library — including CJK scripts (Chinese, Japanese, Korean), Arabic, Hindi, and dozens of European languages. Multi-language documents (e.g., eng+fra) are supported in a single run.

Structured output with confidence filtering. Get word-level confidence scores and pixel-accurate bounding boxes. Filter out low-confidence words automatically using minConfidence. Use bounding boxes to reconstruct document layout or extract specific regions.

Auto preprocessing improves accuracy. The actor automatically resizes small images, enhances contrast, binarizes, and corrects EXIF orientation before OCR. This produces noticeably better results on scanned documents, screenshots, and photos taken in poor lighting — without any manual image editing.


How to Use

Zero-config example: Provide an array of image URLs and the actor will extract text using English OCR with preprocessing enabled.

{
  "images": [
    { "url": "https://example.com/invoice.jpg" },
    { "url": "https://example.com/receipt.png" }
  ]
}

Japanese text extraction:

{
  "images": [{ "url": "https://example.com/japanese-doc.jpg" }],
  "language": "jpn",
  "outputLevel": "full"
}

Calling via Apify API (Node.js):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('junipr/image-to-text').call({
  images: [{ url: 'https://example.com/image.png' }],
  language: 'eng',
  outputLevel: 'text',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].text);

Filtering low-confidence results: Set minConfidence: 70 to only keep words the OCR engine is confident about. Useful for noisy or degraded images where some text is too blurry to read reliably.
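
If you prefer to keep everything in the dataset and filter afterwards, the same cut can be applied client-side. Below is a minimal sketch, assuming the blocks/paragraphs/lines/words shape shown under Output Format; the sample `result` object and the `filterWordsByConfidence` helper are illustrative, not part of the actor's API.

```javascript
// Sketch: client-side confidence filtering of a "full"-level result.
// Walks the nested blocks -> paragraphs -> lines -> words hierarchy and
// keeps only words at or above the threshold.
function filterWordsByConfidence(result, minConfidence) {
  const kept = [];
  for (const block of result.blocks ?? []) {
    for (const para of block.paragraphs ?? []) {
      for (const line of para.lines ?? []) {
        for (const word of line.words ?? []) {
          if (word.confidence >= minConfidence) kept.push(word.text);
        }
      }
    }
  }
  return kept.join(' ');
}

// Illustrative result object, shaped like the "full" output documented below.
const result = {
  blocks: [{ paragraphs: [{ lines: [{ words: [
    { text: 'Total:', confidence: 96.1 },
    { text: '$248.50', confidence: 91.4 },
    { text: '~?~', confidence: 12.3 }, // noise the engine barely recognized
  ] }] }] }],
};

console.log(filterWordsByConfidence(result, 70)); // "Total: $248.50"
```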


Input Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| images | array | required | Array of {url} or {kvStoreKey} objects. Max 500. |
| language | string | "eng" | Tesseract language code. Use "eng+fra" for multiple. |
| ocrEngine | string | "lstm" | "lstm" (best), "legacy" (faster), "combined" (highest quality). |
| pageSegMode | integer | 3 | Page segmentation mode. 3=auto, 6=block, 7=line, 11=sparse. |
| whitelist | string | — | Only recognize these characters (e.g. digits for receipts). |
| blacklist | string | — | Never output these characters. |
| preprocess | boolean | true | Enable auto image preprocessing for better accuracy. |
| enhanceContrast | boolean | true | Normalize image contrast. |
| binarize | boolean | true | Convert to black and white before OCR. |
| scale | number | auto | Scale factor (0.5–4.0). Auto-scales small images. |
| denoise | boolean | false | Apply median denoising for scanned images. |
| outputLevel | string | "text" | "text", "lines", "words", or "full" (with bounding boxes). |
| minConfidence | integer | 0 | Exclude words below this confidence (0–100). |
| includeRawHocr | boolean | false | Include raw hOCR XML in output (requires outputLevel: "full"). |
| maxConcurrency | integer | 2 | Max simultaneous OCR operations (1–5). |
| imageTimeout | integer | 30000 | Image download timeout in ms. |
| ocrTimeout | integer | 60000 | OCR processing timeout per image in ms. |

Common configurations:

Quick text extraction (defaults):

{ "images": [{ "url": "..." }] }

High-quality OCR with bounding boxes:

{ "images": [{ "url": "..." }], "outputLevel": "full", "preprocess": true, "denoise": true }

Numbers only (receipts, invoices):

{ "images": [{ "url": "..." }], "whitelist": "0123456789.$,", "pageSegMode": 6 }

Multi-language document:

{ "images": [{ "url": "..." }], "language": "eng+fra+deu" }

Output Format

Text-level output (outputLevel: "text"):

{
  "sourceUrl": "https://example.com/invoice.jpg",
  "sourceKvKey": null,
  "text": "Invoice #12345\nDate: January 15, 2025\nTotal: $248.50",
  "language": "eng",
  "confidence": 94.2,
  "wordCount": 9,
  "lineCount": 3,
  "blockCount": 2,
  "imageWidth": 1200,
  "imageHeight": 800,
  "preprocessed": true,
  "processingTimeMs": 1840,
  "processedAt": "2025-01-15T12:00:00.000Z",
  "errors": []
}

Full output (outputLevel: "full") — includes bounding boxes:

{
  "sourceUrl": "https://example.com/doc.png",
  "text": "Hello World",
  "confidence": 96.5,
  "wordCount": 2,
  "blocks": [
    {
      "text": "Hello World",
      "confidence": 96.5,
      "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
      "paragraphs": [
        {
          "text": "Hello World",
          "confidence": 96.5,
          "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
          "lines": [
            {
              "text": "Hello World",
              "confidence": 96.5,
              "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
              "words": [
                { "text": "Hello", "confidence": 97.1, "bbox": { "x0": 10, "y0": 20, "x1": 90, "y1": 50 } },
                { "text": "World", "confidence": 95.9, "bbox": { "x0": 100, "y0": 20, "x1": 200, "y1": 50 } }
              ]
            }
          ]
        }
      ]
    }
  ],
  "ocrEngineUsed": "lstm",
  "hocr": null,
  "processedAt": "2025-01-15T12:00:00.000Z",
  "errors": []
}

Confidence scores range from 0 (no confidence) to 100 (very high confidence). Scores above 85 indicate reliable OCR on clear printed text. Scores below 50 suggest blurry, degraded, or handwritten input.
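
The rough bands above can be turned into a small triage helper when post-processing results. This is a sketch only: the band names and the `confidenceBand` function are illustrative, not part of the actor's output.

```javascript
// Sketch: bucket an overall confidence score using the rough bands above.
// Thresholds follow the documented guidance: above 85 is reliable printed
// text; below 50 suggests blurry, degraded, or handwritten input.
function confidenceBand(score) {
  if (score > 85) return 'reliable';    // clear printed text
  if (score >= 50) return 'acceptable'; // readable, but verify critical fields
  return 'suspect';                     // blurry, degraded, or handwritten
}

console.log(confidenceBand(94.2)); // "reliable"
console.log(confidenceBand(41));   // "suspect"
```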

Run summary is stored in the key-value store under the OUTPUT key and includes total images processed, success/failure counts, total words extracted, average confidence, and total duration.


Tips and Advanced Usage

Improving accuracy on blurry or faded images: Enable all preprocessing options (preprocess: true, enhanceContrast: true, binarize: true, denoise: true). For very small text, set scale: 2.0 to double the image size before OCR.

Extracting data from receipts and invoices: Use whitelist: "0123456789.$," combined with pageSegMode: 6 (single block) to extract numbers only. This dramatically reduces misrecognized characters in numerical fields.

Handling rotated or tilted documents: Set pageSegMode: 1 (auto with orientation detection). Tesseract's OSD (Orientation and Script Detection) will automatically detect and correct orientation before OCR.

Processing screenshots with UI elements: The full output level preserves spatial layout via bounding boxes. You can filter extracted text by pixel coordinates to isolate specific UI regions without manual cropping.
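
A minimal sketch of that region filtering, assuming word objects with the bbox shape shown under Output Format; the `wordsInRegion` helper and the sample data are illustrative, not part of the actor's API.

```javascript
// Sketch: keep only the words whose bounding-box centre falls inside a
// target pixel region, e.g. a toolbar or dialog within a screenshot.
function wordsInRegion(words, region) {
  return words.filter(({ bbox }) => {
    const cx = (bbox.x0 + bbox.x1) / 2;
    const cy = (bbox.y0 + bbox.y1) / 2;
    return cx >= region.x0 && cx <= region.x1 &&
           cy >= region.y0 && cy <= region.y1;
  });
}

// Illustrative words, shaped like the "full" output's word entries.
const words = [
  { text: 'Save',   bbox: { x0: 20,  y0: 10, x1: 60,  y1: 30 } },
  { text: 'Cancel', bbox: { x0: 500, y0: 10, x1: 560, y1: 30 } },
];

// Isolate the left-hand toolbar region without cropping the image:
const toolbar = wordsInRegion(words, { x0: 0, y0: 0, x1: 100, y1: 50 });
console.log(toolbar.map(w => w.text)); // [ 'Save' ]
```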

CJK languages (Chinese, Japanese, Korean): Use language: "chi_sim" for Simplified Chinese, "chi_tra" for Traditional Chinese, "jpn" for Japanese, or "kor" for Korean. CJK language data files are larger (~15 MB), so the first run may be slightly slower while they download and cache.

Building document processing pipelines: Combine this actor with the PDF to Text Extractor for complete document coverage. Use image OCR for scanned PDFs and photos; use PDF to Text for digital PDFs.

Multi-language documents: Specify multiple languages as a +-separated string: "eng+fra", "eng+deu+fra". Tesseract loads all specified language models and attempts to recognize text in any of them. Performance decreases with more languages loaded simultaneously.


Pricing

This actor uses Pay-Per-Event pricing at $5.20 per 1,000 images processed ($0.0052 per image).

Pricing includes all platform compute costs — no hidden fees.

A billable event is charged when: the image downloads successfully AND OCR processing completes (even if no text is found — processing still occurred). Images that fail to download or are invalid are not billed.
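
The billing rule above is easy to estimate ahead of a run. A minimal sketch, where `estimateCost` is an illustrative helper rather than part of the actor:

```javascript
// Sketch: estimate a run's cost from the billing rule above. Only images
// that download successfully and finish OCR are billable; failed downloads
// and invalid images cost nothing.
const RATE_PER_IMAGE = 5.20 / 1000; // $5.20 per 1,000 images

function estimateCost(attempted, failedDownloads) {
  const billable = attempted - failedDownloads;
  return billable * RATE_PER_IMAGE;
}

console.log(estimateCost(500, 12).toFixed(2)); // "2.54" (488 billable images)
```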

| Volume | Estimated Cost |
|---|---|
| 1 image | $0.0052 |
| 50 images (receipt batch) | $0.26 |
| 500 images (document scan) | $2.60 |
| 5,000 images (archive) | $26.00 |
| 50,000 images (enterprise) | $260.00 |

Comparison:

  • Google Vision API: ~$1.50/1K (but requires GCP account and setup)
  • AWS Textract: ~$1.50/1K (requires AWS account, complex setup)
  • OCR.space: $3+/1K (25 languages only, no bounding boxes)
  • This actor: $5.20/1K (no setup, 100+ languages, word-level bounding boxes)

FAQ

What languages are supported?

Over 100 languages via Tesseract's language data files. Common codes: eng (English), fra (French), deu (German), spa (Spanish), ita (Italian), por (Portuguese), chi_sim (Simplified Chinese), chi_tra (Traditional Chinese), jpn (Japanese), kor (Korean), ara (Arabic), hin (Hindi), rus (Russian). Full list available at the Tesseract language data repository.

How accurate is the OCR?

On clear, well-formatted printed text (documents, invoices, books), expect confidence scores above 85% and near-perfect extraction. On photos with text overlaid on complex backgrounds, poor lighting, or significant compression artifacts, accuracy decreases. Enable all preprocessing options for best results. CJK scripts and cursive fonts are harder for Tesseract than Latin scripts.

Can it read handwritten text?

Partially. Tesseract's LSTM engine handles basic block-letter handwriting but is not specialized for cursive or varied handwriting styles. Confidence scores will be lower (often below 50) for handwritten input. For dedicated handwriting recognition, consider Google Vision or Azure Computer Vision.

What image formats are supported?

JPEG, PNG, WebP, TIFF, BMP, and GIF. Animated GIFs are supported — only the first frame is processed. Very large images (over 4,000 pixels on the longest side) are automatically resized to conserve memory.
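
The resize behaviour is simple to reason about. A sketch of the implied math, where `resizeFactor` is an illustrative helper, not the actor's internal code:

```javascript
// Sketch: the downscale factor needed to cap the longest side at 4,000 px,
// per the limit stated above. Images already within the limit are untouched.
function resizeFactor(width, height, maxSide = 4000) {
  const longest = Math.max(width, height);
  return longest > maxSide ? maxSide / longest : 1;
}

console.log(resizeFactor(8000, 6000)); // 0.5 (image shrinks to 4000×3000)
console.log(resizeFactor(1200, 800));  // 1 (already within limits)
```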

How do I improve accuracy on blurry images?

Enable all preprocessing: preprocess: true, enhanceContrast: true, binarize: true, denoise: true. Increase scale with scale: 2.0 or scale: 3.0 for very small text. Try ocrEngine: "combined" for the highest accuracy (slowest). If possible, source higher-resolution images before running OCR.

Can I extract text from PDFs?

Not with this actor. PDFs (both digital and scanned) are handled by the companion PDF to Text Extractor actor. If you provide a .pdf URL here, the actor will return a clear error directing you to the correct tool.

What do the confidence scores mean?

Tesseract returns a confidence percentage (0–100) for each recognized word and for the overall document. Above 85 = reliable printed text. 60–85 = acceptable for most documents. 30–60 = degraded image or complex layout. Below 30 = likely handwriting, noise, or very poor quality. Use minConfidence to exclude low-confidence words from output.

Does it detect text orientation automatically?

Yes, when pageSegMode is set to 1 (Auto with OSD). The default mode 3 (fully automatic) handles moderate rotation but may miss severe rotation. For images known to be rotated 90° or 180°, set pageSegMode: 1 explicitly. EXIF-based rotation is also corrected automatically during preprocessing.