Image to Text (OCR)

Extract text from images using the Tesseract.js OCR engine. Supports 100+ languages and bulk image processing.

Pricing: $5.20 / 1,000 image OCRs

Developer: junipr (Maintained by Community)

Image to Text API (OCR)

Introduction

Extract text from images using Tesseract OCR — fully self-contained, no external API keys required. This actor processes single images or batches of up to 500, returning clean extracted text with confidence scores at the word, line, and block level. Supported formats include JPEG, PNG, WebP, TIFF, BMP, and GIF.

Primary use cases: document digitization, receipt and invoice processing, screenshot text extraction, product label reading, and automated data entry from image-based sources.

Key differentiators: 100+ languages supported out of the box, word-level confidence scoring and bounding boxes for spatial analysis, automatic image preprocessing (deskew, contrast enhancement, binarization) for improved accuracy on low-quality scans, and zero infrastructure setup — just provide image URLs.


Why Use This Actor

No external API key required. Unlike Google Vision or AWS Textract, this actor uses Tesseract.js — a fully self-contained WASM OCR engine. There's no GCP project, no AWS account, no credit card required beyond your Apify plan.

Cost comparison: At $5.20 per 1,000 images, pricing is flat and all-inclusive. Raw per-image rates elsewhere can be lower (Google Vision and AWS Textract are around $1.50/1K; OCR.space starts at $3/1K), but those rates exclude account setup, separate cloud billing, and the infrastructure to call them. This actor's price covers everything on your existing Apify plan.

100+ languages included. Most online OCR tools support 25 or fewer languages. This actor ships with Tesseract's full language library — including CJK scripts (Chinese, Japanese, Korean), Arabic, Hindi, and dozens of European languages. Multi-language documents (e.g., eng+fra) are supported in a single run.

Structured output with confidence filtering. Get word-level confidence scores and pixel-accurate bounding boxes. Filter out low-confidence words automatically using minConfidence. Use bounding boxes to reconstruct document layout or extract specific regions.

Auto preprocessing improves accuracy. The actor automatically resizes small images, enhances contrast, binarizes, and corrects EXIF orientation before OCR. This produces noticeably better results on scanned documents, screenshots, and photos taken in poor lighting — without any manual image editing.


How to Use

Zero-config example: Provide an array of image URLs and the actor will extract text using English OCR with preprocessing enabled.

{
  "images": [
    { "url": "https://example.com/invoice.jpg" },
    { "url": "https://example.com/receipt.png" }
  ]
}

Japanese text extraction:

{
  "images": [{ "url": "https://example.com/japanese-doc.jpg" }],
  "language": "jpn",
  "outputLevel": "full"
}

Calling via Apify API (Node.js):

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('junipr/image-to-text').call({
  images: [{ url: 'https://example.com/image.png' }],
  language: 'eng',
  outputLevel: 'text',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].text);

Filtering low-confidence results: Set minConfidence: 70 to only keep words the OCR engine is confident about. Useful for noisy or degraded images where some text is too blurry to read reliably.
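
If you prefer to keep everything in the dataset and filter afterwards, the same cut can be applied client-side. Below is a minimal sketch, assuming the blocks/paragraphs/lines/words shape shown under Output Format; the sample `result` object and the `filterWordsByConfidence` helper are illustrative, not part of the actor's API.

```javascript
// Sketch: client-side confidence filtering of a "full"-level result.
// Walks the nested blocks -> paragraphs -> lines -> words hierarchy and
// keeps only words at or above the threshold.
function filterWordsByConfidence(result, minConfidence) {
  const kept = [];
  for (const block of result.blocks ?? []) {
    for (const para of block.paragraphs ?? []) {
      for (const line of para.lines ?? []) {
        for (const word of line.words ?? []) {
          if (word.confidence >= minConfidence) kept.push(word.text);
        }
      }
    }
  }
  return kept.join(' ');
}

// Illustrative result object, shaped like the "full" output documented below.
const result = {
  blocks: [{ paragraphs: [{ lines: [{ words: [
    { text: 'Total:', confidence: 96.1 },
    { text: '$248.50', confidence: 91.4 },
    { text: '~?~', confidence: 12.3 }, // noise the engine barely recognized
  ] }] }] }],
};

console.log(filterWordsByConfidence(result, 70)); // "Total: $248.50"
```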


Input Configuration

| Parameter | Type | Default | Description |
|---|---|---|---|
| images | array | required | Array of {url} or {kvStoreKey} objects. Max 500. |
| language | string | "eng" | Tesseract language code. Use "eng+fra" for multiple. |
| ocrEngine | string | "lstm" | "lstm" (best), "legacy" (faster), "combined" (highest quality). |
| pageSegMode | integer | 3 | Page segmentation mode. 3=auto, 6=block, 7=line, 11=sparse. |
| whitelist | string | — | Only recognize these characters (e.g. digits for receipts). |
| blacklist | string | — | Never output these characters. |
| preprocess | boolean | true | Enable auto image preprocessing for better accuracy. |
| enhanceContrast | boolean | true | Normalize image contrast. |
| binarize | boolean | true | Convert to black and white before OCR. |
| scale | number | auto | Scale factor (0.5–4.0). Auto-scales small images. |
| denoise | boolean | false | Apply median denoising for scanned images. |
| outputLevel | string | "text" | "text", "lines", "words", or "full" (with bounding boxes). |
| minConfidence | integer | 0 | Exclude words below this confidence (0–100). |
| includeRawHocr | boolean | false | Include raw hOCR XML in output (requires outputLevel: "full"). |
| maxConcurrency | integer | 2 | Max simultaneous OCR operations (1–5). |
| imageTimeout | integer | 30000 | Image download timeout in ms. |
| ocrTimeout | integer | 60000 | OCR processing timeout per image in ms. |

Common configurations:

Quick text extraction (defaults):

{ "images": [{ "url": "..." }] }

High-quality OCR with bounding boxes:

{ "images": [{ "url": "..." }], "outputLevel": "full", "preprocess": true, "denoise": true }

Numbers only (receipts, invoices):

{ "images": [{ "url": "..." }], "whitelist": "0123456789.$,", "pageSegMode": 6 }

Multi-language document:

{ "images": [{ "url": "..." }], "language": "eng+fra+deu" }

Output Format

Text-level output (outputLevel: "text"):

{
  "sourceUrl": "https://example.com/invoice.jpg",
  "sourceKvKey": null,
  "text": "Invoice #12345\nDate: January 15, 2025\nTotal: $248.50",
  "language": "eng",
  "confidence": 94.2,
  "wordCount": 9,
  "lineCount": 3,
  "blockCount": 2,
  "imageWidth": 1200,
  "imageHeight": 800,
  "preprocessed": true,
  "processingTimeMs": 1840,
  "processedAt": "2025-01-15T12:00:00.000Z",
  "errors": []
}

Full output (outputLevel: "full") — includes bounding boxes:

{
  "sourceUrl": "https://example.com/doc.png",
  "text": "Hello World",
  "confidence": 96.5,
  "wordCount": 2,
  "blocks": [
    {
      "text": "Hello World",
      "confidence": 96.5,
      "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
      "paragraphs": [
        {
          "text": "Hello World",
          "confidence": 96.5,
          "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
          "lines": [
            {
              "text": "Hello World",
              "confidence": 96.5,
              "bbox": { "x0": 10, "y0": 20, "x1": 200, "y1": 50 },
              "words": [
                { "text": "Hello", "confidence": 97.1, "bbox": { "x0": 10, "y0": 20, "x1": 90, "y1": 50 } },
                { "text": "World", "confidence": 95.9, "bbox": { "x0": 100, "y0": 20, "x1": 200, "y1": 50 } }
              ]
            }
          ]
        }
      ]
    }
  ],
  "ocrEngineUsed": "lstm",
  "hocr": null,
  "processedAt": "2025-01-15T12:00:00.000Z",
  "errors": []
}

Confidence scores range from 0 (no confidence) to 100 (very high confidence). Scores above 85 indicate reliable OCR on clear printed text. Scores below 50 suggest blurry, degraded, or handwritten input.
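
The rough bands above can be turned into a small triage helper when post-processing results. This is a sketch only: the band names and the `confidenceBand` function are illustrative, not part of the actor's output.

```javascript
// Sketch: bucket an overall confidence score using the rough bands above.
// Thresholds follow the documented guidance: above 85 is reliable printed
// text; below 50 suggests blurry, degraded, or handwritten input.
function confidenceBand(score) {
  if (score > 85) return 'reliable';    // clear printed text
  if (score >= 50) return 'acceptable'; // readable, but verify critical fields
  return 'suspect';                     // blurry, degraded, or handwritten
}

console.log(confidenceBand(94.2)); // "reliable"
console.log(confidenceBand(41));   // "suspect"
```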

Run summary is stored in the key-value store under the OUTPUT key and includes total images processed, success/failure counts, total words extracted, average confidence, and total duration.


Tips and Advanced Usage

Improving accuracy on blurry or faded images: Enable all preprocessing options (preprocess: true, enhanceContrast: true, binarize: true, denoise: true). For very small text, set scale: 2.0 to double the image size before OCR.

Extracting data from receipts and invoices: Use whitelist: "0123456789.$," combined with pageSegMode: 6 (single block) to extract numbers only. This dramatically reduces misrecognized characters in numerical fields.

Handling rotated or tilted documents: Set pageSegMode: 1 (auto with orientation detection). Tesseract's OSD (Orientation and Script Detection) will automatically detect and correct orientation before OCR.

Processing screenshots with UI elements: The full output level preserves spatial layout via bounding boxes. You can filter extracted text by pixel coordinates to isolate specific UI regions without manual cropping.
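
A minimal sketch of that region filtering, assuming word objects with the bbox shape shown under Output Format; the `wordsInRegion` helper and the sample data are illustrative, not part of the actor's API.

```javascript
// Sketch: keep only the words whose bounding-box centre falls inside a
// target pixel region, e.g. a toolbar or dialog within a screenshot.
function wordsInRegion(words, region) {
  return words.filter(({ bbox }) => {
    const cx = (bbox.x0 + bbox.x1) / 2;
    const cy = (bbox.y0 + bbox.y1) / 2;
    return cx >= region.x0 && cx <= region.x1 &&
           cy >= region.y0 && cy <= region.y1;
  });
}

// Illustrative words, shaped like the "full" output's word entries.
const words = [
  { text: 'Save',   bbox: { x0: 20,  y0: 10, x1: 60,  y1: 30 } },
  { text: 'Cancel', bbox: { x0: 500, y0: 10, x1: 560, y1: 30 } },
];

// Isolate the left-hand toolbar region without cropping the image:
const toolbar = wordsInRegion(words, { x0: 0, y0: 0, x1: 100, y1: 50 });
console.log(toolbar.map(w => w.text)); // [ 'Save' ]
```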

CJK languages (Chinese, Japanese, Korean): Use language: "chi_sim" for Simplified Chinese, "chi_tra" for Traditional Chinese, "jpn" for Japanese, or "kor" for Korean. CJK language data files are larger (~15 MB), so the first run may be slightly slower while they download and cache.

Building document processing pipelines: Combine this actor with the PDF to Text Extractor for complete document coverage. Use image OCR for scanned PDFs and photos; use PDF to Text for digital PDFs.

Multi-language documents: Specify multiple languages as a +-separated string: "eng+fra", "eng+deu+fra". Tesseract loads all specified language models and attempts to recognize text in any of them. Performance decreases with more languages loaded simultaneously.


Pricing

This actor uses Pay-Per-Event pricing at $5.20 per 1,000 images processed ($0.0052 per image).

Pricing includes all platform compute costs — no hidden fees.

A billable event is charged when: the image downloads successfully AND OCR processing completes (even if no text is found — processing still occurred). Images that fail to download or are invalid are not billed.
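
The billing rule above is easy to estimate ahead of a run. A minimal sketch, where `estimateCost` is an illustrative helper rather than part of the actor:

```javascript
// Sketch: estimate a run's cost from the billing rule above. Only images
// that download successfully and finish OCR are billable; failed downloads
// and invalid images cost nothing.
const RATE_PER_IMAGE = 5.20 / 1000; // $5.20 per 1,000 images

function estimateCost(attempted, failedDownloads) {
  const billable = attempted - failedDownloads;
  return billable * RATE_PER_IMAGE;
}

console.log(estimateCost(500, 12).toFixed(2)); // "2.54" (488 billable images)
```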

| Volume | Estimated Cost |
|---|---|
| 1 image | $0.0052 |
| 50 images (receipt batch) | $0.26 |
| 500 images (document scan) | $2.60 |
| 5,000 images (archive) | $26.00 |
| 50,000 images (enterprise) | $260.00 |

Comparison:

  • Google Vision API: ~$1.50/1K (but requires GCP account and setup)
  • AWS Textract: ~$1.50/1K (requires AWS account, complex setup)
  • OCR.space: $3+/1K (25 languages only, no bounding boxes)
  • This actor: $5.20/1K (no setup, 100+ languages, word-level bounding boxes)

FAQ

What languages are supported?

Over 100 languages via Tesseract's language data files. Common codes: eng (English), fra (French), deu (German), spa (Spanish), ita (Italian), por (Portuguese), chi_sim (Simplified Chinese), chi_tra (Traditional Chinese), jpn (Japanese), kor (Korean), ara (Arabic), hin (Hindi), rus (Russian). Full list available at the Tesseract language data repository.

How accurate is the OCR?

On clear, well-formatted printed text (documents, invoices, books), expect confidence scores above 85% and near-perfect extraction. On photos with text overlaid on complex backgrounds, poor lighting, or significant compression artifacts, accuracy decreases. Enable all preprocessing options for best results. CJK scripts and cursive fonts are harder for Tesseract than Latin scripts.

Can it read handwritten text?

Partially. Tesseract's LSTM engine handles basic block-letter handwriting but is not specialized for cursive or varied handwriting styles. Confidence scores will be lower (often below 50) for handwritten input. For dedicated handwriting recognition, consider Google Vision or Azure Computer Vision.

What image formats are supported?

JPEG, PNG, WebP, TIFF, BMP, and GIF. Animated GIFs are supported — only the first frame is processed. Very large images (over 4,000 pixels on the longest side) are automatically resized to conserve memory.
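
The resize behaviour is simple to reason about. A sketch of the implied math, where `resizeFactor` is an illustrative helper, not the actor's internal code:

```javascript
// Sketch: the downscale factor needed to cap the longest side at 4,000 px,
// per the limit stated above. Images already within the limit are untouched.
function resizeFactor(width, height, maxSide = 4000) {
  const longest = Math.max(width, height);
  return longest > maxSide ? maxSide / longest : 1;
}

console.log(resizeFactor(8000, 6000)); // 0.5 (image shrinks to 4000×3000)
console.log(resizeFactor(1200, 800));  // 1 (already within limits)
```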

How do I improve accuracy on blurry images?

Enable all preprocessing: preprocess: true, enhanceContrast: true, binarize: true, denoise: true. Increase scale with scale: 2.0 or scale: 3.0 for very small text. Try ocrEngine: "combined" for the highest accuracy (slowest). If possible, source higher-resolution images before running OCR.

Can I extract text from PDFs?

Not with this actor. PDFs (both digital and scanned) are handled by the companion PDF to Text Extractor actor. If you provide a .pdf URL here, the actor will return a clear error directing you to the correct tool.

What do the confidence scores mean?

Tesseract returns a confidence percentage (0–100) for each recognized word and for the overall document. Above 85 = reliable printed text. 60–85 = acceptable for most documents. 30–60 = degraded image or complex layout. Below 30 = likely handwriting, noise, or very poor quality. Use minConfidence to exclude low-confidence words from output.

Does it detect text orientation automatically?

Yes, when pageSegMode is set to 1 (Auto with OSD). The default mode 3 (fully automatic) handles moderate rotation but may miss severe rotation. For images known to be rotated 90° or 180°, set pageSegMode: 1 explicitly. EXIF-based rotation is also corrected automatically during preprocessing.