Image to Text (OCR)
Pricing
$5.20 / 1,000 image OCRs
Extract text from images using Tesseract.js OCR engine. Supports 100+ languages, PDFs, and bulk image processing.
List of images to process. Each entry must have either a 'url' (image URL) or 'kvStoreKey' (key in the actor's key-value store). Minimum 1, maximum 500.
[ { "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Camponotus_flavomarginatus_ant.jpg/400px-Camponotus_flavomarginatus_ant.jpg" } ]
Tesseract language code for OCR. Common codes: 'eng' (English), 'fra' (French), 'deu' (German), 'spa' (Spanish), 'chi_sim' (Simplified Chinese), 'jpn' (Japanese), 'kor' (Korean), 'ara' (Arabic). Combine multiple: 'eng+fra'. Full list at https://tesseract-ocr.github.io/tessdoc/Data-Files.
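The image-list and language rules above can be checked before a run is started. The following is a minimal sketch, not the Actor's own validation code: the helper names `validate_images` and `combine_languages` are hypothetical, and it assumes that an entry carrying both 'url' and 'kvStoreKey' is acceptable.

```python
from typing import Any

MIN_IMAGES = 1
MAX_IMAGES = 500


def validate_images(images: list[dict[str, Any]]) -> None:
    """Raise ValueError if the list breaks the documented rules (hypothetical helper)."""
    if not MIN_IMAGES <= len(images) <= MAX_IMAGES:
        raise ValueError(f"expected {MIN_IMAGES}-{MAX_IMAGES} images, got {len(images)}")
    for index, entry in enumerate(images):
        # Each entry needs a non-empty 'url' or 'kvStoreKey'.
        if not (entry.get("url") or entry.get("kvStoreKey")):
            raise ValueError(f"entry {index} has neither 'url' nor 'kvStoreKey'")


def combine_languages(codes: list[str]) -> str:
    """Join Tesseract language codes with '+', as in 'eng+fra'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)
```

For example, `combine_languages(["eng", "fra"])` produces the combined code 'eng+fra' mentioned above.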
OCR engine mode. 'lstm' uses the neural network engine and is the recommended choice for accuracy. 'legacy' uses the traditional engine, which is faster but less accurate. 'combined' runs both engines; it is the slowest mode and may give the highest accuracy.
Controls how Tesseract segments the image. 3 (auto) works for most images. Use 6 for single text blocks, 7 for single lines, 8 for single words, 11 for sparse text, 1 for auto with orientation detection (good for rotated text).
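To show how the engine and segmentation settings relate to Tesseract itself, here is a small sketch that turns them into the `--oem` and `--psm` arguments of the Tesseract command line. The mapping of 'legacy', 'lstm', and 'combined' to OEM values 0, 1, and 2 is an assumption about how this Actor translates its options, and `tesseract_cli_args` is a hypothetical helper. Tesseract accepts more segmentation modes than the six listed; the sketch only allows the ones named above.

```python
# Tesseract's numeric OCR engine modes: 0 = legacy, 1 = LSTM, 2 = legacy + LSTM.
# Assumption: the Actor's mode names map onto these values.
ENGINE_MODES = {"legacy": 0, "lstm": 1, "combined": 2}

# Page segmentation modes named in the description above.
SEGMENTATION_MODES = {
    1: "automatic with orientation detection",
    3: "fully automatic",
    6: "single text block",
    7: "single text line",
    8: "single word",
    11: "sparse text",
}


def tesseract_cli_args(engine_mode: str, psm: int = 3) -> list[str]:
    """Build Tesseract command-line arguments for the chosen modes."""
    if engine_mode not in ENGINE_MODES:
        raise ValueError(f"unknown engine mode: {engine_mode!r}")
    if psm not in SEGMENTATION_MODES:
        raise ValueError(f"segmentation mode {psm} is not one of the documented values")
    return ["--oem", str(ENGINE_MODES[engine_mode]), "--psm", str(psm)]
```

A receipt line cropped to a single row of text would use `tesseract_cli_args("lstm", psm=7)`.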
Only recognize these characters. Useful for extracting specific data types. Example: '0123456789.$,' for receipt amounts only. Leave empty to recognize all characters.
Never output these characters. Ignored if a whitelist is also specified.
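The precedence between the two character filters can be sketched as follows. `tessedit_char_whitelist` and `tessedit_char_blacklist` are Tesseract configuration variables; that the Actor passes its settings through to them is an assumption, and `char_filter_params` is a hypothetical helper.

```python
def char_filter_params(whitelist: str = "", blacklist: str = "") -> dict[str, str]:
    """Return the Tesseract variables to set, applying the whitelist-wins rule."""
    if whitelist:
        # A whitelist takes precedence; any blacklist is ignored.
        return {"tessedit_char_whitelist": whitelist}
    if blacklist:
        return {"tessedit_char_blacklist": blacklist}
    # Neither set: recognize all characters.
    return {}
```

With the receipt example above, `char_filter_params(whitelist="0123456789.$,", blacklist="|")` returns only the whitelist entry.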
Enable automatic image preprocessing (deskew, contrast enhancement, binarization) for improved OCR accuracy. Recommended for most images, especially scanned documents.
Correct image rotation based on EXIF orientation data. Only applies when preprocessing is enabled.
Normalize image histogram to enhance contrast for faded or low-contrast text. Only applies when preprocessing is enabled.
Convert image to black and white using thresholding. Improves OCR on images with complex backgrounds. Only applies when preprocessing is enabled.
Scale the image before OCR. 2.0 doubles the size (improves accuracy on small text). Leave empty for auto-scaling (doubles if image width < 800px). Min: 0.5, Max: 4.0.
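The scaling rule can be written out as a short function. This is a sketch under two assumptions that the description does not state: images at least 800px wide are left at their original size when auto-scaling, and an explicit value outside 0.5 to 4.0 is rejected instead of clamped. The name `resolve_scale` is hypothetical.

```python
from typing import Optional

MIN_SCALE = 0.5
MAX_SCALE = 4.0
AUTO_SCALE_WIDTH_PX = 800


def resolve_scale(image_width_px: int, scale: Optional[float] = None) -> float:
    """Pick the scale factor applied before OCR."""
    if scale is None:
        # Auto-scaling: double narrow images, leave wider ones unchanged (assumed).
        return 2.0 if image_width_px < AUTO_SCALE_WIDTH_PX else 1.0
    if not MIN_SCALE <= scale <= MAX_SCALE:
        raise ValueError(f"scale must be between {MIN_SCALE} and {MAX_SCALE}, got {scale}")
    return scale
```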
Apply median denoising filter for noisy or scanned images. Adds extra processing time. Recommended for low-quality scans.
Detail level of OCR output. 'text' returns plain extracted text only. 'lines' adds line-level confidence scores. 'words' adds word-level positions and confidence. 'full' includes complete structure with bounding boxes for blocks, paragraphs, lines, and words.
Minimum OCR confidence threshold (0-100). Words with confidence below this value are excluded from output. 0 includes all results. 70 keeps only high-confidence text.
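The threshold behaves like a simple filter over recognized words. The sketch below illustrates the rule on plain dictionaries; the word shape with 'text' and 'confidence' keys and the name `filter_by_confidence` are assumptions made for illustration, not the Actor's documented output format.

```python
def filter_by_confidence(words: list[dict], min_confidence: float = 0) -> list[dict]:
    """Keep words whose confidence is at or above the threshold."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("min_confidence must be between 0 and 100")
    # Words strictly below the threshold are excluded.
    return [word for word in words if word["confidence"] >= min_confidence]


words = [
    {"text": "Invoice", "confidence": 96.2},
    {"text": "Tota1", "confidence": 54.0},
    {"text": "42.00", "confidence": 70.0},
]
kept = [word["text"] for word in filter_by_confidence(words, 70)]
```

Here `kept` contains "Invoice" and "42.00"; a threshold of 0 would keep all three words.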
Include the raw hOCR XML output from Tesseract in the 'hocr' field. hOCR contains full positional data in standard XML format for advanced processing. Only available with outputLevel 'full'.
Collapse multiple consecutive spaces and newlines into single characters. Cleans up common OCR artifacts for cleaner output.
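A whitespace cleanup of this kind might look like the following sketch. It assumes that only space and newline characters are collapsed and that tabs are left alone; the Actor's actual rules may differ, and `normalize_whitespace` is a hypothetical name.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces into one space and runs of newlines into one newline."""
    text = re.sub(r" {2,}", " ", text)
    text = re.sub(r"\n{2,}", "\n", text)
    return text
```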
Maximum number of images processed simultaneously. Tesseract is CPU-intensive, so keep this low (2-3) to avoid memory issues on large batches. Min: 1, Max: 5.
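Bounded concurrency of this kind is commonly implemented with a semaphore. The sketch below shows the general pattern in Python's asyncio; it is not the Actor's implementation, and `process_all` and its `worker` callback are hypothetical names.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def process_all(
    items: Sequence[Any],
    worker: Callable[[Any], Awaitable[Any]],
    max_concurrency: int = 2,
) -> list[Any]:
    """Run worker on every item with at most max_concurrency running at once."""
    if not 1 <= max_concurrency <= 5:
        raise ValueError("max_concurrency must be between 1 and 5")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item: Any) -> Any:
        # Wait for a free slot before starting the CPU-heavy job.
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the input items.
    return list(await asyncio.gather(*(run_one(item) for item in items)))
```

Called as `asyncio.run(process_all(images, ocr_one, max_concurrency=3))`, where `ocr_one` stands for whatever coroutine performs the OCR, no more than three images are in flight at any moment.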
Timeout in milliseconds for downloading each source image. Increase for slow servers or large images. Min: 5000, Max: 120000.
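For readers fetching images themselves, the millisecond timeout maps onto a standard HTTP client as sketched below. The helper names and the 30000 ms default are assumptions for illustration; only the 5000 to 120000 range comes from the description above.

```python
import urllib.request

MIN_TIMEOUT_MS = 5_000
MAX_TIMEOUT_MS = 120_000


def timeout_seconds(timeout_ms: int) -> float:
    """Validate the documented range and convert milliseconds to seconds."""
    if not MIN_TIMEOUT_MS <= timeout_ms <= MAX_TIMEOUT_MS:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT_MS} and {MAX_TIMEOUT_MS} ms, got {timeout_ms}"
        )
    return timeout_ms / 1000


def download_image(url: str, timeout_ms: int = 30_000) -> bytes:
    """Fetch an image; the default timeout here is an assumed value."""
    # urlopen applies the timeout to each blocking socket operation,
    # not to the download as a whole.
    with urllib.request.urlopen(url, timeout=timeout_seconds(timeout_ms)) as response:
        return response.read()
```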