Pricing

$6.50 / 1,000 image OCRs

Image to Text OCR — Extract Text from Images

Extract text from images with OCR, confidence scores, language options, page/image metadata, and automation-ready text exports.

Developer

junipr

Last modified

3 days ago

Categories

Automation

Developer tools

Images

images

Required

List of images to process. Each entry must have either a 'url' (image URL) or 'kvStoreKey' (key in the actor's key-value store). Minimum 1, maximum 500.

Type: array

Default:

[
  {
    "url": "https://placehold.co/1200x320/FFFFFF/000000/png?text=JUNIPR+OCR+READY"
  }
]
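
A minimal Python sketch of assembling the `images` array, assuming only the two entry keys described above (`url` and `kvStoreKey`). The helper name `build_images` is hypothetical and not part of the actor; it simply enforces the documented limit of 1 to 500 entries.

```python
def build_images(urls=(), kv_store_keys=()):
    """Build the `images` input array from image URLs and key-value store keys."""
    # Each entry carries exactly one of the two documented keys.
    images = [{"url": u} for u in urls]
    images += [{"kvStoreKey": k} for k in kv_store_keys]
    # The input description allows a minimum of 1 and a maximum of 500 entries.
    if not 1 <= len(images) <= 500:
        raise ValueError("images must contain between 1 and 500 entries")
    return images
```

The returned list can be used directly as the value of `images` in the actor input.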

Language

language

Optional

Tesseract language code for OCR. Common codes: 'eng' (English), 'fra' (French), 'deu' (German), 'spa' (Spanish), 'chi_sim' (Simplified Chinese), 'jpn' (Japanese), 'kor' (Korean), 'ara' (Arabic). Combine multiple: 'eng+fra'. Full list at https://tesseract-ocr.github.io/tessdoc/Data-Files.

Type: string

Default: eng
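
As a small illustration of the `+` syntax for combining languages, the sketch below joins a list of Tesseract codes into one `language` string. The function name `join_languages` is hypothetical; the fallback to `eng` mirrors the documented default.

```python
def join_languages(codes):
    """Join Tesseract language codes into a single `language` value."""
    # Drop empty entries and surrounding whitespace before joining.
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        return "eng"  # documented default
    return "+".join(cleaned)
```

For example, `join_languages(["eng", "fra"])` produces the combined form shown in the description.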

OCR Engine

ocrEngine

Optional

OCR engine mode. 'lstm' (the default) uses the neural network engine for high accuracy. 'legacy' uses the traditional engine, which is faster but less accurate. 'combined' runs both engines, giving the highest accuracy at the slowest speed.

Type: string

Default: lstm

Options:

lstm, legacy, combined

Page Segmentation Mode

pageSegMode

Optional

Controls how Tesseract segments the image. 3 (auto) works for most images. Use 6 for single text blocks, 7 for single lines, 8 for single words, 11 for sparse text, 1 for auto with orientation detection (good for rotated text).

Type: integer

Minimum: 0

Maximum: 13

Default: 3
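
The mode numbers listed above can be kept in a small lookup table so that calling code names the layout rather than the number. This is only a sketch: the layout labels (`single_block`, `sparse_text`, and so on) are hypothetical names chosen for illustration, while the numeric values come from the description.

```python
# Hypothetical layout labels mapped to the documented pageSegMode values.
PSM_BY_LAYOUT = {
    "auto": 3,                   # works for most images (default)
    "auto_with_orientation": 1,  # good for rotated text
    "single_block": 6,
    "single_line": 7,
    "single_word": 8,
    "sparse_text": 11,
}

def page_seg_mode(layout="auto"):
    """Return the pageSegMode integer for a named layout."""
    if layout not in PSM_BY_LAYOUT:
        raise ValueError(f"unknown layout: {layout}")
    return PSM_BY_LAYOUT[layout]
```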

Character Whitelist

whitelist

Optional

Only recognize these characters. Useful for extracting specific data types. Example: '0123456789.$,' for receipt amounts only. Leave empty to recognize all characters.

Type: string

Character Blacklist

blacklist

Optional

Never output these characters. Ignored if a whitelist is also specified.

Type: string
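
The precedence between the two character filters can be sketched as follows. The helper `character_filter` is hypothetical; it only models the documented rule that a blacklist is ignored whenever a whitelist is also specified.

```python
def character_filter(whitelist="", blacklist=""):
    """Return the filter settings that take effect under the documented precedence."""
    if whitelist:
        # A non-empty whitelist wins; the blacklist is ignored.
        return {"whitelist": whitelist}
    if blacklist:
        return {"blacklist": blacklist}
    # Both empty: all characters are recognized.
    return {}
```

With the receipt example from the whitelist description, `character_filter("0123456789.$,", "|")` keeps only the whitelist.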

Enable Preprocessing

preprocess

Optional

Enable automatic image preprocessing (deskew, contrast enhancement, binarization) for improved OCR accuracy. Recommended for most images, especially scanned documents.

Type: boolean

Default: true

Deskew

deskew

Optional

Correct image rotation based on EXIF orientation data. Only applies when preprocessing is enabled.

Type: boolean

Default: true

Enhance Contrast

enhanceContrast

Optional

Normalize image histogram to enhance contrast for faded or low-contrast text. Only applies when preprocessing is enabled.

Type: boolean

Default: true

Binarize

binarize

Optional

Convert image to black and white using thresholding. Improves OCR on images with complex backgrounds. Only applies when preprocessing is enabled.

Type: boolean

Default: true
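
The three flags above (`deskew`, `enhanceContrast`, `binarize`) only apply when `preprocess` is enabled. The sketch below models that dependency; the function name `active_preprocessing` is hypothetical, and it deliberately leaves out `scale` and `denoise` because the descriptions do not say whether they depend on `preprocess`.

```python
def active_preprocessing(preprocess=True, deskew=True,
                         enhance_contrast=True, binarize=True):
    """List the preprocessing steps that take effect for a given set of flags."""
    if not preprocess:
        # With preprocessing disabled, the individual flags have no effect.
        return []
    flags = {
        "deskew": deskew,
        "enhanceContrast": enhance_contrast,
        "binarize": binarize,
    }
    return [name for name, enabled in flags.items() if enabled]
```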

Scale Factor

scale

Optional

Scale the image before OCR. 2.0 doubles the size (improves accuracy on small text). Leave empty for auto-scaling (doubles if image width < 800px). Min: 0.5, Max: 4.0.

Type: number

Minimum: 0.5

Maximum: 4
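
The auto-scaling rule can be illustrated with a short sketch. It assumes that an image at least 800px wide is left at its original size (factor 1.0) when `scale` is empty; the description only states the doubling case, so that part is an assumption. The name `effective_scale` is hypothetical.

```python
def effective_scale(image_width_px, scale=None):
    """Return the scale factor applied before OCR."""
    if scale is None:
        # Documented auto-scaling: double when the image is narrower than 800px.
        # Assumption: wider images are left unscaled.
        return 2.0 if image_width_px < 800 else 1.0
    if not 0.5 <= scale <= 4.0:
        raise ValueError("scale must be between 0.5 and 4.0")
    return float(scale)
```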

Denoise

denoise

Optional

Apply median denoising filter for noisy or scanned images. Adds extra processing time. Recommended for low-quality scans.

Type: boolean

Default: false

Output Level

outputLevel

Optional

Detail level of OCR output. 'text' returns plain extracted text only. 'lines' adds line-level confidence scores. 'words' adds word-level positions and confidence. 'full' includes complete structure with bounding boxes for blocks, paragraphs, lines, and words.

Type: string

Default: text

Options:

text, lines, words, full

Minimum Confidence

minConfidence

Optional

Minimum OCR confidence threshold (0-100). Words with confidence below this value are excluded from output. 0 includes all results. 70 keeps only high-confidence text.

Type: integer

Minimum: 0

Maximum: 100

Default: 0
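
The effect of the threshold can be reproduced on the client side as a simple filter. In this sketch the word records and their field names (`text`, `confidence`) are assumptions made for illustration; the actor's actual output shape is not shown on this page.

```python
def filter_words(words, min_confidence=0):
    """Keep words whose confidence is at or above the threshold."""
    if not 0 <= min_confidence <= 100:
        raise ValueError("min_confidence must be between 0 and 100")
    # Words below the threshold are excluded; 0 keeps everything.
    return [w for w in words if w["confidence"] >= min_confidence]

# Hypothetical word records for demonstration only.
sample = [
    {"text": "Total", "confidence": 96},
    {"text": "4Z.10", "confidence": 41},
]
high_confidence = filter_words(sample, min_confidence=70)
```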

Include Raw hOCR

includeRawHocr

Optional

Include the raw hOCR XML output from Tesseract in the 'hocr' field. hOCR contains full positional data in standard XML format for advanced processing. Only available with outputLevel 'full'.

Type: boolean

Default: false

Strip Extra Whitespace

stripExtraWhitespace

Optional

Collapse multiple consecutive spaces and newlines into single characters. Cleans up common OCR artifacts for cleaner output.

Type: boolean

Default: true
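
One way to approximate this cleanup in Python is with two regular-expression substitutions. This is a sketch of the described behavior, not the actor's own implementation, and the function name is hypothetical.

```python
import re

def strip_extra_whitespace(text):
    """Collapse runs of spaces and runs of newlines into single characters."""
    text = re.sub(r" {2,}", " ", text)    # multiple spaces -> one space
    text = re.sub(r"\n{2,}", "\n", text)  # multiple newlines -> one newline
    return text
```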

Max Concurrency

maxConcurrency

Optional

Maximum number of images processed simultaneously. Tesseract is CPU-intensive, so keep this low (2-3) to avoid memory issues on large batches. Min: 1, Max: 5.

Type: integer

Minimum: 1

Maximum: 5

Default: 1

Image Download Timeout (ms)

imageTimeout

Optional

Timeout in milliseconds for downloading each source image. Increase for slow servers or large images. Min: 5000, Max: 120000.

Type: integer

Minimum: 5000

Maximum: 120000

Default: 30000

OCR Timeout (ms)

ocrTimeout

Optional

Timeout in milliseconds for OCR processing per image. Increase for very complex, high-resolution images. Min: 10000, Max: 300000.

Type: integer

Minimum: 10000

Maximum: 300000

Default: 60000
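
Putting the numeric limits together, the sketch below checks an input object against the documented ranges before a run is started. The `LIMITS` table and the `validate_input` helper are hypothetical; the ranges and the `includeRawHocr` rule are taken from the field descriptions above.

```python
# Documented (minimum, maximum) ranges for the numeric input fields.
LIMITS = {
    "pageSegMode": (0, 13),
    "scale": (0.5, 4.0),
    "minConfidence": (0, 100),
    "maxConcurrency": (1, 5),
    "imageTimeout": (5000, 120000),
    "ocrTimeout": (10000, 300000),
}

def validate_input(run_input):
    """Return a list of problems found in an input object (empty if none)."""
    errors = []
    for name, (low, high) in LIMITS.items():
        value = run_input.get(name)
        if value is not None and not low <= value <= high:
            errors.append(f"{name} must be between {low} and {high}")
    # Raw hOCR is only available with outputLevel 'full' (default is 'text').
    if run_input.get("includeRawHocr") and run_input.get("outputLevel", "text") != "full":
        errors.append("includeRawHocr requires outputLevel 'full'")
    return errors

# Example input for slow servers and large, complex scans.
run_input = {
    "images": [{"url": "https://placehold.co/1200x320/FFFFFF/000000/png?text=JUNIPR+OCR+READY"}],
    "outputLevel": "full",
    "includeRawHocr": True,
    "maxConcurrency": 2,
    "imageTimeout": 60000,
    "ocrTimeout": 120000,
}
problems = validate_input(run_input)
```

An empty `problems` list means the values fall inside the documented ranges; it does not guarantee that the run itself will succeed.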
