Pricing

from $2.00 / 1,000 results

PDF OCR Text Extractor — PDFs & Images to Text, 12+ Languages

Extract text from PDFs and images with OCR in 12+ languages, including word-level detail, form fields, and tables. Send a file, get clean structured text — built for document digitization and data-entry automation.

Developer

Fabio Suizu

Last modified: 3 days ago

OCR & PDF Text Extractor

Extract text from images and PDFs with OCR. Support for 12+ languages, form extraction, and table detection. Powered by Azure AI.

Features

Fast Processing: OCR and PDF text extraction powered by Azure AI
Reliable: 99.9% uptime with automatic failover
Scalable: Handle single requests or bulk operations
Secure: Enterprise-grade security with API key authentication
Well Documented: Comprehensive API documentation and examples

Use Cases

E-commerce: Process product images at scale
Media: Automate image processing pipelines
Apps: Add image processing to your applications

Input Parameters

Parameter	Type	Required	Description
`fileUrl`	string	No	URL to download image or PDF
`fileUrls`	array	No	Array of URLs for bulk extraction
`language`	string	No	OCR language code
`backend`	string	No	OCR engine to use
`extractForms`	boolean	No	Extract form fields (key-value pairs)
`mode`	string	No	Extraction mode
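
The parameters above can be assembled programmatically. The following is a minimal sketch, not part of the Actor itself: the helper name `build_input` is hypothetical, and it assumes that a single URL belongs in `fileUrl` while several belong in `fileUrls`. Because the table marks every parameter as optional, unset values are simply left out of the payload.

```python
from typing import Any, Optional, Sequence


def build_input(
    urls: Sequence[str],
    language: Optional[str] = None,
    backend: Optional[str] = None,
    extract_forms: Optional[bool] = None,
    mode: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an Actor input dict from the documented parameters (hypothetical helper)."""
    if not urls:
        raise ValueError("at least one file URL is needed")

    # Assumption: one URL goes in `fileUrl`, several go in `fileUrls`.
    payload: dict[str, Any] = (
        {"fileUrl": urls[0]} if len(urls) == 1 else {"fileUrls": list(urls)}
    )

    optional = {
        "language": language,
        "backend": backend,
        "extractForms": extract_forms,
        "mode": mode,
    }
    # Every parameter is listed as optional, so unset ones are omitted.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


if __name__ == "__main__":
    print(build_input(["https://example.com/scan.pdf"], language="eng"))
```

The returned dict can be passed as the run input in the client examples further down.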

Output Format

{
  "success": true,
  "result": { ... },
  "timestamp": "2026-01-07T00:00:00Z"
}
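
A short sketch of how a consumer might handle items in this envelope, assuming every dataset item carries the `success`, `result`, and `timestamp` keys shown above. The helper names are hypothetical, and the contents of `result` are not documented here, so the code leaves them untouched.

```python
from datetime import datetime
from typing import Any, Iterable


def split_items(items: Iterable[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate items whose `success` flag is true from all others."""
    ok: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        (ok if item.get("success") is True else failed).append(item)
    return ok, failed


def parse_timestamp(item: dict[str, Any]) -> datetime:
    """Parse the ISO 8601 `timestamp` field into an aware datetime."""
    # Python versions before 3.11 reject a trailing "Z", so normalise it first.
    return datetime.fromisoformat(item["timestamp"].replace("Z", "+00:00"))


if __name__ == "__main__":
    sample = [
        {"success": True, "result": {}, "timestamp": "2026-01-07T00:00:00Z"},
        {"success": False},
    ]
    ok, failed = split_items(sample)
    print(len(ok), len(failed), parse_timestamp(ok[0]).isoformat())
```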

Code Examples

JavaScript (Node.js)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const input = {
  "fileUrl": "example_fileUrl",
  "fileUrls": [],
  "language": "eng",
  "backend": "auto",
  "extractForms": false,
  "mode": "single"
};

const run = await client.actor("vivid_astronaut/ocr-api").call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run_input = {
  "fileUrl": "example_fileUrl",
  "fileUrls": [],
  "language": "eng",
  "backend": "auto",
  "extractForms": False,
  "mode": "single"
}

run = client.actor("vivid_astronaut/ocr-api").call(run_input=run_input)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

cURL

curl -X POST "https://api.apify.com/v2/acts/vivid_astronaut~ocr-api/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
  "fileUrl": "example_fileUrl",
  "fileUrls": [],
  "language": "eng",
  "backend": "auto",
  "extractForms": false,
  "mode": "single"
}'

Pricing

Model: Pay per result
Price: $0.020 per result

You only pay for successful results. Platform usage costs are included.
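
For budgeting, the pay-per-result model reduces to simple multiplication. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, and the rate is passed in rather than hard-coded because this section quotes $0.020 per result while the listing header quotes "from $2.00 / 1,000 results". It counts only successful results, per the statement above.

```python
from decimal import Decimal


def estimate_cost(successful_results: int, price_per_result: Decimal) -> Decimal:
    """Return the estimated charge for a number of successful results."""
    if successful_results < 0:
        raise ValueError("successful_results must be non-negative")
    # Decimal avoids binary floating-point rounding on currency values.
    return price_per_result * successful_results


if __name__ == "__main__":
    # Rate taken from the "Price" line in this section.
    print(estimate_cost(1000, Decimal("0.020")))
```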

API Documentation

Full API documentation is available at:

Support

Issues: Report bugs via Apify Console
Documentation: Apify Docs
Community: Apify Discord

Version History

See ./CHANGELOG.md for version history.

Powered by Azure Cloud Infrastructure
