Pricing

from $0.01 / 1,000 results

PDF OCR API - Document Extraction

Extract text from PDFs including scanned documents. OCR processing, table extraction & structured data output. Process invoices, contracts & forms at scale.

Developer

The Howlers

PDF OCR API

API Integration

This actor connects to an external API service. You'll need valid API credentials from the service provider.

Extract text from PDF files using OCR. Supports scanned documents, images, and multi-page PDFs. Returns structured text with page numbers and confidence scores. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).

Features

Direct API integration
Structured JSON output
Error handling and retries
Pay-per-event billing

Quick Start

{
  "pdfUrl": "https://example.com/document.pdf"
}

Demo Mode

Set demoMode: true to test with sample data (no charges). When you're ready for real results, set demoMode: false or omit it.

{
  "demoMode": true,
  ...
}

Input Parameters

Parameter	Type	Required	Description
`pdfUrl`	string	Yes*	URL of the PDF file to process
`pdfBase64`	string	Yes*	Base64-encoded PDF (alternative to URL)
`language`	string	No	OCR language hint (default: eng)
`pageRange`	string	No	Pages to process (e.g., "1-5" or "1,3,5")
`outputFormat`	string	No	Output format: text, json, markdown
`detectTables`	boolean	No	Attempt to preserve table structure
`webhookUrl`	string	No	Webhook URL for async results
`demoMode`	boolean	No	Return sample output without processing

*Either pdfUrl or pdfBase64 is required
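The "either pdfUrl or pdfBase64" rule can be checked before a run is started. The following is a minimal sketch, not the actor's own code: the helper name `build_input` is hypothetical, the field names come from the table above, and two points are assumptions: that `pdfBase64` expects standard Base64 without a data-URI prefix, and that the two source fields are mutually exclusive.

```python
import base64
from typing import Optional

# Values listed for outputFormat in the parameter table.
ALLOWED_FORMATS = {"text", "json", "markdown"}


def build_input(
    pdf_url: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    language: str = "eng",
    page_range: Optional[str] = None,
    output_format: Optional[str] = None,
    detect_tables: bool = False,
    demo_mode: bool = False,
) -> dict:
    """Assemble an input object using the parameter names from the table."""
    # Assumption: exactly one source is allowed (the table only says
    # that one of them is required).
    if (pdf_url is None) == (pdf_bytes is None):
        raise ValueError("provide exactly one of pdf_url or pdf_bytes")
    if output_format is not None and output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {sorted(ALLOWED_FORMATS)}")

    payload = {
        "language": language,
        "detectTables": detect_tables,
        "demoMode": demo_mode,
    }
    if pdf_url is not None:
        payload["pdfUrl"] = pdf_url
    else:
        # Assumption: plain standard Base64, no "data:application/pdf;base64," prefix.
        payload["pdfBase64"] = base64.b64encode(pdf_bytes).decode("ascii")
    if page_range is not None:
        payload["pageRange"] = page_range
    if output_format is not None:
        payload["outputFormat"] = output_format
    return payload


example = build_input(pdf_url="https://example.com/document.pdf", page_range="1-5")
```

Optional fields are left out of the payload when unset, so the actor's own defaults apply rather than values guessed by the caller.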

Output Format

{
  "success": true,
  "fileName": "document.pdf",
  "totalPages": 5,
  "processedPages": 5,
  "language": "eng",
  "processingTime": 2.3,
  "pages": [
    {
      "pageNumber": 1,
      "text": "This is the extracted text from page 1...",
      "confidence": 95.2,
      "wordCount": 342,
      "hasImages": true,
      "tables": [
        {
          "rows": 5,
          "columns": 3,
          "data": [["Header1", "Header2", "Header3"], ...]
        }
      ]
    }
  ],
  "fullText": "Complete document text concatenated...",
  "wordCount": 1250,
  "averageConfidence": 94.5
}
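A result shaped like the sample above can be post-processed with a few lines of Python. This sketch is illustrative only: the function names are hypothetical, the field names are taken from the sample output, and the confidence scale is assumed to be 0 to 100 because the sample shows values such as 95.2.

```python
from typing import Any, Iterator, List, Tuple


def pages_below_confidence(result: dict, threshold: float) -> List[int]:
    """Return page numbers whose OCR confidence is below the threshold."""
    return [
        page["pageNumber"]
        for page in result.get("pages", [])
        if page.get("confidence", 0.0) < threshold
    ]


def iter_table_rows(result: dict) -> Iterator[Tuple[int, int, List[Any]]]:
    """Yield (page number, table index on that page, row) for every table row."""
    for page in result.get("pages", []):
        for table_index, table in enumerate(page.get("tables", [])):
            for row in table.get("data", []):
                yield page["pageNumber"], table_index, row


# Hand-written sample in the documented shape, not real actor output.
sample = {
    "success": True,
    "pages": [
        {
            "pageNumber": 1,
            "confidence": 95.2,
            "tables": [{"rows": 2, "columns": 2, "data": [["Item", "Total"], ["Widget", "9.99"]]}],
        },
        {"pageNumber": 2, "confidence": 71.0, "tables": []},
    ],
}

needs_review = pages_below_confidence(sample, threshold=90.0)
rows = list(iter_table_rows(sample))
```

Pages flagged this way are candidates for manual review or for a second run with a different `language` hint.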

Pricing

This actor uses pay-per-event billing:

data_point: $0.01 per result

Use Cases

Document Digitization

Archive processing: Make historical documents searchable
Paper to digital: Convert scanned documents to text
Record keeping: Digitize contracts, invoices, receipts

Data Extraction

Invoice processing: Extract line items, totals, dates
Form processing: Pull data from scanned forms
Contract analysis: Extract key terms and clauses

Research & Academia

Academic papers: Extract text from PDF research papers
Book scanning: Digitize book chapters and pages
Citation extraction: Pull references from documents

Legal & Compliance

Legal discovery: Process large document sets
Contract review: Extract text for analysis
Compliance audits: Digitize paper records

Developers

API integration: RESTful JSON responses
Webhook support: Async processing for large documents
Multiple formats: Text, JSON, or Markdown output

Common Problems & Solutions

"Invalid API key" error

Cause: Your API key is wrong, expired, or doesn't have the right permissions.
Fix: Double-check your API key. Make sure you copied it exactly without extra spaces.

"Rate limit exceeded" error

Cause: You've hit the API's rate limits.
Fix: Wait a few minutes, then try again. Consider reducing the number of concurrent requests.
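The "wait, then try again" advice can be automated on the caller's side with exponential backoff. This is a generic sketch under stated assumptions: `RateLimitError` is a hypothetical exception that your own request code would raise when it sees a rate-limit response, and the delay values are placeholders rather than limits documented by this actor.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical: raised by the caller's request code on a rate-limit response."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on RateLimitError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error to the caller
            # Delay doubles each attempt: base, 2*base, 4*base, ... capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 50% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. Reducing concurrency, as suggested above, still matters: backoff only spreads retries out, it does not lower the total request volume.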

Empty or incomplete results

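Cause: The host serving the PDF may block automated access, or the requested content doesn't exist (for example, a wrong URL).
Fix:

Check that the pdfUrl is correct and reachable
Try different parameters, such as pageRange or language
Some sites block automated access; in that case, pass the file as pdfBase64 instead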

Demo data showing instead of real results

Cause: demoMode is still set to true.
Fix: Set demoMode: false and provide your API key(s).

Built by John Rippy | Actor Arsenal
