PDF OCR API - Document Extraction
Pricing
from $0.01 / 1,000 results
Extract text from PDFs including scanned documents. OCR processing, table extraction & structured data output. Process invoices, contracts & forms at scale.
Developer: John Rippy
PDF OCR API
API Integration
This actor connects to an external API service. You'll need valid API credentials from the service provider.
Extract text from PDF files using OCR. Supports scanned documents, images, and multi-page PDFs. Returns structured text with page numbers and confidence scores. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).
Features
- Direct API integration
- Structured JSON output
- Error handling and retries
- Pay-per-event billing
Quick Start
{"pdfUrl": "https://example.com/document.pdf"}
Demo Mode
Set demoMode: true to test with sample data (no charges). When you're ready for real results, set demoMode: false or omit it.
{"demoMode": true,...}
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| pdfUrl | string | Yes* | URL of the PDF file to process |
| pdfBase64 | string | Yes* | Base64-encoded PDF (alternative to URL) |
| language | string | No | OCR language hint (default: eng) |
| pageRange | string | No | Pages to process (e.g., "1-5" or "1,3,5") |
| outputFormat | string | No | Output format: text, json, or markdown |
| detectTables | boolean | No | Attempt to preserve table structure |
| webhookUrl | string | No | Webhook URL for async results |
| demoMode | boolean | No | Return sample output without processing |

*Either pdfUrl or pdfBase64 is required.
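As a rough illustration of the table above, the following Python sketch assembles an input object from the documented parameter names. The helper name `build_input`, its keyword arguments, and the local validation rules (including the page-range pattern, which also accepts mixed forms like "1-3,7") are assumptions made for this example; the actor itself may validate differently.

```python
import base64
import re
from typing import Any, Dict, Optional

# Accepts "1-5", "1,3,5", or a mix such as "1-3,7".
# The mixed form is an assumption; the docs only show the first two.
_PAGE_RANGE = re.compile(r"^\d+(-\d+)?(,\d+(-\d+)?)*$")
_FORMATS = {"text", "json", "markdown"}


def build_input(
    pdf_url: Optional[str] = None,
    pdf_bytes: Optional[bytes] = None,
    language: str = "eng",
    page_range: Optional[str] = None,
    output_format: Optional[str] = None,
    detect_tables: bool = False,
) -> Dict[str, Any]:
    """Assemble an input object using the parameter names from the table."""
    if pdf_url is None and pdf_bytes is None:
        raise ValueError("either pdf_url or pdf_bytes is required")

    run_input: Dict[str, Any] = {
        "language": language,
        "detectTables": detect_tables,
    }
    if pdf_url is not None:
        run_input["pdfUrl"] = pdf_url
    if pdf_bytes is not None:
        # pdfBase64 carries the raw file contents, Base64-encoded.
        run_input["pdfBase64"] = base64.b64encode(pdf_bytes).decode("ascii")
    if page_range is not None:
        if not _PAGE_RANGE.match(page_range):
            raise ValueError(f"unexpected pageRange: {page_range!r}")
        run_input["pageRange"] = page_range
    if output_format is not None:
        if output_format not in _FORMATS:
            raise ValueError(f"outputFormat must be one of {sorted(_FORMATS)}")
        run_input["outputFormat"] = output_format
    return run_input
```

For instance, `build_input(pdf_url="https://example.com/document.pdf", page_range="1-5")` returns a dictionary that can be serialized with `json.dumps` and used as the actor input.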
Output Format
{
  "success": true,
  "fileName": "document.pdf",
  "totalPages": 5,
  "processedPages": 5,
  "language": "eng",
  "processingTime": 2.3,
  "pages": [
    {
      "pageNumber": 1,
      "text": "This is the extracted text from page 1...",
      "confidence": 95.2,
      "wordCount": 342,
      "hasImages": true,
      "tables": [
        {
          "rows": 5,
          "columns": 3,
          "data": [["Header1", "Header2", "Header3"], ...]
        }
      ]
    }
  ],
  "fullText": "Complete document text concatenated...",
  "wordCount": 1250,
  "averageConfidence": 94.5
}
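To show how a result shaped like the sample above might be consumed, here is a small Python sketch. The function names, the 90.0 confidence threshold, and the idea that the first row of each table's data is a header row are all assumptions made for illustration; only the field names come from the sample output.

```python
from typing import Any, Dict, List


def low_confidence_pages(result: Dict[str, Any], threshold: float = 90.0) -> List[int]:
    """Return page numbers whose confidence is below the threshold.

    The threshold is an arbitrary illustrative value, not a documented one.
    """
    return [
        page["pageNumber"]
        for page in result.get("pages", [])
        if page.get("confidence", 0.0) < threshold
    ]


def tables_as_records(page: Dict[str, Any]) -> List[List[Dict[str, str]]]:
    """Convert each table on a page into a list of row dictionaries.

    Assumes the first row of "data" holds column headers, as the sample
    ("Header1", "Header2", ...) suggests.
    """
    converted = []
    for table in page.get("tables", []):
        data = table.get("data", [])
        if len(data) < 2:
            continue  # no body rows to convert
        header, *rows = data
        converted.append([dict(zip(header, row)) for row in rows])
    return converted


# A hand-written result in the same shape as the sample output.
sample = {
    "success": True,
    "pages": [
        {
            "pageNumber": 1,
            "confidence": 95.2,
            "tables": [{"rows": 2, "columns": 2,
                        "data": [["Item", "Total"], ["Widget", "9.50"]]}],
        },
        {"pageNumber": 2, "confidence": 71.0, "tables": []},
    ],
}

print(low_confidence_pages(sample))
print(tables_as_records(sample["pages"][0]))
```

Pages flagged by `low_confidence_pages` are reasonable candidates for manual review or for a re-run with a different language hint.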
Pricing
This actor uses pay-per-event billing:
data_point: $0.01 per result
Use Cases
Document Digitization
- Archive processing: Make historical documents searchable
- Paper to digital: Convert scanned documents to text
- Record keeping: Digitize contracts, invoices, receipts
Data Extraction
- Invoice processing: Extract line items, totals, dates
- Form processing: Pull data from scanned forms
- Contract analysis: Extract key terms and clauses
Research & Academia
- Academic papers: Extract text from PDF research papers
- Book scanning: Digitize book chapters and pages
- Citation extraction: Pull references from documents
Legal & Compliance
- Legal discovery: Process large document sets
- Contract review: Extract text for analysis
- Compliance audits: Digitize paper records
Developers
- API integration: RESTful JSON responses
- Webhook support: Async processing for large documents
- Multiple formats: Text, JSON, or Markdown output
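For the webhook option listed above, a receiving endpoint could look roughly like the following Python sketch, built only on the standard library. The payload format sent to webhookUrl is not documented here, so this assumes the POST body is the same JSON object shown under Output Format; `summarize_result`, `OcrWebhookHandler`, `serve`, and the port are hypothetical names and values.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict


def summarize_result(body: bytes) -> Dict[str, Any]:
    """Parse a webhook body, assumed to match the Output Format object."""
    payload = json.loads(body.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return {
        "ok": bool(payload.get("success")),
        "pages": payload.get("processedPages"),
        "text": payload.get("fullText", ""),
    }


class OcrWebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed results and replies 204, or 400 on an unparsable body."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_result(self.rfile.read(length))
        except ValueError:  # also covers JSON and UTF-8 decode errors
            self.send_response(400)
            self.end_headers()
            return
        print(f"received {summary['pages']} page(s), ok={summary['ok']}")
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver locally until interrupted."""
    HTTPServer(("127.0.0.1", port), OcrWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver on localhost. The URL passed as webhookUrl would need to be publicly reachable, for example through a tunnel or a deployed service.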
Common Problems & Solutions
"Invalid API key" error
Cause: Your API key is wrong, expired, or doesn't have the right permissions.
Fix: Double-check your API key. Make sure you copied it exactly, without extra spaces.
"Rate limit exceeded" error
Cause: You've hit the API's rate limits.
Fix: Wait a few minutes, then try again. Consider reducing the number of concurrent requests.
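One common way to automate the "wait, then try again" advice is exponential backoff with jitter. The Python sketch below is a generic illustration, not the actor's built-in retry logic: `RateLimitError` is a hypothetical exception that your own calling code would raise on a rate-limit response, and the delay values are placeholders to tune against the actual limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception; raise it from your own code on a rate-limit response."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error to the caller
            # Double the wait each time, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists so the wait can be replaced in tests. In normal use, wrap the request in a zero-argument function and pass it as `fn`.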
Empty or incomplete results
Cause: The target may have anti-scraping protection, or the data doesn't exist.
Fix:
- Check that pdfUrl is correct
- Try different parameters
- Some sites may block automated access
Demo data showing instead of real results
Cause: demoMode is still set to true.
Fix: Set demoMode: false and provide your API key(s).
Built by John Rippy | Actor Arsenal