Pdf OCR API
Extract and convert text from PDF documents using advanced optical character recognition technology with support for multiple AI models.
Pricing: from $0.01 / 1,000 results
Rating: 5.0 (3)
Developer: csp (Maintained by Community)
Actor stats: 2 bookmarked, 20 total users, 4 monthly active users, last modified 2 days ago
PDF OCR API - Multi-Model Text Extraction
Features
Multi-Model OCR Support
Choose from 8 different OCR engines based on your needs:
- Google Vision API - High accuracy commercial OCR with excellent language support
- DeepSeek OCR - Advanced AI-powered text extraction
- Amazon Textract - AWS-powered document analysis optimized for PDFs
- Azure AI Vision - Microsoft's computer vision OCR service
- OpenAI GPT-4 Vision - State-of-the-art multimodal AI for complex documents
- Hugging Face - Open-source transformer models for text extraction
- Google Gemini - Latest Google multimodal AI technology
- Native (Tesseract.js) - Free, no API key required, runs entirely in-container
Document Processing Features
- Batch Processing - Process multiple PDFs simultaneously
- Multi-Language Support - English, Spanish, French, German, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Danish
- Structure Preservation - Maintain document layout and formatting
- Page Range Selection - Process specific pages or page ranges
- Multiple Output Formats - JSON, Plain Text, or Markdown
- High Resolution - 300 DPI conversion for optimal OCR accuracy
- Metadata Extraction - Extract PDF metadata (title, author, dates)
- Pay-Per-Page Pricing - Fair billing based on actual pages processed (see ./BILLING.md)
Input Parameters
Required
- ocrModel - OCR model to use (default: "native")
- pdfUrls - Array of PDF document URLs to process
Optional
- language - Document language (default: "eng")
- preserveFormatting - Maintain document structure (default: true)
- extractImages - Extract images from PDF (default: false)
- outputFormat - Output format: "json", "text", or "markdown" (default: "json")
- pageRange - Pages to process: "all", "1-5", "1,3,5" (default: "all")
API Keys (model-specific)
- googleVisionApiKey - For Google Vision API
- deepseekApiKey - For DeepSeek OCR
- awsAccessKeyId, awsSecretAccessKey, awsRegion - For Amazon Textract
- azureEndpoint, azureApiKey - For Azure AI Vision
- openaiApiKey - For OpenAI GPT-4 Vision
- huggingfaceApiKey - For Hugging Face models
- geminiApiKey - For Google Gemini
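Because each premium model needs its own key fields, it can help to check an input object before starting a run. The sketch below is only an illustration: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the table covers only the two model identifiers that appear in this README ("native" and "google-vision"). Identifiers for the other models should be taken from the actor's input schema.

```python
# Key fields each model needs, taken from the "API Keys" list above.
# Only the "native" and "google-vision" identifiers appear in this README;
# add the other models after confirming their identifiers in the input schema.
REQUIRED_KEYS = {
    "native": [],
    "google-vision": ["googleVisionApiKey"],
}


def missing_keys(run_input: dict) -> list[str]:
    """Return the key fields the selected model needs but the input lacks."""
    model = run_input.get("ocrModel", "native")
    if model not in REQUIRED_KEYS:
        raise ValueError(f"unknown model identifier: {model!r}")
    return [name for name in REQUIRED_KEYS[model] if not run_input.get(name)]


if __name__ == "__main__":
    example = {
        "ocrModel": "google-vision",
        "pdfUrls": ["https://example.com/document.pdf"],
    }
    print(missing_keys(example))
```

An empty list means the input carries every key the chosen model is known to need.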
Quick Start
Example Input (Native OCR - No API Key Required)
{
  "ocrModel": "native",
  "pdfUrls": ["https://example.com/document.pdf"],
  "language": "eng",
  "outputFormat": "json",
  "pageRange": "all"
}
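One way to send an input like this from Python is the `apify-client` package. The sketch below is a rough outline rather than official usage for this actor: `build_run_input` and `run_actor` are hypothetical helpers, the actor ID and token are placeholders you must supply, and it assumes the actor writes its results to the run's default dataset.

```python
def build_run_input(pdf_urls, model="native", language="eng",
                    output_format="json", page_range="all"):
    """Assemble an input object using the field names documented above."""
    if not pdf_urls:
        raise ValueError("pdfUrls must contain at least one URL")
    return {
        "ocrModel": model,
        "pdfUrls": list(pdf_urls),
        "language": language,
        "outputFormat": output_format,
        "pageRange": page_range,
    }


def run_actor(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    # Assumption: results are stored in the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would then look like `run_actor("YOUR_APIFY_TOKEN", "<username>/<actor-name>", build_run_input(["https://example.com/document.pdf"]))`, with the real actor ID copied from the Apify Store page.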
Example with Google Vision API
{
  "ocrModel": "google-vision",
  "googleVisionApiKey": "YOUR_API_KEY",
  "pdfUrls": [
    "https://example.com/document.pdf",
    "https://example.com/another-document.pdf"
  ],
  "language": "eng",
  "preserveFormatting": true,
  "outputFormat": "markdown"
}
Process Specific Pages
{
  "ocrModel": "native",
  "pdfUrls": ["https://example.com/large-document.pdf"],
  "pageRange": "1-5,10,15-20",
  "outputFormat": "text"
}
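To preview which pages a `pageRange` string selects, the syntax shown above ("all", "1-5", "1,3,5", and combinations) can be expanded locally. This is a minimal sketch of one plausible reading of that syntax, not the actor's own parser. `parse_page_range` is a hypothetical helper, and dropping pages beyond the end of the document is an assumption.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a pageRange string into a sorted list of 1-based page numbers."""
    if spec.strip().lower() == "all":
        return list(range(1, page_count + 1))
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        # Assumption: pages past the end of the document are silently dropped.
        pages.update(range(start, min(end, page_count) + 1))
    return sorted(pages)


if __name__ == "__main__":
    print(parse_page_range("1-5,10,15-20", page_count=30))
```

Counting the returned list before a run gives a quick estimate of how many pages will be processed.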
Output Format
JSON Output (default)
{
  "pdfUrl": "https://example.com/document.pdf",
  "fileName": "document.pdf",
  "ocrModel": "native",
  "language": "eng",
  "success": true,
  "extractedAt": "2024-11-04T10:30:00.000Z",
  "pageCount": 5,
  "totalCharacters": 12450,
  "averageConfidence": 0.94,
  "pages": [
    {
      "pageNumber": 1,
      "text": "Page 1 content...",
      "confidence": 0.95,
      "width": 2480,
      "height": 3508
    }
  ],
  "fullText": "Complete document text..."
}
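The per-page `confidence` values make it easy to flag pages worth a second pass with a stronger model. The sketch below works on one result item shaped like the JSON above. The helper names and the 0.9 threshold are illustrative choices, and treating `averageConfidence` as a plain mean of the page values is an assumption.

```python
def low_confidence_pages(item: dict, threshold: float = 0.9) -> list[int]:
    """Return page numbers whose confidence falls below the threshold."""
    return [
        page["pageNumber"]
        for page in item.get("pages", [])
        if page.get("confidence", 0.0) < threshold
    ]


def recompute_summary(item: dict) -> dict:
    """Recompute character count and mean confidence from the page list."""
    pages = item.get("pages", [])
    total_chars = sum(len(page.get("text", "")) for page in pages)
    confidences = [page["confidence"] for page in pages if "confidence" in page]
    # Assumption: averageConfidence is the unweighted mean over pages.
    average = sum(confidences) / len(confidences) if confidences else None
    return {"totalCharacters": total_chars, "averageConfidence": average}
```

Pages returned by `low_confidence_pages` could be resubmitted through `pageRange` with a premium model, in line with the cost tips in this README.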
Text Output
{
  "output": "Complete document text as plain string...",
  "pages": [
    {
      "pageNumber": 1,
      "text": "Page 1 content..."
    }
  ]
}
Markdown Output
{
  "output": "# document.pdf\n\n**Pages:** 5\n\n## Page 1\n\nContent...",
  "pages": [
    {
      "pageNumber": 1,
      "markdown": "## Page 1\n\nContent..."
    }
  ]
}
Use Cases
Business & Legal
- Contract analysis and digitization
- Legal document processing
- Invoice and receipt extraction
- Compliance document archiving
Academic & Research
- Research paper text extraction
- Academic document digitization
- Literature review automation
- Citation extraction
Content & Publishing
- Book digitization
- Magazine and newspaper archiving
- Historical document preservation
- Content migration projects
Development & Integration
- Document management systems
- Search and indexing pipelines
- Data extraction workflows
- Archive digitization projects
Supported Languages
- English (eng)
- Spanish (spa)
- French (fra)
- German (deu)
- Italian (ita)
- Portuguese (por)
- Russian (rus)
- Chinese Simplified (chi_sim)
- Japanese (jpn)
- Korean (kor)
- Arabic (ara)
- Danish (dan)
Model Comparison
| Model | Speed | Accuracy | Cost | Best For |
|---|---|---|---|---|
| Native (Tesseract) | ⚡⚡⚡ | 85% | Free | Testing, simple docs |
| Google Vision | ⚡⚡ | 95% | $$ | Production, multi-language |
| Amazon Textract | ⚡⚡ | 96% | $$ | Forms, tables, structured docs |
| Azure Vision | ⚡⚡ | 94% | $$ | Enterprise integration |
| OpenAI GPT-4 | ⚡ | 94% | $$$ | Complex layouts, handwriting |
| Gemini | ⚡⚡ | 93% | $$ | Modern documents |
Best Practices
For Optimal Results
- Use high-quality PDF sources (not scanned at low resolution)
- Select the appropriate language setting
- Use premium models for complex layouts or handwriting
- Process pages in batches for large documents
- Enable formatting preservation for structured documents
Performance Tips
- Use page ranges to process only needed pages
- Batch multiple PDFs in a single run
- Choose Native OCR for simple, clear documents
- Use premium models only when necessary
Cost Optimization
- Start with Native OCR for testing
- Use page ranges to avoid processing unnecessary pages
- Batch process to reduce overhead
- Monitor API costs for premium models
Performance
- Processing Speed: 5-30 seconds per page (varies by model)
- Concurrent Processing: Up to 10 PDFs simultaneously
- Maximum File Size: 100MB per PDF
- Supported Formats: PDF (any version)
- Resolution: 300 DPI conversion
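If you have more PDFs than the stated concurrency of 10, one option is to split the URL list into groups before building run inputs. This is a small sketch under that assumption. `chunk_urls` is a hypothetical helper, and whether the actor queues longer lists on its own is not stated here.

```python
def chunk_urls(pdf_urls, batch_size=10):
    """Split a list of PDF URLs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    urls = list(pdf_urls)
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


if __name__ == "__main__":
    urls = [f"https://example.com/doc-{n}.pdf" for n in range(23)]
    for batch in chunk_urls(urls):
        print(len(batch))
```

Each batch can then be passed as the `pdfUrls` value of a separate run.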
Pricing
This actor uses pay-per-event pricing:
- $0.01 per PDF processed successfully (configurable)
- Failed PDFs are not charged
- Events tracked: pdf_processed
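Taking the per-PDF figure above at face value, the charge for a run can be estimated from the `success` flag on each result item. The sketch below is an estimate only: `estimate_charge` is a hypothetical helper, the $0.01 default comes from the list above (which calls it configurable), and the actual amount billed is whatever Apify reports for the run.

```python
from decimal import Decimal

# Default price per successfully processed PDF, as listed above.
PRICE_PER_PDF = Decimal("0.01")


def estimate_charge(items, price_per_pdf=PRICE_PER_PDF):
    """Estimate the run charge: successful PDFs times the per-PDF price."""
    successful = sum(1 for item in items if item.get("success") is True)
    return successful * price_per_pdf


if __name__ == "__main__":
    results = [{"success": True}, {"success": False}, {"success": True}]
    print(estimate_charge(results))
```

`Decimal` is used instead of floats so that small per-PDF amounts add up without rounding drift.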
Support
For issues, questions, or feature requests:
- Check the Apify documentation
- Review the input schema for parameter details
- Ensure API keys are valid and have sufficient quota
- Verify PDF files are accessible and not corrupted
Version History
v1.0
- Initial release
- Support for 8 OCR models
- Multi-language support (12 languages)
- Batch processing capabilities
- Multiple output formats (JSON, Text, Markdown)
- Page range selection
- Structure preservation
- Pay-per-event pricing
Related Actors
- Receipt OCR API - Specialized for receipt processing
- Invoice OCR API - Optimized for invoice extraction
- Form OCR API - Structured form data extraction
Transform your PDF documents into searchable, structured data!