PDF Intelligence
Pricing
from $0.01 / 1,000 results
Stop fighting PDFs. Extract text, tables, and insights from any document, scanned or digital. Get RAG-ready chunks for LangChain & LlamaIndex. AI-powered summaries, classification, entity extraction. Use our API keys or bring your own (50% discount). From PDF chaos to clean data in minutes.
Developer

Marielise
PDF Intelligence - AI-Powered PDF Analysis, OCR & RAG Preparation
Extract text, tables, and AI insights from any PDF in seconds.
Transform PDFs into structured, actionable data with AI-powered analysis. Extract text with 95%+ accuracy, automatically OCR scanned documents, detect tables with AI precision, and prepare content for RAG workflows.
Quick Start
Get results in 30 seconds:
- Click Start - the default example PDF runs automatically
- View results in the Output tab
- Switch to AI Analysis view for intelligent insights
No configuration needed for basic extraction!
What This Actor Does
Pricing
Transparent pay-per-use pricing. Only pay for what you process.
Base Processing
| Event | Price | Description |
|---|---|---|
| Page Processed | $0.002 | Per PDF page extracted |
| Document Analyzed | $0.01 | Metadata extraction |
| RAG Chunking | $0.02 | Chunk preparation |
AI Features (Require API Key)
| Event | Price | Description |
|---|---|---|
| OCR Page | $0.03 | AI Vision OCR per page |
| AI Table Extraction | $0.015 | Intelligent table detection |
| AI Document Analysis | $0.04 | Full AI analysis |
Pricing Examples
| Use Case | What You Get | Cost |
|---|---|---|
| 10-page PDF text extraction | Text + metadata | ~$0.03 |
| 50-page PDF with AI analysis | Text + AI insights | ~$0.14 |
| RAG preparation (20 pages) | Chunks ready for vectors | ~$0.06 |
| Scanned PDF OCR (5 pages) | OCR text + analysis | ~$0.19 |
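The example costs above can be reproduced by multiplying event counts by the listed unit prices. The sketch below is a minimal estimator: `estimateCost` and the `PRICES` key names are hypothetical, and which events a given action actually triggers is an assumption, so the caller passes explicit counts.

```javascript
// Per-event prices copied from the pricing tables above (USD).
const PRICES = {
  pageProcessed: 0.002,
  documentAnalyzed: 0.01,
  ragChunking: 0.02,
  ocrPage: 0.03,
  aiTableExtraction: 0.015,
  aiDocumentAnalysis: 0.04,
};

// Hypothetical helper: sums (event count x unit price) over all events.
function estimateCost(events) {
  let total = 0;
  for (const [name, count] of Object.entries(events)) {
    if (!Object.hasOwn(PRICES, name)) throw new Error(`Unknown event: ${name}`);
    total += PRICES[name] * count;
  }
  // Round to a tenth of a cent to hide floating-point noise.
  return Math.round(total * 1000) / 1000;
}

// 10-page text extraction: 10 pages + 1 document analyzed.
console.log(estimateCost({ pageProcessed: 10, documentAnalyzed: 1 })); // 0.03
// RAG preparation for 20 pages: 20 pages + 1 chunking event.
console.log(estimateCost({ pageProcessed: 20, ragChunking: 1 })); // 0.06
```

The listed examples are approximate, so an estimate that counts additional events (for instance page processing on top of OCR) may come out slightly higher than the table.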
How to Use
Option 1: Apify Console (Easiest)
- Enter your PDF URL in the PDF URL field
- Select an action (Extract Text, Extract Tables, etc.)
- Click Start
- View results in the Output tab
Option 2: Apify API
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('cvs/pdf-intelligence').call({
    pdfUrl: 'https://example.com/document.pdf',
    action: 'full_analysis',
});

// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);
```
Option 3: Direct HTTP API
```bash
curl -X POST "https://api.apify.com/v2/acts/cvs~pdf-intelligence/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"pdfUrl": "https://example.com/document.pdf", "action": "extract_text"}'
```
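The `runs` endpoint starts a run and returns without waiting for results. For short documents, the Apify API's `run-sync-get-dataset-items` endpoint can return the dataset items in the same response. The sketch below only shows how such a request might be assembled; `buildSyncRequest` and `runAndGetItems` are hypothetical helper names, and passing the token as a Bearer header instead of a query parameter is a choice made here, not something this README prescribes.

```javascript
// Builds the URL and fetch options for Apify's synchronous run endpoint.
// The Actor ID "cvs~pdf-intelligence" is taken from the curl example.
function buildSyncRequest(token, input) {
  const url = new URL(
    'https://api.apify.com/v2/acts/cvs~pdf-intelligence/run-sync-get-dataset-items'
  );
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Keeps the token out of the URL and out of access logs.
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request and returns the dataset items as parsed JSON.
// Requires a runtime with a global fetch (Node.js 18 or newer).
async function runAndGetItems(token, input) {
  const { url, options } = buildSyncRequest(token, input);
  const response = await fetch(url, options);
  if (!response.ok) throw new Error(`Apify API returned HTTP ${response.status}`);
  return response.json();
}
```

A call such as `runAndGetItems('YOUR_TOKEN', { pdfUrl: 'https://example.com/document.pdf', action: 'extract_text' })` would then resolve to the array of result items, assuming the run finishes within the synchronous endpoint's time limit.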
Option 4: Claude Desktop (MCP)
Add to your Claude Desktop config:
```json
{
  "mcpServers": {
    "pdf-intelligence": {
      "url": "https://cvs--pdf-intelligence.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}
```
Input Parameters
Basic Settings
| Parameter | Type | Default | Description |
|---|---|---|---|
| `pdfUrl` | string | Example PDF | URL of PDF to process |
| `pdfContent` | string | - | Base64-encoded PDF (alternative to URL) |
| `action` | string | `extract_text` | What to extract (see below) |
| `maxPages` | integer | 0 (all) | Limit pages to process |
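When the PDF is a local file rather than a URL, `pdfContent` takes the file as a Base64 string. The sketch below shows one way to build the input in Node.js. `buildInput` is a hypothetical helper, and the rule that exactly one of `pdfUrl` and `pdfContent` should be supplied is an assumption based on the table describing them as alternatives.

```javascript
// Hypothetical helper: builds Actor input from either a URL or raw PDF bytes.
function buildInput({ pdfUrl, pdfBytes, action = 'extract_text', maxPages = 0 }) {
  // Assumption: the two sources are mutually exclusive.
  if (Boolean(pdfUrl) === Boolean(pdfBytes)) {
    throw new Error('Provide exactly one of pdfUrl or pdfBytes');
  }
  const input = { action, maxPages };
  if (pdfUrl) {
    input.pdfUrl = pdfUrl;
  } else {
    // Buffer.from accepts a Buffer or Uint8Array and encodes it as Base64.
    input.pdfContent = Buffer.from(pdfBytes).toString('base64');
  }
  return input;
}

// With a real file: buildInput({ pdfBytes: fs.readFileSync('report.pdf') })
const sample = buildInput({ pdfBytes: Buffer.from('%PDF-1.7'), action: 'get_metadata' });
console.log(sample.pdfContent); // JVBERi0xLjc=
```

Base64 inflates the payload by roughly a third, so a URL is usually the lighter option for large files.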
Actions Available
| Action | Description |
|---|---|
| `extract_text` | Get all text content with page markers |
| `extract_tables` | Extract tabular data as JSON/CSV/Markdown |
| `get_metadata` | Document properties (title, author, dates) |
| `chunk_for_rag` | Split into chunks for vector databases |
| `full_analysis` | All of the above combined |
AI Configuration
| Parameter | Type | Description |
|---|---|---|
| `googleApiKey` | string | Google API key for Gemini (recommended) |
| `openaiApiKey` | string | OpenAI key for GPT-4 Vision OCR |
| `anthropicApiKey` | string | Anthropic key for Claude Vision |
| `preferredAiProvider` | string | "auto", "gemini", "openai", or "anthropic" |
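A small client-side check can catch a mismatch between `preferredAiProvider` and the keys actually supplied before a run is started. The sketch below is one possible approach: `buildAiConfig` is a hypothetical helper, and whether the Actor itself rejects such a mismatch or falls back to another provider is not stated here.

```javascript
// Maps each documented key parameter to the provider name it enables.
const KEY_PARAMS = {
  googleApiKey: 'gemini',
  openaiApiKey: 'openai',
  anthropicApiKey: 'anthropic',
};
const PROVIDERS = ['auto', 'gemini', 'openai', 'anthropic'];

// Hypothetical helper: keeps only the keys that are set and verifies that an
// explicitly preferred provider has a matching key.
function buildAiConfig(keys, preferredAiProvider = 'auto') {
  if (!PROVIDERS.includes(preferredAiProvider)) {
    throw new Error(`Unknown provider "${preferredAiProvider}"`);
  }
  const config = { preferredAiProvider };
  const available = [];
  for (const [param, provider] of Object.entries(KEY_PARAMS)) {
    if (keys[param]) {
      config[param] = keys[param];
      available.push(provider);
    }
  }
  if (preferredAiProvider !== 'auto' && !available.includes(preferredAiProvider)) {
    throw new Error(`No API key supplied for provider "${preferredAiProvider}"`);
  }
  return config;
}

// Example: read the key from an environment variable (the name is an assumption).
//   const ai = buildAiConfig({ googleApiKey: process.env.GOOGLE_API_KEY }, 'gemini');
```

The returned object can be spread into the Actor input next to `pdfUrl` and `action`.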
Output Format
Example Output
```json
{
  "success": true,
  "overview": {
    "summary": "Technical report on web accessibility guidelines...",
    "documentType": "technical",
    "keyFindings": [
      "Contains accessibility standards",
      "Includes implementation examples"
    ],
    "confidence": "high"
  },
  "stats": {
    "pageCount": 12,
    "wordCount": 3450,
    "tableCount": 3,
    "chunkCount": 15,
    "processingTimeMs": 2340
  },
  "quality": {
    "score": 92,
    "issues": [],
    "recommendations": []
  },
  "content": {
    "text": "Full extracted text...",
    "tables": [...],
    "metadata": {...}
  }
}
```
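A result item shaped like the example can be condensed into a short report. In the sketch below, `describeResult` is a hypothetical helper. The field names follow the example output, and the optional chaining reflects an assumption that some fields may be absent for actions other than `full_analysis`.

```javascript
// Hypothetical helper: condenses one result item into a one-line report.
function describeResult(item) {
  if (!item?.success) return 'Processing failed';
  // Fall back to zero or placeholders when a section is missing.
  const { pageCount = 0, wordCount = 0, tableCount = 0 } = item.stats ?? {};
  const type = item.overview?.documentType ?? 'unknown';
  const score = item.quality?.score ?? 'n/a';
  return `${type} document: ${pageCount} pages, ${wordCount} words, ${tableCount} tables, quality ${score}`;
}

const example = {
  success: true,
  overview: { documentType: 'technical', confidence: 'high' },
  stats: { pageCount: 12, wordCount: 3450, tableCount: 3 },
  quality: { score: 92, issues: [], recommendations: [] },
};
console.log(describeResult(example));
// technical document: 12 pages, 3450 words, 3 tables, quality 92
```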
Output Views in Console
| View | What It Shows |
|---|---|
| Summary | AI-generated executive summary |
| AI Analysis | Entities, topics, action items |
| Quality Report | Score, confidence, recommendations |
| Metadata | Title, author, dates, page count |
| Content | Extracted text and tables |
| RAG Chunks | Prepared chunks for vector DBs |
| Full Output | Complete raw JSON |
Use Cases
📄 Invoice Processing
Extract line items, totals, and vendor information automatically.
```json
{
  "pdfUrl": "https://example.com/invoice.pdf",
  "action": "extract_tables",
  "googleApiKey": "your-key"
}
```
📋 Contract Analysis
Extract key clauses, parties, dates, and obligations from legal documents.
```json
{
  "pdfUrl": "https://example.com/contract.pdf",
  "action": "full_analysis",
  "googleApiKey": "your-key"
}
```
📚 Research Paper RAG
Chunk academic papers with semantic awareness for better retrieval.
```json
{
  "pdfUrl": "https://example.com/paper.pdf",
  "action": "chunk_for_rag",
  "chunkSize": 500,
  "semanticChunking": true,
  "googleApiKey": "your-key"
}
```
🔍 Scanned Document OCR
Convert scanned PDFs to searchable text.
```json
{
  "pdfUrl": "https://example.com/scanned.pdf",
  "action": "extract_text",
  "enableOcr": true,
  "googleApiKey": "your-key"
}
```
FAQ
Limitations
| Limitation | Details |
|---|---|
| Max file size | 50MB |
| Output truncation | Text: 100k chars, Chunks: 50 items (full data in dataset) |
| OCR requirement | Requires AI API key and embedded images in PDF |
| Rate limit | 100 requests/minute per client |
| Memory | 4GB recommended, up to 16GB for large documents |
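Two of these limits can be checked on the client before any request is sent. The sketch below is an illustration only: `assertWithinSizeLimit` and `createRateLimiter` are hypothetical helpers, the 50MB limit is read conservatively as 50,000,000 bytes, and the sliding one-minute window is an assumption about how the rate limit is counted.

```javascript
const MAX_FILE_BYTES = 50 * 1000 * 1000; // 50MB, read conservatively (assumption)
const MAX_REQUESTS_PER_MINUTE = 100;

// Throws if a PDF of the given size would exceed the documented file limit.
function assertWithinSizeLimit(byteLength) {
  if (byteLength > MAX_FILE_BYTES) {
    throw new Error(`PDF is ${byteLength} bytes, limit is ${MAX_FILE_BYTES}`);
  }
}

// Sliding-window limiter. The returned function records a request and
// returns 0 when it may proceed, or the number of milliseconds to wait.
function createRateLimiter(limit = MAX_REQUESTS_PER_MINUTE, windowMs = 60_000) {
  const timestamps = [];
  return function msUntilAllowed(now = Date.now()) {
    // Drop requests that have left the window.
    while (timestamps.length && now - timestamps[0] >= windowMs) timestamps.shift();
    if (timestamps.length < limit) {
      timestamps.push(now);
      return 0;
    }
    // The oldest request in the window decides when a slot frees up.
    return windowMs - (now - timestamps[0]);
  };
}
```

A caller would check `msUntilAllowed()` before each request and sleep for the returned duration when it is non-zero.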
Error Codes
| Code | Description | Solution |
|---|---|---|
| `VALIDATION_ERROR` | Invalid input | Check parameter types and values |
| `INVALID_PDF` | Corrupted PDF | Ensure PDF is valid and not encrypted |
| `PROCESSING_ERROR` | Runtime error | Retry the request |
| `RESOURCE_LIMIT` | File too large | Use smaller file or increase memory |
| `RATE_LIMIT_EXCEEDED` | Too many requests | Wait and retry |
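The table suggests retrying for two of these codes and fixing the input for the rest. The sketch below turns that into a retry wrapper with exponential backoff. `isRetryable`, `backoffDelay`, and `withRetry` are hypothetical helpers, and the assumption that a failure surfaces as an `Error` with a `code` property is made here for illustration; how the Actor actually reports codes is not documented in this README.

```javascript
// Codes that the table above resolves by retrying or waiting.
const RETRYABLE_CODES = new Set(['PROCESSING_ERROR', 'RATE_LIMIT_EXCEEDED']);

function isRetryable(code) {
  return RETRYABLE_CODES.has(code);
}

// Exponential backoff in milliseconds: base, 2x, 4x, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30_000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `task` (any async function) and retries it on retryable codes.
// Assumption: failures are thrown as Errors carrying a `code` property.
async function withRetry(task, maxAttempts = 3) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      // Give up on non-retryable codes or when attempts are exhausted.
      if (!isRetryable(err.code) || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

Codes such as `VALIDATION_ERROR` and `INVALID_PDF` are rethrown immediately, since repeating the same request would fail the same way.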
Technical Details
- Runtime: Node.js 22
- Memory: 4GB default, 16GB max
- PDF Libraries: pdf-parse, pdf-lib
- AI Models: Gemini 2.5 Flash, GPT-4V, Claude Vision
- Protocols: MCP (Model Context Protocol), REST API
Changelog
v3.0.0
- AI Document Analysis with executive summary, entities, and classification
- 7 specialized output views in Apify Console
- Memory-efficient streaming for 100+ page documents
- Gemini 2.5 Flash as default AI provider
v2.1.0
- AI-powered OCR with Vision APIs
- Semantic chunking with AI boundary detection
- Multi-provider AI support (OpenAI, Anthropic, Gemini)
v2.0.0
- Dual operation modes (One Click and BYOK)
- HTTP REST API for external clients
- Pay-per-event pricing model
Support
- Issues: Report bugs on GitHub
- Questions: Contact via Apify Console
- Documentation: This README and input schema tooltips
Built with ❤️ using Apify SDK