Pricing

from $0.01 / 1,000 results

PDF Intelligence

Stop fighting PDFs. Extract text, tables, and insights from any document, scanned or digital. Get RAG-ready chunks for LangChain & LlamaIndex. AI-powered summaries, classification, entity extraction. Use our API keys or bring your own (50% discount). From PDF chaos to clean data in minutes.

Developer

Marielise

Last modified: a month ago

PDF Intelligence - AI-Powered PDF Analysis, OCR & RAG Preparation

Extract text, tables, and AI insights from any PDF in seconds.

Transform PDFs into structured, actionable data with AI-powered analysis. Extract text with 95%+ accuracy, run OCR on scanned documents automatically, detect tables with AI-assisted extraction, and prepare content for RAG workflows.

Quick Start

Get results in 30 seconds:

Click Start - the default example PDF runs automatically
View results in the Output tab
Switch to AI Analysis view for intelligent insights

No configuration needed for basic extraction!

What This Actor Does

Core Features

Text Extraction - Clean text from any PDF
AI-Powered OCR - Convert scanned PDFs to text
Table Detection - Extract structured table data
RAG Chunking - Split for vector databases
AI Analysis - Summary, entities, classification

Output Includes

Executive summary of document
Document type classification
Named entity extraction (people, orgs, dates)
Key topics and themes
Action items and recommendations
Quality score and confidence level

Pricing

Transparent pay-per-use pricing. Only pay for what you process.

Base Processing

Event	Price	Description
Page Processed	$0.002	Per PDF page extracted
Document Analyzed	$0.01	Metadata extraction
RAG Chunking	$0.02	Chunk preparation

AI Features (Require API Key)

Event	Price	Description
OCR Page	$0.03	AI Vision OCR per page
AI Table Extraction	$0.015	Intelligent table detection
AI Document Analysis	$0.04	Full AI analysis

Pricing Examples

Use Case	What You Get	Cost
10-page PDF text extraction	Text + metadata	~$0.03
50-page PDF with AI analysis	Text + AI insights	~$0.14
RAG preparation (20 pages)	Chunks ready for vectors	~$0.06
Scanned PDF OCR (5 pages)	OCR text + analysis	~$0.19
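
The arithmetic behind the examples can be reproduced with a small estimator. This is a sketch, not an official calculator: the prices are copied from the two tables above, while the event names (`pageProcessed`, `ocrPage`, and so on) and the choice of which events each use case triggers are assumptions that happen to reproduce the listed figures.

```javascript
// Per-event prices in USD, copied from the pricing tables above.
// The key names are hypothetical labels, not identifiers used by the actor.
const PRICES = {
    pageProcessed: 0.002,
    documentAnalyzed: 0.01,
    ragChunking: 0.02,
    ocrPage: 0.03,
    aiTableExtraction: 0.015,
    aiDocumentAnalysis: 0.04,
};

// Sum price * count for each event and round to a tenth of a cent
// to hide floating-point noise.
function estimateCost(events) {
    let total = 0;
    for (const [name, count] of Object.entries(events)) {
        if (!Object.hasOwn(PRICES, name)) {
            throw new Error(`Unknown event: ${name}`);
        }
        total += PRICES[name] * count;
    }
    return Math.round(total * 1000) / 1000;
}

// One reading of the four examples (assumed event combinations):
console.log(estimateCost({ pageProcessed: 10, documentAnalyzed: 1 }));   // 0.03
console.log(estimateCost({ pageProcessed: 50, aiDocumentAnalysis: 1 })); // 0.14
console.log(estimateCost({ pageProcessed: 20, ragChunking: 1 }));        // 0.06
console.log(estimateCost({ ocrPage: 5, aiDocumentAnalysis: 1 }));        // 0.19
```

Actual charges depend on which events a run emits, so treat the result as a rough estimate.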

How to Use

Option 1: Apify Console (Easiest)

Enter your PDF URL in the PDF URL field
Select an action (Extract Text, Extract Tables, etc.)
Click Start
View results in the Output tab

Option 2: Apify API

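// Requires the apify-client package and an ES module context (top-level await).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('cvs/pdf-intelligence').call({
    pdfUrl: 'https://example.com/document.pdf',
    action: 'full_analysis'
});

// Results are stored in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);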

Option 3: Direct HTTP API

curl -X POST "https://api.apify.com/v2/acts/cvs~pdf-intelligence/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pdfUrl": "https://example.com/document.pdf",
    "action": "extract_text"
  }'
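
The curl call above starts a run and returns the run object, not the extracted data. When the results are needed in a single request, the Apify API also offers a synchronous endpoint, `run-sync-get-dataset-items`. The sketch below builds such a request with the token in an `Authorization` header instead of the query string. The helper names `buildSyncRunRequest` and `runAndGetItems` are illustrative, and the approach assumes the run is short enough to finish while the HTTP connection stays open.

```javascript
const API_BASE = 'https://api.apify.com/v2';

// Build the URL and fetch options for a synchronous run.
// The REST API addresses actors as "username~actor-name".
function buildSyncRunRequest(actorId, input, token) {
    const apiActorId = actorId.replace('/', '~');
    return {
        url: `${API_BASE}/acts/${apiActorId}/run-sync-get-dataset-items`,
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                Authorization: `Bearer ${token}`,
            },
            body: JSON.stringify(input),
        },
    };
}

// Send the request and return the dataset items as parsed JSON.
// Uses the global fetch available in Node.js 18 and later.
async function runAndGetItems(actorId, input, token) {
    const { url, options } = buildSyncRunRequest(actorId, input, token);
    const response = await fetch(url, options);
    if (!response.ok) {
        throw new Error(`Apify API returned HTTP ${response.status}`);
    }
    return response.json();
}
```

For long documents, starting the run asynchronously and polling for completion is likely the safer choice.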

Option 4: Claude Desktop (MCP)

Add to your Claude Desktop config:

{
  "mcpServers": {
    "pdf-intelligence": {
      "url": "https://cvs--pdf-intelligence.apify.actor/mcp",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Input Parameters

Basic Settings

Parameter	Type	Default	Description
`pdfUrl`	string	Example PDF	URL of PDF to process
`pdfContent`	string	-	Base64-encoded PDF (alternative to URL)
`action`	string	extract_text	What to extract (see below)
`maxPages`	integer	0 (all)	Limit pages to process

Actions Available

Action	Description
`extract_text`	Get all text content with page markers
`extract_tables`	Extract tabular data as JSON/CSV/Markdown
`get_metadata`	Document properties (title, author, dates)
`chunk_for_rag`	Split into chunks for vector databases
`full_analysis`	All of the above combined
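
As an illustration of how the basic settings and actions fit together, the following sketch assembles an input object from either a URL or raw PDF bytes. The function `buildInput` and its `pdfBytes` option are hypothetical helpers, not part of the actor. The rule that exactly one source must be given is an assumption, and the 50MB check mirrors the limit listed under Limitations (interpreted here as binary megabytes).

```javascript
// Action names copied from the Actions Available table.
const VALID_ACTIONS = new Set([
    'extract_text',
    'extract_tables',
    'get_metadata',
    'chunk_for_rag',
    'full_analysis',
]);

// Assumed interpretation of the documented 50MB file size limit.
const MAX_PDF_BYTES = 50 * 1024 * 1024;

function buildInput({ pdfUrl, pdfBytes, action = 'extract_text', maxPages = 0 }) {
    // Assumption: a request names exactly one PDF source.
    if (Boolean(pdfUrl) === Boolean(pdfBytes)) {
        throw new Error('Provide exactly one of pdfUrl or pdfBytes');
    }
    if (!VALID_ACTIONS.has(action)) {
        throw new Error(`Unknown action: ${action}`);
    }
    if (!Number.isInteger(maxPages) || maxPages < 0) {
        throw new Error('maxPages must be a non-negative integer (0 = all pages)');
    }
    const input = { action, maxPages };
    if (pdfUrl) {
        input.pdfUrl = pdfUrl;
    } else {
        if (pdfBytes.length > MAX_PDF_BYTES) {
            throw new Error('PDF exceeds the 50MB limit');
        }
        // pdfContent expects the PDF as a Base64 string.
        input.pdfContent = Buffer.from(pdfBytes).toString('base64');
    }
    return input;
}
```

With a local file, `pdfBytes` could come from `fs.readFileSync('document.pdf')`, and the returned object can be passed as the actor input.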

AI Configuration

Parameter	Type	Description
`googleApiKey`	string	Google API key for Gemini (recommended)
`openaiApiKey`	string	OpenAI key for GPT-4 Vision OCR
`anthropicApiKey`	string	Anthropic key for Claude Vision
`preferredAiProvider`	string	"auto", "gemini", "openai", or "anthropic"
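
To keep API keys out of source files, the AI fields can be filled from environment variables. The sketch below is one possible approach. The variable names (`GOOGLE_API_KEY`, `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) and the helper `aiConfigFromEnv` are assumptions, and the check that a named provider has a matching key is a client-side convenience rather than documented actor behavior.

```javascript
// Input field that carries the key for each named provider (from the table above).
const PROVIDER_KEY_FIELDS = {
    gemini: 'googleApiKey',
    openai: 'openaiApiKey',
    anthropic: 'anthropicApiKey',
};

// Hypothetical environment variable names for each input field.
const ENV_NAMES = {
    googleApiKey: 'GOOGLE_API_KEY',
    openaiApiKey: 'OPENAI_API_KEY',
    anthropicApiKey: 'ANTHROPIC_API_KEY',
};

function aiConfigFromEnv(env = process.env, preferredAiProvider = 'auto') {
    const isNamed = preferredAiProvider !== 'auto';
    if (isNamed && !Object.hasOwn(PROVIDER_KEY_FIELDS, preferredAiProvider)) {
        throw new Error(`Unknown provider: ${preferredAiProvider}`);
    }
    const config = { preferredAiProvider };
    // Copy only the keys that are actually set.
    for (const [field, envName] of Object.entries(ENV_NAMES)) {
        if (env[envName]) {
            config[field] = env[envName];
        }
    }
    // Fail early when a specific provider is requested without its key.
    if (isNamed && !config[PROVIDER_KEY_FIELDS[preferredAiProvider]]) {
        throw new Error(`No API key set for ${preferredAiProvider}`);
    }
    return config;
}
```

The result can be merged into the actor input, for example `{ pdfUrl, action: 'full_analysis', ...aiConfigFromEnv() }`.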

Output Format

Example Output

{
  "success": true,
  "overview": {
    "summary": "Technical report on web accessibility guidelines...",
    "documentType": "technical",
    "keyFindings": ["Contains accessibility standards", "Includes implementation examples"],
    "confidence": "high"
  },
  "stats": {
    "pageCount": 12,
    "wordCount": 3450,
    "tableCount": 3,
    "chunkCount": 15,
    "processingTimeMs": 2340
  },
  "quality": {
    "score": 92,
    "issues": [],
    "recommendations": []
  },
  "content": {
    "text": "Full extracted text...",
    "tables": [...],
    "metadata": {...}
  }
}
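
A consumer of this output will usually want a few fields rather than the whole object. The sketch below reads the fields shown in the example and tolerates missing sections. The helper `summarizeResult` and its `needsReview` heuristic are illustrative assumptions, not something the actor provides.

```javascript
// Reduce one dataset item to the fields a pipeline typically checks.
function summarizeResult(item) {
    if (!item || item.success !== true) {
        throw new Error('Run did not report success');
    }
    const { overview = {}, stats = {}, quality = {} } = item;
    const issues = quality.issues ?? [];
    return {
        documentType: overview.documentType ?? 'unknown',
        summary: overview.summary ?? '',
        pages: stats.pageCount ?? 0,
        tables: stats.tableCount ?? 0,
        qualityScore: quality.score ?? null,
        // Hypothetical heuristic: flag anything that is not high-confidence
        // or that reports quality issues.
        needsReview: overview.confidence !== 'high' || issues.length > 0,
    };
}
```

Applied to the example output above, this returns a `documentType` of `"technical"`, 12 pages, 3 tables, a quality score of 92, and `needsReview: false`.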

Output Views in Console

View	What It Shows
Summary	AI-generated executive summary
AI Analysis	Entities, topics, action items
Quality Report	Score, confidence, recommendations
Metadata	Title, author, dates, page count
Content	Extracted text and tables
RAG Chunks	Prepared chunks for vector DBs
Full Output	Complete raw JSON

Use Cases

📄 Invoice Processing

Extract line items, totals, and vendor information automatically.

{
  "pdfUrl": "https://example.com/invoice.pdf",
  "action": "extract_tables",
  "googleApiKey": "your-key"
}

📋 Contract Analysis

Extract key clauses, parties, dates, and obligations from legal documents.

{
  "pdfUrl": "https://example.com/contract.pdf",
  "action": "full_analysis",
  "googleApiKey": "your-key"
}

📚 Research Paper RAG

Chunk academic papers with semantic awareness for better retrieval.

{
  "pdfUrl": "https://example.com/paper.pdf",
  "action": "chunk_for_rag",
  "chunkSize": 500,
  "semanticChunking": true,
  "googleApiKey": "your-key"
}

🔍 Scanned Document OCR

Convert scanned PDFs to searchable text.

{
  "pdfUrl": "https://example.com/scanned.pdf",
  "action": "extract_text",
  "enableOcr": true,
  "googleApiKey": "your-key"
}

FAQ

Limitations

Limitation	Details
Max file size	50MB
Output truncation	Text: 100,000 characters; chunks: 50 items (full data in dataset)
OCR requirement	Requires AI API key and embedded images in PDF
Rate limit	100 requests/minute per client
Memory	4GB recommended, up to 16GB for large documents

Error Codes

Code	Description	Solution
`VALIDATION_ERROR`	Invalid input	Check parameter types and values
`INVALID_PDF`	Corrupted PDF	Ensure PDF is valid and not encrypted
`PROCESSING_ERROR`	Runtime error	Retry the request
`RESOURCE_LIMIT`	File too large	Use smaller file or increase memory
`RATE_LIMIT_EXCEEDED`	Too many requests	Wait and retry
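
The table suggests retrying on `PROCESSING_ERROR` and waiting before retrying on `RATE_LIMIT_EXCEEDED`. A minimal sketch of that policy with exponential backoff follows. It assumes the caller surfaces the error code as a `code` property on the thrown error, which is not confirmed by this README; the retry count and delays are arbitrary defaults.

```javascript
// Codes the table marks as worth retrying; the others need a changed request.
const RETRYABLE_CODES = new Set(['PROCESSING_ERROR', 'RATE_LIMIT_EXCEEDED']);

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run task(), retrying retryable failures with delays of
// baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, and so on.
async function withRetry(task, { retries = 3, baseDelayMs = 1000, sleep = defaultSleep } = {}) {
    for (let attempt = 0; ; attempt += 1) {
        try {
            return await task();
        } catch (err) {
            // Assumption: the error code is exposed as err.code.
            const retryable = RETRYABLE_CODES.has(err?.code);
            if (!retryable || attempt >= retries) {
                throw err;
            }
            await sleep(baseDelayMs * 2 ** attempt);
        }
    }
}
```

A call might look like `await withRetry(() => startRun(input))`, where `startRun` is whatever function of yours launches the actor. Errors such as `VALIDATION_ERROR` or `INVALID_PDF` are rethrown immediately, since repeating the same request would not help.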

Technical Details

Runtime: Node.js 22
Memory: 4GB default, 16GB max
PDF Libraries: pdf-parse, pdf-lib
AI Models: Gemini 2.5 Flash, GPT-4V, Claude Vision
Protocols: MCP (Model Context Protocol), REST API

Changelog

v3.0.0

AI Document Analysis with executive summary, entities, and classification
7 specialized output views in Apify Console
Memory-efficient streaming for 100+ page documents
Gemini 2.5 Flash as default AI provider

v2.1.0

AI-powered OCR with Vision APIs
Semantic chunking with AI boundary detection
Multi-provider AI support (OpenAI, Anthropic, Gemini)

v2.0.0

Dual operation modes (One Click and BYOK)
HTTP REST API for external clients
Pay-per-event pricing model

Support

Issues: Report bugs on GitHub
Questions: Contact via Apify Console
Documentation: This README and input schema tooltips

Built with ❤️ using Apify SDK
