PDF AI Extractor MCP
Pricing: from $0.50 / 1,000 results
Extracts text, tables, summaries, and structured data from any PDF using OpenAI, Google Gemini, or Claude. Supports bulk AI processing, clean JSON exports, and an AI-ready MCP mode for agent workflows.
Developer: lalithhh
Actor stats: 0 bookmarks, 2 total users, 1 monthly active user
Last modified: 14 days ago
PDF-AI Extractor MCP
Extract any PDF → Clean Text → AI Analysis using OpenAI, Google Gemini, or Anthropic, with optional MCP Agent Server Mode.
PDF-AI Extractor MCP is a dual-mode Apify Actor that downloads a PDF, extracts readable text, analyzes it with your chosen AI model, and returns structured output.
It also runs as an MCP WebSocket server so ChatGPT, Claude, LangChain, and other AI agents can use it as a tool.
Why This Actor?
Businesses and AI workflows often struggle with PDFs:
- PDFs are messy, inconsistent, or scanned
- Extracting structured data is hard
- AI needs clean text to understand documents
- Agents need a tool interface (MCP)
This Actor solves all of it in one place.
Key Features
Smart PDF Extraction
- Downloads PDFs reliably
- Uses `pdf-parse` for robust extraction
- Cleans and normalizes raw PDF text
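The Actor's actual cleanup rules are not documented here, so the following is only a minimal sketch of what "cleans and normalizes" could mean in practice. The function name `cleanPdfText` and every rule in it are assumptions chosen for illustration, not the Actor's implementation.

```javascript
// Hypothetical helper: illustrative cleanup rules, not the Actor's real ones.
function cleanPdfText(raw) {
  return raw
    .replace(/\r\n?/g, "\n") // normalize line endings
    .replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F]/g, "") // strip control characters
    .replace(/(\p{L})-\n(\p{Ll})/gu, "$1$2") // rejoin words hyphenated across lines
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n") // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n") // keep at most one blank line
    .trim();
}

const sample = "Quarterly  Re-\nport\r\n\r\n\r\n\r\nRevenue:\t 42 ";
console.log(JSON.stringify(cleanPdfText(sample)));
// prints "Quarterly Report\n\nRevenue: 42"
```

Keeping the cleanup as a pure string function makes it easy to test separately from downloading and parsing.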
Multi-AI Engine Support
Use any major model you want:
- OpenAI: GPT-4.1, GPT-4o, o3-mini
- Google Gemini: 1.5 Flash / Pro
- Anthropic Claude: Haiku / Sonnet / Opus
AI-Enhanced Document Understanding
Your prompt + extracted PDF text → summaries, structured fields, business insights, compliance checks, custom extraction, etc.
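How the prompt and the extracted text are combined is not specified, so this is a rough sketch of one possible approach. The helper `buildAiInput`, the delimiter lines, and the `maxChars` limit are all hypothetical and only illustrate the idea of sending the instruction together with the document text.

```javascript
// Hypothetical helper: combine the user's instruction with the extracted text.
// maxChars is an assumed safety limit, not a documented Actor setting.
function buildAiInput(prompt, pdfText, maxChars = 20000) {
  const truncated = pdfText.length > maxChars;
  const body = truncated ? pdfText.slice(0, maxChars) : pdfText;
  return {
    truncated,
    text: `${prompt.trim()}\n\n--- PDF TEXT START ---\n${body}\n--- PDF TEXT END ---`,
  };
}

const aiInput = buildAiInput("Summarize this document.", "Invoice 1001 total due 250 USD", 12);
console.log(aiInput.truncated); // prints true
console.log(aiInput.text);
```

Returning a `truncated` flag lets the caller warn when a long PDF was cut before analysis.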
Two Operation Modes
1. NORMAL MODE
Runs once, returns structured JSON.
Perfect for:
- Document automation
- Backend workflows
- Report preparation
- Daily processing
2. MCP MODE (Agent Mode)
Turns into a WebSocket MCP server:
- Agents call `extractPdf()`, `analyze()`, etc.
- ChatGPT, Claude, and LangChain tools are all supported
- Real-time interaction
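MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. Assuming this server accepts those messages as text frames over its WebSocket, a request for the `extractPdf` tool might be built as sketched below. The `pdfUrl` argument name is an assumption; the tool's real argument schema is not documented here.

```javascript
let nextId = 1;

// Build an MCP "tools/call" request as a JSON-RPC 2.0 message.
function buildToolCall(toolName, args) {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Assumption: extractPdf takes a pdfUrl argument; the real schema may differ.
const request = buildToolCall("extractPdf", { pdfUrl: "https://example.com/file.pdf" });
console.log(JSON.stringify(request));
```

A client would send the serialized request over the socket and match the server's reply to it by `id`.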
Input Schema
Required fields (`pdfUrl`, `aiProvider`, and `prompt` apply to normal mode only):
| Field | Description |
|---|---|
| `mode` | `"normal"` or `"mcp"` |
| `pdfUrl` | Public PDF URL |
| `aiProvider` | `"openai"`, `"google"`, or `"anthropic"` |
| `prompt` | AI instruction |
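The table can be read as a small set of validation rules. The sketch below is one way to check an input object before starting a run. `validateInput` is a hypothetical helper and the error messages are illustrative; the Actor's own validation may differ.

```javascript
const PROVIDERS = ["openai", "google", "anthropic"];

// Returns a list of problems; an empty list means the input looks usable.
function validateInput(input) {
  const errors = [];
  if (input.mode !== "normal" && input.mode !== "mcp") {
    errors.push('mode must be "normal" or "mcp"');
  }
  if (input.mode === "normal") {
    try {
      const { protocol } = new URL(input.pdfUrl);
      if (protocol !== "http:" && protocol !== "https:") throw new Error("bad protocol");
    } catch {
      errors.push("pdfUrl must be a public http(s) URL");
    }
    if (!PROVIDERS.includes(input.aiProvider)) {
      errors.push(`aiProvider must be one of: ${PROVIDERS.join(", ")}`);
    }
    if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
      errors.push("prompt must be a non-empty string");
    }
  }
  return errors;
}

console.log(validateInput({ mode: "mcp" })); // prints []
console.log(validateInput({ mode: "normal", pdfUrl: "not a url" }));
```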
Example Input: Normal Mode
{
  "mode": "normal",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "aiProvider": "openai",
  "prompt": "Extract key business information from this PDF."
}
Example Input: MCP Mode
{"mode": "mcp"}
Output Format (Normal Mode)
{
  "mode": "normal",
  "aiProvider": "openai",
  "pdfUrl": "https://example.com/file.pdf",
  "charactersExtracted": 10542,
  "aiResult": "Structured AI-generated content here..."
}
In MCP mode, results stream to the connected AI.
Environment Variables
Create a `.env` file:
OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
ANTHROPIC_API_KEY=your_anthropic_key
MCP_PORT=8080
A ready-made `example.env` is included for users.
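To show how these variables might be used at runtime, here is a small sketch that picks the API key matching the chosen `aiProvider`. The variable names come from the `.env` example. The provider-to-key mapping and the `resolveConfig` helper are assumptions for illustration.

```javascript
// Env var names come from the .env example; the mapping itself is an assumption.
const KEY_BY_PROVIDER = {
  openai: "OPENAI_API_KEY",
  google: "GEMINI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// Hypothetical helper: look up the key for a provider and the MCP port.
function resolveConfig(aiProvider, env = process.env) {
  const keyName = KEY_BY_PROVIDER[aiProvider];
  if (!keyName) throw new Error(`Unknown aiProvider: ${aiProvider}`);
  const apiKey = env[keyName];
  if (!apiKey) throw new Error(`Missing ${keyName} in environment`);
  const mcpPort = Number(env.MCP_PORT ?? 8080); // 8080 matches the example value
  return { apiKey, mcpPort };
}

console.log(resolveConfig("google", { GEMINI_API_KEY: "your_gemini_key" }));
```

Failing early with the name of the missing variable tends to be easier to debug than an authentication error from the provider later in the run.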
Architecture Overview
PDF URL → Downloader → pdf-parse → Cleaned Text → AI Adapter → Final JSON or MCP Stream
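The flow can be sketched as a chain of async stages. This is not the Actor's source: `runPipeline` and the stage names are hypothetical, and the stages are injected as stubs so the sketch runs without network access, `pdf-parse`, or API keys. Counting `charactersExtracted` after cleaning is also an assumption.

```javascript
// Each stage is injected so the flow can be exercised in isolation.
// In a real run, `download` would fetch the PDF bytes, `parse` would wrap
// pdf-parse, and `analyze` would call the selected provider's adapter.
async function runPipeline(input, { download, parse, clean, analyze }) {
  const pdfBuffer = await download(input.pdfUrl);
  const rawText = await parse(pdfBuffer);
  const text = clean(rawText);
  const aiResult = await analyze(input.aiProvider, input.prompt, text);
  return {
    mode: "normal",
    aiProvider: input.aiProvider,
    pdfUrl: input.pdfUrl,
    charactersExtracted: text.length, // assumption: counted after cleaning
    aiResult,
  };
}

// Stub stages standing in for the real downloader, parser, and AI adapter.
const stubs = {
  download: async () => Buffer.from("fake pdf bytes"),
  parse: async () => "  Dummy   PDF file  ",
  clean: (t) => t.replace(/\s+/g, " ").trim(),
  analyze: async (provider, prompt, text) => `[${provider}] ${prompt} (${text.length} chars)`,
};

runPipeline(
  { pdfUrl: "https://example.com/file.pdf", aiProvider: "openai", prompt: "Summarize" },
  stubs
).then((result) => console.log(result));
```

Passing the stages in as arguments keeps the orchestration logic independent of any one AI provider.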
Running Tests
Normal mode:
apify run --purge --input-file=tests/input.normal.json
MCP mode:
apify run --purge --input-file=tests/input.mcp.json
Connect at:
ws://localhost:8080
Project Structure
pdf-ai-extractor-mcp/
│
├── main.js
├── package.json
├── .env
├── .gitignore
│
├── src/
│   ├── orchestrator/orchestrator.js
│   ├── connectors/
│   │   ├── openai/adapter.js
│   │   ├── google/adapter.js
│   │   └── anthropic/adapter.js
│   ├── mcp/
│   │   ├── server.js
│   │   └── handlers.js
│   └── utils/
│       ├── pdfTools.js
│       ├── aiTools.js
│       └── fileManager.js
│
├── tests/
│   ├── input.normal.json
│   └── input.mcp.json
│
└── .actor/
    ├── actor.json
    ├── INPUT_SCHEMA.json
    ├── OUTPUT_SCHEMA.json
    └── dataset_schema.json
Why This Actor Will Do Great on Apify Store
- Multi-AI support is trending
- MCP tools are in demand
- PDF + AI extraction is extremely useful
- Works for enterprise, finance, research, and startups
- No competitors offer a dual (Normal + MCP) mode
- Very high utility, so it is very likely to generate revenue
Support & Feedback
Feel free to reach out with feature ideas or improvements.
Happy automating!
PDF-AI Extractor MCP


