PDF AI Extractor MCP
Pricing
Pay per event
Extracts text, tables, summaries, and structured data from any PDF using OpenAI, Google Gemini, or Claude. Supports bulk AI processing, clean JSON exports, and an AI-ready MCP mode for agent workflows.
Developer: lalithhh
PDF-AI Extractor MCP
Extract any PDF → Clean Text → AI Analysis using OpenAI, Google Gemini, or Anthropic, with optional MCP Agent Server Mode.
PDF-AI Extractor MCP is a dual-mode Apify Actor that downloads a PDF, extracts readable text, analyzes it with your chosen AI model, and returns structured output.
It also runs as an MCP WebSocket server so ChatGPT, Claude, LangChain, and other AI agents can use it as a tool.
Why This Actor?
Businesses and AI workflows often struggle with PDFs:
- PDFs are messy, inconsistent, or scanned
- Extracting structured data is hard
- AI needs clean text to understand documents
- Agents need a tool interface (MCP)
This Actor solves all of it in one place.
Key Features
Smart PDF Extraction
- Downloads PDFs reliably
- Uses pdf-parse for robust extraction
- Cleans and normalizes raw PDF text
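The cleaning and normalizing step listed above is not specified in detail. A minimal sketch of what such a step might look like follows; the function name `cleanPdfText` and the specific rules are assumptions for illustration, not the Actor's actual code.

```javascript
// Hypothetical helper (not taken from the Actor's source): tidy raw text
// as it typically comes out of a PDF text extractor.
function cleanPdfText(raw) {
  return raw
    .replace(/\r\n?/g, '\n')          // normalize Windows/old-Mac line endings
    .replace(/[ \t]+/g, ' ')          // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, '\n')         // drop spaces around line breaks
    .replace(/(\w)-\n(\w)/g, '$1$2')  // rejoin words hyphenated across lines
    .replace(/\n{3,}/g, '\n\n')       // keep at most one blank line in a row
    .trim();
}

console.log(cleanPdfText('Hello   world\r\n\r\n\r\n\r\nextrac-\ntion  done '));
```

Note that the hyphen rule also joins genuinely hyphenated compounds that happen to break at a line end, so a production version would likely need a dictionary check or a more conservative rule.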
Multi-AI Engine Support
Use any major model you want:
- OpenAI: GPT-4.1, GPT-4o, o3-mini
- Google Gemini: 1.5 Flash / Pro
- Anthropic Claude: Haiku / Sonnet / Opus
AI-Enhanced Document Understanding
Your prompt + extracted PDF text → summaries, structured fields, business insights, compliance checks, custom extraction, etc.
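How the prompt and the extracted text are combined is not documented. One plausible approach, sketched below, wraps the PDF text in clear delimiters and truncates it to a character budget. The name `buildAiPrompt`, the delimiter strings, and the 12000-character default are all assumptions.

```javascript
// Hypothetical helper: combine the user's instruction with the extracted text.
// The delimiters and the default character budget are illustrative choices.
function buildAiPrompt(instruction, pdfText, maxChars = 12000) {
  const truncated = pdfText.length > maxChars;
  const body = truncated ? pdfText.slice(0, maxChars) : pdfText;
  return [
    instruction.trim(),
    '',
    '--- PDF TEXT START ---',
    body,
    '--- PDF TEXT END ---',
    // Tell the model when it is only seeing part of the document.
    truncated ? `(Text truncated to ${maxChars} characters.)` : '',
  ].join('\n').trimEnd();
}

console.log(buildAiPrompt('Summarize this document.', 'Quarterly revenue rose.'));
```

Delimiting the document text makes it easier for the model to tell the instruction apart from the content it should analyze.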
Two Operation Modes
1. NORMAL MODE
Runs once, returns structured JSON.
Perfect for:
- Document automation
- Backend workflows
- Report preparation
- Daily processing
2. MCP MODE (Agent Mode)
Turns into a WebSocket MCP server:
- Agents call extractPdf(), analyze(), etc.
- ChatGPT, Claude, and LangChain tools are all supported
- Real-time interaction
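To illustrate how tool calls such as extractPdf() and analyze() might be routed, here is a rough sketch of a dispatcher for JSON-RPC 2.0 messages, the envelope MCP is built on. The stub tools, the `handleToolCall` name, and the simplified result shape are assumptions; the Actor's real server.js and handlers.js may work differently.

```javascript
// Hypothetical tool registry. Real handlers would download and parse the PDF
// or call an AI provider; these stubs only echo their arguments.
const tools = {
  extractPdf: ({ pdfUrl }) => ({ text: `text of ${pdfUrl}` }),
  analyze: ({ text, prompt }) => ({ aiResult: `${prompt}: ${text.length} chars` }),
};

// Build a JSON-RPC 2.0 error response.
function rpcError(id, code, message) {
  return { jsonrpc: '2.0', id, error: { code, message } };
}

// Parse one incoming message and route a "tools/call" request to a tool.
function handleToolCall(rawMessage, registry = tools) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch {
    return rpcError(null, -32700, 'Parse error');
  }
  if (msg === null || typeof msg !== 'object') {
    return rpcError(null, -32600, 'Invalid request');
  }
  const { id = null, method, params = {} } = msg;
  if (method !== 'tools/call') {
    return rpcError(id, -32601, `Method not found: ${method}`);
  }
  // Own-property check so names like "constructor" are not treated as tools.
  if (!Object.prototype.hasOwnProperty.call(registry, params.name)) {
    return rpcError(id, -32602, `Unknown tool: ${params.name}`);
  }
  try {
    return { jsonrpc: '2.0', id, result: registry[params.name](params.arguments || {}) };
  } catch (err) {
    return rpcError(id, -32603, err.message);
  }
}

console.log(handleToolCall(JSON.stringify({
  jsonrpc: '2.0',
  id: 1,
  method: 'tools/call',
  params: { name: 'extractPdf', arguments: { pdfUrl: 'https://example.com/file.pdf' } },
})));
```

In a real server the same function would be called from the WebSocket message handler, and the handlers would be asynchronous.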
Input Schema
Required fields (normal mode only):
| Field | Description |
|---|---|
| mode | "normal" or "mcp" |
| pdfUrl | Public PDF URL |
| aiProvider | "openai", "google", or "anthropic" |
| prompt | AI instruction |
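The rules in the table can be checked before a run starts. The sketch below is one way to do that; `validateInput` and its error messages are hypothetical, and the assumption that MCP mode needs only the mode field is inferred from the MCP example input.

```javascript
// Allowed values taken from the input schema table.
const PROVIDERS = ['openai', 'google', 'anthropic'];

// Hypothetical validator: returns a list of problems (empty when input is valid).
function validateInput(input) {
  const errors = [];
  const { mode, pdfUrl, aiProvider, prompt } = input || {};
  if (mode !== 'normal' && mode !== 'mcp') {
    errors.push('mode must be "normal" or "mcp"');
    return errors;
  }
  // Assumption: in MCP mode the remaining fields arrive later via tool calls.
  if (mode === 'mcp') return errors;
  if (typeof pdfUrl !== 'string' || !/^https?:\/\//i.test(pdfUrl)) {
    errors.push('pdfUrl must be a public http(s) URL');
  }
  if (!PROVIDERS.includes(aiProvider)) {
    errors.push(`aiProvider must be one of: ${PROVIDERS.join(', ')}`);
  }
  if (typeof prompt !== 'string' || prompt.trim() === '') {
    errors.push('prompt must be a non-empty string');
  }
  return errors;
}

console.log(validateInput({ mode: 'normal', pdfUrl: 'ftp://example.com/a.pdf' }));
```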
Example Input (Normal Mode)
{
  "mode": "normal",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "aiProvider": "openai",
  "prompt": "Extract key business information from this PDF."
}
Example Input (MCP Mode)
{"mode": "mcp"}
Output Format (Normal Mode)
{
  "mode": "normal",
  "aiProvider": "openai",
  "pdfUrl": "https://example.com/file.pdf",
  "charactersExtracted": 10542,
  "aiResult": "Structured AI-generated content here..."
}
In MCP mode, results stream to the connected AI.
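As a rough illustration of the normal-mode output shape, the record could be assembled as follows. `buildNormalModeOutput` is a hypothetical name, and treating charactersExtracted as the length of the extracted text is an assumption.

```javascript
// Hypothetical helper: assemble the normal-mode output record.
// Assumption: charactersExtracted is the length of the extracted text.
function buildNormalModeOutput({ pdfUrl, aiProvider, text, aiResult }) {
  return {
    mode: 'normal',
    aiProvider,
    pdfUrl,
    charactersExtracted: text.length,
    aiResult,
  };
}

console.log(JSON.stringify(buildNormalModeOutput({
  pdfUrl: 'https://example.com/file.pdf',
  aiProvider: 'openai',
  text: 'Dummy PDF file',
  aiResult: 'Structured AI-generated content here...',
}), null, 2));
```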
Environment Variables
Create a .env file:
OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
ANTHROPIC_API_KEY=your_anthropic_key
MCP_PORT=8080
A ready example.env is included for users.
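A sketch of how these variables might be read at startup is shown below. The helper names are hypothetical, and the mapping from each aiProvider value to a variable name (for example "google" to GEMINI_API_KEY) is inferred from the names above rather than confirmed by the source.

```javascript
// Assumed mapping from aiProvider values to environment variable names.
const KEY_NAMES = {
  openai: 'OPENAI_API_KEY',
  google: 'GEMINI_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
};

// Hypothetical helper: return the API key for a provider or fail early.
function resolveApiKey(aiProvider, env = process.env) {
  if (!Object.prototype.hasOwnProperty.call(KEY_NAMES, aiProvider)) {
    throw new Error(`Unknown aiProvider: ${aiProvider}`);
  }
  const name = KEY_NAMES[aiProvider];
  if (!env[name]) throw new Error(`Missing environment variable ${name}`);
  return env[name];
}

// Hypothetical helper: read MCP_PORT, defaulting to 8080 as in the example file.
function resolveMcpPort(env = process.env) {
  const port = Number.parseInt(env.MCP_PORT ?? '8080', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env.MCP_PORT}`);
  }
  return port;
}

console.log(resolveMcpPort({ MCP_PORT: '9000' }));
```

Failing early on a missing key gives a clearer error than letting the provider's API reject the request later in the run.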
Architecture Overview
PDF URL → Downloader → pdf-parse → Cleaned Text → AI Adapter → Final JSON or MCP Stream
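The flow above can be sketched as a small pipeline in which each stage is passed in as a function. `runPipeline` and the stage names are hypothetical, and the stub stages stand in for the real downloader, pdf-parse wrapper, and AI adapter so the example runs without network access or API keys.

```javascript
// Hypothetical orchestration of the stages named in the architecture overview.
async function runPipeline({ pdfUrl, prompt }, stages) {
  const pdfBuffer = await stages.download(pdfUrl);     // Downloader
  const rawText = await stages.parse(pdfBuffer);       // PDF text extraction
  const text = stages.clean(rawText);                  // Cleaned Text
  const aiResult = await stages.analyze(prompt, text); // AI Adapter
  return { text, aiResult };
}

// Stub stages for illustration only.
const stubStages = {
  download: async (url) => Buffer.from(`stub bytes for ${url}`),
  parse: async (buffer) => '  Hello   PDF  ',
  clean: (raw) => raw.replace(/\s+/g, ' ').trim(),
  analyze: async (prompt, text) => `${prompt} -> ${text}`,
};

runPipeline({ pdfUrl: 'https://example.com/file.pdf', prompt: 'Summarize' }, stubStages)
  .then((result) => console.log(result));
```

Injecting the stages keeps the orchestration testable and makes it straightforward to swap one AI adapter for another.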
Running Tests
Normal mode:
apify run --purge --input-file=tests/input.normal.json
MCP mode:
apify run --purge --input-file=tests/input.mcp.json
Connect at:
ws://localhost:8080
Project Structure
pdf-ai-extractor-mcp/
│
├── main.js
├── package.json
├── .env
├── .gitignore
│
├── src/
│   ├── orchestrator/orchestrator.js
│   ├── connectors/
│   │   ├── openai/adapter.js
│   │   ├── google/adapter.js
│   │   └── anthropic/adapter.js
│   ├── mcp/
│   │   ├── server.js
│   │   └── handlers.js
│   └── utils/
│       ├── pdfTools.js
│       ├── aiTools.js
│       └── fileManager.js
│
├── tests/
│   ├── input.normal.json
│   └── input.mcp.json
│
└── .actor/
    ├── actor.json
    ├── INPUT_SCHEMA.json
    ├── OUTPUT_SCHEMA.json
    └── dataset_schema.json
Why This Actor Will Do Great on Apify Store
- Multi-AI support is trending
- MCP tools are in demand
- PDF + AI extraction is extremely useful
- Works for enterprise, finance, research, startups
- No competitors offering dual (Normal + MCP) mode
- Very high utility → very likely to make revenue
Support & Feedback
Feel free to reach out with feature ideas or improvements.
Happy automating!
PDF-AI Extractor MCP