PDF AI Extractor MCP

Pricing

Pay per event

Extracts text, tables, summaries, and structured data from any PDF using OpenAI, Google Gemini, or Claude. Supports bulk AI processing, clean JSON exports, and an AI-ready MCP mode for agent workflows.

Developer: lalithhh (maintained by Community)


📄 PDF-AI Extractor MCP

Extract any PDF → Clean Text → AI Analysis using OpenAI, Google Gemini, or Anthropic, with an optional MCP Agent Server mode.

PDF-AI Extractor MCP is a dual-mode Apify Actor that downloads a PDF, extracts readable text, analyzes it with your chosen AI model, and returns structured output.
It also runs as an MCP WebSocket server so ChatGPT, Claude, LangChain, and other AI agents can use it as a tool.


🚀 Why This Actor?

Businesses and AI workflows often struggle with PDFs:

  • PDFs are messy, inconsistent, or scanned
  • Extracting structured data is hard
  • AI needs clean text to understand documents
  • Agents need a tool interface (MCP)

This Actor solves all of it in one place.


✨ Key Features

🔍 Smart PDF Extraction

  • Downloads PDFs reliably
  • Uses pdf-parse for robust extraction
  • Cleans and normalizes raw PDF text
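The cleaning step is not documented in detail, so the following is only a minimal sketch of what normalizing raw PDF text might look like. The function name `cleanPdfText` and the specific rules are assumptions for illustration, not the Actor's actual code.

```javascript
// Hypothetical helper (not taken from the Actor's source): normalizes the raw
// text that a PDF text extractor such as pdf-parse typically returns.
function cleanPdfText(raw) {
  return raw
    .replace(/\r\n?/g, "\n")                 // unify line endings
    .replace(/[ \t\f\v]+/g, " ")             // collapse runs of horizontal whitespace
    .replace(/ *\n */g, "\n")                // drop spaces around line breaks
    .replace(/(\p{L})-\n(\p{L})/gu, "$1$2")  // re-join words hyphenated across lines
    .replace(/\n{3,}/g, "\n\n")              // keep at most one blank line in a row
    .trim();
}

console.log(cleanPdfText("Hello   world\r\n\r\n\r\n\r\nextrac-\ntion  done "));
```

The hyphen rule is a trade-off: it repairs words split at a line end but will also merge genuine compounds that happen to break there, so it may need to be relaxed for some documents.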

🤖 Multi-AI Engine Support

Use any major model you want:

  • OpenAI → GPT-4.1, GPT-4o, o3-mini
  • Google Gemini → 1.5 Flash / Pro
  • Anthropic Claude → Haiku / Sonnet / Opus

🧠 AI-Enhanced Document Understanding

Your prompt + extracted PDF text →
Summaries, structured fields, business insights, compliance checks, custom extraction, etc.


🔁 Two Operation Modes

1️⃣ NORMAL MODE

Runs once, returns structured JSON.

Perfect for:

  • Document automation
  • Backend workflows
  • Report preparation
  • Daily processing

2️⃣ MCP MODE (Agent Mode)

Turns into a WebSocket MCP server:

  • Agents call: extractPdf(), analyze(), etc.
  • ChatGPT, Claude, LangChain tools all supported
  • Real-time interaction
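As a rough illustration of how an agent might invoke one of these tools, the sketch below builds an MCP `tools/call` request, which in the MCP specification is a JSON-RPC 2.0 message. It assumes the server accepts such messages over the WebSocket; the argument name `pdfUrl` is a guess based on the input schema, and `buildToolCall` is a hypothetical helper.

```javascript
// Hypothetical helper: serializes an MCP "tools/call" request (JSON-RPC 2.0).
// The tool name comes from this README; the argument names are assumptions.
function buildToolCall(id, toolName, args) {
  return JSON.stringify({
    jsonrpc: "2.0",
    id,                       // lets the client match the response to this call
    method: "tools/call",
    params: { name: toolName, arguments: args },
  });
}

const message = buildToolCall(1, "extractPdf", {
  pdfUrl: "https://example.com/file.pdf",
});
console.log(message);
```

A client would send this string over its open socket after the usual MCP `initialize` handshake, then wait for the response carrying the same `id`.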

📥 Input Schema

Required fields (normal mode only):

Field        Description
mode         "normal" or "mcp"
pdfUrl       Public PDF URL
aiProvider   "openai", "google", "anthropic"
prompt       AI instruction
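The rules in the table can be expressed as a small validation function. This is a sketch under the assumption that `mode` is always required and the other three fields are only checked in normal mode; `validateInput` is a hypothetical name, and the Actor's real checks live in its INPUT_SCHEMA.json.

```javascript
// Hypothetical validator mirroring the input table; returns a list of problems.
const PROVIDERS = ["openai", "google", "anthropic"];

function validateInput(input) {
  const errors = [];
  if (input.mode !== "normal" && input.mode !== "mcp") {
    errors.push('mode must be "normal" or "mcp"');
  }
  if (input.mode === "normal") {
    // Assumption: a "public PDF URL" means an http(s) URL.
    if (typeof input.pdfUrl !== "string" || !/^https?:\/\//.test(input.pdfUrl)) {
      errors.push("pdfUrl must be a public http(s) URL");
    }
    if (!PROVIDERS.includes(input.aiProvider)) {
      errors.push(`aiProvider must be one of: ${PROVIDERS.join(", ")}`);
    }
    if (typeof input.prompt !== "string" || input.prompt.trim() === "") {
      errors.push("prompt must be a non-empty string");
    }
  }
  return errors;
}

console.log(validateInput({ mode: "normal", aiProvider: "openai" }));
```

Returning every problem at once, rather than throwing on the first, lets a caller report all input mistakes in a single run.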

🧪 Example Input: Normal Mode

{
  "mode": "normal",
  "pdfUrl": "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf",
  "aiProvider": "openai",
  "prompt": "Extract key business information from this PDF."
}

📡 Example Input: MCP Mode

{
  "mode": "mcp"
}

📤 Output Format (Normal Mode)

{
  "mode": "normal",
  "aiProvider": "openai",
  "pdfUrl": "https://example.com/file.pdf",
  "charactersExtracted": 10542,
  "aiResult": "Structured AI-generated content here..."
}

In MCP mode, results stream to the connected AI.


🔧 Environment Variables

Create a .env file:

OPENAI_API_KEY=your_openai_key
GEMINI_API_KEY=your_gemini_key
ANTHROPIC_API_KEY=your_anthropic_key
MCP_PORT=8080

A ready-to-use example.env is included.
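One plausible way to tie these variables to the `aiProvider` input is a lookup table, sketched below. The mapping of "google" to GEMINI_API_KEY is inferred from the variable names above, and `resolveApiKey` is a hypothetical helper rather than the Actor's own function.

```javascript
// Assumed mapping from the aiProvider input value to its environment variable.
const KEY_BY_PROVIDER = {
  openai: "OPENAI_API_KEY",
  google: "GEMINI_API_KEY",
  anthropic: "ANTHROPIC_API_KEY",
};

// Hypothetical helper: returns the API key for a provider or fails loudly.
function resolveApiKey(provider, env = process.env) {
  if (!Object.hasOwn(KEY_BY_PROVIDER, provider)) {
    throw new Error(`Unknown aiProvider: ${provider}`);
  }
  const name = KEY_BY_PROVIDER[provider];
  const key = env[name];
  if (!key) {
    throw new Error(`Missing environment variable ${name}`);
  }
  return key;
}

console.log(resolveApiKey("google", { GEMINI_API_KEY: "your_gemini_key" }));
```

Passing the environment as a parameter keeps the function easy to test; only the key for the selected provider needs to be set.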


🛠 Architecture Overview

PDF URL → Downloader → pdf-parse → Cleaned Text → AI Adapter → Final JSON or MCP Stream
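The flow above can be sketched as a function that chains injected stages. The names `runPipeline` and `stages` are hypothetical, and computing `charactersExtracted` from the cleaned text is an assumption; the Actor's orchestrator may be organized differently.

```javascript
// Hypothetical orchestration of the normal-mode flow. Each stage is injected so
// the real downloader, pdf-parse wrapper, and AI adapter can be swapped in.
async function runPipeline(input, stages) {
  const pdfBuffer = await stages.download(input.pdfUrl);   // PDF URL -> bytes
  const rawText = await stages.parse(pdfBuffer);           // bytes -> raw text
  const text = stages.clean(rawText);                      // raw -> cleaned text
  const aiResult = await stages.analyze(input.aiProvider, input.prompt, text);
  return {
    mode: "normal",
    aiProvider: input.aiProvider,
    pdfUrl: input.pdfUrl,
    charactersExtracted: text.length, // assumption: counted after cleaning
    aiResult,
  };
}

// Usage with stub stages standing in for the real implementations.
const stubStages = {
  download: async () => Buffer.from("%PDF-stub"),
  parse: async () => "  hello  ",
  clean: (t) => t.trim(),
  analyze: async (provider, prompt, text) => `${provider}:${prompt}:${text}`,
};

runPipeline(
  { pdfUrl: "https://example.com/file.pdf", aiProvider: "openai", prompt: "Summarize" },
  stubStages
).then((result) => console.log(result));
```

Injecting the stages keeps the orchestration independent of any one AI provider, which matches the adapter-per-provider layout in the project structure.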

🧪 Running Tests

Normal mode:

apify run --purge --input-file=tests/input.normal.json

MCP mode:

apify run --purge --input-file=tests/input.mcp.json

Connect at:

ws://localhost:8080

📦 Project Structure

pdf-ai-extractor-mcp/
│
├── main.js
├── package.json
├── .env
├── .gitignore
│
├── src/
│   ├── orchestrator/orchestrator.js
│   ├── connectors/
│   │   ├── openai/adapter.js
│   │   ├── google/adapter.js
│   │   └── anthropic/adapter.js
│   ├── mcp/
│   │   ├── server.js
│   │   └── handlers.js
│   └── utils/
│       ├── pdfTools.js
│       ├── aiTools.js
│       └── fileManager.js
│
├── tests/
│   ├── input.normal.json
│   └── input.mcp.json
│
└── .actor/
    ├── actor.json
    ├── INPUT_SCHEMA.json
    ├── OUTPUT_SCHEMA.json
    └── dataset_schema.json

🏁 Why This Actor Will Do Great on Apify Store

  • Multi-AI support is trending
  • MCP tools are in demand
  • PDF + AI extraction is extremely useful
  • Works for enterprise, finance, research, startups
  • No competitors offering dual (Normal + MCP) mode
  • Very high utility → very likely to make revenue

โค๏ธ Support & Feedback

Feel free to reach out with feature ideas or improvements.

Happy automating!
- PDF-AI Extractor MCP