Pricing

from $5.00 / 1,000 PDFs extracted

PDF Text Extractor - Bulk PDF to Text & Metadata

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Developer

NanoScrape

Last modified

3 months ago

What you get

Full text extraction — clean text from every page
PDF metadata — title, author, creation date, producer, keywords
Page-level info — count, dimensions, character distribution
Scanned detection — flags PDFs that need OCR (heuristic: low text density)
Encryption detection — flags password-protected PDFs
Bulk processing — extract hundreds of PDFs in one run, parallel-safe
Pay-per-result — $0.005 per PDF, no monthly fees
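
The scanned-PDF flag is described above only as a low-text-density heuristic. The following is a minimal sketch of what such a check might look like, assuming the text of each page has already been extracted. The function name `looks_scanned` and the threshold of 50 characters per page are hypothetical and are not taken from the actor's implementation.

```python
def looks_scanned(page_texts, min_chars_per_page=50):
    """Guess whether a PDF is a scan from its extracted text density.

    page_texts: list of strings, one per page (assumed input shape).
    min_chars_per_page: hypothetical threshold, not the actor's real value.
    """
    if not page_texts:
        # No pages to inspect, so nothing suggests a scan.
        return False
    # Count non-whitespace characters so blank padding does not inflate density.
    total_chars = sum(len("".join(text.split())) for text in page_texts)
    density = total_chars / len(page_texts)
    return density < min_chars_per_page


# Made-up inputs: a text-based PDF and one whose pages yield almost no text.
text_pdf = ["Introduction. This whitepaper explores text extraction. " * 5] * 14
scanned_pdf = ["", " \n", ""]
print(looks_scanned(text_pdf))
print(looks_scanned(scanned_pdf))
```

A real detector would likely also look at embedded images per page; average character density alone can misclassify sparse but genuine text pages such as title slides.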

Use with AI Agents (MCP)

Connect this actor to any MCP-compatible AI client — Claude Desktop, Claude.ai, Cursor, VS Code, LangChain, LlamaIndex, or custom agents.

Apify MCP server URL:

https://mcp.apify.com?tools=santamaria-automations/pdf-extractor

Example prompt once connected:

"Use pdf-extractor to extract the text and metadata from https://example.com/whitepaper.pdf. Return the results as a table."

Clients that support dynamic tool discovery (Claude.ai, VS Code) will receive the full input schema automatically via add-actor.

Example output

{
  "url": "https://example.com/whitepaper.pdf",
  "file_size_bytes": 524288,
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\n\nThis whitepaper explores...",
  "metadata": {
    "title": "Quarterly Report 2026",
    "author": "Jane Smith",
    "creation_date": "2026-03-15T10:23:00Z",
    "creator": "Microsoft Word",
    "producer": "Acrobat Distiller"
  },
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
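
Downstream code usually needs to decide what to do with each record. A small sketch of one way to route results, using only the field names shown in the example output; the routing labels (`failed`, `encrypted`, `needs_ocr`, `ready`) are illustrative and not part of the actor's output.

```python
import json


def triage(record):
    """Return a routing label for one output record.

    Field names follow the example output; the labels are illustrative.
    """
    if not record.get("success", False):
        return "failed"
    if record.get("is_encrypted", False):
        return "encrypted"
    if record.get("needs_ocr", False) or record.get("is_scanned", False):
        return "needs_ocr"
    return "ready"


# Trimmed copy of the example record above.
sample = json.loads("""
{
  "url": "https://example.com/whitepaper.pdf",
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
""")

print(triage(sample))
print(triage({**sample, "needs_ocr": True}))
```

Checking `success` first keeps failed downloads from being mistaken for empty documents, since a failed record may not carry the other flags at all.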

Use cases

Research & academia — extract content from papers, white papers, dissertations
Document archiving — build searchable indexes from PDF libraries
Compliance — bulk-extract contract text for review
Data extraction — invoice/receipt text mining
Content moderation — scan PDFs for keywords
OCR preparation — flag scanned PDFs that need image-to-text processing
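
As an illustration of the content moderation use case, here is a rough sketch of a keyword scan over the `text` field of a result. The helper `find_keywords`, the sample text, and the keyword list are all made up for this example; the actor itself only returns the text.

```python
import re


def find_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case."""
    hits = []
    for keyword in keywords:
        # re.escape keeps punctuation in a keyword from acting as regex syntax.
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(keyword)
    return hits


# Hypothetical extracted text and watch list.
extracted_text = "Introduction\n\nThis document contains Confidential pricing terms."
watch_list = ["confidential", "password", "pricing"]
print(find_keywords(extracted_text, watch_list))
```

Whole-word matching avoids flagging substrings (for example `price` inside `priceless`), at the cost of missing inflected forms; a production filter would need stemming or a broader pattern.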

Pricing

Event	Price
Actor start	$0.001
PDF extracted	$0.005

Example: Process 1,000 PDFs ≈ $5.00
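
The estimate above can be reproduced from the two event prices in the table. This sketch assumes one `Actor start` event per run and one `PDF extracted` event per PDF; whether failed or encrypted PDFs are billed is not stated here, so the function treats every PDF as billable.

```python
from decimal import Decimal

# Prices from the table above, in USD.
ACTOR_START = Decimal("0.001")
PER_PDF = Decimal("0.005")


def estimate_cost(pdf_count, runs=1):
    """Estimated charge in USD: one start event per run plus one event per PDF."""
    return ACTOR_START * runs + PER_PDF * pdf_count


print(estimate_cost(1000))  # 5.001
```

Splitting the same 1,000 PDFs across more runs only adds $0.001 per extra run, so batching has little effect on cost.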

Issues & Feedback

Found a bug or have a feature request? Open an issue on the Issues tab — we respond within 24 hours.

Related actors

Website Contact Extractor — pull contacts from any website
Website Tech Stack Detector — detect site technologies
Email Verifier — bulk email validation
Domain WHOIS & DNS — domain intelligence
