PDF Text Extractor - Bulk PDF to Text & Metadata

Pricing

from $5.00 / 1,000 PDFs extracted

Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.

Developer

Alessandro Santamaria

Maintained by Community

Extract text and structured metadata from any PDF URL at scale. Perfect for document analysis, research papers, compliance docs, and building searchable archives.

What you get

  • Full text extraction — clean text from every page
  • PDF metadata — title, author, creation date, producer, keywords
  • Page-level info — count, dimensions, character distribution
  • Scanned detection — flags PDFs that need OCR (heuristic: low text density)
  • Encryption detection — flags password-protected PDFs
  • Bulk processing — process hundreds of PDFs in one run; safe to run in parallel
  • Pay-per-result — $0.005 per PDF, no monthly fees
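
The scanned-PDF flag is described above only as a low-text-density heuristic; the actual rule and threshold the actor uses are not documented here. As a rough sketch of what such a check could look like, the function below compares average extracted characters per page against a cutoff. The name `looks_scanned` and the default of 100 characters per page are assumptions made for illustration.

```python
def looks_scanned(text_length: int, page_count: int,
                  min_chars_per_page: int = 100) -> bool:
    """Guess whether a PDF is a scan based on how little text was extracted.

    min_chars_per_page is an assumed cutoff, not the actor's real threshold.
    """
    if page_count <= 0:
        # No pages to judge, so do not flag the file.
        return False
    # Average number of extracted characters per page.
    density = text_length / page_count
    return density < min_chars_per_page
```

A cutoff like this would need tuning against real documents: slide decks and cover pages carry little text without being scans.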

Example output

{
  "url": "https://example.com/whitepaper.pdf",
  "file_size_bytes": 524288,
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\n\nThis whitepaper explores...",
  "metadata": {
    "title": "Quarterly Report 2026",
    "author": "Jane Smith",
    "creation_date": "2026-03-15T10:23:00Z",
    "creator": "Microsoft Word",
    "producer": "Acrobat Distiller"
  },
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
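
Records shaped like the example above can be sorted into follow-up buckets once a run finishes. The sketch below assumes the results have already been loaded as a list of dictionaries (for instance from a JSON export) and uses only field names that appear in the example output. The helper name `triage` and the bucket names are hypothetical, and how the actor reports encrypted or failed files beyond these flags is an assumption.

```python
from typing import Any


def triage(records: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group result URLs by the follow-up action they need."""
    buckets: dict[str, list[str]] = {
        "ready": [], "needs_ocr": [], "encrypted": [], "failed": [],
    }
    for rec in records:
        url = rec.get("url", "")
        if not rec.get("success", False):
            # Download or parse failed; nothing else can be trusted.
            buckets["failed"].append(url)
        elif rec.get("is_encrypted", False):
            # Password-protected; text is likely missing.
            buckets["encrypted"].append(url)
        elif rec.get("needs_ocr", False) or rec.get("is_scanned", False):
            # Little or no text layer; send to an OCR step.
            buckets["needs_ocr"].append(url)
        else:
            buckets["ready"].append(url)
    return buckets
```

Checking `success` first keeps failed downloads out of the other buckets even if their remaining flags are left at default values.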

Use cases

  • Research & academia — extract content from papers, white papers, dissertations
  • Document archiving — build searchable indexes from PDF libraries
  • Compliance — bulk-extract contract text for review
  • Data extraction — invoice/receipt text mining
  • Content moderation — scan PDFs for keywords
  • OCR preparation — flag scanned PDFs that need image-to-text processing
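
For the keyword-scanning use case, one possible post-processing step is a case-insensitive whole-word search over each record's `text` field. This is a sketch under the assumption that records follow the example output format; the function name `scan_for_keywords` is hypothetical and is not part of the actor.

```python
import re
from typing import Any


def scan_for_keywords(records: list[dict[str, Any]],
                      keywords: list[str]) -> dict[str, list[str]]:
    """Map each URL to the keywords found in its extracted text."""
    # Compile one whole-word, case-insensitive pattern per keyword.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    hits: dict[str, list[str]] = {}
    for rec in records:
        text = rec.get("text") or ""
        matched = [kw for kw, pat in patterns.items() if pat.search(text)]
        if matched:
            hits[rec.get("url", "")] = matched
    return hits
```

Word-boundary matching avoids hits inside longer words, but it assumes each keyword starts and ends with a letter or digit.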

Pricing

Event            Price
Actor start      $0.001
PDF extracted    $0.005

Example: processing 1,000 PDFs in one run costs 1,000 × $0.005 + $0.001 (actor start) = $5.001, or about $5.00.
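
The listed event prices can be turned into a quick estimate. The sketch below assumes one "Actor start" event per run and one "PDF extracted" event per PDF; whether failed PDFs are billed, and any platform usage charges, are not covered by the listing and are left out. The function and constant names are illustrative.

```python
from decimal import Decimal

# Event prices as listed in the pricing table.
ACTOR_START_PRICE = Decimal("0.001")
PDF_EXTRACTED_PRICE = Decimal("0.005")


def estimate_cost(pdf_count: int, runs: int = 1) -> Decimal:
    """Estimate the charge in USD for pdf_count PDFs spread over runs runs."""
    if pdf_count < 0 or runs < 0:
        raise ValueError("pdf_count and runs must be non-negative")
    # One start event per run plus one extraction event per PDF (assumed).
    return runs * ACTOR_START_PRICE + pdf_count * PDF_EXTRACTED_PRICE
```

Using `Decimal` keeps sub-cent prices exact; for instance, `estimate_cost(1000)` evaluates to `Decimal("5.001")`.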

Issues & Feedback

Found a bug or have a feature request? Open an issue on the Issues tab — we respond within 24 hours.