PDF Text Extractor - Bulk PDF to Text & Metadata
Pricing: from $5.00 per 1,000 PDFs extracted
Extract text and metadata from any PDF URL in bulk. Get page content, author, title, creation date, and more. Detects scanned PDFs that need OCR. Perfect for document analysis, research, and compliance.
Developer: Alessandro Santamaria
Extract text and structured metadata from any PDF URL at scale. Perfect for document analysis, research papers, compliance docs, and building searchable archives.
What you get
- Full text extraction — clean text from every page
- PDF metadata — title, author, creation date, producer, keywords
- Page-level info — count, dimensions, character distribution
- Scanned detection — flags PDFs that need OCR (heuristic: low text density)
- Encryption detection — flags password-protected PDFs
- Bulk processing — process hundreds of PDFs in one run, safe to run in parallel
- Pay-per-result — $0.005 per PDF, no monthly fees
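The scanned-PDF flag is described above only as a low-text-density heuristic. As a rough illustration of how such a check might work, here is a minimal Python sketch. The function name `looks_scanned` and the threshold of 100 characters per page are assumptions made for this example; the actor's actual rule and cutoff are not documented here.

```python
def looks_scanned(text_length: int, page_count: int,
                  min_chars_per_page: float = 100.0) -> bool:
    """Guess whether a PDF is scanned from its average text per page.

    Hypothetical heuristic: the 100-character threshold is an assumption,
    not the value the actor uses.
    """
    if page_count <= 0:
        # No pages to judge, so do not flag the document.
        return False
    return (text_length / page_count) < min_chars_per_page


# A text-based PDF: 28,450 characters over 14 pages (about 2,000 per page).
print(looks_scanned(28450, 14))
# An image-only PDF that yielded almost no text.
print(looks_scanned(120, 14))
```

A density check like this is cheap but can misfire on short documents such as single-page cover sheets, so treating the flag as a hint rather than a verdict is probably the safer reading.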
Example output
{
  "url": "https://example.com/whitepaper.pdf",
  "file_size_bytes": 524288,
  "success": true,
  "page_count": 14,
  "text_length": 28450,
  "text": "Introduction\n\nThis whitepaper explores...",
  "metadata": {
    "title": "Quarterly Report 2026",
    "author": "Jane Smith",
    "creation_date": "2026-03-15T10:23:00Z",
    "creator": "Microsoft Word",
    "producer": "Acrobat Distiller"
  },
  "is_encrypted": false,
  "is_scanned": false,
  "needs_ocr": false
}
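One way to consume records shaped like the example output is to sort them into buckets before further processing. The sketch below uses only field names that appear in the example (`url`, `success`, `is_encrypted`, `is_scanned`, `needs_ocr`). The `triage` helper, the bucket names, and the sample records are hypothetical and exist only for illustration.

```python
import json


def triage(records):
    """Group result records by what should happen to them next."""
    buckets = {"ready": [], "needs_ocr": [], "encrypted": [], "failed": []}
    for rec in records:
        if not rec.get("success", False):
            buckets["failed"].append(rec["url"])
        elif rec.get("is_encrypted", False):
            buckets["encrypted"].append(rec["url"])
        elif rec.get("needs_ocr", False) or rec.get("is_scanned", False):
            buckets["needs_ocr"].append(rec["url"])
        else:
            buckets["ready"].append(rec["url"])
    return buckets


# Made-up sample records for illustration, not real actor output.
sample = json.loads("""
[
  {"url": "https://example.com/whitepaper.pdf", "success": true,
   "is_encrypted": false, "is_scanned": false, "needs_ocr": false},
  {"url": "https://example.com/scan.pdf", "success": true,
   "is_encrypted": false, "is_scanned": true, "needs_ocr": true},
  {"url": "https://example.com/locked.pdf", "success": true,
   "is_encrypted": true, "is_scanned": false, "needs_ocr": false}
]
""")

for bucket, urls in triage(sample).items():
    print(bucket, urls)
```

Checking `success` first keeps failed downloads out of the other buckets, where missing fields could otherwise be mistaken for clean results.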
Use cases
- Research & academia — extract content from papers, white papers, dissertations
- Document archiving — build searchable indexes from PDF libraries
- Compliance — bulk-extract contract text for review
- Data extraction — invoice/receipt text mining
- Content moderation — scan PDFs for keywords
- OCR preparation — flag scanned PDFs that need image-to-text processing
Pricing
| Event | Price |
|---|---|
| Actor start | $0.001 |
| PDF extracted | $0.005 |
Example: processing 1,000 PDFs in one run costs about $5.00 (1,000 × $0.005, plus $0.001 for the actor start).
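The arithmetic behind the pricing table can be sketched in a few lines of Python. The two prices come from the table above. The `estimate_cost` helper is hypothetical, and it assumes that each run is billed exactly one "Actor start" event and that every PDF in the batch is billed as one "PDF extracted" event.

```python
from decimal import Decimal

# Event prices taken from the pricing table.
ACTOR_START_USD = Decimal("0.001")
PDF_EXTRACTED_USD = Decimal("0.005")


def estimate_cost(pdf_count: int, runs: int = 1) -> Decimal:
    """Estimate the total cost in USD.

    Assumes one start event per run and one extraction event per PDF.
    """
    return ACTOR_START_USD * runs + PDF_EXTRACTED_USD * pdf_count


print(estimate_cost(1000))          # 1,000 PDFs in a single run
print(estimate_cost(1000, runs=10)) # same PDFs split across 10 runs
```

`Decimal` is used instead of floats so that fractions of a cent add up exactly. Under these assumptions, the start fee stays small next to the per-PDF charge even when the work is split across several runs.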
Issues & Feedback
Found a bug or have a feature request? Open an issue on the Issues tab — we respond within 24 hours.
Related Actors
- Website Contact Extractor — pull contacts from any website
- Website Tech Stack Detector — detect site technologies
- Email Verifier — bulk email validation
- Domain WHOIS & DNS — domain intelligence