Pricing

from $2.00 / 1,000 pages extracted

PDF Text Extractor API - URL to Text, Per-Page, Batch

Turn any public PDF URL into clean text and metadata. Per-page output, batch processing, and a synchronous API mode for AI agents. Pay only per page extracted, at a lower rate than comparable actors.

Developer

Jimmy A

Last modified: a month ago

What it does

Fetches each PDF URL (redirects followed, 60s timeout)
Extracts text page by page with line reconstruction (not one giant word soup)
Reads the document's own metadata (title, author, producer, dates) as published in the file
Outputs one structured record per document, with per-page text blocks if you want them

Use cases

RAG / AI pipelines: turn report URLs into chunks for embedding, page-aligned
Agents: call the standby endpoint as a tool - "read this PDF and answer"
Document monitoring: pair with a scheduler to extract recurring reports (filings, government publications, price lists)
Data entry automation: pull text from invoices, spec sheets, catalogs you have rights to process
Research: batch-extract paper PDFs into searchable text
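
For the RAG use case, page alignment means a chunk never spans two pages, so every embedding can cite its page. The minimal sketch below assumes records in the perPage: true shape from the Output section; `page_chunks` is a hypothetical helper, and fixed-size character splitting is a deliberately naive stand-in for a real tokenizer-aware splitter.

```python
def page_chunks(record: dict, max_chars: int = 1000) -> list[dict]:
    """Split each page's text into chunks that never cross a page boundary."""
    chunks = []
    for page in record.get("pages", []):
        text = page["text"]
        # Fixed-size character windows; empty pages produce no chunks.
        for start in range(0, len(text), max_chars):
            chunks.append({
                "url": record["url"],
                "page": page["page"],
                "text": text[start:start + max_chars],
            })
    return chunks
```

Each chunk carries `url` and `page`, which is enough to build a citation such as "source.pdf, p. 3" at answer time.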

Input

{
  "pdfUrls": [
    "https://arxiv.org/pdf/1706.03762",
    "https://example.com/annual-report.pdf"
  ],
  "perPage": true,
  "maxPages": 500
}
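
To start a batch run with this input from code, one option is Apify's generic `run-sync-get-dataset-items` REST endpoint. The following is a minimal standard-library sketch, not an official client. The actor ID `USERNAME~ACTOR-NAME` and the token are placeholders (this page does not state the actor's ID), bearer-token auth is assumed, and `build_run_request`/`run_actor` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, pdf_urls: list[str],
                      per_page: bool = True, max_pages: int = 500) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the actor and returns its dataset items."""
    run_input = {"pdfUrls": pdf_urls, "perPage": per_page, "maxPages": max_pages}
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(req: urllib.request.Request) -> list[dict]:
    # Sends the request and parses the returned dataset items (one record per document).
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_run_request("USERNAME~ACTOR-NAME", "YOUR_APIFY_TOKEN",
                            ["https://arxiv.org/pdf/1706.03762"])
    print(req.get_method(), req.full_url)
```

Synchronous endpoints have a time limit, so very large batches may be better started as an asynchronous run and read from the dataset afterwards.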

Output

{
  "url": "https://arxiv.org/pdf/1706.03762",
  "pageCount": 15,
  "pagesExtracted": 15,
  "truncated": false,
  "metadata": { "title": null, "author": null, "producer": "pdfTeX", "creationDate": "..." },
  "pages": [
    { "page": 1, "text": "Attention Is All You Need\n..." }
  ]
}

Set perPage: false for a single text field per document. Failed URLs produce a record with an error field instead of killing the run.
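
Because one run can mix successful, failed, and truncated documents, a consumer usually separates them before indexing. This minimal sketch assumes perPage: true records carry the `pages` list shown above, perPage: false records carry a top-level `text` string, and error records still include `url`; those last two field details are assumptions, and the helper names are hypothetical.

```python
from typing import Iterable, Optional


def document_text(record: dict) -> Optional[str]:
    """Return one document's text, or None if the URL failed.

    Assumes perPage: true records carry a "pages" list (as in the sample output)
    and perPage: false records carry a top-level "text" string (field name assumed).
    """
    if "error" in record:
        return None
    if "pages" in record:
        # Keep a blank line between pages so page breaks stay visible.
        return "\n\n".join(p["text"] for p in record["pages"])
    return record.get("text", "")


def split_results(records: Iterable[dict]) -> tuple[dict[str, str], dict[str, str], list[str]]:
    """Separate extracted documents, failed URLs, and documents cut off at maxPages."""
    ok: dict[str, str] = {}
    failed: dict[str, str] = {}
    truncated: list[str] = []
    for rec in records:
        text = document_text(rec)
        if text is None:
            failed[rec["url"]] = str(rec["error"])
            continue
        ok[rec["url"]] = text
        if rec.get("truncated"):
            truncated.append(rec["url"])
    return ok, failed, truncated
```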

API / Standby mode for AI agents

GET /?url=https://example.com/file.pdf&perPage=true&maxPages=50

Returns the full extraction JSON synchronously. Works as a tool for agent frameworks that support Apify actors.
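
The PDF URL travels inside a query string, so it should be percent-encoded; otherwise a PDF link with its own `?` or `&` would be split into extra parameters. A minimal sketch follows; `STANDBY_BASE` is a hypothetical placeholder for the actor's actual standby URL, bearer-token auth is assumed, and `standby_url`/`extract` are illustrative names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the actor's actual standby URL.
STANDBY_BASE = "https://USERNAME--ACTOR-NAME.apify.actor/"


def standby_url(pdf_url: str, per_page: bool = True, max_pages: int = 50,
                base: str = STANDBY_BASE) -> str:
    """Build the GET URL, percent-encoding the PDF URL so its own '?' and '&' survive."""
    query = urllib.parse.urlencode(
        {"url": pdf_url, "perPage": str(per_page).lower(), "maxPages": max_pages}
    )
    return f"{base}?{query}"


def extract(pdf_url: str, token: str, **kwargs) -> dict:
    """Call the standby endpoint and return the extraction JSON (auth scheme assumed)."""
    req = urllib.request.Request(standby_url(pdf_url, **kwargs),
                                 headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=90) as resp:
        return json.load(resp)
```

Wrapping `extract` as a function tool is one way to expose "read this PDF" to an agent framework.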

Pricing

Event	Price
Actor start	$0.0005
Per page extracted	$0.002
API call (standby)	$0.02

A 40-page report costs $0.08 in page charges (40 × $0.002), plus the $0.0005 actor start fee. Comparable actors charge $0.022-0.04 per page, 10-20x more.
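
The batch-run arithmetic can be sketched as a small estimator using the prices from the table. Two assumptions here: one start fee per run, and pages beyond `maxPages` are not extracted and therefore not billed. Standby calls are left out because this page does not say whether page charges also apply to them. `batch_run_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Prices from the pricing table, in USD. Decimal avoids float rounding on cents.
ACTOR_START = Decimal("0.0005")
PER_PAGE = Decimal("0.002")


def batch_run_cost(pages_per_doc: list[int], max_pages: int = 500) -> Decimal:
    """Estimate one batch run: a single start fee plus every page extracted.

    Assumes pages past max_pages are skipped and therefore not billed.
    """
    pages = sum(min(p, max_pages) for p in pages_per_doc)
    return ACTOR_START + PER_PAGE * pages
```

For example, `batch_run_cost([40])` gives `Decimal("0.0805")`, matching the 40-page figure above plus the start fee.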

FAQ

Does it do OCR on scanned PDFs? Not in this version. It extracts the text layer of digital PDFs (the overwhelming majority of reports, papers, and filings). Scanned-image PDFs return empty pages; an OCR tier is planned - ask in Issues if you need it.
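
Since scanned-image PDFs come back as empty pages rather than errors, a pipeline may want to flag them for a separate OCR step. This is a minimal heuristic sketch, assuming perPage: true records; `looks_scanned` and its thresholds are hypothetical and worth tuning on your own documents.

```python
def looks_scanned(record: dict, min_chars: int = 20, threshold: float = 0.8) -> bool:
    """Flag a record whose pages are mostly empty, suggesting no text layer."""
    pages = record.get("pages") or []
    if not pages:
        # Failed URLs and perPage: false records are not judged here.
        return False
    empty = sum(1 for p in pages if len(p["text"].strip()) < min_chars)
    return empty / len(pages) >= threshold
```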

How are lines handled? Text items are regrouped by their position on the page, so paragraphs read naturally instead of being one long line.
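
Position-based regrouping can be illustrated with a simplified sketch. This is not the actor's code, which this page does not show. Text items whose baselines fall within a small vertical tolerance are treated as one line, and each line is then ordered left to right. The `TextItem` type and the top-down `y` axis are assumptions; PDF user space measures `y` from the bottom, so raw PDF coordinates would need to be sorted by descending `y` instead.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x: float  # left edge of the item
    y: float  # baseline, measured from the top of the page (assumed)


def reconstruct_lines(items: list[TextItem], y_tolerance: float = 2.0) -> str:
    """Group text items into lines by baseline, then order each line left to right."""
    lines: list[list[TextItem]] = []
    for item in sorted(items, key=lambda i: (i.y, i.x)):
        # Same line if the baseline is close to the current line's first item.
        if lines and abs(item.y - lines[-1][0].y) <= y_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])
    return "\n".join(
        " ".join(i.text for i in sorted(line, key=lambda i: i.x)) for line in lines
    )
```

For example, items "World" at (50, 100.5), "Hello" at (10, 100) and "Next" at (10, 120) come out as `"Hello World\nNext"`.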

Maximum size? The default cap is 500 pages per document (configurable via maxPages). Very large files are also bounded by the 60-second fetch timeout.

Password-protected PDFs? Not supported. Public, unencrypted documents only.

CSV/Excel export? Every Apify dataset exports as JSON, CSV, or Excel via the platform.

Similar actors

PDF Text Extractor - Extract Text from PDF by URL API

eliai/pdf-text-extractor

Extract text from PDF by URL. Input: url of a PDF. Output: JSON with full extracted text, page count, and document metadata (title, author, dates). Built for RAG pipelines, document QA, and agents. Pay-per-result at $0.05 per PDF processed.

Anthony Snider

PDF Text Extractor – PDF to Text, Metadata & Pages

haketa/pdf-text-extractor

Extract clean text and metadata from any PDF by URL: full text, per-page text, page count, title, author, dates and producer. No browser, no OCR needed for text PDFs. Ideal for AI/RAG, search and document data extraction. Export to JSON, CSV or Excel.

Haketa

PDF Text Extractor

automation-lab/pdf-text-extractor

Extract text, metadata, and page-by-page content from PDF files. Provide PDF URLs and get structured JSON with full text, per-page text, page count, author, title, creation date, and more. Export as JSON, CSV, or Excel. No browser or proxy needed.

Stas Persiianenko

PDF to Text API — Extract PDF Text to Clean JSON for LLM & RAG

omao/pdf-text

Extract clean, structured text from any PDF by URL, page by page. Returns one row per page with de-hyphenated, whitespace-normalized text. Fast, no setup.

Marouane Oulabass

PDF Text Extractor Batch

snapperwapper/pdf-text-extractor-batch

Extract text and document metadata from a bounded batch of public PDF URLs.

snapperwapper

PDF Metadata & Content Extractor — Extract text & metadata

perryay/pdf-metadata-extractor

Download and extract metadata, text content, and page count from PDF documents. Supports batch processing with concurrent downloads.

Perry AY

PDF Text Extractor — Text & Metadata from URLs

darknezz/pdf-text-extractor

Extract clean text and metadata from any PDF by URL: full text, page count, title, author, dates as JSON. Perfect for AI pipelines, RAG ingestion, document search and content analysis. No API key needed.

Oaida Adrian

PDF Parser API

george.the.developer/pdf-parser-api

Instant API that parses any PDF from a URL — extracts full text, page count, metadata (title, author, dates), and PDF version. Returns structured JSON. Perfect for document processing pipelines and AI agents.

George Kioko

PDF Text & Table Extractor (pdfplumber, batch URLs)

gochujang/pdf-text-extractor

Download any PDF by URL and extract clean per-page text + detected tables (as 2D arrays) + document metadata (title/author/created/modified). Powered by pdfplumber. Batch up to 50 PDFs. $0.01 per PDF + $0.0005 per page.

Hojun Lee

PDF Extractor: Structured Text + Metadata

aitoolbreakdown/atb-pdf-extractor

Point it at one or many PDF URLs. Get clean structured JSON back: full text, per-page text, title, author, page count, and word count. Ready for RAG, search, or doc automation.