Pricing

from $10.00 / 1,000 PDFs processed

PDF to Structured Data (JSON/CSV)

Convert PDF files into clean structured JSON or CSV: text per page, reconstructed lines, optional table detection, and document metadata.

Developer

Rosario Vitale

Last modified

a month ago

What it does

📄 Text extraction — full text of every page, in natural reading order.
📐 Line reconstruction — text items are grouped by position into real lines, not a jumbled blob.
📊 Table detection (optional) — heuristically splits rows into cells so you can rebuild tables.
🏷️ Metadata (optional) — title, author, producer and creation date when present.
🔁 Batch — pass many PDF URLs in a single run.
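
The line reconstruction idea above can be pictured with a small sketch. It is not the actor's actual code: it assumes each text fragment carries hypothetical `text`, `x`, and `y` fields, and that fragments whose `y` values differ by no more than a tolerance belong to the same line.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    # Hypothetical shape of a positioned text fragment; not the actor's real type.
    text: str
    x: float
    y: float


def group_into_lines(items: list[TextItem], y_tolerance: float = 2.0) -> list[str]:
    """Group items with similar y into lines, ordered top to bottom, left to right."""
    lines: list[list[TextItem]] = []
    # PDF coordinates usually grow upward, so a larger y sits higher on the page.
    for item in sorted(items, key=lambda i: -i.y):
        if lines and abs(lines[-1][0].y - item.y) <= y_tolerance:
            lines[-1].append(item)  # close enough to the current line's first item
        else:
            lines.append([item])  # start a new line
    return [" ".join(i.text for i in sorted(line, key=lambda i: i.x)) for line in lines]


items = [
    TextItem("Type", 200, 700.5),
    TextItem("Trace-based", 50, 700),
    TextItem("Languages", 50, 680),
    TextItem("Just-in-Time", 120, 699.8),
]
print(group_into_lines(items))
```

A fixed tolerance is the simplest choice; real documents with mixed font sizes or superscripts would likely need a tolerance scaled to the text height.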

Input

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| `pdfUrls` | array of strings | Direct links to the PDF files (required). | (none) |
| `extractTables` | boolean | Detect tables and return rows of cells. | `false` |
| `extractMetadata` | boolean | Include document metadata. | `true` |
| `maxPages` | integer | Max pages to read per PDF. `0` = all. | `0` |

Example input

{
    "pdfUrls": [
        "https://raw.githubusercontent.com/mozilla/pdf.js/master/web/compressed.tracemonkey-pldi-09.pdf"
    ],
    "extractTables": false,
    "extractMetadata": true,
    "maxPages": 0
}
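
When runs are launched from a script, the input can be assembled and sanity-checked before it is submitted. The following is a minimal sketch: `build_input` is a hypothetical helper, the field names and defaults come from the input table, and the validation rules (absolute http(s) links, non-negative `maxPages`) are assumptions rather than documented behavior.

```python
import json
from urllib.parse import urlparse


def build_input(pdf_urls, extract_tables=False, extract_metadata=True, max_pages=0):
    """Return an input dict using the documented field names and defaults."""
    if not pdf_urls:
        raise ValueError("pdfUrls is required and must not be empty")
    for url in pdf_urls:
        parsed = urlparse(url)
        # Assumed check: only absolute http(s) links are treated as valid.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a direct http(s) link: {url!r}")
    if max_pages < 0:
        raise ValueError("maxPages must be 0 (all pages) or a positive integer")
    return {
        "pdfUrls": list(pdf_urls),
        "extractTables": bool(extract_tables),
        "extractMetadata": bool(extract_metadata),
        "maxPages": int(max_pages),
    }


run_input = build_input(
    ["https://raw.githubusercontent.com/mozilla/pdf.js/master/web/compressed.tracemonkey-pldi-09.pdf"],
    extract_tables=True,
    max_pages=5,
)
print(json.dumps(run_input, indent=4))
```

The resulting dict can be pasted into the run form as JSON or passed as the run input through the Apify API.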

Output

One dataset item per PDF:

{
    "url": "https://.../document.pdf",
    "success": true,
    "numPages": 14,
    "pagesExtracted": 14,
    "metadata": { "Producer": "pdfeTeX-1.21a", "Creator": "TeX", "CreationDate": "..." },
    "pages": [
        {
            "pageNumber": 1,
            "text": "Trace-based Just-in-Time Type Specialization ...",
            "lines": ["Trace-based Just-in-Time Type Specialization ...", "Languages"],
            "tables": [["Cell A", "Cell B"], ["1", "2"]]
        }
    ],
    "fullText": "Trace-based Just-in-Time Type Specialization ..."
}

Export the dataset as JSON, CSV, Excel, or HTML straight from the run, or pull it through the Apify API.
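
For downstream processing, dataset items can also be flattened to one row per page. This sketch assumes the item shape shown in the output example; the sample items are invented for illustration, and `pages_to_csv` is a hypothetical helper name.

```python
import csv
import io


def pages_to_csv(items):
    """Flatten dataset items into CSV text with one row per extracted page."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["url", "pageNumber", "lineCount", "text"])
    for item in items:
        # Failed items carry success: false; assume they may lack a pages list.
        if not item.get("success"):
            continue
        for page in item.get("pages", []):
            writer.writerow([
                item["url"],
                page["pageNumber"],
                len(page.get("lines", [])),
                page.get("text", ""),
            ])
    return buffer.getvalue()


sample_items = [
    {
        "url": "https://example.com/a.pdf",
        "success": True,
        "pages": [{"pageNumber": 1, "text": "Hello, world", "lines": ["Hello, world"]}],
    },
    {"url": "https://example.com/b.pdf", "success": False},
]
csv_text = pages_to_csv(sample_items)
print(csv_text)
```

The `csv` module quotes any field that contains a comma or newline, so page text survives the round trip into a spreadsheet.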

Common use cases

Extract data from invoices, receipts, price lists, and bank statements.
Feed PDF text into search, RAG pipelines, or LLMs.
Turn reports and catalogs into spreadsheets.
Archive and index document text at scale.

Notes & limits

Works on text-based PDFs. Scanned/image-only PDFs contain no selectable text, so they need OCR (not included in this version).
Table detection is a position-based heuristic — great for clean, grid-like tables, approximate for complex layouts.
`pdfUrls` must be direct links to the PDF file (not a viewer page).
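
To make the position-based idea concrete, here is one way such a heuristic could split a reconstructed line into cells: start a new cell whenever the horizontal gap between consecutive items exceeds a threshold. This is an illustrative sketch, not the actor's algorithm; the `(x_start, x_end, text)` tuples and the `gap` value are assumptions.

```python
def split_row_into_cells(items, gap=15.0):
    """Split one line's items into cells wherever the horizontal gap is large.

    items: list of (x_start, x_end, text) tuples (assumed shape).
    """
    cells = []
    current = []
    previous_end = None
    for x_start, x_end, text in sorted(items):
        # A wide gap since the previous item closes the current cell.
        if previous_end is not None and x_start - previous_end > gap:
            cells.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = x_end
    if current:
        cells.append(" ".join(current))
    return cells


row = [(50, 80, "Unit"), (84, 110, "price"), (200, 230, "Qty"), (320, 360, "Total")]
print(split_row_into_cells(row))
```

Merged cells, wrapped text, and uneven column spacing all defeat a fixed threshold, which is consistent with the caveat that results are approximate for complex layouts.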

Pricing

Pay-per-result: you are billed per PDF successfully processed. Failed downloads or parses are returned with `success: false` and are not charged.
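
As a rough worked example, the listed starting rate of $10.00 per 1,000 PDFs works out to $0.01 per successfully processed PDF. The sketch below estimates a run's charge from its dataset items under that assumption; the rate you actually pay may differ, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Listed starting price per 1,000 PDFs; the rate actually billed may differ.
RATE_PER_THOUSAND = Decimal("10.00")


def estimate_cost(items, rate_per_thousand=RATE_PER_THOUSAND):
    """Estimate the charge for a run: only items with success == True are counted."""
    billable = sum(1 for item in items if item.get("success") is True)
    return billable * rate_per_thousand / Decimal(1000)


# 240 successful PDFs and 10 failures: only the 240 are counted.
items = [{"success": True}] * 240 + [{"success": False}] * 10
print(estimate_cost(items))
```

`Decimal` is used instead of floats so that currency arithmetic stays exact.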
