Pricing

Pay per event

PDF to JSON Parser

Converts PDF documents into structured JSON. Extracts text, tables, and fields from any public, text-based PDF URL. An optional AI structuring pass (bring your own OpenAI key) turns the raw text into clean, organized JSON ready for automation or analysis.

Developer

BowTiedRaccoon

Last modified

a month ago

What it does

Accepts a list of public PDF URLs (up to 50 MB per file)
Downloads each PDF to temporary storage and extracts text per page using native PDF parsing
Processes every page for complete coverage — no pages skipped
Optionally runs an AI structuring pass (OpenAI GPT-4o-mini or GPT-4o) that organizes the raw text into titled sections, tables, key fields, and metadata
Returns one dataset record per PDF with the full extracted text, per-page breakdown, and AI output
Saves error records for PDFs that fail to download or parse — the run continues
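
The "error record, then continue" behaviour in the last step can be pictured with a short Python sketch. This is not the Actor's source code: `process_all`, the `extract` callback, and `demo_extract` are hypothetical names, and the placeholder values written into failed records (`pageCount` of 0, empty `rawText`) are assumptions. Only the field names are taken from the Output section.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical extractor signature: takes a URL, returns one text string per page.
Extractor = Callable[[str], List[str]]


def process_all(urls: List[str], extract: Extractor) -> List[Dict]:
    """Return one record per URL; a failure becomes an error record."""
    records = []
    for url in urls:
        record = {
            "sourceUrl": url,
            "pageCount": 0,   # assumed placeholder for failed PDFs
            "rawText": "",    # assumed placeholder for failed PDFs
            "status": "success",
            "errorMsg": None,
        }
        try:
            pages = extract(url)
            record["pageCount"] = len(pages)
            record["rawText"] = "\n".join(pages)
        except Exception as exc:
            # One bad PDF must not stop the run: record the error and move on.
            record["status"] = "error"
            record["errorMsg"] = str(exc)
        record["processedAt"] = datetime.now(timezone.utc).isoformat()
        records.append(record)
    return records


def demo_extract(url: str) -> List[str]:
    """Stand-in extractor used only for this demonstration."""
    if "broken" in url:
        raise ValueError("could not parse PDF")
    return ["page one", "page two"]


records = process_all(
    [
        "https://example.com/a.pdf",
        "https://example.com/broken.pdf",
        "https://example.com/b.pdf",
    ],
    demo_extract,
)
```

In this sketch the second URL yields a record with `status` set to `error`, and the third URL is still processed.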

Use cases

Invoice and receipt extraction for accounting automation
Contract and legal document analysis
Academic paper indexing and summarization
Form data extraction from government or regulatory PDFs
Report parsing for data pipelines
Bulk document conversion for RAG / LLM pipelines

Input

Field	Type	Required	Description
`pdfUrls`	Array	Yes	Public PDF URLs to process. Must be directly downloadable.
`openaiApiKey`	String	No	Your OpenAI API key (`sk-...`). Enables AI structuring. Not stored.
`extractionPrompt`	String	No	Custom prompt for the AI structuring pass. Leave blank to use the default (extracts title, author, summary, sections, tables, key fields).
`model`	Select	No	OpenAI model: `gpt-4o-mini` (default, fast) or `gpt-4o` (most capable).
`maxItems`	Integer	No	Maximum PDFs to process per run. Default: 15.
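
As an illustration, the fields above can be assembled and sanity-checked in Python before a run is started. `build_input` is a hypothetical helper, not part of the Actor. The allowed models and the default of 15 come from the table; treating `pdfUrls` as a list of plain strings and rejecting non-HTTP(S) URLs are assumptions.

```python
ALLOWED_MODELS = {"gpt-4o-mini", "gpt-4o"}


def build_input(pdf_urls, openai_api_key=None, extraction_prompt=None,
                model="gpt-4o-mini", max_items=15):
    """Build a run-input dict using the field names from the Input table."""
    if not pdf_urls:
        raise ValueError("pdfUrls is required and must not be empty")
    for url in pdf_urls:
        # Assumption: "directly downloadable" implies a plain HTTP(S) URL.
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an HTTP(S) URL: {url}")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"model must be one of {sorted(ALLOWED_MODELS)}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")

    run_input = {"pdfUrls": list(pdf_urls), "maxItems": max_items}
    if openai_api_key:
        # The AI-related fields only matter when a key is supplied.
        run_input["openaiApiKey"] = openai_api_key
        run_input["model"] = model
        if extraction_prompt:
            run_input["extractionPrompt"] = extraction_prompt
    return run_input


# "sk-..." is a placeholder, not a real key.
run_input = build_input(
    ["https://example.com/invoice-2024-01.pdf"],
    openai_api_key="sk-...",
    extraction_prompt=(
        "Extract all line items as an array of "
        "{description, quantity, unit_price, total}"
    ),
)
native_only_input = build_input(["https://example.com/invoice-2024-01.pdf"])
```

The resulting dictionary is what would be submitted as the run input. Omitting the key, as in `native_only_input`, leaves only the fields needed for native extraction.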

Output

One dataset record per PDF:

Field	Type	Description
`sourceUrl`	String	Original PDF URL
`pageCount`	Number	Number of pages in the PDF
`rawText`	String	Full extracted text (all pages concatenated)
`pages`	String	JSON-encoded array of per-page text: `[{"page": 1, "text": "..."}]`
`structuredJson`	String	AI-structured output as a JSON string (null if no API key is supplied or the AI call fails)
`model`	String	OpenAI model used (null if AI pass skipped)
`processedAt`	String	ISO timestamp when processing completed
`status`	String	`success` or `error`
`errorMsg`	String	Error message on failure, null on success

Example record (native extraction only)

{
  "sourceUrl": "https://example.com/invoice-2024-01.pdf",
  "pageCount": 2,
  "rawText": "Invoice #INV-2024-001\nDate: January 15, 2024\n...",
  "pages": "[{\"page\":1,\"text\":\"Invoice #INV-2024-001...\"},{\"page\":2,\"text\":\"Payment terms...\"}]",
  "structuredJson": null,
  "model": null,
  "processedAt": "2026-06-07T12:00:00.000Z",
  "status": "success",
  "errorMsg": null
}

Example record (with AI structuring)

{
  "sourceUrl": "https://example.com/invoice-2024-01.pdf",
  "pageCount": 2,
  "rawText": "Invoice #INV-2024-001\nDate: January 15, 2024\n...",
  "pages": "[{\"page\":1,\"text\":\"Invoice #INV-2024-001...\"},{\"page\":2,\"text\":\"Payment terms...\"}]",
  "structuredJson": "{\"title\":\"Invoice #INV-2024-001\",\"date\":\"January 15, 2024\",\"key_fields\":{\"invoice_number\":\"INV-2024-001\",\"amount\":\"$1,250.00\"}}",
  "model": "gpt-4o-mini",
  "processedAt": "2026-06-07T12:00:00.000Z",
  "status": "success",
  "errorMsg": null
}
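
Because `pages` and `structuredJson` arrive as JSON-encoded strings, a consumer has to decode them a second time after reading the record. A minimal Python sketch, where `decode_record` is a hypothetical helper name and the sample data is adapted from the example records:

```python
import json


def decode_record(record: dict) -> dict:
    """Return a copy of the record with its JSON-string fields parsed."""
    decoded = dict(record)
    raw_pages = record.get("pages")
    decoded["pages"] = json.loads(raw_pages) if raw_pages else []
    raw_structured = record.get("structuredJson")
    # structuredJson is null when the AI pass was skipped, so keep None as-is.
    decoded["structuredJson"] = json.loads(raw_structured) if raw_structured else None
    return decoded


example = {
    "sourceUrl": "https://example.com/invoice-2024-01.pdf",
    "pageCount": 2,
    "pages": "[{\"page\":1,\"text\":\"Invoice #INV-2024-001...\"},"
             "{\"page\":2,\"text\":\"Payment terms...\"}]",
    "structuredJson": "{\"title\":\"Invoice #INV-2024-001\","
                      "\"key_fields\":{\"invoice_number\":\"INV-2024-001\","
                      "\"amount\":\"$1,250.00\"}}",
    "status": "success",
    "errorMsg": None,
}

decoded = decode_record(example)
invoice_total = decoded["structuredJson"]["key_fields"]["amount"]
second_page_text = decoded["pages"][1]["text"]
```

The helper returns a copy rather than mutating the input, so the original record can still be stored or forwarded unchanged.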

Notes

Native extraction works on any text-based PDF (invoices, reports, forms, contracts). Scanned image-only PDFs return empty text — OCR for image PDFs is not currently supported.
AI structuring is additive. Even when the OpenAI call fails (rate limit, invalid key, network error), the actor returns the native extraction record with `structuredJson: null` rather than failing the run.
Custom prompts let you tailor the structuring output for a specific document type. For example: "Extract all line items as an array of {description, quantity, unit_price, total}".
File size limit: 50 MB per PDF. Larger files are rejected with an error record.
OpenAI costs are billed to your API key separately from actor usage.
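
The notes above imply three outcomes a downstream consumer may want to separate: error records, successful records with no text (likely scanned, image-only PDFs), and usable records. A sketch in Python, assuming records shaped like the examples. `triage` is a hypothetical helper, and treating a whitespace-only `rawText` as "probably scanned" is a heuristic, not something the Actor reports.

```python
def triage(records):
    """Split dataset records into usable, empty-text, and failed buckets."""
    buckets = {"usable": [], "empty_text": [], "failed": []}
    for record in records:
        if record.get("status") != "success":
            buckets["failed"].append(record)
        elif not (record.get("rawText") or "").strip():
            # Parsed without error but produced no text: likely a scanned PDF.
            buckets["empty_text"].append(record)
        else:
            buckets["usable"].append(record)
    return buckets


# Illustrative records only; the URLs and messages are made up.
sample_records = [
    {"sourceUrl": "https://example.com/report.pdf", "status": "success",
     "rawText": "Quarterly report...", "errorMsg": None},
    {"sourceUrl": "https://example.com/scan.pdf", "status": "success",
     "rawText": "  \n", "errorMsg": None},
    {"sourceUrl": "https://example.com/huge.pdf", "status": "error",
     "rawText": "", "errorMsg": "file too large"},
]

buckets = triage(sample_records)
```

Records in the `empty_text` bucket would need an external OCR step, since OCR is not currently supported by this actor.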
