PDF to Markdown Converter - Extract & Format Text

Pricing

$50.00 / 1,000 PDFs converted

Convert PDF documents to clean, readable markdown format. Perfect for documentation and knowledge bases.

Developer

daehwan kim

Maintained by Community

PDF to Markdown Converter

Extract clean, usable text from machine-readable PDFs (research papers, contracts, reports, manuals) and output structured Markdown ready for LLMs, RAG pipelines, or document analysis.

No external APIs. No proprietary services. Built on open source.

Why Use This

Most PDFs store their text in a binary format that LLMs cannot read directly. This Actor extracts the text, cleans it up, and returns it as Markdown you can immediately feed into any AI workflow.

$0.05 per PDF. No subscription, no monthly fee, no setup.

Use Cases

  • RAG pipelines — Convert research papers, whitepapers, or documentation PDFs into text chunks before embedding
  • Contract analysis — Extract legal document text for LLM review
  • Report processing — Batch-process financial reports, audit documents, or regulatory filings
  • Knowledge base ingestion — Convert PDF manuals and guides into searchable text
  • Academic research — Process arXiv papers, theses, or journal articles at scale
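For the RAG use case, the returned Markdown usually has to be cut into chunks before embedding. The sketch below is one possible approach, not something the Actor does itself: `chunkMarkdown` and its `maxChars` budget are hypothetical names chosen for illustration, and the heading-based split assumes the output contains `#`-style headings.

```javascript
// Split Markdown into chunks at heading lines, then cap each chunk's length.
// maxChars is an arbitrary illustrative budget, not a value the Actor defines.
function chunkMarkdown(markdown, maxChars = 2000) {
    const sections = [];
    let current = [];
    for (const line of markdown.split('\n')) {
        // A line starting with 1-6 '#' characters and a space opens a new section.
        if (/^#{1,6} /.test(line) && current.length > 0) {
            sections.push(current.join('\n').trim());
            current = [];
        }
        current.push(line);
    }
    if (current.length > 0) sections.push(current.join('\n').trim());

    // Break oversized sections into fixed-size slices so no chunk exceeds maxChars.
    const chunks = [];
    for (const section of sections) {
        if (section.length === 0) continue;
        for (let i = 0; i < section.length; i += maxChars) {
            chunks.push(section.slice(i, i + maxChars));
        }
    }
    return chunks;
}

const sample = '# Title\n\nIntro text.\n\n## Abstract\n\nLanguage models...';
console.log(chunkMarkdown(sample));
```

Splitting on headings first keeps each chunk topically coherent. The fixed-size fallback only applies to sections longer than the budget.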

Input

| Parameter | Type | Required | Description |
|---|---|---|---|
| pdfUrl | string | Yes | Direct URL to a machine-readable PDF file |
| includePageNumbers | boolean | No | Insert "--- Page N ---" markers between pages (default: false) |
| maxPages | integer | No | Limit pages processed. 0 = all pages (default: 0) |

{
  "pdfUrl": "https://arxiv.org/pdf/2305.10601",
  "includePageNumbers": true,
  "maxPages": 20
}
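
When `includePageNumbers` is enabled, the page markers can be used to recover per-page text on the client side. The following is a minimal sketch under two assumptions that the description above does not confirm: each marker sits on its own line in exactly the form `--- Page N ---`, and it precedes the text of page N. `splitByPage` is a hypothetical helper, not part of the Actor.

```javascript
// Split Actor output into pages, assuming each marker is a line of the
// exact form "--- Page N ---" and comes before the text of page N.
function splitByPage(markdown) {
    const pages = [];
    // With a capturing group, split() interleaves page numbers with page text:
    // [textBeforeFirstMarker, "1", textOfPage1, "2", textOfPage2, ...]
    const parts = markdown.split(/^--- Page (\d+) ---$/m);
    for (let i = 1; i < parts.length; i += 2) {
        pages.push({ page: Number(parts[i]), text: parts[i + 1].trim() });
    }
    return pages;
}

const output = '--- Page 1 ---\nFirst page.\n--- Page 2 ---\nSecond page.';
console.log(splitByPage(output));
```

If the real markers differ (for example, extra whitespace or a different position relative to the page text), the regular expression would need adjusting.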

Output

One item per PDF pushed to the dataset:

| Field | Type | Description |
|---|---|---|
| pdfUrl | string | Source PDF URL |
| pageCount | integer | Number of pages processed |
| wordCount | integer | Total words extracted |
| markdown | string | Extracted text in Markdown format |
| disclaimer | string | Accuracy disclaimer |

{
  "pdfUrl": "https://arxiv.org/pdf/2305.10601",
  "pageCount": 15,
  "wordCount": 8432,
  "markdown": "# Tree of Thoughts: Deliberate Problem Solving with Large Language Models\n\n## Abstract\n\nLanguage models are increasingly being deployed for general problem solving..."
}

Pricing

  • $0.05 per PDF converted
  • Charged only on successful conversion
  • No charge for validation errors or failed runs
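
As a quick budgeting aid, the per-PDF price translates into a simple estimate. This sketch assumes the only charge is the $0.05 per successful conversion listed above and ignores any other platform costs. `estimateCostUsd` is a hypothetical helper name.

```javascript
const PRICE_CENTS_PER_PDF = 5; // $0.05 per PDF converted, as listed above

function estimateCostUsd(successfulConversions) {
    if (!Number.isInteger(successfulConversions) || successfulConversions < 0) {
        throw new RangeError('successfulConversions must be a non-negative integer');
    }
    // Work in whole cents to avoid floating-point drift, then convert to dollars.
    return (successfulConversions * PRICE_CENTS_PER_PDF) / 100;
}

console.log(estimateCostUsd(1000)); // 50
console.log(estimateCostUsd(37));   // 1.85
```

Only successful conversions count toward the total, so failed runs and validation errors should be excluded from the input.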

Quick Start

curl

curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pdfUrl": "https://arxiv.org/pdf/2305.10601",
    "includePageNumbers": true,
    "maxPages": 20
  }'

JavaScript (Apify Client)

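const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// CommonJS modules do not support top-level await, so wrap the calls in an async function.
async function main() {
    // Start the Actor and wait for the run to finish.
    const run = await client.actor('YOUR_ACTOR_ID').call({
        pdfUrl: 'https://arxiv.org/pdf/2305.10601',
        includePageNumbers: true,
    });
    // Each run pushes one item per PDF to its default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items[0].markdown);
}

main().catch(console.error);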

Limitations

| Limitation | Details |
|---|---|
| Scanned PDFs | Not supported; requires machine-readable text layers |
| Image-only PDFs | Will return minimal or empty text |
| Encrypted PDFs | Password-protected files cannot be parsed |
| Non-Latin scripts | Accuracy varies for Arabic, CJK, and other scripts |
| Complex layouts | Multi-column or heavily formatted PDFs may have extraction quirks |

Always verify extracted text against the original for critical use cases.
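
Because scanned or image-only PDFs tend to yield little or no text, a simple check on the output item can help flag results that deserve a manual look. This is a rough heuristic sketch, not part of the Actor: `looksSparse` is a hypothetical helper, and the default threshold of 20 words per page is an arbitrary illustrative value.

```javascript
// Flag items whose extracted text is suspiciously sparse, which may indicate
// a scanned or image-only PDF. The threshold is an illustrative guess.
function looksSparse(item, minWordsPerPage = 20) {
    if (!item || typeof item.markdown !== 'string') return true;
    if (item.markdown.trim().length === 0) return true;
    if (typeof item.wordCount !== 'number') return true;
    if (typeof item.pageCount !== 'number' || item.pageCount <= 0) return true;
    // Compare the average words per page against the threshold.
    return item.wordCount / item.pageCount < minWordsPerPage;
}

const item = { pageCount: 15, wordCount: 8432, markdown: '# Tree of Thoughts...' };
console.log(looksSparse(item)); // false
```

A legitimately short document (a one-page cover letter, say) could also trip the check, so treat a positive result as a prompt to inspect the source PDF rather than as proof of failure.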

Technology

  • pdf-parse — MIT License — PDF text extraction
  • Apify SDK — Apache 2.0 License — Actor runtime and dataset management

Disclaimer

This tool extracts text from PDF files using open source libraries. Accuracy depends on PDF structure and encoding. Results should be reviewed for critical use cases. Not a substitute for professional document review.

