PDF to Markdown Converter
Pricing
$4.00/month + usage
Convert PDFs to clean Markdown with optional OCR for scanned documents. Uses PDF.js for text extraction and Tesseract.js for optical character recognition.
Developer
Web Harvester
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 6 days ago
Convert PDFs to clean Markdown with optional OCR for scanned documents. Lightweight alternative to heavy document processing tools.
Features
- Fast Text Extraction: Uses PDF.js for native text PDFs
- OCR Support: Tesseract.js for scanned/image documents
- Smart Mode (combined): Auto-detects the best extraction method for each page
- Layout Preservation: Maintains document structure
- Multi-language OCR: 14+ languages supported
- Batch Processing: Convert multiple PDFs at once
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| file | string | - | Upload a PDF file |
| pdfUrls | array | - | URLs of PDFs to convert |
| mode | string | "quick" | Extraction mode |
| language | string | "eng" | OCR language |
| preserveLayout | boolean | true | Preserve document structure |
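
The parameters above can be assembled into an input object before a run. The following is a minimal sketch, not part of the Actor: `build_input` and `VALID_MODES` are hypothetical helper names, and the validation rules are assumptions based only on the table and the modes listed in this README.

```python
import json

# Modes named in this README; the Actor may accept others.
VALID_MODES = {"quick", "ocr", "combined"}


def build_input(pdf_urls, mode="quick", language="eng", preserve_layout=True):
    """Return an Actor input dict using the parameter names from the table."""
    if not pdf_urls:
        raise ValueError("pdf_urls must contain at least one URL")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "pdfUrls": list(pdf_urls),
        "mode": mode,
        "language": language,
        "preserveLayout": preserve_layout,
    }


if __name__ == "__main__":
    actor_input = build_input(["https://example.com/document.pdf"], mode="combined")
    # The printed JSON can be passed as the run input.
    print(json.dumps(actor_input, indent=2))
```

Checking the mode locally catches typos before a run is started and billed.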
Extraction Modes
- quick: Fast extraction using PDF.js - best for native text PDFs
- ocr: Tesseract OCR - use for scanned documents or images
- combined: Auto-detects per page - uses OCR when text extraction fails
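
The per-page fallback in combined mode could look roughly like the sketch below. The README does not document the Actor's actual heuristic, so this is only an illustration: `choose_method`, `plan_pages`, the `min_chars` threshold, and the `"ocr"` label are all assumptions (only `"pdf.js"` appears in the documented output).

```python
def choose_method(extracted_text, min_chars=20):
    """Pick an extraction method for one page.

    min_chars is a hypothetical threshold, not a documented Actor setting.
    """
    # Count only non-whitespace characters so blank pages do not pass.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return "pdf.js" if visible >= min_chars else "ocr"


def plan_pages(page_texts, min_chars=20):
    """Return the chosen method for each page, in page order."""
    return [choose_method(text, min_chars) for text in page_texts]


if __name__ == "__main__":
    pages = ["Chapter 1\nThis page has a real text layer.", "", "  \n "]
    print(plan_pages(pages))  # ['pdf.js', 'ocr', 'ocr']
```

A page with a usable text layer stays on the fast path, and only pages that yield little or no text pay the OCR cost.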
Output
Results are saved to the dataset:
{
  "status": "success",
  "fileName": "document.pdf",
  "pdfUrl": "https://...",
  "markdown": "# Document Title\n\nContent here...",
  "pageCount": 5,
  "extractionMethod": "pdf.js",
  "characterCount": 12345
}
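
A dataset record shaped like the one above can be written to a `.md` file with a few lines of Python. This is a sketch under assumptions: `save_markdown` is a hypothetical helper, and treating any `status` other than `"success"` as a failure is a guess, since the README shows only the success case.

```python
import json
from pathlib import Path


def save_markdown(record, out_dir):
    """Write a successful record's Markdown to out_dir and return the path."""
    if record.get("status") != "success":
        raise ValueError(f"conversion failed for {record.get('fileName')!r}")
    # document.pdf -> document.md; fall back to a generic name if missing.
    stem = Path(record.get("fileName") or "output.pdf").stem
    target = Path(out_dir) / f"{stem}.md"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(record["markdown"], encoding="utf-8")
    return target


if __name__ == "__main__":
    raw = (
        '{"status": "success", "fileName": "document.pdf", '
        '"markdown": "# Document Title\\n\\nContent here...", "pageCount": 5}'
    )
    print(save_markdown(json.loads(raw), "converted"))
```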
Use Cases
- LLM Preprocessing: Convert PDFs for AI/RAG pipelines
- Documentation Migration: Convert PDF docs to Markdown
- Content Extraction: Pull text from reports and papers
- Accessibility: Make PDF content more accessible
- Archive Conversion: Convert legacy PDFs to modern format
Supported Languages (OCR)
- English, French, German, Spanish, Italian
- Portuguese, Dutch, Polish, Russian
- Chinese (Simplified/Traditional)
- Japanese, Korean, Arabic
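
The `language` parameter takes a code rather than a language name. The mapping below uses the standard Tesseract language-data codes for the languages listed above. Only `"eng"` appears in this README, so whether the Actor accepts the other codes unchanged is an assumption, and `language_code` is a hypothetical helper.

```python
# Standard Tesseract traineddata codes for the languages listed above.
# Assumption: the Actor accepts these codes as-is; only "eng" is documented.
TESSERACT_CODES = {
    "english": "eng",
    "french": "fra",
    "german": "deu",
    "spanish": "spa",
    "italian": "ita",
    "portuguese": "por",
    "dutch": "nld",
    "polish": "pol",
    "russian": "rus",
    "chinese (simplified)": "chi_sim",
    "chinese (traditional)": "chi_tra",
    "japanese": "jpn",
    "korean": "kor",
    "arabic": "ara",
}


def language_code(name):
    """Look up the code for a language name, ignoring case and padding."""
    key = name.strip().lower()
    if key not in TESSERACT_CODES:
        raise ValueError(f"no code known for language: {name!r}")
    return TESSERACT_CODES[key]


if __name__ == "__main__":
    print(language_code("German"))
```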
Example
# Using Apify CLI
apify run -i '{"pdfUrls": ["https://example.com/document.pdf"], "mode": "combined", "language": "eng"}'
Technical Notes
- Quick mode is 10-50x faster than OCR
- OCR quality depends on scan quality and resolution
- Combined mode adds overhead because each page is analyzed before an extraction method is chosen
- Large PDFs may require more memory
- Some complex layouts may not convert perfectly