Pandoc Document Converter

Pricing

Pay per usage

Convert documents between formats (HTML, Markdown, DOCX, EPUB, PDF, LaTeX, RST, ODT, PPTX) using Pandoc. Accepts raw text or URL input.

Developer

Monkey Coder

Maintained by Community

📄 Pandoc Document Converter

Convert documents between multiple formats using the powerful Pandoc document conversion engine. Supports HTML, Markdown, DOCX, EPUB, PDF, LaTeX, RST, ODT, PPTX, and more.

✨ Features

  • 20+ format support — Convert between HTML, Markdown, GFM, CommonMark, LaTeX, RST, DOCX, EPUB, ODT, PPTX, PDF, plain text, AsciiDoc, MediaWiki, Org-mode, and more
  • URL input — Fetch content directly from a URL and convert it
  • Raw text input — Paste HTML, Markdown, or any supported format directly
  • Binary output — DOCX, EPUB, ODT, PPTX, and PDF files are saved to the key-value store for easy download
  • PDF generation — Powered by WeasyPrint (no heavy LaTeX installation needed)
  • Standalone mode — Produce complete documents with proper headers and footers

🔧 How It Works

  1. You provide content (raw text or a URL to fetch from)
  2. You specify the input format and desired output format
  3. The Actor runs Pandoc CLI to perform the conversion
  4. Text output (HTML, Markdown, etc.) is returned in the dataset
  5. Binary output (DOCX, EPUB, PDF, etc.) is saved to the key-value store and base64-encoded in the dataset
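The conversion step above can be sketched as a function that assembles a Pandoc command line. This is a minimal sketch, not the Actor's actual source: the function name build_pandoc_command and its parameters are hypothetical, and it assumes the input text is piped to Pandoc on stdin. The flags themselves (--from, --to, --output, --standalone, --pdf-engine) are standard Pandoc options.

```python
from typing import List, Optional

# Formats this README lists as binary outputs.
BINARY_FORMATS = {"docx", "epub", "odt", "pptx", "pdf"}


def build_pandoc_command(
    from_format: str,
    to_format: str,
    output_path: Optional[str] = None,
    standalone: bool = False,
) -> List[str]:
    """Build a pandoc argument list (hypothetical helper; input comes from stdin)."""
    cmd = ["pandoc", "--from", from_format]
    if to_format == "pdf":
        # Pandoc renders PDF through an intermediate format. With WeasyPrint
        # the intermediate is HTML, and the .pdf output path selects PDF.
        cmd += ["--to", "html", "--pdf-engine=weasyprint"]
    else:
        cmd += ["--to", to_format]
    if standalone:
        cmd.append("--standalone")
    if output_path is not None:
        cmd += ["--output", output_path]
    elif to_format in BINARY_FORMATS:
        # Binary formats must be written to a file rather than stdout.
        raise ValueError(f"{to_format} output needs a file path")
    return cmd
```

The resulting list could be passed to subprocess.run(cmd, input=text.encode("utf-8"), capture_output=True, check=True), with text output read from stdout and binary output read back from output_path.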

🚀 How to Use

  1. Set input — Either paste content in the "Content" field or enter a URL in "Source URL"
  2. Choose formats — Set "Input Format" (e.g., html) and "Output Format" (e.g., markdown)
  3. Run the Actor
  4. Get results — Check the dataset for text output, or download binary files from the key-value store
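For programmatic runs, the same choices can be collected into an input object before starting the Actor. The sketch below is illustrative only: this page shows the field labels ("Content", "Source URL", "Input Format", "Output Format") but not the underlying JSON keys, so every key name used here (content, source_url, from_format, to_format) is a guess, as is the rule that exactly one of content or URL must be supplied.

```python
from typing import Dict, Optional


def build_run_input(
    from_format: str,
    to_format: str,
    content: Optional[str] = None,
    source_url: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble an input object. All key names are hypothetical."""
    # Assumption: the Actor expects either raw content or a URL, not both.
    if (content is None) == (source_url is None):
        raise ValueError("provide exactly one of content or source_url")
    run_input = {"from_format": from_format, "to_format": to_format}
    if content is not None:
        run_input["content"] = content
    else:
        run_input["source_url"] = source_url
    return run_input
```

Check the Actor's input schema for the real key names before using a dictionary like this in an API call.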

Common Conversions

From      To        Use Case
html      markdown  Convert web pages to Markdown
markdown  html      Render Markdown as HTML
html      docx      Save web content as Word document
markdown  docx      Create Word documents from Markdown
html      epub      Convert articles to e-book format
markdown  pdf       Generate PDF from Markdown
html      plain     Strip HTML tags, extract plain text
latex     html      Convert LaTeX papers to web format
html      rst       Convert to reStructuredText

📊 Sample Output (text conversion)

{
  "from_format": "html",
  "to_format": "markdown",
  "input_size_bytes": 245,
  "output_size_bytes": 128,
  "output_type": "text",
  "output": "# Hello World\n\nThis is a **sample HTML** document for conversion.\n\n- Item 1\n- Item 2\n- Item 3\n",
  "converted_at": "2026-03-20T08:30:00.000000"
}

📊 Sample Output (binary conversion)

{
  "from_format": "html",
  "to_format": "docx",
  "input_size_bytes": 245,
  "output_size_bytes": 8432,
  "output_type": "binary",
  "output_base64": "UEsDBBQAAAAI...",
  "download_key": "output.docx",
  "converted_at": "2026-03-20T08:30:00.000000"
}

Binary files (DOCX, EPUB, ODT, PPTX, PDF) are also saved to the key-value store with the key output.<format> for direct download.
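A dataset item shaped like the two samples above can be unpacked with a few lines of standard-library Python. This is a minimal sketch assuming the item has already been loaded into a dictionary; the helper names extract_output and save_binary are made up for illustration, while the field names (output_type, output, output_base64, download_key) come from the sample output.

```python
import base64
from pathlib import Path
from typing import Any, Dict, Union


def extract_output(item: Dict[str, Any]) -> Union[str, bytes]:
    """Return converted text as str, or decoded binary output as bytes."""
    if item["output_type"] == "text":
        return item["output"]
    # Binary results carry the file contents base64-encoded.
    return base64.b64decode(item["output_base64"])


def save_binary(item: Dict[str, Any], directory: str) -> Path:
    """Write a binary result into directory, named after its download_key."""
    if item["output_type"] != "binary":
        raise ValueError("item does not contain binary output")
    path = Path(directory) / item["download_key"]
    path.write_bytes(extract_output(item))
    return path
```

Decoding from the dataset avoids a second request to the key-value store, though for large files fetching the stored record directly may be more practical.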

📝 Input Formats

html, markdown, gfm (GitHub Flavored Markdown), commonmark, latex, rst, textile, org, mediawiki, json (Pandoc AST)

📤 Output Formats

html, markdown, gfm, commonmark, latex, rst, plain, docx, epub, odt, pptx, asciidoc, mediawiki, org, pdf
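Because the input and output lists differ (docx, for example, appears only as an output), a client may want to check a format pair before starting a run. The following is a sketch of such a check, with the sets copied from the two lists above; validate_formats is a hypothetical helper, and the Actor itself may accept names not listed here.

```python
# Format names copied from the "Input Formats" and "Output Formats" lists.
INPUT_FORMATS = frozenset({
    "html", "markdown", "gfm", "commonmark", "latex",
    "rst", "textile", "org", "mediawiki", "json",
})
OUTPUT_FORMATS = frozenset({
    "html", "markdown", "gfm", "commonmark", "latex", "rst", "plain", "docx",
    "epub", "odt", "pptx", "asciidoc", "mediawiki", "org", "pdf",
})


def validate_formats(from_format: str, to_format: str) -> None:
    """Raise ValueError if either format is missing from the documented lists."""
    if from_format not in INPUT_FORMATS:
        raise ValueError(f"unsupported input format: {from_format}")
    if to_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output format: {to_format}")
```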

⚠️ Notes

  • Input size limit: 10 MB maximum
  • PDF output: Uses WeasyPrint engine (supports CSS styling, no LaTeX needed)
  • Binary output: Files are base64-encoded in the dataset AND saved to the key-value store for direct download
  • URL fetching: The Actor issues a basic HTTP GET request with a browser-like User-Agent header. Pages behind advanced anti-bot protection may fail to load.
  • Memory: 1 GB is recommended for large documents or PDF generation
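Two of the notes above, the size limit and the URL fetch, can be mirrored on the client side. This is a rough sketch under stated assumptions: "10 MB" is read as 10 × 1024 × 1024 bytes of UTF-8, the User-Agent string is only an example of a browser-like value, and both helper names are hypothetical. The code builds the request without sending it.

```python
import urllib.request

# Assumption: the documented "10 MB" limit is interpreted as 10 MiB.
MAX_INPUT_BYTES = 10 * 1024 * 1024

# Illustrative browser-like value; the Actor's actual header is not documented.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def check_input_size(content: str) -> int:
    """Return the UTF-8 size of content, raising if it exceeds the limit."""
    size = len(content.encode("utf-8"))
    if size > MAX_INPUT_BYTES:
        raise ValueError(f"input is {size} bytes, limit is {MAX_INPUT_BYTES}")
    return size


def build_fetch_request(url: str) -> urllib.request.Request:
    """Prepare a plain GET request carrying a browser-like User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": USER_AGENT}, method="GET"
    )
```

The prepared request could then be sent with urllib.request.urlopen(request, timeout=30) and the response body passed through check_input_size before conversion.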