Pricing

from $5.00 / 1,000 converted documents

Pandoc Document Converter - HTML to Markdown, DOCX, EPUB, PPTX

Convert documents between formats with Pandoc in the cloud: HTML to Markdown for LLMs and RAG, Markdown to Word DOCX, EPUB e-books, PowerPoint PPTX, LaTeX, reStructuredText and more. Feed it URLs or raw text, get one converted document per input.



Developer

Nicolas van Arkens

Last modified

21 days ago

Pandoc Document Converter — HTML to Markdown, Markdown to DOCX, EPUB, PPTX & more

Convert documents between formats in bulk, with no install and no servers — this Actor wraps Pandoc, the universal document converter, and runs it in the cloud. Feed it URLs (it fetches them for you) and/or raw text, pick an output format, and get one converted document per input back.

Typical jobs it does in seconds:

HTML → Markdown (turn web pages into clean Markdown for LLMs, RAG pipelines, or docs)
Markdown → DOCX (deliver Word documents from generated text)
Markdown → EPUB (package content as an e-book)
Markdown → PPTX (headings become PowerPoint slides)
LaTeX, reStructuredText, Org-mode, MediaWiki, Textile, DocBook, OPML, CSV → Markdown, HTML, plain text, RTF, AsciiDoc, ODT and more
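Since the Actor wraps Pandoc, each job above corresponds roughly to a single `pandoc` call. The sketch below only builds such a command line. It assumes the Actor passes Pandoc's own format names (`html`, `markdown`, `gfm`, `docx`) and that `--wrap=none` is added for gfm output as described under Use cases; `pandoc_argv` is a hypothetical helper, not the Actor's source.

```python
from typing import List, Optional

# Formats Pandoc writes as binary files; Pandoc will not print these to a
# terminal, so the sketch requires an explicit output path for them.
BINARY_OUTPUTS = {"docx", "pptx", "epub", "odt"}


def pandoc_argv(input_format: str, output_format: str,
                out_path: Optional[str] = None) -> List[str]:
    """Build a pandoc command line for one conversion (input read from stdin)."""
    argv = ["pandoc", "--from", input_format, "--to", output_format]
    if output_format == "gfm":
        # Keep each paragraph on a single line, as the Actor does for gfm output.
        argv.append("--wrap=none")
    if output_format in BINARY_OUTPUTS:
        if out_path is None:
            raise ValueError(f"{output_format} is a binary format; pass out_path")
        argv += ["--output", out_path]
    return argv
```

For example, `" ".join(pandoc_argv("html", "gfm"))` gives `pandoc --from html --to gfm --wrap=none`; piping a page's HTML into that command reproduces an HTML → Markdown job locally.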

What data you get

One dataset row per converted document:

Field	Description
`source`	The URL, or `text #N` for raw-text inputs
`ok`	`true` when conversion succeeded
`inputFormat`	The detected (or forced) source format
`outputFormat`	The format you requested
`output`	The converted document, inline — for text formats (Markdown, HTML, plain, LaTeX, …)
`outputCharacters`	Length of the inline output
`downloadUrl`	Direct download link — for binary formats (DOCX, PPTX, EPUB, ODT), stored in the run's key-value store
`outputBytes`	Size of the binary file
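A consumer of the dataset typically needs to separate failed rows, inline text results and binary files. The sketch below does that using only the fields in the table; a failed row is assumed to carry at least `source` and `ok`, and `split_results` is a hypothetical helper name.

```python
from typing import Any


def split_results(rows: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group dataset rows into inline text results, binary downloads and failures."""
    groups: dict[str, list[dict[str, Any]]] = {"text": [], "binary": [], "failed": []}
    for row in rows:
        if not row.get("ok"):
            groups["failed"].append(row)   # reported, never billed
        elif "downloadUrl" in row:
            groups["binary"].append(row)   # DOCX, PPTX, EPUB, ODT: fetch the file
        else:
            groups["text"].append(row)     # converted document is in row["output"]
    return groups
```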

You are only charged for successful conversions — failed fetches or conversions are reported with ok: false and never billed.

Input example

{
    "urls": ["https://example.com/"],
    "texts": ["# Quarterly report\n\nRevenue grew **18%** quarter over quarter.\n\n- New customers: 412\n- Churn: 2.1%"],
    "inputFormat": "auto",
    "outputFormat": "gfm"
}

inputFormat: "auto" detects HTML vs Markdown per item (Content-Type header, file extension, or content sniffing). Set it explicitly for LaTeX, RST, Org, MediaWiki, Textile, DocBook, OPML or CSV sources.
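The Actor's detection code is not shown on this page, so the following is a hypothetical heuristic that follows the same order of evidence (Content-Type header, then file extension, then content sniffing) and returns `html` or `markdown`. The MIME types, extensions and the sniffing pattern are all assumptions.

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlparse

HTML_TYPES = {"text/html", "application/xhtml+xml"}
MARKDOWN_TYPES = {"text/markdown", "text/x-markdown"}
HTML_EXTS = {".html", ".htm", ".xhtml"}
MARKDOWN_EXTS = {".md", ".markdown", ".mdown"}
# A doctype or an opening tag at the start of the text; "<https://...>"
# autolinks do not match because ":" cannot follow the tag name.
HTML_SNIFF = re.compile(r"^\s*(<!doctype html|<[a-z][a-z0-9]*[\s/>])", re.IGNORECASE)


def detect_format(text: str, url: Optional[str] = None,
                  content_type: Optional[str] = None) -> str:
    """Guess 'html' or 'markdown' for one input (hypothetical heuristic)."""
    if content_type:
        mime = content_type.split(";")[0].strip().lower()
        if mime in HTML_TYPES:
            return "html"
        if mime in MARKDOWN_TYPES:
            return "markdown"
    if url:
        ext = posixpath.splitext(urlparse(url).path)[1].lower()
        if ext in HTML_EXTS:
            return "html"
        if ext in MARKDOWN_EXTS:
            return "markdown"
    return "html" if HTML_SNIFF.match(text) else "markdown"
```

A heuristic like this falls back to Markdown for HTML fragments that do not start with a tag, which is one reason to set `inputFormat` explicitly when you know the source format.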

Output sample (real run)

{
    "source": "https://example.com/",
    "ok": true,
    "inputFormat": "html",
    "outputFormat": "gfm",
    "output": "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n",
    "outputCharacters": 192
}

And a binary conversion (Markdown → Word):

{
    "source": "text #1",
    "ok": true,
    "inputFormat": "markdown",
    "outputFormat": "docx",
    "downloadUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/converted-1.docx",
    "outputBytes": 10580
}
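To start a run like the ones above from Python, one option is the `apify-client` package. The sketch assumes that package is installed and that you supply your own API token and this Actor's ID, which this page does not state (the placeholders below are hypothetical). `build_run_input` is a hypothetical helper that leaves out empty `urls`/`texts` lists.

```python
from typing import Any, Sequence


def build_run_input(urls: Sequence[str] = (), texts: Sequence[str] = (),
                    input_format: str = "auto", output_format: str = "gfm") -> dict[str, Any]:
    """Build the Actor input, omitting urls/texts when they are empty."""
    if not urls and not texts:
        raise ValueError("provide at least one URL or raw text")
    run_input: dict[str, Any] = {"inputFormat": input_format, "outputFormat": output_format}
    if urls:
        run_input["urls"] = list(urls)
    if texts:
        run_input["texts"] = list(texts)
    return run_input


def run_conversion(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start the Actor, wait for the run to finish and return its dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Usage: `run_conversion("<APIFY_TOKEN>", "<actor-id>", build_run_input(texts=["# Hello"], output_format="docx"))` should return rows shaped like the Markdown → Word sample.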

Use cases

Feed web content to LLMs — convert pages to GitHub-flavored Markdown (gfm) with --wrap=none applied automatically, ready for prompts, embeddings, or RAG ingestion.
Automated report delivery — your pipeline produces Markdown; this Actor turns it into DOCX or PPTX your stakeholders actually open. Chain it after any scraper or AI Actor via Apify integrations.
Publishing workflows — convert a batch of Markdown chapters or HTML articles into EPUB e-books, or migrate docs between wikis (MediaWiki ⇄ Markdown ⇄ reStructuredText).
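When an upstream scraper or extraction Actor has already produced HTML, that HTML can be passed as raw `texts` instead of URLs. Raw-text rows come back labelled `text #N`, so the sketch also keeps a map from those labels to the original URLs. The item field names `html` and `url`, and the assumption that `N` counts from 1 in the order of `texts` (consistent with the `text #1` sample), are hypothetical.

```python
from typing import Any


def chain_from_scraper(items: list[dict[str, Any]], html_field: str = "html",
                       url_field: str = "url", output_format: str = "gfm"):
    """Turn upstream dataset items into converter input plus a source map."""
    texts: list[str] = []
    source_map: dict[str, str] = {}
    for item in items:
        html = item.get(html_field)
        if not html:
            continue  # skip items the upstream Actor could not fetch
        texts.append(html)
        # Assumes raw-text rows are numbered from 1 in the order of "texts".
        source_map[f"text #{len(texts)}"] = item.get(url_field, "")
    run_input = {"texts": texts, "inputFormat": "html", "outputFormat": output_format}
    return run_input, source_map
```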

FAQ

Which formats are supported? Input: HTML, Markdown (Pandoc / GitHub-flavored / CommonMark), LaTeX, reStructuredText, Org, MediaWiki, Textile, DocBook, OPML, CSV — or auto-detect. Output: Markdown (GFM / Pandoc / CommonMark), HTML, plain text, DOCX, PPTX, EPUB, ODT, RTF, reStructuredText, LaTeX, AsciiDoc, Org, MediaWiki, Textile, OPML.
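Whether a result arrives inline or as a file depends only on the output format. The sketch below encodes that split from the dataset table; it assumes the Actor's `outputFormat` values are Pandoc's writer names (only `gfm` and `docx` are confirmed by the examples on this page).

```python
# Assumed outputFormat values (Pandoc writer names) for the formats listed above.
TEXT_OUTPUTS = {"gfm", "markdown", "commonmark", "html", "plain", "rtf", "rst",
                "latex", "asciidoc", "org", "mediawiki", "textile", "opml"}
BINARY_OUTPUTS = {"docx", "pptx", "epub", "odt"}


def result_field(output_format: str) -> str:
    """Return the dataset field that will hold the result for this format."""
    fmt = output_format.lower()
    if fmt in BINARY_OUTPUTS:
        return "downloadUrl"  # file stored in the run's key-value store
    if fmt in TEXT_OUTPUTS:
        return "output"       # converted text returned inline
    raise ValueError(f"unsupported output format: {output_format!r}")
```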

How do I get the DOCX / EPUB / PPTX files? Binary outputs are stored in the run's key-value store; each dataset row contains a direct downloadUrl. Text outputs come back inline in the dataset.
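A minimal sketch for saving binary outputs locally, using only the standard library. It assumes each `downloadUrl` can be fetched directly, as the answer above describes it, and names each file after the record key at the end of the URL (for example `converted-1.docx`); `download_binaries` and `local_name` are hypothetical helpers.

```python
import posixpath
import urllib.request
from pathlib import Path
from typing import Any
from urllib.parse import urlparse


def local_name(download_url: str) -> str:
    """Use the last path segment of the record URL as the file name."""
    return posixpath.basename(urlparse(download_url).path)


def download_binaries(rows: list[dict[str, Any]], dest: str = "converted") -> list[Path]:
    """Download every successful binary result into dest and return the paths."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for row in rows:
        url = row.get("downloadUrl")
        if not (row.get("ok") and url):
            continue  # text results and failures have nothing to download
        target = out_dir / local_name(url)
        with urllib.request.urlopen(url) as resp:
            target.write_bytes(resp.read())
        saved.append(target)
    return saved
```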

Does it extract the article from a web page? No — it converts the page verbatim, exactly like running pandoc on the HTML. Navigation and boilerplate present in the HTML will be present in the output. For readability extraction, run a content-extraction Actor first and pipe its HTML here.

Is PDF output supported? Not yet. Pandoc produces PDF through an external PDF engine (typically a LaTeX engine), which this Actor does not provide. Convert to DOCX or HTML and print/export to PDF, or ask for it in the Actor's Issues tab.

What does it cost? From $5.00 per 1,000 successfully converted documents (pay-per-event). Failed items are never charged.
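A back-of-the-envelope estimate, assuming the listed starting price of $5.00 per 1,000 documents applies and that only rows with `ok: true` are billed, as stated above. The price constant is an assumption; check the Actor's current pricing for your plan.

```python
from decimal import Decimal
from typing import Any

PRICE_PER_1000 = Decimal("5.00")  # listed starting price; may differ for your plan


def estimate_cost(rows: list[dict[str, Any]]) -> Decimal:
    """Estimate the charge for a run: only successful conversions are billed."""
    billed = sum(1 for row in rows if row.get("ok"))
    return (PRICE_PER_1000 * billed / 1000).quantize(Decimal("0.0001"))
```

At that rate, a run with 3 successful conversions and 1 failure comes to $0.015.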

Pandoc Document Converter

gentle_cloud/pandoc-document-converter

Convert documents between formats (HTML, Markdown, DOCX, EPUB, PDF, LaTeX, RST, ODT, PPTX) using Pandoc. Accepts raw text or URL input.

Monkey Coder

Pandoc Document Converter

incredible_moment/pandoc-actor

Universal document converter. Transform Markdown, HTML, and text to PDF, DOCX, EPUB, and more. High-performance Rust wrapper for the Pandoc engine ensures fast execution and low memory footprint.

Daniel Rosen

Document Parser — PDF/DOCX to Markdown & JSON for RAG

genuine_qa/document-parser

Convert PDF, DOCX, PPTX, XLSX, HTML and images into clean Markdown or JSON for RAG and LLM pipelines. Powered by IBM's open-source Docling.

Rahul Bhiwagade

Word, PowerPoint & Excel to Markdown — for RAG & AI Agents

lizaraco/office-docs-to-markdown

Convert DOCX, PPTX, and XLSX files to clean, LLM-ready markdown at scale. Headings, tables, slides, and sheets preserved. Never-fail runs, per-document output. The Office twin of PDF-to-Markdown.

Shawn Downs

Doc-to-Markdown/JSON RAG Prep - Convert PDF & DOCX for RAG

bigjoecoding/doc-to-markdown-json-rag-prep

Convert PDF, DOCX, PPTX and webpages to clean Markdown and RAG-ready JSON chunks for your embedding pipeline. No LLM cost. $0.03 per document.

Joseph Curry

RAG Document Converter

web.harvester/rag-document-converter

Convert PDF, DOCX, PPTX, and other documents to clean Markdown optimized for RAG pipelines. Preserves structure, tables, and headers. Powered by IBM Docling.

Web Harvester

PDF & Document to Markdown - PDF, DOCX & HTML for LLMs

entranced_gelato/ai-document-reader

Turn any PDF, DOCX, TXT, or HTML document into clean, LLM-ready text + Markdown with metadata (title, pages, word count) and an optional AI summary. The document counterpart to a web reader — built for RAG ingestion, document Q&A, and AI agents (LangChain, LlamaIndex). Fast, structured, single-call.

AIDevs

Universal Document Format Transformer

actorify/universal-document-format-transformer

Universal Document Format Transformer: a cloud-based Apify Actor that converts documents (PDF, DOCX, PPTX, HTML, TXT) into Markdown, JSON, CSV, HTML or TXT using Pandoc. Easy REST API for automations (n8n, Zapier, Make), production-ready error handling, and security controls.

fanio zilla

Document Format Converter — Markdown, HTML & Text

junipr/document-format-converter

Convert Markdown, HTML, plain text, JSON, and CSV-style documents into clean automation-ready formats with downloadable output files.

junipr

PDF & DOCX to Markdown — Document Extractor for LLM/RAG

fetchbase/document-to-markdown

Convert PDF and Word (DOCX) documents into clean Markdown, text, or JSON. Smart PDF paragraph reflow, page markers for RAG citations, full DOCX structure (headings, lists, tables), custom auth headers. No browser — parses in seconds. Charged per page processed — no startup fee.