Pandoc Document Converter - HTML to Markdown, DOCX, EPUB, PPTX

Pricing

from $1.00 / 1,000 converted documents

Convert documents between formats with Pandoc in the cloud: HTML to Markdown for LLMs and RAG, Markdown to Word DOCX, EPUB e-books, PowerPoint PPTX, LaTeX, reStructuredText and more. Feed it URLs or raw text, get one converted document per input.

Rating: 0.0 (0)

Developer: Nicolas van Arkens (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago

Pandoc Document Converter — HTML to Markdown, Markdown to DOCX, EPUB, PPTX & more

Convert documents between formats in bulk, with nothing to install and no servers to run. This Actor wraps Pandoc, the universal document converter, and runs it in the cloud. Give it URLs (which it fetches for you), raw text, or both; pick an output format; and get back one converted document per input.

Typical jobs it does in seconds:

  • HTML → Markdown (turn web pages into clean Markdown for LLMs, RAG pipelines, or docs)
  • Markdown → DOCX (deliver Word documents from generated text)
  • Markdown → EPUB (package content as an e-book)
  • Markdown → PPTX (headings become PowerPoint slides)
  • Other formats: LaTeX, reStructuredText, Org-mode, MediaWiki, Textile, DocBook, OPML, or CSV as input; Markdown, HTML, plain text, RTF, AsciiDoc, ODT and more as output

What data you get

One dataset row per converted document:

Field             Description
source            The URL, or text #N for raw-text inputs
ok                true when conversion succeeded
inputFormat       The detected (or forced) source format
outputFormat      The format you requested
output            The converted document, inline; present for text formats (Markdown, HTML, plain, LaTeX, …)
outputCharacters  Length of the inline output
downloadUrl       Direct download link; present for binary formats (DOCX, PPTX, EPUB, ODT), which are stored in the run's key-value store
outputBytes       Size of the binary file

You are only charged for successful conversions — failed fetches or conversions are reported with ok: false and never billed.
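
The billing rule can be checked against a finished run's dataset. The following is a minimal sketch, assuming the rows have already been loaded as Python dicts carrying the `ok` field described above, and assuming the listed starting price of $1.00 per 1,000 converted documents applies to your account. The function names and the sample rows are illustrative, not part of the Actor.

```python
from typing import Iterable, Mapping

# Assumption: the starting price from the store listing; your actual rate may differ.
PRICE_PER_1000_USD = 1.00


def billable_count(rows: Iterable[Mapping]) -> int:
    """Count rows whose conversion succeeded (ok is true)."""
    return sum(1 for row in rows if row.get("ok") is True)


def estimated_cost_usd(rows: Iterable[Mapping],
                       price_per_1000: float = PRICE_PER_1000_USD) -> float:
    """Estimate the charge, counting only successful conversions."""
    return billable_count(rows) * price_per_1000 / 1000


# Illustrative rows: two successes and one failed fetch.
rows = [
    {"source": "https://example.com/", "ok": True},
    {"source": "text #1", "ok": True},
    {"source": "https://example.invalid/", "ok": False},  # reported, not billed
]

print("billable documents:", billable_count(rows))
print("estimated cost (USD):", estimated_cost_usd(rows))
```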

Input example

{
  "urls": ["https://example.com/"],
  "texts": ["# Quarterly report\n\nRevenue grew **18%** quarter over quarter.\n\n- New customers: 412\n- Churn: 2.1%"],
  "inputFormat": "auto",
  "outputFormat": "gfm"
}

inputFormat: "auto" detects HTML vs Markdown per item (Content-Type header, file extension, or content sniffing). Set it explicitly for LaTeX, RST, Org, MediaWiki, Textile, DocBook, OPML or CSV sources.
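
An input like the one above can also be built and submitted from code. The sketch below assumes the official `apify-client` Python package and its `ApifyClient(...).actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods. The Actor ID is not stated on this page, so it is read from a hypothetical `CONVERTER_ACTOR_ID` environment variable; the check that at least one URL or text is present is a choice made for this example, not documented Actor behavior.

```python
import os
from typing import Optional, Sequence


def build_run_input(urls: Sequence[str] = (),
                    texts: Sequence[str] = (),
                    input_format: str = "auto",
                    output_format: str = "gfm") -> dict:
    """Build an input object with the four fields from the input example."""
    if not urls and not texts:
        # Local sanity check for this sketch only.
        raise ValueError("provide at least one URL or one raw text")
    return {
        "urls": list(urls),
        "texts": list(texts),
        "inputFormat": input_format,
        "outputFormat": output_format,
    }


def run_converter(actor_id: str, run_input: dict, token: Optional[str] = None) -> list:
    """Start the Actor, wait for it to finish, and return its dataset rows."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Only runs when both variables are set; CONVERTER_ACTOR_ID is a hypothetical
# name for wherever you keep the Actor's real ID from its store page.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN") and os.environ.get("CONVERTER_ACTOR_ID"):
    run_input = build_run_input(urls=["https://example.com/"], output_format="gfm")
    for row in run_converter(os.environ["CONVERTER_ACTOR_ID"], run_input):
        print(row.get("source"), row.get("ok"))
```

Leaving `input_format` at `"auto"` matches the example; pass an explicit value for sources such as LaTeX or CSV, using whatever identifiers the Actor's input schema lists.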

Output sample (real run)

{
  "source": "https://example.com/",
  "ok": true,
  "inputFormat": "html",
  "outputFormat": "gfm",
  "output": "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n",
  "outputCharacters": 192
}

And a binary conversion (Markdown → Word):

{
  "source": "text #1",
  "ok": true,
  "inputFormat": "markdown",
  "outputFormat": "docx",
  "downloadUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/converted-1.docx",
  "outputBytes": 10580
}
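
Because text and binary results carry different fields, downstream code usually has to branch on the row shape. Here is a minimal sketch, assuming rows are plain dicts and that a binary result can be recognized by the presence of `downloadUrl`. The shape of failed rows beyond `ok: false` is not documented here, so the sample failure row and the store ID `abc123` are invented for illustration.

```python
from typing import Iterable, List, Mapping, Tuple

Rows = List[Mapping]


def split_rows(rows: Iterable[Mapping]) -> Tuple[Rows, Rows, Rows]:
    """Separate dataset rows into inline-text, binary, and failed results."""
    text_rows: Rows = []
    binary_rows: Rows = []
    failed_rows: Rows = []
    for row in rows:
        if not row.get("ok"):
            failed_rows.append(row)
        elif "downloadUrl" in row:
            binary_rows.append(row)  # DOCX, PPTX, EPUB, ODT: fetch via downloadUrl
        else:
            text_rows.append(row)    # Markdown, HTML, plain, ...: read row["output"]
    return text_rows, binary_rows, failed_rows


# Illustrative rows modeled on the two samples above, plus one failure.
sample = [
    {"source": "https://example.com/", "ok": True, "outputFormat": "gfm",
     "output": "# Example Domain\n"},
    {"source": "text #1", "ok": True, "outputFormat": "docx",
     "downloadUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/converted-1.docx"},
    {"source": "https://example.invalid/", "ok": False},
]

text_rows, binary_rows, failed_rows = split_rows(sample)
print(len(text_rows), "text,", len(binary_rows), "binary,", len(failed_rows), "failed")
```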

Use cases

  • Feed web content to LLMs — convert pages to GitHub-flavored Markdown (gfm) with --wrap=none applied automatically, ready for prompts, embeddings, or RAG ingestion.
  • Automated report delivery — your pipeline produces Markdown; this Actor turns it into DOCX or PPTX your stakeholders actually open. Chain it after any scraper or AI Actor via Apify integrations.
  • Publishing workflows — convert a batch of Markdown chapters or HTML articles into EPUB e-books, or migrate docs between wikis (MediaWiki ⇄ Markdown ⇄ reStructuredText).

FAQ

Which formats are supported? Input: HTML, Markdown (Pandoc / GitHub-flavored / CommonMark), LaTeX, reStructuredText, Org, MediaWiki, Textile, DocBook, OPML, CSV — or auto-detect. Output: Markdown (GFM / Pandoc / CommonMark), HTML, plain text, DOCX, PPTX, EPUB, ODT, RTF, reStructuredText, LaTeX, AsciiDoc, Org, MediaWiki, Textile, OPML.

How do I get the DOCX / EPUB / PPTX files? Binary outputs are stored in the run's key-value store; each dataset row contains a direct downloadUrl. Text outputs come back inline in the dataset.
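
Fetching a binary result could look like the sketch below. It assumes the `downloadUrl` can be retrieved with a plain HTTP GET (a private key-value store may additionally require an API token) and that the last path segment of the URL is a usable file name. The helper names and the store ID `abc123` are hypothetical.

```python
import posixpath
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def record_filename(download_url: str) -> str:
    """Use the last path segment of the URL as the local file name."""
    return posixpath.basename(urlparse(download_url).path)


def download_binary(row: dict, target_dir: str = ".") -> Path:
    """Save the file behind row["downloadUrl"] into target_dir and return its path."""
    url = row["downloadUrl"]
    with urllib.request.urlopen(url) as response:
        data = response.read()
    # If the row reports a size, make sure the download is complete.
    expected = row.get("outputBytes")
    if expected is not None and len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    target = Path(target_dir) / record_filename(url)
    target.write_bytes(data)
    return target


# No network access here: this only shows the file name that would be used.
example_url = "https://api.apify.com/v2/key-value-stores/abc123/records/converted-1.docx"
print(record_filename(example_url))  # prints "converted-1.docx"
```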

Does it extract the article from a web page? No — it converts the page verbatim, exactly like running pandoc on the HTML. Navigation and boilerplate present in the HTML will be present in the output. For readability extraction, run a content-extraction Actor first and pipe its HTML here.
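
To see what "verbatim" means in practice, the same conversion can be reproduced with a local Pandoc install. The sketch below assumes `pandoc` is on the PATH and uses its `--from`, `--to`, and `--wrap=none` options. Of these, only `--wrap=none` is mentioned on this page, so the full command is an approximation of what the Actor runs, not its actual invocation.

```python
import shutil
import subprocess
from typing import List


def pandoc_command(input_format: str = "html", output_format: str = "gfm") -> List[str]:
    """Build a pandoc invocation that reads stdin and writes stdout."""
    return ["pandoc", "--from", input_format, "--to", output_format, "--wrap=none"]


def convert_locally(text: str, input_format: str = "html", output_format: str = "gfm") -> str:
    """Run a locally installed pandoc on the given text and return its output."""
    result = subprocess.run(
        pandoc_command(input_format, output_format),
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A page fragment with navigation markup next to the article content.
html = "<nav><a href='/'>Home</a></nav><h1>Title</h1><p>Body text.</p>"

if shutil.which("pandoc"):
    # The navigation link is converted along with the heading and paragraph.
    print(convert_locally(html))
else:
    print("pandoc not found; the command would be:", " ".join(pandoc_command()))
```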

Is PDF output supported? Not yet. Pandoc produces PDF through an external PDF engine (LaTeX by default), which this Actor does not currently provide. Convert to DOCX or HTML and print or export to PDF, or ask for it in the Actor's Issues tab.

What does it cost? A small fee per successfully converted document (pay-per-event). Failed items are never charged.