Pandoc Document Converter - HTML to Markdown, DOCX, EPUB, PPTX
Pricing
from $1.00 / 1,000 converted documents
Convert documents between formats with Pandoc in the cloud: HTML to Markdown for LLMs and RAG, Markdown to Word DOCX, EPUB e-books, PowerPoint PPTX, LaTeX, reStructuredText and more. Feed it URLs or raw text, get one converted document per input.
Developer
Nicolas van Arkens
Maintained by Community
Actor stats
Bookmarked
0
Total users
2
Monthly active users
1
Last modified
2 days ago
Pandoc Document Converter — HTML to Markdown, Markdown to DOCX, EPUB, PPTX & more
Convert documents between formats in bulk, with no install and no servers — this Actor wraps Pandoc, the universal document converter, and runs it in the cloud. Feed it URLs (it fetches them for you) and/or raw text, pick an output format, and get one converted document per input back.
Typical jobs it does in seconds:
- HTML → Markdown (turn web pages into clean Markdown for LLMs, RAG pipelines, or docs)
- Markdown → DOCX (deliver Word documents from generated text)
- Markdown → EPUB (package content as an e-book)
- Markdown → PPTX (headings become PowerPoint slides)
- LaTeX, reStructuredText, Org-mode, MediaWiki, Textile, DocBook, OPML, or CSV → Markdown, HTML, plain text, RTF, AsciiDoc, ODT, and more
What data you get
One dataset row per converted document:
| Field | Description |
|---|---|
| `source` | The URL, or `text #N` for raw-text inputs |
| `ok` | `true` when conversion succeeded |
| `inputFormat` | The detected (or forced) source format |
| `outputFormat` | The format you requested |
| `output` | The converted document, inline; text formats only (Markdown, HTML, plain, LaTeX, …) |
| `outputCharacters` | Length of the inline output, in characters |
| `downloadUrl` | Direct download link; binary formats only (DOCX, PPTX, EPUB, ODT), stored in the run's key-value store |
| `outputBytes` | Size of the binary file, in bytes |
You are only charged for successful conversions — failed fetches or conversions are reported with ok: false and never billed.
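Because billing follows the `ok` flag, the billable count for a run can be read straight off the dataset. The following is a minimal sketch, assuming rows shaped like the fields described above and the listed "from $1.00 / 1,000" rate. The rate you actually pay may differ, and `estimate_cost` is an illustrative helper, not part of the Actor.

```python
from decimal import Decimal


def estimate_cost(rows, price_per_1000=Decimal("1.00")):
    """Return (billable_count, failed_count, estimated_cost) for dataset rows.

    Assumption: only rows with ok == True are billed, at a flat per-document rate.
    """
    billable = sum(1 for row in rows if row.get("ok") is True)
    failed = len(rows) - billable
    cost = price_per_1000 * billable / 1000
    return billable, failed, cost


# Hypothetical rows: two successes and one failed fetch.
rows = [
    {"source": "https://example.com/", "ok": True},
    {"source": "text #1", "ok": True},
    {"source": "https://example.invalid/", "ok": False},
]
billable, failed, cost = estimate_cost(rows)
print(f"{billable} billable, {failed} failed, about ${cost}")
```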
Input example
{"urls": ["https://example.com/"],"texts": ["# Quarterly report\n\nRevenue grew **18%** quarter over quarter.\n\n- New customers: 412\n- Churn: 2.1%"],"inputFormat": "auto","outputFormat": "gfm"}
inputFormat: "auto" detects HTML vs Markdown per item (Content-Type header, file extension, or content sniffing). Set it explicitly for LaTeX, RST, Org, MediaWiki, Textile, DocBook, OPML or CSV sources.
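To make the three detection signals concrete, here is a rough sketch of how HTML-versus-Markdown detection could work. It is not the Actor's actual code: the function name, the order of precedence, the MIME types, and the sniffing pattern are all assumptions chosen for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed MIME types and extensions; the Actor's real lists are not documented.
HTML_TYPES = ("text/html", "application/xhtml+xml")
MARKDOWN_TYPES = ("text/markdown", "text/x-markdown")


def detect_format(body: str, url: Optional[str] = None,
                  content_type: Optional[str] = None) -> str:
    """Guess 'html' or 'markdown' from header, extension, then content."""
    # 1. Content-Type header, ignoring parameters such as charset.
    if content_type:
        mime = content_type.split(";", 1)[0].strip().lower()
        if mime in HTML_TYPES:
            return "html"
        if mime in MARKDOWN_TYPES:
            return "markdown"
    # 2. File extension of the URL path.
    if url:
        path = urlparse(url).path.lower()
        if path.endswith((".html", ".htm", ".xhtml")):
            return "html"
        if path.endswith((".md", ".markdown")):
            return "markdown"
    # 3. Content sniffing: look for common HTML tags near the start.
    head = body.lstrip()[:1024].lower()
    if head.startswith("<!doctype html") or re.search(r"<(html|head|body|div|p)\b", head):
        return "html"
    return "markdown"
```

A heuristic like this cannot tell LaTeX or reStructuredText apart from Markdown, which is consistent with the advice to set `inputFormat` explicitly for those sources.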
Output sample (real run)
{"source": "https://example.com/","ok": true,"inputFormat": "html","outputFormat": "gfm","output": "# Example Domain\n\nThis domain is for use in documentation examples without needing permission. Avoid use in operations.\n\n[Learn more](https://iana.org/domains/example)\n","outputCharacters": 192}
And a binary conversion (Markdown → Word):
{"source": "text #1","ok": true,"inputFormat": "markdown","outputFormat": "docx","downloadUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/converted-1.docx","outputBytes": 10580}
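Since text results arrive inline and binary results arrive as links, a consumer usually needs to route rows by which field is present. A small sketch of that routing, using abbreviated versions of the two sample rows above plus one made-up failure row; `split_outputs` is a hypothetical helper name.

```python
def split_outputs(rows):
    """Group dataset rows into inline text, binary download links, and failures."""
    text, binary, failed = {}, {}, []
    for row in rows:
        if not row.get("ok"):
            failed.append(row.get("source"))
        elif "downloadUrl" in row:
            # Binary formats (DOCX, PPTX, EPUB, ODT) carry a link, not content.
            binary[row["source"]] = row["downloadUrl"]
        else:
            # Text formats carry the converted document inline.
            text[row["source"]] = row.get("output", "")
    return text, binary, failed


rows = [
    {"source": "https://example.com/", "ok": True, "outputFormat": "gfm",
     "output": "# Example Domain\n"},
    {"source": "text #1", "ok": True, "outputFormat": "docx",
     "downloadUrl": "https://api.apify.com/v2/key-value-stores/<store-id>/records/converted-1.docx"},
    {"source": "https://example.invalid/", "ok": False},  # made-up failure row
]
text, binary, failed = split_outputs(rows)
```

Each link in `binary` can then be fetched with any HTTP client once the `<store-id>` placeholder is a real store ID.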
Use cases
- Feed web content to LLMs: convert pages to GitHub-flavored Markdown (`gfm`) with `--wrap=none` applied automatically, ready for prompts, embeddings, or RAG ingestion.
- Automated report delivery: your pipeline produces Markdown; this Actor turns it into DOCX or PPTX your stakeholders actually open. Chain it after any scraper or AI Actor via Apify integrations.
- Publishing workflows: convert a batch of Markdown chapters or HTML articles into EPUB e-books, or migrate docs between wikis (MediaWiki ⇄ Markdown ⇄ reStructuredText).
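For readers who know Pandoc's command line, the conversions above correspond roughly to local invocations like the one this sketch builds. The Actor's exact command is not documented here, so treat the helper and its `gfm`-only wrapping rule as assumptions; `--from`, `--to`, `--wrap`, and `--output` are standard Pandoc options.

```python
import shlex


def pandoc_command(input_format, output_format, output_path=None):
    """Build a local pandoc argument list comparable to one conversion."""
    args = ["pandoc", "--from", input_format, "--to", output_format]
    if output_format == "gfm":
        # Assumption: wrapping is disabled only for GitHub-flavored Markdown,
        # so paragraphs are not hard-wrapped at a fixed column.
        args.append("--wrap=none")
    if output_path:
        # Binary formats such as docx need an output file rather than stdout.
        args += ["--output", output_path]
    return args


print(shlex.join(pandoc_command("html", "gfm")))
print(shlex.join(pandoc_command("markdown", "docx", "report.docx")))
```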
FAQ
Which formats are supported?
Input: HTML, Markdown (Pandoc / GitHub-flavored / CommonMark), LaTeX, reStructuredText, Org, MediaWiki, Textile, DocBook, OPML, CSV, or auto-detect.
Output: Markdown (GFM / Pandoc / CommonMark), HTML, plain text, DOCX, PPTX, EPUB, ODT, RTF, reStructuredText, LaTeX, AsciiDoc, Org, MediaWiki, Textile, OPML.
How do I get the DOCX / EPUB / PPTX files?
Binary outputs are stored in the run's key-value store; each dataset row contains a direct downloadUrl. Text outputs come back inline in the dataset.
Does it extract the article from a web page?
No — it converts the page verbatim, exactly like running pandoc on the HTML. Navigation and boilerplate present in the HTML will be present in the output. For readability extraction, run a content-extraction Actor first and pipe its HTML here.
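Chaining an extraction step in front of the converter could look like the sketch below. It only builds the converter's input from upstream results. The `html` field on upstream items is hypothetical (use whatever field your extraction Actor emits), and passing `"html"` as `inputFormat` is assumed from the output samples.

```python
def build_converter_input(extracted_items, output_format="gfm"):
    """Turn upstream items carrying cleaned HTML into input for this Actor."""
    # Skip items with a missing or empty 'html' field (hypothetical field name).
    texts = [item["html"] for item in extracted_items if item.get("html")]
    return {
        "texts": texts,
        "inputFormat": "html",  # force HTML instead of relying on auto-detection
        "outputFormat": output_format,
    }


upstream = [
    {"url": "https://example.com/a", "html": "<article><h1>A</h1><p>Body</p></article>"},
    {"url": "https://example.com/b", "html": ""},
]
actor_input = build_converter_input(upstream)
```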
Is PDF output supported?
Not yet. PDF generation needs a LaTeX engine. Convert to DOCX or HTML and print/export to PDF, or ask for it in the Actor's Issues tab.
What does it cost?
A small fee per successfully converted document (pay-per-event). Failed items are never charged.