Pricing

from $15.00 / 1,000 file conversions

Convert To Markdown

Convert to Markdown converts documents, spreadsheets, images (OCR), audio (transcription), and web/data files into clean Markdown. It runs fully locally, requires no API keys, and is suited to LLM data preparation, documentation, and archiving.

Rating

0.0

(0)

Developer

Datavault

Actor stats

Last modified

5 months ago

Convert to Markdown - Versatile File Converter

The Convert to Markdown Actor is an all-in-one utility that transforms a wide variety of file formats into clean, structured Markdown. It is suited to preparing data for large language models (LLMs), documentation workflows, and archiving.

Features

Documents: Converts PDF (preserving layout and structure), Word (.docx), and PowerPoint (.pptx) into clean Markdown.
Spreadsheets: Transforms Excel (.xlsx) and CSV files into readable Markdown tables.
Images (OCR): Extracts text from images (JPG, PNG, WebP, etc.) using automated OCR.
Audio (Transcription): Transcribes speech from audio files (MP3, WAV, etc.) into text using local AI models.
Web & Data: Converts HTML, JSON, and XML into formatted Markdown blocks or tables.
Metadata Extraction: Automatically extracts technical metadata for images and audio files.
No External API Keys: Everything runs locally inside the container (including OCR and Transcription).

Supported Formats

Category	Formats
Documents	PDF, DOCX, PPTX, TXT
Data	JSON, XML, CSV, HTML
Spreadsheets	XLSX
Images	PNG, JPG, JPEG, WEBP, BMP, TIFF
Audio	MP3, WAV, OGG, M4A, FLAC
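
Before submitting a batch, it can help to drop URLs whose extension is not in the table above. The Python sketch below is one way to do that. The SUPPORTED mapping simply restates the table, and category_for is a hypothetical helper name, not part of the Actor. It inspects only the URL path, so files served without an extension would need a different check.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Restates the Supported Formats table; extensions are lowercase, without the dot.
SUPPORTED = {
    "Documents": {"pdf", "docx", "pptx", "txt"},
    "Data": {"json", "xml", "csv", "html"},
    "Spreadsheets": {"xlsx"},
    "Images": {"png", "jpg", "jpeg", "webp", "bmp", "tiff"},
    "Audio": {"mp3", "wav", "ogg", "m4a", "flac"},
}


def category_for(url: str) -> Optional[str]:
    """Return the table category for a URL's file extension, or None."""
    # Query strings and fragments are ignored because only the path is used.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED.items():
        if suffix in extensions:
            return category
    return None
```

For example, category_for("https://example.com/document.pdf") yields "Documents", while a URL ending in .zip yields None and can be skipped before any conversion is billed.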

Input Parameters

urls: A list of URLs pointing to the files you want to convert.
performOcr: (Default: true) Enable/disable OCR for images and scanned PDFs.
extractMetadata: (Default: true) Enable/disable technical metadata extraction.
proxyConfiguration: Use Apify Proxy if your target files are protected or geo-blocked.

Output

The Actor outputs a dataset where each item represents a converted file:

url: The original source URL.
title: The filename or detected title.
markdown: The full converted content in Markdown format.
mimeType: The detected MIME type of the file.
metadata: A JSON object containing technical metadata (e.g., image dimensions, audio duration, GPS data).
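
As an illustration of consuming these fields, the Python sketch below writes each dataset item's markdown to its own .md file. The item dictionaries are assumed to have the shape listed above. The names save_items and safe_name, along with the filename-sanitizing rule, are hypothetical choices and not something the Actor provides.

```python
import re
from pathlib import Path
from typing import Iterable, List


def safe_name(title: str) -> str:
    """Reduce a title to a filesystem-friendly stem (illustrative rule)."""
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", title).strip("._")
    return stem or "untitled"


def save_items(items: Iterable[dict], out_dir: Path) -> List[Path]:
    """Write each item's `markdown` field to <out_dir>/<index>_<title>.md."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, item in enumerate(items):
        # Prefix with the item index so identical titles do not collide.
        name = f"{index:03d}_{safe_name(item.get('title') or '')}.md"
        path = out_dir / name
        # A missing or null markdown field is written as an empty file.
        path.write_text(item.get("markdown") or "", encoding="utf-8")
        written.append(path)
    return written
```

The items can come from any source that yields dictionaries with these keys, such as an exported dataset loaded from JSON.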

Sample Input

{
    "urls": [
        "https://example.com/document.pdf",
        "https://example.com/photo.jpg"
    ],
    "performOcr": true,
    "extractMetadata": true
}
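
The same input can be assembled programmatically. The following Python sketch builds the JSON shown above from a list of URLs. Here build_input is a hypothetical helper, and its defaults mirror those documented under Input Parameters. Rejecting an empty URL list is an assumption made for illustration, and the optional proxyConfiguration value is passed through untouched because its exact shape is not described here.

```python
import json
from typing import Any, Dict, List, Optional


def build_input(
    urls: List[str],
    perform_ocr: bool = True,
    extract_metadata: bool = True,
    proxy_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a run input dict using the documented parameter names."""
    # Assumption: an empty list is treated as a caller error.
    if not urls:
        raise ValueError("urls must contain at least one URL")
    run_input: Dict[str, Any] = {
        "urls": list(urls),
        "performOcr": perform_ocr,
        "extractMetadata": extract_metadata,
    }
    # Only include the proxy settings when the caller supplies them.
    if proxy_configuration is not None:
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


if __name__ == "__main__":
    example = build_input(
        ["https://example.com/document.pdf", "https://example.com/photo.jpg"]
    )
    print(json.dumps(example, indent=4))
```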

How it works

Download: The Actor downloads the file from the provided URL.
Identification: It detects the file type based on headers and extensions.
Conversion:
- PDFs use specialized tools to preserve layout and then convert to Markdown.
- Word/PowerPoint are transformed using robust document processors.
- Images use advanced OCR for text and technical metadata extraction.
- Audio uses local AI models for speech-to-text transcription.
- Web/Data use specialized HTML and data parsers to build tables and lists.
Formatting: All outputs are normalized into valid Markdown.
Storage: Results are saved to the Apify Dataset and a conversion event is billed.
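
To make the Identification step concrete, here is a minimal Python sketch of detecting a file type from a response header, with a fallback to the URL's extension. It is an assumption about how such a step could work, not the Actor's actual code. The name detect_mime_type is hypothetical, and treating application/octet-stream as uninformative is an illustrative choice.

```python
import mimetypes
from typing import Mapping, Optional
from urllib.parse import urlparse

# Header values that say nothing useful about the real file type (assumed list).
GENERIC_TYPES = {"", "application/octet-stream", "binary/octet-stream"}


def detect_mime_type(url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Prefer the Content-Type header; fall back to the URL's extension."""
    content_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            content_type = value.split(";", 1)[0].strip().lower()
            break
    if content_type not in GENERIC_TYPES:
        return content_type
    # No informative header: guess from the path's extension instead.
    guessed, _encoding = mimetypes.guess_type(urlparse(url).path)
    return guessed
```

A header-first order is used here because servers often deliver files from URLs with no extension, while the extension fallback covers servers that label every download with a generic type.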

Performance Note

Transcription/OCR: Processing large audio files or complex images can be CPU-intensive. The Actor uses optimized models for a balance between speed and accuracy.
Memory: For very large Excel files or PDFs, ensure the Actor has at least 2 GB of memory allocated.

Feedback & Improvements

If you encounter a file format that isn't supported or have ideas for improvements, please leave us a message in the Issues tab!

Doc To Markdown MCP Server

abotapi/doc-to-markdown-mcp

An MCP server that converts documents to clean Markdown. Convert PDFs, Word docs, Excel spreadsheets, PowerPoints, HTML, images, and more to AI-friendly Markdown format.

AbotAPI

Web Page to Markdown Extractor — URL to Markdown API

fetch_cat/web-page-to-markdown-extractor

Convert public URLs into clean Markdown, text, metadata, links, images, and optional HTML for AI agents, RAG, support, and automation workflows.

Hanna Nosova

Markdown API

vivid_astronaut/markdown

Fabio Suizu

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds — perfect for AI training data, RAG pipelines, and content archiving.

SmartApi

5.0

File to Markdown

shahidirfan/file-to-markdown

Transform files into clean, readable Markdown instantly. Convert PDFs, documents, images, and more to structured Markdown format. Perfect for automating documentation workflows, content migration, and building knowledge bases. Ideal for developers, writers, and content teams.

Shahid Irfan

5.0

Website to Markdown Converter

lofomachines/website-to-markdown-converter

Best faster and cheaper way to convert any web page into clean, structured, LLM-ready Markdown.

Lofomachines

AI Markdown Maker

onescales/bulk-ai-markdown-maker

Convert any web page into clean, AI ready markdown format in seconds. This markdown generator is perfect for content for AI models, creating documentation, or archiving web content. It intelligently parses web content, removing ads, navigation, and other clutter. Generate Markdown Today!

One Scales

137

5.0

Website To Markdown

swarmgarden/website-to-markdown

Convert any webpage to clean, readable Markdown format. Perfect for content extraction and readability.

Swarm Garden

Markdown Anything — URL to Markdown

s-r/markdown-anything

Convert any URL to clean markdown using a 3-provider fallback chain. Batch input, high concurrency.

Markdown Maker: HTML to Markdown 📝

shahidirfan/Markdown-Maker

Instantly convert complex HTML into clean, structured Markdown. This lightweight actor is optimized to render web content into a format that is easily readable for AI LLMs, reducing token usage and improving context. Perfect for RAG pipelines and preparing data for training.

Shahid Irfan