Convert To Markdown
Pricing
from $15.00 / 1,000 file conversions
Convert to Markdown converts documents, spreadsheets, images (OCR), audio (transcription), and web/data files into clean Markdown. It runs fully locally, requires no API keys, and is suited to LLM pipelines, documentation, and archiving.
Developer: Datavault
Convert to Markdown - Versatile File Converter
The Convert to Markdown Actor is a high-performance, all-in-one utility designed to transform a wide variety of file formats into clean, structured Markdown. It is ideal for preparing data for LLMs (Large Language Models), documentation workflows, or archiving.
Features
- Documents: Converts PDF (preserving layout and structure), Word (.docx), and PowerPoint (.pptx) into clean Markdown.
- Spreadsheets: Transforms Excel (.xlsx) and CSV files into readable Markdown tables.
- Images (OCR): Extracts text from images (JPG, PNG, WebP, etc.) using automated OCR.
- Audio (Transcription): Transcribes speech from audio files (MP3, WAV, etc.) into text using local AI models.
- Web & Data: Converts HTML, JSON, and XML into formatted Markdown blocks or tables.
- Metadata Extraction: Automatically extracts technical metadata for images and audio files.
- No External API Keys: Everything runs locally inside the container (including OCR and Transcription).
Supported Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, PPTX, TXT |
| Data | JSON, XML, CSV, HTML |
| Spreadsheets | XLSX |
| Images | PNG, JPG, JPEG, WEBP, BMP, TIFF |
| Audio | MP3, WAV, OGG, M4A, FLAC |
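Because each conversion is billed, it may be worth filtering URLs on the client before submitting them. The following is a minimal sketch, assuming the extension in the URL path reflects the real file type (the Actor also inspects headers, so this is only a heuristic). `SUPPORTED` and `category_for` are hypothetical names; the mapping is copied from the table above.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Extension -> category, copied from the Supported Formats table.
SUPPORTED = {
    "pdf": "Documents", "docx": "Documents", "pptx": "Documents", "txt": "Documents",
    "json": "Data", "xml": "Data", "csv": "Data", "html": "Data",
    "xlsx": "Spreadsheets",
    "png": "Images", "jpg": "Images", "jpeg": "Images",
    "webp": "Images", "bmp": "Images", "tiff": "Images",
    "mp3": "Audio", "wav": "Audio", "ogg": "Audio", "m4a": "Audio", "flac": "Audio",
}


def category_for(url: str) -> Optional[str]:
    """Return the table category for a URL's extension, or None if unlisted."""
    # Parse the path only, so query strings such as "?dl=1" are ignored.
    suffix = PurePosixPath(urlparse(url).path).suffix
    return SUPPORTED.get(suffix.lstrip(".").lower())
```

URLs for which `category_for` returns `None` are not necessarily unsupported (a download link may carry no extension at all), so treat the result as a hint rather than a verdict.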
Input Parameters
- `urls`: A list of URLs pointing to the files you want to convert.
- `performOcr`: (Default: `true`) Enable or disable OCR for images and scanned PDFs.
- `extractMetadata`: (Default: `true`) Enable or disable technical metadata extraction.
- `proxyConfiguration`: Use Apify Proxy if your target files are protected or geo-blocked.
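The parameters above can be assembled programmatically. This is a minimal sketch: `build_input` is a hypothetical helper, and the `{"useApifyProxy": True}` shape for `proxyConfiguration` is an assumption based on the usual Apify proxy input convention, not something this page documents.

```python
def build_input(urls, perform_ocr=True, extract_metadata=True, use_apify_proxy=False):
    """Build an input dict using the parameter names listed above."""
    urls = list(urls)
    if not urls:
        raise ValueError("at least one URL is required")
    actor_input = {
        "urls": urls,
        "performOcr": perform_ocr,            # documented default: true
        "extractMetadata": extract_metadata,  # documented default: true
    }
    if use_apify_proxy:
        # Assumed shape; check the Actor's input schema before relying on it.
        actor_input["proxyConfiguration"] = {"useApifyProxy": True}
    return actor_input
```

Leaving `proxyConfiguration` out unless it is needed keeps the input identical to the sample shown under Sample Input.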
Output
The Actor outputs a dataset where each item represents a converted file:
- `url`: The original source URL.
- `title`: The filename or detected title.
- `markdown`: The full converted content in Markdown format.
- `mimeType`: The detected MIME type of the file.
- `metadata`: A JSON object containing technical metadata (e.g., image dimensions, audio duration, GPS data).
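Once dataset items are downloaded, a common next step is to write each `markdown` field to its own file. The sketch below only plans filenames and contents, assuming items are dicts with the fields listed above; `slugify`, `plan_files`, and the `NNN-title.md` naming scheme are illustrative choices, not part of the Actor.

```python
import re


def slugify(title, fallback="untitled"):
    """Reduce a title to characters that are safe in a filename."""
    slug = re.sub(r"[^A-Za-z0-9._-]+", "-", title or "").strip("-.")
    return slug or fallback


def plan_files(items):
    """Return (filename, markdown) pairs, one per dataset item."""
    planned = []
    for index, item in enumerate(items):
        # The numeric prefix keeps names unique when titles repeat.
        name = f"{index:03d}-{slugify(item.get('title'))}.md"
        planned.append((name, item.get("markdown") or ""))
    return planned
```

Each pair can then be written with `pathlib.Path(name).write_text(text, encoding="utf-8")` inside a directory of your choice.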
Sample Input
{"urls": ["https://example.com/document.pdf","https://example.com/photo.jpg"],"performOcr": true,"extractMetadata": true}
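To send this input from a script, one option is the Apify REST API's synchronous run endpoint. The sketch below only builds the request with the standard library and does not send it. The Actor ID `someuser/convert-to-markdown` in the comment is a placeholder (the real ID is not given on this page), and `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Build a POST request that runs an Actor and returns its dataset items."""
    # IDs written as "username/actor-name" use "~" instead of "/" in API paths.
    safe_id = quote(actor_id.replace("/", "~"), safe="~")
    url = f"{API_BASE}/acts/{safe_id}/run-sync-get-dataset-items"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )

# Sending it (not executed here; needs a real Actor ID and API token):
#   req = build_run_request("someuser/convert-to-markdown", token, actor_input)
#   with urllib.request.urlopen(req) as resp:
#       items = json.load(resp)
```

The synchronous endpoint is convenient for small batches, but it is subject to a server-side time limit, so long audio transcriptions are probably better started as an asynchronous run and polled.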
How it works
- Download: The Actor downloads the file from the provided URL.
- Identification: It detects the file type based on headers and extensions.
- Conversion:
  - PDFs use specialized tools to preserve layout and then convert to Markdown.
  - Word/PowerPoint are transformed using robust document processors.
  - Images use advanced OCR for text and technical metadata extraction.
  - Audio uses local AI models for speech-to-text transcription.
  - Web/Data use specialized HTML and data parsers to build tables and lists.
- Formatting: All outputs are normalized into valid Markdown.
- Storage: Results are saved to the Apify Dataset and a `conversion` event is billed.
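The identification step can be illustrated with a small sketch. This is not the Actor's actual code: it assumes the `Content-Type` header is tried first and the URL extension is used only when the header is missing or generic. The kind labels (`"pdf"`, `"document"`, `"data"`, and so on) and the function names are hypothetical.

```python
import mimetypes
from urllib.parse import urlparse

# Exact MIME types -> hypothetical converter kinds.
MIME_KINDS = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation": "document",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "spreadsheet",
    "text/plain": "document",
    "text/csv": "data",
    "text/html": "data",
    "application/json": "data",
    "application/xml": "data",
    "text/xml": "data",
}


def kind_from_mime(mime):
    """Map a bare MIME type to a converter kind, or None if unrecognized."""
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("audio/"):
        return "audio"
    return MIME_KINDS.get(mime)


def detect_kind(content_type, url):
    """Prefer the Content-Type header; fall back to the URL's extension."""
    # Drop parameters such as "; charset=utf-8" before matching.
    header_mime = (content_type or "").split(";", 1)[0].strip().lower()
    kind = kind_from_mime(header_mime)
    if kind is None:
        # Header missing or generic (e.g. application/octet-stream).
        guessed, _encoding = mimetypes.guess_type(urlparse(url).path)
        kind = kind_from_mime(guessed or "")
    return kind or "unknown"
```

Trying the header first handles download links without an extension, while the fallback covers servers that label every file `application/octet-stream`.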
Performance Note
- Transcription/OCR: Processing large audio files or complex images can be CPU-intensive. The Actor uses optimized models for a balance between speed and accuracy.
- Memory: For very large Excel files or PDFs, ensure the Actor has at least 2GB of memory allocated.
Feedback & Improvements
If you encounter a file format that isn't supported or have ideas for improvements, please leave us a message in the Issues tab!