Any-to-Markdown

Pricing

from $150.00 / 1,000 pages

Any-to-Markdown converts images and scanned PDFs into structured Markdown using AI-powered OCR.

Developer

AbotAPI

Maintained by Community

What does Any-to-Markdown do?

Any-to-Markdown converts images and scanned PDFs into structured Markdown using AI-powered OCR. Unlike basic OCR tools, it understands complex document layouts and recognizes mathematical formulas (LaTeX), tables, figures, and text in their correct reading order.

It supports 80+ languages and handles academic papers, technical documents, invoices, and any document where layout matters.

Key features

  • Formula recognition — Detects mathematical formulas and converts them to LaTeX
  • Table extraction — Recognizes table structures and outputs them as Markdown tables
  • Layout analysis — Understands document structure (titles, paragraphs, figures, captions)
  • Multi-language OCR — Supports 80+ languages including Chinese, Japanese, Korean, Arabic, and more
  • PDF support — Process multi-page PDF documents with page selection
  • Multiple input methods — Upload files directly or provide URLs

Use cases

  • Academic paper digitization — Convert scanned research papers with equations into editable Markdown/LaTeX
  • Technical document processing — Parse engineering specs, datasheets, and manuals preserving tables and formulas
  • Invoice and receipt parsing — Extract structured data from scanned financial documents
  • Book digitization — Convert scanned book pages into searchable, editable text
  • Data pipeline integration — Use the Apify API to automate document parsing in your workflows
  • RAG / LLM preparation — Convert documents to Markdown for use as context in AI applications

How to use

  1. Upload files or provide URLs to images (PNG, JPEG, BMP, TIFF, WebP, GIF) or PDF documents
  2. Select a recognition mode based on your content type
  3. Choose your languages and configure options
  4. Run the Actor and get structured Markdown output
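The same steps can be automated through the Apify API. The following is a minimal sketch that only builds the HTTP request for starting a run, using the standard library. The Actor ID `abotapi~any-to-markdown`, the token placeholder, and the input keys `urls`, `recognitionMode`, and `languages` are assumptions for illustration; check the Input Schema tab for the real field names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    url = f"{API_BASE}/acts/{actor_id}/runs"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical Actor ID and input keys, used only for illustration.
request = build_run_request(
    "abotapi~any-to-markdown",
    "MY_APIFY_TOKEN",
    {
        "urls": ["https://example.com/document.png"],
        "recognitionMode": "page",
        "languages": "en,ch_sim",
    },
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen(request)` with a valid token would start the run; the sketch stops short of that so it can be inspected without network access.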

Recognition modes

| Mode | Best for | Description |
|------|----------|-------------|
| Page (default) | General documents | Full layout analysis: detects titles, text, formulas, tables, and figures |
| Text + Formula | Math-heavy content | Mixed text and formula recognition without layout analysis |
| Formula Only | Equations | Pure mathematical formula to LaTeX conversion |
| Text Only | Simple documents | Standard OCR without formula detection |
| PDF | Multi-page docs | Processes PDF files with optional page selection |
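One way to read the table above is as a small decision rule. The sketch below encodes that reading; the function name, its flags, and the order of the checks are assumptions, and it returns the display names from the table rather than the Actor's internal mode identifiers, which are not documented here.

```python
def suggest_mode(is_pdf: bool, has_layout: bool, has_formulas: bool, has_text: bool = True) -> str:
    """Suggest a recognition mode name from a rough description of the content."""
    if is_pdf:
        return "PDF"  # multi-page documents with optional page selection
    if has_layout:
        return "Page"  # titles, tables, and figures need full layout analysis
    if has_formulas and has_text:
        return "Text + Formula"  # mixed content, no layout analysis
    if has_formulas:
        return "Formula Only"  # a cropped equation image
    return "Text Only"  # plain OCR without formula detection


print(suggest_mode(is_pdf=False, has_layout=False, has_formulas=True))
```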

Supported file formats

| Format | Extensions | Notes |
|--------|------------|-------|
| PDF | .pdf | Multi-page documents, scanned or digital. Use pageNumbers to select specific pages. |
| PNG | .png | Recommended for screenshots and scanned documents |
| JPEG | .jpg, .jpeg | Photos, scans, camera captures |
| BMP | .bmp | Uncompressed bitmap images |
| TIFF | .tiff, .tif | High-quality scans, multi-layer images |
| WebP | .webp | Modern web image format |
| GIF | .gif | Static GIF images (first frame only) |
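Before submitting a batch, it can help to filter out files the Actor is unlikely to accept. The sketch below checks a path or URL against the extensions listed in the table above. It is a client-side convenience only; how the Actor itself detects formats is not stated here, and the helper name is hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions taken from the supported formats table.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".png": "PNG",
    ".jpg": "JPEG",
    ".jpeg": "JPEG",
    ".bmp": "BMP",
    ".tiff": "TIFF",
    ".tif": "TIFF",
    ".webp": "WebP",
    ".gif": "GIF",
}


def detect_format(path_or_url: str):
    """Return the format name for a path or URL, or None if the extension is not listed."""
    path = urlparse(path_or_url).path  # drops any query string or fragment
    suffix = PurePosixPath(path).suffix.lower()
    return SUPPORTED_EXTENSIONS.get(suffix)


candidates = ["https://example.com/document.png?v=2", "scan.TIF", "notes.docx"]
accepted = [c for c in candidates if detect_format(c) is not None]
print(accepted)
```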

Input

The Actor accepts uploaded files or URLs pointing to images or PDFs. All other parameters are optional with sensible defaults.

The most important input fields are:

  • Upload Files — Drag and drop image or PDF files directly
  • URLs — List of URLs pointing to documents to parse
  • Recognition Mode — Select the type of content analysis (default: page)
  • Languages — Comma-separated language codes, e.g. en,ch_sim (default: en,ch_sim)

See the Input Schema tab for the full list of configuration options including table recognition, formula detection, image resize settings, and output format.
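As a rough illustration of how these fields might fit together, the sketch below assembles an input object with the documented defaults (`page` mode, `en,ch_sim` languages). The keys `urls`, `recognitionMode`, and `languages` are assumed names; `pageNumbers` and `outputFormat` are mentioned on this page, but the value types used here are also assumptions. The Input Schema tab is the authoritative reference.

```python
def build_input(urls, recognition_mode="page", languages=("en", "ch_sim"),
                page_numbers=None, output_format=None) -> dict:
    """Assemble a run input dict; optional fields are omitted so Actor defaults apply."""
    if not urls:
        raise ValueError("at least one URL is required")
    run_input = {
        "urls": list(urls),                   # assumed key name
        "recognitionMode": recognition_mode,  # assumed key name
        "languages": ",".join(languages),     # comma-separated codes, as documented
    }
    if page_numbers is not None:
        run_input["pageNumbers"] = list(page_numbers)  # assumed to be a list of ints
    if output_format is not None:
        run_input["outputFormat"] = output_format
    return run_input


default_input = build_input(["https://example.com/document.png"])
pdf_input = build_input(
    ["https://example.com/paper.pdf"],
    languages=("en", "ja"),
    page_numbers=[1, 2, 3],
    output_format="both",
)
print(default_input)
print(pdf_input)
```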

Output

The Actor outputs results to the Dataset (JSON) and/or Key-Value Store (Markdown files), depending on the outputFormat setting.

Dataset output example

{
  "source": "https://example.com/document.png",
  "success": true,
  "recognitionMode": "page",
  "markdown": "## Introduction\n\nThe equation $E = mc^2$ describes...\n\n| Column A | Column B |\n|----------|----------|\n| Value 1 | Value 2 |",
  "format": "png",
  "sizeBytes": 245000,
  "processingTimeMs": 35000,
  "elements": [
    { "type": "TITLE", "text_preview": "Introduction" },
    { "type": "TEXT", "text_preview": "The equation..." },
    { "type": "FORMULA", "text_preview": "E = mc^2" },
    { "type": "TABLE", "text_preview": "| Column A |..." }
  ],
  "metadata": {
    "parsed_at": "2026-02-19T03:08:04.944383",
    "processing_time_ms": 35000,
    "file_size_bytes": 245000,
    "format": "PNG"
  }
}
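A downstream script will usually want a compact summary of each dataset item. The sketch below works on a dict shaped like the example above and counts detected elements by type; the `summarize_item` helper and its output keys are hypothetical, and the behavior for failed items is an assumption since no failure example is shown.

```python
from collections import Counter


def summarize_item(item: dict) -> dict:
    """Reduce one dataset item to its source, status, element counts, and timing."""
    if not item.get("success"):
        return {"source": item.get("source"), "ok": False}
    counts = Counter(element["type"] for element in item.get("elements", []))
    return {
        "source": item.get("source"),
        "ok": True,
        "element_counts": dict(counts),
        "markdown_chars": len(item.get("markdown", "")),
        "seconds": item.get("processingTimeMs", 0) / 1000,
    }


sample = {
    "source": "https://example.com/document.png",
    "success": True,
    "markdown": "## Introduction",
    "processingTimeMs": 35000,
    "elements": [
        {"type": "TITLE", "text_preview": "Introduction"},
        {"type": "TEXT", "text_preview": "The equation..."},
        {"type": "FORMULA", "text_preview": "E = mc^2"},
        {"type": "TABLE", "text_preview": "| Column A |..."},
    ],
}
summary = summarize_item(sample)
print(summary)
```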

Key-Value Store output

When outputFormat is set to keyValueStore or both, each successfully parsed document is saved as a .md file in the Key-Value Store, ready to download or use in downstream workflows.
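Saved records can be downloaded through the Apify API's key-value store record endpoint. The sketch below only builds the download URL. The store ID and the record key `document.md` are placeholders, since the key naming scheme the Actor uses is not documented here.

```python
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def record_url(store_id: str, record_key: str) -> str:
    """Build the URL of a single key-value store record."""
    # Percent-encode both parts so unusual characters cannot break the path.
    return f"{API_BASE}/key-value-stores/{quote(store_id, safe='')}/records/{quote(record_key, safe='')}"


# Placeholder store ID and record key, used only for illustration.
url = record_url("MY_STORE_ID", "document.md")
print(url)
```

Private stores also require an API token, for example in an `Authorization: Bearer` header on the request.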

Tips to optimize costs

  • Use Text Only mode for simple documents without formulas — it's faster
  • Set Disable Formula Recognition to true if you don't need LaTeX output
  • Use Page Numbers to process only specific pages of large PDFs
  • Lower the Resized Shape value (e.g., 512) for faster processing at the cost of some accuracy
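To see what page selection can save, here is a rough estimate based on the listed starting price of $150.00 per 1,000 pages. The price is quoted as "from", so actual charges may be higher; the function name and the flat per-page assumption are illustrative only.

```python
def estimate_cost(pages: int, price_per_1000: float = 150.0) -> float:
    """Estimate the cost in USD for a number of pages at a flat per-page rate."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return round(pages * price_per_1000 / 1000, 2)


full_pdf = estimate_cost(40)      # processing every page of a 40-page PDF
selected_only = estimate_cost(5)  # processing only 5 selected pages
print(full_pdf, selected_only, full_pdf - selected_only)
```

At this rate each page costs $0.15, so restricting a 40-page PDF to 5 pages brings the estimate from $6.00 down to $0.75.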

Supported languages

The Actor supports 80+ languages via its built-in text recognition engine. When using only English (en) or Simplified Chinese (ch_sim), a dedicated fast OCR engine is used automatically. For all other languages, the full multilingual engine is activated.

Provide multiple languages as comma-separated codes, e.g. en,ch_sim,ja
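The engine selection described above can be approximated on the client side to predict which engine a run will use. The sketch below assumes the fast engine applies when every requested code is `en` or `ch_sim`; that subset reading and the helper names are assumptions, not the Actor's documented logic.

```python
FAST_ENGINE_LANGUAGES = {"en", "ch_sim"}


def parse_languages(value: str) -> list:
    """Split a comma-separated language string into clean codes."""
    return [code.strip() for code in value.split(",") if code.strip()]


def uses_fast_engine(value: str) -> bool:
    """Return True when every requested language is covered by the fast engine."""
    codes = set(parse_languages(value))
    return bool(codes) and codes <= FAST_ENGINE_LANGUAGES


for languages in ("en,ch_sim", "en", "en,ch_sim,ja"):
    print(languages, uses_fast_engine(languages))
```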

Common language codes

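| Code | Language | Code | Language |
|------|----------|------|----------|
| en | English | ch_sim | Chinese (Simplified) |
| ch_tra | Chinese (Traditional) | ja | Japanese |
| ko | Korean | vi | Vietnamese |
| fr | French | de | German |
| es | Spanish | pt | Portuguese |
| it | Italian | nl | Dutch |
| ru | Russian | uk | Ukrainian |
| ar | Arabic | fa | Persian (Farsi) |
| hi | Hindi | ta | Tamil |
| th | Thai | id | Indonesian |
| ms | Malay | tl | Filipino |
| tr | Turkish | pl | Polish |
| cs | Czech | ro | Romanian |
| hu | Hungarian | el | Greek |
| sv | Swedish | da | Danish |
| no | Norwegian | fi | Finnish |
| bn | Bengali | ur | Urdu |
| ne | Nepali | mn | Mongolian |
| my | Myanmar | ka | Georgian |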