Pricing

from $1.00 / 1,000 results

Webpage to Markdown

Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.

Developer

Epic Scrapers

Last modified: 2 months ago

Convert Any Web Page to Clean Markdown, HTML, or JSON: Content Extraction Tool for AI, Web Scraping, and Automation

Submit a URL and get the page's core content back as clean Markdown or HTML in seconds. Automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type — articles, documentation, landing pages, and more. Returns rich metadata including title, description, author, publish date, language, word count, and featured image with every result.

Features

One-shot extraction — Submit any URL and receive clean, structured content in seconds. No configuration required.
Markdown and HTML output — Get content in the format that fits your pipeline. Markdown for LLM and AI workflows, HTML for full-fidelity rendering.
Rich page metadata — Title, author, description, publication date, language, word count, domain, site name, and featured image extracted automatically from every page.
Schema.org structured data — Extracts JSON-LD and microdata where available.
Language-aware extraction — Set a preferred BCP 47 language to improve content selection on multilingual pages.
Manual content targeting — Override auto-detection with a custom CSS selector when you need content from a specific page region.
Debug mode — Inspect which elements were removed and why, to fine-tune extraction on challenging pages.
SPA fallback — Automatically handles client-side rendered single-page applications via third-party APIs.

Output example

{
  "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
  "title": "How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics",
  "description": "The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.",
  "author": "Tim Ferriss",
  "published": "2026-04-24T18:46:08+00:00",
  "domain": "tim.blog",
  "site": "The Blog of Author Tim Ferriss",
  "image": "https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg",
  "favicon": "https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1",
  "language": "en-US",
  "wordCount": 7961,
  "parseTime": 167,
  "outputFormat": "markdown",
  "content": "..."
}

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `urls` | `string[]` | (none) | Required. List of URLs to process |
| `outputFormat` | `enum` | `markdown` | Output format: `markdown`, `html`, or `json` (full metadata) |
| `debug` | `boolean` | `false` | Enable debug logging and debug info in results |
| `language` | `string` | (none) | Preferred BCP 47 language tag (e.g. `en`, `fr`, `ja`) |
| `contentSelector` | `string` | (none) | CSS selector to override auto-detection of main content |
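
The input fields above can be assembled and checked before a run is started. The following is a minimal sketch, not the Actor's own client code: `build_run_input` and `run_actor` are hypothetical helper names, and the Actor ID is not stated on this page, so it is passed in as a parameter. The `run_actor` helper assumes the official `apify-client` Python package is installed and that you have an Apify API token.

```python
from typing import Any, Optional

# Values listed for `outputFormat` in the input table.
ALLOWED_FORMATS = {"markdown", "html", "json"}


def build_run_input(
    urls: list[str],
    output_format: str = "markdown",
    debug: bool = False,
    language: Optional[str] = None,
    content_selector: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the input table."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    run_input: dict[str, Any] = {
        "urls": list(urls),
        "outputFormat": output_format,
        "debug": debug,
    }
    # Optional fields are omitted when unset so the Actor's defaults apply.
    if language is not None:
        run_input["language"] = language
    if content_selector is not None:
        run_input["contentSelector"] = content_selector
    return run_input


def run_actor(token: str, actor_id: str, run_input: dict[str, Any]) -> list[dict[str, Any]]:
    """Start a run and return its dataset items (assumes `pip install apify-client`)."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `build_run_input(["https://apify.com"], language="en")` yields an object with `urls`, `outputFormat`, `debug`, and `language`, which could then be passed to `run_actor` together with your token and the Actor's ID from its store page.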

Output

Each URL produces a dataset entry with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| `url` | `string` | Source URL |
| `title` | `string` | Page title |
| `content` | `string` | Extracted content (Markdown or HTML depending on `outputFormat`) |
| `description` | `string` | Page description / summary |
| `author` | `string` | Author of the article |
| `published` | `string` | Publication date |
| `domain` | `string` | Domain name |
| `site` | `string` | Website name |
| `image` | `string` | Main image URL |
| `favicon` | `string` | Favicon URL |
| `language` | `string` | Detected language (BCP 47) |
| `wordCount` | `number` | Word count |
| `parseTime` | `number` | Parse time in milliseconds |
| `outputFormat` | `string` | The format used (`markdown`, `html`, or `json`) |

In JSON mode, additional fields such as `metaTags`, `schemaOrgData`, and debug info are included. If an error occurs, the entry contains `error` instead of `content`.
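
Because a failed entry carries `error` in place of `content`, downstream code may want to separate the two cases before using the results. The sketch below is one way to do that; `split_results` is a hypothetical helper, and the exact shape of a failed entry beyond the presence of `error` is an assumption.

```python
from typing import Any


def split_results(
    items: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate successful entries from failed ones.

    Assumes a failed entry has an `error` field and no `content` field.
    Entries with neither field are treated as failed, to be conservative.
    """
    succeeded: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in items:
        if "error" in item or "content" not in item:
            failed.append(item)
        else:
            succeeded.append(item)
    return succeeded, failed
```

Failed entries can then be logged or retried by their `url`, while the successful ones go on to the rest of the pipeline.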

Sample output

Running against https://apify.com produces a dataset entry with the full page content converted to Markdown and rich metadata extracted automatically:

{
  "url": "https://apify.com",
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI.",
  "domain": "apify.com",
  "site": "Apify",
  "language": "en",
  "wordCount": 771,
  "parseTime": 128,
  "outputFormat": "markdown",
  "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."
}

The `content` field contains the full page rendered as clean Markdown, with images, links, and headings preserved. Switch to `outputFormat: "html"` or `"json"` for different views of the same data.
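
For LLM workflows it can help to keep a little of the metadata next to the Markdown body. As an illustration only, the hypothetical `to_prompt_document` helper below prefixes `content` with a short header built from fields in the output table. The header layout is an assumption made for this sketch, not something the Actor produces.

```python
from typing import Any


def to_prompt_document(entry: dict[str, Any]) -> str:
    """Prefix the extracted Markdown with a small header built from the metadata."""
    # Fall back to the URL when the entry has no title.
    header_lines = [f"# {entry.get('title') or entry['url']}", f"Source: {entry['url']}"]
    # Optional metadata is added only when the entry actually has a value for it.
    for label, key in (("Author", "author"), ("Published", "published"), ("Language", "language")):
        value = entry.get(key)
        if value:
            header_lines.append(f"{label}: {value}")
    return "\n".join(header_lines) + "\n\n" + entry.get("content", "")
```

Applied to the sample entry above, this would give a document that opens with the page title and source URL, followed by the Markdown content.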
