Webpage to Markdown

Pricing

from $1.00 / 1,000 results

Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.

Rating

0.0

(0)

Developer

Epic Scrapers

Maintained by Community

Actor stats

  • Bookmarked: 2
  • Total users: 6
  • Monthly active users: 4
  • Last modified: 11 days ago

Convert Any Web Page to Clean Markdown/HTML/JSON: Content Extraction Tool for AI, Web Scraping, and Automation

Submit a URL and get the page's core content back in seconds as clean Markdown or HTML, or as JSON with full metadata. The Actor automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type, including articles, documentation, and landing pages. Each result also carries page metadata where available: title, description, author, publish date, language, word count, and featured image.

Features

  • One-shot extraction — Submit any URL and receive clean, structured content in seconds. No configuration required.
  • Markdown and HTML output — Get content in the format that fits your pipeline. Markdown for LLM and AI workflows, HTML for full-fidelity rendering.
  • Rich page metadata — Title, author, description, publication date, language, word count, domain, site name, and featured image extracted automatically from every page.
  • Schema.org structured data — Extracts JSON-LD and microdata where available.
  • Language-aware extraction — Set a preferred BCP 47 language to improve content selection on multilingual pages.
  • Manual content targeting — Override auto-detection with a custom CSS selector when you need content from a specific page region.
  • Debug mode — Inspect which elements were removed and why, to fine-tune extraction on challenging pages.
  • SPA fallback — Automatically handles client-side rendered single-page applications via third-party APIs.

Output example

{
  "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
  "title": "How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics",
  "description": "The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.",
  "author": "Tim Ferriss",
  "published": "2026-04-24T18:46:08+00:00",
  "domain": "tim.blog",
  "site": "The Blog of Author Tim Ferriss",
  "image": "https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg",
  "favicon": "https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1",
  "language": "en-US",
  "wordCount": 7961,
  "parseTime": 167,
  "outputFormat": "markdown",
  "content": "..."
}
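
The metadata fields in the example above can be consumed with the standard library alone. The following is a minimal sketch, not part of the Actor: the `summarize` helper is a hypothetical name, and it assumes `published` is an ISO 8601 timestamp as in the example. Other pages may yield a different date format or no date at all, so the parse falls back to `None`.

```python
from datetime import datetime
from typing import Any, Optional


def summarize(entry: dict[str, Any]) -> dict[str, Any]:
    """Derive a parsed date and a rough extraction throughput from one entry."""
    published: Optional[datetime] = None
    raw = entry.get("published")
    if raw:
        try:
            # Assumption: ISO 8601, as in the example entry.
            published = datetime.fromisoformat(raw)
        except ValueError:
            published = None

    words = entry.get("wordCount") or 0
    parse_ms = entry.get("parseTime") or 0  # documented as milliseconds
    words_per_second = words / (parse_ms / 1000) if parse_ms else None

    return {
        "url": entry.get("url"),
        "published": published,
        "words_per_second": words_per_second,
    }


# Values copied from the output example above.
entry = {
    "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
    "published": "2026-04-24T18:46:08+00:00",
    "wordCount": 7961,
    "parseTime": 167,
}
summary = summarize(entry)
print(summary["published"], summary["words_per_second"])
```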

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| urls | string[] | | Required. List of URLs to process |
| outputFormat | enum | markdown | Output format: markdown, html, or json (full metadata) |
| debug | boolean | false | Enable debug logging and debug info in results |
| language | string | | Preferred BCP 47 language tag (e.g. en, fr, ja) |
| contentSelector | string | | CSS selector to override auto-detection of main content |
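
As an illustration of the input fields above, here is a minimal sketch that assembles an input object in Python. The `build_input` helper and its validation rules are assumptions made for this example, not part of the Actor. Only the field names, the allowed `outputFormat` values, and the defaults come from the table.

```python
from typing import Any, Optional

ALLOWED_FORMATS = {"markdown", "html", "json"}


def build_input(
    urls: list[str],
    output_format: str = "markdown",
    debug: bool = False,
    language: Optional[str] = None,
    content_selector: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the input table."""
    if not urls:
        raise ValueError("urls is required and must contain at least one URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")

    actor_input: dict[str, Any] = {
        "urls": list(urls),
        "outputFormat": output_format,
        "debug": debug,
    }
    # Optional fields are omitted when unset so the Actor's own
    # auto-detection applies.
    if language is not None:
        actor_input["language"] = language
    if content_selector is not None:
        actor_input["contentSelector"] = content_selector
    return actor_input


# The "main" selector is an illustrative value, not a recommendation.
example_input = build_input(
    ["https://apify.com"],
    language="en",
    content_selector="main",
)
print(example_input)
```

The resulting dictionary can be pasted into the Actor's input form as JSON or passed to whichever client is used to start the run.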

Output

Each URL produces a dataset entry with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| url | string | Source URL |
| title | string | Page title |
| content | string | Extracted content (Markdown or HTML depending on outputFormat) |
| description | string | Page description / summary |
| author | string | Author of the article |
| published | string | Publication date |
| domain | string | Domain name |
| site | string | Website name |
| image | string | Main image URL |
| favicon | string | Favicon URL |
| language | string | Detected language (BCP 47) |
| wordCount | number | Word count |
| parseTime | number | Parse time in milliseconds |
| outputFormat | string | The format used (markdown, html, or json) |

In JSON mode, additional fields like metaTags, schemaOrgData, and debug info are included. If an error occurs, the entry contains error instead of content.
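
Because a failed URL yields an entry with `error` in place of `content`, downstream code should separate the two cases before reading `content`. The sketch below shows one way to do that. The `split_results` name, the second URL, and the error text are hypothetical, and the exact shape of the `error` value is not documented here, so it is treated as an opaque value.

```python
from typing import Any


def split_results(
    entries: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Partition dataset entries into successes and failures."""
    succeeded: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for entry in entries:
        # An entry carries "error" instead of "content" when extraction failed.
        if "error" in entry:
            failed.append(entry)
        else:
            succeeded.append(entry)
    return succeeded, failed


entries = [
    {"url": "https://apify.com", "outputFormat": "markdown", "content": "## Get real-time web data for your AI"},
    {"url": "https://example.invalid", "error": "hypothetical error message"},
]
succeeded, failed = split_results(entries)
for entry in failed:
    print(f"failed: {entry['url']}: {entry['error']}")
```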

Sample output

Running against https://apify.com produces a dataset entry with the full page content converted to Markdown and rich metadata extracted automatically:

{
  "url": "https://apify.com",
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI.",
  "domain": "apify.com",
  "site": "Apify",
  "language": "en",
  "wordCount": 771,
  "parseTime": 128,
  "outputFormat": "markdown",
  "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."
}

The content field contains the full page rendered as clean Markdown, with images, links, and headings preserved. Switch to outputFormat: "html" or "json" for different views of the same data.
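
For LLM and archival pipelines it is often convenient to keep the metadata next to the Markdown body. The following is a minimal sketch of one such approach, assuming a front-matter header is wanted. The `to_markdown_document` helper and the choice of header fields are illustrative and not something the Actor produces itself. Values are written with `json.dumps`, since JSON strings are also valid YAML scalars.

```python
import json
from typing import Any

# Hypothetical selection of metadata fields to carry into the header.
FRONT_MATTER_FIELDS = ("title", "url", "author", "published", "language")


def to_markdown_document(entry: dict[str, Any]) -> str:
    """Render one Markdown-format entry as front matter plus body."""
    if entry.get("outputFormat") != "markdown":
        raise ValueError("expected an entry produced with outputFormat 'markdown'")

    lines = ["---"]
    for field in FRONT_MATTER_FIELDS:
        value = entry.get(field)
        if value:  # skip fields the page did not provide
            lines.append(f"{field}: {json.dumps(value, ensure_ascii=False)}")
    lines.extend(["---", "", entry["content"]])
    return "\n".join(lines) + "\n"


# Values copied from the sample output above.
sample = {
    "url": "https://apify.com",
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "language": "en",
    "outputFormat": "markdown",
    "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data...",
}
document = to_markdown_document(sample)
print(document)
```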