Webpage to Markdown
Pricing: from $1.00 / 1,000 results
Get the main content of any page as Markdown. Great for LLMs and AI agent workflows.
Developer: Epic Scrapers (maintained by Community)
Convert Any Web Page to Clean Markdown, HTML, or JSON: A Content Extraction Tool for AI, Web Scraping, and Automation
Submit a URL and get the page's core content back as clean Markdown or HTML in seconds. The Actor automatically strips navigation bars, sidebars, headers, footers, ads, and other clutter from any page type, including articles, documentation, and landing pages. Each result also includes page metadata such as title, description, author, publication date, language, word count, and featured image, when the page provides them.
Features
- One-shot extraction: Submit any URL and receive clean, structured content in seconds. No configuration required.
- Markdown, HTML, and JSON output: Choose the format that fits your pipeline. Use Markdown for LLM and AI workflows, HTML for full-fidelity rendering, and JSON when you need the full metadata.
- Rich page metadata: Title, author, description, publication date, language, word count, domain, site name, and featured image are extracted automatically when the page provides them.
- Schema.org structured data: Extracts JSON-LD and microdata where available (returned as schemaOrgData in JSON mode).
- Language-aware extraction: Set a preferred BCP 47 language tag to improve content selection on multilingual pages.
- Manual content targeting: Override auto-detection with a custom CSS selector (contentSelector) when you need content from a specific page region.
- Debug mode: Inspect which elements were removed and why, to fine-tune extraction on challenging pages.
- SPA fallback: Automatically handles client-side rendered single-page applications via third-party APIs.
Output example
```json
{
  "url": "https://tim.blog/2026/04/24/how-to-keep-your-brain-sharp/",
  "title": "How to Keep Your Brain Sharp: A Practical Playbook Beyond the Basics",
  "description": "The following is a guest post from Dr. Tommy Wood (@drtommywood), associate professor of pediatrics and neuroscience at the University of Washington, where his research focuses on brain health.",
  "author": "Tim Ferriss",
  "published": "2026-04-24T18:46:08+00:00",
  "domain": "tim.blog",
  "site": "The Blog of Author Tim Ferriss",
  "image": "https://tim.blog/wp-content/uploads/2026/04/milad-fakurian-58Z17lnVS4U-unsplash-scaled.jpg",
  "favicon": "https://i0.wp.com/tim.blog/wp-content/uploads/2025/05/favicon.png?fit=32%2C32&quality=80&ssl=1",
  "language": "en-US",
  "wordCount": 7961,
  "parseTime": 167,
  "outputFormat": "markdown",
  "content": "..."
}
```
Input
| Field | Type | Default | Description |
|---|---|---|---|
| urls | string[] | (none) | Required. List of URLs to process |
| outputFormat | enum | markdown | Output format: markdown, html, or json (full metadata) |
| debug | boolean | false | Enable debug logging and include debug info in results |
| language | string | (none) | Preferred BCP 47 language tag (e.g. en, fr, ja) |
| contentSelector | string | (none) | CSS selector that overrides auto-detection of the main content |
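The Input table maps directly onto the run input object. The sketch below assumes you assemble that object yourself in Python before starting a run; build_run_input is a hypothetical helper, not part of the Actor, and it only enforces the constraints stated in the table (urls is required, outputFormat is one of three values). Optional fields are left out when unset so the Actor's own defaults apply.

```python
from typing import Any, Optional

# Allowed outputFormat values, as listed in the Input table.
OUTPUT_FORMATS = {"markdown", "html", "json"}


def build_run_input(
    urls: list[str],
    output_format: str = "markdown",
    debug: bool = False,
    language: Optional[str] = None,
    content_selector: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble and sanity-check an input object."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(OUTPUT_FORMATS)}")

    run_input: dict[str, Any] = {
        "urls": list(urls),
        "outputFormat": output_format,
        "debug": debug,
    }
    # Only send optional fields when set, so the Actor's defaults apply otherwise.
    if language is not None:
        run_input["language"] = language  # BCP 47 tag, e.g. "en", "fr", "ja"
    if content_selector is not None:
        run_input["contentSelector"] = content_selector  # CSS selector
    return run_input
```

For example, build_run_input(["https://apify.com"], language="en", content_selector="main article") yields an object with all five fields set, while build_run_input(["https://apify.com"]) sends only urls, outputFormat, and debug.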
Output
Each URL produces a dataset entry with the following fields:
| Field | Type | Description |
|---|---|---|
| url | string | Source URL |
| title | string | Page title |
| content | string | Extracted content (Markdown or HTML, depending on outputFormat) |
| description | string | Page description or summary |
| author | string | Article author |
| published | string | Publication date |
| domain | string | Domain name |
| site | string | Website name |
| image | string | Main (featured) image URL |
| favicon | string | Favicon URL |
| language | string | Detected language (BCP 47 tag) |
| wordCount | number | Word count |
| parseTime | number | Parse time in milliseconds |
| outputFormat | string | Output format used (markdown, html, or json) |
In JSON mode, each entry also includes additional fields such as metaTags, schemaOrgData, and debug info. If an error occurs for a URL, its entry contains an error field instead of content.
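Because a failed URL yields an entry with error instead of content, downstream code usually needs to separate the two before using the content. A minimal Python sketch, assuming the entries have already been downloaded as plain dicts: ResultEntry mirrors the Output table, and the error field is typed loosely because its shape is not documented here. Both names are hypothetical.

```python
from typing import Any, Iterable, TypedDict


# Mirrors the Output table. total=False because metadata such as author or
# published may be missing, and failed entries carry error instead of content.
class ResultEntry(TypedDict, total=False):
    url: str
    title: str
    content: str
    description: str
    author: str
    published: str
    domain: str
    site: str
    image: str
    favicon: str
    language: str
    wordCount: int
    parseTime: int
    outputFormat: str
    error: Any  # assumption: present only when extraction failed; type undocumented


def split_results(
    entries: Iterable[ResultEntry],
) -> tuple[list[ResultEntry], list[ResultEntry]]:
    """Partition entries into (successful, failed) by the presence of "error"."""
    ok: list[ResultEntry] = []
    failed: list[ResultEntry] = []
    for entry in entries:
        (failed if "error" in entry else ok).append(entry)
    return ok, failed
```

Keying on the presence of error rather than the absence of content keeps the check correct even if a successful page happens to produce an empty content string.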
Sample output
Running against https://apify.com produces a dataset entry with the full page content converted to Markdown and rich metadata extracted automatically:
```json
{
  "url": "https://apify.com",
  "title": "Apify: Full-stack web scraping and data extraction platform",
  "description": "Cloud platform for web scraping, browser automation, AI agents, and data for AI.",
  "domain": "apify.com",
  "site": "Apify",
  "language": "en",
  "wordCount": 771,
  "parseTime": 128,
  "outputFormat": "markdown",
  "content": "## Get real-time web data for your AI\n\nApify Actors scrape up-to-date web data..."
}
```
The content field contains the page's main content as clean Markdown, with images, links, and headings preserved. Set outputFormat to "html" or "json" for other views of the same data.
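To produce entries like the one above from code rather than the Apify Console, the official apify-client package for Python provides ApifyClient, whose actor(...).call(run_input=...) starts a run and waits for it, and dataset(...).iterate_items() streams the results. This is a sketch under assumptions: the Actor ID is a placeholder (use the ID shown on the Actor's page), fetch_markdown is a hypothetical helper, and the client is passed in as a parameter so the function can be exercised without network access.

```python
from typing import Any, Iterator


def fetch_markdown(client: Any, actor_id: str, urls: list[str]) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: run the Actor on `urls` and yield its dataset entries.

    `client` is expected to be an apify_client.ApifyClient, or any object with
    the same actor(...).call(...) and dataset(...).iterate_items() methods.
    """
    run_input = {"urls": list(urls), "outputFormat": "markdown"}
    # call() starts the run, blocks until it finishes, and returns run info.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run information")
    # Each processed URL becomes one item in the run's default dataset.
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()


# Usage (requires `pip install apify-client`, a real API token, and the Actor ID):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   for item in fetch_markdown(client, "<ACTOR_ID>", ["https://apify.com"]):
#       print(item["url"], item.get("wordCount"))
```

Because call() waits for the run to finish, this pattern suits small batches; for long lists of URLs you may prefer to start the run asynchronously and poll it instead.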