Pricing

$19.99/month + usage

Website Content to Markdown for LLM Training

🚀 Transform web content into clean, LLM-ready Markdown! 📘 Scrape multiple pages, extract main content, and convert to Markdown format. Perfect for AI researchers, data scientists, and LLM developers. Fast, efficient, and customizable. Supercharge your AI training data today! 🌐📝🧠

Rating

5.0

(2)

Developer

EasyApi

Actor stats

Total users: 247
Last modified: 3 months ago

📘 Website Content to Markdown Scraper for LLM Training

This powerful Apify Actor transforms web content into clean, readable Markdown format, perfect for training Large Language Models (LLMs). It's an essential tool for AI researchers, data scientists, and developers working on natural language processing tasks.

✨ Features

🌐 Scrape content from multiple web pages
📝 Convert HTML to clean Markdown format
🧠 Generate high-quality training data for LLMs
🔍 Intelligent main content extraction
🚀 Fast and efficient with concurrent scraping
🕵️‍♂️ Stealth mode to avoid detection

📥 Input

Configure your scraping job with these options:

urls: List of URLs to start scraping from

📤 Output

For each scraped page, you'll get:

🔗 URL of the page
📄 Main content in Markdown format, ideal for LLM training
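
As a rough sketch of the record shape described above, a small Python dataclass could model each dataset item. The `url` and `markdown` fields come from this page; treating `title` and `timestamp` as optional is an assumption, based on their appearing in only some records of the output sample. The names `ScrapedPage` and `parse_item` are hypothetical helpers, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ScrapedPage:
    """One scraped page as returned in the Actor's dataset."""

    url: str
    markdown: str
    # Assumed optional: these keys appear in only some sample records.
    title: Optional[str] = None
    timestamp: Optional[str] = None


def parse_item(item: Mapping[str, Any]) -> ScrapedPage:
    """Build a ScrapedPage from one raw dataset item, ignoring unknown keys."""
    return ScrapedPage(
        url=item["url"],
        markdown=item.get("markdown", ""),
        title=item.get("title"),
        timestamp=item.get("timestamp"),
    )
```

Parsing into a typed record makes it easier to notice missing fields early, before the content reaches a training pipeline.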

💡 Use Cases

🤖 LLM Training: Prepare web content as high-quality training data for language models
📚 Content Aggregation: Collect articles and blog posts for research or curation
📊 Web Analysis: Extract text content for sentiment analysis or topic modeling
📑 Documentation: Convert web-based documentation into Markdown for easy integration
🔍 SEO Analysis: Extract and analyze content from competitor websites

🚀 Getting Started

1. Set your input parameters in the Apify console or via API
2. Run the Actor and watch as it transforms web content into Markdown
3. Access your results in JSON format, with Markdown content ready for LLM training or further processing
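
For the API route, the steps above might look like the following sketch, which uses only the Python standard library and the Apify API's synchronous "run and get dataset items" endpoint. This page does not state the Actor's ID, so `<username>/<actor-name>` is a placeholder you would replace with the value shown in the Apify console. The function names and the `APIFY_TOKEN` environment variable are likewise assumptions for illustration.

```python
import json
import urllib.request
from typing import Any, List, Sequence

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, urls: Sequence[str]) -> urllib.request.Request:
    """Prepare (but do not send) a request that runs the Actor and returns its dataset items."""
    # The Apify API expects "username~actor-name" in the URL path.
    path_id = actor_id.replace("/", "~")
    body = json.dumps({"urls": list(urls)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(actor_id: str, token: str, urls: Sequence[str]) -> List[Any]:
    """Send the request and return the parsed list of dataset items."""
    request = build_run_request(actor_id, token, urls)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (performs a real, billable run, so it is left commented out):
# import os
# items = run_actor("<username>/<actor-name>", os.environ["APIFY_TOKEN"], ["https://apify.com"])
```

Separating request construction from sending keeps the network call easy to swap out, for example for the official Apify client library or an asynchronous run when a job is too long for a synchronous request.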

🆘 Support

If you encounter any issues or have questions, please reach out through Apify's support channels.

Transform web content into clean, LLM-ready Markdown with just a few clicks! 🚀📝🧠

Input Example

An example input in JSON:

{
    "urls": [
        "https://apify.com",
        "https://www.google.com"
    ]
}
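
If you assemble this input programmatically, a small helper can catch malformed entries before a run starts. The sketch below assumes `urls` is the only input field, as documented on this page; `build_input` is a hypothetical name, and the validation rules (absolute http or https URLs, duplicates dropped) are a design choice rather than a requirement stated by the Actor.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def build_input(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Return an input dict with valid, de-duplicated http(s) URLs in their original order."""
    seen = set()
    cleaned = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {raw!r}")
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return {"urls": cleaned}
```

Passing the result through `json.dumps` yields a document with the same shape as the example above.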

Output sample

The results are stored in a dataset, which you can always find in the Storage tab. Here is an excerpt, in JSON, of the data you would get with the input parameters above. You can choose the format in which to download your data: JSON, JSONL, Excel spreadsheet, HTML table, CSV, or XML.

[
	{
        "url": "https://apify.com",
        "markdown": "# Apify: Full-stack web scraping and data extraction platform\nApify is the largest ecosystem where developers build, deploy, and publish data extraction and web automation tools. We call them Actors.\n\n[\n\n![TikTok Data Extractor avatar](https://images.apifyusercontent.com/7qfIRNOkv0aSxbge7HEFXP8fgI4yljBl733910rKX1k/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9PdHpZZksxbmRFR2R3V0ZLUS95dHVpeFA1UnRQelNhQjdiNy1GcmVlX1Rpa1Rva19TY3JhcGVyLnBuZw.webp)\n\n### TikTok Data Extractor\n\nclockworks/free-tiktok-scraper\n\nExtract data about videos, users, and channels based on hashtags or scrape full user profiles including posts, total likes, name, nickname, numbers of comments, shares, followers, following, and more.\n\n](https://apify.com/clockworks/free-tiktok-scraper)[\n\n![Google Maps Extractor avatar](https://images.apifyusercontent.com/-ts0t-LAFw3ga_GTVvlkBQsIkUsT-OrI9eIJbZmTLKM/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMudXMtZWFzdC0xLmFtYXpvbmF3cy5jb20vMk1kbWExTjZGZDB5M1FFalIvRDNWdGRjU2Z6endhNEJyVkQtR29vZ2xlX21hcHNfZGF0YV9leHRyYWN0b3IucG5n.webp)\n\n### Google Maps Extractor\n\ncompass/google-maps-extractor\n\nExtract data from hundreds of places fast. Scrape Google Maps by keyword, category, location, URLs & other filters. Get addresses, contact info, opening hours, popular times, prices, menus & more. 
Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.\n\n![User avatar](https://images.apifyusercontent.com/7kyvJioGcnMkehrfG50CojuTmXLPpPOmOBB2gJTbkXY/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9SNkhDRllhWk5xalFBOVpXeC9lUVN4empBNnNlTm9yNnR6Sy1jb21wYXNzXyUyODQlMjkucG5n.webp)\n\nCompass\n\n\n\n](https://apify.com/compass/google-maps-extractor)[\n\n![Instagram Scraper avatar](https://images.apifyusercontent.com/z7CK6Vj49M5QKOtlDLKVq6H0rKofEr04xF0XdWERwtE/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9zaHU4aHZyWGJKYlkzRWI5Vy9LQTk4aWd0S3RZaldtRmt1Yy1JbnN0YWdyYW1fU2NyYXBlci5wbmc.webp)\n\n### Instagram Scraper\n\napify/instagram-scraper\n\nScrape and download Instagram posts, profiles, places, hashtags, photos, and comments. Get data from Instagram using one or more Instagram URLs or search queries. Export scraped data, run the scraper via API, schedule and monitor runs or integrate with other tools.\n\n![User avatar](https://images.apifyusercontent.com/Dfq_kwuUdrHlMXgpdmKRgXTyIfp9xVU6ysUXM5ppgyU/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9ac2NNd0ZSNUg3ZUN0V3R5aC9ZcXRrUW1FeFpwbU1kNmRKUS1hcGlmeV9zeW1ib2xfd2hpdGVfYmcucG5n.webp)\n\nApify\n\n\n\n](https://apify.com/apify/instagram-scraper)[\n\n![Website Content Crawler avatar](https://images.apifyusercontent.com/L4ha9DtGVFLjeFAWBPcr0MSH1c2RbtiWSwlbnOKjLZw/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9hWUcwbDlzN2RiQjdqM2diUy9QZlRvRU5rSlp4YWh6UER1My1DbGVhblNob3RfMjAyMy0wMy0yOF9hdF8xMC40MC4yMF8yeC5wbmc.webp)\n\n### Website Content Crawler\n\napify/website-content-crawler\n\nCrawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. 
The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.\n\n![User avatar](https://images.apifyusercontent.com/Dfq_kwuUdrHlMXgpdmKRgXTyIfp9xVU6ysUXM5ppgyU/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9ac2NNd0ZSNUg3ZUN0V3R5aC9ZcXRrUW1FeFpwbU1kNmRKUS1hcGlmeV9zeW1ib2xfd2hpdGVfYmcucG5n.webp)\n\nApify\n\n\n\n](https://apify.com/apify/website-content-crawler)[\n\n![Amazon Scraper avatar](https://images.apifyusercontent.com/76JhHGsrEYmhrP4rQqvC5f0zNQxx3XPvA6KDUZ3Qfrk/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9YVkRUUWM0YTdNRFRxU1RNSi90aFBUcmhlNVN3dFhUWkNGaC1GcmVlX0FtYXpvbl9Qcm9kdWN0X1NjcmFwZXIucG5n.webp)\n\n### Amazon Scraper\n\njunglee/free-amazon-product-scraper\n\nGets you product data from Amazon. Unofficial API. Scrapes and downloads product information without using the Amazon API, including reviews, prices, descriptions, and ASIN.\n\n![User avatar](https://images.apifyusercontent.com/r-SgQqEX3CCTC3LXS1EOLpWmM4G3gCv9CE_W3jM95bM/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMudXMtZWFzdC0xLmFtYXpvbmF3cy5jb20vVFg0clBKQkhiaFNLaTIzWHMvSFl6Umtmc204aGlOVEo3SnAtSnVuZ2xlZS5wbmc.webp)\n\nJunglee\n\n\n\n](https://apify.com/junglee/free-amazon-product-scraper)[\n\n![Build your own Actor avatar](https://apify.com/img/homepage/own-actor/actor-avatar.svg)\n\n### Build your own Actor\n\nyou/new-idea\n\nApify gives you all the tools and documentation you need to build reliable scrapers. Fast.\n\n![User avatar](https://apify.com/img/homepage/own-actor/user-avatar.svg)\n\nYou? 
🫵\n\n\n\n](https://apify.com/templates)\n\n[\n\n![TikTok Data Extractor avatar](https://images.apifyusercontent.com/7qfIRNOkv0aSxbge7HEFXP8fgI4yljBl733910rKX1k/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9PdHpZZksxbmRFR2R3V0ZLUS95dHVpeFA1UnRQelNhQjdiNy1GcmVlX1Rpa1Rva19TY3JhcGVyLnBuZw.webp)\n\n### TikTok Data Extractor\n\nclockworks/free-tiktok-scraper\n\nExtract data about videos, users, and channels based on hashtags or scrape full user profiles including posts, total likes, name, nickname, numbers of comments, shares, followers, following, and more.\n\n![User avatar](https://images.apifyusercontent.com/aS0PlpH4RIB3OtTeFlQcLCpzc1tWzrWEmE7g_ya85ik/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS84V3JkTVBpTDROcDJrZzdGQy9KcGZFRHFZeUg3V0Zld1kyQy1jbG9ja3dvcmtzLnBuZw.webp)\n\nClockworks\n\n\n\n](https://apify.com/clockworks/free-tiktok-scraper)\n\n[\n\n![Google Maps Extractor avatar](https://images.apifyusercontent.com/-ts0t-LAFw3ga_GTVvlkBQsIkUsT-OrI9eIJbZmTLKM/rs:fill:76:76/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMudXMtZWFzdC0xLmFtYXpvbmF3cy5jb20vMk1kbWExTjZGZDB5M1FFalIvRDNWdGRjU2Z6endhNEJyVkQtR29vZ2xlX21hcHNfZGF0YV9leHRyYWN0b3IucG5n.webp)\n\n### Google Maps Extractor\n\ncompass/google-maps-extractor\n\nExtract data from hundreds of places fast. Scrape Google Maps by keyword, category, location, URLs & other filters. Get addresses, contact info, opening hours, popular times, prices, menus & more. 
Export scraped data, run the scraper via API, schedule and monitor runs, or integrate with other tools.\n\n![User avatar](https://images.apifyusercontent.com/7kyvJioGcnMkehrfG50CojuTmXLPpPOmOBB2gJTbkXY/rs:fill:36:36/aHR0cHM6Ly9hcGlmeS1pbWFnZS11cGxvYWRzLXByb2QuczMuYW1hem9uYXdzLmNvbS9SNkhDRllhWk5xalFBOVpXeC9lUVN4empBNnNlTm9yNnR6Sy1jb21wYXNzXyUyODQlMjkucG5n.webp)\n\nCompass\n\n\n\n](https://apify.com/compass/google-maps-extractor)\n\n[Browse 3,000+ Actors](https://apify.com/store)\n\nTrusted by global technology leaders\n\nNot just a web scraping API\n---------------------------\n\nEasily integrate  \nZapierany appGitHubGoogle SheetsPineconeAirbyteKeboolaGoogle DriveSlackZapier  \nwith Actors\n--------------------------------------------------------------------------------------------------------------\n\nBuild reliable web scrapers. Fast.\n----------------------------------\n\n### We love open source\n\nApify works great with both Python and JavaScript, as well as Playwright, Puppeteer, Selenium, Scrapy, and Crawlee - our own web crawling and browser automation library.\n\n[![Crawlee](https://apify.com/img/logo/crawlee.svg)](https://crawlee.dev/)\n\n```\n1import { PuppeteerCrawler, Dataset } from \"crawlee\";\n2\n3const crawler = new PuppeteerCrawler({\n4  async requestHandler({ request, page, enqueueLinks }) {\n5    await Dataset.pushData({\n6      url: request.url,\n7      title: await page.title(),\n8    });\n9    await enqueueLinks();\n10  },\n11});\n12\n13await crawler.run([\"https://crawlee.dev\"]);\n```\n\n\nPublish Actors. Get paid.\n-------------------------\n\n### Reach thousands of new customers\n\nBuilding and running a SaaS is hard. Building an Actor and selling it on Apify Store is 10x easier. 
Get visitors from day one.\n\n![Get paid animation](https://apify.com/img/homepage/get-paid-animation.svg)![Get paid animation](https://apify.com/img/homepage/get-paid-animation-dark.svg)\n\n#### No upfront costs\n\nPublishing your Actor is free of charge—the customers pay for the computing resources. New creators get $500 free platform credits.\n\n#### Rely on Apify infra\n\nActors scale automatically as you gain new users. You don’t need to worry about compute, storage, proxies, or authentication.\n\n#### Billing is on us\n\nHandling payments, taxes, and invoicing is a painful part of running a SaaS. Apify does all that and sends you a net payout every month.",
        "timestamp": "2025-01-10T07:10:53.476Z"
    },
	{
		"url": "https://apify.com/actors",
		"title": "Actors - fast and easy scraping in the cloud · Apify",
		"markdown": "Actors are serverless cloud programs that run on the Apify platform and do computing jobs. They are called Actors because, like human actors, they perform actions based on a script.\n\n![](https://cdn-cms.apify.com/actor_with_border_f3508ec394.svg)\n\n### Long-running serverless jobs[](#long-running-serverless-jobs)\n\nApify Actors can perform time-consuming jobs that are longer than the lifespan of a single HTTP transaction.\n\n![](https://cdn-cms.apify.com/Serverless_jobs_54794c9759.svg)\n\n### Publish your Actor[](#publish-your-actor)\n\nJoin hundreds of developers who share their Actors on Apify Store and earn money from coding.\n\n[Go to Apify Store](/store)\n\n![](https://cdn-cms.apify.com/Publish_your_Actor_8e239d4ed0.svg)\n\n### Auto-generated user interface[](#auto-generated-user-interface)\n\nActors can easily define a user interface for their input configuration. Take advantage of lower-level features and settings, or run Actors using our API.\n\n[Learn about Input Schema](https://docs.apify.com/academy/deploying-your-code/input-schema)\n\n![](https://cdn-cms.apify.com/Auto_generated_user_interface_7019512533.svg)\n\n![](https://cdn-cms.apify.com/GH_3bc8f59fdc.svg)\n\nHost code anywhere\n\nEdit your code on our platform, fetch from a Git repository, or push from your machine.\n\n![](https://cdn-cms.apify.com/Docker_support_cf77c5d57b.svg)\n\nDocker support\n\nActors run inside Docker containers on Apify servers. Use a custom Dockerfile.\n\n![](https://cdn-cms.apify.com/Ready_for_scale_925788ef57.svg)\n\nReady for scale\n\nRun as many Actors as you need. The Apify platform provisions the necessary resources.\n\n![](https://cdn-cms.apify.com/Custom_memory_and_CPU_178b540e7d.svg)\n\nCustom memory and CPU\n\nAssign each Actor any RAM volume needed. 
CPU share is allocated automatically.\n\n![](https://cdn-cms.apify.com/Command_line_tool_4d7d12cd5e.svg)\n\nCommand-line tool\n\nDevelop and test your Actors locally, push them to the Apify platform when you're ready.\n\n![](https://cdn-cms.apify.com/Logging_2034cc75a0.svg)\n\nLogging\n\nView and download logs to debug your code and monitor performance on production.\n\n![](https://cdn-cms.apify.com/Full_support_for_Scrapy_2_ceae73e8b1.png)\n\nActorize your Scrapy spiders[](#actorize-your-scrapy-spiders)\n-------------------------------------------------------------\n\nDeploy your Scrapy code to the cloud with just a few commands. Turn your Scrapy projects into Actors, run, schedule, monitor and monetize them.\n\n[Learn more](/run-scrapy-in-cloud)"
	},
    ...
]
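
The sample above shows that the Markdown can carry many image references, which are often unwanted in text-only training data. One possible post-processing step, sketched below, strips Markdown images and writes one JSON object per line. The helper names and the `source` and `text` keys are assumptions for illustration, not part of the Actor's output. The regular expression is deliberately simple: it does not handle image URLs containing unescaped parentheses, and an image wrapped in a link leaves an empty link label behind.

```python
import json
import re
from typing import Any, Iterable, Mapping

# Matches Markdown images such as ![alt](https://example.com/a.png).
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def strip_images(markdown: str) -> str:
    """Remove Markdown image references and collapse the blank lines they leave behind."""
    without_images = IMAGE_PATTERN.sub("", markdown)
    return re.sub(r"\n{3,}", "\n\n", without_images).strip()


def to_jsonl(items: Iterable[Mapping[str, Any]]) -> str:
    """Convert dataset items to JSONL, skipping pages with no text left after cleaning."""
    lines = []
    for item in items:
        text = strip_images(item.get("markdown", ""))
        if text:
            record = {"source": item["url"], "text": text}
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Pages that consist only of images produce no record, which keeps empty samples out of the resulting file.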

📄 Article Content Extractor - Extract clean article content and metadata from any web page with structured output.
🔍 Keyword Density Checker - Analyze webpage content for keyword density and frequency with precise calculations.
🤖 AI-powered Search - Transform search queries into structured AI-powered summaries with references.
📚 arXiv Search Scraper - Extract comprehensive research paper data with detailed metadata.
🔬 Nature Search Results Scraper - Extract research article data with comprehensive metadata.
📚 Medium Posts Search Scraper - Extract detailed article data from Medium's search results.
📚 Substack Posts Scraper - Scrape Substack posts and articles with comprehensive content data.
🌐 URL Metadata Crawler - Extract comprehensive metadata from web pages including meta tags and Open Graph data.
📝 YouTube Description Extractor - Extract complete descriptions from YouTube videos automatically.
📚 WikiHow Article Scraper - Scrape WikiHow articles with detailed step-by-step content.
🔍 Google News Scraper - Collect up to 5000 news articles with flexible search options.
📚 PubMed Search Scraper - Scrape research papers and academic articles with comprehensive metadata.
📚 Goodreads Book Scraper - Extract comprehensive book data and content from Goodreads.
📚 Medium User Posts Scraper - Extract detailed post data from Medium user profiles.
📚 Substack Publications Scraper - Scrape detailed publication information from Substack.

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds — perfect for AI training data, RAG pipelines, and content archiving.

SmartApi

5.0

🔥 FireScrape AI Website Content Markdown Scraper

mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

mohamed el hadi msaid

264

2.6

AI Website Content Markdown Scraper

quaking_pail/ai-website-content-markdown-scraper

This Apify Actor, "Website Content Crawler with Markdown Extraction," is designed to perform a comprehensive crawl of specified websites, extract their text content, convert it into Markdown format, and store it in a structured dataset. The extracted content is suitable for feeding LLMs.

AI_Builder

889

3.9

Webpage to Markdown

extremescrapes/webpage-to-markdown

This actor cost-effectively converts websites into structured markdown optimized for AI processing. It extracts webpage content, formats it into clean markdown, and ensures compatibility with AI models.

Extreme Scrapes

173

5.0

Extract-any-webpage-content-for-llm

ai-developer/extract-any-webpage-content-for-llm

A fast and easy way to extract LLM-friendly data from any webpage. The tool lets you easily extract content from any website. Ideal for researchers, marketers, and developers.

aideveloper

612

URL to Markdown (JustHTML) - Clean Markdown Extractor

macheta/justhtml-link-to-markdown

Convert webpages to clean Markdown for RAG and archiving. Uses JustHTML and supports optional Cloudflare/Turnstile bypass plus CSS selector extraction.

Anass

5.0

Fast Website Content Crawler

6sigmag/fast-website-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple websites simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David

3.3K

4.9

Fast URL Content Crawler

6sigmag/fast-url-content-crawler

A high-performance web scraper that rapidly extracts and analyzes content from multiple URLs simultaneously. Perfect for competitive research, content aggregation, and website structure analysis.

David

278

5.0

Website Backup

mhamas/website-backup

Lets you create a backup of any website by crawling it, so that you don’t lose any content by accident. Ideal, for example, for your personal or company blog.

Matej Hamas

311

WebPage Scraper

muhammadsaifkhalid4/my-actor

Scrape webpages for data. What changed? Multiple URLs. Error handling: each URL is handled independently, and failures are logged and stored. Anti-blocking: added User-Agent and Accept-Language headers. Data structure: instead of just a flat heading list, you now get per-URL results with metadata.