Zach's "Webpage Content To Markdown" Scraper avatar

Zach's "Webpage Content To Markdown" Scraper

Under maintenance
Try for free

3 days trial then $19.00/month - No credit card required now

Go to Store
This Actor is under maintenance.

This Actor may be unreliable while under maintenance. Would you like to try a similar Actor instead?

See alternative Actors
Zach's "Webpage Content To Markdown" Scraper

Zach's "Webpage Content To Markdown" Scraper

dyf/webpage-to-markdown
Try for free

3 days trial then $19.00/month - No credit card required now

This Apify Actor scrapes a single webpage and parses it to markdown. It's packed with features to ensure a high success rate at low cost: browser-based scraping, smart retrying, circumvention of anti-scrape blocks (e.g. Cloudflare), and smart proxy support.

It also includes two modes of operation, so you can optimize for either cost (as cheap as possible) or yield (as many successful results as possible).
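
If you'd rather trigger runs programmatically than from the Apify Console, here's a minimal sketch using the official apify-client Python package. The input and output field names ("startUrls", "useBrowser", "markdown") are assumptions for illustration; check the Actor's input schema for the real ones.

```python
# A minimal sketch of running the Actor via the Apify API.
# Field names below are assumptions; see the Actor's input schema.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

run = client.actor("dyf/webpage-to-markdown").call(
    run_input={
        "startUrls": [{"url": "https://example.com"}],  # assumed field name
        "useBrowser": False,  # assumed name for "Get Data Using Browser"
    }
)

# Each dataset item should carry the page's markdown (field name assumed).
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("markdown", "")[:200])
```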

🤔 When To Use It

Whenever you want to reliably get a webpage's content and parse it into markdown.

(Personally, I mostly use it to feed data into ChatGPT for freelance cold-outreach personalization and automation tasks, which I cover in our $200k Freelancer course.)

😰 Why We Made It:

If you want to have ChatGPT interpret a webpage, it can be surprisingly difficult with current tooling.

  • 😭 ChatGPT's API isn't currently web-connected
  • 😿 If you try to get a page's content via a Make automation and parse it to text/markdown, it's unreliable and produces a lot of soft failures and rendering errors
  • 🤢 If you try to use standalone webpage-to-markdown scraping tools, they're expensive and also have a lot of soft failures and markdown rendering errors
  • 😣 If you use the other website-crawling-to-markdown scrapers on Apify, they tend to be expensive and unreliable

That's why we made this Actor...

💪 Why This Actor is Nifty:

😍 This Actor lets you simply plop in a big ole list of domain names and get a huge spreadsheet of markdown content back, to do whatever you want with.

(e.g. upload to Google Sheets and have ChatGPT iterate through it via a Make automation)

🤘 Features:

  • ✅ Anti-Scrape Circumvention – if you use the "Get Data Using Browser" option, we'll be able to circumvent many blocks
  • ✅ Soft-Failure Reporting – e.g. if a webpage comes back blank, we'll mark it as a failure (not a lot of other solutions do this)
  • ✅ Smart Proxy Support – we'll run on Datacenter proxies by default, and only switch to Residential proxies when actually necessary
  • ✅ Smart Retrying – we'll auto-retry on failures and rotate proxies and IPs to get you the most successful results possible

💭 Example Use Cases:

If you're a $200k Freelancer course student, be sure to check the course training area for guidance on the use cases below and more.

Website Language Detection:

  1. Run this actor
  2. Put results into a Google Sheet
  3. Filter out the fails
  4. Add the formula =DETECTLANGUAGE(E2) (assuming E is the markdown column) to a new column
  5. Extend that formula to all rows in the column
  6. Filter the results to hide languages you don't want (e.g. show only `en` to keep English-language websites); if you'd rather do this step in code, see the sketch below
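
If you prefer code over spreadsheet formulas, here's a minimal sketch of the same filtering step in Python. It assumes you've exported the run results to an Excel file with a markdown column (column name assumed) and uses the third-party pandas and langdetect packages, which aren't part of this Actor.

```python
# A sketch of language filtering in code instead of Google Sheets.
# Assumes a "markdown" column in the export (name assumed) and uses
# third-party packages: pip install pandas openpyxl langdetect
import pandas as pd
from langdetect import detect

df = pd.read_excel("results.xlsx")  # the exported run results

def detect_lang(text):
    try:
        return detect(str(text))
    except Exception:
        return "unknown"  # blank or undetectable content

df["language"] = df["markdown"].apply(detect_lang)
df[df["language"] == "en"].to_excel("results_en.xlsx", index=False)
```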

Cold Outreach Personalization:

(e.g. find out what kinds of products a company sells, who their audience avatar is, etc.)

  1. Run this actor
  2. Put results into a Google Sheet
  3. Filter out the fails
  4. Create a Make automation that feeds the markdown into ChatGPT for analysis
  5. Have ChatGPT return its analysis as JSON if you want multiple fields back (e.g. "type_of_products_sold", "random_product_name", etc.)
  6. Parse the JSON and add each field to a column in the Google Sheet
  7. You can now feed this data into a line-writer ChatGPT prompt to have it rewrite a template line with this personalization data (steps 4–5 are sketched in code below)
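
For the Make-averse, here's a minimal Python sketch of steps 4–5, calling OpenAI's API directly and asking for a JSON object back. The model name and the JSON field names are illustrative assumptions, not part of this Actor.

```python
# A sketch of feeding scraped markdown to ChatGPT and getting JSON back.
# Uses the official openai package (pip install openai); the model and
# output field names are assumptions for illustration.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def analyze_markdown(markdown: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    "Analyze this webpage markdown and reply with JSON "
                    'containing "type_of_products_sold" and "random_product_name".'
                ),
            },
            {"role": "user", "content": markdown[:10000]},  # matches the 10,000-char trim cap
        ],
    )
    return json.loads(response.choices[0].message.content)

analysis = analyze_markdown("# Acme Co\nWe sell industrial-grade anvils...")
print(analysis.get("type_of_products_sold"))
```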

Modes of Operation

Regardless of which mode you use it in, if you're exporting to a spreadsheet, be sure to choose MS Excel format, not CSV. (Markdown will often mess up the CSV file)

"Low-Hanging Fruit" Mode

The following settings are efficient and the cheapest path to data, but won't work for a lot of websites:

  • "Get Data Using Browser" option disabled
  • 1GB of RAM
  • Residential proxies (we use datacenter by default in our code and will only use residential if actually necessary)

Estimated Costs for "Low-Hanging Fruit" Mode:

  • Est. cost per result in "Low-Hanging Fruit" Mode: $0.00025
  • Est. yield on results: 84.12%

"All The Damned Fruit" Mode

The following settings have very high reliability, but are more expensive:

  • "Get Data Using Browser" option enabled
  • 4GB of RAM (You can often get away with 2GB – or even 1GB – of RAM, which will make it much cheaper.)
  • Residential proxies (we use datacenter by default in our code and will only use residential if actually necessary)

Estimated Costs for "All The Damned Fruit" Mode:

  • Est. cost per result in "All The Damned Fruit" Mode: $0.0069 CPL for residential proxies ($0.0012 CPL for datacenter)
  • Est. yield on results: 93.38% for residential (91.64% datacenter)
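
In code, the difference between the two modes is just run options. Here's a sketch with apify-client, again with assumed input field names; the memory values map to the RAM settings above.

```python
# A sketch of the two modes as apify-client calls (field names assumed).
from apify_client import ApifyClient

actor = ApifyClient("<YOUR_APIFY_TOKEN>").actor("dyf/webpage-to-markdown")
urls = [{"url": "https://example.com"}]

# "Low-Hanging Fruit" Mode: no browser, 1 GB of RAM.
cheap_run = actor.call(
    run_input={"startUrls": urls, "useBrowser": False},
    memory_mbytes=1024,
)

# "All The Damned Fruit" Mode: browser enabled, 4 GB of RAM.
thorough_run = actor.call(
    run_input={"startUrls": urls, "useBrowser": True},
    memory_mbytes=4096,
)
```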

Pricing Breakdown:

| Results | Valid Results | Cost | Cost Per Result (CPL) | Yield | Time | Memory | Proxy | Using Browser Build |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2462 | 2071 | $0.612 | $0.0002486 | 84.12% | 36 min | 1 GB | Residential | No |
| 2463 | 2078 | $0.914 | $0.0003711 | 84.37% | 19 min | 4 GB | Residential | No |
| 2463 | 2257 | $2.99 | $0.0012140 | 91.64% | 96 min | 4 GB | Datacenter | Yes |
| 2463 | 2300 | $17 | $0.0069022 | 93.38% | 120 min | 4 GB | Residential (if needed) | Yes |

Suggested Usage

Depending on your priorities, there are a couple ways to use this scraper. What's your priority?

"My Priority is EASE"

("...And I don't care if it costs more.")

👉 Run it with the "All The Damned Fruit" Mode settings from the "Modes of Operation" section above, right from the start.

(As noted above: if you're exporting to a spreadsheet, choose MS Excel format, not CSV, since markdown often breaks the CSV file.)

"My Priority is COST"

("...And I don't care if it means there are a couple extra steps for me.")

👉 You'll do two separate runs: first you'll grab all the cheap "Low-Hanging Fruit" results you can, then you'll re-run all the failures in "All The Damned Fruit" Mode.

Instructions:

  1. Run your full set of URLs with the "Low-Hanging Fruit" Mode settings (you can find them in the Modes of Operation section at the top of this page)
  2. After the run is finished, export the results to Excel format and filter the list to only show the failures
  3. Re-run those failures with the "All The Damned Fruit" Mode settings (also in the Modes of Operation section)
  4. Export the results from both runs and merge the data manually into one sheet (the whole flow is sketched in code below)
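
Here's a minimal end-to-end sketch of that two-run flow in Python, merging the datasets in code instead of by hand. The input/output field names ("startUrls", "useBrowser", "url", "markdown", "failed") are assumptions; check the Actor's actual schema before relying on them.

```python
# A sketch of the two-run COST strategy with apify-client.
# Field names are assumptions; see the Actor's input/output schema.
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")
actor = client.actor("dyf/webpage-to-markdown")
urls = ["https://example.com", "https://example.org"]

# Run 1: "Low-Hanging Fruit" Mode over the full URL list.
run1 = actor.call(
    run_input={"startUrls": [{"url": u} for u in urls], "useBrowser": False},
    memory_mbytes=1024,
)
items = list(client.dataset(run1["defaultDatasetId"]).iterate_items())
successes = [i for i in items if not i.get("failed")]
failed_urls = [i["url"] for i in items if i.get("failed")]

# Run 2: "All The Damned Fruit" Mode, failures only.
if failed_urls:
    run2 = actor.call(
        run_input={"startUrls": [{"url": u} for u in failed_urls], "useBrowser": True},
        memory_mbytes=4096,
    )
    successes += [
        i
        for i in client.dataset(run2["defaultDatasetId"]).iterate_items()
        if not i.get("failed")
    ]

print(f"{len(successes)} successful results merged from both runs")
```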

All Config Options

  • Maximum Content Length (Characters) – this will trim each record's markdown output before we add it to the result set, cutting down on spreadsheet file size. (Our hard-set internal trim maximum is 10,000 characters.)