Webpage Content to Markdown Super Cost Effective

Pricing

Pay per event

Cost-focused: turn any webpage into LLM-ready Markdown for RAG. Uses a smart hybrid 4-tier engine that combines Apify for crawling with Cloudflare Browser Rendering for extraction. Automatically saves costs by detecting native Markdown support.

Developer

Søren Riisager

Maintained by Community


🚀 The Ultimate Web-to-Markdown Converter (Cloudflare + Apify)

Turn any website into clean, LLM-ready Markdown while saving 90% on scraping costs.

This Actor uses a smart Quadruple-Tier Architecture to intelligently switch between free extraction, Cloudflare Browser Rendering ($0.005/page), and Apify's powerful Anti-Detect Browsers.

It is designed for RAG Pipelines, AI Agents, and Dataset Creation where quality, speed, and cost efficiency are paramount.


💡 Why This Actor?

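Most scrapers are either too simple (they fail on JavaScript) or too expensive (they always launch a heavy browser). We solve this with a "Cost-First, Robustness-Last" strategy that tries the cheapest method first and keeps the most robust, most expensive one for last: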

1. 💰 Smart Cost Optimization

We don't just blindly launch a browser. We try the cheapest methods first:

  • Tier 1 (Free): Checks for native Markdown headers.
  • Tier 2 (Free): Uses a local Readability engine (no browser overhead).
  • Tier 3 ($0.005): Uses Cloudflare Browser Rendering for fast, cheap JS rendering.
  • Tier 4 (~$0.10): Uses Apify Anti-Detect Browser only as a last resort.
    • Note: Defaults to Datacenter Proxies to keep costs low.

Result: You pay pennies for easy sites, and only use "Heavy Artillery" when absolutely necessary.
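The cheapest-first order above can be sketched as a simple cascade. This is a minimal illustration, not the Actor's actual code: the `extract_markdown` function, the `Extractor` type, and the stub extractors are hypothetical, and only the name `cloudflare_browser` is taken from the output format documented further down.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical extractor shape: returns Markdown on success,
# or None when this tier cannot handle the page.
Extractor = Callable[[str], Optional[str]]


def extract_markdown(url: str, tiers: Sequence[Tuple[str, Extractor]]) -> Tuple[str, str]:
    """Try each (name, extractor) pair in order, cheapest first."""
    for name, extractor in tiers:
        markdown = extractor(url)
        if markdown:  # the first non-empty result wins; later tiers never run
            return name, markdown
    raise RuntimeError(f"all tiers failed for {url}")


# Stub extractors standing in for the four tiers (assumed names).
tiers = [
    ("native_markdown", lambda url: None),        # Tier 1: no native Markdown found
    ("local_readability", lambda url: None),      # Tier 2: static HTML had no content
    ("cloudflare_browser", lambda url: "# Rendered page"),  # Tier 3 succeeds
    ("apify_browser", lambda url: "# Not reached"),         # Tier 4 is skipped
]

source, markdown = extract_markdown("https://example.com", tiers)
print(source)
```

Because the loop returns on the first success, the expensive tiers only cost anything when every cheaper tier has already failed.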

2. 💸 Tiered Pricing (New!)

This Actor uses Pay-per-event pricing to ensure you only pay for what you use:

  • Standard Result ($0.10 per 1,000 results): Charged for Tiers 1, 2, and 3 (Native, Local, Cloudflare).
  • Premium Result ($2.00 per 1,000 results): Charged for Tier 4 (Apify Browser + Proxy).

3. 🛡️ Anti-Block Handling (New!)

If a website blocks our cheap requests (returning 403 Forbidden or 429 Too Many Requests), the Actor automatically fights back:

  1. Detects the Block.
  2. Retries with Tier 3 (Cloudflare) to see if a simple browser pass works.
  3. Escalates to Tier 4 (Apify Residential Proxy) to bypass even the toughest WAFs (Cloudflare/Akamai/etc.).

Result: Near 100% Success Rate.
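The escalation steps above could look roughly like this. It is a sketch under assumptions: the function names and the `enabled_tiers` parameter are hypothetical, and the only facts taken from the description are the two block status codes and the Tier 3 then Tier 4 order.

```python
from typing import Optional, Set

BLOCK_STATUSES = {403, 429}  # Forbidden, Too Many Requests


def is_blocked(status_code: int) -> bool:
    """A response counts as a block when it carries one of the two codes above."""
    return status_code in BLOCK_STATUSES


def escalate(current_tier: int, status_code: int, enabled_tiers: Set[int]) -> Optional[int]:
    """Return the next tier to try after a blocked response, or None."""
    if not is_blocked(status_code):
        return None  # not a block, so no escalation is needed
    # A blocked request moves to a browser tier: Tier 3 first, then Tier 4.
    for tier in (3, 4):
        if tier > current_tier and tier in enabled_tiers:
            return tier
    return None  # already at the last enabled tier


print(escalate(2, 403, {1, 2, 3, 4}))
print(escalate(3, 429, {1, 2, 3, 4}))
```

If Tier 3 is switched off, the same function skips straight to Tier 4, which matches the idea that tiers can be toggled independently.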


🏗️ The Quadruple-Tier Architecture

| Tier | Method | Cost Type | Speed | Best For |
|------|--------|-----------|-------|----------|
| 1 | Native Markdown | Standard | ⚡ Instant | Sites that serve raw markdown (e.g., GitHub, Docs). |
| 2 | Local Readability | Standard | 🚀 Very Fast | Blogs, News, Static HTML sites. |
| 3 | Cloudflare Browser | Standard | 🚄 Fast | SPAs (React/Vue), JS-heavy sites. |
| 4 | Apify Browser | Premium | 🐢 Slow | Stubborn Sites, Anti-Bot Protection, Deep Complex Apps. |
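To see how the Standard and Premium cost types add up, here is a small estimate based on the two Pay-per-event prices stated on this page ($0.10 and $2.00 per 1,000 results). The function name is hypothetical, and the calculation covers only those two event prices, nothing else.

```python
STANDARD_PER_1K = 0.10  # USD per 1,000 Standard results (Tiers 1-3)
PREMIUM_PER_1K = 2.00   # USD per 1,000 Premium results (Tier 4)


def estimate_event_cost(standard_results: int, premium_results: int) -> float:
    """Estimated Pay-per-event charge in USD, rounded to 4 decimal places."""
    total = (standard_results * STANDARD_PER_1K + premium_results * PREMIUM_PER_1K) / 1000
    return round(total, 4)


# 10,000 pages where 5% needed Tier 4: 9,500 Standard + 500 Premium results.
print(estimate_event_cost(9_500, 500))
```

In this example the 9,500 Standard results account for $0.95 and the 500 Premium results for $1.00, so the Premium tier dominates the bill even at a 5% share.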

⚙️ Configuration

You have full control. Toggle tiers on/off to fit your budget and needs.

| Field | Description |
|-------|-------------|
| Start URLs | List of URLs to scrape. |
| Cloudflare Settings | Account ID & API Token (Required for Tier 3). |
| Enable Tier 1-4 | Toggle specific tiers on/off (Default: All Enabled). |
| Proxy Configuration | Choose proxies. Default: Datacenter (Low Cost). |
| Max Concurrency | Parallel pages. Note: Tier 4 eats RAM, keep low (1-2) if using it heavily. |
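As an illustration of how these fields fit together, the sketch below assembles a run input and rejects the one combination the table rules out: Tier 3 enabled without Cloudflare credentials. Every key name here (`startUrls`, `cloudflareAccountId`, `enableTier3`, and so on) is an assumption made for the example; check the Actor's input schema for the real field names.

```python
from typing import Any, Dict, List, Optional


def build_run_input(
    start_urls: List[str],
    cloudflare_account_id: Optional[str] = None,
    cloudflare_api_token: Optional[str] = None,
    enable_tier3: bool = True,
    enable_tier4: bool = True,
    max_concurrency: int = 2,  # kept low because Tier 4 is memory-hungry
) -> Dict[str, Any]:
    """Build a run input dict; all keys are hypothetical placeholders."""
    if enable_tier3 and not (cloudflare_account_id and cloudflare_api_token):
        raise ValueError(
            "Tier 3 needs a Cloudflare Account ID and API Token; "
            "supply both or disable Tier 3"
        )
    run_input: Dict[str, Any] = {
        "startUrls": [{"url": u} for u in start_urls],
        "enableTier1": True,
        "enableTier2": True,
        "enableTier3": enable_tier3,
        "enableTier4": enable_tier4,
        "maxConcurrency": max_concurrency,
    }
    if enable_tier3:
        run_input["cloudflareAccountId"] = cloudflare_account_id
        run_input["cloudflareApiToken"] = cloudflare_api_token
    return run_input


# Without Cloudflare credentials, Tier 3 has to be switched off explicitly.
config = build_run_input(["https://example.com/blog"], enable_tier3=False)
print(config["enableTier3"], config["maxConcurrency"])
```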

🔑 Getting Cloudflare Credentials (Required for Tier 3)

To use the Cost-Saving Tier 3, you need a Cloudflare Workers Paid Plan ($5/mo).

  1. Account ID: Found in your Cloudflare Dashboard URL.
  2. API Token: Create a token with Account > Browser Rendering > Edit permissions.

Note: You can disable Tier 3 if you don't have Cloudflare, but you lose the "Cheap Browser" advantage.
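To check that an Account ID and API Token work together, one option is to build a request against Cloudflare's Browser Rendering REST API yourself. The sketch below only constructs the request and does not send it. The `/browser-rendering/markdown` path is assumed from Cloudflare's public REST API documentation and should be verified there; this page does not say which endpoint the Actor calls internally. The function name and placeholder credentials are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_markdown_request(account_id: str, api_token: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking Cloudflare to render a page as Markdown."""
    # Assumed endpoint path; confirm against the current Cloudflare documentation.
    endpoint = f"{API_BASE}/accounts/{account_id}/browser-rendering/markdown"
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",  # the API Token from step 2
            "Content-Type": "application/json",
        },
    )


request = build_markdown_request("YOUR_ACCOUNT_ID", "YOUR_API_TOKEN", "https://example.com")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```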


📊 Output Format

We provide clean JSON ready for your Vector Database or LLM:

{
  "url": "https://example.com/blog/ai-revolution",
  "meta": {
    "title": "The AI Revolution",
    "description": "How AI is changing the web...",
    "keywords": "AI, LLM, RAG"
  },
  "content": {
    "markdown": "# The AI Revolution\n\nFull article content...",
    "title": "The AI Revolution",
    "source": "cloudflare_browser",
    "estimatedTokens": 540
  },
  "scrapedAt": "2023-10-27T10:00:00.000Z"
}

The "source" field inside "content" tells you which tier succeeded ("cloudflare_browser" here means Tier 3).
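A record in this shape can be flattened into a text-plus-metadata pair before it goes into a vector database. The sketch below uses only the field names from the sample output; the helper names `to_document` and `summarize` are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def to_document(record: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Flatten one output record into (text, metadata) for a vector store."""
    content = record["content"]
    metadata = {
        "url": record["url"],
        "title": content["title"],
        "source": content["source"],  # which tier produced the Markdown
        "scrapedAt": record["scrapedAt"],
    }
    return content["markdown"], metadata


def summarize(records: Iterable[Dict[str, Any]]) -> Tuple[Counter, int]:
    """Count results per source tier and sum the estimated tokens."""
    records = list(records)
    by_source = Counter(r["content"]["source"] for r in records)
    total_tokens = sum(r["content"]["estimatedTokens"] for r in records)
    return by_source, total_tokens


sample = {
    "url": "https://example.com/blog/ai-revolution",
    "meta": {"title": "The AI Revolution"},
    "content": {
        "markdown": "# The AI Revolution\n\nFull article content...",
        "title": "The AI Revolution",
        "source": "cloudflare_browser",
        "estimatedTokens": 540,
    },
    "scrapedAt": "2023-10-27T10:00:00.000Z",
}

text, metadata = to_document(sample)
by_source, total_tokens = summarize([sample])
print(metadata["source"], total_tokens)
```

Counting results per source is also a quick way to see how often the expensive tier was needed in a given run.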

Built with ❤️ by Tulabot.com - Powering the next generation of AI Agents.