Cloudflare Crawl Router

Pricing

from $0.20 / 1,000 results

An intelligent crawl orchestration Actor that analyzes websites and routes each target to the most suitable compliant crawl strategy for efficient extraction and transparent diagnostics.

Developer

Solutions Smart

Maintained by Community

🚀 Cloudflare Crawl Router for Apify

Cloudflare Crawl Router is an intelligent crawl orchestration Actor for web scraping, website crawling, structured data extraction, and content collection. It inspects each target URL, detects anti-bot friction and frontend complexity, and automatically routes the request to the most efficient compliant strategy: lightweight HTTP, Cheerio-style parsing, browser rendering, or a safe early abort when the site is not suitable for the current crawl settings.

Instead of paying browser costs for every page, this Actor helps optimize crawl performance, extraction quality, and infrastructure spend. It is especially useful for automation teams, AI pipelines, RAG workflows, monitoring systems, lead-enrichment jobs, and research use cases that need clean output plus transparent crawl diagnostics.

🌟 Why this Apify Actor stands out

This Actor is built for users who want to:

  • ⚡ Crawl simple sites cheaply with HTTP and escalate to browser rendering only when needed
  • 🛡️ Detect Cloudflare and JavaScript-heavy pages early and route accordingly
  • 📄 Generate AI-ready markdown and cleaned text for LLM and RAG pipelines
  • 🔍 Understand why a route succeeded, failed, or escalated through transparent diagnostics

The Actor does not bypass site protections. Instead, it detects Cloudflare-related signals, challenge hints, login walls, and other crawl friction, then chooses the most appropriate lawful extraction strategy available under your configuration.

⚙️ How it works

  1. Analyze the target URL and detect site characteristics
  2. Detect frontend technology, Cloudflare signals, and protection levels
  3. Choose the most appropriate crawl strategy (HTTP, Cheerio, or Browser)
  4. Extract content, metadata, and links from the page
  5. Return structured data plus routing diagnostics explaining the decisions made
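
The steps above can be pictured as a small decision function. The sketch below is illustrative only: the `PageSignals` fields, the ordering of the checks, and the `choose_strategy` name are assumptions made for this example, not the Actor's actual internals. The strategy labels mirror the ones named in this README (HTTP, Cheerio, browser, early abort).

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical summary of what the analysis steps observed for one URL."""
    challenge_suspected: bool = False
    login_wall: bool = False
    spa_markers: bool = False   # e.g. hydration markers or a near-empty HTML body
    needs_parsing: bool = True  # caller wants links and metadata, not just raw HTML


def choose_strategy(signals: PageSignals, fallback_to_browser: bool = True) -> tuple[str, str]:
    """Return (strategy, reason). Assumed ordering: blockers first, then cheapest viable route."""
    if signals.challenge_suspected or signals.login_wall:
        return "abort", "Challenge or login wall suspected; skipping with diagnostics"
    if signals.spa_markers:
        if fallback_to_browser:
            return "browser", "SPA markers detected; content is likely rendered client-side"
        return "abort", "JavaScript rendering required but browser fallback is disabled"
    if signals.needs_parsing:
        return "cheerio", "Static HTML; parsing without a browser"
    return "http", "Plain fetch is sufficient"


strategy, reason = choose_strategy(PageSignals(spa_markers=True))
print(strategy, "-", reason)
```

Returning the reason alongside the strategy is what makes a field such as `routeReason` in the output possible: the decision and its explanation are produced in the same place.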

🎯 Best use cases

  • Website content extraction for AI and LLM pipelines
  • Product page scraping and structured product data collection
  • Article and blog extraction for RAG or search indexing
  • Compliance-aware website crawling on mixed static/dynamic sites
  • Cost-optimized browser-vs-HTTP crawl orchestration
  • Crawl diagnostics for protected or JavaScript-heavy targets
  • Apify workflows that need structured content plus explanation of routing decisions

📈 Apify Marketplace positioning

Cloudflare Crawl Router is positioned as an intelligent Apify web scraping Actor for users who need more than a generic crawler. It helps turn websites into clean structured data while improving cost efficiency, crawl transparency, and strategy selection. This makes it well suited for the Apify Marketplace under categories such as website crawler, content scraper, product scraper, AI data extraction, and smart browser automation.

✅ Features

  • 🤖 Automatic crawl strategy routing across HTTP fetch, Cheerio parsing, Playwright rendering, and queue-based site crawling
  • ☁️ Cloudflare-aware signal detection using headers, challenge markers, Turnstile hints, and suspicious response patterns
  • 🧩 Technology fingerprinting for React, Next.js, Nuxt, Vue, Angular, SvelteKit, Astro, WordPress, Shopify, and similar stacks
  • 💸 Cost-aware browser escalation and proxy recommendations
  • 📦 Extraction-ready output for articles, products, jobs, profiles, and generic content pages
  • 📝 Clean text and markdown generation for AI agents, LLM pipelines, summarization, and RAG ingestion
  • 📊 Crawl summaries, routing summaries, domain profiles, and optional HTML or screenshot artifacts
  • 🤝 Robots.txt-aware and sitemap-aware discovery flows for compliant crawling
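
To make the Cloudflare-aware signal detection in the list above more concrete, here is a minimal sketch of header- and body-based checks. The function name, the marker list, and the `turnstileHint` key are assumptions for illustration; the Actor's real detection logic is not documented here and may differ. The checks rely on publicly observable Cloudflare conventions: the `cf-ray` and `server: cloudflare` response headers, the `cf-mitigated: challenge` header, the "Just a moment..." challenge page title, and the Turnstile script host `challenges.cloudflare.com`.

```python
def detect_cloudflare(headers: dict[str, str], body: str) -> dict[str, bool]:
    """Hypothetical detector: flags Cloudflare presence and likely challenge pages."""
    # Header names are case-insensitive, so normalise keys and values before lookup.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    text = body.lower()

    behind_cloudflare = (
        "cf-ray" in h
        or "cf-mitigated" in h
        or h.get("server", "") == "cloudflare"
    )
    turnstile_hint = "challenges.cloudflare.com/turnstile" in text
    challenge = (
        h.get("cf-mitigated", "") == "challenge"
        or "<title>just a moment...</title>" in text
    )
    return {
        "cloudflareDetected": behind_cloudflare,
        "turnstileHint": turnstile_hint,  # assumed key, not part of the documented output
        "challengeSuspected": challenge,
    }


print(detect_cloudflare({"Server": "cloudflare", "CF-RAY": "0000"}, "<title>Hello</title>"))
```

Note that a site can sit behind Cloudflare without serving a challenge, which is why presence and challenge are reported as separate flags.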

📥 Input Configuration

The Actor accepts a JSON input with grouped sections. Here is a complete example:

{
  "routing": {
    "fallbackToBrowser": true,
    "mode": "site-crawl",
    "renderJavaScript": true
  },
  "startUrls": [
    {
      "url": "https://docs.apify.com/academy/web-scraping-for-beginners"
    }
  ],
  "limits": {
    "maxDepth": 1,
    "maxPages": 20,
    "timeoutSecs": 45
  },
  "extraction": {
    "extractionType": "auto"
  },
  "output": {
    "outputFormat": "json",
    "includeMetadata": true,
    "includeDiagnostics": true
  },
  "networking": {
    "useProxy": "auto"
  },
  "compliance": {
    "obeyRobotsTxt": true,
    "sameDomainOnly": true
  },
  "artifacts": {
    "saveHtml": false,
    "saveMarkdown": true,
    "saveScreenshot": false
  },
  "diagnostics": {
    "detectCloudflare": true
  }
}
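
If you build this input programmatically, a small helper can keep the grouped sections consistent. The sketch below reuses only the field names and values shown in the example above. The helper name `build_input`, its defaults, and the range checks are assumptions, and the full set of accepted values for fields such as `mode` or `useProxy` is not documented here.

```python
import json


def build_input(start_urls: list[str], max_pages: int = 20, max_depth: int = 1) -> dict:
    """Assemble an input object shaped like the example above (hypothetical helper)."""
    # Assumed sanity checks; the Actor's own validation rules may differ.
    if max_pages < 1 or max_depth < 0:
        raise ValueError("maxPages must be >= 1 and maxDepth must be >= 0")
    return {
        "routing": {"fallbackToBrowser": True, "mode": "site-crawl", "renderJavaScript": True},
        "startUrls": [{"url": u} for u in start_urls],
        "limits": {"maxDepth": max_depth, "maxPages": max_pages, "timeoutSecs": 45},
        "extraction": {"extractionType": "auto"},
        "output": {"outputFormat": "json", "includeMetadata": True, "includeDiagnostics": True},
        "networking": {"useProxy": "auto"},
        "compliance": {"obeyRobotsTxt": True, "sameDomainOnly": True},
        "artifacts": {"saveHtml": False, "saveMarkdown": True, "saveScreenshot": False},
        "diagnostics": {"detectCloudflare": True},
    }


run_input = build_input(["https://docs.apify.com/academy/web-scraping-for-beginners"])
print(json.dumps(run_input, indent=2))
```

The printed JSON can be pasted into the Actor's input editor, or the dictionary can be supplied as the run input when starting the Actor through the Apify API or a client library.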

💬 Feedback & Reviews

Found a routing edge case or want support for a new extraction pattern? Open an issue or leave a review on Apify. Feedback helps improve strategy detection, extraction quality, and marketplace usability.

📤 Example Dataset Output

Each dataset item includes structured content plus routing diagnostics:

{
  "url": "https://example.com/article/123",
  "finalUrl": "https://example.com/article/123",
  "title": "Example Title",
  "detectedEntityType": "article",
  "crawlStrategyUsed": "browser",
  "routeReason": "Detected SPA hydration markers and deferred content rendering",
  "cloudflareDetected": true,
  "challengeSuspected": false,
  "protectionLevel": "medium",
  "markdown": "# Example Title\n\nArticle content...",
  "text": "Example Title Article content...",
  "links": [],
  "timestamp": "2026-03-16T10:00:00.000Z"
}
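
Downstream code will often want to separate items that are ready for ingestion from items that need review. The sketch below counts items per `crawlStrategyUsed` and collects URLs where `challengeSuspected` is true. It assumes the items carry the fields shown in the example above and have already been loaded as Python dictionaries; the function name `summarize_items` and the summary keys are hypothetical.

```python
from collections import Counter


def summarize_items(items: list[dict]) -> dict:
    """Count items per crawl strategy and list URLs with a suspected challenge."""
    strategies = Counter(item.get("crawlStrategyUsed", "unknown") for item in items)
    flagged = [item["url"] for item in items if item.get("challengeSuspected")]
    # Treat an item as usable when it has markdown and no suspected challenge.
    usable = [
        item for item in items
        if item.get("markdown") and not item.get("challengeSuspected")
    ]
    return {
        "byStrategy": dict(strategies),
        "flaggedUrls": flagged,
        "usableCount": len(usable),
    }


sample = [
    {
        "url": "https://example.com/article/123",
        "crawlStrategyUsed": "browser",
        "cloudflareDetected": True,
        "challengeSuspected": False,
        "markdown": "# Example Title\n\nArticle content...",
    }
]
print(summarize_items(sample))
```

A summary like this makes it easy to see how often the browser route was needed, which is the main driver of runtime and cost.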

⚠️ Limitations

  • Does not bypass CAPTCHAs or authentication walls
  • Success depends on target site structure and crawl permissions
  • Browser rendering increases runtime and cost
  • Some protected sites may be skipped with diagnostics instead of forced crawling