Cloudflare Crawl Router
Pricing
from $0.20 / 1,000 results
An intelligent crawl orchestration Actor that analyzes websites and routes each target to the most suitable compliant crawl strategy for efficient extraction and transparent diagnostics.
Developer
Solutions Smart
Cloudflare Crawl Router for Apify
Cloudflare Crawl Router is an intelligent crawl orchestration Actor for web scraping, website crawling, structured data extraction, and content collection. It inspects each target URL, detects anti-bot friction and frontend complexity, and routes the request to the most efficient compliant strategy: lightweight HTTP, Cheerio-style parsing, browser rendering, or a safe early abort when the site is not suitable for the current crawl settings.
Instead of paying browser costs for every page, this Actor helps optimize crawl performance, extraction quality, and infrastructure spend. It is especially useful for automation teams, AI pipelines, RAG workflows, monitoring systems, lead-enrichment jobs, and research use cases that need clean output plus transparent crawl diagnostics.
Why this Apify Actor stands out
This Actor is built for users who want to:
- Crawl simple sites cheaply with HTTP and escalate to browser rendering only when needed
- Detect Cloudflare and JavaScript-heavy pages early and route accordingly
- Generate AI-ready markdown and cleaned text for LLM and RAG pipelines
- Understand why a route succeeded, failed, or escalated through transparent diagnostics
The Actor is not a bypass tool. It detects Cloudflare-related signals, challenge hints, login walls, and crawl friction, then chooses the most technically appropriate compliant extraction strategy available under your configuration.
How it works
- Analyze the target URL and detect site characteristics
- Detect frontend technology, Cloudflare signals, and protection levels
- Choose the most appropriate crawl strategy (HTTP, Cheerio, or Browser)
- Extract content, metadata, and links from the page
- Return structured data plus routing diagnostics explaining the decisions made
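The routing step above can be sketched roughly as follows. This is an illustration only, not the Actor's actual implementation: the `SiteSignals` fields, the `choose_strategy` function, and the order of the checks are assumptions made for the example. Only the strategy names (HTTP, Cheerio, browser, early abort) come from this README.

```python
from dataclasses import dataclass


@dataclass
class SiteSignals:
    """Hypothetical summary of what the analysis step found for one URL."""
    needs_javascript: bool      # e.g. SPA hydration markers were seen
    challenge_suspected: bool   # e.g. a challenge page came back
    login_wall: bool            # content sits behind authentication
    has_structured_html: bool   # server-rendered HTML worth parsing with selectors


def choose_strategy(signals: SiteSignals, fallback_to_browser: bool = True) -> tuple[str, str]:
    """Return (strategy, reason); strategy is 'abort', 'browser', 'cheerio' or 'http'."""
    # Assumed policy: never force a crawl through a login wall or challenge.
    if signals.login_wall or signals.challenge_suspected:
        return "abort", "login wall or challenge detected"
    # Escalate to a browser only when the page needs JavaScript.
    if signals.needs_javascript:
        if fallback_to_browser:
            return "browser", "content requires JavaScript rendering"
        return "abort", "JavaScript needed but browser fallback is disabled"
    # Cheapest options last: selector-based parsing, then a plain fetch.
    if signals.has_structured_html:
        return "cheerio", "server-rendered HTML can be parsed without a browser"
    return "http", "plain HTTP fetch is sufficient"
```

Returning the reason alongside the strategy mirrors the `routeReason` field in the dataset output, so every decision stays explainable.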
Best use cases
- Website content extraction for AI and LLM pipelines
- Product page scraping and structured product data collection
- Article and blog extraction for RAG or search indexing
- Compliance-aware website crawling on mixed static/dynamic sites
- Cost-optimized browser-vs-HTTP crawl orchestration
- Crawl diagnostics for protected or JavaScript-heavy targets
- Apify workflows that need structured content plus explanation of routing decisions
Apify Marketplace positioning
Cloudflare Crawl Router is positioned as an intelligent Apify web scraping Actor for users who need more than a generic crawler. It helps turn websites into clean structured data while improving cost efficiency, crawl transparency, and strategy selection. This makes it well suited for the Apify Marketplace under categories such as website crawler, content scraper, product scraper, AI data extraction, and smart browser automation.
Features
- Automatic crawl strategy routing across HTTP fetch, Cheerio parsing, Playwright rendering, and queue-based site crawling
- Cloudflare-aware signal detection using headers, challenge markers, Turnstile hints, and suspicious response patterns
- Technology fingerprinting for React, Next.js, Nuxt, Vue, Angular, SvelteKit, Astro, WordPress, Shopify, and similar stacks
- Cost-aware browser escalation and proxy recommendations
- Extraction-ready output for articles, products, jobs, profiles, and generic content pages
- Clean text and markdown generation for AI agents, LLM pipelines, summarization, and RAG ingestion
- Crawl summaries, routing summaries, domain profiles, and optional HTML or screenshot artifacts
- Robots.txt-aware and sitemap-aware discovery flows for compliant crawling
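To make the Cloudflare-aware signal detection feature more concrete, here is a minimal heuristic sketch. It is an assumption about how such detection could look, not the Actor's code: `detect_cloudflare` and the `turnstileHint` key are hypothetical, and treating 403/429/503 responses as "suspicious" is an illustrative choice. The header names (`server`, `cf-ray`, `cf-mitigated`) and the Turnstile script host are ones Cloudflare actually uses.

```python
def detect_cloudflare(status: int, headers: dict[str, str], body: str) -> dict[str, bool]:
    """Inspect one HTTP response for Cloudflare-related signals (heuristic only)."""
    # Header names are case-insensitive, so normalise them first.
    h = {k.lower(): v.lower() for k, v in headers.items()}

    # Responses served through Cloudflare carry `server: cloudflare` and a `cf-ray` ID.
    behind_cloudflare = h.get("server", "") == "cloudflare" or "cf-ray" in h

    # Cloudflare marks challenge pages with `cf-mitigated: challenge`.
    challenge_header = h.get("cf-mitigated") == "challenge"

    # Turnstile widgets load their script from challenges.cloudflare.com/turnstile.
    turnstile = "challenges.cloudflare.com/turnstile" in body

    # Assumed "suspicious response pattern": a block-like status from a Cloudflare edge.
    suspicious_status = behind_cloudflare and status in (403, 429, 503)

    return {
        "cloudflareDetected": behind_cloudflare,
        "challengeSuspected": challenge_header or suspicious_status,
        "turnstileHint": turnstile,
    }
```

Signals like these only describe the response. What the crawler does with them (escalate, slow down, or abort) is a separate routing decision.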
Input Configuration
The Actor accepts a JSON input with grouped sections. Here is a complete example:
{
  "routing": {
    "fallbackToBrowser": true,
    "mode": "site-crawl",
    "renderJavaScript": true
  },
  "startUrls": [
    {"url": "https://docs.apify.com/academy/web-scraping-for-beginners"}
  ],
  "limits": {
    "maxDepth": 1,
    "maxPages": 20,
    "timeoutSecs": 45
  },
  "extraction": {
    "extractionType": "auto"
  },
  "output": {
    "outputFormat": "json",
    "includeMetadata": true,
    "includeDiagnostics": true
  },
  "networking": {
    "useProxy": "auto"
  },
  "compliance": {
    "obeyRobotsTxt": true,
    "sameDomainOnly": true
  },
  "artifacts": {
    "saveHtml": false,
    "saveMarkdown": true,
    "saveScreenshot": false
  },
  "diagnostics": {
    "detectCloudflare": true
  }
}
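If you build this input programmatically, a small helper along these lines can keep the grouped sections consistent. It is a sketch under assumptions: `build_run_input` and `EXAMPLE_INPUT` are hypothetical names, and the values are copied from the example input above rather than from the Actor's documented defaults, which this README does not state.

```python
import copy

# Section values copied from the example input; not necessarily the Actor's defaults.
EXAMPLE_INPUT = {
    "routing": {"fallbackToBrowser": True, "mode": "site-crawl", "renderJavaScript": True},
    "limits": {"maxDepth": 1, "maxPages": 20, "timeoutSecs": 45},
    "extraction": {"extractionType": "auto"},
    "output": {"outputFormat": "json", "includeMetadata": True, "includeDiagnostics": True},
    "networking": {"useProxy": "auto"},
    "compliance": {"obeyRobotsTxt": True, "sameDomainOnly": True},
    "artifacts": {"saveHtml": False, "saveMarkdown": True, "saveScreenshot": False},
    "diagnostics": {"detectCloudflare": True},
}


def build_run_input(start_urls: list[str], **section_overrides: dict) -> dict:
    """Return a full input dict, overriding individual keys inside named sections."""
    run_input = copy.deepcopy(EXAMPLE_INPUT)  # never mutate the shared template
    for section, values in section_overrides.items():
        if section not in EXAMPLE_INPUT:
            raise KeyError(f"unknown input section: {section}")
        run_input[section].update(values)
    run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


# Example: a cheaper run limited to five pages, with no browser escalation.
cheap_run = build_run_input(
    ["https://docs.apify.com/academy/web-scraping-for-beginners"],
    limits={"maxPages": 5},
    routing={"fallbackToBrowser": False},
)
```

With the official `apify-client` Python package, a dict like `cheap_run` would typically be passed as `run_input` when calling the Actor. The Actor ID is not given in this README, so that call is omitted here.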
Feedback & Reviews
Found a routing edge case or want support for a new extraction pattern? Open an issue or leave a review on Apify. Feedback helps improve strategy detection, extraction quality, and marketplace usability.
Example Dataset Output
Each dataset item includes structured content plus routing diagnostics:
{
  "url": "https://example.com/article/123",
  "finalUrl": "https://example.com/article/123",
  "title": "Example Title",
  "detectedEntityType": "article",
  "crawlStrategyUsed": "browser",
  "routeReason": "Detected SPA hydration markers and deferred content rendering",
  "cloudflareDetected": true,
  "challengeSuspected": false,
  "protectionLevel": "medium",
  "markdown": "# Example Title\n\nArticle content...",
  "text": "Example Title Article content...",
  "links": [],
  "timestamp": "2026-03-16T10:00:00.000Z"
}
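For a RAG or LLM pipeline, dataset items shaped like the example above could be filtered before ingestion. The sketch below is one possible approach: `usable_documents` and the output keys `source`, `strategy`, and `content` are hypothetical, and only the input field names are taken from the example item.

```python
def usable_documents(items: list[dict]) -> list[dict]:
    """Keep items with markdown content and no suspected challenge page."""
    docs = []
    for item in items:
        # A suspected challenge usually means the markdown is not the real page.
        if item.get("challengeSuspected"):
            continue
        markdown = (item.get("markdown") or "").strip()
        if not markdown:
            continue
        docs.append({
            "source": item.get("finalUrl") or item.get("url"),
            "title": item.get("title", ""),
            "strategy": item.get("crawlStrategyUsed"),
            "content": markdown,
        })
    return docs
```

Keeping `crawlStrategyUsed` next to the content makes it easy to check later how much of a corpus needed browser rendering.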
Limitations
- Does not bypass CAPTCHAs or authentication walls
- Success depends on target site structure and crawl permissions
- Browser rendering increases runtime and cost
- Some protected sites may be skipped with diagnostics instead of forced crawling