Pricing

from $0.20 / 1,000 results

Cloudflare Crawl Router

An intelligent crawl orchestration Actor that analyzes websites and routes each target to the most suitable compliant crawl strategy for efficient extraction and transparent diagnostics.

Developer

Solutions Smart

🚀 Cloudflare Crawl Router for Apify

Cloudflare Crawl Router is an intelligent Apify Actor for web scraping, website crawling, structured data extraction, and content collection. It inspects each target URL, detects anti-bot friction and frontend complexity, and automatically routes the request to the most efficient compliant strategy: lightweight HTTP, Cheerio-style parsing, browser rendering, or a safe early abort when the site is not suitable for the current crawl settings.

Instead of paying browser costs for every page, this Actor helps optimize crawl performance, extraction quality, and infrastructure spend. It is especially useful for automation teams, AI pipelines, RAG workflows, monitoring systems, lead-enrichment jobs, and research use cases that need clean output plus transparent crawl diagnostics.

🌟 Why this Apify Actor stands out

This Actor is built for users who want to:

⚡ Crawl simple sites cheaply with HTTP before escalating to browser rendering when needed
🛡️ Detect Cloudflare and JavaScript-heavy pages early and route accordingly
📄 Generate AI-ready markdown and cleaned text for LLM and RAG pipelines
🔍 Understand why a route succeeded, failed, or escalated through transparent diagnostics

It does not attempt to bypass protections. Instead, it detects Cloudflare-related signals, challenge hints, login walls, and crawl friction, then chooses the best legal and technically appropriate extraction strategy available under your configuration.

⚙️ How it works

1. Analyze the target URL and detect site characteristics
2. Detect frontend technology, Cloudflare signals, and protection levels
3. Choose the most appropriate crawl strategy (HTTP, Cheerio, or Browser)
4. Extract content, metadata, and links from the page
5. Return structured data plus routing diagnostics explaining the decisions made
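
The routing step above can be pictured as a small decision function. The following is a minimal sketch, not the Actor's actual logic: the `SiteSignals` fields, the order of the checks, and the reason strings are all assumptions made for illustration. Only the strategy names (HTTP, Cheerio, browser, early abort) and the `fallbackToBrowser` idea come from this page.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SiteSignals:
    """Hypothetical signals collected while analyzing a target URL."""
    challenge_suspected: bool = False   # an interactive challenge seems to be in the way
    login_wall: bool = False            # content sits behind authentication
    js_heavy: bool = False              # content only appears after client-side rendering
    needs_dom_parsing: bool = True      # HTML must be parsed for content and links


def choose_strategy(signals: SiteSignals, fallback_to_browser: bool = True) -> Tuple[str, str]:
    """Return (strategy, reason), preferring the cheapest strategy that can work."""
    # Never force a crawl through a challenge or login wall: abort early.
    if signals.challenge_suspected or signals.login_wall:
        return "abort", "Challenge or login wall detected; skipped with diagnostics"
    # JavaScript-heavy pages need a browser, if escalation is allowed.
    if signals.js_heavy:
        if fallback_to_browser:
            return "browser", "Client-side rendering detected; escalated to browser"
        return "abort", "Client-side rendering detected but browser fallback is disabled"
    # Static pages: parse HTML cheaply, or fetch raw content when no parsing is needed.
    if signals.needs_dom_parsing:
        return "cheerio", "Static HTML; lightweight parsing is sufficient"
    return "http", "Plain response; raw HTTP fetch is sufficient"
```

Returning the reason alongside the strategy mirrors the `crawlStrategyUsed` and `routeReason` fields shown in the example dataset output.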

🎯 Best use cases

Website content extraction for AI and LLM pipelines
Product page scraping and structured product data collection
Article and blog extraction for RAG or search indexing
Compliance-aware website crawling on mixed static/dynamic sites
Cost-optimized browser-vs-HTTP crawl orchestration
Crawl diagnostics for protected or JavaScript-heavy targets
Apify workflows that need structured content plus explanation of routing decisions

📈 Apify Marketplace positioning

Cloudflare Crawl Router is positioned as an intelligent Apify web scraping Actor for users who need more than a generic crawler. It helps turn websites into clean structured data while improving cost efficiency, crawl transparency, and strategy selection. This makes it well suited for the Apify Marketplace under categories such as website crawler, content scraper, product scraper, AI data extraction, and smart browser automation.

✅ Features

🤖 Automatic crawl strategy routing across HTTP fetch, Cheerio parsing, Playwright rendering, and queue-based site crawling
☁️ Cloudflare-aware signal detection using headers, challenge markers, Turnstile hints, and suspicious response patterns
🧩 Technology fingerprinting for React, Next.js, Nuxt, Vue, Angular, SvelteKit, Astro, WordPress, Shopify, and similar stacks
💸 Cost-aware browser escalation and proxy recommendations
📦 Extraction-ready output for articles, products, jobs, profiles, and generic content pages
📝 Clean text and markdown generation for AI agents, LLM pipelines, summarization, and RAG ingestion
📊 Crawl summaries, routing summaries, domain profiles, and optional HTML or screenshot artifacts
🤝 Robots.txt-aware and sitemap-aware discovery flows for compliant crawling
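
To make the Cloudflare-aware signal detection more concrete, here is a rough sketch of header- and marker-based checks. It is an illustration under stated assumptions, not the Actor's rule set: the function name and the `turnstileHint` key are hypothetical, and the checks use only commonly observed Cloudflare indicators (the `cf-ray` and `server: cloudflare` response headers, the `cf-mitigated: challenge` header, the "Just a moment..." interstitial text, and the Turnstile script host).

```python
from typing import Dict, Mapping


def detect_cloudflare(status: int, headers: Mapping[str, str], body: str = "") -> Dict[str, bool]:
    """Classify a response using commonly observed Cloudflare indicators."""
    # Header names are case-insensitive, so normalize them first.
    h = {k.lower(): v for k, v in headers.items()}

    # Responses served through Cloudflare typically carry these headers.
    cloudflare = "cf-ray" in h or h.get("server", "").lower() == "cloudflare"

    # Turnstile widgets load a script from this host.
    turnstile = "challenges.cloudflare.com/turnstile" in body

    # A challenge is suspected when Cloudflare flags it explicitly, or when a
    # blocked status code comes with the usual interstitial text.
    challenge = cloudflare and (
        h.get("cf-mitigated", "").lower() == "challenge"
        or (status in (403, 503) and "just a moment" in body.lower())
    )

    return {
        "cloudflareDetected": cloudflare,
        "challengeSuspected": challenge,
        "turnstileHint": turnstile,  # hypothetical key, not part of the documented output
    }
```

Note that detecting Cloudflare is not the same as being blocked: a site can sit behind Cloudflare and still serve pages normally, which is why the two flags are reported separately.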

📥 Input Configuration

The Actor accepts a JSON input with grouped sections. Here is a complete example:

{
  "routing": {
    "fallbackToBrowser": true,
    "mode": "site-crawl",
    "renderJavaScript": true
  },
  "startUrls": [
    {
      "url": "https://docs.apify.com/academy/web-scraping-for-beginners"
    }
  ],
  "limits": {
    "maxDepth": 1,
    "maxPages": 20,
    "timeoutSecs": 45
  },
  "extraction": {
    "extractionType": "auto"
  },
  "output": {
    "outputFormat": "json",
    "includeMetadata": true,
    "includeDiagnostics": true
  },
  "networking": {
    "useProxy": "auto"
  },
  "compliance": {
    "obeyRobotsTxt": true,
    "sameDomainOnly": true
  },
  "artifacts": {
    "saveHtml": false,
    "saveMarkdown": true,
    "saveScreenshot": false
  },
  "diagnostics": {
    "detectCloudflare": true
  }
}
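
If you generate this input programmatically, a small helper can keep the grouped structure consistent. The sketch below is only an illustration: `build_input` and its validation rules are hypothetical, and the field names and default values are copied from the example above rather than from a published input schema.

```python
import json
from typing import Any, Dict, List


def build_input(urls: List[str], max_pages: int = 20, max_depth: int = 1,
                timeout_secs: int = 45) -> Dict[str, Any]:
    """Build an input object shaped like the example above, with basic checks."""
    if not urls:
        raise ValueError("at least one start URL is required")
    for url in urls:
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    if max_pages < 1 or max_depth < 0 or timeout_secs < 1:
        raise ValueError("limits must be positive (maxDepth may be 0)")

    return {
        "routing": {"fallbackToBrowser": True, "mode": "site-crawl", "renderJavaScript": True},
        "startUrls": [{"url": url} for url in urls],
        "limits": {"maxDepth": max_depth, "maxPages": max_pages, "timeoutSecs": timeout_secs},
        "extraction": {"extractionType": "auto"},
        "output": {"outputFormat": "json", "includeMetadata": True, "includeDiagnostics": True},
        "networking": {"useProxy": "auto"},
        "compliance": {"obeyRobotsTxt": True, "sameDomainOnly": True},
        "artifacts": {"saveHtml": False, "saveMarkdown": True, "saveScreenshot": False},
        "diagnostics": {"detectCloudflare": True},
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the Actor's input editor.
    print(json.dumps(build_input(["https://docs.apify.com/academy/web-scraping-for-beginners"]), indent=2))
```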

💬 Feedback & Reviews

Found a routing edge case or want support for a new extraction pattern? Open an issue or leave a review on Apify. Feedback helps improve strategy detection, extraction quality, and marketplace usability.

📤 Example Dataset Output

Each dataset item includes structured content plus routing diagnostics:

{
  "url": "https://example.com/article/123",
  "finalUrl": "https://example.com/article/123",
  "title": "Example Title",
  "detectedEntityType": "article",
  "crawlStrategyUsed": "browser",
  "routeReason": "Detected SPA hydration markers and deferred content rendering",
  "cloudflareDetected": true,
  "challengeSuspected": false,
  "protectionLevel": "medium",
  "markdown": "# Example Title\n\nArticle content...",
  "text": "Example Title Article content...",
  "links": [],
  "timestamp": "2026-03-16T10:00:00.000Z"
}
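
Downstream code can use the diagnostics fields to audit routing decisions and to collect markdown for LLM or RAG ingestion. The following sketch assumes items shaped like the example above; `summarize_items` and the keys of its return value are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_items(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Count strategies used, list suspected challenges, and collect markdown documents."""
    items = list(items)

    # How often each crawl strategy was chosen (useful for cost review).
    by_strategy = Counter(item.get("crawlStrategyUsed", "unknown") for item in items)

    # URLs where a challenge was suspected and extraction may be incomplete.
    challenged = [item.get("url") for item in items if item.get("challengeSuspected")]

    # Markdown documents ready for indexing, keyed by their final URL.
    documents = [
        {"source": item.get("finalUrl") or item.get("url"), "markdown": item["markdown"]}
        for item in items
        if item.get("markdown")
    ]

    return {"byStrategy": dict(by_strategy), "challenged": challenged, "documents": documents}
```

Applied to the single example item above, this would report one `browser` route, no suspected challenges, and one markdown document.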

⚠️ Limitations

Does not bypass CAPTCHAs or authentication walls
Success depends on target site structure and crawl permissions
Browser rendering increases runtime and cost
Some protected sites may be skipped with diagnostics instead of forced crawling
