Xcrawl Search Scrape Actor

Under maintenance

Pricing

Pay per usage

Developer

Charles

Maintained by Community

XCrawl Web Search & Scrape — Apify Actor

Search the web and scrape any URL using XCrawl's residential proxy network. Bypass anti-bot systems with automatic JS rendering fallback and global IP rotation.

Actor: yanxvdong123/xcrawl-search-scrape | Runtime: Node.js 22 | License: MIT


🚀 Quick Start

  1. Open the Actor Console
  2. Set XCRAWL_API_KEY in Environment Variables (get a free key at dash.xcrawl.com)
  3. Choose Search or Scrape mode, fill in the inputs
  4. Hit Run

No credit card needed — XCrawl gives free trial credits on signup.
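
The same run can also be started outside the Console. The following is a minimal sketch, assuming the Actor is reachable through Apify's public `run-sync-get-dataset-items` endpoint and that `XCRAWL_API_KEY` is already set in the Actor's environment variables. `buildRunRequest` and `runActor` are hypothetical helper names, and the Apify token is your own.

```javascript
// Actor ID in the "username~actor-name" form used by Apify API paths.
const ACTOR_ID = "yanxvdong123~xcrawl-search-scrape";

// Hypothetical helper: builds the HTTP request without sending it.
function buildRunRequest(input, apifyToken) {
  return {
    url: `https://api.apify.com/v2/acts/${ACTOR_ID}/run-sync-get-dataset-items`,
    options: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apifyToken}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Hypothetical helper: runs the Actor and resolves with the dataset items.
async function runActor(input, apifyToken) {
  const { url, options } = buildRunRequest(input, apifyToken);
  const response = await fetch(url, options); // global fetch, available in Node.js 22
  if (!response.ok) {
    throw new Error(`Apify API responded with HTTP ${response.status}`);
  }
  return response.json();
}

// Example call (not executed here):
// runActor({ action: "search", query: "residential proxies" }, process.env.APIFY_TOKEN)
//   .then((items) => console.log(items.length));
```

The input object uses the same field names as the Input Parameters section; the `APIFY_TOKEN` variable name is only an illustration.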


📋 Input Parameters

Search Mode (action: "search")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| query | string | required | Web search query (max 200 chars) |
| limit | integer | 10 | Number of results (1–50) |
| location | string | "US" | Geo-location code (US, UK, CN, JP, DE, etc.) |
| language | string | "en" | Search language (en, zh, ja, fr, etc.) |
| withContent | boolean | true | Fetch full page content for each result |
| render | boolean | false | JS rendering for anti-bot bypass |
| formats | string | "markdown,summary" | Output formats: comma-separated (markdown, summary, html) |
| screenshot | boolean | false | Capture page screenshot (requires render=true) |
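
The defaults and limits in the table can be checked before a run is started. This is a minimal sketch; `buildSearchInput` is a hypothetical helper that is not part of the Actor, and whether the Actor itself rejects out-of-range values is not stated in this README.

```javascript
// Defaults copied from the Search Mode table.
const SEARCH_DEFAULTS = {
  limit: 10,
  location: "US",
  language: "en",
  withContent: true,
  render: false,
  formats: "markdown,summary",
  screenshot: false,
};

// Hypothetical helper: merges defaults and enforces the documented limits.
function buildSearchInput(overrides) {
  const input = { ...SEARCH_DEFAULTS, ...overrides, action: "search" };
  if (typeof input.query !== "string" || input.query.length === 0) {
    throw new Error("query is required");
  }
  if (input.query.length > 200) {
    throw new Error("query must be at most 200 characters");
  }
  if (!Number.isInteger(input.limit) || input.limit < 1 || input.limit > 50) {
    throw new Error("limit must be an integer from 1 to 50");
  }
  if (input.screenshot && !input.render) {
    throw new Error("screenshot requires render=true");
  }
  return input;
}

console.log(buildSearchInput({ query: "residential proxy pricing", limit: 5 }));
```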

Scrape Mode (action: "scrape")

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| url | string | required | Single URL to scrape (max 2000 chars) |
| render | boolean | false | JS rendering for anti-bot bypass |
| formats | string | "markdown,summary" | Output formats |
| screenshot | boolean | false | Capture screenshot (requires render=true) |
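
A scrape input can be assembled the same way. The sketch below assumes that scrape mode accepts the same three format names listed for search mode and that whitespace around commas is harmless; neither is confirmed here. `parseFormats` and `buildScrapeInput` are hypothetical helpers.

```javascript
// Format names taken from the Search Mode table; assumed to apply to scrape mode too.
const KNOWN_FORMATS = new Set(["markdown", "summary", "html"]);

// Hypothetical helper: splits the comma-separated string and rejects unknown names.
function parseFormats(formats = "markdown,summary") {
  const parsed = formats
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  const unknown = parsed.filter((name) => !KNOWN_FORMATS.has(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown format(s): ${unknown.join(", ")}`);
  }
  return [...new Set(parsed)]; // drop duplicates, keep order
}

// Hypothetical helper: applies the documented defaults and limits for scrape mode.
function buildScrapeInput({ url, render = false, formats = "markdown,summary", screenshot = false }) {
  if (typeof url !== "string" || url.length === 0) {
    throw new Error("url is required");
  }
  if (url.length > 2000) {
    throw new Error("url must be at most 2000 characters");
  }
  new URL(url); // throws a TypeError if the URL is malformed
  if (screenshot && !render) {
    throw new Error("screenshot requires render=true");
  }
  return { action: "scrape", url, render, formats: parseFormats(formats).join(","), screenshot };
}

console.log(buildScrapeInput({ url: "https://example.com", formats: "markdown, html" }));
```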

🧠 Intelligent Anti-Block System

This actor is built to handle modern anti-bot systems out of the box:

  • Automatic block detection — Heuristically checks for Cloudflare, DataDome, and other challenge pages (looks for captcha forms, browser verification, access denied messages)
  • Smart retry — If a page appears blocked, automatically retries with headless browser rendering (Chromium via XCrawl's jsRender)
  • Concurrent crawling — Uses p-limit to run up to 5 parallel scrapes (balanced for speed + reliability)
  • Global proxy pool — Requests route through XCrawl's residential proxy network with configurable geo-location
  • Per-URL resilience — Each URL gets at least 2 attempts; if both fail, the error is recorded per-entry without stopping the batch
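
The detect-then-retry behaviour described above can be sketched as follows. The marker patterns are illustrative only, since the Actor's real heuristics are not published in this README. `scrapeFn` is a hypothetical stand-in for the XCrawl scrape call, and the `"failed"` status value and the choice to retry after a thrown error are assumptions.

```javascript
// Illustrative challenge-page markers; not the Actor's actual list.
const BLOCK_MARKERS = [
  /captcha/i,
  /verify(ing)? (that )?you are (a )?human/i,
  /checking your browser/i,
  /access denied/i,
];

function looksBlocked(html) {
  return typeof html === "string" && BLOCK_MARKERS.some((pattern) => pattern.test(html));
}

// scrapeFn is a hypothetical stand-in: (url, { render }) => Promise<{ html }>
async function scrapeWithFallback(url, scrapeFn) {
  try {
    const first = await scrapeFn(url, { render: false });
    if (!looksBlocked(first.html)) {
      return { url, ...first, rendered: false, scrapeStatus: "completed", scrapeError: null };
    }
  } catch (err) {
    // Assumption: a failed plain attempt also falls through to the rendered retry.
  }
  try {
    const second = await scrapeFn(url, { render: true });
    if (looksBlocked(second.html)) {
      throw new Error("still blocked after JS rendering");
    }
    return { url, ...second, rendered: true, scrapeStatus: "completed", scrapeError: null };
  } catch (err) {
    // Both attempts failed: record the error on the entry instead of throwing.
    return { url, rendered: true, scrapeStatus: "failed", scrapeError: err.message };
  }
}

// Fake scraper for demonstration: blocked without rendering, fine with it.
const fakeScrape = async (url, { render }) =>
  render ? { html: "<h1>Article</h1>" } : { html: "<title>Access Denied</title>" };

scrapeWithFallback("https://example.com", fakeScrape).then((result) =>
  console.log(result.rendered, result.scrapeStatus) // logs: true completed
);
```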

When to enable render

✅ Turn ON for: News sites with paywalls (Reuters, WSJ), sites behind Cloudflare/DataDome, JavaScript-heavy SPAs
❌ Keep OFF for: Simple HTML pages, blogs, documentation (faster and cheaper without rendering)


📦 Output Format

Each result is pushed to the Apify dataset:

{
  "title": "Page Title",
  "url": "https://example.com",
  "snippet": "Search result description",
  "markdown": "Full page content converted to markdown...",
  "summary": "AI-generated summary from XCrawl...",
  "scrapeStatus": "completed",
  "screenshot": "base64-encoded PNG (if enabled)",
  "credits": "0.5",
  "scrapeError": null
}

Search mode returns an array of enriched results.
Scrape mode returns a single result object.
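
Downstream code usually needs to separate usable entries from failed ones and total the credits. A minimal sketch, assuming a failed entry carries a non-null `scrapeError` or a `scrapeStatus` other than `"completed"`, and that `credits` is a numeric string as in the sample above. `summarizeResults` is a hypothetical helper.

```javascript
// Hypothetical helper: splits dataset items into ok/failed and sums credits.
function summarizeResults(items) {
  const ok = [];
  const failed = [];
  let credits = 0;
  for (const item of items) {
    const spent = Number(item.credits); // "0.5" -> 0.5; missing values become NaN
    if (Number.isFinite(spent)) {
      credits += spent;
    }
    if (item.scrapeError == null && item.scrapeStatus === "completed") {
      ok.push(item);
    } else {
      failed.push(item);
    }
  }
  return { ok, failed, credits };
}

const summary = summarizeResults([
  { url: "https://example.com/a", scrapeStatus: "completed", credits: "0.5", scrapeError: null },
  { url: "https://example.com/b", scrapeStatus: "failed", credits: "0.5", scrapeError: "timeout" },
]);
console.log(summary.ok.length, summary.failed.length, summary.credits); // logs: 1 1 1
```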


💰 Usage & Pricing

| Mode | XCrawl Credits Consumed |
| --- | --- |
| Search (1 query) | ~1 credit |
| Scrape (no render) | ~1–3 credits |
| Scrape (with render) | ~3–8 credits |
| Free trial | ✅ Included with XCrawl signup |

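The Actor itself adds no fee of its own on Apify. The listing is "Pay per usage", so standard Apify platform usage applies, and the XCrawl API credits you consume are billed separately by XCrawl.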
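
The approximate figures in the table give a rough budget for a batch of scrapes. The sketch below just multiplies the table's ranges by the number of URLs. It ignores automatic render retries and any per-result cost of search with `withContent`, neither of which is quantified here. `estimateScrapeCredits` is a hypothetical helper.

```javascript
// Approximate per-URL ranges copied from the pricing table.
const SCRAPE_CREDIT_RANGES = {
  plain: { min: 1, max: 3 },
  rendered: { min: 3, max: 8 },
};

// Hypothetical helper: returns a rough min/max credit budget for a batch.
function estimateScrapeCredits(urlCount, { render = false } = {}) {
  if (!Number.isInteger(urlCount) || urlCount < 0) {
    throw new RangeError("urlCount must be a non-negative integer");
  }
  const range = render ? SCRAPE_CREDIT_RANGES.rendered : SCRAPE_CREDIT_RANGES.plain;
  return { min: urlCount * range.min, max: urlCount * range.max };
}

console.log(estimateScrapeCredits(20)); // { min: 20, max: 60 }
console.log(estimateScrapeCredits(20, { render: true })); // { min: 60, max: 160 }
```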


🔧 Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| XCRAWL_API_KEY | ✅ Yes | Your API key from dash.xcrawl.com. Sign up → Dashboard → API Keys |
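
Code that depends on this variable can fail fast with a clear message when it is missing. A minimal sketch; `getXcrawlApiKey` is a hypothetical helper and does not necessarily match how the Actor reads the key.

```javascript
// Hypothetical helper: reads the key from the environment or throws a descriptive error.
function getXcrawlApiKey(env = process.env) {
  const key = (env.XCRAWL_API_KEY ?? "").trim();
  if (key === "") {
    throw new Error(
      "XCRAWL_API_KEY is not set. Create a key at dash.xcrawl.com (Dashboard, API Keys) " +
        "and add it to the Actor's environment variables."
    );
  }
  return key;
}

// Typical use at startup (not executed here):
// const apiKey = getXcrawlApiKey();
```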

🎯 Use Cases

  • Content research — Collect articles, blog posts, and documentation on any topic
  • Market intelligence — Scrape competitor pricing, product listings, and reviews
  • SEO / SERP monitoring — Track search rankings across different geo-locations
  • RAG / LLM pipelines — Feed clean markdown content into vector databases or AI agents
  • E-commerce — Monitor product catalogs with location-specific searches
  • News aggregation — Gather articles from multiple sources with automatic paywall bypass

🏗 Architecture

Apify Run
└─ src/main.js (entry point)
   ├─ XCrawl Search API → Get top results
   ├─ XCrawl Scrape API → Extract page content
   │  └─ p-limit (concurrency = 5)
   │     ├─ Normal scrape (fast)
   │     └─ Retry with JS render (anti-bot fallback)
   └─ Apify Dataset ← Push all results
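
The concurrency stage of this pipeline can be sketched without external dependencies. `createLimiter` below is a small stand-in for what p-limit provides, not p-limit itself, and `scrapeOne` is a hypothetical per-URL scrape function. The `"failed"` status value is an assumption.

```javascript
// Minimal stand-in for p-limit: runs at most `concurrency` tasks at a time.
function createLimiter(concurrency) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= concurrency || queue.length === 0) return;
    active += 1;
    const { task, resolve, reject } = queue.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  };
  return (task) =>
    new Promise((resolve, reject) => {
      queue.push({ task, resolve, reject });
      next();
    });
}

// Scrapes every URL with bounded parallelism; a failure is recorded per entry
// so one bad URL does not stop the batch.
async function scrapeAll(urls, scrapeOne, concurrency = 5) {
  const limit = createLimiter(concurrency);
  return Promise.all(
    urls.map((url) =>
      limit(() => scrapeOne(url)).catch((err) => ({
        url,
        scrapeStatus: "failed",
        scrapeError: err.message,
      }))
    )
  );
}

// Demo with a fake scraper; results keep the order of the input URLs.
scrapeAll(["https://example.com/a", "https://example.com/b"], async (url) => ({
  url,
  scrapeStatus: "completed",
  scrapeError: null,
})).then((results) => console.log(results.length)); // logs: 2
```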