
Firecrawl Website Crawler - Full Site Crawling with JS Rendering & Anti-Bot Bypass

Enhanced website crawler using Firecrawl's Crawl API for superior JavaScript rendering, smart rate limiting, anti-bot bypass, and clean markdown extraction. Crawl entire websites with subdomain support, URL pattern filtering, optional screenshots, and geo-targeting. Perfect for content migration, SEO audits, research, and training data collection.

Features

  • Superior JS Rendering - Handles complex JavaScript-heavy websites (React, Vue, Angular)
  • Anti-Bot Bypass - Built-in techniques to avoid Cloudflare and other protection
  • Smart Rate Limiting - Automatic throttling to prevent IP bans
  • Clean Markdown Output - Beautifully formatted content extraction
  • Subdomain Crawling - Optionally include all subdomains
  • URL Pattern Filtering - Include or exclude specific URL patterns (regex)
  • Screenshot Capture - Optional visual snapshots of every page
  • Geo-Targeting - Crawl from specific countries for localized content
  • Configurable Depth - Control how deep the crawler follows links
  • Wait Selectors - Wait for specific elements on JS-heavy sites
  • Webhook Support - Async delivery for automation pipelines
  • Demo Mode - Test with sample data before going live

Who Should Use This Actor?

Content Migration Teams

Extract all content from legacy websites for platform migrations. Get clean markdown that can be imported into new CMS platforms without manual copying.

SEO Agencies

Crawl client sites for technical SEO audits. Extract content, check for thin pages, and identify optimization opportunities across entire domains.

Research & Analytics Teams

Gather comprehensive website data for competitive research. Analyze content strategies, messaging, and site structure.

AI/ML Engineers

Collect training data from websites at scale. Get clean, structured content for LLM fine-tuning and RAG applications.

Legal & Compliance Teams

Archive website content with timestamps for compliance documentation. Screenshot capture provides visual records.

Knowledge Management Teams

Build internal knowledge bases by crawling company wikis and documentation sites into searchable markdown.

Quick Start

Demo Mode (Free Test)

{
  "demoMode": true
}

Basic Website Crawl

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://example.com",
  "maxPages": 50,
  "outputFormat": "markdown",
  "demoMode": false
}

Crawl Specific Sections Only

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://docs.example.com",
  "maxPages": 100,
  "includePatterns": ["/docs/.*", "/guides/.*"],
  "excludePatterns": ["/blog/.*", "/changelog/.*"],
  "demoMode": false
}

Deep Crawl with Screenshots

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://example.com",
  "maxPages": 200,
  "maxDepth": 10,
  "includeSubdomains": true,
  "includeScreenshots": true,
  "demoMode": false
}

JavaScript-Heavy Site

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://react-app.example.com",
  "maxPages": 25,
  "waitForSelector": ".main-content",
  "demoMode": false
}

Input Parameters

Parameter          | Type    | Default    | Description
url                | string  | -          | Website URL to crawl (required unless demoMode)
maxPages           | number  | 100        | Maximum pages to crawl
maxDepth           | number  | 5          | Maximum crawl depth from start URL
includeSubdomains  | boolean | false      | Include subdomains in crawl
includePatterns    | array   | -          | Only crawl URLs matching these regex patterns
excludePatterns    | array   | -          | Skip URLs matching these regex patterns
outputFormat       | string  | "markdown" | Content format: markdown, html, text, links
includeScreenshots | boolean | false      | Capture page screenshots
waitForSelector    | string  | -          | CSS selector to wait for (JS-heavy sites)
country            | string  | -          | Country code for geo-targeting
firecrawlApiKey    | string  | -          | Your Firecrawl API key (BYOK)
demoMode           | boolean | true       | Return sample data for testing
webhookUrl         | string  | -          | Webhook URL for results delivery
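Before launching a paid run, it can help to sanity-check an input object against the documented defaults. The sketch below is illustrative only: the defaults and validation rules are inferred from the table above, not taken from the actor's published input schema.

```python
# Defaults inferred from the parameter table above (not the actor's own schema).
DEFAULTS = {
    "maxPages": 100,
    "maxDepth": 5,
    "includeSubdomains": False,
    "outputFormat": "markdown",
    "includeScreenshots": False,
    "demoMode": True,
}

VALID_FORMATS = {"markdown", "html", "text", "links"}

def build_input(user_input: dict) -> dict:
    """Merge user input with documented defaults and validate the result."""
    cfg = {**DEFAULTS, **user_input}
    if not cfg["demoMode"]:
        # url and firecrawlApiKey are only required for live (non-demo) runs
        if "url" not in cfg:
            raise ValueError("url is required unless demoMode is true")
        if "firecrawlApiKey" not in cfg:
            raise ValueError("firecrawlApiKey is required unless demoMode is true")
    if cfg["outputFormat"] not in VALID_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(VALID_FORMATS)}")
    return cfg
```

Passing the returned dict as the run input keeps local configs consistent with the documented defaults.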

Get Your Firecrawl API Key

  1. Go to firecrawl.dev (or firecrawl.link/john-rippy for 10% off)
  2. Sign up for free tier (500 credits/month) or paid plans
  3. Go to Dashboard → API Keys
  4. Copy your API key

Output Format

{
  "url": "https://example.com/page",
  "title": "Page Title",
  "description": "Meta description of the page",
  "markdown": "# Page Title\n\nFull markdown content of the page...",
  "html": "<h1>Page Title</h1><p>Full HTML content...</p>",
  "wordCount": 450,
  "links": [
    {"url": "https://example.com/other-page", "text": "Link text"}
  ],
  "images": [
    {"src": "https://example.com/image.jpg", "alt": "Image alt text"}
  ],
  "screenshotUrl": "https://...",
  "statusCode": 200,
  "depth": 2,
  "crawledAt": "2026-01-28T10:30:00.000Z"
}
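Dataset items in the shape above are easy to post-process locally. As an example tied to the SEO-audit use case, a hypothetical helper that flags "thin" pages by their wordCount field:

```python
# Flag thin pages in crawl results. Field names (url, statusCode, wordCount)
# follow the output record shown above; the 300-word threshold is arbitrary.
def find_thin_pages(items: list[dict], min_words: int = 300) -> list[str]:
    """Return URLs of successfully crawled pages below a word-count threshold."""
    return [
        item["url"]
        for item in items
        if item.get("statusCode") == 200 and item.get("wordCount", 0) < min_words
    ]
```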

Pricing (Pay-Per-Event)

Event         | Description                       | Price
crawl_started | Per website crawl initiated       | $0.02
pages_crawled | Per 10 pages successfully crawled | $0.01

Example costs:

  • Crawl 50 pages: $0.02 + (5 x $0.01) = $0.07
  • Crawl 200 pages: $0.02 + (20 x $0.01) = $0.22
  • Crawl 500 pages with screenshots: $0.02 + (50 x $0.01) = $0.52
  • Demo mode: $0.00

Note: Firecrawl API usage is billed separately through your Firecrawl account (BYOK).
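The pay-per-event pricing above reduces to a quick estimator (Apify events only; Firecrawl credits are billed separately under BYOK):

```python
import math

def estimate_cost(pages: int) -> float:
    """Estimate Apify-side cost: $0.02 per crawl started + $0.01 per 10 pages."""
    crawl_started = 0.02
    pages_crawled = math.ceil(pages / 10) * 0.01
    return round(crawl_started + pages_crawled, 2)
```

For example, estimate_cost(50) reproduces the $0.07 figure from the list above.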

Common Scenarios

Scenario 1: Content Migration

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://old-site.com",
  "maxPages": 500,
  "maxDepth": 10,
  "outputFormat": "markdown",
  "webhookUrl": "https://hooks.zapier.com/...",
  "demoMode": false
}

Extract all content in markdown format for migration to a new CMS.

Scenario 2: Documentation Archival

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://docs.vendor.com",
  "maxPages": 300,
  "includePatterns": ["/docs/.*", "/api/.*"],
  "outputFormat": "markdown",
  "includeScreenshots": true,
  "demoMode": false
}

Archive vendor documentation with screenshots for offline reference.

Scenario 3: Competitive Content Analysis

{
  "firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX",
  "url": "https://competitor.com",
  "maxPages": 200,
  "excludePatterns": ["/blog/tag/.*", "/blog/author/.*"],
  "outputFormat": "markdown",
  "demoMode": false
}

Gather competitor content for analysis, excluding pagination and tag pages.

Webhook & Automation Integration

Zapier / Make.com / n8n

  1. Create a webhook trigger in your automation platform
  2. Copy the webhook URL to webhookUrl
  3. Route crawled content to storage and processing

Popular automations:

  • Markdown content -> Google Drive (backup archive)
  • Page data -> Pinecone/Weaviate (RAG knowledge base)
  • Screenshots -> Cloud storage (visual archive)
  • Content -> Notion/Confluence (internal wiki)
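On the receiving end, a webhook handler only needs to parse the POST body and route the records. The sketch below assumes the payload delivers page records under an "items" key — verify what your runs actually post before relying on this shape.

```python
import json

def handle_webhook(body: str) -> dict:
    """Parse a webhook POST body and summarize the delivered pages.

    Payload shape (records under "items") is an assumption, not a documented
    contract -- inspect a real delivery first.
    """
    payload = json.loads(body)
    items = payload.get("items", [])
    return {
        "pages": len(items),
        "total_words": sum(i.get("wordCount", 0) for i in items),
    }
```

The returned summary is a convenient place to decide whether to forward the batch to storage, a vector database, or a wiki importer.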

Apify Scheduled Runs

Schedule monthly crawls to track content changes over time.

Firecrawl Actors Comparison

Actor                                | Best For            | Use When...
Firecrawl Website Crawler (this one) | Full site crawling  | You need to crawl an entire website
Firecrawl Site Mapper                | URL discovery       | You just need a URL list, not content
Firecrawl Pro                        | Advanced scraping   | You need actions, screenshots, or stealth mode
Firecrawl Agent                      | Complex extractions | You need AI to navigate and extract

FAQ

Q: What's the maximum site size this can handle?

A: Practical limits depend on your Firecrawl plan credits. Most sites up to 1,000-5,000 pages work well.

Q: How does it handle JavaScript sites?

A: Firecrawl renders JavaScript fully before extraction. For sites with delayed loading, use waitForSelector to wait for specific content.

Q: Can I crawl password-protected sites?

A: No. This actor only crawls publicly accessible pages.

Q: What format should I use for LLM training data?

A: Use markdown format - it preserves structure while being clean and parseable for LLM applications.

Q: How do include/exclude patterns work?

A: Patterns are JavaScript regex. For example, /blog/.* matches all blog URLs, /docs/api/.* matches API documentation only.
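For simple patterns like those above, Python's re module behaves the same as JavaScript regex, so the filtering logic can be sketched locally. The precedence shown here (exclude wins over include) is an assumption about the actor's behavior, not a documented guarantee:

```python
import re

def url_allowed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Apply include/exclude regex patterns to a URL path.

    Assumed semantics: any exclude match rejects the URL; if include patterns
    exist, at least one must match; with no patterns, everything is allowed.
    """
    if exclude and any(re.search(p, path) for p in exclude):
        return False
    if include:
        return any(re.search(p, path) for p in include)
    return True
```

Testing candidate patterns against a handful of known URLs this way is a cheap check before spending crawl credits.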

Common Problems & Solutions

"Invalid API key" error

  • Get your API key from firecrawl.dev dashboard
  • Copy it exactly without extra spaces

Crawl taking too long

  • Reduce maxPages or maxDepth
  • Use includePatterns to limit scope
  • Use excludePatterns to skip unnecessary sections

Empty or missing content

  • Use waitForSelector for JS-heavy sites
  • Check if site has anti-bot protection (may need Firecrawl Pro with stealth mode)

Rate limit errors

  • Firecrawl handles rate limiting automatically
  • Reduce maxPages if hitting Firecrawl credit limits

"Demo data showing"

  • Set demoMode: false and provide your firecrawlApiKey

📞 Support


Built by John Rippy | Actor Arsenal