Firecrawl Website Crawler
Pricing
from $0.01 / 1,000 results
Developer: The Howlers
Firecrawl Website Crawler - Full Site Crawling with JS Rendering & Anti-Bot Bypass
Enhanced website crawler using Firecrawl's Crawl API for superior JavaScript rendering, smart rate limiting, anti-bot bypass, and clean markdown extraction. Crawl entire websites with subdomain support, URL pattern filtering, optional screenshots, and geo-targeting. Perfect for content migration, SEO audits, research, and training data collection.
Features
- Superior JS Rendering - Handles complex JavaScript-heavy websites (React, Vue, Angular)
- Anti-Bot Bypass - Built-in techniques for getting past Cloudflare and similar bot protection
- Smart Rate Limiting - Automatic throttling to prevent IP bans
- Clean Markdown Output - Beautifully formatted content extraction
- Subdomain Crawling - Optionally include all subdomains
- URL Pattern Filtering - Include or exclude specific URL patterns (regex)
- Screenshot Capture - Optional visual snapshots of every page
- Geo-Targeting - Crawl from specific countries for localized content
- Configurable Depth - Control how deep the crawler follows links
- Wait Selectors - Wait for specific elements on JS-heavy sites
- Webhook Support - Async delivery for automation pipelines
- Demo Mode - Test with sample data before going live
Who Should Use This Actor?
Content Migration Teams
Extract all content from legacy websites for platform migrations. Get clean markdown that can be imported into new CMS platforms without manual copying.
SEO Agencies
Crawl client sites for technical SEO audits. Extract content, check for thin pages, and identify optimization opportunities across entire domains.
Research & Analytics Teams
Gather comprehensive website data for competitive research. Analyze content strategies, messaging, and site structure.
AI/ML Engineers
Collect training data from websites at scale. Get clean, structured content for LLM fine-tuning and RAG applications.
Legal & Compliance Teams
Archive website content with timestamps for compliance documentation. Screenshot capture provides visual records.
Knowledge Management Teams
Build internal knowledge bases by crawling company wikis and documentation sites into searchable markdown.
Quick Start
Demo Mode (Free Test)
{"demoMode": true}
Basic Website Crawl
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://example.com","maxPages": 50,"outputFormat": "markdown","demoMode": false}
Crawl Specific Sections Only
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://docs.example.com","maxPages": 100,"includePatterns": ["/docs/.*", "/guides/.*"],"excludePatterns": ["/blog/.*", "/changelog/.*"],"demoMode": false}
Deep Crawl with Screenshots
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://example.com","maxPages": 200,"maxDepth": 10,"includeSubdomains": true,"includeScreenshots": true,"demoMode": false}
JavaScript-Heavy Site
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://react-app.example.com","maxPages": 25,"waitForSelector": ".main-content","demoMode": false}
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | - | Website URL to crawl (required unless demoMode) |
| maxPages | number | 100 | Maximum pages to crawl |
| maxDepth | number | 5 | Maximum crawl depth from start URL |
| includeSubdomains | boolean | false | Include subdomains in crawl |
| includePatterns | array | - | Only crawl URLs matching these regex patterns |
| excludePatterns | array | - | Skip URLs matching these regex patterns |
| outputFormat | string | "markdown" | Content format: markdown, html, text, links |
| includeScreenshots | boolean | false | Capture page screenshots |
| waitForSelector | string | - | CSS selector to wait for (JS-heavy sites) |
| country | string | - | Country code for geo-targeting |
| firecrawlApiKey | string | - | Your Firecrawl API key (BYOK) |
| demoMode | boolean | true | Return sample data for testing |
| webhookUrl | string | - | Webhook URL for results delivery |
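The defaults in the table can be captured in a small helper that assembles a run input before it is submitted. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical name, and the rule that `firecrawlApiKey` is required whenever `demoMode` is false is an assumption inferred from the Quick Start examples (the table only marks `url` as required outside demo mode).

```python
from typing import Any

# Defaults copied from the parameter table above.
DEFAULTS: dict[str, Any] = {
    "maxPages": 100,
    "maxDepth": 5,
    "includeSubdomains": False,
    "outputFormat": "markdown",
    "includeScreenshots": False,
    "demoMode": True,
}
OUTPUT_FORMATS = {"markdown", "html", "text", "links"}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Merge overrides onto the documented defaults and run basic checks."""
    run_input = {**DEFAULTS, **overrides}
    if run_input["outputFormat"] not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported outputFormat: {run_input['outputFormat']!r}")
    if not run_input["demoMode"]:
        # The table marks url as required outside demo mode. Requiring the
        # API key as well is an assumption based on the Quick Start examples.
        for key in ("url", "firecrawlApiKey"):
            if not run_input.get(key):
                raise ValueError(f"{key} is required when demoMode is false")
    return run_input


# Example: the "Basic Website Crawl" input from the Quick Start section.
basic = build_input(
    firecrawlApiKey="fc-XXXXXXXXXXXXXXXX",
    url="https://example.com",
    maxPages=50,
    demoMode=False,
)
```

Calling `build_input()` with no arguments yields a demo-mode input, which matches the documented default of `demoMode: true`.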
Get Your Firecrawl API Key
- Go to firecrawl.dev (or firecrawl.link/john-rippy for 10% off)
- Sign up for the free tier (500 credits/month) or a paid plan
- Go to Dashboard → API Keys
- Copy your API key
Output Format
{"url": "https://example.com/page","title": "Page Title","description": "Meta description of the page","markdown": "# Page Title\n\nFull markdown content of the page...","html": "<h1>Page Title</h1><p>Full HTML content...</p>","wordCount": 450,"links": [{"url": "https://example.com/other-page", "text": "Link text"}],"images": [{"src": "https://example.com/image.jpg", "alt": "Image alt text"}],"screenshotUrl": "https://...","statusCode": 200,"depth": 2,"crawledAt": "2026-01-28T10:30:00.000Z"}
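As an illustration of working with these records, the sketch below flags thin pages for an SEO audit using the `wordCount` and `statusCode` fields. It assumes each dataset item has the shape shown above; `find_thin_pages` and the 300-word threshold are hypothetical choices, and the second and third sample items are made up for the example.

```python
from typing import Any, Iterable


def find_thin_pages(items: Iterable[dict[str, Any]], min_words: int = 300) -> list[str]:
    """Return URLs of successfully fetched pages whose wordCount is below min_words."""
    thin = []
    for item in items:
        if item.get("statusCode") != 200:
            continue  # error pages are a separate audit finding, so skip them here
        if item.get("wordCount", 0) < min_words:
            thin.append(item["url"])
    return thin


sample = [
    {"url": "https://example.com/page", "wordCount": 450, "statusCode": 200},
    {"url": "https://example.com/stub", "wordCount": 40, "statusCode": 200},
    {"url": "https://example.com/gone", "wordCount": 0, "statusCode": 404},
]
print(find_thin_pages(sample))  # ['https://example.com/stub']
```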
Pricing (Pay-Per-Event)
| Event | Description | Price |
|---|---|---|
| crawl_started | Per website crawl initiated | $0.02 |
| pages_crawled | Per 10 pages successfully crawled | $0.01 |
Example costs:
- Crawl 50 pages: $0.02 + (5 x $0.01) = $0.07
- Crawl 200 pages: $0.02 + (20 x $0.01) = $0.22
- Crawl 500 pages with screenshots: $0.02 + (50 x $0.01) = $0.52
- Demo mode: $0.00
Note: Firecrawl API usage is billed separately through your Firecrawl account (BYOK).
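The example costs follow directly from the two events in the pricing table. The sketch below reproduces that arithmetic in whole cents to avoid floating-point rounding. `estimate_actor_cost_cents` is a hypothetical helper, and rounding a partial block of 10 pages up is an assumption, since the table does not say how partial blocks are billed. Firecrawl credits are not included.

```python
CRAWL_STARTED_CENTS = 2  # crawl_started: $0.02 per crawl
PER_10_PAGES_CENTS = 1   # pages_crawled: $0.01 per 10 pages


def estimate_actor_cost_cents(pages: int, demo_mode: bool = False) -> int:
    """Estimate the actor's pay-per-event charge in cents (Firecrawl usage excluded)."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    if demo_mode:
        return 0  # demo mode is listed as $0.00
    blocks = (pages + 9) // 10  # assumption: a partial block of 10 rounds up
    return CRAWL_STARTED_CENTS + blocks * PER_10_PAGES_CENTS


for pages in (50, 200, 500):
    print(f"{pages} pages: ${estimate_actor_cost_cents(pages) / 100:.2f}")
# 50 pages: $0.07
# 200 pages: $0.22
# 500 pages: $0.52
```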
Common Scenarios
Scenario 1: Content Migration
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://old-site.com","maxPages": 500,"maxDepth": 10,"outputFormat": "markdown","webhookUrl": "https://hooks.zapier.com/...","demoMode": false}
Extract all content in markdown format for migration to a new CMS.
Scenario 2: Documentation Archival
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://docs.vendor.com","maxPages": 300,"includePatterns": ["/docs/.*", "/api/.*"],"outputFormat": "markdown","includeScreenshots": true,"demoMode": false}
Archive vendor documentation with screenshots for offline reference.
Scenario 3: Competitive Content Analysis
{"firecrawlApiKey": "fc-XXXXXXXXXXXXXXXX","url": "https://competitor.com","maxPages": 200,"excludePatterns": ["/blog/tag/.*", "/blog/author/.*"],"outputFormat": "markdown","demoMode": false}
Gather competitor content for analysis, excluding blog tag and author archive pages.
Webhook & Automation Integration
Zapier / Make.com / n8n
- Create a webhook trigger in your automation platform
- Copy the webhook URL to webhookUrl
- Route crawled content to storage and processing
Popular automations:
- Markdown content -> Google Drive (backup archive)
- Page data -> Pinecone/Weaviate (RAG knowledge base)
- Screenshots -> Cloud storage (visual archive)
- Content -> Notion/Confluence (internal wiki)
Apify Scheduled Runs
Schedule monthly crawls to track content changes over time.
Firecrawl Actors Comparison
| Actor | Best For | Use When... |
|---|---|---|
| Firecrawl Website Crawler (this one) | Full site crawling | You need to crawl an entire website |
| Firecrawl Site Mapper | URL discovery | You just need a URL list, not content |
| Firecrawl Pro | Advanced scraping | You need actions, screenshots, or stealth mode |
| Firecrawl Agent | Complex extractions | You need AI to navigate and extract |
FAQ
Q: What's the maximum site size this can handle?
A: Practical limits depend on your Firecrawl plan credits. Most sites up to 1,000-5,000 pages work well.
Q: How does it handle JavaScript sites?
A: Firecrawl renders JavaScript fully before extraction. For sites with delayed loading, use waitForSelector to wait for specific content.
Q: Can I crawl password-protected sites?
A: No. This actor only crawls publicly accessible pages.
Q: What format should I use for LLM training data?
A: Use markdown format - it preserves structure while being clean and parseable for LLM applications.
Q: How do include/exclude patterns work?
A: Patterns are JavaScript regular expressions. For example, /blog/.* matches all blog URLs, while /docs/api/.* matches only API documentation.
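To preview which URLs a pattern set would keep before spending credits, the filtering can be approximated locally. This is a rough sketch, not the actor's implementation: it uses Python's `re` module rather than JavaScript regex (the two agree for simple patterns like the ones in this document), and it assumes that patterns are tested against the URL path with search semantics and that exclude patterns take precedence. `should_crawl` is a hypothetical name.

```python
import re
from typing import Optional, Sequence
from urllib.parse import urlparse


def should_crawl(
    url: str,
    include_patterns: Optional[Sequence[str]] = None,
    exclude_patterns: Optional[Sequence[str]] = None,
) -> bool:
    """Approximate include/exclude filtering on the URL path."""
    path = urlparse(url).path or "/"
    # Assumption: an exclude match wins over any include match.
    if exclude_patterns and any(re.search(p, path) for p in exclude_patterns):
        return False
    # With include patterns set, only matching paths are kept.
    if include_patterns:
        return any(re.search(p, path) for p in include_patterns)
    return True


include = ["/docs/.*", "/guides/.*"]
exclude = ["/blog/.*", "/changelog/.*"]
for candidate in (
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/blog/post",
    "https://docs.example.com/pricing",
):
    print(candidate, should_crawl(candidate, include, exclude))
```

Under these assumptions the first URL is kept and the other two are skipped: one matches an exclude pattern, and the other matches no include pattern.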
Common Problems & Solutions
"Invalid API key" error
- Get your API key from firecrawl.dev dashboard
- Copy it exactly without extra spaces
Crawl taking too long
- Reduce maxPages or maxDepth
- Use includePatterns to limit scope
- Use excludePatterns to skip unnecessary sections
Empty or missing content
- Use waitForSelector for JS-heavy sites
- Check if site has anti-bot protection (may need Firecrawl Pro with stealth mode)
Rate limit errors
- Firecrawl handles rate limiting automatically
- Reduce maxPages if hitting Firecrawl credit limits
"Demo data showing"
- Set demoMode: false and provide your firecrawlApiKey
📞 Support
- Actor Arsenal: Full Actor Catalog
- Developer: John Rippy
Built by John Rippy | Actor Arsenal