🔥 FireScrape AI Website Content Markdown Scraper

Pricing

$30.00/month + usage

Developed by

mohamed el hadi msaid

Maintained by Community

Web scraper powered by Crawlee and Puppeteer. It extracts website content, converts it to Markdown, and structures it for LLM training datasets.

Rating: 3.5 (3 ratings)

Total users: 95
Monthly users: 31
Runs succeeded: >99%
Last modified: 13 days ago

Overview

FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts each page's content, converts it to Markdown, and outputs structured records suited to building datasets for LLMs.

🎯 Features

  • Extracts visible text or full HTML content
  • Converts content to Markdown
  • Captures screenshots
  • Supports proxy configurations
  • Follows links for deep crawling

🛠️ Input Schema

{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
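
As a rough illustration of how the schema above maps to a run input, the following Python sketch fills in the schema's defaults and checks the constraints it states (a required startUrls array, maxPages of at least 1, boolean flags). The `build_input` helper is hypothetical and not part of FireScrape; it only mirrors the field names and defaults listed in the schema.

```python
import copy
import json

# Defaults copied from the input schema above.
SCHEMA_DEFAULTS = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}
BOOLEAN_FIELDS = ("screenshot", "enqueue", "getText", "getHtml")


def build_input(start_urls, **overrides):
    """Hypothetical helper: merge overrides into the schema defaults and validate."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(SCHEMA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    # startUrls uses the same {"url": ...} shape as the schema's prefill value.
    run_input = {"startUrls": [{"url": url} for url in start_urls]}
    run_input.update(copy.deepcopy(SCHEMA_DEFAULTS))
    run_input.update(overrides)

    max_pages = run_input["maxPages"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be an integer >= 1")
    for field in BOOLEAN_FIELDS:
        if not isinstance(run_input[field], bool):
            raise ValueError(f"{field} must be a boolean")
    return run_input


if __name__ == "__main__":
    example = build_input(["https://apify.com"], maxPages=10, getText=True)
    print(json.dumps(example, indent=2))
```

Validating locally like this is optional; it simply catches a misspelled field or an out-of-range maxPages before a paid run starts.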

✅ Output Format

Each successfully scraped page produces one structured JSON object:

{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": { "description": "An example page", "keywords": ["example", "page"] },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
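
To show how records of this shape might feed an LLM dataset, here is a minimal Python sketch, assuming every record carries the url and markdown fields shown above and that screenshot is a base64 data URI as in the sample. The helper names and the JSON Lines layout (source, title, text) are illustrative choices, not something FireScrape prescribes.

```python
import base64
import json


def to_training_line(record):
    """Reduce one output record to a single JSON line (hypothetical dataset layout)."""
    row = {
        "source": record["url"],
        "title": record.get("title", ""),
        "text": record["markdown"],
    }
    return json.dumps(row, ensure_ascii=False)


def decode_screenshot(data_uri):
    """Return the raw image bytes from a base64 data URI such as the screenshot field."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def write_jsonl(records, path):
    """Write one JSON line per record to the given file path."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(to_training_line(record) + "\n")
```

Keeping only the Markdown and a source URL per line keeps the dataset small; the HTML and screenshot fields can be stored separately if they are needed.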

🚀 How to Run

  1. Deploy the actor on Apify.
  2. Input the desired URLs and configuration.
  3. Start the scraper and monitor progress.
  4. Download results as JSON or Markdown.
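
The same steps can also be driven programmatically. The sketch below builds the two Apify API v2 requests involved: one to start a run with a JSON input, one to fetch the resulting dataset items. The actor ID `username~firescrape` is a hypothetical placeholder (the real ID is shown on the actor's page), and the token is your own Apify API token.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "username~firescrape"  # hypothetical placeholder, replace with the real actor ID


def start_run_request(token, run_input, actor_id=ACTOR_ID):
    """Build the POST request that starts an actor run with the given input."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dataset_items_request(token, dataset_id):
    """Build the GET request that downloads a dataset's items as JSON."""
    return urllib.request.Request(
        f"{API_BASE}/datasets/{dataset_id}/items?format=json",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request):
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In use, the response to the first request describes the run, including the ID of its default dataset; once the run has finished, that ID can be passed to `dataset_items_request` to download the scraped records.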

🔧 Customization

FireScrape can be extended with additional features, such as handling dynamic content, authentication, or specialized formatting.


🎁 Bonus: n8n Workflow Integration

As a free bonus for using FireScrape, you can integrate these n8n workflows with this actor:

These workflows can help automate post-scraping actions and expand your automation capabilities.

Happy scraping! 🚀🔥