🔥 FireScrape AI Website Content Markdown Scraper

1 day trial then $30.00/month - No credit card required now

mohamedgb00714/firescraper-ai-website-content-markdown-scraper

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.

Developer
Maintained by Community

Actor Metrics

  • 2 monthly users

  • 5.0 / 5 (1)

  • 1 bookmark

  • >99% runs succeeded

  • Created in Mar 2025

  • Modified 2 days ago

Overview

FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts page content, converts it to Markdown, and outputs one structured JSON record per page, which makes it suitable for building datasets for LLMs.


🎯 Features

  • Extracts visible text or full HTML content
  • Converts content to Markdown
  • Captures screenshots
  • Supports proxy configurations
  • Follows links for deep crawling
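To make the "converts content to Markdown" feature concrete, here is a minimal sketch of an HTML-to-Markdown conversion. It is not FireScrape's implementation (the actor runs on Crawlee and Puppeteer, and its converter is not shown on this page). The `MiniMarkdown` class and `html_to_markdown` function are hypothetical names, and the sketch handles only headings, paragraphs, and links.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs, and links only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished Markdown blocks
        self.current = []     # text fragments of the block being built
        self.prefix = ""      # heading marker for the current block
        self.href = None      # target of the link being read, if any
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def _flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current, self.prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.HEADINGS or tag == "p":
            self._flush()
            self.prefix = self.HEADINGS.get(tag, "")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag == "p":
            self._flush()
        elif tag == "a":
            self.current.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        # Ignore script and style bodies; keep everything else.
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block element
    return "\n\n".join(parser.blocks)
```

A production converter would also need lists, tables, code blocks, and images; this sketch only shows the general shape of the transformation.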

🛠️ Input Schema

{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
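As an illustration of how the schema translates into a run input, the sketch below builds an input object with the schema's defaults and checks the constraints it states (`startUrls` required, `maxPages` an integer of at least 1, boolean flags). The helper name `build_input` is hypothetical, and rejecting an empty URL list or unknown field names is an assumption rather than something the schema specifies.

```python
import copy

# Defaults copied from the input schema above.
DEFAULTS = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_input(start_urls, **overrides):
    """Return a run input dict, checking the constraints the schema states."""
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    # Assumption: field names outside the schema are rejected to catch typos.
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    run_input = {**copy.deepcopy(DEFAULTS), **overrides}

    max_pages = run_input["maxPages"]
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be an integer >= 1")
    for flag in ("screenshot", "enqueue", "getText", "getHtml"):
        if not isinstance(run_input[flag], bool):
            raise ValueError(f"{flag} must be a boolean")

    # Entries are objects with a "url" key, matching the schema's prefill.
    run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input
```

For example, `build_input(["https://apify.com"], maxPages=10, getText=True)` yields an input that crawls at most 10 pages and also extracts visible text, with every other option left at its default.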

✅ Output Format

Each successfully scraped page produces one structured JSON object:

{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": { "description": "An example page", "keywords": ["example", "page"] },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
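Since the output is aimed at LLM datasets, one possible post-processing step is sketched below: it turns records shaped like the example above into JSON Lines and decodes the `screenshot` data URI into raw image bytes. The function names and the choice of JSONL fields (`source`, `title`, `text`) are illustrative assumptions, not part of FireScrape.

```python
import base64
import json


def to_training_record(item: dict) -> dict:
    """Keep only the fields useful as text training data."""
    return {
        "source": item["url"],
        "title": item.get("title", ""),
        "text": item.get("markdown", ""),
    }


def to_jsonl(items) -> str:
    """Serialize one JSON object per line, skipping pages without Markdown."""
    lines = [
        json.dumps(to_training_record(item), ensure_ascii=False)
        for item in items
        if item.get("markdown")
    ]
    return "\n".join(lines)


def decode_screenshot(data_uri: str) -> bytes:
    """Decode a base64 data URI such as the "screenshot" field into bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)
```

Writing the result of `to_jsonl(items)` to a `.jsonl` file gives one training example per page. The bytes returned by `decode_screenshot` can be written directly to a `.png` file.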

🚀 How to Run

  1. Deploy the actor on Apify.
  2. Input the desired URLs and configuration.
  3. Start the scraper and monitor progress.
  4. Download results as JSON or Markdown.
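The steps above can also be driven programmatically. The sketch below prepares a request for the Apify API's "run actor" endpoint using only the standard library. The actor ID is assumed to be the store path shown at the top of this page with `/` replaced by `~` (the API's usual convention), and the helper names `build_run_request` and `start_run` are hypothetical. Treat it as a starting point, not a verified integration.

```python
import json
import urllib.request

# Assumption: the store path with "/" replaced by "~", as the Apify API expects.
ACTOR_ID = "mohamedgb00714~firescraper-ai-website-content-markdown-scraper"
API_BASE = "https://api.apify.com/v2"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that starts an actor run."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def start_run(token: str, run_input: dict) -> dict:
    """Send the request and return the run description from the response."""
    with urllib.request.urlopen(build_run_request(token, run_input)) as resp:
        return json.load(resp)["data"]
```

Calling `start_run(token, {"startUrls": [{"url": "https://apify.com"}]})` with a valid API token should return a run object. Its `defaultDatasetId` identifies the dataset from which the scraped records can then be downloaded.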

🔧 Customization

Feel free to extend FireScrape with additional features, such as handling dynamic content, authentication, or specialized formatting.

Happy scraping! 🚀🔥