
🔥 FireScrape AI Website Content Markdown Scraper
1 day trial then $30.00/month - No credit card required now

Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.
Actor Metrics
2 monthly users
5.0 / 5 (1)
1 bookmark
>99% runs succeeded
Created in Mar 2025
Modified 2 days ago
Overview
FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts page content, converts it to Markdown, and structures the result for use in LLM datasets.
🎯 Features
- Extracts visible text or full HTML content
- Converts content to Markdown
- Captures screenshots
- Supports proxy configurations
- Follows links for deep crawling
🛠️ Input Schema
```json
{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
```
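To make the schema concrete, here is a minimal Python sketch that assembles a run input from the field names and defaults listed in the schema above. The `build_input` helper is hypothetical and not part of FireScrape. Treating an empty `startUrls` list as an error is an assumption, since the schema only marks the field as required.

```python
import copy
import json

# Defaults copied from the input schema above.
DEFAULTS = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_input(start_urls, **overrides):
    """Return a run input dict (hypothetical helper, not part of the actor)."""
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not start_urls:
        raise ValueError("startUrls is required")

    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")

    run_input = {"startUrls": [{"url": url} for url in start_urls]}
    run_input.update(copy.deepcopy(DEFAULTS))
    run_input.update(overrides)

    # The schema declares maxPages as an integer with minimum 1.
    max_pages = run_input["maxPages"]
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be an integer >= 1")
    return run_input


example = build_input(
    ["https://apify.com"], maxPages=10, screenshot=False, getText=True
)
print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the actor's input editor. Any field left out of the call keeps the default from the schema.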
✅ Output Format
Each successfully scraped page will output a structured JSON object:
```json
{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": { "description": "An example page", "keywords": ["example", "page"] },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
```
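As a rough illustration of consuming one output object, the Python sketch below decodes the `screenshot` data URI into raw bytes and reduces a page to a small record suitable for a text dataset. Both helpers are hypothetical, and the choice of fields to keep (`url`, `title`, `markdown`) is an assumption about what a training pipeline would want, not something FireScrape prescribes.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(data_uri):
    """Return raw image bytes from a base64 data URI such as the 'screenshot' field."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def to_training_record(page):
    """Keep only the fields assumed useful as LLM training text."""
    return {
        "source": page["url"],
        "title": page.get("title", ""),
        "text": page["markdown"],
    }


# A stand-in page shaped like the output object above. The screenshot holds
# only the PNG signature, not a real image.
sample_page = {
    "url": "https://example.com",
    "title": "Example Page",
    "markdown": "# Example Page\n\nThis is an example page content...",
    "screenshot": "data:image/png;base64,"
    + base64.b64encode(PNG_SIGNATURE).decode("ascii"),
}

image_bytes = decode_screenshot(sample_page["screenshot"])
print(image_bytes.startswith(PNG_SIGNATURE))
print(json.dumps(to_training_record(sample_page)))
```

Writing one such record per line produces a JSONL file, a common input format for dataset tooling.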
🚀 How to Run
- Deploy the actor on Apify.
- Input the desired URLs and configuration.
- Start the scraper and monitor progress.
- Download results as JSON or Markdown.
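The download step can also be scripted. The sketch below assumes results land in an Apify dataset, builds the Apify API URL for fetching that dataset's items as JSON, and writes each item's `markdown` field to its own `.md` file. The helper names, the `abc123` dataset ID, and the file-naming scheme are all hypothetical. The sketch makes no network call; fetching the URL and supplying an API token are left out.

```python
import re
import tempfile
from pathlib import Path
from urllib.parse import urlencode, urlparse


def dataset_items_url(dataset_id, fmt="json"):
    """Build the Apify API URL that returns a dataset's items in the given format."""
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


def slug_for(url):
    """Turn a page URL into a filesystem-safe name (assumed naming scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    return re.sub(r"[^A-Za-z0-9._-]+", "-", raw).strip("-") or "index"


def save_markdown(items, out_dir):
    """Write each item's 'markdown' field to <out_dir>/<slug>.md and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for item in items:
        path = out / f"{slug_for(item['url'])}.md"
        path.write_text(item["markdown"], encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # "abc123" is a placeholder dataset ID, not a real one.
    print(dataset_items_url("abc123"))

    items = [{"url": "https://example.com/docs/intro", "markdown": "# Intro\n"}]
    with tempfile.TemporaryDirectory() as tmp:
        for path in save_markdown(items, tmp):
            print(path.name)
```

Two URLs that differ only in their query string map to the same file name here, so a real export would need a collision-safe scheme.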
🔧 Customization
Feel free to extend FireScrape with additional features, such as handling dynamic content, authentication, or specialized formatting.
Happy scraping! 🚀🔥