
🔥 FireScrape AI Website Content Markdown Scraper
Pricing
$30.00/month + usage
Advanced web scraper powered by Crawlee and Puppeteer. It extracts website content, converts it to Markdown, and structures it for LLM training datasets.
Overview
FireScrape is a web scraper built with Crawlee and Puppeteer. It crawls websites, extracts content, converts it to Markdown, and structures the data, which makes it well suited to generating datasets for LLMs.
🎯 Features
- Extracts visible text or full HTML content
- Converts content to Markdown
- Captures screenshots
- Supports proxy configurations
- Follows links for deep crawling
🛠️ Input Schema
{
  "title": "FireScrape Input Schema",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrls": {
      "title": "Start URLs",
      "type": "array",
      "description": "List of URLs to start crawling from.",
      "editor": "requestListSources",
      "prefill": [{ "url": "https://apify.com" }]
    },
    "maxPages": {
      "title": "Maximum Pages",
      "type": "integer",
      "description": "The maximum number of pages to crawl.",
      "default": 50,
      "minimum": 1
    },
    "proxyConfig": {
      "title": "Proxy Configuration",
      "type": "object",
      "description": "Select proxy settings.",
      "editor": "proxy",
      "default": { "useApifyProxy": true }
    },
    "screenshot": {
      "title": "Take Screenshots",
      "type": "boolean",
      "description": "Enable this to capture a screenshot of each page.",
      "default": true
    },
    "enqueue": {
      "title": "Enqueue Links",
      "type": "boolean",
      "description": "Whether to follow and enqueue new links on the page.",
      "default": true
    },
    "getText": {
      "title": "Extract Text Content",
      "type": "boolean",
      "description": "Extract only the visible text content from the page.",
      "default": false
    },
    "getHtml": {
      "title": "Extract HTML Content",
      "type": "boolean",
      "description": "Extract the full HTML content of the page.",
      "default": false
    }
  },
  "required": ["startUrls"]
}
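As a rough illustration of how the schema's defaults and constraints fit together, the sketch below assembles an input object in Python. The helper name `build_input` is hypothetical and not part of FireScrape. The field names, default values, and the `maxPages` minimum are copied from the schema above. Rejecting an empty URL list is an extra assumption, since the schema only marks `startUrls` as required.

```python
import copy
import json

# Defaults copied from the input schema above.
SCHEMA_DEFAULTS = {
    "maxPages": 50,
    "proxyConfig": {"useApifyProxy": True},
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_input(start_urls, **overrides):
    """Hypothetical helper: merge overrides onto the schema defaults."""
    # Assumption: an empty list is treated the same as a missing startUrls.
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(SCHEMA_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    actor_input = {**copy.deepcopy(SCHEMA_DEFAULTS), **overrides}
    max_pages = actor_input["maxPages"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be an integer >= 1")
    # One {"url": ...} object per start URL, matching the schema's prefill.
    actor_input["startUrls"] = [{"url": u} for u in start_urls]
    return actor_input


if __name__ == "__main__":
    example = build_input(["https://apify.com"], maxPages=10, getText=True)
    print(json.dumps(example, indent=2))
```

The printed JSON can be pasted into the actor's input editor. Fields that are not overridden keep the defaults listed in the schema.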
Output Format
Each successfully scraped page will output a structured JSON object:
{
  "url": "https://example.com",
  "title": "Example Page",
  "metadata": {
    "description": "An example page",
    "keywords": ["example", "page"]
  },
  "markdown": "# Example Page\n\nThis is an example page content...",
  "textContent": "This is an example page content...",
  "htmlContent": "<html><body><h1>Example Page</h1>...</body></html>",
  "screenshot": "data:image/png;base64,iVBORw..."
}
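One way to consume such an object is to write its Markdown to disk and decode the screenshot data URI into a PNG file. The sketch below assumes only the `url`, `markdown`, and `screenshot` fields shown above. The helper names (`slug_for`, `decode_screenshot`, `save_item`) and the file naming scheme are illustrative and do not come from FireScrape itself.

```python
import base64
import re
from pathlib import Path


def slug_for(url):
    """Turn a URL into a filesystem-friendly name (hypothetical naming scheme)."""
    slug = re.sub(r"^https?://", "", url)
    slug = re.sub(r"[^A-Za-z0-9]+", "-", slug).strip("-")
    return slug or "page"


def decode_screenshot(data_uri):
    """Decode a 'data:image/png;base64,...' URI into raw image bytes."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    return base64.b64decode(payload)


def save_item(item, out_dir):
    """Write the item's Markdown (and screenshot, if present) under out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    name = slug_for(item["url"])
    md_path = out / f"{name}.md"
    md_path.write_text(item.get("markdown", ""), encoding="utf-8")
    written = [md_path]
    if item.get("screenshot"):
        png_path = out / f"{name}.png"
        png_path.write_bytes(decode_screenshot(item["screenshot"]))
        written.append(png_path)
    return written
```

Calling `save_item(item, "output")` for each scraped page would produce one `.md` file per page, plus a `.png` when screenshots are enabled.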
How to Run
- Deploy the actor on Apify.
- Input the desired URLs and configuration.
- Start the scraper and monitor progress.
- Download results as JSON or Markdown.
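The same steps can also be scripted instead of done through the Apify Console. The sketch below only builds the HTTP request that would start a run through the Apify API's actor runs endpoint, without sending it. The actor ID and token are placeholders, because this page does not state FireScrape's actual actor ID. The function name `build_run_request` is hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, actor_input):
    """Build (but do not send) a POST request that starts an actor run."""
    url = f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_run_request(
        "your-username~firescrape",  # placeholder, not the real actor ID
        "YOUR_APIFY_TOKEN",          # placeholder API token
        {"startUrls": [{"url": "https://apify.com"}], "maxPages": 10},
    )
    print(req.get_method(), req.full_url)
    # To actually start the run, pass the request to urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload before anything is charged to the account.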
🔧 Customization
Feel free to extend FireScrape with additional features, such as handling dynamic content, authentication, or specialized formatting.
Happy scraping! 🔥