🔥fireScraper AI Prompt Website Content Markdown Scraper

Pricing: $15.00 / 1,000 prompt results


Developed by mohamed el hadi msaid

Maintained by Community


Rating: 5.0 (1 review)

Total users: 2
Monthly users: 1
Runs succeeded: >99%
Last modified: 2 days ago

🔥 fireScraper AI Prompt Website Content Markdown Scraper

Overview

fireScraper AI is an advanced web scraper built with Crawlee and Puppeteer. It crawls websites, extracts the meaningful content, converts it to Markdown, and then runs your custom prompt on the extracted text, making it ideal for generating enriched datasets, summaries, or analyses for LLMs and AI pipelines.


🎯 Features

  • 📝 Extracts visible text, full HTML, or both
  • 🔄 Applies your custom prompt (e.g. “Summarize this page”) to each page
  • 📝 Converts content to clean Markdown (see the sketch after this list)
  • 📸 Captures full‑page screenshots
  • 🌐 Supports proxy configurations (Apify Proxy, custom)
  • 🔗 Follows links for deep multi-page crawling
  • ⚙️ Easily extended for JS‑heavy sites, login flows, or custom selectors
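
The crawl → extract → convert step behind these features can be approximated in a few lines. A minimal sketch, assuming Puppeteer plus the turndown npm package; these library choices are assumptions, not necessarily the actor's actual internals:

```typescript
// Minimal sketch of the extract-and-convert step: render a page with
// Puppeteer, grab its HTML, and convert it to Markdown with Turndown.
// Library choices here are assumptions, not the actor's internals.
import puppeteer from 'puppeteer';
import TurndownService from 'turndown';

async function pageToMarkdown(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const html = await page.evaluate(() => document.body.innerHTML);
    // Turndown converts the rendered HTML into clean Markdown.
    return new TurndownService({ headingStyle: 'atx' }).turndown(html);
  } finally {
    await browser.close();
  }
}
```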

🛠️ Input Schema

```json
{
  "startUrls": [
    { "url": "https://apify.com" }
  ],
  "prompt": "Summarize the key points of this page in bullet form.",
  "maxPages": 5,
  "proxyConfig": {
    "useApifyProxy": true
  },
  "screenshot": true,
  "enqueue": true,
  "getText": false,
  "getHtml": false
}
```
| Field | Type | Description |
| --- | --- | --- |
| startUrls | Array | List of seed URLs to crawl. Required. |
| prompt | String | Custom instruction or question to run on each page’s extracted content. |
| maxPages | Integer | Maximum number of pages to visit. Default: 5. |
| proxyConfig | Object | Proxy settings (supports Apify Proxy). |
| screenshot | Boolean | Capture a screenshot of each page. Default: true. |
| enqueue | Boolean | Follow and enqueue new links found on each page. Default: true. |
| getText | Boolean | Extract only visible text content. Default: false. |
| getHtml | Boolean | Extract full raw HTML. Default: false. |
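
With those fields in place, a run can be started programmatically. A minimal sketch using the official apify-client package for Node.js; the actor ID is a placeholder you should copy from this Store page:

```typescript
// Run the actor from Node.js and fetch its results. <ACTOR_ID> is a
// placeholder; use the ID shown on this actor's Store page.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// call() starts the run and waits for it to finish.
const run = await client.actor('<ACTOR_ID>').call({
  startUrls: [{ url: 'https://apify.com' }],
  prompt: 'Summarize the key points of this page in bullet form.',
  maxPages: 5,
  proxyConfig: { useApifyProxy: true },
  screenshot: true,
  enqueue: true,
});

// Each crawled page becomes one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} pages`);
```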

✅ Output Format

Each page yields a JSON object with:

```json
{
  "url": "https://example.com",
  "title": "Example Page",
  "promptResult": "• Point one\n• Point two\n• Point three",
  "metadata": {
    "description": "An example page",
    "keywords": ["example", "page"]
  },
  "markdown": "# Example Page\n\nThis is the markdown content…",
  "textContent": "This is the visible text…",
  "htmlContent": "<html>…</html>",
  "screenshot": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…"
}
```
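
For downstream use, each item can be persisted directly. A small sketch, continuing from the items array fetched above, that writes the Markdown to disk and decodes the base64 screenshot; the interface and file-naming scheme are illustrative only:

```typescript
// Persist each result: one Markdown file per page, plus the screenshot
// decoded from its data URI. Naming is illustrative, not prescribed.
import { writeFileSync } from 'node:fs';

interface PageResult {
  url: string;
  title: string;
  promptResult: string;
  markdown: string;
  screenshot?: string;
}

for (const item of items as unknown as PageResult[]) {
  const slug = new URL(item.url).hostname.replace(/\W+/g, '-');
  writeFileSync(
    `${slug}.md`,
    `# ${item.title}\n\n${item.promptResult}\n\n---\n\n${item.markdown}`,
  );
  if (item.screenshot) {
    // Strip the "data:image/png;base64," prefix before decoding.
    const base64 = item.screenshot.split(',')[1];
    writeFileSync(`${slug}.png`, Buffer.from(base64, 'base64'));
  }
}
```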

🚀 Use Cases

  • LLM Dataset Creation: Collect and pre‑process web content into Markdown, then run custom prompts to generate training samples, summaries, or question‑answer pairs.

  • Automated Content Summaries: Crawl blogs, news sites, or documentation to produce concise, prompt‑driven summaries for research or reporting.

  • SEO & Content Audits: Extract headings, metadata, and full text, then prompt your model to analyze keyword usage and readability and to suggest improvements.

  • Knowledge Base Generation: Pull FAQs, tutorials, or API docs and transform them into structured Markdown plus AI‑enriched annotations for internal wikis or help desks.

  • Competitive Intelligence: Scrape competitor sites at scale, then run custom prompts to highlight feature comparisons, pricing tables, or sentiment across pages.


🚀 How to Run

  1. Deploy the actor on Apify or run it locally.
  2. Configure startUrls, prompt, and the other options via the UI or API (see the REST example after these steps).
  3. Click Run and monitor the logs in real time.
  4. Download the dataset as JSON or Markdown for downstream use.
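
If you would rather not use a client library, the same run can be triggered over Apify's generic REST API: the run-sync-get-dataset-items endpoint starts the run, waits for it to finish, and returns the dataset items in one request. <ACTOR_ID> is again a placeholder for this actor's ID:

```typescript
// One-shot run over the REST API: start the actor, wait for completion,
// and receive the dataset items directly in the response body.
const res = await fetch(
  `https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items` +
    `?token=${process.env.APIFY_TOKEN}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      startUrls: [{ url: 'https://example.com' }],
      prompt: 'Summarize the key points of this page in bullet form.',
      maxPages: 5,
    }),
  },
);
const items = await res.json(); // array of page objects, as shown above
```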

🔧 Customization Tips

  • Chain multiple prompts, or use different prompts per domain (see the sketch after this list)
  • Add login/authentication handlers for gated content
  • Integrate NLP post‑processing (e.g. entity extraction)
  • Output to alternate formats (PDF, DOCX, CSV)
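
As an example of the first tip, a per-domain prompt can be selected before each page is processed. A hypothetical sketch; none of these names come from the actor itself:

```typescript
// Hypothetical helper for "different prompts per domain": look up the
// prompt by hostname, falling back to a generic default.
const PROMPTS_BY_DOMAIN: Record<string, string> = {
  'docs.example.com': 'List every API endpoint mentioned on this page.',
  'blog.example.com': 'Summarize this post in three bullet points.',
};
const DEFAULT_PROMPT = 'Summarize the key points of this page.';

function promptFor(url: string): string {
  return PROMPTS_BY_DOMAIN[new URL(url).hostname] ?? DEFAULT_PROMPT;
}

console.log(promptFor('https://blog.example.com/launch')); // per-domain prompt
```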

Happy scraping and AI‑driven insights! 🚀🔥