
🔥fireScraper AI Prompt Website Content Markdown Scraper
Pricing
$15.00 / 1,000 prompt results

Overview
fireScrape AI is an advanced web scraper built with Crawlee and Puppeteer. It crawls websites, extracts meaningful content, converts it into Markdown, and then runs your custom prompt on the extracted text. This makes it well suited to generating enriched datasets, summaries, or analyses for LLMs and AI pipelines.
🎯 Features
- 📝 Extracts visible text, full HTML, or both
- 🔄 Applies your custom prompt (e.g. “Summarize this page”) to each page
- 📝 Converts content to clean Markdown
- 📸 Captures full‑page screenshots
- 🌐 Supports proxy configurations (Apify Proxy, custom)
- 🔗 Follows links for deep multi-page crawling
- ⚙️ Easily extended for JS‑heavy sites, login flows, or custom selectors
🛠️ Input Schema
{
  "startUrls": [{ "url": "https://apify.com" }],
  "prompt": "Summarize the key points of this page in bullet form.",
  "maxPages": 5,
  "proxyConfig": { "useApifyProxy": true },
  "screenshot": true,
  "enqueue": true,
  "getText": false,
  "getHtml": false
}
| Field | Type | Description |
|---|---|---|
| `startUrls` | Array | List of seed URLs to crawl. Required. |
| `prompt` | String | Custom instruction or question to run on each page’s extracted content. |
| `maxPages` | Integer | Maximum number of pages to visit. Default: `5`. |
| `proxyConfig` | Object | Proxy settings (supports Apify Proxy). |
| `screenshot` | Boolean | Capture a screenshot of each page. Default: `true`. |
| `enqueue` | Boolean | Follow and enqueue new links found on each page. Default: `true`. |
| `getText` | Boolean | Extract only visible text content. Default: `false`. |
| `getHtml` | Boolean | Extract full raw HTML. Default: `false`. |
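
The defaults in the table can be applied in code before a run is started. The following is a minimal Python sketch and not part of the actor itself: the `build_input` helper and its validation rules are assumptions made for illustration. Only the field names and default values come from the table.

```python
from typing import Any

# Defaults copied from the input schema table; everything else is illustrative.
DEFAULTS: dict[str, Any] = {
    "maxPages": 5,
    "screenshot": True,
    "enqueue": True,
    "getText": False,
    "getHtml": False,
}


def build_input(start_urls: list[str], prompt: str, **overrides: Any) -> dict[str, Any]:
    """Build an input object for the actor (hypothetical helper)."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    # Reject misspelled option names early instead of sending them to the actor.
    unknown = set(overrides) - set(DEFAULTS) - {"proxyConfig"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input: dict[str, Any] = {
        "startUrls": [{"url": url} for url in start_urls],
        "prompt": prompt,
        **DEFAULTS,
    }
    run_input.update(overrides)
    return run_input


example = build_input(
    ["https://apify.com"],
    "Summarize the key points of this page in bullet form.",
    maxPages=10,
    proxyConfig={"useApifyProxy": True},
)
```

Keeping the defaults in one dictionary means a misspelled option such as `maxPage` fails fast locally rather than being silently ignored.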
✅ Output Format
Each page yields a JSON object with:
{
  "url": "https://example.com",
  "title": "Example Page",
  "promptResult": "• Point one\n• Point two\n• Point three",
  "metadata": {
    "description": "An example page",
    "keywords": ["example", "page"]
  },
  "markdown": "# Example Page\n\nThis is the markdown content…",
  "textContent": "This is the visible text…",
  "htmlContent": "<html>…</html>",
  "screenshot": "…"
}
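
A downloaded dataset can be post-processed with a few lines of Python. This sketch assumes the dataset was exported as a JSON array of objects shaped like the example record; the `to_report` helper is hypothetical and only reads the `url`, `title`, and `promptResult` fields.

```python
import json


def to_report(items: list[dict]) -> str:
    """Render one Markdown section per page from its title, URL, and promptResult."""
    sections = []
    for item in items:
        # Fall back to the URL when a page has no title.
        title = item.get("title") or item.get("url", "Untitled")
        result = item.get("promptResult", "").strip()
        sections.append(f"## {title}\n\nSource: {item.get('url', 'unknown')}\n\n{result}")
    return "\n\n".join(sections)


# Sample record based on the output example (unused fields omitted).
raw = json.dumps([
    {
        "url": "https://example.com",
        "title": "Example Page",
        "promptResult": "• Point one\n• Point two\n• Point three",
    }
])
report = to_report(json.loads(raw))
print(report)
```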
🚀 Use Cases
- LLM Dataset Creation: Collect and pre‑process web content into Markdown, then run custom prompts to generate training samples, summaries, or question‑answer pairs.
- Automated Content Summaries: Crawl blogs, news sites, or documentation to produce concise, prompt‑driven summaries for research or reporting.
- SEO & Content Audits: Extract headings, metadata, and full text, then prompt your model to analyze keyword usage and readability and to suggest improvements.
- Knowledge Base Generation: Pull FAQs, tutorials, or API docs and transform them into structured Markdown with AI‑enriched annotations for internal wikis or help desks.
- Competitive Intelligence: Scrape competitor sites at scale, then run custom prompts to highlight feature comparisons, pricing tables, or sentiment across pages.
🚀 How to Run
- Deploy the actor on Apify or run it locally.
- Configure `startUrls`, `prompt`, and other options via the UI or API.
- Click Run and monitor logs in real time.
- Download the dataset as JSON or Markdown for downstream use.
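
For the API route, one option is the Apify API's synchronous `run-sync-get-dataset-items` endpoint, which starts a run and returns the dataset items in the response. The sketch below only prepares the request using the Python standard library. The actor ID `username~actor-name` and the token are placeholders, not values from this README, and `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a POST to the run-sync-get-dataset-items endpoint without sending it."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    "username~actor-name",  # placeholder: replace with the real actor ID
    "<YOUR_APIFY_TOKEN>",   # placeholder: replace with your API token
    {
        "startUrls": [{"url": "https://apify.com"}],
        "prompt": "Summarize the key points of this page in bullet form.",
        "maxPages": 5,
    },
)

# Sending is commented out because it needs a real token and starts a billed run:
# with urllib.request.urlopen(request) as response:
#     items = json.loads(response.read())
```

Synchronous endpoints wait for the run to finish and are subject to a time limit, so larger crawls are probably better started asynchronously and read from the dataset afterwards.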
🔧 Customization Tips
- Chain multiple prompts or different prompts per domain
- Add login/authentication handlers for gated content
- Integrate NLP post‑processing (e.g. entity extraction)
- Output to alternate formats (PDF, DOCX, CSV)
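
Because the input schema takes a single `prompt` per run, one way to get different prompts per domain without modifying the actor is to split the start URLs into groups and submit one run per prompt. This is a sketch under that assumption: the domain names, prompts, and helper functions are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from hostname to prompt; adjust to your own sites.
PROMPTS_BY_DOMAIN = {
    "docs.example.com": "List the API endpoints described on this page.",
    "blog.example.com": "Summarize this post in three bullet points.",
}
DEFAULT_PROMPT = "Summarize the key points of this page in bullet form."


def prompt_for(url: str) -> str:
    """Pick the prompt for a URL by hostname, ignoring a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return PROMPTS_BY_DOMAIN.get(host, DEFAULT_PROMPT)


def group_by_prompt(urls: list[str]) -> dict[str, list[str]]:
    """Group start URLs so each group can be submitted as a separate run."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(prompt_for(url), []).append(url)
    return groups


groups = group_by_prompt([
    "https://docs.example.com/api",
    "https://blog.example.com/post-1",
    "https://blog.example.com/post-2",
    "https://example.org/",
])
```

Each key in `groups` becomes the `prompt` of one run and its value becomes that run's `startUrls`. With `enqueue` enabled, a crawl may follow links onto other domains, so this grouping only controls where each run starts.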
Happy scraping and AI‑driven insights! 🚀🔥