🔥fireScraper AI Prompt Website Content Markdown Scraper

Pricing: $15.00 / 1,000 prompt results


Developed by mohamed el hadi msaid

Maintained by Community


Rating: 5.0 (1 review)

Total users: 2
Monthly users: 1
Runs succeeded: >99%
Last modified: 2 days ago

🔥 fireScraper AI Prompt Website Content Markdown Scraper

Overview

fireScraper AI is an advanced web scraper built with Crawlee and Puppeteer. It crawls websites, extracts the meaningful content, converts it to Markdown, and then runs your custom prompt on the extracted text, making it ideal for generating enriched datasets, summaries, or analyses for LLMs and AI pipelines.


🎯 Features

  • 📝 Extracts visible text, full HTML, or both
  • 🔄 Applies your custom prompt (e.g. “Summarize this page”) to each page
  • 📝 Converts content to clean Markdown (see the sketch after this list)
  • 📸 Captures full‑page screenshots
  • 🌐 Supports proxy configurations (Apify Proxy, custom)
  • 🔗 Follows links for deep multi-page crawling
  • ⚙️ Easily extended for JS‑heavy sites, login flows, or custom selectors
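
The crawl → extract → convert step behind these features can be approximated in a few lines. A minimal sketch, assuming Puppeteer plus the turndown npm package; these library choices are assumptions, not necessarily the actor's actual internals:

```typescript
// Minimal sketch of the extract-and-convert step: render a page with
// Puppeteer, grab its HTML, and convert it to Markdown with Turndown.
// Library choices here are assumptions, not the actor's internals.
import puppeteer from 'puppeteer';
import TurndownService from 'turndown';

async function pageToMarkdown(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const html = await page.evaluate(() => document.body.innerHTML);
    // Turndown converts the rendered HTML into clean Markdown.
    return new TurndownService({ headingStyle: 'atx' }).turndown(html);
  } finally {
    await browser.close();
  }
}
```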

🛠️ Input Schema

```json
{
  "startUrls": [
    { "url": "https://apify.com" }
  ],
  "prompt": "Summarize the key points of this page in bullet form.",
  "maxPages": 5,
  "proxyConfig": {
    "useApifyProxy": true
  },
  "screenshot": true,
  "enqueue": true,
  "getText": false,
  "getHtml": false
}
```
| Field | Type | Description |
| --- | --- | --- |
| startUrls | Array | List of seed URLs to crawl. Required. |
| prompt | String | Custom instruction or question to run on each page’s extracted content. |
| maxPages | Integer | Maximum number of pages to visit. Default: 5. |
| proxyConfig | Object | Proxy settings (supports Apify Proxy). |
| screenshot | Boolean | Capture a screenshot of each page. Default: true. |
| enqueue | Boolean | Follow and enqueue new links found on each page. Default: true. |
| getText | Boolean | Extract only visible text content. Default: false. |
| getHtml | Boolean | Extract full raw HTML. Default: false. |
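
With those fields in place, a run can be started programmatically. A minimal sketch using the official apify-client package for Node.js; the actor ID is a placeholder you should copy from this Store page:

```typescript
// Run the actor from Node.js and fetch its results. <ACTOR_ID> is a
// placeholder; use the ID shown on this actor's Store page.
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// call() starts the run and waits for it to finish.
const run = await client.actor('<ACTOR_ID>').call({
  startUrls: [{ url: 'https://apify.com' }],
  prompt: 'Summarize the key points of this page in bullet form.',
  maxPages: 5,
  proxyConfig: { useApifyProxy: true },
  screenshot: true,
  enqueue: true,
});

// Each crawled page becomes one item in the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} pages`);
```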

✅ Output Format

Each page yields a JSON object with:

```json
{
  "url": "https://example.com",
  "title": "Example Page",
  "promptResult": "• Point one\n• Point two\n• Point three",
  "metadata": {
    "description": "An example page",
    "keywords": ["example", "page"]
  },
  "markdown": "# Example Page\n\nThis is the markdown content…",
  "textContent": "This is the visible text…",
  "htmlContent": "<html>…</html>",
  "screenshot": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUg…"
}
```
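
For downstream use, each item can be persisted directly. A small sketch, continuing from the items array fetched above, that writes the Markdown to disk and decodes the base64 screenshot; the interface and file-naming scheme are illustrative only:

```typescript
// Persist each result: one Markdown file per page, plus the screenshot
// decoded from its data URI. Naming is illustrative, not prescribed.
import { writeFileSync } from 'node:fs';

interface PageResult {
  url: string;
  title: string;
  promptResult: string;
  markdown: string;
  screenshot?: string;
}

for (const item of items as unknown as PageResult[]) {
  const slug = new URL(item.url).hostname.replace(/\W+/g, '-');
  writeFileSync(
    `${slug}.md`,
    `# ${item.title}\n\n${item.promptResult}\n\n---\n\n${item.markdown}`,
  );
  if (item.screenshot) {
    // Strip the "data:image/png;base64," prefix before decoding.
    const base64 = item.screenshot.split(',')[1];
    writeFileSync(`${slug}.png`, Buffer.from(base64, 'base64'));
  }
}
```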

🚀 Use Cases

  • LLM Dataset Creation: Collect and pre‑process web content into Markdown, then run custom prompts to generate training samples, summaries, or question‑answer pairs.

  • Automated Content Summaries: Crawl blogs, news sites, or documentation to produce concise, prompt‑driven summaries for research or reporting.

  • SEO & Content Audits: Extract headings, metadata, and full text, then prompt your model to analyze keyword usage and readability and to suggest improvements.

  • Knowledge Base Generation: Pull FAQs, tutorials, or API docs and transform them into structured Markdown plus AI‑enriched annotations for internal wikis or help desks.

  • Competitive Intelligence: Scrape competitor sites at scale, then run custom prompts to highlight feature comparisons, pricing tables, or sentiment across pages.


🚀 How to Run

  1. Deploy the actor on Apify or run it locally.
  2. Configure startUrls, prompt, and the other options via the UI or API (see the REST example after these steps).
  3. Click Run and monitor the logs in real time.
  4. Download the dataset as JSON or Markdown for downstream use.
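
If you would rather not use a client library, the same run can be triggered over Apify's generic REST API: the run-sync-get-dataset-items endpoint starts the run, waits for it to finish, and returns the dataset items in one request. <ACTOR_ID> is again a placeholder for this actor's ID:

```typescript
// One-shot run over the REST API: start the actor, wait for completion,
// and receive the dataset items directly in the response body.
const res = await fetch(
  `https://api.apify.com/v2/acts/<ACTOR_ID>/run-sync-get-dataset-items` +
    `?token=${process.env.APIFY_TOKEN}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      startUrls: [{ url: 'https://example.com' }],
      prompt: 'Summarize the key points of this page in bullet form.',
      maxPages: 5,
    }),
  },
);
const items = await res.json(); // array of page objects, as shown above
```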

🔧 Customization Tips

  • Chain multiple prompts, or use different prompts per domain (see the sketch after this list)
  • Add login/authentication handlers for gated content
  • Integrate NLP post‑processing (e.g. entity extraction)
  • Output to alternate formats (PDF, DOCX, CSV)
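
As an example of the first tip, a per-domain prompt can be selected before each page is processed. A hypothetical sketch; none of these names come from the actor itself:

```typescript
// Hypothetical helper for "different prompts per domain": look up the
// prompt by hostname, falling back to a generic default.
const PROMPTS_BY_DOMAIN: Record<string, string> = {
  'docs.example.com': 'List every API endpoint mentioned on this page.',
  'blog.example.com': 'Summarize this post in three bullet points.',
};
const DEFAULT_PROMPT = 'Summarize the key points of this page.';

function promptFor(url: string): string {
  return PROMPTS_BY_DOMAIN[new URL(url).hostname] ?? DEFAULT_PROMPT;
}

console.log(promptFor('https://blog.example.com/launch')); // per-domain prompt
```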

Happy scraping and AI‑driven insights! 🚀🔥