Scrape Website To Llm Dataset — Data, Details & Metadata

Pricing

Pay per usage

Scrape websites into LLM-ready datasets at scale with this Apify actor. It extracts page data, details, and metadata with automatic pagination and proxy rotation, for uses such as market research and competitive intelligence.

Rating

0.0

(0)

Developer

Donny Nguyen

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 19 hours ago

Website to LLM Dataset

Crawl any website and export clean markdown or plain text optimized for LLM training, RAG pipelines, and AI knowledge bases.

Features

  • Strips navigation, ads, footers, scripts, and clutter
  • Converts HTML to clean markdown with headings, paragraphs, lists, code blocks, and links
  • Enqueues same-domain links until the maxPages limit is reached
  • Outputs word count and character count metadata
  • Supports markdown, text, and json output formats
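
The cleaning and conversion steps listed above can be illustrated with a minimal sketch using only Python's standard library. This is not the actor's actual implementation: the SKIP_TAGS list, the MarkdownExtractor class, and the html_to_markdown function are hypothetical names chosen for illustration, and only headings, paragraphs, and list items are handled.

```python
from html.parser import HTMLParser

# Tags whose whole subtree is dropped (an assumed list, not the actor's own).
SKIP_TAGS = {"nav", "footer", "script", "style", "aside", "header"}
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class MarkdownExtractor(HTMLParser):
    """Collects headings, paragraphs, and list items as markdown lines."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped subtree
        self.prefix = None    # markdown prefix of the block being read
        self.buffer = []      # text fragments of the current block
        self.lines = []       # finished markdown lines

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            if tag in HEADING_LEVELS:
                self._open("#" * HEADING_LEVELS[tag] + " ")
            elif tag == "p":
                self._open("")
            elif tag == "li":
                self._open("- ")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and (tag in HEADING_LEVELS or tag in ("p", "li")):
            self._close()

    def handle_data(self, data):
        # Keep text only when it belongs to an open, non-skipped block.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)

    def _open(self, prefix):
        self._close()
        self.prefix = prefix

    def _close(self):
        if self.prefix is not None:
            # Collapse runs of whitespace inside the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
        self.prefix = None
        self.buffer = []


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._close()  # flush a block left open by unclosed markup
    return "\n\n".join(parser.lines)
```

For example, html_to_markdown("<nav>Menu</nav><h1>Title</h1><p>Hello <b>world</b>.</p>") drops the navigation text and keeps the heading and paragraph. A production converter would also need to handle links, code blocks, and nested lists.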

Input

{
  "urls": ["https://docs.stripe.com"],
  "maxPages": 10,
  "outputFormat": "markdown"
}
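
To show how urls and maxPages might bound a crawl, here is a rough sketch of a breadth-first frontier restricted to the seed hosts. It is an illustration under stated assumptions, not the actor's code: crawl_order and get_links are hypothetical names, "same domain" is taken to mean an identical host (netloc), and maxPages is taken to cap the total number of pages visited. The get_links callable is injected so the sketch runs without network access.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl_order(urls, max_pages, get_links):
    """Return the pages visited breadth-first, at most max_pages of them.

    get_links is a hypothetical callable mapping a page URL to the raw
    hrefs found on it; a real crawler would fetch and parse the page.
    """
    # Assumption: "same domain" means the same host as one of the seeds.
    allowed_hosts = {urlparse(u).netloc for u in urls}
    queue = deque(urls)
    seen = set(urls)
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for href in get_links(page):
            # Resolve relative links and drop #fragments.
            link, _ = urldefrag(urljoin(page, href))
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Tracking seen URLs at enqueue time keeps the queue free of duplicates, so the page cap counts distinct pages only.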

Output

Each page produces a dataset item with:

  • url - The page URL
  • title - The page title
  • content - Clean markdown/text content
  • wordCount - Number of words
  • charCount - Number of characters
  • crawledAt - ISO timestamp
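
The fields above could be assembled along these lines. This is a sketch only: build_item is a hypothetical helper, words are assumed to be whitespace-separated tokens, characters are counted on the cleaned content, and the timestamp is assumed to be UTC. The actor's own counting rules may differ.

```python
from datetime import datetime, timezone


def build_item(url, title, content):
    """Build one dataset item with the fields listed above."""
    return {
        "url": url,
        "title": title,
        "content": content,
        # Assumption: a word is any whitespace-separated token.
        "wordCount": len(content.split()),
        "charCount": len(content),
        # ISO 8601 timestamp in UTC (assumed timezone).
        "crawledAt": datetime.now(timezone.utc).isoformat(),
    }
```

Under these rules, markdown markers such as "#" count as words, which is worth keeping in mind when filtering pages by wordCount.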

Built By Donny Dev