Scrape Website To Llm Dataset — Data, Details & Metadata
Pricing
Pay per usage

Scrape websites into an LLM-ready dataset at scale with this Apify actor. It extracts page content, details, and metadata, with automatic pagination and proxy rotation. Intended uses include market research, competitive intelligence, and data-driven decision making.
Developer: Donny Nguyen
Maintained by Community
Website to LLM Dataset
Crawl any website and export clean markdown or plain text optimized for LLM training, RAG pipelines, and AI knowledge bases.
Features
- Strips navigation, ads, footers, scripts, and clutter
- Converts HTML to clean markdown with headings, paragraphs, lists, code blocks, and links
- Enqueues links from the same domain up to maxPages
- Outputs word count and character count metadata
- Supports markdown, text, and json output formats
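The same-domain enqueueing behavior listed above can be pictured with a small sketch. This is not the actor's actual code: the function name `same_domain_links`, the `seen` set, and the rule that "same domain" means an identical hostname are all assumptions made for illustration.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def same_domain_links(page_url, hrefs, seen, max_pages):
    """Return new same-host URLs to enqueue without exceeding max_pages.

    Hypothetical helper: `seen` holds every URL already crawled or queued,
    and "same domain" is assumed to mean an identical hostname.
    """
    host = urlparse(page_url).netloc.lower()
    queued = []
    for href in hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parsed.netloc.lower() != host:
            continue  # skip links that leave the starting host
        if absolute in seen:
            continue  # already crawled or queued
        if len(seen) >= max_pages:
            break  # page budget exhausted
        seen.add(absolute)
        queued.append(absolute)
    return queued
```

Tracking the budget through the shared `seen` set keeps the limit global across pages rather than per page, which is one plausible reading of maxPages.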
Input
{
  "urls": ["https://docs.stripe.com"],
  "maxPages": 10,
  "outputFormat": "markdown"
}
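One way to build this input and start a run from Python is sketched below, assuming the third-party `apify-client` package is installed. The actor ID is not given on this page, so `actor_id` is a placeholder you must supply, and the validation rules in `build_run_input` (a hypothetical helper) are assumptions based on the feature list, not the actor's documented schema.

```python
ALLOWED_FORMATS = {"markdown", "text", "json"}  # taken from the feature list


def build_run_input(urls, max_pages=10, output_format="markdown"):
    """Build the actor input dict; the checks here are assumed, not official."""
    if not urls:
        raise ValueError("at least one start URL is required")
    if not isinstance(max_pages, int) or max_pages < 1:
        raise ValueError("maxPages must be a positive integer")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    return {"urls": list(urls), "maxPages": max_pages, "outputFormat": output_format}


def run_actor(token, actor_id, run_input):
    """Start the actor, wait for it to finish, and return its dataset items.

    `actor_id` is a placeholder: look up the real ID in Apify Store.
    """
    from apify_client import ApifyClient  # third-party: pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would look like `run_actor(token, actor_id, build_run_input(["https://docs.stripe.com"]))`, with both `token` and `actor_id` coming from your own Apify account.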
Output
Each page produces a dataset item with:
- url: The page URL
- title: The page title
- content: Clean markdown/text content
- wordCount: Number of words
- charCount: Number of characters
- crawledAt: ISO timestamp
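As a rough illustration of how these items might feed a RAG or training pipeline, the sketch below converts them to JSON Lines and drops very short pages. The helper name `to_jsonl`, the `min_words` threshold, and the output keys `text`, `source`, and `title` are assumptions for this example; only the six input field names come from the list above.

```python
import json

# Field names as listed in the Output section.
REQUIRED_FIELDS = ("url", "title", "content", "wordCount", "charCount", "crawledAt")


def to_jsonl(items, min_words=1):
    """Convert dataset items to JSON Lines, skipping pages below min_words.

    Hypothetical helper: the output record shape is an illustrative choice.
    """
    lines = []
    for item in items:
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise KeyError(f"item missing fields: {missing}")
        if item["wordCount"] < min_words:
            continue  # drop near-empty pages
        record = {"text": item["content"], "source": item["url"], "title": item["title"]}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Filtering on wordCount is a cheap way to exclude stub pages before they reach an index or training set.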