Url To Llm Dataset
Pay per usage
consummate_mandala/url-to-llm-dataset
Rating: 0.0 (0)
Developer: Donny Nguyen
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 2 days ago
Categories: AI
consummate_mandala/pdf-to-text-extractor
dusan.vystrcil/llm-dataset-processor
Lets you process the output of other Actors or a stored dataset with a single LLM prompt. Useful when you need to enrich data, summarize content, extract specific information, or manipulate data in a structured way using AI.
Dušan Vystrčil
128
logiover/sitemap-to-url-crawler
Instantly extract all public URLs from any website's sitemap.xml recursively. Handles nested sitemap indexes automatically. The fastest & cheapest way to build URL lists for RAG pipelines, LLM training, and SEO audits. Zero-config & blazing fast.
Logiover
18
hello.datawizards/opentable-urls-script
Opentable-Urls Script extracts rich restaurant data from OpenTable pages, including menus, images, ratings, location, cuisine, and pricing. Ideal for food apps, analytics, travel platforms, and AI datasets, delivering clean, structured JSON output with proxy support.
datawizards
6
tropical_quince/website-to-llm-dataset
Scrape website-to-LLM-dataset data at scale with this Apify Actor. Extracts data, details, and metadata with automatic pagination and proxy rotation. Suited to market research, competitive intelligence, and data-driven decision making.
mohamedgb00714/fireScraper-AI-Website-Content-Markdown-Scraper
Advanced web scraper powered by Crawlee and Puppeteer — extracts website content, converts it to Markdown, and structures it for LLM training datasets.
mohamed el hadi msaid
257
3.8
(3)
datascoutapi/web-scraper
Web Scraper Pro extracts clean structured data for LLMs/RAG. Browser-based, 10x faster with anti-detection bypassing Cloudflare/CAPTCHA & proxy rotation. Bulk/recursive crawl 50k URLs at 500 pages/min. JSON/CSV/API, free tier.
halam
13
scrapier/google-news-scraper
Pull fresh news coverage from Google News with reliable scraping. Extract article metadata, summaries, sources, and URLs for trend analysis or reporting workflows. Designed for content teams, researchers, and automation pipelines.
Scrapier
3
fiery_dream/ai-training-data-enricher
Production-grade data enrichment and validation for LLM training datasets. Automatically clean, enrich, deduplicate, and validate your AI training data before fine-tuning.
Cody Churchwell
lenient_grove/RAG-Spider
Enterprise-grade web crawler that converts messy websites into clean, chunked Markdown for AI systems. Uses Mozilla Readability for 95% cleaner extraction than competitors. Outputs RAG-ready data with metadata and token estimates. Perfect for building knowledge bases and training AI chatbots.
Tejas Rawool
8
5.0
(1)