AI-Ready Website Crawler
Pricing
Pay per usage
Crawl websites and convert to clean markdown for AI/RAG, LLM fine-tuning, and document pipelines.
The primary URL to start crawling from. The crawler will follow links within the same domain.
Optional list of additional URLs to crawl. Each URL will be crawled independently.
Default: []
Maximum number of pages to crawl. Set to 0 for unlimited (not recommended).
Maximum link depth to follow from the start URL. Depth 0 means only the start URL itself.
Maximum number of requests per second. Lower values are more polite to target servers.
Whether to respect robots.txt rules. Strongly recommended to keep enabled.
Only crawl URLs matching these regex patterns. Leave empty to crawl all URLs on the same domain.
Default: []
Skip URLs matching these regex patterns. Common exclusions: login pages, API endpoints, media files.
Default: ["\\.(pdf|zip|tar|gz|mp4|mp3|avi|mov|wmv|jpg|jpeg|png|gif|svg|ico|woff|woff2|ttf|eot)$", "/api/", "/login", "/logout", "/signin", "/signup", "/auth/"]
CSS selectors for elements to remove before converting to markdown. Defaults remove nav, footer, ads, etc.
Default: ["nav", "footer", "header", "aside", ".sidebar", ".nav", ".navigation", ".menu", ".footer", ".header", ".advertisement", ".ad", ".ads", ".social-share", ".cookie-banner", ".cookie-consent", ".popup", ".modal", ".breadcrumb", ".pagination", "#comments", ".comments", "script", "style", "noscript", "iframe", "svg"]
CSS selectors to target main content. If specified, only content within these selectors is extracted. Leave empty to auto-detect.
Default: []
Timeout for each HTTP request in seconds.
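
The include and exclude pattern fields described above can be illustrated with a small sketch. This is not the crawler's actual code. The helper name should_crawl, the exact-hostname reading of "same domain", and the rule that exclusions are checked before inclusions are all assumptions made for illustration. The exclude list is copied from the defaults shown above, with the JSON-escaped "\\." written as the regex \. in a Python raw string.

```python
import re
from urllib.parse import urlparse

# Default exclude patterns from the field description above
# (the JSON string "\\." corresponds to the regex \.).
EXCLUDE_PATTERNS = [
    r"\.(pdf|zip|tar|gz|mp4|mp3|avi|mov|wmv|jpg|jpeg|png|gif|svg|ico|woff|woff2|ttf|eot)$",
    r"/api/",
    r"/login",
    r"/logout",
    r"/signin",
    r"/signup",
    r"/auth/",
]


def should_crawl(url, start_url, include_patterns=(), exclude_patterns=EXCLUDE_PATTERNS):
    """Hypothetical helper: decide whether a discovered URL would be crawled."""
    # Assumption: "same domain" means an exact hostname match with the start URL.
    if urlparse(url).hostname != urlparse(start_url).hostname:
        return False
    # Assumption: exclusions are checked first and win over inclusions.
    if any(re.search(pattern, url) for pattern in exclude_patterns):
        return False
    # An empty include list allows every remaining same-domain URL.
    if not include_patterns:
        return True
    return any(re.search(pattern, url) for pattern in include_patterns)


if __name__ == "__main__":
    start = "https://example.com/docs/"
    for candidate in (
        "https://example.com/docs/intro",
        "https://example.com/files/manual.pdf",
        "https://example.com/login?next=/",
        "https://other.org/docs/intro",
    ):
        print(candidate, should_crawl(candidate, start))
```

Patterns are applied with re.search, so "/api/" matches anywhere in the URL, while the file-extension pattern only matches at the end because of its trailing $. A URL such as manual.pdf?download=1 would therefore slip past the default extension rule in this sketch.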