
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Rating: 4.6 (38)
Total users: 48.1k
Monthly users: 6.8k
Runs succeeded: >99%
Response time: 5 days
Last modified: 18 hours ago
Can we limit crawling to just the initial website?
Closed
I have a list of URLs I'm crawling, and the crawler is following links out to other websites. I want to get all the blog articles from each of those URLs, but not go out to websites that aren't in my list.
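The restriction asked for here amounts to a same-host check on every discovered link: keep a link only if its host matches the host of one of the start URLs. The sketch below illustrates that idea in plain Python. `host_of` and `is_within_start_sites` are hypothetical helpers written for this example, not options or functions of the Actor, and treating `www.example.org` and `example.org` as the same site is an assumption.

```python
from urllib.parse import urlparse

def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host

def is_within_start_sites(url: str, start_urls: list[str]) -> bool:
    """True if `url` is on the same host as one of the start URLs (hypothetical helper)."""
    allowed = {host_of(u) for u in start_urls}
    return host_of(url) in allowed

start_urls = ["https://blog.example.com/", "https://www.example.org/news/"]
print(is_within_start_sites("https://blog.example.com/posts/hello", start_urls))  # True
print(is_within_start_sites("https://twitter.com/example", start_urls))  # False
```

Comparing hosts rather than full URL prefixes keeps every page of each listed site in scope while dropping links to unrelated domains.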
Hello,
Yes, you can absolutely do this. To scrape only the start URLs, set Crawler settings > Max crawling depth to 0. This way, the Actor visits only the pages from the input and does not follow any links.
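To make the depth semantics concrete, here is a toy breadth-first crawl over an in-memory link graph. It is a sketch of the general technique, not the Actor's implementation. The `run_input` dict assumes that the "Max crawling depth" setting corresponds to an input field named `maxCrawlDepth` and that start URLs are given as `{"url": ...}` objects; the URLs and the `LINKS` graph are made up for illustration.

```python
from collections import deque

# Assumed run input: "maxCrawlDepth" is taken to back the UI's
# "Max crawling depth" setting. URLs are made-up examples.
run_input = {
    "startUrls": [{"url": "https://blog.example.com/"}],
    "maxCrawlDepth": 0,
}

# Toy link graph standing in for real pages: page -> links found on it.
LINKS = {
    "https://blog.example.com/": [
        "https://blog.example.com/post-1",
        "https://other-site.com/",
    ],
    "https://blog.example.com/post-1": ["https://blog.example.com/post-2"],
}

def crawl(start_urls, max_depth):
    """Breadth-first crawl that never follows links deeper than max_depth."""
    visited = []
    seen = set(start_urls)
    queue = deque((url, 0) for url in start_urls)
    while queue:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # at the depth limit: keep the page, skip its links
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited

starts = [s["url"] for s in run_input["startUrls"]]
print(crawl(starts, run_input["maxCrawlDepth"]))  # ['https://blog.example.com/']
print(crawl(starts, 1))
```

With depth 0 only the start page is returned. In this toy model, depth 1 already reaches `other-site.com` as well as `post-1`, so a depth limit and a same-site restriction control different things.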
Does this solve your issue, or is there something more to it? Feel free to ask more questions, or close this issue.
Cheers!