Website Content Crawler

Pricing: Pay per usage

Developed and maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Can we limit crawling to just the initial website?

Closed

bilingual_guild opened this issue a year ago

I have a list of URLs that I'm crawling, and the crawler is going to other websites. I want to get all the blog articles from each of those URLs, but not wander off to websites that are not in my list.

jindrich.bar

Hello,

Yes, you can absolutely do this. To scrape only the start URLs, set Crawler settings > Max crawling depth to 0. The Actor will then visit only the pages from the input and will not follow any links.
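If you start the Actor from code rather than from the Console, the same setting can be passed in the run input. The sketch below assumes that the "Max crawling depth" field corresponds to the `maxCrawlDepth` input property and that start URLs are passed as `{"url": ...}` objects in `startUrls`. It uses the official `apify-client` Python package. The helper names `build_run_input` and `run_crawler` are hypothetical, and the example URLs are placeholders.

```python
import os
from typing import Any


def build_run_input(start_urls: list[str], max_crawl_depth: int = 0) -> dict[str, Any]:
    """Build a run input for Website Content Crawler.

    Assumption: the "Max crawling depth" setting maps to the `maxCrawlDepth`
    input property; a depth of 0 means only the start URLs are visited.
    """
    if max_crawl_depth < 0:
        raise ValueError("max_crawl_depth must be >= 0")
    return {
        # Each start URL is passed as an object with a "url" key.
        "startUrls": [{"url": url} for url in start_urls],
        # 0 = do not follow any links found on the start pages.
        "maxCrawlDepth": max_crawl_depth,
    }


def run_crawler(run_input: dict[str, Any], token: str) -> list[dict[str, Any]]:
    """Start the Actor and return the dataset items (needs network access)."""
    # Imported lazily so build_run_input() works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("apify/website-content-crawler").call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    run_input = build_run_input([
        "https://example.com/blog",
        "https://example.org/articles",
    ])
    print(run_input)
    token = os.environ.get("APIFY_TOKEN")
    if token:  # only call the Apify API when a token is configured
        for item in run_crawler(run_input, token):
            print(item.get("url"))
```

Raising `max_crawl_depth` above 0 lets the crawler follow links again, one level of links per unit of depth.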

Does this solve your issue, or is there something more to it? Feel free to ask more questions, or close this issue.

Cheers!