
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.6 (38)
1.1k
Monthly users: 6k
Runs succeeded: >99%
Response time: 2.3 days
Last modified: 7 days ago
Block Detection and Proxy IP Session Rotation
Hey, we've noticed that when we have multiple start URLs, the first one gets scraped successfully, but the others get blocked.
We use:
crawlerType of playwright:adaptive (default).
proxyConfiguration of {"useApifyProxy":true} (default).
maxSessionRotations of 10 (default).
maxConcurrency of 1.
Could it be that maxSessionRotations is not working as expected?
Is there a way to force the Actor to rotate IPs prior to scraping each URL?
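For reference, the settings listed above correspond to an Actor input roughly like the one below. This is only a sketch: the start URLs are placeholders, and the `startUrls` field name and its list-of-objects shape are assumptions about the Actor's input schema, not something quoted in this thread.

```python
import json

# Placeholder start URLs (hypothetical, for illustration only).
start_urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
]

# Input mirroring the settings described in the question.
# `startUrls` and its {"url": ...} shape are assumptions, not quoted from the thread.
actor_input = {
    "startUrls": [{"url": u} for u in start_urls],
    "crawlerType": "playwright:adaptive",           # default
    "proxyConfiguration": {"useApifyProxy": True},  # default
    "maxSessionRotations": 10,                      # default
    "maxConcurrency": 1,
}

print(json.dumps(actor_input, indent=2))
```

Printing the input as JSON makes it easy to paste into the Actor's JSON input editor or attach to an issue report.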

Hi, thank you for using the Website Content Crawler.
Website Content Crawler (WCC) is a one-size-fits-all tool that passively bypasses CAPTCHAs by avoiding triggers, with no clicking or puzzle solving. Still, some sites like Walmart need special treatment. Check out our Walmart Scrapers in the Apify Store for those.
Let me know if you need help!
Jakub Kopecky
oren_clearya
Thanks Jakub,
I manage to scrape Walmart just fine with the apify/website-content-crawler Actor.
But my point here is that if I have multiple start URLs, it only succeeds with the first one and then gets blocked.
So it seems as if the Actor doesn't rotate the proxy IPs according to the maxSessionRotations property.
Is that the expected behavior?

Hey,
Glad to hear you successfully scraped Walmart. Yes, that’s expected - the crawler doesn’t recognize Walmart’s CAPTCHA response as a block, so it doesn’t rotate the session.
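To illustrate why an unrecognized CAPTCHA page never triggers a rotation, here is a toy model in Python. It is not the Actor's actual code: the `Response` class, the `looks_blocked` and `crawl` helpers, the set of blocked status codes, and the premise that the CAPTCHA page arrives with HTTP 200 are all assumptions made for the sketch.

```python
from dataclasses import dataclass

# Status codes this toy model treats as a block (an assumption for illustration).
BLOCKED_STATUS_CODES = {401, 403, 429}


@dataclass
class Response:
    status: int
    body: str


def looks_blocked(response: Response) -> bool:
    """Status-code-only check; it never inspects the page body."""
    return response.status in BLOCKED_STATUS_CODES


def crawl(responses: list[Response], max_session_rotations: int = 10) -> dict:
    """Walk through canned responses, rotating the session only on a detected block."""
    rotations = 0
    captcha_pages_stored = 0
    for response in responses:
        if looks_blocked(response) and rotations < max_session_rotations:
            rotations += 1  # a real crawler would retry with a new session/IP here
            continue
        if "captcha" in response.body.lower():
            captcha_pages_stored += 1  # treated as a normal page, so no rotation
    return {"rotations": rotations, "captcha_pages_stored": captcha_pages_stored}


# First URL succeeds; later ones return a CAPTCHA page with HTTP 200 (assumed).
simulated = [
    Response(200, "<html>product page</html>"),
    Response(200, "<html>Please solve this CAPTCHA</html>"),
    Response(200, "<html>Please solve this CAPTCHA</html>"),
]
result = crawl(simulated)
print(result)
```

In this model the maxSessionRotations budget is never touched, because nothing is ever classified as a block. The CAPTCHA pages are simply stored as if they were ordinary content.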
Closing this issue, feel free to reopen if needed.
Jakub
Pricing
Pricing model
Pay per usage
This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.