
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.0 (40)
1392
Total users: 53K
Monthly users: 7.9K
Runs succeeded: >99%
Issues response: 6.8 days
Last modified: 4 days ago
Getting 403 from public page
Open
Until recently, when we ran this for the source, it worked without any issue. Now the crawler is being blocked, even with a residential proxy.
Hello, and thank you for your interest in this Actor!
You're getting blocked by Cloudflare Bot Management — their current settings appear to be quite strict, which is why WCC isn't able to get through anymore, even with residential proxies. Unfortunately, there’s not much that can be done directly in this case with your current setup.
I’d recommend trying our new Camoufox Scraper, which is specifically designed to handle these types of challenges. You can check out my example run here - it seems to have bypassed the bot filter and scraped the full content of the target page. Let me know if you need help setting it up.
I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!
formidable_quagmire
Yes, we will definitely need help setting up a couple of things in the new scraper. We use a lot of WCC features, such as the CSS remover, file download, and Markdown. Will these still work? If yes, I can't seem to find documentation for them; if you could share it, that would be great.
formidable_quagmire
Could you build "Camoufox Scraper" into WCC, or at least provide similar input and output parity with WCC?