Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
I cannot pass Basic authentication. I tried setting the headers in startUrls, but the result is a 401 error.
When I make a request from the terminal using curl with the same settings in -H, I can get the result. Why is this happening?
Hello and thank you for your interest in this Actor!
Could you please share the link to the run where this happened? Or, if it contains sensitive information, at least the Run ID?
Without a reproduction scenario, we cannot provide much support.
I'll be looking forward to your response. Cheers!
Hello - just letting you know that we've released a new version of Website Content Crawler (0.3.37) in which header passing is fixed: the Actor now correctly processes the HTTP headers, methods, and payloads that are passed to it.
I'll close this issue now, but feel free to ping us in case of any other issues or questions.
Cheers!
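For anyone landing on this thread with the same 401, here is a minimal sketch of building a Basic authentication header and attaching it to a start URL. The helper names `basic_auth_header` and `build_actor_input` are hypothetical, and the input shape (a `headers` object inside each `startUrls` entry) is an assumption based on the discussion above, so check it against the Actor's input schema before relying on it.

```python
import base64
import json


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header.

    The credentials are joined with a colon and Base64-encoded,
    which is the same value curl sends for `-u username:password`.
    """
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_actor_input(url: str, username: str, password: str) -> dict:
    """Build a hypothetical Actor input with per-request headers.

    Assumption: each startUrls entry accepts `url`, `method`,
    and `headers`, as implied by the thread above.
    """
    return {
        "startUrls": [
            {
                "url": url,
                "method": "GET",
                "headers": {
                    "Authorization": basic_auth_header(username, password),
                },
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder URL and credentials, for illustration only.
    actor_input = build_actor_input("https://example.com/private", "user", "pass")
    print(json.dumps(actor_input, indent=2))
```

If the header value produced here matches the one that works with curl and the run still returns 401, it is worth confirming that the run uses version 0.3.37 or later.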
- 3.8k monthly users
- 544 stars
- 99.9% runs succeeded
- 3.4 days response time
- Created in Mar 2023
- Modified 1 day ago