Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Hi, I'm trying to use a proxy from scrapingbee.com, but none of my requests are processed because of SSL errors when connecting to the proxy (a test via "curl -k" works). In the "Apify Integration" section of the scrapingbee.com manual, they recommend enabling the "Ignore SSL errors" checkbox, but I don't see it in the Actor settings.
Hello, and thank you for your interest in the Actor! You are right that there is currently no way to do this with Website Content Crawler. We will look into this and let you know here once it is addressed.
Is there a way to clone the "Website Content Crawler" Docker image so we can make some changes to the code? We really need this feature.
Unfortunately, the package is not open source, so you cannot modify the code. We will add this, but I cannot make any promises now. In the meantime, you can use the Apify proxy, which is optimized for this use case.
I’ll go ahead and close this issue now, but please feel free to ask additional questions or raise a new issue.
Actor Metrics
3.9k monthly users
711 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 4 hours ago