Website Content Crawler

Pricing: Pay per usage

Developed by Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

3.7 (41)

1526

Total users: 59K

Monthly users: 7.8K

Runs succeeded: >99%

Issues response: 7.6 days

Last modified: 4 days ago


How to ignore broken SSL when using PROXY

Closed

sash2s opened this issue 10 months ago

Hi, I'm trying to use a proxy from scrapingbee.com, but none of my requests get processed because of SSL errors when connecting to the proxy (testing with "curl -k" works). In the "Apify Integration" section of the scrapingbee.com documentation, they recommend enabling the "Ignore SSL errors" checkbox, but I don't see it in the Actor settings.
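For context, "ignoring SSL errors" means the client skips TLS certificate verification, which is what "curl -k" does. A proxy that re-encrypts HTTPS traffic with its own certificate will fail that verification otherwise; whether the ScrapingBee proxy works this way is an assumption here, not something the thread confirms. The sketch shows the equivalent client-side setting in plain Python using only the standard library. It does not change Website Content Crawler's behaviour, and the proxy URL and the helper names `make_insecure_context` and `build_proxy_opener` are hypothetical placeholders.

```python
import ssl
import urllib.request

# Hypothetical proxy URL: replace host, port and credentials with the values
# your proxy provider gives you. This is NOT a real ScrapingBee endpoint.
PROXY_URL = "http://USERNAME:PASSWORD@proxy.example.com:8080"


def make_insecure_context() -> ssl.SSLContext:
    """Return a TLS context that skips certificate checks, like `curl -k`."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be set to CERT_NONE,
    # otherwise Python raises ValueError.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Route HTTP and HTTPS through the proxy and ignore TLS certificate errors."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # HTTPS requests tunnel through the proxy via CONNECT; the TLS handshake
    # then uses the insecure context, so an untrusted certificate is accepted.
    https_handler = urllib.request.HTTPSHandler(context=make_insecure_context())
    return urllib.request.build_opener(proxy_handler, https_handler)


if __name__ == "__main__":
    opener = build_proxy_opener(PROXY_URL)
    # Example request (needs a working proxy, so it is commented out here):
    # with opener.open("https://example.com", timeout=30) as resp:
    #     print(resp.status)
```

Disabling verification removes protection against man-in-the-middle attacks, so this setting only makes sense with a proxy you trust.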

janbuchar

Hello, and thank you for your interest in the Actor! You are right that there is currently no way to do this with Website Content Crawler. We will look into it and let you know here once it is addressed.


sash2s

10 months ago

Is there a way to clone the "Website Content Crawler" Docker image and modify the code ourselves? We really need this feature.

janbuchar

Unfortunately, the package is not open source, so you cannot modify the code. We intend to add this option, but I cannot make any promises about timing right now. In the meantime, you can use Apify Proxy, which is optimized for this use case.

jiri.spilka

I’ll go ahead and close this issue now, but please feel free to ask additional questions or raise a new issue.