Website Content Crawler

Pricing

Pay per usage

apify/website-content-crawler

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.6 (38)

1.1k

Monthly users

6k

Runs succeeded

>99%

Response time

2.3 days

Last modified

7 days ago

We can't scrape that website because it reports an SSL certificate error. Can you please fix it?

Closed
anthony.quinn opened this issue
17 days ago

The error page.goto: SEC_ERROR_UNKNOWN_ISSUER indicates an issue with the SSL certificate of the target website. Specifically, the browser (Firefox in this case) doesn't trust the certificate authority (CA) that signed the website's SSL certificate. This can happen for several reasons:

We can't scrape that website because it reports an SSL certificate error. Can you please fix it? The URL is https://www.fincen.gov/resources
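To check whether the certificate chain itself is the problem, independently of the crawler, a TLS handshake can be attempted with certificate verification enabled. The following is a minimal sketch using only the Python standard library. The helper name check_certificate is hypothetical, and Python verifies against the system trust store rather than Firefox's own store, so the result may differ from what the browser reports.

```python
import socket
import ssl


def check_certificate(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return "ok" if the TLS handshake verifies, otherwise a short error description.

    Hypothetical helper for illustration; it is not part of the Actor.
    """
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # Raised for untrusted issuers, expired certificates, hostname mismatches, etc.
        return f"certificate verification failed: {exc.verify_message}"
    except OSError as exc:
        # DNS failures, refused connections, timeouts, and other TLS errors.
        return f"connection failed: {exc}"


# Example (requires network access):
# print(check_certificate("www.fincen.gov"))
```

A "certificate verification failed" result points at the site's certificate chain, while "ok" suggests the problem was temporary or specific to the browser's trust store.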

jakub.kopecky

Hey, and thank you for using Website Content Crawler! The issue is with the site's SSL certificate. I created an issue for this, and we should add the ignoreSslErrors option (or similar) in the future. I will inform you once it is done.

jakub.kopecky

In the meantime, you can switch to a different crawlerType, such as the raw HTTP Cheerio crawler or the experimental JSDOM crawler, which also supports rendering JavaScript. Based on my testing, this should work for this site. Please see my test Actor run: https://console.apify.com/view/runs/3NljAPSvl0BiWLEdt
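As an illustration of the workaround above, here is a minimal sketch that builds an Actor input with a non-browser crawlerType. The crawlerType field name comes from this thread; the startUrls field and the value strings "playwright:firefox", "cheerio", and "jsdom" are assumptions that should be checked against the Actor's input schema. The helper build_run_input is hypothetical.

```python
import json

# Assumed crawlerType values; verify them against the Actor's input schema.
KNOWN_CRAWLER_TYPES = {"playwright:firefox", "cheerio", "jsdom"}


def build_run_input(start_url: str, crawler_type: str = "cheerio") -> dict:
    """Build an input object for the Actor (hypothetical helper, not an official API)."""
    if crawler_type not in KNOWN_CRAWLER_TYPES:
        raise ValueError(f"unknown crawler type: {crawler_type!r}")
    return {
        # "startUrls" is an assumed field name for the list of pages to crawl.
        "startUrls": [{"url": start_url}],
        # Raw HTTP crawling avoids the browser's certificate handling.
        "crawlerType": crawler_type,
    }


if __name__ == "__main__":
    run_input = build_run_input("https://www.fincen.gov/resources")
    print(json.dumps(run_input, indent=2))
```

The resulting object can be pasted into the Actor's JSON input editor or passed as the run input through the Apify API.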

jakub.kopecky

Unlike Chromium, the Firefox browser currently does not support ignoring SSL certificate errors. Please use another crawler type, as I mentioned in my previous comment.

I'm closing this issue. If you encounter any other problem, feel free to reopen it.

jakub.kopecky

@jindrich.bar pointed out that crawling currently works, so it may have been only an outage on the site's end. Firefox should work for you right now. If it stops working again, you can switch to one of the other crawler types.
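The fallback described above can be sketched as a small loop that tries the Firefox crawler first and then switches to the other crawler types. This is only an illustration: run_actor is a hypothetical callable that starts a run with the given input and raises an exception if the crawl fails, and the startUrls field and the crawlerType value strings are assumptions to be checked against the Actor's input schema.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Assumed crawlerType values, in the order they should be tried.
DEFAULT_ORDER = ("playwright:firefox", "cheerio", "jsdom")


def crawl_with_fallback(
    url: str,
    run_actor: Callable[[Dict[str, Any]], Any],
    crawler_types: Sequence[str] = DEFAULT_ORDER,
) -> Tuple[str, Any]:
    """Try each crawler type in turn; return the first type that succeeds and its result."""
    errors: Dict[str, str] = {}
    for crawler_type in crawler_types:
        run_input = {"startUrls": [{"url": url}], "crawlerType": crawler_type}
        try:
            return crawler_type, run_actor(run_input)
        except Exception as exc:  # run_actor is expected to raise on a failed crawl
            errors[crawler_type] = str(exc)
    raise RuntimeError(f"all crawler types failed: {errors}")


if __name__ == "__main__":
    # Stand-in runner that simulates the certificate error for the browser crawler.
    def fake_runner(run_input: Dict[str, Any]) -> str:
        if run_input["crawlerType"].startswith("playwright"):
            raise RuntimeError("page.goto: SEC_ERROR_UNKNOWN_ISSUER")
        return "crawled " + run_input["startUrls"][0]["url"]

    print(crawl_with_fallback("https://www.fincen.gov/resources", fake_runner))
```

Injecting the runner keeps the retry logic separate from how the Actor is actually started, so the same loop works with any client that reports failures by raising.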

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.