Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.0 (40)

1391

Total users: 53K
Monthly users: 7.9K
Runs succeeded: >99%
Issues response: 6.8 days
Last modified: 4 days ago

2 failed crawled websites

Open

bor.cerlini opened this issue
9 days ago

I was crawling 5 different URLs and the whole process took almost 7 minutes. Two of them failed, and I aborted the last one because it was taking too long (it would probably have failed as well).

jiri.spilka

Hi, thank you for using the Website Content Crawler.

There are actually two separate issues:

1. Reddit: Scraping Reddit is extremely challenging due to its dynamic structure and aggressive anti-bot measures. The Website Content Crawler is not optimized for Reddit, so we recommend using dedicated Actors designed specifically for that platform.

2. REI (https://www.rei.com/): The crawler attempted to click and expand elements matching the selector "[aria-expanded=\"false\"]". However, many such elements existed, some of which weren't actually expandable, and this caused the crawler to fail. I've adjusted the configuration to prevent unnecessary clicks by changing the selector to "[aria-expanded=\"true\"]". You can see the successful run here; the content was scraped in just 36 seconds.

I hope this helps! Best regards, Jiri
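
For anyone who wants to apply the same fix, the selector change described in the reply might look roughly like this in the Actor's run input. This is only a minimal sketch: `build_run_input` is a hypothetical helper, and the field names `startUrls` and `clickElementsCssSelector` are assumptions about the Actor's input schema that should be checked against its documentation.

```python
import json


def build_run_input(url: str, click_selector: str) -> dict:
    """Build a run input dict for the crawler (hypothetical helper).

    The keys "startUrls" and "clickElementsCssSelector" are assumed
    field names; verify them against the Actor's input schema.
    """
    return {
        "startUrls": [{"url": url}],
        # Selector for elements the crawler clicks before extracting content.
        "clickElementsCssSelector": click_selector,
    }


# Use the adjusted selector from the reply instead of '[aria-expanded="false"]'.
run_input = build_run_input("https://www.rei.com/", '[aria-expanded="true"]')

# Serialize to JSON, e.g. to paste into the Actor's JSON input editor.
run_input_json = json.dumps(run_input, indent=2)
print(run_input_json)
```

When serialized to JSON, the inner double quotes of the selector are escaped with backslashes, which is the form quoted in the reply above.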