Website Content Crawler

Pricing: Pay per usage


Developed and maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Rating: 4.0 (41)



Total users: 62K
Monthly users: 8.2K
Runs succeeded: >99%
Issues response: 7.9 days
Last modified: 15 hours ago


Crawler no longer works: I tried multiple links about 30 minutes ago and it was working, then it randomly stopped!

Closed

Capture_Marketing opened this issue 5 months ago

The crawler was working perfectly, then it randomly stopped scraping!

jakub.kopecky

Hi, thank you for using the Website Content Crawler.

I checked your Actor run and did not find any issues in the logs. The Website Content Crawler can sometimes run longer than expected because of network or website-related issues. I also noticed that your Actor input does not limit the maximum number of results. On a site with many pages, this means the Actor can keep crawling until it hits its timeout. Try setting Crawler settings -> Max pages (maxCrawlPages in the JSON input) to a value that matches the number of pages you actually need.
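As an illustration of the advice above, here is a minimal sketch that builds a capped crawler input. Only `maxCrawlPages` comes from the reply; the `startUrls` shape (`[{"url": ...}]`) is assumed from the usual Apify input convention, and `build_crawler_input` is a hypothetical helper, not part of any Apify library.

```python
import json


# Hypothetical helper: builds a Website Content Crawler input that caps the crawl.
# maxCrawlPages is the field named in the reply; the startUrls shape is an
# assumption based on the common Apify start-URL format.
def build_crawler_input(start_urls: list[str], max_pages: int) -> dict:
    if max_pages <= 0:
        raise ValueError("max_pages must be a positive integer")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        # Stop after this many pages so the run finishes instead of timing out.
        "maxCrawlPages": max_pages,
    }


if __name__ == "__main__":
    run_input = build_crawler_input(["https://example.com/blog"], max_pages=50)
    # Print JSON that could be pasted into the Actor's JSON input.
    print(json.dumps(run_input, indent=2))
```

If you start runs from code, the same dict can be passed as `run_input` to `client.actor("apify/website-content-crawler").call(...)` from the `apify-client` Python package (assuming that Actor ID). Pick a cap slightly above the number of pages you expect to need, so the crawl stays bounded without cutting off content you want.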

I ran the crawler with your input, limiting the maximum number of results, and it finished successfully: https://console.apify.com/view/runs/Rg9KfyeCxZPucqUdV

Please try to run the Actor again and let me know if you encounter any issues.

Thank you, Jakub