Website Content Crawler

Pricing: Pay per usage


Developed by Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Rating: 3.6 (39)

1453

Total users: 56K
Monthly users: 7.9K
Runs succeeded: >99%
Issues response: 7.9 days
Last modified: 4 days ago


Platform being used too quickly

Closed

methodical opened this issue 8 months ago

(written October 4)

I received two emails about 90 minutes apart, both titled "Platform being used too quickly". The first said I had reached 50% of my $49.00 hard usage limit; the second, about 90 minutes later, said I had reached 75% of the $49.00 hard usage limit. I had been away eating dinner and nothing was running in the background. Although I did make a number of API calls today, and one timed out at 300 seconds (on my end), I notice that from Sept 17 until yesterday my total usage was $5.68 in Actor Units, and then today it jumps to $37.64. From Oct 1-3 it totaled $0.27, and on Oct 4 it jumps to $32.71. There is literally nothing running to account for this. Oct 5 is already up to $4.82 and I am literally not doing anything.

I suspect the API call that timed out has "gone rogue", but that is just a guess.

Please investigate.

Oscardz

Hello. Indeed, there was an issue (fixed today) related to sitemap loading. I have just given you a coupon to reimburse the extra charge. Sorry for the inconvenience, and thank you for reaching out. Best regards.

methodical commented 8 months ago

Thank you very much! Now that I am confident Website Content Crawler is not too expensive, I would like to ask about performance. I find that the REST API call rarely returns in under 30-60 seconds and can sometimes run for 120+ seconds to scrape relatively uncomplicated pages. Even a Wikipedia article takes close to 60 seconds.

Is this normal? Are there configuration parameters that relate closely to speed?
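For reference, a minimal sketch of the run input for testing speed-related options follows. Only `useSitemaps` is confirmed by this thread; the other field names (`crawlerType`, `maxCrawlDepth`, `maxCrawlPages`) are assumptions based on typical crawler inputs and may not match the Actor's actual input schema.

```python
import json

# Sketch of an input payload for a Website Content Crawler run,
# focusing on options that plausibly affect speed and cost.
# Only `useSitemaps` is confirmed by this thread; the other keys
# are assumed names and may differ from the Actor's real schema.
run_input = {
    "startUrls": [{"url": "https://livekit.io/"}],
    "useSitemaps": False,      # skip sitemap discovery, which caused slow runs here
    "crawlerType": "cheerio",  # assumed: plain-HTTP crawling, faster than a headless browser
    "maxCrawlDepth": 0,        # assumed: crawl only the start URLs
    "maxCrawlPages": 1,        # assumed: hard cap on pages, bounding cost
}

# A run would be started by POSTing this JSON to the Actor's run
# endpoint on the Apify API, with your API token for authentication.
print(json.dumps(run_input))
```

Capping depth and page count bounds the worst case: even if a run misbehaves, it cannot fetch more than the configured number of pages.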

methodical commented 8 months ago

I'm not sure if this is related to the fix. I was scraping this URL: https://livekit.io/ and I got a timeout after 300 seconds. Then I changed useSitemaps to false and I got results after 150 seconds (I still think that is too long...).

The results are not too deep either:

Title: LiveKit

Description: Instantly transport audio and video between LLMs and your users.

LiveKitLiveKit LogoChevron IconChevron IconChevron IconGitHub LogoChevron IconChevron IconChevron IconLiveKit LogoGitHub LogoX Logo

Build realtime AI. Instantly transport audio and video between LLMs and your users.

Solutions

Tools for multimodal apps

Conversational AI

Robotics

Livestreaming

OpenAI uses LiveKit to deliver voice to millions of ChatGPT users.

developer focused

Build, deploy, scale. Repeat.

global scale

The backbone of the realtime computing era.

LiveKit's network is optimized for ultra-low latency, extreme resiliency, and massive scale. Our team is distributed across the world and our infrastructure delivers billions of minutes of audio and video every month.

capabilities

A feature-rich platform

methodical commented 8 months ago

I got another automated email about 55 minutes ago. "Your platform usage in the current monthly cycle reached 50% of $98.00 hard usage limit. If it exceeds the limit, the Apify platform services will be suspended. You can increase the limit in your Limits settings."

I'm not sure whether these emails are generated in ~24-hour batch cycles or soon after a threshold is crossed. But at the time the email was sent, and for the previous ~10 hours, I was not using the API at all. This COULD be related to the credit that posted to my account: I had a $49 limit and was getting close to it; you found a bug and gave me a $49 credit; thus I had a new limit of $98 but was still at the 50% threshold. Ergo the warning. HOWEVER, I see that usage is now at $73, meaning that somehow in the last day I used ~$24 in credits. Thus I think that...

  1. This scraper is very expensive, OR
  2. There is still a bug somewhere....

:)

methodical commented 8 months ago

Three hours ago my usage was at $73; now it is at $78. I have used the API, but only to scrape 3-5 pages max. This does not seem at all reasonable. Is the cost really close to $1 per page?

methodical commented 8 months ago

This morning's email says: "Custom limit of monthly platform usage has been reached. Actors and other platform features are disabled. You have reached your custom limit of monthly platform usage and thus the Apify platform services have been suspended. To continue using Apify, please increase your custom usage limit or wait for the next billing period, starting on 2024-10-17."

Again, either Apify is unbelievably expensive or there is a bug somewhere.

methodical commented 8 months ago

I found this in the "Runs" log. I notice that others are reporting similar problems.

Oscardz

This is a known issue that we are investigating at the moment. Once it's fixed, I will gladly reimburse the money for those runs. Sorry for the inconvenience, and I will keep you posted on the progress of the fix.

methodical commented 8 months ago

Thank you.

methodical commented 8 months ago

Can you issue a credit so I can continue to use the platform between now and October 17, when my subscription auto-renews? I really don't want to raise the $ limit at the moment.

Oscardz

Sure, the reimbursement was applied today. Let me know if you have any further issues.