Website Content Crawler

Pricing: Pay per usage


Developed by Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Rating: 3.6 (39)

1453

Total users: 56K
Monthly users: 7.9K
Runs succeeded: >99%
Issues response: 7.9 days
Last modified: 4 days ago


Platform being used too quickly

Closed

methodical opened this issue 8 months ago

(written October 4)

I received two emails about 90 minutes apart, both titled "Platform being used too quickly". The first said I had reached 50% of my $49.00 hard usage limit; the second, about 90 minutes later, said I had reached 75% of the $49.00 hard usage limit. I had been away eating dinner and nothing was running in the background. Although I did make a number of API calls today, and one timed out at 300 seconds (on my end), I notice that from Sept 17 until yesterday my total usage was $5.68 in Actor Units, and then today it jumps to $37.64. From Oct 1-3 it totaled $0.27, and on Oct 4 it jumps to $32.71. There is literally nothing running to account for this. Oct 5 is already up to $4.82 and I am literally not doing anything.

I suspect the API call that timed out has "gone rogue", but that is just a guess.

Please investigate.

Oscardz

Hello. Indeed, there was an issue (fixed today) related to sitemap loading. I have just given you a coupon to reimburse the extra charge. Sorry for the inconvenience, and thank you for reaching out. Best regards.

methodical commented 8 months ago

Thank you very much! Now that I am confident Website Content Crawler is not too expensive, I would like to ask about performance. I find that the REST API call rarely returns in under 30-60 seconds and can sometimes run for 120+ seconds to scrape relatively uncomplicated pages. Even a Wikipedia article takes close to 60 seconds.

Is this normal? Are there configuration parameters that relate closely to speed?
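For reference, a minimal sketch of the run input for testing speed-related options follows. Only `useSitemaps` is confirmed by this thread; the other field names (`crawlerType`, `maxCrawlDepth`, `maxCrawlPages`) are assumptions based on typical crawler inputs and may not match the Actor's actual input schema.

```python
import json

# Sketch of an input payload for a Website Content Crawler run,
# focusing on options that plausibly affect speed and cost.
# Only `useSitemaps` is confirmed by this thread; the other keys
# are assumed names and may differ from the Actor's real schema.
run_input = {
    "startUrls": [{"url": "https://livekit.io/"}],
    "useSitemaps": False,      # skip sitemap discovery, which caused slow runs here
    "crawlerType": "cheerio",  # assumed: plain-HTTP crawling, faster than a headless browser
    "maxCrawlDepth": 0,        # assumed: crawl only the start URLs
    "maxCrawlPages": 1,        # assumed: hard cap on pages, bounding cost
}

# A run would be started by POSTing this JSON to the Actor's run
# endpoint on the Apify API, with your API token for authentication.
print(json.dumps(run_input))
```

Capping depth and page count bounds the worst case: even if a run misbehaves, it cannot fetch more than the configured number of pages.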

methodical commented 8 months ago

I'm not sure if this is related to the fix. I was scraping this URL: https://livekit.io/ and I got a timeout after 300 seconds. Then I changed useSitemaps to false and I got results after 150 seconds (I still think that is too long...).

The results are not too deep either:

Title: LiveKit

Description: Instantly transport audio and video between LLMs and your users.

LiveKitLiveKit LogoChevron IconChevron IconChevron IconGitHub LogoChevron IconChevron IconChevron IconLiveKit LogoGitHub LogoX Logo

Build realtime AI. Instantly transport audio and video between LLMs and your users.

Solutions

Tools for multimodal apps

Conversational AI

Robotics

Livestreaming

OpenAI uses LiveKit to deliver voice to millions of ChatGPT users.

developer focused

Build, deploy, scale. Repeat.

global scale

The backbone of the realtime computing era.

LiveKit's network is optimized for ultra-low latency, extreme resiliency, and massive scale. Our team is distributed across the world and our infrastructure delivers billions of minutes of audio and video every month.

capabilities

A feature-rich platform

methodical commented 8 months ago

I got another automated email about 55 minutes ago. "Your platform usage in the current monthly cycle reached 50% of $98.00 hard usage limit. If it exceeds the limit, the Apify platform services will be suspended. You can increase the limit in your Limits settings."

I'm not sure whether these emails are generated in ~24-hour batch cycles or soon after a threshold is crossed. But at the time the email was sent, and for the previous ~10 hours, I was not using the API at all. This COULD be related to the credit that posted to my account: I had a $49 limit and was getting close to it; you found a bug and gave me a $49 credit; thus I had a new limit of $98 but was still at the 50% threshold. Ergo the warning. HOWEVER, I see that usage is now at $73, meaning that somehow in the last day I used ~$24 in credits. Thus I think that...

  1. This scraper is very expensive, OR
  2. There is still a bug somewhere....

:)

methodical commented 8 months ago

Three hours ago my usage was at $73; now it is at $78. I have used the API, but only to scrape 3-5 pages max. This does not seem at all reasonable. Is the cost really close to $1 per page?

methodical commented 8 months ago

This morning's email says: "Custom limit of monthly platform usage has been reached. Actors and other platform features are disabled. You have reached your custom limit of monthly platform usage and thus the Apify platform services have been suspended. To continue using Apify, please increase your custom usage limit or wait for the next billing period, starting on 2024-10-17."

Again, either Apify is unbelievably expensive or there is a bug somewhere.

methodical commented 8 months ago

I found this in the "Runs" log. I notice that others are reporting similar problems.

Oscardz

This is a known issue that we are investigating at the moment. Once it's fixed, I will gladly reimburse the money for those runs. Sorry for the inconvenience, and I will keep you posted on the progress of the fix.

methodical commented 8 months ago

Thank you.

methodical commented 8 months ago

Can you issue a credit so I can continue to use the platform between now and October 17, when my subscription auto-renews? I really don't want to raise the $ limit at the moment.

Oscardz

Sure, the reimbursement was applied today. Let me know if you have any further issues.