
Website Content Crawler
Pricing
Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
3.7 (41)
1481
Total users: 57K
Monthly users: 8K
Runs succeeded: >99%
Issues response: 7.8 days
Last modified: 2 days ago
It kept working without stopping
Closed
It kept working without stopping and ate all my budget.
Hello, and thank you for your interest in this Actor.
It sounds like your recent crawler run continued past your intended scope, leading to unexpected budget use. The crawler operates based on its instructions, and without specific limits, it will continue until all potential pages are processed.
To prevent this in the future and better control your crawl's scope and budget, consider using these options in your actor configuration: includeUrlGlobs / excludeUrlGlobs, maxCrawlDepth, maxCrawlPages, or maxResults.
Alternatively, you can set the global Run timeout, which automatically stops the actor after a specified duration, regardless of whether it has finished scraping.
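For illustration, here is a minimal Python sketch of an input that combines the limits mentioned above with a run timeout. The option names (includeUrlGlobs, excludeUrlGlobs, maxCrawlDepth, maxCrawlPages, maxResults) come from this reply. Everything else is an assumption made for the example: the startUrls field, the {"url": ...} and {"glob": ...} value shapes, the example.com URLs, the default numbers, and the helper name build_run_input. Please check the Actor's input schema before relying on it.

```python
import json


def build_run_input(start_url, include_globs, exclude_globs=(),
                    max_depth=2, max_pages=50, max_results=50):
    """Build a crawl input that caps both scope and volume.

    Hypothetical helper: the value shapes below are assumptions,
    not taken from the Actor's documentation.
    """
    if max_depth < 0 or max_pages < 1 or max_results < 1:
        raise ValueError("limits must be positive (depth may be 0)")
    return {
        # Assumed field: where the crawl starts.
        "startUrls": [{"url": start_url}],
        # Only follow links matching these patterns...
        "includeUrlGlobs": [{"glob": g} for g in include_globs],
        # ...and skip anything matching these.
        "excludeUrlGlobs": [{"glob": g} for g in exclude_globs],
        # How many link hops away from the start URL to follow.
        "maxCrawlDepth": max_depth,
        # Upper bound on pages opened.
        "maxCrawlPages": max_pages,
        # Upper bound on results stored.
        "maxResults": max_results,
    }


run_input = build_run_input(
    "https://example.com/docs/",
    include_globs=["https://example.com/docs/**"],
    exclude_globs=["https://example.com/docs/archive/**"],
)

# Hard stop for the whole run, in seconds (15 minutes here).
RUN_TIMEOUT_SECS = 15 * 60

print(json.dumps(run_input, indent=2))
```

If you start runs with the Apify Python client, a dict like this could be passed as run_input, and the timeout as timeout_secs, when calling the Actor. In the Console, the same values go into the input form and the Run options. Setting a page limit together with a timeout gives two independent safeguards, so a mistake in one does not leave the run unbounded.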
I'll close this issue now, but feel free to ask additional questions if you have any. Cheers!