
Website Content Crawler
Pricing: Pay per usage

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
4.0 (41)
1596
Total users: 62K
Monthly users: 8.2K
Runs succeeded: >99%
Issues response: 8.2 days
Last modified: 17 hours ago
Cannot load dataset
Opened 9 hours ago by katiev-owner, last comment 9 hours ago by katiev-owner
Website Crawler keeps running after execution completes (13 hrs)
Opened 10 days ago by revscaleai, last comment 10 days ago by revscaleai
It keeps repeating and it will not stop
Opened 10 days ago by bracciads, last comment 10 days ago by bracciads
New Rust HTTP client failing on valid SSL config: SelectedUnusableCipherSuiteForVersion
Opened 17 days ago by uglyrobot, last comment 17 days ago by Jindřich Bär (jindrich.bar)
Glob Patterns are ignored when using Sitemap
Opened a month ago by cirez_d, last comment 16 days ago by cirez_d
Memory issue
Opened a month ago by acarter, last comment 18 hours ago by Jindřich Bär (jindrich.bar)
Avoid query parameters when crawling websites
Opened 2 months ago by innovum_admin, last comment a month ago by Jindřich Bär (jindrich.bar)
Getting 403 from public page
Opened 2 months ago by formidable_quagmire, last comment a month ago by formidable_quagmire
Crawling cannot be done with Arabic website in English
Opened 2 months ago by aswinthazhath, last comment 2 months ago by Jindřich Bär (jindrich.bar)
CORS Error
Opened 2 months ago by fmateen, last comment a month ago by Jindřich Bär (jindrich.bar)
Is there a way to crawl URLs from the visible HTML after removing "removeElementsCssSelector"?
Opened 2 months ago by formidable_quagmire, last comment 2 months ago by Jindřich Bär (jindrich.bar)
Can we get the images on the pages too?
Opened 3 months ago by disarming_rutabaga, last comment a month ago by Jiří Spilka (jiri.spilka)
Decode non-UTF-8 text in crawlerType cheerio
Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)