Website Content Crawler

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.2 (40)

1398

Total users: 54K

Monthly users: 8K

Runs succeeded: >99%

Issues response: 6.8 days

Last modified: 5 days ago

Website Content Crawler stuck - cost keeps increasing

Opened 4 days ago by digtital_moose, last comment a day ago by Jan Buchar (janbuchar)

HTTP website inaccessible

Opened 5 days ago by souheil, last comment 3 hours ago by souheil

Avoid query parameters when crawling websites

Opened 7 days ago by innovum_admin, last comment 7 days ago by innovum_admin

Getting 403 from public page

Opened 8 days ago by formidable_quagmire, last comment 7 days ago by formidable_quagmire

2 failed crawled websites

Opened 10 days ago by bor.cerlini, last comment 10 days ago by Jiří Spilka (jiri.spilka)

Crawling cannot be done with Arabic website in English

Opened 12 days ago by aswinthazhath, last comment 8 days ago by Jindřich Bär (jindrich.bar)

Timeout and no data

Opened 17 days ago by Autocom, last comment 10 days ago by Jiří Spilka (jiri.spilka)

2025-05-13T10:00:05.221Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.

Opened 21 days ago by formidable_quagmire, last comment 10 days ago by Jiří Spilka (jiri.spilka)

Error: Cannot run Actor (Network Error)

Opened 22 days ago by dawieharmse, last comment 22 days ago by dawieharmse

CORS Error

Opened 22 days ago by fmateen, last comment 20 days ago by fmateen

Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"

Opened 22 days ago by formidable_quagmire, last comment 5 hours ago by Jindřich Bär (jindrich.bar)

Cookie Banner is not removed

Opened a month ago by Joe11, last comment 20 days ago by Jakub Kopecký (jakub.kopecky)

Crawling is stuck (10h+)

Opened a month ago by jauns-ai, last comment a month ago by Jakub Kopecký (jakub.kopecky)

Crawling a small list of PDF URLs hangs and crashes the crawler repeatedly

Opened a month ago by uglyrobot, last comment a month ago by Jakub Kopecký (jakub.kopecky)

Execution context was destroyed

Opened 2 months ago by benjaminprevot, last comment 7 days ago by conv_ai_account

Can we get the images on the pages too?

Opened 2 months ago by disarming_rutabaga, last comment a month ago by disarming_rutabaga-owner

Issue Crawling Content from Paid Websites Like New York Times

Opened 2 months ago by onlinereach, last comment 2 months ago by Jakub Kopecký (jakub.kopecky)

Adsterra.com

Opened 2 months ago by Tijjeboy, last comment 2 months ago by Jiří Spilka (jiri.spilka)

Large number of requests fail

Opened 2 months ago by cirez_d, last comment 10 days ago by Jiří Spilka (jiri.spilka)

Add Full File Name to the Key-Value-Stores

Opened 2 months ago by CtrlAltElite, last comment a month ago by Jakub Kopecký (jakub.kopecky)