Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
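As a rough illustration of the workflow described above, the following sketch builds an input object for the Actor and, optionally, runs it through the `apify-client` Python package. The input field names (`startUrls`, `saveMarkdown`, `removeElementsCssSelector`) and the output fields (`markdown`, `text`) are assumptions based on the Actor's publicly documented input and output; the helper names `build_run_input` and `crawl_to_markdown` are hypothetical. Check the Actor's input schema before relying on any of them.

```python
from __future__ import annotations

from typing import Any, Optional


def build_run_input(start_urls: list[str],
                    remove_selector: Optional[str] = None) -> dict[str, Any]:
    """Assemble an Actor input object (field names are assumptions, see above)."""
    run_input: dict[str, Any] = {
        # One {"url": ...} object per start URL.
        "startUrls": [{"url": url} for url in start_urls],
        # Ask for Markdown output in addition to plain text.
        "saveMarkdown": True,
    }
    if remove_selector:
        # CSS selector for elements to strip from the page before extraction.
        run_input["removeElementsCssSelector"] = remove_selector
    return run_input


def crawl_to_markdown(token: str, start_urls: list[str]) -> list[str]:
    """Run the Actor and return the extracted content of each crawled page.

    Needs network access, a valid Apify API token, and `pip install apify-client`.
    """
    from apify_client import ApifyClient  # imported lazily; only needed here

    client = ApifyClient(token)
    run = client.actor("apify/website-content-crawler").call(
        run_input=build_run_input(start_urls)
    )
    if run is None:
        raise RuntimeError("The Actor run did not start or did not finish.")
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    # Prefer Markdown when present, otherwise fall back to plain text.
    return [item.get("markdown") or item.get("text", "") for item in items]
```

Calling `build_run_input(["https://example.com"], "nav, footer")` only constructs the dictionary, so it can be inspected or reused in a saved task without starting a paid run; `crawl_to_markdown` is the part that actually consumes platform usage.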

4.6 (38)

1307

Total users: 49.3k

Monthly users: 6.9k

Runs succeeded: >99%

Issue response: 3.8 days

Last modified: 7 days ago

2025-05-13T10:00:05.221Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.

Opened 21 hours ago by formidable_quagmire, last comment 21 hours ago by formidable_quagmire

Error: Cannot run Actor (Network Error)

Opened a day ago by dawieharmse, last comment a day ago by dawieharmse

CORS Error

Opened 2 days ago by fmateen, last comment 2 days ago by fmateen

Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"

Opened 2 days ago by formidable_quagmire, last comment 2 days ago by formidable_quagmire

Cookie Banner is not removed

Opened 12 days ago by Joe11, last comment 12 days ago by Joe11

Crawling is stuck (10h+)

Opened 14 days ago by jauns-ai, last comment 9 days ago by Jakub Kopecký (jakub.kopecky)

Crawling a small list of pdf urls hangs and crashes the crawler repetitively

Opened 20 days ago by uglyrobot, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Feature Request: Automatic Recrawling Within Same Task Run for RAG System Integration

Opened 20 days ago by sprouto_net, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Is it possible to speed up the processing time?

Opened 21 days ago by sheldon-supreme, last comment 20 days ago by Jakub Kopecký (jakub.kopecky)

simple page is throwing an error

Opened 22 days ago by burgundy_zebra, last comment 9 days ago by Jakub Kopecký (jakub.kopecky)

Crawling did not support cookie rotating

Opened 23 days ago by meddlesome, last comment 23 days ago by meddlesome

Execution context was destroyed

Opened a month ago by benjaminprevot, last comment a month ago by benjaminprevot

can we get the images on the pages too?

Opened a month ago by disarming_rutabaga, last comment 20 days ago by disarming_rutabaga-owner

Exclude Start URL and Disallowed Paths from Output + Return Clean JSON Structure

Opened a month ago by rudy-seo, last comment 20 days ago by Jakub Kopecký (jakub.kopecky)

Error on Zapier Actor

Opened a month ago by insiderperks-owner, last comment 18 days ago by Jiří Spilka (jiri.spilka)

Issue Crawling Content from Paid Websites Like New York Times

Opened a month ago by onlinereach, last comment a month ago by Jakub Kopecký (jakub.kopecky)

Crawler won't click on a specific button

Opened a month ago by shikh.sn2021, last comment a month ago by Jakub Kopecký (jakub.kopecky)

Adsterra .com

Opened a month ago by Tijjeboy, last comment a month ago by Jiří Spilka (jiri.spilka)

number of saved lines

Opened a month ago by kocsi, last comment 16 days ago by Jakub Kopecký (jakub.kopecky)

Large number of requests fail

Opened a month ago by cirez_d, last comment a month ago by cirez_d