Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

3.7 (41)

1499

Total users

58K

Monthly users

8.1K

Runs succeeded

>99%

Issues response

7.6 days

Last modified

34 minutes ago

Add Time Range to Scraped Data

Opened 6 days ago by kristupas, last comment 6 days ago by Jindřich Bär (jindrich.bar)

Incomplete Web Scraping Results for a Webflow website

Opened 7 days ago by sllintestacc, last comment 7 days ago by Jindřich Bär (jindrich.bar)

High costs?

Opened 9 days ago by nordicloom.marketing, last comment 9 days ago by Jindřich Bär (jindrich.bar)

It kept working without stopping

Opened 10 days ago by amitbend, last comment 9 days ago by Jindřich Bär (jindrich.bar)

HTTP Webhook stuck in loading forever

Opened 13 days ago by zacharykoo, last comment 12 days ago by Jakub Kopecký (jakub.kopecky)

Issue with web crawler

Opened 16 days ago by AndrewEhab, last comment 16 days ago by Jindřich Bär (jindrich.bar)

How can I get all hidden fields in my Actor result

Opened a month ago by mohit1.vdoit, last comment 22 days ago by Jindřich Bär (jindrich.bar)

HTTP website inaccessible

Opened a month ago by souheil, last comment 16 days ago by Jindřich Bär (jindrich.bar)

Didn't crawl the entire page and seemed to do it in no particular order

Opened a month ago by arsia, last comment 22 days ago by Jindřich Bär (jindrich.bar)

No text parsed from webpage.

Opened a month ago by formidable_quagmire, last comment 22 days ago by Jindřich Bär (jindrich.bar)

Too much time

Opened a month ago by florian-morina, last comment a month ago by Jiří Spilka (jiri.spilka)

No text parsed from webpage.

Opened a month ago by formidable_quagmire, last comment a month ago by Jindřich Bär (jindrich.bar)

2 failed crawled websites

Opened a month ago by bor.cerlini, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Crawling with Markdown gives half of the data, whereas on the other pages it gives us complete data

Opened a month ago by formidable_quagmire, last comment a month ago by formidable_quagmire

There was an uncaught exception during the run of the Actor and it was not handled

Opened a month ago by formidable_quagmire, last comment a month ago by Jindřich Bär (jindrich.bar)

Error: Cannot run Actor (Network Error)

Opened a month ago by dawieharmse, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Cookie Banner is not removed

Opened 2 months ago by Joe11, last comment 21 days ago by Jiří Spilka (jiri.spilka)

Crawling a small list of pdf urls hangs and crashes the crawler repetitively

Opened 2 months ago by uglyrobot, last comment 21 days ago by Jindřich Bär (jindrich.bar)

Feature Request: Automatic Recrawling Within Same Task Run for RAG System Integration

Opened 2 months ago by sprouto_net, last comment a month ago by Jiří Spilka (jiri.spilka)

Is it possible to speed up the processing time?

Opened 2 months ago by sheldon-supreme, last comment a month ago by Jiří Spilka (jiri.spilka)