Website Content Crawler

Pricing: Pay per usage

Developed and maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
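Before crawled text reaches a vector database or RAG pipeline, it is usually split into smaller overlapping chunks that are then embedded one by one. The following minimal sketch illustrates that step and is not part of the Actor itself. It assumes the crawl results have already been downloaded as a list of dicts with `url` and `text` keys (an assumed item shape). `chunk_pages` is a hypothetical helper name. Sizes are counted in characters for simplicity, while many pipelines count tokens instead.

```python
from typing import Dict, Iterable, List


def chunk_pages(
    items: Iterable[Dict[str, str]],
    chunk_size: int = 500,
    overlap: int = 50,
) -> List[Dict[str, str]]:
    """Split each crawled page's text into overlapping character windows.

    Assumption: each item is a dict with "url" and "text" keys, modeled
    loosely on a crawler dataset item. This is an illustrative helper,
    not an Apify API.
    """
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks: List[Dict[str, str]] = []
    for item in items:
        text = item.get("text") or ""
        # Stop once the remaining tail would be covered by the previous overlap;
        # max(..., 1) guarantees at least one window for short texts.
        for start in range(0, max(len(text) - overlap, 1), step):
            piece = text[start:start + chunk_size]
            if piece.strip():  # skip empty or whitespace-only windows
                chunks.append({"url": item["url"], "text": piece})
    return chunks


if __name__ == "__main__":
    pages = [{"url": "https://example.com/a", "text": "x" * 1000}]
    for c in chunk_pages(pages):
        print(c["url"], len(c["text"]))
```

Keeping the source `url` on every chunk lets a RAG application cite the page each retrieved passage came from.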

Rating: 4.0 (41)

1596

Total users: 62K
Monthly users: 8.2K
Runs succeeded: >99%
Issues response: 8.2 days
Last modified: 17 hours ago

Cannot load dataset

Opened 9 hours ago by katiev-owner, last comment 9 hours ago by katiev-owner

Website Crawler keeps running after execution completes (13 hrs)

Opened 10 days ago by revscaleai, last comment 10 days ago by revscaleai

It keeps repeating and it will not stop

Opened 10 days ago by bracciads, last comment 10 days ago by bracciads

New rust http client failing on valid SSL config: SelectedUnusableCipherSuiteForVersion

Opened 17 days ago by uglyrobot, last comment 17 days ago by Jindřich Bär (jindrich.bar)

Glob Patterns are ignored when using Sitemap

Opened a month ago by cirez_d, last comment 16 days ago by cirez_d

Memory issue

Opened a month ago by acarter, last comment 18 hours ago by Jindřich Bär (jindrich.bar)

Avoid query parameters when crawling websites

Opened 2 months ago by innovum_admin, last comment a month ago by Jindřich Bär (jindrich.bar)

Getting 403 from public page

Opened 2 months ago by formidable_quagmire, last comment a month ago by formidable_quagmire

crawling cannot be done with arabic website in english

Opened 2 months ago by aswinthazhath, last comment 2 months ago by Jindřich Bär (jindrich.bar)

CORS Error

Opened 2 months ago by fmateen, last comment a month ago by Jindřich Bär (jindrich.bar)

Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"

Opened 2 months ago by formidable_quagmire, last comment 2 months ago by Jindřich Bär (jindrich.bar)

can we get the images on the pages too?

Opened 3 months ago by disarming_rutabaga, last comment a month ago by Jiří Spilka (jiri.spilka)

Decode non-UTF-8 text in crawlerType cheerio

Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)