Website Content Crawler

Pricing: Pay per usage

Developed and maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
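As a rough illustration of feeding the crawler's output into an LLM pipeline, the sketch below builds a run input and converts dataset items into `page_content`/`metadata` records, the shape that LangChain- and LlamaIndex-style document loaders work with. It assumes the input fields `startUrls`, `maxCrawlPages` and `saveMarkdown` and the output fields `url`, `text` and `markdown`. Verify these against the Actor's input schema and a sample dataset before relying on them. `build_run_input`, `items_to_documents` and `run_crawl` are hypothetical helpers. Only `run_crawl` touches the network: it uses the `apify-client` package and needs an API token.

```python
from typing import Any, Dict, Iterable, List


def build_run_input(start_urls: List[str], max_pages: int = 50) -> Dict[str, Any]:
    """Build a run input for the Actor.

    Assumption: field names follow the Actor's input schema
    (startUrls, maxCrawlPages, saveMarkdown); check the schema before use.
    """
    return {
        "startUrls": [{"url": u} for u in start_urls],
        "maxCrawlPages": max_pages,
        "saveMarkdown": True,
    }


def items_to_documents(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Convert dataset items into page_content/metadata records for a vector store.

    Prefers the Markdown output, falls back to plain text, and skips items
    whose content is empty or whitespace only. The item field names
    "url", "markdown" and "text" are assumptions.
    """
    docs = []
    for item in items:
        content = item.get("markdown") or item.get("text") or ""
        if not content.strip():
            continue  # nothing worth embedding
        docs.append({"page_content": content, "metadata": {"source": item.get("url")}})
    return docs


def run_crawl(token: str, start_urls: List[str]) -> List[Dict[str, Any]]:
    """Run the Actor and collect documents (needs network access and an API token)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor("apify/website-content-crawler").call(
        run_input=build_run_input(start_urls)
    )
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    return items_to_documents(items)


if __name__ == "__main__":
    # Offline usage with hand-written sample items (no crawl is performed).
    sample = [
        {"url": "https://example.com/a", "text": "Hello", "markdown": "# Hello"},
        {"url": "https://example.com/b", "text": "   "},
    ]
    print(items_to_documents(sample))
    # → [{'page_content': '# Hello', 'metadata': {'source': 'https://example.com/a'}}]
```

Keeping the item-to-document conversion separate from the network call makes it easy to test offline and to reuse with items exported from an existing dataset.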

Rating: 3.7 (41 reviews)

1481

Total users: 57K
Monthly users: 8.1K
Runs succeeded: >99%
Issues response: 7.8 days
Last modified: 2 days ago

Add Time Range to Scraped Data

Opened 2 days ago by kristupas, last comment 2 days ago by Jindřich Bär (jindrich.bar)

Glob Patterns are ignored when using Sitemap

Opened 3 days ago by cirez_d, last comment 2 days ago by Jindřich Bär (jindrich.bar)

Incomplete Web Scraping Results for a Webflow website

Opened 3 days ago by sllintestacc, last comment 3 days ago by Jindřich Bär (jindrich.bar)

High costs?

Opened 5 days ago by nordicloom.marketing, last comment 5 days ago by Jindřich Bär (jindrich.bar)

Memory issue

Opened 5 days ago by acarter, last comment 5 days ago by Jindřich Bär (jindrich.bar)

It kept working without stopping

Opened 6 days ago by amitbend, last comment 5 days ago by Jindřich Bär (jindrich.bar)

HTTP webhook stuck loading forever

Opened 9 days ago by zacharykoo, last comment 8 days ago by Jakub Kopecký (jakub.kopecky)

Issue with web crawler

Opened 12 days ago by AndrewEhab, last comment 12 days ago by Jindřich Bär (jindrich.bar)

Website Content Crawler stuck - cost keeps increasing

Opened 22 days ago by digtital_moose, last comment 8 days ago by jfnrj2ui

How can I get all hidden fields in my Actor result?

Opened 22 days ago by mohit1.vdoit, last comment 18 days ago by Jindřich Bär (jindrich.bar)

HTTP website inaccessible

Opened 23 days ago by souheil, last comment 12 days ago by Jindřich Bär (jindrich.bar)

Didn't crawl the entire page and seemed to do it in no particular order

Opened 25 days ago by arsia, last comment 18 days ago by Jindřich Bär (jindrich.bar)

No text parsed from webpage.

Opened 25 days ago by formidable_quagmire, last comment 18 days ago by Jindřich Bär (jindrich.bar)

Avoid query parameters when crawling websites

Opened 25 days ago by innovum_admin, last comment 17 days ago by Jindřich Bär (jindrich.bar)

Too much time

Opened a month ago by florian-morina, last comment 25 days ago by Jiří Spilka (jiri.spilka)

Getting 403 from public page

Opened a month ago by formidable_quagmire, last comment 17 days ago by formidable_quagmire

No text parsed from webpage.

Opened a month ago by formidable_quagmire, last comment a month ago by Jindřich Bär (jindrich.bar)

2 websites failed to crawl

Opened a month ago by bor.cerlini, last comment a month ago by Jiří Spilka (jiri.spilka)

Crawling cannot be done with an Arabic website in English

Opened a month ago by aswinthazhath, last comment a month ago by Jindřich Bär (jindrich.bar)

Crawling with Markdown gives half of the data, whereas other pages give complete data

Opened a month ago by formidable_quagmire, last comment a month ago by formidable_quagmire