Website Content Crawler

Pricing

Pay per usage

Developed by Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

3.7 (41)

1499

Total users

58K

Monthly users

8.1K

Runs succeeded

>99%

Issues response

7.6 days

Last modified

25 minutes ago

crawler got hung up

Opened 13 hours ago by Tmoney97, last comment 13 hours ago by Tmoney97

Lack of Notice

Opened 3 days ago by impeccable_niche, last comment 3 days ago by impeccable_niche

Glob Patterns are ignored when using Sitemap

Opened 7 days ago by cirez_d, last comment 6 days ago by Jindřich Bär (jindrich.bar)

Memory issue

Opened 9 days ago by acarter, last comment 9 days ago by Jindřich Bär (jindrich.bar)

Website Content Crawler stuck - cost keeps increasing

Opened a month ago by digtital_moose, last comment 12 days ago by jfnrj2ui

Avoid query parameters when crawling websites

Opened a month ago by innovum_admin, last comment 21 days ago by Jindřich Bär (jindrich.bar)

Getting 403 from public page

Opened a month ago by formidable_quagmire, last comment 21 days ago by formidable_quagmire

Crawling cannot be done with Arabic website in English

Opened a month ago by aswinthazhath, last comment a month ago by Jindřich Bär (jindrich.bar)

Timeout and no data

Opened a month ago by Autocom, last comment a month ago by Jiří Spilka (jiri.spilka)

2025-05-13T10:00:05.221Z WARN PlaywrightCrawler: Reclaiming failed request back to the list or queue. page.evaluate: Execution context was destroyed, most likely because of a navigation.

Opened a month ago by formidable_quagmire, last comment a month ago by Jiří Spilka (jiri.spilka)

CORS Error

Opened a month ago by fmateen, last comment 21 days ago by Jindřich Bär (jindrich.bar)

Is there a way to crawl URL from the visible HTML after removing "removeElementsCssSelector"

Opened a month ago by formidable_quagmire, last comment 22 days ago by Jindřich Bär (jindrich.bar)

Crawling is stuck (10h+)

Opened 2 months ago by jauns-ai, last comment 2 months ago by Jakub Kopecký (jakub.kopecky)

can we get the images on the pages too?

Opened 2 months ago by disarming_rutabaga, last comment 21 days ago by Jiří Spilka (jiri.spilka)

Issue Crawling Content from Paid Websites Like New York Times

Opened 2 months ago by onlinereach, last comment 2 months ago by Jakub Kopecký (jakub.kopecky)

Adsterra .com

Opened 2 months ago by Tijjeboy, last comment 2 months ago by Jiří Spilka (jiri.spilka)

Add Full File Name to the Key-Value-Stores

Opened 3 months ago by CtrlAltElite, last comment 2 months ago by Jakub Kopecký (jakub.kopecky)

Scraper doesn't scrape all the website content, like product descriptions

Opened 5 months ago by maabada.shivok, last comment 5 months ago by maabada.shivok

Crawl hung at finished

Opened 5 months ago by mcantrell, last comment 5 months ago by mykola_scrapes

Decode non-UTF-8 text in crawlerType cheerio

Opened a year ago by consoling_knock, last comment a year ago by Jindřich Bär (jindrich.bar)