Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Combining startUrl and includeUrlGlob

Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat

Crawling logic

Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat

Why did it run for 8 hours? Isn't there a hard limit of 9 minutes?

Opened 3 days ago by stevecasey1213, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Does not extract anything from the provided website

Opened 5 days ago by ennsharma, last comment 5 days ago by ennsharma

page scrolling

Opened 6 days ago by kempt_trophy, last comment 3 days ago by kempt_trophy

Crawler does not extract content, with no useful logs to debug

Opened 6 days ago by nimble_caretaker, last comment 5 days ago by nimble_caretaker

Not much of value scraped.

Opened 6 days ago by methodical, last comment 6 days ago by Jiří Spilka (jiri.spilka)

Crawler overcharges by several times

Opened 10 days ago by hyperlace, last comment 10 days ago by Jiří Spilka (jiri.spilka)

12 mins in and nothing has been crawled

Opened 12 days ago by callumdownie, last comment 5 days ago by Jiří Spilka (jiri.spilka)

crawling stuck

Opened 12 days ago by greenforestpath8, last comment 12 days ago by Jindřich Bär (jindrich.bar)

Too complex to start this process

Opened 14 days ago by rongwroom, last comment 13 days ago by Jiří Spilka (jiri.spilka)

Custom user agent

Opened 14 days ago by civic-roundtable, last comment 12 days ago by Jiří Spilka (jiri.spilka)

I can't send data from Clay to Apify.

Opened 19 days ago by romeoman, last comment 10 days ago by Jiří Spilka (jiri.spilka)

Access blocked every time

Opened 23 days ago by saidur297, last comment 18 days ago by Jiří Spilka (jiri.spilka)

Various questions about operation and optimization of website content crawler

Opened 24 days ago by David Haddad (davhad), last comment 17 days ago by Jiří Spilka (jiri.spilka)

Actor simply doesn't work

Opened 25 days ago by xylonic_gloves, last comment 22 days ago by Jiří Spilka (jiri.spilka)

Insane usage time

Opened 25 days ago by xylonic_gloves, last comment 25 days ago by Jiří Spilka (jiri.spilka)

Doesn't scrape table on website

Opened a month ago by esouthwick, last comment a month ago by Jindřich Bär (jindrich.bar)

Sitemap discovery takes long time (15 minutes)

Opened a month ago by Jiří Spilka (jiri.spilka), last comment 11 days ago by Jindřich Bär (jindrich.bar)

Website Scrape doesn't include Social Media Links

Opened a month ago by Synergize_AI, last comment a month ago by Jindřich Bär (jindrich.bar)

Developer
Maintained by Apify
Actor metrics
  • 3.8k monthly users
  • 635 stars
  • 100.0% runs succeeded
  • 2.7 days response time
  • Created in Mar 2023
  • Modified 7 days ago