Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

Combining startUrl and includeUrlGlob

Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat

Crawling logic

Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat

Why did it run for 8 hours, isn't there a hard limit of 9 minutes?

Opened 3 days ago by stevecasey1213, last comment 3 days ago by Jiří Spilka (jiri.spilka)

Does not extract anything from the provided website

Opened 5 days ago by ennsharma, last comment 5 days ago by ennsharma

page scrolling

Opened 6 days ago by kempt_trophy, last comment 3 days ago by kempt_trophy

Not much of value scraped.

Opened 6 days ago by methodical, last comment 5 days ago by Jiří Spilka (jiri.spilka)

Assist me in fixing the configuration errors

Opened a month ago by crispnik, last comment a month ago by Jindřich Bär (jindrich.bar)

Crawling does not work for some sitemaps

Opened 2 months ago by sai_sampath, last comment a month ago by Jiří Spilka (jiri.spilka)

Avoid scraping pages when redirected

Opened 2 months ago by AnyTeam, last comment a month ago by Oscar Rodriguez (Oscardz)

Can't crawl while logged in

Opened 2 months ago by rust_chimta, last comment 2 months ago by Oscar Rodriguez (Oscardz)

My run doesn't work. I have 0 results

Opened 3 months ago by contact_plune, last comment 3 months ago by Jan Buchar (janbuchar)

Poor CPU utilization due to low usage limit

Opened 3 months ago by write2souvik, last comment 2 months ago by write2souvik

Crawling takes longer when calling API vs on site

Opened 3 months ago by adi-kamaraj, last comment 3 months ago by Jan Buchar (janbuchar)

How to ignore broken SSL when using PROXY

Opened 3 months ago by sash2s, last comment 3 months ago by Jan Buchar (janbuchar)

Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/

Opened 3 months ago by imda_peckyoke, last comment 3 months ago by Jindřich Bär (jindrich.bar)

Crawler does not identify relative links

Opened 3 months ago by MavenAGI, last comment 3 months ago by Jindřich Bär (jindrich.bar)

My Runs do not end

Opened 3 months ago by matthias.amberg, last comment 3 months ago by matthias.amberg

Parsing website with CloudFlare protection

Opened 3 months ago by sash2s, last comment 3 months ago by sash2s

Unable to crawl the whole website

Opened 3 months ago by simpleworks, last comment 3 months ago by Jan Buchar (janbuchar)

Automating Web Content Crawling for Real-Time Updates

Opened 3 months ago by glovebubble, last comment 3 months ago by Jan Buchar (janbuchar)

Developer
Maintained by Apify
Actor metrics
  • 3.8k monthly users
  • 635 stars
  • 100.0% runs succeeded
  • 2.6 days response time
  • Created in Mar 2023
  • Modified 7 days ago