Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

Issues

Hi, my run didn't work

Opened a day ago by ballerine, last comment 38 minutes ago by Oscar Rodriguez (Oscardz)

Request for HTML output that keeps only essential <img> tags

Opened 2 days ago by confident_socket, last comment a day ago by Oscar Rodriguez (Oscardz)

Data not being pushed to Pinecone from WCC

Opened 3 days ago by Custombizio, last comment 20 hours ago by Oscar Rodriguez (Oscardz)

My run doesn't enqueue any URLs

Opened 3 days ago by bhupeshchandra, last comment 3 days ago by bhupeshchandra

Limiting scraped pages (e.g. maxCrawlPages = 30) doesn't work

Opened 6 days ago by beaming_gauge, last comment 5 days ago by Jan Buchar (janbuchar)

My run doesn't work. I have 0 results

Opened 9 days ago by contact_plune, last comment 9 days ago by Jan Buchar (janbuchar)

Poor CPU utilization due to low usage limit

Opened 12 days ago by write2souvik, last comment 2 days ago by write2souvik

Crawling subdomains

Opened 14 days ago by gainful_governor, last comment 13 days ago by Jan Buchar (janbuchar)

Crawling takes longer when calling the API than when running on the site

Opened 16 days ago by adi-kamaraj, last comment 15 days ago by Jan Buchar (janbuchar)

Correct JSON body (API)

Opened 16 days ago by glovebubble, last comment 15 days ago by glovebubble

How to ignore broken SSL certificates when using a proxy

Opened 18 days ago by sash2s, last comment 16 days ago by Jan Buchar (janbuchar)

Zapier trigger times out

Opened 18 days ago by teal_northerner, last comment 2 hours ago by Oscar Rodriguez (Oscardz)

Crawler accesses pages and loads data correctly, but status code is 404

Opened 20 days ago by cirez_d, last comment 19 days ago by cirez_d

Not able to download any pages or files

Opened 22 days ago by ollieiq, last comment 20 days ago by Jindřich Bär (jindrich.bar)

Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/

Opened 23 days ago by imda_peckyoke, last comment 23 days ago by Jindřich Bär (jindrich.bar)

Screenshot got cut off

Opened a month ago by harry_tran, last comment 23 days ago by harry_tran

Crawler does not identify relative links

Opened a month ago by MavenAGI, last comment a month ago by Jindřich Bär (jindrich.bar)

Crawler fails crawling nike.com

Opened a month ago by ballerine, last comment 23 days ago by Jindřich Bär (jindrich.bar)

Download RSS Feeds

Opened a month ago by carlson, last comment a month ago by Jindřich Bär (jindrich.bar)

Configuring Crawler Settings for Crawling Image URLs

Opened a month ago by glovebubble, last comment a month ago by Jindřich Bär (jindrich.bar)

Developer
Maintained by Apify
Actor metrics
  • 2.9k monthly users
  • 440 stars
  • 99.9% runs succeeded
  • 2.8 days response time
  • Created in Mar 2023
  • Modified about 23 hours ago