Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.
Hi my run didn't work
Opened a day ago by ballerine, last comment 38 minutes ago by Oscar Rodriguez (Oscardz)
request for html output to keep only essential <img> outputs
Opened 2 days ago by confident_socket, last comment a day ago by Oscar Rodriguez (Oscardz)
Data not being pushed to Pinecone from WCC
Opened 3 days ago by Custombizio, last comment 20 hours ago by Oscar Rodriguez (Oscardz)
My run doesn't enqueue any URLs
Opened 3 days ago by bhupeshchandra, last comment 3 days ago by bhupeshchandra
Limiting scraped pages (e.g. maxCrawlPages = 30) doesn't work
Opened 6 days ago by beaming_gauge, last comment 5 days ago by Jan Buchar (janbuchar)
My run doesn't work. I have 0 results
Opened 9 days ago by contact_plune, last comment 9 days ago by Jan Buchar (janbuchar)
Poor CPU utilization due to low usage limit
Opened 12 days ago by write2souvik, last comment 2 days ago by write2souvik
Crawling subdomains
Opened 14 days ago by gainful_governor, last comment 13 days ago by Jan Buchar (janbuchar)
Crawling takes longer when calling API vs on site
Opened 16 days ago by adi-kamaraj, last comment 15 days ago by Jan Buchar (janbuchar)
Correct Json body (API)
Opened 16 days ago by glovebubble, last comment 15 days ago by glovebubble
How to ignore broken SSL when using PROXY
Opened 18 days ago by sash2s, last comment 16 days ago by Jan Buchar (janbuchar)
Zapier trigger = time out
Opened 18 days ago by teal_northerner, last comment 2 hours ago by Oscar Rodriguez (Oscardz)
Crawler accesses pages and loads data correctly, but status code is 404
Opened 20 days ago by cirez_d, last comment 19 days ago by cirez_d
Not able to download any pages or files
Opened 22 days ago by ollieiq, last comment 20 days ago by Jindřich Bär (jindrich.bar)
Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/
Opened 23 days ago by imda_peckyoke, last comment 23 days ago by Jindřich Bär (jindrich.bar)
Screenshot got cut off
Opened a month ago by harry_tran, last comment 23 days ago by harry_tran
Crawler does not identify relative links
Opened a month ago by MavenAGI, last comment a month ago by Jindřich Bär (jindrich.bar)
Crawler fails crawling nike.com
Opened a month ago by ballerine, last comment 23 days ago by Jindřich Bär (jindrich.bar)
Download RSS Feeds
Opened a month ago by carlson, last comment a month ago by Jindřich Bär (jindrich.bar)
Configuring Crawler Settings for Crawling Image URLs
Opened a month ago by glovebubble, last comment a month ago by Jindřich Bär (jindrich.bar)
- 2.9k monthly users
- 440 stars
- 99.9% runs succeeded
- 2.8 days response time
- Created in Mar 2023
- Modified about 23 hours ago