Website Content Crawler
No credit card required
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
set the depth to 0
Opened 6 days ago by environmental_mammal, last comment 6 days ago by Jiří Spilka (jiri.spilka)
Few requests
Opened 16 days ago by nimble_caretaker, last comment 10 days ago by nimble_caretaker
TypeError: Cannot read properties of undefined (reading 'content-type')
Opened 23 days ago by sevcik, last comment 11 days ago by Jiří Spilka (jiri.spilka)
Poor CPU utilization due to low usage limit
Opened 3 months ago by write2souvik, last comment 3 months ago by write2souvik
Crawling takes longer when calling API vs on site
Opened 4 months ago by adi-kamaraj, last comment 4 months ago by Jan Buchar (janbuchar)
Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/
Opened 4 months ago by imda_peckyoke, last comment 4 months ago by Jindřich Bär (jindrich.bar)
My Runs do not end
Opened 4 months ago by matthias.amberg, last comment 4 months ago by matthias.amberg
Parsing website with CloudFlare protection
Opened 4 months ago by sash2s, last comment 4 months ago by sash2s
Unable to crawl the whole website
Opened 4 months ago by simpleworks, last comment 4 months ago by Jan Buchar (janbuchar)
Automating Web Content Crawling for Real-Time Updates
Opened 4 months ago by glovebubble, last comment 4 months ago by Jan Buchar (janbuchar)
Getting duplicate URLs in web crawling
Opened 4 months ago by simpleworks, last comment 4 months ago by Jan Buchar (janbuchar)
Memory limit control
Opened 5 months ago by vitthalrao.lavate, last comment 4 months ago by intriguing_game
Treat hash URLs as separate pages to crawl
Opened 5 months ago by civic-roundtable, last comment 5 months ago by civic-roundtable
Crawling claims to succeed, but crawls nothing and returns no results
Opened 5 months ago by chrislrobert, last comment 5 months ago by chrislrobert
Chrome+Playwright crawler is deprecated and not working anymore
Opened 5 months ago by sestek, last comment 5 months ago by Jindřich Bär (jindrich.bar)
Actor run timed out
Opened 6 months ago by david_conveyor, last comment 6 months ago by Ivan Vasilev (ivanvia)
Adaptive crawler is failing to crawl the start URL
Opened 7 months ago by embrace_ai, last comment 7 months ago by embrace_ai
bug: iframe contents don't get extracted properly
Opened 7 months ago by bllndman, last comment 6 months ago by bllndman
Can we limit the number of pages inside a child?
Opened 7 months ago by sai_sampath, last comment 7 months ago by Jindřich Bär (jindrich.bar)
Failed Crawling for G2 web pages
Opened 8 months ago by motivated_leaflet, last comment 8 months ago by motivated_leaflet
Actor Metrics
3.9k monthly users
718 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 18 hours ago