Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Combining startUrl and includeUrlGlob
Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat
Crawling logic
Opened a day ago by nauticallygreat, last comment a day ago by nauticallygreat
Why did it run for 8 hours? Isn't there a hard limit of 9 minutes?
Opened 3 days ago by stevecasey1213, last comment 3 days ago by Jiří Spilka (jiri.spilka)
Does not extract anything from the provided website
Opened 5 days ago by ennsharma, last comment 5 days ago by ennsharma
page scrolling
Opened 6 days ago by kempt_trophy, last comment 3 days ago by kempt_trophy
Not much of value scraped.
Opened 6 days ago by methodical, last comment 5 days ago by Jiří Spilka (jiri.spilka)
Assist me in fixing the configuration errors
Opened a month ago by crispnik, last comment a month ago by Jindřich Bär (jindrich.bar)
Crawling does not work for some sitemaps
Opened 2 months ago by sai_sampath, last comment a month ago by Jiří Spilka (jiri.spilka)
Avoid scraping pages when redirected
Opened 2 months ago by AnyTeam, last comment a month ago by Oscar Rodriguez (Oscardz)
Can't crawl while logged in
Opened 2 months ago by rust_chimta, last comment 2 months ago by Oscar Rodriguez (Oscardz)
My run doesn't work. I have 0 results
Opened 3 months ago by contact_plune, last comment 3 months ago by Jan Buchar (janbuchar)
Poor CPU utilization due to low usage limit
Opened 3 months ago by write2souvik, last comment 2 months ago by write2souvik
Crawling takes longer when calling API vs on site
Opened 3 months ago by adi-kamaraj, last comment 3 months ago by Jan Buchar (janbuchar)
How to ignore broken SSL when using PROXY
Opened 3 months ago by sash2s, last comment 3 months ago by Jan Buchar (janbuchar)
Unable to crawl https://openai.com/index/extracting-concepts-from-gpt-4/
Opened 3 months ago by imda_peckyoke, last comment 3 months ago by Jindřich Bär (jindrich.bar)
Crawler does not identify relative links
Opened 3 months ago by MavenAGI, last comment 3 months ago by Jindřich Bär (jindrich.bar)
My Runs do not end
Opened 3 months ago by matthias.amberg, last comment 3 months ago by matthias.amberg
Parsing website with CloudFlare protection
Opened 3 months ago by sash2s, last comment 3 months ago by sash2s
Unable to crawl the whole website
Opened 3 months ago by simpleworks, last comment 3 months ago by Jan Buchar (janbuchar)
Automating Web Content Crawling for Real-Time Updates
Opened 3 months ago by glovebubble, last comment 3 months ago by Jan Buchar (janbuchar)
- 3.8k monthly users
- 635 stars
- 100.0% of runs succeeded
- 2.6 days average response time
- Created in Mar 2023
- Modified 7 days ago