Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
The wording and setup need to be spoon-fed. I know you clever tech folk make tech folk stuff, but treat the rest of us like babies if you don't mind. I've just extracted a whole bunch of text I already got from GPT (free). I only wanted the emails but can't tell where to find the info.
Why not just have buttons?
What would you like to scrape?
Choose: [Emails] [Phone Numbers] [URLs] etc.
Hi, thank you for your interest in this Actor!
I’m not sure if I fully understand your issue. Are you referring to the Website Content Crawler or the Contact Details Scraper?
I agree that both have numerous tuning options, making the configuration somewhat complex—though that’s often the nature of scraping.
If you want to scrape contact details from a specific list of URLs, I would use the Contact Details Scraper. Specify your list of URLs and set the Maximum link depth to 0. This will ensure that only the provided URLs are scraped for contact details.
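For anyone who prefers to script this rather than fill in the form, here is a minimal sketch of the same setup using the `apify-client` Python package. The input field names `startUrls` and `maxDepth`, and the `emails` field on each result item, are assumptions made for illustration; check the Actor's input and output schema for the actual names. The `actor_id` and `token` arguments are placeholders you would supply yourself.

```python
from typing import Any, Iterable


def build_contact_scraper_input(urls: Iterable[str]) -> dict[str, Any]:
    """Build a run input that scrapes only the given URLs, following no links.

    ASSUMPTION: `startUrls` and `maxDepth` are illustrative field names;
    the Actor's real input schema may use different ones.
    """
    return {
        "startUrls": [{"url": u} for u in urls],
        "maxDepth": 0,  # assumed name for the "Maximum link depth" setting
    }


def collect_emails(items: Iterable[dict[str, Any]]) -> list[str]:
    """Return unique, lowercased emails in first-seen order.

    ASSUMPTION: each result item carries an `emails` list; items that
    lack one (or have it set to None) are skipped.
    """
    seen: dict[str, None] = {}
    for item in items:
        for email in item.get("emails") or []:
            seen.setdefault(email.lower(), None)
    return list(seen)


def run_contact_scraper(token: str, actor_id: str, urls: list[str]) -> list[str]:
    """Run the Actor and return only the emails it found.

    Requires network access, an Apify API token, and `pip install apify-client`.
    """
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=build_contact_scraper_input(urls))
    items = client.dataset(run["defaultDatasetId"]).iterate_items()
    return collect_emails(items)
```

The two helper functions need no network access, so you can check the input you are about to send, and the way emails are pulled out of the results, before spending any credits on a real run.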
Hey! What I'm saying is give users both options. Right now, it's one option (the complex route). Maybe have a simplified version. There are drop-downs and JSON chatter and all sorts of random things that I know I won't use. I have used scrapers before and found them a lot simpler. I think you could reach a wider audience with a simplified version and an [Advanced Mode] option. Anyway, it's not for me to tell you what to do. I don't know what your long-term goal is with it; this is just a suggestion from a layman.
Hi, thank you for the feedback. I appreciate it. I agree that the settings can be complicated, and the learning curve is steep. I’ll share your feedback with the team.
I’ll close this issue now. Please feel free to reach out if you have any other questions.
- 3.8k monthly users
- 636 stars
- 100.0% runs succeeded
- 2.7 days response time
- Created in Mar 2023
- Modified 7 days ago