Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
I am trying to grab the content of "https://ung.no/oss/sEEF4x6B9VDmdLLCPcVaoQ". Sometimes it works, but most often it does not and returns "404 - Fant ikke siden..." (page not found). When I open the URL in a browser, it shows "404 - Fant ikke siden" for a second before showing its real content. Any suggestions?
Hello, this website takes a long time to load, so you can try increasing "dynamicContentWaitSecs" to 20 seconds. Hope this works. Best regards,
Not really. When I use the API with a list of 1000 URLs, some of them still fail. But when I test them in the apify-client, they work and the content is there.
Issue solved by using waitForSelector and maxConcurrency = 2.
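For anyone hitting the same problem, here is a rough sketch of how the fix might look with the Python apify-client. The input field names (`waitForSelector`, `maxConcurrency`, `dynamicContentWaitSecs`, `maxCrawlDepth`) are taken from this thread or assumed from the Actor's input schema, the `"main"` selector is a hypothetical placeholder for whatever element only exists once the real page has rendered, and the `text` field on dataset items is an assumption about the output format. Treat it as a starting point, not the official recipe.

```python
import os

ACTOR_ID = "apify/website-content-crawler"
# Text the site shows briefly before the real content loads.
SOFT_404_MARKER = "404 - Fant ikke siden"


def build_run_input(urls, selector="main", max_concurrency=2, wait_secs=20):
    """Build the Actor input. Field names are assumptions, check the input schema."""
    return {
        "startUrls": [{"url": u} for u in urls],
        "maxCrawlDepth": 0,  # assumed to mean: only fetch the listed URLs
        "waitForSelector": selector,  # hypothetical selector, replace with a real one
        "maxConcurrency": max_concurrency,
        "dynamicContentWaitSecs": wait_secs,
    }


def is_soft_404(item):
    """True if a dataset item looks like the temporary 'page not found' screen."""
    return SOFT_404_MARKER in (item.get("text") or "")


def crawl(urls, token):
    """Run the Actor once and return (all items, URLs that still look like 404s)."""
    from apify_client import ApifyClient  # imported here so the helpers work without it

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(urls))
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    failed = [item.get("url") for item in items if is_soft_404(item)]
    return items, failed


if __name__ == "__main__":
    token = os.environ.get("APIFY_TOKEN")
    if token:
        _, failed = crawl(["https://ung.no/oss/sEEF4x6B9VDmdLLCPcVaoQ"], token)
        print("URLs to retry:", failed)
```

The `failed` list can be fed back into `crawl` for a second pass, which may help with the occasional URL that still returns the placeholder page in a large batch.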
- 3.8k monthly users
- 636 stars
- 100.0% runs succeeded
- 2.7 days response time
- Created in Mar 2023
- Modified 7 days ago