Website Content Crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
I have a URL that I need to extract data from, but there are 75+ pages to click through. How can I set that up in the input section?
Hello, and thank you for your interest in this Actor!
You've just run into one of the limitations of the Website Content Crawler. While it can load dynamically loaded data, it cannot interact with JS-based pagination, because there is no standardized way to implement such pagination. In this respect, WCC is similar to the Google crawler, which also claims not to process it. Implementing pagination this way can hurt a website's SEO performance, yet some websites still do it, as you found out.
If you want to automate this (for example, because you expect the page to add new links over time), check out our Web Scraper: https://apify.com/apify/web-scraper. This Actor lets you execute custom code, so you can click on page elements and navigate the website through that interaction.
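As a rough illustration of what such custom code could look like, here is a minimal sketch of a click-through pagination loop. It is not taken from the Web Scraper documentation: the function name `scrapeAllPages`, the option names, and the selectors are all hypothetical, and the small `DocLike` interface only stands in for the browser's `document` so the sketch can be type-checked on its own. How you wait for the next page of results to render depends on the website, so that step is passed in as a callback.

```typescript
// Minimal structural types so the sketch type-checks outside a browser.
// A real browser `document` is assumed to satisfy DocLike.
interface ElementLike {
  textContent: string | null;
  click(): void;
}

interface DocLike {
  querySelector(selector: string): ElementLike | null;
  querySelectorAll(selector: string): ArrayLike<ElementLike>;
}

// All option names below are hypothetical and only serve this example.
interface PaginationOptions {
  itemSelector: string; // e.g. ".result-row" (site-specific assumption)
  nextSelector: string; // e.g. "button.next:not([disabled])" (assumption)
  maxPages: number; // safety cap so the loop cannot run forever
  waitForUpdate: () => Promise<void>; // resolves once the new page has rendered
}

// Collects the text of every matching item, clicking "next" between pages.
async function scrapeAllPages(
  doc: DocLike,
  opts: PaginationOptions,
): Promise<string[]> {
  const results: string[] = [];
  for (let page = 0; page < opts.maxPages; page++) {
    // Read the items currently shown on the page.
    const items = doc.querySelectorAll(opts.itemSelector);
    for (let i = 0; i < items.length; i++) {
      const text = (items[i].textContent ?? "").trim();
      if (text) results.push(text);
    }
    // Stop when there is no (enabled) "next" control left.
    const next = doc.querySelector(opts.nextSelector);
    if (!next) break;
    next.click();
    await opts.waitForUpdate();
  }
  return results;
}
```

Inside a page function that runs in the browser, you would presumably pass the real `document` as `doc` and return the collected array as the result. With 75+ pages, set `maxPages` a little above the expected page count so that a "next" button that never disappears cannot keep the run going indefinitely.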
While I understand this doesn't exactly solve your problem, I'll close this issue. That's our usual process with JS-navigation-based issues, as there is very little we can do without turning WCC into a very website-specific tool, which we don't want to do.
Either way, feel free to suggest any ideas or ask any additional questions, be it here, or in new issues. Thanks!
I appreciate your help and reply. I tried that Actor, but I can't figure out how to set up the custom code that will allow me to click on the page elements and navigate the website using this interaction. I posted for help in that Actor's support queue but haven't heard back in a few days. I realize it's not your speciality, but would you be able to help me out and set it up?
Actor Metrics
3.9k monthly users
718 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 15 hours ago