Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

How do I set up pagination with a URL?

Closed

sacdrexelmba opened this issue
22 days ago

I have a URL that I need to extract data from, but there are 75+ pages to click through. How can I set that up in the input section?

https://www.hlth.com/2024event/attending-companies

dusan.vystrcil

Hello, and thank you for your interest in this Actor!

You've just run into one of the limitations of the Website Content Crawler (WCC). While it can load dynamically loaded data, it cannot interact with JS-based pagination, because there is no standardized way to implement such pagination. In this respect, WCC is similar to the Google Crawler, which also claims not to process it. Implementing pagination this way can hurt a website's SEO performance, yet some websites still do it, as you found out.

If you want to automate this (for example, because you expect the page to add new links over time), you can check out our Web Scraper - https://apify.com/apify/web-scraper. That Actor lets you execute custom code, so you can click on page elements and navigate the website through those interactions.
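To make the idea of "click through the pages with custom code" more concrete, here is a minimal TypeScript sketch of the loop such code would run: extract the items on the current page, click "next", wait, and repeat until there is no next page. Everything named here (`PaginatedPage`, `extractItems`, `goToNext`, `collectAllPages`, `FakeListing`) is hypothetical and is not part of the Web Scraper API; the fake listing only stands in for a real page so that the loop can be run on its own.

```typescript
// Hypothetical abstraction over a paginated listing (not an Apify API).
interface PaginatedPage<T> {
  // Returns the items visible on the current page.
  extractItems(): Promise<T[]>;
  // Clicks the "next" control and waits for the new content.
  // Resolves to false when there is no further page.
  goToNext(): Promise<boolean>;
}

// Visits pages one by one and gathers their items.
// maxPages is a safety cap so a broken "next" button cannot loop forever.
async function collectAllPages<T>(
  page: PaginatedPage<T>,
  maxPages: number = 100,
): Promise<T[]> {
  const results: T[] = [];
  let visited = 0;
  while (visited < maxPages) {
    results.push(...(await page.extractItems()));
    visited += 1;
    // Stop at the cap, or when the page reports there is nothing left.
    if (visited >= maxPages || !(await page.goToNext())) break;
  }
  return results;
}

// In-memory stand-in for a real site, used only to exercise the loop.
class FakeListing implements PaginatedPage<string> {
  private index = 0;
  private readonly pages: string[][];

  constructor(pages: string[][]) {
    this.pages = pages;
  }

  async extractItems(): Promise<string[]> {
    return this.pages[this.index] ?? [];
  }

  async goToNext(): Promise<boolean> {
    if (this.index >= this.pages.length - 1) return false;
    this.index += 1;
    return true;
  }
}
```

In a real run, `extractItems` would read the listing from the DOM, and `goToNext` would click the site's own "next" control and wait for the new content to render. Both depend on the markup of the target website, so the selectors are deliberately left out here.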

While I understand this doesn't exactly solve your problem, I'll close this issue. That's our usual process with issues about JS-based navigation, as there is very little we can do without turning WCC into a very website-specific tool, which we don't want to do.

Either way, feel free to suggest any ideas or ask any additional questions, be it here, or in new issues. Thanks!

optimal_valuation

18 days ago

I appreciate your help and reply. I tried that Actor, but I can't figure out how to set up the custom code that will allow me to click on the page elements and navigate the website using this interaction. I posted for help in that Actor's support queue but haven't heard back in a few days. I realize it's not your specialty, but would you be able to help me out and set it up?
