Website Content Crawler

apify/website-content-crawler

Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.

Failed Crawling for G2 web pages

Open

motivated_leaflet opened this issue
a month ago

Problem: Crawling requests for G2 never succeed; they fail after 10 session rotations. This has been consistent across runs over the last few days.

I did not expect this, since there are G2 scraping Actors and the site overall does not seem to block scraping. Thank you!

Hello,

Thanks for the report. Actually, g2.com has one of the strongest Cloudflare protections we have seen (at least on some types of pages). There is a customized approach to scraping it, and the team will look into how to incorporate it into this Actor.

motivated_leaflet

a month ago

Thank you! Looking forward to the update. G2 is a pretty key part of our project use case.

Developer
Maintained by Apify
Actor metrics
  • 2k monthly users
  • 99.9% runs succeeded
  • 2.9 days response time
  • Created in Mar 2023
  • Modified about 12 hours ago