Website Content Crawler

apify/website-content-crawler

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.

Website Screenshot

Closed

athletic_trophy opened this issue
2 months ago

Enterprise Customer Request

We need this Actor's screenshot function to run only after all CSS animations have finished rendering. Currently, when we take screenshots of modern, animation-heavy websites, the animations are often still in progress, so content such as text is not fully visible.

The other Actor, the one for URL screenshots (screenshot-url), does not have this issue, and we'd like the same behavior here. When we reported this issue there, they directed us to this Actor.

https://console.apify.com/actors/rGCyoaKTKhyMiiTvS/issues/WnDNC6IfujvxZAnfV

jindrich.bar

Thank you for reaching out.

The current actor is primarily used to extract text information from websites for LLM processing. The text extraction scripts can disrupt the block layout of the page, making the screenshots unrepresentative.

To ensure all CSS animations are fully rendered before taking screenshots, we recommend using the Playwright Scraper. This tool lets you execute custom code in the context of the webpage. There, you can use Playwright's built-in screenshot function for precise control: by passing options, you can, for example, disable animations or set the output format of the screenshot.
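
As a rough illustration of the suggestion above, here is a minimal sketch of calling Playwright's screenshot function with animations disabled. The helper names `buildScreenshotOptions` and `captureSettledScreenshot` and the `PageLike` type are hypothetical and exist only to keep the sketch self-contained. The `animations`, `type`, and `fullPage` options are part of Playwright's `page.screenshot()`. How the resulting image is stored in a Playwright Scraper run is left out here.

```typescript
// Minimal structural types so the sketch compiles on its own. In a real run,
// `page` would be the Playwright Page object available to your custom code.
interface ScreenshotOptions {
    animations?: 'disabled' | 'allow';
    type?: 'png' | 'jpeg';
    fullPage?: boolean;
}

interface PageLike {
    screenshot(options?: ScreenshotOptions): Promise<Uint8Array>;
}

// Hypothetical helper: builds the options passed to page.screenshot().
// With animations set to 'disabled', Playwright fast-forwards finite CSS
// animations to completion and resets infinite ones to their initial state.
function buildScreenshotOptions(format: 'png' | 'jpeg' = 'png'): ScreenshotOptions {
    return {
        animations: 'disabled',
        type: format,   // output format of the screenshot
        fullPage: true, // capture the whole scrollable page
    };
}

// Hypothetical helper: takes the screenshot with the options above.
async function captureSettledScreenshot(
    page: PageLike,
    format: 'png' | 'jpeg' = 'png',
): Promise<Uint8Array> {
    return page.screenshot(buildScreenshotOptions(format));
}
```

Inside custom page code, this would be called as `await captureSettledScreenshot(page, 'jpeg')`. Pages that reveal content through JavaScript-driven effects rather than CSS animations may still need an explicit wait before the screenshot is taken.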

We hope this helps. We'll be closing this issue now. Cheers!

Developer
Maintained by Apify
Actor metrics
  • 2.8k monthly users
  • 434 stars
  • 99.9% runs succeeded
  • 2.9 days response time
  • Created in Mar 2023
  • Modified 3 days ago