Website Content Crawler

Pricing

Pay per usage

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

3.7 (41)

1526

Total users

59K

Monthly users

7.9K

Runs succeeded

>99%

Issues response

7.6 days

Last modified

4 days ago

Cannot retrieve info from lazy-loaded part

Closed

chutnarin opened this issue
a year ago

I tried to scrape the view count of the 48th reel (highlighted) from this page (https://www.facebook.com/people/Pang-Piraya/100011525405767/?sk=reels_tab), but the crawler always retrieves only the first 10 reels. What should I configure in the input? Please help.

jindrich.bar

Hello @chutnarin and thank you for your interest in this Actor!

This Actor (Website Content Crawler) is not primarily designed for scraping social media. Platforms like Facebook or X (Twitter) often employ heavy anti-scraping measures.

If you want to scrape Facebook, look for Facebook-related Actors in the Store. Alternatively, you can try searching GitHub for some open-source Facebook scrapers and actorize them for use on Apify. You can find guides for that in our Documentation.

Thank you for understanding. Cheers!