7 days trial then $20.00/month - No credit card required now

Facebook Pages Scraper

apify/facebook-pages-scraper

Facebook scraping tool to crawl and extract basic data from one or multiple Facebook Pages. Extract Facebook page name, page URL address, category, likes, check-ins, and other public data. Download data in JSON, CSV, Excel and use it in apps, spreadsheets, and reports.

Multiple retries instead of instant fail

Closed

Ernest Bursa (ernest) opened this issue

We found out that the scraper stays in a retry loop instead of failing when given a bad URL.

2023-11-28T21:42:52.475Z ACTOR: Pulling Docker image of build n0sxaG0Cxh0Ctm7cr from repository.
2023-11-28T21:42:52.577Z ACTOR: Creating Docker container.
2023-11-28T21:42:52.645Z ACTOR: Starting Docker container.
2023-11-28T21:43:00.291Z INFO  System info {"apifyVersion":"3.1.13","apifyClientVersion":"2.8.4","crawleeVersion":"3.6.2","osType":"Linux","nodeVersion":"v16.20.2"}
2023-11-28T21:43:00.650Z INFO  Decoding 1 Facebook URLs
2023-11-28T21:43:00.700Z INFO  CheerioCrawler: Starting the crawler.
2023-11-28T21:43:06.108Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:06.109Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":1}
2023-11-28T21:43:11.778Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:11.787Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":2}
2023-11-28T21:43:16.725Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:16.731Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":3}
2023-11-28T21:43:21.457Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:21.458Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":4}
2023-11-28T21:43:27.441Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:27.442Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":5}
2023-11-28T21:43:31.913Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:31.915Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":6}
2023-11-28T21:43:37.175Z WARN  CheerioCrawler: Reclaiming failed request back to the list or queue. Facebook access error
2023-11-28T21:43:37.177Z     at validatedUrlHandler (file:///usr/src/app/src/main.js:83:19) {"id":"WcE27lkdE8DXhVY","url":"https://www.facebook.com/WithlovefromLeah","retryCount":7}
2023-11-28T21:43:37.364Z ACTOR: The Actor run has reached the timeout of 45 seconds, aborting it. You can increase the timeout in Settings > Run options.

Alexey Udovydchenko (alexey)

Hi! Currently the only detectable case is when a page was removed from Facebook by an explicit action (deleted by its admin or banned by Meta). In all other cases it's impossible to tell whether a page is unavailable because of random blocking or permanently. I'm going to close the issue now, but if there is anything else we can help with, please let us know.

Ernest Bursa (ernest)

Sorry, I was not clear. I'd like the worker to fail once it detects that the page will not return any results, instead of retrying indefinitely and increasing the scraping cost. I hope you will have a good idea of how to address this. Thank you!

Andrey Bykov (Andrey_Bykov)

As Alexey wrote above - this error does not mean that the request will not fetch any data, it could be a random blocking, or some other random issue, meaning that eventually, it WILL fetch the data. Failing the run for each error like this means failing both 'bad' and 'good' pages.

Basically, what I am trying to say - there's currently no certain way to detect whether the page will present the results or not.

Ernest Bursa (ernest)

Could we expose a parameter for how many times we would like to retry? For me, up to 3 retries make sense; after that, the request should fail. I'm fine with the default being higher, as long as I can override it as a user.
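
To make the request concrete, a capped retry loop might look like the minimal sketch below. It is not the actor's actual code: `fetch_page`, `AccessError`, and `max_retries` are hypothetical names, and the default of 10 is an arbitrary stand-in for "a higher default". The fake fetcher never touches the network.

```python
from typing import Callable


class AccessError(Exception):
    """Hypothetical error for a blocked or unavailable page."""


def fetch_with_retries(
    fetch_page: Callable[[str], str], url: str, max_retries: int = 10
) -> str:
    """Call fetch_page(url), retrying on AccessError at most max_retries times.

    The first call is not a retry, so the page is attempted up to
    max_retries + 1 times before the last error is re-raised.
    """
    attempt = 0
    while True:
        try:
            return fetch_page(url)
        except AccessError:
            if attempt >= max_retries:
                raise  # give up instead of retrying indefinitely
            attempt += 1


# Usage: a fake fetcher that is always blocked, to show the cap in action.
calls = []


def always_blocked(url: str) -> str:
    calls.append(url)
    raise AccessError("Facebook access error")


try:
    fetch_with_retries(
        always_blocked, "https://www.facebook.com/WithlovefromLeah", max_retries=3
    )
except AccessError:
    print(f"gave up after {len(calls)} attempts")  # 1 initial try + 3 retries
```

With a cap in place, the cost of a page that never loads is bounded by the number of attempts rather than by the run timeout.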

Alexey Udovydchenko (alexey)

Hi! Done:

Added support for a custom maxRequestRetries value in the JSON input
Changed the error message to "Page access was blocked or page is not available, retrying with new session"

Sample run: https://console.apify.com/view/runs/oIVV0Q5iaRzWqiSQV

I'm going to close the issue now, but if there is anything else we can help with, please let us know.

Ernest Bursa (ernest)

I'm not sure whether you updated the schema to match the change. There is no way to specify maxRequestRetries in the UI.

Andrey Bykov (Andrey_Bykov)

It's updated in the JSON input. The default should work for most users, and there's a reason for the higher number (based on testing, etc.), so the schema does not include it. But you can 'Switch to JSON editor' when you're on the Input tab, add this option, and start the run.

Alexey, correct me if I'm wrong. We should also make sure it's reflected in the Readme.
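
For readers following the JSON-editor route, the input might look like the sketch below. Only `maxRequestRetries` is confirmed by this thread; the `startUrls` field and its shape are an assumption about the actor's input schema, so check the JSON view on the Input tab before relying on it. `build_input` is a hypothetical helper.

```python
import json
from typing import Sequence


def build_input(page_urls: Sequence[str], max_request_retries: int) -> dict:
    """Build a run input with a custom retry cap.

    `maxRequestRetries` is the option named in this thread; `startUrls`
    is an assumed field name, not confirmed here.
    """
    if max_request_retries < 0:
        raise ValueError("max_request_retries must be >= 0")
    return {
        "startUrls": [{"url": url} for url in page_urls],  # assumed schema
        "maxRequestRetries": max_request_retries,
    }


run_input = build_input(["https://www.facebook.com/WithlovefromLeah"], 3)
# Paste the printed JSON into the editor after 'Switch to JSON editor'.
print(json.dumps(run_input, indent=2))
```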

Alexey Udovydchenko (alexey)

Andrey, confirmed. A custom maxRequestRetries value is a special use case and should be kept out of the UI input form. A low value might lead to data loss, which is why it should not be exposed in the visual input, and the JSON option should not be recommended in the readme either.
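
The data-loss concern can be made concrete with a back-of-the-envelope model. This is only a sketch: it assumes each attempt is blocked independently with the same probability, and the 50% block rate is a made-up figure, not a measurement of Facebook or of this actor. `loss_probability` is a hypothetical helper.

```python
def loss_probability(block_rate: float, max_retries: int) -> float:
    """Chance that a reachable page is still lost.

    Assumes independent attempts: the page is lost only if the first try
    and all max_retries retries are blocked.
    """
    if not 0.0 <= block_rate <= 1.0:
        raise ValueError("block_rate must be between 0 and 1")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return block_rate ** (max_retries + 1)


# Compare a low cap with a higher one under the assumed 50% block rate.
for retries in (3, 10):
    print(retries, loss_probability(0.5, retries))
```

Under these assumed numbers, a cap of 3 retries loses about 6% of reachable pages, while a cap of 10 loses about 0.05%.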

Developer

Apify

Actor metrics

476 monthly users
34 stars
100.0% runs succeeded
19 hours response time
Created in Feb 2020
Modified 5 days ago

Categories

Social media

Marketing

Lead generation

Lightweight Facebook Pages Scraper

oussemafr/lightweight-facebook-pages-scraper

Scrape detailed Facebook page information efficiently and cost-effectively with our best scraper. Extract valuable data like page names, URLs, contact details, addresses, likes, and followers for competitive analysis, market research, trend monitoring, and social media analysis.

Oussema FRIKHA

Facebook Pages Scraper

powerful_bachelor/Facebook-Posts-Scraper

Unlock the power of Facebook data with 📘 Facebook Pages Scraper! Dive deep into page analytics, trends, and audience engagement. Curious what insights are hidden in plain sight? Let's explore together!!

Powerful Bachelor

Facebook Groups Scraper

apify/facebook-groups-scraper

Extract data from one or multiple public Facebook groups. Get group and post URLs, post text, comments, timestamp, likes and comments count, and basic commentator info. Download the data in JSON, CSV, and Excel and use it in apps, spreadsheets, and reports.

Apify

5.4k

Facebook page contact info Scraper

apify/facebook-page-contact-information

Get Facebook pages addresses, email, likes, website, check-ins, and phone information

Apify

1.9k

Facebook Photos Scraper

apify/facebook-photos-scraper

Extract data from one or multiple Facebook images. Get image ID, Facebook photo URL, image URL, OCR text, and more. Download the data in JSON, CSV, and Excel and use it in apps, spreadsheets, and reports.

Apify

771

Facebook Pages Search

roundedge/facebook-pages-search

With more than 200M pages of businesses around the world, searching companies on Facebook is a must have when it comes to search for B2B leads. Enter the locations, the categories and keywords you're looking for and be ready to act on the results.

RoundEdge

992

Facebook Likes Scraper

apify/facebook-likes-scraper

Extract Facebook likes data from one or multiple Facebook posts. Get post URL, reaction type (like, love, care, sad, angry, laugh), and basic liker info such as Facebook name and profile URL. Download the data in JSON, CSV, and Excel and use it in apps, spreadsheets, and reports.

Apify

975