Facebook Marketplace Scraper

Developed by DataVoyantLab (Maintained by Community)

Extract data from Facebook Marketplace listings and export it to CSV or JSON, or access it through a powerful API.

Rating: 4.0 (5)

Pricing: $25.00/month + usage
Total users: 229
Monthly users: 58
Runs succeeded: 96%
Issue response: 23 hours
Last modified: 6 days ago


Duplicates Question

Closed

futurafree opened this issue 2 months ago

When using "Deduplicate across runs" with a low maxItems (e.g., 10) for frequent checking, could the Actor stop the run early when it detects that all 10 results are duplicates, instead of skipping them? Currently it skips all the duplicates and returns the listings that come after them. But the Facebook URL I'm using is sorted by newest, so I only need to detect whether any of the first few listings at the top are new; if the first 10 come back as duplicates, I know that no new listings have arrived.

This would optimize things greatly: the Actor wouldn't keep scraping the same info over and over, and it wouldn't skip ahead to the next batch of items even though they aren't the newest listings. Thanks!
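(For illustration, here is a minimal sketch of the early-stop check being requested, assuming each listing exposes a stable ID and that IDs from previous runs are persisted somewhere. The function and field names are hypothetical, not the Actor's actual internals.)

```python
# Hypothetical sketch of the requested early-stop behavior; not the Actor's code.
def check_first_page(listings: list[dict], seen_ids: set[str], max_items: int = 10):
    """Look at the first `max_items` listings of a newest-first results page.

    Returns (new_listings, stop_early). If every listing in the head is
    already known, the whole run can stop: since the page is sorted by
    creation time, nothing newer exists further down.
    """
    head = listings[:max_items]
    new_listings = [item for item in head if item["id"] not in seen_ids]
    stop_early = not new_listings
    return new_listings, stop_early
```

The newest-first sort order is what makes stopping safe: an all-duplicate head proves there is nothing new anywhere on the page.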

DataVoyantLab

Hello, we will take this into consideration for the next release.

DataVoyantLab

Hello, I'm back from a break. What about a parameter stop_on_first_page_all_duplicates? Would that resolve your issue?
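(A sketch of how a scheduled check might call the Actor with such a flag via the Apify Python client. The Actor ID and input field names below are assumptions for illustration; consult the Actor's input schema for the real keys.)

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Field names mirror the thread ("maxItems", "Deduplicate across runs",
# "stop_on_first_page_all_duplicates") but are assumptions; the Actor ID
# is a placeholder.
run_input = {
    "startUrls": [{
        "url": "https://www.facebook.com/marketplace/prague/vehicles/"
               "?sortBy=creation_time_descend&exact=true"
    }],
    "maxItems": 10,
    "deduplicateAcrossRuns": True,
    "stop_on_first_page_all_duplicates": True,
}

run = client.actor("<ACTOR_ID>").call(run_input=run_input)

# With the proposed flag, an empty dataset would mean no new listings
# have appeared since the last run.
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"{len(items)} new listing(s) found")
```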

futurafree, 20 days ago

Hello, that should work. So if all the items I'm searching for (let's say the first 10 listings) are duplicates, it will stop the task and return nothing, correct? If so, that's great.

Now, if there are, say, 6 duplicate listings and the other 4 are new, are all 10 listings still returned? Or will the user only be charged for the 4 new listings, not the 6 duplicates, before the task stops?

DataVoyantLab avatar

You're absolutely right.

Just to clarify, when I mention the cost per item and per page, I'm referring to proxy usage cost.

So, if all 10 listings (6 duplicates + 4 new) are on the same page, you'll only incur proxy costs for: 1 page crawl + 4 new item detail requests.

Since the 6 duplicates are already known, we skip their detail requests, saving proxy usage and reducing the overall cost.
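(A back-of-the-envelope sketch of the cost model described above; this only illustrates the explanation in this thread and is not an official pricing formula.)

```python
# Sketch of the cost model described above: proxy usage accrues per
# listing-page crawl plus per *new* item detail request; known duplicates
# are skipped before their detail request, so they add nothing.
def estimate_proxy_requests(pages_crawled: int, new_items: int) -> int:
    return pages_crawled + new_items

# The thread's example: one page holding 10 listings, 6 already seen, 4 new.
print(estimate_proxy_requests(pages_crawled=1, new_items=4))  # -> 5
```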

futurafree, 20 days ago

Great, okay. That will also save reads and writes, since storage has been one of my biggest costs. Thanks a lot, that sounds perfect!

DataVoyantLab

Do you have any statistics about your costs (especially for storage)? Could you share them with me, please? datavoyant -> gmail

DataVoyantLab

The feature is now available! When you get a moment, could you please leave a star rating for the Actor? Your feedback really helps. Thanks!

A run log demonstrating the new early-stop behavior:

2025-06-05T00:15:14.179Z [apify] INFO ▶️ Start scraping: https://www.facebook.com/marketplace/prague/vehicles/?sortBy=creation_time_descend&exact=true
2025-06-05T00:15:19.598Z [apify] INFO ♻️ skipped 7 duplicate items
2025-06-05T00:15:19.723Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:19.725Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.518Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.520Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.522Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.657Z [apify] INFO 📄 page 2 → new: 0, total: 1
2025-06-05T00:15:21.760Z [apify] INFO 🛑 All items on this page are duplicates. Stopping early.
2025-06-05T00:15:21.762Z [apify] INFO ✅ Finished https://www.facebook.com/marketplace/prague/vehicles/?sortBy=creation_time_descend&exact=true (total items: 1)
2025-06-05T00:15:21.764Z [apify] INFO ✅ All URLs processed; exiting.