Facebook Marketplace Scraper

Pricing: $25.00/month + usage

Developed by DataVoyantLab
Maintained by Community

Extract data from Facebook Marketplace listings and export to CSV, JSON, or use through a powerful API.

Rating: 4.0 (6)

Total users: 279
Monthly users: 65
Runs succeeded: >99%
Issues response: 1.3 days
Last modified: a month ago

Duplicates Question

Closed

Gavin (futurafree) opened this issue 2 months ago

When using "Deduplicate across runs" with a low maxItems (e.g., 10) for frequent checking, could the Actor stop the run early if it detects that all 10 results are duplicates, instead of skipping them? Currently it skips the duplicates and returns the listings that come after them, but the Facebook URL I'm using is sorted by newest, so I only need to detect whether any of the first few listings at the top are new: if the first 10 come back as duplicates, I know that no new listings have arrived.

This would be a big optimization: the Actor wouldn't keep scraping the same information over and over, and it wouldn't skip ahead to a batch of items that aren't the newest listings anyway. Thanks!
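In pseudocode terms, the request amounts to something like the sketch below. This is illustrative only, not the Actor's implementation: seen_ids stands in for the Actor's persisted dedup store, and fetch_listing_ids is a hypothetical helper that returns the listing IDs on one results page, newest first.

def scrape_newest(fetch_listing_ids, seen_ids, max_items=10):
    """Collect up to max_items unseen listing IDs; stop as soon as an
    entire page of the newest listings turns out to be duplicates."""
    new_ids = []
    page = 1
    while len(new_ids) < max_items:
        page_ids = fetch_listing_ids(page)
        if not page_ids:
            break  # no more results
        fresh = [i for i in page_ids if i not in seen_ids]
        if not fresh:
            # The feed is sorted by creation time, so a page made up
            # entirely of known listings means nothing newer follows.
            break
        new_ids.extend(fresh[:max_items - len(new_ids)])
        seen_ids.update(fresh)
        page += 1
    return new_ids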

DataVoyantLab

Hello, we will take this into consideration for the next release.

DataVoyantLab

Hello, I'm back from a break. What about a parameter stop_on_first_page_all_duplicates? Would that resolve your issue?

futurafree

Hello, that should work. So if all the items I'm searching for (say, the first 10 listings) are duplicates, the task stops and returns nothing, correct? If so, that's great.

Now, if there are, say, 6 duplicate listings and the other 4 are new, are all 10 listings still returned? Or will the user be charged only for the 4 new listings, not the 6 duplicates, before the task stops?

DataVoyantLab

You're absolutely right.

Just to clarify: when I mention the cost per item and per page, I'm referring to proxy usage cost.

So, if all 10 listings (6 duplicates + 4 new) are on the same page, you'll only incur proxy costs for one page crawl plus 4 new item detail requests.

Since the 6 duplicates are already known, we skip their detail requests, saving proxy usage and reducing the overall cost.
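The billing logic described above can be pictured as follows. This is an assumption drawn from the reply, not the Actor's source code; fetch_page and fetch_detail are hypothetical helpers, each standing for one proxied request.

def crawl_page(fetch_page, fetch_detail, seen_ids):
    listings = fetch_page()  # one proxied page crawl
    new = [l for l in listings if l["id"] not in seen_ids]
    details = [fetch_detail(l["url"]) for l in new]  # detail requests for new items only
    seen_ids.update(l["id"] for l in new)  # known duplicates cost nothing beyond the page crawl
    return details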

futurafree

Great, and that will also save reads and writes, since storage has been one of my biggest costs. Thanks a lot, that sounds perfect!

DataVoyantLab

Do you have any statistics about your costs (especially storage)? Could you share them with me, please? datavoyant -> gmail

DataVoyantLab

The feature is now available! When you get a moment, could you please leave a star rating for the Actor? Your feedback really helps. Thanks!
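With the feature released, a run can be triggered along the lines of the sketch below, using Apify's official Python client. The actor ID and most input field names here are assumptions based on this thread rather than confirmed documentation; only stop_on_first_page_all_duplicates comes from the exchange above. A run with the option enabled then logs something like the excerpt that follows.

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Actor ID and input field names are assumptions, not confirmed docs.
run = client.actor("datavoyantlab/facebook-marketplace-scraper").call(
    run_input={
        "startUrls": [{"url": "https://www.facebook.com/marketplace/prague/vehicles/?sortBy=creation_time_descend&exact=true"}],
        "maxItems": 10,                             # assumed field name
        "deduplicateAcrossRuns": True,              # assumed field name
        "stop_on_first_page_all_duplicates": True,  # parameter proposed above
    }
)

# Only newly discovered listings end up in the run's dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)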

2025-06-05T00:15:14.179Z [apify] INFO ▶️ Start scraping: https://www.facebook.com/marketplace/prague/vehicles/?sortBy=creation_time_descend&exact=true
2025-06-05T00:15:19.598Z [apify] INFO ♻️ skipped 7 duplicate items
2025-06-05T00:15:19.723Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:19.725Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.518Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.520Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.522Z [apify] INFO ♻️ skipped 8 duplicate items
2025-06-05T00:15:21.657Z [apify] INFO 📄 page 2 (new: 0, total: 1)
2025-06-05T00:15:21.760Z [apify] INFO 🛑 All items on this page are duplicates. Stopping early.
2025-06-05T00:15:21.762Z [apify] INFO ✅ Finished https://www.facebook.com/marketplace/prague/vehicles/?sortBy=creation_time_descend&exact=true (total items: 1)
2025-06-05T00:15:21.764Z [apify] INFO ✅ All URLs processed; exiting.