Deduplicator
Developed by Mustafa Irshaid · Maintained by Community
Pricing: $5.00/month + usage · Under maintenance

Filters out duplicates from previous runs and outputs only new data. Perfect for scheduled scrapers or chained Actors, ensuring you process fresh results every time. Seamlessly integrates with any Apify Actor using the default dataset ID.

🔁 Deduplicator Actor – Output New Data Only

This actor filters out duplicates from previous runs and emits only new data.
Designed to be used as a post-processing layer after any Apify Actor, it automatically detects and removes duplicate items by comparing against cached results stored in a persistent key-value store.
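
The comparison is content-based: each item is reduced to a fingerprint of its fields rather than matched on a specific key. A minimal sketch of such a fingerprint is shown below; it assumes a SHA-256 digest over the item serialized with sorted keys (so key order does not affect the result), which may differ from the actor's actual hashing scheme.

```ts
import { createHash } from 'node:crypto';

// Serialize a value with object keys sorted, so two items with the same
// content but different key order produce the same string.
function canonicalize(value: unknown): string {
    if (Array.isArray(value)) {
        return `[${value.map(canonicalize).join(',')}]`;
    }
    if (value !== null && typeof value === 'object') {
        const entries = Object.entries(value as Record<string, unknown>)
            .sort(([a], [b]) => a.localeCompare(b))
            .map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`);
        return `{${entries.join(',')}}`;
    }
    return JSON.stringify(value);
}

// Content-based fingerprint of a dataset item (assumed: SHA-256 hex digest).
export function hashItem(item: Record<string, unknown>): string {
    return createHash('sha256').update(canonicalize(item)).digest('hex');
}
```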


🔧 How It Works

  • Accepts input from another actor via integration (no manual input required).
  • Reads from the defaultDatasetId in the incoming payload.
  • Hashes each item based on its content.
  • Compares against previously stored hashes in a key-value store.
  • Outputs only new, unseen items to this actor’s default dataset (a minimal sketch of this flow follows below).
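
Taken together, the steps above can be sketched roughly as follows. This is an illustrative sketch, not the actor's actual source: it assumes the webhook payload arrives as the actor input with a `resource.defaultDatasetId` field, that seen hashes are kept under a single record in a named key-value store (the store and record names here are made up), and that `hashItem` is the fingerprint helper sketched earlier.

```ts
import { Actor } from 'apify';
import { hashItem } from './hash.js'; // hypothetical module holding the fingerprint helper

await Actor.init();

// 1. Read the incoming payload (e.g. from an ACTOR.RUN.SUCCEEDED webhook).
const input = (await Actor.getInput()) as { resource?: { defaultDatasetId?: string } } | null;
const datasetId = input?.resource?.defaultDatasetId;
if (!datasetId) throw new Error('No defaultDatasetId found in the incoming payload.');

// 2. Load the items produced by the upstream actor run.
//    (Pagination is omitted for brevity; large datasets would be read in pages.)
const sourceDataset = await Actor.openDataset(datasetId);
const { items } = await sourceDataset.getData();

// 3. Load previously seen fingerprints from a persistent, named key-value store.
//    Store and record names are illustrative only.
const cache = await Actor.openKeyValueStore('deduplicator-cache');
const seen = new Set<string>((await cache.getValue<string[]>('seen-hashes')) ?? []);

// 4. Keep only items whose content hash has not appeared in any previous run.
const newItems = items.filter((item) => {
    const hash = hashItem(item);
    if (seen.has(hash)) return false;
    seen.add(hash);
    return true;
});

// 5. Output new items to this actor's default dataset and persist the updated cache.
await Actor.pushData(newItems);
await cache.setValue('seen-hashes', [...seen]);

console.log(`Processed ${items.length} items: ${newItems.length} new, ${items.length - newItems.length} skipped.`);

await Actor.exit();
```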

✅ Features

  • 🧠 Smart deduplication across multiple runs.
  • 💾 Persistent cache using Apify key-value store.
  • 🔗 Seamless integration with any actor (no input config needed).
  • Zero configuration – just plug it in and run.

🚀 Usage

  1. Integrate after any data-producing actor
    Use Apify’s Actor-to-Actor Integration or a webhook triggered on ACTOR.RUN.SUCCEEDED (an example setup appears after this list).

  2. Let it process automatically
    It detects the dataset from the actor run and filters out known items.

  3. Consume the results
    New data will be available in this actor’s default dataset.
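
For step 1, the integration can be set up in the Apify Console, or programmatically with apify-client as sketched below. Treat this as an assumption about one possible setup: the actor IDs are placeholders, and the request URL simply starts a new Deduplicator run through the Apify API, so the default webhook payload (which carries resource.defaultDatasetId) becomes the Deduplicator's input.

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Placeholder IDs for the upstream scraper and the Deduplicator actor.
const SOURCE_ACTOR_ID = '<source-actor-id>';
const DEDUPLICATOR_ACTOR_ID = '<deduplicator-actor-id>';

// Trigger a Deduplicator run whenever the source actor finishes successfully.
// Apify sends its default webhook payload, which includes resource.defaultDatasetId.
await client.webhooks().create({
    eventTypes: ['ACTOR.RUN.SUCCEEDED'],
    condition: { actorId: SOURCE_ACTOR_ID },
    requestUrl: `https://api.apify.com/v2/acts/${DEDUPLICATOR_ACTOR_ID}/runs?token=${process.env.APIFY_TOKEN}`,
});
```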


📤 Output

  • Dataset containing only new items not seen in previous runs.
  • Duplicates are skipped silently.
  • Logs include a summary of processed, new, and skipped entries.

🧪 Example Scenario

You're scraping job listings daily, and most results stay the same from run to run.
With this actor integrated, only newly discovered jobs are pushed forward to your database, notification system, or data pipeline (sketched below).
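
As a sketch of the downstream side, the snippet below reads the Deduplicator's default dataset for a finished run and forwards each new job; the run ID, the notify.example.com endpoint, and the POST-per-job approach are hypothetical stand-ins for your own pipeline.

```ts
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Hypothetical: the ID of a finished Deduplicator run (e.g. from a webhook or schedule).
const run = await client.run('<deduplicator-run-id>').get();
if (!run) throw new Error('Run not found.');

// Only new, previously unseen job listings end up in this dataset.
const { items: newJobs } = await client.dataset(run.defaultDatasetId).listItems();

// Forward each new job to a downstream system (placeholder endpoint).
for (const job of newJobs) {
    await fetch('https://notify.example.com/new-job', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(job),
    });
}

console.log(`Forwarded ${newJobs.length} new job listings.`);
```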


📄 License

MIT