
Deduplicator
Under maintenance
Pricing
$5.00/month + usage

🔁 Deduplicator Actor – Output New Data Only
This actor filters out duplicates from previous runs and emits only new data.
Designed to be used as a post-processing layer after any Apify Actor, it automatically detects and removes duplicate items by comparing against cached results stored in a persistent key-value store.
🔧 How It Works
- Accepts input from another actor via integration (no manual input required).
- Reads the `defaultDatasetId` from the incoming payload.
- Hashes each item based on its content.
- Compares each hash against previously stored hashes in a key-value store.
- Outputs only new, unseen items to this actor’s default dataset.
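The hash-and-compare step above can be sketched in Python. This is a minimal stand-in, not the actor's actual code: a plain `set` replaces the persistent key-value store, and `item_hash`/`dedupe` are illustrative names.

```python
import hashlib
import json

def item_hash(item: dict) -> str:
    """Hash an item's content; sorting keys makes the hash order-independent."""
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def dedupe(items: list[dict], seen_hashes: set[str]) -> list[dict]:
    """Return only items whose content hash is unseen; record new hashes."""
    new_items = []
    for item in items:
        h = item_hash(item)
        if h not in seen_hashes:
            seen_hashes.add(h)
            new_items.append(item)
    return new_items

# First run: everything is new.
seen: set[str] = set()
run1 = [{"id": 1, "title": "Job A"}, {"id": 2, "title": "Job B"}]
print(len(dedupe(run1, seen)))  # 2

# Second run: one repeat is filtered out, one new listing passes through.
run2 = [{"id": 2, "title": "Job B"}, {"id": 3, "title": "Job C"}]
print(len(dedupe(run2, seen)))  # 1
```

Hashing a canonical JSON serialization means two items with the same fields in a different order still count as duplicates.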
✅ Features
- 🧠 Smart deduplication across multiple runs.
- 💾 Persistent cache using Apify key-value store.
- 🔗 Seamless integration with any actor (no input config needed).
- ⚡ Zero configuration – just plug it in and run.
🚀 Usage
1. **Integrate after any data-producing actor** – use Apify’s Actor-to-Actor Integration or a webhook triggered on `ACTOR.RUN.SUCCEEDED`.
2. **Let it process automatically** – it detects the dataset from the actor run and filters out known items.
3. **Consume the results** – new data will be available in this actor’s default dataset.
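The detection step works from the run payload delivered by the integration. The sketch below assumes a hypothetical payload shape with the dataset ID under `resource.defaultDatasetId`; check the Apify webhook documentation for the exact fields.

```python
# Hypothetical payload from a successful actor run (shape for illustration
# only; the real Apify webhook payload may differ).
payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {
        "id": "hypothetical-run-id",
        "defaultDatasetId": "hypothetical-dataset-id",
    },
}

def dataset_id_from_payload(payload: dict) -> str:
    """Pull the dataset ID the deduplicator should read from."""
    return payload["resource"]["defaultDatasetId"]

print(dataset_id_from_payload(payload))  # hypothetical-dataset-id
```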
📤 Output
- Dataset containing only new items not seen in previous runs.
- Duplicates are skipped silently (never written to the output dataset).
- Logs include a summary of processed, new, and skipped entries.
🧪 Example Scenario
You're scraping job listings daily. Most results stay the same.
By integrating this actor, only newly discovered jobs are pushed forward to your database, notification system, or data pipeline.
📄 License
MIT