New Records Finder
Under maintenance
Compares an incoming dataset against a persistent key-value store of seen records and outputs only the records that are new, remembering them so they are never reported again. Optionally prunes records that disappear from the source.
Developer
Martin Forejt
New Records Finder is an Apify integration Actor that compares an incoming dataset against a persistent state store and outputs only the records that are new — the ones it has never seen before. It remembers every record it has seen in a key-value store, so they are never reported as new again on the next run.
Think of it as a deduplication memory you can drop between any Actor and your downstream workflow: chain it after a scraper and you'll only ever be notified about genuinely fresh items (new products, new job postings, new reviews, new listings), never the ones you already processed.
Why use New Records Finder?
Most scrapers re-scrape the same pages on every run and return the full result set every time. If you only care about what changed, you have to diff the results yourself. This Actor does that diffing for you:
- Get only what's new. Trigger emails, Slack messages, or webhooks for fresh records only — no noise from items you've already seen.
- Stateful across runs. A key-value store remembers every record ever seen, so "new" means new forever, not just new since the last run.
- Optional pruning. Mirror a source: when records disappear from the incoming dataset, optionally forget them so they're treated as new if they ever come back.
- Works with any Actor. It operates on datasets, so it composes with any scraper or data source on the Apify platform.
- No code required. Wire it up with the visual integration builder, schedules, and the rest of the Apify platform — API access, monitoring, and proxy infrastructure included.
How it works
The state is kept in a key-value store, not a dataset, because dataset items are append-only (they can't be deleted) while a key-value store record can be overwritten. The Actor stores only the identity keys of the records it has seen — compact and able to scale to millions of records.
- The Actor reads the set of previously seen keys from the state store.
- It scans the new (incoming) dataset and selects the records whose key is not in the set — these are the new ones. Duplicates within the incoming dataset are collapsed too.
- It emits the new records to this run's output dataset.
- It writes the updated key set back to the state store:
  - Pruning off (default): the set is the union of the old keys and the new ones, so it only ever grows.
  - Pruning on: the set becomes an exact mirror of the keys in the incoming dataset, so keys that disappeared from the source are forgotten.
A record's identity is defined by the Unique key fields option (see below).
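The steps above can be sketched in a few lines of Python. This is a minimal illustration of the described behavior, not the Actor's actual source: the function name `find_new_records` and the `key_of` callback are hypothetical, and the state is modeled as an in-memory set rather than a key-value store record.

```python
from typing import Any, Callable, Hashable, Iterable

Record = dict[str, Any]


def find_new_records(
    seen_keys: set[Hashable],
    incoming: Iterable[Record],
    key_of: Callable[[Record], Hashable],
    prune_missing: bool = False,
) -> tuple[list[Record], set[Hashable]]:
    """Return (new_records, updated_seen_keys) for a single run."""
    new_records: list[Record] = []
    incoming_keys: set[Hashable] = set()

    for record in incoming:
        key = key_of(record)
        if key in incoming_keys:
            continue  # duplicate within the incoming dataset: collapse it
        incoming_keys.add(key)
        if key not in seen_keys:
            new_records.append(record)  # never seen before: emit it

    if prune_missing:
        updated = incoming_keys  # exact mirror of the source
    else:
        updated = seen_keys | incoming_keys  # union: the state only grows
    return new_records, updated


# Hypothetical data: two products already seen, one new, one repeated.
seen = {"https://example.com/product/1", "https://example.com/product/2"}
incoming = [
    {"url": "https://example.com/product/1"},
    {"url": "https://example.com/product/2"},
    {"url": "https://example.com/product/42", "title": "Wireless Headphones"},
    {"url": "https://example.com/product/42", "title": "Wireless Headphones"},
]
new, updated = find_new_records(seen, incoming, key_of=lambda r: r["url"])
```

Here `new` holds only the product 42 record (once), and `updated` contains all three URLs.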
How to use New Records Finder
- Create an empty key-value store to act as your persistent state, or reuse an existing one. (On the first run it can be empty — everything in the incoming dataset will count as new.)
- Start the Actor and select that store as the State store.
- Select the dataset you want to check — usually the result dataset of another Actor run — as the New dataset (incoming).
- Set Unique key fields to the field(s) that uniquely identify a record (for example `url` or `id`). Leave it empty to compare whole records.
- Optionally turn on Forget records missing from the new dataset to keep the state mirrored to the source.
- Run the Actor. The output dataset will contain only the new records.
To run it automatically, attach it as an integration to another Actor's run, or put it on a schedule. When used as an integration, set the incoming dataset to the triggering run's default dataset.
Input
The Actor accepts the following input. You can set it in the visual Input tab in Apify Console or pass it as JSON via the API.
| Field | Type | Required | Description |
|---|---|---|---|
| `stateStoreId` | String (key-value store) | Yes | The persistent state store that remembers seen records (it holds the identity keys, not the records). Read at start, overwritten at the end. Requires READ + WRITE access. |
| `newDatasetId` | String (dataset) | Yes | The dataset to check, typically another Actor run's result dataset. Read-only. Requires READ access. |
| `uniqueKeyFields` | Array of strings | No | Field name(s) that uniquely identify a record. Use multiple fields for a composite key and dot notation for nested fields (e.g. `author.id`). Leave empty for full-record equality. |
| `pruneMissingRecords` | Boolean | No | When true, keys not present in the incoming dataset are removed from the state (it mirrors the source). When false (default), the state only grows. |
| `stateRecordKey` | String | No | The record key the seen-key set is stored under. Use different keys to track several independent sources in a single store. Defaults to `STATE`. |
Input example
    {
        "stateStoreId": "abc123SeenItems",
        "newDatasetId": "xyz789LatestScrape",
        "uniqueKeyFields": ["url"],
        "pruneMissingRecords": false,
        "stateRecordKey": "STATE"
    }
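To pass this input via the API from Python, something like the following may work. It is a sketch under assumptions: it relies on the `apify-client` package, reads the API token from an `APIFY_TOKEN` environment variable, and takes the Actor ID as a parameter because this README does not state it. The helper names `build_run_input` and `run_new_records_finder` are hypothetical.

```python
import os
from typing import Any, Optional


def build_run_input(
    state_store_id: str,
    new_dataset_id: str,
    unique_key_fields: Optional[list[str]] = None,
    prune_missing_records: bool = False,
    state_record_key: str = "STATE",
) -> dict[str, Any]:
    """Assemble the input object using the field names from the table above."""
    if not state_store_id or not new_dataset_id:
        raise ValueError("stateStoreId and newDatasetId are required")
    return {
        "stateStoreId": state_store_id,
        "newDatasetId": new_dataset_id,
        "uniqueKeyFields": unique_key_fields or [],
        "pruneMissingRecords": prune_missing_records,
        "stateRecordKey": state_record_key,
    }


def run_new_records_finder(actor_id: str, run_input: dict[str, Any]) -> list[dict]:
    """Start the Actor, wait for it to finish, and return the new records."""
    # Imported lazily so the input helper works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("abc123SeenItems", "xyz789LatestScrape", ["url"])
# new_records = run_new_records_finder("<actor-id>", run_input)  # placeholder ID
```

The store and dataset IDs are the placeholder values from the input example; replace them with your own.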
About "Unique key fields"
This option replaces the vaguer idea of an "equality field name". It answers the question "when are two records the same record?":
- One field (e.g. `["id"]`): records match when that field is equal. Other fields can differ.
- Multiple fields (e.g. `["country", "city"]`): a composite key; records match only when all listed fields are equal.
- Nested field (e.g. `["author.id"]`): dot notation reaches into nested objects.
- Empty (`[]`): full-record equality; two records match only when they are deeply identical (key order does not matter).
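One way to picture these four cases is a function that turns a record into a comparable key string. The sketch below is an assumption about how such a key could be derived, not the Actor's implementation: `record_key` and `get_path` are hypothetical names, and treating a missing field as `None` is a choice made here for illustration, since the README does not say how missing fields are handled.

```python
import json
from typing import Any


def get_path(record: Any, path: str) -> Any:
    """Follow dot notation (e.g. "author.id") into nested dicts."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # assumption: a missing field is treated as None
        current = current[part]
    return current


def record_key(record: dict[str, Any], key_fields: list[str]) -> str:
    """Build a string key; equal keys mean 'the same record'."""
    if not key_fields:
        # Full-record equality: sort_keys makes key order irrelevant.
        return json.dumps(record, sort_keys=True, separators=(",", ":"))
    values = [get_path(record, field) for field in key_fields]
    return json.dumps(values, sort_keys=True, separators=(",", ":"))


# One field: other fields can differ.
same_id = record_key({"id": 1, "price": 5}, ["id"]) == record_key({"id": 1, "price": 9}, ["id"])
# Empty list: deep equality regardless of key order.
same_record = record_key({"a": 1, "b": {"c": 2}}, []) == record_key({"b": {"c": 2}, "a": 1}, [])
```

Both `same_id` and `same_record` evaluate to `True` in this sketch.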
About "Forget records missing from the new dataset"
By default the state only grows, so a record is reported as new exactly once, forever. Turn this option on to keep the state as an exact mirror of the latest incoming dataset: any record that is no longer present in the source is forgotten, and if it reappears in a future run it will be reported as new again. This is useful for tracking a live listing where items come and go (and come back).
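The difference is easiest to see over several runs. The following simulation is a simplified model of the behavior described above, with records reduced to bare string keys; `run_once` and `simulate` are hypothetical helpers, not part of the Actor.

```python
def run_once(state: set[str], incoming: list[str], prune: bool) -> tuple[list[str], set[str]]:
    """One run: report unseen keys, then update the state."""
    # dict.fromkeys drops repeats while keeping the original order.
    new = [key for key in dict.fromkeys(incoming) if key not in state]
    next_state = set(incoming) if prune else state | set(incoming)
    return new, next_state


def simulate(runs: list[list[str]], prune: bool) -> list[list[str]]:
    """Feed successive incoming datasets through run_once, starting empty."""
    state: set[str] = set()
    reported = []
    for incoming in runs:
        new, state = run_once(state, incoming, prune)
        reported.append(new)
    return reported


# Listing "b" disappears in the second run and comes back in the third.
runs = [["a", "b"], ["a"], ["a", "b"]]
without_pruning = simulate(runs, prune=False)  # [["a", "b"], [], []]
with_pruning = simulate(runs, prune=True)      # [["a", "b"], [], ["b"]]
```

With pruning off, "b" is reported once and never again; with pruning on, it is forgotten in the second run and reported as new when it returns.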
Output
The Actor pushes only the new records to its default dataset. Each output record is identical to the corresponding record in the incoming dataset (the Actor passes records through unchanged). You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
Output example
If the incoming dataset contained three products but two were already in the state store, the output contains just the one new product:
    [
        {
            "url": "https://example.com/product/42",
            "title": "Wireless Headphones",
            "price": 79.99
        }
    ]
A run summary is also stored in the default key-value store under the key STATS:
    {
        "pruneMissingRecords": false,
        "previousStateSize": 1500,
        "newRecordsScanned": 200,
        "newRecordsFound": 1,
        "duplicatesSkipped": 199,
        "keysRemovedByPrune": 0,
        "finalStateSize": 1501,
        "keyFields": ["url"]
    }
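The numbers in this summary appear to be related: scanned records split into new ones and duplicates, and the final state size is the previous size plus new keys minus pruned keys. These relations are inferred from the example and the descriptions in this README, not from documented guarantees, so treat the check below as a sanity test under that assumption. The function name `check_stats` is hypothetical.

```python
from typing import Any


def check_stats(stats: dict[str, Any]) -> list[str]:
    """Return a list of inconsistencies found in a STATS record (empty if none)."""
    problems: list[str] = []

    # Assumed: every scanned record is either new or a duplicate.
    if stats["newRecordsScanned"] != stats["newRecordsFound"] + stats["duplicatesSkipped"]:
        problems.append("scanned != found + duplicates")

    # Assumed: final size = previous size + new keys - pruned keys.
    expected_final = (
        stats["previousStateSize"] + stats["newRecordsFound"] - stats["keysRemovedByPrune"]
    )
    if stats["finalStateSize"] != expected_final:
        problems.append("finalStateSize does not match previous + found - removed")

    # With pruning off, the state only grows, so nothing should be removed.
    if not stats["pruneMissingRecords"] and stats["keysRemovedByPrune"] != 0:
        problems.append("keys were removed although pruning is off")

    return problems


example_stats = {
    "pruneMissingRecords": False,
    "previousStateSize": 1500,
    "newRecordsScanned": 200,
    "newRecordsFound": 1,
    "duplicatesSkipped": 199,
    "keysRemovedByPrune": 0,
    "finalStateSize": 1501,
    "keyFields": ["url"],
}
problems = check_stats(example_stats)
```

For the example summary above, `problems` is an empty list.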
Pricing
This Actor uses the pay-per-event pricing model and charges a single flat fee per run via Apify's built-in apify-actor-start event. You pay a small, predictable amount each time the Actor runs to check a dataset for new records, regardless of how many records are scanned or found. No separate compute-unit charges apply.
Tips
- Keep one state store per data source. Reuse the same key-value store across runs so its memory of "seen" records keeps growing. Use `stateRecordKey` to track several sources in one store.
- Choose a stable key. Pick fields that don't change between runs (an ID or canonical URL) rather than volatile fields like timestamps or prices, otherwise unchanged items will look new.
- Pruning resets forgotten records. With pruning on, a record that leaves and later re-enters the source will be reported as new again — that's intended. Leave pruning off if you want each record reported as new only once, ever.
- First run. Start with an empty state store and every incoming record is treated as new; from then on only genuinely new records are returned.
FAQ, disclaimers, and support
Does it modify the incoming dataset? No. The incoming dataset is read-only. Only the state store is written to.
Why a key-value store and not a dataset for the state? Dataset items are append-only and can't be deleted, so pruning would be impossible. A key-value store record can be overwritten, which makes both growing and mirroring the state straightforward.
What counts as a duplicate? Any incoming record whose key already exists in the state, or a key that repeats within the incoming dataset itself.
This Actor processes whatever data you provide. Make sure you have the right to store and process that data, and that your use complies with the source's Terms of Service and applicable laws. Found a bug or have a feature request? Open an issue on the Actor's Issues tab.