# New Records Finder (`martin.forejt/new-records-finder`) Actor

Compares an incoming dataset against a persistent key-value store of seen records and outputs only the records that are new, remembering them so they are never reported again. Optionally prunes records that disappear from the source.

- **URL**: https://apify.com/martin.forejt/new-records-finder.md
- **Developed by:** [Martin Forejt](https://apify.com/martin.forejt) (community)
- **Categories:** Automation, Integrations, Developer tools
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.00001 / result

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## New Records Finder

**New Records Finder** is an Apify [integration](https://apify.com/integrations) Actor that compares an **incoming dataset** against a **persistent state store** and outputs **only the records that are new** — the ones it has never seen before. It remembers every record it has seen in a key-value store, so they are never reported as new again on the next run.

Think of it as a deduplication memory you can drop between any Actor and your downstream workflow: chain it after a scraper and you'll only ever be notified about genuinely fresh items (new products, new job postings, new reviews, new listings), never the ones you already processed.

### Why use New Records Finder?

Most scrapers re-scrape the same pages on every run and return the full result set every time. If you only care about what _changed_, you have to diff the results yourself. This Actor does that diffing for you:

- **Get only what's new.** Trigger emails, Slack messages, or webhooks for fresh records only — no noise from items you've already seen.
- **Stateful across runs.** A key-value store remembers every record ever seen, so "new" means new _forever_, not just new since the last run.
- **Optional pruning.** Mirror a source: when records disappear from the incoming dataset, optionally forget them so they're treated as new if they ever come back.
- **Works with any Actor.** It operates on datasets, so it composes with any scraper or data source on the Apify platform.
- **No code required.** Wire it up with the visual integration builder, schedules, and the rest of the Apify platform — API access, monitoring, and proxy infrastructure included.

### How it works

The state is kept in a **key-value store**, not a dataset, because dataset items are append-only (they can't be deleted) while a key-value store record can be overwritten. The Actor stores only the **identity keys** of the records it has seen — compact and able to scale to millions of records.

1. The Actor reads the set of previously seen keys from the **state store**.
2. It scans the **new (incoming) dataset** and selects the records whose key is **not** in the set — these are the new ones. Duplicates _within_ the incoming dataset are collapsed too.
3. It emits the new records to this run's **output dataset**.
4. It writes the updated key set back to the state store:
    - **Pruning off (default):** the set is the union of the old keys and the new ones — it only ever grows.
    - **Pruning on:** the set becomes an exact mirror of the keys in the incoming dataset — keys that disappeared from the source are forgotten.

A record's identity is defined by the **Unique key fields** option (see below).

### How to use New Records Finder

1. Create an empty key-value store to act as your persistent state, or reuse an existing one. (On the first run it can be empty — everything in the incoming dataset will count as new.)
2. Start the Actor and select that store as the **State store**.
3. Select the dataset you want to check — usually the result dataset of another Actor run — as the **New dataset (incoming)**.
4. Set **Unique key fields** to the field(s) that uniquely identify a record (for example `url` or `id`). Leave it empty to compare whole records.
5. Optionally turn on **Forget records missing from the new dataset** to keep the state mirrored to the source.
6. Run the Actor. The output dataset will contain only the new records.

To run it automatically, attach it as an **integration** to another Actor's run, or put it on a **schedule**. When used as an integration, set the incoming dataset to the triggering run's default dataset.

### Input

The Actor accepts the following input. You can set it in the visual **Input** tab in Apify Console or pass it as JSON via the API.

| Field                 | Type                     | Required | Description                                                                                                                                                                         |
| --------------------- | ------------------------ | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `stateStoreId`        | String (key-value store) | Yes      | The persistent state store that remembers seen records (it holds the identity keys, not the records). Read at start, overwritten at the end. Requires **READ + WRITE** access.      |
| `newDatasetId`        | String (dataset)         | Yes      | The dataset to check, typically another Actor run's result dataset. Read-only. Requires **READ** access.                                                                            |
| `uniqueKeyFields`     | Array of strings         | No       | Field name(s) that uniquely identify a record. Use multiple fields for a composite key and dot notation for nested fields (e.g. `author.id`). Leave empty for full-record equality. |
| `pruneMissingRecords` | Boolean                  | No       | When `true`, keys not present in the incoming dataset are removed from the state (it mirrors the source). When `false` (default), the state only grows.                             |
| `stateRecordKey`      | String                   | No       | The record key the seen-key set is stored under. Use different keys to track several independent sources in a single store. Defaults to `STATE`.                                    |

#### Input example

```json
{
    "stateStoreId": "abc123SeenItems",
    "newDatasetId": "xyz789LatestScrape",
    "uniqueKeyFields": ["url"],
    "pruneMissingRecords": false,
    "stateRecordKey": "STATE"
}
````

##### About "Unique key fields"

This option replaces the vaguer idea of an "equality field name". It answers the question *"when are two records the same record?"*:

- **One field** (e.g. `["id"]`) — records match when that field is equal. Other fields can differ.
- **Multiple fields** (e.g. `["country", "city"]`) — a composite key; records match only when *all* listed fields are equal.
- **Nested field** (e.g. `["author.id"]`) — dot notation reaches into nested objects.
- **Empty** (`[]`) — full-record equality: two records match only when they are deeply identical (key order does not matter).
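
The four cases above can be illustrated with a small sketch. This is not the Actor's actual implementation: `get_nested` and `identity_key` are hypothetical helpers, and serializing to JSON with sorted keys is only one assumed way to get an order-independent comparison.

```python
import json
from typing import Any


def get_nested(record: dict, path: str) -> Any:
    """Follow a dot-notation path such as 'author.id'; return None if a part is missing."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def identity_key(record: dict, unique_key_fields: list[str]) -> str:
    """Build a comparable string key for a record (hypothetical helper)."""
    if unique_key_fields:
        # One field or a composite key: only the listed fields matter.
        basis: Any = [get_nested(record, field) for field in unique_key_fields]
    else:
        # Empty list: the whole record is the key (full-record equality).
        basis = record
    # sort_keys makes the serialization independent of key order, including nested objects.
    return json.dumps(basis, sort_keys=True, separators=(",", ":"))


a = {"id": 1, "author": {"id": "u7"}, "price": 10}
b = {"price": 12, "author": {"id": "u7"}, "id": 1}

print(identity_key(a, ["id"]) == identity_key(b, ["id"]))                            # True
print(identity_key(a, ["id", "author.id"]) == identity_key(b, ["id", "author.id"]))  # True
print(identity_key(a, []) == identity_key(b, []))                                    # False, price differs
```

In this sketch a missing field is treated as `None`; how the Actor itself handles missing fields is not stated here.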

##### About "Forget records missing from the new dataset"

By default the state only grows, so a record is reported as new exactly once, forever. Turn this option on to keep the state as an exact mirror of the latest incoming dataset: any record that is no longer present in the source is forgotten, and if it reappears in a future run it will be reported as new again. This is useful for tracking a live listing where items come and go (and come back).
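
To make the difference concrete, here is a minimal in-memory sketch of the state update with pruning off and on. It assumes the state is a plain set of keys and uses a hypothetical `find_new_records` function; the Actor's real storage format and code may differ.

```python
from typing import Callable, Iterable


def find_new_records(
    seen_keys: set[str],
    incoming: Iterable[dict],
    key_of: Callable[[dict], str],
    prune: bool = False,
) -> tuple[list[dict], set[str]]:
    """Return (new_records, updated_state) for one run."""
    incoming_keys: set[str] = set()
    new_records: list[dict] = []
    for record in incoming:
        key = key_of(record)
        if key in incoming_keys:
            continue  # duplicate within the incoming dataset
        incoming_keys.add(key)
        if key not in seen_keys:
            new_records.append(record)
    # Pruning on: the state mirrors the incoming keys. Pruning off: the state only grows.
    updated = incoming_keys if prune else seen_keys | incoming_keys
    return new_records, updated


def by_url(record: dict) -> str:
    return record["url"]


# Item "a" is present, disappears, then comes back.
runs = [
    [{"url": "a"}, {"url": "b"}],
    [{"url": "b"}],
    [{"url": "a"}, {"url": "b"}],
]

for prune in (False, True):
    state: set[str] = set()
    reported = []
    for batch in runs:
        new, state = find_new_records(state, batch, by_url, prune)
        reported.append([r["url"] for r in new])
    print(prune, reported)
# False [['a', 'b'], [], []]
# True [['a', 'b'], [], ['a']]
```

With pruning off, `a` is reported once and never again; with pruning on, it is forgotten in the second run and reported as new when it returns in the third.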

### Output

The Actor pushes only the **new** records to its default dataset. Each output record is identical to the corresponding record in the incoming dataset (the Actor passes records through unchanged). You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

#### Output example

If the incoming dataset contained three products but two were already in the state store, the output contains just the one new product:

```json
[
    {
        "url": "https://example.com/product/42",
        "title": "Wireless Headphones",
        "price": 79.99
    }
]
```

A run summary is also stored in the default key-value store under the key `STATS`:

```json
{
    "pruneMissingRecords": false,
    "previousStateSize": 1500,
    "newRecordsScanned": 200,
    "newRecordsFound": 1,
    "duplicatesSkipped": 199,
    "keysRemovedByPrune": 0,
    "finalStateSize": 1501,
    "keyFields": ["url"]
}
```
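
The counters in this example are related by simple arithmetic, which can serve as a sanity check when monitoring runs. The relations below are inferred from the example values and from the set logic described earlier; they are an assumption, not a documented guarantee, and `check_stats` is a hypothetical helper.

```python
def check_stats(stats: dict) -> list[str]:
    """Return a list of inconsistencies found in a STATS record (empty if none)."""
    problems = []

    # Assumed: every scanned record is either new or skipped as a duplicate.
    expected_dupes = stats["newRecordsScanned"] - stats["newRecordsFound"]
    if stats["duplicatesSkipped"] != expected_dupes:
        problems.append(f"duplicatesSkipped should be {expected_dupes}")

    # Assumed: the state grows by the new keys and shrinks by the pruned ones.
    expected_final = (
        stats["previousStateSize"]
        + stats["newRecordsFound"]
        - stats["keysRemovedByPrune"]
    )
    if stats["finalStateSize"] != expected_final:
        problems.append(f"finalStateSize should be {expected_final}")

    return problems


stats = {
    "pruneMissingRecords": False,
    "previousStateSize": 1500,
    "newRecordsScanned": 200,
    "newRecordsFound": 1,
    "duplicatesSkipped": 199,
    "keysRemovedByPrune": 0,
    "finalStateSize": 1501,
    "keyFields": ["url"],
}
print(check_stats(stats))  # []
```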

### Pricing

This Actor uses the **pay-per-event** pricing model and charges a single flat fee per run via Apify's built-in **`apify-actor-start`** event. You pay a small, predictable amount each time the Actor runs to check a dataset for new records, regardless of how many records are scanned or found. No separate compute-unit charges apply.

### Tips

- **Keep one state store per data source.** Reuse the same key-value store across runs so its memory of "seen" records keeps growing. Use `stateRecordKey` to track several sources in one store.
- **Choose a stable key.** Pick fields that don't change between runs (an ID or canonical URL) rather than volatile fields like timestamps or prices, otherwise unchanged items will look new.
- **Pruning resets forgotten records.** With pruning on, a record that leaves and later re-enters the source will be reported as new again — that's intended. Leave pruning off if you want each record reported as new only once, ever.
- **First run.** Start with an empty state store and every incoming record is treated as new; from then on only genuinely new records are returned.
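
As a sketch of the first tip, the helper below builds one Actor input per source, all sharing a single state store but each using its own `stateRecordKey`. The `build_inputs` function, the `STATE-<label>` naming scheme, and the IDs are illustrative assumptions; only the input field names come from the input table above.

```python
def build_inputs(
    state_store_id: str,
    sources: dict[str, str],
    unique_key_fields: list[str],
) -> list[dict]:
    """Build one input per source; `sources` maps a label to an incoming dataset ID."""
    return [
        {
            "stateStoreId": state_store_id,
            "newDatasetId": dataset_id,
            "uniqueKeyFields": unique_key_fields,
            # A distinct record key keeps each source's seen-key set separate.
            "stateRecordKey": f"STATE-{label}",
        }
        for label, dataset_id in sources.items()
    ]


inputs = build_inputs(
    "<YOUR_STATE_STORE_ID>",
    {"jobs": "<JOBS_DATASET_ID>", "products": "<PRODUCTS_DATASET_ID>"},
    ["url"],
)
for run_input in inputs:
    print(run_input["stateRecordKey"], run_input["newDatasetId"])
```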

### FAQ, disclaimers, and support

**Does it modify the incoming dataset?** No. The incoming dataset is read-only. Only the state store is written to.

**Why a key-value store and not a dataset for the state?** Dataset items are append-only and can't be deleted, so pruning would be impossible. A key-value store record can be overwritten, which makes both growing and mirroring the state straightforward.

**What counts as a duplicate?** Any incoming record whose key already exists in the state, or a key that repeats within the incoming dataset itself.

This Actor processes whatever data you provide. Make sure you have the right to store and process that data, and that your use complies with the source's Terms of Service and applicable laws. Found a bug or have a feature request? Open an issue on the Actor's **Issues** tab.

# Actor input Schema

## `stateStoreId` (type: `string`):

The persistent key-value store that remembers every record ever seen (it holds the set of identity keys, not the records themselves). The Actor reads it to know what already exists and writes the updated set back. Requires READ + WRITE access. Can be empty on the first run.

## `newDatasetId` (type: `string`):

The dataset to check for new records, typically the result dataset of another Actor run. It is only read, never modified. Requires READ access.

## `uniqueKeyFields` (type: `array`):

Field name(s) that uniquely identify a record. Two records are considered the same when all of these fields are equal. Use multiple fields for a composite key and dot notation for nested fields (e.g. `author.id`). Leave empty to compare whole records (full-record equality).

## `pruneMissingRecords` (type: `boolean`):

When enabled, any key in the state store that is NOT present in the new dataset is removed from the state (the state becomes an exact mirror of the new dataset's keys). If such a record reappears in a later run, it will be reported as new again. When disabled (default), the state only ever grows.

## `stateRecordKey` (type: `string`):

The key under which the set of seen identity keys is stored inside the state store. Use different keys to track several independent sources in a single key-value store.

## Actor input object example

```json
{
  "stateStoreId": "<YOUR_STATE_STORE_ID>",
  "newDatasetId": "<YOUR_DATASET_ID>",
  "uniqueKeyFields": [
    "id"
  ],
  "pruneMissingRecords": false,
  "stateRecordKey": "STATE"
}
```

# Actor output Schema

## `newRecords` (type: `string`):

Dataset containing only the records that were new in this run.

## `stats` (type: `string`):

Key-value store record with counts (records scanned, new found, duplicates skipped).

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

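// Prepare Actor input (stateStoreId and newDatasetId are required)
const input = {
    "stateStoreId": "<YOUR_STATE_STORE_ID>",
    "newDatasetId": "<YOUR_DATASET_ID>",
    "uniqueKeyFields": [
        "id"
    ]
};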

// Run the Actor and wait for it to finish
const run = await client.actor("martin.forejt/new-records-finder").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

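# Prepare the Actor input (stateStoreId and newDatasetId are required)
run_input = {
    "stateStoreId": "<YOUR_STATE_STORE_ID>",
    "newDatasetId": "<YOUR_DATASET_ID>",
    "uniqueKeyFields": ["id"],
}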

# Run the Actor and wait for it to finish
run = client.actor("martin.forejt/new-records-finder").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
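
A common pattern is to chain this Actor after a scraper run and then read the `STATS` summary. The sketch below assumes the Python client's `actor(...).call(...)`, `dataset(...).iterate_items()`, and `key_value_store(...).get_record(...)` methods return plain dictionaries, as in the example above. `find_new_for_run`, `<SCRAPER_ACTOR_ID>`, and the `APIFY_TOKEN` and `STATE_STORE_ID` environment variables are hypothetical names chosen for illustration.

```python
import os
from typing import Optional


def find_new_for_run(
    client,
    scraper_run: dict,
    state_store_id: str,
    unique_key_fields: list[str],
) -> tuple[list[dict], Optional[dict]]:
    """Run New Records Finder on a finished run's dataset; return (new_items, stats)."""
    run_input = {
        "stateStoreId": state_store_id,
        # Check the default dataset of the run that just finished.
        "newDatasetId": scraper_run["defaultDatasetId"],
        "uniqueKeyFields": unique_key_fields,
    }
    run = client.actor("martin.forejt/new-records-finder").call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")

    new_items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

    # The run summary is stored under the key STATS in the run's default key-value store.
    record = client.key_value_store(run["defaultKeyValueStoreId"]).get_record("STATS")
    stats = record["value"] if record else None
    return new_items, stats


# Only talk to the Apify API when a token is configured.
if os.environ.get("APIFY_TOKEN"):
    from apify_client import ApifyClient

    apify = ApifyClient(os.environ["APIFY_TOKEN"])
    scraper_run = apify.actor("<SCRAPER_ACTOR_ID>").call(run_input={})
    items, stats = find_new_for_run(apify, scraper_run, os.environ["STATE_STORE_ID"], ["url"])
    print(f"{len(items)} new records", stats)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub client in tests.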

## CLI example

```bash
echo '{
  "stateStoreId": "<YOUR_STATE_STORE_ID>",
  "newDatasetId": "<YOUR_DATASET_ID>",
  "uniqueKeyFields": [
    "id"
  ]
}' |
apify call martin.forejt/new-records-finder --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=martin.forejt/new-records-finder",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "New Records Finder",
        "description": "Compares an incoming dataset against a persistent key-value store of seen records and outputs only the records that are new, remembering them so they are never reported again. Optionally prunes records that disappear from the source.",
        "version": "0.0",
        "x-build-id": "uMbE7G0cllVlKCD0s"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/martin.forejt~new-records-finder/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-martin.forejt-new-records-finder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/martin.forejt~new-records-finder/runs": {
            "post": {
                "operationId": "runs-sync-martin.forejt-new-records-finder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/martin.forejt~new-records-finder/run-sync": {
            "post": {
                "operationId": "run-sync-martin.forejt-new-records-finder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "stateStoreId",
                    "newDatasetId"
                ],
                "properties": {
                    "stateStoreId": {
                        "title": "State store (key-value store)",
                        "type": "string",
                        "description": "The persistent key-value store that remembers every record ever seen (it holds the set of identity keys, not the records themselves). The Actor reads it to know what already exists and writes the updated set back. Requires READ + WRITE access. Can be empty on the first run."
                    },
                    "newDatasetId": {
                        "title": "New dataset (incoming)",
                        "type": "string",
                        "description": "The dataset to check for new records, typically the result dataset of another Actor run. It is only read, never modified. Requires READ access."
                    },
                    "uniqueKeyFields": {
                        "title": "Unique key fields",
                        "type": "array",
                        "description": "Field name(s) that uniquely identify a record. Two records are considered the same when all of these fields are equal. Use multiple fields for a composite key and dot notation for nested fields (e.g. `author.id`). Leave empty to compare whole records (full-record equality).",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "pruneMissingRecords": {
                        "title": "Forget records missing from the new dataset",
                        "type": "boolean",
                        "description": "When enabled, any key in the state store that is NOT present in the new dataset is removed from the state (the state becomes an exact mirror of the new dataset's keys). If such a record reappears in a later run, it will be reported as new again. When disabled (default), the state only ever grows.",
                        "default": false
                    },
                    "stateRecordKey": {
                        "title": "State record key",
                        "type": "string",
                        "description": "The key under which the set of seen identity keys is stored inside the state store. Use different keys to track several independent sources in a single key-value store.",
                        "default": "STATE"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
