Deprecated

Pricing

from $0.00001 / result

New Records Finder

Compares an incoming dataset against a persistent key-value store of seen records and outputs only the records that are new, remembering them so they are never reported again. Optionally prunes records that disappear from the source.

Developer

Martin Forejt

Last modified: 2 months ago

Why use New Records Finder?

Most scrapers re-scrape the same pages on every run and return the full result set every time. If you only care about what is new, you have to diff the results yourself. This Actor does that diffing for you:

- Get only what's new. Trigger emails, Slack messages, or webhooks for fresh records only, with no noise from items you've already seen.
- Stateful across runs. A key-value store remembers the key of every record ever seen, so with pruning off "new" means new forever, not just new since the last run.
- Optional pruning. Mirror a source: when records disappear from the incoming dataset, optionally forget them so they're treated as new if they ever come back.
- Works with any Actor. It operates on datasets, so it composes with any scraper or data source on the Apify platform.
- No code required. Wire it up with the visual integration builder, schedules, and the rest of the Apify platform, with API access, monitoring, and proxy infrastructure included.

How it works

The state is kept in a key-value store, not a dataset, because dataset items are append-only (they can't be deleted) while a key-value store record can be overwritten. The Actor stores only the identity keys of the records it has seen, not the records themselves, which keeps the state compact and able to scale to millions of records.

1. The Actor reads the set of previously seen keys from the state store.
2. It scans the new (incoming) dataset and selects the records whose key is not in the set; these are the new ones. Duplicates within the incoming dataset are collapsed too.
3. It emits the new records to this run's output dataset.
4. It writes the updated key set back to the state store:
   - Pruning off (default): the set is the union of the old keys and the new ones, so it only ever grows.
   - Pruning on: the set becomes an exact mirror of the keys in the incoming dataset, so keys that disappeared from the source are forgotten.
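
The steps above can be sketched in a few lines of plain Python. This is an illustration of the described behavior, not the Actor's source code: the function name `find_new_records`, the in-memory `set` standing in for the state store, and the `key_of` callback are all assumptions made for the example.

```python
from typing import Callable, Hashable, Iterable


def find_new_records(
    seen_keys: set[Hashable],
    incoming: Iterable[dict],
    key_of: Callable[[dict], Hashable],
    prune: bool = False,
) -> tuple[list[dict], set[Hashable]]:
    """Return (new_records, updated_state) for a single run."""
    new_records: list[dict] = []
    incoming_keys: set[Hashable] = set()
    for record in incoming:
        key = key_of(record)
        if key in incoming_keys:
            continue  # duplicate within the incoming dataset: collapse it
        incoming_keys.add(key)
        if key not in seen_keys:
            new_records.append(record)  # never seen before: report it
    # Pruning off: union, so the state only grows.
    # Pruning on: the state mirrors the incoming keys exactly.
    updated = set(incoming_keys) if prune else seen_keys | incoming_keys
    return new_records, updated


# Hypothetical data: two keys already seen, three incoming records.
state = {"https://example.com/product/1", "https://example.com/product/2"}
incoming = [
    {"url": "https://example.com/product/2", "title": "Keyboard"},
    {"url": "https://example.com/product/42", "title": "Wireless Headphones"},
    {"url": "https://example.com/product/42", "title": "Wireless Headphones"},
]
new_records, state_after = find_new_records(state, incoming, lambda r: r["url"])
print([r["url"] for r in new_records])
print(len(state_after))
```

With pruning off, only product 42 is reported and the state grows from two keys to three. Passing `prune=True` would instead leave the state holding just the two keys present in `incoming`.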

A record's identity is defined by the Unique key fields option (see below).

How to use New Records Finder

1. Create an empty key-value store to act as your persistent state, or reuse an existing one. (On the first run it can be empty; everything in the incoming dataset will then count as new.)
2. Start the Actor and select that store as the State store.
3. Select the dataset you want to check, usually the result dataset of another Actor run, as the New dataset (incoming).
4. Set Unique key fields to the field(s) that uniquely identify a record (for example url or id). Leave it empty to compare whole records.
5. Optionally turn on Forget records missing from the new dataset to keep the state mirrored to the source.
6. Run the Actor. The output dataset will contain only the new records.

To run it automatically, attach it as an integration to another Actor's run, or put it on a schedule. When used as an integration, set the incoming dataset to the triggering run's default dataset.

Input

The Actor accepts the following input. You can set it in the visual Input tab in Apify Console or pass it as JSON via the API.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `stateStoreId` | String (key-value store) | Yes | The persistent state store that remembers seen records (it holds the identity keys, not the records). Read at start, overwritten at the end. Requires READ + WRITE access. |
| `newDatasetId` | String (dataset) | Yes | The dataset to check, typically another Actor run's result dataset. Read-only. Requires READ access. |
| `uniqueKeyFields` | Array of strings | No | Field name(s) that uniquely identify a record. Use multiple fields for a composite key and dot notation for nested fields (e.g. `author.id`). Leave empty for full-record equality. |
| `pruneMissingRecords` | Boolean | No | When `true`, keys not present in the incoming dataset are removed from the state (it mirrors the source). When `false` (default), the state only grows. |
| `stateRecordKey` | String | No | The record key the seen-key set is stored under. Use different keys to track several independent sources in a single store. Defaults to `STATE`. |

Input example

{
    "stateStoreId": "abc123SeenItems",
    "newDatasetId": "xyz789LatestScrape",
    "uniqueKeyFields": ["url"],
    "pruneMissingRecords": false,
    "stateRecordKey": "STATE"
}
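
If you prefer to assemble this input in code, a minimal sketch could look like the following. The helper names `build_input` and `start_run` are hypothetical. The defaults mirror the table above, and `start_run` assumes the third-party `apify-client` package plus an Actor ID that you would copy from the Actor's page, since this document does not state it.

```python
import json
from typing import Optional


def build_input(
    state_store_id: str,
    new_dataset_id: str,
    unique_key_fields: Optional[list[str]] = None,
    prune_missing_records: bool = False,
    state_record_key: str = "STATE",
) -> dict:
    """Assemble the Actor input, applying the documented defaults."""
    if not state_store_id or not new_dataset_id:
        raise ValueError("stateStoreId and newDatasetId are required")
    return {
        "stateStoreId": state_store_id,
        "newDatasetId": new_dataset_id,
        "uniqueKeyFields": list(unique_key_fields or []),
        "pruneMissingRecords": prune_missing_records,
        "stateRecordKey": state_record_key,
    }


def start_run(token: str, actor_id: str, run_input: dict) -> dict:
    """Start the Actor and wait for it to finish (needs `pip install apify-client`)."""
    from apify_client import ApifyClient  # imported lazily; third-party package

    # actor_id is a placeholder: use the ID shown on the Actor's own page.
    return ApifyClient(token).actor(actor_id).call(run_input=run_input)


run_input = build_input("abc123SeenItems", "xyz789LatestScrape", ["url"])
print(json.dumps(run_input, indent=4))
```

Only `build_input` runs here; `start_run` needs a real API token and Actor ID before it can be called.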

About "Unique key fields"

This option replaces the vaguer idea of an "equality field name". It answers the question "when are two records the same record?":

- One field (e.g. `["id"]`): records match when that field is equal. Other fields can differ.
- Multiple fields (e.g. `["country", "city"]`): a composite key; records match only when all listed fields are equal.
- Nested field (e.g. `["author.id"]`): dot notation reaches into nested objects.
- Empty (`[]`): full-record equality; two records match only when they are deeply identical (key order does not matter).
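
One way to picture these four modes is as a function that turns a record into a comparable string. The sketch below is an assumption about how such a key could be derived, not the Actor's actual implementation. In particular, `record_key` and `get_path` are hypothetical names, and treating a missing field as `None` is a choice made for the example.

```python
import json
from typing import Any


def get_path(record: dict, path: str) -> Any:
    """Follow dot notation ("author.id") into nested dicts; None if absent."""
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # assumption: a missing field behaves like null
        value = value[part]
    return value


def record_key(record: dict, unique_key_fields: list[str]) -> str:
    """Build an identity key from the chosen fields, or from the whole record."""
    if unique_key_fields:
        subject: Any = [get_path(record, field) for field in unique_key_fields]
    else:
        subject = record  # empty list: full-record equality
    # sort_keys makes the result independent of key order at every nesting level.
    return json.dumps(subject, sort_keys=True, separators=(",", ":"))


a = {"id": 1, "author": {"id": 7, "name": "Ann"}, "price": 10}
b = {"price": 12, "author": {"name": "Ann", "id": 7}, "id": 1}
c = {"price": 10, "id": 1, "author": {"name": "Ann", "id": 7}}

print(record_key(a, ["id"]) == record_key(b, ["id"]))                # True
print(record_key(a, ["author.id"]) == record_key(b, ["author.id"]))  # True
print(record_key(a, []) == record_key(b, []))                        # False
print(record_key(a, []) == record_key(c, []))                        # True
```

Records `a` and `b` differ only in `price`, so they match under `["id"]` or `["author.id"]` but not under full-record equality, while `c` is `a` with its keys reordered and therefore matches in every mode.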

About "Forget records missing from the new dataset"

By default the state only grows, so a record is reported as new exactly once, forever. Turn this option on to keep the state as an exact mirror of the latest incoming dataset: any record that is no longer present in the source is forgotten, and if it reappears in a future run it will be reported as new again. This is useful for tracking a live listing where items come and go (and come back).
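
The difference between the two modes shows up when a record leaves the source and later returns. The simulation below is a simplified illustration with hypothetical keys `"a"` and `"b"` across three runs; `simulate` is not part of the Actor.

```python
def simulate(runs: list[list[str]], prune: bool) -> list[list[str]]:
    """Return the keys reported as new in each run, starting from empty state."""
    seen: set[str] = set()
    reported: list[list[str]] = []
    for keys in runs:
        incoming = set(keys)
        reported.append(sorted(incoming - seen))
        # Pruning on: state mirrors this run. Pruning off: state only grows.
        seen = incoming if prune else seen | incoming
    return reported


# "b" is present in run 1, missing in run 2, and back in run 3.
runs = [["a", "b"], ["a"], ["a", "b"]]
print(simulate(runs, prune=False))  # [['a', 'b'], [], []]
print(simulate(runs, prune=True))   # [['a', 'b'], [], ['b']]
```

With pruning off, `"b"` is reported once and never again. With pruning on, it is forgotten after run 2 and reported as new when it returns in run 3.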

Output

The Actor pushes only the new records to its default dataset. Each output record is identical to the corresponding record in the incoming dataset (the Actor passes records through unchanged). You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

Output example

If the incoming dataset contained three products but two were already in the state store, the output contains just the one new product:

[
    {
        "url": "https://example.com/product/42",
        "title": "Wireless Headphones",
        "price": 79.99
    }
]

A run summary is also stored in the default key-value store under the key STATS, for example:

{
    "pruneMissingRecords": false,
    "previousStateSize": 1500,
    "newRecordsScanned": 200,
    "newRecordsFound": 1,
    "duplicatesSkipped": 199,
    "keysRemovedByPrune": 0,
    "finalStateSize": 1501,
    "keyFields": ["url"]
}
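
The numbers in this summary appear to be related by simple arithmetic. The check below encodes two relationships inferred from the example alone, so treat them as assumptions and not as documented guarantees: scanned records are either new or skipped as duplicates, and the final state size is the previous size plus new records minus pruned keys. The function name `stats_are_consistent` is hypothetical.

```python
def stats_are_consistent(stats: dict) -> bool:
    """Check the relationships suggested by the STATS example (assumed, not documented)."""
    scanned_ok = (
        stats["newRecordsFound"] + stats["duplicatesSkipped"]
        == stats["newRecordsScanned"]
    )
    size_ok = (
        stats["previousStateSize"]
        + stats["newRecordsFound"]
        - stats["keysRemovedByPrune"]
        == stats["finalStateSize"]
    )
    return scanned_ok and size_ok


# The STATS record from the example above.
stats = {
    "pruneMissingRecords": False,
    "previousStateSize": 1500,
    "newRecordsScanned": 200,
    "newRecordsFound": 1,
    "duplicatesSkipped": 199,
    "keysRemovedByPrune": 0,
    "finalStateSize": 1501,
    "keyFields": ["url"],
}
print(stats_are_consistent(stats))  # True
```

A check like this can act as a cheap sanity test in a monitoring step, for example to flag a run whose state shrank unexpectedly while pruning was off.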

Pricing

This Actor uses the pay-per-event pricing model and charges a single flat fee per run via Apify's built-in apify-actor-start event. You pay a small, predictable amount each time the Actor runs to check a dataset for new records, regardless of how many records are scanned or found. No separate compute-unit charges apply.

Tips

- Keep one state record per data source. Reuse the same key-value store and stateRecordKey across runs so the memory of "seen" records carries over. To track several sources in one store, give each source its own stateRecordKey.
- Choose a stable key. Pick fields that don't change between runs (an ID or canonical URL) rather than volatile fields like timestamps or prices, otherwise unchanged items will look new.
- Pruning resets forgotten records. With pruning on, a record that leaves and later re-enters the source will be reported as new again; that's intended. Leave pruning off if you want each record reported as new only once, ever.
- First run. Start with an empty state store and every incoming record is treated as new; from then on only genuinely new records are returned.

FAQ, disclaimers, and support

Does it modify the incoming dataset? No. The incoming dataset is read-only. Only the state store is written to.

Why a key-value store and not a dataset for the state? Dataset items are append-only and can't be deleted, so pruning would be impossible. A key-value store record can be overwritten, which makes both growing and mirroring the state straightforward.

What counts as a duplicate? Any incoming record whose key already exists in the state, or a key that repeats within the incoming dataset itself.

This Actor processes whatever data you provide. Make sure you have the right to store and process that data, and that your use complies with the source's Terms of Service and applicable laws. Found a bug or have a feature request? Open an issue on the Actor's Issues tab.
