Pricing

Pay per event

Apify Smart Dataset Comparator

Compare 2-10 Apify datasets to detect changes, new/removed records, and duplicates. Features field-level diffs, smart merging, schema validation, data cleaning, and anomaly detection. Perfect for price monitoring, lead deduplication, and data quality tracking.

Developer

Agenscrape

Last modified

3 months ago

Smart Dataset Comparator & Change Detector

Compare 2-10 Apify datasets to detect changes, new/removed records, duplicates, and merge data with custom rules. Perfect for price monitoring, lead deduplication, SEO tracking, and data quality validation.

Facing an issue, unexpected error, edge case, or have a feature suggestion? Post it here and we'll address it within 24 hours.

Quick Start

{
  "datasetIds": ["DATASET_ID_1", "DATASET_ID_2"],
  "primaryKey": "url"
}

That's it! Get instant comparison results showing what changed, what's new, and what was removed.

What You Get

Output	Description
Changes	Records that changed with field-level before/after diffs
New Records	Records only in newer datasets
Removed Records	Records only in older datasets
Merged Data	All unique records combined using your merge strategy
Duplicates	Exact and fuzzy duplicate detection
Schema Analysis	Field types, conflicts, and consistency checks
Anomalies	Large price changes (>50%), stock depletions
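As a rough illustration of the anomaly rules in the table above, the sketch below flags a price move of more than 50% and a stock level that drops to zero. It takes one change entry in the shape shown under Change Detection; the field names `price` and `stock`, the anomaly labels, and the function name are assumptions made for this example, not the Actor's actual implementation.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def find_anomalies(change, price_field="price", stock_field="stock", threshold_pct=50.0):
    """Return hypothetical anomaly entries for one change record."""
    anomalies = []
    fields = change["changes"]

    price = fields.get(price_field)
    if price and _is_number(price["old"]) and _is_number(price["new"]) and price["old"] != 0:
        pct = abs(price["new"] - price["old"]) / abs(price["old"]) * 100
        if pct > threshold_pct:
            anomalies.append({"key": change["key"], "type": "large_price_change"})

    stock = fields.get(stock_field)
    if stock and _is_number(stock["old"]) and _is_number(stock["new"]):
        if stock["old"] > 0 and stock["new"] == 0:
            anomalies.append({"key": change["key"], "type": "stock_depleted"})

    return anomalies
```

Measuring the percentage against the old value is a design choice of this sketch; a record whose old price is 0 is skipped rather than treated as an infinite change.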

Features

Change Detection

Compares records by primary key and shows exactly what changed:

{
  "key": "product-123",
  "changes": {
    "price": { "old": 99.99, "new": 79.99, "type": "decreased" },
    "stock": { "old": 100, "new": 0, "type": "decreased" }
  }
}
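The comparison above can be pictured as indexing both datasets by the primary key and diffing matching records field by field. The following is a minimal sketch under that assumption; the function names and the `"modified"` label for non-numeric changes are hypothetical, and the Actor's own logic may differ.

```python
def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def diff_records(old, new, ignore_fields=()):
    """Return {field: {"old", "new", "type"}} for every field that differs."""
    changes = {}
    for field in sorted((old.keys() | new.keys()) - set(ignore_fields)):
        before, after = old.get(field), new.get(field)
        if before == after:
            continue
        if _is_number(before) and _is_number(after):
            change_type = "increased" if after > before else "decreased"
        else:
            change_type = "modified"  # hypothetical label for non-numeric changes
        changes[field] = {"old": before, "new": after, "type": change_type}
    return changes


def compare_datasets(old_items, new_items, primary_key):
    """Split two snapshots into changed, new, and removed records."""
    old_by_key = {item[primary_key]: item for item in old_items}
    new_by_key = {item[primary_key]: item for item in new_items}
    changed = []
    for key in sorted(old_by_key.keys() & new_by_key.keys()):
        changes = diff_records(old_by_key[key], new_by_key[key])
        if changes:
            changed.append({"key": key, "changes": changes})
    return {
        "changed": changed,
        "new": [new_by_key[k] for k in new_by_key if k not in old_by_key],
        "removed": [old_by_key[k] for k in old_by_key if k not in new_by_key],
    }
```

Calling `compare_datasets(yesterday, today, "id")` on two lists of dicts returns the three groups in one pass; records are assumed to be flat dicts that all contain the primary key.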

Smart Presets

Pre-configured settings for common use cases:

Preset	Best For	What It Does
`price_monitoring`	E-commerce, competitors	1% price tolerance, ignores timestamps
`lead_lists`	CRM, marketing	Normalizes emails/phones, fuzzy dedup
`seo`	Content monitoring	Strict comparison, URL normalization
`real_estate`	Property listings	0.5% price tolerance, phone normalization

Data Cleaning

Normalize data before comparison:

Emails: Test+Spam@Gmail.com → test@gmail.com
Phones: (555) 123-4567 → 5551234567
URLs: Remove tracking params (utm_*, fbclid)
Currency: $1,234.56 → 1234.56
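The four normalizations above could be approximated with the Python standard library as follows. This is a sketch that reproduces the listed examples, not the Actor's code: the function names are hypothetical, and the currency helper assumes US-style formatting (comma as thousands separator, dot as decimal point).

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_email(email):
    """Lowercase the address and drop a "+tag" suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    return f"{local.split('+', 1)[0]}@{domain}"


def normalize_phone(phone):
    """Keep digits only."""
    return re.sub(r"\D", "", phone)


def normalize_url(url):
    """Remove utm_* and fbclid query parameters, keeping all others in order."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_") and name.lower() != "fbclid"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def normalize_currency(amount):
    """Strip currency symbols and thousands separators (US-style input assumed)."""
    return float(re.sub(r"[^\d.\-]", "", amount))
```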

Merge Strategies

When the same record exists in multiple datasets:

Strategy	Description
`left_priority`	First dataset wins (default)
`right_priority`	Last/newest dataset wins
`most_recent`	Record with newest timestamp wins
`most_complete`	Record with most filled fields wins
`combine_arrays`	Merge array fields from all records
`average_numbers`	Average numeric fields
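To make the strategies in the table concrete, here is a minimal sketch that merges all versions of one record, given in dataset order. The `updatedAt` timestamp field, the definition of a "filled" field, and the fallback to the first dataset's value for non-array or non-numeric fields are assumptions of this example rather than documented behavior.

```python
from statistics import mean


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _filled_count(record):
    # Assumption: None, empty strings, and empty lists count as unfilled.
    return sum(1 for value in record.values() if value not in (None, "", []))


def merge_versions(versions, strategy="left_priority", timestamp_field="updatedAt"):
    """Merge versions of one record; `versions` follows dataset order."""
    if strategy == "left_priority":
        return dict(versions[0])
    if strategy == "right_priority":
        return dict(versions[-1])
    if strategy == "most_recent":
        # Assumes ISO 8601 strings, which sort chronologically as text.
        return dict(max(versions, key=lambda r: r.get(timestamp_field) or ""))
    if strategy == "most_complete":
        return dict(max(versions, key=_filled_count))
    if strategy not in ("combine_arrays", "average_numbers"):
        raise ValueError(f"unknown merge strategy: {strategy}")

    merged = {}
    for record in versions:  # earlier datasets win for all other fields
        for field, value in record.items():
            merged.setdefault(field, value)
    for field in merged:
        values = [record[field] for record in versions if field in record]
        if strategy == "combine_arrays" and all(isinstance(v, list) for v in values):
            combined = []
            for array in values:
                for item in array:
                    if item not in combined:  # de-duplicate, keep first-seen order
                        combined.append(item)
            merged[field] = combined
        elif strategy == "average_numbers" and all(_is_number(v) for v in values):
            merged[field] = mean(values)
    return merged
```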

Duplicate Detection

Find duplicates within each dataset:

Exact: Same primary key
Fuzzy: Similar records using Levenshtein distance (configurable threshold)
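Fuzzy matching with Levenshtein distance can be sketched as below. The similarity formula (1 minus the distance divided by the longer string's length) is one common way to map edit distance onto the 0-1 range used by `fuzzyThreshold`; whether the Actor uses exactly this formula, or compares whole records rather than single strings, is not stated here, so treat it as an assumption.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map edit distance onto 0-1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def find_fuzzy_duplicates(values, threshold=0.85):
    """Return index pairs whose similarity reaches the threshold."""
    return [
        (i, j)
        for i in range(len(values))
        for j in range(i + 1, len(values))
        if similarity(values[i], values[j]) >= threshold
    ]
```

The pairwise loop is quadratic in the number of values, which is fine for a sketch but would need blocking or indexing on large lead lists.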

Schema Validation

Detect inconsistencies across datasets:

Missing fields
Type conflicts (string vs number)
New fields added
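The three checks above can be illustrated by inferring a simple field-to-types map per dataset and comparing the maps. In this sketch, "missing" and "new" are measured against the first dataset, and values are bucketed into JSON type names so that integers and floats both count as `number`; both choices, and the function names, are assumptions made for the example.

```python
def json_type(value):
    """Bucket a Python value into a JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_schema(items):
    """Map each field name to the set of JSON types seen for it."""
    schema = {}
    for item in items:
        for field, value in item.items():
            schema.setdefault(field, set()).add(json_type(value))
    return schema


def compare_schemas(baseline_items, other_items):
    """Report missing fields, new fields, and type conflicts versus the baseline."""
    base, other = infer_schema(baseline_items), infer_schema(other_items)
    return {
        "missing_fields": sorted(base.keys() - other.keys()),
        "new_fields": sorted(other.keys() - base.keys()),
        "type_conflicts": {
            field: sorted(base[field] | other[field])
            for field in sorted(base.keys() & other.keys())
            if base[field] != other[field]
        },
    }
```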

Input Parameters

Required

Parameter	Type	Description
`datasetIds`	array	2-10 Apify dataset IDs to compare
`primaryKey`	string	Field to uniquely identify records (supports `product.id` dot notation)
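Dot notation such as `product.id` is typically resolved by walking nested objects one segment at a time. The helper below is a hypothetical sketch of that lookup, useful for checking that a chosen `primaryKey` actually resolves on your records before starting a run; it does not handle array indexes, and the Actor's resolver may behave differently.

```python
def get_by_path(record, path, default=None):
    """Resolve a dotted path like "product.id" against nested dicts."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def records_missing_key(items, primary_key):
    """Return the positions of records where the primary key does not resolve."""
    return [i for i, item in enumerate(items) if get_by_path(item, primary_key) is None]
```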

Optional

Parameter	Type	Default	Description
`preset`	string	-	`price_monitoring`, `lead_lists`, `seo`, `real_estate`
`ignoreFields`	array	`[]`	Fields to skip during comparison
`sensitivity`	string	`strict`	`strict`, `medium`, `relaxed`
`numericTolerance`	number	`0`	Ignore numeric changes smaller than this percentage
`detectDuplicates`	boolean	`false`	Find duplicates within datasets
`fuzzyMatching`	boolean	`false`	Enable fuzzy duplicate detection
`fuzzyThreshold`	number	`0.85`	Similarity threshold (0-1)
`validateSchema`	boolean	`false`	Compare schemas across datasets
`mergeStrategy`	string	`left_priority`	How to merge conflicting records
`webhookUrl`	string	-	URL for completion notification
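One plausible reading of `numericTolerance` is a percentage change measured against the old value, with anything strictly below the tolerance ignored. The sketch below encodes that reading; the choice of baseline and the handling of an old value of 0 are assumptions, so verify against real output before relying on it.

```python
def within_tolerance(old, new, tolerance_pct):
    """Return True when a numeric change is small enough to ignore."""
    if old == new:
        return True
    if old == 0:
        return False  # no meaningful percentage; always report the change
    return abs(new - old) / abs(old) * 100 < tolerance_pct
```

With the default tolerance of 0, every difference is reported; with the 1% tolerance of the `price_monitoring` preset, a move from 100 to 100.5 would be ignored under this reading while a move from 100 to 102 would not.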

Cleaning Rules

{
  "cleaningRules": {
    "trimStrings": true,
    "normalizeEmails": true,
    "normalizePhones": true,
    "normalizeUrls": true,
    "normalizeCurrency": true,
    "removeEmojis": true
  }
}

Full Example

{
  "datasetIds": ["abc123", "def456"],
  "primaryKey": "url",
  "preset": "price_monitoring",
  "ignoreFields": ["lastChecked", "scraperVersion"],
  "detectDuplicates": true,
  "fuzzyMatching": true,
  "fuzzyThreshold": 0.85,
  "validateSchema": true,
  "mergeStrategy": "most_recent",
  "cleaningRules": {
    "trimStrings": true,
    "normalizeCurrency": true
  },
  "webhookUrl": "https://your-webhook.com/notify"
}

Output

Results are saved to multiple datasets (each named with a run ID suffix for isolation):

default - Summary + all records with _type marker for filtering
changes-{runId} - Changed records with diffs
new-records-{runId} - New records
removed-records-{runId} - Removed records
merged-final-{runId} - All unique records merged
duplicates-{runId} - Detected duplicates
schema-{runId} - Schema analysis
stats-{runId} - Full statistics

Output Tabs

View results in organized tabs in Apify Console:

Summary - Stats overview
Changes - Modified records with diffs
New Records - Added records
Removed Records - Deleted records
Duplicates - Found duplicates
Merged Records - Final merged data

Use Cases

Price Monitoring

Track competitor prices and stock levels:

{
  "datasetIds": ["yesterday_scrape", "today_scrape"],
  "primaryKey": "productUrl",
  "preset": "price_monitoring"
}

Lead Deduplication

Clean contact lists and find new leads:

{
  "datasetIds": ["crm_export", "new_leads"],
  "primaryKey": "email",
  "preset": "lead_lists"
}

SEO Monitoring

Track page changes:

{
  "datasetIds": ["last_week_crawl", "this_week_crawl"],
  "primaryKey": "url",
  "preset": "seo"
}

Database Sync

Identify records to INSERT, UPDATE, or DELETE:

{
  "datasetIds": ["database_export", "fresh_scrape"],
  "primaryKey": "id",
  "mergeStrategy": "right_priority"
}
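Once a run finishes, the three result groups map naturally onto database operations: new records become INSERTs, changes become UPDATEs, and removed records become DELETEs. The sketch below builds such a plan from already-downloaded results. It assumes change entries have the shape shown under Change Detection and that new and removed entries are full records containing the primary key; the function name and tuple layout are hypothetical.

```python
def plan_sync(changed, new_records, removed_records, primary_key="id"):
    """Turn comparison results into (operation, key, payload) tuples."""
    operations = []
    for record in new_records:
        operations.append(("INSERT", record[primary_key], record))
    for change in changed:
        # Only the fields that actually changed need to be written.
        new_values = {field: diff["new"] for field, diff in change["changes"].items()}
        operations.append(("UPDATE", change["key"], new_values))
    for record in removed_records:
        operations.append(("DELETE", record[primary_key], None))
    return operations
```

Keeping the plan as data rather than executing SQL directly makes it easy to review or dry-run before touching the database.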

Pricing

Pay-per-event pricing - you only pay for value delivered:

Event	Price	Description
Dataset Loaded	$0.01	Per dataset loaded
Records Compared	$0.005	Per 1,000 records
Change Detected	$0.002	Per changed, new, or removed record
Duplicate Found	$0.005	Per duplicate
Records Merged	$0.002	Per 1,000 records
Records Cleaned	$0.002	Per 1,000 records
Schema Validation	$0.02	Once per run
Anomaly Detected	$0.01	Per anomaly
Webhook Sent	$0.005	Per notification
Preset Used	$0.01	Once per run
Fuzzy Matching	$0.02	Once per run

Example Costs

Scenario	Records	Changes	Cost
Small comparison	1,000	50	~$0.13
Medium comparison	10,000	500	~$1.17
Large comparison	100,000	5,000	~$10.60
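For budgeting, the event prices above can be combined into a rough estimator. The sketch below uses the listed prices; the event key names are hypothetical labels, and per-1,000-record events are assumed to be billed pro rata. Counting only two loaded datasets, the compared records, and the detected changes gives about $0.125, $1.07, and $10.52 for the three scenarios, so the listed figures for the medium and large comparisons presumably include additional events.

```python
PRICE_PER_EVENT = {
    "dataset_loaded": 0.01,
    "change_detected": 0.002,
    "duplicate_found": 0.005,
    "anomaly_detected": 0.01,
    "webhook_sent": 0.005,
    "schema_validation": 0.02,  # once per run
    "preset_used": 0.01,        # once per run
    "fuzzy_matching": 0.02,     # once per run
}

PRICE_PER_1000_RECORDS = {
    "records_compared": 0.005,
    "records_merged": 0.002,
    "records_cleaned": 0.002,
}


def estimate_cost(event_counts, record_counts=None):
    """Estimate a run's cost in USD from event and record counts."""
    total = sum(PRICE_PER_EVENT[name] * count for name, count in event_counts.items())
    for name, records in (record_counts or {}).items():
        total += PRICE_PER_1000_RECORDS[name] * records / 1000  # pro rata assumption
    return round(total, 4)


small = estimate_cost(
    {"dataset_loaded": 2, "change_detected": 50},
    {"records_compared": 1_000},
)
```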

Webhook Payload

{
  "status": "completed",
  "summary": {
    "datasetsCompared": 2,
    "stats": {
      "changedCount": 150,
      "newCount": 25,
      "removedCount": 10
    },
    "anomaliesCount": 5,
    "duplicatesCount": 12,
    "mergedCount": 1000
  },
  "actorRunId": "abc123..."
}
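A receiving service might turn this payload into a short notification message. The sketch below parses the JSON body shown above using only the standard library; the function name is hypothetical, and the branch for a status other than `completed` is an assumption, since only the completed payload is documented here.

```python
import json


def summarize_webhook(body):
    """Build a one-line summary from the webhook's JSON body."""
    payload = json.loads(body)
    status = payload.get("status")
    if status != "completed":
        # Assumption: other statuses may omit the summary block.
        return f"Run {payload.get('actorRunId')} finished with status {status}"
    summary = payload["summary"]
    stats = summary["stats"]
    return (
        f"{stats['changedCount']} changed, {stats['newCount']} new, "
        f"{stats['removedCount']} removed, {summary['anomaliesCount']} anomalies"
    )
```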

Tips

Use presets - They're optimized for common use cases
Set ignoreFields - Skip timestamps and scraper metadata
Enable fuzzyMatching - Catch near-duplicates in lead lists
Use webhooks - Get notified when comparison completes
Check anomalies - Large price swings might indicate data issues

Support

Questions or issues? Open an issue on the actor's GitHub repository.
