Duplications Checker

  • lukaskrivka/duplications-checker
  • Modified
  • Users 119
  • Runs 43.6k
  • Created by Lukáš Křivka

Check your dataset for duplications. Accept only the highest quality data!

Dataset ID

datasetId

Optional

string

ID of the dataset where the data is located. If you need to use another input type, such as a key-value store or raw JSON, see Other data sources.

Check only clean dataset items

checkOnlyCleanItems

Optional

boolean

If the datasetId option is provided, only clean dataset items will be loaded and used for the duplication check.

Fields

fields

Required

array

List of fields in each item that will be checked for duplicates. Each field must be top-level (not nested) and should contain only a simple value (string or number). You can prepare your data with preCheckFunction.
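The requirement that checked fields be top-level can be illustrated with a small Python sketch. It only shows the kind of transformation one might apply before the check. The actual preCheckFunction is passed as a string and its expected signature is not described here, so `flatten_field`, the dot-separated path, and the sample items are all hypothetical.

```python
def flatten_field(item, path, target):
    """Copy a nested value (dot-separated `path`) to a top-level `target` key."""
    value = item
    for key in path.split("."):
        # Stop descending as soon as the path leaves dictionary territory.
        value = value.get(key) if isinstance(value, dict) else None
    # Return a new dict so the original item is left untouched.
    return {**item, target: value}


# Hypothetical items with the identifier nested under "product".
items = [{"product": {"sku": "A1"}}, {"product": {"sku": "A1"}}]
prepared = [flatten_field(item, "product.sku", "sku") for item in items]
```

After this step each item carries a simple top-level `sku` value, which is the shape the fields option expects.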

Pre-check function

preCheckFunction

Optional

string

You can specify which fields should be displayed in the debug OUTPUT to identify bad items. By default, all fields are shown, which may make the output unnecessarily large.

Minimum duplications

minDuplications

Optional

integer

Minimum number of occurrences for a value to be included in the report. Defaults to 2.
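As a rough illustration of how a field check with a minimum-occurrence threshold could work, the Python sketch below groups item indexes by field value and keeps only the values seen often enough. This is an assumption about the general technique, not the actor's actual implementation. `find_duplicates` and the sample items are hypothetical.

```python
from collections import defaultdict


def find_duplicates(items, field, min_duplications=2):
    """Map each repeated value of `field` to the indexes where it occurs."""
    indexes_by_value = defaultdict(list)
    for index, item in enumerate(items):
        value = item.get(field)
        if value is None or value == "":
            continue  # missing values are not counted as duplicates here
        indexes_by_value[value].append(index)
    # Keep only the values that reach the threshold.
    return {
        value: indexes
        for value, indexes in indexes_by_value.items()
        if len(indexes) >= min_duplications
    }


items = [{"url": "a"}, {"url": "b"}, {"url": "a"}, {"url": "a"}]
duplicates = find_duplicates(items, "url")
```

With the default threshold of 2, only the value `"a"` is reported. Raising the threshold above 3 would leave the report empty for this sample.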

Show indexes

showIndexes

Optional

boolean

Indexes of the duplicate items will be shown in the OUTPUT report. Set to false if you don't need them.

Show items

showItems

Optional

boolean

Duplicate items will be pushed to a dataset. Set to false if you don't need them.

Show missing fields

showMissing

Optional

boolean

Items where the value of a checked field is missing, null, or '' will be included in the report.
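The three "missing" cases named above (absent key, null, empty string) can be sketched in Python as follows. `find_missing` is a hypothetical helper, and treating `0` as a present value is an assumption of this sketch.

```python
def find_missing(items, field):
    """Return indexes of items whose `field` is absent, None, or an empty string."""
    return [
        index
        for index, item in enumerate(items)
        if item.get(field) in (None, "")
    ]


items = [{"sku": "x"}, {"sku": None}, {}, {"sku": ""}, {"sku": 0}]
missing_indexes = find_missing(items, "sku")
```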

Limit

limit

Optional

integer

How many items will be checked. By default, all items are checked.

Offset

offset

Optional

integer

Index of the item from which checking will start. Use together with limit to check a specific range of items.
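Combining offset and limit selects a contiguous window of items. A minimal Python sketch, assuming zero-based offsets (the numbering convention is not stated here) and using a hypothetical `select_window` helper:

```python
def select_window(items, offset=0, limit=None):
    """Return `limit` items starting at `offset`; no limit means all remaining items."""
    end = None if limit is None else offset + limit
    return items[offset:end]


items = list(range(10))
window = select_window(items, offset=3, limit=4)
```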

Batch Size

batchSize

Optional

integer

You can change the number of items loaded and processed in each batch. This is only needed if your items are very large.
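Batching keeps memory use bounded when individual items are large. The Python sketch below shows the general idea on an in-memory list. How the actor actually pages through a dataset is not described here, and `iter_batches` is a hypothetical helper.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


batches = list(iter_batches(list(range(7)), batch_size=3))
```

A smaller batch size means more round trips but less data held in memory at once.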

Key value store Record

keyValueStoreRecord

Optional

string

Store ID and record key, if you want to load the data from a key-value store. The format is {keyValueStoreId}+{recordKey}, e.g. s5NJ77qFv8b4osiGR+MY-KEY
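The {keyValueStoreId}+{recordKey} format can be split with a few lines of Python. This sketch assumes the first `+` is the separator and that both parts must be non-empty. `parse_key_value_store_record` is a hypothetical helper, not part of the actor.

```python
def parse_key_value_store_record(record):
    """Split '{keyValueStoreId}+{recordKey}' into its two parts."""
    store_id, separator, record_key = record.partition("+")
    if not separator or not store_id or not record_key:
        raise ValueError(
            "expected the format {keyValueStoreId}+{recordKey}, got: " + repr(record)
        )
    return store_id, record_key


store_id, record_key = parse_key_value_store_record("s5NJ77qFv8b4osiGR+MY-KEY")
```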

Raw Data

rawData

Optional

array

Raw JSON array you want to check.
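To show how the options fit together, the following Python sketch builds an example input that uses rawData instead of a dataset and serializes it to JSON. The option names come from the list above. The item contents and the chosen values are made up for illustration.

```python
import json

# Hypothetical input: check the "sku" field of three inline items.
run_input = {
    "rawData": [
        {"sku": "A1", "name": "Lamp"},
        {"sku": "A1", "name": "Lamp (copy)"},
        {"sku": "B2", "name": "Desk"},
    ],
    "fields": ["sku"],
    "minDuplications": 2,
    "showIndexes": True,
    "showItems": False,
    "showMissing": True,
}

input_json = json.dumps(run_input, indent=2)
print(input_json)
```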