Dataset Toolbox

cyberfly/dataset-toolbox

Perform common actions on datasets: merge, unify, validate, transform, order fields, and more.

Input

Field        Type   Optional  Description
Actor IDs    Array  depends   Load latest default datasets
Dataset IDs  Array  depends   Load the specified datasets

Access 3rd party datasets

To access other users' datasets, define the secret environment variable CUSTOM_SOURCE_APIFY_TOKEN.
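As a rough illustration of how such a token might be selected, the sketch below prefers CUSTOM_SOURCE_APIFY_TOKEN when it is set. The helper name `pickSourceToken` and the fallback to the platform's `APIFY_TOKEN` variable are assumptions made for this example, not documented behavior of the actor.

```javascript
// Hypothetical helper: choose the API token used to read source datasets.
// Assumption: when the custom token is missing, the run's own APIFY_TOKEN is used.
function pickSourceToken(env = process.env) {
  return env.CUSTOM_SOURCE_APIFY_TOKEN || env.APIFY_TOKEN;
}
```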

Features

Dataset unification

Use the features described below to produce a single uniform dataset from multiple datasets that share a common output schema and structure.

Latest dataset detection

The actor automatically detects and uses the default datasets of the latest actor runs when:

  • Actor ID(s) are specified AND
  • Dataset ID(s) are not specified
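The rule above can be sketched as a small decision function. The input names `actorIds` and `datasetIds`, the helper name, and the behavior when both or neither are supplied are assumptions for illustration only.

```javascript
// Hypothetical helper: decide where the source datasets come from.
function resolveDatasetSource({ actorIds = [], datasetIds = [] } = {}) {
  if (datasetIds.length > 0) {
    // Assumption: explicit dataset IDs are used as given, with no detection.
    return { mode: 'explicit', ids: datasetIds };
  }
  if (actorIds.length > 0) {
    // Actor IDs without dataset IDs: use each actor's latest default dataset.
    return { mode: 'latest-run', ids: actorIds };
  }
  // Assumption: input with neither field is rejected.
  throw new Error('Provide at least one actor ID or dataset ID');
}
```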

Output fields management

Produces a download link for the resulting dataset, with top-level fields sorted and filtered according to the list of fields provided on input. The link is stored in the default KV store under the key:

DATASET_DOWNLOAD-CUSTOM_FIELD_ORDER-{selected file type}

Filter fields

Pick only selected fields from the source dataset(s).

Order fields

Sort top-level fields in a custom order instead of the default alphabetical order.
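A minimal sketch of filtering and ordering top-level fields on a single item is shown here. The function name `pickAndOrderFields` and the sample field names are hypothetical; the actor itself exposes this through a download link rather than a function.

```javascript
// Hypothetical helper: keep only the listed top-level fields, in the listed order.
// With no field list, fall back to alphabetical order (the stated default).
function pickAndOrderFields(item, fields) {
  const has = (key) => Object.prototype.hasOwnProperty.call(item, key);
  const keys = fields && fields.length > 0
    ? fields.filter(has)
    : Object.keys(item).sort();
  return Object.fromEntries(keys.map((key) => [key, item[key]]));
}

// Example: drop `price` and put `url` first.
const ordered = pickAndOrderFields({ price: 10, title: 'A', url: 'u' }, ['url', 'title']);
console.log(JSON.stringify(ordered)); // {"url":"u","title":"A"}
```

Note that JavaScript objects always list integer-like keys first, so this approach only preserves the requested order for ordinary string field names.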

Dataset post-processing

Apply a custom JavaScript function to every item from the source dataset(s) before saving it to the result dataset.
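For illustration, a post-processing function could look like the sketch below. The exact signature the actor expects is not stated here, so the one-item-in, one-item-out shape is an assumption, as are the `title` and `price` fields.

```javascript
// Hypothetical post-processing function: receives one item, returns the transformed item.
const transformItem = (item) => ({
  ...item,
  // Trim whitespace around the title when it is a string.
  title: typeof item.title === 'string' ? item.title.trim() : item.title,
  // Coerce the price to a number.
  price: Number(item.price),
});

// How such a function would be applied to every source item.
const applyToAll = (items, fn) => items.map(fn);
```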

Output schema validation

Validate schema

Validate every item against the JSON schema specified on input and filter out invalid items before saving to the result dataset.
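The filtering step can be sketched as a partition into valid and invalid items. To stay self-contained, this example uses a deliberately simplified check (only `required` and top-level `type` for string, number, and boolean) in place of a full JSON Schema validator; all names are hypothetical.

```javascript
// Simplified stand-in for a JSON Schema validator, not a full implementation.
function isValid(item, schema) {
  const required = schema.required || [];
  if (!required.every((key) => item[key] !== undefined)) return false;
  return Object.entries(schema.properties || {}).every(([key, rule]) => {
    // Absent optional fields and rules without a type pass.
    if (item[key] === undefined || !rule.type) return true;
    return typeof item[key] === rule.type;
  });
}

// Split items so that only valid ones reach the result dataset.
function partitionBySchema(items, schema) {
  const valid = [];
  const invalid = [];
  for (const item of items) (isValid(item, schema) ? valid : invalid).push(item);
  return { valid, invalid };
}
```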

Reuse invalid items

Invalid items are captured in separate requestListSources saved in the KV store.
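One way invalid items could be turned into request list sources is sketched below. It assumes each item carries a `url` field that can be re-requested and that the original item is kept under `userData`; both the helper name and this layout are assumptions, not the actor's documented output format.

```javascript
// Hypothetical helper: convert invalid items into request list sources.
// Assumption: every reusable item has a string `url` field.
function toRequestListSources(invalidItems) {
  return invalidItems
    .filter((item) => typeof item.url === 'string')
    .map((item) => ({ url: item.url, userData: { invalidItem: item } }));
}
```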

  • Used 163 times
  • Used by 9 users