Dataset(s) To Schema

Pricing: Pay per usage


Takes one or more dataset IDs and outputs a JSON Schema describing the datasets' contents to a key-value store.


Rating: 5.0 (1)

Developer: Zuzka Pelechová

Maintained by Community


Last modified: 23 days ago

Dataset to Schema

Generates a JSON Schema from one or more datasets on Apify. The actor scans dataset items, detects data types for each field (including merging multiple types), and outputs the resulting schema:

Saves it to the key-value store under the key SCHEMA (as application/json).
Pushes the same schema as an item to the run's output dataset for convenient viewing or sharing.

Use cases: validating scraper outputs, generating OpenAPI definitions or validators, and quickly checking data consistency across multiple datasets.

Input (input schema)

{
  "title": "Generate schema from datasets",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "datasetIds": {
      "title": "Dataset IDs",
      "type": "array",
      "description": "IDs of the datasets for which to generate a schema",
      "editor": "stringList"
    }
  },
  "required": ["datasetIds"]
}

Fields

datasetIds (array): list of Apify dataset IDs to include in schema generation. You can provide one or more IDs; the actor iterates through them and merges their schemas into one.

Output

The actor produces the same schema in two places:

Key-value store: the key SCHEMA holds the complete JSON Schema document (the equivalent of a schema.json file).
Output dataset: a single item containing the full schema (for a quick preview in the Console).

Example output schema (truncated)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "title": { "type": ["string", "null"] },
    "price": { "type": ["number", "string"] },
    "inStock": { "type": "boolean" },
    "images": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "additionalProperties": true
}

Note: The actor merges multiple observed types into union types (e.g., "type": ["number", "string"]) when data varies.
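The union rule in the note above can be illustrated with a small helper. This is a hypothetical sketch, not the actor's code: `merge_types` is an invented name, and it assumes a field's type is recorded either as a single JSON Schema type name or as a list of names, keeping the order in which types were first seen.

```python
from typing import List, Union

# A JSON Schema "type" keyword holds either one type name or a list of them.
TypeSpec = Union[str, List[str]]


def merge_types(a: TypeSpec, b: TypeSpec) -> TypeSpec:
    """Combine two non-empty "type" values, producing a union only when needed."""
    names: List[str] = []
    for spec in (a, b):
        for name in ([spec] if isinstance(spec, str) else spec):
            if name not in names:  # keep first-seen order, skip duplicates
                names.append(name)
    # A single observed type stays a plain string; several become a union list.
    return names[0] if len(names) == 1 else names
```

For example, `merge_types("number", "string")` gives `["number", "string"]`, while `merge_types(["string", "null"], "string")` leaves the union unchanged as `["string", "null"]`.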

How It Works

Reads datasetIds from the input.
Iterates through each dataset and detects the type of every field: number, string, boolean, object, array, or null (unifying differing values into union types if needed).
Merges all detected fields into a single schema covering all datasets.
Saves the final schema to the key-value store under the key SCHEMA and pushes it to the output dataset.
If a dataset exceeds the internal iteration limit (about 1 million items), logs a warning that the schema may be incomplete, but still completes the run.
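The steps above can be sketched in Python as a rough reconstruction, not the actor's source code. Assumptions: the function names (`json_type`, `add_value`, `finalize`, `infer_schema`) are hypothetical; datasets are passed in as already-downloaded iterables of items instead of being fetched by ID; integers and floats both map to "number" to match the type list above; a field seen only in empty arrays gets an unconstrained `{}` schema; and the per-dataset limit of 1,000,000 is an approximation of the "≈1 M items" limit, with the warning text borrowed from the Limitations section.

```python
import logging
from itertools import islice
from typing import Any, Dict, Iterable, List

_SENTINEL = object()


def json_type(value: Any) -> str:
    """Map a value decoded from JSON to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"  # the remaining JSON values are dicts


def add_value(node: Dict[str, Any], value: Any) -> None:
    """Widen a working schema node in place so that it also accepts `value`."""
    t = json_type(value)
    if t not in node["type"]:
        node["type"].append(t)  # record a new member of the union
    if t == "object":
        props = node.setdefault("properties", {})
        for key, child in value.items():
            add_value(props.setdefault(key, {"type": []}), child)
    elif t == "array":
        items = node.setdefault("items", {"type": []})
        for element in value:
            add_value(items, element)


def finalize(node: Dict[str, Any]) -> Dict[str, Any]:
    """Turn the working representation into plain JSON Schema."""
    out: Dict[str, Any] = {}
    types: List[str] = node["type"]
    if types:  # a node that never saw a value (e.g. only empty arrays) stays {}
        out["type"] = types[0] if len(types) == 1 else types
    if "properties" in node:
        out["properties"] = {k: finalize(v) for k, v in node["properties"].items()}
    if "items" in node:
        out["items"] = finalize(node["items"])
    return out


def infer_schema(datasets: Iterable[Iterable[Dict[str, Any]]],
                 limit_per_dataset: int = 1_000_000) -> Dict[str, Any]:
    """Merge the items of every dataset into one draft-07 schema."""
    root: Dict[str, Any] = {"type": []}
    for items in datasets:  # one iterable of items per dataset
        it = iter(items)
        for item in islice(it, limit_per_dataset):
            add_value(root, item)
        if next(it, _SENTINEL) is not _SENTINEL:  # items left beyond the limit
            logging.warning("Schema might not be perfect.")
    schema: Dict[str, Any] = {"$schema": "http://json-schema.org/draft-07/schema#"}
    schema.update(finalize(root))
    schema["additionalProperties"] = True
    return schema
```

Feeding it two small datasets, one item with `"title": "Mug"` and `"price": 9.5`, another with `"title": null` and `"price": "n/a"`, yields `["string", "null"]` for title and `["number", "string"]` for price, mirroring the example output schema. Building one working tree and collapsing single-member unions only at the end keeps the merge order-independent across datasets.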

Quick Start on Apify

Create a run of the actor in the Apify Console.

Provide input:

{ "datasetIds": ["abc123", "def456"] }

Run it. After completion, open Storage → Key‑Value Store and download SCHEMA. Alternatively, open the output dataset to view the schema item.
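The same steps can also be scripted. Below is a minimal sketch, assuming the official `apify-client` Python package, where `ApifyClient.actor(...).call(run_input=...)` starts a run and waits for it to finish, and `key_value_store(...).get_record(...)` reads a record. The function name `run_and_fetch_schema` is hypothetical, and `actor_id` is a placeholder because this page does not state the actor's ID.

```python
from typing import Any, Dict, List


def run_and_fetch_schema(client: Any, actor_id: str,
                         dataset_ids: List[str]) -> Dict[str, Any]:
    """Run the actor and return the JSON Schema stored under the SCHEMA key.

    `client` is expected to behave like apify_client.ApifyClient;
    `actor_id` is a placeholder for this actor's ID.
    """
    # call() blocks until the run finishes and returns the run object (or None).
    run = client.actor(actor_id).call(run_input={"datasetIds": dataset_ids})
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    # The schema is written to the run's default key-value store.
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("SCHEMA")
    if record is None:
        raise KeyError("SCHEMA record not found in the run's key-value store")
    return record["value"]  # JSON records come back already parsed
```

Usage, assuming a valid API token: create `client = ApifyClient("<YOUR_API_TOKEN>")` (imported from `apify_client`), then call `run_and_fetch_schema(client, "<ACTOR_ID>", ["abc123", "def456"])`.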

Limitations & Edge Cases

Large datasets (more than about 1 million items): the actor logs a warning (“Schema might not be perfect.”) and continues. For higher accuracy, generate the schema from a smaller sample or pre-aggregate the data.
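If you take the sampling route, one option is reservoir sampling (Algorithm R), which draws a uniform random sample in a single pass without knowing the dataset size in advance. This is a swapped-in technique, not something the actor itself does; `reservoir_sample` is a hypothetical helper, and the resulting sample would still need to be saved to a new dataset whose ID you then pass in `datasetIds`.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i, both ends inclusive
            if j < k:
                sample[j] = item  # replace with probability k / (i + 1)
    return sample
```

Because it only holds k items at a time, the sample size can be chosen to stay well below the actor's iteration limit.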
Heterogeneous data: if fields vary widely, expect broader union types — this is intentional so the schema reflects observed variability.

Dataset Schema Super Actor

zuzka/dataset-schema-super-actor

Create your Actor dataset schema with one click.

Zuzka Pelechová

Validate Dataset(s) with JSON Schema

jaroslavhejlek/validate-dataset-with-json-schema

This Actor validates items in one or more datasets against a provided JSON Schema. Use it if you are planning to add a dataset validation schema to your Actor and want to test it.

Jaroslav Hejlek

Dataset Download

idiatech/apify-Dataset-Download

Download any dataset from the Apify platform automatically and in any format you want. Use this actor along with a Dataset toolbox automation tool.

idIA Tech

Forward Dataset to Actor or Task

valek.josef/forward-dataset-to-actor-or-task

Forwards the contents of a specified dataset to a specified field in the input of another Actor or task.

Josef Válek

Schema Universal Converter

fiery_dream/schema-universal-converter

Convert between JSON Schema, TypeScript, Zod, OpenAPI, GraphQL, and more. Maintain schema consistency across your entire stack.

Cody Churchwell

Google Dataset Items Translator

web.harvester/google-dataset-items-translator

Translate any dataset field(s) into any of the supported languages using the Google Translate website. It goes through all the items in the dataset and translates all of the selected fields.

Web Harvester

Results Checker

lukaskrivka/results-checker

Check the results of your scrapers with this flexible checker. Just supply a dataset or key-value store ID and a few simple rules to get a detailed report.

Lukáš Křivka

Zip Key-value Store

jaroslavhejlek/zip-key-value-store

Takes the ID of a key-value store, archives all its keys into a zip file, and saves the archive into the actor's own key-value store. For more than 1000 keys, multiple zip files are created. If their total size exceeds the actor's available memory, it creates multiple smaller zip files.

Jaroslav Hejlek
