Dataset(s) To Schema

Pricing

Pay per usage

Takes one or more dataset IDs and writes a JSON Schema describing the datasets' contents to the key-value store.

Developer

Zuzka Pelechová

Maintained by Community

Dataset to Schema

Generates a JSON Schema from one or more datasets on Apify. The actor scans dataset items, detects data types for each field (including merging multiple types), and outputs the resulting schema:

  • Saves it to the Key‑Value Store under the key SCHEMA (as application/json),
  • Also pushes the same schema as an item to the run’s output dataset for convenient viewing or sharing.

Use cases: validating scraper outputs, generating OpenAPI definitions or validators, and quickly checking data consistency across multiple datasets.
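To illustrate the validation use case, here is a minimal sketch that checks a scraped item against the `type` keywords of a generated schema. It is not part of the actor: the helper names `matches_type` and `check_item` are hypothetical, and only top-level `type` rules are checked. In practice a full JSON Schema validator library would be the safer choice.

```python
# Map JSON Schema type names to Python types (top-level "type" keyword only).
PY_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def matches_type(value, type_name):
    """Return True if `value` is an instance of the given JSON Schema type."""
    # bool is a subclass of int in Python, so exclude it from "number".
    if type_name == "number" and isinstance(value, bool):
        return False
    return isinstance(value, PY_TYPES[type_name])


def check_item(item, schema):
    """Return a list of problems found in `item`; an empty list means it passed."""
    problems = []
    for field, rule in schema.get("properties", {}).items():
        if field not in item:
            continue  # this sketch does not treat absent fields as errors
        allowed = rule["type"]
        if isinstance(allowed, str):
            allowed = [allowed]  # normalise a single type to a one-element list
        if not any(matches_type(item[field], t) for t in allowed):
            actual = type(item[field]).__name__
            problems.append(f"{field}: expected {allowed}, got {actual}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "price": {"type": ["number", "string"]},
        "inStock": {"type": "boolean"},
    },
}
print(check_item({"price": 9.99, "inStock": True}, schema))  # []
```

An item such as `{"price": None, "inStock": "yes"}` would produce two problems, one per field.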


Input (input schema)

{
  "title": "Generate schema from datasets",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "datasetIds": {
      "title": "Dataset IDs",
      "type": "array",
      "description": "IDs of the datasets for which to generate a schema",
      "editor": "stringList"
    }
  },
  "required": ["datasetIds"]
}

Fields

  • datasetIds (array, required): list of Apify dataset IDs to include in schema generation. You can provide one or more IDs; the actor iterates through them and merges their schemas into one.

Output

The actor produces the same schema in two places:

  1. Key‑Value Store: key SCHEMA – complete JSON Schema file (e.g., schema.json).
  2. Output dataset: a single item containing the full schema (for quick preview in the console).

Example output schema (truncated)

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "title": { "type": ["string", "null"] },
    "price": { "type": ["number", "string"] },
    "inStock": { "type": "boolean" },
    "images": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "additionalProperties": true
}

Note: The actor merges multiple observed types into union types (e.g., "type": ["number", "string"]) when data varies.
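The union behaviour described in the note can be sketched as a small merge function. This is an illustration only, not the actor's code: `merge_types` is a hypothetical helper, and sorting the resulting list is an assumption made here for deterministic output (the actor's own ordering may differ).

```python
def merge_types(a, b):
    """Merge two JSON Schema `type` values (a string or a list of strings)."""

    def as_list(t):
        # Normalise a single type name to a one-element list.
        return [t] if isinstance(t, str) else list(t)

    merged = sorted(set(as_list(a)) | set(as_list(b)))
    # Keep a plain string when only one type was ever observed.
    return merged[0] if len(merged) == 1 else merged


print(merge_types("number", "string"))    # ['number', 'string']
print(merge_types("boolean", "boolean"))  # boolean
```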


How It Works

  • Reads datasetIds from the input.
  • Iterates through each dataset and detects field types: number, string, boolean, object, array (unifying differing values into union types if needed).
  • Merges all detected fields into a single schema covering all datasets.
  • Saves the final schema to the KV Store (SCHEMA) and pushes it to the output dataset.
  • If a dataset exceeds internal iteration limits (≈1 M items), logs a warning that the schema may be incomplete but still completes the run.
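The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actor's implementation: datasets are represented as in-memory lists of dicts instead of being read from Apify storage, only top-level fields are inspected (nested `items` and `properties` are not inferred), and `MAX_ITEMS`, `json_type`, and `infer_schema` are hypothetical names.

```python
import logging

MAX_ITEMS = 1_000_000  # assumed stand-in for the actor's internal iteration limit


def json_type(value):
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def infer_schema(datasets, max_items=MAX_ITEMS):
    """Build one schema covering every item in every dataset."""
    seen = {}  # field name -> set of observed type names
    for items in datasets:
        for count, item in enumerate(items, start=1):
            if count > max_items:
                # Stop scanning this dataset but still finish the run.
                logging.warning("Schema might not be perfect.")
                break
            for field, value in item.items():
                seen.setdefault(field, set()).add(json_type(value))

    properties = {}
    for field, types in seen.items():
        ordered = sorted(types)
        # A single observed type stays a string; several become a union list.
        properties[field] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "additionalProperties": True,
    }


first = [{"title": "Lamp", "price": 19.5, "inStock": True}]
second = [{"title": None, "price": "19.50"}]
schema = infer_schema([first, second])
print(schema["properties"]["price"])  # {'type': ['number', 'string']}
```

Collecting types in a set per field keeps the merge step trivial: adding a second dataset only adds more members to the same sets.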

Quick Start on Apify

  1. Create a run of the actor in the Apify Console.

  2. Provide input:

    { "datasetIds": ["abc123", "def456"] }

  3. Run it. After completion, open Storage → Key‑Value Store and download SCHEMA. Alternatively, open the output dataset to view the schema item.
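As an alternative to downloading SCHEMA through the Console, the record can be fetched over HTTP. The sketch below assumes the Apify API v2 record endpoint (`/v2/key-value-stores/{storeId}/records/{recordKey}`) and bearer-token authentication; the helper names are hypothetical, and the store ID and token must come from your own run and account.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"  # assumed Apify API v2 base URL


def schema_record_url(store_id, key="SCHEMA"):
    """Build the URL of a key-value store record (SCHEMA by default)."""
    store = urllib.parse.quote(store_id, safe="")
    record = urllib.parse.quote(key, safe="")
    return f"{API_BASE}/key-value-stores/{store}/records/{record}"


def download_schema(store_id, token):
    """Fetch the SCHEMA record and parse it as JSON (performs a network call)."""
    request = urllib.request.Request(
        schema_record_url(store_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Only the URL is built here; calling download_schema needs a real store ID and token.
print(schema_record_url("YOUR_STORE_ID"))
```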

Limitations & Edge Cases

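  • Large datasets (> ~1 M items): the actor logs a warning (“Schema might not be perfect.”) and continues, so fields that appear only in unscanned items may be missing from the schema. To avoid an incomplete scan, generate the schema from a smaller representative sample or pre‑aggregate the data.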
  • Heterogeneous data: if fields vary widely, expect broader union types — this is intentional so the schema reflects observed variability.
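One way to build a smaller sample from a large dataset is reservoir sampling, which picks a fixed number of items uniformly at random in a single pass. This is a generic sketch, not something the actor does itself: `reservoir_sample` is a hypothetical helper, and writing the sample to a new dataset is left out.

```python
import random


def reservoir_sample(items, k, seed=0):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample = []
    for index, item in enumerate(items):
        if index < k:
            sample.append(item)  # fill the reservoir first
        else:
            # Replace an existing entry with probability k / (index + 1).
            j = rng.randint(0, index)
            if j < k:
                sample[j] = item
    return sample


items = ({"id": i} for i in range(10_000))
print(len(reservoir_sample(items, 100)))  # 100
```

A rare field can still be missed by a random sample, so the resulting schema is an approximation in a different way than a truncated scan.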