Dataset(s) To Schema
Pricing
Pay per usage
Takes one or more dataset IDs and writes a JSON Schema describing the dataset contents to the key-value store.
Developer

Zuzka Pelechová
Dataset to Schema
Generates a JSON Schema from one or more datasets on Apify. The actor scans dataset items, detects the data type of each field (merging multiple observed types where needed), and outputs the resulting schema in two places:
- Saves it to the Key‑Value Store under the key SCHEMA (as application/json).
- Pushes the same schema as an item to the run's output dataset for convenient viewing or sharing.
Use cases: validating scraper outputs, generating OpenAPI definitions or validators, and quickly checking data consistency across multiple datasets.
Input (input schema)
{
  "title": "Generate schema from datasets",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "datasetIds": {
      "title": "Dataset IDs",
      "type": "array",
      "description": "IDs of the datasets for which to generate a schema",
      "editor": "stringList"
    }
  },
  "required": ["datasetIds"]
}
Fields
- datasetIds (array): list of Apify dataset IDs to include in schema generation. You can provide one or more IDs; the actor iterates through them and merges their schemas into one.
Output
The actor produces the same schema in two places:
- Key‑Value Store: key SCHEMA, the complete JSON Schema file (e.g., schema.json).
- Output dataset: a single item containing the full schema (for quick preview in the console).
Example output schema (truncated)
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "title": { "type": ["string", "null"] },
    "price": { "type": ["number", "string"] },
    "inStock": { "type": "boolean" },
    "images": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "additionalProperties": true
}
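To show how a consumer might read the type declarations in a schema like the one above, here is a minimal Python sketch that checks a single value against either a single type or a union of types. The helper name matches_type is hypothetical, and the sketch covers only the type keyword; a full JSON Schema validator handles far more.

```python
from typing import Any, List, Union

# Map JSON Schema type names to Python checks. bool is excluded from
# "number" because bool is a subclass of int in Python.
_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def matches_type(value: Any, declared: Union[str, List[str]]) -> bool:
    """Return True if value satisfies a single type or any member of a union."""
    names = [declared] if isinstance(declared, str) else declared
    return any(_CHECKS[name](value) for name in names)
```

With the price field from the example, both 19.99 and "19.99" would pass, while None would not.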
Note: The actor merges multiple observed types into union types (e.g., "type": ["number", "string"]) when data varies.
How It Works
- Reads datasetIds from the input.
- Iterates through each dataset and detects field types: number, string, boolean, object, array (unifying differing values into union types if needed).
- Merges all detected fields into a single schema covering all datasets.
- Saves the final schema to the KV Store (SCHEMA) and pushes it to the output dataset.
- If a dataset exceeds internal iteration limits (≈1 M items), logs a warning that the schema may be incomplete but still completes the run.
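The detection and merging steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the actor's actual code: the function names json_type and infer_schema are hypothetical, only top-level fields are inspected, and items are plain dictionaries already loaded into memory.

```python
from typing import Any, Dict, Iterable, Set


def json_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Unsupported value type: {type(value).__name__}")


def infer_schema(items: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Collect the observed types per field and merge them into one schema."""
    seen: Dict[str, Set[str]] = {}
    for item in items:
        for key, value in item.items():
            seen.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, types in seen.items():
        ordered = sorted(types)
        # One observed type stays a plain string; several become a union list.
        properties[key] = {"type": ordered[0] if len(ordered) == 1 else ordered}

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "additionalProperties": True,
    }
```

Because the function accepts any iterable, items from several datasets can be chained together (for example with itertools.chain) to get one merged schema.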
Quick Start on Apify
- Create a run of the actor in the Apify Console.
- Provide input: { "datasetIds": ["abc123", "def456"] }
- Run it. After completion, open Storage → Key‑Value Store and download SCHEMA. Alternatively, open the output dataset to view the schema item.
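The same steps can also be run programmatically. The sketch below assumes the apify-client package for Python, whose client exposes actor(...).call(run_input=...) and key_value_store(...).get_record(...). The function name fetch_schema is hypothetical, and the actor ID and API token are placeholders to fill in yourself.

```python
from typing import Any, Dict, List, Optional


def fetch_schema(
    client: Any, actor_id: str, dataset_ids: List[str]
) -> Optional[Dict[str, Any]]:
    """Run the actor and return the schema stored under the SCHEMA key."""
    # Start the actor and wait for the run to finish.
    run = client.actor(actor_id).call(run_input={"datasetIds": dataset_ids})
    if run is None:
        return None

    # The schema is saved in the run's default key-value store.
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("SCHEMA")
    return record["value"] if record else None


# Usage (requires `pip install apify-client`; both values are placeholders):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   schema = fetch_schema(client, "<ACTOR_ID>", ["abc123", "def456"])
```

Passing the client in as an argument keeps the function easy to test with a stub and avoids hard-coding credentials.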
Limitations & Edge Cases
- Large datasets (> ~1 M items): the actor logs a warning (“Schema might not be perfect.”) and continues. For higher accuracy, generate a schema from a smaller sample or pre‑aggregate data.
- Heterogeneous data: if fields vary widely, expect broader union types — this is intentional so the schema reflects observed variability.
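One way to get the smaller sample mentioned above is reservoir sampling, which picks a fixed number of items uniformly at random from a stream without loading everything into memory. This is a minimal sketch of the standard technique (Algorithm R), not something the actor does itself; the function name reservoir_sample is hypothetical, and storing the sample in a new dataset is left out.

```python
import random
from typing import Any, Dict, Iterable, List


def reservoir_sample(
    items: Iterable[Dict[str, Any]], k: int, seed: int = 0
) -> List[Dict[str, Any]]:
    """Return up to k items chosen uniformly at random from the stream."""
    rng = random.Random(seed)  # seeded so repeated runs give the same sample
    sample: List[Dict[str, Any]] = []
    for index, item in enumerate(items):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (index + 1).
            j = rng.randint(0, index)
            if j < k:
                sample[j] = item
    return sample
```

A sample can miss rare fields or rarely seen types, so the resulting schema may be narrower than one built from the full data.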
