JSON Schema Auto-Generator (Infer from Samples)

Pricing

Pay per usage

Provide one or more JSON samples (inline or from URLs) and get an inferred JSON Schema (Draft 7 / 2020-12) describing their shape. Bootstrap API validators, Apify input schemas, BigQuery / DuckDB schemas. Powered by genson. $0.01 per inference.

Developer

Hojun Lee

Maintained by Community

Last modified: 19 hours ago

Why this exists

You hit an API that returns JSON. You want to validate downstream payloads, store them in a typed table, or auto-generate TypeScript types — but writing a JSON Schema by hand from 30 nested fields is tedious.

This actor takes one or more sample payloads and infers the schema. The result is a standard JSON Schema (Draft 7 or 2020-12) that you can use with Ajv, BigQuery, OpenAPI, Apify input_schema.json, etc.
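
To make the inference step concrete, here is a minimal sketch of how a schema can be derived from several samples. It is not genson's implementation and only handles flat objects; the function names json_type and infer_flat_schema are illustrative. A key counts as required only when it appears in every sample, and a field seen with more than one type gets a type list.

```python
import json


def json_type(value):
    """Map a value parsed by json.loads to a JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_flat_schema(samples):
    """Infer a schema for a list of flat JSON objects (nesting is not handled)."""
    types = {}       # key -> set of type names seen across samples
    required = None  # keys present in every sample seen so far
    for sample in samples:
        for key, value in sample.items():
            types.setdefault(key, set()).add(json_type(value))
        keys = set(sample)
        required = keys if required is None else required & keys
    properties = {}
    for key, seen in types.items():
        names = sorted(seen)
        # A single type is emitted as a string, several types as a list.
        properties[key] = {"type": names[0] if len(names) == 1 else names}
    return {
        "type": "object",
        "properties": properties,
        "required": sorted(required or []),
    }


samples = [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar", "deleted_at": "2024-01-01"},
    {"id": 3, "name": "baz", "deleted_at": None},
]
schema = infer_flat_schema(samples)
print(json.dumps(schema, indent=2))
```

With these three samples, id and name end up required, while deleted_at stays optional with the type list ["null", "string"].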


What you get

{
  "_type": "schema",
  "samples_used": 3,
  "schema_uri": "https://json-schema.org/draft/2020-12/schema",
  "title": "User",
  "inferred_schema": {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "title": "User",
    "additionalProperties": false,
    "properties": {
      "id": {"type": "integer"},
      "email": {"type": "string"},
      "tags": {"type": "array", "items": {"type": "string"}},
      "meta": {
        "type": "object",
        "properties": {
          "created": {"type": "string"},
          "verified": {"type": "boolean"}
        },
        "required": ["created"]
      }
    },
    "required": ["id", "email"]
  },
  "schema_str": "<pretty-printed schema as text>",
  "top_level_keys": ["id", "email", "tags", "meta"],
  "top_level_required": ["id", "email"]
}

The full schema is also saved as inferred_schema.json in the run's key-value store, where it is easy to download.


Quick start

Single sample

{
  "sample": {
    "id": 1,
    "email": "test@example.com",
    "tags": ["a", "b"],
    "meta": {"created": "2024-01-01", "verified": true}
  }
}

Multiple samples

{
  "samples": [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar", "deleted_at": "2024-01-01"},
    {"id": 3, "name": "baz", "deleted_at": null}
  ]
}

Fetch samples from URLs

{
  "sampleUrls": [
    "https://api.github.com/users/torvalds",
    "https://api.github.com/users/octocat"
  ],
  "schemaTitle": "GitHub User",
  "schemaUri": "https://json-schema.org/draft/2020-12/schema"
}
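
If your samples live in a JSON Lines dump rather than behind URLs, the input object can be assembled with a few lines of Python. This is only a sketch: build_input is a hypothetical helper, and it assumes schemaTitle and schemaUri are accepted alongside samples in the same way they are shown with sampleUrls above.

```python
import json

DRAFT_2020_12 = "https://json-schema.org/draft/2020-12/schema"


def build_input(json_lines, title=None, schema_uri=DRAFT_2020_12):
    """Turn JSON Lines text into an input object with a `samples` array."""
    # Skip blank lines so trailing newlines in the dump do not break parsing.
    samples = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    if not samples:
        raise ValueError("need at least one sample")
    actor_input = {"samples": samples, "schemaUri": schema_uri}
    if title:
        actor_input["schemaTitle"] = title
    return actor_input


lines = '{"id": 1, "name": "foo"}\n{"id": 2, "name": "bar"}\n'
actor_input = build_input(lines, title="Item")
print(json.dumps(actor_input, indent=2))
```

The printed object can be pasted into the actor's input editor as-is.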

Pricing

Pay per event: $0.01 per schema inference.

The cost is fixed per inference, so you can re-run the actor as often as you want while an API evolves.


Use cases

  1. Bootstrap Apify input_schema.json: run it on a sample input, then use the schema as a starting point for your actor's .actor/input_schema.json
  2. REST API validators: generate Ajv-compatible schemas from real production responses
  3. BigQuery / DuckDB tables: convert the JSON Schema to DDL with a follow-up tool
  4. OpenAPI: place schemas under components.schemas for type-safe SDK generation
  5. Comparison: run on prod and staging payloads, then diff the schemas to spot drift
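
For the comparison use case, the diff itself can be quite small. The sketch below flattens two inferred schemas into dotted property paths and reports what was added, removed, or changed in type. The helper names flatten_types and diff_schemas are illustrative, and the sketch only follows nested properties, not array items.

```python
def flatten_types(schema, prefix=""):
    """Return {dotted.path: type} for every property in an object schema."""
    out = {}
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        declared = sub.get("type")
        # Normalise type lists so ["string", "null"] equals ["null", "string"].
        out[path] = tuple(sorted(declared)) if isinstance(declared, list) else declared
        if "properties" in sub:
            out.update(flatten_types(sub, prefix=path + "."))
    return out


def diff_schemas(old, new):
    """Compare two schemas by property path and declared type."""
    a, b = flatten_types(old), flatten_types(new)
    return {
        "added": sorted(b.keys() - a.keys()),
        "removed": sorted(a.keys() - b.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


prod = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "meta": {"type": "object", "properties": {"created": {"type": "string"}}},
    },
}
staging = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "meta": {
            "type": "object",
            "properties": {
                "created": {"type": "string"},
                "verified": {"type": "boolean"},
            },
        },
    },
}
drift = diff_schemas(prod, staging)
print(drift)
```

Here the diff reports meta.verified as added and id as changed, since its type went from integer to string.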

Output details

  • inferred_schema: the schema as a JSON object (use this in code)
  • schema_str: the same schema as pretty-printed text (paste into a file)
  • top_level_keys: convenience for renaming / sorting fields
  • top_level_required: the top-level keys listed under required
  • additionalProperties: false is set by default for objects (strict mode). To allow extras, remove it before using the schema.
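
Removing additionalProperties: false by hand gets tedious once objects are nested. A small recursive helper can do it; this is a sketch with a hypothetical function name, allow_extras, and it returns a copy so the strict schema stays available.

```python
import copy


def allow_extras(schema):
    """Return a deep copy of `schema` without any `additionalProperties: false`."""

    def walk(node):
        if isinstance(node, dict):
            # Only drop the keyword when it is literally false; a property that
            # happens to be named "additionalProperties" holds a dict instead.
            if node.get("additionalProperties") is False:
                del node["additionalProperties"]
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    result = copy.deepcopy(schema)
    walk(result)
    return result


strict = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "meta": {
            "type": "object",
            "additionalProperties": False,
            "properties": {"created": {"type": "string"}},
        }
    },
}
relaxed = allow_extras(strict)
print(relaxed)
```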

Limitations / gotchas

  • More samples = better schema. With a single sample, required and optional fields cannot be told apart. Pass 5-10 samples covering all common cases.
  • null handling: when a field is sometimes null, the inferred type becomes ["string", "null"] (or similar). This is valid JSON Schema, but some tools (e.g. old Avro) don't like unions.
  • Format detection: genson doesn't infer format: date-time etc. by default. For format-aware generation, post-process the schema manually.
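
As an example of the manual post-processing mentioned for format detection, the sketch below adds format: date to top-level string properties whose sample values are all ISO calendar dates. The helper names are hypothetical, only the date format is covered, and nested objects are left untouched; extend it to date-time or other formats as needed.

```python
import copy
import re
from datetime import date

DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value):
    """True for strings like 2024-01-01 that are also real calendar dates."""
    if not isinstance(value, str) or not DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def add_date_formats(schema, samples):
    """Annotate top-level string properties with format: date when samples agree."""
    result = copy.deepcopy(schema)
    for name, sub in result.get("properties", {}).items():
        declared = sub.get("type")
        types = declared if isinstance(declared, list) else [declared]
        if "string" not in types:
            continue
        # Ignore samples where the field is missing or null.
        values = [s[name] for s in samples if s.get(name) is not None]
        if values and all(is_iso_date(v) for v in values):
            sub["format"] = "date"
    return result


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "deleted_at": {"type": ["null", "string"]},
    },
}
samples = [
    {"name": "foo"},
    {"name": "bar", "deleted_at": "2024-01-01"},
    {"name": "baz", "deleted_at": None},
]
annotated = add_date_formats(schema, samples)
print(annotated)
```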

Engine

  • genson v1.2+: a Python library that generates JSON Schema from sample objects. Well-maintained and used in many data pipelines.


Feedback

A short review helps other developers find this actor. Leave a review on the Apify Store.