Pricing

Pay per usage

Output & Dataset Schema Creator

Generate JSON schemas for your Actor's output and dataset using AI. Perfect for testing new actors.

Rating

0.0

(0)

Developer

Zuzka Pelechová

Last modified

11 days ago

Dataset Schema External Actor

Automatically generate JSON schemas for any Apify actor's dataset output using AI. Perfect for actors without production data or when testing new actors.

What it does

This actor:

1. Generates test inputs using AI (Claude Sonnet 4) based on the target actor's INPUT_SCHEMA
2. Runs the target actor with multiple input variants (minimal, normal, maximal, edge cases)
3. Analyzes the output datasets to generate a comprehensive JSON Schema
4. Enhances the schema with AI-generated descriptions and examples
5. Creates both schemas:
   - Dataset Schema: Validates the structure of items in your dataset (fields, types, required properties)
   - Output Schema: Defines what your actor returns (dataset, key-value store, etc.) and how it's displayed in Apify Console

When to use this actor

Testing new actors before they have production data
External actors you don't own but whose output you want to understand
Rapid prototyping when you need schemas quickly
Actors without production runs or insufficient data

Input

{
  "actorTechnicalName": "api-ninja/tripadvisor-reviews-scraper",
  "generateInputs": true,
  "enhanceSchema": true,
  "generateViews": false
}

Parameters

actorTechnicalName (required): The actor to analyze (e.g., username/actor-name)
generateInputs (optional, default: true): Generate test inputs with AI
existingMinimalInput (optional): Provide your own minimal test input (JSON string)
existingNormalInput (optional): Provide your own normal test input (JSON string)
existingMaximalInput (optional): Provide your own maximal test input (JSON string)
existingEdgeInput (optional): Provide your own edge case input (JSON string)
enhanceSchema (optional, default: true): Enhance schema with AI descriptions
existingEnhancedSchema (optional): Skip generation and use existing schema (JSON string)
generateViews (optional, default: false): Generate dataset views
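
Because the existing*Input parameters are JSON strings rather than objects, the input is easier to build programmatically than by hand-escaping quotes. The following is a minimal Python sketch that relies only on the parameter names listed above. The helper name build_actor_input is hypothetical, and switching generateInputs off whenever custom inputs are supplied is an assumption, not documented behavior.

```python
import json

# Variant names that map onto the existing<Variant>Input parameters.
ALLOWED_VARIANTS = {"Minimal", "Normal", "Maximal", "Edge"}


def build_actor_input(actor_name, custom_inputs=None, enhance_schema=True):
    """Build the input object for this actor (hypothetical helper).

    custom_inputs maps a variant name ("Minimal", "Normal", "Maximal",
    "Edge") to a plain dict. Each dict is serialized with json.dumps
    because the existing*Input parameters expect JSON strings.
    """
    custom_inputs = custom_inputs or {}
    unknown = set(custom_inputs) - ALLOWED_VARIANTS
    if unknown:
        raise ValueError(f"Unknown input variants: {sorted(unknown)}")

    actor_input = {
        "actorTechnicalName": actor_name,
        # Assumption: AI input generation is disabled when custom inputs
        # are provided; adjust if you want both.
        "generateInputs": not custom_inputs,
        "enhanceSchema": enhance_schema,
        "generateViews": False,
    }
    for variant, value in custom_inputs.items():
        actor_input[f"existing{variant}Input"] = json.dumps(value)
    return actor_input
```

Calling build_actor_input("username/actor-name", {"Minimal": {"query": "test"}}) yields an object with generateInputs set to false and existingMinimalInput holding the escaped JSON string.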

Output

The primary output is the Schemas Bundle (shown first in results), which contains:

{
  "schemas": {
    "dataset": {
      "title": "Dataset Schema",
      "description": "Validates the structure of items in your dataset (fields, types, required properties)",
      "schema": { /* Complete JSON Schema */ }
    },
    "output": {
      "title": "Output Schema",
      "description": "Defines what your actor returns (dataset/KV store) and how it displays in Apify Console",
      "schema": { /* OUTPUT_SCHEMA.json format */ }
    }
  },
  "metadata": {
    "actorName": "tripadvisor-reviews-scraper",
    "generatedAt": "2026-02-06T...",
    "enhancementUsed": true,
    "inputsUsed": "generated"
  },
  "usage": {
    "datasetSchemaPath": ".actor/dataset_schema.json",
    "outputSchemaPath": ".actor/output_schema.json",
    "instructions": "Copy the schemas to your actor repository"
  }
}

How to use the output

1. View the Schemas Bundle (default output in Apify Console)
2. Copy the dataset schema from schemas.dataset.schema
3. Copy the output schema from schemas.output.schema
4. Save to your actor:
   - Dataset schema → .actor/dataset_schema.json
   - Output schema → .actor/output_schema.json
5. Update actor.json to reference these schemas
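
The copy-and-save steps above can also be scripted. The sketch below assumes the Schemas Bundle has already been downloaded and parsed into a Python dict with the structure shown in the Output section. The function names save_schemas and reference_schemas are hypothetical. The storages.dataset and output keys written to actor.json are an assumption about the Apify configuration format and should be checked against the current Apify documentation before use.

```python
import json
from pathlib import Path


def save_schemas(bundle, actor_dir):
    """Write both schemas from a Schemas Bundle under actor_dir.

    Target paths come from the bundle's own "usage" section.
    Returns a dict mapping "dataset"/"output" to the written Path.
    """
    root = Path(actor_dir)
    written = {}
    for key, path_key in (("dataset", "datasetSchemaPath"),
                          ("output", "outputSchemaPath")):
        target = root / bundle["usage"][path_key]
        target.parent.mkdir(parents=True, exist_ok=True)
        schema = bundle["schemas"][key]["schema"]
        target.write_text(json.dumps(schema, indent=2) + "\n", encoding="utf-8")
        written[key] = target
    return written


def reference_schemas(actor_dir):
    """Point .actor/actor.json at the saved schema files.

    Assumption: actor.json reads the dataset schema from
    "storages.dataset" and the output schema from "output".
    """
    actor_json = Path(actor_dir) / ".actor" / "actor.json"
    actor_json.parent.mkdir(parents=True, exist_ok=True)
    config = {}
    if actor_json.exists():
        config = json.loads(actor_json.read_text(encoding="utf-8"))
    config.setdefault("storages", {})["dataset"] = "./dataset_schema.json"
    config["output"] = "./output_schema.json"
    actor_json.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Existing keys in actor.json are preserved; only the two schema references are added or overwritten.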

How it works

Step 1: Smart Input Generation

Fetches the target actor's INPUT_SCHEMA from Apify API
Extracts prefill/default values as a base
Uses Claude Sonnet 4 to generate 4 input variants:
- Minimal: Essential fields only, 3 items max
- Normal: Common use case with realistic data
- Maximal: All available fields and options
- Edge: Invalid/nonexistent data to test error handling
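
The "extracts prefill/default values as a base" step can be illustrated with a short Python sketch. It is not this actor's implementation: the function name base_input_from_schema is hypothetical, and preferring prefill over default when both are present is an assumption, since the order of precedence is not stated here.

```python
def base_input_from_schema(input_schema):
    """Collect a starting input from an Actor input schema (sketch).

    For every property, take its "prefill" value if present, otherwise
    its "default" value. Properties with neither are left out, so the
    AI-generated variants would have to fill them in.
    """
    base = {}
    for name, spec in input_schema.get("properties", {}).items():
        if "prefill" in spec:
            # Assumption: prefill wins over default when both exist.
            base[name] = spec["prefill"]
        elif "default" in spec:
            base[name] = spec["default"]
    return base
```

For a schema whose query property has prefill "pizza" and whose maxResults property has default 10, the result is {"query": "pizza", "maxResults": 10}.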

Step 2: Dataset Collection

Runs the target actor 4 times in parallel with different inputs
Collects output datasets from successful runs
Verifies that at least one run succeeds, because schema generation needs data
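
The run-in-parallel-and-keep-what-succeeded pattern of this step can be sketched in Python as follows. The run_actor argument is a hypothetical callable standing in for whatever actually starts a run and fetches its dataset; it is assumed to return a list of items and to raise an exception when the run fails. This is an illustration of the pattern, not the actor's own code.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_datasets(run_actor, inputs_by_variant):
    """Run every input variant in parallel and keep successful results.

    Returns (results, failures): results maps variant name to dataset
    items, failures maps variant name to the error message.
    Raises RuntimeError if no run succeeded, since there would be no
    data to build a schema from.
    """
    results, failures = {}, {}
    workers = max(1, len(inputs_by_variant))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            variant: pool.submit(run_actor, actor_input)
            for variant, actor_input in inputs_by_variant.items()
        }
        for variant, future in futures.items():
            try:
                results[variant] = future.result()
            except Exception as error:  # a failed run must not stop the others
                failures[variant] = str(error)
    if not results:
        raise RuntimeError(f"All runs failed: {failures}")
    return results, failures
```

Failures are recorded instead of raised so that an edge-case input that crashes the target actor does not discard the data from the other variants.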

Step 3: Schema Generation

Analyzes all dataset items to determine field types
Calculates presence percentages for each field
Determines required vs optional fields
Identifies array items, nested objects, enums
Generates complete JSON Schema (draft-07)
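
To make the analysis above concrete, here is a minimal Python sketch of schema inference from dataset items. It covers only top-level fields: it records the JSON types seen for each field, computes each field's presence ratio, and marks a field as required when that ratio reaches a threshold. The function names and the required_threshold parameter are hypothetical, and nested objects, array items, and enum detection are deliberately left out.

```python
from collections import Counter, defaultdict


def json_type(value):
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"Not a JSON value: {value!r}")


def infer_schema(items, required_threshold=1.0):
    """Infer a draft-07 schema for the top-level fields of dataset items."""
    presence = Counter()
    types = defaultdict(set)
    for item in items:
        for field, value in item.items():
            presence[field] += 1
            types[field].add(json_type(value))

    properties, required = {}, []
    for field in sorted(presence):
        seen = sorted(types[field])
        # A single type is written as a string, several as a list.
        properties[field] = {"type": seen[0] if len(seen) == 1 else seen}
        if presence[field] / len(items) >= required_threshold:
            required.append(field)

    schema = {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
    }
    if required:
        schema["required"] = required
    return schema
```

With the default threshold of 1.0, a field is required only if it appears in every item; lowering the threshold tolerates occasional gaps at the cost of a stricter schema than the data supports.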

Step 4: AI Enhancement

Uses Claude Sonnet 4 to add:
- Human-readable descriptions for each field
- Example values based on actual data
- Pattern validation rules
- Better field titles and documentation

Step 5: Output Schema Creation

Generates Apify OUTPUT_SCHEMA.json format
Defines what the actor returns (dataset, key-value store)
Specifies how results display in Apify Console
Produces a schema that is ready to reference from actor.json

Example Use Cases

Test a scraper you're building

{
  "actorTechnicalName": "my-username/my-new-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Understand an external actor's output

{
  "actorTechnicalName": "apify/google-search-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Use custom test inputs

{
  "actorTechnicalName": "username/actor-name",
  "generateInputs": false,
  "existingMinimalInput": "{\"query\": \"test\"}",
  "existingNormalInput": "{\"query\": \"test\", \"maxResults\": 100}"
}

Limitations

Requires the target actor to have a valid INPUT_SCHEMA
Generated inputs may not always be perfect (depends on schema quality)
Cost: the target actor runs 4 times, and each run consumes compute units
AI enhancement requires OpenRouter credits (provided by Apify via APIFY_TOKEN)

Tips for best results

Check the target actor's INPUT_SCHEMA first to ensure it exists and is complete
Provide custom inputs if AI generation fails (some actors have complex requirements)
Use generateViews: false unless you specifically need dataset views
Review the generated schema before using in production
Run multiple times if you want schemas from different data samples

Understanding Dataset Schema vs Output Schema

Dataset Schema (dataset_schema.json)

Validates the structure of individual items in your dataset:

What fields exist (e.g., title, price, url)
Field types (string, number, boolean, array, object)
Which fields are required vs optional
Validation rules (patterns, min/max values)

Example use: Ensure all scraped items have required fields before processing
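
As an illustration of that example use, the Python sketch below checks only the "required" list of a dataset schema against a batch of items. The function name missing_required is hypothetical. Full validation of types and patterns would call for a complete JSON Schema validator rather than this check.

```python
def missing_required(items, schema):
    """Report items that lack fields listed in the schema's "required".

    Returns a dict mapping the item's index to the list of missing
    field names; an empty dict means every item passed.
    """
    required = schema.get("required", [])
    problems = {}
    for index, item in enumerate(items):
        missing = [field for field in required if field not in item]
        if missing:
            problems[index] = missing
    return problems
```

Running the check before downstream processing lets you skip or log incomplete items instead of failing later on a missing key.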

Output Schema (output_schema.json)

Defines what your actor returns and how it displays:

Return type: dataset, key-value store, or both
Display format in Apify Console
Links to view results
Metadata structure

Example use: Show users where to find their scraped data in the UI

Support

For questions or feature requests, contact Apify support through the Apify Console.

Powered by Claude Sonnet 4 • Built by Apify
