Output & Dataset Schema Creator

Pricing

Pay per usage

Generate JSON schemas for output and dataset on your Actor using AI. Perfect for testing new actors.

Rating

0.0

(0)

Developer

Zuzka Pelechová

Maintained by Community

Actor stats

0

Bookmarked

1

Total users

0

Monthly active users

18 hours ago

Last modified

Dataset Schema External Actor

Automatically generate JSON schemas for any Apify actor's dataset output using AI. Perfect for actors without production data or when testing new actors.

What it does

This actor:

  1. Generates test inputs using AI (Claude Sonnet 4) based on the target actor's INPUT_SCHEMA
  2. Runs the target actor with multiple input variants (minimal, normal, maximal, edge cases)
  3. Analyzes the output datasets to generate a comprehensive JSON Schema
  4. Enhances the schema with AI-generated descriptions and examples
  5. Creates both schemas:
    • Dataset Schema: Validates the structure of items in your dataset (fields, types, required properties)
    • Output Schema: Defines what your actor returns (dataset, key-value store, etc.) and how it's displayed in Apify Console

When to use this actor

  • Testing new actors before they have production data
  • Understanding the output of external actors you don't own
  • Rapid prototyping when you need schemas quickly
  • Actors with no production runs or too little data

Input

{
  "actorTechnicalName": "api-ninja/tripadvisor-reviews-scraper",
  "generateInputs": true,
  "enhanceSchema": true,
  "generateViews": false
}

Parameters

  • actorTechnicalName (required): The actor to analyze (e.g., username/actor-name)
  • generateInputs (optional, default: true): Generate test inputs with AI
  • existingMinimalInput (optional): Provide your own minimal test input (JSON string)
  • existingNormalInput (optional): Provide your own normal test input (JSON string)
  • existingMaximalInput (optional): Provide your own maximal test input (JSON string)
  • existingEdgeInput (optional): Provide your own edge case input (JSON string)
  • enhanceSchema (optional, default: true): Enhance schema with AI descriptions
  • existingEnhancedSchema (optional): Skip generation and use existing schema (JSON string)
  • generateViews (optional, default: false): Generate dataset views
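The defaults listed above can be made explicit before a run. The following is a minimal sketch, not the actor's own code: the helper name `normalize_input` and the strictness of the name check are assumptions for illustration.

```python
# Defaults taken from the parameter list above.
DEFAULTS = {"generateInputs": True, "enhanceSchema": True, "generateViews": False}


def normalize_input(raw: dict) -> dict:
    """Check the required field and fill in the documented defaults.

    Hypothetical helper: the real actor may validate its input differently.
    """
    name = raw.get("actorTechnicalName")
    # Expect exactly "username/actor-name" with both parts non-empty.
    if not isinstance(name, str) or name.count("/") != 1 or not all(name.split("/")):
        raise ValueError("actorTechnicalName must look like 'username/actor-name'")
    # Values supplied by the caller override the defaults.
    return {**DEFAULTS, **raw}
```

Calling `normalize_input({"actorTechnicalName": "apify/google-search-scraper"})` returns the same object with the three boolean flags set to their defaults.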

Output

The primary output is the Schemas Bundle (shown first in results), which contains:

{
  "schemas": {
    "dataset": {
      "title": "Dataset Schema",
      "description": "Validates the structure of items in your dataset (fields, types, required properties)",
      "schema": { /* Complete JSON Schema */ }
    },
    "output": {
      "title": "Output Schema",
      "description": "Defines what your actor returns (dataset/KV store) and how it displays in Apify Console",
      "schema": { /* OUTPUT_SCHEMA.json format */ }
    }
  },
  "metadata": {
    "actorName": "tripadvisor-reviews-scraper",
    "generatedAt": "2026-02-06T...",
    "enhancementUsed": true,
    "inputsUsed": "generated"
  },
  "usage": {
    "datasetSchemaPath": ".actor/dataset_schema.json",
    "outputSchemaPath": ".actor/output_schema.json",
    "instructions": "Copy the schemas to your actor repository"
  }
}

How to use the output

  1. View the Schemas Bundle (default output in Apify Console)
  2. Copy the dataset schema from schemas.dataset.schema
  3. Copy the output schema from schemas.output.schema
  4. Save to your actor:
    • Dataset schema → .actor/dataset_schema.json
    • Output schema → .actor/output_schema.json
  5. Update actor.json to reference these schemas
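Steps 2 to 5 can be scripted once the Schemas Bundle has been downloaded as JSON. This is a rough sketch under stated assumptions: `save_schemas` is a hypothetical helper, and the `storages.dataset` and `output` keys written to actor.json reflect how Apify's actor.json is generally documented, so confirm them against the current Apify docs before relying on them.

```python
import json
from pathlib import Path


def save_schemas(bundle: dict, actor_dir: Path) -> None:
    """Write both schemas from a Schemas Bundle into the .actor directory."""
    actor_dir.mkdir(parents=True, exist_ok=True)
    schemas = bundle["schemas"]

    # Paths match the "usage" section of the bundle.
    (actor_dir / "dataset_schema.json").write_text(
        json.dumps(schemas["dataset"]["schema"], indent=2)
    )
    (actor_dir / "output_schema.json").write_text(
        json.dumps(schemas["output"]["schema"], indent=2)
    )

    # Only touch actor.json if it already exists; never create one from scratch.
    actor_json_path = actor_dir / "actor.json"
    if actor_json_path.exists():
        actor_json = json.loads(actor_json_path.read_text())
        # Assumption: these are the keys actor.json uses to reference the schemas.
        actor_json.setdefault("storages", {})["dataset"] = "./dataset_schema.json"
        actor_json["output"] = "./output_schema.json"
        actor_json_path.write_text(json.dumps(actor_json, indent=2))
```

Review the resulting actor.json by hand afterwards, since existing references are overwritten.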

How it works

Step 1: Smart Input Generation

  • Fetches the target actor's INPUT_SCHEMA from the Apify API
  • Extracts prefill/default values as a base
  • Uses Claude Sonnet 4 to generate 4 input variants:
    • Minimal: Essential fields only, 3 items max
    • Normal: Common use case with realistic data
    • Maximal: All available fields and options
    • Edge: Invalid/nonexistent data to test error handling
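The "prefill/default values as a base" step could look roughly like the sketch below. It assumes the input schema keeps its fields under `properties` and that each field may carry `prefill` and `default` keys; preferring `prefill` over `default` is an assumption made here, not something the actor documents.

```python
def base_input_from_schema(input_schema: dict) -> dict:
    """Build a starting input from a schema's prefill/default values."""
    base = {}
    for name, spec in input_schema.get("properties", {}).items():
        if "prefill" in spec:
            # Assumption: prefill wins when both keys are present.
            base[name] = spec["prefill"]
        elif "default" in spec:
            base[name] = spec["default"]
        # Fields with neither key are left for the AI step to fill in.
    return base
```

The four variants would then be derived from this base, for example by trimming it for the minimal case.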

Step 2: Dataset Collection

  • Runs the target actor 4 times in parallel with different inputs
  • Collects output datasets from successful runs
  • Requires at least one run to succeed, because schema generation needs data
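A minimal sketch of this step, assuming a caller-supplied `run_actor` function that starts one run and returns its dataset items (or raises on failure). Both `run_actor` and `collect_items` are hypothetical names; the actor's real implementation is not shown in this README.

```python
from concurrent.futures import ThreadPoolExecutor


def collect_items(run_actor, inputs: dict) -> dict:
    """Run every input variant in parallel and keep the successful datasets.

    inputs maps a variant name ("minimal", "normal", ...) to its input object.
    """
    def attempt(pair):
        variant, variant_input = pair
        try:
            return variant, run_actor(variant_input)
        except Exception:
            # A failed run (likely for the edge variant) is tolerated.
            return variant, None

    with ThreadPoolExecutor(max_workers=len(inputs) or 1) as pool:
        results = dict(pool.map(attempt, inputs.items()))

    successful = {v: items for v, items in results.items() if items is not None}
    if not successful:
        raise RuntimeError("All runs failed; there is no data to build a schema from")
    return successful
```

Threads suit this case because each run mostly waits on the network rather than using the CPU.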

Step 3: Schema Generation

  • Analyzes all dataset items to determine field types
  • Calculates presence percentages for each field
  • Determines required vs optional fields
  • Identifies array items, nested objects, enums
  • Generates complete JSON Schema (draft-07)
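To illustrate how presence percentages can drive the required/optional split, here is a simplified sketch. It handles top-level fields only (no recursion into nested objects, array items, or enums), and the `required_threshold` parameter is an assumption: the README does not state what cutoff the actor uses.

```python
from collections import Counter

# bool must be checked before int because bool is a subclass of int in Python.
JSON_TYPES = [
    (bool, "boolean"), (int, "integer"), (float, "number"),
    (str, "string"), (list, "array"), (dict, "object"),
]


def json_type(value) -> str:
    """Map a Python value to its JSON Schema type name."""
    if value is None:
        return "null"
    for py_type, name in JSON_TYPES:
        if isinstance(value, py_type):
            return name
    raise TypeError(f"Unsupported value: {value!r}")


def infer_schema(items: list, required_threshold: float = 1.0) -> dict:
    """Infer a flat draft-07 schema from a list of dataset items."""
    presence = Counter()
    types = {}
    for item in items:
        for key, value in item.items():
            presence[key] += 1
            types.setdefault(key, set()).add(json_type(value))

    properties = {}
    for key, seen in types.items():
        names = sorted(seen)
        # A single observed type is written as a string, several as a list.
        properties[key] = {"type": names[0] if len(names) == 1 else names}

    # A field is required when its share of items reaches the threshold.
    required = sorted(
        key for key, count in presence.items()
        if count / len(items) >= required_threshold
    )
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }
```

With the default threshold of 1.0, only fields present in every item are marked required; lowering it trades strictness for tolerance of sparse data.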

Step 4: AI Enhancement

  • Uses Claude Sonnet 4 to add:
    • Human-readable descriptions for each field
    • Example values based on actual data
    • Pattern validation rules
    • Better field titles and documentation

Step 5: Output Schema Creation

  • Generates Apify OUTPUT_SCHEMA.json format
  • Defines what the actor returns (dataset, key-value store)
  • Specifies how results display in Apify Console
  • Ready to use in actor.json
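For orientation, an output schema of this kind might be assembled as in the sketch below. The field names (`actorOutputSchemaVersion`, `template`) and the `{{links...}}` placeholders are written from recollection of Apify's output schema format and should be treated as assumptions to verify against the official docs; `build_output_schema` is a hypothetical helper, not part of this actor.

```python
def build_output_schema(title: str, include_key_value_store: bool = False) -> dict:
    """Assemble a small output schema pointing at the run's default storages."""
    properties = {
        "dataset": {
            "type": "string",
            "title": "Dataset items",
            # Assumption: placeholder resolved by the platform at run time.
            "template": "{{links.apiDefaultDatasetUrl}}/items",
        }
    }
    if include_key_value_store:
        properties["keyValueStore"] = {
            "type": "string",
            "title": "Key-value store records",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys",
        }
    return {
        "actorOutputSchemaVersion": 1,
        "title": title,
        "properties": properties,
    }
```

The generated schema in the bundle may contain more detail than this; the sketch only shows the overall shape.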

Example Use Cases

Test a scraper you're building

{
  "actorTechnicalName": "my-username/my-new-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Understand an external actor's output

{
  "actorTechnicalName": "apify/google-search-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}

Use custom test inputs

{
  "actorTechnicalName": "username/actor-name",
  "generateInputs": false,
  "existingMinimalInput": "{\"query\": \"test\"}",
  "existingNormalInput": "{\"query\": \"test\", \"maxResults\": 100}"
}
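Because the existing*Input parameters take JSON strings rather than objects, hand-escaping the quotes is error-prone. One way to avoid that is to build the input in code and let `json.dumps` do the escaping. `build_run_input` is a hypothetical helper for illustration.

```python
import json


def build_run_input(actor_name: str, minimal: dict, normal: dict) -> dict:
    """Build an input with custom test inputs encoded as JSON strings."""
    return {
        "actorTechnicalName": actor_name,
        "generateInputs": False,
        # json.dumps produces the escaped string form the parameters expect.
        "existingMinimalInput": json.dumps(minimal),
        "existingNormalInput": json.dumps(normal),
    }


run_input = build_run_input(
    "username/actor-name",
    minimal={"query": "test"},
    normal={"query": "test", "maxResults": 100},
)
```

Serializing `run_input` with `json.dumps` again yields the doubly quoted form shown in the example above.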

Limitations

  • Requires the target actor to have a valid INPUT_SCHEMA
  • Generated inputs may not always be perfect (depends on schema quality)
  • Cost: the target actor runs 4 times, and each run consumes compute units
  • AI enhancement requires OpenRouter credits (Apify provides these via APIFY_TOKEN)

Tips for best results

  1. Check the target actor's INPUT_SCHEMA first to ensure it exists and is complete
  2. Provide custom inputs if AI generation fails (some actors have complex requirements)
  3. Use generateViews: false unless you specifically need dataset views
  4. Review the generated schema before using in production
  5. Run multiple times if you want schemas from different data samples

Understanding Dataset Schema vs Output Schema

Dataset Schema (dataset_schema.json)

Validates the structure of individual items in your dataset:

  • What fields exist (e.g., title, price, url)
  • Field types (string, number, boolean, array, object)
  • Which fields are required vs optional
  • Validation rules (patterns, min/max values)

Example use: Ensure all scraped items have required fields before processing
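As a small illustration of that use, the sketch below separates items that contain every required field from those that do not. It reads only the schema's `required` list and checks no types or patterns; a full JSON Schema validator would be needed for that. `split_by_required` is a hypothetical name.

```python
def split_by_required(items: list, dataset_schema: dict) -> tuple:
    """Return (valid, invalid) items based on the schema's required fields."""
    required = dataset_schema.get("required", [])
    valid, invalid = [], []
    for item in items:
        missing = [field for field in required if field not in item]
        # Items missing any required field go to the invalid bucket.
        (invalid if missing else valid).append(item)
    return valid, invalid
```

Downstream processing can then work on the valid list while the invalid items are logged for review.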

Output Schema (output_schema.json)

Defines what your actor returns and how it displays:

  • Return type: dataset, key-value store, or both
  • Display format in Apify Console
  • Links to view results
  • Metadata structure

Example use: Show users where to find their scraped data in the UI

Support

For questions or feature requests, contact Apify support through the Apify Console.


Powered by Claude Sonnet 4 • Built by Apify