Output & Dataset Schema Creator
Pricing
Pay per usage
Generate output and dataset JSON schemas for your Actor using AI. Perfect for testing new actors.
Developer

Zuzka Pelechová
Dataset Schema External Actor
Automatically generate JSON schemas for any Apify actor's dataset output using AI. Perfect for actors without production data or when testing new actors.
What it does
This actor:
- Generates test inputs using AI (Claude Sonnet 4) based on the target actor's INPUT_SCHEMA
- Runs the target actor with multiple input variants (minimal, normal, maximal, edge cases)
- Analyzes the output datasets to generate a comprehensive JSON Schema
- Enhances the schema with AI-generated descriptions and examples
- Creates both schemas:
  - Dataset Schema: Validates the structure of items in your dataset (fields, types, required properties)
  - Output Schema: Defines what your actor returns (dataset, key-value store, etc.) and how it's displayed in Apify Console
When to use this actor
- Testing new actors before they have production data
- External actors you don't own but want to understand their output
- Rapid prototyping when you need schemas quickly
- Actors without production runs or insufficient data
Input
{
  "actorTechnicalName": "api-ninja/tripadvisor-reviews-scraper",
  "generateInputs": true,
  "enhanceSchema": true,
  "generateViews": false
}
Parameters
- actorTechnicalName (required): The actor to analyze (e.g., username/actor-name)
- generateInputs (optional, default: true): Generate test inputs with AI
- existingMinimalInput (optional): Provide your own minimal test input (JSON string)
- existingNormalInput (optional): Provide your own normal test input (JSON string)
- existingMaximalInput (optional): Provide your own maximal test input (JSON string)
- existingEdgeInput (optional): Provide your own edge case input (JSON string)
- enhanceSchema (optional, default: true): Enhance schema with AI descriptions
- existingEnhancedSchema (optional): Skip generation and use existing schema (JSON string)
- generateViews (optional, default: false): Generate dataset views
Output
The primary output is the Schemas Bundle (shown first in results), which contains:
{
  "schemas": {
    "dataset": {
      "title": "Dataset Schema",
      "description": "Validates the structure of items in your dataset (fields, types, required properties)",
      "schema": { /* Complete JSON Schema */ }
    },
    "output": {
      "title": "Output Schema",
      "description": "Defines what your actor returns (dataset/KV store) and how it displays in Apify Console",
      "schema": { /* OUTPUT_SCHEMA.json format */ }
    }
  },
  "metadata": {
    "actorName": "tripadvisor-reviews-scraper",
    "generatedAt": "2026-02-06T...",
    "enhancementUsed": true,
    "inputsUsed": "generated"
  },
  "usage": {
    "datasetSchemaPath": ".actor/dataset_schema.json",
    "outputSchemaPath": ".actor/output_schema.json",
    "instructions": "Copy the schemas to your actor repository"
  }
}
How to use the output
- View the Schemas Bundle (default output in Apify Console)
- Copy the dataset schema from schemas.dataset.schema
- Copy the output schema from schemas.output.schema
- Save to your actor:
  - Dataset schema → .actor/dataset_schema.json
  - Output schema → .actor/output_schema.json
- Update actor.json to reference these schemas
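The copy-and-save steps above could be scripted roughly as follows. This is a minimal sketch, assuming the Schemas Bundle has already been loaded as a Python dict shaped like the example in the Output section. `save_schemas` is a hypothetical helper name and is not part of this actor.

```python
import json
from pathlib import Path


def save_schemas(bundle: dict, actor_dir: str = ".") -> list[Path]:
    """Write the dataset and output schemas from a Schemas Bundle to disk."""
    usage = bundle.get("usage", {})
    # Prefer the paths reported in the bundle's "usage" section, if any.
    targets = {
        "dataset": usage.get("datasetSchemaPath", ".actor/dataset_schema.json"),
        "output": usage.get("outputSchemaPath", ".actor/output_schema.json"),
    }
    written = []
    for kind, rel_path in targets.items():
        schema = bundle["schemas"][kind]["schema"]
        path = Path(actor_dir) / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(schema, indent=2) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

The sketch only writes the two files. Referencing them from actor.json is still a manual step.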
How it works
Step 1: Smart Input Generation
- Fetches the target actor's INPUT_SCHEMA from Apify API
- Extracts prefill/default values as a base
- Uses Claude Sonnet 4 to generate 4 input variants:
  - Minimal: Essential fields only, 3 items max
  - Normal: Common use case with realistic data
  - Maximal: All available fields and options
  - Edge: Invalid/nonexistent data to test error handling
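To illustrate the "prefill/default values as a base" step, here is a minimal sketch of how such a base input might be derived from an input schema. The function name `base_input_from_schema` is hypothetical. Letting `prefill` take precedence over `default` is an assumption, since the actor's actual logic is not documented here.

```python
def base_input_from_schema(input_schema: dict) -> dict:
    """Collect prefill/default values from an input schema's properties."""
    base = {}
    for name, spec in input_schema.get("properties", {}).items():
        # Assumption: prefill wins over default when both are present.
        if "prefill" in spec:
            base[name] = spec["prefill"]
        elif "default" in spec:
            base[name] = spec["default"]
    return base


# Illustrative schema, not taken from a real actor.
schema = {
    "properties": {
        "query": {"type": "string", "prefill": "pizza"},
        "maxResults": {"type": "integer", "default": 10},
        "proxy": {"type": "object"},
    }
}
print(base_input_from_schema(schema))  # {'query': 'pizza', 'maxResults': 10}
```

Fields with neither key (like `proxy` above) are left out of the base, which is where AI-generated values would presumably fill in.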
Step 2: Dataset Collection
- Runs the target actor 4 times in parallel with different inputs
- Collects output datasets from successful runs
- Validates that at least 1 run succeeds (need data to generate schema)
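The parallel-run-and-collect pattern in Step 2 can be sketched as below. This is an illustration under assumptions, not the actor's source: `run_actor` is a hypothetical callable that you would implement with your preferred Apify client, taking one input dict and returning that run's dataset items (or raising on failure).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def collect_datasets(
    run_actor: Callable[[dict], list[dict]],
    variants: dict[str, dict],
) -> dict[str, list[dict]]:
    """Run every input variant in parallel and keep items from successful runs."""
    results: dict[str, list[dict]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(variants))) as pool:
        futures = {
            name: pool.submit(run_actor, run_input)
            for name, run_input in variants.items()
        }
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:  # a failed run is skipped, not fatal
                print(f"{name} run failed: {exc}")
    if not results:
        # Mirrors the "at least 1 run succeeds" check described above.
        raise RuntimeError("No run succeeded, so there is no data to build a schema from")
    return results
```

Tolerating individual failures matters here because the edge-case variant is designed to provoke errors.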
Step 3: Schema Generation
- Analyzes all dataset items to determine field types
- Calculates presence percentages for each field
- Determines required vs optional fields
- Identifies array items, nested objects, enums
- Generates complete JSON Schema (draft-07)
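A simplified version of this analysis might look like the sketch below. It handles only top-level fields (no nested objects, array items, or enums). The `required_threshold` of 100% presence is an assumption, as the actual cutoff is not stated. `infer_schema` and `json_type` are hypothetical names.

```python
from collections import Counter, defaultdict

JSON_TYPES = [
    (bool, "boolean"),  # bool must be checked before int (bool subclasses int)
    (int, "integer"),
    (float, "number"),
    (str, "string"),
    (list, "array"),
    (dict, "object"),
]


def json_type(value) -> str:
    """Map a Python value to a JSON Schema type name."""
    if value is None:
        return "null"
    for py_type, name in JSON_TYPES:
        if isinstance(value, py_type):
            return name
    return "string"  # fallback for values that are not plain JSON


def infer_schema(items: list[dict], required_threshold: float = 1.0) -> dict:
    """Infer a flat draft-07 schema from dataset items."""
    presence = Counter()
    types = defaultdict(set)
    for item in items:
        for key, value in item.items():
            presence[key] += 1
            types[key].add(json_type(value))
    properties = {}
    for key in presence:
        seen = sorted(types[key])
        properties[key] = {"type": seen[0] if len(seen) == 1 else seen}
    # A field is required when its presence ratio reaches the threshold.
    required = sorted(
        key for key, count in presence.items()
        if count / len(items) >= required_threshold
    )
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": properties,
        "required": required,
    }
```

Lowering `required_threshold` (for example to 0.9) would treat fields missing from a few items as still required, which is a trade-off between strictness and tolerance for sparse data.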
Step 4: AI Enhancement
- Uses Claude Sonnet 4 to add:
  - Human-readable descriptions for each field
  - Example values based on actual data
  - Pattern validation rules
  - Better field titles and documentation
Step 5: Output Schema Creation
- Generates Apify OUTPUT_SCHEMA.json format
- Defines what the actor returns (dataset, key-value store)
- Specifies how results display in Apify Console
- Ready to use in actor.json
Example Use Cases
Test a scraper you're building
{
  "actorTechnicalName": "my-username/my-new-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}
Understand an external actor's output
{
  "actorTechnicalName": "apify/google-search-scraper",
  "generateInputs": true,
  "enhanceSchema": true
}
Use custom test inputs
{
  "actorTechnicalName": "username/actor-name",
  "generateInputs": false,
  "existingMinimalInput": "{\"query\": \"test\"}",
  "existingNormalInput": "{\"query\": \"test\", \"maxResults\": 100}"
}
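Because the custom inputs are passed as JSON strings rather than nested objects, hand-escaping the quotes is error-prone. A small helper can do the encoding instead. This is a sketch: `build_run_input` is a hypothetical name, and setting `generateInputs` to false whenever any custom input is supplied is an assumption modeled on the example above.

```python
import json

# Maps a short variant name to the documented input parameter.
VARIANT_FIELDS = {
    "minimal": "existingMinimalInput",
    "normal": "existingNormalInput",
    "maximal": "existingMaximalInput",
    "edge": "existingEdgeInput",
}


def build_run_input(actor_name: str, **custom_inputs: dict) -> dict:
    """Build this actor's input, encoding each custom test input as a JSON string."""
    run_input = {
        "actorTechnicalName": actor_name,
        "generateInputs": not custom_inputs,
    }
    for variant, value in custom_inputs.items():
        if variant not in VARIANT_FIELDS:
            raise ValueError(f"Unknown variant: {variant}")
        run_input[VARIANT_FIELDS[variant]] = json.dumps(value)
    return run_input


run_input = build_run_input(
    "username/actor-name",
    minimal={"query": "test"},
    normal={"query": "test", "maxResults": 100},
)
```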
Limitations
- Requires the target actor to have a valid INPUT_SCHEMA
- Generated inputs may not always be perfect (depends on schema quality)
- Costs: Runs the target actor 4 times (uses their compute units)
- AI enhancement requires OpenRouter credits (provided by Apify via APIFY_TOKEN)
Tips for best results
- Check the target actor's INPUT_SCHEMA first to ensure it exists and is complete
- Provide custom inputs if AI generation fails (some actors have complex requirements)
- Use generateViews: false unless you specifically need dataset views
- Review the generated schema before using in production
- Run multiple times if you want schemas from different data samples
Understanding Dataset Schema vs Output Schema
Dataset Schema (dataset_schema.json)
Validates the structure of individual items in your dataset:
- What fields exist (e.g., title, price, url)
- Field types (string, number, boolean, array, object)
- Which fields are required vs optional
- Validation rules (patterns, min/max values)
Example use: Ensure all scraped items have required fields before processing
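As a rough illustration of that use, the sketch below checks one item against a schema's `required` list and top-level `type` declarations using only the standard library. `check_item` is a hypothetical helper that covers just these two rules. For complete draft-07 validation (patterns, min/max values, nested objects), a full validator such as the `jsonschema` package would be the usual choice.

```python
PYTHON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "array": list,
    "object": dict,
    "null": type(None),
}


def matches(value, type_name: str) -> bool:
    """Check a value against one JSON Schema type name."""
    # In Python, bool is a subclass of int, so exclude it from numeric types.
    if isinstance(value, bool) and type_name in ("number", "integer"):
        return False
    return isinstance(value, PYTHON_TYPES.get(type_name, object))


def check_item(item: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the item passed."""
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in item
    ]
    for name, spec in schema.get("properties", {}).items():
        if name not in item or "type" not in spec:
            continue
        allowed = spec["type"] if isinstance(spec["type"], list) else [spec["type"]]
        if not any(matches(item[name], t) for t in allowed):
            problems.append(f"wrong type for {name}: expected {allowed}")
    return problems
```

Items with a non-empty problem list can then be logged or dropped before further processing.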
Output Schema (output_schema.json)
Defines what your actor returns and how it displays:
- Return type: dataset, key-value store, or both
- Display format in Apify Console
- Links to view results
- Metadata structure
Example use: Show users where to find their scraped data in the UI
Support
For questions or feature requests, contact Apify support through the Apify Console.
Powered by Claude Sonnet 4 • Built by Apify