Pricing

$54.00 / 1,000 processed-rows

CSV/JSON Schema Normalizer

Normalize CSV and JSON rows into stable, typed, automation-ready records.

Developer

Zentra

Last modified

a day ago

What this Actor does

What it does

Processes configured public sources or user-provided records for focused CSV/JSON schema normalization.
Emits structured rows with source references, stable identifiers, confidence, warnings, and run summary fields.
Supports sample-mode runs so Apify Store QA and first-time users can inspect output without depending on live third-party availability.

What it does not do

Does not scrape private, login-only, paywalled, or access-restricted data unless the user provides approved credentials for a source they control.
Does not guarantee every field is available from every source; missing or blocked fields are returned as warnings or nulls.
Does not make legal, financial, compliance, procurement, medical, safety, or regulatory decisions.

Who this is for

Developers, analysts, data operations teams, AI-agent builders, and automation owners use this actor when they need focused CSV/JSON schema normalization output instead of a broad generic scraper or manual checking.

Buyer outcomes

Turn CSV/JSON inputs into repeatable structured output for downstream systems.
Prioritize cleanup with schema, quality, extraction, change, warning, and error fields.
Route normalized rows into Apify datasets, APIs, spreadsheets, automations, or AI-agent workflows.

Data sources

Sources monitored

Apify datasets/storage

Input

sourceMode: use sample for a smoke run, startUrls for URL-backed PDFs/datasets/pages, or configured dataset modes.
startUrls: PDF URLs, dataset URLs, public files, or pages to parse, audit, normalize, extract, or compare.
sourceIds: approved source or dataset identifiers used to scope the run.
maxItems: bounded number of files, tables, rows, fields, or changes to process.
watchlistTerms: optional column names, schema keys, quality rules, or extraction terms.
webhookUrl: optional completion destination for the transformation report.
outputMode: use sample records for Store validation or production output for normal runs.

How it transforms the input

Input: PDF, CSV, JSON, Apify dataset URL, table-like document, website, or messy operational data.
Transformation: parse, extract, normalize, audit, compare, dedupe, or report schema/quality issues.
Output: normalized fields, extracted tables/rows, schema report, diff report, warnings, confidence, and errors.
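
The parse, normalize, and dedupe steps above can be illustrated with a minimal Python sketch. It is not the actor's implementation: the function names `coerce` and `normalize_csv`, the snake_case key convention, and the type-coercion rules are all assumptions chosen for illustration.

```python
import csv
import io


def coerce(value):
    """Turn a raw CSV string into None, bool, int, float, or a trimmed string."""
    text = value.strip()
    if text == "":
        return None
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def normalize_csv(raw_csv):
    """Parse CSV text into typed rows with snake_case keys, dropping exact duplicates."""
    rows, warnings, seen = [], [], set()
    reader = csv.DictReader(io.StringIO(raw_csv))
    for index, record in enumerate(reader, start=1):
        row = {
            key.strip().lower().replace(" ", "_"): coerce(value or "")
            for key, value in record.items()
            if key is not None  # surplus cells beyond the header are ignored
        }
        # Hypothetical duplicate rule: two rows match when every typed cell matches.
        fingerprint = tuple(sorted((k, repr(v)) for k, v in row.items()))
        if fingerprint in seen:
            warnings.append(f"row {index}: duplicate row dropped")
            continue
        seen.add(fingerprint)
        rows.append(row)
    return rows, warnings
```

Returning warnings next to the rows mirrors the Output line above, where problems are reported rather than silently discarded.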

Output

The actor returns structured transformation records: extracted tables, normalized schemas, dataset quality metrics, diff reports, parsed fields, warnings, errors, and confidence signals.

Family-specific fields to expect:

extractedRows: Rows parsed or produced by the transformation.
schema: Detected, normalized, or target schema.
columns: Detected table or dataset columns.
validationErrors: Validation, parse, schema, or quality errors.
duplicateCount: Duplicate rows or keys found during audit/dedupe.
nullRate: Null or empty-value rate for important fields.
changedRecords: Added, removed, or changed records for diff workflows.
recordId: Stable record ID for exports, dedupe, and downstream joins.
title: Human-readable record title for review and export.
sourceName: Source identifier used to trace where the record came from.
sourceUrl: Direct source URL for review and audit.
dedupeKey: Stable key used for delta mode and duplicate suppression.
retrievedAt: Timestamp showing when the actor retrieved or generated this record.
score: Score or match value used for filtering, routing, or downstream review.
scoreReasons: Buyer-readable explanation for the score or match.
confidence: Confidence signal used for filtering, routing, or downstream review.
errors: Errors recorded for the record, used for filtering, routing, or downstream review.
runSummary: Run-level summary for counts, filters, charges, and next actions.
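
To make fields such as dedupeKey, duplicateCount, and nullRate more concrete, here is a hedged Python sketch of how values like these could be derived on the client side. The hashing scheme, the 16-character key length, and the helper names `dedupe_key` and `audit` are assumptions; the actor's own definitions are not documented here.

```python
import hashlib
import json


def dedupe_key(row, key_fields):
    """Hash the chosen fields into a stable key that ignores field order."""
    payload = json.dumps(
        {field: row.get(field) for field in sorted(key_fields)},
        sort_keys=True,
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def audit(rows, key_fields, important_fields):
    """Compute duplicateCount and nullRate for a list of dict rows."""
    seen, duplicates, empty = set(), 0, 0
    for row in rows:
        key = dedupe_key(row, key_fields)
        if key in seen:
            duplicates += 1
        seen.add(key)
        # Assumption: None and the empty string both count as "null".
        empty += sum(1 for field in important_fields if row.get(field) in (None, ""))
    cells = len(rows) * len(important_fields)
    return {
        "duplicateCount": duplicates,
        "nullRate": round(empty / cells, 4) if cells else 0.0,
    }
```

Because the key depends only on the selected field values, the same record produces the same key on every run, which is the property a stable identifier needs for downstream joins.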

Pricing

This actor uses Apify pay-per-event pricing. Current public listing guidance: $29-$49 / 1,000 launch validation records until public data proof is complete. Charges are tied to buyer-visible value events such as row-processed, schema-created, dataset-processed, record-saved, and enriched-record. Small validation runs are supported so you can inspect output before scaling a schedule.

row-processed: Charged after producing one normalized row. Typical price: $0.002.
schema-created: Charged after producing one inferred schema. Typical price: $0.080.
dataset-processed: Base charge when CSV/JSON Schema Normalizer writes a non-empty default dataset. Typical price: $0.011.
record-saved: Charged for each buyer-visible result saved by CSV/JSON Schema Normalizer. Typical price: $0.003.
first-run-cap: Recommended first-run budget cap. Typical cap: $2.000. Start with the default small run, inspect the dataset, then raise maxItems or schedule recurring runs.

For every event above, a run that produces 10 matching records is charged only for the matched buyer-value events and remains capped by the run limit.
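
As a rough way to budget a run from the typical prices listed above, the following Python sketch sums price times count per event and clamps the result to a cap. It assumes charges are simply additive and that the cap behaves as a hard ceiling; which events actually fire for a given run is not specified here, so the event counts are hypothetical inputs.

```python
from decimal import Decimal

# Typical prices quoted in the pricing list above (USD per event).
TYPICAL_PRICES = {
    "row-processed": Decimal("0.002"),
    "schema-created": Decimal("0.080"),
    "dataset-processed": Decimal("0.011"),
    "record-saved": Decimal("0.003"),
}
FIRST_RUN_CAP = Decimal("2.000")


def estimate_charge(event_counts, cap=FIRST_RUN_CAP):
    """Sum price * count over the given events, then clamp to the budget cap.

    Raises KeyError for an event with no listed price (for example enriched-record).
    """
    total = sum(
        (TYPICAL_PRICES[name] * count for name, count in event_counts.items()),
        Decimal("0"),
    )
    return min(total, cap)
```

Decimal is used instead of float so that fractions of a cent add up exactly.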

API example

curl -X POST "https://api.apify.com/v2/actors/zentrafoundry~csv-json-schema-normalizer/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxItems":10,"sourceIds":["APIFY-DATASETS"],"includeSourceUrls":true,"includeMatchReasons":true,"outputMode":"buyer-ready-records"}'
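
The same request can be sketched in Python with only the standard library. The endpoint URL, headers, and input body are copied from the curl example above and are worth checking against the Apify API reference. The helper names `build_run_request` and `start_run` are hypothetical, and the shape of the response is not documented here, so the sketch returns the decoded JSON without interpreting it.

```python
import json
import urllib.request

# Endpoint copied verbatim from the curl example above.
RUN_URL = "https://api.apify.com/v2/actors/zentrafoundry~csv-json-schema-normalizer/runs"


def build_run_request(token, max_items=10):
    """Build (but do not send) the POST request that starts a run."""
    payload = {
        "maxItems": max_items,
        "sourceIds": ["APIFY-DATASETS"],
        "includeSourceUrls": True,
        "includeMatchReasons": True,
        "outputMode": "buyer-ready-records",
    }
    return urllib.request.Request(
        RUN_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_run(token):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_run_request(token), timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body before any charge-incurring call is made. Read the token from an environment variable such as APIFY_TOKEN rather than hard-coding it.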

Demo run

Recommended first run

{
    "maxItems": 10,
    "sourceIds": [
        "APIFY-DATASETS"
    ],
    "includeSourceUrls": true,
    "includeMatchReasons": true,
    "outputMode": "buyer-ready-records"
}

Sample output

Sample status: sample_unavailable at https://zentra.nimblique.studio/external/actor-review/samples/csv-json-schema-normalizer.json. No fabricated sample is published; run a bounded sample refresh against real data before using examples in promotion.

Recommended public tasks

[
    {
        "name": "Validate one small data transformation",
        "description": "Low-cost validation run for checking parsed, normalized, audited, or diffed output.",
        "input": {
            "maxItems": 10,
            "sourceIds": [
                "APIFY-DATASETS"
            ],
            "includeSourceUrls": true,
            "includeMatchReasons": true,
            "outputMode": "buyer-ready-records",
            "actorSlug": "csv-json-schema-normalizer"
        }
    },
    {
        "name": "Recurring dataset utility check",
        "description": "Recurring batch for schema, quality, extraction, or change reports.",
        "schedule": "Daily during local business hours",
        "input": {
            "maxItems": 25,
            "sourceIds": [
                "APIFY-DATASETS"
            ],
            "includeSourceUrls": true,
            "includeMatchReasons": true,
            "outputMode": "buyer-ready-records",
            "actorSlug": "csv-json-schema-normalizer"
        }
    }
]

Example use cases

Clean, extract, compare, or audit CSV/JSON data before it enters a downstream workflow.
Convert messy inputs into predictable JSON/CSV-ready rows for APIs, spreadsheets, or agents.
Surface schema drift, duplicates, nulls, errors, warnings, or changed records.
Use small validation runs before connecting larger datasets or destinations.
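
For the changed-records use case, one possible client-side approach is to compare the datasets from two runs by their dedupeKey. The sketch below assumes each record carries the dedupeKey and retrievedAt fields described in the Output section; the function name `diff_records` and the choice to ignore retrievedAt are illustrative assumptions, not the actor's documented diff logic.

```python
def diff_records(previous, current, key="dedupeKey"):
    """Classify records as added, removed, or changed between two runs."""
    old = {record[key]: record for record in previous}
    new = {record[key]: record for record in current}
    ignored = {"retrievedAt"}  # differs on every run, so it is not a real change

    def body(record):
        return {k: v for k, v in record.items() if k not in ignored}

    return {
        "added": [new[k] for k in new if k not in old],
        "removed": [old[k] for k in old if k not in new],
        "changed": [
            new[k] for k in new if k in old and body(new[k]) != body(old[k])
        ],
    }
```

A scheduled task could store the previous run's dataset and pass both lists to this function to produce a compact change report.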

Trust and compliance

Uses Apify datasets/storage.
Keeps source URLs and source identifiers in output records for auditability.
Does not require private credentials unless a source is explicitly configured for approved authenticated access.

Reliability and QA

Prefilled Apify Store QA input runs in sample mode and should finish within the automated quality window.
Empty input is handled with deterministic sample or diagnostic output instead of a crash.
Demo/sample runs suppress buyer-value charges while still writing representative dataset rows.
Production runs use bounded maxItems, source references, warnings, and run summaries so blocked or changed targets are visible.

Limitations

Results depend on public-source availability, source uptime, and source update cadence.
Public sources can revise records after publication; rerun scheduled tasks for fresh evidence.
Scores and match reasons are decision-support signals, not legal, financial, procurement, medical, safety, or regulatory advice.
Large production runs can cost more than the default smoke run; start small, inspect output, then scale schedules.

Legal and responsible use

Use this Actor only for public data or data you are authorized to process. You are responsible for complying with applicable laws, marketplace terms, robots policies, privacy rules, and source-specific limits.

Support

Open an issue on the Actor page with the run ID, input summary, expected result, and observed result. Do not include secrets, cookies, auth headers, or private account data.

FAQ

Can I run this without URLs? Yes. The default sample mode is designed to succeed without user-supplied URLs, and URL-backed runs can use startUrls when needed.

Can I schedule it? Yes. Use sinceLastRun, watchlistTerms, and optional webhookUrl to turn the actor into a recurring alert or report workflow.

How do I verify value before scaling? Run the recommended first-run input, review the sample output fields, then increase maxItems or schedule recurring runs after the dataset matches your use case.
