CSV/JSON Schema Normalizer

Pricing

$54.00 / 1,000 processed-rows


Normalize CSV and JSON rows into stable, typed, automation-ready records.

Developer

Zentra

Maintained by Community

Transform CSV and JSON inputs into structured rows with clear errors, confidence signals, and automation-ready output.

Who this is for

Developers, analysts, data operations teams, AI-agent builders, and automation owners use this actor when they need focused schema-normalization output for CSV and JSON data instead of a broad generic scraper or manual checking.

Buyer outcomes

  • Turn CSV and JSON inputs into repeatable structured output for downstream systems.
  • Prioritize cleanup with schema, quality, extraction, change, warning, and error fields.
  • Route normalized rows into Apify datasets, APIs, spreadsheets, automations, or AI-agent workflows.

Inputs

  • sourceMode: use sample for a smoke run, startUrls for URL-backed PDFs/datasets/pages, or configured dataset modes.
  • startUrls: PDF URLs, dataset URLs, public files, or pages to parse, audit, normalize, extract, or compare.
  • sourceIds: approved source or dataset identifiers used to scope the run.
  • maxItems: bounded number of files, tables, rows, fields, or changes to process.
  • watchlistTerms: optional column names, schema keys, quality rules, or extraction terms.
  • webhookUrl: optional completion destination for the transformation report.
  • outputMode: use sample records for Store validation or production output for normal runs.
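
The input fields above can be assembled programmatically before a run. The following is a minimal sketch, not the actor's own validation: the helper name `build_run_input` is hypothetical, the `{"url": ...}` shape for `startUrls` is an assumption borrowed from common Apify actor conventions, and only the two `sourceMode` values named in the list are handled explicitly.

```python
from typing import Any, Optional


def build_run_input(
    source_mode: str = "sample",
    start_urls: Optional[list[str]] = None,
    max_items: int = 10,
    watchlist_terms: Optional[list[str]] = None,
    webhook_url: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble an input object using the documented field names (hypothetical helper)."""
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if source_mode == "startUrls" and not start_urls:
        raise ValueError("sourceMode 'startUrls' needs at least one URL")

    run_input: dict[str, Any] = {"sourceMode": source_mode, "maxItems": max_items}
    if start_urls:
        # Assumption: URLs are passed as {"url": ...} objects; the listing does not say.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    if watchlist_terms:
        run_input["watchlistTerms"] = list(watchlist_terms)
    if webhook_url:
        run_input["webhookUrl"] = webhook_url
    return run_input


# Smoke run with defaults, then a bounded URL-backed run.
smoke_input = build_run_input()
url_input = build_run_input("startUrls", ["https://example.com/data.csv"], max_items=25)
```

Keeping `maxItems` small in the first call mirrors the advice to inspect output before scaling.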

How it transforms the input

  • Input: PDF, CSV, JSON, Apify dataset URL, table-like document, website, or messy operational data.
  • Transformation: parse, extract, normalize, audit, compare, dedupe, or report schema/quality issues.
  • Output: normalized fields, extracted tables/rows, schema report, diff report, warnings, confidence, and errors.
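
To make the parse, normalize, and dedupe steps concrete, here is a small local sketch of what such a transformation might look like for a CSV string. It is an illustration under stated assumptions, not the actor's implementation: the function names, the snake_case header rule, and the int/float coercion order are all hypothetical choices, and the input is assumed to be well-formed (every row has the same number of fields as the header).

```python
import csv
import io
import re


def normalize_key(name: str) -> str:
    """Turn a messy header such as ' Order ID ' into 'order_id'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def coerce(value: str):
    """Map empty strings to None and numeric-looking strings to int or float."""
    text = (value or "").strip()
    if text == "":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def normalize_csv(raw: str) -> list[dict]:
    """Parse CSV text, normalize headers and types, and drop exact duplicate rows."""
    reader = csv.DictReader(io.StringIO(raw))
    rows, seen = [], set()
    for record in reader:
        row = {normalize_key(k): coerce(v) for k, v in record.items()}
        fingerprint = tuple(sorted(row.items(), key=lambda item: item[0]))
        if fingerprint in seen:
            continue  # exact duplicate of an earlier row
        seen.add(fingerprint)
        rows.append(row)
    return rows


raw_csv = " Order ID ,Amount,Note\n1,9.50,\n1,9.50,\n2,12,rush\n"
rows = normalize_csv(raw_csv)
```

Here the second data line is dropped as a duplicate, and the empty `Note` cell becomes `None` rather than an empty string.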

Outputs

The actor returns structured transformation records: extracted tables, normalized schemas, dataset quality metrics, diff reports, parsed fields, warnings, errors, and confidence signals.

Family-specific fields to expect:

  • extractedRows: Rows parsed or produced by the transformation.

  • schema: Detected, normalized, or target schema.

  • columns: Detected table or dataset columns.

  • validationErrors: Validation, parse, schema, or quality errors.

  • duplicateCount: Duplicate rows or keys found during audit/dedupe.

  • nullRate: Null or empty-value rate for important fields.

  • changedRecords: Added, removed, or changed records for diff workflows.

  • recordId: Stable record ID for exports, dedupe, and downstream joins.

  • title: Human-readable record title for review and export.

  • sourceName: Source identifier used to trace where the record came from.

  • sourceUrl: Direct source URL for review and audit.

  • dedupeKey: Stable key used for delta mode and duplicate suppression.

  • retrievedAt: Timestamp showing when the actor retrieved or generated this record.

  • score: Normalized field for filtering, routing, or downstream review.

  • scoreReasons: Buyer-readable explanation for the score or match.

  • confidence: Normalized field for filtering, routing, or downstream review.

  • errors: Normalized field for filtering, routing, or downstream review.

  • runSummary: Run-level summary for counts, filters, charges, and next actions.
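
Fields such as dedupeKey, duplicateCount, and nullRate can be understood through a small worked example. The sketch below shows one plausible way to derive them; the hashing scheme, the treatment of empty strings as nulls, and the function names are assumptions for illustration, since the listing does not describe how the actor computes these values.

```python
import hashlib
import json


def dedupe_key(row: dict, key_fields: list[str]) -> str:
    """Hash the chosen key fields into a short, stable identifier (assumed scheme)."""
    payload = json.dumps([row.get(field) for field in key_fields], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def audit(rows: list[dict], key_fields: list[str], important_fields: list[str]) -> dict:
    """Count duplicate keys and the share of empty cells in the important fields."""
    seen: set[str] = set()
    duplicate_count = 0
    for row in rows:
        key = dedupe_key(row, key_fields)
        if key in seen:
            duplicate_count += 1
        else:
            seen.add(key)

    cells = [row.get(field) for row in rows for field in important_fields]
    empty = sum(1 for cell in cells if cell is None or cell == "")
    null_rate = empty / len(cells) if cells else 0.0
    return {"duplicateCount": duplicate_count, "nullRate": null_rate}


sample_rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": ""},
]
report = audit(sample_rows, key_fields=["id"], important_fields=["email"])
```

With this sample, one row repeats an earlier `id` and two of the four `email` cells are empty.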

Pricing

This actor uses Apify pay-per-event pricing. Current public listing guidance: $29-$49 / 1,000 launch validation records until public data proof is complete. Charges are tied to buyer-visible value events such as row-processed, schema-created, dataset-processed, record-saved, enriched-record. Small validation runs are supported so you can inspect output before scaling a schedule.

  • row-processed: Charged after producing one normalized row. Typical price: $0.002.
  • schema-created: Charged after producing one inferred schema. Typical price: $0.080.
  • dataset-processed: Base charge when CSV/JSON Schema Normalizer writes a non-empty default dataset. Typical price: $0.011.
  • record-saved: Charged for each buyer-visible result saved by CSV/JSON Schema Normalizer. Typical price: $0.003.
  • first-run-cap: Recommended budget cap for a first run. Typical cap: $2.000. Start with the default small run, inspect the dataset, then raise maxItems or schedule recurring runs.

For each event type, a run that produces 10 matching records charges only for the matched buyer-value events and remains capped by the run limit.
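
A rough budget estimate can be worked out from the typical event prices listed above. The sketch below assumes that event charges simply add up and that the first-run cap acts as a hard ceiling; both are assumptions for illustration, and actual billing is determined by Apify and the actor's current price list.

```python
from decimal import Decimal

# Typical prices copied from the list above; actual prices may differ.
TYPICAL_PRICES = {
    "row-processed": Decimal("0.002"),
    "schema-created": Decimal("0.080"),
    "dataset-processed": Decimal("0.011"),
    "record-saved": Decimal("0.003"),
}
FIRST_RUN_CAP = Decimal("2.000")


def estimate_charge(event_counts: dict[str, int], cap: Decimal = FIRST_RUN_CAP) -> Decimal:
    """Sum price * count per event and clamp to the budget cap (assumed additive model)."""
    total = sum(
        (TYPICAL_PRICES[name] * count for name, count in event_counts.items()),
        Decimal("0"),
    )
    return min(total, cap)


# A 10-row validation run: 10 rows, 1 inferred schema, 1 dataset, 10 saved records.
small_run = estimate_charge(
    {"row-processed": 10, "schema-created": 1, "dataset-processed": 1, "record-saved": 10}
)
# A larger run that would exceed the cap if left unbounded.
large_run = estimate_charge({"row-processed": 5000})
```

Under these assumptions the small run comes to 10 × 0.002 + 0.080 + 0.011 + 10 × 0.003 = $0.141, while the 5,000-row run is clamped to the $2.000 cap.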

API example

curl -X POST "https://api.apify.com/v2/actors/zentrafoundry~csv-json-schema-normalizer/runs" \
  -H "Authorization: Bearer $APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"maxItems":10,"sourceIds":["APIFY-DATASETS"],"includeSourceUrls":true,"includeMatchReasons":true,"outputMode":"buyer-ready-records"}'

{
  "maxItems": 10,
  "sourceIds": [
    "APIFY-DATASETS"
  ],
  "includeSourceUrls": true,
  "includeMatchReasons": true,
  "outputMode": "buyer-ready-records"
}
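
The same request can be prepared from Python using only the standard library. This is a sketch that mirrors the curl call above: the endpoint URL and payload are copied from that example unchanged, the helper names are hypothetical, and the response is assumed to be JSON. The request is only built here; nothing is sent until `start_run` is called.

```python
import json
import os
import urllib.request

# Endpoint and payload copied from the curl example; not independently verified.
ACTOR_RUNS_URL = "https://api.apify.com/v2/actors/zentrafoundry~csv-json-schema-normalizer/runs"

payload = {
    "maxItems": 10,
    "sourceIds": ["APIFY-DATASETS"],
    "includeSourceUrls": True,
    "includeMatchReasons": True,
    "outputMode": "buyer-ready-records",
}


def build_request(token: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    return urllib.request.Request(
        ACTOR_RUNS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_run(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the response body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Reads the token from the same environment variable the curl example uses.
request = build_request(os.environ.get("APIFY_TOKEN", ""))
```

Calling `start_run(request)` performs the network call and may incur charges, so keep `maxItems` small for a first attempt.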

Sample output

Sample status: sample_unavailable at https://zentra.nimblique.studio/external/actor-review/samples/csv-json-schema-normalizer.json. No fake sample is published; run a bounded real sample refresh before using examples in promotion. The entries below are example run configurations, not actor output.

[
  {
    "name": "Validate one small data transformation",
    "description": "Low-cost validation run for checking parsed, normalized, audited, or diffed output.",
    "input": {
      "maxItems": 10,
      "sourceIds": [
        "APIFY-DATASETS"
      ],
      "includeSourceUrls": true,
      "includeMatchReasons": true,
      "outputMode": "buyer-ready-records",
      "actorSlug": "csv-json-schema-normalizer"
    }
  },
  {
    "name": "Recurring dataset utility check",
    "description": "Recurring batch for schema, quality, extraction, or change reports.",
    "schedule": "Daily during local business hours",
    "input": {
      "maxItems": 25,
      "sourceIds": [
        "APIFY-DATASETS"
      ],
      "includeSourceUrls": true,
      "includeMatchReasons": true,
      "outputMode": "buyer-ready-records",
      "actorSlug": "csv-json-schema-normalizer"
    }
  }
]

Use cases

  • Clean, extract, compare, or audit CSV and JSON data before it enters a downstream workflow.
  • Convert messy inputs into predictable JSON/CSV-ready rows for APIs, spreadsheets, or agents.
  • Surface schema drift, duplicates, nulls, errors, warnings, or changed records.
  • Use small validation runs before connecting larger datasets or destinations.
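
For the changed-records use case, a diff between two snapshots keyed by recordId might look like the sketch below. The function name and the shape of the result are hypothetical; the listing mentions changedRecords but does not define its structure, so treat this as one possible interpretation.

```python
def diff_records(previous: list[dict], current: list[dict], key: str = "recordId") -> dict:
    """Compare two snapshots and list the keys that were added, removed, or changed."""
    old = {record[key]: record for record in previous}
    new = {record[key]: record for record in current}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in new.keys() & old.keys() if new[k] != old[k]),
    }


yesterday = [
    {"recordId": "a", "amount": 10},
    {"recordId": "b", "amount": 20},
    {"recordId": "c", "amount": 30},
]
today = [
    {"recordId": "a", "amount": 10},
    {"recordId": "b", "amount": 25},
    {"recordId": "d", "amount": 40},
]
changes = diff_records(yesterday, today)
```

In this sample, record `d` is new, record `c` has disappeared, and record `b` has a different `amount`.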

Trust and compliance

  • Uses Apify datasets/storage.
  • Keeps source URLs and source identifiers in output records for auditability.
  • Does not require private credentials unless a source is explicitly configured for approved authenticated access.

Limitations

  • Results depend on public-source availability, source uptime, and source update cadence.
  • Public sources can revise records after publication; rerun scheduled tasks for fresh evidence.
  • Scores and match reasons are decision-support signals, not legal, financial, procurement, medical, safety, or regulatory advice.
  • Large production runs can cost more than the default smoke run; start small, inspect output, then scale schedules.

FAQ

Can I run this without URLs? Yes. The default sample mode is designed to succeed without user-supplied URLs, and URL-backed runs can use startUrls when needed.

Can I schedule it? Yes. Use sinceLastRun, watchlistTerms, and optional webhookUrl to turn the actor into a recurring alert or report workflow.

How do I verify value before scaling? Run the recommended first-run input, review the sample output fields, then increase maxItems or schedule recurring runs after the dataset matches your use case.