SEC Financials Normalizer — EDGAR XBRL to Clean JSON

Pricing

From $42.50 per 1,000 normalized company-periods


Normalize SEC EDGAR XBRL filings into comparable company financial statements as JSON. Give a ticker or CIK; get a standardized 10-K income statement, balance sheet, and cash flow, each line citing its XBRL tag and checked against accounting identities. Sector-aware (standard/financial/insurance).

Developer: Scott Helvick (maintained by Community)


Every US public company's financials are in SEC EDGAR as XBRL — but raw XBRL is a bag of thousands of issuer-chosen tags, and two companies almost never tag the same line the same way, so the data isn't comparable without real normalization work. SEC Financials Normalizer does that work: give it a ticker or CIK and get a standardized income statement, balance sheet, and cash-flow statement as clean JSON — each line citing the exact XBRL tag it came from, each statement checked against accounting identities.

What this does

  • Ticker or CIK in, standardized statements out — pass AAPL or a 10-digit CIK; get the income statement, balance sheet, and cash-flow statement as typed JSON, one record per company per fiscal year.
  • Normalized, comparable line items: a maintained concept-map resolves the inconsistent us-gaap tags each filer chooses onto a stable set of standard concepts (revenue, cost of revenue, gross profit, operating income, net income; total assets, liabilities, equity, cash; operating/investing/financing cash flow), so the same field means the same thing across companies and years.
  • Sector-aware — banks and insurers have no gross-profit structure and report total revenue with different tags. The Actor detects the sector from the filer's SIC code (standard / financial / insurance) and applies the right concept set, instead of forcing one template that yields wrong numbers for financials.
  • Values are verbatim, never invented — every number comes straight from an XBRL fact. Nothing is computed except a single explicit fallback (liabilities = assets − equity when a filer omits a standalone total-liabilities tag), which is flagged as derived.
  • Validated against accounting identities — each statement is checked against the filer's own reported subtotals: Assets = Liabilities + Equity, and (non-financials) Gross Profit = Revenue − Cost of Revenue. Every line carries a provenance flag (reported / derived / missing) and an identityValidated boolean; each record reports the identity residuals.
  • Source-cited — every line names the sourceTag it was drawn from, and each record links to the company's EDGAR filing index.
  • Batch — up to 50 companies and up to 10 annual periods per run; one fetch per company covers all requested years.
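The bullets above describe a lookup-then-fallback flow. Below is a minimal Python sketch of that idea; it is not the Actor's code. `CONCEPT_MAP`, `resolve`, and `resolve_liabilities` are hypothetical names. `facts` is assumed to be a flat `{tag: value}` dict for one period, and the map holds only tags named on this page. Two further details are assumptions: the rule that the first candidate present wins, and the `None` sourceTag on a derived line.

```python
# Hypothetical, heavily trimmed concept map: ordered candidate us-gaap tags per
# standard concept. Assumption: the first tag present in a filing wins.
CONCEPT_MAP = {
    "revenue": [
        "RevenueFromContractWithCustomerExcludingAssessedTax",
        "Revenues",
        "SalesRevenueNet",
    ],
    "totalAssets": ["Assets"],
    "totalLiabilities": ["Liabilities"],
    "totalEquity": [
        "StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest",
        "StockholdersEquity",
    ],
}


def resolve(facts, concept):
    """Copy the first matching tag's value verbatim, or mark the line missing."""
    for tag in CONCEPT_MAP[concept]:
        if tag in facts:
            return {"concept": concept, "value": facts[tag],
                    "sourceTag": tag, "provenance": "reported"}
    return {"concept": concept, "value": None,
            "sourceTag": None, "provenance": "missing"}


def resolve_liabilities(facts):
    """Total liabilities, falling back to assets - equity flagged as derived."""
    line = resolve(facts, "totalLiabilities")
    if line["provenance"] == "reported":
        return line
    assets = resolve(facts, "totalAssets")["value"]
    equity = resolve(facts, "totalEquity")["value"]
    if assets is None or equity is None:
        return line  # nothing to derive from, so the line stays missing
    return {"concept": "totalLiabilities", "value": assets - equity,
            "sourceTag": None, "provenance": "derived"}
```

For example, `resolve_liabilities({"Assets": 500.0, "StockholdersEquity": 200.0})` returns a `totalLiabilities` line with value 300.0 and provenance "derived", because the dict has no `Liabilities` tag.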

Use cases:

  • Pull standardized fundamentals for a watchlist of tickers as JSON, ready to write to a database.
  • Give an agent comparable income/balance/cash-flow figures for several companies without it parsing raw filings.
  • Build a multi-year fundamentals time series for one company in a single call.
  • Feed an LLM clean, cited financials instead of tens of thousands of tokens of raw XBRL.
  • Screen companies on normalized metrics with every number traceable to its source tag.

Why normalization matters

The data is already public and free — SEC's XBRL API hands back every fact a company filed. The problem is that it's a heap of thousands of tags, and the same economic line is tagged differently across filers and across years. Revenue might be RevenueFromContractWithCustomerExcludingAssessedTax on a recent tech filer, Revenues on an older or financial filer, SalesRevenueNet on something older still. Equity might be the parent-only figure or the figure including noncontrolling interests — and if you pick the wrong one for a holding company with large minority stakes, the balance sheet silently fails to balance. Pull a tag by name and you get numbers that look fine and aren't comparable.
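A toy calculation makes the equity pitfall concrete. The figures below are invented; only the two us-gaap element names are real. Under US GAAP, noncontrolling interests are reported inside equity. For a company that has them, only the NCI-inclusive figure can close Assets = Liabilities + Equity.

```python
# Made-up figures for a holding company with large minority stakes.
# The two tag names are real us-gaap elements; the numbers are not.
ASSETS = 1_000.0
LIABILITIES = 600.0
EQUITY_CANDIDATES = {
    "StockholdersEquity": 320.0,  # parent-only equity
    "StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest": 400.0,
}


def balance_residuals(assets, liabilities, equity_candidates):
    """Assets - (Liabilities + Equity) for each candidate equity figure."""
    return {tag: assets - (liabilities + equity)
            for tag, equity in equity_candidates.items()}


for tag, residual in balance_residuals(ASSETS, LIABILITIES, EQUITY_CANDIDATES).items():
    print(f"{tag}: residual {residual:,.0f}")
```

The parent-only tag leaves a residual of 80, exactly the minority stake, while the NCI-inclusive tag balances. Nothing about the parent-only number looks wrong on its own. Only the identity exposes it.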

Financials and insurers break naive templates entirely: a bank has no "gross profit," and its total revenue isn't the contract-with-customer line. A normalizer that doesn't know this returns a confidently wrong revenue figure for every insurer it touches.
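Here is a sketch of the sector switch. It assumes the SIC code arrives as a string, as in the `sic` field of SEC's submissions API. The range boundaries and the fallback for a missing code are hypothetical choices for illustration; the Actor's real mapping isn't published on this page.

```python
from typing import Optional

# Hypothetical SIC boundaries; the Actor's exact ranges are not documented here.
FINANCIAL_SIC = range(6000, 6300)  # depository, credit, and brokerage codes
INSURANCE_SIC = range(6300, 6400)  # insurance carriers

# Gross profit only exists in the standard concept set.
HAS_GROSS_PROFIT = {"standard": True, "financial": False, "insurance": False}


def sector_from_sic(sic: Optional[str]) -> str:
    """Map an SEC SIC code string such as "6021" to a concept-set name."""
    if not sic or not sic.strip().isdigit():
        return "standard"  # assumption: unknown SIC falls back to the standard set
    code = int(sic)
    if code in INSURANCE_SIC:
        return "insurance"
    if code in FINANCIAL_SIC:
        return "financial"
    return "standard"
```

With these ranges, 6021 (national commercial banks) maps to financial, 6331 (fire, marine and casualty insurance) maps to insurance, and 3571 (electronic computers) maps to standard.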

This Actor's value is the maintained mapping that absorbs all of that — and the check that proves it worked. Because every filer reports its own subtotals, the accounting identities are self-contained ground truth: if the resolved components satisfy Assets = Liabilities + Equity, the normalization is corroborated by the filer's own numbers, not by trust. And because the output is built deterministically from XBRL facts — no language model in the path — the figures are never hallucinated. The one computed value, derived liabilities, is labeled as such.
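Here is one way the identity checks could produce records shaped like the `identities` array in the Output section. It's a sketch: `identity` and `check_record` are hypothetical, and the input is a flat `{concept: value}` dict. The zero tolerance and the treatment of missing components are assumptions.

```python
from typing import Optional

Num = Optional[float]


def identity(name: str, lhs: Num, rhs: Num, tol: float = 0.0) -> dict:
    """Compare a reported subtotal (lhs) with the figure built from its parts (rhs)."""
    if lhs is None or rhs is None:
        # Assumption: the Actor's handling of missing components isn't documented.
        return {"name": name, "holds": None, "residual": None,
                "note": "missing component"}
    residual = lhs - rhs
    return {"name": name, "holds": abs(residual) <= tol,
            "residual": residual, "note": None}


def check_record(v: dict, sector: str) -> list:
    """Run the balance-sheet and gross-profit identities on resolved values."""
    liab, eq = v.get("totalLiabilities"), v.get("totalEquity")
    rhs = liab + eq if liab is not None and eq is not None else None
    out = [identity("assets = liabilities + equity", v.get("totalAssets"), rhs)]
    if sector == "standard":
        rev, cor = v.get("revenue"), v.get("costOfRevenue")
        rhs = rev - cor if rev is not None and cor is not None else None
        out.append(identity("gross profit = revenue - cost of revenue",
                            v.get("grossProfit"), rhs))
    else:
        # Banks and insurers have no gross-profit structure: holds is null.
        out.append({"name": "gross profit = revenue - cost of revenue",
                    "holds": None, "residual": None, "note": "not applicable"})
    return out
```

One caveat follows from the arithmetic. When total liabilities is derived as assets − equity, the balance identity holds by construction. A passing balance check therefore counts as independent corroboration only when the liabilities line's provenance is reported.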

How it compares to alternatives

| Approach | Standardized line items | Sector-aware | Identity-validated | Per-line source citation | Numbers |
|---|---|---|---|---|---|
| Raw EDGAR / XBRL scraper | No (raw tags) | No | No | Tag dump | verbatim |
| Roll-your-own XBRL parser | You build and maintain it | You build it | You build it | You build it | verbatim |
| LLM over raw filings | Sometimes | Sometimes | No | No | hallucination risk |
| SEC Financials Normalizer | Yes | Yes | Yes | Yes | verbatim |

Raw scrapers hand back the tag soup; rolling your own means owning the concept-map and its drift forever; an LLM over filings risks inventing numbers. This Actor is the maintained-normalization layer with a correctness check, returning verbatim figures.

Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| identifiers | array | yes | | Companies as ticker symbols (e.g. AAPL, BRK.B) or 10-digit SEC CIKs (zero-padding optional). Tickers are resolved via SEC's official ticker map. 1–50 per run. |
| years | integer | no | 1 | Most-recent annual (10-K) periods to return per company, newest first. One record (and one charge) per company-period. 1–10. |
| statements | array | no | ["income","balance","cashflow"] | Which statements to include. Valid values: income, balance, cashflow. Validated at runtime. |

Output

One dataset record per company-period.

{
  "identifier": "AAPL",
  "status": "completed",
  "cik": "0000320193",
  "ticker": "AAPL",
  "companyName": "Apple Inc.",
  "sector": "standard",
  "fiscalYear": 2025,
  "periodEnd": "2025-09-27",
  "form": "10-K",
  "currency": "USD",
  "statements": [
    {
      "kind": "income",
      "lineItems": [
        { "concept": "revenue", "label": "Revenue", "value": 416161000000, "sourceTag": "RevenueFromContractWithCustomerExcludingAssessedTax", "provenance": "reported", "identityValidated": true },
        { "concept": "grossProfit", "label": "Gross profit", "value": 195196000000, "sourceTag": "GrossProfit", "provenance": "reported", "identityValidated": true }
      ]
    },
    {
      "kind": "balance",
      "lineItems": [
        { "concept": "totalAssets", "label": "Total assets", "value": 359240000000, "sourceTag": "Assets", "provenance": "reported", "identityValidated": true },
        { "concept": "totalLiabilities", "label": "Total liabilities", "value": 285510000000, "sourceTag": "Liabilities", "provenance": "reported", "identityValidated": true },
        { "concept": "totalEquity", "label": "Total equity (incl. NCI)", "value": 73730000000, "sourceTag": "StockholdersEquity", "provenance": "reported", "identityValidated": true }
      ]
    }
  ],
  "identities": [
    { "name": "assets = liabilities + equity", "holds": true, "residual": 0, "note": null },
    { "name": "gross profit = revenue - cost of revenue", "holds": true, "residual": 0, "note": null }
  ],
  "sourceFilingUrl": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=10-K",
  "error": null
}

A company that can't be resolved or has no usable annual facts comes back as one failed record (not charged): { "identifier": "NOTATICKER", "status": "failed", "statements": [], "identities": [], "error": "unresolved-identifier" }.

| Field | Type | Description |
|---|---|---|
| identifier | string | The input identifier (ticker or CIK), echoed. |
| status | string | completed (normalized, charged) or failed (unresolved / no facts; not charged). |
| cik / ticker / companyName | string or null | Resolved company identity. |
| sector | string or null | standard, financial, or insurance (from SIC); selects the income concept set. |
| fiscalYear / periodEnd / form / currency | integer / string / string / string | The period this record anchors on (annual 10-K, USD). |
| statements | array | Requested statements; each line item has concept, label, value, sourceTag, provenance (reported/derived/missing), identityValidated. |
| identities | array | Accounting-identity checks: name, holds (null = not applicable to the sector), residual, note. |
| sourceFilingUrl | string or null | EDGAR filing-index link for citation. |
| error | string or null | Reason when status is failed; null on success. |
| notice | string | Standing data-source and disclaimer note carried on every record (see Data source & disclaimer). |
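For the database and time-series use cases, a completed record flattens into one row per line item. Below is a minimal sketch over the field names in the table above. `to_rows` and `series` are hypothetical helpers, and the tuple layout is just one choice.

```python
def to_rows(record: dict) -> list:
    """Flatten one dataset record into (cik, fiscalYear, statement, concept,
    value, sourceTag, provenance) tuples; failed records yield no rows."""
    if record.get("status") != "completed":
        return []
    rows = []
    for stmt in record["statements"]:
        for line in stmt["lineItems"]:
            rows.append((record["cik"], record["fiscalYear"], stmt["kind"],
                         line["concept"], line["value"], line["sourceTag"],
                         line["provenance"]))
    return rows


def series(records: list, concept: str) -> dict:
    """One concept across fiscal years, in ascending year order."""
    out = {}
    for rec in records:
        for row in to_rows(rec):
            if row[3] == concept:
                out[row[1]] = row[4]
    return dict(sorted(out.items()))
```

Pass the items from a multi-year run to `series(items, "revenue")`, and it returns `{fiscalYear: value}` ready to chart or store.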

Example

{ "identifiers": ["AAPL", "JPM"], "years": 2, "statements": ["income", "balance", "cashflow"] }
curl -X POST "https://api.apify.com/v2/acts/shelvick~sec-financials-normalizer/run-sync-get-dataset-items?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"identifiers":["AAPL","JPM"],"years":2}'

Calling from an AI agent

Apify MCP server

The Actor is a callable tool on mcp.apify.com. The input schema is self-documenting — an LLM can construct a correct call from the tool description and field names alone. Pay per call via x402 USDC on Base or Skyfire managed tokens.

Apify SDK (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")
run = client.actor("shelvick/sec-financials-normalizer").call(
    run_input={"identifiers": ["AAPL", "JPM"], "years": 2}
)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Failed records may omit period and sector fields, so read those with .get()
    print(item["identifier"], item.get("fiscalYear"), item["status"], item.get("sector"))

REST API

POST https://api.apify.com/v2/acts/shelvick~sec-financials-normalizer/run-sync-get-dataset-items?token=YOUR_TOKEN

For large batches, start asynchronously and poll the run's dataset.

Pricing

Pay-per-event, billed only on success: one charge per company-period record pushed. Companies that don't resolve, or have no usable annual facts, are never charged — a run only costs you the company-periods it actually normalized. One fetch covers all requested years, so multi-year requests are efficient. Cap a whole run with maxTotalChargeUsd.
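Because the charge is per completed company-period, the worst case for a run is simple arithmetic: companies × years × the per-record rate. The sketch below assumes the headline rate of $42.50 per 1,000, or $0.0425 per record; check the Pricing tab before relying on it. `max_run_cost` is a hypothetical helper, and `Decimal` avoids float rounding on currency.

```python
from decimal import Decimal

# Assumption: headline "from $42.50 per 1,000" rate; the live rate may differ.
RATE_PER_1000 = Decimal("42.50")


def max_run_cost(companies: int, years: int,
                 rate_per_1000: Decimal = RATE_PER_1000) -> Decimal:
    """Upper bound: every requested company-period completes and is charged."""
    return Decimal(companies * years) * rate_per_1000 / 1000
```

The full batch ceiling of 50 companies × 10 years comes to 500 records and $21.25. The AAPL + JPM two-year example is 4 records, or $0.17. That upper bound is a reasonable starting value for maxTotalChargeUsd.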

See the Pricing tab on this Store page for the current per-event rate and any active subscriber discounts.

Behavior

Run-level failures (rare): invalid input fails the run before any work starts. The triggers are an empty identifiers list, more than 50 identifiers, years outside 1–10, or an unknown statements value. Nothing is charged.
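To catch these problems before starting a run, a client can mirror the same rules. The function below is a hypothetical pre-check written from the limits stated on this page. It is not the Actor's own validator, and its error messages are invented.

```python
VALID_STATEMENTS = {"income", "balance", "cashflow"}


def validate_input(run_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    errors = []
    ids = run_input.get("identifiers") or []
    if not ids:
        errors.append("identifiers must not be empty")
    elif len(ids) > 50:
        errors.append("at most 50 identifiers per run")
    years = run_input.get("years", 1)  # documented default
    if not isinstance(years, int) or isinstance(years, bool) or not 1 <= years <= 10:
        errors.append("years must be an integer from 1 to 10")
    stmts = run_input.get("statements", sorted(VALID_STATEMENTS))
    unknown = sorted(set(stmts) - VALID_STATEMENTS)
    if unknown:
        errors.append(f"unknown statements value(s): {', '.join(unknown)}")
    return errors
```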

Per-record outcomes:

  • completed: a normalized statement set was produced (charged). Check each line's provenance and the identities residuals for confidence.
  • failed: unresolved-identifier (ticker/CIK not found), no-annual-facts (no 10-K XBRL facts), or facts-fetch-failed (SEC fetch error). Never charged.

Performance: sourcing is SEC's free XBRL + submissions APIs; a company is a couple of API calls plus parsing, a few seconds each, rate-limited politely under SEC's ceiling. One companyfacts fetch covers every requested year. A 50-company multi-year run completes well within the run timeout.
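One fetch can cover every year because SEC's companyfacts JSON lists every reported value of a tag, across all filings, under facts → us-gaap → tag → units → unit. Below is a sketch of picking annual 10-K values out of that list. `annual_values` is hypothetical, and three of its choices are assumptions. It uses a 350–380-day window to drop quarterly durations. It excludes amended 10-K/A filings. When a period appears in several 10-Ks, it prefers the most recently filed value, which picks up restatements.

```python
from datetime import date


def annual_values(companyfacts: dict, tag: str, years: int, unit: str = "USD") -> dict:
    """Newest `years` annual 10-K values for one us-gaap tag, keyed by period end."""
    facts = (companyfacts.get("facts", {}).get("us-gaap", {})
             .get(tag, {}).get("units", {}).get(unit, []))
    best = {}  # period end -> fact from the latest 10-K that reports it
    for f in facts:
        if f.get("form") != "10-K":
            continue
        if "start" in f:  # duration fact: keep only roughly one-year periods
            days = (date.fromisoformat(f["end"]) - date.fromisoformat(f["start"])).days
            if not 350 <= days <= 380:
                continue
        prev = best.get(f["end"])
        if prev is None or f["filed"] > prev["filed"]:  # ISO dates compare as strings
            best[f["end"]] = f
    newest = sorted(best, reverse=True)[:years]
    return {end: best[end]["val"] for end in newest}
```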

FAQ

How are tickers resolved, and do class shares work? Tickers are mapped to CIKs via SEC's official ticker file; 10-digit CIKs are accepted directly. Class shares work either way — BRK.B and BRK-B both resolve.
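Here is a sketch of that resolution step. It assumes `ticker_map` is a `{ticker: cik}` dict built from SEC's company_tickers.json, and that this file writes share classes with a hyphen (BRK-B). `normalize_identifier` and `resolve_cik` are hypothetical names.

```python
from typing import Optional


def normalize_identifier(raw: str) -> tuple:
    """Classify input as ("cik", 10-digit string) or ("ticker", SEC-style symbol)."""
    s = raw.strip()
    if s.isdigit():
        if len(s) > 10:
            raise ValueError(f"CIK longer than 10 digits: {raw!r}")
        return ("cik", s.zfill(10))  # zero-padding is optional on input
    # Assumption: SEC's ticker file uses a hyphen for share classes (BRK-B).
    return ("ticker", s.upper().replace(".", "-"))


def resolve_cik(raw: str, ticker_map: dict) -> Optional[str]:
    """Return a 10-digit CIK, or None for an unresolved identifier."""
    kind, value = normalize_identifier(raw)
    if kind == "cik":
        return value
    cik = ticker_map.get(value)
    return None if cik is None else str(cik).zfill(10)
```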

Why is a line missing or derived? missing means no candidate tag for that concept was present in the filing. derived appears only for total liabilities when the filer omits a standalone tag — it's computed as assets − equity and flagged so you know it wasn't reported directly.

Does it cover quarterly periods? No — this version returns annual (10-K) periods only.

Am I charged for companies that fail? No. The charge fires only per completed company-period; unresolved or fact-less companies are free.

How do I know a number is trustworthy? Each line names its sourceTag and carries provenance + identityValidated; each record reports the identity residuals. A balanced sheet and a passing gross-profit check are the filer's own subtotals corroborating the normalization.

What this doesn't do

  • No quarterly statements. Annual 10-K periods only in this version.
  • No non-US / IFRS filers. US-GAAP XBRL from domestic SEC filers.
  • No segment, footnote, or per-share detail. The core three statements' standard lines, not the full filing.
  • No ratios, scoring, or analysis. It returns normalized statements; interpretation is yours to layer on.
  • No private companies. SEC filers only.

For raw filing documents or the full unfiltered XBRL fact set, use an EDGAR filings scraper. For non-SEC or private-company financials, use a commercial financial-data provider. For ratios, screening logic, or narrative analysis on top of these numbers, layer your own logic or an analysis tool — this Actor is the clean, cited input to that, not the analysis itself.


Design notes: www.scotthelvick.com/tools/sec-financials-normalizer