SEC Financials Normalizer — EDGAR XBRL to Clean JSON
Pricing
from $42.50 per 1,000 normalized company-periods
Normalize SEC EDGAR XBRL filings into comparable company financial statements as JSON. Give a ticker or CIK; get a standardized 10-K income statement, balance sheet, and cash flow, each line citing its XBRL tag and checked against accounting identities. Sector-aware (standard/financial/insurance).
Developer: Scott Helvick (maintained by Community)
Every US public company's financials are in SEC EDGAR as XBRL — but raw XBRL is a bag of thousands of issuer-chosen tags, and two companies almost never tag the same line the same way, so the data isn't comparable without real normalization work. SEC Financials Normalizer does that work: give it a ticker or CIK and get a standardized income statement, balance sheet, and cash-flow statement as clean JSON — each line citing the exact XBRL tag it came from, each statement checked against accounting identities.
What this does
- Ticker or CIK in, standardized statements out: pass `AAPL` or a 10-digit CIK and get the income statement, balance sheet, and cash-flow statement as typed JSON, one record per company per fiscal year.
- Normalized, comparable line items: a maintained concept-map resolves each filer's inconsistent, issuer-chosen `us-gaap` tags onto a stable set of standard concepts (revenue, cost of revenue, gross profit, operating income, net income; total assets, liabilities, equity, cash; operating/investing/financing cash flow), so the same field means the same thing across companies and years.
- Sector-aware: banks and insurers have no gross-profit structure and report total revenue under different tags. The Actor detects the sector (standard / financial / insurance) from the filer's SIC code and applies the matching concept set, instead of forcing one template that yields wrong numbers for financials.
- Values are verbatim, never invented: every number comes straight from an XBRL fact. Nothing is computed except one explicit fallback (liabilities = assets − equity when a filer omits a standalone total-liabilities tag), which is flagged as `derived`.
- Validated against accounting identities: each statement is checked against the filer's own reported subtotals, using Assets = Liabilities + Equity and, for non-financial filers, Gross Profit = Revenue − Cost of Revenue. Every line carries a provenance flag (`reported` / `derived` / `missing`) and an `identityValidated` boolean, and each record reports the identity residuals.
- Source-cited: every line names the `sourceTag` it was drawn from, and each record links to the company's EDGAR filing index.
- Batch: up to 50 companies per run and up to 10 annual periods per company; one fetch per company covers all requested years.
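The sector switch can be pictured as a lookup on the filer's four-digit SIC code, which SEC's submissions API reports in its `sic` field. This is a minimal sketch, assuming hypothetical code ranges (6300–6411 as insurance, 6000–6299 as financial, everything else standard); the Actor's real mapping is not published here and may differ.

```python
from __future__ import annotations


def sector_from_sic(sic: str | int | None) -> str:
    """Map a SIC code to 'standard', 'financial', or 'insurance'.

    The ranges below are illustrative assumptions, not the Actor's own table.
    """
    try:
        code = int(sic)
    except (TypeError, ValueError):
        return "standard"  # missing or malformed SIC: use the standard template
    if 6300 <= code <= 6411:
        return "insurance"  # insurance carriers and agents (assumed range)
    if 6000 <= code <= 6299:
        return "financial"  # banks, credit institutions, brokers (assumed range)
    return "standard"
```

For example, `sector_from_sic("6021")` (national commercial banks) returns `"financial"`, while a missing or malformed code falls back to `"standard"`.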
Use cases:
- Pull standardized fundamentals for a watchlist of tickers as JSON, ready to write to a database.
- Give an agent comparable income/balance/cash-flow figures for several companies without it parsing raw filings.
- Build a multi-year fundamentals time series for one company in a single call.
- Feed an LLM clean, cited financials instead of tens of thousands of tokens of raw XBRL.
- Screen companies on normalized metrics with every number traceable to its source tag.
Why normalization matters
The data is already public and free — SEC's XBRL API hands back every fact a company filed. The problem is that it's a heap of thousands of tags, and the same economic line is tagged differently across filers and across years. Revenue might be RevenueFromContractWithCustomerExcludingAssessedTax on a recent tech filer, Revenues on an older or financial filer, SalesRevenueNet on something older still. Equity might be the parent-only figure or the figure including noncontrolling interests — and if you pick the wrong one for a holding company with large minority stakes, the balance sheet silently fails to balance. Pull a tag by name and you get numbers that look fine and aren't comparable.
Financials and insurers break naive templates entirely: a bank has no "gross profit," and its total revenue isn't the contract-with-customer line. A normalizer that doesn't know this returns a confidently wrong revenue figure for every insurer it touches.
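One way to picture the concept-map is as a priority-ordered list of candidate tags per concept and sector, where the first tag the filer actually used wins. The sketch below is illustrative only: the tag names are real `us-gaap` elements, but the candidate lists, their order, and the `resolve` helper are assumptions, and one period's facts are simplified to a `{tag: value}` dict.

```python
from __future__ import annotations

# Hypothetical candidate lists, highest priority first.
CONCEPT_MAP = {
    "standard": {
        "revenue": [
            "RevenueFromContractWithCustomerExcludingAssessedTax",
            "Revenues",
            "SalesRevenueNet",
        ],
    },
    # Banks and insurers: total revenue is not the contract-with-customer line.
    "financial": {"revenue": ["Revenues"]},
    "insurance": {"revenue": ["Revenues"]},
}

# Equity including noncontrolling interests is tried first, because that is
# the figure that balances against total assets and total liabilities.
EQUITY_CANDIDATES = [
    "StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest",
    "StockholdersEquity",
]


def resolve(facts: dict[str, float], candidates: list[str]) -> dict:
    """Return the first candidate tag present in `facts`, with its provenance."""
    for tag in candidates:
        if tag in facts:
            return {"value": facts[tag], "sourceTag": tag, "provenance": "reported"}
    return {"value": None, "sourceTag": None, "provenance": "missing"}
```

With only `StockholdersEquity` present, the parent-only figure is used; for a holding company that also files the NCI-inclusive tag, the NCI-inclusive one wins.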
This Actor's value is the maintained mapping that absorbs all of that, plus the check that tests it. Because every filer reports its own subtotals, the accounting identities are self-contained ground truth: if the resolved components satisfy Assets = Liabilities + Equity, the normalization is corroborated by the filer's own numbers, not by trust. And because the output is built deterministically from XBRL facts, with no language model in the path, the figures are never hallucinated. The one computed value, derived liabilities, is labeled as such; when it appears, the balance-sheet identity holds by construction and adds no independent corroboration for that record, which is why it is flagged.
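The identity checks and the single derived fallback fit in a few lines. This sketch is hypothetical: the function names, note strings, and zero default tolerance are assumptions, and `holds` is also set to `None` when a component is missing. It reproduces the zero residuals in the sample Apple record from the Output section, with cost of revenue back-computed from that record's revenue and gross profit because the sample omits it.

```python
from __future__ import annotations

BALANCE = "assets = liabilities + equity"
GROSS = "gross profit = revenue - cost of revenue"


def check_balance(assets, liabilities, equity, tolerance=0):
    """Apply the derived-liabilities fallback, then test A = L + E."""
    provenance = "reported" if liabilities is not None else "missing"
    if liabilities is None and assets is not None and equity is not None:
        liabilities = assets - equity  # the one computed value
        provenance = "derived"
    if None in (assets, liabilities, equity):
        identity = {"name": BALANCE, "holds": None, "residual": None,
                    "note": "missing component"}
    else:
        residual = assets - (liabilities + equity)
        identity = {"name": BALANCE, "holds": abs(residual) <= tolerance,
                    "residual": residual,
                    # A derived total satisfies the identity by construction.
                    "note": "liabilities derived" if provenance == "derived" else None}
    return liabilities, provenance, identity


def check_gross_profit(revenue, cost_of_revenue, gross_profit, sector, tolerance=0):
    """Test GP = Revenue - Cost of Revenue; not applicable outside 'standard'."""
    if sector != "standard":
        return {"name": GROSS, "holds": None, "residual": None,
                "note": "not applicable to sector"}
    if None in (revenue, cost_of_revenue, gross_profit):
        return {"name": GROSS, "holds": None, "residual": None,
                "note": "missing component"}
    residual = gross_profit - (revenue - cost_of_revenue)
    return {"name": GROSS, "holds": abs(residual) <= tolerance,
            "residual": residual, "note": None}
```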
How it compares to alternatives
| Approach | Standardized line items | Sector-aware | Identity-validated | Per-line source citation | Numbers |
|---|---|---|---|---|---|
| Raw EDGAR / XBRL scraper | No — raw tags | No | No | tag dump | verbatim |
| Roll-your-own XBRL parser | You build + maintain it | You build it | You build it | You build it | verbatim |
| LLM over raw filings | Sometimes | Sometimes | No | No | hallucination risk |
| SEC Financials Normalizer | Yes | Yes | Yes | Yes | verbatim |
Raw scrapers hand back the tag soup; rolling your own means owning the concept-map and its drift forever; an LLM over filings risks inventing numbers. This Actor is the maintained-normalization layer with a correctness check, returning verbatim figures.
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `identifiers` | array | yes | none | Companies as ticker symbols (e.g. `AAPL`, `BRK.B`) or SEC CIKs (zero-padding to 10 digits optional). Tickers are resolved via SEC's official ticker map. 1–50 per run. |
| `years` | integer | no | `1` | Most-recent annual (10-K) periods to return per company, newest first. One record per company-period; only completed records are charged. 1–10. |
| `statements` | array | no | `["income","balance","cashflow"]` | Which statements to include. Valid values: `income`, `balance`, `cashflow`. Validated at runtime. |
Output
One dataset record per company-period.
```json
{
  "identifier": "AAPL",
  "status": "completed",
  "cik": "0000320193",
  "ticker": "AAPL",
  "companyName": "Apple Inc.",
  "sector": "standard",
  "fiscalYear": 2025,
  "periodEnd": "2025-09-27",
  "form": "10-K",
  "currency": "USD",
  "statements": [
    {
      "kind": "income",
      "lineItems": [
        { "concept": "revenue", "label": "Revenue", "value": 416161000000, "sourceTag": "RevenueFromContractWithCustomerExcludingAssessedTax", "provenance": "reported", "identityValidated": true },
        { "concept": "grossProfit", "label": "Gross profit", "value": 195196000000, "sourceTag": "GrossProfit", "provenance": "reported", "identityValidated": true }
      ]
    },
    {
      "kind": "balance",
      "lineItems": [
        { "concept": "totalAssets", "label": "Total assets", "value": 359240000000, "sourceTag": "Assets", "provenance": "reported", "identityValidated": true },
        { "concept": "totalLiabilities", "label": "Total liabilities", "value": 285510000000, "sourceTag": "Liabilities", "provenance": "reported", "identityValidated": true },
        { "concept": "totalEquity", "label": "Total equity (incl. NCI)", "value": 73730000000, "sourceTag": "StockholdersEquity", "provenance": "reported", "identityValidated": true }
      ]
    }
  ],
  "identities": [
    { "name": "assets = liabilities + equity", "holds": true, "residual": 0, "note": null },
    { "name": "gross profit = revenue - cost of revenue", "holds": true, "residual": 0, "note": null }
  ],
  "sourceFilingUrl": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=10-K",
  "error": null
}
```
A company that can't be resolved or has no usable annual facts comes back as one failed record (not charged): { "identifier": "NOTATICKER", "status": "failed", "statements": [], "identities": [], "error": "unresolved-identifier" }.
| Field | Type | Description |
|---|---|---|
| `identifier` | string | The input identifier (ticker or CIK), echoed. |
| `status` | string | `completed` (normalized, charged) or `failed` (unresolved or no facts; not charged). |
| `cik` / `ticker` / `companyName` | string or null | Resolved company identity. |
| `sector` | string or null | `standard`, `financial`, or `insurance` (from SIC); selects the income-statement concept set. |
| `fiscalYear` / `periodEnd` / `form` / `currency` | integer / string / string / string | The period this record anchors on (annual 10-K, USD). |
| `statements` | array | Requested statements; each line item has `concept`, `label`, `value`, `sourceTag`, `provenance` (`reported`/`derived`/`missing`), and `identityValidated`. |
| `identities` | array | Accounting-identity checks: `name`, `holds` (`null` = not applicable to the sector), `residual`, `note`. |
| `sourceFilingUrl` | string or null | EDGAR filing-index link for citation. |
| `error` | string or null | Reason when `status` is `failed`; `null` on success. |
| `notice` | string | Standing data-source and disclaimer note carried on every record (see Data source & disclaimer). |
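Downstream code usually wants two things from a record: a flat `concept → value` lookup, and a short list of reasons to look twice. This is a hypothetical sketch that uses only the field names in the table above; which conditions count as review-worthy (a non-`completed` status, any non-`reported` line, any identity whose `holds` is `false`) is an assumption you may want to tighten.

```python
from __future__ import annotations


def flatten(record: dict) -> dict:
    """Collapse a record's statements into {concept: value}."""
    return {
        line["concept"]: line["value"]
        for statement in record.get("statements", [])
        for line in statement.get("lineItems", [])
    }


def review_flags(record: dict) -> list[str]:
    """Reasons a record deserves a second look before its numbers are trusted."""
    if record.get("status") != "completed":
        return [f"status {record.get('status')}: {record.get('error')}"]
    flags = []
    for statement in record.get("statements", []):
        for line in statement.get("lineItems", []):
            if line.get("provenance") != "reported":  # derived or missing
                flags.append(f"{line['concept']}: {line.get('provenance')}")
    for check in record.get("identities", []):
        if check.get("holds") is False:  # None means "not applicable"
            flags.append(f"identity failed: {check['name']} (residual {check['residual']})")
    return flags
```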
Example
```json
{ "identifiers": ["AAPL", "JPM"], "years": 2, "statements": ["income", "balance", "cashflow"] }
```

```bash
curl -X POST "https://api.apify.com/v2/acts/shelvick~sec-financials-normalizer/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"identifiers":["AAPL","JPM"],"years":2}'
```
Calling from an AI agent
Apify MCP server
The Actor is a callable tool on mcp.apify.com. The input schema is self-documenting — an LLM can construct a correct call from the tool description and field names alone. Pay per call via x402 USDC on Base or Skyfire managed tokens.
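Apify API client (Python)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

# Start the Actor and block until the run finishes.
run = client.actor("shelvick/sec-financials-normalizer").call(
    run_input={"identifiers": ["AAPL", "JPM"], "years": 2}
)

# One dataset item per company-period. Failed records omit fiscalYear and
# sector, so read those fields with .get() instead of indexing.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["identifier"], item.get("fiscalYear"), item["status"], item.get("sector"))
```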
REST API
POST https://api.apify.com/v2/acts/shelvick~sec-financials-normalizer/run-sync-get-dataset-items?token=YOUR_TOKEN
For large batches, start the run asynchronously, poll it until it finishes, then read its dataset.
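For the asynchronous path, Apify's REST API exposes separate endpoints to start a run (`POST /v2/acts/{actorId}/runs`), read its status (`GET /v2/actor-runs/{runId}`), and fetch its dataset (`GET /v2/datasets/{datasetId}/items`). This is a minimal standard-library sketch. It assumes a fixed 10-second poll interval and treats `SUCCEEDED`, `FAILED`, `TIMED-OUT`, and `ABORTED` as terminal statuses. The `fetch` and `sleep` parameters are hypothetical seams that exist only so the loop can be exercised without network access.

```python
from __future__ import annotations

import json
import time
import urllib.request

API = "https://api.apify.com/v2"
ACTOR_ID = "shelvick~sec-financials-normalizer"
TERMINAL = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}  # assumed terminal set


def http_json(url: str, payload: dict | None = None):
    """GET url, or POST `payload` as JSON when given, and decode the JSON reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_async(token: str, run_input: dict, fetch=http_json,
              poll_secs: float = 10.0, sleep=time.sleep) -> list[dict]:
    """Start a run, poll its status until terminal, then return dataset items."""
    run = fetch(f"{API}/acts/{ACTOR_ID}/runs?token={token}", run_input)["data"]
    while run["status"] not in TERMINAL:
        sleep(poll_secs)
        run = fetch(f"{API}/actor-runs/{run['id']}?token={token}")["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"run {run['id']} ended with status {run['status']}")
    # The dataset items endpoint returns a bare JSON array (no "data" wrapper).
    return fetch(f"{API}/datasets/{run['defaultDatasetId']}/items?token={token}")
```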
Pricing
Pay-per-event, billed only on success: one charge per company-period record pushed. Companies that don't resolve, or have no usable annual facts, are never charged — a run only costs you the company-periods it actually normalized. One fetch covers all requested years, so multi-year requests are efficient. Cap a whole run with maxTotalChargeUsd.
See the Pricing tab on this Store page for the current per-event rate and any active subscriber discounts.
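Because only completed company-periods are charged, the worst case for a run is companies × years records at the per-event rate. This is a hypothetical helper. It assumes the listed starting rate of $42.50 per 1,000 company-periods, so check the Pricing tab for the rate that actually applies to you. The result is a reasonable value for `maxTotalChargeUsd`.

```python
from decimal import Decimal


def max_run_cost(companies: int, years: int, usd_per_1000: str = "42.50") -> Decimal:
    """Upper bound on company-period charges: every company yields `years` records.

    The default rate is the listed starting price (an assumption); pass your own.
    """
    return Decimal(companies) * years * Decimal(usd_per_1000) / 1000
```

At that rate, a maximal run (50 companies × 10 years = 500 records) is bounded at $21.25. Companies that fail, or that have fewer annual filings than requested, only lower the real charge.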
Behavior
Run-level failures (rare): invalid input fails the run before any work — empty identifiers, more than 50, years out of range (1–10), or an unknown statements value. Nothing is charged.
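Checking the same rules client-side avoids a failed run. This is a hypothetical `validate_input` helper that mirrors the limits listed above: non-empty `identifiers` with at most 50 entries, integer `years` from 1 to 10 (defaulting to 1), and `statements` drawn from `income`, `balance`, and `cashflow`. It is a convenience sketch, not the Actor's own validator.

```python
from __future__ import annotations

VALID_STATEMENTS = {"income", "balance", "cashflow"}


def validate_input(run_input: dict) -> list[str]:
    """Mirror the documented run-level rules; an empty list means valid."""
    errors = []
    ids = run_input.get("identifiers")
    if not isinstance(ids, list) or not ids:
        errors.append("identifiers must be a non-empty array")
    elif len(ids) > 50:
        errors.append("at most 50 identifiers per run")
    years = run_input.get("years", 1)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(years, bool) or not isinstance(years, int) or not 1 <= years <= 10:
        errors.append("years must be an integer from 1 to 10")
    unknown = set(run_input.get("statements", VALID_STATEMENTS)) - VALID_STATEMENTS
    if unknown:
        errors.append(f"unknown statements: {sorted(unknown)}")
    return errors
```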
Per-record outcomes:
- `completed`: a normalized statement set was produced (charged). Check each line's `provenance` and the `identities` residuals for confidence.
- `failed`: `unresolved-identifier` (ticker/CIK not found), `no-annual-facts` (no 10-K XBRL facts), or `facts-fetch-failed` (SEC fetch error). Never charged.
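Since failed records are never charged, re-running transient failures costs nothing. This is a hypothetical sketch that splits a finished run's dataset into identifiers worth retrying and identifiers to drop. It assumes only `facts-fetch-failed` is transient, since an unresolved ticker or a filer with no 10-K facts should fail the same way next time.

```python
from __future__ import annotations

# Assumption: only SEC fetch errors are worth retrying.
RETRYABLE = {"facts-fetch-failed"}


def split_failures(items: list[dict]) -> tuple[list[str], list[str]]:
    """Return (identifiers to retry, identifiers to drop) from a run's dataset."""
    retry, drop = [], []
    for item in items:
        if item.get("status") != "failed":
            continue
        target = retry if item.get("error") in RETRYABLE else drop
        if item["identifier"] not in target:
            target.append(item["identifier"])
    return retry, drop
```

Feed `retry` back as `identifiers` in a new run with the same `years` and `statements`.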
Performance: data comes from SEC's free XBRL and submissions APIs. Each company takes a couple of API calls plus parsing (a few seconds), and requests are throttled to stay below SEC's fair-access limit of 10 requests per second. One companyfacts fetch covers every requested year, so a 50-company multi-year run completes well within the run timeout.
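The single-fetch, many-years behavior follows from the shape of SEC's companyfacts response (`https://data.sec.gov/api/xbrl/companyfacts/CIK##########.json`). Each tag holds every value the company ever filed, grouped under `units`, with fields such as `start`, `end`, `val`, `fp`, `form`, and `filed`. This is a hypothetical sketch of picking the newest annual values for one tag from that list. The 350–380 day window for annual durations, and the choice to let the most recently filed value win (so restatements replace originals), are assumptions rather than documented Actor behavior.

```python
from __future__ import annotations

from datetime import date


def annual_values(tag_facts: list[dict], years: int = 1) -> list[tuple[str, float]]:
    """Newest-first (period_end, value) pairs for one tag's 10-K annual facts."""
    best: dict[str, dict] = {}
    for fact in tag_facts:
        # `fy`/`fp` describe the filing, not the fact, so key on the period end.
        if fact.get("form") != "10-K" or fact.get("fp") != "FY":
            continue
        if "start" in fact:  # duration fact: keep roughly one-year periods only
            days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
            if not 350 <= days <= 380:
                continue
        kept = best.get(fact["end"])
        if kept is None or fact["filed"] > kept["filed"]:
            best[fact["end"]] = fact  # later filing (e.g. a restatement) wins
    return [(end, best[end]["val"]) for end in sorted(best, reverse=True)[:years]]
```

Point it at `facts["us-gaap"]["Revenues"]["units"]["USD"]` from one companyfacts response, and every requested year comes out of the same payload.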
FAQ
How are tickers resolved, and do class shares work?
Tickers are mapped to CIKs via SEC's official ticker file; 10-digit CIKs are accepted directly. Class shares work either way — BRK.B and BRK-B both resolve.
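Resolution can be sketched against SEC's `company_tickers.json` file, whose rows carry `cik_str`, `ticker`, and `title`. This is a hypothetical sketch. It assumes the file spells class shares with a hyphen (as in `BRK-B`), so a dotted ticker is normalized to a hyphen before lookup. Purely numeric input is treated as a CIK and zero-padded to 10 digits.

```python
from __future__ import annotations


def build_ticker_index(company_tickers: dict) -> dict[str, str]:
    """Index company_tickers.json rows as ticker -> zero-padded 10-digit CIK."""
    return {
        row["ticker"].upper(): str(row["cik_str"]).zfill(10)
        for row in company_tickers.values()
    }


def resolve_identifier(identifier: str, index: dict[str, str]) -> str | None:
    """Return a 10-digit CIK, or None (reported as 'unresolved-identifier')."""
    ident = identifier.strip()
    if ident.isdigit():
        return ident.zfill(10) if len(ident) <= 10 else None
    # Class shares: accept BRK.B or BRK-B by normalizing the dot to a hyphen.
    return index.get(ident.upper().replace(".", "-"))
```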
Why is a line missing or derived?
missing means no candidate tag for that concept was present in the filing. derived appears only for total liabilities when the filer omits a standalone tag — it's computed as assets − equity and flagged so you know it wasn't reported directly.
Does it cover quarterly periods?
No. This version returns annual (10-K) periods only.
Am I charged for companies that fail?
No. The charge fires only per completed company-period; unresolved or fact-less companies are free.
How do I know a number is trustworthy?
Each line names its sourceTag and carries provenance + identityValidated; each record reports the identity residuals. A balanced sheet and a passing gross-profit check are the filer's own subtotals corroborating the normalization.
What this doesn't do
- No quarterly statements. Annual 10-K periods only in this version.
- No non-US / IFRS filers. US-GAAP XBRL from domestic SEC filers.
- No segment, footnote, or per-share detail. The core three statements' standard lines, not the full filing.
- No ratios, scoring, or analysis. It returns normalized statements; interpretation is yours to layer on.
- No private companies. SEC filers only.
For raw filing documents or the full unfiltered XBRL fact set, use an EDGAR filings scraper. For non-SEC or private-company financials, use a commercial financial-data provider. For ratios, screening logic, or narrative analysis on top of these numbers, layer your own logic or an analysis tool — this Actor is the clean, cited input to that, not the analysis itself.
Design notes: www.scotthelvick.com/tools/sec-financials-normalizer