SEC EDGAR Filings & Fundamentals API — RAG-Ready

Under maintenance

Pricing

$2.00 / 1,000 dataset items scraped


Turn any ticker into clean, citation-tagged RAG chunks of SEC filings (10-K, 10-Q, 8-K, Form 4, 13F) plus normalized XBRL fundamentals. Official SEC EDGAR data, no API key. Full-text search, watch mode for new filings. Built for finance AI agents and RAG.

Rating: 0.0 (0 ratings)

Developer: Harry Schoeller (maintained by Community)

Actor stats: 0 bookmarks, 2 total users, 1 monthly active user, last modified 9 days ago

Turn the U.S. SEC EDGAR database into embeddings-ready, citation-rich data for AI/RAG pipelines and finance agents. Give it tickers (or CIKs, or a full-text query) and get back clean Markdown chunks of each matching 10-K, 10-Q, and 8-K. Chunks are split by SEC Item section, token-budgeted, and deep-linked. Each filing also carries XBRL fundamentals (Revenues, Net Income, EPS, Assets, cash flow).

No API keys. The actor calls only keyless EDGAR endpoints and throttles itself with a token-bucket limiter capped at 10 requests per second. It also sends a descriptive User-Agent, as the SEC's fair-access guidelines require.
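The rate limit and User-Agent requirement can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the actor's source. The names `TokenBucket`, `secHeaders`, and `politeFetch` are hypothetical. So are the burst capacity and the `"AppName contact-email"` User-Agent layout; the SEC only asks that the header identify the requester and include a contact email.

```typescript
// Sketch only: hypothetical names, not the actor's implementation.

/** Token bucket: holds up to `capacity` tokens and refills at `refillPerSecond`. */
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full, allowing an initial burst
    this.lastRefillMs = nowMs;
  }

  /** Adds tokens for the elapsed time, then consumes one if available. */
  tryTake(nowMs: number = Date.now()): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  /** Milliseconds until the next token is available, based on the current balance. */
  msUntilNextToken(): number {
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}

/** Builds request headers; the "AppName contact-email" layout is an assumption. */
function secHeaders(appName: string, contactEmail: string): Record<string, string> {
  return { "User-Agent": `${appName} ${contactEmail}` };
}

/** Waits until a token is free, then issues the request (Node 18+ global fetch). */
async function politeFetch(bucket: TokenBucket, url: string, headers: Record<string, string>) {
  while (!bucket.tryTake()) {
    await new Promise<void>((resolve) => setTimeout(resolve, bucket.msUntilNextToken()));
  }
  return fetch(url, { headers });
}
```

In this sketch, a single shared bucket such as `new TokenBucket(10, 10)` is passed to every call. That way, all EDGAR requests draw from the same 10 req/s budget, however many companies are processed.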

What it does

  1. Ticker → CIK resolution via the canonical company_tickers.json.
  2. Filing listing & filtering by form type and date window from the EDGAR submissions API.
  3. Full-text search across all of EDGAR via efts.sec.gov (optional fullTextQuery).
  4. Primary document fetch, then HTML → Markdown conversion (Readability + Turndown with GFM), so financial tables survive intact.
  5. SEC Item segmentation (Item 1A. Risk Factors, Item 7. MD&A, 8-K Item 2.02, …) so every chunk carries an Item anchor.
  6. Structure-aware, token-budgeted chunking with overlap; each chunk records its headingsPath.
  7. XBRL fundamentals per company via the companyconcept / companyfacts API.

Why use it

  • SEC EDGAR API for AI without writing a scraper.
  • 10-K / 10-Q / 8-K filings as RAG chunks with working citations and deep links.
  • XBRL fundamentals — Revenues, Net Income, EPS, Total Assets, operating/investing/financing cash flow.
  • Drop-in formats for LangChain, pgvector / vector DB bulk upsert, or generic JSONL.
  • Built for financial analysis agents, equity research copilots, and compliance/risk tooling.

Input

| Field | Type | Default | Notes |
|---|---|---|---|
| tickers | string[] | | e.g. ["AAPL","MSFT"] |
| ciks | string[] | | raw CIKs, zero-padded internally |
| filingTypes | string[] | ["10-K","10-Q","8-K"] | EDGAR form types |
| since / until | string | | ISO date window on filingDate |
| fullTextQuery | string | | phrase search across all of EDGAR |
| maxFilingsPerCompany | int | 25 | per-company cap |
| maxFilingsTotal | int | 0 | global cap (0 = unlimited) |
| extractChunks | bool | true | fetch and chunk primary documents |
| chunkSize / chunkOverlap | int | 512 / 64 | tokens per chunk / tokens shared between consecutive chunks |
| splitByItem | bool | true | segment on SEC Item boundaries |
| includeFundamentals | bool | true | attach XBRL facts |
| xbrlConcepts | string[] | built-in list (20 concepts) | overrides the us-gaap allowlist |
| xbrlSource | enum | companyconcept | companyconcept or companyfacts |
| outputFormat | enum | chunks-jsonl | chunks-jsonl, langchain, jsonl-bulk, filings-only |
| userAgentEmail | string | | contact email for the SEC User-Agent |

At least one of tickers, ciks, or fullTextQuery is required.
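The documented defaults and the "at least one of" rule could be applied before any request is made, as in this minimal sketch. `ActorInput` mirrors the input table, but several pieces are illustrative assumptions rather than the actor's code: `normalizeInput` itself, the CIK digit check, ticker upper-casing, and the `chunkOverlap < chunkSize` check.

```typescript
// Sketch only: field names follow the input table; validation rules beyond
// "at least one of tickers, ciks, or fullTextQuery" are assumptions.

interface ActorInput {
  tickers?: string[];
  ciks?: string[];
  filingTypes?: string[];
  since?: string;
  until?: string;
  fullTextQuery?: string;
  maxFilingsPerCompany?: number;
  maxFilingsTotal?: number;
  extractChunks?: boolean;
  chunkSize?: number;
  chunkOverlap?: number;
  splitByItem?: boolean;
  includeFundamentals?: boolean;
  xbrlConcepts?: string[];
  xbrlSource?: "companyconcept" | "companyfacts";
  outputFormat?: "chunks-jsonl" | "langchain" | "jsonl-bulk" | "filings-only";
  userAgentEmail?: string;
}

function normalizeInput(raw: ActorInput) {
  const tickers = (raw.tickers ?? []).map((t) => t.trim().toUpperCase()).filter((t) => t.length > 0);
  const ciks = (raw.ciks ?? []).map((c) => {
    const digits = c.trim();
    if (!/^\d{1,10}$/.test(digits)) throw new Error(`invalid CIK: ${c}`);
    return digits.padStart(10, "0"); // data.sec.gov URLs use 10-digit, zero-padded CIKs
  });
  const fullTextQuery = raw.fullTextQuery?.trim() || undefined;
  if (tickers.length === 0 && ciks.length === 0 && !fullTextQuery) {
    throw new Error("At least one of tickers, ciks, or fullTextQuery is required.");
  }
  const chunkSize = raw.chunkSize ?? 512;
  const chunkOverlap = raw.chunkOverlap ?? 64;
  if (chunkOverlap >= chunkSize) throw new Error("chunkOverlap must be smaller than chunkSize");
  return {
    ...raw,
    tickers,
    ciks,
    fullTextQuery,
    filingTypes: raw.filingTypes ?? ["10-K", "10-Q", "8-K"],
    maxFilingsPerCompany: raw.maxFilingsPerCompany ?? 25,
    maxFilingsTotal: raw.maxFilingsTotal ?? 0, // 0 = unlimited
    extractChunks: raw.extractChunks ?? true,
    chunkSize,
    chunkOverlap,
    splitByItem: raw.splitByItem ?? true,
    includeFundamentals: raw.includeFundamentals ?? true,
    xbrlSource: raw.xbrlSource ?? "companyconcept",
    outputFormat: raw.outputFormat ?? "chunks-jsonl",
  };
}
```

For example, `normalizeInput({ tickers: ["aapl"] })` passes validation. `normalizeInput({})` throws the "at least one of" error.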

Output

Two record kinds land in the dataset:

  • recordType: "chunk" — the primary RAG artifact. Markdown text + headingsPath + a citation block (ticker, CIK, form, accession, filing date, Item code/title, deep link, and a human-readable citation string).
  • recordType: "filing" — one per filing, carrying fundamentals (latest observation per GAAP concept) and provenance. outputFormat: "filings-only" emits only these.
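One way to model a chunk record and assemble its citation block is sketched below. Several parts are assumptions: the field names inside `citation`, the wording of the human-readable string, and the helpers `deepLink` and `buildCitation`. The deep link also assumes EDGAR's standard archive layout: the unpadded CIK, then the accession number without dashes, then the primary document's file name.

```typescript
// Sketch only: record shapes and citation wording are hypothetical.

interface Citation {
  ticker: string;
  cik: string; // 10-digit, zero-padded
  form: string;
  accessionNumber: string; // dashed form, e.g. "0000320193-23-000106"
  filingDate: string;
  itemCode: string | null;
  itemTitle: string | null;
  url: string;
  citation: string;
}

interface ChunkRecord {
  recordType: "chunk";
  text: string; // Markdown
  headingsPath: string[];
  citation: Citation;
}

/** /Archives/edgar/data/<unpadded CIK>/<accession without dashes>/<document> */
function deepLink(cik: string, accessionNumber: string, primaryDocument: string): string {
  const cikNoPad = String(Number(cik));
  const accNoDashes = accessionNumber.replace(/-/g, "");
  return `https://www.sec.gov/Archives/edgar/data/${cikNoPad}/${accNoDashes}/${primaryDocument}`;
}

/** Adds the deep link and a readable citation string to the filing/Item metadata. */
function buildCitation(c: Omit<Citation, "url" | "citation">, primaryDocument: string): Citation {
  const url = deepLink(c.cik, c.accessionNumber, primaryDocument);
  const item = c.itemCode ? `, Item ${c.itemCode}${c.itemTitle ? ` (${c.itemTitle})` : ""}` : "";
  return {
    ...c,
    url,
    citation: `${c.ticker} ${c.form} filed ${c.filingDate}${item}, accession ${c.accessionNumber}. ${url}`,
  };
}
```

Keeping the citation on every chunk, rather than only on the filing record, lets a retriever quote a chunk without a second lookup.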

Default GAAP concepts (us-gaap)

Revenues, RevenueFromContractWithCustomerExcludingAssessedTax, CostOfRevenue, GrossProfit, OperatingIncomeLoss, NetIncomeLoss, EarningsPerShareBasic, EarningsPerShareDiluted, ResearchAndDevelopmentExpense, Assets, AssetsCurrent, Liabilities, LiabilitiesCurrent, StockholdersEquity, CashAndCashEquivalentsAtCarryingValue, LongTermDebtNoncurrent, NetCashProvidedByUsedInOperatingActivities, NetCashProvidedByUsedInInvestingActivities, NetCashProvidedByUsedInFinancingActivities, PaymentsToAcquirePropertyPlantAndEquipment.
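With the companyconcept source, each concept comes back from data.sec.gov as JSON. Its `units` object maps a unit (such as `USD` or `USD/shares`) to a list of reported facts. The sketch below shows one plausible reading of "latest observation per GAAP concept": the fact with the latest period `end` date, with ties broken by the later `filed` date. That rule, the function names, and the trimmed response type are assumptions. A fuller version would likely also filter by fiscal period (`fp`) or period length so quarterly and annual values are not mixed.

```typescript
// Sketch only: a subset of the companyconcept response, plus a hypothetical
// "latest observation" rule (max period end, then max filed date).

interface ConceptFact {
  start?: string; // present for duration facts (e.g. revenue), absent for instants (e.g. assets)
  end: string; // ISO date, so string comparison orders chronologically
  val: number;
  accn: string;
  fy?: number;
  fp?: string;
  form: string;
  filed: string;
}

interface CompanyConcept {
  cik: number;
  taxonomy: string;
  tag: string;
  units: Record<string, ConceptFact[]>;
}

function companyConceptUrl(cik: string, tag: string, taxonomy = "us-gaap"): string {
  return `https://data.sec.gov/api/xbrl/companyconcept/CIK${cik.padStart(10, "0")}/${taxonomy}/${tag}.json`;
}

/** Returns the newest fact across all units, or null if the concept has no facts. */
function latestObservation(concept: CompanyConcept): { unit: string; fact: ConceptFact } | null {
  let best: { unit: string; fact: ConceptFact } | null = null;
  for (const [unit, facts] of Object.entries(concept.units)) {
    for (const fact of facts) {
      const newer =
        best === null ||
        fact.end > best.fact.end ||
        (fact.end === best.fact.end && fact.filed > best.fact.filed);
      if (newer) best = { unit, fact };
    }
  }
  return best;
}
```

Preferring the later `filed` date on ties means an amended value (for example from a 10-K/A) replaces the originally reported one for the same period.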

Endpoints used (all keyless)

  • https://www.sec.gov/files/company_tickers.json
  • https://data.sec.gov/submissions/CIK##########.json
  • https://efts.sec.gov/LATEST/search-index
  • https://www.sec.gov/Archives/edgar/data/.../index.json + primary documents
  • https://data.sec.gov/api/xbrl/companyconcept/... and .../companyfacts/...
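The first two endpoints can be consumed with a few lines of TypeScript. This sketch assumes the following about the public files:

- `company_tickers.json` is an object whose values carry `cik_str`, `ticker`, and `title`.
- The submissions JSON exposes `filings.recent` as parallel arrays (`accessionNumber`, `filingDate`, `form`, `primaryDocument`, among others), ordered newest first.

The helper names are hypothetical, and `limit` stands in for `maxFilingsPerCompany`. Form matching is exact, so `10-K/A` amendments are skipped unless listed. The `recent` block holds only the most recent filings; older history lives in additional files referenced from `filings.files`, which this sketch ignores.

```typescript
// Sketch only: hypothetical helper names; JSON shapes are assumptions about the
// public SEC files, reduced to the fields used here.

type CompanyTickers = Record<string, { cik_str: number; ticker: string; title: string }>;

interface RecentFilings {
  accessionNumber: string[];
  filingDate: string[];
  form: string[];
  primaryDocument: string[];
}

interface FilingRef {
  accessionNumber: string;
  filingDate: string;
  form: string;
  primaryDocument: string;
}

/** Maps upper-case ticker -> 10-digit zero-padded CIK. */
function buildTickerIndex(json: CompanyTickers): Map<string, string> {
  const index = new Map<string, string>();
  for (const entry of Object.values(json)) {
    index.set(entry.ticker.toUpperCase(), String(entry.cik_str).padStart(10, "0"));
  }
  return index;
}

/** Keeps up to `limit` filings whose form is wanted and whose filingDate lies in [since, until]. */
function filterFilings(recent: RecentFilings, forms: string[], since?: string, until?: string, limit = 25): FilingRef[] {
  const wanted = new Set(forms);
  const out: FilingRef[] = [];
  for (let i = 0; i < recent.accessionNumber.length && out.length < limit; i++) {
    const form = recent.form[i];
    const filingDate = recent.filingDate[i];
    if (!wanted.has(form)) continue;
    if (since !== undefined && filingDate < since) continue; // ISO dates compare lexicographically
    if (until !== undefined && filingDate > until) continue;
    out.push({ accessionNumber: recent.accessionNumber[i], filingDate, form, primaryDocument: recent.primaryDocument[i] });
  }
  return out;
}

/** Fetches JSON with a User-Agent header, as the SEC requires (Node 18+ global fetch). */
async function fetchSecJson<T>(url: string, userAgent: string): Promise<T> {
  const res = await fetch(url, { headers: { "User-Agent": userAgent } });
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  return (await res.json()) as T;
}
```

Building the ticker index once per run avoids re-downloading `company_tickers.json` for every ticker.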

Local development

npm install
npm run build
apify run # uses .actor/INPUT.json