SEC EDGAR Filings & Fundamentals API — RAG-Ready
Under maintenance
Pricing
$2.00 / 1,000 dataset items scraped
Turn any ticker into clean, citation-tagged RAG chunks of SEC filings (10-K, 10-Q, 8-K, Form 4, 13F) plus normalized XBRL fundamentals. Official SEC EDGAR data, no API key. Full-text search, watch mode for new filings. Built for finance AI agents and RAG.
Developer
Harry Schoeller
Maintained by Community
Turn the U.S. SEC EDGAR database into embeddings-ready, citation-rich data for AI / RAG pipelines and finance agents. Give it tickers (or CIKs, or a full-text query); get back clean Markdown chunks of every 10-K, 10-Q, and 8-K — split by SEC Item section, token-budgeted, deep-linked — plus XBRL fundamentals (Revenues, Net Income, EPS, Assets, cash flow) attached to each filing.
No API keys. Keyless EDGAR endpoints only, with a polite 10 req/s token-bucket limiter and a descriptive User-Agent, exactly as the SEC requires.
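A token-bucket limiter of the kind described above can be sketched in a few lines. This is a minimal illustration, not the actor's own code: the class name `TokenBucket`, the helper `sec_headers`, and the User-Agent string format are assumptions made for the example. The clock and sleep functions are injectable so the behaviour can be checked without waiting in real time.

```python
import time


class TokenBucket:
    """Allow roughly `rate` acquisitions per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last refill, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available; return the seconds spent waiting."""
        waited = 0.0
        self._refill()
        while self.tokens < 1:
            delay = (1 - self.tokens) / self.rate
            self.sleep(delay)
            waited += delay
            self._refill()
        self.tokens -= 1
        return waited


def sec_headers(contact_email):
    # Hypothetical User-Agent format: an application name plus a contact address.
    return {"User-Agent": f"edgar-rag-example ({contact_email})"}


if __name__ == "__main__":
    bucket = TokenBucket(rate=10, capacity=10)
    for _ in range(12):
        bucket.acquire()  # call this before each HTTP request to EDGAR
    print(sec_headers("you@example.com"))
```

Calling `acquire()` before every request keeps the sustained rate at or below 10 requests per second while still allowing a short initial burst.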
What it does
- Ticker → CIK resolution via the canonical `company_tickers.json`.
- Filing listing & filtering by form type and date window from the EDGAR submissions API.
- Full-text search across all of EDGAR via `efts.sec.gov` (optional `fullTextQuery`).
- Primary document fetch + clean HTML → Markdown (Readability + Turndown/GFM; financial tables survive intact).
- SEC Item segmentation (`Item 1A. Risk Factors`, `Item 7. MD&A`, 8-K `Item 2.02`, …) so every chunk carries an Item anchor.
- Structure-aware, token-budgeted chunking with overlap and `headingsPath`.
- XBRL fundamentals per company via the `companyconcept` / `companyfacts` API.
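To make the Item segmentation and overlap chunking steps concrete, here is a rough Python sketch. It is not the actor's implementation: the regex, the function names, and the `itemCode` / `itemTitle` keys are hypothetical, and whitespace-separated words stand in for real tokenizer tokens.

```python
import re

# Matches headings such as "Item 1A. Risk Factors" or "Item 2.02 Results of ...",
# optionally prefixed by Markdown "#" markers. Illustrative pattern only.
ITEM_HEADING = re.compile(
    r"^(?:#+\s*)?Item\s+(\d+[A-Z]?(?:\.\d+)?)\.?\s+(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def split_by_item(markdown):
    """Split a filing's Markdown into sections, one per Item heading."""
    matches = list(ITEM_HEADING.finditer(markdown))
    sections = []
    for i, m in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append({
            "itemCode": m.group(1).upper(),
            "itemTitle": m.group(2).strip(),
            "text": markdown[m.end():end].strip(),
        })
    return sections


def chunk_tokens(text, chunk_size=512, overlap=64):
    """Cut text into windows of `chunk_size` tokens sharing `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()  # assumption: whitespace words approximate tokens
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    doc = "Item 1A. Risk Factors\nWe face risks.\n\nItem 7. MD&A\nRevenue grew.\n"
    for section in split_by_item(doc):
        print(section["itemCode"], chunk_tokens(section["text"], 4, 1))
```

Splitting on Item boundaries first and chunking within each section means no chunk straddles two Items, which is what lets every chunk carry a single Item anchor.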
Why use it
- SEC EDGAR API for AI without writing a scraper.
- 10-K / 10-Q / 8-K filings as RAG chunks with working citations and deep links.
- XBRL fundamentals — Revenues, Net Income, EPS, Total Assets, operating/investing/financing cash flow.
- Drop-in formats for LangChain, pgvector / vector DB bulk upsert, or generic JSONL.
- Built for financial analysis agents, equity research copilots, and compliance/risk tooling.
Input
| Field | Type | Default | Notes |
|---|---|---|---|
| `tickers` | string[] | - | e.g. `["AAPL","MSFT"]` |
| `ciks` | string[] | - | raw CIKs, zero-padded internally |
| `filingTypes` | string[] | `["10-K","10-Q","8-K"]` | EDGAR form types |
| `since` / `until` | string | - | ISO date window on `filingDate` |
| `fullTextQuery` | string | - | phrase search across all EDGAR |
| `maxFilingsPerCompany` | int | 25 | per-company cap |
| `maxFilingsTotal` | int | 0 | global cap (0 = unlimited) |
| `extractChunks` | bool | true | fetch + chunk primary docs |
| `chunkSize` / `chunkOverlap` | int | 512 / 64 | token budget |
| `splitByItem` | bool | true | segment on SEC Item boundaries |
| `includeFundamentals` | bool | true | attach XBRL facts |
| `xbrlConcepts` | string[] | built-in ~20 | override the us-gaap allowlist |
| `xbrlSource` | enum | `companyconcept` | `companyconcept` or `companyfacts` |
| `outputFormat` | enum | `chunks-jsonl` | `chunks-jsonl`, `langchain`, `jsonl-bulk`, `filings-only` |
| `userAgentEmail` | string | - | contact email for the SEC User-Agent |
At least one of tickers, ciks, or fullTextQuery is required.
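As an illustration, an input object using the fields above might look like the following, together with a small pre-flight check. The values are arbitrary examples, and `validate_input` is a hypothetical helper written for this sketch; it only mirrors the rules stated in the table and is not the actor's own validation.

```python
import json

# Example input; every key comes from the table above, the values are arbitrary.
actor_input = {
    "tickers": ["AAPL", "MSFT"],
    "filingTypes": ["10-K", "10-Q"],
    "since": "2023-01-01",
    "maxFilingsPerCompany": 5,
    "chunkSize": 512,
    "chunkOverlap": 64,
    "xbrlSource": "companyconcept",
    "outputFormat": "chunks-jsonl",
    "userAgentEmail": "you@example.com",
}

OUTPUT_FORMATS = {"chunks-jsonl", "langchain", "jsonl-bulk", "filings-only"}
XBRL_SOURCES = {"companyconcept", "companyfacts"}


def validate_input(data):
    """Raise ValueError if the input breaks a rule documented in the table."""
    if not any(data.get(key) for key in ("tickers", "ciks", "fullTextQuery")):
        raise ValueError("Provide at least one of tickers, ciks, or fullTextQuery")
    if data.get("outputFormat", "chunks-jsonl") not in OUTPUT_FORMATS:
        raise ValueError("Unknown outputFormat")
    if data.get("xbrlSource", "companyconcept") not in XBRL_SOURCES:
        raise ValueError("Unknown xbrlSource")


if __name__ == "__main__":
    validate_input(actor_input)
    print(json.dumps(actor_input, indent=2))
```

The printed JSON could be saved as the run input, for example in the `.actor/INPUT.json` file that local runs read.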
Output
Two record kinds land in the dataset:
- `recordType: "chunk"`: the primary RAG artifact. Markdown text + `headingsPath` + a citation block (ticker, CIK, form, accession, filing date, Item code/title, deep link, and a human-readable `citation` string).
- `recordType: "filing"`: one per filing, carrying `fundamentals` (latest observation per GAAP concept) and provenance. `outputFormat: "filings-only"` emits only these.
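A downstream consumer will usually want to route the two record kinds separately, for example sending chunks to an embedding step and filings to a fundamentals table. The sketch below assumes the dataset has been exported as JSON Lines. The sample records are made-up placeholders, and any field beyond `recordType`, `headingsPath`, `citation`, and `fundamentals` is a guess at the shape rather than the documented schema.

```python
import json

# Placeholder records for illustration only; not real actor output.
sample_lines = [
    '{"recordType": "chunk", "text": "placeholder chunk text", '
    '"headingsPath": ["Item 1A. Risk Factors"], "citation": "placeholder citation"}',
    '{"recordType": "filing", "fundamentals": {"Revenues": 0}}',
    "",
]


def split_records(lines):
    """Separate chunk records from filing records in a JSONL export."""
    chunks, filings = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        record = json.loads(line)
        kind = record.get("recordType")
        if kind == "chunk":
            chunks.append(record)
        elif kind == "filing":
            filings.append(record)
    return chunks, filings


if __name__ == "__main__":
    chunks, filings = split_records(sample_lines)
    print(len(chunks), "chunk record(s),", len(filings), "filing record(s)")
```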
Default GAAP concepts (us-gaap)
Revenues, RevenueFromContractWithCustomerExcludingAssessedTax, CostOfRevenue, GrossProfit, OperatingIncomeLoss, NetIncomeLoss, EarningsPerShareBasic, EarningsPerShareDiluted, ResearchAndDevelopmentExpense, Assets, AssetsCurrent, Liabilities, LiabilitiesCurrent, StockholdersEquity, CashAndCashEquivalentsAtCarryingValue, LongTermDebtNoncurrent, NetCashProvidedByUsedInOperatingActivities, NetCashProvidedByUsedInInvestingActivities, NetCashProvidedByUsedInFinancingActivities, PaymentsToAcquirePropertyPlantAndEquipment.
Endpoints used (all keyless)
- `https://www.sec.gov/files/company_tickers.json`
- `https://data.sec.gov/submissions/CIK##########.json`
- `https://efts.sec.gov/LATEST/search-index`
- `https://www.sec.gov/Archives/edgar/data/.../index.json` + primary documents
- `https://data.sec.gov/api/xbrl/companyconcept/...` and `.../companyfacts/...`
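The `CIK##########` placeholder in the submissions URL is a ten-digit, zero-padded CIK. The sketch below shows one way to resolve a ticker and build that URL. The helper names are hypothetical, and the shape of `company_tickers.json` (index keys mapping to objects with `cik_str` and `ticker` fields) is an assumption about the file's layout that should be checked against the live file.

```python
def pad_cik(cik):
    """Return a CIK as a ten-digit, zero-padded string."""
    digits = str(cik).strip().lstrip("0") or "0"
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"Not a valid CIK: {cik!r}")
    return digits.zfill(10)


def submissions_url(cik):
    """Build the EDGAR submissions URL for a CIK."""
    return f"https://data.sec.gov/submissions/CIK{pad_cik(cik)}.json"


def cik_for_ticker(ticker_index, ticker):
    """Look up a ticker in a parsed company_tickers.json payload (assumed shape)."""
    wanted = ticker.strip().upper()
    for entry in ticker_index.values():
        if str(entry.get("ticker", "")).upper() == wanted:
            return pad_cik(entry["cik_str"])
    raise KeyError(f"Unknown ticker: {ticker}")


# One-entry excerpt in the assumed layout, standing in for the downloaded file.
SAMPLE_TICKERS = {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}}

if __name__ == "__main__":
    cik = cik_for_ticker(SAMPLE_TICKERS, "aapl")
    print(submissions_url(cik))
```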
Local development
    npm install
    npm run build
    apify run   # uses .actor/INPUT.json