SEC Filings to Markdown for RAG
Pricing
from $40.00 / 1,000 markdown chunks
Convert SEC EDGAR filings (10-K, 10-Q, 8-K, 13F) into clean, chunked, citation-tagged Markdown for RAG and LLM pipelines. Official data, no login.
Developer: NexGenData (maintained by Community)
📑 SEC Filings to Markdown for RAG · EDGAR → LLM-Ready
Convert SEC EDGAR filings into clean, chunked, citation-tagged Markdown — built for AI engineers feeding financial filings into RAG pipelines and LLM agents.
⚡ What you get
| Field | Description |
|---|---|
| company / cik / ticker | Issuer identity |
| form | Filing type (10-K, 10-Q, 8-K, 13F, …) |
| filingDate | Date filed |
| accessionNumber | SEC accession number (citation) |
| sourceUrl | Direct link to the source document (citation) |
| chunkIndex / totalChunks | Position within the filing |
| markdown | Clean Markdown chunk, ready for embedding |
🎯 Use cases
- AI engineers building financial-research copilots / RAG over filings
- Quant & fundamental analysts loading 10-Ks into a vector store
- Compliance teams building searchable filing knowledge bases
- Fintech products needing LLM-ready filing text with citations
🚀 Sample inputs
{ "companies": ["AAPL","MSFT"], "formTypes": ["10-K","10-Q"], "maxFilingsPerCompany": 2, "chunkWords": 800 }
{ "companies": ["NVDA"], "formTypes": ["8-K"], "maxFilingsPerCompany": 5 }
{ "companies": ["320193"], "formTypes": ["13F-HR"] }
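The sample inputs above share four keys. As a minimal sketch, the Python helper below assembles and sanity-checks such an input before a run. The helper name `build_run_input` and its validation rules are illustrative assumptions, not part of the Actor; only the key names come from the samples.

```python
from typing import Any, Optional


def build_run_input(
    companies: list[str],
    form_types: list[str],
    max_filings_per_company: Optional[int] = None,
    chunk_words: Optional[int] = None,
) -> dict[str, Any]:
    """Assemble an input dict using the key names from the samples above."""
    # Assumption: at least one company and one form type are expected.
    if not companies:
        raise ValueError("companies must contain at least one ticker or CIK")
    if not form_types:
        raise ValueError("form_types must contain at least one form type")

    run_input: dict[str, Any] = {
        "companies": [c.strip() for c in companies],
        "formTypes": [f.strip() for f in form_types],
    }
    # Optional keys are only included when set, so the Actor's defaults apply otherwise.
    for key, value in (
        ("maxFilingsPerCompany", max_filings_per_company),
        ("chunkWords", chunk_words),
    ):
        if value is not None:
            if value <= 0:
                raise ValueError(f"{key} must be a positive integer")
            run_input[key] = value
    return run_input


print(build_run_input(["AAPL", "MSFT"], ["10-K", "10-Q"], 2, 800))
```

The resulting dict has the same shape as the first sample input and can be supplied as the run input however you normally start Actor runs.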
📦 Sample output
{ "company": "Apple Inc.", "cik": "0000320193", "ticker": "AAPL", "form": "10-K", "filingDate": "2025-11-01", "accessionNumber": "0000320193-25-000123", "sourceUrl": "https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm", "chunkIndex": 0, "totalChunks": 42, "markdown": "# Item 1. Business\nThe Company designs..." }
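A row like the one above can be mapped to a vector-store record with its citation fields kept as metadata. The sketch below is one possible mapping, not something the Actor ships: the function names `to_vector_record` and `format_citation` are hypothetical, and the `accessionNumber:chunkIndex` ID assumes that pair is unique across your dataset. The sample row is copied from above, including its elided URL.

```python
from typing import Any

# Sample row copied from the output above (the "..." in the URL is elided in the source).
row = {
    "company": "Apple Inc.", "cik": "0000320193", "ticker": "AAPL",
    "form": "10-K", "filingDate": "2025-11-01",
    "accessionNumber": "0000320193-25-000123",
    "sourceUrl": "https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm",
    "chunkIndex": 0, "totalChunks": 42,
    "markdown": "# Item 1. Business\nThe Company designs...",
}


def to_vector_record(row: dict[str, Any]) -> dict[str, Any]:
    """Split a row into an ID, the text to embed, and citation metadata."""
    metadata = {k: v for k, v in row.items() if k != "markdown"}
    return {
        # Assumption: accession number plus chunk index identifies a chunk.
        "id": f'{row["accessionNumber"]}:{row["chunkIndex"]}',
        "text": row["markdown"],
        "metadata": metadata,
    }


def format_citation(row: dict[str, Any]) -> str:
    """Render a human-readable citation from the row's citation fields."""
    return (
        f'{row["company"]}, Form {row["form"]}, filed {row["filingDate"]} '
        f'(accession {row["accessionNumber"]}, '
        f'chunk {row["chunkIndex"] + 1} of {row["totalChunks"]}): {row["sourceUrl"]}'
    )


record = to_vector_record(row)
print(record["id"])
print(format_citation(row))
```

Keeping the full citation in metadata lets a retrieval step return the source link alongside each matched chunk.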
🛠 How it works
- Source: resolves tickers→CIK and reads filings from the official SEC EDGAR APIs (data.sec.gov, www.sec.gov/Archives).
- Parser: strips scripts/styles and converts filing HTML to ATX Markdown.
- Chunking: splits into ~chunkWords-word chunks for embedding.
- Schema: one row per chunk with full citation fields (accession + source URL).
- Fallback: unresolved tickers / failed docs are logged and skipped; the run still succeeds.
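The chunking step can be illustrated with a short Python sketch. This is not the Actor's actual implementation, which is not published here: it assumes chunks are built greedily from blank-line-separated Markdown blocks up to a word budget, and the names `chunk_markdown` and `to_rows` are hypothetical.

```python
def chunk_markdown(markdown: str, chunk_words: int = 800) -> list[str]:
    """Greedily pack blank-line-separated blocks into chunks of about chunk_words words."""
    if chunk_words <= 0:
        raise ValueError("chunk_words must be positive")
    chunks: list[str] = []
    current: list[str] = []
    current_words = 0
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        n = len(block.split())
        # Start a new chunk when adding this block would exceed the budget.
        if current and current_words + n > chunk_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        current.append(block)
        current_words += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def to_rows(chunks: list[str]) -> list[dict]:
    """Attach the positional fields from the output schema to each chunk."""
    total = len(chunks)
    return [
        {"chunkIndex": i, "totalChunks": total, "markdown": text}
        for i, text in enumerate(chunks)
    ]


sample = "\n\n".join(["# Item 1. Business", "The Company designs products.", "More text here."])
for r in to_rows(chunk_markdown(sample, chunk_words=6)):
    print(r["chunkIndex"], r["totalChunks"], repr(r["markdown"]))
```

Splitting on block boundaries keeps headings and paragraphs intact, at the cost of uneven chunk sizes; in this sketch a single block longer than the budget becomes its own oversized chunk.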
🔗 Related Actors
- SEC EDGAR Scraper — structured filing data
- SEC Form 13F Holdings Tracker — institutional holdings
- RAG Web Browser — web content for retrieval
- Website Content Crawler — full-site Markdown for AI
💰 Pricing Example
Pay-per-event: $0.005 per run + $0.04 per Markdown chunk (document-record).
| Chunks | Cost |
|---|---|
| 100 | ~$4.00 |
| 500 | ~$20.00 |
| 2,000 | ~$80.00 |
Apify's $5 free credit covers ~124 chunks.
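The figures above follow from the two event prices. A small Python sketch of the arithmetic, using only the $0.005 per-run and $0.04 per-chunk prices quoted in this section (the function names are illustrative):

```python
from decimal import Decimal

PER_RUN_USD = Decimal("0.005")   # per-run event price quoted above
PER_CHUNK_USD = Decimal("0.04")  # per-chunk event price quoted above


def estimate_cost(chunks: int, runs: int = 1) -> Decimal:
    """Total cost in USD for a number of chunks spread over a number of runs."""
    return runs * PER_RUN_USD + chunks * PER_CHUNK_USD


def max_chunks_for_budget(budget_usd: str, runs: int = 1) -> int:
    """Largest whole number of chunks that fits in the budget after run fees."""
    remaining = Decimal(budget_usd) - runs * PER_RUN_USD
    if remaining <= 0:
        return 0
    return int(remaining // PER_CHUNK_USD)


for n in (100, 500, 2000):
    print(n, estimate_cost(n))
print(max_chunks_for_budget("5"))
```

For a single run, 100 chunks cost $4.005, which the table rounds to ~$4.00, and a $5 budget leaves $4.995 after the run fee, enough for 124 chunks.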
⚖️ Legal & data sources
Data is from the public SEC EDGAR system (data.sec.gov, www.sec.gov) — U.S. government public-domain filings. Requests use an identified, contact-bearing User-Agent per SEC access guidance. You are responsible for your downstream use.
❓ FAQ
Which forms are supported? Any EDGAR form type — pass them in formTypes (10-K, 10-Q, 8-K, 13F-HR, DEF 14A, …).
Ticker or CIK? Either; tickers are resolved to CIK automatically.
Are citations included? Yes — every chunk carries the accession number and source URL.
How big are chunks? ~chunkWords words (default 800); tune for your embedder.
Is the data fresh? Pulled live from EDGAR at run time.
Cost control? Use maxFilingsPerCompany and formTypes to bound output.
🆘 Troubleshooting
- Company not found: check the ticker symbol or pass the CIK directly.
- 0 chunks: the requested formTypes may not exist in that issuer's recent filings.
- Huge output: lower maxFilingsPerCompany or narrow formTypes.
- Markdown noise from exhibits: narrow to the primary form types you need.
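When a company is not found, it can help to check what kind of identifier you are sending. The Python sketch below classifies an entry as a CIK or a ticker and zero-pads CIKs to the 10-digit form seen in the sample output (`0000320193`). The helper name `classify_company` and the rule "all digits means CIK" are assumptions for illustration; how the Actor itself distinguishes the two is not documented here.

```python
def classify_company(identifier: str) -> tuple[str, str]:
    """Return ("cik", zero-padded CIK) or ("ticker", upper-cased symbol)."""
    value = identifier.strip()
    if not value:
        raise ValueError("empty company identifier")
    # Assumption: an all-digit entry is a CIK; anything else is a ticker.
    if value.isascii() and value.isdigit():
        if len(value) > 10:
            raise ValueError("a CIK has at most 10 digits")
        return ("cik", value.zfill(10))
    return ("ticker", value.upper())


for entry in ("320193", " aapl ", "NVDA"):
    print(entry, "->", classify_company(entry))
```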
🏷️ About NexGenData
NexGenData builds structured public-data tools for analysts, developers, and operators. Full catalog: thenextgennexus.com.