SEC Filings to Markdown for RAG

Pricing

from $40.00 / 1,000 markdown chunks

Convert SEC EDGAR filings (10-K, 10-Q, 8-K, 13F) into clean, chunked, citation-tagged Markdown for RAG and LLM pipelines. Official data, no login.

Developer

NexGenData

Maintained by Community

📑 SEC Filings to Markdown for RAG · EDGAR → LLM-Ready

Convert SEC EDGAR filings into clean, chunked, citation-tagged Markdown — built for AI engineers feeding financial filings into RAG pipelines and LLM agents.

⚡ What you get

| Field | Description |
| --- | --- |
| company / cik / ticker | Issuer identity |
| form | Filing type (10-K, 10-Q, 8-K, 13F, …) |
| filingDate | Date filed |
| accessionNumber | SEC accession number (citation) |
| sourceUrl | Direct link to the source document (citation) |
| chunkIndex / totalChunks | Position within the filing |
| markdown | Clean Markdown chunk, ready for embedding |

🎯 Use cases

  1. AI engineers building financial-research copilots / RAG over filings
  2. Quant & fundamental analysts loading 10-Ks into a vector store
  3. Compliance teams building searchable filing knowledge bases
  4. Fintech products needing LLM-ready filing text with citations

🚀 Sample inputs

{ "companies": ["AAPL","MSFT"], "formTypes": ["10-K","10-Q"], "maxFilingsPerCompany": 2, "chunkWords": 800 }
{ "companies": ["NVDA"], "formTypes": ["8-K"], "maxFilingsPerCompany": 5 }
{ "companies": ["320193"], "formTypes": ["13F-HR"] }

📦 Sample output

{ "company": "Apple Inc.", "cik": "0000320193", "ticker": "AAPL", "form": "10-K",
"filingDate": "2025-11-01", "accessionNumber": "0000320193-25-000123",
"sourceUrl": "https://www.sec.gov/Archives/edgar/data/320193/.../aapl-20250927.htm",
"chunkIndex": 0, "totalChunks": 42, "markdown": "# Item 1. Business\nThe Company designs..." }
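
One way to consume these rows downstream is to group chunks by accessionNumber, restore their order, and keep the citation fields next to each chunk's text. The sketch below assumes rows shaped like the sample above; `to_documents` and the layout of the citation string are hypothetical choices for illustration, not part of the actor.

```python
from collections import defaultdict


def to_documents(rows):
    """Group chunk rows by filing and attach a citation string to each chunk."""
    by_filing = defaultdict(list)
    for row in rows:
        by_filing[row["accessionNumber"]].append(row)

    documents = []
    for accession, chunks in by_filing.items():
        # Restore reading order; rows are not assumed to arrive sorted.
        chunks.sort(key=lambda r: r["chunkIndex"])
        for r in chunks:
            citation = (
                f'{r["company"]}, {r["form"]} filed {r["filingDate"]}, '
                f'accession {accession}, '
                f'chunk {r["chunkIndex"] + 1}/{r["totalChunks"]}'
            )
            documents.append(
                {"text": r["markdown"], "citation": citation, "url": r["sourceUrl"]}
            )
    return documents
```

Each returned dict can be embedded from its `text` field while `citation` and `url` travel along as metadata, so an answer generated from a chunk can point back to the filing it came from.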

🛠 How it works

  1. Source — resolves tickers→CIK and reads filings from the official SEC EDGAR APIs (data.sec.gov, www.sec.gov/Archives).
  2. Parser — strips scripts/styles and converts filing HTML to ATX Markdown.
  3. Chunking — splits into ~chunkWords-word chunks for embedding.
  4. Schema — one row per chunk with full citation fields (accession + source URL).
  5. Fallback — unresolved tickers / failed docs are logged and skipped; the run still succeeds.
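
The chunking step can be pictured with a short sketch. This is an assumption about how word-budget chunking might work, not the actor's actual implementation: it packs blank-line-separated Markdown blocks into chunks until the next block would exceed the budget, so headings and paragraphs are never split mid-block. `chunk_markdown` is a hypothetical name; only the `chunkWords` default of 800 comes from this page.

```python
def chunk_markdown(markdown, chunk_words=800):
    """Pack blank-line-separated blocks into chunks of roughly `chunk_words` words."""
    if chunk_words < 1:
        raise ValueError("chunk_words must be positive")

    chunks, current, count = [], [], 0
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        n = len(block.split())
        # Start a new chunk when adding this block would exceed the budget.
        # A single block larger than the budget becomes its own oversized chunk.
        if current and count + n > chunk_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(block)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Enumerating the result gives the per-row position fields: the list index corresponds to chunkIndex and `len(chunks)` to totalChunks.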

💰 Pricing Example

Pay-per-event: $0.005 per run + $0.04 per Markdown chunk (document-record).

| Chunks | Cost |
| --- | --- |
| 100 | ~$4.00 |
| 500 | ~$20.00 |
| 2,000 | ~$80.00 |

Apify's $5 free credit covers ~124 chunks.
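
The arithmetic behind these figures can be checked with a small sketch. It uses only the two pay-per-event rates quoted above and assumes a single run; `run_cost` and `chunks_within` are hypothetical helper names. `Decimal` is used so that cent-level amounts are not affected by float rounding.

```python
from decimal import Decimal

RUN_FEE = Decimal("0.005")   # charged once per run
CHUNK_FEE = Decimal("0.04")  # charged per Markdown chunk


def run_cost(chunks):
    """Cost in dollars of one run that emits `chunks` chunks."""
    return RUN_FEE + CHUNK_FEE * chunks


def chunks_within(budget):
    """Largest whole number of chunks one run can emit within `budget` dollars."""
    remaining = Decimal(str(budget)) - RUN_FEE
    if remaining < 0:
        return 0
    return int(remaining // CHUNK_FEE)
```

For example, `run_cost(100)` is 4.005 dollars, which the table rounds to ~$4.00, and `chunks_within(5)` returns 124, matching the free-credit estimate.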

Data is from the public SEC EDGAR system (data.sec.gov, www.sec.gov) — U.S. government public-domain filings. Requests use an identified, contact-bearing User-Agent per SEC access guidance. You are responsible for your downstream use.

❓ FAQ

  • Which forms are supported? Any EDGAR form type. Pass them in formTypes (10-K, 10-Q, 8-K, 13F-HR, DEF 14A, …).
  • Ticker or CIK? Either; tickers are resolved to CIK automatically.
  • Are citations included? Yes. Every chunk carries the accession number and source URL.
  • How big are chunks? ~chunkWords words (default 800); tune for your embedder.
  • Is the data fresh? Yes. It is pulled live from EDGAR at run time.
  • Cost control? Use maxFilingsPerCompany and formTypes to bound output.

🆘 Troubleshooting

  • Company not found — check the ticker symbol or pass the CIK directly.
  • 0 chunks — the requested formTypes may not exist in that issuer's recent filings.
  • Huge output — lower maxFilingsPerCompany or narrow formTypes.
  • Markdown noise from exhibits — narrow to the primary form types you need.

🏷️ About NexGenData

NexGenData builds structured public-data tools for analysts, developers, and operators. Full catalog: thenextgennexus.com.