SEC EDGAR AI Scraper - LLM-ready filings, RAG chunks & signals

SEC EDGAR as LLM-ready data: section-aware Markdown, RAG chunks with stable IDs + SHA-256, XBRL facts, Form 4/13F, section diff, scored signals. Multi-event PPE. Pairs with sec-edgar-mcp.

Pricing

Pay per event + usage

Developer

Domin Vo

Maintained by Community

Pull SEC filings as clean JSON or LLM-ready Markdown — without writing a single line of EDGAR code. This Actor turns the entire SEC EDGAR corpus — 10-K, 10-Q, 8-K, 13F filings, Form 4 insider transactions, Form 144, XBRL facts, DEF 14A proxies, S-1 prospectuses, N-PORT mutual-fund holdings, Schedule 13D activist stakes, Form ADV, and 60+ more — into uniform rows you can pipe straight into an LLM, vector store, or notebook.

What does SEC EDGAR AI Scraper do?

One Actor, 70 modes, every U.S. public company. Pick a mode (e.g. form_13f_filings, form_4_filings, xbrl_facts), feed it a ticker (AAPL), CIK (0000320193), or company name, and get structured rows back in seconds. No User-Agent headers, no fair-use throttling, no XBRL parser to maintain.

Every row uses the same envelope — sha256, mode, cik, accession, filing_date, payload — so a 13F holding and an 8-K filing ingest the same way downstream.
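Because the envelope is shared, one parser can front every mode. The following is a minimal sketch, assuming only the six envelope keys listed above; the `Envelope` class and `parse_row` helper are hypothetical names for illustration, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Any, Mapping

# The six envelope keys named in the text above.
ENVELOPE_KEYS = ("sha256", "mode", "cik", "accession", "filing_date", "payload")


@dataclass(frozen=True)
class Envelope:
    """Hypothetical typed view of one output row (illustration only)."""
    sha256: str
    mode: str
    cik: str
    accession: str
    filing_date: str
    payload: Mapping[str, Any]


def parse_row(row: Mapping[str, Any]) -> Envelope:
    """Validate that a row carries every envelope key, then wrap it."""
    missing = [key for key in ENVELOPE_KEYS if key not in row]
    if missing:
        raise ValueError(f"row is missing envelope keys: {missing}")
    return Envelope(**{key: row[key] for key in ENVELOPE_KEYS})
```

Downstream code can then branch on `envelope.mode` and read mode-specific fields from `envelope.payload` without a separate loader per form type.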

Three things make this Actor different from any other SEC EDGAR scraper:

  1. RAG-native output. filing_markdown, filing_chunks, and section_diff ship section-aware Markdown, stable chunk IDs with SHA-256 fingerprints, and year-over-year diffs — drop straight into LangChain, LlamaIndex, or your own retrieval pipeline.
  2. Hedge-fund signals. signal_pack, insider_cluster, lockup_expiration, discrete_signals, and activist_clustering compute composite signals on top of raw filings. You don't get a thin wrapper around data.sec.gov — you get the alpha layer.
  3. Pay only for rows you keep. 20 bundled event types, priced from $0.0001 to $0.04. The first 20 rows of every run are free — prototype at zero cost.

Why use SEC EDGAR AI Scraper?

  • Quant funds & analysts — pull 13F filings, Form 4 insider trades, Schedule 13D activist stakes, and Form 144 planned sales into Python, R, or a warehouse with one HTTP call. Build signals without building a parser.
  • Fintech & RegTech apps — power compliance copilots, audit tools, and disclosure-monitoring products with a real SEC filings API. Section diffs flag the exact paragraphs that changed between 10-Ks.
  • AI agents & RAG pipelines — section-aware Markdown plus deterministic chunk IDs give your agent citations with provenance. Every chunk has a SHA-256 it can quote back.
  • Bloomberg Terminal alternative on a budget — get the SEC slice of the Terminal (13F changes, insider trades, proxy fights, M&A registrations) for cents per query instead of $24K a seat.
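To illustrate the provenance point for agents and RAG pipelines, here is a small sketch of resolving a quoted SHA-256 back to its source row. It assumes rows follow the shared envelope (sha256, mode, cik, accession, filing_date); `index_by_sha256` and `resolve_citation` are hypothetical helper names, and nothing here depends on how the Actor computes the hash.

```python
from typing import Any, Dict, Iterable, Mapping, Optional


def index_by_sha256(rows: Iterable[Mapping[str, Any]]) -> Dict[str, Mapping[str, Any]]:
    """Build a lookup table from each row's sha256 to the row itself."""
    return {row["sha256"]: row for row in rows}


def resolve_citation(index: Mapping[str, Mapping[str, Any]], sha256: str) -> Optional[str]:
    """Turn a hash quoted by an agent into a human-readable source label."""
    row = index.get(sha256)
    if row is None:
        return None  # the agent cited something that is not in this dataset
    return (
        f'{row["mode"]} / CIK {row["cik"]} / '
        f'accession {row["accession"]} ({row["filing_date"]})'
    )
```

A `None` result is a cheap signal that a citation was hallucinated or came from a different run.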

How to use SEC EDGAR AI Scraper

  1. Click Try for free on the Actor page.
  2. Pick a mode (e.g. form_13f_filings).
  3. Enter identifiers — tickers (AAPL), CIKs (0000320193), names, or domains.
  4. (Optional) Set since / until dates, a forms filter, or a limit.
  5. Click Save & Start. Stream results to JSON, CSV, Excel, NDJSON, or Markdown.

Or call it from Python:

from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Start the Actor and block until the run finishes.
run = client.actor("dominvo/sec-edgar-ai-scraper").call(run_input={
    "mode": "form_13f_filings",
    "identifiers": ["BRK.A", "RENTECH"],
    "since": "2026-01-01",
    "limit": 200,
})

# Each dataset item is one envelope row; mode-specific fields live under "payload".
for row in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(row["payload"]["issuer"], row["payload"]["value"])

Schedule daily runs, chain modes via Apify integrations, or trigger on a webhook — same API.

Input — picking a mode

70 modes, one Actor. Pick the mode that matches the form or signal you need:

| Goal | Mode |
| --- | --- |
| Section 16 insider trades | form_4_filings |
| Institutional 13F holdings | form_13f_filings |
| Activist 5%+ stakes | form_13d_filings, activist_clustering |
| Planned insider sales (90-day early warning) | form_144_filings, lockup_expiration |
| Annual / quarterly reports | form_10k_filings, form_10q_filings |
| 8-K item alerts with severity tags | form_8k_filings, discrete_signals |
| Proxy fights, exec comp, audit fees | form_def14a_filings, def14a_exec_comp |
| IPO prospectus + lockup calendar | form_s1_filings, lockup_expiration |
| M&A registration deals | form_s4_filings, form_to_tender, form_14d9_recommendation |
| Foreign issuers (FPI / ADR / Canadian) | form_20f_filings, form_6k_filings, form_40f_filings |
| Mutual fund / ETF holdings + proxy votes | form_nport_filings, form_npx_filings |
| US-GAAP / IFRS XBRL data | xbrl_facts, xbrl_statements, xbrl_frames, xbrl_metrics |
| Cross-cutting EDGAR search | filings_search, filings_feed, filings_index |
| Section-aware Markdown for RAG | filing_markdown, filing_chunks |
| Year-over-year 10-K text diff | section_diff, disclosure_compare |
| Hedge-fund composite signals | signal_pack, insider_cluster |
| Resolve ticker ↔ CIK ↔ name ↔ domain | cik_ticker_map |
| SEC enforcement & adviser data | litigation_releases, aaer_releases, comment_letters, form_adv_filings |

Common inputs: identifiers (array), since / until (ISO date), forms (array filter), limit (cap rows), incremental (skip unchanged rows), output_format (json / ndjson / csv / xlsx / markdown).

{
  "mode": "form_4_filings",
  "identifiers": ["TSLA", "MSFT"],
  "since": "2026-01-01",
  "transaction_codes": ["S", "P"],
  "limit": 500
}
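When inputs are assembled in code, a small builder can catch malformed values before a run is started. This is a sketch under the assumption that the common inputs listed above behave as described; `build_run_input` is a hypothetical helper, and the checks are client-side conveniences rather than the Actor's own validation.

```python
from datetime import date
from typing import Any, Dict, Optional, Sequence

# Output formats named in the "Common inputs" line.
OUTPUT_FORMATS = {"json", "ndjson", "csv", "xlsx", "markdown"}


def build_run_input(
    mode: str,
    identifiers: Sequence[str],
    since: Optional[str] = None,
    until: Optional[str] = None,
    limit: Optional[int] = None,
    output_format: str = "json",
    **mode_options: Any,
) -> Dict[str, Any]:
    """Assemble a run_input dict, rejecting obviously bad values early."""
    if not identifiers:
        raise ValueError("identifiers must contain at least one entry")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    # date.fromisoformat raises ValueError on a malformed ISO date.
    since_date = date.fromisoformat(since) if since else None
    until_date = date.fromisoformat(until) if until else None
    if since_date and until_date and since_date > until_date:
        raise ValueError("since must not be later than until")
    if limit is not None and limit <= 0:
        raise ValueError("limit must be a positive integer")

    run_input: Dict[str, Any] = {
        "mode": mode,
        "identifiers": list(identifiers),
        "output_format": output_format,
    }
    if since:
        run_input["since"] = since
    if until:
        run_input["until"] = until
    if limit is not None:
        run_input["limit"] = limit
    # Mode-specific filters (e.g. transaction_codes) pass through untouched.
    run_input.update(mode_options)
    return run_input
```

For the Form 4 example above, `build_run_input("form_4_filings", ["TSLA", "MSFT"], since="2026-01-01", limit=500, transaction_codes=["S", "P"])` yields the same fields plus an explicit `output_format`.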

Output — JSON, CSV, Excel, Markdown

Every row uses the same envelope with a mode-specific payload:

{
  "sha256": "9f1c2a…",
  "mode": "form_4_filings",
  "cik": "0001318605",
  "accession": "0001209191-26-012345",
  "filing_date": "2026-05-12",
  "payload": {
    "issuer": "Tesla, Inc.",
    "reporter": "Elon Musk",
    "relationship": ["Director", "Officer", "10% Owner"],
    "transaction_date": "2026-05-10",
    "transaction_code": "S",
    "shares": 1500000,
    "price_per_share": 247.18,
    "total_value": 370770000,
    "shares_owned_after": 411062076
  }
}
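For spreadsheet or warehouse loads, the nested payload usually has to become flat columns. A minimal sketch, assuming a row shaped like the example above; `flatten_row` and the `payload_` column prefix are illustrative choices, not something the Actor produces.

```python
from typing import Any, Dict, Mapping


def flatten_row(row: Mapping[str, Any], prefix: str = "payload_") -> Dict[str, Any]:
    """Merge envelope fields and payload fields into one flat dict."""
    # Envelope fields keep their names.
    flat: Dict[str, Any] = {key: value for key, value in row.items() if key != "payload"}
    for key, value in row.get("payload", {}).items():
        # Lists (e.g. "relationship") are joined so they fit in a single cell.
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(item) for item in value)
        flat[prefix + key] = value
    return flat
```

Prefixing the payload columns avoids collisions if a mode's payload ever reuses an envelope key name.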

Switch output_format per run:

  • json: Apify dataset rows (default).
  • ndjson / csv / xlsx: a single file streamed to the key-value store at the end of the run.
  • markdown: LLM-ready output.md bundle with one ## heading per record, ready to paste into Claude, ChatGPT, or your RAG pipeline. The only SEC Actor that ships native Markdown output.
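If the Markdown bundle needs to be split back into individual records, the "one ## heading per record" layout suggests a simple line scan. The sketch below assumes exactly that layout and nothing more; `split_markdown_records` is a hypothetical helper, and it would mis-split a record whose body itself contains a line starting with "## ".

```python
from typing import List, Tuple


def split_markdown_records(markdown_text: str) -> List[Tuple[str, str]]:
    """Split a bundle into (heading, body) pairs at each level-2 heading."""
    records: List[Tuple[str, List[str]]] = []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # A new record begins; its heading text follows the "## " marker.
            records.append((line[3:].strip(), []))
        elif records:
            # Body line of the current record; text before the first
            # "## " heading (e.g. a document title) is ignored.
            records[-1][1].append(line)
    return [(heading, "\n".join(body).strip()) for heading, body in records]
```

Each pair can then be embedded or passed to a model as a separate document.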

Data fields by mode

| Mode | What you get | Persona |
| --- | --- | --- |
| form_4_filings | Section 16 insider trades, one row per transaction | Quant / Hedge |
| form_13f_filings | 13F-HR institutional holdings, one row per CUSIP | Quant / Hedge |
| form_13d_filings | Schedule 13D activist 5%+ stakes + group filer clustering | Quant / Hedge |
| form_144_filings | Restricted-stock sale notices (90-day forward signal) | Quant / Hedge |
| form_10k_filings / form_10q_filings | Annual / quarterly reports with section index | Fintech / RegTech |
| form_8k_filings | 8-K Items with severity tags + restatement flags | Fintech / RegTech |
| xbrl_facts / xbrl_statements / xbrl_metrics / xbrl_frames | US-GAAP / IFRS XBRL data, atomic or normalized | Fintech / Quant |
| filing_markdown / filing_chunks | Section-aware Markdown + stable RAG chunk IDs | LLM / Agentic |
| section_diff / disclosure_compare | Paragraph-level YoY 10-K diff | LLM / Agentic |
| signal_pack / discrete_signals / insider_cluster | Hedge-fund composite signals | Quant / Hedge |
| litigation_releases / aaer_releases / comment_letters | SEC enforcement & restatement early-warning data | RegTech / Forensic |
| cik_ticker_map | Resolve ticker / CIK / name / domain | All |
| filings_search / filings_feed / filings_index | EDGAR full-text search + real-time + historical | All |

Full 70-mode catalog in the Input tab dropdown.

Pricing — how much does it cost to pull SEC filings?

Pay per row, not per second. The first 20 rows of every run are free, so prototyping is free. Beyond that, only the rows you receive are charged:

| Data shape | Example modes | Price per row |
| --- | --- | --- |
| CIK / ticker lookup | cik_ticker_map | $0.0001 |
| Filing index / feed / search hit | filings_index, filings_search, filings_feed | $0.0004 |
| Insider transaction | form_4_filings, form_144_filings | $0.0005 |
| Institutional holding | form_13f_filings, form_nport_filings, form_npx_filings | $0.0005 |
| XBRL atomic fact | xbrl_facts, xbrl_frames, xbrl_metrics | $0.0002 |
| XBRL statement row | xbrl_statements | $0.001 |
| Form filing record | form_10k_filings, form_8k_filings, form_s1_filings, etc. | $0.002 |
| 10-K section extract | section_extract modes (MD&A, Risk Factors, ICFR, …) | $0.001 |
| Filing as Markdown (section-aware) | filing_markdown | $0.003 |
| RAG chunk | filing_chunks | $0.0008 |
| Year-over-year section diff op | section_diff, disclosure_compare | $0.002 |
| DEF 14A proxy row | def14a_exec_comp, def14a_audit_fees, … | $0.001 |
| Fund report (N-CSR / N-CEN) | form_ncsr_filings, form_ncen_filings | $0.001 |
| Enforcement / Form ADV record | litigation_releases, aaer_releases, form_adv_filings | $0.001 |
| Risk signal | signal_pack, insider_cluster, activist_clustering | $0.002 |
| AI brief / summary | ai_summary, ai_importance | $0.04 |
| Run started | once per run | $0.005 |
| Change detected (incremental) | only when a record's SHA-256 differs | $0.0003 |

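Worked examples (row charges only; each run also incurs the $0.005 run-started event):

  • Daily 13F sweep of 100 funds (~5,000 holdings): 5,000 × $0.0005 → $2.50 (minus the first 20 free rows).
  • Full 10-K of Apple as RAG chunks (~400 chunks): 400 × $0.0008 → $0.32.
  • Insider trades for the FAANG 5 over 2025 (~3,000 rows): 3,000 × $0.0005 → $1.50.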
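The arithmetic behind these examples can be sketched as a small estimator. It assumes a single event type per run, that the 20 free rows are simply subtracted from the billable count, and that the run-started event is added once; how the platform actually combines these is not spelled out here, so treat `estimate_cost` (a hypothetical helper) as a rough budgeting aid only.

```python
from decimal import Decimal

RUN_STARTED_FEE = Decimal("0.005")  # "Run started" row in the pricing table
FREE_ROWS_PER_RUN = 20              # first 20 rows of every run are free


def estimate_cost(
    rows: int,
    price_per_row: str,
    free_rows: int = FREE_ROWS_PER_RUN,
    include_run_fee: bool = True,
) -> Decimal:
    """Estimate the USD cost of one run that returns `rows` rows of one type."""
    billable_rows = max(rows - free_rows, 0)
    # Decimal avoids binary floating-point drift on sub-cent prices.
    cost = Decimal(billable_rows) * Decimal(price_per_row)
    if include_run_fee:
        cost += RUN_STARTED_FEE
    return cost
```

With `free_rows=0` and `include_run_fee=False`, `estimate_cost(5000, "0.0005")` reproduces the $2.50 figure for the 13F sweep; with the defaults the same run comes to $2.495.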

Tips & advanced options

  • Use incremental: true on schedules. Re-run the same input and you only pay $0.0003 per row whose SHA-256 actually changed. Most days that's a handful of rows out of thousands.
  • Start with a mid-cap (MDB, NET, SHOP) when testing — Berkshire-class filings can blow past 1 GB of memory on RAG-heavy modes.
  • Chain modes via the Apify API. Resolve identifiers with cik_ticker_map, then fan out to form_4_filings and form_13f_filings in parallel runs. Every run returns rows in the same envelope.
  • Pick the right output format. markdown for agents and RAG, csv / xlsx for analysts, ndjson for warehouses, json for everyone else.
  • No Apify Proxy needed. SEC fair-use policy requires an identifying User-Agent (set automatically) and prohibits residential-proxy obfuscation, so a single egress IP is fine and faster.
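The mode-chaining tip above could look roughly like the following. This is a sketch, not the Actor's documented workflow: `resolve_then_fan_out` and the injected `run_actor` callable are hypothetical, and it assumes cik_ticker_map rows carry the resolved CIK in the envelope's `cik` field.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

# A callable that starts one run and returns its dataset rows.
RunActor = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


def resolve_then_fan_out(
    run_actor: RunActor,
    identifiers: Sequence[str],
    modes: Sequence[str],
) -> Dict[str, List[Dict[str, Any]]]:
    """Resolve identifiers to CIKs once, then run each mode in parallel."""
    lookup_rows = run_actor({"mode": "cik_ticker_map", "identifiers": list(identifiers)})
    # Assumption: the resolved CIK sits in the shared envelope field "cik".
    ciks = sorted({row["cik"] for row in lookup_rows})

    inputs = [{"mode": mode, "identifiers": ciks} for mode in modes]
    with ThreadPoolExecutor(max_workers=max(len(inputs), 1)) as pool:
        results = list(pool.map(run_actor, inputs))  # preserves input order
    return dict(zip(modes, results))


# One possible run_actor, mirroring the Python call shown earlier:
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<APIFY_TOKEN>")
#
#   def run_actor(run_input):
#       run = client.actor("dominvo/sec-edgar-ai-scraper").call(run_input=run_input)
#       return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Injecting `run_actor` keeps the orchestration testable with a stub and leaves token handling outside the helper.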

FAQ, disclaimers & support

Is this legal? Yes. SEC EDGAR is public-domain U.S. government data. This Actor follows SEC fair-use guidelines (identifying User-Agent, ≤10 req/s) and does not scrape any restricted source.

Is it official? No. This is an independent third-party tool consuming the official SEC endpoints (data.sec.gov, www.sec.gov). Not affiliated with the SEC.

Do I need to register a User-Agent or rate-limit my requests? No. We handle SEC fair-use compliance, rate limits, retries, and pagination in the background.

Can I use this as a Python SEC EDGAR API? Yes. ApifyClient("…").actor("dominvo/sec-edgar-ai-scraper").call(...) is the one-line drop-in. Works from any language with an HTTP client.
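For callers without the Apify client library, the same run can be expressed as a plain HTTP request. The sketch below uses only the Python standard library and Apify's synchronous "run and return dataset items" endpoint; the actor ID with a tilde is derived from the name used above, the helper names are hypothetical, and long runs may exceed the synchronous endpoint's time limit, in which case the asynchronous run endpoints are the safer choice.

```python
import json
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2"
# In API paths the "username/actor-name" slash is written as a tilde.
ACTOR_ID = "dominvo~sec-edgar-ai-scraper"


def build_sync_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the Actor and returns its rows."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_rows(token: str, run_input: Dict[str, Any], timeout: float = 300) -> List[Dict[str, Any]]:
    """Send the request and decode the JSON array of dataset items."""
    request = build_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

The same URL, headers, and JSON body translate directly to any other language's HTTP client.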

Does it cover foreign issuers? Yes — form_20f_filings (FPIs), form_6k_filings (foreign current reports), and form_40f_filings (Canadian) cover the non-U.S. surface.

Can I get LLM-ready output? Yes — set output_format: markdown and we bundle every row into one output.md file ready for RAG ingestion. Or use filing_chunks / filing_markdown modes directly.

What if I need a mode that isn't built yet? File an issue in the Issues tab. New modes ship in the priority order listed in the catalog; custom modes and bespoke chunking schemas are available on request.


Built on the Apify SDK, Crawlee for Python, and edgartools ≥ 5.31. Runs on a 2 GB memory tier with Apify Standby enabled.