SEC EDGAR AI Scraper - LLM-ready filings, RAG chunks & signals
Pricing
Pay per event + usage
SEC EDGAR as LLM-ready data: section-aware Markdown, RAG chunks with stable IDs + SHA-256, XBRL facts, Form 4/13F, section diff, scored signals. Multi-event pay-per-event (PPE) pricing. Pairs with sec-edgar-mcp.
Developer
Domin Vo
Maintained by Community
Pull SEC filings as clean JSON or LLM-ready Markdown — without writing a single line of EDGAR code. This Actor turns the entire SEC EDGAR corpus — 10-K, 10-Q, 8-K, 13F filings, Form 4 insider transactions, Form 144, XBRL facts, DEF 14A proxies, S-1 prospectuses, N-PORT mutual-fund holdings, Schedule 13D activist stakes, Form ADV, and 60+ more — into uniform rows you can pipe straight into an LLM, vector store, or notebook.
What does SEC EDGAR AI Scraper do?
One Actor, 70 modes, every U.S. public company. Pick a mode (e.g. form_13f_filings, form_4_filings, xbrl_facts), feed it a ticker (AAPL), CIK (0000320193), or company name, and get structured rows back in seconds. No User-Agent headers, no fair-use throttling, no XBRL parser to maintain.
Every row uses the same envelope — sha256, mode, cik, accession, filing_date, payload — so a 13F holding and an 8-K filing ingest the same way downstream.
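Because the envelope is shared, a single loader can accept rows from any mode. The following is a minimal sketch, assuming rows arrive as plain dicts carrying the six keys listed above; `Envelope` and `parse_row` are illustrative names, not part of the Actor.

```python
from dataclasses import dataclass
from typing import Any

ENVELOPE_KEYS = ("sha256", "mode", "cik", "accession", "filing_date", "payload")


@dataclass(frozen=True)
class Envelope:
    """Illustrative container for one dataset row (not an official schema)."""
    sha256: str
    mode: str
    cik: str
    accession: str
    filing_date: str
    payload: dict[str, Any]


def parse_row(row: dict[str, Any]) -> Envelope:
    """Check that all six envelope keys are present, then wrap the row."""
    missing = [key for key in ENVELOPE_KEYS if key not in row]
    if missing:
        raise ValueError(f"row is missing envelope keys: {missing}")
    return Envelope(**{key: row[key] for key in ENVELOPE_KEYS})
```

Downstream code can then branch on `mode` and read `payload` without caring which form the row came from.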
Three things make this Actor different from any other SEC EDGAR scraper:
- RAG-native output. filing_markdown, filing_chunks, and section_diff ship section-aware Markdown, stable chunk IDs with SHA-256 fingerprints, and year-over-year diffs that drop straight into LangChain, LlamaIndex, or your own retrieval pipeline.
- Hedge-fund signals. signal_pack, insider_cluster, lockup_expiration, discrete_signals, and activist_clustering compute composite signals on top of raw filings. You don't get a thin wrapper around data.sec.gov; you get the alpha layer.
- Pay only for rows you keep. 20 bundled event types, priced from $0.0001 to $0.04. The first 20 rows of every run are free, so you can prototype at zero cost.
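To make "stable chunk IDs with SHA-256 fingerprints" concrete, here is one way such identifiers could be derived. This is a sketch under stated assumptions: the Actor's real ID scheme is not documented here, and the `accession:section:index` layout and whitespace normalization below are hypothetical choices.

```python
import hashlib


def chunk_fingerprint(text: str) -> str:
    """SHA-256 hex digest of the chunk text after collapsing whitespace."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def chunk_id(accession: str, section: str, index: int) -> str:
    """Deterministic ID built from position, so re-runs yield the same ID."""
    # Hypothetical layout: accession, section name, zero-padded ordinal.
    return f"{accession}:{section}:{index:04d}"


chunk = {
    "id": chunk_id("0001209191-26-012345", "item_1a", 3),
    "sha256": chunk_fingerprint("Risk  factors\ninclude supply constraints."),
}
```

The ID says where a chunk sits; the fingerprint says whether its text changed. Keeping the two separate lets a vector store update a chunk in place when only the content moves.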
Why use SEC EDGAR AI Scraper?
- Quant funds & analysts — pull 13F filings, Form 4 insider trades, Schedule 13D activist stakes, and Form 144 planned sales into Python, R, or a warehouse with one HTTP call. Build signals without building a parser.
- Fintech & RegTech apps — power compliance copilots, audit tools, and disclosure-monitoring products with a real SEC filings API. Section diffs flag the exact paragraphs that changed between 10-Ks.
- AI agents & RAG pipelines — section-aware Markdown plus deterministic chunk IDs give your agent citations with provenance. Every chunk has a SHA-256 it can quote back.
- Bloomberg Terminal alternative on a budget — get the SEC slice of the Terminal (13F changes, insider trades, proxy fights, M&A registrations) for cents per query instead of $24K a seat.
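The paragraph-level section diff mentioned above can be approximated locally with the standard library. This is a sketch of the general technique using `difflib`, assuming paragraphs are separated by blank lines; it is not the Actor's own diff algorithm, and the op dict shape is illustrative.

```python
import difflib


def split_paragraphs(text: str) -> list[str]:
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def diff_paragraphs(old: str, new: str) -> list[dict[str, object]]:
    """Return one op per changed paragraph span: insert, delete, or replace."""
    old_paras, new_paras = split_paragraphs(old), split_paragraphs(new)
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append({"op": tag, "old": old_paras[i1:i2], "new": new_paras[j1:j2]})
    return ops
```

Feeding it last year's and this year's Risk Factors text yields only the paragraphs that were added, removed, or reworded.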
How to use SEC EDGAR AI Scraper
- Click Try for free on the Actor page.
- Pick a mode (e.g. form_13f_filings).
- Enter identifiers: tickers (AAPL), CIKs (0000320193), names, or domains.
- (Optional) Set since / until dates, a forms filter, or a limit.
- Click Save & Start. Stream results to JSON, CSV, Excel, NDJSON, or Markdown.
Or call it from Python:
```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Start a run and wait for it to finish.
run = client.actor("dominvo/sec-edgar-ai-scraper").call(
    run_input={
        "mode": "form_13f_filings",
        "identifiers": ["BRK.A", "RENTECH"],
        "since": "2026-01-01",
        "limit": 200,
    }
)

# Read the rows from the run's default dataset.
for row in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(row["payload"]["issuer"], row["payload"]["value"])
```
Schedule daily runs, chain modes via Apify integrations, or trigger on a webhook — same API.
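Chaining modes usually means starting several runs with the same identifiers and collecting the results per mode. Below is a small sketch of that fan-out, assuming a caller-supplied `run_mode` function (hypothetical) that takes a run-input dict and returns the resulting rows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

RunFn = Callable[[dict[str, Any]], list[dict[str, Any]]]


def fan_out(
    run_mode: RunFn, modes: list[str], identifiers: list[str]
) -> dict[str, list[dict[str, Any]]]:
    """Start one run per mode in parallel and collect rows keyed by mode."""
    inputs = [{"mode": mode, "identifiers": identifiers} for mode in modes]
    with ThreadPoolExecutor(max_workers=len(modes) or 1) as pool:
        # pool.map preserves input order, so results line up with modes.
        results = list(pool.map(run_mode, inputs))
    return dict(zip(modes, results))
```

In practice `run_mode` would wrap the `client.actor(...).call(run_input=...)` and `iterate_items()` calls from the Python example above; keeping it injectable makes the fan-out easy to test with a stub.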
Input — picking a mode
70 modes, one Actor. Pick the mode that matches the form or signal you need:
| Goal | Mode |
|---|---|
| Section 16 insider trades | form_4_filings |
| Institutional 13F holdings | form_13f_filings |
| Activist 5%+ stakes | form_13d_filings, activist_clustering |
| Planned insider sales (90-day early warning) | form_144_filings, lockup_expiration |
| Annual / quarterly reports | form_10k_filings, form_10q_filings |
| 8-K item alerts with severity tags | form_8k_filings, discrete_signals |
| Proxy fights, exec comp, audit fees | form_def14a_filings, def14a_exec_comp |
| IPO prospectus + lockup calendar | form_s1_filings, lockup_expiration |
| M&A registration deals | form_s4_filings, form_to_tender, form_14d9_recommendation |
| Foreign issuers (FPI / ADR / Canadian) | form_20f_filings, form_6k_filings, form_40f_filings |
| Mutual fund / ETF holdings + proxy votes | form_nport_filings, form_npx_filings |
| US-GAAP / IFRS XBRL data | xbrl_facts, xbrl_statements, xbrl_frames, xbrl_metrics |
| Cross-cutting EDGAR search | filings_search, filings_feed, filings_index |
| Section-aware Markdown for RAG | filing_markdown, filing_chunks |
| Year-over-year 10-K text diff | section_diff, disclosure_compare |
| Hedge-fund composite signals | signal_pack, insider_cluster |
| Resolve ticker ↔ CIK ↔ name ↔ domain | cik_ticker_map |
| SEC enforcement & adviser data | litigation_releases, aaer_releases, comment_letters, form_adv_filings |
Common inputs: identifiers (array), since / until (ISO date), forms (array filter), limit (cap rows), incremental (skip unchanged rows), output_format (json / ndjson / csv / xlsx / markdown).
```json
{
  "mode": "form_4_filings",
  "identifiers": ["TSLA", "MSFT"],
  "since": "2026-01-01",
  "transaction_codes": ["S", "P"],
  "limit": 500
}
```
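It can help to validate the common inputs locally before paying for a run. The helper below is a sketch, assuming only the input names and output formats listed above; `build_run_input` is a hypothetical convenience function, and the Actor may apply different or additional checks.

```python
from datetime import date
from typing import Any, Optional

OUTPUT_FORMATS = {"json", "ndjson", "csv", "xlsx", "markdown"}


def build_run_input(
    mode: str,
    identifiers: list[str],
    since: Optional[str] = None,
    until: Optional[str] = None,
    limit: Optional[int] = None,
    output_format: str = "json",
) -> dict[str, Any]:
    """Validate the common inputs locally and return a run-input dict."""
    if not identifiers:
        raise ValueError("identifiers must be a non-empty list")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")
    # date.fromisoformat raises ValueError on strings that are not ISO dates.
    start = date.fromisoformat(since) if since else None
    end = date.fromisoformat(until) if until else None
    if start and end and start > end:
        raise ValueError("since must not be later than until")
    if limit is not None and limit <= 0:
        raise ValueError("limit must be a positive integer")
    run_input: dict[str, Any] = {
        "mode": mode,
        "identifiers": identifiers,
        "output_format": output_format,
    }
    # Only include optional keys that were actually set.
    for key, value in (("since", since), ("until", until), ("limit", limit)):
        if value is not None:
            run_input[key] = value
    return run_input
```

Mode-specific keys such as `transaction_codes` can be merged into the returned dict before it is passed as `run_input`.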
Output — JSON, CSV, Excel, Markdown
Every row uses the same envelope with a mode-specific payload:
```json
{
  "sha256": "9f1c2a…",
  "mode": "form_4_filings",
  "cik": "0001318605",
  "accession": "0001209191-26-012345",
  "filing_date": "2026-05-12",
  "payload": {
    "issuer": "Tesla, Inc.",
    "reporter": "Elon Musk",
    "relationship": ["Director", "Officer", "10% Owner"],
    "transaction_date": "2026-05-10",
    "transaction_code": "S",
    "shares": 1500000,
    "price_per_share": 247.18,
    "total_value": 370770000,
    "shares_owned_after": 411062076
  }
}
```
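For notebooks and warehouses, the nested payload is often easier to handle once it is flattened into one level. A minimal sketch, assuming the envelope shape shown above; the `payload_` prefix and the `"; "` list separator are arbitrary choices made for this example.

```python
from typing import Any


def flatten_row(row: dict[str, Any]) -> dict[str, Any]:
    """Merge envelope fields and payload fields into one flat dict."""
    flat = {key: value for key, value in row.items() if key != "payload"}
    for key, value in row.get("payload", {}).items():
        # Join list values (e.g. relationship) so each cell holds a scalar.
        if isinstance(value, list):
            value = "; ".join(map(str, value))
        flat[f"payload_{key}"] = value
    return flat
```

A list of flattened rows can be handed directly to `csv.DictWriter` or a DataFrame constructor.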
Switch output_format per run:
- json: Apify dataset rows (default).
- ndjson / csv / xlsx: single file streamed to the key-value store at end of run.
- markdown: LLM-ready output.md bundle with one ## heading per record, ready to paste into Claude, ChatGPT, or your RAG pipeline. The only SEC Actor that ships native Markdown output.
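If you fetch rows as JSON and want a similar Markdown bundle on your side, a rendering along these lines would work. This is an illustrative sketch only: the heading text and bullet layout are assumptions, since the exact structure of output.md is not specified beyond one `##` heading per record.

```python
from typing import Any


def rows_to_markdown(rows: list[dict[str, Any]]) -> str:
    """Render each row as a '## ' section with its payload as a bullet list."""
    sections = []
    for row in rows:
        heading = f"## {row['mode']} {row['accession']} ({row['filing_date']})"
        lines = [heading, ""]
        lines += [f"- {key}: {value}" for key, value in row["payload"].items()]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```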
Data fields by mode
| Mode | What you get | Persona |
|---|---|---|
| form_4_filings | Section 16 insider trades, one row per transaction | Quant / Hedge |
| form_13f_filings | 13F-HR institutional holdings, one row per CUSIP | Quant / Hedge |
| form_13d_filings | Schedule 13D activist 5%+ stakes + group filer clustering | Quant / Hedge |
| form_144_filings | Restricted-stock sale notices, a 90-day forward signal | Quant / Hedge |
| form_10k_filings / form_10q_filings | Annual / quarterly reports with section index | Fintech / RegTech |
| form_8k_filings | 8-K Items with severity tags + restatement flags | Fintech / RegTech |
| xbrl_facts / xbrl_statements / xbrl_metrics / xbrl_frames | US-GAAP / IFRS XBRL data, atomic or normalized | Fintech / Quant |
| filing_markdown / filing_chunks | Section-aware Markdown + stable RAG chunk IDs | LLM / Agentic |
| section_diff / disclosure_compare | Paragraph-level YoY 10-K diff | LLM / Agentic |
| signal_pack / discrete_signals / insider_cluster | Hedge-fund composite signals | Quant / Hedge |
| litigation_releases / aaer_releases / comment_letters | SEC enforcement & restatement early-warning data | RegTech / Forensic |
| cik_ticker_map | Resolve ticker / CIK / name / domain | All |
| filings_search / filings_feed / filings_index | EDGAR full-text search + real-time + historical | All |
Full 70-mode catalog in the Input tab dropdown.
Pricing — how much does it cost to pull SEC filings?
Pay per row, not per second. The first 20 rows of every run are free, so prototyping is free. Beyond that, only the rows you receive are charged:
| Data shape | Example modes | Price per row |
|---|---|---|
| CIK / ticker lookup | cik_ticker_map | $0.0001 |
| Filing index / feed / search hit | filings_index, filings_search, filings_feed | $0.0004 |
| Insider transaction | form_4_filings, form_144_filings | $0.0005 |
| Institutional holding | form_13f_filings, form_nport_filings, form_npx_filings | $0.0005 |
| XBRL atomic fact | xbrl_facts, xbrl_frames, xbrl_metrics | $0.0002 |
| XBRL statement row | xbrl_statements | $0.001 |
| Form filing record | form_10k_filings, form_8k_filings, form_s1_filings, etc. | $0.002 |
| 10-K section extract | section_extract modes (MD&A, Risk Factors, ICFR, …) | $0.001 |
| Filing as Markdown (section-aware) | filing_markdown | $0.003 |
| RAG chunk | filing_chunks | $0.0008 |
| Year-over-year section diff op | section_diff, disclosure_compare | $0.002 |
| DEF 14A proxy row | def14a_exec_comp, def14a_audit_fees, … | $0.001 |
| Fund report (N-CSR / N-CEN) | form_ncsr_filings, form_ncen_filings | $0.001 |
| Enforcement / Form ADV record | litigation_releases, aaer_releases, form_adv_filings | $0.001 |
| Risk signal | signal_pack, insider_cluster, activist_clustering | $0.002 |
| AI brief / summary | ai_summary, ai_importance | $0.04 |
| Run started | once per run | $0.005 |
| Change detected (incremental) | only when a record's SHA-256 differs | $0.0003 |
Worked examples:
- Daily 13F sweep of 100 funds (~5,000 holdings) → $2.50 (minus the first 20 free rows).
- Full 10-K of Apple as RAG chunks (~400 chunks) → $0.32.
- Insider trades for the FAANG 5 over 2025 (~3,000 rows) → $1.50.
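The worked examples are simple multiplications, which the snippet below reproduces using prices from the table. `row_cost` matches the headline figures; `run_cost` is a sketch that assumes the 20 free rows are deducted first and the run-started fee is added once per run, which is one plausible reading of the pricing rather than a confirmed billing rule.

```python
from decimal import Decimal

# Per-row prices copied from the pricing table above (USD).
PRICE_PER_ROW = {
    "form_13f_filings": Decimal("0.0005"),
    "form_4_filings": Decimal("0.0005"),
    "filing_chunks": Decimal("0.0008"),
}
RUN_STARTED_FEE = Decimal("0.005")
FREE_ROWS = 20


def row_cost(mode: str, rows: int) -> Decimal:
    """Headline cost: rows times the per-row price, as in the worked examples."""
    return PRICE_PER_ROW[mode] * rows


def run_cost(mode: str, rows: int) -> Decimal:
    """Assumed full cost of one run: free rows deducted, run-start fee added."""
    billable = max(rows - FREE_ROWS, 0)
    return PRICE_PER_ROW[mode] * billable + RUN_STARTED_FEE


print(row_cost("form_13f_filings", 5000))  # 13F sweep of ~5,000 holdings
print(row_cost("filing_chunks", 400))      # one 10-K as ~400 RAG chunks
print(row_cost("form_4_filings", 3000))    # ~3,000 insider-trade rows
```

`Decimal` is used so that sub-cent prices do not pick up binary floating-point rounding error.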
Tips & advanced options
- Use incremental: true on schedules. Re-run the same input and you only pay $0.0003 per row whose SHA-256 actually changed. Most days that's a handful of rows out of thousands.
- Start with a mid-cap (MDB, NET, SHOP) when testing; Berkshire-class filings can blow past 1 GB of memory on RAG-heavy modes.
- Chain modes via the Apify API. Resolve identifiers with cik_ticker_map, then fan out to form_4_filings and form_13f_filings in parallel runs. Same dataset.
- Pick the right output format. markdown for agents and RAG, csv / xlsx for analysts, ndjson for warehouses, json for everyone else.
- No Apify Proxy needed. SEC fair-use requires an identifying User-Agent (set automatically) and prohibits residential-proxy obfuscation; a single egress IP is fine and faster.
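The incremental tip rests on comparing each row's SHA-256 with the value seen on the previous run. A sketch of that comparison on the consumer side follows; the `(mode, accession)` default key is an assumption and is too coarse for modes that emit several rows per filing, so pass a finer `key` function there.

```python
from typing import Any, Callable, Hashable

KeyFn = Callable[[dict[str, Any]], Hashable]


def default_key(row: dict[str, Any]) -> Hashable:
    """Assumed row identity: one row per (mode, accession) pair."""
    return (row["mode"], row["accession"])


def changed_rows(
    previous: dict[Hashable, str],
    rows: list[dict[str, Any]],
    key: KeyFn = default_key,
) -> list[dict[str, Any]]:
    """Return rows that are new or whose SHA-256 differs from the last run.

    `previous` maps row key -> sha256 and is updated in place.
    """
    changed = []
    for row in rows:
        row_key = key(row)
        if previous.get(row_key) != row["sha256"]:
            changed.append(row)
            previous[row_key] = row["sha256"]
    return changed
```

Persist `previous` between scheduled runs (a JSON file or a key-value store works) and only the changed rows need to be re-embedded or re-loaded.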
FAQ, disclaimers & support
Is this legal? Yes. SEC EDGAR is public-domain U.S. government data. This Actor follows SEC fair-use guidelines (identifying User-Agent, ≤10 req/s) and does not scrape any restricted source.
Is it official? No. This is an independent third-party tool consuming the official SEC endpoints (data.sec.gov, www.sec.gov). Not affiliated with the SEC.
Do I need to register a User-Agent or rate-limit my requests? No. We handle SEC fair-use compliance, rate limits, retries, and pagination in the background.
Can I use this as a Python SEC EDGAR API? Yes. ApifyClient("…").actor("dominvo/sec-edgar-ai-scraper").call(...) is the one-line drop-in from Python. Because the client wraps the Apify REST API, the Actor can also be called from any language with an HTTP client.
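For callers without the Apify client, the same run can be expressed as a single HTTP request. The sketch below only builds the request using the standard library; it assumes the Apify API v2 `run-sync-get-dataset-items` endpoint and the `username~actor-name` form of the Actor ID, and `build_sync_request` is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs the Actor and returns its items."""
    actor_id = "dominvo~sec-edgar-ai-scraper"  # the API uses "~" in place of "/"
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sync_request("<APIFY_TOKEN>", {"mode": "cik_ticker_map", "identifiers": ["AAPL"]})
# Sending it would be: urllib.request.urlopen(request).read()
```

Synchronous calls suit short runs; for long jobs it is likely safer to start the run asynchronously and read the dataset once it finishes.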
Does it cover foreign issuers? Yes — form_20f_filings (FPIs), form_6k_filings (foreign current reports), and form_40f_filings (Canadian) cover the non-U.S. surface.
Can I get LLM-ready output? Yes — set output_format: markdown and we bundle every row into one output.md file ready for RAG ingestion. Or use filing_chunks / filing_markdown modes directly.
What if I need a mode that isn't built yet? File an issue in the Issues tab. New modes ship along the priority order in the catalog; custom modes and bespoke chunking schemas are available on request.
Built on the Apify SDK, Crawlee for Python, and edgartools ≥ 5.31. Runs on a 2 GB memory tier with Apify Standby enabled.