SEC EDGAR Scraper | US Public Company Filings & Profiles
Pricing
from $1.20 / 1,000 results
SEC EDGAR scraper & API: export US public-company filings, profiles and XBRL fundamentals (revenue, net income, total assets, total equity) by ticker, CIK or name. Built for equity research, financial statement data and compliance. Uses the official data.sec.gov endpoints: fast, no login.
Developer: Haketa (maintained by Community)
The fastest way to turn a list of US stock tickers into a structured dataset of company profiles, recent SEC filings (10-K, 10-Q, 8-K, S-1, Form 4 …) and XBRL fundamentals (revenue, net income, total assets, total equity). Built on top of SEC's open data.sec.gov and www.sec.gov endpoints — no auth, no anti-bot, no HTML scraping — just clean JSON straight from the source of truth for US public-company disclosure.
Pass ["AAPL", "MSFT", "NVDA"] and you get back per-row CIK, ticker, exchange, SIC industry, EIN, fiscal-year end, mailing + business addresses, registered phone, plus the company's latest filings list with direct document URLs.
TL;DR: Tickers in, structured company-and-filings rows out. The kind of pipeline financial analysts pay four-figure monthly seat fees to Bloomberg for, limited to the public US universe and at fractions of a cent per row.
What you get
For each company, one row with:
| Field | What |
|---|---|
| `cik` | Central Index Key, 10-digit zero-padded (SEC's universal company id) |
| `name` | Registered company name |
| `tickers`, `primaryTicker` | All listed tickers + the first one |
| `exchanges`, `primaryExchange` | All listing exchanges + the first one (NYSE, Nasdaq, NYSE Arca, …) |
| `sic`, `sicDescription` | Standard Industrial Classification code + plain-text industry |
| `ein` | Federal Employer Identification Number |
| `lei` | Legal Entity Identifier (when filed) |
| `description` | Short profile description (when SEC has one) |
| `website`, `investorWebsite` | Corporate + IR site URLs |
| `category` | Filer category: Large accelerated, Accelerated, Non-accelerated, Smaller reporting |
| `fiscalYearEnd` | Fiscal year-end as MMDD; Apple is `0926` (fiscal year ends Sept 26) |
| `stateOfIncorporation` | Two-letter state code |
| `entityType` | `"operating"`, `"investment company"`, etc. |
| `phone` | Registered SEC phone number |
| `formerNames` | Array of previously-filed names |
| `mailingStreet`, `mailingCity`, `mailingState`, `mailingZip`, `mailingCountry` | Mailing-address breakdown |
| `businessStreet`, `businessCity`, `businessState`, `businessZip`, `businessCountry` | Business-address breakdown |
| `insiderTransactionForOwnerExists`, `insiderTransactionForIssuerExists` | Boolean flags indicating whether Form 4 / insider data is available |
| `recentFilings` | Array of recent filings (see schema below) |
| `recentFilingsCount`, `lastFilingDate`, `lastFilingForm` | Filings summary |
| `revenueLatest`, `revenueLatestPeriod`, `netIncomeLatest`, `totalAssetsLatest`, `totalEquityLatest` | XBRL fundamentals (when `includeFundamentals: true`) |
| `edgarUrl`, `submissionsUrl` | Direct links to the EDGAR company page and the raw submissions JSON |
| `searchInput`, `scrapedAt` | Echo of how the row was identified + ISO timestamp |
recentFilings array shape
Each entry:
- `form`: `10-K`, `10-Q`, `8-K`, `S-1`, `DEF 14A`, `4`, …
- `filingDate`: ISO date when SEC accepted the filing
- `reportDate`: ISO date of the report period it covers
- `accessionNumber`: SEC's unique filing id (`0000320193-25-000123`)
- `primaryDocument`: file name of the primary document
- `primaryDocDescription`: SEC's short label for the document
- `fileNumber`, `items`, `sizeBytes`
- `isXBRL`, `isInlineXBRL`: booleans
- `documentUrl`: direct download URL of the primary document (HTML / PDF / XML)
- `indexUrl`: direct link to the filing's index page on SEC.gov
The complete schema with types is in dataset_schema.json.
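SEC's submissions JSON stores recent filings column-wise, as parallel arrays, so each `recentFilings` entry has to be assembled by index. The following is a minimal sketch of that step and of the URL layout, not the actor's actual code: the function name `zip_recent_filings` and the trimmed field set are assumptions made for illustration, and the sample values are taken from the output sample in this README.

```python
from __future__ import annotations

from typing import Any

ARCHIVES = "https://www.sec.gov/Archives/edgar/data"


def zip_recent_filings(cik: str, recent: dict[str, list[Any]], limit: int = 50) -> list[dict[str, Any]]:
    """Turn the column-oriented `recent` object into one dict per filing (hypothetical helper)."""
    cik_int = int(cik)  # archive paths use the CIK without leading zeros
    rows = []
    for i, accession in enumerate(recent["accessionNumber"][:limit]):
        folder = accession.replace("-", "")  # de-dashed accession number is the folder name
        doc = recent["primaryDocument"][i]
        rows.append({
            "form": recent["form"][i],
            "filingDate": recent["filingDate"][i],
            "accessionNumber": accession,
            "primaryDocument": doc,
            "documentUrl": f"{ARCHIVES}/{cik_int}/{folder}/{doc}",
            "indexUrl": f"{ARCHIVES}/{cik_int}/{folder}/{accession}-index.htm",
        })
    return rows


# Sample data shaped like the `filings.recent` object of a submissions response.
sample_recent = {
    "accessionNumber": ["0001140361-26-023363"],
    "form": ["4"],
    "filingDate": ["2026-05-29"],
    "primaryDocument": ["xslF345X06/form4.xml"],
}
filings = zip_recent_filings("0000320193", sample_recent)
print(filings[0]["documentUrl"])
```

Only a subset of the documented fields is copied here; the remaining ones (`reportDate`, `items`, and so on) would follow the same index-based pattern.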
Why this data is monetizable
US public-company disclosure is the most regulated, most-litigated, most-syndicated dataset in finance. Bloomberg, Refinitiv, S&P Capital IQ and FactSet all charge four figures per seat per month for cleaned-up versions of the same fields the SEC publishes for free. The catch: the SEC's own delivery format is a hierarchical CIK directory with thousands of tiny endpoints and arcane User-Agent rules, not exactly drag-and-drop into Sheets or Snowflake.
This actor closes that gap:
- No subscription, pay-per-use. Pay only for the runs you trigger.
- Multi-company batch. Throw a list of tickers in, get rows out.
- Tickers, CIKs and fuzzy company names all resolved automatically against SEC's official mapping (≈12,000 companies kept in sync).
- Recent filings already linked. Every row carries `documentUrl` + `indexUrl` for each filing; open them straight in the browser or pipe to an LLM.
- XBRL fundamentals on demand. Revenue, net income, total assets, total equity (the four headline accounting facts every analyst joins on), fetched with one extra request per concept per company.
Use cases
Equity / credit research
- Build a daily watchlist of every 8-K filed by your coverage list — material-event monitoring becomes a two-minute setup, not a $500/seat compliance subscription.
- Pre-screen earnings season — pull recent 10-K / 10-Q filings + revenue / net-income trend for every name in your sector in one run.
M&A / corporate development
- Map every S-1 (IPO) and S-4 (merger / acquisition) filed in your TAM over the last 12 months — instant pipeline of potential targets / competitive intel.
- Cross-reference `sic` + revenue size against your target profile.
Sales / lead generation (B2B)
- Companies on SEC EDGAR are US public companies, by definition the highest-budget B2B buyers. Filter by `sicDescription` and `phone` for your industry-matched outbound list.
- Use `category: "Large accelerated filer"` to filter to companies with > $700M float (highest LTV).
Tax / legal / compliance
- Beneficial ownership and entity-mapping projects: `formerNames`, `ein`, `stateOfIncorporation`, `lei` are the join keys you need across multiple regulators.
- Section 16 monitoring: filter `recentFilings` to `form: "4"` for insider transactions.
Quantitative research / AI / ML
- Build training corpora from public filings: drop every row's `documentUrl` into an ingestion pipeline, point at an LLM.
- Backtest fundamentals: `includeFundamentals: true` plus your own time-series of past runs gives you a historical XBRL panel.
Journalism / investigative research
- Track filings by named individuals (look up their company via tickers / CIKs).
- Map cross-holdings via `insiderTransactionForIssuerExists` + Form 4 filings.
Real-estate / proptech analytics
- REITs are SEC filers: pull every REIT in a `sicDescription: "Real Estate Investment Trusts"` slice, score by `totalAssets`.
Sustainability / ESG
- Filter `recentFilings` to `form: "SD"` (Specialized Disclosure, the form used for conflict-minerals reports) or sustainability proxy statements for an instant ESG-disclosure snapshot.
Inputs (full list)
Definitions live in input_schema.json; here's the human summary.
- `tickers` (array): US stock tickers. Case-insensitive. Examples: `AAPL`, `MSFT`, `BRK.B`. Resolved against SEC's official mapping.
- `ciks` (array): 10-digit CIKs, or shorter (we zero-pad). Use this when you already know the CIK.
- `companyNames` (array): Fuzzy company names (case-insensitive). Matches against SEC's mapping titles. Use tickers / CIKs for exact lookups.
- `includeRecentFilings` (boolean): Attach the recent-filings list (default `true`).
- `maxFilingsPerCompany` (integer): Cap on recent-filings rows per company. SEC ships up to 1,000 in the submissions endpoint; we trim to your cap. Default `50`.
- `filingFormTypes` (array): Optional filter to specific forms (`10-K`, `10-Q`, `8-K`, `S-1`, `4`, `DEF 14A`, …). Leave empty to keep them all.
- `includeFundamentals` (boolean): Pulls the latest reported value of Revenue, Net Income, Total Assets and Total Stockholders' Equity from SEC's XBRL endpoint. Adds ~1 extra request per concept per company. Default `false`.
- `userAgentEmail` (string): SEC requires every API client to identify itself in the User-Agent header. Drop your contact email here; the actor works either way, but using your own keeps SEC happy if your runs grow.
- `maxConcurrency` (integer): Parallel companies. Default `5`. SEC's published cap is 10 req/s; we leave headroom.
- `requestDelay` (integer): Per-worker delay between requests, in milliseconds. Default `150`.
- `proxyConfiguration` (proxy): SEC EDGAR is open from any IP, so proxy defaults to OFF.
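If you assemble the input programmatically, a small helper can apply the documented defaults and the zero-padding rule before the run is started. This is only a sketch: `build_input` and `normalize_cik` are hypothetical names, and the actor performs its own normalization regardless of what the client sends.

```python
from __future__ import annotations

from typing import Any, Iterable, Union

# Defaults as listed in the input summary above.
DEFAULTS: dict[str, Any] = {
    "includeRecentFilings": True,
    "maxFilingsPerCompany": 50,
    "includeFundamentals": False,
    "maxConcurrency": 5,
    "requestDelay": 150,
}


def normalize_cik(raw: Union[str, int]) -> str:
    """Zero-pad a CIK to 10 digits; reject anything that is not 1-10 digits."""
    digits = str(raw).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {raw!r}")
    return digits.zfill(10)


def build_input(tickers: Iterable[str] = (), ciks: Iterable[Union[str, int]] = (), **overrides: Any) -> dict[str, Any]:
    """Merge caller overrides over the defaults and normalize the identifiers."""
    payload = {**DEFAULTS, **overrides}
    payload["tickers"] = [t.strip().upper() for t in tickers]
    payload["ciks"] = [normalize_cik(c) for c in ciks]
    return payload


run_input = build_input(tickers=["aapl", " msft "], ciks=["1652044"], maxFilingsPerCompany=25)
print(run_input)
```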
Example inputs
1. Three big tech profiles, with filings
{"tickers": ["AAPL", "MSFT", "GOOGL"],"includeRecentFilings": true,"maxFilingsPerCompany": 30}
2. 10 tickers, just 10-K and 10-Q forms
{"tickers": ["AAPL","MSFT","NVDA","GOOGL","AMZN","META","TSLA","JPM","BRK.B","WMT"],"filingFormTypes": ["10-K", "10-Q"],"maxFilingsPerCompany": 8}
3. Full fundamentals dump for a coverage list
{"tickers": ["AAPL", "MSFT", "GOOGL", "AMZN", "META"],"includeFundamentals": true,"includeRecentFilings": false}
4. Lookup by company name (fuzzy)
{"companyNames": ["apple", "nvidia", "berkshire"]}
5. Direct CIK list (no resolution step)
{"ciks": ["0000320193", "0000789019", "1652044"],"maxFilingsPerCompany": 25}
Output sample

    {
      "cik": "0000320193",
      "name": "Apple Inc.",
      "tickers": ["AAPL"],
      "primaryTicker": "AAPL",
      "exchanges": ["Nasdaq"],
      "primaryExchange": "Nasdaq",
      "sic": "3571",
      "sicDescription": "Electronic Computers",
      "ein": "942404110",
      "lei": null,
      "description": "",
      "website": null,
      "investorWebsite": null,
      "category": "Large accelerated filer",
      "fiscalYearEnd": "0926",
      "stateOfIncorporation": "CA",
      "entityType": "operating",
      "phone": "(408) 996-1010",
      "formerNames": ["APPLE COMPUTER INC"],
      "mailingStreet": "ONE APPLE PARK WAY",
      "mailingCity": "CUPERTINO",
      "mailingState": "CA",
      "mailingZip": "95014",
      "mailingCountry": null,
      "businessStreet": "ONE APPLE PARK WAY",
      "businessCity": "CUPERTINO",
      "businessState": "CA",
      "businessZip": "95014",
      "businessCountry": null,
      "insiderTransactionForOwnerExists": false,
      "insiderTransactionForIssuerExists": true,
      "recentFilingsCount": 30,
      "lastFilingDate": "2026-05-29",
      "lastFilingForm": "4",
      "recentFilings": [
        {
          "form": "4",
          "filingDate": "2026-05-29",
          "reportDate": "2026-05-27",
          "accessionNumber": "0001140361-26-023363",
          "primaryDocument": "xslF345X06/form4.xml",
          "primaryDocDescription": "FORM 4",
          "fileNumber": "001-36743",
          "items": null,
          "sizeBytes": 6543,
          "isXBRL": false,
          "isInlineXBRL": false,
          "documentUrl": "https://www.sec.gov/Archives/edgar/data/320193/000114036126023363/xslF345X06/form4.xml",
          "indexUrl": "https://www.sec.gov/Archives/edgar/data/320193/000114036126023363/0001140361-26-023363-index.htm"
        }
      ],
      "revenueLatest": 391035000000,
      "revenueLatestPeriod": "2025-09-28",
      "netIncomeLatest": 99853000000,
      "totalAssetsLatest": 364980000000,
      "totalEquityLatest": 73330000000,
      "edgarUrl": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=&dateb=&owner=include&count=40",
      "submissionsUrl": "https://data.sec.gov/submissions/CIK0000320193.json",
      "searchInput": "ticker:AAPL",
      "scrapedAt": "2026-05-31T19:14:02.000Z"
    }
Cost & throughput
Pay-per-event pricing — the exact tier is set on the actor's Apify Store page.
Throughput on the default config (maxConcurrency: 5, requestDelay: 150ms):
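- Profile + filings only: roughly 6 companies / second (one submissions request per company, at the ≈6 req/s the defaults produce).
- Profile + filings + fundamentals: roughly 1 company / second (4 extra XBRL requests per company).
The S&P 500 (500 tickers) takes about a minute and a half for profiles and filings on the default config, and several minutes with fundamentals.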
SEC's published rate limit is 10 req/s per IP. The defaults keep you well under it; raising maxConcurrency above 8 risks intermittent 429s.
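To sanity-check a configuration against the 10 req/s cap, the request rate can be estimated from the worker count, the per-request delay and the network latency. The sketch below assumes each worker simply alternates between one request and one sleep; the 650 ms latency is an illustrative assumption, not a measured figure, and `estimated_request_rate` is a hypothetical helper.

```python
SEC_CAP = 10.0  # req/s, the published limit quoted above


def estimated_request_rate(workers: int, delay_ms: float, latency_ms: float) -> float:
    """Requests per second if every worker loops: request (latency), then sleep (delay)."""
    cycle_seconds = (delay_ms + latency_ms) / 1000.0
    return workers / cycle_seconds


# Assumed round-trip latency of 650 ms; substitute your own measurement.
for workers in (5, 8, 12):
    rate = estimated_request_rate(workers, delay_ms=150, latency_ms=650)
    status = "under cap" if rate < SEC_CAP else "at or over cap"
    print(f"maxConcurrency={workers}: about {rate:.2f} req/s ({status})")
```

Because the estimate depends heavily on latency, a faster connection pushes the same settings closer to the cap; raising `requestDelay` is the simplest way to buy headroom.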
How it works (under the hood)
- Ticker map cache: fetched once per run from `https://www.sec.gov/files/company_tickers.json` (~800 KB, ~12,000 companies). Used to resolve `tickers` and `companyNames` into the official 10-digit CIK.
- Submissions fetch: for each CIK, `https://data.sec.gov/submissions/CIK{CIK}.json` returns the company profile, addresses, fiscal year end, plus the recent-filings arrays (`form[]`, `filingDate[]`, `accessionNumber[]`, `primaryDocument[]`, …). We zip the parallel arrays into a single `recentFilings` array per company.
- Filing URL construction: `documentUrl` is built from `cik` + `accessionNumber` (de-dashed) + `primaryDocument`. `indexUrl` points at the SEC.gov index page for the filing.
- XBRL fundamentals (optional): for each headline tag (`Revenues`, `NetIncomeLoss`, `Assets`, `StockholdersEquity`) we hit `https://data.sec.gov/api/xbrl/companyconcept/CIK{CIK}/us-gaap/{tag}.json` and pick the most-recent annual (FY) reported value. Falls through known alternative tag names (e.g. `RevenueFromContractWithCustomerExcludingAssessedTax`) so newer companies still come back populated.
- Concurrency: up to `maxConcurrency` workers process companies in parallel with a `requestDelay`-ms sleep per task to stay under SEC's 10 req/s cap.
- User-Agent: every request sends `User-Agent: Apify EDGAR Scraper (haketa) <your-email>`. SEC blocks anonymous / generic-bot UAs with HTTP 403.
No Cheerio, no Playwright, no Cloudflare bypass — SEC EDGAR is one of the few datasets on the internet where raw HTTP "just works".
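The ticker-resolution step can be pictured as building a lookup table from `company_tickers.json`, which is a JSON object whose values carry `cik_str`, `ticker` and `title`. The sketch below works on a two-entry sample of that shape instead of fetching the file; the helper names are hypothetical, and the substring match is a stand-in assumption for whatever fuzzy matching the actor really uses.

```python
from __future__ import annotations

from typing import Any

# Two entries shaped like SEC's company_tickers.json (trimmed sample, not the full file).
SAMPLE_TICKER_MAP: dict[str, dict[str, Any]] = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}


def build_ticker_index(ticker_map: dict[str, dict[str, Any]]) -> dict[str, str]:
    """Map upper-cased ticker -> 10-digit zero-padded CIK."""
    return {e["ticker"].upper(): str(e["cik_str"]).zfill(10) for e in ticker_map.values()}


def resolve_name(query: str, ticker_map: dict[str, dict[str, Any]]) -> list[str]:
    """Case-insensitive substring match on titles (assumed stand-in for fuzzy matching)."""
    needle = query.strip().lower()
    return [
        str(e["cik_str"]).zfill(10)
        for e in ticker_map.values()
        if needle in e["title"].lower()
    ]


index = build_ticker_index(SAMPLE_TICKER_MAP)
print(index.get("aapl".upper()))
print(resolve_name("microsoft", SAMPLE_TICKER_MAP))
```

Building the index once per run keeps every later ticker lookup a constant-time dictionary access, which matches the "fetched once per run" design described above.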
Tips & troubleshooting
Q: A ticker isn't being resolved.
A: The actor uses SEC's own company_tickers.json mapping which is US-listed only. ADRs and OTC-only tickers may not be in there — pass the CIK directly in that case.
Q: I'm seeing HTTP 403 from SEC.
A: Almost always the User-Agent. Make sure userAgentEmail is set to a real-looking string with @ in it. SEC does block requests without an identifying UA.
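If you call SEC's endpoints yourself and hit the same 403, the fix is the same identifying header. Here is a minimal standard-library sketch; the client name `example-edgar-client` and the helper `sec_request` are placeholders of our own, not the string the actor sends.

```python
import urllib.request


def sec_request(url: str, contact_email: str) -> urllib.request.Request:
    """Build a request whose User-Agent identifies the caller with a contact email."""
    if "@" not in contact_email:
        raise ValueError("contact_email should look like an email address")
    headers = {"User-Agent": f"example-edgar-client {contact_email}"}
    return urllib.request.Request(url, headers=headers)


req = sec_request("https://data.sec.gov/submissions/CIK0000320193.json", "you@example.com")
print(req.get_header("User-agent"))

# To actually send it (network access required):
#   import json
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       profile = json.load(resp)
```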
Q: My fundamentals are coming back null for some companies.
A: Companies that don't file under US GAAP (foreign filers under IFRS, some investment companies) won't have the standard tags. The actor tries multiple tag alternatives; if all of them are missing, the company genuinely doesn't report that concept.
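The tag fallthrough can be sketched as follows, assuming the `companyconcept` response shape in which facts sit under `units` → `USD`, each with `end`, `val`, `fp` and `form`. The helper names and the toy numbers are illustrative only; a production version would likely also check each fact's `start`/`end` span so that a shorter period tagged FY is not mistaken for a full year.

```python
from __future__ import annotations

from typing import Any, Optional

REVENUE_TAGS = ["Revenues", "RevenueFromContractWithCustomerExcludingAssessedTax"]


def latest_annual_value(concept: dict[str, Any], unit: str = "USD") -> Optional[dict[str, Any]]:
    """Return the FY fact from a 10-K with the latest period end, or None if there is none."""
    facts = [
        f for f in concept.get("units", {}).get(unit, [])
        if f.get("fp") == "FY" and f.get("form") == "10-K"
    ]
    if not facts:
        return None
    # ISO dates sort correctly as strings; the filing date breaks ties between restatements.
    return max(facts, key=lambda f: (f["end"], f.get("filed", "")))


def first_populated(concepts_by_tag: dict[str, dict[str, Any]], tags: list[str]) -> Optional[tuple[str, dict[str, Any]]]:
    """Walk the tag list in order and return the first tag that yields an annual fact."""
    for tag in tags:
        concept = concepts_by_tag.get(tag)
        if concept is not None:
            fact = latest_annual_value(concept)
            if fact is not None:
                return tag, fact
    return None


# Toy responses keyed by tag: the company reports only the newer revenue tag.
toy_concepts = {
    "RevenueFromContractWithCustomerExcludingAssessedTax": {
        "units": {"USD": [
            {"end": "2023-12-31", "val": 100, "fp": "FY", "form": "10-K", "filed": "2024-02-15"},
            {"end": "2024-12-31", "val": 120, "fp": "FY", "form": "10-K", "filed": "2025-02-14"},
            {"end": "2025-03-31", "val": 30, "fp": "Q1", "form": "10-Q", "filed": "2025-05-01"},
        ]}
    }
}
result = first_populated(toy_concepts, REVENUE_TAGS)
print(result)
```

When `first_populated` returns `None` for every alternative, the row's field stays `null`, which is the situation this answer describes.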
Q: I want a specific 8-K from 2026 — can I extract it directly?
A: The actor surfaces every recent filing with its documentUrl. Once you have the URL, you can either follow up with another scraping pipeline (HTML, PDF) or paste it into an LLM for extraction.
Q: How do I get insider trades?
A: Set filingFormTypes: ["4"] — that filters the recent-filings array down to Section 16 transactions. The documentUrl then points at the Form 4 XML, which carries the officer name, transaction type, shares and price.
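If you already hold a dataset that was fetched without the filter, the same selection can be made client-side. A small sketch over rows shaped like the output sample (the function name `insider_filings` is hypothetical):

```python
from __future__ import annotations

from typing import Any


def insider_filings(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Flatten every Form 4 entry in the rows' recentFilings arrays into one list."""
    found = []
    for row in rows:
        for filing in row.get("recentFilings") or []:
            if filing.get("form") == "4":
                found.append({
                    "ticker": row.get("primaryTicker"),
                    "filingDate": filing.get("filingDate"),
                    "documentUrl": filing.get("documentUrl"),
                })
    return found


# Trimmed rows in the shape of the output sample; URLs shortened to placeholders.
sample_rows = [
    {
        "primaryTicker": "AAPL",
        "recentFilings": [
            {"form": "4", "filingDate": "2026-05-29", "documentUrl": "https://example.invalid/form4.xml"},
            {"form": "8-K", "filingDate": "2026-05-01", "documentUrl": "https://example.invalid/8k.htm"},
        ],
    },
    {"primaryTicker": "MSFT", "recentFilings": None},
]
form4 = insider_filings(sample_rows)
print(form4)
```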
Q: How do I run this for the entire S&P 500 weekly?
A: Use Apify Schedules with a weekly cron + a static S&P-500 ticker list. The whole list runs in a minute or two on the default config.
Q: SEC says it'll rate-limit me.
A: 10 req/s is the published cap. maxConcurrency: 5 + requestDelay: 150 ≈ 6 req/s. Bump requestDelay to 250 if you do see 429s.
Q: Is the data fresh?
A: Real-time. We hit SEC's public endpoints on every run; SEC updates submissions JSON within minutes of a filing being accepted.
How this compares
There's no other generic SEC EDGAR-companies scraper on the Apify Store at this scope. The adjacent listings (Crunchbase scrapers at $0.01–0.02 per item) cover private-company funding rounds — useful but a different dataset. This actor is the public-equity equivalent: every US-listed company at fractional-cent pricing.
- `pratikdani/crunchbase-companies-scraper`: private companies, priced at $0.015/item.
- This actor: public companies, regulatory ground truth, official source.
The two are complementary: run Crunchbase scrapers for pre-IPO targets, run this one for post-IPO coverage.
Legal & ethical use
All data this actor returns is public SEC disclosure. There is no terms-of-service issue, no anti-bot to bypass, no robots.txt to honour beyond SEC's own (which we do — we only hit the published endpoints).
The only obligation is SEC's User-Agent identification rule — the actor handles that automatically and lets you set your own contact email if you'd prefer.
Changelog
- 1.0 — Initial release. Tickers / CIKs / fuzzy company-names input. Profile + recent filings + optional XBRL fundamentals (Revenue, NetIncome, Assets, Equity). Concurrent workers, SEC User-Agent compliance, EDGAR document + index URL construction.
Roadmap / feature requests
- Full-text filings search input (use `efts.sec.gov` to search by keyword across all filings).
- Section 16 insider trades parsed into individual transaction rows (Form 4 XML).
- Historical XBRL time-series mode (one row per (cik, concept, period) instead of latest-only).
- Form 13F holdings (institutional investor positions) per CIK.
- Foreign filer support (IFRS taxonomy concepts).
Drop a comment on the Apify Store page if any of these would unblock you.