SEC EDGAR Scraper | US Public Company Filings & Profiles

Pricing: from $1.20 / 1,000 results

SEC EDGAR scraper & API: export US public-company filings, profiles and XBRL fundamentals (revenue, net income, total assets, total equity) by ticker, CIK or name. Built for equity research, financial-statement data and compliance work, using the official data.sec.gov API: fast, no login.

Rating: 0.0 (0 ratings)

Developer: Haketa (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified a day ago

The fastest way to turn a list of US stock tickers into a structured dataset of company profiles, recent SEC filings (10-K, 10-Q, 8-K, S-1, Form 4 …) and XBRL fundamentals (revenue, net income, total assets, total equity). Built on top of SEC's open data.sec.gov and www.sec.gov endpoints — no auth, no anti-bot, no HTML scraping — just clean JSON straight from the source of truth for US public-company disclosure.

Pass ["AAPL", "MSFT", "NVDA"] and you get back per-row CIK, ticker, exchange, SIC industry, EIN, fiscal-year end, mailing + business addresses, registered phone, plus the company's latest filings list with direct document URLs.

TL;DR: tickers in, structured company-and-filings rows out. Financial analysts pay terminal vendors such as Bloomberg four figures per seat per month for this kind of data; this actor covers only the US public universe, at a fraction of a cent per row.


What you get

For each company, one row with:

Field | Description
--- | ---
cik | Central Index Key, 10-digit zero-padded (SEC's universal company ID)
name | Registered company name
tickers, primaryTicker | All listed tickers, plus the first one
exchanges, primaryExchange | All listing exchanges, plus the first one (NYSE, Nasdaq, NYSE Arca, …)
sic, sicDescription | Standard Industrial Classification code, plus the plain-text industry
ein | Federal Employer Identification Number
lei | Legal Entity Identifier (when filed)
description | Short profile description (when SEC has one)
website, investorWebsite | Corporate and investor-relations site URLs
category | Filer category: Large accelerated, Accelerated, Non-accelerated, Smaller reporting
fiscalYearEnd | Fiscal year end as MMDD, e.g. 0926 for Apple, whose fiscal year ends in late September
stateOfIncorporation | Two-letter state code
entityType | "operating", "investment company", etc.
phone | Registered SEC phone number
formerNames | Array of previously filed names
mailingStreet, mailingCity, mailingState, mailingZip, mailingCountry | Mailing-address breakdown
businessStreet, businessCity, businessState, businessZip, businessCountry | Business-address breakdown
insiderTransactionForOwnerExists, insiderTransactionForIssuerExists | Booleans: whether insider-transaction (Form 4 and related) data exists with the company as reporting owner / as issuer
recentFilings | Array of recent filings (see schema below)
recentFilingsCount, lastFilingDate, lastFilingForm | Filings summary
revenueLatest, revenueLatestPeriod, netIncomeLatest, totalAssetsLatest, totalEquityLatest | XBRL fundamentals (when includeFundamentals: true)
edgarUrl, submissionsUrl | Direct links to the EDGAR company page and the raw submissions JSON
searchInput, scrapedAt | Echo of how the row was identified, plus an ISO timestamp

recentFilings array shape

Each entry:

  • form: 10-K, 10-Q, 8-K, S-1, DEF 14A, 4, …
  • filingDate: ISO date when SEC accepted the filing
  • reportDate: ISO date of the report period it covers
  • accessionNumber: SEC's unique filing ID (e.g. 0000320193-25-000123)
  • primaryDocument: file name of the primary document
  • primaryDocDescription: SEC's short label for the document
  • fileNumber: SEC file number (e.g. 001-36743)
  • items: 8-K item numbers (e.g. 2.02), when applicable
  • sizeBytes: size in bytes, as reported by SEC
  • isXBRL, isInlineXBRL: booleans
  • documentUrl: direct download URL of the primary document (HTML / PDF / XML)
  • indexUrl: direct link to the filing's index page on SEC.gov

The complete schema with types is in dataset_schema.json.


Why this data is monetizable

US public-company disclosure is one of the most heavily regulated and widely syndicated datasets in finance. Bloomberg, Refinitiv, S&P Capital IQ and FactSet all charge four figures per seat per month for cleaned-up versions of the same fields the SEC publishes for free. The catch: SEC delivers the data as thousands of small per-company JSON endpoints keyed by CIK, with filings stored as parallel arrays and a mandatory User-Agent rule. That is not exactly drag-and-drop into Sheets or Snowflake.

This actor closes that gap:

  • No subscription, pay-per-use. Pay only for the results your runs produce.
  • Multi-company batch. Throw a list of tickers in, get rows out.
  • Tickers, CIKs and fuzzy company names all resolved automatically against SEC's official mapping (≈12,000 companies kept in sync).
  • Recent filings already linked. Every row carries documentUrl + indexUrl for each filing — open them straight in the browser or pipe to an LLM.
  • XBRL fundamentals on demand. Revenue, net income, total assets and total equity (the four headline accounting facts every analyst joins on), fetched with one extra request per concept per company.

Use cases

Equity / credit research

  • Build a daily watchlist of every 8-K filed by your coverage list — material-event monitoring becomes a two-minute setup, not a $500/seat compliance subscription.
  • Pre-screen earnings season: pull recent 10-K / 10-Q filings plus the latest revenue and net income for every name in your sector in one run.
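Once a run finishes, the 8-K watchlist is a few lines of post-processing over the exported dataset. The Python sketch below is hypothetical: it assumes rows shaped like the output sample further down (recentFilings entries with form, filingDate, items and documentUrl), and the helper name recent_8ks is ours, not part of the actor.

```python
from __future__ import annotations

from datetime import date


def recent_8ks(rows: list[dict], since: date) -> list[tuple]:
    """(ticker, filingDate, items, documentUrl) for every 8-K filed on or after `since`."""
    hits = []
    for row in rows:
        # recentFilings is missing or empty when includeRecentFilings is false
        for filing in row.get("recentFilings") or []:
            if filing.get("form") != "8-K":
                continue
            if date.fromisoformat(filing["filingDate"]) < since:
                continue
            hits.append((row.get("primaryTicker"), filing["filingDate"],
                         filing.get("items"), filing["documentUrl"]))
    # ISO dates sort correctly as strings; newest first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Run daily with since set to yesterday, this gives a material-event list; items carries SEC's 8-K item numbers (2.02, for example, is Results of Operations and Financial Condition).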

M&A / corporate development

  • Map every S-1 (IPO) and S-4 (merger / acquisition) filed over the last 12 months by the companies on your TAM list, for an instant pipeline of potential targets and competitive intel.
  • Cross-reference sic + revenue size against your target profile.

Sales / lead generation (B2B)

  • Companies on SEC EDGAR are US public companies, typically larger and better funded than the average B2B prospect. Filter by sicDescription for an industry-matched outbound list, then use phone and the business address for contact details.
  • Use category: "Large accelerated filer" to filter to companies with a public float of $700M or more (highest LTV).
  • Beneficial ownership and entity-mapping projects — formerNames, ein, stateOfIncorporation, lei are the join keys you need across multiple regulators.
  • Section 16 monitoring — filter recentFilings to form: "4" for insider transactions.

Quantitative research / AI / ML

  • Build training corpora from public filings: feed every filing's documentUrl into an ingestion pipeline and on to an LLM.
  • Backtest fundamentals — includeFundamentals: true plus your own time-series of past runs gives you a historical XBRL panel.

Journalism / investigative research

  • Track filings by named individuals (look up their company via tickers / CIKs).
  • Map cross-holdings via insiderTransactionForIssuerExists + Form 4 filings.

Real-estate / proptech analytics

  • REITs are SEC filers: filter your output to sicDescription: "Real Estate Investment Trusts" and rank by totalAssetsLatest.

Sustainability / ESG

  • Filter recentFilings to form: "SD" (Specialized Disclosure Report, used for conflict-minerals disclosure) or DEF 14A proxy statements for an instant ESG-disclosure snapshot.

Inputs (full list)

Definitions live in input_schema.json; here's the human summary.

  • tickers (array) — US stock tickers. Case-insensitive. Examples: AAPL, MSFT, BRK.B. Resolved against SEC's official mapping.
  • ciks (array) — 10-digit CIKs, or shorter (we zero-pad). Use this when you already know the CIK.
  • companyNames (array) — Fuzzy company names (case-insensitive). Matches against SEC's mapping titles. Use tickers / CIKs for exact lookups.
  • includeRecentFilings (boolean) — Attach the recent-filings list (default true).
  • maxFilingsPerCompany (integer): Cap on recent-filings rows per company. SEC's submissions endpoint returns the most recent filings (at least one year's worth or 1,000 filings, whichever is more); we trim to your cap. Default 50.
  • filingFormTypes (array) — Optional filter to specific forms (10-K, 10-Q, 8-K, S-1, 4, DEF 14A, …). Leave empty to keep them all.
  • includeFundamentals (boolean): Pulls the latest reported value of Revenue, Net Income, Total Assets and Total Stockholders' Equity from SEC's XBRL endpoint. Adds at least one extra request per concept per company (more when fallback tag names are tried). Default false.
  • userAgentEmail (string): SEC requires every API client to identify itself in the User-Agent header. Enter your contact email here. Runs work without it, but SEC expects your own contact details, which matters more as your runs grow.
  • maxConcurrency (integer): Number of companies processed in parallel. Default 5. SEC's published cap is 10 req/s, so the default leaves headroom.
  • requestDelay (integer) — Per-worker delay between requests. Default 150 ms.
  • proxyConfiguration (proxy) — SEC EDGAR is open from any IP, so proxy defaults to OFF.
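The lookup rules above (case-insensitive tickers, zero-padded CIKs, fuzzy names matched against SEC's mapping titles) can be approximated as below. This is a sketch, not the actor's code. It assumes company_tickers.json has been parsed into a dict of {"cik_str", "ticker", "title"} records, treats a dotted share class such as BRK.B as the hyphenated BRK-B (our assumption about how SEC's file spells it), and uses a prefix match with a difflib fallback as a stand-in for whatever fuzzy matching the actor really does.

```python
from __future__ import annotations

import difflib


def pad_cik(cik: str | int) -> str:
    """Zero-pad a CIK to SEC's 10-digit form ("320193" -> "0000320193")."""
    digits = str(cik).strip()
    if not digits.isdigit() or len(digits) > 10:
        raise ValueError(f"not a CIK: {cik!r}")
    return digits.zfill(10)


def build_indexes(company_tickers: dict) -> tuple[dict[str, str], dict[str, str]]:
    """Index SEC's ticker file by upper-case ticker and by lower-case title."""
    by_ticker: dict[str, str] = {}
    by_title: dict[str, str] = {}
    for entry in company_tickers.values():
        cik = pad_cik(entry["cik_str"])
        by_ticker.setdefault(entry["ticker"].upper(), cik)
        by_title.setdefault(entry["title"].lower(), cik)
    return by_ticker, by_title


def resolve_ticker(ticker: str, by_ticker: dict[str, str]) -> str | None:
    symbol = ticker.strip().upper()
    # assumption: share classes are spelled with a hyphen (BRK-B), so try that too
    return by_ticker.get(symbol) or by_ticker.get(symbol.replace(".", "-"))


def resolve_name(name: str, by_title: dict[str, str]) -> str | None:
    query = name.strip().lower()
    prefixed = [title for title in by_title if title.startswith(query)]
    if prefixed:
        # shortest title wins: "apple inc." beats "apple hospitality reit, inc."
        return by_title[min(prefixed, key=len)]
    close = difflib.get_close_matches(query, list(by_title), n=1, cutoff=0.6)
    return by_title[close[0]] if close else None
```

Name matching is inherently ambiguous ("apple" prefixes several registrants), which is why the shortest matching title wins here and why tickers or CIKs remain the safer inputs.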

Example inputs

1. Three big tech profiles, with filings

{
  "tickers": ["AAPL", "MSFT", "GOOGL"],
  "includeRecentFilings": true,
  "maxFilingsPerCompany": 30
}

2. 10 tickers, just 10-K and 10-Q forms

{
  "tickers": ["AAPL", "MSFT", "NVDA", "GOOGL", "AMZN", "META", "TSLA", "JPM", "BRK.B", "WMT"],
  "filingFormTypes": ["10-K", "10-Q"],
  "maxFilingsPerCompany": 8
}

3. Full fundamentals dump for a coverage list

{
  "tickers": ["AAPL", "MSFT", "GOOGL", "AMZN", "META"],
  "includeFundamentals": true,
  "includeRecentFilings": false
}

4. Lookup by company name (fuzzy)

{
  "companyNames": ["apple", "nvidia", "berkshire"]
}

5. Direct CIK list (no resolution step)

{
  "ciks": ["0000320193", "0000789019", "1652044"],
  "maxFilingsPerCompany": 25
}

Output sample

{
  "cik": "0000320193",
  "name": "Apple Inc.",
  "tickers": ["AAPL"],
  "primaryTicker": "AAPL",
  "exchanges": ["Nasdaq"],
  "primaryExchange": "Nasdaq",
  "sic": "3571",
  "sicDescription": "Electronic Computers",
  "ein": "942404110",
  "lei": null,
  "description": "",
  "website": null,
  "investorWebsite": null,
  "category": "Large accelerated filer",
  "fiscalYearEnd": "0926",
  "stateOfIncorporation": "CA",
  "entityType": "operating",
  "phone": "(408) 996-1010",
  "formerNames": ["APPLE COMPUTER INC"],
  "mailingStreet": "ONE APPLE PARK WAY",
  "mailingCity": "CUPERTINO",
  "mailingState": "CA",
  "mailingZip": "95014",
  "mailingCountry": null,
  "businessStreet": "ONE APPLE PARK WAY",
  "businessCity": "CUPERTINO",
  "businessState": "CA",
  "businessZip": "95014",
  "businessCountry": null,
  "insiderTransactionForOwnerExists": false,
  "insiderTransactionForIssuerExists": true,
  "recentFilingsCount": 30,
  "lastFilingDate": "2026-05-29",
  "lastFilingForm": "4",
  "recentFilings": [
    {
      "form": "4",
      "filingDate": "2026-05-29",
      "reportDate": "2026-05-27",
      "accessionNumber": "0001140361-26-023363",
      "primaryDocument": "xslF345X06/form4.xml",
      "primaryDocDescription": "FORM 4",
      "fileNumber": "001-36743",
      "items": null,
      "sizeBytes": 6543,
      "isXBRL": false,
      "isInlineXBRL": false,
      "documentUrl": "https://www.sec.gov/Archives/edgar/data/320193/000114036126023363/xslF345X06/form4.xml",
      "indexUrl": "https://www.sec.gov/Archives/edgar/data/320193/000114036126023363/0001140361-26-023363-index.htm"
    }
  ],
  "revenueLatest": 391035000000,
  "revenueLatestPeriod": "2025-09-28",
  "netIncomeLatest": 99853000000,
  "totalAssetsLatest": 364980000000,
  "totalEquityLatest": 73330000000,
  "edgarUrl": "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000320193&type=&dateb=&owner=include&count=40",
  "submissionsUrl": "https://data.sec.gov/submissions/CIK0000320193.json",
  "searchInput": "ticker:AAPL",
  "scrapedAt": "2026-05-31T19:14:02.000Z"
}

Cost & throughput

Pay-per-event pricing, from $1.20 per 1,000 results; the exact tiers are set on the actor's Apify Store page.

Throughput on the default config (maxConcurrency: 5, requestDelay: 150ms):

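  • Profile + filings only: ~10 companies / second (one submissions request per company).
  • Profile + filings + fundamentals: at most ~2 companies / second. Each company needs at least 5 requests (submissions plus 4 XBRL concepts), and SEC's 10 req/s cap bounds the total.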

The S&P 500 (about 500 tickers) runs in under a minute on the default config with profile + filings only.

SEC's published rate limit is 10 req/s per IP. The defaults are tuned to stay under it; raising maxConcurrency above 8 risks intermittent 429s.
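A back-of-the-envelope check helps when sizing a run: the total request count divided by SEC's 10 req/s cap is a lower bound on run time. This Python sketch assumes one submissions request per company, four companyconcept requests per company when includeFundamentals is on (more if fallback tags are tried), and one company_tickers.json download per run; it ignores latency, retries and failed lookups.

```python
SEC_MAX_REQ_PER_SEC = 10.0  # SEC's published fair-access limit


def requests_per_company(include_fundamentals: bool, xbrl_concepts: int = 4) -> int:
    # one submissions request, plus one companyconcept request per XBRL concept
    return 1 + (xbrl_concepts if include_fundamentals else 0)


def min_run_seconds(companies: int, include_fundamentals: bool,
                    max_rps: float = SEC_MAX_REQ_PER_SEC) -> float:
    # +1 for the company_tickers.json download at the start of the run
    total = companies * requests_per_company(include_fundamentals) + 1
    return total / max_rps


def max_companies_per_second(include_fundamentals: bool,
                             max_rps: float = SEC_MAX_REQ_PER_SEC) -> float:
    return max_rps / requests_per_company(include_fundamentals)
```

For 500 tickers this gives about 50 seconds without fundamentals and about 250 seconds with them; with fundamentals the cap allows at most 2 companies per second.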


How it works (under the hood)

  1. Ticker map cache — fetched once per run from https://www.sec.gov/files/company_tickers.json (~800 KB, ~12,000 companies). Used to resolve tickers and companyNames into the official 10-digit CIK.
  2. Submissions fetch — for each CIK, https://data.sec.gov/submissions/CIK{CIK}.json returns the company profile, addresses, fiscal year end, plus the recent-filings arrays (form[], filingDate[], accessionNumber[], primaryDocument[], …). We zip the parallel arrays into a single recentFilings array per company.
  3. Filing URL construction: documentUrl is built from cik (leading zeros stripped) + accessionNumber (de-dashed) + primaryDocument. indexUrl points at the SEC.gov index page for the filing.
  4. XBRL fundamentals (optional): for each headline tag (Revenues, NetIncomeLoss, Assets, StockholdersEquity) we hit https://data.sec.gov/api/xbrl/companyconcept/CIK{CIK}/us-gaap/{tag}.json and pick the most recent annual (FY) reported value. If a tag is missing, we fall through known alternative tag names (e.g. RevenueFromContractWithCustomerExcludingAssessedTax, the revenue tag used under the ASC 606 standard), so companies reporting under those tags still come back populated.
  5. Concurrency — up to maxConcurrency workers process companies in parallel with a requestDelay-ms sleep per task to stay under SEC's 10 req/s cap.
  6. User-Agent — every request sends User-Agent: Apify EDGAR Scraper (haketa) <your-email>. SEC blocks anonymous / generic-bot UAs with HTTP 403.
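Steps 2 and 3 above can be sketched in Python as follows. This is an illustration, not the actor's source: the column names under filings.recent (accessionNumber, form, filingDate, primaryDocument, size, …) are SEC's as far as we know, newest-first ordering is assumed, and the function names are hypothetical.

```python
from __future__ import annotations

SEC_ARCHIVES = "https://www.sec.gov/Archives/edgar/data"


def filing_urls(cik: str, accession: str, primary_document: str) -> tuple[str, str]:
    """Build (documentUrl, indexUrl) for one filing."""
    cik_path = str(int(cik))             # Archives paths drop the CIK's leading zeros
    folder = accession.replace("-", "")  # folder name is the de-dashed accession number
    base = f"{SEC_ARCHIVES}/{cik_path}/{folder}"
    return f"{base}/{primary_document}", f"{base}/{accession}-index.htm"


def zip_recent_filings(submissions: dict, cik: str,
                       forms: set[str] | None = None, limit: int = 50) -> list[dict]:
    """Turn SEC's parallel arrays into one dict per filing, filtered and capped."""
    recent = submissions["filings"]["recent"]
    count = len(recent["accessionNumber"])

    def column(name: str) -> list:
        # tolerate a missing column instead of raising KeyError
        return recent.get(name) or [None] * count

    cols = {name: column(name) for name in (
        "form", "filingDate", "reportDate", "primaryDocument", "primaryDocDescription",
        "fileNumber", "items", "size", "isXBRL", "isInlineXBRL")}
    rows = []
    for i, accession in enumerate(recent["accessionNumber"]):
        form = cols["form"][i]
        if forms and form not in forms:
            continue
        document_url, index_url = filing_urls(cik, accession, cols["primaryDocument"][i])
        rows.append({
            "form": form,
            "filingDate": cols["filingDate"][i],
            "reportDate": cols["reportDate"][i] or None,  # empty string becomes null
            "accessionNumber": accession,
            "primaryDocument": cols["primaryDocument"][i],
            "primaryDocDescription": cols["primaryDocDescription"][i],
            "fileNumber": cols["fileNumber"][i],
            "items": cols["items"][i] or None,
            "sizeBytes": cols["size"][i],
            "isXBRL": bool(cols["isXBRL"][i]),
            "isInlineXBRL": bool(cols["isInlineXBRL"][i]),
            "documentUrl": document_url,
            "indexUrl": index_url,
        })
        if len(rows) >= limit:
            break
    return rows
```

Filtering before capping means maxFilingsPerCompany counts only the forms you asked for, which matches how the filingFormTypes example with a cap of 8 reads.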

No Cheerio, no Playwright, no Cloudflare bypass — SEC EDGAR is one of the few datasets on the internet where raw HTTP "just works".
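Because plain HTTP is enough, the only thing a client must get right is the identifying User-Agent. A minimal standard-library sketch, assuming the UA format from step 6 of the list above; the function names are hypothetical, and fetch_json deliberately skips retries, gzip handling and rate limiting.

```python
import json
import urllib.request

DEFAULT_APP = "Apify EDGAR Scraper (haketa)"  # prefix of the UA string described in step 6


def build_sec_request(url: str, email: str, app: str = DEFAULT_APP) -> urllib.request.Request:
    """GET request carrying the identifying User-Agent SEC asks for."""
    if "@" not in email:
        raise ValueError("SEC expects a contact email in the User-Agent header")
    return urllib.request.Request(url, headers={"User-Agent": f"{app} <{email}>"})


def fetch_json(url: str, email: str, timeout: float = 30.0):
    """Plain HTTP GET plus JSON decode; no retries, no gzip handling, no rate limiting."""
    with urllib.request.urlopen(build_sec_request(url, email), timeout=timeout) as resp:
        return json.load(resp)
```

Calling fetch_json("https://data.sec.gov/submissions/CIK0000320193.json", "you@example.com") should return Apple's submissions document; dropping the User-Agent is what triggers SEC's 403.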


Tips & troubleshooting

Q: A ticker isn't being resolved. A: The actor uses SEC's own company_tickers.json mapping which is US-listed only. ADRs and OTC-only tickers may not be in there — pass the CIK directly in that case.

Q: I'm seeing HTTP 403 from SEC. A: Almost always the User-Agent. Make sure userAgentEmail is set to a real-looking string with @ in it. SEC does block requests without an identifying UA.

Q: My fundamentals are coming back null for some companies. A: Companies that don't file under US GAAP (foreign filers under IFRS, some investment companies) won't have the standard tags. The actor tries multiple tag alternatives; if all of them are missing the company genuinely doesn't report that concept.
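The "most recent annual value, with fallback tags" logic can be sketched as below. Assumptions: each input is the parsed JSON of one companyconcept request (None standing in for a 404 when the company never used that tag), facts sit under units.USD with end, start, val, fp and filed keys, and the tag list is illustrative. One deliberate difference from a strict fall-through: this version compares candidates across all tags and keeps the latest period, because a company that switched revenue tags can still have stale facts under the old one.

```python
from __future__ import annotations

from datetime import date

# Illustrative fallback list for revenue, not the actor's published list.
REVENUE_TAGS = ["Revenues", "RevenueFromContractWithCustomerExcludingAssessedTax", "SalesRevenueNet"]


def latest_annual_fact(concept: dict | None, unit: str = "USD") -> dict | None:
    """Most recent full-year fact in one companyconcept payload, or None."""
    if not concept:
        return None
    best = None
    for fact in concept.get("units", {}).get(unit, []):
        if fact.get("fp") != "FY":
            continue
        if fact.get("start"):
            # duration concepts: a 10-K can also tag quarterly amounts with fp == "FY"
            days = (date.fromisoformat(fact["end"]) - date.fromisoformat(fact["start"])).days
            if days < 350:
                continue
        key = (fact["end"], fact.get("filed", ""))  # latest period, then latest filing
        if best is None or key > (best["end"], best.get("filed", "")):
            best = fact
    return best


def headline_value(concepts_by_tag: dict[str, dict | None], tags: list[str]) -> dict | None:
    """Latest annual value across all fallback tags, or None if none were reported."""
    candidates = []
    for tag in tags:
        fact = latest_annual_fact(concepts_by_tag.get(tag))
        if fact:
            candidates.append({"tag": tag, "value": fact["val"], "period": fact["end"]})
    return max(candidates, key=lambda c: c["period"]) if candidates else None
```

A None result is the case described in the answer: none of the tags exist for the company, which is typical for IFRS filers.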

Q: I want a specific 8-K from 2026 — can I extract it directly? A: The actor surfaces every recent filing with its documentUrl. Once you have the URL, you can either follow up with another scraping pipeline (HTML, PDF) or paste it into an LLM for extraction.

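Q: How do I get insider trades? A: Set filingFormTypes: ["4"]. That filters the recent-filings array down to Section 16 transactions. The documentUrl then points at the Form 4 filing, which carries the reporting owner's name, transaction code, shares and price. When primaryDocument starts with an xsl directory (xslF345X06/ in the output sample), the URL opens SEC's rendered HTML view of the underlying XML.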
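For a first pass at Form 4 contents, Python's standard-library XML parser is enough. This sketch rests on two assumptions: that the xsl directory in primaryDocument only selects SEC's HTML rendering, so dropping it yields the raw ownershipDocument XML, and that the element paths below follow SEC's ownership XML layout. It reads non-derivative transactions only; derivative ones (options, RSUs) sit in a separate table. Function names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def raw_form4_url(document_url: str) -> str:
    """Drop an 'xsl...' directory from the path (assumed to be SEC's HTML rendering)."""
    head, _, filename = document_url.rpartition("/")
    parent, _, last_dir = head.rpartition("/")
    return f"{parent}/{filename}" if last_dir.startswith("xsl") else document_url


def _number(node: ET.Element, path: str) -> float | None:
    text = (node.findtext(path) or "").strip()
    return float(text) if text else None  # price is often absent, e.g. for gifts


def parse_form4(xml_text: str) -> list[dict]:
    """One dict per non-derivative transaction in a Form 4 ownershipDocument."""
    root = ET.fromstring(xml_text)
    owner = root.findtext("reportingOwner/reportingOwnerId/rptOwnerName")  # first owner only
    rows = []
    for tx in root.iter("nonDerivativeTransaction"):
        rows.append({
            "owner": owner,
            "security": tx.findtext("securityTitle/value"),
            "date": tx.findtext("transactionDate/value"),
            "code": tx.findtext("transactionCoding/transactionCode"),  # e.g. P buy, S sale
            "shares": _number(tx, "transactionAmounts/transactionShares/value"),
            "price": _number(tx, "transactionAmounts/transactionPricePerShare/value"),
            "acquiredOrDisposed": tx.findtext(
                "transactionAmounts/transactionAcquiredDisposedCode/value"),
        })
    return rows
```

Fetch the raw XML with the same identifying User-Agent as every other SEC request.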

Q: How do I run this for the entire S&P 500 weekly? A: Use Apify Schedules with a weekly cron (for example 0 6 * * 1, every Monday at 06:00) plus a static S&P 500 ticker list. Without fundamentals, the whole list runs in about a minute on the default config.

Q: SEC says it'll rate-limit me. A: 10 req/s is the published cap. With maxConcurrency: 5 and requestDelay: 150 the actor runs at roughly 6 req/s, though the exact rate depends on how fast SEC responds. Bump requestDelay to 250 if you do see 429s.
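The worker pattern from step 5 of "How it works" can be sketched with asyncio: a semaphore caps in-flight tasks at maxConcurrency, and each task sleeps requestDelay milliseconds before releasing its slot. The rate model alongside it is an assumption, not a measurement: each slot turns over once per (response latency + requestDelay), so the ≈6 req/s figure corresponds to roughly 0.7 s of latency per request. Names are hypothetical, and the handler stands in for the real HTTP call.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_workers(items: list[T], handler: Callable[[T], Awaitable[R]],
                      max_concurrency: int = 5, request_delay_ms: int = 150) -> list[R]:
    """Process items with at most `max_concurrency` in flight; results keep input order."""
    slots = asyncio.Semaphore(max_concurrency)

    async def run_one(item: T) -> R:
        async with slots:
            result = await handler(item)
            # hold the slot during the pause so the delay actually throttles the rate
            await asyncio.sleep(request_delay_ms / 1000)
            return result

    return list(await asyncio.gather(*(run_one(item) for item in items)))


def effective_rate(max_concurrency: int, request_delay_ms: float, latency_ms: float) -> float:
    """Steady-state requests/second when every slot stays busy (simple model)."""
    return max_concurrency * 1000 / (request_delay_ms + latency_ms)
```

Under this model, 50 ms responses would let the same defaults reach 25 req/s, so the delay alone does not guarantee staying under SEC's cap; at 700 ms latency, moving requestDelay from 150 to 250 takes the rate from about 5.9 to about 5.3 req/s.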

Q: Are the data fresh? A: Close to real time. We hit SEC's public endpoints on every run, and SEC updates the submissions JSON within minutes of a filing being accepted.


How this compares

At the time of writing, there's no other generic SEC EDGAR-companies scraper on the Apify Store at this scope. The adjacent listings (Crunchbase scrapers at $0.01–0.02 per item) cover private-company funding rounds: useful, but a different dataset. This actor is the public-equity equivalent: every US-listed company at fractional-cent pricing.

  • pratikdani/crunchbase-companies-scraper — private companies, paid for via $0.015/item.
  • This actor — public companies, regulatory ground truth, official source.

The two are complementary: run Crunchbase scrapers for pre-IPO targets, run this one for post-IPO coverage.


All data this actor returns is public SEC disclosure. There are no restrictive terms of service to work around and no anti-bot to bypass, and the only robots.txt that applies is SEC's own, which the actor honours by hitting only the published endpoints.

The obligations that do apply come from SEC's fair-access policy: identify yourself in the User-Agent header and stay under 10 requests per second. The actor handles both by default and lets you set your own contact email if you'd prefer.


Changelog

  • 1.0 — Initial release. Tickers / CIKs / fuzzy company-names input. Profile + recent filings + optional XBRL fundamentals (Revenue, NetIncome, Assets, Equity). Concurrent workers, SEC User-Agent compliance, EDGAR document + index URL construction.

Roadmap / feature requests

  • Full-text filings search input (use efts.sec.gov to search by keyword across all filings).
  • Section 16 insider trades parsed into individual transaction rows (form 4 XML).
  • Historical XBRL time-series mode (one row per (cik, concept, period) instead of latest-only).
  • Form 13F holdings (institutional investor positions) per CIK.
  • Foreign filer support (IFRS taxonomy concepts).

Drop a comment on the Apify Store page if any of these would unblock you.