IRS 990 Scraper — Nonprofit Officer Comp
Pricing
Pay per event
Scrape IRS Form 990 nonprofit officer compensation and key financials as structured rows via the ProPublica Nonprofit Explorer API — bulk EIN lookup or name/state search, one row per filing year, 990 / 990EZ / 990PF field mapping — export to JSON or CSV. No key, no login.
Developer: DevilScrapes
IRS 990 Scraper — Nonprofit Officer Compensation
We do the dirty work so your dataset stays clean. 😈
$3.00 per 1,000 rows, plus a $0.05 start fee per run. Pay only for rows that land. No credit card required to try.
Extract IRS Form 990 nonprofit officer compensation and key financial line items as clean, structured rows — one row per (EIN, tax year) filing. Bulk EIN lookup or name-and-state search, normalised across all three return types (990, 990EZ, 990PF). Powered by ProPublica's Nonprofit Explorer API; public-domain IRS data, no API key required.
The IRS publishes Form 990 returns from roughly 1.6 million tax-exempt organisations, but each return is a deeply nested envelope and ProPublica's API returns 60+ flat fields wrapped in a verbose org object. The six existing Apify Actors for this API are thin wrappers that dump the raw JSON. This Actor structures the data: one row per filing year with the formtype-correct officer-comp field, a computed total-compensation-and-benefits figure, organisational context (state, city, NTEE, subsection), and surrounding financials (revenue, expenses, assets) for compensation-magnitude analysis.
🎯 What this scrapes
One ResultRow per (EIN, tax year) filing. Every row carries the same 21 columns regardless of which input mode you used. Data is published by ProPublica under their Terms of Use (IRS source data is public domain).
| Field | Type | Description |
|---|---|---|
| `ein` | string | 9-digit IRS Employer Identification Number (zero-padded) |
| `organization_name` | string | Legal name from ProPublica's `organization.name` |
| `organization_url` | string | Nonprofit Explorer organisation page URL |
| `state` | string or null | USPS 2-letter state of the organisation's principal address |
| `city` | string or null | City of the organisation's principal address |
| `ntee_code` | string or null | NTEE (National Taxonomy of Exempt Entities) classification code |
| `subsection_code` | int or null | IRC subsection (3 = 501(c)(3), 4 = 501(c)(4), etc.) |
| `tax_year` | int | Filing tax year (`tax_prd_yr`) |
| `tax_period_end` | string or null | Fiscal-year end (YYYY-MM) |
| `form_type` | string | `990`, `990EZ`, or `990PF` |
| `pdf_url` | string or null | Direct link to the IRS-filed 990 PDF on ProPublica |
| `officer_comp_usd` | int or null | Aggregate officer/director/trustee compensation |
| `other_salaries_wages_usd` | int or null | All non-officer salaries and wages (null on 990PF) |
| `payroll_taxes_usd` | int or null | Payroll taxes |
| `pension_contributions_usd` | int or null | Pension-plan contributions |
| `other_employee_benefits_usd` | int or null | Other employee benefits |
| `total_comp_and_benefits_usd` | int or null | Computed sum of the 5 comp/benefit fields above |
| `total_revenue_usd` | int or null | Total revenue (context for compensation magnitude) |
| `total_functional_expenses_usd` | int or null | Total functional expenses |
| `total_assets_end_usd` | int or null | Total assets at year-end |
| `scraped_at` | string | ISO 8601 UTC datetime this row was written |
🔥 Features
- Two input modes: `eins` (explicit list) and `searchQuery` (name search). A Pydantic XOR validator enforces exactly one before any network call. The EIN normaliser accepts `13-1684331`, `131684331`, or integer forms.
- Formtype-correct field mapping: officer comp lives in `compnsatncurrofcr` on 990/990EZ and `compofficers` on 990PF. We read the right field for each form, so private-foundation rows are not left null just because the 990/990EZ field is absent.
- Year-window filtering: `startYear` and `endYear` default to the current year minus 3 through the current year (both inclusive). Out-of-window filings are dropped silently.
- Computed total compensation and benefits: the sum of officer comp, non-officer salaries, payroll taxes, pension contributions, and other benefits, computed per filing. It is null only when all 5 components are null, and never a false zero.
- State filter (search mode): pass `stateFilter=CA` to scope a name search to a single state via `search.json?state[id]=`.
- Multi-form support: 990, 990EZ, and 990PF all produce clean rows. Unknown formtypes are logged and dropped (forward-compatible with new IRS forms).
- We handle retries with exponential backoff: `Retry-After` headers are honoured for `429` and `503` responses, with up to 5 attempts per request before we surface a clear error.
- We impersonate a real browser: `curl-cffi` with Chrome 131 TLS impersonation, so the upstream sees a real-browser TLS handshake rather than a Python script.
- We rotate proxies on every block: with `useProxy` enabled, requests go through Apify Proxy (the `BUYPROXIES94952` group) and switch to a fresh exit IP when blocked.
- Pydantic v2 input and output models: `form_type` is validated against the enum (`"990"` / `"990EZ"` / `"990PF"`), EINs are zero-padded to 9 digits, and every row is typed.
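The field mapping, null-aware total, and year window described above can be sketched in plain Python. This is a minimal illustration, not the Actor's source: the integer `formtype` codes (0, 1, 2) are an assumption about ProPublica's encoding, and the helper names (`form_type_of`, `officer_comp`, `total_comp_and_benefits`, `in_window`) are hypothetical. Only the two officer-comp field names come from this README.

```python
from typing import Optional

# Assumption: ProPublica encodes the form type as an integer code.
# Verify this mapping against the live API before relying on it.
FORMTYPE_CODES = {0: "990", 1: "990EZ", 2: "990PF"}

# Officer-comp source field per form type (field names as given in this README).
OFFICER_COMP_FIELD = {
    "990": "compnsatncurrofcr",
    "990EZ": "compnsatncurrofcr",
    "990PF": "compofficers",
}


def form_type_of(filing: dict) -> Optional[str]:
    """Return "990", "990EZ" or "990PF"; None means an unknown code (row is dropped)."""
    return FORMTYPE_CODES.get(filing.get("formtype"))


def officer_comp(filing: dict, form_type: str) -> Optional[int]:
    """Read officer compensation from the field that matches the form type."""
    value = filing.get(OFFICER_COMP_FIELD[form_type])
    return int(value) if value is not None else None


def total_comp_and_benefits(*components: Optional[int]) -> Optional[int]:
    """Sum the non-null components; return None only if every component is None."""
    present = [c for c in components if c is not None]
    return sum(present) if present else None


def in_window(tax_year: int, start_year: int, end_year: int) -> bool:
    """Inclusive year-window check used to drop out-of-window filings."""
    return start_year <= tax_year <= end_year
```

Summing only the non-null components is what keeps an all-null filing at `null` instead of `0`, while a filing that genuinely reports `0` stays `0`.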
💡 Use cases
- Nonprofit executive-pay benchmarking: bulk-pull officer compensation for a list of 100–1,000 peer organisations to benchmark CEO/CFO pay against revenue and asset size.
- Investigative journalism: flag nonprofits where officer comp is an outsized fraction of revenue or expenses; spot rapid year-over-year jumps across sector-wide datasets.
- Foundation due diligence: research a foundation's officer payroll and asset trajectory before applying for a grant or entering a partnership.
- Charity-rating data pipelines: feed a structured 990 dataset into your own scoring model without writing a ProPublica wrapper from scratch.
- Academic research: build multi-year panel datasets on nonprofit-sector compensation and finances for economics or public-policy papers.
- State-level nonprofit-sector reports: combine `stateFilter` with a broad `searchQuery` (e.g. `"foundation"`) to enumerate matching organisations across a state in one run, up to the `maxOrgs` cap of 5,000.
- Prospect research for nonprofit-tech vendors: enrich your CRM with 990-derived financials for sales outreach to development directors at larger organisations.
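For the journalism and benchmarking use cases, a minimal post-processing sketch over exported rows might look like this. It assumes rows shaped like those in the Output section (`ein`, `tax_year`, `officer_comp_usd`, `total_revenue_usd`); the 50% jump threshold and both function names are illustrative choices, not Actor features.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def comp_share_of_revenue(row: dict) -> Optional[float]:
    """Officer comp as a fraction of total revenue; None if either is missing or revenue <= 0."""
    comp = row.get("officer_comp_usd")
    revenue = row.get("total_revenue_usd")
    if comp is None or revenue is None or revenue <= 0:
        return None
    return comp / revenue


def yoy_jumps(rows: Iterable[dict], threshold: float = 0.5) -> List[Tuple[str, int, int, float]]:
    """Flag (ein, earlier_year, later_year, pct_change) where officer comp rose by more than threshold.

    Compares consecutive filings present in the data; missing years are not filled in.
    """
    by_ein: Dict[str, Dict[int, int]] = defaultdict(dict)
    for row in rows:
        if row.get("officer_comp_usd") is not None:
            by_ein[row["ein"]][row["tax_year"]] = row["officer_comp_usd"]

    flagged = []
    for ein, series in by_ein.items():
        years = sorted(series)
        for earlier, later in zip(years, years[1:]):
            before, after = series[earlier], series[later]
            if before > 0:
                change = (after - before) / before
                if change > threshold:
                    flagged.append((ein, earlier, later, change))
    return flagged
```

Comparing only consecutive filings that exist in the dataset avoids inventing values for gap years, at the cost of occasionally comparing filings two or more years apart.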
⚙️ How to use it
- Open the Actor input form on the Apify Console.
- Pick exactly one input mode:
  - EINs: supply a list of `eins` (hyphens optional, e.g. `["13-1684331"]`) for direct lookup.
  - Search query: set `searchQuery` to a name fragment (e.g. `"community foundation"`); optionally narrow with `stateFilter=NY`.
- Set `startYear` and `endYear` to scope the filing window. Defaults: current year minus 3 through current year.
- Set `maxOrgs` (1–5,000) to cap the organisation list in search mode. Ignored in `eins` mode.
- Toggle `useProxy` on if you want requests routed through Apify Proxy. The Actor handles retries and backoff regardless.
- Click Start. Rows stream into the default dataset and are available as JSON, CSV, Excel, or XML.
Single EIN, default window (current year minus 3 through current year)
```json
{"eins": ["131684331"]}
```
Multiple EINs, narrow window
```json
{
  "eins": ["131684331", "133871360", "530196605"],
  "startYear": 2021,
  "endYear": 2023
}
```
Name search in California, capped at 50 orgs
```json
{
  "searchQuery": "community foundation",
  "stateFilter": "CA",
  "maxOrgs": 50,
  "startYear": 2022,
  "endYear": 2023
}
```
📥 Input
| Field | Type | Default | Notes |
|---|---|---|---|
| `eins` | string[] | (none) | 9-digit EINs; hyphens stripped automatically. XOR with `searchQuery`. |
| `searchQuery` | string | (none) | 1–200 chars. XOR with `eins`. |
| `stateFilter` | string | `null` | USPS 2-letter code (`CA`, `NY`, …). Search mode only. |
| `startYear` | int | current year - 3 | Inclusive lower bound. |
| `endYear` | int | current year | Inclusive upper bound; must be >= `startYear`. |
| `maxOrgs` | int | 100 | 1–5,000. Search mode only. |
| `useProxy` | bool | `false` | Routes requests through Apify Proxy (`BUYPROXIES94952`). |
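The input rules in the table can be mirrored in a stdlib-only sketch. The Actor itself uses Pydantic v2; this dataclass version is a hypothetical stand-in that shows the same checks (the XOR rule, EIN normalisation, the year defaults, and the `maxOrgs` bounds). The camelCase attribute names mirror the JSON input keys. One assumption to note: if only `endYear` is supplied, this sketch still defaults `startYear` to the current year minus 3, which may not match how the Actor resolves partial windows.

```python
import datetime
import re
from dataclasses import dataclass
from typing import List, Optional, Union


def normalize_ein(raw: Union[str, int]) -> str:
    """Accept '13-1684331', '131684331' or 131684331; return 9 zero-padded digits."""
    digits = re.sub(r"\D", "", str(raw))
    if not digits or len(digits) > 9:
        raise ValueError(f"invalid EIN: {raw!r}")
    return digits.zfill(9)


@dataclass
class ActorInput:
    # camelCase attributes mirror the JSON input keys.
    eins: Optional[List[Union[str, int]]] = None
    searchQuery: Optional[str] = None
    stateFilter: Optional[str] = None
    startYear: Optional[int] = None
    endYear: Optional[int] = None
    maxOrgs: int = 100
    useProxy: bool = False

    def __post_init__(self) -> None:
        # XOR rule: exactly one input mode, checked before any network call.
        if (self.eins is None) == (self.searchQuery is None):
            raise ValueError("provide exactly one of eins or searchQuery")
        if self.eins is not None:
            if not self.eins:
                raise ValueError("eins must not be empty")
            self.eins = [normalize_ein(e) for e in self.eins]
        if self.searchQuery is not None and not 1 <= len(self.searchQuery) <= 200:
            raise ValueError("searchQuery must be 1-200 characters")
        if self.stateFilter is not None and not re.fullmatch(r"[A-Z]{2}", self.stateFilter):
            raise ValueError("stateFilter must be a 2-letter USPS code")
        # Year defaults: current year minus 3 through current year, both inclusive.
        this_year = datetime.date.today().year
        if self.endYear is None:
            self.endYear = this_year
        if self.startYear is None:
            self.startYear = this_year - 3
        if self.endYear < self.startYear:
            raise ValueError("endYear must be >= startYear")
        if not 1 <= self.maxOrgs <= 5000:
            raise ValueError("maxOrgs must be between 1 and 5,000")
```

For example, `ActorInput(**{"eins": ["13-1684331"]})` yields `eins == ["131684331"]` with the default four-year inclusive window, while passing both `eins` and `searchQuery` raises `ValueError`.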
📤 Output
One JSON object per (EIN, tax year) filing, pushed to the default Apify dataset. All 21 fields are present on every row; nullable fields carry null rather than being omitted.
```json
{
  "ein": "131684331",
  "organization_name": "Metropolitan Museum of Art",
  "organization_url": "https://projects.propublica.org/nonprofits/organizations/131684331",
  "state": "NY",
  "city": "New York",
  "ntee_code": "A51",
  "subsection_code": 3,
  "tax_year": 2022,
  "tax_period_end": "2022-06",
  "form_type": "990",
  "pdf_url": "https://projects.propublica.org/nonprofits/organizations/131684331/202341349349303881/full",
  "officer_comp_usd": 4872543,
  "other_salaries_wages_usd": 198345612,
  "payroll_taxes_usd": 15234871,
  "pension_contributions_usd": 18924301,
  "other_employee_benefits_usd": 22341887,
  "total_comp_and_benefits_usd": 259719214,
  "total_revenue_usd": 412345678,
  "total_functional_expenses_usd": 389234512,
  "total_assets_end_usd": 5234123456,
  "scraped_at": "2024-03-15T14:22:31Z"
}
```
Download via the Apify Console dataset view, or pull via the Apify API in JSON, CSV, Excel, or XML format.
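To pull results programmatically, Apify's dataset items endpoint (`GET /v2/datasets/{datasetId}/items`) accepts a `format` query parameter. The sketch below only builds the request with the standard library; the dataset ID and API token are placeholders you supply, and the helper name `dataset_items_request` is hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

APIFY_API = "https://api.apify.com/v2"
# Export formats listed above; Excel is requested as "xlsx".
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_request(dataset_id: str, fmt: str, token: str) -> Request:
    """Build (but do not send) a GET request for a dataset's items in the given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    url = f"{APIFY_API}/datasets/{quote(dataset_id)}/items?{urlencode({'format': fmt})}"
    # The token travels in a Bearer header rather than in the URL.
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

Passing the returned request to `urllib.request.urlopen` downloads the export; keeping the token in a header avoids leaking it into logs that record URLs.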
💰 Pricing
This Actor uses Pay-Per-Event pricing. Apart from a flat $0.05 start fee per run, you are charged only for rows that successfully land in the dataset.
| Event | Price |
|---|---|
| `actor-start` | $0.05 (charged once per run) |
| `result-row` | $0.003 per filing-year row written |
A typical 100-EIN run averaging 3 filings each = 300 rows = $0.95 total. A 1,000-EIN run at 3 filings average = 3,000 rows = $9.05 total.
No data returned means no result-row charges — only the $0.05 start fee.
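The estimates above are the start fee plus the per-row price times the row count. A small sketch using `Decimal` to avoid float rounding (the function name is illustrative):

```python
from decimal import Decimal

START_FEE_USD = Decimal("0.05")   # actor-start, charged once per run
ROW_PRICE_USD = Decimal("0.003")  # result-row, per filing-year row written


def estimated_run_cost(rows: int) -> Decimal:
    """Estimated charge in USD for one run that writes `rows` result rows."""
    if rows < 0:
        raise ValueError("rows must be non-negative")
    return START_FEE_USD + ROW_PRICE_USD * rows
```

`estimated_run_cost(300)` returns `Decimal('0.950')`, matching the $0.95 example, and `estimated_run_cost(0)` is just the $0.05 start fee.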
What we handle for you
Structuring IRS public-domain data involves more than a simple HTTP call. Here is what runs on our side so it doesn't run on yours:
- Fingerprint impersonation: `curl-cffi` replays a real browser's TLS ClientHello and HTTP/2 SETTINGS frame on every session, so the upstream sees Chrome 131, not Python.
- Proxy rotation: when `useProxy` is on, any block or rate-limit signal triggers a fresh Apify Proxy session with a new exit IP, so the run keeps moving.
- Retry with exponential backoff: `408`, `429`, and `5xx` responses trigger automatic retries (up to 5 attempts, starting at 2 s and capped at 30 s), with `Retry-After` headers honoured.
- Rate-limit pacing: we back off gracefully when the upstream slows us down. Partial successes surface with a clear status message; we never silently return an empty dataset.
- Clean, typed rows: Pydantic v2 validates every row before it is written. EINs are zero-padded, form types are enum-checked, and nulls are explicit. No surprise schema drift between runs.
- Pay-per-result: you are charged only for rows that land. No data, no charge (beyond the start fee).
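The retry policy described above (retry on 408, 429, and 5xx; up to 5 attempts; delays starting at 2 s and capped at 30 s; `Retry-After` honoured) can be sketched as follows. This is an illustration, not the Actor's implementation: `send` is a hypothetical callable returning `(status, retry_after_header, body)` so the logic can be exercised without a network, only the numeric form of `Retry-After` is parsed, and the doubling schedule (2, 4, 8, 16 s) is an assumption about how the delays grow.

```python
import time
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 5
BASE_DELAY_S = 2.0
MAX_DELAY_S = 30.0


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are retried; other errors fail immediately."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based).

    A numeric Retry-After value is honoured as sent; otherwise the delay
    doubles from 2 s (2, 4, 8, 16, ...) and is capped at 30 s.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After is not parsed in this sketch
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)


def fetch_with_retries(
    send: Callable[[], Tuple[int, Optional[str], bytes]],
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` up to MAX_ATTEMPTS times and return the first non-error body."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        status, retry_after, body = send()
        if status < 400:
            return body
        if not is_retryable(status):
            raise RuntimeError(f"HTTP {status} is not retryable")
        if attempt < MAX_ATTEMPTS:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"giving up after {MAX_ATTEMPTS} attempts (last HTTP {status})")
```

Injecting `sleep` keeps the schedule testable: a fake that records delays shows the waits without actually pausing.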
🚧 Limitations
- Per-officer breakdown (name / title / hours / individual comp from Part VII Section A): that detail lives only in the 990 PDF and is not exposed by ProPublica's structured API. v1 emits the aggregate officer-compensation total (Form 990 Part IX line 5; Form 990-PF Part I line 13). Per-officer PDF parsing is on the v2 roadmap.
- Compensation from related organisations (Schedule J Part II) — same reason; PDF-only.
- Filings older than approximately 2010 — ProPublica's structured coverage does not extend earlier.
- Annual filing cadence — 990s file annually, often with an 18-month lag. This Actor provides historical and benchmark data, not a real-time feed.
- 990-N (e-Postcard) — the very smallest nonprofits file 990-N, which has no financial detail. These organisations are not returned.
❓ FAQ
What is an IRS 990 scraper?
An IRS 990 scraper extracts structured data from Form 990 filings, the annual information returns that US tax-exempt organisations must file with the IRS. ProPublica's Nonprofit Explorer API is the most reliable public source for this data; this Actor wraps it with formtype-correct field mapping and computed compensation totals.
How does this differ from ProPublica's bulk download?
ProPublica offers a bulk CSV of their structured API output. This Actor adds three things: (1) formtype-correct officer-comp field selection across 990, 990EZ, and 990PF, so private-foundation rows are not left null by reading the wrong field; (2) a computed `total_comp_and_benefits_usd` field; and (3) on-demand EIN or name-search queries without downloading the entire bulk file.
What is a nonprofit data API?
A nonprofit data API is a programmatic interface for accessing information about tax-exempt organisations, typically financial data, officer compensation, mission statements, and EIN identifiers sourced from IRS Form 990 filings. ProPublica's Nonprofit Explorer API is the canonical free source; this Actor structures its output into typed rows.
How do I do a form 990 data download for a specific sector?
Use searchQuery mode with a sector keyword (e.g. "hospital", "education", "housing") and optionally set stateFilter. Set maxOrgs to control the batch size. The resulting dataset exports as CSV, JSON, Excel, or XML from the Apify Console.
Does this support 990PF (private foundations)?
Yes. 990PF officer comp lives in a different API field (`compofficers`) than the standard 990/990EZ field (`compnsatncurrofcr`). We pick the correct field per filing, so a private foundation's `officer_comp_usd` is populated whenever its filing reports officer compensation.
Can I use this for nonprofit officer compensation data research?
Yes. The Actor is designed for exactly this use case — benchmarking executive pay across peer organisations, building sector-wide compensation panels, and conducting foundation due-diligence research. Set startYear and endYear to build multi-year panels.
What is the 990 financial data API pricing?
$0.003 per row plus a $0.05 start fee. A 100-org, 3-year window run produces roughly 300 rows and costs approximately $0.95. There is no subscription or minimum spend.
Is the underlying data public domain?
IRS Form 990 data is public domain. ProPublica's Nonprofit Explorer API is subject to their Terms of Use, which permits commercial use.
⭐ Your feedback
If this Actor saves you time on nonprofit research, a quick review on the Apify Store helps us keep improving it. Found a bug or need a field we don't currently emit (per-officer breakdowns, Schedule J data)? Open an issue on the Store listing and we will triage it.
- Author: DevilScrapes
- Issues and feature requests: open an issue on the Apify Store listing.
- Source licensed under Apache 2.0.
Data source
ProPublica's Nonprofit Explorer API v2, which mirrors IRS Form 990 returns. IRS data is public domain; ProPublica's API is subject to their Terms of Use.
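For reference, the two v2 endpoints this Actor builds on can be addressed as below. This is a minimal sketch with hypothetical helper names: `organizations/{ein}.json` returns an organisation plus its filings, and `search.json` takes `q`, a zero-indexed `page`, and the `state[id]` filter mentioned in the Features list. Check ProPublica's API documentation for the full parameter list before relying on it.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "https://projects.propublica.org/nonprofits/api/v2"


def organization_url(ein: str) -> str:
    """Organisation endpoint: profile plus per-year filings for one 9-digit EIN."""
    return f"{BASE}/organizations/{ein}.json"


def search_url(query: str, state: Optional[str] = None, page: int = 0) -> str:
    """Name search, optionally scoped to one USPS state code; pages are zero-indexed."""
    params = {"q": query, "page": page}
    if state:
        params["state[id]"] = state  # urlencode escapes the brackets as %5B/%5D
    return f"{BASE}/search.json?{urlencode(params)}"
```

Search mode would page through `search_url(...)` results until `maxOrgs` organisations are collected, then fetch `organization_url(ein)` for each one; EIN mode skips straight to the second step.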