IRS 990 Scraper — Nonprofit Officer Comp

Pricing: Pay per event

Scrape IRS Form 990 nonprofit officer compensation and key financials as structured rows via the ProPublica Nonprofit Explorer API — bulk EIN lookup or name/state search, one row per filing year, 990 / 990EZ / 990PF field mapping — export to JSON or CSV. No key, no login.

Rating: 0.0 (0 ratings)

Developer: DevilScrapes (maintained by Community)

Actor stats: 0 bookmarks, 2 total users, 1 monthly active user; last modified 12 days ago.

IRS 990 Scraper — Nonprofit Officer Compensation

We do the dirty work so your dataset stays clean. 😈

$3.00 per 1,000 rows plus a $0.05 start fee per run ($3.05 for a 1,000-row run). Pay only for rows that land. No credit card required to try.

Extract IRS Form 990 nonprofit officer compensation and key financial line items as clean, structured rows — one row per (EIN, tax year) filing. Bulk EIN lookup or name-and-state search, normalised across all three return types (990, 990EZ, 990PF). Powered by ProPublica's Nonprofit Explorer API; public-domain IRS data, no API key required.

The IRS publishes Form 990 returns from roughly 1.6 million tax-exempt organisations. Each return is a deeply nested document, and ProPublica's API flattens it into 60+ fields per filing wrapped in a verbose org object. The six existing Apify Actors for this API are thin wrappers that dump the raw JSON. This Actor structures the data: one row per filing year with the formtype-correct officer-comp field, a computed total-compensation-and-benefits figure, organisational context (state, city, NTEE, subsection), and surrounding financials (revenue, expenses, assets) for compensation-magnitude analysis.
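To make "one row per filing year" concrete, here is a minimal sketch of the reshaping step, not the Actor's source. It assumes a simplified ProPublica v2 organisation payload with an `organization` object and a `filings_with_data` list, and that `tax_prd` arrives as a YYYYMM integer. The helper name `flatten_org_response` is hypothetical, and the financial fields are left out for brevity.

```python
from typing import Any, Dict, List

BASE_URL = "https://projects.propublica.org/nonprofits/organizations"

def flatten_org_response(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Copy organisation-level context onto one row per filing (hypothetical helper)."""
    org = payload.get("organization", {})
    ein = str(org.get("ein", "")).zfill(9)  # integer EINs lose their leading zeros
    context = {
        "ein": ein,
        "organization_name": org.get("name"),
        "organization_url": f"{BASE_URL}/{ein}",
        # Assumed key names on the organization object.
        **{key: org.get(key) for key in ("state", "city", "ntee_code", "subsection_code")},
    }
    rows = []
    for filing in payload.get("filings_with_data", []):
        prd = str(filing.get("tax_prd") or "")  # assumed YYYYMM, e.g. 202212
        rows.append({
            **context,
            "tax_year": filing.get("tax_prd_yr"),
            "tax_period_end": f"{prd[:4]}-{prd[4:6]}" if len(prd) == 6 else None,
            "pdf_url": filing.get("pdf_url"),
        })
    return rows
```

Each filing row repeats the organisation context, which is what lets every row stand alone in a CSV export.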

🎯 What this scrapes

One ResultRow per (EIN, tax year) filing. Every row carries the same 21 columns regardless of which input mode you used. Data is published by ProPublica under their Terms of Use (IRS source data is public domain).

| Field | Type | Description |
|---|---|---|
| ein | string | 9-digit IRS Employer Identification Number (zero-padded) |
| organization_name | string | Legal name from ProPublica's organization.name |
| organization_url | string | Nonprofit Explorer organisation page URL |
| state | string (nullable) | USPS 2-letter state of the organisation's principal address |
| city | string (nullable) | City of the organisation's principal address |
| ntee_code | string (nullable) | NTEE (National Taxonomy of Exempt Entities) classification code |
| subsection_code | int (nullable) | IRC subsection (3 = 501(c)(3), 4 = 501(c)(4), etc.) |
| tax_year | int | Filing tax year (tax_prd_yr) |
| tax_period_end | string (nullable) | Fiscal-year end (YYYY-MM) |
| form_type | string | 990, 990EZ, or 990PF |
| pdf_url | string (nullable) | Direct link to the IRS-filed 990 PDF on ProPublica |
| officer_comp_usd | int (nullable) | Aggregate officer/director/trustee compensation |
| other_salaries_wages_usd | int (nullable) | All non-officer salaries and wages (null on 990PF) |
| payroll_taxes_usd | int (nullable) | Payroll taxes |
| pension_contributions_usd | int (nullable) | Pension-plan contributions |
| other_employee_benefits_usd | int (nullable) | Other employee benefits |
| total_comp_and_benefits_usd | int (nullable) | Computed sum of the 5 comp/benefit fields above |
| total_revenue_usd | int (nullable) | Total revenue (context for compensation magnitude) |
| total_functional_expenses_usd | int (nullable) | Total functional expenses |
| total_assets_end_usd | int (nullable) | Total assets at year-end |
| scraped_at | string | ISO 8601 UTC datetime this row was written |
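The null rule for total_comp_and_benefits_usd (null only when all five components are null, never a false zero) can be sketched as below. This is an illustration assuming rows keyed by the column names in the table; `total_comp_and_benefits` is a hypothetical helper name.

```python
from typing import Dict, Optional

# The five components listed in the output schema.
COMP_FIELDS = (
    "officer_comp_usd",
    "other_salaries_wages_usd",
    "payroll_taxes_usd",
    "pension_contributions_usd",
    "other_employee_benefits_usd",
)

def total_comp_and_benefits(row: Dict[str, Optional[int]]) -> Optional[int]:
    """Sum the non-null components; return None only if all five are null."""
    present = [row[f] for f in COMP_FIELDS if row.get(f) is not None]
    return sum(present) if present else None
```

Applied to the component values in the sample output row in the Output section, this returns 259719214, matching that row's total_comp_and_benefits_usd. A row whose components are all reported as 0 still returns 0, not None.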

🔥 Features

  • Two input modes: eins (explicit list) and searchQuery (name search). A Pydantic XOR validator enforces exactly one before any network call. The EIN normaliser accepts 13-1684331, 131684331, or integer forms.
  • Formtype-correct field mapping: officer comp lives in compnsatncurrofcr on 990/990EZ and in compofficers on 990PF. We pick the right field for each form, so private-foundation rows are not left null just because the field name differs.
  • Year-window filtering: startYear and endYear default to the current year minus 3 through the current year (both inclusive). Out-of-window filings are dropped silently.
  • Computed total compensation and benefits: the sum of officer comp, non-officer salaries, payroll taxes, pension contributions, and other benefits, computed per filing. Null only when all 5 components are null; never falsely zero.
  • State filter (search mode): pass stateFilter=CA to scope a name search to a single state via search.json?state[id]=.
  • Multi-form support: 990, 990EZ, and 990PF all produce clean rows. Unknown formtypes are logged and dropped (forward-compatible with new IRS forms).
  • Retries with exponential backoff: Retry-After headers are honoured for 429 and 503 responses, with up to 5 attempts per request before we surface a clear error.
  • Browser TLS impersonation: curl-cffi with Chrome 131 TLS impersonation, so the upstream sees real-browser TLS handshakes, not a Python script.
  • Proxy rotation on blocks: with useProxy on, Apify Proxy (BUYPROXIES94952 group) supplies a fresh exit IP whenever a request is blocked.
  • Pydantic v2 input and output models: form_type is validated against the enum ("990" / "990EZ" / "990PF"), EINs are zero-padded to 9 digits, and every row is typed.
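The EIN normaliser and the eins/searchQuery XOR rule from the features above can be approximated in plain Python. This is a stand-in for the Pydantic validators the listing mentions, not their actual code; the function names `normalise_ein` and `input_mode` are hypothetical.

```python
import re
from typing import Optional, Sequence, Union

def normalise_ein(raw: Union[str, int]) -> str:
    """Accept '13-1684331', '131684331' or 131684331; return 9 zero-padded digits."""
    if isinstance(raw, int) and raw < 0:
        raise ValueError(f"not a valid EIN: {raw!r}")
    digits = str(raw).strip().replace("-", "")  # hyphens are optional
    if not re.fullmatch(r"[0-9]{1,9}", digits):
        raise ValueError(f"not a valid EIN: {raw!r}")
    return digits.zfill(9)  # restore leading zeros lost in integer form

def input_mode(eins: Optional[Sequence[Union[str, int]]], search_query: Optional[str]) -> str:
    """XOR check: exactly one of eins / searchQuery must be supplied."""
    has_eins = bool(eins)
    has_query = bool(search_query and search_query.strip())
    if has_eins == has_query:  # both set, or neither set
        raise ValueError("provide exactly one of 'eins' or 'searchQuery'")
    return "eins" if has_eins else "search"
```

Running both checks before any request means a malformed input fails fast without spending a start fee on network calls.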

💡 Use cases

  • Nonprofit executive-pay benchmarking — bulk-pull officer compensation for a list of 100–1,000 peer organisations to benchmark CEO/CFO pay against revenue and asset size.
  • Investigative journalism — flag nonprofits where officer comp is an outsized fraction of revenue or expenses; spot rapid year-over-year jumps across sector-wide datasets.
  • Foundation due diligence — research a foundation's officer payroll and asset trajectory before applying for a grant or entering a partnership.
  • Charity-rating data pipelines — feed a structured 990 dataset into your own scoring model without writing a ProPublica wrapper from scratch.
  • Academic research — build multi-year panel datasets on nonprofit-sector compensation and finances for economics or public-policy papers.
  • State-level nonprofit-sector reports: combine stateFilter with a broad searchQuery (e.g. "foundation") to enumerate matching organisations in one state in a single run, up to the maxOrgs cap of 5,000.
  • Prospect research for nonprofit-tech vendors — enrich your CRM with 990-derived financials for sales outreach to development directors at larger organisations.
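As an example of the journalism use case, exported rows can be screened for outsized officer pay and sharp year-over-year jumps. This sketch assumes rows shaped like the output schema. The 10% share and 50% jump thresholds, and the name `flag_filings`, are illustrative choices, not values the Actor uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def flag_filings(rows: List[Dict], share_threshold: float = 0.10,
                 jump_threshold: float = 0.50) -> List[Tuple[str, int, str]]:
    """Return (ein, tax_year, reason) for filings worth a closer look.

    "share": officer_comp_usd / total_revenue_usd exceeds share_threshold.
    "jump":  officer_comp_usd rose by more than jump_threshold versus the same
             EIN's previous filing in the data (not necessarily the previous
             calendar year if a year is missing).
    """
    by_ein: Dict[str, List[Dict]] = defaultdict(list)
    for row in rows:
        by_ein[row["ein"]].append(row)

    flags = []
    for ein, filings in by_ein.items():
        prev_comp = None
        for row in sorted(filings, key=lambda r: r["tax_year"]):
            comp, revenue = row.get("officer_comp_usd"), row.get("total_revenue_usd")
            if comp is not None and revenue and revenue > 0 and comp / revenue > share_threshold:
                flags.append((ein, row["tax_year"], "share"))
            if comp is not None and prev_comp is not None and prev_comp > 0 \
                    and (comp - prev_comp) / prev_comp > jump_threshold:
                flags.append((ein, row["tax_year"], "jump"))
            prev_comp = comp
    return flags
```

Null revenue or null officer comp simply skips the corresponding check rather than producing a spurious flag.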

⚙️ How to use it

  1. Open the Actor input form on the Apify Console.
  2. Pick exactly one input mode:
    • EINs — supply a list of eins (hyphens optional, e.g. ["13-1684331"]) for direct lookup.
    • Search query — set searchQuery to a name fragment (e.g. "community foundation"); optionally narrow with stateFilter=NY.
  3. Set startYear and endYear to scope the filing window. Defaults: current year minus 3 through current year.
  4. Set maxOrgs (1–5,000) to cap the organisation list in search mode. Ignored in eins mode.
  5. Toggle useProxy on if you want requests routed through Apify Proxy. The Actor handles retries and backoff regardless.
  6. Click Start. Rows stream into the default dataset and are available as JSON, CSV, Excel, or XML.
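The two input modes map onto two ProPublica Nonprofit Explorer API v2 endpoints: one organisation lookup per EIN, or a paged name search optionally scoped with state[id]. The sketch below only builds the request URLs. Treating `page` as a zero-based index is an assumption to check against ProPublica's API documentation, and the helper names are hypothetical.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

API_BASE = "https://projects.propublica.org/nonprofits/api/v2"

def organization_url(ein: str) -> str:
    """eins mode: one request per (already normalised) EIN."""
    return f"{API_BASE}/organizations/{ein}.json"

def search_url(query: str, state: Optional[str] = None, page: int = 0) -> str:
    """searchQuery mode: name search, optionally scoped with state[id]."""
    params: Dict[str, Union[str, int]] = {"q": query}
    if state:
        params["state[id]"] = state.upper()  # USPS 2-letter code
    params["page"] = page  # assumed zero-based page index
    return f"{API_BASE}/search.json?{urlencode(params)}"
```

urlencode percent-encodes the brackets in state[id], so the query string is safe to send as-is.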

Single EIN, default window (current year minus 3 through current year)

```json
{
  "eins": ["131684331"]
}
```

Multiple EINs, narrow window

```json
{
  "eins": ["131684331", "133871360", "530196605"],
  "startYear": 2021,
  "endYear": 2023
}
```

Name search in California, capped at 50 orgs

```json
{
  "searchQuery": "community foundation",
  "stateFilter": "CA",
  "maxOrgs": 50,
  "startYear": 2022,
  "endYear": 2023
}
```

📥 Input

| Field | Type | Default | Notes |
|---|---|---|---|
| eins | string[] | (none) | 9-digit EINs; hyphens stripped automatically. XOR with searchQuery. |
| searchQuery | string | (none) | 1–200 chars. XOR with eins. |
| stateFilter | string | null | USPS 2-letter code (CA, NY, …). Search mode only. |
| startYear | int | current year - 3 | Inclusive lower bound. |
| endYear | int | current year | Inclusive upper bound; must be >= startYear. |
| maxOrgs | int | 100 | 1–5,000. Search mode only. |
| useProxy | bool | false | Routes requests through Apify Proxy (BUYPROXIES94952). |
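The year defaults and the endYear >= startYear rule in the table can be sketched as follows. Passing `today` explicitly is a testing convenience assumed here; the function names are hypothetical.

```python
import datetime
from typing import Optional, Tuple

def resolve_window(start_year: Optional[int] = None, end_year: Optional[int] = None,
                   today: Optional[datetime.date] = None) -> Tuple[int, int]:
    """Apply the documented defaults and the endYear >= startYear rule."""
    current = (today or datetime.date.today()).year
    start = current - 3 if start_year is None else start_year
    end = current if end_year is None else end_year
    if end < start:
        raise ValueError("endYear must be >= startYear")
    return start, end

def in_window(tax_year: int, window: Tuple[int, int]) -> bool:
    """Both bounds are inclusive; filings outside the window are dropped."""
    start, end = window
    return start <= tax_year <= end
```

Because both bounds are inclusive, the default window spans four tax years. A run started in 2024, for example, resolves to (2021, 2024).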

📤 Output

One JSON object per (EIN, tax year) filing, pushed to the default Apify dataset. All 21 fields are present on every row; nullable fields carry null rather than being omitted.

```json
{
  "ein": "131684331",
  "organization_name": "Metropolitan Museum of Art",
  "organization_url": "https://projects.propublica.org/nonprofits/organizations/131684331",
  "state": "NY",
  "city": "New York",
  "ntee_code": "A51",
  "subsection_code": 3,
  "tax_year": 2022,
  "tax_period_end": "2022-06",
  "form_type": "990",
  "pdf_url": "https://projects.propublica.org/nonprofits/organizations/131684331/202341349349303881/full",
  "officer_comp_usd": 4872543,
  "other_salaries_wages_usd": 198345612,
  "payroll_taxes_usd": 15234871,
  "pension_contributions_usd": 18924301,
  "other_employee_benefits_usd": 22341887,
  "total_comp_and_benefits_usd": 259719214,
  "total_revenue_usd": 412345678,
  "total_functional_expenses_usd": 389234512,
  "total_assets_end_usd": 5234123456,
  "scraped_at": "2024-03-15T14:22:31Z"
}
```

Download via the Apify Console dataset view, or pull via the Apify API in JSON, CSV, Excel, or XML format.
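For API downloads, the Apify API v2 "get dataset items" endpoint takes a `format` query parameter. The sketch below builds that URL for the four formats named above (Excel is `xlsx`). The dataset ID is a placeholder you copy from your run, and sending the token as a Bearer header is one option; check the Apify API reference for the other parameters.

```python
from urllib.parse import urlencode

# Apify dataset export formats matching the list above: JSON, CSV, Excel (xlsx), XML.
EXPORT_FORMATS = ("json", "csv", "xlsx", "xml")

def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the Apify API v2 dataset-items URL for one export format.

    Authenticate separately, e.g. with an 'Authorization: Bearer <token>'
    header, rather than hard-coding the token into the URL.
    """
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format {fmt!r}; expected one of {EXPORT_FORMATS}")
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"
```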

💰 Pricing

This Actor uses Pay-Per-Event pricing: a one-off start fee per run, plus a charge for each row that successfully lands in the dataset.

| Event | Price |
|---|---|
| actor-start | $0.05 (charged once per run) |
| result-row | $0.003 per filing-year row written |

A typical 100-EIN run averaging 3 filings each = 300 rows = $0.95 total. A 1,000-EIN run at 3 filings average = 3,000 rows = $9.05 total.
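The arithmetic behind those examples is the start fee plus the per-row fee times the row count. Here is a small estimator using Decimal so cents stay exact. The name `estimate_cost` is hypothetical, and the row count is whatever you expect the run to produce (organisations times filings per organisation).

```python
from decimal import Decimal

START_FEE = Decimal("0.05")  # actor-start, charged once per run
ROW_FEE = Decimal("0.003")   # result-row, per filing-year row written

def estimate_cost(rows: int) -> Decimal:
    """Expected charge in USD for a run that writes `rows` rows."""
    return START_FEE + ROW_FEE * rows
```

With 300 rows this gives 0.95, and with 3,000 rows 9.05, matching the figures above. A run that writes nothing costs only the 0.05 start fee.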

No data returned means no result-row charges — only the $0.05 start fee.

What we handle for you

Structuring IRS public-domain data involves more than a simple HTTP call. Here is what runs on our side so it doesn't run on yours:

  • Browser TLS impersonation: curl-cffi replays a real browser's TLS ClientHello and HTTP/2 SETTINGS frame on every session. The upstream sees Chrome 131, not Python.
  • Proxy rotation: with useProxy on, any block or rate-limit signal triggers a fresh Apify Proxy session with a new exit IP so the run keeps moving.
  • Retry with exponential backoff: 408, 429, and 5xx responses trigger automatic retries (up to 5 attempts, starting at 2 s and capped at 30 s), with Retry-After headers honoured.
  • Rate-limit pacing: we back off gracefully when the upstream slows us down. Partial successes surface with a clear status message; we never silently return an empty dataset.
  • Clean, typed rows: Pydantic v2 validates every row before it is written. EINs are zero-padded, form types are enum-checked, and nulls are explicit, so the schema does not drift between runs.
  • Pay-per-result: you are charged only for rows that land. No data, no charge beyond the $0.05 start fee.
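The retry policy described above (retry 408, 429 and 5xx; up to 5 attempts; delays starting at 2 s and capped at 30 s; Retry-After honoured) can be sketched with the standard library alone. This is an illustration, not the Actor's code: the HTTP call is injected as a `fetch` callable returning (status, headers, body) so the logic is testable, it only understands Retry-After given in seconds, and it leaves out curl-cffi and proxy rotation.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], bytes]  # (status, headers, body)

# Status codes the listing says are retried: 408, 429 and any 5xx.
RETRYABLE = {408, 429, *range(500, 600)}

def backoff_delay(attempt: int, retry_after: Optional[str],
                  base: float = 2.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))  # honour Retry-After given in seconds
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    return min(base * 2 ** attempt, cap)  # 2 s, 4 s, 8 s, 16 s, then capped at 30 s

def fetch_with_retry(fetch: Callable[[], Response], max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call `fetch` until it succeeds, retrying retryable statuses with backoff."""
    status: Optional[int] = None
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status < 400:
            return body
        if status not in RETRYABLE:
            raise RuntimeError(f"non-retryable HTTP {status}")
        if attempt < max_attempts - 1:  # no pointless sleep after the final attempt
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last HTTP {status})")
```

Injecting `sleep` keeps tests instant; in production the default `time.sleep` applies.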

🚧 Limitations

  • Per-officer breakdown (name / title / hours / individual comp from Part VII Section A): that detail lives only in the 990 PDF and is not exposed by ProPublica's structured API. v1 emits the aggregate officer-compensation total (Form 990 Part IX line 5; Form 990-PF Part I line 13). Per-officer PDF parsing is on the v2 roadmap.
  • Compensation from related organisations (Schedule J Part II) — same reason; PDF-only.
  • Filings older than approximately 2010 — ProPublica's structured coverage does not extend earlier.
  • Annual filing cadence: Form 990s are filed annually, often with an 18-month lag. This Actor provides historical and benchmark data, not a real-time feed.
  • 990-N (e-Postcard) — the very smallest nonprofits file 990-N, which has no financial detail. These organisations are not returned.

❓ FAQ

What is an IRS 990 scraper? An IRS 990 scraper extracts structured data from Form 990 filings — the annual information returns that US tax-exempt organisations must file with the IRS. ProPublica's Nonprofit Explorer API is the most reliable public source for this data; this Actor wraps it with formtype-correct field mapping and computed compensation totals.

How does this differ from ProPublica's bulk download? ProPublica offers a bulk CSV of their structured API output. This Actor adds three things: (1) formtype-correct officer-comp field selection across 990, 990EZ, and 990PF, so private-foundation rows are not left null because of a field-name mismatch; (2) a computed total_comp_and_benefits_usd field; and (3) on-demand EIN or name-search queries without downloading the entire bulk file.

What is a nonprofit data API? A nonprofit data API is a programmatic interface for accessing information about tax-exempt organisations — typically financial data, officer compensation, mission statements, and EIN identifiers sourced from IRS Form 990 filings. ProPublica's Nonprofit Explorer API is the canonical free source; this Actor structures its output into typed rows.

How do I download Form 990 data for a specific sector? Use searchQuery mode with a sector keyword (e.g. "hospital", "education", "housing") and optionally set stateFilter. Set maxOrgs to control the batch size. The resulting dataset exports as CSV, JSON, Excel, or XML from the Apify Console.

Does this support 990PF (private foundations)? Yes. 990PF officer comp lives in a different API field (compofficers) than the standard 990/990EZ field (compnsatncurrofcr). We pick the correct field per filing, so a private foundation's officer_comp_usd is populated whenever its 990PF reports officer compensation.
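A minimal sketch of that per-form field choice follows. The field names come from this listing. The numeric `formtype` codes (0 = 990, 1 = 990EZ, 2 = 990PF) are an assumption about ProPublica's API, and `officer_comp` is a hypothetical helper name.

```python
from typing import Any, Dict, Optional, Tuple

# Officer-comp field per form type, as given in this listing.
OFFICER_COMP_FIELD = {
    "990": "compnsatncurrofcr",
    "990EZ": "compnsatncurrofcr",
    "990PF": "compofficers",
}

# Assumed numeric codes in ProPublica's `formtype` field.
FORMTYPE_CODES = {0: "990", 1: "990EZ", 2: "990PF"}

def officer_comp(filing: Dict[str, Any]) -> Tuple[Optional[str], Optional[int]]:
    """Return (form_type, officer_comp_usd) using the field that matches the form."""
    form_type = FORMTYPE_CODES.get(filing.get("formtype"))
    if form_type is None:
        return None, None  # unknown form type: the caller logs and drops the filing
    return form_type, filing.get(OFFICER_COMP_FIELD[form_type])
```

Reading compnsatncurrofcr unconditionally would return None for every 990PF filing; the lookup table avoids that.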

Can I use this for nonprofit officer compensation data research? Yes. The Actor is designed for exactly this use case — benchmarking executive pay across peer organisations, building sector-wide compensation panels, and conducting foundation due-diligence research. Set startYear and endYear to build multi-year panels.

What is the 990 financial data API pricing? $0.003 per row plus a $0.05 start fee. A 100-org, 3-year window run produces roughly 300 rows and costs approximately $0.95. There is no subscription or minimum spend.

Is the underlying data public domain? IRS Form 990 data is public domain. ProPublica's Nonprofit Explorer API is subject to their Terms of Use, which permits commercial use.

⭐ Your feedback

If this Actor saves you time on nonprofit research, a quick review on the Apify Store helps us keep improving it. Found a bug or need a field we don't currently emit (per-officer breakdowns, Schedule J data)? Open an issue on the Store listing and we will triage it.

  • Author: DevilScrapes
  • Issues and feature requests: open an issue on the Apify Store listing.
  • Source licensed under Apache 2.0.

Data source

ProPublica's Nonprofit Explorer API v2, which mirrors IRS Form 990 returns. IRS data is public domain; ProPublica's API is subject to their Terms of Use.