ProPublica Nonprofit Crawler - IRS 990 & Tax-Exempt Org Data

Crawl IRS 990 filings and tax-exempt org data from ProPublica. Search by name, EIN, state, NTEE code, or 501(c) subsection. Extract revenue, expenses, net assets, officer compensation, and PDF links. For grant writers, donors, journalists, and compliance teams.

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

Last modified

3 days ago

ProPublica Nonprofit Crawler — IRS 990 Filings & Tax-Exempt Org Data

Crawl IRS Form 990 filings and tax-exempt organization data from the ProPublica Nonprofit Explorer API. Returns organization identity, 501(c) classification, NTEE codes, multi-year financial history, officer compensation, and PDF links for ~1.8M US nonprofits — the same data source investigative journalists use to follow the money.


ProPublica Nonprofit Crawler Features

  • Searches ProPublica's free Nonprofit Explorer API by name, state, NTEE category, or 501(c) subsection.
  • Extracts one row per (organization, tax year) when full financial data is requested — so year-over-year comparisons work without reshaping.
  • Returns 45+ fields per row including total revenue, functional expenses, net assets, total contributions, program service revenue, officer compensation, and wage breakdowns.
  • Fetches specific organizations by EIN list, with or without dashes.
  • Filters by 501(c) subsection (3, 4, 5, 6, 7, 8, 9, 10, 19) and NTEE top-level category (1-10).
  • Emits the direct ProPublica PDF URL for every 990 filing, structured or scanned.
  • No API key, no proxy, no browser. Pure JSON API with a polite 300ms pace.

Who Uses IRS 990 Data?

  • Grant writers: Screen foundations by NTEE category and asset size before writing another application nobody reads.
  • Donor due diligence: Pull a nonprofit's last five years of financials to check whether officer compensation is reasonable relative to total expenses.
  • Investigative journalists: Build leads by filtering 501(c)(4) social-welfare orgs in a specific state, or track foundation-to-foundation grants across years.
  • Compliance & KYC teams: Screen nonprofit counterparties against the IRS Business Master File and flag organizations with unusual asset or contribution patterns.
  • Academic researchers: Export a bounded slice of the nonprofit sector (hospitals, foundations, advocacy orgs) for econometric work without wrestling with IRS bulk extracts.
  • Market researchers: Size up a vertical by counting 501(c)(3) orgs with revenue above a threshold in a given geography.

How the ProPublica Nonprofit Crawler Works

  1. Pick a mode: search to paginate filtered results, or organizations to fetch specific EINs you already have.
  2. Set filters — search term, state, NTEE category, 501(c) subsection. Or leave them empty and browse the whole universe one page at a time.
  3. Decide whether you want one row per org (fast) or one row per filing year (richer). The includeFilings toggle controls this.
  4. The crawler paginates ProPublica's API at ~3 requests per second, hydrates each match with the organization detail endpoint when needed, and writes a flat JSON record per row. Dataset is ready to export as CSV, Excel, or JSON.
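
The loop in step 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's source code: the search endpoint URL, the query parameter names (q, state[id], ntee[id], c_code[id], page), and the response keys (organizations, num_pages) are taken from the public Nonprofit Explorer API v2 documentation and are not confirmed by this README. fetch_json is a hypothetical callable you supply (for example, a thin wrapper around an HTTP client), which also keeps the sketch testable offline.

```python
import time
from typing import Callable, Iterator
from urllib.parse import urlencode

# Assumed endpoint for Nonprofit Explorer API v2 search (not stated in this README).
SEARCH_URL = "https://projects.propublica.org/nonprofits/api/v2/search.json"


def build_search_url(term: str = "", state: str = "", ntee: str = "",
                     subsection: str = "", page: int = 0) -> str:
    """Build one search-page URL, skipping filters that are empty."""
    # Parameter names are assumptions based on the public API docs.
    params = {"q": term, "state[id]": state, "ntee[id]": ntee,
              "c_code[id]": subsection}
    query = {key: value for key, value in params.items() if value}
    query["page"] = str(page)
    return f"{SEARCH_URL}?{urlencode(query)}"


def paginate(fetch_json: Callable[[str], dict], max_items: int,
             delay_s: float = 0.3,
             sleep: Callable[[float], None] = time.sleep,
             **filters: str) -> Iterator[dict]:
    """Yield organizations page by page until max_items or the last page."""
    emitted, page = 0, 0
    while emitted < max_items:
        data = fetch_json(build_search_url(page=page, **filters))
        orgs = data.get("organizations", [])
        for org in orgs:
            if emitted >= max_items:
                return
            yield org
            emitted += 1
        page += 1
        if not orgs or page >= data.get("num_pages", 0):
            return
        sleep(delay_s)  # pause ~300 ms between requests, about 3 per second
```

Passing the fetcher and sleep function as arguments is a design choice for this sketch only; it lets you swap in a stub during tests and a real HTTP call in production.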

Input

{
  "mode": "search",
  "searchTerm": "hospital",
  "state": "VT",
  "nteeCategory": "",
  "subsectionCode": "3",
  "einList": [],
  "includeFilings": true,
  "maxItems": 100
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | "search" | Either "search" (paginate filtered results) or "organizations" (fetch specific EINs). |
| searchTerm | string | "hospital" | Free-text query matched against org names. Leave empty to browse all orgs matching the other filters. |
| state | string | "" | Two-letter US state code (e.g., "CA", "NY"). Search mode only. |
| nteeCategory | string | "" | NTEE top-level category 1-10 (Arts, Education, Environment, Health, Human Services, International, Public Benefit, Religion, Mutual Benefit, Unclassified). Search mode only. |
| subsectionCode | string | "" | IRS 501(c) subsection. "3" = charitable, "4" = social welfare, "6" = business leagues, etc. Search mode only. |
| einList | array | [] | EINs to fetch directly. Accepts "13-1623888" or "131623888". Required when mode is "organizations". |
| includeFilings | boolean | true | When true, emits one row per (org, tax year) with full 990 financials. When false, one summary row per org. |
| maxItems | integer | 100 | Hard cap on records returned. Each filing year counts as one record when includeFilings is on. |
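
If you build input objects programmatically, a small pre-flight check can catch mistakes before a run is billed. The sketch below encodes only the constraints listed in the table above; check_input is a hypothetical helper, and the actor's own validation may be stricter or more lenient.

```python
VALID_MODES = {"search", "organizations"}
NTEE_CATEGORIES = {str(n) for n in range(1, 11)}  # "1" through "10"


def check_input(cfg: dict) -> list[str]:
    """Return a list of problems found in an input object (empty means OK)."""
    problems = []
    mode = cfg.get("mode", "search")
    if mode not in VALID_MODES:
        problems.append("mode must be 'search' or 'organizations'")
    if mode == "organizations" and not cfg.get("einList"):
        problems.append("einList is required when mode is 'organizations'")
    state = cfg.get("state", "")
    if state and not (len(state) == 2 and state.isalpha()):
        problems.append("state must be a two-letter code")
    ntee = cfg.get("nteeCategory", "")
    if ntee and ntee not in NTEE_CATEGORIES:
        problems.append("nteeCategory must be a value from 1 to 10")
    max_items = cfg.get("maxItems", 100)
    if not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")
    return problems
```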

Organizations mode example — fetch three specific EINs with their full filing history:

{
  "mode": "organizations",
  "einList": ["13-1623888", "53-0196605", "941340523"],
  "includeFilings": true,
  "maxItems": 50
}
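
Because einList accepts both dashed and undashed EINs, it can help to normalize a list before submitting it, for example to remove duplicates that differ only in formatting. The helpers below are a minimal sketch; normalize_ein and format_ein are hypothetical names, not part of the actor.

```python
import re


def normalize_ein(raw: str) -> str:
    """Return the 9-digit EIN as a string, keeping leading zeros."""
    cleaned = raw.strip().replace("-", "")
    if not re.fullmatch(r"[0-9]{9}", cleaned):
        raise ValueError(f"not a valid EIN: {raw!r}")
    return cleaned


def format_ein(ein: str) -> str:
    """Format a 9-digit EIN as XX-XXXXXXX."""
    return f"{ein[:2]}-{ein[2:]}"


def dedupe_eins(raw_eins: list[str]) -> list[str]:
    """Normalize a list of EINs and drop duplicates, preserving order."""
    return list(dict.fromkeys(normalize_ein(e) for e in raw_eins))
```

EINs are kept as strings throughout so that a leading zero is never lost to integer conversion.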

ProPublica Nonprofit Crawler Output Fields

Each record is one (organization, tax year) row when includeFilings: true, or one summary row per organization when false. Organization-level fields are repeated across every filing row so downstream joins are trivial.
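
The row shape described above can be illustrated with a short sketch. The input shapes here are assumptions made for the example (an org dict of organization-level fields and a list of per-filing dicts), and the behavior for an organization with no filings is a guess, not documented behavior.

```python
def flatten(org: dict, filings: list[dict], include_filings: bool = True) -> list[dict]:
    """Return one row per filing with org fields repeated, or one summary row."""
    if not include_filings or not filings:
        # Assumption: an org without filings still yields a single summary row.
        return [dict(org)]
    # Organization-level fields are copied into every filing row.
    return [{**org, **filing} for filing in filings]
```

For example, an organization with two filings produces two rows that share the same ein and name but carry different filing_year and total_revenue values.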

{
  "ein": "941340523",
  "strein": "94-1340523",
  "name": "Kaiser Foundation Health Plan Inc",
  "careofname": "% KP TAX",
  "address": "ONE KAISER PLAZA",
  "city": "Oakland",
  "state": "CA",
  "zipcode": "94612-3610",
  "ntee_code": "E310",
  "subsection_code": 3,
  "subsection_label": "501(c)(3) - Charitable / religious / educational",
  "classification_codes": "1200",
  "activity_codes": "164000000",
  "foundation_code": 16,
  "deductibility_code": 1,
  "exempt_organization_status_code": 1,
  "organization_code": 1,
  "ruling_date": "1981-12-01",
  "latest_tax_period": "2024-12-01",
  "latest_asset_amount": 33547368863,
  "latest_income_amount": 93006408021,
  "latest_revenue_amount": 82490440881,
  "filing_year": 2023,
  "filing_tax_period": 202312,
  "filing_type": "990",
  "filing_has_data": true,
  "filing_pdf_url": "https://projects.propublica.org/nonprofits/download-filing?path=...",
  "filing_updated": "2025-08-05T16:11:09.202Z",
  "total_revenue": 75101306911,
  "total_functional_expenses": 74356004001,
  "total_assets_end": 31400724759,
  "total_liabilities_end": 22078604890,
  "net_assets_end": 9322119869,
  "total_contributions": 11542682,
  "program_service_revenue": 75068903991,
  "investment_income": 298237561,
  "net_rental_income": 1970907,
  "net_gains_losses": -280418643,
  "compensation_current_officers": 90793859,
  "other_salaries_wages": 2680303412,
  "payroll_taxes": 233744277,
  "professional_fundraising_fees": 0,
  "unrelated_business_income": "Y",
  "data_source": "current_2026_03_10",
  "updated_at": "2026-03-10T23:37:21.272Z",
  "propublica_url": "https://projects.propublica.org/nonprofits/organizations/941340523",
  "scraped_at": "2026-04-19T10:16:31.950Z"
}

| Field | Type | Description |
| --- | --- | --- |
| ein | string | 9-digit Employer Identification Number with leading zeros preserved. |
| strein | string | EIN formatted with dash (XX-XXXXXXX). |
| name | string | Organization legal name. |
| sub_name | string | Alternative name or DBA (from search results). |
| careofname | string | Care-of name on the IRS record. |
| address, city, state, zipcode | string | Registered address. |
| ntee_code | string | NTEE classification (e.g., E200 for hospitals). |
| subsection_code | number | IRS 501(c) subsection integer. |
| subsection_label | string | Human-readable subsection label. |
| classification_codes | string | IRS classification codes. |
| activity_codes | string | IRS activity codes. |
| foundation_code | number | IRS foundation status code. |
| deductibility_code | number | IRS deductibility code. |
| exempt_organization_status_code | number | IRS exempt status code (1 = unconditional). |
| organization_code | number | IRS organization type (1 = corporation, 2 = trust, etc.). |
| ruling_date | string | Date the IRS granted exempt status (YYYY-MM-DD). |
| latest_tax_period | string | Most recent tax period on the master file. |
| latest_asset_amount | number | Most recent reported total assets (USD). |
| latest_income_amount | number | Most recent reported total income (USD). |
| latest_revenue_amount | number | Most recent reported total revenue (USD). |
| filing_year | number | Calendar year of this filing. |
| filing_tax_period | number | Tax period end in YYYYMM format. |
| filing_type | string | IRS form (990, 990-EZ, 990-PF). |
| filing_has_data | boolean | True when ProPublica parsed structured financial fields; false when only a PDF is available. |
| filing_pdf_url | string | Direct link to the 990 PDF. |
| filing_updated | string | Last-updated timestamp for the filing record (ISO 8601). |
| total_revenue | number | Total revenue on the filing (USD). |
| total_functional_expenses | number | Total functional expenses (USD). |
| total_assets_end | number | Total assets at year end (USD). |
| total_liabilities_end | number | Total liabilities at year end (USD). |
| net_assets_end | number | Net assets at year end (USD). |
| total_contributions | number | Total contributions, gifts, and grants received (USD). |
| program_service_revenue | number | Total program service revenue (USD). |
| investment_income | number | Investment income (USD). |
| net_rental_income | number | Net rental income (USD). |
| net_gains_losses | number | Net gains/losses from asset sales (USD). |
| compensation_current_officers | number | Compensation of current officers, directors, trustees, and key employees (USD). |
| other_salaries_wages | number | All other salaries and wages (USD). |
| payroll_taxes | number | Payroll taxes (USD). |
| professional_fundraising_fees | number | Professional fundraising fees (USD). |
| unrelated_business_income | string | Y or N: whether the filing reports unrelated business income. |
| data_source | string | ProPublica data snapshot label (e.g., current_2026_03_10). |
| updated_at | string | Organization record last-updated timestamp. |
| propublica_url | string | Link to the ProPublica Nonprofit Explorer page for this org. |
| scraped_at | string | Timestamp when this record was produced. |
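
As an example of working with the flat rows, the sketch below computes year-over-year revenue growth per organization from exported records. It uses only field names from the table above (ein, filing_year, total_revenue, filing_has_data); revenue_growth is a hypothetical helper, and skipping years with no prior-year figure is a choice made for this example.

```python
from collections import defaultdict


def revenue_growth(rows: list[dict]) -> list[dict]:
    """Compute year-over-year total_revenue growth for each organization."""
    by_org: dict[str, dict[int, float]] = defaultdict(dict)
    for row in rows:
        # Only filings with structured data carry usable financial fields.
        if row.get("filing_has_data") and row.get("total_revenue") is not None:
            by_org[row["ein"]][row["filing_year"]] = row["total_revenue"]

    results = []
    for ein, years in by_org.items():
        for year in sorted(years):
            previous = years.get(year - 1)
            if previous:  # skip when the prior year is missing or zero
                growth = (years[year] - previous) / previous
                results.append({"ein": ein, "filing_year": year,
                                "revenue_growth": growth})
    return results
```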

FAQ

How do I scrape IRS 990 data from ProPublica?

ProPublica Nonprofit Crawler wraps the Nonprofit Explorer API v2 and returns structured JSON. Set the mode to search, optionally filter by state, NTEE category, and 501(c) subsection, and run the actor. Output can be exported as CSV, Excel, or JSON from the run dataset.

How much does ProPublica Nonprofit Crawler cost to run?

ProPublica Nonprofit Crawler uses pay-per-event pricing: $0.10 per actor start plus $0.001 per record. A thousand-record pull costs about $1.10. A 100-record preview is around $0.20.
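
The arithmetic behind those figures is simple enough to script when budgeting larger pulls. This sketch hard-codes the two prices quoted above; estimate_cost is a hypothetical helper, and Decimal is used to avoid floating-point rounding in currency math.

```python
from decimal import Decimal

START_FEE = Decimal("0.10")    # per actor start, as quoted above
PER_RECORD = Decimal("0.001")  # per record, as quoted above


def estimate_cost(records: int, starts: int = 1) -> Decimal:
    """Estimate the USD cost of a pull from record count and number of starts."""
    return START_FEE * starts + PER_RECORD * records
```

Keep in mind that with includeFilings on, each filing year is a record: 100 organizations averaging 12 filings each is 1,200 records, or $1.30 for a single run.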

Can I fetch specific nonprofits by EIN?

ProPublica Nonprofit Crawler supports direct EIN lookup through the organizations mode. Paste a list of EINs (with or without dashes) into einList and the actor will fetch each organization and every filing ProPublica has on record.

Does ProPublica Nonprofit Crawler need a proxy or API key?

ProPublica Nonprofit Crawler doesn't need either. The Nonprofit Explorer API is free and unauthenticated, and the actor runs well inside the site's courtesy rate limit with no proxy configuration.

How many years of financial history does it return?

ProPublica Nonprofit Crawler returns every filing ProPublica has on record for each organization — typically 10-15 years of 990, 990-EZ, or 990-PF filings, back to the early 2000s for long-lived orgs. When includeFilings: true, each year is a separate row with full financial fields.

What's the difference between filing_has_data: true and false?

ProPublica Nonprofit Crawler sets filing_has_data: true on a filing when ProPublica has parsed structured financial fields from the IRS extract, so the revenue and expense columns are populated. When it is false, only the PDF link and basic metadata are available, usually because the IRS has not yet released structured data for that tax year. Recent filings (the last 1-2 years) are commonly in this state.
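
When processing exported rows, you may want to separate PDF-only filings so they can be reviewed by hand. A minimal sketch, using only the output fields documented above (pdf_only_filings is a hypothetical helper):

```python
def pdf_only_filings(rows: list[dict]) -> list[tuple[str, int, str]]:
    """List (ein, filing_year, filing_pdf_url) for filings without structured data."""
    return [
        (row["ein"], row["filing_year"], row["filing_pdf_url"])
        for row in rows
        # "is False" skips summary rows that have no filing_has_data field at all.
        if row.get("filing_has_data") is False and row.get("filing_pdf_url")
    ]
```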


Need More Features?

Need custom fields, a different nonprofit data source, or batch export to your warehouse? File an issue on the actor page or get in touch.

Why Use ProPublica Nonprofit Crawler?

  • Affordable: $0.10 per start plus $0.001 per record. A thousand records runs about $1.10.
  • Fresh data: Pulls from ProPublica's live mirror of the IRS Business Master File and Annual Financial Extract, which is how the Nonprofit Explorer website itself gets its numbers. The data_source field tells you exactly which IRS extract the row came from.
  • Clean multi-year shape: One row per (organization, tax year) means year-over-year comparisons and trend analysis work without reshaping the output. Most scrapers hand you a nested blob and wish you luck.