ProPublica Nonprofit Crawler - IRS 990 & Tax-Exempt Org Data
Pricing
Pay per event
Crawl IRS 990 filings and tax-exempt org data from ProPublica. Search by name, EIN, state, NTEE code, or 501(c) subsection. Extract revenue, expenses, net assets, officer compensation, and PDF links. For grant writers, donors, journalists, and compliance teams.
Developer
BowTiedRaccoon
ProPublica Nonprofit Crawler — IRS 990 Filings & Tax-Exempt Org Data
Crawl IRS Form 990 filings and tax-exempt organization data from the ProPublica Nonprofit Explorer API. Returns organization identity, 501(c) classification, NTEE codes, multi-year financial history, officer compensation, and PDF links for ~1.8M US nonprofits — the same data source investigative journalists use to follow the money.
ProPublica Nonprofit Crawler Features
- Searches ProPublica's free Nonprofit Explorer API by name, state, NTEE category, or 501(c) subsection.
- Extracts one row per (organization, tax year) when full financial data is requested — so year-over-year comparisons work without reshaping.
- Returns 45+ fields per row including total revenue, functional expenses, net assets, total contributions, program service revenue, officer compensation, and wage breakdowns.
- Fetches specific organizations by EIN list, with or without dashes.
- Filters by 501(c) subsection (3, 4, 5, 6, 7, 8, 9, 10, 19) and NTEE top-level category (1-10).
- Emits the direct ProPublica PDF URL for every 990 filing, structured or scanned.
- No API key, no proxy, no browser. Pure JSON API with a polite 300ms pace.
Who Uses IRS 990 Data?
- Grant writers — Screen foundations by NTEE category and asset size before writing another application nobody reads.
- Donor due diligence — Pull a nonprofit's last five years of financials to check whether the ratio of officer comp to program expenses is reasonable or not.
- Investigative journalists — Build leads by filtering 501(c)(4) social-welfare orgs in a specific state, or track foundation-to-foundation grants across years.
- Compliance & KYC teams — Screen nonprofit counterparties against the IRS Business Master File and flag organizations with unusual asset or contribution patterns.
- Academic researchers — Export a bounded slice of the nonprofit sector (hospitals, foundations, advocacy orgs) for econometric work without wrestling with IRS bulk extracts.
- Market researchers — Size up a vertical by counting 501(c)(3) orgs with revenue above a threshold in a given geography.
How the ProPublica Nonprofit Crawler Works
- Pick a mode: "search" to paginate filtered results, or "organizations" to fetch specific EINs you already have.
- Set filters: search term, state, NTEE category, 501(c) subsection. Or leave them empty and browse the whole universe one page at a time.
- Decide whether you want one row per org (fast) or one row per filing year (richer). The includeFilings toggle controls this.
- The crawler paginates ProPublica's API at ~3 requests per second, hydrates each match with the organization detail endpoint when needed, and writes a flat JSON record per row. The dataset is ready to export as CSV, Excel, or JSON.
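The pagination step above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code. The endpoint path and the `q`, `state[id]`, `ntee[id]`, `c_code[id]`, and `page` query parameters reflect ProPublica's public API v2 as understood here; the `organizations` and `num_pages` response keys are assumptions about the response shape; `build_search_url`, `crawl_search`, and the injectable `fetch_json` callable are hypothetical names introduced for the example.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the public Nonprofit Explorer API v2.
API_BASE = "https://projects.propublica.org/nonprofits/api/v2"


def build_search_url(term="", state="", ntee="", subsection="", page=0):
    """Build a search URL, skipping filters that were left empty."""
    params = {"page": page}
    if term:
        params["q"] = term
    if state:
        params["state[id]"] = state
    if ntee:
        params["ntee[id]"] = ntee
    if subsection:
        params["c_code[id]"] = subsection
    return f"{API_BASE}/search.json?{urlencode(params)}"


def http_get_json(url):
    """Default fetcher: a plain unauthenticated GET that parses JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def crawl_search(max_items, fetch_json=http_get_json, pause=0.3, **filters):
    """Walk result pages until max_items is reached or pages run out."""
    items, page = [], 0
    while len(items) < max_items:
        data = fetch_json(build_search_url(page=page, **filters))
        orgs = data.get("organizations", [])  # assumed response key
        if not orgs:
            break
        items.extend(orgs)
        page += 1
        if page >= data.get("num_pages", 0):  # assumed response key
            break
        time.sleep(pause)  # 0.3 s between requests, roughly 3 per second
    return items[:max_items]
```

Passing `fetch_json` as a parameter keeps the pacing and paging logic testable without touching the network; a call such as `crawl_search(100, term="hospital", state="VT", subsection="3")` would mirror the search-mode input shown in the next section.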
Input
{
  "mode": "search",
  "searchTerm": "hospital",
  "state": "VT",
  "nteeCategory": "",
  "subsectionCode": "3",
  "einList": [],
  "includeFilings": true,
  "maxItems": 100
}
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | "search" | Either "search" (paginate filtered results) or "organizations" (fetch specific EINs). |
| searchTerm | string | "hospital" | Free-text query matched against org names. Leave empty to browse all orgs matching the other filters. |
| state | string | "" | Two-letter US state code (e.g., "CA", "NY"). Search mode only. |
| nteeCategory | string | "" | NTEE top-level category 1-10 (Arts, Education, Environment, Health, Human Services, International, Public Benefit, Religion, Mutual Benefit, Unclassified). Search mode only. |
| subsectionCode | string | "" | IRS 501(c) subsection. "3" = charitable, "4" = social welfare, "6" = business leagues, etc. Search mode only. |
| einList | array | [] | EINs to fetch directly. Accepts "13-1623888" or "131623888". Required when mode is "organizations". |
| includeFilings | boolean | true | When true, emits one row per (org, tax year) with full 990 financials. When false, one summary row per org. |
| maxItems | integer | 100 | Hard cap on records returned. Each filing year counts as one record when includeFilings is on. |
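Before starting a run, the constraints in the table above can be checked locally. The sketch below is only an illustration of those rules: `validate_input` and `VALID_MODES` are hypothetical names, and whether the actor itself performs the same checks is not stated in this document.

```python
VALID_MODES = {"search", "organizations"}


def validate_input(cfg):
    """Return a list of problems found in an input dict (empty if none)."""
    problems = []
    mode = cfg.get("mode", "search")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    # The table marks einList as required in organizations mode.
    if mode == "organizations" and not cfg.get("einList"):
        problems.append("einList is required when mode is 'organizations'")
    state = cfg.get("state", "")
    if state and not (len(state) == 2 and state.isalpha()):
        problems.append(f"state should be a two-letter code, got {state!r}")
    max_items = cfg.get("maxItems", 100)
    if not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems should be a positive integer")
    return problems
```

An empty list means the input passes these checks; anything else is a human-readable description of what to fix.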
Organizations mode example — fetch three specific EINs with their full filing history:
{
  "mode": "organizations",
  "einList": ["13-1623888", "53-0196605", "941340523"],
  "includeFilings": true,
  "maxItems": 50
}
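Because einList accepts EINs with or without dashes, it can help to normalize them on your side before comparing against the ein and strein output fields. A minimal sketch, assuming an EIN is always nine digits; `normalize_ein` and `format_ein` are hypothetical helper names, not part of the actor:

```python
import re


def normalize_ein(raw):
    """Strip dashes and whitespace; return a 9-digit string or raise ValueError."""
    digits = re.sub(r"[\s-]", "", str(raw))
    if not re.fullmatch(r"\d{9}", digits):
        raise ValueError(f"not a 9-digit EIN: {raw!r}")
    return digits


def format_ein(ein):
    """Render an EIN in the dashed XX-XXXXXXX form."""
    digits = normalize_ein(ein)
    return f"{digits[:2]}-{digits[2:]}"
```

Keeping the EIN as a string rather than an integer preserves leading zeros, which matches how the ein output field is described.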
ProPublica Nonprofit Crawler Output Fields
Each record is one (organization, tax year) row when includeFilings: true, or one summary row per organization when false. Organization-level fields are repeated across every filing row so downstream joins are trivial.
{
  "ein": "941340523",
  "strein": "94-1340523",
  "name": "Kaiser Foundation Health Plan Inc",
  "careofname": "% KP TAX",
  "address": "ONE KAISER PLAZA",
  "city": "Oakland",
  "state": "CA",
  "zipcode": "94612-3610",
  "ntee_code": "E310",
  "subsection_code": 3,
  "subsection_label": "501(c)(3) - Charitable / religious / educational",
  "classification_codes": "1200",
  "activity_codes": "164000000",
  "foundation_code": 16,
  "deductibility_code": 1,
  "exempt_organization_status_code": 1,
  "organization_code": 1,
  "ruling_date": "1981-12-01",
  "latest_tax_period": "2024-12-01",
  "latest_asset_amount": 33547368863,
  "latest_income_amount": 93006408021,
  "latest_revenue_amount": 82490440881,
  "filing_year": 2023,
  "filing_tax_period": 202312,
  "filing_type": "990",
  "filing_has_data": true,
  "filing_pdf_url": "https://projects.propublica.org/nonprofits/download-filing?path=...",
  "filing_updated": "2025-08-05T16:11:09.202Z",
  "total_revenue": 75101306911,
  "total_functional_expenses": 74356004001,
  "total_assets_end": 31400724759,
  "total_liabilities_end": 22078604890,
  "net_assets_end": 9322119869,
  "total_contributions": 11542682,
  "program_service_revenue": 75068903991,
  "investment_income": 298237561,
  "net_rental_income": 1970907,
  "net_gains_losses": -280418643,
  "compensation_current_officers": 90793859,
  "other_salaries_wages": 2680303412,
  "payroll_taxes": 233744277,
  "professional_fundraising_fees": 0,
  "unrelated_business_income": "Y",
  "data_source": "current_2026_03_10",
  "updated_at": "2026-03-10T23:37:21.272Z",
  "propublica_url": "https://projects.propublica.org/nonprofits/organizations/941340523",
  "scraped_at": "2026-04-19T10:16:31.950Z"
}
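The row shape in this record, with organization-level fields repeated on every filing row, can be produced by a simple merge. The sketch below is illustrative only: `flatten` is a hypothetical helper, its inputs are plain dicts already using the output field names, and falling back to a single summary row when an organization has no filings is an assumption rather than documented actor behavior.

```python
def flatten(org, filings, include_filings=True):
    """Return one row per filing, each carrying a copy of the org fields."""
    if not include_filings or not filings:
        # Assumed fallback: a single summary row for the organization.
        return [dict(org)]
    # Later keys win, so filing fields are layered on top of the org fields.
    return [{**org, **filing} for filing in filings]
```

Since every row carries ein and name, rows for one organization can be grouped or sorted by filing_year directly, without joining back to a separate organization table.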
| Field | Type | Description |
|---|---|---|
| ein | string | 9-digit Employer Identification Number with leading zeros preserved. |
| strein | string | EIN formatted with dash (XX-XXXXXXX). |
| name | string | Organization legal name. |
| sub_name | string | Alternative name or DBA (from search results). |
| careofname | string | Care-of name on the IRS record. |
| address, city, state, zipcode | string | Registered address. |
| ntee_code | string | NTEE classification (e.g., E200 for hospitals). |
| subsection_code | number | IRS 501(c) subsection integer. |
| subsection_label | string | Human-readable subsection label. |
| classification_codes | string | IRS classification codes. |
| activity_codes | string | IRS activity codes. |
| foundation_code | number | IRS foundation status code. |
| deductibility_code | number | IRS deductibility code. |
| exempt_organization_status_code | number | IRS exempt status code (1 = unconditional). |
| organization_code | number | IRS organization type (1 = corporation, 2 = trust, etc.). |
| ruling_date | string | Date the IRS granted exempt status (YYYY-MM-DD). |
| latest_tax_period | string | Most recent tax period on the master file. |
| latest_asset_amount | number | Most recent reported total assets (USD). |
| latest_income_amount | number | Most recent reported total income (USD). |
| latest_revenue_amount | number | Most recent reported total revenue (USD). |
| filing_year | number | Calendar year of this filing. |
| filing_tax_period | number | Tax period end in YYYYMM format. |
| filing_type | string | IRS form (990, 990-EZ, 990-PF). |
| filing_has_data | boolean | True when ProPublica parsed structured financial fields; false when only a PDF is available. |
| filing_pdf_url | string | Direct link to the 990 PDF. |
| filing_updated | string | Last-updated timestamp for the filing record (ISO 8601). |
| total_revenue | number | Total revenue on the filing (USD). |
| total_functional_expenses | number | Total functional expenses (USD). |
| total_assets_end | number | Total assets at year end (USD). |
| total_liabilities_end | number | Total liabilities at year end (USD). |
| net_assets_end | number | Net assets at year end (USD). |
| total_contributions | number | Total contributions, gifts, and grants received (USD). |
| program_service_revenue | number | Total program service revenue (USD). |
| investment_income | number | Investment income (USD). |
| net_rental_income | number | Net rental income (USD). |
| net_gains_losses | number | Net gains/losses from asset sales (USD). |
| compensation_current_officers | number | Compensation of current officers, directors, trustees, and key employees (USD). |
| other_salaries_wages | number | All other salaries and wages (USD). |
| payroll_taxes | number | Payroll taxes (USD). |
| professional_fundraising_fees | number | Professional fundraising fees (USD). |
| unrelated_business_income | string | Y or N, indicating whether the filing reports unrelated business income. |
| data_source | string | ProPublica data snapshot label (e.g., current_2026_03_10). |
| updated_at | string | Organization record last-updated timestamp. |
| propublica_url | string | Link to the ProPublica Nonprofit Explorer page for this org. |
| scraped_at | string | Timestamp when this record was produced. |
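As a worked example of using these fields, the sketch below runs two quick checks on values copied from the sample record above: whether assets minus liabilities reconciles with reported net assets, and what share of total functional expenses goes to officer compensation. The helper names are hypothetical, and using total_functional_expenses as the denominator is a simplifying assumption, since the output has no separate program-expense field.

```python
# Values copied from the sample output record.
record = {
    "total_assets_end": 31400724759,
    "total_liabilities_end": 22078604890,
    "net_assets_end": 9322119869,
    "total_functional_expenses": 74356004001,
    "compensation_current_officers": 90793859,
}


def balance_sheet_gap(row):
    """Assets minus liabilities minus reported net assets; 0 when they reconcile."""
    return row["total_assets_end"] - row["total_liabilities_end"] - row["net_assets_end"]


def officer_comp_share(row):
    """Officer compensation as a fraction of total functional expenses, or None."""
    expenses = row.get("total_functional_expenses")
    comp = row.get("compensation_current_officers")
    if not expenses or comp is None:
        return None
    return comp / expenses


print(balance_sheet_gap(record))  # 0: the sample record reconciles
print(f"{officer_comp_share(record):.2%}")  # about 0.12% for the sample record
```

Rows with filing_has_data set to false will generally lack these financial fields, which is why `officer_comp_share` returns None instead of raising.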
FAQ
How do I scrape IRS 990 data from ProPublica?
ProPublica Nonprofit Crawler wraps the Nonprofit Explorer API v2 and returns structured JSON. Set the mode to search, optionally filter by state, NTEE category, and 501(c) subsection, and run the actor. Output can be exported as CSV, Excel, or JSON from the run dataset.
How much does ProPublica Nonprofit Crawler cost to run?
ProPublica Nonprofit Crawler uses pay-per-event pricing: $0.10 per actor start plus $0.001 per record. A thousand-record pull costs about $1.10. A 100-record preview is around $0.20.
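The arithmetic behind those figures is a flat start fee plus a per-record fee. A small estimator, using only the two prices quoted above (`estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding on cent amounts):

```python
from decimal import Decimal

START_FEE = Decimal("0.10")    # per actor start, as quoted above
RECORD_FEE = Decimal("0.001")  # per record returned, as quoted above


def estimate_cost(records, starts=1):
    """Estimated USD cost for a number of records across one or more runs."""
    return START_FEE * starts + RECORD_FEE * records


print(estimate_cost(1000))  # 1.100
print(estimate_cost(100))   # 0.200
```

With includeFilings on, each filing year counts as a record, so the record count to plug in is filings, not organizations.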
Can I fetch specific nonprofits by EIN?
ProPublica Nonprofit Crawler supports direct EIN lookup through the organizations mode. Paste a list of EINs (with or without dashes) into einList and the actor will fetch each organization and every filing ProPublica has on record.
Does ProPublica Nonprofit Crawler need a proxy or API key?
ProPublica Nonprofit Crawler doesn't need either. The Nonprofit Explorer API is free and unauthenticated, and the actor runs well inside the site's courtesy rate limit with no proxy configuration.
How many years of financial history does it return?
ProPublica Nonprofit Crawler returns every filing ProPublica has on record for each organization — typically 10-15 years of 990, 990-EZ, or 990-PF filings, back to the early 2000s for long-lived orgs. When includeFilings: true, each year is a separate row with full financial fields.
What's the difference between filing_has_data: true and false?
ProPublica Nonprofit Crawler marks a filing filing_has_data: true when ProPublica has parsed structured financial fields from the IRS extract, so you get all the revenue and expense columns populated. When false, only the PDF and basic metadata are available — the IRS has not yet released structured data for that tax year. Recent filings (last 1-2 years) are commonly in this state.
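When analysing exported rows, it is usually worth separating the two cases before computing anything. A minimal sketch over a list of output records (`split_by_data` is a hypothetical helper; rows that lack the flag entirely, such as summary rows, are treated here as having no structured data, which is an assumption):

```python
def split_by_data(rows):
    """Partition rows into (structured, pdf_only) using filing_has_data."""
    structured = [r for r in rows if r.get("filing_has_data")]
    # Rows where the flag is false or missing land in the PDF-only group.
    pdf_only = [r for r in rows if not r.get("filing_has_data")]
    return structured, pdf_only
```

Financial aggregates would then run over the first list only, while the second list still carries filing_pdf_url for manual review.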
Need More Features?
Need custom fields, a different nonprofit data source, or batch export to your warehouse? File an issue on the actor page or get in touch.
Why Use ProPublica Nonprofit Crawler?
- Affordable: $0.10 per start plus $0.001 per record. A thousand records runs about $1.10.
- Fresh data: Pulls from ProPublica's live mirror of the IRS Business Master File and Annual Financial Extract, which is how the Nonprofit Explorer website itself gets its numbers. The data_source field tells you which data snapshot the row came from.
- Clean multi-year shape: One row per (organization, tax year) means year-over-year comparisons and trend analysis work without reshaping the output. Most scrapers hand you a nested blob and wish you luck.