New York Open Data Scraper

Pricing

from $13.00 / 1,000 result items

Query the New York State open data catalog across thousands of datasets covering health, transport, education, finance, environment, and demographics. Filter by dataset, agency, or category and export rows to JSON, CSV, or Excel for civic research, journalism, and analytics dashboards.

Developer

ParseForge

Maintained by Community

🗽 New York State Open Data Scraper

🚀 Export any New York State open dataset in seconds. Pull rows from 1,500+ public datasets covering driver licenses, tax records, health, education, transportation, and more. Filter with full-text search, column equality, and custom sort.

🕒 Last updated: 2026-05-23 · 📊 4 fields per record · 🗽 1,500+ datasets · 🏛️ Official NY State source · 🔎 Socrata-powered

The New York State Open Data Scraper taps data.ny.gov, the official Socrata-powered open-data hub of New York State government. The Actor returns 4 structured fields per record: the dataset resource ID, the full row payload, a collection timestamp, and an error slot. The row payload preserves every column from the source dataset exactly as Socrata serves it, so downstream pipelines see no schema loss.

The catalog covers more than 1,500 public datasets, including active driver licenses, tax rolls, public health surveillance, school report cards, MTA ridership, real-estate assessments, environmental sensor feeds, agency budgets, election results, vehicle inspections, and dozens of other domains. This Actor wraps Socrata's SoQL query language so you can search, filter, and sort without touching the API directly.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Real-estate analysts, NY-focused journalists, civic-tech developers, urban planners, transportation researchers, public-health teams, transparency advocates | Property valuation comps, investigative reporting, civic-app data layers, urban analytics dashboards, transit ridership studies, license verification pipelines |

📋 What the New York State Open Data Scraper does

A single workflow with rich filtering:

  • 📊 Pull rows by resource ID. Pass a 4x4 Socrata identifier (e.g. 9a8c-vfzj for Active Driver License Information).
  • 🔎 Full-text search. Optional $q query across every column.
  • 🧮 Column filters. Pass {"county": "BRONX", "operation_type": "Store"} for exact-match equality filters.
  • 📐 Sort order. Order by any column (license_number DESC, city ASC).
  • 🪪 Stable schema. Each record bundles the resource ID, the raw row, and a collection timestamp.

The Actor handles Socrata pagination automatically so you do not have to worry about offsets.
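
To make the mapping from inputs to Socrata requests concrete, here is a minimal sketch of how the search query, equality filters, sort order, and paging could translate into SODA query parameters ($q, column=value, $order, $limit, $offset). The endpoint pattern, the helper names build_page_url and page_offsets, and the page size of 1,000 are assumptions for illustration; the Actor's internal implementation is not documented here and may differ.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for the public SODA API on data.ny.gov.
BASE = "https://data.ny.gov/resource/{resource_id}.json"


def build_page_url(resource_id, search_query="", filters=None,
                   sort_order="", limit=1000, offset=0):
    """Build one paginated request URL from Actor-style inputs (hypothetical helper)."""
    params = {}
    if search_query:
        params["$q"] = search_query          # full-text search across columns
    params.update(filters or {})             # equality filters are plain column=value pairs
    if sort_order:
        params["$order"] = sort_order        # e.g. "license_number DESC"
    params["$limit"] = limit
    params["$offset"] = offset
    return BASE.format(resource_id=resource_id) + "?" + urlencode(params)


def page_offsets(max_items, page_size=1000):
    """Yield (offset, limit) pairs that together cover max_items rows."""
    offset = 0
    while offset < max_items:
        yield offset, min(page_size, max_items - offset)
        offset += page_size


# One URL per page for a 2,500-row request.
urls = [
    build_page_url("9a8c-vfzj", filters={"county": "BRONX"},
                   sort_order="license_number DESC", limit=limit, offset=offset)
    for offset, limit in page_offsets(2500)
]
```

Passing a sort order while paging is generally a good idea with Socrata, since offsets are only stable when the row order is.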

💡 Why it matters: New York State publishes one of the largest civic open-data catalogs in the U.S., yet most teams burn engineering hours writing a Socrata client per dataset. This Actor delivers consistent rows you can pipe straight into BI tools, notebooks, or civic-tech apps without per-dataset glue code.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| resourceId | string | "9a8c-vfzj" | 4x4 Socrata identifier for the dataset. Find IDs at data.ny.gov. |
| searchQuery | string | "" | Optional full-text query across every column. |
| filters | object | {} | Exact-match column filters as key/value pairs. |
| sortOrder | string | "" | Order results by a column (e.g. license_number DESC). |

Example: Bronx records from the Active Driver License Information dataset, sorted by license number in descending order.

{
  "maxItems": 50,
  "resourceId": "9a8c-vfzj",
  "filters": { "county": "BRONX" },
  "sortOrder": "license_number DESC"
}

Example: full-text search across NY State agency contracts.

{
  "maxItems": 100,
  "resourceId": "6gke-w4nb",
  "searchQuery": "education technology"
}

⚠️ Good to Know: dataset resource IDs are 4x4 Socrata identifiers visible in every dataset URL at data.ny.gov. Column names in filters and sortOrder use the dataset's API field names (lowercase with underscores), not the human-readable column headers.
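
A quick pre-flight check can catch the two mistakes this note warns about before a run is started. The sketch below is a heuristic under stated assumptions: the helper name validate_input is hypothetical, and the patterns assume resource IDs are two groups of four lowercase letters or digits and API field names are lowercase with underscores. It does not confirm that a column actually exists in the dataset.

```python
import re

# Assumed shapes: "xxxx-xxxx" resource IDs and lowercase_with_underscores field names.
RESOURCE_ID_RE = re.compile(r"^[a-z0-9]{4}-[a-z0-9]{4}$")
FIELD_NAME_RE = re.compile(r"^[a-z_][a-z0-9_]*$")


def validate_input(actor_input):
    """Return a list of problems found in an Actor input dict (empty if none)."""
    problems = []
    if not RESOURCE_ID_RE.match(actor_input.get("resourceId", "")):
        problems.append("resourceId is not a 4x4 identifier")
    for column in actor_input.get("filters", {}):
        if not FIELD_NAME_RE.match(column):
            problems.append(f"filter column {column!r} does not look like an API field name")
    # sortOrder is "<column>" optionally followed by ASC or DESC.
    parts = actor_input.get("sortOrder", "").split()
    if parts and not FIELD_NAME_RE.match(parts[0]):
        problems.append(f"sort column {parts[0]!r} does not look like an API field name")
    return problems


ok = validate_input({
    "resourceId": "9a8c-vfzj",
    "filters": {"county": "BRONX"},
    "sortOrder": "license_number DESC",
})
bad = validate_input({
    "resourceId": "9a8cvfzj",                  # missing the hyphen
    "filters": {"License Number": "123"},      # human-readable header, not a field name
})
```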


📊 Output

Each record contains 4 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 resourceId | string | "9a8c-vfzj" |
| 📦 row | object | { "license_number": "...", "county": "BRONX", ... } |
| 🕒 scrapedAt | string (ISO 8601) | "2026-05-23T10:00:00.000Z" |
| error | string or null | null |
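
Because the source columns live inside the nested row object, tabular tools often want them lifted to the top level. The following is a minimal sketch of one way to do that after export; the helper name flatten_record and the "row." prefix are choices made here for illustration, and the sample values are placeholders modeled on the schema example, not real data.

```python
def flatten_record(record):
    """Turn {resourceId, row, scrapedAt, error} into one flat dict (hypothetical helper)."""
    flat = {
        "resourceId": record["resourceId"],
        "scrapedAt": record["scrapedAt"],
        "error": record.get("error"),
    }
    # Prefix source columns so they cannot collide with the three wrapper fields.
    for column, value in (record.get("row") or {}).items():
        flat[f"row.{column}"] = value
    return flat


# Placeholder record shaped like the schema above; values are illustrative only.
sample = {
    "resourceId": "9a8c-vfzj",
    "row": {"license_number": "PLACEHOLDER", "county": "BRONX"},
    "scrapedAt": "2026-05-23T10:00:00.000Z",
    "error": None,
}
flat = flatten_record(sample)
```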


✨ Why choose this Actor

  • 🗽 1,500+ datasets. Every public dataset on data.ny.gov is reachable by resource ID.
  • 🔎 Native Socrata filtering. Full-text search, column equality, and sort, exposed as inputs.
  • 📦 Schema-preserving rows. Every dataset column is passed through unchanged.
  • 🏛️ Official source. Direct from the NY State open-data portal, no third-party caching.
  • Fast pagination. Pulls thousands of rows per minute with automatic offset handling.
  • 🚫 No authentication. Works against the public Socrata catalog. No login or API key needed.
  • 🔁 Always fresh. Each run pulls live rows, reflecting whatever the publishing agency updated last.

📊 NY State has one of the most active state-level open-data programs in the U.S. This Actor turns that catalog into structured rows for any downstream system.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Setup |
|---|---|---|---|---|
| ⭐ NY State Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | 1,500+ datasets | Live per run | ⚡ 2 min |
| Hand-written Socrata client | Free + engineering | Same | Build it yourself | 🛠️ Hours per dataset |
| Commercial real-estate / civic data providers | $$$$ | Curated subset | Real-time | ⏳ Days |
| One-off CSV downloads | Free | Snapshot only | Manual | 🐢 Tech debt |

Pick this Actor when you want consistent NY State open-data rows without writing per-dataset glue code.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the NY State Open Data Scraper page on the Apify Store.
  3. 🎯 Set input. Paste a resource ID from data.ny.gov, add optional filters or a search query, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏠 Real Estate & Property

  • Assessment-roll comps for valuation models
  • Tax-lien and lis-pendens monitoring
  • Neighborhood and borough demographic overlays
  • Property-transfer pipelines for brokerage tools

📰 Journalism & Investigations

  • FOIL-free first pass on state datasets
  • Public-spending and contract analysis
  • License and inspection-record reporting
  • Election and campaign-finance reporting

🏛️ Civic-Tech & Government

  • Civic-app data layers
  • Public-service dashboards
  • Resident-facing search tools
  • Transparency portals and accountability sites

📊 Urban Analytics & Planning

  • Transit ridership trend modeling
  • Public-health surveillance dashboards
  • School-performance research
  • Environmental sensor analytics

🔌 Automating New York State Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. A daily run after the publishing agency's nightly refresh keeps downstream tables current automatically.
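
As a rough illustration of the Python route, the sketch below starts one run with the apify-client package and reads the resulting dataset. The ACTOR_ID value is a hypothetical placeholder (copy the real ID from the Actor's page), build_run_input and run_actor are helper names invented here, and the client calls reflect the apify-client interface as commonly documented, so check them against the current Apify docs before relying on them.

```python
# Hypothetical placeholder: replace with the Actor ID shown on the Actor's Apify page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(resource_id, max_items=10, search_query="",
                    filters=None, sort_order=""):
    """Assemble the input object using the field names from the Input table."""
    return {
        "maxItems": max_items,
        "resourceId": resource_id,
        "searchQuery": search_query,
        "filters": filters or {},
        "sortOrder": sort_order,
    }


def run_actor(run_input, token):
    """Start one run, wait for it to finish, and return the dataset items as a list."""
    # Third-party dependency, imported lazily: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("9a8c-vfzj", max_items=50, filters={"county": "BRONX"})
# With a valid token and Actor ID:
# items = run_actor(run_input, token="<your Apify API token>")
```

Keeping the input builder separate from the network call makes it easy to reuse the same input object in a schedule definition or a test.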


🌟 Beyond business use cases

Open civic data powers more than enterprise dashboards. The same structured rows support research, education, activism, and personal initiatives.

🎓 Research and academia

  • Urban-planning thesis projects
  • Public-policy quantitative research
  • Reproducible civic-data coursework
  • Open-government accountability studies

🎨 Personal and creative

  • Neighborhood-explorer hobby maps
  • Personal property-research tools
  • Data-art and civic-visualization projects
  • Local-history storytelling

🤝 Non-profit and civic

  • Voter education and outreach
  • Tenant-rights organizing
  • Public-health advocacy
  • Environmental justice campaigns

🧪 Experimentation

  • Train city-prediction models
  • Prototype civic AI agents
  • Build neighborhood-aware browser extensions
  • Test data-pipeline frameworks on real records

❓ Frequently Asked Questions

🧩 How does it work?

Paste a resource ID from any data.ny.gov dataset, add optional filters or a search query, click Start, and the Actor pages through Socrata and emits one clean structured row per record.

📏 How accurate is the data?

The Actor reads rows directly from the official NY State portal. Accuracy and timeliness depend on the publishing agency. Each dataset page on data.ny.gov shows the last-updated timestamp.

🔁 How often is the dataset refreshed?

Update cadence varies by dataset: some refresh nightly (driver licenses, MTA ridership), others quarterly or annually (tax rolls, school report cards). Every run of this Actor pulls live rows.

🆔 How do I find a resource ID?

Open any dataset on data.ny.gov. The 4x4 identifier (e.g. 9a8c-vfzj) appears in the URL and in the API docs panel.
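
If you collect dataset links rather than IDs, the identifier can be pulled out of the URL programmatically. This is a minimal sketch assuming the 4x4 ID appears as a path segment (optionally with a file extension such as .json); the helper name extract_resource_id is hypothetical, and the category and dataset-name segments in the example URL are made up.

```python
import re
from urllib.parse import urlparse

# Assumed shape of a Socrata 4x4 identifier.
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def extract_resource_id(dataset_url):
    """Return the last 4x4-looking path segment of a dataset URL, or None."""
    segments = [s for s in urlparse(dataset_url).path.split("/") if s]
    for segment in reversed(segments):
        name = segment.rsplit(".", 1)[0]   # tolerate API URLs such as <id>.json
        if FOUR_BY_FOUR.fullmatch(name):
            return name
    return None


# The path segments before the ID are illustrative, not a real dataset page.
page_id = extract_resource_id("https://data.ny.gov/Example-Category/Example-Dataset/9a8c-vfzj")
api_id = extract_resource_id("https://data.ny.gov/resource/9a8c-vfzj.json")
```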

📊 How many datasets are covered?

More than 1,500 public datasets across health, transportation, education, real estate, energy, agriculture, public safety, and dozens of other domains.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval. A nightly cron is enough for most operational use cases.

💼 Can I use this data commercially?

Yes. Data published on data.ny.gov is open for commercial reuse under standard public-data terms. Check the per-dataset terms page on data.ny.gov for any attribution requirements. You are responsible for downstream compliance with privacy regulations relevant to your use case.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (up to 10 records per run). A paid plan raises the cap to 1,000,000 records per run and unlocks scheduling and higher concurrency.

🧮 How do filters work?

Pass an object like {"county": "BRONX"} for exact-match column equality. Ranges and complex predicates are not available as filters; the closest built-in option is the search query, which does full-text matching across all columns.
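
Since filters only express equality, one workable pattern is to narrow the run with an equality filter and then apply the range on the exported records. The sketch below assumes numeric columns may arrive as strings, as is common in Socrata JSON; the helper name in_range, the amount column, and the abcd-1234 resource ID are hypothetical.

```python
def in_range(record, column, low=None, high=None):
    """True when record['row'][column] parses as a number within [low, high]."""
    try:
        value = float(record["row"][column])
    except (KeyError, TypeError, ValueError):
        return False    # missing column, null value, or non-numeric text
    if low is not None and value < low:
        return False
    if high is not None and value > high:
        return False
    return True


# Hypothetical records shaped like the Actor's output; values are illustrative.
records = [
    {"resourceId": "abcd-1234", "row": {"amount": "1200.50"}},
    {"resourceId": "abcd-1234", "row": {"amount": "80"}},
    {"resourceId": "abcd-1234", "row": {"amount": "n/a"}},
    {"resourceId": "abcd-1234", "row": {}},
]
large = [r for r in records if in_range(r, "amount", low=1000)]
small = [r for r in records if in_range(r, "amount", high=100)]
```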

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

NY State Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe NY State rows into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to alert your team when a watched dataset crosses a threshold, or to push fresh rows into a Notion knowledge base.


💡 Pro Tip: browse the complete ParseForge collection for more civic and open-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the State of New York, any New York State agency, or Socrata/Tyler Technologies. All trademarks mentioned are the property of their respective owners. Only publicly available open civic data is collected.