San Francisco Open Data Scraper

Pricing

from $25.72 / 1,000 results


Scrape any San Francisco Open Data dataset via Socrata SODA API. Business registrations, restaurants, permits, parking, 311 calls, evictions and more. No API key required.

Rating: 0.0 (0)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 5 days ago

🌉 San Francisco Open Data Scraper

🚀 Export any San Francisco Open Data dataset in seconds. Tap 659 published datasets including business registrations, restaurants, building permits, 311 cases, parking citations, evictions, police incidents, and more, via the official Socrata SODA API. No API key, no registration.

🕒 Last updated: 2026-05-13 · 📊 Native dataset schema per record · 🗂️ 659 datasets · 🌉 City and County of San Francisco · 🔌 Socrata SODA API

The San Francisco Open Data Scraper is a universal export tool for every dataset on data.sfgov.org. The City and County of San Francisco publishes 659 datasets covering city operations, public safety, transportation, economy, environment, health, and culture. This Actor lets you pull any of them by passing the Socrata 4x4 dataset ID, optionally adding SoQL filters ($where, $select, $order, $q), and downloading the result as CSV, Excel, JSON, or XML.

The catalog spans every major SF civic data set, including building permits (i98e-djp9), registered businesses (pyih-qa8i), 311 service requests (vw6y-z8j6), parking citations (5cei-gny5), eviction notices (tu7p-pa2g), mobile food permits (rqzj-sfat), police incident reports (wg3w-h783), restaurant inspections, film locations, and historical crime statistics. Output preserves the dataset's native schema and appends three metadata fields: _datasetId, _datasetUrl, and _scrapedAt.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Civic researchers, journalists, prop-tech startups, GIS engineers, data scientists, public health analysts, real-estate firms, urban planners, students | Civic dashboards, FOIA-style export, permit/business/restaurant feeds, eviction and 311 monitoring, journalism investigations, ML training data on municipal events |

📋 What the SF Open Data Scraper does

One dataset selector plus four filtering knobs that map straight to Socrata SoQL:

  • 🆔 Dataset selector. Pick any of 659 datasets by 4x4 ID. Find IDs in the URL of any dataset page on data.sfgov.org.
  • 🔍 WHERE clause. Standard SoQL $where, e.g. permit_type=3 AND filed_date>'2024-01-01'.
  • 📋 SELECT clause. Limit returned columns via $select.
  • 📈 ORDER clause. Sort with $order, e.g. filed_date DESC.
  • 🔎 Full-text search. Free-text $q across all string columns.
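
To make the mapping concrete, here is a minimal sketch of how inputs like these could translate into a SODA request URL. It is not the Actor's actual code: the function name `build_soda_url` and its defaults are hypothetical, and it assumes the standard public SODA endpoint pattern `https://data.sfgov.org/resource/<4x4>.json` with the `$limit` and `$offset` paging parameters.

```python
from urllib.parse import urlencode

# Assumption: the standard public SODA resource endpoint for data.sfgov.org.
BASE_URL = "https://data.sfgov.org/resource"


def build_soda_url(dataset_id, where="", select="", order="", query="", limit=1000, offset=0):
    """Translate Actor-style inputs into a SODA query URL (illustrative only)."""
    params = {"$limit": limit, "$offset": offset}
    # Only send the SoQL parameters the caller actually filled in.
    for key, value in (("$where", where), ("$select", select), ("$order", order), ("$q", query)):
        if value:
            params[key] = value
    # urlencode percent-encodes "$", quotes, and spaces so the URL is safe to send.
    return f"{BASE_URL}/{dataset_id}.json?{urlencode(params)}"


url = build_soda_url(
    "i98e-djp9",
    where="filed_date>'2024-01-01'",
    order="filed_date DESC",
    limit=10,
)
print(url)
```

Empty inputs are simply omitted, so a run with only a dataset ID becomes a plain paged read of the whole dataset.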

Each record returns the dataset's native columns verbatim (with Socrata's internal :@computed_region_* lookup columns stripped to keep the output clean), plus three appended metadata fields: _datasetId, _datasetUrl, and _scrapedAt. Pagination is automatic and capped at 1,000,000 rows.
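
The cleanup described above can be sketched in a few lines. This is an assumption about how such post-processing might look, not the Actor's source: `clean_record` is a hypothetical helper, and the computed-region column name in the sample is made up for illustration.

```python
from datetime import datetime, timezone

COMPUTED_PREFIX = ":@computed_region_"


def clean_record(record, dataset_id):
    """Drop Socrata computed-region columns and append the three metadata fields."""
    cleaned = {key: value for key, value in record.items() if not key.startswith(COMPUTED_PREFIX)}
    cleaned["_datasetId"] = dataset_id
    cleaned["_datasetUrl"] = f"https://data.sfgov.org/d/{dataset_id}"
    # UTC timestamp with millisecond precision and a trailing "Z".
    now = datetime.now(timezone.utc)
    cleaned["_scrapedAt"] = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return cleaned


# The computed-region key below is a hypothetical example name.
raw = {"permit_number": "201903226060", ":@computed_region_abcd_1234": "7"}
record = clean_record(raw, "i98e-djp9")
print(record)
```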

💡 Why it matters: San Francisco publishes one of the richest open-data catalogs of any U.S. city, but the SODA API has its own query language, paging quirks, and computed-region noise. This Actor turns that into a clean, paginated export with no Socrata code on your side.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded SF dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| datasetId | enum (4x4) | "3pee-9qhc" | Socrata 4x4 ID. Required. Enumerates all 659 datasets published on data.sfgov.org. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| where | string (SoQL) | "" | Socrata $where filter. |
| select | string (SoQL) | "" | Comma-separated columns to return. |
| order | string (SoQL) | "" | Sort, e.g. filed_date DESC. |
| query | string | "" | Free-text full-text search (Socrata $q). |

Example: up to 1,000 building permits filed after January 1, 2026 with an estimated cost over $1M, newest first.

{
  "datasetId": "i98e-djp9",
  "maxItems": 1000,
  "where": "filed_date>'2026-01-01' AND estimated_cost>1000000",
  "order": "filed_date DESC"
}

Example: 311 cases mentioning 'graffiti' in the Mission.

{
  "datasetId": "vw6y-z8j6",
  "maxItems": 500,
  "query": "graffiti",
  "where": "neighborhoods_sffind_boundaries='Mission'"
}

⚠️ Good to Know: the input dataset list contains all 659 datasets currently exposed on data.sfgov.org. A small number are private (require Socrata authentication) and will return an HTTP 401 / 403 error record. Browse the full catalog and find the right 4x4 ID at data.sfgov.org.


📊 Output

Each record returns the dataset's native schema verbatim (Socrata internal :@computed_region_* columns are stripped) plus three metadata fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema (illustrative for building permits dataset i98e-djp9)

| Field | Type | Example |
| --- | --- | --- |
| 🆔 permit_number | string | "201903226060" |
| 🏗️ permit_type_definition | string | "additions alterations or repairs" |
| 📅 filed_date | ISO 8601 | "2019-03-22T14:35:59.000" |
| 📋 status | string | "expired" |
| 📍 street_number / street_name / street_suffix | string | "760" / "14th" / "St" |
| 📝 description | string | "revision to pa 2017-1120-4452..." |
| 💵 estimated_cost / revised_cost | string (number) | "15000.0" / "97000.0" |
| 🏘️ existing_units / proposed_units | string (number) | "12.0" / "14.0" |
| 📮 zipcode | string | "94114" |
| 🗺️ neighborhoods_analysis_boundaries | string | "Castro/Upper Market" |
| 📍 location | object | {"latitude":"...","longitude":"..."} |
| 🆔 _datasetId | string | "i98e-djp9" |
| 🔗 _datasetUrl | string | "https://data.sfgov.org/d/i98e-djp9" |
| 🕒 _scrapedAt | ISO 8601 | "2026-05-13T10:00:00.000Z" |

Every dataset has its own column set. The Actor passes through whatever Socrata returns for the dataset you picked.
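
Because numeric and date values arrive as strings in this schema, downstream code usually needs a small conversion step. The sketch below is one way to do it for the building permits fields listed above; `parse_permit` is a hypothetical helper, and the date format is assumed from the `filed_date` example in the table.

```python
from datetime import datetime

NUMERIC_FIELDS = ("estimated_cost", "revised_cost", "existing_units", "proposed_units")


def parse_permit(record):
    """Return a copy with selected string fields converted to Python types."""
    parsed = dict(record)
    for field in NUMERIC_FIELDS:
        # Missing or empty values are left untouched rather than guessed.
        if parsed.get(field) not in (None, ""):
            parsed[field] = float(parsed[field])
    if parsed.get("filed_date"):
        # Assumed format, based on the example "2019-03-22T14:35:59.000".
        parsed["filed_date"] = datetime.strptime(parsed["filed_date"], "%Y-%m-%dT%H:%M:%S.%f")
    return parsed


sample = {
    "permit_number": "201903226060",
    "filed_date": "2019-03-22T14:35:59.000",
    "estimated_cost": "15000.0",
}
permit = parse_permit(sample)
print(permit["estimated_cost"], permit["filed_date"].year)
```

Identifier-like fields such as permit_number and zipcode are deliberately kept as strings so leading zeros and formatting survive.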


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🗂️ | 659 datasets, one Actor. Every public dataset on data.sfgov.org enumerated in the input schema. |
| 🔍 | Full SoQL filtering. $where, $select, $order, $q exposed as input fields. |
| 🧹 | Cleaned output. Socrata :@computed_region_* internal columns stripped automatically. |
| 🔗 | Dataset provenance. Every record stamped with _datasetId, _datasetUrl, _scrapedAt. |
| | Fast. 1,000-row pages, automatic pagination up to 1,000,000 rows. |
| 🚫 | No API key. The Socrata SODA API is public and unauthenticated for all public datasets. |
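
The paging arithmetic behind 1,000-row pages with a 1,000,000-row ceiling can be sketched as follows. This illustrates the general offset/limit technique rather than the Actor's own implementation; `page_windows` is a hypothetical name.

```python
def page_windows(max_items, page_size=1000, hard_cap=1_000_000):
    """Yield (offset, limit) pairs covering at most max_items rows."""
    remaining = min(max_items, hard_cap)
    offset = 0
    while remaining > 0:
        # The last page may be shorter than page_size.
        limit = min(page_size, remaining)
        yield offset, limit
        offset += limit
        remaining -= limit


windows = list(page_windows(2500))
print(windows)  # [(0, 1000), (1000, 1000), (2000, 500)]
```

A real client would also stop early when a page comes back with fewer rows than requested, since that signals the end of the dataset.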

📊 SF's open-data catalog is one of the most cited public-sector datasets in the country, powering everything from civic-tech projects to academic research.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ SF Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | All 659 SF datasets | Live per run | Full SoQL ($where, $select, $order, $q) | ⚡ 2 min |
| Manual CSV download from data.sfgov.org | Free | One dataset at a time | Snapshot | None | 🐢 Manual |
| Raw Socrata SODA queries | Free | Full | Live | SoQL | 🛠️ Code required |
| Third-party civic-data aggregators | $99+/month | Mixed | Daily | Vendor-defined | ⏳ Hours |

Pick this Actor when you want a clean, filtered export of any SF dataset without writing a single line of Socrata code.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the San Francisco Open Data Scraper page on the Apify Store.
  3. 🎯 Pick a dataset. Find the 4x4 ID on data.sfgov.org (it's in every dataset URL) and paste it in.
  4. 🔍 Add optional filters. Type a SoQL $where, $order, $select, or full-text $q if you want a slice.
  5. 🚀 Run it. Click Start and let the Actor collect your data.
  6. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏢 Real Estate and Construction

  • Track every building permit filed in your target ZIP
  • Lead-gen from eviction notices and 3R reports
  • Comparable cost-per-unit analysis for development bids
  • Monitor neighborhood change with permit pipeline data

🍴 Restaurant and Hospitality

  • Power restaurant-inspection lookup tools
  • Sync mobile food permit feeds for delivery startups
  • Track new business registrations by SIC code
  • Spot health violations across neighborhoods

🚓 Public Safety and Insurance

  • Build crime-density dashboards by neighborhood
  • Underwrite policies with live incident data
  • Risk-score parcels with parking-citation history
  • Track 311 service-request volume per district

🗞️ Journalism and Civic Tech

  • Investigate displacement via eviction notices
  • Quantify housing-supply changes year over year
  • Build live-updating civic dashboards
  • Power newsroom data-explainer features

🔌 Automating SF Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
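
As a rough illustration of a programmatic run, the sketch below uses the apify-client Python package. The token and Actor ID are placeholders you would need to supply (the Actor ID is not stated on this page), and `build_run_input` and `run_scraper` are hypothetical helper names; only the input field names come from the Input table.

```python
def build_run_input(dataset_id, max_items=10, where="", select="", order="", query=""):
    """Assemble the Actor input using the field names from the Input table."""
    return {
        "datasetId": dataset_id,
        "maxItems": max_items,
        "where": where,
        "select": select,
        "order": order,
        "query": query,
    }


def run_scraper(token, actor_id, run_input):
    """Start a run, wait for it to finish, and return the resulting items."""
    # Imported lazily so the helper above works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("vw6y-z8j6", max_items=500, query="graffiti")
# Placeholders: replace with your own API token and this Actor's ID.
# items = run_scraper("<YOUR_APIFY_TOKEN>", "<ACTOR_ID>", run_input)
```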


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Urban-studies papers on housing, transit, displacement
  • Public-health theses with 311 and inspection data
  • Reproducible policy-impact studies with versioned pulls
  • GIS coursework on real municipal datasets

🎨 Personal and creative

  • Neighborhood dashboards for your block
  • Side projects mapping every food truck in the city
  • Civic-art and visualization exhibitions
  • Hobby trackers for permit pipeline or 311 timing

🤝 Non-profit and civic

  • Housing-justice orgs tracking eviction filings
  • Mutual-aid networks monitoring 311 categories
  • Civic-tech hackathons with structured datasets
  • Investigative journalism on city-government performance

🧪 Experimentation

  • Train classification ML models on 311 narratives
  • Prototype agent pipelines that summarize city activity
  • Test geocoding and address-normalization toolchains
  • Validate civic-tech product hypotheses with live data


❓ Frequently Asked Questions

🧩 How does it work?

Paste the Socrata 4x4 ID of any SF dataset, optionally add SoQL filters and maxItems, click Start, and the Actor pages through the SODA API and emits the records verbatim with three appended metadata fields. No browser automation, no captchas, no setup.

🆔 How do I find a dataset ID?

Browse the catalog at data.sfgov.org. Every dataset URL ends in a 4x4 ID like i98e-djp9 (building permits) or vw6y-z8j6 (311 cases). Paste that ID into the input form.

🗂️ How many datasets are supported?

All 659 datasets currently exposed on data.sfgov.org are enumerated in the input dropdown. New datasets are added by the City regularly; reach out if you need a specific one that isn't yet in the list.

🔍 What is SoQL?

SoQL is Socrata's SQL-like query language for the SODA API. The Actor exposes $where, $select, $order, and $q as input fields. Reference docs: dev.socrata.com. A short cheat sheet: $where=col='value', $order=col DESC, $select=col1,col2, $q=search text.
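
When a $where value comes from user input, quoting matters. The sketch below assumes SoQL follows the SQL convention of doubling single quotes inside string literals; verify that against dev.socrata.com before relying on it. `soql_quote` and `equals` are hypothetical helpers, and the column values are illustrative.

```python
def soql_quote(value):
    """Wrap a string as a SoQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def equals(column, value):
    """Build a simple column='value' comparison."""
    return f"{column}={soql_quote(value)}"


where = " AND ".join([equals("status", "expired"), equals("street_name", "O'Farrell")])
print(where)  # status='expired' AND street_name='O''Farrell'
```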

🧹 Why are some columns missing from the output?

Socrata appends internal :@computed_region_* lookup columns to most datasets. These are noise for downstream analytics, so the Actor strips them automatically. Everything else in the dataset's native schema is passed through verbatim.

🔄 How fresh is the data?

The City of San Francisco updates each dataset on its own cadence (some daily, some weekly, some monthly). Every run of this Actor fetches the latest data available on data.sfgov.org as of run time.

🚫 Why did I get a 401 or 403 error?

A small number of datasets are private and require Socrata authentication. The Actor will return a clean {error: ...} record indicating which one. Public datasets work without any credentials.
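
If you process results in code, it can help to separate error records from data rows before loading them anywhere. The exact shape of the error record is not documented here, so this sketch assumes only that it carries an `error` key; `split_errors` is a hypothetical helper and the sample message is invented.

```python
def split_errors(items):
    """Separate normal rows from error records (assumed to carry an 'error' key)."""
    rows, errors = [], []
    for item in items:
        (errors if "error" in item else rows).append(item)
    return rows, errors


items = [
    {"permit_number": "201903226060", "_datasetId": "i98e-djp9"},
    {"error": "HTTP 403"},  # invented example message
]
rows, errors = split_errors(items)
print(len(rows), len(errors))
```

If a dataset happens to have a native column named error, a stricter check would be needed.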

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep a downstream database in sync.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

⚖️ Can I reuse the data commercially?

Yes. SF Open Data is published under the City of San Francisco Open Data Policy and is generally free to reuse with attribution. Specific datasets may carry additional notes on their landing page; check before commercial redistribution.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

SF Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get notified when a new record matches your filters
  • Airbyte - Pipe SF datasets into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh SF civic data into your CRM or analytics backend.


💡 Pro Tip: browse the complete ParseForge collection for more public-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the City and County of San Francisco or Tyler Technologies / Socrata. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.