San Francisco Open Data Scraper
Pricing
from $25.72 / 1,000 results
Scrape any San Francisco Open Data dataset via Socrata SODA API. Business registrations, restaurants, permits, parking, 311 calls, evictions and more. No API key required.
Developer: ParseForge (maintained by Community)
🌉 San Francisco Open Data Scraper
🚀 Export any San Francisco Open Data dataset in seconds. Tap 659 published datasets including business registrations, restaurants, building permits, 311 cases, parking citations, evictions, police incidents, and more, via the official Socrata SODA API. No API key, no registration.
🕒 Last updated: 2026-05-13 · 📊 Native dataset schema per record · 🗂️ 659 datasets · 🌉 City and County of San Francisco · 🔌 Socrata SODA API
The San Francisco Open Data Scraper is a universal export tool for every dataset on data.sfgov.org. The City and County of San Francisco publishes 659 datasets covering city operations, public safety, transportation, economy, environment, health, and culture. This Actor lets you pull any of them by passing the Socrata 4x4 dataset ID, optionally adding SoQL filters ($where, $select, $order, $q), and downloading the result as CSV, Excel, JSON, or XML.
The catalog spans every major SF civic data set, including building permits (i98e-djp9), registered businesses (pyih-qa8i), 311 service requests (vw6y-z8j6), parking citations (5cei-gny5), eviction notices (tu7p-pa2g), mobile food permits (rqzj-sfat), police incident reports (wg3w-h783), restaurant inspections, film locations, and historical crime statistics. Output preserves the dataset's native schema and appends three metadata fields: _datasetId, _datasetUrl, and _scrapedAt.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Civic researchers, journalists, prop-tech startups, GIS engineers, data scientists, public health analysts, real-estate firms, urban planners, students | Civic dashboards, FOIA-style export, permit/business/restaurant feeds, eviction and 311 monitoring, journalism investigations, ML training data on municipal events |
📋 What the SF Open Data Scraper does
Pick a dataset, then narrow it with four filtering knobs that map straight to Socrata SoQL:
- 🆔 Dataset selector. Pick any of 659 datasets by 4x4 ID. Find IDs in the URL of any dataset page on data.sfgov.org.
- 🔍 WHERE clause. Standard SoQL $where, e.g. permit_type=3 AND filed_date>'2024-01-01'.
- 📋 SELECT clause. Limit returned columns via $select.
- 📈 ORDER clause. Sort with $order, e.g. filed_date DESC.
- 🔎 Full-text search. Free-text $q across all string columns.
Each record returns the dataset's native columns verbatim (with Socrata's internal :@computed_region_* lookup columns stripped to keep the output clean), plus three appended metadata fields: _datasetId, _datasetUrl, and _scrapedAt. Pagination is automatic and capped at 1,000,000 rows.
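The clean-up described above can be pictured with a minimal Python sketch. It is not the Actor's actual code: the function name `clean_record`, the constant `COMPUTED_PREFIX`, and the sample computed-region column name are assumptions made for illustration. Only the `:@computed_region_` prefix and the three metadata field names come from this page.

```python
from datetime import datetime, timezone

# Prefix of the Socrata lookup columns that the output omits.
COMPUTED_PREFIX = ":@computed_region_"


def clean_record(raw, dataset_id, scraped_at=None):
    """Drop computed-region columns and append the three metadata fields."""
    if scraped_at is None:
        scraped_at = datetime.now(timezone.utc)
    # Keep every native column except the computed-region ones.
    record = {k: v for k, v in raw.items() if not k.startswith(COMPUTED_PREFIX)}
    record["_datasetId"] = dataset_id
    record["_datasetUrl"] = f"https://data.sfgov.org/d/{dataset_id}"
    # Format as UTC with millisecond precision and a trailing "Z".
    stamp = scraped_at.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    record["_scrapedAt"] = stamp.replace("+00:00", "Z")
    return record


if __name__ == "__main__":
    raw = {
        "permit_number": "201903226060",
        ":@computed_region_abcd_1234": "7",  # hypothetical column name
    }
    print(clean_record(raw, "i98e-djp9"))
```

Passing `scraped_at` explicitly keeps the function deterministic, which helps if you re-implement this step in your own pipeline and want to test it.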
💡 Why it matters: San Francisco publishes one of the richest open-data catalogs of any U.S. city, but the SODA API has its own query language, paging quirks, and computed-region noise. This Actor turns that into a clean, paginated export with no Socrata code on your side.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded SF dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| datasetId | enum (4x4) | "3pee-9qhc" | Socrata 4x4 ID. Required. Enumerates all 659 datasets published on data.sfgov.org. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| where | string (SoQL) | "" | Socrata $where filter. |
| select | string (SoQL) | "" | Comma-separated columns to return. |
| order | string (SoQL) | "" | Sort, e.g. filed_date DESC. |
| query | string | "" | Free-text full-text search (Socrata $q). |
Example: up to 1,000 building permits filed after 2026-01-01 with an estimated cost over $1M, newest first.
{"datasetId": "i98e-djp9","maxItems": 1000,"where": "filed_date>'2026-01-01' AND estimated_cost>1000000","order": "filed_date DESC"}
Example: 311 cases mentioning 'graffiti' in the Mission.
{"datasetId": "vw6y-z8j6","maxItems": 500,"query": "graffiti","where": "neighborhoods_sffind_boundaries='Mission'"}
⚠️ Good to Know: the input dataset list contains all 659 datasets currently exposed on data.sfgov.org. A small number are private (require Socrata authentication) and will return an HTTP 401 / 403 error record. Browse the full catalog and find the right 4x4 ID at data.sfgov.org.
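For readers curious how the input fields relate to the underlying API, here is a rough Python sketch of the mapping onto a SODA request URL. It assumes the standard Socrata resource endpoint (`/resource/<4x4>.json`) and the standard `$limit` / `$offset` paging parameters. The function name `build_soda_url` is hypothetical, and the Actor may build its requests differently.

```python
from urllib.parse import urlencode

BASE = "https://data.sfgov.org/resource"

# Actor input field -> SoQL query parameter
FIELD_TO_PARAM = {
    "where": "$where",
    "select": "$select",
    "order": "$order",
    "query": "$q",
}


def build_soda_url(actor_input, limit=1000, offset=0):
    """Translate an Actor-style input dict into a SODA request URL."""
    params = {}
    for field, soql_param in FIELD_TO_PARAM.items():
        value = actor_input.get(field, "")
        if value:  # empty strings mean "no filter", so skip them
            params[soql_param] = value
    params["$limit"] = limit
    params["$offset"] = offset
    return f"{BASE}/{actor_input['datasetId']}.json?{urlencode(params)}"


if __name__ == "__main__":
    example = {
        "datasetId": "i98e-djp9",
        "maxItems": 1000,
        "where": "filed_date>'2026-01-01' AND estimated_cost>1000000",
        "order": "filed_date DESC",
    }
    print(build_soda_url(example))
```

`urlencode` percent-encodes the `$` prefix and the quotes in the filter, so the SoQL text can be pasted into the input exactly as written in the examples above.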
📊 Output
Each record returns the dataset's native schema verbatim (Socrata internal :@computed_region_* columns are stripped) plus three metadata fields. Download as CSV, Excel, JSON, or XML.
🧾 Schema (illustrative for building permits dataset i98e-djp9)
| Field | Type | Example |
|---|---|---|
| 🆔 permit_number | string | "201903226060" |
| 🏗️ permit_type_definition | string | "additions alterations or repairs" |
| 📅 filed_date | ISO 8601 | "2019-03-22T14:35:59.000" |
| 📋 status | string | "expired" |
| 📍 street_number / street_name / street_suffix | string | "760" / "14th" / "St" |
| 📝 description | string | "revision to pa 2017-1120-4452..." |
| 💵 estimated_cost / revised_cost | string (number) | "15000.0" / "97000.0" |
| 🏘️ existing_units / proposed_units | string (number) | "12.0" / "14.0" |
| 📮 zipcode | string | "94114" |
| 🗺️ neighborhoods_analysis_boundaries | string | "Castro/Upper Market" |
| 📍 location | object | {"latitude":"...","longitude":"..."} |
| 🆔 _datasetId | string | "i98e-djp9" |
| 🔗 _datasetUrl | string | "https://data.sfgov.org/d/i98e-djp9" |
| 🕒 _scrapedAt | ISO 8601 | "2026-05-13T10:00:00.000Z" |
Every dataset has its own column set. The Actor passes through whatever Socrata returns for the dataset you picked.
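Because numeric columns arrive as strings (see the "string (number)" rows above), downstream code usually needs a small conversion step. The sketch below is one way to do it for the building permits fields listed in the table. The helper name `coerce_permit` and the choice of fields are illustrative assumptions; other datasets will need their own field lists.

```python
from datetime import datetime

# Fields the schema table marks as "string (number)".
NUMERIC_FIELDS = ("estimated_cost", "revised_cost", "existing_units", "proposed_units")


def coerce_permit(record):
    """Return a copy of a permit record with numbers and dates parsed."""
    out = dict(record)
    for field in NUMERIC_FIELDS:
        if out.get(field) not in (None, ""):
            out[field] = float(out[field])
    if out.get("filed_date"):
        # filed_date has no timezone suffix, e.g. "2019-03-22T14:35:59.000"
        out["filed_date"] = datetime.fromisoformat(out["filed_date"])
    return out


if __name__ == "__main__":
    row = {
        "estimated_cost": "15000.0",
        "filed_date": "2019-03-22T14:35:59.000",
        "zipcode": "94114",
    }
    print(coerce_permit(row))
```

Fields such as zipcode are left as strings on purpose, since they are identifiers rather than quantities.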
📦 Sample record (building permits)
✨ Why choose this Actor
| | Capability |
|---|---|
| 🗂️ | 659 datasets, one Actor. Every public dataset on data.sfgov.org enumerated in the input schema. |
| 🔍 | Full SoQL filtering. $where, $select, $order, $q exposed as input fields. |
| 🧹 | Cleaned output. Socrata :@computed_region_* internal columns stripped automatically. |
| 🔗 | Dataset provenance. Every record stamped with _datasetId, _datasetUrl, _scrapedAt. |
| ⚡ | Fast. 1,000-row pages, automatic pagination up to 1,000,000 rows. |
| 🚫 | No API key. The Socrata SODA API is public and unauthenticated for all public datasets. |
📊 SF's open-data catalog is one of the most cited public-sector data catalogs in the country, powering everything from civic-tech projects to academic research.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ SF Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | All 659 SF datasets | Live per run | full SoQL ($where, $select, $order, $q) | ⚡ 2 min |
| Manual CSV download from data.sfgov.org | Free | One dataset at a time | Snapshot | None | 🐢 Manual |
| Raw Socrata SODA queries | Free | Full | Live | SoQL | 🛠️ Code required |
| Third-party civic-data aggregators | $99+/month | Mixed | Daily | Vendor-defined | ⏳ Hours |
Pick this Actor when you want a clean, filtered export of any SF dataset without writing a single line of Socrata code.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the San Francisco Open Data Scraper page on the Apify Store.
- 🎯 Pick a dataset. Find the 4x4 ID on data.sfgov.org (it's in every dataset URL) and paste it in.
- 🔍 Add optional filters. Type a SoQL $where, $order, $select, or full-text $q if you want a slice.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating SF Open Data Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the apify-client NPM package.
- 🐍 Python. Use the apify-client PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
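As a starting point for the Python route, the sketch below builds an input from the fields in the Input table and runs the Actor through `apify-client`. Treat it as an outline under assumptions. The helper names are made up for this example, and `actor_id` and `token` are placeholders you must supply from the Actor's store page and your Apify account. It also assumes a client version where `call()` returns the run as a dict with a `defaultDatasetId` key, so check the client documentation for the version you install.

```python
def build_run_input(dataset_id, max_items=10, where="", select="", order="", query=""):
    """Assemble the Actor input using the field names from the Input table."""
    return {
        "datasetId": dataset_id,
        "maxItems": max_items,
        "where": where,
        "select": select,
        "order": order,
        "query": query,
    }


def run_scraper(token, actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    permits_input = build_run_input(
        "i98e-djp9", max_items=1000, order="filed_date DESC"
    )
    print(permits_input)
    # items = run_scraper("<YOUR_APIFY_TOKEN>", "<ACTOR_ID>", permits_input)
```

Keeping input construction separate from the network call makes it easy to reuse the same input dict in a scheduled task or a webhook-triggered pipeline.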
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Paste the Socrata 4x4 ID of any SF dataset, optionally add SoQL filters and maxItems, click Start, and the Actor pages through the SODA API and emits the records verbatim with three appended metadata fields. No browser automation, no captchas, no setup.
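The paging step can be sketched as follows. This is an illustration of offset-based paging with a 1,000-row page size and a 1,000,000-row cap, as stated on this page, not the Actor's source. `fetch_page` is a hypothetical callable you would implement with an HTTP client, passing `limit` and `offset` through as SODA's `$limit` and `$offset`.

```python
def paginate(fetch_page, max_items, page_size=1000, hard_cap=1_000_000):
    """Yield up to max_items rows, requesting page_size rows at a time.

    fetch_page(limit, offset) must return a list of rows.
    """
    remaining = min(max_items, hard_cap)
    offset = 0
    while remaining > 0:
        limit = min(page_size, remaining)
        rows = fetch_page(limit, offset)
        yield from rows
        if len(rows) < limit:
            break  # a short page means the dataset is exhausted
        offset += len(rows)
        remaining -= len(rows)


if __name__ == "__main__":
    fake_dataset = list(range(2500))  # stand-in for a remote dataset

    def fake_fetch(limit, offset):
        return fake_dataset[offset:offset + limit]

    print(len(list(paginate(fake_fetch, 2200))))
```

If you page through SODA yourself, Socrata's documentation recommends supplying an explicit sort order so that rows do not shift between pages.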
🆔 How do I find a dataset ID?
Browse the catalog at data.sfgov.org. Every dataset URL ends in a 4x4 ID like i98e-djp9 (building permits) or vw6y-z8j6 (311 cases). Paste that ID into the input form.
🗂️ How many datasets are supported?
All 659 datasets currently exposed on data.sfgov.org are enumerated in the input dropdown. New datasets are added by the City regularly; reach out if you need a specific one that isn't yet in the list.
🔍 What is SoQL?
SoQL is Socrata's SQL-like query language for the SODA API. The Actor exposes $where, $select, $order, and $q as input fields. Reference docs: dev.socrata.com. A short cheat sheet: $where=col='value', $order=col DESC, $select=col1,col2, $q=search text.
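When a filter value comes from user input, it helps to build the $where text programmatically. The sketch below assumes SoQL's convention that text literals use single quotes and that an embedded single quote is escaped by doubling it. The helper names `soql_string` and `where_equals` are hypothetical.

```python
def soql_string(value):
    """Quote a Python string as a SoQL text literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def where_equals(column, value):
    """Build a simple equality filter for the where input field."""
    return f"{column}={soql_string(value)}"


if __name__ == "__main__":
    print(where_equals("neighborhoods_sffind_boundaries", "Mission"))
    print(soql_string("Fisherman's Wharf"))
```

The first call reproduces the filter used in the 311 example in the Input section.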
🧹 Why are some columns missing from the output?
Socrata appends internal :@computed_region_* lookup columns to most datasets. These are noise for downstream analytics, so the Actor strips them automatically. Everything else in the dataset's native schema is passed through verbatim.
🔄 How fresh is the data?
The City of San Francisco updates each dataset on its own cadence (some daily, some weekly, some monthly). Every run of this Actor fetches the latest data available on data.sfgov.org as of run time.
🚫 Why did I get a 401 or 403 error?
A small number of datasets are private and require Socrata authentication. The Actor will return a clean {error: ...} record indicating which one. Public datasets work without any credentials.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep a downstream database in sync.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.
⚖️ Is this data legal to use?
Yes. SF Open Data is published under the City of San Francisco Open Data Policy and is generally free to reuse with attribution. Specific datasets may carry additional notes on their landing page; check before commercial redistribution.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
SF Open Data Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get notified when a new record matches your filters
- Airbyte - Pipe SF datasets into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh SF civic data into your CRM or analytics backend.
🔗 Recommended Actors
- 🌴 Los Angeles Open Data Scraper - Same Socrata pattern, every LA dataset
- 🏗️ Seattle Building Permits Scraper - Seattle DCI permits
- 🏠 Greatschools Scraper - U.S. K-12 school ratings
- 🏛️ James Edition Real Estate Scraper - Luxury international real estate
- 📰 PR Newswire Scraper - Press releases and corporate announcements
💡 Pro Tip: browse the complete ParseForge collection for more public-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the City and County of San Francisco or Tyler Technologies / Socrata. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.