Hong Kong Open Data Scraper
Pricing
from $13.00 / 1,000 result items
Export datasets from data.gov.hk, the Hong Kong government open data portal. Browse the full catalog or fetch specific datasets. Pull titles, organizations, descriptions, tags, update frequency, resource files, formats, licences, and direct download links.
Developer
ParseForge
Maintained by Community

🇭🇰 Hong Kong Open Data Scraper
🚀 Export the Hong Kong government open data catalog in seconds. Browse 4,000+ datasets from data.gov.hk, filter by keyword or fetch by ID, and pull every resource file, format, licence, and download link. No login, no manual catalog crawl.
🕒 Last updated: 2026-05-23 · 📊 19 fields per record · 📂 4,000+ datasets · 🏛️ All HK gov departments · 📥 Direct download links
The Hong Kong Open Data Scraper queries the official data.gov.hk portal and returns 19 structured fields per dataset, including title, publishing organization, description, tags, update frequency, licence, maintainer contact, every downloadable resource with its format, and the canonical catalog page. The portal is the HK government's central open-data hub and aggregates data from all bureaus and departments.
The catalog covers transport, weather, planning, environment, health, education, demographics, real estate, statistics, public safety, and more, across more than 4,000 datasets contributed by every major HK department. This Actor turns the catalog into a clean CSV, Excel, JSON, or XML feed in under five minutes.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Asia-Pacific real-estate analysts, urban researchers, fintechs operating in HK, smart-city teams, journalists, transit planners, civic-tech builders | Real-estate trend monitoring, transit feed ingestion, weather and pollution dashboards, regulatory data automation, civic transparency apps, HK market research |
📋 What the Hong Kong Open Data Scraper does
Two operating modes:
- 📑 Catalog mode. Walk every dataset in the portal, optionally filtered by a case-insensitive keyword on the dataset slug.
- 🎯 Dataset mode. Fetch a single dataset by its slug ID (e.g. aahk-team1-flight-info) for targeted ingestion.
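The catalog-mode keyword filter described above can be pictured as a case-insensitive substring test on each slug. The following is a minimal sketch of that behavior, not the Actor's own code. `filter_slugs` is an illustrative helper and the second slug is made up.

```python
def filter_slugs(slugs: list[str], query: str = "") -> list[str]:
    """Keep slugs that contain `query`, ignoring case; an empty query keeps all."""
    needle = query.lower()
    if not needle:
        return list(slugs)
    return [slug for slug in slugs if needle in slug.lower()]


# The second slug is a hypothetical example, not a real dataset ID.
slugs = ["aahk-team1-flight-info", "example-weather-forecast"]
print(filter_slugs(slugs, "WEATHER"))
```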
Each record includes the dataset ID, title (English and Traditional Chinese where available), publishing organization, free-text description, topic tags, dataset groups, update frequency, licence string, maintainer name and email, metadata creation and modification timestamps, and the full list of downloadable resources with their file format and direct download URL.
💡 Why it matters: HK's open-data portal exposes everything from MTR rider counts to land-registry records to typhoon-track forecasts. Building a recurring ingest against it means parsing the catalog API, walking pagination, and threading resource lookups by hand. This Actor does all of that and returns ready-to-load rows.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded catalog feed.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Datasets to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "catalog" | catalog walks the portal; dataset fetches one record. |
| searchQuery | string | "weather" | Substring matched against dataset slugs in catalog mode. Empty walks the entire catalog. |
| datasetId | string | "" | Dataset slug. Used only when mode is dataset. |
Example: 50 weather-related datasets.
{"maxItems": 50,"mode": "catalog","searchQuery": "weather"}
Example: fetch the Hong Kong Airport flight info dataset.
{"mode": "dataset","datasetId": "aahk-team1-flight-info"}
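If you generate inputs from code, a small builder can catch mistakes before a run starts. This sketch uses only the four fields from the input table. The `build_input` helper and its local checks are assumptions for illustration, and the Actor may validate input differently.

```python
import json


def build_input(mode: str = "catalog", search_query: str = "",
                dataset_id: str = "", max_items: int = 10) -> dict:
    """Build an input object from the fields listed in the input table."""
    if mode not in ("catalog", "dataset"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "dataset":
        # datasetId is only used in dataset mode, so require it here.
        if not dataset_id:
            raise ValueError("dataset mode needs a datasetId slug")
        return {"mode": "dataset", "datasetId": dataset_id}
    return {"maxItems": max_items, "mode": "catalog", "searchQuery": search_query}


print(json.dumps(build_input("catalog", "weather", max_items=50)))
print(json.dumps(build_input("dataset", dataset_id="aahk-team1-flight-info")))
```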
⚠️ Good to Know: HK datasets are published in many formats including CSV, JSON, XML, GeoJSON, KMZ, PDF, and XLS. The resources array carries each file with its format string and a direct download link so you can wire up format-specific downstream parsers.
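One way to route files to format-specific parsers is to group each record's resources by format first. A minimal sketch, assuming the `name`, `format`, and `url` keys shown in the schema example. The URLs are placeholders, and upper-casing the format string is a choice made here, not documented Actor behavior.

```python
from collections import defaultdict


def group_by_format(record: dict) -> dict[str, list[str]]:
    """Map each upper-cased format string to the download URLs that use it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for resource in record.get("resources") or []:
        fmt = (resource.get("format") or "UNKNOWN").strip().upper()
        url = resource.get("url")
        if url:  # skip resources that carry no download link
            grouped[fmt].append(url)
    return dict(grouped)


# Placeholder record; the URLs are not real download links.
record = {
    "datasetId": "aahk-team1-flight-info",
    "resources": [
        {"name": "arrivals.json", "format": "JSON", "url": "https://example.com/arrivals.json"},
        {"name": "notes.pdf", "format": "pdf", "url": "https://example.com/notes.pdf"},
    ],
}
print(group_by_format(record))
```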
📊 Output
Each dataset record contains 19 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🏷️ logoUrl | string or null | "https://data.gov.hk/.../org_logo.png" |
| 🆔 datasetId | string | "aahk-team1-flight-info" |
| 📚 title | string | "Real-time Flight Information" |
| 🏛️ organization | string | "aahk" |
| 🏢 organizationTitle | string | "Airport Authority Hong Kong" |
| 📝 description | string | "Real-time flight arrival and departure data..." |
| 🏷️ tags | array | ["aviation", "transport", "real-time"] |
| 📦 groups | array | ["transportation"] |
| 🔢 numResources | number | 4 |
| 📥 resources | array | [{"name":"arrivals.json","format":"JSON","url":"..."}] |
| 🔄 updateFrequency | string or null | "real-time" |
| ⚖️ license | string | "Public Sector Information Licence" |
| 📖 dataDictionaryUrl | string or null | "https://www.hongkongairport.com/.../spec.pdf" |
| 📅 metadataCreated | string | "2018-03-15T08:42:13.000Z" |
| 🔁 metadataModified | string | "2026-05-10T03:00:00.000Z" |
| 👤 maintainer | string or null | "Airport Authority Hong Kong" |
| 📧 maintainerEmail | string or null | "opendata@hkairport.com" |
| 🔗 url | string | "https://data.gov.hk/en-data/dataset/aahk-team1-flight-info" |
| 🕒 scrapedAt | string (ISO 8601) | "2026-05-23T00:00:00.000Z" |
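The timestamp fields arrive as strings, so a loader usually parses them before comparing dates. This sketch assumes every timestamp follows the `...000Z` shape shown in the examples above. `parse_ts` and `modified_since` are illustrative helpers, and the second record is made up.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as "2026-05-10T03:00:00.000Z" into an aware datetime."""
    # Older Python versions reject a trailing "Z", so spell out the UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def modified_since(records: list[dict], cutoff: datetime) -> list[dict]:
    """Return records whose metadataModified is at or after `cutoff`."""
    return [r for r in records if parse_ts(r["metadataModified"]) >= cutoff]


records = [
    {"datasetId": "aahk-team1-flight-info", "metadataModified": "2026-05-10T03:00:00.000Z"},
    {"datasetId": "example-old-dataset", "metadataModified": "2018-03-15T08:42:13.000Z"},  # hypothetical
]
recent = modified_since(records, datetime(2026, 1, 1, tzinfo=timezone.utc))
print([r["datasetId"] for r in recent])
```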
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 🏛️ | Whole-of-government coverage. Every HK bureau and department publishes here. |
| 🎯 | Catalog or single-dataset. Two modes cover bulk audits and targeted ingestion. |
| 📥 | Direct file links. Each resource carries name, format, and download URL ready to pipe into a worker. |
| ⚖️ | Licence and contact metadata. Maintainer name, email, and licence string per record for compliance. |
| ⚡ | Fast. 10 datasets in under 5 seconds, 4,000 in under 5 minutes. |
| 🔁 | Always fresh. Live catalog reads on every run. |
| 🚫 | No authentication. Public portal, no key required. |
📊 The HK open-data catalog is the single source of truth for civic, environmental, and economic data in Hong Kong.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ Hong Kong Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | 4,000+ datasets | Live per run | catalog / single ID | ⚡ 2 min |
| Commercial HK market data | $1,000+/month | Curated slice | Daily | Vendor-specific | 🐢 Days |
| Custom catalog crawler | Free engineering | Full | Cron driven | Hand built | ⏳ Weeks |
| Per-department site visits | Free | Per-dept only | Manual | UI only | 🕒 Painful |
Pick this Actor when you want a clean, filterable feed of the entire HK government open-data catalog with zero parser maintenance.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the Hong Kong Open Data Scraper page on the Apify Store.
- 🎯 Set input. Pick catalog or dataset mode. In catalog mode add an optional keyword. In dataset mode supply a slug.
- 🚀 Run it. Click Start and let the Actor collect records.
- 📥 Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Hong Kong Open Data Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the apify-client NPM package.
- 🐍 Python. Use the apify-client PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily refreshes keep a downstream warehouse aligned with new dataset publications and metadata updates.
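A programmatic run with the Python client might look like the sketch below. The function takes an already-built client so it can be exercised without network access. The Actor ID is deliberately left as a parameter because this page does not state it, so copy it from the Actor's store page.

```python
def fetch_records(client, actor_id: str, run_input: dict) -> list[dict]:
    """Run the Actor once and return every item from its default dataset.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (needs `pip install apify-client`, an API token, and the real Actor ID):
#   import os
#   from apify_client import ApifyClient
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   rows = fetch_records(client, "<username>/<actor-name>",
#                        {"maxItems": 50, "mode": "catalog", "searchQuery": "weather"})
```

Passing the client in, rather than creating it inside the function, keeps the token handling in one place and makes the function easy to test with a stub.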
🌟 Beyond business use cases
Open civic data powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
🧩 How does it work?
Pick catalog mode for a portal-wide walk or dataset mode for a single record. Set keyword or ID, click Start. The Actor returns rows with full metadata and every resource file's direct download link.
📏 How complete is the metadata?
The HK portal is maintained by the Digital Policy Office (formerly the Office of the Government Chief Information Officer). Most fields are populated by the publishing department. Some smaller departments leave optional fields like dataDictionaryUrl empty.
🔁 How often is the catalog refreshed?
Datasets are added and updated continuously by HK government departments. Every Actor run hits the live portal, so new datasets and metadata edits appear immediately.
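Because every run reads the live portal, comparing two runs is one way to spot new or edited datasets. This is a minimal sketch, assuming both runs were exported as lists of records with the datasetId and metadataModified fields. `diff_runs` is an illustrative helper and the "example-new-dataset" slug is made up.

```python
def diff_runs(previous: list[dict], current: list[dict]) -> dict[str, list[str]]:
    """Compare two runs by datasetId and metadataModified."""
    before = {r["datasetId"]: r["metadataModified"] for r in previous}
    after = {r["datasetId"]: r["metadataModified"] for r in current}
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(k for k in shared if before[k] != after[k]),
    }


yesterday = [{"datasetId": "aahk-team1-flight-info", "metadataModified": "2026-05-09T03:00:00.000Z"}]
today = [
    {"datasetId": "aahk-team1-flight-info", "metadataModified": "2026-05-10T03:00:00.000Z"},
    {"datasetId": "example-new-dataset", "metadataModified": "2026-05-10T04:00:00.000Z"},  # hypothetical
]
print(diff_runs(yesterday, today))
```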
🌐 Are titles available in Traditional Chinese?
Yes. Bilingual title support is part of the portal. Many records expose English and Traditional Chinese titles, and the description field is also frequently bilingual.
📥 What formats do the resource files come in?
CSV, JSON, XML, GeoJSON, KMZ, PDF, XLS, and more. The resources array carries the format string per file so you can route to the right downstream parser.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to trigger this Actor on any cron interval (hourly, daily, weekly) and keep a downstream warehouse in sync.
⚖️ Is this data legal to use?
Yes. HK government data is published under the Public Sector Information Licence, which permits non-commercial and most commercial reuse with attribution. Always check the per-dataset license field.
💼 Can I use this data commercially?
In most cases yes, under the HK PSI Licence with attribution. A handful of datasets carry additional terms, so honor the license string on each record.
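To honor the license string in a pipeline, one option is to separate records with a licence you have already reviewed from those needing a manual check. A sketch under the assumption that an exact string match is enough. The allowlist below contains only the licence string from the schema example, and the second record is made up.

```python
def split_by_licence(records: list[dict], allowed: set[str]) -> tuple[list[dict], list[dict]]:
    """Partition records into (cleared, needs_review) by their license string."""
    cleared, needs_review = [], []
    for record in records:
        target = cleared if record.get("license") in allowed else needs_review
        target.append(record)
    return cleared, needs_review


allowed = {"Public Sector Information Licence"}
records = [
    {"datasetId": "aahk-team1-flight-info", "license": "Public Sector Information Licence"},
    {"datasetId": "example-special-terms", "license": "Custom terms"},  # hypothetical record
]
cleared, needs_review = split_by_licence(records, allowed)
print(len(cleared), len(needs_review))
```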
💳 Do I need a paid Apify plan to use this Actor?
No. The free plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.
🔁 What happens if a run fails or gets interrupted?
Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.
🆘 What if I need help?
Our support team is here. Use the Apify platform messaging or the Tally form linked below.
🔌 Integrate with any app
Hong Kong Open Data Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe HK open data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh catalog records into your ingest pipeline or alert your data team in Slack.
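A webhook receiver mostly needs the ID of the dataset that the finished run produced. The sketch below assumes Apify's default webhook payload shape, with an `eventType` string and a `resource` object holding the run's `defaultDatasetId`. If you customize the payload template, adjust the keys. The IDs in the sample are placeholders.

```python
from typing import Optional


def dataset_id_from_webhook(payload: dict) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Placeholder payload; real payloads carry more fields.
sample = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run-placeholder", "defaultDatasetId": "dataset-placeholder"},
}
print(dataset_id_from_webhook(sample))
```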
🔗 Recommended Actors
- 🌍 OurAirports Scraper - Global airport database
- 📈 Indexmundi Scraper - Global demographic and economic indicators
- 🦋 GBIF Biodiversity Scraper - Global biodiversity occurrences
- 🏛️ Library of Congress Scraper - 170M+ digitized cultural records
- 🌦️ NOAA Weather Scraper - US weather observations and forecasts
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Government of the Hong Kong Special Administrative Region or any of its departments. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected, under the Hong Kong PSI Licence.