Hong Kong Open Data Scraper

Pricing

from $13.00 / 1,000 result items

Export datasets from data.gov.hk, the Hong Kong government open data portal. Browse the full catalog or fetch specific datasets. Pull titles, organizations, descriptions, tags, update frequency, resource files, formats, licences, and direct download links.

Developer

ParseForge

Maintained by Community

🇭🇰 Hong Kong Open Data Scraper

🚀 Export the Hong Kong government open data catalog in seconds. Browse 4,000+ datasets from data.gov.hk, filter by keyword or fetch by ID, and pull every resource file, format, licence, and download link. No login, no manual catalog crawl.

🕒 Last updated: 2026-05-23 · 📊 19 fields per record · 📂 4,000+ datasets · 🏛️ All HK gov departments · 📥 Direct download links

The Hong Kong Open Data Scraper queries the official data.gov.hk portal and returns 19 structured fields per dataset, including title, publishing organization, description, tags, update frequency, licence, maintainer contact, every downloadable resource with its format, and the canonical catalog page. The portal is the HK government's central open-data hub and aggregates data from all bureaus and departments.

The catalog covers transport, weather, planning, environment, health, education, demographics, real estate, statistics, public safety, and more, across more than 4,000 datasets contributed by every major HK department. This Actor turns the catalog into a clean CSV, Excel, JSON, or XML feed in under five minutes.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Southeast Asia real-estate analysts, urban researchers, fintechs operating in HK, smart-city teams, journalists, transit planners, civic-tech builders | Real-estate trend monitoring, transit feed ingestion, weather and pollution dashboards, regulatory data automation, civic transparency apps, HK market research |

📋 What the Hong Kong Open Data Scraper does

Two operating modes:

  • 📑 Catalog mode. Walk every dataset in the portal, optionally filtered by a case-insensitive keyword on the dataset slug.
  • 🎯 Dataset mode. Fetch a single dataset by its slug ID (e.g. aahk-team1-flight-info) for targeted ingestion.
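Catalog mode's keyword filter can be pictured as a case-insensitive substring test against each dataset slug. The sketch below only illustrates that behavior and is not the Actor's source code; the function name `filter_slugs` and the two `example-dept-*` slugs are hypothetical.

```python
from typing import Iterable, List


def filter_slugs(slugs: Iterable[str], search_query: str = "", max_items: int = 10) -> List[str]:
    """Return up to max_items slugs that contain search_query, ignoring case.

    An empty query matches every slug, which mirrors walking the whole catalog.
    """
    needle = search_query.lower()
    matched: List[str] = []
    for slug in slugs:
        if len(matched) >= max_items:
            break
        if needle in slug.lower():
            matched.append(slug)
    return matched


# Hypothetical slugs for illustration; only the first one appears in this README.
sample_slugs = [
    "aahk-team1-flight-info",
    "example-dept-Weather-forecast",
    "example-dept-air-quality",
]
print(filter_slugs(sample_slugs, "weather"))
```

Because the match is on the slug rather than the title or description, a keyword that only appears in a dataset's title would not be found by this kind of filter.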

Each record includes the dataset ID, title (English and Traditional Chinese where available), publishing organization, free-text description, topic tags, dataset groups, update frequency, licence string, maintainer name and email, metadata creation and modification timestamps, and the full list of downloadable resources with their file format and direct download URL.

💡 Why it matters: HK's open-data portal exposes everything from MTR rider counts to land-registry pulls to typhoon-track forecasts. Building a recurring ingest against it means parsing the catalog API, walking pagination, and threading resource lookups by hand. This Actor does all of that and returns ready-to-load rows.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded catalog feed.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Datasets to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "catalog" | catalog walks the portal; dataset fetches one record. |
| searchQuery | string | "weather" | Substring matched against dataset slugs in catalog mode. Empty walks the entire catalog. |
| datasetId | string | "" | Dataset slug. Used only when mode is dataset. |

Example: 50 weather-related datasets.

{
  "maxItems": 50,
  "mode": "catalog",
  "searchQuery": "weather"
}

Example: fetch the Hong Kong Airport flight info dataset.

{
  "mode": "dataset",
  "datasetId": "aahk-team1-flight-info"
}
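
When inputs are generated by a script rather than typed into the form, a small builder can catch mode mistakes before a run starts. This is a minimal sketch based only on the input table above; `build_input` is a hypothetical helper, and it assumes that an omitted `searchQuery` falls back to the listed default of "weather", which is why it always sends the key in catalog mode.

```python
from typing import Any, Dict

VALID_MODES = ("catalog", "dataset")


def build_input(mode: str = "catalog", max_items: int = 10,
                search_query: str = "", dataset_id: str = "") -> Dict[str, Any]:
    """Build an input object using the field names from the input table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")

    run_input: Dict[str, Any] = {"mode": mode, "maxItems": max_items}
    if mode == "catalog":
        # Sent even when empty so that an empty query walks the entire catalog
        # (assumption: leaving the key out would apply the "weather" default).
        run_input["searchQuery"] = search_query
    else:
        if not dataset_id:
            raise ValueError("dataset mode needs a datasetId slug")
        run_input["datasetId"] = dataset_id
    return run_input


print(build_input("catalog", 50, "weather"))
print(build_input("dataset", dataset_id="aahk-team1-flight-info"))
```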

⚠️ Good to Know: HK datasets are published in many formats including CSV, JSON, XML, GeoJSON, KMZ, PDF, and XLS. The resources array carries each file with its format string and a direct download link so you can wire up format-specific downstream parsers.
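
One way to wire up format-specific parsers is to group each record's download links by the format string first. The sketch below assumes each resource carries the `name`, `format`, and `url` keys shown in the schema example; `group_urls_by_format` and the `example.com` URLs are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_urls_by_format(record: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map each upper-cased format string to the download URLs that use it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for resource in record.get("resources") or []:
        # Normalise case so "csv" and "CSV" land in the same bucket;
        # a missing format goes to "UNKNOWN" instead of being dropped.
        fmt = (resource.get("format") or "UNKNOWN").upper()
        url = resource.get("url")
        if url:
            grouped[fmt].append(url)
    return dict(grouped)


# Illustrative record; the URLs are placeholders, not real portal links.
record = {
    "datasetId": "aahk-team1-flight-info",
    "resources": [
        {"name": "arrivals.json", "format": "JSON", "url": "https://example.com/arrivals.json"},
        {"name": "departures.csv", "format": "csv", "url": "https://example.com/departures.csv"},
        {"name": "notes.pdf", "format": None, "url": "https://example.com/notes.pdf"},
    ],
}
print(group_urls_by_format(record))
```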


📊 Output

Each dataset record contains 19 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🏷️ logoUrl | string or null | "https://data.gov.hk/.../org_logo.png" |
| 🆔 datasetId | string | "aahk-team1-flight-info" |
| 📚 title | string | "Real-time Flight Information" |
| 🏛️ organization | string | "aahk" |
| 🏢 organizationTitle | string | "Airport Authority Hong Kong" |
| 📝 description | string | "Real-time flight arrival and departure data..." |
| 🏷️ tags | array | ["aviation", "transport", "real-time"] |
| 📦 groups | array | ["transportation"] |
| 🔢 numResources | number | 4 |
| 📥 resources | array | [{"name":"arrivals.json","format":"JSON","url":"..."}] |
| 🔄 updateFrequency | string or null | "real-time" |
| ⚖️ license | string | "Public Sector Information Licence" |
| 📖 dataDictionaryUrl | string or null | "https://www.hongkongairport.com/.../spec.pdf" |
| 📅 metadataCreated | string | "2018-03-15T08:42:13.000Z" |
| 🔁 metadataModified | string | "2026-05-10T03:00:00.000Z" |
| 👤 maintainer | string or null | "Airport Authority Hong Kong" |
| 📧 maintainerEmail | string or null | "opendata@hkairport.com" |
| 🔗 url | string | "https://data.gov.hk/en-data/dataset/aahk-team1-flight-info" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
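
Before loading records into a warehouse, it can help to check each one against the 19 field names listed in the schema. The sketch below is one possible check, not part of the Actor; `missing_fields`, `unexpected_nulls`, and `parse_timestamp` are hypothetical helpers, and the nullable set is taken from the Type column above.

```python
from datetime import datetime
from typing import Any, Dict, List

EXPECTED_FIELDS = (
    "logoUrl", "datasetId", "title", "organization", "organizationTitle",
    "description", "tags", "groups", "numResources", "resources",
    "updateFrequency", "license", "dataDictionaryUrl", "metadataCreated",
    "metadataModified", "maintainer", "maintainerEmail", "url", "scrapedAt",
)
# Fields whose type in the schema table allows null.
NULLABLE_FIELDS = {"logoUrl", "updateFrequency", "dataDictionaryUrl",
                   "maintainer", "maintainerEmail"}


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """List schema fields that are absent from the record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def unexpected_nulls(record: Dict[str, Any]) -> List[str]:
    """List fields that are present and None although the schema does not allow null."""
    return [name for name in EXPECTED_FIELDS
            if name in record and record[name] is None and name not in NULLABLE_FIELDS]


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps such as "2018-03-15T08:42:13.000Z" into aware datetimes."""
    # Older Python versions reject a trailing "Z", so use an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Partial record built from the schema examples above.
partial = {"datasetId": "aahk-team1-flight-info", "title": None,
           "metadataCreated": "2018-03-15T08:42:13.000Z", "maintainer": None}
print(len(missing_fields(partial)), unexpected_nulls(partial))
print(parse_timestamp(partial["metadataCreated"]).isoformat())
```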


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🏛️ | Whole-of-government coverage. Every HK bureau and department publishes here. |
| 🎯 | Catalog or single-dataset. Two modes cover bulk audits and targeted ingestion. |
| 📥 | Direct file links. Each resource carries name, format, and download URL ready to pipe into a worker. |
| ⚖️ | Licence and contact metadata. Maintainer name, email, and licence string per record for compliance. |
| | Fast. 10 datasets in under 5 seconds, 4,000 in under 5 minutes. |
| 🔁 | Always fresh. Live catalog reads on every run. |
| 🚫 | No authentication. Public portal, no key required. |

📊 The HK open-data catalog is the single source of truth for civic, environmental, and economic data in Hong Kong.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ Hong Kong Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | 4,000+ datasets | Live per run | catalog / single ID | ⚡ 2 min |
| Commercial HK market data | $1,000+/month | Curated slice | Daily | Vendor-specific | 🐢 Days |
| Custom catalog crawler | Free engineering | Full | Cron driven | Hand built | ⏳ Weeks |
| Per-department site visits | Free | Per-dept only | Manual | UI only | 🕒 Painful |

Pick this Actor when you want a clean, filterable feed of the entire HK government open-data catalog with zero parser maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Hong Kong Open Data Scraper page on the Apify Store.
  3. 🎯 Set input. Pick catalog or dataset mode. In catalog mode add an optional keyword. In dataset mode supply a slug.
  4. 🚀 Run it. Click Start and let the Actor collect records.
  5. 📥 Download. Grab your results from the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from sign-up to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏘️ Real Estate & Property Tech

  • Track HK land sale records and tender outcomes
  • Map planning approvals to neighborhood demand
  • Monitor public housing stock and waitlists
  • Enrich listing platforms with district demographics

🚇 Transit & Logistics

  • Real-time MTR, bus, and ferry feed ingestion
  • HK airport arrival and departure boards in product
  • Cargo and trade port data for logistics dashboards
  • Routing engines enriched with HK traffic flows

💼 Fintech & Compliance

  • Company registry and licence-status checks
  • Regulatory dataset monitoring for KYC pipelines
  • Currency, statistics, and economic indicator feeds
  • HK-specific compliance and reporting automation

🌦️ Environment & Smart City

  • Air-quality dashboards using EPD feeds
  • Real-time weather and typhoon track ingestion
  • Energy and utility consumption dashboards
  • Public-safety and emergency notification feeds

🔌 Automating Hong Kong Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Daily refreshes keep a downstream warehouse aligned with new dataset publications and metadata updates.
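
A programmatic run with the Python client might look like the sketch below. It assumes the dict-style run object returned by apify-client 1.x; `fetch_records` is a hypothetical wrapper, and the Actor ID is a placeholder that has to be copied from the Actor's Apify Store page.

```python
from typing import Any, Dict, List

# Placeholder Actor ID; this README does not state the real one.
ACTOR_ID = "<username>/<actor-name>"


def fetch_records(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # call() blocks until the run ends and returns the run details.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("The Actor run did not return run details")
    # Every run writes its results to a default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (needs `pip install apify-client` and a real API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_API_TOKEN>")
#   records = fetch_records(client, ACTOR_ID,
#                           {"mode": "catalog", "searchQuery": "weather", "maxItems": 50})
```

Passing the client in as an argument keeps the function easy to test with a stub and leaves token handling to the caller.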


🌟 Beyond business use cases

Open civic data powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Urban planning and policy theses on HK datasets
  • Reproducible studies with cited dataset pulls
  • Open-data exercises for university courses
  • Comparative open-government research across cities

🎨 Personal and creative

  • Side-project dashboards for HK residents
  • Visualizations of typhoon tracks or pollution trends
  • Hobbyist apps that surface bus or ferry timings
  • Static-site mashups of public datasets

🤝 Non-profit and civic

  • Watchdog tracking of public spending and land tenders
  • Refugee and migrant resource maps
  • Community advocacy with transit accessibility data
  • Free educational materials on local government data

🧪 Experimentation

  • Train forecasting models on HK weather feeds
  • Prototype civic chatbots backed by real datasets
  • Test data-pipeline templates against varied formats
  • Benchmark ETL tools on a real multi-format catalog


❓ Frequently Asked Questions

🧩 How does it work?

Pick catalog mode for a portal-wide walk or dataset mode for a single record. Set keyword or ID, click Start. The Actor returns rows with full metadata and every resource file's direct download link.

📏 How complete is the metadata?

The HK portal is maintained by the Digital Policy Office (formerly the Office of the Government Chief Information Officer). Most fields are populated by the publishing department. Some smaller departments leave optional fields like dataDictionaryUrl empty.

🔁 How often is the catalog refreshed?

Datasets are added and updated continuously by HK government departments. Every Actor run hits the live portal, so new datasets and metadata edits appear immediately.
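
Since every run reads the live catalog, comparing two runs shows what changed in between. The sketch below compares snapshots by `datasetId` and `metadataModified`; `diff_snapshots` is a hypothetical helper, and the sample records are made up for illustration.

```python
from typing import Any, Dict, List


def diff_snapshots(previous: List[Dict[str, Any]],
                   current: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Return dataset IDs that were added, updated, or removed between two runs."""
    old_modified = {r["datasetId"]: r["metadataModified"] for r in previous}
    current_ids = {r["datasetId"] for r in current}

    added, updated = [], []
    for record in current:
        dataset_id = record["datasetId"]
        if dataset_id not in old_modified:
            added.append(dataset_id)
        elif record["metadataModified"] != old_modified[dataset_id]:
            updated.append(dataset_id)

    removed = sorted(set(old_modified) - current_ids)
    return {"added": sorted(added), "updated": sorted(updated), "removed": removed}


# Made-up snapshots for illustration.
previous = [
    {"datasetId": "a", "metadataModified": "2026-05-01T00:00:00.000Z"},
    {"datasetId": "b", "metadataModified": "2026-05-01T00:00:00.000Z"},
]
current = [
    {"datasetId": "a", "metadataModified": "2026-05-10T03:00:00.000Z"},
    {"datasetId": "c", "metadataModified": "2026-05-09T00:00:00.000Z"},
]
print(diff_snapshots(previous, current))
```

The "removed" list is only meaningful when both runs used the same searchQuery and a maxItems high enough to cover every match; otherwise a dataset can look removed simply because it fell outside the cap.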

🌐 Are titles available in Traditional Chinese?

Yes. Bilingual title support is part of the portal. Many records expose English and Traditional Chinese titles, and the description field is also frequently bilingual.

📥 What formats do the resource files come in?

CSV, JSON, XML, GeoJSON, KMZ, PDF, XLS, and more. The resources array carries the format string per file so you can route to the right downstream parser.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to trigger this Actor on any cron interval (hourly, daily, weekly) and keep a downstream warehouse in sync.

⚖️ Is the data free to reuse?

Yes. HK government data is published under the Public Sector Information Licence, which permits non-commercial and most commercial reuse with attribution. Always check the per-dataset license field.

💼 Can I use this data commercially?

In most cases yes, under the HK PSI Licence with attribution. A handful of datasets carry additional terms, so honor the license string on each record.
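
Honoring the license string on each record can be partly automated by flagging records whose value differs from the one you expect. The sketch below uses the licence string from the schema example as the expected value; `needs_licence_review` is a hypothetical helper, and a flagged record only means a human should read its terms.

```python
from typing import Any, Dict, List

# Licence string taken from the schema example in this README.
EXPECTED_LICENCE = "Public Sector Information Licence"


def needs_licence_review(records: List[Dict[str, Any]],
                         expected: str = EXPECTED_LICENCE) -> List[str]:
    """Return dataset IDs whose license string is missing or differs from the expected one."""
    target = expected.strip().casefold()
    flagged = []
    for record in records:
        licence = (record.get("license") or "").strip().casefold()
        if licence != target:
            flagged.append(record["datasetId"])
    return flagged


# Made-up records for illustration.
records = [
    {"datasetId": "a", "license": "Public Sector Information Licence"},
    {"datasetId": "b", "license": "Custom terms"},
    {"datasetId": "c", "license": None},
]
print(needs_licence_review(records))
```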

💳 Do I need a paid Apify plan to use this Actor?

No. The free plan covers testing and small runs (10 records per run). A paid plan unlocks the higher cap, scheduling, and concurrency.

🔁 What happens if a run fails or gets interrupted?

Apify retries transient errors automatically. If a run still fails, inspect the log, fix the input, and restart. Partial datasets are preserved.

🆘 What if I need help?

Our support team is here. Use the Apify platform messaging or the Tally form linked below.


🔌 Integrate with any app

Hong Kong Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe HK open data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh catalog records into your ingest pipeline or alert your data team in Slack.
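
On the receiving side, a webhook handler mostly needs the ID of the dataset that the finished run produced. The sketch below assumes Apify's default webhook payload, where `eventType` names the event and `resource` holds the run object; `dataset_id_from_webhook` is a hypothetical helper, and a custom payload template would need different key lookups.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the run's default dataset ID for a succeeded run, else None."""
    payload = json.loads(body)
    # Ignore failed, aborted, or timed-out runs.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Trimmed example body; the ID is a made-up placeholder.
example_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"status": "SUCCEEDED", "defaultDatasetId": "exampleDatasetId"},
})
print(dataset_id_from_webhook(example_body))
```

The returned ID can then be handed to an ingest job that downloads the items and loads them into the pipeline.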


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the Government of the Hong Kong Special Administrative Region or any of its departments. All trademarks mentioned are the property of their respective owners. Only publicly available catalog data is collected, under the Hong Kong PSI Licence.