data.gov.my Scraper - Malaysia Statistics & Open Data
Pricing
Pay per event
Scrape data.gov.my, Malaysia's official open-data portal. Browse 280+ government datasets or pull time-series for specific ones with date filtering. Covers weekly fuel prices, CPI/PPI, labour stats, EPF, industrial production, and population data from DOSM, BNM, MOF and others. Clean JSON. No auth.
Developer
BowTiedRaccoon
Maintained by Community
Scrapes data.gov.my, Malaysia's official government open-data portal. Returns dataset metadata and time-series observations from DOSM, Bank Negara Malaysia, Ministry of Finance, Ministry of Health, and 40+ other agencies — weekly fuel prices, monthly CPI, labour stats, EPF demographics, industrial production, population by state, and 280+ more datasets, all in clean JSON.
data.gov.my Scraper Features
- Two modes — browse the full dataset catalogue (280+ entries with metadata) or pull time-series observations for specific datasets
- Date range filtering — narrow any dataset to a specific window using startDate/endDate
- Publisher filtering — limit catalogue browse to DOSM, BNM, MOF, MOH, or KKMM
- Covers the data people actually want — weekly RON95/RON97/diesel fuel prices (the ones Malaysian media cites every Thursday), monthly CPI/PPI, LFS labour force, EPF statistics, IPI industrial production, and population tables
- No auth, no proxy — the API is public and responds without credentials or residential IP tricks
- Multi-dataset runs — supply a list of dataset IDs and the scraper pulls each in sequence
Who Uses Malaysia Government Statistics?
- Malaysian media and research firms — pull the canonical fuel-price or CPI numbers that NST, The Edge, and Malaysiakini reference in their reporting
- Corporate planning teams — update FY models with the latest CPI/PPI/IPI without manually downloading CSV exports
- Fintech and consumer apps — surface weekly RON95 cap prices that Petronas and industry trackers use
- ESG and sustainability reporters — pull DOSM climate-station data and emissions-adjacent indicators
- Academia and think-tanks — build long-horizon socioeconomic panels from DOSM's population, labour, and health datasets
- Developers and data engineers — automate ingestion of government data into internal pipelines without managing the portal's Next.js interface
How data.gov.my Scraper Works
- Set your mode. catalogue_browse lists all 280+ available datasets with publisher, frequency, and tags. dataset_pull fetches actual observations for one or more dataset IDs you specify.
- Optionally filter. In catalogue mode, narrow by publisher (DOSM, BNM, MOF, etc.). In dataset mode, pass a date range and the scraper filters to matching observation rows.
- The scraper fetches. Catalogue mode parses the Next.js page for the embedded dataset collection. Dataset mode calls the REST API at api.data.gov.my with pagination, stopping when the page returns fewer records than the page size.
- You get structured JSON. Each record carries dataset_id, row_date, row_data (all measure columns), and a direct source_url back to data.gov.my.
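The pagination and date-filter steps above can be sketched roughly as follows. This is an illustration, not the actor's source: the `fetch_page(offset, limit)` callable, the `date` key on raw rows, and the in-memory `fake_fetch` stand-in are all assumptions made so the example runs without network access.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Optional

Row = dict  # one raw observation, e.g. {"date": "2025-01-09", "value": 1}


def paginate(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator[Row]:
    """Yield rows page by page; a page shorter than page_size ends the loop."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size


def in_window(row: Row, start: Optional[str] = None, end: Optional[str] = None) -> bool:
    """Inclusive filter; zero-padded YYYY-MM-DD strings sort chronologically."""
    day = row["date"]  # assumed key name on the raw row
    return (start is None or day >= start) and (end is None or day <= end)


# Hypothetical stand-in for the remote API: 250 daily rows starting 2025-01-01.
ALL_ROWS = [
    {"date": (date(2025, 1, 1) + timedelta(days=i)).isoformat(), "value": i}
    for i in range(250)
]
calls = []  # records the offset of every page request


def fake_fetch(offset: int, limit: int) -> list:
    calls.append(offset)
    return ALL_ROWS[offset:offset + limit]


rows = [r for r in paginate(fake_fetch, page_size=100)
        if in_window(r, "2025-02-01", "2025-02-28")]
```

With 250 rows and a page size of 100, the third page is short, so the loop stops after three requests. If the total were an exact multiple of the page size, this stopping rule would cost one extra request that returns an empty page.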
Input
Dataset pull (fuel prices, specific year):
{"mode": "dataset_pull","datasetIds": ["fuelprice"],"startDate": "2025-01-01","endDate": "2025-12-31","maxItems": 100}
Catalogue browse (all DOSM datasets):
{"mode": "catalogue_browse","publisherFilter": "DOSM","maxItems": 500}
| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | dataset_pull | catalogue_browse or dataset_pull |
| datasetIds | string[] | ["fuelprice", "cpi_headline"] | Dataset IDs to pull (dataset_pull mode only) |
| startDate | string | — | Filter to observations on or after this date (YYYY-MM-DD) |
| endDate | string | — | Filter to observations on or before this date (YYYY-MM-DD) |
| publisherFilter | string | any | Limit catalogue to one publisher: DOSM, BNM, MOF, MOH, KKMM, or any |
| maxItems | integer | 10 | Maximum number of records to return |
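If you build the input programmatically, a small helper can catch bad values before a run starts. The sketch below is one possible approach: `build_input` is a hypothetical helper, not part of the actor, and it assumes that fields irrelevant to the chosen mode can simply be left out.

```python
from datetime import date
from typing import Optional

VALID_MODES = {"catalogue_browse", "dataset_pull"}
VALID_PUBLISHERS = {"any", "DOSM", "BNM", "MOF", "MOH", "KKMM"}


def build_input(mode: str = "dataset_pull",
                dataset_ids: Optional[list] = None,
                start_date: Optional[str] = None,
                end_date: Optional[str] = None,
                publisher_filter: str = "any",
                max_items: int = 10) -> dict:
    """Validate the documented fields and return a run-input dict."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if publisher_filter not in VALID_PUBLISHERS:
        raise ValueError(f"unknown publisher: {publisher_filter!r}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    # fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    if start and end and start > end:
        raise ValueError("startDate must not be after endDate")

    run_input = {"mode": mode, "maxItems": max_items}
    if mode == "dataset_pull":
        # Defaults mirror the table above.
        run_input["datasetIds"] = list(dataset_ids or ["fuelprice", "cpi_headline"])
        if start:
            run_input["startDate"] = start.isoformat()
        if end:
            run_input["endDate"] = end.isoformat()
    else:
        # Assumption: publisherFilter only matters in catalogue mode.
        run_input["publisherFilter"] = publisher_filter
    return run_input


fuel_2025 = build_input(dataset_ids=["fuelprice"], start_date="2025-01-01",
                        end_date="2025-12-31", max_items=100)
dosm_catalogue = build_input(mode="catalogue_browse", publisher_filter="DOSM",
                             max_items=500)
```

The two calls at the bottom reproduce the example inputs shown earlier in this section.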
data.gov.my Scraper Output Fields
Dataset Pull Mode
One record per observation row. The row_data field carries all measure columns for that dataset as a JSON string — for fuelprice that's ron95, ron97, diesel; for cpi that's the index components; and so on.
{"dataset_id": "fuelprice","dataset_title": null,"publisher": null,"frequency": null,"last_updated": null,"description": null,"tags": null,"row_date": "2025-01-09","row_data": "{\"ron95\":2.05,\"ron97\":3.47,\"diesel\":2.15,\"series_type\":\"level\"}","unit": null,"source_url": "https://data.gov.my/data-catalogue/fuelprice","scraped_at": "2025-01-15T08:30:00.000Z"}
| Field | Type | Description |
|---|---|---|
| dataset_id | string | Canonical dataset ID (e.g. fuelprice, cpi_headline) |
| dataset_title | string | Human-readable dataset title |
| publisher | string | Publishing agency (populated in catalogue mode) |
| frequency | string | Update frequency (populated in catalogue mode) |
| last_updated | string | Last refresh ISO date (populated in catalogue mode) |
| description | string | Catalogue description (populated in catalogue mode) |
| tags | string | Categorical tags, pipe-separated (populated in catalogue mode) |
| row_date | string | Observation date (YYYY-MM-DD) |
| row_data | string | All measure columns as a JSON string |
| unit | string | Measure unit |
| source_url | string | Source page URL on data.gov.my |
| scraped_at | string | Scrape timestamp ISO |
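Because row_data arrives as a JSON string rather than a nested object, downstream code usually needs to decode it before use. A minimal sketch, using the sample fuelprice record from above; `flatten_record` is a hypothetical helper name, not something the actor provides.

```python
import json


def flatten_record(record: dict) -> dict:
    """Decode row_data and merge its measure columns next to the key fields."""
    raw = record.get("row_data")
    measures = json.loads(raw) if raw else {}  # catalogue records have null row_data
    return {
        "dataset_id": record["dataset_id"],
        "row_date": record.get("row_date"),
        **measures,
    }


# Trimmed copy of the sample dataset-pull record shown above.
sample = {
    "dataset_id": "fuelprice",
    "row_date": "2025-01-09",
    "row_data": "{\"ron95\":2.05,\"ron97\":3.47,\"diesel\":2.15,\"series_type\":\"level\"}",
}
flat = flatten_record(sample)
```

Here `flat` holds ron95, ron97, diesel, and series_type as top-level keys, which is convenient for loading into a table or DataFrame.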
Catalogue Browse Mode
One record per dataset. Observation fields (row_date, row_data, unit) are null — this mode returns metadata only.
{"dataset_id": "fuelprice","dataset_title": "Weekly Retail Fuel Prices","publisher": "KPDN","frequency": "weekly","last_updated": "2025-01-09","description": "Weekly retail fuel prices for RON95, RON97, and diesel in Malaysia.","tags": "prices, fuel, energy","row_date": null,"row_data": null,"unit": null,"source_url": "https://data.gov.my/data-catalogue/fuelprice","scraped_at": "2025-01-15T08:30:00.000Z"}
🔍 FAQ
How do I find valid dataset IDs?
Run the scraper in catalogue_browse mode first. It returns all 280+ datasets with their IDs, publishers, and descriptions. The IDs you see there — fuelprice, cpi_headline, lfs_month, epf_demographics — are what you pass to datasetIds in dataset pull mode.
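As a rough illustration of that two-step workflow, the sketch below filters catalogue records and feeds the resulting IDs into a dataset pull input. `pick_dataset_ids` is a hypothetical helper, and the second catalogue record is made up for the example.

```python
from typing import Iterable, Optional


def pick_dataset_ids(catalogue: Iterable[dict],
                     publisher: Optional[str] = None,
                     frequency: Optional[str] = None) -> list:
    """Return dataset_id values whose metadata matches the given filters."""
    ids = []
    for record in catalogue:
        if publisher and record.get("publisher") != publisher:
            continue
        if frequency and record.get("frequency") != frequency:
            continue
        ids.append(record["dataset_id"])
    return ids


# The first record mirrors the catalogue sample in this README;
# the second is illustrative only.
catalogue = [
    {"dataset_id": "fuelprice", "publisher": "KPDN", "frequency": "weekly"},
    {"dataset_id": "cpi_headline", "publisher": "DOSM", "frequency": "monthly"},
]

weekly_ids = pick_dataset_ids(catalogue, frequency="weekly")
run_input = {"mode": "dataset_pull", "datasetIds": weekly_ids, "maxItems": 100}
```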
How much does data.gov.my Scraper cost to run?
At the default pay-per-event (PPE) rate, a typical dataset pull (100–500 rows) costs fractions of a cent. Full fuel price history (900+ weekly records) runs in seconds. The API is free and fast, so costs are dominated by Apify's minimum start fee rather than record count.
Does data.gov.my Scraper need proxies?
No. The data.gov.my API is public and doesn't block by geography or IP. A standard browser User-Agent header is sufficient. No residential proxies, no CAPTCHA.
What datasets does data.gov.my actually have?
Quite a few. Weekly fuel prices, monthly CPI/PPI/IPI, quarterly GDP components, annual population by state and district, monthly labour force survey, EPF and SOCSO statistics, electricity tariff data, climate station readings, road accidents, crime by state, health indicators, education enrollment. Run catalogue browse to see the full list — it expands as agencies publish new datasets.
Can I pull multiple datasets in one run?
Yes. Pass a list to datasetIds: ["fuelprice", "cpi_headline", "lfs_month"]. The scraper fetches each in sequence and combines the output. The dataset_id field on each record tells you which dataset it came from.
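Since a multi-dataset run returns one combined list, you may want to split it back into per-dataset series. A small sketch under that assumption; `group_by_dataset` is a hypothetical helper and the records are illustrative, not real observations.

```python
from collections import defaultdict


def group_by_dataset(records: list) -> dict:
    """Bucket records by dataset_id, each bucket sorted by row_date."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["dataset_id"]].append(record)
    for rows in grouped.values():
        # YYYY-MM-DD strings sort chronologically as plain text.
        rows.sort(key=lambda r: r["row_date"])
    return dict(grouped)


# Illustrative records only; real ones carry the full field set.
records = [
    {"dataset_id": "fuelprice", "row_date": "2025-01-16"},
    {"dataset_id": "cpi_headline", "row_date": "2025-01-01"},
    {"dataset_id": "fuelprice", "row_date": "2025-01-09"},
]
grouped = group_by_dataset(records)
```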
Need More Features?
Need additional metadata fields, webhook delivery for weekly fuel updates, or a specific output format? File a feature request or contact the team.
Why Use data.gov.my Scraper?
- First and only — no competitors on Apify target the data.gov.my v2 API, which launched in 2023 and keeps adding datasets.
- Covers the numbers that get cited — fuel prices, CPI, labour stats, and EPF data are the canonical government figures that Malaysian media, corporates, and academics reference. Programmatic access to the same source, without the CSV downloads.
- No friction — public API, no auth, no proxy, no CAPTCHA. You configure it once and it runs.