data.gov.my Scraper - Malaysia Statistics & Open Data

Scrape data.gov.my, Malaysia's official open-data portal. Browse 280+ government datasets or pull time-series for specific ones with date filtering. Covers weekly fuel prices, CPI/PPI, labour stats, EPF, industrial production, and population data from DOSM, BNM, MOF and others. Clean JSON. No auth.

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

Scrapes data.gov.my, Malaysia's official government open-data portal. Returns dataset metadata and time-series observations from DOSM, Bank Negara Malaysia, Ministry of Finance, Ministry of Health, and 40+ other agencies: weekly fuel prices, monthly CPI, labour stats, EPF demographics, industrial production, population by state, and the rest of the 280+ dataset catalogue, all in clean JSON.


data.gov.my Scraper Features

  • Two modes — browse the full dataset catalogue (280+ entries with metadata) or pull time-series observations for specific datasets
  • Date range filtering — narrow any dataset to a specific window using startDate / endDate
  • Publisher filtering — limit catalogue browse to DOSM, BNM, MOF, MOH, or KKMM
  • Covers the data people actually want — weekly RON95/RON97/diesel fuel prices (the ones Malaysian media cites every Thursday), monthly CPI/PPI, LFS labour force, EPF statistics, IPI industrial production, and population tables
  • No auth, no proxy — the API is public and responds without credentials or residential IP tricks
  • Multi-dataset runs — supply a list of dataset IDs and the scraper pulls each in sequence

Who Uses Malaysia Government Statistics?

  • Malaysian media and research firms — pull the canonical fuel-price or CPI numbers that NST, The Edge, and Malaysiakini reference in their reporting
  • Corporate planning teams — update FY models with the latest CPI/PPI/IPI without manually downloading CSV exports
  • Fintech and consumer apps — surface weekly RON95 cap prices that Petronas and industry trackers use
  • ESG and sustainability reporters — pull DOSM climate-station data and emissions-adjacent indicators
  • Academia and think-tanks — build long-horizon socioeconomic panels from DOSM's population, labour, and health datasets
  • Developers and data engineers — automate ingestion of government data into internal pipelines without managing the portal's Next.js interface

How data.gov.my Scraper Works

  1. Set your mode. catalogue_browse lists all 280+ available datasets with publisher, frequency, and tags. dataset_pull fetches actual observations for one or more dataset IDs you specify.
  2. Optionally filter. In catalogue mode, narrow by publisher (DOSM, BNM, MOF, etc.). In dataset mode, pass a date range and the scraper filters to matching observation rows.
  3. The scraper fetches. Catalogue mode parses the Next.js page for the embedded dataset collection. Dataset mode calls the REST API at api.data.gov.my with pagination, stopping when the page returns fewer records than the page size.
  4. You get structured JSON. Each record carries dataset_id, row_date, row_data (all measure columns), and a direct source_url back to data.gov.my.
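
The fetch-and-filter steps above can be sketched roughly as follows. This is an illustrative outline, not the actor's actual code: `fetch_page` is a hypothetical callable standing in for one API request, and the way a page is addressed (offset plus limit) and the name of the date column (`date`) are assumptions.

```python
from datetime import date
from typing import Callable, Iterator, List, Optional


def paginate(fetch_page: Callable[[int, int], List[dict]],
             page_size: int = 100) -> Iterator[dict]:
    """Yield rows page by page, stopping on the first short page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        # A page with fewer rows than requested means there is nothing left.
        if len(page) < page_size:
            break
        offset += page_size


def in_range(row: dict, start: Optional[str], end: Optional[str],
             date_key: str = "date") -> bool:
    """Inclusive YYYY-MM-DD window check (date_key is an assumed column name)."""
    observed = date.fromisoformat(row[date_key])
    if start and observed < date.fromisoformat(start):
        return False
    if end and observed > date.fromisoformat(end):
        return False
    return True


def pull(fetch_page: Callable[[int, int], List[dict]],
         start: Optional[str] = None, end: Optional[str] = None,
         max_items: int = 10, page_size: int = 100) -> List[dict]:
    """Collect up to max_items rows that fall inside the date window."""
    matched: List[dict] = []
    for row in paginate(fetch_page, page_size):
        if in_range(row, start, end):
            matched.append(row)
            if len(matched) >= max_items:
                break
    return matched
```

Passing the fetcher in as a callable keeps the loop independent of any particular HTTP client, so the stopping rule can be exercised against an in-memory list.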

Input

Dataset pull (fuel prices, specific year):

{
  "mode": "dataset_pull",
  "datasetIds": ["fuelprice"],
  "startDate": "2025-01-01",
  "endDate": "2025-12-31",
  "maxItems": 100
}

Catalogue browse (all DOSM datasets):

{
  "mode": "catalogue_browse",
  "publisherFilter": "DOSM",
  "maxItems": 500
}

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | dataset_pull | catalogue_browse or dataset_pull |
| datasetIds | string[] | ["fuelprice", "cpi_headline"] | Dataset IDs to pull (dataset_pull mode only) |
| startDate | string | | Filter to observations on or after this date (YYYY-MM-DD) |
| endDate | string | | Filter to observations on or before this date (YYYY-MM-DD) |
| publisherFilter | string | any | Limit catalogue to one publisher: DOSM, BNM, MOF, MOH, KKMM, or any |
| maxItems | integer | 10 | Maximum number of records to return |
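
If you build the input programmatically, a small helper can catch typos before a run starts. The sketch below mirrors the fields and defaults listed above; the function name `build_input` and the specific checks (for example rejecting `maxItems` below 1, or a start date after the end date) are assumptions for illustration, not documented actor behaviour.

```python
from datetime import date
from typing import List, Optional

MODES = {"catalogue_browse", "dataset_pull"}
PUBLISHERS = {"DOSM", "BNM", "MOF", "MOH", "KKMM", "any"}


def build_input(mode: str = "dataset_pull",
                dataset_ids: Optional[List[str]] = None,
                start_date: Optional[str] = None,
                end_date: Optional[str] = None,
                publisher_filter: str = "any",
                max_items: int = 10) -> dict:
    """Return an input dict for the actor, raising ValueError on bad values."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if publisher_filter not in PUBLISHERS:
        raise ValueError(f"publisherFilter must be one of {sorted(PUBLISHERS)}")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")  # assumed lower bound

    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(start_date) if start_date else None
    end = date.fromisoformat(end_date) if end_date else None
    if start and end and start > end:
        raise ValueError("startDate must not be after endDate")

    run_input: dict = {"mode": mode, "maxItems": max_items}
    if mode == "dataset_pull":
        # Fall back to the default IDs listed in the input table.
        run_input["datasetIds"] = (
            list(dataset_ids) if dataset_ids else ["fuelprice", "cpi_headline"]
        )
        if start:
            run_input["startDate"] = start.isoformat()
        if end:
            run_input["endDate"] = end.isoformat()
    else:
        run_input["publisherFilter"] = publisher_filter
    return run_input
```

The returned dict can be serialized with `json.dumps` and pasted into the actor's input editor, or passed to whichever client you use to start runs.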

data.gov.my Scraper Output Fields

Dataset Pull Mode

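One record per observation row. The row_data field carries all measure columns for that dataset, serialized as a JSON string, so it needs to be parsed before use. For fuelprice the columns are ron95, ron97, and diesel (plus a series_type marker, as in the sample below); for CPI datasets they are the index components; and so on.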

{
  "dataset_id": "fuelprice",
  "dataset_title": null,
  "publisher": null,
  "frequency": null,
  "last_updated": null,
  "description": null,
  "tags": null,
  "row_date": "2025-01-09",
  "row_data": "{\"ron95\":2.05,\"ron97\":3.47,\"diesel\":2.15,\"series_type\":\"level\"}",
  "unit": null,
  "source_url": "https://data.gov.my/data-catalogue/fuelprice",
  "scraped_at": "2025-01-15T08:30:00.000Z"
}

| Field | Type | Description |
|---|---|---|
| dataset_id | string | Canonical dataset ID (e.g. fuelprice, cpi_headline) |
| dataset_title | string | Human-readable dataset title |
| publisher | string | Publishing agency (populated in catalogue mode) |
| frequency | string | Update frequency (populated in catalogue mode) |
| last_updated | string | Last refresh ISO date (populated in catalogue mode) |
| description | string | Catalogue description (populated in catalogue mode) |
| tags | string | Categorical tags, pipe-separated (populated in catalogue mode) |
| row_date | string | Observation date (YYYY-MM-DD) |
| row_data | string | All measure columns as a JSON string |
| unit | string | Measure unit |
| source_url | string | Source page URL on data.gov.my |
| scraped_at | string | Scrape timestamp ISO |
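
Because row_data arrives as a JSON string rather than a nested object, most consumers will want to decode it first. A minimal sketch, assuming records shaped like the sample above (the helper name `flatten_record` is made up for illustration):

```python
import json


def flatten_record(record: dict) -> dict:
    """Merge the decoded row_data columns with the record's ID and date."""
    raw = record.get("row_data")
    # Catalogue-mode records have row_data set to null, so tolerate None.
    measures = json.loads(raw) if raw else {}
    return {
        "dataset_id": record["dataset_id"],
        "row_date": record.get("row_date"),
        **measures,
    }


sample = {
    "dataset_id": "fuelprice",
    "row_date": "2025-01-09",
    "row_data": '{"ron95":2.05,"ron97":3.47,"diesel":2.15,"series_type":"level"}',
}
flat = flatten_record(sample)
print(flat["row_date"], flat["ron95"], flat["diesel"])
```

The flattened dicts can be passed straight to `csv.DictWriter` or a dataframe constructor, one dataset at a time, since each dataset has its own column set.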

Catalogue Browse Mode

One record per dataset. Observation fields (row_date, row_data, unit) are null — this mode returns metadata only.

{
  "dataset_id": "fuelprice",
  "dataset_title": "Weekly Retail Fuel Prices",
  "publisher": "KPDN",
  "frequency": "weekly",
  "last_updated": "2025-01-09",
  "description": "Weekly retail fuel prices for RON95, RON97, and diesel in Malaysia.",
  "tags": "prices, fuel, energy",
  "row_date": null,
  "row_data": null,
  "unit": null,
  "source_url": "https://data.gov.my/data-catalogue/fuelprice",
  "scraped_at": "2025-01-15T08:30:00.000Z"
}

🔍 FAQ

How do I find valid dataset IDs?

Run the scraper in catalogue_browse mode first. It returns all 280+ datasets with their IDs, publishers, and descriptions. The IDs you see there — fuelprice, cpi_headline, lfs_month, epf_demographics — are what you pass to datasetIds in dataset pull mode.
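
One way to turn a catalogue_browse result into a datasetIds list is to filter the records locally. The sketch below assumes records with the catalogue-mode fields described earlier; the function names are hypothetical, and splitting tags on either a pipe or a comma is a defensive assumption rather than documented behaviour.

```python
import re
from typing import Iterable, List, Optional


def split_tags(tags: Optional[str]) -> List[str]:
    """Split a tags string on pipes or commas and trim whitespace."""
    if not tags:
        return []
    return [t.strip() for t in re.split(r"[|,]", tags) if t.strip()]


def select_dataset_ids(catalogue: Iterable[dict],
                       publisher: Optional[str] = None,
                       frequency: Optional[str] = None,
                       tag: Optional[str] = None) -> List[str]:
    """Return dataset IDs whose metadata matches every filter that is set."""
    ids: List[str] = []
    for rec in catalogue:
        if publisher and rec.get("publisher") != publisher:
            continue
        if frequency and rec.get("frequency") != frequency:
            continue
        if tag and tag not in split_tags(rec.get("tags")):
            continue
        ids.append(rec["dataset_id"])
    return ids
```

The resulting list can be dropped into the datasetIds field of a dataset_pull input.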

How much does data.gov.my Scraper cost to run?

At the default pay-per-event (PPE) rate, a typical dataset pull (100–500 rows) costs fractions of a cent. Full fuel price history (900+ weekly records) runs in seconds. The API is free and fast, so costs are dominated by Apify's minimum start fee rather than record count.

Does data.gov.my Scraper need proxies?

No. The data.gov.my API is public and doesn't block by geography or IP. A standard browser User-Agent header is sufficient. No residential proxies, no CAPTCHA.
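
To illustrate how little a request needs, here is a sketch that builds (but does not send) a plain GET request with a browser-style User-Agent using only the standard library. The endpoint path and the `id` and `limit` query parameters are assumptions about the api.data.gov.my API, and the User-Agent string is just an example value.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint path on the public API host mentioned above.
API_BASE = "https://api.data.gov.my/data-catalogue"

# Example browser-style User-Agent; any common browser string should do.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(dataset_id: str, limit: int = 100) -> Request:
    """Build an unauthenticated GET request for one dataset."""
    query = urlencode({"id": dataset_id, "limit": limit})
    return Request(
        f"{API_BASE}?{query}",
        headers={"User-Agent": USER_AGENT, "Accept": "application/json"},
    )


req = build_request("fuelprice", limit=50)
print(req.get_method(), req.full_url)
```

Sending it is then a matter of passing `req` to `urllib.request.urlopen` and decoding the JSON body; no API key, cookie, or proxy configuration is involved.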

What datasets does data.gov.my actually have?

Quite a few: weekly fuel prices, monthly CPI/PPI/IPI, quarterly GDP components, annual population by state and district, the monthly labour force survey, EPF and SOCSO statistics, electricity tariff data, climate station readings, road accidents, crime by state, health indicators, and education enrolment. Run catalogue_browse to see the full list, which expands as agencies publish new datasets.

Can I pull multiple datasets in one run?

Yes. Pass a list to datasetIds: ["fuelprice", "cpi_headline", "lfs_month"]. The scraper fetches each in sequence and combines the output. The dataset_id field on each record tells you which dataset it came from.
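
Since a multi-dataset run returns one combined list, a common first step is to split it back up by dataset_id. A minimal sketch, assuming records shaped like the dataset pull output (the helper name is hypothetical):

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_dataset(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Bucket records by dataset_id and sort each bucket by row_date."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        groups[rec["dataset_id"]].append(rec)
    for rows in groups.values():
        # YYYY-MM-DD strings sort chronologically; missing dates sort first.
        rows.sort(key=lambda r: r.get("row_date") or "")
    return dict(groups)
```

Each bucket then holds a single dataset's time series in date order, which is convenient because the row_data columns differ between datasets.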


Need More Features?

Need additional metadata fields, webhook delivery for weekly fuel updates, or a specific output format? File a feature request or contact the team.

Why Use data.gov.my Scraper?

  • First and only — no competitors on Apify target the data.gov.my v2 API, which launched in 2023 and keeps adding datasets.
  • Covers the numbers that get cited: fuel prices, CPI, labour stats, and EPF data are the canonical government figures that Malaysian media, corporates, and academics reference. The scraper gives programmatic access to the same source, without the CSV downloads.
  • No friction — public API, no auth, no proxy, no CAPTCHA. You configure it once and it runs.