data.gouv.fr Scraper

Pricing

from $3.00 / 1,000 results

Scrape the French government open-data portal (data.gouv.fr). Search datasets by keyword, fetch full dataset details by ID/slug, list datasets by organization, and search organizations and reuses - with titles, descriptions, resources/formats, licenses, temporal coverage and metrics.

Developer

Crawler Bros

Maintained by Community

Scrape data.gouv.fr, the official French government open-data portal — no account, no API key, no cookies. Search tens of thousands of public datasets, pull full dataset metadata, browse everything an organization publishes, and discover the reuses (apps, APIs, visualizations, articles) built on top of French public data. Fast, structured JSON straight from the public data.gouv.fr API.

Ideal for open-data research, data journalism, civic tech, competitive analysis, dataset discovery, and building data catalogs.

What this actor does

  • Five modes: searchDatasets, datasetDetails, byOrganization, searchOrganizations, searchReuses
  • Flexible lookup: search by keyword, fetch by dataset ID / slug / full URL, or list every dataset of an organization
  • Reuse discovery: find apps, APIs, visualizations and articles, filterable by topic and type
  • Rich metadata: resources & file formats, licenses, temporal coverage, tags, and engagement metrics (views, reuses, followers, downloads)
  • Automatic pagination with cross-page de-duplication, so a request for N records returns up to N unique records (fewer only when the matching results run out)
  • Empty fields are omitted — you never get nulls
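The pagination and de-duplication behaviour described in the list above can be pictured with a minimal Python sketch. It is not the actor's actual code: `collect_unique`, `fake_fetch` and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for real API responses.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


def collect_unique(fetch_page: Callable[[int], List[Record]], max_items: int) -> Iterator[Record]:
    """Yield up to max_items records, skipping ids already seen on earlier pages."""
    seen = set()
    emitted = 0
    page = 1
    while emitted < max_items:
        records = fetch_page(page)
        if not records:  # an empty page means the results are exhausted
            return
        for record in records:
            if record["id"] in seen:
                continue  # duplicate surfaced again on a later page
            seen.add(record["id"])
            yield record
            emitted += 1
            if emitted >= max_items:
                return
        page += 1


# Hypothetical stand-in for the API: "b" appears on both page 1 and page 2.
FAKE_PAGES = {1: [{"id": "a"}, {"id": "b"}], 2: [{"id": "b"}, {"id": "c"}]}


def fake_fetch(page: int) -> List[Record]:
    return FAKE_PAGES.get(page, [])


unique_ids = [r["id"] for r in collect_unique(fake_fetch, max_items=10)]
print(unique_ids)  # ['a', 'b', 'c']
```

Tracking seen ids across pages is what lets the cap count unique records rather than raw rows.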

Output per record

Every record is a single JSON object with top-level fields; dataset records additionally nest a resources[] array. recordType, id, url, sourceUrl and scrapedAt appear on every record.

Dataset (searchDatasets, datasetDetails, byOrganization)

  • id, title, slug, description
  • license, frequency, accessType
  • qualityScore (0–1 metadata-quality score), featured (present when true), badges[] (e.g. hvd, spd)
  • organization, organizationId, organizationUrl
  • temporalCoverageStart, temporalCoverageEnd, spatialGranularity, spatialZones[]
  • tags[]
  • createdAt, lastModified, lastUpdate
  • resourceCount, formats[], resources[] — each resource has title, description, format, url, latestUrl, mime, type, fileType, filesize, checksum, createdAt, lastModified
  • views, followers, reuses, downloads
  • url — portal page · sourceUrl — API endpoint
  • recordType: "dataset", scrapedAt

Organization (searchOrganizations)

  • id, name, slug, acronym, description
  • businessNumberId (SIREN), badges[]
  • datasetsCount, reusesCount, dataservicesCount, followers, members, views
  • createdAt, lastModified
  • logoUrl, url, sourceUrl
  • recordType: "organization", scrapedAt

Reuse (searchReuses)

  • id, title, slug, description
  • type, topic
  • organization, organizationId
  • datasetsCount, tags[]
  • views, followers
  • createdAt, lastModified
  • imageUrl — reuse cover image
  • externalUrl — link to the reuse itself · url — portal page · sourceUrl — API endpoint
  • recordType: "reuse", scrapedAt
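Because empty fields are omitted rather than set to null, code that consumes these records should not assume optional keys exist. The following Python sketch shows one way to read them defensively; `summarize` is a hypothetical helper, and the fallback strings are arbitrary choices rather than values the actor emits.

```python
from typing import Any, Dict


def summarize(record: Dict[str, Any]) -> str:
    """Build a one-line summary, tolerating any optional field being absent."""
    # Datasets and reuses carry "title"; organizations carry "name".
    label = record.get("title") or record.get("name") or record["id"]
    license_id = record.get("license", "n/a")
    views = record.get("views")
    views_text = "views unknown" if views is None else f"{views} views"
    return f"[{record['recordType']}] {label} ({license_id}, {views_text})"


# Illustrative records: the first borrows values from the sample dataset record,
# the second is a made-up organization with most fields omitted.
dataset = {"recordType": "dataset", "id": "53ba5b91a3a729219b7beae9",
           "title": "Transport", "license": "cc-zero", "views": 3993}
organization = {"recordType": "organization", "id": "org-1", "name": "Example org"}

print(summarize(dataset))       # [dataset] Transport (cc-zero, 3993 views)
print(summarize(organization))  # [organization] Example org (n/a, views unknown)
```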

Sample dataset record

{
  "recordType": "dataset",
  "id": "53ba5b91a3a729219b7beae9",
  "title": "Transport",
  "slug": "transport",
  "description": "Réseau de transport en commun …",
  "license": "cc-zero",
  "frequency": "unknown",
  "accessType": "open",
  "qualityScore": 0.444,
  "organization": "Mairie de Monacia d'Aullène",
  "organizationId": "5b9e...",
  "organizationUrl": "https://www.data.gouv.fr/organizations/...",
  "tags": ["transport", "mobilité"],
  "createdAt": "2014-07-07T09:00:00+00:00",
  "lastUpdate": "2014-09-02T15:44:46.643000+00:00",
  "resourceCount": 2,
  "formats": ["csv", "gtfs"],
  "resources": [
    {
      "title": "Arrêts",
      "format": "csv",
      "url": "https://…/stops.csv",
      "latestUrl": "https://www.data.gouv.fr/api/1/datasets/r/…",
      "filesize": 102400,
      "checksum": "f0a7cc05…",
      "createdAt": "2014-07-07T10:34:25+00:00"
    }
  ],
  "views": 3993,
  "downloads": 488,
  "url": "https://www.data.gouv.fr/datasets/transport",
  "sourceUrl": "https://www.data.gouv.fr/api/1/datasets/transport/",
  "scrapedAt": "2026-07-02T13:25:30+00:00"
}
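A record shaped like the sample above can be mined for download links with a few lines of Python. This is a minimal sketch: `resource_urls` is a hypothetical helper, and the example.org URLs are placeholders rather than real resource locations.

```python
from typing import Any, Dict, List


def resource_urls(record: Dict[str, Any], fmt: str) -> List[str]:
    """Return the url of every resource whose format matches fmt (case-insensitive)."""
    wanted = fmt.lower()
    return [
        res["url"]
        for res in record.get("resources", [])  # resources may be omitted entirely
        if res.get("format", "").lower() == wanted and "url" in res
    ]


# Placeholder record following the documented field names.
record = {
    "recordType": "dataset",
    "id": "demo",
    "resources": [
        {"title": "Arrêts", "format": "csv", "url": "https://example.org/stops.csv"},
        {"title": "Feed", "format": "gtfs", "url": "https://example.org/feed.zip"},
        {"title": "No link", "format": "csv"},
    ],
}

print(resource_urls(record, "CSV"))  # ['https://example.org/stops.csv']
```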

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchDatasets | searchDatasets / datasetDetails / byOrganization / searchOrganizations / searchReuses |
| query | string | transport | Free-text keyword for the search modes (leave empty to browse all) |
| datasetIds | array | | Dataset IDs, slugs or full URLs (mode=datasetDetails) |
| organization | string | | Organization slug, ID or full URL (mode=byOrganization) |
| topic | string | | Reuse topic filter (mode=searchReuses) |
| reuseType | string | | Reuse type filter: API, Application, Visualization, … (mode=searchReuses) |
| format | string | | Only datasets offering this file format (CSV, JSON, GeoJSON, …) (dataset modes) |
| license | string | | Only datasets under this license (CC-BY, ODbL, Licence Ouverte, …) (dataset modes) |
| badge | string | | Only datasets with a quality badge: HVD (High Value Dataset) or SPD (dataset modes) |
| tag | string | | Only datasets/reuses carrying this exact tag |
| featured | boolean | false | Only editorially featured datasets/reuses |
| sort | string | -reuses | Newest, recently updated, most reused, most followed, or most viewed |
| maxItems | integer | 50 | Hard cap on emitted records (1–2000) |
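Before starting a run, the input can be sanity-checked on the caller's side against the rules listed above. The Python sketch below is a client-side pre-check only, not the actor's own validation. `check_input` is a hypothetical helper, and treating datasetIds and organization as mandatory for their modes is an assumption drawn from the field descriptions.

```python
from typing import Any, Dict, List

MODES = {"searchDatasets", "datasetDetails", "byOrganization",
         "searchOrganizations", "searchReuses"}


def check_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the input looks fine."""
    problems: List[str] = []
    mode = run_input.get("mode", "searchDatasets")  # documented default
    if mode not in MODES:
        problems.append(f"unknown mode: {mode!r}")
    # Assumption: these two modes are meaningless without their lookup field.
    if mode == "datasetDetails" and not run_input.get("datasetIds"):
        problems.append("datasetDetails needs a non-empty datasetIds array")
    if mode == "byOrganization" and not run_input.get("organization"):
        problems.append("byOrganization needs an organization slug, ID or URL")
    max_items = run_input.get("maxItems", 50)  # documented default
    if isinstance(max_items, bool) or not isinstance(max_items, int) or not 1 <= max_items <= 2000:
        problems.append("maxItems must be an integer between 1 and 2000")
    return problems


ok = check_input({"mode": "searchDatasets", "query": "transport", "maxItems": 50})
bad = check_input({"mode": "datasetDetails", "maxItems": 5000})
print(ok)   # []
print(bad)  # two problems: missing datasetIds, maxItems out of range
```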

Example: search datasets by keyword

{
  "mode": "searchDatasets",
  "query": "transport",
  "sort": "-reuses",
  "maxItems": 50
}

Example: fetch specific datasets

{
  "mode": "datasetDetails",
  "datasetIds": ["transport", "base-sirene-des-entreprises-et-de-leurs-etablissements-siren-siret"]
}
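Since datasetIds accepts IDs, slugs and full URLs, it can help to see how the three forms might reduce to one lookup key. The Python sketch below is an assumption about a plausible normalisation, not the actor's documented parsing; `dataset_key` is a hypothetical helper that keeps the last path segment of a URL.

```python
from urllib.parse import urlparse


def dataset_key(value: str) -> str:
    """Reduce a dataset ID, slug or full URL to the ID/slug part."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        # Keep the last non-empty path segment, ignoring any trailing slash.
        segments = [s for s in urlparse(value).path.split("/") if s]
        if not segments:
            raise ValueError(f"no dataset identifier in URL: {value!r}")
        return segments[-1]
    return value  # already an ID or a slug


print(dataset_key("https://www.data.gouv.fr/datasets/transport"))  # transport
print(dataset_key("53ba5b91a3a729219b7beae9"))                     # 53ba5b91a3a729219b7beae9
```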

Example: every dataset from an organization

{
  "mode": "byOrganization",
  "organization": "institut-national-de-la-statistique-et-des-etudes-economiques-insee",
  "sort": "-last_modified",
  "maxItems": 100
}

Example: discover transport apps (reuses)

{
  "mode": "searchReuses",
  "topic": "transport_and_mobility",
  "reuseType": "application",
  "maxItems": 50
}
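Inputs like the ones above can also be submitted programmatically. The Python sketch below assumes the official `apify-client` package, whose `ApifyClient` provides `actor(...).call(run_input=...)` and `dataset(...).iterate_items()`. `run_and_collect` is a hypothetical wrapper, and the actor ID and token in the commented usage are placeholders to fill in from the Store page and your Apify account. The data.gouv.fr API itself needs no key, but starting a run through the Apify API does require an Apify token.

```python
from typing import Any, Dict, List


def run_and_collect(client: Any, actor_id: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start the actor, wait for it to finish, and return its dataset items as a list."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage sketch (placeholders, not real values):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_APIFY_TOKEN>")
# items = run_and_collect(client, "<username>/<actor-name>", {
#     "mode": "searchReuses",
#     "topic": "transport_and_mobility",
#     "reuseType": "application",
#     "maxItems": 50,
# })
# for item in items:
#     print(item["recordType"], item.get("title"), item.get("externalUrl"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub when no network access is available.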

Use cases

  • Data catalogs — build a searchable catalog of French public datasets in a specific domain
  • Organization monitoring — track everything a ministry or agency publishes
  • Format discovery — find every dataset available as CSV, JSON, GeoJSON, …
  • Civic tech — discover apps and visualizations reusing transport, health, or environment data
  • Freshness tracking — monitor dataset update timestamps and coverage periods
  • Data journalism — surface newly published or most-reused open datasets
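As an illustration of the freshness-tracking use case, the Python sketch below flags datasets whose last update is older than a chosen age. `stale_datasets` and the 365-day threshold are hypothetical choices. The timestamps follow the ISO 8601 form shown in the sample record, and records without any timestamp are skipped rather than treated as stale.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


def stale_datasets(records: Iterable[Dict[str, Any]], now: datetime, max_age_days: int = 365) -> List[str]:
    """Return ids of records last updated more than max_age_days before now."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for record in records:
        stamp = record.get("lastUpdate") or record.get("lastModified")
        if stamp is None:
            continue  # field omitted, nothing to compare
        if datetime.fromisoformat(stamp) < cutoff:
            stale.append(record["id"])
    return stale


# Illustrative records; only the first timestamp comes from the sample record.
records = [
    {"id": "old", "lastUpdate": "2014-09-02T15:44:46.643000+00:00"},
    {"id": "recent", "lastUpdate": "2026-06-01T00:00:00+00:00"},
    {"id": "undated"},
]
now = datetime(2026, 7, 2, tzinfo=timezone.utc)
print(stale_datasets(records, now))  # ['old']
```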

FAQ

Do I need an API key or account? No. The data.gouv.fr API is fully public and free.

Can I get every dataset from a specific ministry or agency? Yes. Use the byOrganization mode (Datasets by organization) with the organization's slug or ID; a full profile URL also works.

Can I filter datasets by file format? Yes. Set the format input (e.g. csv, json, geojson, gtfs) to only return datasets that publish a resource in that format. You can further narrow results with license, badge (HVD / SPD), tag and featured. Every dataset record also lists its available formats.

What are reuses? Reuses are the apps, APIs, visualizations and articles that people have built on top of datasets. Search them by keyword, topic and type.

Which sort options apply to which mode? Datasets and organizations support all sort options. Reuses can be sorted by newest, recently updated, most followed or most viewed; "most reused" doesn't apply to reuses and falls back to the default relevance order.

Why do I sometimes get slightly fewer records than the reported total? The portal's relevance ordering can surface the same record on more than one page. The actor removes those duplicates, so you always get unique records.

Is the content in French? Yes — data.gouv.fr is the French national portal, so titles and descriptions are primarily in French. All text is returned as proper UTF-8.

How many results can I get? Set maxItems up to 2000 per run. The actor paginates automatically until it reaches maxItems or the matching results are exhausted.

Data source

Data comes from the public data.gouv.fr API, operated by the French government (Etalab / DINUM). All content is open data published under open licenses. This is a third-party actor and is not affiliated with data.gouv.fr, Etalab or DINUM.