# OpenDataSoft Dataset Catalog Scraper (`crawlerbros/opendatasoft-scraper`) Actor

Discover and browse 400+ public open datasets from the OpenDataSoft platform. Search by keyword, filter by theme, and export dataset metadata or records.

- **URL**: https://apify.com/crawlerbros/opendatasoft-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Automation, Developer tools, Integrations
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and per usage: you are charged a fixed price for specific events as well as for Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
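
As a rough illustration of the arithmetic behind the listed starting price, the sketch below estimates the per-result charge only. The function name `estimate_result_charge` is hypothetical, and the estimate deliberately leaves out Apify platform usage and any plan discount, so the real bill may differ.

```python
from decimal import Decimal


def estimate_result_charge(result_count: int, usd_per_1000: str = "3.00") -> Decimal:
    """Estimate the per-result event charge in USD.

    Assumption: a flat rate of `usd_per_1000` per 1,000 results.
    Platform usage and Store discounts are NOT included.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    charge = Decimal(usd_per_1000) * result_count / 1000
    return charge.quantize(Decimal("0.01"))


# 500 results at the listed starting price of $3.00 per 1,000 results
print(estimate_result_charge(500))
```

Using `Decimal` rather than `float` keeps the cents exact when the estimate is summed across many runs.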

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## OpenDataSoft Dataset Catalog Scraper

Browse and download from **400+ public open datasets** hosted on the OpenDataSoft platform — covering transportation, environment, health, tourism, urban data, and more. No API key or account required.

### What it does

This Actor scrapes the OpenDataSoft public catalog (`public.opendatasoft.com`), a curated collection of open government and community datasets. It operates in two modes:
- **catalog** — discover datasets by keyword or theme
- **records** — export actual data rows from any specific dataset

### Input

| Field | Type | Description | Default |
|-------|------|-------------|---------|
| `mode` | string | `catalog` (browse datasets) or `records` (export rows from a dataset) | `catalog` |
| `query` | string | Search datasets by keyword (title, description, publisher) | — |
| `theme` | string | Filter by theme (e.g. `Environment`, `Transport`, `Health`) | — |
| `datasetId` | string | Dataset ID for mode=records (e.g. `osm-australia-cinema`) | — |
| `recordsFilter` | string | ODS filter expression for records (e.g. `country=Australia`) | — |
| `maxItems` | integer | Maximum number of items to emit (1–5,000) | 100 |

#### Modes

**catalog** — Searches the OpenDataSoft public dataset registry. Returns dataset metadata including title, publisher, record count, themes, keywords, license, and update frequency.

**records** — Fetches actual data rows from a specified dataset. Use the catalog mode first to find dataset IDs, then switch to records mode to extract the data.

### Output

#### Catalog mode output

```json
{
  "datasetId": "osm-australia-cinema",
  "datasetUid": "da-xyz789",
  "title": "Cinemas - Australia - OSM data",
  "description": "Cinema locations from OpenStreetMap data for Australia.",
  "publisher": "OpenDataSoft",
  "recordsCount": 408,
  "themes": ["Leisure & Tourism"],
  "keywords": ["cinema", "entertainment", "osm"],
  "language": "en",
  "license": "Open Database License",
  "licenseUrl": "https://opendatacommons.org/licenses/odbl/",
  "modifiedAt": "2024-01-15",
  "updateFrequency": "weekly",
  "features": ["geo"],
  "hasRecords": true,
  "dataVisible": true,
  "recordType": "dataset",
  "scrapedAt": "2024-01-15T14:30:00+00:00"
}
````

#### Records mode output

```json
{
  "datasetId": "osm-australia-cinema",
  "recordType": "record",
  "name": "United Cinemas Opera Quays",
  "opening_hours": "Mo-Su 10:00-23:00",
  "wheelchair": "yes",
  "meta_name_sub": "Sydney",
  "meta_name_state": "New South Wales",
  "meta_geo_pointLat": -33.859496,
  "meta_geo_pointLon": 151.213046,
  "scrapedAt": "2024-01-15T14:30:00+00:00"
}
```

#### Output Fields (catalog)

| Field | Type | Description |
|-------|------|-------------|
| `datasetId` | string | Unique dataset identifier |
| `datasetUid` | string | OpenDataSoft dataset UID (e.g. `da-xyz789`) |
| `title` | string | Human-readable dataset title |
| `description` | string | Dataset description |
| `publisher` | string | Organization that published the dataset |
| `recordsCount` | integer | Number of data rows in the dataset |
| `themes` | array | Thematic categories (e.g. Transport, Environment) |
| `keywords` | array | Keywords describing the dataset |
| `language` | string | Primary language |
| `license` | string | Data usage license |
| `licenseUrl` | string | Link to license document |
| `modifiedAt` | string | Last modification date |
| `updateFrequency` | string | How often the dataset updates |
| `features` | array | Dataset capabilities (e.g. geo, timeserie) |
| `hasRecords` | boolean | Whether the dataset has accessible records |
| `dataVisible` | boolean | Whether data is publicly visible |
| `recordType` | string | Always `dataset` in catalog mode |
| `scrapedAt` | string | ISO 8601 UTC timestamp |

All fields are omitted when empty: null or empty values are excluded from the output, so any field in the table above may be absent from a given item.
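
Because fields can be absent, consumer code is safer reading items with defaults than indexing keys directly. The sketch below illustrates the idea; `omit_empty` and `summarize` are hypothetical helpers, and the exact emptiness rule (here: `None`, empty string, empty list or dict, while `0` and `False` are kept) is an assumption, not the Actor's confirmed behavior.

```python
from typing import Any


def omit_empty(item: dict[str, Any]) -> dict[str, Any]:
    """Drop keys whose value is None, "", [] or {} (assumed emptiness rule)."""
    return {
        key: value
        for key, value in item.items()
        if value is not None and value != "" and value != [] and value != {}
    }


def summarize(item: dict[str, Any]) -> str:
    """Build a one-line summary, tolerating missing catalog fields."""
    title = item.get("title", "(untitled)")
    count = item.get("recordsCount", 0)
    themes = ", ".join(item.get("themes", [])) or "no theme"
    return f"{title}: {count} records [{themes}]"


item = omit_empty({"title": "Cinemas - Australia - OSM data", "recordsCount": 408, "themes": []})
print(summarize(item))
```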

### Use Cases

- **Data discovery** — Find relevant public datasets for research or integration projects.
- **Urban analytics** — Explore transportation, parking, cycling, or POI datasets.
- **Environmental research** — Discover air quality, water, and nature datasets.
- **Government data tracking** — Monitor what open data is available from public institutions.
- **Dataset inventory** — Build a searchable catalog of available open data sources.

### Example Inputs

#### Find transport datasets

```json
{ "mode": "catalog", "query": "transport", "maxItems": 50 }
```

#### Find environment datasets

```json
{ "mode": "catalog", "theme": "Environment", "maxItems": 100 }
```

#### Export records from a cinema dataset

```json
{
  "mode": "records",
  "datasetId": "osm-australia-cinema",
  "maxItems": 500
}
```

### Data Source

Data is from the **OpenDataSoft public platform** (`public.opendatasoft.com`), a curated catalog of open datasets contributed by governments, municipalities, and research institutions. Access is free and requires no authentication.

### FAQs

**Do I need an API key?**
No. The OpenDataSoft public platform (`public.opendatasoft.com`) is freely accessible with no credentials required.

**How many datasets are available?**
More than 400 public datasets as of 2024, covering topics from OSM point-of-interest data to electoral results and urban infrastructure.

**How do I find a dataset ID?**
Run the scraper in `catalog` mode with a keyword search. The `datasetId` field in the output is what you pass to `records` mode.

**Can I filter records by specific field values?**
Yes, use the `recordsFilter` field in records mode. For example: `name="Paris"` or `type="restaurant"`. This uses the OpenDataSoft filter expression syntax.
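
When filter values come from user input, it can help to build the expression in one place. The following is a minimal sketch that produces the quoted `field="value"` form shown above; `equals_filter` is a hypothetical helper, and rejecting values that contain a double quote (instead of escaping them) is a cautious assumption, since the escaping rules are not described here.

```python
import re

# Assumption: field names are simple identifiers (letters, digits, underscores).
_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def equals_filter(field: str, value: str) -> str:
    """Return an equality filter such as name="Paris" for `recordsFilter`."""
    if not _FIELD_NAME.match(field):
        raise ValueError(f"unsupported field name: {field!r}")
    if '"' in value:
        # Escaping rules are not documented in this README, so refuse instead of guessing.
        raise ValueError("value must not contain a double quote")
    return f'{field}="{value}"'


actor_input = {
    "mode": "records",
    "datasetId": "osm-australia-cinema",
    "recordsFilter": equals_filter("name", "Paris"),
}
print(actor_input["recordsFilter"])
```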

**Why do some records have geo coordinates?**
Datasets with geographic data include a `meta_geo_point` field that the scraper automatically flattens into `meta_geo_pointLat` and `meta_geo_pointLon` fields.
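
The flattening described above can be sketched as follows. This is an illustration rather than the Actor's actual code: the input shape (a `meta_geo_point` object with `lat` and `lon` keys) is an assumption, and `flatten_geo_point` is a hypothetical name.

```python
from typing import Any


def flatten_geo_point(record: dict[str, Any], key: str = "meta_geo_point") -> dict[str, Any]:
    """Replace a nested geo point with <key>Lat and <key>Lon fields."""
    flat = {k: v for k, v in record.items() if k != key}
    point = record.get(key)
    # Assumed shape: {"lat": <float>, "lon": <float>}; anything else is dropped.
    if isinstance(point, dict) and "lat" in point and "lon" in point:
        flat[f"{key}Lat"] = point["lat"]
        flat[f"{key}Lon"] = point["lon"]
    return flat


raw = {
    "name": "United Cinemas Opera Quays",
    "meta_geo_point": {"lat": -33.859496, "lon": 151.213046},
}
print(flatten_geo_point(raw))
```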

**What is the difference between catalog and records mode?**
Catalog mode returns one row per dataset (metadata only). Records mode returns one row per data item within a specific dataset. Use catalog to discover datasets, then records to extract their data.
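
The two-step workflow can be sketched as a small helper that turns catalog items into inputs for follow-up `records` runs. `records_inputs_from_catalog` is a hypothetical name, and treating a missing `hasRecords` field as "no records" is an assumption based on empty fields being omitted from the output.

```python
from typing import Any, Iterable


def records_inputs_from_catalog(
    catalog_items: Iterable[dict[str, Any]], max_items: int = 100
) -> list[dict[str, Any]]:
    """Build one records-mode input per catalog item that has accessible records."""
    inputs = []
    for item in catalog_items:
        dataset_id = item.get("datasetId")
        # Assumption: an absent hasRecords field means the dataset has no records.
        if not dataset_id or not item.get("hasRecords", False):
            continue
        inputs.append({"mode": "records", "datasetId": dataset_id, "maxItems": max_items})
    return inputs


catalog = [
    {"datasetId": "osm-australia-cinema", "hasRecords": True, "recordType": "dataset"},
    {"datasetId": "dataset-without-rows", "recordType": "dataset"},
]
print(records_inputs_from_catalog(catalog, max_items=500))
```

Each resulting dictionary can be passed as the input of a separate run in `records` mode.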

# Actor input Schema

## `mode` (type: `string`):

What to scrape: catalog (list datasets) or records (rows from a specific dataset).

## `query` (type: `string`):

Search datasets by keyword (title, description, publisher). Leave empty to list all.

## `theme` (type: `string`):

Filter datasets by theme (e.g. 'Environment', 'Transport', 'Health').

## `datasetId` (type: `string`):

The dataset identifier to fetch records from (e.g. 'osm-australia-cinema'). Find IDs via mode=catalog first.

## `recordsFilter` (type: `string`):

ODS filter expression for records (e.g. 'country=Australia'). Leave empty for all.

## `maxItems` (type: `integer`):

Maximum number of records/datasets to emit.
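
Checking an input locally before starting a run can save a failed run. The sketch below checks the constraints stated in this document (`mode` is required and must be `catalog` or `records`; `maxItems` ranges from 1 to 5,000 and defaults to 100). Requiring `datasetId` in `records` mode is an assumption, and `validate_input` is a hypothetical helper, not part of any Apify library.

```python
from typing import Any

ALLOWED_MODES = {"catalog", "records"}


def validate_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input (empty if none)."""
    errors = []
    mode = actor_input.get("mode")
    if mode not in ALLOWED_MODES:
        errors.append("mode must be 'catalog' or 'records'")
    max_items = actor_input.get("maxItems", 100)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or not 1 <= max_items <= 5000:
        errors.append("maxItems must be an integer from 1 to 5000")
    # Assumption: records mode cannot work without a dataset ID.
    if mode == "records" and not actor_input.get("datasetId"):
        errors.append("datasetId is expected when mode is 'records'")
    return errors


print(validate_input({"mode": "records", "maxItems": 500}))
```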

## Actor input object example

```json
{
  "mode": "catalog",
  "query": "transport",
  "datasetId": "osm-australia-cinema",
  "maxItems": 100
}
```

# Actor output Schema

## `records` (type: `string`):

Dataset containing all scraped OpenDataSoft records.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "catalog",
    "query": "transport",
    "datasetId": "osm-australia-cinema",
    "maxItems": 100
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/opendatasoft-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "catalog",
    "query": "transport",
    "datasetId": "osm-australia-cinema",
    "maxItems": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/opendatasoft-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "catalog",
  "query": "transport",
  "datasetId": "osm-australia-cinema",
  "maxItems": 100
}' |
apify call crawlerbros/opendatasoft-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/opendatasoft-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OpenDataSoft Dataset Catalog Scraper",
        "description": "Discover and browse 400+ public open datasets from the OpenDataSoft platform. Search by keyword, filter by theme or record count, and export dataset metadata or records.",
        "version": "1.0",
        "x-build-id": "SBd8Y7gwNMWY1Q9Hl"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~opendatasoft-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-opendatasoft-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~opendatasoft-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-opendatasoft-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~opendatasoft-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-opendatasoft-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "catalog",
                            "records"
                        ],
                        "type": "string",
                        "description": "What to scrape: catalog (list datasets) or records (rows from a specific dataset).",
                        "default": "catalog"
                    },
                    "query": {
                        "title": "Search query (mode=catalog)",
                        "type": "string",
                        "description": "Search datasets by keyword (title, description, publisher). Leave empty to list all."
                    },
                    "theme": {
                        "title": "Theme filter (mode=catalog)",
                        "type": "string",
                        "description": "Filter datasets by theme (e.g. 'Environment', 'Transport', 'Health')."
                    },
                    "datasetId": {
                        "title": "Dataset ID (mode=records)",
                        "type": "string",
                        "description": "The dataset identifier to fetch records from (e.g. 'osm-australia-cinema'). Find IDs via mode=catalog first."
                    },
                    "recordsFilter": {
                        "title": "Records filter (mode=records)",
                        "type": "string",
                        "description": "ODS filter expression for records (e.g. 'country=Australia'). Leave empty for all."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of records/datasets to emit.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
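
For projects that prefer plain HTTP over a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library. This is a minimal sketch: `build_run_request` is a hypothetical helper, the token is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~opendatasoft-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>", {"mode": "catalog", "query": "transport", "maxItems": 10}
)
print(request.get_method(), request.full_url)

# To actually start the run (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```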
