# Data.gov.uk Scraper - Low-cost💲🔥📚🇬🇧 (`delectable_incubator/data-gov-uk-scraper-low-cost`) Actor

Scrape data.gov.uk dataset listings 🔎📊 with a powerful open data scraper. Extract dataset titles, publishers, update dates, descriptions, tags, and dataset URLs from search results. Ideal for government data monitoring, open data research, dataset discovery, and structured data catalog creation 🚀

- **URL**: https://apify.com/delectable_incubator/data-gov-uk-scraper-low-cost.md
- **Developed by:** [Prime Scrape](https://apify.com/delectable_incubator) (community)
- **Categories:** Lead generation, Jobs, Travel
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.00005 / actor start

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
This Actor supports Apify Store discounts, so the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

<p align="center">
<img src="https://i.ibb.co/jkNS73wX/readme.png" alt="Data.gov.uk Dataset Scraper" width="100%">
</p>

---

## Data.gov.uk Dataset Scraper 🌐📊🇬🇧

The Data.gov.uk Dataset Scraper is a powerful and scalable Apify Actor designed to extract structured dataset listings directly from Data.gov.uk search result pages.

It enables open data discovery, public sector research, dataset monitoring, government data analysis, academic research, machine learning data collection, and structured dataset generation from the United Kingdom's official open data portal.

---

### 🎯 What This Scraper Does

Simply provide one or more Data.gov.uk search URLs and the scraper handles everything automatically.

✅ Extracts structured dataset listings from Data.gov.uk

✅ Supports bulk URL scraping

✅ Automatically processes search result pages

✅ Handles pagination automatically

✅ Applies `maxItemsPerUrl` limits

✅ Extracts dataset metadata and publication details

✅ Captures publisher information

✅ Collects dataset URLs and descriptions

✅ Generates clean and structured datasets

✅ Ready for analytics and research workflows

✅ Export-ready output format

---

### 📊 Data Extracted

#### 🌐 Dataset Information

| Field            | Description               |
| ---------------- | ------------------------- |
| 🆔 `datasetId`   | Unique dataset identifier |
| 📄 `title`       | Dataset title             |
| 🏢 `publishedBy` | Publishing organization   |
| 🕒 `lastUpdated` | Dataset last update date  |
| 📝 `description` | Dataset description       |
| 🔗 `datasetUrl`  | Dataset page URL          |
| 🌐 `sourceUrl`   | Search URL source         |
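
The `lastUpdated` field is display text rather than an ISO timestamp (the output example further down shows `21 August 2020`), so sorting or filtering by date needs a parsing step. A minimal sketch, assuming every record uses that day, month name, year format; `parse_last_updated` is a hypothetical helper and not part of the Actor:

```python
from datetime import date, datetime
from typing import Optional


def parse_last_updated(value: str) -> Optional[date]:
    """Parse text such as '21 August 2020'; return None when it does not match.

    Assumption: dates use the English day / month name / year format seen in
    the output example. Month names are parsed with the process locale, which
    is English ("C") unless the program changes it.
    """
    try:
        return datetime.strptime(value.strip(), "%d %B %Y").date()
    except ValueError:
        return None


# Example: sort records newest first, putting unparseable dates last.
records = [
    {"title": "A", "lastUpdated": "21 August 2020"},
    {"title": "B", "lastUpdated": "3 March 2023"},
    {"title": "C", "lastUpdated": ""},
]
records.sort(key=lambda r: parse_last_updated(r["lastUpdated"]) or date.min, reverse=True)
```

Returning `None` instead of raising keeps one malformed record from aborting a whole batch.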

---

### 🛠 How to Use

#### 1️⃣ Configure Input

Provide one or multiple Data.gov.uk search URLs:

```json
{
  "urls": [
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
  ],
  "maxItemsPerUrl": 20
}
```

#### 2️⃣ Run the Actor

• Loads Data.gov.uk search result pages

• Processes dataset listings automatically

• Extracts structured dataset information

• Collects publisher and update metadata

• Applies `maxItemsPerUrl` limits

• Stops automatically when limits are reached

#### 3️⃣ Export the Dataset

Download your results in multiple formats:

✅ JSON

✅ CSV

✅ Excel

✅ XML

✅ HTML
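
The same formats can also be requested programmatically from the Apify dataset items endpoint through its `format` query parameter. A minimal sketch that only builds the download URL; `export_url` and the `FORMATS` mapping are hypothetical names, and a non-public dataset will likely also need your API token:

```python
from urllib.parse import urlencode

# Maps the format names listed above to the values the `format` parameter expects.
FORMATS = {"JSON": "json", "CSV": "csv", "Excel": "xlsx", "XML": "xml", "HTML": "html"}


def export_url(dataset_id: str, fmt: str = "JSON") -> str:
    """Build a download URL for a run's dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"Unsupported format: {fmt!r}")
    query = urlencode({"format": FORMATS[fmt]})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# "abc123" stands in for the defaultDatasetId of a finished run.
csv_link = export_url("abc123", "CSV")
```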

---

### ⚙️ Input Configuration

#### 📥 Input Example

```json
{
  "urls": [
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
  ],
  "maxItemsPerUrl": 20
}
```

#### Input Fields

| Field            | Type    | Description                                                   |
| ---------------- | ------- | ------------------------------------------------------------- |
| `urls`           | array   | List of Data.gov.uk search URLs                               |
| `maxItemsPerUrl` | integer | Maximum number of datasets to collect per URL (0 = unlimited) |
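
The search URLs in the examples are ordinary query strings with percent-encoded brackets, so they can be generated rather than copied by hand. A small sketch, assuming the `filters[topic]` parameter shown in the example URLs; `topic_search_url` and `build_input` are hypothetical helpers:

```python
from typing import Dict, List, Union
from urllib.parse import urlencode

SEARCH_BASE = "https://www.data.gov.uk/search"


def topic_search_url(topic: str) -> str:
    """Build a topic-filtered search URL; urlencode escapes '[' and ']' and turns spaces into '+'."""
    return f"{SEARCH_BASE}?{urlencode({'filters[topic]': topic})}"


def build_input(topics: List[str], max_items_per_url: int = 20) -> Dict[str, Union[List[str], int]]:
    """Assemble an Actor input object, rejecting values the input fields do not allow."""
    if not topics:
        raise ValueError("At least one topic is required")
    if max_items_per_url < 0:
        raise ValueError("maxItemsPerUrl must be 0 (unlimited) or a positive integer")
    return {
        "urls": [topic_search_url(topic) for topic in topics],
        "maxItemsPerUrl": max_items_per_url,
    }


actor_input = build_input(["Business and economy", "Health"], max_items_per_url=20)
```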

---

### 📤 Output Example

```json
{
  "datasetId": "a24c061b-d57e-4889-abbc-eead787f38e6",
  "title": "Bioscience and Health Technology Database",
  "publishedBy": "Department for Business, Energy and Industrial Strategy",
  "lastUpdated": "21 August 2020",
  "description": "This dataset is used by Departmental officials to analyse and produce annual reports and statistics within the UK Life Sciences sector.",
  "datasetUrl": "https://www.data.gov.uk/dataset/a24c061b-d57e-4889-abbc-eead787f38e6/bioscience-and-health-technology-database",
  "sourceUrl": "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy"
}
```

---

### 📊 Use Cases

| Use Case                 | Description                                       |
| ------------------------ | ------------------------------------------------- |
| 📊 Open Data Research    | Collect structured public datasets                |
| 🏛 Government Monitoring | Track newly published government datasets         |
| 📈 Data Science Projects | Build datasets for analytics and machine learning |
| 📚 Academic Research     | Gather datasets for studies and publications      |
| 🤖 Automation Pipelines  | Feed open data into workflows and dashboards      |

---

### 🌍 Why Use This Scraper?

📊 Discover public datasets at scale

🏛 Monitor government open data publications

📈 Build research-ready structured datasets

🌍 Scrape multiple search topics simultaneously

⚡ Fast and automated extraction

🤖 Automation-ready output

📦 Bulk URL scraping support

🧠 Ideal for analysts, researchers, journalists, and data scientists

🚀 Scalable for both small and enterprise workloads

---

### ❓ FAQ

#### How does this scraper work?

The scraper loads Data.gov.uk search result pages and extracts structured dataset information including titles, descriptions, publishers, update dates, and dataset URLs.

#### Can I scrape multiple searches in one run?

Yes. You can provide multiple search URLs inside the `urls` array.

#### Does the scraper collect publisher information?

Yes. Publishing organizations are extracted whenever available.

#### Can I monitor new datasets over time?

Yes. You can schedule recurring Apify runs to monitor newly published or updated datasets.
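
Monitoring across scheduled runs comes down to comparing `datasetId` values between two result sets. A minimal sketch, assuming you keep the items from the previous run somewhere yourself; `new_datasets` is a hypothetical helper:

```python
from typing import Dict, List


def new_datasets(previous: List[Dict], current: List[Dict]) -> List[Dict]:
    """Return items from the current run whose datasetId was absent from the previous run."""
    known_ids = {item["datasetId"] for item in previous}
    return [item for item in current if item["datasetId"] not in known_ids]


# Illustrative records only; real items carry the full set of output fields.
last_week = [{"datasetId": "a1", "title": "Old dataset"}]
this_week = [
    {"datasetId": "a1", "title": "Old dataset"},
    {"datasetId": "b2", "title": "Newly published dataset"},
]
added = new_datasets(last_week, this_week)
```

To also catch updated datasets, the same comparison could be keyed on the pair of `datasetId` and `lastUpdated`.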

#### Is the data collected live?

Yes. Data is extracted directly from Data.gov.uk during every run.

#### What export formats are supported?

JSON, CSV, Excel, XML, and HTML.

#### Can I use the extracted data commercially?

Yes. The extracted data can be used for analytics, research, automation, monitoring, and commercial applications.

#### What happens if the scraper fails?

The Actor includes retry mechanisms and automated error handling to improve reliability.

#### How long does a run take?

Most extractions complete within minutes depending on the number of URLs and requested results.

---

### 🚀 Quick Start

1️⃣ Sign up — Create a free Apify account

2️⃣ Find the tool — Search for "Data.gov.uk Dataset Scraper" in the Apify Store

3️⃣ Configure URLs — Add one or multiple Data.gov.uk search URLs

4️⃣ Run it — Start the Actor and wait for extraction

5️⃣ Export data — Download results in JSON, CSV, Excel, XML, or HTML

---

### ⚠️ Disclaimer

This tool is an independent solution and is not affiliated with, endorsed by, or sponsored by Data.gov.uk or the UK Government.

---

### 💸 Pricing

This scraper uses a **pay-per-event pricing model**.

You only pay for **successful runs**.

💳 **Price:** $1.19 / 1000 results


---

### Related Actors

If you're interested in other Open Data, Research, Government, Analytics, Marketplace, Jobs, Real Estate, or Lead Generation scraping solutions, explore more tools:

(Coming soon)

---

### 📬 Support

⭐⭐⭐⭐⭐ Leave a 5-star rating if you like this tool

---

### 🌍 PrimeScrape

Built for scalable web data extraction & automation.

Contact us for custom scraping solutions or enterprise requests via Apify or by email.

# Actor input Schema

## `urls` (type: `array`):

One or more data.gov.uk search URLs.

Examples:

- https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy
- https://www.data.gov.uk/search?filters%5Btopic%5D=Health
- https://www.data.gov.uk/search?filters%5Btopic%5D=Environment
- https://www.data.gov.uk/search?q=climate+change
- https://www.data.gov.uk/search?filters%5Bpublisher%5D=NHS+Digital

## `maxItemsPerUrl` (type: `integer`):

Maximum number of unique datasets to collect per search URL. Each page yields ~20 results. Set to 0 for no limit. Scraping stops automatically when a page returns 0 cards (end of results) or after 2 consecutive zero-card pages.
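
The stopping rules described above can be pictured as a small pagination loop. This is only a sketch of the described behaviour, not the Actor's actual code: `collect_datasets` and `fetch_page` are hypothetical, and it assumes the loop ends after two consecutive pages without cards.

```python
from typing import Callable, Dict, List


def collect_datasets(
    fetch_page: Callable[[int], List[Dict]],
    max_items: int = 0,
    max_empty_pages: int = 2,
) -> List[Dict]:
    """Collect unique datasets page by page until a limit or the end of results is hit."""
    seen: Dict[str, Dict] = {}  # keyed by datasetId, so duplicates are dropped
    empty_streak = 0
    page = 1
    while empty_streak < max_empty_pages:
        cards = fetch_page(page)
        if not cards:
            empty_streak += 1
        else:
            empty_streak = 0
            for card in cards:
                seen.setdefault(card["datasetId"], card)
                # max_items == 0 means "no limit", matching the input description.
                if max_items and len(seen) >= max_items:
                    return list(seen.values())
        page += 1
    return list(seen.values())


# Simulated search results: three pages of 20 cards each, then nothing.
fake_pages = {
    1: [{"datasetId": f"id-{n}"} for n in range(20)],
    2: [{"datasetId": f"id-{n}"} for n in range(20, 40)],
    3: [{"datasetId": f"id-{n}"} for n in range(40, 60)],
}
limited = collect_datasets(lambda page: fake_pages.get(page, []), max_items=50)
everything = collect_datasets(lambda page: fake_pages.get(page, []), max_items=0)
```

With roughly 20 results per page, a limit of 50 needs three pages and stops partway through the third.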

## Actor input object example

```json
{
  "urls": [
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
  ],
  "maxItemsPerUrl": 50
}
````

# Actor output Schema

## `datasets` (type: `string`):

No description

## `by_publisher` (type: `string`):

No description

## `by_source` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
        "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
    ],
    "maxItemsPerUrl": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("delectable_incubator/data-gov-uk-scraper-low-cost").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": [
        "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
        "https://www.data.gov.uk/search?filters%5Btopic%5D=Health",
    ],
    "maxItemsPerUrl": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("delectable_incubator/data-gov-uk-scraper-low-cost").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
    "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
  ],
  "maxItemsPerUrl": 50
}' |
apify call delectable_incubator/data-gov-uk-scraper-low-cost --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=delectable_incubator/data-gov-uk-scraper-low-cost",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Data.gov.uk Scraper - Low-cost💲🔥📚🇬🇧",
        "description": "Scrape data.gov.uk dataset listings 🔎📊 with a powerful open data scraper. Extract dataset titles, publishers, update dates, descriptions, tags, and dataset URLs from search results. Ideal for government data monitoring, open data research, dataset discovery, and structured data catalog creation 🚀",
        "version": "0.0",
        "x-build-id": "Pk3sjdX9QL141QPhm"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/delectable_incubator~data-gov-uk-scraper-low-cost/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-delectable_incubator-data-gov-uk-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/delectable_incubator~data-gov-uk-scraper-low-cost/runs": {
            "post": {
                "operationId": "runs-sync-delectable_incubator-data-gov-uk-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/delectable_incubator~data-gov-uk-scraper-low-cost/run-sync": {
            "post": {
                "operationId": "run-sync-delectable_incubator-data-gov-uk-scraper-low-cost",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "Search URLs",
                        "type": "array",
                        "description": "One or more data.gov.uk search URLs. \nExamples:\n• https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy\n• https://www.data.gov.uk/search?filters%5Btopic%5D=Health\n• https://www.data.gov.uk/search?filters%5Btopic%5D=Environment\n• https://www.data.gov.uk/search?q=climate+change\n• https://www.data.gov.uk/search?filters%5Bpublisher%5D=NHS+Digital",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "https://www.data.gov.uk/search?filters%5Btopic%5D=Business+and+economy",
                            "https://www.data.gov.uk/search?filters%5Btopic%5D=Health"
                        ]
                    },
                    "maxItemsPerUrl": {
                        "title": "Max Datasets per URL",
                        "type": "integer",
                        "description": "Maximum number of unique datasets to collect per search URL. Each page yields ~20 results. Set to 0 for no limit. Scraping stops automatically when a page returns 0 cards (end of results) or after 2 consecutive zero-card pages.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
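
For projects that use neither client library, the first path in the specification above can be called directly. A minimal sketch using only the Python standard library: it builds the `POST` request with the `token` query parameter and JSON body that the specification lists, and leaves sending it to the caller. `build_run_request` is a hypothetical helper.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
RUN_PATH = "/acts/delectable_incubator~data-gov-uk-scraper-low-cost/run-sync-get-dataset-items"


def build_run_request(token: str, urls: List[str], max_items_per_url: int = 50) -> Request:
    """Prepare (but do not send) a run-sync-get-dataset-items request."""
    body = json.dumps({"urls": urls, "maxItemsPerUrl": max_items_per_url}).encode("utf-8")
    url = f"{API_BASE}{RUN_PATH}?{urlencode({'token': token})}"
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


request = build_run_request(
    "<YOUR_API_TOKEN>",
    ["https://www.data.gov.uk/search?filters%5Btopic%5D=Health"],
)
# To execute: pass `request` to urllib.request.urlopen() and decode the JSON response.
# Per the endpoint summary, the response should contain the run's dataset items.
```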
