# EMA Medicines Scraper 💊 (`shahidirfan/ema-medicines-scraper`) Actor

Scrape European Medicines Agency data for drug approvals, clinical trials & pharmaceutical information. Extract EMA medicines, regulatory documents & authorization data at scale. Perfect for pharma research & compliance.

- **URL**: https://apify.com/shahidirfan/ema-medicines-scraper.md
- **Developed by:** [Shahid Irfan](https://apify.com/shahidirfan) (community)
- **Categories:** Agents, Automation, Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## European Medicines Agency Medicines Scraper

Extract structured medicines data from the European Medicines Agency in a format ready for analysis and monitoring. Collect medicine status, authorisation timelines, therapeutic information, product numbers, and official medicine URLs in one run. Built for regulatory research, portfolio tracking, and automated reporting workflows.

### Features

- **EMA medicines coverage** — Collect records for human and veterinary medicines published by EMA.
- **Keyword filtering** — Narrow output using medicine names, INN/common names, status terms, or therapeutic keywords.
- **URL-aware input** — Accept EMA search URLs, medicine detail URLs, or direct JSON report URLs.
- **Pagination control** — Limit extraction with `results_wanted` and `max_pages` for predictable run sizes.
- **Clean dataset output** — Excludes null and empty values from each dataset item.
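
The "clean dataset output" behaviour can be approximated on the client side, for example when merging records from other sources. The following is a minimal sketch, not the Actor's own code: the helper name `drop_empty` is hypothetical, and the set of values treated as "empty" (`None`, empty strings, empty lists, empty dicts) is an assumption.

```python
from typing import Any


def drop_empty(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `item` without null or empty values.

    Assumption: "empty" means None, "", [] or {}. Values such as 0 or
    False are kept because they carry information.
    """
    empty_values = (None, "", [], {})
    return {key: value for key, value in item.items() if value not in empty_values}


record = {"name_of_medicine": "Nuwiq", "therapeutic_area_mesh": "", "category": None}
print(drop_empty(record))
```

Because empty fields are dropped, downstream code should read fields with `dict.get` rather than assume every key is present.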

### Use Cases

#### Regulatory Intelligence
Track medicine status changes, approval timelines, and updates for compliance and policy teams.

#### Portfolio Monitoring
Monitor medicine records by product name, INN, and therapeutic area for internal reporting.

#### Research Pipelines
Feed downstream BI tools and custom analytics with structured, machine-readable medicine records.

#### Competitive Benchmarking
Compare categories, authorisation timelines, and medicine status patterns across products.

### Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `url` | String | No | `https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq` | Optional EMA URL (search URL, medicine page URL, or JSON report URL). |
| `keyword` | String | No | `nuwiq` | Optional keyword filter. If provided, this takes priority over URL-derived keyword values. |
| `results_wanted` | Integer | No | `20` | Maximum number of records to return. |
| `max_pages` | Integer | No | `5` | Page cap used for slicing (20 records per page). |
| `proxyConfiguration` | Object | No | `{ "useApifyProxy": false }` | Proxy settings for restricted environments. |
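
To estimate how many records a run can return, the two limits can be combined. The sketch below assumes that the smaller of `results_wanted` and `max_pages × 20` wins; the table states the page size of 20, but the exact way the Actor combines the two caps is an assumption, and `effective_limit` is a hypothetical helper.

```python
RECORDS_PER_PAGE = 20  # page size stated in the `max_pages` description


def effective_limit(results_wanted: int = 20, max_pages: int = 5) -> int:
    """Upper bound on returned records, assuming both caps apply together."""
    if results_wanted < 1 or max_pages < 1:
        raise ValueError("results_wanted and max_pages must be at least 1")
    return min(results_wanted, max_pages * RECORDS_PER_PAGE)


print(effective_limit())         # defaults from the table
print(effective_limit(500, 5))   # capped by max_pages under this assumption
```

Under this assumption, asking for 500 results with `max_pages: 5` would yield at most 100 records, so raise both values together for larger runs.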

### Output Data

Each item in the dataset can include the following fields:

| Field | Type | Description |
|---|---|---|
| `name_of_medicine` | String | Medicine name |
| `category` | String | Human or Veterinary |
| `medicine_status` | String | Current medicine status |
| `international_non_proprietary_name_common_name` | String | INN/common name |
| `therapeutic_area_mesh` | String | Therapeutic area |
| `marketing_authorisation_date` | String | Marketing authorisation date |
| `last_updated_date` | String | Last update date |
| `ema_product_number` | String | EMA product number |
| `medicine_url` | String | Official EMA medicine page |
| `source_api_url` | String | Source feed URL used during extraction |
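
For typed Python consumers, the fields above could be modelled as a `TypedDict`. This is an illustrative sketch: the class name `MedicineRecord` and the helper `summarize` are hypothetical, and every key is marked optional on the assumption that null and empty values are omitted from dataset items.

```python
from typing import TypedDict


class MedicineRecord(TypedDict, total=False):
    """One dataset item; all keys are optional (total=False)."""

    name_of_medicine: str
    category: str
    medicine_status: str
    international_non_proprietary_name_common_name: str
    therapeutic_area_mesh: str
    marketing_authorisation_date: str
    last_updated_date: str
    ema_product_number: str
    medicine_url: str
    source_api_url: str


def summarize(record: MedicineRecord) -> str:
    """Build a short label, falling back to 'unknown' for missing fields."""
    name = record.get("name_of_medicine", "unknown")
    status = record.get("medicine_status", "unknown")
    return f"{name}: {status}"


print(summarize({"name_of_medicine": "Nuwiq", "medicine_status": "Authorised"}))
```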

### Usage Examples

#### Basic run with defaults

```json
{}
````

#### Keyword-based extraction

```json
{
    "keyword": "breast neoplasms",
    "results_wanted": 30,
    "max_pages": 3
}
```

#### Start from an EMA search URL

```json
{
    "url": "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq&page=0",
    "results_wanted": 10
}
```

#### Specific medicine page URL

```json
{
    "url": "https://www.ema.europa.eu/en/medicines/human/EPAR/nuwiq"
}
```

### Sample Output

```json
{
    "category": "Human",
    "name_of_medicine": "Nuwiq",
    "ema_product_number": "EMEA/H/C/002813",
    "medicine_status": "Authorised",
    "international_non_proprietary_name_common_name": "simoctocog alfa",
    "therapeutic_area_mesh": "Hemophilia A",
    "marketing_authorisation_date": "22/07/2014",
    "last_updated_date": "21/05/2026",
    "medicine_url": "https://www.ema.europa.eu/en/medicines/human/EPAR/nuwiq",
    "source_api_url": "https://www.ema.europa.eu/en/documents/report/medicines-output-medicines_json-report_en.json"
}
```
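
The date fields in the sample arrive as strings. A minimal sketch for converting them, assuming every date follows the `DD/MM/YYYY` pattern seen in the sample (the helper name `parse_ema_date` is hypothetical):

```python
from datetime import date, datetime


def parse_ema_date(value: str) -> date:
    """Parse a date string such as '22/07/2014' (day/month/year)."""
    return datetime.strptime(value, "%d/%m/%Y").date()


authorised = parse_ema_date("22/07/2014")
print(authorised.isoformat())
```

Converting to `date` objects makes it possible to sort records or compute authorisation timelines without string comparisons. `strptime` raises `ValueError` if a value does not match the assumed pattern.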

### Tips for Best Results

#### Use keyword for focused extraction

Use specific medicine or therapeutic terms to reduce dataset size and improve relevance.

#### Control run size with limits

Start with `results_wanted: 20` to validate output quickly, then increase for production runs.

#### Prefer explicit URLs when needed

Use a medicine page URL to target one medicine or a search URL to carry query context.

### Integrations

Connect extracted data with:

- **Google Sheets** — Build tracking sheets
- **Airtable** — Create searchable medicine databases
- **Make** — Automate medicine monitoring workflows
- **Zapier** — Trigger downstream notifications and actions
- **Webhooks** — Push results to internal systems

Export formats available from the dataset:

- **JSON**
- **CSV**
- **Excel**
- **XML**
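
These formats can also be requested through the Apify API's dataset items endpoint by passing a `format` query parameter. The sketch below only builds the URL; the helper name `dataset_export_url` is hypothetical, and the dataset ID is a placeholder. Private datasets may additionally require an API token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Format identifiers for the four export formats listed above.
EXPORT_FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Return the dataset items URL for the requested export format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"


print(dataset_export_url("<DATASET_ID>", "csv"))
```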

### Frequently Asked Questions

#### Does user input override defaults?

Yes. Values provided in the run input always override schema prefills and local defaults.

#### What if both `url` and `keyword` are provided?

`keyword` is used as the primary text filter. URL values still help with page offset and URL-specific targeting.
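
The precedence described here can be illustrated with a short sketch. It is not the Actor's implementation: `resolve_keyword` is a hypothetical helper, and reading the keyword from the `search_api_fulltext` query parameter is an assumption based on the example search URLs in this README.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def resolve_keyword(url: Optional[str] = None, keyword: Optional[str] = None) -> Optional[str]:
    """Pick the text filter: an explicit keyword wins over a URL-derived one."""
    if keyword and keyword.strip():
        return keyword.strip()
    if url:
        # Assumption: EMA search URLs carry the query in `search_api_fulltext`.
        values = parse_qs(urlparse(url).query).get("search_api_fulltext", [])
        if values and values[0].strip():
            return values[0].strip()
    return None


search_url = "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq&page=0"
print(resolve_keyword(url=search_url))
print(resolve_keyword(url=search_url, keyword="breast neoplasms"))
```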

#### Can I run without any input?

Yes. Defaults are provided for QA and quick start, and produce non-empty output.

# Actor input Schema

## `url` (type: `string`):

Optional EMA URL. Supports search URLs, medicine detail URLs, or the JSON report URL.

## `keyword` (type: `string`):

Optional text filter. If provided, this takes priority over keyword values parsed from URL.

## `results_wanted` (type: `integer`):

Maximum number of records to return.

## `max_pages` (type: `integer`):

Page cap for result slicing (20 records per page).

## `proxyConfiguration` (type: `object`):

Apify proxy settings.

## Actor input object example

```json
{
  "url": "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq",
  "keyword": "nuwiq",
  "results_wanted": 20,
  "max_pages": 5,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "url": "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq",
    "keyword": "nuwiq",
    "results_wanted": 20,
    "max_pages": 5
};

// Run the Actor and wait for it to finish
const run = await client.actor("shahidirfan/ema-medicines-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "url": "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq",
    "keyword": "nuwiq",
    "results_wanted": 20,
    "max_pages": 5,
}

# Run the Actor and wait for it to finish
run = client.actor("shahidirfan/ema-medicines-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "url": "https://www.ema.europa.eu/en/search?search_api_fulltext=nuwiq",
  "keyword": "nuwiq",
  "results_wanted": 20,
  "max_pages": 5
}' |
apify call shahidirfan/ema-medicines-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=shahidirfan/ema-medicines-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "EMA Medicines Scraper 💊",
        "description": "Scrape European Medicines Agency data for drug approvals, clinical trials & pharmaceutical information. Extract EMA medicines, regulatory documents & authorization data at scale. Perfect for pharma research & compliance.",
        "version": "0.0",
        "x-build-id": "euZc8euggEOcWeuWc"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/shahidirfan~ema-medicines-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-shahidirfan-ema-medicines-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/shahidirfan~ema-medicines-scraper/runs": {
            "post": {
                "operationId": "runs-sync-shahidirfan-ema-medicines-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/shahidirfan~ema-medicines-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-shahidirfan-ema-medicines-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "url": {
                        "title": "URL",
                        "type": "string",
                        "description": "Optional EMA URL. Supports search URLs, medicine detail URLs, or the JSON report URL."
                    },
                    "keyword": {
                        "title": "Keyword",
                        "type": "string",
                        "description": "Optional text filter. If provided, this takes priority over keyword values parsed from URL."
                    },
                    "results_wanted": {
                        "title": "Results wanted",
                        "minimum": 1,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of records to return.",
                        "default": 20
                    },
                    "max_pages": {
                        "title": "Max pages",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Page cap for result slicing (20 records per page).",
                        "default": 5
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy settings.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
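
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be wrapped with the standard library alone. The sketch below only constructs the request object and sends nothing; `build_run_request` is a hypothetical helper, and the token is a placeholder.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "shahidirfan~ema-medicines-scraper"


def build_run_request(token: str, actor_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns dataset items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")


request = build_run_request("<YOUR_API_TOKEN>", {"keyword": "nuwiq", "results_wanted": 20})
print(request.get_method(), request.full_url)
```

To execute the run, pass the request to `urllib.request.urlopen` and decode the JSON response body. Synchronous runs hold the connection open until the Actor finishes, so set a client timeout that fits the expected run size.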
