# ChEMBL Compounds Scraper (`parseforge/chembl-compounds-scraper`) Actor

Browse the ChEMBL bioactive molecule catalogue by max clinical phase from preclinical through approved drugs. Returns molecule identifiers, molecular weight, standard InChI, and structural data. Paginate by molregno. Useful for drug discovery, cheminformatics, and pharma research.

- **URL**: https://apify.com/parseforge/chembl-compounds-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Automation, Integrations
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $7.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
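
The per-result arithmetic can be sketched as follows. This is a rough estimate only: it assumes a flat rate equal to the listed "from $7.50 / 1,000 results" figure and ignores any other billable events or plan-specific rates. `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
def estimate_cost_usd(n_results: int, price_per_1000_usd: float = 7.50) -> float:
    """Rough cost of n_results at a flat per-1,000-results rate (assumed)."""
    if n_results < 0:
        raise ValueError("n_results must be non-negative")
    return n_results * price_per_1000_usd / 1000


if __name__ == "__main__":
    # Print the estimate for a few illustrative result counts.
    for n in (10, 1_000, 50_000):
        print(f"{n} results -> ${estimate_cost_usd(n):.2f}")
```

Pass your own `price_per_1000_usd` if your plan's rate differs from the listed starting price.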

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## ⚗️ ChEMBL Compounds Scraper

> 🚀 **Export ChEMBL chemical compounds in seconds. ChEMBL IDs, names, molecular formulas, weights, SMILES, InChI, and clinical phases — direct from the public ChEMBL REST API.**

> 🕒 **Last updated.** 2026-06-05 · **📊 12 fields** per record · more than two million bioactive molecules curated by ChEMBL at EBI · Public API · No login required

The ChEMBL Compounds Scraper turns the [ChEMBL REST API](https://www.ebi.ac.uk/chembl) public endpoint into a clean, structured dataset. It queries the source live, normalizes the response into one row per record, and pushes the result into an Apify dataset you can download or pipe to your warehouse.

ChEMBL at EBI curates more than two million bioactive molecules. A single run returns up to `maxItems` of them (at most 1,000,000), with stable field names and null-safe parsing.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| 💊 Medicinal chemists | Inventory bioactive compound space |
| 🧪 Drug discovery | Filter by clinical phase for repurposing |
| 🎓 Educators | Build cheminformatics teaching sets |
| 🤖 ML teams | Train molecular property predictors |

### 📋 What the ChEMBL Compounds Scraper does

- Calls the public ChEMBL REST API endpoint with the parameters you supply.
- Parses the response and flattens each record into a single dataset row.
- Casts numeric fields to numbers where applicable for clean spreadsheet imports.
- Surfaces rate-limit or upstream errors as a single-row `error` record instead of crashing.
- Exports to every Apify dataset format supported in the UI.

> 💡 **Why it matters.** The raw ChEMBL REST API response is great for API consumers but awkward for spreadsheets and BI tools. This Actor normalizes the shape so the data drops straight into pandas, BigQuery, or a Google Sheet.

### 🎬 Full Demo

_🚧 Coming soon._

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Required</th><th>Description</th></tr>
<tr><td><code>maxPhase</code></td><td>string</td><td>No</td><td>Filter by maximum clinical trial phase reached. Allowed values: <code>""</code>, <code>"-1"</code>, <code>"0"</code>, <code>"1"</code>, <code>"2"</code>, <code>"3"</code>, <code>"4"</code>.</td></tr>
<tr><td><code>molRegnoMin</code></td><td>integer</td><td>No</td><td>Optional lower bound on the ChEMBL molregno (internal numeric ID). Use to paginate through the full catalog.</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td>No</td><td>Maximum number of records to return. Free users: 10. Paid users: up to 1,000,000. Prefill: 10.</td></tr>
</table>

**Example 1.**
```json
{
  "maxPhase": "-1",
  "molRegnoMin": 1,
  "maxItems": 10
}
````

**Example 2.**

```json
{
  "maxPhase": "-1",
  "molRegnoMin": 1,
  "maxItems": 50
}
```
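
Before starting a run, the input can be checked locally against the constraints published in the OpenAPI specification further down (the `maxPhase` enum and the `maxItems` range of 1 to 1,000,000). The helper below is a minimal sketch of that check; `build_input` and its parameter names are hypothetical, and the Actor itself may validate differently.

```python
# Allowed values copied from the inputSchema enum in the OpenAPI specification.
ALLOWED_MAX_PHASE = {"", "-1", "0", "1", "2", "3", "4"}
MAX_ITEMS_LIMIT = 1_000_000


def build_input(max_phase=None, mol_regno_min=None, max_items=None) -> dict:
    """Return an Actor input dict, raising ValueError on out-of-schema values."""
    run_input = {}
    if max_phase is not None:
        # maxPhase is a string in the schema, so the integer 4 is rejected.
        if max_phase not in ALLOWED_MAX_PHASE:
            raise ValueError(f"maxPhase must be one of {sorted(ALLOWED_MAX_PHASE)}")
        run_input["maxPhase"] = max_phase
    if mol_regno_min is not None:
        if isinstance(mol_regno_min, bool) or not isinstance(mol_regno_min, int):
            raise ValueError("molRegnoMin must be an integer")
        run_input["molRegnoMin"] = mol_regno_min
    if max_items is not None:
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            raise ValueError("maxItems must be an integer")
        if not 1 <= max_items <= MAX_ITEMS_LIMIT:
            raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
        run_input["maxItems"] = max_items
    return run_input


if __name__ == "__main__":
    print(build_input(max_phase="4", mol_regno_min=1, max_items=50))
```

The returned dict can be passed as the run input to the Python client shown in the API section.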

> ⚠️ **Good to Know.** This Actor calls the public ChEMBL REST API endpoint with no authentication required. Upstream rate limits apply; if the source returns a limit notice, you will see it as a single `error` record in your dataset.

### 📊 Output

Each record is a flat object. `error` is always last.

| Field | Type | Description |
|---|---|---|
| 🔹 `chembl_id` | string | ChEMBL identifier of the molecule. |
| 🔹 `pref_name` | string | Preferred name of the molecule. |
| 🔹 `molecular_formula` | string | Molecular formula. |
| 🔹 `mw` | string | Molecular weight. |
| 🔹 `canonical_smiles` | string | Canonical SMILES string. |
| 🔹 `inchi` | string | Standard InChI. |
| 🔹 `max_phase` | string | Maximum clinical phase reached. |
| 🔹 `indication_class` | string | Indication class, as reported by ChEMBL. |
| 🔹 `atc_classifications` | string | ATC classifications, as reported by ChEMBL. |
| 🔹 `structure_type` | string | Structure type, as reported by ChEMBL. |
| 🔹 `scrapedAt` | string | Time the record was scraped. |
| 🔹 `error` | string | Set if the upstream response was an error or rate-limit. |

**Sample record.**

```json
{
  "chembl_id": "sample_chembl_id",
  "pref_name": "sample_pref_name",
  "molecular_formula": "sample_molecular_formula",
  "mw": "sample_mw",
  "canonical_smiles": "sample_canonical_smiles",
  "inchi": "sample_inchi",
  "max_phase": "sample_max_phase",
  "indication_class": "sample_indication_class",
  "atc_classifications": "sample_atc_classifications",
  "structure_type": "sample_structure_type",
  "scrapedAt": "sample_scrapedAt",
  "error": null
}
```
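
Because upstream failures arrive as ordinary dataset rows with a populated `error` field, downstream code usually wants to separate them from real records before analysis. A minimal sketch, assuming items are plain dicts shaped like the sample record above (`split_records` is a hypothetical helper name):

```python
def split_records(items):
    """Split dataset items into (records, errors) based on the `error` field."""
    records, errors = [], []
    for item in items:
        # A missing, null, or empty `error` value is treated as a normal record.
        if item.get("error"):
            errors.append(item)
        else:
            records.append(item)
    return records, errors


if __name__ == "__main__":
    sample = [
        {"chembl_id": "sample_chembl_id", "error": None},
        {"error": "rate limited"},
    ]
    records, errors = split_records(sample)
    print(len(records), "records,", len(errors), "errors")
```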

### ✨ Why choose this Actor

| | Benefit |
|---|---|
| 🆓 | Works with the public ChEMBL REST API endpoint. No API key, no signup. |
| 🧹 | Clean field names, ready for BI tools. |
| 🔢 | Numeric strings cast to real numbers where it makes sense. |
| 🛟 | Upstream errors and rate limits surface as a clean `error` record. |
| 🔌 | One-click export to every Apify dataset format. |
| 💾 | Push to dataset, then pipe to BigQuery, Snowflake, Postgres, or Google Sheets. |
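
For spreadsheet or BI imports done outside the Apify UI, the flat records can be written to CSV with a fixed column order. The sketch below uses only the Python standard library and takes the column names from the Output table; `records_to_csv` is a hypothetical helper, and the Actor's own CSV export may order or format columns differently.

```python
import csv
import io

# Column order taken from the Output table, with `error` last.
FIELDS = [
    "chembl_id", "pref_name", "molecular_formula", "mw",
    "canonical_smiles", "inchi", "max_phase", "indication_class",
    "atc_classifications", "structure_type", "scrapedAt", "error",
]


def records_to_csv(records) -> str:
    """Serialize flat record dicts to CSV text with a stable header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=FIELDS,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # unexpected keys are dropped, not fatal
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    print(records_to_csv([{"chembl_id": "sample_chembl_id", "error": None}]))
```

Null values are written as empty cells, which matches how most spreadsheet tools expect missing data.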

### 📈 How it compares to alternatives

| Approach | Setup time | Clean shape | Pagination | Error handling |
|---|---|---|---|---|
| Roll your own `fetch` | 30 min + | ❌ | manual | manual |
| Copy-paste from the browser | 5 min, fragile | ❌ | ❌ | ❌ |
| **This Actor** | 5 sec, no install | ✅ | ✅ | ✅ |

### 🚀 How to use

1. Click **Try for free**.
2. Fill in the input (or leave defaults).
3. Click **Start**.
4. Within seconds, the dataset is ready for download or integration.

### 💼 Business use cases

**📊 Analytics.** Pipe records into your warehouse and join against internal data for cross-source dashboards.

**🤖 Automation.** Trigger this Actor on a schedule, then push results to Slack, Airtable, or Google Sheets.

**🧪 Research.** Snapshot the public state of ChEMBL REST API on a date and archive it for reproducible studies.

**📰 Editorial.** Verify quotes, numbers, or records cited in stories with a one-click fresh pull.

### 🔌 Automating ChEMBL Compounds Scraper

- **Make / Zapier**. Trigger this Actor on a schedule, push results to Slack, Airtable, Google Sheets, or anywhere else.
- **Cron schedule**. Use the native Apify scheduler to run on any cadence.
- **Webhooks**. Get a POST to your endpoint the moment a run finishes.
- **Pipe to BigQuery / Snowflake / Postgres**. Native Apify integrations move datasets straight into your warehouse.
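
On the receiving side of a webhook, the handler typically needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload shape (a JSON object with `eventType` and a `resource` holding the run, including `defaultDatasetId`); if you customize the payload template, the keys will differ. `dataset_id_from_webhook` is a hypothetical helper name.

```python
import json


def dataset_id_from_webhook(body: str):
    """Return the run's dataset ID for a successful-run event, else None."""
    payload = json.loads(body)
    # Only act on successful runs; other event types are ignored here.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


if __name__ == "__main__":
    example = json.dumps({
        "eventType": "ACTOR.RUN.SUCCEEDED",
        "resource": {"defaultDatasetId": "<DATASET_ID>"},
    })
    print(dataset_id_from_webhook(example))
```

The returned ID can then be passed to `client.dataset(...)` as in the Python example in the API section.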

### 🌟 Beyond business use cases

**🎓 Education.** Build classroom datasets without paying for a commercial feed.

**🧪 Personal research.** Track changes in the source over time on your own schedule.

**🤝 Non-profit and open data.** Build public dashboards without writing client code.

**🧰 Tinkering and prototyping.** Wire up a fresh data feed in seconds to test a new chart or model.

### 🤖 Ask an AI assistant about this scraper

Pop this README into ChatGPT, Claude, or any AI assistant and ask it to map your specific workflow to the Actor's inputs. The schema, examples, and field list above contain everything an LLM needs to design a working pipeline.

### ❓ Frequently Asked Questions

**❓ Do I need an API key?** No. This Actor calls the public ChEMBL REST API endpoint with no authentication required.

**❓ Is there a rate limit?** The upstream source may rate-limit aggressive use. If you hit a limit, the Actor pushes a single `error` record rather than crashing.

**❓ Which formats can I download?** Every format Apify's dataset UI supports.

**❓ Are values cast to numbers?** Where the source returns numeric strings for numeric fields, yes.

**❓ How do you handle upstream errors?** A single record with a populated `error` field is pushed, then the Actor exits cleanly.

**❓ Can I schedule runs?** Yes. Use Apify's native scheduler, Make, Zapier, or cron.

**❓ Is this scraping or API?** API. The ChEMBL REST API endpoint is fully public; this Actor only normalizes the response.

**❓ Will the schema change?** Core fields are stable. Optional fields surface as null when the source omits them.

**❓ How fresh is the data?** Each run hits the live endpoint, so the data is as fresh as the source allows.

**❓ Can I filter the output?** Yes. The input fields above let you narrow the result set before it lands in your dataset.

### 🔌 Integrate with any app

Apify ships native integrations with Make, Zapier, Slack, Discord, Google Drive, Google Sheets, Gmail, Airbyte, Keboola, Telegram, GitHub, and any REST API or webhook endpoint. Trigger runs from a calendar event, a form submission, a cron job, or pipe results straight into BigQuery, Snowflake, or a Postgres warehouse.

### 🔗 Recommended Actors

| Actor | What it does |
|---|---|
| [ParseForge OurAirports Scraper](https://apify.com/parseforge/ourairports-scraper) | Global airport database. |
| [ParseForge Alpha Vantage Scraper](https://apify.com/parseforge) | Stocks, FX, crypto, and indicators. |
| [ParseForge CurseForge Mods Scraper](https://apify.com/parseforge/curseforge-mods-scraper) | Public mod metadata from CurseForge. |
| [ParseForge NBA Stats Scraper](https://apify.com/parseforge/nba-stats-scraper) | Player and team stats from NBA.com. |

> 💡 **Pro Tip.** Browse the complete [ParseForge collection](https://apify.com/parseforge) for 900+ production-grade scrapers across business intelligence, real estate, e-commerce, sports, finance, and public records.

***

**Disclaimer.** This Actor scrapes only publicly available data. ParseForge is not affiliated with, endorsed by, or sponsored by any of the third-party services referenced. Users are responsible for complying with the target site's terms of service and applicable law. [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp).

# Actor input Schema

## `maxPhase` (type: `string`):

Filter by maximum clinical trial phase reached.

## `molRegnoMin` (type: `integer`):

Optional lower bound on the ChEMBL molregno (internal numeric ID). Use to paginate through the full catalog.

## `maxItems` (type: `integer`):

Free users. Limited to 10 items (preview). Paid users. Optional, max 1,000,000.

## Actor input object example

```json
{
  "maxItems": 10
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/chembl-compounds-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "maxItems": 10 }

# Run the Actor and wait for it to finish
run = client.actor("parseforge/chembl-compounds-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "maxItems": 10
}' |
apify call parseforge/chembl-compounds-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/chembl-compounds-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ChEMBL Compounds Scraper",
        "description": "Browse the ChEMBL bioactive molecule catalogue by max clinical phase from preclinical through approved drugs. Returns molecule identifiers, molecular weight, standard InChI, and structural data. Paginate by molregno. Useful for drug discovery, cheminformatics, and pharma research.",
        "version": "0.1",
        "x-build-id": "jyyrKxTZnQgMb9B1z"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~chembl-compounds-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-chembl-compounds-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~chembl-compounds-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-chembl-compounds-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~chembl-compounds-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-chembl-compounds-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "maxPhase": {
                        "title": "Max Phase",
                        "enum": [
                            "",
                            "-1",
                            "0",
                            "1",
                            "2",
                            "3",
                            "4"
                        ],
                        "type": "string",
                        "description": "Filter by maximum clinical trial phase reached."
                    },
                    "molRegnoMin": {
                        "title": "Mol Regno Min",
                        "type": "integer",
                        "description": "Optional lower bound on the ChEMBL molregno (internal numeric ID). Use to paginate through the full catalog."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users. Limited to 10 items (preview). Paid users. Optional, max 1,000,000."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
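
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be exercised with the Python standard library alone. This is a minimal sketch: `build_request` and `fetch_items` are hypothetical helper names, and it assumes the endpoint returns the dataset items as a JSON body, which the specification only documents as `200 OK`.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path copied from the OpenAPI specification.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~chembl-compounds-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request: token in the query string, input as JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(token: str, run_input: dict):
    """Send the request and decode the response body as JSON (assumed format)."""
    with urllib.request.urlopen(build_request(token, run_input)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace '<YOUR_API_TOKEN>' with your token before running.
    for item in fetch_items("<YOUR_API_TOKEN>", {"maxItems": 10}):
        print(item)
```

Keeping request construction separate from sending makes the URL and body easy to inspect without spending a run.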
