# ChEMBL Molecules Scraper (`parseforge/chembl-molecules-scraper`) Actor

Scrape molecules from EBI ChEMBL public API including SMILES, InChI, molecular properties (MW, logP, HBA, HBD, PSA, RTB), max phase, ATC classifications, oral/parenteral/topical flags, first approval, black box warning, prodrug and withdrawn flag. No API key required.

- **URL**: https://apify.com/parseforge/chembl-molecules-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $28.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
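
As a rough worked example, the listed starting rate of $28.50 per 1,000 results comes to $0.0285 per result. The sketch below treats that "from" rate as a flat per-result price, which is an assumption: actual charges depend on your subscription plan and on which events the Actor bills. The helper name `estimate_cost_usd` is illustrative and not part of any Apify API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: the listed "from" rate is applied as a flat price per result.
RATE_PER_1000_USD = Decimal("28.50")


def estimate_cost_usd(results: int, rate_per_1000: Decimal = RATE_PER_1000_USD) -> Decimal:
    """Return the estimated charge in USD, rounded to whole cents."""
    if results < 0:
        raise ValueError("results must be non-negative")
    raw = rate_per_1000 * results / 1000
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost_usd(1_000))  # one full pricing unit
print(estimate_cost_usd(2_500))  # two and a half pricing units
```

Using `Decimal` rather than `float` keeps the cent rounding exact, which matters when the estimate feeds a budget check.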

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/raw/main/banner.jpg)

## 🧪 ChEMBL Bioactive Molecules Scraper

> 🚀 **Export ChEMBL drug discovery data in seconds.** Pull **2.5 million+ bioactive molecules** with SMILES, InChI, ATC codes, clinical phase, and approval status. No API key, no registration, no manual REST stitching.

> 🕒 **Last updated:** 2026-05-13 · **📊 17 fields** per record · **💊 2.5M+ molecules** · **🧬 9 molecule types** · **🌐 EBI public API**

The **ChEMBL Molecules Scraper** queries the EBI ChEMBL public REST API and returns **17 fields per molecule**, including the canonical ChEMBL ID, preferred name, molecule type, max clinical phase, full structure descriptors (canonical SMILES, InChI, InChI Key), calculated molecular properties (molecular weight, LogP, hydrogen-bond donors and acceptors, polar surface area, rotatable bonds, Lipinski Rule of Five violations), ATC classifications, route of administration flags, first-approval year, and withdrawn status. ChEMBL is maintained by the European Bioinformatics Institute and is one of the largest manually curated databases of bioactive molecules in drug discovery.

The catalog covers **small molecules, antibodies, enzymes, proteins, oligonucleotides, oligosaccharides, cells, genes, and unknowns**, totalling more than 2.5 million entries. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. The molecule type filter runs server-side, so antibody-only or small-molecule-only exports are fast.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Cheminformaticians, drug discovery scientists, computational chemists, pharma data teams, ML researchers, bioinformaticians, academic labs, regulatory analysts | QSAR datasets, virtual screening libraries, ADMET feature tables, ATC mapping, clinical-phase tracking, approved-drug audits, withdrawn-drug watchlists |

---

### 📋 What the ChEMBL Molecules Scraper does

Three workflows in a single run:

- 🔎 **Full-text query.** Substring match across molecule names and synonyms (e.g. `aspirin`, `imatinib`, `bevacizumab`).
- 🧬 **Type filter.** Server-side filter on `molecule_type`. Pick from small molecule, antibody, enzyme, protein, oligonucleotide, oligosaccharide, cell, gene, or unknown.
- 📜 **Paginated catalog dump.** Leave both filters empty to walk the entire ChEMBL catalog by offset.

Each record returns the canonical ChEMBL ID, the public explorer URL, the structure block (SMILES, InChI, InChI Key, molfile) when present, the property block (MW, LogP, HBA, HBD, PSA, RTB, full MWT, Rule-of-Five violations), the molecule hierarchy (active / parent / salt), the ATC classifications array, administration route flags (oral, parenteral, topical), the black-box-warning flag, the first-approval year, the withdrawn flag, and the prodrug flag.

> 💡 **Why it matters:** ChEMBL underpins most modern drug discovery pipelines. Building your own REST pagination, retry logic, and field selection means a week of plumbing. This Actor returns clean, joined records on every run.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded molecule dataset._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>query</code></td><td>string</td><td><code>"aspirin"</code></td><td>Substring text search across molecule names and synonyms. Empty = list all by offset.</td></tr>
<tr><td><code>moleculeType</code></td><td>string</td><td><code>""</code></td><td>One of 9 ChEMBL molecule types (Small molecule, Antibody, Cell, Enzyme, Gene, Oligonucleotide, Oligosaccharide, Protein, Unknown). Empty = all.</td></tr>
</tbody>
</table>

**Example: 50 antibodies (server-side type filter).**

```json
{
    "maxItems": 50,
    "moleculeType": "Antibody"
}
````

**Example: text query for every molecule whose name or synonym contains `imatinib`.**

```json
{
    "maxItems": 25,
    "query": "imatinib"
}
```

> ⚠️ **Good to Know:** antibodies, proteins, and cells have no SMILES or InChI because they are macromolecules. The `molecule_structures` and `molecule_properties` blocks are omitted for these types, so the record stays clean. Small molecules return the full property block. ChEMBL `max_phase` follows the convention `4` = approved, `3` = phase III, `2` = phase II, `1` = phase I, `0.5` = early phase I, `-1` = clinical phase unknown, `null` = preclinical (no clinical phase recorded).
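
Because the structure block can be missing, downstream code should not index into it directly. The following is a minimal sketch of defensive accessors, assuming records are plain dictionaries shaped like this Actor's output; the helper names `canonical_smiles` and `is_approved` are illustrative, and the sample records are hypothetical and abbreviated.

```python
from typing import Any, Optional


def canonical_smiles(record: dict[str, Any]) -> Optional[str]:
    """Return the canonical SMILES, or None when the structure block is absent."""
    structures = record.get("molecule_structures") or {}
    return structures.get("canonical_smiles")


def is_approved(record: dict[str, Any]) -> bool:
    """True when max_phase equals 4, the value documented as approved."""
    phase = record.get("max_phase")
    return phase is not None and float(phase) == 4.0


# Hypothetical records, trimmed to the fields used here.
records = [
    {"pref_name": "ADALIMUMAB", "molecule_type": "Antibody", "max_phase": 4},
    {"pref_name": "ASPIRIN", "molecule_type": "Small molecule", "max_phase": 4,
     "molecule_structures": {"canonical_smiles": "CC(=O)Oc1ccccc1C(=O)O"}},
    {"pref_name": None, "molecule_type": "Small molecule", "max_phase": None},
]

for rec in records:
    print(rec["pref_name"], is_approved(rec), canonical_smiles(rec))
```

The `or {}` fallback covers both a missing key and an explicit `null` value, so the same accessor works for macromolecules and small molecules.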

***

### 📊 Output

Each molecule record contains up to **17 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `molecule_chembl_id` | string | `"CHEMBL1201580"` |
| 🔗 `url` | string | `"https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201580"` |
| 🏷️ `pref_name` | string \| null | `"ADALIMUMAB"` |
| 🧬 `molecule_type` | string \| null | `"Antibody"` |
| 🎯 `max_phase` | number \| null | `4` |
| 🧪 `molecule_structures` | object | `{ canonical_smiles, standard_inchi, standard_inchi_key, molfile }` |
| 📐 `molecule_properties` | object | `{ mw_freebase, alogp, hba, hbd, psa, rtb, full_mwt, num_ro5_violations }` |
| 🌳 `molecule_hierarchy` | object \| null | `{ active_chembl_id, parent_chembl_id, molecule_chembl_id }` |
| 🏥 `atc_classifications` | string\[] | `["L04AB04"]` |
| 💊 `indication_class` | string | `"Antineoplastic"` |
| 👄 `oral` | boolean \| null | `false` |
| 💉 `parenteral` | boolean \| null | `true` |
| 🧴 `topical` | boolean \| null | `false` |
| ⚠️ `black_box_warning` | number \| null | `1` |
| 📅 `first_approval` | number \| null | `2002` |
| 🚫 `withdrawn_flag` | boolean \| null | `false` |
| 🧬 `prodrug` | number \| null | `0` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-13T22:26:22.480Z"` |
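
For pipelines that consume the JSON output directly, the nested `molecule_structures` and `molecule_properties` objects usually need flattening before they fit a tabular store. The sketch below shows one way to do that with the standard library. The column selection, the `;` separator for ATC codes, and the function names `flatten` and `to_csv` are all assumptions made for illustration.

```python
import csv
import io
from typing import Any, Iterable

# Assumption: an illustrative subset of columns; extend as needed.
COLUMNS = ["molecule_chembl_id", "pref_name", "molecule_type", "max_phase",
           "canonical_smiles", "mw_freebase", "alogp", "atc_classifications",
           "first_approval", "withdrawn_flag"]


def flatten(record: dict[str, Any]) -> dict[str, Any]:
    """Lift selected nested fields to the top level; missing blocks become None."""
    structures = record.get("molecule_structures") or {}
    properties = record.get("molecule_properties") or {}
    row = {name: record.get(name) for name in COLUMNS}
    row["canonical_smiles"] = structures.get("canonical_smiles")
    row["mw_freebase"] = properties.get("mw_freebase")
    row["alogp"] = properties.get("alogp")
    row["atc_classifications"] = ";".join(record.get("atc_classifications") or [])
    return row


def to_csv(records: Iterable[dict[str, Any]]) -> str:
    """Serialize flattened records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(flatten(record))
    return buffer.getvalue()


# Fields taken from the ADALIMUMAB sample record in this README.
sample = {
    "molecule_chembl_id": "CHEMBL1201580",
    "pref_name": "ADALIMUMAB",
    "molecule_type": "Antibody",
    "max_phase": 4,
    "atc_classifications": ["L04AB04"],
    "first_approval": 2002,
    "withdrawn_flag": False,
}
print(to_csv([sample]))
```

The `csv` module writes `None` as an empty cell, so macromolecule rows simply leave the structure and property columns blank.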

#### 📦 Sample records

<details>
<summary><strong>💉 Approved monoclonal antibody: ADALIMUMAB (CHEMBL1201580)</strong></summary>

```json
{
    "molecule_chembl_id": "CHEMBL1201580",
    "url": "https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201580",
    "pref_name": "ADALIMUMAB",
    "molecule_type": "Antibody",
    "max_phase": 4,
    "molecule_hierarchy": {
        "active_chembl_id": "CHEMBL1201580",
        "molecule_chembl_id": "CHEMBL1201580",
        "parent_chembl_id": "CHEMBL1201580"
    },
    "atc_classifications": ["L04AB04"],
    "oral": false,
    "parenteral": true,
    "topical": false,
    "black_box_warning": 1,
    "first_approval": 2002,
    "withdrawn_flag": false,
    "prodrug": 0,
    "scrapedAt": "2026-05-13T22:26:22.480Z"
}
```

</details>

<details>
<summary><strong>🛑 Withdrawn drug: EFALIZUMAB (CHEMBL1201575)</strong></summary>

```json
{
    "molecule_chembl_id": "CHEMBL1201575",
    "url": "https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201575",
    "pref_name": "EFALIZUMAB",
    "molecule_type": "Antibody",
    "max_phase": 4,
    "atc_classifications": ["L04AG02"],
    "oral": false,
    "parenteral": true,
    "topical": false,
    "black_box_warning": 0,
    "first_approval": 2003,
    "withdrawn_flag": true,
    "prodrug": 0,
    "scrapedAt": "2026-05-13T22:26:22.480Z"
}
```

</details>

<details>
<summary><strong>🎯 Oncology biologic: BEVACIZUMAB (CHEMBL1201583)</strong></summary>

```json
{
    "molecule_chembl_id": "CHEMBL1201583",
    "url": "https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201583",
    "pref_name": "BEVACIZUMAB",
    "molecule_type": "Antibody",
    "max_phase": 4,
    "atc_classifications": ["L01FG01", "S01LA08"],
    "oral": false,
    "parenteral": true,
    "topical": false,
    "black_box_warning": 0,
    "first_approval": 2004,
    "withdrawn_flag": false,
    "prodrug": 0,
    "scrapedAt": "2026-05-13T22:26:22.480Z"
}
```

</details>

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🧪 | **Massive coverage.** 2.5M+ bioactive molecules curated by EBI scientists. |
| 🎯 | **Server-side type filter.** Antibody-only, small-molecule-only, or protein-only exports run fast at the API level. |
| 🧬 | **Full structure block.** Canonical SMILES, InChI, InChI Key, and molfile in one place. |
| 📐 | **Calculated properties.** MW, LogP, HBA, HBD, PSA, RTB, full MWT, and Rule-of-Five violations precomputed by ChEMBL. |
| 🏥 | **Clinical context.** Max phase, ATC class, route of administration, first-approval year, and withdrawn flag. |
| ⚡ | **Fast.** Paginated REST with retry, returns 100 molecules per request. |
| 🚫 | **No authentication.** Works on the public EBI API. No login or API key. |

> 📊 ChEMBL is one of the most cited databases in cheminformatics literature. Accurate molecule metadata drives QSAR models, ADMET pipelines, and clinical-phase analytics.
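
The Actor's own implementation is not shown in this README, so the following is only a sketch of the arithmetic behind offset pagination at 100 records per request. The function name `page_plan` is hypothetical; `limit` and `offset` are the paging parameters of the public ChEMBL web services.

```python
from typing import Iterator


def page_plan(max_items: int, page_size: int = 100) -> Iterator[tuple[int, int]]:
    """Yield (offset, limit) pairs that together cover max_items records."""
    if max_items < 0:
        raise ValueError("max_items must be non-negative")
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    offset = 0
    while offset < max_items:
        # The last page is shortened so the total never exceeds max_items.
        limit = min(page_size, max_items - offset)
        yield offset, limit
        offset += limit


for offset, limit in page_plan(250):
    print(f"offset={offset} limit={limit}")
```

With `maxItems` set to 250, this plan needs three requests, the last one asking for only 50 records.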

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ ChEMBL Molecules Scraper** *(this Actor)* | $5 free credit, then pay-per-use | **2.5M+ molecules** | **Live per run** | text query, molecule type | ⚡ 2 min |
| Hand-rolled REST scripts | Free | Full ChEMBL | Manual | None unless you build them | 🐢 Days |
| DrugBank commercial license | $$$/year | Subset, drug-only | Curated | Many | ⏳ Hours |
| Open Targets GraphQL | Free | Drug-target focus | Live | Many | ⏳ Hours |

Pick this Actor when you want broad cheminformatics coverage, server-side type filtering, and no pipeline maintenance.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the ChEMBL Bioactive Molecules Scraper page on the Apify Store.
3. 🎯 **Set input.** Pick a molecule type, enter a text query, and set `maxItems`.
4. 🚀 **Run it.** Click **Start** and let the Actor collect your data.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded dataset: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 💊 Pharma & Biotech R\&D

- QSAR and ADMET model training sets
- Virtual screening libraries by molecule class
- Competitive intelligence on clinical-phase assets
- Approved-drug audits for repurposing

</td>
<td width="50%" valign="top">

#### 🧬 Cheminformatics & Data Science

- SMILES libraries for fingerprint pipelines
- Lipinski Rule of Five compliance dashboards
- Property distribution analyses for lead optimization
- Joins with ChEMBL bioactivity tables

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🏥 Regulatory & Pharmacovigilance

- Withdrawn-drug watchlists with year-of-approval context
- ATC classification mapping for therapeutic-area reporting
- Black-box-warning audits across portfolios
- Route-of-administration filtering for safety review

</td>
<td width="50%" valign="top">

#### 🤖 ML & AI for Drug Discovery

- Training sets for generative chemistry models
- Feature tables for activity-prediction models
- Multi-modal datasets joining structure and clinical metadata
- Benchmark suites for new architectures

</td>
</tr>
</table>

***

### 🔌 Automating ChEMBL Molecules Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Weekly refreshes keep your local cheminformatics warehouse in sync with EBI ChEMBL releases.

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Reproducible cheminformatics studies with versioned dataset pulls
- Teaching datasets for QSAR and medicinal chemistry coursework
- Open-source ADMET benchmark publications
- Cross-database joins with UniProt, PubChem, and PDB

</td>
<td width="50%">

#### 🎨 Personal and creative

- Indie chemistry visualization apps
- Educational dashboards for science communication
- Drug-of-the-week newsletters and content research
- Hobbyist molecule explorers

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Neglected-disease pipeline mapping
- Open-science drug repurposing initiatives
- Public-domain pharmacology references
- Civic transparency on approved-drug catalogs

</td>
<td width="50%">

#### 🧪 Experimentation

- Train molecular property predictors
- Prototype agentic tools that resolve ChEMBL IDs
- Benchmark cheminformatics libraries on real data
- Generate molecule embeddings at scale

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20ChEMBL%20Molecules%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20ChEMBL%20Molecules%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20ChEMBL%20Molecules%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20ChEMBL%20Molecules%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Set a molecule-type filter or a text query in the input form, click Start, and the Actor calls the EBI ChEMBL REST API with server-side pagination. Records are emitted as clean, joined JSON ready for download or piping into a warehouse. No browser automation, no captchas, no setup.

#### 💊 Where does the data come from?

Directly from the EBI ChEMBL public REST API at `www.ebi.ac.uk/chembl/api/data/molecule`. ChEMBL is maintained by the European Bioinformatics Institute.

#### 🧬 Why are SMILES and InChI missing for some molecules?

Antibodies, proteins, cells, oligonucleotides, and oligosaccharides do not have small-molecule structure descriptors. SMILES and InChI are only meaningful for small molecules, so ChEMBL omits them for macromolecules. Our output reflects that by skipping the `molecule_structures` block for these types.

#### 🎯 What does `max_phase` mean?

It is the highest clinical development phase a molecule has reached. `4` = approved, `3` = phase III, `2` = phase II, `1` = phase I, `0.5` = early phase I, `-1` = clinical phase unknown, `null` = preclinical (no clinical phase recorded).

#### 🏥 What is the ATC classification?

The Anatomical Therapeutic Chemical classification system from the World Health Organization. ChEMBL maps approved drugs to their ATC codes. A molecule can carry several ATC codes when it is indicated across therapeutic areas.

#### 🔁 How often is ChEMBL updated?

EBI releases new ChEMBL versions roughly every 6 to 12 months. Every run of this Actor hits the live API, so your dataset reflects the current ChEMBL release at run time.

#### ⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (weekly, monthly) and keep a downstream cheminformatics database in sync.

#### ⚖️ Is this data legal to use?

ChEMBL is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license. The raw molecule data is publicly accessible. Review the ChEMBL license terms for your specific use case, especially for commercial redistribution.

#### 💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and larger datasets.

#### 🧪 What if I need bioactivity data?

This Actor returns molecule-level records only. For activities, IC50 values, and target bindings, reach out via the contact form below to request a companion ChEMBL activities scraper.

#### 🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.

***

### 🔌 Integrate with any app

ChEMBL Molecules Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe molecule data into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh molecule batches into your product backend, or alert your team in Slack.
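
A receiving service typically needs the run's dataset ID before it can fetch the new molecules. The sketch below assumes the default Apify webhook payload, in which `eventType` names the event and `resource` carries the run object. Payload templates are customizable, so check what your webhook actually sends. The function name `dataset_id_from_webhook` and the placeholder values are illustrative.

```python
from typing import Any, Optional


def dataset_id_from_webhook(payload: dict[str, Any]) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, otherwise None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hypothetical payload, trimmed to the fields used above.
example = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "eventData": {"actorId": "<ACTOR_ID>", "actorRunId": "<RUN_ID>"},
    "resource": {
        "id": "<RUN_ID>",
        "status": "SUCCEEDED",
        "defaultDatasetId": "<DATASET_ID>",
    },
}
print(dataset_id_from_webhook(example))
```

Returning `None` for any other event type lets the handler ignore failed or aborted runs without raising.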

***

### 🔗 Recommended Actors

- [**🏥 FINRA BrokerCheck Scraper**](https://apify.com/parseforge/finra-brokercheck-scraper) - U.S. broker and firm regulatory disclosures
- [**🤗 Hugging Face Model Scraper**](https://apify.com/parseforge/hugging-face-model-scraper) - Model metadata, downloads, and benchmarks
- [**🏨 Greatschools Scraper**](https://apify.com/parseforge/greatschools-scraper) - U.S. school ratings and demographics
- [**📈 Smart Apify Actor Scraper**](https://apify.com/parseforge/smart-apify-actor-scraper) - Apify Store Actor metadata and quality signals

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ChEMBL, the European Bioinformatics Institute, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available open ChEMBL data is collected.

# Actor input Schema

## `query` (type: `string`):

Substring search across molecule names / synonyms. Leave empty to list all molecules paginated by offset.

## `maxItems` (type: `integer`):

Free users: limited to 10 items (preview). Paid users: optional, maximum 1,000,000.

## `moleculeType` (type: `string`):

Filter results by ChEMBL molecule type. Leave empty for all types.

## Actor input object example

```json
{
  "query": "aspirin",
  "maxItems": 10,
  "moleculeType": ""
}
```
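
Validating the input before starting a run catches typos such as a lowercase molecule type early. The following sketch builds the input object from the three documented fields and checks them against the `moleculeType` values and `maxItems` bounds documented for this Actor. The function name `build_input` is hypothetical and the check runs client-side only.

```python
from typing import Any

# Allowed values as documented for `moleculeType`; "" means all types.
MOLECULE_TYPES = {
    "", "Small molecule", "Antibody", "Cell", "Enzyme", "Gene",
    "Oligonucleotide", "Oligosaccharide", "Protein", "Unknown",
}


def build_input(query: str = "", molecule_type: str = "", max_items: int = 10) -> dict[str, Any]:
    """Return an Actor input object after basic client-side validation."""
    if molecule_type not in MOLECULE_TYPES:
        raise ValueError(f"unsupported moleculeType: {molecule_type!r}")
    if not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")
    return {"query": query, "maxItems": max_items, "moleculeType": molecule_type}


print(build_input(query="aspirin"))
print(build_input(molecule_type="Antibody", max_items=50))
```

The values are case-sensitive, so `"antibody"` is rejected while `"Antibody"` passes.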

# Actor output Schema

## `overview` (type: `string`):

Overview of scraped data

## `fullData` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "aspirin",
    "maxItems": 10,
    "moleculeType": ""
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/chembl-molecules-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "aspirin",
    "maxItems": 10,
    "moleculeType": "",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/chembl-molecules-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
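
Where installing `apify-client` is not an option, the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification can be called with the standard library alone. This is a minimal sketch: the helper names `build_request` and `run_sync` and the 300-second timeout are assumptions, and the call is left commented out because it needs a real token.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR = "parseforge~chembl-molecules-scraper"


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request that runs the Actor and returns dataset items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the JSON array of dataset items."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as response:
        return json.load(response)


# Usage (requires a valid token and network access):
# items = run_sync("<YOUR_API_TOKEN>", {"query": "aspirin", "maxItems": 10, "moleculeType": ""})
# for item in items:
#     print(item["molecule_chembl_id"], item.get("pref_name"))
```

Large exports may outlast a synchronous HTTP call. In that case, starting the run through the asynchronous `/runs` endpoint and reading the dataset afterwards is likely the safer choice.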

## CLI example

```bash
echo '{
  "query": "aspirin",
  "maxItems": 10,
  "moleculeType": ""
}' |
apify call parseforge/chembl-molecules-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/chembl-molecules-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "ChEMBL Molecules Scraper",
        "description": "Scrape molecules from EBI ChEMBL public API including SMILES, InChI, molecular properties (MW, logP, HBA, HBD, PSA, RTB), max phase, ATC classifications, oral/parenteral/topical flags, first approval, black box warning, prodrug and withdrawn flag. No API key required.",
        "version": "0.0",
        "x-build-id": "1ybJHxXOdEjYohjmJ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~chembl-molecules-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-chembl-molecules-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~chembl-molecules-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-chembl-molecules-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~chembl-molecules-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-chembl-molecules-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Substring search across molecule names / synonyms. Leave empty to list all molecules paginated by offset.",
                        "default": ""
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    },
                    "moleculeType": {
                        "title": "Molecule Type",
                        "enum": [
                            "",
                            "Small molecule",
                            "Antibody",
                            "Cell",
                            "Enzyme",
                            "Gene",
                            "Oligonucleotide",
                            "Oligosaccharide",
                            "Protein",
                            "Unknown"
                        ],
                        "type": "string",
                        "description": "Filter results by ChEMBL molecule type. Leave empty for all types.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "number",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "number",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
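
The input constraints and the run response shape in the definition above can be checked locally before and after calling the Actor. The following is a minimal sketch, not part of the Actor or of any Apify client: `build_input` and `summarize_run` are hypothetical helper names, and the sketch only covers the `maxItems` and `moleculeType` fields, whose limits and allowed values are copied from the schema.

```python
from typing import Any, Dict, Optional

# Limits and allowed values copied from the input schema above.
MAX_ITEMS_MIN = 1
MAX_ITEMS_MAX = 1_000_000
MOLECULE_TYPES = {
    "", "Small molecule", "Antibody", "Cell", "Enzyme", "Gene",
    "Oligonucleotide", "Oligosaccharide", "Protein", "Unknown",
}


def build_input(max_items: Optional[int] = None, molecule_type: str = "") -> Dict[str, Any]:
    """Hypothetical helper: build an input dict that satisfies the schema limits."""
    if molecule_type not in MOLECULE_TYPES:
        raise ValueError(f"moleculeType must be one of {sorted(MOLECULE_TYPES)}")
    actor_input: Dict[str, Any] = {"moleculeType": molecule_type}
    if max_items is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            raise ValueError("maxItems must be an integer")
        if not MAX_ITEMS_MIN <= max_items <= MAX_ITEMS_MAX:
            raise ValueError(f"maxItems must be between {MAX_ITEMS_MIN} and {MAX_ITEMS_MAX}")
        actor_input["maxItems"] = max_items
    return actor_input


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pick a few fields out of a runs response body.

    Field names follow runsResponseSchema; missing fields come back as None.
    """
    data = response.get("data", {})
    keys = ("id", "status", "defaultDatasetId", "usageTotalUsd")
    return {key: data.get(key) for key in keys}


if __name__ == "__main__":
    print(build_input(max_items=10, molecule_type="Small molecule"))
    print(summarize_run({"data": {"id": "run-1", "status": "READY"}}))
```

Validating on the client side keeps an out-of-range `maxItems` or a misspelled `moleculeType` from reaching the platform, and `defaultDatasetId` is the field to carry forward when fetching the results of a run.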
