ChEMBL Molecules Scraper
Pricing
from $28.50 / 1,000 results
Scrape molecules from EBI ChEMBL public API including SMILES, InChI, molecular properties (MW, logP, HBA, HBD, PSA, RTB), max phase, ATC classifications, oral/parenteral/topical flags, first approval, black box warning, prodrug and withdrawn flag. No API key required.
Developer
ParseForge
Maintained by Community

🧪 ChEMBL Bioactive Molecules Scraper
🚀 Export ChEMBL drug discovery data in seconds. Pull 2.5 million+ bioactive molecules with SMILES, InChI, ATC codes, clinical phase, and approval status. No API key, no registration, no manual REST stitching.
🕒 Last updated: 2026-05-13 · 📊 17 fields per record · 💊 2.5M+ molecules · 🧬 9 molecule types · 🌐 EBI public API
The ChEMBL Molecules Scraper queries the EBI ChEMBL public REST API and returns 17 fields per molecule, including the canonical ChEMBL ID, preferred name, molecule type, max clinical phase, full structure descriptors (canonical SMILES, InChI, InChI Key), calculated molecular properties (molecular weight, LogP, hydrogen-bond donors and acceptors, polar surface area, rotatable bonds, Lipinski Rule of Five violations), ATC classifications, route of administration flags, first-approval year, and withdrawn status. ChEMBL is maintained by the European Bioinformatics Institute and is one of the largest manually curated databases of bioactive molecules in drug discovery.
The catalog covers small molecules, antibodies, enzymes, proteins, oligonucleotides, oligosaccharides, cells, genes, and unknowns, totalling more than 2.5 million entries. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. The molecule type filter runs server-side, so antibody-only or small-molecule-only exports are fast.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Cheminformaticians, drug discovery scientists, computational chemists, pharma data teams, ML researchers, bioinformaticians, academic labs, regulatory analysts | QSAR datasets, virtual screening libraries, ADMET feature tables, ATC mapping, clinical-phase tracking, approved-drug audits, withdrawn-drug watchlists |
📋 What the ChEMBL Molecules Scraper does
Two filtering workflows, plus a full catalog dump, in a single run:
- 🔎 Full-text query. Substring match across molecule names and synonyms (e.g. `aspirin`, `imatinib`, `bevacizumab`).
- 🧬 Type filter. Server-side filter on `molecule_type`. Pick from small molecule, antibody, enzyme, protein, oligonucleotide, oligosaccharide, cell, gene, or unknown.
- 📜 Paginated catalog dump. Leave both filters empty to walk the entire ChEMBL catalog by offset.
Each record returns the canonical ChEMBL ID, the public explorer URL, the structure block (SMILES, InChI, InChI Key, molfile) when present, the property block (MW, LogP, HBA, HBD, PSA, RTB, full MWT, Rule-of-Five violations), the molecule hierarchy (active / parent / salt), the ATC classifications array, administration route flags (oral, parenteral, topical), the black-box-warning flag, the first-approval year, the withdrawn flag, and the prodrug flag.
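The three workflows above can be sketched as a single URL builder against the public ChEMBL endpoint. This is a minimal illustration, not the Actor's actual code: the function name `build_molecule_url` is hypothetical, and routing text queries through ChEMBL's `/molecule/search` endpoint (rather than a name filter) is an assumption.

```python
from urllib.parse import urlencode

# Public ChEMBL molecule resource (the data source named in the FAQ).
BASE = "https://www.ebi.ac.uk/chembl/api/data/molecule"


def build_molecule_url(query="", molecule_type="", limit=100, offset=0):
    """Build the URL for one page of molecules (hypothetical helper)."""
    params = {"limit": limit, "offset": offset}
    if molecule_type:
        # Server-side filter on the molecule_type field.
        params["molecule_type"] = molecule_type
    if query:
        # Assumption: text queries go through ChEMBL's search endpoint.
        params["q"] = query
        return f"{BASE}/search.json?{urlencode(params)}"
    # No query: plain listing, walked page by page via offset.
    return f"{BASE}.json?{urlencode(params)}"
```

Calling `build_molecule_url()` with no arguments yields the first page of the full catalog dump; raising `offset` by `limit` walks the rest.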
💡 Why it matters: ChEMBL underpins most modern drug discovery pipelines. Building your own REST pagination, retry logic, and field selection means a week of plumbing. This Actor returns clean, joined records on every run.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded molecule dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | `10` | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| `query` | string | `"aspirin"` | Substring text search across molecule names and synonyms. Empty = list all by offset. |
| `moleculeType` | string | `""` | One of 9 ChEMBL molecule types (Small molecule, Antibody, Cell, Enzyme, Gene, Oligonucleotide, Oligosaccharide, Protein, Unknown). Empty = all. |
Example: 50 antibodies via the server-side type filter. The input has no approval filter, so select approved therapies downstream with max_phase = 4.
{"maxItems": 50,"moleculeType": "Antibody"}
Example: text query for every molecule whose name or synonyms contain imatinib.
{"maxItems": 25,"query": "imatinib"}
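When inputs like these are generated by a script, a small validator can catch typos before a run is started. The sketch below is illustrative only: `make_input` is a hypothetical helper, and sending all three keys explicitly (so a type-only run does not silently fall back to the `"aspirin"` default) is a precaution, not documented Actor behavior.

```python
# The nine molecule types listed in the input table.
MOLECULE_TYPES = {
    "Small molecule", "Antibody", "Cell", "Enzyme", "Gene",
    "Oligonucleotide", "Oligosaccharide", "Protein", "Unknown",
}


def make_input(max_items=10, query="", molecule_type=""):
    """Validate and assemble an input object (hypothetical helper)."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    if molecule_type and molecule_type not in MOLECULE_TYPES:
        raise ValueError(f"unknown moleculeType: {molecule_type!r}")
    # Every key is sent explicitly, including empty strings.
    return {
        "maxItems": max_items,
        "query": query,
        "moleculeType": molecule_type,
    }
```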
⚠️ Good to Know: antibodies, proteins, and cells have no SMILES or InChI because they are macromolecules. The `molecule_structures` and `molecule_properties` blocks are omitted for these types and the record stays clean. Small molecules return the full property block. ChEMBL `max_phase` follows the convention `4` = approved, `3` = phase III, `2` = phase II, `1` = phase I, `0.5` = early phase I, `-1` = clinical phase unknown, `null` = preclinical (no clinical phase recorded).
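Because the structure block can be absent, downstream code should read it defensively. The helpers below are a minimal sketch with hypothetical names (`smiles_or_none`, `is_approved`). The `float()` cast is a precaution in case `max_phase` arrives as a string rather than a number.

```python
def smiles_or_none(record):
    """Return canonical SMILES, or None when the structure block is missing."""
    structures = record.get("molecule_structures") or {}
    return structures.get("canonical_smiles")


def is_approved(record):
    """True only when max_phase is present and equals 4 (approved)."""
    phase = record.get("max_phase")
    return phase is not None and float(phase) == 4.0
```

A typical use is `[r for r in records if is_approved(r)]` to turn a type-filtered export into an approved-only list.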
📊 Output
Each molecule record contains up to 17 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 `molecule_chembl_id` | string | `"CHEMBL1201580"` |
| 🔗 `url` | string | `"https://www.ebi.ac.uk/chembl/explore/compound/CHEMBL1201580"` |
| 🏷️ `pref_name` | string \| null | `"ADALIMUMAB"` |
| 🧬 `molecule_type` | string \| null | `"Antibody"` |
| 🎯 `max_phase` | number \| null | `4` |
| 🧪 `molecule_structures` | object | `{ canonical_smiles, standard_inchi, standard_inchi_key, molfile }` |
| 📐 `molecule_properties` | object | `{ mw_freebase, alogp, hba, hbd, psa, rtb, full_mwt, num_ro5_violations }` |
| 🌳 `molecule_hierarchy` | object \| null | `{ active_chembl_id, parent_chembl_id, molecule_chembl_id }` |
| 🏥 `atc_classifications` | string[] | `["L04AB04"]` |
| 💊 `indication_class` | string | `"Antineoplastic"` |
| 👄 `oral` | boolean \| null | `false` |
| 💉 `parenteral` | boolean \| null | `true` |
| 🧴 `topical` | boolean \| null | `false` |
| ⚠️ `black_box_warning` | number \| null | `1` |
| 📅 `first_approval` | number \| null | `2002` |
| 🚫 `withdrawn_flag` | boolean \| null | `false` |
| 🧬 `prodrug` | number \| null | `0` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-13T22:26:22.480Z"` |
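For feature tables built from the JSON export, the nested blocks in this schema usually need flattening into one row per molecule. The following is a rough sketch: `flatten_record` and the dotted column names are illustrative choices, not part of the Actor's output.

```python
# Nested object fields from the schema table.
NESTED_BLOCKS = ("molecule_structures", "molecule_properties", "molecule_hierarchy")


def flatten_record(record):
    """Flatten one molecule record into a single-level dict (hypothetical helper)."""
    row = {}
    for key, value in record.items():
        if key in NESTED_BLOCKS:
            # Missing or null blocks simply contribute no columns.
            for sub_key, sub_value in (value or {}).items():
                row[f"{key}.{sub_key}"] = sub_value
        elif key == "atc_classifications":
            # Join the ATC code array into one delimited cell.
            row[key] = ";".join(value or [])
        else:
            row[key] = value
    return row
```

The resulting dicts can be passed directly to `csv.DictWriter` or a dataframe constructor.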
✨ Why choose this Actor
| | Capability |
|---|---|
| 🧪 | Massive coverage. 2.5M+ bioactive molecules curated by EBI scientists. |
| 🎯 | Server-side type filter. Antibody-only, small-molecule-only, or protein-only exports run fast at the API level. |
| 🧬 | Full structure block. Canonical SMILES, InChI, InChI Key, and molfile in one place. |
| 📐 | Calculated properties. MW, LogP, HBA, HBD, PSA, RTB, full MWT, and Rule-of-Five violations precomputed by ChEMBL. |
| 🏥 | Clinical context. Max phase, ATC class, route of administration, first-approval year, and withdrawn flag. |
| ⚡ | Fast. Paginated REST with retry, returns 100 molecules per request. |
| 🚫 | No authentication. Works on the public EBI API. No login or API key. |
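As a sanity check on the precomputed Rule-of-Five count, the standard Lipinski thresholds can be applied to the property block. This sketch assumes the textbook cutoffs (MW > 500, LogP > 5, HBD > 5, HBA > 10) and uses `mw_freebase` and `alogp` as inputs. ChEMBL's own `num_ro5_violations` may be derived slightly differently, and `count_ro5_violations` is a hypothetical name.

```python
# (property key, upper limit) pairs for Lipinski's Rule of Five.
RO5_LIMITS = (
    ("mw_freebase", 500.0),  # molecular weight
    ("alogp", 5.0),          # calculated LogP
    ("hbd", 5.0),            # hydrogen-bond donors
    ("hba", 10.0),           # hydrogen-bond acceptors
)


def count_ro5_violations(props):
    """Count Rule-of-Five violations; None if any input is missing."""
    violations = 0
    for key, limit in RO5_LIMITS:
        value = props.get(key)
        if value is None:
            return None  # cannot decide without all four properties
        # Values may arrive as strings, so cast before comparing.
        if float(value) > limit:
            violations += 1
    return violations
```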
📊 ChEMBL is one of the most cited databases in cheminformatics literature. Accurate molecule metadata drives QSAR models, ADMET pipelines, and clinical-phase analytics.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ ChEMBL Molecules Scraper (this Actor) | $5 free credit, then pay-per-use | 2.5M+ molecules | Live per run | text query, molecule type | ⚡ 2 min |
| Hand-rolled REST scripts | Free | Full ChEMBL | Manual | None unless you build them | 🐢 Days |
| DrugBank commercial license | $$$/year | Subset, drug-only | Curated | Many | ⏳ Hours |
| Open Targets GraphQL | Free | Drug-target focus | Live | Many | ⏳ Hours |
Pick this Actor when you want broad cheminformatics coverage, server-side type filtering, and no pipeline maintenance.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the ChEMBL Bioactive Molecules Scraper page on the Apify Store.
- 🎯 Set input. Pick a molecule type, enter a text query, and set `maxItems`.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
🔌 Automating ChEMBL Molecules Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep your local cheminformatics warehouse in sync with EBI ChEMBL releases.
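A programmatic run with the Python client might look roughly like the sketch below. The Actor ID is a placeholder to copy from the Actor's Store page, `run_scraper` is a hypothetical wrapper, and the `APIFY_TOKEN` environment variable name is an assumption. The client calls reflect the `apify-client` interface as commonly documented, so check them against the version you install.

```python
import os

# Placeholder: replace with the Actor ID shown on the Actor's Store page.
ACTOR_ID = "<username>/<actor-name>"


def run_scraper(run_input, token=None, actor_id=ACTOR_ID):
    """Start one run, wait for it to finish, and return its dataset items."""
    # Imported lazily so this module loads without the package installed
    # (install with: pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(token or os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
```

For example, `run_scraper({"maxItems": 25, "query": "imatinib"})` would return the same records that appear in the Dataset tab.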
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
🧩 How does it work?
Set a molecule-type filter or a text query in the input form, click Start, and the Actor calls the EBI ChEMBL REST API with server-side pagination. Records are emitted as clean, joined JSON ready for download or piping into a warehouse. No browser automation, no captchas, no setup.
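The offset pagination described here can be sketched as a generator. This is an illustration rather than the Actor's implementation: `iter_molecules` and the injected `fetch_page` callable are hypothetical. The response shape assumed is a `molecules` list plus a `page_meta.next` link, which is how the ChEMBL JSON API typically lays out a page.

```python
def iter_molecules(fetch_page, max_items, page_size=100):
    """Yield up to max_items molecules, one page at a time.

    fetch_page(offset=..., limit=...) must return one decoded JSON page.
    """
    offset = 0
    yielded = 0
    while yielded < max_items:
        page = fetch_page(offset=offset, limit=page_size)
        molecules = page.get("molecules", [])
        if not molecules:
            return  # empty page: nothing left to read
        for molecule in molecules:
            yield molecule
            yielded += 1
            if yielded >= max_items:
                return  # respect the caller's item cap
        if page.get("page_meta", {}).get("next") is None:
            return  # the API reports no further page
        offset += page_size
```

Injecting `fetch_page` keeps HTTP and retry logic separate from the paging loop, so the loop can be tested without network access.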
💊 Where does the data come from?
Directly from the EBI ChEMBL public REST API at www.ebi.ac.uk/chembl/api/data/molecule. ChEMBL is maintained by the European Bioinformatics Institute.
🧬 Why are SMILES and InChI missing for some molecules?
Antibodies, proteins, cells, oligonucleotides, and oligosaccharides do not have small-molecule structure descriptors. SMILES and InChI are only meaningful for small molecules, so ChEMBL omits them for macromolecules. Our output reflects that by skipping the molecule_structures block for these types.
🎯 What does max_phase mean?
It is the highest clinical development phase a molecule has reached across all indications. 4 = approved, 3 = phase III, 2 = phase II, 1 = phase I, 0.5 = early phase I, -1 = clinical phase unknown, null = preclinical compound with no recorded clinical phase.
🏥 What is the ATC classification?
The Anatomical Therapeutic Chemical classification system from the World Health Organization. ChEMBL maps approved drugs to their ATC codes. A molecule can carry several ATC codes when it is indicated across therapeutic areas.
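A full ATC code such as `L04AB04` encodes five nested levels, which is useful when grouping molecules by therapeutic area. The helper below is a small sketch with a hypothetical name (`atc_levels`), and it assumes complete seven-character, fifth-level codes.

```python
import re

# Letter, two digits, two letters, two digits (e.g. L04AB04).
_ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")


def atc_levels(code):
    """Split a fifth-level ATC code into its five hierarchy levels."""
    if not _ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a full ATC code: {code!r}")
    return {
        "anatomical_main_group": code[:1],
        "therapeutic_subgroup": code[:3],
        "pharmacological_subgroup": code[:4],
        "chemical_subgroup": code[:5],
        "chemical_substance": code,
    }
```

Grouping on `therapeutic_subgroup` (here `L04`) is a common way to bucket molecules for ATC mapping.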
🔁 How often is ChEMBL updated?
EBI releases new ChEMBL versions roughly every 6 to 12 months. Every run of this Actor hits the live API, so your dataset reflects the current ChEMBL release at run time.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval (weekly, monthly) and keep a downstream cheminformatics database in sync.
⚖️ Is this data legal to use?
ChEMBL is released under the Creative Commons Attribution-ShareAlike 3.0 Unported license, which requires attribution and share-alike redistribution. The raw molecule data is publicly accessible. Review the ChEMBL license terms for your specific use case, especially for commercial redistribution.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and larger datasets.
🧪 What if I need bioactivity data?
This Actor returns molecule-level records only. For activities, IC50 values, and target bindings, reach out via the contact form below to request a companion ChEMBL activities scraper.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
ChEMBL Molecules Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe molecule data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh molecule batches into your product backend, or alert your team in Slack.
🔗 Recommended Actors
- 🏥 FINRA BrokerCheck Scraper - U.S. broker and firm regulatory disclosures
- 🤗 Hugging Face Model Scraper - Model metadata, downloads, and benchmarks
- 🏨 Greatschools Scraper - U.S. school ratings and demographics
- 📈 Smart Apify Actor Scraper - Apify Store actor metadata and quality signals
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by ChEMBL, the European Bioinformatics Institute, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available open ChEMBL data is collected.