# PubChem Compound Scraper (`crawlerbros/pubchem-scraper`) Actor

Scrape PubChem, the world's largest free chemistry database with 100M+ compounds. Search by name, CID, SMILES, or full-text. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, synonyms, and more.

- **URL**: https://apify.com/crawlerbros/pubchem-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Automation, Developer tools, MCP servers
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can serve as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [full-text Markdown](https://docs.apify.com/llms-full.txt).


# README

## PubChem Compound Scraper

Scrape **PubChem** — the world's largest free chemistry database with 100M+ compounds maintained by the NCBI. Search by compound name, PubChem CID, SMILES string, or free-text query. Returns molecular identifiers, physicochemical properties, structural data, and synonyms. HTTP-only via the public PubChem REST API. No auth, no proxy required.

### What this actor does

- **Four modes:** `searchByName`, `searchBySmiles`, `searchByCid`, `fullTextSearch`
- **Compound lookup:** by IUPAC name, common name, CID, or SMILES notation
- **Rich properties:** molecular formula, weight, SMILES, InChI, InChIKey, XLogP, H-bond counts, heavy atom count, complexity
- **Synonyms:** up to 10 synonyms per compound
- **Empty fields are omitted** — no nulls in output

### Output per compound

| Field | Type | Description |
|---|---|---|
| `cid` | integer | PubChem Compound ID |
| `iupacName` | string | IUPAC systematic name |
| `molecularFormula` | string | Molecular formula (e.g. C9H8O4) |
| `molecularWeight` | float | Molecular weight in g/mol |
| `canonicalSmiles` | string | Canonical SMILES notation |
| `isomericSmiles` | string | Isomeric SMILES (with stereochemistry) |
| `inchiKey` | string | Standard InChIKey hash |
| `inchi` | string | Standard InChI string |
| `xlogp` | float | Computed XLogP3 lipophilicity |
| `exactMolecularWeight` | float | Exact monoisotopic mass |
| `hbondDonorCount` | integer | Number of hydrogen bond donors |
| `hbondAcceptorCount` | integer | Number of hydrogen bond acceptors |
| `heavyAtomCount` | integer | Number of heavy (non-hydrogen) atoms |
| `rotatablebondCount` | integer | Number of rotatable bonds |
| `synonyms` | array | Up to 10 common synonyms |
| `sourceUrl` | string | PubChem compound page URL |
| `recordType` | string | Always `"compound"` |
| `scrapedAt` | string | ISO 8601 timestamp |
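As a minimal sketch of consuming these records in Python, assuming items arrive as plain dicts (which is how the Python client in the API section yields them), the hypothetical `CompoundRecord` type below mirrors the table with `total=False`, since any field may be omitted. `CSV_COLUMNS` and `to_csv_row` are illustrative names, not part of the Actor or of `apify-client`.

```python
from typing import Dict, List, TypedDict


class CompoundRecord(TypedDict, total=False):
    """One dataset item as described in the output table.

    total=False because the actor omits fields PubChem has no data for.
    """
    cid: int
    iupacName: str
    molecularFormula: str
    molecularWeight: float
    canonicalSmiles: str
    isomericSmiles: str
    inchiKey: str
    inchi: str
    xlogp: float
    exactMolecularWeight: float
    hbondDonorCount: int
    hbondAcceptorCount: int
    heavyAtomCount: int
    rotatablebondCount: int
    synonyms: List[str]
    sourceUrl: str
    recordType: str
    scrapedAt: str


# Hypothetical column selection for a CSV export.
CSV_COLUMNS = ["cid", "iupacName", "molecularFormula", "molecularWeight", "xlogp", "synonyms"]


def to_csv_row(item: CompoundRecord) -> Dict[str, str]:
    """Flatten one record; omitted fields become "" and synonyms are joined with "; "."""
    row: Dict[str, str] = {}
    for col in CSV_COLUMNS:
        value = item.get(col)
        if value is None:
            row[col] = ""
        elif isinstance(value, list):
            row[col] = "; ".join(value)
        else:
            row[col] = str(value)
    return row
```

Rows produced this way always have the same keys, so they can be handed directly to `csv.DictWriter(f, fieldnames=CSV_COLUMNS)`.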

### Input

| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `fullTextSearch` | `searchByName` / `searchBySmiles` / `searchByCid` / `fullTextSearch` |
| `compoundNames` | array | `[]` | Compound names to look up (mode=searchByName) |
| `smilesList` | array | `[]` | SMILES strings (mode=searchBySmiles) |
| `cids` | array | `[]` | PubChem CIDs as strings, e.g. `"2244"` (mode=searchByCid) |
| `searchQuery` | string | `aspirin` | Free-text query (mode=fullTextSearch) |
| `maxItems` | integer | `10` | Max compounds to return (1–1000) |
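Before starting a run, it can help to validate input locally. The sketch below is a hypothetical helper (`build_run_input` and `MODE_FIELDS` are not part of the Actor) that maps each mode to the field it reads and enforces the documented 1–1000 range for `maxItems`. It converts list values to strings on the assumption that the lists follow the input schema, which types `cids` items as strings.

```python
from typing import Any, Dict, List, Union

# Input field read by each mode, per the Input table.
MODE_FIELDS = {
    "searchByName": "compoundNames",
    "searchBySmiles": "smilesList",
    "searchByCid": "cids",
    "fullTextSearch": "searchQuery",
}


def build_run_input(mode: str, query: Union[str, List[Any]], max_items: int = 10) -> Dict[str, Any]:
    """Return an input object for crawlerbros/pubchem-scraper, or raise on bad values."""
    if mode not in MODE_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if mode == "fullTextSearch":
        if not isinstance(query, str):
            raise TypeError("fullTextSearch takes a single query string")
        value: Any = query
    else:
        # List modes accept one value or many; items become strings because
        # the input schema types cids as an array of strings.
        items = [query] if isinstance(query, str) else list(query)
        value = [str(item) for item in items]
    return {"mode": mode, MODE_FIELDS[mode]: value, "maxItems": max_items}
```

For example, `build_run_input("searchByCid", [2244, 5793], 2)` returns `{"mode": "searchByCid", "cids": ["2244", "5793"], "maxItems": 2}`, which can be passed as `run_input` to the Python client's `.call()`.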

#### Example: look up common drug compounds

```json
{
  "mode": "searchByName",
  "compoundNames": ["aspirin", "caffeine", "ibuprofen", "acetaminophen"],
  "maxItems": 4
}
````

#### Example: search by SMILES

```json
{
  "mode": "searchBySmiles",
  "smilesList": ["CC(=O)Oc1ccccc1C(=O)O", "Cn1cnc2c1c(=O)n(c(=O)n2C)C"],
  "maxItems": 2
}
```

#### Example: full-text search

```json
{
  "mode": "fullTextSearch",
  "searchQuery": "acetylsalicylic acid",
  "maxItems": 5
}
```

### FAQs

**Do I need an API key?**
No. PubChem's REST API is freely accessible with no authentication required.

**Are there rate limits?**
PubChem allows up to 5 requests per second. This actor automatically waits 0.2 s between requests, which keeps it within that limit.
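If you also call PubChem directly, the same budget applies to your own code. A minimal throttle that reproduces a 0.2 s spacing might look like this; it is an illustrative sketch, not the Actor's own implementation, and `Throttle` is a hypothetical name. The clock and sleep functions are injectable so the timing logic can be tested without real waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between calls; 0.2 s allows at most 5 requests/s."""

    def __init__(
        self,
        min_interval: float = 0.2,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous permitted call

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()  # re-read: a real sleep may overshoot slightly
        self._last = now
```

Create one `Throttle()` per process and call its `wait()` immediately before each HTTP request; the first call never sleeps.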

**How many compounds can I scrape?**
Up to 1000 per run. For `fullTextSearch`, the actor fetches matching CIDs first, then retrieves full data for each.
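That two-step flow can be sketched against PubChem's PUG REST interface. This assumes the standard `compound/name`, `compound/smiles` and `compound/cid/.../property` URL patterns; the Actor's actual endpoints are not documented here, and `cids_url` and `properties_url` are hypothetical helpers that only build URLs without sending anything.

```python
from typing import Iterable
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cids_url(namespace: str, identifier: str) -> str:
    """Step 1: URL that resolves a name or SMILES string to matching CIDs."""
    if namespace not in ("name", "smiles"):
        raise ValueError("namespace must be 'name' or 'smiles'")
    # safe="" percent-encodes '/', '#', '=' and spaces, which SMILES and names can contain.
    return f"{PUG_REST}/compound/{namespace}/{quote(identifier, safe='')}/cids/JSON"


def properties_url(cids: Iterable[int], properties: Iterable[str]) -> str:
    """Step 2: URL that fetches the given properties for a batch of CIDs at once."""
    cid_part = ",".join(str(cid) for cid in cids)
    prop_part = ",".join(properties)
    return f"{PUG_REST}/compound/cid/{cid_part}/property/{prop_part}/JSON"
```

The step-1 response lists matches under `IdentifierList.CID`; those IDs then feed `properties_url`. Requesting several CIDs in one property call keeps the request count, and therefore the rate-limit cost, down.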

**What is the difference between canonical and isomeric SMILES?**
Canonical SMILES is a standardized representation without stereochemistry. Isomeric SMILES includes stereochemical information (E/Z, R/S).

**Can I search by molecular structure?**
Yes, use `searchBySmiles` mode with a valid SMILES string.

**Why are some fields missing from certain compounds?**
Not all compounds in PubChem have complete property sets. The actor omits any field for which PubChem returns no data.

**What is XLogP?**
XLogP3 is a computed measure of lipophilicity (fat-solubility) — key for predicting drug absorption, distribution, and bioavailability.
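XLogP is one of the four inputs to Lipinski's rule of five, a common screen for oral drug-likeness: molecular weight ≤ 500 g/mol, logP ≤ 5, at most 5 H-bond donors and at most 10 H-bond acceptors. A sketch of applying it to this Actor's output follows; `lipinski_violations` is a hypothetical helper, and omitted fields are treated as not violating.

```python
from typing import Any, Dict

# Rule-of-five upper limits keyed by this Actor's output field names.
RULE_OF_FIVE = {
    "molecularWeight": 500,
    "xlogp": 5,
    "hbondDonorCount": 5,
    "hbondAcceptorCount": 10,
}


def lipinski_violations(item: Dict[str, Any]) -> int:
    """Count how many rule-of-five limits a compound record exceeds."""
    violations = 0
    for field, limit in RULE_OF_FIVE.items():
        value = item.get(field)  # None when the actor omitted the field
        # float() also accepts numeric strings, in case a weight arrives as text.
        if value is not None and float(value) > limit:
            violations += 1
    return violations
```

Compounds with at most one violation are conventionally regarded as likely to be orally bioavailable, so `[i for i in items if lipinski_violations(i) <= 1]` works as a quick first-pass filter.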

# Actor input Schema

## `mode` (type: `string`):

How to look up compounds.

## `compoundNames` (type: `array`):

List of compound names to look up (e.g. aspirin, caffeine).

## `smilesList` (type: `array`):

List of SMILES strings (e.g. CC(=O)Oc1ccccc1C(=O)O).

## `cids` (type: `array`):

List of PubChem compound IDs (e.g. 2244, 5793).

## `searchQuery` (type: `string`):

Free-text query to search across compound names and synonyms.

## `maxItems` (type: `integer`):

Maximum number of compounds to return (1–1000).

## Actor input object example

```json
{
  "mode": "fullTextSearch",
  "searchQuery": "aspirin",
  "maxItems": 10
}
```

# Actor output Schema

## `compounds` (type: `string`):

Dataset containing all scraped PubChem compound records.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "fullTextSearch",
    "searchQuery": "aspirin",
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/pubchem-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "fullTextSearch",
    "searchQuery": "aspirin",
    "maxItems": 10,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/pubchem-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "fullTextSearch",
  "searchQuery": "aspirin",
  "maxItems": 10
}' |
apify call crawlerbros/pubchem-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/pubchem-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PubChem Compound Scraper",
        "description": "Scrape PubChem - the world's largest free chemistry database with 100M+ compounds. Search by name, CID, SMILES, or full-text. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, synonyms, and more.",
        "version": "1.0",
        "x-build-id": "qSNvOYRlXRNZwdxCF"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~pubchem-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-pubchem-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~pubchem-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-pubchem-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~pubchem-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-pubchem-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "searchByName",
                            "searchBySmiles",
                            "searchByCid",
                            "fullTextSearch"
                        ],
                        "type": "string",
                        "description": "How to look up compounds.",
                        "default": "fullTextSearch"
                    },
                    "compoundNames": {
                        "title": "Compound names (mode=searchByName)",
                        "type": "array",
                        "description": "List of compound names to look up (e.g. aspirin, caffeine).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "smilesList": {
                        "title": "SMILES strings (mode=searchBySmiles)",
                        "type": "array",
                        "description": "List of SMILES strings (e.g. CC(=O)Oc1ccccc1C(=O)O).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "cids": {
                        "title": "PubChem CIDs (mode=searchByCid)",
                        "type": "array",
                        "description": "List of PubChem compound IDs (e.g. 2244, 5793).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchQuery": {
                        "title": "Search query (mode=fullTextSearch)",
                        "type": "string",
                        "description": "Free-text query to search across compound names and synonyms."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of compounds to return (1–1000).",
                        "default": 10
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
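For projects without an Apify client, the `run-sync-get-dataset-items` path in the spec can be called with Python's standard library. This is a minimal sketch that only prepares the request; `build_sync_request` is a hypothetical helper, and it passes the token as the `token` query parameter because that is how the spec declares it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH_ID = "crawlerbros~pubchem-scraper"  # API paths use '~' instead of '/'


def build_sync_request(token: str, run_input: dict) -> Request:
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    query = urlencode({"token": token})  # the spec declares token as a query parameter
    url = f"{API_BASE}/acts/{ACTOR_PATH_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")
```

To send it, pass the request to `urllib.request.urlopen(req)` and decode the body with `json.load`; per the operation summary, the response contains the run's dataset items.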
