PubChem Compound Scraper

Pricing

from $3.00 / 1,000 results

Scrape PubChem - the world's largest free chemistry database with 100M+ compounds. Search by name, CID, SMILES, or full-text. Returns molecular formula, weight, SMILES, InChI, logP, H-bond counts, synonyms, and more.

Rating

0.0 (0)

Developer

Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 6 days ago

Scrape PubChem, the world's largest free chemistry database, with 100M+ compounds, maintained by the NCBI. Search by compound name, PubChem CID, SMILES string, or free-text query. The actor returns molecular identifiers, physicochemical properties, structural data, and synonyms. It uses plain HTTP requests against the public PubChem REST API (PUG REST); no authentication or proxy is required.

What this actor does

  • Four modes: searchByName, searchBySmiles, searchByCid, fullTextSearch
  • Compound lookup: by IUPAC name, common name, CID, or SMILES notation
  • Rich properties: molecular formula, weight, SMILES, InChI, InChIKey, XLogP, H-bond counts, heavy atom count, complexity
  • Synonyms: up to 10 synonyms per compound
  • Empty fields are omitted, so the output contains no nulls
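
To make the lookup modes more concrete, here is a minimal sketch of how a name or CID lookup could map onto a PubChem PUG REST property URL. The actor's internals are not documented here, so the mode-to-namespace mapping, the helper name `build_property_url`, and the chosen property subset are assumptions for illustration only. SMILES lookups are left out because characters such as `/` conflict with URL path syntax, and full-text search is left out because the source does not say which endpoint it uses.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Assumed mapping from the actor's mode names to PUG REST input namespaces.
MODE_TO_NAMESPACE = {
    "searchByName": "name",
    "searchByCid": "cid",
}

# A small illustrative subset of PUG REST property names.
DEFAULT_PROPERTIES = ("MolecularFormula", "MolecularWeight", "InChIKey", "XLogP")


def build_property_url(mode, identifier, properties=DEFAULT_PROPERTIES):
    """Build a PUG REST compound-property URL for a single identifier."""
    namespace = MODE_TO_NAMESPACE[mode]  # raises KeyError for modes not sketched here
    # Percent-encode the identifier, since compound names may contain spaces.
    encoded = quote(str(identifier), safe="")
    props = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/{namespace}/{encoded}/property/{props}/JSON"


if __name__ == "__main__":
    print(build_property_url("searchByName", "acetylsalicylic acid"))
    print(build_property_url("searchByCid", 2244))
```

Fetching either URL with any HTTP client returns a JSON document; how the actor maps those properties onto its own output field names is not shown here.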

Output per compound

| Field | Type | Description |
|---|---|---|
| cid | integer | PubChem Compound ID |
| iupacName | string | IUPAC systematic name |
| molecularFormula | string | Molecular formula (e.g. C9H8O4) |
| molecularWeight | float | Molecular weight in g/mol |
| canonicalSmiles | string | Canonical SMILES notation |
| isomericSmiles | string | Isomeric SMILES (with stereochemistry) |
| inchiKey | string | Standard InChIKey hash |
| inchi | string | Standard InChI string |
| xlogp | float | Computed XLogP3 lipophilicity |
| exactMolecularWeight | float | Exact monoisotopic mass |
| hbondDonorCount | integer | Number of hydrogen bond donors |
| hbondAcceptorCount | integer | Number of hydrogen bond acceptors |
| heavyAtomCount | integer | Number of heavy (non-hydrogen) atoms |
| rotatablebondCount | integer | Number of rotatable bonds |
| synonyms | array | Up to 10 common synonyms |
| sourceUrl | string | PubChem compound page URL |
| recordType | string | Always "compound" |
| scrapedAt | string | ISO 8601 timestamp |
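
Because empty fields are omitted rather than set to null, code that consumes the dataset should not assume every key is present. The sketch below shows one defensive way to read a record. The sample record is hand-written for illustration and is not actual actor output, and the helper name `describe` is hypothetical.

```python
from typing import Any, Optional

# Hand-written illustrative record (not real actor output); xlogp is deliberately absent.
sample = {
    "cid": 2244,
    "molecularFormula": "C9H8O4",
    "molecularWeight": 180.16,
    "synonyms": ["aspirin", "acetylsalicylic acid"],
    "recordType": "compound",
}


def describe(record: dict) -> str:
    """Summarize a compound record while tolerating omitted fields."""
    xlogp: Optional[Any] = record.get("xlogp")  # None when the field was omitted
    xlogp_text = f"{xlogp:.1f}" if xlogp is not None else "n/a"
    synonyms = record.get("synonyms", [])
    label = synonyms[0] if synonyms else "unnamed"
    formula = record.get("molecularFormula", "?")
    return f"CID {record['cid']} ({label}): {formula}, XLogP {xlogp_text}"


if __name__ == "__main__":
    print(describe(sample))
```

Using `dict.get` with a fallback keeps downstream code working for compounds where PubChem has no value for a given property.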

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | searchByName | searchByName / searchBySmiles / searchByCid / fullTextSearch |
| compoundNames | array | [] | Compound names to look up (mode=searchByName) |
| smilesList | array | [] | SMILES strings (mode=searchBySmiles) |
| cids | array | [] | PubChem CIDs (mode=searchByCid) |
| searchQuery | string | aspirin | Free-text query (mode=fullTextSearch) |
| maxItems | integer | 10 | Max compounds to return (1–1000) |
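
Since each mode reads a different input field, a quick client-side pre-check can catch mismatched inputs before a run is started. The following is a minimal sketch built only from the defaults and ranges in the table above; the function name `validate_input` is hypothetical, and the actor's own validation may behave differently.

```python
# Defaults and mode-to-field pairing taken from the input table above.
DEFAULTS = {
    "mode": "searchByName",
    "compoundNames": [],
    "smilesList": [],
    "cids": [],
    "searchQuery": "aspirin",
    "maxItems": 10,
}

MODE_FIELD = {
    "searchByName": "compoundNames",
    "searchBySmiles": "smilesList",
    "searchByCid": "cids",
    "fullTextSearch": "searchQuery",
}


def validate_input(run_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks consistent."""
    merged = {**DEFAULTS, **run_input}  # fill in documented defaults first
    mode = merged["mode"]
    if mode not in MODE_FIELD:
        return [f"unknown mode: {mode!r}"]
    problems = []
    field = MODE_FIELD[mode]
    if not merged[field]:
        problems.append(f"mode {mode} needs a non-empty {field}")
    max_items = merged["maxItems"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
    if not is_int or not 1 <= max_items <= 1000:
        problems.append("maxItems must be an integer from 1 to 1000")
    return problems


if __name__ == "__main__":
    print(validate_input({"mode": "searchByCid"}))
```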

Example: look up common drug compounds

{
  "mode": "searchByName",
  "compoundNames": ["aspirin", "caffeine", "ibuprofen", "acetaminophen"],
  "maxItems": 4
}

Example: search by SMILES

{
  "mode": "searchBySmiles",
  "smilesList": ["CC(=O)Oc1ccccc1C(=O)O", "Cn1cnc2c1c(=O)n(c(=O)n2C)C"],
  "maxItems": 2
}

Example: full-text search

{
  "mode": "fullTextSearch",
  "searchQuery": "acetylsalicylic acid",
  "maxItems": 5
}

FAQs

Do I need an API key? No. PubChem's REST API is freely accessible with no authentication required.

Are there rate limits? PubChem allows up to 5 requests per second. The actor stays within that limit by automatically waiting 0.2 s between requests.
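
A fixed 0.2 s gap corresponds to at most 1 / 0.2 = 5 sequential requests per second. As a rough sketch of that idea (not the actor's actual code), a minimum-interval throttle could look like the following. The class name `MinIntervalThrottle` is hypothetical, and the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time


class MinIntervalThrottle:
    """Enforce a minimum gap between consecutive calls to wait()."""

    def __init__(self, min_interval=0.2, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


if __name__ == "__main__":
    throttle = MinIntervalThrottle(0.2)
    for cid in (2244, 2519, 3672):
        throttle.wait()  # call this right before each HTTP request
        print("would request CID", cid)
```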

How many compounds can I scrape? Up to 1000 per run, which is the upper bound of maxItems. For fullTextSearch, the actor first fetches the matching CIDs, then retrieves the full data for each.
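
The second step of that two-step flow, turning a list of CIDs into property requests, can be sketched as follows. PUG REST accepts comma-separated CID lists, so this example groups CIDs into batches. Whether the actor batches or requests one CID at a time is not stated in the source, so the batch size and the helper names `chunked` and `property_urls` are assumptions.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def property_urls(cids, properties, max_items=10, batch_size=100):
    """Build one PUG REST property URL per batch of CIDs, honoring max_items."""
    limited = list(cids)[:max_items]  # cap the result count, like maxItems
    props = ",".join(properties)
    urls = []
    for batch in chunked(limited, batch_size):
        cid_list = ",".join(str(cid) for cid in batch)
        urls.append(f"{PUG_REST_BASE}/compound/cid/{cid_list}/property/{props}/JSON")
    return urls


if __name__ == "__main__":
    for url in property_urls([2244, 2519, 3672], ["MolecularFormula", "InChIKey"]):
        print(url)
```

Batching reduces the number of HTTP requests, which matters when a per-request delay is enforced.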

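What is the difference between canonical and isomeric SMILES? In PubChem, canonical SMILES is a standardized representation without stereochemistry. Isomeric SMILES adds stereochemical (E/Z, R/S) and isotopic information. For example, L-alanine is CC(C(=O)O)N in canonical form and C[C@@H](C(=O)O)N in isomeric form.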

Can I search by molecular structure? Yes, use searchBySmiles mode with a valid SMILES string.

Why are some fields missing from certain compounds? Not all compounds in PubChem have complete property sets. The actor omits any field for which PubChem returns no data.

What is XLogP? XLogP3 is a computed estimate of the octanol-water partition coefficient (log P), a measure of lipophilicity (fat-solubility). It is a key input for predicting drug absorption, distribution, and bioavailability.