# RCSB PDB Protein Structure Scraper (`parseforge/rcsb-pdb-scraper`) Actor

Scrape protein structure entries from the RCSB Protein Data Bank including title, authors, citation, experimental method (X-ray, EM, NMR), resolution, cell parameters, symmetry, polymer entities, keywords and entry metadata. No API key required.

- **URL**: https://apify.com/parseforge/rcsb-pdb-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $28.87 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
This Actor supports Apify Store discounts, so the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/raw/main/banner.jpg)

## 🧬 RCSB Protein Data Bank Scraper

> 🚀 **Export 3D macromolecular structure metadata in seconds.** Pull **220,000+ PDB entries** with resolution, experimental method, unit cell, primary citation, and deposit history. No API key, no registration, no manual REST stitching.

> 🕒 **Last updated:** 2026-05-13 · **📊 22 fields** per record · **🧬 220,000+ structures** · **🔬 9 experimental methods** · **🌐 RCSB public API**

The **RCSB PDB Scraper** queries the RCSB Search API and Data API and returns **22 fields per structure**, including the 4-character PDB ID, title and descriptor, classification keywords, experimental method (X-ray, cryo-EM, NMR, neutron, fiber, powder, scattering), combined resolution, unit-cell dimensions and crystal symmetry (for X-ray entries), deposit and release dates, polymer composition and atom count, the audit-author list, and the full primary citation (title, journal, year, authors, DOI, PubMed ID). The Protein Data Bank has been the global archive of 3D biological macromolecular structures since 1971.

The catalog covers **proteins, nucleic acids, complexes, viruses, ribosomes, membrane proteins, and small-molecule ligands** across **X-ray diffraction, electron microscopy (cryo-EM), solution and solid-state NMR, neutron diffraction, fiber, powder, electron crystallography, and solution scattering**. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. Crystallographic fields (unit cell, space group, refinement resolution) are surfaced only when relevant to the experiment.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Structural biologists, cryo-EM researchers, computational chemists, drug discovery teams, bioinformaticians, journal editors, citation analysts, ML researchers | Structure browsing, citation graphs, method benchmarking, drug-target validation, training sets for AI structure prediction, deposition tracking, journal scientometrics |

---

### 📋 What the RCSB PDB Scraper does

Two retrieval modes and an optional method filter in a single run:

- 🔎 **Full-text search.** Query the RCSB search API for any text (e.g. `hemoglobin`, `SARS-CoV-2 spike`, `kinase inhibitor`).
- 🆔 **Explicit IDs.** Pass a list of 4-character PDB entry IDs (e.g. `["3GOU", "1HHO"]`) to fetch full metadata directly.
- 🔬 **Method filter.** Restrict by experimental method (X-ray, cryo-EM, NMR, neutron, fiber, powder, scattering, electron crystallography).

Each record returns the PDB ID, RCSB explorer URL, structure title and descriptor, classification keywords, experimental method, combined resolution, unit-cell dimensions and space group (for X-ray only), refinement resolution, deposit and release dates, polymer entity count, atom and monomer counts, the audit-author list, and the full primary citation block.

> 💡 **Why it matters:** PDB structures are the bedrock of structural biology, drug discovery, and the AlphaFold era. The RCSB API surfaces fields across multiple endpoints; this Actor joins them into a single, denormalized row per entry, complete with citation metadata.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded PDB dataset._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>searchQuery</code></td><td>string</td><td><code>"hemoglobin"</code></td><td>Full-text search query. Empty if you pass explicit IDs.</td></tr>
<tr><td><code>pdbIds</code></td><td>string[]</td><td><code>[]</code></td><td>Explicit 4-character PDB IDs. Overrides the search query when set.</td></tr>
<tr><td><code>experimentalMethod</code></td><td>string</td><td><code>""</code></td><td>One of 9 experimental methods. Empty = any.</td></tr>
</tbody>
</table>

**Example: 100 cryo-EM structures matching "spike".**

```json
{
    "maxItems": 100,
    "searchQuery": "spike",
    "experimentalMethod": "ELECTRON MICROSCOPY"
}
````

**Example: explicit IDs for hemoglobin classics.**

```json
{
    "maxItems": 5,
    "pdbIds": ["3GOU", "1HHO", "2DN1", "1A3N", "4HHB"]
}
```

> ⚠️ **Good to Know:** the unit-cell block, crystal count, and refinement resolution apply only to X-ray entries; the Actor omits these fields cleanly for cryo-EM and NMR structures rather than emitting always-null columns. Resolution comes from `rcsb_entry_info.resolution_combined` and, when present, is an array (some methods return more than one value). The primary citation is taken from `rcsb_primary_citation` and falls back to the first `citation` entry.

***

### 📊 Output

Each PDB record contains up to **22 fields**. Download the dataset as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `rcsb_id` | string | `"10AD"` |
| 🔗 `url` | string | `"https://www.rcsb.org/structure/10AD"` |
| 🏷️ `title` | string \| null | `"Cryo-EM structure of the human BK channel bound to the agonist NS1619"` |
| 🧪 `descriptor` | string | structure descriptor |
| 🔖 `keywords` | string \| null | `"MEMBRANE PROTEIN"` |
| 📝 `keyword_text` | string \| null | `"BK, Slo1, MEMBRANE PROTEIN"` |
| 🔬 `experimental_method` | string \| null | `"EM"` |
| 📏 `resolution_combined` | number\[] \| null | `[3.44]` |
| 🧊 `crystals_number` | number | (X-ray only) `1` |
| 📐 `cell` | object | (X-ray only) `{ length_a, length_b, length_c, angle_alpha, angle_beta, angle_gamma, Z_PDB }` |
| 🔷 `symmetry` | object | (X-ray only) `{ space_group_name_H_M, Int_Tables_number }` |
| 🎯 `ls_d_res_high` | number | (X-ray only) refinement resolution |
| 📅 `deposit_date` | string \| null | `"2026-01-08T00:00:00.000+00:00"` |
| 📤 `release_date` | string \| null | `"2026-02-04T00:00:00.000+00:00"` |
| 🔁 `revision_date` | string \| null | `"2026-02-11T00:00:00.000+00:00"` |
| 🧬 `polymer_entity_count` | number \| null | `1` |
| 🍯 `branched_entity_count` | number \| null | `0` |
| 🧩 `polymer_composition` | string \| null | `"homomeric protein"` |
| ⚛️ `deposited_atom_count` | number \| null | `28028` |
| 🔗 `deposited_polymer_monomer_count` | number \| null | `4452` |
| 👥 `audit_authors` | string\[] | `["Gonzalez-Sanabria, N.", "Contreras, G.F."]` |
| 📰 `primary_citation` | object \| null | `{ title, journal, year, doi, pubmed, authors }` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-13T22:26:22.583Z"` |

#### 📦 Sample records

<details>
<summary><strong>🔬 Cryo-EM: human BK channel + NS1619 agonist (10AD)</strong></summary>

```json
{
    "rcsb_id": "10AD",
    "url": "https://www.rcsb.org/structure/10AD",
    "title": "Cryo-EM structure of the human BK channel bound to the agonist NS1619",
    "keywords": "MEMBRANE PROTEIN",
    "keyword_text": "BK, Slo1, MEMBRANE PROTEIN",
    "experimental_method": "EM",
    "resolution_combined": [3.44],
    "deposit_date": "2026-01-08T00:00:00.000+00:00",
    "release_date": "2026-02-04T00:00:00.000+00:00",
    "revision_date": "2026-02-11T00:00:00.000+00:00",
    "polymer_entity_count": 1,
    "branched_entity_count": 0,
    "polymer_composition": "homomeric protein",
    "deposited_atom_count": 28028,
    "deposited_polymer_monomer_count": 4452,
    "audit_authors": ["Gonzalez-Sanabria, N.", "Contreras, G.F.", "Perozo, E.", "Latorre, R."],
    "primary_citation": {
        "title": "The BK channel-NS1619 agonist complex reveals molecular insights into allosteric activation gating.",
        "journal": "Proc.Natl.Acad.Sci.USA",
        "year": 2026,
        "doi": "10.1073/pnas.2507707123",
        "pubmed": 41591909,
        "authors": ["Gonzalez-Sanabria, N.", "Contreras, G.F.", "Rojas, M.", "Duarte, Y.", "Gonzalez-Nilo, F.D.", "Perozo, E.", "Latorre, R."]
    },
    "scrapedAt": "2026-05-13T22:26:22.583Z"
}
```

</details>

<details>
<summary><strong>🧬 Cryo-EM heteromeric complex: CRBN-DDB1 + HBS1L + TNG961 (10AY)</strong></summary>

```json
{
    "rcsb_id": "10AY",
    "url": "https://www.rcsb.org/structure/10AY",
    "title": "Cryo-EM structure of CRBN-DDB1 in complex with HBS1L and TNG961",
    "keywords": "CYTOSOLIC PROTEIN",
    "keyword_text": "FOCAD, ribosome, PELO, ubiquitin, CYTOSOLIC PROTEIN",
    "experimental_method": "EM",
    "resolution_combined": [2.9],
    "polymer_entity_count": 3,
    "polymer_composition": "heteromeric protein",
    "deposited_atom_count": 20140,
    "primary_citation": {
        "title": "TNG961 is a selective oral HBS1L molecular glue degrader for the treatment of FOCAD-deleted cancers.",
        "journal": "Cancer Discov",
        "year": 2026,
        "doi": "10.1158/2159-8290.CD-26-0040",
        "pubmed": 42001523
    },
    "scrapedAt": "2026-05-13T22:26:22.567Z"
}
```

</details>

<details>
<summary><strong>🧠 Cryo-EM antibody-NMDAR complex (10EN)</strong></summary>

```json
{
    "rcsb_id": "10EN",
    "url": "https://www.rcsb.org/structure/10EN",
    "title": "SK3D-Matured in complex with GluN1-GluN2B, full refinement",
    "keywords": "SIGNALING PROTEIN/Immune System",
    "experimental_method": "EM",
    "resolution_combined": [3.7],
    "polymer_entity_count": 4,
    "polymer_composition": "heteromeric protein",
    "primary_citation": {
        "title": "Ectopic NMDAR expression in cancer unmasks germline-encoded autoimmunity.",
        "journal": "Nature",
        "year": 2026,
        "doi": "10.1038/s41586-026-10278-0",
        "pubmed": 41882353
    }
}
```

</details>

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🧬 | **Global coverage.** 220,000+ macromolecular structures across all PDB experimental methods. |
| 🎯 | **Two retrieval modes.** Run a full-text search or pass explicit PDB IDs in a single input. |
| 🔬 | **Method-aware fields.** Crystallographic fields appear only for X-ray entries; cryo-EM and NMR rows stay clean. |
| 📰 | **Full citation block.** Title, journal, year, DOI, PubMed ID, and author list per structure. |
| ⚡ | **Fast.** Parallel detail fetches (concurrency 8) bring 50 entries from 42 s to under 10 s. |
| 🔁 | **Always fresh.** Every run hits the RCSB API live, so newly released entries appear within hours of public release. |
| 🚫 | **No authentication.** Works on the public RCSB search and data APIs. No login or API key. |

> 📊 The Protein Data Bank powers every modern structure-based drug discovery pipeline and is the training ground for the AlphaFold era.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| **⭐ RCSB PDB Scraper** *(this Actor)* | $5 free credit, then pay-per-use | **220,000+ entries** | **Live per run** | text, IDs, method | ⚡ 2 min |
| RCSB REST + custom scripts | Free | Full PDB | Manual | Many, hand-rolled | 🐢 Days |
| PDBe SOLR API | Free | Mirror of PDB | Live | Many | ⏳ Hours |
| Crystallographic supplements from journals | Paid | Per-paper | Per-issue | None | 🕒 Variable |

Pick this Actor when you want broad structural-biology coverage, ready-joined records, and no pipeline maintenance.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the RCSB Protein Data Bank Scraper page on the Apify Store.
3. 🎯 **Set input.** Enter a search query or paste a list of PDB IDs, optionally filter by method.
4. 🚀 **Run it.** Click **Start** and let the Actor collect your data.
5. 📥 **Download.** Grab your results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to downloaded dataset: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 💊 Structure-Based Drug Discovery

- Target-validation surveys across approved drug classes
- Cryo-EM resolution surveys for cohorts of GPCRs or kinases
- Ligand-bound vs apo audits for screening campaigns
- Competitive intel on filed structures

</td>
<td width="50%" valign="top">

#### 🧬 Structural Biology Research

- Annual deposition trends across methods
- X-ray vs cryo-EM resolution distributions
- Polymer composition statistics by therapeutic area
- Collaborations and author network analyses

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 📰 Scientometrics & Citation Analysis

- DOI and PubMed cross-link feeds for biblio databases
- Author productivity dashboards
- Journal coverage of structural biology output
- Time-to-publication after deposit

</td>
<td width="50%" valign="top">

#### 🤖 ML & AI for Structure Prediction

- Curated training sets filtered by resolution
- Method-stratified evaluation sets
- Citation-linked benchmark suites
- Multi-modal joins with UniProt and ChEMBL

</td>
</tr>
</table>

***

### 🔌 Automating RCSB PDB Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. Weekly refreshes catch every new PDB release.

***

### 🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Reproducible structural-biology studies with versioned dataset pulls
- Teaching datasets for crystallography and cryo-EM coursework
- Open-source benchmarks for structure-prediction models
- Cross-database joins with UniProt, ChEMBL, and AlphaFold DB

</td>
<td width="50%">

#### 🎨 Personal and creative

- Indie 3D-structure viewers and educational apps
- Visualizations for science-communication content
- Hobbyist databases for crystallography enthusiasts
- Portfolio projects on protein-structure analysis

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Open-access pathogen-structure feeds during outbreaks
- Pandemic-response structural-biology mapping
- Public-domain references for science journalism
- Open-data education for high schools and museums

</td>
<td width="50%">

#### 🧪 Experimentation

- Train surface-prediction or pocket-detection models
- Prototype agentic tools that resolve PDB IDs
- Benchmark structure-search libraries on real data
- Generate structural embeddings at scale

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20RCSB%20PDB%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20RCSB%20PDB%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20RCSB%20PDB%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20RCSB%20PDB%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Enter a search query or a list of PDB IDs, click Start, and the Actor hits the RCSB Search API to resolve IDs, then fetches full metadata per entry from the RCSB Data API at concurrency 8. Records are emitted as clean, joined JSON. No browser automation, no captchas, no setup.
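
The two-step flow above (resolve IDs, then fetch details with a fixed number of requests in flight) can be sketched in Python. This is a minimal illustration, not the Actor's actual code: `fetch_all` and `fake_fetch` are hypothetical names, and the fetcher is a stand-in so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def fetch_all(
    ids: Iterable[str],
    fetch_one: Callable[[str], dict],
    concurrency: int = 8,
) -> list[dict]:
    """Fetch one record per ID with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in input order, whatever order they finish in.
        return list(pool.map(fetch_one, ids))


def fake_fetch(pdb_id: str) -> dict:
    """Hypothetical stand-in for an HTTP GET against the Data API."""
    return {"rcsb_id": pdb_id.upper()}


if __name__ == "__main__":
    records = fetch_all(["3gou", "1hho", "4hhb"], fake_fetch)
    print([record["rcsb_id"] for record in records])
```

Swapping `fake_fetch` for a real HTTP call keeps the same shape; the pool size is the only knob that controls how hard the upstream API is hit.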

#### 🧬 Where does the data come from?

Directly from the RCSB Search API (`search.rcsb.org/rcsbsearch/v2/query`) and Data API (`data.rcsb.org/rest/v1/core/entry`). The Protein Data Bank is maintained jointly by RCSB PDB (USA), PDBe (Europe), and PDBj (Japan) under the wwPDB.
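
For readers who want to see what a request to those two endpoints might look like, here is a hedged sketch that only builds the request data. The payload shape (`terminal` and `group` nodes, the `full_text` service, the `exptl.method` attribute) reflects my understanding of the public RCSB Search API and is an assumption; the source does not state what the Actor actually sends. The `https://` scheme and the helper names are also assumptions.

```python
from typing import Optional

# Endpoints named in the answer above; the https scheme is assumed.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry"


def build_search_payload(text: str, method: Optional[str] = None, rows: int = 10) -> dict:
    """Build a JSON body for a full-text search, optionally limited to one method."""
    nodes = [{"type": "terminal", "service": "full_text", "parameters": {"value": text}}]
    if method:
        # Assumed attribute name for the experimental method filter.
        nodes.append({
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "exptl.method",
                "operator": "exact_match",
                "value": method,
            },
        })
    if len(nodes) == 1:
        query = nodes[0]
    else:
        query = {"type": "group", "logical_operator": "and", "nodes": nodes}
    return {
        "query": query,
        "return_type": "entry",
        "request_options": {"paginate": {"start": 0, "rows": rows}},
    }


def entry_url(pdb_id: str) -> str:
    """Return the Data API URL for a single entry."""
    return f"{ENTRY_URL}/{pdb_id.upper()}"
```

The body would be sent as a JSON POST to `SEARCH_URL`, and each returned ID would then be fetched from `entry_url(...)`.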

#### 🔬 Why are unit-cell and refinement fields missing for some structures?

Crystallographic fields apply only to X-ray entries. Cryo-EM, NMR, neutron, and fiber-diffraction structures do not have a unit cell or a `ls_d_res_high` refinement resolution. The Actor omits those fields for non-X-ray rows to keep the dataset clean.
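
Because those fields are omitted rather than set to null, rows in a mixed dataset have different key sets. If a downstream tool expects a fixed set of columns, one option is to pad the missing keys yourself. A minimal sketch, with `with_xray_columns` as a hypothetical helper name and the field list taken from the schema table:

```python
# Fields the schema marks as "X-ray only".
XRAY_ONLY_FIELDS = ("crystals_number", "cell", "symmetry", "ls_d_res_high")


def with_xray_columns(record: dict) -> dict:
    """Return a copy of the record with every X-ray-only key present (None if absent)."""
    row = dict(record)  # shallow copy; the input record is left untouched
    for field in XRAY_ONLY_FIELDS:
        row.setdefault(field, None)
    return row


if __name__ == "__main__":
    em_record = {"rcsb_id": "10AD", "experimental_method": "EM"}
    print(sorted(with_xray_columns(em_record)))
```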

#### 📏 What does `resolution_combined` actually contain?

A numeric array from `rcsb_entry_info.resolution_combined`. Most entries return a single value; multi-experiment structures may return multiple. Units are angstroms.
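
When a single number per structure is needed (for example to filter a training set by resolution), a common convention is to take the smallest value, since a lower number of angstroms means finer detail. The sketch below assumes that convention; `best_resolution` is a hypothetical helper, not part of the Actor.

```python
from typing import Optional


def best_resolution(record: dict) -> Optional[float]:
    """Return the smallest value in resolution_combined, or None if there is none."""
    values = record.get("resolution_combined") or []  # handles a missing key and null
    return min(values) if values else None


if __name__ == "__main__":
    print(best_resolution({"rcsb_id": "10AD", "resolution_combined": [3.44]}))
```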

#### 📰 Which citation field is `primary_citation`?

It is taken from `rcsb_primary_citation` if present, otherwise the first item in the `citation` array. It contains the publication title, journal, year, DOI, PubMed ID, and authors.
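
The fallback rule can be written out as a small function. This is a sketch of the described behavior only: the raw key names read from the Data API entry (`journal_abbrev`, `pdbx_database_id_DOI`, `pdbx_database_id_PubMed`, `rcsb_authors`) are assumptions based on the mmCIF citation category, and `pick_primary_citation` is a hypothetical name.

```python
from typing import Optional


def pick_primary_citation(entry: dict) -> Optional[dict]:
    """Prefer rcsb_primary_citation; otherwise use the first item of citation."""
    raw = entry.get("rcsb_primary_citation")
    if not raw:
        citations = entry.get("citation") or []
        raw = citations[0] if citations else None
    if raw is None:
        return None
    # Output keys follow the schema table; input keys are assumed (see lead-in).
    return {
        "title": raw.get("title"),
        "journal": raw.get("journal_abbrev"),
        "year": raw.get("year"),
        "doi": raw.get("pdbx_database_id_DOI"),
        "pubmed": raw.get("pdbx_database_id_PubMed"),
        "authors": raw.get("rcsb_authors") or [],
    }
```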

#### 🔁 How often is the dataset refreshed?

RCSB releases new and updated entries weekly on Wednesdays. Every run of this Actor pulls live, so your dataset reflects the current state of the PDB at run time.
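
To pick out only the entries from a recent release, the `release_date` field can be compared against a cutoff. A minimal sketch, assuming the timestamp format shown in the schema table (`2026-02-04T00:00:00.000+00:00`); `released_since` is a hypothetical helper and the placeholder IDs in the demo are not real entries.

```python
from datetime import datetime, timezone


def released_since(records: list, cutoff: datetime) -> list:
    """Keep records whose release_date is on or after the timezone-aware cutoff."""
    kept = []
    for record in records:
        stamp = record.get("release_date")
        # release_date may be null; skip those rows instead of failing.
        if stamp and datetime.fromisoformat(stamp) >= cutoff:
            kept.append(record)
    return kept


if __name__ == "__main__":
    sample = [
        {"rcsb_id": "AAAA", "release_date": "2026-02-04T00:00:00.000+00:00"},
        {"rcsb_id": "BBBB", "release_date": "2025-12-31T00:00:00.000+00:00"},
        {"rcsb_id": "CCCC", "release_date": None},
    ]
    cutoff = datetime(2026, 2, 1, tzinfo=timezone.utc)
    print([record["rcsb_id"] for record in released_since(sample, cutoff)])
```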

#### 🆔 Can I fetch one specific PDB ID?

Yes. Pass it in the `pdbIds` array, leave `searchQuery` empty, and you will get back a single record with the full metadata block.
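
A small helper can normalize IDs and build that input before the run starts. This is an illustrative sketch: `build_id_input` is a hypothetical name, and the pattern (one digit followed by three alphanumerics) is my assumption about the classic 4-character ID format, not something the Actor is documented to enforce.

```python
import re

# Assumed shape of a classic PDB ID: a digit, then three letters or digits.
PDB_ID = re.compile(r"[0-9][A-Z0-9]{3}")


def build_id_input(raw_ids: list) -> dict:
    """Validate, upper-case, and de-duplicate IDs, then return an Actor input dict."""
    ids = []
    for raw in raw_ids:
        pdb_id = raw.strip().upper()
        if not PDB_ID.fullmatch(pdb_id):
            raise ValueError(f"not a 4-character PDB ID: {raw!r}")
        if pdb_id not in ids:
            ids.append(pdb_id)
    if not ids:
        raise ValueError("at least one PDB ID is required")
    # searchQuery stays empty so the explicit IDs drive the run.
    return {"searchQuery": "", "pdbIds": ids, "maxItems": len(ids)}


if __name__ == "__main__":
    print(build_id_input([" 4hhb ", "1HHO", "4HHB"]))
```

The returned dict can be passed as `run_input` to the Python client call shown in the API section.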

#### ⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (daily, weekly) and keep a downstream structural-biology database in sync.

#### ⚖️ Is this data legal to use?

The Protein Data Bank is released under a CC0 dedication. The raw structure metadata is publicly accessible. Review wwPDB licensing for your specific use case, especially for redistribution.

#### 💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and unlocks scheduling, higher concurrency, and larger datasets.

#### 🧪 What if I need atom coordinates?

This Actor returns metadata only, not the mmCIF or PDB coordinate files. For coordinates, fetch directly from the RCSB Files API, or reach out via the contact form below to request a companion coordinate fetcher.
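
As a starting point, the dataset's `rcsb_id` values can be turned into download links. The URL pattern below is the commonly used RCSB file download path and is an assumption here, as is the helper name `mmcif_url`; check the RCSB file download documentation before relying on it.

```python
def mmcif_url(pdb_id: str) -> str:
    """Return the assumed mmCIF download URL for one PDB entry."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.cif"


if __name__ == "__main__":
    # Records as returned by the Actor; only rcsb_id is needed here.
    records = [{"rcsb_id": "4HHB"}, {"rcsb_id": "1a3n"}]
    for record in records:
        print(mmcif_url(record["rcsb_id"]))
```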

#### 🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.

***

### 🔌 Integrate with any app

RCSB PDB Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get run notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe PDB data into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh PDB metadata into your research backend, or alert your team in Slack when a watched ID is released.

***

### 🔗 Recommended Actors

- [**🤗 Hugging Face Model Scraper**](https://apify.com/parseforge/hugging-face-model-scraper) - Model metadata, downloads, and benchmarks
- [**🏥 FINRA BrokerCheck Scraper**](https://apify.com/parseforge/finra-brokercheck-scraper) - U.S. broker and firm regulatory disclosures
- [**🏨 Greatschools Scraper**](https://apify.com/parseforge/greatschools-scraper) - U.S. school ratings and demographics
- [**📈 Smart Apify Actor Scraper**](https://apify.com/parseforge/smart-apify-actor-scraper) - Apify Store Actor metadata and quality signals

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by RCSB PDB, the wwPDB, or any of its partner sites. All trademarks mentioned are the property of their respective owners. Only publicly available open structural-biology data is collected.

# Actor input Schema

## `searchQuery` (type: `string`):

Full-text search query (e.g. "hemoglobin", "SARS-CoV-2 spike"). Leave empty if you pass explicit PDB IDs.

## `pdbIds` (type: `array`):

Explicit 4-character PDB entry IDs to fetch (e.g. \["3GOU", "1HHO"]). Overrides the search query if provided.

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## `experimentalMethod` (type: `string`):

Filter results by structure determination method. Leave empty for all methods.

## Actor input object example

```json
{
  "searchQuery": "hemoglobin",
  "pdbIds": [],
  "maxItems": 10,
  "experimentalMethod": ""
}
```

# Actor output Schema

## `overview` (type: `string`):

Overview of scraped data

## `fullData` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "hemoglobin",
    "pdbIds": [],
    "maxItems": 10,
    "experimentalMethod": ""
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/rcsb-pdb-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "hemoglobin",
    "pdbIds": [],
    "maxItems": 10,
    "experimentalMethod": "",
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/rcsb-pdb-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "hemoglobin",
  "pdbIds": [],
  "maxItems": 10,
  "experimentalMethod": ""
}' |
apify call parseforge/rcsb-pdb-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/rcsb-pdb-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RCSB PDB Protein Structure Scraper",
        "description": "Scrape protein structure entries from the RCSB Protein Data Bank including title, authors, citation, experimental method (X-ray, EM, NMR), resolution, cell parameters, symmetry, polymer entities, keywords and entry metadata. No API key required.",
        "version": "0.0",
        "x-build-id": "roCUanmo5KzH8kpyz"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~rcsb-pdb-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-rcsb-pdb-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~rcsb-pdb-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-rcsb-pdb-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~rcsb-pdb-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-rcsb-pdb-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Full-text search query (e.g. \"hemoglobin\", \"SARS-CoV-2 spike\"). Leave empty if you pass explicit PDB IDs.",
                        "default": ""
                    },
                    "pdbIds": {
                        "title": "PDB IDs",
                        "type": "array",
                        "description": "Explicit 4-character PDB entry IDs to fetch (e.g. [\"3GOU\", \"1HHO\"]). Overrides the search query if provided.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Maximum number of items to return. Free users are limited to 10 items (preview). For paid users this is optional, with a maximum of 1,000,000."
                    },
                    "experimentalMethod": {
                        "title": "Experimental Method",
                        "enum": [
                            "",
                            "X-RAY DIFFRACTION",
                            "ELECTRON MICROSCOPY",
                            "SOLUTION NMR",
                            "NEUTRON DIFFRACTION",
                            "SOLID-STATE NMR",
                            "ELECTRON CRYSTALLOGRAPHY",
                            "FIBER DIFFRACTION",
                            "POWDER DIFFRACTION",
                            "SOLUTION SCATTERING"
                        ],
                        "type": "string",
                        "description": "Filter results by structure determination method. Leave empty for all methods.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
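
The `run-sync` operation and `inputSchema` in the definition above can be exercised with a plain HTTP POST. The following is a minimal sketch using only the Python standard library, not the official client. It assumes the API base URL is `https://api.apify.com/v2`. The helper names `validate_input` and `build_run_sync_request` are hypothetical and exist only for this illustration. The local checks mirror the `maxItems` bounds, the `experimentalMethod` enum, and the `pdbIds` item type from the schema. They do not replace validation on the server.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Apify API base URL; the path is taken from the definition above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~rcsb-pdb-scraper/run-sync"

# Values copied from the `experimentalMethod` enum in `inputSchema`.
EXPERIMENTAL_METHODS = {
    "",
    "X-RAY DIFFRACTION",
    "ELECTRON MICROSCOPY",
    "SOLUTION NMR",
    "NEUTRON DIFFRACTION",
    "SOLID-STATE NMR",
    "ELECTRON CRYSTALLOGRAPHY",
    "FIBER DIFFRACTION",
    "POWDER DIFFRACTION",
    "SOLUTION SCATTERING",
}


def validate_input(actor_input: dict) -> dict:
    """Hypothetical helper: check a few `inputSchema` constraints locally."""
    max_items = actor_input.get("maxItems")
    if max_items is not None:
        is_int = isinstance(max_items, int) and not isinstance(max_items, bool)
        if not is_int or not 1 <= max_items <= 1_000_000:
            raise ValueError("maxItems must be an integer between 1 and 1,000,000")

    method = actor_input.get("experimentalMethod", "")
    if method not in EXPERIMENTAL_METHODS:
        raise ValueError(f"unsupported experimentalMethod: {method!r}")

    pdb_ids = actor_input.get("pdbIds", [])
    if not isinstance(pdb_ids, list) or not all(isinstance(i, str) for i in pdb_ids):
        raise ValueError("pdbIds must be a list of strings")

    return actor_input


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST request for run-sync."""
    body = json.dumps(validate_input(actor_input)).encode("utf-8")
    # The token is passed as the required `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_sync_request(
        "<YOUR_APIFY_TOKEN>",
        {"pdbIds": ["1HHO"], "maxItems": 1},
    )
    print(request.get_method(), request.full_url)
    # To send the request, pass it to urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the sketch testable without network access or a real token. In a real project the official client libraries are likely the more convenient choice.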
