Pricing

from $28.87 / 1,000 results

RCSB PDB Protein Structure Scraper

Scrape protein structure entries from the RCSB Protein Data Bank including title, authors, citation, experimental method (X-ray, EM, NMR), resolution, cell parameters, symmetry, polymer entities, keywords and entry metadata. No API key required.

Developer

ParseForge

🧬 RCSB Protein Data Bank Scraper

🚀 Export 3D macromolecular structure metadata in seconds. Pull 220,000+ PDB entries with resolution, experimental method, unit cell, primary citation, and deposit history. No API key, no registration, no manual REST stitching.

The RCSB PDB Scraper queries the RCSB Search API and Data API and returns 22 fields per structure, including the 4-character PDB ID, title and descriptor, classification keywords, experimental method (X-ray, cryo-EM, NMR, neutron, fiber, powder, scattering), combined resolution, unit-cell dimensions and crystal symmetry (for X-ray entries), deposit and release dates, polymer composition and atom count, the audit-author list, and the full primary citation (title, journal, year, authors, DOI, PubMed ID). The Protein Data Bank has been the global archive of 3D biological macromolecular structures since 1971.

The catalog covers proteins, nucleic acids, complexes, viruses, ribosomes, membrane proteins, and small-molecule ligands across X-ray diffraction, electron microscopy (cryo-EM), solution and solid-state NMR, neutron diffraction, fiber diffraction, powder diffraction, electron crystallography, and solution scattering. This Actor makes the data downloadable as CSV, Excel, JSON, or XML in under a minute. Crystallographic fields (unit cell, space group, refinement resolution) are populated only for entries whose experiment reports them.

🎯 Target Audience: Structural biologists, cryo-EM researchers, computational chemists, drug discovery teams, bioinformaticians, journal editors, citation analysts, ML researchers

💡 Primary Use Cases: Structure browsing, citation graphs, method benchmarking, drug-target validation, training sets for AI structure prediction, deposition tracking, journal scientometrics

📋 What the RCSB PDB Scraper does

Two retrieval modes and an optional method filter, all available in a single run:

🔎 Full-text search. Query the RCSB search API for any text (e.g. hemoglobin, SARS-CoV-2 spike, kinase inhibitor).
🆔 Explicit IDs. Pass a list of 4-character PDB entry IDs (e.g. ["3GOU", "1HHO"]) to fetch full metadata directly.
🔬 Method filter. Restrict by experimental method (X-ray, cryo-EM, NMR, neutron, fiber, powder, scattering, electron crystallography).
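
For readers curious what a full-text search with a method filter might look like at the API level, here is a minimal sketch. It builds a request body for the public RCSB Search API (v2) and does not send it. The endpoint, the `full_text` and `text` services, and the `exptl.method` attribute reflect our reading of the public RCSB documentation; the function name `build_search_payload` and its parameters are hypothetical, and nothing here is confirmed to be how the Actor builds its requests.

```python
import json
from typing import Optional

# Public RCSB Search API endpoint (assumption: v2, accepts a JSON body via POST).
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"


def build_search_payload(text: str, method: Optional[str] = None,
                         start: int = 0, rows: int = 25) -> dict:
    """Build a Search API request body for a full-text query.

    `method` is an optional experimental-method value such as
    "X-RAY DIFFRACTION"; when given, it is AND-ed with the text query.
    """
    full_text = {
        "type": "terminal",
        "service": "full_text",
        "parameters": {"value": text},
    }
    query = full_text
    if method is not None:
        method_filter = {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": "exptl.method",
                "operator": "exact_match",
                "value": method,
            },
        }
        # A group node combines both conditions with a logical AND.
        query = {
            "type": "group",
            "logical_operator": "and",
            "nodes": [full_text, method_filter],
        }
    return {
        "query": query,
        "return_type": "entry",
        "request_options": {"paginate": {"start": start, "rows": rows}},
    }


# Print the body that would be POSTed to SEARCH_URL.
payload = build_search_payload("hemoglobin", method="X-RAY DIFFRACTION")
print(json.dumps(payload, indent=2))
```

Keeping the payload builder separate from the HTTP call makes the query easy to inspect and test before any request is made.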

Each record returns the PDB ID, RCSB explorer URL, structure title and descriptor, classification keywords, experimental method, combined resolution, unit-cell dimensions and space group (X-ray entries only), refinement resolution, deposit and release dates, polymer entity count, atom and monomer counts, the audit-author list, and the full primary citation block.

💡 Why it matters: PDB structures are the bedrock of structural biology, drug discovery, and the AlphaFold era. The RCSB API surfaces fields across multiple endpoints; this Actor joins them into a single, denormalized row per entry, complete with citation metadata.
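
To illustrate what "a single, denormalized row per entry" can mean in practice, the sketch below flattens a nested entry document into a flat row. The nested key names (`rcsb_entry_info`, `rcsb_accession_info`, `struct`, `audit_author`, `rcsb_primary_citation`) follow our understanding of the RCSB Data API entry schema; the helper names, the subset of columns, and the mapping itself are assumptions, not the Actor's actual code. The sample input is a hand-written stand-in, not a real API response.

```python
from typing import Any

# Data API entry endpoint (assumption based on the public RCSB docs).
DATA_API_ENTRY_URL = "https://data.rcsb.org/rest/v1/core/entry/{pdb_id}"


def normalize_pdb_id(raw: str) -> str:
    """Upper-case a PDB ID and check that it is 4 ASCII alphanumerics."""
    pdb_id = raw.strip().upper()
    if len(pdb_id) != 4 or not (pdb_id.isascii() and pdb_id.isalnum()):
        raise ValueError(f"not a 4-character PDB ID: {raw!r}")
    return pdb_id


def flatten_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Pull selected nested fields of one entry document into a flat row."""
    info = entry.get("rcsb_entry_info", {})
    accession = entry.get("rcsb_accession_info", {})
    pdb_id = normalize_pdb_id(entry["rcsb_id"])
    return {
        "rcsb_id": pdb_id,
        "url": f"https://www.rcsb.org/structure/{pdb_id}",
        "title": entry.get("struct", {}).get("title"),
        "experimental_method": info.get("experimental_method"),
        "resolution_combined": info.get("resolution_combined"),
        "deposit_date": accession.get("deposit_date"),
        "release_date": accession.get("initial_release_date"),
        # Crystallographic blocks are simply None when the entry lacks them.
        "cell": entry.get("cell"),
        "symmetry": entry.get("symmetry"),
        "audit_authors": [a.get("name") for a in entry.get("audit_author", [])],
        "primary_citation": entry.get("rcsb_primary_citation"),
    }


# Hand-written stand-in for an entry document (placeholder values only).
sample = {
    "rcsb_id": "1hho",
    "struct": {"title": "placeholder title"},
    "rcsb_entry_info": {"experimental_method": "X-ray"},
    "audit_author": [{"name": "Placeholder, A."}],
}
row = flatten_entry(sample)
print(row["rcsb_id"], row["url"], row["cell"])
```

Using `dict.get` with defaults keeps the row shape stable even when an entry has no crystallographic or citation block.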

📊 Data fields

Each record includes: audit_authors, branched_entity_count, cell, crystals_number, deposit_date, deposited_atom_count, deposited_polymer_monomer_count, experimental_method, keyword_text, keywords, ls_d_res_high, polymer_composition, polymer_entity_count, primary_citation, rcsb_id, release_date, resolution_combined, revision_date, scrapedAt, symmetry, title, url. All 22 field names come from a real production run, so what you see here is what lands in your dataset.
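
If you want to confirm that a downloaded record carries exactly the fields listed above, a small check like the following can help. The field names are copied from the list in this section; the helper `check_record` is a hypothetical name introduced for illustration, and the demo record is made up.

```python
# The 22 field names listed in this section.
EXPECTED_FIELDS = frozenset({
    "audit_authors", "branched_entity_count", "cell", "crystals_number",
    "deposit_date", "deposited_atom_count", "deposited_polymer_monomer_count",
    "experimental_method", "keyword_text", "keywords", "ls_d_res_high",
    "polymer_composition", "polymer_entity_count", "primary_citation",
    "rcsb_id", "release_date", "resolution_combined", "revision_date",
    "scrapedAt", "symmetry", "title", "url",
})


def check_record(record: dict) -> tuple[set, set]:
    """Return (missing, unexpected) field names for one dataset record."""
    keys = set(record)
    return set(EXPECTED_FIELDS - keys), set(keys - EXPECTED_FIELDS)


# Made-up partial record, only to show the shape of the result.
missing, unexpected = check_record({"rcsb_id": "1HHO", "title": "x", "extra": 1})
print(len(EXPECTED_FIELDS), len(missing), sorted(unexpected))
```

A record with an empty `missing` set and an empty `unexpected` set matches the documented schema by name; the check says nothing about value types.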

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the RCSB Protein Data Bank Scraper page on the Apify Store.
🎯 Set input. Enter a search query or paste a list of PDB IDs, optionally filter by method.
🚀 Run it. Click Start and let the Actor collect your data.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
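
For readers who do want to post-process the download in code, here is a minimal sketch that filters a JSON export by experimental method and resolution. It assumes the export is a JSON array of records using the `experimental_method` and `resolution_combined` field names from the Data fields section. The value type of `resolution_combined` is not stated on this page, so the sketch accepts either a single number or a list of numbers; the function names and the demo records are hypothetical.

```python
import json
from typing import Any, Iterable, Optional


def load_export(path: str) -> list:
    """Read a JSON dataset export (assumed to be an array of records)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def best_resolution(value: Any) -> Optional[float]:
    """Return the smallest (best) resolution from a number, a list, or None."""
    if value is None:
        return None
    values = value if isinstance(value, (list, tuple)) else [value]
    numbers = [float(v) for v in values
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return min(numbers) if numbers else None


def filter_records(records: Iterable[dict], method_contains: str,
                   max_resolution: float) -> list:
    """Keep records whose method matches and whose resolution is good enough."""
    kept = []
    for record in records:
        method = str(record.get("experimental_method") or "")
        resolution = best_resolution(record.get("resolution_combined"))
        if (method_contains.lower() in method.lower()
                and resolution is not None
                and resolution <= max_resolution):
            kept.append(record)
    return kept


# Made-up records standing in for load_export("your-export.json").
demo = [
    {"rcsb_id": "AAAA", "experimental_method": "X-ray", "resolution_combined": [1.8]},
    {"rcsb_id": "BBBB", "experimental_method": "EM", "resolution_combined": [3.4]},
    {"rcsb_id": "CCCC", "experimental_method": "NMR", "resolution_combined": None},
]
print([r["rcsb_id"] for r in filter_records(demo, "x-ray", 2.0)])
```

Lower resolution values mean finer detail, which is why the filter keeps records at or below the threshold and skips records with no reported resolution.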

🔗 Recommended Actors

🤗 Hugging Face Model Scraper - Model metadata, downloads, and benchmarks
🏥 FINRA BrokerCheck Scraper - U.S. broker and firm regulatory disclosures
🏨 Greatschools Scraper - U.S. school ratings and demographics
📈 Smart Apify Actor Scraper - Apify Store actor metadata and quality signals

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by RCSB PDB, the wwPDB, or any of their partner sites. All trademarks mentioned are the property of their respective owners. Only publicly available open structural-biology data is collected.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.
