Pricing

from $28.12 / 1,000 results

UniProt Protein Sequence & Annotation Scraper

Export UniProt Knowledgebase entries — search Swiss-Prot by organism, keyword, gene, or any UniProt query, or fetch a single accession. Returns names, genes, organism, sequence length & molecular weight, keywords, comments, features, and PDB/RefSeq/Ensembl/KEGG cross-refs.

Developer

ParseForge

Last modified: a day ago

🧬 UniProt Protein Sequence & Annotation Scraper

🚀 Export UniProt Knowledgebase entries in seconds. Query Swiss-Prot and TrEMBL by organism, gene, keyword, subcellular location, length range, or any UniProt field, or fetch a single accession with full annotations. No API key, no SPARQL, no XML parsing.

The UniProt Protein Scraper queries the official UniProt REST API and returns standardized protein records from the world's largest protein-sequence knowledgebase. Each entry carries the primary accession, UniProtKB ID, entry type (reviewed Swiss-Prot vs unreviewed TrEMBL), protein name, alternative names, gene names, organism (scientific + common + taxon ID + lineage), evidence level, annotation score, sequence length, molecular weight, CRC64 / MD5 sequence hashes, keywords (with categories), curated comments (function, subunit, subcellular location, etc.), structural features, reference counts, last-update date, entry version, and the canonical UniProt URL.

UniProt is maintained jointly by EMBL-EBI, SIB, and PIR and is the de facto reference for protein biology in research, pharma, and bioinformatics. Coverage spans 250 million+ entries across 2.7 million+ species in TrEMBL, with ~570,000 manually curated entries in Swiss-Prot. This Actor flattens UniProt's nested JSON into rows that drop into pandas, R, or any warehouse.

🎯 Target Audience: Bioinformatics teams, computational biologists, pharma research, structural biologists, drug-discovery startups, science journalists
💡 Primary Use Cases: Proteome exports, gene-to-protein mapping, target dossier builds, organism-level annotation, sequence + feature retrieval, cross-database joining

📋 What the UniProt Scraper does

Two lookup modes in one Actor:

🔍 Query mode. Pass any UniProt query string. Examples: reviewed:true AND organism_id:9606, keyword:KW-0181, gene:BRCA1, cc_subcellular_location:nucleus, existence:1, taxonomy_id:10090 AND length:[100 TO 500].
🆔 Accession mode. Set accession (e.g. P00533) for a single full-entry pull. Skips the search query entirely.
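The two modes can be sketched as plain input dictionaries. The helper below is a minimal illustration, not part of the Actor: `build_uniprot_query` is a hypothetical function that joins a few filters with AND, and the `query` input key is an assumption (only `accession` is named in the text above).

```python
from typing import Optional, Tuple


def build_uniprot_query(
    reviewed: Optional[bool] = None,
    organism_id: Optional[int] = None,
    gene: Optional[str] = None,
    keyword: Optional[str] = None,
    length_range: Optional[Tuple[int, int]] = None,
) -> str:
    """Join the supplied filters into one UniProt query string with AND."""
    clauses = []
    if reviewed is not None:
        # UniProt expects lowercase true/false.
        clauses.append(f"reviewed:{str(reviewed).lower()}")
    if organism_id is not None:
        clauses.append(f"organism_id:{organism_id}")
    if gene:
        clauses.append(f"gene:{gene}")
    if keyword:
        clauses.append(f"keyword:{keyword}")
    if length_range is not None:
        low, high = length_range
        if low > high:
            raise ValueError("length_range must be (low, high) with low <= high")
        clauses.append(f"length:[{low} TO {high}]")
    if not clauses:
        raise ValueError("at least one filter is required")
    return " AND ".join(clauses)


# Hypothetical Actor inputs: "accession" is named in the text above,
# "query" is an assumed key for the search string.
query_input = {"query": build_uniprot_query(reviewed=True, organism_id=9606)}
accession_input = {"accession": "P00533"}
```

Check the Actor's input schema for the actual key names before reusing these dictionaries.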

Each record carries identifiers (primary accession, UniProtKB ID, entry type), names (protein name, alternative names, gene names), taxonomy (scientific + common organism, taxon ID, lineage), evidence (protein existence, annotation score), sequence facts (length, molecular weight, CRC64, MD5, plus optional full sequence string), curated annotations (keywords, comments, features), reference + feature counts, last-updated date, version, and the canonical UniProt URL.

💡 Why it matters: UniProt's REST API is rich but verbose. Researchers and engineering teams spend days writing parsers for keywords, comments, and features. This Actor flattens the response into 28 spreadsheet-ready fields so target dossiers, comparative proteomics, and dataset prep land in one query.
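To give a feel for what flattening means here, the sketch below reduces a nested UniProt-style entry to a flat row. It is an illustration only and not the Actor's own code: `flatten_entry` is a hypothetical helper, the sample entry is hand-written and heavily abbreviated, and joining lists with "; " is an assumed convention.

```python
from typing import Any, Dict


def flatten_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a nested UniProt-style entry to a flat, spreadsheet-friendly row."""
    description = entry.get("proteinDescription", {})
    recommended = description.get("recommendedName", {})
    organism = entry.get("organism", {})
    sequence = entry.get("sequence", {})
    # Each gene object may carry a primary name; skip those that do not.
    gene_names = [
        gene["geneName"]["value"]
        for gene in entry.get("genes", [])
        if "geneName" in gene
    ]
    return {
        "primaryAccession": entry.get("primaryAccession"),
        "proteinName": recommended.get("fullName", {}).get("value"),
        "geneNames": "; ".join(gene_names),
        "organismScientific": organism.get("scientificName"),
        "taxonId": organism.get("taxonId"),
        "sequenceLength": sequence.get("length"),
        "keywords": "; ".join(k["name"] for k in entry.get("keywords", [])),
    }


# Hand-written, heavily abbreviated sample; not real API output.
sample_entry = {
    "primaryAccession": "P00533",
    "proteinDescription": {
        "recommendedName": {"fullName": {"value": "Epidermal growth factor receptor"}}
    },
    "genes": [{"geneName": {"value": "EGFR"}}],
    "organism": {"scientificName": "Homo sapiens", "taxonId": 9606},
    "sequence": {"length": 1210},
    "keywords": [{"name": "Kinase"}, {"name": "Receptor"}],
}
row = flatten_entry(sample_entry)
```

Missing branches fall back to None or an empty string, so a sparse entry still yields a row with every column present.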

📊 Data fields

Each record includes: alternativeNames, annotationScore, comments, crossReferences, ecNumbers, entryType, entryVersion, featureCount, features, geneNames, geneSynonyms, keywords, lastUpdated, organismCommon, organismLineage, organismScientific, primaryAccession, proteinExistence, proteinName, referenceCount, scrapedAt, sequenceCrc64, sequenceLength, sequenceMd5, sequenceMolWeight, taxonId, uniProtkbId, url. All 28 field names come from a real production run, so what you see here is what lands in your dataset.
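If you want to guard a downstream pipeline against schema drift, a small check against the field names listed above may help. This is a sketch under the assumption that each dataset item is a flat dictionary keyed by those names; `missing_fields` and `unexpected_fields` are hypothetical helpers.

```python
from typing import Any, Dict, List

# Field names copied from the list above.
EXPECTED_FIELDS = frozenset({
    "alternativeNames", "annotationScore", "comments", "crossReferences",
    "ecNumbers", "entryType", "entryVersion", "featureCount", "features",
    "geneNames", "geneSynonyms", "keywords", "lastUpdated", "organismCommon",
    "organismLineage", "organismScientific", "primaryAccession",
    "proteinExistence", "proteinName", "referenceCount", "scrapedAt",
    "sequenceCrc64", "sequenceLength", "sequenceMd5", "sequenceMolWeight",
    "taxonId", "uniProtkbId", "url",
})


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return expected field names that the record does not contain."""
    return sorted(EXPECTED_FIELDS - set(record))


def unexpected_fields(record: Dict[str, Any]) -> List[str]:
    """Return record keys that are not in the expected field list."""
    return sorted(set(record) - EXPECTED_FIELDS)
```

Running both helpers on the first record of a run is usually enough to catch a renamed or dropped column before it reaches a warehouse load.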

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the UniProt Protein Scraper page on the Apify Store.
🎯 Set input. Pick a query (reviewed:true AND organism_id:9606 is a great starter) or an accession.
🚀 Run it. Click Start and let the Actor walk the UniProt API.
📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to a downloaded proteome slice: 3-5 minutes. No coding required.
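For readers who do want to script against the download, here is a minimal sketch that summarizes a JSON export with the standard library. It assumes the export is a top-level array of record objects that carry the `organismScientific` field; `count_by_organism` is a hypothetical helper and the sample records are made up for illustration.

```python
import json
from collections import Counter


def count_by_organism(json_text: str) -> Counter:
    """Count exported records per organism; records without one go to 'unknown'."""
    records = json.loads(json_text)
    return Counter(r.get("organismScientific") or "unknown" for r in records)


# Made-up sample standing in for the contents of a downloaded JSON file.
sample_export = json.dumps([
    {"primaryAccession": "P00533", "organismScientific": "Homo sapiens"},
    {"primaryAccession": "P04637", "organismScientific": "Homo sapiens"},
    {"primaryAccession": "P02340", "organismScientific": "Mus musculus"},
])
counts = count_by_organism(sample_export)
```

With a real export, read the file contents into a string and pass that string to `count_by_organism` instead of `sample_export`.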

🔗 Recommended Actors

💊 RxNorm Drug Concepts Scraper - Standardized US drug vocabulary
🏥 ICD-10-CM, LOINC & Clinical Terminology Scraper - Diagnosis, lab, and drug codes
🤗 Hugging Face Model Scraper - AI model registry metadata
🛡️ urlscan.io Threat Intelligence Scraper - Live web scan data
🌐 RDAP Domain Lookup Scraper - Modern WHOIS replacement

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by EMBL-EBI, the SIB Swiss Institute of Bioinformatics, the Protein Information Resource (PIR), the UniProt Consortium, or any of their funding agencies. All trademarks mentioned are the property of their respective owners. Only publicly available UniProtKB data is collected. Please cite UniProt as required by their CC BY 4.0 license.

🆘 Need Help?

If you hit a bug, have questions about setup, or need a scraper we haven't built yet, open our contact form or write to parseforge@protonmail.com. We also take on paid custom data projects.

For faster answers, join our Discord. It's the best place to get support and suggest new actors.
