UniProt Protein Sequence & Annotation Scraper
Pricing
from $28.12 / 1,000 results
Export UniProt Knowledgebase entries — search Swiss-Prot by organism, keyword, gene, or any UniProt query, or fetch a single accession. Returns names, genes, organism, sequence length & molecular weight, keywords, comments, features, and PDB/RefSeq/Ensembl/KEGG cross-refs.
Developer: ParseForge (maintained by Community)

🧬 UniProt Protein Sequence & Annotation Scraper
🚀 Export UniProt Knowledgebase entries in seconds. Query Swiss-Prot and TrEMBL by organism, gene, keyword, subcellular location, length range, or any UniProt field, or fetch a single accession with full annotations. No API key, no SPARQL, no XML parsing.
🕒 Last updated: 2026-05-13 · 📊 25 fields per entry · 🧬 250M+ UniProt entries · 🌍 every kingdom of life
The UniProt Protein Scraper queries the official UniProt REST API and returns standardized protein records from the world's largest protein-sequence knowledgebase. Each entry carries the primary accession, UniProtKB ID, entry type (reviewed Swiss-Prot vs unreviewed TrEMBL), protein name, alternative names, gene names, organism (scientific + common + taxon ID + lineage), evidence level, annotation score, sequence length, molecular weight, CRC64 / MD5 sequence hashes, keywords (with categories), curated comments (function, subunit, subcellular location, etc.), structural features, reference counts, last-update date, entry version, and the canonical UniProt URL.
UniProt is maintained jointly by EMBL-EBI, SIB, and PIR and is the de facto reference for protein biology in research, pharma, and bioinformatics. Coverage spans 250 million+ entries across 2.7 million+ species in TrEMBL, with ~570,000 manually curated entries in Swiss-Prot. This Actor flattens UniProt's nested JSON into rows that drop into pandas, R, or any warehouse.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Bioinformatics teams, computational biologists, pharma research, structural biologists, drug-discovery startups, science journalists | Proteome exports, gene-to-protein mapping, target dossier builds, organism-level annotation, sequence + feature retrieval, cross-database joining |
📋 What the UniProt Scraper does
Two lookup modes in one Actor:
- 🔍 Query mode. Pass any UniProt query (`reviewed:true AND organism_id:9606`, `keyword:KW-0181`, `gene:BRCA1`, `cc_subcellular_location:nucleus`, `existence:1`, `taxonomy_id:10090 AND length:[100 TO 500]`).
- 🆔 Accession mode. Set `accession` (e.g. `P00533`) for a single full-entry pull. Skips the search query entirely.
Each record carries identifiers (primary accession, UniProtKB ID, entry type), names (protein name, alternative names, gene names), taxonomy (scientific + common organism, taxon ID, lineage), evidence (protein existence, annotation score), sequence facts (length, molecular weight, CRC64, MD5, plus optional full sequence string), curated annotations (keywords, comments, features), reference + feature counts, last-updated date, version, and the canonical UniProt URL.
💡 Why it matters: UniProt's REST API is rich but verbose, and teams often end up writing their own parsers for keywords, comments, and features. This Actor flattens the response into 25 spreadsheet-ready fields, so target dossiers, comparative proteomics, and dataset prep can start from a single query.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing a human proteome pull, gene lookup, and accession fetch.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| query | string | "reviewed:true AND organism_id:9606" | UniProt query syntax. Supports reviewed:, organism_id:, taxonomy_id:, gene:, keyword:, cc_subcellular_location:, existence:, length:[X TO Y], and more. Ignored when accession is set. |
| accession | string | "" | Single UniProt accession (e.g. P00533). Bypasses the search query when set. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| fetchSequence | boolean | false | When true, embeds the full amino-acid sequence string in every record. Sequence length and molecular weight are always returned. |
| pageSize | integer | 500 | Entries per API request. UniProt hard max is 500. |
Example: every reviewed human Swiss-Prot entry.
{"query": "reviewed:true AND organism_id:9606","maxItems": 1000,"pageSize": 500}
Example: single accession, full sequence included.
{"accession": "P00533","fetchSequence": true}
⚠️ Good to Know: the `accession` field is for a single entry. To resolve a list of accessions, use the query syntax: `accession:P00533 OR accession:P04637`. Use `fetchSequence: false` (default) when you do not need the raw amino-acid string. Sequence length and molecular weight are always returned regardless.
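For longer accession lists, the OR query can be generated rather than typed by hand. The sketch below is illustrative only: `accessions_to_query` is a hypothetical helper, and the regular expression follows the accession format UniProt publishes in its help pages, used here purely as a sanity check before the query is sent.

```python
import re
from typing import Iterable

# Accession shape as documented by UniProt (6 or 10 characters); a sanity check only.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def accessions_to_query(accessions: Iterable[str]) -> str:
    """Turn a list of accessions into `accession:X OR accession:Y ...`."""
    unique: list[str] = []
    for raw in accessions:
        accession = raw.strip().upper()
        if not ACCESSION_RE.fullmatch(accession):
            raise ValueError(f"not a UniProt accession: {raw!r}")
        if accession not in unique:  # drop duplicates, keep first-seen order
            unique.append(accession)
    if not unique:
        raise ValueError("at least one accession is required")
    return " OR ".join(f"accession:{accession}" for accession in unique)
```

The resulting string goes into the `query` input field. Very long lists may need to be split across several runs, since the query travels in a request URL.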
📊 Output
Each entry carries 25 fields. Download as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 primaryAccession | string | "A0A0C5B5G6" |
| 🏷️ uniProtkbId | string | "MOTSC_HUMAN" |
| 📚 entryType | string | "UniProtKB reviewed (Swiss-Prot)" |
| 🧬 proteinName | string | "Mitochondrial-derived peptide MOTS-c" |
| 📝 alternativeNames | string[] | ["Mitochondrial open reading frame of the 12S rRNA-c"] |
| 🧫 geneNames | string[] | ["MT-RNR1"] |
| 🦠 organismScientific | string | "Homo sapiens" |
| 👤 organismCommon | string | "Human" |
| 🆔 taxonId | number | 9606 |
| 🌳 organismLineage | string[] | ["Eukaryota","Metazoa","Chordata",...] |
| 🧪 proteinExistence | string | "1: Evidence at protein level" |
| ⭐ annotationScore | number | 5 |
| 📏 sequenceLength | number | 16 |
| ⚖️ sequenceMolWeight | number | 2175 |
| 🔐 sequenceCrc64 | string | "361DE748426DD505" |
| 🔐 sequenceMd5 | string | "AE72B6C4E87692429C0D558B92BD7B3D" |
| 🏷️ keywords | object[] | [{ "id": "KW-0238", "category": "Molecular function", "name": "DNA-binding" }] |
| 💬 comments | object[] | [{ "type": "FUNCTION", "text": "Regulates insulin sensitivity ..." }] |
| 🧩 features | object[] | [{ "type": "Chain", "description": "MOTS-c", "start": 1, "end": 16 }] |
| 📖 referenceCount | number | 17 |
| 🧱 featureCount | number | 6 |
| 📅 lastUpdated | date | "2026-01-28" |
| 🔢 entryVersion | number | 30 |
| 🔗 url | string | "https://www.uniprot.org/uniprotkb/A0A0C5B5G6/entry" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-13T22:25:18.386Z" |
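Because `keywords`, `comments`, and `features` arrive as arrays of objects, a spreadsheet export usually needs one more flattening step on the consumer side. The sketch below shows one possible approach using only the standard library. The helper names and the chosen output columns (`keywordNames`, `function`) are hypothetical, and the field names it reads are taken from the schema table.

```python
import csv
import io
from typing import Iterable


def record_to_row(record: dict) -> dict:
    """Collapse array fields from one record into plain spreadsheet cells."""
    function_texts = [
        comment.get("text", "")
        for comment in record.get("comments", [])
        if comment.get("type") == "FUNCTION"
    ]
    return {
        "primaryAccession": record.get("primaryAccession", ""),
        "proteinName": record.get("proteinName", ""),
        "geneNames": "; ".join(record.get("geneNames", [])),
        "taxonId": record.get("taxonId", ""),
        "sequenceLength": record.get("sequenceLength", ""),
        "keywordNames": "; ".join(k.get("name", "") for k in record.get("keywords", [])),
        "function": " ".join(function_texts),
    }


def records_to_csv(records: Iterable[dict]) -> str:
    """Render records as CSV text with a header row."""
    rows = [record_to_row(record) for record in records]
    buffer = io.StringIO()
    if rows:
        writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
    return buffer.getvalue()
```

Joining list values with `"; "` keeps each entry on a single row. A long format with one row per keyword or feature is the alternative when the annotations need to be filtered or joined individually.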
📦 Sample record
✨ Why choose this Actor
| | Capability |
|---|---|
| 🧬 | Authoritative knowledgebase. Pulls directly from the official UniProt REST API. |
| 🔍 | Full query syntax. Every UniProt search field works: organism, gene, keyword, location, length range, evidence, taxonomy. |
| 🆔 | Accession fast-path. Set accession: to pull one entry without writing a query. |
| 📏 | Sequence facts built in. Length and molecular weight always returned. Full sequence string available on demand. |
| 🏷️ | Curated annotations exposed. Keywords, comments, and features come through as structured arrays. |
| 🚫 | No API key. UniProt is a free public service. |
| 🔁 | Always fresh. Reflects the current UniProt release. |
📊 UniProt accessions are cited widely across the protein biology, drug discovery, and structural biology literature.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Format | Setup |
|---|---|---|---|---|---|
| ⭐ UniProt Scraper (this Actor) | $5 free credit, then pay-per-use | UniProtKB (Swiss-Prot + TrEMBL) | Live per run | Flat JSON / CSV | ⚡ 2 min |
| Direct REST API calls | Free | Same | Live | Nested JSON | 🐢 Hours |
| Full release FASTA + XML download | Free | Full UniProt | 8-week release | Massive flatfiles | 🐢 Days |
| Commercial bioinformatics platform | $$$ | Curated subset | Real-time | Web UI / API | ⏳ Vendor onboarding |
Pick this Actor when you want UniProt records in a flat table without writing a client or downloading the release.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the UniProt Protein Scraper page on the Apify Store.
- 🎯 Set input. Pick a query (`reviewed:true AND organism_id:9606` is a great starter) or an accession.
- 🚀 Run it. Click Start and let the Actor walk the UniProt API.
- 📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to a downloaded proteome slice: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating UniProt Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. UniProt has an eight-week release cycle. Schedule a refresh on the same cadence to stay current.
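A programmatic run from Python could look roughly like the sketch below. It assumes the `apify-client` package's `ApifyClient(token).actor(...).call(run_input=...)` and `dataset(...).iterate_items()` interface. `ACTOR_ID` is a placeholder (the real ID is shown on the Actor's Store page), and `run_and_collect`, `main`, and the `APIFY_TOKEN` environment variable are illustrative names, not part of this Actor.

```python
from typing import Any

ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the Store page


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    import os

    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var holding your API token
    items = run_and_collect(
        client,
        ACTOR_ID,
        {"query": "gene:BRCA1 AND reviewed:true", "maxItems": 10},
    )
    for item in items:
        print(item["primaryAccession"], item["proteinName"])
```

Call `main()` with the token set to try it. Passing the client in as an argument keeps `run_and_collect` easy to exercise with a stub in tests, without touching the network.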
🌟 Beyond business use cases
UniProt data feeds far more than commercial pharma. The same structured records support research, education, and open-science work.
❓ Frequently Asked Questions
🧩 How does it work?
Either supply a UniProt query (reviewed:true AND organism_id:9606) or an accession (P00533), then click Start. The Actor pages through the UniProt REST API, flattens nested fields, and emits a row per entry with 25 columns including keywords, comments, and features.
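For readers curious what "pages through the API and flattens nested fields" involves, here is a rough standalone sketch against the public UniProt REST search endpoint. It is not the Actor's source code. It assumes the endpoint `https://rest.uniprot.org/uniprotkb/search`, cursor pagination via the `Link` response header, and the nested JSON keys shown in the comments. All function names are hypothetical, and only a handful of the 25 output fields are mapped.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Iterator, Optional

SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"  # assumed public endpoint
NEXT_LINK_RE = re.compile(r'<([^>]+)>;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = NEXT_LINK_RE.search(link_header)
    return match.group(1) if match else None


def flatten_entry(entry: dict) -> dict:
    """Map a nested UniProtKB JSON entry onto a few flat output fields."""
    description = entry.get("proteinDescription", {})
    # Unreviewed entries may carry submissionNames instead of a recommendedName.
    names = description.get("recommendedName") or (description.get("submissionNames") or [{}])[0]
    organism = entry.get("organism", {})
    sequence = entry.get("sequence", {})
    return {
        "primaryAccession": entry.get("primaryAccession"),
        "uniProtkbId": entry.get("uniProtkbId"),
        "entryType": entry.get("entryType"),
        "proteinName": names.get("fullName", {}).get("value"),
        "geneNames": [g["geneName"]["value"] for g in entry.get("genes", []) if "geneName" in g],
        "organismScientific": organism.get("scientificName"),
        "taxonId": organism.get("taxonId"),
        "sequenceLength": sequence.get("length"),
        "sequenceMolWeight": sequence.get("molWeight"),
    }


def iter_entries(query: str, page_size: int = 500, max_items: int = 10) -> Iterator[dict]:
    """Yield flattened entries, following next-page links until max_items is reached."""
    params = urllib.parse.urlencode({"query": query, "size": page_size, "format": "json"})
    url: Optional[str] = f"{SEARCH_URL}?{params}"
    emitted = 0
    while url and emitted < max_items:
        with urllib.request.urlopen(url) as response:
            payload = json.load(response)
            url = next_page_url(response.headers.get("Link"))
        for entry in payload.get("results", []):
            if emitted >= max_items:
                return
            yield flatten_entry(entry)
            emitted += 1
```

Following the server-provided next link, rather than computing offsets, lets the loop stop cleanly when the header is absent on the final page.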
🔍 What query syntax can I use?
Everything UniProt supports in its own search bar. Common fields: reviewed:, organism_id:, taxonomy_id:, gene:, keyword:, cc_subcellular_location:, existence:, length:[X TO Y], accession:, plus boolean AND/OR/NOT. See the UniProt query fields docs for the full list.
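Queries with several clauses are easier to keep correct when they are composed from parts. The helpers below are a small hypothetical sketch (none of these functions ship with the Actor or with UniProt). They only produce strings in the `field:value` form with AND/OR shown above, and they assume multi-word values should be wrapped in double quotes.

```python
def field(name: str, value: object) -> str:
    """Render one `field:value` clause, quoting values that contain spaces."""
    text = str(value)
    if " " in text:
        text = f'"{text}"'
    return f"{name}:{text}"


def length_range(low: int, high: int) -> str:
    """Render the `length:[X TO Y]` range clause."""
    if low > high:
        raise ValueError("low must not exceed high")
    return f"length:[{low} TO {high}]"


def any_of(*clauses: str) -> str:
    """Join clauses with OR."""
    return " OR ".join(clauses)


def all_of(*clauses: str) -> str:
    """Join clauses with AND, parenthesising any clause that contains an OR."""
    return " AND ".join(f"({c})" if " OR " in c else c for c in clauses)
```

For example, `all_of(any_of(field("gene", "BRCA1"), field("gene", "BRCA2")), field("reviewed", "true"))` yields `(gene:BRCA1 OR gene:BRCA2) AND reviewed:true`, which can be passed as the `query` input.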
🆔 How do I look up a single accession?
Set the accession field (e.g. P00533). It bypasses the query and pulls the full entry directly.
🧬 How do I look up many accessions at once?
Use the query syntax with OR: accession:P00533 OR accession:P04637 OR accession:Q9Y6K8.
📏 Does it include the full sequence string?
Only when fetchSequence: true. Sequence length and molecular weight are always returned. Skip the full string for big proteomes to keep dataset sizes manageable.
🔁 How fresh is the data?
UniProt releases every eight weeks. Every run hits the live API, so output reflects the current release.
📚 What is the difference between Swiss-Prot and TrEMBL?
Swiss-Prot is manually curated (reviewed:true, ~570K entries). TrEMBL is automatically annotated (reviewed:false, hundreds of millions of entries). Pick the slice your work needs.
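When a query mixes both sections, the `entryType` field tells them apart after the fact. A minimal sketch follows, assuming unreviewed entries carry the value "UniProtKB unreviewed (TrEMBL)" (only the Swiss-Prot value appears in the schema above). `count_by_section` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def count_by_section(records: Iterable[dict]) -> dict:
    """Count records per UniProtKB section based on the entryType string."""
    counts: Counter = Counter()
    for record in records:
        entry_type = record.get("entryType", "")
        # Match on the section name: "unreviewed" contains "reviewed",
        # so a substring test on "reviewed" would misclassify TrEMBL entries.
        if "Swiss-Prot" in entry_type:
            counts["Swiss-Prot"] += 1
        elif "TrEMBL" in entry_type:
            counts["TrEMBL"] += 1
        else:
            counts["unknown"] += 1
    return dict(counts)
```

Filtering at query time with `reviewed:true` or `reviewed:false` remains the cheaper option, since it avoids paying for records that are then discarded.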
🚫 Do I need an API key?
No. The UniProt REST API is free and public.
⏰ Can I schedule recurring runs?
Yes. Use Apify Schedules to refresh on the UniProt release cadence and pipe results into your pipeline.
⚖️ Is this data legal to use?
Yes. UniProt is released under CC BY 4.0. Attribute UniProt in any downstream publication or product, as their license requires.
💳 Do I need a paid Apify plan?
No. The free plan covers small runs (10 records). A paid plan unlocks higher limits and scheduling.
🆘 What if I need help?
Reach out via the contact form below to request a custom protein workflow.
🔌 Integrate with any app
UniProt Protein Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step research workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get release notifications in your channels
- Airbyte - Pipe protein records into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh UniProt entries into your bio pipeline or alert your team in Slack.
🔗 Recommended Actors
- 💊 RxNorm Drug Concepts Scraper - Standardized US drug vocabulary
- 🏥 ICD-10-CM, LOINC & Clinical Terminology Scraper - Diagnosis, lab, and drug codes
- 🤗 Hugging Face Model Scraper - AI model registry metadata
- 🛡️ urlscan.io Threat Intelligence Scraper - Live web scan data
- 🌐 RDAP Domain Lookup Scraper - Modern WHOIS replacement
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by EMBL-EBI, the SIB Swiss Institute of Bioinformatics, the Protein Information Resource (PIR), the UniProt Consortium, or any of their funding agencies. All trademarks mentioned are the property of their respective owners. Only publicly available UniProtKB data is collected. Please cite UniProt as required by their CC BY 4.0 license.