# UniProt Protein Sequence & Annotation Scraper (`parseforge/uniprot-scraper`) Actor

Export UniProt Knowledgebase entries — search Swiss-Prot by organism, keyword, gene, or any UniProt query, or fetch a single accession. Returns names, genes, organism, sequence length & molecular weight, keywords, comments, features, and PDB/RefSeq/Ensembl/KEGG cross-refs.

- **URL**: https://apify.com/parseforge/uniprot-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Education, Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $28.12 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
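
For budgeting, the listed rate converts to a per-run figure with simple arithmetic. The sketch below assumes the listed "from" rate of $28.12 per 1,000 results applies uniformly to every result and ignores plan discounts and any other billable events, so treat the output as a ballpark, not a quote. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Assumption: the listed "from" rate applies uniformly to every result.
RATE_PER_1000 = Decimal("28.12")


def estimate_cost(results: int, rate_per_1000: Decimal = RATE_PER_1000) -> Decimal:
    """Return an estimated charge in USD, rounded to cents."""
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = Decimal(results) * rate_per_1000 / Decimal(1000)
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(10))    # the free-plan preview size
print(estimate_cost(1000))  # one full pricing unit
```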

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/raw/main/banner.jpg)

## 🧬 UniProt Protein Sequence & Annotation Scraper

> 🚀 **Export UniProt Knowledgebase entries in seconds.** Query Swiss-Prot and TrEMBL by **organism, gene, keyword, subcellular location, length range, or any UniProt field**, or fetch a single accession with full annotations. No API key, no SPARQL, no XML parsing.

> 🕒 **Last updated:** 2026-05-13 · **📊 25 fields** per entry · **🧬 250M+ UniProt entries** · **🌍 every kingdom of life**

The **UniProt Protein Scraper** queries the official UniProt REST API and returns standardized protein records from the world's largest protein-sequence knowledgebase. Each entry carries the primary accession, UniProtKB ID, entry type (reviewed Swiss-Prot vs unreviewed TrEMBL), protein name, alternative names, gene names, organism (scientific + common + taxon ID + lineage), evidence level, annotation score, sequence length, molecular weight, CRC64 / MD5 sequence hashes, keywords (with categories), curated comments (function, subunit, subcellular location, etc.), structural features, reference counts, last-update date, entry version, and the canonical UniProt URL.

UniProt is maintained jointly by **EMBL-EBI, SIB, and PIR** and is the de facto reference for protein biology in research, pharma, and bioinformatics. Coverage spans **250 million+ entries** across **2.7 million+ species** in TrEMBL, with **~570,000 manually curated entries in Swiss-Prot**. This Actor flattens UniProt's nested JSON into rows that drop into pandas, R, or any warehouse.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Bioinformatics teams, computational biologists, pharma research, structural biologists, drug-discovery startups, science journalists | Proteome exports, gene-to-protein mapping, target dossier builds, organism-level annotation, sequence + feature retrieval, cross-database joining |

---

### 📋 What the UniProt Scraper does

Two lookup modes in one Actor:

- 🔍 **Query mode.** Pass any UniProt query (`reviewed:true AND organism_id:9606`, `keyword:KW-0181`, `gene:BRCA1`, `cc_subcellular_location:nucleus`, `existence:1`, `taxonomy_id:10090 AND length:[100 TO 500]`).
- 🆔 **Accession mode.** Set `accession` (e.g. `P00533`) for a single full-entry pull. Skips the search query entirely.

Each record carries identifiers (primary accession, UniProtKB ID, entry type), names (protein name, alternative names, gene names), taxonomy (scientific + common organism, taxon ID, lineage), evidence (protein existence, annotation score), sequence facts (length, molecular weight, CRC64, MD5, plus optional full sequence string), curated annotations (keywords, comments, features), reference + feature counts, last-updated date, version, and the canonical UniProt URL.

> 💡 **Why it matters:** UniProt's REST API is rich but verbose. Researchers and engineering teams spend days writing parsers for keywords, comments, and features. This Actor flattens the response into 25 spreadsheet-ready fields so target dossiers, comparative proteomics, and dataset prep land in one query.

---

### 🎬 Full Demo

_🚧 Coming soon: a 3-minute walkthrough showing a human proteome pull, gene lookup, and accession fetch._

---

### ⚙️ Input

<table>
<thead>
<tr><th>Input</th><th>Type</th><th>Default</th><th>Behavior</th></tr>
</thead>
<tbody>
<tr><td><code>query</code></td><td>string</td><td><code>"reviewed:true AND organism_id:9606"</code></td><td>UniProt query syntax. Supports <code>reviewed:</code>, <code>organism_id:</code>, <code>taxonomy_id:</code>, <code>gene:</code>, <code>keyword:</code>, <code>cc_subcellular_location:</code>, <code>existence:</code>, <code>length:[X TO Y]</code>, and more. Ignored when <code>accession</code> is set.</td></tr>
<tr><td><code>accession</code></td><td>string</td><td><code>""</code></td><td>Single UniProt accession (e.g. <code>P00533</code>). Bypasses the search query when set.</td></tr>
<tr><td><code>maxItems</code></td><td>integer</td><td><code>10</code></td><td>Records to return. Free plan caps at 10, paid plan at 1,000,000.</td></tr>
<tr><td><code>fetchSequence</code></td><td>boolean</td><td><code>false</code></td><td>When <code>true</code>, embeds the full amino-acid sequence string in every record. Sequence length and molecular weight are always returned.</td></tr>
<tr><td><code>pageSize</code></td><td>integer</td><td><code>500</code></td><td>Entries per API request. UniProt hard max is 500.</td></tr>
</tbody>
</table>

**Example: every reviewed human Swiss-Prot entry.**

```json
{
    "query": "reviewed:true AND organism_id:9606",
    "maxItems": 1000,
    "pageSize": 500
}
````

**Example: single accession, full sequence included.**

```json
{
    "accession": "P00533",
    "fetchSequence": true
}
```
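
The constraints from the input table can be checked before a run starts. The following is a minimal sketch and not part of the Actor: `build_input` is a hypothetical helper, and the limits it enforces are the ones stated in this document (`maxItems` from 1 to 1,000,000, `pageSize` from 1 to 500). It sends only `accession` when one is given, mirroring the documented rule that the query is ignored in that case.

```python
from typing import Any, Dict, Optional

MAX_ITEMS_LIMIT = 1_000_000  # upper bound stated for paid plans
PAGE_SIZE_LIMIT = 500        # UniProt hard max per API request


def build_input(
    query: Optional[str] = None,
    accession: Optional[str] = None,
    max_items: int = 10,
    fetch_sequence: bool = False,
    page_size: int = 500,
) -> Dict[str, Any]:
    """Build an Actor input dict, rejecting out-of-range values early."""
    if not query and not accession:
        raise ValueError("provide either a query or an accession")
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    if not 1 <= page_size <= PAGE_SIZE_LIMIT:
        raise ValueError(f"pageSize must be between 1 and {PAGE_SIZE_LIMIT}")

    payload: Dict[str, Any] = {
        "maxItems": max_items,
        "fetchSequence": fetch_sequence,
        "pageSize": page_size,
    }
    if accession:
        # Accession mode: the query would be ignored, so it is left out.
        payload["accession"] = accession.strip()
    else:
        payload["query"] = query.strip()
    return payload


print(build_input(accession="P00533", fetch_sequence=True))
```

The free-plan cap of 10 records is not enforced here, since the helper has no way of knowing which plan the caller is on.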

> ⚠️ **Good to Know:** the `accession` field is for a single entry. To resolve a list of accessions, use the query syntax: `accession:P00533 OR accession:P04637`. Use `fetchSequence: false` (default) when you do not need the raw amino-acid string. Sequence length and molecular weight are always returned regardless.

***

### 📊 Output

Each entry carries **25 fields**. Download as CSV, Excel, JSON, or XML.

#### 🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `primaryAccession` | string | `"A0A0C5B5G6"` |
| 🏷️ `uniProtkbId` | string | `"MOTSC_HUMAN"` |
| 📚 `entryType` | string | `"UniProtKB reviewed (Swiss-Prot)"` |
| 🧬 `proteinName` | string | `"Mitochondrial-derived peptide MOTS-c"` |
| 📝 `alternativeNames` | string\[] | `["Mitochondrial open reading frame of the 12S rRNA-c"]` |
| 🧫 `geneNames` | string\[] | `["MT-RNR1"]` |
| 🦠 `organismScientific` | string | `"Homo sapiens"` |
| 👤 `organismCommon` | string | `"Human"` |
| 🆔 `taxonId` | number | `9606` |
| 🌳 `organismLineage` | string\[] | `["Eukaryota","Metazoa","Chordata",...]` |
| 🧪 `proteinExistence` | string | `"1: Evidence at protein level"` |
| ⭐ `annotationScore` | number | `5` |
| 📏 `sequenceLength` | number | `16` |
| ⚖️ `sequenceMolWeight` | number | `2175` |
| 🔐 `sequenceCrc64` | string | `"361DE748426DD505"` |
| 🔐 `sequenceMd5` | string | `"AE72B6C4E87692429C0D558B92BD7B3D"` |
| 🏷️ `keywords` | object\[] | `[{ "id": "KW-0238", "category": "Molecular function", "name": "DNA-binding" }]` |
| 💬 `comments` | object\[] | `[{ "type": "FUNCTION", "text": "Regulates insulin sensitivity ..." }]` |
| 🧩 `features` | object\[] | `[{ "type": "Chain", "description": "MOTS-c", "start": 1, "end": 16 }]` |
| 📖 `referenceCount` | number | `17` |
| 🧱 `featureCount` | number | `6` |
| 📅 `lastUpdated` | date | `"2026-01-28"` |
| 🔢 `entryVersion` | number | `30` |
| 🔗 `url` | string | `"https://www.uniprot.org/uniprotkb/A0A0C5B5G6/entry"` |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-13T22:25:18.386Z"` |

#### 📦 Sample record

<details>
<summary><strong>🧬 Human MOTS-c peptide (UniProt A0A0C5B5G6)</strong></summary>

```json
{
    "primaryAccession": "A0A0C5B5G6",
    "uniProtkbId": "MOTSC_HUMAN",
    "entryType": "UniProtKB reviewed (Swiss-Prot)",
    "proteinName": "Mitochondrial-derived peptide MOTS-c",
    "alternativeNames": ["Mitochondrial open reading frame of the 12S rRNA-c"],
    "geneNames": ["MT-RNR1"],
    "organismScientific": "Homo sapiens",
    "organismCommon": "Human",
    "taxonId": 9606,
    "organismLineage": [
        "Eukaryota", "Metazoa", "Chordata", "Craniata", "Vertebrata",
        "Euteleostomi", "Mammalia", "Eutheria", "Euarchontoglires",
        "Primates", "Haplorrhini", "Catarrhini", "Hominidae", "Homo"
    ],
    "proteinExistence": "1: Evidence at protein level",
    "annotationScore": 5,
    "sequenceLength": 16,
    "sequenceMolWeight": 2175,
    "sequenceCrc64": "361DE748426DD505",
    "sequenceMd5": "AE72B6C4E87692429C0D558B92BD7B3D",
    "keywords": [
        { "id": "KW-0238", "category": "Molecular function", "name": "DNA-binding" },
        { "id": "KW-0496", "category": "Cellular component", "name": "Mitochondrion" },
        { "id": "KW-0539", "category": "Cellular component", "name": "Nucleus" }
    ],
    "comments": [
        { "type": "FUNCTION", "text": "Regulates insulin sensitivity and metabolic homeostasis ..." },
        { "type": "SUBUNIT", "text": "Interacts with transcription factors ATF1 and NFE2L2/NRF2 ..." }
    ],
    "features": [
        { "type": "Chain", "description": "Mitochondrial-derived peptide MOTS-c", "start": 1, "end": 16 }
    ],
    "referenceCount": 17,
    "featureCount": 6,
    "lastUpdated": "2026-01-28",
    "entryVersion": 30,
    "url": "https://www.uniprot.org/uniprotkb/A0A0C5B5G6/entry",
    "scrapedAt": "2026-05-13T22:25:18.386Z"
}
```

</details>
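
When records arrive as JSON through the API instead of as a CSV download, the nested `keywords`, `comments`, and `features` arrays need to be collapsed before each entry fits on one spreadsheet row. A minimal sketch, assuming the field shapes shown in the sample record; `flatten_record`, `to_csv`, and the separators are illustrative choices, not the Actor's own export logic.

```python
import csv
import io
from typing import Any, Dict, Iterable


def flatten_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Collapse nested arrays into delimited strings, one cell per field."""
    row: Dict[str, Any] = {}
    for key, value in record.items():
        if key == "keywords":
            row[key] = "; ".join(kw.get("name", "") for kw in value)
        elif key == "comments":
            row[key] = " | ".join(
                f"{c.get('type', '')}: {c.get('text', '')}" for c in value
            )
        elif key == "features":
            row[key] = "; ".join(
                f"{ft.get('type', '')} {ft.get('start', '')}-{ft.get('end', '')}"
                for ft in value
            )
        elif isinstance(value, list):
            # Plain string arrays such as geneNames or organismLineage.
            row[key] = "; ".join(str(v) for v in value)
        else:
            row[key] = value
    return row


def to_csv(records: Iterable[Dict[str, Any]]) -> str:
    """Render flattened records as CSV text, using the first row's keys as header."""
    rows = [flatten_record(r) for r in records]
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping only the keyword names drops the IDs and categories; a pipeline that joins on `KW-` identifiers would keep those instead.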

***

### ✨ Why choose this Actor

| | Capability |
|---|---|
| 🧬 | **Authoritative knowledgebase.** Pulls directly from the official UniProt REST API. |
| 🔍 | **Full query syntax.** Every UniProt search field works: organism, gene, keyword, location, length range, evidence, taxonomy. |
| 🆔 | **Accession fast-path.** Set the `accession` input to pull one entry without writing a query. |
| 📏 | **Sequence facts built in.** Length and molecular weight always returned. Full sequence string available on demand. |
| 🏷️ | **Curated annotations exposed.** Keywords, comments, and features come through as structured arrays. |
| 🚫 | **No API key.** UniProt is a free public service. |
| 🔁 | **Always fresh.** Reflects the current UniProt release. |

> 📊 UniProt entries are referenced in nearly every modern paper on protein biology, drug discovery, and structural biology.

***

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Format | Setup |
|---|---|---|---|---|---|
| **⭐ UniProt Scraper** *(this Actor)* | $5 free credit, then pay-per-use | UniProtKB (Swiss-Prot + TrEMBL) | **Live per run** | Flat JSON / CSV | ⚡ 2 min |
| Direct REST API calls | Free | Same | Live | Nested JSON | 🐢 Hours |
| Full release FASTA + XML download | Free | Full UniProt | 8-week release | Massive flatfiles | 🐢 Days |
| Commercial bioinformatics platform | $$$ | Curated subset | Real-time | Web UI / API | ⏳ Vendor onboarding |

Pick this Actor when you want UniProt records in a flat table without writing a client or downloading the release.

***

### 🚀 How to use

1. 📝 **Sign up.** [Create a free account with $5 credit](https://console.apify.com/sign-up?fpr=vmoqkp) (takes 2 minutes).
2. 🌐 **Open the Actor.** Go to the UniProt Protein Scraper page on the Apify Store.
3. 🎯 **Set input.** Pick a query (`reviewed:true AND organism_id:9606` is a great starter) or an accession.
4. 🚀 **Run it.** Click **Start** and let the Actor walk the UniProt API.
5. 📥 **Download.** Grab results in the **Dataset** tab as CSV, Excel, JSON, or XML.

> ⏱️ Total time from signup to a downloaded proteome slice: **3-5 minutes.** No coding required.

***

### 💼 Business use cases

<table>
<tr>
<td width="50%" valign="top">

#### 🧪 Drug Discovery & Pharma

- Target dossier builds for new programs
- Cross-organism homolog comparisons
- Subcellular location filters for druggability
- Evidence-level scoring for prioritization

</td>
<td width="50%" valign="top">

#### 🧬 Bioinformatics & Genomics

- Gene-to-protein lookups across organisms
- Proteome exports for comparative analysis
- Annotation enrichment for variant calling
- Keyword and feature-based cohort building

</td>
</tr>
<tr>
<td width="50%" valign="top">

#### 🔬 Structural Biology

- Length and molecular-weight filters for crystallography candidates
- Feature-table mining for domain boundaries
- Sequence hash joins to PDB or AlphaFold IDs
- Reference-count signals for popular targets

</td>
<td width="50%" valign="top">

#### 🤖 LLM & Bio AI

- Ground LLM responses in UniProt-authoritative data
- Build RAG indexes for protein chatbots
- Training data for sequence-attribute models
- Validation layers for bio AI agents

</td>
</tr>
</table>

***

### 🔌 Automating UniProt Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

- 🟢 **Node.js.** Install the `apify-client` NPM package.
- 🐍 **Python.** Use the `apify-client` PyPI package.
- 📚 See the [Apify API documentation](https://docs.apify.com/api/v2) for full details.

The [Apify Schedules feature](https://docs.apify.com/platform/schedules) lets you trigger this Actor on any cron interval. UniProt has an eight-week release cycle. Schedule a refresh on the same cadence to stay current.
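
A fixed eight-week interval is awkward to express in a standard five-field cron expression, so one possible workaround is to trigger more often (for example weekly) and let a small guard in a wrapper script of your own decide whether a refresh is due. The sketch below assumes a strict 56-day cadence, which actual UniProt release dates may not follow exactly; `is_refresh_due` and `upcoming_refreshes` are hypothetical helper names.

```python
from datetime import date, timedelta
from typing import List

# Assumption: releases land exactly every 8 weeks (56 days).
RELEASE_INTERVAL = timedelta(weeks=8)


def is_refresh_due(last_refresh: date, today: date) -> bool:
    """True once at least one full release interval has passed."""
    return today - last_refresh >= RELEASE_INTERVAL


def upcoming_refreshes(first_run: date, count: int) -> List[date]:
    """List `count` refresh dates, starting with `first_run` itself."""
    return [first_run + RELEASE_INTERVAL * i for i in range(count)]


for refresh in upcoming_refreshes(date(2026, 1, 28), 3):
    print(refresh.isoformat())
```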

***

### 🌟 Beyond business use cases

UniProt data feeds far more than commercial pharma. The same structured records support research, education, and open-science work.

<table>
<tr>
<td width="50%">

#### 🎓 Research and academia

- Reproducible proteome datasets for papers
- Coursework on protein annotation and biocuration
- Comparative-genomics theses with structured features
- Open-data benchmarks for sequence-based ML

</td>
<td width="50%">

#### 🎨 Personal and creative

- Hobbyist bioinformatics portfolio projects
- Sci-comm visualizations of protein families
- Personal target tracker for citizen scientists
- Indie tools for amateur synthetic biology

</td>
</tr>
<tr>
<td width="50%">

#### 🤝 Non-profit and civic

- Pandemic preparedness datasets keyed to UniProt
- Public-health reports on pathogen proteomes
- Open-source vaccine candidate research
- Civic transparency on bio-research outputs

</td>
<td width="50%">

#### 🧪 Experimentation

- Train sequence-attribute ML classifiers
- Prototype agents that build target dossiers
- Test bio chatbot grounding against real records
- Benchmark protein-NER models

</td>
</tr>
</table>

***

### 🤖 Ask an AI assistant about this scraper

Open a ready-to-send prompt about this ParseForge Actor in the AI of your choice:

- 💬 [**ChatGPT**](https://chat.openai.com/?q=How%20do%20I%20use%20the%20UniProt%20Protein%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20bioinformatics%20workflow.)
- 🧠 [**Claude**](https://claude.ai/new?q=How%20do%20I%20use%20the%20UniProt%20Protein%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20bioinformatics%20workflow.)
- 🔍 [**Perplexity**](https://perplexity.ai/search?q=How%20do%20I%20use%20the%20UniProt%20Protein%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20bioinformatics%20workflow.)
- 🅒 [**Copilot**](https://copilot.microsoft.com/?q=How%20do%20I%20use%20the%20UniProt%20Protein%20Scraper%20by%20ParseForge%20on%20Apify%3F%20Show%20me%20input%20examples%2C%20output%20fields%2C%20common%20use%20cases%2C%20and%20how%20to%20integrate%20it%20into%20a%20bioinformatics%20workflow.)

***

### ❓ Frequently Asked Questions

#### 🧩 How does it work?

Either supply a UniProt query (`reviewed:true AND organism_id:9606`) or an accession (`P00533`), then click Start. The Actor pages through the UniProt REST API, flattens nested fields, and emits a row per entry with 25 columns including keywords, comments, and features.

#### 🔍 What query syntax can I use?

Everything UniProt supports in its own search bar. Common fields: `reviewed:`, `organism_id:`, `taxonomy_id:`, `gene:`, `keyword:`, `cc_subcellular_location:`, `existence:`, `length:[X TO Y]`, `accession:`, plus boolean `AND`/`OR`/`NOT`. See the [UniProt query fields docs](https://www.uniprot.org/help/query-fields) for the full list.
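
Queries with several clauses are easier to keep correct when they are assembled from parts. A minimal sketch using only the fields and operators named above; the helper names (`field`, `length_range`, `all_of`, `any_of`) are hypothetical, and the parentheses around `OR` groups assume UniProt's usual grouping syntax.

```python
def field(name: str, value: object) -> str:
    """Format a single `name:value` clause."""
    return f"{name}:{value}"


def length_range(low: int, high: int) -> str:
    """Format a sequence-length range clause such as length:[100 TO 500]."""
    if low < 1 or high < low:
        raise ValueError("expected 1 <= low <= high")
    return f"length:[{low} TO {high}]"


def any_of(*clauses: str) -> str:
    """Join clauses with OR."""
    return " OR ".join(clauses)


def all_of(*clauses: str) -> str:
    """Join clauses with AND, parenthesizing any clause that contains OR."""
    return " AND ".join(f"({c})" if " OR " in c else c for c in clauses)


query = all_of(field("taxonomy_id", 10090), length_range(100, 500))
print(query)
```

The resulting string goes into the `query` input unchanged.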

#### 🆔 How do I look up a single accession?

Set the `accession` field (e.g. `P00533`). It bypasses the query and pulls the full entry directly.

#### 🧬 How do I look up many accessions at once?

Use the query syntax with `OR`: `accession:P00533 OR accession:P04637 OR accession:Q9Y6K8`.
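
For longer lists, the `OR` query can be generated from the accessions themselves. A minimal sketch: `accession_query` is a hypothetical helper, and the validation pattern is the accession format described in UniProt's help pages as recalled here, so verify it against the current documentation before relying on it.

```python
import re
from typing import Iterable, List

# Assumption: UniProt accession format, reproduced from UniProt's help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def accession_query(accessions: Iterable[str]) -> str:
    """Build `accession:A OR accession:B ...`, dropping duplicates in order."""
    unique: List[str] = []
    for raw in accessions:
        accession = raw.strip().upper()
        if not ACCESSION_RE.fullmatch(accession):
            raise ValueError(f"not a valid UniProt accession: {raw!r}")
        if accession not in unique:
            unique.append(accession)
    if not unique:
        raise ValueError("at least one accession is required")
    return " OR ".join(f"accession:{a}" for a in unique)


print(accession_query(["P00533", "P04637", "Q9Y6K8"]))
```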

#### 📏 Does it include the full sequence string?

Only when `fetchSequence: true`. Sequence length and molecular weight are always returned. Skip the full string for big proteomes to keep dataset sizes manageable.

#### 🔁 How fresh is the data?

UniProt releases every eight weeks. Every run hits the live API, so output reflects the current release.

#### 📚 What is the difference between Swiss-Prot and TrEMBL?

Swiss-Prot is manually curated (`reviewed:true`, ~570K entries). TrEMBL is automatically annotated (`reviewed:false`, hundreds of millions of entries). Pick the slice your work needs.
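
When a query mixes both slices, the `entryType` field makes it possible to separate them after the run. A minimal sketch, assuming reviewed entries carry the `"UniProtKB reviewed (Swiss-Prot)"` label shown in the sample record and that every other label is unreviewed; `split_by_review_status` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_by_review_status(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (reviewed, unreviewed) lists based on the entryType label."""
    reviewed: List[Record] = []
    unreviewed: List[Record] = []
    for record in records:
        # Assumption: only Swiss-Prot labels contain the text "Swiss-Prot".
        if "Swiss-Prot" in record.get("entryType", ""):
            reviewed.append(record)
        else:
            unreviewed.append(record)
    return reviewed, unreviewed
```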

#### 🚫 Do I need an API key?

No. The UniProt REST API is free and public.

#### ⏰ Can I schedule recurring runs?

Yes. Use Apify Schedules to refresh on the UniProt release cadence and pipe results into your pipeline.

#### ⚖️ Is this data legal to use?

Yes. UniProt is released under CC BY 4.0. Attribute UniProt in any downstream publication or product, as their license requires.

#### 💳 Do I need a paid Apify plan?

No. The free plan covers small runs (10 records). A paid plan unlocks higher limits and scheduling.

#### 🆘 What if I need help?

Reach out via the contact form below to request a custom protein workflow.

***

### 🔌 Integrate with any app

UniProt Protein Scraper connects to any cloud service via [Apify integrations](https://apify.com/integrations):

- [**Make**](https://docs.apify.com/platform/integrations/make) - Automate multi-step research workflows
- [**Zapier**](https://docs.apify.com/platform/integrations/zapier) - Connect with 5,000+ apps
- [**Slack**](https://docs.apify.com/platform/integrations/slack) - Get release notifications in your channels
- [**Airbyte**](https://docs.apify.com/platform/integrations/airbyte) - Pipe protein records into your warehouse
- [**GitHub**](https://docs.apify.com/platform/integrations/github) - Trigger runs from commits and releases
- [**Google Drive**](https://docs.apify.com/platform/integrations/drive) - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh UniProt entries into your bio pipeline or alert your team in Slack.
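
On the receiving side, a webhook handler mostly needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload template, where `eventType` names the event and `resource` holds the run object with its `defaultDatasetId`; a customized payload template would need different keys. `dataset_id_from_webhook` is a hypothetical helper.

```python
from typing import Any, Dict, Optional


def dataset_id_from_webhook(payload: Dict[str, Any]) -> Optional[str]:
    """Return the run's dataset ID for successful runs, otherwise None."""
    # Assumption: default payload template with `eventType` and `resource`.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")
```

The returned ID can then be passed to `client.dataset(...)` in the API client to fetch the new entries.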

***

### 🔗 Recommended Actors

- [**💊 RxNorm Drug Concepts Scraper**](https://apify.com/parseforge/rxnorm-drug-concepts-scraper) - Standardized US drug vocabulary
- [**🏥 ICD-10-CM, LOINC & Clinical Terminology Scraper**](https://apify.com/parseforge/icd10-loinc-clinical-scraper) - Diagnosis, lab, and drug codes
- [**🤗 Hugging Face Model Scraper**](https://apify.com/parseforge/hugging-face-model-scraper) - AI model registry metadata
- [**🛡️ urlscan.io Threat Intelligence Scraper**](https://apify.com/parseforge/urlscan-scraper) - Live web scan data
- [**🌐 RDAP Domain Lookup Scraper**](https://apify.com/parseforge/rdap-domain-lookup-scraper) - Modern WHOIS replacement

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more reference-data scrapers.

***

**🆘 Need Help?** [**Open our contact form**](https://tally.so/r/BzdKgA) to request a new scraper, propose a custom data project, or report an issue.

***

> **⚠️ Disclaimer:** this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by EMBL-EBI, the SIB Swiss Institute of Bioinformatics, the Protein Information Resource (PIR), the UniProt Consortium, or any of their funding agencies. All trademarks mentioned are the property of their respective owners. Only publicly available UniProtKB data is collected. Please cite UniProt as required by their CC BY 4.0 license.

# Actor input Schema

## `query` (type: `string`):

UniProt query syntax. Examples:

- `reviewed:true AND organism_id:9606`: human Swiss-Prot proteins
- `keyword:KW-0181`: collagen
- `gene:BRCA1`: by gene name
- `cc_subcellular_location:nucleus`: by subcellular location
- `existence:1`: evidence at protein level
- `taxonomy_id:10090 AND length:[100 TO 500]`: mouse proteins of 100 to 500 aa

See https://www.uniprot.org/help/query-fields for the full field list. Ignored when an accession is supplied.

## `accession` (type: `string`):

Fetch one specific entry by UniProt accession (e.g. P00533). Bypasses the search query when set.

## `maxItems` (type: `integer`):

Free users: limited to 10 items (preview). Paid users: optional, up to 1,000,000.

## `fetchSequence` (type: `boolean`):

Embed the full amino-acid sequence string in every record. Off by default — the sequence length and molecular weight are always returned regardless.

## `pageSize` (type: `integer`):

Entries per API request (UniProt max 500).

## Actor input object example

```json
{
  "query": "reviewed:true AND organism_id:9606",
  "maxItems": 10,
  "fetchSequence": false,
  "pageSize": 500
}
```

# Actor output Schema

## `overview` (type: `string`):

Overview of scraped data

## `fullData` (type: `string`):

Complete dataset

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "reviewed:true AND organism_id:9606",
    "maxItems": 10,
    "fetchSequence": false,
    "pageSize": 500
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/uniprot-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "reviewed:true AND organism_id:9606",
    "maxItems": 10,
    "fetchSequence": False,
    "pageSize": 500,
}

# Run the Actor and wait for it to finish
run = client.actor("parseforge/uniprot-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
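
For environments where installing the client library is not an option, the same run can go through the `run-sync-get-dataset-items` endpoint listed in the OpenAPI specification. A minimal standard-library sketch: `build_request` is a hypothetical helper, the token is passed as the `token` query parameter as the specification describes, and the network call itself is left commented out so the snippet has no side effects.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"
ENDPOINT = "/acts/parseforge~uniprot-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST request that runs the Actor and returns its dataset items."""
    query_string = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ENDPOINT}?{query_string}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "<YOUR_API_TOKEN>",
    {"query": "reviewed:true AND organism_id:9606", "maxItems": 10},
)

# Uncomment to send the request; the call blocks until the run finishes.
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
#     print(len(items), "entries")
```

The synchronous endpoint suits small pulls; for large proteome exports, the asynchronous `/runs` endpoint or the client library's `call()` is likely the safer choice.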

## CLI example

```bash
echo '{
  "query": "reviewed:true AND organism_id:9606",
  "maxItems": 10,
  "fetchSequence": false,
  "pageSize": 500
}' |
apify call parseforge/uniprot-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/uniprot-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "UniProt Protein Sequence & Annotation Scraper",
        "description": "Export UniProt Knowledgebase entries — search Swiss-Prot by organism, keyword, gene, or any UniProt query, or fetch a single accession. Returns names, genes, organism, sequence length & molecular weight, keywords, comments, features, and PDB/RefSeq/Ensembl/KEGG cross-refs.",
        "version": "0.0",
        "x-build-id": "3NHeBjA8KXZow17hA"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~uniprot-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-uniprot-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~uniprot-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-uniprot-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~uniprot-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-uniprot-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "UniProt Query",
                        "type": "string",
                        "description": "UniProt query syntax. Examples:\n  - reviewed:true AND organism_id:9606 — human Swiss-Prot proteins\n  - keyword:KW-0181 — collagen\n  - gene:BRCA1 — by gene name\n  - cc_subcellular_location:nucleus — by subcellular location\n  - existence:1 — evidence at protein level\n  - taxonomy_id:10090 AND length:[100 TO 500] — mouse 100–500aa proteins\nSee https://www.uniprot.org/help/query-fields for the full field list. Ignored when an accession is supplied."
                    },
                    "accession": {
                        "title": "Accession (single entry)",
                        "type": "string",
                        "description": "Fetch one specific entry by UniProt accession (e.g. P00533). Bypasses the search query when set."
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Maximum number of entries to return. Free users are limited to 10 items (preview). For paid users the field is optional, with a maximum of 1,000,000."
                    },
                    "fetchSequence": {
                        "title": "Include Sequence String",
                        "type": "boolean",
                        "description": "Embed the full amino-acid sequence string in every record. Off by default — the sequence length and molecular weight are always returned regardless.",
                        "default": false
                    },
                    "pageSize": {
                        "title": "Page Size",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Entries per API request (UniProt max 500).",
                        "default": 500
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
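
The `inputSchema` above constrains `maxItems` to 1 through 1,000,000 and `pageSize` to 1 through 500, and states that `query` is ignored when `accession` is set. A minimal sketch of a helper that assembles an input object under those rules is shown below. The function name `build_input` and its snake_case parameters are hypothetical and not part of the Actor; only the resulting keys come from the schema.

```python
from typing import Any, Dict, Optional


def build_input(
    query: Optional[str] = None,
    accession: Optional[str] = None,
    max_items: Optional[int] = None,
    fetch_sequence: bool = False,
    page_size: int = 500,
) -> Dict[str, Any]:
    """Hypothetical helper: build a dict shaped like the Actor's inputSchema."""
    # Range limits copied from the schema's minimum/maximum values.
    if max_items is not None and not 1 <= max_items <= 1_000_000:
        raise ValueError("maxItems must be between 1 and 1,000,000")
    if not 1 <= page_size <= 500:
        raise ValueError("pageSize must be between 1 and 500")

    run_input: Dict[str, Any] = {
        "fetchSequence": fetch_sequence,
        "pageSize": page_size,
    }
    if accession:
        # The schema says the query is ignored when an accession is supplied,
        # so this sketch leaves it out entirely (a design choice, not a rule).
        run_input["accession"] = accession
    elif query:
        run_input["query"] = query
    if max_items is not None:
        run_input["maxItems"] = max_items
    return run_input


if __name__ == "__main__":
    # Single-entry lookup using the accession example from the schema.
    print(build_input(accession="P00533"))
    # Search using one of the query examples from the schema.
    print(build_input(query="gene:BRCA1", max_items=10, page_size=100))
```

The returned dict can be sent as the JSON request body of the endpoints described above, or passed as `run_input` when calling the Actor through the Python client. Validating locally is optional, since the platform presumably checks input against the same schema, but it surfaces range errors before a run is started.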
