# NCBI dbSNP Variants Scraper (`parseforge/ncbi-dbsnp-variants-scraper`) Actor

Discover medical and biomedical records from NCBI dbSNP Variants with names, identifiers, classifications, descriptions, status and source links. Ideal for healthcare research, pharma teams and clinical analytics. Run on demand or on a recurring schedule and feed every row into your favourite ana.

- **URL**: https://apify.com/parseforge/ncbi-dbsnp-variants-scraper.md
- **Developed by:** [ParseForge](https://apify.com/parseforge) (community)
- **Categories:** Other, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![ParseForge Banner](https://github.com/ParseForge/apify-assets/blob/ad35ccc13ddd068b9d6cba33f323962e39aed5b2/banner.jpg?raw=true)

## 🧬 NCBI dbSNP Variants Scraper

> 🚀 **Pull human SNP variants from NCBI dbSNP in seconds.** rsIDs, chromosome and position, alleles, functional class, gene context, clinical significance and global minor allele frequencies from the official NIH database.

> 🕒 **Last updated:** 2026-05-27 · **📊 22 fields** per record · **1B+ rsIDs** · **Global population frequencies**

NCBI dbSNP is the world's authoritative public catalogue of single-nucleotide variants. This scraper wraps the official E-utilities `esearch` + `esummary` flow and returns a clean, structured table for any gene, condition or rsID query.

Every record carries the rsID, SPDI string, chromosome position, allele, functional class (intron/upstream/exon), gene symbols and Entrez IDs, validation status, clinical significance and global MAFs from 1000Genomes, gnomAD, TOPMED, TOMMO, ALFA and more.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Geneticists and bioinformaticians | Pull variant tables for a gene of interest |
| Clinical researchers | Build pathogenic-variant lists for a condition |
| Pharma and biotech | Annotate genotyping panels |
| Academic teams | Run reproducible analyses without flat-file pulls |
| Data engineers | Pipe dbSNP into your variant warehouse |

### 📋 What the NCBI dbSNP Variants Scraper does

- Calls the official E-utilities `esearch` to resolve a gene/condition/rsID query
- Calls `esummary` to fetch full variant metadata
- Returns rsID, SPDI, chromosome, position, alleles, gene context, clinical significance, global MAFs
- Stream-delivers to multiple table outputs
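
The two-step `esearch` + `esummary` flow listed above can be sketched as a pair of URL builders. This is a minimal illustration, not the Actor's own code: the function names `esearch_url` and `esummary_url` are hypothetical, and the parameters shown (`db`, `term`, `retmax`, `id`, `retmode`) are the standard public E-utilities ones.

```python
from urllib.parse import urlencode

# Public base URL of NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that resolves a query to dbSNP record IDs."""
    params = {"db": "snp", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def esummary_url(ids: list[str]) -> str:
    """Build an esummary URL that fetches metadata for the given record IDs."""
    params = {"db": "snp", "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Step 1: resolve the query to IDs. Step 2: summarise those IDs.
    print(esearch_url("BRCA1", retmax=25))
    print(esummary_url(["328", "429358"]))
```

The IDs returned by the first call feed the second one, which is why a whole result page can be summarised in one request.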

> 💡 **Why it matters:** every clinical and pharmacogenomic analysis starts with annotating variants. dbSNP is the canonical source, and this Actor makes it queryable from a spreadsheet workflow.

### 🎬 Full Demo (_🚧 Coming soon_)

### ⚙️ Input

<table>
<tr><th>Field</th><th>Type</th><th>Description</th></tr>
<tr><td>query</td><td>string</td><td>Gene symbol, rsID or condition keyword</td></tr>
<tr><td>maxItems</td><td>integer</td><td>Cap on records returned (free plan: 10)</td></tr>
</table>

```json
{ "query": "BRCA1", "maxItems": 25 }
````

```json
{ "query": "rs328", "maxItems": 1 }
```

> ⚠️ **Good to Know:** NCBI E-utilities is rate-limited to 3 requests/sec without an API key. The Actor batches IDs into a single `esummary` call to stay well under the limit.
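
If you call E-utilities yourself alongside the Actor, the same two ideas (batching IDs and spacing requests) might look like the sketch below. The helper names `chunked` and `Throttle` are hypothetical, and nothing here is taken from the Actor's source; it only illustrates one way to stay under a requests-per-second limit.

```python
import time
from typing import Callable, Iterator, Optional


def chunked(ids: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


class Throttle:
    """Space calls so that no more than `max_per_sec` start per second."""

    def __init__(
        self,
        max_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / max_per_sec
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Block until the minimum interval since the previous call has passed."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create `Throttle(3)` once, then call `wait()` before each request while looping over `chunked(ids, size)`. The batch size is left to the caller because the source does not state one.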

### 📊 Output

<table>
<tr><th>Field</th><th>Description</th></tr>
<tr><td>🆔 rsId</td><td>dbSNP rs identifier</td></tr>
<tr><td>🏷 snpClass</td><td>SNV / insertion / deletion / etc.</td></tr>
<tr><td>🧬 chromosome / position / accession / spdi</td><td>Genomic location</td></tr>
<tr><td>🔡 alleles</td><td>Allele code</td></tr>
<tr><td>📋 functionalClass</td><td>Intron / upstream / exon / etc.</td></tr>
<tr><td>🧪 geneSymbols / geneIds</td><td>Gene context</td></tr>
<tr><td>⚕️ clinicalSignificance</td><td>Benign / pathogenic / etc.</td></tr>
<tr><td>✅ validated</td><td>Validation status</td></tr>
<tr><td>📊 globalMaf / globalMafs</td><td>Global minor allele frequencies</td></tr>
<tr><td>🏷 handle / taxonomyId</td><td>Submitter and species</td></tr>
<tr><td>📅 createDate / updateDate / origBuild / updBuild</td><td>Provenance</td></tr>
<tr><td>📝 hgvs</td><td>HGVS notation</td></tr>
<tr><td>🔗 sourceUrl</td><td>dbSNP page</td></tr>
<tr><td>🕒 scrapedAt</td><td>ISO timestamp</td></tr>
</table>
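
As an illustration of working with these fields downstream, the sketch below filters records by `clinicalSignificance`. The exact value format is not documented here, so the code assumes the field is either a comma-separated string or a list of strings; the sample records are made up.

```python
from typing import Any


def significance_terms(item: dict[str, Any]) -> set[str]:
    """Return the record's clinical-significance terms, lower-cased.

    Assumption: the field is a comma-separated string or a list of strings.
    """
    value = item.get("clinicalSignificance") or []
    parts = value if isinstance(value, list) else str(value).split(",")
    return {str(part).strip().lower() for part in parts if str(part).strip()}


def with_significance(items: list[dict[str, Any]], term: str) -> list[dict[str, Any]]:
    """Keep only the records that carry exactly the given term."""
    wanted = term.strip().lower()
    return [item for item in items if wanted in significance_terms(item)]


if __name__ == "__main__":
    # Hypothetical records, shaped after the output table above.
    sample = [
        {"rsId": "rs1", "clinicalSignificance": "pathogenic,likely-pathogenic"},
        {"rsId": "rs2", "clinicalSignificance": "benign"},
        {"rsId": "rs3"},
    ]
    for record in with_significance(sample, "pathogenic"):
        print(record["rsId"])
```

Matching whole terms rather than substrings avoids treating a value such as "conflicting interpretations of pathogenicity" as pathogenic.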

### ✨ Why choose this Actor

- 🆓 Public NIH/NCBI data, no auth required
- 📡 Direct hit on the official E-utilities API
- 🧬 Returns global MAFs from 25+ populations
- 🧰 Clean field names, no feed parsing
- 📦 Pull as multiple table outputs

### 📈 How it compares to alternatives

| Approach | Cost | Coverage | Setup time |
|---|---|---|---|
| Manual VCF pulls from NCBI FTP | Free | Bulk only | Hours |
| Direct E-utilities calls | Free | Full | Code required |
| ParseForge dbSNP Scraper | Pay per usage | Full + structured | Minutes |

### 🚀 How to use

1. [Create a free Apify account](https://console.apify.com/sign-up?fpr=vmoqkp) (includes $5 credit).
2. Open the NCBI dbSNP Variants Scraper.
3. Set `query` (gene symbol, rsID or condition).
4. Click **Start** and use multiple table outputs.
5. Schedule or trigger from your bioinformatics pipeline.

### 💼 Business use cases

**Pharmacogenomics:** annotate a drug-response panel with current dbSNP records.

**Clinical decision support:** pull pathogenic variants for a condition.

**Genotyping QC:** verify variant annotations match the live dbSNP record.

**Variant database curation:** keep your internal warehouse in sync with NCBI updates.

### 🔌 Automating NCBI dbSNP Variants Scraper

Hook into Make, Zapier, n8n, Airbyte, Pipedream, Slack, GitHub Actions or any HTTP webhook.

### 🌟 Beyond business use cases

- **Research:** explore the global frequency of a candidate variant.
- **Personal:** annotate your own genotyping report from 23andMe / Ancestry.
- **Non-profit:** support rare-disease variant research.
- **Experimentation:** train ML models on annotated variant tables.

### 🤖 Ask an AI assistant about this scraper

Ask ChatGPT, Claude, Perplexity or Copilot: "How do I pull every pathogenic BRCA1 variant from NCBI dbSNP using the ParseForge Apify Actor?"

### ❓ Frequently Asked Questions

**Do I need an NCBI API key?**
No, but providing one increases your rate limit to 10 req/sec. The Actor uses unauthenticated mode by default.

**Is dbSNP human only?**
Currently human-focused. Other species are available via the same E-utilities pattern but with different `taxonomyId`.

**Can I query by rsID directly?**
Yes. Set `query` to `rs328` or `328`.
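
If your pipeline receives identifiers in both forms, you may want to normalise them before building the input. The helper below is a hypothetical client-side sketch; how the Actor itself recognises rsIDs is not documented here.

```python
import re
from typing import Optional

# Optional "rs" prefix followed by digits, e.g. "rs328" or "328".
_RSID_PATTERN = re.compile(r"^(?:rs)?(\d+)$", re.IGNORECASE)


def normalize_rsid(query: str) -> Optional[str]:
    """Return the query as a canonical rsID ("rs" + digits), or None if it is not one."""
    match = _RSID_PATTERN.match(query.strip())
    if match is None:
        return None
    return f"rs{match.group(1)}"


if __name__ == "__main__":
    for raw in ["rs328", "328", " RS328 ", "BRCA1"]:
        print(repr(raw), "->", normalize_rsid(raw))
```

Gene symbols and condition keywords return `None`, so they can be passed through to `query` unchanged.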

**Are clinical-significance annotations from ClinVar?**
dbSNP propagates ClinVar annotations into the summary. Always verify in ClinVar for clinical use.

**What's SPDI?**
The Sequence-Position-Deletion-Insertion notation: NCBI's modern standard for variant representation.
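
An SPDI string has four colon-separated parts in that order, and its position is a 0-based coordinate. A minimal parser might look like the sketch below; the `Spdi` class and `parse_spdi` function are hypothetical names, the example string is made up, and the code assumes the `spdi` field holds a single expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spdi:
    sequence: str   # reference sequence accession
    position: int   # 0-based position on that sequence
    deletion: str   # deleted sequence (may be empty)
    insertion: str  # inserted sequence (may be empty)

    @property
    def one_based_position(self) -> int:
        """Position converted to the 1-based convention used by VCF and HGVS."""
        return self.position + 1


def parse_spdi(text: str) -> Spdi:
    """Split a single SPDI expression into its four components."""
    parts = text.strip().split(":")
    if len(parts) != 4:
        raise ValueError(f"expected 4 colon-separated parts, got {len(parts)}")
    sequence, position, deletion, insertion = parts
    if not sequence or not position.isdigit():
        raise ValueError(f"invalid sequence or position in {text!r}")
    return Spdi(sequence, int(position), deletion, insertion)


if __name__ == "__main__":
    variant = parse_spdi("NC_000001.11:99:G:A")  # illustrative value only
    print(variant, variant.one_based_position)
```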

**How fresh is dbSNP?**
dbSNP releases new builds periodically. The Actor returns whatever the live API serves.

**Can I get VCF output?**
This Actor produces a tabular summary. Combine it with NCBI's VCF files if you need raw genotype data.

**Is the actor rate-limited?**
The Actor stays under 3 req/sec to comply with E-utilities limits.

**Are alternate assemblies supported?**
The Actor returns the canonical GRCh38 position. Other assemblies require manual liftover.

**Can I batch thousands of rsIDs?**
Yes. Set `maxItems` accordingly. The Actor batches IDs into a single `esummary` call.

### 🔌 Integrate with any app

Apify, Make, Zapier, n8n, Pipedream, Slack, Airbyte, GitHub, Google Drive, Power Automate, AWS Lambda, REST webhook.

### 🔗 Recommended Actors

| Actor | What it does |
|---|---|
| [OpenAlex Institutions Scraper](https://apify.com/parseforge/openalex-institutions-scraper) | Global research institutions |
| [EU Clinical Trials Register Scraper](https://apify.com/parseforge/eu-clinical-trials-register-scraper) | Clinical trial records |
| [NHTSA Vehicle Complaints Scraper](https://apify.com/parseforge/nhtsa-vehicle-complaints-scraper) | US vehicle complaint data |

> 💡 **Pro Tip:** browse the complete [ParseForge collection](https://apify.com/parseforge) for more government and research data scrapers.

**🆘 Need Help?** [Open our contact form](https://tally.so/r/BzdKgA)

> **⚠️ Disclaimer:** independent tool, not affiliated with NCBI or NIH. Only publicly available open data is collected.

# Actor input Schema

## `query` (type: `string`):

Free-text query against dbSNP (gene symbol, rsID, condition). Example: BRCA1, APOE, sickle cell.

## `maxItems` (type: `integer`):

Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000

## Actor input object example

```json
{
  "query": "BRCA1",
  "maxItems": 10
}
```
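
Before starting a run, you could check an input object against the constraints stated in this schema. The sketch below is a hypothetical client-side check, not part of the Actor; rejecting an empty `query` is an extra assumption on top of the schema, which only requires a string.

```python
from typing import Any

# Upper bound for maxItems as stated in the input schema.
MAX_ITEMS_LIMIT = 1_000_000


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input (empty when it looks valid)."""
    errors: list[str] = []

    query = run_input.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query is required and must be a non-empty string")

    if "maxItems" in run_input:
        max_items = run_input["maxItems"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(max_items, bool) or not isinstance(max_items, int):
            errors.append("maxItems must be an integer")
        elif not 1 <= max_items <= MAX_ITEMS_LIMIT:
            errors.append(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")

    return errors


if __name__ == "__main__":
    print(validate_input({"query": "BRCA1", "maxItems": 10}))
    print(validate_input({"maxItems": 0}))
```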

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "BRCA1",
    "maxItems": 10
};

// Run the Actor and wait for it to finish
const run = await client.actor("parseforge/ncbi-dbsnp-variants-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "BRCA1", "maxItems": 10 }

# Run the Actor and wait for it to finish
run = client.actor("parseforge/ncbi-dbsnp-variants-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "BRCA1",
  "maxItems": 10
}' |
apify call parseforge/ncbi-dbsnp-variants-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=parseforge/ncbi-dbsnp-variants-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "NCBI dbSNP Variants Scraper",
        "description": "Discover medical and biomedical records from Ncbi Dbsnp Variants with names, identifiers, classifications, descriptions, status and source links. Ideal for healthcare research, pharma teams and clinical analytics. Run on demand or on a recurring schedule and feed every row into your favourite ana.",
        "version": "0.1",
        "x-build-id": "ODxkqvFpItKlMEFKm"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/parseforge~ncbi-dbsnp-variants-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-parseforge-ncbi-dbsnp-variants-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/parseforge~ncbi-dbsnp-variants-scraper/runs": {
            "post": {
                "operationId": "runs-sync-parseforge-ncbi-dbsnp-variants-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/parseforge~ncbi-dbsnp-variants-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-parseforge-ncbi-dbsnp-variants-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "query"
                ],
                "properties": {
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Free-text query against dbSNP (gene symbol, rsID, condition). Example: BRCA1, APOE, sickle cell.",
                        "default": "BRCA1"
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Free users: Limited to 10 items (preview). Paid users: Optional, max 1,000,000"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
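
For projects that call the REST API without a client library, the `run-sync-get-dataset-items` path from the specification above can be turned into a request with the Python standard library. This is a sketch under stated assumptions: `build_run_request` is a hypothetical helper, and the response is assumed to be the dataset items as JSON, as the endpoint summary suggests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Server URL and path taken from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/parseforge~ncbi-dbsnp-variants-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    url = f"{API_BASE}{RUN_SYNC_PATH}?{urlencode({'token': token})}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    request = build_run_request("<YOUR_API_TOKEN>", {"query": "BRCA1", "maxItems": 10})
    print(request.get_method(), request.full_url)
    # To actually start a run, pass `request` to urllib.request.urlopen
    # and decode the JSON response body.
```

Building the request separately from sending it keeps the example runnable without a token or network access.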
