Ensembl Genomics Scraper (Genes, Variants, Sequences)
Pricing
from $18.00 / 1,000 result items
Query the Ensembl genome reference for 200+ species. Look up genes by symbol or stable ID, list features in a genomic region, fetch DNA sequence, or resolve human variants (rsIDs). Returns biotype, coordinates, transcript IDs, descriptions, and assembly metadata.
Developer
ParseForge

🧬 Ensembl Genomics Scraper
🚀 Export genes, variants, and DNA sequences in seconds. Look up by gene symbol, stable ID, chromosomal region, or human rsID across 20+ species. Returns biotype, coordinates, transcript IDs, sequence, allele frequencies, and assembly metadata.
🕒 Last updated: 2026-05-23 · 📊 30 fields per record · 🧬 20+ species · 🔁 5 modes · 🧫 Ensembl genome reference
The Ensembl Genomics Scraper queries the public Ensembl genome reference, the de facto open browser for vertebrate, model-organism, and select non-vertebrate genomes. It returns up to 30 structured fields per record, including stable ID, display name, object type, biotype, species, chromosome, start, end, strand, assembly, description, canonical transcript, source, logic name, molecule type, sequence length and sequence, variant name, variant class, minor allele and frequency, ancestral allele, allele string, most-severe consequence, mappings, evidence, synonyms, mode, query, and the scrape timestamp.
The catalog spans 20+ reference species including human, mouse, rat, zebrafish, fruit fly, roundworm, baker's yeast, thale cress, chicken, pig, cow, dog, cat, horse, sheep, rhesus macaque, chimpanzee, western clawed frog, medaka, and mosquito. This Actor returns gene lookups, region overlaps, sequence fetches, and human variant resolutions in one run.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Bioinformaticians, pharma research, genetics labs, academic researchers, computational biology students, biotech startups, precision-medicine teams | Gene annotation pipelines, variant impact analysis, comparative genomics, target identification, rsID resolution for GWAS, sequence retrieval for primer design |
📋 What the Ensembl Genomics Scraper does
Five query workflows in a single Actor:
- 🧬 Lookup by gene symbol. Resolve `BRCA2`, `TP53`, `EGFR`, etc. to Ensembl stable IDs, coordinates, biotype, and canonical transcript.
- 🆔 Lookup by stable ID. Pass `ENSG00000139618` or `ENST00000380152` for any Ensembl-supported species.
- 🗺️ Overlap region. Return all gene features inside `chromosome:start-end` (e.g. `7:140424943-140624564`).
- 🧪 Sequence by ID. Fetch the raw DNA, cDNA, or protein sequence for any Ensembl stable ID.
- 🧬 Variation by rsID. Resolve human dbSNP rsIDs (e.g. `rs56116432`, `rs1042522`) to allele frequencies, consequences, and ancestral alleles.
Each record bundles the relevant Ensembl-native fields, the species, the mode used, the original query string, and a collection timestamp.
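As a rough illustration of what the five modes correspond to, the sketch below builds request URLs for the public Ensembl REST server. The endpoint paths are the ones Ensembl documents; the assumption that this Actor issues exactly these requests, and the helper name `build_url`, are illustrative only.

```python
from urllib.parse import quote, urlencode

ENSEMBL_REST = "https://rest.ensembl.org"  # public Ensembl REST server


def build_url(mode, query, species="homo_sapiens"):
    """Return an Ensembl REST URL for one query in the given mode.

    Hypothetical helper: the mode names come from this Actor's input,
    the mapping to endpoints is an assumption.
    """
    q = quote(query, safe=":-")  # keep ':' and '-' readable in region strings
    if mode == "lookupSymbol":
        path, params = f"/lookup/symbol/{species}/{q}", {}
    elif mode == "lookupId":
        path, params = f"/lookup/id/{q}", {}
    elif mode == "overlapRegion":
        path, params = f"/overlap/region/{species}/{q}", {"feature": "gene"}
    elif mode == "sequence":
        path, params = f"/sequence/id/{q}", {}
    elif mode == "variation":
        path, params = f"/variation/human/{q}", {}  # variation mode is human-only
    else:
        raise ValueError(f"unknown mode: {mode}")
    params["content-type"] = "application/json"  # ask the server for JSON
    return f"{ENSEMBL_REST}{path}?{urlencode(params)}"
```

For example, `build_url("overlapRegion", "7:140424943-140624564")` yields a URL under `/overlap/region/homo_sapiens/` restricted to gene features. The function only builds strings; it does not send any request.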
💡 Why it matters: the Ensembl genome browser is the most widely cited open genome reference in life sciences. Hand-coding a REST client means handling rate limits, schema-per-endpoint quirks, and pagination. This Actor delivers consistent records you can pipe straight into BI tools, notebooks, or pipelines.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "lookupSymbol" | One of lookupSymbol, lookupId, overlapRegion, sequence, variation. |
| species | string | "homo_sapiens" | Ensembl species slug. Used by symbol, region, sequence modes. |
| symbols | array | ["BRCA2","TP53","EGFR","MYC","KRAS"] | Gene symbols for lookupSymbol mode. |
| stableIds | array | [] | Ensembl stable IDs for lookupId or sequence. |
| region | string | "" | Genomic region chr:start-end for overlapRegion. |
| rsids | array | [] | dbSNP rsIDs for variation mode (human-only). |
Example: human cancer-gene panel.
{"maxItems": 5,"mode": "lookupSymbol","species": "homo_sapiens","symbols": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS"]}
Example: all genes overlapping the BRCA2 locus.
{"maxItems": 100,"mode": "overlapRegion","species": "homo_sapiens","region": "13:32315086-32400266"}
Example: resolve a panel of human rsIDs, including the TP53 missense variant rs1042522.
{"maxItems": 10,"mode": "variation","rsids": ["rs1042522", "rs56116432", "rs17878362"]}
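Before starting a run, it can help to check that the input carries the payload field its mode needs. The sketch below is one possible pre-flight check based on the input table above; `check_input` and `REQUIRED_FIELD` are hypothetical names, and the check is stricter than the Actor, which falls back to the listed defaults when a field is omitted.

```python
# Which payload field each mode reads, per the input table.
REQUIRED_FIELD = {
    "lookupSymbol": "symbols",
    "lookupId": "stableIds",
    "overlapRegion": "region",
    "sequence": "stableIds",
    "variation": "rsids",
}


def check_input(run_input):
    """Return a list of problems found in an Actor input dict (empty if none)."""
    mode = run_input.get("mode", "lookupSymbol")
    field = REQUIRED_FIELD.get(mode)
    if field is None:
        return [f"unknown mode: {mode!r}"]
    problems = []
    if not run_input.get(field):  # missing, empty list, or empty string
        problems.append(f"mode {mode!r} needs a non-empty {field!r}")
    max_items = run_input.get("maxItems", 10)
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        problems.append("maxItems must be a positive integer")
    return problems
```

Each of the three example inputs above passes this check with an empty problem list.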
⚠️ Good to Know: the Ensembl species slug follows the `genus_species` convention (e.g. `homo_sapiens`, `mus_musculus`). The `variation` mode is human-only (dbSNP). For coordinate-based queries, regions must follow `chr:start-end` with assembly coordinates matching the current Ensembl release for that species.
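The two format rules above can be checked locally before a run. This is a minimal sketch: the regular expressions are assumptions about what a well-formed slug and region look like, not the Actor's own validation, and they cannot tell whether the coordinates exist on the current assembly.

```python
import re

# Lowercase words joined by underscores, e.g. homo_sapiens (assumed pattern).
SPECIES_RE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")
# Sequence name, colon, start, dash, end, e.g. 13:32315086-32400266.
REGION_RE = re.compile(r"^([A-Za-z0-9_.]+):(\d+)-(\d+)$")


def is_species_slug(slug):
    """True if the string looks like an Ensembl species slug."""
    return SPECIES_RE.match(slug) is not None


def parse_region(region):
    """Split 'chr:start-end' into (chromosome, start, end) or raise ValueError."""
    match = REGION_RE.match(region.strip())
    if not match:
        raise ValueError(f"region must look like chr:start-end, got {region!r}")
    chromosome = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError(f"need 1 <= start <= end, got {start}-{end}")
    return chromosome, start, end
```

`parse_region("7:140424943-140624564")` returns the chromosome name as a string and the two bounds as integers, which also makes it easy to check the interval length before submitting a very large region.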
📊 Output
Each record contains up to 30 fields depending on the mode. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 stableId | string | "ENSG00000139618" |
| 🏷️ displayName | string | "BRCA2" |
| 🔧 objectType | string | "Gene" |
| 🧬 biotype | string | "protein_coding" |
| 🐾 species | string | "homo_sapiens" |
| 🧭 chromosome | string | "13" |
| ▶️ start | number | 32315086 |
| ⏹️ end | number | 32400266 |
| ↔️ strand | number | 1 |
| 📐 assemblyName | string | "GRCh38" |
| 📝 description | string \| null | "BRCA2 DNA repair associated" |
| 🧾 canonicalTranscript | string | "ENST00000380152.8" |
| 🏛️ source | string | "ensembl_havana" |
| 🧠 logicName | string | "ensembl_havana_gene_homo_sapiens" |
| 🧪 molecule | string | "dna" |
| 📏 sequenceLength | number | 84981 |
| 🧬 sequence | string | "ATG..." |
| 🆔 variantName | string | "rs1042522" |
| 🏷️ varClass | string | "SNP" |
| 🔡 minorAllele | string | "C" |
| 📊 minorAlleleFreq | number | 0.3401 |
| 🌳 ancestralAllele | string | "G" |
| 🧬 alleleString | string | "C/G" |
| ⚠️ mostSevereConsequence | string | "missense_variant" |
| 🗺️ mappings | array | [ ... ] |
| 🔬 evidence | array | ["Frequency","1000Genomes"] |
| 🔗 synonyms | array | ["NM_000546.6:c.215C>G"] |
| 🔧 mode | string | "lookupSymbol" |
| 🔎 query | string | "BRCA2" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T10:00:00.000Z" |
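A common next step for gene records is converting them to BED for genome browsers or interval tools. The sketch below uses the field names from the schema above and assumes the Actor passes Ensembl's 1-based, inclusive coordinates through unchanged; `gene_to_bed` is a hypothetical helper, not part of the Actor.

```python
def gene_to_bed(record):
    """Format one gene record as a six-column, tab-separated BED line."""
    strand = "+" if record["strand"] == 1 else "-"  # Ensembl uses 1 / -1
    fields = [
        record["chromosome"],
        str(record["start"] - 1),  # BED starts are 0-based
        str(record["end"]),        # BED ends are exclusive, so the value is unchanged
        record.get("displayName") or record["stableId"],
        "0",                       # score column, unused here
        strand,
    ]
    return "\t".join(fields)
```

Only `chromosome`, `start`, `end`, `strand`, and a name field are read, so the same function works on records from the symbol, ID, and region modes as long as those fields are present.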
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 🧬 | 20+ reference species. Human, mouse, rat, zebrafish, fly, worm, yeast, arabidopsis, and more. |
| 🔁 | Five modes in one Actor. Symbol lookup, ID lookup, region overlap, sequence fetch, and variant resolution. |
| 🆔 | dbSNP rsID resolution. Human variants returned with MAF, ancestral allele, consequence, evidence. |
| 🗺️ | Region-based queries. Pull all gene features inside any chromosomal interval. |
| 🧪 | Raw sequence retrieval. DNA, cDNA, or protein, by Ensembl stable ID. |
| 🚫 | No authentication. Works against the public Ensembl reference. No login or API key needed. |
| 🔁 | Always fresh. Each run pulls the live reference, reflecting the latest Ensembl release. |
📊 The Ensembl reference underpins thousands of life-science publications and GWAS pipelines worldwide.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Setup |
|---|---|---|---|---|
| ⭐ Ensembl Genomics Scraper (this Actor) | $5 free credit, then pay-per-use | 20+ species, 5 modes | Live per run | ⚡ 2 min |
| Hand-written Ensembl REST client | Free + engineering | Same | Build it yourself | 🛠️ Hours |
| Commercial bio-databases | $$$$ | Same + curation | Real-time | ⏳ Procurement |
| Hard-coded gene tables | Free | One snapshot | Manual | 🐢 Tech debt |
Pick this Actor when you want consistent Ensembl records without writing and maintaining a REST client.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the Ensembl Genomics Scraper page on the Apify Store.
- 🎯 Set input. Pick a mode, a species, and a query payload (symbols, stable IDs, region, or rsIDs). Set `maxItems`.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating Ensembl Genomics Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
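A minimal Python sketch of a programmatic run might look like the following. It relies on the `apify-client` package's `ActorClient.call` and `DatasetClient.iterate_items` methods as documented at the time of writing, so check them against the version you install. The Actor ID in the usage comment is a placeholder; copy the real one from the Actor's Store page.

```python
def make_client(token):
    """Create an Apify client; imported lazily so the sketch loads without the package."""
    from apify_client import ApifyClient  # pip install apify-client
    return ApifyClient(token)


def run_and_collect(client, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholder token and Actor ID, both assumptions):
# client = make_client("<YOUR_APIFY_TOKEN>")
# items = run_and_collect(
#     client,
#     "<username>/<actor-name>",
#     {"mode": "lookupSymbol", "species": "homo_sapiens", "symbols": ["BRCA2"], "maxItems": 5},
# )
```

Passing the client in as an argument keeps `run_and_collect` easy to test with a stub and lets one client be reused across several runs.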
The Apify Schedules feature lets you trigger this Actor on any cron interval. Hook a webhook to a Slack channel for alerting when a panel of variants flips consequence in a new Ensembl release.
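The alerting idea above comes down to comparing two runs of the same variant panel. One way to sketch that comparison, assuming each run's records carry the `variantName` and `mostSevereConsequence` fields from the schema (the function name is hypothetical):

```python
def consequence_changes(previous, current):
    """Return {variantName: (old, new)} for variants whose consequence differs.

    Both arguments are lists of variant records from two runs of the
    variation mode. Variants missing from the earlier run are ignored.
    """
    old = {r["variantName"]: r.get("mostSevereConsequence") for r in previous}
    changes = {}
    for record in current:
        name = record["variantName"]
        new = record.get("mostSevereConsequence")
        if name in old and old[name] != new:
            changes[name] = (old[name], new)
    return changes
```

An empty result means nothing changed between the two runs; a non-empty one can be formatted into the webhook or Slack message.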
🌟 Beyond business use cases
Open genome data powers more than commercial R&D. The same structured records support research, education, civic projects, and personal initiatives.
❓ Frequently Asked Questions
🧩 How does it work?
Pick a mode, a species, and a query payload. The Actor reads the public Ensembl reference and emits a clean structured record per gene, region, sequence, or variant.
📏 How accurate is the data?
Ensembl is the de facto open genome browser, curated by EMBL-EBI and the Wellcome Sanger Institute. The reference is updated several times per year. For clinical reporting always cross-check against the latest release notes.
🔁 How often is the dataset refreshed?
Ensembl publishes new releases several times per year. Every run of this Actor pulls the live reference, so results reflect the release that is current at run time.
🐾 Which species are supported?
20+ reference species including human, mouse, rat, zebrafish, fruit fly, roundworm, baker's yeast, thale cress, chicken, pig, cow, dog, cat, horse, sheep, rhesus macaque, chimpanzee, western clawed frog, medaka, and mosquito.
🧬 Which variant set is supported?
dbSNP rsIDs for human. Other species are supported for gene lookups, region overlaps, and sequence retrieval.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval. A weekly run is enough to track inter-release annotation drift.
⚖️ Is this data legal to use?
Ensembl is published as open data under standard academic licenses. Commercial use is permitted; check the source for any attribution preferences.
💼 Can I use this data commercially?
Yes. The Ensembl reference is openly licensed for commercial reuse with attribution to EMBL-EBI.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small queries (10 records per run). A paid plan lifts the limit and unlocks scheduling and higher concurrency.
🔧 What if a stable ID is from an older Ensembl release?
The Actor uses the current Ensembl reference. Deprecated stable IDs return an error record with a clear message; use the Ensembl ID history view to resolve to the current ID.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
Ensembl Genomics Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe gene and variant records into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to push fresh variant annotations into a Notion knowledge base or alert on gene-panel changes.
🔗 Recommended Actors
- 🧪 KEGG Pathways Scraper - Biochemical pathways and orthologies
- 📚 ArXiv Scraper - Pre-print research papers
- 🔬 Figshare Scraper - Open scientific datasets and supplementary files
- 🧬 ClinicalTrials.gov Scraper - U.S. clinical trial registry
- 📊 GBIF Biodiversity Scraper - Global biodiversity occurrence records
💡 Pro Tip: browse the complete ParseForge collection for more open-science scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Ensembl, EMBL-EBI, the Wellcome Sanger Institute, or NCBI/dbSNP. All trademarks mentioned are the property of their respective owners. Only publicly available open genome reference data is collected.