Ensembl Genomics Scraper (Genes, Variants, Sequences)

Pricing

from $18.00 / 1,000 result items

Query the Ensembl genome reference for 20+ species. Look up genes by symbol or stable ID, list features in a genomic region, fetch DNA sequence, or resolve human variants (rsIDs). Returns biotype, coordinates, transcript IDs, descriptions, and assembly metadata.

Developer: ParseForge · Maintained by Community

🧬 Ensembl Genomics Scraper

🚀 Export genes, variants, and DNA sequences in seconds. Look up by gene symbol, stable ID, chromosomal region, or human rsID across 20+ species. Returns biotype, coordinates, transcript IDs, sequence, allele frequencies, and assembly metadata.

🕒 Last updated: 2026-05-23 · 📊 30 fields per record · 🧬 20+ species · 🔁 5 modes · 🧫 Ensembl genome reference

The Ensembl Genomics Scraper queries the public Ensembl genome reference, the de facto open browser for vertebrate, model-organism, and select non-vertebrate genomes. It returns up to 30 structured fields per record, including stable ID, display name, object type, biotype, species, chromosome, start, end, strand, assembly, description, canonical transcript, source, logic name, molecule type, sequence length and sequence, variant name, variant class, minor allele and frequency, ancestral allele, allele string, most-severe consequence, mappings, evidence, synonyms, mode, query, and the scrape timestamp.

The catalog spans 20+ reference species including human, mouse, rat, zebrafish, fruit fly, roundworm, baker's yeast, thale cress, chicken, pig, cow, dog, cat, horse, sheep, rhesus macaque, chimpanzee, western clawed frog, medaka, and mosquito. This Actor returns gene lookups, region overlaps, sequence fetches, and human variant resolutions in one run.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Bioinformaticians, pharma research, genetics labs, academic researchers, computational biology students, biotech startups, precision-medicine teams | Gene annotation pipelines, variant impact analysis, comparative genomics, target identification, rsID resolution for GWAS, sequence retrieval for primer design |

📋 What the Ensembl Genomics Scraper does

Five query workflows in a single Actor:

  • 🧬 Lookup by gene symbol. Resolve BRCA2, TP53, EGFR, etc. to Ensembl stable IDs, coordinates, biotype, and canonical transcript.
  • 🆔 Lookup by stable ID. Pass ENSG00000139618 or ENST00000380152 for any Ensembl-supported species.
  • 🗺️ Overlap region. Return all gene features inside chromosome:start-end (e.g. 7:140424943-140624564).
  • 🧪 Sequence by ID. Fetch the raw DNA, cDNA, or protein sequence for any Ensembl stable ID.
  • 🧬 Variation by rsID. Resolve human dbSNP rsIDs (e.g. rs56116432, rs1042522) to allele frequencies, consequences, and ancestral alleles.

Each record bundles the relevant Ensembl-native fields, the species, the mode used, the original query string, and a collection timestamp.
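
For orientation, the five modes line up naturally with endpoints of the public Ensembl REST API at rest.ensembl.org. The sketch below only builds request URLs. Whether the Actor calls these exact endpoints internally is an assumption, and `build_url` is a hypothetical helper, not part of the Actor.

```python
from urllib.parse import quote

BASE = "https://rest.ensembl.org"  # public Ensembl REST server


def build_url(mode: str, query: str, species: str = "homo_sapiens") -> str:
    """Return the Ensembl REST URL matching one Actor mode (assumed mapping)."""
    q = quote(query, safe=":-")  # keep the chr:start-end separators readable
    if mode == "lookupSymbol":
        return f"{BASE}/lookup/symbol/{species}/{q}"
    if mode == "lookupId":
        return f"{BASE}/lookup/id/{q}"
    if mode == "overlapRegion":
        # feature=gene restricts the overlap query to gene features
        return f"{BASE}/overlap/region/{species}/{q}?feature=gene"
    if mode == "sequence":
        return f"{BASE}/sequence/id/{q}"
    if mode == "variation":
        return f"{BASE}/variation/{species}/{q}"
    raise ValueError(f"unknown mode: {mode}")


print(build_url("lookupSymbol", "BRCA2"))
print(build_url("overlapRegion", "7:140424943-140624564"))
```

A hand-written client would also need to request JSON (for example with a `Content-Type: application/json` header) and respect the server's rate limits, which is the work the Actor is meant to take off your hands.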

💡 Why it matters: the Ensembl genome browser is the most widely cited open genome reference in life sciences. Hand-coding a REST client means handling rate limits, schema-per-endpoint quirks, and pagination. This Actor delivers consistent records you can pipe straight into BI tools, notebooks, or pipelines.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "lookupSymbol" | One of lookupSymbol, lookupId, overlapRegion, sequence, variation. |
| species | string | "homo_sapiens" | Ensembl species slug. Used by symbol, region, sequence modes. |
| symbols | array | ["BRCA2","TP53","EGFR","MYC","KRAS"] | Gene symbols for lookupSymbol mode. |
| stableIds | array | [] | Ensembl stable IDs for lookupId or sequence. |
| region | string | "" | Genomic region chr:start-end for overlapRegion. |
| rsids | array | [] | dbSNP rsIDs for variation mode (human-only). |
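
Before starting a run, it can help to check that the payload matches the chosen mode. The following is a minimal sketch built only from the input table above. `validate_input` is a hypothetical helper, and the Actor's own validation may be stricter or behave differently.

```python
# Which input field carries the query payload for each mode (from the input table).
PAYLOAD_FIELD = {
    "lookupSymbol": "symbols",
    "lookupId": "stableIds",
    "overlapRegion": "region",
    "sequence": "stableIds",
    "variation": "rsids",
}

# Documented defaults, applied when a field is omitted.
DEFAULTS = {
    "maxItems": 10,
    "mode": "lookupSymbol",
    "species": "homo_sapiens",
    "symbols": ["BRCA2", "TP53", "EGFR", "MYC", "KRAS"],
    "stableIds": [],
    "region": "",
    "rsids": [],
}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks usable."""
    merged = {**DEFAULTS, **run_input}
    mode = merged["mode"]
    if mode not in PAYLOAD_FIELD:
        return [f"mode must be one of {sorted(PAYLOAD_FIELD)}"]
    problems = []
    field = PAYLOAD_FIELD[mode]
    if not merged[field]:
        problems.append(f"mode {mode!r} needs a non-empty {field!r}")
    if not isinstance(merged["maxItems"], int) or merged["maxItems"] < 1:
        problems.append("maxItems must be a positive integer")
    return problems


print(validate_input({"mode": "overlapRegion"}))
print(validate_input({"mode": "variation", "rsids": ["rs1042522"]}))
```

The first call reports the missing region, while the second returns an empty list.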

Example: human cancer-gene panel.

{
  "maxItems": 5,
  "mode": "lookupSymbol",
  "species": "homo_sapiens",
  "symbols": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS"]
}

Example: all genes overlapping the BRCA2 locus.

{
  "maxItems": 100,
  "mode": "overlapRegion",
  "species": "homo_sapiens",
  "region": "13:32315086-32400266"
}

Example: resolve a batch of human rsIDs, including the TP53 missense variant rs1042522.

{
  "maxItems": 10,
  "mode": "variation",
  "rsids": ["rs1042522", "rs56116432", "rs17878362"]
}

⚠️ Good to Know: the Ensembl species slug follows the genus_species convention (e.g. homo_sapiens, mus_musculus). The variation mode is human-only (dbSNP). For coordinate-based queries, regions must follow chr:start-end with assembly coordinates matching the current Ensembl release for that species.
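
The chr:start-end convention is easy to check client-side before a run. This is a small sketch, assuming 1-based inclusive coordinates and simple chromosome names. `parse_region` is a hypothetical helper and does not verify that the coordinates exist on the current assembly.

```python
import re

# chromosome or scaffold name, then start-end as positive integers
REGION_RE = re.compile(r"^(?P<chrom>[A-Za-z0-9_.]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'chr:start-end' into (chromosome, start, end), raising on bad input."""
    match = REGION_RE.match(region.strip())
    if match is None:
        raise ValueError(f"expected chr:start-end, got {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {region!r}")
    return match["chrom"], start, end


print(parse_region("13:32315086-32400266"))  # ('13', 32315086, 32400266)
```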


📊 Output

Each record contains up to 30 fields depending on the mode. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 stableId | string | "ENSG00000139618" |
| 🏷️ displayName | string | "BRCA2" |
| 🔧 objectType | string | "Gene" |
| 🧬 biotype | string | "protein_coding" |
| 🐾 species | string | "homo_sapiens" |
| 🧭 chromosome | string | "13" |
| ▶️ start | number | 32315086 |
| ⏹️ end | number | 32400266 |
| ↔️ strand | number | 1 |
| 📐 assemblyName | string | "GRCh38" |
| 📝 description | string or null | "BRCA2 DNA repair associated" |
| 🧾 canonicalTranscript | string | "ENST00000380152.8" |
| 🏛️ source | string | "ensembl_havana" |
| 🧠 logicName | string | "ensembl_havana_gene_homo_sapiens" |
| 🧪 molecule | string | "dna" |
| 📏 sequenceLength | number | 84981 |
| 🧬 sequence | string | "ATG..." |
| 🆔 variantName | string | "rs1042522" |
| 🏷️ varClass | string | "SNP" |
| 🔡 minorAllele | string | "C" |
| 📊 minorAlleleFreq | number | 0.3401 |
| 🌳 ancestralAllele | string | "G" |
| 🧬 alleleString | string | "C/G" |
| ⚠️ mostSevereConsequence | string | "missense_variant" |
| 🗺️ mappings | array | [ ... ] |
| 🔬 evidence | array | ["Frequency","1000Genomes"] |
| 🔗 synonyms | array | ["NM_000546.6:c.215C>G"] |
| 🔧 mode | string | "lookupSymbol" |
| 🔎 query | string | "BRCA2" |
| 🕒 scrapedAt | string (ISO 8601) | "2026-05-23T10:00:00.000Z" |
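
Because one dataset can mix records from different modes, downstream code usually filters on `mode` first. The sketch below pulls out common missense variants using the field names from the schema. It assumes fields that do not apply to a mode are absent or null, and the two sample records reuse the example values from the table rather than real Actor output.

```python
def common_missense(records: list[dict], min_maf: float = 0.01) -> list[tuple]:
    """Return (variantName, minorAllele, minorAlleleFreq) for missense variants at or above min_maf."""
    hits = []
    for rec in records:
        if rec.get("mode") != "variation":
            continue  # gene, region, and sequence records carry no variant fields
        maf = rec.get("minorAlleleFreq")
        if (
            rec.get("mostSevereConsequence") == "missense_variant"
            and maf is not None
            and maf >= min_maf
        ):
            hits.append((rec["variantName"], rec.get("minorAllele"), maf))
    return hits


# Illustrative records assembled from the schema's example values.
records = [
    {"mode": "variation", "variantName": "rs1042522", "minorAllele": "C",
     "minorAlleleFreq": 0.3401, "mostSevereConsequence": "missense_variant"},
    {"mode": "lookupSymbol", "stableId": "ENSG00000139618", "displayName": "BRCA2"},
]
print(common_missense(records))  # [('rs1042522', 'C', 0.3401)]
```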


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🧬 | 20+ reference species. Human, mouse, rat, zebrafish, fly, worm, yeast, arabidopsis, and more. |
| 🔁 | Five modes in one Actor. Symbol lookup, ID lookup, region overlap, sequence fetch, and variant resolution. |
| 🆔 | dbSNP rsID resolution. Human variants returned with MAF, ancestral allele, consequence, evidence. |
| 🗺️ | Region-based queries. Pull all gene features inside any chromosomal interval. |
| 🧪 | Raw sequence retrieval. DNA, cDNA, or protein, by Ensembl stable ID. |
| 🚫 | No authentication. Works against the public Ensembl reference. No login or API key needed. |
| 🔁 | Always fresh. Each run pulls the live reference, reflecting the latest Ensembl release. |

📊 The Ensembl reference underpins thousands of life-science publications and GWAS pipelines worldwide.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Setup |
| --- | --- | --- | --- | --- |
| ⭐ Ensembl Genomics Scraper (this Actor) | $5 free credit, then pay-per-use | 20+ species, 5 modes | Live per run | ⚡ 2 min |
| Hand-written Ensembl REST client | Free + engineering | Same | Build it yourself | 🛠️ Hours |
| Commercial bio-databases | $$$$ | Same + curation | Real-time | ⏳ Procurement |
| Hard-coded gene tables | Free | One snapshot | Manual | 🐢 Tech debt |

Pick this Actor when you want consistent Ensembl records without writing and maintaining a REST client.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the Ensembl Genomics Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a mode, a species, and a query payload (symbols, stable IDs, region, or rsIDs). Set maxItems.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

💊 Pharma & Drug Discovery

  • Target identification by gene panel
  • Variant impact triage in pipelines
  • Comparative genomics across model organisms
  • Pre-clinical species selection workflows

🧪 Clinical Genomics & Diagnostics

  • rsID-to-consequence lookups for GWAS
  • Variant interpretation pipelines
  • Reference gene annotation for sequencing reports
  • Coordinate liftover validation

🌱 Agricultural Genomics

  • Crop and livestock breeding gene catalogs
  • Trait-associated marker discovery
  • Comparative analysis (cow, pig, chicken, sheep)
  • Genome-assembly QC

🧫 Biotech R&D

  • Primer design from raw DNA sequence
  • CRISPR guide design pipelines
  • Synthetic biology target sourcing
  • Orthology mapping across model species

🔌 Automating Ensembl Genomics Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.
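
As a rough illustration of the Python route, the sketch below starts a run with the `apify-client` package and reads the resulting dataset. `ACTOR_ID` is a placeholder, since this page does not state the Actor's ID. The token is assumed to live in an `APIFY_TOKEN` environment variable, and the code assumes the client returns the run as a dict with a `defaultDatasetId` key.

```python
import os

ACTOR_ID = "<username>/<actor-name>"  # placeholder: copy the real ID from the Actor's Store page


def run_actor(client, run_input: dict) -> list[dict]:
    """Start one run, wait for it to finish, and return the dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    items = run_actor(client, {
        "mode": "lookupSymbol",
        "species": "homo_sapiens",
        "symbols": ["BRCA2", "TP53"],
        "maxItems": 2,
    })
    for item in items:
        print(item.get("displayName"), item.get("stableId"))


# Only attempts a real run when a token is present in the environment.
if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    main()
```

Passing the client into `run_actor` keeps the function easy to exercise with a stub in tests, without network access.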

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hook a webhook to a Slack channel for alerting when a panel of variants flips consequence in a new Ensembl release.
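
The comparison step behind such an alert can be small. The following sketch diffs two dataset exports by `variantName` and reports variants whose `mostSevereConsequence` changed. `consequence_changes` is a hypothetical helper, and the records in the usage lines are invented placeholders that only exercise the function.

```python
def consequence_changes(previous: list[dict], current: list[dict]) -> dict:
    """Map variantName -> (old, new) for variants whose consequence differs between runs."""
    before = {
        rec["variantName"]: rec.get("mostSevereConsequence")
        for rec in previous
        if rec.get("variantName")
    }
    changes = {}
    for rec in current:
        name = rec.get("variantName")
        new = rec.get("mostSevereConsequence")
        if name in before and new != before[name]:
            changes[name] = (before[name], new)
    return changes


# Invented placeholder records, not real annotations.
last_week = [{"variantName": "rsEXAMPLE1", "mostSevereConsequence": "intron_variant"},
             {"variantName": "rsEXAMPLE2", "mostSevereConsequence": "missense_variant"}]
this_week = [{"variantName": "rsEXAMPLE1", "mostSevereConsequence": "splice_region_variant"},
             {"variantName": "rsEXAMPLE2", "mostSevereConsequence": "missense_variant"}]
print(consequence_changes(last_week, this_week))
# {'rsEXAMPLE1': ('intron_variant', 'splice_region_variant')}
```

Variants that appear in only one of the two runs are ignored here. A production check would probably want to report those as well.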


🌟 Beyond business use cases

Open genome data powers more than commercial R&D. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible variant interpretation datasets
  • Comparative genomics coursework
  • Open-data thesis projects
  • Cross-species ortholog studies

🎨 Personal and creative

  • Personal-genome interpretation hobby projects
  • Citizen-science genealogy and ancestry tools
  • Educational visualizations of gene structure
  • Bioinformatics learning portfolios

🤝 Non-profit and civic

  • Rare-disease research collectives
  • Patient-advocacy variant dashboards
  • Open biomedical-data initiatives
  • Public-health surveillance pipelines

🧪 Experimentation

  • Train variant-effect prediction models
  • Prototype gene-annotation AI agents
  • Build genomics-aware chatbots
  • Test bioinformatics pipelines on real records

❓ Frequently Asked Questions

🧩 How does it work?

Pick a mode, a species, and a query payload. The Actor reads the public Ensembl reference and emits a clean structured record per gene, region, sequence, or variant.

📏 How accurate is the data?

Ensembl is the de facto open genome browser, curated by EMBL-EBI and the Wellcome Sanger Institute. The reference is updated several times per year. For clinical reporting always cross-check against the latest release notes.

🔁 How often is the dataset refreshed?

Ensembl publishes major releases several times per year. Every run of this Actor pulls the live reference, so results reflect whichever release is current at run time.

🐾 Which species are supported?

20+ reference species including human, mouse, rat, zebrafish, fruit fly, roundworm, baker's yeast, thale cress, chicken, pig, cow, dog, cat, horse, sheep, rhesus macaque, chimpanzee, western clawed frog, medaka, and mosquito.

🧬 Which variant set is supported?

dbSNP rsIDs for human. Other species are supported for gene lookups, region overlaps, and sequence retrieval.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval. A weekly run is enough to track inter-release annotation drift.

💼 Can I use this data commercially?

Yes. Ensembl is published as open data, and commercial reuse is permitted with attribution to EMBL-EBI. Check the source for its current attribution preferences.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small queries (10 records per run). A paid plan lifts the limit and unlocks scheduling and higher concurrency.

🔧 What if a stable ID is from an older Ensembl release?

The Actor uses the current Ensembl reference. Deprecated stable IDs return an error record with a clear message; use the Ensembl ID history view to resolve to the current ID.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

Ensembl Genomics Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe gene and variant records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to push fresh variant annotations into a Notion knowledge base or alert on gene-panel changes.


💡 Pro Tip: browse the complete ParseForge collection for more open-science scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Ensembl, EMBL-EBI, the Wellcome Sanger Institute, or NCBI/dbSNP. All trademarks mentioned are the property of their respective owners. Only publicly available open genome reference data is collected.