HGNC Gene Symbols Scraper
Pricing
from $15.00 / 1,000 result items
Query the HUGO Gene Nomenclature Committee database for approved human gene symbols, names, aliases, chromosomal location, gene family, RefSeq, Ensembl, OMIM, UniProt, and external links. Export to JSON, CSV, or Excel for bioinformatics, genomics research, and pharmaceutical pipelines.
Developer
ParseForge
Maintained by Community

🧬 HGNC Gene Symbols Scraper
🚀 Export approved human gene symbols in seconds. Pull 43,000+ HGNC-approved gene records with cross-references to Ensembl, Entrez, UniProt, OMIM, and PubMed. No API key, no registration, no manual nomenclature lookups.
🕒 Last updated: 2026-05-23 · 📊 27 fields per record · 🧬 43,000+ genes · 🔗 11 cross-references · 🌍 HUGO canonical
The HGNC Gene Symbols Scraper exports records from the HUGO Gene Nomenclature Committee, the official authority for assigning unique human gene symbols and names. Each record carries 27 fields including approved symbol, full name, chromosomal location, aliases, previous symbols, gene group, status, and cross-references to Ensembl, Entrez, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. HGNC nomenclature underpins virtually every modern human-genetics database and clinical-genomics pipeline.
Coverage spans 43,000+ approved gene symbols plus thousands of pseudogenes, withdrawn symbols, and reserved names. This Actor turns lookup-by-symbol, lookup-by-ID, and search-by-keyword into one-step exports as CSV, Excel, JSON, or XML.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Bioinformatics teams, clinical-genomics labs, pharma R&D, computational biologists, science writers, EHR vendors | Variant interpretation, gene-panel design, cross-DB joins, symbol normalization, literature mining, omics pipeline annotation |
📋 What the HGNC Scraper does
Five lookup modes in a single run:
- 🔤 Symbol lookup. Resolve approved symbols like `BRCA1`, `TP53`, `EGFR`, `MYC`, `AKT1`.
- 🆔 HGNC ID lookup. Resolve canonical HGNC IDs like `1100` or `HGNC:1100`.
- 🔗 Entrez Gene ID lookup. Cross-reference NCBI Entrez IDs back to HGNC records.
- 🧪 UniProt accession lookup. Map protein accessions like `P38398` to gene records.
- 🔍 Free-text search. Query across symbols, names, aliases, and previous names.
Each record includes chromosomal location, locus type and group, alias and previous symbols, gene-family group, status, approval date, last-modified timestamp, and the complete cross-reference panel.
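Because the HGNC ID mode accepts both the bare number and the prefixed form, it can help to normalize IDs before building the `values` list so duplicates collapse. The sketch below is a minimal illustration; `normalize_hgnc_id` is a hypothetical helper, not part of this Actor.

```python
import re

# Accepts "1100", "HGNC:1100", or "hgnc:1100" (hypothetical helper, not part of the Actor).
_HGNC_ID = re.compile(r"^(?:HGNC:)?(\d+)$", re.IGNORECASE)


def normalize_hgnc_id(value: str) -> str:
    """Return the canonical 'HGNC:<number>' form, or raise ValueError."""
    match = _HGNC_ID.match(value.strip())
    if match is None:
        raise ValueError(f"not an HGNC ID: {value!r}")
    return f"HGNC:{int(match.group(1))}"


print(normalize_hgnc_id("1100"))       # HGNC:1100
print(normalize_hgnc_id("hgnc:1100"))  # HGNC:1100
```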
💡 Why it matters: symbol nomenclature drifts. A gene approved as `MLL` in 2010 is now `KMT2A`. Pipelines and clinical reports that miss the update silently lose joins. This Actor returns the canonical, current HGNC record on every lookup so your annotations stay correct.
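One way to apply this downstream is to build a lookup from every known symbol (approved, previous, alias) to the current approved symbol. The sketch below assumes records shaped like this Actor's output (`symbol`, `prevSymbol`, `aliasSymbol`); the two sample records are hand-written and trimmed for illustration, and `build_symbol_index` and `resolve` are hypothetical helpers.

```python
def build_symbol_index(records):
    """Map approved, previous, and alias symbols to the current approved symbol."""
    index = {}
    # Pass 1: approved symbols always take priority.
    for rec in records:
        index[rec["symbol"].upper()] = rec["symbol"]
    # Pass 2: previous symbols first, then aliases, never overwriting an earlier entry.
    for field in ("prevSymbol", "aliasSymbol"):
        for rec in records:
            for old in rec.get(field) or []:
                index.setdefault(old.upper(), rec["symbol"])
    return index


def resolve(symbol, index):
    """Return the current approved symbol, or None if the symbol is unknown."""
    return index.get(symbol.strip().upper())


# Hand-written, trimmed sample records (illustrative only).
records = [
    {"symbol": "KMT2A", "prevSymbol": ["MLL"], "aliasSymbol": []},
    {"symbol": "BRCA1", "prevSymbol": [], "aliasSymbol": ["BRCC1"]},
]
index = build_symbol_index(records)
print(resolve("MLL", index))  # KMT2A
```

Aliases are not guaranteed to be unique across genes, so this sketch lets approved and previous symbols win over aliases; a stricter pipeline might flag ambiguous aliases instead.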
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to resolve a panel of symbols into a downloadable cross-reference table.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | string | "fetchBySymbol" | One of searchQuery, fetchBySymbol, fetchByHgncId, fetchByEntrezId, fetchByUniprot. |
| values | array | ["BRCA1", "TP53", "EGFR", "MYC", "AKT1"] | Symbols, IDs, or search terms. One lookup per entry. |
Example: resolve a panel of cancer genes by approved symbol.
{"maxItems": 25, "mode": "fetchBySymbol", "values": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS", "MYC", "PTEN", "APC", "RB1", "NF1"]}
Example: map UniProt accessions back to HGNC records.
{"maxItems": 5, "mode": "fetchByUniprot", "values": ["P38398", "P04637", "P01133"]}
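When inputs are generated by a script rather than typed by hand, a small validation step can catch a mistyped mode or an empty list before a run is started. The sketch below uses only the three documented input fields; `build_run_input` is a hypothetical helper, and the checks are assumptions about what is sensible, not rules enforced by the Actor.

```python
import json

# The five modes listed in the input table above.
VALID_MODES = {
    "searchQuery",
    "fetchBySymbol",
    "fetchByHgncId",
    "fetchByEntrezId",
    "fetchByUniprot",
}


def build_run_input(mode, values, max_items=10):
    """Return a run-input dict after basic checks (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    cleaned = []
    for value in values:
        value = str(value).strip()
        # Drop blanks and duplicates while keeping the original order.
        if value and value not in cleaned:
            cleaned.append(value)
    if not cleaned:
        raise ValueError("values must contain at least one non-empty entry")
    return {"maxItems": max_items, "mode": mode, "values": cleaned}


print(json.dumps(build_run_input("fetchByUniprot", ["P38398", "P04637", "P01133"], max_items=5)))
```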
⚠️ Good to Know: HGNC assigns symbols for human genes only. Mouse and rat orthologs are linked via MGD and RGD cross-references inside each record, but rodent-only symbols are not in scope.
📊 Output
Each gene record contains 27 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 hgncId | string | "HGNC:1100" |
| 🔤 symbol | string | "BRCA1" |
| 📛 name | string | "BRCA1 DNA repair associated" |
| 🧬 locusType | string | "gene with protein product" |
| 🧪 locusGroup | string | "protein-coding gene" |
| 📍 location | string | "17q21.31" |
| 🔁 aliasSymbol | array | ["BRCC1", "PNCA4"] |
| 🏷️ aliasName | array | ["Breast cancer 1, early onset"] |
| ⏪ prevSymbol | array | ["BRCAI"] |
| ⏪ prevName | array | [] |
| 👥 geneGroup | array | ["Ring finger proteins", "BRCT domain containing"] |
| 🔗 entrezId | string | "672" |
| 🔗 ensemblGeneId | string | "ENSG00000012048" |
| 🔗 ucscId | string | "uc002ict.5" |
| 🔗 refseqAccession | array | ["NM_007294"] |
| 🧪 uniprotIds | array | ["P38398"] |
| 📚 omimId | array | ["113705"] |
| 📚 pubmedId | array | ["2270482", "8554067"] |
| 🐭 mgdId | array | ["MGI:104537"] |
| 🐀 rgdId | array | ["RGD:2218"] |
| 🧬 ccdsId | array | ["CCDS11456"] |
| 🔗 vegaId | string or null | "OTTHUMG00000157426" |
| ✅ status | string | "Approved" |
| 📅 dateApprovedReserved | string | "1989-06-30" |
| 📅 dateModified | string | "2024-09-12" |
| 🗃️ raw | object | Full HGNC payload for that record |
| 🕒 scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z" |
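If you consume the JSON export in your own pipeline and need a flat table, the array fields have to be joined into single cells. The sketch below shows one possible approach using only the standard library; `flatten_record` and `records_to_csv` are hypothetical helpers, the pipe separator is an arbitrary choice, and dropping the nested `raw` object is an assumption about what a flat table needs.

```python
import csv
import io


def flatten_record(record, sep="|"):
    """Join list fields into one string, blank out nulls, and drop the nested raw payload."""
    flat = {}
    for key, value in record.items():
        if key == "raw":
            continue  # nested object, not suitable for a flat row
        if isinstance(value, list):
            flat[key] = sep.join(str(v) for v in value)
        elif value is None:
            flat[key] = ""
        else:
            flat[key] = value
    return flat


def records_to_csv(records):
    """Serialize flattened records to CSV text, using the union of keys as columns."""
    rows = [flatten_record(r) for r in records]
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Trimmed record using example values from the schema table above.
sample = {"symbol": "BRCA1", "aliasSymbol": ["BRCC1", "PNCA4"], "vegaId": None, "raw": {}}
print(records_to_csv([sample]))
```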
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 🧬 | Canonical nomenclature. HUGO-approved symbols and names backed by 30+ years of curation. |
| 🔗 | Eleven cross-references per record. Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, Vega. |
| 🔤 | Five lookup modes. Symbol, HGNC ID, Entrez ID, UniProt accession, free-text search. |
| ⏪ | Aliases and previous symbols. Resolve historical names like MLL to current KMT2A automatically. |
| ⚡ | Fast. 10 lookups in seconds, hundreds in under a minute. |
| 🔁 | Always fresh. Pulls live HGNC records so updates appear on the next run. |
| 🚫 | No authentication. Works against the public HGNC data feed. No login or key needed. |
📊 Gene symbol consistency is one of the most under-appreciated quality signals in modern genomics. This Actor makes it trivial to enforce.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Lookups | Setup |
|---|---|---|---|---|---|
| ⭐ HGNC Scraper (this Actor) | $5 free credit, then pay-per-use | 43,000+ human genes | Live per run | symbol, ID, Entrez, UniProt, search | ⚡ 2 min |
| Manual HGNC web search | Free | Full | Live | One at a time | 🐢 Per-row |
| Bulk file downloads | Free | Full snapshot | Quarterly | Local parsing | ⏳ Hours |
| Generic biomedical APIs | Varies | Mixed | Mixed | Often paid | 🕒 Variable |
Pick this Actor when you want HGNC records on demand without bulk downloads or per-row clicks.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the HGNC Gene Symbols Scraper page on the Apify Store.
- 🎯 Set input. Choose a mode, paste your symbols or IDs into the `values` list, and set `maxItems`.
- 🚀 Run it. Click Start and let the Actor resolve every lookup.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating HGNC Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep clinical and research databases aligned with HGNC updates automatically.
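A programmatic run with the Python `apify-client` package might look like the sketch below. It assumes the package is installed and that you have an Apify API token; the Actor ID is left as a placeholder because it is not stated on this page, and `make_client` and `fetch_records` are hypothetical wrapper names.

```python
def make_client(token):
    """Create an Apify client (requires `pip install apify-client`)."""
    # Imported lazily so fetch_records can be exercised without the package installed.
    from apify_client import ApifyClient

    return ApifyClient(token)


def fetch_records(client, actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items as a list."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholders: supply your own token and this Actor's ID from its Store page):
# client = make_client("<YOUR_APIFY_TOKEN>")
# records = fetch_records(
#     client,
#     "<username>/<actor-name>",
#     {"maxItems": 5, "mode": "fetchBySymbol", "values": ["BRCA1", "TP53"]},
# )
```

Passing the client in as an argument keeps the fetch logic easy to test with a stub and leaves token handling to the caller.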
🌟 Beyond business use cases
Authoritative gene nomenclature has reach well beyond commercial pipelines. The same records support research, education, civic projects, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Pick a lookup mode, paste your symbols or IDs into the values list, and the Actor resolves each one against HGNC and emits a clean structured record. No browser automation, no captchas, no setup.
📏 How accurate are the symbols?
HGNC is the canonical authority for human gene nomenclature. Every approved symbol is reviewed and assigned by HGNC curators. Status flags (Approved, Entry Withdrawn, Symbol Withdrawn) are surfaced on every record so you always know what you have.
🔁 How often is the dataset refreshed?
HGNC updates its records continuously as new symbols are approved and existing ones are reviewed. Every run of this Actor fetches live data.
🔗 Which cross-references are included?
Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. Not every cross-reference is populated for every gene. When HGNC has no mapping, array-typed fields come back as empty arrays, and vegaId can be null.
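Before joining against another database, it can be useful to see which cross-references a record actually carries. The sketch below treats missing keys, nulls, empty strings, and empty arrays the same way; the field names come from the schema table, and `missing_xrefs` is a hypothetical helper.

```python
# Cross-reference field names as listed in the schema table.
XREF_FIELDS = (
    "entrezId", "ensemblGeneId", "ucscId", "refseqAccession", "uniprotIds",
    "omimId", "pubmedId", "mgdId", "rgdId", "ccdsId", "vegaId",
)


def missing_xrefs(record):
    """Return the cross-reference fields that are absent, null, or empty."""
    return [field for field in XREF_FIELDS if not record.get(field)]


# Trimmed, hand-written record for illustration.
record = {"symbol": "BRCA1", "entrezId": "672", "uniprotIds": ["P38398"], "omimId": [], "vegaId": None}
print(missing_xrefs(record))
```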
⏪ Can it resolve old symbols?
Yes. Run free-text search with the obsolete symbol (for example MLL) and the Actor returns the current approved record (KMT2A) along with the alias and previous-symbol arrays.
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep your annotation database in sync with HGNC releases.
⚖️ Is this data legal to use?
HGNC data is publicly available and widely cited. Standard scholarly attribution applies; commercial pipelines and clinical tools have been using HGNC nomenclature for decades.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small panels (10 records per run). A paid plan lifts the limit for full panel resolution and scheduling.
🔁 What happens if a run fails or gets interrupted?
Apify automatically retries transient errors. If a run still fails, inspect the log, fix the input, and re-run. Partial datasets from failed runs are preserved.
🐭 Does it return mouse or rat genes?
No, HGNC covers human genes only. The MGD and RGD ID fields cross-reference the rodent equivalents so you can follow up in those databases.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
HGNC Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe gene records into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh HGNC records into your annotation database or alert your team in Slack on symbol updates.
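On the receiving end, a webhook handler usually needs the dataset ID of the finished run. The sketch below assumes the webhook uses Apify's default payload layout, where `eventType` names the event and `resource` holds the run object with its `defaultDatasetId`; if you customize the payload template, adjust the keys accordingly. `dataset_id_from_webhook` is a hypothetical helper.

```python
def dataset_id_from_webhook(payload):
    """Return the dataset ID for a succeeded run, or None for any other event."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Hand-written example payload (IDs are made up for illustration).
payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"id": "run123", "defaultDatasetId": "dataset456"},
}
print(dataset_id_from_webhook(payload))  # dataset456
```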
🔗 Recommended Actors
- 🩺 ClinicalTrials.gov Scraper - Registered clinical trials worldwide
- 📖 arXiv Scraper - Open-access preprints across science
- 🧪 OSF Scraper - Open Science Framework projects and registrations
- 📊 Figshare Scraper - Research data and figures with DOIs
- 🌍 GBIF Biodiversity Scraper - Global biodiversity occurrence records
💡 Pro Tip: browse the complete ParseForge collection for more biomedical and reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by HGNC, HUGO, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available gene nomenclature data is collected.