HGNC Gene Symbols Scraper

Pricing

from $15.00 / 1,000 result items

Query the HUGO Gene Nomenclature Committee database for approved human gene symbols, names, aliases, chromosomal location, gene family, RefSeq, Ensembl, OMIM, UniProt, and external links. Export to JSON, CSV, or Excel for bioinformatics, genomics research, and pharmaceutical pipelines.

Developer: ParseForge · Maintained by Community

🧬 HGNC Gene Symbols Scraper

🚀 Export approved human gene symbols in seconds. Pull 43,000+ HGNC-approved gene records with cross-references to Ensembl, Entrez, UniProt, OMIM, and PubMed. No API key, no registration, no manual nomenclature lookups.

🕒 Last updated: 2026-05-23 · 📊 27 fields per record · 🧬 43,000+ genes · 🔗 11 cross-references · 🌍 HUGO canonical

The HGNC Gene Symbols Scraper exports records from the HUGO Gene Nomenclature Committee, the official authority for assigning unique human gene symbols and names. Each record carries 27 fields including approved symbol, full name, chromosomal location, aliases, previous symbols, gene group, status, and cross-references to Ensembl, Entrez, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. HGNC nomenclature underpins virtually every modern human-genetics database and clinical-genomics pipeline.

Coverage spans 43,000+ approved gene symbols plus thousands of pseudogenes, withdrawn symbols, and reserved names. This Actor turns lookup-by-symbol, lookup-by-ID, and search-by-keyword into one-step exports as CSV, Excel, JSON, or XML.

🎯 Target Audience | 💡 Primary Use Cases
Bioinformatics teams, clinical-genomics labs, pharma R&D, computational biologists, science writers, EHR vendors | Variant interpretation, gene-panel design, cross-DB joins, symbol normalization, literature mining, omics pipeline annotation

📋 What the HGNC Scraper does

Five lookup modes in a single run:

  • 🔤 Symbol lookup. Resolve approved symbols like BRCA1, TP53, EGFR, MYC, AKT1.
  • 🆔 HGNC ID lookup. Resolve canonical HGNC IDs like 1100 or HGNC:1100.
  • 🔗 Entrez Gene ID lookup. Cross-reference NCBI Entrez IDs back to HGNC records.
  • 🧪 UniProt accession lookup. Map protein accessions like P38398 to gene records.
  • 🔍 Free-text search. Query across symbols, names, aliases, and previous names.

Each record includes chromosomal location, locus type and group, alias and previous symbols, gene-family group, status, approval date, last-modified timestamp, and the complete cross-reference panel.
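
The HGNC ID mode accepts both the bare number and the prefixed form. If you assemble the values list from several sources, it can help to settle on one spelling first so duplicates are easy to spot. A minimal sketch; the helper name normalize_hgnc_id is hypothetical, and whether the Actor normalizes IDs internally is not stated here.

```python
import re

# Accepts "1100", "HGNC:1100", or "hgnc:1100" (with optional surrounding spaces).
_HGNC_ID = re.compile(r"^(?:HGNC:)?(\d+)$", re.IGNORECASE)


def normalize_hgnc_id(value):
    """Return the prefixed form 'HGNC:<number>' for either accepted spelling."""
    match = _HGNC_ID.match(str(value).strip())
    if match is None:
        raise ValueError(f"not an HGNC ID: {value!r}")
    # int() drops any leading zeros so equal IDs compare equal as strings.
    return f"HGNC:{int(match.group(1))}"


def unique_hgnc_ids(values):
    """Normalize a list of IDs and drop duplicates, keeping first-seen order."""
    return list(dict.fromkeys(normalize_hgnc_id(v) for v in values))
```

For example, unique_hgnc_ids(["1100", "HGNC:1100"]) collapses to a single entry, which saves one lookup against maxItems.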

💡 Why it matters: symbol nomenclature drifts. The gene whose approved symbol was MLL in 2010 is now KMT2A. Pipelines and clinical reports that miss such a rename silently lose joins. This Actor returns the canonical, current HGNC record on every lookup so your annotations stay correct.
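
Once records are exported, the prevSymbol and aliasSymbol arrays can be turned into a lookup that maps legacy symbols onto current ones. The sketch below is one way to do that in your own pipeline, not part of the Actor. The function names are hypothetical, case-insensitive matching is an assumption, and the sample records are trimmed to the fields used.

```python
def build_symbol_index(records):
    """Map approved, previous, and alias symbols to the current approved symbol."""
    index = {}
    # Register approved symbols first so a historical symbol never overrides one.
    for record in records:
        index[record["symbol"].upper()] = record["symbol"]
    # Previous symbols take priority over aliases when both claim the same key.
    for field in ("prevSymbol", "aliasSymbol"):
        for record in records:
            for old in record.get(field) or []:
                index.setdefault(old.upper(), record["symbol"])
    return index


def to_current_symbol(symbol, index):
    """Return the current approved symbol, or None when the symbol is unknown."""
    return index.get(symbol.strip().upper())


# Illustrative records, reduced to the three fields this sketch reads.
records = [
    {"symbol": "KMT2A", "prevSymbol": ["MLL"], "aliasSymbol": []},
    {"symbol": "BRCA1", "prevSymbol": [], "aliasSymbol": ["BRCC1", "PNCA4"]},
]
index = build_symbol_index(records)
print(to_current_symbol("MLL", index))
```

Alias symbols are not guaranteed to be unique across genes, so treat alias hits as candidates to review instead of silent replacements.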


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to resolve a panel of symbols into a downloadable cross-reference table.


⚙️ Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000.
mode | string | "fetchBySymbol" | One of searchQuery, fetchBySymbol, fetchByHgncId, fetchByEntrezId, fetchByUniprot.
values | array | ["BRCA1", "TP53", "EGFR", "MYC", "AKT1"] | Symbols, IDs, or search terms. One lookup per entry.

Example: resolve a panel of cancer genes by approved symbol.

{
  "maxItems": 25,
  "mode": "fetchBySymbol",
  "values": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS", "MYC", "PTEN", "APC", "RB1", "NF1"]
}

Example: map UniProt accessions back to HGNC records.

{
  "maxItems": 5,
  "mode": "fetchByUniprot",
  "values": ["P38398", "P04637", "P01133"]
}

⚠️ Good to Know: HGNC assigns symbols for human genes only. Mouse and rat orthologs are linked via MGD and RGD cross-references inside each record, but rodent-only symbols are not in scope.
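
If you generate the input from code, a small builder can catch a mistyped mode or an empty values list before a run is started. This is a sketch only: build_input is a hypothetical helper, the checks mirror the input fields described above, and the Actor may apply its own validation on top.

```python
import json

# The five mode names listed in the input table.
MODES = ("searchQuery", "fetchBySymbol", "fetchByHgncId", "fetchByEntrezId", "fetchByUniprot")


def build_input(mode, values, max_items=10):
    """Return an input dict with the three documented fields, or raise ValueError."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {', '.join(MODES)}")
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Strip whitespace, drop blanks, and de-duplicate while keeping order,
    # since each entry in values is treated as one lookup.
    cleaned = list(dict.fromkeys(v.strip() for v in values if v.strip()))
    if not cleaned:
        raise ValueError("values must contain at least one non-empty entry")
    return {"maxItems": max_items, "mode": mode, "values": cleaned}


run_input = build_input("fetchByUniprot", ["P38398", " P04637", "P38398", ""], max_items=5)
print(json.dumps(run_input, indent=2))
```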


📊 Output

Each gene record contains 27 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

Field | Type | Example
🆔 hgncId | string | "HGNC:1100"
🔤 symbol | string | "BRCA1"
📛 name | string | "BRCA1 DNA repair associated"
🧬 locusType | string | "gene with protein product"
🧪 locusGroup | string | "protein-coding gene"
📍 location | string | "17q21.31"
🔁 aliasSymbol | array | ["BRCC1", "PNCA4"]
🏷️ aliasName | array | ["Breast cancer 1, early onset"]
prevSymbol | array | ["BRCAI"]
prevName | array | []
👥 geneGroup | array | ["Ring finger proteins", "BRCT domain containing"]
🔗 entrezId | string | "672"
🔗 ensemblGeneId | string | "ENSG00000012048"
🔗 ucscId | string | "uc002ict.5"
🔗 refseqAccession | array | ["NM_007294"]
🧪 uniprotIds | array | ["P38398"]
📚 omimId | array | ["113705"]
📚 pubmedId | array | ["2270482", "8554067"]
🐭 mgdId | array | ["MGI:104537"]
🐀 rgdId | array | ["RGD:2218"]
🧬 ccdsId | array | ["CCDS11456"]
🔗 vegaId | string or null | "OTTHUMG00000157426"
status | string | "Approved"
📅 dateApprovedReserved | string | "1989-06-30"
📅 dateModified | string | "2024-09-12"
🗃️ raw | object | Full HGNC payload for that record
🕒 scrapedAt | string (ISO 8601) | "2026-05-23T00:00:00.000Z"
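
Several fields in the schema are arrays, and raw is a nested object. If you pull the JSON export and want one spreadsheet row per gene, a sketch like the following collapses arrays into delimited strings. The function names and the "|" separator are assumptions for illustration; the platform's built-in CSV export may lay out arrays differently.

```python
import csv
import io


def flatten_record(record, sep="|"):
    """Collapse array fields into delimited strings and drop the nested raw payload."""
    flat = {}
    for key, value in record.items():
        if key == "raw":
            continue  # nested object; keep it in JSON exports instead
        if isinstance(value, list):
            flat[key] = sep.join(str(item) for item in value)
        elif value is None:
            flat[key] = ""
        else:
            flat[key] = value
    return flat


def records_to_csv(records):
    """Return CSV text with one row per record and the union of all columns."""
    rows = [flatten_record(record) for record in records]
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative record trimmed to a handful of schema fields.
sample = {
    "hgncId": "HGNC:1100",
    "symbol": "BRCA1",
    "aliasSymbol": ["BRCC1", "PNCA4"],
    "prevName": [],
    "vegaId": None,
    "raw": {"example": True},
}
csv_text = records_to_csv([sample])
print(csv_text)
```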


✨ Why choose this Actor

Capability
🧬 Canonical nomenclature. HUGO-approved symbols and names backed by 30+ years of curation.
🔗 Eleven cross-references per record. Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, Vega.
🔤 Five lookup modes. Symbol, HGNC ID, Entrez ID, UniProt accession, free-text search.
Aliases and previous symbols. Resolve historical names like MLL to current KMT2A automatically.
Fast. 10 lookups in seconds, hundreds in under a minute.
🔁 Always fresh. Pulls live HGNC records so updates appear on the next run.
🚫 No authentication. Works against the public HGNC data feed. No login or key needed.

📊 Gene symbol consistency is one of the most under-appreciated quality signals in modern genomics. This Actor makes it trivial to enforce.


📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Lookups | Setup
⭐ HGNC Scraper (this Actor) | $5 free credit, then pay-per-use | 43,000+ human genes | Live per run | symbol, ID, Entrez, UniProt, search | ⚡ 2 min
Manual HGNC web search | Free | Full | Live | One at a time | 🐢 Per-row
Bulk file downloads | Free | Full snapshot | Quarterly | Local parsing | ⏳ Hours
Generic biomedical APIs | Varies | Mixed | Mixed | Often paid | 🕒 Variable

Pick this Actor when you want HGNC records on demand without bulk downloads or per-row clicks.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the HGNC Gene Symbols Scraper page on the Apify Store.
  3. 🎯 Set input. Choose a mode, paste your symbols or IDs into the values list, and set maxItems.
  4. 🚀 Run it. Click Start and let the Actor resolve every lookup.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🧪 Clinical & Translational Genomics

  • Variant interpretation pipelines with canonical symbols
  • Gene-panel design and QA for diagnostics
  • EHR and lab-report normalization
  • Cross-DB joins from Entrez or UniProt back to HGNC

💊 Pharma & Biotech R&D

  • Target-list curation against canonical nomenclature
  • Drug-target literature mining via PubMed IDs
  • Multi-source omics annotation with stable IDs
  • Patent and FDA filing nomenclature checks

🧮 Bioinformatics Pipelines

  • Symbol-history normalization for legacy datasets
  • RNA-seq and microarray probe-to-gene mapping
  • Cross-species ortholog joins via MGD and RGD
  • Pre-flight QA on submitted FASTA/GFF annotations

📰 Science Communication & EdTech

  • Up-to-date gene cards for popular-science articles
  • Interactive teaching tools with live HGNC data
  • Database front-ends for medical education
  • Symbol lookup widgets for science journalism

🔌 Automating HGNC Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep clinical and research databases aligned with HGNC updates automatically.
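
A programmatic run with the Python client might look like the sketch below. The ACTOR_ID value is a placeholder guess, so copy the real identifier from the Actor's Store page. The code assumes the dict-style run object returned by the apify-client package and an API token in an APIFY_TOKEN environment variable (a naming convention chosen here, not a requirement).

```python
# Hypothetical identifier: replace with the ID shown on the Actor's Store page.
ACTOR_ID = "parseforge/hgnc-gene-symbols-scraper"


def fetch_hgnc_records(client, run_input, actor_id=ACTOR_ID):
    """Start a run, wait for it to finish, and return the dataset items as a list."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    # Each run writes its results to a default dataset referenced by ID.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main():
    # Imported lazily so fetch_hgnc_records can be exercised with a stub client.
    import os
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {"maxItems": 5, "mode": "fetchBySymbol", "values": ["BRCA1", "TP53"]}
    for record in fetch_hgnc_records(client, run_input):
        print(record["hgncId"], record["symbol"])

# Call main() from your own entry point; it needs a valid token and network access.
```

Passing the client in as an argument keeps the helper easy to test and lets the same function run under a scheduler or inside a larger pipeline.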


🌟 Beyond business use cases

Authoritative gene nomenclature has reach well beyond commercial pipelines. The same records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Reproducible variant calls for peer-reviewed studies
  • Class assignments on gene-naming conventions
  • Cross-DB join exercises for bioinformatics courses
  • Citation-friendly snapshots of canonical records

🎨 Personal and creative

  • 23andMe and consumer-genomics result decoding
  • Custom Anki decks for med-school revision
  • Hobbyist family-history disease research
  • Indie biotech newsletter content automation

🤝 Non-profit and civic

  • Rare-disease patient-advocacy gene factsheets
  • Public health surveillance with canonical symbols
  • Open-data biology curriculum for high schools
  • Grant-proposal supporting evidence with stable IDs

🧪 Experimentation

  • Train LLMs on canonical biomedical vocabulary
  • Build agentic tools that resolve symbol drift live
  • Prototype knowledge graphs with HGNC as the spine
  • Validate gene-prediction models against ground truth


❓ Frequently Asked Questions

🧩 How does it work?

Pick a lookup mode, paste your symbols or IDs into the values list, and the Actor resolves each one against HGNC and emits a clean structured record. No browser automation, no captchas, no setup.

📏 How accurate are the symbols?

HGNC is the canonical authority for human gene nomenclature. Every approved symbol is reviewed and assigned by HUGO curators. Status flags (Approved, Entry Withdrawn, Symbol Withdrawn) are surfaced on every record so you always know what you have.
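
Because the status flag travels with every record, downstream code can separate approved symbols from withdrawn ones before annotating anything. A minimal sketch with hypothetical helper names; the two sample records are illustrative and are not real Actor output.

```python
from collections import defaultdict


def group_by_status(records):
    """Group symbols by their status flag, e.g. 'Approved' or 'Symbol Withdrawn'."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get("status", "Unknown")].append(record["symbol"])
    return dict(groups)


def approved_only(records):
    """Keep only records whose status is exactly 'Approved'."""
    return [record for record in records if record.get("status") == "Approved"]


# Illustrative records, not real output.
records = [
    {"symbol": "BRCA1", "status": "Approved"},
    {"symbol": "MLL", "status": "Symbol Withdrawn"},
]
print(group_by_status(records))
```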

🔁 How often is the dataset refreshed?

HGNC updates its records continuously as new symbols are approved and existing ones are reviewed. Every run of this Actor fetches live data.

🔗 Which cross-references are included?

Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. Not every cross-reference is populated for every gene; the field is an empty array when HGNC has no mapping.
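
Since some cross-references can be empty, it is worth checking coverage before relying on one for a join. The sketch below counts populated fields across a batch. The field names come from the schema; the helper names are hypothetical, and the check treats empty arrays, null, and missing keys the same way.

```python
# The eleven cross-reference fields from the schema.
XREF_FIELDS = (
    "entrezId", "ensemblGeneId", "ucscId", "refseqAccession", "uniprotIds",
    "omimId", "pubmedId", "mgdId", "rgdId", "ccdsId", "vegaId",
)


def missing_xrefs(record):
    """List the cross-reference fields that are empty, null, or absent."""
    return [field for field in XREF_FIELDS if not record.get(field)]


def xref_coverage(records):
    """Count, per field, how many records carry at least one value."""
    return {
        field: sum(1 for record in records if record.get(field))
        for field in XREF_FIELDS
    }


# Illustrative, partially filled records.
records = [
    {"symbol": "BRCA1", "entrezId": "672", "omimId": ["113705"], "ccdsId": [], "vegaId": None},
    {"symbol": "TP53", "entrezId": "7157"},
]
coverage = xref_coverage(records)
print(coverage)
```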

⏪ Can it resolve old symbols?

Yes. Run the free-text search mode (searchQuery) with the obsolete symbol (for example MLL) and the Actor returns the current approved record (KMT2A) along with its alias and previous-symbol arrays.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep your annotation database in sync with HGNC releases.

HGNC data is publicly available and widely cited. Standard scholarly attribution applies; commercial pipelines and clinical tools have been using HGNC nomenclature for decades.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small panels (10 records per run). A paid plan lifts the limit for full panel resolution and scheduling.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log, fix the input, and re-run. Partial datasets from failed runs are preserved.

🐭 Does it return mouse or rat genes?

No, HGNC covers human genes only. The MGD and RGD ID fields cross-reference the rodent equivalents so you can follow up in those databases.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

HGNC Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe gene records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh HGNC records into your annotation database or alert your team in Slack on symbol updates.
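
On the receiving side, a webhook handler usually only needs the dataset ID of the finished run. The sketch below assumes Apify's default webhook payload, where the event name is under eventType and the run object under resource; a custom payload template would change that shape. The function name is hypothetical.

```python
def dataset_id_from_webhook(payload):
    """Return the finished run's dataset ID, or None for events that are ignored."""
    # Only successful runs are expected to carry a complete dataset.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Illustrative payload trimmed to the keys this sketch reads.
example = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "example-dataset-id"},
}
print(dataset_id_from_webhook(example))
```

The returned ID can then be used to fetch the records and load them into your annotation database.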


💡 Pro Tip: browse the complete ParseForge collection for more biomedical and reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by HGNC, HUGO, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available gene nomenclature data is collected.