KEGG Pathways Scraper

Pricing

from $18.00 / 1,000 result items

Export biological pathway data from KEGG, the reference encyclopedia of genes, proteins, and metabolic networks. List, fetch, or search entries from pathway, module, KO, genome, compound, glycan, reaction, enzyme, drug, and disease databases.

Developer

ParseForge

Maintained by Community

🧬 KEGG Pathways Scraper

🚀 Export biological pathway data from KEGG in seconds. List, search, or fetch entries across 18 KEGG databases (pathway, module, KO, compound, glycan, reaction, enzyme, drug, disease, and more). No login, no API key.

🕒 Last updated: 2026-05-22 · 📊 20 fields per record · 🧬 18 KEGG databases · 🔬 Live KEGG feed · 🔍 3 query modes

The KEGG Pathways Scraper taps KEGG (Kyoto Encyclopedia of Genes and Genomes), the reference resource for understanding biological systems from genomes and pathways to drugs and diseases. The Actor returns 20 structured fields per record: the entry ID, name, organism, definition, description, and classification, plus cross-references to modules, diseases, drugs, pathways, genes, compounds, reactions, and enzymes, along with literature citations.

The catalog covers 18 KEGG databases: pathway, module, compound, glycan, reaction, rclass, enzyme, network, variant, disease, drug, and dgroup, plus KO orthology and the viral/addendum genomes. The Actor exposes three query modes (list, find, get), so you can browse, search, or pull full detail with the same predictable schema.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Systems biologists, drug-discovery scientists, metabolic engineers, bioinformaticians, computational biologists, pharma research, biotech startups | Pathway enrichment analysis, drug-target identification, metabolic network modelling, disease-gene mapping, ortholog assignment, KEGG-driven LLM grounding |

📋 What the KEGG Pathways Scraper does

Three query workflows across 18 databases in a single Actor:

  • 📑 List mode. Enumerate every entry in a database (e.g. all human pathways, all compounds, all drugs).
  • 🔎 Find mode. Free-text keyword search across a database (e.g. glycolysis, insulin, tuberculosis).
  • 🧾 Get mode. Retrieve full entry detail by ID (e.g. hsa00010+hsa00020 for human glycolysis and TCA cycle).
  • 🧬 Organism filter. Restrict pathway/module/KO queries to one organism via KEGG codes (hsa human, mmu mouse, eco E. coli, sce yeast).

Each record includes the entry ID, name, organism, definition, description, classification hierarchy, plus structured cross-references to modules, diseases, drugs, related pathways, genes, compounds, reactions, enzymes, and literature references. The raw KEGG flat-file text is also preserved.

💡 Why it matters: KEGG is the most widely cited pathway resource in systems biology, but its REST API returns plain-text flat files that need a custom parser per database. This Actor delivers a single normalized schema, so you can move from query to dashboard or model in minutes.
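To make the parsing problem concrete, here is a minimal sketch of reading one KEGG flat-file entry. It assumes the usual layout in which the field name occupies the first 12 columns, continuation lines leave those columns blank, and `///` ends the entry. The sample text is an abbreviated, illustrative excerpt, and `parse_flat_file` is a hypothetical helper, not the Actor's own parser.

```python
from collections import defaultdict

# Abbreviated, illustrative excerpt in KEGG flat-file layout (not a full entry).
SAMPLE = """\
ENTRY       hsa00010                    Pathway
NAME        Glycolysis / Gluconeogenesis - Homo sapiens (human)
CLASS       Metabolism; Carbohydrate metabolism
GENE        3098  HK1; hexokinase 1
            3099  HK2; hexokinase 2
///
"""

KEY_WIDTH = 12  # assumption: the field name sits in the first 12 columns


def parse_flat_file(text: str) -> dict:
    """Group the lines of one KEGG entry by field name."""
    fields = defaultdict(list)
    current = None
    for line in text.splitlines():
        if line.startswith("///"):  # entry terminator
            break
        key = line[:KEY_WIDTH].strip()
        value = line[KEY_WIDTH:].strip()
        if key:  # a new field starts; blank key means continuation
            current = key
        if current is not None and value:
            fields[current].append(value)
    return dict(fields)


entry = parse_flat_file(SAMPLE)
print(entry["GENE"])
```

Each database uses a different set of field names, which is why a generic grouping step like this still needs per-database logic before it yields a normalized record.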


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| mode | enum | "list" | One of list, find, get. |
| database | enum | "pathway" | One of 18 KEGG databases. |
| organism | string | "hsa" | KEGG organism code. Applies to list mode with pathway/module/ko. |
| query | string | "" | Keyword for find mode, or +-joined IDs for get mode. |

Example: list all human pathways.

{
    "maxItems": 500,
    "mode": "list",
    "database": "pathway",
    "organism": "hsa"
}

Example: fetch full detail for glycolysis and TCA cycle.

{
    "maxItems": 2,
    "mode": "get",
    "database": "pathway",
    "query": "hsa00010+hsa00020"
}

Example: search for tuberculosis-related drugs.

{
    "maxItems": 50,
    "mode": "find",
    "database": "drug",
    "query": "tuberculosis"
}

⚠️ Good to Know: KEGG's get operation accepts up to 10 IDs per request. The Actor chunks larger ID lists automatically and stitches results back together.
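The chunking described above can be sketched in a few lines. This is only an illustration of the idea, not the Actor's code: `chunk_ids` is a hypothetical helper, and the generated IDs are placeholders that may not correspond to real KEGG entries.

```python
from typing import Iterator, List

KEGG_GET_LIMIT = 10  # maximum entries per get request, as stated above


def chunk_ids(ids: List[str], size: int = KEGG_GET_LIMIT) -> Iterator[str]:
    """Yield '+'-joined ID groups of at most `size` entries each."""
    for start in range(0, len(ids), size):
        yield "+".join(ids[start:start + size])


# 23 placeholder IDs, used only to show how the list is split.
ids = [f"hsa{n:05d}" for n in range(10, 33)]
batches = list(chunk_ids(ids))
print(len(batches), "requests needed")
```

Each yielded string has the same shape as the `query` value in the get example, so the batches can be fetched one by one and the results concatenated in order.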


📊 Output

Each record contains 20 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 🆔 entryId | string | "hsa00010" |
| 🗂️ database | string | "pathway" |
| 🏷️ name | string | "Glycolysis / Gluconeogenesis" |
| 🧬 organism | string or null | "Homo sapiens (human)" |
| 📝 definition | string or null | "Glycolysis is the process of..." |
| 📖 description | string or null | "Reference pathway map..." |
| 🗂️ classification | string[] or null | ["Metabolism", "Carbohydrate metabolism"] |
| 🧱 modules | object[] or null | [{ "id": "M00001", "name": "..." }] |
| 🩺 diseases | object[] or null | [{ "id": "H00114", "name": "Hereditary fructose intolerance" }] |
| 💊 drugs | object[] or null | [{ "id": "D00097", "name": "Metformin" }] |
| 🔗 pathways | object[] or null | [{ "id": "hsa00020", "name": "Citrate cycle" }] |
| 🧬 genes | object[] or null | [{ "id": "3098", "symbol": "HK1" }] |
| ⚗️ compounds | object[] or null | [{ "id": "C00031", "name": "D-Glucose" }] |
| 🔄 reactions | object[] or null | [{ "id": "R01786", "name": "..." }] |
| 🧪 enzymes | object[] or null | [{ "id": "2.7.1.1", "name": "Hexokinase" }] |
| 📚 references | object[] or null | [{ "pmid": "12345678", "title": "..." }] |
| 📄 rawEntry | string or null | "ENTRY hsa00010 Pathway..." |
| 🔗 url | string | "https://www.kegg.jp/entry/hsa00010" |
| 🕓 scrapedAt | ISO 8601 | "2026-05-22T00:00:00.000Z" |
| ⚠️ error | string or null | null |
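Because most fields in the schema are nullable, downstream code usually needs to guard against null before iterating. The sketch below shows one way to do that. The record is assembled from the example values in the schema, the helper names are hypothetical, and treating a null `error` as "record is usable" is an assumption based on the schema rather than documented behavior.

```python
from typing import Any, Dict, List

# Illustrative record built from the example values in the schema.
record: Dict[str, Any] = {
    "entryId": "hsa00010",
    "name": "Glycolysis / Gluconeogenesis",
    "genes": [{"id": "3098", "symbol": "HK1"}],
    "drugs": None,  # cross-reference arrays may be null
    "error": None,
}


def refs(rec: Dict[str, Any], field: str) -> List[Dict[str, Any]]:
    """Return a cross-reference array, treating null or missing as empty."""
    return rec.get(field) or []


def gene_symbols(rec: Dict[str, Any]) -> List[str]:
    """Collect gene symbols, skipping entries without one."""
    return [g["symbol"] for g in refs(rec, "genes") if g.get("symbol")]


def is_ok(rec: Dict[str, Any]) -> bool:
    """Assumption: a record is usable when its error field is null."""
    return rec.get("error") is None


if is_ok(record):
    print(record["entryId"], gene_symbols(record), len(refs(record, "drugs")))
```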


✨ Why choose this Actor

| | Capability |
| --- | --- |
| 🗂️ | 18 databases. Pathway, module, KO, compound, glycan, reaction, enzyme, drug, disease, and more. |
| 🔍 | Three query modes. List, find, and get cover discovery, search, and detail pulls. |
| 🧬 | Organism-aware. Filter pathway, module, and KO queries by KEGG organism code. |
| 🔗 | Rich cross-references. Every entry links to related modules, diseases, drugs, genes, compounds. |
| | Fast. 100 entries in under a minute. |
| 🔁 | Always fresh. Every run hits the live KEGG feed. |
| 🚫 | No API key. Public KEGG REST endpoints need no registration. |

📊 KEGG is the foundational systems-biology reference cited in tens of thousands of papers across genomics, metabolomics, and drug discovery.


📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Filters | Setup |
| --- | --- | --- | --- | --- | --- |
| ⭐ KEGG Pathways Scraper (this Actor) | $5 free credit, then pay-per-use | 18 databases | Live per run | List, find, get, organism | ⚡ 2 min |
| Manual REST calls + custom parser | Free | Full | Per-build | Hand-rolled | ⏳ Days |
| KEGGREST R package | Free | Full | Live | Limited | 🐢 Hours |
| Commercial pathway-analysis suites | $$$$/year | Curated subset | Vendor schedule | Vendor-defined | 🕒 Sales cycle |

Pick this Actor when you want a single normalized schema, three query modes, and zero parser maintenance.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the KEGG Pathways Scraper page on the Apify Store.
  3. 🎯 Set input. Pick a database, query mode, and any keyword or ID list.
  4. 🚀 Run it. Click Start and let the Actor collect your data.
  5. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

💊 Drug discovery

  • Drug-target identification across pathways
  • Polypharmacology and side-effect mapping
  • Disease-gene-drug network construction
  • Pathway-level repositioning screens

🧬 Systems biology

  • Pathway enrichment for omics datasets
  • KO orthology mapping across organisms
  • Metabolic network reconstruction
  • Comparative genomics across species

🏭 Metabolic engineering

  • Strain-design enzyme target lookup
  • Flux-balance analysis input data
  • Pathway gap-filling and rerouting
  • Compound and reaction reference tables

🧪 Translational research

  • Disease-pathway annotation for diagnostics
  • Biomarker discovery context
  • Multi-omics integration anchors
  • Curation pipelines for proprietary KBs

🔌 Automating KEGG Pathways Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly refreshes keep downstream knowledge bases in sync automatically.
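As a rough sketch of a programmatic run with the Python `apify-client` package: the Actor ID below is a hypothetical placeholder (copy the real one from the Store page), the token is assumed to live in an `APIFY_TOKEN` environment variable, and `build_run_input` and `run_actor` are illustrative helpers that mirror the input fields documented above.

```python
import os
from typing import Any, Dict, List, Optional

ACTOR_ID = "parseforge/kegg-pathways-scraper"  # hypothetical ID, check the Store page
MODES = {"list", "find", "get"}


def build_run_input(mode: str, database: str, *, organism: Optional[str] = None,
                    query: str = "", max_items: int = 10) -> Dict[str, Any]:
    """Assemble an input object using the documented field names."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if mode in {"find", "get"} and not query:
        raise ValueError(f"{mode} mode needs a query")
    run_input: Dict[str, Any] = {"maxItems": max_items, "mode": mode,
                                 "database": database}
    if organism:
        run_input["organism"] = organism
    if query:
        run_input["query"] = query
    return run_input


def run_actor(run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run, wait for it to finish, and return the dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_actor(build_run_input("find", "drug", query="tuberculosis", max_items=50))` would reproduce the tuberculosis search from the input examples, provided the placeholder ID is replaced and a valid token is set.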


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Pathway and module datasets for systems-biology courses
  • Reproducible studies with cited, versioned KEGG pulls
  • Thesis projects on metabolic engineering and orthology
  • Cross-organism comparative genomics

🎨 Personal and creative

  • Hobbyist bioinformatics blogs and explainers
  • Visualization projects on metabolic networks
  • Educational apps that teach pathway biology
  • Side projects on plant or microbial metabolism

🤝 Non-profit and civic

  • Open-science platforms for global-south researchers
  • Public-health education on disease pathways
  • Citizen-science projects on microbiome metabolism
  • Outreach resources for STEM education

🧪 Experimentation

  • Train pathway-prediction ML models
  • Validate generative-biology tools against KEGG ground truth
  • Prototype agent pipelines that answer pathway questions
  • Build LLM-grounded biology assistants with cited records


❓ Frequently Asked Questions

🧩 How does it work?

Pick a database, a query mode (list, find, get), and any keyword or ID list. The Actor hits the public KEGG endpoint, parses the flat-file response into a clean schema, and emits one structured record per entry.
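For orientation, the three modes correspond to the `list`, `find`, and `get` operations of the public KEGG REST API. The sketch below shows how such request URLs could be built. How the Actor constructs its requests internally is not documented here, so `build_kegg_url` is a hypothetical helper and the mapping is an assumption.

```python
from typing import Optional
from urllib.parse import quote, quote_plus

KEGG_REST = "https://rest.kegg.jp"


def build_kegg_url(mode: str, database: str,
                   organism: Optional[str] = None, query: str = "") -> str:
    """Map a (mode, database, organism, query) input to a KEGG REST URL."""
    if mode == "list":
        # The organism suffix is only meaningful for some databases
        # (this page documents pathway, module, and ko).
        parts = ["list", database] + ([organism] if organism else [])
    elif mode == "find":
        # Spaces between keywords become '+'.
        parts = ["find", database, quote_plus(query)]
    elif mode == "get":
        # IDs are already '+'-joined, so keep '+' unescaped.
        parts = ["get", quote(query, safe="+:")]
    else:
        raise ValueError(f"unknown mode: {mode}")
    return "/".join([KEGG_REST, *parts])


print(build_kegg_url("list", "pathway", organism="hsa"))
print(build_kegg_url("find", "drug", query="tuberculosis"))
print(build_kegg_url("get", "pathway", query="hsa00010+hsa00020"))
```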

📏 How accurate is the data?

Records mirror the live KEGG feed at run time. KEGG is curated by the Kanehisa Laboratories at Kyoto University and is one of the most widely cited bioinformatics resources.

🔁 How often is the dataset refreshed?

KEGG is updated continuously. Every Actor run pulls the latest entries, modules, references, and cross-references at run time.

🧬 What organism codes are supported?

Every KEGG organism code works, including hsa (human), mmu (mouse), rno (rat), eco (E. coli), sce (yeast), dre (zebrafish), and thousands more. See the KEGG organism list for the full set.
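If you want to validate an organism code before starting a run, the KEGG organism list can be turned into a lookup table. The sketch below assumes the tab-separated layout of that list (T number, organism code, name, lineage). The sample rows are illustrative, and `parse_organisms` is a hypothetical helper.

```python
from typing import Dict

# Illustrative rows in the assumed layout: T number, code, name, lineage.
SAMPLE = (
    "T01001\thsa\tHomo sapiens (human)\tEukaryotes;Animals;Vertebrates;Mammals\n"
    "T01002\tmmu\tMus musculus (house mouse)\tEukaryotes;Animals;Vertebrates;Mammals\n"
)


def parse_organisms(text: str) -> Dict[str, str]:
    """Map each organism code to its display name."""
    codes: Dict[str, str] = {}
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) >= 3:  # skip blank or malformed lines
            codes[columns[1]] = columns[2]
    return codes


organisms = parse_organisms(SAMPLE)
print(organisms.get("hsa"))
```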

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval and keep a downstream knowledge base in sync.

💼 Can I use this data commercially?

KEGG offers free academic access, and raw entry retrieval for non-commercial research is generally permitted. Commercial use typically requires a KEGG FTP/API subscription from Pathway Solutions, so review KEGG's licensing terms before redistributing KEGG-derived data or deploying it in a commercial product.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (up to 10 records per run). A paid plan raises the cap to 1,000,000 records per run and gives you access to scheduling, higher concurrency, and larger datasets.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

KEGG Pathways Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe pathway data into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh KEGG data into your product backend, or alert your team in Slack.
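A webhook receiver typically only needs the dataset ID of the finished run. The sketch below assumes the webhook uses Apify's default payload template, in which `eventType` names the event and `resource` carries the run object. `dataset_id_from_webhook` is a hypothetical helper, and the sample body is made up for illustration.

```python
import json
from typing import Optional


def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the default dataset ID for a succeeded run, else None."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or unrelated events
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Made-up body in the assumed default payload shape.
sample_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},
})
print(dataset_id_from_webhook(sample_body))
```

With the dataset ID in hand, the receiver can fetch the items through the Apify API or client and push them into a backend or post a summary to Slack.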


💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by KEGG, Kanehisa Laboratories, Kyoto University, or Pathway Solutions Inc. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.