KEGG Pathways Scraper
Pricing
from $18.00 / 1,000 result items
Export biological pathway data from KEGG, the reference encyclopedia of genes, proteins, and metabolic networks. List, fetch, or search entries from pathway, module, KO, genome, compound, glycan, reaction, enzyme, drug, and disease databases.
Developer
ParseForge

🧬 KEGG Pathways Scraper
🚀 Export biological pathway data from KEGG in seconds. List, search, or fetch entries across 18 KEGG databases (pathway, module, KO, compound, glycan, reaction, enzyme, drug, disease, and more). No login, no API key.
🕒 Last updated: 2026-05-22 · 📊 20 fields per record · 🧬 18 KEGG databases · 🔬 Live KEGG feed · 🔍 3 query modes
The KEGG Pathways Scraper taps KEGG (Kyoto Encyclopedia of Genes and Genomes), the reference resource for understanding biological systems from genomes and pathways to drugs and diseases. The Actor returns 20 structured fields per record, including the entry ID, name, organism, definition, description, classification, cross-references to modules, diseases, drugs, pathways, genes, compounds, reactions, enzymes, and literature citations.
The catalog covers 18 KEGG databases, from pathway and module to compound, glycan, reaction, rclass, enzyme, network, variant, disease, drug, dgroup, plus the organism-specific KO orthology and viral/addendum genomes. This Actor exposes three query modes (list, find, get) so you can browse, search, or pull full detail with the same predictable schema.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Systems biologists, drug-discovery scientists, metabolic engineers, bioinformaticians, computational biologists, pharma research, biotech startups | Pathway enrichment analysis, drug-target identification, metabolic network modelling, disease-gene mapping, ortholog assignment, KEGG-driven LLM grounding |
📋 What the KEGG Pathways Scraper does
Three query workflows across 18 databases in a single Actor:
- 📑 List mode. Enumerate every entry in a database (e.g. all human pathways, all compounds, all drugs).
- 🔎 Find mode. Free-text keyword search across a database (e.g. `glycolysis`, `insulin`, `tuberculosis`).
- 🧾 Get mode. Retrieve full entry detail by ID (e.g. `hsa00010+hsa00020` for human glycolysis and TCA cycle).
- 🧬 Organism filter. Restrict pathway/module/KO queries to one organism via KEGG codes (`hsa` human, `mmu` mouse, `eco` E. coli, `sce` yeast).
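The three modes correspond to the `list`, `find`, and `get` operations of the public KEGG REST service at `https://rest.kegg.jp`. The sketch below shows how an input could be turned into a request URL. It is an illustration under assumptions, not the Actor's actual code: `build_kegg_url` and `ORGANISM_AWARE` are hypothetical names, and the organism suffix is applied only to `pathway` and `module` here because how the Actor applies the filter to KO is not stated.

```python
from urllib.parse import quote

KEGG_REST_BASE = "https://rest.kegg.jp"

# Assumption: only these databases get an organism suffix in this sketch.
ORGANISM_AWARE = {"pathway", "module"}


def build_kegg_url(mode, database, query="", organism=None):
    """Return the KEGG REST URL for one list, find, or get request."""
    if mode == "list":
        if organism and database in ORGANISM_AWARE:
            return f"{KEGG_REST_BASE}/list/{database}/{organism}"
        return f"{KEGG_REST_BASE}/list/{database}"
    if mode == "find":
        if not query:
            raise ValueError("find mode needs a keyword")
        # Keep '+' unescaped so multi-keyword queries stay joined.
        return f"{KEGG_REST_BASE}/find/{database}/{quote(query, safe='+')}"
    if mode == "get":
        if not query:
            raise ValueError("get mode needs one or more '+'-joined IDs")
        # get addresses entries by ID, so the database is not part of the path.
        return f"{KEGG_REST_BASE}/get/{query}"
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `build_kegg_url("get", "pathway", "hsa00010+hsa00020")` yields the URL for the glycolysis and TCA cycle entries.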
Each record includes the entry ID, name, organism, definition, description, classification hierarchy, plus structured cross-references to modules, diseases, drugs, related pathways, genes, compounds, reactions, enzymes, and literature references. The raw KEGG flat-file text is also preserved.
💡 Why it matters: KEGG is the most widely cited pathway resource in systems biology, but its REST API returns plain-text flat files that need a custom parser per database. This Actor delivers a single normalized schema, so you can move from query to dashboard or model in minutes.
🎬 Full Demo
🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | `10` | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| `mode` | enum | `"list"` | One of `list`, `find`, `get`. |
| `database` | enum | `"pathway"` | One of 18 KEGG databases. |
| `organism` | string | `"hsa"` | KEGG organism code. Applies to list mode with pathway/module/ko. |
| `query` | string | `""` | Keyword for find mode, or `+`-joined IDs for get mode. |
Example: list all human pathways.
{"maxItems": 500,"mode": "list","database": "pathway","organism": "hsa"}
Example: fetch full detail for glycolysis and TCA cycle.
{"maxItems": 2,"mode": "get","database": "pathway","query": "hsa00010+hsa00020"}
Example: search for tuberculosis-related drugs.
{"maxItems": 50,"mode": "find","database": "drug","query": "tuberculosis"}
⚠️ Good to Know: KEGG's `get` operation accepts up to 10 IDs per request. The Actor chunks larger ID lists automatically and stitches results back together.
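The chunking described in this note can be sketched in a few lines. This is a minimal illustration of the idea, not the Actor's implementation; `chunk_ids` is a hypothetical helper name.

```python
def chunk_ids(query, size=10):
    """Split a '+'-joined ID string into '+'-joined chunks of at most `size` IDs."""
    ids = [part for part in query.split("+") if part]
    return ["+".join(ids[i:i + size]) for i in range(0, len(ids), size)]
```

Each returned string can be sent as one `get` request, and the per-request results concatenated in order.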
📊 Output
Each record contains 20 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 🆔 entryId | string | "hsa00010" |
| 🗂️ database | string | "pathway" |
| 🏷️ name | string | "Glycolysis / Gluconeogenesis" |
| 🧬 organism | string \| null | "Homo sapiens (human)" |
| 📝 definition | string \| null | "Glycolysis is the process of..." |
| 📖 description | string \| null | "Reference pathway map..." |
| 🗂️ classification | string[] \| null | ["Metabolism", "Carbohydrate metabolism"] |
| 🧱 modules | object[] \| null | [{ "id": "M00001", "name": "..." }] |
| 🩺 diseases | object[] \| null | [{ "id": "H00114", "name": "Hereditary fructose intolerance" }] |
| 💊 drugs | object[] \| null | [{ "id": "D00097", "name": "Metformin" }] |
| 🔗 pathways | object[] \| null | [{ "id": "hsa00020", "name": "Citrate cycle" }] |
| 🧬 genes | object[] \| null | [{ "id": "3098", "symbol": "HK1" }] |
| ⚗️ compounds | object[] \| null | [{ "id": "C00031", "name": "D-Glucose" }] |
| 🔄 reactions | object[] \| null | [{ "id": "R01786", "name": "..." }] |
| 🧪 enzymes | object[] \| null | [{ "id": "2.7.1.1", "name": "Hexokinase" }] |
| 📚 references | object[] \| null | [{ "pmid": "12345678", "title": "..." }] |
| 📄 rawEntry | string \| null | "ENTRY hsa00010 Pathway..." |
| 🔗 url | string | "https://www.kegg.jp/entry/hsa00010" |
| 🕓 scrapedAt | ISO 8601 | "2026-05-22T00:00:00.000Z" |
| ⚠️ error | string \| null | null |
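Because the cross-reference fields are nullable arrays of objects, downstream code usually flattens them before analysis. The sketch below turns records into pathway-to-gene edges, the shape typically needed for enrichment analysis. It assumes only the field names shown in the schema above (`entryId`, `genes`, `id`, `symbol`, `error`); `pathway_gene_edges` is a hypothetical helper.

```python
def pathway_gene_edges(records):
    """Yield (pathway_id, gene_id, gene_symbol) for every gene cross-reference."""
    for record in records:
        if record.get("error"):
            continue  # skip records that carry an error message
        # `genes` may be null in the dataset, so fall back to an empty list.
        for gene in record.get("genes") or []:
            yield (record["entryId"], gene["id"], gene.get("symbol"))
```

The same pattern applies to `compounds`, `drugs`, or `diseases` by swapping the field name.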
📦 Sample records
✨ Why choose this Actor
| | Capability |
|---|---|
| 🗂️ | 18 databases. Pathway, module, KO, compound, glycan, reaction, enzyme, drug, disease, and more. |
| 🔍 | Three query modes. List, find, and get cover discovery, search, and detail pulls. |
| 🧬 | Organism-aware. Filter pathway, module, and KO queries by KEGG organism code. |
| 🔗 | Rich cross-references. Every entry links to related modules, diseases, drugs, genes, compounds. |
| ⚡ | Fast. 100 entries in under a minute. |
| 🔁 | Always fresh. Every run hits the live KEGG feed. |
| 🚫 | No API key. Public KEGG REST endpoints need no registration. |
📊 KEGG is the foundational systems-biology reference cited in tens of thousands of papers across genomics, metabolomics, and drug discovery.
📈 How it compares to alternatives
| Approach | Cost | Coverage | Refresh | Filters | Setup |
|---|---|---|---|---|---|
| ⭐ KEGG Pathways Scraper (this Actor) | $5 free credit, then pay-per-use | 18 databases | Live per run | List, find, get, organism | ⚡ 2 min |
| Manual REST calls + custom parser | Free | Full | Per-build | Hand-rolled | ⏳ Days |
| KEGGREST R package | Free | Full | Live | Limited | 🐢 Hours |
| Commercial pathway-analysis suites | $$$$/year | Curated subset | Vendor schedule | Vendor-defined | 🕒 Sales cycle |
Pick this Actor when you want a single normalized schema, three query modes, and zero parser maintenance.
🚀 How to use
- 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
- 🌐 Open the Actor. Go to the KEGG Pathways Scraper page on the Apify Store.
- 🎯 Set input. Pick a database, query mode, and any keyword or ID list.
- 🚀 Run it. Click Start and let the Actor collect your data.
- 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.
⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.
💼 Business use cases
🔌 Automating KEGG Pathways Scraper
Control the scraper programmatically for scheduled runs and pipeline integrations:
- 🟢 Node.js. Install the `apify-client` NPM package.
- 🐍 Python. Use the `apify-client` PyPI package.
- 📚 See the Apify API documentation for full details.
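As a rough sketch of the Python route, the snippet below builds an input object with the fields from the Input table and starts a run through `apify-client`. Several things here are assumptions: `ACTOR_ID` is a placeholder (copy the real ID from the Actor's page), `build_run_input` and `run_actor` are hypothetical helper names, and the token is expected to come from your own Apify account.

```python
ACTOR_ID = "<username>/<actor-name>"  # placeholder: replace with the real Actor ID

VALID_MODES = {"list", "find", "get"}


def build_run_input(mode, database, query="", organism="hsa", max_items=10):
    """Assemble an input object using the field names from the Input table."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if mode in {"find", "get"} and not query:
        raise ValueError(f"{mode} mode needs a non-empty query")
    return {
        "maxItems": max_items,
        "mode": mode,
        "database": database,
        "organism": organism,
        "query": query,
    }


def run_actor(run_input, token):
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so the input helper works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_actor(build_run_input("find", "drug", query="tuberculosis", max_items=50), token)` would mirror the third input example above.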
The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly or monthly refreshes keep downstream knowledge bases in sync automatically.
🌟 Beyond business use cases
Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.
🤖 Ask an AI assistant about this scraper
Open a ready-to-send prompt about this ParseForge actor in the AI of your choice:
- 💬 ChatGPT
- 🧠 Claude
- 🔍 Perplexity
- 🅒 Copilot
❓ Frequently Asked Questions
🧩 How does it work?
Pick a database, a query mode (list, find, get), and any keyword or ID list. The Actor hits the public KEGG endpoint, parses the flat-file response into a clean schema, and emits one structured record per entry.
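To give a feel for the parsing step, here is a minimal sketch of reading a KEGG flat file. It relies on the general layout of KEGG entries (a keyword in the first 12 columns, indented continuation lines, and `///` closing each entry) and is not the Actor's actual parser; `parse_flat_file` is a hypothetical name, and sub-keywords such as `AUTHORS` are simply treated as their own keys.

```python
def parse_flat_file(text):
    """Group KEGG flat-file lines by keyword, returning one dict per entry."""
    entries = []
    current = {}
    key = None
    for line in text.splitlines():
        if line.startswith("///"):  # entry terminator
            if current:
                entries.append(current)
            current, key = {}, None
            continue
        if not line.strip():
            continue
        head, body = line[:12].strip(), line[12:].strip()
        if head:  # a new keyword starts; blank head means continuation
            key = head
        if key is None:
            continue
        current.setdefault(key, []).append(body)
    if current:  # tolerate a missing final terminator
        entries.append(current)
    return entries
```

Each value is a list of raw line bodies, so a second pass would still be needed to split, say, a `GENE` line into an ID and a symbol.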
📏 How accurate is the data?
Records mirror the live KEGG feed at run time. KEGG is curated by the Kanehisa Laboratories at Kyoto University and is one of the most widely cited bioinformatics resources.
🔁 How often is the dataset refreshed?
KEGG is updated continuously. Every Actor run pulls the latest entries, modules, references, and cross-references at run time.
🧬 What organism codes are supported?
Every KEGG organism code works, including hsa (human), mmu (mouse), rno (rat), eco (E. coli), sce (yeast), dre (zebrafish), and thousands more. See the KEGG organism list for the full set.
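If you want to validate an organism code before starting a run, the KEGG organism list can be loaded into a lookup table. The sketch below assumes the tab-separated layout returned by `https://rest.kegg.jp/list/organism` (T number, organism code, name, lineage); `parse_organism_list` is a hypothetical helper and the layout should be checked against a live response.

```python
def parse_organism_list(text):
    """Map organism code to organism name from tab-separated list text."""
    codes = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        # Assumed column order: T number, code, name, lineage.
        codes[parts[1]] = parts[2]
    return codes
```

A check such as `"hsa" in parse_organism_list(text)` then catches typos before any credit is spent.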
⏰ Can I schedule regular runs?
Yes. Use Apify Schedules to run this Actor on any cron interval and keep a downstream knowledge base in sync.
⚖️ Is this data legal to use?
KEGG offers free academic access. Commercial users should review KEGG's licensing terms before redistribution or productization. Raw entry retrieval for non-commercial research is generally permitted.
💼 Can I use this data commercially?
Commercial use of KEGG typically requires a KEGG FTP/API subscription from Pathway Solutions. Review the terms before deploying KEGG-derived data in a commercial product.
💳 Do I need a paid Apify plan to use this Actor?
No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.
🔁 What happens if a run fails or gets interrupted?
Apify automatically retries transient errors. If a run still fails, you can inspect the log in the Runs tab, fix the input, and re-run. Partial datasets from failed runs are preserved so you never lose progress.
🆘 What if I need help?
Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.
🔌 Integrate with any app
KEGG Pathways Scraper connects to any cloud service via Apify integrations:
- Make - Automate multi-step workflows
- Zapier - Connect with 5,000+ apps
- Slack - Get run notifications in your channels
- Airbyte - Pipe pathway data into your warehouse
- GitHub - Trigger runs from commits and releases
- Google Drive - Export datasets straight to Sheets
You can also use webhooks to trigger downstream actions when a run finishes. Push fresh KEGG data into your product backend, or alert your team in Slack.
🔗 Recommended Actors
- 🧪 PubChem Compound Scraper - NIH chemical compound database
- 🏥 ClinicalTrials.gov Scraper - Global clinical research registry
- 📚 PubMed Scraper - Biomedical literature search
- 🔬 ArXiv Scraper - Preprint research papers
- 📊 GBIF Biodiversity Scraper - Global species occurrence data
💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.
🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.
⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by KEGG, Kanehisa Laboratories, Kyoto University, or Pathway Solutions Inc. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.