Amazon Best Selling Categories Scraper

Pricing

Pay per usage

Discover Amazon categories and subcategories from Best Sellers, New Releases, Most Wished For, Most Gifted, and Movers & Shakers. Export structured breadcrumbs, node keys, and parent chains, plus an optional NODE_TREE JSON.

Developer

Amazon Scraper

Maintained by Community

Amazon Best Sellers Category Tree Scraper

Build an Amazon BrowseNode / category navigation dataset by parsing the left-side navigation (“category tree”) on Amazon list pages.

This actor is designed for category discovery (not product extraction):

  • Discover categories and subcategories up to a chosen depth (maxChildLevels)
  • Emit one dataset row per discovered node (not per visited page)
  • Optionally build a frontend-friendly NODE_TREE JSON in Key-Value Store (nodeTreeEnabled)

Supported locales

  • United States (US): www.amazon.com
  • Australia (AU): www.amazon.com.au
  • Canada (CA): www.amazon.ca
  • France (FR): www.amazon.fr
  • Germany (DE): www.amazon.de
  • India (IN): www.amazon.in
  • Italy (IT): www.amazon.it
  • Japan (JP): www.amazon.co.jp
  • Spain (ES): www.amazon.es
  • United Kingdom (GB): www.amazon.co.uk

Supported roots & URL patterns

Supported “root” list pages (slugs):

  • bestsellers
  • new-releases
  • most-wished-for
  • most-gifted
  • movers-and-shakers

Supported URL shapes:

  • Modern /gp/<rootSlug> pages:
    • https://www.amazon.com/gp/bestsellers
    • https://www.amazon.com/gp/new-releases/digital-text/ref=...
  • Legacy /zgbs/ pages (common on the US site when a URL is copied from the browser):
    • https://www.amazon.com/Best-Sellers/zgbs/ref=...
    • https://www.amazon.com/Best-Sellers-Clothing-Shoes-Jewelry/zgbs/fashion/ref=...
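A minimal sketch of how a start URL could be checked against the locales, roots, and URL shapes listed above. The helper parseListUrl and its return type are hypothetical and are not taken from the actor's source. Mapping /zgbs/ pages to the bestsellers root is an assumption based on the two legacy examples.

```typescript
// Hypothetical helper: the actor's real URL validation is not shown in this README.
const SUPPORTED_HOSTS = new Set([
  "www.amazon.com", "www.amazon.com.au", "www.amazon.ca", "www.amazon.fr",
  "www.amazon.de", "www.amazon.in", "www.amazon.it", "www.amazon.co.jp",
  "www.amazon.es", "www.amazon.co.uk",
]);

const ROOT_SLUGS = [
  "bestsellers", "new-releases", "most-wished-for", "most-gifted", "movers-and-shakers",
];

type ParsedListUrl = {
  domain: string;
  shape: "gp" | "zgbs";
  rootSlug: string;
};

function parseListUrl(raw: string): ParsedListUrl | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not a parseable URL
  }
  if (!SUPPORTED_HOSTS.has(url.hostname)) return null;

  const segments = url.pathname.split("/").filter((s) => s.length > 0);

  // Modern shape: /gp/<rootSlug>[/<category>...]
  if (segments[0] === "gp" && ROOT_SLUGS.includes(segments[1])) {
    return { domain: url.hostname, shape: "gp", rootSlug: segments[1] };
  }

  // Legacy shape: /<Title-Text>/zgbs[/<category>...]
  // Assumption: zgbs pages belong to the bestsellers root.
  if (segments.includes("zgbs")) {
    return { domain: url.hostname, shape: "zgbs", rootSlug: "bestsellers" };
  }
  return null;
}
```

URLs that return null here (unsupported host, product pages, search pages) are the kind of input that would be expected to trigger the "url format is not valid" error described under Troubleshooting.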

Input

Configure the actor run via Apify input:

  • categoryUrls (string[], required): Amazon category URLs to start from.
    • Recommended: provide an array of URLs.
    • Backward-compatible: a newline-separated string is still accepted.
  • maxChildLevels (number): How many child levels to discover (minimum 1).
  • retries (number): Retry count for blocked/empty pages.
  • nodeTreeEnabled (boolean): When enabled, the actor reads the default dataset after the crawl finishes and generates a domain-keyed category NodeTree JSON, saved to the default key-value store under the key NODE_TREE.
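The input fields above can be written down as a type, together with a sketch of how the two accepted categoryUrls forms (array or newline-separated string) might be reduced to one. The field names come from the list above. The ActorInput type name and the normalizeCategoryUrls helper are illustrative assumptions, not the actor's code.

```typescript
// Field names follow the input list; optionality of the last three is an assumption.
type ActorInput = {
  categoryUrls: string[] | string; // array (recommended) or newline-separated string
  maxChildLevels?: number;         // minimum 1
  retries?: number;
  nodeTreeEnabled?: boolean;
};

// Hypothetical helper: accept both forms and return a clean string array.
function normalizeCategoryUrls(value: string[] | string): string[] {
  const parts = Array.isArray(value) ? value : value.split(/\r?\n/);
  return parts
    .map((url) => url.trim())
    .filter((url) => url.length > 0); // drop blank lines and empty entries
}
```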

Example input

{
  "categoryUrls": [
    "https://www.amazon.com/gp/bestsellers",
    "https://www.amazon.co.jp/gp/bestsellers",
    "https://www.amazon.es/gp/bestsellers",
    "https://www.amazon.it/gp/bestsellers",
    "https://www.amazon.fr/gp/bestsellers",
    "https://www.amazon.de/gp/bestsellers",
    "https://www.amazon.ca/gp/bestsellers",
    "https://www.amazon.com.au/gp/bestsellers",
    "https://www.amazon.in/gp/bestsellers"
  ],
  "maxChildLevels": 2,
  "nodeTreeEnabled": true,
  "retries": 5
}

Depth semantics (important)

Key idea: requests are pages, dataset rows are nodes.

  • A visited page has a currentLevel (input URLs are level 0)
  • The page contains a parsed children list (those are the discovered nodes for the next level)
  • The actor emits dataset rows for those child nodes

Practical expectations (one input URL):

  • maxChildLevels: 1
    • Requests: level 0 only (usually 1 page)
    • Dataset rows: level 1 nodes (often ~40 on US Best Sellers)
  • maxChildLevels: 2
    • Requests: level 0 + level 1 pages
    • Dataset rows: level 1 + level 2 nodes
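The relationship between requests and rows can be illustrated with a small simulation over made-up data. The toyNavigation map and simulateCrawl function below are hypothetical and only model the counting rule described above. They do not reproduce the actor's crawler.

```typescript
// Toy in-memory "site": each page key maps to the child names its navigation lists.
// The data is invented for illustration only.
const toyNavigation: Record<string, string[]> = {
  "bestsellers": ["books", "music"],
  "bestsellers/books": ["fiction", "history"],
  "bestsellers/music": ["jazz"],
};

type CrawlCounts = { requests: number; rows: number };

function simulateCrawl(startKey: string, maxChildLevels: number): CrawlCounts {
  let requests = 0;
  let rows = 0;
  let frontier = [startKey]; // pages at the current level; input URLs are level 0

  for (let currentLevel = 0; currentLevel < maxChildLevels; currentLevel++) {
    const next: string[] = [];
    for (const pageKey of frontier) {
      requests += 1; // one request per visited page
      const children = toyNavigation[pageKey] ?? [];
      rows += children.length; // one dataset row per discovered child node
      for (const child of children) next.push(`${pageKey}/${child}`);
    }
    frontier = next;
  }
  return { requests, rows };
}
```

With this toy data, maxChildLevels 1 visits only the level 0 page and emits its 2 children, while maxChildLevels 2 visits 3 pages (level 0 plus two level 1 pages) and emits 5 rows (2 level 1 nodes plus 3 level 2 nodes).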

Where to get categoryUrls

Use “bestseller-style” list pages on Amazon, for example:

  • All categories entry:
    • https://www.amazon.com/gp/bestsellers
  • A specific department entry:
    • https://www.amazon.com/gp/bestsellers/digital-text
  • A different root:
    • https://www.amazon.com/gp/new-releases/digital-text

Output

The actor stores results in the default dataset as one row per discovered node:

  • Fields: domain, depth, breadcrumbs, category, parentNodeKey, parents, nodeKey

Breadcrumb semantics:

  • breadcrumbs is the parent path (does not include the current category)
  • The separator is >
  • parents is the ordered breadcrumb chain with frontend-friendly IDs (root:*, slug:*, browseNode:*)

Example dataset row:

{
  "domain": "www.amazon.com",
  "depth": 1,
  "breadcrumbs": "Any Department",
  "category": "Digital Music",
  "parentNodeKey": "bestsellers",
  "parents": [
    {
      "name": "Any Department",
      "nodeId": "root:bestsellers",
      "nodeKey": "bestsellers"
    }
  ],
  "nodeKey": "bestsellers/dmusic"
}
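Because breadcrumbs holds only the parent path, a consumer that wants the full path has to append category itself. The sketch below types a row after the example above and rebuilds the path. DatasetRow, ParentRef, and fullPath are illustrative names, and splitting on ">" with trimming is an assumption about the exact spacing around the separator.

```typescript
type ParentRef = { name: string; nodeId: string; nodeKey: string };

// Shape inferred from the example row; not an official schema.
type DatasetRow = {
  domain: string;
  depth: number;
  breadcrumbs: string;
  category: string;
  parentNodeKey: string;
  parents: ParentRef[];
  nodeKey: string;
};

// Returns the parent names followed by the row's own category name.
function fullPath(row: DatasetRow): string[] {
  const parentNames = row.breadcrumbs
    .split(">")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  return [...parentNames, row.category];
}
```

If a category name could itself contain ">", reading the names from the parents array would be the safer source for the parent path.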

About NODE_TREE

If you want a ready-to-render navigation structure (tree/cascader), enable:

  • nodeTreeEnabled: true

At the end of the run, the actor writes a multi-root NODE_TREE document to the default key-value store under the key NODE_TREE.

Node ID conventions (important)

To support multiple roots (Best Sellers / New Releases / ...) and cross-root merging, nodes use namespaced IDs:

  • Root nodes: root:<rootSlug>
    • e.g. root:bestsellers, root:new-releases
  • Slug nodes: slug:<slug>
    • e.g. slug:digital-text
  • BrowseNode nodes (digits only): browseNode:<id>
    • e.g. browseNode:8624102011

Benefits:

  • Globally unique IDs (prevents slug vs numeric collisions)
  • The same slug can exist under multiple roots and still be treated as the same node (via parentIds / childrenByRoot)
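The namespacing rule can be sketched as a pair of small functions: one that builds an ID from a path segment and one that splits an ID back into its parts. toNodeId and parseNodeId are hypothetical helper names. Only the three prefixes and the digits-only rule come from the conventions above.

```typescript
type IdType = "root" | "slug" | "browseNode";

// Build a namespaced ID: roots get "root:", digits-only segments get "browseNode:",
// everything else is treated as a slug.
function toNodeId(segment: string, isRoot: boolean): string {
  if (isRoot) return `root:${segment}`;
  return /^\d+$/.test(segment) ? `browseNode:${segment}` : `slug:${segment}`;
}

// Split a namespaced ID back into its type and raw value; null if the prefix is unknown.
function parseNodeId(id: string): { idType: IdType; value: string } | null {
  const match = /^(root|slug|browseNode):(.+)$/.exec(id);
  if (match === null) return null;
  return { idType: match[1] as IdType, value: match[2] };
}
```

Keeping the prefix in the ID is what prevents a numeric BrowseNode and a slug with the same characters from being confused with each other.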

Node fields

type Node = {
  id: string
  idType: 'root' | 'slug' | 'browseNode'
  title: string
  kind: 'root' | 'department' | 'category'
  level: number
  parentIds: string[]
  childrenByRoot: Record<string, string[]>
  nodeKeyByRoot: Record<string, string>
  urlByRoot: Record<string, string>
}

Semantics:

  • id / idType: unique node identity (see namespaced IDs above).
  • title: display label, derived from breadcrumbs/category when possible; falls back to the raw ID suffix.
  • kind:
    • root: root list node (e.g. root:bestsellers)
    • department: a node whose minimal observed level is 1 under any root (e.g. slug:digital-text)
    • category: all deeper nodes
  • level: the minimal observed path depth across all roots (based on nodeKeyByRoot).
    • For rendering, prefer using traversal depth under the chosen rootId.
  • parentIds: the set of direct parent node IDs (multi-parent is allowed).
    • Example: slug:digital-text could have both root:bestsellers and root:new-releases as parents.
  • childrenByRoot[rootId]: the direct children list scoped to a root view.
    • This is the recommended field for frontend tree rendering.
    • Root scoping avoids mixing children from different roots.
  • nodeKeyByRoot[rootId]: canonical nodeKey for this node under a given root (useful for URL building and UI instance keys).
  • urlByRoot[rootId]: fully built Amazon URL for that root (based on https://<domain>/gp/<nodeKey>).
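A sketch of root-scoped rendering using childrenByRoot, as recommended above. It assumes the nodes for one domain are available as a flat map from node ID to node, and that a root node lists its own children under its own ID. The exact layout of the NODE_TREE document is not shown in this README, so both points are assumptions. TreeNode mirrors the Node type and is renamed here only to avoid clashing with the DOM's global Node type. walkRoot is a hypothetical helper.

```typescript
type TreeNode = {
  id: string;
  idType: "root" | "slug" | "browseNode";
  title: string;
  kind: "root" | "department" | "category";
  level: number;
  parentIds: string[];
  childrenByRoot: Record<string, string[]>;
  nodeKeyByRoot: Record<string, string>;
  urlByRoot: Record<string, string>;
};

type RenderedLine = { depth: number; title: string; url: string | undefined };

// Depth-first walk of one root view; depth is the traversal depth under rootId,
// not the node's stored "level".
function walkRoot(nodes: Record<string, TreeNode>, rootId: string): RenderedLine[] {
  const lines: RenderedLine[] = [];
  const visit = (id: string, depth: number, seen: Set<string>): void => {
    const node = nodes[id];
    if (node === undefined || seen.has(id)) return; // skip dangling IDs and cycles
    lines.push({ depth, title: node.title, url: node.urlByRoot[rootId] });
    const nextSeen = new Set(seen).add(id);
    for (const childId of node.childrenByRoot[rootId] ?? []) {
      visit(childId, depth + 1, nextSeen);
    }
  };
  visit(rootId, 0, new Set<string>());
  return lines;
}
```

Reading children only from childrenByRoot[rootId] keeps a node that also appears under another root from pulling that other root's children into the rendered view.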

Use Cases

AI Knowledge Bases (e.g. Gemini Gems, ChatGPT GPTs)

Feed the NODE_TREE JSON directly into an AI knowledge base to give your custom AI assistant a structured understanding of Amazon's product taxonomy. This lets users ask natural-language questions like "what subcategories exist under Electronics?" and get accurate, structured answers.

How to set it up:

  1. Run the actor with nodeTreeEnabled: true to generate the NODE_TREE JSON.
  2. Download the JSON from the Key-Value Store.
  3. Upload it as a knowledge source in your Gem / GPT / RAG pipeline.

E-commerce Market Research

Map the full category hierarchy across multiple Amazon locales to identify market gaps, compare category structures between regions, or track how Amazon reorganizes its taxonomy over time.

Product Listing & SEO Tools

Use the category tree to build browse-node pickers in seller tools, auto-suggest the correct Amazon category path for a product, or validate that a listing is placed in the right node.

Price & Trend Monitoring

Use discovered category URLs as seeds for downstream scrapers that track bestseller rankings, pricing trends, or new-release velocity within specific categories.
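As a rough sketch, seed URLs for a downstream scraper could be derived from dataset rows by applying the https://<domain>/gp/<nodeKey> pattern described for urlByRoot. Applying that pattern to the dataset's nodeKey field is an assumption, and SeedRow and toSeedUrls are hypothetical names.

```typescript
// Only the fields needed for URL building; see the dataset row example for the full shape.
type SeedRow = { domain: string; nodeKey: string; depth: number };

// Build one URL per row up to maxDepth, removing duplicates while keeping order.
function toSeedUrls(rows: SeedRow[], maxDepth: number): string[] {
  const urls = rows
    .filter((row) => row.depth <= maxDepth)
    .map((row) => `https://${row.domain}/gp/${row.nodeKey}`);
  return Array.from(new Set(urls));
}
```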

Catalog Enrichment & Classification

Map your internal product catalog to Amazon's browse node IDs to enrich product metadata, improve search relevance, or power recommendation engines.


Troubleshooting

  • “Input is empty or url format is not valid”: Provide at least one categoryUrls entry, and make sure each URL matches one of the supported list patterns.
  • 404 pages: Missing categories are logged as warnings during the run.