Shopee Scraper

Pricing

$10.00 / 1,000 results

A Puppeteer-based Apify Actor that scrapes product listings from [Shopee Vietnam](https://shopee.vn). It supports **search pages** and **category pages**, and automatically paginates through results up to a configurable item limit.

Developer

Tin

Maintained by Community


Features

  • Scrapes search results (/search?keyword=...) and category pages (/category/...)
  • Handles both Shopee API response shapes:
    • /api/v4/search/search_items — keyword search
    • /api/v4/recommend/recommend_v2 — category landing pages
  • Automatic pagination via next-page URL detection (no click simulation)
  • Respects a maxItems limit and stops crawling once reached
  • Cookie-based login support for authenticated scraping
  • Prices normalised from Shopee micros (÷ 1,000,000) to real values
  • Full CDN image URLs constructed automatically
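
As a rough illustration of how the two API response shapes listed above might be told apart, the sketch below classifies an intercepted response URL by its endpoint path. The endpoint paths come from the feature list; the function name `classifyApiResponse` and the `'search'` / `'category'` labels are hypothetical and are not taken from the Actor's source.

```javascript
// Endpoint paths are from the feature list; the labels are illustrative only.
const API_ENDPOINTS = [
  { path: '/api/v4/search/search_items', kind: 'search' },
  { path: '/api/v4/recommend/recommend_v2', kind: 'category' },
];

// Returns 'search', 'category', or null for any other response URL.
function classifyApiResponse(responseUrl) {
  const { pathname } = new URL(responseUrl);
  const match = API_ENDPOINTS.find((endpoint) => pathname.startsWith(endpoint.path));
  return match ? match.kind : null;
}
```

A response handler could call this on every response URL and ignore anything that returns `null`, so that only the two known endpoints are parsed.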

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| startUrls | array | `[{ url: "https://shopee.vn/Men-Clothes-cat.11035567" }]` | URLs to start scraping from (search or category pages) |
| maxItems | integer | 10 | Maximum number of items to collect before stopping |
| loginCookies | array | | Browser cookies for an authenticated session (optional) |

Example input (INPUT.json)

    {
      "startUrls": [
        { "url": "https://shopee.vn/search?category=11036030&keyword=xiaomi" },
        { "url": "https://shopee.vn/Men-Clothes-cat.11035567" }
      ],
      "maxItems": 100
    }
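
The sketch below shows one way the documented defaults could be applied to a raw input object. It assumes the defaults are filled in by code; the Actor itself may instead rely on its input schema. The names `DEFAULT_INPUT` and `resolveInput` are hypothetical.

```javascript
// Default values are copied from the input table; the names are illustrative.
const DEFAULT_INPUT = {
  startUrls: [{ url: 'https://shopee.vn/Men-Clothes-cat.11035567' }],
  maxItems: 10,
};

// Fills in missing or invalid fields with the documented defaults.
function resolveInput(rawInput) {
  const input = rawInput ?? {};
  const startUrls = Array.isArray(input.startUrls) && input.startUrls.length > 0
    ? input.startUrls
    : DEFAULT_INPUT.startUrls;
  const maxItems = Number.isInteger(input.maxItems) && input.maxItems > 0
    ? input.maxItems
    : DEFAULT_INPUT.maxItems;
  // loginCookies is optional, so an empty array stands in when it is absent.
  const loginCookies = Array.isArray(input.loginCookies) ? input.loginCookies : [];
  return { startUrls, maxItems, loginCookies };
}
```

With the example input above, `resolveInput` would keep both start URLs, set `maxItems` to 100, and return an empty `loginCookies` array.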

Output

Each item in the dataset represents one product listing with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| url | string | Page URL the item was scraped from |
| itemUrl | string | Direct product URL (`https://shopee.vn/product/{shopid}/{itemid}`) |
| itemid | number | Shopee item ID |
| shopid | number | Shopee shop ID |
| name | string | Product name |
| brand | string\|null | Brand name |
| shop_name | string\|null | Shop name |
| shop_location | string\|null | Shop location |
| currency | string | Currency code (e.g. VND) |
| price | number | Current price (in currency units) |
| price_min | number | Minimum variant price |
| price_max | number | Maximum variant price |
| price_before_discount | number\|null | Original price before discount |
| discount | string\|null | Discount percentage (e.g. "8%") |
| sold | number | Monthly sold count |
| historical_sold | number | All-time sold count |
| liked_count | number | Number of likes/favourites |
| cmt_count | number\|null | Number of reviews |
| rating_star | number\|null | Average star rating |
| stock | number\|null | Available stock |
| item_status | string | Status (e.g. "normal") |
| catid | number | Category ID |
| image | string\|null | Main image URL |
| images | string[] | All image URLs |
| is_official_shop | boolean\|null | Whether the seller is an official shop |
| is_on_flash_sale | boolean\|null | Whether the item is on flash sale |
| can_use_cod | boolean\|null | Cash on delivery availability |
| ctime | number | Listing creation timestamp (Unix) |
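
To make the `itemUrl` pattern and the nullable fields concrete, here is a minimal sketch that builds a few of the output fields from a raw item. It assumes the raw item uses the same key names as the output table, which may not match the real API payload; `buildItemUrl` and `toListing` are hypothetical helper names.

```javascript
// Builds the direct product URL from the pattern given in the output table.
function buildItemUrl(shopid, itemid) {
  return `https://shopee.vn/product/${shopid}/${itemid}`;
}

// Maps a raw item to a subset of the output fields.
// Assumption: raw keys match the output names; optional fields become null.
function toListing(raw, pageUrl) {
  return {
    url: pageUrl,
    itemUrl: buildItemUrl(raw.shopid, raw.itemid),
    itemid: raw.itemid,
    shopid: raw.shopid,
    name: raw.name,
    brand: raw.brand ?? null,
    shop_location: raw.shop_location ?? null,
  };
}
```

Using `?? null` keeps missing optional fields explicit in the dataset instead of leaving them undefined, which matches the `string|null` types in the table.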

How it works

  1. Input is read via Actor.getInput(). Start URLs and maxItems are extracted.
  2. Cookies are injected into each page before navigation to maintain a logged-in session.
  3. XHR responses are intercepted — the crawler listens for responses from:
    • /api/v4/search/search_items (search pages)
    • /api/v4/recommend/recommend_v2 (category pages)
  4. Items are extracted and normalised from whichever API response shape is present, then pushed to the dataset.
  5. Pagination: after each page, the next-page button's href is read from the DOM. If the href is absent or points to "/", the last page has been reached and crawling stops. Otherwise, the next URL is enqueued.
  6. maxItems enforcement: the item count is tracked globally across all pages, and pagination stops as soon as the limit is reached.
  7. Cookies are saved after each page so the session stays fresh across requests.
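
The pagination and maxItems rules in steps 5 and 6 can be sketched as a single decision function. This is an illustration under stated assumptions, not the Actor's actual code: the name `decideNextStep` and the shape of its argument and return value are hypothetical.

```javascript
// Decides whether to stop or which URL to enqueue next.
// nextHref: href read from the next-page button (may be null or "/").
// collected: number of items pushed so far; maxItems: the configured limit.
function decideNextStep({ nextHref, currentUrl, collected, maxItems }) {
  if (collected >= maxItems) {
    return { stop: true, reason: 'maxItems reached' };
  }
  if (!nextHref || nextHref === '/') {
    return { stop: true, reason: 'last page reached' };
  }
  // Resolve a relative href against the current page URL.
  return { stop: false, nextUrl: new URL(nextHref, currentUrl).href };
}

// Trims a batch of items so the global count never exceeds maxItems.
function takeWithinLimit(items, collected, maxItems) {
  return items.slice(0, Math.max(0, maxItems - collected));
}
```

Checking the limit before inspecting the href means a run that has already collected enough items never enqueues another page, even when a next page exists.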

Running locally

    # Install dependencies
    npm install
    # Run with local storage (storage/ directory)
    apify run
    # Run and purge previous local storage first
    apify run --purge

Deploy to Apify

    # Authenticate
    apify login
    # Push and deploy
    apify push

Notes

  • Shopee's API returns prices as integers in micro-units (the real price × 1,000,000). The scraper divides by 1,000,000 to produce real currency values.

  • Image IDs returned by the API are converted to full CDN URLs using https://down-vn.img.susercontent.com/file/.

  • Proxy configuration (Apify Residential, Vietnam) is defined in main.js and can be enabled by uncommenting the proxyConfiguration option in the crawler.

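
The price and image conversions described in the notes can be sketched as two small helpers. The divisor and the CDN base URL are taken from the notes above; the function names and the sample values in the usage note are hypothetical.

```javascript
// Divisor and base URL are from the notes; the helper names are illustrative.
const PRICE_DIVISOR = 1_000_000;
const IMAGE_BASE_URL = 'https://down-vn.img.susercontent.com/file/';

// Converts a raw micro-unit price to a real currency value (null stays null).
function toRealPrice(rawPrice) {
  return rawPrice == null ? null : rawPrice / PRICE_DIVISOR;
}

// Converts an image ID to a full CDN URL (null when no ID is present).
function toImageUrl(imageId) {
  return imageId ? `${IMAGE_BASE_URL}${imageId}` : null;
}
```

For example, a raw price of 150000000000 would map to 150000, and a placeholder image ID `abc123` would map to `https://down-vn.img.susercontent.com/file/abc123`.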