Shopee Scraper (Login Required)

Pricing

$10.00 / 1,000 results

A Puppeteer-based Apify Actor that scrapes product listings from [Shopee Vietnam](https://shopee.vn). Supports **search pages**, **category pages**, and automatically paginates through results up to a configurable item limit.

Rating: 0.0 (0)
Developer: Tin (Maintained by Community)
Actor stats: 0 bookmarked, 17 total users, 9 monthly active users, last modified 22 days ago

Shopee Scraper

Features

  • Scrapes search results (/search?keyword=...) and category pages (/category/...)
  • Handles both Shopee API response shapes:
    • /api/v4/search/search_items — keyword search
    • /api/v4/recommend/recommend_v2 — category landing pages
  • Automatic pagination via next-page URL detection (no click simulation)
  • Respects a maxItems limit and stops crawling once reached
  • Cookie-based login support for authenticated scraping
  • Prices normalised from Shopee micros (÷ 1,000,000) to real values
  • Full CDN image URLs constructed automatically
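To make the two response shapes concrete, here is a minimal sketch of how an intercepted response could be classified by endpoint and reduced to a flat item array. The endpoint paths come from the list above; the payload layouts and the function names classifyResponse and extractRawItems are assumptions for illustration and are not confirmed by this README.

```javascript
// Endpoint paths as named in the feature list.
const ENDPOINTS = {
  search: '/api/v4/search/search_items',
  category: '/api/v4/recommend/recommend_v2',
};

// Returns 'search', 'category', or null for any other URL.
function classifyResponse(url) {
  const { pathname } = new URL(url);
  if (pathname.startsWith(ENDPOINTS.search)) return 'search';
  if (pathname.startsWith(ENDPOINTS.category)) return 'category';
  return null;
}

// ASSUMED payload layouts (hypothetical, verify against a real capture):
//   search:   { items: [{ item_basic: {...} }, ...] }
//   category: { data: { sections: [{ data: { item: [...] } }] } }
function extractRawItems(kind, body) {
  if (kind === 'search') {
    return (body?.items ?? []).map((entry) => entry.item_basic ?? entry);
  }
  if (kind === 'category') {
    return (body?.data?.sections ?? []).flatMap((s) => s?.data?.item ?? []);
  }
  return [];
}
```

Keeping classification separate from extraction means a third response shape could be supported by adding one branch without touching the interception logic.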

Input

Field         Type     Default                                                  Description
startUrls     array    [{ url: "https://shopee.vn/Men-Clothes-cat.11035567" }]  URLs to start scraping from (search or category pages)
maxItems      integer  10                                                       Maximum number of items to collect before stopping
loginCookies  array    (none)                                                   Browser cookies for authenticated session. Short-lived; re-capture before every run (see Authentication)
extraHeaders  object   {}                                                       Anti-bot headers (x-sap-sec, x-sap-ri, sz-token, d-nonptcha-sync, af-ac-enc-*). Same short TTL; re-capture before every run

Example input (INPUT.json)

{
    "startUrls": [
        { "url": "https://shopee.vn/search?category=11036030&keyword=xiaomi" },
        { "url": "https://shopee.vn/Men-Clothes-cat.11035567" }
    ],
    "maxItems": 100
}
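
As an illustration of how the documented defaults could be applied to a partial input such as the one above, here is a minimal sketch. The function name resolveInput, the validation rules, and the empty-array default for loginCookies are assumptions; the Actor's real handling in main.js may differ.

```javascript
// Defaults taken from the input table; loginCookies has no documented
// default, so the empty array here is an assumption.
const DEFAULT_INPUT = {
  startUrls: [{ url: 'https://shopee.vn/Men-Clothes-cat.11035567' }],
  maxItems: 10,
  loginCookies: [],
  extraHeaders: {},
};

// Merge user input over the defaults and reject obviously invalid values.
function resolveInput(input) {
  const merged = { ...DEFAULT_INPUT, ...(input ?? {}) };
  if (!Array.isArray(merged.startUrls) || merged.startUrls.length === 0) {
    throw new Error('startUrls must be a non-empty array');
  }
  if (!Number.isInteger(merged.maxItems) || merged.maxItems < 1) {
    throw new Error('maxItems must be a positive integer');
  }
  return merged;
}
```

In an Actor this would typically wrap the value returned by Actor.getInput(), so the rest of the crawler never has to deal with missing fields.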

Output

Each item in the dataset represents one product listing with the following fields:

Field                  Type           Description
url                    string         Page URL the item was scraped from
itemUrl                string         Direct product URL (https://shopee.vn/product/{shopid}/{itemid})
itemid                 number         Shopee item ID
shopid                 number         Shopee shop ID
name                   string         Product name
brand                  string|null    Brand name
shop_name              string|null    Shop name
shop_location          string|null    Shop location
currency               string         Currency code (e.g. VND)
price                  number         Current price (in currency units)
price_min              number         Minimum variant price
price_max              number         Maximum variant price
price_before_discount  number|null    Original price before discount
discount               string|null    Discount percentage (e.g. "8%")
sold                   number         Monthly sold count
historical_sold        number         All-time sold count
liked_count            number         Number of likes/favourites
cmt_count              number|null    Number of reviews
rating_star            number|null    Average star rating
stock                  number|null    Available stock
item_status            string         Status (e.g. "normal")
catid                  number         Category ID
image                  string|null    Main image URL
images                 string[]       All image URLs
is_official_shop       boolean|null   Whether the seller is an official shop
is_on_flash_sale       boolean|null   Whether item is on flash sale
can_use_cod            boolean|null   Cash on delivery availability
ctime                  number         Listing creation timestamp (Unix)
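
As a downstream usage example, a consumer of the dataset might want to cross-check the discount field against price and price_before_discount. The sketch below is hypothetical: the function name discountLabel and the round-to-nearest-percent rule are assumptions, and Shopee's own rounding may differ.

```javascript
// Recompute a discount label such as "8%" from two price fields.
// Returns null when there is no original price or no actual reduction.
function discountLabel(item) {
  const { price, price_before_discount: before } = item;
  if (before == null || before <= 0 || price >= before) return null;
  return `${Math.round((1 - price / before) * 100)}%`;
}
```

Comparing this value with the scraped discount string is a cheap sanity check that the price normalisation was applied consistently to both fields.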

How it works

  1. Input is read via Actor.getInput(). Start URLs and maxItems are extracted.
  2. Cookies are injected into each page before navigation to maintain a logged-in session.
  3. XHR responses are intercepted — the crawler listens for responses from:
    • /api/v4/search/search_items (search pages)
    • /api/v4/recommend/recommend_v2 (category pages)
  4. Items are extracted and normalised from whichever API response shape is present, then pushed to the dataset.
  5. Pagination — after each page, the next-page button's href is read from the DOM. If the href points to "/" or is absent, crawling stops (last page reached). Otherwise, the next URL is enqueued.
  6. maxItems enforcement: the number of collected items is tracked globally across all pages, and pagination stops as soon as the limit is reached.
  7. Cookies are saved after each page so the session stays fresh across requests.
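
Steps 5 and 6 above can be sketched as two small helpers: one decides whether a next-page href should be followed, and the other enforces the global item limit. The names nextPageUrl and createLimiter are hypothetical and this is only one way to express the rules, not necessarily how main.js implements them.

```javascript
// Resolve the next-page href against the current page URL.
// Returns null when the href is absent or points to "/" (last page).
function nextPageUrl(href, currentUrl) {
  if (!href) return null;
  const next = new URL(href, currentUrl);
  if (next.pathname === '/') return null;
  return next.href;
}

// One counter shared by every page handler. take() returns only the items
// that still fit under maxItems; reached() tells the caller to stop enqueueing.
function createLimiter(maxItems) {
  let collected = 0;
  return {
    take(items) {
      const room = Math.max(0, maxItems - collected);
      const accepted = items.slice(0, room);
      collected += accepted.length;
      return accepted;
    },
    reached: () => collected >= maxItems,
  };
}
```

A page handler would push limiter.take(items) to the dataset and enqueue nextPageUrl(...) only when the result is non-null and limiter.reached() is still false.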

Authentication

Shopee's API protects every request with two short-lived layers of credentials:

  1. Session cookies (csrftoken, SPC_F, SPC_U, SPC_ST, SPC_EC, SPC_T_ID, etc.) — bound to the active browser session.
  2. Signed anti-bot headers (x-sap-sec, x-sap-ri, sz-token, af-ac-enc-dat, af-ac-enc-sz-token, d-nonptcha-sync) — recomputed by Shopee's frontend JavaScript on every request.

⚠️ These tokens expire within minutes, not hours. Capture a fresh set immediately before every run. Reusing tokens from a previous session will return error: 90309999 (anti-bot rejection) with no items.
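
A run can detect this failure early instead of silently producing an empty dataset. The sketch below assumes the code appears as a top-level error field in the JSON response body, which is inferred from the message above rather than confirmed; isAntiBotRejection is a hypothetical helper name.

```javascript
// Error code quoted in the warning above.
const ANTI_BOT_ERROR = 90309999;

// True when a parsed response body carries the anti-bot rejection code.
// ASSUMPTION: the code sits in a top-level `error` field.
function isAntiBotRejection(body) {
  return Number(body?.error) === ANTI_BOT_ERROR;
}
```

When this returns true, the practical response is to stop the run and capture fresh credentials, since retrying with the same tokens is unlikely to succeed.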

How to capture credentials

  1. Open the target Shopee search/category page in a logged-in Chrome/Edge session (e.g. https://shopee.vn/search?keyword=xiaomi&category=11036030).
  2. Open DevTools → Network tab → filter for search_items.
  3. Trigger the request (refresh or scroll).
  4. Right-click the search_items row → Copy → Copy as cURL (bash) (or Copy as fetch for JS).
  5. From the captured request, populate the actor input:
    • startUrls[0].url — the full API URL from the cURL (the /api/v4/search/search_items?... line).
    • extraHeaders — every -H header except the HTTP/2 pseudo-headers (:authority, :method, :path, :scheme) and accept-encoding (let the HTTP layer handle compression).
    • loginCookies — either parse the cookie: header into a [{name, value}, ...] array, or just paste the full cookie string into extraHeaders.cookie (overrides the array).
  6. Run within 60 seconds. Any longer and the signed tokens will have rotated and Shopee will reject the request.
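
Step 5 involves two mechanical conversions that are easy to get wrong by hand: turning a cookie: header into the [{name, value}, ...] array and dropping the excluded headers. A minimal sketch follows; the helper names parseCookieHeader and filterHeaders are hypothetical and not part of the Actor.

```javascript
// Headers that step 5 says to leave out of extraHeaders.
const SKIPPED_HEADERS = new Set([
  ':authority', ':method', ':path', ':scheme', 'accept-encoding',
]);

// "a=1; b=2" -> [{ name: 'a', value: '1' }, { name: 'b', value: '2' }]
// Only the first "=" separates name from value, so values may contain "=".
function parseCookieHeader(cookieHeader) {
  return cookieHeader
    .split(';')
    .map((pair) => pair.trim())
    .filter(Boolean)
    .map((pair) => {
      const eq = pair.indexOf('=');
      return eq === -1
        ? { name: pair, value: '' }
        : { name: pair.slice(0, eq), value: pair.slice(eq + 1) };
    });
}

// Drop the excluded headers (case-insensitively) and keep everything else.
function filterHeaders(headers) {
  return Object.fromEntries(
    Object.entries(headers).filter(([name]) => !SKIPPED_HEADERS.has(name.toLowerCase())),
  );
}
```

The output of parseCookieHeader can be pasted into loginCookies, and the output of filterHeaders into extraHeaders. Because of the 60-second window, scripting this step is likely worth the effort.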

Why the URL must come from the same capture

x-sap-sec is a signature over the request URL. The captured startUrls[0].url includes the original extra_params and view_session_id UUIDs that were signed — changing them invalidates the signature. Don't rewrite the URL; paste it verbatim.


Notes

  • Shopee stores prices internally as integer micros (the real value multiplied by 1,000,000). The scraper divides by 1,000,000 to produce real currency values.

  • Image IDs returned by the API are converted to full CDN URLs using https://down-vn.img.susercontent.com/file/.

  • Proxy configuration (Apify Residential, Vietnam) is defined in main.js and can be enabled by uncommenting the proxyConfiguration option in the crawler.
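
The price and image conversions described in these notes can be sketched as two small helpers. The divisor and the CDN base URL are taken from the notes; the function names and the null handling are assumptions for illustration.

```javascript
// Values stated in the notes above.
const MICROS_PER_UNIT = 1e6;
const IMAGE_BASE = 'https://down-vn.img.susercontent.com/file/';

// Convert a price in micros to currency units; pass missing values through as null.
function priceFromMicros(micros) {
  return micros == null ? null : micros / MICROS_PER_UNIT;
}

// Build a full CDN URL from an image ID; null when the ID is missing or empty.
function imageUrl(imageId) {
  return imageId ? `${IMAGE_BASE}${imageId}` : null;
}
```

For example, under these assumptions a raw price of 92000000000 becomes 92000 in currency units, and an image ID is simply appended to the CDN base.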