Shopee Scraper (Login Required)
Pricing
$10.00 / 1,000 results
A Puppeteer-based Apify Actor that scrapes product listings from [Shopee Vietnam](https://shopee.vn). Supports **search pages**, **category pages**, and automatically paginates through results up to a configurable item limit.
Developer: Tin (maintained by Community)
Features
- Scrapes search results (`/search?keyword=...`) and category pages (`/category/...`)
- Handles both Shopee API response shapes:
  - `/api/v4/search/search_items`: keyword search
  - `/api/v4/recommend/recommend_v2`: category landing pages
- Automatic pagination via next-page URL detection (no click simulation)
- Respects a `maxItems` limit and stops crawling once it is reached
- Cookie-based login support for authenticated scraping
- Prices normalised from Shopee micros (÷ 1,000,000) to real values
- Full CDN image URLs constructed automatically
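The two response shapes listed above can be told apart by the request path alone. The sketch below routes an intercepted response by its URL and pulls out the raw item objects. The payload layouts it assumes (`items[].item_basic` for `search_items`, `data.sections[].data.item[]` for `recommend_v2`) are assumptions based on commonly observed Shopee responses, not something this README documents, and `endpointOf` / `extractRawItems` are hypothetical names rather than functions from the Actor's `main.js`.

```javascript
// Sketch: classify an intercepted Shopee XHR by path and extract raw item objects.
// ASSUMPTION: the payload layouts below are inferred, not documented by this Actor.
const SEARCH_PATH = "/api/v4/search/search_items";
const RECOMMEND_PATH = "/api/v4/recommend/recommend_v2";

// Returns "search", "recommend", or null for responses the crawler should ignore.
function endpointOf(responseUrl) {
  const { pathname } = new URL(responseUrl);
  if (pathname === SEARCH_PATH) return "search";
  if (pathname === RECOMMEND_PATH) return "recommend";
  return null;
}

// Hypothetical helper: returns an array of raw item objects (possibly empty).
function extractRawItems(responseUrl, body) {
  switch (endpointOf(responseUrl)) {
    case "search":
      // Assumed shape: { items: [{ item_basic: {...} }, ...] }
      return (body?.items ?? []).map((entry) => entry.item_basic ?? entry);
    case "recommend":
      // Assumed shape: { data: { sections: [{ data: { item: [...] } }, ...] } }
      return (body?.data?.sections ?? []).flatMap((s) => s?.data?.item ?? []);
    default:
      return [];
  }
}
```

Matching on `pathname` rather than on a substring of the full URL keeps query strings (which can be long and signed) out of the decision.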
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | array | `[{ url: "https://shopee.vn/Men-Clothes-cat.11035567" }]` | URLs to start scraping from (search or category pages) |
| `maxItems` | integer | `10` | Maximum number of items to collect before stopping |
| `loginCookies` | array | none | Browser cookies for the authenticated session. Short-lived: re-capture them before every run (see Authentication) |
| `extraHeaders` | object | `{}` | Anti-bot headers (`x-sap-sec`, `x-sap-ri`, `sz-token`, `d-nonptcha-sync`, `af-ac-enc-*`). Same short TTL: re-capture them before every run |
Example input (INPUT.json)
```json
{
  "startUrls": [
    { "url": "https://shopee.vn/search?category=11036030&keyword=xiaomi" },
    { "url": "https://shopee.vn/Men-Clothes-cat.11035567" }
  ],
  "maxItems": 100
}
```
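The defaults in the Input table can be expressed as a small normalisation step run on whatever `Actor.getInput()` returns. This is a minimal sketch, assuming empty cookies and headers when none are supplied; `resolveInput` is a hypothetical helper, and the Actor's own `main.js` may validate its input differently.

```javascript
// Sketch: apply the defaults from the Input table to a raw input object.
// Hypothetical helper; the Actor's own main.js may validate input differently.
const DEFAULT_START_URLS = [{ url: "https://shopee.vn/Men-Clothes-cat.11035567" }];

function resolveInput(raw) {
  // Actor.getInput() may resolve to null when no input is provided.
  const input = raw ?? {};
  const startUrls =
    Array.isArray(input.startUrls) && input.startUrls.length > 0
      ? input.startUrls
      : DEFAULT_START_URLS;
  // maxItems must be a positive integer; otherwise use the documented default of 10.
  const maxItems =
    Number.isInteger(input.maxItems) && input.maxItems > 0 ? input.maxItems : 10;
  // No cookies and no extra headers unless the caller supplies them (assumption).
  const loginCookies = Array.isArray(input.loginCookies) ? input.loginCookies : [];
  const extraHeaders =
    input.extraHeaders && typeof input.extraHeaders === "object" && !Array.isArray(input.extraHeaders)
      ? input.extraHeaders
      : {};
  return { startUrls, maxItems, loginCookies, extraHeaders };
}
```

With the example input above, this keeps both start URLs and `maxItems: 100`; with no input at all, it falls back to the single category URL and `maxItems: 10`.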
Output
Each item in the dataset represents one product listing with the following fields:
| Field | Type | Description |
|---|---|---|
| `url` | string | Page URL the item was scraped from |
| `itemUrl` | string | Direct product URL (`https://shopee.vn/product/{shopid}/{itemid}`) |
| `itemid` | number | Shopee item ID |
| `shopid` | number | Shopee shop ID |
| `name` | string | Product name |
| `brand` | string \| null | Brand name |
| `shop_name` | string \| null | Shop name |
| `shop_location` | string \| null | Shop location |
| `currency` | string | Currency code (e.g. `VND`) |
| `price` | number | Current price (in currency units) |
| `price_min` | number | Minimum variant price |
| `price_max` | number | Maximum variant price |
| `price_before_discount` | number \| null | Original price before discount |
| `discount` | string \| null | Discount percentage (e.g. `"8%"`) |
| `sold` | number | Monthly sold count |
| `historical_sold` | number | All-time sold count |
| `liked_count` | number | Number of likes/favourites |
| `cmt_count` | number \| null | Number of reviews |
| `rating_star` | number \| null | Average star rating |
| `stock` | number \| null | Available stock |
| `item_status` | string | Status (e.g. `"normal"`) |
| `catid` | number | Category ID |
| `image` | string \| null | Main image URL |
| `images` | string[] | All image URLs |
| `is_official_shop` | boolean \| null | Whether the seller is an official shop |
| `is_on_flash_sale` | boolean \| null | Whether the item is on flash sale |
| `can_use_cod` | boolean \| null | Cash-on-delivery availability |
| `ctime` | number | Listing creation timestamp (Unix) |
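To make the table concrete, here is a sketch that maps one raw API item to a subset of these fields, applying the micros division and CDN prefix described in the Notes. The raw field names (`itemid`, `shopid`, `price`, `item_rating.rating_star`, `images`, ...) are assumptions about Shopee's payload, treating a zero `price_before_discount` as "no discount" is also an assumption, and `toOutputRecord` is a hypothetical name. The remaining fields would be copied the same way.

```javascript
// Sketch: map one raw Shopee item to a subset of the output fields above.
// ASSUMPTIONS: raw field names mirror what Shopee's API is commonly observed
// to return; prices are in micros and image fields are bare CDN file IDs.
const CDN_PREFIX = "https://down-vn.img.susercontent.com/file/";

// Divide micros by 1,000,000; non-numeric input becomes null.
const fromMicros = (v) => (typeof v === "number" ? v / 1_000_000 : null);
// Prefix a bare image ID with the CDN base; empty IDs become null.
const toImageUrl = (id) => (id ? CDN_PREFIX + id : null);

function toOutputRecord(pageUrl, raw) {
  return {
    url: pageUrl,
    itemUrl: `https://shopee.vn/product/${raw.shopid}/${raw.itemid}`,
    itemid: raw.itemid,
    shopid: raw.shopid,
    name: raw.name,
    brand: raw.brand ?? null,
    currency: raw.currency,
    price: fromMicros(raw.price),
    price_min: fromMicros(raw.price_min),
    price_max: fromMicros(raw.price_max),
    // Treat a missing or zero pre-discount price as "no discount" (assumption).
    price_before_discount: raw.price_before_discount ? fromMicros(raw.price_before_discount) : null,
    discount: raw.discount ?? null,
    sold: raw.sold ?? 0,
    historical_sold: raw.historical_sold ?? 0,
    rating_star: raw.item_rating?.rating_star ?? null,
    image: toImageUrl(raw.image),
    images: (raw.images ?? []).map(toImageUrl).filter(Boolean),
    ctime: raw.ctime,
  };
}
```

For example, a raw `price` of `15000000000` becomes `15000` (VND), and an `image` ID of `abc` becomes `https://down-vn.img.susercontent.com/file/abc`.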
How it works
- Input is read via `Actor.getInput()`. Start URLs and `maxItems` are extracted.
- Cookies are injected into each page before navigation to maintain a logged-in session.
- XHR responses are intercepted. The crawler listens for responses from:
  - `/api/v4/search/search_items` (search pages)
  - `/api/v4/recommend/recommend_v2` (category pages)
- Items are extracted and normalised from whichever API response shape is present, then pushed to the dataset.
- Pagination: after each page, the next-page button's `href` is read from the DOM. If the `href` points to `"/"` or is absent, crawling stops (last page reached). Otherwise, the next URL is enqueued.
- `maxItems` enforcement: the item count is tracked globally across pages, and pagination stops as soon as the limit is reached.
- Cookies are saved after each page so the session stays fresh across requests.
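The pagination and `maxItems` rules above can be sketched as two small pure functions, kept separate from Puppeteer so they are easy to reason about. `nextPageUrl` and `createItemBudget` are hypothetical names; the sketch assumes the `href` attribute is passed in as read from the DOM (or `null` when the button is missing) and that relative hrefs are resolved against the current page URL.

```javascript
// Sketch of the stop/continue rules described above (hypothetical helpers).
// `href` is the raw attribute read from the next-page button, or null if absent.
function nextPageUrl(currentUrl, href) {
  // A missing href, or one pointing at "/", signals the last page.
  if (!href || href === "/") return null;
  // Resolve relative hrefs such as "/search?keyword=xiaomi&page=1" against the current page.
  return new URL(href, currentUrl).toString();
}

// Global counter shared by all pages; take() returns only the items still allowed.
function createItemBudget(maxItems) {
  let collected = 0;
  return {
    take(items) {
      const room = Math.max(0, maxItems - collected);
      const accepted = items.slice(0, room);
      collected += accepted.length;
      return accepted;
    },
    get exhausted() {
      return collected >= maxItems;
    },
  };
}
```

In a request handler, one would push `budget.take(items)` to the dataset and enqueue `nextPageUrl(...)` only while `budget.exhausted` is false and the result is not `null`. Creating the budget once per run, outside the handler, is what makes the count global rather than per page.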
Authentication
Shopee's API protects every request with two short-lived layers of credentials:
- Session cookies (`csrftoken`, `SPC_F`, `SPC_U`, `SPC_ST`, `SPC_EC`, `SPC_T_ID`, etc.): bound to the active browser session.
- Signed anti-bot headers (`x-sap-sec`, `x-sap-ri`, `sz-token`, `af-ac-enc-dat`, `af-ac-enc-sz-token`, `d-nonptcha-sync`): recomputed by Shopee's frontend JavaScript on every request.
⚠️ These tokens expire within minutes, not hours. Capture a fresh set immediately before every run. Reusing tokens from a previous session returns `error: 90309999` (anti-bot rejection) and no items.
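Because a stale-token run otherwise finishes "successfully" with an empty dataset, it can help to fail fast on this error. A minimal sketch follows: only the code `90309999` comes from this README, while the assumption that it appears as a top-level numeric `error` field in the JSON body is ours, and `checkShopeeResponse` is a hypothetical helper.

```javascript
// Sketch: flag the anti-bot rejection so a run fails loudly instead of
// silently producing no items. ASSUMPTION: the code arrives as a top-level
// `error` field in the parsed JSON body.
const ANTI_BOT_ERROR = 90309999;

function checkShopeeResponse(body) {
  if (body && Number(body.error) === ANTI_BOT_ERROR) {
    throw new Error(
      "Shopee rejected the request (error 90309999): re-capture loginCookies " +
        "and extraHeaders, then rerun within 60 seconds."
    );
  }
  // Anything else is passed through unchanged for normal item extraction.
  return body;
}
```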
How to capture credentials
- Open the target Shopee search or category page in a logged-in Chrome/Edge session (e.g. `https://shopee.vn/search?keyword=xiaomi&category=11036030`).
- Open DevTools → Network tab and filter for `search_items`.
- Trigger the request (refresh or scroll).
- Right-click the `search_items` row → Copy → Copy as cURL (bash) (or Copy as fetch for JS).
- From the captured request, populate the Actor input:
  - `startUrls[0].url`: the full API URL from the cURL (the `/api/v4/search/search_items?...` line).
  - `extraHeaders`: every `-H` header except `:authority`, `:method`, `:path`, `:scheme`, and `accept-encoding` (let the HTTP layer handle compression).
  - `loginCookies`: either parse the `cookie:` header into a `[{name, value}, ...]` array, or paste the full cookie string into `extraHeaders.cookie` (which overrides the array).
- Run within 60 seconds. Any longer and the signed tokens will have rotated, and Shopee will reject the request.
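Given the 60-second window, converting the copied cURL by hand is the slowest part of the capture steps above. The sketch below turns a "Copy as cURL (bash)" string into the three input fields, skipping the same headers listed above. It is a hypothetical helper, not part of the Actor, and it assumes Chrome's bash format: single-quoted values, cookies sent either as `-H 'cookie: ...'` or as `-b '...'`, and no value containing an escaped single quote (`'\''`).

```javascript
// Sketch: turn a DevTools "Copy as cURL (bash)" string into Actor input.
// Hypothetical helper; see the assumptions about quoting in the lead-in.
const SKIP_HEADERS = new Set([":authority", ":method", ":path", ":scheme", "accept-encoding"]);

// "a=1; b=2=3" -> [{ name: "a", value: "1" }, { name: "b", value: "2=3" }]
function parseCookieString(cookie) {
  return cookie
    .split(";")
    .map((pair) => pair.trim())
    .filter(Boolean)
    .map((pair) => {
      const eq = pair.indexOf("=");
      return eq === -1
        ? { name: pair, value: "" }
        : { name: pair.slice(0, eq), value: pair.slice(eq + 1) };
    });
}

function curlToInput(curl) {
  const urlMatch = curl.match(/curl\s+'([^']+)'/);
  if (!urlMatch) throw new Error("No single-quoted URL found after `curl`");

  const extraHeaders = {};
  let cookieString = "";
  for (const [, flag, value] of curl.matchAll(/(-H|-b)\s+'([^']*)'/g)) {
    if (flag === "-b") {
      cookieString = value; // newer Chrome versions put cookies in -b
      continue;
    }
    const colon = value.indexOf(":", 1); // start at 1 so ":authority" keeps its leading colon
    if (colon === -1) continue; // skip malformed header lines
    const name = value.slice(0, colon).trim().toLowerCase();
    const headerValue = value.slice(colon + 1).trim();
    if (name === "cookie") cookieString = headerValue;
    else if (!SKIP_HEADERS.has(name)) extraHeaders[name] = headerValue;
  }

  return {
    startUrls: [{ url: urlMatch[1] }], // pasted verbatim, never re-encoded
    extraHeaders,
    loginCookies: parseCookieString(cookieString),
  };
}
```

Printing `JSON.stringify(curlToInput(copied), null, 2)` gives an object that can be pasted straight into the Actor input. The URL is taken as a raw substring on purpose, so it is not re-encoded before signing checks.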
Why the URL must come from the same capture
x-sap-sec is a signature over the request URL. The captured startUrls[0].url includes the original extra_params and view_session_id UUIDs that were signed — changing them invalidates the signature. Don't rewrite the URL; paste it verbatim.
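When a run is rejected, a quick way to check whether the URL was altered after capture is to diff the raw query parameters against the captured URL. This is a hypothetical debugging helper, sketched under the assumption that values should be compared still percent-encoded (decoding would hide encoding-only changes). A non-empty result means the URL changed, which per the note above invalidates the signature; an empty result does not rule out other changes such as a different path or parameter order, and repeated keys collapse to their last value.

```javascript
// Sketch: report query parameters whose raw (percent-encoded) values differ.
// Hypothetical helper for debugging rejected runs.
function rawQueryParams(url) {
  const query = url.includes("?") ? url.slice(url.indexOf("?") + 1).split("#")[0] : "";
  const params = new Map();
  for (const pair of query.split("&").filter(Boolean)) {
    const eq = pair.indexOf("=");
    const key = eq === -1 ? pair : pair.slice(0, eq);
    // Values are kept exactly as written; no decoding.
    params.set(key, eq === -1 ? "" : pair.slice(eq + 1));
  }
  return params;
}

function changedSignedParams(capturedUrl, runUrl) {
  const a = rawQueryParams(capturedUrl);
  const b = rawQueryParams(runUrl);
  const keys = new Set([...a.keys(), ...b.keys()]);
  // A key counts as changed if it was added, removed, or its raw value differs.
  return [...keys].filter((k) => a.get(k) !== b.get(k));
}
```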
Notes
- Shopee stores prices internally as micros (integer × 10⁻⁶). The scraper divides by 1,000,000 to produce real currency values.
- Image IDs returned by the API are converted to full CDN URLs by prefixing them with `https://down-vn.img.susercontent.com/file/`.
- Proxy configuration (Apify Residential, Vietnam) is defined in `main.js` and can be enabled by uncommenting the `proxyConfiguration` option in the crawler.
