Shopee Scraper
Pricing
$10.00 / 1,000 results
A Puppeteer-based Apify Actor that scrapes product listings from [Shopee Vietnam](https://shopee.vn). Supports **search pages**, **category pages**, and automatically paginates through results up to a configurable item limit.
Developer
Tin
Features
- Scrapes search results (`/search?keyword=...`) and category pages (`/category/...`)
- Handles both Shopee API response shapes:
  - `/api/v4/search/search_items` (keyword search)
  - `/api/v4/recommend/recommend_v2` (category landing pages)
- Automatic pagination via next-page URL detection (no click simulation)
- Respects a `maxItems` limit and stops crawling once reached
- Cookie-based login support for authenticated scraping
- Prices normalised from Shopee micros (÷ 1,000,000) to real values
- Full CDN image URLs constructed automatically
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | array | `[{ url: "https://shopee.vn/Men-Clothes-cat.11035567" }]` | URLs to start scraping from (search or category pages) |
| `maxItems` | integer | `10` | Maximum number of items to collect before stopping |
| `loginCookies` | array | (none) | Browser cookies for an authenticated session (optional) |
Example input (INPUT.json)
```json
{
    "startUrls": [
        { "url": "https://shopee.vn/search?category=11036030&keyword=xiaomi" },
        { "url": "https://shopee.vn/Men-Clothes-cat.11035567" }
    ],
    "maxItems": 100
}
```
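As a rough illustration of how the documented defaults could be applied to an input like the one above, here is a minimal sketch. The helper name `normalizeInput` and its fallback rules are assumptions for illustration only; the Actor's actual handling in `main.js` may differ.

```javascript
// Hypothetical helper: applies the defaults from the Input table to raw Actor input.
const DEFAULT_START_URLS = [{ url: 'https://shopee.vn/Men-Clothes-cat.11035567' }];
const DEFAULT_MAX_ITEMS = 10;

function normalizeInput(rawInput) {
    const input = rawInput ?? {};
    // Use the default start URL when none are supplied.
    const startUrls = Array.isArray(input.startUrls) && input.startUrls.length > 0
        ? input.startUrls
        : DEFAULT_START_URLS;
    // Fall back to the default when maxItems is missing or not a positive integer.
    const maxItems = Number.isInteger(input.maxItems) && input.maxItems > 0
        ? input.maxItems
        : DEFAULT_MAX_ITEMS;
    // loginCookies is optional; an empty array means no cookies are injected.
    const loginCookies = Array.isArray(input.loginCookies) ? input.loginCookies : [];
    return { startUrls, maxItems, loginCookies };
}

// In the Actor, the raw object would come from `await Actor.getInput()`.
const input = normalizeInput({
    startUrls: [{ url: 'https://shopee.vn/search?category=11036030&keyword=xiaomi' }],
    maxItems: 100,
});
console.log(input);
```

Keeping the defaults in one place makes it easy to check that a run without any input still starts from the documented category page and stops after 10 items.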
Output
Each item in the dataset represents one product listing with the following fields:
| Field | Type | Description |
|---|---|---|
| `url` | string | Page URL the item was scraped from |
| `itemUrl` | string | Direct product URL (`https://shopee.vn/product/{shopid}/{itemid}`) |
| `itemid` | number | Shopee item ID |
| `shopid` | number | Shopee shop ID |
| `name` | string | Product name |
| `brand` | string or null | Brand name |
| `shop_name` | string or null | Shop name |
| `shop_location` | string or null | Shop location |
| `currency` | string | Currency code (e.g. `VND`) |
| `price` | number | Current price (in currency units) |
| `price_min` | number | Minimum variant price |
| `price_max` | number | Maximum variant price |
| `price_before_discount` | number or null | Original price before discount |
| `discount` | string or null | Discount percentage (e.g. `"8%"`) |
| `sold` | number | Monthly sold count |
| `historical_sold` | number | All-time sold count |
| `liked_count` | number | Number of likes/favourites |
| `cmt_count` | number or null | Number of reviews |
| `rating_star` | number or null | Average star rating |
| `stock` | number or null | Available stock |
| `item_status` | string | Status (e.g. `"normal"`) |
| `catid` | number | Category ID |
| `image` | string or null | Main image URL |
| `images` | string[] | All image URLs |
| `is_official_shop` | boolean or null | Whether the seller is an official shop |
| `is_on_flash_sale` | boolean or null | Whether the item is on flash sale |
| `can_use_cod` | boolean or null | Cash on delivery availability |
| `ctime` | number | Listing creation timestamp (Unix) |
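The `itemUrl` pattern from the table can be reproduced from `shopid` and `itemid`, which is handy when post-processing records. This is a small sketch only; the function name `buildItemUrl` and the IDs in the usage line are made up for illustration.

```javascript
// Hypothetical helper: rebuilds the direct product URL from the two IDs,
// following the pattern https://shopee.vn/product/{shopid}/{itemid}.
function buildItemUrl(shopid, itemid) {
    return `https://shopee.vn/product/${shopid}/${itemid}`;
}

// Placeholder IDs, not real Shopee listings.
const exampleUrl = buildItemUrl(123, 456);
console.log(exampleUrl);
```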
How it works
- Input is read via `Actor.getInput()`. Start URLs and `maxItems` are extracted.
- Cookies are injected into each page before navigation to maintain a logged-in session.
- XHR responses are intercepted. The crawler listens for responses from:
  - `/api/v4/search/search_items` (search pages)
  - `/api/v4/recommend/recommend_v2` (category pages)
- Items are extracted and normalised from whichever API response shape is present, then pushed to the dataset.
- Pagination: after each page, the next-page button's `href` is read from the DOM. If the href points to `"/"` or is absent, crawling stops (last page reached). Otherwise, the next URL is enqueued.
- `maxItems` enforcement: counting is tracked globally; pagination stops as soon as the limit is reached.
- Cookies are saved after each page so the session stays fresh across requests.
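The decision logic in the steps above can be sketched as three small pure functions: one that recognises the two API endpoints, one that decides whether a next-page `href` should be followed, and one that enforces a global item limit. All function names, the `kind` labels, and the `page` query parameter in the usage lines are assumptions for illustration; this is not the code from `main.js`.

```javascript
// Endpoint paths named in this README; the `kind` labels are hypothetical.
const API_ENDPOINTS = [
    { path: '/api/v4/search/search_items', kind: 'search' },
    { path: '/api/v4/recommend/recommend_v2', kind: 'category' },
];

// Returns 'search', 'category', or null for responses that should be ignored.
function classifyResponse(responseUrl) {
    const { pathname } = new URL(responseUrl);
    const match = API_ENDPOINTS.find((endpoint) => pathname.startsWith(endpoint.path));
    return match ? match.kind : null;
}

// Resolves the next-page href against the current page URL.
// Returns null when the href is absent or resolves to the site root ("/"),
// which the README describes as the last-page signal.
function resolveNextUrl(href, currentUrl) {
    if (!href) return null;
    const next = new URL(href, currentUrl);
    if (next.pathname === '/') return null;
    return next.toString();
}

// One limiter is shared by all pages, so the limit applies to the whole run.
function createLimiter(maxItems) {
    let collected = 0;
    return {
        // Trims a batch so the running total never exceeds maxItems.
        take(items) {
            const accepted = items.slice(0, Math.max(0, maxItems - collected));
            collected += accepted.length;
            return accepted;
        },
        reached: () => collected >= maxItems,
    };
}

// Usage with placeholder values.
const limiter = createLimiter(3);
const firstBatch = limiter.take(['a', 'b']);
const secondBatch = limiter.take(['c', 'd', 'e']);
console.log(classifyResponse('https://shopee.vn/api/v4/search/search_items?keyword=xiaomi'));
console.log(resolveNextUrl('/search?keyword=xiaomi&page=1', 'https://shopee.vn/search?keyword=xiaomi'));
console.log(firstBatch.length, secondBatch.length, limiter.reached());
```

Reading the next URL from the `href` rather than clicking the button keeps each page an independent request, so a failed page can be retried without replaying earlier clicks.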
Running locally
```bash
# Install dependencies
npm install

# Run with local storage (storage/ directory)
apify run

# Run and purge previous local storage first
apify run --purge
```
Deploy to Apify
```bash
# Authenticate
apify login

# Push and deploy
apify push
```
Notes
- Shopee stores prices internally as integer micros (the real value × 1,000,000). The scraper divides by `1,000,000` to produce real currency values.
- Image IDs returned by the API are converted to full CDN URLs using `https://down-vn.img.susercontent.com/file/`.
- Proxy configuration (Apify Residential, Vietnam) is defined in `main.js` and can be enabled by uncommenting the `proxyConfiguration` option in the crawler.
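The first two notes can be illustrated with a minimal sketch. The function names `microsToPrice` and `toImageUrl`, the null handling, and the sample values are assumptions for illustration; only the divisor and the CDN base URL come from the notes above.

```javascript
const PRICE_DIVISOR = 1000000; // divisor stated in the notes above
const IMAGE_BASE_URL = 'https://down-vn.img.susercontent.com/file/';

// Converts a raw price in micros to currency units.
// Missing values stay null (assumed here for fields like price_before_discount).
function microsToPrice(micros) {
    if (micros === null || micros === undefined) return null;
    return micros / PRICE_DIVISOR;
}

// Turns an image ID into a full CDN URL, or null when no ID is present.
function toImageUrl(imageId) {
    return imageId ? `${IMAGE_BASE_URL}${imageId}` : null;
}

// Placeholder values, not taken from a real listing.
console.log(microsToPrice(15000000000));
console.log(toImageUrl('abc123'));
```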