AliExpress Product Search Scraper (Crawlee, Proxy-Ready)

Scrape AliExpress search results by keywords with proxy support, session rotation, and cost guardrails. Outputs a clean, typed dataset (type: product) plus optional suggestion/unknown entries. Also available as a Pay-per-event version (better for automation/MCP).

Pricing

$19.00/month + usage

Rating

5.0 (1)

Developer

TrendMatch (Maintained by Community)
Actor stats

Bookmarked: 1
Total users: 1
Monthly active users: 1
Last modified: 15 days ago

AliExpress Product Scraper (PRO)

Scrape AliExpress search results at scale using Crawlee PlaywrightCrawler with automatic session rotation, proxy support, and anti-bot handling.

Built for the Apify Store as a Rental Actor.

What it does

For each search query you provide, the Actor:

  1. Opens AliExpress search pages (SEO-friendly URLs with automatic fallback).
  2. Extracts product cards: title, price, image, rating, orders, store name, and URL.
  3. Deduplicates results across queries by product URL.
  4. (Optional) Visits individual product detail pages for enriched data (multi-strategy: window.runParams, ld+json, DOM fallback).
  5. Pushes all items to the Apify Dataset and saves run metadata to the Key-Value store.
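The deduplication in step 3 can be sketched as a simple URL-keyed set. This is an illustrative sketch with a hypothetical helper name, not the Actor's actual code:

```javascript
// Sketch of cross-query deduplication by product URL (hypothetical helper).
// The query string is stripped so tracking parameters don't defeat dedup.
function dedupeByUrl(items) {
  const seen = new Set();
  const unique = [];
  for (const item of items) {
    const key = item.url.split('?')[0];
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(item);
    }
  }
  return unique;
}
```

Because the first occurrence wins, an item found under an earlier query keeps that query's attribution.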

Quick start

Minimal input:

{
  "queries": ["LED strip lights", "wireless earbuds"]
}

Recommended input for reliable, cost-efficient scraping:

{
  "queries": ["LED strip lights", "wireless earbuds", "phone case iPhone 15"],
  "maxPages": 3,
  "maxItemsTotal": 300,
  "maxConcurrency": 2,
  "throttleMs": 2500,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

With enrichment:

{
  "queries": ["portable blender"],
  "maxPages": 2,
  "maxItemsTotal": 100,
  "enrich": true,
  "enrichLimit": 30,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| queries | string[] | required | Search keywords to scrape |
| maxQueries | int | 10 | Max queries to process (cap: 200) |
| maxPages | int | 3 | Pages per query (cap: 20) |
| maxItemsTotal | int | 200 | Total items hard cap (cap: 5000) |
| maxConcurrency | int | 2 | Parallel browser pages (cap: 5) |
| throttleMs | int | 2500 | Min delay between requests in ms |
| maxRequestRetries | int | 3 | Retries per failed request |
| headless | bool | true | Run browser headless |
| enrich | bool | false | Visit product detail pages |
| enrichLimit | int | 20 | Max products to enrich |
| includeSuggestions | bool | false | Include related-search suggestions in dataset (typed as type: "suggestion") |
| includeUnknown | bool | false | Include items with a valid URL but no commerce signals (typed as type: "unknown") |
| proxy | object | { useApifyProxy: true, apifyProxyGroups: ["AUTO"] } | Proxy configuration (Apify proxy editor) |
| proxy.useApifyProxy | bool | true | Use Apify Proxy |
| proxy.apifyProxyGroups | string[] | ["AUTO"] | Proxy group(s) |
| proxy.proxyUrls | string[] | [] | Custom proxy URLs |
| debugLog | bool | false | Verbose logging |
| saveHtmlOnError | bool | false | Save blocked page HTML to the KV store |
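The caps listed above can be thought of as a clamp applied on top of the defaults. The sketch below is a hypothetical illustration of that behavior; the Actor's real validation logic may differ:

```javascript
// Hypothetical sketch of clamping numeric inputs to the documented hard caps.
// Missing or non-numeric values fall back to the documented defaults.
const CAPS = { maxQueries: 200, maxPages: 20, maxItemsTotal: 5000, maxConcurrency: 5 };
const DEFAULTS = { maxQueries: 10, maxPages: 3, maxItemsTotal: 200, maxConcurrency: 2 };

function applyCaps(input) {
  const out = { ...input };
  for (const key of Object.keys(CAPS)) {
    const value = Number.isFinite(out[key]) ? out[key] : DEFAULTS[key];
    out[key] = Math.min(Math.max(1, value), CAPS[key]);
  }
  return out;
}
```

So a request for `maxPages: 50` would effectively run with 20, the documented cap.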

Proxy groups: AUTO vs RESIDENTIAL

AliExpress uses aggressive bot protection. Proxies are the biggest factor in reliability.

| Option | When to use | Pros | Cons |
|---|---|---|---|
| apifyProxyGroups: ["AUTO"] (default) | Start here | Cheapest/simplest default | May hit WAF/CAPTCHA more often |
| apifyProxyGroups: ["RESIDENTIAL"] | If you see frequent blocks | Much more reliable on protected pages | Higher proxy cost for the user |

Disabling proxy: Set proxy.useApifyProxy: false and leave proxy.proxyUrls empty to run without any proxy. This is useful for local testing but will almost certainly get blocked on AliExpress in production.

Custom proxies: Set proxy.useApifyProxy: false and provide your own URLs in proxy.proxyUrls (format: http://user:pass@host:port). The Actor will rotate through them automatically.
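The rotation behavior described above amounts to round-robin over the provided URLs. A minimal sketch (not the Actor's internal code, which uses Crawlee's ProxyConfiguration):

```javascript
// Minimal round-robin rotation over custom proxy URLs, as a sketch of the
// behavior described above. Hypothetical helper name.
function makeProxyRotator(proxyUrls) {
  let i = 0;
  return () => {
    const url = proxyUrls[i % proxyUrls.length];
    i += 1;
    return url;
  };
}
```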

Practical guidance:

  • Start with AUTO.
  • If results are empty or blocked often, switch to RESIDENTIAL and keep maxConcurrency low (1-2) with throttleMs >= 2500.

Output dataset fields

Each item in the output dataset has these fields:

Product items (type: "product")

| Field | Type | Description |
|---|---|---|
| type | string | Always "product" |
| query | string | The search query that produced this item |
| rank | int | Position on the search results page |
| productId | string | AliExpress product ID |
| title | string | Product title |
| url | string | Canonical product URL |
| image | string \| null | Product image URL; null if AliExpress did not provide a valid CDN image |
| imageValid | bool | true if the image is from a known AliExpress CDN and not a tiny UI asset |
| price | number | Sale price (USD); may be null if not extracted |
| originalPrice | number | Original price before discount |
| currency | string | Currency code (typically USD) |
| discount | number | Discount percentage |
| rating | number | Star rating (0-5); may be null |
| orders | int | Approximate orders/sold count; may be null |
| storeName | string | Seller store name; may be null |
| scrapedAt | string | ISO 8601 timestamp |
| source | string | Always "aliexpress" |
| enriched | bool | Whether detail-page enrichment succeeded (only if enrich=true) |
| enrichSources | string | Which data sources were used for enrichment |

Note: price, rating, orders, storeName, and image depend on what AliExpress renders and may be null. They are not used as hard filters — a product is included if it has a valid /item/ URL and an extractable numeric productId. Image validity is a soft check: imageValid indicates whether the image comes from a known AliExpress CDN host.
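Because these fields are nullable, downstream consumers should guard before using them. A small hypothetical post-processing sketch that keeps only products with extracted prices:

```javascript
// Hypothetical downstream filter: keep only product items whose price was
// actually extracted. Nullable fields must be checked, never assumed.
function withPrices(items) {
  return items.filter((it) => it.type === 'product' && typeof it.price === 'number');
}
```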

Suggestion items (type: "suggestion") — only with includeSuggestions: true

| Field | Type | Description |
|---|---|---|
| type | string | Always "suggestion" |
| suggestion | string | The related search term |
| query | string | The original query that produced this suggestion |
| rank | int | Order within the suggestion block |
| scrapedAt | string | ISO 8601 timestamp |

Product filtering

The Actor applies three layers of filtering to ensure a clean, store-ready dataset:

Layer 1 — Grid scoping (DOM)

The page.evaluate scopes its search to the main product-results grid using known container selectors (SearchProductFeed, search-item-card-wrapper, manhattan--container, list--gallery--). If none match, it falls back to document.body. Within that scope, each a[href*="/item/"] link must live inside a card-like container ([class*="card"], [class*="product"], [class*="item"], [class*="gallery"]) and that container must contain an <img> element. Bare text anchors and cards without images are skipped at the DOM level.

Layer 2 — Hard URL filter

A card is kept only if both conditions are met:

  1. URL contains /item/ — standard AliExpress product URL pattern.
  2. Numeric productId is extractable from the URL (e.g. /item/1234567890.html).

Cards that fail either check are silently dropped (nonProductDropped).
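The two conditions above can be expressed in one regular expression: the URL must contain `/item/` followed by a numeric ID. This sketch uses a hypothetical helper name; the Actor's real extraction may differ:

```javascript
// Sketch of the hard URL filter: match /item/<digits>, optionally followed
// by .html, and return the numeric productId or null.
function extractProductId(url) {
  const m = url.match(/\/item\/(\d+)(?:\.html)?/);
  return m ? m[1] : null;
}
```

A non-null return means the card passes both Layer 2 checks.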

Layer 3 — Commerce signal check

After extraction, items with no commerce signals (price, orders, rating, storeName all null) and no valid CDN image are classified as type: "unknown" and dropped by default (unknownDropped). Set includeUnknown: true to keep them in the dataset.
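The classification rule can be sketched as follows (hypothetical helper name; the Actor's internal logic may differ in detail):

```javascript
// Sketch of the Layer 3 commerce signal check: an item with no price,
// orders, rating, or storeName, and no valid CDN image, is typed "unknown".
function classifyItem(item) {
  const hasSignal = [item.price, item.orders, item.rating, item.storeName]
    .some((v) => v !== null && v !== undefined);
  return hasSignal || item.imageValid ? 'product' : 'unknown';
}
```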

Image validation (soft)

Image validation is a soft check — products with missing or invalid images are still included with image: null and imageValid: false. The missingImage metric tracks how many products had no valid CDN image. Valid images must be from a known AliExpress CDN (*.alicdn.com, *.aliexpress-media.com, *.aliyuncs.com) and not a tiny UI asset.
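The CDN host check can be sketched as a suffix allow-list over the hostnames listed above. This is an assumption about the shape of the check, not the Actor's actual isValidProductImage implementation:

```javascript
// Sketch of the CDN allow-list check: the image hostname must end with one
// of the known AliExpress CDN suffixes. Malformed URLs fail the check.
const CDN_SUFFIXES = ['.alicdn.com', '.aliexpress-media.com', '.aliyuncs.com'];

function isAliexpressCdnImage(imageUrl) {
  try {
    const host = new URL(imageUrl).hostname;
    return CDN_SUFFIXES.some((s) => host.endsWith(s));
  } catch {
    return false; // not a parseable URL
  }
}
```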

Run metadata

After each run, the Actor stores a metadata record in the Key-Value store with:

  • runSummary: queriesTotal, searchRequestsProcessed, uniqueQueriesProcessed, pagesFetched, itemsPushed, itemsSkipped, productsFound, productsPushed, nonProductDropped, missingImage, unknownDropped, suggestionsCaptured, blockedCount, failedCount, sessionsRotated, durationSecs
  • failedQueries: array of { query, page, reason }
  • effectiveConfig: the actual config used (with proxy credentials redacted)

Access it via the API: GET /v2/key-value-stores/{storeId}/records/metadata
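For programmatic access, the full record URL for the endpoint above can be assembled like this; the store ID and token here are placeholders you supply:

```javascript
// Build the Apify API URL for the run-metadata record. storeId and token
// are caller-supplied placeholders; the token is URL-encoded if present.
function metadataRecordUrl(storeId, token) {
  const base = `https://api.apify.com/v2/key-value-stores/${storeId}/records/metadata`;
  return token ? `${base}?token=${encodeURIComponent(token)}` : base;
}
```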

Cost estimates

| Scenario | maxConcurrency | throttleMs | Proxy | Estimated CU |
|---|---|---|---|---|
| Light (5 queries, 3 pages) | 1 | 3000 | AUTO | ~0.05-0.1 |
| Medium (20 queries, 3 pages) | 2 | 2500 | AUTO | ~0.2-0.5 |
| Heavy (50 queries, 5 pages) | 2 | 2500 | RESIDENTIAL | ~1-3 |
| With enrichment (+50 products) | 1 | 3000 | RESIDENTIAL | +0.5-1 |

Key cost-control lever: maxItemsTotal caps total output regardless of queries/pages.

Anti-bot notes

AliExpress uses several anti-bot protections:

  • CAPTCHA/WAF/Punish pages: The Actor detects these automatically, retires the session, and retries with a fresh session/proxy.
  • SEO alphabet redirects: Detected and handled via fallback SearchText URLs.
  • Locale redirects: US-market cookies are injected to force USD pricing.
  • Rate limiting: The throttleMs delay (with random jitter) helps avoid triggering rate limits.
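One plausible shape for throttle-with-jitter is a base delay plus a bounded random component. The exact jitter formula is not documented here, so this is an assumption for illustration:

```javascript
// Sketch of throttle-with-jitter: base delay plus up to 50% random jitter.
// The rand parameter is injectable so the behavior is testable.
function jitteredDelay(throttleMs, rand = Math.random) {
  return Math.round(throttleMs * (1 + 0.5 * rand()));
}
```

Randomizing the delay avoids the fixed-interval request pattern that rate limiters detect easily.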

Best practices:

  • Use RESIDENTIAL proxy group for the most reliable results.
  • Keep maxConcurrency at 1-2.
  • Keep throttleMs at 2500ms or higher.
  • If you see many blocks in the logs, increase throttle and switch to RESIDENTIAL.

Troubleshooting

| Problem | Solution |
|---|---|
| All requests blocked | Switch to the RESIDENTIAL proxy group; increase throttleMs to 3000+ |
| Empty results | Check whether your query returns results on aliexpress.us manually; enable saveHtmlOnError to inspect blocked pages |
| Wrong currency / prices | The Actor forces USD via cookies, but some redirects can override this; use the RESIDENTIAL proxy group for US-based IPs |
| Enrichment low success rate | AliExpress detail pages are heavily protected; consider increasing throttleMs for enrichment |
| Run times out | Reduce maxQueries, maxPages, or maxItemsTotal; lower enrichLimit |
| "No proxy configured" warning | Enable proxy.useApifyProxy or provide proxy.proxyUrls; without a proxy, AliExpress will block almost immediately |

Local testing

Run the self-contained smoke test (creates storage, writes INPUT, runs the Actor):

$ npm test

Manual equivalent (reuses the smoke-test storage):

$ CRAWLEE_STORAGE_DIR="$(pwd)/apify_storage_smoke" node src/main.js

Note: Apify SDK v3 uses CRAWLEE_STORAGE_DIR (not APIFY_LOCAL_STORAGE_DIR) to resolve the local key-value store where INPUT.json lives.

Monetization

This Actor is designed for Rental monetization on the Apify Store. Users rent access and pay for their own platform usage (compute units, proxy traffic) on their Apify account. There are no hidden costs or external API dependencies.

Changelog

v1.0.5 (2026-02-13)

  • Anti-false-positive filtering: grid scoping + card structure gates + commerce signal check.
  • page.evaluate now scopes to the main results grid (SearchProductFeed, search-item-card-wrapper, manhattan--container, list--gallery--) with document.body fallback.
  • Card detection requires a meaningful container class (card, product, item, gallery) — removed overly broad div[class] / parentElement fallback.
  • Each card must contain an <img> element; bare text anchors are skipped at DOM level.
  • Commerce signal check: items with no price, orders, rating, storeName, and no valid image are classified as type: "unknown" and dropped by default.
  • New includeUnknown input (default false): when enabled, unknown items appear in the dataset with type: "unknown".
  • New unknownDropped metric in run metadata.

v1.0.4 (2026-02-13)

  • Image validation is now soft: products with missing or invalid images are no longer discarded. Instead they get image: null and imageValid: false.
  • Hard filters remain: URL must contain /item/ and numeric productId must be extractable.
  • New imageValid boolean field on every product item.
  • New missingImage metric in run metadata (counts products with invalid/absent image).

v1.0.3 (2026-02-13)

  • Store-ready dataset: Non-product cards (suggestions, promos, banners) are now filtered out by default.
  • Product validation: URL must contain /item/, numeric productId must be extractable, image must be from a known AliExpress CDN host.
  • New type field on every dataset item ("product" or "suggestion").
  • New includeSuggestions input (default false): when enabled, related-search suggestions are captured with type: "suggestion".
  • New filtering metrics in run metadata: productsFound, productsPushed, nonProductDropped, suggestionsCaptured.
  • Image URL normalization (protocol-less URLs get https: prefix).
  • New helpers: isProductUrl, isValidProductImage, normalizeImageUrl, classifyCard, parseSuggestions.

v1.0.0 (2026-02-12)

  • Initial release.
  • Crawlee PlaywrightCrawler with SessionPool and ProxyConfiguration.
  • Search result parsing with SEO URL + fallback strategy.
  • Optional detail-page enrichment (runParams / ld+json / DOM).
  • Input validation with hard caps and conservative defaults.
  • CAPTCHA/WAF/block detection with automatic session rotation.
  • Run metadata with summary, failed queries, and effective config.