All notable changes to this project will be documented in this file.
Complete rewrite — Global Marketplace & Variant Specialist
- 9 dedicated site extractors: AliExpress (
window.runParams), Temu (__NEXT_DATA__), SHEIN (gbProductDetailInfo), Etsy (JSON-LD + variation blocks), DHGate (__initial_state__), Wish (__data__), Joom (__JOOM_DATA__), Banggood (productData), Gearbest (gbData)
- Variant-First SKU extraction: every color × size × material combination extracted as a separate SKU entry with its own price, stock count, and image
- Cartesian product builder (
variant_extractor.js): handles both pre-combined SKU lists and raw option-axis objects
- Stealth Playwright browser (
stealth_browser.js):
- Fingerprint injection via
fingerprint-generator + fingerprint-injector
- CDP timezone spoofing per target country
- Human-like mouse movement + random delays
- DataDome / Cloudflare challenge detection and retry
- Geo-targeting (
geo_config.js): 15 country profiles with locale, timezone, Accept-Language, and default currency
- Shadow Pricing: variant prices captured from hydration JSON before DOM render
- AI-ready
clean_text field: HTML stripped, entities decoded — feed directly to LLMs
- Buy-Box fields:
seller_id, seller_name, shipping_cost on every product
- Currency conversion: 25 currencies with hardcoded rates; optional live exchange rate API
- Promise-based concurrency semaphore: configurable
maxConcurrency 1–20
- Generic fallback extractor: JSON-LD → Open Graph → CSS heuristics for any site
src/main.js: full rewrite — concurrent URL processing, currency conversion, variant normalisation
src/fetcher.js: stealth Playwright browser replaces basic Puppeteer; geo-aware HTTP headers
src/parser.js: multi-strategy pipeline (window vars → JSON-LD → CSS → Open Graph)
src/normalize.js: 25 currencies, live-rate prefetch, European decimal format support
src/utils.js: added cleanText(), safeJsonParse(), sleep()
INPUT_SCHEMA.json: new fields — targetCountry, targetCurrency, proxyUrl, liveRatesUrl, maxConcurrency
OUTPUT_SCHEMA.json: full variant schema — variants[], clean_text, seller_id, shipping_cost, sku_count
README.md: complete rewrite with business use-cases, site table, architecture overview
- Price-change alerting (webhook/email/Slack notifications)
diff.js / price history store (replaced by full product extraction)
amazon_api.js (Amazon PA API — out of scope for marketplace fast-fashion focus)
storeHistory / alertThresholdPercent / alertOn input fields
test/variant.test.js: Cartesian product, edge cases, shape A/B normalisation
test/sites.test.js: URL resolver, all site modules interface check, generic extractor
test/utils.test.js: cleanText, extractASIN, isValidEmail, safeJsonParse
- Updated
test/parser.test.js: JSON-LD extraction, CSS fallback, AliExpress window.runParams, availability detection
- Updated
test/normalize.test.js: currency conversion, European format, live rates
- Initial Actor scaffold for DRAM Pricing Change Monitor
- Features: fetcher, parser, normalize, diff, CSV importer scaffold, Amazon PA API hook, notifications
- Added tests and smoke script
- Documentation and Apify packaging files