Never-FAILED hardening: a run can now end FAILED only on a genuine unrecoverable code crash. (1) crawler.run() is wrapped in try/catch: a fatal Crawlee, proxy, or browser-launch error now writes a schema-valid diagnostic row and exits SUCCEEDED with a terminal WARNING status instead of propagating uncaught. (2) New safePushData wraps EVERY Actor.pushData call (products, reviews, diagnostics): a platform-side dataset_schema rejection of a record is caught and the push is retried with a minimal schema-compliant record containing no null fields, so one bad record can never fail the batch. (3) The Akamai/PerimeterX block-exhaustion path now sets { isStatusMessageTerminal: true, level: 'WARNING' } on its status message and pushes its diagnostic via safePushData. Scraping/extraction and the session-rotation/backoff escalation are unchanged.
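The catch-and-retry behaviour described for safePushData can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: the `push` parameter stands in for Actor.pushData so the sketch has no dependencies, and the fallback fields `recordType` and `message` are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch: `push` stands in for Actor.pushData.
type Row = Record<string, unknown>;
type Push = (row: Row) => Promise<void>;

// Build a minimal fallback record with no null or undefined values.
// `recordType` and `message` are assumed field names, not the real schema.
function toMinimalRecord(row: Row, error: unknown): Row {
  const url = row["productUrl"];
  return {
    recordType: "diagnostic",
    productUrl: typeof url === "string" ? url : "",
    message: error instanceof Error ? error.message : String(error),
  };
}

// Push one record; on rejection, retry once with the minimal record.
// Never rethrows, so a single bad record cannot fail the run.
async function safePushData(
  push: Push,
  row: Row,
): Promise<"pushed" | "fallback" | "dropped"> {
  try {
    await push(row);
    return "pushed";
  } catch (error) {
    try {
      await push(toMinimalRecord(row, error));
      return "fallback";
    } catch {
      // Even the minimal record was rejected: give up on this record only.
      return "dropped";
    }
  }
}
```

Returning a status string instead of throwing lets the caller count fallbacks and drops for a run summary, if that is wanted.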
0.7 — 2026-06-11
Success-rate fix: empty/mismatched user input (no searchQuery, productUrls, or itemIds for the selected mode) now exits SUCCEEDED with a terminal WARNING status telling the user exactly what to supply, instead of calling Actor.fail. A user-configuration mistake no longer counts as an actor failure on the 30-day success-rate counter. Scraping, extraction, and the Akamai/PerimeterX session-rotation/backoff escalation are unchanged.
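A check of this kind might look like the sketch below. It assumes the input carries a `mode` field alongside searchQuery, productUrls, and itemIds, and that the URL and ID fields are plain string arrays; the function name and the message wording are hypothetical. A caller would use a non-null result as the terminal WARNING status text and then exit normally rather than failing.

```typescript
type Mode = "search" | "productUrls" | "itemIds";

// Assumed input shape; the real schema may differ.
interface ActorInput {
  mode: Mode;
  searchQuery?: string;
  productUrls?: string[];
  itemIds?: string[];
}

// Returns a user-facing warning when the selected mode has no usable
// input, or null when the run can proceed.
function missingInputWarning(input: ActorInput): string | null {
  switch (input.mode) {
    case "search":
      return input.searchQuery && input.searchQuery.trim() !== ""
        ? null
        : 'Mode "search" needs a non-empty searchQuery.';
    case "productUrls":
      return input.productUrls && input.productUrls.length > 0
        ? null
        : 'Mode "productUrls" needs at least one entry in productUrls.';
    case "itemIds":
      return input.itemIds && input.itemIds.length > 0
        ? null
        : 'Mode "itemIds" needs at least one entry in itemIds.';
  }
}
```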
0.6 — 2026-06-10
Success-rate hardening: final Akamai/PerimeterX exhaustion now writes a non-billable diagnostic dataset row, stores RUN_SUMMARY.stopReason, and exits SUCCEEDED instead of Actor.fail. The schema prefill now requests 5 products instead of 50 so automated health checks stay inside 5 minutes.
0.5 — 2026-06-06
Anti-bot retry escalation to lift the 30-day success rate above 81.8%. A single transient Akamai/PerimeterX block on the first request no longer fails the whole run. Block pages and empty __NEXT_DATA__ responses now throw Crawlee SessionError, which rotates to a FRESH US residential session (different egress IP) on a dedicated maxSessionRotations budget (8), separate from maxRequestRetries (raised 4→6). Exponential backoff (~1.5s→30s) between rotated attempts plus sameDomainDelaySecs: 3 give PerimeterX time to cool off. Actor.fail is retained only as the FINAL fallback: it fires after the escalation is exhausted, so a fully blocked run fails honestly instead of reporting success with an empty dataset.
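The backoff schedule can be illustrated with a small helper. Only the endpoints (~1.5s and 30s) come from the entry above; the doubling factor, the absence of jitter, and the function name are assumptions made for this sketch.

```typescript
const BASE_DELAY_MS = 1500;  // first rotated attempt waits about 1.5s
const MAX_DELAY_MS = 30000;  // delays are capped at 30s

// Delay before the Nth rotated attempt (1-based), doubling each time
// and clamped to the cap. Doubling is an assumed growth factor.
function backoffDelayMs(rotation: number): number {
  const exponent = Math.max(0, rotation - 1);
  return Math.min(MAX_DELAY_MS, BASE_DELAY_MS * Math.pow(2, exponent));
}
```

Under these assumptions the delay reaches the 30s cap on the sixth rotation, so the last rotations of an 8-rotation budget all wait the full 30s.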
0.4 — 2026-06-04
Category mode removed from input schema. Walmart's /shop/, /browse/, /cp/, and /search?cat_id= endpoints either return empty searchResult payloads or get aggressively blocked by Akamai/PerimeterX. Four URL variants were stress-tested in v0.3; none returned usable data. Workaround: use mode=search with a category-name keyword (e.g. 'gaming laptop') and filter with minPrice/maxPrice/brands/minRating. Internal category handler code retained for future iteration when Walmart's category payload shape stabilises.
0.3 — 2026-06-04
Category mode now works on /browse/... and /cp/... URLs that end in a cat_id digit chain (e.g. /browse/electronics/headphones/3944_1101193_447529_1071966). Auto-converted to /search?cat_id=… which uses Walmart's standard search payload and dodges PerimeterX bias against /browse and /cp.
/shop/… URLs without an extractable cat_id are still attempted as-is but documented as higher-block-rate.
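The URL conversion described in this release could be sketched as below. It assumes the trailing digit chain is passed through unchanged as the cat_id value; the function name and the regular expressions are illustrative only.

```typescript
// Matches a trailing digit chain such as 3944_1101193_447529_1071966,
// optionally followed by a slash and/or a query string.
const CAT_ID_CHAIN = /\/(\d+(?:_\d+)*)\/?(?:\?.*)?$/;

// Convert a /browse/ or /cp/ URL ending in a digit chain to a search URL.
// Returns null when the URL is not a /browse/ or /cp/ path or has no
// extractable chain (e.g. most /shop/ URLs), so the caller can fall back
// to attempting the URL as-is.
function toSearchUrl(categoryUrl: string): string | null {
  if (!/walmart\.com\/(browse|cp)\//.test(categoryUrl)) return null;
  const match = CAT_ID_CHAIN.exec(categoryUrl);
  if (!match) return null;
  return "https://www.walmart.com/search?cat_id=" + encodeURIComponent(match[1]);
}
```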
0.2 — 2026-06-04
Search cards now return prices and brands reliably — added 10+ Walmart Next.js price paths (currentPrice/linePrice/priceString/salePrice variants) + string-price parser handling formats like '$24.99', 'From $X', '$X - $Y'.
Category mode now extracts products from /shop, /browse, and /cp URLs — added contentLayout.modules[] fallback for category-page shape.
itemIds mode rewritten to resolve each ID via a search lookup (?q={id}). Slug-less /ip/{id} URLs were being blocked by PerimeterX and sent runs into infinite retry loops; the lookup avoids requesting them directly.
Cost safety: maxRequestRetries 10 → 4, navigationTimeoutSecs 60 → 45, requestHandlerTimeoutSecs 120 → 90. Caps runaway runs at ~3-4 min instead of full 1h timeout.
All productUrl / imageUrl fields returned as absolute https://www.walmart.com/... URLs (no more relative paths).
Live example IDs in input schema (13544111159 / 741224561 — JLab + ONN headphones, both verified live in v0.2 testing).
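As a rough illustration of the string-price parsing mentioned in this release, the sketch below handles the three quoted formats ('$24.99', 'From $X', '$X - $Y'). The result shape and the function name are hypothetical, and the actual parser may treat more formats or currencies.

```typescript
// Hypothetical result shape: a single price has min === max.
interface ParsedPrice {
  min: number;
  max: number;
  isFrom: boolean; // true for "From $X" style strings
}

// Extract every dollar amount from the string and summarise them.
// Returns null when no dollar amount is present.
function parsePriceString(raw: string): ParsedPrice | null {
  const pattern = /\$\s*(\d[\d,]*(?:\.\d+)?)/g;
  const amounts: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(raw)) !== null) {
    // Strip thousands separators before converting to a number.
    amounts.push(parseFloat(match[1].replace(/,/g, "")));
  }
  if (amounts.length === 0) return null;
  return {
    min: Math.min(...amounts),
    max: Math.max(...amounts),
    isFrom: /^\s*from\b/i.test(raw),
  };
}
```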
0.1 — 2026-05-13
Initial release: all-in-one Walmart.com extractor with 4 input modes (search / category / productUrls / itemIds) and optional per-review extraction.
PlaywrightCrawler (Chromium) with Apify Residential proxy pinned to US, fingerprint pool, session pool with cookie persistence, resource blocking for image/font/css/media.
Extracts product data from the Next.js __NEXT_DATA__ JSON blob; a defensive multi-path picker covers Walmart's variant layouts.
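A defensive multi-path picker of the kind mentioned here can be sketched as follows. The helper names and the example paths in the usage note are hypothetical and do not reflect Walmart's real payload keys; the idea is only to try several candidate locations in order and take the first usable value.

```typescript
// Walk a key path through nested objects; undefined if any step is missing.
function getPath(root: unknown, path: string[]): unknown {
  let node: unknown = root;
  for (const key of path) {
    if (node === null || typeof node !== "object") return undefined;
    node = (node as Record<string, unknown>)[key];
  }
  return node;
}

// Return the first candidate path that yields a non-empty value.
function pickFirst(root: unknown, paths: string[][]): unknown {
  for (const path of paths) {
    const value = getPath(root, path);
    if (value !== undefined && value !== null && value !== "") return value;
  }
  return undefined;
}
```

For example, `pickFirst(data, [["product", "price"], ["item", "priceInfo", "price"]])` would try each layout in turn, so a change in one layout degrades to the next candidate instead of throwing.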