Walmart Data Extractor
Extract Walmart.com products from search, category, product URLs, or item IDs. Returns prices, ratings, stock, seller, variants, breadcrumbs, and specifications. Optional per-review extraction. PPE pricing — $0.003 per product, $0.002 per review.
Developer: Khadin Akbar
Walmart Data Extractor — Products, Prices, Reviews & Sellers
Production-grade scraper that extracts structured product data from Walmart.com: titles, brands, current and list prices, savings, ratings, review counts, stock status, seller name and type (Walmart vs marketplace), pickup/shipping availability, images, variants, breadcrumbs, and specifications. Optionally pulls individual customer reviews. Built on Walmart's `__NEXT_DATA__` JSON blob (richer and more reliable than DOM scraping) and runs through Apify Residential proxies to clear Walmart's Akamai + PerimeterX defense.
One actor, four modes — search, category, direct product URLs, or raw Walmart item IDs.
What you get per product
| Field | Description |
|---|---|
| `itemId` | Walmart numeric item ID (a.k.a. `usItemId`). |
| `productUrl` | Canonical Walmart product page URL. |
| `title` | Product title. |
| `brand` | Brand name. |
| `model` | Manufacturer model number, when available. |
| `category` | Leaf category (Laptops, Wireless Headphones, etc.). |
| `breadcrumbs[]` | Full category path from root → leaf. |
| `currentPrice` | Current selling price (USD). |
| `listPrice` | MSRP / strikethrough price, when shown. |
| `savings` | `listPrice - currentPrice`, when both present. |
| `onSale` | `true` if `savings > 0`. |
| `currency` | Currency code (USD). |
| `rating` | Average customer rating (0–5). |
| `reviewCount` | Total number of customer reviews. |
| `inStock` | Boolean parsed from `availabilityStatus`. |
| `sellerName` | Selling entity (Walmart.com or marketplace seller). |
| `sellerType` | `walmart` / `marketplace` / `walmart_or_marketplace`. |
| `pickupAvailable` | Local store pickup available. |
| `shippingAvailable` | Ships from any fulfillment channel. |
| `freeShipping` | Free shipping flagged. |
| `imageUrl` | Primary image URL. |
| `images[]` | Full image gallery. |
| `variants[]` | Color / size / configuration variants (when `includeVariants`). |
| `specifications{}` | Manufacturer spec sheet as a `{key: value}` map (when `includeSpecifications`). |
| `description` | Short product description. |
| `badges[]` | Promotional badges (Rollback, Reduced Price, Bestseller, etc.). |
| `scrapedAt` | ISO timestamp the record was captured. |
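
The `savings` and `onSale` definitions in the table can be re-derived on the consumer side, for example to sanity-check a record. The sketch below is illustrative only: the helper name `derive_savings` is hypothetical, and treating a missing price as "not on sale" is an assumption rather than documented actor behavior.

```python
from typing import Optional


def derive_savings(current_price: Optional[float], list_price: Optional[float]) -> dict:
    """Recompute savings and onSale from the two price fields (hypothetical helper)."""
    if current_price is None or list_price is None:
        # Assumption: without both prices, savings is undefined and onSale is False.
        return {"savings": None, "onSale": False}
    # Round to cents to avoid floating-point noise such as 19.999999999999993.
    savings = round(list_price - current_price, 2)
    return {"savings": savings, "onSale": savings > 0}


if __name__ == "__main__":
    print(derive_savings(79.99, 99.99))
    print(derive_savings(24.50, None))
```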
When includeReviews = true, the same dataset also receives review records tagged _type: "review":
| Field | Description |
|---|---|
| `reviewId` | Walmart review identifier. |
| `itemId` | Parent product item ID. |
| `rating` | Star rating (1–5). |
| `title` | Review title. |
| `text` | Review body. |
| `author` | Reviewer display name. |
| `verifiedPurchaser` | Verified buyer flag. |
| `helpfulVotes` / `unhelpfulVotes` | Community vote counts. |
| `photos[]` | Review-attached photos. |
| `pros[]` / `cons[]` | Bullet pros/cons (when present). |
| `submittedAt` | ISO date of the review. |
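
Because products and reviews share one dataset, consumers usually need to separate them. A minimal sketch follows, assuming that any record not tagged `_type: "review"` is a product record; the function name `split_records` is hypothetical.

```python
def split_records(items):
    """Split a mixed dataset into product records and reviews grouped by itemId."""
    products = []
    reviews_by_item = {}
    for item in items:
        if item.get("_type") == "review":
            # Group each review under its parent product's item ID.
            reviews_by_item.setdefault(item.get("itemId"), []).append(item)
        else:
            # Assumption: everything not tagged as a review is a product record.
            products.append(item)
    return products, reviews_by_item


if __name__ == "__main__":
    sample = [
        {"itemId": "588402301", "title": "Example product"},
        {"_type": "review", "itemId": "588402301", "rating": 5},
        {"_type": "review", "itemId": "588402301", "rating": 3},
    ]
    products, reviews = split_records(sample)
    print(len(products), len(reviews["588402301"]))
```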
When to use this actor
- Walmart Marketplace sellers tracking competitor SKUs, Buy Box wins, and MAP violations.
- Retail price-monitoring SaaS building cross-retailer feeds (pair with our Amazon, eBay, and Google Shopping actors).
- Dropshipping / arbitrage agents comparing Walmart prices against Amazon pricing in real time.
- Brand teams monitoring review sentiment and rating drift on their SKUs.
- AI shopping agents that need current Walmart prices/stock as a tool call (`apify--walmart-data-extractor` via Apify MCP).
When NOT to use
- Walmart Canada / Mexico: this actor targets `walmart.com` only. Walmart.ca and walmart.com.mx have different DOM/JSON shapes.
- Bulk catalogs > 500 products per run: split into multiple runs. Walmart's anti-bot pool degrades with sustained heavy concurrency.
- Walmart Grocery delivery — grocery uses a separate API surface; out of scope here.
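
For catalogs above the 500-product guideline, one way to split the work is to chunk a list of item IDs into one input per run. This is a sketch only: `chunk_item_ids` is a hypothetical helper, and it only builds input objects in the `itemIds` mode shape without starting any runs.

```python
def chunk_item_ids(item_ids, batch_size=500):
    """Build one actor input per batch of at most batch_size item IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        {"mode": "itemIds", "itemIds": item_ids[start:start + batch_size]}
        for start in range(0, len(item_ids), batch_size)
    ]


if __name__ == "__main__":
    ids = [str(100000000 + n) for n in range(1200)]  # placeholder IDs
    inputs = chunk_item_ids(ids)
    print([len(i["itemIds"]) for i in inputs])
```

Running the batches one after another, rather than all at once, keeps concurrency low.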
Modes
mode: search
Run a keyword search.
{ "mode": "search", "searchQuery": "samsung 65 inch tv", "maxProducts": 50 }
mode: category
Crawl a Walmart category/listing URL.
{ "mode": "category", "categoryUrls": ["https://www.walmart.com/shop/electronics/laptops"] }
mode: productUrls
Direct product detail pages.
{ "mode": "productUrls", "productUrls": ["https://www.walmart.com/ip/Apple-AirPods-Pro/588402301","https://www.walmart.com/ip/123456789"] }
mode: itemIds
Bulk catalog enrichment from a list of Walmart item IDs.
{ "mode": "itemIds", "itemIds": ["588402301", "210384701"] }
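
Each mode pairs with exactly one input field, as the four examples above show. The sketch below encodes that pairing in a small builder that rejects unknown modes and empty values before a run is started. `build_input` and `MODE_FIELDS` are hypothetical names, and the validation is client-side only; it does not reflect the actor's own input checks.

```python
# Mode -> required input field, taken from the four examples above.
MODE_FIELDS = {
    "search": "searchQuery",
    "category": "categoryUrls",
    "productUrls": "productUrls",
    "itemIds": "itemIds",
}


def build_input(mode, value, **options):
    """Return an actor input dict for the given mode (hypothetical helper)."""
    if mode not in MODE_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    if not value:
        raise ValueError(f"mode {mode!r} requires a non-empty {MODE_FIELDS[mode]}")
    # Extra options such as maxProducts or includeReviews are passed through unchanged.
    return {"mode": mode, MODE_FIELDS[mode]: value, **options}


if __name__ == "__main__":
    print(build_input("search", "samsung 65 inch tv", maxProducts=50))
```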
Pricing — Pay Per Event
This actor uses Apify's pay-per-event billing — you only pay for results you actually receive, plus a tiny start fee.
| Event | Price |
|---|---|
| `apify-actor-start` | $0.00005 (one-time per run) |
| `product-extracted` | $0.003 per product |
| `review-extracted` | $0.002 per review (only when `includeReviews = true`) |
Typical costs:
- 50 products: ~$0.15
- 50 products + 10 reviews each: ~$1.15
- 500 products: ~$1.50
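
The typical costs above follow directly from the event prices. A small estimator, sketched here with the hypothetical name `estimate_cost`, reproduces them; it covers pay-per-event charges only and assumes every product yields the same number of reviews.

```python
START_FEE = 0.00005        # apify-actor-start, once per run
PRICE_PER_PRODUCT = 0.003  # product-extracted
PRICE_PER_REVIEW = 0.002   # review-extracted


def estimate_cost(products, reviews_per_product=0):
    """Estimate the pay-per-event cost of one run in USD."""
    reviews = products * reviews_per_product
    return START_FEE + products * PRICE_PER_PRODUCT + reviews * PRICE_PER_REVIEW


if __name__ == "__main__":
    for products, reviews in [(50, 0), (50, 10), (500, 0)]:
        print(products, reviews, f"${estimate_cost(products, reviews):.2f}")
```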
Pay-per-usage (compute + proxy) is also enabled for heavy power-user jobs — buyers self-select at run time.
Usage examples
Via Apify console
- Pick a `mode` and supply the matching field (`searchQuery`, `categoryUrls`, `productUrls`, or `itemIds`).
- Optionally set `includeReviews: true` and `maxReviewsPerProduct`.
- Run.
Via API (Node.js)
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('khadinakbar/walmart-data-extractor').call({
    mode: 'search',
    searchQuery: 'wireless headphones',
    maxProducts: 50,
    includeReviews: false,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Via API (Python)
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('khadinakbar/walmart-data-extractor').call(run_input={
    'mode': 'productUrls',
    'productUrls': ['https://www.walmart.com/ip/588402301'],
    'includeReviews': True,
    'maxReviewsPerProduct': 20,
})

items = list(client.dataset(run['defaultDatasetId']).iterate_items())
print(items)
```
Via Apify MCP (Claude / GPT / Gemini agents)
Tool name: apify--walmart-data-extractor. The actor's input schema is wired for AI-agent consumption — every property has a 4-sentence description with a literal example and disambiguator.
How it works
- PlaywrightCrawler (Chromium) with realistic Chrome 120+ fingerprint pool (Windows + macOS desktop, en-US locale).
- Apify Residential proxy pinned to US — non-US IPs see different (or missing) Walmart inventory due to geo-locking.
- Session pool with cookie persistence — PerimeterX scores per-session, so we keep sessions warm. Sessions are retired on 403/429/503 or block-page detection.
- Resource blocking: images, fonts, stylesheets, and media are aborted in `preNavigationHooks` for ~3x faster page loads (we don't need them, the data lives in `__NEXT_DATA__`).
- Multi-path defensive extraction: `parsers.js` walks several known Next.js layouts so a single Walmart schema change rarely breaks the extractor.
- Silent-failure guard: runs that complete with 0 products and >0 errors fail with an actionable status message rather than charging the start fee on an empty dataset.
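
The multi-path idea can be illustrated in a few lines: pull the `__NEXT_DATA__` script tag out of the HTML, parse it as JSON, and try several candidate key paths until one resolves. This is a Python sketch of the general technique, not a port of `parsers.js`. The paths in `CANDIDATE_PATHS` are hypothetical placeholders, since the layouts the actor actually handles are not documented here.

```python
import json
import re

# Next.js embeds page state in <script id="__NEXT_DATA__" type="application/json">.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)

# Hypothetical candidate paths, tried in order; real Walmart layouts may differ.
CANDIDATE_PATHS = [
    ("props", "pageProps", "initialData", "data", "product"),
    ("props", "pageProps", "product"),
]


def load_next_data(html):
    """Return the parsed __NEXT_DATA__ blob, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    return json.loads(match.group(1)) if match else None


def dig(data, path):
    """Follow a key path through nested dicts; return None if any step is missing."""
    for key in path:
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def find_product(html):
    """Return the first product object found under any candidate path."""
    blob = load_next_data(html)
    if blob is None:
        return None
    for path in CANDIDATE_PATHS:
        product = dig(blob, path)
        if product is not None:
            return product
    return None
```

Trying paths in order means a layout change only requires appending a new path, while older layouts keep working.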
Known limits
- Walmart caps search at 25 pages (~1,000 results). Use category URLs or split keyword variants for larger catalogs.
- The review shape may vary for some product types (auto, grocery, books). The extractor falls back gracefully, preferring partial review records over none.
- Akamai + PerimeterX rate-limit the residential pool under heavy concurrent load. If a run produces 0 items, retry in 10–15 minutes or lower `maxProducts`.
- Geo-locking: Walmart shows different prices/availability per ZIP. The default US residential proxy yields the "default" US view; ZIP-aware pricing is on the roadmap for v0.2.
Legal
This actor scrapes publicly available product information from walmart.com. By using it, you assume responsibility for your compliance with Walmart's Terms of Use and applicable scraping/data laws in your jurisdiction. The actor does not require any Walmart account login and does not extract account-only or personal data. Use at your own risk.
