Woocommerce Scraper

Pricing

$19.99/month + usage

🛒 WooCommerce Scraper extracts product data from WooCommerce stores—titles, prices, stock, variants, categories, images & reviews—at scale. 🔍 Perfect for catalog building, competitor tracking & SEO. ⚙️ Export CSV/JSON and schedule automated updates.

Rating: 0.0 (0)

Developer: ScrapePilot (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 0 monthly active users. Last modified 20 days ago.

Woocommerce Scraper

The Woocommerce Scraper is a production-ready Apify actor that extracts structured product and content data from WooCommerce-powered WordPress sites at scale. It automates catalog research with product filters, proxy fallback, and batch processing across multiple stores. Marketers, developers, data analysts, and researchers can use it for price monitoring, feed building, and SEO analysis, with results delivered as clean, export-ready datasets.

What is Woocommerce Scraper?

Woocommerce Scraper is a scalable WooCommerce scraping tool that pulls structured data from WooCommerce REST and Store API endpoints, including products, categories, tags, reviews, pages, posts, and more. It addresses the core challenge of collecting up-to-date product catalogs by letting you scrape WooCommerce products with fine-grained filters (price, SKU, stock, ratings) and robust proxy fallback. Built for marketers, e-commerce teams, developers, and researchers, it enables automated WooCommerce product scraping across many stores to power pricing intelligence, inventory checks, and product feed enrichment at scale.

What data / output can you get?

Below are the primary fields produced when scraping the “products” resource. Output is pushed to the Apify dataset and can be exported as CSV or JSON. For non-product resources (e.g., pages, posts, categories), the actor returns the source API objects as-is and augments them with store and resource_type.

Data field | Description | Example value
url | Product page URL (permalink) | https://shop.example.com/product/example-product/
id | Product ID | 12345
name | Product name | Example Product
slug | Product slug | example-product
type | Product type | simple
sku | Product SKU | EX-001
on_sale | Whether the product is on sale | false
prices.price | Current product price (string) | "29.99"
prices.currency_code | ISO currency code | "USD"
average_rating | Average customer rating (string) | "4.5"
review_count | Number of reviews | 12
is_in_stock | Stock availability flag | true
store | Source store URL | https://shop.example.com
resource_type | Scraped resource type | products

Bonus fields produced for products include: parent, variation, short_description, description (formatted as md/text/html), images, categories, tags, brands, attributes, variations, grouped_products, has_options, is_purchasable, is_on_backorder, low_stock_remaining, sold_individually, stock_availability, add_to_cart, extensions. Exports are available via Apify in CSV and JSON.
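
As a small illustration of working with these fields, the sketch below flattens one product record into a row for a price report. It assumes the record has the shape listed above, with prices.price arriving as a string. The helper name to_price_row is hypothetical and not part of the actor.

```python
from decimal import Decimal, InvalidOperation


def to_price_row(product: dict) -> dict:
    """Flatten one product record into a row for a price report.

    Hypothetical helper: assumes the field names listed in the table above.
    """
    prices = product.get("prices") or {}
    try:
        # Prices arrive as strings; Decimal avoids float rounding issues.
        price = Decimal(str(prices.get("price")))
    except InvalidOperation:
        price = None  # missing or non-numeric price string
    return {
        "store": product.get("store"),
        "sku": product.get("sku"),
        "name": product.get("name"),
        "price": price,
        "currency": prices.get("currency_code"),
        "in_stock": bool(product.get("is_in_stock")),
    }
```

Records without a usable price keep a price of None, so downstream code can filter them out rather than fail.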

Key features

  • 🧠 Smart proxy fallback & retries: Automatic progression from no proxy ➜ datacenter ➜ residential, with retries, raises success rates on WooCommerce stores that block requests.
  • 🗂️ Multi-resource coverage: Scrape products or switch to categories, brands, tags, attributes, reviews, pages, posts, comments, post-categories, post-tags, and users from a WordPress + WooCommerce site.
  • 🎯 Powerful product filters: Filter by search, sku, rating, min_price/max_price, tax_class, category, tag, product_type, status, stock, featured, and sale; sort with sort and order; toggle include_variations to build precise datasets.
  • 📦 Batch processing: Add multiple stores via startUrls/url or load them from dev_fileupload to run a single job across many WooCommerce domains.
  • ✍️ Flexible content formatting: Output description and short_description in Markdown (default), plain text, or HTML using the format setting to fit your publishing workflow.
  • 🔧 Advanced HTTP controls: Configure custom headers, cookies, and proxy strings with dev_custom_headers, dev_custom_cookies, and dev_proxy_config for edge cases and troubleshooting.
  • 💾 Reliable data handling: Incremental saving, optional data cleansing control (dev_no_strip), and custom dataset naming/clearing (dev_dataset_name, dev_dataset_clear) for repeatable workflows.
  • 👩‍💻 Developer friendly: Built in Python with the Apify SDK; read results through the Apify Dataset API and feed them into your own Python pipelines.
  • ⏱️ Automation-ready: Schedule runs on Apify to keep product feeds and catalogs fresh for analytics and SEO.
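
The proxy fallback described in the first feature can be pictured with the following minimal sketch. It is not the actor's code: the tier labels, the retry count, and the caller-supplied fetch function are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Sequence

# Tiers in the order described above; the labels are illustrative only.
PROXY_TIERS: Sequence[Optional[str]] = (None, "datacenter", "residential")


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str, Optional[str]], str],
    retries_per_tier: int = 2,
    delay_seconds: float = 0.0,
) -> str:
    """Try each proxy tier in turn, retrying before escalating.

    `fetch(url, tier)` is a caller-supplied function that performs the
    request through the given tier and raises an exception on failure.
    """
    last_error: Optional[Exception] = None
    for tier in PROXY_TIERS:
        for _ in range(retries_per_tier):
            try:
                return fetch(url, tier)
            except Exception as error:  # sketch: treat any error as "blocked"
                last_error = error
                time.sleep(delay_seconds)
    raise RuntimeError(f"all proxy tiers failed for {url}") from last_error
```

Escalating only after a tier has exhausted its retries keeps the cheaper options in use for stores that do not block requests.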

How to use Woocommerce Scraper - step by step

  1. Sign in to Apify and open the Apify Console.
  2. Find “woocommerce-scraper” in the Actors section and open it.
  3. Add your store list:
    • Paste store URLs into startUrls (or use url as an alternative).
    • Optionally supply a file/URL with additional domains via dev_fileupload.
  4. Choose what to scrape:
    • Set resource to products, categories, brands, tags, attributes, reviews, pages, posts, comments, post-categories, post-tags, or users.
  5. Configure filters and sorting:
    • Use limit (1–1000), search, sku, rating, min_price/max_price, tax_class, category, tag, product_type, status, stock, featured, sale, sort, and order as needed.
  6. Set output formatting and variations:
    • Set format to md (default), text, or html; toggle include_variations if you want variant items included.
  7. Networking options (optional):
    • Configure proxyConfiguration or supply dev_proxy_config; add dev_custom_headers/dev_custom_cookies for advanced scenarios.
  8. Start the run:
    • Click Start. Monitor real-time logs and progress indicators.
  9. Download results:
    • Go to the OUTPUT (Dataset) tab to preview and export results to JSON or CSV.

Pro Tip: Use dev_transform_fields to select only the fields you need (e.g., name,prices.price,images.0) and dev_dataset_name to route outputs into predictable dataset names for downstream automation.
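
To make the path syntax concrete, here is a minimal sketch of how a field list such as name,prices.price,images.0 could be resolved against one record. The helper names and the choice to key the result by the path string are assumptions; the actor's actual output shape may differ.

```python
from typing import Any

_MISSING = object()  # sentinel so that None values are still selectable


def get_path(item: Any, path: str) -> Any:
    """Resolve a dotted path; numeric segments index into lists."""
    current = item
    for segment in path.split("."):
        if isinstance(current, list) and segment.isdigit():
            index = int(segment)
            if index >= len(current):
                return _MISSING
            current = current[index]
        elif isinstance(current, dict) and segment in current:
            current = current[segment]
        else:
            return _MISSING
    return current


def select_fields(item: dict, field_spec: str) -> dict:
    """Keep only the comma-separated paths, keyed by the path string."""
    selected = {}
    for path in (p.strip() for p in field_spec.split(",")):
        value = get_path(item, path) if path else _MISSING
        if value is not _MISSING:
            selected[path] = value
    return selected
```

Paths that do not exist in a record are skipped, so a sparse record simply yields fewer keys.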

Use cases

Use case name | Description
E-commerce research & cataloging | Aggregate product data from multiple WooCommerce stores to analyze assortments, categories, and pricing at scale.
Competitor price tracking | Monitor prices and on_sale status using a woocommerce price scraper to feed dashboards and alerts.
Product feed generation | Build a woocommerce product feed scraper pipeline to export clean CSV/JSON for ads, marketplaces, or SEO.
Inventory & stock monitoring | Track is_in_stock, is_on_backorder, and low_stock_remaining for replenishment and merchandising decisions.
Content & SEO analysis | Scrape descriptions, categories, tags, and reviews from WordPress/WooCommerce for content optimization.
Data enrichment API pipeline | Integrate datasets with internal systems via the Apify Dataset API for automated woocommerce product data extraction.
Academic & market research | Collect cross-site WooCommerce data to study market breadth, pricing distributions, and consumer trends.
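
For the price-tracking use case, one possible approach is to compare the datasets of two scheduled runs. The sketch below assumes both runs contain product records with the store, id, and prices.price fields; price_changes is a hypothetical helper, not something the actor provides.

```python
def price_changes(previous: list[dict], current: list[dict]) -> list[dict]:
    """Report products whose prices.price differs between two runs.

    Records are matched on (store, id); products present in only one
    run are ignored in this sketch.
    """
    def key(product: dict) -> tuple:
        return (product.get("store"), product.get("id"))

    old_prices = {key(p): (p.get("prices") or {}).get("price") for p in previous}
    changes = []
    for product in current:
        new_price = (product.get("prices") or {}).get("price")
        old_price = old_prices.get(key(product))
        if old_price is not None and new_price != old_price:
            changes.append({
                "store": product.get("store"),
                "id": product.get("id"),
                "name": product.get("name"),
                "old_price": old_price,
                "new_price": new_price,
                "on_sale": product.get("on_sale"),
            })
    return changes
```

The returned list can feed a dashboard or an alert; matching on store plus id avoids collisions between stores that reuse the same product IDs.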

Why choose Woocommerce Scraper?

A precise, automation-ready WooCommerce data scraper built for reliability and scale.

  • ✅ Accurate, structured outputs for products and WordPress content resources.
  • 🌍 Works across many WooCommerce stores with robust proxy fallback and retries.
  • 📈 Scales to thousands of records per run with granular filters and sorting.
  • 👩‍💻 Developer access via Apify datasets and Python-based architecture.
  • 🔒 Public-data only design with optional data cleansing controls.
  • 💸 Cost-effective alternative to brittle browser plugins or one-off scripts.
  • 🔗 Schedule, export, and integrate with existing analytics or ETL workflows.

Bottom line: This is a production-grade woocommerce web scraper—more reliable than ad‑hoc extensions and tuned for clean, reusable datasets.

Is it legal to scrape WooCommerce stores?

Yes, when used responsibly. This actor accesses publicly available endpoints on WooCommerce/WordPress sites and does not log in or access private data. Follow these guidelines:

  • Scrape only public data and respect the website’s robots.txt and terms of service.
  • Avoid collecting personal or sensitive information.
  • Ensure your use complies with regulations like GDPR/CCPA and your organization’s policies.
  • Consult your legal team for edge cases or jurisdiction-specific requirements.

Input parameters & output format

Example JSON input

{
  "startUrls": [
    "https://shop.example.com",
    "https://another-woo.example.org"
  ],
  "resource": "products",
  "limit": 100,
  "include_variations": false,
  "format": "md",
  "sort": "price",
  "order": "desc",
  "search": "running shoes",
  "sku": "SKU-1001,SKU-2002",
  "rating": "4,5",
  "min_price": 10,
  "max_price": 200,
  "tax_class": "standard",
  "category": "12,34",
  "tag": "summer",
  "product_type": "simple",
  "status": "publish",
  "stock": "instock",
  "featured": false,
  "sale": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  },
  "dev_proxy_config": "http://user:pass@proxy.example.com:8000",
  "dev_custom_headers": "[{\"name\": \"X-Debug\", \"value\": \"true\"}]",
  "dev_custom_cookies": "[{\"name\": \"session\", \"value\": \"abc123\"}]",
  "dev_transform_fields": "name,prices.price,images.0",
  "dev_dataset_name": "data-{ACTOR}-{DATE}-{TIME}",
  "dev_dataset_clear": false,
  "dev_no_strip": false,
  "dev_fileupload": "https://example.com/woo-stores.txt"
}
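
An input like the one above can also be submitted from Python. The sketch below uses the apify-client package; the Actor ID and API token are values you must supply yourself (they are not stated on this page), and build_run_input and run_scraper are hypothetical helper names.

```python
def build_run_input(store_urls, **filters):
    """Assemble an input object using the field names documented here."""
    run_input = {"startUrls": list(store_urls), "resource": "products"}
    # Drop unset filters so the actor falls back to its defaults.
    run_input.update({k: v for k, v in filters.items() if v is not None})
    return run_input


def run_scraper(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run and return its dataset items (requires apify-client)."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example call (not executed here; actor_id is a placeholder to replace
# with the ID shown on the Actor's page in Apify Console):
# items = run_scraper(
#     token="<YOUR_APIFY_TOKEN>",
#     actor_id="<username>/<actor-name>",
#     run_input=build_run_input(["https://shop.example.com"], limit=100),
# )
```

Keeping the input builder separate from the network call makes it easy to check the input before spending a run on it.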

All input fields

  • startUrls
    • Type: array
    • Required: yes
    • Default: none (prefill: ["https://woocommerce.com"])
    • Description: Store URLs to scrape. The url field is accepted as an alternative array of store URLs.
  • url
    • Type: array
    • Required: no
    • Default: none
    • Description: Alternative to startUrls: array of store URLs to scrape.
  • limit
    • Type: integer
    • Required: no
    • Default: 20 (min 1, max 1000)
    • Description: Number of results (per-query)
  • resource
    • Type: string
    • Required: no
    • Default: products
    • Description: Select resource type to scrape (products, categories, brands, tags, attributes, reviews, pages, posts, comments, post-categories, post-tags, users)
  • include_variations
    • Type: boolean
    • Required: no
    • Default: false
    • Description: Include product variations in results
  • format
    • Type: string
    • Required: no
    • Default: md
    • Description: Output format for description and short_description: md (Markdown, default), text, or html.
  • sort
    • Type: string
    • Required: no
    • Default: date
    • Description: Sort results by attribute
  • order
    • Type: string
    • Required: no
    • Default: (empty = Auto)
    • Description: Order sort direction
  • search
    • Type: string
    • Required: no
    • Default: none
    • Description: Limit results to those matching a string.
  • sku
    • Type: string
    • Required: no
    • Default: none
    • Description: Limit result set to products with specific SKU(s). Use commas to separate.
  • rating
    • Type: string
    • Required: no
    • Default: none
    • Description: Filter by product ratings. Enter comma-separated rating values (e.g., 1,2,3,4,5)
  • min_price
    • Type: integer
    • Required: no
    • Default: none
    • Description: Limit result set to products based on a minimum price.
  • max_price
    • Type: integer
    • Required: no
    • Default: none
    • Description: Limit result set to products based on a maximum price.
  • tax_class
    • Type: string
    • Required: no
    • Default: (empty = Any)
    • Description: Limit result set to products with a specific tax class.
  • category
    • Type: string
    • Required: no
    • Default: none
    • Description: Product Category ID(s) separated by comma
  • tag
    • Type: string
    • Required: no
    • Default: none
    • Description: Product Tag ID(s) separated by comma
  • product_type
    • Type: string
    • Required: no
    • Default: (empty = Any)
    • Description: Products assigned a specific type
  • status
    • Type: string
    • Required: no
    • Default: (empty = Any)
    • Description: Filter by product status
  • stock
    • Type: string
    • Required: no
    • Default: (empty = Any)
    • Description: Filter by stock status
  • featured
    • Type: boolean
    • Required: no
    • Default: false
    • Description: Featured products
  • sale
    • Type: boolean
    • Required: no
    • Default: false
    • Description: Products on sale
  • proxyConfiguration
    • Type: object
    • Required: no
    • Default: none (prefill: {"useApifyProxy": false})
    • Description: Choose which proxies to use. By default, no proxy is used.
  • dev_proxy_config
    • Type: string
    • Required: no
    • Default: none
    • Description: Custom proxy URL. Supported protocols: HTTP(S) and SOCKS5. Format: {http|socks5}://{user:pass}@{hostname|ip-address}:port. Example: socks5://example.com:9000
  • dev_custom_headers
    • Type: string
    • Required: no
    • Default: none
    • Description: Additional HTTP Headers as JSON array. Example: [{"name": "Authorization", "value": "Bearer token"}]
  • dev_custom_cookies
    • Type: string
    • Required: no
    • Default: none
    • Description: Additional HTTP Cookies as JSON array. Example: [{"name": "session", "value": "abc123"}]
  • dev_transform_fields
    • Type: string
    • Required: no
    • Default: none
    • Description: Transform the resulting output. Enter comma-separated field paths. For nested object use DOT (e.g., address.streetAddress). For nested array use NUMBER (e.g., images.0.url).
  • dev_dataset_name
    • Type: string
    • Required: no
    • Default: none
    • Description: Save results into custom named Dataset; supports masks {ACTOR}, {DATE}, {TIME}.
  • dev_dataset_clear
    • Type: boolean
    • Required: no
    • Default: false
    • Description: Clear Dataset before insert/update.
  • dev_no_strip
    • Type: boolean
    • Required: no
    • Default: false
    • Description: Disable data cleansing; keep/save empty values (NULL, FALSE, empty ARRAY/OBJECT/STRING).
  • dev_fileupload
    • Type: string
    • Required: no
    • Default: none
    • Description: URL of an uploaded file that lists additional store domains to scrape. Upload the file, then paste its URL here.
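
Before starting a run, the constraints listed above can be checked locally. This is a minimal sketch based only on the ranges and option values documented on this page; validate_input is a hypothetical helper, and the actor may apply additional or different checks.

```python
RESOURCES = {
    "products", "categories", "brands", "tags", "attributes", "reviews",
    "pages", "posts", "comments", "post-categories", "post-tags", "users",
}
FORMATS = {"md", "text", "html"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    if not (run_input.get("startUrls") or run_input.get("url")):
        problems.append("provide at least one store URL in startUrls or url")
    limit = run_input.get("limit", 20)
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(limit, int) or isinstance(limit, bool) or not 1 <= limit <= 1000:
        problems.append("limit must be an integer between 1 and 1000")
    if run_input.get("resource", "products") not in RESOURCES:
        problems.append("unknown resource")
    if run_input.get("format", "md") not in FORMATS:
        problems.append("format must be md, text, or html")
    low, high = run_input.get("min_price"), run_input.get("max_price")
    if low is not None and high is not None and low > high:
        problems.append("min_price must not exceed max_price")
    return problems
```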

Example JSON output (products)

{
  "url": "https://shop.example.com/product/example-product/",
  "id": 12345,
  "name": "Example Product",
  "slug": "example-product",
  "parent": 0,
  "type": "simple",
  "variation": "",
  "sku": "EX-001",
  "short_description": "A lightweight, breathable running shoe.",
  "description": "## Highlights\n- Mesh upper\n- Cushioned midsole\n\nGreat for daily runs.",
  "on_sale": false,
  "prices": {
    "price": "29.99",
    "regular_price": "29.99",
    "sale_price": "0",
    "price_range": null,
    "currency_code": "USD",
    "currency_symbol": "$",
    "currency_minor_unit": 2,
    "currency_decimal_separator": ".",
    "currency_thousand_separator": ",",
    "currency_prefix": "$",
    "currency_suffix": ""
  },
  "average_rating": "4.5",
  "review_count": 12,
  "images": [],
  "categories": [],
  "tags": [],
  "brands": [],
  "attributes": [],
  "variations": [],
  "grouped_products": [],
  "has_options": false,
  "is_purchasable": true,
  "is_in_stock": true,
  "is_on_backorder": false,
  "low_stock_remaining": null,
  "sold_individually": false,
  "stock_availability": {
    "class": "in-stock",
    "text": ""
  },
  "add_to_cart": {
    "minimum": 1,
    "maximum": 10,
    "multiple_of": 1,
    "single_text": "Add to cart",
    "url": ""
  },
  "extensions": {
    "bundles": [],
    "compatibility": {
      "host": "",
      "host_slug": "",
      "vendor_id": 0,
      "version": ""
    },
    "dependencies": {
      "plugins": [],
      "themes": []
    },
    "express": {
      "plans": ""
    },
    "marketplace": {
      "slug": ""
    }
  },
  "store": "https://shop.example.com",
  "resource_type": "products"
}

Note: For non-product resources (e.g., pages, posts, categories), the actor returns the original API object structure for that resource and adds store and resource_type. If dev_transform_fields is used, output will include only the requested fields.
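
The dev_no_strip option listed among the input fields controls whether empty values are removed from output records. A minimal sketch of such cleansing is shown below, assuming it is applied recursively and treats exactly the value types named in the option description as empty. The actor's real behavior may differ in detail.

```python
def strip_empty(value):
    """Recursively drop None, False, and empty strings, lists, and dicts.

    Zero is kept: only the value types named in the option description
    are treated as empty in this sketch.
    """
    def is_empty(v) -> bool:
        return v is None or v is False or v == "" or v == [] or v == {}

    if isinstance(value, dict):
        cleaned = {k: strip_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if not is_empty(v)}
    if isinstance(value, list):
        cleaned = [strip_empty(v) for v in value]
        return [v for v in cleaned if not is_empty(v)]
    return value
```

Nested containers are cleaned first, so an object that holds only empty values disappears together with its contents.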

FAQ

Does this work with all WooCommerce stores?

It works with WooCommerce sites that expose public REST/Store API endpoints, which is the standard WooCommerce setup. No login or cookies are needed to scrape public data.

Can I scrape WooCommerce product variants?

Yes. Set include_variations to true to include variation items in the results. Leave it false to exclude variations and keep only parent/simple products.
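
When variations are included, it can be handy to group them under their parent product. The sketch below assumes that a variation's parent field holds the parent product ID and that other products have parent set to 0, as in the example output; verify this against your own dataset.

```python
from collections import defaultdict


def group_variations(items: list[dict]) -> dict[int, list[dict]]:
    """Map each parent product ID to its variation records.

    Assumes a variation's `parent` field holds the parent product ID
    and that non-variation products have `parent` equal to 0.
    """
    grouped: dict[int, list[dict]] = defaultdict(list)
    for item in items:
        parent_id = item.get("parent") or 0
        if parent_id:
            grouped[parent_id].append(item)
    return dict(grouped)
```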

How many products can I scrape per run?

The limit parameter supports up to 1000 results per query. The actor paginates internally and stops at your limit. For larger catalogs, run multiple jobs with different filters (for example, one per category or price range).
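
How pagination can respect a limit is sketched below. The page and per_page query parameters follow the WooCommerce Store API product listing; the page size of 100 and the page_plan helper are assumptions for illustration, not a description of the actor's internals.

```python
import math


def page_plan(limit: int, per_page: int = 100) -> list[dict]:
    """List the page requests needed to collect up to `limit` results."""
    if limit < 1:
        return []
    per_page = min(per_page, limit)  # do not request more than needed
    pages = math.ceil(limit / per_page)
    plan = []
    for page in range(1, pages + 1):
        remaining = limit - (page - 1) * per_page
        plan.append({
            "params": {"page": page, "per_page": per_page},
            "keep": min(per_page, remaining),  # trim the final page to the limit
        })
    return plan
```

With a limit of 250 this yields three requests, and only the first 50 items of the last page are kept.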

Is this a WooCommerce scraper plugin I install on WordPress?

No. This is not a WordPress plugin. It is a cloud-based Apify actor written in Python that scrapes WooCommerce stores from the outside, with no installation on, or admin access to, the target site.

Can I filter by price, rating, stock, or SKU?

Yes. You can filter using min_price, max_price, rating, stock, sku, and more (tax_class, category, tag, product_type, status, featured, sale). Sorting is available via sort and order.

Does it export images and categories?

Yes. The products output includes images, categories, and tags as provided by the WooCommerce Store API, so it suits image collection and product feed workflows.

Can I use it with Python or an API?

Yes. The actor is implemented in Python and runs on Apify. You can access results via the Apify Dataset API and feed them into your own Python code or data pipelines.
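
As an illustration of the Dataset API route, the sketch below builds the download URL for a dataset's items. The dataset ID comes from your own run; dataset_items_url is a hypothetical helper, and private datasets additionally require your API token.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id: str, fmt: str = "json", fields: tuple = ()) -> str:
    """Build an Apify Dataset API URL for downloading run results."""
    params = {"format": fmt, "clean": "true"}
    if fields:
        # Restrict the export to selected top-level fields.
        params["fields"] = ",".join(fields)
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

Passing fmt="csv" returns the same items as a CSV file, which matches the export options available in Apify Console.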

What happens if a store blocks my requests?

The actor automatically falls back from no proxy to datacenter proxies and then to residential proxies with retries. This smart proxy strategy improves success rates on stricter stores.

Is it legal to scrape WooCommerce data?

Yes, when done responsibly. The actor collects only publicly available data and does not access private or password-protected content. Ensure compliance with website terms and regulations (e.g., GDPR/CCPA) for your use case.

Can it scrape content beyond products?

Yes. Set resource to categories, brands, tags, attributes, reviews, pages, posts, comments, post-categories, post-tags, or users to collect broader WordPress/WooCommerce data.
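
For orientation, the sketch below maps most of these resource names to public WooCommerce Store API and WordPress REST API routes. The routes themselves are standard, but which route the actor uses for each resource is an assumption, and brands is left out because its route is not confirmed here.

```python
from urllib.parse import urljoin

# Assumed mapping from the actor's resource names to public API routes.
RESOURCE_ROUTES = {
    "products": "wp-json/wc/store/v1/products",
    "categories": "wp-json/wc/store/v1/products/categories",
    "tags": "wp-json/wc/store/v1/products/tags",
    "attributes": "wp-json/wc/store/v1/products/attributes",
    "reviews": "wp-json/wc/store/v1/products/reviews",
    "pages": "wp-json/wp/v2/pages",
    "posts": "wp-json/wp/v2/posts",
    "comments": "wp-json/wp/v2/comments",
    "post-categories": "wp-json/wp/v2/categories",
    "post-tags": "wp-json/wp/v2/tags",
    "users": "wp-json/wp/v2/users",
}


def endpoint_url(store: str, resource: str) -> str:
    """Join a store base URL with the assumed route for a resource."""
    route = RESOURCE_ROUTES[resource]  # raises KeyError for unmapped resources
    return urljoin(store.rstrip("/") + "/", route)
```

Opening one of these URLs in a browser is a quick way to check whether a store exposes the endpoint before scheduling a run.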

Final thoughts

Woocommerce Scraper is built for fast, reliable WooCommerce product data extraction at scale. With robust proxy handling, rich filters, multi-resource support, and flexible output formatting, it empowers marketers, developers, analysts, and researchers to build accurate datasets for pricing, inventory, and SEO. Export results to CSV/JSON from the Apify dataset, schedule automated runs, and plug the outputs into your pipelines. Start automating your woocommerce product scraper workflow today and extract smarter, cleaner data for your next project.