🛍️ Shopify Product Scraper

Pricing

Pay per usage

Extract product data from any Shopify-powered store. This lightweight, general-purpose tool is optimized for speed and gathers prices, variants, and images. Residential proxies are highly recommended to keep runs stable and reduce the risk of IP blocking.

Rating: 0.0 (0)

Developer: Shahid Irfan

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: an hour ago

Shopify Product Scraper

Extract comprehensive product data from any Shopify store with high speed and reliability. This scraper supports multiple extraction methods including JSON API (recommended), JSON-LD structured data, and HTML parsing to ensure maximum compatibility across all Shopify stores.

Why Choose This Shopify Scraper?

  • Multiple Extraction Methods - Automatically selects the best method: JSON API (fastest), JSON-LD, or HTML parsing
  • Complete Product Data - Extracts titles, prices, variants, images, descriptions, SKUs, inventory, and more
  • Smart Pagination - Automatically handles pagination to scrape entire collections
  • Variant Support - Captures all product variants (sizes, colors, styles) with individual pricing
  • Customizable Filtering - Control stock status, collection targeting, and result limits
  • High Performance - Optimized for speed without compromising data quality
  • Proxy Support - Built-in proxy rotation to avoid rate limiting

Features

  • Scrape products from any Shopify store
  • Extract data from specific collections or search results
  • Support for product variants (sizes, colors, options)
  • Automatic pagination handling
  • JSON API priority for maximum speed
  • HTML parsing fallback for compatibility
  • Structured data extraction (JSON-LD)
  • Filter by stock availability
  • Customizable result limits
  • Proxy configuration support

Input Configuration

The scraper accepts the following input parameters:

Required Parameters

Parameter | Type | Description
--- | --- | ---
shopUrl | String | The base URL of the Shopify store (e.g., https://www.allbirds.com or allbirds.com)
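
Since shopUrl may be given with or without a scheme, the following is a minimal sketch of how such a value could be normalized before use. The helper `normalize_shop_url` is hypothetical and not part of the scraper; defaulting a bare domain to HTTPS is an assumption.

```python
from urllib.parse import urlsplit


def normalize_shop_url(shop_url: str) -> str:
    """Return a base URL such as 'https://www.allbirds.com' (hypothetical helper)."""
    candidate = shop_url.strip()
    # Assumption: a bare domain such as 'allbirds.com' should default to HTTPS.
    if "://" not in candidate:
        candidate = "https://" + candidate
    parts = urlsplit(candidate)
    if not parts.netloc:
        raise ValueError(f"Not a valid shop URL: {shop_url!r}")
    # Keep only scheme and host so that endpoint paths can be appended later.
    return f"{parts.scheme}://{parts.netloc}"


print(normalize_shop_url("allbirds.com"))
print(normalize_shop_url("https://www.allbirds.com/collections/mens"))
```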

Optional Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
startUrls | Array | - | Specific URLs to scrape (collection pages, product pages). Overrides shopUrl if provided.
collection | String | "all" | Collection handle to scrape (e.g., "mens-shoes", "new-arrivals"). Use "all" for all products.
searchQuery | String | - | Search for products matching this query instead of scraping a collection.
maxProducts | Integer | 100 | Maximum number of products to scrape. Set to 0 or leave empty for unlimited.
maxPages | Integer | 999 | Maximum number of pages to crawl. Safety limit to prevent infinite loops.
includeVariants | Boolean | true | Include all product variants (sizes, colors, etc.) in the output.
includeOutOfStock | Boolean | true | Include products that are currently out of stock.
proxyConfiguration | Object | Residential | Proxy settings. Residential proxies recommended for best results.
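
The filtering parameters above can be read as the following sketch. It is an illustration of the documented semantics, not the scraper's actual code: `apply_filters` is a hypothetical function, and it assumes the `available` flag from the output is what includeOutOfStock checks.

```python
from __future__ import annotations

from typing import Any


def apply_filters(
    products: list[dict[str, Any]],
    include_out_of_stock: bool = True,
    max_products: int | None = 100,
) -> list[dict[str, Any]]:
    """Illustrate includeOutOfStock and maxProducts as described in the table."""
    kept = []
    for product in products:
        # Assumption: 'available' is the stock flag used for filtering.
        if not include_out_of_stock and not product.get("available", False):
            continue
        kept.append(product)
        # 0 or None means "unlimited", per the maxProducts description.
        if max_products and len(kept) >= max_products:
            break
    return kept


sample = [
    {"id": 1, "available": True},
    {"id": 2, "available": False},
    {"id": 3, "available": True},
]
print([p["id"] for p in apply_filters(sample, include_out_of_stock=False, max_products=0)])
```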

Input Examples

Example 1: Scrape All Products from a Store

{
  "shopUrl": "https://www.allbirds.com",
  "collection": "all",
  "maxProducts": 100
}

Example 2: Scrape Specific Collection

{
  "shopUrl": "https://gymshark.com",
  "collection": "mens-clothing",
  "maxProducts": 50,
  "includeVariants": true
}

Example 3: Search Query

{
  "shopUrl": "https://www.fashionnova.com",
  "searchQuery": "black dress",
  "maxProducts": 30,
  "includeOutOfStock": false
}

Example 4: Multiple URLs

{
  "startUrls": [
    "https://store1.myshopify.com/collections/summer",
    "https://store2.myshopify.com/collections/winter"
  ],
  "maxProducts": 100
}

Output Format

The scraper returns structured data in JSON format. Each product contains the following fields:

Output Schema

{
  "id": 1234567890,
  "title": "Wool Runner - Natural Grey",
  "handle": "wool-runner-natural-grey",
  "description": "Product description text...",
  "vendor": "Allbirds",
  "product_type": "Shoes",
  "tags": ["sustainable", "comfortable", "casual"],
  "price": 98.00,
  "compare_at_price": 120.00,
  "currency": "USD",
  "available": true,
  "inventory_quantity": 45,
  "sku": "WR-NG-10",
  "barcode": "123456789012",
  "weight": 500,
  "weight_unit": "g",
  "images": [
    "https://cdn.shopify.com/s/files/1/image1.jpg",
    "https://cdn.shopify.com/s/files/1/image2.jpg"
  ],
  "variants": [
    {
      "id": 987654321,
      "title": "Size 10 / Natural Grey",
      "option1": "10",
      "option2": "Natural Grey",
      "option3": null,
      "price": 98.00,
      "compare_at_price": 120.00,
      "sku": "WR-NG-10",
      "available": true,
      "inventory_quantity": 15
    }
  ],
  "url": "https://www.allbirds.com/products/wool-runner-natural-grey",
  "created_at": "2023-01-15T10:30:00Z",
  "updated_at": "2024-12-18T08:20:00Z",
  "published_at": "2023-01-20T09:00:00Z"
}

Key Output Fields

Field | Type | Description
--- | --- | ---
id | Integer | Unique product identifier from Shopify
title | String | Product name/title
handle | String | URL-friendly product identifier
description | String | Product description (may include HTML)
vendor | String | Brand or manufacturer name
product_type | String | Product category or type
tags | Array | Product tags for categorization
price | Number | Current product price
compare_at_price | Number | Original price (before discount)
currency | String | Price currency code (USD, EUR, etc.)
available | Boolean | Whether product is in stock
inventory_quantity | Integer | Available stock quantity
sku | String | Stock keeping unit identifier
images | Array | Array of product image URLs
variants | Array | All product variants (sizes, colors, etc.)
url | String | Direct link to product page
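
As an illustration of working with these fields, the sketch below derives a discount percentage from price and compare_at_price and flattens a product into one row per variant. Both function names are hypothetical; the field names come from the output schema above.

```python
def discount_percent(price, compare_at_price):
    """Percentage off the original price, or None when no discount applies."""
    if not compare_at_price or compare_at_price <= price:
        return None
    return round((compare_at_price - price) / compare_at_price * 100, 1)


def variant_rows(product):
    """Flatten one product into one row per variant (e.g. for a price table)."""
    rows = []
    for variant in product.get("variants", []):
        rows.append({
            "product_id": product["id"],
            "variant_id": variant["id"],
            "title": f'{product["title"]} ({variant["title"]})',
            "price": variant["price"],
            "discount_percent": discount_percent(
                variant["price"], variant.get("compare_at_price")
            ),
        })
    return rows


product = {
    "id": 1234567890,
    "title": "Wool Runner - Natural Grey",
    "variants": [
        {"id": 987654321, "title": "Size 10 / Natural Grey",
         "price": 98.00, "compare_at_price": 120.00},
    ],
}
print(variant_rows(product)[0]["discount_percent"])  # 18.3
```

With the sample values, (120.00 - 98.00) / 120.00 is about 18.3%.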

Use Cases

This Shopify scraper is perfect for:

  • Price Monitoring - Track competitor pricing and detect price changes
  • Market Research - Analyze product catalogs and market trends
  • Inventory Tracking - Monitor stock levels and availability
  • Product Database - Build comprehensive product databases for comparison sites
  • Competitive Analysis - Analyze competitor product offerings and pricing strategies
  • Dropshipping - Find products for dropshipping businesses
  • Data Analytics - Gather data for business intelligence and analytics
  • SEO Analysis - Study product descriptions and metadata

How It Works

The scraper uses an intelligent multi-method approach:

  1. JSON API Method (Primary)

    Attempts to fetch product data directly from Shopify's JSON API endpoints (/products.json, /collections/{handle}/products.json). This is the fastest and most reliable method.

  2. JSON-LD Extraction (Secondary)

    If JSON API is unavailable, extracts structured data from JSON-LD schema markup embedded in HTML pages.

  3. HTML Parsing (Fallback)

    As a last resort, parses product information directly from HTML using intelligent selectors that work across different Shopify themes.
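
The three-step order above can be sketched as a simple fallback chain. This is an illustration under stated assumptions, not the scraper's implementation: all function and class names are hypothetical, the `limit` and `page` query parameters are assumed to be accepted by the JSON endpoint, and the network calls are replaced by stand-in callables so the example runs offline.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from typing import Callable


def products_json_url(shop_url: str, collection: str = "all", page: int = 1) -> str:
    """Build the JSON endpoint for the whole store or for one collection."""
    base = shop_url.rstrip("/")
    if collection == "all":
        path = "/products.json"
    else:
        path = f"/collections/{collection}/products.json"
    # Assumption: the endpoint accepts 'limit' and 'page' query parameters.
    return f"{base}{path}?limit=250&page={page}"


class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_json_ld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._in_json_ld:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False


def products_from_json_ld(html: str) -> list[dict]:
    """Return the JSON-LD objects whose @type is 'Product'."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of failing the page
        items = data if isinstance(data, list) else [data]
        found.extend(i for i in items if isinstance(i, dict) and i.get("@type") == "Product")
    return found


def first_successful(
    methods: list[tuple[str, Callable[[], list[dict]]]]
) -> tuple[str, list[dict]]:
    """Try each method in priority order; return the first non-empty result."""
    for name, method in methods:
        try:
            products = method()
        except Exception:
            continue  # a failed method falls through to the next one
        if products:
            return name, products
    return "none", []


html = '<script type="application/ld+json">{"@type": "Product", "name": "Wool Runner"}</script>'
name, products = first_successful([
    ("json_api", lambda: []),  # stand-in: pretend the JSON API returned nothing
    ("json_ld", lambda: products_from_json_ld(html)),
])
print(name, products)
```

Passing the methods as a list of named callables keeps the priority order explicit and makes it easy to add the HTML-parsing fallback as a third entry.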

Performance and Limits

Metric | Value
--- | ---
Average Speed | 50-200 products per minute (depends on method and store)
Recommended Memory | 2048 MB
Timeout | 3600 seconds (1 hour)
Max Concurrency | 5 requests
Retry Attempts | 3 per request
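
To see how the speed range and the timeout interact, here is a rough estimate in code. It assumes throughput stays constant within the quoted 50-200 products per minute and ignores startup time and retries, so treat the numbers as a planning aid only; both function names are hypothetical.

```python
def estimated_minutes(product_count: int) -> tuple[float, float]:
    """Best- and worst-case runtime in minutes at 200 and 50 products/minute."""
    fastest, slowest = 200, 50  # products per minute, from the table above
    return product_count / fastest, product_count / slowest


def fits_in_timeout(product_count: int, timeout_seconds: int = 3600) -> bool:
    """True if even the slowest quoted rate finishes before the timeout."""
    _, worst_case = estimated_minutes(product_count)
    return worst_case * 60 <= timeout_seconds


print(estimated_minutes(1000))  # (5.0, 20.0)
print(fits_in_timeout(3000))    # True: 3000 / 50 = 60 minutes
print(fits_in_timeout(3001))    # False
```

Under these assumptions, a single run can be expected to cover roughly 3,000 products at the slowest rate and up to 12,000 at the fastest before hitting the one-hour timeout.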

Best Practices

  • Automatic Method Selection - The scraper automatically chooses the fastest available method (JSON API → JSON-LD → HTML parsing)
  • Enable Proxies - Always use residential proxies for large-scale scraping
  • Set Reasonable Limits - Use maxProducts to control cost and runtime
  • Include Variants - Set includeVariants: true for complete product data
  • Handle Rate Limits - Use proxy rotation to avoid rate limiting
  • Test First - Start with small maxProducts values to test configuration

Troubleshooting

Common Issues and Solutions

Data Export Options

Export your scraped data in multiple formats:

  • JSON - Structured data format, ideal for APIs and applications
  • CSV - Spreadsheet format, perfect for Excel and data analysis
  • Excel - Native Excel format with formatting preserved
  • HTML - Human-readable table format
  • XML - Structured markup format
  • RSS - Feed format for automated monitoring
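
For programmatic export, the Apify API serves dataset items in these formats through a `format` query parameter. The sketch below only builds the download URL; `dataset_export_url` is a hypothetical helper, the dataset ID shown is a placeholder, and a private dataset would additionally need an API token.

```python
from urllib.parse import urlencode

# Values for the 'format' query parameter (Excel is requested as 'xlsx').
EXPORT_FORMATS = {"json", "csv", "xlsx", "html", "xml", "rss"}


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the dataset items URL for one of the export formats listed above."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"


# 'abc123' is a placeholder, not a real dataset ID.
print(dataset_export_url("abc123", "csv"))
```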

Integration Options

Integrate the scraper with other tools and platforms:

  • Schedule regular runs using Apify Scheduler
  • Connect to Google Sheets for automatic data updates
  • Integrate with webhooks for real-time notifications
  • Use Apify API for programmatic access
  • Connect to Zapier, Make, or other automation platforms
  • Export to databases (PostgreSQL, MongoDB, etc.)
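
As one example of programmatic access, the sketch below starts a run with the `apify-client` Python package and reads the results. The helper names are hypothetical, and the actor ID and token are left for you to supply, since neither is stated in this document.

```python
def build_run_input(shop_url, collection="all", max_products=100, **extra):
    """Assemble actor input using the parameter names from Input Configuration."""
    run_input = {
        "shopUrl": shop_url,
        "collection": collection,
        "maxProducts": max_products,
    }
    run_input.update(extra)  # e.g. includeVariants=True
    return run_input


def run_scraper(token, actor_id, run_input):
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input("https://www.allbirds.com", max_products=10)
print(run_input)
# items = run_scraper(token, actor_id, run_input)  # supply your own token and actor ID
```

Starting with a small maxProducts value, as here, keeps a first test run cheap.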

Cost Optimization Tips

  • Use maxProducts to limit the number of results
  • Set includeOutOfStock: false to skip unavailable products
  • Use datacenter proxies for stores without strict rate limits (cheaper than residential)
  • Schedule scraping during off-peak hours
  • Leverage the dataset cache to avoid re-scraping recent data

Privacy and Ethics

  • This scraper only collects publicly available product information
  • Always respect website terms of service
  • Use reasonable rate limits to avoid overloading target servers
  • Only scrape data you have permission to access
  • Comply with data protection regulations (GDPR, CCPA, etc.)

Support

Need help or have questions?

Updates and Changelog

The scraper is regularly updated to:

  • Maintain compatibility with Shopify platform changes
  • Improve performance and reliability
  • Add new features based on user feedback
  • Fix bugs and resolve issues
  • Enhance data extraction accuracy

Technical Requirements

  • Node.js 22 or higher
  • Apify platform account
  • Proxy configuration (recommended for production use)
  • Minimum 2048 MB memory allocation

Related Actors

  • Shopify Store Finder
  • Shopify Collection Scraper
  • E-commerce Price Monitor
  • Product Review Scraper

Built with ❤️ for the Apify community
Happy scraping!