Shopify Products Scraper Pro

Pricing

from $0.50 / 1,000 products

Extract product data from any public Shopify store using Shopify's public JSON endpoints. Get products, variants, prices, inventory, images, and metadata with no authentication required. A fast and cost-effective option for e-commerce intelligence and competitor analysis.

Developer

Normalize

Maintained by Community

Extract comprehensive product data from any public Shopify store using Shopify's public JSON endpoints. The actor is a fast, reliable, and cost-effective option for e-commerce data extraction, competitor analysis, and market research.

What Does This Actor Do?

Shopify Products Scraper Pro extracts product information from Shopify stores without requiring authentication or API keys. It collects structured data including product details, variants, prices, inventory levels, images, and metadata, which suits e-commerce intelligence, dropshipping, price monitoring, and market analysis.

Why Choose This Scraper?

The actor reads Shopify's public JSON endpoints instead of parsing rendered HTML, so its output does not depend on a store's theme or page markup.

Key Features:

  • Uses Shopify's public JSON endpoints rather than HTML scraping
  • Works on any public Shopify store
  • No authentication required
  • High accuracy and reliability
  • Automatic pagination and retry logic
  • Respectful rate limiting

Use Cases

E-commerce Intelligence:

  • Competitor product analysis and pricing research
  • Market trend identification and category analysis
  • Product catalog monitoring and updates

Business Operations:

  • Dropshipping supplier inventory tracking
  • Price comparison platform data collection
  • Product database enrichment and synchronization

Market Research:

  • Industry product trends analysis
  • Vendor and brand comparison
  • Seasonal catalog changes tracking

Input Configuration

Required Parameters

storeDomain (String)

  • The Shopify store domain to scrape
  • Example: gymshark.com or store.myshopify.com
  • Do not include https:// or paths
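
The domain rule above can be enforced before a run starts. The following is a minimal sketch, assuming a hypothetical helper named normalize_store_domain (not part of the actor) that reduces pasted URLs to a bare host name:

```python
from urllib.parse import urlparse


def normalize_store_domain(raw: str) -> str:
    """Reduce user input such as a full URL to a bare host name.

    Hypothetical helper for illustration; the actor's own validation
    may behave differently.
    """
    value = raw.strip()
    # urlparse only recognizes the host when a scheme or a leading
    # "//" is present, so add "//" to bare domains.
    if "://" not in value:
        value = "//" + value
    # .hostname drops any port and path and is already lowercased.
    return urlparse(value).hostname or ""
```

For example, normalize_store_domain("https://gymshark.com/collections/mens") yields "gymshark.com", which is the form storeDomain expects.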

Optional Parameters

mode (String)

  • Scraping mode: all, collection, or handles
  • Default: all
  • all: Scrape all products from the store
  • collection: Scrape products from a specific collection
  • handles: Scrape specific products by handle

collectionHandle (String)

  • Collection handle to scrape (required when mode is collection)
  • Example: mens, sale, new-arrivals
  • Find handle in collection URL: /collections/HANDLE

productHandles (Array)

  • Array of product handles or URLs (required when mode is handles)
  • Example: ["product-handle", "https://store.com/products/product-handle"]
  • Can mix handles and full URLs
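
Since productHandles accepts both bare handles and full URLs, each entry has to be reduced to a handle at some point. A possible approach, sketched with a hypothetical to_product_handle function (the actor's real parsing is not documented here):

```python
from urllib.parse import urlparse


def to_product_handle(entry: str) -> str:
    """Return the product handle for a bare handle or a product URL.

    Illustrative only; assumes product URLs contain /products/<handle>.
    """
    entry = entry.strip()
    if "://" not in entry:
        # Already a handle; tolerate stray slashes.
        return entry.strip("/")
    segments = [s for s in urlparse(entry).path.split("/") if s]
    for i, segment in enumerate(segments[:-1]):
        if segment == "products":
            handle = segments[i + 1]
            # Accept links that point at the JSON endpoint as well.
            return handle[:-5] if handle.endswith(".json") else handle
    raise ValueError(f"No /products/<handle> segment in {entry!r}")
```

Query strings such as ?variant=123 are ignored because only the URL path is inspected.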

includeVariants (Boolean)

  • Include detailed variant information
  • Default: true
  • Set to false to reduce output size

includeImages (Boolean)

  • Include product image details
  • Default: true
  • Set to false to reduce output size

maxProducts (Integer)

  • Maximum number of products to scrape
  • Default: Unlimited
  • Useful for testing or sampling

maxConcurrency (Integer)

  • Number of concurrent requests
  • Default: 5
  • Range: 1 to 20
  • Higher values speed up scraping but use more resources

proxyConfiguration (Object)

  • Apify proxy configuration
  • Recommended for large-scale scraping
  • Example: {"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}
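
The parameter rules above (required fields per mode, the concurrency range) can be checked locally before starting a run. This is a minimal sketch; validate_input is a hypothetical name and the error messages are illustrative, not the actor's own:

```python
def validate_input(cfg: dict) -> list:
    """Return a list of problems found in an input configuration."""
    errors = []
    if not cfg.get("storeDomain"):
        errors.append("storeDomain is required")

    mode = cfg.get("mode", "all")  # documented default
    if mode not in ("all", "collection", "handles"):
        errors.append("mode must be all, collection, or handles")
    if mode == "collection" and not cfg.get("collectionHandle"):
        errors.append("collectionHandle is required when mode is collection")
    if mode == "handles" and not cfg.get("productHandles"):
        errors.append("productHandles is required when mode is handles")

    concurrency = cfg.get("maxConcurrency", 5)  # documented default
    if not isinstance(concurrency, int) or not 1 <= concurrency <= 20:
        errors.append("maxConcurrency must be an integer from 1 to 20")
    return errors
```

An empty list means the configuration satisfies the documented constraints.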

Input Examples

Example 1: Scrape All Products

{
    "storeDomain": "gymshark.com",
    "mode": "all"
}

Example 2: Scrape Specific Collection

{
    "storeDomain": "gymshark.com",
    "mode": "collection",
    "collectionHandle": "mens"
}

Example 3: Scrape Specific Products

{
    "storeDomain": "gymshark.com",
    "mode": "handles",
    "productHandles": [
        "legacy-tshirt",
        "vital-seamless-leggings"
    ]
}

Example 4: Optimized for Speed

{
    "storeDomain": "bigstore.com",
    "mode": "all",
    "maxConcurrency": 15,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
}

Example 5: Testing Configuration

{
    "storeDomain": "gymshark.com",
    "mode": "all",
    "maxProducts": 50,
    "includeVariants": false,
    "includeImages": false
}

Output Format

The actor outputs structured JSON data with comprehensive product information.

Product Data Structure

{
    "url": "https://store.com/products/product-handle",
    "id": 1234567890,
    "title": "Product Name - Variant",
    "handle": "product-handle",
    "description": "<p>HTML description</p>",
    "descriptionText": "Plain text description",
    "vendor": "Brand Name",
    "productType": "Category",
    "tags": ["tag1", "tag2"],
    "price": 29.99,
    "priceMin": 29.99,
    "priceMax": 39.99,
    "priceVaries": true,
    "compareAtPrice": 49.99,
    "compareAtPriceMin": 49.99,
    "compareAtPriceMax": 49.99,
    "onSale": true,
    "available": true,
    "totalInventory": 500,
    "variantsCount": 8,
    "variants": [...],
    "options": [...],
    "imagesCount": 6,
    "images": [...],
    "featuredImage": "https://cdn.shopify.com/...",
    "createdAt": "2024-01-01T00:00:00Z",
    "updatedAt": "2024-11-12T10:00:00Z",
    "publishedAt": "2024-01-01T12:00:00Z",
    "scrapedAt": "2024-11-12T10:30:00Z"
}
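
The structure above carries both an HTML description and a plain-text descriptionText. How the actor derives the plain text is not stated; one plausible approach, sketched here with the standard library and a hypothetical html_to_text function, is to keep only the text nodes:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores all tags."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html: str) -> str:
    """Strip tags and collapse whitespace (illustrative, not the actor's code)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())
```

Joining nodes with a space keeps block elements apart, at the cost of inserting a space where inline tags split a single word.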

Variant Information

When includeVariants is true, each product includes detailed variant data:

"variants": [
    {
        "id": 9876543210,
        "title": "Small / Black",
        "price": 29.99,
        "compareAtPrice": 49.99,
        "sku": "PROD-SKU-001",
        "barcode": "123456789012",
        "inventoryQuantity": 100,
        "available": true,
        "option1": "Small",
        "option2": "Black",
        "option3": null,
        "weight": 0.2,
        "weightUnit": "kg",
        "requiresShipping": true,
        "taxable": true
    }
]
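
Product-level fields such as priceMin, priceMax, priceVaries, onSale, and totalInventory look like aggregates over the variant list. The sketch below shows how such values could be computed from variants shaped like the example above. The function name summarize_variants and the exact rules (for instance, onSale meaning any variant has a compareAtPrice above its price) are assumptions, not the actor's documented logic:

```python
def summarize_variants(variants: list) -> dict:
    """Derive product-level summary fields from a non-empty variant list."""
    if not variants:
        raise ValueError("a product needs at least one variant")

    prices = [v["price"] for v in variants]
    compare = [v["compareAtPrice"] for v in variants
               if v.get("compareAtPrice") is not None]
    return {
        "priceMin": min(prices),
        "priceMax": max(prices),
        "priceVaries": min(prices) != max(prices),
        "compareAtPriceMin": min(compare) if compare else None,
        "compareAtPriceMax": max(compare) if compare else None,
        # Assumed rule: on sale if any variant is discounted.
        "onSale": any(v.get("compareAtPrice") is not None
                      and v["compareAtPrice"] > v["price"] for v in variants),
        "available": any(bool(v.get("available")) for v in variants),
        # Missing or null quantities count as zero.
        "totalInventory": sum(v.get("inventoryQuantity") or 0 for v in variants),
        "variantsCount": len(variants),
    }
```

Recomputing these values from the variants array is also a quick consistency check on scraped records.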

Product Options

"options": [
    {
        "name": "Size",
        "position": 1,
        "values": ["Small", "Medium", "Large", "XL"]
    },
    {
        "name": "Color",
        "position": 2,
        "values": ["Black", "White", "Blue"]
    }
]

Image Information

When includeImages is true:

"images": [
    {
        "id": 3333333333,
        "src": "https://cdn.shopify.com/s/files/1/xxxx/products/image.jpg",
        "alt": "Product Image Description",
        "width": 2048,
        "height": 2048,
        "position": 1
    }
]

Performance

Speed:

  • Average: 500-1000 products per minute
  • Depends on store response time and concurrency settings

Accuracy:

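  • Data is returned as the store publishes it through its JSON endpoints, with no HTML parsing step that could introduce errors
  • Fields a store leaves unpopulated stay empty in the output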

Reliability:

  • Automatic retry on failures with exponential backoff
  • Error handling for network issues and rate limits
  • Success rate: 99%+
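
Retry with exponential backoff, as mentioned above, generally follows the pattern below. This is a generic sketch rather than the actor's implementation; retry_with_backoff, the attempt count, and the base delay are all assumed values:

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The wait doubles after every failure: base_delay, 2x, 4x, ...
    The sleep function is injectable so the logic can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A production version would usually retry only on transient errors (timeouts, HTTP 429 or 5xx responses) and add random jitter to the delay.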

Resource Usage:

  • Memory: Less than 512MB RAM for most jobs
  • Compute: Approximately 0.01 compute units per 1,000 products

Pricing

Cost Estimate:

  • Small store (100 products): ~$0.002
  • Medium store (1,000 products): ~$0.02
  • Large store (10,000 products): ~$0.20
  • Enterprise (100,000 products): ~$2.00

Actual costs depend on compute time and proxy usage.
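
The estimates listed above scale linearly at roughly $0.02 per 1,000 products. A small sketch of that arithmetic follows; the function name and the default rate are taken from or inferred from the list, and proxy traffic is not included:

```python
def estimate_cost_usd(product_count: int, usd_per_thousand: float = 0.02) -> float:
    """Linear cost estimate implied by the figures above (compute only)."""
    return product_count / 1000 * usd_per_thousand
```

For a store with 25,000 products this gives about $0.50 before any proxy costs.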

How It Works

This actor leverages Shopify's public JSON API endpoints available on all Shopify stores:

API Endpoints Used:

  • https://store.com/products.json - Product listing with pagination
  • https://store.com/products/HANDLE.json - Individual product details
  • https://store.com/collections/HANDLE/products.json - Collection products
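
The three endpoint shapes above can be built with a few small helpers. The function names are hypothetical, and the limit and page query parameters are an assumption about how the listing endpoints are paginated; the source only says that pagination is used:

```python
from urllib.parse import quote, urlencode


def products_url(domain: str, page: int = 1, limit: int = 250) -> str:
    """Listing endpoint for all products (limit/page are assumed parameters)."""
    query = urlencode({"limit": limit, "page": page})
    return f"https://{domain}/products.json?{query}"


def product_url(domain: str, handle: str) -> str:
    """Endpoint for a single product identified by its handle."""
    return f"https://{domain}/products/{quote(handle, safe='')}.json"


def collection_products_url(domain: str, handle: str,
                            page: int = 1, limit: int = 250) -> str:
    """Listing endpoint for the products of one collection."""
    query = urlencode({"limit": limit, "page": page})
    return f"https://{domain}/collections/{quote(handle, safe='')}/products.json?{query}"
```

Opening one of these URLs in a browser is a quick way to confirm that a store exposes the endpoint before scheduling a large run.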

Process Flow:

  1. Domain Validation: Verifies the provided domain is a valid Shopify store
  2. Mode Selection: Routes to appropriate scraping strategy (all/collection/handles)
  3. Data Fetching: Makes requests to Shopify JSON endpoints with pagination
  4. Data Processing: Normalizes and enriches product data
  5. Output: Saves structured data to Apify dataset
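
The data fetching step can be pictured as a loop that requests page after page until an empty page comes back or maxProducts is reached. The sketch below assumes a caller-supplied fetch_page function (hypothetical) that returns the list of products for a page number, which keeps the loop independent of any HTTP library:

```python
def iter_products(fetch_page, max_products=None):
    """Yield products page by page.

    fetch_page(page_number) must return a list of product dicts and an
    empty list once there are no more pages.
    """
    yielded = 0
    page = 1
    while max_products is None or yielded < max_products:
        batch = fetch_page(page)
        if not batch:
            return  # past the last page
        for product in batch:
            if max_products is not None and yielded >= max_products:
                return
            yield product
            yielded += 1
        page += 1
```

Because it is a generator, products can be processed and saved as they arrive instead of being held in memory for the whole store.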

Technical Advantages:

  • No HTML parsing - direct JSON API access
  • No CSS selectors that break with theme updates
  • No authentication or API keys required
  • Works on any public Shopify store regardless of plan or theme
  • Consistent data structure across all stores

Best Practices

For Large Stores (10,000+ products)

  1. Enable proxy configuration to avoid rate limiting
  2. Increase concurrency to 10-15 for faster scraping
  3. Consider scraping specific collections instead of entire store
  4. Use maxProducts parameter for initial testing

For Regular Monitoring

  1. Use mode: "collection" for specific categories
  2. Schedule runs during off-peak hours
  3. Store results in named datasets for comparison
  4. Set up webhooks for automated processing
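
Comparing results between runs is the core of a monitoring workflow. As a minimal sketch, assuming two lists of product records with the id, handle, and price fields from the output format, a hypothetical price_changes function could report which prices moved:

```python
def price_changes(previous: list, current: list) -> list:
    """Return products whose price differs between two scrape results."""
    old_prices = {p["id"]: p["price"] for p in previous}
    changes = []
    for product in current:
        old = old_prices.get(product["id"])
        # Products that are new in the current run have no old price.
        if old is not None and old != product["price"]:
            changes.append({
                "id": product["id"],
                "handle": product.get("handle"),
                "old": old,
                "new": product["price"],
            })
    return changes
```

The same pattern extends to availability or inventory fields by comparing a different key.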

For Data Quality

  1. Keep includeVariants: true for complete inventory data
  2. Enable includeImages: true for product catalogs
  3. Use product handles for precise targeting
  4. Verify store domain before large scraping jobs

Troubleshooting

Store Not Found Error

Issue: "Domain does not appear to be a Shopify store"

Solutions:

  • Verify the domain is correct (no typos)
  • Remove https:// and paths from domain
  • Try without www. prefix
  • Ensure the store is publicly accessible (not password-protected)

No Products Returned

Issue: Actor completes but returns empty dataset

Solutions:

  • Verify the store has published products
  • Check if collection handle is correct (try mode: "all" first)
  • Ensure products are not restricted by location/password
  • Check actor logs for specific error messages

Slow Performance

Issue: Actor takes longer than expected

Solutions:

  • Increase maxConcurrency (up to 20)
  • Enable Apify proxy configuration
  • Reduce output size with includeVariants: false
  • Check if store has slow response times

Incomplete Data

Issue: Some products missing fields

Solutions:

  • Some Shopify stores may not populate all fields
  • Check if includeVariants and includeImages are enabled
  • Verify the store's product data in Shopify admin
  • Review actor logs for parsing warnings

Limitations

Technical Limitations:

  • Only scrapes publicly accessible stores
  • Cannot access password-protected stores or products
  • Cannot bypass Shopify Plus wholesale portals
  • Limited by Shopify's public API availability

Data Limitations:

  • Cannot access customer data or order information
  • Cannot retrieve draft or unpublished products
  • Cannot access admin-only product metadata
  • Inventory counts may be cached by Shopify

Rate Limiting:

  • Respects Shopify's fair use guidelines
  • Implements polite crawling (1-2 requests/second)
  • Automatic backoff on rate limit responses
  • Proxy usage recommended for very large stores
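
A request rate of 1-2 per second, as described above, can be enforced by spacing requests a minimum interval apart. The class below is a single-threaded sketch with hypothetical names; the actor's own limiter is not documented, and concurrent use would need a lock:

```python
import time


class MinIntervalLimiter:
    """Spaces calls at least 1 / requests_per_second seconds apart."""

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / requests_per_second
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling limiter.wait() before each request keeps a sequential client at or below the configured rate.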

FAQ

Q: Does this work on all Shopify stores?

A: Yes, it works on any public Shopify store including custom domains and *.myshopify.com stores.

Q: Do I need API credentials or store access?

A: No. The actor uses public JSON endpoints available on public Shopify stores and needs no authentication.

Q: Will I get blocked or rate limited?

A: The actor implements polite crawling with automatic retries. For large-scale scraping, use Apify proxies.

Q: How accurate is the data compared to HTML scraping?

A: The data matches what the store publishes through its JSON endpoints. Reading JSON directly avoids the parsing errors common with HTML scraping.

Q: Can I scrape product reviews or customer data?

A: No, this actor only accesses publicly available product catalog data.

Q: How do I find collection handles?

A: Visit the collection page in your browser. The handle is in the URL: https://store.com/collections/HANDLE

Q: Can I scrape multiple stores in one run?

A: No, configure one store per actor run. Use Apify tasks or schedules for multiple stores.

Q: What happens if a product is deleted during scraping?

A: The actor handles 404 errors gracefully and continues with remaining products.

Explore our complete Shopify scraping suite:

  • Shopify Price Monitor - Track price changes and sales over time
  • Shopify Inventory Tracker - Monitor stock levels and availability
  • Shopify Store Analyzer - Extract store metadata and analytics
  • Shopify Collection Scraper - Specialized collection-based extraction
  • Shopify Feed Generator - Generate product feeds for Google Shopping

This actor accesses only publicly available data from Shopify stores through official public API endpoints. It does not:

  • Require authentication or API keys
  • Circumvent access controls or security measures
  • Access password-protected or restricted content
  • Violate Shopify's Terms of Service

The actor implements responsible scraping practices including rate limiting and respectful request patterns. Users are responsible for ensuring their use complies with applicable laws, data protection regulations, and the terms of service of stores they scrape.