Shopify Products Scraper
Pricing
$19.99/month + usage
Developer
ScraperX
A powerful Apify Actor that extracts comprehensive product data from Shopify stores. This actor automatically discovers product URLs from store pages and retrieves detailed product information including prices, variants, descriptions, images, and metadata in structured JSON format.
Why Choose Us?
- Automatic Product Discovery: Finds product URLs on Shopify store pages for you, so there is no need to list products manually
- Complete Product Data: Extracts full product information including variants, prices, images, descriptions, and metadata
- Smart Proxy Management: Intelligent proxy fallback system ensures reliable data extraction even when stores implement blocking
- Bulk Processing: Scrape multiple Shopify stores in a single run; stores are handled one after another, and the products within each store are fetched with concurrent requests
- Live Data Saving: Results are saved in real-time, so you don't lose data if the actor is interrupted
- Production Ready: Built with robust error handling, retry logic, and detailed logging for reliable operation
Key Features
🔍 Automatic Product Discovery
- Scans store HTML to automatically find all product URLs containing /products/
- Handles both absolute and relative URLs
- Deduplicates product links automatically
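The discovery step described above can be sketched roughly as follows. This is a minimal illustration, not the actor's actual code: it uses Python's standard-library `html.parser` as a stand-in for BeautifulSoup, and the names `discover_product_urls` and `to_json_url` are hypothetical. Dropping query strings before deduplicating is also an assumption; the `.json` suffix follows the `json_url` shown in the output example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_product_urls(store_url, html):
    """Return de-duplicated absolute product URLs, in page order."""
    collector = _LinkCollector()
    collector.feed(html)
    seen = set()
    urls = []
    for href in collector.hrefs:
        # urljoin resolves relative links and leaves absolute ones untouched.
        parts = urlsplit(urljoin(store_url, href))
        if "/products/" not in parts.path:
            continue
        # Assumption: drop query and fragment so ?variant=... links collapse.
        clean = urlunsplit(
            (parts.scheme, parts.netloc, parts.path.rstrip("/"), "", "")
        )
        if clean not in seen:
            seen.add(clean)
            urls.append(clean)
    return urls


def to_json_url(product_url):
    """Build the JSON endpoint URL the way the output example shows it."""
    return product_url + ".json"
```

Keeping the results in page order while tracking a `seen` set makes the output stable between runs, which helps when comparing `total_found` across scrapes.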
📊 Comprehensive Data Extraction
- Product details: ID, title, description, vendor, product type
- Pricing information: current price, compare-at price, currency
- Variants: all product variants with sizes, colors, SKUs, inventory
- Images: product images with URLs, dimensions, and variant associations
- Metadata: tags, creation dates, update timestamps, published status
- Full JSON API response preserved for maximum data completeness
🔄 Intelligent Proxy Fallback
- Default: Starts with no proxy for direct connection
- Automatic Fallback: If blocked (403/429), automatically switches to datacenter proxy
- Residential Proxy: If datacenter fails, falls back to residential proxy with 3 retries
- Sticky Proxy: Once residential proxy is activated, uses it for all remaining requests
- Clear Logging: All proxy switches and retries are logged for transparency
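One way to picture the fallback order above is a small state machine. The sketch below is an assumption-laden illustration rather than the actor's implementation: `fetch` is a hypothetical callable taking `(url, tier)` and returning `(status, body)`, "3 retries" is read as three attempts on the residential tier, and only the residential tier is treated as sticky because that is the only stickiness the description states.

```python
import logging

log = logging.getLogger("proxy_fallback")

BLOCK_STATUSES = {403, 429}   # status codes the description treats as "blocked"
RESIDENTIAL_ATTEMPTS = 3      # assumption: "3 retries" means three attempts
TIERS = ("none", "datacenter", "residential")


class ProxyFallback:
    """Tries each proxy tier in order and remembers a residential switch."""

    def __init__(self, fetch):
        # Hypothetical callable: fetch(url, tier) -> (status_code, body)
        self.fetch = fetch
        self.sticky_residential = False

    def get(self, url):
        # Once residential has been activated, skip the cheaper tiers.
        tiers = TIERS[2:] if self.sticky_residential else TIERS
        status, body = None, None
        for tier in tiers:
            if tier == "residential":
                self.sticky_residential = True
            attempts = RESIDENTIAL_ATTEMPTS if tier == "residential" else 1
            for attempt in range(1, attempts + 1):
                status, body = self.fetch(url, tier)
                if status not in BLOCK_STATUSES:
                    return status, body
                log.warning(
                    "Blocked (%s) on tier %r, attempt %d/%d",
                    status, tier, attempt, attempts,
                )
        # Every tier was blocked: return the last response for the caller to log.
        return status, body
```

Injecting `fetch` keeps the tier logic independent of any particular HTTP client, so the order of fallbacks can be checked with a fake function.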
⚡ Performance Optimized
- Asynchronous processing for fast concurrent requests
- Efficient HTML parsing with BeautifulSoup
- Live data saving prevents data loss
- Progress tracking with detailed logs
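As a rough illustration of the concurrent processing mentioned above, the following sketch bounds the number of in-flight requests with a semaphore and records failures instead of aborting. The helper name `fetch_all`, the `fetch_one` coroutine, and the default concurrency of 5 are all assumptions made for this example; the actor's real limits are not documented here.

```python
import asyncio


async def fetch_all(urls, fetch_one, max_concurrency=5):
    """Run fetch_one(url) for every URL, at most max_concurrency at a time.

    Returns a list of (url, result, error) tuples in the same order as urls.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            try:
                return url, await fetch_one(url), None
            except Exception as exc:
                # One failed product should not stop the remaining ones.
                return url, None, exc

    return await asyncio.gather(*(guarded(url) for url in urls))
```

A caller would typically run it with `asyncio.run(fetch_all(urls, fetch_one))` and then count the tuples whose error slot is `None` to obtain the success total.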
🛡️ Reliable & Robust
- Comprehensive error handling for network issues
- Automatic retry logic for failed requests
- Graceful handling of missing or malformed data
- Detailed logging for monitoring and debugging
Input
The actor accepts the following input parameters:
JSON Example
    {
      "startUrls": [
        "https://lootcrate.com",
        "https://www.decathlon.com"
      ],
      "proxyConfiguration": {
        "useApifyProxy": false
      }
    }
Input Fields
| Field | Type | Required | Description |
|---|---|---|---|
| startUrls | array | ✅ Yes | List of Shopify store URLs to scrape. Supports bulk input. Each URL should be a valid Shopify store homepage (e.g., https://lootcrate.com). |
| proxyConfiguration | object | ❌ No | Proxy settings. By default, no proxy is used. If the platform blocks requests, the actor automatically falls back to datacenter proxy, then residential proxy with 3 retries. |
Input Details
- startUrls:
  - Accepts one or more Shopify store URLs
  - Each URL should be the store's homepage
  - The actor will automatically discover all product pages
  - Example: ["https://lootcrate.com", "https://www.decathlon.com"]
- proxyConfiguration:
  - Optional proxy configuration
  - Default: {"useApifyProxy": false} (no proxy)
  - If enabled, uses Apify's proxy infrastructure
  - Automatic fallback ensures reliable data extraction
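Before starting a run, it can help to check the input locally. The sketch below is one possible pre-flight check, not part of the actor: `normalize_input` is a hypothetical helper, and reducing each URL to its scheme and host is an assumption based on the statement that each URL should be the store's homepage.

```python
from urllib.parse import urlsplit


def normalize_input(raw):
    """Validate an input dict and apply the documented proxy default."""
    start_urls = raw.get("startUrls")
    if not isinstance(start_urls, list) or not start_urls:
        raise ValueError("startUrls must be a non-empty array of store URLs")

    cleaned = []
    for url in start_urls:
        parts = urlsplit(str(url).strip())
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"Not a valid store URL: {url!r}")
        # Assumption: keep only scheme and host, i.e. the store homepage.
        cleaned.append(f"{parts.scheme}://{parts.netloc}")

    # Documented default: no proxy unless the caller asks for one.
    proxy = raw.get("proxyConfiguration") or {"useApifyProxy": False}
    return {"startUrls": cleaned, "proxyConfiguration": proxy}
```

Calling `normalize_input({"startUrls": ["https://lootcrate.com/"]})` yields an input with the trailing slash removed and `{"useApifyProxy": False}` filled in.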
Output
The actor outputs structured product data grouped by store URL. Each product includes complete information from Shopify's JSON API.
Output Structure
    {
      "https://lootcrate.com": {
        "total_found": 5,
        "processed": 5,
        "successful": 5,
        "products": [
          {
            "url": "https://lootcrate.com/products/loot-crate",
            "json_url": "https://lootcrate.com/products/loot-crate.json",
            "data": {
              "product": {
                "id": 5083963261059,
                "title": "Loot Crate",
                "body_html": "<p>Product description...</p>",
                "vendor": "Loot Crate Core",
                "product_type": "Subscription Box",
                "created_at": "2020-07-07T14:17:32-07:00",
                "handle": "loot-crate",
                "updated_at": "2025-12-28T22:57:43-08:00",
                "published_at": "2023-03-09T06:53:59-08:00",
                "tags": "Subscription, Collectibles, Pop Culture",
                "variants": [
                  {
                    "id": 34197535719555,
                    "product_id": 5083963261059,
                    "title": "S / XS",
                    "price": "29.99",
                    "compare_at_price": "24.99",
                    "sku": "1010126US",
                    "inventory_management": "shopify",
                    "weight": 0.0,
                    "weight_unit": "lb",
                    "requires_shipping": true
                  }
                ],
                "images": [
                  {
                    "id": 123456789,
                    "product_id": 5083963261059,
                    "src": "https://cdn.shopify.com/...",
                    "width": 2000,
                    "height": 2000,
                    "alt": "Product image"
                  }
                ]
              }
            }
          }
        ]
      }
    }
Output Fields
| Field | Description |
|---|---|
| store_url | The Shopify store URL that was scraped (it appears as the top-level key in the output structure above) |
| total_found | Total number of product URLs discovered on the store |
| processed | Number of products processed |
| successful | Number of products successfully extracted |
| products | Array of product objects, each containing the fields below |
| - url | Direct product page URL |
| - json_url | Shopify JSON API endpoint URL |
| - data | Complete product data from the Shopify API, including product ID, title, description, vendor, and type; pricing and variant information; product images and metadata; tags, dates, and all other product attributes |
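Because the output is nested by store, then product, then variant, a common first step is to flatten it into one row per variant. The sketch below assumes the structure shown in the Output Structure example; `variant_rows` is a hypothetical helper name, and prices are parsed with `Decimal` since the example shows them as strings.

```python
from decimal import Decimal


def variant_rows(output):
    """Flatten the store-keyed output into one dict per product variant."""
    rows = []
    for store_url, summary in output.items():
        for item in summary.get("products", []):
            product = item.get("data", {}).get("product", {})
            for variant in product.get("variants", []):
                compare_at = variant.get("compare_at_price")
                rows.append({
                    "store_url": store_url,
                    "product_url": item.get("url"),
                    "title": product.get("title"),
                    "variant": variant.get("title"),
                    "sku": variant.get("sku"),
                    # Prices arrive as strings; Decimal avoids float rounding.
                    "price": Decimal(variant["price"]),
                    "compare_at_price": Decimal(compare_at) if compare_at else None,
                })
    return rows
```

The resulting rows can be written to CSV or loaded into a data frame for price comparisons across stores.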
🚀 How to Use the Actor (via Apify Console)
- Log in to Apify Console and navigate to Actors
- Find the shopify-products-scraper actor and click on it
- Configure inputs:
  - Add one or more Shopify store URLs in the startUrls field
  - Optionally configure proxy settings (default: no proxy with automatic fallback)
- Run the actor by clicking the "Start" button
- Monitor progress in real-time through the detailed logs:
  - Product discovery progress
  - Proxy usage and fallback events
  - Success/failure counts for each product
- Access results in the OUTPUT tab:
  - View data in the structured table view
  - Export results as JSON or CSV
  - Download the complete dataset
Example Usage
Input:
{"startUrls": ["https://lootcrate.com"]}
Result:
- Automatically discovers all products on the store
- Extracts complete product data for each item
- Groups results by store URL
- Provides summary statistics (total found, processed, successful)
Best Use Cases
🛒 E-commerce Intelligence
- Monitor competitor product catalogs and pricing
- Track product availability and inventory changes
- Analyze product categories and trends across multiple stores
📊 Market Research
- Gather product data for market analysis
- Compare product offerings across different Shopify stores
- Study pricing strategies and product positioning
🔄 Data Integration
- Import product catalogs into your own systems
- Sync product data for affiliate programs
- Build product comparison engines
📈 Business Intelligence
- Track product launches and updates
- Monitor vendor and product type distributions
- Analyze product metadata and tagging strategies
🎯 Price Monitoring
- Track price changes over time
- Monitor compare-at prices and discounts
- Analyze pricing across variants
Frequently Asked Questions
How does the actor discover products?
The actor automatically scans the store's HTML for links containing /products/ and extracts all product URLs. No manual product listing is required.
What happens if a store blocks my requests?
The actor implements intelligent proxy fallback:
- Starts with no proxy (direct connection)
- If blocked → automatically switches to datacenter proxy
- If still blocked → falls back to residential proxy with 3 retries
- Once residential proxy is activated, it's used for all remaining requests
Can I scrape multiple stores at once?
Yes! Simply add multiple URLs to the startUrls array. The actor processes them sequentially, grouping results by store URL.
What data is included in the output?
The actor extracts complete product data from Shopify's JSON API, including:
- Basic info (ID, title, description, vendor, type)
- Pricing (current price, compare-at price, currency)
- Variants (all sizes, colors, SKUs, inventory)
- Images (URLs, dimensions, variant associations)
- Metadata (tags, dates, published status)
- And all other fields available in Shopify's product API
How long does scraping take?
Scraping time depends on:
- Number of stores to process
- Number of products per store
- Network speed and proxy performance
- Store response times
The actor processes products concurrently for faster results. Progress is logged in real-time.
Can I limit the number of products scraped?
Currently, the actor processes all discovered products. You can filter results after scraping, or modify the code to add a maxItems limit if needed.
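Filtering after scraping can be as simple as trimming the `products` array of each store. The sketch below assumes the store-keyed output shown in the Output Structure section; `limit_products` and its `max_items` argument are hypothetical names and are not inputs the actor accepts.

```python
def limit_products(output, max_items):
    """Return a copy of the output with at most max_items products per store."""
    limited = {}
    for store_url, summary in output.items():
        products = summary.get("products", [])[:max_items]
        # Copy the summary so the original scrape result stays untouched.
        limited[store_url] = {**summary, "products": products}
    return limited
```

Note that this only trims the saved results; every discovered product is still requested during the run, so it does not reduce usage.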
What if a product URL returns an error?
The actor handles errors gracefully:
- Failed products are logged with error details
- Successful products are still saved
- Summary statistics show success/failure counts
- The actor continues processing remaining products
Is the data saved in real-time?
Yes! The actor uses Apify's dataset feature to save data as it's extracted. This means:
- Data is available even if the actor is interrupted
- You can monitor progress in real-time
- Results are automatically saved to the dataset
Support and Feedback
💬 For custom solutions or feature requests, contact us at: dev.scraperengine@gmail.com
We're always looking to improve the actor based on user feedback. If you encounter any issues or have suggestions for new features, please don't hesitate to reach out!
Cautions
⚠️ Important Legal and Ethical Considerations:
- Public Data Only: This actor collects data only from publicly available Shopify store pages. It does not access private accounts, password-protected content, or restricted areas.
- Respect Robots.txt: While the actor can access public product pages, users should respect website terms of service and robots.txt files.
- Rate Limiting: The actor includes built-in delays and proxy management to avoid overwhelming target servers. However, users should be mindful of scraping frequency.
- Legal Compliance: Users are responsible for ensuring their use of this actor complies with:
  - Local data protection and privacy laws (GDPR, CCPA, etc.)
  - Website terms of service
  - Copyright and intellectual property regulations
  - Anti-spam and data collection regulations
- Ethical Use: This tool is intended for legitimate business intelligence, market research, and data analysis purposes. Users should not use it for:
  - Harassment or stalking
  - Unauthorized data collection
  - Violating terms of service
  - Any illegal activities
- Data Responsibility: Users are responsible for how they store, use, and share the collected data. Ensure proper data security and privacy practices.
Built with ❤️ for the Apify community