Shopify Product Scraper
Extract comprehensive product data from any Shopify-powered online store. Monitor prices, track inventory, and gather competitive intelligence effortlessly.
Developer: HappiTap
Pay-Per-Event Pricing: $0.002 per product variant + $0.00005 per run
What does this actor do?
This powerful scraper automates the extraction of product information from Shopify stores, delivering structured data including:
- Product Details: Title, description, SKU, product type
- Pricing Information: Current price, currency
- Inventory Status: Stock availability and quantity
- Product Images: All product and variant images
- Variants: Colors, sizes, materials, and custom attributes
- Metadata: Brand, tags, creation/update dates
- Additional Data: Barcodes, weight, shipping requirements
Use Cases
- Price Monitoring: Track competitor pricing across multiple stores
- Inventory Management: Monitor stock levels and availability
- Market Research: Analyze product catalogs and trends
- Data Integration: Feed product data into your systems
- Competitive Intelligence: Stay informed about market changes
Pricing
Pay-Per-Event Model:
- Actor Start: $0.00005 per run
- Product Variant: $0.002 per variant
Cost Examples:
- Small store (200 variants): ~$0.40
- Medium store (1,500 variants): ~$3.00
- Large store (8,000 variants): ~$16.00
Only pay for successfully scraped variants. Set spending limits to control costs.
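The arithmetic behind the cost examples can be sketched as follows. The helper name `estimateRunCost` is hypothetical; the two rates are the ones listed in the pricing model above, and the result is only an estimate of what a run might cost.

```javascript
// Rates taken from the Pay-Per-Event pricing listed above.
const ACTOR_START_USD = 0.00005;
const PER_VARIANT_USD = 0.002;

// Hypothetical helper: estimated cost in USD of one run that
// successfully scrapes `variantCount` product variants.
function estimateRunCost(variantCount) {
  if (!Number.isInteger(variantCount) || variantCount < 0) {
    throw new RangeError('variantCount must be a non-negative integer');
  }
  return ACTOR_START_USD + variantCount * PER_VARIANT_USD;
}

// Reproduce the three store sizes from the cost examples.
for (const n of [200, 1500, 8000]) {
  console.log(`${n} variants: ~$${estimateRunCost(n).toFixed(2)}`);
}
```

The per-run start fee is small enough that it disappears when rounding to cents, which is why the examples above are simply the variant count times $0.002.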
Input Configuration
Required Fields
- Start URLs: One or more Shopify store URLs (e.g., https://www.example-store.com)
Optional Fields
- Max items: Maximum number of products to scrape (0 = unlimited)
- Proxy Configuration (Optional)
  - useApifyProxy: Enable Apify's proxy rotation (recommended for production)
  - proxyUrls: Use your own proxy servers
  - Leave empty to run without proxy (may be blocked by some stores)

  Note: Proxy features require full Apify permissions or a paid plan. If running with LIMITED_PERMISSIONS, the actor will automatically run without proxy.
- Fetch HTML: Enable if you need HTML content (slower)
- Max concurrency: Number of parallel requests (default: 10)
- Max request retries: Retry attempts for failed requests (default: 3)
- Debug Log: Enable verbose logging for troubleshooting
Advanced Options
- Extend Output Function: Customize output data structure
- Extend Scraper Function: Add custom scraping logic
- Custom Data: Pass additional data to extend functions
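To make the fields above concrete, here is a rough sketch of what an input object might look like, with a small sanity check. Only `useApifyProxy` and `proxyUrls` are names that appear in this README; every other key (`startUrls`, `maxItems`, `proxy`, `fetchHtml`, `maxConcurrency`, `maxRequestRetries`, `debugLog`, `customData`) and the helper `validateInput` are guesses made for illustration, so check the actor's input schema for the real spelling.

```javascript
// Hypothetical input object. Key names other than `useApifyProxy` and
// `proxyUrls` are assumptions, not confirmed by this README.
const input = {
  startUrls: [{ url: 'https://www.example-store.com' }], // "Start URLs" (required)
  maxItems: 100,                  // "Max items": 0 would mean unlimited
  proxy: { useApifyProxy: true }, // or { proxyUrls: [...] }, or omit to run without proxy
  fetchHtml: false,               // "Fetch HTML": slower when enabled
  maxConcurrency: 10,             // documented default
  maxRequestRetries: 3,           // documented default
  debugLog: false,
  customData: { requestId: 'run-001' },
};

// Hypothetical helper: basic checks that mirror the field descriptions above.
function validateInput(inp) {
  const errors = [];
  if (!Array.isArray(inp.startUrls) || inp.startUrls.length === 0) {
    errors.push('at least one start URL is required');
  }
  if (!Number.isInteger(inp.maxItems) || inp.maxItems < 0) {
    errors.push('maxItems must be a non-negative integer (0 = unlimited)');
  }
  if (!Number.isInteger(inp.maxConcurrency) || inp.maxConcurrency < 1) {
    errors.push('maxConcurrency must be a positive integer');
  }
  return errors;
}

console.log(validateInput(input)); // an empty array means the checks passed
```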
Output Format
Each product variant is output as a separate item with the following structure:
    {
      "url": "https://example.com/products/product-name",
      "title": "Product Name",
      "id": "1234567890",
      "sku": "SKU-12345",
      "description": "Product description text",
      "price": 29.99,
      "currency": "USD",
      "availability": "in stock",
      "product_type": "Clothing",
      "brand": "Brand Name",
      "color": "Blue",
      "size": "Medium",
      "material": "Cotton",
      "display_name": "Blue / Medium",
      "images_urls": [
        "https://cdn.shopify.com/image1.jpg",
        "https://cdn.shopify.com/image2.jpg"
      ],
      "video_urls": [],
      "created_at": "2023-01-15T10:30:00.000Z",
      "updated_at": "2023-12-20T14:45:00.000Z",
      "published_at": "2023-01-20T09:00:00.000Z",
      "additional": {
        "variant_attributes": "Color: Blue / Size: Medium",
        "variant_title": "Blue / Medium",
        "scraped_at": "2024-01-01T12:00:00.000Z",
        "barcode": "123456789012",
        "taxcode": null,
        "stock_count": 50,
        "tags": ["new", "sale", "featured"],
        "weight": "0.5 kg",
        "requires_shipping": true
      }
    }
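Because every variant arrives as its own item, downstream code often needs to regroup items per product. The sketch below is one way to do that, assuming all variants of a product share the same `url` and that `"in stock"` (the only availability value shown in the sample) marks an available variant. The function name `summarizeByProduct` is hypothetical.

```javascript
// Group variant-level items by product URL and summarise each product:
// number of variants, lowest price, and whether any variant is in stock.
function summarizeByProduct(items) {
  const byUrl = new Map();
  for (const item of items) {
    const summary = byUrl.get(item.url) ?? {
      url: item.url,
      title: item.title,
      variants: 0,
      minPrice: null,
      anyInStock: false,
    };
    summary.variants += 1;
    if (typeof item.price === 'number') {
      summary.minPrice = summary.minPrice === null
        ? item.price
        : Math.min(summary.minPrice, item.price);
    }
    // Assumption: 'in stock' is the string used for available variants.
    if (item.availability === 'in stock') summary.anyInStock = true;
    byUrl.set(item.url, summary);
  }
  return [...byUrl.values()];
}

// Two variants of the same product, shaped like the sample item above.
const sampleItems = [
  { url: 'https://example.com/products/product-name', title: 'Product Name', price: 29.99, availability: 'in stock' },
  { url: 'https://example.com/products/product-name', title: 'Product Name', price: 24.5, availability: 'out of stock' },
];
console.log(summarizeByProduct(sampleItems));
```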
Extend Output Function
Filter and customize output items:
    async ({ item, customData }) => {
      // Filter out items that don't match criteria
      if (!item.title.includes('cuisine')) {
        return null; // omit the output
      }
      // Remove unwanted fields
      delete item.additional;
      // Add custom data
      item.requestId = customData.requestId;
      return item;
    }
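As a further illustration, the sketch below keeps only in-stock variants under a price ceiling and lifts the tags out of `additional` before dropping it. It assumes `item` has the shape shown under Output Format; `customData.maxPrice` is a hypothetical value you would pass yourself through Custom Data, and treating `'in stock'` as the only in-stock value is an assumption based on the sample item.

```javascript
const extendOutputFunction = async ({ item, customData }) => {
  // Hypothetical key supplied via Custom Data; no ceiling if it is absent.
  const maxPrice = customData?.maxPrice ?? Infinity;

  // Skip variants that are unavailable or have no usable price.
  if (item.availability !== 'in stock') return null;
  if (typeof item.price !== 'number' || item.price > maxPrice) return null;

  // Keep the tags at the top level, then drop the rest of `additional`.
  item.tags = item.additional?.tags ?? [];
  delete item.additional;
  return item;
};
```

Before a paid run, it can be worth calling such a function locally against a copy of the sample item to confirm that it returns `null` or the trimmed item as intended.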
Extend Scraper Function
Interact with different scraper phases:
    async ({ label, url, filter, fns, filteredSitemapUrls, customData }) => {
      switch (label) {
        case 'FILTER_SITEMAP_URL': {
          // Filter product URLs
          filter(url.includes('cooking') || url.includes(customData.filter));
          break;
        }
        case 'SETUP': {
          // Modify sitemap URLs before scraping
          filteredSitemapUrls.add('https://example.com/secret-unlisted-sitemap.xml');
          filteredSitemapUrls.forEach((sitemapURL) => {
            if (!sitemapURL.includes('en-us')) {
              filteredSitemapUrls.delete(sitemapURL);
            }
          });
          break;
        }
      }
    }
Available Labels
- SETUP: Called before scraping starts
- FILTER_SITEMAP_URL: Filter product URLs from sitemaps
- PRENAVIGATION: Before each request
- POSTNAVIGATION: After each request
- RUN: Before crawler starts
- FINISHED: After scraping completes
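One way to check a FILTER_SITEMAP_URL handler before spending credits is to dry-run it locally against a few URLs. The harness below is hypothetical: `dryRunFilter` and its stub `filter` callback only imitate the call shape shown in the example above, and how the actor itself treats a URL when `filter` is never called is not documented here (this stub drops it).

```javascript
// The function you would paste into "Extend Scraper Function".
const extendScraperFunction = async ({ label, url, filter, customData }) => {
  if (label === 'FILTER_SITEMAP_URL') {
    filter(url.includes(customData.filter));
  }
};

// Hypothetical local harness: invokes the function once per URL with a stub
// `filter` callback and collects the URLs for which a truthy value was passed.
async function dryRunFilter(fn, urls, customData) {
  const kept = [];
  for (const url of urls) {
    let keep = false;
    await fn({
      label: 'FILTER_SITEMAP_URL',
      url,
      filter: (ok) => { keep = Boolean(ok); },
      customData,
    });
    if (keep) kept.push(url);
  }
  return kept;
}

dryRunFilter(
  extendScraperFunction,
  [
    'https://example.com/products/cooking-pot',
    'https://example.com/products/garden-hose',
  ],
  { filter: 'cooking' },
).then((kept) => console.log(kept));
```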
How It Works
- Discovery: Fetches robots.txt to find sitemap URLs
- Sitemap Parsing: Extracts product URLs from Shopify sitemaps
- Product Scraping: Retrieves product data via Shopify's JSON API
- Data Processing: Transforms and structures product information
- Output: Saves each variant as a separate dataset item
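The steps above can be sketched with a few pure functions. This is not the actor's source code, only an illustration under stated assumptions: that sitemaps are declared with `Sitemap:` lines in robots.txt, that product pages contain `/products/` in their path, and that the store serves a product as JSON at the product URL plus `.json` with a `variants` array (a common Shopify storefront convention). All function names are hypothetical, and no network requests are made.

```javascript
// Discovery: pull "Sitemap:" entries out of a robots.txt body.
function sitemapsFromRobots(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map((line) => line.match(/^\s*sitemap:\s*(\S+)/i))
    .filter(Boolean)
    .map((match) => match[1]);
}

// Sitemap parsing: collect <loc> entries that look like product pages.
function productUrlsFromSitemap(xml) {
  const urls = [];
  for (const match of xml.matchAll(/<loc>\s*([^<\s]+)\s*<\/loc>/g)) {
    if (match[1].includes('/products/')) urls.push(match[1]);
  }
  return urls;
}

// Product scraping: derive the JSON URL for a product page (assumed convention).
function productJsonUrl(productUrl) {
  const u = new URL(productUrl);
  u.pathname = u.pathname.replace(/\/$/, '') + '.json';
  u.search = '';
  return u.toString();
}

// Data processing and output: one item per variant, using a few of the
// field names from the Output Format section.
function toVariantItems(productUrl, product) {
  return product.variants.map((variant) => ({
    url: productUrl,
    title: product.title,
    id: String(variant.id),
    sku: variant.sku ?? null,
    price: Number(variant.price),
    display_name: variant.title,
  }));
}

const robots = 'User-agent: *\nDisallow: /cart\nSitemap: https://example.com/sitemap.xml\n';
console.log(sitemapsFromRobots(robots));
console.log(productJsonUrl('https://example.com/products/product-name?variant=1'));
```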
Cost Optimization
- Use JSON mode (default) instead of HTML for faster scraping
- Set Max items to limit the number of products scraped
- Adjust Max concurrency based on your needs (higher = faster but more expensive)
- Use FILTER_SITEMAP_URL to scrape only specific products
Troubleshooting
"Not a Shopify URL" Error
- Ensure the URL is a Shopify-powered store
- Try disabling "Check for Shopify on robots"
Missing Products
- Check if products are listed in the sitemap
- Verify products are published and not hidden
Slow Performance
- Disable "Fetch HTML" if not needed
- Increase "Max concurrency"
- Use Apify proxy for better performance
Notes
- Each product variant is output as a separate item
- Images are deduplicated and cleaned
- Dates are normalized to ISO 8601 format
- Stock availability is determined from inventory quantity or availability flag
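The normalizations listed above could look roughly like the sketch below. The README does not say how the actor implements them, so everything here is an assumption: that "cleaned" means stripping query strings from image URLs, that inventory quantity takes precedence over the availability flag when both are present, and that the label for unavailable variants is `'out of stock'`. The function names are hypothetical.

```javascript
// Images: strip query strings (assumed meaning of "cleaned"), then deduplicate.
// Expects absolute URLs; `new URL` throws on anything else.
function dedupeImages(urls) {
  const seen = new Set();
  for (const raw of urls) {
    const u = new URL(raw);
    u.search = '';
    seen.add(u.toString());
  }
  return [...seen];
}

// Dates: normalize any parseable date string to ISO 8601 in UTC, else null.
function toIso(value) {
  const date = new Date(value);
  return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

// Availability: prefer a numeric inventory quantity, fall back to the flag.
function stockStatus({ inventoryQuantity, available }) {
  if (typeof inventoryQuantity === 'number') {
    return inventoryQuantity > 0 ? 'in stock' : 'out of stock';
  }
  return available ? 'in stock' : 'out of stock';
}

console.log(dedupeImages([
  'https://cdn.shopify.com/image1.jpg?v=1',
  'https://cdn.shopify.com/image1.jpg?v=2',
  'https://cdn.shopify.com/image2.jpg',
]));
console.log(toIso('2023-01-15T05:30:00-05:00'));
console.log(stockStatus({ inventoryQuantity: 50 }));
```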
License
Apache 2.0