Shopify Product Scraper
Under maintenance

Pricing

from $2.00 / 1,000 results

Extract comprehensive product data from any Shopify-powered online store. Monitor prices, track inventory, and gather competitive intelligence effortlessly.

Rating: 0.0 (0)

Developer: HappiTap (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Pay-Per-Event Pricing: $0.002 per product variant + $0.00005 per run

🚀 What does this actor do?

This actor extracts product information from Shopify stores and returns it as structured data, including:

  • Product Details: Title, description, SKU, product type
  • Pricing Information: Current price, currency
  • Inventory Status: Stock availability and quantity
  • Product Images: All product and variant images
  • Variants: Colors, sizes, materials, and custom attributes
  • Metadata: Brand, tags, creation/update dates
  • Additional Data: Barcodes, weight, shipping requirements

💡 Use Cases

  • Price Monitoring: Track competitor pricing across multiple stores
  • Inventory Management: Monitor stock levels and availability
  • Market Research: Analyze product catalogs and trends
  • Data Integration: Feed product data into your systems
  • Competitive Intelligence: Stay informed about market changes

💰 Pricing

Pay-Per-Event Model:

  • Actor Start: $0.00005 per run
  • Product Variant: $0.002 per variant

Cost Examples:

  • Small store (200 variants): ~$0.40
  • Medium store (1,500 variants): ~$3.00
  • Large store (8,000 variants): ~$16.00

You are charged only for successfully scraped variants, plus the actor start fee for each run. Set spending limits to control costs.
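
The cost examples above follow directly from the two event prices. A minimal sketch of that arithmetic, assuming one actor start per run; `estimateCostUsd` and `maxVariantsForBudget` are hypothetical helper names and are not part of the actor:

```javascript
// Rates from the pricing list, expressed in micro-dollars (millionths of a
// dollar) so the arithmetic stays in integers and avoids float drift.
const ACTOR_START_MICRO_USD = 50;       // $0.00005 per run
const PRODUCT_VARIANT_MICRO_USD = 2000; // $0.002 per variant

// Hypothetical helper: estimated cost in USD for a number of scraped variants.
function estimateCostUsd(variantCount, runCount = 1) {
    if (!Number.isInteger(variantCount) || variantCount < 0) {
        throw new RangeError('variantCount must be a non-negative integer');
    }
    if (!Number.isInteger(runCount) || runCount < 0) {
        throw new RangeError('runCount must be a non-negative integer');
    }
    const microUsd = variantCount * PRODUCT_VARIANT_MICRO_USD
        + runCount * ACTOR_START_MICRO_USD;
    return microUsd / 1e6;
}

// Hypothetical helper: how many variants fit under a budget for a single run.
function maxVariantsForBudget(budgetUsd) {
    const budgetMicroUsd = Math.round(budgetUsd * 1e6);
    const remaining = budgetMicroUsd - ACTOR_START_MICRO_USD;
    return Math.max(0, Math.floor(remaining / PRODUCT_VARIANT_MICRO_USD));
}

for (const variants of [200, 1500, 8000]) {
    console.log(`${variants} variants: $${estimateCostUsd(variants).toFixed(5)}`);
}
console.log(`Variants within a $5 budget: ${maxVariantsForBudget(5)}`);
```

The start fee is small enough that the totals round to the approximate figures listed above.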

📋 Input Configuration

Required Fields

  • Start URLs: One or more Shopify store URLs (e.g., https://www.example-store.com)

Optional Fields

  • Max items: Maximum number of products to scrape (0 = unlimited)
  • Proxy Configuration:
      • useApifyProxy: Enable Apify's proxy rotation (recommended for production)
      • proxyUrls: Use your own proxy servers
      • Leave empty to run without proxy (may be blocked by some stores)
  • Fetch HTML: Enable if you need HTML content (slower)
  • Max concurrency: Number of parallel requests (default: 10)
  • Max request retries: Retry attempts for failed requests (default: 3)
  • Debug Log: Enable verbose logging for troubleshooting

Note: Proxy features require full Apify permissions or a paid plan. If running with LIMITED_PERMISSIONS, the actor will automatically run without proxy.

Advanced Options

  • Extend Output Function: Customize output data structure
  • Extend Scraper Function: Add custom scraping logic
  • Custom Data: Pass additional data to extend functions
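
As a rough illustration of how these fields might fit together, the sketch below assembles an input object. Every key name (`startUrls`, `maxItems`, `proxyConfig`, `fetchHtml`, `maxConcurrency`, `maxRequestRetries`, `debugLog`, `customData`) is an assumption inferred from the field labels above, and `buildInput` is a hypothetical helper; check the actor's input schema for the real names.

```javascript
// Hypothetical helper that assembles an actor input object.
// ASSUMPTION: all key names are guesses derived from the field labels.
function buildInput({ startUrls, maxItems = 0, useApifyProxy = false, customData = {} }) {
    if (!Array.isArray(startUrls) || startUrls.length === 0) {
        throw new Error('At least one start URL is required');
    }
    for (const url of startUrls) {
        new URL(url); // throws a TypeError if the URL is malformed
    }
    return {
        startUrls: startUrls.map((url) => ({ url })), // assumed shape
        maxItems,                                     // 0 = unlimited
        proxyConfig: { useApifyProxy },
        fetchHtml: false,                             // JSON mode is faster
        maxConcurrency: 10,                           // documented default
        maxRequestRetries: 3,                         // documented default
        debugLog: false,
        customData,
    };
}

const input = buildInput({
    startUrls: ['https://www.example-store.com'],
    maxItems: 100,
    customData: { requestId: 'demo-1' },
});
console.log(JSON.stringify(input, null, 2));
```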

📊 Output Format

Each product variant is output as a separate item with the following structure:

{
  "url": "https://example.com/products/product-name",
  "title": "Product Name",
  "id": "1234567890",
  "sku": "SKU-12345",
  "description": "Product description text",
  "price": 29.99,
  "currency": "USD",
  "availability": "in stock",
  "product_type": "Clothing",
  "brand": "Brand Name",
  "color": "Blue",
  "size": "Medium",
  "material": "Cotton",
  "display_name": "Blue / Medium",
  "images_urls": [
    "https://cdn.shopify.com/image1.jpg",
    "https://cdn.shopify.com/image2.jpg"
  ],
  "video_urls": [],
  "created_at": "2023-01-15T10:30:00.000Z",
  "updated_at": "2023-12-20T14:45:00.000Z",
  "published_at": "2023-01-20T09:00:00.000Z",
  "additional": {
    "variant_attributes": "Color: Blue / Size: Medium",
    "variant_title": "Blue / Medium",
    "scraped_at": "2024-01-01T12:00:00.000Z",
    "barcode": "123456789012",
    "taxcode": null,
    "stock_count": 50,
    "tags": ["new", "sale", "featured"],
    "weight": "0.5 kg",
    "requires_shipping": true
  }
}
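
Because each variant is a separate item, downstream code often needs to regroup variants by product. The sketch below does that with the `url`, `title`, `price`, and `availability` fields shown above. It assumes all variants of one product share the same `url` value, which the sample output suggests but does not guarantee; `summarizeByProduct` is a hypothetical helper.

```javascript
// Hypothetical helper: collapse per-variant items into per-product summaries.
// ASSUMPTION: variants of the same product carry an identical `url`.
function summarizeByProduct(items) {
    const byUrl = new Map();
    for (const item of items) {
        let summary = byUrl.get(item.url);
        if (!summary) {
            summary = {
                url: item.url,
                title: item.title,
                variantCount: 0,
                minPrice: null,
                inStockVariants: 0,
            };
            byUrl.set(item.url, summary);
        }
        summary.variantCount += 1;
        if (typeof item.price === 'number'
            && (summary.minPrice === null || item.price < summary.minPrice)) {
            summary.minPrice = item.price;
        }
        if (item.availability === 'in stock') {
            summary.inStockVariants += 1;
        }
    }
    return [...byUrl.values()];
}

// Sample items modeled on the documented output structure.
const sampleItems = [
    { url: 'https://example.com/products/product-name', title: 'Product Name', price: 29.99, availability: 'in stock' },
    { url: 'https://example.com/products/product-name', title: 'Product Name', price: 24.99, availability: 'out of stock' },
    { url: 'https://example.com/products/other', title: 'Other', price: 5, availability: 'in stock' },
];
console.log(summarizeByProduct(sampleItems));
```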

🔧 Extend Output Function

Filter and customize output items:

async ({ item, customData }) => {
    // Filter out items that don't match the criteria.
    // Optional chaining avoids a TypeError when an item has no title.
    if (!item.title?.includes('cuisine')) {
        return null; // omit the item from the output
    }
    // Remove unwanted fields
    delete item.additional;
    // Add custom data
    item.requestId = customData.requestId;
    return item;
}

🛠️ Extend Scraper Function

Interact with different scraper phases:

async ({ label, url, filter, fns, filteredSitemapUrls, customData }) => {
    switch (label) {
        case 'FILTER_SITEMAP_URL': {
            // Filter product URLs
            filter(
                url.includes('cooking') || url.includes(customData.filter)
            );
            break;
        }
        case 'SETUP': {
            // Modify sitemap URLs before scraping
            filteredSitemapUrls.add('https://example.com/secret-unlisted-sitemap.xml');
            filteredSitemapUrls.forEach((sitemapURL) => {
                if (!sitemapURL.includes('en-us')) {
                    filteredSitemapUrls.delete(sitemapURL);
                }
            });
            break;
        }
    }
}

Available Labels

  • SETUP: Called before scraping starts
  • FILTER_SITEMAP_URL: Filter product URLs from sitemaps
  • PRENAVIGATION: Before each request
  • POSTNAVIGATION: After each request
  • RUN: Before crawler starts
  • FINISHED: After scraping completes
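
Before spending credits on a run, it can help to dry-run a `FILTER_SITEMAP_URL` handler locally. The harness below is a sketch under assumptions: it supposes the actor calls the function once per URL and keeps the URL when `filter()` receives a truthy value. `dryRunFilter` is a hypothetical helper, and the actor's real calling convention may differ.

```javascript
// The extend scraper function under test; only FILTER_SITEMAP_URL is handled.
const extendScraperFunction = async ({ label, url, filter, customData }) => {
    if (label === 'FILTER_SITEMAP_URL') {
        filter(url.includes(customData.filter));
    }
};

// Hypothetical local harness.
// ASSUMPTION: a URL is kept unless filter() is called with a falsy value.
async function dryRunFilter(fn, urls, customData) {
    const kept = [];
    for (const url of urls) {
        let keep = true;
        await fn({
            label: 'FILTER_SITEMAP_URL',
            url,
            customData,
            filter: (result) => { keep = Boolean(result); },
        });
        if (keep) kept.push(url);
    }
    return kept;
}

const candidateUrls = [
    'https://example.com/products/cooking-pot',
    'https://example.com/products/garden-hose',
];
dryRunFilter(extendScraperFunction, candidateUrls, { filter: 'cooking' })
    .then((kept) => console.log('Would scrape:', kept));
```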

🚦 How It Works

  1. Discovery: Fetches robots.txt to find sitemap URLs
  2. Sitemap Parsing: Extracts product URLs from Shopify sitemaps
  3. Product Scraping: Retrieves product data via Shopify's JSON API
  4. Data Processing: Transforms and structures product information
  5. Output: Saves each variant as a separate dataset item
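
The first three steps can be sketched with pure functions, shown here without any network calls. This is an illustration, not the actor's implementation: the helper names are hypothetical, and appending `.json` to a product URL is an assumption about what "Shopify's JSON API" refers to.

```javascript
// Step 1 (Discovery): pull "Sitemap:" entries out of a robots.txt body.
function extractSitemapUrls(robotsTxt) {
    const urls = [];
    for (const rawLine of robotsTxt.split(/\r?\n/)) {
        const line = rawLine.split('#')[0].trim(); // drop trailing comments
        const match = /^sitemap:\s*(\S+)/i.exec(line);
        if (match) urls.push(match[1]);
    }
    return urls;
}

// Step 2 (Sitemap Parsing): decide whether a sitemap entry is a product page.
// ASSUMPTION: product pages contain a "/products/" path segment.
function isProductUrl(pageUrl) {
    return new URL(pageUrl).pathname.includes('/products/');
}

// Step 3 (Product Scraping): derive the JSON endpoint for a product page.
// ASSUMPTION: the product JSON lives at the same path with a ".json" suffix.
function toProductJsonUrl(productUrl) {
    const url = new URL(productUrl);
    url.search = '';
    url.hash = '';
    url.pathname = url.pathname.replace(/\/$/, '') + '.json';
    return url.toString();
}

const robotsTxt = [
    'User-agent: *',
    'Disallow: /cart',
    'Sitemap: https://example.com/sitemap.xml',
].join('\n');

console.log(extractSitemapUrls(robotsTxt));
console.log(isProductUrl('https://example.com/products/product-name'));
console.log(toProductJsonUrl('https://example.com/products/product-name?variant=1'));
```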

💰 Cost Optimization

  • Use JSON mode (default) instead of HTML for faster scraping
  • Set Max items to limit the number of products scraped
  • Adjust Max concurrency based on your needs (higher = faster but more expensive)
  • Use FILTER_SITEMAP_URL to scrape only specific products

🐛 Troubleshooting

"Not a Shopify URL" Error

  • Ensure the URL is a Shopify-powered store
  • Try disabling "Check for Shopify on robots"

Missing Products

  • Check if products are listed in the sitemap
  • Verify products are published and not hidden

Slow Performance

  • Disable "Fetch HTML" if not needed
  • Increase "Max concurrency"
  • Use Apify proxy for better performance

📝 Notes

  • Each product variant is output as a separate item
  • Images are deduplicated and cleaned
  • Dates are normalized to ISO 8601 format
  • Stock availability is determined from inventory quantity or availability flag
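
The normalization rules in these notes could look roughly like the sketch below. It is a guess at the behavior, not the actor's code: the helper names are hypothetical, "cleaned" is assumed to mean dropping the query string from image URLs, and the `'out of stock'` label is assumed (only `'in stock'` appears in the sample output).

```javascript
// Normalize any parseable date value to ISO 8601 (UTC); null if unparseable.
function toIsoDate(value) {
    const date = new Date(value);
    return Number.isNaN(date.getTime()) ? null : date.toISOString();
}

// Deduplicate image URLs while keeping their original order.
// ASSUMPTION: "cleaning" means removing the query string (e.g. "?v=123").
function dedupeImages(urls) {
    const seen = new Set();
    const result = [];
    for (const raw of urls) {
        const cleaned = raw.split('?')[0];
        if (!seen.has(cleaned)) {
            seen.add(cleaned);
            result.push(cleaned);
        }
    }
    return result;
}

// Prefer the inventory quantity when it is known, else the availability flag.
// ASSUMPTION: the 'out of stock' label is a guess.
function resolveAvailability({ inventoryQuantity, available }) {
    if (typeof inventoryQuantity === 'number') {
        return inventoryQuantity > 0 ? 'in stock' : 'out of stock';
    }
    return available ? 'in stock' : 'out of stock';
}

console.log(toIsoDate('2023-01-15T11:30:00+01:00'));
console.log(dedupeImages([
    'https://cdn.shopify.com/image1.jpg?v=1',
    'https://cdn.shopify.com/image1.jpg?v=2',
    'https://cdn.shopify.com/image2.jpg',
]));
console.log(resolveAvailability({ inventoryQuantity: 0, available: true }));
```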

📄 License

Apache 2.0