Shopify Product & Collection Scraper
Under maintenance

Pricing

Pay per event

Developed by

Farabiulder

Maintained by Community

Shopify Product & Collection Scraper extracts structured data from any public Shopify-powered store. Provide a product or collection page URL, and it returns the essential data: titles, prices, images, descriptions, variants, and availability.

5.0 (1)

Last modified

a day ago

An Apify actor that retrieves product and collection information from Shopify stores as JSON data.

Features

  • Automatic Type Detection: Automatically detects whether the URL is a product or collection page
  • Multiple Scraping Methods:
    • First tries Shopify's JSON endpoints for clean data
    • Falls back to HTML scraping with multiple selectors
  • Comprehensive Data Extraction: Extracts titles, prices, images, variants, descriptions, and more
  • Robust Error Handling: Multiple fallback methods ensure data extraction even when some methods fail
  • Apify Integration: Properly structured as an Apify actor with input/output handling

Input Parameters

Parameter | Type   | Required | Description
url       | string | Yes      | The Shopify URL to scrape (must contain /products/ or /collections/)
type      | string | No       | Type of content to scrape: "auto", "product", or "collection" (default: "auto")
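The input rules above can be sketched as a small validation step. This is an illustration only, not the actor's actual code: the function name resolveScrapeType, the error messages, and the choice to let /products/ win when both path segments appear are all assumptions.

```javascript
// Hypothetical helper: validates the actor input and resolves "auto"
// to a concrete type. Not taken from the actor's source.
function resolveScrapeType(input) {
  const { url, type = 'auto' } = input;
  if (typeof url !== 'string' || url.length === 0) {
    throw new Error('Input "url" is required');
  }
  const allowed = ['auto', 'product', 'collection'];
  if (!allowed.includes(type)) {
    throw new Error(`Input "type" must be one of: ${allowed.join(', ')}`);
  }
  const { pathname } = new URL(url); // throws on malformed URLs
  const hasProduct = pathname.includes('/products/');
  const hasCollection = pathname.includes('/collections/');
  if (!hasProduct && !hasCollection) {
    throw new Error('URL must contain /products/ or /collections/');
  }
  // An explicit type is respected as given.
  if (type !== 'auto') return type;
  // Assumption: a URL such as /collections/sale/products/shirt points
  // at a product, so /products/ takes precedence here.
  return hasProduct ? 'product' : 'collection';
}
```

Calling resolveScrapeType({ url: "https://store.myshopify.com/collections/featured" }) resolves to "collection" under these assumptions.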

Output Format

For Collections

{
  "id": "collection_id",
  "title": "Collection Name",
  "handle": "collection-handle",
  "body_html": "Collection description",
  "published_at": "2023-01-01T00:00:00Z",
  "updated_at": "2023-01-01T00:00:00Z",
  "sort_order": "manual",
  "template_suffix": null,
  "products_count": 50,
  "products": [
    {
      "id": "product_id",
      "title": "Product Name",
      "handle": "product-handle",
      "description": "Product description",
      "images": ["https://example.com/image.jpg"],
      "url": "https://store.com/products/product-handle",
      "price": "29.99",
      "compare_at_price": "39.99",
      "variants": [
        {
          "id": "variant_id",
          "title": "Default Title",
          "price": "29.99",
          "compare_at_price": "39.99",
          "available": true,
          "sku": "SKU123"
        }
      ]
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_products": 50,
    "products_on_page": 50
  }
}
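As a consumer-side illustration of the collection shape above, the following sketch flattens the nested products and variants into one row per variant, which is convenient for CSV export or spreadsheets. The function name flattenCollection and the row field names are hypothetical; only the input fields come from the format shown above.

```javascript
// Hypothetical helper: turns one collection result into flat variant rows.
// Missing "products" or "variants" arrays are treated as empty.
function flattenCollection(collection) {
  const rows = [];
  for (const product of collection.products ?? []) {
    for (const variant of product.variants ?? []) {
      rows.push({
        product_id: product.id,
        product_title: product.title,
        url: product.url,
        variant_id: variant.id,
        variant_title: variant.title,
        price: variant.price,
        available: variant.available,
        sku: variant.sku,
      });
    }
  }
  return rows;
}
```

Prices stay as strings here, exactly as they appear in the output, so no rounding is introduced before export.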

For Products

{
  "id": "product_id",
  "title": "Product Name",
  "handle": "product-handle",
  "body_html": "Product description",
  "images": ["https://example.com/image.jpg"],
  "variants": [
    {
      "id": "variant_id",
      "title": "Default Title",
      "price": "29.99",
      "compare_at_price": "39.99",
      "available": true,
      "sku": "SKU123"
    }
  ],
  "price": "29.99",
  "availability": "In Stock",
  "sku": "SKU123",
  "vendor": "Brand Name"
}

Usage Examples

Local Development

  1. Install dependencies:
     $ npm install
  2. Create a .env file (optional for local testing):
     SCRAPEDO_API_KEY=your_api_key_here
  3. Run the Apify actor:
     $ npm start
  4. Or test the standalone version:
     $ node standalone.js

Apify Platform

  1. Deploy to Apify platform
  2. Use the web interface to input parameters
  3. Or use the Apify API:
     curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN" \
       -H "Content-Type: application/json" \
       -d '{
         "url": "https://store.myshopify.com/collections/featured",
         "type": "collection"
       }'
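The same request can be made from Node.js. The sketch below mirrors the curl call using the global fetch available in Node 18 and later; the helper names buildRunRequest and startRun are hypothetical, and YOUR_ACTOR_ID and YOUR_API_TOKEN remain placeholders to fill in.

```javascript
// Hypothetical helper: builds the same POST request as the curl example.
function buildRunRequest(actorId, token, input) {
  const url = new URL(`https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`);
  url.searchParams.set('token', token);
  return {
    url: url.href,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request and returns the parsed JSON response as-is.
async function startRun(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const res = await fetch(url, options);
  if (!res.ok) {
    throw new Error(`Apify API responded with HTTP ${res.status}`);
  }
  return res.json();
}
```

Keeping the request construction separate from the network call makes it possible to inspect or log the request without starting a run.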

Testing

Standalone Testing

The standalone.js file provides a simple way to test the scraper without Apify:

$ node standalone.js

This will:

  • Test with a real Shopify store URL
  • Save results to scraping-results.json
  • Show detailed logging of the scraping process

Custom Testing

You can modify test-input.json to test different URLs:

{
  "url": "https://your-shopify-store.com/collections/featured",
  "type": "collection"
}

Supported Shopify Features

  • JSON Endpoints: Automatically tries *.json endpoints for clean data
  • Product Collections: Extracts all products from collection pages
  • Product Variants: Handles multiple product variants with different prices
  • Images: Extracts product images with proper URL resolution
  • Pricing: Handles both regular and sale prices
  • SEO Data: Extracts product handles, titles, and descriptions
  • Inventory: Tracks product availability and SKU information
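Two of the items above, image URL resolution and sale pricing, can be illustrated with a short sketch. It assumes that image sources may be protocol-relative or root-relative and that a variant counts as on sale when compare_at_price is higher than price; the helper names resolveImageUrl and isOnSale are hypothetical and not taken from the actor's source.

```javascript
// Hypothetical helper: resolves an image src against the page URL.
// Handles absolute, protocol-relative ("//cdn...") and root-relative paths.
function resolveImageUrl(src, pageUrl) {
  if (typeof src !== 'string' || src.trim() === '') return null;
  try {
    return new URL(src.trim(), pageUrl).href;
  } catch {
    return null; // e.g. the page URL itself is malformed
  }
}

// Hypothetical helper: a variant is treated as on sale when its
// compare_at_price parses to a number greater than its price.
function isOnSale(variant) {
  const price = Number.parseFloat(variant.price);
  const compareAt = Number.parseFloat(variant.compare_at_price);
  return Number.isFinite(price) && Number.isFinite(compareAt) && compareAt > price;
}
```

With the sample variant from the output format (price "29.99", compare_at_price "39.99"), isOnSale returns true; a missing or null compare_at_price yields false.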

Error Handling

The actor implements multiple fallback methods:

  1. Primary: Shopify JSON endpoints (/products/*.json, /collections/*.json)
  2. Secondary: Collection products API (/collections/*/products.json)
  3. Fallback: HTML scraping with multiple CSS selectors
  4. Graceful Degradation: Returns partial data even if some methods fail
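The fallback order above can be sketched as a list of attempts tried one after another. This is a minimal illustration under assumptions, not the actor's implementation: all function names are hypothetical, query strings on the page URL are dropped, the global fetch from Node 18 or later is used, and the HTML scraper is passed in as a callback because its selectors are not described here.

```javascript
// Hypothetical helper: JSON endpoints to try, in order, for a page URL.
function jsonEndpointCandidates(pageUrl, type) {
  const u = new URL(pageUrl);
  const path = u.pathname.replace(/\/+$/, ''); // drop trailing slashes
  const candidates = [`${u.origin}${path}.json`];
  if (type === 'collection') {
    candidates.push(`${u.origin}${path}/products.json`);
  }
  return candidates;
}

// Runs async attempts in order and returns the first non-null result,
// together with the error messages of the attempts that threw.
async function firstSuccessful(attempts) {
  const errors = [];
  for (const attempt of attempts) {
    try {
      const data = await attempt();
      if (data != null) return { data, errors };
    } catch (err) {
      errors.push(err.message);
    }
  }
  return { data: null, errors };
}

async function fetchJson(url) {
  const res = await fetch(url, { headers: { accept: 'application/json' } });
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.json();
}

// scrapeHtml is a caller-supplied async function (assumption) used last.
async function scrapeWithFallbacks(pageUrl, type, scrapeHtml) {
  const attempts = jsonEndpointCandidates(pageUrl, type).map(
    (url) => () => fetchJson(url),
  );
  attempts.push(() => scrapeHtml(pageUrl));
  return firstSuccessful(attempts);
}
```

Collecting the error messages instead of discarding them keeps a record of why earlier methods failed, which helps when a store answers with 403 or 404.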

File Structure

apify-product-collection/
├── src/
│   └── main.js          # Main Apify actor code
├── apify.json           # Apify configuration
├── package.json         # Dependencies and scripts
├── standalone.js        # Standalone testing version
├── test-input.json      # Test input for local development
├── example.js           # Example usage
└── README.md            # This file

Limitations

  • Requires the Shopify store to be publicly accessible
  • Some stores may block automated requests (403/404 errors are common)
  • Complex product variants may not be fully captured via HTML scraping
  • Collection scraping covers only the current page; pagination is not followed

Troubleshooting

Common Issues

  1. 403/404 Errors: Many Shopify stores block automated requests
  2. No Products Found: Try different URLs or check if the store is accessible
  3. Apify Integration Issues: Make sure you're using the correct Apify SDK version

Debugging

  • Check the console output for detailed error messages
  • Use standalone.js for easier debugging
  • Modify test-input.json to test different URLs

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

ISC License