
Shopify Product & Collection Scraper

Deprecated

Pricing

Pay per usage

Shopify Product & Collection Scraper is an Apify actor that extracts structured data from public Shopify-powered stores. Provide a product or collection page URL, and it returns titles, prices, images, descriptions, variants, and availability.

Rating

5.0

(1)

Developer

Farabiulder

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 9
  • Monthly active users: 9
  • Last modified: 3 months ago

An Apify actor that retrieves product and collection information from Shopify stores as JSON data.

Features

  • Automatic Type Detection: Automatically detects whether the URL is a product or collection page
  • Multiple Scraping Methods:
    • First tries Shopify's JSON endpoints for clean data
    • Falls back to HTML scraping with multiple selectors
  • Comprehensive Data Extraction: Extracts titles, prices, images, variants, descriptions, SKUs, vendor, and availability
  • Robust Error Handling: If one extraction method fails, the actor falls back to the next one
  • Apify Integration: Properly structured as an Apify actor with input/output handling
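The automatic type detection described above can be sketched as a check on the URL path. This is a minimal sketch, assuming detection looks only for `/products/` and `/collections/` path segments; `detectType` is a hypothetical name, not the actor's actual function.

```javascript
// Hypothetical helper (not the actor's actual code): classify a Shopify URL
// as "product" or "collection" from its path. Product URLs can be nested
// under a collection (/collections/sale/products/blue-shirt), so the
// "products" segment is checked first.
function detectType(rawUrl) {
  const segments = new URL(rawUrl).pathname.split("/").filter(Boolean);

  // A "products" segment followed by a handle means a product page.
  const productIdx = segments.lastIndexOf("products");
  if (productIdx !== -1 && productIdx + 1 < segments.length) {
    return "product";
  }

  // Otherwise, a "collections" segment followed by a handle means a collection.
  const collectionIdx = segments.indexOf("collections");
  if (collectionIdx !== -1 && collectionIdx + 1 < segments.length) {
    return "collection";
  }

  throw new Error(`Not a Shopify product or collection URL: ${rawUrl}`);
}

console.log(detectType("https://store.myshopify.com/collections/featured")); // collection
console.log(detectType("https://store.myshopify.com/collections/sale/products/blue-shirt")); // product
```

Requiring a handle after the segment keeps a collection literally named `products` (`/collections/products`) from being misread as a product page.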

Input Parameters

| Parameter | Type   | Required | Description |
|-----------|--------|----------|-------------|
| url       | string | Yes      | The Shopify URL to scrape (must contain /products/ or /collections/) |
| type      | string | No       | Type of content to scrape: "auto", "product", or "collection" (default: "auto") |
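The rules in the parameter table can be enforced before any request is made. A minimal sketch, assuming the input arrives as a plain object; `normalizeInput` and `ALLOWED_TYPES` are hypothetical names, not part of the actor.

```javascript
// Hypothetical input check mirroring the parameter table: "url" is required
// and must contain /products/ or /collections/; "type" is optional and
// defaults to "auto".
const ALLOWED_TYPES = ["auto", "product", "collection"];

function normalizeInput(input) {
  if (!input || typeof input.url !== "string") {
    throw new Error('Input must include a "url" string');
  }
  if (!/\/(products|collections)\//.test(input.url)) {
    throw new Error("url must contain /products/ or /collections/");
  }
  // Missing (or null) type falls back to the documented default.
  const type = input.type ?? "auto";
  if (!ALLOWED_TYPES.includes(type)) {
    throw new Error(`type must be one of: ${ALLOWED_TYPES.join(", ")}`);
  }
  return { url: input.url, type };
}
```

Failing fast here gives a clear error message instead of a 404 from the store later.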

Output Format

For Collections

{
  "id": "collection_id",
  "title": "Collection Name",
  "handle": "collection-handle",
  "body_html": "Collection description",
  "published_at": "2023-01-01T00:00:00Z",
  "updated_at": "2023-01-01T00:00:00Z",
  "sort_order": "manual",
  "template_suffix": null,
  "products_count": 50,
  "products": [
    {
      "id": "product_id",
      "title": "Product Name",
      "handle": "product-handle",
      "description": "Product description",
      "images": ["https://example.com/image.jpg"],
      "url": "https://store.com/products/product-handle",
      "price": "29.99",
      "compare_at_price": "39.99",
      "variants": [
        {
          "id": "variant_id",
          "title": "Default Title",
          "price": "29.99",
          "compare_at_price": "39.99",
          "available": true,
          "sku": "SKU123"
        }
      ]
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_products": 50,
    "products_on_page": 50
  }
}
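Because the collection output nests variants inside products, a consumer often flattens it into one row per variant (for a spreadsheet or database). A minimal sketch, assuming the output shape shown above, where prices are decimal strings and `compare_at_price` may be absent; `toVariantRows` is a hypothetical helper.

```javascript
// Hypothetical consumer-side helper: flatten a collection result into one
// row per variant. Prices in the output are strings, so they are parsed here.
function toVariantRows(collection) {
  const rows = [];
  for (const product of collection.products ?? []) {
    for (const variant of product.variants ?? []) {
      const price = Number.parseFloat(variant.price);
      // A missing compare_at_price parses to NaN, and NaN > price is false.
      const compareAt = Number.parseFloat(variant.compare_at_price);
      rows.push({
        product: product.title,
        variant: variant.title,
        sku: variant.sku,
        price,
        onSale: compareAt > price,
        available: variant.available === true,
        url: product.url,
      });
    }
  }
  return rows;
}
```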

For Products

{
  "id": "product_id",
  "title": "Product Name",
  "handle": "product-handle",
  "body_html": "Product description",
  "images": ["https://example.com/image.jpg"],
  "variants": [
    {
      "id": "variant_id",
      "title": "Default Title",
      "price": "29.99",
      "compare_at_price": "39.99",
      "available": true,
      "sku": "SKU123"
    }
  ],
  "price": "29.99",
  "availability": "In Stock",
  "sku": "SKU123",
  "vendor": "Brand Name"
}
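The product's top-level `price` and `availability` summarize its variants. This README does not document how they are derived, so the sketch below is an assumption: it reports the lowest variant price and marks the product "In Stock" if any variant is available. `summarizeVariants` and the "Out of Stock" label are hypothetical.

```javascript
// Hypothetical helper: derive a product-level price and availability label
// from its variants. Assumes the lowest variant price is reported and that
// "Out of Stock" is the counterpart of "In Stock"; neither is confirmed.
function summarizeVariants(variants = []) {
  const prices = variants
    .map((v) => Number.parseFloat(v.price))
    .filter(Number.isFinite); // skip variants with missing or malformed prices
  const lowest = prices.length > 0 ? Math.min(...prices) : null;
  return {
    // Keep the string format used in the output (e.g. "29.99").
    price: lowest === null ? null : lowest.toFixed(2),
    availability: variants.some((v) => v.available === true) ? "In Stock" : "Out of Stock",
  };
}
```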

Usage Examples

Local Development

  1. Install dependencies:
     $ npm install
  2. Create a .env file (optional for local testing):
     SCRAPEDO_API_KEY=your_api_key_here
  3. Run the Apify actor:
     $ npm start
  4. Or test the standalone version:
     $ node standalone.js

Apify Platform

  1. Deploy the actor to the Apify platform.
  2. Use the web interface to enter the input parameters.
  3. Or start a run through the Apify API:
curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://store.myshopify.com/collections/featured",
    "type": "collection"
  }'
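The same call can be made from Node.js. This sketch mirrors the curl request above using the global `fetch` (Node.js 18+); `YOUR_ACTOR_ID` and the token remain placeholders, and `buildRunRequest` and `startRun` are hypothetical helper names. The request is built separately so it can be inspected without sending anything.

```javascript
// Build the POST request that starts an actor run, equivalent to the curl
// example: the token goes in the query string and the input is the JSON body.
function buildRunRequest(actorId, token, input) {
  const url = new URL(`https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`);
  url.searchParams.set("token", token);
  return {
    url: url.toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(input),
    },
  };
}

// Send the request (requires Node.js 18+ for the global fetch).
async function startRun(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  // Apify API v2 wraps the run object in a "data" field.
  return (await response.json()).data;
}
```

Usage: `startRun("YOUR_ACTOR_ID", "YOUR_API_TOKEN", { url: "https://store.myshopify.com/collections/featured", type: "collection" })`.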

Testing

Standalone Testing

The standalone.js file provides a simple way to test the scraper without Apify:

$ node standalone.js

This will:

  • Test with a real Shopify store URL
  • Save results to scraping-results.json
  • Show detailed logging of the scraping process

Custom Testing

You can modify test-input.json to test different URLs:

{
  "url": "https://your-shopify-store.com/collections/featured",
  "type": "collection"
}

Supported Shopify Features

  • JSON Endpoints: Automatically tries *.json endpoints for clean data
  • Product Collections: Extracts all products from collection pages
  • Product Variants: Handles multiple product variants with different prices
  • Images: Extracts product images with proper URL resolution
  • Pricing: Handles both regular and sale prices
  • SEO Data: Extracts product handles, titles, and descriptions
  • Inventory: Reports product availability and SKU information
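Image `src` values scraped from HTML are often relative; Shopify themes commonly emit protocol-relative URLs such as `//cdn.shopify.com/...`. One way to do the URL resolution mentioned above is the WHATWG `URL` constructor with the page URL as the base. This is a sketch; the actor's own resolution logic is not documented, and `resolveImageUrls` is a hypothetical name.

```javascript
// Resolve scraped image src values against the page they came from.
// Handles protocol-relative ("//cdn...") and root-relative ("/cdn/...")
// values, skips empty or unparseable ones, and removes duplicates.
function resolveImageUrls(srcs, pageUrl) {
  const resolved = new Set();
  for (const src of srcs) {
    if (!src) continue;
    try {
      resolved.add(new URL(src.trim(), pageUrl).href);
    } catch {
      // Ignore malformed values rather than failing the whole product.
    }
  }
  return [...resolved];
}
```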

Error Handling

The actor implements multiple fallback methods:

  1. Primary: Shopify JSON endpoints (/products/*.json, /collections/*.json)
  2. Secondary: Collection products API (/collections/*/products.json)
  3. Fallback: HTML scraping with multiple CSS selectors
  4. Graceful Degradation: Returns partial data even if some methods fail
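The fallback order above can be sketched as a loop over candidate endpoints that returns the first one yielding data. This is a minimal sketch assuming the endpoint paths listed above; `fetchJson` and `scrapeHtml` are hypothetical callbacks passed in (the HTML step stands in for the CSS-selector scraping), which also lets the logic run without network access. It does not reproduce the partial-data behavior of step 4.

```javascript
// Build candidate JSON endpoints for a product or collection URL, in the
// order listed above. The handle is the path segment after
// "products" or "collections".
function jsonEndpoints(pageUrl, type) {
  const url = new URL(pageUrl);
  const segments = url.pathname.split("/").filter(Boolean);
  const key = type === "product" ? "products" : "collections";
  const idx = segments.lastIndexOf(key);
  const handle = idx === -1 ? undefined : segments[idx + 1];
  if (!handle) throw new Error(`No ${type} handle in ${pageUrl}`);

  const base = `${url.origin}/${key}/${handle}`;
  // Products: /products/<handle>.json
  // Collections: /collections/<handle>.json, then /collections/<handle>/products.json
  return type === "product" ? [`${base}.json`] : [`${base}.json`, `${base}/products.json`];
}

// fetchJson(url) must resolve to parsed JSON or reject (e.g. on 403/404).
async function scrapeWithFallback(pageUrl, type, fetchJson, scrapeHtml) {
  const errors = [];
  for (const endpoint of jsonEndpoints(pageUrl, type)) {
    try {
      return { source: endpoint, data: await fetchJson(endpoint) };
    } catch (err) {
      errors.push(`${endpoint}: ${err.message}`);
    }
  }
  // Last resort: HTML scraping.
  try {
    return { source: "html", data: await scrapeHtml(pageUrl) };
  } catch (err) {
    errors.push(`html: ${err.message}`);
    throw new Error(`All methods failed:\n${errors.join("\n")}`);
  }
}
```

Collecting every error message makes it easy to see which step a store blocked when all of them fail.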

File Structure

apify-product-collection/
├── src/
│ └── main.js # Main Apify actor code
├── apify.json # Apify configuration
├── package.json # Dependencies and scripts
├── standalone.js # Standalone testing version
├── test-input.json # Test input for local development
├── example.js # Example usage
└── README.md # This file

Limitations

  • Requires the Shopify store to be publicly accessible
  • Some stores may block automated requests (403/404 errors are common)
  • Complex product variants may not be fully captured via HTML scraping
  • Collection scraping returns only the current page of products; later pages are not fetched
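A caller who needs every product in a large collection could page through the collection's `products.json` endpoint directly. This is a hypothetical workaround, not part of the actor: it assumes the endpoint accepts `limit` and `page` query parameters, and `fetchAllCollectionProducts` and the injected `fetchJson` callback are illustrative names.

```javascript
// Hypothetical workaround: page through /collections/<handle>/products.json,
// assuming it accepts "limit" and "page" query parameters. Stops at the first
// short page, or when a page adds no new product ids (e.g. if the store
// ignores "page" and keeps returning the first page). maxPages caps the loop.
async function fetchAllCollectionProducts(origin, handle, fetchJson, { limit = 250, maxPages = 100 } = {}) {
  const all = [];
  const seen = new Set();
  for (let page = 1; page <= maxPages; page++) {
    const url = `${origin}/collections/${encodeURIComponent(handle)}/products.json?limit=${limit}&page=${page}`;
    const { products = [] } = await fetchJson(url);
    let added = 0;
    for (const product of products) {
      if (!seen.has(product.id)) {
        seen.add(product.id);
        all.push(product);
        added++;
      }
    }
    if (products.length < limit || added === 0) break;
  }
  return all;
}
```

The duplicate-id check is a guard against stores that ignore the `page` parameter, which would otherwise return the same products until `maxPages` is reached.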

Troubleshooting

Common Issues

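  1. 403/404 Errors: Many Shopify stores block automated requests. A 404 can also mean the product or collection handle in the URL does not exist.
  2. No Products Found: Try a different URL, or check that the store is publicly accessible.
  3. Apify Integration Issues: Make sure the installed Apify SDK version matches the one declared in package.json.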

Debugging

  • Check the console output for detailed error messages
  • Use standalone.js for easier debugging
  • Modify test-input.json to test different URLs

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

License

ISC License