Shopify Product & Collection Scraper
Status: Deprecated
Pricing: Pay per usage
Shopify Product & Collection Scraper extracts structured data from any public Shopify-powered store. Provide a product or collection page URL, and it returns titles, prices, images, descriptions, variants, and availability.
Developer: Farabiulder
Maintained by Community
An Apify actor that retrieves product and collection information from Shopify stores as JSON data.
Features
- Automatic Type Detection: Detects whether the URL is a product or collection page
- Multiple Scraping Methods:
  - First tries Shopify's JSON endpoints for clean data
  - Falls back to HTML scraping with multiple selectors
- Comprehensive Data Extraction: Extracts titles, prices, images, variants, descriptions, and more
- Robust Error Handling: Fallback methods keep extraction working when an individual method fails
- Apify Integration: Properly structured as an Apify actor with input/output handling
Input Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The Shopify URL to scrape (must contain /products/ or /collections/) |
| type | string | No | Type of content to scrape: "auto", "product", or "collection" (default: "auto") |
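The way the two inputs combine can be sketched as follows. This is a minimal illustration, not the actor's actual code: `resolveType` is a hypothetical helper, and giving `/products/` precedence over `/collections/` in auto mode is an assumption (it covers URLs such as `/collections/sale/products/shirt`).

```javascript
// Hypothetical helper: resolves the effective scrape type from the
// "url" and "type" inputs described in the table above.
function resolveType(url, type = 'auto') {
  const { pathname } = new URL(url); // throws on malformed URLs
  const isProduct = pathname.includes('/products/');
  const isCollection = pathname.includes('/collections/');

  if (!isProduct && !isCollection) {
    throw new Error('URL must contain /products/ or /collections/');
  }
  // An explicit type overrides detection.
  if (type === 'product' || type === 'collection') return type;
  if (type !== 'auto') throw new Error(`Unknown type: ${type}`);

  // Assumption: a product reached through a collection path
  // (/collections/x/products/y) is treated as a product page.
  return isProduct ? 'product' : 'collection';
}

console.log(resolveType('https://store.myshopify.com/collections/featured'));
```

Validating the URL before any request is made gives an immediate input error for an unsupported page, rather than an empty result later.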
Output Format
For Collections
{
  "id": "collection_id",
  "title": "Collection Name",
  "handle": "collection-handle",
  "body_html": "Collection description",
  "published_at": "2023-01-01T00:00:00Z",
  "updated_at": "2023-01-01T00:00:00Z",
  "sort_order": "manual",
  "template_suffix": null,
  "products_count": 50,
  "products": [
    {
      "id": "product_id",
      "title": "Product Name",
      "handle": "product-handle",
      "description": "Product description",
      "images": ["https://example.com/image.jpg"],
      "url": "https://store.com/products/product-handle",
      "price": "29.99",
      "compare_at_price": "39.99",
      "variants": [
        {
          "id": "variant_id",
          "title": "Default Title",
          "price": "29.99",
          "compare_at_price": "39.99",
          "available": true,
          "sku": "SKU123"
        }
      ]
    }
  ],
  "pagination": {
    "current_page": 1,
    "total_pages": 1,
    "total_products": 50,
    "products_on_page": 50
  }
}
For Products
{
  "id": "product_id",
  "title": "Product Name",
  "handle": "product-handle",
  "body_html": "Product description",
  "images": ["https://example.com/image.jpg"],
  "variants": [
    {
      "id": "variant_id",
      "title": "Default Title",
      "price": "29.99",
      "compare_at_price": "39.99",
      "available": true,
      "sku": "SKU123"
    }
  ],
  "price": "29.99",
  "availability": "In Stock",
  "sku": "SKU123",
  "vendor": "Brand Name"
}
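Prices in the output are strings, so they need parsing before any arithmetic. The sketch below shows one way a consumer might post-process a product record. `variantDiscounts` is a hypothetical helper, and it relies only on the fields shown in the sample output above.

```javascript
// Hypothetical consumer-side helper: derives a discount percentage per
// variant from the string-valued "price" and "compare_at_price" fields.
function variantDiscounts(product) {
  return (product.variants ?? []).map((variant) => {
    const price = Number.parseFloat(variant.price);
    const compareAt = Number.parseFloat(variant.compare_at_price);
    // A missing or null compare_at_price parses to NaN and means "not on sale".
    const onSale = Number.isFinite(compareAt) && compareAt > price;
    return {
      sku: variant.sku,
      available: variant.available === true,
      price,
      discountPercent: onSale
        ? Math.round(((compareAt - price) / compareAt) * 100)
        : 0,
    };
  });
}

const sample = {
  title: 'Product Name',
  variants: [
    { sku: 'SKU123', price: '29.99', compare_at_price: '39.99', available: true },
  ],
};
console.log(variantDiscounts(sample));
```

The same function works on each entry of a collection's `products` array, since those entries carry the same `variants` shape.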
Usage Examples
Local Development
- Install dependencies:
$ npm install
- Create a .env file (optional for local testing):
SCRAPEDO_API_KEY=your_api_key_here
- Run the Apify actor:
$ npm start
- Or test the standalone version:
$ node standalone.js
Apify Platform
- Deploy the actor to the Apify platform
- Use the web interface to input parameters
- Or use the Apify API:
curl -X POST "https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://store.myshopify.com/collections/featured", "type": "collection"}'
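The same request can be issued from Node.js. The sketch below mirrors the curl call above and assumes Node.js 18 or later for the global `fetch`. `buildRunRequest` and `startRun` are hypothetical names, and `YOUR_ACTOR_ID` and `YOUR_API_TOKEN` remain placeholders.

```javascript
// Builds the URL and fetch options equivalent to the curl command above.
function buildRunRequest(actorId, token, input) {
  const url = new URL(
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`
  );
  url.searchParams.set('token', token);
  return {
    url: url.toString(),
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(input), // the actor input: { url, type }
    },
  };
}

// Sends the request and returns the parsed JSON response.
async function startRun(actorId, token, input) {
  const { url, init } = buildRunRequest(actorId, token, input);
  const response = await fetch(url, init); // global fetch, Node.js 18+
  if (!response.ok) {
    throw new Error(`Apify API responded with HTTP ${response.status}`);
  }
  return response.json();
}

const request = buildRunRequest('YOUR_ACTOR_ID', 'YOUR_API_TOKEN', {
  url: 'https://store.myshopify.com/collections/featured',
  type: 'collection',
});
console.log(request.url);
```

Keeping request construction separate from the network call makes the URL and payload easy to inspect before a run is started.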
Testing
Standalone Testing
The standalone.js file provides a simple way to test the scraper without Apify:
$ node standalone.js
This will:
- Test with a real Shopify store URL
- Save results to scraping-results.json
- Show detailed logging of the scraping process
Custom Testing
You can modify test-input.json to test different URLs:
{
  "url": "https://your-shopify-store.com/collections/featured",
  "type": "collection"
}
Supported Shopify Features
- JSON Endpoints: Automatically tries *.json endpoints for clean data
- Product Collections: Extracts all products from collection pages
- Product Variants: Handles multiple product variants with different prices
- Images: Extracts product images with proper URL resolution
- Pricing: Handles both regular and sale prices
- SEO Data: Extracts product handles, titles, and descriptions
- Inventory: Tracks product availability and SKU information
Error Handling
The actor implements multiple fallback methods:
- Primary: Shopify JSON endpoints (/products/*.json, /collections/*.json)
- Secondary: Collection products API (/collections/*/products.json)
- Fallback: HTML scraping with multiple CSS selectors
- Graceful Degradation: Returns partial data even if some methods fail
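The fallback order above can be sketched as a simple chain. This is a simplified illustration under stated assumptions, not the actor's actual code: `jsonEndpoints` and `scrapeWithFallback` are hypothetical names, the `fetchJson` and `scrapeHtml` functions are injected by the caller, and the chain stops at the first source that succeeds rather than merging results.

```javascript
// Candidate JSON endpoints for a page URL, in the order listed above.
function jsonEndpoints(pageUrl, type) {
  const url = new URL(pageUrl);
  const base = url.origin + url.pathname.replace(/\/+$/, '');
  return type === 'collection'
    ? [`${base}.json`, `${base}/products.json`]
    : [`${base}.json`];
}

// Tries each JSON endpoint, then HTML scraping. fetchJson and scrapeHtml
// are caller-supplied async functions (hypothetical), which keeps the
// chain testable without network access.
async function scrapeWithFallback(pageUrl, type, { fetchJson, scrapeHtml }) {
  const errors = [];
  for (const endpoint of jsonEndpoints(pageUrl, type)) {
    try {
      return { source: endpoint, data: await fetchJson(endpoint), errors };
    } catch (err) {
      errors.push(`${endpoint}: ${err.message}`);
    }
  }
  try {
    return { source: 'html', data: await scrapeHtml(pageUrl), errors };
  } catch (err) {
    errors.push(`html: ${err.message}`);
    // Graceful degradation: report what failed instead of throwing.
    return { source: null, data: null, errors };
  }
}

console.log(jsonEndpoints('https://store.myshopify.com/collections/featured', 'collection'));
```

Collecting the error from every failed step, instead of only the last one, makes it easier to tell a blocked store (for example a 403 on every endpoint) from a page that simply has no JSON representation.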
File Structure
apify-product-collection/
├── src/
│   └── main.js          # Main Apify actor code
├── apify.json           # Apify configuration
├── package.json         # Dependencies and scripts
├── standalone.js        # Standalone testing version
├── test-input.json      # Test input for local development
├── example.js           # Example usage
└── README.md            # This file
Limitations
- Requires the Shopify store to be publicly accessible
- Some stores may block automated requests (403/404 errors are common)
- Complex product variants may not be fully captured via HTML scraping
- Pagination is limited to the current page for collection scraping
Troubleshooting
Common Issues
- 403/404 Errors: Many Shopify stores block automated requests
- No Products Found: Try different URLs or check if the store is accessible
- Apify Integration Issues: Make sure you're using the correct Apify SDK version
Debugging
- Check the console output for detailed error messages
- Use standalone.js for easier debugging
- Modify test-input.json to test different URLs
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
License
ISC License