
Shopify Scraper (GraphQL)
Pricing
Pay per usage

An Apify actor that crawls Shopify stores via `sitemap.xml` and fetches product data using the Storefront GraphQL API. Optimized for speed and cost with per-host batching, incremental processing, and buffered dataset writes.
Last modified
23 days ago
Features
- Reads `sitemap.xml` and filters product URLs (`/products/<handle>`)
- Batches GraphQL requests per store using aliases (fewer round-trips)
- Optional incremental runs (skips already processed product IDs)
- Optional lastmod cutoff to skip old products
- Outputs a single record per product; all variants are available under `additional.variants`
- Extensible via `extendScraperFunction` and `extendOutputFunction`
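
The URL filtering and alias batching listed above can be sketched roughly as follows. This is an illustration, not the actor's actual code: it assumes the Storefront API's `product(handle:)` query, and the function names, alias scheme (`p0`, `p1`, ...) and selected fields are hypothetical.

```javascript
// Hypothetical helper: extract the product handle from a sitemap URL,
// or return null when the URL is not a /products/<handle> page.
function handleFromUrl(url) {
  const match = new URL(url).pathname.match(/\/products\/([^/]+)\/?$/);
  return match ? decodeURIComponent(match[1]) : null;
}

// Hypothetical helper: build one GraphQL document that looks up several
// products at once. Each lookup gets its own alias (p0, p1, ...) and its own
// variable ($h0, $h1, ...), so one HTTP round-trip covers the whole batch.
function buildBatchedProductQuery(handles) {
  if (handles.length === 0) {
    throw new Error('At least one handle is required');
  }
  const varDefs = handles.map((_, i) => `$h${i}: String!`).join(', ');
  const fields = handles
    .map((_, i) => `  p${i}: product(handle: $h${i}) { id handle title }`)
    .join('\n');
  const variables = Object.fromEntries(handles.map((h, i) => [`h${i}`, h]));
  return { query: `query ProductBatch(${varDefs}) {\n${fields}\n}`, variables };
}
```

Passing handles as variables rather than interpolating them into the query text avoids having to escape them by hand.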
Input parameters (core)
- `startUrls`: array of `sitemap.xml` URLs
- `storefrontApiVersion`: Storefront API version (e.g., `2024-07`)
- `storefrontAccessToken`: your Storefront access token
- `maxRequestsPerCrawl`, `maxConcurrency`, `maxRequestRetries`, `proxyConfig`, `debugLog`
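
To show where `storefrontApiVersion` and `storefrontAccessToken` end up, here is a minimal sketch of a Storefront GraphQL request. It assumes the standard Shopify endpoint layout (`/api/<version>/graphql.json`) and the `X-Shopify-Storefront-Access-Token` header; the helper name and its parameter object are hypothetical and may differ from what the actor does internally.

```javascript
// Hypothetical helper: assemble the URL and fetch options for one
// Storefront GraphQL call. Nothing is sent here; the caller decides
// how to perform the request.
function buildStorefrontRequest({ host, apiVersion, accessToken, query, variables = {} }) {
  return {
    url: `https://${host}/api/${apiVersion}/graphql.json`,
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'X-Shopify-Storefront-Access-Token': accessToken,
      },
      body: JSON.stringify({ query, variables }),
    },
  };
}
```

With Node 18 or newer, the result can be passed straight to the built-in `fetch`: `const { url, options } = buildStorefrontRequest({...}); const res = await fetch(url, options);`.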
Performance inputs
- `updatedSince`: ISO date; skips products with `<lastmod>` older than this
- `batchSize`: product handles per GraphQL request (default 10)
- `flushIntervalMs`: max delay before sending a partial batch (default 300)
- `perHostConcurrency`: parallel GraphQL requests per store (default 2)
- `bufferWrites`: buffer dataset writes (default true)
- `bufferSize`: items per dataset push (default 100)
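
As a rough illustration of how `updatedSince` and `batchSize` interact, the sketch below filters sitemap entries by `<lastmod>` and then splits the surviving handles into batches. The entry shape (`{ handle, lastmod }`) and both function names are assumptions made for this example; keeping entries that have no `lastmod` is also an assumption, not documented behavior of the actor.

```javascript
// Hypothetical helper: keep handles whose lastmod is on or after the cutoff.
// Entries without a lastmod are kept (assumption); entries whose lastmod
// cannot be parsed are dropped because the comparison with NaN is false.
function selectFreshHandles(entries, updatedSince) {
  const cutoff = updatedSince ? Date.parse(updatedSince) : null;
  return entries
    .filter(({ lastmod }) => {
      if (cutoff === null || !lastmod) return true;
      return Date.parse(lastmod) >= cutoff;
    })
    .map(({ handle }) => handle);
}

// Hypothetical helper: split a list into consecutive groups of at most `size`.
function chunk(items, size) {
  const groups = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}
```

Each group returned by `chunk(handles, batchSize)` would correspond to one GraphQL request for that store.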
Run locally
- Install dependencies:
    $ npm install
- Create local input at `apify_storage/key_value_stores/default/INPUT.json`, for example:
    {
      "startUrls": [{ "url": "https://example.com/sitemap.xml" }],
      "storefrontApiVersion": "2024-07",
      "storefrontAccessToken": "<YOUR_STOREFRONT_TOKEN>",
      "maxRequestsPerCrawl": 50,
      "maxConcurrency": 10,
      "debugLog": true
    }
- Start the actor:
    $ npm start
  Or run in development mode with auto-restart:
    $ npm run dev
GitHub integration
Workflows in `.github/workflows/`:
- `ci.yml`: install, lint, and syntax check on push/PR to `main`.
- `codeql.yml`: CodeQL security analysis on push/PR and weekly.
Docker quick start
    make init   # creates .env and INPUT.json from templates
    make run    # docker compose up --build actor
Outputs will be in `apify_storage/datasets/default`.
Extensibility
- `extendScraperFunction`: lifecycle hooks (`SETUP`, `FILTER_SITEMAP_URL`, `PRENAVIGATION`, `POSTNAVIGATION`, `RUN`, `FINISHED`)
- `extendOutputFunction`: transform/filter final records before they are saved to the Dataset
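
The exact signature the actor passes to `extendOutputFunction` is not documented here, so the following is only a sketch of the transform/filter idea. It assumes the function receives an object containing the final record as `item`, that the returned object is what gets saved, and that returning `null` drops the record. The `variantCount` field is made up for the example.

```javascript
// Hypothetical output hook. Assumed contract (not confirmed by this README):
// input is { item }, the return value is saved, and null skips the record.
function extendOutputFunction({ item }) {
  const variants = item.additional?.variants ?? [];
  if (variants.length === 0) {
    return null; // filter: drop products that have no variants
  }
  // transform: add a derived field without mutating the original record
  return { ...item, variantCount: variants.length };
}
```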
License
This project is licensed under the Apache License 2.0. See `LICENSE` and `NOTICE`.