Actor Website Change Monitor
Pricing
Pay per usage
Developer
Egor Kaleynik
Website Change Detection & Monitoring (Apify Actor)
Stateful website monitoring actor for long-running cron use. Unlike one-shot extractors, this actor stores URL history, compares snapshots between runs, and emits change events.
Modes
- monitor (Mode A): the user provides an explicit URL list (startUrls) and the actor watches those URLs for changes.
- discover (Mode B): the actor discovers URLs from a domain (or a category URL seed) and then monitors them.
- promo-detect (Mode C): the actor generates promo URL candidates from a pattern library, probes them in batch, then monitors the pages that are live.
Amazon rule: platform=amazon + mode=promo-detect is blocked by input validation.
Scheduler Guidance
- Default: every 24h
- Recommended: every 12h
- Hard minimum: 6h
- Maximum interval: 168h (7 days)
Input guardrail: checkIntervalHours must be 6..168.
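The two guardrails above (the interval range and the Amazon rule) could be checked along these lines. This is a minimal sketch, not the actor's actual validation code: the MonitorInput shape, the validateInput name, and the assumption that "platform" is read from platformHint are all illustrative.

```typescript
// Hypothetical input shape; field names follow this README, the types are assumed.
interface MonitorInput {
  mode: "monitor" | "discover" | "promo-detect";
  platformHint?: string;
  checkIntervalHours?: number;
}

const MIN_INTERVAL_HOURS = 6;
const MAX_INTERVAL_HOURS = 168;
const DEFAULT_INTERVAL_HOURS = 24;

// Returns a list of validation errors; an empty list means the input passed.
function validateInput(input: MonitorInput): string[] {
  const errors: string[] = [];
  const interval = input.checkIntervalHours ?? DEFAULT_INTERVAL_HOURS;
  if (!Number.isFinite(interval) || interval < MIN_INTERVAL_HOURS || interval > MAX_INTERVAL_HOURS) {
    errors.push(`checkIntervalHours must be ${MIN_INTERVAL_HOURS}..${MAX_INTERVAL_HOURS}`);
  }
  // Assumption: the platform is taken from platformHint; the real actor may also use detection.
  if (input.platformHint === "amazon" && input.mode === "promo-detect") {
    errors.push("platform=amazon with mode=promo-detect is not allowed");
  }
  return errors;
}
```

Returning a list instead of throwing on the first problem lets a caller report every violation in one pass.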
Tiering Model
- Tier 1: checked every run.
- Tier 2: checked on modulo rotation (runCounter % N === tierOffset), where N = tier2Modulo.
- Tier 3: baseline capture tier (new URLs).
- Quarantine: after 5 consecutive fetch errors.
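The per-run selection can be pictured as a small predicate. The sketch below is illustrative only: the UrlEntry shape and the isDueThisRun name are hypothetical, and it assumes that Tier 3 URLs are fetched on the next run to capture a baseline and that quarantined URLs are skipped.

```typescript
type Tier = 1 | 2 | 3 | "quarantine";

// Hypothetical registry entry; tierOffset is assumed to be a fixed slot in [0, tier2Modulo).
interface UrlEntry {
  url: string;
  tier: Tier;
  tierOffset: number;
}

function isDueThisRun(entry: UrlEntry, runCounter: number, tier2Modulo: number): boolean {
  switch (entry.tier) {
    case 1:
      return true; // Tier 1: checked every run
    case 2:
      return runCounter % tier2Modulo === entry.tierOffset; // Tier 2: modulo rotation
    case 3:
      return true; // assumption: new URLs get their baseline captured right away
    default:
      return false; // assumption: quarantined URLs are not fetched
  }
}
```

With tier2Modulo = 4, for example, each Tier 2 URL is fetched once every four runs, and spreading tierOffset values evenly keeps the per-run load roughly constant.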
Transitions:
- Tier 3 -> Tier 2 after baseline captured.
- Tier 2 -> Tier 1 after detected change.
- Tier 1 -> Tier 2 after 3 stable runs.
- Any -> Quarantine after 5 consecutive errors.
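These transitions form a small state machine. The following sketch shows one way to express it; the TrackedUrl fields, the CheckOutcome values, and the applyOutcome name are assumptions made for illustration. Leaving quarantine is not described here, so the sketch keeps a quarantined URL where it is.

```typescript
type TierState = "tier1" | "tier2" | "tier3" | "quarantine";

interface TrackedUrl {
  tier: TierState;
  stableRuns: number;        // consecutive checks without a detected change
  consecutiveErrors: number; // consecutive fetch failures
}

type CheckOutcome = "baseline" | "changed" | "unchanged" | "error";

const STABLE_RUNS_TO_DEMOTE = 3;
const ERRORS_TO_QUARANTINE = 5;

function applyOutcome(state: TrackedUrl, outcome: CheckOutcome): TrackedUrl {
  if (outcome === "error") {
    const consecutiveErrors = state.consecutiveErrors + 1;
    const tier: TierState = consecutiveErrors >= ERRORS_TO_QUARANTINE ? "quarantine" : state.tier;
    return { ...state, tier, consecutiveErrors };
  }
  // Any successful fetch resets the error streak.
  const next: TrackedUrl = { ...state, consecutiveErrors: 0 };
  if (state.tier === "tier3" && outcome === "baseline") {
    return { ...next, tier: "tier2", stableRuns: 0 };
  }
  if (outcome === "changed") {
    // Tier 2 -> Tier 1 on change; other tiers stay put with the stable streak reset.
    return { ...next, tier: state.tier === "tier2" ? "tier1" : state.tier, stableRuns: 0 };
  }
  if (outcome === "unchanged") {
    const stableRuns = state.stableRuns + 1;
    if (state.tier === "tier1" && stableRuns >= STABLE_RUNS_TO_DEMOTE) {
      return { ...next, tier: "tier2", stableRuns: 0 };
    }
    return { ...next, stableRuns };
  }
  return next;
}
```

Keeping the function pure (state in, state out) makes it straightforward to persist the result back into a registry entry after each check.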
Platform Support
- Detection: Shopify, WooCommerce, Magento, Amazon, BigCommerce, PrestaShop, Generic.
- Platform extraction: Shopify/Woo/Magento/Amazon selector packs + JSON/API-like parsing + JSON-LD fallback.
Discovery Strategies
Mode discover
- Sitemap recursion (/sitemap.xml, /sitemap_index.xml, robots.txt references)
- Shopify API pagination (/products.json?limit=250&since_id=...) for Shopify platforms
- Category/listing crawl fallback (depth and page-limit configurable)
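To make the sitemap recursion step concrete, here is a rough sketch of the parsing half: it separates a sitemap index (whose entries point at further sitemaps to fetch) from a plain urlset (whose entries are page URLs). The parseSitemap name and ParsedSitemap shape are hypothetical, and the regex-based approach is a simplification that ignores CDATA-wrapped values.

```typescript
interface ParsedSitemap {
  childSitemaps: string[]; // <loc> entries of a sitemap index, to be fetched recursively
  pageUrls: string[];      // <loc> entries of a urlset
}

function parseSitemap(xml: string): ParsedSitemap {
  const isIndex = /<sitemapindex[\s>]/i.test(xml);
  const locs: string[] = [];
  // Matches plain <loc> elements only, so namespaced tags such as <image:loc> are skipped.
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/gi;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    locs.push(match[1].replace(/&amp;/g, "&")); // undo the most common XML escape
  }
  return isIndex
    ? { childSitemaps: locs, pageUrls: [] }
    : { childSitemaps: [], pageUrls: locs };
}
```

A caller would fetch each entry in childSitemaps, parse it the same way, and stop once maxUrls page URLs have been collected.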
Amazon discover
- Strategy 1: amazonAsins list (build /dp/{ASIN} URLs)
- Strategy 2: keyword search crawl (/s), parse data-asin
- Strategy 3: browse node crawl (/b?node=...), parse data-asin
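The ASIN handling behind these strategies might look roughly like the sketch below. The function names and the default marketplace origin are assumptions, and the sketch treats an ASIN as ten uppercase alphanumeric characters, which also filters out the empty data-asin attributes that listing pages tend to contain.

```typescript
const ASIN_PATTERN = /^[A-Z0-9]{10}$/;

// Strategy 1: turn an ASIN list into product URLs. The origin is an assumed parameter.
function buildAsinUrls(asins: string[], origin = "https://www.amazon.com"): string[] {
  return asins
    .map((asin) => asin.trim().toUpperCase())
    .filter((asin) => ASIN_PATTERN.test(asin))
    .map((asin) => `${origin}/dp/${asin}`);
}

// Strategies 2 and 3: collect unique, non-empty data-asin values from a listing page.
function extractAsins(html: string): string[] {
  const seen = new Set<string>();
  const pattern = /data-asin="([A-Z0-9]{10})"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    seen.add(match[1]);
  }
  return Array.from(seen);
}
```

Feeding the output of extractAsins into buildAsinUrls gives all three strategies the same /dp/{ASIN} URL form in the registry.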
Mode C Probe Engine
Batch probe engine (before registry merge):
- Concurrency control (probeConcurrency)
- Per-domain throttle (probeDelayMs)
- HEAD-first probing with GET fallback for HEAD-blocking responses
- Redirect capture (MODE_C_REDIRECTS KV key)
- Probe lifecycle events:
  - page_appeared for newly live probed URLs
  - page_disappeared for previously known URLs that fail probe
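The lifecycle events amount to a diff between what was known to be live before the probe and what the probe reports now. A minimal sketch, assuming a ProbeResult with a boolean live flag (the exact liveness criterion is not specified here) and hypothetical names throughout:

```typescript
interface ProbeResult {
  url: string;
  live: boolean; // assumption: true when the probe ended in a successful response
}

interface LifecycleEvent {
  type: "page_appeared" | "page_disappeared";
  url: string;
}

function diffProbeResults(knownLiveUrls: string[], results: ProbeResult[]): LifecycleEvent[] {
  const known = new Set(knownLiveUrls);
  const events: LifecycleEvent[] = [];
  for (const result of results) {
    if (result.live && !known.has(result.url)) {
      events.push({ type: "page_appeared", url: result.url });
    } else if (!result.live && known.has(result.url)) {
      events.push({ type: "page_disappeared", url: result.url });
    }
  }
  return events;
}
```

URLs that stay live, and candidates that were never live, produce no event, which keeps the output limited to actual state changes.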
Localization:
- Promo URL candidates include static locales and dynamic prefixes extracted from homepage hreflang links.
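As an illustration of the dynamic part, the sketch below pulls locale-like first path segments out of hreflang alternate links. It is a simplification under stated assumptions: the extractLocalePrefixes name is hypothetical, only the first path segment is considered (so a two-level prefix such as /pl/pl would be reduced to /pl), and HTML is scanned with regular expressions rather than a real parser.

```typescript
// Collect first path segments (e.g. "/pl", "/en-gb") from hreflang alternate links.
function extractLocalePrefixes(homepageHtml: string): string[] {
  const prefixes = new Set<string>();
  const linkPattern = /<link\b[^>]*\bhreflang=["'][^"']+["'][^>]*>/gi;
  let link: RegExpExecArray | null;
  while ((link = linkPattern.exec(homepageHtml)) !== null) {
    const href = /\bhref=["']([^"']+)["']/i.exec(link[0]);
    if (!href) continue;
    // Drop scheme and host so only the path remains.
    const path = href[1].replace(/^https?:\/\/[^/]+/i, "");
    const segment = /^\/([a-z]{2}(?:[-_][a-z]{2})?)(?:\/|$)/i.exec(path);
    if (segment) prefixes.add(`/${segment[1].toLowerCase()}`);
  }
  return Array.from(prefixes);
}
```

Each returned prefix could then be combined with the promo patterns to form additional candidates.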
Amazon Hardening
Implemented safeguards:
- Forces JS rendering (renderJs=true semantics)
- Randomized delay profile (3-8s)
- Rotating desktop user agents
- Session cookies persisted in-run
- Residential proxy enforcement (proxyConfig.apifyProxyGroups must include RESIDENTIAL)
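Two of these safeguards are simple enough to sketch: the proxy group check and the randomized delay. The function names are hypothetical, and the uniform distribution over the 3-8s window is an assumption; the actor's real delay profile may differ.

```typescript
interface ProxyConfig {
  useApifyProxy?: boolean;
  apifyProxyGroups?: string[];
}

// Throws unless the proxy configuration includes the RESIDENTIAL group.
function assertResidentialProxy(proxyConfig: ProxyConfig | undefined): void {
  const groups = proxyConfig?.apifyProxyGroups ?? [];
  if (groups.indexOf("RESIDENTIAL") === -1) {
    throw new Error("Amazon runs require proxyConfig.apifyProxyGroups to include RESIDENTIAL");
  }
}

// Random delay in milliseconds within the 3-8s window; `random` is injectable for tests.
function randomDelayMs(random: () => number = Math.random): number {
  const minMs = 3000;
  const maxMs = 8000;
  return Math.floor(minMs + random() * (maxMs - minMs));
}
```

Failing fast on a missing RESIDENTIAL group avoids spending a run on requests that are likely to be blocked.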
Stateful Storage (KV)
- urlRegistry
- urlRegistryByDomain
- platformCache
- runCounter
- failedWebhooks
- deadWebhooks
- MODE_C_REDIRECTS
Webhook Reliability
- Failed sends buffered into failedWebhooks
- Retry on next run before dispatching new events
- Retry cap: 20 attempts
- Exceeded entries moved to deadWebhooks dead-letter queue
Run output includes webhook retry/dead-letter metrics.
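The bookkeeping for a failed delivery could be sketched as follows. The PendingWebhook shape and the recordFailedSend name are hypothetical, and the sketch assumes that an entry is dead-lettered as soon as its 20th attempt fails; the actual boundary may be off by one from this.

```typescript
interface PendingWebhook {
  id: string;
  payload: unknown;
  attempts: number; // failed delivery attempts so far
}

interface WebhookQueues {
  failedWebhooks: PendingWebhook[];
  deadWebhooks: PendingWebhook[];
}

const MAX_WEBHOOK_ATTEMPTS = 20;

// Route one failed delivery: back into the retry buffer, or into the dead-letter queue at the cap.
function recordFailedSend(queues: WebhookQueues, entry: PendingWebhook): WebhookQueues {
  const updated: PendingWebhook = { ...entry, attempts: entry.attempts + 1 };
  const remaining = queues.failedWebhooks.filter((w) => w.id !== entry.id);
  if (updated.attempts >= MAX_WEBHOOK_ATTEMPTS) {
    return { failedWebhooks: remaining, deadWebhooks: [...queues.deadWebhooks, updated] };
  }
  return { failedWebhooks: [...remaining, updated], deadWebhooks: queues.deadWebhooks };
}
```

Because each payload may be delivered more than once across retries, the receiving endpoint should treat the id as an idempotency key.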
Input Highlights
Core:
- mode, startUrls, domain, platformHint
- monitorSelectors, changeThreshold
- maxUrls, tier2Modulo, checkIntervalHours
- notificationWebhook
Mode B:
- categoryCrawlDepth, maxCategoryPages
Mode C:
- probeOnlyMode, probeConcurrency, probeDelayMs
- userPatterns
Amazon:
- proxyConfig
- amazonAsins, amazonSearchKeywords, amazonSearchCategoryId, amazonBrowseNodeId
Scale:
- checkConcurrency (parallel URL checks)
The domain input accepts either:
- host/domain, e.g. www.reserved.com
- full URL seed, e.g. https://www.reserved.com/pl/pl/kobieta
When a full URL seed is provided in discover mode, category crawl starts from that URL first.
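Distinguishing the two accepted forms can be done with a small normalizer. This is a sketch only; the DomainSeed shape and parseDomainInput name are hypothetical, and it assumes that a value counts as a full URL seed only when it starts with http:// or https://.

```typescript
interface DomainSeed {
  host: string;
  seedUrl: string | null; // set only when a full URL was given
}

function parseDomainInput(domain: string): DomainSeed {
  const value = domain.trim();
  if (/^https?:\/\//i.test(value)) {
    const url = new URL(value); // throws on malformed URLs
    return { host: url.hostname, seedUrl: url.href };
  }
  // Bare host: strip any accidental trailing slash or path.
  return { host: value.split("/")[0].toLowerCase(), seedUrl: null };
}
```

In discover mode, a non-null seedUrl would be the first URL handed to the category crawl, with the host used for sitemap and platform discovery.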
Local Development
Install/build/run:
npm install
npm run build
npm start
Run tests:
npm run test
Smoke inputs are in smoke-tests/.
Deployment Sanity Checklist
- npm run build passes.
- npm run test passes.
- Verify .actor/actor.json points to INPUT_SCHEMA.json and Dockerfile.
- For Amazon runs, set proxy group to RESIDENTIAL.
- Create Apify Scheduler task with interval >= 6h.
- Validate webhook endpoint availability and idempotency handling.
Cost Model Notes
- HTML fetch path (Cheerio/gotScraping) is the cheap default.
- JS rendering is more expensive and should be used only where needed.
- Tiering and modulo rotation are the primary cost controls at scale.
- Probe-first Mode C avoids expensive full checks for dead URLs.