
Sitemap Change Orchestrator
Pricing: Pay per usage

Monitor website sitemaps for new, updated, or removed URLs. The integration with Website Content Crawler (WCC) passes it only the relevant URLs, which keeps your web crawls efficient, targeted, and resource-optimized, and keeps your datasets fresh for any application.
Start URLs
startUrls
array (optional)
List of start URLs. These can be direct sitemap URLs, or website URLs on which sitemaps will be discovered if discoverSitemaps is enabled.
Discover sitemaps
discoverSitemaps
boolean (optional)
If enabled, the actor fetches each start URL's robots.txt and enqueues every sitemap URL it finds there. This is useful if you don't want to enter sitemap URLs directly. Note that this works only if the website has a robots.txt file that lists its sitemaps.
Default value of this property is true
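The discovery step relies on the `Sitemap:` directive of robots.txt. The sketch below illustrates that idea with Python's standard-library `urllib.robotparser` applied to an already-fetched robots.txt body; it is not the actor's own code, and the helper names `robots_url` and `sitemaps_from_robots` are hypothetical.

```python
from typing import List
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url(start_url: str) -> str:
    """Return the robots.txt location for the host of a start URL."""
    parts = urlsplit(start_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return the sitemap URLs declared by Sitemap: lines in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: lines are present.
    return parser.site_maps() or []


robots_txt = """User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/news-sitemap.xml
"""

print(robots_url("https://example.com/blog/post?id=1"))  # https://example.com/robots.txt
print(sitemaps_from_robots(robots_txt))
```

A website whose robots.txt has no `Sitemap:` lines yields an empty list here, which matches the limitation noted above.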
Change types
changeTypes
array (optional)
Which change types to include in the output.
Default value of this property is ["NEW","UPDATED"]
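This page does not say how the actor decides that a URL is new or updated. One plausible approach, assumed here only for illustration, compares the current sitemap entries with the previous snapshot, keyed by URL, and treats a changed `<lastmod>` value as an update. The `REMOVED` label and the `classify_changes` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical snapshot shape: page URL -> <lastmod> value (None when absent).
Snapshot = Dict[str, Optional[str]]


def classify_changes(previous: Snapshot, current: Snapshot,
                     change_types: List[str]) -> Dict[str, List[str]]:
    """Group URLs by change type and keep only the requested types."""
    changes: Dict[str, List[str]] = {"NEW": [], "UPDATED": [], "REMOVED": []}
    for url, lastmod in current.items():
        if url not in previous:
            changes["NEW"].append(url)
        elif lastmod != previous[url]:
            changes["UPDATED"].append(url)
    for url in previous:
        if url not in current:
            changes["REMOVED"].append(url)
    return {kind: sorted(urls) for kind, urls in changes.items() if kind in change_types}


previous = {"https://example.com/a": "2024-01-01", "https://example.com/b": "2024-01-01"}
current = {"https://example.com/a": "2024-02-01", "https://example.com/c": "2024-02-01"}

print(classify_changes(previous, current, ["NEW", "UPDATED"]))
# {'NEW': ['https://example.com/c'], 'UPDATED': ['https://example.com/a']}
```

Passing `["NEW", "UPDATED"]` mirrors the default changeTypes value.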
Snapshot key prefix
snapshotKeyPrefix
string (optional)
Prefix for the snapshot record key in the snapshots key-value store. Use different prefixes to keep the snapshots of different websites or projects separate.
Default value of this property is "DEFAULT"
URL filter regex
urlFilterRegex
string (optional)
Regex pattern to filter which URLs are included in the output and snapshot. This filter applies only to the final URLs and not to intermediate sitemap URLs.
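Because the filter applies only to page URLs, sitemap index files and child sitemaps are still followed even when they do not match. A minimal sketch, assuming `re.search` semantics (the pattern may match anywhere in the URL); whether the actor anchors the pattern differently is not stated, and `filter_final_urls` is a hypothetical helper.

```python
import re
from typing import List, Optional


def filter_final_urls(page_urls: List[str], url_filter_regex: Optional[str]) -> List[str]:
    """Keep page URLs that match the pattern; an empty pattern keeps everything."""
    if not url_filter_regex:
        return list(page_urls)
    pattern = re.compile(url_filter_regex)
    # re.search finds a match anywhere in the URL, not only at its start.
    return [url for url in page_urls if pattern.search(url)]


# Page URLs collected from the sitemaps; the sitemap files themselves are not filtered.
page_urls = [
    "https://example.com/blog/post-1",
    "https://example.com/shop/item-9",
    "https://example.com/blog/post-2",
]

print(filter_final_urls(page_urls, r"/blog/"))
# ['https://example.com/blog/post-1', 'https://example.com/blog/post-2']
```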
Add removed URLs to key-value store
addRemovedUrlsToKvs
boolean (optional)
If enabled, URLs that were removed since the previous snapshot are also written to the key-value store.
Default value of this property is false
Proxy configuration
proxyConfiguration
object (optional)
Proxy configuration used for crawling.
Default value of this property is {"useApifyProxy":true}
Memory
scdMemory
enum (optional)
Amount of memory (RAM), in megabytes, allocated to the Sitemap Change Detector run.
Value options (strings): "32768", "16384", "8192", "4096", "2048", "1024", "512"
Default value of this property is "4096"
Timeout
scdTimeout
integer (optional)
Timeout for the Sitemap Change Detector run, in seconds. A value of 0 means no timeout: the run continues until it completes and could potentially run indefinitely. Default is 360,000 seconds (100 hours).
Default value of this property is 360000
Search for sitemaps on start URLs from the WCC input
addWccUrlsToScd
boolean (optional)
If enabled, the start URLs from the Website Content Crawler input are treated as start URLs of this orchestrator, so sitemaps are also searched for on them.
Default value of this property is true
Maximum URLs per run
wccMaxUrlsPerRun
integer (optional)
Maximum number of URLs from the Sitemap Change Detector passed to a single Website Content Crawler run. When more URLs are detected, they are split across several WCC runs, and the default datasets of all runs are merged and output after every run completes.
Default value of this property is 50000
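With the default limit of 50,000, for example, 120,000 detected URLs would need ceil(120,000 / 50,000) = 3 Website Content Crawler runs. The sketch assumes consecutive, fixed-size batches; the actor's actual splitting logic is not documented here, and `split_into_runs` is a hypothetical helper.

```python
import math
from typing import List


def split_into_runs(urls: List[str], max_urls_per_run: int = 50_000) -> List[List[str]]:
    """Split detected URLs into consecutive batches, one batch per WCC run."""
    if max_urls_per_run <= 0:
        raise ValueError("max_urls_per_run must be positive")
    return [urls[i:i + max_urls_per_run] for i in range(0, len(urls), max_urls_per_run)]


urls = [f"https://example.com/page-{n}" for n in range(120_000)]
batches = split_into_runs(urls)

print(len(batches), [len(batch) for batch in batches])  # 3 [50000, 50000, 20000]
print(math.ceil(len(urls) / 50_000))  # 3
```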
Memory
wccMemory
enum (optional)
Amount of memory (RAM), in megabytes, allocated to each Website Content Crawler run.
Value options (strings): "32768", "16384", "8192", "4096", "2048", "1024", "512"
Default value of this property is "4096"
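For reference, a complete input object built from the property names documented on this page. Apart from startUrls, snapshotKeyPrefix and urlFilterRegex, which hold placeholder values, every property is set to its documented default. The `{"url": ...}` shape of startUrls items follows the usual Apify convention and is an assumption here. Note that both memory settings are strings, while scdTimeout and wccMaxUrlsPerRun are integers.

```python
import json

# Memory options listed for both scdMemory and wccMemory.
MEMORY_OPTIONS = {"512", "1024", "2048", "4096", "8192", "16384", "32768"}

run_input = {
    "startUrls": [{"url": "https://example.com"}],  # assumed {"url": ...} item shape
    "discoverSitemaps": True,
    "changeTypes": ["NEW", "UPDATED"],
    "snapshotKeyPrefix": "EXAMPLE-COM",  # placeholder prefix
    "urlFilterRegex": "/blog/",  # placeholder pattern
    "addRemovedUrlsToKvs": False,
    "proxyConfiguration": {"useApifyProxy": True},
    "scdMemory": "4096",
    "scdTimeout": 360000,  # seconds, i.e. 100 hours
    "addWccUrlsToScd": True,
    "wccMaxUrlsPerRun": 50000,
    "wccMemory": "4096",
}

# Reject memory values outside the documented options.
for key in ("scdMemory", "wccMemory"):
    if run_input[key] not in MEMORY_OPTIONS:
        raise ValueError(f"{key} must be one of {sorted(MEMORY_OPTIONS, key=int)}")

print(json.dumps(run_input, indent=2))
```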