Sitemap Change Orchestrator

Pricing

Pay per usage

Developed by

Tri⟁angle

Maintained by Apify

Monitor website sitemaps for new, updated, or removed URLs. Integration with the Website Content Crawler (WCC) passes only the changed URLs to the crawler, so each crawl stays targeted and uses fewer resources while your datasets stay fresh.

0.0 (0)

1

Total users

2

Monthly users

2

Runs succeeded

88%

Last modified

a day ago

Start URLs

startUrls (array, Optional)

List of start URLs to process. Each entry can be a direct sitemap URL or, if discoverSitemaps is enabled, a website URL on which sitemaps will be discovered.

Discover sitemaps

discoverSitemaps (boolean, Optional)

If enabled, the actor fetches the robots.txt of each start URL and enqueues every sitemap URL it finds there. This is useful if you don't want to enter direct sitemap URLs. Note that this only works if the website has a robots.txt file.

Default value of this property is true
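
As a rough illustration of the discovery step described above, the sketch below pulls `Sitemap:` entries out of a robots.txt body. The function name `sitemap_urls_from_robots` is hypothetical and this is not the actor's actual implementation; it only shows the general technique.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs named in `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        # Split only on the first colon so the URL's own "://" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content used only for this example.
example_robots = (
    "User-agent: *\n"
    "Disallow: /admin\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "sitemap: https://example.com/news.xml # news section\n"
)
found = sitemap_urls_from_robots(example_robots)
```

A site whose robots.txt has no `Sitemap:` lines yields an empty list, in which case direct sitemap URLs would have to be supplied in startUrls.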

Change types

changeTypes (array, Optional)

Which change types to include in the output.

Default value of this property is ["NEW","UPDATED"]
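
To make the change types concrete, here is a minimal sketch of how two sitemap snapshots could be compared. It assumes a snapshot is a mapping from URL to its `lastmod` value and that the third change type is named "REMOVED"; both are assumptions for illustration, as are the function names, and the actor's real comparison logic may differ.

```python
def diff_snapshots(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Classify URLs as NEW, UPDATED, or REMOVED between two snapshots."""
    changes: dict[str, list[str]] = {"NEW": [], "UPDATED": [], "REMOVED": []}
    for url, lastmod in current.items():
        if url not in previous:
            changes["NEW"].append(url)
        elif previous[url] != lastmod:
            # Assumption: a changed lastmod value marks the URL as updated.
            changes["UPDATED"].append(url)
    changes["REMOVED"] = [url for url in previous if url not in current]
    return changes


def select_changes(changes: dict[str, list[str]], change_types=("NEW", "UPDATED")) -> list[str]:
    """Keep only the URLs whose change type was requested."""
    return [url for change_type in change_types for url in changes.get(change_type, [])]


# Hypothetical snapshots used only for this example.
previous_snapshot = {
    "https://example.com/a": "2024-01-01",
    "https://example.com/b": "2024-01-01",
}
current_snapshot = {
    "https://example.com/a": "2024-02-01",
    "https://example.com/c": "2024-02-01",
}
changes = diff_snapshots(previous_snapshot, current_snapshot)
selected = select_changes(changes)
```

With the default change types, removed URLs are detected but left out of the selected list.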

Snapshot key prefix

snapshotKeyPrefix (string, Optional)

Prefix for the snapshot record key stored in the snapshots key-value store, to separate runs by website or project.

Default value of this property is "DEFAULT"

URL filter regex

urlFilterRegex (string, Optional)

Regex pattern to filter which URLs are included in the output and snapshot. This filter applies only to the final URLs and not to intermediate sitemap URLs.

Add removed URLs to key-value store

addRemovedUrlsToKvs (boolean, Optional)

If enabled, the actor also saves to the key-value store the URLs that were removed since the previous snapshot.

Default value of this property is false

Proxy configuration

proxyConfiguration (object, Optional)

Proxy configuration used for crawling.

Default value of this property is {"useApifyProxy":true}

Memory

scdMemory (Enum, Optional)

Amount of memory (RAM) allocated to the actor run in megabytes.

Value options:

"32768", "16384", "8192", "4096", "2048", "1024", "512" (each a string)

Default value of this property is "4096"

Timeout

scdTimeout (integer, Optional)

Timeout for the actor run in seconds. A value of zero means there is no timeout, and the Actor runs until it completes, possibly indefinitely. The default is 360,000 seconds (100 hours).

Default value of this property is 360000

Input

wccInput (object, Required)

Input JSON for the Website Content Crawler actor.

Search for sitemaps on start URLs from the WCC input

addWccUrlsToScd (boolean, Optional)

If enabled, start URLs from Website Content Crawler will be treated as start URLs defined within this orchestrator.

Default value of this property is true

Maximum URLs per run

wccMaxUrlsPerRun (integer, Optional)

Maximum number of URLs from the Sitemap Change Detector to pass to a single Website Content Crawler run. Note that the default datasets of all runs are merged and output after every run completes.

Default value of this property is 50000
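
As a sketch of the batching this setting implies, the snippet below splits a list of changed URLs into per-run chunks. The function name `split_into_runs` is hypothetical and the actor may schedule its runs differently; only the 50,000 default comes from the description above.

```python
def split_into_runs(urls: list[str], max_urls_per_run: int = 50000) -> list[list[str]]:
    """Split the changed URLs into batches, one batch per crawler run."""
    if max_urls_per_run < 1:
        raise ValueError("max_urls_per_run must be at least 1")
    return [
        urls[start:start + max_urls_per_run]
        for start in range(0, len(urls), max_urls_per_run)
    ]


# Hypothetical list of 120,001 changed URLs used only for this example.
changed_urls = [f"https://example.com/page-{i}" for i in range(120_001)]
batches = split_into_runs(changed_urls)
batch_sizes = [len(batch) for batch in batches]
```

With the default limit, 120,001 URLs would need three crawler runs, the last one holding the remaining 20,001 URLs.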

Memory

wccMemory (Enum, Optional)

Amount of memory (RAM) allocated to the actor run in megabytes.

Value options:

"32768", "16384", "8192", "4096", "2048", "1024", "512" (each a string)

Default value of this property is "4096"

Timeout

wccTimeout (integer, Optional)

Timeout for the actor run in seconds. A value of zero means there is no timeout, and the Actor runs until it completes, possibly indefinitely. The default is 360,000 seconds (100 hours).

Default value of this property is 360000

Skip Website Content Crawler

skipWcc (boolean, Optional)

If checked, the Website Content Crawler won't run after detecting sitemap changes. This is useful if you want to only initialize the sitemap snapshot without scraping the URLs.

Default value of this property is false
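
Putting the fields together, the sketch below builds an input object as a Python dictionary and serializes it to JSON. The field names and defaults come from the descriptions above; the example.com URL, the "EXAMPLE_SITE" prefix, the regex, and the shape of the `startUrls` entries inside `wccInput` are assumptions for illustration only.

```python
import json

# Assumed input for a first run that only initializes the snapshot.
orchestrator_input = {
    "discoverSitemaps": True,              # look up sitemaps via robots.txt
    "changeTypes": ["NEW", "UPDATED"],     # documented default
    "snapshotKeyPrefix": "EXAMPLE_SITE",   # hypothetical prefix for this project
    "urlFilterRegex": "/blog/",            # hypothetical filter
    "addRemovedUrlsToKvs": False,
    "proxyConfiguration": {"useApifyProxy": True},
    "scdMemory": "4096",                   # enum values are strings
    "scdTimeout": 360000,
    "wccInput": {
        # Assumption: the crawler accepts start URLs as objects with a "url" key.
        "startUrls": [{"url": "https://example.com"}],
    },
    "addWccUrlsToScd": True,               # reuse the crawler's start URLs here
    "wccMaxUrlsPerRun": 50000,
    "wccMemory": "4096",
    "wccTimeout": 360000,
    "skipWcc": True,                       # take the snapshot without crawling
}

payload = json.dumps(orchestrator_input, indent=2)
```

Setting `skipWcc` back to false on later runs would let the crawler process the URLs that changed since this initial snapshot.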