Updated Content Checker

Pricing

from $0.10 / 1,000 results

Monitors sitemaps for new/updated content. Returns only URLs modified since a specified date for efficient incremental scraping.

Rating: 0.0 (0)

Developer: Tomáš Gabík
Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago

Apify Actor that monitors sitemaps for new/updated content. Returns only URLs modified since a specified date, enabling efficient incremental scraping with tools like Website Content Crawler (WCC).

Features

  • Parses regular sitemaps and sitemap indexes
  • Relative date filter: "7 days", "2 weeks", "1 month", "24 hours"
  • Absolute date filter: ISO 8601 dates like "2026-01-15"
  • Optional regex pattern filtering for URLs
  • Persists last check timestamp for automatic incremental checks
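The core filtering step can be sketched in Python as follows. This is a minimal illustration, not the Actor's actual implementation: the function names `parse_lastmod` and `updated_urls` are hypothetical, entries without a `<lastmod>` are assumed to be skipped, and the comparison is assumed to be inclusive (`>=`).

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text):
    """Parse a <lastmod> value; values without a timezone are treated as UTC."""
    # Replace a trailing "Z" so older Python versions can parse the value.
    parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def updated_urls(sitemap_xml, cutoff):
    """Return dataset-style items for URLs whose lastmod is at or after cutoff."""
    root = ET.fromstring(sitemap_xml)
    results = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if not loc or not lastmod:
            continue  # assumption: entries without a date cannot be compared
        if parse_lastmod(lastmod) >= cutoff:
            results.append({"url": loc.strip(), "lastModified": lastmod.strip()})
    return results
```

A sitemap index uses `<sitemapindex>` and `<sitemap>` elements instead of `<urlset>` and `<url>`, so handling it would mean fetching each child sitemap and applying the same function; that part is left out of this sketch.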

Input

Field           Type     Required  Description
sitemapUrl      string   Yes       URL of the sitemap.xml file
newerThan       string   No        Relative filter: "7 days", "2 weeks", "1 month", "24 hours"
newerThanDate   string   No        Absolute date filter (ISO 8601)
urlPattern      string   No        Regex pattern to filter URLs
storeLastCheck  boolean  No        Store timestamp for future runs (default: false)
storeName       string   No        Key-value store name for persistence

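Priority: when more than one cutoff source is available, newerThan takes precedence over newerThanDate, which in turn takes precedence over the stored date.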
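The priority order can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the helper names `parse_relative` and `resolve_cutoff`, the treatment of "1 month" as 30 days, and the source labels other than "relative time: ..." (which appears in the summary output) are hypothetical and may differ from what the Actor does.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: a month is approximated as 30 days.
_UNITS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}


def parse_relative(text):
    """Turn a string such as "7 days" or "24 hours" into a timedelta."""
    match = re.fullmatch(r"\s*(\d+)\s*(hour|day|week|month)s?\s*", text.lower())
    if not match:
        raise ValueError(f"Unsupported relative filter: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def resolve_cutoff(newer_than=None, newer_than_date=None, stored=None, now=None):
    """Pick the cutoff using newerThan > newerThanDate > stored date."""
    now = now or datetime.now(timezone.utc)
    if newer_than:
        return now - parse_relative(newer_than), f"relative time: {newer_than}"
    if newer_than_date:
        parsed = datetime.fromisoformat(newer_than_date.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed, "absolute date"  # hypothetical label
    if stored:
        return stored, "stored date"  # hypothetical label
    return None, "none"  # no filter available
```

With both `newerThan` and `newerThanDate` supplied, this sketch ignores the absolute date and uses the relative filter, matching the stated priority.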

Output

Dataset

{
  "url": "https://example.com/article",
  "lastModified": "2026-01-20T15:59:22Z"
}

OUTPUT (Summary)

{
  "sitemapUrl": "https://example.com/sitemap.xml",
  "totalUrlsInSitemap": 500,
  "filteredUrlCount": 500,
  "updatedUrlCount": 10,
  "cutoffDate": "2026-01-13T00:00:00Z",
  "cutoffSource": "relative time: 7 days",
  "checkedAt": "2026-01-20T16:00:00Z"
}

Usage Examples

Get URLs updated in the last 7 days

{
  "sitemapUrl": "https://help.wealthsimple.com/hc/sitemap.xml",
  "newerThan": "7 days"
}

Get URLs updated since a specific date

{
  "sitemapUrl": "https://help.wealthsimple.com/hc/sitemap.xml",
  "newerThanDate": "2026-01-15"
}

Return only English articles

{
  "sitemapUrl": "https://help.wealthsimple.com/hc/sitemap.xml",
  "newerThan": "7 days",
  "urlPattern": "/en-ca/articles/"
}
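A pattern such as "/en-ca/articles/" only makes sense if it is matched anywhere in the URL rather than from its start. The Python sketch below shows that behavior, assuming unanchored search semantics; the function name `filter_urls` and the sample URLs are illustrative and not taken from the Actor.

```python
import re


def filter_urls(urls, url_pattern):
    """Keep URLs in which the regex matches anywhere (unanchored search)."""
    compiled = re.compile(url_pattern)
    return [url for url in urls if compiled.search(url)]


# Illustrative URLs only.
sample = [
    "https://example.com/hc/en-ca/articles/123",
    "https://example.com/hc/fr-ca/articles/123",
    "https://example.com/hc/en-ca/categories/9",
]
english_articles = filter_urls(sample, "/en-ca/articles/")
```

If the Actor instead anchored the pattern at the start of the URL, the pattern would need a leading ".*" to match the same entries.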

Integration with Website Content Crawler

Use the output dataset as input for WCC to scrape only updated pages:

  1. Run this Actor to get updated URLs
  2. Pass the dataset URLs to Website Content Crawler
  3. Only changed content gets scraped
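Step 2 amounts to mapping the dataset items onto the crawler's input. The Python sketch below builds such an input from items shaped like the Dataset example above. The helper name `build_wcc_input` is hypothetical, and the `startUrls` and `maxCrawlDepth` field names are assumptions about Website Content Crawler's input schema that should be checked against its documentation before use.

```python
def build_wcc_input(dataset_items):
    """Build a crawler input that lists each updated URL once."""
    seen = set()
    start_urls = []
    for item in dataset_items:
        url = item.get("url")
        if url and url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    return {
        "startUrls": start_urls,  # assumed WCC field name
        "maxCrawlDepth": 0,  # assumed WCC field: do not follow links from these pages
    }


# Items shaped like this Actor's dataset output.
items = [
    {"url": "https://example.com/article", "lastModified": "2026-01-20T15:59:22Z"},
    {"url": "https://example.com/article", "lastModified": "2026-01-20T15:59:22Z"},
]
wcc_input = build_wcc_input(items)
```

Setting the crawl depth to zero is a design choice for this sketch: it keeps the crawl limited to the pages reported as updated instead of letting it follow links to unchanged content.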