iRozhlas Sitemap Discovery

Pricing

Pay per usage

Discovers historical article URLs from iROZHLAS.cz using their official sitemaps.

Developer

Jakub Kopecký

Maintained by Community

What it does

  • Reads the public sitemap index
  • Downloads sitemap-N.xml files (the active numbered ones)
  • Keeps only URLs under zpravy-domov/* and zpravy-svet/* by default (configurable via pathPrefixes)
  • Extracts publication date directly from the URL slug (no page fetch needed)
  • Applies optional date range filter
  • Returns clean { url, date } items with proper ISO dates
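
The prefix filter and the slug-based date extraction can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: the helper names (slugToIsoDate, toItem) are hypothetical, and the slug layout _YYMMDDHHMM_ with a Europe/Prague wall-clock time is an assumption inferred from the sample output URL.

```javascript
// Assumption: the slug ends in _YYMMDDHHMM_<suffix>, as in the sample output URL.
const SLUG_STAMP = /_(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})_[a-z0-9]+$/i;

const pad = (n) => String(n).padStart(2, '0');

// Offset of Europe/Prague from UTC, in minutes, at a given UTC instant.
function pragueOffsetMinutes(utcMs) {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'Europe/Prague',
    hourCycle: 'h23',
    year: 'numeric', month: '2-digit', day: '2-digit',
    hour: '2-digit', minute: '2-digit', second: '2-digit',
  }).formatToParts(new Date(utcMs));
  const p = Object.fromEntries(parts.map(({ type, value }) => [type, Number(value)]));
  const wallAsUtc = Date.UTC(p.year, p.month - 1, p.day, p.hour, p.minute, p.second);
  return Math.round((wallAsUtc - utcMs) / 60000);
}

// Hypothetical helper: turns the timestamp in the slug into an ISO 8601 string.
function slugToIsoDate(url) {
  const m = SLUG_STAMP.exec(new URL(url).pathname);
  if (!m) return null;
  const [yy, mo, dd, hh, mi] = m.slice(1).map(Number);
  const year = 2000 + yy; // assumption: all articles are from 20YY
  const wallAsUtc = Date.UTC(year, mo - 1, dd, hh, mi);
  // Two passes, so the offset is evaluated at the real instant rather than the wall-clock guess.
  const guess = pragueOffsetMinutes(wallAsUtc);
  const offset = pragueOffsetMinutes(wallAsUtc - guess * 60000);
  const sign = offset < 0 ? '-' : '+';
  const abs = Math.abs(offset);
  const tz = `${sign}${pad(Math.floor(abs / 60))}:${pad(abs % 60)}`;
  return `${year}-${pad(mo)}-${pad(dd)}T${pad(hh)}:${pad(mi)}:00${tz}`;
}

// Hypothetical helper: applies the prefix filter and builds a { url, date } item.
function toItem(url, pathPrefixes) {
  const section = new URL(url).pathname.split('/')[1];
  if (!pathPrefixes.includes(section)) return null;
  const date = slugToIsoDate(url);
  return date ? { url, date } : null;
}
```

Because the date comes from the URL alone, no article page has to be fetched to build an item.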

Smart caching

Uses a persistent Apify Key-Value Store (irozhlas-sitemap-cache).

Cache keys are namespaced by your pathPrefixes (e.g. sitemap-7.xml--zpravy-domov+zpravy-svet--urls). This means:

  • Changing pathPrefixes in the future automatically gives you fresh caches for the new combination.
  • Historical sitemap-N.xml files are downloaded and filtered once per prefix set, then reused indefinitely for any date range.
  • Only the current highest-numbered sitemap (sitemap-7.xml today, later sitemap-8.xml, etc.) is re-checked on every run, using a cheap HEAD request and its Last-Modified header.
  • Date filtering is applied only at output time.
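
As an illustration of the namespacing, a key in the shape shown above could be built like this. The function name cacheKey is hypothetical, and sorting and de-duplicating the prefixes is an assumption made here so that prefix order does not create a second cache; the Actor may build its keys differently.

```javascript
// Hypothetical helper: builds a cache key such as
// "sitemap-7.xml--zpravy-domov+zpravy-svet--urls".
function cacheKey(sitemapName, pathPrefixes, kind = 'urls') {
  // Assumption: the prefix set is order-independent, so sort and de-duplicate it.
  const prefixSet = [...new Set(pathPrefixes)].sort().join('+');
  return `${sitemapName}--${prefixSet}--${kind}`;
}
```

Since the date range is not part of the key, one cached URL list can serve any dateFrom/dateTo combination.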

When a new higher sitemap appears (e.g. sitemap-8), the previous latest is automatically demoted to "historical" and its cache is reused without re-downloading.
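
The latest-versus-historical decision might look like the sketch below. All names (classifySitemaps, needsDownload, fetchLastModified) and the shape of the cached record are hypothetical; only the HEAD + Last-Modified idea comes from the description above. It assumes Node 18+ for the global fetch.

```javascript
// Split sitemap file names into the highest-numbered one and the rest.
function classifySitemaps(names) {
  const numbered = names
    .map((name) => ({ name, n: Number((/^sitemap-(\d+)\.xml$/.exec(name) || [])[1]) }))
    .filter(({ n }) => Number.isInteger(n))
    .sort((a, b) => a.n - b.n);
  const latest = numbered.length ? numbered[numbered.length - 1].name : null;
  return { latest, historical: numbered.slice(0, -1).map(({ name }) => name) };
}

// Decide whether a sitemap must be downloaded again.
// `cached` is a hypothetical record like { lastModified: '...' }, or undefined.
function needsDownload(name, latest, cached, lastModifiedHeader) {
  if (!cached) return true;               // never seen: download once
  if (name !== latest) return false;      // historical: reuse the cache
  if (!lastModifiedHeader) return true;   // no header to compare: play safe
  return cached.lastModified !== lastModifiedHeader; // latest: refresh on change
}

// HEAD request for the latest sitemap; only headers are transferred.
async function fetchLastModified(url) {
  const res = await fetch(url, { method: 'HEAD' });
  return res.headers.get('last-modified');
}
```

When sitemap-8.xml appears, classifySitemaps simply reports sitemap-7.xml as historical, so its existing cache entry is reused without any extra bookkeeping.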

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| pathPrefixes | array | ["zpravy-domov", "zpravy-svet"] | URL path prefixes to keep |
| dateFrom | string | | YYYY-MM-DD (inclusive) |
| dateTo | string | | YYYY-MM-DD (inclusive) |
| proxyConfiguration | object | Residential SK | Recommended |
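
The inclusive date bounds can be illustrated with a small filter. This is a sketch under assumptions: filterByDate is a hypothetical name, and it compares dateFrom and dateTo against the local calendar date in each item's ISO timestamp (its first ten characters), which may differ from how the Actor handles time zones.

```javascript
// Keep items whose calendar date lies in [dateFrom, dateTo]; both bounds are optional.
function filterByDate(items, { dateFrom, dateTo } = {}) {
  return items.filter(({ date }) => {
    const day = date.slice(0, 10); // YYYY-MM-DD part of the ISO string
    if (dateFrom && day < dateFrom) return false;
    if (dateTo && day > dateTo) return false;
    return true;
  });
}
```

Plain string comparison is enough here because YYYY-MM-DD strings sort in chronological order.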

Output

One dataset item per matching article:

{
    "url": "https://www.irozhlas.cz/zpravy-domov/ted-nema-duvod-utect-ale-pozdeji-ano-vysvetluje-soud-proc-nechal-persana-zadeha_1707020600_ogo",
    "date": "2017-07-02T06:00:00+02:00"
}

Usage (from a downstream pipeline)

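import { ApifyClient } from 'apify-client';

// Assumes the API token is provided in the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
// Start the Actor and wait for the run to finish.
const run = await client.actor('jakub.kopecky/irozhlas-discovery').call({
    pathPrefixes: ['zpravy-domov', 'zpravy-svet'],
    dateFrom: '2026-05-01',
    dateTo: '2026-05-31',
});
// Read the { url, date } items from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();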

Caching & Apify KV Store

This Actor uses a persistent named Key-Value Store (irozhlas-sitemap-cache).

Important: Apify restricts what a Key-Value Store key may contain:

  • Allowed characters: a-zA-Z0-9!-_.'()
  • Max length: 256 characters

All cache keys are sanitized at runtime so that an invalid key cannot crash a run. Keys are namespaced by pathPrefixes, so changing the prefixes automatically starts a fresh cache.
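
A sanitizer along these lines would satisfy both limits. This is a hedged sketch, not the Actor's code: sanitizeKey is a hypothetical name, the character list above is read literally (so a character such as + counts as disallowed), and replacing disallowed characters with an underscore is an arbitrary choice.

```javascript
const MAX_KEY_LENGTH = 256;

// Hypothetical helper: makes a string safe to use as a Key-Value Store key.
function sanitizeKey(rawKey) {
  // Replace anything outside the allowed set with an underscore.
  const cleaned = rawKey.replace(/[^a-zA-Z0-9!\-_.'()]/g, '_');
  // Truncate to the maximum key length.
  return cleaned.slice(0, MAX_KEY_LENGTH);
}
```

Plain truncation can make two long keys collide; appending a short hash of the original key before cutting would be one way to avoid that.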

Deploy

cd actors/irozhlas-discovery
pnpm install
apify push

Then use the returned Actor ID (e.g. jakub.kopecky/irozhlas-discovery).