iRozhlas Sitemap Discovery
Pricing
Pay per usage
Discovers historical article URLs from iROZHLAS.cz using their official sitemaps.
Developer
Jakub Kopecký
What it does
- Reads the public sitemap index
- Downloads `sitemap-N.xml` files (the active numbered ones)
- Filters to only `zpravy-domov/*` and `zpravy-svet/*` (configurable)
- Extracts publication date directly from the URL slug (no page fetch needed)
- Applies optional date range filter
- Returns clean `{ url, date }` items with proper ISO dates
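The slug-based date extraction can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: it assumes the slug ends in a `_YYMMDDHHMM_<id>` token (as the sample URL in the Output section suggests), that the two-digit year belongs to the 2000s, and that times are local to `Europe/Prague`. The helper names `extractDateFromUrl` and `pragueOffset` are hypothetical.

```javascript
// Hypothetical helper: look up the UTC offset of Europe/Prague around a local time.
// Assumption: the local time is treated as UTC for the lookup, so results within a
// couple of hours of a DST switch may be off by one hour.
function pragueOffset(localIso) {
  const approx = new Date(`${localIso}Z`);
  if (Number.isNaN(approx.getTime())) return null;
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone: 'Europe/Prague',
    timeZoneName: 'longOffset', // yields e.g. "GMT+02:00"
  }).formatToParts(approx);
  const name = parts.find((p) => p.type === 'timeZoneName').value;
  return name.replace('GMT', '') || '+00:00';
}

// Hypothetical helper: read the `_YYMMDDHHMM_<id>` token at the end of the slug.
// Assumption: all articles were published in the 2000s, hence the "20" prefix.
function extractDateFromUrl(url) {
  const match = /_(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})_[a-z0-9]+$/i.exec(url);
  if (!match) return null;
  const [, yy, mm, dd, hh, min] = match;
  const local = `20${yy}-${mm}-${dd}T${hh}:${min}:00`;
  const offset = pragueOffset(local);
  return offset === null ? null : `${local}${offset}`;
}

console.log(
  extractDateFromUrl(
    'https://www.irozhlas.cz/zpravy-domov/ted-nema-duvod-utect-ale-pozdeji-ano-vysvetluje-soud-proc-nechal-persana-zadeha_1707020600_ogo',
  ),
);
```

URLs without a matching token return `null`, so a caller could skip them instead of guessing a date.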
Smart caching
Uses a persistent Apify Key-Value Store (irozhlas-sitemap-cache).
Cache keys are namespaced by your pathPrefixes (e.g. sitemap-7.xml--zpravy-domov+zpravy-svet--urls).
This means:
- Changing `pathPrefixes` in the future automatically gives you fresh caches for the new combination.
- Historical `sitemap-N.xml` files are downloaded + filtered once (per prefix set) and then reused forever for any date range.
- Only the current highest-numbered sitemap (`sitemap-7.xml` today, later `sitemap-8.xml`, etc.) performs a cheap `HEAD` + `Last-Modified` check on every run.
- Date filtering is applied only at output time.
When a new higher sitemap appears (e.g. sitemap-8), the previous latest is automatically demoted to "historical" and its cache is reused without re-downloading.
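The caching rules above could be expressed along these lines. This is a sketch under assumptions, not the Actor's implementation: `buildCacheKey` and `planRefresh` are hypothetical names, the key layout simply mirrors the example key shown earlier, and prefixes are joined in the order given (no sorting).

```javascript
// Hypothetical: build a cache key namespaced by the prefix set,
// e.g. "sitemap-7.xml--zpravy-domov+zpravy-svet--urls".
function buildCacheKey(sitemapName, pathPrefixes) {
  return `${sitemapName}--${pathPrefixes.join('+')}--urls`;
}

// Hypothetical: decide what to do with each sitemap file on a run.
// - the highest-numbered sitemap always gets a Last-Modified check
// - historical sitemaps reuse their cache if one exists for this prefix set
// - anything else is downloaded and filtered once
function planRefresh(sitemapNames, pathPrefixes, cachedKeys) {
  const numberOf = (name) => Number(/sitemap-(\d+)\.xml$/.exec(name)[1]);
  const latest = Math.max(...sitemapNames.map(numberOf));
  return sitemapNames.map((name) => {
    const key = buildCacheKey(name, pathPrefixes);
    let action;
    if (numberOf(name) === latest) action = 'check-last-modified';
    else if (cachedKeys.has(key)) action = 'reuse-cache';
    else action = 'download';
    return { name, key, action };
  });
}

const prefixes = ['zpravy-domov', 'zpravy-svet'];
const cached = new Set([buildCacheKey('sitemap-7.xml', prefixes)]);
console.log(planRefresh(['sitemap-6.xml', 'sitemap-7.xml', 'sitemap-8.xml'], prefixes, cached));
```

In this example `sitemap-7.xml` was the latest file on an earlier run; once `sitemap-8.xml` appears, it falls into the `reuse-cache` branch without any extra bookkeeping.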
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `pathPrefixes` | array | `["zpravy-domov", "zpravy-svet"]` | URL path prefixes to keep |
| `dateFrom` | string | none | `YYYY-MM-DD` (inclusive) |
| `dateTo` | string | none | `YYYY-MM-DD` (inclusive) |
| `proxyConfiguration` | object | Residential SK | Recommended |
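To make the semantics of `pathPrefixes`, `dateFrom` and `dateTo` concrete, here is a small sketch of how such filters might behave. The function names are hypothetical, and two things are assumptions: that a prefix is matched against the first path segment of the URL, and that the inclusive date comparison uses the calendar day as written in the item's ISO date string.

```javascript
// Hypothetical: keep a URL only if its first path segment is one of the prefixes.
function matchesPrefix(url, pathPrefixes) {
  const firstSegment = new URL(url).pathname.split('/')[1];
  return pathPrefixes.includes(firstSegment);
}

// Hypothetical: inclusive YYYY-MM-DD range check; either bound may be omitted.
// ISO dates in YYYY-MM-DD form compare correctly as plain strings.
function inDateRange(isoDate, dateFrom, dateTo) {
  const day = isoDate.slice(0, 10);
  if (dateFrom && day < dateFrom) return false;
  if (dateTo && day > dateTo) return false;
  return true;
}

const item = {
  url: 'https://www.irozhlas.cz/zpravy-domov/example_1707020600_ogo',
  date: '2017-07-02T06:00:00+02:00',
};
console.log(matchesPrefix(item.url, ['zpravy-domov', 'zpravy-svet']));
console.log(inDateRange(item.date, '2017-07-01', '2017-07-31'));
```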
Output
One dataset item per matching article:
```json
{
  "url": "https://www.irozhlas.cz/zpravy-domov/ted-nema-duvod-utect-ale-pozdeji-ano-vysvetluje-soud-proc-nechal-persana-zadeha_1707020600_ogo",
  "date": "2017-07-02T06:00:00+02:00"
}
```
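Usage (from a downstream pipeline)
```javascript
import { ApifyClient } from 'apify-client';

// Assumes an Apify API token is available in the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the Actor and wait for the run to finish.
const run = await client.actor('jakub.kopecky/irozhlas-discovery').call({
    pathPrefixes: ['zpravy-domov', 'zpravy-svet'],
    dateFrom: '2026-05-01',
    dateTo: '2026-05-31',
});

// Read the discovered { url, date } items from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
```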
Caching & Apify KV Store
This Actor uses a persistent named Key-Value Store (irozhlas-sitemap-cache).
Important: Apify KV Store keys are extremely restrictive:
- Allowed characters: `a-zA-Z0-9!-_.'()`
- Max length: 256 characters
All cache keys are aggressively sanitized at runtime to prevent crashes.
Keys are namespaced by pathPrefixes so changing them gives you fresh caches automatically.
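A sanitizer for these constraints might look like the sketch below. It is illustrative only: `sanitizeKey` is a hypothetical name, and replacing disallowed characters with `-` and truncating at 256 characters are assumptions about how the sanitization could work, not a description of the Actor's code.

```javascript
const MAX_KEY_LENGTH = 256;

// Hypothetical: replace every character outside a-zA-Z0-9!-_.'() with "-"
// and cut the result to the maximum key length.
// Note: truncation can make two long keys collide; a real implementation
// might append a short hash instead.
function sanitizeKey(rawKey) {
  return rawKey.replace(/[^a-zA-Z0-9!\-_.'()]/g, '-').slice(0, MAX_KEY_LENGTH);
}

console.log(sanitizeKey('sitemap-7.xml--zpravy-domov+zpravy-svet--urls'));
```

With this scheme, the `+` separator between prefixes falls outside the allowed set and is replaced before the key is written to the store.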
Deploy
```shell
cd actors/irozhlas-discovery
pnpm install
apify push
```
Then use the returned Actor ID (e.g. jakub.kopecky/irozhlas-discovery).