iRozhlas Article Extractor
Pricing
Pay per usage
Extracts clean article body text from irozhlas.cz news articles. Give it a list of article URLs and it returns the paragraph content - no navigation, no sidebar, no metadata.
Developer
Jakub Kopecký
Maintained by Community
What it does
Fetches each article URL, isolates the main <article> body, and extracts
paragraph text. The output is plain text — paragraphs joined by double
newlines, ready for LLM processing, summarization, or content analysis.
How it works
- Fetches each URL through the Apify proxy (residential proxies supported).
- Parses the HTML with a fast CSS selector: `article[role="article"] .col--main p:not(.meta--right)`.
- Skips metadata, author lines, and UI chrome by targeting only the content column.

⚡ Single HTTP request per article: no headless browser, no recursive crawl.
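
To make the selector concrete, here is a minimal sketch of what it matches, written with only the Python standard library. It is not the actor's own code: the `ArticleParagraphs` class, the `extract_text` helper, and the sample markup are hypothetical, and the hand-rolled matching only approximates the CSS selector on reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ArticleParagraphs(HTMLParser):
    """Approximates: article[role="article"] .col--main p:not(.meta--right)"""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # (tag, role, classes) for each open element
        self.paragraphs = []  # finished paragraph strings
        self._buffer = None   # text parts of the <p> currently being captured
        self._depth = 0       # stack depth at which that <p> was opened

    def _in_content_column(self):
        # True if an open .col--main sits below an open article[role="article"].
        seen_article = False
        for tag, role, classes in self.stack:
            if seen_article and "col--main" in classes:
                return True
            if tag == "article" and role == "article":
                seen_article = True
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self._buffer is not None:
                self._buffer.append(" ")  # keep words around <br> apart
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if (tag == "p" and self._buffer is None
                and "meta--right" not in classes
                and self._in_content_column()):
            self._buffer = []
            self._depth = len(self.stack)
        self.stack.append((tag, attrs.get("role"), classes))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self._buffer is not None and len(self.stack) <= self._depth:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_text(html):
    """Return matched paragraphs joined by double newlines."""
    parser = ArticleParagraphs()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


# Hypothetical markup shaped like the structure the selector expects.
SAMPLE = """
<nav><p>Menu</p></nav>
<article role="article">
  <div class="col--main">
    <p class="meta--right">Author line</p>
    <p>First <b>paragraph</b>.</p>
    <p>Second paragraph.<br>Continued.</p>
  </div>
  <aside><p>Sidebar</p></aside>
</article>
"""

print(extract_text(SAMPLE))
```

On the sample, only the two body paragraphs survive: the navigation, the `meta--right` author line, and the sidebar are all outside what the selector targets.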
Input
| Field | Type | Description |
|---|---|---|
| `startUrls` | array | List of irozhlas.cz article URLs. |
| `selector` | string | CSS selector for article body (default targets the main content). |
| `proxyConfiguration` | object | Apify proxy settings (residential recommended). |
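
As an illustration, an input object could be assembled like this. The field names come from the table; everything else is an assumption: the `build_input` helper is hypothetical, the `{"url": ...}` entry shape and the `useApifyProxy` / `apifyProxyGroups` keys follow common Apify conventions rather than anything stated here, so check the actor's input schema before relying on them.

```python
from urllib.parse import urlparse


def build_input(urls, selector=None, residential=True):
    """Build an input dict, keeping only URLs hosted on irozhlas.cz."""
    accepted = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == "irozhlas.cz" or host.endswith(".irozhlas.cz"):
            accepted.append(url)

    run_input = {
        # Assumption: entries are {"url": ...} objects (common Apify convention).
        "startUrls": [{"url": u} for u in accepted],
        # Assumption: standard Apify proxy input keys.
        "proxyConfiguration": {"useApifyProxy": True},
    }
    if residential:
        run_input["proxyConfiguration"]["apifyProxyGroups"] = ["RESIDENTIAL"]
    if selector is not None:
        run_input["selector"] = selector  # omit to keep the actor's default
    return run_input


example = build_input([
    "https://www.irozhlas.cz/zpravy-svet/...",  # placeholder path, as in the output example
    "https://example.com/not-an-article",       # dropped: wrong host
])
print(example)
```

Filtering by hostname up front avoids spending a request on URLs the default selector was never meant to handle.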
Output
One dataset item per URL:
{"url": "https://www.irozhlas.cz/zpravy-svet/...","text": "first paragraph\n\nsecond paragraph","status": "ok"}
`status` can be:
- ✅ `ok`: content extracted
- ∅ `empty`: selector matched nothing
- ✗ `error`: fetch failed or HTTP status ≥ 400
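
On the consuming side, the three statuses suggest three different follow-ups. The sketch below assumes dataset items have already been loaded as Python dicts with the `url`, `text`, and `status` keys shown above; the `split_results` helper and the sample items are hypothetical.

```python
def split_results(items):
    """Separate usable articles from items needing a new selector or a retry."""
    articles, empty, failed = {}, [], []
    for item in items:
        status = item.get("status")
        if status == "ok":
            # Paragraphs are joined by double newlines, so split on them.
            articles[item["url"]] = item["text"].split("\n\n")
        elif status == "empty":
            empty.append(item["url"])   # selector matched nothing
        else:
            failed.append(item["url"])  # fetch failed or HTTP >= 400
    return articles, empty, failed


# Placeholder items in the documented output shape.
items = [
    {"url": "https://www.irozhlas.cz/zpravy-svet/...",
     "text": "first paragraph\n\nsecond paragraph", "status": "ok"},
    {"url": "https://www.irozhlas.cz/a", "text": "", "status": "empty"},
    {"url": "https://www.irozhlas.cz/b", "text": "", "status": "error"},
]

articles, empty, failed = split_results(items)
print(articles)
print(empty, failed)
```

Items in `empty` are candidates for a custom `selector`, while items in `failed` are more likely worth retrying with different proxy settings.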