iRozhlas Article Extractor

Pricing: Pay per usage

Extracts clean article body text from irozhlas.cz news articles. Give it a list of article URLs and it returns the paragraph content - no navigation, no sidebar, no metadata.

Developer: Jakub Kopecký (maintained by Community)

Last modified: 3 days ago

What it does

Fetches each article URL, isolates the main <article> body, and extracts paragraph text. The output is plain text — paragraphs joined by double newlines, ready for LLM processing, summarization, or content analysis.

How it works

  • Fetches each URL through the Apify proxy (residential proxies supported).
  • Parses the HTML with a fast CSS selector: article[role="article"] .col--main p:not(.meta--right).
  • Skips metadata, author lines, and UI chrome by targeting only the content column.
  • ⚡ Single HTTP request per article: no headless browser, no recursive crawl.
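
The selector logic above can be approximated in a few lines. The sketch below is not the actor's actual code, which is not shown here; it is a standard-library approximation of `article[role="article"] .col--main p:not(.meta--right)` that works on an HTML string already fetched. The class name `ParagraphExtractor` and the function `extract_text` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphExtractor(HTMLParser):
    """Approximates: article[role="article"] .col--main p:not(.meta--right)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []       # (tag, is_article, is_main) per open element
        self.paragraphs = []  # finished paragraph strings
        self._buf = None      # text pieces of the <p> currently collected

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        inside_article = any(entry[1] for entry in self.stack)
        inside_main = any(entry[2] for entry in self.stack)
        is_article = tag == "article" and attrs.get("role") == "article"
        # .col--main only counts when it sits below the article element.
        is_main = "col--main" in classes and inside_article
        if (tag == "p" and inside_main and "meta--right" not in classes
                and self._buf is None):
            self._buf = []
        self.stack.append((tag, is_article, is_main))

    def handle_endtag(self, tag):
        if tag == "p" and self._buf is not None:
            self._flush()
        # Pop back to the matching open tag; stray closers are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _flush(self):
        # Collapse runs of whitespace and drop paragraphs that are empty.
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = None


def extract_text(html: str) -> str:
    """Return matched paragraphs joined by double newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

With a CSS-capable parser such as BeautifulSoup, the same selector string could likely be passed to `soup.select(...)` directly; the hand-rolled parser here only keeps the example free of third-party dependencies.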

Input

| Field | Type | Description |
|---|---|---|
| startUrls | array | List of irozhlas.cz article URLs. |
| selector | string | CSS selector for article body (default targets the main content). |
| proxyConfiguration | object | Apify proxy settings (residential recommended). |
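
As an illustration of how these three fields might be assembled, here is a small sketch that builds a run input and prints it as JSON. Several details are assumptions rather than facts from this page: that `startUrls` accepts plain URL strings, that `proxyConfiguration` follows the common Apify shape with `useApifyProxy` and `apifyProxyGroups`, and the example URL itself, which is made up. The helper name `build_input` is hypothetical.

```python
import json
from urllib.parse import urlparse


def build_input(urls, selector=None, use_residential=True):
    """Assemble a run-input dict from the three documented fields."""
    for url in urls:
        host = urlparse(url).hostname or ""
        if host != "irozhlas.cz" and not host.endswith(".irozhlas.cz"):
            raise ValueError(f"not an irozhlas.cz URL: {url}")

    # Assumption: startUrls takes plain strings, not {"url": ...} objects.
    run_input = {"startUrls": list(urls)}
    if selector is not None:
        run_input["selector"] = selector  # omit to keep the actor's default

    # Assumption: the usual Apify proxy input shape.
    proxy = {"useApifyProxy": True}
    if use_residential:
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]
    run_input["proxyConfiguration"] = proxy
    return run_input


# Hypothetical article URL, used only to show the resulting structure.
example = build_input(["https://www.irozhlas.cz/zpravy-svet/example-article"])
print(json.dumps(example, indent=2))
```

Leaving `selector` out of the input keeps the default described in the table, so it only needs to be set if the site's markup changes.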

Output

One dataset item per URL:

{
  "url": "https://www.irozhlas.cz/zpravy-svet/...",
  "text": "first paragraph\n\nsecond paragraph",
  "status": "ok"
}

status can be:

  • ok — content extracted
  • empty — selector matched nothing
  • error — fetch failed or HTTP ≥400
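
The three status values can be read as a simple decision rule. The sketch below shows one way that rule might look, plus a helper for sorting downloaded dataset items by status before further processing. It is an interpretation of the list above, not the actor's code; `classify` and `group_by_status` are hypothetical names, and using `None` to represent a failed fetch is an assumption.

```python
from collections import defaultdict
from typing import Optional


def classify(http_status: Optional[int], text: str) -> str:
    """Map a fetch result to one of the documented status values."""
    # Assumption: None stands for a request that failed before any response.
    if http_status is None or http_status >= 400:
        return "error"  # fetch failed or HTTP >= 400
    if not text.strip():
        return "empty"  # selector matched nothing
    return "ok"         # content extracted


def group_by_status(items):
    """Group dataset items (dicts with a 'status' key) by their status."""
    groups = defaultdict(list)
    for item in items:
        groups[item["status"]].append(item)
    return dict(groups)
```

A consumer would typically pass only the `ok` group on to summarization or analysis and retry or log the URLs in the `error` group.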