Pricing

Pay per usage

iRozhlas Article Extractor

Extracts clean article body text from irozhlas.cz news articles. Give it a list of article URLs and it returns the paragraph content: no navigation, no sidebar, no metadata.

Developer

Jakub Kopecký

Last modified

2 months ago

What it does

Fetches each article URL, isolates the main <article> body, and extracts paragraph text. The output is plain text — paragraphs joined by double newlines, ready for LLM processing, summarization, or content analysis.

How it works

Fetches each URL through the Apify proxy (residential proxies supported).
Parses the HTML with a fast CSS selector: article[role="article"] .col--main p:not(.meta--right).
Skips metadata, author lines, and UI chrome by targeting only the content column.
⚡ Single HTTP request per article: no headless browser, no recursive crawl.
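
The selector logic above can be approximated with the Python standard library alone. This is a minimal sketch, not the actor's actual code: the class name `ParagraphExtractor` and the function `extract_text` are hypothetical, the HTTP fetch through the proxy is left out, and the parser hand-implements only this one selector rather than general CSS matching.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ParagraphExtractor(HTMLParser):
    """Approximates: article[role="article"] .col--main p:not(.meta--right)"""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []          # (tag, classes, role) for each open element
        self.paragraphs = []     # collected paragraph texts
        self.buffer = None       # text parts of the <p> being captured, else None
        self.capture_depth = 0   # stack depth at which the captured <p> opened

    def _in_content_column(self):
        # True when an open <article role="article"> has an open descendant
        # carrying the class "col--main".
        in_article = False
        for tag, classes, role in self.stack:
            if in_article and "col--main" in classes:
                return True
            if tag == "article" and role == "article":
                in_article = True
        return False

    def _flush(self):
        # Collapse whitespace and keep the paragraph only if it has text.
        text = " ".join("".join(self.buffer).split())
        if text:
            self.paragraphs.append(text)
        self.buffer = None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br" and self.buffer is not None:
                self.buffer.append(" ")  # treat a line break as whitespace
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if (tag == "p" and self.buffer is None
                and "meta--right" not in classes
                and self._in_content_column()):
            self.buffer = []
            self.capture_depth = len(self.stack)
        self.stack.append((tag, classes, attrs.get("role")))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closers.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        else:
            return
        # The captured <p> is closed once the stack shrinks to its depth.
        if self.buffer is not None and len(self.stack) <= self.capture_depth:
            self._flush()

    def handle_data(self, data):
        if self.buffer is not None:
            self.buffer.append(data)


def extract_text(html: str) -> str:
    """Return the matched paragraphs joined by double newlines."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)
```

Paragraphs in navigation, in the `.meta--right` block, or outside `.col--main` are never buffered, which mirrors how the selector keeps UI chrome out of the output.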

Input

Field	Type	Description
`startUrls`	array	List of irozhlas.cz article URLs.
`selector`	string	CSS selector for article body (default targets the main content).
`proxyConfiguration`	object	Apify proxy settings (residential recommended).
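
An input object for these fields could be assembled as in the sketch below. The helper name `build_input` and the example URL are hypothetical. The sketch assumes `startUrls` accepts plain URL strings and that `proxyConfiguration` follows the usual Apify proxy shape (`useApifyProxy`, `apifyProxyGroups`); neither detail is confirmed by this page.

```python
from urllib.parse import urlparse


def build_input(urls, selector=None, residential=True):
    """Build an actor input dict, rejecting URLs outside irozhlas.cz."""
    start_urls = []
    for url in urls:
        host = urlparse(url).hostname or ""
        if host != "irozhlas.cz" and not host.endswith(".irozhlas.cz"):
            raise ValueError(f"not an irozhlas.cz URL: {url}")
        start_urls.append(url)

    run_input = {"startUrls": start_urls}
    if selector is not None:
        # Omit the field to keep the actor's default selector.
        run_input["selector"] = selector
    if residential:
        # Assumed shape: the standard Apify proxy configuration object.
        run_input["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return run_input


# Hypothetical article URL, for illustration only.
example = build_input(["https://www.irozhlas.cz/zpravy-svet/example"])
```

Validating the host up front avoids spending a run on URLs the default selector was never written for.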

Output

One dataset item per URL:

{
  "url": "https://www.irozhlas.cz/zpravy-svet/...",
  "text": "first paragraph\n\nsecond paragraph",
  "status": "ok"
}

status can be:

✅ ok — content extracted
∅ empty — selector matched nothing
✗ error — fetch failed or HTTP ≥400
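
On the consumer side, dataset items can be split by status before further processing. The sketch below is one possible approach: the function name `paragraphs_by_url` is hypothetical, and it assumes only the three fields shown above (`url`, `text`, `status`), reading `text` only for items marked `ok`.

```python
def paragraphs_by_url(items):
    """Split dataset items into extracted paragraphs and non-ok results.

    Returns (extracted, failed), where extracted maps each URL to its list
    of paragraphs and failed lists (url, status) pairs for everything else.
    """
    extracted = {}
    failed = []
    for item in items:
        if item.get("status") == "ok":
            # Paragraphs are joined by double newlines in the output.
            extracted[item["url"]] = item["text"].split("\n\n")
        else:
            failed.append((item.get("url"), item.get("status")))
    return extracted, failed
```

Keeping `empty` and `error` items in a separate list makes it easy to retry failed fetches or to adjust the selector for pages where it matched nothing.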
