Pricing

Pay per event

RSS Feed Scraper — Atom, Podcast & Multi-Feed

Parse and convert any RSS or Atom feed to a clean dataset — title, link, author, published date, summary, full HTML content, tags, GUID — export to JSON or CSV. A drop-in RSS feed parser for RSS 2.0, Atom 1.0, and the content:encoded / dc:creator extensions.

Developer

DevilScrapes

Last modified

2 months ago

🎯 What this scrapes

RSS and Atom are still the most reliable way to subscribe to a publication. This Actor parses any feed URL — news site, blog, podcast, GitHub release feed, Reddit, Substack, Medium per-user — and writes one row per item. Output is normalised across RSS and Atom dialects so downstream code never needs to care which format it received.

Feed sources that work out of the box:

News publishers (New York Times, BBC, Reuters — any site that vends an RSS endpoint)
Podcast directories and individual show feeds (<enclosure> tags parsed automatically)
GitHub release and commit feeds
Reddit community feeds (the .rss endpoint)
Substack and Medium per-author feeds
Google Alerts export feeds
Any custom-built Atom or RSS 2.0 / RSS 1.0 feed

🔥 Features

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not a Python script.
🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block or 429 response.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per feed, Retry-After header honoured.
🧱 Rate-limit-aware pacing — when a feed host pushes back, we slow down and surface exactly what was collected before the limit hit.
🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, exportable as JSON / CSV / Excel straight from Apify Console.
💰 Pay-Per-Event pricing — you pay only for results that land in your dataset. No data, no charge (beyond the small actor-start fee).
📡 Multi-feed batching — pass a list of URLs; the Actor fetches and normalises them all in one run, deduplicating by GUID.
📝 Full HTML content — when a feed publishes content:encoded or Atom content, we capture the full body alongside the summary, not just a truncated snippet.
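The retry rule in the features above (retry on 408 / 429 / 5xx, up to 5 attempts, Retry-After honoured) can be sketched roughly as below. This is an illustration, not the Actor's source. The `fetch` callable returning `(status, headers, body)`, the `base_delay` / `max_delay` values, and capping long Retry-After waits at `max_delay` are all assumptions made here.

```python
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetch signature for this sketch: url -> (status, headers, body).
Fetch = Callable[[str], Tuple[int, Dict[str, str], bytes]]


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are treated as transient failures."""
    return status in (408, 429) or 500 <= status <= 599


def retry_after_seconds(value: Optional[str]) -> Optional[float]:
    """Parse Retry-After as delta-seconds or an HTTP-date; None if absent or invalid."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def fetch_with_backoff(
    fetch: Fetch,
    url: str,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call fetch() until it returns a non-retryable status or max_attempts is reached."""
    attempt = 1
    while True:
        status, headers, body = fetch(url)
        if not is_retryable(status) or attempt >= max_attempts:
            return status, body
        # Header names are case-insensitive, so normalise them before the lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        delay = retry_after_seconds(lowered.get("retry-after"))
        if delay is None:
            # Exponential backoff: 1 s, 2 s, 4 s, 8 s with the default base_delay.
            delay = base_delay * 2 ** (attempt - 1)
        sleep(min(delay, max_delay))
        attempt += 1
```

Injecting `fetch` and `sleep` keeps the policy testable without a network. A production version would usually also add random jitter to the computed delay so that many clients don't retry in lockstep.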

💡 Use cases

News aggregation dashboard — pull 20 publications into one stream and pipe to Slack, Discord, or a webhook.
Brand monitoring — track every Google Alerts RSS feed for your company name, product, or competitors.
Content automation — feed company-blog RSS into a translation pipeline, summary LLM, or newsletter tool.
Podcast RSS parser — podcast RSS is standard RSS with <enclosure> tags; this Actor surfaces the episode link, title, author, and published date for every episode in the feed.
LLM-ready news digest — pass structured rows straight to an LLM pipeline; ISO-8601 timestamps and clean HTML make chunking predictable.
RSS-to-Google-Sheets / Notion / Airtable — export via Apify's native integration or the API; no glue code required.
Feed archival — schedule the Actor daily to build a rolling archive of feeds that don't publish full history.

⚙️ How to use it

Click Try for free at the top of the page.
Paste one or more RSS / Atom feed URLs into the feedUrls field — one per line.
Optionally set maxItemsPerFeed and toggle includeContent for full HTML.
Click Start. Output streams into the run's dataset in real time.
Export from Storage → Dataset as JSON, CSV, or Excel — or call the Apify API from your own code.

For scheduled runs, use Apify Schedules (cron syntax) so the Actor refreshes your dataset on your preferred cadence.

📥 Input

Field	Type	Required	Default	Notes
`feedUrls`	`array`	yes	`["https://news.ycombinator.com/rss"]`	List of RSS / Atom feed URLs, one URL per array item. Each feed yields one dataset row per entry, up to `maxItemsPerFeed`.
`maxItemsPerFeed`	`integer`	no	`50`	Cap on items pulled from a single feed. Set to `0` for no limit.
`includeContent`	`boolean`	no	`true`	When `true`, includes the full HTML body (`content:encoded` / Atom `content`). When `false`, summary only.
`userAgent`	`string`	no	`"DevilScrapesBot/1.0 (+https://apify.com/DevilScrapes)"`	Custom User-Agent string. The default identifies requests as coming from the DevilScrapes RSS reader.
`proxyConfiguration`	`object`	no	`{"useApifyProxy": false}`	Some publishers rate-limit scrapers. Apify Proxy provides sticky sessions and IP rotation when needed.

Example input

{
  "feedUrls": [
    "https://news.ycombinator.com/rss",
    "https://feeds.arstechnica.com/arstechnica/index"
  ],
  "maxItemsPerFeed": 25,
  "includeContent": true,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
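To run the Actor from your own code with an input like the one above, one option is Apify's synchronous `run-sync-get-dataset-items` endpoint, which starts a run and returns the dataset items in the response. Below is a minimal sketch using only the Python standard library. The Actor ID `DevilScrapes~rss-feed-scraper` is a hypothetical placeholder (copy the real one from the Actor's API tab), and passing the token as a Bearer header is one of the authentication options Apify accepts.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Hypothetical Actor ID in "username~actor-name" form; replace it with the real one.
ACTOR_ID = "DevilScrapes~rss-feed-scraper"


def build_run_request(token: str, run_input: dict, actor_id: str = ACTOR_ID) -> urllib.request.Request:
    """Build a POST that sends the Actor input as JSON to the synchronous run endpoint."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def run_actor(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Start a run, wait for it to finish, and return the dataset rows as a list of dicts."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `run_actor(token, run_input)` with the example input returns a list of rows shaped as described under Output. The synchronous endpoint suits short runs; for large batches it is safer to start the run asynchronously and read the dataset once it finishes.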

📤 Output

Every row is one feed item. All rows are validated with Pydantic: a field is null only when the feed did not supply a value, and no fields appear beyond those listed below.

Field	Type	Notes
`feed_url`	`string`	Source feed URL passed in `feedUrls`.
`feed_title`	`string | null`	Feed channel title.
`feed_format`	`string`	`"rss"` or `"atom"`.
`item_id`	`string | null`	Item GUID (RSS) or `id` (Atom). Used for deduplication.
`title`	`string`	Item headline.
`link`	`string`	Item permalink URL.
`author`	`string | null`	Author from `dc:creator` or Atom `author/name`.
`summary`	`string | null`	Short description / `atom:summary`.
`content_html`	`string | null`	Full HTML body when the feed includes it.
`categories`	`array`	Item tags / categories (empty array if none).
`published`	`string | null`	Publish timestamp in ISO-8601 format.
`updated`	`string | null`	Updated timestamp in ISO-8601 format.
`scraped_at`	`string`	ISO-8601 timestamp for when this row was recorded.

Example output

{
  "feed_url": "https://news.ycombinator.com/rss",
  "feed_title": "Hacker News",
  "feed_format": "rss",
  "item_id": "https://news.ycombinator.com/item?id=48000000",
  "title": "Show HN: Building a hosted RSS parser for the post-LLM web",
  "link": "https://news.ycombinator.com/item?id=48000000",
  "author": null,
  "summary": "A discussion about ...",
  "content_html": null,
  "categories": [],
  "published": "2026-05-15T20:00:00+00:00",
  "updated": null,
  "scraped_at": "2026-06-01T09:00:00+00:00"
}
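If your downstream code is in Python, you may want to turn each row into a typed object and parse the ISO-8601 timestamps once, at the boundary. A minimal stdlib sketch, assuming rows shaped like the example above; `FeedItem` and `from_row` are hypothetical names, not part of the Actor.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


def _ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 timestamp such as 2026-05-15T20:00:00+00:00; None stays None."""
    return datetime.fromisoformat(value) if value else None


@dataclass
class FeedItem:
    feed_url: str
    feed_format: str
    title: str
    link: str
    scraped_at: datetime
    feed_title: Optional[str] = None
    item_id: Optional[str] = None
    author: Optional[str] = None
    summary: Optional[str] = None
    content_html: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    published: Optional[datetime] = None
    updated: Optional[datetime] = None

    @classmethod
    def from_row(cls, row: dict) -> "FeedItem":
        """Build a FeedItem from one dataset row, rejecting unknown feed formats."""
        if row["feed_format"] not in ("rss", "atom"):
            raise ValueError(f"unexpected feed_format: {row['feed_format']!r}")
        return cls(
            feed_url=row["feed_url"],
            feed_format=row["feed_format"],
            title=row["title"],
            link=row["link"],
            scraped_at=_ts(row["scraped_at"]),
            feed_title=row.get("feed_title"),
            item_id=row.get("item_id"),
            author=row.get("author"),
            summary=row.get("summary"),
            content_html=row.get("content_html"),
            categories=list(row.get("categories") or []),
            published=_ts(row.get("published")),
            updated=_ts(row.get("updated")),
        )
```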

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event	USD	What it is
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.001	Per dataset item written

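Example: a single run that writes 1 000 items costs $0.005 + 1 000 × $0.001 = $1.005, or about $1.00.

No subscription, no monthly minimum, no card to start. Apify gives every new account $5 of free credit, which covers roughly 5 000 rows (4 995 in a single run once the $0.005 start fee is deducted).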
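The pricing arithmetic is one start fee per run plus one result fee per row. A small helper that reproduces it, using `Decimal` to avoid floating-point rounding on fractions of a cent; the rates are copied from the pricing table, and the function names are hypothetical.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # per run, from the pricing table
RESULT_USD = Decimal("0.001")       # per dataset item, from the pricing table


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Total USD for `runs` runs that write `items` dataset rows in total."""
    return ACTOR_START_USD * runs + RESULT_USD * items


def rows_covered(budget_usd: str, runs: int = 1) -> int:
    """How many rows a budget pays for once the start fees are taken out."""
    remaining = Decimal(budget_usd) - ACTOR_START_USD * runs
    return max(0, int(remaining // RESULT_USD))
```

For example, a daily schedule of 30 runs that each write 50 rows comes to 30 × $0.005 + 1 500 × $0.001 = $1.65.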

🚧 Limitations

Paginated feeds — we don't follow <link rel="next"> paged feeds automatically. Pass each page URL explicitly if you need full history.
JavaScript-rendered feeds — feeds that require JavaScript to load are not supported. You would need a browser-based Actor for those.
Malformed XML — feedparser is lenient and handles most broken XML, but severely corrupted feeds may yield partial or empty results. The run surfaces a warning, not a silent empty dataset.
Rate-limiting by feed hosts — heavily scraped feeds (e.g. Reddit) may enforce per-IP rate limits. Enable Apify Proxy in proxyConfiguration to rotate IPs.
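Because paged feeds aren't followed automatically, you need the page URLs up front. For Atom feeds that advertise pagination with a feed-level `<link rel="next">` (the RFC 5005 convention), a sketch like the one below can walk the chain and collect the URLs to paste into `feedUrls`. The `fetch` callable and the `max_pages` cap are assumptions. RSS 2.0 has no standard equivalent, though some RSS feeds embed an `atom:link rel="next"` inside `<channel>`, which this sketch also checks.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List, Optional
from urllib.parse import urljoin

ATOM_NS = "http://www.w3.org/2005/Atom"


def next_page_url(xml_bytes: bytes, base_url: str) -> Optional[str]:
    """Return the first feed-level atom:link rel="next", resolved against base_url."""
    root = ET.fromstring(xml_bytes)
    # Feed-level links only: children of <feed> (Atom) or of <channel> (RSS).
    candidates = root.findall(f"{{{ATOM_NS}}}link") + root.findall(f"channel/{{{ATOM_NS}}}link")
    for link in candidates:
        if link.get("rel") == "next" and link.get("href"):
            return urljoin(base_url, link.get("href"))
    return None


def collect_page_urls(first_url: str, fetch: Callable[[str], bytes], max_pages: int = 20) -> List[str]:
    """Follow rel="next" links until none remain, a URL repeats, or max_pages is reached."""
    urls: List[str] = []
    url: Optional[str] = first_url
    while url and url not in urls and len(urls) < max_pages:
        urls.append(url)
        url = next_page_url(fetch(url), url)
    return urls
```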

❓ FAQ

Is this the same as an RSS parser API?

Functionally, yes: you call it through the Apify API (or the Console UI), pass feed URLs, and get back structured JSON. The difference is that we handle the messy parts a bare HTTP client doesn't: malformed XML, charset detection, multi-dialect normalisation, and the network-level blocks that make a home-rolled parser fail on some feeds.

Does this handle podcasts?

Yes — podcast RSS is standard RSS with <enclosure> tags. This Actor is a capable podcast RSS parser: the enclosure URL (the audio file) appears in the link field for each episode row, alongside the episode title, author, and published date.
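The output table has no dedicated enclosure field, so if you also need the MIME type and byte length that RSS 2.0 carries in the `url`, `length` and `type` attributes of `<enclosure>`, one option is to read them from the raw feed yourself. A minimal stdlib sketch; keying the result by `guid` (falling back to `link`) so it can be joined to `item_id` is an assumption about how you will match rows.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def enclosures_by_item(xml_bytes: bytes) -> Dict[str, dict]:
    """Map each RSS item's guid (or link) to its enclosure url, type and length."""
    root = ET.fromstring(xml_bytes)
    result: Dict[str, dict] = {}
    for item in root.iterfind("channel/item"):
        enclosure = item.find("enclosure")
        key = item.findtext("guid") or item.findtext("link")
        if enclosure is None or not key:
            continue
        length: Optional[str] = enclosure.get("length")
        result[key.strip()] = {
            "url": enclosure.get("url"),
            "type": enclosure.get("type"),
            # RSS 2.0 defines length in bytes; some feeds send 0 or omit it.
            "length": int(length) if length and length.isdigit() else None,
        }
    return result
```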

Does it work as an Atom feed parser?

Full Atom 1.0 support is built in. The feed_format field tells you which dialect was parsed. Both RSS and Atom rows share the same output schema, so your downstream code needs no format-specific logic.
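To illustrate what a shared schema means in practice, here is a stdlib-only sketch that maps an RSS 2.0 `<item>` and an Atom `<entry>` onto the same row keys as the Output table (minus `scraped_at`). It is not the Actor's code, which relies on feedparser and copes with far more edge cases; RSS 1.0 / RDF feeds and Atom `xhtml` content are deliberately left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional

ATOM = "{http://www.w3.org/2005/Atom}"
CONTENT = "{http://purl.org/rss/1.0/modules/content/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def _text(el: Optional[ET.Element]) -> Optional[str]:
    """Stripped element text, or None when the element is missing or empty."""
    if el is None or el.text is None:
        return None
    return el.text.strip() or None


def _rss_date(value: Optional[str]) -> Optional[str]:
    """RFC 822 pubDate -> ISO-8601 string."""
    try:
        return parsedate_to_datetime(value).isoformat() if value else None
    except (TypeError, ValueError):
        return None


def _atom_date(value: Optional[str]) -> Optional[str]:
    """RFC 3339 timestamp -> ISO-8601 string ('Z' rewritten for older Pythons)."""
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00")).isoformat() if value else None
    except ValueError:
        return None


def parse_feed(xml_bytes: bytes, feed_url: str) -> List[dict]:
    """Normalise RSS 2.0 items and Atom entries into rows with identical keys."""
    root = ET.fromstring(xml_bytes)
    rows: List[dict] = []
    if root.tag == ATOM + "feed":
        feed_title = _text(root.find(ATOM + "title"))
        for entry in root.findall(ATOM + "entry"):
            # An Atom link without a rel attribute counts as rel="alternate".
            link = next(
                (lk.get("href") for lk in entry.findall(ATOM + "link")
                 if lk.get("rel", "alternate") == "alternate"),
                None,
            )
            rows.append({
                "feed_url": feed_url, "feed_title": feed_title, "feed_format": "atom",
                "item_id": _text(entry.find(ATOM + "id")),
                "title": _text(entry.find(ATOM + "title")) or "",
                "link": link or "",
                "author": _text(entry.find(f"{ATOM}author/{ATOM}name")),
                "summary": _text(entry.find(ATOM + "summary")),
                "content_html": _text(entry.find(ATOM + "content")),
                "categories": [c.get("term") for c in entry.findall(ATOM + "category") if c.get("term")],
                "published": _atom_date(_text(entry.find(ATOM + "published"))),
                "updated": _atom_date(_text(entry.find(ATOM + "updated"))),
            })
    elif root.tag == "rss":
        channel = root.find("channel")
        feed_title = _text(channel.find("title")) if channel is not None else None
        for item in root.iterfind("channel/item"):
            rows.append({
                "feed_url": feed_url, "feed_title": feed_title, "feed_format": "rss",
                "item_id": _text(item.find("guid")),
                "title": _text(item.find("title")) or "",
                "link": _text(item.find("link")) or "",
                "author": _text(item.find(DC + "creator")) or _text(item.find("author")),
                "summary": _text(item.find("description")),
                "content_html": _text(item.find(CONTENT + "encoded")),
                "categories": [t for t in (_text(c) for c in item.findall("category")) if t],
                "published": _rss_date(_text(item.find("pubDate"))),
                "updated": None,
            })
    else:
        raise ValueError(f"unsupported root element: {root.tag}")
    return rows
```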

Why is content_html empty for some feeds?

Some publishers deliberately publish summary-only feeds to drive clicks to their site. The full body lives on the publisher's page, not in the feed XML. We surface what the feed provides — no fabrication.

What if a feed URL returns an error?

The Actor logs the failure with the HTTP status code, marks that feed as errored in the status message, and continues processing the remaining URLs. You never get a silent empty dataset — partial success is surfaced explicitly.

Can I run this on a schedule?

Yes. Use Apify Schedules to trigger a run on any cron cadence. Pair it with a named dataset to accumulate a rolling archive without overwriting previous results.

Does it deduplicate items across runs?

Within a single run, items are deduplicated by GUID / Atom id. Across runs, deduplication is your responsibility: filter by item_id in your downstream pipeline and keep track of the IDs you have already processed. Since item_id can be null, you may need a fallback key such as the link.
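A minimal sketch of cross-run deduplication, assuming you persist the set of keys you have already seen (here in a local JSON file at a path of your choosing). Scoping keys by `feed_url` and falling back to `link` when `item_id` is null are choices made for this sketch, not something the Actor does.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def row_key(row: dict) -> str:
    """Dedup key: feed_url + item_id, or feed_url + link when item_id is null."""
    if row.get("item_id"):
        return f"id|{row['feed_url']}|{row['item_id']}"
    return f"link|{row['feed_url']}|{row['link']}"


def new_rows(rows: Iterable[dict], seen: Set[str]) -> List[dict]:
    """Return rows whose key is not in `seen`, adding their keys to `seen` as it goes."""
    fresh = []
    for row in rows:
        key = row_key(row)
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh


def load_seen(path: Path) -> Set[str]:
    """Load previously seen keys; an absent file means nothing has been seen yet."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_seen(path: Path, seen: Set[str]) -> None:
    """Persist the keys as a sorted JSON list so the file diffs cleanly."""
    path.write_text(json.dumps(sorted(seen)))
```

Scoping by feed URL matters because some feeds reuse short GUIDs such as "1" or "2" that would otherwise collide across publishers.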

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.

RSS Feed Reader — RSS & Atom to Clean JSON

omao/rss-feed

Fetch any RSS or Atom feed and get clean, structured JSON: title, link, author, published date, HTML-stripped summary and tags. One row per entry. Fast, no setup.

Marouane Oulabass

Universal RSS/Atom Feed Reader

blazing_stake/rss-feed-reader

Parse any RSS or Atom feed into clean JSON: title, link, author, date, categories, content, media. Handles both formats. For content monitoring and aggregation.

Mehmet Kut

RSS & News Feed Extractor - Articles to JSON/CSV

pear_fight/rss-news-feed-extractor-articles-to-json-csv

Parse any RSS or Atom feed into clean, structured article data: title, link, author, publish date, categories, summary and full content. Handles both RSS and Atom formats. Perfect for news monitoring, content aggregation and feeding data pipelines. Export to JSON, CSV, Excel.

Harald

RSS Feed Scraper - RSS & Atom Data

benthepythondev/rss-feed-scraper

Scrape RSS and Atom feeds into structured records with title, URL, author, publish date, categories, image and summary.

Ben

RSS Feed Discovery — Find RSS & Atom Feeds

q_services/rss-feed-discovery

Find RSS, Atom and JSON Feed URLs from websites using HTML discovery tags and common feed paths.

Q Services

RSS Feed Scraper & RSS to JSON Converter

xtech/feed-extractor

Scrape and parse RSS, Atom, JSON Feed (and podcast RSS) URLs into clean, structured JSON. Outputs one dataset row per feed entry/item for easy export to CSV/JSON and automations.

Xtech

RSS & Atom Feed Aggregator

mahogany_songbird/rss-feed-aggregator

Parse RSS/Atom feeds into structured items.

Britton Furness

RSS Feed Scraper

ef12/rss-scraper

Fetch and parse any RSS or Atom feed into structured JSON. Get titles, links, descriptions, authors, dates, and categories.

Daniel Wilson

RSS Feed Parser — Convert Any RSS or Atom Feed to Clean JSON

eliai/rss-feed-parser

RSS feed parser for developers and AI agents: pass any RSS or Atom feed URL as input and get back clean, structured JSON items (title, link, date, and other feed fields). Pay per result — cost scales with items parsed, nothing hidden.

Anthony Snider

RSS / Atom Feed to Dataset

wiry_kingdom/rss-feed-to-dataset

Convert any RSS 2.0, Atom 1.0, or RDF feed into a clean structured dataset. Extracts title, link, pubDate, author, summary, content, categories, enclosures. Works with podcasts, news, blogs, GitHub releases. No API keys.