RSS / Atom Feed Scraper
Pricing
Pay per event
Parse any RSS or Atom feed into a clean structured dataset — title, link, author, published date, summary, full content (HTML), tags, GUID. Handles RSS 2.0, Atom 1.0, and the common content:encoded / dc:creator extensions.
Developer
DevilScrapes
Maintained by Community
🎯 What this scrapes
RSS and Atom are still the most reliable way to subscribe to a publication. This Actor parses any feed URL — news site, blog, podcast, GitHub release feed, Reddit, Substack, Medium per-user — and writes one row per item. Output is normalised across RSS and Atom dialects so downstream code doesn't need to care which it was.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you pay per result that reaches your dataset, plus a small start charge per run (see Pricing).
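The retry behaviour described above can be sketched roughly as follows. This is an illustration only, not the Actor's actual code: the function names, the 1-second base, the 60-second cap, and the jitter strategy are all assumptions; only the status codes, the 5-attempt limit, and the use of `Retry-After` come from the list above.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 5  # "up to 5 attempts per page", as stated above


def is_retryable(status: int) -> bool:
    """True for 408, 429 and any 5xx response."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0,
                  jitter: bool = True) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    A numeric Retry-After header takes priority. Otherwise the delay
    doubles on each attempt (base, 2*base, 4*base, ...) up to `cap`.
    `base`, `cap` and `jitter` are assumed values for this sketch.
    """
    if retry_after is not None:
        try:
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is ignored here
    delay = min(base * 2 ** (attempt - 1), cap)
    if jitter:
        # Randomising the wait keeps parallel workers from retrying in lockstep.
        delay = random.uniform(0, delay)
    return delay
```

A caller would loop up to `MAX_ATTEMPTS` times, sleep for `backoff_delay(attempt, response_headers.get("Retry-After"))` whenever `is_retryable(status)` is true, and give up afterwards.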
💡 Use cases
- News dashboard: aggregate 20 publications into one stream, pipe to Slack.
- Brand monitoring: track every Google Alerts RSS feed for your competitors.
- Content automation: feed company-blog RSS into a translation pipeline.
- Podcast metadata: RSS podcast feeds parse cleanly here (we surface `link` per episode).
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
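For the "fetch via the API" route in the last step, a minimal sketch using only the standard library might look like this. It assumes the Apify API v2 dataset-items endpoint and a bearer token; `dataset_items_url` and `fetch_items` are hypothetical helper names, and the dataset ID and token are placeholders you would take from your own run.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      limit: Optional[int] = None) -> str:
    """Build the download URL for a run's dataset items."""
    if fmt not in {"json", "csv", "xlsx"}:
        raise ValueError(f"unsupported format in this sketch: {fmt}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = str(limit)
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: str,
                limit: Optional[int] = None) -> list:
    """Download dataset rows as a list of dicts (performs a network call)."""
    request = urllib.request.Request(
        dataset_items_url(dataset_id, "json", limit),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Usage would be along the lines of `rows = fetch_items("<your-dataset-id>", "<your-api-token>", limit=100)`.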
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `feedUrls` | array | yes | `["https://news.ycombinator.com/rss"]` | List of RSS / Atom feed URLs. One URL per line. Each becomes one or more dataset rows. |
| `maxItemsPerFeed` | integer | no | `50` | Cap on items pulled from a single feed. |
| `includeContent` | boolean | no | `true` | When true, include the full HTML body (RSS `content:encoded` / Atom `content`). When false, only the summary. |
| `userAgent` | string | no | `DevilScrapesBot/1.0 (+https://apify.com/DevilScrapes)` | Custom UA string. The default identifies as the Devil Scrapes RSS reader. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | Some publishers gate scrapers. Apify Proxy helps with sticky sessions. |
Example input
    {
      "feedUrls": ["https://news.ycombinator.com/rss"],
      "maxItemsPerFeed": 3,
      "includeContent": false,
      "proxyConfiguration": {"useApifyProxy": false}
    }
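If you assemble the input programmatically, a small builder can catch mistakes before a run starts. The sketch below is an assumption-laden convenience, not part of the Actor: `build_run_input` is a hypothetical helper, and its checks (http/https URLs only, at least one item per feed) are our own reading of the input table above.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlparse


def build_run_input(feed_urls: List[str], max_items_per_feed: int = 50,
                    include_content: bool = True,
                    use_apify_proxy: bool = False) -> Dict[str, Any]:
    """Return an input object using the field names from the table above."""
    if not feed_urls:
        raise ValueError("feedUrls is required and must not be empty")
    for url in feed_urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
    if max_items_per_feed < 1:
        raise ValueError("maxItemsPerFeed must be at least 1")
    return {
        "feedUrls": list(feed_urls),
        "maxItemsPerFeed": max_items_per_feed,
        "includeContent": include_content,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    # Prints JSON that can be pasted into the input editor.
    run_input = build_run_input(["https://news.ycombinator.com/rss"], 3, False)
    print(json.dumps(run_input, indent=2))
```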
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `feed_url` | string | Source feed URL. |
| `feed_title` | string or null | Feed title (the channel title). |
| `feed_format` | string | `rss` or `atom`. |
| `item_id` | string or null | Item GUID / Atom `id`. |
| `title` | string | Item title. |
| `link` | string | Item permalink URL. |
| `author` | string or null | Item author (`dc:creator` or `atom:author/name`). |
| `summary` | string or null | Short summary (`description` / `atom:summary`). |
| `content_html` | string or null | Full HTML body when available. |
| `categories` | array | Item categories / tags. |
| `published` | string or null | Publish timestamp (ISO-8601 if parseable). |
| `updated` | string or null | Updated timestamp. |
| `scraped_at` | string | When this row was recorded. |
Example output
    {
      "feed_url": "https://news.ycombinator.com/rss",
      "feed_title": "Hacker News",
      "feed_format": "rss",
      "title": "Show HN: \u2026",
      "link": "https://news.ycombinator.com/item?id=48000000",
      "published": "2026-05-15T20:00:00+00:00"
    }
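Because `published` can be null, or a raw string when the date was not parseable, downstream code should parse it defensively. The following sketch groups rows by `feed_url` and keeps the newest items per feed. The helper names are hypothetical, and treating timezone-less timestamps as UTC is an assumption of this example rather than documented Actor behaviour.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Optional


def parse_published(row: dict) -> Optional[datetime]:
    """Parse the `published` field; return None if missing or not ISO-8601."""
    value = row.get("published")
    if not value:
        return None
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"  # older Pythons reject the "Z" suffix
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return parsed


def latest_per_feed(rows: List[dict], n: int = 5) -> Dict[str, List[dict]]:
    """Newest `n` dated rows for each feed_url; undated rows are dropped."""
    grouped: Dict[str, list] = defaultdict(list)
    for row in rows:
        when = parse_published(row)
        if when is not None:
            grouped[row["feed_url"]].append((when, row))
        else:
            grouped.setdefault(row["feed_url"], [])
    result = {}
    for feed_url, dated in grouped.items():
        dated.sort(key=lambda pair: pair[0], reverse=True)
        result[feed_url] = [row for _, row in dated[:n]]
    return result
```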
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.001 | Per dataset item |
Example: 1 000 results in a single run cost 1 000 × $0.001 + $0.005 = $1.005. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
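To budget for recurring runs, the two rates in the table can be combined in a few lines. This sketch assumes `actor-start` is charged exactly once per run and `result` once per dataset item, as the table states; `estimate_cost` is a hypothetical helper, and the figures exclude anything Apify may bill separately.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # per run, from the pricing table
RESULT_USD = Decimal("0.001")       # per dataset item, from the pricing table


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    if results < 0 or runs < 0:
        raise ValueError("results and runs must be non-negative")
    return runs * ACTOR_START_USD + results * RESULT_USD


if __name__ == "__main__":
    # 20 feeds x 50 items, once a day for 30 days.
    print(estimate_cost(results=20 * 50 * 30, runs=30))
```

`Decimal` is used instead of floats so that small per-event amounts add up without rounding drift.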
🚧 Limitations
We don't follow `<link rel="next">` paged feeds; pass each page URL explicitly. We don't render JavaScript-emitted feeds; you'd need a browser-based Actor for those.
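If you need every page of a paged feed, you can discover the page URLs yourself and pass them all in `feedUrls`. The sketch below shows one possible way to read the feed-level `rel="next"` link with the standard library; `next_page_url` is a hypothetical helper, it expects the raw feed bytes you have already downloaded, and for untrusted feeds a hardened XML parser would be a safer choice than `xml.etree`.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urljoin

ATOM_NS = "http://www.w3.org/2005/Atom"
ATOM_LINK = f"{{{ATOM_NS}}}link"


def next_page_url(feed_xml: bytes, base_url: str) -> Optional[str]:
    """Return the absolute URL of the feed's rel="next" link, if any.

    Looks at feed-level links only: direct children of an Atom <feed>,
    or atom:link elements inside an RSS <channel>.
    """
    root = ET.fromstring(feed_xml)
    candidates = root.findall(ATOM_LINK) + root.findall(f"channel/{ATOM_LINK}")
    for link in candidates:
        href = link.get("href")
        if link.get("rel") == "next" and href:
            return urljoin(base_url, href)  # resolve relative hrefs
    return None
```

Calling this repeatedly on each downloaded page, until it returns `None`, yields the list of page URLs to submit.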
❓ FAQ
Does this handle podcasts?
Yes. Podcast RSS is just RSS with `<enclosure>`. The enclosure URL appears in `link` for episode rows.
What if a feed is malformed?
`feedparser` is lenient: it'll parse most malformed feeds and warn rather than fail. The Actor surfaces the items it could extract.
Why is `content_html` empty?
Some feeds publish summary-only; the full body is on the publisher's site.
Atom vs RSS?
We normalise both. `feed_format` tells you which dialect we parsed.
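To make the normalisation concrete, here is a rough sketch of how the dialect could be detected and which source elements feed each output field. It is not the Actor's implementation (which relies on `feedparser`): `detect_feed_format` and `FIELD_SOURCES` are hypothetical names, and the mapping simply restates the element names listed in the input and output tables.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

# Output field -> source element per dialect, as listed in the tables above.
FIELD_SOURCES = {
    "item_id": {"rss": "guid", "atom": "id"},
    "author": {"rss": "dc:creator", "atom": "author/name"},
    "summary": {"rss": "description", "atom": "summary"},
    "content_html": {"rss": "content:encoded", "atom": "content"},
}


def detect_feed_format(feed_xml: bytes) -> str:
    """Return "rss" or "atom" based on the document's root element."""
    root = ET.fromstring(feed_xml)
    if root.tag == f"{{{ATOM_NS}}}feed":
        return "atom"
    if root.tag == "rss":
        return "rss"
    raise ValueError(f"unrecognised feed root element: {root.tag}")
```

Downstream code rarely needs this, since every row already carries `feed_format`; it is mainly useful for understanding where a missing field would have come from.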
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.