RSS / Atom Feed Scraper

Parse any RSS or Atom feed into a clean structured dataset — title, link, author, published date, summary, full content (HTML), tags, GUID. Handles RSS 2.0, Atom 1.0, and the common content:encoded / dc:creator extensions.

Developer: DevilScrapes. Maintained by Community.

🎯 What this scrapes

RSS and Atom are still the most reliable way to subscribe to a publication. This Actor parses any feed URL (news site, blog, podcast, GitHub release feed, Reddit, Substack, per-user Medium feed) and writes one row per item. Output is normalised across RSS and Atom dialects, so downstream code doesn't need to care which format the feed used.
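To make "normalised across dialects" concrete, here is a minimal standard-library sketch that maps an RSS 2.0 item and an Atom 1.0 entry onto the same keys. It is an illustration only, not the Actor's own parser; the function name `normalise_feed` is hypothetical and only a handful of the output fields are covered.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for Atom 1.0 and Dublin Core (dc:creator).
ATOM = "{http://www.w3.org/2005/Atom}"
DC = "{http://purl.org/dc/elements/1.1/}"


def normalise_feed(xml_text):
    """Return one dict per item, with identical keys for RSS and Atom.

    Hypothetical helper for illustration; not the Actor's implementation.
    """
    root = ET.fromstring(xml_text)
    rows = []
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            rows.append({
                "feed_format": "atom",
                "item_id": entry.findtext(ATOM + "id"),
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                "author": entry.findtext(ATOM + "author/" + ATOM + "name"),
            })
    else:
        # Assumed to be RSS 2.0: <rss><channel><item>...</item></channel></rss>
        for item in root.iter("item"):
            rows.append({
                "feed_format": "rss",
                "item_id": item.findtext("guid"),
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "author": item.findtext(DC + "creator") or item.findtext("author"),
            })
    return rows
```

Downstream code can then read `row["title"]` or `row["author"]` without branching on the feed type, and check `feed_format` only when the distinction matters.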

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block. Apify Proxy is off by default; enable it in proxyConfiguration.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx: up to 5 attempts per page, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
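The retry behaviour listed above can be sketched as a small delay calculator. Only the retryable statuses, the 5-attempt limit and the Retry-After handling come from this page; the base delay, cap and jitter are assumptions chosen for illustration, and `retry_delay` is a hypothetical name.

```python
import random
from typing import Optional

RETRYABLE = {408, 429} | set(range(500, 600))  # statuses named in the list above
MAX_ATTEMPTS = 5                               # "up to 5 attempts per page"


def retry_delay(attempt: int, status: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to give up.

    `attempt` is the 1-based number of attempts already made.
    `base`, `cap` and the 10% jitter are assumed values, not the Actor's.
    """
    if status not in RETRYABLE or attempt >= MAX_ATTEMPTS:
        return None
    # Honour a Retry-After given in seconds. The HTTP-date form is not
    # handled in this sketch and falls through to the backoff below.
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())
    delay = min(cap, base * 2 ** (attempt - 1))   # 1s, 2s, 4s, 8s, ...
    return delay + random.uniform(0, delay * 0.1)  # small jitter
```

A caller would sleep for the returned number of seconds and stop retrying as soon as the function returns None.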

💡 Use cases

  • News dashboard — aggregate 20 publications into one stream, pipe to Slack.
  • Brand monitoring — track every Google Alerts RSS feed for your competitors.
  • Content automation — feed company-blog RSS into a translation pipeline.
  • Podcast metadata — RSS podcast feeds parse cleanly here (we surface link per episode).
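As a rough sketch of the news-dashboard case, the function below merges rows from several feeds, drops duplicates, and builds the JSON body for a Slack incoming webhook. The helper name `to_slack_payload` is hypothetical, and the sort assumes every `published` value is an ISO-8601 string with the same UTC offset, so that plain string comparison orders them correctly.

```python
import json


def to_slack_payload(rows, limit=10):
    """Build a Slack incoming-webhook body from dataset rows.

    Hypothetical helper: dedupes on item_id (falling back to link),
    sorts newest first, and keeps at most `limit` items.
    """
    seen = set()
    unique = []
    for row in rows:
        key = row.get("item_id") or row.get("link")
        if key in seen:
            continue
        seen.add(key)
        unique.append(row)
    # Assumes same-offset ISO-8601 strings; rows without a date sort last.
    unique.sort(key=lambda r: r.get("published") or "", reverse=True)
    lines = [
        "<{}|{}> ({})".format(r["link"], r["title"],
                              r.get("feed_title") or r["feed_url"])
        for r in unique[:limit]
    ]
    return json.dumps({"text": "\n".join(lines)})
```

The returned string can be sent as the body of a POST request to your own webhook URL with a `Content-Type: application/json` header.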

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
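For the "fetch via the API" route, a minimal sketch using the `apify-client` Python package might look like the following. The token and Actor ID are placeholders you must supply (this page does not state the Actor ID), the helper names are hypothetical, and the run is assumed to come back as a dict with a `defaultDatasetId` key, as it does in apify-client 1.x.

```python
def build_run_input(feed_urls, max_items=50, include_content=True):
    """Assemble the input object using the field names from the Input table."""
    return {
        "feedUrls": list(feed_urls),
        "maxItemsPerFeed": max_items,
        "includeContent": include_content,
        "proxyConfiguration": {"useApifyProxy": False},
    }


def run_and_fetch(token, actor_id, run_input):
    """Start the Actor, wait for it to finish, and return its dataset rows.

    `actor_id` is a placeholder such as "<username>/<actor-name>".
    """
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        return []
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    run_input = build_run_input(["https://news.ycombinator.com/rss"], max_items=3)
    # Fill in your own token and the Actor ID from the store page:
    # rows = run_and_fetch("<APIFY_TOKEN>", "<username>/<actor-name>", run_input)
    print(run_input)
```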

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| feedUrls | array | yes | ["https://news.ycombinator.com/rss"] | List of RSS / Atom feed URLs. One URL per line. Each becomes one or more dataset rows. |
| maxItemsPerFeed | integer | no | 50 | Cap on items pulled from a single feed. |
| includeContent | boolean | no | true | When true, include the full HTML body (RSS content:encoded / Atom content). When false, only the summary. |
| userAgent | string | no | "DevilScrapesBot/1.0 (+https://apify.com/DevilScrapes)" | Custom UA string. Default identifies as Devil Scrapes RSS reader. |
| proxyConfiguration | object | no | {"useApifyProxy": false} | Some publishers gate scrapers. Apify Proxy helps with sticky sessions. |

Example input

{
  "feedUrls": [
    "https://news.ycombinator.com/rss"
  ],
  "maxItemsPerFeed": 3,
  "includeContent": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| feed_url | string | Source feed URL. |
| feed_title | string or null | Feed title (the channel title). |
| feed_format | string | rss or atom. |
| item_id | string or null | Item GUID / Atom id. |
| title | string | Item title. |
| link | string | Item permalink URL. |
| author | string or null | Item author (dc:creator or atom:author/name). |
| summary | string or null | Short summary (description / atom:summary). |
| content_html | string or null | Full HTML body when available. |
| categories | array | Item categories / tags. |
| published | string or null | Publish timestamp (ISO-8601 if parseable). |
| updated | string or null | Updated timestamp. |
| scraped_at | string | When this row was recorded. |

Example output

{
  "feed_url": "https://news.ycombinator.com/rss",
  "feed_title": "Hacker News",
  "feed_format": "rss",
  "title": "Show HN: \u2026",
  "link": "https://news.ycombinator.com/item?id=48000000",
  "published": "2026-05-15T20:00:00+00:00"
}
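When consuming rows in Python, it can help to convert them into typed objects up front. The sketch below is one possible approach, not something the Actor ships: `FeedItem` and `parse_row` are hypothetical names, and only a subset of the output fields is modelled. Because `published` is "ISO-8601 if parseable", the parser falls back to None rather than raising.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class FeedItem:
    """Subset of the output row, with `published` as a datetime."""
    feed_url: str
    feed_format: str
    title: str
    link: str
    published: Optional[datetime] = None


def parse_row(row: dict) -> FeedItem:
    raw = row.get("published")
    try:
        published = datetime.fromisoformat(raw) if raw else None
    except ValueError:
        # The source feed's date was not parseable as ISO-8601.
        published = None
    return FeedItem(
        feed_url=row["feed_url"],
        feed_format=row["feed_format"],
        title=row["title"],
        link=row["link"],
        published=published,
    )
```

With timezone-aware datetimes in hand, sorting or filtering by publish time no longer depends on string formatting.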

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.001 | Per dataset item |

Example: 1,000 results in a single run cost 1,000 × $0.001 + $0.005 (actor-start) = $1.005. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
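The arithmetic behind that estimate can be written as a small calculator using the two event prices from the table. The function name `run_cost` is hypothetical; `Decimal` is used so that fractions of a cent do not pick up floating-point error.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
PER_RESULT = Decimal("0.001")   # charged per dataset item


def run_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    return runs * ACTOR_START + results * PER_RESULT


print(run_cost(1000))          # 1.005
print(run_cost(1000, runs=4))  # 1.020
```

Splitting the same number of results across more runs only adds the per-run start charge.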

🚧 Limitations

We don't follow <link rel="next"> paged feeds — pass each page URL explicitly. We don't render JavaScript-emitted feeds; you'd need a browser-based Actor for those.
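If you need every page of a paged feed, one workaround is to collect the page URLs yourself and pass them all in feedUrls. The sketch below looks for an Atom `<link rel="next">` at feed level (or inside an RSS channel) and walks the chain. The helper names are hypothetical, and `fetch` is any callable you provide that returns the feed XML for a URL, so the example itself makes no network calls.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"


def next_page_url(xml_text, base_url):
    """Return the absolute URL of the rel="next" link, or None if absent."""
    root = ET.fromstring(xml_text)
    # Feed-level links in Atom, plus atom:link elements inside an RSS channel.
    candidates = root.findall(ATOM + "link") + root.findall("channel/" + ATOM + "link")
    for link in candidates:
        if link.get("rel") == "next" and link.get("href"):
            return urljoin(base_url, link.get("href"))
    return None


def collect_page_urls(start_url, fetch, max_pages=10):
    """Follow rel="next" links from start_url, returning each page URL once."""
    urls = []
    url = start_url
    while url and url not in urls and len(urls) < max_pages:
        urls.append(url)
        url = next_page_url(fetch(url), url)
    return urls
```

The resulting list can be used directly as the feedUrls input; `max_pages` guards against feeds whose next links never end.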

❓ FAQ

Does this handle podcasts?

Yes — podcast RSS is just RSS with <enclosure>. The enclosure URL appears in link for episode rows.

What if a feed is malformed?

feedparser is lenient — it'll parse most malformed feeds and warn rather than fail. The Actor surfaces the items it could extract.

Why is content_html empty?

Some feeds publish summary-only; the full body is on the publisher's site.

Atom vs RSS?

We normalise both. feed_format tells you which dialect we parsed.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.