RSS / XML Scraper

Meet the RSS / XML Scraper: an advanced actor for parsing any RSS feed or XML file. It effortlessly extracts clean, structured data from even the most complex sources, making it your go-to tool for content aggregation, data monitoring, and content analysis.

Pricing: Pay per usage
Rating: 5.0 (4 reviews)
Developer: Shahid Irfan
Actor stats: 5 bookmarks · 45 total users · 13 monthly active users · last modified 10 days ago
RSS & XML Feed Scraper
Extract structured data from RSS, Atom, and XML feeds at scale. Collect article titles, links, descriptions, authors, timestamps, and full article text from any feed URL or website. Perfect for content aggregation, news monitoring, research, and automated data pipelines.
Features
- Multi-Feed Support — Process dozens of RSS, Atom, and XML feeds in a single run
- Auto Feed Discovery — Automatically detect and extract feed URLs from any website
- Full Article Extraction — Optionally fetch complete article text, keywords, and metadata from each entry link
- Flexible URL Input — Accepts single URLs, multi-line lists, comma-separated, or JSON arrays
- Concurrent Processing — Configurable parallel processing for high-throughput collection
- Legacy Compatibility — Supports `rss_url` and `xml_url` parameters for backward compatibility
- Structured Output — Clean, consistent JSON output ready for analysis or integration
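The flexible URL input described above can be sketched in Python. The helper name `normalize_urls` is hypothetical and the actor's real parsing may differ; this is a minimal sketch of accepting all four input shapes (single URL, multi-line text, comma-separated list, JSON array):

```python
import json

def normalize_urls(raw):
    """Flatten a single URL, multi-line text, comma-separated list,
    or JSON array into a list of URL strings (hypothetical helper)."""
    if isinstance(raw, list):
        return [u.strip() for u in raw if u.strip()]
    text = raw.strip()
    # A JSON array passed as a string, e.g. '["https://a", "https://b"]'
    if text.startswith("["):
        try:
            return [u.strip() for u in json.loads(text)]
        except json.JSONDecodeError:
            pass
    # Otherwise split on newlines first, then commas within each line
    urls = []
    for line in text.splitlines():
        urls.extend(part.strip() for part in line.split(","))
    return [u for u in urls if u]
```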
Use Cases
Content Aggregation & News Monitoring
Build a centralized news feed by pulling articles from multiple sources simultaneously. Track breaking news, industry updates, and blog posts in real time without manual browsing.
Competitive Intelligence
Monitor competitors' blogs, press releases, and product update feeds. Stay ahead by automatically collecting and analyzing new content as soon as it is published.
Research & Academic Data Collection
Gather large volumes of timestamped articles from scholarly or industry publications. Build annotated datasets for NLP, sentiment analysis, or trend research.
Content Pipeline Automation
Feed scraped articles directly into CMS platforms, email newsletters, or Slack channels. Automate content curation workflows without writing custom parsers.
SEO & Backlink Monitoring
Track mentions of your brand or keywords across RSS-enabled blogs and news sites. Identify backlink opportunities and monitor your online presence at scale.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | String | Yes | — | RSS/Atom feed URLs or website URLs. Accepts a single URL, multi-line list, comma-separated list, or JSON array |
| `extractContent` | Boolean | No | `false` | Extract full article text, keywords, and metadata from each entry link |
| `maxEntries` | Integer | No | `0` | Max entries per feed (`0` = all entries) |
| `discoverFeeds` | Boolean | No | `false` | Scan website URLs to automatically discover RSS/Atom feeds |
| `userAgent` | String | No | — | Custom user agent string for HTTP requests |
| `timeout` | Integer | No | `30` | HTTP request timeout in seconds |
| `concurrency` | Integer | No | `5` | Number of feeds or websites to process in parallel |
| `rss_url` | String | No | — | Legacy: single RSS feed URL (alternative to `urls`) |
| `xml_url` | String | No | — | Legacy: single XML feed URL (alternative to `urls`) |
Output Data
Each item in the dataset contains:
| Field | Type | Description |
|---|---|---|
| `feed_url` | String | Source feed URL |
| `title` | String | Article or entry title |
| `link` | String | URL of the full article |
| `description` | String | Summary or excerpt from the feed |
| `author` | String | Author name as listed in the feed |
| `published` | String | Publication date in ISO 8601 format |
| `id` | String | Unique entry identifier |
| `tags` | Array | List of category tags or labels |
| `collected_at` | String | Timestamp of when the data was collected |
Additional Fields (when extractContent is enabled)
| Field | Type | Description |
|---|---|---|
| `full_text` | String | Complete plain-text article content |
| `full_html` | String | Complete article HTML content |
| `keywords` | Array | Keywords extracted from article metadata |
| `top_image` | String | URL of the article's main image |
| `authors` | Array | All authors listed on the article page |
| `publish_date` | String | Publish date parsed from article page |
| `meta_description` | String | Meta description tag from the article page |
Usage Examples
Basic RSS Feed Scraping
Extract all entries from a single RSS feed:
```json
{
  "urls": "https://feeds.bbci.co.uk/news/rss.xml"
}
```
Multiple Feeds at Once
Collect data from several feeds in one run:
```json
{
  "urls": [
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://rss.cnn.com/rss/edition.rss",
    "https://feeds.reuters.com/reuters/topNews"
  ]
}
```
Website Feed Discovery
Let the actor find and extract feeds from a website automatically:
```json
{
  "urls": "https://techcrunch.com",
  "discoverFeeds": true
}
```
Full Article Content Extraction
Fetch complete article text and metadata along with feed data:
```json
{
  "urls": "https://feeds.bbci.co.uk/news/technology/rss.xml",
  "extractContent": true,
  "maxEntries": 50
}
```
High-Volume Concurrent Collection
Process multiple feeds with increased concurrency and a custom limit:
```json
{
  "urls": [
    "https://blog1.com/feed.xml",
    "https://blog2.com/rss",
    "https://news.com/atom.xml"
  ],
  "maxEntries": 100,
  "concurrency": 10,
  "timeout": 60
}
```
Sample Output
```json
{
  "feed_url": "https://feeds.bbci.co.uk/news/rss.xml",
  "title": "Global Markets Rally on Trade Deal Optimism",
  "link": "https://www.bbc.com/news/business/article-12345",
  "description": "Stock markets rose sharply after new trade agreements were announced.",
  "author": "Jane Smith",
  "published": "2025-11-08T10:30:00+00:00",
  "id": "urn:bbc:news:article-12345",
  "tags": ["business", "markets", "economy"],
  "collected_at": "2025-11-08T12:00:00+00:00",
  "full_text": "Global markets saw significant gains on Friday following...",
  "keywords": ["trade", "markets", "economy", "stocks"],
  "top_image": "https://ichef.bbci.co.uk/news/1024/branded_news/image.jpg",
  "authors": ["Jane Smith"],
  "publish_date": "2025-11-08T10:30:00+00:00",
  "meta_description": "Markets rise after optimism over new trade deal."
}
```
Tips for Best Results
Choosing the Right URLs
- Use direct feed URLs (ending in `.xml`, `/rss`, `/feed`, or `/atom`) for fastest results
- Enable `discoverFeeds` when you only have a website homepage URL
- Test with the BBC or Reuters feed to verify your setup works correctly
Managing Entry Volume
- Leave `maxEntries` at `0` to collect all available entries
- Set a specific limit (e.g., `50` or `100`) to control run time and cost
- Start small during testing, then scale up for production
Optimizing Speed
- Increase `concurrency` to `10` or higher when processing many independent feeds
- Reduce `concurrency` if a target site starts blocking requests
- Use a custom `userAgent` if feeds return 403 errors
Full Content Extraction
- Enable `extractContent` only when you need the full article body, not just feed summaries
- This option increases run time and credit usage proportionally
- Some websites block automated article fetching — use a custom `userAgent` if needed
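The interplay of `concurrency` and per-feed failure handling can be sketched with a thread pool. `process_feeds` and `fetch` are hypothetical stand-ins for the actor's internal HTTP and parsing step; the key behavior mirrored here is that a failing feed is recorded and skipped rather than aborting the run:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_feeds(urls, fetch, concurrency=5):
    """Fetch feeds in parallel, mirroring the `concurrency` setting.
    `fetch` is any callable taking a URL and returning parsed entries."""
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results.append((url, future.result()))
            except Exception as exc:
                # A failing feed is logged and skipped, never fatal
                errors.append((url, str(exc)))
    return results, errors
```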
Integrations
Connect your collected feed data with popular tools:
- Google Sheets — Export CSV for quick spreadsheet analysis
- Airtable — Build a searchable content database
- Slack — Send new article alerts to your team channel
- Make (Integromat) — Trigger automated content workflows
- Zapier — Route new entries into any connected app
- Webhooks — Push results to your own API or backend
Export Formats
Download your dataset in multiple formats:
- JSON — For developers and API integrations
- CSV — For spreadsheet and reporting tools
- Excel — For business analysis and dashboards
- XML — For legacy system integrations
Frequently Asked Questions
What feed formats are supported?
The actor supports RSS 2.0, RSS 1.0, Atom, and most standard XML feed formats. It automatically detects the format and parses entries accordingly.
Can I scrape feeds that require authentication?
Currently the actor supports public feeds only. Private or authenticated feeds are not supported.
How do I scrape multiple feeds at once?
Provide multiple URLs in the urls field as a JSON array, comma-separated list, or multi-line text — all formats are accepted.
What does discoverFeeds do exactly?
When enabled, the actor visits the provided website URL, searches the HTML for RSS/Atom feed link tags, and automatically extracts entries from any feeds it finds.
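A minimal sketch of that discovery step, assuming the standard `<link rel="alternate">` autodiscovery convention (the class name `FeedLinkFinder` is hypothetical; the actor's real discovery may check more patterns):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect RSS/Atom <link rel="alternate"> tags from a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") in FEED_TYPES and a.get("href")):
            # Resolve relative hrefs against the page URL
            self.feeds.append(urljoin(self.base_url, a["href"]))

page = ('<html><head>'
        '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
        '<link rel="stylesheet" href="/style.css">'
        '</head></html>')
finder = FeedLinkFinder("https://example.com")
finder.feed(page)
```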
Is there a limit on the number of feeds I can process?
Up to 100 URLs per run are supported. Each feed can return up to 10,000 entries (configurable via maxEntries).
How fast is the actor?
Typical speed is 100–500 entries per minute depending on feed size, concurrency setting, and whether full content extraction is enabled.
What happens if a feed URL is invalid or unreachable?
The actor logs an error for that feed, skips it, and continues processing the remaining URLs without stopping the run.
Can I use this on a schedule?
Yes — use Apify Schedules to run this actor automatically on any interval (hourly, daily, weekly) to keep your dataset fresh.
Why is full content extraction slower?
When extractContent is enabled, the actor visits each article's link individually to fetch and parse the full page content, which adds additional HTTP requests per entry.
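As a toy illustration of the per-entry extraction step, here is a simple tag-stripper built on Python's standard library. The actor's real extractor is more sophisticated (keyword, author, and image extraction), and `html_to_text` is a hypothetical name:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html):
    extractor = TextExtractor()
    extractor.feed(html)
    return " ".join(extractor.parts)
```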
What is the difference between urls, rss_url, and xml_url?
urls is the primary input and supports all formats. rss_url and xml_url are legacy parameters retained for backward compatibility with older integrations.
Support
For issues or feature requests, contact support through the Apify Console.
Legal Notice
This actor is designed for legitimate data collection purposes. Users are responsible for ensuring compliance with the terms of service of any website or feed they access, as well as all applicable laws and regulations. Use collected data responsibly and respect rate limits and robots.txt directives.

