RSS / XML Scraper

Meet the RSS / XML Scraper: the most advanced actor for parsing any RSS feed or XML file. It effortlessly extracts clean, structured data from even the most complex sources. Your ultimate tool for content aggregation, data monitoring, and content analysis.

- Pricing: Pay per usage
- Rating: 5.0 (4 reviews)
- Developer: Shahid Irfan
- Actor stats: 5 bookmarks, 54 total users, 11 monthly active users
- Last modified: 10 days ago
RSS XML Feed Scraper
Extract RSS and Atom feed data into structured datasets for monitoring, research, and content pipelines. Collect feed metadata, article metadata, tags, media links, and optional full article text at scale. Built for reliable feed ingestion with clean output suitable for automation and analytics.
Features
- Feed URL list input — Add one or many feed URLs quickly with string-list input.
- Feed discovery mode — Discover valid feeds from website URLs when needed.
- Full article expansion — Expand snippet-only items into richer full text and HTML.
- Batch processing — Process feed entries in batches for faster and steadier runs.
- Batch dataset writes — Push extracted items in batches for better write throughput.
- Fallback feed support — Runs with a default BBC feed when no URL is provided.
- Proxy-ready input — Optional proxy configuration with Apify Proxy disabled by default.
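For intuition on what parsing an RSS feed involves, here is a minimal stdlib sketch (not the actor's actual implementation) that walks an RSS 2.0 document with `xml.etree.ElementTree` and pulls out the title/link pairs:

```python
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>First story</title><link>https://example.com/1</link></item>
  <item><title>Second story</title><link>https://example.com/2</link></item>
</channel></rss>"""

def parse_rss(xml_text):
    """Extract a title/link dict for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    entries = []
    for item in root.iter("item"):
        entries.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        })
    return entries

print(parse_rss(SAMPLE_RSS))
```

Real-world feeds add namespaces, Atom variants, and malformed markup, which is exactly the complexity the actor handles for you.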
Use Cases
News Monitoring
Track breaking stories from multiple publishers in one scheduled run. Store normalized records for alerts, dashboards, and trend tracking.
Content Aggregation
Collect article headlines, descriptions, publish dates, and links for newsletters and curation workflows. Expand snippets into fuller article text when needed.
Competitive Intelligence
Monitor competitor blogs and media feeds continuously. Compare publishing frequency, topic clusters, and update cadence.
Research Datasets
Build structured datasets for NLP, topic modeling, and sentiment workflows. Export in JSON, CSV, Excel, or XML for downstream analysis.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | Array | No | `["https://feeds.bbci.co.uk/news/rss.xml"]` | List of feed URLs or website URLs. |
| `extractContent` | Boolean | No | `false` | Forces full article extraction for entries. |
| `autoExpandSnippets` | Boolean | No | `true` | Expands snippet-only feed items to fuller content automatically. |
| `maxEntries` | Integer | No | `20` | Maximum entries per feed. Use `0` to process all entries. |
| `discoverFeeds` | Boolean | No | `false` | If `true`, website URLs are scanned for valid feeds. |
| `userAgent` | String | No | `""` | Optional custom user agent header. |
| `proxyConfiguration` | Object | No | `{ "useApifyProxy": false }` | Optional proxy settings. |
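The defaults in the table can be captured in a small helper when building run inputs programmatically. The `build_input` function below is a hypothetical convenience sketched in Python, not part of the actor:

```python
import json

# Documented defaults from the input parameter table above.
DEFAULT_INPUT = {
    "urls": ["https://feeds.bbci.co.uk/news/rss.xml"],
    "extractContent": False,
    "autoExpandSnippets": True,
    "maxEntries": 20,
    "discoverFeeds": False,
    "userAgent": "",
    "proxyConfiguration": {"useApifyProxy": False},
}

def build_input(**overrides):
    """Merge user overrides onto the documented defaults, with light validation."""
    payload = {**DEFAULT_INPUT, **overrides}
    if not isinstance(payload["urls"], list) or not payload["urls"]:
        raise ValueError("urls must be a non-empty list")
    if payload["maxEntries"] < 0:
        raise ValueError("maxEntries must be 0 (all entries) or a positive integer")
    return payload

print(json.dumps(build_input(maxEntries=50), indent=2))
```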
Output Data
Each dataset item contains feed-level or entry-level fields, depending on `item_type`.
| Field | Type | Description |
|---|---|---|
| `item_type` | String | Record type such as `feed_meta` or `entry`. |
| `feed_url` | String | Source feed URL. |
| `title` | String | Feed title or article title. |
| `link` | String | Feed homepage or article URL. |
| `description` | String | Description or teaser text. |
| `summary` | String | Summary text from feed payload. |
| `content` | String | Parsed content text from feed payload. |
| `author` | String | Primary author name when available. |
| `authors` | Array | Author list when available. |
| `published` | String | Published time in ISO format when available. |
| `updated` | String | Updated time in ISO format when available. |
| `tags` | Array | Categories or tags from feed entries. |
| `source_title` | String | Source/channel title for the entry when present. |
| `source_url` | String | Source/channel URL for the entry when present. |
| `enclosure_url` | String | Enclosure/media URL when present. |
| `image_url` | String | Best available image URL from feed fields. |
| `full_text` | String | Expanded full article text. |
| `full_html` | String | Cleaned full article HTML. |
| `meta_description` | String | Article meta description when available. |
| `top_image` | String | Primary article image when available. |
| `publish_date` | String | Article publish timestamp when available. |
| `content_source` | String | Indicates whether full content came from the feed payload or the article page. |
| `content_error` | String | Full-content extraction error details, if any. |
| `error` | String | Processing error details, if any. |
| `collected_at` | String | Collection timestamp in ISO format. |
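Because every record carries `item_type`, downstream code can split a downloaded dataset into feed metadata and article entries. A small illustrative helper (`split_records` is hypothetical, not provided by the actor):

```python
def split_records(items):
    """Partition dataset items into feed-level metadata and article entries."""
    feeds = [i for i in items if i.get("item_type") == "feed_meta"]
    entries = [i for i in items if i.get("item_type") == "entry"]
    return feeds, entries

# Illustrative sample of mixed dataset records.
sample = [
    {"item_type": "feed_meta", "title": "Example Feed"},
    {"item_type": "entry", "title": "Story A", "tags": ["world"]},
    {"item_type": "entry", "title": "Story B", "tags": []},
]
feeds, entries = split_records(sample)
print(len(feeds), len(entries))  # → 1 2
```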
Usage Examples
Basic Feed Collection
```json
{
  "urls": ["https://feeds.bbci.co.uk/news/rss.xml"]
}
```
Multiple Feeds
```json
{
  "urls": [
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://www.theguardian.com/world/rss",
    "https://hnrss.org/frontpage"
  ],
  "maxEntries": 50
}
```
Force Full Article Extraction
```json
{
  "urls": ["https://feeds.bbci.co.uk/news/rss.xml"],
  "extractContent": true,
  "maxEntries": 25
}
```
Discover Feeds from Website URLs
```json
{
  "urls": ["https://example.com"],
  "discoverFeeds": true
}
```
Proxy Configuration
```json
{
  "urls": ["https://feeds.bbci.co.uk/news/rss.xml"],
  "proxyConfiguration": {"useApifyProxy": true}
}
```
Sample Output
```json
{
  "item_type": "entry",
  "feed_url": "https://feeds.bbci.co.uk/news/rss.xml",
  "title": "Example Article Title",
  "link": "https://www.example.com/articles/123",
  "description": "Short teaser from the feed.",
  "summary": "Extended summary text from feed payload.",
  "author": "Reporter Name",
  "published": "2026-04-01T12:20:51.000Z",
  "tags": ["world", "politics"],
  "source_title": "Example News",
  "full_text": "Expanded article text...",
  "content_source": "article_page",
  "collected_at": "2026-04-01T15:30:00.000Z"
}
```
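Timestamps like `published` and `collected_at` are ISO 8601 strings, so standard date tooling applies directly. A small illustrative stdlib sketch computing the lag between publication and collection for the sample record:

```python
import json
from datetime import datetime

# A trimmed-down version of the sample output record.
record = json.loads("""{
  "item_type": "entry",
  "title": "Example Article Title",
  "published": "2026-04-01T12:20:51.000Z",
  "collected_at": "2026-04-01T15:30:00.000Z"
}""")

# ISO timestamps here end in "Z"; rewrite it as "+00:00" so that
# datetime.fromisoformat accepts it on all supported Python versions.
published = datetime.fromisoformat(record["published"].replace("Z", "+00:00"))
collected = datetime.fromisoformat(record["collected_at"].replace("Z", "+00:00"))
lag = collected - published
print(lag)  # → 3:09:09
```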
Tips for Best Results
Use Stable Feed URLs
- Prefer canonical RSS/Atom URLs from publishers.
- Keep feed URL lists clean and deduplicated.
Start with Moderate Limits
- Use `maxEntries` around 20 to validate data quality quickly.
- Increase limits after verifying target feed behavior.
Enable Full Content Only When Needed
- Keep `extractContent` off for lightweight metadata pipelines.
- Turn it on for downstream NLP, summarization, or archiving workflows.
Handle Site Restrictions
- Use `proxyConfiguration` when targets rate-limit requests.
- Set a custom `userAgent` for sites with strict header checks.
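The "clean and deduplicated" advice above can be automated before submitting a run. A small illustrative helper (`clean_urls` is hypothetical, not part of the actor) that drops duplicates and non-HTTP(S) entries while preserving order:

```python
from urllib.parse import urlparse

def clean_urls(urls):
    """Drop non-http(s) URLs and duplicates, preserving first-seen order."""
    seen, cleaned = set(), []
    for url in urls:
        url = url.strip()
        if urlparse(url).scheme not in ("http", "https"):
            continue  # skip ftp://, file://, bare hostnames, etc.
        if url not in seen:
            seen.add(url)
            cleaned.append(url)
    return cleaned

print(clean_urls([
    "https://feeds.bbci.co.uk/news/rss.xml",
    "https://feeds.bbci.co.uk/news/rss.xml",  # duplicate
    "ftp://example.com/feed",                 # wrong scheme
]))
```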
Integrations
Connect extracted feed data with:
- Google Sheets — Build live monitoring sheets.
- Airtable — Create searchable editorial databases.
- Slack — Trigger alerts for new stories or keywords.
- Webhooks — Send records to internal services in real time.
- Make — Automate enrichment and routing flows.
- Zapier — Connect feeds to business tools without code.
Export Formats
- JSON — API and backend workflows.
- CSV — Spreadsheet analytics.
- Excel — Business reporting.
- XML — Legacy system integration.
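For the CSV path, a downloaded list of entry records can be flattened with the stdlib `csv` module. The sample data below is illustrative, not actual actor output:

```python
import csv
import io

# Illustrative entry records with a few of the documented fields.
entries = [
    {"title": "Story A", "link": "https://example.com/a", "published": "2026-04-01T12:00:00Z"},
    {"title": "Story B", "link": "https://example.com/b", "published": "2026-04-01T13:00:00Z"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "link", "published"])
writer.writeheader()
writer.writerows(entries)
print(buf.getvalue())
```

In practice the Apify platform exports datasets to these formats directly; a manual conversion like this is only needed for custom pipelines.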
Frequently Asked Questions
What happens if I do not provide any feed URL?
The actor uses a default BBC feed and still produces data.
Can I scrape multiple feeds in one run?
Yes, provide multiple URLs in the `urls` list.
Does it collect feed metadata and entry data?
Yes, output includes feed-level metadata records and entry-level records.
Can it expand short snippets to full text?
Yes, set `extractContent: true` or keep `autoExpandSnippets: true`.
Will it fail if an article page blocks extraction?
No, it falls back to feed-provided summary/content when possible.
Does it support proxies?
Yes, pass `proxyConfiguration`. By default, Apify Proxy is disabled.
Can I use website URLs instead of direct feed URLs?
Yes, enable `discoverFeeds` to discover feed endpoints from website URLs.
Support
For issues or feature requests, use the Apify actor page discussion and support channels.
Legal Notice
This actor is intended for legitimate data collection. You are responsible for complying with website terms, robots policies, and applicable laws in your jurisdiction.
