RSS / XML Scraper
Pricing
Pay per usage
The RSS / XML Scraper is an Apify Actor that parses RSS feeds and XML files and extracts clean, structured data from them, including from complex sources. It is intended for content aggregation, data monitoring, and content analysis.
Developer: Shahid Irfan
Last modified: 18 days ago
RSS/XML Scraper
📋 What does this Actor do?
This Actor scrapes RSS/Atom feeds and extracts structured data from feed entries. When discoverFeeds is enabled, it also finds RSS feeds on website URLs, and it can optionally extract full article content. All extracted data is stored in the Apify dataset for further processing and analysis.
✨ Key Features
- 📡 Feed Scraping: Extract data from RSS/Atom feeds
- 🔍 Auto Discovery: Find RSS feeds automatically from websites
- 📄 Full Content: Optional extraction of complete article content
- ⚡ Fast Processing: Asynchronous processing for high performance
- 🎯 Structured Data: Clean, structured output in JSON format
- 🔧 Flexible Input: Support for multiple URL formats and input methods
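As an illustration of what auto discovery usually involves, the sketch below scans a page's HTML for `<link rel="alternate">` tags that advertise an RSS or Atom feed. This is a common web convention, not a description of the Actor's internals; `FeedLinkParser` and `discover_feeds` are hypothetical names used only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise feeds in <link rel="alternate"> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs advertised in a page's <link> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel_tokens and is_feed and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.found.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    """Return the feed URLs advertised in the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.found
```

Pages that do not advertise their feeds this way would need other heuristics, such as probing common paths, which this sketch does not attempt.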
📥 Input
The Actor accepts various input formats to accommodate different use cases.
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string or array | Required | - | RSS feed URLs, website URLs, or both. Supported formats: single URL ("https://example.com/feed.xml"); multi-line (one URL per line); comma-separated ("url1,url2,url3"); JSON array (["url1", "url2"]) |
| extractContent | boolean | Optional | false | Extract full article content from feed entry links |
| maxEntries | number | Optional | 0 | Maximum entries to process per feed (0 = all entries) |
| discoverFeeds | boolean | Optional | false | Automatically discover RSS feeds from website URLs |
| userAgent | string | Optional | - | Custom user agent string for HTTP requests |
| timeout | number | Optional | 30 | Request timeout in seconds |
| concurrency | number | Optional | 5 | Maximum number of feeds/websites processed in parallel |
Legacy Parameters (for backward compatibility)
| Parameter | Type | Required | Description |
|---|---|---|---|
| rss_url | string | Optional | Single RSS feed URL (alternative to urls) |
| xml_url | string | Optional | Single XML feed URL (alternative to urls) |
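The accepted `urls` shapes can be reduced to one list before a run is prepared. The following is a minimal sketch of such a normalization step. `normalize_urls` is a hypothetical helper, and merging the legacy `rss_url` / `xml_url` values into the same list is an assumption, since the tables above only describe them as alternatives to `urls`.

```python
import json


def normalize_urls(actor_input):
    """Return an ordered, de-duplicated URL list from the documented input shapes."""
    raw = actor_input.get("urls")
    candidates = []

    if isinstance(raw, list):
        candidates = [str(item) for item in raw]
    elif isinstance(raw, str):
        text = raw.strip()
        parsed = None
        if text.startswith("["):
            # A JSON array passed as a string, e.g. '["url1", "url2"]'.
            try:
                parsed = json.loads(text)
            except json.JSONDecodeError:
                parsed = None
        if isinstance(parsed, list):
            candidates = [str(item) for item in parsed]
        else:
            # Multi-line or comma-separated: treat commas and newlines as separators.
            candidates = text.replace(",", "\n").splitlines()

    # Assumption: legacy single-URL parameters are appended to the same list.
    for key in ("rss_url", "xml_url"):
        value = actor_input.get(key)
        if isinstance(value, str):
            candidates.append(value)

    seen = set()
    urls = []
    for candidate in candidates:
        url = candidate.strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

One consequence of supporting the comma-separated form is that a URL containing a literal comma would be split in two. Passing such URLs as a JSON array avoids the ambiguity.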
📤 Output
The Actor writes structured JSON records, one per processed feed entry, to the Apify dataset.
Data Structure
Each processed entry contains the following fields:
    {
      "feed_url": "https://example.com/feed.xml",
      "title": "Article Title",
      "link": "https://example.com/article",
      "description": "Article description or summary",
      "author": "John Doe",
      "published": "2025-11-08T10:30:00+00:00",
      "id": "unique-entry-identifier",
      "tags": ["tag1", "tag2"],
      "collected_at": "2025-11-08T12:00:00+00:00"
    }
Additional Fields (when extractContent: true)
    {
      "full_text": "Complete article text content...",
      "full_html": "<p>Complete article HTML...</p>",
      "keywords": ["keyword1", "keyword2"],
      "top_image": "https://example.com/image.jpg",
      "authors": ["Author Name"],
      "publish_date": "2025-11-08T10:30:00+00:00",
      "meta_description": "Article meta description"
    }
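When consuming these records downstream, the timestamp fields can be parsed with the standard library, and the optional content fields need a fallback. The sketch below is one way to do that. `collection_lag` and `best_text` are hypothetical helpers, and the code assumes that `published` and `collected_at` are ISO 8601 strings with a UTC offset, as in the sample above.

```python
from datetime import datetime, timedelta
from typing import Optional


def collection_lag(entry: dict) -> Optional[timedelta]:
    """Time between an entry's publication and its collection, if both are present."""
    published = entry.get("published")
    collected = entry.get("collected_at")
    if not published or not collected:
        return None
    # Assumes both values carry an offset; mixing naive and aware values would raise.
    return datetime.fromisoformat(collected) - datetime.fromisoformat(published)


def best_text(entry: dict) -> str:
    """Prefer full_text (only present with extractContent: true), else description."""
    return entry.get("full_text") or entry.get("description") or ""


sample = {
    "title": "Article Title",
    "description": "Article description or summary",
    "published": "2025-11-08T10:30:00+00:00",
    "collected_at": "2025-11-08T12:00:00+00:00",
}
print(collection_lag(sample))  # 1:30:00
print(best_text(sample))       # Article description or summary
```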
Dataset Views
The dataset provides multiple views for different analysis needs:
- 📊 Overview: Complete entry data with all fields
- 📰 Feeds: Feed-level information and metadata
- 📝 Articles: Article content and extracted data
🚀 Usage Examples
Basic RSS Feed Scraping
{"urls": "https://example.com/feed.xml"}
Multiple Feeds
    {
      "urls": [
        "https://blog1.com/feed.xml",
        "https://blog2.com/rss",
        "https://news.com/atom.xml"
      ]
    }
Website Feed Discovery
    {
      "urls": "https://example.com",
      "discoverFeeds": true
    }
Full Content Extraction
    {
      "urls": "https://tech-news.com/feed.xml",
      "extractContent": true,
      "maxEntries": 50
    }
Advanced Configuration
    {
      "urls": "https://example.com/feed.xml",
      "extractContent": true,
      "maxEntries": 100,
      "discoverFeeds": false,
      "userAgent": "Custom Bot/1.0",
      "timeout": 60
    }
Legacy Input Format
    {
      "rss_url": "https://example.com/feed.xml",
      "extractContent": true
    }
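The same inputs can be sent programmatically. The sketch below uses the `apify-client` Python package (a third-party dependency, installed with `pip install apify-client`). `build_run_input` and `run_scraper` are hypothetical helpers, and the Actor ID and API token are left as arguments because this page does not state them.

```python
# Option names taken from the input parameter table.
ALLOWED_OPTIONS = {
    "extractContent", "maxEntries", "discoverFeeds",
    "userAgent", "timeout", "concurrency",
}


def build_run_input(urls, **options):
    """Assemble an input object, rejecting option names the README does not list."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"Unknown input parameters: {sorted(unknown)}")
    return {"urls": urls, **options}


def run_scraper(token, actor_id, run_input):
    """Start a run, wait for it to finish, and return the dataset items as a list."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        return []
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Mirrors the "Full Content Extraction" example.
run_input = build_run_input(
    "https://tech-news.com/feed.xml", extractContent=True, maxEntries=50
)
```

Calling `run_scraper(token, actor_id, run_input)` with real credentials would start a billable run, so the sketch stops at building the input.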
💰 Cost & Performance
Compute Units
- Free: 1,000 entries per month
- Paid: $0.25 per 1,000 entries
Performance
- Typical Speed: 100-500 entries per minute
- Concurrent Processing: Multiple feeds processed simultaneously
- Memory Usage: ~50MB base + ~10MB per active feed
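The figures above allow a rough estimate before a large run. The sketch below turns them into arithmetic. It assumes that billing is prorated per entry beyond the free tier and that the memory and speed figures scale linearly, none of which this page confirms. All function names are hypothetical.

```python
FREE_ENTRIES_PER_MONTH = 1_000
PRICE_PER_1000_ENTRIES_USD = 0.25
BASE_MEMORY_MB = 50
MEMORY_PER_ACTIVE_FEED_MB = 10
ENTRIES_PER_MINUTE_RANGE = (100, 500)  # slowest and fastest typical speed


def estimate_monthly_cost(entries):
    """Cost in USD for a month's entries, assuming prorated billing past the free tier."""
    billable = max(0, entries - FREE_ENTRIES_PER_MONTH)
    return billable / 1000 * PRICE_PER_1000_ENTRIES_USD


def estimate_memory_mb(active_feeds):
    """Approximate memory use with the given number of feeds processed at once."""
    return BASE_MEMORY_MB + MEMORY_PER_ACTIVE_FEED_MB * active_feeds


def estimate_minutes(entries):
    """Return (best_case, worst_case) run time in minutes at the typical speeds."""
    slowest, fastest = ENTRIES_PER_MINUTE_RANGE
    return entries / fastest, entries / slowest


print(estimate_monthly_cost(21_000))  # 5.0
print(estimate_memory_mb(5))          # 100
print(estimate_minutes(10_000))       # (20.0, 100.0)
```

With the default concurrency of 5, the memory estimate comes to about 100 MB.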
⚠️ Limits & Quotas
- Maximum URLs: 100 URLs per run
- Maximum Entries: 10,000 entries per feed (configurable)
- Request Timeout: 300 seconds maximum
- Rate Limiting: Automatic handling of rate limits
- File Size: No limit on extracted content
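These limits can be checked on the client side before a run is started. The following is a minimal sketch. `check_limits` is a hypothetical helper, it assumes `urls` is given as a JSON array (a plain string is counted as a single URL), and how the Actor itself reacts to out-of-range values is not documented here.

```python
MAX_URLS_PER_RUN = 100
MAX_ENTRIES_PER_FEED = 10_000
MAX_TIMEOUT_SECONDS = 300


def check_limits(run_input):
    """Return a list of human-readable problems; an empty list means no limit is exceeded."""
    problems = []

    urls = run_input.get("urls", [])
    # Assumption: a string value is treated as one URL, not split on commas or newlines.
    url_count = len(urls) if isinstance(urls, list) else 1
    if url_count > MAX_URLS_PER_RUN:
        problems.append(f"{url_count} URLs exceeds the limit of {MAX_URLS_PER_RUN} per run")

    max_entries = run_input.get("maxEntries", 0)  # 0 means all entries
    if max_entries > MAX_ENTRIES_PER_FEED:
        problems.append(f"maxEntries {max_entries} exceeds {MAX_ENTRIES_PER_FEED} per feed")

    timeout = run_input.get("timeout", 30)  # documented default is 30 seconds
    if timeout > MAX_TIMEOUT_SECONDS:
        problems.append(f"timeout {timeout}s exceeds the {MAX_TIMEOUT_SECONDS}s maximum")

    return problems
```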
🛠️ Troubleshooting
Common Issues
"No feeds found"
- Check if the URL is accessible
- Verify the URL points to a valid RSS/Atom feed
- Use `discoverFeeds: true` for website URLs
"Content extraction failed"
- Some websites block automated access
- Try with a custom `userAgent`
- Check if the article URL is still valid
"Timeout errors"
- Increase the `timeout` parameter
- Reduce `maxEntries` for large feeds
- Check network connectivity
Error Handling
The Actor automatically handles:
- Network timeouts and retries
- Invalid URLs and feeds
- Malformed content
- Rate limiting from websites
