# News Aggregator - RSS Feed Parser & Article Extractor (`klondikeking/news-aggregator`) Actor

Extract structured news articles from any RSS feed. Get headlines, summaries, publication dates, authors, and source URLs in clean JSON. Perfect for media monitoring, content curation, and news aggregation pipelines.

- **URL**: https://apify.com/klondikeking/news-aggregator.md
- **Developed by:** [Pierrick McD0nald](https://apify.com/klondikeking) (community)
- **Categories:** News, Marketing
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## News Aggregator - RSS Feed Parser — Structured News Data from Any RSS Feed

Aggregate news articles from multiple RSS feeds into clean, structured JSON data. This Actor extracts titles, links, descriptions, publication dates, categories, and source domains from standard RSS 2.0 and Atom feeds, making it easy to monitor news sources, track topics, and build content pipelines without writing custom parsers.

### Use Cases

- **Content Monitoring** — Track news from multiple sources in a single structured dataset for analysis, dashboards, or alerting
- **Market Intelligence** — Monitor industry news, competitor mentions, and trending topics across publications
- **Research & Analysis** — Collect news articles for academic research, sentiment analysis, or trend detection
- **Media Aggregation** — Build curated news feeds for websites, newsletters, or internal dashboards
- **SEO & Content Strategy** — Analyze publishing frequency, topic coverage, and content gaps across news sources

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `feedUrls` | Array | Yes | List of RSS or Atom feed URLs to aggregate (e.g., `['https://feeds.bbci.co.uk/news/rss.xml']`) |
| `maxItems` | Number | No | Maximum total articles to extract across all feeds (default: 100) |
| `proxyConfiguration` | Object | No | Proxy configuration for feed requests. Disabled by default since RSS feeds rarely require proxies |

### Output

The Actor outputs a dataset with the following fields for each article:

```json
{
  "title": "Example News Article Title",
  "link": "https://example.com/news/article",
  "description": "A brief summary of the article content",
  "pubDate": "Mon, 28 Jun 2026 12:00:00 GMT",
  "source": "feeds.bbci.co.uk",
  "category": "World",
  "feedUrl": "https://feeds.bbci.co.uk/news/rss.xml",
  "scrapedAt": "2026-06-28T12:00:00.000Z"
}
````

### Pricing

Pay per event: **$0.001 per article extracted** (equivalent to $1.00 per 1,000 articles). No charge for failed requests or empty feeds. Pricing is based on the number of articles successfully pushed to the dataset.

### Limitations

- This Actor extracts data from RSS and Atom feeds only. It does not scrape full article content from the original websites
- Some RSS feeds may truncate descriptions or omit categories. The Actor returns exactly what the feed provides
- Very large feeds (10,000+ items) may be memory-intensive. Use `maxItems` to limit output
- Feed parsing depends on the RSS/Atom being well-formed. Malformed feeds may return partial data or errors
- This Actor does not deduplicate articles across feeds. The same article appearing in multiple feeds will be extracted multiple times

### FAQ

**Q: Can I use this with any RSS feed?**
A: Yes. The Actor supports standard RSS 2.0 and Atom feed formats. Simply provide the feed URL in the `feedUrls` input field.

**Q: Do I need a proxy?**
A: No. RSS feeds are generally public and do not require proxy rotation. The proxy configuration is disabled by default.

**Q: How many articles can I extract per run?**
A: The `maxItems` input controls the total. The default is 100. There is no hard limit, but very large runs will consume more compute.

**Q: Does this Actor extract full article text?**
A: No. This Actor extracts the metadata provided by the RSS feed (title, link, description, date, category). To extract full article text, you would need to feed the extracted links into a separate article scraper.

### Changelog

- **v1.0.0** — Initial release. RSS 2.0 and Atom feed parsing with structured JSON output, PPE charging, and multi-source aggregation.

# Actor input Schema

## `feedUrls` (type: `array`):

List of RSS feed URLs to aggregate (e.g., \['https://feeds.bbci.co.uk/news/rss.xml'])

## `maxItems` (type: `integer`):

Maximum number of articles to extract across all feeds

## `proxyConfiguration` (type: `object`):

Proxy settings for RSS feed requests

## Actor input object example

```json
{
  "feedUrls": [
    "https://feeds.bbci.co.uk/news/rss.xml"
  ],
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `articles` (type: `string`):

Dataset containing extracted news articles

## `stats` (type: `string`):

Key-value store with aggregation statistics

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "feedUrls": [
        "https://feeds.bbci.co.uk/news/rss.xml"
    ],
    "maxItems": 50,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("klondikeking/news-aggregator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "feedUrls": ["https://feeds.bbci.co.uk/news/rss.xml"],
    "maxItems": 50,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("klondikeking/news-aggregator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "feedUrls": [
    "https://feeds.bbci.co.uk/news/rss.xml"
  ],
  "maxItems": 50,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call klondikeking/news-aggregator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=klondikeking/news-aggregator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "News Aggregator - RSS Feed Parser & Article Extractor",
        "description": "Extract structured news articles from any RSS feed. Get headlines, summaries, publication dates, authors, and source URLs in clean JSON. Perfect for media monitoring, content curation, and news aggregation pipelines.",
        "version": "1.0",
        "x-build-id": "qicfcjKL4dlHD8sI6"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/klondikeking~news-aggregator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-klondikeking-news-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/klondikeking~news-aggregator/runs": {
            "post": {
                "operationId": "runs-sync-klondikeking-news-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/klondikeking~news-aggregator/run-sync": {
            "post": {
                "operationId": "run-sync-klondikeking-news-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "feedUrls"
                ],
                "properties": {
                    "feedUrls": {
                        "title": "RSS Feed URLs",
                        "type": "array",
                        "description": "List of RSS feed URLs to aggregate (e.g., ['https://feeds.bbci.co.uk/news/rss.xml'])",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Maximum Articles",
                        "type": "integer",
                        "description": "Maximum number of articles to extract across all feeds",
                        "default": 100
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Proxy settings for RSS feed requests"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
