# RSS Feed Aggregator & Article Extractor (`darknezz/rss-feed-aggregator`) Actor

Aggregate RSS/Atom feeds and extract full article content. Multi-feed ingestion, deduplication, keyword filtering, rich metadata. Returns clean JSON with full-text extraction. For news monitoring, AI training, and curation.

- **URL**: https://apify.com/darknezz/rss-feed-aggregator.md
- **Developed by:** [Oaida Adrian](https://apify.com/darknezz) (community)
- **Categories:** News, AI, Automation
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## RSS Feed Aggregator & Article Extractor

Aggregate RSS/Atom feeds and extract full article content with clean, structured JSON output.

### Features

- **Multi-feed aggregation** — Process dozens of RSS/Atom feeds in a single run
- **Full article extraction** — Uses [trafilatura](https://github.com/adbar/trafilatura) for high-precision main content extraction
- **Smart deduplication** — Removes duplicate articles by URL or title
- **Date & category filtering** — Filter articles by publish date or feed categories
- **Rich metadata** — Extracts authors, tags, enclosures, images, word counts, and more
- **Structured output** — Clean JSON schema ready for downstream processing, AI training, or content curation

### Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `feedUrls` | array | *required* | RSS/Atom feed URLs to aggregate, each given as an object with a `url` field |
| `maxArticles` | integer | 100 | Maximum articles to extract (0 = unlimited) |
| `extractFullContent` | boolean | true | Fetch and extract full article text from each URL |
| `deduplicateBy` | string | "link" | Deduplication field: `link`, `title`, or `none` |
| `dateFilter` | string | "" | Only include articles published after this date (ISO 8601, e.g. `2025-01-01`) |
| `includeCategories` | string | "" | Comma-separated categories to include |
| `proxyConfiguration` | object | Apify proxy | Proxy settings for content extraction |

### Output Fields

| Field | Description |
|---|---|
| `title` | Article title |
| `link` | Article URL |
| `pubDate` | Publication date (ISO 8601) |
| `source` | Feed source name |
| `author` | Article author |
| `summary` | RSS feed summary/excerpt |
| `fullText` | Full extracted article text (if `extractFullContent` is enabled) |
| `wordCount` | Word count of extracted text |
| `categories` | Article categories/tags |
| `tags` | Extracted meta tags |
| `enclosures` | Media attachments (images, podcasts, etc.) |

### Use Cases

- **News monitoring** — Track multiple news sources in one feed
- **Content curation** — Aggregate niche content for blogs or newsletters
- **AI training data** — Collect clean text for model training
- **Media intelligence** — Monitor competitors, track mentions, analyse trends
- **SEO monitoring** — Track industry publications and backlink opportunities

### Pricing

This Actor uses **pay-per-event** pricing. You are charged per article successfully extracted.

# Actor input Schema

## `feedUrls` (type: `array`):

List of RSS/Atom feed URLs to aggregate. Required. Each entry is an object with a `url` string.

## `maxArticles` (type: `integer`):

Maximum number of articles to extract per run. 0 = no limit. Default: 100.

## `extractFullContent` (type: `boolean`):

If enabled, fetches and extracts the full article text from each article URL. Default: true.

## `deduplicateBy` (type: `string`):

Field to use for deduplication: `link`, `title`, or `none`. Default: `link`.

## `dateFilter` (type: `string`):

Only include articles published after this date (ISO 8601, e.g. 2025-01-01). Leave empty for no filter.

## `includeCategories` (type: `string`):

Comma-separated list of categories to include. Leave empty for all.

## `proxyConfiguration` (type: `object`):

Use Apify proxy for article content extraction.

## Actor input object example

```json
{
  "feedUrls": [
    {
      "url": "https://example.com/feed.xml"
    }
  ],
  "maxArticles": 100,
  "extractFullContent": true,
  "deduplicateBy": "link",
  "dateFilter": "",
  "includeCategories": "",
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
````
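
The input schema above can be checked on the client before a run is started. The following is a minimal sketch, not part of the Actor itself: `build_run_input` is a hypothetical helper, the feed URL in the usage note is a placeholder, and the checks only mirror the constraints listed in this document (required `feedUrls`, non-negative `maxArticles`, the three `deduplicateBy` values, an ISO 8601 `dateFilter`).

```python
from datetime import datetime

# Values listed for `deduplicateBy` in the input schema.
ALLOWED_DEDUP = {"link", "title", "none"}


def build_run_input(
    feed_urls,
    max_articles=100,
    extract_full_content=True,
    deduplicate_by="link",
    date_filter="",
    include_categories="",
):
    """Return a dict shaped like the Actor input (hypothetical helper)."""
    if not feed_urls:
        raise ValueError("feedUrls is required and must not be empty")
    if max_articles < 0:
        raise ValueError("maxArticles must be >= 0 (0 means no limit)")
    if deduplicate_by not in ALLOWED_DEDUP:
        raise ValueError(f"deduplicateBy must be one of {sorted(ALLOWED_DEDUP)}")
    if date_filter:
        # Client-side sanity check only; raises ValueError on malformed dates.
        datetime.fromisoformat(date_filter)
    return {
        # The schema describes each feed as an object with a `url` string.
        "feedUrls": [{"url": url} for url in feed_urls],
        "maxArticles": max_articles,
        "extractFullContent": extract_full_content,
        "deduplicateBy": deduplicate_by,
        "dateFilter": date_filter,
        "includeCategories": include_categories,
        "proxyConfiguration": {"useApifyProxy": True},
    }
```

For example, `build_run_input(["https://example.com/feed.xml"], max_articles=10)` returns a dict that can be passed as the run input in the Python example in the API section.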

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

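// Prepare Actor input. `feedUrls` is required by the input schema;
// the URL below is a placeholder, replace it with a real feed.
const input = {
    "feedUrls": [{ "url": "https://example.com/feed.xml" }],
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};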

// Run the Actor and wait for it to finish
const run = await client.actor("darknezz/rss-feed-aggregator").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

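# Prepare the Actor input. `feedUrls` is required by the input schema;
# the URL below is a placeholder, replace it with a real feed.
run_input = {
    "feedUrls": [{"url": "https://example.com/feed.xml"}],
    "proxyConfiguration": {"useApifyProxy": True},
}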

# Run the Actor and wait for it to finish
run = client.actor("darknezz/rss-feed-aggregator").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
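
Items fetched from the dataset can also be post-processed locally, for instance when merging results from several runs. The sketch below mirrors the `deduplicateBy` and `dateFilter` options on the client side; it is not the Actor's own implementation. The helper names are hypothetical, the item keys (`link`, `title`, `pubDate`) come from the Output Fields table, and treating the cutoff date as inclusive is an assumption.

```python
from datetime import datetime


def parse_iso(value):
    """Parse an ISO 8601 string; return None if missing or unparseable."""
    if not value:
        return None
    try:
        # Python < 3.11 rejects a trailing "Z", so normalise it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def dedupe(items, key="link"):
    """Keep the first item seen for each distinct value of `key`."""
    if key == "none":
        return list(items)
    seen = set()
    unique = []
    for item in items:
        value = (item.get(key) or "").strip()
        if key == "title":
            value = value.casefold()  # compare titles case-insensitively
        if value and value in seen:
            continue  # duplicate of an earlier item
        seen.add(value)
        unique.append(item)  # items without the key are always kept
    return unique


def published_on_or_after(items, cutoff):
    """Keep items whose `pubDate` falls on or after the cutoff date."""
    cutoff_date = datetime.fromisoformat(cutoff).date()
    kept = []
    for item in items:
        published = parse_iso(item.get("pubDate"))
        if published is not None and published.date() >= cutoff_date:
            kept.append(item)
    return kept
```

Comparing calendar dates rather than full timestamps avoids mixing timezone-aware and naive values, at the cost of ignoring the time of day.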

## CLI example

```bash
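# feedUrls is required by the input schema; the URL below is a placeholder.
echo '{
  "feedUrls": [{ "url": "https://example.com/feed.xml" }],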
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call darknezz/rss-feed-aggregator --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=darknezz/rss-feed-aggregator",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RSS Feed Aggregator & Article Extractor",
        "description": "Aggregate RSS/Atom feeds and extract full article content. Multi-feed ingestion, deduplication, keyword filtering, rich metadata. Returns clean JSON with full-text extraction. For news monitoring, AI training, and curation.",
        "version": "0.1",
        "x-build-id": "UiQ4dyLNmmnP1AwzL"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/darknezz~rss-feed-aggregator/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-darknezz-rss-feed-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/darknezz~rss-feed-aggregator/runs": {
            "post": {
                "operationId": "runs-sync-darknezz-rss-feed-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/darknezz~rss-feed-aggregator/run-sync": {
            "post": {
                "operationId": "run-sync-darknezz-rss-feed-aggregator",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "feedUrls"
                ],
                "properties": {
                    "feedUrls": {
                        "title": "RSS/Atom Feed URLs",
                        "type": "array",
                        "description": "List of RSS/Atom feed URLs to aggregate.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "title": "Feed URL",
                                    "description": "URL of the RSS or Atom feed.",
                                    "type": "string"
                                }
                            }
                        }
                    },
                    "maxArticles": {
                        "title": "Maximum Articles",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of articles to extract per run. 0 = no limit.",
                        "default": 100
                    },
                    "extractFullContent": {
                        "title": "Extract Full Article Content",
                        "type": "boolean",
                        "description": "If enabled, fetches and extracts the full article text from each article URL.",
                        "default": true
                    },
                    "deduplicateBy": {
                        "title": "Deduplicate by",
                        "enum": [
                            "link",
                            "title",
                            "none"
                        ],
                        "type": "string",
                        "description": "Field to use for deduplication.",
                        "default": "link"
                    },
                    "dateFilter": {
                        "title": "Date filter",
                        "type": "string",
                        "description": "Only include articles published after this date (ISO 8601, e.g. 2025-01-01). Leave empty for no filter.",
                        "default": ""
                    },
                    "includeCategories": {
                        "title": "Include only these categories",
                        "type": "string",
                        "description": "Comma-separated list of categories to include. Leave empty for all.",
                        "default": ""
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Use Apify proxy for article content extraction."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
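
For projects that do not use the client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a rough sketch using only the Python standard library. `build_request` and `fetch_items` are hypothetical helper names, the 300-second timeout is an arbitrary choice, and the response body is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

# Server URL and path taken from the OpenAPI specification above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/darknezz~rss-feed-aggregator/run-sync-get-dataset-items"


def build_request(token, run_input):
    """Build the POST request without sending it."""
    # The specification defines `token` as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(token, run_input, timeout=300):
    """Run the Actor synchronously and return the decoded response body."""
    request = build_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumption: the endpoint answers with a JSON array of items.
        return json.load(response)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect or log before any platform usage is incurred.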
