# RSS / Atom Feed to Dataset (`wiry_kingdom/rss-feed-to-dataset`) Actor

Convert any RSS 2.0, Atom 1.0, or RDF feed into a clean structured dataset. Extracts title, link, pubDate, author, summary, content, categories, enclosures. Works with podcasts, news, blogs, GitHub releases. No API keys.

- **URL**: https://apify.com/wiry_kingdom/rss-feed-to-dataset.md
- **Developed by:** [Mohieldin Mohamed](https://apify.com/wiry_kingdom) (community)
- **Categories:** Developer tools, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## RSS / Atom Feed to Dataset

**Convert any RSS, Atom, or RDF feed into a clean structured Apify dataset in seconds.**

Point this actor at a feed URL — Hacker News, GitHub releases, Reddit, your favorite blog, an iTunes podcast, the New York Times — and it returns every item as a normalized JSON row you can download as JSON, CSV, HTML, or Excel.

### What does RSS Feed to Dataset do?

Web feeds are a goldmine for content monitoring, news aggregation, and competitive intelligence, but every feed is slightly different, and hand-written parsers tend to break on edge cases (CDATA sections, namespaced tags, empty fields, inconsistent date formats). This Actor handles those cases for you and returns a **single normalized output schema** whether the feed is RSS 2.0, Atom 1.0, or RDF 1.0.
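
The Actor's source is not part of this README, so the following is only an illustrative sketch of what mapping the three formats onto one schema involves, written with Python's standard library. The function names and the reduced set of fields are assumptions; it ignores content, categories, enclosures, and malformed XML.

```python
import xml.etree.ElementTree as ET

# Illustrative sketch only; not the Actor's implementation.
# Standard namespace URIs for Atom 1.0, RDF/RSS 1.0, Dublin Core, and content:encoded.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss1": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def _text(el, path):
    """Return the stripped text of the first element matching `path`, or None."""
    found = el.find(path, NS)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def normalize_feed(xml_text):
    """Hypothetical helper: map RSS 2.0, Atom 1.0, or RDF 1.0 items to one dict shape."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":  # RSS 2.0: un-namespaced <item> elements
        for it in root.iter("item"):
            items.append({
                "title": _text(it, "title"),
                "link": _text(it, "link"),
                "guid": _text(it, "guid"),
                "pubDate": _text(it, "pubDate"),
                "author": _text(it, "author") or _text(it, "dc:creator"),
                "feedType": "rss",
            })
    elif root.tag == f"{{{NS['atom']}}}feed":  # Atom 1.0
        for entry in root.findall("atom:entry", NS):
            link = None
            for ln in entry.findall("atom:link", NS):
                # A <link> without rel defaults to rel="alternate" in Atom.
                if ln.get("rel", "alternate") == "alternate":
                    link = ln.get("href")
                    break
            items.append({
                "title": _text(entry, "atom:title"),
                "link": link,
                "guid": _text(entry, "atom:id"),
                "pubDate": _text(entry, "atom:published") or _text(entry, "atom:updated"),
                "author": _text(entry, "atom:author/atom:name"),
                "feedType": "atom",
            })
    elif root.tag == f"{{{NS['rdf']}}}RDF":  # RDF 1.0: items are children of rdf:RDF
        for it in root.findall("rss1:item", NS):
            items.append({
                "title": _text(it, "rss1:title"),
                "link": _text(it, "rss1:link"),
                "guid": it.get(f"{{{NS['rdf']}}}about"),
                "pubDate": _text(it, "dc:date"),
                "author": _text(it, "dc:creator"),
                "feedType": "rdf",
            })
    return items
```

Keeping the output keys identical across the three branches is what lets downstream code ignore which format a feed used.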

Try it: the default input is **`https://news.ycombinator.com/rss`** — press Start and you'll get back the current top 30 HN stories in seconds.

Apify platform advantages: **scheduled runs** (poll a feed every hour), **API access** (pull the dataset directly into Zapier or n8n), **integrations** (push items to Google Sheets, Slack, or Airtable), and **proxy rotation** if a feed blocks server IPs.

### Why use RSS Feed to Dataset?

- **Content monitoring** — track every new post on a competitor's blog
- **News aggregation** — pull headlines from 50 news sources into one CSV
- **Backup** — archive your own feed regularly so you don't lose old posts
- **LLM and search pipelines** — feed structured news content into an embeddings model
- **Podcast catalog** — extract iTunes feeds into a dataset of episodes with audio URLs
- **Release notification** — watch GitHub release feeds for libraries you depend on
- **Custom Slack bot** — bridge any RSS feed into Slack via Apify webhooks

### How to use RSS Feed to Dataset

1. Click **Try for free** (or **Start** if you're already logged in)
2. Paste one or more feed URLs into **Feed URLs** (e.g. `https://news.ycombinator.com/rss`)
3. Optionally cap **Max items per feed** (default 100)
4. Click **Start**
5. Download the dataset in JSON, CSV, HTML, or Excel — or hit the API endpoint

### Input

- **Feed URLs** — one or more RSS/Atom/RDF feed URLs
- **Max items per feed** — cap on items per feed (default 100, use 0 for unlimited)
- **Include full content** — attach the full body (`<content:encoded>` for RSS, `<content>` for Atom) to each item (default: yes)
- **Include raw XML** — debug mode: attach the original raw item XML (default: no)
- **Proxy configuration** — optional Apify Proxy for paid/protected feeds
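
When calling the Actor from code, note that `feedUrls` is an array of objects with a `url` key (as in the input object example), not a list of plain strings. A minimal sketch of a hypothetical helper that builds an input object matching the documented schema; the check that the list is non-empty is an assumption of this sketch, while the 0 to 10,000 range comes from the input schema.

```python
def build_input(urls, max_items=100, include_content=True,
                include_raw_xml=False, use_proxy=False):
    """Hypothetical helper: build an input dict matching the documented input schema."""
    if not urls:
        # feedUrls is required; rejecting an empty list is this sketch's own choice.
        raise ValueError("feedUrls must contain at least one URL")
    if not 0 <= max_items <= 10000:
        raise ValueError("maxItemsPerFeed must be between 0 and 10000 (0 = unlimited)")
    return {
        # Each feed URL is wrapped in an object with a "url" key.
        "feedUrls": [{"url": u} for u in urls],
        "maxItemsPerFeed": max_items,
        "includeContent": include_content,
        "includeRawXml": include_raw_xml,
        "proxyConfiguration": {"useApifyProxy": use_proxy},
    }
```

The resulting dict can be passed as `run_input` to the Python client shown in the API section.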

### Output

```json
{
    "title": "Show HN: Atlas — 6 MCP servers for Claude",
    "link": "https://news.ycombinator.com/item?id=12345678",
    "guid": "12345678",
    "pubDate": "Wed, 15 Apr 2026 15:42:00 +0000",
    "author": "mohye24k",
    "summary": "Atlas is a suite of 6 MCP servers...",
    "content": "<p>Full HTML content here...</p>",
    "categories": ["AI", "Open Source"],
    "enclosureUrl": null,
    "enclosureType": null,
    "enclosureLength": null,
    "feedTitle": "Hacker News",
    "feedDescription": "Links for the intellectually curious",
    "feedLink": "https://news.ycombinator.com/",
    "feedUrl": "https://news.ycombinator.com/rss",
    "feedType": "rss",
    "extractedAt": "2026-04-15T17:00:00.000Z"
}
````

### Data table

| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Item title |
| `link` | string | Permalink to the item |
| `guid` | string | Unique identifier: `<guid>` (RSS) or `<id>` (Atom) |
| `pubDate` | string | Publication date as found in the feed |
| `author` | string | Author name (`<author>`, `<dc:creator>`, or `<author>/<name>`) |
| `summary` | string | Short description |
| `content` | string | Full body (`<content:encoded>` for RSS, `<content>` for Atom) |
| `categories` | array | List of category tags |
| `enclosureUrl` | string | Attached file URL (podcasts, attachments) |
| `enclosureType` | string | MIME type of enclosure |
| `enclosureLength` | number | File size in bytes |
| `feedTitle` | string | Feed channel title |
| `feedDescription` | string | Feed channel description |
| `feedLink` | string | Feed website URL |
| `feedUrl` | string | The feed URL you provided |
| `feedType` | string | `rss`, `atom`, or `rdf` |
| `extractedAt` | string | ISO timestamp of extraction |
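
Because `pubDate` is passed through as found in the feed, RSS items typically carry RFC 822 dates (for example `Wed, 15 Apr 2026 15:42:00 +0000`) while Atom items carry RFC 3339 timestamps. A minimal downstream sketch, using a hypothetical helper `parse_pub_date`, that converts both to timezone-aware UTC `datetime` objects with the standard library. Treating offset-less dates as UTC is an assumption of this sketch.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_pub_date(value):
    """Hypothetical helper: parse an RSS (RFC 822) or Atom (RFC 3339) date string.

    Returns a timezone-aware datetime in UTC, or None if empty or unparseable.
    """
    if not value:
        return None
    value = value.strip()
    try:
        # RSS style, e.g. "Wed, 15 Apr 2026 15:42:00 +0000".
        # Older Pythons raise TypeError on bad input, newer ones ValueError.
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        try:
            # Atom style, e.g. "2026-04-15T15:42:00Z"; fromisoformat only
            # accepts a trailing "Z" from Python 3.11, so normalize it first.
            dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
    if dt.tzinfo is None:
        # Assumption: treat dates without an offset as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```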

### Pricing

This actor uses Apify's **pay-per-event** pricing:

- **Actor start**: $0.01 per run
- **Per item extracted**: $0.002 per item

**Example costs:**

- Hacker News RSS (30 items) → ~$0.07
- A blog feed with 50 items → ~$0.11
- 10 feeds × 50 items each → ~$1.01

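Users on the free Apify plan get $5 of platform credit per month. At $0.002 per item, that covers just under 2,500 items in a single run, and somewhat fewer when the items are spread across many runs, since each run adds the $0.01 start fee.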
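
The example costs follow from one start fee per run plus a per-item fee. A small sketch that reproduces them from the prices listed above; the function name is illustrative, and the constants should be updated if the Actor's pricing changes.

```python
ACTOR_START_USD = 0.01  # per run, from the pricing above
PER_ITEM_USD = 0.002    # per extracted item, from the pricing above


def estimate_cost(items_per_feed, runs=1):
    """Estimate pay-per-event charges in USD.

    items_per_feed: number of items each feed returns in one run.
    runs: how many times the whole batch runs (e.g. 30 for a daily schedule).
    """
    per_run = ACTOR_START_USD + PER_ITEM_USD * sum(items_per_feed)
    return round(per_run * runs, 4)
```

For instance, scheduling the Hacker News feed daily for 30 days, assuming each run returns all 30 items, is `estimate_cost([30], runs=30)`, i.e. $2.10, of which $0.30 is start fees.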

### Tips and advanced options

- **Schedule daily runs** to catch new items as they appear, then deduplicate by `guid` downstream
- **Combine multiple feeds in one run** to save on the actor-start fee — pass an array
- **Disable `includeContent`** for faster runs and smaller datasets when you only need headlines
- **Enable Apify Proxy** for feeds that sit behind Cloudflare or rate-limit server IPs
- **Pipe into Slack / Discord / Email** via Apify integrations

### FAQ and support

**Does it support podcast feeds?** Yes. Episode `enclosureUrl` is included in the output.

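**Does it support paywalled / authenticated feeds?** Partly. If the feed accepts a token in its URL (e.g. `?token=...`), include it in the feed URL you pass. The proxy configuration only changes the IP address that requests come from, so it helps with feeds that block server IPs but does not supply credentials.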

**What about feeds with custom namespaces?** The actor extracts `dc:creator`, `content:encoded`, and other common namespaces. For exotic namespaces, enable `includeRawXml` and parse downstream.
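
This README does not name the field that `includeRawXml` adds, so the sketch below assumes a hypothetical `rawXml` field holding each item's XML as a string. It reads a custom-namespace element (the iTunes `itunes:duration` tag, as an example) with Python's standard library. Because an item fragment usually lacks the namespace declarations that live on the feed's root element, the sketch wraps it in a dummy root that re-declares the prefix; fragments that use other undeclared prefixes would still fail to parse.

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def extract_custom(raw_item_xml, prefix, uri, tag):
    """Return the text of <prefix:tag> inside a raw item XML string, or None."""
    # Re-declare the namespace on a dummy root so the fragment parses on its own.
    wrapped = f'<wrapper xmlns:{prefix}="{uri}">{raw_item_xml}</wrapper>'
    root = ET.fromstring(wrapped)
    el = root.find(f".//{{{uri}}}{tag}")
    return el.text.strip() if el is not None and el.text else None


# Usage (hypothetical field name "rawXml"):
# extract_custom(item["rawXml"], "itunes", ITUNES_NS, "duration")
```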

**Found a bug?** Open an issue on the Issues tab.

# Actor input Schema

## `feedUrls` (type: `array`):

One or more feed URLs. Supports RSS 2.0, Atom 1.0, RDF 1.0, and iTunes podcast feeds.

## `maxItemsPerFeed` (type: `integer`):

Cap on items extracted per feed (0 = unlimited). Useful for very large feeds.

## `includeContent` (type: `boolean`):

Include the full `<content:encoded>` or `<content>` body. Disable for faster runs and smaller datasets.

## `includeRawXml` (type: `boolean`):

Attach the original raw XML for each item (useful for debugging or feeds with custom namespaces).

## `proxyConfiguration` (type: `object`):

Optional Apify Proxy. Some feeds (paid newsletters, intranet) block direct server IPs.

## Actor input object example

```json
{
  "feedUrls": [
    {
      "url": "https://news.ycombinator.com/rss"
    }
  ],
  "maxItemsPerFeed": 100,
  "includeContent": true,
  "includeRawXml": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `dataset` (type: `string`):

The run's default dataset, with one item per feed entry in the format described under Output.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "feedUrls": [
        {
            "url": "https://news.ycombinator.com/rss"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("wiry_kingdom/rss-feed-to-dataset").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "feedUrls": [{ "url": "https://news.ycombinator.com/rss" }] }

# Run the Actor and wait for it to finish
run = client.actor("wiry_kingdom/rss-feed-to-dataset").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "feedUrls": [
    {
      "url": "https://news.ycombinator.com/rss"
    }
  ]
}' |
apify call wiry_kingdom/rss-feed-to-dataset --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=wiry_kingdom/rss-feed-to-dataset",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "RSS / Atom Feed to Dataset",
        "description": "Convert any RSS 2.0, Atom 1.0, or RDF feed into a clean structured dataset. Extracts title, link, pubDate, author, summary, content, categories, enclosures. Works with podcasts, news, blogs, GitHub releases. No API keys.",
        "version": "0.1",
        "x-build-id": "Pr7YiOmkGLbkyxLPp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/wiry_kingdom~rss-feed-to-dataset/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-wiry_kingdom-rss-feed-to-dataset",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/wiry_kingdom~rss-feed-to-dataset/runs": {
            "post": {
                "operationId": "runs-sync-wiry_kingdom-rss-feed-to-dataset",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/wiry_kingdom~rss-feed-to-dataset/run-sync": {
            "post": {
                "operationId": "run-sync-wiry_kingdom-rss-feed-to-dataset",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "feedUrls"
                ],
                "properties": {
                    "feedUrls": {
                        "title": "RSS / Atom feed URLs",
                        "type": "array",
                        "description": "One or more feed URLs. Supports RSS 2.0, Atom 1.0, RDF 1.0, and iTunes podcast feeds.",
                        "default": [
                            {
                                "url": "https://news.ycombinator.com/rss"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "maxItemsPerFeed": {
                        "title": "Max items per feed",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Cap on items extracted per feed (0 = unlimited). Useful for very large feeds.",
                        "default": 100
                    },
                    "includeContent": {
                        "title": "Include full content",
                        "type": "boolean",
                        "description": "Include the full <content:encoded> or <content> body. Disable for faster runs and smaller datasets.",
                        "default": true
                    },
                    "includeRawXml": {
                        "title": "Include raw XML for each item",
                        "type": "boolean",
                        "description": "Attach the original raw XML for each item (useful for debugging or feeds with custom namespaces).",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Optional Apify Proxy. Some feeds (paid newsletters, intranet) block direct server IPs.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
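
For projects that cannot use the official clients, the `run-sync-get-dataset-items` path in the specification above can be called directly. A minimal standard-library sketch; the function names and the client-side timeout are assumptions, and it assumes the endpoint returns the dataset items as a JSON array (its default format).

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
# The "/" in the Actor ID becomes "~" in API paths, as in the spec above.
ACTOR_PATH = "wiry_kingdom~rss-feed-to-dataset"


def build_request(token, run_input):
    """Build a POST request to run-sync-get-dataset-items with the token as a query parameter."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token, run_input, timeout=300):
    """Run the Actor, wait for it to finish, and return the dataset items as a list."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `run_and_get_items("<YOUR_API_TOKEN>", {"feedUrls": [{"url": "https://news.ycombinator.com/rss"}]})`.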
