Pricing

$6.99/month + usage

News & RSS Feed Scraper

News & RSS Feed Scraper extracts structured article data from any RSS/Atom feed. It is suited to news aggregators, content analysis, and AI training pipelines.

Developer

Scrape Pilot

Last modified

a month ago

✨ Features

📡 Preset feeds – Instantly scrape top tech news sites (TechCrunch, The Verge, Wired, etc.) with zero configuration.
🔧 Custom feeds – Provide your own RSS/Atom feed URL and scrape any news source.
📰 Full article fetching – Optionally fetch the complete article content (HTML or plain text) from the original URL.
🖼️ Image extraction – Automatically extracts the main image from each article.
🧹 Clean output – Consistent schema with fields: title, url, description, author, category, published, image.
🌐 Proxy support – Built‑in Apify proxy with residential, datacenter, or custom proxy configurations to avoid blocking.
📊 Result limits – Control the number of articles returned per run.
⚡ Fast & scalable – Built on top of Axios and rss-parser with concurrency control.
🔍 SEO friendly – Output is ready to be indexed or fed into downstream tools.

🚀 How It Works

Provide input – Choose a preset feed or supply a custom RSS URL.
Scraping – The actor fetches the RSS feed, parses entries, and extracts metadata.
Optional full content – If enabled, it navigates to each article's URL and fetches the full HTML (or text) content.
Proxy rotation – For large runs or geo‑restricted feeds, residential proxies reduce the chance of requests being blocked.
Output – Returns a clean JSON array of articles ready for your application.

📥 Input Schema

The actor accepts the following input fields. All fields are optional and fall back to the defaults listed below.

Field	Type	Default	Description
`fetch_full_articles`	Boolean	`false`	If `true`, the actor will fetch the full content of each article from its original URL, as HTML or plain text depending on `full_content_type`.
`preset_feed`	String	`"techcrunch"`	Choose from a list of pre‑configured feeds: `"techcrunch"`, `"theverge"`, `"wired"`, `"cnn"`, `"bbc"`. If empty, you must provide a custom feed URL.
`custom_feed_url`	String	`""`	Your own RSS/Atom feed URL. Overrides `preset_feed` if provided.
`proxyConfiguration`	Object	`{ "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }`	Proxy settings. See Proxy Configuration for details.
`max_results`	Integer	`20`	Maximum number of articles to return (1–100).
`full_content_type`	String	`"html"`	When `fetch_full_articles` is `true`, choose `"html"` (raw HTML) or `"text"` (plain text).
`timeout_secs`	Integer	`30`	Timeout for each article fetch (in seconds).
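
The defaults and precedence rules in the table can be sketched as a small normalization function. This is a minimal sketch, not the actor's actual code: `resolveInput`, `DEFAULTS`, `PRESETS`, and the returned `feed` object are hypothetical names, and clamping `max_results` into the 1–100 range is an assumption (the actor may reject out-of-range values instead).

```javascript
// Hypothetical helper: applies the documented defaults and precedence rules.
// Not the actor's real implementation.
const DEFAULTS = {
  fetch_full_articles: false,
  preset_feed: 'techcrunch',
  custom_feed_url: '',
  proxyConfiguration: { useApifyProxy: true, apifyProxyGroups: ['RESIDENTIAL'] },
  max_results: 20,
  full_content_type: 'html',
  timeout_secs: 30,
};

const PRESETS = ['techcrunch', 'theverge', 'wired', 'cnn', 'bbc'];

function resolveInput(input = {}) {
  const merged = { ...DEFAULTS, ...input };

  // custom_feed_url overrides preset_feed when provided.
  let feed;
  if (merged.custom_feed_url) {
    feed = { kind: 'custom', url: merged.custom_feed_url };
  } else if (PRESETS.includes(merged.preset_feed)) {
    feed = { kind: 'preset', name: merged.preset_feed };
  } else {
    throw new Error('Provide a known preset_feed or a custom_feed_url');
  }

  // Assumption: out-of-range values are clamped to the documented 1–100 range.
  const max_results = Math.min(100, Math.max(1, Math.trunc(merged.max_results)));

  return { ...merged, feed, max_results };
}
```

Calling `resolveInput({ custom_feed_url: 'https://example.com/feed.xml' })` yields a `feed` of kind `custom` even though `preset_feed` still carries its default, which mirrors the override rule in the table.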

Example Input (JSON)

{
  "fetch_full_articles": false,
  "preset_feed": "techcrunch",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  },
  "max_results": 20
}

📤 Output Format

The actor returns an array of objects, each representing a news article. Below is the schema:

Field	Type	Description
`type`	String	Always `"rss"` for this actor.
`source`	String	Domain name of the feed source (e.g., `"techcrunch.com"`).
`title`	String	Article headline.
`url`	String	Direct link to the full article.
`description`	String	Excerpt or summary from the RSS feed.
`published`	String (ISO 8601) or `null`	Publication date and time, if available.
`author`	String or `null`	Author name(s).
`category`	String	Comma‑separated categories/tags.
`image`	String or `null`	URL of the main article image.
`full_content`	String or `null`	Only present if `fetch_full_articles=true`. Contains the full article HTML or text.

Example Output (JSON)

[
  {
    "type": "rss",
    "source": "techcrunch.com",
    "title": "‘Not built right the first time’ — Musk’s xAI is starting over again, again",
    "url": "https://techcrunch.com/2025/03/15/not-built-right-the-first-time-musks-xai-is-starting-over-again-again/",
    "description": "The AI lab is revamping its effort to build an AI coding tool, with two new executives joining from Cursor.",
    "published": null,
    "author": "Tim Fernholz",
    "category": "AI, cursor, Elon Musk",
    "image": null
  },
  {
    "type": "rss",
    "source": "techcrunch.com",
    "title": "Lawyer behind AI psychosis cases warns of mass casualty risks",
    "url": "https://techcrunch.com/2025/03/15/lawyer-behind-ai-psychosis-cases-warns-of-mass-casualty-risks/",
    "description": "AI chatbots have been linked to suicides for years. Now one lawyer says they are showing up in mass casualty cases too, and the technology is moving faster than the safeguards.",
    "published": null,
    "author": "Rebecca Bellan",
    "category": "AI, ai delusions, ai psychosis",
    "image": null
  }
  // ... more articles up to max_results
]
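
Because `category` is a comma‑separated string and `published`, `author`, and `image` can be `null`, downstream code will usually normalize each item before using it. The following is a minimal consumer‑side sketch; `toRecord` and its output field names are hypothetical, and it assumes individual category names do not themselves contain commas.

```javascript
// Hypothetical consumer-side helper for a single output item.
function toRecord(article) {
  return {
    title: article.title,
    url: article.url,
    source: article.source,
    author: article.author ?? null,
    // "AI, cursor, Elon Musk" -> ["AI", "cursor", "Elon Musk"]
    categories: (article.category || '')
      .split(',')
      .map((c) => c.trim())
      .filter(Boolean),
    // published is an ISO 8601 string or null.
    publishedAt: article.published ? new Date(article.published) : null,
    hasImage: article.image !== null && article.image !== undefined,
  };
}
```

Keeping `publishedAt` as `null` rather than substituting the current time makes it obvious downstream which items arrived without a date.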

🛠️ Usage

▶️ Run on Apify Console

Go to Apify Console and open the Actor page for News & RSS Feed Scraper.
Click "Run".
Fill in the input fields (or use the default).
Click "Start" and wait for results.

🔌 Run via Apify API (cURL)

curl -X POST "https://api.apify.com/v2/acts/your-username~news-rss-scraper/runs?token=<YOUR_API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "fetch_full_articles": false,
    "preset_feed": "techcrunch",
    "max_results": 10
  }'
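
The same request can be issued from Node.js. This is a sketch under a few assumptions: `buildRunRequest` and `startRun` are hypothetical helper names, the Actor ID is the same placeholder used in the cURL command, and the global `fetch` requires Node.js 18 or later.

```javascript
// Builds the same request as the cURL command without sending it.
// actorId uses the "username~actor-name" form from the URL above.
function buildRunRequest(actorId, token, input) {
  const url = new URL(
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`
  );
  url.searchParams.set('token', token);
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request with the global fetch available in Node.js 18+.
async function startRun(actorId, token, input) {
  const { url, options } = buildRunRequest(actorId, token, input);
  const res = await fetch(url, options);
  if (!res.ok) throw new Error(`Run request failed with HTTP ${res.status}`);
  return res.json();
}
```

Separating request construction from sending keeps the URL and body easy to inspect or log (with the token redacted) before any network call is made.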

📦 Use as a Node.js Module

Install the package:

npm install news-rss-scraper

Then use it in your code:

const { scrapeNews } = require('news-rss-scraper');

(async () => {
  const results = await scrapeNews({
    fetch_full_articles: false,
    preset_feed: 'techcrunch',
    max_results: 5
  });
  console.log(results);
})();

🌐 Proxy Configuration

To avoid IP‑based blocking, especially for high‑volume scraping, you can configure proxies. The actor integrates seamlessly with Apify Proxy.

Property	Type	Description
`useApifyProxy`	Boolean	If `true`, enables Apify Proxy. Default: `true`.
`apifyProxyGroups`	Array	Proxy groups: `["RESIDENTIAL"]`, `["DATACENTER"]`, or `["SHADER"]`. Residential is recommended for news sites.
`proxyUrls`	Array	Custom proxy URLs (e.g., `["http://user:pass@proxy.example.com:8080"]`). Ignored if `useApifyProxy` is `true`.

Example with custom proxies:

{
  "proxyConfiguration": {
    "useApifyProxy": false,
    "proxyUrls": ["http://user:pass@123.45.67.89:8080"]
  }
}
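
The precedence between `useApifyProxy` and `proxyUrls` can be sketched as follows. `describeProxy` and its return shape are hypothetical, and the `none` mode (a direct connection when neither option applies) is an assumption about behavior the table does not spell out.

```javascript
// Hypothetical helper mirroring the precedence described in the table:
// proxyUrls are only used when useApifyProxy is false.
function describeProxy(proxyConfiguration = {}) {
  const {
    useApifyProxy = true,
    apifyProxyGroups = [],
    proxyUrls = [],
  } = proxyConfiguration;

  if (useApifyProxy) {
    return { mode: 'apify', groups: apifyProxyGroups };
  }
  if (proxyUrls.length > 0) {
    // Validate each URL early; new URL() throws on malformed input.
    proxyUrls.forEach((u) => new URL(u));
    return { mode: 'custom', urls: proxyUrls };
  }
  // Assumption: with no proxy configured, requests go out directly.
  return { mode: 'none' };
}
```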

🧪 Advanced Options

Option	Description
`custom_feed_url`	If you need a feed not in the preset list, provide its full URL here.
`full_content_type`	When `fetch_full_articles=true`, choose `"html"` (raw HTML) or `"text"` (plain text stripped of tags).
`timeout_secs`	Timeout for each individual article fetch. Increase if sites are slow.
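
To illustrate the difference between the two `full_content_type` values, here is a deliberately naive sketch of turning HTML into plain text. It is not the actor's implementation: `htmlToText` is a hypothetical name, only two entities are decoded, and a real HTML parser is the safer choice for production use.

```javascript
// Naive illustration of "plain text stripped of tags".
// Assumption: regex stripping is good enough for a demo; it is not a
// substitute for a real HTML parser on arbitrary markup.
function htmlToText(html) {
  return html
    // Drop script/style blocks together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    // Replace every remaining tag with a space.
    .replace(/<[^>]+>/g, ' ')
    // Decode two common entities only.
    .replace(/&nbsp;/g, ' ')
    .replace(/&amp;/g, '&')
    // Collapse runs of whitespace.
    .replace(/\s+/g, ' ')
    .trim();
}
```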

❓ FAQ / Troubleshooting

Q: Why are some articles missing images?

A: Not all RSS feeds include image metadata. The actor tries to extract images from the `<enclosure>` tag or the `<media:content>` tag. If neither exists, the `image` field will be `null`.
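
The fallback order in this answer can be sketched on an already parsed feed item. The item shape (`enclosure` and `mediaContent` properties) and the name `pickImage` are hypothetical, and skipping enclosures with a non‑image MIME type (such as podcast audio) is an assumption rather than documented behavior.

```javascript
// Hypothetical parsed item shape:
//   { enclosure: { url, type }, mediaContent: { url } }
function pickImage(item) {
  const enclosure = item.enclosure;
  // Assumption: ignore enclosures that declare a non-image type.
  if (
    enclosure &&
    enclosure.url &&
    (!enclosure.type || enclosure.type.startsWith('image/'))
  ) {
    return enclosure.url;
  }
  if (item.mediaContent && item.mediaContent.url) {
    return item.mediaContent.url;
  }
  // No image metadata in the feed entry.
  return null;
}
```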

Q: How can I scrape a non‑English news site?

A: Provide its RSS feed URL in `custom_feed_url`. The actor works with any valid RSS/Atom feed regardless of language.

Q: I'm getting blocked / timeouts.

A: Enable residential proxies (`apifyProxyGroups: ["RESIDENTIAL"]`) and reduce `max_results` to stay under rate limits. You can also increase `timeout_secs`.

Q: Can I run this Actor for free?

A: On Apify, each run consumes platform credits. Check Apify pricing for details. A small number of runs may be covered by the free tier.
