Google News Scraper — Headlines, Sources, URLs

Pricing: from $20.00 / 1,000 results

Turn any Google News query into a deduplicated dataset of up to 2,000 articles: titles, sources, dates, RSS links, resolved publisher URLs, clean snippets. Multiple RSS time-window passes for depth beyond single-feed limits. Excel-ready CSV. No API key. Not affiliated with Google.


Google News RSS Scraper — Structured Headlines, Sources & Article URLs (Up to 2,000)

Turn any Google News search query into a deduplicated, structured dataset of headlines, publisher names, publication timestamps, RSS links, and resolved article URLs — without a Google API key or a headless browser. The Scrapeify Google News Scraper issues multiple RSS passes across time-window phases to overcome single-feed size limits, merges and deduplicates results across passes, and exports to a Dataset, RESULTS_CSV (Excel-friendly UTF-8 BOM), RESULTS_JSON, and a run OUTPUT summary.

Built for media monitoring teams, competitive intelligence analysts, AI content pipelines, and search visibility researchers who need repeatable, structured coverage of any news topic at scale.


Features

| Capability | Detail |
| --- | --- |
| RSS-first architecture | HTTP fetches to news.google.com/rss/search — lightweight, no browser required |
| Multi-phase coverage | Multiple when passes (1h, 7d, 30d, 1y) to approximate depth beyond single-feed limits |
| Deduplication | Merges results across phases using stable RSS identifiers and normalized URLs |
| Clean text fields | HTML stripped from descriptions for downstream NLP and embedding workflows |
| Canonical URL resolution | Parses Google redirect parameters to surface publisher articleUrl where available |
| 429 / 5xx retry logic | Bounded retry attempts with backoff for transient Google RSS errors |
| Up to 2,000 articles | Per-run cap with input validation; dedup stats in OUTPUT |
| Structured columns | position, keyword, title, link, articleUrl, pubDate, sourceName, description |
| Excel-ready CSV | RESULTS_CSV with UTF-8 BOM and quoted fields for Windows compatibility |
| Input flexibility | Aliases: query, searchQuery, q for keyword; maxResults for numberOfResults |

Use Cases

Media Monitoring & Press Tracking

Track news coverage for brand names, executives, products, or regulatory topics. Schedule hourly or daily runs and diff new link values since the previous run to surface breaking coverage before competitors do.
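
A minimal sketch of that diffing step with the Python client, assuming two completed runs of this actor; the dataset IDs are placeholders for IDs taken from your own scheduled runs.

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

def link_set(dataset_id: str) -> set[str]:
    # Collect the stable Google News RSS link of every row in a run's Dataset
    return {item["link"] for item in client.dataset(dataset_id).iterate_items()}

previous = link_set("PREVIOUS_RUN_DATASET_ID")  # placeholder: Dataset ID of the last run
current = link_set("LATEST_RUN_DATASET_ID")     # placeholder: Dataset ID of the newest run

new_links = current - previous                  # net-new coverage since the previous run
print(f"{len(new_links)} new articles to review")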

Competitive Intelligence

Monitor rival company and product news. Identify PR campaigns, product launches, partnership announcements, and negative press. Build a structured archive of competitor mentions for strategic planning.

SEO & Search Visibility Research

Map which publishers and articles rank in Google News for your target keywords. Identify content gaps, measure your brand's News presence, and track competitors' earned media performance over time.

AI Content Pipeline (Stage 1 Retrieval)

Use as Stage 1 of a retrieval stack: headlines + snippets cheaply triage topic relevance → LLMs decide which URLs warrant full article fetching and chunking → agents post summaries to ticketing or Slack.

RAG Knowledge Base Construction

Feed title + description + articleUrl into embedding pipelines. Store with keyword and sourceName metadata for semantic retrieval. Enable AI-generated answers with cited, timestamped news sources.

Industry Trend Analysis

Aggregate sourceName distributions and publication cadence for any keyword over time. Identify which outlets cover a topic most frequently, which publishers are emerging voices, and how news volume correlates with market events.
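
A short sketch of the sourceName aggregation using the Python client and the standard library; the keyword is illustrative.

import os
from collections import Counter
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "solid-state batteries", "numberOfResults": 500}
)

# Rank publishers by how often they appear for this keyword
by_source = Counter(
    item["sourceName"] for item in client.dataset(run["defaultDatasetId"]).iterate_items()
)
for source, count in by_source.most_common(10):
    print(f"{source}: {count}")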

E-Commerce & Brand Intelligence

Track product recalls, supply chain disruptions, competitor product launches, and category news that affects purchasing decisions. Combine with Amazon Scraper data for comprehensive market intelligence.

Automation & Alert Pipelines

Trigger Apify runs on a cron schedule. Diff against previous dataset by link or articleUrl. Push new articles to Slack, email, or a ticketing system automatically.

Data Aggregation & Multi-Source Research

Combine Google News results with Google Maps, Amazon, and Meta Ad Library actor outputs for comprehensive multi-source dossiers on brands, markets, or topics.

Academic & Policy Research

Track news coverage of policy topics, scientific developments, or public health issues at scale. Export to CSV for corpus analysis, NLP research, or data journalism workflows.


Why Choose This Actor

  • Lightweight and cost-efficient — HTTP-only; no browser fleet; suitable for high-frequency scheduling
  • Deduplication built in — fewer duplicate rows than naive single-RSS pulls
  • Production outputs — Dataset + CSV + JSON keys fit ETL, BI, and client-reporting workflows
  • Cloud-native — Apify standard Dataset and Key-value store semantics with scheduling and webhooks
  • Automation-ready — identical input contract across Console, REST API, and SDK clients

Quick Start

  1. Open the Scrapeify Google News Scraper on Apify Console.
  2. Enter a keyword (e.g. renewable energy policy) and set numberOfResults (e.g. 500).
  3. Click Start and wait for completion (typically seconds to low minutes).
  4. Export the Dataset as JSON or CSV, or download RESULTS_CSV from Storage → Key-value store.

Tip: Start with numberOfResults: 50 to validate keyword coverage before scaling to the 2,000-article limit.


Input Schema

{
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500
}
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| keyword | string | Yes | News search phrase. Aliases: query, searchQuery, q. Supports operators (quotes, site:, etc.) |
| numberOfResults | integer | Yes | Unique articles to collect (1–2,000). Alias: maxResults |
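
Operators are passed through verbatim in keyword. A hedged example with the Python client (the query itself is illustrative), combining an exact-phrase match with a single-publisher filter:

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={
        "keyword": '"supply chain" site:reuters.com',  # exact phrase + site: filter
        "numberOfResults": 200,
    }
)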

Output Schema

Dataset Row (one row per article)

{
  "position": 1,
  "keyword": "semiconductor supply chain",
  "title": "Fab expansion slows as equipment backlog extends into 2027",
  "link": "https://news.google.com/rss/articles/CBMiXGh0dHBzOi8vd3d3LmV4YW1wbGUuY29tL3RlY2gvZmFiLWRlbGF5cw...",
  "articleUrl": "https://www.example.com/tech/fab-delays",
  "pubDate": "Wed, 07 May 2026 08:15:00 GMT",
  "sourceName": "TechCrunch",
  "description": "Equipment vendors report extended lead times for EUV modules as chipmakers compete for capacity at advanced nodes."
}
| Field | Type | Description |
| --- | --- | --- |
| position | integer | Deduped result position (1-based) |
| keyword | string | Input keyword echoed on every row for joins and audits |
| title | string | Article headline |
| link | string | Google News RSS link (use as stable identifier) |
| articleUrl | string | Resolved publisher URL when available; null if redirect omitted |
| pubDate | string | Publication date in RSS format |
| sourceName | string | Publisher name |
| description | string | Article snippet with HTML stripped |

Note: articleUrl resolves the Google redirect to the original publisher URL when redirect parameters are present. Use link as the stable dedup key; articleUrl as the citation URL for downstream crawling.

Run Summary (OUTPUT key in default KV store)

{
  "ok": true,
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500,
  "returnedCount": 487,
  "meta": {
    "stoppedReason": "target_reached",
    "passesCompleted": 4,
    "totalFetched": 512,
    "uniqueAfterDedupe": 487
  },
  "scrapedAt": "2026-05-07T04:00:00.000Z",
  "download": {
    "dataset": "Export as CSV/JSON from Dataset tab",
    "keyValueStore": "RESULTS_CSV = Excel-friendly CSV (UTF-8 BOM, quoted fields)"
  },
  "csv": null,
  "note": "CSV too large to embed inline; use RESULTS_CSV key."
}
| Field | Type | Description |
| --- | --- | --- |
| ok | boolean | true if articles were returned; false on error or empty |
| returnedCount | integer | Unique articles after deduplication |
| meta.stoppedReason | string | target_reached, exhausted, or error descriptor |
| meta.passesCompleted | integer | Number of RSS phase passes completed |
| meta.uniqueAfterDedupe | integer | Articles remaining after cross-phase dedup |
| csv | string/null | Embedded CSV string when small enough; else null |

Additional KV keys: RESULTS_CSV (full CSV, UTF-8 BOM), RESULTS_JSON (full JSON array).
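
A short sketch of retrieving those keys with the Python client, assuming RESULTS_CSV is returned as text and OUTPUT as a parsed object:

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "climate policy", "numberOfResults": 250}
)

store = client.key_value_store(run["defaultKeyValueStoreId"])
summary = store.get_record("OUTPUT")["value"]      # run summary
csv_record = store.get_record("RESULTS_CSV")       # Excel-friendly CSV

with open("google_news.csv", "w", encoding="utf-8") as f:
    f.write(csv_record["value"])

print(summary["returnedCount"], "articles written to google_news.csv")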


API Examples

cURL

curl "https://api.apify.com/v2/acts/scrapeify~google-news-scraper/runs?token=$APIFY_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "climate policy",
    "numberOfResults": 250
  }'

Python

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "climate policy", "numberOfResults": 250}
)

for article in client.dataset(run["defaultDatasetId"]).iterate_items():
    url = article.get("articleUrl") or article["link"]
    print(article["title"], article["sourceName"], url)

JavaScript / Node.js

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor("scrapeify/google-news-scraper").call({
  keyword: "climate policy",
  numberOfResults: 250,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(
  `Collected ${items.length} unique articles from ${new Set(items.map((a) => a.sourceName)).size} publishers`
);

Integration Examples

ChatGPT / Custom GPT Actions

Register the Apify run endpoint as a Custom GPT action. Return title, sourceName, pubDate, and articleUrl as a JSON array. The model can summarize recent coverage, identify trends, or answer questions grounded in actual news articles.
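
A hedged sketch of the payload-shaping step behind such an action: trim each Dataset row to the fields the GPT needs before returning the array (the field selection mirrors the list above).

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "climate policy", "numberOfResults": 100}
)

# Compact array for the model: smaller payloads keep the action response fast
articles = [
    {
        "title": a["title"],
        "sourceName": a["sourceName"],
        "pubDate": a["pubDate"],
        "articleUrl": a.get("articleUrl") or a["link"],
    }
    for a in client.dataset(run["defaultDatasetId"]).iterate_items()
]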

Claude Tool Use

import os
from apify_client import ApifyClient
from langchain.tools import tool

client = ApifyClient(os.environ["APIFY_TOKEN"])

@tool
def get_recent_news(keyword: str, n: int = 100) -> list:
    """Fetch recent Google News articles for a keyword. Returns structured article data."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items

Pass the structured list to Claude for summarization, entity extraction, or sentiment analysis with articleUrl citations.

Gemini

Fetch 500+ article headlines and snippets → pass to Gemini's long-context window → generate a comprehensive topic briefing with source attribution and emerging narrative threads.

LangChain

import os
from apify_client import ApifyClient
from langchain.tools import tool
from langchain.text_splitter import RecursiveCharacterTextSplitter  # for chunking full article text downstream

client = ApifyClient(os.environ["APIFY_TOKEN"])

@tool
def fetch_news_corpus(keyword: str, n: int) -> list:
    """Search Google News and return article data for RAG ingestion."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items

# Use as a retriever tool in a ConversationalRetrievalChain

CrewAI

NewsResearchAgent fetches articles with this tool. AnalysisAgent identifies key themes and entities. WritingAgent drafts a briefing document with source citations and publication dates.

AutoGen

# UserProxyAgent: "Summarize the last 100 news articles about EV battery technology"
# ResearchAgent: calls google_news_scraper tool → returns structured JSON
# SynthesisAgent: extracts key claims, publisher perspectives, and publication timeline

n8n / Make.com / Zapier

Cron trigger → Apify run → iterate Dataset items → filter for new link values since last run → push to Slack digest, Notion page, or HubSpot deal activity feed.

RAG Systems

# 1. Fetch articles
articles = get_recent_news("renewable energy", n=500)

# 2. Create documents for the vector store
from langchain.schema import Document

docs = [
    Document(
        page_content=f"{a['title']}. {a['description']}",
        metadata={
            "url": a.get("articleUrl") or a["link"],
            "source": a["sourceName"],
            "date": a["pubDate"],
        },
    )
    for a in articles
]

# 3. Embed and index (vectorstore: any initialized vector store in your pipeline)
vectorstore.add_documents(docs)

Frequently Asked Questions

1. Do I need a Google API key or Google Cloud account? No. The actor fetches public RSS endpoints from news.google.com — no API credentials required.

2. Why do I sometimes get fewer articles than requested? There may not be enough distinct articles across RSS phases for the keyword. Inspect meta.uniqueAfterDedupe and meta.stoppedReason in OUTPUT.

3. When is articleUrl null? Some Google News RSS entries don't include redirect parameters that allow URL resolution. Fall back to link for stable identification.

4. How does deduplication work across phases? The actor tracks stable RSS identifiers and normalized URLs across all passes. Articles seen in multiple time-window phases are merged into a single row.

5. Can I search by country or language? The current implementation uses default hl and gl parameters. Fork the actor for specific ceid locale pairs (e.g. ceid=GB%3Aen for UK English).
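
For reference, the public RSS search endpoint accepts hl, gl, and ceid query parameters directly; a forked actor would build request URLs roughly like this sketch (independent of this actor's internals):

import urllib.parse
import urllib.request

params = urllib.parse.urlencode({
    "q": "renewable energy policy",
    "hl": "en-GB",    # interface language
    "gl": "GB",       # country
    "ceid": "GB:en",  # edition pair (country:language)
})
url = f"https://news.google.com/rss/search?{params}"

with urllib.request.urlopen(url) as response:
    feed_xml = response.read().decode("utf-8")
print(feed_xml[:200])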

6. Is full article text included? No — only RSS fields: title, snippet, source, date, and URL. Crawl articleUrl with a separate article fetcher to retrieve full text.

7. How fast are runs typically? Seconds to low minutes depending on numberOfResults and Google RSS response times.

8. How does the actor handle 429 rate limiting? Bounded retry attempts with backoff. Avoid launching excessive parallel runs from a single IP for the same keyword.

9. Does RESULTS_CSV open correctly in Excel? Yes — RESULTS_CSV uses UTF-8 BOM encoding and quoted fields for Windows Excel compatibility.

10. Can I schedule hourly monitoring runs? Yes — use Apify Schedules combined with webhooks to your notification stack.

11. Are publication dates reliable? pubDate reflects what the RSS feed reports. Some publishers use the crawl date rather than original publication date.

12. Can I combine results with other Scrapeify actors? Yes — join Google News results with Maps, Amazon, or Ad Library actor outputs in your data warehouse by keyword or entity.

13. What input aliases are supported? query, searchQuery, q for the keyword; maxResults for numberOfResults.

14. What causes an empty dataset with error rows? Check message in pushed error items and OUTPUT.ok for details. Common causes: empty keyword, Google temporarily blocking the IP, or zero-result queries.

15. Can I use this for real-time news alerts? Hourly runs are practical. For sub-minute latency, a dedicated news API is more appropriate.

16. How do I ingest into a vector database? Use title + description as the text content. Store articleUrl, sourceName, keyword, and pubDate as metadata for filtering and citation.

17. What is the difference between link and articleUrl? link is the Google News RSS URL — use as the stable dedup key. articleUrl is the resolved publisher URL — use as the citation link for downstream crawling and user-facing references.

18. Can I track which publishers cover a topic most? Yes — aggregate sourceName values across Dataset rows. Sort by frequency to rank publishers by topic coverage volume.

19. Does the actor support Google Alerts-style monitoring? This actor provides structured rows for programmatic pipelines. For email digests, Google Alerts is a simpler option. For database-integrated monitoring and downstream automation, this actor is the better choice.

20. Is there an upper limit per keyword per run? Yes — 2,000 unique articles per run (input validation). For broader coverage, run multiple passes across overlapping time windows with different when parameters.

21. How should I handle GDPR for article data? Headlines and snippets may mention individuals. Apply your organization's data retention and classification policies to stored news corpora.

22. Can I retrieve articles from specific publishers? Add site:publisher.com to the keyword query to target a specific domain in Google News search.

23. What is meta.passesCompleted? The number of RSS phase passes the actor completed (e.g. 1h, 7d, 30d, 1y windows). More passes generally yield broader coverage.

24. Does this include paywalled articles? Only metadata (title, snippet, source, URL) is collected from RSS — no paywall bypass. Full text requires a separate article fetcher.

25. How do I build an idempotent monitoring pipeline? Key on link or normalized articleUrl before inserting into your database. Compare new link sets against the previous run to identify net-new coverage.


Best Practices

  • Stagger schedules — don't hammer RSS from many simultaneous tasks on one egress IP
  • Key on link for idempotent pipelines before inserting into Postgres or vector stores
  • Rate-limit downstream crawling — respect robots.txt and publisher terms when fetching full article text from articleUrl
  • Start small — validate with numberOfResults: 50 before scaling to 2,000
  • Monitor returnedCount trends — alert on significant drops week-over-week for fixed keywords
  • Archive RESULTS_JSON alongside OUTPUT for each scheduled run to enable historical diff analysis
  • Use keyword column for joins — it's echoed on every row, making multi-keyword batch pipelines easy to merge

Performance & Scalability

| Factor | Guidance |
| --- | --- |
| Throughput | HTTP-only; highly efficient for high-frequency scheduling |
| Upper bound | 2,000 deduplicated articles per run |
| Run time | Seconds to low minutes depending on RSS response latency and numberOfResults |
| Horizontal scale | Run parallel actors per keyword list — each is independent |
| Storage | Dataset is authoritative; RESULTS_CSV and RESULTS_JSON may be limited by KV size for large runs |

AI & Automation Workflows

3-stage retrieval pipeline (a minimal sketch follows the list):

  1. Stage 1 (this actor): headlines + snippets cheaply triage topic relevance
  2. Stage 2 (article fetcher): crawl articleUrl for full text on relevant articles
  3. Stage 3 (LLM): chunk, embed, and index full text; generate answers with articleUrl citations
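
A minimal end-to-end sketch of the three stages; fetch_full_text and summarize are hypothetical placeholders for your own article fetcher and LLM call, and the relevance filter is deliberately naive.

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

def fetch_full_text(url: str) -> str:
    """Hypothetical Stage 2 fetcher: swap in your own article crawler."""
    return ""

def summarize(texts: list[str]) -> str:
    """Hypothetical Stage 3 LLM call: swap in your own model invocation."""
    return f"Briefing over {len(texts)} articles"

# Stage 1: cheap triage on headlines + snippets from this actor
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "EV battery technology", "numberOfResults": 200}
)
items = client.dataset(run["defaultDatasetId"]).list_items().items
relevant = [a for a in items if "solid-state" in (a["title"] + " " + (a["description"] or "")).lower()]

# Stage 2: fetch full text only for URLs that survived triage
texts = [fetch_full_text(a.get("articleUrl") or a["link"]) for a in relevant]

# Stage 3: summarize, with articleUrl values in `relevant` available as citations
print(summarize(texts))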

Competitive briefing automation: Schedule weekly Google News runs for competitor brand names → extract key themes from titles and snippets using an LLM → generate competitive intelligence brief → post to Confluence or Notion.

Trend detection pipeline: Daily runs for industry keywords → aggregate pubDate distribution → detect volume spikes indicating major news events → alert stakeholders before the news cycle peaks.
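
A small sketch of the spike check, assuming pubDate follows the RFC 822 format shown in the output example; the 2x-baseline threshold is an illustrative choice.

import os
from collections import Counter
from email.utils import parsedate_to_datetime
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "semiconductor supply chain", "numberOfResults": 1000}
)

# Count articles per publication day
daily = Counter(
    parsedate_to_datetime(item["pubDate"]).date()
    for item in client.dataset(run["defaultDatasetId"]).iterate_items()
)

baseline = sum(daily.values()) / max(len(daily), 1)
spikes = {day: n for day, n in sorted(daily.items()) if n > 2 * baseline}
print("Spike days:", spikes)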


Error Handling

| Scenario | Behavior |
| --- | --- |
| Missing or empty keyword | Error row in Dataset + OUTPUT.ok: false |
| Empty results | Completes with returnedCount = 0; meta.stoppedReason = exhausted |
| 429 rate limiting | Bounded retries with backoff; persistent failures surface in run logs |
| KV size limits | csv field in OUTPUT set to null; use RESULTS_CSV KV key or Dataset export |
| Transient HTTP errors | Retried per module constants; logged if persistent |

Trust & Reliability

Scrapeify maintains this actor for repeatable news monitoring with structured outputs, explicit dedup statistics, and clear storage keys — suitable for production automation when combined with appropriate compliance review and downstream content policies.


Explore the full Scrapeify suite — chain these actors together for end-to-end automation pipelines:

| Actor | What it does |
| --- | --- |
| Amazon Scraper | ASINs, prices, sponsored flags across 23 marketplaces |
| Instagram Ad Library Scraper | Instagram-only ads from Meta Ad Library |
| Meta Ad Library Scraper | Facebook & Instagram ads with sort options |
| WhatsApp Ad Scraper | Click-to-WhatsApp ad creatives |
| YouTube Video Downloader | Videos & audio to Apify Key-Value Store |
| Meta Brand & Page ID Finder | Resolve brand names to numeric Page IDs |
| Google Maps Scraper | Local business leads, reviews, emails, contacts |

Google News is a trademark of Google LLC. This actor is not affiliated with or endorsed by Google.