Google News Scraper — Headlines, Sources, URLs
Pricing
from $20.00 / 1,000 results
Turn any Google News query into a deduplicated dataset of up to 2,000 articles: titles, sources, dates, RSS links, resolved publisher URLs, clean snippets. Multiple RSS time-window passes for depth beyond single-feed limits. Excel-ready CSV. No API key. Not affiliated with Google.
Google News RSS Scraper — Structured Headlines, Sources & Article URLs (Up to 2,000)
Turn any Google News search query into a deduplicated, structured dataset of headlines, publisher names, publication timestamps, RSS links, and resolved article URLs — without a Google API key or a headless browser. The Scrapeify Google News Scraper issues multiple RSS passes across time-window phases to overcome single-feed size limits, merges and deduplicates results across passes, and exports to a Dataset, RESULTS_CSV (Excel-friendly UTF-8 BOM), RESULTS_JSON, and a run OUTPUT summary.
Built for media monitoring teams, competitive intelligence analysts, AI content pipelines, and search visibility researchers who need repeatable, structured coverage of any news topic at scale.
Features
| Capability | Detail |
|---|---|
| RSS-first architecture | HTTP fetches to news.google.com/rss/search — lightweight, no browser required |
| Multi-phase coverage | Multiple when passes (1h, 7d, 30d, 1y) to approximate depth beyond single-feed limits |
| Deduplication | Merges results across phases using stable RSS identifiers and normalized URLs |
| Clean text fields | HTML stripped from descriptions for downstream NLP and embedding workflows |
| Canonical URL resolution | Parses Google redirect parameters to surface publisher articleUrl where available |
| 429 / 5xx retry logic | Bounded retry attempts with backoff for transient Google RSS errors |
| Up to 2,000 articles | Per-run cap with input validation; dedup stats in OUTPUT |
| Structured columns | position, keyword, title, link, articleUrl, pubDate, sourceName, description |
| Excel-ready CSV | RESULTS_CSV with UTF-8 BOM and quoted fields for Windows compatibility |
| Input flexibility | Aliases: query, searchQuery, q for keyword; maxResults for numberOfResults |
Use Cases
Media Monitoring & Press Tracking
Track news coverage for brand names, executives, products, or regulatory topics. Schedule hourly or daily runs and diff new link values since the previous run to surface breaking coverage before competitors do.
Competitive Intelligence
Monitor rival company and product news. Identify PR campaigns, product launches, partnership announcements, and negative press. Build a structured archive of competitor mentions for strategic planning.
SEO & Search Visibility Research
Map which publishers and articles rank in Google News for your target keywords. Identify content gaps, measure your brand's News presence, and track competitors' earned media performance over time.
AI Content Pipeline (Stage 1 Retrieval)
Use as Stage 1 of a retrieval stack: headlines + snippets cheaply triage topic relevance → LLMs decide which URLs warrant full article fetching and chunking → agents post summaries to ticketing or Slack.
RAG Knowledge Base Construction
Feed title + description + articleUrl into embedding pipelines. Store with keyword and sourceName metadata for semantic retrieval. Enable AI-generated answers with cited, timestamped news sources.
Industry Trend Analysis
Aggregate sourceName distributions and publication cadence for any keyword over time. Identify which outlets cover a topic most frequently, which publishers are emerging voices, and how news volume correlates with market events.
E-Commerce & Brand Intelligence
Track product recalls, supply chain disruptions, competitor product launches, and category news that affects purchasing decisions. Combine with Amazon Scraper data for comprehensive market intelligence.
Automation & Alert Pipelines
Trigger Apify runs on a cron schedule. Diff against previous dataset by link or articleUrl. Push new articles to Slack, email, or a ticketing system automatically.
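A minimal sketch of the diff step in Python (the `link` field comes from the Dataset schema; how you persist the previous run's items is up to you):

```python
def new_articles(previous_items, current_items):
    """Return current-run rows whose stable `link` was not seen in the previous run."""
    seen = {item["link"] for item in previous_items}
    return [item for item in current_items if item["link"] not in seen]
```

Push whatever `new_articles` returns to Slack or your ticketing system; because `link` is stable across runs, the diff is idempotent.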
Data Aggregation & Multi-Source Research
Combine Google News results with Google Maps, Amazon, and Meta Ad Library actor outputs for comprehensive multi-source dossiers on brands, markets, or topics.
Academic & Policy Research
Track news coverage of policy topics, scientific developments, or public health issues at scale. Export to CSV for corpus analysis, NLP research, or data journalism workflows.
Why Choose This Actor
- Lightweight and cost-efficient — HTTP-only; no browser fleet; suitable for high-frequency scheduling
- Deduplication built in — fewer duplicate rows than naive single-RSS pulls
- Production outputs — Dataset + CSV + JSON keys fit ETL, BI, and client-reporting workflows
- Cloud-native — Apify standard Dataset and Key-value store semantics with scheduling and webhooks
- Automation-ready — identical input contract across Console, REST API, and SDK clients
Quick Start
- Open the Scrapeify Google News Scraper on Apify Console.
- Enter a `keyword` (e.g. `renewable energy policy`) and set `numberOfResults` (e.g. `500`).
- Click Start and wait for completion (typically seconds to low minutes).
- Export the Dataset as JSON or CSV, or download RESULTS_CSV from Storage → Key-value store.
Tip: Start with `numberOfResults: 50` to validate keyword coverage before scaling to the 2,000-article limit.
Input Schema
```json
{
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500
}
```
| Field | Type | Required | Description |
|---|---|---|---|
| keyword | string | Yes | News search phrase. Aliases: query, searchQuery, q. Supports operators (quotes, site:, etc.) |
| numberOfResults | integer | Yes | Unique articles to collect (1–2,000). Alias: maxResults |
Output Schema
Dataset Row (one row per article)
```json
{
  "position": 1,
  "keyword": "semiconductor supply chain",
  "title": "Fab expansion slows as equipment backlog extends into 2027",
  "link": "https://news.google.com/rss/articles/CBMiXGh0dHBzOi8vd3d3LmV4YW1wbGUuY29tL3RlY2gvZmFiLWRlbGF5cw...",
  "articleUrl": "https://www.example.com/tech/fab-delays",
  "pubDate": "Wed, 07 May 2026 08:15:00 GMT",
  "sourceName": "TechCrunch",
  "description": "Equipment vendors report extended lead times for EUV modules as chipmakers compete for capacity at advanced nodes."
}
```
| Field | Type | Description |
|---|---|---|
| position | integer | Deduped result position (1-based) |
| keyword | string | Input keyword echoed on every row for joins and audits |
| title | string | Article headline |
| link | string | Google News RSS link (use as stable identifier) |
| articleUrl | string | Resolved publisher URL when available; null if redirect omitted |
| pubDate | string | Publication date in RSS format |
| sourceName | string | Publisher name |
| description | string | Article snippet with HTML stripped |
Note: `articleUrl` resolves the Google redirect to the original publisher URL when redirect parameters are present. Use `link` as the stable dedup key and `articleUrl` as the citation URL for downstream crawling.
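The dedup rule from the note above can be sketched as follows (a simplified version; the actor's own cross-phase dedup also normalizes URLs):

```python
def dedupe_by_link(items):
    """Keep the first occurrence of each article, keyed on the stable RSS `link`."""
    seen = {}
    for item in items:
        seen.setdefault(item["link"], item)  # first sighting wins
    return list(seen.values())
```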
Run Summary (OUTPUT key in default KV store)
```json
{
  "ok": true,
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500,
  "returnedCount": 487,
  "meta": {
    "stoppedReason": "target_reached",
    "passesCompleted": 4,
    "totalFetched": 512,
    "uniqueAfterDedupe": 487
  },
  "scrapedAt": "2026-05-07T04:00:00.000Z",
  "download": {
    "dataset": "Export as CSV/JSON from Dataset tab",
    "keyValueStore": "RESULTS_CSV = Excel-friendly CSV (UTF-8 BOM, quoted fields)"
  },
  "csv": null,
  "note": "CSV too large to embed inline; use RESULTS_CSV key."
}
```
| Field | Type | Description |
|---|---|---|
| ok | boolean | true if articles were returned; false on error or empty |
| returnedCount | integer | Unique articles after deduplication |
| meta.stoppedReason | string | target_reached, exhausted, or error descriptor |
| meta.passesCompleted | integer | Number of RSS phase passes completed |
| meta.uniqueAfterDedupe | integer | Articles remaining after cross-phase dedup |
| csv | string/null | Embedded CSV string when small enough; else null |
Additional KV keys: RESULTS_CSV (full CSV, UTF-8 BOM), RESULTS_JSON (full JSON array).
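If you consume RESULTS_CSV programmatically rather than in Excel, decode it with `utf-8-sig` so the BOM is stripped before parsing — a sketch with a hypothetical two-column payload:

```python
import csv
import io

def parse_results_csv(raw_bytes):
    """Decode a RESULTS_CSV payload (UTF-8 with BOM, quoted fields) into dict rows."""
    text = raw_bytes.decode("utf-8-sig")  # "utf-8-sig" drops the BOM Excel relies on
    return list(csv.DictReader(io.StringIO(text)))

# Hypothetical sample; real files carry the full column set listed above.
sample = '\ufefftitle,sourceName\r\n"Fab expansion slows",TechCrunch\r\n'.encode("utf-8")
rows = parse_results_csv(sample)
```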
API Examples
cURL
```bash
curl "https://api.apify.com/v2/acts/scrapeify~google-news-scraper/runs?token=$APIFY_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"keyword": "climate policy", "numberOfResults": 250}'
```
Python
```python
import os

from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])
run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "climate policy", "numberOfResults": 250}
)
for article in client.dataset(run["defaultDatasetId"]).iterate_items():
    url = article.get("articleUrl") or article["link"]
    print(article["title"], article["sourceName"], url)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });
const run = await client.actor("scrapeify/google-news-scraper").call({
  keyword: "climate policy",
  numberOfResults: 250,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(
  `Collected ${items.length} unique articles from ${new Set(items.map((a) => a.sourceName)).size} publishers`
);
```
Integration Examples
ChatGPT / Custom GPT Actions
Register the Apify run endpoint as a Custom GPT action. Return title, sourceName, pubDate, and articleUrl as a JSON array. The model can summarize recent coverage, identify trends, or answer questions grounded in actual news articles.
Claude Tool Use
```python
from langchain.tools import tool

# Assumes an ApifyClient instance named `client`, as in the Python example above.

@tool
def get_recent_news(keyword: str, n: int = 100) -> list:
    """Fetch recent Google News articles for a keyword. Returns structured article data."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items
```
Pass the structured list to Claude for summarization, entity extraction, or sentiment analysis with articleUrl citations.
Gemini
Fetch 500+ article headlines and snippets → pass to Gemini's long-context window → generate a comprehensive topic briefing with source attribution and emerging narrative threads.
LangChain
```python
from langchain.tools import tool
from langchain.text_splitter import RecursiveCharacterTextSplitter

@tool
def fetch_news_corpus(keyword: str, n: int) -> list:
    """Search Google News and return article data for RAG ingestion."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items

# Use as a retriever tool in a ConversationalRetrievalChain
```
CrewAI
NewsResearchAgent fetches articles with this tool. AnalysisAgent identifies key themes and entities. WritingAgent drafts a briefing document with source citations and publication dates.
AutoGen
```python
# UserProxyAgent: "Summarize the last 100 news articles about EV battery technology"
# ResearchAgent: calls google_news_scraper tool → returns structured JSON
# SynthesisAgent: extracts key claims, publisher perspectives, and publication timeline
```
n8n / Make.com / Zapier
Cron trigger → Apify run → iterate Dataset items → filter for new link values since last run → push to Slack digest, Notion page, or HubSpot deal activity feed.
RAG Systems
```python
# 1. Fetch articles
articles = get_recent_news("renewable energy", n=500)

# 2. Create documents for vector store
from langchain.schema import Document

docs = [
    Document(
        page_content=f"{a['title']}. {a['description']}",
        metadata={
            "url": a.get("articleUrl") or a["link"],
            "source": a["sourceName"],
            "date": a["pubDate"],
        },
    )
    for a in articles
]

# 3. Embed and index
vectorstore.add_documents(docs)
```
Frequently Asked Questions
1. Do I need a Google API key or Google Cloud account?
No. The actor fetches public RSS endpoints from news.google.com — no API credentials required.
2. Why do I sometimes get fewer articles than requested?
There may not be enough distinct articles across RSS phases for the keyword. Inspect meta.uniqueAfterDedupe and meta.stoppedReason in OUTPUT.
3. When is articleUrl null?
Some Google News RSS entries don't include redirect parameters that allow URL resolution. Fall back to link for stable identification.
4. How does deduplication work across phases?
The actor tracks stable RSS identifiers and normalized URLs across all passes. Articles seen in multiple time-window phases are merged into a single row.
5. Can I search by country or language?
The current implementation uses default hl and gl parameters. Fork the actor for specific ceid locale pairs (e.g. ceid=GB%3Aen for UK English).
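For reference, a forked implementation might build the locale-specific feed URL like this (illustrative only — the stock actor keeps Google's defaults; hl/gl/ceid are Google's standard locale parameters):

```python
from urllib.parse import quote_plus

def news_rss_url(keyword, hl="en-GB", gl="GB", ceid="GB:en"):
    """Build a locale-specific Google News RSS search URL."""
    return (
        "https://news.google.com/rss/search"
        f"?q={quote_plus(keyword)}&hl={hl}&gl={gl}&ceid={quote_plus(ceid)}"
    )
```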
6. Is full article text included?
No — only RSS fields: title, snippet, source, date, and URL. Crawl articleUrl with a separate article fetcher to retrieve full text.
7. How fast are runs typically?
Seconds to low minutes depending on numberOfResults and Google RSS response times.
8. How does the actor handle 429 rate limiting?
Bounded retry attempts with backoff. Avoid launching excessive parallel runs from a single IP for the same keyword.
9. Does RESULTS_CSV open correctly in Excel?
Yes — RESULTS_CSV uses UTF-8 BOM encoding and quoted fields for Windows Excel compatibility.
10. Can I schedule hourly monitoring runs?
Yes — use Apify Schedules combined with webhooks to your notification stack.
11. Are publication dates reliable?
pubDate reflects what the RSS feed reports. Some publishers use the crawl date rather than original publication date.
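pubDate uses the RFC 2822 date format shown in the sample row above, which Python's standard library parses directly — handy for sorting or windowing rows:

```python
from email.utils import parsedate_to_datetime

def parse_pub_date(pub_date):
    """Convert an RSS pubDate string into a timezone-aware datetime."""
    return parsedate_to_datetime(pub_date)

dt = parse_pub_date("Wed, 07 May 2026 08:15:00 GMT")
```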
12. Can I combine results with other Scrapeify actors?
Yes — join Google News results with Maps, Amazon, or Ad Library actor outputs in your data warehouse by keyword or entity.
13. What input aliases are supported?
query, searchQuery, q for the keyword; maxResults for numberOfResults.
14. What causes an empty dataset with error rows?
Check message in pushed error items and OUTPUT.ok for details. Common causes: empty keyword, Google temporarily blocking the IP, or zero-result queries.
15. Can I use this for real-time news alerts?
Hourly runs are practical. For sub-minute latency, a dedicated news API is more appropriate.
16. How do I ingest into a vector database?
Use title + description as the text content. Store articleUrl, sourceName, keyword, and pubDate as metadata for filtering and citation.
17. What is the difference between link and articleUrl?
link is the Google News RSS URL — use as the stable dedup key. articleUrl is the resolved publisher URL — use as the citation link for downstream crawling and user-facing references.
18. Can I track which publishers cover a topic most?
Yes — aggregate sourceName values across Dataset rows. Sort by frequency to rank publishers by topic coverage volume.
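The aggregation is a one-liner with `collections.Counter` over Dataset rows:

```python
from collections import Counter

def top_publishers(items, n=10):
    """Rank publishers by coverage volume across Dataset rows."""
    return Counter(a["sourceName"] for a in items).most_common(n)

rows = [
    {"sourceName": "TechCrunch"},
    {"sourceName": "Reuters"},
    {"sourceName": "TechCrunch"},
]
```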
19. Does the actor support Google Alerts-style monitoring?
This actor provides structured rows for programmatic pipelines. For email digests, Google Alerts is a simpler option. For database-integrated monitoring and downstream automation, this actor is the better choice.
20. Is there an upper limit per keyword per run?
Yes — 2,000 unique articles per run (input validation). For broader coverage, run multiple passes across overlapping time windows with different when parameters.
21. How should I handle GDPR for article data?
Headlines and snippets may mention individuals. Apply your organization's data retention and classification policies to stored news corpora.
22. Can I retrieve articles from specific publishers?
Add site:publisher.com to the keyword query to target a specific domain in Google News search.
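For example, an input restricting results to a single (hypothetical) publisher domain:

```json
{
  "keyword": "\"supply chain\" site:reuters.com",
  "numberOfResults": 200
}
```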
23. What is meta.passesCompleted?
The number of RSS phase passes the actor completed (e.g. 1h, 7d, 30d, 1y windows). More passes generally yield broader coverage.
24. Does this include paywalled articles?
Only metadata (title, snippet, source, URL) is collected from RSS — no paywall bypass. Full text requires a separate article fetcher.
25. How do I build an idempotent monitoring pipeline?
Key on link or normalized articleUrl before inserting into your database. Compare new link sets against the previous run to identify net-new coverage.
Best Practices
- Stagger schedules — don't hammer RSS from many simultaneous tasks on one egress IP
- Key on `link` for idempotent pipelines before inserting into Postgres or vector stores
- Rate-limit downstream crawling — respect `robots.txt` and publisher terms when fetching full article text from `articleUrl`
- Start small — validate with `numberOfResults: 50` before scaling to 2,000
- Monitor `returnedCount` trends — alert on significant drops week-over-week for fixed keywords
- Archive `RESULTS_JSON` alongside `OUTPUT` for each scheduled run to enable historical diff analysis
- Use the `keyword` column for joins — it's echoed on every row, making multi-keyword batch pipelines easy to merge
Performance & Scalability
| Factor | Guidance |
|---|---|
| Throughput | HTTP-only; highly efficient for high-frequency scheduling |
| Upper bound | 2,000 deduplicated articles per run |
| Run time | Seconds to low minutes depending on RSS response latency and numberOfResults |
| Horizontal scale | Run parallel actors per keyword list — each is independent |
| Storage | Dataset is authoritative; RESULTS_CSV and RESULTS_JSON may be limited by KV size for large runs |
AI & Automation Workflows
3-stage retrieval pipeline:
- Stage 1 (this actor): headlines + snippets cheaply triage topic relevance
- Stage 2 (article fetcher): crawl `articleUrl` for full text on relevant articles
- Stage 3 (LLM): chunk, embed, and index full text; generate answers with `articleUrl` citations
Competitive briefing automation: Schedule weekly Google News runs for competitor brand names → extract key themes from titles and snippets using an LLM → generate competitive intelligence brief → post to Confluence or Notion.
Trend detection pipeline:
Daily runs for industry keywords → aggregate pubDate distribution → detect volume spikes indicating major news events → alert stakeholders before the news cycle peaks.
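A sketch of the spike check on already-fetched rows (daily counts keyed on pubDate's calendar day; the threshold factor is an arbitrary example, not part of the actor):

```python
from collections import Counter
from email.utils import parsedate_to_datetime

def daily_volume(items):
    """Count articles per calendar day from their pubDate fields."""
    return Counter(
        parsedate_to_datetime(a["pubDate"]).date().isoformat() for a in items
    )

def spike_days(volume, factor=3.0):
    """Flag days whose article count exceeds `factor` times the mean daily count."""
    if not volume:
        return []
    mean = sum(volume.values()) / len(volume)
    return sorted(day for day, n in volume.items() if n > factor * mean)
```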
Error Handling
| Scenario | Behavior |
|---|---|
| Missing or empty keyword | Error row in Dataset + OUTPUT.ok: false |
| Empty results | Completes with returnedCount = 0; meta.stoppedReason = exhausted |
| 429 rate limiting | Bounded retries with backoff; persistent failures surface in run logs |
| KV size limits | csv field in OUTPUT set to null; use RESULTS_CSV KV key or Dataset export |
| Transient HTTP errors | Retried per module constants; logged if persistent |
Trust & Reliability
Scrapeify maintains this actor for repeatable news monitoring with structured outputs, explicit dedup statistics, and clear storage keys — suitable for production automation when combined with appropriate compliance review and downstream content policies.
Related Scrapeify Actors
Explore the full Scrapeify suite — chain these actors together for end-to-end automation pipelines:
| Actor | What it does |
|---|---|
| Amazon Scraper | ASINs, prices, sponsored flags across 23 marketplaces |
| Instagram Ad Library Scraper | Instagram-only ads from Meta Ad Library |
| Meta Ad Library Scraper | Facebook & Instagram ads with sort options |
| WhatsApp Ad Scraper | Click-to-WhatsApp ad creatives |
| YouTube Video Downloader | Videos & audio to Apify Key-Value Store |
| Meta Brand & Page ID Finder | Resolve brand names to numeric Page IDs |
| Google Maps Scraper | Local business leads, reviews, emails, contacts |
Google News is a trademark of Google LLC. This actor is not affiliated with or endorsed by Google.