Google News Scraper Comprehensive
Pricing: from $5.00 / 1,000 results
The most advanced Google News scraping Actor on Apify—built for high-volume, deduplicated data extraction with precision. It overcomes RSS limits using smart multi-window pagination, ensuring maximum coverage. Outputs clean, Excel-ready datasets ideal for research, monitoring, and automation.
Rating: 5.0 (4) · Developer: Shop Intel · Bookmarked: 2 · Total users: 0 · Monthly active users: 0 · Last modified: 7 days ago
Google News Scraper
Apify actor: [Add your published Store URL here, e.g. https://apify.com/your-org/google-news-scraper]
Google News Scraper collects headlines, links, dates, sources, and snippets from Google News via its public RSS layer, then hands you a Dataset and a spreadsheet-friendly CSV—well suited to monitoring, research, and reporting pipelines.
At a glance
- What it does: Fetches `news.google.com/rss/search` for your keyword, merges several time windows and optional `rel="next"` pages, and dedupes by `guid` / link so rows stay unique.
- Input: `keyword` + `numberOfResults` (1–2000); aliases `q`, `maxResults`, `results` supported.
- Outputs: Dataset (one article per row), `RESULTS_CSV` (UTF‑8 BOM, quoted, Excel-friendly), `RESULTS_JSON`, `OUTPUT` (includes `meta.stoppedReason`).
- Locale: This version builds feeds with US + English (`en-US` / `US`) for stable, repeatable URLs.
- Independence: Not affiliated with Google.
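To make the multi-window idea concrete, here is a minimal sketch of how one search feed URL per time window could be built. It is not the actor's source: the function names are hypothetical, the `ceid=US:en` parameter is an assumption (it is commonly paired with `hl`/`gl` on Google News feeds), and expressing the 7d/30d/1y/1h windows through the `when:` search operator is likewise an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://news.google.com/rss/search"

# Hypothetical window labels mirroring the ones listed in this README.
# Assumption: every window except "all" is expressed with the "when:" operator.
WINDOWS = ["all", "7d", "30d", "1y", "1h"]


def build_feed_url(keyword: str, window: str = "all") -> str:
    """Build a US/English Google News RSS search URL for one time window."""
    query = keyword if window == "all" else f"{keyword} when:{window}"
    params = {
        "q": query,
        "hl": "en-US",    # interface language, fixed to English in this version
        "gl": "US",       # country, fixed to US in this version
        "ceid": "US:en",  # assumption: edition id usually sent alongside hl/gl
    }
    return f"{BASE_URL}?{urlencode(params)}"


def build_feed_urls(keyword: str) -> list[str]:
    """One URL per window; the items are merged and deduplicated afterwards."""
    return [build_feed_url(keyword, window) for window in WINDOWS]
```

Keeping `hl`/`gl` constant is what makes the URLs reproducible: the same keyword always yields the same set of feed URLs.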
Highlights
| Why teams use it | What you get |
|---|---|
| Simple input | Keyword plus how many unique articles to collect—two fields in the default form. |
| High volume | Ask for up to 2,000 unique articles per run; the actor dedupes across feeds automatically. |
| Excel-ready export | RESULTS_CSV uses UTF-8 BOM, quoted fields, and CRLF — opens cleanly in Excel and Google Sheets. |
| Smarter than one RSS hit | Merges multiple time windows (all, 7d, 30d, 1y, 1h) and follows rel="next" when Google sends it, so you get depth without duplicate rows. |
| Publisher-friendly fields | Title, dates, source name, snippet, Google News link, and a best-effort direct article URL when the feed exposes it. |
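The "Excel-ready export" row comes down to three settings: a UTF-8 byte-order mark, every field quoted, and CRLF line endings. A minimal sketch with Python's standard `csv` module follows. The column list is an assumption based on the article-field table in the Output section, and `to_excel_csv` is a hypothetical helper, not the actor's own writer.

```python
import csv
import io

# Assumed column order, taken from the article-field table (guid left out).
COLUMNS = ["position", "keyword", "title", "link", "articleUrl",
           "pubDate", "sourceName", "description"]


def to_excel_csv(rows: list[dict]) -> bytes:
    """Serialize rows as UTF-8 with BOM, every field quoted, CRLF line endings."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=COLUMNS,
        quoting=csv.QUOTE_ALL,   # quote every field, including numbers
        lineterminator="\r\n",   # CRLF row endings
        extrasaction="ignore",   # silently skip keys such as guid
    )
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the BOM that tells Excel the file is UTF-8.
    return buffer.getvalue().encode("utf-8-sig")
```

Quoting every field (rather than only fields containing commas) keeps headlines with commas, quotes, or leading zeros from being split or reinterpreted by spreadsheet imports.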
Features
| Area | What this actor gives you |
|---|---|
| Search | Single keyword (or phrase) — same idea as typing in Google News search. |
| Volume | 1–2,000 unique articles per run (numberOfResults). |
| Locale | Runs with US / English RSS parameters (hl / gl) for consistent, reproducible feed URLs. |
| Deduping | Same story appearing in multiple passes is kept once (by guid or link/title). |
| Feeds | Uses news.google.com/rss/search with automatic phase and pagination logic when available. |
| Outputs | Dataset (one JSON object per article) + RESULTS_CSV + RESULTS_JSON + run OUTPUT summary. |
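Deduping across passes can be sketched as an order-preserving merge keyed on `guid`, then link, then a normalized title, stopping once the requested count is reached. The helper names and the exact normalization (lowercasing and collapsing whitespace in titles) are assumptions for illustration.

```python
from collections.abc import Iterable


def dedupe_key(item: dict) -> str:
    """Prefer guid, then link, then a normalized title (hypothetical rule)."""
    for field in ("guid", "link"):
        value = (item.get(field) or "").strip()
        if value:
            return f"{field}:{value}"
    return "title:" + " ".join((item.get("title") or "").lower().split())


def merge_unique(passes: Iterable[list[dict]], limit: int) -> list[dict]:
    """Merge items from several feed passes, keeping the first copy of each story."""
    seen: set[str] = set()
    merged: list[dict] = []
    for items in passes:
        for item in items:
            key = dedupe_key(item)
            if key in seen:
                continue  # same story already collected in an earlier pass
            seen.add(key)
            merged.append(item)
            if len(merged) >= limit:
                return merged
    return merged
```

Because earlier passes win, the order of passes decides which copy of a repeated story is kept.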
Input
Where: Apify → this actor → Input tab (form or JSON).
Required fields:
| Field | Type | Description |
|---|---|---|
| keyword | string | What to search on Google News (e.g. renewable energy, quarterly earnings, World Cup). |
| numberOfResults | integer | How many unique articles to collect (1–2000). |
Aliases (for API tasks and older scripts):
| Canonical | Also accepted |
|---|---|
| keyword | query, searchQuery, q |
| numberOfResults | maxResults, results |
Note: Region and language are fixed in this version to US / English for predictable RSS behavior. Every field in the input schema is wired into the scraper — there are no decorative placeholders.
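The alias table can be pictured as a small normalization step that runs before scraping. This is a hypothetical sketch, not the actor's code: the precedence (canonical name first, then aliases in the order listed) and clamping out-of-range counts into 1–2000 instead of rejecting them are assumptions.

```python
# Canonical field -> accepted names, checked in this (assumed) order.
ALIASES = {
    "keyword": ("keyword", "query", "searchQuery", "q"),
    "numberOfResults": ("numberOfResults", "maxResults", "results"),
}


def first_present(raw: dict, names: tuple[str, ...]):
    """Return the first non-empty value among the accepted names."""
    for name in names:
        if raw.get(name) not in (None, ""):
            return raw[name]
    return None


def normalize_input(raw: dict) -> dict:
    """Map alias fields onto canonical names and clamp the result count."""
    keyword = first_present(raw, ALIASES["keyword"])
    if keyword is None or not str(keyword).strip():
        raise ValueError("keyword (or query/searchQuery/q) is required")
    count = first_present(raw, ALIASES["numberOfResults"])
    if count is None:
        raise ValueError("numberOfResults (or maxResults/results) is required")
    # Assumption: out-of-range values are clamped rather than rejected.
    return {"keyword": str(keyword).strip(),
            "numberOfResults": max(1, min(2000, int(count)))}
```

With this mapping, `{"q": "semiconductor supply chain", "maxResults": 100}` ends up equivalent to the canonical form.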
Output
Everything is attached to the run you started.
Where to download
| Output | What it is | Where in Apify |
|---|---|---|
| Dataset | One JSON record per article — main export | Run → Dataset → Export (JSON, CSV, Excel, …) |
| RESULTS_CSV | Excel-friendly CSV (UTF-8 BOM, all fields quoted, CRLF) | Run → Storage → Key-value store → RESULTS_CSV |
| RESULTS_JSON | Summary plus articles array when size allows; otherwise a compact summary pointing at the Dataset | Storage → RESULTS_JSON |
| OUTPUT | Run summary: keyword, requested vs returned count, meta (phases, stop reason), optional embedded CSV for small runs | Storage → OUTPUT |
What each article row contains
| Field | Meaning |
|---|---|
| position | 1-based rank in this run |
| keyword | The search keyword used |
| title | Headline (HTML stripped) |
| link | Link as returned in the RSS feed (often a Google News URL) |
| articleUrl | Best-effort publisher URL parsed from Google’s redirect when possible; otherwise same as link |
| pubDate | Publication/update time string from the feed |
| sourceName | Outlet name when the feed provides it |
| description | Snippet / summary text (HTML stripped) |
| guid | Stable ID from the feed when present (included in the Dataset JSON; RESULTS_CSV uses only the columns listed above) |
Check OUTPUT → meta.stoppedReason after a run:
- `completed`: reached the requested count (or capped by available unique items).
- `partial_no_more_feed`: fewer unique articles existed than requested.
- `no_items_or_http_error`: empty result or HTTP issue; see logs.
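In an automated pipeline, these values can gate what happens next. The sketch below assumes only that the downloaded `OUTPUT` record has the `meta.stoppedReason` path documented here; `check_run` is a hypothetical helper, and treating a partial run as usable is a policy choice, not something the actor prescribes.

```python
# Messages paraphrase the stoppedReason values documented above.
STOP_REASONS = {
    "completed": "reached the requested count (or all available unique items)",
    "partial_no_more_feed": "feeds ran out before the requested count",
    "no_items_or_http_error": "empty result or HTTP problem; check the run log",
}


def check_run(output: dict) -> tuple[bool, str]:
    """Return (usable, message) from an OUTPUT record's meta.stoppedReason."""
    reason = (output.get("meta") or {}).get("stoppedReason", "")
    message = STOP_REASONS.get(reason, f"unknown stoppedReason: {reason!r}")
    # A partial run still produced deduplicated rows, so only the empty/HTTP
    # case (or an unrecognized value) is treated as unusable here.
    usable = reason in STOP_REASONS and reason != "no_items_or_http_error"
    return usable, message
```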
Quick start
- Open the actor → Input.
- Set Keyword (e.g. `artificial intelligence`).
- Set Number of results (e.g. `50`).
- Click Start.
- When the run finishes: Dataset → Export → CSV/Excel, or download `RESULTS_CSV` from Storage.
Example inputs (JSON)
Default-style run
```json
{
  "keyword": "artificial intelligence",
  "numberOfResults": 30
}
```
Larger pull (deduped across feed passes)
```json
{
  "keyword": "climate policy",
  "numberOfResults": 500
}
```
Using aliases
```json
{
  "q": "semiconductor supply chain",
  "maxResults": 100
}
```
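The same JSON inputs can be sent through the Apify REST API instead of the Console form. The sketch below prepares a request to the `run-sync-get-dataset-items` endpoint, which runs the actor and returns the Dataset items in the response. The actor ID `your-org~google-news-scraper` is a placeholder (this README's Store URL is not filled in), and `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Prepare a synchronous run that returns the Dataset items as JSON."""
    # The API addresses actors as "owner~name"; accept "owner/name" too.
    actor_path = quote(actor_id.replace("/", "~"), safe="~")
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_path}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


# Placeholder actor ID and token; replace with your own before sending.
request = build_run_request("your-org/google-news-scraper", "<APIFY_TOKEN>",
                            {"q": "semiconductor supply chain", "maxResults": 100})
# with urllib.request.urlopen(request, timeout=300) as response:
#     articles = json.load(response)
```

Synchronous endpoints only wait a limited time, so for large pulls (for example 500+ results) starting an asynchronous run and reading the Dataset afterwards is likely the safer pattern.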
FAQ
Does this call the Google News API (Cloud product)?
No. It uses RSS/XML from news.google.com, which is a different surface from Google’s paid News API.
Why fewer rows than numberOfResults?
The index for a query may not yield that many distinct stories. The actor stops when feeds are exhausted; check OUTPUT.meta.stoppedReason (partial_no_more_feed vs completed).
Can I change country or language?
Not in this release—the scraper fixes hl / gl to English / US so URLs stay predictable.
Is articleUrl always the publisher URL?
Often the feed gives a Google News link; the actor tries to unwrap the publisher URL when the redirect allows it. If not, articleUrl stays the same as link.
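A best-effort unwrap with a safe fallback might look like the sketch below. It only handles redirect-style links that carry the publisher address in a `url` query parameter; that pattern, and the helper name, are assumptions for illustration. Opaque `news.google.com/rss/articles/...` links are returned unchanged, matching the fallback described in this answer.

```python
from urllib.parse import parse_qs, urlparse


def best_effort_article_url(link: str) -> str:
    """Return a publisher URL from a `url=` query parameter, else the link itself."""
    try:
        params = parse_qs(urlparse(link).query)
    except ValueError:
        return link  # malformed URL: keep whatever the feed gave us
    for candidate in params.get("url", []):
        # Accept only absolute http(s) targets; ignore anything else.
        if urlparse(candidate).scheme in ("http", "https"):
            return candidate
    return link
```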
Legal use
Honor Google’s terms, outlet robots/ToS, and copyright when storing or republishing full text.
Local development
```bash
make venv && make test
APIFY_LOCAL_STORAGE_DIR=./storage mkdir -p storage/key_value_stores/default && cp input.json storage/key_value_stores/default/INPUT.json
APIFY_LOCAL_STORAGE_DIR=./storage .venv/bin/python -m src
```
FastAd — automation & growth tooling.