Google News Scraper Comprehensive

Pricing: from $5.00 / 1,000 results

The most advanced Google News scraping Actor on Apify—built for high-volume, deduplicated data extraction with precision. It overcomes RSS limits using smart multi-window pagination, ensuring maximum coverage. Outputs clean, Excel-ready datasets ideal for research, monitoring, and automation.

Rating: 5.0 (4)

Developer: Shop Intel (maintained by Community)

Actor stats: 2 bookmarked · 0 total users · 0 monthly active users · last modified 7 days ago

Google News Scraper

Apify actor: [Add your published Store URL here, e.g. https://apify.com/your-org/google-news-scraper]

Google News Scraper collects headlines, links, dates, sources, and snippets from Google News via its public RSS layer, then hands you a Dataset and a spreadsheet-friendly CSV—well suited to monitoring, research, and reporting pipelines.

At a glance

  • What it does: Fetches news.google.com/rss/search for your keyword, merges several time windows and optional rel="next" pages, and dedupes by guid / link so rows stay unique.
  • Input: keyword + numberOfResults (1–2000); the aliases query, searchQuery, q (for keyword) and maxResults, results (for numberOfResults) are also accepted.
  • Outputs: Dataset (one article per row), RESULTS_CSV (UTF‑8 BOM, quoted, Excel-friendly), RESULTS_JSON, OUTPUT (includes meta.stoppedReason).
  • Locale: This version builds feeds with US + English (en-US / US) for stable, repeatable URLs.
  • Independence: Not affiliated with Google.
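
The dedupe step mentioned above can be pictured with a minimal sketch. It assumes each feed item has already been parsed into a dict with optional guid and link keys; the function name dedupe_items is hypothetical and this is not the actor's actual code.

```python
from typing import Iterable


def dedupe_items(items: Iterable[dict]) -> list[dict]:
    """Keep the first occurrence of each article, keyed by guid, else link."""
    seen: set[str] = set()
    unique: list[dict] = []
    for item in items:
        # Prefer the feed's guid; fall back to the link when guid is missing.
        key = item.get("guid") or item.get("link")
        if not key:
            # Assumption: items with neither field cannot be compared, so keep them.
            unique.append(item)
            continue
        if key in seen:
            continue
        seen.add(key)
        unique.append(item)
    return unique
```

Because the first occurrence wins, the order in which time windows are merged decides which copy of a story is kept.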

Highlights

| Why teams use it | What you get |
| --- | --- |
| Simple input | Keyword plus how many unique articles to collect: two fields in the default form. |
| High volume | Ask for up to 2,000 unique articles per run; the actor dedupes across feeds automatically. |
| Excel-ready export | RESULTS_CSV uses UTF-8 BOM, quoted fields, and CRLF, so it opens cleanly in Excel and Google Sheets. |
| Smarter than one RSS hit | Merges multiple time windows (all, 7d, 30d, 1y, 1h) and follows rel="next" when Google sends it, so you get depth without duplicate rows. |
| Publisher-friendly fields | Title, dates, source name, snippet, Google News link, and a best-effort direct article URL when the feed exposes it. |
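
To make the multi-window idea concrete, here is a rough sketch of how one feed URL per time window could be built. It is an illustration only: expressing a window as a `when:` search operator and sending a `ceid` parameter are assumptions, and build_feed_url is a hypothetical helper, not a function from this actor.

```python
from urllib.parse import urlencode

RSS_SEARCH = "https://news.google.com/rss/search"
# Window labels as listed in this README; "all" means no time filter.
WINDOWS = ["all", "7d", "30d", "1y", "1h"]


def build_feed_url(keyword: str, window: str = "all") -> str:
    """Return an RSS search URL for one keyword and one time window."""
    # Assumption: a window is requested by appending "when:<window>" to the query.
    query = keyword if window == "all" else f"{keyword} when:{window}"
    # hl / gl are pinned to US English; ceid is an assumed companion parameter.
    params = {"q": query, "hl": "en-US", "gl": "US", "ceid": "US:en"}
    return f"{RSS_SEARCH}?{urlencode(params)}"


def build_all_feed_urls(keyword: str) -> list[str]:
    """One URL per window, in the order the windows are listed."""
    return [build_feed_url(keyword, window) for window in WINDOWS]
```

Fetching each of these URLs and merging the items through a dedupe step is what lets a run return more unique rows than a single feed response.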

Features

| Area | What this actor gives you |
| --- | --- |
| Search | Single keyword (or phrase), the same idea as typing in Google News search. |
| Volume | 1–2,000 unique articles per run (numberOfResults). |
| Locale | Runs with US / English RSS parameters (hl / gl) for consistent, reproducible feed URLs. |
| Deduping | Same story appearing in multiple passes is kept once (by guid or link/title). |
| Feeds | Uses news.google.com/rss/search with automatic phase and pagination logic when available. |
| Outputs | Dataset (one JSON object per article) + RESULTS_CSV + RESULTS_JSON + run OUTPUT summary. |
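
For readers curious what reading such a feed involves, the sketch below parses RSS items with the standard library and looks for a rel="next" link. It assumes the usual RSS 2.0 item layout and that the next-page hint, when present, arrives as an Atom link element; parse_items and find_next_url are hypothetical names, not the actor's internals.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: rel="next" is delivered as an Atom <link> element in the channel.
ATOM_LINK = "{http://www.w3.org/2005/Atom}link"


def parse_items(rss_xml: str) -> list[dict]:
    """Extract the basic fields from every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        source = item.find("source")
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "guid": item.findtext("guid", default=""),
            "pubDate": item.findtext("pubDate", default=""),
            # The outlet name is only filled in when the feed provides it.
            "sourceName": source.text if source is not None and source.text else "",
        })
    return rows


def find_next_url(rss_xml: str) -> Optional[str]:
    """Return the href of a rel="next" link, or None when the feed has none."""
    root = ET.fromstring(rss_xml)
    for link in root.iter(ATOM_LINK):
        if link.get("rel") == "next":
            return link.get("href")
    return None
```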

Input

Where: Apify → this actor → Input tab (form or JSON).

Required fields:

| Field | Type | Description |
| --- | --- | --- |
| keyword | string | What to search on Google News (e.g. renewable energy, quarterly earnings, World Cup). |
| numberOfResults | integer | How many unique articles to collect (1–2000). |

Aliases (for API tasks and older scripts):

| Canonical | Also accepted |
| --- | --- |
| keyword | query, searchQuery, q |
| numberOfResults | maxResults, results |

Note: Region and language are fixed in this version to US / English for predictable RSS behavior. Every field in the input schema is wired into the scraper — there are no decorative placeholders.
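
If you build inputs programmatically, a small normalizer can map the aliases to the canonical names before a run. This is a sketch under stated assumptions: canonical names take precedence over aliases, and out-of-range counts raise an error instead of being clamped. The actor's real behavior in those cases is not documented here, and normalize_input is a hypothetical helper.

```python
KEYWORD_ALIASES = ("keyword", "query", "searchQuery", "q")
COUNT_ALIASES = ("numberOfResults", "maxResults", "results")


def normalize_input(raw: dict) -> dict:
    """Map accepted aliases to the canonical keyword / numberOfResults pair."""
    # Assumption: the first name present wins, canonical names first.
    keyword = next((raw[k] for k in KEYWORD_ALIASES if raw.get(k)), None)
    count = next((raw[k] for k in COUNT_ALIASES if raw.get(k) is not None), None)
    if not keyword:
        raise ValueError("keyword is required")
    if count is None:
        raise ValueError("numberOfResults is required")
    count = int(count)
    # Assumption: reject values outside the documented 1-2000 range.
    if not 1 <= count <= 2000:
        raise ValueError("numberOfResults must be between 1 and 2000")
    return {"keyword": str(keyword).strip(), "numberOfResults": count}
```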


Output

Everything is attached to the run you started.

Where to download

| Output | What it is | Where in Apify |
| --- | --- | --- |
| Dataset | One JSON record per article; the main export | Run → Dataset → Export (JSON, CSV, Excel, …) |
| RESULTS_CSV | Excel-friendly CSV (UTF-8 BOM, all fields quoted, CRLF) | Run → Storage → Key-value store → RESULTS_CSV |
| RESULTS_JSON | Summary plus articles array when size allows; otherwise a compact summary pointing at the Dataset | Storage → RESULTS_JSON |
| OUTPUT | Run summary: keyword, requested vs returned count, meta (phases, stop reason), optional embedded CSV for small runs | Storage → OUTPUT |
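
The three CSV properties named for RESULTS_CSV (UTF-8 BOM, every field quoted, CRLF line endings) are easy to reproduce if you need to regenerate the file from Dataset items. The following is a minimal sketch using Python's csv module; the column order is an assumption based on the field list in this README, and to_excel_csv is a hypothetical helper.

```python
import csv
import io

# Assumed column order; guid is left out because the README lists it for the Dataset JSON only.
COLUMNS = ["position", "keyword", "title", "link", "articleUrl",
           "pubDate", "sourceName", "description"]


def to_excel_csv(rows: list[dict]) -> bytes:
    """Serialize rows as CSV with a UTF-8 BOM, all fields quoted, CRLF endings."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(
        buf,
        fieldnames=COLUMNS,
        quoting=csv.QUOTE_ALL,    # quote every field, including numbers
        lineterminator="\r\n",    # CRLF line endings
        extrasaction="ignore",    # drop keys that are not CSV columns
    )
    writer.writeheader()
    writer.writerows(rows)
    # "utf-8-sig" prepends the byte order mark that Excel uses to detect UTF-8.
    return buf.getvalue().encode("utf-8-sig")
```

Writing the returned bytes to a file in binary mode keeps the BOM and line endings intact on every platform.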

What each article row contains

| Field | Meaning |
| --- | --- |
| position | 1-based rank in this run |
| keyword | The search keyword used |
| title | Headline (HTML stripped) |
| link | Link as returned in the RSS feed (often a Google News URL) |
| articleUrl | Best-effort publisher URL parsed from Google’s redirect when possible; otherwise same as link |
| pubDate | Publication/update time string from the feed |
| sourceName | Outlet name when the feed provides it |
| description | Snippet / summary text (HTML stripped) |
| guid | Stable id from the feed when present (in Dataset JSON; CSV uses the standard export columns above) |
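
Since pubDate is passed through as a string, downstream code usually wants a real timestamp. RSS 2.0 dates follow the RFC 822 style, so a sketch like the one below should handle typical values; it assumes the actor does not reformat the string, and parse_pub_date is a hypothetical helper.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_pub_date(pub_date: str) -> Optional[datetime]:
    """Parse an RFC 822 style date string; return None when it cannot be parsed."""
    try:
        return parsedate_to_datetime(pub_date)
    except (TypeError, ValueError):
        # Different Python versions raise different errors for malformed input.
        return None
```

Returning None instead of raising keeps one malformed row from aborting a whole import.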

Check OUTPUT.meta.stoppedReason after a run:

  • completed — reached the requested count (or capped by available unique items).
  • partial_no_more_feed — fewer unique articles existed than requested.
  • no_items_or_http_error — empty result or HTTP issue; see logs.
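
In an automated pipeline, these three values can be turned into a short verdict before the data is used. The sketch below assumes OUTPUT has been loaded as a dict with a nested meta object, which matches the OUTPUT.meta.stoppedReason path used in this README; describe_stop_reason is a hypothetical helper.

```python
def describe_stop_reason(output: dict) -> str:
    """Map OUTPUT.meta.stoppedReason to a short, human-readable verdict."""
    reason = (output.get("meta") or {}).get("stoppedReason")
    if reason == "completed":
        return "ok: requested count reached or capped by available unique items"
    if reason == "partial_no_more_feed":
        return "partial: feeds exhausted before the requested count"
    if reason == "no_items_or_http_error":
        return "failed: empty result or HTTP issue, check the run log"
    # Assumption: any other value is unexpected and is surfaced as-is.
    return f"unknown: {reason!r}"
```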

Quick start

  1. Open the actor → Input.
  2. Set Keyword (e.g. artificial intelligence).
  3. Set Number of results (e.g. 50).
  4. Click Start.
  5. When the run finishes: Dataset → Export → CSV/Excel, or download RESULTS_CSV from Storage.
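
The same steps can be scripted. The sketch below assumes the apify-client Python package, whose client exposes actor(...).call(run_input=...) and dataset(...).iterate_items(); the client is passed in as an argument so the helper can be tried without network access. The actor ID shown in the comment is the placeholder from this README, not a confirmed Store ID, and run_and_collect is a hypothetical helper.

```python
def run_and_collect(client, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for it to finish, and return its Dataset items."""
    # call() blocks until the run ends and returns the run record.
    run = client.actor(actor_id).call(run_input=run_input)
    # Each run writes its articles to its own default Dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage sketch (requires `pip install apify-client` and a valid API token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   rows = run_and_collect(
#       client,
#       "your-org/google-news-scraper",  # placeholder ID, replace with the real one
#       {"keyword": "artificial intelligence", "numberOfResults": 50},
#   )
```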

Example inputs (JSON)

Default-style run

{
  "keyword": "artificial intelligence",
  "numberOfResults": 30
}

Larger pull (deduped across feed passes)

{
  "keyword": "climate policy",
  "numberOfResults": 500
}

Using aliases

{
  "q": "semiconductor supply chain",
  "maxResults": 100
}

FAQ

Does this call the Google News API (Cloud product)?
No. It reads the public RSS/XML feeds on news.google.com and does not call any Google Cloud API.

Why fewer rows than numberOfResults?
The index for a query may not yield that many distinct stories. The actor stops when feeds are exhausted; check OUTPUT.meta.stoppedReason (partial_no_more_feed vs completed).

Can I change country or language?
Not in this release—the scraper fixes hl / gl to English / US so URLs stay predictable.

Is articleUrl always the publisher URL?
Often the feed gives a Google News link; the actor tries to unwrap the publisher URL when the redirect allows it. If not, articleUrl stays the same as link.

Legal use
Honor Google’s terms, outlet robots/ToS, and copyright when storing or republishing full text.


Google News RSS scraper Apify · news monitoring automation · export Google News to CSV · deduplicated article list · keyword news dataset · bulk news links for research · meta.stoppedReason run diagnostics


Local development

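# Create the virtualenv and run the tests
make venv && make test
# Seed the local key-value store with the actor input
mkdir -p storage/key_value_stores/default && cp input.json storage/key_value_stores/default/INPUT.json
# Run the actor against the local storage directory
APIFY_LOCAL_STORAGE_DIR=./storage .venv/bin/python -m src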

FastAd — automation & growth tooling.