Google News Scraper - Search, Topics & Headlines

Pricing

from $2.00 / 1,000 results

Scrape Google News without an API key: search queries, topic/geo headlines, and top stories. Optionally resolve real publisher URLs and extract article text. Export JSON/CSV/Excel.

Developer

Shahryar

Maintained by Community

Google News Scraper – Search, Topics, Headlines & Article Text (No API Key)

The Google News scraper collects news headlines from Google News without an API key or login: run search queries, pull topic and geo headline feeds, and grab the front-page top stories — in any language/country edition. Optionally resolve the real publisher URL behind each Google News redirect and extract the full article text, author, and publish date. Use it as a Google News API alternative and export to JSON, CSV, or Excel.

Built for media monitoring, brand and PR tracking, SEO and content research, and news aggregation — a fast Google News data extractor that turns public RSS feeds into clean, structured records.

Pairs with the Hacker News Scraper for full news + tech-community monitoring in one pipeline.

Why this scraper

Google News exposes clean RSS feeds, but two things make them awkward to use directly: every link is a news.google.com redirect (not the publisher), and the feeds are point-in-time snapshots with no pagination. This Actor parses the feeds into structured records, optionally resolves the redirect to the real publisher URL via Google's current 2-step batchexecute flow, and lets you broaden coverage by splitting queries with Google News operators (when:, site:, date ranges). It promises only what it can deliver: redirect resolution and article-text extraction are best-effort, and a failure in either never fails the run.

What it does

  • 🔎 Search feeds – each query becomes a /rss/search?q= feed. Supports Google News operators: when:7d, site:reuters.com, intitle:ai, before:/after: dates, boolean OR, exact "quotes", and -exclusions.
  • 🗂️ Topic feeds – WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH.
  • 📍 Geo feeds – place-based headlines, e.g. London, San Francisco.
  • 🏠 Top stories – the front-page feed for the chosen edition.
  • 🌍 Any edition – language + country drive hl, gl, and the matching ceid.
  • 🔗 Real publisher URLs (optional) – resolve each news.google.com redirect to the actual article URL (best-effort).
  • 📝 Article text (optional) – fetch the resolved page and extract main body text, author, and published date (best-effort; publishers vary).
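
To make the search-feed and edition bullets concrete, here is a minimal Python sketch of how a query plus language and country could be turned into a feed URL. The /rss/search?q= path, the hl and gl parameters, and the {COUNTRY}:{language-base} rule for ceid come from this README; the function names and the use of urlencode for percent-encoding are illustrative assumptions, not the Actor's actual code.

```python
from urllib.parse import urlencode

BASE = "https://news.google.com/rss"


def build_ceid(language: str, country: str) -> str:
    """Build ceid as {COUNTRY}:{language-base}, e.g. en-US + US -> US:en."""
    language_base = language.split("-")[0]
    return f"{country.upper()}:{language_base}"


def search_feed_url(query: str, language: str = "en-US", country: str = "US") -> str:
    """Return the search feed URL for one query (hypothetical helper)."""
    params = {
        "q": query,  # operators such as when:7d or site: stay inside q
        "hl": language,
        "gl": country.upper(),
        "ceid": build_ceid(language, country),
    }
    return f"{BASE}/search?{urlencode(params)}"


print(search_feed_url("tesla when:7d"))
```

Keeping the operators inside q means a query string from the input example can be passed through unchanged.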

Example input

{
  "queries": ["artificial intelligence", "tesla when:7d", "site:reuters.com climate"],
  "topics": ["TECHNOLOGY", "BUSINESS"],
  "geoLocations": ["London"],
  "includeTopStories": false,
  "language": "en-US",
  "country": "US",
  "maxItemsPerQuery": 100,
  "maxItems": 1000,
  "resolveArticleUrls": false,
  "scrapeArticleText": false,
  "proxyConfiguration": { "useApifyProxy": true }
}

Example output (article)

{
  "type": "article",
  "title": "OpenAI announces new model",
  "link": "https://news.google.com/rss/articles/CBMi…?oc=5",
  "googleNewsUrl": "https://news.google.com/rss/articles/CBMi…?oc=5",
  "guid": "CBMi…",
  "articleId": "CBMi…",
  "source": "Reuters",
  "sourceUrl": "https://www.reuters.com",
  "publishedAt": "2026-06-17T22:55:21.000Z",
  "pubDate": "Wed, 17 Jun 2026 22:55:21 GMT",
  "snippet": "OpenAI announced … Reuters",
  "query": "artificial intelligence",
  "feedType": "search",
  "language": "en-US",
  "country": "US",
  "scrapedAt": "2026-06-20T10:00:00.000Z"
}

Example output (with resolveArticleUrls + scrapeArticleText)

When those options are enabled, link becomes the real publisher URL (when resolution succeeds) and extra fields are added:

{
  "type": "article",
  "title": "OpenAI announces new model",
  "link": "https://www.reuters.com/technology/openai-…",
  "googleNewsUrl": "https://news.google.com/rss/articles/CBMi…?oc=5",
  "source": "Reuters",
  "publishedAt": "2026-06-17T22:55:21.000Z",
  "articleResolved": true,
  "articleText": "OpenAI on Wednesday announced …\n\n…",
  "articleAuthor": "Jane Doe",
  "articlePublishedAt": "2026-06-17T22:50:00.000Z",
  "feedType": "search",
  "language": "en-US",
  "country": "US"
}

Output fields

Every item is pushed with type: "article". The fields below are exactly what the code emits — nothing more, nothing less.

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "article" (the only output item type). |
| title | string | Headline text. |
| link | string | Real publisher URL when resolveArticleUrls succeeds, otherwise the Google News redirect. |
| googleNewsUrl | string | Always the original news.google.com redirect. |
| guid | string | The article GUID (CBMi id, isPermaLink="false"). |
| articleId | string | The CBMi article id extracted from the guid/link. |
| source | string | Publisher display name (from <source>). |
| sourceUrl | string | Publisher homepage (from the <source url="…"> attribute). |
| publishedAt | string | Publish time as ISO 8601 (parsed from RFC-822 <pubDate>); null if unparseable. |
| pubDate | string | Raw <pubDate> string. |
| snippet | string | Plain-text snippet stripped from the <description> HTML. |
| query | string | Originating query / topic / place (null for top stories). |
| feedType | string | search, topic, geo, or top. |
| language | string | Language used for the feed. |
| country | string | Country used for the feed. |
| scrapedAt | string | ISO 8601 timestamp of when the item was collected. |
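
As an illustration of how several of these fields relate to the raw feed, the following Python sketch parses one RSS item into a record. It is a simplified approximation under stated assumptions, not the Actor's implementation: the sample item is invented (with EXAMPLE standing in for a real article id), the helper names are hypothetical, and the snippet is produced with a plain regex tag-strip.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timezone
from email.utils import parsedate_to_datetime
from html import unescape

# Invented sample item; a real feed carries a long CBMi... id instead of EXAMPLE.
SAMPLE = """<rss version="2.0"><channel><item>
<title>OpenAI announces new model</title>
<link>https://news.google.com/rss/articles/EXAMPLE?oc=5</link>
<guid isPermaLink="false">EXAMPLE</guid>
<pubDate>Wed, 17 Jun 2026 22:55:21 GMT</pubDate>
<description>&lt;a href="https://news.google.com/rss/articles/EXAMPLE"&gt;OpenAI announces new model&lt;/a&gt;&amp;nbsp;&amp;nbsp;&lt;font&gt;Reuters&lt;/font&gt;</description>
<source url="https://www.reuters.com">Reuters</source>
</item></channel></rss>"""


def to_iso(pub_date):
    """Convert an RFC-822 date to ISO 8601 in UTC; return None if unparseable."""
    if not pub_date:
        return None
    try:
        parsed = parsedate_to_datetime(pub_date)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:  # treat a missing offset as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    parsed = parsed.astimezone(timezone.utc)
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{parsed.microsecond // 1000:03d}Z"


def strip_html(fragment):
    """Drop tags, decode entities, and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", fragment or "")
    return re.sub(r"\s+", " ", unescape(text)).strip()


def parse_item(item):
    """Map one <item> element to a subset of the output fields."""
    source = item.find("source")
    pub_date = item.findtext("pubDate")
    return {
        "type": "article",
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "guid": item.findtext("guid"),
        "source": source.text if source is not None else None,
        "sourceUrl": source.get("url") if source is not None else None,
        "publishedAt": to_iso(pub_date),
        "pubDate": pub_date,
        "snippet": strip_html(item.findtext("description")),
    }


record = parse_item(ET.fromstring(SAMPLE).find("./channel/item"))
print(record["publishedAt"], "|", record["snippet"])
```

The sketch leaves out googleNewsUrl, articleId, query, feedType, language, country, and scrapedAt, which depend on run context rather than on the item XML alone.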

Extra fields when resolveArticleUrls is enabled

| Field | Type | Description |
| --- | --- | --- |
| articleResolved | boolean | Whether the redirect was resolved to a real publisher URL. false when resolution failed (the item then keeps the Google News redirect in link). |

Extra fields when scrapeArticleText is enabled (implies resolveArticleUrls)

| Field | Type | Description |
| --- | --- | --- |
| articleText | string | Extracted main body text (null if extraction failed, paywalled, or under the ~80-char threshold). |
| articleAuthor | string | Author from the publisher page meta tags / byline (best-effort; null if not found). |
| articlePublishedAt | string | Published date from the publisher page meta (ISO 8601 when parseable, else the raw meta string; null if not found). |

Input reference

| Field | Type | Description |
| --- | --- | --- |
| queries | array | Free-text search queries; each becomes a /rss/search?q= feed. Supports Google News operators. |
| topics | array (enum) | Topic feeds: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH. |
| geoLocations | array | Place names for geo headline feeds, e.g. London. |
| includeTopStories | boolean | Also fetch the front-page Top stories feed. Default false. |
| language | string | Interface/content language (hl), e.g. en-US. Default en-US. |
| country | string | Edition country (gl), e.g. US. Default US. |
| maxItemsPerQuery | integer | Cap items taken from each feed (0 = no cap). Default 100. |
| maxItems | integer | Global cap on total articles (0 = no limit). Default 1000. |
| resolveArticleUrls | boolean | Resolve each redirect to the real publisher URL. Default false. |
| scrapeArticleText | boolean | Fetch the resolved page and extract text/author/date (requires resolve). Default false. |
| proxyConfiguration | object | Apify proxy settings. Default useApifyProxy: true. |
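
If you assemble the input programmatically, a small helper can apply the documented defaults and catch misspelled field names before a run starts. The sketch below is a client-side convenience only. The defaults mirror the table above; the empty-list defaults for queries, topics, and geoLocations, the rule that at least one feed source must be set, and the function name build_run_input are assumptions made for illustration.

```python
import copy
import json

# Defaults taken from the input reference; the three empty lists are assumed.
DEFAULTS = {
    "queries": [],
    "topics": [],
    "geoLocations": [],
    "includeTopStories": False,
    "language": "en-US",
    "country": "US",
    "maxItemsPerQuery": 100,
    "maxItems": 1000,
    "resolveArticleUrls": False,
    "scrapeArticleText": False,
    "proxyConfiguration": {"useApifyProxy": True},
}


def build_run_input(**overrides):
    """Merge overrides into the defaults and run basic sanity checks."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)
    # scrapeArticleText relies on resolved URLs, so switch resolution on too.
    if run_input["scrapeArticleText"]:
        run_input["resolveArticleUrls"] = True
    has_feed = (
        run_input["queries"]
        or run_input["topics"]
        or run_input["geoLocations"]
        or run_input["includeTopStories"]
    )
    if not has_feed:  # assumed rule: a run with no feeds has nothing to fetch
        raise ValueError("set queries, topics, geoLocations, or includeTopStories")
    return run_input


print(json.dumps(build_run_input(queries=["tesla when:7d"], maxItems=200), indent=2))
```

The resulting dictionary has the same shape as the example input and can be pasted into the Actor's JSON input or passed to an API client.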

Common use cases

  • Media & brand monitoring – track Google News mentions of a company, product, or person with "brand" queries and when: time windows; export the dataset to CSV/Excel for reporting.
  • PR & competitor tracking – follow site: and topic feeds to see who is covering what across publishers and countries.
  • SEO & content research / news aggregation – pull topic/geo headlines for a daily digest, newsletter, or content calendar without a Google News API key.
  • Dataset building & NLP – resolve real publisher URLs and extract full article text, author, and publish date for downstream sentiment analysis, summarization, or model training.
  • Market & finance signals – monitor ticker, sector, or policy queries in near real time and feed the JSON output into dashboards or alerts.

Notes & limits

  • No pagination. Google News feeds are point-in-time snapshots: search feeds return up to ~100 items and topic/geo feeds up to ~30. maxItemsPerQuery only trims a feed — to get more coverage, split a query with when: / site: / before:/after: operators rather than expecting more items per query.
  • Redirect resolution is best-effort. Current article ids are the long CBMi… form whose publisher URL is not decodable offline; resolution uses Google's 2-step batchexecute flow, which Google changes periodically. On failure the item keeps the Google News redirect and articleResolved is false.
  • Article-text extraction is best-effort. Publisher pages vary; some paywall, 403, or show consent walls, so articleText can be empty for some items. A single publisher failure never fails the run.
  • Resolution/text add cost & blocking risk. They issue 1–2 extra requests per item against Google's heavier app surface and arbitrary publisher sites, which throttle sooner than the RSS feeds. The Actor caps concurrency and rotates sessions; use residential proxies for large runs.
  • ceid must match the locale. It is built automatically as {COUNTRY}:{language-base} (e.g. en-US + US → US:en); a mismatch returns empty or wrong-locale results.
  • Exotic locales. Friendly topic names (TECHNOLOGY, …) are verified for en-US and major editions; some unusual locales use opaque base64 topic ids — you can pass such an id directly in topics.
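
Because there is no pagination, broader coverage comes from issuing several narrower queries. One way to do that, sketched here in Python, is to split a query into consecutive date windows with the before:/after: operators. The YYYY-MM-DD date format, the default 7-day window, and the function name are assumptions. How Google treats the boundary dates is not specified here, so expect some overlap between adjacent windows.

```python
from datetime import date, timedelta


def split_by_date_windows(query, start, end, days_per_window=7):
    """Return one query per window, each bounded with after:/before: operators."""
    if days_per_window < 1:
        raise ValueError("days_per_window must be at least 1")
    queries = []
    window_start = start
    while window_start < end:
        # The last window is clipped so it never runs past the end date.
        window_end = min(window_start + timedelta(days=days_per_window), end)
        queries.append(
            f"{query} after:{window_start.isoformat()} before:{window_end.isoformat()}"
        )
        window_start = window_end
    return queries


for q in split_by_date_windows("climate", date(2026, 6, 1), date(2026, 6, 15)):
    print(q)
```

Each generated string can go straight into the queries input; every window is then fetched as its own feed with its own item ceiling.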

FAQ

Do I need a Google News API key or login? No. The Actor reads public Google News RSS feeds — there is no API key, OAuth, or login involved. It is a key-free Google News API alternative.

Why is link a news.google.com URL? That is the feed's redirect. Enable resolveArticleUrls to turn it into the real publisher URL (best-effort). The original redirect is always preserved in googleNewsUrl.

Why did I only get ~100 results for my query? Google News search feeds cap at roughly 100 items and topic/geo feeds at ~30, with no pagination — they are point-in-time snapshots. Split the query with operators like when:7d, before:/after: date ranges, or site: to cover more, or schedule the run to collect over time.
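
When a query is split or a run is scheduled repeatedly, the same article can show up in more than one batch. The Python sketch below merges batches and keeps the first copy of each article. It assumes articleId is stable for the same article across feeds, which this README does not guarantee, and falls back to googleNewsUrl or link when the id is missing. The function name is hypothetical.

```python
def merge_unique(*batches):
    """Merge item lists, keeping the first occurrence of each article."""
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            key = item.get("articleId") or item.get("googleNewsUrl") or item.get("link")
            if key is not None and key in seen:
                continue  # already collected from an earlier batch
            if key is not None:
                seen.add(key)
            merged.append(item)  # items without any key are always kept
    return merged


week_one = [{"articleId": "a1", "title": "First"}, {"articleId": "a2", "title": "Second"}]
week_two = [{"articleId": "a2", "title": "Second"}, {"articleId": "a3", "title": "Third"}]
print([item["articleId"] for item in merge_unique(week_one, week_two)])
```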

Can I scrape Google News for any country or language? Yes. Set language (e.g. en-US, en-GB, es-419, fr) and country (e.g. US, GB, IN, DE). They drive hl, gl, and the ceid parameter, which is built automatically as {COUNTRY}:{language-base} (so en-US + US → ceid=US:en). A mismatch returns empty or wrong-locale results.

How do I control how many articles I get? Use maxItemsPerQuery to cap items taken from each feed (trims the snapshot; 0 = no cap) and maxItems for a global cap across all feeds (0 = no limit). Defaults are 100 per feed and 1000 total.
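
The interaction of the two caps can be pictured with a short Python sketch: each feed is trimmed first, then items are collected until the global limit is reached, with 0 disabling either cap. The order in which the Actor applies them internally is an assumption here, and apply_caps is a hypothetical name.

```python
def apply_caps(feeds, max_items_per_query=100, max_items=1000):
    """Trim each feed, then stop once the global cap is reached (0 = no cap)."""
    collected = []
    for feed_items in feeds:
        if max_items_per_query:
            feed_items = feed_items[:max_items_per_query]  # per-feed trim
        for item in feed_items:
            if max_items and len(collected) >= max_items:
                return collected  # global cap reached
            collected.append(item)
    return collected


feeds = [["a1", "a2", "a3"], ["b1", "b2", "b3"]]
print(apply_caps(feeds, max_items_per_query=2, max_items=3))
```

With a per-feed cap of 2 and a global cap of 3, the first feed contributes two items and the second only one.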

Which proxy should I use? Datacenter (the default useApifyProxy: true) is fine for the RSS feeds. Switch to residential if you enable resolution/article-text at volume and start hitting blocks — those features add 1–2 heavier requests per item against Google's app surface and publisher sites.

Why is articleText empty for some items? The publisher paywalled, returned 403, showed a consent wall, or rendered the body with JavaScript. Extraction is best-effort, uses meta tags + common article containers, and intentionally never fails the run.

Can I export the results to JSON, CSV, or Excel? Yes. Every run's dataset can be exported to JSON, CSV, Excel (XLSX), HTML, RSS, or XML, or pulled via the Apify API/SDK and webhooks for downstream pipelines.

Does this work with topics and geographic (place) feeds, not just search? Yes. Provide any combination of queries (search), topics (WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH), geoLocations (place names), and includeTopStories (front-page feed). Each becomes its own RSS feed.

Can I monitor news continuously? Yes. Schedule the Actor (e.g. hourly/daily) to build a rolling dataset for media monitoring or brand tracking. Pair it with the Hacker News Scraper to cover both mainstream news and the tech community in one workflow.