Google News Scraper

Pricing: Pay per usage

Fast Google News scraper. Search by keyword with a time-range filter, or scrape per-country top stories, topic sections, geo headlines, and topic IDs. Extracts title, publisher, date, snippet, image, and related coverage. Optionally decodes the wrapped Google News URL into the publisher's real URL.

Rating: 0.0 (0 ratings)

Developer: Crikit (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 6 days ago

Fast, reliable Google News scraper. Pulls news articles by keyword, country, topic section, local geo headlines, or direct Google News topic ID. It reads Google's RSS feeds directly: no headless browser, no anti-bot stack.

What it does

Scrapes Google News for news articles and returns structured records with title, publisher, publish date, snippet, thumbnail, Google News URL, and clustered related coverage. Optionally resolves the wrapped Google News URL into the publisher's real article URL via Google's own batchexecute RPC.

Inputs

  • queries (string[]) - Keywords or phrases. Supports Google search operators: site:reuters.com, intitle:, OR, -exclude, "exact phrase".
  • topics (string[]) - Editorial sections: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH.
  • geos (string[]) - Free-text city names such as San Francisco, New York, or Tokyo. Google resolves each name to a geo topic ID.
  • topicIds (string[]) - Google News topic IDs, used directly (base64 strings, CAAq...).
  • includeTopStories (boolean) - When true, also scrapes each locale's Top Stories feed.
  • locales (object[]) - Country editions, each given as {hl, gl, ceid}. Default: US English. Examples: {hl: "fr-FR", gl: "FR", ceid: "FR:fr"}, {hl: "ja-JP", gl: "JP", ceid: "JP:ja"}.
  • timeRange (enum) - Restricts keyword searches to a time window: 1h, 12h, 1d, 7d, 30d, or 1y. Use any for no restriction.
  • resolvePublisherUrls (boolean) - When true, decodes each Google News URL into the publisher's real URL. Adds one HTTP request per article.
  • maxResultsPerTarget (integer) - Maximum records per feed (one query × one locale).
  • maxResults (integer) - Hard cap on records across the entire run.
  • maxConcurrency (integer) - Number of feeds fetched in parallel.
  • decodeConcurrency (integer) - Number of concurrent URL-decoder calls.
  • proxyConfiguration (object) - Apify proxy settings. Defaults to RESIDENTIAL proxies in the US.
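Each input kind corresponds to a Google News RSS endpoint. The builders below are a sketch that assumes the commonly used news.google.com/rss URL patterns and assumes timeRange is sent as Google's when: search operator. The function names are hypothetical, and the actor's real request format may differ.

```python
from urllib.parse import quote, urlencode

BASE = "https://news.google.com/rss"
US_EN = {"hl": "en-US", "gl": "US", "ceid": "US:en"}  # the documented default edition


def _edition(locale: dict) -> dict:
    # Every feed takes the same three edition parameters from a locales entry.
    return {"hl": locale["hl"], "gl": locale["gl"], "ceid": locale["ceid"]}


def search_feed(query: str, locale: dict = US_EN, time_range: str = "any") -> str:
    # Assumption: timeRange maps onto Google's "when:" operator (e.g. when:1d).
    q = query if time_range == "any" else f"{query} when:{time_range}"
    return f"{BASE}/search?" + urlencode({"q": q, **_edition(locale)})


def topic_feed(section: str, locale: dict = US_EN) -> str:
    # Editorial sections such as BUSINESS or TECHNOLOGY.
    return f"{BASE}/headlines/section/topic/{section}?" + urlencode(_edition(locale))


def geo_feed(city: str, locale: dict = US_EN) -> str:
    # Free-text city name; Google resolves it to a geo topic on its side.
    return f"{BASE}/headlines/section/geo/{quote(city)}?" + urlencode(_edition(locale))


def topic_id_feed(topic_id: str, locale: dict = US_EN) -> str:
    # A raw topic ID (base64 string) copied from an earlier run or a Google News URL.
    return f"{BASE}/topics/{quote(topic_id)}?" + urlencode(_edition(locale))


def top_stories_feed(locale: dict = US_EN) -> str:
    # The edition's Top Stories front page.
    return f"{BASE}?" + urlencode(_edition(locale))
```

For example, search_feed("bitcoin", time_range="1d") yields a search URL whose q parameter is "bitcoin when:1d" for the US English edition.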

You can mix any combination of queries, topics, geos, topicIds, and includeTopStories in a single run.
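A mixed input fans out into many feeds, which is what maxResultsPerTarget and maxResults cap. The sketch below assumes, consistent with the "one query × one locale" wording above, that every target is fetched once per locale, and it applies the 100-item per-feed cap noted under Tips. expand_targets, record_ceiling, and the kind labels are hypothetical, not the actor's feedKind values.

```python
from __future__ import annotations

from itertools import product

DEFAULT_LOCALE = {"hl": "en-US", "gl": "US", "ceid": "US:en"}
GOOGLE_FEED_CAP = 100  # per-feed limit mentioned under Tips
# Hypothetical labels for each input field; the real feedKind values may differ.
SOURCES = (("search", "queries"), ("topic", "topics"), ("geo", "geos"), ("topicId", "topicIds"))


def expand_targets(run_input: dict) -> list[tuple[str, str, str]]:
    """List (kind, value, ceid) for every feed a run would fetch."""
    locales = run_input.get("locales") or [DEFAULT_LOCALE]
    targets = [(kind, value) for kind, field in SOURCES for value in run_input.get(field, [])]
    if run_input.get("includeTopStories"):
        targets.append(("topStories", ""))
    # Each (target, locale) pair is one feed.
    return [(kind, value, loc["ceid"]) for (kind, value), loc in product(targets, locales)]


def record_ceiling(run_input: dict) -> int:
    """Upper bound on records: per-feed cap times feed count, then the run-wide cap."""
    per_feed = min(run_input.get("maxResultsPerTarget", GOOGLE_FEED_CAP), GOOGLE_FEED_CAP)
    total = len(expand_targets(run_input)) * per_feed
    cap = run_input.get("maxResults")
    return total if cap is None else min(total, cap)
```

With two queries, one topic, Top Stories, and two locales, expand_targets returns eight feeds; at maxResultsPerTarget=50 and maxResults=300, the run can return at most 300 records.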

Outputs

Every record includes:

  • articleId - Google's stable CBM identifier for the article
  • title - article title (publisher suffix stripped)
  • googleNewsUrl - the wrapped Google News URL
  • publisherUrl - the publisher's real URL, when resolvePublisherUrls=true
  • publisherName, publisherHomeUrl - publication details
  • publishedAt - ISO-8601 UTC
  • imageUrl - thumbnail URL (note: Google News RSS rarely emits per-item images; this field is usually null)
  • snippetHtml - raw HTML snippet
  • relatedCoverage[] - other articles Google clustered under the same story
  • feedKind, query, topic, topicId, geo, locale, scrapedAt - which feed produced the record, and when it was scraped

Example use cases

  • PR firms tracking brand mentions across publishers
  • Hedge funds running event-driven signal pipelines
  • Market intelligence teams monitoring competitor coverage
  • Sentiment analysis on a stream of topic-tagged news
  • Local-news dashboards by city
  • Country-by-country media coverage comparison

Performance and pricing

  • Pure RSS reads: ~5 sec for 100 records on a single feed
  • No JavaScript execution and no headless browser. A datacenter proxy is sufficient; RESIDENTIAL is the default as a safety margin
  • Pay per dataset item. Each successful record costs the same flat fee regardless of whether you used a query, topic, geo, or topic-id feed
  • The resolvePublisherUrls flag adds an extra HTTP request per article and slows the run roughly 3x. Disable it for the cheapest, fastest scrape
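Since URL resolution costs one request per article, decodeConcurrency is what bounds that extra load. The snippet below is a minimal sketch of such bounding with asyncio.Semaphore. decode is a hypothetical coroutine standing in for the real batchexecute call, which is not reproduced here, and the default of 5 is an assumption rather than the actor's documented default.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical decoder signature: wrapped Google News URL in, publisher URL (or None) out.
Decoder = Callable[[str], Awaitable[Optional[str]]]


async def resolve_all(urls: list[str], decode: Decoder, decode_concurrency: int = 5) -> list[Optional[str]]:
    """Resolve every wrapped URL with at most decode_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(decode_concurrency)

    async def resolve_one(url: str) -> Optional[str]:
        async with semaphore:
            try:
                return await decode(url)
            except Exception:
                # In this sketch a failed decode yields None instead of failing the run.
                return None

    # gather() returns results in input order, so index i matches urls[i].
    return list(await asyncio.gather(*(resolve_one(u) for u in urls)))
```

A semaphore keeps the scheduling simple: every article gets its own task, but only decode_concurrency of them can be inside decode at any moment.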

Tips

  • For high-volume keywords, split the work by timeRange (1h, 1d, 7d) and run the same query once per window. Each Google feed is hard-capped at 100 items
  • Fan a single query across multiple locales (US:en, GB:en, FR:fr, JP:ja) to multiply your coverage per run
  • Geo topic IDs are stable. The first time you scrape San Francisco, capture the topic ID from the run's googleNewsUrl and use topicIds directly in later runs to skip the redirect
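Splitting by timeRange and fanning out across locales both produce overlapping results, since the windows presumably all end at run time and the same story can appear in several editions. A minimal sketch of merging such runs, keyed on the stable articleId field; merge_runs is a hypothetical helper, and keeping the first copy is one design choice among several.

```python
from __future__ import annotations


def merge_runs(*runs: list[dict]) -> list[dict]:
    """Combine several runs' datasets into one list with a single record per articleId."""
    merged: dict[str, dict] = {}
    for run in runs:
        for record in run:
            # The first copy wins; later copies from overlapping windows or locales are dropped.
            merged.setdefault(record["articleId"], record)
    # ISO-8601 UTC strings in one format sort chronologically; newest first.
    return sorted(merged.values(), key=lambda r: r.get("publishedAt") or "", reverse=True)
```

Keeping the first copy preserves the locale and feed provenance of whichever run saw the article first; to keep every locale instead, collect a list per articleId.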

Stack

Python + httpx + Apify SDK. No browser. Simple parser over Google's public RSS XML.