Google News Scraper

Pricing

from $3.00 / 1,000 articles returned

Search Google News by keyword or topic and get clean, structured articles with the REAL publisher URL (not Google's redirect), source, date, and snippet. Optional full article text and AI summary + sentiment. No key, no login.

Rating

5.0

(1)

Developer

Dami's Studio

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 16 hours ago

Search Google News by keyword or topic and get back clean, structured articles — with the real publisher URL instead of Google's useless redirect link. No API key, no login.

Most Google News scrapers hand you https://news.google.com/rss/articles/CBMi... links that you then have to figure out how to open. This one decodes them to the actual reuters.com/... (or wherever) URL, and falls back gracefully (with urlResolved: false) when a link genuinely can't be decoded — instead of silently dropping the article.

What you get per article

title, url (resolved publisher link), urlResolved, source, sourceUrl (publisher homepage), publishedAt (ISO), snippet, and the original googleUrl. Turn on extras for articleText (full body) and aiSummary + sentiment.

Fields that can be null or unresolved

  • url / urlResolved — URL resolution is best-effort. When a Google link can't be decoded, url stays the Google redirect and urlResolved is false. Check urlResolved to know which you got.
  • sourceUrl, source, publishedAt, snippet — null when Google's feed doesn't include that field for an item.
  • articleText — only present when Fetch full article text is on AND extraction succeeded. Some sites block scraping or have no extractable body; those rows come back with articleText: null, are flagged ok: false, and are not charged.
  • aiSummary / sentiment — only present when AI summary is on AND the OpenAI call succeeded; otherwise omitted/null for that row.
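
Because several fields may be null or absent, it can help to read each row defensively. The following is a minimal Python sketch, assuming rows arrive as plain dicts with the field names listed above. The helper name `summarize_row`, the output keys, and the placeholder string are illustrative and not part of the actor.

```python
from typing import Any, Optional


def summarize_row(row: dict[str, Any]) -> dict[str, Optional[str]]:
    """Flatten one dataset row, tolerating missing or null fields."""
    resolved = bool(row.get("urlResolved"))
    return {
        "title": row.get("title"),
        # url holds either the publisher link or the Google redirect;
        # urlResolved tells you which one you got.
        "link": row.get("url"),
        "link_kind": "publisher" if resolved else "google-redirect",
        # These can be null when Google's feed omits them.
        "source": row.get("source") or "(unknown source)",
        "published_at": row.get("publishedAt"),
        # Optional extras: absent or null unless enabled and successful.
        "text": row.get("articleText"),
        "summary": row.get("aiSummary"),
    }
```

Using `dict.get` throughout means a row that omits `aiSummary` entirely is handled the same way as one that carries an explicit null.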

Input

Field               Notes
query               Keywords. Supports Google operators, e.g. tesla OR rivian, site:reuters.com, intitle:layoffs.
topic               Use a topic feed (World, Business, Technology, …) instead of a query.
freshness           Last hour / 24h / 7d / 30d / year.
language / country  e.g. en-US / US, de / DE.
maxItems            Up to ~100 (Google's per-feed cap).
resolveUrls         Decode to the real publisher URL. On by default.
fetchArticleText    Download each article and extract the body.
aiSummary           1–2 sentence summary + sentiment (needs your OpenAI key).

Output

One dataset row per article. Pricing is pay-per-result: you are only charged for genuine, complete article rows (ok: true). Rows we couldn't fully deliver are never charged — this includes:

  • empty/invalid input (a single ok: false diagnostic row with errorCode: "BAD_INPUT"),
  • no results for the query/topic (NO_RESULTS),
  • blocks, rate limits, or network errors (BLOCKED / RATE_LIMITED / NETWORK),
  • and, when Fetch full article text is on, any article whose body couldn't be extracted (articleText: null, flagged ok: false).
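
The charging rule above can be mirrored on the client side when auditing a run. This is a rough Python sketch, assuming every row carries an `ok` flag and that diagnostic rows may carry an `errorCode`. The `UNSPECIFIED` bucket, the function names, and the use of the listing's starting rate of $3.00 per 1,000 are assumptions; your actual rate may differ.

```python
from collections import Counter
from typing import Any, Iterable

PRICE_PER_1000 = 3.00  # assumption: the listing's "from $3.00" starting rate


def split_rows(rows: Iterable[dict[str, Any]]):
    """Separate charged article rows from uncharged diagnostic rows."""
    billable, errors = [], Counter()
    for row in rows:
        if row.get("ok") is True:
            billable.append(row)
        else:
            # errorCode may be absent, e.g. on rows flagged only because
            # articleText could not be extracted.
            errors[row.get("errorCode") or "UNSPECIFIED"] += 1
    return billable, errors


def estimated_cost(billable_count: int) -> float:
    """Estimate the charge in dollars for a number of ok: true rows."""
    return round(billable_count * PRICE_PER_1000 / 1000, 4)
```

Counting uncharged rows by `errorCode` also gives a quick view of why a run returned fewer articles than expected.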

Proxy

Google News RSS is a public feed that needs no authentication and has no anti-bot protection, so no proxy is required and the default configuration runs without one (saving proxy credits). Only enable Apify Proxy if you hit IP rate limits at very high volume.

Troubleshooting

  • Many rows have urlResolved: false? Some publishers' Google links can't be decoded; the article still comes back with its source, title, date and the Google link.
  • articleText: null on several rows? Those sites blocked extraction — they are flagged ok: false and were not charged.
  • Getting a BAD_INPUT row? Provide a query or pick a topic (and an OpenAI key if aiSummary is on).
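
The BAD_INPUT conditions can be checked before starting a run. Here is a small Python sketch of such a pre-flight check, assuming the input is a dict using the field names from the Input section. The function name is illustrative, and `openaiApiKey` is a hypothetical field name for the OpenAI key; check the actor's input form for the real one.

```python
from typing import Any


def preflight_problems(run_input: dict[str, Any]) -> list[str]:
    """Return reasons this input would likely yield a BAD_INPUT row."""
    problems = []
    has_query = bool(str(run_input.get("query") or "").strip())
    has_topic = bool(str(run_input.get("topic") or "").strip())
    if not (has_query or has_topic):
        problems.append("provide a query or pick a topic")
    # "openaiApiKey" is a hypothetical field name, used only for illustration.
    if run_input.get("aiSummary") and not run_input.get("openaiApiKey"):
        problems.append("aiSummary is on but no OpenAI key was supplied")
    return problems
```

An empty list means none of the documented BAD_INPUT causes apply; it does not guarantee that the run returns results.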

Example

{ "query": "openai funding", "freshness": "7d", "language": "en-US", "country": "US", "maxItems": 30, "resolveUrls": true }

Notes

Google News search feeds cap at roughly 100 results per query — split big jobs by keyword, source (site:), or freshness window. URL resolution is best-effort: most links decode, and any that don't still come back with the source, title, date and the Google link so nothing is lost.
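
One way to split a big job, sketched in Python: build one input per keyword and site pair using the site: operator, then merge the result batches while dropping repeats. The function names are illustrative, and the sketch assumes that the same article carries the same googleUrl across feeds, which the listing does not state.

```python
from typing import Any, Iterable


def split_inputs(keywords: list[str], sites: list[str],
                 base: dict[str, Any]) -> list[dict[str, Any]]:
    """Build one run input per keyword/site pair using the site: operator."""
    inputs = []
    for keyword in keywords:
        for site in sites:
            inputs.append({**base, "query": f"{keyword} site:{site}"})
    return inputs


def merge_rows(batches: Iterable[Iterable[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Concatenate result batches, keeping the first row seen per article."""
    seen, merged = set(), []
    for batch in batches:
        for row in batch:
            # googleUrl is present even when url could not be resolved.
            key = row.get("googleUrl") or row.get("url")
            # Rows without any link (e.g. diagnostic rows) are skipped.
            if key is None or key in seen:
                continue
            seen.add(key)
            merged.append(row)
    return merged
```

For example, `split_inputs(["tesla"], ["reuters.com", "bbc.com"], {"freshness": "7d", "maxItems": 100})` yields two inputs, each of which stays under the per-feed cap on its own.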