Google News Scraper — Canonical URLs & Brand Tracking

Pricing

from $1.00 / 1,000 results

Scrape Google News by keyword, brand, or topic across 50+ countries. Returns canonical publisher URLs (not Google redirects), source domains, dates, snippets, and thumbnails. Filter by date range or source. 100+ results per query, 3-second cold start, no proxy required.

Developer

Logiover

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

News Monitor Pro — Brand & Topic Tracking

Track news for any keyword, brand, or topic across 50+ countries and 30+ languages. Get real article URLs, source domains, thumbnails — not just Google redirect links.

The only Google News scraper that gives you canonical publisher URLs (not Google redirects), source domains, thumbnails, date filters, source whitelists/blacklists, and 100+ results per query via smart time-window stitching. Pure HTTP — 3-second cold start, no proxy required.


🎯 Use cases

  • PR / brand monitoring — Track every mention of your company or competitors across the global news cycle
  • SEO / content teams — Discover which publishers cover your industry, monitor backlink opportunities
  • Investment research — Real-time news signals for stocks, crypto, commodities, geopolitical events
  • Crisis monitoring — Hourly scheduled runs alert your team when negative coverage breaks
  • Content curation — Build daily newsletters, briefings, RSS-to-Slack pipelines
  • AI / ML training — Clean, structured news datasets for NLP, sentiment analysis, summarization models
  • Market research — Track industry trends across countries, languages, and time windows
  • Compliance / regulatory monitoring — Watch news for keywords across regulated industries (pharma, finance, etc.)

✨ Why this scraper beats the rest

| Feature | This actor | Other Google News scrapers |
| --- | --- | --- |
| Canonical publisher URLs (not Google redirects) | ✅ Built-in | ❌ Premium upsell or missing |
| Source domain extraction | ✅ | ❌ Source name only |
| Thumbnail image URLs | ✅ | |
| Date range filter (from/to) | ✅ | |
| Source whitelist / blacklist | ✅ | |
| 100+ articles per query (time-window stitching) | ✅ | ❌ Capped at 100 |
| Top headlines + topic + geo + search (all 4 modes) | ✅ | ⚠️ Search only or topic only |
| Multi-query batch with cross-dedup | ✅ | ⚠️ Some |
| Pure HTTP (no browser, no proxy) | ✅ 3s cold start | ✅ Same |

📥 Input

Quick start — search for a brand

{
  "queries": ["openai", "anthropic claude"],
  "maxArticles": 50,
  "language": "en-US",
  "country": "US"
}
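
One way to submit this input from a script is the Apify REST API endpoint that runs an actor synchronously and returns its dataset items. The sketch below only builds the request with the Python standard library and does not send it. The actor ID `your-username~google-news-scraper` and the token are placeholders, not values taken from this page.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build a POST request that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


quick_start = {
    "queries": ["openai", "anthropic claude"],
    "maxArticles": 50,
    "language": "en-US",
    "country": "US",
}

# Placeholder actor ID and token: replace both with your own values.
request = build_run_request(
    "your-username~google-news-scraper", "YOUR_APIFY_TOKEN", quick_start
)

# To send it (needs a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     articles = json.load(response)
```

The response body, when the request is sent, should be a JSON array of dataset rows in the shape described under Output fields.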

Search with operators

Google News supports full search operator syntax in queries:

| Operator | Meaning | Example |
| --- | --- | --- |
| "phrase" | Exact match | "climate policy" |
| OR | Boolean OR | tesla OR spacex |
| -word | Exclude | apple -fruit |
| site:domain.com | Limit to source | apify site:techcrunch.com |
| intitle:word | In title only | intitle:OpenAI |

{
  "queries": ["\"artificial intelligence\" -hype site:reuters.com"],
  "maxArticles": 100
}
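
Hand-escaping quotes inside JSON strings is error-prone, so it can help to assemble operator queries in code and let the JSON serializer do the escaping. The helper names below (`build_query`, `any_of`) are hypothetical and not part of the actor; they only concatenate the operators listed in the table.

```python
import json
from typing import Optional, Sequence


def any_of(*terms: str) -> str:
    """Join alternatives with the OR operator."""
    return " OR ".join(terms)


def build_query(
    terms: str = "",
    exact_phrases: Sequence[str] = (),
    exclude: Sequence[str] = (),
    site: Optional[str] = None,
    in_title: Optional[str] = None,
) -> str:
    """Combine free text with the search operators from the table."""
    parts = []
    if terms:
        parts.append(terms)
    parts += [f'"{phrase}"' for phrase in exact_phrases]  # exact match
    parts += [f"-{word}" for word in exclude]             # exclusion
    if site:
        parts.append(f"site:{site}")
    if in_title:
        parts.append(f"intitle:{in_title}")
    return " ".join(parts)


query = build_query(
    exact_phrases=["artificial intelligence"],
    exclude=["hype"],
    site="reuters.com",
)
# json.dumps escapes the inner double quotes for the actor input.
actor_input = json.dumps({"queries": [query], "maxArticles": 100})
```

Here `query` reproduces the operator example above, and `any_of("tesla", "spacex")` yields the OR form from the table.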

Browse by topic (curated top stories)

{
  "topic": "TECHNOLOGY",
  "language": "en-US",
  "country": "US",
  "maxArticles": 50
}

Available topics: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH.

News from a specific location

{
  "geoLocation": "London",
  "language": "en-GB",
  "country": "GB",
  "maxArticles": 30
}

Top headlines (daily digest)

{
  "topHeadlines": true,
  "language": "en-US",
  "country": "US"
}

Power features — time range + source filter

{
  "queries": ["climate change"],
  "maxArticles": 200,
  "fromDate": "2026-04-01",
  "toDate": "2026-05-01",
  "includeSources": ["reuters.com", "bloomberg.com", "ft.com", "wsj.com"]
}

When maxArticles > 100, the actor automatically fetches multiple time-windowed feeds (when:1h, when:1d, when:7d, when:30d, when:1y), deduplicates by article GUID, and returns up to 500 articles per query.
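
The stitching described above can be sketched roughly as follows. This is an illustration of the idea, not the actor's source code: `fetch_feed` is a hypothetical callable standing in for whatever downloads and parses one feed, and querying the narrowest window first is an assumption made so that each article is tagged with the tightest window it appeared in.

```python
from typing import Callable, Iterable

TIME_WINDOWS = ("1h", "1d", "7d", "30d", "1y")
MAX_PER_QUERY = 500  # upper bound stated in the text above


def stitch_windows(
    query: str,
    max_articles: int,
    fetch_feed: Callable[[str], Iterable[dict]],
) -> list:
    """Merge several time-windowed feeds, dropping duplicate articles."""
    limit = min(max_articles, MAX_PER_QUERY)
    seen = set()
    merged = []
    for window in TIME_WINDOWS:
        for article in fetch_feed(f"{query} when:{window}"):
            # guid can be null, so fall back to the always-present link.
            key = article.get("guid") or article["link"]
            if key in seen:
                continue
            seen.add(key)
            merged.append({**article, "timeWindow": window})
            if len(merged) >= limit:
                return merged
    return merged


def fake_fetch(windowed_query: str) -> list:
    """Stand-in feed source with made-up rows, used only for this demo."""
    window = windowed_query.rsplit("when:", 1)[1]
    feeds = {
        "1h": [{"guid": "a", "link": "https://news.google.com/rss/articles/a"}],
        "1d": [
            {"guid": "a", "link": "https://news.google.com/rss/articles/a"},
            {"guid": "b", "link": "https://news.google.com/rss/articles/b"},
        ],
    }
    return feeds.get(window, [])


articles = stitch_windows("climate change", 200, fake_fetch)
```

In the demo, article `a` appears in both the 1h and 1d feeds but is kept once, tagged with the 1h window.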

Non-English news

{
  "queries": ["bourse économie"],
  "language": "fr",
  "country": "FR",
  "maxArticles": 50
}

{
  "queries": ["半導体 NVIDIA"],
  "language": "ja",
  "country": "JP",
  "maxArticles": 30
}

📤 Output fields

Every dataset row contains:

| Field | Type | Description |
| --- | --- | --- |
| title | string | Article headline (source suffix stripped from Google's format) |
| description | string or null | Article snippet, HTML stripped, plain text, max 500 chars |
| source | string or null | Publisher name (e.g. "Reuters", "TechCrunch", "BBC") |
| sourceDomain | string or null | Publisher domain (e.g. "reuters.com"), for easy filtering |
| sourceUrl | string or null | Publisher homepage URL from the RSS <source> tag |
| link | string | Google News redirect URL (always set) |
| originalUrl | string or null | Canonical publisher URL when resolveUrls=true (default ON) |
| thumbnailUrl | string or null | Article thumbnail image URL when available |
| publishedAt | string or null | Publication date in ISO 8601 (UTC) |
| publishedAtRaw | string or null | Original RFC 2822 date string from RSS |
| guid | string or null | Unique Google News article identifier |
| query | string or null | Search query that found this article (search mode) |
| topic | string or null | Topic category (topic mode) |
| geoLocation | string or null | Geographic location (geo mode) |
| feedType | string | search, topic, geo, or top_headlines |
| language | string | Language code used |
| country | string | Country code used |
| timeWindow | string or null | Time window used (e.g. 1d, 7d) or null |
| scrapedAt | string | When this article was scraped (ISO 8601) |

Example output row

{
  "title": "OpenAI Unveils Next-Generation Reasoning Model",
  "description": "The new model targets regulated industries with improved factual accuracy and audit logs.",
  "source": "Reuters",
  "sourceDomain": "reuters.com",
  "sourceUrl": "https://www.reuters.com",
  "link": "https://news.google.com/rss/articles/CBMiYmh0dHBzOi8vd3d3LnJldXRl...",
  "originalUrl": "https://www.reuters.com/technology/openai-unveils-next-gen-reasoning-model-2026-04-21/",
  "thumbnailUrl": "https://lh3.googleusercontent.com/...",
  "publishedAt": "2026-04-21T15:30:00.000Z",
  "publishedAtRaw": "Tue, 21 Apr 2026 15:30:00 GMT",
  "guid": "CBMiYmh0dHBzOi8vd3d3LnJldXRl...",
  "query": "openai",
  "topic": null,
  "geoLocation": null,
  "feedType": "search",
  "language": "en-US",
  "country": "US",
  "timeWindow": null,
  "scrapedAt": "2026-05-13T08:15:42.310Z"
}
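
The relationship between publishedAtRaw and publishedAt can be reproduced with the Python standard library, which is handy when validating rows downstream. This is a sketch of one plausible conversion, not necessarily the one the actor uses; the function name `to_iso_utc` is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_iso_utc(rfc2822_date: str) -> str:
    """Convert an RFC 2822 date string to ISO 8601 in UTC with milliseconds."""
    parsed = parsedate_to_datetime(rfc2822_date)
    if parsed.tzinfo is None:
        # A "-0000" offset parses as naive; treat it as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    utc = parsed.astimezone(timezone.utc)
    millis = utc.microsecond // 1000
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


published_at = to_iso_utc("Tue, 21 Apr 2026 15:30:00 GMT")
```

Dates carrying a non-UTC offset, such as `+0200`, are normalized to the same UTC instant.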

🌍 Languages & countries supported

Any Google News-supported combination works. Common examples:

  • 🇺🇸 language=en-US, country=US
  • 🇬🇧 language=en-GB, country=GB
  • 🇫🇷 language=fr, country=FR
  • 🇩🇪 language=de, country=DE
  • 🇪🇸 language=es, country=ES
  • 🇮🇹 language=it, country=IT
  • 🇧🇷 language=pt-BR, country=BR
  • 🇯🇵 language=ja, country=JP
  • 🇨🇳 language=zh-CN, country=CN
  • 🇹🇷 language=tr, country=TR
  • 🇸🇦 language=ar, country=SA
  • 🇮🇳 language=hi, country=IN

Just supply the right language and country codes — the actor adapts automatically.


⚡ Performance

  • Cold start: ~3 seconds (no browser, no native deps)
  • Throughput: 100 articles in 5 seconds; 500 articles in 25 seconds
  • Memory: 256 MB (default)
  • No proxy needed for typical usage — Google News RSS is generous with rate limits

💡 Common workflows

Daily brand monitoring → Slack

  1. Schedule this actor to run every morning at 8:00
  2. Input: { "queries": ["your brand"], "fromDate": "yesterday", "maxArticles": 50 }
  3. Set up Integrations → Slack webhook
  4. Every new article gets posted to your team's channel
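
If you post to Slack from your own script rather than through the built-in integration, the steps above might look like the sketch below. It computes yesterday's date in the ISO format used by the date-range example (whether the actor also accepts the literal string "yesterday" is not documented here, so that is avoided), and builds the JSON body a Slack incoming webhook expects. The function names are hypothetical.

```python
from datetime import date, timedelta


def daily_input(brand: str, today: date) -> dict:
    """Actor input covering everything published since yesterday."""
    yesterday = today - timedelta(days=1)
    return {
        "queries": [brand],
        "fromDate": yesterday.isoformat(),
        "maxArticles": 50,
    }


def escape_slack(text: str) -> str:
    """Escape the three characters Slack treats as control characters."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def slack_payload(article: dict) -> dict:
    """Build an incoming-webhook body linking to the article."""
    # Prefer the canonical URL; link is always set as a fallback.
    url = article.get("originalUrl") or article["link"]
    source = article.get("source") or "unknown source"
    title = escape_slack(article["title"])
    return {"text": f"<{url}|{title}> ({escape_slack(source)})"}


run_input = daily_input("your brand", date(2026, 5, 13))
payload = slack_payload({
    "title": "R&D <update>",
    "link": "https://news.google.com/rss/articles/example",
    "originalUrl": None,
    "source": "Reuters",
})
```

Each payload would then be sent as a JSON POST to your webhook URL, one request per new article.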

Multi-country campaign tracking

{
  "queries": ["nike sustainability"],
  "maxArticles": 50
}

Run with country=US, then country=GB, then country=DE, etc. — merge datasets to see global coverage.
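
A small script can generate the per-country inputs and merge the resulting datasets. The sketch below assumes each run's rows are already loaded as lists of dicts, and it pairs each country with a language code from the supported list; the `countries` field it adds to merged rows is an invention of this example, not an actor output field.

```python
LOCALES = [("en-US", "US"), ("en-GB", "GB"), ("de", "DE")]


def inputs_per_locale(base_input: dict, locales: list) -> list:
    """One actor input per (language, country) pair."""
    return [
        {**base_input, "language": language, "country": country}
        for language, country in locales
    ]


def merge_datasets(datasets: list) -> list:
    """Merge rows from several runs, recording where each article appeared."""
    merged = {}
    for rows in datasets:
        for row in rows:
            key = row.get("originalUrl") or row["link"]
            entry = merged.setdefault(key, {**row, "countries": []})
            if row["country"] not in entry["countries"]:
                entry["countries"].append(row["country"])
    return list(merged.values())


inputs = inputs_per_locale(
    {"queries": ["nike sustainability"], "maxArticles": 50}, LOCALES
)

# Made-up rows standing in for two finished runs.
us_rows = [{"originalUrl": "https://example.com/a", "link": "l1", "country": "US"}]
gb_rows = [
    {"originalUrl": "https://example.com/a", "link": "l2", "country": "GB"},
    {"originalUrl": "https://example.com/b", "link": "l3", "country": "GB"},
]
coverage = merge_datasets([us_rows, gb_rows])
```

Articles that surface in more than one country edition collapse into a single row whose `countries` list shows the spread.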

AI training dataset

{
  "topic": "TECHNOLOGY",
  "maxArticles": 500,
  "language": "en-US",
  "country": "US",
  "resolveUrls": true
}

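Schedule daily for one month to collect up to ~15,000 tech articles (30 runs × 500) with canonical URLs. Consecutive daily runs may return overlapping articles, so deduplicate across runs by guid to get the final count. Pipe originalUrl into a content extractor (Trafilatura, Newspaper3k) for full-text training data.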

Stock signal monitoring

{
  "queries": ["NVDA earnings", "NVDA layoffs", "NVDA lawsuit"],
  "timeWindow": "1h",
  "maxArticles": 30,
  "includeSources": ["reuters.com", "bloomberg.com", "wsj.com", "ft.com"]
}

Schedule hourly. Pipe to a webhook that triggers an alert when count > 0.
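
On the receiving side, the webhook handler only needs to check whether the run returned any rows. A minimal sketch of that check follows, assuming the handler has already loaded the run's dataset rows; `build_alert` is a hypothetical name and the message wording is arbitrary.

```python
from collections import Counter
from typing import Optional


def build_alert(rows: list) -> Optional[str]:
    """Return an alert message when the run found articles, else None."""
    if not rows:
        return None  # count == 0, nothing to report
    counts = Counter(row.get("query") or "unknown" for row in rows)
    summary = ", ".join(f"{query}: {n}" for query, n in sorted(counts.items()))
    return f"{len(rows)} new article(s) in the last hour ({summary})"


alert = build_alert([
    {"query": "NVDA earnings"},
    {"query": "NVDA earnings"},
    {"query": "NVDA lawsuit"},
])
```

Breaking the count down per query shows at a glance which of the three signals fired.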


❓ FAQ

Q: Why are some originalUrl fields null?
A: A few percent of Google News redirects fail or time out. Try increasing the actor's timeout in Apify Console settings, or use the raw link field as a fallback.

Q: Can I get full article text?
A: This actor returns headlines, snippets, and URLs. For full article text, pipe originalUrl into a content-extraction tool (e.g. Mozilla Readability, Trafilatura).

Q: Does Google rate-limit me?
A: Google News RSS is generous: most users never hit rate limits even at high volumes. If you do see rate limit errors, enable Apify Proxy in the input config.

Q: Why is a single feed capped at about 100 articles per query?
A: Google News RSS itself caps single-feed responses at ~100 items. We work around this by requesting multiple time-windowed feeds and deduplicating, which is what happens when you set maxArticles > 100.

Q: Can I search by URL substring or specific publisher?
A: Use the site:domain.com operator inside your query, or use includeSources for post-filter whitelisting. Both work: site: filters on Google's side (cheaper), while includeSources filters after fetching.
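
If you need the same kind of post-filtering on an existing dataset, it can be approximated locally on the sourceDomain field. The sketch below is an assumption about reasonable matching behavior (exact domain or any subdomain), not a description of how includeSources is implemented; the parameter names `include` and `exclude` are hypothetical.

```python
def matches_domain(source_domain: str, domain: str) -> bool:
    """True for the domain itself or any of its subdomains."""
    return source_domain == domain or source_domain.endswith("." + domain)


def filter_sources(rows: list, include: tuple = (), exclude: tuple = ()) -> list:
    """Keep rows from whitelisted domains and drop blacklisted ones."""
    kept = []
    for row in rows:
        domain = (row.get("sourceDomain") or "").lower()
        if include and not any(matches_domain(domain, d) for d in include):
            continue
        if any(matches_domain(domain, d) for d in exclude):
            continue
        kept.append(row)
    return kept


rows = [
    {"title": "A", "sourceDomain": "reuters.com"},
    {"title": "B", "sourceDomain": "uk.reuters.com"},
    {"title": "C", "sourceDomain": "example.com"},
    {"title": "D", "sourceDomain": None},
]
whitelisted = filter_sources(rows, include=("reuters.com",))
```

Rows with a null sourceDomain never match a whitelist, so they are dropped whenever `include` is non-empty.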

Q: Does this work for languages with non-Latin scripts (Chinese, Japanese, Arabic, etc.)?
A: Yes. Queries are URL-encoded properly and Google News supports all major scripts. Just set the right language and country codes.


This actor accesses publicly available Google News RSS feeds — the same data accessible to any RSS reader. No login required, no terms-of-service violation. Use the data responsibly:

  • ✅ Respect publisher copyrights — don't republish full article text without permission
  • ✅ Aggregate data and link to original sources
  • ✅ Comply with GDPR / privacy laws for any downstream use of personal data in articles
  • ❌ Don't use this to mass-scrape and republish news content