Google News Scraper — Canonical URLs & Brand Tracking
Pricing
from $1.00 / 1,000 results
Scrape Google News by keyword, brand, or topic across 50+ countries. Returns canonical publisher URLs (not Google redirects), source domains, dates, snippets, and thumbnails. Filter by date range or source. 100+ results per query, 3-second cold start, no proxy required.
News Monitor Pro — Brand & Topic Tracking
Track news for any keyword, brand, or topic across 50+ countries and 30+ languages. Get real article URLs, source domains, thumbnails — not just Google redirect links.
The only Google News scraper that gives you canonical publisher URLs (not Google redirects), source domains, thumbnails, date filters, source whitelists/blacklists, and 100+ results per query via smart time-window stitching. Pure HTTP — 3-second cold start, no proxy required.
🎯 Use cases
- PR / brand monitoring — Track every mention of your company or competitors across the global news cycle
- SEO / content teams — Discover which publishers cover your industry, monitor backlink opportunities
- Investment research — Real-time news signals for stocks, crypto, commodities, geopolitical events
- Crisis monitoring — Hourly scheduled runs alert your team when negative coverage breaks
- Content curation — Build daily newsletters, briefings, RSS-to-Slack pipelines
- AI / ML training — Clean, structured news datasets for NLP, sentiment analysis, summarization models
- Market research — Track industry trends across countries, languages, and time windows
- Compliance / regulatory monitoring — Watch news for keywords across regulated industries (pharma, finance, etc.)
✨ Why this scraper beats the rest
| Feature | This actor | Other Google News scrapers |
|---|---|---|
| Canonical publisher URLs (not Google redirects) | ✅ Built-in | ❌ Premium upsell or missing |
| Source domain extraction | ✅ | ❌ Source name only |
| Thumbnail image URLs | ✅ | ❌ |
| Date range filter (from/to) | ✅ | ❌ |
| Source whitelist / blacklist | ✅ | ❌ |
| 100+ articles per query (time-window stitching) | ✅ | ❌ Capped at 100 |
| Top headlines + topic + geo + search (all 4 modes) | ✅ | ⚠️ Search only or topic only |
| Multi-query batch with cross-dedup | ✅ | ⚠️ Some |
| Pure HTTP (no browser, no proxy) | ✅ 3s cold start | ✅ Same |
📥 Input
Quick start — search for a brand
{"queries": ["openai", "anthropic claude"],"maxArticles": 50,"language": "en-US","country": "US"}
Search with operators
Google News supports full search operator syntax in queries:
| Operator | Meaning | Example |
|---|---|---|
| "phrase" | Exact match | "climate policy" |
| OR | Boolean OR | tesla OR spacex |
| -word | Exclude | apple -fruit |
| site:domain.com | Limit to source | apify site:techcrunch.com |
| intitle:word | In title only | intitle:OpenAI |
{"queries": ["\"artificial intelligence\" -hype site:reuters.com"],"maxArticles": 100}
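If you assemble queries in code rather than by hand, a small helper can keep the operator syntax consistent. The sketch below is only an illustration: `build_query` and its parameter names are hypothetical and not part of the actor. It simply joins the operators from the table into one query string.

```python
import json


def build_query(terms="", exact=None, any_of=(), exclude=(), site=None, intitle=None):
    """Compose a query string from the search operators listed above.

    Hypothetical helper: the function and its parameter names are illustrative only.
    """
    parts = []
    if terms:
        parts.append(terms)
    if exact:
        parts.append(f'"{exact}"')          # "phrase": exact match
    if any_of:
        parts.append(" OR ".join(any_of))   # OR: boolean OR
    parts.extend(f"-{word}" for word in exclude)  # -word: exclude
    if site:
        parts.append(f"site:{site}")        # site: limit to one source
    if intitle:
        parts.append(f"intitle:{intitle}")  # intitle: match in title only
    return " ".join(parts)


query = build_query(exact="artificial intelligence", exclude=["hype"], site="reuters.com")
actor_input = {"queries": [query], "maxArticles": 100}
print(json.dumps(actor_input))
```

`json.dumps` takes care of escaping the inner double quotes, so the result can be pasted straight into the actor input.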
Browse by topic (curated top stories)
{"topic": "TECHNOLOGY","language": "en-US","country": "US","maxArticles": 50}
Available topics: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH.
News from a specific location
{"geoLocation": "London","language": "en-GB","country": "GB","maxArticles": 30}
Top headlines (daily digest)
{"topHeadlines": true,"language": "en-US","country": "US"}
Power features — time range + source filter
{"queries": ["climate change"],"maxArticles": 200,"fromDate": "2026-04-01","toDate": "2026-05-01","includeSources": ["reuters.com", "bloomberg.com", "ft.com", "wsj.com"]}
When maxArticles > 100, the actor automatically fetches multiple time-windowed feeds (when:1h, when:1d, when:7d, when:30d, when:1y), deduplicates by article GUID, and returns up to 500 articles per query.
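The stitching idea can be sketched roughly as follows. This is not the actor's source code: `stitch_windows` and `fetch_feed` are hypothetical names, and the feeds are faked with in-memory data so the example runs offline. It only shows the general pattern of walking the time windows in order and keeping the first copy of each GUID.

```python
def stitch_windows(fetch_feed, query, max_articles,
                   windows=("1h", "1d", "7d", "30d", "1y")):
    """Merge several time-windowed feeds into one list, deduplicated by GUID.

    fetch_feed(query, window) is a hypothetical callable that returns a list
    of article dicts, each carrying a "guid" key.
    """
    seen = set()
    merged = []
    for window in windows:
        for article in fetch_feed(query, window):
            guid = article["guid"]
            if guid in seen:
                continue  # already collected from a narrower window
            seen.add(guid)
            merged.append({**article, "timeWindow": window})
            if len(merged) >= max_articles:
                return merged
    return merged


# Stand-in feeds: wider windows repeat articles from the narrower ones.
FAKE_FEEDS = {
    "1h": ["a", "b"],
    "1d": ["a", "b", "c"],
    "7d": ["b", "c", "d", "e"],
}


def fake_fetch(query, window):
    return [{"guid": g, "title": f"{query} {g}"} for g in FAKE_FEEDS.get(window, [])]


articles = stitch_windows(fake_fetch, "climate change", max_articles=4)
print([(a["guid"], a["timeWindow"]) for a in articles])
```

Walking from the narrowest window outward means the freshest copy of each article is the one that survives deduplication.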
Non-English news
{"queries": ["bourse économie"],"language": "fr","country": "FR","maxArticles": 50}
{"queries": ["半導体 NVIDIA"],"language": "ja","country": "JP","maxArticles": 30}
📤 Output fields
Every dataset row contains:
| Field | Type | Description |
|---|---|---|
| title | string | Article headline (source suffix stripped from Google's format) |
| description | string \| null | Article snippet, HTML stripped, plain text, max 500 chars |
| source | string \| null | Publisher name (e.g. "Reuters", "TechCrunch", "BBC") |
| sourceDomain | string \| null | Publisher domain (e.g. "reuters.com"), for easy filtering |
| sourceUrl | string \| null | Publisher homepage URL from the RSS <source> tag |
| link | string | Google News redirect URL (always set) |
| originalUrl | string \| null | Canonical publisher URL when resolveUrls=true (default ON) |
| thumbnailUrl | string \| null | Article thumbnail image URL when available |
| publishedAt | string \| null | Publication date in ISO 8601 (UTC) |
| publishedAtRaw | string \| null | Original RFC 2822 date string from RSS |
| guid | string \| null | Unique Google News article identifier |
| query | string \| null | Search query that found this article (search mode) |
| topic | string \| null | Topic category (topic mode) |
| geoLocation | string \| null | Geographic location (geo mode) |
| feedType | string | search, topic, geo, or top_headlines |
| language | string | Language code used |
| country | string | Country code used |
| timeWindow | string \| null | Time window used (e.g. 1d, 7d) or null |
| scrapedAt | string | When this article was scraped (ISO 8601) |
Example output row
{"title": "OpenAI Unveils Next-Generation Reasoning Model","description": "The new model targets regulated industries with improved factual accuracy and audit logs.","source": "Reuters","sourceDomain": "reuters.com","sourceUrl": "https://www.reuters.com","link": "https://news.google.com/rss/articles/CBMiYmh0dHBzOi8vd3d3LnJldXRl...","originalUrl": "https://www.reuters.com/technology/openai-unveils-next-gen-reasoning-model-2026-04-21/","thumbnailUrl": "https://lh3.googleusercontent.com/...","publishedAt": "2026-04-21T15:30:00.000Z","publishedAtRaw": "Tue, 21 Apr 2026 15:30:00 GMT","guid": "CBMiYmh0dHBzOi8vd3d3LnJldXRl...","query": "openai","topic": null,"geoLocation": null,"feedType": "search","language": "en-US","country": "US","timeWindow": null,"scrapedAt": "2026-05-13T08:15:42.310Z"}
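Downstream code usually wants the publication time as a real datetime rather than a string. The sketch below, using only the Python standard library, reads publishedAt first and falls back to publishedAtRaw. `parse_published` is a hypothetical helper name, and the sample values are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_published(row):
    """Return the publication time as a timezone-aware UTC datetime, or None."""
    iso = row.get("publishedAt")
    if iso:
        # Older Python versions reject a trailing "Z" in fromisoformat().
        return datetime.fromisoformat(iso.replace("Z", "+00:00"))
    raw = row.get("publishedAtRaw")
    if raw:
        # RFC 2822 date string as found in RSS <pubDate>.
        return parsedate_to_datetime(raw).astimezone(timezone.utc)
    return None


sample = {
    "publishedAt": "2026-04-21T15:30:00.000Z",
    "publishedAtRaw": "Tue, 21 Apr 2026 15:30:00 GMT",
}
print(parse_published(sample).isoformat())
```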
🌍 Languages & countries supported
Any Google News-supported combination works. Common examples:
- 🇺🇸 language=en-US, country=US
- 🇬🇧 language=en-GB, country=GB
- 🇫🇷 language=fr, country=FR
- 🇩🇪 language=de, country=DE
- 🇪🇸 language=es, country=ES
- 🇮🇹 language=it, country=IT
- 🇧🇷 language=pt-BR, country=BR
- 🇯🇵 language=ja, country=JP
- 🇨🇳 language=zh-CN, country=CN
- 🇹🇷 language=tr, country=TR
- 🇸🇦 language=ar, country=SA
- 🇮🇳 language=hi, country=IN
Just supply the right language and country codes — the actor adapts automatically.
⚡ Performance
- Cold start: ~3 seconds (no browser, no native deps)
- Throughput: 100 articles in 5 seconds; 500 articles in 25 seconds
- Memory: 256 MB (default)
- No proxy needed for typical usage — Google News RSS is generous with rate limits
💡 Common workflows
Daily brand monitoring → Slack
- Schedule this actor to run every morning at 8:00
- Input: { "queries": ["your brand"], "fromDate": "yesterday", "maxArticles": 50 }
- Set up Integrations → Slack webhook
- Every new article gets posted to your team's channel
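If you would rather post from your own script than rely on the built-in integration, the steps above could be approximated as follows. This is a sketch under assumptions: `slack_payload` and `post_to_slack` are hypothetical helpers, the webhook URL is yours to supply, and the message uses Slack's incoming-webhook `text` field with `<url|label>` links.

```python
import json
import urllib.request


def slack_payload(rows):
    """Build a Slack incoming-webhook payload with one bullet per article."""
    lines = []
    for row in rows:
        url = row.get("originalUrl") or row["link"]  # fall back to the redirect URL
        source = row.get("source") or "unknown source"
        lines.append(f"• <{url}|{row['title']}> ({source})")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, payload):
    """POST the payload as JSON; webhook_url is your own Slack webhook (assumption)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


rows = [{
    "title": "Title A",
    "source": "Reuters",
    "originalUrl": "https://example.com/a",
    "link": "https://news.google.com/rss/articles/example",
}]
payload = slack_payload(rows)
print(payload["text"])
```

The example only builds and prints the payload; call `post_to_slack` with a real webhook URL to deliver it.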
Multi-country campaign tracking
{"queries": ["nike sustainability"],"maxArticles": 50}
Run with country=US, then country=GB, then country=DE, etc. — merge datasets to see global coverage.
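One possible way to script the per-country runs and the merge is sketched below. `country_inputs` and `merge_datasets` are hypothetical helpers. The merge key (originalUrl first, then guid, then link) is an assumption, chosen because the same article may carry different Google identifiers in different country editions.

```python
def country_inputs(base_input, locales):
    """Produce one actor input per (language, country) pair."""
    return [{**base_input, "language": lang, "country": country}
            for lang, country in locales]


def merge_datasets(datasets):
    """Merge rows from several runs and record which countries surfaced each article."""
    merged = {}
    for rows in datasets:
        for row in rows:
            # Assumed merge key: canonical URL when available, otherwise GUID or link.
            key = row.get("originalUrl") or row.get("guid") or row["link"]
            entry = merged.setdefault(key, {**row, "countries": []})
            if row["country"] not in entry["countries"]:
                entry["countries"].append(row["country"])
    return list(merged.values())


inputs = country_inputs(
    {"queries": ["nike sustainability"], "maxArticles": 50},
    [("en-US", "US"), ("en-GB", "GB"), ("de", "DE")],
)

# Made-up rows standing in for two finished runs.
us_rows = [
    {"guid": "g1", "originalUrl": "https://example.com/a", "link": "l1", "country": "US"},
    {"guid": "g2", "originalUrl": None, "link": "l2", "country": "US"},
]
gb_rows = [
    {"guid": "g9", "originalUrl": "https://example.com/a", "link": "l9", "country": "GB"},
]
combined = merge_datasets([us_rows, gb_rows])
print([(row["guid"], row["countries"]) for row in combined])
```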
AI training dataset
{"topic": "TECHNOLOGY","maxArticles": 500,"language": "en-US","country": "US","resolveUrls": true}
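Schedule daily for one month → up to ~15,000 tech articles with canonical URLs (500 per run × 30 runs; overlap between consecutive days lowers the total once you deduplicate across runs). Pipe originalUrl into a content extractor (Trafilatura, Newspaper3k) for full-text training data.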
Stock signal monitoring
{"queries": ["NVDA earnings", "NVDA layoffs", "NVDA lawsuit"],"timeWindow": "1h","maxArticles": 30,"includeSources": ["reuters.com", "bloomberg.com", "wsj.com", "ft.com"]}
Schedule hourly. Pipe to a webhook that triggers an alert when count > 0.
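The "alert when count > 0" check on the receiving side might look like the sketch below. `pending_alerts` is a hypothetical helper that groups dataset rows by their query field, and the rows are invented for illustration.

```python
from collections import Counter


def pending_alerts(rows, threshold=0):
    """Return {query: article_count} for every query above the threshold."""
    counts = Counter(row["query"] for row in rows if row.get("query"))
    return {query: n for query, n in counts.items() if n > threshold}


rows = [
    {"query": "NVDA earnings", "title": "Example headline 1"},
    {"query": "NVDA earnings", "title": "Example headline 2"},
    {"query": "NVDA lawsuit", "title": "Example headline 3"},
]
print(pending_alerts(rows))
```

Raising `threshold` is a cheap way to suppress alerts for single stray mentions.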
❓ FAQ
Q: Why are some originalUrl fields null?
A: A few percent of Google News redirects fail or time out. Try increasing the actor's timeout in Apify Console settings, or use the raw link field as fallback.
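To see how often resolution fails in your own runs and to apply the fallback in one place, something like this sketch could help. `resolution_report` is a hypothetical helper and the rows are made up.

```python
def resolution_report(rows):
    """Return (share of rows with a canonical URL, best available URL per row)."""
    unresolved = [row for row in rows if not row.get("originalUrl")]
    rate = 1 - len(unresolved) / len(rows) if rows else 0.0
    # Use the canonical URL when present, otherwise the Google News redirect.
    urls = [row.get("originalUrl") or row["link"] for row in rows]
    return rate, urls


rows = [
    {"originalUrl": "https://example.com/a", "link": "https://news.google.com/rss/articles/1"},
    {"originalUrl": "https://example.com/b", "link": "https://news.google.com/rss/articles/2"},
    {"originalUrl": "https://example.com/c", "link": "https://news.google.com/rss/articles/3"},
    {"originalUrl": None, "link": "https://news.google.com/rss/articles/4"},
]
rate, urls = resolution_report(rows)
print(f"{rate:.0%} resolved")
```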
Q: Can I get full article text?
A: This actor returns headlines, snippets, and URLs. For full article text, pipe originalUrl into a content-extraction tool (e.g. Mozilla Readability, Trafilatura).
Q: Does Google rate-limit me?
A: Google News RSS is generous: most users never hit rate limits even at high volumes. If you do see rate limit errors, enable Apify Proxy in the input config.
Q: Why does maxArticles cap below 100 per query without stitching?
A: Google News RSS itself caps single-feed responses at ~100 items. We work around this by requesting multiple time-windowed feeds and deduplicating — that's what triggers when you set maxArticles > 100.
Q: Can I search by URL substring or specific publisher?
A: Use the site:domain.com operator inside your query, or use includeSources for post-filter whitelisting. Both work — site: filters at Google's side (cheaper), includeSources filters after fetching.
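The post-fetch variant amounts to a filter on the sourceDomain field. A minimal sketch, assuming exact domain matches, is shown below. `filter_sources` and its `exclude` argument are hypothetical names, and this is not necessarily how the actor implements its own filter.

```python
def filter_sources(rows, include=(), exclude=()):
    """Keep rows whose sourceDomain is whitelisted and not blacklisted."""
    include = {domain.lower() for domain in include}
    exclude = {domain.lower() for domain in exclude}
    kept = []
    for row in rows:
        domain = (row.get("sourceDomain") or "").lower()
        if include and domain not in include:
            continue  # a whitelist is set and this publisher is not on it
        if domain in exclude:
            continue  # publisher is blacklisted
        kept.append(row)
    return kept


rows = [
    {"title": "A", "sourceDomain": "reuters.com"},
    {"title": "B", "sourceDomain": "example-blog.com"},
    {"title": "C", "sourceDomain": None},
]
print([row["title"] for row in filter_sources(rows, include=["Reuters.com"])])
```

Because the filter runs after fetching, rows that are dropped still count toward what was downloaded, which is why site: is described as the cheaper option.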
Q: Does this work for languages with non-Latin scripts (Chinese, Japanese, Arabic, etc.)?
A: Yes — queries are URL-encoded properly and Google News supports all major scripts. Just set the right language and country codes.
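For illustration, this is roughly what URL-encoding a non-Latin query looks like with the Python standard library. The feed URL shape (news.google.com/rss/search with q, hl, and gl parameters) is an assumption about the public RSS endpoint, not something confirmed by the actor's source, and `search_feed_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_feed_url(query, language, country):
    """Build a Google News RSS search URL (assumed URL shape, see above)."""
    params = {"q": query, "hl": language, "gl": country}
    # urlencode() percent-encodes non-ASCII text as UTF-8 by default.
    return "https://news.google.com/rss/search?" + urlencode(params)


url = search_feed_url("半導体 NVIDIA", "ja", "JP")
print(url)

# Decoding the query string recovers the original text.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```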
📜 Legal & ethical
This actor accesses publicly available Google News RSS feeds — the same data accessible to any RSS reader. No login required, no terms-of-service violation. Use the data responsibly:
- ✅ Respect publisher copyrights — don't republish full article text without permission
- ✅ Aggregate data and link to original sources
- ✅ Comply with GDPR / privacy laws for any downstream use of personal data in articles
- ❌ Don't use this to mass-scrape and republish news content