Google News Scraper — Canonical URLs & Brand Tracking

Pricing: from $1.00 / 1,000 results

Scrape Google News by keyword, brand, or topic across 50+ countries. Returns canonical publisher URLs (not Google redirects), source domains, dates, snippets, and thumbnails. Filter by date range or source. Up to 500 results per query, a roughly 3-second cold start, and no proxy required for typical use.

Developer: Logiover (maintained by Community)

Actor stats: 0 bookmarked, 4 total users, 3 monthly active users, last modified 2 days ago

📰 Google News Scraper — Canonical Publisher URLs, Brand & Topic Tracking

Scrape Google News for any keyword, brand or topic across 50+ countries and 30+ languages. Unlike most Google News scrapers, this Actor returns canonical publisher URLs (not Google redirect links) along with source domains, thumbnails and publication dates. It can deliver more than 100 results per query (up to 500), and results export to JSON, CSV, Excel or XML.

Track brands, monitor topics, build news datasets and power PR and crisis-monitoring pipelines. Pure HTTP with a ~3-second cold start, no login, no API key and no proxy required for typical use.

✨ What this Actor does / Key features

  • 🔗 Canonical publisher URLs — resolves Google News redirects to the real article URL on the publisher's site
  • 🌍 50+ countries, 30+ languages — any Google News language/country combination, including non-Latin scripts
  • 🔎 Four feed modes — keyword search, topic browsing, location-based news and country top headlines
  • 🧮 Supports Google search operators in queries — exact phrase, OR, -exclude, site:, intitle:
  • 📅 Date range filters (fromDate/toDate) and a timeWindow option for recent news (1h to 1y)
  • 🏷️ Source whitelist & blacklist by publisher domain
  • 📈 100+ results per query via automatic time-window stitching (up to 500 per query) with GUID deduplication
  • 🖼️ Extracts source domain, source homepage URL and article thumbnails
  • ⚡ Pure HTTP — ~3s cold start, multi-query batch with cross-query dedup
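
The search operators listed above combine into a single query string. As a minimal sketch of one way to assemble such strings before placing them in queries (build_query and its parameter names are hypothetical helpers, not part of the Actor):

```python
def build_query(phrase=None, any_of=(), exclude=(), site=None, intitle=None):
    """Compose a Google-style query string from optional parts (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')               # exact phrase
    if any_of:
        parts.append(" OR ".join(any_of))         # alternatives joined with OR
    parts.extend(f"-{word}" for word in exclude)  # excluded words
    if site:
        parts.append(f"site:{site}")              # restrict to one publisher domain
    if intitle:
        parts.append(f"intitle:{intitle}")        # word must appear in the headline
    return " ".join(parts)


query = build_query(phrase="artificial intelligence", exclude=["hype"], site="reuters.com")
print(query)  # "artificial intelligence" -hype site:reuters.com
```

The resulting string can be used as one entry of the queries array.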

🔍 Input

| Field | Type | Description |
| --- | --- | --- |
| queries | array | Search keywords/topics. Each becomes a separate Google News feed. Supports "exact phrase", OR, -word, site:domain.com, intitle:word. Leave empty if using topic or geoLocation. |
| topic | string (select) | Browse top stories by topic instead of searching: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH. |
| geoLocation | string | Get news for a specific city/region (English name, e.g. London, Tokyo). |
| topHeadlines | boolean | Fetch the main Google News top headlines feed for the selected language/country, overriding queries/topic/geo. |
| language | string | Google News interface language (e.g. en-US, fr, de, ja, ar, tr). |
| country | string | ISO 2-letter country code (e.g. US, GB, DE, JP, IN). |
| maxArticles | integer | Max articles per query/feed (0–500). Above 100, the Actor stitches multiple time-windowed feeds and deduplicates. |
| timeWindow | string (select) | Restrict to a recent window: 1h, 12h, 1d, 7d, 30d, 1y. Ignored when maxArticles > 100. |
| fromDate / toDate | string | ISO date filters applied post-fetch on publication date. |
| includeSources | array | Whitelist of publisher domains; only these sources are returned. |
| excludeSources | array | Blacklist of publisher domains; these sources are filtered out. |
| resolveUrls | boolean | Follow Google News redirects to return the canonical publisher URL (default true). |
| extractThumbnails | boolean | Include the article thumbnail image URL when available (default true). |
| proxyConfiguration | object | Optional Apify proxy. Rarely needed, since Google News RSS seldom rate-limits. |
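
Two of the constraints above interact: maxArticles must stay within 0–500, and timeWindow is ignored once maxArticles exceeds 100. A rough client-side check, assuming you want to catch these before starting a run, might look like this (normalize_input is a hypothetical helper; the Actor's own validation may differ):

```python
ALLOWED_WINDOWS = {"1h", "12h", "1d", "7d", "30d", "1y"}


def normalize_input(raw):
    """Return a copy of the input with the documented constraints applied."""
    out = dict(raw)
    max_articles = out.get("maxArticles")
    if max_articles is not None and not 0 <= max_articles <= 500:
        raise ValueError("maxArticles must be between 0 and 500")
    window = out.get("timeWindow")
    if window is not None and window not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported timeWindow: {window}")
    if max_articles is not None and max_articles > 100:
        # Documented behaviour: timeWindow is ignored above 100 articles,
        # so drop it here to make that explicit in the submitted input.
        out.pop("timeWindow", None)
    return out
```

Dropping timeWindow locally is a design choice for clarity only; leaving it in should be harmless since it is documented as ignored.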

🚀 Example input

{
  "queries": ["openai", "\"artificial intelligence\" -hype site:reuters.com"],
  "maxArticles": 200,
  "language": "en-US",
  "country": "US",
  "fromDate": "2026-04-01",
  "toDate": "2026-05-01",
  "includeSources": ["reuters.com", "bloomberg.com", "ft.com"],
  "resolveUrls": true
}

📦 Output

Each article is saved as one structured record.

| Field | Description |
| --- | --- |
| title | Article headline, with Google's trailing " - Publisher" suffix stripped |
| description | Plain-text article snippet (HTML stripped) |
| source | Publisher display name (e.g. Reuters, BBC) |
| sourceDomain | Publisher apex domain (e.g. reuters.com), useful for filtering |
| sourceUrl | Publisher homepage URL |
| link | Google News redirect URL (always present) |
| originalUrl | Canonical publisher URL when resolveUrls is on |
| thumbnailUrl | Article thumbnail image URL when available |
| publishedAt | Publication date/time in ISO 8601 (UTC) |
| publishedAtRaw | Original RFC 822 date string from the RSS feed |
| guid | Unique Google News article identifier |
| query | Search query that found this article (search mode) |
| topic | Topic code (topic mode) |
| geoLocation | Location name (geo mode) |
| feedType | search, topic, geo or top_headlines |
| language / country | Feed language and country codes used |
| timeWindow | Time window applied to this fetch, or null |
| scrapedAt | When the article was scraped (ISO 8601) |

Three pre-built dataset views are included: Overview, By query and By source. Export as JSON, CSV, Excel or XML.
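
The relationship between publishedAtRaw (RFC 822) and publishedAt (ISO 8601, UTC), and a date-range check in the spirit of fromDate/toDate, can be sketched with the standard library. The function names are hypothetical, and treating both range bounds as inclusive is an assumption; the Actor's exact rules are not documented here.

```python
from datetime import date, datetime, timezone
from email.utils import parsedate_to_datetime


def to_iso_utc(raw):
    """Convert an RFC 822 date string (as found in RSS) to ISO 8601 in UTC."""
    return parsedate_to_datetime(raw).astimezone(timezone.utc).isoformat()


def in_range(published_at, from_date=None, to_date=None):
    """Check an ISO 8601 timestamp against optional ISO dates (bounds inclusive by assumption)."""
    # Accept a trailing "Z" as well, which older Python versions do not parse.
    day = datetime.fromisoformat(published_at.replace("Z", "+00:00")).date()
    if from_date and day < date.fromisoformat(from_date):
        return False
    if to_date and day > date.fromisoformat(to_date):
        return False
    return True


iso = to_iso_utc("Tue, 14 Apr 2026 09:30:00 GMT")
print(iso)                                        # 2026-04-14T09:30:00+00:00
print(in_range(iso, "2026-04-01", "2026-05-01"))  # True
```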

💡 Use cases

  • PR & brand monitoring — track every mention of your company or competitors across the global news cycle.
  • Investment & market research — real-time news signals for stocks, crypto, commodities and geopolitical events.
  • Crisis monitoring — hourly scheduled runs alert your team when negative coverage breaks.
  • SEO & content teams — discover which publishers cover your industry and find backlink opportunities.
  • News curation — build daily newsletters, briefings and RSS-to-Slack pipelines.
  • AI / ML teams — assemble clean, structured news datasets with canonical URLs for NLP, sentiment and summarization models.

❓ Frequently Asked Questions

Do I need a Google account or API key? No. The Actor reads publicly available Google News RSS feeds — the same data any RSS reader can access. No login and no API key required.

Is this legal? The Actor accesses public Google News feeds. Use the data responsibly: respect publisher copyrights, link to original sources, and do not republish full article text. Comply with GDPR and privacy laws for any downstream use.

Why are some originalUrl fields null? A small percentage of Google News redirects fail or time out. Use the raw link field as a fallback, or increase the Actor's timeout in the Apify console.
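
The fallback described above is a one-liner in downstream code. A minimal sketch, where best_url is a hypothetical helper and the sample URLs are placeholders:

```python
def best_url(record):
    """Prefer the canonical publisher URL; fall back to the Google News redirect."""
    # originalUrl may be null (None) when a redirect failed or timed out;
    # link is documented as always present.
    return record.get("originalUrl") or record["link"]


resolved = {"link": "https://news.google.com/rss/articles/EXAMPLE", "originalUrl": "https://www.example.com/story"}
unresolved = {"link": "https://news.google.com/rss/articles/EXAMPLE", "originalUrl": None}
print(best_url(resolved))    # https://www.example.com/story
print(best_url(unresolved))  # https://news.google.com/rss/articles/EXAMPLE
```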

Can I get the full article text? No — the Actor returns headlines, snippets and URLs. Pipe originalUrl into a content-extraction tool (e.g. Trafilatura, Mozilla Readability) for full text.

How do I get more than 100 results per query? Set maxArticles above 100. Google News RSS caps a single feed at ~100 items, so the Actor automatically requests multiple time-windowed feeds and deduplicates by GUID, delivering up to 500 per query.
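
The stitching idea can be illustrated in a few lines: split the period into windows, fetch one feed per window, then merge while dropping repeated GUIDs. This is only a sketch under assumptions. How the Actor actually chooses its windows is not documented here, and split_windows and stitch are hypothetical names operating on in-memory lists rather than live feeds.

```python
from datetime import datetime, timezone


def split_windows(start, end, parts):
    """Split the period from start to end into `parts` equal, contiguous windows."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


def stitch(feeds, limit):
    """Merge feeds in order, keeping the first record seen for each guid."""
    seen, merged = set(), []
    for feed in feeds:
        for item in feed:
            if len(merged) >= limit:
                return merged
            if item["guid"] in seen:
                continue  # already collected from an earlier window
            seen.add(item["guid"])
            merged.append(item)
    return merged


windows = split_windows(
    datetime(2026, 4, 1, tzinfo=timezone.utc),
    datetime(2026, 5, 1, tzinfo=timezone.utc),
    3,
)
feeds = [[{"guid": "a"}, {"guid": "b"}], [{"guid": "b"}, {"guid": "c"}]]
print(len(windows))                                # 3
print([r["guid"] for r in stitch(feeds, limit=500)])  # ['a', 'b', 'c']
```

Overlapping windows are harmless in this scheme because duplicates are removed by GUID during the merge.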

Does Google rate-limit this? Rarely. Google News RSS seldom rate-limits, and most users never hit limits. If you do, set proxyConfiguration in the input to route requests through Apify Proxy.

Does it support non-English / non-Latin scripts? Yes. Queries are URL-encoded properly and Google News supports all major scripts (Chinese, Japanese, Arabic, etc.) — just set the right language and country.
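
To show what "URL-encoded properly" means for a non-Latin query, here is a small sketch that builds a Google News RSS search URL with the standard library. The URL shape (q, hl, gl, ceid parameters) is the commonly seen public feed format and is an assumption here, not something this page confirms the Actor uses; search_feed_url is a hypothetical helper, and deriving ceid from the language prefix is a simplification.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def search_feed_url(query, language="en-US", country="US"):
    """Build a Google News RSS search URL (assumed public feed format)."""
    lang_short = language.split("-")[0]  # simplification: "en-US" -> "en"
    params = {
        "q": query,
        "hl": language,
        "gl": country,
        "ceid": f"{country}:{lang_short}",
    }
    # urlencode percent-encodes non-ASCII text as UTF-8 bytes.
    return "https://news.google.com/rss/search?" + urlencode(params)


url = search_feed_url("人工知能", language="ja", country="JP")
print(url.isascii())                        # True
print(parse_qs(urlparse(url).query)["q"])   # ['人工知能']
```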

What output formats are supported? Results are stored in an Apify dataset and can be exported as JSON, CSV, Excel or XML, or pulled via the Apify API.

⏰ Scheduling & integration

Schedule this Actor on Apify to monitor brands and topics hourly or daily. Export results to JSON, CSV or Excel, call it via the Apify API, or connect it to Slack, Google Sheets and webhooks through Apify integrations for automated news alerts and digests.
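
Calling the Actor through the Apify API could look roughly like the sketch below, which only builds the HTTP request using the standard library. It assumes the Apify API v2 endpoint that runs an Actor synchronously and returns its dataset items; the actor ID and token are placeholders, since this page does not state the Actor's ID, and build_run_request is a hypothetical helper.

```python
import json
from urllib.request import Request

ACTOR_ID = "your-username~your-actor-name"  # placeholder, not the real Actor ID
API_TOKEN = "YOUR_APIFY_TOKEN"              # placeholder


def build_run_request(actor_id, token, actor_input):
    """Build a POST request that runs an Actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    return Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(ACTOR_ID, API_TOKEN, {"queries": ["openai"], "maxArticles": 50})
# To actually send it (requires a valid token and Actor ID):
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       articles = json.load(response)
```

For long or scheduled runs, the asynchronous run endpoints or the official Apify client libraries may be a better fit than a synchronous call.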


Changelog

  • 2026-06-01 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

  • 2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

Last reviewed: 2026-06-01.