Deprecated

Pricing

$0.70 / 1,000 results

Google News Scraper

[💰 $0.70 / 1K] Production-grade Google News scraper. Search by keywords, topics or any Google News topic/section URL. HD thumbnails, decoded publisher URLs, descriptions, full metadata. Up to 50,000 articles per run.

Rating

0.0

(0)

Developer

Kitcune Mia

Last modified

25 days ago

📰 Google News Scraper

Production-grade Google News scraper. Browser-grade TLS fingerprinting, Apify Residential US proxies, three independent retry budgets, HD thumbnails — all without a single line of headless-browser code.

Pull clean, structured news data from Google News by keywords, predefined topics, or any topic/section URL you paste from your browser. Works for any country and language. Scales from a 5-article spot check to 50,000-article archives.

⚡ Why This Scraper

🧬 Real browser fingerprint, no headless browser

Built on curl_cffi with Chrome TLS impersonation. Google sees a real Chrome handshake (JA3, ALPN, HTTP/2 frames) without the cost of Selenium, Playwright, or a JS engine. Result: roughly 10× lower latency and CPU than a headless setup, with the same survival rate against bot defenses.

🌎 Residential US IPs that rotate per request

Every HTTP call gets a fresh exit IP from Apify's residential pool. No session pinning, no shared rate limits between requests. You can ramp parallelism without tripping Google's 429 wall.

🛡️ Three independent retry budgets

The scraper runs three separate I/O stages — RSS fetch · URL decode · publisher page fetch — and each has its own retry counter, exponential backoff with jitter, and 15-second hard timeout. A single flaky publisher can't drain the budget you need to keep ingesting RSS feeds.
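
A minimal sketch of what a per-stage budget with exponential backoff and jitter could look like. `RetryBudget`, `call_with_budget`, and the delay constants are hypothetical names and values, not the Actor's code; only the attempt counts (4 / 3 / 2) mirror the documented input defaults.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class RetryBudget:
    """Per-stage retry counter (illustrative; not the Actor's implementation)."""
    attempts: int
    base_delay: float = 0.5   # assumed: seconds before the first retry
    max_delay: float = 8.0    # assumed: upper bound on any single wait

    def delay(self, attempt: int) -> float:
        # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
        return random.uniform(0.0, min(self.max_delay, self.base_delay * 2 ** attempt))


def call_with_budget(fn, budget: RetryBudget, sleep=time.sleep):
    """Call fn() until it succeeds or this stage's own budget runs out."""
    last_error = None
    for attempt in range(budget.attempts):
        try:
            return fn()
        except Exception as exc:  # a real stage would catch narrower errors
            last_error = exc
            if attempt < budget.attempts - 1:
                sleep(budget.delay(attempt))
    raise last_error


# One budget object per stage, so failures in one stage never consume another's.
BUDGETS = {
    "rss": RetryBudget(attempts=4),
    "decode": RetryBudget(attempts=3),
    "article": RetryBudget(attempts=2),
}
```

Because each stage holds its own `RetryBudget`, exhausting the `article` budget on one slow publisher leaves the `rss` and `decode` counters untouched.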

🖼️ HD images without visiting publisher pages

Most scrapers that want article thumbnails have to follow every link to the publisher and parse <meta property="og:image">. This one fetches Google News' HTML index in parallel with the RSS feed and pulls the HD thumbnail straight from news.google.com/api/attachments/.... Images come with zero extra requests per article.
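
As a rough illustration of pulling thumbnail links out of an HTML index, the sketch below collects `/api/attachments/` URLs with a regular expression. The pattern, the function name `extract_thumbnails`, and the sample markup are assumptions; the README does not show the real parser or how thumbnails are matched to articles.

```python
import re

# Assumed pattern: absolute or root-relative links to /api/attachments/<token>.
ATTACHMENT_RE = re.compile(
    r"(?:https://news\.google\.com)?/api/attachments/[A-Za-z0-9_=-]+"
)


def extract_thumbnails(html: str) -> list[str]:
    """Return unique attachment URLs in document order, made absolute."""
    seen: dict[str, None] = {}
    for match in ATTACHMENT_RE.finditer(html):
        url = match.group(0)
        if url.startswith("/"):
            url = "https://news.google.com" + url
        seen.setdefault(url, None)  # dict keeps first-seen order
    return list(seen)
```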

🔓 Smart URL decoder

Optional decodeUrls resolves news.google.com/rss/articles/CBMi... redirects to the actual publisher URL via Google's Fbv4je batchexecute endpoint — so you can store, deduplicate, and re-crawl real source URLs.
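
The decode request itself is not documented here, so the sketch below covers only the step before it: isolating the article identifier from a Google News RSS link. `article_id_from_rss_link` is a hypothetical helper, and the assumption is that the identifier is the path segment following `articles`.

```python
from __future__ import annotations

from urllib.parse import urlparse


def article_id_from_rss_link(rss_link: str) -> str | None:
    """Return the segment after /articles/ in a news.google.com link, else None."""
    parts = urlparse(rss_link)
    if parts.netloc != "news.google.com":
        return None
    segments = [segment for segment in parts.path.split("/") if segment]
    if "articles" in segments:
        index = segments.index("articles")
        if index + 1 < len(segments):
            return segments[index + 1]
    return None
```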

📊 50,000-article archives, day-by-day

Google News RSS caps each feed at ~100 results. When you ask for maxItems > 100, the scraper automatically splits the date range one day at a time and parallelizes across days — no manual chunking, no coordination on your side.
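
One way to picture the splitting step: turn an inclusive date range into one window per day and only do so when the requested volume exceeds the per-feed cap. `day_windows` and `needs_split` are hypothetical helpers; how the Actor applies each window to a query is not described in this README.

```python
from __future__ import annotations

from datetime import date, timedelta

FEED_CAP = 100  # approximate per-feed limit quoted above


def needs_split(max_items: int | None, feed_cap: int = FEED_CAP) -> bool:
    """Day-by-day splitting only matters when one feed cannot satisfy the request."""
    return max_items is not None and max_items > feed_cap


def day_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end] (inclusive) into (day, next_day) half-open windows."""
    if end < start:
        raise ValueError("end date precedes start date")
    windows = []
    current = start
    while current <= end:
        windows.append((current, current + timedelta(days=1)))
        current += timedelta(days=1)
    return windows
```

For a full non-leap year this yields 365 windows, each of which can be fetched independently and in parallel.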

🚀 Quick Start

Drop this into the Actor input:

{
  "keywords": ["Elon Musk"],
  "maxArticles": 10,
  "timeframe": "1d",
  "region_language": "US:en",
  "extractImages": true
}

You get back HD-illustrated, ISO-timestamped, source-attributed news in seconds.

🧭 Two Ways to Search (Mix and Match)

🔍 Mode 1 — Keyword search

Full Google search operator support (site:, intitle:, inurl:, OR, AND, -, "…").

{
  "keywords": ["bitcoin", "ethereum -dogecoin", "intitle:\"AI\" site:bbc.com"],
  "timeframe": "1h",
  "maxArticles": 50
}

Perfect for brand monitoring · competitor tracking · trend detection · SEO research.
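
For orientation, a keyword query could map onto a Google News RSS search URL roughly as follows. This is a sketch under assumptions: the `rss/search` path and the `when:` operator reflect how Google News RSS is commonly queried, and `build_search_url` is a hypothetical helper, not necessarily what the Actor does internally.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_search_url(
    query: str,
    region: str = "US",
    language: str = "en",
    timeframe: str | None = None,
) -> str:
    """Build an RSS search URL; operators in `query` are passed through untouched."""
    # Assumed mapping: a relative timeframe such as "1h" becomes "when:1h".
    q = query if timeframe in (None, "all") else f"{query} when:{timeframe}"
    params = {
        "q": q,
        "hl": f"{language}-{region}",
        "gl": region,
        "ceid": f"{region}:{language}",
    }
    return "https://news.google.com/rss/search?" + urlencode(params)
```

Operators such as `site:` or `-term` need no special handling here because `urlencode` percent-encodes the whole query string.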

📰 Mode 2 — Topics & sections

Predefined Google News topics: WORLD · NATION · BUSINESS · TECHNOLOGY · ENTERTAINMENT · SPORTS · SCIENCE · HEALTH

{
  "topics": ["TECHNOLOGY", "BUSINESS"],
  "maxArticles": 100,
  "region_language": "FR:fr"
}

Custom topic/section URLs — browse Google News, navigate to anything (e.g. Sports → F1, Tech → Artificial Intelligence), and paste the URL straight from your address bar:

{
  "topicUrls": [
    "https://news.google.com/topics/CAAq.../sections/CAQi...?hl=en-US&gl=US&ceid=US:en"
  ]
}

You can combine all three source types (keywords + topics + topicUrls) in a single run.
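
A pasted topic URL carries a topic ID, an optional section ID, and locale parameters. The sketch below shows one way to pull those apart, also accepting the bare `{TOPIC_ID}[/sections/{SECTION_ID}]` form. `parse_topic_ref` and the returned key names are hypothetical, and the IDs in any example are placeholders.

```python
import re
from urllib.parse import parse_qs, urlparse

# Optional "/topics/" prefix, a topic ID, then an optional "/sections/<id>" tail.
TOPIC_RE = re.compile(
    r"^(?:/?topics/)?(?P<topic>[^/?]+)(?:/sections/(?P<section>[^/?]+))?/?$"
)


def parse_topic_ref(value: str) -> dict:
    """Split a topic URL or bare topic reference into IDs and locale parameters."""
    if value.startswith(("http://", "https://")):
        parts = urlparse(value)
        path, query = parts.path, parse_qs(parts.query)
    else:
        path, query = value.split("?", 1)[0], {}
    match = TOPIC_RE.match(path)
    if match is None:
        raise ValueError(f"not a topic reference: {value!r}")
    return {
        "topic_id": match["topic"],
        "section_id": match["section"],  # None when no section is present
        "hl": query.get("hl", [None])[0],
        "gl": query.get("gl", [None])[0],
        "ceid": query.get("ceid", [None])[0],
    }
```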

📦 Output Schema

Every article is one clean JSON record:

{
  "title": "Elon Musk tells his side of OpenAI's beginnings - PBS",
  "source": "PBS",
  "sourceUrl": "https://www.pbs.org",
  "url": "https://www.pbs.org/newshour/show/elon-musk-...",
  "rssLink": "https://news.google.com/rss/articles/CBMi...",
  "guid": "CBMi...",
  "articleId": "CBMi...",
  "publishedAt": "2026-04-30T00:46:35.000Z",
  "publishedTimestamp": 1777509995000,
  "image": "https://news.google.com/api/attachments/CC8i...-w400-h224-p-df-rw",
  "description": "Tesla and SpaceX CEO Elon Musk took the witness stand...",
  "loadedUrl": "https://www.pbs.org/newshour/show/elon-musk-...",
  "metadata": {
    "scrapeTimestamp": "2026-04-30T13:57:26.461Z",
    "sourceType": "keyword",
    "timeframe": "1d",
    "region": "US",
    "language": "en",
    "keyword": "Elon Musk"
  }
}

| Field | Always? | Notes |
| --- | --- | --- |
| `title`, `source`, `sourceUrl` | ✅ | From RSS |
| `url` | ✅ | Resolved publisher URL when `decodeUrls=true`, else Google News redirect |
| `rssLink`, `guid`, `articleId` | ✅ | The CBMi… identifier extracted in three forms |
| `publishedAt`, `publishedTimestamp` | ✅ | ISO-8601 UTC + Unix epoch ms |
| `image` | when `extractImages=true` (default) | Google News HD thumbnail; upgraded to publisher's `og:image` if `extractDescriptions=true` |
| `description` | when `extractDescriptions=true` | `og:description` from publisher page |
| `loadedUrl` | when publisher page was fetched | Final URL after redirects |
| `metadata.sourceType` | ✅ | `keyword` / `topic` / `topic_url` |

Pipe this directly into BigQuery, Snowflake, Airtable, Google Sheets, Slack, your CRM, or anything that speaks JSON / CSV / XLSX.
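
Before loading records downstream, two cheap checks are often useful: confirming that `publishedAt` and `publishedTimestamp` agree, and dropping duplicates by `articleId`. The helpers below are a sketch with hypothetical names; they rely only on the field names from the schema above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(published_at: str) -> int:
    """Convert an ISO-8601 UTC string such as '...T00:46:35.000Z' to epoch milliseconds."""
    moment = datetime.fromisoformat(published_at.replace("Z", "+00:00"))
    # Integer division of timedeltas avoids float rounding on the millisecond.
    return (moment - EPOCH) // timedelta(milliseconds=1)


def timestamps_agree(record: dict) -> bool:
    """True when the two published-time fields describe the same instant."""
    return iso_to_epoch_ms(record["publishedAt"]) == record["publishedTimestamp"]


def dedupe_by_article_id(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each articleId, preserving order."""
    unique: dict[str, dict] = {}
    for record in records:
        unique.setdefault(record["articleId"], record)
    return list(unique.values())
```

Applied to the sample record above, `iso_to_epoch_ms("2026-04-30T00:46:35.000Z")` gives `1777509995000`, matching its `publishedTimestamp`.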

📋 Input Reference

Sources (at least one required)

| Field | Type | Description |
| --- | --- | --- |
| `keywords` | `string[]` | Multiple search queries |
| `query` | `string` | Single query (added to `keywords` if both are set) |
| `topics` | `string[]` | One or more of `WORLD · NATION · BUSINESS · TECHNOLOGY · ENTERTAINMENT · SPORTS · SCIENCE · HEALTH` |
| `topicUrls` | `string[]` | Full Google News topic/section URLs (or bare `{TOPIC_ID}[/sections/{SECTION_ID}]`) |
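
The source rules above (at least one source, `query` folded into `keywords`, topics drawn from a fixed set) can be checked client-side before starting a run. This is a sketch with a hypothetical `normalize_sources` helper; skipping a `query` that already appears in `keywords` is an assumption.

```python
TOPICS = {
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SPORTS", "SCIENCE", "HEALTH",
}


def normalize_sources(actor_input: dict) -> dict:
    """Validate the source fields and merge `query` into `keywords`."""
    keywords = list(actor_input.get("keywords") or [])
    query = actor_input.get("query")
    if query and query not in keywords:  # assumed: no duplicate entry
        keywords.append(query)

    topics = list(actor_input.get("topics") or [])
    unknown = [topic for topic in topics if topic not in TOPICS]
    if unknown:
        raise ValueError(f"unknown topics: {unknown}")

    topic_urls = list(actor_input.get("topicUrls") or [])
    if not (keywords or topics or topic_urls):
        raise ValueError("provide at least one of keywords, query, topics, topicUrls")
    return {"keywords": keywords, "topics": topics, "topicUrls": topic_urls}
```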

Volume

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `maxItems` | `int` 1–50,000 | (none) | Global cap. `>100` triggers automatic day-by-day splitting |
| `maxArticles` | `int` | `10` | Per-source cap (used when `maxItems` is not set) |

Time window

| Field | Type | Description |
| --- | --- | --- |
| `timeframe` | `1h` / `1d` / `7d` / `1m` / `1y` / `all` | Window for keyword searches |
| `time_period` | `last_hour` / `last_day` / `last_week` / `last_month` / `last_year` / `custom` | Alias for `timeframe` |
| `time_period_min`, `time_period_max` | `MM/DD/YYYY` | Used with `time_period=custom` |
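
Since `time_period` is an alias for `timeframe`, the two can be reduced to a single window description. The mapping below pairs the values in the order they are listed, which is an assumption, as is giving `time_period` precedence when both fields are set; `resolve_time_window` is a hypothetical helper.

```python
from datetime import datetime

# Assumed one-to-one mapping, following the order of the documented values.
TIME_PERIOD_TO_TIMEFRAME = {
    "last_hour": "1h",
    "last_day": "1d",
    "last_week": "7d",
    "last_month": "1m",
    "last_year": "1y",
}


def resolve_time_window(actor_input: dict) -> dict:
    """Return either a relative timeframe or a custom start/end date pair."""
    period = actor_input.get("time_period")
    if period == "custom":
        start = datetime.strptime(actor_input["time_period_min"], "%m/%d/%Y").date()
        end = datetime.strptime(actor_input["time_period_max"], "%m/%d/%Y").date()
        if end < start:
            raise ValueError("time_period_max precedes time_period_min")
        return {"mode": "custom", "start": start, "end": end}
    if period is not None:
        return {"mode": "relative", "timeframe": TIME_PERIOD_TO_TIMEFRAME[period]}
    # No default is documented, so a missing timeframe stays None here.
    return {"mode": "relative", "timeframe": actor_input.get("timeframe")}
```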

Locale & filters

| Field | Type | Description |
| --- | --- | --- |
| `region_language` | `"US:en"`, `"FR:fr"`, … | Combined region:language |
| `gl` | ISO country (lowercase) | Overrides region |
| `hl` | ISO language (lowercase) | Overrides language |
| `lr` | `lang_en`, `lang_fr`, … | Restrict results to a language |
| `cr` | ISO country (lowercase) | Restrict results to a country |
| `nfpr` | `0` / `1` | `1` = disable auto-correct |
| `filter` | `0` / `1` | `1` = enable Similar/Omitted Results filter |
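
The override rules for `region_language`, `gl`, and `hl` amount to a small precedence chain. A sketch, assuming a `US` / `en` fallback when nothing is supplied (the README does not state the default); `resolve_locale` is a hypothetical name.

```python
def resolve_locale(actor_input: dict) -> tuple[str, str]:
    """Return (REGION, language) after applying gl/hl overrides."""
    region, language = "US", "en"  # assumed fallback, not documented
    combined = actor_input.get("region_language")
    if combined:
        region, separator, language = combined.partition(":")
        if not (separator and region and language):
            raise ValueError(f"expected 'REGION:language', got {combined!r}")
    if actor_input.get("gl"):
        region = actor_input["gl"]    # gl overrides the region part
    if actor_input.get("hl"):
        language = actor_input["hl"]  # hl overrides the language part
    return region.upper(), language.lower()
```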

Extraction

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `decodeUrls` | `bool` | `false` | Resolve to publisher URLs |
| `extractDescriptions` | `bool` | `false` | Fetch `og:description` (implies `decodeUrls`) |
| `extractImages` | `bool` | `true` | Include HD thumbnails |

Advanced

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `proxyConfiguration` | `object` | Apify Residential US | Standard Apify proxy input |
| `concurrency` | `int` 1–128 | `32` | In-flight requests for decode + article stages |
| `retryBudgetRss` | `int` 1–10 | `4` | Independent retry attempts per RSS feed |
| `retryBudgetDecode` | `int` 1–10 | `3` | Independent retry attempts per URL decode |
| `retryBudgetArticle` | `int` 1–10 | `2` | Independent retry attempts per publisher page fetch |
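
A `concurrency` limit of this kind is commonly enforced with a semaphore around each in-flight request. The following is a generic asyncio sketch, not the Actor's code; `run_bounded` is a hypothetical helper that takes zero-argument coroutine factories.

```python
import asyncio


async def run_bounded(factories, concurrency: int = 32):
    """Run coroutine factories with at most `concurrency` of them in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(factory):
        async with semaphore:  # blocks while `concurrency` jobs are already running
            return await factory()

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(factory) for factory in factories))
```

Raising the limit increases throughput only while the proxy pool and the target tolerate it, which is why the documented range is capped.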

💡 Recipes

🔔 Brand monitoring — last hour, multi-keyword

{
  "keywords": ["Acme Corp", "Acme CEO", "Acme product line"],
  "timeframe": "1h",
  "maxArticles": 100,
  "extractImages": true
}

📊 Daily SEO digest with descriptions

{
  "topics": ["TECHNOLOGY", "BUSINESS"],
  "maxArticles": 50,
  "decodeUrls": true,
  "extractDescriptions": true,
  "extractImages": true
}

📚 Historical archive (5,000 articles, full year)

{
  "query": "climate change",
  "time_period": "custom",
  "time_period_min": "01/01/2025",
  "time_period_max": "12/31/2025",
  "maxItems": 5000,
  "lr": "lang_en"
}

🌍 Cross-language coverage

{
  "keywords": ["Olympics 2028"],
  "gl": "fr",
  "hl": "fr",
  "lr": "lang_fr"
}

🧪 Specific section (e.g. Tech → AI)

{
  "topicUrls": [
    "https://news.google.com/topics/CAAq.../sections/CAQi...?hl=en-US&gl=US&ceid=US:en"
  ],
  "maxArticles": 50
}

🔌 No-Code Integrations

Works with Make, n8n, Zapier, and Pipedream out of the box. Common flows:

Schedule: run every 15 min / hourly / daily via Apify Schedules
Push: send new articles to Google Sheets, Airtable, Notion, Slack, Discord, or your CRM
Alert: trigger when the number of articles for a brand or keyword crosses a threshold

🏗️ Architecture

                    ┌──────────────────────────────────────────┐
                    │  curl_cffi  ·  Chrome TLS impersonation  │
                    │  Apify Residential US (rotating IP)      │
                    └──────────────────────────────────────────┘
                                       │
                ┌──────────────────────┼─────────────────────┐
                ▼                      ▼                     ▼
        ┌──────────────┐      ┌──────────────┐      ┌──────────────┐
        │  RSS feed    │      │  HTML index  │      │  URL decode  │
        │  (lxml)      │      │  (regex)     │      │  (Fbv4je)    │
        │  retry: 4×   │      │  retry: 4×   │      │  retry: 3×   │
        └──────┬───────┘      └──────┬───────┘      └──────┬───────┘
               │                     │                     │
               └─────────┬───────────┘                     │
                         ▼                                 ▼
                ┌──────────────────┐              ┌────────────────┐
                │  merged item +   │              │  publisher     │
                │  HD thumbnail    │              │  og:* fetch    │
                └────────┬─────────┘              │  retry: 2×     │
                         │                        └────────┬───────┘
                         └────────────┬───────────────────┘
                                      ▼
                              ┌──────────────┐
                              │  Apify       │
                              │  Dataset     │
                              └──────────────┘

Each retry budget is independent: a 5xx storm in one stage never burns the others.

🛠️ Local Development

git clone <this-repo>
cd google_news

pip install -r requirements.txt

# Drop a test input
mkdir -p storage/key_value_stores/default
cat > storage/key_value_stores/default/INPUT.json <<'JSON'
{
  "keywords": ["bitcoin"],
  "maxArticles": 5,
  "extractImages": true
}
JSON

python -m src

Output appears under storage/datasets/default/*.json.
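
To inspect a local run, the dataset files can be read back with a few lines of Python. This sketch assumes one JSON record per file and skips any file whose name starts with `__` (assumed to be storage metadata); `load_local_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_local_dataset(directory="storage/datasets/default") -> list[dict]:
    """Load every record file from a local dataset directory, in filename order."""
    records = []
    for path in sorted(Path(directory).glob("*.json")):
        if path.name.startswith("__"):
            continue  # assumed metadata file, not an article record
        with path.open(encoding="utf-8") as handle:
            records.append(json.load(handle))
    return records
```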

ℹ️ Note: when run locally without APIFY_PROXY_PASSWORD, the scraper falls back to direct connections. RSS and image extraction still work, but URL decoding requires residential US proxies because Google's Fbv4je endpoint rejects non-residential IPs.

Deploy to Apify

deploy.bat        # Windows
# or
apify push --force

🧰 Tech Stack

Python 3.12 · async/await throughout
curl_cffi 0.7+ — Chrome TLS impersonation, HTTP/2
apify SDK 3.x — platform integration, proxy, dataset
lxml — fast XML/HTML parsing
Custom regex parsers — for Google News HTML index and Fbv4je batchexecute responses

📄 License

MIT.

Google News Scraper

lhotanova/google-news-scraper

Gets featured articles from Google News with title, link, source, publication date and image.

Kristýna Lhoťanová

3.1K

4.6

(13)

Google News Scraper

easyapi/google-news-scraper

Powerful Google News scraper, collect up to 5000 news articles with flexible search options, language support. Perfect for news aggregation, market research, and sentiment analysis. 📰🔍

EasyApi

1.6K

3.8

(8)

Google News Scraper (Pay Per Event)

data_xplorer/google-news-scraper-fast

Scrape Google News in real time, including images and descriptions. This tool extracts complete structured data: high-resolution visuals, full titles, sources, dates, and direct URLs.

Data Xplorer

876

4.9

(4)

Google News Scraper

epctex/google-news-scraper

Unlock timely news insights with our Google News data retrieval tool. Get the latest news on any news at any time, and more. Effortless and powerful. 📰🔍 #NewsData

epctex

588

5.0

(8)

Google News Scraper

crawlerbros/google-news-scraper

Scrape Google News in real-time. Supports keyword search, date filters, full-text article extraction, and image extraction.

Crawler Bros

5.0

(30)

Google News Scraper Fast & cheap ⭐ (Pay per results) 📰⚡

scrapestorm/google-news-scraper-fast-cheap-pay-per-results

Unlock the power of the Google News scraper tool! 📰✨ Effortlessly gather news articles based on your chosen Keyword or topic 🔍. Get key details like the title 📝 source 🌐, publication time ⏰, images 🖼️, & direct links to the full articles 🔗perfect for staying informed and ahead of the curve! 🚀

Storm_Scraper

683

3.7

(21)

Google News Scraper

automation-lab/google-news-scraper

📰 Extract Google News articles: headlines, sources, publication dates, descriptions, and URLs. Search by keyword or browse topics (Tech, Business, Sports). Pure HTTP — fast, reliable, no browser needed.

Stas Persiianenko

190

Google News Realtime Scraper

devisty/google-news

Provide real-time news and articles sourced from Google News

Devisty

257

Google News Scraper

data_xplorer/google-news-scraper

🚀 Google News data at your fingertips: Our powerful scraper delivers real-time news monitoring across 70+ region/language combinations. Built as a developer-friendly API alternative, it transforms raw news feeds into structured, actionable data.

Data Xplorer

130

5.0

(2)

Google News Scraper

xmolodtsov/google-news-scraper

Extract full Google News articles with text, images & metadata. 95%+ success rate, multi-region support, smart content extraction with automatic fallbacks. Production-ready & cost-optimized

Yevhenii Molodtsov

5.0

(1)

Google News Scraper API | PR & Sentiment Monitoring

andok/google-news-scraper

Instantly scrape recent news articles and headlines by keyword from Google News. Automate your media monitoring and PR tracking.

Andok
