Pricing

$20.00/month + usage

Google News Scraper

Extract full Google News articles with text, images, and metadata. 95%+ success rate, multi-region support, and smart content extraction with automatic fallbacks. Production-ready and cost-optimized.

Rating

5.0

(1)

Developer

Yevhenii Molodtsov

Last modified

a month ago

Google News Bulk Scraper

Google News → publisher URLs → clean article text + images + metadata, with JS rendering, paywall, and consent-page fallbacks. HTTP-first, Playwright only when needed.

Scrape one query or thousands in a single run. Each article lands as its own dataset row with the full text, images, author, source, language, and a quality score — ready for NLP pipelines, media monitoring, or research datasets.

What You Get

Each article in the output includes:

title — headline as published
url — canonical publisher URL (not the Google News redirect)
source — publisher name (e.g. "Reuters", "TechCrunch")
publishedAt — ISO 8601 timestamp
author — byline when available
text — clean full-text content (300+ characters, validated)
images — OG image, featured image, and in-article images with alt text
language — detected content language
extractionSuccess — boolean flag for downstream filtering
contentQuality — score (0-100), level (low/medium/high), and warnings

Set fetchArticleDetails: false to skip crawling and get RSS metadata only (title, source, date, link) at minimal cost.
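
Because every row carries extractionSuccess and contentQuality, downstream code can drop weak rows before further processing. The following is a minimal sketch, assuming items have the field shape listed above; the helper name filterUsableArticles and the default threshold of 60 are illustrative choices, not part of the actor.

```javascript
// Hypothetical helper: keep only articles that extracted cleanly and meet a
// minimum quality score. The default threshold (60) is an arbitrary example.
function filterUsableArticles(items, minScore = 60) {
    return items.filter((item) => {
        // Guard each access in case a row lacks the quality fields.
        const score = item.contentQuality?.score ?? 0;
        return item.extractionSuccess === true && score >= minScore;
    });
}

// Hand-written sample rows, not real scraper output.
const sampleRows = [
    { title: 'A', extractionSuccess: true, contentQuality: { score: 85, level: 'high' } },
    { title: 'B', extractionSuccess: true, contentQuality: { score: 40, level: 'low' } },
    { title: 'C', extractionSuccess: false },
];
console.log(filterUsableArticles(sampleRows).map((row) => row.title));
```

Raising or lowering minScore trades dataset size against text quality; a suitable value depends on the downstream task.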

Quick Start

Using Apify Console

Visit Apify Console
Search for "Google News Scraper"
Configure your search parameters
Run the actor

Using Apify CLI

npm install -g apify-cli

# Single query
apify call google-news-scraper --input '{
  "query": "Tesla",
  "maxItemsPerUrl": 10
}'

# Multiple queries (string shorthand)
apify call google-news-scraper --input '{
  "queries": ["tesla", "apple"],
  "maxItemsPerUrl": 10
}'

# Multiple queries with passthrough fields
apify call google-news-scraper --input '{
  "queries": [
    { "query": "Kim Kardashian", "profileUrl": "https://news.google.com/search?q=kim+kardashian" },
    { "query": "MrBeast" }
  ],
  "maxItemsPerUrl": 10,
  "maxItems": 15
}'

Using Apify API

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('google-news-scraper').call({
    queries: [
        { query: 'Taylor Swift', profileUrl: 'https://news.google.com/search?q=taylor+swift' },
        { query: 'Elon Musk', profileUrl: 'https://news.google.com/search?q=elon+musk' },
    ],
    maxItemsPerUrl: 10,
    maxItems: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
// items is a flat array of articles, each with query + passthrough fields merged in
console.log(items);

Input Modes

Most Common: Single Query

{
    "query": "artificial intelligence",
    "maxItemsPerUrl": 10
}

That's it — one query, up to 10 articles.

Bulk: Multiple Queries

Pass an array of strings to scrape several topics in one run:

{
    "queries": ["tesla", "apple", "nvidia"],
    "maxItemsPerUrl": 10
}

Advanced: Queries with Passthrough Fields

Each query can be an object. Any field besides query is passed through to every output article for that query — useful for linking results back to your own IDs, profile URLs, or tags:

{
    "queries": [
        { "query": "Kim Kardashian", "profileUrl": "https://news.google.com/search?q=kim+kardashian" },
        { "query": "MrBeast", "customField": "my-tag" },
        "Taylor Swift"
    ],
    "maxItemsPerUrl": 10,
    "maxItems": 25
}

Precedence: queries > query. If both are provided, queries wins.
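
The three input modes and the precedence rule can be pictured as one normalization step. This is a sketch of the documented semantics only, not the actor's source code; the function name normalizeQueries, the passthrough key, and the choice to treat an empty queries array as "not provided" are assumptions.

```javascript
// Hypothetical sketch of the documented input semantics; not the actor's code.
function normalizeQueries(input) {
    // Precedence: `queries` wins over `query`. Treating an empty array as
    // "not provided" is an assumption made for this sketch.
    const hasQueries = Array.isArray(input.queries) && input.queries.length > 0;
    const raw = hasQueries ? input.queries : (input.query ? [input.query] : []);

    return raw.map((entry) => {
        // String shorthand becomes an object with no passthrough fields.
        if (typeof entry === 'string') return { query: entry, passthrough: {} };
        // Every field except `query` is treated as passthrough data.
        const { query, ...passthrough } = entry;
        return { query, passthrough };
    });
}

console.log(normalizeQueries({
    query: 'ignored because queries is present',
    queries: [{ query: 'MrBeast', customField: 'my-tag' }, 'Taylor Swift'],
}));
```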

Configuration

Input Parameters

Parameter	Type	Required	Default	Description
`query`	string	No*	-	Simple search query string
`queries`	array	No*	-	Array of strings or objects with `query` and optional passthrough fields
`maxItemsPerUrl`	integer	No	`50`	Max articles per individual query
`maxItems`	integer	No	`0`	Optional global cap on total articles (0 = unlimited)
`fetchArticleDetails`	boolean	No	`true`	If false, skip article crawling and return RSS metadata only
`region`	string	No	`"US"`	Country code (US, GB, CA, AU, DE, ES, MX, IT)
`language`	string	No	`"en-US"`	Language code (en-US, en-GB, en-CA, en-AU, de-DE, es-ES, es-MX, it-IT)
`dateFrom`	string	No	-	Start date (YYYY-MM-DD)
`dateTo`	string	No	-	End date (YYYY-MM-DD)
`disableBrowserFallback`	boolean	No	`false`	Skip Playwright fallback — cheaper but may return fewer articles
`proxyConfiguration`	object	No	Apify Proxy enabled	Proxy settings; defaults to Apify Proxy

*At least one of query or queries is required.
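
A client-side pre-flight check can catch malformed input before a run is started. The sketch below covers only the constraints stated in the table (one of query or queries, YYYY-MM-DD dates, integer caps); validateInput is a hypothetical helper, and the actor's own validation may be stricter or differ in detail.

```javascript
// Hypothetical pre-flight check; the actor's own validation may differ.
const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

function validateInput(input) {
    const errors = [];
    const hasQuery = typeof input.query === 'string' && input.query.trim() !== '';
    const hasQueries = Array.isArray(input.queries) && input.queries.length > 0;
    if (!hasQuery && !hasQueries) errors.push('Provide at least one of query or queries.');

    for (const key of ['dateFrom', 'dateTo']) {
        if (input[key] !== undefined && !DATE_PATTERN.test(input[key])) {
            errors.push(`${key} must use the YYYY-MM-DD format.`);
        }
    }
    // Dates in YYYY-MM-DD form compare correctly as plain strings.
    const bothDatesValid = DATE_PATTERN.test(input.dateFrom ?? '') && DATE_PATTERN.test(input.dateTo ?? '');
    if (bothDatesValid && input.dateFrom > input.dateTo) {
        errors.push('dateFrom must not be later than dateTo.');
    }
    for (const key of ['maxItemsPerUrl', 'maxItems']) {
        if (input[key] !== undefined && (!Number.isInteger(input[key]) || input[key] < 0)) {
            errors.push(`${key} must be a non-negative integer.`);
        }
    }
    return errors; // An empty array means no problems were found.
}

console.log(validateInput({ query: 'Tesla', dateFrom: '2025-08-10', dateTo: '2025-08-01' }));
```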

How Extraction Works

The pipeline resolves every Google News redirect to the real publisher URL, fetches the page over HTTP (falling back to a browser when needed), and then tries six extraction strategies in order, stopping at the first one that produces 300+ characters of text with images:

HTTP fetch: fast, cheap, works for most publishers
Playwright browser: automatic fallback for JS-rendered or consent-gated pages
Content extraction: Readability, Extractus, JSON-LD, custom selectors, meta tags, and heuristics, tried in that order

Every article is quality-scored (text length, image presence, error-page detection). Low-quality results are filtered before they reach your dataset.
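
The "try strategies in order, keep the first good result" pattern described above could look roughly like this. It is an illustrative sketch, not the actor's internals: extractWithFallbacks, the strategy objects, and the result shape are all hypothetical, and the acceptance rule simply mirrors the stated threshold of 300+ characters with at least one image.

```javascript
// Hypothetical sketch of ordered fallback extraction.
// Strategy functions and the result shape are illustrative assumptions.
const MIN_TEXT_LENGTH = 300;

function extractWithFallbacks(html, strategies) {
    for (const { name, extract } of strategies) {
        let result;
        try {
            result = extract(html);
        } catch {
            continue; // A failing strategy hands over to the next one.
        }
        const text = result?.text ?? '';
        const images = result?.images ?? [];
        if (text.length >= MIN_TEXT_LENGTH && images.length > 0) {
            return { strategy: name, text, images };
        }
    }
    return null; // No strategy met the threshold.
}

// Stub strategies standing in for real extractors.
const stubStrategies = [
    { name: 'readability', extract: () => ({ text: 'too short', images: [] }) },
    { name: 'json-ld', extract: () => ({ text: 'x'.repeat(400), images: ['https://example.com/a.jpg'] }) },
];
console.log(extractWithFallbacks('<html></html>', stubStrategies)?.strategy);
```

Ordering the cheapest and most reliable strategies first keeps the common case fast, while later strategies only run for pages the earlier ones could not handle.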

Estimated Cost

All costs depend on article count, target sites, and proxy tier. The numbers below are rough guidelines based on typical runs using Apify Proxy (datacenter tier).

Scenario	Articles	Typical Cost
RSS metadata only (`fetchArticleDetails: false`)	100	~$0.01 – $0.02
Full text, HTTP-first (most sites)	100	~$0.05 – $0.10
Full text, mixed HTTP + Playwright fallback	100	~$0.10 – $0.25
Heavy JS sites (frequent Playwright)	100	~$0.20 – $0.50

Cost levers you control:

fetchArticleDetails: false — skip article crawling entirely for near-zero cost
disableBrowserFallback: true — stay HTTP-only, ~2-5x cheaper, fewer articles from JS-heavy sites
maxItemsPerUrl / maxItems — hard caps on article count
Proxy tier — datacenter is default and cheapest; residential auto-escalates only on repeated 429/403 errors
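
For rough budgeting, the per-100-article ranges from the table can be scaled to a planned run size. The sketch below assumes cost grows linearly with article count, which the table does not guarantee, and it leaves out the monthly rental fee; estimateCost and the scenario keys are hypothetical names.

```javascript
// Per-100-article ranges in USD, copied from the cost table.
const COST_PER_100 = {
    metadataOnly: [0.01, 0.02],
    httpFirst: [0.05, 0.10],
    mixed: [0.10, 0.25],
    heavyJs: [0.20, 0.50],
};

// Hypothetical estimator: assumes linear scaling with article count and
// excludes the monthly rental fee.
function estimateCost(articleCount, scenario) {
    const range = COST_PER_100[scenario];
    if (!range) throw new Error(`Unknown scenario: ${scenario}`);
    const scale = articleCount / 100;
    // Round to 4 decimals to hide floating-point noise.
    return range.map((usd) => Math.round(usd * scale * 1e4) / 1e4);
}

const [low, high] = estimateCost(2500, 'mixed');
console.log(`Estimated usage cost: $${low} to $${high}`);
```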

Limitations

Be aware of these before you buy:

Paywalled sites: articles behind hard paywalls (WSJ, FT, NYT subscriber-only) return partial text or fail. The scraper extracts whatever is publicly visible.
Heavy bot protection: sites with aggressive Cloudflare challenges or CAPTCHAs may need multiple retries and residential proxies, which increases cost.
Region/language variance: Google News returns different articles depending on region and language, so the same query may yield different results for US and DE.
RSS feed limits: Google News RSS feeds return a limited window of articles (roughly 24-72 hours). For historical coverage, set dateFrom and dateTo; the scraper slices the date range automatically.
Image availability: some publishers strip images or serve them under CDN policies that block external access. Articles without valid images receive a lower quality score.
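
To make the idea of date slicing concrete, here is one way a long range could be cut into short windows. This illustrates the general technique only; how the scraper slices ranges internally is not documented here. sliceDateRange is a hypothetical helper, and the 3-day default is an assumption loosely based on the 24-72 hour feed window mentioned above.

```javascript
// Hypothetical helper: split an inclusive YYYY-MM-DD range into windows of
// at most `windowDays` days. Not the actor's internal logic.
function sliceDateRange(dateFrom, dateTo, windowDays = 3) {
    if (!Number.isInteger(windowDays) || windowDays < 1) {
        throw new RangeError('windowDays must be a positive integer');
    }
    const DAY_MS = 24 * 60 * 60 * 1000;
    // Work in UTC so daylight-saving changes cannot shift a date.
    const toIso = (ms) => new Date(ms).toISOString().slice(0, 10);
    const end = Date.parse(`${dateTo}T00:00:00Z`);
    const windows = [];
    for (let start = Date.parse(`${dateFrom}T00:00:00Z`); start <= end; start += windowDays * DAY_MS) {
        const windowEnd = Math.min(start + (windowDays - 1) * DAY_MS, end);
        windows.push({ dateFrom: toIso(start), dateTo: toIso(windowEnd) });
    }
    return windows;
}

console.log(sliceDateRange('2025-08-01', '2025-08-07'));
```

The same helper can be used to split a very long historical range across several runs, passing each window as dateFrom and dateTo.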

Output Format

Output is a flat array of articles. Each article is a separate dataset entry with the query string and any passthrough fields merged at the top level:

[
    {
        "query": "Taylor Swift",
        "profileUrl": "https://news.google.com/search?q=taylor+swift",
        "title": "Taylor Swift Announces New Album - Billboard",
        "url": "https://www.billboard.com/2025/08/05/taylor-swift-new-album.html",
        "source": "Billboard",
        "publishedAt": "2025-08-05T14:08:57.000Z",
        "author": "Jane Smith",
        "text": "Full article content...",
        "description": "Brief summary of the article...",
        "images": [
            {
                "url": "https://example.com/image.jpg",
                "type": "featured-og",
                "alt": "Image description"
            }
        ],
        "tags": ["Taylor Swift"],
        "language": "en",
        "extractionSuccess": true,
        "contentQuality": {
            "score": 85,
            "level": "high",
            "isValid": true,
            "warnings": []
        }
    },
    {
        "query": "MrBeast",
        "customField": "test-passthrough",
        "title": "MrBeast Breaks YouTube Record",
        "url": "https://www.example.com/mrbeast-record.html",
        "source": "Example News",
        "publishedAt": "2025-08-05T10:00:00.000Z",
        "text": "Full article content...",
        "..."
    }
]
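
Since the dataset is flat, per-query views have to be rebuilt on the consumer side. A small sketch, assuming each row carries the query field shown above; groupByQuery is a hypothetical helper and the sample rows are hand-written.

```javascript
// Hypothetical helper: group flat dataset rows by their `query` field.
function groupByQuery(items) {
    const groups = new Map();
    for (const item of items) {
        const key = item.query ?? '(unknown)';
        if (!groups.has(key)) groups.set(key, []);
        groups.get(key).push(item);
    }
    return groups;
}

const grouped = groupByQuery([
    { query: 'Taylor Swift', title: 'First' },
    { query: 'MrBeast', title: 'Second' },
    { query: 'Taylor Swift', title: 'Third' },
]);
for (const [query, articles] of grouped) {
    console.log(query, articles.length);
}
```

A Map keeps queries in first-seen order, which makes the grouped output easy to line up with the order of the original input.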

Development

Setup

git clone https://github.com/YevheniiM/google-news-scrapper
cd google-news-scrapper
npm install

Running

# Production
npm start

# Development mode (DEBUG=true, NODE_ENV=development)
npm run dev

# Development with file watching
npm run dev:watch

Create an INPUT.json at the project root for local input:

{
    "queries": [{ "query": "Taylor Swift" }, { "query": "Elon Musk" }],
    "maxItemsPerUrl": 5
}

Testing

# Run all tests
npm test

# Watch mode
npm run test:watch

# With coverage
npm run test:coverage

Formatting

npm run format
npm run format:check

License

MIT -- see LICENSE for details.

Acknowledgments

Apify SDK and Crawlee for the scraping framework
@mozilla/readability and @extractus/article-extractor for content extraction
fast-xml-parser for RSS parsing
