Real-Time Google News Scraper (Keywords + Topics + AI-ready)

Pricing: from $3.50 / 1,000 results

Extract structured, real-time news data from Google News using keywords or topic-based scraping.

Rating: 0.0 (0 ratings)

Developer: Ahmed Jasarevic
Maintained by: Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

📰 Google News Scraper Actor

A fast and scalable Apify Actor for scraping Google News articles by keywords or topics, with article-page extraction (title, snippet, thumbnail), deduplication, and structured dataset output.

It supports:

  • 🔍 Keyword-based search
  • 🧭 Topic-based news scraping
  • 🌍 Region + language control
  • ⚑ Fast parallel scraping
  • 🧠 Article extraction (title, snippet, image)
  • 📦 Clean Apify dataset output

🚀 Features

  • Google News RSS scraping
  • Google News redirect decoding (resolves each item to the publisher's article URL)
  • HTML article parsing
  • Block detection fallback
  • Threaded scraping for speed
  • Apify dataset integration (Actor.push_data)
  • Supports multiple topics in one run

⚙️ Input Configuration

🔹 Mode: Keywords

Search news using keywords.

Example input

{
  "mode": "keywords",
  "keywords": ["ai", "openai"],
  "maxArticles": 20,
  "region_language": "US:en",
  "timeframe": "1d"
}
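A keyword run comes down to a single Google News RSS search URL. The sketch below shows one plausible way to build it from the input above. The helper name `build_search_url`, the quoting of multi-word keywords, and the mapping of `timeframe` to Google's `when:` search operator and of `region_language` to the `hl`/`gl`/`ceid` query parameters are all assumptions, not the Actor's confirmed code.

```python
from typing import List, Optional
from urllib.parse import urlencode


def build_search_url(keywords: List[str], region_language: str = "US:en",
                     timeframe: Optional[str] = "1d") -> str:
    """Hypothetical helper: build a Google News RSS search URL from Actor input."""
    country, lang = region_language.split(":")  # "US:en" -> ("US", "en")
    # Keywords are OR-ed together; multi-word keywords are quoted as phrases (assumption).
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    query = " OR ".join(terms)
    if timeframe:
        # Assumption: timeframe maps onto Google's "when:" operator, e.g. "when:1d".
        query += f" when:{timeframe}"
    params = {
        "q": query,
        "hl": f"{lang}-{country}",    # interface language, e.g. "en-US"
        "gl": country,                # edition country, e.g. "US"
        "ceid": f"{country}:{lang}",  # edition id, e.g. "US:en"
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


print(build_search_url(["ai", "openai"]))
# https://news.google.com/rss/search?q=ai+OR+openai+when%3A1d&hl=en-US&gl=US&ceid=US%3Aen
```

Building one OR query, rather than one request per keyword, matches the "Keywords use OR logic" note and keeps a run to a single feed fetch.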

🔹 Mode: Topics

Scrape Google News topic sections (categories) instead of a search query.

Available topics

  • WORLD
  • NATION
  • BUSINESS
  • TECHNOLOGY
  • ENTERTAINMENT
  • SPORTS
  • SCIENCE
  • HEALTH

Example input

{
  "mode": "topics",
  "topics": ["BUSINESS", "TECHNOLOGY"],
  "maxArticles": 20,
  "region_language": "US:en",
  "timeframe": "1d"
}
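Topic mode reads a section feed instead of a search feed. A minimal sketch, assuming the Actor uses Google News's public section-feed URL pattern (`/rss/headlines/section/topic/<TOPIC>`) with the same `hl`/`gl`/`ceid` locale parameters as search; `build_topic_url` and `TOPICS` are illustrative names.

```python
from urllib.parse import urlencode

# The topic names listed above; anything else is rejected before any request is made.
TOPICS = {"WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
          "ENTERTAINMENT", "SPORTS", "SCIENCE", "HEALTH"}


def build_topic_url(topic: str, region_language: str = "US:en") -> str:
    """Hypothetical helper: RSS URL for one Google News topic section."""
    topic = topic.upper()
    if topic not in TOPICS:
        raise ValueError(f"Unknown topic: {topic}")
    country, lang = region_language.split(":")
    params = {"hl": f"{lang}-{country}", "gl": country, "ceid": f"{country}:{lang}"}
    return f"https://news.google.com/rss/headlines/section/topic/{topic}?" + urlencode(params)


feeds = [build_topic_url(t) for t in ["BUSINESS", "TECHNOLOGY"]]
```

With several topics, each section feed is fetched separately and the results are aggregated before article scraping.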

📤 Output Format (Dataset Item)

Each scraped article is stored in the Apify dataset.

Example output

{
  "position": 7,
  "title": "A fragile jihadist-separatist alliance in Mali",
  "link": "https://www.france24.com/en/africa/20260501-example",
  "domain": "www.france24.com",
  "source": "www.france24.com",
  "snippet": "A fragile alliance between jihadist and separatist groups in Mali is evolving...",
  "thumbnail": "https://example.com/image.jpg",
  "date_utc": "2026-05-02T16:36:39.390391+00:00"
}

📊 Dataset Fields

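Field | Description
--- | ---
title | Article title
link | Direct article URL (after redirect decoding)
domain | Source domain (host name of link)
source | Source domain (duplicate of domain, kept for compatibility)
snippet | Extracted article summary (RSS snippet if the page is blocked)
thumbnail | og:image URL, if available
position | Ranking position in the RSS feed
date_utc | Scrape timestamp (UTC, ISO 8601)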
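To make the field list concrete, here is one way to assemble a dataset item in Python. It is a sketch, assuming `domain` and `source` are both taken from the host name of `link` and that `date_utc` is the scrape time rendered with `datetime.isoformat()`, which produces the same format as the example output; `NewsItem` and `make_item` are hypothetical names.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse


@dataclass
class NewsItem:
    """Hypothetical record mirroring the dataset fields listed above."""
    position: int
    title: str
    link: str
    domain: str
    source: str
    snippet: str
    thumbnail: Optional[str]
    date_utc: str


def make_item(position: int, title: str, link: str, snippet: str,
              thumbnail: Optional[str] = None) -> dict:
    host = urlparse(link).netloc  # e.g. "www.france24.com"
    return asdict(NewsItem(
        position=position,
        title=title,
        link=link,
        domain=host,
        source=host,  # duplicate of domain, kept for compatibility
        snippet=snippet,
        thumbnail=thumbnail,
        date_utc=datetime.now(timezone.utc).isoformat(),  # scrape time, not publish time
    ))
```

Returning a plain `dict` keeps the record JSON-serializable, which is what a dataset push expects.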

🔄 How It Works

Step 1

Fetch Google News RSS feed:

https://news.google.com/rss/search?q=QUERY
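Step 1 can be sketched with the standard library's `xml.etree.ElementTree`. The element names (`item`, `title`, `link`, `description`, `source`) are standard RSS 2.0; the function name `parse_rss` and the returned keys `rss_snippet` and `publisher` are hypothetical, and the HTTP fetch itself is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_rss(xml_text: str) -> List[Dict[str, object]]:
    """Hypothetical helper: turn a Google News RSS document into plain dicts."""
    root = ET.fromstring(xml_text)
    items = []
    # position is the 1-based order of <item> elements in the feed
    for position, item in enumerate(root.iter("item"), start=1):
        source = item.find("source")
        items.append({
            "position": position,
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),  # still a Google News link
            "rss_snippet": item.findtext("description") or "",  # may contain HTML
            "publisher": (source.text or "") if source is not None else "",
        })
    return items
```

The `link` values at this point still point at news.google.com; Step 2 resolves them to publisher URLs.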

Step 2

Decode Google News redirect links → real article URLs

Step 3

Scrape article HTML:

  • title
  • meta description / og:description
  • first paragraphs (as a fallback)
  • og:image
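A dependency-free sketch of Step 3 using `html.parser.HTMLParser` from the standard library. The precedence (meta `description`, then `og:description`, then the first paragraphs) follows the list above; the names `ArticleMetaParser` and `extract_article` and the two-paragraph limit are assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ArticleMetaParser(HTMLParser):
    """Hypothetical parser: collects <title>, <meta> content, and <p> text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: Dict[str, str] = {}
        self.paragraphs: List[str] = []
        self._in_title = False
        self._in_p = False
        self._buf: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p, self._buf = True, []
        elif tag == "meta":
            # Covers both name="description" and property="og:..." tags.
            key = (a.get("property") or a.get("name") or "").lower()
            if key and a.get("content"):
                self.meta.setdefault(key, a["content"].strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html: str, max_paragraphs: int = 2) -> Dict[str, Optional[str]]:
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    snippet = (parser.meta.get("description")
               or parser.meta.get("og:description")
               or " ".join(parser.paragraphs[:max_paragraphs]))
    return {
        "title": parser.title.strip(),
        "snippet": snippet,
        "thumbnail": parser.meta.get("og:image"),  # None when the page has no og:image
    }
```

Self-closing tags such as `<meta ... />` are routed through `handle_starttag` by `HTMLParser`'s default `handle_startendtag`, so both meta-tag styles are picked up.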

Step 4

Push results to Apify dataset


⚑ Performance

  • Multi-threaded scraping (ThreadPoolExecutor)
  • 8 workers by default
  • Fast RSS parsing
  • Block detection fallback
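The threaded fan-out can be sketched with `concurrent.futures.ThreadPoolExecutor`. `Executor.map` yields results in input order, so RSS positions survive the parallel step. `scrape_all`, the per-URL error capture, and the `error` key are illustrative assumptions; `scrape_one` stands in for whatever fetch-and-parse function the Actor actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def scrape_all(urls: List[str], scrape_one: Callable[[str], Dict],
               max_workers: int = 8) -> List[Dict]:
    """Hypothetical helper: run scrape_one over urls on a pool of threads."""

    def safe(url: str) -> Dict:
        # One failing page should not abort the whole batch.
        try:
            return scrape_one(url)
        except Exception as exc:
            return {"link": url, "error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as urls
        return list(pool.map(safe, urls))
```

Threads suit this workload because each worker spends most of its time waiting on network I/O rather than on Python bytecode.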

🛑 Block Handling

The scraper detects:

  • Pages that require JavaScript
  • “Access denied” pages
  • “Subscribe to continue” paywalls
  • Ad-blocker walls

If blocked:

  • Falls back to RSS snippet
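A rough sketch of the block check and fallback, assuming simple case-insensitive substring matching against the page HTML. The marker strings, `looks_blocked`, and `choose_snippet` are hypothetical, as is falling back when the page yields an empty snippet; the Actor's real heuristics may differ, and plain substring checks can misfire (for example on normal pages that carry a `<noscript>` "enable JavaScript" notice).

```python
# Hypothetical marker list, roughly matching the block types listed above.
BLOCK_MARKERS = (
    "enable javascript",
    "access denied",
    "subscribe to continue",
    "disable your ad blocker",
)


def looks_blocked(html: str) -> bool:
    """Heuristic: does the page look like a block or paywall page?"""
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def choose_snippet(html: str, page_snippet: str, rss_snippet: str) -> str:
    # Fall back to the RSS snippet when the page is blocked or yielded nothing useful.
    if looks_blocked(html) or not page_snippet.strip():
        return rss_snippet
    return page_snippet
```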

📌 Example Use Cases

  • AI news monitoring
  • Financial news tracking
  • Competitor intelligence
  • Media dashboards
  • Research automation
  • Trending topic tracking

🚀 Example Output Flow

Input:

{
  "mode": "keywords",
  "keywords": ["elon musk", "openai"],
  "maxArticles": 10
}

Output:

  • Up to 10 cleaned articles
  • deduplicated
  • resolved article URLs
  • structured dataset

🧠 Notes

  • Google News RSS returns ~100 items per query
  • Final output is limited by maxArticles
  • Topics are aggregated before scraping
  • Keywords use OR logic
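The aggregation and `maxArticles` cap can be sketched as follows. The deduplication keys (a lightly normalized URL plus a case- and whitespace-normalized title) are assumptions, since the README does not say how duplicates are detected; `merge_feeds` and `normalize_url` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the #fragment and a trailing slash (assumption)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_feeds(feeds: Iterable[List[Dict]], max_articles: int) -> List[Dict]:
    """Aggregate per-query/per-topic feeds, drop duplicates, stop at max_articles."""
    seen_urls: Set[str] = set()
    seen_titles: Set[str] = set()
    merged: List[Dict] = []
    for feed in feeds:  # one feed per keyword query or topic section
        for item in feed:
            if len(merged) >= max_articles:
                return merged
            url_key = normalize_url(item["link"])
            title_key = " ".join(item.get("title", "").lower().split())
            if url_key in seen_urls or (title_key and title_key in seen_titles):
                continue  # duplicate by URL or by title
            seen_urls.add(url_key)
            if title_key:
                seen_titles.add(title_key)
            merged.append(item)  # original item is kept unchanged
    return merged
```

Capping during the merge, before article pages are fetched, avoids spending requests on items that would be discarded anyway.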

📈 Future Upgrades

  • sentiment scoring
  • AI clustering by topic
  • full-text extraction fallback (Readability)
  • proxy rotation
  • webhook export (Zapier / Supabase)