Real-Time Google News Scraper (Keywords + Topics + AI-ready)
Pricing: from $3.50 per 1,000 results
Extract structured, real-time news data from Google News using keyword- or topic-based scraping.
Developer: Ahmed Jasarevic (maintained by Community)
Google News Scraper Actor
A fast and scalable Apify Actor for scraping Google News articles by keywords or topics, with full article extraction, deduplication, and structured dataset output.
It supports:
- Keyword-based search
- Topic-based news scraping
- Region and language control
- Fast parallel scraping
- Article extraction (title, snippet, image)
- Clean Apify dataset output
Features
- Google News RSS scraping
- Full article URL decoding (Google News redirect resolver)
- HTML article parsing
- Block detection with fallback
- Threaded scraping for speed
- Apify dataset integration (Actor.push_data)
- Supports multiple topics in one run
Input Configuration
Mode: Keywords
Search news using keywords.
Example input
{"mode": "keywords","keywords": ["ai", "openai"],"maxArticles": 20,"region_language": "US:en","timeframe": "1d"}
Mode: Topics
Scrape full Google News categories.
Available topics
- WORLD
- NATION
- BUSINESS
- TECHNOLOGY
- ENTERTAINMENT
- SPORTS
- SCIENCE
- HEALTH
Example input
{"mode": "topics","topics": ["BUSINESS", "TECHNOLOGY"],"maxArticles": 20,"region_language": "US:en","timeframe": "1d"}
Output Format (Dataset Item)
Each scraped article is stored in the Apify dataset.
Example output
{"position": 7,"title": "A fragile jihadist-separatist alliance in Mali","link": "https://www.france24.com/en/africa/20260501-example","domain": "www.france24.com","source": "www.france24.com","snippet": "A fragile alliance between jihadist and separatist groups in Mali is evolving...","thumbnail": "https://example.com/image.jpg","date_utc": "2026-05-02T16:36:39.390391+00:00"}
Dataset Fields
| Field | Description |
|---|---|
| title | Article title |
| link | Direct article URL |
| domain | Source domain |
| source | Source domain (duplicate for compatibility) |
| snippet | Extracted article summary |
| thumbnail | Article image URL (og:image), if available |
| position | Item's position in the RSS feed |
| date_utc | Scrape timestamp (UTC, ISO 8601) |
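A sketch of how one dataset item with the fields above could be assembled. `make_item` is a hypothetical helper; it assumes `domain` is the host part of the resolved link and that `date_utc` is the scrape time produced by `datetime.now(timezone.utc).isoformat()`, which yields the `+00:00` suffix seen in the example output.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def make_item(position, title, link, snippet="", thumbnail=None):
    """Hypothetical helper: build one dataset item with the README's fields."""
    domain = urlparse(link).netloc  # e.g. "www.france24.com"
    return {
        "position": position,
        "title": title,
        "link": link,
        "domain": domain,
        "source": domain,  # duplicate of domain, kept for compatibility
        "snippet": snippet,
        "thumbnail": thumbnail,  # og:image URL or None
        "date_utc": datetime.now(timezone.utc).isoformat(),  # scrape time
    }
```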
How It Works
Step 1
Fetch Google News RSS feed:
https://news.google.com/rss/search?q=QUERY
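Step 1 can be sketched with the Python standard library alone. `fetch_feed` and `parse_rss` are hypothetical helpers; the element names (`item`, `title`, `link`, `description`, `source`) follow the RSS 2.0 layout, and in real Google News feeds `description` usually holds an HTML fragment that would still need cleaning. `position` is simply the item's 1-based index in the feed.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_feed(url, timeout=15):
    """Hypothetical helper: download the RSS feed as text."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_rss(xml_text, limit=None):
    """Hypothetical helper: turn RSS <item> elements into dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for position, item in enumerate(root.iter("item"), start=1):
        items.append({
            "position": position,
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),  # Google redirect link
            "snippet": item.findtext("description", default="").strip(),
            "source": item.findtext("source", default="").strip(),
        })
        if limit is not None and len(items) >= limit:
            break
    return items


# Offline example feed (no network needed).
SAMPLE = """<rss version="2.0"><channel>
<item><title>First story</title><link>https://news.google.com/rss/articles/x1</link>
<description>d1</description><source url="https://a.com">A News</source></item>
<item><title>Second story</title><link>https://news.google.com/rss/articles/x2</link></item>
</channel></rss>"""

items = parse_rss(SAMPLE)
```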
Step 2
Decode Google News redirect links to the real article URLs
Step 3
Scrape article HTML:
- title
- meta description / og:description
- first paragraphs (fallback when no description is found)
- og:image
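The extraction in Step 3 might look like this stdlib-only sketch built on `html.parser.HTMLParser`. `ArticleParser` and `extract_article` are hypothetical; the precedence (og:title over `<title>`, then meta description, then og:description, then the first two paragraphs) is an assumption, not necessarily the order the Actor uses.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collects <title>, <meta> name/property values and <p> text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.paragraphs = []
        self._in_title = False
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                # Keep the first value seen for each key (e.g. "og:image").
                self.meta.setdefault(key.lower(), attrs["content"].strip())
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._in_p:
            self._in_p = False
            text = " ".join("".join(self._buf).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self._buf.append(data)


def extract_article(html, max_paragraphs=2):
    """Hypothetical helper: title, snippet and thumbnail from article HTML."""
    p = ArticleParser()
    p.feed(html)
    title = p.meta.get("og:title") or p.title.strip()
    snippet = (p.meta.get("description") or p.meta.get("og:description")
               or " ".join(p.paragraphs[:max_paragraphs]))
    return {"title": title, "snippet": snippet, "thumbnail": p.meta.get("og:image")}
```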
Step 4
Push results to the Apify dataset
Performance
- Multi-threaded scraping (ThreadPoolExecutor)
- 8 workers by default
- Fast RSS parsing
- Block detection fallback
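A minimal sketch of the threaded fan-out using `concurrent.futures.ThreadPoolExecutor` with 8 workers. `scrape_all` and `_demo_scrape` are hypothetical; returning the unscraped RSS item when a worker raises is an assumed fallback policy.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(items, scrape_one, max_workers=8):
    """Run scrape_one over items in a thread pool, keeping input order."""
    def safe(item):
        try:
            return scrape_one(item)
        except Exception:
            # Assumed fallback: keep the RSS-only item if scraping fails.
            return item

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `items`, not completion order.
        return list(pool.map(safe, items))


def _demo_scrape(item):
    # Stand-in for a real fetch + parse; item 3 simulates a failure.
    if item["position"] == 3:
        raise RuntimeError("blocked")
    return {**item, "scraped": True}


results = scrape_all([{"position": i} for i in range(1, 6)], _demo_scrape)
```

Because the work is I/O-bound (HTTP requests), threads are a reasonable fit despite the GIL.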
Block Handling
The scraper detects:
- JavaScript-required pages
- "Access denied"
- "Subscribe to continue"
- Adblock walls
If blocked:
- Falls back to the RSS snippet
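Block detection can be as simple as looking for known phrases in the fetched HTML. The marker list and the `looks_blocked` / `choose_snippet` helpers below are hypothetical; plain substring matching is crude and may flag pages that merely mention these words.

```python
# Hypothetical marker phrases, matched case-insensitively.
BLOCK_MARKERS = (
    "please enable javascript",
    "javascript is required",
    "access denied",
    "subscribe to continue",
    "ad blocker",
    "adblock",
)


def looks_blocked(html):
    """True if the page text contains any known block-page phrase."""
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def choose_snippet(article_snippet, rss_snippet, html):
    """Fall back to the RSS snippet when the page is blocked or empty."""
    if looks_blocked(html) or not article_snippet:
        return rss_snippet
    return article_snippet
```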
Example Use Cases
- AI news monitoring
- Financial news tracking
- Competitor intelligence
- Media dashboards
- Research automation
- Trending topic tracking
Example Output Flow
Input:
{"mode": "keywords","keywords": ["elon musk", "openai"],"maxArticles": 10}
Output:
- 10 cleaned articles
- deduplicated
- full URLs
- structured dataset
Notes
- Google News RSS returns up to ~100 items per query
- Final output is limited by maxArticles
- Topics are aggregated before scraping
- Keywords use OR logic
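The aggregation, deduplication and `maxArticles` limit could be combined as in this sketch. `merge_feeds` and `normalize_link` are hypothetical; interleaving feeds round-robin (so one topic cannot fill the whole quota) and the URL normalization rules used as the dedup key are assumptions.

```python
from itertools import chain, zip_longest
from urllib.parse import urlsplit, urlunsplit

_MISSING = object()  # filler for feeds of unequal length


def normalize_link(url):
    """Canonical form used only as a dedup key (hypothetical rules)."""
    parts = urlsplit(url.strip())
    # Lower-case scheme and host, drop the fragment and any trailing slash.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), parts.query, ""))


def merge_feeds(feeds, max_articles):
    """Interleave feeds round-robin, drop duplicate links, stop at max_articles."""
    interleaved = (item for item in chain.from_iterable(
        zip_longest(*feeds, fillvalue=_MISSING)) if item is not _MISSING)
    seen, merged = set(), []
    for item in interleaved:
        key = normalize_link(item["link"])
        if key in seen:
            continue
        seen.add(key)
        merged.append(item)
        if len(merged) >= max_articles:
            break
    return merged
```

Items keep their original RSS `position`; only duplicates and anything past the limit are dropped.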
Future Upgrades
- sentiment scoring
- AI clustering by topic
- full-text extraction fallback (Readability)
- proxy rotation
- webhook export (Zapier / Supabase)