Real-Time Google News Scraper (Keywords + Topics + AI-ready)
Pricing
from $3.50 / 1,000 results
Extract structured, real-time news data from Google News using keywords or topic-based scraping.
Developer
Ahmed Jasarevic
Google News Scraper Actor
A fast and scalable Apify Actor for scraping Google News articles by keywords or topics, with full article extraction, deduplication, and structured dataset output.
It supports:
- Keyword-based search
- Topic-based news scraping
- Region and language control
- Fast parallel scraping
- Article extraction (title, snippet, image)
- Clean Apify dataset output
Features
- Google News RSS scraping
- Article URL decoding (Google News redirect resolver)
- HTML article parsing
- Block detection fallback
- Threaded scraping for speed
- Apify dataset integration (`Actor.push_data`)
- Supports multiple topics in one run
Input Configuration
Mode: Keywords
Search news using keywords.
Example input
{"mode": "keywords", "keywords": ["ai", "openai"], "maxArticles": 20, "region_language": "US:en", "timeframe": "1d"}
Mode: Topics
Scrape full Google News categories.
Available topics
- WORLD
- NATION
- BUSINESS
- TECHNOLOGY
- ENTERTAINMENT
- SPORTS
- SCIENCE
- HEALTH
Example input
{"mode": "topics", "topics": ["BUSINESS", "TECHNOLOGY"], "maxArticles": 20, "region_language": "US:en", "timeframe": "1d"}
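Both input shapes above could be checked before a run with a small validator. This is a minimal sketch, not the Actor's own validation: the function name `validate_input`, the returned key names, and the default values are assumptions made for illustration. The input field names and the topic list come from the examples above.

```python
ALLOWED_TOPICS = {
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SPORTS", "SCIENCE", "HEALTH",
}


def validate_input(raw: dict) -> dict:
    """Return a normalized copy of the input or raise ValueError (hypothetical helper)."""
    mode = raw.get("mode")
    if mode not in ("keywords", "topics"):
        raise ValueError(f"mode must be 'keywords' or 'topics', got {mode!r}")

    if mode == "keywords":
        terms = [k.strip() for k in raw.get("keywords", []) if k.strip()]
        if not terms:
            raise ValueError("keywords mode needs at least one keyword")
    else:
        terms = [t.strip().upper() for t in raw.get("topics", [])]
        unknown = sorted(set(terms) - ALLOWED_TOPICS)
        if not terms or unknown:
            raise ValueError(f"invalid or missing topics: {unknown}")

    # "US:en" -> region "US", language "en"
    region, sep, language = raw.get("region_language", "US:en").partition(":")
    if not (region and sep and language):
        raise ValueError("region_language must look like 'US:en'")

    max_articles = int(raw.get("maxArticles", 20))  # default of 20 is an assumption
    if max_articles < 1:
        raise ValueError("maxArticles must be >= 1")

    return {
        "mode": mode,
        "terms": terms,
        "region": region,
        "language": language,
        "max_articles": max_articles,
        "timeframe": raw.get("timeframe"),
    }
```

Failing early on an unknown topic or a malformed `region_language` avoids spending a run on a feed URL that returns nothing.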
Output Format (Dataset Item)
Each scraped article is stored in the Apify dataset.
Example output
{"position": 7, "title": "A fragile jihadist-separatist alliance in Mali", "link": "https://www.france24.com/en/africa/20260501-example", "domain": "www.france24.com", "source": "www.france24.com", "snippet": "A fragile alliance between jihadist and separatist groups in Mali is evolving...", "thumbnail": "https://example.com/image.jpg", "date_utc": "2026-05-02T16:36:39.390391+00:00"}
Dataset Fields
| Field | Description |
|---|---|
| title | Article title |
| link | Direct article URL |
| domain | Source domain |
| source | Source domain (duplicate for compatibility) |
| snippet | Extracted article summary |
| thumbnail | OG image if available |
| position | Position of the article in the RSS feed |
| date_utc | Scrape timestamp (UTC, ISO 8601) |
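The fields in the table can be modelled as a small record type. The sketch below assumes that `domain` is taken from the host part of `link` and that `date_utc` is an ISO 8601 UTC timestamp, which matches the example output. The names `ArticleItem` and `make_item` are hypothetical and not part of the Actor.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ArticleItem:
    position: int
    title: str
    link: str
    domain: str
    source: str
    snippet: str
    thumbnail: Optional[str]
    date_utc: str


def make_item(position: int, title: str, link: str, snippet: str,
              thumbnail: Optional[str] = None) -> dict:
    """Build one dataset item; domain derivation is an assumption."""
    domain = urlparse(link).netloc
    item = ArticleItem(
        position=position,
        title=title,
        link=link,
        domain=domain,
        source=domain,  # duplicate of domain, as described in the table
        snippet=snippet,
        thumbnail=thumbnail,
        date_utc=datetime.now(timezone.utc).isoformat(),
    )
    return asdict(item)
```

The returned dictionary has the same keys as the example output, so it could be passed directly to a dataset writer.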
How It Works
Step 1
Fetch Google News RSS feed:
https://news.google.com/rss/search?q=QUERY
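Step 1 can be sketched with the standard library alone. Several details here are assumptions rather than the Actor's confirmed behaviour: the `hl`, `gl`, and `ceid` parameters and the `when:` search operator are commonly used with Google News RSS feeds but are not officially documented, and the mapping from `region_language` and `timeframe` onto them, the quoting of multi-word keywords, and the helper names `build_search_url` and `parse_rss` are illustrative.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SEARCH_URL = "https://news.google.com/rss/search"


def build_search_url(keywords, region_language="US:en", timeframe=None):
    """Build a search feed URL; keywords are joined with OR."""
    region, _, language = region_language.partition(":")
    # Assumption: quote multi-word keywords so they are matched as a phrase.
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    query = " OR ".join(terms)
    if timeframe:
        query += f" when:{timeframe}"  # assumed mapping of the timeframe input
    params = {
        "q": query,
        "hl": f"{language}-{region}",
        "gl": region,
        "ceid": f"{region}:{language}",
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def parse_rss(xml_text):
    """Turn RSS <item> elements into dicts, numbered from 1 in feed order."""
    root = ET.fromstring(xml_text)
    return [
        {
            "position": i,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # In real feeds the description is usually an HTML fragment.
            "snippet": item.findtext("description", default=""),
        }
        for i, item in enumerate(root.iter("item"), start=1)
    ]
```

The feed body itself could be fetched with any HTTP client and passed to `parse_rss` as text.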
Step 2
Decode Google News redirect links into the real article URLs
Step 3
Scrape article HTML:
- title
- meta description / og:description
- first paragraphs fallback
- og:image
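The extraction order in Step 3 (meta description first, first paragraphs as a fallback) can be illustrated with Python's built-in `html.parser`. This is a minimal sketch under assumptions: the Actor may use a different HTML library, and the names `MetaExtractor` and `extract_article` and the two-paragraph limit are hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <title>, <meta> tags, and paragraph text from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title_parts = []
        self.paragraphs = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content"):
                # Keep the first occurrence of each meta key.
                self.meta.setdefault(key.lower(), attrs["content"].strip())
        elif tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_p:
            self.paragraphs[-1] += data


def extract_article(html, max_paragraphs=2):
    parser = MetaExtractor()
    parser.feed(html)
    paragraphs = [p.strip() for p in parser.paragraphs if p.strip()]
    snippet = (
        parser.meta.get("og:description")
        or parser.meta.get("description")
        or " ".join(paragraphs[:max_paragraphs])  # first-paragraphs fallback
    )
    return {
        "title": "".join(parser.title_parts).strip(),
        "snippet": snippet,
        "thumbnail": parser.meta.get("og:image"),
    }
```

If a page has no `og:image`, `thumbnail` is `None`, which matches the "if available" wording in the field table.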
Step 4
Push results to Apify dataset
Performance
- Multi-threaded scraping (ThreadPoolExecutor)
- 8 workers by default
- Fast RSS parsing
- Block detection fallback
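The threaded scraping described above can be sketched with `concurrent.futures`. The pool size of 8 comes from the list above. The function name `scrape_all`, the `scrape_one` callback, and the shape of the error record are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(links, scrape_one, max_workers=8):
    """Run scrape_one(link) in a thread pool and return results in input order."""
    results = [None] * len(links)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the index of the link it belongs to.
        futures = {pool.submit(scrape_one, link): i for i, link in enumerate(links)}
        for future in as_completed(futures):
            i = futures[future]
            try:
                results[i] = future.result()
            except Exception as exc:
                # One failing article should not abort the whole run.
                results[i] = {"link": links[i], "error": str(exc)}
    return results
```

Threads suit this workload because each task spends most of its time waiting on network I/O rather than on the CPU.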
Block Handling
The scraper detects:
- JS requirement pages
- "Access denied"
- "Subscribe to continue"
- adblock walls
If blocked:
- Falls back to RSS snippet
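A simple way to implement this kind of detection is a case-insensitive substring check on the page body. This is a hedged sketch: only "Access denied" and "Subscribe to continue" come from the list above, the other marker strings are guesses, and `looks_blocked` and `choose_snippet` are hypothetical names.

```python
BLOCK_MARKERS = (
    "access denied",
    "subscribe to continue",
    "enable javascript",        # assumed wording for JS requirement pages
    "disable your ad blocker",  # assumed wording for adblock walls
)


def looks_blocked(html: str) -> bool:
    """Return True if the page body contains a known block phrase."""
    text = html.lower()
    return any(marker in text for marker in BLOCK_MARKERS)


def choose_snippet(html: str, extracted_snippet: str, rss_snippet: str) -> str:
    """Prefer the scraped snippet; fall back to the RSS snippet when blocked or empty."""
    if not html or looks_blocked(html) or not extracted_snippet:
        return rss_snippet
    return extracted_snippet
```

Substring matching is cheap but can misfire on articles that quote these phrases, so a real implementation might also check the page length or the HTTP status code.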
Example Use Cases
- AI news monitoring
- Financial news tracking
- Competitor intelligence
- Media dashboards
- Research automation
- Trending topic tracking
Example Output Flow
Input:
{"mode": "keywords", "keywords": ["elon musk", "openai"], "maxArticles": 10}
Output:
- 10 cleaned articles
- deduplicated
- full URLs
- structured dataset
Notes
- Google News RSS returns ~100 items per query
- Final output is limited by `maxArticles`
- Topics are aggregated before scraping
- Keywords use OR logic
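Aggregating several feeds, deduplicating, and applying the `maxArticles` limit could look like the sketch below. The deduplication key is an assumption: here two links count as the same article if they match after lowercasing the scheme and host and dropping the fragment and any trailing slash. The names `normalize_url` and `merge_feeds` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        parts.query,
        "",  # drop the fragment
    ))


def merge_feeds(feeds, max_articles):
    """Concatenate feeds in order, skip duplicate links, then apply the limit."""
    seen = set()
    merged = []
    for feed in feeds:
        for entry in feed:
            key = normalize_url(entry["link"])
            if key in seen:
                continue
            seen.add(key)
            merged.append(entry)
    return merged[:max_articles]
```

Deduplicating before applying the limit means a run with overlapping topics still returns up to `maxArticles` distinct articles.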
Future upgrades
- sentiment scoring
- AI clustering by topic
- full-text extraction fallback (Readability)
- proxy rotation
- webhook export (Zapier / Supabase)