📰 Extract Google News Articles — AI & RAG Ready
Pricing
from $7.00 / 1,000 results
Developer
Muhammad Afzal
📰 Google News Scraper — AI & RAG Ready
Extract Google News articles by keyword, topic, or URL with full-text extraction optimized for AI and RAG pipelines. Get headlines, sources, snippets, images, authors, and clean article text in structured JSON. Perfect for sentiment analysis, brand monitoring, competitive intelligence, and media tracking.
Features
- Multi-input mode — Search by keyword, browse by topic, or provide direct Google News URLs
- Full-text extraction — Visit article pages and extract clean text without ads, navigation, or boilerplate
- AI/RAG-ready output — Semantic field names, structured metadata, word count, and reading time estimates
- 25+ countries & 17 languages — Localized results for global news coverage
- Time filtering — Filter by last hour, day, week, month, or year
- URL decoding — Automatically resolves Google News redirect URLs to original article links
- RSS-first approach — Uses Google News RSS feeds for reliable, fast data extraction
- Article metadata — Extracts Open Graph tags, author, keywords, and description from publisher pages
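To make the RSS-first approach concrete, here is a minimal sketch of how a keyword search could map onto a Google News RSS feed URL. The helper name `build_rss_search_url` is hypothetical, and the mapping of `country`, `language`, and `timePeriod` onto the `gl`, `hl`, `ceid`, and `when:` parts of the URL is an assumption for illustration; the actor's actual URL construction is not documented here.

```python
from urllib.parse import urlencode


def build_rss_search_url(query, country="US", language="en", time_period=None):
    """Hypothetical helper: build a Google News RSS search URL.

    Assumption: the time filter is expressed with the `when:` search
    operator and the locale with hl/gl/ceid. The actor may do this differently.
    """
    if time_period and time_period != "all":
        query = f"{query} when:{time_period}"
    params = {
        "q": query,
        "hl": f"{language}-{country}",
        "gl": country,
        "ceid": f"{country}:{language}",
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


if __name__ == "__main__":
    print(build_rss_search_url("climate change", time_period="7d"))
```

Fetching a feed like this returns article links that point at Google News redirect pages, which is why the URL decoding feature matters for getting original publisher links.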
Use Cases
- RAG pipelines & LLM training — Feed clean article text into vector databases and language models
- Brand monitoring & PR tracking — Track mentions across thousands of news sources in real time
- Sentiment analysis — Collect structured news data for NLP and sentiment models
- Competitive intelligence — Monitor competitors, market trends, and industry developments
- Media aggregation — Build custom news feeds combining topics, keywords, and regions
- Content curation — Discover trending stories and curate content for newsletters and blogs
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | Array of strings | ["technology"] | Keyword searches. Supports Google operators (OR, -, "exact", site:). |
| topics | Array of strings | [] | News topics: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH |
| startUrls | Array of URLs | [] | Direct Google News URLs to scrape |
| maxResults | Integer | 50 | Maximum articles per search query or topic (1–5000) |
| country | String | US | Country code for localized results (25+ countries) |
| language | String | en | Language code (17 languages supported) |
| timePeriod | String | 1d | Time filter: 1h, 1d, 7d, 30d, 1y, all |
| extractFullText | Boolean | false | Extract full article text from publisher pages (slower, essential for AI/RAG) |
| decodeUrls | Boolean | true | Decode Google redirect URLs to original article links |
| proxyConfiguration | Object | Apify Proxy | Proxy settings for scraping |
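As an illustration of the parameters above, the following sketch assembles an input object and checks it against the documented ranges before a run is started. `build_input` is a hypothetical helper, not part of the actor; the allowed topic and time period values are copied from the table, and the example queries only demonstrate the operator syntax.

```python
import json

# Allowed values copied from the input table above.
ALLOWED_TOPICS = {"WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
                  "ENTERTAINMENT", "SPORTS", "SCIENCE", "HEALTH"}
ALLOWED_TIME_PERIODS = {"1h", "1d", "7d", "30d", "1y", "all"}


def build_input(search_queries=None, topics=None, max_results=50,
                time_period="1d", extract_full_text=False,
                country="US", language="en"):
    """Hypothetical helper: assemble and sanity-check an actor input dict."""
    topics = [t.upper() for t in (topics or [])]
    unknown = set(topics) - ALLOWED_TOPICS
    if unknown:
        raise ValueError(f"unknown topics: {sorted(unknown)}")
    if not 1 <= max_results <= 5000:
        raise ValueError("maxResults must be between 1 and 5000")
    if time_period not in ALLOWED_TIME_PERIODS:
        raise ValueError(f"timePeriod must be one of {sorted(ALLOWED_TIME_PERIODS)}")
    return {
        "searchQueries": list(search_queries or []),
        "topics": topics,
        "maxResults": max_results,
        "country": country,
        "language": language,
        "timePeriod": time_period,
        "extractFullText": extract_full_text,
        "decodeUrls": True,
    }


if __name__ == "__main__":
    payload = build_input(
        # Operator syntax from the table: OR, -, "exact", site:
        search_queries=['"electric vehicles" OR EV -tesla',
                        "site:reuters.com chip shortage"],
        topics=["business"],
        max_results=100,
        time_period="7d",
    )
    print(json.dumps(payload, indent=2))
```

Validating locally is a design choice rather than a requirement: it catches typos such as an unsupported time period before a paid run starts.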
Output
Each article contains:
| Field | Type | Description |
|---|---|---|
| title | String | Article headline |
| url | String or null | Decoded original article URL (when decodeUrls enabled) |
| source | String or null | Publisher name (e.g., "BBC News", "Reuters") |
| publisher_url | String or null | Publisher website URL (e.g., "https://www.reuters.com") |
| author | String or null | Article author (when extractFullText enabled) |
| published_at | String or null | Publication timestamp (ISO 8601) |
| snippet | String or null | Article summary from Google News |
| image_url | String or null | Article thumbnail image URL |
| category | String or null | News category (e.g., "TECHNOLOGY", "BUSINESS") |
| topic | String or null | Google News topic when browsing by topic |
| search_query | String or null | The search query that produced this result |
| full_text | String or null | Full article text (when extractFullText enabled) |
| word_count | Integer or null | Word count of extracted text |
| estimated_reading_time_min | Integer or null | Estimated reading time in minutes |
| article_metadata | Object or null | Open Graph and meta tags from the article page |
| scraped_at | String | ISO 8601 extraction timestamp |
| source_url | String | Google News feed URL that produced this result |
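For RAG use, each article record usually needs to be split into smaller passages that keep a pointer back to the source. The sketch below shows one way to do that with the fields listed above. The function names, the chunk sizes, and the shape of the returned dicts are illustrative assumptions, not something the actor produces itself.

```python
def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def to_rag_documents(article, chunk_size=200, overlap=40):
    """Hypothetical helper: turn one article record into chunk dicts.

    Falls back to the snippet when full_text is null, for example when
    extractFullText was disabled for the run.
    """
    text = article.get("full_text") or article.get("snippet") or ""
    return [
        {
            "id": f"{article.get('url')}#{i}",
            "text": chunk,
            "metadata": {
                "title": article.get("title"),
                "source": article.get("source"),
                "published_at": article.get("published_at"),
                "url": article.get("url"),
            },
        }
        for i, chunk in enumerate(chunk_words(text, chunk_size, overlap))
    ]
```

The resulting `text` values can be passed to whichever embedding model and vector database the pipeline uses, with `metadata` stored alongside for citation.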
Pricing
This actor uses Pay-Per-Event pricing:
| Event | Price | Description |
|---|---|---|
| result | $0.003 | Per article with headline, source, snippet, and metadata |
| full-text-result | $0.010 | Per article with full text, author, word count, and reading time |
Examples:
- 100 articles (headlines only) = $0.30
- 100 articles (with full text) = $1.00
- 1,000 articles (headlines only) = $3.00
- 1,000 articles (with full text) = $10.00
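The arithmetic behind these examples can be reproduced with a small estimator. This is a sketch that assumes each article is billed as exactly one event, either `result` or `full-text-result`, which matches the examples above; `estimate_cost` is a hypothetical name, and platform or proxy usage costs are not included.

```python
from decimal import Decimal

# Per-event prices copied from the pricing table above.
PRICE_PER_EVENT = {
    "result": Decimal("0.003"),
    "full-text-result": Decimal("0.010"),
}


def estimate_cost(article_count, with_full_text=False):
    """Hypothetical helper: estimated charge in USD for a number of articles.

    Assumption: one billed event per article, chosen by whether
    full text was extracted.
    """
    event = "full-text-result" if with_full_text else "result"
    return article_count * PRICE_PER_EVENT[event]


if __name__ == "__main__":
    for count in (100, 1000):
        for full_text in (False, True):
            label = "with full text" if full_text else "headlines only"
            print(f"{count} articles ({label}) = ${estimate_cost(count, full_text):.2f}")
```

`Decimal` is used instead of floats so that prices such as 0.003 multiply without rounding noise.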
Tips & Limitations
- Start without `extractFullText` for fast headline-only results, then enable it when you need article body text
- Google News RSS feeds return up to ~100 articles per search query; use multiple queries or topics for broader coverage
- Time period filtering only applies to keyword searches, not topic pages
- For best results with AI/RAG pipelines, enable `extractFullText` to get clean article text
- The `decodeUrls` option resolves Google News redirect URLs; keep it enabled for original article links
- Use Apify Proxy with residential rotation if you encounter rate limiting
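When several queries or topics are combined for broader coverage, the same story can show up more than once. A minimal sketch of merging such results follows; `dedupe_articles` is a hypothetical helper, and keying on `url` with a fallback to title plus source is an assumption about what counts as a duplicate.

```python
def dedupe_articles(articles):
    """Keep the first record per URL, falling back to (title, source).

    The fallback covers records whose url is null, for example when
    decodeUrls is disabled or decoding did not succeed.
    """
    seen = set()
    unique = []
    for article in articles:
        key = article.get("url") or (article.get("title"), article.get("source"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(article)
    return unique
```

Keeping the first occurrence preserves the `search_query` of whichever query surfaced the article first.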
Integration Examples
API
curl -X POST "https://api.apify.com/v2/acts/USERNAME~google-news-scraper/runs?token=YOUR_TOKEN" -H "Content-Type: application/json" -d '{"searchQueries": ["AI technology", "climate change"], "maxResults": 50, "extractFullText": true}'
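The same request can be prepared from Python using only the standard library. This sketch mirrors the curl call above: `USERNAME~google-news-scraper` and `YOUR_TOKEN` are the same placeholders, `build_run_request` is a hypothetical helper, and the network call is left commented out so the example runs without credentials.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_run_request(actor_id, token, payload):
    """Hypothetical helper: prepare the POST request that starts a run."""
    query = urlencode({"token": token})
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request(
        "USERNAME~google-news-scraper",  # placeholder actor ID from the curl example
        "YOUR_TOKEN",                    # placeholder API token
        {
            "searchQueries": ["AI technology", "climate change"],
            "maxResults": 50,
            "extractFullText": True,
        },
    )
    print(request.get_method(), request.full_url)
    # Uncomment to start a run (requires a real actor ID and token):
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Separating request construction from sending makes the payload easy to inspect or log before any run is billed.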
Schedule
Set up scheduled runs via Apify Console for continuous news monitoring and brand tracking.
MCP Integration
This actor is optimized for AI agents via the Apify MCP server. Semantic field names and rich dataset schemas make it easy for LLMs to understand and use the output.