📰 Extract Google News Articles — AI & RAG Ready

Pricing

from $7.00 / 1,000 results

Extract Google News articles by keyword, topic, or URL with full-text extraction for AI/RAG pipelines. Get headlines, sources, snippets, images, authors, and clean article text in structured JSON. Export scraped data, run the scraper via API, or integrate with other tools.

Developer

Muhammad Afzal

Maintained by Community

📰 Google News Scraper — AI & RAG Ready

Extract Google News articles by keyword, topic, or URL with full-text extraction optimized for AI and RAG pipelines. Get headlines, sources, snippets, images, authors, and clean article text in structured JSON. Perfect for sentiment analysis, brand monitoring, competitive intelligence, and media tracking.

Features

  • Multi-input mode — Search by keyword, browse by topic, or provide direct Google News URLs
  • Full-text extraction — Visit article pages and extract clean text without ads, navigation, or boilerplate
  • AI/RAG-ready output — Semantic field names, structured metadata, word count, and reading time estimates
  • 25+ countries & 17 languages — Localized results for global news coverage
  • Time filtering — Filter by last hour, day, week, month, or year
  • URL decoding — Automatically resolves Google News redirect URLs to original article links
  • RSS-first approach — Uses Google News RSS feeds for reliable, fast data extraction
  • Article metadata — Extracts Open Graph tags, author, keywords, and description from publisher pages

Use Cases

  • RAG pipelines & LLM training — Feed clean article text into vector databases and language models
  • Brand monitoring & PR tracking — Track mentions across thousands of news sources in real time
  • Sentiment analysis — Collect structured news data for NLP and sentiment models
  • Competitive intelligence — Monitor competitors, market trends, and industry developments
  • Media aggregation — Build custom news feeds combining topics, keywords, and regions
  • Content curation — Discover trending stories and curate content for newsletters and blogs

Input

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQueries | Array of strings | ["technology"] | Keyword searches. Supports Google operators (OR, -, "exact", site:). |
| topics | Array of strings | [] | News topics: WORLD, NATION, BUSINESS, TECHNOLOGY, ENTERTAINMENT, SPORTS, SCIENCE, HEALTH |
| startUrls | Array of URLs | [] | Direct Google News URLs to scrape |
| maxResults | Integer | 50 | Maximum articles per search query or topic (1–5000) |
| country | String | US | Country code for localized results (25+ countries) |
| language | String | en | Language code (17 languages supported) |
| timePeriod | String | 1d | Time filter: 1h, 1d, 7d, 30d, 1y, all |
| extractFullText | Boolean | false | Extract full article text from publisher pages (slower, essential for AI/RAG) |
| decodeUrls | Boolean | true | Decode Google redirect URLs to original article links |
| proxyConfiguration | Object | Apify Proxy | Proxy settings for scraping |
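
As a rough illustration of how these parameters fit together, the sketch below assembles an input object and checks a few of the documented constraints before a run is started. The helper name `build_input` is hypothetical and not part of the actor; the allowed values are copied from the parameter table. `startUrls` and `proxyConfiguration` are left out because their exact object shapes are not described here.

```python
import json

# Allowed values copied from the parameter table above.
VALID_TOPICS = {
    "WORLD", "NATION", "BUSINESS", "TECHNOLOGY",
    "ENTERTAINMENT", "SPORTS", "SCIENCE", "HEALTH",
}
VALID_TIME_PERIODS = {"1h", "1d", "7d", "30d", "1y", "all"}


def build_input(search_queries=None, topics=None, max_results=50,
                country="US", language="en", time_period="1d",
                extract_full_text=False, decode_urls=True):
    """Hypothetical helper: assemble and sanity-check an actor input object."""
    search_queries = list(search_queries or [])
    topics = [t.upper() for t in (topics or [])]

    unknown = sorted(set(topics) - VALID_TOPICS)
    if unknown:
        raise ValueError(f"unknown topics: {unknown}")
    if not 1 <= max_results <= 5000:
        raise ValueError("maxResults must be between 1 and 5000")
    if time_period not in VALID_TIME_PERIODS:
        raise ValueError(f"unsupported timePeriod: {time_period}")

    # Keys use the documented parameter names.
    return {
        "searchQueries": search_queries,
        "topics": topics,
        "maxResults": max_results,
        "country": country,
        "language": language,
        "timePeriod": time_period,
        "extractFullText": extract_full_text,
        "decodeUrls": decode_urls,
    }


# Example: two keyword searches using Google operators, last 7 days.
example = build_input(
    ['"electric vehicles" site:reuters.com', "solar OR wind -nuclear"],
    time_period="7d",
)
print(json.dumps(example, indent=2))
```

Validating locally is only a convenience: it catches typos in topic names or time periods before a paid run starts.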

Output

Each article contains:

| Field | Type | Description |
| --- | --- | --- |
| title | String | Article headline |
| url | String or null | Decoded original article URL (when decodeUrls enabled) |
| source | String or null | Publisher name (e.g., "BBC News", "Reuters") |
| publisher_url | String or null | Publisher website URL (e.g., "https://www.reuters.com") |
| author | String or null | Article author (when extractFullText enabled) |
| published_at | String or null | Publication timestamp (ISO 8601) |
| snippet | String or null | Article summary from Google News |
| image_url | String or null | Article thumbnail image URL |
| category | String or null | News category (e.g., "TECHNOLOGY", "BUSINESS") |
| topic | String or null | Google News topic when browsing by topic |
| search_query | String or null | The search query that produced this result |
| full_text | String or null | Full article text (when extractFullText enabled) |
| word_count | Integer or null | Word count of extracted text |
| estimated_reading_time_min | Integer or null | Estimated reading time in minutes |
| article_metadata | Object or null | Open Graph and meta tags from the article page |
| scraped_at | String | ISO 8601 extraction timestamp |
| source_url | String | Google News feed URL that produced this result |
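
For RAG use, each record usually needs to be split into chunks that carry their source metadata. The following is a minimal sketch under stated assumptions: the function name `to_chunks` and the fixed word-count chunking are illustrative choices, not something the actor provides, and the record is assumed to be a plain dict with the fields listed above. Because most fields can be null, the code falls back to `snippet` when `full_text` is missing.

```python
def to_chunks(article, max_words=200):
    """Hypothetical helper: split one article record into word-based chunks.

    Falls back to the snippet when full_text is null, which is the case
    when extractFullText was disabled for the run.
    """
    text = article.get("full_text") or article.get("snippet") or ""
    words = text.split()

    chunks = []
    for start in range(0, len(words), max_words):
        chunks.append({
            "text": " ".join(words[start:start + max_words]),
            # Metadata keys reuse the documented output field names.
            "metadata": {
                "title": article.get("title"),
                "url": article.get("url"),
                "source": article.get("source"),
                "published_at": article.get("published_at"),
                "chunk_index": start // max_words,
            },
        })
    return chunks


# Example record with made-up values, shaped like the output described above.
sample = {
    "title": "Example headline",
    "url": "https://example.com/story",
    "source": "Example News",
    "published_at": "2024-01-01T00:00:00Z",
    "snippet": "Short summary of the story.",
    "full_text": None,
}
for chunk in to_chunks(sample):
    print(chunk["metadata"]["chunk_index"], chunk["text"])
```

Word-based splitting is the simplest option; a tokenizer-aware or sentence-aware splitter may suit a particular embedding model better.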

Pricing

This actor uses Pay-Per-Event pricing:

| Event | Price | Description |
| --- | --- | --- |
| result | $0.003 | Per article with headline, source, snippet, and metadata |
| full-text-result | $0.010 | Per article with full text, author, word count, and reading time |

Examples:

  • 100 articles (headlines only) = $0.30
  • 100 articles (with full text) = $1.00
  • 1,000 articles (headlines only) = $3.00
  • 1,000 articles (with full text) = $10.00
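
The arithmetic behind these examples can be reproduced with a small estimator. This is a sketch only: `estimate_cost` is a hypothetical name, and it assumes, as the examples above imply, that an article is billed as either a `result` or a `full-text-result` event, not both.

```python
from decimal import Decimal

# Per-event prices from the pricing table above (USD).
RESULT_PRICE = Decimal("0.003")
FULL_TEXT_PRICE = Decimal("0.010")


def estimate_cost(articles, full_text=False):
    """Hypothetical helper: estimated cost in USD for a number of articles."""
    if articles < 0:
        raise ValueError("articles must be non-negative")
    unit_price = FULL_TEXT_PRICE if full_text else RESULT_PRICE
    return unit_price * articles


for count in (100, 1000):
    print(f"{count} headlines only: ${estimate_cost(count):.2f}")
    print(f"{count} with full text: ${estimate_cost(count, full_text=True):.2f}")
```

`Decimal` is used instead of floats so that small per-event prices add up without rounding drift.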

Tips & Limitations

  • Start without extractFullText for fast headline-only results, then enable it when you need article body text
  • Google News RSS feeds return at most about 100 articles per search query; use multiple queries or topics for broader coverage
  • Time period filtering only applies to keyword searches, not topic pages
  • For best results with AI/RAG pipelines, enable extractFullText to get clean article text
  • The decodeUrls option resolves Google News redirect URLs — keep it enabled for original article links
  • Use Apify Proxy with residential rotation if you encounter rate limiting
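
When several overlapping queries are used to get past the per-query limit, the same story can appear more than once in the dataset. A minimal sketch of merging such results, assuming records are dicts with the documented `url`, `title`, and `source` fields (the helper name `dedupe_articles` is hypothetical):

```python
def dedupe_articles(articles):
    """Hypothetical helper: keep the first record seen for each article.

    Records are keyed by url; when url is null, the (title, source) pair
    is used instead.
    """
    seen = set()
    unique = []
    for article in articles:
        key = article.get("url") or (article.get("title"), article.get("source"))
        if key in seen:
            continue
        seen.add(key)
        unique.append(article)
    return unique


# Made-up records: the same story returned by two different search queries.
merged = dedupe_articles([
    {"url": "https://example.com/a", "title": "A", "search_query": "solar power"},
    {"url": "https://example.com/a", "title": "A", "search_query": "renewable energy"},
    {"url": "https://example.com/b", "title": "B", "search_query": "renewable energy"},
])
print(len(merged))
```

Keying on `url` works best with decodeUrls enabled, since the decoded publisher link is more likely to be identical across queries than a Google redirect link.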

Integration Examples

API

curl -X POST "https://api.apify.com/v2/acts/USERNAME~google-news-scraper/runs?token=YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{"searchQueries": ["AI technology", "climate change"], "maxResults": 50, "extractFullText": true}'
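
The same request can be made from Python with only the standard library. This is a sketch that mirrors the curl command above: the endpoint, the `USERNAME~google-news-scraper` placeholder, and `YOUR_TOKEN` are taken from that command, while the function names `build_run_request` and `start_run` are hypothetical. The shape of the response is not described here, so the code returns the parsed JSON without interpreting it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build the POST request that starts an actor run (nothing is sent yet)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{actor_id}/runs?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(actor_id, token, run_input, timeout=30):
    """Send the request and return the parsed JSON response body."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (placeholders must be replaced with a real actor ID and API token):
# start_run(
#     "USERNAME~google-news-scraper",
#     "YOUR_TOKEN",
#     {"searchQueries": ["AI technology", "climate change"],
#      "maxResults": 50, "extractFullText": True},
# )
```

Separating request construction from sending makes it possible to inspect the URL and payload before any run is billed.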

Schedule

Set up scheduled runs via Apify Console for continuous news monitoring and brand tracking.

MCP Integration

This actor is optimized for AI agents via the Apify MCP server. Semantic field names and rich dataset schemas make it easy for LLMs to understand and use the output.