Google News Scraper & RSS URL Extractor

Google News scraper that queries the public RSS feed for fresh news headlines and article URLs. Localized scoping, dedupe across queries, and deterministic article URL extraction for downstream NLP.

Pricing

Pay per event

Developer

太郎 山田

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 5
  • Monthly active users: 3
  • Last modified: 9 days ago

📰 Google News Scraper

Build robust content discovery pipelines by extracting fresh article URLs and metadata straight from Google News. This actor is a lightweight discovery surface: it queries the Google News RSS feed for the latest articles matching your target keywords. It is designed for data engineers and developers who need to feed news links into downstream processing tools, such as an Article Content Extractor or LLM ingestion scripts.

Instead of scraping entire news sites blindly, use this tool to discover highly relevant, localized content. You can configure the scraper to pull results for specific regions and languages, making it ideal for global topic monitoring or localized sentiment analysis. The built-in deduplication engine ensures that even if you run dozens of overlapping queries, your final dataset contains only unique article URLs.

Every run delivers structured records containing the target article URL, headline, publisher name, short description, and publication timestamp (both the original RSS date and an ISO version). This makes it straightforward to schedule weekly topic digests, track industry trends over time, or gather a continuous stream of training data for AI models. Skip the complex news APIs and use this fast, query-based scraper to fuel your data workflows.

Store Quickstart

  • Start with Quickstart (company news) for a reliable first run.
  • Use Brand Monitoring to track multiple companies or themes.
  • Use Google News → Article Cleanup when your next step is article extraction.

Where this actor fits

| Surface | Best for |
| --- | --- |
| Google News Scraper | Discover current article URLs by query |
| Article Content Extractor | Clean the discovered article/news/blog pages |
| Website Content Extractor | Clean discovered non-article pages |
| RSS Feed Aggregator | Discover fresh URLs from known publishers and blogs |

Key Features

  • 🔎 Query-based discovery — Pull article URLs from Google News RSS without a paid API
  • 🌍 Localized results — Tune by language and country
  • 🔄 Deduplication — Remove duplicate URLs across multiple queries
  • 📰 Publisher context — Keep headline, source, description, and publish date
  • Fast feeder step — Lightweight discovery before deeper extraction

Use Cases

| Who | Why |
| --- | --- |
| PR teams | Find the latest media mentions to hand off for cleanup |
| Competitive intelligence | Build newsroom watchlists from search queries |
| Content ops | Discover trending stories before enrichment |
| AI / RAG teams | Create a steady article URL feed for downstream extraction |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| queries | string[] | required | Search queries (max 50) |
| language | string | en | Google News language code |
| country | string | US | Google News country code |
| maxItems | integer | 25 | Max articles per query |
| deduplicate | boolean | true | Remove duplicate links across queries |
| timeoutMs | integer | 15000 | Request timeout in milliseconds |
| delivery | string | dataset | dataset or webhook |
| webhookUrl | string | (none) | Webhook target when delivery=webhook |
| dryRun | boolean | false | Run without saving |
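
The field rules above can be checked before a run is started. The following is a minimal sketch, not the actor's own validation: `normalize_input` is a hypothetical helper, the defaults are copied from the table, and the exact error conditions (for example, requiring `webhookUrl` when `delivery` is `webhook`) are assumptions inferred from the descriptions.

```python
from typing import Any

# Defaults copied from the input table; the validation rules are assumptions.
DEFAULTS: dict[str, Any] = {
    "language": "en",
    "country": "US",
    "maxItems": 25,
    "deduplicate": True,
    "timeoutMs": 15000,
    "delivery": "dataset",
    "dryRun": False,
}
MAX_QUERIES = 50  # "max 50" from the queries row


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of raw with table defaults applied, or raise ValueError."""
    queries = raw.get("queries")
    if not isinstance(queries, list) or not queries:
        raise ValueError("queries is required and must be a non-empty list")
    if len(queries) > MAX_QUERIES:
        raise ValueError(f"at most {MAX_QUERIES} queries are allowed")

    merged = {**DEFAULTS, **raw}  # caller-supplied values win over defaults
    if merged["delivery"] not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if merged["delivery"] == "webhook" and not merged.get("webhookUrl"):
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    return merged


config = normalize_input({"queries": ["OpenAI", "Google AI"], "maxItems": 20})
```

Here `config` keeps the caller's `maxItems` of 20 and picks up the remaining defaults, such as `language` "en" and `delivery` "dataset".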

Input Example

{
  "queries": ["OpenAI", "Google AI"],
  "language": "en",
  "country": "US",
  "maxItems": 20,
  "deduplicate": true
}

Input Examples

Example: Daily tech news in English

{
  "queries": [
    "AI safety",
    "open source"
  ],
  "language": "en",
  "country": "US",
  "maxItems": 30
}

Example: Localized news (JP)

{
  "queries": [
    "人工知能"
  ],
  "language": "ja",
  "country": "JP",
  "maxItems": 50
}
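
To illustrate how `language` and `country` might scope a query, here is a small sketch that builds a Google News RSS search URL. It is an assumption, not the actor's confirmed implementation: the endpoint and the `hl`, `gl`, and `ceid` parameters follow the commonly used public URL shape, and `build_rss_url` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint: the commonly used public Google News RSS search URL.
RSS_SEARCH_BASE = "https://news.google.com/rss/search"


def build_rss_url(query: str, language: str = "en", country: str = "US") -> str:
    """Build a localized RSS search URL (parameter mapping is an assumption)."""
    params = {
        "q": query,                        # search keywords, percent-encoded below
        "hl": language,                    # interface language
        "gl": country,                     # geographic edition
        "ceid": f"{country}:{language}",   # edition identifier, e.g. JP:ja
    }
    return f"{RSS_SEARCH_BASE}?{urlencode(params)}"


url = build_rss_url("人工知能", language="ja", country="JP")
decoded = parse_qs(urlsplit(url).query)  # round-trip to inspect the parameters
```

`urlencode` percent-encodes the non-ASCII query as UTF-8, so the Japanese keyword survives the round trip through `parse_qs` unchanged.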

Example: Multi-keyword dedupe run

{
  "queries": [
    "climate",
    "renewable energy"
  ],
  "language": "en",
  "country": "US",
  "maxItems": 40,
  "deduplicate": true
}
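
Deduplication across overlapping queries can be pictured as a first-seen-wins filter on the article URL. The sketch below is one plausible reading, assuming exact string matching on `link`; the actor may normalize URLs differently, and `dedupe_by_link` is a hypothetical name.

```python
from typing import Any


def dedupe_by_link(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep the first row for each link, preserving the original order."""
    seen: set[str] = set()
    unique: list[dict[str, Any]] = []
    for row in rows:
        link = row["link"]
        if link in seen:
            continue  # a previous query already surfaced this article
        seen.add(link)
        unique.append(row)
    return unique


# Illustrative rows only; the URLs are placeholders, not real results.
rows = [
    {"link": "https://example.com/a", "query": "climate"},
    {"link": "https://example.com/b", "query": "climate"},
    {"link": "https://example.com/a", "query": "renewable energy"},
]
unique_rows = dedupe_by_link(rows)
```

With this approach the surviving row keeps the `query` value of whichever query found the article first, so query order affects attribution but not the set of URLs.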

Output

| Field | Type | Description |
| --- | --- | --- |
| title | string | Article headline |
| link | string | Direct article URL for downstream cleanup |
| source | string | Publisher name |
| pubDate | string | Original RSS publish date |
| pubDateISO | string | ISO timestamp version of pubDate |
| description | string | Short Google News snippet |
| query | string | Search query that surfaced the row |

Output Example

{
  "title": "Codex for (almost) everything",
  "link": "https://openai.com/index/codex-for-almost-everything",
  "source": "OpenAI",
  "pubDate": "Thu, 16 Apr 2026 10:00:00 GMT",
  "pubDateISO": "2026-04-16T10:00:00.000Z",
  "description": "The updated Codex app for macOS and Windows adds computer use...",
  "query": "OpenAI"
}
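
The relationship between `pubDate` and `pubDateISO` in the example above can be reproduced with the standard library. This is a sketch of one way to do the conversion, not necessarily how the actor does it; `pub_date_to_iso` is a hypothetical helper, and treating zone-less dates as UTC is an assumption.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pub_date_to_iso(pub_date: str) -> str:
    """Convert an RFC 2822 style RSS date to a UTC ISO 8601 string."""
    dt = parsedate_to_datetime(pub_date)
    if dt.tzinfo is None:
        # Assumption: a date without a usable zone is treated as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    utc = dt.astimezone(timezone.utc)
    # Millisecond precision with a trailing "Z", matching the output example.
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


iso = pub_date_to_iso("Thu, 16 Apr 2026 10:00:00 GMT")
# iso == "2026-04-16T10:00:00.000Z"
```

Sorting or filtering on `pubDateISO` is usually simpler than on `pubDate`, because ISO timestamps in a single zone compare correctly as plain strings.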

First-run buyer experience

  1. Run Quickstart (company news).
  2. Confirm the dataset shows real article URLs, not generic homepages.
  3. Pick the top URLs and send them to Article Content Extractor.
  4. If a discovered URL is actually a docs/product/policy page, clean it with Website Content Extractor instead.
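
Step 2 above can be partly automated with a rough check that flags URLs pointing at a site root. This is only a heuristic sketch under the assumption that a homepage has an empty path and no query string; `looks_like_homepage` is a hypothetical helper, and it cannot tell article pages from docs, product, or policy pages.

```python
from urllib.parse import urlsplit


def looks_like_homepage(url: str) -> bool:
    """Return True when the URL has no path beyond "/" and no query string."""
    parts = urlsplit(url)
    return parts.path in ("", "/") and not parts.query


candidates = [
    "https://openai.com/index/codex-for-almost-everything",
    "https://openai.com/",
]
# Keep only URLs that look like specific pages rather than site roots.
article_like = [u for u in candidates if not looks_like_homepage(u)]
```

Rows that pass this filter still need a manual or downstream check before choosing between Article Content Extractor and Website Content Extractor.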

Tips & Limitations

  • Use broad queries for the first run; refine later.
  • RSS is a discovery layer only — it does not return full article bodies.
  • Combine multiple narrower queries instead of one overloaded boolean query when relevance matters.

FAQ

Can I get full article text here?

No. This actor discovers URLs and returns metadata only. Use Article Content Extractor for full content.

Why use this instead of scraping the Google News UI?

The RSS surface is lighter, more stable, and better suited for recurring discovery runs.

Can I schedule recurring news monitoring?

Yes — run it on a schedule, then pass the discovered URLs into an article-cleanup step.

Cost

Pay Per Event:

  • actor-start: $0.01
  • dataset-item: $0.003 per output item
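
As a rough budgeting aid, the two event prices above can be combined into a per-run estimate. The sketch assumes one `actor-start` event per run and one `dataset-item` event per saved output item; `estimate_run_cost` is a hypothetical helper, and actual billing may differ.

```python
from decimal import Decimal

# Event prices copied from the Cost section.
ACTOR_START_USD = Decimal("0.01")
DATASET_ITEM_USD = Decimal("0.003")


def estimate_run_cost(item_count: int) -> Decimal:
    """Estimate the USD cost of one run that saves item_count items."""
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    return ACTOR_START_USD + DATASET_ITEM_USD * item_count


# Upper bound for 2 queries at maxItems=20, before deduplication removes any rows.
cost = estimate_run_cost(2 * 20)
print(cost)  # 0.130
```

`Decimal` is used instead of floats so that small per-item prices add up without rounding noise.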

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store.