# Google News Scraper — Headlines, Sources, URLs (`scrapeify/google-news-scraper`) Actor

Turn any Google News query into a deduplicated dataset of up to 2,000 articles: titles, sources, dates, RSS links, resolved publisher URLs, clean snippets. Multiple RSS time-window passes for depth beyond single-feed limits. Excel-ready CSV. No API key. Not affiliated with Google.

- **URL**: https://apify.com/scrapeify/google-news-scraper.md
- **Developed by:** [Scrapeify](https://apify.com/scrapeify) (community)
- **Categories:** News, SEO tools, Jobs
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, 1 bookmarks
- **User rating**: No ratings yet

## Pricing

from $20.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in the key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
"Actor" is written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Google News RSS Scraper — Structured Headlines, Sources & Article URLs (Up to 2,000)

Turn any **Google News** search query into a **deduplicated, structured dataset** of headlines, publisher names, publication timestamps, RSS links, and resolved article URLs — without a Google API key or a headless browser. The Scrapeify Google News Scraper issues multiple RSS passes across time-window phases to overcome single-feed size limits, merges and deduplicates results across passes, and exports to a Dataset, `RESULTS_CSV` (Excel-friendly UTF-8 BOM), `RESULTS_JSON`, and a run `OUTPUT` summary.

Built for media monitoring teams, competitive intelligence analysts, AI content pipelines, and search visibility researchers who need repeatable, structured coverage of any news topic at scale.

---

### Features

| Capability | Detail |
|---|---|
| **RSS-first architecture** | HTTP fetches to `news.google.com/rss/search` — lightweight, no browser required |
| **Multi-phase coverage** | Multiple `when` passes (1h, 7d, 30d, 1y) to approximate depth beyond single-feed limits |
| **Deduplication** | Merges results across phases using stable RSS identifiers and normalized URLs |
| **Clean text fields** | HTML stripped from descriptions for downstream NLP and embedding workflows |
| **Canonical URL resolution** | Parses Google redirect parameters to surface publisher `articleUrl` where available |
| **429 / 5xx retry logic** | Bounded retry attempts with backoff for transient Google RSS errors |
| **Up to 2,000 articles** | Per-run cap with input validation; dedup stats in `OUTPUT` |
| **Structured columns** | `position`, `keyword`, `title`, `link`, `articleUrl`, `pubDate`, `sourceName`, `description` |
| **Excel-ready CSV** | `RESULTS_CSV` with UTF-8 BOM and quoted fields for Windows compatibility |
| **Input flexibility** | Aliases: `query`, `searchQuery`, `q` for keyword; `maxResults` for `numberOfResults` |

---

### Use Cases

#### Media Monitoring & Press Tracking
Track news coverage for brand names, executives, products, or regulatory topics. Schedule hourly or daily runs and diff new `link` values since the previous run to surface breaking coverage before competitors do.

#### Competitive Intelligence
Monitor rival company and product news. Identify PR campaigns, product launches, partnership announcements, and negative press. Build a structured archive of competitor mentions for strategic planning.

#### SEO & Search Visibility Research
Map which publishers and articles rank in Google News for your target keywords. Identify content gaps, measure your brand's News presence, and track competitors' earned media performance over time.

#### AI Content Pipeline (Stage 1 Retrieval)
Use as Stage 1 of a retrieval stack: headlines + snippets cheaply triage topic relevance → LLMs decide which URLs warrant full article fetching and chunking → agents post summaries to ticketing or Slack.

#### RAG Knowledge Base Construction
Feed `title` + `description` + `articleUrl` into embedding pipelines. Store with `keyword` and `sourceName` metadata for semantic retrieval. Enable AI-generated answers with cited, timestamped news sources.

#### Industry Trend Analysis
Aggregate `sourceName` distributions and publication cadence for any keyword over time. Identify which outlets cover a topic most frequently, which publishers are emerging voices, and how news volume correlates with market events.

#### E-Commerce & Brand Intelligence
Track product recalls, supply chain disruptions, competitor product launches, and category news that affects purchasing decisions. Combine with Amazon Scraper data for comprehensive market intelligence.

#### Automation & Alert Pipelines
Trigger Apify runs on a cron schedule. Diff against previous dataset by `link` or `articleUrl`. Push new articles to Slack, email, or a ticketing system automatically.
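
The diff step above can be sketched in a few lines. This is an illustrative pattern, not part of the Actor itself; `previous_links` stands in for whatever set of `link` values you persisted from the last run:

```python
def net_new_articles(articles, previous_links):
    """Return only articles whose Google News `link` was not seen in the previous run."""
    return [a for a in articles if a["link"] not in previous_links]

# Example: two articles from the current run, one already seen last time
current = [
    {"link": "https://news.google.com/rss/articles/AAA", "title": "Old story"},
    {"link": "https://news.google.com/rss/articles/BBB", "title": "New story"},
]
seen = {"https://news.google.com/rss/articles/AAA"}
fresh = net_new_articles(current, seen)
print([a["title"] for a in fresh])  # → ['New story']
```

After pushing `fresh` to Slack or email, add the new `link` values to your persisted set so the next run stays idempotent.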

#### Data Aggregation & Multi-Source Research
Combine Google News results with Google Maps, Amazon, and Meta Ad Library actor outputs for comprehensive multi-source dossiers on brands, markets, or topics.

#### Academic & Policy Research
Track news coverage of policy topics, scientific developments, or public health issues at scale. Export to CSV for corpus analysis, NLP research, or data journalism workflows.

---

### Why Choose This Actor

- **Lightweight and cost-efficient** — HTTP-only; no browser fleet; suitable for high-frequency scheduling
- **Deduplication built in** — fewer duplicate rows than naive single-RSS pulls
- **Production outputs** — Dataset + CSV + JSON keys fit ETL, BI, and client-reporting workflows
- **Cloud-native** — Apify standard Dataset and Key-value store semantics with scheduling and webhooks
- **Automation-ready** — identical input contract across Console, REST API, and SDK clients

---

### Quick Start

1. Open the Scrapeify **Google News Scraper** on Apify Console.
2. Enter a **`keyword`** (e.g. `renewable energy policy`) and set **`numberOfResults`** (e.g. `500`).
3. Click **Start** and wait for completion (typically seconds to low minutes).
4. Export the **Dataset** as JSON or CSV, or download **RESULTS_CSV** from Storage → Key-value store.

> **Tip:** Start with `numberOfResults: 50` to validate keyword coverage before scaling to the 2,000-article limit.

---

### Input Schema

```json
{
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500
}
```

| Field | Type | Required | Description |
|---|---|---|---|
| `keyword` | string | Yes | News search phrase. Aliases: `query`, `searchQuery`, `q`. Supports operators (quotes, site:, etc.) |
| `numberOfResults` | integer | Yes | Unique articles to collect (1–2,000). Alias: `maxResults` |
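
The alias handling and range validation in the table can be pictured as a simple normalization step. This is a sketch of the documented behavior, not the Actor's actual code:

```python
def normalize_input(raw: dict) -> dict:
    """Resolve documented input aliases to canonical field names and clamp the count."""
    keyword = raw.get("keyword") or raw.get("query") or raw.get("searchQuery") or raw.get("q")
    n = raw.get("numberOfResults") or raw.get("maxResults")
    if not keyword or n is None:
        raise ValueError("keyword (or alias) and numberOfResults (or maxResults) are required")
    # Clamp to the documented 1–2,000 range
    return {"keyword": keyword, "numberOfResults": max(1, min(int(n), 2000))}

print(normalize_input({"q": "climate policy", "maxResults": 5000}))
# → {'keyword': 'climate policy', 'numberOfResults': 2000}
```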

***

### Output Schema

#### Dataset Row (one row per article)

```json
{
  "position": 1,
  "keyword": "semiconductor supply chain",
  "title": "Fab expansion slows as equipment backlog extends into 2027",
  "link": "https://news.google.com/rss/articles/CBMiXGh0dHBzOi8vd3d3LmV4YW1wbGUuY29tL3RlY2gvZmFiLWRlbGF5cw...",
  "articleUrl": "https://www.example.com/tech/fab-delays",
  "pubDate": "Wed, 07 May 2026 08:15:00 GMT",
  "sourceName": "TechCrunch",
  "description": "Equipment vendors report extended lead times for EUV modules as chipmakers compete for capacity at advanced nodes."
}
```

| Field | Type | Description |
|---|---|---|
| `position` | integer | Deduped result position (1-based) |
| `keyword` | string | Input keyword echoed on every row for joins and audits |
| `title` | string | Article headline |
| `link` | string | Google News RSS link (use as stable identifier) |
| `articleUrl` | string | Resolved publisher URL when available; `null` if redirect omitted |
| `pubDate` | string | Publication date in RSS format |
| `sourceName` | string | Publisher name |
| `description` | string | Article snippet with HTML stripped |

> **Note:** `articleUrl` resolves the Google redirect to the original publisher URL when redirect parameters are present. Use `link` as the stable dedup key; `articleUrl` as the citation URL for downstream crawling.

#### Run Summary (`OUTPUT` key in default KV store)

```json
{
  "ok": true,
  "keyword": "semiconductor supply chain",
  "numberOfResults": 500,
  "returnedCount": 487,
  "meta": {
    "stoppedReason": "target_reached",
    "passesCompleted": 4,
    "totalFetched": 512,
    "uniqueAfterDedupe": 487
  },
  "scrapedAt": "2026-05-07T04:00:00.000Z",
  "download": {
    "dataset": "Export as CSV/JSON from Dataset tab",
    "keyValueStore": "RESULTS_CSV = Excel-friendly CSV (UTF-8 BOM, quoted fields)"
  },
  "csv": null,
  "note": "CSV too large to embed inline; use RESULTS_CSV key."
}
```

| Field | Type | Description |
|---|---|---|
| `ok` | boolean | `true` if articles were returned; `false` on error or empty |
| `returnedCount` | integer | Unique articles after deduplication |
| `meta.stoppedReason` | string | `target_reached`, `exhausted`, or error descriptor |
| `meta.passesCompleted` | integer | Number of RSS phase passes completed |
| `meta.uniqueAfterDedupe` | integer | Articles remaining after cross-phase dedup |
| `csv` | string/null | Embedded CSV string when small enough; else `null` |

**Additional KV keys:** `RESULTS_CSV` (full CSV, UTF-8 BOM), `RESULTS_JSON` (full JSON array).

***

### API Examples

#### cURL

```bash
curl "https://api.apify.com/v2/acts/scrapeify~google-news-scraper/runs?token=$APIFY_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "keyword": "climate policy",
    "numberOfResults": 250
  }'
```

#### Python

```python
import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

run = client.actor("scrapeify/google-news-scraper").call(
    run_input={"keyword": "climate policy", "numberOfResults": 250}
)

for article in client.dataset(run["defaultDatasetId"]).iterate_items():
    url = article.get("articleUrl") or article["link"]
    print(article["title"], article["sourceName"], url)
```

#### JavaScript / Node.js

```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

const run = await client.actor("scrapeify/google-news-scraper").call({
  keyword: "climate policy",
  numberOfResults: 250,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Collected ${items.length} unique articles from ${new Set(items.map(a => a.sourceName)).size} publishers`);
```

***

### Integration Examples

#### ChatGPT / Custom GPT Actions

Register the Apify run endpoint as a Custom GPT action. Return `title`, `sourceName`, `pubDate`, and `articleUrl` as a JSON array. The model can summarize recent coverage, identify trends, or answer questions grounded in actual news articles.

#### Claude Tool Use

```python
import os
from apify_client import ApifyClient
from langchain.tools import tool

# Assumes APIFY_TOKEN is set in the environment
client = ApifyClient(os.environ["APIFY_TOKEN"])

@tool
def get_recent_news(keyword: str, n: int = 100) -> list:
    """Fetch recent Google News articles for a keyword. Returns structured article data."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

Pass the structured list to Claude for summarization, entity extraction, or sentiment analysis with `articleUrl` citations.

#### Gemini

Fetch 500+ article headlines and snippets → pass to Gemini's long-context window → generate a comprehensive topic briefing with source attribution and emerging narrative threads.

#### LangChain

```python
import os
from apify_client import ApifyClient
from langchain.tools import tool

# Assumes APIFY_TOKEN is set in the environment
client = ApifyClient(os.environ["APIFY_TOKEN"])

@tool
def fetch_news_corpus(keyword: str, n: int) -> list:
    """Search Google News and return article data for RAG ingestion."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    return client.dataset(run["defaultDatasetId"]).list_items().items

# Use as a retriever tool in a ConversationalRetrievalChain
```

#### CrewAI

`NewsResearchAgent` fetches articles with this tool. `AnalysisAgent` identifies key themes and entities. `WritingAgent` drafts a briefing document with source citations and publication dates.

#### AutoGen

```python
# UserProxyAgent: "Summarize the last 100 news articles about EV battery technology"
# ResearchAgent: calls google_news_scraper tool → returns structured JSON
# SynthesisAgent: extracts key claims, publisher perspectives, and publication timeline
```

#### n8n / Make.com / Zapier

Cron trigger → Apify run → iterate Dataset items → filter for new `link` values since last run → push to Slack digest, Notion page, or HubSpot deal activity feed.

#### RAG Systems

```python
# 1. Fetch articles (using the get_recent_news tool defined above)
articles = get_recent_news("renewable energy", n=500)

# 2. Create documents for the vector store
from langchain.schema import Document
docs = [
    Document(
        page_content=f"{a['title']}. {a['description']}",
        metadata={"url": a.get("articleUrl") or a["link"],
                  "source": a["sourceName"],
                  "date": a["pubDate"]}
    )
    for a in articles
]

# 3. Embed and index (vectorstore is any initialized LangChain vector store)
vectorstore.add_documents(docs)
```

<details>
<summary><strong>Minimal LangChain news retriever pattern</strong></summary>

```python
from langchain.schema import Document

def build_news_retriever(keyword: str, n: int, vectorstore):
    """Fetch news for a keyword, index it, and return a retriever over the corpus."""
    run = client.actor("scrapeify/google-news-scraper").call(
        run_input={"keyword": keyword, "numberOfResults": n}
    )
    articles = client.dataset(run["defaultDatasetId"]).list_items().items
    docs = [Document(page_content=a["title"] + ". " + a["description"],
                     metadata={"url": a.get("articleUrl") or a["link"],
                               "source": a["sourceName"]})
            for a in articles]
    vectorstore.add_documents(docs)
    return vectorstore.as_retriever()
```

</details>

***

### Frequently Asked Questions

**1. Do I need a Google API key or Google Cloud account?**
No. The Actor fetches public RSS endpoints from `news.google.com` — no API credentials required.

**2. Why do I sometimes get fewer articles than requested?**
There may not be enough distinct articles across RSS phases for the keyword. Inspect `meta.uniqueAfterDedupe` and `meta.stoppedReason` in `OUTPUT`.

**3. When is `articleUrl` null?**
Some Google News RSS entries don't include redirect parameters that allow URL resolution. Fall back to `link` for stable identification.

**4. How does deduplication work across phases?**
The Actor tracks stable RSS identifiers and normalized URLs across all passes. Articles seen in multiple time-window phases are merged into a single row.
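
A simplified model of that cross-phase merge, keyed on the RSS `link` (the Actor also normalizes URLs, which this sketch omits):

```python
def merge_phases(phases):
    """Merge article lists from multiple time-window passes,
    keeping the first occurrence of each RSS `link`."""
    seen, merged = set(), []
    for phase in phases:
        for article in phase:
            if article["link"] not in seen:
                seen.add(article["link"])
                merged.append(article)
    return merged

# Example: the 1h and 7d passes overlap on article A
phase_1h = [{"link": "A", "title": "Breaking"}]
phase_7d = [{"link": "A", "title": "Breaking"}, {"link": "B", "title": "Analysis"}]
merged = merge_phases([phase_1h, phase_7d])
print(len(merged))  # → 2
```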

**5. Can I search by country or language?**
The current implementation uses default `hl` and `gl` parameters. Fork the Actor for specific `ceid` locale pairs (e.g. `ceid=GB%3Aen` for UK English).
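
If you fork the Actor, a locale-aware feed URL can be built like this. The `hl`/`gl`/`ceid` parameters follow Google News RSS conventions; treat the specific values as examples:

```python
from urllib.parse import urlencode

def news_rss_url(query: str, hl: str = "en-GB", gl: str = "GB", ceid: str = "GB:en") -> str:
    """Build a Google News RSS search URL for a given locale."""
    params = urlencode({"q": query, "hl": hl, "gl": gl, "ceid": ceid})
    return f"https://news.google.com/rss/search?{params}"

print(news_rss_url("brexit trade"))
# → https://news.google.com/rss/search?q=brexit+trade&hl=en-GB&gl=GB&ceid=GB%3Aen
```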

**6. Is full article text included?**
No — only RSS fields: title, snippet, source, date, and URL. Crawl `articleUrl` with a separate article fetcher to retrieve full text.

**7. How fast are runs typically?**
Seconds to low minutes depending on `numberOfResults` and Google RSS response times.

**8. How does the actor handle 429 rate limiting?**
Bounded retry attempts with backoff. Avoid launching excessive parallel runs from a single IP for the same keyword.

**9. Does `RESULTS_CSV` open correctly in Excel?**
Yes — `RESULTS_CSV` uses UTF-8 BOM encoding and quoted fields for Windows Excel compatibility.

**10. Can I schedule hourly monitoring runs?**
Yes — use Apify Schedules combined with webhooks to your notification stack.

**11. Are publication dates reliable?**
`pubDate` reflects what the RSS feed reports. Some publishers use the crawl date rather than original publication date.

**12. Can I combine results with other Scrapeify actors?**
Yes — join Google News results with Maps, Amazon, or Ad Library actor outputs in your data warehouse by keyword or entity.

**13. What input aliases are supported?**
`query`, `searchQuery`, `q` for the keyword; `maxResults` for `numberOfResults`.

**14. What causes an empty dataset with error rows?**
Check `message` in pushed error items and `OUTPUT.ok` for details. Common causes: empty keyword, Google temporarily blocking the IP, or zero-result queries.

**15. Can I use this for real-time news alerts?**
Hourly runs are practical. For sub-minute latency, a dedicated news API is more appropriate.

**16. How do I ingest into a vector database?**
Use `title` + `description` as the text content. Store `articleUrl`, `sourceName`, `keyword`, and `pubDate` as metadata for filtering and citation.

**17. What is the difference between `link` and `articleUrl`?**
`link` is the Google News RSS URL — use as the stable dedup key. `articleUrl` is the resolved publisher URL — use as the citation link for downstream crawling and user-facing references.

**18. Can I track which publishers cover a topic most?**
Yes — aggregate `sourceName` values across Dataset rows. Sort by frequency to rank publishers by topic coverage volume.
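
For example, a frequency ranking over Dataset rows takes only a `Counter` (the sample rows below are illustrative):

```python
from collections import Counter

def top_publishers(articles, k=3):
    """Rank publishers by number of articles for the keyword."""
    return Counter(a["sourceName"] for a in articles).most_common(k)

rows = [{"sourceName": s} for s in ["Reuters", "BBC", "Reuters", "AP", "Reuters", "BBC"]]
print(top_publishers(rows))  # → [('Reuters', 3), ('BBC', 2), ('AP', 1)]
```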

**19. Does the actor support Google Alerts-style monitoring?**
This Actor provides structured rows for programmatic pipelines. For email digests, Google Alerts is a simpler option. For database-integrated monitoring and downstream automation, this Actor is the better choice.

**20. Is there an upper limit per keyword per run?**
Yes — 2,000 unique articles per run (input validation). For broader coverage, run multiple passes across overlapping time windows with different `when` parameters.

**21. How should I handle GDPR for article data?**
Headlines and snippets may mention individuals. Apply your organization's data retention and classification policies to stored news corpora.

**22. Can I retrieve articles from specific publishers?**
Add `site:publisher.com` to the keyword query to target a specific domain in Google News search.

**23. What is `meta.passesCompleted`?**
The number of RSS phase passes the Actor completed (e.g. 1h, 7d, 30d, 1y windows). More passes generally yield broader coverage.

**24. Does this include paywalled articles?**
Only metadata (title, snippet, source, URL) is collected from RSS — no paywall bypass. Full text requires a separate article fetcher.

**25. How do I build an idempotent monitoring pipeline?**
Key on `link` or normalized `articleUrl` before inserting into your database. Compare new `link` sets against the previous run to identify net-new coverage.

***

### Best Practices

- **Stagger schedules** — don't hammer RSS from many simultaneous tasks on one egress IP
- **Key on `link`** for idempotent pipelines before inserting into Postgres or vector stores
- **Rate-limit downstream crawling** — respect `robots.txt` and publisher terms when fetching full article text from `articleUrl`
- **Start small** — validate with `numberOfResults: 50` before scaling to 2,000
- **Monitor `returnedCount` trends** — alert on significant drops week-over-week for fixed keywords
- **Archive `RESULTS_JSON`** alongside `OUTPUT` for each scheduled run to enable historical diff analysis
- **Use `keyword` column for joins** — it's echoed on every row, making multi-keyword batch pipelines easy to merge

***

### Performance & Scalability

| Factor | Guidance |
|---|---|
| **Throughput** | HTTP-only; highly efficient for high-frequency scheduling |
| **Upper bound** | 2,000 deduplicated articles per run |
| **Run time** | Seconds to low minutes depending on RSS response latency and `numberOfResults` |
| **Horizontal scale** | Run parallel actors per keyword list — each is independent |
| **Storage** | Dataset is authoritative; `RESULTS_CSV` and `RESULTS_JSON` may be limited by KV size for large runs |

***

### AI & Automation Workflows

**3-stage retrieval pipeline:**

1. **Stage 1 (this actor):** headlines + snippets cheaply triage topic relevance
2. **Stage 2 (article fetcher):** crawl `articleUrl` for full text on relevant articles
3. **Stage 3 (LLM):** chunk, embed, and index full text; generate answers with `articleUrl` citations
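
Stage 1 triage can be as simple as term matching on `title` + `description` before paying for full fetches. This is a naive sketch; in practice you might score relevance with an LLM or embeddings instead:

```python
def triage(articles, must_contain):
    """Keep articles whose title or snippet mentions any of the given terms."""
    terms = [t.lower() for t in must_contain]

    def relevant(a):
        text = f"{a.get('title', '')} {a.get('description', '')}".lower()
        return any(t in text for t in terms)

    return [a for a in articles if relevant(a)]

sample = [
    {"title": "Solid-state batteries hit new density record", "description": ""},
    {"title": "Quarterly earnings roundup", "description": "Retail sector results"},
]
print(len(triage(sample, ["battery", "batteries"])))  # → 1
```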

**Competitive briefing automation:**
Schedule weekly Google News runs for competitor brand names → extract key themes from titles and snippets using an LLM → generate competitive intelligence brief → post to Confluence or Notion.

**Trend detection pipeline:**
Daily runs for industry keywords → aggregate `pubDate` distribution → detect volume spikes indicating major news events → alert stakeholders before the news cycle peaks.
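
Since `pubDate` is RFC 2822, the standard library can parse it directly; a minimal daily-volume histogram for spike detection might look like this (keep FAQ 11 in mind: some publishers report the crawl date rather than the original publication date):

```python
from collections import Counter
from email.utils import parsedate_to_datetime

def daily_volume(articles):
    """Count articles per UTC calendar day from RSS pubDate strings."""
    days = Counter()
    for a in articles:
        dt = parsedate_to_datetime(a["pubDate"])
        days[dt.date().isoformat()] += 1
    return dict(days)

rows = [{"pubDate": d} for d in [
    "Wed, 07 May 2026 08:15:00 GMT",
    "Wed, 07 May 2026 19:02:00 GMT",
    "Thu, 08 May 2026 06:40:00 GMT",
]]
print(daily_volume(rows))  # → {'2026-05-07': 2, '2026-05-08': 1}
```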

***

### Error Handling

| Scenario | Behavior |
|---|---|
| Missing or empty keyword | Error row in Dataset + `OUTPUT.ok: false` |
| Empty results | Completes with `returnedCount = 0`; `meta.stoppedReason = exhausted` |
| 429 rate limiting | Bounded retries with backoff; persistent failures surface in run logs |
| KV size limits | `csv` field in OUTPUT set to `null`; use `RESULTS_CSV` KV key or Dataset export |
| Transient HTTP errors | Retried per module constants; logged if persistent |

***

### Trust & Reliability

Scrapeify maintains this Actor for **repeatable news monitoring** with structured outputs, explicit dedup statistics, and clear storage keys — suitable for production automation when combined with appropriate compliance review and downstream content policies.

***

### Related Scrapeify Actors

Explore the full Scrapeify suite — chain these actors together for end-to-end automation pipelines:

| Actor | What it does |
|---|---|
| [Amazon Scraper](https://apify.com/scrapeify/amazon-scraper) | ASINs, prices, sponsored flags across 23 marketplaces |
| [Instagram Ad Library Scraper](https://apify.com/scrapeify/instagram-ad-library-scraper) | Instagram-only ads from Meta Ad Library |
| [Meta Ad Library Scraper](https://apify.com/scrapeify/meta-ad-library-scraper) | Facebook & Instagram ads with sort options |
| [WhatsApp Ad Scraper](https://apify.com/scrapeify/whatsapp-ad-scraper) | Click-to-WhatsApp ad creatives |
| [YouTube Video Downloader](https://apify.com/scrapeify/youtube-video-downloader) | Videos & audio to Apify Key-Value Store |
| [Meta Brand & Page ID Finder](https://apify.com/scrapeify/facebook-page-id-finder) | Resolve brand names to numeric Page IDs |
| [Google Maps Scraper](https://apify.com/scrapeify/google-maps-scraper) | Local business leads, reviews, emails, contacts |

***

*Google News is a trademark of Google LLC. This actor is not affiliated with or endorsed by Google.*

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "SoftwareApplication",
      "name": "Scrapeify Google News Scraper",
      "applicationCategory": "DeveloperApplication",
      "applicationSubCategory": "News & Media Monitoring API",
      "operatingSystem": "Cloud (Apify Platform)",
      "description": "Turn any Google News search query into a deduplicated dataset of up to 2,000 articles: headlines, publishers, publication dates, RSS links, resolved article URLs, and clean snippets. Multi-phase RSS fetching for depth beyond single-feed limits. No browser, no API key.",
      "url": "https://apify.com/scrapeify/google-news-scraper",
      "featureList": [
        "RSS-first lightweight HTTP architecture",
        "Multi-phase fetching for deeper coverage",
        "Automatic deduplication across passes",
        "Up to 2,000 unique articles per run",
        "Excel-ready CSV with UTF-8 BOM",
        "Canonical publisher URL resolution"
      ],
      "offers": {
        "@type": "Offer",
        "category": "SaaS"
      },
      "publisher": {
        "@type": "Organization",
        "name": "Scrapeify"
      }
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "Do I need a Google API key or Google Cloud account?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "No. The actor fetches public RSS endpoints from news.google.com — no API credentials required."
          }
        },
        {
          "@type": "Question",
          "name": "What is the maximum number of articles per run?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "2,000 unique deduplicated articles per run, enforced by input validation. Multiple RSS phase passes are used to reach higher counts."
          }
        },
        {
          "@type": "Question",
          "name": "When is articleUrl null in the output?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Some Google News RSS entries don't include redirect parameters that allow URL resolution. Fall back to the link field for stable identification."
          }
        },
        {
          "@type": "Question",
          "name": "Is full article text included?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "No — only RSS fields: title, snippet, source, date, and URL. Crawl articleUrl with a separate article fetcher to retrieve full text."
          }
        },
        {
          "@type": "Question",
          "name": "Can I schedule hourly news monitoring runs?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "Yes — use Apify Schedules combined with webhooks to your notification stack. Diff new link values between runs to detect breaking coverage."
          }
        },
        {
          "@type": "Question",
          "name": "How does deduplication work across RSS phases?",
          "acceptedAnswer": {
            "@type": "Answer",
            "text": "The actor tracks stable RSS identifiers and normalized URLs across all passes. Articles seen in multiple time-window phases are merged into a single row."
          }
        }
      ]
    }
  ]
}
</script>

# Actor input Schema

## `keyword` (type: `string`):

What to search on Google News (e.g. climate, Apple stock, election).

## `numberOfResults` (type: `integer`):

How many unique articles to collect (1–2000). Multiple RSS passes are used automatically when needed.

## Actor input object example

```json
{
  "keyword": "technology",
  "numberOfResults": 50
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapeify/google-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("scrapeify/google-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call scrapeify/google-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapeify/google-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google News Scraper — Headlines, Sources, URLs",
        "description": "Turn any Google News query into a deduplicated dataset of up to 2,000 articles: titles, sources, dates, RSS links, resolved publisher URLs, clean snippets. Multiple RSS time-window passes for depth beyond single-feed limits. Excel-ready CSV. No API key. Not affiliated with Google.",
        "version": "0.2",
        "x-build-id": "d8CgBZBD9eRKxAOKp"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapeify~google-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapeify-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapeify~google-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scrapeify-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapeify~google-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scrapeify-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "keyword",
                    "numberOfResults"
                ],
                "properties": {
                    "keyword": {
                        "title": "Keyword",
                        "type": "string",
                        "description": "What to search on Google News (e.g. climate, Apple stock, election).",
                        "default": "technology"
                    },
                    "numberOfResults": {
                        "title": "Number of results",
                        "minimum": 1,
                        "maximum": 2000,
                        "type": "integer",
                        "description": "How many unique articles to collect (1–2000). Multiple RSS passes are used automatically when needed.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
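Per the spec above, a synchronous run is a `POST` to `/acts/scrapeify~google-news-scraper/run-sync-get-dataset-items` with the token as a query parameter and a JSON body matching `inputSchema` (`keyword` is required; `numberOfResults` must be 1–2000). The sketch below builds and validates that request locally; the `build_request` helper and the commented `requests.post` call are illustrative, not part of the Actor's own tooling:

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/scrapeify~google-news-scraper/run-sync-get-dataset-items"

def build_request(keyword: str, number_of_results: int, token: str):
    """Validate input against the Actor's input schema and return the
    request URL plus JSON body for the synchronous run endpoint."""
    if not keyword:
        raise ValueError("keyword is required")
    if not 1 <= number_of_results <= 2000:
        raise ValueError("numberOfResults must be between 1 and 2000")
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps({"keyword": keyword, "numberOfResults": number_of_results})
    return url, body

url, body = build_request("technology", 50, "<YOUR_API_TOKEN>")
# Send with any HTTP client, e.g.:
#   requests.post(url, data=body, headers={"Content-Type": "application/json"})
# The 200 response body contains the dataset items directly.
```

For long runs, prefer the asynchronous `/runs` endpoint (also documented above), which returns a run object immediately so you can poll its status instead of holding the HTTP connection open.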
