# Google News Scraper (`dami_studio/google-news-scraper`) Actor

Search Google News by keyword or topic and get clean, structured articles with the REAL publisher URL (not Google's redirect), source, date, and snippet. Optional full article text and AI summary + sentiment. No key, no login.

- **URL**: https://apify.com/dami_studio/google-news-scraper.md
- **Developed by:** [Dami's Studio](https://apify.com/dami_studio) (community)
- **Categories:** News, Integrations, AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: 5.00 out of 5 stars

## Pricing

From $3.00 / 1,000 articles returned

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Google News Scraper

Search Google News by keyword or topic and get back clean, structured articles — with the **real publisher URL** instead of Google's useless redirect link. No API key, no login.

Most Google News scrapers hand you `https://news.google.com/rss/articles/CBMi...` links that you then have to figure out how to open. This one decodes them to the actual `reuters.com/...` (or wherever) URL, and falls back gracefully (with `urlResolved: false`) when a link genuinely can't be decoded — instead of silently dropping the article.

### What you get per article

`title`, `url` (resolved publisher link), `urlResolved`, `source`, `sourceUrl` (publisher homepage), `publishedAt` (ISO), `snippet`, and the original `googleUrl`. Turn on extras for `articleText` (full body) and `aiSummary` + `sentiment`.

#### Fields that can be null

- `url` / `urlResolved` — URL resolution is best-effort. When a Google link can't be decoded, `url` stays the Google redirect and `urlResolved` is `false`. Check `urlResolved` to know which you got.
- `sourceUrl`, `source`, `publishedAt`, `snippet` — null when Google's feed doesn't include that field for an item.
- `articleText` — only present when **Fetch full article text** is on AND extraction succeeded. Some sites block scraping or have no extractable body; those rows come back with `articleText: null`, are flagged `ok: false`, and are **not charged**.
- `aiSummary` / `sentiment` — only present when **AI summary** is on AND the OpenAI call succeeded; otherwise omitted/null for that row.
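
Downstream code can guard against these nullable fields before using a row. The following is a minimal sketch, assuming rows arrive as plain dicts with the field names listed above; `best_link` and `describe` are hypothetical helper names and are not part of the Actor.

```python
from typing import Any, Optional


def best_link(row: dict[str, Any]) -> tuple[Optional[str], bool]:
    """Return (link, is_publisher_link) for one dataset row.

    When urlResolved is false, `url` is still the Google redirect,
    so the second element tells the caller which kind of link it got.
    """
    resolved = row.get("urlResolved") is True
    link = row.get("url") or row.get("googleUrl")
    return link, resolved


def describe(row: dict[str, Any]) -> str:
    """Build a one-line label that tolerates null source and date fields."""
    source = row.get("source") or "unknown source"
    published = row.get("publishedAt") or "unknown date"
    return f"{row.get('title')} ({source}, {published})"


# Illustrative rows only; real rows come from the run's dataset.
rows = [
    {"title": "A", "url": "https://example.com/a", "urlResolved": True,
     "source": "Example", "publishedAt": "2025-01-08T00:00:00Z"},
    {"title": "B", "url": "https://news.google.com/rss/articles/CBMi...",
     "urlResolved": False, "source": None, "publishedAt": None},
]
for row in rows:
    print(describe(row), best_link(row))
```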

### Input

| Field | Notes |
|---|---|
| `query` | Keywords. Supports Google operators, e.g. `tesla OR rivian`, `site:reuters.com`, `intitle:layoffs`. |
| `topic` | Use a topic feed (World, Business, Technology, …) instead of a query. |
| `freshness` | Last hour / 24h / 7d / 30d / year. |
| `language` / `country` | e.g. `en-US` / `US`, `de` / `DE`. |
| `maxItems` | Up to ~100 (Google's per-feed cap). |
| `resolveUrls` | Decode to the real publisher URL. On by default. |
| `fetchArticleText` | Download each article and extract the body. |
| `aiSummary` | 1–2 sentence summary + sentiment (needs your OpenAI key). |

### Output

One dataset row per article. Pricing is pay-per-result: you are only charged for genuine, complete article rows (`ok: true`). Rows we couldn't fully deliver are never charged — this includes:

- empty/invalid input (a single `ok: false` diagnostic row with `errorCode: "BAD_INPUT"`),
- no results for the query/topic (`NO_RESULTS`),
- blocks, rate limits, or network errors (`BLOCKED` / `RATE_LIMITED` / `NETWORK`),
- and, when **Fetch full article text** is on, any article whose body couldn't be extracted (`articleText: null`, flagged `ok: false`).
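
Because charged and uncharged rows share one dataset, it can help to separate them after a run. The sketch below assumes each row carries the `ok` flag described above and treats a row without the flag as a complete article, which is an assumption; `split_rows` is a hypothetical helper.

```python
from typing import Any

Rows = list[dict[str, Any]]


def split_rows(rows: Rows) -> tuple[Rows, Rows]:
    """Separate complete article rows from uncharged ok:false rows."""
    articles = [r for r in rows if r.get("ok") is not False]
    failures = [r for r in rows if r.get("ok") is False]
    return articles, failures


# Illustrative rows only; real rows come from the run's dataset.
rows = [
    {"ok": True, "title": "Funding round announced"},
    {"ok": False, "articleText": None, "title": "Blocked story"},
    {"ok": False, "errorCode": "RATE_LIMITED"},
]
articles, failures = split_rows(rows)
codes = [r["errorCode"] for r in failures if "errorCode" in r]
print(len(articles), len(failures), codes)
```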

#### Proxy

Google News RSS is a public, no-auth feed with no anti-bot protection, so **no proxy is required** and the default configuration runs without one (saving proxy credits). Only enable Apify Proxy if you hit IP rate limits at very high volume.
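
One way to follow this advice is to run directly first and retry with Apify Proxy only after a rate-limit diagnostic row appears. The sketch below is an illustration under assumptions: `run_actor` is a hypothetical callable that takes an input dict and returns the dataset rows (for example, a thin wrapper around one of the API clients), and `fake_run` merely simulates it.

```python
from typing import Any, Callable

Rows = list[dict[str, Any]]


def run_with_proxy_fallback(run_actor: Callable[[dict[str, Any]], Rows],
                            actor_input: dict[str, Any]) -> Rows:
    """Run without a proxy first; retry once with Apify Proxy if rate limited."""
    direct = {**actor_input, "proxyConfiguration": {"useApifyProxy": False}}
    rows = run_actor(direct)
    if not any(r.get("errorCode") == "RATE_LIMITED" for r in rows):
        return rows
    proxied = {**actor_input, "proxyConfiguration": {"useApifyProxy": True}}
    return run_actor(proxied)


def fake_run(actor_input: dict[str, Any]) -> Rows:
    # Stand-in for a real Actor call: pretend direct traffic is rate limited.
    if actor_input["proxyConfiguration"]["useApifyProxy"]:
        return [{"ok": True, "title": "Example"}]
    return [{"ok": False, "errorCode": "RATE_LIMITED"}]


result = run_with_proxy_fallback(fake_run, {"query": "openai funding"})
print(result)
```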

#### Troubleshooting

- Many rows have `urlResolved: false`? Some publishers' Google links can't be decoded; the article still comes back with its source, title, date and the Google link.
- `articleText: null` on several rows? Those sites blocked extraction — they are flagged `ok: false` and were not charged.
- Getting a `BAD_INPUT` row? Provide a `query` or pick a `topic` (and an OpenAI key if `aiSummary` is on).

### Example

```json
{ "query": "openai funding", "freshness": "7d", "language": "en-US", "country": "US", "maxItems": 30, "resolveUrls": true }
````

### Notes

Google News search feeds cap at roughly 100 results per query — split big jobs by keyword, source (`site:`), or freshness window. URL resolution is best-effort: most links decode, and any that don't still come back with the source, title, date and the Google link so nothing is lost.
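
Splitting by source can be sketched as follows: one input object per publisher using the `site:` operator, plus a de-duplication pass over the combined rows. `split_by_source` and `dedupe` are hypothetical helpers, and keying duplicates on `googleUrl` (falling back to `url`) is an assumption about what identifies an article.

```python
from typing import Any


def split_by_source(query: str, sources: list[str],
                    **common: Any) -> list[dict[str, Any]]:
    """Create one Actor input per publisher using the site: operator."""
    return [{"query": f"{query} site:{s}", **common} for s in sources]


def dedupe(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Drop repeated articles, keyed on googleUrl and falling back to url."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        key = row.get("googleUrl") or row.get("url")
        if key is None:
            unique.append(row)  # no usable key, so keep the row as-is
        elif key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


inputs = split_by_source("layoffs", ["reuters.com", "bbc.com"],
                         freshness="7d", maxItems=100)
for item in inputs:
    print(item["query"])
```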

# Actor input Schema

## `query` (type: `string`):

Keywords to search Google News for (e.g. "electric vehicles", or advanced: "tesla OR rivian site:reuters.com"). Leave empty if using a Topic instead.

## `topic` (type: `string`):

Use a Google News topic feed instead of a search query.

## `freshness` (type: `string`):

Only return articles newer than this (applied to query searches).

## `language` (type: `string`):

Interface/article language code, e.g. en-US, en-GB, fr, de, es-419.

## `country` (type: `string`):

Edition country code, e.g. US, GB, CA, DE, FR, IN.

## `maxItems` (type: `integer`):

Maximum number of articles to return (Google News feeds cap around 100 per query).

## `resolveUrls` (type: `boolean`):

Decode Google's redirect links into the actual article URL (e.g. reuters.com/...). Best-effort; falls back to the Google link with urlResolved=false if a link can't be decoded. Check urlResolved per row to know whether url is the real publisher link or the Google redirect.

## `fetchArticleText` (type: `boolean`):

Download each resolved article and extract its main body text. Slower; requires Resolve URLs. Some sites block scraping or have no extractable body — those rows come back with articleText=null and are flagged ok:false and NOT charged, so you only pay for articles whose text was actually fetched.

## `aiSummary` (type: `boolean`):

Generate a 1-2 sentence summary and a sentiment label per article using OpenAI. Needs an OpenAI API key. Best-effort: if the OpenAI call fails for an article, the aiSummary and sentiment fields are simply omitted/null for that row (the article itself is still returned).

## `openaiApiKey` (type: `string`):

Only needed when AI summary is on. Kept private.

## `notionConnector` (type: `string`):

Optional. Write each article as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default) — results are always saved to the dataset regardless.

## `notionParentId` (type: `string`):

Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead.

## `proxyConfiguration` (type: `object`):

OPTIONAL. Google News RSS is a public, no-auth API with no anti-bot, so no proxy is needed and the default routes traffic directly (saving proxy credits). Only enable Apify Proxy if you hit IP rate limits when running very high volumes.

## Actor input object example

```json
{
  "query": "artificial intelligence",
  "topic": "",
  "freshness": "",
  "language": "en-US",
  "country": "US",
  "maxItems": 50,
  "resolveUrls": true,
  "fetchArticleText": false,
  "aiSummary": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `results` (type: `string`):

Scraped rows are stored in the default dataset (one row per result). Blocked/empty/error runs return a single uncharged diagnostic row instead.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "artificial intelligence"
};

// Run the Actor and wait for it to finish
const run = await client.actor("dami_studio/google-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "artificial intelligence" }

# Run the Actor and wait for it to finish
run = client.actor("dami_studio/google-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "artificial intelligence"
}' |
apify call dami_studio/google-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dami_studio/google-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google News Scraper",
        "description": "Search Google News by keyword or topic and get clean, structured articles with the REAL publisher URL (not Google's redirect), source, date, and snippet. Optional full article text and AI summary + sentiment. No key, no login.",
        "version": "0.1",
        "x-build-id": "VgMqkguOXCFLG5J3e"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dami_studio~google-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dami_studio-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dami_studio~google-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-dami_studio-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dami_studio~google-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-dami_studio-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keywords to search Google News for (e.g. \"electric vehicles\", or advanced: \"tesla OR rivian site:reuters.com\"). Leave empty if using a Topic instead."
                    },
                    "topic": {
                        "title": "Topic (instead of a query)",
                        "enum": [
                            "",
                            "WORLD",
                            "NATION",
                            "BUSINESS",
                            "TECHNOLOGY",
                            "ENTERTAINMENT",
                            "SPORTS",
                            "SCIENCE",
                            "HEALTH"
                        ],
                        "type": "string",
                        "description": "Use a Google News topic feed instead of a search query.",
                        "default": ""
                    },
                    "freshness": {
                        "title": "Freshness",
                        "enum": [
                            "",
                            "1h",
                            "1d",
                            "7d",
                            "30d",
                            "1y"
                        ],
                        "type": "string",
                        "description": "Only return articles newer than this (applied to query searches).",
                        "default": ""
                    },
                    "language": {
                        "title": "Language (hl)",
                        "type": "string",
                        "description": "Interface/article language code, e.g. en-US, en-GB, fr, de, es-419.",
                        "default": "en-US"
                    },
                    "country": {
                        "title": "Country (gl)",
                        "type": "string",
                        "description": "Edition country code, e.g. US, GB, CA, DE, FR, IN.",
                        "default": "US"
                    },
                    "maxItems": {
                        "title": "Max articles",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Maximum number of articles to return (Google News feeds cap around 100 per query).",
                        "default": 50
                    },
                    "resolveUrls": {
                        "title": "Resolve real publisher URLs",
                        "type": "boolean",
                        "description": "Decode Google's redirect links into the actual article URL (e.g. reuters.com/...). Best-effort; falls back to the Google link with urlResolved=false if a link can't be decoded. Check urlResolved per row to know whether url is the real publisher link or the Google redirect.",
                        "default": true
                    },
                    "fetchArticleText": {
                        "title": "Fetch full article text",
                        "type": "boolean",
                        "description": "Download each resolved article and extract its main body text. Slower; requires Resolve URLs. Some sites block scraping or have no extractable body — those rows come back with articleText=null and are flagged ok:false and NOT charged, so you only pay for articles whose text was actually fetched.",
                        "default": false
                    },
                    "aiSummary": {
                        "title": "AI summary + sentiment",
                        "type": "boolean",
                        "description": "Generate a 1-2 sentence summary and a sentiment label per article using OpenAI. Needs an OpenAI API key. Best-effort: if the OpenAI call fails for an article, the aiSummary and sentiment fields are simply omitted/null for that row (the article itself is still returned).",
                        "default": false
                    },
                    "openaiApiKey": {
                        "title": "OpenAI API key (for AI summary)",
                        "type": "string",
                        "description": "Only needed when AI summary is on. Kept private."
                    },
                    "notionConnector": {
                        "title": "Notion connector (optional)",
                        "type": "string",
                        "description": "Optional. Write each article as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default) — results are always saved to the dataset regardless."
                    },
                    "notionParentId": {
                        "title": "Notion target data source ID",
                        "type": "string",
                        "description": "Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy (optional)",
                        "type": "object",
                        "description": "OPTIONAL. Google News RSS is a public, no-auth API with no anti-bot, so no proxy is needed and the default routes traffic directly (saving proxy credits). Only enable Apify Proxy if you hit IP rate limits when running very high volumes.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
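
For projects without an official client, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it. `build_request` is a hypothetical helper, and treating the response body as a JSON list of dataset items is an assumption based on the endpoint summary.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/dami_studio~google-news-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare a POST request for the run-sync-get-dataset-items endpoint."""
    # The specification passes the token as a required query parameter.
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_request("<YOUR_API_TOKEN>", {"query": "artificial intelligence"})
print(request.get_method(), request.full_url)
# To send it (requires a real token and network access):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```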
