# Google News Scraper — Headlines, Topics & Full Coverage (`sian.agency/google-news-scraper`) Actor

Google News scraper and news API in one actor. Search news by query and time range, pull top headlines by country, topic and publisher feeds (CNN, BBC, Tech, Sports), local geo headlines, and full-story coverage with sub-articles and X posts. Pay per result — no subscription.

- **URL**: https://apify.com/sian.agency/google-news-scraper.md
- **Developed by:** [SIÁN OÜ](https://apify.com/sian.agency) (community)
- **Categories:** News, Business, Social media
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $1.50 / 1,000 news search results

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Google News Scraper — News API, Headlines & Full Coverage 📰

[![SIÁN Agency Store](https://img.shields.io/badge/Store-SI%C3%81N%20Agency-1AE392)](https://apify.com/sian.agency?fpr=sian) [![Store-Trustpilot Reviews](https://img.shields.io/badge/Store-Trustpilot%20Reviews-00B67A)](https://apify.com/sian.agency/trustpilot-reviews-scraper?fpr=sian) [![Store-Glassdoor Scraper](https://img.shields.io/badge/Store-Glassdoor%20Scraper-0CAA41)](https://apify.com/sian.agency/glassdoor-data-scraper?fpr=sian) [![Store-Instagram Transcripts](https://img.shields.io/badge/Store-Instagram%20Transcripts-E4405F)](https://apify.com/sian.agency/instagram-ai-transcript-extractor?fpr=sian)

#### 🎉 Scrape Google News at scale — the news API alternative that needs no API key, no RSS parsing, and no developer account
##### Built for PR & comms teams, brand-monitoring SaaS founders, market researchers, and data teams who need clean, structured news data on a schedule

### 📋 Overview

**Google News data made ridiculously simple.** Give this actor a search query, a country, a topic, a publisher, or a story ID — click Run — and get clean, query-ready article rows back. No login, no RSS feed parsing, no proxy setup, and no waiting on an official news API key. It's the Google News API you can actually use today.

Most news scrapers on the Store do one thing — keyword search. This actor bundles **six operations** behind a single dropdown: keyword **news search** with a time-range filter, **top headlines** by country, **topic & publisher feeds** (World, Business, Technology, Sports — plus CNN, BBC, or any source), **local geo headlines**, **full-story coverage** that pulls every sub-article of one story plus posts from X (formerly Twitter), and a **language helper** lookup. Pick one operation per run, get one tidy dataset out. Perfect for building a brand-monitoring pipeline, mapping how a story spreads, aggregating topic feeds for a newsletter, and feeding sentiment models thousands of clean article rows.

**Why teams choose us:**
- ✅ **News API alternative, zero setup**: no API key, no RSS parsing, no developer registration — paste an input and run
- 🧵 **Full-story coverage — a SIÁN exclusive**: the entire coverage cloud of one story (top news + all sub-articles + X posts when available), each row tagged `_rowType: article` or `tweet` — no other Google News actor returns this
- ⚡ **6 operations in one actor**: news search · top headlines · topic & publisher feeds · local headlines · full-story coverage · language list
- 💰 **Pay per result, no rental wall**: charged per successful row plus a small start fee — no monthly subscription, you pay only for the rows you pull
- 🎯 **Rich structured fields**: curated camelCase aliases (`articleTitle`, `snippet`, `sourceName`, `storyId`, `sourcePublicationId`) plus raw upstream fields spread alongside — analyst-friendly and engineer-complete
- 💎 **Only pay for successful rows**: failed lookups land as `status:"error"` rows at zero cost — you're never billed for a hiccup

### ✨ Features

- 🔍 **News Search**: keyword search across Google News with a published-time filter (past hour → past year), optional source-domain restriction — your Google News search API alternative
- 📰 **Top Headlines**: the latest top stories for any country, ready for a dashboard or daily digest
- 🗂️ **Topic & Publisher Headlines**: pull a Google News topic feed (World, National, Business, Technology, Entertainment, Sports, Science, Health) **or** a single publisher's feed via its publication ID (CNN, BBC, any outlet)
- 📍 **Local Headlines**: geo-based news for any city or region — `New York`, `London`, `Berlin`
- 🧵 **Full-Story Coverage**: every sub-article of one story plus X (Twitter) posts when available — trace how a narrative spreads across outlets and social in one call
- 🌐 **Language List**: a helper lookup that returns the valid languages for a country code — no more guessing locale combinations
- 🆔 **Re-Queryable IDs**: every row carries `storyId` (chains into Full-Story Coverage) and `sourcePublicationId` (chains into a publisher feed) so operations compose into a pipeline
- 🖼️ **HTTPS-Normalized URLs**: every article link, photo, source logo, and favicon arrives ready to embed
- 🌍 **Country & Language Targeting**: scope any feed by two-letter country and language codes
- 📊 **Single Clean Dataset Shape**: one flat row per article (or tweet), filterable by `_operation`, `_rowType`, and `status` — the same export pipeline works across all six operations

### 🎬 Quick Start

So simple, no training needed! Pick an operation, fill the matching input, click Run.

```bash
# Or use the API: one POST request starts a run
curl -X POST 'https://api.apify.com/v2/acts/sian.agency~google-news-scraper/runs?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"operation":"newsSearch","query":"Tesla","timePublished":"7d","limit":50}'
````

### 🚀 Getting Started (3 Simple Steps)

#### Step 1: Pick an Operation

Choose one of six operations from the dropdown: News Search, Top Headlines, Topic / Publisher Headlines, Local Headlines, Full-Story Coverage, or Language List. One run = one operation.

#### Step 2: Fill the Matching Input

- **News Search** → a `query` (e.g. `Tesla`), optionally a `timePublished` window and `source` domain
- **Top Headlines** → just a `country` (defaults to `us`)
- **Topic / Publisher Headlines** → a `topic` keyword (`TECHNOLOGY`) **or** a publisher `sourcePublicationId`
- **Local Headlines** → a `query` that's a place name (e.g. `London`)
- **Full-Story Coverage** → a `story` ID (copied from any headline row's `storyId`)
- **Language List** → a `country` code

Optionally set `country`, `language`, and `limit` where relevant.

#### Step 3: Click Run

One click and we fetch, flatten, normalize, and push clean rows to your dataset. An HTML run report lands in the key-value store. Export to JSON, CSV, or Excel from the Apify console — or pull via API.

**That's it! In seconds, you'll have:**

- Clean flat article rows from any of six Google News endpoints — same shape, ready to export
- Curated camelCase fields per row plus raw upstream data spread alongside
- `storyId` and `sourcePublicationId` on every row to chain into deeper operations
- HTTPS article, photo, logo, and favicon URLs ready to embed
- Error rows for failed inputs — never billed

### 📥 Input Configuration

One operation per run. Each operation has its own required field (validated before charging). The optional `country`, `language`, and `limit` fields apply where relevant.

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| operation | enum | Yes | One of: `newsSearch`, `topHeadlines`, `topicHeadlines`, `localHeadlines`, `fullStoryCoverage`, `languageList` |
| query | string | newsSearch, localHeadlines | Keyword to search (News Search) or place name (Local Headlines) |
| timePublished | enum | No (newsSearch) | `anytime`, `1h`, `1d`, `7d`, `30d`, `1y` |
| source | string | No (newsSearch) | Restrict to one publisher domain (e.g. `cnn.com`) |
| topic | string | topicHeadlines | Topic keyword (`TECHNOLOGY`, `BUSINESS`, `SPORTS`…) **or** a publisher `sourcePublicationId` |
| story | string | fullStoryCoverage | Google News story ID (from any headline row's `storyId`) |
| storySort | enum | No (fullStoryCoverage) | `RELEVANCE` (default) or `DATE` |
| country | string | No | Two-letter country code (default `us`) |
| language | string | No | Two-letter language code (default `en`) |
| limit | integer | No | Max articles per run, 1–500 (default 25) — applies to search & headline feeds |
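
You can check the required fields in this table yourself before spending a run. The sketch below is a minimal Python example for the case where you assemble inputs in code. `build_run_input` and its constants are hypothetical helpers that mirror the table above. They are not part of the actor or the Apify client, and the actor still performs its own validation.

```python
# Hypothetical client-side pre-check mirroring the Input Configuration table.
OPERATIONS = {
    # operation -> field it requires (None = nothing beyond the defaults)
    "newsSearch": "query",
    "topHeadlines": None,
    "topicHeadlines": "topic",
    "localHeadlines": "query",
    "fullStoryCoverage": "story",
    "languageList": None,
}
TIME_WINDOWS = {"anytime", "1h", "1d", "7d", "30d", "1y"}
STORY_SORTS = {"RELEVANCE", "DATE"}


def build_run_input(operation: str, **fields) -> dict:
    """Return a run_input dict for one operation, or raise ValueError."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    # Drop unset fields so the actor's own defaults (country "us", limit 25...) apply
    fields = {k: v for k, v in fields.items() if v is not None}
    required = OPERATIONS[operation]
    if required and not fields.get(required):
        raise ValueError(f"{operation} requires a non-empty {required!r}")
    if fields.get("timePublished", "anytime") not in TIME_WINDOWS:
        raise ValueError(f"timePublished must be one of {sorted(TIME_WINDOWS)}")
    if fields.get("storySort", "RELEVANCE") not in STORY_SORTS:
        raise ValueError(f"storySort must be one of {sorted(STORY_SORTS)}")
    limit = fields.get("limit", 25)
    if isinstance(limit, bool) or not isinstance(limit, int) or not 1 <= limit <= 500:
        raise ValueError("limit must be an integer between 1 and 500")
    return {"operation": operation, **fields}
```

Pass the returned dict as `run_input` to `client.actor('sian.agency/google-news-scraper').call(...)`.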

**Example — News Search (last 7 days):**

```json
{
  "operation": "newsSearch",
  "query": "Tesla",
  "timePublished": "7d",
  "limit": 50
}
```

**Example — Top Headlines (UK):**

```json
{
  "operation": "topHeadlines",
  "country": "gb",
  "limit": 50
}
```

**Example — Topic Headlines (Technology):**

```json
{
  "operation": "topicHeadlines",
  "topic": "TECHNOLOGY",
  "country": "us",
  "limit": 50
}
```

**Example — Local Headlines (a city):**

```json
{
  "operation": "localHeadlines",
  "query": "London",
  "country": "gb"
}
```

**Example — Full-Story Coverage (one story's entire cloud):**

```json
{
  "operation": "fullStoryCoverage",
  "story": "CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2pSdnR5aEVSRld6LXRKNmM4dHl5Z0FQAQ",
  "storySort": "RELEVANCE"
}
```

💡 **Workflow tip:** Run Top Headlines or Topic Headlines to discover `storyId` values, then loop Full-Story Coverage per story to map the complete coverage cloud. Copy a `sourcePublicationId` from any row into the `topic` field to pull that publisher's full feed.
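
You can script this discover-then-expand workflow once the headline rows are in memory, for example from `iterate_items()`. The sketch below is minimal. It assumes rows are plain dicts with the `status`, `storyId`, and `sourcePublicationId` fields described in this README, and the helper names are hypothetical.

```python
def story_coverage_inputs(rows, story_sort="RELEVANCE"):
    """One fullStoryCoverage input per distinct storyId, in first-seen order."""
    # dict.fromkeys keeps insertion order, so it doubles as an ordered de-duplicator
    story_ids = dict.fromkeys(
        row["storyId"]
        for row in rows
        if row.get("status") == "success" and row.get("storyId")
    )
    return [
        {"operation": "fullStoryCoverage", "story": sid, "storySort": story_sort}
        for sid in story_ids
    ]


def publisher_feed_input(row, country="us", limit=25):
    """topicHeadlines input for the row's publisher feed, or None if it has no ID."""
    pub_id = row.get("sourcePublicationId")
    if not pub_id:
        return None
    return {"operation": "topicHeadlines", "topic": pub_id, "country": country, "limit": limit}
```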

### 📤 Output

One flat row per article (or per tweet on Full-Story Coverage), saved to the Apify dataset. Curated camelCase aliases land on every row alongside the raw upstream data. Filter by `_operation` to split modes, by `_rowType` to separate articles from tweets, or by `status` to separate success from error rows.

| Field | Type | Description |
|-------|------|-------------|
| articleId | string | Google News article identifier |
| articleTitle | string | Headline text |
| snippet | string | Article summary / lead text |
| link | string | HTTPS link to the article |
| photoUrl / thumbnailUrl | string | HTTPS article imagery |
| publishedDatetimeUtc | string | ISO-8601 publish timestamp |
| authors | array | Article authors |
| sourceName | string | Publisher name (CNN, BBC, Reuters…) |
| sourceUrl / sourceLogoUrl / sourceFaviconUrl | string | HTTPS publisher links and branding |
| sourcePublicationId | string | Re-queryable publisher feed ID (paste into `topic`) |
| storyId | string | Re-queryable story cluster ID (paste into `story`) |
| relatedTopics | array | Google News related-topic tags |
| subArticles | array | Clustered sub-articles for a headline |
| section | string | Full-Story Coverage section: `top_news`, `all_articles`, `twitter_posts` |
| tweetText / tweetUrl / tweetAuthor | string | Tweet fields (Full-Story Coverage, `_rowType: tweet`) |
| languages | array | Valid languages for a country (Language List) |
| \_operation / \_rowType / \_fetchedAt / \_page / status | metadata | Always-present row metadata |

All examples below are **real captured output** (June 2026; trimmed to the most useful fields):

**Example — News Search row:**

```json
{
  "_operation": "newsSearch",
  "_rowType": "article",
  "articleId": "CBMiWEFVX3lxTE1DYzB5d29UTWVKVDdpTnNETGVGbGoz...",
  "articleTitle": "Tesla, Inc. | History, Cars, Elon Musk, & Headquarters | Britannica Money",
  "snippet": "Tesla, Inc., headquartered in Austin, Texas, is an American manufacturer of electric automobiles, solar panels, and batteries…",
  "link": "https://www.britannica.com/money/Tesla-Motors",
  "publishedDatetimeUtc": "2026-06-03T08:16:00.000Z",
  "sourceName": "Britannica",
  "sourcePublicationId": "CAAqKQgKIiNDQklTRkFnTWFoQUtEbUp5YVhSaGJtNXBZMkV1WTI5dEtBQVAB",
  "status": "success"
}
```

**Example — Top Headline row (carries a re-queryable `storyId`):**

```json
{
  "_operation": "topHeadlines",
  "_rowType": "article",
  "articleTitle": "Markets rally as central bank holds rates",
  "sourceName": "Reuters",
  "storyId": "CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2pSdnR5aEVSRld6LXRKNmM4dHl5Z0FQAQ",
  "subArticles": [
    { "title": "Stocks climb after rate decision", "source_name": "BBC" },
    { "title": "What the rate hold means for borrowers", "source_name": "CNBC" }
  ],
  "publishedDatetimeUtc": "2026-06-08T07:40:00.000Z",
  "status": "success"
}
```

**Example — Full-Story Coverage article row (section-tagged):**

```json
{
  "_operation": "fullStoryCoverage",
  "_rowType": "article",
  "section": "top_news",
  "articleTitle": "Inside the story everyone is covering",
  "sourceName": "The Guardian",
  "link": "https://www.theguardian.com/...",
  "publishedDatetimeUtc": "2026-06-08T06:10:00.000Z",
  "_sourceQuery": "CAAqNggKIjBDQklTSGpvSmMz...",
  "status": "success"
}
```

**Example — Full-Story Coverage tweet row (when X posts are present):**

```json
{
  "_operation": "fullStoryCoverage",
  "_rowType": "tweet",
  "section": "twitter_posts",
  "tweetText": "This is the biggest development on the story so far — full thread 🧵",
  "tweetUrl": "https://twitter.com/...",
  "tweetAuthor": "@reporter",
  "publishedDatetimeUtc": "2026-06-08T06:25:00.000Z",
  "status": "success"
}
```

> 🧵 **About X (Twitter) posts:** Full-Story Coverage returns top news, every sub-article, **and** X posts *when Google News surfaces them for that story* — busy, breaking stories tend to include them; quieter stories may return articles only. Each row is tagged `_rowType: article` or `tweet` so you can filter cleanly either way.
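
After a Full-Story Coverage run, a small Python sketch can separate the two row types. It splits rows on `status` and `_rowType` and counts rows per `section`. The sketch assumes rows shaped like the examples above, and `split_coverage` and `section_counts` are illustrative names.

```python
from collections import Counter


def split_coverage(rows):
    """Split Full-Story Coverage rows into article, tweet, and error lists."""
    articles, tweets, errors = [], [], []
    for row in rows:
        if row.get("status") != "success":
            errors.append(row)  # error rows are never billed
        elif row.get("_rowType") == "tweet":
            tweets.append(row)
        elif row.get("_rowType") == "article":
            articles.append(row)
    return articles, tweets, errors


def section_counts(rows):
    """How many successful rows landed in each section (top_news, all_articles...)."""
    return Counter(r.get("section") for r in rows if r.get("status") == "success")
```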

### 💼 Use Cases & Examples

#### 1. Brand & PR Monitoring — Catch Every Mention

**For PR and comms teams tracking brand, product, and executive coverage across Google News.**

**Input:** Schedule `newsSearch` with your brand query and `timePublished: "1d"` on a daily cron
**Output:** One row per article — headline, snippet, source, author, and publish timestamp
**Use:** Diff consecutive datasets to catch new coverage, crises, and competitor PR moves the moment they hit — power "new mention" alerts and a share-of-voice dashboard.
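
Diffing two daily snapshots comes down to comparing article identities. The sketch below is minimal. It assumes an article keeps the same `link` across runs, falling back to `articleId` when there is no link. `new_mentions` is a hypothetical helper, not an actor feature.

```python
def mention_key(row):
    """Identity of one article: its link, else its Google News articleId."""
    return row.get("link") or row.get("articleId")


def new_mentions(previous_rows, current_rows):
    """Successful rows in current_rows whose article was not in previous_rows."""
    seen = {mention_key(r) for r in previous_rows if r.get("status") == "success"}
    fresh = []
    for row in current_rows:
        key = mention_key(row)
        if row.get("status") != "success" or not key or key in seen:
            continue
        seen.add(key)  # also drops duplicates within current_rows
        fresh.append(row)
    return fresh
```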

#### 2. Full-Story Coverage Mapping — Trace a Narrative

**For analysts and PR teams mapping how a single story spreads across outlets and social.**

**Input:** Run `topHeadlines`, copy a `storyId`, then run `fullStoryCoverage` on it
**Output:** Top news + every sub-article + X posts (when available), each tagged by `_rowType` and `section`
**Use:** See the entire coverage cloud of one story in a single call — which outlets carried it, in what order, and what people said on X. No other Google News actor returns this.
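
To see which outlets covered the story and in what order, sort the article rows by `publishedDatetimeUtc`. This sketch assumes timestamps use the ISO-8601 format with a trailing `Z` shown in the examples. It leaves tweet rows out, and the function names are illustrative.

```python
from datetime import datetime


def parse_utc(value):
    """Parse timestamps like '2026-06-08T06:10:00.000Z' into aware datetimes."""
    # fromisoformat only accepts a trailing 'Z' from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def coverage_timeline(rows):
    """(published, outlet, title) for successful article rows, oldest first."""
    articles = [
        r for r in rows
        if r.get("status") == "success"
        and r.get("_rowType") == "article"
        and r.get("publishedDatetimeUtc")
    ]
    articles.sort(key=lambda r: parse_utc(r["publishedDatetimeUtc"]))
    return [
        (parse_utc(r["publishedDatetimeUtc"]), r.get("sourceName"), r.get("articleTitle"))
        for r in articles
    ]


def first_outlets(rows):
    """Outlet names in the order they first published on the story."""
    return list(dict.fromkeys(name for _, name, _ in coverage_timeline(rows) if name))
```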

#### 3. Topic & Publisher Feed Aggregation — Build a Curated Feed

**For newsletter editors, dashboards, and LLM pipelines that need structured topic feeds.**

**Input:** Run `topicHeadlines` for World, Business, Technology, Sports — or a publisher's `sourcePublicationId`
**Output:** The latest clustered headlines for that topic or outlet, ready to export
**Use:** Power a curated news feed for a newsletter, internal dashboard, or a retrieval pipeline that feeds an LLM — refreshed on a schedule with one click.

#### 4. Local & Geo News Intelligence — Regional Monitoring

**For market researchers, real-estate teams, and local-SEO agencies tracking regional press.**

**Input:** `localHeadlines` with a city or region as the `query`
**Output:** Geo-based headlines for that location — civic events, local business, regional sentiment
**Use:** Monitor regional press and location-specific events for market research, real-estate intelligence, or local-SEO content planning.

#### 5. Sentiment & Media Analysis — Feed Your NLP Models

**For data teams quantifying tone, share-of-voice, and trending topics at scale.**

**Input:** Run `newsSearch` (or topic feeds) for each target query with a high `limit`
**Output:** Thousands of clean article rows — title, snippet, source, author, published date
**Use:** Feed structured rows into sentiment and NLP models to quantify tone, surface trending topics, and benchmark share-of-voice across competitors for research and competitive intelligence.
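
Once rows are grouped by the brand query that produced them, share of voice is each brand's fraction of the successful article rows. The sketch below is minimal. It assumes you keep one list of rows per query, for example one `newsSearch` run per brand. `share_of_voice` and `top_sources` are hypothetical helpers.

```python
from collections import Counter


def share_of_voice(rows_by_brand):
    """Percentage of successful article rows per brand query."""
    counts = {
        brand: sum(1 for r in rows if r.get("status") == "success")
        for brand, rows in rows_by_brand.items()
    }
    total = sum(counts.values())
    if total == 0:
        return {brand: 0.0 for brand in counts}
    return {brand: round(100 * n / total, 1) for brand, n in counts.items()}


def top_sources(rows, n=3):
    """Most frequent publishers among successful rows, as (sourceName, count)."""
    return Counter(
        r["sourceName"] for r in rows if r.get("status") == "success" and r.get("sourceName")
    ).most_common(n)
```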

### 🔌 Integration Examples

#### JavaScript/Node.js

```javascript
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Pull top headlines, then collect the story IDs to expand into Full-Story Coverage
const search = await client.actor('sian.agency/google-news-scraper').call({
  operation: 'topHeadlines',
  country: 'us',
  limit: 25
});

const { items } = await client.dataset(search.defaultDatasetId).listItems();
const storyIds = items.filter(i => i.status === 'success' && i.storyId).map(i => i.storyId);
console.log(`${storyIds.length} stories to expand into full coverage`);
```

#### Python

```python
from apify_client import ApifyClient
client = ApifyClient('YOUR_TOKEN')

# Daily brand-mention snapshot (articles from the last 24 hours)
run = client.actor('sian.agency/google-news-scraper').call(
    run_input={'operation': 'newsSearch', 'query': 'Tesla', 'timePublished': '1d', 'limit': 50}
)

# Skip error rows (they are never billed); .get() avoids a KeyError on missing fields
for item in client.dataset(run['defaultDatasetId']).iterate_items():
    if item.get('status') == 'success':
        print(f"{item.get('publishedDatetimeUtc')} | {item.get('sourceName')}: {item.get('articleTitle')}")
```

#### cURL

```bash
curl -X POST 'https://api.apify.com/v2/acts/sian.agency~google-news-scraper/runs?token=YOUR_TOKEN' \
-H 'Content-Type: application/json' \
-d '{"operation":"topicHeadlines","topic":"TECHNOLOGY","limit":50}'
```

#### Automation Workflows (N8N / Zapier / Make)

1. **Trigger**: Schedule (daily news refresh) or webhook (new keyword added to your tracker)
2. **HTTP Request**: Call the actor API with `operation` and the per-op input fields
3. **Process**: Diff against yesterday's articles, score sentiment, or expand `storyId` into full coverage
4. **Action**: Push new-mention alerts to Slack, sync rows to Google Sheets, or update your media-monitoring database
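
Step 2 is a plain HTTP POST, so any tool that can send JSON can do it. The sketch below builds that request with Python's standard library to show its shape. It uses the run endpoint from the cURL example and a JSON body. It sends the token as an `Authorization: Bearer` header instead of the `?token=` query parameter, since Apify accepts both. `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request

ACTOR_RUNS_URL = "https://api.apify.com/v2/acts/sian.agency~google-news-scraper/runs"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """POST request that starts one actor run with the given input."""
    return urllib.request.Request(
        ACTOR_RUNS_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Keeps the token out of the URL (and out of request logs)
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` starts the run and returns JSON describing it. In n8n, Zapier, or Make, put the same URL, headers, and body into the HTTP request step.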

### 📈 Performance & Pricing

Transparent **pay-per-event** — you're charged only for **successful rows**, plus a small one-time start fee per run. Failed lookups land as `status:"error"` rows and cost **$0**. Higher Apify plans get automatic volume discounts on every event. No monthly rental, no subscription — you pay only for the rows you pull.

| Event | Indicative Price | Applies to |
|-------|------------------|------------|
| `apify-actor-start` | **$0.005** | One-time per run |
| 🔍 `news-search-result` | **$0.0015** / row | News Search |
| 📰 `top-headlines-result` | **$0.0015** / row | Top Headlines |
| 🗂️ `topic-headlines-result` | **$0.0015** / row | Topic / Publisher Headlines |
| 📍 `local-headlines-result` | **$0.0015** / row | Local Headlines |
| 🧵 `full-story-result` | **$0.0015** / row | Full-Story Coverage |
| 🌐 `language-list-result` | **$0.0005** / row | Language List (helper) |

**Cost examples:**

- **1,000 news search articles** → $1.50 + $0.005 start = **~$1.51**
- **A 50-headline top feed** → $0.075 + $0.005 = **~$0.08**
- **A full story's coverage** (e.g. 53 articles) → ~$0.08 + $0.005 = **~$0.085**
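
The cost examples follow a simple formula: the start fee for each run, plus the per-row price multiplied by the number of successful rows. The sketch below reproduces them. It assumes the indicative prices in the table; current prices may differ, and Store plan discounts are not applied.

```python
from decimal import Decimal, ROUND_HALF_UP

# Indicative prices from the table above, in USD (no plan discounts applied)
START_FEE = Decimal("0.005")
ROW_PRICE = {
    "newsSearch": Decimal("0.0015"),
    "topHeadlines": Decimal("0.0015"),
    "topicHeadlines": Decimal("0.0015"),
    "localHeadlines": Decimal("0.0015"),
    "fullStoryCoverage": Decimal("0.0015"),
    "languageList": Decimal("0.0005"),
}


def estimate_cost(operation: str, successful_rows: int, runs: int = 1) -> Decimal:
    """Start fee per run plus the per-row price; error rows cost nothing."""
    total = START_FEE * runs + ROW_PRICE[operation] * successful_rows
    return total.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)
```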

💰 **Bulk article rows are priced at a fraction of a cent each** — cheaper than maintaining your own RSS parsers and proxies, with no news API approval to wait on.

🔗 [View current pricing](https://apify.com/sian.agency/google-news-scraper?fpr=sian)

### ❓ Frequently Asked Questions

**Q: Does Google News have an API?**
A: Google retired its public Google News API years ago. The RSS feeds that remain return thin XML you have to parse yourself, with no re-queryable story or publisher IDs and no X posts. This actor is the practical Google News API alternative: it returns clean JSON for all six operations, with no developer account.

**Q: Do I need a Google News API key?**
A: No. This actor requires **no Google News API key, no RSS setup, and no developer registration** — just an Apify token. Paste your input, click Run, get structured data back.

**Q: How is this different from parsing the Google News RSS feed?**
A: RSS gives you a thin headline list with minimal source metadata, no re-queryable `storyId` or `sourcePublicationId`, and no X posts from full-story coverage. This actor returns rich rows (source name, logo, favicon, `storyId`, `sourcePublicationId`, sub-articles, and, on Full-Story Coverage, X posts), all HTTPS-normalized and ready to export.

**Q: What does Full-Story Coverage return that other scrapers don't?**
A: The entire coverage cloud for one story in a single call — top news, every sub-article, and X (Twitter) posts when Google News surfaces them. Each row is tagged `_rowType: article` or `tweet` and `section: top_news | all_articles | twitter_posts`. No other Google News actor on the Store exposes this.

**Q: Can I monitor my brand with this?**
A: Yes — that's a core use case. Schedule `newsSearch` with your brand query and `timePublished: "1d"` on a cron, then diff the dataset run-to-run to catch every new mention. Pair it with our Trustpilot and Glassdoor scrapers for a full reputation-monitoring stack.

**Q: How do I get a story ID or publisher ID?**
A: Every headline row carries a `storyId` (paste into the `story` field for Full-Story Coverage) and a `sourcePublicationId` (paste into the `topic` field to pull that publisher's feed). Run Top Headlines or Topic Headlines first, then chain.

**Q: Which countries and languages are supported?**
A: Standard two-letter country and language codes (e.g. `us`/`en`, `gb`/`en`, `de`/`de`, `fr`/`fr`). Not every country and language pair is valid. Run the Language List operation with a country code to discover the valid languages for that country.

**Q: What output formats are available?**
A: JSON, CSV, and Excel — export directly from the Apify dataset console, or pull via API.
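
If you pull results via API instead of exporting from the console, you can write a few curated columns to CSV locally. The sketch below is minimal and uses Python's `csv` module. The column choice is an assumption, and `rows_to_csv` is a hypothetical helper.

```python
import csv
import io

# Curated columns from the Output table; any other keys in a row are ignored
CSV_FIELDS = ("publishedDatetimeUtc", "sourceName", "articleTitle", "link", "storyId")


def rows_to_csv(rows, fields=CSV_FIELDS) -> str:
    """Successful rows as CSV text with a fixed column order."""
    buf = io.StringIO()
    # extrasaction="ignore" skips unlisted keys; missing keys become empty cells
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        if row.get("status") == "success":
            writer.writerow(row)
    return buf.getvalue()
```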

**Q: How am I billed, and what about failed lookups?**
A: Pay-per-event — only successful rows are charged, plus a one-time start fee. There is no monthly rental. Failed lookups land as `status:"error"` rows at **$0**.

### 🐛 Troubleshooting

**A run returns `status:"error"` with "temporarily unavailable"**

- The data source hit a transient hiccup. The actor retries automatically with backoff. Re-run after a moment — error rows are never charged.

**Full-Story Coverage returns articles but no X posts**

- X (Twitter) posts are surfaced by Google News on a per-story basis — busy, breaking stories tend to include them; quieter stories return articles only. This is expected, not a bug. Filter rows by `_rowType: tweet` to isolate them when present.

**Topic Headlines returns "not found" or empty**

- Confirm the `topic` value. Use an uppercase topic keyword (`TECHNOLOGY`, `BUSINESS`, `SPORTS`…) **or** a valid `sourcePublicationId` copied from a headline row. A free-text phrase will not work — use News Search for that.

**News Search returns fewer rows than `limit`**

- Thin coverage for a narrow query or a short time window returns fewer rows. Broaden the query, widen `timePublished`, or remove the `source` domain filter.
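
You can automate the broaden-then-retry advice for scheduled searches. The sketch below is minimal and assumes a hypothetical policy. It widens `timePublished` one step at a time and drops the `source` filter only once the window is already `anytime`. The helper names are illustrative.

```python
# timePublished values from the input table, narrowest to widest
WINDOWS_NARROW_TO_WIDE = ["1h", "1d", "7d", "30d", "1y", "anytime"]


def wider_window(current: str):
    """Next wider timePublished value, or None if already at 'anytime'."""
    i = WINDOWS_NARROW_TO_WIDE.index(current)
    return WINDOWS_NARROW_TO_WIDE[i + 1] if i + 1 < len(WINDOWS_NARROW_TO_WIDE) else None


def next_search_input(run_input: dict, rows_returned: int):
    """Broader newsSearch input when a run under-fills its limit, else None."""
    if rows_returned >= run_input.get("limit", 25):  # 25 is the documented default
        return None
    wider = wider_window(run_input.get("timePublished", "anytime"))
    if wider is not None:
        return {**run_input, "timePublished": wider}
    if run_input.get("source"):
        # Already at the widest window: drop the publisher-domain filter instead
        return {k: v for k, v in run_input.items() if k != "source"}
    return None
```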

**Local Headlines returns unexpected results**

- Use a recognizable place name (`New York`, `London`) as the `query` and set a matching `country`. Very small localities may have limited coverage.

### 🧰 More by SIÁN Agency

- [Trustpilot Reviews & Company Data Scraper](https://apify.com/sian.agency/trustpilot-reviews-scraper?fpr=sian) — company reviews & reputation data
- [Glassdoor Scraper](https://apify.com/sian.agency/glassdoor-data-scraper?fpr=sian) — employer reviews, salaries & interviews
- [Instagram AI Transcript Extractor](https://apify.com/sian.agency/instagram-ai-transcript-extractor?fpr=sian) — turn Instagram videos & reels into searchable text
- [Browse all SIÁN actors →](https://apify.com/sian.agency?fpr=sian)

### ⚠️ Trademark Disclaimer

This actor is an independent tool and is **not affiliated with, endorsed by, or sponsored by Google LLC.** "Google News", "Google", and related marks are trademarks of Google LLC, and "X" and "Twitter" are trademarks of their respective owners; all are used here only to describe the publicly available data this tool helps you collect. Use this actor responsibly and in compliance with applicable laws, the relevant platforms' terms of service, and data-protection regulations (including GDPR and CCPA where applicable). You are responsible for how you use the data you extract.

### ⚖️ Is it legal to scrape Google News data?

Our actors are ethical and do not extract any private user data. They only collect content that publishers and users have chosen to share publicly on Google News. We therefore believe that this actor, when used for ethical purposes by Apify users, is safe.

However, you should be aware that your results could contain personal data. Personal data is protected by the **GDPR** in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. You can also read Apify's blog post on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/).

### 🤝 Support

[![Telegram Support](https://img.shields.io/badge/Telegram-Support%20Group-0088cc?logo=telegram)](https://t.me/+vyh1sRE08sAxMGRi)

**Join our active support community**

- For issues or feature requests, open an issue in the actor's repository or use the **Issues** tab on the actor page
- Check [SIÁN Agency Store](https://apify.com/sian.agency?fpr=sian) for more automation tools
- 📧 <apify@sian-agency.online>
- ⭐ If this saves you time, a 5-star review helps us ship more features.

***

**Built by [SIÁN Agency](https://www.sian-agency.online)** | **[More Tools](https://apify.com/sian.agency?fpr=sian)**

# Actor input Schema

## `operation` (type: `string`):

🎯 **PICK ONE OPERATION PER RUN.** Each run produces one clean dataset matching the chosen mode.

- **🔍 News Search** — search articles by keyword + time range (one row per article)
- **📰 Top Headlines** — latest top stories for a country
- **🗂️ Topic / Publisher Headlines** — headlines for a topic (World, Tech, Sports…) **or** a publisher (CNN, BBC, any source)
- **📍 Local Headlines** — geo-based news for a city or region
- **🧵 Full-Story Coverage** — every sub-article of one story **plus** posts from X (Twitter) — the column no other scraper returns
- **🌐 Language List** — valid languages for a country code (helper lookup)

💡 **TIP:** Run Top Headlines or Topic Headlines to discover `storyId` values, then drill into Full-Story Coverage for the complete coverage cloud.

## `query` (type: `string`):

🔍 **Required for `News Search` and `Local Headlines`.**

- **News Search:** any keyword or phrase — `Tesla`, `interest rates`, `"climate summit"`. Combine with the time-range filter below.
- **Local Headlines:** a place name — `New York`, `London`, `Berlin` — to get geo-based news for that location.

⚠️ **Ignored** for all other operations.

## `timePublished` (type: `string`):

⏱️ **For `News Search` only.** Limit results to articles published within a time window.

- `anytime` (default)
- `1h` — last hour
- `1d` — last 24 hours
- `7d` — last week
- `30d` — last month
- `1y` — last year

⚠️ Ignored for all other operations.

## `source` (type: `string`):

🏷️ Optional. For `News Search` only — restrict results to a single publisher domain, e.g. `cnn.com`, `bbc.com`, `reuters.com`. Leave blank for all sources.

⚠️ Ignored for all other operations.

## `topic` (type: `string`):

🗂️ **Required for `Topic / Publisher Headlines`.** Either:

- A **topic** keyword: `WORLD`, `NATIONAL`, `BUSINESS`, `TECHNOLOGY`, `ENTERTAINMENT`, `SPORTS`, `SCIENCE`, `HEALTH`
- A **publisher publication ID** to get a specific outlet's feed (CNN, BBC, etc.). Find it in the `sourcePublicationId` field of any News Search or headline result row.

💡 **TIP:** Run News Search for an outlet, copy the `sourcePublicationId` from a result, and paste it here to pull that publisher's full feed.

⚠️ Ignored for all other operations.

## `story` (type: `string`):

🧵 **Required for `Full-Story Coverage`.** The Google News story identifier.

💡 **How to get it:** Run `Top Headlines` or `Topic / Publisher Headlines` first — each clustered result row carries a `storyId` field. Copy that value here to pull the entire coverage cloud (top news + all sub-articles + posts from X).

⚠️ Ignored for all other operations.

## `storySort` (type: `string`):

Sort order for the sub-articles in `Full-Story Coverage`.

- `RELEVANCE` (default)
- `DATE` — newest first

Ignored for all other operations.

## `country` (type: `string`):

🌍 Two-letter country code for results locale (e.g. `us`, `gb`, `de`, `fr`, `in`). Applies to News Search, Top Headlines, Topic Headlines, Local Headlines, and Language List. Default `us`.

## `language` (type: `string`):

🗣️ Two-letter language code (e.g. `en`, `es`, `de`, `fr`). Applies to News Search, Top Headlines, Topic Headlines, and Local Headlines. Default `en`.

💡 **TIP:** Run the `Language List` operation with a country code to get the valid languages for that country.

## `limit` (type: `integer`):

🔢 **Applies to News Search, Top Headlines, Topic Headlines, and Local Headlines.** Maximum number of articles returned in a single run (one upstream call returns up to this many rows).

💡 **TIP:** Start small (10–25) to preview results before scaling up. Headlines feeds can return up to 500.

⚠️ Ignored for Full-Story Coverage and Language List.

## Actor input object example

```json
{
  "operation": "newsSearch",
  "query": "Tesla",
  "timePublished": "anytime",
  "source": "cnn.com",
  "topic": "TECHNOLOGY",
  "story": "CAAqNggKIjBDQklTSGpvSmMzUnZjbmt0TXpZd1NoRUtEd2pzbFA3X0N4RjlDUlpVVnhudXBpZ0FQAQ",
  "storySort": "RELEVANCE",
  "country": "us",
  "language": "en",
  "limit": 25
}
```

# Actor output Schema

## `results` (type: `string`):

The dataset of scraped news articles, tweets, and language rows.

## `scrapingSummary` (type: `string`):

HTML summary showing successful and failed rows with key run metrics.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

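// Prepare Actor input. `operation` is required and selects the mode;
// `topic` is only read by "topicHeadlines", so it is omitted here.
const input = {
    "operation": "newsSearch",
    "query": "Tesla",
    "timePublished": "7d",
    "limit": 25
};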

// Run the Actor and wait for it to finish
const run = await client.actor("sian.agency/google-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

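# Prepare the Actor input. `operation` is required and selects the mode;
# `topic` is only read by "topicHeadlines", so it is omitted here.
run_input = {
    "operation": "newsSearch",
    "query": "Tesla",
    "timePublished": "7d",
    "limit": 25,
}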

# Run the Actor and wait for it to finish
run = client.actor("sian.agency/google-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
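Building on the Python example, here is a sketch of the Top Headlines then Full-Story Coverage chain. It assumes `client` is an `ApifyClient`. Any object with the same `actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods also works, which keeps the function testable. It also assumes headline rows expose `storyId` as documented above. `top_story_coverage` and `_run_and_collect` are hypothetical names. Treating any status other than `SUCCEEDED` as an error is a choice made here.

```python
from typing import Any, Optional

ACTOR_ID = "sian.agency/google-news-scraper"


def _run_and_collect(client: Any, run_input: dict) -> list[dict]:
    """Start a run, wait for it to finish, and return all dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # ActorClient.call may return None; also stop on FAILED/ABORTED/TIMED-OUT runs.
    if run is None or run.get("status") != "SUCCEEDED":
        raise RuntimeError(f"Actor run did not succeed: {run and run.get('status')}")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def top_story_coverage(client: Any, country: str = "us", language: str = "en") -> Optional[list[dict]]:
    """Fetch top headlines, then the full coverage of the first clustered story."""
    headlines = _run_and_collect(client, {
        "operation": "topHeadlines",
        "country": country,
        "language": language,
        "limit": 25,
    })
    story_id = next((row["storyId"] for row in headlines if row.get("storyId")), None)
    if story_id is None:
        return None  # no clustered story in this batch of headlines
    return _run_and_collect(client, {
        "operation": "fullStoryCoverage",
        "story": story_id,
        "storySort": "DATE",
    })


# Usage (needs a real token and network access):
# from apify_client import ApifyClient
# items = top_story_coverage(ApifyClient("<YOUR_API_TOKEN>"), country="gb")
```

Each step is a separate billed run, so the second run only starts when a `storyId` was found.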

## CLI example

```bash
echo '{
  "operation": "newsSearch",
  "query": "Tesla",
  "timePublished": "7d",
  "limit": 25
}' |
apify call sian.agency/google-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=sian.agency/google-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google News Scraper — Headlines, Topics & Full Coverage",
        "description": "Google News scraper and news API in one actor. Search news by query and time range, pull top headlines by country, topic and publisher feeds (CNN, BBC, Tech, Sports), local geo headlines, and full-story coverage with sub-articles and X posts. Pay per result — no subscription.",
        "version": "1.0",
        "x-build-id": "F5jyI7Y8DzvFZEVUl"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/sian.agency~google-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-sian.agency-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/sian.agency~google-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-sian.agency-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/sian.agency~google-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-sian.agency-google-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "operation"
                ],
                "properties": {
                    "operation": {
                        "title": "🎯 Operation — what do you want to scrape?",
                        "enum": [
                            "newsSearch",
                            "topHeadlines",
                            "topicHeadlines",
                            "localHeadlines",
                            "fullStoryCoverage",
                            "languageList"
                        ],
                        "type": "string",
                        "description": "🎯 **PICK ONE OPERATION PER RUN.** Each run produces one clean dataset matching the chosen mode.\n\n- **🔍 News Search** — search articles by keyword + time range (one row per article)\n- **📰 Top Headlines** — latest top stories for a country\n- **🗂️ Topic / Publisher Headlines** — headlines for a topic (World, Tech, Sports…) **or** a publisher (CNN, BBC, any source)\n- **📍 Local Headlines** — geo-based news for a city or region\n- **🧵 Full-Story Coverage** — every sub-article of one story **plus** posts from X (Twitter) — the column no other scraper returns\n- **🌐 Language List** — valid languages for a country code (helper lookup)\n\n💡 **TIP:** Run Top Headlines or Topic Headlines to discover `storyId` values, then drill into Full-Story Coverage for the complete coverage cloud.",
                        "default": "newsSearch"
                    },
                    "query": {
                        "title": "🔍 Search Query (for News Search / Local Headlines)",
                        "type": "string",
                        "description": "🔍 **Required for `News Search` and `Local Headlines`.**\n\n- **News Search:** any keyword or phrase — `Tesla`, `interest rates`, `\"climate summit\"`. Combine with the time-range filter below.\n- **Local Headlines:** a place name — `New York`, `London`, `Berlin` — to get geo-based news for that location.\n\n⚠️ **Ignored** for all other operations."
                    },
                    "timePublished": {
                        "title": "⏱️ Published Time Range (for News Search)",
                        "enum": [
                            "anytime",
                            "1h",
                            "1d",
                            "7d",
                            "30d",
                            "1y"
                        ],
                        "type": "string",
                        "description": "⏱️ **For `News Search` only.** Limit results to articles published within a time window.\n\n- `anytime` (default)\n- `1h` — last hour\n- `1d` — last 24 hours\n- `7d` — last week\n- `30d` — last month\n- `1y` — last year\n\n⚠️ Ignored for all other operations.",
                        "default": "anytime"
                    },
                    "source": {
                        "title": "🏷️ Source Domain Filter (for News Search)",
                        "type": "string",
                        "description": "🏷️ Optional. For `News Search` only — restrict results to a single publisher domain, e.g. `cnn.com`, `bbc.com`, `reuters.com`. Leave blank for all sources.\n\n⚠️ Ignored for all other operations."
                    },
                    "topic": {
                        "title": "🗂️ Topic or Publisher (for Topic / Publisher Headlines)",
                        "type": "string",
                        "description": "🗂️ **Required for `Topic / Publisher Headlines`.** Either:\n\n- A **topic** keyword: `WORLD`, `NATIONAL`, `BUSINESS`, `TECHNOLOGY`, `ENTERTAINMENT`, `SPORTS`, `SCIENCE`, `HEALTH`\n- A **publisher publication ID** to get a specific outlet's feed (CNN, BBC, etc.). Find it in the `sourcePublicationId` field of any News Search or headline result row.\n\n💡 **TIP:** Run News Search for an outlet, copy the `sourcePublicationId` from a result, and paste it here to pull that publisher's full feed.\n\n⚠️ Ignored for all other operations."
                    },
                    "story": {
                        "title": "🧵 Story ID (for Full-Story Coverage)",
                        "type": "string",
                        "description": "🧵 **Required for `Full-Story Coverage`.** The Google News story identifier.\n\n💡 **How to get it:** Run `Top Headlines` or `Topic / Publisher Headlines` first — each clustered result row carries a `storyId` field. Copy that value here to pull the entire coverage cloud (top news + all sub-articles + posts from X).\n\n⚠️ Ignored for all other operations."
                    },
                    "storySort": {
                        "title": "↕️ Story Sort Order (for Full-Story Coverage)",
                        "enum": [
                            "RELEVANCE",
                            "DATE"
                        ],
                        "type": "string",
                        "description": "Sort order for the sub-articles in `Full-Story Coverage`.\n\n- `RELEVANCE` (default)\n- `DATE` — newest first\n\nIgnored for all other operations.",
                        "default": "RELEVANCE"
                    },
                    "country": {
                        "title": "🌍 Country",
                        "type": "string",
                        "description": "🌍 Two-letter country code for results locale (e.g. `us`, `gb`, `de`, `fr`, `in`). Applies to News Search, Top Headlines, Topic Headlines, Local Headlines, and Language List. Default `us`.",
                        "default": "us"
                    },
                    "language": {
                        "title": "🗣️ Language",
                        "type": "string",
                        "description": "🗣️ Two-letter language code (e.g. `en`, `es`, `de`, `fr`). Applies to News Search, Top Headlines, Topic Headlines, and Local Headlines. Default `en`.\n\n💡 **TIP:** Run the `Language List` operation with a country code to get the valid languages for that country.",
                        "default": "en"
                    },
                    "limit": {
                        "title": "🔢 Max articles to fetch",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "🔢 **Applies to News Search, Top Headlines, Topic Headlines, and Local Headlines.** Maximum number of articles returned in a single run (one upstream call returns up to this many rows).\n\n💡 **TIP:** Start small (10–25) to preview results before scaling up. Headlines feeds can return up to 500.\n\n⚠️ Ignored for Full-Story Coverage and Language List.",
                        "default": 25
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
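For stacks without an Apify client, the `run-sync-get-dataset-items` path in the specification above can be called directly. This standard-library sketch POSTs the input as JSON with the token in the `token` query parameter, as the spec declares. It then parses the dataset items from the JSON response. `build_request` and `run_sync` are hypothetical names. The 320-second client timeout is an assumption, chosen because synchronous endpoints only wait a limited time for the run to finish. Check the Apify API reference for the current limit, and use the `/runs` endpoint for longer runs.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "sian.agency~google-news-scraper"  # "/" in the Actor ID becomes "~" in API paths


def build_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build the POST request for run-sync-get-dataset-items (hypothetical helper)."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, run_input: dict, timeout: float = 320.0) -> list:
    """Send the request and return the dataset items parsed from JSON."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request("<YOUR_API_TOKEN>", {"operation": "topHeadlines", "country": "us"})
print(req.get_method(), req.full_url)
```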
