# Yandex News Scraper (`crawlerbros/yandex-news-scraper`) Actor

Scrape Yandex News stories, article clusters, and trending topics across Russian and CIS markets. Supports keyword search, category browsing, and the trending homepage. Extracts the headline, summary, source, publication date, topic tags, and full cluster of publications for each story.

- **URL**: https://apify.com/crawlerbros/yandex-news-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** News, Developer tools, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded, 7 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage: you pay both a fixed price for specific events and the cost of Apify platform usage.
Because this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
"Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Yandex News Scraper

The only reliable tool for Russian-language media monitoring. Yandex News is Russia's dominant news aggregator, clustering multiple publications around the same event from thousands of sources across Russia and the CIS — data unavailable through any major Western news API.

### What You Can Scrape

- **Story clusters** — headline, summary, category, topic tags, total publication count
- **Individual articles** — per-outlet headline, source name, source URL, article URL, publication date
- **Trending topics** — the "что обсуждают" ("what people are discussing") real-time trending panel on the Yandex News homepage

### Use Cases

- **Brand monitoring** — track Russian and CIS press coverage of your company, product, or executives
- **Geopolitical research** — monitor news events across Russian, Kazakh, Belarusian, and Uzbek media simultaneously
- **Competitive intelligence** — see which stories dominate Russian media on any topic
- **Journalism & academic research** — systematic collection of Russian-language news for media analysis
- **PR agencies** — measure share of voice across CIS markets

### Modes

| Mode | Description |
|------|-------------|
| `searchNews` | Search by keyword or phrase. Supports Russian and English queries. |
| `browseCategory` | Browse a Yandex News topic category (politics, technology, sports, etc.) |
| `getTrending` | Scrape the homepage: trending topic keywords + top story clusters |

### Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | enum | `searchNews` | searchNews, browseCategory, getTrending |
| `searchQueries` | string[] | — | Required for searchNews. E.g. `["Яндекс", "Apple"]` |
| `categoryUrls` | string[] | — | Required for browseCategory. Use `dzen.ru/news/rubric/*` URLs. Confirmed slugs: computers, politics, world, sport, business, science, society, culture, auto, health |
| `domain` | enum | `ru` | ru, kz, by, uz, com |
| `maxStories` | integer | 20 | Maximum story clusters to scrape per query or category (1–200) |
| `sortOrder` | enum | `newest` | newest, popular |
| `language` | enum | `ru` | ru, kk, be, uz, en |
| `proxyConfiguration` | object | RESIDENTIAL | Effectively required: Yandex/Dzen only serves Russian IPs, so use Residential proxies with country `RU` |
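
Because `searchQueries` and `categoryUrls` are only required in their own modes, it can help to catch a missing list or an out-of-range `maxStories` locally before paying for a run. The sketch below assumes Python 3.9+; `build_input` is a hypothetical helper that is not part of the Actor or of `apify-client`, and its default proxy block is copied from the input example in the input schema section.

```python
from typing import Any

# Allowed values, copied from the Input table (hypothetical local mirror).
MODES = {"searchNews", "browseCategory", "getTrending"}
DOMAINS = {"ru", "kz", "by", "uz", "com"}
SORT_ORDERS = {"newest", "popular"}
LANGUAGES = {"ru", "kk", "be", "uz", "en"}
# The list input each mode requires, if any.
REQUIRED_LIST = {"searchNews": "searchQueries", "browseCategory": "categoryUrls"}


def build_input(mode: str = "searchNews", **fields: Any) -> dict[str, Any]:
    """Merge caller fields over the documented defaults and check them locally.

    Hypothetical helper: the Actor itself remains the source of truth.
    """
    run_input: dict[str, Any] = {
        "mode": mode,
        "domain": "ru",
        "maxStories": 20,
        "sortOrder": "newest",
        "language": "ru",
        # Residential proxy pinned to Russia, as in the documented example.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
            "apifyProxyCountry": "RU",
        },
        **fields,
    }
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_LIST.get(mode)
    if required:
        value = run_input.get(required)
        if not isinstance(value, list) or not value:
            raise ValueError(f"{mode} needs a non-empty {required} list")
    for key, allowed in (("domain", DOMAINS), ("sortOrder", SORT_ORDERS), ("language", LANGUAGES)):
        if run_input[key] not in allowed:
            raise ValueError(f"{key}={run_input[key]!r} is not one of {sorted(allowed)}")
    max_stories = run_input["maxStories"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_stories, bool) or not isinstance(max_stories, int) or not 1 <= max_stories <= 200:
        raise ValueError("maxStories must be an integer between 1 and 200")
    return run_input
```

The returned dict can be passed as `run_input` to `client.actor("crawlerbros/yandex-news-scraper").call(run_input=...)`, as in the Python example in the API section.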

### Output

Two linked record types per story, plus a trending record for `getTrending` runs.

#### Story record

```json
{
  "recordType": "story",
  "storyId": "Yandeks_zapustil_novyj_servis-abc123",
  "headline": "Яндекс запустил новый сервис",
  "summary": "Компания объявила о запуске...",
  "category": "Технологии",
  "topicTags": ["Яндекс", "технологии"],
  "articleCount": 12,
  "domain": "ru",
  "storyUrl": "https://news.yandex.ru/story/...",
  "scrapedAt": "2026-05-18T10:00:00Z"
}
````

#### Article record

```json
{
  "recordType": "article",
  "storyId": "Yandeks_zapustil_novyj_servis-abc123",
  "headline": "Яндекс анонсировал новый ИИ-сервис",
  "summary": "Как сообщает РБК, компания...",
  "sourceName": "РБК",
  "sourceUrl": "https://www.rbc.ru/",
  "articleUrl": "https://rbc.ru/technology/...",
  "publishedAt": "2026-05-18",
  "domain": "ru",
  "scrapedAt": "2026-05-18T10:00:00Z"
}
```

#### Trending record

```json
{
  "recordType": "trending",
  "topics": ["Газпром", "выборы", "ИИ", "курс доллара"],
  "domain": "ru",
  "scrapedAt": "2026-05-18T10:00:00Z"
}
```

### Proxy

Yandex News blocks automated requests made without a proxy, and Yandex/Dzen only serves Russian IPs. Use Apify Residential Proxy with the country set to `RU`, configured via the `proxyConfiguration` input.

### FAQ

**Which Yandex News domains are supported?**
All five: `ru` (Russia), `kz` (Kazakhstan), `by` (Belarus), `uz` (Uzbekistan), `com` (international).

**How do story and article records link together?**
Every `article` record has a `storyId` field matching its parent `story` record's `storyId`. Filter by `recordType` to separate the two.
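
Since every article points to its story through `storyId`, a run's dataset can be turned into nested stories in a single pass. A minimal sketch, assuming `items` is the list of dataset records (for example collected from `iterate_items()` in the Python client); `group_by_story` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any, Iterable


def group_by_story(items: Iterable[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Nest article records under their parent story record, keyed by storyId.

    Trending records (no storyId) are skipped, and so are articles whose
    parent story is not among the items.
    """
    stories: dict[str, dict[str, Any]] = {}
    articles_by_story: defaultdict[str, list[dict[str, Any]]] = defaultdict(list)
    for item in items:
        record_type = item.get("recordType")
        if record_type == "story":
            # Copy so the caller's record is not mutated.
            stories[item["storyId"]] = {**item, "articles": []}
        elif record_type == "article":
            articles_by_story[item["storyId"]].append(item)
    for story_id, story in stories.items():
        story["articles"] = articles_by_story.get(story_id, [])
    return stories
```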

**What is the `articleCount` field?**
The total number of publications Yandex reports for the story cluster. It reflects how widely a story was covered and does not depend on `maxStories`, which caps the number of story clusters, not articles.

**Can I search in English?**
Yes — set `language: "en"` and use English queries. Results will include English-language sources indexed by Yandex News.

**Why do some stories have fewer articles than `articleCount`?**
Yandex's reported count can differ from the visible cluster list because some articles are paywalled, geo-blocked, or de-listed. The Actor scrapes only what Yandex renders.
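
To see how much of each cluster a run actually captured, the scraped `article` records can be counted against the reported `articleCount`. A minimal sketch, assuming `items` holds the run's dataset records; `coverage_report` is a hypothetical helper. A ratio below 1 matches the behaviour described above and does not by itself signal a failed run.

```python
from typing import Any, Iterable, Optional


def coverage_report(items: Iterable[dict[str, Any]]) -> dict[str, tuple[int, int, Optional[float]]]:
    """Map storyId -> (article records scraped, articleCount reported, ratio).

    The ratio is None when the story record carries no articleCount.
    """
    items = list(items)  # iterated twice below
    scraped: dict[str, int] = {}
    for item in items:
        if item.get("recordType") == "article":
            scraped[item["storyId"]] = scraped.get(item["storyId"], 0) + 1
    report: dict[str, tuple[int, int, Optional[float]]] = {}
    for item in items:
        if item.get("recordType") == "story":
            story_id = item["storyId"]
            got = scraped.get(story_id, 0)
            reported = item.get("articleCount") or 0
            report[story_id] = (got, reported, got / reported if reported else None)
    return report
```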

# Actor input Schema

## `mode` (type: `string`):

Scraping mode: search by keyword, browse a category, or get trending stories.

## `searchQueries` (type: `array`):

One or more search queries (required for searchNews mode). E.g. 'Яндекс', 'Ukraine', 'Apple'.

## `categoryUrls` (type: `array`):

Yandex News category page URLs (required for browseCategory mode). Use dzen.ru/news/rubric/\* URLs. Confirmed working slugs: computers, politics, world, sport, business, science, society, culture, auto, health. Legacy news.yandex.ru/\*.html URLs are auto-converted.
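
Category URLs follow the `dzen.ru/news/rubric/<slug>` pattern, so they can be built from the confirmed slugs. A minimal sketch, assuming that pattern holds for every slug; `category_urls` is a hypothetical helper, and it does not attempt the legacy `news.yandex.ru` conversion, which the Actor handles itself.

```python
RUBRIC_BASE = "https://dzen.ru/news/rubric/"

# Slugs the input description lists as confirmed working.
CONFIRMED_SLUGS = frozenset({
    "computers", "politics", "world", "sport", "business",
    "science", "society", "culture", "auto", "health",
})


def category_urls(*slugs: str, strict: bool = True) -> list[str]:
    """Build dzen.ru/news/rubric/<slug> URLs for the categoryUrls input.

    With strict=True, slugs outside the confirmed list are rejected locally;
    pass strict=False to try other slugs anyway.
    """
    if strict:
        unknown = [s for s in slugs if s not in CONFIRMED_SLUGS]
        if unknown:
            raise ValueError(f"not a confirmed rubric slug: {unknown}")
    return [RUBRIC_BASE + s for s in slugs]
```

For example, `category_urls("politics")` yields the same URL used in the input example below.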

## `domain` (type: `string`):

Regional Yandex News edition to scrape (via dzen.ru).

## `maxStories` (type: `integer`):

Maximum number of story clusters to scrape per query or category (1–200).

## `sortOrder` (type: `string`):

Sort order for search and category results.

## `language` (type: `string`):

Content language filter. Sets Accept-Language header and Yandex language preference.

## `proxyConfiguration` (type: `object`):

Required. Yandex/Dzen only serves Russian IPs. Use Apify Residential proxy with country=RU for reliable access.

## Actor input object example

```json
{
  "mode": "searchNews",
  "searchQueries": [
    "Яндекс"
  ],
  "categoryUrls": [
    "https://dzen.ru/news/rubric/politics"
  ],
  "domain": "ru",
  "maxStories": 3,
  "sortOrder": "newest",
  "language": "ru",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "RU"
  }
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset with scraped Yandex News stories, articles, and trending topics

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "searchNews",
    "searchQueries": [
        "Яндекс"
    ],
    "categoryUrls": [
        "https://dzen.ru/news/rubric/politics"
    ],
    "domain": "ru",
    "maxStories": 3,
    "sortOrder": "newest",
    "language": "ru",
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ],
        "apifyProxyCountry": "RU"
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/yandex-news-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "searchNews",
    "searchQueries": ["Яндекс"],
    "categoryUrls": ["https://dzen.ru/news/rubric/politics"],
    "domain": "ru",
    "maxStories": 3,
    "sortOrder": "newest",
    "language": "ru",
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
        "apifyProxyCountry": "RU",
    },
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/yandex-news-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "searchNews",
  "searchQueries": [
    "Яндекс"
  ],
  "categoryUrls": [
    "https://dzen.ru/news/rubric/politics"
  ],
  "domain": "ru",
  "maxStories": 3,
  "sortOrder": "newest",
  "language": "ru",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ],
    "apifyProxyCountry": "RU"
  }
}' |
apify call crawlerbros/yandex-news-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/yandex-news-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Yandex News Scraper",
        "description": "Scrape Yandex News stories, article clusters, and trending topics across Russian and CIS markets. Supports keyword search, category browsing, and trending homepage. Extracts headline, summary, source, publication date, topic tags, and full cluster of publications per story",
        "version": "1.0",
        "x-build-id": "FaQXBBIfqZB7EYuB2"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~yandex-news-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-yandex-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~yandex-news-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-yandex-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~yandex-news-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-yandex-news-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "searchNews",
                            "browseCategory",
                            "getTrending"
                        ],
                        "type": "string",
                        "description": "Scraping mode: search by keyword, browse a category, or get trending stories.",
                        "default": "searchNews"
                    },
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "One or more search queries (required for searchNews mode). E.g. 'Яндекс', 'Ukraine', 'Apple'.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "categoryUrls": {
                        "title": "Category URLs",
                        "type": "array",
                        "description": "Yandex News category page URLs (required for browseCategory mode). Use dzen.ru/news/rubric/* URLs. Confirmed working slugs: computers, politics, world, sport, business, science, society, culture, auto, health. Legacy news.yandex.ru/*.html URLs are auto-converted.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "domain": {
                        "title": "Domain / Market",
                        "enum": [
                            "ru",
                            "kz",
                            "by",
                            "uz",
                            "com"
                        ],
                        "type": "string",
                        "description": "Regional Yandex News edition to scrape (via dzen.ru).",
                        "default": "ru"
                    },
                    "maxStories": {
                        "title": "Max Stories",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum number of story clusters to scrape per query or category (1–200).",
                        "default": 20
                    },
                    "sortOrder": {
                        "title": "Sort Order",
                        "enum": [
                            "newest",
                            "popular"
                        ],
                        "type": "string",
                        "description": "Sort order for search and category results.",
                        "default": "newest"
                    },
                    "language": {
                        "title": "Language",
                        "enum": [
                            "ru",
                            "kk",
                            "be",
                            "uz",
                            "en"
                        ],
                        "type": "string",
                        "description": "Content language filter. Sets Accept-Language header and Yandex language preference.",
                        "default": "ru"
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Required. Yandex/Dzen only serves Russian IPs. Use Apify Residential proxy with country=RU for reliable access."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
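
For projects without an Apify client, the `run-sync-get-dataset-items` path from the specification can be called with the Python standard library alone. A minimal sketch, assuming the endpoint returns the dataset items as a JSON array (the spec above does not describe the response body); `build_request` and `run_and_get_items` are hypothetical names. The token is sent in the `token` query parameter, as the spec declares.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "crawlerbros~yandex-news-scraper"


def build_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Build the POST request for /acts/{ACTOR_ID}/run-sync-get-dataset-items."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items?{query}"
    # Keep Cyrillic queries readable in the JSON body; encode as UTF-8.
    body = json.dumps(run_input, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, run_input: dict[str, Any], timeout: float = 600) -> list[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return its dataset items."""
    with urllib.request.urlopen(build_request(token, run_input), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because this call blocks until the run finishes, long runs may be better served by the `/runs` endpoint, which returns information about the started run immediately.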
