# YouTube Comment Scraper (`scrapesmith/youtube-comment-scraper`) Actor

Scrape millions of comments from any YouTube video with no API key. Supports all URL formats. Returns author info, like counts, reply counts, publish time, and creator hearts.

- **URL**: https://apify.com/scrapesmith/youtube-comment-scraper.md
- **Developed by:** [Scrape Smith](https://apify.com/scrapesmith) (community)
- **Categories:** Social media, Automation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.45 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
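
As a rough illustration of the arithmetic, the sketch below turns the listed starting price of $0.45 per 1,000 results into a lower-bound estimate. The function name `estimate_min_cost_usd` is hypothetical, and the real charge depends on which events are billed and on the pricing that applies to your account.

```python
from decimal import Decimal

# Listed starting price from the Pricing section: $0.45 per 1,000 results.
PRICE_PER_1000_USD = Decimal("0.45")


def estimate_min_cost_usd(result_count: int) -> Decimal:
    """Return a lower-bound cost estimate in USD for `result_count` results."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_USD * result_count / 1000
    return cost.quantize(Decimal("0.01"))  # round to whole cents


# Lower-bound estimate for 100,000 results.
print(estimate_min_cost_usd(100_000))
```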

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YouTube Comment Scraper — Apify Actor

Extract comments from any YouTube video at scale. Scrape millions of comments with author details, like counts, reply counts, publish dates, pinned status, and creator hearts — no YouTube API key, no login required.

---

### What This Actor Does

YouTube Comment Scraper uses YouTube's internal Innertube API (`/next` endpoint) to collect comments from multiple videos concurrently. It handles pagination automatically, resumes after interruptions, and enforces per-run limits to stay within your Apify compute budget.

**Supported input formats:**
- Full watch URLs: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`
- Short URLs: `https://youtu.be/dQw4w9WgXcQ`
- YouTube Shorts: `https://www.youtube.com/shorts/abc123`
- Plain 11-character video IDs: `dQw4w9WgXcQ`
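
To make the accepted formats concrete, here is a minimal sketch of how an 11-character video ID could be pulled out of each of them. It is not the Actor's own parser: `extract_video_id` is a hypothetical helper, and the allowed ID alphabet (`A-Z`, `a-z`, `0-9`, `_`, `-`) is an assumption.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumed shape of a video ID: exactly 11 URL-safe characters.
_VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the video ID from a watch, short, or Shorts URL or a bare ID."""
    value = value.strip()
    if _VIDEO_ID.fullmatch(value):
        return value  # already a bare ID

    try:
        parsed = urlparse(value if "://" in value else "https://" + value)
    except ValueError:
        return None  # not parseable as a URL
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    parts = [part for part in parsed.path.split("/") if part]

    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]  # https://youtu.be/<id>
    elif host in ("youtube.com", "m.youtube.com"):
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] == "shorts":
            candidate = parts[1]  # https://www.youtube.com/shorts/<id>

    if candidate and _VIDEO_ID.fullmatch(candidate):
        return candidate
    return None  # unsupported or malformed input
```

A caller could use the `None` result to skip an entry and log a warning.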

---

### Output Fields

Each comment is saved as one row in the Apify dataset:

| Field | Type | Description |
|---|---|---|
| `videoId` | string | YouTube video ID |
| `videoUrl` | string | Full watch URL |
| `commentId` | string | Unique YouTube comment ID |
| `text` | string | Full comment text |
| `authorName` | string | Display name of the commenter |
| `authorChannelId` | string | Channel ID (`UCxxx...`) |
| `authorUrl` | string | Link to the commenter's channel |
| `likeCount` | integer | Number of likes on the comment |
| `replyCount` | integer | Number of replies |
| `publishedAt` | string | Relative time (e.g. `3 months ago`) |
| `isPinned` | boolean | Pinned by the creator |
| `isHearted` | boolean | Hearted by the creator |
| `status` | string | `ok` / `no_comments` / `error` |
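
The table above maps naturally onto a typed record. The sketch below mirrors the field names in a `TypedDict` and shows one way to keep only `ok` rows and rank them by likes. `CommentRow` and `most_liked` are hypothetical names, and the sketch assumes that rows with a `no_comments` or `error` status may lack the comment fields.

```python
from typing import Iterable, List, TypedDict


class CommentRow(TypedDict, total=False):
    """One dataset row; keys follow the Output Fields table."""

    videoId: str
    videoUrl: str
    commentId: str
    text: str
    authorName: str
    authorChannelId: str
    authorUrl: str
    likeCount: int
    replyCount: int
    publishedAt: str
    isPinned: bool
    isHearted: bool
    status: str


def most_liked(rows: Iterable[CommentRow], limit: int = 3) -> List[CommentRow]:
    """Return up to `limit` rows with status "ok", highest likeCount first."""
    ok_rows = [row for row in rows if row.get("status") == "ok"]
    ok_rows.sort(key=lambda row: row.get("likeCount", 0), reverse=True)
    return ok_rows[:limit]
```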

---

### Input Configuration

#### `videoIds` (required)
A list of YouTube video URLs or IDs. Accepts any supported format: watch URL, short URL, Shorts URL, or a bare 11-character ID. Invalid entries are skipped and a warning is logged.

#### `maxComments` (default: `0` = unlimited)
Maximum comments to collect **per video**. Set to `0` to collect everything YouTube will return. Must be a non-negative integer.

#### `sortBy` (default: `newest`)
Controls comment sort order:
- **`newest`** — Sorts by most recent. Supports **unlimited pagination** — the only way to scrape all comments on a large video.
- **`top`** — Sorts by highest engagement. ⚠️ YouTube limits anonymous top-comment pagination to approximately **800–900 comments** regardless of how large the video's comment section is. Use `top` only when you want a representative sample of the most popular comments.
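
The three input fields and their constraints can be checked on the client side before a run is started. The sketch below does that. `build_input` is a hypothetical helper, and the warning threshold of 900 is taken from the upper end of the approximate cap described above, not from a documented constant.

```python
import warnings
from typing import Any, Dict, List

ALLOWED_SORT_ORDERS = ("newest", "top")
TOP_SORT_APPROX_CAP = 900  # upper end of the ~800-900 range quoted above


def build_input(video_ids: List[str], max_comments: int = 0,
                sort_by: str = "newest") -> Dict[str, Any]:
    """Validate the three input fields and return the Actor input object."""
    if not video_ids:
        raise ValueError("videoIds must contain at least one URL or ID")
    if isinstance(max_comments, bool) or not isinstance(max_comments, int):
        raise ValueError("maxComments must be an integer")
    if max_comments < 0:
        raise ValueError("maxComments must be non-negative")
    if sort_by not in ALLOWED_SORT_ORDERS:
        raise ValueError(f"sortBy must be one of {ALLOWED_SORT_ORDERS}")
    # With sortBy=top, asking for more than the cap cannot be satisfied.
    if sort_by == "top" and (max_comments == 0 or max_comments > TOP_SORT_APPROX_CAP):
        warnings.warn("sortBy=top is expected to stop at roughly 800-900 comments")
    return {"videoIds": list(video_ids), "maxComments": max_comments, "sortBy": sort_by}
```

The returned dictionary has the same shape as the Actor input object, so it can be passed as the run input in the client examples.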

---

### Use Cases

**Sentiment analysis and brand monitoring**
Collect every comment on your brand's YouTube content or competitor videos to run NLP pipelines, detect negative sentiment spikes, and track topic trends over time.

**Academic and social media research**
Extract large comment datasets for linguistic analysis, toxicity detection, misinformation studies, or community behaviour research without needing Google API quotas.

**Content strategy and audience insights**
Identify what phrases, questions, and topics your audience repeatedly brings up. Find recurring requests for future content or products by mining comments at scale.

**Influencer and creator vetting**
Pull comments from a creator's recent videos to gauge real audience engagement, detect bot activity (low-variety comment text, high like counts with suspicious patterns), and verify authenticity before partnerships.

**Competitor intelligence**
Scrape comments on competitor product videos to surface complaints, feature requests, and pain points your product can address.

**Moderation and compliance**
Build comment moderation pipelines by extracting raw comment data and running it through your own classifiers for hate speech, spam, or policy violations.

**Lead generation**
Scrape comments on industry-relevant videos to find users asking purchasing questions or expressing intent — a common tactic for outreach in niche communities.

**Training data for AI models**
Generate large, real-world conversation datasets for fine-tuning language models, training comment classifiers, or building YouTube-specific recommendation systems.

---

### Performance

- Scrapes approximately **1,000–3,000 comments per minute** per video depending on network latency
- Processes multiple videos **concurrently** (configurable via `workers` in advanced settings)
- Pages are fetched and pushed to the dataset **immediately** — no waiting for all pages to finish before data appears
- Progress is saved after every page. If the Actor is interrupted or migrated, it resumes from the exact page where it left off
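
For planning purposes, the quoted throughput range translates into a run-time window for a single video. The sketch below does that division. `estimate_minutes` is a hypothetical helper, and it ignores concurrency across videos, retries, and start-up time.

```python
from typing import Tuple

# Throughput range quoted in the list above, in comments per minute per video.
MIN_RATE_PER_MIN = 1_000
MAX_RATE_PER_MIN = 3_000


def estimate_minutes(comment_count: int) -> Tuple[float, float]:
    """Return (fastest, slowest) estimated minutes to scrape one video."""
    if comment_count < 0:
        raise ValueError("comment_count must be non-negative")
    return comment_count / MAX_RATE_PER_MIN, comment_count / MIN_RATE_PER_MIN


fastest, slowest = estimate_minutes(100_000)
print(f"about {fastest:.0f} to {slowest:.0f} minutes")
```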

---

### Limitations

| Limitation | Details |
|---|---|
| **Top sort cap** | `sortBy=top` returns a maximum of ~800–900 comments. This is a YouTube server-side limit for unauthenticated requests, not a bug. Use `sortBy=newest` for full extraction. |
| **No reply thread expansion** | This Actor collects top-level comments. Reply threads (nested comments) are not expanded in this version. |
| **Relative timestamps** | `publishedAt` returns YouTube's relative time string (`3 months ago`), not an absolute ISO date, because the Innertube API does not return parsed timestamps for comments without authentication. |
| **No API key required** | Uses the Innertube internal API — no YouTube Data API v3 key or quota needed. |
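
If an approximate absolute date is good enough, the relative `publishedAt` string can be converted after the fact. The sketch below assumes English strings of the form `<number> <unit> ago` (as in `3 months ago`) and treats a month as 30 days and a year as 365 days, so the result is only an estimate. `approximate_published_at` is a hypothetical helper.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Approximate unit lengths in seconds; months and years are averaged.
_UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 3_600,
    "day": 86_400,
    "week": 7 * 86_400,
    "month": 30 * 86_400,
    "year": 365 * 86_400,
}
_RELATIVE = re.compile(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago")


def approximate_published_at(relative: str, now: datetime) -> Optional[datetime]:
    """Estimate an absolute timestamp from a string such as '3 months ago'."""
    match = _RELATIVE.search(relative.lower())
    if match is None:
        return None  # unrecognised wording, leave it to the caller
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(seconds=amount * _UNIT_SECONDS[unit])


scraped_at = datetime(2025, 1, 8, tzinfo=timezone.utc)
print(approximate_published_at("3 months ago", scraped_at))
```

Passing the scrape time as `now` keeps the estimate anchored to when the string was produced, not to when it is parsed.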

---

### Sort Order Guide

````

sortBy=newest  →  Use for: complete datasets, research, archiving
Supports: unlimited comments (tested to 100,000+)
Order: most recent first

sortBy=top     →  Use for: finding viral/popular comments, quick samples
Supports: ~800-900 comments maximum
Order: highest engagement first

````

---

### Example Output

```json
{
  "videoId": "dQw4w9WgXcQ",
  "videoUrl": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "commentId": "Ugzge340dBgB75hWBm54AaABAg",
  "text": "can confirm: he never gave us up",
  "authorName": "SomeUser",
  "authorChannelId": "UCxxxxxxxxxxxxxxxxxxxxxxxx",
  "authorUrl": "https://www.youtube.com/@SomeUser",
  "likeCount": 12400,
  "replyCount": 83,
  "publishedAt": "11 months ago",
  "isPinned": false,
  "isHearted": true,
  "status": "ok"
}
````

***

### Migration and Resume

If the Apify platform migrates your Actor run to a different server mid-scrape, the Actor automatically resumes from where it left off:

- **Completed videos** are stored in the `COMMENTS_STATE` KV store key and skipped on restart
- **In-progress videos** save a cursor (the current page continuation token) to `CURSOR_{videoId}` before every page fetch — on restart, scraping continues from that exact page rather than page 1
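
The resume behaviour described above can be sketched with a plain dictionary standing in for the key-value store. This is an illustration of the pattern, not the Actor's source: `scrape_video`, `make_fetcher`, and the page format are hypothetical. Only the key names `COMMENTS_STATE` and `CURSOR_{videoId}` come from the description.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (comments, next continuation token)
FetchPage = Callable[[str, Optional[str]], Page]


def scrape_video(video_id: str, kv: Dict[str, Any], fetch_page: FetchPage,
                 sink: List[str]) -> None:
    """Scrape one video, persisting progress in `kv` so a rerun can resume."""
    done = kv.setdefault("COMMENTS_STATE", [])
    if video_id in done:
        return  # finished in an earlier run, skip entirely

    cursor_key = f"CURSOR_{video_id}"
    cursor = kv.get(cursor_key)  # None means "start from the first page"
    while True:
        kv[cursor_key] = cursor  # save the cursor before fetching the page
        comments, cursor = fetch_page(video_id, cursor)
        sink.extend(comments)
        if cursor is None:  # no continuation token: last page reached
            break

    done.append(video_id)
    kv.pop(cursor_key, None)


def make_fetcher(pages: Dict[Optional[str], Page], fail_on_call: int = 0) -> FetchPage:
    """Build a fake fetcher that raises on the given 1-based call number."""
    calls = {"count": 0}

    def fetch_page(video_id: str, cursor: Optional[str]) -> Page:
        calls["count"] += 1
        if calls["count"] == fail_on_call:
            raise RuntimeError("simulated interruption")
        return pages[cursor]

    return fetch_page
```

In this sketch a page can be fetched twice if the interruption lands between the push and the next cursor save, so downstream code may want to deduplicate on `commentId`.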

***

### Frequently Asked Questions

**Do I need a YouTube API key?**
No. This Actor uses YouTube's internal Innertube API, the same endpoint YouTube's own web client uses. No API key, OAuth, or Google Cloud project is required.

**Can I scrape comments from private or age-restricted videos?**
No. Only publicly accessible videos can be scraped. Private, members-only, and age-restricted videos require authentication, which this Actor does not use.

**Why does `sortBy=top` stop at ~800 comments?**
YouTube's servers limit unauthenticated pagination of top-sorted comments to approximately 60 pages (~20 comments per page). This is a YouTube-side restriction. Use `sortBy=newest` to bypass this and collect all comments.

**Can I scrape multiple videos at once?**
Yes. Add as many URLs or IDs as you want to `videoIds`. The Actor processes them concurrently using configurable parallel workers.

**What happens if the run is interrupted?**
The Actor saves its progress (both completed videos and the current page cursor for in-progress videos) to Apify's key-value store. On restart, it picks up exactly where it left off.

**Is there a comment count limit?**
For free users, the Actor caps output at 100 comments. Paid users can configure `maxComments` per video and the platform-level dataset limit via Apify's standard `ACTOR_MAX_PAID_DATASET_ITEMS` environment variable. Setting `maxComments=0` means unlimited (subject to your platform quota).

**Does it scrape replies to comments?**
Not in the current version. Only top-level comments are collected. Reply thread expansion is planned for a future release.

**How fast is it?**
Roughly 1,000–3,000 comments per minute per video, depending on your network and the number of concurrent workers. The Actor pushes results page by page (every ~20 comments) rather than waiting for all pages to finish, so data appears in your dataset immediately.

**What URL formats are accepted?**
All standard YouTube URL formats: `youtube.com/watch?v=ID`, `youtu.be/ID`, `youtube.com/shorts/ID`, `youtube.com/embed/ID`, and plain 11-character video IDs.

**Can I export results to CSV or Excel?**
Yes. Apify datasets can be downloaded in JSON, CSV, Excel, XML, RSS, and HTML formats directly from the Apify console or via the Apify API.

**Will this get my IP blocked?**
The Actor uses randomised client version rotation and respects natural pagination timing. For high-volume scraping across many videos, enabling Apify's residential proxy (available on paid plans) is recommended to distribute requests across different IP addresses.

***

### Legal and Ethical Use

This Actor collects publicly available data from YouTube's web interface, identical to what any user sees in their browser. Users are responsible for ensuring their use of collected data complies with YouTube's Terms of Service, applicable data protection regulations (GDPR, CCPA), and their jurisdiction's laws. Do not use this tool to harass individuals, build surveillance systems, or violate user privacy.

***

### Technical Notes

- Built on YouTube's Innertube `/next` endpoint (WEB client, no authentication)
- Comment data extracted from `frameworkUpdates.entityBatchUpdate.mutations` — YouTube's 2024+ polymorphic rendering system
- Continuation tokens sourced from the panel content area (unlimited pagination) rather than sort-menu action tokens (60-page cap)
- Session rotation across 4 client versions on error
- PushBuffer batches dataset writes every 50 items or 3 seconds to reduce API calls
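
The batching rule in the last bullet (flush at 50 items or after 3 seconds) can be sketched as follows. Only the name `PushBuffer` and the two thresholds come from the notes above. The constructor arguments, the injected `clock`, and the choice to check the timer only when an item is added are assumptions, and a real implementation might flush from a background timer instead.

```python
import time
from typing import Any, Callable, Dict, List

Item = Dict[str, Any]


class PushBuffer:
    """Collect items and hand them to `push` in batches."""

    def __init__(self, push: Callable[[List[Item]], None], max_items: int = 50,
                 max_age_secs: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._push = push
        self._max_items = max_items
        self._max_age_secs = max_age_secs
        self._clock = clock
        self._items: List[Item] = []
        self._last_flush = clock()

    def add(self, item: Item) -> None:
        """Buffer one item and flush if either threshold has been reached."""
        self._items.append(item)
        too_many = len(self._items) >= self._max_items
        too_old = self._clock() - self._last_flush >= self._max_age_secs
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Push any buffered items as one batch and restart the timer."""
        if self._items:
            self._push(self._items)
            self._items = []  # new list, so the pushed batch stays intact
        self._last_flush = self._clock()
```

Calling `flush()` once more at the end of a run would push whatever is still buffered.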

# Actor input Schema

## `videoIds` (type: `array`):

One or more YouTube videos to scrape comments from. Each line can be a full watch URL (https://www.youtube.com/watch?v=...), a short URL (https://youtu.be/...), a Shorts URL (https://www.youtube.com/shorts/...), or a plain 11-character video ID.

## `maxComments` (type: `integer`):

Maximum number of comments to collect per video. Set to 0 for unlimited. Note: sortBy=top is hard-capped at ~800-900 by YouTube regardless of this value.

## `sortBy` (type: `string`):

'top' returns highest-engagement comments but YouTube caps it at ~800-900. 'newest' supports unlimited pagination — recommended for large videos.

## Actor input object example

```json
{
  "videoIds": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/dQw4w9WgXcQ",
    "dQw4w9WgXcQ"
  ],
  "maxComments": 1000,
  "sortBy": "newest"
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "videoIds": [
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapesmith/youtube-comment-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "videoIds": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"] }

# Run the Actor and wait for it to finish
run = client.actor("scrapesmith/youtube-comment-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "videoIds": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ]
}' |
apify call scrapesmith/youtube-comment-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapesmith/youtube-comment-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YouTube Comment Scraper",
        "description": "Scrape millions of comments from any YouTube video with no API key. Supports all URL formats. Returns author info, like counts, reply counts, publish time, and creator hearts.",
        "version": "0.0",
        "x-build-id": "i0Boqi9uYbcuYM2ah"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapesmith~youtube-comment-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapesmith-youtube-comment-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapesmith~youtube-comment-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scrapesmith-youtube-comment-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapesmith~youtube-comment-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scrapesmith-youtube-comment-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "videoIds"
                ],
                "properties": {
                    "videoIds": {
                        "title": "YouTube Video URLs or IDs",
                        "type": "array",
                        "description": "One or more YouTube videos to scrape comments from. Each line can be a full watch URL (https://www.youtube.com/watch?v=...), a short URL (https://youtu.be/...), a Shorts URL (https://www.youtube.com/shorts/...), or a plain 11-character video ID.",
                        "items": {
                            "type": "string"
                        },
                        "default": []
                    },
                    "maxComments": {
                        "title": "Max Comments per Video",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum number of comments to collect per video. Set to 0 for unlimited. Note: sortBy=top is hard-capped at ~800-900 by YouTube regardless of this value.",
                        "default": 10
                    },
                    "sortBy": {
                        "title": "Sort Order",
                        "enum": [
                            "top",
                            "newest"
                        ],
                        "type": "string",
                        "description": "'top' returns highest-engagement comments but YouTube caps it at ~800-900. 'newest' supports unlimited pagination — recommended for large videos.",
                        "default": "newest"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
