YouTube Transcript, Comment, and Metadata Scraper

This actor scrapes YouTube videos for full transcripts (captions), the first page of comments, and key metadata (title, channel, views, and likes). It can discover videos based on search queries or scrape a specific list of video IDs.

Developer: Visita AI & Automation (maintained by Community)

This actor combines two tools:

  • Playwright loads the page, handles popups, scrolls, and scrapes metadata and comments.
  • The youtube-caption-extractor library fetches transcripts directly, avoiding common browser-based scraping failures.

Features

  • Scrapes full video transcripts (captions) in your chosen language.
  • Scrapes the first page of comments (approx. 20 comments).
  • Scrapes metadata: title, channel, view count, and like count.
  • Discover Mode: Finds videos to scrape based on search queries.
  • Scrape Mode: Scrapes a specific, user-provided list of video IDs.

Input Configuration

The actor's behavior is controlled by the input, which has the following fields:

| Field | Type | Description |
|---|---|---|
| runMode | String | Required. Choose the actor's operating mode. discover: Find new videos using search. scrape: Scrape specific videos from videoIDs. |
| discoverConfig | Object | Configuration for Discover Mode. |
| scrapeConfig | Object | Configuration for Scrape Mode. |
| lang | String | The language code for the transcript you want (e.g., en, es, fr). Defaults to en. |

discoverConfig Settings

| Field | Type | Description |
|---|---|---|
| searchQueries | Array | Required. A list of search terms to find videos. The actor will use the first one. |
| searchCategory | String | Optional. A category keyword (e.g., "Sport", "News") to append to the search. |
| uploadDate | String | Optional. This filter is not yet implemented in the code. |
| videoDuration | String | Optional. This filter is not yet implemented in the code. |
| maxResultsPerQuery | Integer | The maximum number of videos to find for the search query. Defaults to 5. |
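
Because only the first entry of searchQueries is used, one way to cover several search terms is to prepare a separate Discover Mode input for each term and start one run per input. The following is a minimal sketch under that assumption; the helper name inputs_per_query is hypothetical and not part of the actor, while the field names come from the tables above.

```python
import json


def inputs_per_query(queries, max_results=5, lang="en", category=None):
    """Build one Discover Mode input per search term.

    The actor reads only the first entry of searchQueries, so each
    input produced here carries exactly one term.
    """
    inputs = []
    for query in queries:
        discover_config = {
            "searchQueries": [query],
            "maxResultsPerQuery": max_results,
        }
        # searchCategory is optional, so it is added only when provided.
        if category:
            discover_config["searchCategory"] = category
        inputs.append(
            {
                "runMode": "discover",
                "discoverConfig": discover_config,
                "lang": lang,
            }
        )
    return inputs


if __name__ == "__main__":
    # Print each input as JSON, ready to paste into the actor's input editor.
    for run_input in inputs_per_query(["python tutorial", "rust tutorial"]):
        print(json.dumps(run_input, indent=2))
```

uploadDate and videoDuration are left out on purpose, since the actor does not apply them yet.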

scrapeConfig Settings

| Field | Type | Description |
|---|---|---|
| videoIDs | Array | Required. A list of YouTube video IDs (e.g., xZCbAki4puY) to scrape. |
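
Scrape Mode expects bare video IDs, not URLs. If your list contains full links, a small helper can normalize them before the run. This sketch assumes IDs are 11 characters drawn from letters, digits, "-" and "_", and handles the common watch, youtu.be, shorts, and embed URL shapes. The function names to_video_id and build_scrape_input are hypothetical and not part of the actor.

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 URL-safe characters.
_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")


def to_video_id(value):
    """Return the video ID for a bare ID or a common YouTube URL."""
    value = value.strip()
    if _ID_RE.match(value):
        return value

    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [p for p in parsed.path.split("/") if p]
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]

    if candidate and _ID_RE.match(candidate):
        return candidate
    raise ValueError(f"Could not extract a video ID from {value!r}")


def build_scrape_input(values, lang="en"):
    """Build a Scrape Mode input, dropping duplicate IDs but keeping order."""
    video_ids = []
    for value in values:
        video_id = to_video_id(value)
        if video_id not in video_ids:
            video_ids.append(video_id)
    return {
        "runMode": "scrape",
        "scrapeConfig": {"videoIDs": video_ids},
        "lang": lang,
    }


if __name__ == "__main__":
    print(build_scrape_input(["xZCbAki4puY", "https://youtu.be/xZCbAki4puY"]))
```

Removing duplicates up front avoids scraping, and being charged for, the same video twice in one run.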

Output Structure

The actor saves its results to the dataset, which will be displayed in the Output tab. Each item represents one scraped video.

| Field | Type | Description |
|---|---|---|
| videoId | String | The unique ID of the scraped video. |
| title | String | The full title of the video. |
| channel | String | The name of the YouTube channel. |
| views | String | The view count (e.g., "1.2M views"). |
| likes | String | The like count (e.g., "10K likes"). |
| transcriptMerged | String | The full, merged transcript as a single block of text. |
| comments | String | A JSON string containing an array of comment objects. Each object has { author, text, likes }. |
| _chargeStatus | String | A status message showing what you were charged for (e.g., "Metadata: Charged, Captions: Charged..."). |
| error | String | If an error occurred for this video, it will be noted here. |
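
Since views, likes, and comments are all returned as strings, downstream code usually needs a small normalization step. The sketch below shows one possible approach. It assumes counts look like the examples in the table ("1.2M views", "10K likes") or are plain numbers with thousands separators, and that comments is valid JSON when present. The helper names are hypothetical.

```python
import json
import re

_MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number, then an optional K/M/B suffix that ends at a word boundary.
_COUNT_RE = re.compile(r"(\d[\d.,]*)\s*([KMB])?\b", re.IGNORECASE)


def parse_count(text):
    """Convert a count string such as "1.2M views" to an int, or None."""
    if not text:
        return None
    match = _COUNT_RE.search(text)
    if not match:
        return None
    try:
        number = float(match.group(1).replace(",", ""))
    except ValueError:
        return None
    suffix = (match.group(2) or "").upper()
    return int(round(number * _MULTIPLIERS.get(suffix, 1)))


def parse_comments(raw):
    """Decode the comments JSON string into a list of comment dicts."""
    if isinstance(raw, list):
        return raw
    if not raw:
        return []
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    return data if isinstance(data, list) else []


def normalize_item(item):
    """Return a copy of a dataset item with numeric counts and decoded comments."""
    normalized = dict(item)
    normalized["views"] = parse_count(item.get("views"))
    normalized["likes"] = parse_count(item.get("likes"))
    normalized["comments"] = parse_comments(item.get("comments"))
    return normalized
```

Abbreviated counts are approximate by nature: "1.2M views" becomes 1200000, which is a rounded figure, not the exact view count.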

Limitations

  • Comments: The actor currently scrapes only the first page of comments (approx. 20). It does not perform infinite scrolling to load all comments.
  • Discover Filters: The uploadDate and videoDuration filters in "Discover Mode" are not yet implemented. The actor will find the top results regardless of these settings.

💰 Pricing (Pay-Per-Event)

This actor uses a Pay-Per-Event (PPE) pricing model. You pay a tiny fee to start the actor, and then a separate, small fee for each piece of data you successfully retrieve for each video.

This gives you granular control over your costs: you pay only for the data actually retrieved. If a video has no transcript available, for example, no captions-retrieved charge is made for it.

| Event Name | Title | Description |
|---|---|---|
| apify-actor-start | Actor Start Fee | This is the recommended base fee for initiating the actor run. |
| metadata-retrieved | Metadata Retrieved | Charged per video for successfully scraping its metadata (title, channel, views, etc.). |
| captions-retrieved | Captions Retrieved | Charged per video only if a transcript is successfully found and extracted. |
| comments-retrieved | Comments Retrieved | Charged per video only if comments are successfully found and scraped. |
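
To see how the per-event charges add up, the total for a run can be estimated as the start fee plus each event's price multiplied by the number of videos for which that event fired. The sketch below uses the event names from the table above. The prices are hypothetical placeholders chosen only to make the arithmetic concrete, so check the actor's pricing panel for the real values.

```python
from decimal import Decimal

# Hypothetical per-event prices in USD, for illustration only.
HYPOTHETICAL_PRICES = {
    "apify-actor-start": Decimal("0.01"),
    "metadata-retrieved": Decimal("0.002"),
    "captions-retrieved": Decimal("0.005"),
    "comments-retrieved": Decimal("0.003"),
}


def estimate_cost(metadata, captions, comments, runs=1, prices=HYPOTHETICAL_PRICES):
    """Estimate the total charge from the number of times each event fired.

    metadata, captions, and comments are the counts of videos for which
    that piece of data was successfully retrieved.
    """
    counts = {
        "apify-actor-start": runs,
        "metadata-retrieved": metadata,
        "captions-retrieved": captions,
        "comments-retrieved": comments,
    }
    return sum((prices[event] * n for event, n in counts.items()), Decimal("0"))


if __name__ == "__main__":
    # 100 videos scraped; 80 had a transcript and 90 had comments.
    print(estimate_cost(metadata=100, captions=80, comments=90))
```

Decimal is used instead of float so that small per-event prices sum exactly.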

You can set your own prices for these events in the Publication tab of the actor's settings.