
Youtube Text Scraper

Pricing

from $30.00 / 1,000 results

Developer

Fabio Borsotti

Maintained by Community

YouTube Transcript Scraper Actor

Extract YouTube transcripts, subtitles, video metadata, hashtags, thumbnails, views, duration, and release dates from YouTube videos using either a search query or a list of direct YouTube URLs. This YouTube transcript scraper helps you turn YouTube search results and specific video links into structured JSON data for research, SEO, lead generation, monitoring, and content analysis.

What does YouTube Transcript Scraper do?

YouTube Transcript Scraper is an Apify Actor that searches YouTube by keyword, applies a time filter, collects video metadata, and retrieves the first available transcript according to your preferred language order. It can also process direct YouTube video URLs and include those videos in the final output.

This Actor is ideal if you want to:

  • feed retrieval-augmented generation (RAG) pipelines for AI applications
  • scrape YouTube transcripts
  • extract YouTube subtitles
  • collect YouTube video metadata
  • monitor YouTube search results
  • build datasets from YouTube videos
  • automate YouTube content research
  • extract transcripts from specific YouTube videos

The Actor uses:

  • pytubefix for YouTube search and metadata extraction
  • youtube-transcript-api for transcript and subtitle retrieval
  • Scrape.do as a proxy layer

Why use this YouTube scraper?

If you work with YouTube data, transcripts are one of the most valuable sources of structured content. Video transcripts help you analyze what creators actually say, not just what appears in titles and descriptions.

This Actor is useful because it supports two collection modes in the same run:

  • search by keyword using query
  • direct processing of specific YouTube videos using direct_url

This makes it practical both for broad monitoring and for targeted transcript extraction from known videos.

Common use cases

  • SEO research for YouTube videos and keywords
  • competitor monitoring on YouTube
  • AI training and text dataset preparation
  • content repurposing from video to text
  • lead generation from niche YouTube channels
  • trend monitoring by time window (last hour, today, this week, this month, this year)
  • transcript-based topic clustering
  • YouTube video catalog enrichment
  • transcript extraction from manually selected videos

What data does the Actor extract?

For each processed YouTube video, the Actor can return:

  • YouTube video ID
  • video title
  • video URL
  • channel name
  • channel URL
  • transcript text with timestamps
  • available subtitles metadata
  • video view count
  • video duration in seconds
  • release date
  • thumbnail URL
  • keywords / hashtags

Why this Actor is useful

This YouTube Transcript Scraper is useful when you need structured YouTube data without building and maintaining your own scraping workflow. It is designed for users who want fast access to transcript-rich video data in a reusable JSON format.

Compared with a basic YouTube metadata scraper, this Actor focuses on transcript extraction and subtitle discovery, which makes it especially helpful for:

  • content intelligence
  • SEO workflows
  • machine learning pipelines
  • research automation
  • enrichment of video datasets

Input

Supported input fields

  • direct_url (array of strings, optional): list of direct YouTube video URLs to process. Supported formats include youtube.com/watch?v=..., youtu.be/..., and youtube.com/shorts/...
  • query (string, optional): YouTube search query. Optional if direct_url is provided
  • range (array of strings): time filter for YouTube search, one of hour, today, this_week, this_month, this_year
  • limit (integer): maximum number of videos to process from search results
  • langs (array of strings): preferred language order used when selecting transcripts
  • file_output (string): name of the JSON file saved in the key-value store

At least one of query or direct_url must be provided.
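The three supported direct_url formats differ only in where the video ID sits. The following Python sketch shows one way to normalize them to a bare video ID. The function name extract_video_id is hypothetical (it is not part of the Actor), and the 11-character ID pattern is an assumption about YouTube IDs rather than something this Actor documents.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Assumption: YouTube video IDs are 11 characters from [A-Za-z0-9_-].
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a watch, youtu.be, or shorts URL, else None."""
    url = url.strip()
    if "://" not in url:
        # Accept scheme-less forms such as "youtu.be/<id>".
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]

    if host == "youtu.be":
        # youtu.be/<id>: the ID is the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" and parsed.path == "/watch":
        # youtube.com/watch?v=<id>: the ID is the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    elif host == "youtube.com" and parsed.path.startswith("/shorts/"):
        # youtube.com/shorts/<id>: the ID follows the /shorts/ prefix.
        candidate = parsed.path[len("/shorts/"):].split("/")[0]
    else:
        return None

    return candidate if VIDEO_ID_RE.fullmatch(candidate) else None
```

For example, extract_video_id("youtu.be/9bZkp7q19f0") returns "9bZkp7q19f0", while a URL on another host returns None.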

Example input with direct URLs only

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0"
  ],
  "langs": ["it", "en"],
  "file_output": "output.json"
}

Example input with query and direct URLs

{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "query": "apify tutorial",
  "range": ["this_week"],
  "limit": 5,
  "langs": ["it", "en"],
  "file_output": "output.json"
}
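Before starting a run, you may want to check an input object like the examples above against the rules listed for the input fields. The sketch below is a hypothetical client-side validator (validate_input is not part of the Actor). Treating limit as a positive integer is an assumption; the other checks restate the documented field types and the allowed range values.

```python
from typing import Any

# Time filters listed for the range field in this README.
ALLOWED_RANGES = {"hour", "today", "this_week", "this_month", "this_year"}


def validate_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems in an Actor input; an empty list means it looks valid."""
    errors: list[str] = []
    query = (actor_input.get("query") or "").strip()
    direct_urls = actor_input.get("direct_url") or []

    # Documented rule: at least one of query or direct_url must be provided.
    if not query and not direct_urls:
        errors.append("provide at least one of 'query' or 'direct_url'")

    if not isinstance(direct_urls, list) or not all(isinstance(u, str) for u in direct_urls):
        errors.append("'direct_url' must be an array of strings")

    ranges = actor_input.get("range") or []
    if not isinstance(ranges, list):
        errors.append("'range' must be an array of strings")
    else:
        unknown = [r for r in ranges if r not in ALLOWED_RANGES]
        if unknown:
            errors.append(f"unknown range value(s): {unknown}")

    # Assumption: limit must be a positive integer. bool is rejected explicitly
    # because isinstance(True, int) is True in Python.
    limit = actor_input.get("limit")
    if limit is not None and (isinstance(limit, bool) or not isinstance(limit, int) or limit < 1):
        errors.append("'limit' must be a positive integer")

    langs = actor_input.get("langs") or []
    if not isinstance(langs, list) or not all(isinstance(code, str) for code in langs):
        errors.append("'langs' must be an array of language codes")

    return errors
```

Run against the second example input, validate_input returns an empty list; an input with only langs returns the "at least one of" error.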

When both direct_url and query are provided, all videos listed in direct_url are processed and added to the output together with the search results.
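Combining the two sources amounts to concatenating the direct-URL items with the search-result items. The sketch below does that with a hypothetical merge_results helper. Dropping a search hit whose id already appeared among the direct URLs is an assumption of this sketch; the Actor's documentation does not say whether it de-duplicates.

```python
from typing import Any, Iterable


def merge_results(
    direct_items: Iterable[dict[str, Any]],
    search_items: Iterable[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Combine direct-URL items and search-result items into one output list."""
    merged: list[dict[str, Any]] = []
    seen: set[str] = set()
    # Direct-URL videos are listed first, so a duplicate search hit is the copy that gets dropped.
    for item in [*direct_items, *search_items]:
        video_id = item.get("id")
        if video_id is not None:
            # Assumption: a video returned by both paths is emitted only once.
            if video_id in seen:
                continue
            seen.add(video_id)
        merged.append(item)
    return merged
```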

Output

Each dataset item contains structured YouTube transcript and metadata fields like these:

{
  "id": "video_id",
  "title": "Video title",
  "url": "https://www.youtube.com/watch?v=video_id",
  "autor": "Channel name",
  "text": "[00:00] transcript text...",
  "channel_name": "Channel name",
  "channel_url": "https://www.youtube.com/@channel",
  "video_title": "Detailed video title",
  "video_url": "https://www.youtube.com/watch?v=video_id",
  "views_count": 123456,
  "duration_seconds": 542,
  "release_date": "2026-04-13T00:00:00",
  "thumbnail_url": "https://i.ytimg.com/vi/video_id/maxresdefault.jpg",
  "hashtags": ["apify", "youtube", "scraping"],
  "subtitles": [
    {
      "language_code": "en",
      "language": "English",
      "is_generated": true,
      "is_translatable": true
    }
  ]
}
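The text field stores the transcript as timestamped lines such as "[00:00] transcript text...". A minimal sketch of producing that shape from snippet dictionaries with text, start, and duration keys (the raw snippet shape youtube-transcript-api has historically returned) follows. The helper names are hypothetical, and joining lines with newlines and using an H:MM:SS form for offsets past one hour are assumptions, not confirmed Actor behavior.

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Format a start offset as MM:SS, or H:MM:SS past one hour (assumed format)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def transcript_to_text(snippets: list[dict[str, Any]]) -> str:
    """Build "[MM:SS] text" lines from transcript snippets, one line per snippet."""
    lines = []
    for snippet in snippets:
        # Collapse internal line breaks and repeated spaces inside a caption.
        text = " ".join(snippet["text"].split())
        if text:
            lines.append(f"[{format_timestamp(snippet['start'])}] {text}")
    return "\n".join(lines)
```

Two snippets starting at 0.0 and 65.4 seconds become "[00:00] ..." and "[01:05] ..." lines.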

The Actor also saves the complete output array to the Apify key-value store under the file name specified in file_output.

How the YouTube transcript extraction works

The Actor follows this strategy:

  1. If direct_url is provided, process each direct YouTube URL and extract transcript and metadata.
  2. If query is provided, search YouTube videos using the selected time filter.
  3. For each video, try to find a manually created transcript in the languages you requested.
  4. If no manual transcript is available, try an auto-generated transcript.
  5. If no requested language is available, fall back to the first available transcript.
  6. Save transcript text and structured metadata in the output dataset.
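Steps 3 to 5 describe an order of preference. The sketch below is one reading of that order over plain data, so it runs without network access: manual transcripts in any requested language first (walking langs in order), then auto-generated ones, then whatever comes first. TranscriptInfo and select_transcript are hypothetical names. youtube-transcript-api offers similar lookups on its transcript list objects (for example find_manually_created_transcript and find_generated_transcript), but whether the Actor uses them is not documented here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TranscriptInfo:
    """Minimal stand-in for one entry of the subtitles list in the output."""
    language_code: str
    is_generated: bool


def select_transcript(
    available: Sequence[TranscriptInfo], langs: Sequence[str]
) -> Optional[TranscriptInfo]:
    """Pick a transcript using the preference order from steps 3 to 5."""
    # Steps 3-4: manual transcripts first, then auto-generated ones,
    # each pass walking the requested languages in order.
    for want_generated in (False, True):
        for lang in langs:
            for transcript in available:
                if transcript.language_code == lang and transcript.is_generated == want_generated:
                    return transcript
    # Step 5: no requested language matched, so fall back to the first available transcript.
    return available[0] if available else None
```

With langs set to ["it", "en"], a video offering a manual German transcript and a generated English one resolves to the generated English transcript, because a requested language wins over the step 5 fallback.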

This makes the Actor practical for multilingual transcript scraping, targeted video extraction, and broad topic monitoring.

Who is this Actor for?

This Actor is a good fit for:

  • SEO specialists
  • marketers
  • data engineers
  • content analysts
  • AI and machine learning data teams
  • researchers
  • agencies monitoring YouTube niches
  • users who need transcripts from specific YouTube links

Limitations

The Actor works only with public YouTube videos that have transcripts enabled. Private or restricted videos are not supported.

Summary

If you need a YouTube transcript scraper for Apify that extracts subtitles, transcript text, hashtags, thumbnails, views, duration, and release dates from either search results or direct video URLs, this Actor gives you a clean starting point with structured JSON output and proxy support.