Youtube Text Scraper
Pricing
from $30.00 / 1,000 results
Developer
Fabio Borsotti
21 hours ago
Last modified
YouTube Transcript Scraper Actor
Extract YouTube transcripts, subtitles, video metadata, hashtags, thumbnails, views, duration, and release dates from YouTube videos using either a search query or a list of direct YouTube URLs. This YouTube transcript scraper helps you turn YouTube search results and specific video links into structured JSON data for research, SEO, lead generation, monitoring, and content analysis.
What does YouTube Transcript Scraper do?
YouTube Transcript Scraper is an Apify Actor that searches YouTube videos by keyword, applies a time filter, collects video metadata, and retrieves the first available transcript based on your preferred language order. It can also process direct YouTube video URLs and include those videos in the final output.
This Actor is ideal if you want to:
- build text corpora for RAG (retrieval-augmented generation) and other AI pipelines
- scrape YouTube transcripts
- extract YouTube subtitles
- collect YouTube video metadata
- monitor YouTube search results
- build datasets from YouTube videos
- automate YouTube content research
- extract transcripts from specific YouTube videos
The Actor uses:
- pytubefix for YouTube search and metadata extraction
- youtube-transcript-api for transcript and subtitle retrieval
- Scrape.do as a proxy layer
Why use this YouTube scraper?
If you work with YouTube data, transcripts are one of the most valuable sources of structured content. Video transcripts help you analyze what creators actually say, not just what appears in titles and descriptions.
This Actor is useful because it supports two collection modes in the same run:
- search by keyword using query
- direct processing of specific YouTube videos using direct_url
This makes it practical both for broad monitoring and for targeted transcript extraction from known videos.
Common use cases
- SEO research for YouTube videos and keywords
- competitor monitoring on YouTube
- AI training and text dataset preparation
- content repurposing from video to text
- lead generation from niche YouTube channels
- trend monitoring by date range
- transcript-based topic clustering
- YouTube video catalog enrichment
- transcript extraction from manually selected videos
What data does the Actor extract?
For each processed YouTube video, the Actor can return:
- YouTube video ID
- video title
- video URL
- channel name
- channel URL
- transcript text with timestamps
- metadata for the available subtitle tracks
- video view count
- video duration in seconds
- release date
- thumbnail URL
- keywords / hashtags
Why this Actor is useful
This YouTube Transcript Scraper is useful when you need structured YouTube data without building and maintaining your own scraping workflow. It is designed for users who want fast access to transcript-rich video data in a reusable JSON format.
Compared with a basic YouTube metadata scraper, this Actor focuses on transcript extraction and subtitle discovery, which makes it especially helpful for:
- content intelligence
- SEO workflows
- machine learning pipelines
- research automation
- enrichment of video datasets
Input
Supported input fields
- direct_url (array of strings, optional): list of direct YouTube video URLs to process. Supported formats include youtube.com/watch?v=..., youtu.be/..., and youtube.com/shorts/...
- query (string, optional): YouTube search query. Optional if direct_url is provided
- range (array of strings): time filter for YouTube search, one of hour, today, this_week, this_month, this_year
- limit (integer): maximum number of videos to process from search results
- langs (array of strings): preferred language order used when selecting transcripts
- file_output (string): name of the JSON file saved in the key-value store
At least one of query and direct_url must be provided.
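The rules above can be checked before starting a run. The following is a minimal sketch, not the Actor's own validation code: the function name validate_input and the error messages are hypothetical, and the requirement that limit be a positive integer is an assumption not stated in the field list.

```python
from typing import Any

# Allowed values for the 'range' field, as listed in the input description.
ALLOWED_RANGES = {"hour", "today", "this_week", "this_month", "this_year"}


def validate_input(actor_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the input; an empty list means none were found."""
    problems: list[str] = []

    direct_url = actor_input.get("direct_url") or []
    query = (actor_input.get("query") or "").strip()

    # Documented rule: at least one of query / direct_url is required.
    if not direct_url and not query:
        problems.append("provide at least one of 'query' or 'direct_url'")

    # 'range' is an array of strings drawn from a fixed set of values.
    for value in actor_input.get("range") or []:
        if value not in ALLOWED_RANGES:
            problems.append(f"unsupported range value: {value!r}")

    # Assumption (not in the docs): a limit, when given, is a positive integer.
    limit = actor_input.get("limit")
    if limit is not None and (not isinstance(limit, int) or limit < 1):
        problems.append("'limit' must be a positive integer")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with the input at once.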
Example input with direct URLs only
{
  "direct_url": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
    "https://youtu.be/9bZkp7q19f0"
  ],
  "langs": ["it", "en"],
  "file_output": "output.json"
}
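If you want to check your own URL list before passing it to direct_url, the three supported formats can be recognized with the standard library. This is a sketch under assumptions: extract_video_id is a hypothetical helper, and the Actor's actual URL handling may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a supported YouTube URL, or None if unrecognized."""
    # Accept URLs written without a scheme, e.g. "youtu.be/abc".
    if "://" not in url:
        url = "https://" + url
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    segments = [s for s in parsed.path.split("/") if s]

    # youtu.be/<id>
    if host == "youtu.be":
        return segments[0] if segments else None
    if host == "youtube.com":
        # youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # youtube.com/shorts/<id>
        if len(segments) >= 2 and segments[0] == "shorts":
            return segments[1]
    return None
```

URLs that return None here are worth reviewing by hand before a run.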
Example input with query and direct URLs
{
  "direct_url": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "query": "apify tutorial",
  "range": ["this_week"],
  "limit": 5,
  "langs": ["it", "en"],
  "file_output": "output.json"
}
When both direct_url and query are provided, all videos listed in direct_url are processed and added to the output together with the search results.
Output
Each dataset item contains structured YouTube transcript and metadata fields like these:
{
  "id": "video_id",
  "title": "Video title",
  "url": "https://www.youtube.com/watch?v=video_id",
  "autor": "Channel name",
  "text": "[00:00] transcript text...",
  "channel_name": "Channel name",
  "channel_url": "https://www.youtube.com/@channel",
  "video_title": "Detailed video title",
  "video_url": "https://www.youtube.com/watch?v=video_id",
  "views_count": 123456,
  "duration_seconds": 542,
  "release_date": "2026-04-13T00:00:00",
  "thumbnail_url": "https://i.ytimg.com/vi/video_id/maxresdefault.jpg",
  "hashtags": ["apify", "youtube", "scraping"],
  "subtitles": [
    {
      "language_code": "en",
      "language": "English",
      "is_generated": true,
      "is_translatable": true
    }
  ]
}
The Actor also saves the complete output array into the Apify key-value store using the file name specified in file_output.
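For downstream analysis it is often convenient to split the text field into timed segments. The sketch below assumes each segment starts with a marker shaped like [MM:SS] or [HH:MM:SS], which is inferred from the sample value "[00:00] transcript text..." and not confirmed by the Actor. split_transcript is a hypothetical helper.

```python
import re

# Assumed marker shape: [MM:SS] or [HH:MM:SS], inferred from the sample "text" value.
TIMESTAMP = re.compile(r"\[(?:(\d+):)?(\d+):(\d{2})\]")


def split_transcript(text: str) -> list[tuple[int, str]]:
    """Split transcript text into (start_seconds, segment_text) pairs."""
    matches = list(TIMESTAMP.finditer(text))
    segments: list[tuple[int, str]] = []
    for index, match in enumerate(matches):
        hours, minutes, seconds = match.groups()
        start = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
        # A segment runs from the end of its marker to the start of the next marker.
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        segments.append((start, text[match.end():end].strip()))
    return segments
```

Text without any recognizable marker yields an empty list, so a caller can detect transcripts that do not follow the assumed format.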
How the YouTube transcript extraction works
The Actor follows this strategy:
- If direct_url is provided, process each direct YouTube URL and extract transcript and metadata.
- If query is provided, search YouTube videos using the selected time filter.
- For each video, try to find a manually created transcript in the languages you requested.
- If no manual transcript is available, try an auto-generated transcript.
- If no requested language is available, fall back to the first available transcript.
- Save transcript text and structured metadata in the output dataset.
This makes the Actor practical for multilingual transcript scraping, targeted video extraction, and broad topic monitoring.
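The transcript selection order described above can be sketched over plain track records shaped like the entries in the subtitles output field (language_code, is_generated). This illustrates the described fallback order and is not the Actor's source code: pick_transcript is a hypothetical name, and it assumes a manual transcript in any requested language is preferred over an auto-generated one in a higher-priority language.

```python
from typing import Optional


def pick_transcript(tracks: list[dict], langs: list[str]) -> Optional[dict]:
    """Choose one track: manual in a preferred language, then auto-generated, then any."""
    # Pass 1 looks at manually created tracks, pass 2 at auto-generated ones.
    # Both passes honor the order of `langs`.
    for want_generated in (False, True):
        for lang in langs:
            for track in tracks:
                if track["language_code"] == lang and track["is_generated"] == want_generated:
                    return track
    # No requested language matched, so fall back to the first available track.
    return tracks[0] if tracks else None
```

With langs set to ["it", "en"], a video that only has a manual English track and an auto-generated English track resolves to the manual one, and a video with only a French track still returns that track.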
Who is this Actor for?
This Actor is a good fit for:
- SEO specialists
- marketers
- data engineers
- content analysts
- AI and machine learning data teams
- researchers
- agencies monitoring YouTube niches
- users who need transcripts from specific YouTube links
Limitations
The Actor works only with public YouTube videos that have transcripts enabled. Private or restricted videos are not supported.
Summary
If you need a YouTube transcript scraper for Apify that extracts subtitles, transcript text, hashtags, thumbnails, views, duration, and release date from either search results or direct video URLs, this Actor gives you a clean starting point with structured JSON output and proxy support.