YouTube Scraper
Pricing
from $1.50 / 1,000 base video rows
Search YouTube, export channel and video data, and pull transcripts for shortlisted videos. No API key, no browser. Fast HTTP-only research workflow.
Developer
kane liu (maintained by Community)
YouTube Research & Transcript Scraper
Search YouTube, export channel video lists, enrich selected videos, and collect transcripts without setting up the YouTube Data API.
This Apify Actor is built for YouTube research workflows where you do not want to scrape everything at the most expensive level. Start with broad discovery, shortlist the videos that matter, then run metadata enrichment or transcript extraction only on that smaller set.
Best for
- YouTube keyword research and topic mapping
- competitor and creator channel monitoring
- content audits for brands, agencies, and media teams
- transcript collection for LLM, RAG, summarization, and qualitative research pipelines
- building structured YouTube datasets from search results, channel pages, and known video URLs
How it works
The Actor accepts three input types. You can use one, two, or all three in the same run.
| Input | Use it when | Result source |
|---|---|---|
| searchQueries | You want to discover videos by topic or keyword | YouTube search results |
| channelUrls | You want recent videos from one or more channels | Channel /videos pages with RSS fallback when possible |
| videoUrls | You already know the exact videos to process | Direct video metadata paths with fallback metadata extraction |
Rows are deduplicated by videoId, so the same video is only pushed once even if it appears in multiple inputs.
At least one of searchQueries, channelUrls, or videoUrls must contain a value. Empty input is rejected so the Actor does not create a misleading dataset row or charge for a helper item.
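The two rules above (reject empty input, push each videoId once) can be sketched in Python. This illustrates the described behaviour and is not the Actor's source code; the function names are hypothetical, and treating whitespace-only strings as empty is an assumption.

```python
INPUT_KEYS = ("searchQueries", "channelUrls", "videoUrls")


def has_usable_input(actor_input: dict) -> bool:
    """Return True if at least one input list contains a non-blank string."""
    for key in INPUT_KEYS:
        values = actor_input.get(key) or []
        # Assumption: whitespace-only entries count as empty.
        if any(isinstance(v, str) and v.strip() for v in values):
            return True
    return False


def dedupe_by_video_id(rows: list[dict]) -> list[dict]:
    """Keep the first row seen for each videoId, preserving order."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        video_id = row.get("videoId")
        if video_id is None or video_id in seen:
            continue
        seen.add(video_id)
        unique.append(row)
    return unique
```

The same check is handy on the client side before starting a run, since an empty input is rejected anyway.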
Recommended workflow
1. Discover videos cheaply
Use searchQueries or channelUrls first. Keep scrapeDetails and includeTranscript off while you are still exploring.
{"searchQueries": ["ai workflow automation", "youtube competitor analysis"],"maxResults": 50}
This gives you a clean shortlist with titles, URLs, channels, thumbnails, rough publish text, durations, view counts when available, and descriptions when present in the search result.
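If you start runs from code rather than the Apify Console, a discovery run could look roughly like the sketch below. It assumes the third-party `apify-client` package is installed; `actor_id` is a placeholder you would copy from the Actor's Store page, and both helper names are hypothetical.

```python
def build_discovery_input(queries: list[str], max_results: int = 50) -> dict:
    """Build a discovery-only input with the heavier modes switched off."""
    return {
        "searchQueries": list(queries),
        "maxResults": max_results,
        "scrapeDetails": False,
        "includeTranscript": False,
    }


def run_discovery(token: str, actor_id: str, queries: list[str]) -> list[dict]:
    """Run the Actor once and return its dataset items as a list of dicts."""
    # Imported lazily so the input builder works without the package.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=build_discovery_input(queries))
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Setting the two flags to false explicitly is redundant with the documented defaults, but it makes the intent of a cheap first pass obvious in code review.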
2. Review and shortlist
Filter the dataset outside the Actor. Pick only the videos you actually need for deeper work.
Useful shortlist signals:
- topic relevance from title and description
- creator or company from channelName
- popularity from viewCount
- freshness from publishedText or publishedAt
- video length from duration or durationSeconds
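As one possible way to apply these signals, the sketch below filters exported rows by keyword, view count, and length. The function name and thresholds are hypothetical; only the field names come from the output schema.

```python
def shortlist(rows, keywords, min_views=0, max_seconds=None):
    """Return rows that match any keyword and pass the view and length filters."""
    lowered = [k.lower() for k in keywords]
    picked = []
    for row in rows:
        # Topic relevance: look for any keyword in title or description.
        text = f"{row.get('title') or ''} {row.get('description') or ''}".lower()
        if not any(k in text for k in lowered):
            continue
        # Popularity: a missing view count is treated as 0 here.
        if (row.get("viewCount") or 0) < min_views:
            continue
        # Length: rows without durationSeconds are kept rather than dropped.
        seconds = row.get("durationSeconds")
        if max_seconds is not None and seconds is not None and seconds > max_seconds:
            continue
        picked.append(row)
    return picked
```

Because viewCount can be 0 when the source does not expose it, a min_views filter above zero will also drop rows whose view count is simply unknown.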
3. Enrich selected videos
Use videoUrls with scrapeDetails when you need stronger metadata for specific videos.
{"videoUrls": ["https://www.youtube.com/watch?v=XVv6mJpFOb0","https://youtu.be/dQw4w9WgXcQ"],"scrapeDetails": true}
scrapeDetails may improve or fill:
- publishedAt
- category
- description
- viewCount
It is best used after shortlisting because it performs extra requests per video.
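To move from a shortlist to an enrichment run, the url field of each shortlisted row can be fed back in as videoUrls. A minimal sketch, with a hypothetical helper name:

```python
def build_enrichment_input(rows, include_transcript=False, language="en"):
    """Turn shortlisted rows into a videoUrls input for a detail run."""
    urls = []
    for row in rows:
        url = row.get("url")
        if url and url not in urls:  # keep order, skip repeats
            urls.append(url)
    actor_input = {"videoUrls": urls, "scrapeDetails": True}
    if include_transcript:
        actor_input["includeTranscript"] = True
        actor_input["transcriptLanguage"] = language
    return actor_input
```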
4. Collect transcripts only when needed
Use includeTranscript for videos where you actually need text, timestamps, or LLM-ready content.
{"videoUrls": ["https://www.youtube.com/watch?v=XVv6mJpFOb0"],"scrapeDetails": true,"includeTranscript": true,"transcriptLanguage": "en"}
When a transcript is available, the row includes timestamped transcript segments and a combined plain-text transcript. If YouTube does not provide captions for the video, or the captions cannot be fetched, the Actor still returns the video row without transcript fields.
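Since a row can come back without transcript fields, downstream code should check for them instead of assuming they exist. A small sketch (the helper name is hypothetical):

```python
def split_by_transcript(rows):
    """Separate rows that carry transcript text from rows that do not."""
    with_text, without_text = [], []
    for row in rows:
        # transcriptText is absent when captions were unavailable.
        if row.get("transcriptText"):
            with_text.append(row)
        else:
            without_text.append(row)
    return with_text, without_text
```

The second list is a natural retry queue, for example with a different transcriptLanguage.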
Input reference
| Field | Type | Default | Description |
|---|---|---|---|
| searchQueries | array of strings | empty | YouTube search keywords. Each query runs separately and can return up to maxResults videos. Best for discovery and SEO or market research. |
| channelUrls | array of strings | empty | YouTube channel inputs. Supports @handle, UC channel IDs, and common youtube.com channel, c, and user URLs. Returns recent public videos; RSS fallback is used when the channel page does not expose rows. |
| videoUrls | array of strings | empty | Exact YouTube videos to process. Supports 11-character video IDs and common watch, shorts, embed, live, and youtu.be URL formats. Best for enrichment and transcripts. |
| maxResults | integer | 50 | Maximum videos per search query or channel. It does not multiply direct videoUrls; each provided video URL is processed once. |
| scrapeDetails | boolean | false | Fetches richer metadata for each row. Use on shortlists or smaller runs. |
| includeTranscript | boolean | false | Attempts transcript extraction for each video. Use on targeted runs because this is the heaviest mode. |
| transcriptLanguage | string | en | Preferred transcript language code, such as en, es, fr, de, ja, or pt. If that language is unavailable, the Actor can fall back to the first available caption track. |
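If you want to normalize videoUrls yourself before a run (for example, to deduplicate a spreadsheet of links), the formats listed for videoUrls can be reduced to a video ID roughly as follows. This is a sketch of one way to do it, not the Actor's own parser; the function name and the exact host checks are assumptions.

```python
import re
from urllib.parse import parse_qs, urlparse

VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
PATH_PREFIXES = ("shorts", "embed", "live")


def extract_video_id(value: str):
    """Return the 11-character video ID from an ID or common URL, else None."""
    value = value.strip()
    if VIDEO_ID_RE.match(value):
        return value  # already a bare video ID
    parsed = urlparse(value)
    host = (parsed.hostname or "").lower()
    parts = [p for p in parsed.path.split("/") if p]
    candidate = None
    if host == "youtu.be" and parts:
        candidate = parts[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif len(parts) >= 2 and parts[0] in PATH_PREFIXES:
            candidate = parts[1]
    if candidate and VIDEO_ID_RE.match(candidate):
        return candidate
    return None
```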
Input examples
Search by keyword
{"searchQueries": ["supply chain automation"],"maxResults": 25}
Export latest channel videos
{"channelUrls": ["https://www.youtube.com/@freecodecamp"],"maxResults": 100}
Process known videos
{"videoUrls": ["https://www.youtube.com/watch?v=XVv6mJpFOb0","https://youtu.be/PXMJ6FS7llk"],"scrapeDetails": true}
Transcript run for a shortlist
{"videoUrls": ["https://www.youtube.com/watch?v=XVv6mJpFOb0"],"includeTranscript": true,"transcriptLanguage": "en"}
Mixed discovery run
{"searchQueries": ["ai sales outreach"],"channelUrls": ["https://www.youtube.com/@HubSpot"],"maxResults": 30}
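Because maxResults applies per search query and per channel, a mixed run can return more rows than the number suggests. The sketch below computes the upper bound before deduplication; the function name is hypothetical, and the default of 50 comes from the input reference.

```python
def max_possible_rows(actor_input: dict) -> int:
    """Upper bound on rows before deduplication, per the maxResults rules."""
    max_results = actor_input.get("maxResults", 50)  # documented default
    listing_inputs = len(actor_input.get("searchQueries") or []) + len(
        actor_input.get("channelUrls") or []
    )
    # Direct video URLs are processed once each and are not multiplied.
    direct_videos = len(actor_input.get("videoUrls") or [])
    return listing_inputs * max_results + direct_videos
```

For the mixed run above (one query, one channel, maxResults of 30), the bound is 60 rows; the actual count can be lower after deduplication by videoId.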
Output fields
Each dataset item is one YouTube video row. The Actor does not write helper rows for empty input.
Core fields
| Field | Type | Description |
|---|---|---|
| recordVersion | string | Output contract version, currently 1.0. |
| enrichmentLevel | string | base, detail, or transcript. Shows how far the row was enriched. |
| videoId | string | YouTube video ID. |
| title | string | Video title. |
| url | string | Canonical YouTube watch URL. |
| channelName | string | Channel or author name when available. |
| channelId | string | YouTube channel ID when available. |
| channelUrl | string | Channel URL when available. |
| viewCount | integer | View count when available. May be 0 when the source does not expose it. |
| duration | string | Human-readable duration from listing pages when available. |
| durationSeconds | integer | Duration in seconds when available. |
| publishedText | string | Relative publish text from listing pages, such as 2 weeks ago, when available. |
| publishedAt | string | Publish date when available. Detail mode can improve this field. |
| description | string | Search snippet, RSS description, or fuller video description depending on source and enrichment. |
| thumbnailUrl | string | Video thumbnail URL. |
| category | string | Video category when detail metadata is available. |
| isLive | boolean | Whether the source marks the video as live content. |
| source | string | Source path used for the row: search, channel, or detail. |
| scrapedAt | string | ISO timestamp when the row was created. |
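Two of these fields benefit from light normalization before analysis: viewCount, where 0 can mean "not exposed", and scrapedAt, which arrives as a string. The sketch below is one way to handle both. The helper name is hypothetical, and mapping every 0 to None is an assumption that would also hide a video with genuinely zero views.

```python
from datetime import datetime


def normalize_row(row: dict) -> dict:
    """Return a copy with unknown view counts as None and scrapedAt parsed."""
    clean = dict(row)
    # Assumption: a viewCount of 0 means the source did not expose it.
    if clean.get("viewCount") == 0:
        clean["viewCount"] = None
    scraped_at = clean.get("scrapedAt")
    if isinstance(scraped_at, str):
        # fromisoformat() only accepts a trailing "Z" on Python 3.11+,
        # so rewrite it as an explicit UTC offset first.
        clean["scrapedAt"] = datetime.fromisoformat(scraped_at.replace("Z", "+00:00"))
    return clean
```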
Transcript fields
Transcript fields appear only when includeTranscript is true and captions are successfully returned.
| Field | Type | Description |
|---|---|---|
| transcript | array | Timestamped caption segments. Each segment has text, start, and duration. |
| transcriptLanguage | string | Language code of the transcript actually returned. |
| transcriptText | string | Full transcript joined into one plain-text string. |
Example transcript segment:
{"text": "Welcome back to the channel.","start": 12.4,"duration": 3.2}
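Segments in this shape are easy to turn into readable, timestamped lines for review or for prompts. A minimal sketch, assuming start is given in seconds as in the example above; both function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as M:SS, or H:MM:SS once past an hour."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def transcript_lines(segments):
    """Render each segment as '[start] text'."""
    return [f"[{format_timestamp(s['start'])}] {s['text'].strip()}" for s in segments]
```

Applied to the example segment, this yields the line `[0:12] Welcome back to the channel.`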
Standby API
The Actor includes a Standby API for small interactive requests. The same validation rules apply as in normal runs.
| Endpoint | Method | Use |
|---|---|---|
| / | GET | Readiness check |
| /search?query=python%20automation&maxResults=10 | GET | Search videos |
| /channel?url=https://www.youtube.com/@freecodecamp&maxResults=10 | GET | List recent channel videos |
| /video?url=XVv6mJpFOb0 | GET | Fetch one direct video |
| /run | POST | Run the normal Actor input JSON through Standby |
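When calling these endpoints from code, query values should be URL-encoded. The sketch below builds request URLs with the standard library. The base URL is a placeholder: take the real Standby URL and authentication details from the Actor's page in Apify Console. The helper name is hypothetical.

```python
from urllib.parse import quote, urlencode


def standby_url(base_url: str, endpoint: str, **params) -> str:
    """Join a Standby base URL, an endpoint path, and encoded query parameters."""
    url = base_url.rstrip("/") + "/" + endpoint.lstrip("/")
    if params:
        # quote_via=quote encodes spaces as %20 rather than "+".
        url += "?" + urlencode(params, quote_via=quote)
    return url


# Placeholder host for illustration only.
BASE = "https://example.invalid"
search = standby_url(BASE, "/search", query="python automation", maxResults=10)
video = standby_url(BASE, "/video", url="XVv6mJpFOb0")
```

Here `search` ends in `/search?query=python%20automation&maxResults=10`, matching the table. Full channel URLs come out percent-encoded, which is the safer form when the value itself contains `/`, `:` or `@`.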
Limits and practical notes
- Transcripts are not guaranteed. They depend on whether YouTube exposes captions for the video and whether those captions can be fetched.
- includeTranscript can still return a valid video row without transcript fields.
- Search and channel rows may have lighter metadata than direct detail rows.
- maxResults applies per search query and per channel URL.
- Channel scraping works best with public channels and common YouTube URL formats.
- Very large transcript runs are slower and more expensive than discovery runs. Shortlist first when possible.
- YouTube page structure and availability can change. If a source path fails for a specific video or channel, try the most direct input type, especially videoUrls for known videos.
Pricing model
The Actor is designed for tiered pay-per-event use:
- base video rows for search and channel discovery
- detailed video rows when scrapeDetails is enabled
- transcript-ready rows when transcript extraction succeeds
This is why the recommended workflow is to discover broadly first, then run detail or transcript modes only on a shortlist.
Check the Apify Store pricing panel for the current event prices before running large jobs.
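For a rough budget check before a large job, the arithmetic is count / 1,000 × price per event type. The sketch below makes no assumption about how the tiers stack or what the detail and transcript events cost; you supply the charged event counts and the current prices from the pricing panel. The function and key names are hypothetical.

```python
def estimate_cost(event_counts: dict, price_per_1000: dict) -> float:
    """Sum count / 1000 * price for each event type, in dollars."""
    total = 0.0
    for event, count in event_counts.items():
        total += count / 1000 * price_per_1000[event]
    return round(total, 2)


# Using only the listed starting price of $1.50 per 1,000 base video rows:
discovery_cost = estimate_cost({"base": 2000}, {"base": 1.50})
```

With the listed starting price, 2,000 base rows come to $3.00. Treat this as a lower bound, since the listing says "from $1.50".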
Why use this Actor
This Actor is focused on research, not just bulk scraping. It separates discovery, detail enrichment, and transcript extraction so you can control speed, dataset size, and cost.
Use it when you need structured YouTube data for market research, creator research, competitor monitoring, content strategy, or LLM-ready transcript workflows without maintaining your own YouTube scraping stack.