YouTube Scraper

Pricing: from $1.50 / 1,000 base video rows

Search YouTube, export channel and video data, and pull transcripts for shortlisted videos. No API key, no browser. Fast HTTP-only research workflow.

Developer: kane liu (maintained by Community)

Actor stats: 0 bookmarked, 5 total users, 4 monthly active users, last modified 4 days ago

YouTube Research & Transcript Scraper

Search YouTube, export channel video lists, enrich selected videos, and collect transcripts without setting up the YouTube Data API.

This Apify Actor is built for YouTube research workflows where you do not want to scrape everything at the most expensive level. Start with broad discovery, shortlist the videos that matter, then run metadata enrichment or transcript extraction only on that smaller set.

Best for

  • YouTube keyword research and topic mapping
  • competitor and creator channel monitoring
  • content audits for brands, agencies, and media teams
  • transcript collection for LLM, RAG, summarization, and qualitative research pipelines
  • building structured YouTube datasets from search results, channel pages, and known video URLs

How it works

The Actor accepts three input types. You can use one, two, or all three in the same run.

| Input | Use it when | Result source |
| --- | --- | --- |
| searchQueries | You want to discover videos by topic or keyword | YouTube search results |
| channelUrls | You want recent videos from one or more channels | Channel /videos pages, with RSS fallback when possible |
| videoUrls | You already know the exact videos to process | Direct video metadata paths, with fallback metadata extraction |

Rows are deduplicated by videoId, so the same video is only pushed once even if it appears in multiple inputs.
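The deduplication rule can be pictured with a short Python sketch. It is a hypothetical reimplementation for illustration, not the Actor's code; in particular, keeping the first row seen for each videoId and skipping rows without one are assumptions.

```python
from typing import Iterable


def dedupe_by_video_id(rows: Iterable[dict]) -> list:
    """Keep the first row seen for each videoId (assumed policy), in input order."""
    seen = set()
    unique = []
    for row in rows:
        video_id = row.get("videoId")
        if not video_id or video_id in seen:
            # Skip rows without an ID and repeats of an ID that was already kept.
            continue
        seen.add(video_id)
        unique.append(row)
    return unique


# Made-up rows: the same video found by a search query and on a channel page.
search_rows = [{"videoId": "XVv6mJpFOb0", "source": "search"}]
channel_rows = [
    {"videoId": "XVv6mJpFOb0", "source": "channel"},
    {"videoId": "PXMJ6FS7llk", "source": "channel"},
]
merged = dedupe_by_video_id(search_rows + channel_rows)
print([row["videoId"] for row in merged])  # ['XVv6mJpFOb0', 'PXMJ6FS7llk']
```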

At least one of searchQueries, channelUrls, or videoUrls must contain a value. Empty input is rejected so the Actor does not create a misleading dataset row or charge for a helper item.
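If you generate inputs programmatically, you can apply the same rule before starting a run. This is a minimal sketch, assuming that blank strings count as empty; the Actor's actual validation code and error message are not documented here, and `validate_input` is a hypothetical name.

```python
# Hypothetical pre-flight check mirroring the "at least one input" rule.
# Treating whitespace-only strings as empty is an assumption.
INPUT_KEYS = ("searchQueries", "channelUrls", "videoUrls")


def validate_input(run_input: dict) -> None:
    """Raise ValueError unless some input list contains a non-blank string."""
    has_value = any(
        isinstance(item, str) and item.strip()
        for key in INPUT_KEYS
        for item in (run_input.get(key) or [])
    )
    if not has_value:
        raise ValueError(
            "Provide at least one of searchQueries, channelUrls, or videoUrls."
        )


validate_input({"searchQueries": ["ai workflow automation"]})  # accepted

try:
    validate_input({"searchQueries": [], "maxResults": 50})
except ValueError as err:
    print(f"Rejected: {err}")
```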

1. Discover videos cheaply

Use searchQueries or channelUrls first. Keep scrapeDetails and includeTranscript off while you are still exploring.

{
  "searchQueries": ["ai workflow automation", "youtube competitor analysis"],
  "maxResults": 50
}

This gives you a candidate list with titles, URLs, channels, thumbnails, relative publish text (such as "2 weeks ago"), durations, view counts when available, and descriptions when present in the search result.

2. Review and shortlist

Filter the dataset outside the Actor. Pick only the videos you actually need for deeper work.

Useful shortlist signals:

  • topic relevance from title and description
  • creator or company from channelName
  • popularity from viewCount
  • freshness from publishedText or publishedAt
  • video length from duration or durationSeconds
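The signals above translate directly into a filter over the exported rows. The function below is an illustrative Python sketch, not part of the Actor; the keyword match, the 10,000-view threshold, and the optional length cap are arbitrary example choices. Keep in mind that viewCount may be 0 when the source does not expose it, so a view threshold also drops those rows.

```python
from typing import Optional


def shortlist(rows: list, keywords: list, min_views: int = 10_000,
              max_seconds: Optional[int] = None) -> list:
    """Keep on-topic rows that pass the view and length limits, most viewed first."""
    picked = []
    for row in rows:
        text = f"{row.get('title', '')} {row.get('description', '')}".lower()
        if not any(keyword.lower() in text for keyword in keywords):
            continue  # topic relevance from title and description
        if (row.get("viewCount") or 0) < min_views:
            continue  # popularity; viewCount may be 0 when not exposed
        seconds = row.get("durationSeconds")
        if max_seconds is not None and seconds is not None and seconds > max_seconds:
            continue  # video length, applied only when durationSeconds is known
        picked.append(row)
    return sorted(picked, key=lambda row: row.get("viewCount") or 0, reverse=True)


# Made-up sample rows in the Actor's output shape.
rows = [
    {"videoId": "sampleID001", "title": "AI workflow automation demo",
     "viewCount": 52000, "durationSeconds": 610},
    {"videoId": "sampleID002", "title": "Cooking pasta at home",
     "viewCount": 900000, "durationSeconds": 300},
    {"videoId": "sampleID003", "title": "Workflow automation deep dive",
     "viewCount": 12000, "durationSeconds": 4200},
]
print([r["videoId"] for r in shortlist(rows, ["automation"], max_seconds=3600)])
# ['sampleID001']
```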

3. Enrich selected videos

Use videoUrls with scrapeDetails when you need stronger metadata for specific videos.

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=XVv6mJpFOb0",
    "https://youtu.be/dQw4w9WgXcQ"
  ],
  "scrapeDetails": true
}

scrapeDetails may improve or fill:

  • publishedAt
  • category
  • description
  • viewCount

It is best used after shortlisting because it performs extra requests per video.
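Because detail mode adds requests per video, one way to keep it targeted is to generate the enrichment input directly from the shortlisted rows. The helper below is a hypothetical Python sketch that produces the same input shape as the JSON examples in this section; `enrichment_input` is not an Actor function.

```python
import json


def enrichment_input(rows: list, include_transcript: bool = False,
                     language: str = "en") -> dict:
    """Build a detail (and optionally transcript) run input from shortlisted rows."""
    run_input = {
        # The url field holds the canonical watch URL, which videoUrls accepts.
        "videoUrls": [row["url"] for row in rows],
        "scrapeDetails": True,
    }
    if include_transcript:
        run_input["includeTranscript"] = True
        run_input["transcriptLanguage"] = language
    return run_input


shortlisted = [{"videoId": "XVv6mJpFOb0",
                "url": "https://www.youtube.com/watch?v=XVv6mJpFOb0"}]
print(json.dumps(enrichment_input(shortlisted), indent=2))
```

Passing `include_transcript=True` turns the same shortlist into a transcript run without rebuilding the URL list by hand.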

4. Collect transcripts only when needed

Use includeTranscript for videos where you actually need text, timestamps, or LLM-ready content.

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=XVv6mJpFOb0"
  ],
  "scrapeDetails": true,
  "includeTranscript": true,
  "transcriptLanguage": "en"
}

When a transcript is available, the row includes timestamped transcript segments and a combined plain-text transcript. If YouTube does not provide captions for the video, or the captions cannot be fetched, the Actor still returns the video row without transcript fields.
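The transcript behavior (preferred language first, fall back to the first available track, otherwise keep the row without transcript fields) can be sketched as follows. This is a hypothetical model, not the Actor's implementation: the caption-track shape with `languageCode` and `segments` keys, and joining segment text with single spaces for transcriptText, are assumptions.

```python
from typing import Optional


def pick_caption_track(tracks: list, preferred: str = "en") -> Optional[dict]:
    """Return the preferred-language track, else the first track, else None."""
    for track in tracks:
        if track.get("languageCode") == preferred:
            return track
    return tracks[0] if tracks else None


def attach_transcript(row: dict, tracks: list, preferred: str = "en") -> dict:
    """Add transcript fields when captions exist; otherwise return the row unchanged."""
    track = pick_caption_track(tracks, preferred)
    if track is None:
        return row  # no captions: the video row is still returned
    segments = track["segments"]
    return {
        **row,
        "transcript": segments,
        "transcriptLanguage": track["languageCode"],  # language actually returned
        "transcriptText": " ".join(seg["text"] for seg in segments),  # assumed joiner
    }


row = {"videoId": "XVv6mJpFOb0"}
spanish_only = [{"languageCode": "es",
                 "segments": [{"text": "Hola", "start": 0.0, "duration": 1.5}]}]
print(attach_transcript(row, spanish_only)["transcriptLanguage"])  # es
print(attach_transcript(row, []))  # {'videoId': 'XVv6mJpFOb0'}
```

When consuming results, test for the presence of the transcript key rather than assuming every row has one.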

Input reference

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| searchQueries | array of strings | empty | YouTube search keywords. Each query runs separately and can return up to maxResults videos. Best for discovery and SEO or market research. |
| channelUrls | array of strings | empty | YouTube channel inputs. Supports @handle, UC channel IDs, and common youtube.com channel, c, and user URLs. Returns recent public videos; RSS fallback is used when the channel page does not expose rows. |
| videoUrls | array of strings | empty | Exact YouTube videos to process. Supports 11-character video IDs and common watch, shorts, embed, live, and youtu.be URL formats. Best for enrichment and transcripts. |
| maxResults | integer | 50 | Maximum videos per search query or channel. It does not multiply direct videoUrls; each provided video URL is processed once. |
| scrapeDetails | boolean | false | Fetches richer metadata for each row. Use on shortlists or smaller runs. |
| includeTranscript | boolean | false | Attempts transcript extraction for each video. Use on targeted runs because this is the heaviest mode. |
| transcriptLanguage | string | en | Preferred transcript language code, such as en, es, fr, de, ja, or pt. If that language is unavailable, the Actor can fall back to the first available caption track. |
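To check a list of links before passing them as videoUrls, you can normalize them to video IDs yourself. The parser below is a sketch covering the formats listed for videoUrls (bare IDs plus watch, shorts, embed, live, and youtu.be URLs); it is an independent illustration, not the Actor's parser, and it deliberately returns None for channel URLs and other hosts.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a bare ID or a common URL format."""
    value = value.strip()
    if VIDEO_ID.fullmatch(value):
        return value  # already a bare video ID
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    parts = [part for part in parsed.path.split("/") if part]
    if host == "youtu.be" and parts:
        candidate = parts[0]  # youtu.be/<id>
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parts[:1] == ["watch"]:
            candidate = parse_qs(parsed.query).get("v", [""])[0]  # watch?v=<id>
        elif len(parts) >= 2 and parts[0] in ("shorts", "embed", "live"):
            candidate = parts[1]  # /shorts/<id>, /embed/<id>, /live/<id>
        else:
            return None  # channel pages, playlists, and other paths
    else:
        return None
    return candidate if VIDEO_ID.fullmatch(candidate) else None


for raw in ["XVv6mJpFOb0", "https://youtu.be/dQw4w9WgXcQ",
            "https://www.youtube.com/shorts/PXMJ6FS7llk",
            "https://www.youtube.com/@freecodecamp"]:
    print(raw, "->", extract_video_id(raw))
```

Normalizing to IDs up front also makes it easy to spot duplicates in your own list before the Actor deduplicates them.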

Input examples

Search by keyword

{
  "searchQueries": ["supply chain automation"],
  "maxResults": 25
}

Export latest channel videos

{
  "channelUrls": ["https://www.youtube.com/@freecodecamp"],
  "maxResults": 100
}

Process known videos

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=XVv6mJpFOb0",
    "https://youtu.be/PXMJ6FS7llk"
  ],
  "scrapeDetails": true
}

Transcript run for a shortlist

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=XVv6mJpFOb0"
  ],
  "includeTranscript": true,
  "transcriptLanguage": "en"
}

Mixed discovery run

{
  "searchQueries": ["ai sales outreach"],
  "channelUrls": ["https://www.youtube.com/@HubSpot"],
  "maxResults": 30
}
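Since maxResults applies per search query and per channel URL but not to videoUrls, a little arithmetic shows how large a run can get. For the mixed discovery run above, the ceiling is 30 × (1 query + 1 channel) = 60 rows. The hypothetical helper below computes that upper bound; actual counts can be lower because pages may return fewer videos and rows are deduplicated by videoId.

```python
def max_rows_before_dedup(run_input: dict) -> int:
    """Upper bound on rows: maxResults per search query and per channel URL,
    plus one row per direct video URL. Deduplication can only lower this."""
    per_source = run_input.get("maxResults", 50)  # documented default
    listing_sources = (len(run_input.get("searchQueries") or [])
                       + len(run_input.get("channelUrls") or []))
    return per_source * listing_sources + len(run_input.get("videoUrls") or [])


mixed = {
    "searchQueries": ["ai sales outreach"],
    "channelUrls": ["https://www.youtube.com/@HubSpot"],
    "maxResults": 30,
}
print(max_rows_before_dedup(mixed))  # 60
```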

Output fields

Each dataset item is one YouTube video row. The Actor does not write helper rows for empty input.

Core fields

| Field | Type | Description |
| --- | --- | --- |
| recordVersion | string | Output contract version, currently 1.0. |
| enrichmentLevel | string | base, detail, or transcript. Shows how far the row was enriched. |
| videoId | string | YouTube video ID. |
| title | string | Video title. |
| url | string | Canonical YouTube watch URL. |
| channelName | string | Channel or author name when available. |
| channelId | string | YouTube channel ID when available. |
| channelUrl | string | Channel URL when available. |
| viewCount | integer | View count when available. May be 0 when the source does not expose it. |
| duration | string | Human-readable duration from listing pages when available. |
| durationSeconds | integer | Duration in seconds when available. |
| publishedText | string | Relative publish text from listing pages, such as "2 weeks ago", when available. |
| publishedAt | string | Publish date when available. Detail mode can improve this field. |
| description | string | Search snippet, RSS description, or fuller video description depending on source and enrichment. |
| thumbnailUrl | string | Video thumbnail URL. |
| category | string | Video category when detail metadata is available. |
| isLive | boolean | Whether the source marks the video as live content. |
| source | string | Source path used for the row: search, channel, or detail. |
| scrapedAt | string | ISO timestamp when the row was created. |
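durationSeconds is only filled when available. If you want to backfill it from the human-readable duration field, a parser like the one below works, under the assumption (not stated by the Actor) that listing pages use the usual M:SS or H:MM:SS clock format. Anything else, such as a live badge, yields None.

```python
from typing import Optional


def duration_to_seconds(duration: str) -> Optional[int]:
    """Convert "M:SS" or "H:MM:SS" text to seconds; return None for anything else."""
    parts = duration.strip().split(":")
    if not 2 <= len(parts) <= 3 or not all(part.isdecimal() for part in parts):
        return None  # e.g. empty strings or non-numeric labels
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def row_seconds(row: dict) -> Optional[int]:
    """Prefer the Actor's durationSeconds; fall back to parsing duration."""
    if row.get("durationSeconds") is not None:
        return row["durationSeconds"]
    return duration_to_seconds(row.get("duration") or "")


print(duration_to_seconds("12:34"), duration_to_seconds("1:02:03"))  # 754 3723
print(row_seconds({"duration": "4:05"}))  # 245
```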

Transcript fields

Transcript fields appear only when includeTranscript is true and captions are successfully returned.

| Field | Type | Description |
| --- | --- | --- |
| transcript | array | Timestamped caption segments. Each segment has text, start, and duration. |
| transcriptLanguage | string | Language code of the transcript actually returned. |
| transcriptText | string | Full transcript joined into one plain-text string. |

Example transcript segment:

{
  "text": "Welcome back to the channel.",
  "start": 12.4,
  "duration": 3.2
}
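For RAG or qualitative coding it often helps to keep each segment's position in the video. This sketch assumes only the text/start/duration shape shown above, with start and duration in seconds (an assumption consistent with the example, not a documented unit). It renders each segment as a timestamped line plus a watch URL that jumps to that point via YouTube's t parameter; `cite_segments` is a hypothetical helper.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, or H:MM:SS for positions past one hour."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}" if hours else f"{minutes:02d}:{secs:02d}"


def cite_segments(video_id: str, segments: list) -> list:
    """Pair each segment's text with its time range and a deep link."""
    lines = []
    for seg in segments:
        start = int(seg["start"])  # truncate to whole seconds for the t= parameter
        end = seg["start"] + seg["duration"]
        link = f"https://www.youtube.com/watch?v={video_id}&t={start}s"
        lines.append(
            f"[{format_timestamp(seg['start'])}-{format_timestamp(end)}] "
            f"{seg['text']} ({link})"
        )
    return lines


segment = {"text": "Welcome back to the channel.", "start": 12.4, "duration": 3.2}
print(cite_segments("XVv6mJpFOb0", [segment])[0])
# [00:12-00:15] Welcome back to the channel. (https://www.youtube.com/watch?v=XVv6mJpFOb0&t=12s)
```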

Standby API

The Actor includes a Standby API for small interactive requests. The same validation rules apply as in normal runs.

| Endpoint | Method | Use |
| --- | --- | --- |
| / | GET | Readiness check |
| /search?query=python%20automation&maxResults=10 | GET | Search videos |
| /channel?url=https://www.youtube.com/@freecodecamp&maxResults=10 | GET | List recent channel videos |
| /video?url=XVv6mJpFOb0 | GET | Fetch one direct video |
| /run | POST | Run the normal Actor input JSON through Standby |
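When calling the GET endpoints from code, percent-encoding the query values avoids ambiguity if a channel URL or search phrase contains characters such as ?, &, or spaces. The sketch below only builds URLs; the base URL is a placeholder for your Actor's Standby address, authentication is omitted, and `standby_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

# Placeholder: substitute your Actor's Standby URL. Authentication is not shown.
BASE_URL = "https://your-standby-url.example"


def standby_url(path: str, **params) -> str:
    """Build a GET URL for a Standby endpoint with percent-encoded query values."""
    query = urlencode(params, quote_via=quote)  # spaces become %20; "/" and "@" are escaped
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


print(standby_url("/search", query="python automation", maxResults=10))
print(standby_url("/channel", url="https://www.youtube.com/@freecodecamp", maxResults=10))
print(standby_url("/video", url="XVv6mJpFOb0"))
```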

Limits and practical notes

  • Transcripts are not guaranteed. They depend on whether YouTube exposes captions for the video and whether those captions can be fetched.
  • includeTranscript can still return a valid video row without transcript fields.
  • Search and channel rows may have lighter metadata than direct detail rows.
  • maxResults applies per search query and per channel URL.
  • Channel scraping works best with public channels and common YouTube URL formats.
  • Very large transcript runs are slower and more expensive than discovery runs. Shortlist first when possible.
  • YouTube page structure and availability can change. If a source path fails for a specific video or channel, try the most direct input type, especially videoUrls for known videos.

Pricing model

The Actor is designed for tiered pay-per-event use:

  1. base video rows for search and channel discovery
  2. detailed video rows when scrapeDetails is enabled
  3. transcript-ready rows when transcript extraction succeeds

This is why the recommended workflow is to discover broadly first, then run detail or transcript modes only on a shortlist.

Check the Apify Store pricing panel for the current event prices before running large jobs.
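To compare a broad discovery run with a shortlist run before launching, a back-of-the-envelope estimate is enough. In this hypothetical sketch, only the base figure comes from this page ("from $1.50 / 1,000 base video rows"); detail and transcript prices must be filled in from the pricing panel, and billing each row once at a single tier is an assumption rather than documented behavior.

```python
def estimate_cost(rows_by_tier: dict, price_per_1000: dict) -> float:
    """Sum rows / 1000 * price for each tier; every tier used needs a price.
    Assumes each row is billed once, at one tier (not documented behavior)."""
    missing = sorted(set(rows_by_tier) - set(price_per_1000))
    if missing:
        raise ValueError(f"No price given for tiers: {missing}")
    return sum(count / 1000 * price_per_1000[tier]
               for tier, count in rows_by_tier.items())


# Broad discovery: 2,000 base rows at the listed starting price.
print(f"${estimate_cost({'base': 2000}, {'base': 1.50}):.2f}")  # $3.00
```

Refusing to estimate when a tier has no price keeps the sketch from silently treating unknown detail or transcript prices as zero.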

Why use this Actor

This Actor is focused on research, not just bulk scraping. It separates discovery, detail enrichment, and transcript extraction so you can control speed, dataset size, and cost.

Use it when you need structured YouTube data for market research, creator research, competitor monitoring, content strategy, or LLM-ready transcript workflows without maintaining your own YouTube scraping stack.