YouTube Search Scraper

Pricing

Pay per usage

An optimized YouTube scraper that searches for videos, Shorts, and live streams by keyword, or collects them from playlist and channel URLs. It extracts rich metadata, including views and likes, and can automatically download subtitles/transcripts.

Developer

seungkyu cho

Maintained by Community

Actor stats

Bookmarked: 1 · Total users: 3 · Monthly active users: 2 · Issues response: 1.3 days · Last modified: 3 days ago

🚀 Premium YouTube Search Scraper (v1.0.0)

Built with: Apify Actor · Node.js · Playwright

A production-ready YouTube Search Scraper that extracts detailed metadata from YouTube search results, playlists, channels, and individual watch pages. It combines fast HTTP requests with Playwright stealth browser automation, launching the browser only where it is needed, to keep runs fast, reliable, and light on memory.


🌟 Key Selling Points & Advantages

  • ⚡ Hybrid Scraping Engine: Cost-efficient by design. Searches and watch pages are scraped with lightweight HTTP requests to YouTube's InnerTube API, which use little RAM. Playwright is launched only to render channels and playlists.
  • 🌐 Localization Options (hl & gl): Target country-specific search relevance and regional result ordering, and retrieve localized metadata (e.g., original titles in Korean, Japanese, or French instead of forced English translations).
  • 🕒 Absolute Timestamp Interpolation: Automatically converts relative times (e.g., "3 days ago", "16 hours ago") into absolute Unix epoch timestamps in milliseconds (interpolatedTimestamp), computed relative to the time of the query. Precision is limited by the granularity of the relative text.
  • 💬 Subtitle & Caption Downloader: Downloads full transcripts and subtitles in multiple formats (SRT, WebVTT, JSON, plain text).
  • 🧠 Low Memory Footprint (V8 GC Enabled): Triggers V8 garbage collection at task boundaries to reduce the risk of out-of-memory (OOM) crashes, so the Actor can run on 256 MB or 512 MB memory allocations and keep platform costs low.
  • 🛡️ Anti-Bot Protection: Built-in user-agent rotation and stealth browser fingerprinting, plus support for Apify Residential Proxies, to reduce YouTube rate limiting and CAPTCHAs.

๐Ÿ—๏ธ Technical Architecture

```mermaid
graph TD
    A[Input JSON Config] --> B{Task Distributor}
    B -- "Search Query / Search URL" --> C[HTTP InnerTube API - Ultra Fast]
    B -- "Channel Handle / Playlist URL" --> D[Playwright Stealth Browser]
    C --> E[Video IDs & Search Metadata]
    D --> E
    E --> F[Stage 2: Deep Watch Page Extraction]
    F --> G[Extract InnerTube Player Details: Views, Likes, Description]
    F --> H[Extract Captions / Subtitles if enabled]
    G --> I[Relative Timestamp Interpolator]
    H --> J[Compile Dataset Item]
    I --> J
    J --> K[Push to Apify Dataset]
    K --> L[V8 Garbage Collection Memory Cleanup]
```
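
The Task Distributor's routing decision can be sketched as follows. This is a minimal illustration, assuming the URL shapes shown in the comments; the function routeInput and its return values are hypothetical, not the Actor's actual code. Watch pages are routed to the HTTP path because they are described as being scraped over HTTP.

```javascript
// Hypothetical sketch of the "Task Distributor" routing step.
// Search keywords, search URLs, and watch pages go to the HTTP (InnerTube) path;
// channels and playlists go to the Playwright browser path.
function routeInput(input) {
  const value = input.trim();

  // A bare channel handle such as "@SpaceX" needs the browser path.
  if (/^@[\w.-]+$/.test(value)) {
    return { route: 'browser', kind: 'channel' };
  }
  // Anything that is not a URL is treated as a search keyword.
  if (!/^https?:\/\//i.test(value)) {
    return { route: 'http', kind: 'searchQuery' };
  }

  const url = new URL(value);
  if (!/(^|\.)youtube\.com$/i.test(url.hostname)) {
    return { route: 'unsupported', kind: 'unknown' };
  }
  const path = url.pathname;

  if (path === '/results' && url.searchParams.has('search_query')) {
    return { route: 'http', kind: 'searchUrl' };
  }
  if (path === '/watch' && url.searchParams.has('v')) {
    return { route: 'http', kind: 'video' };
  }
  if (path === '/playlist' && url.searchParams.has('list')) {
    return { route: 'browser', kind: 'playlist' };
  }
  // Channel URL forms: /@handle, /channel/<id>, /c/<name>, /user/<name>
  if (/^\/(@[^/]+|channel\/[^/]+|c\/[^/]+|user\/[^/]+)/.test(path)) {
    return { route: 'browser', kind: 'channel' };
  }
  return { route: 'unsupported', kind: 'unknown' };
}
```

Short links on youtu.be fall into the unsupported branch in this sketch; a fuller version would expand them to /watch URLs before routing.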

โš™๏ธ Input Parameters

The Actor accepts a JSON input object. The key parameters are:

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | array | ["SpaceX", "Blue Origin"] | Search keywords or phrases to query. |
| startUrls | array | [] | YouTube URLs to scrape directly (video, channel, or playlist links). |
| maxResults | integer | 50 | Maximum number of regular videos to scrape per search query or channel. |
| maxResultsShorts | integer | 0 | Maximum number of Shorts to scrape per search query or channel. |
| maxResultStreams | integer | 0 | Maximum number of live streams to scrape per search query or channel. |
| downloadSubtitles | boolean | false | If true, downloads video subtitles/transcripts. |
| subtitlesFormat | string | "srt" | Subtitle output format: srt, vtt, txt, or json. |
| sortingOrder | string | "relevance" | Sort order: relevance, popularity (by views), uploadDate, or rating. |
| dateFilter | string | "any" | Upload-date filter: any, hour, today, thisWeek, thisMonth, or thisYear. |
| lengthFilter | string | "any" | Duration filter: any, under4 (under 4 min), between420 (4 to 20 min), or over20 (over 20 min). |
| hl | string | "en" | Interface language code for the YouTube client and localized text. |
| gl | string | "US" | Region (country) code used for search relevance and localization. |
| proxyConfiguration | object | {"useApifyProxy": true} | Proxy settings. Residential proxies are highly recommended. |

IMPORTANT: When scraping regional keywords, set hl and gl to match the target market (e.g., hl: "ko", gl: "KR" for South Korea) to get accurate search results and localized metadata.
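
To make the defaults and allowed values in the table concrete, here is a minimal sketch of merging user input with them and validating it. The enum values are read from the table above and should be treated as assumptions (in particular the lengthFilter and sortingOrder spellings); resolveInput is a hypothetical helper, not the Actor's real validation logic.

```javascript
// Hypothetical sketch: apply the documented defaults and reject unknown values.
// Allowed values are assumed from the parameter table.
const DEFAULTS = {
  searchQueries: ['SpaceX', 'Blue Origin'],
  startUrls: [],
  maxResults: 50,
  maxResultsShorts: 0,
  maxResultStreams: 0,
  downloadSubtitles: false,
  subtitlesFormat: 'srt',
  sortingOrder: 'relevance',
  dateFilter: 'any',
  lengthFilter: 'any',
  hl: 'en',
  gl: 'US',
  proxyConfiguration: { useApifyProxy: true },
};

const ENUMS = {
  subtitlesFormat: ['srt', 'vtt', 'txt', 'json'],
  sortingOrder: ['relevance', 'popularity', 'uploadDate', 'rating'],
  dateFilter: ['any', 'hour', 'today', 'thisWeek', 'thisMonth', 'thisYear'],
  lengthFilter: ['any', 'under4', 'between420', 'over20'],
};

function resolveInput(userInput = {}) {
  // User-supplied keys override the defaults; missing keys fall back to them.
  const input = { ...DEFAULTS, ...userInput };
  for (const [key, allowed] of Object.entries(ENUMS)) {
    if (!allowed.includes(input[key])) {
      throw new Error(`Invalid ${key}: ${input[key]} (expected one of ${allowed.join(', ')})`);
    }
  }
  for (const key of ['maxResults', 'maxResultsShorts', 'maxResultStreams']) {
    if (!Number.isInteger(input[key]) || input[key] < 0) {
      throw new Error(`${key} must be a non-negative integer`);
    }
  }
  return input;
}
```

For a South Korean market run, resolveInput({ searchQueries: ['K-pop'], hl: 'ko', gl: 'KR' }) keeps every other default, such as maxResults: 50 and subtitlesFormat: 'srt'.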


📦 Output Format

The Actor pushes each scraped video as one item to the run's default Apify dataset. Each item has the following fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique YouTube video ID |
| url | string | Video watch page URL |
| title | string | Video title |
| description | string | Full video description text |
| thumbnailUrl | string | URL of the highest-resolution thumbnail |
| channelName | string | Creator's channel name |
| channelUrl | string | Creator's channel URL |
| channelId | string | Creator's channel ID |
| channelSubscriberCount | string | Creator's subscriber count as displayed (e.g., "15.4M subscribers") |
| lengthSeconds | integer | Video duration in seconds |
| viewCount | integer | Number of views |
| likes | integer | Number of likes |
| publishedDate | string | Upload timestamp in ISO 8601 format |
| publishedTimeText | string | Relative upload time as shown by YouTube (e.g., "2 days ago") |
| interpolatedTimestamp | integer | Absolute Unix epoch timestamp in milliseconds |
| badges | array | Video format badges (e.g., ["4K", "CC", "Shorts"]) |
| transcript | string or null | Extracted subtitles/transcript (only when downloadSubtitles is true) |
| subtitlesLanguage | string or null | Language code of the downloaded transcript |
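
The transcript field holds subtitles in the chosen subtitlesFormat. The sketch below shows how caption cues could be rendered as SRT, WebVTT, plain text, or JSON; the cue shape { startMs, durationMs, text } and the function formatTranscript are assumptions for illustration, not the Actor's internal format.

```javascript
// Hypothetical sketch: render caption cues in the supported subtitle formats.
// Assumes integer millisecond times in cues shaped like { startMs, durationMs, text }.
function formatTime(ms, separator) {
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const millis = ms % 1000;
  return `${pad(h)}:${pad(m)}:${pad(s)}${separator}${pad(millis, 3)}`;
}

function formatTranscript(cues, format = 'srt') {
  switch (format) {
    case 'srt': // numbered blocks, comma before milliseconds
      return cues
        .map((c, i) => `${i + 1}\n${formatTime(c.startMs, ',')} --> ${formatTime(c.startMs + c.durationMs, ',')}\n${c.text}\n`)
        .join('\n');
    case 'vtt': // "WEBVTT" header, period before milliseconds
      return 'WEBVTT\n\n' + cues
        .map((c) => `${formatTime(c.startMs, '.')} --> ${formatTime(c.startMs + c.durationMs, '.')}\n${c.text}\n`)
        .join('\n');
    case 'txt': // text only, one cue per line
      return cues.map((c) => c.text).join('\n');
    case 'json':
      return JSON.stringify(cues);
    default:
      throw new Error(`Unsupported subtitles format: ${format}`);
  }
}
```

SRT and WebVTT differ mainly in the header, cue numbering, and the millisecond separator, so a single time formatter with a separator argument covers both.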

📄 Output JSON Example

```json
[
  {
    "id": "R8m3G1E4s-g",
    "url": "https://www.youtube.com/watch?v=R8m3G1E4s-g",
    "title": "SpaceX Starship Test Flight 4 Launch",
    "description": "Watch SpaceX launch Starship Flight 4 from Starbase, Texas...",
    "thumbnailUrl": "https://i.ytimg.com/vi/R8m3G1E4s-g/maxresdefault.jpg",
    "channelName": "SpaceX",
    "channelUrl": "https://www.youtube.com/@SpaceX",
    "channelId": "UC3xYfSxxxxxxx",
    "channelSubscriberCount": "15.4M subscribers",
    "lengthSeconds": 3600,
    "viewCount": 2450000,
    "likes": 120000,
    "publishedDate": "2026-06-25T12:00:00.000Z",
    "publishedTimeText": "2 days ago",
    "interpolatedTimestamp": 1782388800000,
    "badges": ["4K", "CC"],
    "transcript": null,
    "subtitlesLanguage": null
  }
]
```
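
In the example above, channelSubscriberCount is a display string ("15.4M subscribers") rather than a number. A hypothetical consumer-side helper like the one below can convert it before sorting or aggregating; it assumes English formatting with uppercase K/M/B suffixes and is not part of the Actor.

```javascript
// Hypothetical helper: convert a displayed subscriber count such as
// "15.4M subscribers" or "1,234 subscribers" into an approximate number.
// Assumes English formatting with uppercase K/M/B suffixes.
function parseSubscriberCount(text) {
  const match = /([\d.,]+)\s*([KMB])?/.exec(text ?? '');
  if (!match) return null;
  const value = Number(match[1].replace(/,/g, ''));
  if (Number.isNaN(value)) return null;
  const multiplier = { K: 1e3, M: 1e6, B: 1e9 }[match[2] || ''] ?? 1;
  // Round to avoid floating-point noise such as 1200.0000000000002.
  return Math.round(value * multiplier);
}
```

The result is only as precise as the displayed text: "15.4M" becomes 15400000 even if the true count differs by tens of thousands.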

🚀 Execution & Integration

Using Apify Client (Node.js)

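```javascript
// Requires an ES module context (e.g., a .mjs file or "type": "module")
// because of top-level await.
import { ApifyClient } from 'apify-client';

// Read the API token from the environment instead of hard-coding it.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the Actor and wait for the run to finish.
const run = await client.actor('your-username/youtube-search-scraper').call({
  searchQueries: ['SpaceX'],
  maxResults: 20,
  hl: 'en',
  gl: 'US',
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ['RESIDENTIAL'],
  },
});

// Fetch the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} videos!`);
```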

Local Development

  1. Install dependencies:
     $ npm install
  2. Edit input.json to set the input parameters.
  3. Run the Actor locally:
     $ npm start
  4. Run the integration tests:
     $ npm test

๐Ÿ›ก๏ธ License & Disclaimers

  • License: MIT
  • Disclaimer: This Actor is intended for research, data analysis, and archiving. Respect YouTube's Terms of Service and robots.txt rules when designing your scraping workflows.