YouTube Search Scraper
Pricing
Pay per usage
An optimized YouTube scraper to search videos, shorts, and streams by keywords, playlists, or channel URLs. Extract rich metadata, views, likes, and download subtitles/transcripts automatically.
Developer
seungkyu cho
Premium YouTube Search Scraper (v1.0.0)
A production-ready YouTube Search Scraper that extracts detailed metadata from YouTube searches, playlists, channels, and individual watch pages. It combines fast HTTP requests with Playwright stealth browser automation, aiming for high speed and reliability with a small memory footprint.
Key Selling Points & Advantages
- Hybrid Scraping Engine: Simple searches and watch pages are scraped through fast HTTP requests to the InnerTube API, which consumes minimal RAM. Playwright is activated only to render complex channels and playlists, which keeps runs cost-efficient.
- Exposed Localization Options (`hl` & `gl`): Target country-specific search relevance and regional trending orders, and retrieve localized metadata (e.g. original titles in Korean, Japanese, or French instead of forced English translations).
- Absolute Timestamp Interpolation: Automatically converts relative times (e.g., "3 days ago", "16 hours ago") into absolute Unix epoch timestamps computed relative to the query time (`interpolatedTimestamp`).
- Subtitle & Captions Downloader: Downloads full transcripts and subtitles in multiple formats (SRT, WebVTT, JSON, plain text).
- Low Memory Footprint (V8 GC Enabled): Invokes garbage collection at task boundaries to prevent container OOMs, so the Actor runs comfortably on 256 MB or 512 MB RAM instances and platform costs stay low.
- Anti-Bot Measures: Out-of-the-box support for user-agent rotation, stealth browser fingerprinting, and Apify Residential Proxies to reduce YouTube rate limiting and CAPTCHAs.
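The timestamp interpolation described above can be illustrated with a minimal sketch. The Actor's real logic is not shown in this README, so the helper name `interpolateTimestamp`, the English-only pattern, and the 30-day month and 365-day year approximations are all assumptions.

```javascript
// Hypothetical helper: an illustration of the idea, not the Actor's implementation.
const UNIT_MS = {
  second: 1000,
  minute: 60 * 1000,
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
  month: 30 * 24 * 60 * 60 * 1000, // assumption: 30-day month
  year: 365 * 24 * 60 * 60 * 1000, // assumption: 365-day year
};

/**
 * Convert text such as "3 days ago" into a Unix epoch timestamp (ms),
 * counted back from `queryTimeMs`, the moment the search was made.
 * Returns null when the text does not match "<n> <unit>(s) ago".
 */
function interpolateTimestamp(relativeText, queryTimeMs = Date.now()) {
  const match = /(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago/i.exec(relativeText);
  if (!match) return null;
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  return queryTimeMs - amount * UNIT_MS[unit];
}

// A fixed query time keeps the example reproducible.
const queryTime = Date.UTC(2025, 0, 10, 12, 0, 0);
console.log(interpolateTimestamp('3 days ago', queryTime));
console.log(interpolateTimestamp('16 hours ago', queryTime));
```

Because the source text only has day-level or hour-level granularity, any value derived this way is an estimate whose error grows with the size of the unit.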
Technical Architecture
    graph TD
        A[Input JSON Config] --> B{Task Distributor}
        B -- "Search Query / Search URL" --> C[HTTP InnerTube API - Ultra Fast]
        B -- "Channel Handle / Playlist URL" --> D[Playwright Stealth Browser]
        C --> E[Video IDs & Search Metadata]
        D --> E
        E --> F[Stage 2: Deep Watch Page Extraction]
        F --> G[Extract InnerTube Player Details: Views, Likes, Description]
        F --> H[Extract Captions / Subtitles if enabled]
        G --> I[Relational Timestamp Interpolator]
        H --> J[Compile Dataset Item]
        I --> J
        J --> K[Push to Apify Dataset]
        K --> L[V8 Garbage Collection Memory Cleanup]
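The Task Distributor step in the diagram could look roughly like the sketch below. It only mirrors the routing the diagram describes (search queries, search URLs, and watch pages over HTTP; channels and playlists through Playwright). The function name `routeTask`, its return shape, and the URL path rules are assumptions made for illustration.

```javascript
// Hypothetical routing sketch: `routeTask` and its return shape are illustrative,
// not the Actor's actual internals.
function routeTask(input) {
  const value = input.trim();

  // Bare channel handles such as "@SpaceX" take the browser path.
  if (value.startsWith('@')) return { engine: 'playwright', kind: 'channel', value };

  // Anything that is not an http(s) URL is treated as a search keyword.
  if (!/^https?:\/\//i.test(value)) return { engine: 'http', kind: 'searchQuery', value };

  const { pathname } = new URL(value);
  if (pathname === '/results') return { engine: 'http', kind: 'searchUrl', value };
  if (pathname === '/watch' || pathname.startsWith('/shorts/')) {
    return { engine: 'http', kind: 'video', value };
  }
  if (pathname === '/playlist') return { engine: 'playwright', kind: 'playlist', value };
  if (/^\/(@|channel\/|c\/|user\/)/.test(pathname)) {
    return { engine: 'playwright', kind: 'channel', value };
  }
  // Unrecognized URLs fall back to the browser, the more general path.
  return { engine: 'playwright', kind: 'unknown', value };
}

const samples = [
  'SpaceX',
  'https://www.youtube.com/@SpaceX',
  'https://www.youtube.com/watch?v=R8m3G1E4s-g',
];
for (const sample of samples) {
  console.log(sample, '->', routeTask(sample).engine);
}
```

Deciding the engine per task, rather than per run, is what lets a mixed input list use the browser only where it is needed.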
Input Parameters
The Actor accepts a highly configurable JSON object. Below are the key parameters available:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQueries` | array | `["SpaceX", "Blue Origin"]` | Search keywords or phrases to query. |
| `startUrls` | array | `[]` | Direct YouTube URLs to scrape (supports Video, Channel, or Playlist links). |
| `maxResults` | integer | `50` | Maximum standard video results to scrape per search query or channel. |
| `maxResultsShorts` | integer | `0` | Maximum Shorts to scrape per query/channel. |
| `maxResultStreams` | integer | `0` | Maximum Live Streams to scrape per query/channel. |
| `downloadSubtitles` | boolean | `false` | If enabled, extracts video subtitles/transcripts. |
| `subtitlesFormat` | string | `"srt"` | Subtitle output format (`srt`, `vtt`, `txt`, `json`). |
| `sortingOrder` | string | `"relevance"` | Sort results by relevance, popularity (views), uploadDate, or rating. |
| `dateFilter` | string | `"any"` | Filter by upload date (`any`, `hour`, `today`, `thisWeek`, `thisMonth`, `thisYear`). |
| `lengthFilter` | string | `"any"` | Filter by duration (`any`, `under4` mins, `between420` mins, `over20` mins). |
| `hl` | string | `"en"` | Language code for YouTube client and translation UI. |
| `gl` | string | `"US"` | Geographical region code for search relevance & localization. |
| `proxyConfiguration` | object | `{"useApifyProxy": true}` | Proxy configuration. Residential proxies are highly recommended. |
[!IMPORTANT] When scraping regional keywords, match `hl` and `gl` to the target market (e.g., `hl: "ko"`, `gl: "KR"` for South Korea) to retrieve accurate search results and localized metadata.
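As a sketch of how a caller might assemble an input object, the snippet below merges overrides onto the defaults listed in the parameter table and checks the two enumerations whose values the table spells out unambiguously. `buildInput`, `DEFAULT_INPUT`, and `ALLOWED_VALUES` are hypothetical names, and the Actor may validate its input differently.

```javascript
// Defaults copied from the parameter table; the helper itself is hypothetical.
const DEFAULT_INPUT = {
  searchQueries: ['SpaceX', 'Blue Origin'],
  startUrls: [],
  maxResults: 50,
  maxResultsShorts: 0,
  maxResultStreams: 0,
  downloadSubtitles: false,
  subtitlesFormat: 'srt',
  sortingOrder: 'relevance',
  dateFilter: 'any',
  lengthFilter: 'any',
  hl: 'en',
  gl: 'US',
  proxyConfiguration: { useApifyProxy: true },
};

// Only the enumerations whose values are listed explicitly in the table.
const ALLOWED_VALUES = {
  subtitlesFormat: ['srt', 'vtt', 'txt', 'json'],
  dateFilter: ['any', 'hour', 'today', 'thisWeek', 'thisMonth', 'thisYear'],
};

function buildInput(overrides = {}) {
  const input = { ...DEFAULT_INPUT, ...overrides };
  for (const [key, allowed] of Object.entries(ALLOWED_VALUES)) {
    if (!allowed.includes(input[key])) {
      throw new Error(`Invalid ${key}: "${input[key]}" (expected one of ${allowed.join(', ')})`);
    }
  }
  return input;
}

// Regional search for South Korea: hl and gl are set together.
const koreanInput = buildInput({ searchQueries: ['K-pop'], hl: 'ko', gl: 'KR' });
console.log(JSON.stringify(koreanInput, null, 2));
```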
Output Format
The Actor pushes scraped data to the default Apify Dataset. Below is the schema structure and description:
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique YouTube Video ID |
| `url` | string | Video watch page URL |
| `title` | string | Video title |
| `description` | string | Full video description text |
| `thumbnailUrl` | string | Highest resolution thumbnail URL |
| `channelName` | string | Creator's channel name |
| `channelUrl` | string | Creator's channel URL |
| `channelId` | string | Creator's channel ID |
| `channelSubscriberCount` | string | Creator's channel subscriber count |
| `lengthSeconds` | integer | Video duration in seconds |
| `viewCount` | integer | Number of video views |
| `likes` | integer | Number of video likes |
| `publishedDate` | string | ISO format upload timestamp |
| `publishedTimeText` | string | Relative time text (e.g. "2 days ago") |
| `interpolatedTimestamp` | integer | Absolute Unix epoch timestamp (ms) |
| `badges` | array | Video format badges (e.g., `["4K", "CC", "Shorts"]`) |
| `transcript` | string | Extracted subtitles/transcripts (if `downloadSubtitles: true`) |
| `subtitlesLanguage` | string | Language code of the downloaded transcript |
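The layout of the `transcript` field depends on the chosen `subtitlesFormat`. The sketch below shows, in general terms, how the SRT and WebVTT formats differ: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The cue shape `{ start, end, text }` (times in seconds) and the function names are assumptions, not the Actor's internal representation.

```javascript
// Format seconds as HH:MM:SS<separator>mmm.
function formatTime(seconds, separator) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)}${separator}${pad(ms, 3)}`;
}

// SRT: numbered cues, comma as the millisecond separator.
function toSrt(cues) {
  const body = cues
    .map((cue, i) => `${i + 1}\n${formatTime(cue.start, ',')} --> ${formatTime(cue.end, ',')}\n${cue.text}`)
    .join('\n\n');
  return `${body}\n`;
}

// WebVTT: "WEBVTT" header, period as the millisecond separator.
function toVtt(cues) {
  const body = cues
    .map((cue) => `${formatTime(cue.start, '.')} --> ${formatTime(cue.end, '.')}\n${cue.text}`)
    .join('\n\n');
  return `WEBVTT\n\n${body}\n`;
}

// Hypothetical cues used only for this example.
const cues = [
  { start: 0, end: 2.5, text: 'Three, two, one.' },
  { start: 2.5, end: 5.04, text: 'Liftoff.' },
];
console.log(toSrt(cues));
console.log(toVtt(cues));
```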
Output JSON Example
    [
      {
        "id": "R8m3G1E4s-g",
        "url": "https://www.youtube.com/watch?v=R8m3G1E4s-g",
        "title": "SpaceX Starship Test Flight 4 Launch",
        "description": "Watch SpaceX launch Starship Flight 4 from Starbase, Texas...",
        "thumbnailUrl": "https://i.ytimg.com/vi/R8m3G1E4s-g/maxresdefault.jpg",
        "channelName": "SpaceX",
        "channelUrl": "https://www.youtube.com/@SpaceX",
        "channelId": "UC3xYfSxxxxxxx",
        "channelSubscriberCount": "15.4M subscribers",
        "lengthSeconds": 3600,
        "viewCount": 2450000,
        "likes": 120000,
        "publishedDate": "2026-06-25T12:00:00.000Z",
        "publishedTimeText": "2 days ago",
        "interpolatedTimestamp": 1744027200000,
        "badges": ["4K", "CC"],
        "transcript": null,
        "subtitlesLanguage": null
      }
    ]
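Since `channelSubscriberCount` arrives as display text rather than a number, downstream code usually needs a small normalization step. The sketch below is one way to do that and to rank items by recency. The helper names are hypothetical, and the parser assumes the English abbreviations (`K`, `M`, `B`) seen with `hl: "en"`. Other locales would need their own rules.

```javascript
// Hypothetical post-processing helpers for items shaped like the example above.
const SUFFIX_MULTIPLIER = { K: 1e3, M: 1e6, B: 1e9 };

// "15.4M subscribers" -> 15400000. Returns null when no number is found.
function parseSubscriberCount(text) {
  const match = /(\d+(?:\.\d+)?)\s*([KMB])?/.exec(text ?? '');
  if (!match) return null;
  const multiplier = match[2] ? SUFFIX_MULTIPLIER[match[2]] : 1;
  return Math.round(Number(match[1]) * multiplier);
}

// Reduce each item to a few derived fields and sort newest first.
function summarize(items) {
  return items
    .map((item) => ({
      id: item.id,
      title: item.title,
      subscribers: parseSubscriberCount(item.channelSubscriberCount),
      likesPerThousandViews:
        item.viewCount > 0 ? Number(((item.likes / item.viewCount) * 1000).toFixed(1)) : null,
      timestamp: item.interpolatedTimestamp,
    }))
    .sort((a, b) => b.timestamp - a.timestamp);
}

// Trimmed copy of the example item, enough for the helpers above.
const exampleItems = [
  {
    id: 'R8m3G1E4s-g',
    title: 'SpaceX Starship Test Flight 4 Launch',
    channelSubscriberCount: '15.4M subscribers',
    viewCount: 2450000,
    likes: 120000,
    interpolatedTimestamp: 1744027200000,
  },
];
console.log(summarize(exampleItems));
```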
Execution & Integration
Using Apify Client (Node.js)
    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    // Start the Actor and wait for the run to finish.
    const run = await client.actor('your-username/youtube-search-scraper').call({
        searchQueries: ["SpaceX"],
        maxResults: 20,
        hl: "en",
        gl: "US",
        proxyConfiguration: {
            useApifyProxy: true,
            apifyProxyGroups: ["RESIDENTIAL"]
        }
    });

    // Read the scraped items from the run's default dataset.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Scraped ${items.length} videos!`);
Local Development
- Install dependencies:
    $ npm install
- Modify variables inside `input.json` to customize parameters.
- Run the actor locally:
    $ npm start
- Run integration tests:
    $ npm test
License & Disclaimers
- License: MIT
- Disclaimer: This actor is designed for research, data analysis, and archiving purposes. Please respect YouTube's Terms of Service and robots.txt rules when designing your scraping workflows.