Youtube Shorts Scraper

Pricing: $25.00/month + usage


YouTube Shorts Scraper is an Apify Actor that scrapes video data and comments from the Shorts section of a YouTube channel. It extracts key details such as video title, URL, view count, time since upload (in days), hashtags, description, comment count, and individual comments with user IDs.


Rating: 5.0 (2 ratings)

Developer: scraping automation (maintained by Community)

Actor stats: 2 bookmarked, 43 total users, 1 monthly active user, last modified 4 days ago


📺 YouTube Shorts Scraper

High-performance YouTube Shorts scraper that extracts comprehensive video data, comments, engagement metrics, transcripts, and more from YouTube Shorts channels. Built with Apify and Puppeteer for reliability and speed.

🚀 Key Features

Core Capabilities

  • ✅ Comprehensive Data Extraction: Video metadata, comments, engagement metrics, transcripts, and channel details
  • ✅ Multiple Channel Support: Process one or more channels in a single run
  • ✅ Date Filtering: Filter videos by upload date range (YYYY-MM-DD format)
  • ✅ Advanced Anti-Bot Protection: Stealth mode with browser-fingerprint masking to avoid detection
  • ✅ Robust Error Handling: Graceful fallbacks and retry mechanisms
  • ✅ Make.com Integration: Real-time webhook delivery with batch processing

Data Extraction Features

  • ✅ Video Metadata: Title, URL, views, upload time, duration, hashtags, description
  • ✅ Engagement Metrics: Automatic calculation of engagement rate, score, and category
  • ✅ Comments: Extract up to 100 comments per video with auto-scrolling
  • ✅ Transcripts: Extract video transcripts/subtitles with language detection (unique feature)
  • ✅ Channel Details: Channel information, description, links, joined date

📥 Input Parameters

Channel Configuration

  • channel (string, optional*, default "https://www.youtube.com/@Dior"): Single YouTube channel URL. Supports @username, channel ID, c/, or user/ formats.
  • channels (string[], optional*, default []): Array of YouTube channel URLs to scrape. Channels are processed sequentially.

Processing Configuration

  • max_videos (integer, default 3, range 1-1000): Maximum number of videos to process per channel.
  • batch_size (integer, default 1, range 1-50): Number of videos per batch. Smaller batches process faster.
  • delay_between_batches (integer, default 100, range 50-5000): Delay between batches, in milliseconds.
  • get_details (boolean, default true): When true, opens every video to collect comments, likes, and descriptions.
  • include_channel_details (boolean, default true): When true, captures the channel's About information.

Date Filtering

  • oldest_post_date (string, default null, format YYYY-MM-DD): Earliest upload date to include (e.g., "2024-01-01").
  • newest_post_date (string, default null, format YYYY-MM-DD): Latest upload date to include (e.g., "2025-12-31").
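The output records the age of a video as a number of days (time_text), not as a calendar date. A date filter therefore has to estimate each upload date first. This sketch assumes two things: the upload date is approximated as the processing date minus time_text days, and both bounds are inclusive. Neither is confirmed by the Actor's documentation, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def estimated_upload_date(days_since_upload: int, processed_on: date) -> date:
    """Approximate an upload date from a 'days ago' value such as time_text."""
    return processed_on - timedelta(days=days_since_upload)


def within_date_range(
    upload_date: date,
    oldest_post_date: Optional[str] = None,
    newest_post_date: Optional[str] = None,
) -> bool:
    """True if upload_date lies within the optional YYYY-MM-DD bounds (inclusive, assumed)."""
    if oldest_post_date and upload_date < date.fromisoformat(oldest_post_date):
        return False
    if newest_post_date and upload_date > date.fromisoformat(newest_post_date):
        return False
    return True
```

Relative labels such as "1 year ago" are coarse, so videos near a bound may be included or excluded slightly differently than their exact upload dates would suggest.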

Advanced Features

  • max_comments_per_video (integer, default 10, range 1-100): Maximum number of comments to extract per video.
  • extract_duration (boolean, default true): Extract the video duration in seconds and in HH:MM:SS form (duration_iso).
  • calculate_engagement (boolean, default true): Calculate engagement metrics (rate, score, category).
  • extract_transcript (boolean, default true): Extract the video transcript/subtitles when available.

Integration

  • webhook_url (string, default ""): Optional Make.com webhook URL that receives the scraped data.
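The defaults and ranges in the parameter lists above can be collected into a small input resolver. This is a sketch and not the Actor's actual input schema handling. resolve_input is hypothetical, and rejecting out-of-range integers (rather than clamping them) is an assumption. The channel field is left untouched because the lists do not say how channel and channels interact.

```python
from typing import Any, Dict, Tuple

# (default, minimum, maximum) for each integer parameter, as documented above.
INT_PARAMS: Dict[str, Tuple[int, int, int]] = {
    "max_videos": (3, 1, 1000),
    "batch_size": (1, 1, 50),
    "delay_between_batches": (100, 50, 5000),
    "max_comments_per_video": (10, 1, 100),
}

# Boolean switches and their documented defaults.
BOOL_PARAMS: Dict[str, bool] = {
    "get_details": True,
    "include_channel_details": True,
    "extract_duration": True,
    "calculate_engagement": True,
    "extract_transcript": True,
}

# Remaining optional fields and their documented defaults.
OTHER_DEFAULTS: Dict[str, Any] = {
    "channels": [],
    "oldest_post_date": None,
    "newest_post_date": None,
    "webhook_url": "",
}


def resolve_input(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fill documented defaults and reject integers outside the documented ranges."""
    resolved = dict(raw)
    for name, (default, low, high) in INT_PARAMS.items():
        value = resolved.setdefault(name, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{name} must be an integer in [{low}, {high}], got {value!r}")
    for name, default in BOOL_PARAMS.items():
        resolved.setdefault(name, default)
    for name, default in OTHER_DEFAULTS.items():
        # Copy list defaults so separate calls never share one list object.
        resolved.setdefault(name, list(default) if isinstance(default, list) else default)
    return resolved
```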

Example Input Configurations

Minimal Configuration

{
  "channel": "https://www.youtube.com/@Dior",
  "max_videos": 10
}

Multiple Channels

{
  "channels": [
    "https://www.youtube.com/@Dior",
    "https://www.youtube.com/@Nike",
    "https://www.youtube.com/@Adidas"
  ],
  "max_videos": 50,
  "batch_size": 10
}

With Date Filtering

{
  "channel": "https://www.youtube.com/@Dior",
  "max_videos": 100,
  "oldest_post_date": "2024-01-01",
  "newest_post_date": "2025-12-31"
}

Full Configuration

{
  "channel": "https://www.youtube.com/@Dior",
  "max_videos": 100,
  "oldest_post_date": "2024-01-01",
  "newest_post_date": "2025-12-31",
  "max_comments_per_video": 50,
  "extract_duration": true,
  "calculate_engagement": true,
  "extract_transcript": true,
  "get_details": true,
  "webhook_url": "https://hook.make.com/your-webhook-id"
}

📤 Output Format

The Actor outputs structured JSON data to an Apify Dataset. Each video produces a comprehensive object:

{
  "id": "dYD98Y26DGA",
  "title": "#DiorSummer26 campaign featuring Louis Garrel",
  "url": "https://www.youtube.com/shorts/dYD98Y26DGA",
  "views": 11212,
  "time_text": 365,
  "hashtags": ["diorsummer26", "fashion"],
  "description": "Clean video description without hashtags...",
  "comments_count": 5,
  "likes_count": 25,
  "comments": [
    {
      "user_id": "username",
      "user_href": "https://www.youtube.com/@username",
      "text": "Great video!"
    }
  ],
  "duration_seconds": 26,
  "duration_iso": "00:00:26",
  "duration_formatted": "0:26",
  "engagement_rate": 2.5,
  "engagement_score": 75,
  "engagement_category": "high",
  "likes_per_view": 0.000025,
  "comments_per_view": 0.00015,
  "transcript": "Full transcript text if available...",
  "transcript_available": true,
  "transcript_language": "en",
  "batch_id": "batch_1",
  "processed_at": "2026-01-14T08:18:32.000Z",
  "input_channel_url": "https://www.youtube.com/@Dior",
  "channel_index": 0,
  "channel_total": 1,
  "webhook_sent": true
}

Output Fields

Basic Video Information:

  • id, title, url, views, time_text (days since upload)

Content Information:

  • hashtags (array), description, comments_count, likes_count

Comments Data:

  • comments (array with user_id, user_href, text)

Duration Information:

  • duration_seconds, duration_iso, duration_formatted

Engagement Metrics:

  • engagement_rate, engagement_score, engagement_category, likes_per_view, comments_per_view

Transcript Information:

  • transcript, transcript_available, transcript_language

Metadata:

  • batch_id, processed_at, input_channel_url, channel_index, channel_total, webhook_sent
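The three duration fields are different renderings of one length. In the sample, 26 seconds appears as "00:00:26" and "0:26". This sketch derives all three from a number of seconds. duration_fields is hypothetical, and the H:MM:SS form for videos of an hour or more is an assumption based on YouTube's usual display.

```python
from typing import Dict, Union


def duration_fields(total_seconds: int) -> Dict[str, Union[int, str]]:
    """Build duration_seconds, duration_iso (HH:MM:SS) and duration_formatted from seconds."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Zero-padded clock form, matching the sample's "00:00:26".
    clock = f"{hours:02d}:{minutes:02d}:{seconds:02d}"
    # Display form: M:SS under an hour, H:MM:SS otherwise (assumed).
    if hours:
        formatted = f"{hours}:{minutes:02d}:{seconds:02d}"
    else:
        formatted = f"{minutes}:{seconds:02d}"
    return {
        "duration_seconds": total_seconds,
        "duration_iso": clock,
        "duration_formatted": formatted,
    }
```

The zero-padded HH:MM:SS value is a clock-style time. If a tool expects an ISO 8601 duration, 26 seconds would be written as PT26S instead.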

๐Ÿ”ง Technical Features

Advanced Anti-Blocking System

  • Consent page detection: Automatically detects and handles YouTube consent popups
  • Content validation: Identifies blocked or empty pages with intelligent fallback
  • HTTP headers optimization: Mimics real browser behavior to avoid detection
  • Retry mechanisms: Implements smart retry logic for failed page loads
  • Stealth mode: Uses puppeteer-extra-plugin-stealth to mask common headless-browser fingerprints
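The retry mechanism listed above is not specified in detail. One common approach is exponential backoff, shown here as a generic sketch. with_retries is a hypothetical helper, and the attempt count and delays are illustrative, not the Actor's settings. The sleep function is injectable so that the behavior can be checked without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action; on any exception wait 1x, 2x, 4x ... base_delay_s and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            # Give up and re-raise the last error once all attempts are used.
            if attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```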

Robust Data Extraction

  • Dual extraction strategy:
    • Primary: Extract from individual video pages
    • Fallback: Use data from channel grid view
  • Smart parsing functions:
    • Views: "11K views" → 11000, "1.2M views" → 1200000
    • Time: "2 weeks ago" → 14, "1 year ago" → 365
  • Error handling: Graceful fallback for failed extractions
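The two parsing rules above can be sketched with regular expressions. parse_views and parse_days_ago are hypothetical names. The unit table assumes a month is 30 days, and treats seconds, minutes, and hours as 0 days. The examples in the list above ("2 weeks" as 14, "1 year" as 365) are consistent with these multipliers, but the Actor's exact rules are not published. Decimal avoids floating-point surprises with values like "1.2M".

```python
import re
from decimal import Decimal
from typing import Optional

_VIEW_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_VIEWS_RE = re.compile(r"(\d[\d.,]*)\s*([KMB]?)", re.IGNORECASE)

# Days per unit; month = 30 days is an assumption.
_DAYS_PER_UNIT = {"second": 0, "minute": 0, "hour": 0, "day": 1, "week": 7, "month": 30, "year": 365}
_AGO_RE = re.compile(r"(\d+)\s+(second|minute|hour|day|week|month|year)s?\s+ago", re.IGNORECASE)


def parse_views(text: str) -> int:
    """Convert labels like '11K views' or '1.2M views' to an integer view count."""
    match = _VIEWS_RE.search(text)
    if not match:
        return 0  # e.g. "No views"
    number = Decimal(match.group(1).replace(",", ""))
    return int(number * _VIEW_MULTIPLIERS[match.group(2).upper()])


def parse_days_ago(text: str) -> Optional[int]:
    """Convert labels like '2 weeks ago' to a number of days, or None if unrecognized."""
    match = _AGO_RE.search(text)
    if not match:
        return None
    return int(match.group(1)) * _DAYS_PER_UNIT[match.group(2).lower()]
```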

YouTube Shorts Optimization

  • Grid-based extraction: Leverages Shorts grid view for reliable metadata
  • Adaptive timeouts: Different timeout strategies for Shorts vs. regular videos
  • Content detection: Identifies and handles Shorts-specific page structures
  • Automatic /shorts appending: Appends /shorts to channel URLs that do not already point to the Shorts tab
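Pointing any supported channel URL at its Shorts tab can be done by keeping only the channel identifier and adding /shorts. This sketch assumes full URLs in the @username, /channel/ID, /c/name, or /user/name forms. It adds https:// when the scheme is missing, replaces an existing tab such as /videos, and drops query strings. to_shorts_url is a hypothetical helper, not the Actor's code.

```python
from urllib.parse import urlsplit, urlunsplit


def to_shorts_url(channel_url: str) -> str:
    """Return the Shorts-tab URL for an @handle, /channel/, /c/ or /user/ channel URL."""
    url = channel_url.strip()
    if "://" not in url:
        url = "https://" + url  # e.g. "www.youtube.com/@Dior"
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if not segments:
        raise ValueError(f"No channel identifier in {channel_url!r}")
    # "@handle" is one segment; "channel/ID", "c/name" and "user/name" are two.
    base_len = 1 if segments[0].startswith("@") else 2
    path = "/" + "/".join(segments[:base_len] + ["shorts"])
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```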

๐Ÿ”— Make.com Integration

This Actor is optimized for Make.com (formerly Integromat) integration, providing real-time data delivery and efficient workflow automation.

Make.com Features

  • Real-time webhook delivery - Send results directly to Make.com as they're processed
  • Batch processing - Handle large volumes with configurable batch sizes
  • Rate limiting control - Adjustable delays to respect Make.com and YouTube rate limits
  • Enhanced output format - Includes batch IDs, timestamps, and webhook status

Setup Instructions

  1. Create a webhook in Make.com - Add a Webhook module to your scenario
  2. Copy the webhook URL - Paste the generated URL into the Actor's webhook_url input
  3. Configure batch settings - Adjust batch_size and delay_between_batches as needed
  4. Handle incoming data - Process the webhook data in your real-time workflows
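On the sending side, a webhook delivery is an HTTP POST to the URL from step 2. Make.com custom webhooks accept JSON bodies. The payload shape here (batch_id, count, items) is a hypothetical example, not the Actor's documented format. build_webhook_request and send_batch are illustrative helpers that use only the standard library. The boolean that send_batch returns corresponds to what the output's webhook_sent field reports.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_webhook_request(webhook_url: str, batch_id: str, items: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build a JSON POST request carrying one batch of scraped videos (payload shape assumed)."""
    payload = {"batch_id": batch_id, "count": len(items), "items": items}
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_batch(webhook_url: str, batch_id: str, items: List[Dict[str, Any]], timeout_s: float = 30.0) -> bool:
    """Send the batch and report success, for example to populate webhook_sent."""
    request = build_webhook_request(webhook_url, batch_id, items)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return 200 <= response.status < 300
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
```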

Workflow Examples

  • Content Monitoring → YouTube Scraper → Webhook → Slack Notifications
  • Data Analysis → YouTube Scraper → Webhook → AI Analysis → Database
  • Content Curation → YouTube Scraper → Webhook → Content Filter → Social Media
  • Engagement Tracking → YouTube Scraper → Webhook → Analytics Dashboard

For detailed Make.com integration instructions, see ./MAKE_INTEGRATION.md.

โš™๏ธ How It Works

Channel Page Processing

The Actor navigates to the provided YouTube channel URL (both /videos and /shorts URLs are accepted) and appends /shorts if needed. It then scrolls down the Shorts grid to load as many videos as max_videos requests.
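The scroll-to-load step amounts to a loop with two exits. It stops when enough videos are visible, or when scrolling stops revealing new ones. This sketch abstracts the browser behind two callables so the loop logic stands alone. collect_until, load_visible, scroll, and the idle-scroll limit are all hypothetical; in the Actor these would be Puppeteer page operations.

```python
from typing import Callable, List, Set


def collect_until(
    load_visible: Callable[[], List[str]],
    scroll: Callable[[], None],
    max_videos: int,
    max_idle_scrolls: int = 3,
) -> List[str]:
    """Collect unique video URLs, scrolling until max_videos are found or the grid stops growing."""
    ordered: List[str] = []
    seen: Set[str] = set()
    idle = 0
    while len(ordered) < max_videos and idle < max_idle_scrolls:
        before = len(ordered)
        for url in load_visible():
            if url not in seen:
                seen.add(url)
                ordered.append(url)
        # Count consecutive scrolls that revealed nothing new.
        idle = idle + 1 if len(ordered) == before else 0
        if len(ordered) < max_videos:
            scroll()
    return ordered[:max_videos]
```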

Video Processing

For each video, the Actor:

  1. Opens the video page
  2. Waits for metadata and comments to load
  3. Extracts comprehensive data including comments, transcripts, and engagement metrics
  4. Falls back to grid data if page extraction fails
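Step 4 can be read as a field-by-field merge. Values scraped from the video page win, and anything the page did not provide falls back to the channel grid. This is a sketch under that interpretation. merge_with_grid_fallback is hypothetical, and it treats None as a failed extraction.

```python
from typing import Any, Dict, Optional


def merge_with_grid_fallback(page_data: Optional[Dict[str, Any]], grid_data: Dict[str, Any]) -> Dict[str, Any]:
    """Prefer non-None values from the video page; keep grid values for everything else."""
    merged = dict(grid_data)
    # If the page failed entirely (page_data is None), the grid data is returned unchanged.
    for key, value in (page_data or {}).items():
        if value is not None:
            merged[key] = value
    return merged
```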

Data Transformation

The Actor converts:

  • View counts: "11K views" → 11000, "1.2M views" → 1200000
  • Time descriptions: "2 weeks ago" → 14 days, "1 year ago" → 365 days
  • Engagement metrics: Calculates rate, score, and category automatically
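The README does not document the engagement formulas, so this is only an illustration of the idea. likes_per_view and comments_per_view are taken as simple ratios, as their names suggest. The engagement_rate formula, (likes + comments) / views x 100, and the category thresholds (5% and 1%) are assumptions. engagement_score is omitted because nothing describes how it is computed. Numbers from this sketch will not necessarily match the Actor's output.

```python
from typing import Dict, Union


def engagement_metrics(views: int, likes: int, comments: int) -> Dict[str, Union[float, str]]:
    """Compute per-view ratios, an assumed engagement rate (%), and a hypothetical category."""
    if views <= 0:
        return {"engagement_rate": 0.0, "likes_per_view": 0.0, "comments_per_view": 0.0, "engagement_category": "low"}
    rate = (likes + comments) / views * 100
    # Hypothetical thresholds: >= 5% high, >= 1% medium, otherwise low.
    category = "high" if rate >= 5 else "medium" if rate >= 1 else "low"
    return {
        "engagement_rate": round(rate, 2),
        "likes_per_view": likes / views,
        "comments_per_view": comments / views,
        "engagement_category": category,
    }
```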

Result Storage

All scraped data is pushed to an Apify Dataset with organized views for easy access and export.

๐Ÿ“ˆ Recent Improvements (v1.4+)

New Features

  • ✅ Date Filtering - Filter videos by upload date range (YYYY-MM-DD format)
  • ✅ Video Duration Extraction - Get duration in seconds, ISO format, and original format
  • ✅ Engagement Rate Calculation - Automatic calculation of engagement metrics
  • ✅ Extended Comments - Extract up to 100 comments per video with auto-scrolling
  • ✅ Transcript Extraction - Extract video transcripts/subtitles with language detection (unique feature)
  • ✅ Multiple Channel Support - Process multiple channels in one run with channel identification

Enhanced Data Extraction

  • ✅ Likes count extraction - Now captures video likes from YouTube Shorts
  • ✅ Robust view parsing - Handles "11K views", "1.2M views", "344K views" formats
  • ✅ Time parsing improvements - Better handling of various relative time formats
  • ✅ Grid-based fallback - Uses channel grid data when individual page extraction fails
  • ✅ Structured hashtags - Extracts hashtags as a clean array
  • ✅ Clean descriptions - Separates description content from hashtags

Anti-Blocking Enhancements

  • ✅ Consent page handling - Automatically accepts YouTube consent popups
  • ✅ Content validation - Detects blocked pages and applies fallback strategies
  • ✅ HTTP headers optimization - Sends more browser-like request headers to reduce fingerprint-based detection
  • ✅ Retry mechanisms - Intelligent retry logic for failed requests

Performance Optimizations

  • ✅ Adaptive timeouts - Different strategies for Shorts vs. regular videos
  • ✅ Resource management - Better page cleanup and memory management
  • ✅ Error recovery - Graceful handling of extraction failures
  • ✅ Batch processing - Configurable batch sizes for optimal performance

⚡ Performance Tips

  • Use smaller batch sizes (5-10) for faster processing
  • Keep delay_between_batches low for speed (it defaults to 100 ms and accepts values down to 50 ms)
  • Start with fewer videos and increase gradually
  • Set get_details to false for faster, grid-only scraping
  • Use date filtering (oldest_post_date, newest_post_date) to reduce processing time

This project is intended for educational and research purposes only. The use of this Actor must comply with YouTube's Terms of Service and robots.txt policies.

You are responsible for ensuring your use case does not violate YouTube's terms. YouTube's content and trademarks are the property of YouTube, LLC. Avoid aggressive scraping that could negatively impact YouTube's infrastructure.

If you intend to use this Actor for commercial purposes, consider the official YouTube Data API for sanctioned data access.