ScrapeClaw - Youtube_Scraper

Under maintenance

Pricing

from $1.00 / actor start

Part of ScrapeClaw (https://scrapeclaw.cc/) — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook. Built with Python & Playwright. No API keys required.

Rating: 0.0 (0)

Developer: Scrapeclaw (Maintained by Community)

Actor stats: 0 bookmarked · 0 total users · 0 monthly active users · last modified 14 days ago

📺 YouTube Channel Scraper

Python · Playwright · License: MIT

A powerful, resilient YouTube channel metadata scraper with built-in anti-detection. It enables automated discovery and deep scraping of YouTube channels without official API keys or user authentication.


🚀 Features

  • 🔍 Smart Discovery: Find channels by category and location using advanced Google Search and YouTube discovery techniques.
  • 📊 Deep Scraping: Extract comprehensive metadata, including:
    • Subscriber counts, total views, and video counts.
    • Channel descriptions, joined dates, and verified status.
    • Recent video uploads with thumbnails and metadata.
    • External social links and location info.
  • 🛡️ Anti-Detection: Built-in human-like behavior simulation (random mouse movements, scroll behavior) and custom user-agent rotation to minimize bot detection.
  • 🖼️ Media Handling: Automatic downloading and resizing (JPEG compression) of profile pictures, banners, and video thumbnails.
  • 🔄 Robust Orchestration: State-managed pipeline with auto-resume, failure recovery, and checkpointing for large-scale scraping operations.
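The human-like behavior simulation mentioned above generally means easing the cursor along a jittered path instead of jumping straight to a target. A minimal sketch of that idea; `human_mouse_path` and its parameters are illustrative, not part of this project's actual code:

```python
import random


def human_mouse_path(start, end, steps=25):
    """Generate intermediate (x, y) points between start and end with
    small random jitter, so mouse movement looks less robotic."""
    x0, y0 = start
    x1, y1 = end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        # Ease-in-out interpolation plus a little noise on each axis.
        ease = t * t * (3 - 2 * t)
        x = x0 + (x1 - x0) * ease + random.uniform(-2, 2)
        y = y0 + (y1 - y0) * ease + random.uniform(-2, 2)
        points.append((x, y))
    points[-1] = (float(x1), float(y1))  # land exactly on the target
    return points


# With Playwright's async API, the path would be replayed roughly like:
#   for x, y in human_mouse_path((0, 0), (640, 360)):
#       await page.mouse.move(x, y)
#       await asyncio.sleep(random.uniform(0.005, 0.02))
```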

📦 Installation

  1. Clone the repository:

    git clone https://github.com/yourusername/youtube-scrapper.git
    cd youtube-scrapper
  2. Install dependencies:

    $ pip install playwright aiohttp python-dotenv Pillow tqdm
  3. Set up Playwright:

    $ playwright install chromium
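The Pillow dependency installed above is what drives the media handling feature. A minimal sketch of downscaling an image and re-encoding it as compressed JPEG; the function name and size limits are assumptions, not this project's API:

```python
from io import BytesIO

from PIL import Image


def compress_thumbnail(raw_bytes, max_size=(320, 180), quality=80):
    """Resize image bytes to fit within max_size and re-encode as JPEG."""
    img = Image.open(BytesIO(raw_bytes)).convert("RGB")
    img.thumbnail(max_size)  # in-place resize, preserves aspect ratio
    out = BytesIO()
    img.save(out, format="JPEG", quality=quality, optimize=True)
    return out.getvalue()
```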

🛠️ Usage

1. Channel Discovery

Find channel handles/URLs based on niche and location. This generates a queue file in data/queue/.

$ python youtube_channel_discovery.py --categories "tech" --locations "India"
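Discovery writes its results to a queue file in data/queue/. A hypothetical example of what such a file might contain; the field names are an assumption, not this project's actual schema:

```json
[
  {
    "handle": "@examplechannel",
    "url": "https://www.youtube.com/@examplechannel",
    "category": "tech",
    "location": "India",
    "discovered_at": "2025-01-01T00:00:00Z"
  }
]
```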

2. Detailed Scraping

Process a queue file to extract detailed metadata for each channel.

$ python youtube_channel_scraper.py --queue data/queue/your_queue_file.json
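YouTube reports counts as abbreviated strings ("1.2M subscribers", "987 views"), so a scraper like this typically normalizes them to integers. A minimal sketch of that normalization; the function name and accepted formats are assumptions:

```python
import re


def parse_count(text):
    """Convert strings like '1.2M subscribers' or '987 views' to an int."""
    match = re.search(r"([\d.,]+)\s*([KMB]?)", text, re.IGNORECASE)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    multiplier = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
    return int(number * multiplier[match.group(2).upper()])
```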

3. Full Pipeline (Orchestrator)

Run the entire journey from discovery to completed scrape using a config file.

$ python youtube_orchestrator.py --config config/scraper_config.json

⚙️ Configuration

The scraper's behavior can be fine-tuned via JSON configuration files in the config/ directory:

| Setting | Description |
| --- | --- |
| `max_discovery_retries` | Number of times to retry Google Search results. |
| `max_videos_to_scrape` | Limit on recent video metadata collected per channel. |
| `delay_between_channels` | Random range for sleep time between channel visits. |
| `headless` | Set to `true` for background operation, `false` for visual monitoring. |
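A hypothetical config/scraper_config.json combining the settings above; the exact schema is an assumption, so check the files shipped in config/ for the authoritative keys:

```json
{
  "max_discovery_retries": 3,
  "max_videos_to_scrape": 10,
  "delay_between_channels": [5, 15],
  "headless": true
}
```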

📂 Output Structure

  • data/output/: JSON files for each scraped channel.
  • thumbnails/: Organized folders containing profile pics, banners, and video thumbnails.
  • data/queue/: Checkpoint files for discovery results.
  • data/progress/: Session state files for the orchestrator.
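Once a run completes, the per-channel JSON files in data/output/ can be aggregated with a few lines of standard-library Python. An illustrative sketch; the directory layout matches the structure above, but field names inside each file depend on the scraper's actual schema:

```python
import json
from pathlib import Path


def load_channels(output_dir="data/output"):
    """Load every per-channel JSON file from the output directory."""
    channels = []
    for path in sorted(Path(output_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            channels.append(json.load(f))
    return channels
```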