ScrapeClaw - Youtube_Scraper
Pricing
from $1.00 / actor start
Part of ScrapeClaw (https://scrapeclaw.cc/) — a suite of production-ready, agentic social media scrapers for Instagram, YouTube, X/Twitter, TikTok, and Facebook. Built with Python & Playwright. No API keys required.
Developer

Scrapeclaw
Last modified
14 days ago
📺 YouTube Channel Scraper
A resilient, detection-resistant scraper for YouTube channel metadata. It automates discovery and deep scraping of YouTube channels without official API keys or user authentication.
🚀 Features
- 🔍 Smart Discovery: Find channels by category and location using advanced Google Search and YouTube discovery techniques.
- 📊 Deep Scraping: Extract comprehensive metadata, including:
  - Subscriber counts, total views, and video counts.
  - Channel descriptions, joined dates, and verified status.
  - Recent video uploads with thumbnails and metadata.
  - External social links and location info.
- 🛡️ Anti-Detection: Built-in human-like behavior simulation (random mouse movements, scroll behavior) and custom user-agent rotation to minimize bot detection.
- 🖼️ Media Handling: Automatic downloading and resizing (JPEG compression) of profile pictures, banners, and video thumbnails.
- 🔄 Robust Orchestration: State-managed pipeline with auto-resume, failure recovery, and checkpointing for large-scale scraping operations.
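The anti-detection behavior described above could look roughly like the sketch below. It is an illustration, not the scraper's actual code: `USER_AGENTS`, `pick_user_agent`, `plan_human_actions`, and `apply_actions` are hypothetical names, and the timing ranges are guesses. The `page` argument is assumed to be a Playwright sync-API page, whose `mouse.move(x, y, steps=...)` and `mouse.wheel(delta_x, delta_y)` methods are the only calls used.

```python
import random

# Hypothetical pool of user-agent strings (assumption, not from the project).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
]


def pick_user_agent(rng=random):
    """Choose one user agent at random for a new browser context."""
    return rng.choice(USER_AGENTS)


def plan_human_actions(width, height, moves=3, scrolls=2, rng=random):
    """Build a randomized list of mouse moves, scrolls, and pauses."""
    actions = []
    for _ in range(moves):
        x = rng.randint(0, width - 1)
        y = rng.randint(0, height - 1)
        actions.append(("move", x, y, rng.randint(5, 25)))
        actions.append(("pause", rng.uniform(0.2, 1.0)))
    for _ in range(scrolls):
        actions.append(("scroll", rng.randint(200, 800)))
        actions.append(("pause", rng.uniform(0.5, 2.0)))
    return actions


def apply_actions(page, actions, sleep):
    """Replay planned actions on a Playwright sync-API page."""
    for action in actions:
        kind = action[0]
        if kind == "move":
            _, x, y, steps = action
            page.mouse.move(x, y, steps=steps)  # steps > 1 gives a gradual move
        elif kind == "scroll":
            page.mouse.wheel(0, action[1])  # vertical scroll in pixels
        else:
            sleep(action[1])
```

With Playwright, the chosen string would typically be passed as `browser.new_context(user_agent=pick_user_agent())`, and `time.sleep` would be supplied as the `sleep` argument. Separating planning from replay keeps the random logic testable without a browser.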
📦 Installation
- Clone the repository:
  $ git clone https://github.com/yourusername/youtube-scrapper.git
  $ cd youtube-scrapper
- Install dependencies:
  $ pip install playwright aiohttp python-dotenv Pillow tqdm
- Set up Playwright:
  $ playwright install chromium
🛠️ Usage
1. Channel Discovery
Find channel handles/URLs based on niche and location. This generates a queue file in data/queue/.
$ python youtube_channel_discovery.py --categories "tech" --locations "India"
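To illustrate how discovered URLs might be reduced to a queue of channels, here is a minimal sketch. It assumes search results arrive as plain URL strings and that the queue is a JSON file with a `channels` list. The actual queue schema is not documented here, so `extract_channels`, `write_queue`, the payload fields, and the file naming are all hypothetical.

```python
import json
import re
from pathlib import Path

# Two common channel URL shapes: /@handle and /channel/UC<22 chars>.
CHANNEL_RE = re.compile(
    r"youtube\.com/(?:(@[A-Za-z0-9._-]+)|channel/(UC[A-Za-z0-9_-]{22}))"
)


def extract_channels(urls):
    """Return unique channel handles or IDs, in first-seen order."""
    seen, result = set(), []
    for url in urls:
        match = CHANNEL_RE.search(url)
        if not match:
            continue  # not a channel URL (e.g. a watch page)
        ident = match.group(1) or match.group(2)
        # Handles are compared case-insensitively; channel IDs are not.
        key = ident.lower() if ident.startswith("@") else ident
        if key not in seen:
            seen.add(key)
            result.append(ident)
    return result


def write_queue(channels, category, location, queue_dir="data/queue"):
    """Write a queue file (hypothetical schema) and return its path."""
    directory = Path(queue_dir)
    directory.mkdir(parents=True, exist_ok=True)
    out = directory / f"{category}_{location}.json"
    payload = {"category": category, "location": location, "channels": channels}
    out.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return out
```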
2. Detailed Scraping
Process a queue file to extract detailed metadata for each channel.
$ python youtube_channel_scraper.py --queue data/queue/your_queue_file.json
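Channel pages usually display abbreviated counts such as "1.2M subscribers", so a scraper needs to normalize them before storing numbers. The helper below is a sketch of one way to do that. `parse_count` is a hypothetical name, and the code assumes English-language suffixes (K, M, B) and comma thousands separators.

```python
import re

_MULTIPLIER = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

# A number with optional thousands separators and decimals, then an optional
# K/M/B suffix that must end at a word boundary (so "120 members" stays 120).
_COUNT_RE = re.compile(r"(\d+(?:,\d{3})*(?:\.\d+)?)\s*([KMB]?)\b", re.IGNORECASE)


def parse_count(text):
    """Convert '1.2M subscribers' or '3,456 videos' to an int, else None."""
    match = _COUNT_RE.search(text)
    if not match:
        return None
    number = float(match.group(1).replace(",", ""))
    return int(round(number * _MULTIPLIER[match.group(2).upper()]))
```

Abbreviated values are lossy: "1.2M" could stand for anything from roughly 1,150,000 to 1,249,999, so the parsed integer is an approximation.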
3. Full Pipeline (Orchestrator)
Run the full pipeline, from discovery through the completed scrape, using a config file.
$ python youtube_orchestrator.py --config config/scraper_config.json
⚙️ Configuration
The scraper behavior can be fine-tuned via JSON configuration files in the config/ directory:
| Setting | Description |
|---|---|
| max_discovery_retries | Number of times to retry Google Search results. |
| max_videos_to_scrape | Limit for recent video metadata collection per channel. |
| delay_between_channels | Random range for sleep time between channel visits. |
| headless | Set to true for background operation, false for visual monitoring. |
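As a rough illustration of how these settings might be consumed, the sketch below merges a JSON config over defaults and draws a random delay. The setting names come from the table above. Everything else is an assumption: the default values, the `[min, max]` list shape for `delay_between_channels`, and the helper names `load_config` and `next_delay`.

```python
import json
import random

# Assumed defaults; the project's real defaults are not documented here.
DEFAULTS = {
    "max_discovery_retries": 3,
    "max_videos_to_scrape": 10,
    "delay_between_channels": [2.0, 6.0],  # assumed [min, max] in seconds
    "headless": True,
}


def load_config(text):
    """Parse JSON config text and fill in missing settings from DEFAULTS."""
    user = json.loads(text)
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    config = {**DEFAULTS, **user}
    low, high = config["delay_between_channels"]
    if not 0 <= low <= high:
        raise ValueError("delay_between_channels must be [min, max], 0 <= min <= max")
    return config


def next_delay(config, rng=random):
    """Pick a sleep time uniformly within the configured range."""
    low, high = config["delay_between_channels"]
    return rng.uniform(low, high)
```

For example, `load_config('{"headless": false}')` would keep the other three defaults and only switch the browser to visible mode.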
📂 Output Structure
- data/output/: JSON files for each scraped channel.
- thumbnails/: Organized folders containing profile pics, banners, and video thumbnails.
- data/queue/: Checkpoint files for discovery results.
- data/progress/: Session state files for the orchestrator.
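The session state files in data/progress/ suggest a checkpoint-and-resume pattern. The sketch below shows one common way to implement it, using an atomic write so an interrupted run never leaves a half-written file. The state schema (`done` and `failed` lists) and the function names are hypothetical, not the orchestrator's actual format.

```python
import json
import os
from pathlib import Path


def save_progress(state, path):
    """Write state to a temp file, then atomically swap it into place."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem


def load_progress(path):
    """Return saved state, or a fresh state if no checkpoint exists yet."""
    path = Path(path)
    if not path.exists():
        return {"done": [], "failed": []}
    return json.loads(path.read_text(encoding="utf-8"))


def pending_channels(queue, state):
    """Channels from the queue that have not been completed yet."""
    finished = set(state["done"])
    return [channel for channel in queue if channel not in finished]
```

On restart, a run would call `load_progress`, process only `pending_channels(...)`, and call `save_progress` after each channel so that at most one channel's work is repeated after a crash.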
