YouTube Transcript Scraper

Pricing

from $2.50 / 1,000 results

Extract YouTube captions, timestamps, SRT, VTT, and plain text from public videos in bulk without browser automation.

Developer

太郎 山田

Maintained by Community

YouTube Transcript Bulk API

Extract transcripts from public YouTube videos in bulk. The actor is built for AI pipelines, content repurposing, subtitle export, research, and searchable video archives.

What It Does

You provide YouTube video URLs or direct video IDs. The actor fetches the public YouTube watch page, reads available caption tracks, selects the best matching language, downloads the timed transcript XML, and returns one dataset row per video.
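The README does not show how caption tracks are read or chosen. A minimal sketch follows. It assumes the watch page embeds a JSON `captionTracks` array whose entries carry `baseUrl`, `languageCode`, and `kind: "asr"` for auto-generated tracks. This is an undocumented YouTube structure that may change. The sketch extracts that array with string-aware bracket matching, then prefers a manual track in the requested language over an auto-generated one. The function names and the prefix rule (`en` also matches `en-GB`) are hypothetical, not the actor's confirmed behavior.

```typescript
// Hypothetical shape of a captionTracks entry; YouTube does not document it,
// so only the fields this sketch relies on are declared.
interface CaptionTrack {
  baseUrl: string;
  languageCode: string;
  kind?: string; // assumed: "asr" marks auto-generated captions
}

// Find the "captionTracks" JSON array in raw watch-page HTML and parse it.
// String-aware bracket matching avoids needing a browser or DOM.
function extractCaptionTracks(html: string): CaptionTrack[] {
  const key = '"captionTracks":';
  const keyPos = html.indexOf(key);
  if (keyPos === -1) return [];
  let i = keyPos + key.length;
  while (html[i] === " ") i++;
  if (html[i] !== "[") return [];
  const begin = i;
  let depth = 0;
  let inString = false;
  for (; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "[" || ch === "{") {
      depth++;
    } else if (ch === "]" || ch === "}") {
      depth--;
      if (depth === 0) {
        return JSON.parse(html.slice(begin, i + 1)) as CaptionTrack[];
      }
    }
  }
  return []; // unbalanced brackets: treat as "no captions"
}

// Prefer a manual track in the requested language; fall back to an
// auto-generated one only when includeAutoGenerated is true.
function selectTrack(
  tracks: CaptionTrack[],
  language: string,
  includeAutoGenerated: boolean,
): CaptionTrack | undefined {
  const inLanguage = tracks.filter(
    (t) => t.languageCode === language || t.languageCode.startsWith(language + "-"),
  );
  const manual = inLanguage.find((t) => t.kind !== "asr");
  if (manual) return manual;
  return includeAutoGenerated ? inLanguage.find((t) => t.kind === "asr") : undefined;
}
```

Returning an empty array rather than throwing lets the caller turn "no tracks" into a captionless error row.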

The launch implementation is HTTP-first and uses no browser automation, which keeps Apify hosting costs low and pricing predictable.
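Because the actor avoids browser automation, the timed transcript XML has to be parsed without a DOM. The sketch below assumes the legacy timed-text layout, `<text start="0.5" dur="2.1">caption</text>`, with times in seconds. `parseTimedText`, `decodeEntities`, and the `Segment` shape are hypothetical names. Regex parsing is a deliberate simplification that only handles this flat layout.

```typescript
// Hypothetical segment shape; times are in seconds.
interface Segment {
  start: number;
  duration: number;
  text: string;
}

const NAMED_ENTITIES: Record<string, string> = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'",
};

// Decode XML character references in a single pass, so "&amp;lt;" becomes "&lt;", not "<".
function decodeEntities(s: string): string {
  return s.replace(/&(#\d+|#x[0-9a-fA-F]+|amp|lt|gt|quot|apos);/g, (match: string, ref: string) => {
    if (ref.startsWith("#")) {
      const cp = ref.startsWith("#x") ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
      return cp <= 0x10ffff ? String.fromCodePoint(cp) : match; // leave invalid code points as-is
    }
    return NAMED_ENTITIES[ref] ?? match;
  });
}

// Assumed legacy layout: <transcript><text start="0.5" dur="2.1">caption</text>...</transcript>
// The attribute group may not end in "/", so empty self-closing <text .../> elements are skipped.
function parseTimedText(xml: string): Segment[] {
  const segments: Segment[] = [];
  const textEl = /<text\b([^>]*[^/>])?>([\s\S]*?)<\/text>/g;
  let m: RegExpExecArray | null;
  while ((m = textEl.exec(xml)) !== null) {
    const attrs = m[1] ?? "";
    const start = Number(/\bstart="([^"]*)"/.exec(attrs)?.[1] ?? "0");
    const duration = Number(/\bdur="([^"]*)"/.exec(attrs)?.[1] ?? "0");
    // Collapse internal whitespace and line breaks into single spaces.
    const text = decodeEntities(m[2] ?? "").replace(/\s+/g, " ").trim();
    if (text !== "") segments.push({ start, duration, text });
  }
  return segments;
}
```

Joining the segment texts with spaces gives a plain `fullText`, and the array length gives `segmentCount`.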

Input

  • videoUrls: YouTube watch, Shorts, embed, live, or youtu.be URLs.
  • videoIds: Direct 11-character YouTube video IDs.
  • language: Preferred caption language code, such as en or ja.
  • includeAutoGenerated: Allows auto-generated captions when manual captions are not available.
  • translationLanguage: Optional target language for YouTube's built-in transcript translation.
  • outputFormat: json, text, srt, or vtt.
  • maxVideos: Maximum number of videos to process.
  • dryRun: Validate input and emit preview rows without fetching YouTube.

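The URL forms listed under videoUrls can be normalized to 11-character IDs before any request is made. This hypothetical `normalizeInputs` helper is only a sketch. It assumes IDs use only `A-Z`, `a-z`, `0-9`, `_`, and `-`. It merges videoUrls and videoIds, drops duplicates, and stops at maxVideos. Whether the actor deduplicates or collects unparseable inputs this way is an assumption.

```typescript
// Assumed ID alphabet: 11 characters from A-Z, a-z, 0-9, "_" and "-".
const BARE_ID = /^[A-Za-z0-9_-]{11}$/;
const ID_GROUP = "([A-Za-z0-9_-]{11})(?![A-Za-z0-9_-])";
const URL_PATTERNS: RegExp[] = [
  // watch?v=ID, also when v is not the first query parameter
  new RegExp("^(?:https?://)?(?:www\\.|m\\.)?youtube\\.com/watch\\?(?:[^#]*&)?v=" + ID_GROUP),
  // Shorts, embed, and live paths
  new RegExp("^(?:https?://)?(?:www\\.|m\\.)?youtube\\.com/(?:shorts|embed|live)/" + ID_GROUP),
  // youtu.be short links
  new RegExp("^(?:https?://)?youtu\\.be/" + ID_GROUP),
];

// Return the video ID for a bare ID or a supported URL, or null if neither matches.
function extractVideoId(input: string): string | null {
  const value = input.trim();
  if (BARE_ID.test(value)) return value;
  for (const pattern of URL_PATTERNS) {
    const m = pattern.exec(value);
    if (m && m[1]) return m[1];
  }
  return null;
}

// Merge both input lists, drop duplicate IDs, keep at most maxVideos,
// and collect entries that could not be parsed.
function normalizeInputs(
  videoUrls: string[],
  videoIds: string[],
  maxVideos: number,
): { ids: string[]; invalid: string[] } {
  const ids: string[] = [];
  const invalid: string[] = [];
  const seen = new Set<string>();
  for (const raw of [...videoUrls, ...videoIds]) {
    if (ids.length >= maxVideos) break;
    const id = extractVideoId(raw);
    if (id === null) {
      invalid.push(raw);
    } else if (!seen.has(id)) {
      seen.add(id);
      ids.push(id);
    }
  }
  return { ids, invalid };
}
```

Keeping unparseable entries separate makes it easy to report them in a dryRun preview or as error rows.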
Output

Each video produces one dataset row with these fields:

  • videoId
  • videoUrl
  • status
  • language
  • sourceLanguage
  • isAutoGenerated
  • segmentCount
  • fullText
  • segments
  • formattedTranscript
  • errorCode
  • errorMessage
  • scrapedAt
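For the srt and vtt output formats, formattedTranscript can be derived from segments. The sketch assumes each segment carries `start` and `duration` in seconds; these field names are hypothetical. It encodes the main format differences: SRT numbers each cue and puts a comma before the milliseconds. WebVTT starts with a `WEBVTT` header and uses a period.

```typescript
// Hypothetical segment shape; times are in seconds.
interface Segment {
  start: number;
  duration: number;
  text: string;
}

// HH:MM:SS plus milliseconds; SRT uses "," before the milliseconds, WebVTT uses ".".
function formatTimestamp(seconds: number, msSeparator: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const hours = Math.floor(totalMs / 3600000);
  const minutes = Math.floor((totalMs % 3600000) / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(secs, 2)}${msSeparator}${pad(ms, 3)}`;
}

// Build a formattedTranscript string for the "srt" or "vtt" output format.
function formatTranscript(segments: Segment[], format: "srt" | "vtt"): string {
  const sep = format === "srt" ? "," : ".";
  const cues = segments.map((seg, index) => {
    const timing = `${formatTimestamp(seg.start, sep)} --> ${formatTimestamp(seg.start + seg.duration, sep)}`;
    // SRT cues are numbered from 1; WebVTT cue identifiers are optional and omitted here.
    return format === "srt" ? `${index + 1}\n${timing}\n${seg.text}` : `${timing}\n${seg.text}`;
  });
  const body = cues.join("\n\n") + "\n";
  return format === "vtt" ? `WEBVTT\n\n${body}` : body;
}
```

Rounding to whole milliseconds before splitting into fields avoids floating-point drift such as 2.6 seconds rendering as 2.599.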

Unavailable captions, deleted or private videos, and request failures are returned as error rows instead of failing the whole run. This follows Apify pay-per-event (PPE) best practice, because the actor still performed work for that input.
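The error-row policy amounts to catching failures per video rather than per run. In this sketch, `fetchTranscript` stands in for the real fetch-and-parse step. The error codes `NO_CAPTIONS` and `REQUEST_FAILED`, and the status values `success` and `error`, are hypothetical, since the README does not list the actual values.

```typescript
// Hypothetical subset of the output row; remaining fields are omitted.
interface TranscriptRow {
  videoId: string;
  videoUrl: string;
  status: "success" | "error";
  fullText: string | null;
  errorCode: string | null;
  errorMessage: string | null;
  scrapedAt: string;
}

function watchUrl(videoId: string): string {
  return `https://www.youtube.com/watch?v=${videoId}`;
}

// Use a string `code` property when the thrown value has one.
function errorCodeOf(err: unknown): string {
  if (typeof err === "object" && err !== null && "code" in err) {
    const code = (err as { code: unknown }).code;
    if (typeof code === "string") return code;
  }
  return "REQUEST_FAILED"; // hypothetical catch-all code
}

// Turn any thrown value into an error row instead of aborting the run.
function toErrorRow(videoId: string, err: unknown): TranscriptRow {
  return {
    videoId,
    videoUrl: watchUrl(videoId),
    status: "error",
    fullText: null,
    errorCode: errorCodeOf(err),
    errorMessage: err instanceof Error ? err.message : String(err),
    scrapedAt: new Date().toISOString(),
  };
}

// One row per input video, whether or not its fetch succeeded.
async function processVideos(
  videoIds: string[],
  fetchTranscript: (videoId: string) => Promise<string>,
): Promise<TranscriptRow[]> {
  const rows: TranscriptRow[] = [];
  for (const videoId of videoIds) {
    try {
      const fullText = await fetchTranscript(videoId);
      rows.push({
        videoId,
        videoUrl: watchUrl(videoId),
        status: "success",
        fullText,
        errorCode: null,
        errorMessage: null,
        scrapedAt: new Date().toISOString(),
      });
    } catch (err) {
      rows.push(toErrorRow(videoId, err));
    }
  }
  return rows;
}
```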

Pricing

Recommended PPE pricing at launch:

  • apify-actor-start: keep the Apify default of $0.00005.
  • apify-default-dataset-item: $0.0025 per transcript row.
  • Optional future event for enriched or translated rows: $0.008 per row.

The current cost model assumes plain HTTP requests, no browser, and no residential proxy. Publication should stay blocked if live cost probes show that a residential proxy is required.
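As a rough check on the listed price, assume one apify-actor-start event per run and one apify-default-dataset-item event per pushed row, error rows included. Both are assumptions about how the events are counted. The user-facing cost of a run that pushes $N$ rows is then:

```latex
\begin{aligned}
C(N)    &= \$0.00005 + N \times \$0.0025 \\
C(1000) &= \$0.00005 + \$2.50 = \$2.50005
\end{aligned}
```

The per-row event dominates, which is consistent with the store listing of $2.50 per 1,000 results.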

Limits

  • Only public videos with public caption tracks are supported.
  • Age-restricted, private, deleted, or captionless videos return an error row.
  • YouTube may change the shape of its watch-page payload, so a canary should run daily against a known captioned video.
  • Channel and playlist expansion is intentionally out of scope for v1. Add it only after transcript extraction shows a 30-day revenue signal.
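A daily canary could fetch one known captioned video and compare the resulting row against expectations. This `checkCanary` helper is hypothetical. It assumes a successful row has `status` equal to `success`, and the expected phrase would be taken from the canary video's known transcript.

```typescript
// Hypothetical subset of an output row used by the canary.
interface CanaryRow {
  status: string;
  segmentCount: number;
  fullText: string | null;
  errorCode: string | null;
}

// Returns human-readable problems; an empty array means the canary passed.
function checkCanary(row: CanaryRow, expectedPhrase: string, minSegments = 1): string[] {
  const problems: string[] = [];
  if (row.status !== "success") {
    problems.push(`status is ${row.status} (errorCode: ${row.errorCode ?? "none"})`);
  }
  if (row.segmentCount < minSegments) {
    problems.push(`segmentCount ${row.segmentCount} is below ${minSegments}`);
  }
  // Case-insensitive containment check against the known transcript phrase.
  const text = (row.fullText ?? "").toLowerCase();
  if (!text.includes(expectedPhrase.toLowerCase())) {
    problems.push(`fullText does not contain "${expectedPhrase}"`);
  }
  return problems;
}
```

A scheduled run could alert whenever the returned array is non-empty, which is the early signal that the watch-page payload may have changed.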

Local Run

npm test
npm start

The default input.json uses dryRun: true so local startup does not depend on live YouTube access.