Youtube Transcript

Under maintenance

Pricing: $2.50 / 1,000 results

Harvest rich YouTube metadata and English transcripts at scale with this Apify actor—perfect for SEO, content repurposing, and AI workflows. Built-in proxy support, resilient caption extraction, and multi-format outputs keep your video intelligence accurate and ready for publishing.

Rating: 0.0 (0)

Developer: codemaster devops

Maintained by Community

Last modified: 3 months ago

YouTube Transcript Downloader & Caption Scraper (Apify Actor)

Boost your SEO content strategy with clean YouTube transcripts, metadata, and captions in several output formats. This production-ready Apify actor extracts English subtitles (manual or auto-generated), gathers rich video details, and saves everything to an Apify dataset that is easy to reuse in blogs, knowledge bases, or downstream NLP workflows.

✅ Works for both long-form videos and Shorts
✅ Supports Apify Proxy groups (BUYPROXIES94952, StaticUS3, RESIDENTIAL) or custom proxy URLs
✅ Delivers transcripts as text arrays, timestamped captions, concatenated strings, and XML
✅ Includes machine-readable .actor/input_schema.json and .actor/output_schema.json
✅ Optimised README for discoverability, helping search engines and users understand the actor quickly

Why Use This Actor
Quick Start
Input Schema
Output Schema
Example Dataset Record
How It Works
SEO & Content Marketing Ideas
FAQ
Contributing & Support

Why Use This Actor

Complete metadata + captions – Fetch title, channel info, engagement counts, description, tags, thumbnail, and English subtitles in multiple formats.
Resilient transcript extraction – Tries youtube-captions-scraper first, then falls back to youtubei.js and finally to timed-text XML parsing, so auto-generated captions and changes to YouTube's page layout are still handled.
Proxy-ready – Configure Apify proxy groups or custom URLs to reduce 410/429 errors and reach region-locked content.
SEO-friendly output – Deliver transcripts the way content teams need them: arrays for bullet lists, timestamped objects for interactive players, or full-text strings for quick copy/paste.
Built for scaling – Retries transient errors, skips non-retryable responses, and stores results per video, so large batches keep moving.
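The retry behaviour described above can be pictured as a small wrapper around each request. This is a minimal sketch, not the actor's code: the helper names `isTransient` and `withRetries`, the `status` property on thrown errors, and the choice of 429 and 5xx responses as the transient ones are all assumptions. The `maxRetries` parameter mirrors the input field of the same name.

```javascript
// Hypothetical retry wrapper. Assumption: failed requests throw an Error
// carrying an HTTP `status`; 429 and 5xx are treated as transient.
function isTransient(error) {
  const status = error && error.status;
  // No status at all is treated as a network-level failure worth retrying.
  if (status === undefined) return true;
  return status === 429 || status >= 500;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetries(operation, maxRetries = 3, baseDelayMs = 500) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      // Give up immediately on non-retryable errors or when retries run out.
      if (!isTransient(error) || attempt >= maxRetries) throw error;
      // Exponential backoff: 500 ms, 1 s, 2 s, ... with the default base delay.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

With `maxRetries = 3` an operation runs at most four times (one attempt plus three retries), while a non-retryable response such as a 404 fails on the first attempt so the batch can move on to the next video.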

Quick Start

1. Run on Apify Console

Click Deploy in the Apify actor UI or run apify push.
Open the actor in Apify Console and fill in the input form (the form is generated automatically from the input schema).
Optional: choose a proxy group such as BUYPROXIES94952 for datacenter IPs or StaticUS3 for static US addresses.
Run the actor and watch the dataset populate with transcripts and metadata.

2. Run via Apify CLI

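apify login
apify push   # deploy the actor to your Apify account (configured via the .actor/ files)

# To run locally instead, place an input file (e.g. examples/input.json)
# where the actor reads its input: INPUT.json in the default key-value store
mkdir -p ./apify_storage/key_value_stores/default
cp examples/input.json ./apify_storage/key_value_stores/default/INPUT.json
APIFY_LOCAL_STORAGE_DIR=./apify_storage \
  npm start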

3. Integrate Programmatically

Use the Apify API or client libraries to trigger the actor from your app:

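import { ApifyClient } from 'apify-client';

// Reads the API token from the APIFY_TOKEN environment variable.
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('your-username/youtube-transcript').call({
  videoUrls: [
    { url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' },
    { url: 'https://youtu.be/aqz-KE-bpKQ' }
  ],
  transcriptFormat: 'all',
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ['BUYPROXIES94952']
  }
});

// Each item in the run's default dataset holds one video's metadata and transcripts.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} records`);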

Input Schema

The full JSON schema that powers the Apify input form lives at .actor/input_schema.json. Highlights:

Field	Type	Description
`videoUrls` (required)	`array` (`requestListSources` editor)	YouTube URLs or bare IDs. The actor normalises standard, short, embed, and playlist links.
`transcriptFormat`	`string`	Choose `all`, `textArray`, `textWithTimestamps`, `fullText`, or `xml`. Defaults to `all`.
`maxRetries`	`integer`	Number of retries for transient failures (default `3`).
`proxyConfiguration`	`object`	Standard Apify proxy config. Prefilled with the `BUYPROXIES94952` datacenter group for reliable scraping; swap to `StaticUS3`, `RESIDENTIAL`, or custom URLs if needed.
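The normalisation of `videoUrls` can be sketched as a small parser. This is an illustration under assumptions, not the actor's implementation: the function name `extractVideoId` is hypothetical, and it only recognises bare 11-character IDs, `watch?v=` links (any `list` playlist parameter is ignored), `youtu.be` short links, and `/embed/` and `/shorts/` paths.

```javascript
// Hypothetical normaliser: returns the 11-character video ID or null.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
  const value = String(input).trim();
  if (ID_PATTERN.test(value)) return value; // already a bare ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither a bare ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  let candidate = null;
  if (host === 'youtu.be') {
    // https://youtu.be/<id>
    candidate = url.pathname.split('/')[1];
  } else if (host === 'youtube.com' || host === 'youtube-nocookie.com') {
    const [, section, id] = url.pathname.split('/');
    if (section === 'watch') {
      // https://www.youtube.com/watch?v=<id>&list=... (playlist params ignored)
      candidate = url.searchParams.get('v');
    } else if (section === 'embed' || section === 'shorts') {
      candidate = id;
    }
  }
  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

A request-list source such as `{ "url": "https://youtu.be/aqz-KE-bpKQ" }` would be passed as `extractVideoId(source.url)`.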

Example payload (also available in examples/input.json):

{
  "videoUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
    { "url": "https://youtu.be/aqz-KE-bpKQ" }
  ],
  "transcriptFormat": "all",
  "maxRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["BUYPROXIES94952"]
  }
}

Output Schema

Every dataset item matches the JSON schema published in .actor/output_schema.json. The top-level structure is:

Field	Description
`videoId`	11-character YouTube identifier.
`url`	Original URL (or ID) submitted.
`metadata`	Rich video data: title, channel info, view/like/comment counts, publish date, description, tags, thumbnail.
`transcripts`	The transcript in the formats you requested. Contains `textArray`, `textWithTimestamps`, `fullText`, and/or `xml`. When captions are unavailable, this field is `null`.

Because the schema is machine-readable, you can quickly validate the dataset in CI or generate strongly typed DTOs for downstream services.
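As a lightweight complement to a full JSON Schema validator, a hand-rolled check of the top-level fields in the table above could serve as a CI smoke test. The helper `checkDatasetItem` is hypothetical: it encodes only the documented top-level rules and does not read `.actor/output_schema.json`, so it is not a substitute for validating against that file.

```javascript
// Hypothetical CI smoke test for the documented top-level fields only.
// Returns a list of problems; an empty list means the item looks well-formed.
const TRANSCRIPT_KEYS = ['textArray', 'textWithTimestamps', 'fullText', 'xml'];

function checkDatasetItem(item) {
  const problems = [];
  if (typeof item.videoId !== 'string' || item.videoId.length !== 11) {
    problems.push('videoId must be an 11-character string');
  }
  if (typeof item.url !== 'string') problems.push('url must be a string');
  if (typeof item.metadata !== 'object' || item.metadata === null) {
    problems.push('metadata must be an object');
  }

  // transcripts is documented as null when captions are unavailable.
  const transcripts = item.transcripts;
  if (transcripts !== null) {
    if (typeof transcripts !== 'object' || Array.isArray(transcripts)) {
      problems.push('transcripts must be an object or null');
    } else {
      const keys = Object.keys(transcripts);
      const unknown = keys.filter((key) => !TRANSCRIPT_KEYS.includes(key));
      if (keys.length === 0) problems.push('transcripts contains no formats');
      if (unknown.length > 0) problems.push(`unknown transcript formats: ${unknown.join(', ')}`);
    }
  }
  return problems;
}
```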

Example Dataset Record

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "metadata": {
    "videoId": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
    "channelName": "Rick Astley",
    "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
    "viewCount": 1704274503,
    "likeCount": 18591096,
    "commentCount": null,
    "publishDate": "2009-10-24T23:57:33-07:00",
    "description": "…",
    "tags": ["rick astley", "Never Gonna Give You Up", "rick roll"],
    "thumbnailUrl": "https://i.ytimg.com/vi_webp/dQw4w9WgXcQ/maxresdefault.webp"
  },
  "transcripts": {
    "textArray": ["[♪♪♪]", "♪ We're no strangers to love ♪", "…"],
    "textWithTimestamps": [
      { "start": 1.36, "duration": 1.68, "text": "[♪♪♪]" },
      { "start": 18.64, "duration": 3.24, "text": "♪ We're no strangers to love ♪" }
    ],
    "fullText": "[♪♪♪]\n♪ We're no strangers to love ♪\n…",
    "xml": "<?xml version=\"1.0\" encoding=\"utf-8\" ?><transcript>…</transcript>"
  }
}

See more examples in examples/dataset-sample.json.
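All four formats in the record above can be derived from the timestamped segments alone. The following is a sketch under assumptions: the function name `toTranscriptFormats` is hypothetical, segments are assumed to be `{ start, duration, text }` objects in seconds, and the XML layout (one `<text>` element with `start` and `dur` attributes per segment, inside `<transcript>`) is modelled on YouTube's timed-text format rather than taken from the actor's source.

```javascript
// Hypothetical formatter: builds the four documented transcript formats
// from caption segments shaped like { start, duration, text } (seconds).
function escapeXml(s) {
  return s
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

function toTranscriptFormats(segments, format = 'all') {
  const all = {
    textArray: segments.map((s) => s.text),
    textWithTimestamps: segments.map(({ start, duration, text }) => ({ start, duration, text })),
    fullText: segments.map((s) => s.text).join('\n'),
    xml:
      '<?xml version="1.0" encoding="utf-8" ?><transcript>' +
      segments
        .map((s) => `<text start="${s.start}" dur="${s.duration}">${escapeXml(s.text)}</text>`)
        .join('') +
      '</transcript>',
  };
  // 'all' returns every format; any other value returns just that one.
  if (format === 'all') return all;
  if (!(format in all)) throw new Error(`Unknown transcriptFormat: ${format}`);
  return { [format]: all[format] };
}
```

For simplicity this sketch builds every format eagerly; values of `transcriptFormat` other than `all` return a single-key object, and an unknown value throws.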

How It Works

Input normalisation – Accepts raw IDs, long URLs, short URLs, or request-list sources and extracts the canonical video ID.
Proxy initialisation – Boots the global-agent HTTP proxy layer (if requested) and rotates sessions per video.
Metadata fetch – Uses ytdl-core plus youtubei.js to obtain video info, so metrics are still collected even when the public API changes.
Transcript retrieval – Tries three sources in order:
- First tries youtube-captions-scraper for clean text
- Falls back to youtubei.js transcript endpoints
- Finally parses timed-text XML if required
Formatting – Converts captions into the requested output formats and synthesises XML when Google throttles the timed-text endpoint.
Persistence – Pushes each result to the default Apify dataset, respecting the JSON schema for easy downstream use.
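The retrieval order above follows a standard fallback chain: try each source in turn and keep the first one that returns captions. A minimal sketch, assuming each source is wrapped as an async function that resolves to an array of segments or throws; `fetchWithFallback` and the `{ name, load }` source objects are hypothetical and do not call youtube-captions-scraper or youtubei.js directly.

```javascript
// Hypothetical fallback chain: tries each named source in order and
// returns the first non-empty list of segments, plus the errors seen so far.
async function fetchWithFallback(videoId, sources) {
  const errors = [];
  for (const { name, load } of sources) {
    try {
      const segments = await load(videoId);
      if (Array.isArray(segments) && segments.length > 0) {
        return { source: name, segments, errors };
      }
      errors.push(`${name}: no captions returned`);
    } catch (error) {
      // Record the failure and move on to the next source.
      errors.push(`${name}: ${error.message}`);
    }
  }
  // Every source failed; a caller can then store transcripts: null,
  // as documented for videos without captions.
  return { source: null, segments: null, errors };
}
```

In the actor's terms, the `sources` array would list the youtube-captions-scraper, youtubei.js, and timed-text XML steps in that order.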

SEO & Content Marketing Ideas

Repurpose transcripts into articles – Feed fullText into a summariser to craft blog posts or landing pages quickly.
Optimise long-tail keywords – Use metadata.tags and subtitles to identify phrases worth targeting in SEO campaigns.
Build GIF or reel scripts – Timestamped captions (textWithTimestamps) help editors cut highlight clips or reels.
Create accessible archives – Convert xml or textArray into readable transcripts for accessible knowledge bases.
Monitor competitors – Track rival channels for trending topics and keyword gaps.
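For the highlight-clip idea, picking the captions that fall inside a time window is a simple filter over `textWithTimestamps`. A sketch; the helper names `captionsInWindow` and `clipScript` are hypothetical, and a segment counts as inside the window if any part of it overlaps.

```javascript
// Hypothetical helper: returns segments from textWithTimestamps that overlap
// the window [from, to) in seconds.
function captionsInWindow(segments, from, to) {
  return segments.filter(({ start, duration }) => start < to && start + duration > from);
}

// Joins the overlapping captions into a one-line script for a clip or reel.
function clipScript(segments, from, to) {
  return captionsInWindow(segments, from, to).map((s) => s.text).join(' ');
}
```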

FAQ

Q: Do I need proxies?
A: Not strictly, but enabling Apify Proxy (prefilled with BUYPROXIES94952) drastically reduces 410/429 errors and lets you access region-locked videos.
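The How It Works section notes that proxy sessions rotate per video. One way to derive a session ID per video is sketched below; the helper `sessionIdForVideo` is hypothetical, and the allowed character set (letters, digits, `.`, `_`, `~`, at most 50 characters) is an assumption based on Apify Proxy's session naming rules that should be checked against the current documentation.

```javascript
// Hypothetical helper: one proxy session per video (and per retry attempt).
// Assumption: session IDs may only use [A-Za-z0-9._~] and be at most 50 chars.
function sessionIdForVideo(videoId, attempt = 0) {
  // YouTube IDs may contain '-', which is replaced to fit the assumed charset.
  const safeId = videoId.replace(/[^A-Za-z0-9._~]/g, '_');
  return `yt_${safeId}_${attempt}`.slice(0, 50);
}
```

With the Apify SDK, the resulting ID could be passed to `proxyConfiguration.newUrl(sessionId)` so each video, and each retry via `attempt`, gets its own IP. Mapping `-` to `_` can make two IDs share a session, which only means they share an IP.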

Q: Does it work with auto-generated captions?
A: Yes. The actor prefers manual English subtitles but will automatically fall back to auto-generated English transcripts and log a warning if only machine captions are available.
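The preference order in this answer can be sketched as a track picker. Assumptions: each caption track is an object with a `languageCode` and an optional `kind`, where `'asr'` marks auto-generated captions (as in the caption-track list of YouTube's player response); the names `pickEnglishTrack` and the injectable `warn` logger are hypothetical.

```javascript
// Hypothetical track picker: prefers manual English captions and falls back
// to auto-generated ('asr') English, warning when only machine captions exist.
const isEnglish = (track) => /^en(-|$)/i.test(track.languageCode);

function pickEnglishTrack(tracks, warn = console.warn) {
  const english = tracks.filter(isEnglish);
  const manual = english.find((track) => track.kind !== 'asr');
  if (manual) return manual;

  const auto = english.find((track) => track.kind === 'asr');
  if (auto) {
    warn('Only auto-generated English captions are available');
    return auto;
  }
  return null; // no English captions at all
}
```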

Q: Can I request other languages?
A: The current release targets English only (manual or auto-generated en captions). To add other languages, fork the actor or contribute the change upstream (see Contributing & Support below).

Q: How do I validate outputs?
A: Use the .actor/output_schema.json file with any JSON Schema validator or integrate it into your build pipeline.

Contributing & Support

Issues / Ideas – Open an issue or submit a pull request on GitHub.
Commercial support – Need custom formats, extra language support, or private deployment? Reach out through Apify Marketplace or GitHub discussions.
Inspiration – Let us know how you use the actor; community showcases help others and improve search visibility!

Happy scraping and content creating! 🚀

Similar actors

Youtube Transcript Scraper

api-empire/youtube-transcript-scraper

Extract full YouTube video transcripts instantly with this Apify YouTube Transcript Scraper. Get accurate subtitles, timestamps, and speaker data for analysis, SEO, or research. Perfect for content creators, marketers, and data scientists. Fast, reliable, and easy to automate.

API Empire

Youtube Video Transcript Scraper

neuro-scraper/my-actor-11

A powerful YouTube Video Transcript Scraper that instantly pulls clean, accurate captions from any video — perfect for creators, researchers, and AI workflows. Fast, reliable, and built to save your time.

Neuro Scraper

Youtube Transcript Scraper

scrapier/youtube-transcript-scraper

Extract full transcripts from YouTube videos with the YouTube Transcript Scraper. Get precise timestamps, speaker names, and text for any video. Perfect for content analysis, SEO, research, and summarization. Fast, accurate, and easy to integrate into your workflow.

Scrapier

5.0

Youtube Transcript Scraper

thedoor/youtube-transcript-scraper

Extract full YouTube transcripts instantly. Bulk video support, precise timestamps, and multiple export formats (CSV, Excel, JSON). Perfect for AI training, SEO, and content analysis.

TheDoor

Youtube Transcript Scraper

easyapi/youtube-transcript-scraper

Extract YouTube video transcripts and captions effortlessly using multiple transcript services. Perfect for content analysis, subtitles extraction, and video accessibility.

EasyApi

Youtube Transcript Scraper

scrapio/youtube-transcript-scraper

Scrapes transcripts from any YouTube video, capturing full text, timestamps, language, and metadata. Ideal for SEO research, content analysis, accessibility, subtitle extraction, and automated processing of large video libraries with accurate transcript output.

Scrapio

YouTube Transcript Scraper

igview-owner/youtube-transcript-scraper

Extract complete transcripts from any YouTube video with precise timestamps. Auto-selects English captions. Perfect for AI training, content analysis, SEO & research. Export to JSON/CSV/Excel.

Sachin Kumar Yadav

YouTube Transcript Ninja ⚡️🥷⚡

topaz_sharingan/Youtube-Transcript-Scraper-1

Extract transcripts from YouTube videos with ease! This actor takes a YouTube video URL as input and returns the transcript of the video in the specified format.

Moses Bilal

5.3K

5.0

TikTok & YouTube Transcript Extractor Scraper

memo23/tiktok-transcript-extractor-cheerio

Extract transcripts and metadata from both TikTok and YouTube videos in WebVTT format. Supports multiple URLs, language selection, proxy configuration, and advanced output for accessibility, analysis, or repurposing.

Muhamed Didovic

482

YouTube Transcript Scraper (Premium version)

smartly_automated/youtube-transcript-scraper-premium-version

Extract YouTube transcripts in 15+ languages with timestamps and metadata. Uses Apify's most expensive Proxy to bypass YouTube's IP blocking & rate limiting. Get fast bulk processing, video titles, views, channels, and clean text ready for AI, SEO, or content creation. Export in your desired format.