Youtube Transcript
Under maintenance

Pricing

$2.50 / 1,000 results

Developed by

codemaster devops

Maintained by Community

Harvest rich YouTube metadata and English transcripts at scale with this Apify actor—perfect for SEO, content repurposing, and AI workflows. Built-in proxy support, resilient caption extraction, and multi-format outputs keep your video intelligence accurate and ready for publishing.

Last modified

8 days ago

YouTube Transcript Downloader & Caption Scraper (Apify Actor)

Boost your SEO content strategy with clean YouTube transcripts, metadata, and captions in every popular format. This production-ready Apify actor extracts subtitles (manual or auto-generated), gathers rich video details, and saves everything to an Apify dataset that is easy to reuse in blogs, knowledge bases, or downstream NLP workflows.

  • ✅ Works for both long-form videos and Shorts
  • ✅ Supports Apify Proxy groups (BUYPROXIES94952, StaticUS3, RESIDENTIAL, or custom URLs)
  • ✅ Delivers transcripts as text arrays, timestamped captions, concatenated strings, and XML
  • ✅ Includes machine-readable .actor/input_schema.json and .actor/output_schema.json
  • ✅ Optimised README for discoverability—help search engines and users understand the actor fast

Table of Contents

  1. Why Use This Actor
  2. Quick Start
  3. Input Schema
  4. Output Schema
  5. Example Dataset Record
  6. How It Works
  7. SEO & Content Marketing Ideas
  8. FAQ
  9. Contributing & Support

Why Use This Actor

  • Complete metadata + captions – Fetch title, channel info, engagement counts, description, tags, thumbnail, and English subtitles in multiple formats.
  • Resilient transcript extraction – Falls back from youtube-captions-scraper to youtubei.js and timed-text XML parsing to handle auto-generated captions or patched YouTube layouts.
  • Proxy-ready – Configure Apify proxy groups or custom URLs to prevent 410/429 errors and unblock regional content.
  • SEO-friendly output – Deliver transcripts the way content teams need them: arrays for bullet lists, timestamped objects for interactive players, or full-text strings for quick copy/paste.
  • Built for scaling – Retries transient errors, skips non-retryable responses, and stores results per video, so large batches keep moving.
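
The retry behaviour in the last bullet can be pictured with a small sketch. This is not the actor's own code: withRetries, the NON_RETRYABLE status list, and the backoff timings are hypothetical names and values chosen for illustration. Only the default of 3 retries comes from the input schema.

```javascript
// Illustrative only: the status codes treated as non-retryable are an assumption.
const NON_RETRYABLE = new Set([400, 401, 403, 404]);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs task(attempt) up to maxRetries + 1 times, backing off between attempts.
async function withRetries(task, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      // Give up immediately on errors that a retry cannot fix.
      if (NON_RETRYABLE.has(error.statusCode)) break;
      // Exponential backoff before the next attempt (skipped after the last one).
      if (attempt < maxRetries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

Because each video would be wrapped separately, one failing video does not stop the rest of a batch.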

Quick Start

1. Run on Apify Console

  1. Click Deploy in the Apify actor UI or run apify push.
  2. Open the actor in Apify Console and fill in the form (input schema is auto-generated).
  3. Optional: choose a proxy group such as BUYPROXIES94952 for datacenter IPs or StaticUS3 for static US addresses.
  4. Run the actor and watch the dataset populate with transcripts and metadata.

2. Run via Apify CLI

apify login
apify push   # deploy the actor (already configured with .actor files)

# Provide an input JSON file (see examples/input.json)
APIFY_LOCAL_STORAGE_DIR=./apify_storage \
  npm start

3. Integrate Programmatically

Use the Apify API or client libraries to trigger the actor from your app:

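import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('your-username/youtube-transcript').call({
  videoUrls: [
    { url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' },
    { url: 'https://youtu.be/aqz-KE-bpKQ' },
  ],
  transcriptFormat: 'all',
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ['BUYPROXIES94952'],
  },
});

// Read the results from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Fetched ${items.length} records`);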

Input Schema

The full JSON schema that powers the Apify input form lives at .actor/input_schema.json. Highlights:

| Field | Type | Description |
| --- | --- | --- |
| videoUrls (required) | array (requestListSources editor) | YouTube URLs or bare IDs. The actor normalises standard, short, embed, and playlist links. |
| transcriptFormat | string | Choose all, textArray, textWithTimestamps, fullText, or xml. Defaults to all. |
| maxRetries | integer | Number of retries for transient failures (default 3). |
| proxyConfiguration | object | Standard Apify proxy config. Prefilled with the BUYPROXIES94952 datacenter group for reliable scraping; swap to StaticUS3, RESIDENTIAL, or custom URLs if needed. |

Example payload (also available in examples/input.json):

{
  "videoUrls": [
    { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
    { "url": "https://youtu.be/aqz-KE-bpKQ" }
  ],
  "transcriptFormat": "all",
  "maxRetries": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["BUYPROXIES94952"]
  }
}
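
To make the videoUrls normalisation more concrete, here is a minimal sketch of how a URL or bare ID could be reduced to an 11-character video ID. extractVideoId and ID_PATTERN are hypothetical names, and the set of hosts and paths handled is an assumption; the actor's real normaliser may cover more cases.

```javascript
// Illustrative helper, not the actor's implementation.
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input) {
  const value = String(input).trim();
  if (ID_PATTERN.test(value)) return value; // already a bare ID

  let parsed;
  try {
    parsed = new URL(value);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = parsed.hostname.replace(/^(www\.|m\.)/, '');
  let candidate = null;
  if (host === 'youtu.be') {
    // Short link: https://youtu.be/<id>
    candidate = parsed.pathname.split('/')[1];
  } else if (host === 'youtube.com' || host === 'youtube-nocookie.com') {
    const [, first, second] = parsed.pathname.split('/');
    // watch?v=<id> also covers links that carry an extra list= parameter.
    if (first === 'watch') candidate = parsed.searchParams.get('v');
    else if (first === 'embed' || first === 'shorts') candidate = second;
  }
  return candidate && ID_PATTERN.test(candidate) ? candidate : null;
}
```

In this sketch a playlist-only link (one with no v parameter) returns null, since it does not identify a single video.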

Output Schema

Every dataset item matches the JSON schema published in .actor/output_schema.json. The top-level structure is:

| Field | Description |
| --- | --- |
| videoId | 11-character YouTube identifier. |
| url | Original URL (or ID) submitted. |
| metadata | Rich video data: title, channel info, view/like/comment counts, publish date, description, tags, thumbnail. |
| transcripts | The transcript in the formats you requested. Contains textArray, textWithTimestamps, fullText, and/or xml. When captions are unavailable, this field is null. |

Because the schema is machine-readable, you can quickly validate the dataset in CI or generate strongly typed DTOs for downstream services.
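
As a rough illustration of such a check, the sketch below tests only the top-level shape described in the table above. checkDatasetItem is a hypothetical helper written for this README; a real pipeline would more likely load .actor/output_schema.json into a JSON Schema validator instead.

```javascript
// Hand-rolled structural check covering only the four top-level fields.
const TRANSCRIPT_KEYS = ['textArray', 'textWithTimestamps', 'fullText', 'xml'];

// Returns a list of problems; an empty list means the item looks well-formed.
function checkDatasetItem(item) {
  const problems = [];
  if (typeof item.videoId !== 'string' || item.videoId.length !== 11) {
    problems.push('videoId must be an 11-character string');
  }
  if (typeof item.url !== 'string') {
    problems.push('url must be a string');
  }
  if (typeof item.metadata !== 'object' || item.metadata === null) {
    problems.push('metadata must be an object');
  }
  // transcripts is null when captions are unavailable.
  if (item.transcripts !== null) {
    if (typeof item.transcripts !== 'object') {
      problems.push('transcripts must be an object or null');
    } else {
      const unknown = Object.keys(item.transcripts)
        .filter((key) => !TRANSCRIPT_KEYS.includes(key));
      if (unknown.length > 0) {
        problems.push(`unexpected transcript keys: ${unknown.join(', ')}`);
      }
    }
  }
  return problems;
}
```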


Example Dataset Record

{
  "videoId": "dQw4w9WgXcQ",
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "metadata": {
    "videoId": "dQw4w9WgXcQ",
    "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
    "channelName": "Rick Astley",
    "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
    "viewCount": 1704274503,
    "likeCount": 18591096,
    "commentCount": null,
    "publishDate": "2009-10-24T23:57:33-07:00",
    "description": "…",
    "tags": ["rick astley", "Never Gonna Give You Up", "rick roll"],
    "thumbnailUrl": "https://i.ytimg.com/vi_webp/dQw4w9WgXcQ/maxresdefault.webp"
  },
  "transcripts": {
    "textArray": ["[♪♪♪]", "♪ We're no strangers to love ♪", "…"],
    "textWithTimestamps": [
      { "start": 1.36, "duration": 1.68, "text": "[♪♪♪]" },
      { "start": 18.64, "duration": 3.24, "text": "♪ We're no strangers to love ♪" }
    ],
    "fullText": "[♪♪♪]\n♪ We're no strangers to love ♪\n…",
    "xml": "<?xml version=\"1.0\" encoding=\"utf-8\" ?><transcript>…</transcript>"
  }
}

See more examples in examples/dataset-sample.json.
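
The textWithTimestamps entries shown in the record are easy to convert into other caption formats on the consumer side. As one possible example, the sketch below turns them into SubRip (SRT) text. formatSrtTime and toSrt are hypothetical helpers and are not part of the actor; the sketch assumes start and duration are in seconds, as the sample record suggests.

```javascript
// Converts seconds (e.g. 18.64) into an SRT timestamp (HH:MM:SS,mmm).
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Builds numbered SRT cues from { start, duration, text } caption objects.
function toSrt(captions) {
  return captions
    .map(({ start, duration, text }, index) => [
      index + 1,
      `${formatSrtTime(start)} --> ${formatSrtTime(start + duration)}`,
      text,
    ].join('\n'))
    .join('\n\n');
}
```

Calling toSrt(item.transcripts.textWithTimestamps) on a dataset item would produce text that can be saved as a .srt file.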


How It Works

  1. Input normalisation – Accepts raw IDs, long URLs, short URLs, or request-list sources and extracts the canonical video ID.
  2. Proxy initialisation – Boots the global-agent HTTP proxy layer (if requested) and rotates sessions per video.
  3. Metadata fetch – Uses ytdl-core plus youtubei.js to obtain video info, so that metrics are still returned if one of the two sources stops working after a YouTube change.
  4. Transcript retrieval
    • First tries youtube-captions-scraper for clean text
    • Falls back to youtubei.js transcript endpoints
    • Finally parses timed-text XML if required
  5. Formatting – Converts captions into the requested output formats and synthesises XML when Google throttles the timed-text endpoint.
  6. Persistence – Pushes each result to the default Apify dataset, respecting the JSON schema for easy downstream use.
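
Steps 4 and 5 can be sketched as a fallback chain followed by a formatter. The code below is a simplified illustration under stated assumptions, not the actor's source: firstSuccessful, formatTranscript, and the strategy objects are hypothetical, and the element names used inside the synthesised XML (text, start, dur) are assumed rather than taken from the output schema.

```javascript
// Try each extraction strategy in order; return the first non-empty result.
async function firstSuccessful(strategies, videoId) {
  const errors = [];
  for (const { name, fetchCaptions } of strategies) {
    try {
      const captions = await fetchCaptions(videoId);
      if (Array.isArray(captions) && captions.length > 0) {
        return { source: name, captions, errors };
      }
      errors.push(`${name}: no captions`);
    } catch (error) {
      errors.push(`${name}: ${error.message}`);
    }
  }
  return { source: null, captions: null, errors };
}

const escapeXml = (text) => text
  .replace(/&/g, '&amp;')
  .replace(/</g, '&lt;')
  .replace(/>/g, '&gt;');

// Convert { start, duration, text } captions into the requested format(s).
function formatTranscript(captions, format = 'all') {
  const cues = captions
    .map((c) => `<text start="${c.start}" dur="${c.duration}">${escapeXml(c.text)}</text>`)
    .join('');
  const all = {
    textArray: captions.map((c) => c.text),
    textWithTimestamps: captions.map(({ start, duration, text }) => ({ start, duration, text })),
    fullText: captions.map((c) => c.text).join('\n'),
    xml: `<?xml version="1.0" encoding="utf-8" ?><transcript>${cues}</transcript>`,
  };
  if (format === 'all') return all;
  if (!Object.prototype.hasOwnProperty.call(all, format)) {
    throw new Error(`Unknown transcriptFormat: ${format}`);
  }
  return { [format]: all[format] };
}
```

In this picture, the strategies array would hold one entry each for youtube-captions-scraper, youtubei.js, and the timed-text XML parser, in that order.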

SEO & Content Marketing Ideas

  • Repurpose transcripts into articles – Feed fullText into a summariser to craft blog posts or landing pages quickly.
  • Optimise long-tail keywords – Use metadata.tags and subtitles to identify phrases worth targeting in SEO campaigns.
  • Build GIF or reel scripts – Timestamped captions (textWithTimestamps) help editors cut highlight clips or reels.
  • Create accessible archives – Convert xml or textArray into readable transcripts for accessible knowledge bases.
  • Monitor competitors – Track rival channels for trending topics and keyword gaps.
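
For the keyword idea above, a first pass could simply count two-word phrases in fullText. The sketch below is a deliberately naive illustration: topPhrases and its tiny STOP_WORDS list are hypothetical, it only handles unaccented English words, and it does not respect sentence boundaries.

```javascript
// Very small stop-word list, for illustration only.
const STOP_WORDS = new Set([
  'the', 'a', 'an', 'and', 'or', 'to', 'of', 'in', 'is', 'it', 'you', 'we', 'for', 'on',
]);

// Returns the most frequent two-word phrases as [phrase, count] pairs.
function topPhrases(fullText, limit = 10) {
  const words = fullText.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  const counts = new Map();
  for (let i = 0; i < words.length - 1; i += 1) {
    const first = words[i];
    const second = words[i + 1];
    // Skip phrases that start or end with a stop word.
    if (STOP_WORDS.has(first) || STOP_WORDS.has(second)) continue;
    const phrase = `${first} ${second}`;
    counts.set(phrase, (counts.get(phrase) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit);
}
```

The resulting phrases can then be compared with metadata.tags to see which topics a video covers but does not tag.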

FAQ

Q: Do I need proxies?
A: Not strictly, but enabling Apify Proxy (prefilled with BUYPROXIES94952) drastically reduces 410/429 errors and lets you access region-locked videos.

Q: Does it work with auto-generated captions?
A: Yes. The actor prefers manual English subtitles but will automatically fall back to auto-generated English transcripts and log a warning if only machine captions are available.

Q: Can I request other languages?
A: The current release targets English (en/auto). To add other language preferences, fork the actor; contributions are also welcome (see below).

Q: How do I validate outputs?
A: Use the .actor/output_schema.json file with any JSON Schema validator or integrate it into your build pipeline.


Contributing & Support

  • Issues / Ideas – Open an issue or submit a pull request on GitHub.
  • Commercial support – Need custom formats, extra language support, or private deployment? Reach out through Apify Marketplace or GitHub discussions.
  • Inspiration – Let us know how you use the actor; community showcases help others and improve search visibility!

Happy scraping and content creating! 🚀