Youtube Transcript
Pricing
$2.50 / 1,000 results
Harvest rich YouTube metadata and English transcripts at scale with this Apify actor—perfect for SEO, content repurposing, and AI workflows. Built-in proxy support, resilient caption extraction, and multi-format outputs keep your video intelligence accurate and ready for publishing.
Developer

codemaster devops
Last modified
a month ago
YouTube Transcript Downloader & Caption Scraper (Apify Actor)
Boost your SEO content strategy with clean YouTube transcripts, metadata, and captions in every popular format. This production-ready Apify actor extracts subtitles (manual or auto-generated), gathers rich video details, and saves everything to an Apify dataset that is easy to reuse in blogs, knowledge bases, or downstream NLP workflows.
- ✅ Works for both long-form videos and Shorts
- ✅ Supports Apify Proxy groups (BUYPROXIES94952, StaticUS3, Residential, or custom URLs)
- ✅ Delivers transcripts as text arrays, timestamped captions, concatenated strings, and XML
- ✅ Includes machine-readable .actor/input_schema.json and .actor/output_schema.json
- ✅ Optimised README for discoverability—help search engines and users understand the actor fast
Table of Contents
- Why Use This Actor
- Quick Start
- Input Schema
- Output Schema
- Example Dataset Record
- How It Works
- SEO & Content Marketing Ideas
- FAQ
- Contributing & Support
Why Use This Actor
- Complete metadata + captions – Fetch title, channel info, engagement counts, description, tags, thumbnail, and English subtitles in multiple formats.
- Resilient transcript extraction – Falls back from youtube-captions-scraper to youtubei.js and timed-text XML parsing to handle auto-generated captions or patched YouTube layouts.
- Proxy-ready – Configure Apify proxy groups or custom URLs to prevent 410/429 errors and unblock regional content.
- SEO-friendly output – Deliver transcripts the way content teams need them: arrays for bullet lists, timestamped objects for interactive players, or full-text strings for quick copy/paste.
- Built for scaling – Retries transient errors, skips non-retryable responses, and stores results per video, so large batches keep moving.
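The retry behaviour in the last bullet could look roughly like the sketch below. It is not taken from the actor's source: the helper names (`isRetryable`, `withRetries`), the set of status codes treated as transient, and the backoff delays are all assumptions chosen for illustration.

```javascript
// Assumption: which HTTP status codes count as transient. The actor's real list may differ.
const TRANSIENT_STATUS = new Set([408, 429, 500, 502, 503, 504]);

// Hypothetical helper: decide whether an error is worth retrying.
function isRetryable(error) {
  // Errors without a status code (for example network resets) are treated as transient.
  if (error.statusCode === undefined) return true;
  return TRANSIENT_STATUS.has(error.statusCode);
}

function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Runs `task` once, then retries up to `maxRetries` more times with exponential backoff.
// Non-retryable errors are rethrown immediately so the batch can move on.
async function withRetries(task, { maxRetries = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

Wrapping each per-video fetch in `withRetries` keeps one failing video from blocking the rest of a batch: the caller catches the final error, records it, and continues with the next video.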
Quick Start
1. Run on Apify Console
- Click Deploy in the Apify actor UI or run apify push.
- Open the actor in Apify Console and fill in the form (it is generated from the input schema).
- Optional: choose a proxy group such as BUYPROXIES94952 for datacenter IPs or StaticUS3 for static US addresses.
- Run the actor and watch the dataset populate with transcripts and metadata.
2. Run via Apify CLI

    apify login
    apify push   # deploy the actor (already configured with .actor files)

    # Provide an input JSON file (see examples/input.json)
    APIFY_LOCAL_STORAGE_DIR=./apify_storage \
    npm start

3. Integrate Programmatically
Use the Apify API or client libraries to trigger the actor from your app:

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    await client.actor('your-username/youtube-transcript').call({
      videoUrls: [
        { url: 'https://www.youtube.com/watch?v=dQw4w9WgXcQ' },
        { url: 'https://youtu.be/aqz-KE-bpKQ' }
      ],
      transcriptFormat: 'all',
      proxyConfiguration: {
        useApifyProxy: true,
        apifyProxyGroups: ['BUYPROXIES94952']
      }
    });

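If your app builds the input from a plain list of URLs, a small helper can catch typos before a run is started. The sketch below is a hypothetical convenience function (`buildActorInput` is not part of the actor or of apify-client); the allowed `transcriptFormat` values and the default of 3 retries are the ones listed in the Input Schema section.

```javascript
// Allowed values as listed for transcriptFormat in the Input Schema section.
const TRANSCRIPT_FORMATS = ['all', 'textArray', 'textWithTimestamps', 'fullText', 'xml'];

// Hypothetical helper: turn a list of URLs or IDs into an actor input object.
function buildActorInput(urls, { transcriptFormat = 'all', maxRetries = 3, proxyGroups = [] } = {}) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error('At least one video URL or ID is required');
  }
  if (!TRANSCRIPT_FORMATS.includes(transcriptFormat)) {
    throw new Error(`Unknown transcriptFormat: ${transcriptFormat}`);
  }
  if (!Number.isInteger(maxRetries) || maxRetries < 0) {
    throw new Error('maxRetries must be a non-negative integer');
  }
  const input = {
    videoUrls: urls.map((url) => ({ url })),
    transcriptFormat,
    maxRetries,
  };
  // Only attach a proxy configuration when proxy groups were requested.
  if (proxyGroups.length > 0) {
    input.proxyConfiguration = { useApifyProxy: true, apifyProxyGroups: proxyGroups };
  }
  return input;
}
```

The returned object can be passed directly as the argument to `.call(...)` in the snippet above.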
Input Schema
The full JSON schema that powers the Apify input form lives at .actor/input_schema.json. Highlights:
| Field | Type | Description |
|---|---|---|
| videoUrls (required) | array (requestListSources editor) | YouTube URLs or bare IDs. The actor normalises standard, short, embed, and playlist links. |
| transcriptFormat | string | Choose all, textArray, textWithTimestamps, fullText, or xml. Defaults to all. |
| maxRetries | integer | Number of retries for transient failures (default 3). |
| proxyConfiguration | object | Standard Apify proxy config. Prefilled with the BUYPROXIES94952 datacenter group for reliable scraping; swap to StaticUS3, RESIDENTIAL, or custom URLs if needed. |
Example payload (also available in examples/input.json):

    {
      "videoUrls": [
        { "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ" },
        { "url": "https://youtu.be/aqz-KE-bpKQ" }
      ],
      "transcriptFormat": "all",
      "maxRetries": 3,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["BUYPROXIES94952"]
      }
    }

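To illustrate what normalising the different link shapes accepted by videoUrls involves, here is a minimal sketch. It is an assumption about how such a step might be written, not the actor's implementation; `extractVideoId` is a hypothetical name, and "playlist links" is interpreted here as watch URLs that carry an extra `list` parameter.

```javascript
// YouTube video IDs are 11 characters drawn from letters, digits, "-" and "_".
const VIDEO_ID = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical helper: return the video ID for a bare ID or a supported URL, else null.
function extractVideoId(input) {
  const value = String(input).trim();
  if (VIDEO_ID.test(value)) return value; // bare ID

  let url;
  try {
    url = new URL(value);
  } catch {
    return null; // neither an ID nor a parseable URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, '');
  const segments = url.pathname.split('/').filter(Boolean);
  let candidate = null;

  if (host === 'youtu.be') {
    candidate = segments[0]; // short link: youtu.be/<id>
  } else if (host === 'youtube.com' || host === 'youtube-nocookie.com') {
    if (segments[0] === 'watch') {
      candidate = url.searchParams.get('v'); // standard and playlist watch links
    } else if (['embed', 'shorts', 'live'].includes(segments[0])) {
      candidate = segments[1]; // /embed/<id>, /shorts/<id>, /live/<id>
    }
  }
  return candidate && VIDEO_ID.test(candidate) ? candidate : null;
}
```

Returning `null` instead of throwing lets a caller skip unrecognised inputs and keep processing the rest of the list.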
Output Schema
Every dataset item matches the JSON schema published in .actor/output_schema.json. The top-level structure is:
| Field | Description |
|---|---|
| videoId | 11-character YouTube identifier. |
| url | Original URL (or ID) submitted. |
| metadata | Rich video data: title, channel info, view/like/comment counts, publish date, description, tags, thumbnail. |
| transcripts | The transcript in the formats you requested. Contains textArray, textWithTimestamps, fullText, and/or xml. When captions are unavailable, this field is null. |
Because the schema is machine-readable, you can quickly validate the dataset in CI or generate strongly typed DTOs for downstream services.
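In a real pipeline you would load .actor/output_schema.json into a JSON Schema validator. As a dependency-free approximation, the sketch below checks only the four top-level fields described in the table above. The function name `checkDatasetItem` and the exact rules are assumptions for illustration and are not a substitute for the published schema.

```javascript
const TRANSCRIPT_KEYS = ['textArray', 'textWithTimestamps', 'fullText', 'xml'];

// Hypothetical check: returns a list of problems; an empty list means the item passed.
function checkDatasetItem(item) {
  const problems = [];
  if (typeof item.videoId !== 'string' || !/^[A-Za-z0-9_-]{11}$/.test(item.videoId)) {
    problems.push('videoId must be an 11-character string');
  }
  if (typeof item.url !== 'string' || item.url.length === 0) {
    problems.push('url must be a non-empty string');
  }
  if (typeof item.metadata !== 'object' || item.metadata === null) {
    problems.push('metadata must be an object');
  }
  // transcripts is null when captions are unavailable; otherwise it holds the requested formats.
  const transcripts = item.transcripts;
  if (transcripts !== null) {
    if (typeof transcripts !== 'object') {
      problems.push('transcripts must be an object or null');
    } else if (!TRANSCRIPT_KEYS.some((key) => key in transcripts)) {
      problems.push('transcripts must contain at least one known format');
    }
  }
  return problems;
}
```

A CI step could run this over every exported dataset item and fail the build if any item returns a non-empty list.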
Example Dataset Record

    {
      "videoId": "dQw4w9WgXcQ",
      "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
      "metadata": {
        "videoId": "dQw4w9WgXcQ",
        "title": "Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)",
        "channelName": "Rick Astley",
        "channelId": "UCuAXFkgsw1L7xaCfnd5JJOw",
        "viewCount": 1704274503,
        "likeCount": 18591096,
        "commentCount": null,
        "publishDate": "2009-10-24T23:57:33-07:00",
        "description": "…",
        "tags": ["rick astley", "Never Gonna Give You Up", "rick roll"],
        "thumbnailUrl": "https://i.ytimg.com/vi_webp/dQw4w9WgXcQ/maxresdefault.webp"
      },
      "transcripts": {
        "textArray": ["[♪♪♪]", "♪ We're no strangers to love ♪", "…"],
        "textWithTimestamps": [
          { "start": 1.36, "duration": 1.68, "text": "[♪♪♪]" },
          { "start": 18.64, "duration": 3.24, "text": "♪ We're no strangers to love ♪" }
        ],
        "fullText": "[♪♪♪]\n♪ We're no strangers to love ♪\n…",
        "xml": "<?xml version=\"1.0\" encoding=\"utf-8\" ?><transcript>…</transcript>"
      }
    }

See more examples in examples/dataset-sample.json.
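Judging from the example record, textArray and fullText appear to be simple projections of textWithTimestamps: the caption texts in order, and the same texts joined with newlines. If you requested only textWithTimestamps, a sketch like the following can rebuild the other two client-side. The function name is hypothetical, and the newline separator is inferred from the sample rather than confirmed by the schema.

```javascript
// Hypothetical helper: derive the plain-text formats from timestamped captions.
function deriveTextFormats(textWithTimestamps) {
  const textArray = textWithTimestamps.map((caption) => caption.text);
  return {
    textArray,
    // Separator inferred from the example record's fullText value.
    fullText: textArray.join('\n'),
  };
}
```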
How It Works
- Input normalisation – Accepts raw IDs, long URLs, short URLs, or request-list sources and extracts the canonical video ID.
- Proxy initialisation – Boots the global-agent HTTP proxy layer (if requested) and rotates sessions per video.
- Metadata fetch – Uses ytdl-core plus youtubei.js to obtain video info, ensuring metrics even when the public API changes.
- Transcript retrieval
  - First tries youtube-captions-scraper for clean text
  - Falls back to youtubei.js transcript endpoints
  - Finally parses timed-text XML if required
- Formatting – Converts captions into the requested output formats and synthesises XML when Google throttles the timed-text endpoint.
- Persistence – Pushes each result to the default Apify dataset, respecting the JSON schema for easy downstream use.
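The transcript retrieval step is a fallback chain: try one source, and move to the next only if it fails or returns nothing. The sketch below shows that control flow in isolation. The strategies are passed in as plain functions, so nothing here calls youtube-captions-scraper or youtubei.js directly; `fetchWithFallback` and the `{ name, fetchCaptions }` shape are assumptions made for illustration.

```javascript
// Tries each strategy in order and returns the first non-empty caption list.
// `strategies` is an array of { name, fetchCaptions } objects (hypothetical shape).
async function fetchWithFallback(videoId, strategies) {
  const failures = [];
  for (const { name, fetchCaptions } of strategies) {
    try {
      const captions = await fetchCaptions(videoId);
      if (Array.isArray(captions) && captions.length > 0) {
        return { source: name, captions, failures };
      }
      failures.push(`${name}: empty result`);
    } catch (error) {
      failures.push(`${name}: ${error.message}`);
    }
  }
  // No strategy produced captions; this corresponds to a null transcripts field.
  return { source: null, captions: null, failures };
}
```

Collecting the failure messages instead of discarding them makes it easier to see in the logs which source finally succeeded and why the earlier ones did not.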
SEO & Content Marketing Ideas
- Repurpose transcripts into articles – Feed fullText into a summariser to craft blog posts or landing pages quickly.
- Optimise long-tail keywords – Use metadata.tags and subtitles to identify phrases worth targeting in SEO campaigns.
- Build GIF or reel scripts – Timestamped captions (textWithTimestamps) help editors cut highlight clips or reels.
- Create accessible archives – Convert xml or textArray into readable transcripts for accessible knowledge bases.
- Monitor competitors – Track rival channels for trending topics and keyword gaps.
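For the clip-editing idea above, many video editors import SubRip (.srt) files. The actor does not list SRT among its output formats, so the sketch below converts textWithTimestamps entries client-side. The function names are hypothetical; the only assumption about the data is the `start`/`duration`/`text` shape shown in the example record, with times in seconds.

```javascript
// Formats a millisecond count as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(totalMs) {
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const hours = Math.floor(totalMs / 3600000);
  const minutes = Math.floor((totalMs % 3600000) / 60000);
  const seconds = Math.floor((totalMs % 60000) / 1000);
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)},${pad(totalMs % 1000, 3)}`;
}

// Converts [{ start, duration, text }] (seconds) into SRT cues separated by blank lines.
function toSrt(captions) {
  return captions
    .map(({ start, duration, text }, index) => {
      // Round to whole milliseconds to avoid floating-point noise in the output.
      const startMs = Math.round(start * 1000);
      const endMs = Math.round((start + duration) * 1000);
      return `${index + 1}\n${formatSrtTime(startMs)} --> ${formatSrtTime(endMs)}\n${text}\n`;
    })
    .join('\n');
}
```

For the first caption in the example record (start 1.36, duration 1.68), the cue timing line is `00:00:01,360 --> 00:00:03,040`.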
FAQ
Q: Do I need proxies?
A: Not strictly, but enabling Apify Proxy (prefilled with BUYPROXIES94952) drastically reduces 410/429 errors and lets you access region-locked videos.
Q: Does it work with auto-generated captions?
A: Yes. The actor prefers manual English subtitles but will automatically fall back to auto-generated English transcripts and log a warning if only machine captions are available.
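The preference order described in this answer can be sketched as below. The track shape (`languageCode`, and `kind: 'asr'` marking auto-generated captions) is an assumption for illustration and may not match what the actor reads internally; `pickEnglishTrack` is a hypothetical name.

```javascript
// Hypothetical track shape: { languageCode: 'en', kind: 'asr' } for auto-generated,
// with `kind` absent (or any other value) for manually authored captions.
function pickEnglishTrack(tracks) {
  const english = tracks.filter(
    (track) => typeof track.languageCode === 'string' && /^en(-|$)/.test(track.languageCode)
  );
  const manual = english.find((track) => track.kind !== 'asr');
  if (manual) return { track: manual, autoGenerated: false };

  const auto = english.find((track) => track.kind === 'asr');
  if (auto) {
    console.warn('Only auto-generated English captions are available.');
    return { track: auto, autoGenerated: true };
  }
  return null; // no English captions at all
}
```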
Q: Can I request other languages?
A: The current release targets English (en/auto). To support other languages, fork the actor and add your own language preferences; contributions are also welcome (see below).
Q: How do I validate outputs?
A: Use the .actor/output_schema.json file with any JSON Schema validator or integrate it into your build pipeline.
Contributing & Support
- Issues / Ideas – Open an issue or submit a pull request on GitHub.
- Commercial support – Need custom formats, extra language support, or private deployment? Reach out through Apify Marketplace or GitHub discussions.
- Inspiration – Let us know how you use the actor; community showcases help others and improve search visibility!
Happy scraping and content creating! 🚀