Youtube Transcript
Pricing
$2.50 / 1,000 results
Under maintenance
Harvest rich YouTube metadata and English transcripts at scale with this Apify actor, suited to SEO, content repurposing, and AI workflows. Built-in proxy support, resilient caption extraction, and multi-format outputs keep your video intelligence accurate and ready for publishing.
Last modified
8 days ago
YouTube Transcript Downloader & Caption Scraper (Apify Actor)
Boost your SEO content strategy with clean YouTube transcripts, metadata, and captions in every popular format. This production-ready Apify actor extracts subtitles (manual or auto-generated), gathers rich video details, and saves everything to an Apify dataset that is easy to reuse in blogs, knowledge bases, or downstream NLP workflows.
- ✅ Works for both long-form videos and Shorts
- ✅ Supports Apify Proxy groups (BUYPROXIES94952, StaticUS3, RESIDENTIAL, or custom URLs)
- ✅ Delivers transcripts as text arrays, timestamped captions, concatenated strings, and XML
- ✅ Includes machine-readable .actor/input_schema.json and .actor/output_schema.json
- ✅ Optimised README for discoverability—help search engines and users understand the actor fast
Table of Contents
- Why Use This Actor
- Quick Start
- Input Schema
- Output Schema
- Example Dataset Record
- How It Works
- SEO & Content Marketing Ideas
- FAQ
- Contributing & Support
Why Use This Actor
- Complete metadata + captions – Fetch title, channel info, engagement counts, description, tags, thumbnail, and English subtitles in multiple formats.
- Resilient transcript extraction – Falls back from youtube-captions-scraper to youtubei.js and timed-text XML parsing to handle auto-generated captions or patched YouTube layouts.
- Proxy-ready – Configure Apify proxy groups or custom URLs to prevent 410/429 errors and unblock regional content.
- SEO-friendly output – Deliver transcripts the way content teams need them: arrays for bullet lists, timestamped objects for interactive players, or full-text strings for quick copy/paste.
- Built for scaling – Retries transient errors, skips non-retryable responses, and stores results per video, so large batches keep moving.
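The retry and fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actor's own code: `NonRetryableError`, `fetch_with_fallback`, and the `base_delay` backoff are hypothetical names and choices, and each strategy stands in for one extraction method.

```python
import time
from typing import Callable, List, Optional, Sequence


class NonRetryableError(Exception):
    """Hypothetical marker for failures that a retry cannot fix."""


def fetch_with_fallback(
    video_id: str,
    strategies: Sequence[Callable[[str], List[dict]]],
    max_retries: int = 3,
    base_delay: float = 1.0,
) -> Optional[List[dict]]:
    """Try each strategy in order and return the first successful result.

    Transient errors are retried up to max_retries times with exponential
    backoff; a NonRetryableError skips straight to the next strategy.
    Returns None when every strategy is exhausted.
    """
    for strategy in strategies:
        for attempt in range(max_retries + 1):
            try:
                return strategy(video_id)
            except NonRetryableError:
                break  # retrying will not help, move to the next strategy
            except Exception:
                if attempt < max_retries:
                    time.sleep(base_delay * (2 ** attempt))
    return None
```

Returning `None` instead of raising mirrors the idea that one failing video should not stop a large batch.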
Quick Start
1. Run on Apify Console
- Click Deploy in the Apify actor UI or run apify push.
- Open the actor in Apify Console and fill in the form (input schema is auto-generated).
- Optional: choose a proxy group such as BUYPROXIES94952 for datacenter IPs or StaticUS3 for static US addresses.
- Run the actor and watch the dataset populate with transcripts and metadata.
2. Run via Apify CLI
3. Integrate Programmatically
Use the Apify API or client libraries to trigger the actor from your app:
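A minimal sketch using only the Python standard library against the Apify REST API's run-actor endpoint (`POST /v2/acts/{actorId}/runs`). The actor ID and token are placeholders you must supply yourself, and `build_run_request` and `start_run` are hypothetical helper names, not part of any Apify library.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run.

    actor_id is a placeholder such as "<username>~<actor-name>";
    the JSON body is passed to the actor as its input.
    """
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(actor_input).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def start_run(actor_id: str, token: str, actor_input: dict) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_run_request(actor_id, token, actor_input)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Separating request construction from sending keeps the network call easy to stub in tests. In production code, the official Apify client libraries are likely the more convenient option.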
Input Schema
The full JSON schema that powers the Apify input form lives at .actor/input_schema.json. Highlights:
| Field | Type | Description | 
|---|---|---|
| videoUrls (required) | array (requestListSources editor) | YouTube URLs or bare IDs. The actor normalises standard, short, embed, and playlist links. |
| transcriptFormat | string | Choose all, textArray, textWithTimestamps, fullText, or xml. Defaults to all. |
| maxRetries | integer | Number of retries for transient failures (default 3). |
| proxyConfiguration | object | Standard Apify proxy config. Prefilled with the BUYPROXIES94952 datacenter group for reliable scraping; swap to StaticUS3, RESIDENTIAL, or custom URLs if needed. |
Example payload (also available in examples/input.json):
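A minimal Python sketch of such a payload, built from the fields in the table above. It assumes the usual Apify conventions: request-list sources as `{"url": ...}` objects, and proxy settings under `useApifyProxy` / `apifyProxyGroups`. The `build_input` helper is hypothetical; check examples/input.json for the authoritative shape.

```python
import json
from typing import Iterable, Sequence

ALLOWED_FORMATS = {"all", "textArray", "textWithTimestamps", "fullText", "xml"}


def build_input(
    video_urls: Iterable[str],
    transcript_format: str = "all",
    max_retries: int = 3,
    proxy_groups: Sequence[str] = ("BUYPROXIES94952",),
) -> dict:
    """Assemble an actor input dict, rejecting values the table rules out."""
    urls = list(video_urls)
    if not urls:
        raise ValueError("videoUrls is required and must not be empty")
    if transcript_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported transcriptFormat: {transcript_format!r}")
    if max_retries < 0:
        raise ValueError("maxRetries must be zero or greater")
    return {
        "videoUrls": [{"url": url} for url in urls],  # assumed request-list shape
        "transcriptFormat": transcript_format,
        "maxRetries": max_retries,
        "proxyConfiguration": {  # assumed standard Apify proxy keys
            "useApifyProxy": True,
            "apifyProxyGroups": list(proxy_groups),
        },
    }


if __name__ == "__main__":
    # <VIDEO_ID> is a placeholder, not a real video.
    payload = build_input(["https://www.youtube.com/watch?v=<VIDEO_ID>"], "fullText")
    print(json.dumps(payload, indent=2))
```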
Output Schema
Every dataset item matches the JSON schema published in .actor/output_schema.json. The top-level structure is:
| Field | Description | 
|---|---|
| videoId | 11-character YouTube identifier. | 
| url | Original URL (or ID) submitted. | 
| metadata | Rich video data: title, channel info, view/like/comment counts, publish date, description, tags, thumbnail. | 
| transcripts | The transcript in the formats you requested. Contains textArray, textWithTimestamps, fullText, and/or xml. When captions are unavailable, this field is null. |
Because the schema is machine-readable, you can quickly validate the dataset in CI or generate strongly typed DTOs for downstream services.
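As a lightweight illustration of such a CI check, the sketch below verifies only the top-level structure listed in the table, using the Python standard library. It is hand-written from the table, not derived from .actor/output_schema.json, and `check_record` is a hypothetical helper; a full JSON Schema validator run against the published schema would be stricter.

```python
from typing import List

TRANSCRIPT_KEYS = {"textArray", "textWithTimestamps", "fullText", "xml"}


def check_record(record: dict) -> List[str]:
    """Return a list of problems found in one dataset item (empty if none)."""
    errors = []
    for name, expected in (("videoId", str), ("url", str), ("metadata", dict)):
        if not isinstance(record.get(name), expected):
            errors.append(f"{name} must be a {expected.__name__}")
    video_id = record.get("videoId")
    if isinstance(video_id, str) and len(video_id) != 11:
        errors.append("videoId must be 11 characters long")
    if "transcripts" not in record:
        errors.append("transcripts is missing")
    else:
        transcripts = record["transcripts"]
        if transcripts is not None:  # null means captions were unavailable
            if not isinstance(transcripts, dict):
                errors.append("transcripts must be an object or null")
            else:
                unknown = sorted(set(transcripts) - TRANSCRIPT_KEYS)
                if unknown:
                    errors.append(f"unknown transcript formats: {unknown}")
    return errors
```

A CI job could run this over every downloaded dataset item and fail the build when any item returns a non-empty list.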
Example Dataset Record
See more examples in examples/dataset-sample.json.
How It Works
- Input normalisation – Accepts raw IDs, long URLs, short URLs, or request-list sources and extracts the canonical video ID.
- Proxy initialisation – Boots the global-agent HTTP proxy layer (if requested) and rotates sessions per video.
- Metadata fetch – Uses ytdl-core plus youtubei.js to obtain video info, ensuring metrics even when the public API changes.
- Transcript retrieval
  - First tries youtube-captions-scraper for clean text
  - Falls back to youtubei.js transcript endpoints
  - Finally parses timed-text XML if required
- Formatting – Converts captions into the requested output formats and synthesises XML when Google throttles the timed-text endpoint.
- Persistence – Pushes each result to the default Apify dataset, respecting the JSON schema for easy downstream use.
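The input normalisation step can be illustrated with a short Python sketch. It is an assumption about how such normalisation might work, not the actor's code: the accepted hosts, the path prefixes, and the `extract_video_id` name are all illustrative choices.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"^[A-Za-z0-9_-]{11}$")
_YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "music.youtube.com", "youtube-nocookie.com"}
_PATH_PREFIXES = {"embed", "shorts", "live", "v"}


def extract_video_id(value: str) -> Optional[str]:
    """Return the 11-character video ID from a bare ID or a YouTube URL."""
    value = value.strip()
    if _ID_RE.match(value):
        return value  # already a bare ID
    parsed = urlparse(value if "://" in value else "https://" + value)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in _YOUTUBE_HOSTS:
        if parsed.path == "/watch":
            # Standard and playlist links carry the ID in the v parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = [part for part in parsed.path.split("/") if part]
            if len(parts) >= 2 and parts[0] in _PATH_PREFIXES:
                candidate = parts[1]
    return candidate if candidate and _ID_RE.match(candidate) else None
```

Validating the candidate against the ID pattern at the end means malformed links return `None` rather than a truncated or oversized ID.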
SEO & Content Marketing Ideas
- Repurpose transcripts into articles – Feed fullText into a summariser to craft blog posts or landing pages quickly.
- Optimise long-tail keywords – Use metadata.tags and subtitles to identify phrases worth targeting in SEO campaigns.
- Build GIF or reel scripts – Timestamped captions (textWithTimestamps) help editors cut highlight clips or reels.
- Create accessible archives – Convert xml or textArray into readable transcripts for accessible knowledge bases.
- Monitor competitors – Track rival channels for trending topics and keyword gaps.
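To make the relationship between the four output formats concrete, here is a rough Python sketch that derives each of them from one list of caption segments. The segment keys (`start`, `dur`, `text`) and the `<transcript><text ...>` XML layout are assumptions modelled on YouTube's timed-text format; the actor's real field names may differ.

```python
from typing import List
from xml.sax.saxutils import escape


def to_timed_text_xml(segments: List[dict]) -> str:
    """Render segments as a timed-text style XML document (assumed layout)."""
    rows = [
        f'<text start="{s["start"]}" dur="{s["dur"]}">{escape(s["text"].strip())}</text>'
        for s in segments
    ]
    return '<?xml version="1.0" encoding="utf-8"?><transcript>' + "".join(rows) + "</transcript>"


def format_transcript(segments: List[dict], transcript_format: str = "all") -> dict:
    """Build the requested transcript format(s) from raw caption segments."""
    text_array = [s["text"].strip() for s in segments]
    outputs = {
        "textArray": text_array,
        "textWithTimestamps": [
            {"start": s["start"], "dur": s["dur"], "text": s["text"].strip()}
            for s in segments
        ],
        "fullText": " ".join(text_array),
        "xml": to_timed_text_xml(segments),
    }
    if transcript_format == "all":
        return outputs
    return {transcript_format: outputs[transcript_format]}
```

For article drafting only `fullText` is needed, while clip editors would read `start` and `dur` from `textWithTimestamps` to locate cut points.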
FAQ
Q: Do I need proxies?
A: Not strictly, but enabling Apify Proxy (prefilled with BUYPROXIES94952) drastically reduces 410/429 errors and lets you access region-locked videos.
Q: Does it work with auto-generated captions?
A: Yes. The actor prefers manual English subtitles but will automatically fall back to auto-generated English transcripts and log a warning if only machine captions are available.
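The preference order in this answer could look roughly like the Python sketch below. It assumes each caption track is a dict with a `languageCode` and, for auto-generated tracks, `kind` set to `"asr"`; those keys and the `pick_english_track` helper are illustrative assumptions rather than the actor's actual data model.

```python
import logging
from typing import List, Optional

logger = logging.getLogger("transcript")


def pick_english_track(tracks: List[dict]) -> Optional[dict]:
    """Prefer a manual English track, then an auto-generated one, else None."""
    english = [t for t in tracks if t.get("languageCode", "").lower().startswith("en")]
    manual = [t for t in english if t.get("kind") != "asr"]
    if manual:
        return manual[0]
    if english:
        logger.warning("Only auto-generated English captions are available")
        return english[0]
    return None
```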
Q: Can I request other languages?
A: The current release targets English (en/auto). To add other language preferences, fork the actor; contributions are also welcome (see below).
Q: How do I validate outputs?
A: Use the .actor/output_schema.json file with any JSON Schema validator or integrate it into your build pipeline.
Contributing & Support
- Issues / Ideas – Open an issue or submit a pull request on GitHub.
- Commercial support – Need custom formats, extra language support, or private deployment? Reach out through Apify Marketplace or GitHub discussions.
- Inspiration – Let us know how you use the actor; community showcases help others and improve search visibility!
Happy scraping and content creating! 🚀