YouTube Transcript Corpus Audit & RAG Readiness

Pricing

from $6.00 / 1,000 transcript rag chunks

Extract public YouTube captions, audit transcript coverage, score RAG readiness, and create timestamped supporting chunks without double charging report mode.

Rating: 0.0 (0)

Developer: 太郎 山田 (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 13 days ago

YouTube Corpus Audit & RAG Readiness Report

Works best after

YouTube Channel Transcript RAG Intelligence is easiest to buy after a related Actor has already produced public rows or source context.

Start with $9 / corpus_snapshot_report; upgrade to $29 / rag_readiness_report only when the first report needs deeper action detail. Internal links improve discovery only; any qualified forecast still requires accounted paid usage.

After this run

Turn this Actor's output into a capped paid report with Website RAG Readiness Audit Report. Use it when AI builders, documentation teams, support teams, and technical marketers need to decide whether public website pages are clean and complete enough for RAG ingestion.

  • First report: $9 / website_rag_snapshot_report; set maxChargeUsd to $9.
  • Deeper report: $29 / website_rag_readiness_report; use it only when the first result needs more competitor depth or action depth.
  • This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.

Proof-focused buyer summary

Built for AI builders, content teams, and knowledge-base owners who need to decide whether a YouTube corpus is clean enough for RAG before building embeddings or a chatbot.

  • Buy this when: you want to avoid building embeddings on a corpus that will later fail on retrieval quality.
  • Entry: $9 / corpus_snapshot_report. Checks transcript availability, coverage, and basic corpus risk.
  • Premium: $29 / rag_readiness_report. Adds RAG readiness, chunking risks, retrieval QA actions, and missing-content priorities.
  • Output promise: decision summary, score, three prioritized actions, source URLs, warnings, chargedEvent, chargedUsd, and previewReport.nextRunInput.
  • Safety: keep maxChargeUsd equal to the tier price. Demo, dry run, blocked/private sources, failed sources, and cap-limited runs are no-charge.
  • Not promised: rankings, revenue, conversion lift, sales lift, legal/procurement/financial advice, or private-source enrichment.
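The safety rule above (keep maxChargeUsd equal to the tier price) can be checked before a run is submitted. The following is a minimal sketch, not part of the Actor: the `TIER_PRICE_USD` mapping and the `check_cap` helper are hypothetical names, and the prices are simply the $9 and $29 figures quoted in this listing.

```python
# Tier prices as quoted in the listing text. The mapping and helper below are
# hypothetical client-side conveniences, not part of the Actor's API.
TIER_PRICE_USD = {"corpus_snapshot": 9, "rag_readiness": 29}


def check_cap(run_input: dict) -> list[str]:
    """Return human-readable problems with the tier / cap combination."""
    tier = run_input.get("reportTier")
    cap = run_input.get("maxChargeUsd")
    if tier not in TIER_PRICE_USD:
        return [f"unknown or non-report tier: {tier!r}"]
    price = TIER_PRICE_USD[tier]
    if cap is None:
        return ["maxChargeUsd is not set"]
    if cap < price:
        return [f"cap {cap} is below the {tier} price {price}"]
    if cap > price:
        return [f"cap {cap} exceeds the {tier} price {price}"]
    return []


entry = {"reportTier": "corpus_snapshot", "maxChargeUsd": 9}
print(check_cap(entry))  # an empty list means the cap matches the tier
```

An empty result means the input follows the listing's recommendation; anything else is worth fixing before the run starts.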

Entry first-run input:

{
  "demoMode": false,
  "dryRun": false,
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxReports": 1,
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "maxVideos": 1,
  "includeTranscriptQuality": true,
  "includeRagReadiness": true
}

Premium upgrade input:

{
  "demoMode": false,
  "dryRun": false,
  "reportTier": "rag_readiness",
  "maxChargeUsd": 29,
  "maxReports": 1,
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "maxVideos": 10,
  "includeTranscriptQuality": true,
  "includeRagReadiness": true,
  "includeChunkingPlan": true,
  "includeRetrievalRisks": true
}

Next report-style Actors

If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by maxChargeUsd, and do not make business outcome claims.

Turn public YouTube captions from videos, playlists, or channels into a decision-ready RAG corpus audit. The report focuses on caption coverage, missing-caption risk, chunking quality, retrieval QA actions, and the next run needed to move from raw transcript extraction to usable AI retrieval.

Use corpus_snapshot for a compact coverage checklist, or rag_readiness when you need retrieval QA actions and prioritized fixes. The legacy chunks mode is still available for timestamped transcript rows.

First-run quality and billing cue

Start with demoMode or dryRun when you only want to inspect the output shape. For paid runs, use the entry report first, set maxChargeUsd as a hard cap, and move to premium only when you need comparison, recommendations, or a decision-ready action list. Pricing is controlled by the Apify Store listing, and maxChargeUsd should match the tier the buyer expects to run.
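The staged approach described above (inspect the shape first, then the entry report, then premium only if needed) could be prepared as three inputs built from the same sources. This is a sketch under assumptions: `staged_inputs` is a hypothetical helper, and it only builds the input dictionaries using field names that appear in this listing; submitting them to Apify is left out.

```python
import copy

# Tier settings taken from the listing text.
ENTRY = {"reportTier": "corpus_snapshot", "maxChargeUsd": 9}
PREMIUM = {"reportTier": "rag_readiness", "maxChargeUsd": 29}


def staged_inputs(sources: dict) -> list[dict]:
    """Build [dry run, entry report, premium report] inputs from one source spec."""
    dry = {**copy.deepcopy(sources), **ENTRY, "dryRun": True}
    entry = {**copy.deepcopy(sources), **ENTRY, "dryRun": False}
    premium = {**copy.deepcopy(sources), **PREMIUM, "dryRun": False}
    return [dry, entry, premium]


stages = staged_inputs({
    "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
    "maxVideos": 1,
})
for stage in stages:
    print(stage["reportTier"], stage["dryRun"], stage["maxChargeUsd"])
```

Running the stages in order, and stopping as soon as the output answers the question, keeps spend at or below the cap of the last stage actually run.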

Store Quickstart

Recommended first run:

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "language": "en",
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxVideos": 1,
  "delivery": "dataset",
  "dryRun": false
}

Input Examples

Corpus Snapshot Report

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxVideos": 1,
  "delivery": "dataset",
  "dryRun": false
}

RAG Readiness Report

{
  "channelUrls": [
    "https://www.youtube.com/@OpenAI"
  ],
  "reportTier": "rag_readiness",
  "maxChargeUsd": 29,
  "maxVideos": 5,
  "chunkSize": 1200,
  "chunkOverlap": 150,
  "delivery": "dataset",
  "dryRun": false
}

Legacy Transcript Chunks

{
  "videoIds": [
    "dQw4w9WgXcQ"
  ],
  "reportTier": "chunks",
  "language": "en",
  "delivery": "webhook",
  "webhookUrl": "https://example.com/webhook",
  "dryRun": false
}

Sample Output

{
  "meta": {
    "actorName": "youtube-channel-transcript-rag-intelligence",
    "actorTitle": "YouTube Channel Transcript RAG Intelligence",
    "fetchedAt": "2026-05-09T00:00:00.000Z",
    "totalRows": 2
  },
  "rows": [
    {
      "rowType": "corpus_audit_report",
      "reportTier": "rag_readiness",
      "status": "success",
      "chargedEvent": "rag_readiness_report",
      "chargedUsd": 29,
      "decisionSummary": "RAG readiness report: 4/5 videos have usable public captions, coverage score 80, RAG readiness 76/100, missing-caption risk medium.",
      "coverageScore": 80,
      "ragReadinessScore": 76,
      "missingCaptionRisk": "medium",
      "chunkingRisks": [
        {
          "severity": "low",
          "code": "chunking_ok",
          "action": "Use current chunking as a baseline for retrieval QA."
        }
      ],
      "retrievalQaChecklist": [
        {
          "check": "caption_coverage",
          "status": "pass",
          "action": "Confirm target videos have public captions before indexing."
        },
        {
          "check": "grounding",
          "status": "required",
          "action": "Run golden Q&A prompts and require timestamped source citations."
        }
      ],
      "actionList": [
        "Replace captionless or unavailable videos and keep their warning rows as no-charge source evidence.",
        "Build 5-10 golden retrieval questions and verify each answer cites a timestamped source chunk.",
        "Tag missing-caption videos as ingestion blockers before scheduling recurring corpus updates."
      ],
      "previewReport": {
        "nextRunInput": {
          "channelUrls": ["https://www.youtube.com/@OpenAI"],
          "reportTier": "rag_readiness",
          "maxChargeUsd": 29,
          "dryRun": false
        }
      },
      "sourceUrls": ["https://www.youtube.com/@OpenAI"]
    }
  ],
  "warnings": []
}
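A consumer of this output will usually want only a few fields: whether the report succeeded, what was charged, which checklist items are still open, and the suggested next input. The sketch below shows one way to pull those out. `summarize_report` is a hypothetical helper, the embedded payload is a trimmed copy of the sample above, and treating any checklist status other than "pass" as open is an assumption.

```python
def summarize_report(payload: dict) -> dict:
    """Extract the decision-relevant fields from the first report row."""
    reports = [r for r in payload.get("rows", [])
               if r.get("rowType") == "corpus_audit_report"]
    if not reports:
        return {"ok": False, "reason": "no report row found"}
    report = reports[0]
    # Assumption: anything other than "pass" still needs attention.
    open_checks = [c["check"] for c in report.get("retrievalQaChecklist", [])
                   if c.get("status") != "pass"]
    return {
        "ok": report.get("status") == "success",
        "chargedUsd": report.get("chargedUsd", 0),
        "ragReadinessScore": report.get("ragReadinessScore"),
        "openChecks": open_checks,
        "nextRunInput": report.get("previewReport", {}).get("nextRunInput"),
    }


# Trimmed copy of the sample output shown above.
sample = {
    "rows": [{
        "rowType": "corpus_audit_report",
        "status": "success",
        "chargedUsd": 29,
        "ragReadinessScore": 76,
        "retrievalQaChecklist": [
            {"check": "caption_coverage", "status": "pass"},
            {"check": "grounding", "status": "required"},
        ],
        "previewReport": {"nextRunInput": {"reportTier": "rag_readiness",
                                           "maxChargeUsd": 29}},
    }],
    "warnings": [],
}

summary = summarize_report(sample)
print(summary["ok"], summary["chargedUsd"], summary["openChecks"])
```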

Output Fields

Report rows include:

  • decisionSummary
  • coverageScore
  • ragReadinessScore
  • missingCaptionRisk
  • chunkingRisks
  • retrievalQaChecklist
  • actionList and prioritizedActions
  • previewReport.nextRunInput
  • status, chargedEvent, chargedUsd, reason
  • sourceUrls, warnings, errors

Supporting transcript chunks are included as no-charge evidence rows in report mode. Legacy chunk mode keeps timestamped rag_chunk rows with video metadata, chunk text, timestamps, and source URLs.
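To make the idea of timestamped chunks concrete, here is a minimal sketch of overlapping chunking over caption segments. It is not the Actor's implementation. It assumes that chunkSize and chunkOverlap are measured in characters of caption text, and that each caption segment has `start`, `duration`, and `text` fields; `chunk_segments` and that segment shape are hypothetical.

```python
def chunk_segments(segments, chunk_size=1200, chunk_overlap=150):
    """Group caption segments into overlapping, timestamped chunks.

    Sizes are counted in characters of caption text (an assumption).
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    def emit(window):
        last = window[-1]
        return {
            "start": window[0]["start"],
            "end": last["start"] + last["duration"],
            "text": " ".join(s["text"] for s in window),
        }

    chunks, window, fresh = [], [], 0
    for seg in segments:
        window.append(seg)
        fresh += 1  # segments added since the last emitted chunk
        if sum(len(s["text"]) for s in window) >= chunk_size:
            chunks.append(emit(window))
            # Carry over trailing segments totalling at most chunk_overlap chars.
            tail, size = [], 0
            for s in reversed(window):
                if size + len(s["text"]) > chunk_overlap:
                    break
                tail.insert(0, s)
                size += len(s["text"])
            window, fresh = tail, 0
    if fresh:  # flush the remainder only if it holds new segments
        chunks.append(emit(window))
    return chunks


segments = [{"start": i * 2.0, "duration": 2.0, "text": "x" * 10} for i in range(5)]
chunks = chunk_segments(segments, chunk_size=30, chunk_overlap=10)
for c in chunks:
    print(c["start"], c["end"], len(c["text"]))
```

Splitting on segment boundaries rather than raw character offsets keeps every chunk's start and end aligned to real caption timestamps, which is what makes timestamped source citations possible.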

Pricing and No-Charge Rules

  • corpus_snapshot emits corpus_snapshot_report.
  • rag_readiness emits rag_readiness_report.
  • Report mode charges at most one paid event per run. Supporting chunks are no-charge evidence rows.
  • dryRun runs, demoMode runs, caption failures, source failures, and rows stopped by the maxChargeUsd limit are no-charge.
  • For recurring checks, save a successful report input as an Apify task and keep maxChargeUsd aligned with the selected tier.
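The rules above can be turned into a small post-run check on the returned rows. The following is a sketch under assumptions: `audit_charges` is a hypothetical helper, it relies only on the `status` and `chargedUsd` fields shown in the sample output, and it assumes "success" is the only status that should carry a charge.

```python
def audit_charges(rows: list[dict], max_charge_usd: float) -> list[str]:
    """Return violations of the listing's pricing rules found in output rows."""
    issues = []
    paid = [r for r in rows if (r.get("chargedUsd") or 0) > 0]
    if len(paid) > 1:
        issues.append(f"{len(paid)} paid rows found; report mode should charge at most one")
    total = sum(r["chargedUsd"] for r in paid)
    if total > max_charge_usd:
        issues.append(f"total charge {total} exceeds cap {max_charge_usd}")
    for r in paid:
        # Assumption: only successful report rows should be charged.
        if r.get("status") != "success":
            issues.append(f"charged row has status {r.get('status')!r}")
    return issues


rows = [
    {"rowType": "corpus_audit_report", "status": "success",
     "chargedEvent": "rag_readiness_report", "chargedUsd": 29},
    {"rowType": "rag_chunk", "chargedUsd": 0},  # supporting evidence row
]
print(audit_charges(rows, max_charge_usd=29))  # an empty list means no violations
```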

Compliance Guardrails

  • Uses public YouTube pages and public caption tracks only.
  • No account session, private video, member-only, paywalled, or login-only access is used.
  • No CAPTCHA or rate-limit bypass is attempted.
  • Do not position output as a replacement for rights-managed transcript licensing.
  • Do not claim ranking, sales, or revenue improvements from the report.
  • Do not use provider emblems or wording that implies upstream approval.

See Also

If this Actor gave you raw rows or source context, use these follow-on report Actors when you want a capped, decision-ready report instead of more raw rows. They use public or user-provided inputs, are designed for a small paid run capped by maxChargeUsd, and do not promise rankings, revenue, conversion lifts, or sales outcomes.

  • Website RAG Readiness Audit Report - decide whether public website pages are clean and complete enough for RAG ingestion. Entry $9 / website_rag_snapshot_report; premium $29 / website_rag_readiness_report.

Keep maxChargeUsd equal to the selected tier. Internal links are traffic aids only; real proof requires accounted paid usage.