YouTube Transcript Corpus Audit & RAG Readiness
Pricing
from $6.00 / 1,000 transcript rag chunks
Extract public YouTube captions, audit transcript coverage, score RAG readiness, and create timestamped supporting chunks without double charging in report mode.
Developer
太郎 山田
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 13 days ago
YouTube Corpus Audit & RAG Readiness Report
Works best after
YouTube Channel Transcript RAG Intelligence is easiest to buy after one of these related Actors has already produced public rows or source context:
- Youtube Transcript Bulk Api - use its public rows or source context as the starting input.
- Youtube Channel Analytics - use its public rows or source context as the starting input.
Start with $9 / corpus_snapshot_report; upgrade to $29 / rag_readiness_report only when the first report needs deeper action detail.
Internal links improve discovery only. Qualified forecast still requires accounted paid usage.
After this run
Turn this Actor's output into a capped paid report with Website RAG Readiness Audit Report. Use it when AI builders, documentation teams, support teams, and technical marketers need to decide whether public website pages are clean and complete enough for RAG ingestion.
- First report: $9 / website_rag_snapshot_report; set maxChargeUsd to $9.
- Deeper report: $29 / website_rag_readiness_report; use only when the first result needs competitor or action depth.
- This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.
Proof-focused buyer summary
Built for AI builders, content teams, and knowledge-base owners who need to decide whether a YouTube corpus is clean enough for RAG before building embeddings or a chatbot.
- Buy this when: you want to avoid building embeddings on a corpus that will fail retrieval quality later.
- Entry: $9 / corpus_snapshot_report - $9 checks transcript availability, coverage, and basic corpus risk.
- Premium: $29 / rag_readiness_report - $29 adds RAG readiness, chunking risks, retrieval QA actions, and missing-content priorities.
- Output promise: decision summary, score, three prioritized actions, source URLs, warnings, chargedEvent, chargedUsd, and previewReport.nextRunInput.
- Safety: keep maxChargeUsd equal to the tier price. Demo, dry run, blocked/private sources, failed sources, and cap-limited runs are no-charge.
- Not promised: rankings, revenue, conversion lift, sales lift, legal/procurement/financial advice, or private-source enrichment.
Entry first-run input:
{"demoMode": false,"dryRun": false,"reportTier": "corpus_snapshot","maxChargeUsd": 9,"maxReports": 1,"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],"maxVideos": 1,"includeTranscriptQuality": true,"includeRagReadiness": true}
Premium upgrade input:
{"demoMode": false,"dryRun": false,"reportTier": "rag_readiness","maxChargeUsd": 29,"maxReports": 1,"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],"maxVideos": 10,"includeTranscriptQuality": true,"includeRagReadiness": true,"includeChunkingPlan": true,"includeRetrievalRisks": true}
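The two inputs above differ mainly in the tier name, the cap, and two premium-only flags. A minimal Python sketch of a helper that builds either input and keeps maxChargeUsd equal to the tier price could look like the following. The helper name build_run_input and the TIER_PRICES_USD table are hypothetical; the field names and prices are copied from the examples above, and the Actor's real input schema may accept more or fewer fields.

```python
import json

# Tier names and prices as stated in this listing; the table itself is illustrative.
TIER_PRICES_USD = {"corpus_snapshot": 9, "rag_readiness": 29}


def build_run_input(report_tier, video_urls, max_videos=1, dry_run=False):
    """Return an input dict whose maxChargeUsd equals the tier price (hypothetical helper)."""
    if report_tier not in TIER_PRICES_USD:
        raise ValueError(f"unknown report tier: {report_tier!r}")
    if not video_urls:
        raise ValueError("at least one video URL is required")
    run_input = {
        "demoMode": False,
        "dryRun": dry_run,
        "reportTier": report_tier,
        "maxChargeUsd": TIER_PRICES_USD[report_tier],
        "maxReports": 1,
        "videoUrls": list(video_urls),
        "maxVideos": max_videos,
        "includeTranscriptQuality": True,
        "includeRagReadiness": True,
    }
    if report_tier == "rag_readiness":
        # These two flags appear only in the premium example above.
        run_input["includeChunkingPlan"] = True
        run_input["includeRetrievalRisks"] = True
    return run_input


entry_input = build_run_input(
    "corpus_snapshot", ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"]
)
print(json.dumps(entry_input, indent=2))
```

Deriving the cap from the tier name, rather than typing it by hand, is one way to avoid a premium run with an entry-level cap or the reverse.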
Next report-style Actors
If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by maxChargeUsd, and do not make business outcome claims.
- Website RAG Readiness Audit Report - audit public web pages before mixing them with transcript corpora in a RAG system.
Turn public YouTube captions from videos, playlists, or channels into a decision-ready RAG corpus audit. The report focuses on caption coverage, missing-caption risk, chunking quality, retrieval QA actions, and the next run needed to move from raw transcript extraction to usable AI retrieval.
Use corpus_snapshot for a compact coverage checklist, or rag_readiness when you need retrieval QA actions and prioritized fixes. The legacy chunks mode is still available for timestamped transcript rows.
First-run quality and billing cue
Start with demoMode or dryRun when you only want to inspect the output shape. For paid runs, use the entry report first, set maxChargeUsd as a hard cap, and move to premium only when you need comparison, recommendations, or a decision-ready action list. Pricing is controlled by the Apify Store listing, and maxChargeUsd should match the tier the buyer expects to run.
Store Quickstart
Recommended first run:
{"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],"language": "en","reportTier": "corpus_snapshot","maxChargeUsd": 9,"maxVideos": 1,"delivery": "dataset","dryRun": false}
Input Examples
Corpus Snapshot Report
{"videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],"reportTier": "corpus_snapshot","maxChargeUsd": 9,"maxVideos": 1,"delivery": "dataset","dryRun": false}
RAG Readiness Report
{"channelUrls": ["https://www.youtube.com/@OpenAI"],"reportTier": "rag_readiness","maxChargeUsd": 29,"maxVideos": 5,"chunkSize": 1200,"chunkOverlap": 150,"delivery": "dataset","dryRun": false}
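To give a feel for what chunkSize and chunkOverlap control, here is a rough Python sketch of overlap-aware chunking over timestamped caption segments. It is not the Actor's implementation: the listing does not say whether chunkSize is measured in characters or tokens, so this sketch assumes characters, and the segment shape (start, duration, text) and the function names are hypothetical.

```python
def _make_chunk(segments):
    """Join caption segments into one chunk and keep its time range."""
    last = segments[-1]
    return {
        "text": " ".join(seg["text"] for seg in segments),
        "start": segments[0]["start"],
        "end": last["start"] + last["duration"],
    }


def chunk_segments(segments, chunk_size=1200, chunk_overlap=150):
    """Greedy chunking by character count (assumed unit), overlapping on whole segments."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks, current, current_len = [], [], 0
    for seg in segments:
        seg_len = len(seg["text"]) + 1  # +1 for the joining space
        if current and current_len + seg_len > chunk_size:
            chunks.append(_make_chunk(current))
            # Carry over trailing segments that fit inside the overlap budget.
            carried, carried_len = [], 0
            for prev in reversed(current):
                prev_len = len(prev["text"]) + 1
                if carried_len + prev_len > chunk_overlap:
                    break
                carried.insert(0, prev)
                carried_len += prev_len
            current, current_len = carried, carried_len
        current.append(seg)
        current_len += seg_len
    if current:
        chunks.append(_make_chunk(current))
    return chunks


# Tiny made-up segments so the overlap is easy to see.
demo_segments = [
    {"start": i * 5.0, "duration": 5.0, "text": f"caption{i:02d}"} for i in range(6)
]
chunks = chunk_segments(demo_segments, chunk_size=30, chunk_overlap=10)
for chunk in chunks:
    print(chunk["start"], chunk["end"], chunk["text"])
```

Overlapping on whole segments keeps every chunk boundary aligned with a caption timestamp, which makes timestamped source citations straightforward. A single segment longer than chunk_size is kept whole rather than split.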
Legacy Transcript Chunks
{"videoIds": ["dQw4w9WgXcQ"],"reportTier": "chunks","language": "en","delivery": "webhook","webhookUrl": "https://example.com/webhook","dryRun": false}
Sample Output
{"meta": {"actorName": "youtube-channel-transcript-rag-intelligence","actorTitle": "YouTube Channel Transcript RAG Intelligence","fetchedAt": "2026-05-09T00:00:00.000Z","totalRows": 2},"rows": [{"rowType": "corpus_audit_report","reportTier": "rag_readiness","status": "success","chargedEvent": "rag_readiness_report","chargedUsd": 29,"decisionSummary": "RAG readiness report: 4/5 videos have usable public captions, coverage score 80, RAG readiness 76/100, missing-caption risk medium.","coverageScore": 80,"ragReadinessScore": 76,"missingCaptionRisk": "medium","chunkingRisks": [{"severity": "low","code": "chunking_ok","action": "Use current chunking as a baseline for retrieval QA."}],"retrievalQaChecklist": [{"check": "caption_coverage","status": "pass","action": "Confirm target videos have public captions before indexing."},{"check": "grounding","status": "required","action": "Run golden Q&A prompts and require timestamped source citations."}],"actionList": ["Replace captionless or unavailable videos and keep their warning rows as no-charge source evidence.","Build 5-10 golden retrieval questions and verify each answer cites a timestamped source chunk.","Tag missing-caption videos as ingestion blockers before scheduling recurring corpus updates."],"previewReport": {"nextRunInput": {"channelUrls": ["https://www.youtube.com/@OpenAI"],"reportTier": "rag_readiness","maxChargeUsd": 29,"dryRun": false}},"sourceUrls": ["https://www.youtube.com/@OpenAI"]}],"warnings": []}
Output Fields
Report rows include:
- decisionSummary
- coverageScore
- ragReadinessScore
- missingCaptionRisk
- chunkingRisks
- retrievalQaChecklist
- actionList and prioritizedActions
- previewReport.nextRunInput
- status, chargedEvent, chargedUsd, reason
- sourceUrls, warnings, errors
Supporting transcript chunks are included as no-charge evidence rows in report mode. Legacy chunk mode keeps timestamped rag_chunk rows with video metadata, chunk text, timestamps, and source URLs.
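As an illustration of consuming these fields, the following Python sketch pulls the decision-relevant values out of a payload shaped like the sample output. The payload below is a trimmed copy of that sample; the function name summarize_reports is hypothetical, and filtering on rowType "corpus_audit_report" assumes report rows always carry that value, as the sample suggests.

```python
# Trimmed copy of the sample output shown earlier; only a few fields are kept.
sample_payload = {
    "meta": {"actorName": "youtube-channel-transcript-rag-intelligence"},
    "rows": [
        {
            "rowType": "corpus_audit_report",
            "reportTier": "rag_readiness",
            "status": "success",
            "chargedEvent": "rag_readiness_report",
            "chargedUsd": 29,
            "coverageScore": 80,
            "ragReadinessScore": 76,
            "missingCaptionRisk": "medium",
            "actionList": [
                "Tag missing-caption videos as ingestion blockers before "
                "scheduling recurring corpus updates."
            ],
            "previewReport": {
                "nextRunInput": {
                    "channelUrls": ["https://www.youtube.com/@OpenAI"],
                    "reportTier": "rag_readiness",
                    "maxChargeUsd": 29,
                    "dryRun": False,
                }
            },
        }
    ],
    "warnings": [],
}


def summarize_reports(payload):
    """Collect score, charge, and next-run details from report rows (hypothetical helper)."""
    summaries = []
    for row in payload.get("rows", []):
        if row.get("rowType") != "corpus_audit_report":
            continue  # skip supporting evidence rows
        preview = row.get("previewReport") or {}
        summaries.append(
            {
                "tier": row.get("reportTier"),
                "status": row.get("status"),
                "coverageScore": row.get("coverageScore"),
                "ragReadinessScore": row.get("ragReadinessScore"),
                "chargedUsd": row.get("chargedUsd", 0),
                "topActions": (row.get("actionList") or [])[:3],
                "nextRunInput": preview.get("nextRunInput"),
            }
        )
    return summaries


summaries = summarize_reports(sample_payload)
print(summaries[0]["ragReadinessScore"], summaries[0]["nextRunInput"])
```

Reading with .get() and defaults keeps the sketch tolerant of rows that omit optional fields such as previewReport.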
Pricing and No-Charge Rules
- corpus_snapshot emits corpus_snapshot_report.
- rag_readiness emits rag_readiness_report.
- Report mode charges at most one paid event per run. Supporting chunks are no-charge evidence rows.
- dryRun, demoMode, caption failures, source failures, and maxChargeUsd limit rows are no-charge.
- For recurring checks, save a successful report input as an Apify task and keep maxChargeUsd aligned with the selected tier.
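The rules above can be checked after a run with a small script. The following is a sketch under assumptions, not part of the Actor: it assumes no-charge rows carry chargedUsd of 0 or omit the field, it uses rag_chunk as a stand-in rowType for supporting evidence rows, and the function name audit_charges is hypothetical.

```python
def audit_charges(rows, max_charge_usd):
    """Check 'at most one paid event per run' and 'total within cap' for a list of rows."""
    # Assumption: rows that were not charged have chargedUsd == 0 or no chargedUsd at all.
    paid = [row for row in rows if (row.get("chargedUsd") or 0) > 0]
    total = sum(row["chargedUsd"] for row in paid)
    problems = []
    if len(paid) > 1:
        problems.append(f"expected at most one paid event, found {len(paid)}")
    if total > max_charge_usd:
        problems.append(f"total charge {total} exceeds cap {max_charge_usd}")
    return {
        "paidEvents": [row.get("chargedEvent") for row in paid],
        "totalUsd": total,
        "problems": problems,
    }


# Illustrative rows: one paid report plus one no-charge evidence row.
demo_rows = [
    {"rowType": "corpus_audit_report", "chargedEvent": "rag_readiness_report", "chargedUsd": 29},
    {"rowType": "rag_chunk", "chargedUsd": 0},
]
charge_report = audit_charges(demo_rows, max_charge_usd=29)
print(charge_report)
```

Running a check like this against the dataset of a saved Apify task is one way to confirm that recurring runs stay inside the selected tier.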
Compliance Guardrails
- Uses public YouTube pages and public caption tracks only.
- No account session, private video, member-only, paywalled, or login-only access is used.
- No CAPTCHA or rate-limit bypass is attempted.
- Do not position output as a replacement for rights-managed transcript licensing.
- Do not claim ranking, sales, or revenue improvements from the report.
- Do not use provider emblems or wording that implies upstream approval.
See Also
Related report Actors
Use these follow-on Actors when you want a capped, decision-ready report instead of more raw rows. They use public or user-provided inputs, respect maxChargeUsd, and do not promise rankings, revenue, conversion lifts, or sales outcomes.
- Website RAG Readiness Audit - audit public support pages before mixing them with transcript corpora.
Related paid report workflows
If this Actor gave you raw rows or source context, these follow-on report Actors are designed for a small capped paid run. They help make a decision, not just collect more data.
- Website RAG Readiness Audit Report - decide whether public website pages are clean and complete enough for RAG ingestion. Entry $9 / website_rag_snapshot_report; premium $29 / website_rag_readiness_report.
Keep maxChargeUsd equal to the selected tier. Internal links are traffic aids only; real proof requires accounted paid usage.