YouTube Transcript Corpus Audit & RAG Readiness
Pricing
from $6.00 / 1,000 transcript rag chunks
Extract public YouTube captions, audit transcript coverage, score RAG readiness, and create timestamped supporting chunks without double-charging in report mode.
Developer
太郎 山田
YouTube Corpus Audit & RAG Readiness Report
Turn public YouTube captions from videos, playlists, or channels into a decision-ready RAG corpus audit. The report focuses on caption coverage, missing-caption risk, chunking quality, retrieval QA actions, and the next run needed to move from raw transcript extraction to usable AI retrieval.
Use corpus_snapshot for a compact coverage checklist, or rag_readiness when you need retrieval QA actions and prioritized fixes. The legacy chunks mode is still available for timestamped transcript rows.
Store Quickstart
Recommended first run:
{
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "language": "en",
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxVideos": 1,
  "delivery": "dataset",
  "dryRun": false
}
Input Examples
Corpus Snapshot Report
{
  "videoUrls": ["https://www.youtube.com/watch?v=dQw4w9WgXcQ"],
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxVideos": 1,
  "delivery": "dataset",
  "dryRun": false
}
RAG Readiness Report
{
  "channelUrls": ["https://www.youtube.com/@OpenAI"],
  "reportTier": "rag_readiness",
  "maxChargeUsd": 29,
  "maxVideos": 5,
  "chunkSize": 1200,
  "chunkOverlap": 150,
  "delivery": "dataset",
  "dryRun": false
}
Legacy Transcript Chunks
{
  "videoIds": ["dQw4w9WgXcQ"],
  "reportTier": "chunks",
  "language": "en",
  "delivery": "webhook",
  "webhookUrl": "https://example.com/webhook",
  "dryRun": false
}
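Before spending a run, it can help to sanity-check an input locally. The sketch below is a hypothetical helper, not part of the actor: `validate_run_input` and its rules are assumptions inferred from the examples above (the three tiers shown, a webhook needing a URL, overlap smaller than chunk size). The playlist input field is not shown in the examples, so it is left out here.

```python
from typing import Any

# Tier names taken from the examples above.
REPORT_TIERS = {"corpus_snapshot", "rag_readiness", "chunks"}
# Source fields that appear in the examples; the playlist field name is not shown.
SOURCE_KEYS = ("videoUrls", "videoIds", "channelUrls")


def validate_run_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a run input; empty means none found."""
    problems: list[str] = []

    # Assumed rule: at least one source list must be non-empty.
    if not any(run_input.get(key) for key in SOURCE_KEYS):
        problems.append("no source: set videoUrls, videoIds, or channelUrls")

    tier = run_input.get("reportTier")
    if tier not in REPORT_TIERS:
        problems.append(f"unknown reportTier: {tier!r}")

    # Assumed rule: webhook delivery needs a target URL.
    if run_input.get("delivery") == "webhook" and not run_input.get("webhookUrl"):
        problems.append("delivery is 'webhook' but webhookUrl is missing")

    # Assumed rule: the overlap must be smaller than the chunk itself.
    size = run_input.get("chunkSize")
    overlap = run_input.get("chunkOverlap")
    if size is not None and overlap is not None and overlap >= size:
        problems.append("chunkOverlap must be smaller than chunkSize")

    return problems
```

An empty list means the input passed these local checks only; the actor's own input schema remains the authority on what is accepted.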
Sample Output
{
  "meta": {
    "actorName": "youtube-channel-transcript-rag-intelligence",
    "actorTitle": "YouTube Channel Transcript RAG Intelligence",
    "fetchedAt": "2026-05-09T00:00:00.000Z",
    "totalRows": 2
  },
  "rows": [
    {
      "rowType": "corpus_audit_report",
      "reportTier": "rag_readiness",
      "status": "success",
      "chargedEvent": "rag_readiness_report",
      "chargedUsd": 29,
      "decisionSummary": "RAG readiness report: 4/5 videos have usable public captions, coverage score 80, RAG readiness 76/100, missing-caption risk medium.",
      "coverageScore": 80,
      "ragReadinessScore": 76,
      "missingCaptionRisk": "medium",
      "chunkingRisks": [
        {"severity": "low", "code": "chunking_ok", "action": "Use current chunking as a baseline for retrieval QA."}
      ],
      "retrievalQaChecklist": [
        {"check": "caption_coverage", "status": "pass", "action": "Confirm target videos have public captions before indexing."},
        {"check": "grounding", "status": "required", "action": "Run golden Q&A prompts and require timestamped source citations."}
      ],
      "actionList": [
        "Replace captionless or unavailable videos and keep their warning rows as no-charge source evidence.",
        "Build 5-10 golden retrieval questions and verify each answer cites a timestamped source chunk.",
        "Tag missing-caption videos as ingestion blockers before scheduling recurring corpus updates."
      ],
      "previewReport": {
        "nextRunInput": {
          "channelUrls": ["https://www.youtube.com/@OpenAI"],
          "reportTier": "rag_readiness",
          "maxChargeUsd": 29,
          "dryRun": false
        }
      },
      "sourceUrls": ["https://www.youtube.com/@OpenAI"]
    }
  ],
  "warnings": []
}
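A result shaped like the sample can be read with a few lines of Python. This is a minimal sketch that assumes the result has already been parsed into a dict; `report_rows` and `pending_checks` are hypothetical helper names, and the embedded `sample` is a trimmed copy of the output above.

```python
from typing import Any


def report_rows(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Return only the report rows from a parsed actor result."""
    return [
        row for row in result.get("rows", [])
        if row.get("rowType") == "corpus_audit_report"
    ]


def pending_checks(report: dict[str, Any]) -> list[str]:
    """List retrieval QA checks whose status is anything other than 'pass'."""
    return [
        item["check"]
        for item in report.get("retrievalQaChecklist", [])
        if item.get("status") != "pass"
    ]


# Trimmed copy of the sample output, written as a Python dict.
sample = {
    "rows": [
        {
            "rowType": "corpus_audit_report",
            "reportTier": "rag_readiness",
            "coverageScore": 80,
            "ragReadinessScore": 76,
            "retrievalQaChecklist": [
                {"check": "caption_coverage", "status": "pass"},
                {"check": "grounding", "status": "required"},
            ],
        }
    ],
    "warnings": [],
}

for report in report_rows(sample):
    # prints: rag_readiness 76 ['grounding']
    print(report["reportTier"], report["ragReadinessScore"], pending_checks(report))
```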
Output Fields
Report rows include:
- decisionSummary
- coverageScore
- ragReadinessScore
- missingCaptionRisk
- chunkingRisks
- retrievalQaChecklist
- actionList and prioritizedActions
- previewReport.nextRunInput
- status, chargedEvent, chargedUsd, reason
- sourceUrls, warnings, errors
Supporting transcript chunks are included as no-charge evidence rows in report mode. Legacy chunk mode keeps timestamped rag_chunk rows with video metadata, chunk text, timestamps, and source URLs.
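The page does not describe how chunks are built or what unit chunkSize and chunkOverlap use. As an illustration only, the sketch below assumes both are character counts and that captions arrive as segments with start and end times. `Segment`, `Chunk`, and `chunk_segments` are hypothetical names, not the actor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


@dataclass
class Chunk:
    start: float
    end: float
    text: str


def _emit(window: list[Segment]) -> Chunk:
    """Merge a run of segments into one chunk that keeps the outer timestamps."""
    return Chunk(window[0].start, window[-1].end, " ".join(s.text for s in window))


def chunk_segments(
    segments: list[Segment], chunk_size: int = 1200, chunk_overlap: int = 150
) -> list[Chunk]:
    """Group caption segments into overlapping chunks of roughly chunk_size characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    chunks: list[Chunk] = []
    window: list[Segment] = []
    length = 0  # characters in the window, counting one separator per segment
    for seg in segments:
        seg_len = len(seg.text) + 1
        if window and length + seg_len > chunk_size:
            chunks.append(_emit(window))
            # Carry trailing segments into the next chunk, up to chunk_overlap characters.
            kept: list[Segment] = []
            kept_len = 0
            for prev in reversed(window):
                if kept_len + len(prev.text) + 1 > chunk_overlap:
                    break
                kept.insert(0, prev)
                kept_len += len(prev.text) + 1
            window, length = kept, kept_len
        window.append(seg)
        length += seg_len
    if window:
        chunks.append(_emit(window))
    return chunks
```

Splitting on whole segments keeps every chunk aligned to real caption timestamps, at the cost of chunks that only approximate the requested size.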
Pricing and No-Charge Rules
- corpus_snapshot emits corpus_snapshot_report.
- rag_readiness emits rag_readiness_report.
- Report mode charges at most one paid event per run. Supporting chunks are no-charge evidence rows.
- dryRun, demoMode, caption failures, source failures, and maxChargeUsd limit rows are no-charge.
- The recurring watch summary is planned and proof-gated. It is not selectable in the public input schema and is not promoted until paid proof exists.
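The report-mode rules above can be checked against a finished run. The sketch below is a hypothetical client-side check, not something the actor provides: `audit_charges` is an invented name, and it assumes no-charge rows carry a chargedUsd that is missing, null, or 0. It applies to report mode only, since legacy chunks mode is priced per chunk.

```python
from typing import Any


def audit_charges(rows: list[dict[str, Any]], max_charge_usd: float) -> list[str]:
    """Check report-mode rows: at most one paid row, and total within the cap."""
    # Assumption: a row is paid only when chargedUsd is a positive number.
    paid = [row for row in rows if (row.get("chargedUsd") or 0) > 0]
    total = sum(row["chargedUsd"] for row in paid)

    problems: list[str] = []
    if len(paid) > 1:
        events = [row.get("chargedEvent") for row in paid]
        problems.append(f"expected at most one paid event, found {len(paid)}: {events}")
    if total > max_charge_usd:
        problems.append(f"charged {total} USD, above maxChargeUsd {max_charge_usd}")
    return problems


# Example: one paid report row plus one no-charge evidence row.
rows = [
    {"rowType": "corpus_audit_report", "chargedEvent": "rag_readiness_report", "chargedUsd": 29},
    {"rowType": "rag_chunk", "chargedUsd": 0},
]
print(audit_charges(rows, max_charge_usd=29))  # prints: []
```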
Compliance Guardrails
- Uses public YouTube pages and public caption tracks only.
- No account session, private video, member-only, paywalled, or login-only access is used.
- No CAPTCHA or rate-limit bypass is attempted.
- Do not position output as a replacement for rights-managed transcript licensing.
- Do not claim ranking, sales, or revenue improvements from the report.
- Do not use provider emblems or wording that implies upstream approval.