Pricing

From $6.00 / 1,000 transcript RAG chunks

YouTube & Transcript Corpus Audit for RAG

Audit authorized transcript text for RAG readiness, produce retrieval chunks and source-linked reports, and optionally try public YouTube captions without login or cookies.

Rating

0.0

(0)

Developer

naoki anzai

Actor stats

Last modified: 8 days ago

Works best after

YouTube Channel Transcript RAG Intelligence is easiest to adopt after one of these related Actors has already produced public rows or source context:

Youtube Transcript Bulk Api - use its public rows or source context as the starting input.
Youtube Channel Analytics - use its public rows or source context as the starting input.

Start with $9 / corpus_snapshot_report; upgrade to $29 / rag_readiness_report only when the first report needs deeper action detail. The internal links above aid discovery only, and any qualified forecast still requires accounted paid usage.

Run the next report

Turn this Actor's output into a capped paid report with Website RAG Readiness Audit Report. Use it when AI builders, documentation teams, support teams, and technical marketers need to decide whether public website pages are clean and complete enough for RAG ingestion.

First report: $9 / website_rag_snapshot_report; set maxChargeUsd to $9.
Deeper report: $29 / website_rag_readiness_report; use only when the first result needs competitor-level or action-level depth.
This is an internal Apify flow aid. It is not revenue proof until accounted paid usage appears.

Proof-focused buyer summary

Built for AI builders, content teams, and knowledge-base owners who need to decide whether a YouTube corpus is clean enough for RAG before building embeddings or a chatbot.

Buy this when: you want to avoid building embeddings on a corpus that will fail retrieval quality checks later.
Entry: $9 / corpus_snapshot_report - checks transcript availability, coverage, and basic corpus risk.
Premium: $29 / rag_readiness_report - adds RAG readiness, chunking risks, retrieval QA actions, and missing-content priorities.
Output promise: decision summary, score, three prioritized actions, source URLs, warnings, chargedEvent, chargedUsd, and previewReport.nextRunInput.
Safety: keep maxChargeUsd equal to the tier price. Demo, dry run, blocked/private sources, failed sources, and cap-limited runs are no-charge.
Not promised: rankings, revenue, conversion lift, sales lift, legal/procurement/financial advice, or private-source enrichment.

Entry first-run input:

{
  "demoMode": false,
  "dryRun": false,
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxReports": 1,
  "transcriptTitle": "Product onboarding transcript",
  "transcriptSourceUrl": "https://example.com/onboarding-video",
  "transcriptText": "Paste transcript text that you own or are authorized to process here. Include enough content to evaluate chunking, missing context, and retrieval readiness."
}

Premium upgrade input:

{
  "demoMode": false,
  "dryRun": false,
  "reportTier": "rag_readiness",
  "maxChargeUsd": 29,
  "maxReports": 1,
  "transcriptTitle": "Support knowledge transcript corpus",
  "transcriptSourceUrl": "https://example.com/support-video",
  "transcriptText": "Paste the authorized transcript corpus here. The premium report adds chunking risks, retrieval QA actions, and missing-content priorities."
}
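The two inputs above differ only in the tier, the cap, and the transcript fields, so they can be generated from one place. The following is a minimal sketch, not part of the Actor: `build_report_input` and `TIER_PRICE_USD` are hypothetical names, and the prices are copied from this listing (the Apify Store listing remains the source of truth).

```python
import json

# Assumption: tier prices as quoted in this listing; check the Store listing before relying on them.
TIER_PRICE_USD = {"corpus_snapshot": 9, "rag_readiness": 29}


def build_report_input(tier, title, source_url, text, dry_run=False):
    """Hypothetical helper: build a run input whose maxChargeUsd equals the tier price."""
    if tier not in TIER_PRICE_USD:
        raise ValueError(f"unknown report tier: {tier!r}")
    if not text.strip():
        raise ValueError("transcriptText must not be empty")
    return {
        "demoMode": False,
        "dryRun": dry_run,
        "reportTier": tier,
        "maxChargeUsd": TIER_PRICE_USD[tier],  # cap kept equal to the tier price
        "maxReports": 1,
        "transcriptTitle": title,
        "transcriptSourceUrl": source_url,
        "transcriptText": text,
    }


entry_input = build_report_input(
    "corpus_snapshot",
    "Product onboarding transcript",
    "https://example.com/onboarding-video",
    "Paste transcript text that you own or are authorized to process here.",
)
print(json.dumps(entry_input, indent=2))
```

Deriving the cap from the tier keeps the two values from drifting apart when a saved input is later switched from the entry tier to the premium tier.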

Next report-style Actors

If you already have data from this Actor, these follow-on Actors turn public or user-provided inputs into decision-ready reports. They are optional, capped by maxChargeUsd, and do not make business outcome claims.

Website RAG Readiness Audit Report - audit public web pages before mixing them with transcript corpora in a RAG system.

Turn authorized transcript text into a decision-ready RAG corpus audit. Public YouTube caption fetching remains available as a best-effort source, but user-provided text is recommended because YouTube can suppress caption tracks for automated cloud requests.

Use corpus_snapshot for a compact coverage checklist, or rag_readiness when you need retrieval QA actions and prioritized fixes. The legacy chunks mode is still available for timestamped transcript rows.

First-run quality and billing cue

Start with demoMode or dryRun when you only want to inspect the output shape. For paid runs, use the entry report first, set maxChargeUsd as a hard cap, and move to premium only when you need comparison, recommendations, or a decision-ready action list. Pricing is controlled by the Apify Store listing, and maxChargeUsd should match the tier the buyer expects to run.

Store Quickstart

Recommended first run:

{
  "videoUrls": [],
  "transcriptTitle": "Product onboarding transcript",
  "transcriptSourceUrl": "https://example.com/onboarding-video",
  "transcriptText": "Paste transcript text that you own or are authorized to process here. Include the important product concepts, procedures, and limitations that retrieval users will ask about.",
  "language": "en",
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "delivery": "dataset",
  "dryRun": false
}

Input Examples

Corpus Snapshot Report

{
  "videoUrls": [
    "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
  ],
  "reportTier": "corpus_snapshot",
  "maxChargeUsd": 9,
  "maxVideos": 1,
  "delivery": "dataset",
  "dryRun": false
}

RAG Readiness Report

{
  "channelUrls": [
    "https://www.youtube.com/@OpenAI"
  ],
  "reportTier": "rag_readiness",
  "maxChargeUsd": 29,
  "maxVideos": 5,
  "chunkSize": 1200,
  "chunkOverlap": 150,
  "delivery": "dataset",
  "dryRun": false
}
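The chunkSize and chunkOverlap values above describe overlapping windows over the transcript. This listing does not say how the Actor measures or splits chunks, so the sketch below assumes plain character counts and a fixed sliding window; `chunk_text` is a hypothetical helper for reasoning about the settings, not the Actor's implementation.

```python
def chunk_text(text, chunk_size=1200, chunk_overlap=150):
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({"start": start, "end": end, "text": text[start:end]})
        if end == len(text):
            break  # the final window reached the end of the text
        start += step
    return chunks


sample = "".join(str(i % 10) for i in range(3000))
for chunk in chunk_text(sample):
    print(chunk["start"], chunk["end"], len(chunk["text"]))
```

Under these assumptions, a larger overlap reduces the chance that an answer is split across a chunk boundary, at the cost of more chunks to embed.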

Legacy Transcript Chunks

{
  "videoIds": [
    "dQw4w9WgXcQ"
  ],
  "reportTier": "chunks",
  "language": "en",
  "delivery": "webhook",
  "webhookUrl": "https://example.com/webhook",
  "dryRun": false
}

Sample Output

{
  "meta": {
    "actorName": "youtube-channel-transcript-rag-intelligence",
    "actorTitle": "YouTube Channel Transcript RAG Intelligence",
    "fetchedAt": "2026-05-09T00:00:00.000Z",
    "totalRows": 2
  },
  "rows": [
    {
      "rowType": "corpus_audit_report",
      "reportTier": "rag_readiness",
      "status": "success",
      "chargedEvent": "rag_readiness_report",
      "chargedUsd": 29,
      "decisionSummary": "RAG readiness report: 4/5 sources have usable transcript text, coverage score 80, RAG readiness 76/100, missing-transcript risk medium.",
      "coverageScore": 80,
      "ragReadinessScore": 76,
      "missingCaptionRisk": "medium",
      "chunkingRisks": [
        {
          "severity": "low",
          "code": "chunking_ok",
          "action": "Use current chunking as a baseline for retrieval QA."
        }
      ],
      "retrievalQaChecklist": [
        {
          "check": "transcript_coverage",
          "status": "pass",
          "action": "Confirm each source has authorized, usable transcript text before indexing."
        },
        {
          "check": "grounding",
          "status": "required",
          "action": "Run golden Q&A prompts and require timestamped source citations."
        }
      ],
      "actionList": [
        "Replace captionless or unavailable videos and keep their warning rows as no-charge source evidence.",
        "Build 5-10 golden retrieval questions and verify each answer cites a timestamped source chunk.",
        "Tag missing-caption videos as ingestion blockers before scheduling recurring corpus updates."
      ],
      "previewReport": {
        "nextRunInput": {
          "channelUrls": ["https://www.youtube.com/@OpenAI"],
          "reportTier": "rag_readiness",
          "maxChargeUsd": 29,
          "dryRun": false
        }
      },
      "sourceUrls": ["https://www.youtube.com/@OpenAI"]
    }
  ],
  "warnings": []
}
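A consumer of this output usually needs only a few headline fields. The sketch below shows one way to pull them from a payload shaped like the sample above; `summarize_report` is a hypothetical helper, and it assumes that report rows carry rowType "corpus_audit_report" as in the sample.

```python
def summarize_report(output):
    """Return headline fields from the first successful report row, or None if there is none."""
    for row in output.get("rows", []):
        if row.get("rowType") != "corpus_audit_report":
            continue  # skip supporting evidence rows
        if row.get("status") != "success":
            continue
        checklist = row.get("retrievalQaChecklist", [])
        return {
            "tier": row.get("reportTier"),
            "coverageScore": row.get("coverageScore"),
            "ragReadinessScore": row.get("ragReadinessScore"),
            # Checks whose status is anything other than "pass" still need attention.
            "openChecks": [c.get("check") for c in checklist if c.get("status") != "pass"],
            "topActions": row.get("actionList", [])[:3],
            "nextRunInput": row.get("previewReport", {}).get("nextRunInput"),
        }
    return None


example_output = {
    "rows": [
        {
            "rowType": "corpus_audit_report",
            "reportTier": "rag_readiness",
            "status": "success",
            "coverageScore": 80,
            "ragReadinessScore": 76,
            "retrievalQaChecklist": [
                {"check": "transcript_coverage", "status": "pass"},
                {"check": "grounding", "status": "required"},
            ],
            "actionList": ["Build 5-10 golden retrieval questions."],
            "previewReport": {"nextRunInput": {"reportTier": "rag_readiness", "maxChargeUsd": 29}},
        }
    ],
    "warnings": [],
}
print(summarize_report(example_output))
```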

Output Fields

Report rows include:

decisionSummary
coverageScore
ragReadinessScore
missingCaptionRisk
chunkingRisks
retrievalQaChecklist
actionList and prioritizedActions
previewReport.nextRunInput
status, chargedEvent, chargedUsd, reason
sourceUrls, warnings, errors

Supporting transcript chunks are included as no-charge evidence rows in report mode. Legacy chunk mode keeps timestamped rag_chunk rows with video metadata, chunk text, timestamps, and source URLs.

Pricing and No-Charge Rules

corpus_snapshot emits corpus_snapshot_report.
rag_readiness emits rag_readiness_report.
Report mode charges at most one paid event per run. Supporting chunks are no-charge evidence rows.
dryRun, demoMode, caption failures, source failures, and rows stopped by the maxChargeUsd limit are no-charge.
For recurring checks, save a successful report input as an Apify task and keep maxChargeUsd aligned with the selected tier.
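The rules above can be checked against a finished run's rows. This is a minimal sketch under stated assumptions: `audit_charges` is a hypothetical helper, the event prices are copied from this listing, and no-charge rows are assumed to have chargedUsd missing or set to 0 (the listing does not specify how they are represented).

```python
# Assumption: event names and prices as quoted in this listing.
PAID_EVENT_PRICE_USD = {"corpus_snapshot_report": 9, "rag_readiness_report": 29}


def audit_charges(rows, max_charge_usd):
    """Check that a run charged at most one paid event and stayed within the cap."""
    paid = [
        r for r in rows
        if r.get("chargedEvent") in PAID_EVENT_PRICE_USD and (r.get("chargedUsd") or 0) > 0
    ]
    total = sum(r["chargedUsd"] for r in paid)

    problems = []
    if len(paid) > 1:
        problems.append(f"expected at most one paid event, found {len(paid)}")
    if total > max_charge_usd:
        problems.append(f"charged {total} USD, above the cap of {max_charge_usd} USD")
    for r in paid:
        expected = PAID_EVENT_PRICE_USD[r["chargedEvent"]]
        if r["chargedUsd"] != expected:
            problems.append(f"{r['chargedEvent']} charged {r['chargedUsd']} USD, expected {expected} USD")
    return {"paidEvents": len(paid), "totalUsd": total, "problems": problems}


run_rows = [
    {"rowType": "corpus_audit_report", "chargedEvent": "rag_readiness_report", "chargedUsd": 29},
    {"rowType": "rag_chunk"},  # supporting evidence row, assumed to carry no charge
]
print(audit_charges(run_rows, max_charge_usd=29))
```

Running a check like this on each scheduled task run is one way to notice a mismatch between the saved maxChargeUsd and the selected tier.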

Compliance Guardrails

Uses public YouTube pages and public caption tracks only.
No account session, private video, member-only, paywalled, or login-only access is used.
No CAPTCHA or rate-limit bypass is attempted.
Do not position output as a replacement for rights-managed transcript licensing.
Do not claim ranking, sales, or revenue improvements from the report.
Do not use provider emblems or wording that implies upstream approval.

See Also

Youtube Transcript Scraper

Youtube Transcript Scraper

YouTube Transcript to RAG Dataset

Youtube Transcript API

YouTube Transcript Scraper

Website RAG Readiness Audit Report

YouTube Transcript Scraper

YouTube Transcript API - RAG Chapters, Summary & Chunks

YouTube Transcript API + Summary & RAG

Youtube Transcript - 1$/month
