CBS 60 Minutes Transcripts Scraper

Deprecated

Collects full interview transcripts from CBS 60 Minutes. Discovers pages via the CBS News article sitemap, extracts the Q&A body, correspondent name, broadcast date, speaker labels, and topic tags. Video-only segments without a published transcript are skipped.

Pricing: Pay per event

Rating: 0.0 (0)

Developer: BowTiedRaccoon

Last modified: a day ago
CBS 60 Minutes Transcript Scraper — Interview Archive

60 Minutes transcripts typically run 5,000–30,000 words of on-the-record Q&A per segment, with consistent correspondent and speaker-label fields across the archive. This actor scrapes full interview transcripts from CBS News 60 Minutes and returns one record per segment: headline, correspondent, broadcast date, speaker-labeled body text, and topic metadata. It discovers transcript pages automatically from the CBS News article sitemap. Video-only segments without a published transcript are skipped.

What's included and what isn't

60 Minutes airs approximately 45 episodes per US broadcast season, with 3–4 segments per episode. Roughly 50–70% of segments have a published transcript — the remainder are video-only. This scraper covers transcript-bearing segments only, makes that boundary explicit in every record (is_transcript: true), and skips video-only pages entirely. The active transcript archive covers approximately 5 years back, with sparser coverage for earlier seasons.
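As a rough sanity check on how many records to expect, the figures above can be multiplied out. This is only a back-of-the-envelope sketch: the constant and function names are made up for illustration, and the inputs are the approximate numbers quoted in this section, not measured counts.

```python
# Approximate figures quoted above; names are illustrative only.
EPISODES_PER_SEASON = 45
SEGMENTS_PER_EPISODE = (3, 4)      # low and high estimate
TRANSCRIPT_SHARE_PCT = (50, 70)    # low and high estimate, in percent


def transcripts_per_season():
    """Return a (low, high) estimate of transcript-bearing segments per season."""
    low = EPISODES_PER_SEASON * SEGMENTS_PER_EPISODE[0] * TRANSCRIPT_SHARE_PCT[0] / 100
    high = EPISODES_PER_SEASON * SEGMENTS_PER_EPISODE[1] * TRANSCRIPT_SHARE_PCT[1] / 100
    return low, high


low, high = transcripts_per_season()
print(f"Roughly {low:.0f} to {high:.0f} transcripts per season")
```

Under these assumptions a season yields somewhere between about 68 and 126 transcripts, which helps when choosing a maxItems value.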

What does the CBS 60 Minutes Transcript Scraper do?

Discovery walks the CBS News sitemap index at cbsnews.com/xml-sitemap/index.xml, filtering monthly article sitemaps for two URL patterns:

/news/<slug>-60-minutes-transcript/ — primary transcript pattern
/news/read-the-full-transcript-of-<slug>/ — extended interview variant
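The two patterns above could be checked with a pair of regular expressions. The sketch below is an assumption about how such a filter might look, not the actor's actual code: the names `TRANSCRIPT_PATTERNS` and `is_transcript_url` are hypothetical, and the slug is assumed to contain only lowercase letters, digits, and hyphens.

```python
import re
from urllib.parse import urlparse

# Hypothetical regexes for the two URL patterns listed above.
# Assumption: slugs use only lowercase letters, digits, and hyphens.
TRANSCRIPT_PATTERNS = [
    re.compile(r"^/news/[a-z0-9-]+-60-minutes-transcript/?$"),
    re.compile(r"^/news/read-the-full-transcript-of-[a-z0-9-]+/?$"),
]


def is_transcript_url(url: str) -> bool:
    """Return True when the URL path matches either transcript pattern."""
    path = urlparse(url).path
    return any(pattern.match(path) for pattern in TRANSCRIPT_PATTERNS)


print(is_transcript_url(
    "https://www.cbsnews.com/news/netanyahu-us-israel-iran-60-minutes-transcript/"
))
```

Matching on the path alone keeps the check independent of the scheme, host casing, and query strings.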

Metadata is parsed from JSON-LD NewsArticle blocks on each page. Transcript body text lives in <section class="content__body"> as <p> tags, with ad wrappers stripped before extraction. Speaker labels are extracted from paragraph-leading Name: patterns in both Title Case and ALL-CAPS formats. No headless browser or proxy required.
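Speaker-label extraction of the kind described above can be approximated with a single regular expression. This is a minimal sketch under stated assumptions, not the actor's implementation: `SPEAKER_RE` and `extract_speakers` are hypothetical names, labels are assumed to be one to four capitalized words followed by a colon, and the sample paragraphs are invented placeholders.

```python
import re

# Assumption: a label is 1-4 words, each starting with a capital letter,
# at the very start of a paragraph and followed by a colon and whitespace.
# This covers both Title Case ("Jane Doe:") and ALL-CAPS ("JOHN SMITH:").
SPEAKER_RE = re.compile(r"^([A-Z][A-Za-z.'-]*(?: [A-Z][A-Za-z.'-]*){0,3}):\s")


def extract_speakers(paragraphs):
    """Return unique speaker labels in order of first appearance."""
    speakers = []
    for paragraph in paragraphs:
        match = SPEAKER_RE.match(paragraph.strip())
        if match and match.group(1) not in speakers:
            speakers.append(match.group(1))
    return speakers


# Placeholder paragraphs, not taken from a real transcript.
sample = [
    "Jane Doe: Why did you decide to speak now?",
    "JOHN SMITH: Because the report had been published.",
    "He paused before continuing.",
    "Jane Doe: And what happened next?",
]
print(", ".join(extract_speakers(sample)))
```

A pattern this loose will also pick up non-speaker prefixes such as "Note:", so a real pipeline would likely need a stop-list or a minimum-occurrence threshold.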

What data does it extract?

Field	Type	Description
`story_slug`	string	URL slug of the transcript page
`story_title`	string	Article headline
`story_url`	string	Canonical CBS News URL
`aired_date`	string	Broadcast date (YYYY-MM-DD)
`published_date`	string	CBS News publish timestamp (ISO 8601)
`segment_type`	string	Inferred type: `interview`, `investigation`, or `profile`
`correspondent`	string	CBS News correspondent (e.g. Major Garrett, Lesley Stahl)
`subjects`	string	Interviewed subjects extracted from speaker labels (comma-separated)
`synopsis`	string	Article meta description
`body_html`	string	Full transcript HTML preserving Q&A paragraph structure
`body_text`	string	Plain-text version of the transcript
`speakers`	string	All speaker labels found in the transcript (comma-separated)
`is_transcript`	boolean	Always `true` — non-transcripts are skipped
`has_video_only_variant`	boolean	True when a paired video-only story exists
`related_story_urls`	string	Related CBS News links on the page (comma-separated)
`topics`	string	CBS News topic tags (comma-separated)
`canonical_url`	string	Canonical URL from page head
`source`	string	Fixed: `cbsnews.com/60-minutes`
`scraped_at`	datetime	ISO 8601 scrape timestamp
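Several fields in the table above (`subjects`, `speakers`, `related_story_urls`, `topics`) are comma-separated strings, so downstream code will usually want to split them into lists. The helper below is a small sketch; `split_list_field` is a hypothetical name, the record is an invented placeholder, and the separator is assumed to be a plain comma with optional surrounding whitespace.

```python
def split_list_field(value):
    """Split a comma-separated field into a list, dropping empty entries."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


# Placeholder record using field names from the table above.
record = {
    "speakers": "Jane Doe, John Smith",
    "topics": "",
}

speakers = split_list_field(record["speakers"])
topics = split_list_field(record["topics"])
print(speakers, topics)
```

Values that themselves contain a comma would be split incorrectly by this approach, so treat the result as a best-effort list.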

How to use it

{
  "maxItems": 1,
  "startUrls": [
    {"url": "https://www.cbsnews.com/news/netanyahu-us-israel-iran-60-minutes-transcript/"}
  ]
}

Scrapes a specific episode transcript.

{
  "maxItems": 200,
  "startDate": "2025-01"
}

Scrapes all 60 Minutes transcripts published from January 2025 onward (up to 200 records).

{ "maxItems": 1000 }

Full archive crawl: returns every available transcript in the active archive, up to the 1,000-record cap.

Field	Type	Description
`maxItems`	integer	Required. Maximum transcript records to scrape
`startDate`	string	Optional. Limit discovery to sitemaps from this month onward (YYYY-MM format)
`startUrls`	array	Optional. Direct CBS News transcript URLs — skips sitemap discovery when provided
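When inputs are generated programmatically, it can help to validate them against the rules in the table above before starting a run. The following is an illustrative sketch only: `build_input` is a hypothetical helper, and the checks simply mirror the table (a required positive `maxItems`, an optional `startDate` in YYYY-MM format, and optional `startUrls` wrapped as `{"url": ...}` objects, as in the first example).

```python
import re


def build_input(max_items, start_date=None, start_urls=None):
    """Assemble an input dict in the shape shown in the examples above."""
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    run_input = {"maxItems": max_items}
    if start_date is not None:
        # YYYY-MM with a month between 01 and 12.
        if not re.fullmatch(r"\d{4}-(0[1-9]|1[0-2])", start_date):
            raise ValueError("startDate must use YYYY-MM format")
        run_input["startDate"] = start_date
    if start_urls:
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    return run_input


print(build_input(200, start_date="2025-01"))
```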

Pricing

Charged per transcript record scraped. A 200-transcript run at the 1.2x coefficient on the default_2603_basic profile costs approximately $0.35 ($0.10 start + $0.00125 per record × 200 records).
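The arithmetic in the pricing example can be reproduced for other run sizes. This sketch assumes the two quoted figures ($0.10 start fee and $0.00125 per record) already include the 1.2x coefficient, since the worked example above does not apply it separately; the names are illustrative.

```python
# Figures quoted in the pricing example above.
# Assumption: they already reflect the 1.2x coefficient.
START_FEE_USD = 0.10
PER_RECORD_FEE_USD = 0.00125


def estimate_cost(records: int) -> float:
    """Estimate the cost in USD of a run returning `records` transcripts."""
    return round(START_FEE_USD + PER_RECORD_FEE_USD * records, 4)


for count in (1, 200, 1000):
    print(count, estimate_cost(count))
```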

Use cases

Media and political research — build a structured corpus of 60 Minutes interviews with heads of state, CEOs, and scientists spanning multiple years
NLP corpora — long-form Q&A transcripts with consistent speaker labeling are well-suited for dialog modeling, summarization, or entity extraction
Journalism datasets — index correspondent names, broadcast dates, and topics across the archive to analyze coverage patterns or research specific subjects
RAG pipelines — ingest high-quality, on-the-record interview text as a retrieval source for investigative journalism or policy research applications
Academic research — track how specific topics (foreign policy, corporate governance, public health) are covered on network news over time

FAQ

Why is coverage 50–70% rather than 100%?

CBS News publishes transcripts for most but not all 60 Minutes segments. Some segments are video-only by editorial choice, particularly shorter news-break items and some documentary segments. The is_transcript field and the URL-pattern filter ensure only genuine transcript pages are returned.

Can I scrape by correspondent?

The input does not have a correspondent filter, but every record returns the correspondent field. Fetch the relevant date range and filter downstream by correspondent name.
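Downstream filtering can be as simple as a list comprehension over the exported records. In this sketch `filter_by_correspondent` is a hypothetical helper, the records are invented placeholders, and matching is assumed to be an exact, case-insensitive comparison on the `correspondent` field.

```python
def filter_by_correspondent(records, name):
    """Keep records whose correspondent matches `name`, ignoring case."""
    wanted = name.casefold()
    return [
        record for record in records
        if (record.get("correspondent") or "").casefold() == wanted
    ]


# Placeholder records; slugs are invented for illustration.
records = [
    {"story_slug": "example-a", "correspondent": "Lesley Stahl"},
    {"story_slug": "example-b", "correspondent": "Major Garrett"},
    {"story_slug": "example-c", "correspondent": None},
]

matches = filter_by_correspondent(records, "lesley stahl")
print([record["story_slug"] for record in matches])
```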

Results export as JSON, CSV, or Excel from the Apify dataset view.
