CBS 60 Minutes Transcripts Scraper

Pricing: Pay per event

Collects full interview transcripts from CBS 60 Minutes. Discovers pages via the CBS News article sitemap, extracts the Q&A body, correspondent name, broadcast date, speaker labels, and topic tags. Video-only segments without a published transcript are skipped.


Developer: BowTiedRaccoon (maintained by Community)


Scrapes full Q&A interview transcripts from CBS News 60 Minutes, a long-running US investigative news magazine. Returns one record per transcript page: title, correspondent, broadcast date, subject list, speaker-labeled body text, and topic metadata. Transcript pages are discovered automatically from the CBS News article sitemap; video-only segments without a published transcript are skipped.

60 Minutes is one of the most-watched US news magazines, known for long-form sit-down interviews with heads of state, CEOs, whistleblowers, and scientists. Each transcript runs 5,000-30,000 words of on-the-record Q&A, which makes it useful source material for media research, RAG pipelines, and investigative journalism datasets.

What It Scrapes

Targets two URL patterns on cbsnews.com:

  • /news/<slug>-60-minutes-transcript/ — primary transcript pattern
  • /news/read-the-full-transcript-of-<slug>/ — extended interview variant

Discovery walks the CBS News monthly article sitemaps, filters by these patterns, and scrapes each matching page. Video-only stories (e.g. /news/<slug>-60-minutes/) are explicitly excluded.
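The filtering step described above can be sketched in Python. This is a minimal illustration, not the actor's actual code: the regular expressions are assumptions derived from the two URL patterns listed, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse

# Assumed regexes built from the two documented URL patterns.
# The slug character set ([a-z0-9-]) is a guess, not confirmed by the source.
TRANSCRIPT_PATTERNS = (
    re.compile(r"^/news/[a-z0-9-]+-60-minutes-transcript/?$"),
    re.compile(r"^/news/read-the-full-transcript-of-[a-z0-9-]+/?$"),
)


def is_transcript_url(url: str) -> bool:
    """Return True when the URL path matches one of the transcript patterns."""
    parsed = urlparse(url)
    if not parsed.netloc.endswith("cbsnews.com"):
        return False
    return any(p.match(parsed.path) for p in TRANSCRIPT_PATTERNS)
```

Because the primary pattern requires the -60-minutes-transcript suffix, a video-only path such as /news/<slug>-60-minutes/ does not match either expression and is dropped without a separate exclusion rule.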

Output Schema

| Field | Type | Description |
| --- | --- | --- |
| story_slug | string | URL slug of the transcript page |
| story_title | string | Article headline |
| story_url | string | Canonical CBS News URL |
| aired_date | string | Broadcast date (YYYY-MM-DD) |
| published_date | string | CBS News publish timestamp (ISO 8601) |
| segment_type | string | Inferred type: interview, investigation, or profile |
| correspondent | string | CBS News correspondent (e.g. Major Garrett, Lesley Stahl) |
| subjects | string | Interviewed subjects extracted from speaker labels (comma-separated) |
| synopsis | string | Article dek / meta description |
| body_html | string | Full transcript HTML preserving Q&A paragraph structure |
| body_text | string | Plain-text version of the transcript |
| speakers | string | All speaker labels found in the transcript (comma-separated) |
| is_transcript | boolean | Always true; non-transcripts are skipped |
| has_video_only_variant | boolean | True when a paired video-only story exists |
| related_story_urls | string | Related CBS News links on the page (comma-separated) |
| topics | string | CBS News topic tags (comma-separated) |
| canonical_url | string | Canonical URL from page head |
| source | string | Fixed: cbsnews.com/60-minutes |
| scraped_at | datetime | ISO 8601 scrape timestamp |

Speaker labels follow two CBS conventions: Major Garrett: (Title Case) and MAJOR GARRETT: (ALL-CAPS, used in the extended-interview variant). Both formats are normalized and extracted.
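One way to handle both label conventions is sketched below. The regular expression, the limit of four words per label, and the function names are assumptions made for illustration; the actor's real normalization rules are not documented here.

```python
import re
from typing import List

# Assumed label shape: one to four capitalised (or ALL-CAPS) words followed
# by a colon at the start of a paragraph.
SPEAKER_RE = re.compile(
    r"^\s*([A-Z][A-Za-z.'\-]*(?:\s+[A-Z][A-Za-z.'\-]*){0,3}):\s"
)


def normalize_label(label: str) -> str:
    """Convert ALL-CAPS labels to Title Case; leave mixed-case labels alone."""
    label = " ".join(label.split())
    return label.title() if label.isupper() else label


def extract_speakers(paragraphs: List[str]) -> List[str]:
    """Return unique normalized speaker labels in order of first appearance."""
    seen = []
    for text in paragraphs:
        match = SPEAKER_RE.match(text)
        if match:
            name = normalize_label(match.group(1))
            if name not in seen:
                seen.append(name)
    return seen
```

A pattern this loose will also pick up non-speaker openers such as "Note: ...", so a production version would likely need an allow-list or a minimum repeat count per label.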

Input Options

maxItems (integer, required) — Maximum number of transcript records to scrape. Set a higher value for bulk runs.

startDate (string, optional) — Limit sitemap discovery to a given month onwards (YYYY-MM format, e.g. "2024-01"). Defaults to all available months when omitted.

startUrls (array, optional) — One or more direct CBS News transcript URLs. When provided, sitemap discovery is skipped and only the supplied URLs are scraped. Useful for targeted re-runs of specific episodes.

Example: Specific episode

{
  "maxItems": 1,
  "startUrls": [
    {"url": "https://www.cbsnews.com/news/netanyahu-us-israel-iran-60-minutes-transcript/"}
  ]
}

Example: Transcripts from January 2025 onwards

{
  "maxItems": 200,
  "startDate": "2025-01"
}

Example: Full archive (all available transcripts)

{
  "maxItems": 1000
}

How It Works

Discovery uses the CBS News sitemap index at cbsnews.com/xml-sitemap/index.xml. Monthly article sitemaps (article-YYYY-MM.xml) are walked in order, newest first. Each sitemap lists 3,000+ news articles; only URLs matching the transcript patterns are fetched.
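The newest-first walk, combined with the startDate cutoff, could look like the sketch below. Only the article-YYYY-MM.xml filename pattern comes from the description above; the function name and its explicit end-month argument are hypothetical, and the full sitemap URLs would in practice be read from the sitemap index rather than constructed.

```python
from typing import List


def monthly_sitemap_names(start_month: str, end_month: str) -> List[str]:
    """List article-YYYY-MM.xml filenames from end_month back to start_month.

    Both arguments use the YYYY-MM format accepted by the startDate input.
    """
    start_year, start_mon = (int(part) for part in start_month.split("-"))
    year, mon = (int(part) for part in end_month.split("-"))

    names = []
    # Tuple comparison orders months chronologically: (2024, 12) < (2025, 1).
    while (year, mon) >= (start_year, start_mon):
        names.append(f"article-{year:04d}-{mon:02d}.xml")
        mon -= 1
        if mon == 0:  # roll back from January to the previous December
            year, mon = year - 1, 12
    return names
```

For example, a start of "2024-11" and an end of "2025-02" yields four filenames, beginning with article-2025-02.xml and ending with article-2024-11.xml.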

Metadata is parsed from JSON-LD NewsArticle blocks present on every CBS article page — giving reliable correspondent name, publish date, and keywords. The transcript body lives in <section class="content__body"> as a sequence of <p> tags. Speaker labels are extracted from paragraph-leading Name: patterns. Ad wrappers are stripped before body extraction.
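The JSON-LD step can be illustrated with the standard library alone. This is a sketch under assumptions: the sample HTML is invented, the helper names are hypothetical, and the mapping from the schema.org author property to the correspondent field is a guess about how the actor reads the block.

```python
import json
import re
from typing import List, Optional

LD_JSON_RE = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

# Invented sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "NewsArticle", "headline": "Example headline",
 "datePublished": "2025-01-05T19:00:00-0500",
 "author": [{"@type": "Person", "name": "Jane Doe"}],
 "keywords": ["60 Minutes", "Example Topic"]}
</script>
</head><body></body></html>
"""


def find_news_article(html: str) -> Optional[dict]:
    """Return the first JSON-LD object whose @type is NewsArticle, if any."""
    for raw in LD_JSON_RE.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "NewsArticle":
                return item
    return None


def author_names(article: dict) -> List[str]:
    """Collect author names; JSON-LD allows a single object or a list."""
    authors = article.get("author", [])
    if isinstance(authors, dict):
        authors = [authors]
    return [a["name"] for a in authors if isinstance(a, dict) and a.get("name")]
```

Calling find_news_article(SAMPLE_HTML) returns the parsed dictionary, from which headline, datePublished, and keywords can be read directly.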

CBS News pages are server-rendered and served through a Varnish edge cache, and no bot protection was observed. The scraper needs neither a proxy nor a headless browser.

Coverage Notes

60 Minutes airs approximately 45 episodes per US broadcast season, with 3-4 segments per episode. Roughly 50-70% of segments receive a published transcript — the remainder are video-only. This scraper covers transcript-bearing segments only and makes that boundary explicit in every record (is_transcript: true, video-only pages are skipped). The active transcript archive covers approximately 5 years back, with sparser coverage for earlier seasons.
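The figures above imply a rough transcript count per season, which can help when choosing maxItems. The calculation below is only a back-of-the-envelope sketch using the approximate numbers quoted in this section; the constant and function names are hypothetical.

```python
# Approximate figures quoted in the coverage notes.
EPISODES_PER_SEASON = 45
SEGMENTS_PER_EPISODE = (3, 4)        # low and high estimate
TRANSCRIPT_SHARE_PERCENT = (50, 70)  # low and high estimate


def transcripts_per_season():
    """Return (low, high) estimates of published transcripts per season."""
    low = EPISODES_PER_SEASON * SEGMENTS_PER_EPISODE[0] * TRANSCRIPT_SHARE_PERCENT[0] / 100
    high = EPISODES_PER_SEASON * SEGMENTS_PER_EPISODE[1] * TRANSCRIPT_SHARE_PERCENT[1] / 100
    return low, high


low, high = transcripts_per_season()
print(f"Estimated transcripts per season: {low:.0f} to {high:.0f}")
```

Under these assumptions a season yields somewhere between about 68 and 126 transcripts, so five seasons would come to roughly 340 to 630 records.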

Pricing

Charged per transcript record scraped. Long-form interviews (5,000-30,000 words each) are priced at a modest premium reflecting their per-record research value versus wire-copy or short-form corpora.