CNN Transcripts Scraper

Pricing

Pay per event

Scrape broadcast transcripts from transcripts.cnn.com. Extracts full segment text, speaker labels, show metadata, and airtime info for any CNN show and date range.

Developer: BowTiedRaccoon (Maintained by Community)

Scrape broadcast transcripts from CNN's public archive at transcripts.cnn.com. The archive covers CNN shows back to approximately 2000, and the scraper returns one record per segment with extracted speaker labels and structured metadata.

What you get

Each output record represents one broadcast segment:

| Field | Description |
| --- | --- |
| show_slug | CNN show identifier (e.g. cnr, fzgps, sotu) |
| show_title | Full show name (e.g. "CNN Newsroom") |
| aired_date | Broadcast date (YYYY-MM-DD) |
| segment_number | Index within the show-date (1, 2, 3 …) |
| segment_title | Segment headline and topic summary |
| segment_url | Canonical URL on transcripts.cnn.com |
| body_html | Full transcript HTML (preserves timestamps, paragraph breaks) |
| body_text | Plain-text version with speaker labels and newlines preserved |
| speakers | Comma-separated list of detected speaker names |
| aired_at_local | Broadcast time in ET (e.g. 02:00 ET) |
| source | Always transcripts.cnn.com |
| scraped_at | ISO timestamp of when the record was scraped |
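
As a rough illustration of consuming these records, the sketch below turns one raw record (a plain dict) into a typed object. The `Segment` class and `parse_segment` helper are hypothetical names, not part of the actor, and the handling of a missing or empty `speakers` value is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Segment:
    """One broadcast segment, using a subset of the documented fields."""
    show_slug: str
    aired_date: date
    segment_number: int
    segment_title: str
    body_text: str
    speakers: list[str]


def parse_segment(record: dict) -> Segment:
    """Convert one raw output record (a dict) into a typed Segment."""
    # Assumption: speakers is a comma-separated string; empty or missing yields [].
    raw_speakers = record.get("speakers") or ""
    speakers = [name.strip() for name in raw_speakers.split(",") if name.strip()]
    return Segment(
        show_slug=record["show_slug"],
        aired_date=date.fromisoformat(record["aired_date"]),
        segment_number=int(record["segment_number"]),
        segment_title=record.get("segment_title", ""),
        body_text=record.get("body_text", ""),
        speakers=speakers,
    )
```

Splitting `speakers` into a list up front makes it easier to filter segments by speaker later without re-parsing the string each time.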

Usage

Basic — scrape all shows for a single day

{
  "startDate": "2026-05-08",
  "maxItems": 50
}

Date range with show filter

{
  "startDate": "2026-05-01",
  "endDate": "2026-05-08",
  "showSlugs": ["cnr", "fzgps", "sotu"],
  "maxItems": 500
}

Input fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| startDate | string | Yes | Start date YYYY-MM-DD |
| endDate | string | No | End date YYYY-MM-DD (defaults to startDate) |
| showSlugs | string[] | No | Filter to specific shows (e.g. ["cnr", "fzgps"]). Leave empty for all shows. |
| maxItems | integer | No | Cap on total segments returned. 0 = no limit. Default: 0. |
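
The documented defaults can be mirrored in a small client-side helper that builds the input object before a run. This is only a sketch: `build_input` is a hypothetical helper, and rejecting an end date earlier than the start date or a negative `maxItems` is an assumption about sensible input, not documented actor behaviour.

```python
from datetime import date
from typing import Optional


def build_input(start_date: str, end_date: Optional[str] = None,
                show_slugs: Optional[list[str]] = None, max_items: int = 0) -> dict:
    """Build an input object following the documented fields and defaults."""
    start = date.fromisoformat(start_date)  # raises ValueError on an unparseable date
    end = date.fromisoformat(end_date) if end_date else start  # endDate defaults to startDate
    if end < start:
        # Assumption: a reversed range is treated as a caller error.
        raise ValueError("endDate must not be earlier than startDate")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")
    run_input = {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "maxItems": max_items,
    }
    if show_slugs:
        # Omit the key to request all shows, as the basic example does.
        run_input["showSlugs"] = list(show_slugs)
    return run_input
```

For instance, `build_input("2026-05-01", "2026-05-08", ["cnr", "fzgps", "sotu"], 500)` produces the same fields as the date-range example in the Usage section.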

Common show slugs

| Slug | Show |
| --- | --- |
| cnr | CNN Newsroom |
| fzgps | Fareed Zakaria GPS |
| sotu | State of the Union |
| acd | Anderson Cooper 360 |
| ebo | Erin Burnett OutFront |
| cg | The Lead with Jake Tapper |
| sitroom | The Situation Room |
| ip | Inside Politics |
| ctmo | CNN This Morning |
| ampr | Amanpour |

Dataset size

  • ~30 active CNN shows, 1–22 segments per show per day
  • ~30–50 new segments published daily
  • Archive goes back to approximately 2000
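
The daily figure above gives a quick way to choose a `maxItems` value for an unfiltered run. The sketch below simply multiplies the number of days by the quoted ~30–50 range; `estimate_segments` is a hypothetical helper, and it assumes the date range is inclusive and that the current daily volume also applies to the dates requested, which may not hold for older parts of the archive.

```python
from datetime import date

SEGMENTS_PER_DAY = (30, 50)  # rough daily range quoted above, across all shows


def estimate_segments(start_date: str, end_date: str) -> tuple[int, int]:
    """Return a (low, high) estimate of segments for an inclusive date range."""
    days = (date.fromisoformat(end_date) - date.fromisoformat(start_date)).days + 1
    if days < 1:
        raise ValueError("end_date must not be earlier than start_date")
    low, high = SEGMENTS_PER_DAY
    return days * low, days * high
```

For the eight days from 2026-05-01 to 2026-05-08, this gives an estimate of 240 to 400 segments across all shows.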

Notes on speaker extraction

The speakers field parses ALL-CAPS labels preceding a colon (e.g. ANDERSON COOPER:, TRUMP:) using a regex pass on the plain-text body. It covers named hosts and guests; unnamed contributors appear as UNIDENTIFIED MALE / UNIDENTIFIED FEMALE where present.
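
A regex pass of this kind might look roughly like the sketch below. It is not the actor's actual pattern: the character set allowed in a label, the requirement that a label starts a line, and the handling of a title after a comma (e.g. `JAKE TAPPER, CNN ANCHOR:`) are all assumptions made for illustration.

```python
import re

# Assumption: a label starts a line, begins with a capital letter, and may be
# followed by ", TITLE" before the colon; only the name part is captured.
SPEAKER_LABEL = re.compile(
    r"^([A-Z][A-Z0-9 .'\-]*?)(?:, [A-Z0-9 .'\-()&]+)?:",
    re.MULTILINE,
)


def extract_speakers(body_text: str) -> list[str]:
    """Return unique speaker labels in order of first appearance."""
    seen: dict[str, None] = {}  # dict keeps insertion order and removes duplicates
    for match in SPEAKER_LABEL.finditer(body_text):
        seen.setdefault(match.group(1).strip(), None)
    return list(seen)
```

A pattern like this will also pick up any other all-caps text that starts a line and ends in a colon, so the resulting list may need light filtering downstream.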

Responsible use

Transcripts on transcripts.cnn.com are published publicly by CNN for informational access. Users are responsible for ensuring their downstream use of transcript data complies with applicable copyright law and CNN's terms of service.