CNN Transcripts Scraper
Pricing
Pay per event
Scrape broadcast transcripts from transcripts.cnn.com. Extracts full segment text, speaker labels, show metadata, and airtime info for any CNN show and date range.
Developer
BowTiedRaccoon
Scrape broadcast transcripts from CNN's public archive at transcripts.cnn.com. Covers every CNN show in the archive, which dates back to approximately 2000, with per-segment granularity, speaker-label extraction, and structured metadata.
What you get
Each output record represents one broadcast segment:
| Field | Description |
|---|---|
| show_slug | CNN show identifier (e.g. cnr, fzgps, sotu) |
| show_title | Full show name (e.g. "CNN Newsroom") |
| aired_date | Broadcast date (YYYY-MM-DD) |
| segment_number | Index within the show-date (1, 2, 3 …) |
| segment_title | Segment headline and topic summary |
| segment_url | Canonical URL on transcripts.cnn.com |
| body_html | Full transcript HTML (preserves timestamps, paragraph breaks) |
| body_text | Plain-text version with speaker labels and newlines preserved |
| speakers | Comma-separated list of detected speaker names |
| aired_at_local | Broadcast time in Eastern Time (e.g. 02:00 ET) |
| source | Always transcripts.cnn.com |
| scraped_at | ISO timestamp of when the record was scraped |
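Because each record is a single segment and `speakers` is a flat comma-separated string, downstream code usually needs to regroup segments per broadcast and split the speaker list. The following is a minimal sketch of that post-processing, assuming the records have already been loaded as Python dictionaries keyed by the field names above. The helper names `split_speakers` and `group_segments` are hypothetical and not part of the actor, and the sample values are made up for illustration.

```python
from collections import defaultdict


def split_speakers(record: dict) -> list[str]:
    """Turn the comma-separated `speakers` string into a clean list of names."""
    raw = record.get("speakers") or ""
    return [name.strip() for name in raw.split(",") if name.strip()]


def group_segments(records: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group segments by (show_slug, aired_date), ordered by segment_number."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["show_slug"], record["aired_date"])].append(record)
    for segments in grouped.values():
        segments.sort(key=lambda r: r["segment_number"])
    return dict(grouped)


# Illustrative records only (values are made up, not real scraped output).
sample = [
    {"show_slug": "cnr", "aired_date": "2026-05-08", "segment_number": 2,
     "speakers": "JANE DOE, UNIDENTIFIED MALE"},
    {"show_slug": "cnr", "aired_date": "2026-05-08", "segment_number": 1,
     "speakers": ""},
]
grouped = group_segments(sample)
```

Grouping on `(show_slug, aired_date)` mirrors the "show-date" unit that `segment_number` is indexed within, so sorting inside each group restores broadcast order.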
Usage
Basic — scrape all shows for a single day
{"startDate": "2026-05-08", "maxItems": 50}
Date range with show filter
{"startDate": "2026-05-01", "endDate": "2026-05-08", "showSlugs": ["cnr", "fzgps", "sotu"], "maxItems": 500}
Input fields
| Field | Type | Required | Description |
|---|---|---|---|
| startDate | string | Yes | Start date YYYY-MM-DD |
| endDate | string | No | End date YYYY-MM-DD (defaults to startDate) |
| showSlugs | string[] | No | Filter to specific shows (e.g. ["cnr", "fzgps"]). Leave empty for all shows. |
| maxItems | integer | No | Cap on total segments returned. 0 = no limit. Default: 0. |
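The input rules in the table (required `startDate`, `endDate` falling back to `startDate`, optional `showSlugs`, `maxItems` of 0 meaning no limit) can be checked on the client before starting a run. The sketch below is one way to do that. The function `build_run_input` is a hypothetical helper, and rejecting an `endDate` earlier than `startDate` is an assumption made here; the listing does not say how the actor itself treats such a range.

```python
from datetime import date
from typing import Optional, Sequence


def build_run_input(
    start_date: str,
    end_date: Optional[str] = None,
    show_slugs: Optional[Sequence[str]] = None,
    max_items: int = 0,
) -> dict:
    """Build an input object matching the documented fields."""
    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(start_date)
    # Documented default: endDate falls back to startDate.
    end = date.fromisoformat(end_date) if end_date else start
    # Assumption: an inverted range is treated as a caller error.
    if end < start:
        raise ValueError("endDate must not be earlier than startDate")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (no limit) or a positive integer")

    run_input = {
        "startDate": start.isoformat(),
        "endDate": end.isoformat(),
        "maxItems": max_items,
    }
    # Omitting showSlugs corresponds to "all shows".
    if show_slugs:
        run_input["showSlugs"] = list(show_slugs)
    return run_input


single_day = build_run_input("2026-05-08", max_items=50)
filtered = build_run_input("2026-05-01", "2026-05-08", ["cnr", "fzgps", "sotu"], 500)
```

The resulting dictionaries serialize to the same JSON shape as the usage examples above, with `endDate` always written out explicitly.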
Common show slugs
| Slug | Show |
|---|---|
| cnr | CNN Newsroom |
| fzgps | Fareed Zakaria GPS |
| sotu | State of the Union |
| acd | Anderson Cooper 360 |
| ebo | Erin Burnett OutFront |
| cg | The Lead with Jake Tapper |
| sitroom | The Situation Room |
| ip | Inside Politics |
| ctmo | CNN This Morning |
| ampr | Amanpour |
Dataset size
- ~30 active CNN shows, 1–22 segments per show per day
- ~30–50 new segments published daily
- Archive goes back to approximately 2000
Notes on speaker extraction
The speakers field parses ALL-CAPS labels preceding a colon (e.g. ANDERSON COOPER:, TRUMP:) using a regex pass on the plain-text body. It covers named hosts and guests; unnamed contributors appear as UNIDENTIFIED MALE / UNIDENTIFIED FEMALE where present.
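To make the idea concrete, here is a rough sketch of what such a regex pass might look like. It is not the actor's actual pattern, which is not published here. The regex, the name `extract_speakers`, and the handling of an optional title after a comma (e.g. `JANE DOE, CNN ANCHOR:`) are all assumptions for illustration.

```python
import re

# Assumed pattern: an ALL-CAPS name of two or more characters at the start of
# a line, optionally followed by ", TITLE", then a colon. Only the name part
# is captured.
SPEAKER_LABEL = re.compile(
    r"^([A-Z][A-Z0-9 .'()\-]*[A-Z)])(?:, [A-Z0-9 .,'&()\-]+)?:",
    re.MULTILINE,
)


def extract_speakers(body_text: str) -> str:
    """Return unique speaker names, in order of first appearance, comma-separated."""
    seen: list[str] = []
    for match in SPEAKER_LABEL.finditer(body_text):
        name = match.group(1).strip()
        if name not in seen:
            seen.append(name)
    return ", ".join(seen)


# Illustrative text only, not a real transcript excerpt.
example = (
    "ANDERSON COOPER: Good evening.\n"
    "JANE DOE, CNN CORRESPONDENT: Thanks, Anderson.\n"
    "UNIDENTIFIED MALE: We saw it happen.\n"
    "ANDERSON COOPER: He said: thank you.\n"
)
speakers = extract_speakers(example)
```

A pattern like this will also match any other ALL-CAPS line prefix that ends in a colon, such as an on-screen banner, so the resulting list is best treated as a heuristic rather than a verified cast list.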
Responsible use
Transcripts on transcripts.cnn.com are published publicly by CNN for informational access. Users are responsible for ensuring their downstream use of transcript data complies with applicable copyright law and CNN's terms of service.