Deprecated

Pricing

Pay per event

PBS Frontline Transcripts Scraper

Scrape full transcripts from PBS Frontline documentary films. Extracts transcript body text, speaker labels, film metadata (air date, synopsis, credits), and topic tags from all Frontline documentaries on pbs.org.

Developer

BowTiedRaccoon

Last modified: a day ago

What It Scrapes

Every record comes from pbs.org/wgbh/frontline/documentary/<slug>/. The transcript lives inline on the main documentary page (the /transcript/ subpath was retired; transcripts are now embedded directly). Metadata comes from JSON-LD structured data blocks on the same page.
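As a rough illustration of the JSON-LD step described above, the sketch below collects every `application/ld+json` script block from a page using only the Python standard library. It is not the actor's own code. The sample HTML, the `@type` value, and the property names are assumptions made up for the example, and the real Frontline markup may differ.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_jsonld(html):
    """Return a list of decoded JSON-LD objects found in the HTML string."""
    parser = JsonLdParser()
    parser.feed(html)
    return parser.blocks


# Hypothetical sample page; real property names and values are not confirmed.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "TVEpisode", "name": "Example Film", "datePublished": "2025-01-07"}
</script>
</head><body></body></html>
"""

blocks = extract_jsonld(SAMPLE_HTML)
```

Fields such as `film_title` or `air_date` could then be read from whichever block carries them, though the mapping from JSON-LD properties to output fields is not documented here.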

Output Schema

Field	Type	Description
`film_slug`	string	URL slug (e.g. `the-deal-trump-bukele-gangs-el-salvador`)
`film_title`	string	Documentary title
`film_url`	string	Canonical PBS URL
`air_date`	string	Original broadcast date (YYYY-MM-DD)
`duration_minutes`	number	Runtime in minutes
`synopsis`	string	Brief description from page metadata
`producers`	string	Comma-separated producing and directing credits
`correspondents`	string	Comma-separated correspondent credits
`related_topics`	string	Comma-separated PBS topic tags
`body_html`	string	Full transcript HTML with `<strong>SPEAKER:</strong>` spans
`body_text`	string	Plain-text transcript with inline speaker labels
`speakers`	string	Comma-separated unique speaker labels
`has_viewer_discretion_notice`	boolean	True if the film flags mature content
`related_film_urls`	string	Comma-separated URLs of cross-linked Frontline films
`canonical_url`	string	Canonical page URL
`source`	string	Fixed: `pbs.org/wgbh/frontline`
`scraped_at`	datetime	ISO 8601 scrape timestamp

Speaker labels follow the Frontline convention: NARRATOR, PRESIDENT DONALD TRUMP, NAYIB BUKELE, etc. They are extracted directly from the `<strong>LABEL:</strong>` spans in the transcript markup, with no inference or cleanup applied.
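For readers who want to re-derive the `speakers` field from `body_html`, here is a minimal sketch of label extraction. It assumes the labels appear as `<strong>LABEL:</strong>` as described in the schema. The regex, the helper name `extract_speakers`, the `", "` separator, and the sample transcript text are all illustrative assumptions rather than the actor's actual implementation.

```python
import re

# Assumed markup: <strong>LABEL:</strong>, with optional whitespace around the label.
SPEAKER_RE = re.compile(r"<strong>\s*([^<:]+?)\s*:\s*</strong>")


def extract_speakers(body_html):
    """Return unique speaker labels in order of first appearance."""
    seen = []
    for label in SPEAKER_RE.findall(body_html):
        if label not in seen:
            seen.append(label)
    return seen


# Hypothetical transcript fragment, not taken from a real film.
SAMPLE = (
    "<p><strong>NARRATOR:</strong> Sample narration text.</p>"
    "<p><strong>NAYIB BUKELE:</strong> Sample interview text.</p>"
    "<p><strong>NARRATOR:</strong> More sample narration.</p>"
)

speakers = extract_speakers(SAMPLE)
speakers_field = ", ".join(speakers)  # separator is an assumption
```

Labels that themselves contain a colon would not match this pattern, so a production version might need a looser rule.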

Input Options

startUrls (array, optional) — Specific documentary URLs to scrape. Leave empty to run the full sitemap discovery and scrape all available transcripts.

maxItems (integer, optional) — Cap on total records. Default 0 (no limit). When using sitemap discovery, applies globally across all sitemaps.

Example: Single film

{
    "startUrls": [
        {"url": "https://www.pbs.org/wgbh/frontline/documentary/the-deal-trump-bukele-gangs-el-salvador/"}
    ]
}

Example: Full archive crawl (all ~250 films)

{
    "maxItems": 0
}

Example: Recent 50 films

{
    "maxItems": 50
}

How It Works

Discovery uses PBS Frontline's sitemap index at pbs.org/wgbh/frontline/sitemap.xml. The nine sitemap-documentary sub-sitemaps each hold up to 100 film URLs, ordered newest-first. Films without a transcript (some pre-rebuild older entries) are silently skipped.
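The discovery step can be pictured with the sketch below, which filters a sitemap index down to the documentary sub-sitemaps and then lists the film URLs inside one of them. It parses inline sample XML so it runs without network access. The sub-sitemap file names and the sample film slug `example-film` are hypothetical; only the `sitemap-documentary` substring and the index location come from the description above.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def documentary_sitemaps(index_xml):
    """Return sub-sitemap URLs from a sitemap index that look like documentary sitemaps."""
    root = ET.fromstring(index_xml.strip())
    locs = [el.text.strip() for el in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if el.text]
    return [url for url in locs if "sitemap-documentary" in url]


def film_urls(sitemap_xml):
    """Return every <loc> URL listed in a single sub-sitemap, in document order."""
    root = ET.fromstring(sitemap_xml.strip())
    return [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]


# Hypothetical sample documents; real file names may differ.
INDEX_XML = """
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.pbs.org/wgbh/frontline/sitemap-documentary-1.xml</loc></sitemap>
  <sitemap><loc>https://www.pbs.org/wgbh/frontline/sitemap-article-1.xml</loc></sitemap>
</sitemapindex>
"""

SUB_XML = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.pbs.org/wgbh/frontline/documentary/example-film/</loc></url>
</urlset>
"""

doc_sitemaps = documentary_sitemaps(INDEX_XML)
films = film_urls(SUB_XML)
```

In a live run, each XML document would be fetched over HTTP first. That part is left out here to keep the example self-contained.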

Metadata is parsed from JSON-LD blocks on each documentary page. The transcript and credits live in two Chakra UI accordion panels — panel 0 is the transcript, panel 1 is the credits. Speaker labels are extracted via a single regex pass on the LABEL: pattern Frontline uses consistently across its archive.

The site is server-rendered Next.js with aggressive edge caching — no headless browser required, no proxy required.

Pricing

Charged per record scraped. Long-form transcripts (30-80KB each) are priced at a modest premium reflecting per-record research value. Start price applies per actor run regardless of record count.

Notes

Films without a transcript are skipped gracefully and do not count toward maxItems.
Some older archive films have had their transcript pages rebuilt and may appear without speaker-label markup — body text is still returned when a transcript exists.
`body_html` preserves the original `<strong>` speaker spans for downstream NLP pipelines that want to distinguish speaker turns programmatically.
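The first note, that skipped films do not count toward `maxItems`, can be sketched as a simple collection loop. This is an illustration of the stated behavior under assumptions, not the actor's source. `collect_records`, `fake_scrape`, and the sample URLs are hypothetical, and treating an empty `body_text` as "no transcript" is a guess at how the check might work.

```python
def collect_records(urls, scrape_film, max_items=0):
    """Collect records until max_items is reached; 0 means no limit.

    Films for which scrape_film returns None or an empty body_text are
    skipped and do not count toward the cap.
    """
    records = []
    for url in urls:
        if max_items and len(records) >= max_items:
            break
        record = scrape_film(url)
        if record is None or not record.get("body_text"):
            continue  # no transcript: skip without consuming the cap
        records.append(record)
    return records


# Hypothetical stand-in for a real page scraper.
FAKE_PAGES = {
    "film-a": "NARRATOR: sample text",
    "film-b": "",  # film without a transcript
    "film-c": "NARRATOR: more sample text",
    "film-d": "NARRATOR: even more sample text",
}


def fake_scrape(url):
    return {"film_slug": url, "body_text": FAKE_PAGES[url]}


capped = collect_records(list(FAKE_PAGES), fake_scrape, max_items=2)
uncapped = collect_records(list(FAKE_PAGES), fake_scrape)
```

With `max_items=2`, the loop returns `film-a` and `film-c`, because `film-b` has no transcript and is passed over without using up one of the two slots.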

Need Custom Fields or a Different Source?

File an issue or get in touch. We can add fields, filter by topic, or build adjacent scrapers in the same broadcast-transcript vertical.
