📻 NPR Scraper: News & Podcast Transcripts

Pricing

from $3.00 / 1,000 results

Extract articles & content from NPR: news stories, podcast episodes & transcripts. Build media monitoring, content analysis & journalism research tools. Pay per article.

Rating: 0.0 (0)

Developer: Stephan Corbeil (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: a day ago

📻 NPR Scraper: News Articles, Podcast Transcripts & Show Metadata

Bulk-extract articles, podcast episodes, transcripts, and show metadata from NPR.org and NPR member stations: headline, byline, dek, full article body, publication date, primary topic, image URLs, audio URLs, transcript text, and show / program affiliation. A pay-per-result alternative to Diffbot's news API, NewsAPI.org Premium, Webhose.io, and the NPR private partner API, built for media-monitoring firms, NLP researchers training news-domain models, content-aggregator startups, and political-comms teams tracking national news coverage.

Why NPR Scraper Beats Diffbot News, NewsAPI.org, Webhose.io & NPR Partner API

| Feature | NexGenData NPR Scraper | Diffbot News API | NewsAPI.org Premium | Webhose.io | NPR Partner API |
| --- | --- | --- | --- | --- | --- |
| Cost | $1 per 1K articles, pay-per-event | $299+ / month | $449+ / month | $$$$ enterprise | Partner contract |
| NPR-specific coverage | Yes: full NPR.org + member stations | Generic news (NPR included) | NPR via aggregator | NPR via aggregator | Yes (partner-only) |
| Full article body | Yes | Yes | Headline + url only | Yes | Yes |
| Podcast audio + transcript | Yes: audio_url + transcript_text | No | No | No | Yes |
| Bulk export | JSON / CSV / Excel | JSON / CSV (plan-gated) | JSON | JSON / CSV | Partner-only |
| Auth | Apify token | API key + plan | API key + plan | Account + plan | Partnership |
| Historical archive | 5+ years | 30 days default | 30 days (free) | Yes (plan-gated) | Yes |
| Monthly minimum | None | $299+ | $449+ | $$$$ | Partnership contract |

For NPR-specific workflows, this actor is an alternative to Diffbot's news API that also returns NPR transcripts (which Diffbot does not). It is cheaper than NewsAPI Premium for any depth beyond headlines, and it does not require an NPR partnership contract.

What You Get Per Story

Each dataset item is a flat record:

  • url, npr_story_id
  • headline, dek: subheadline / standfirst
  • byline[]: author(s) with bio link
  • published_at, updated_at: ISO 8601
  • body: full article text (HTML or markdown)
  • body_paragraphs[]: array of paragraph strings for easy NLP
  • topics[]: NPR topic tags (Politics, Business, Music, etc.)
  • show: program affiliation (Morning Edition, All Things Considered, etc.)
  • audio_url: direct MP3 link for the corresponding podcast segment
  • audio_duration_seconds
  • transcript_text: full transcript when available
  • images[]: {url, caption, credit}
  • pull_quotes[]
  • related_stories[]: NPR-internal cross-links
  • member_station: local NPR station that produced the piece, if any
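
To make the record shape concrete, here is a minimal typing sketch. The field names come from the list above; the Python types, the optionality of each field, and the `summarize` helper are assumptions for illustration, not the actor's published schema.

```python
from typing import Any, Dict, List, Optional, TypedDict


class Image(TypedDict):
    url: str
    caption: str
    credit: str


class Story(TypedDict, total=False):
    # Field names follow the list above; every type here is an assumption.
    url: str
    npr_story_id: str
    headline: str
    dek: str
    byline: List[Dict[str, Any]]   # exact author object shape is not documented
    published_at: str              # ISO 8601 string
    updated_at: str
    body: str
    body_paragraphs: List[str]
    topics: List[str]
    show: Optional[str]
    audio_url: Optional[str]
    audio_duration_seconds: Optional[float]
    transcript_text: Optional[str]
    images: List[Image]
    pull_quotes: List[str]
    related_stories: List[str]
    member_station: Optional[str]


def summarize(story: Story) -> str:
    """Hypothetical helper: date, headline, and body word count on one line."""
    date = story.get("published_at", "")[:10]
    words = sum(len(p.split()) for p in story.get("body_paragraphs", []))
    return f"{date} | {story.get('headline', '')} | {words} words"


# Made-up sample record, not real NPR data.
sample: Story = {
    "headline": "Example headline",
    "published_at": "2026-04-02T09:30:00Z",
    "body_paragraphs": ["First paragraph here.", "Second one."],
}
print(summarize(sample))
```

Using `total=False` keeps every key optional, which is the safer reading when some fields (audio, transcript, member station) are only present for some stories.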

Use Cases

  • Media monitoring firms: daily ingest of all NPR coverage matching a client's keywords and competitors
  • NLP / LLM training teams: build a high-quality, professionally edited news corpus with paired audio
  • Content aggregators: power topic pages that surface the best NPR coverage on, say, the Fed
  • Political comms / PR teams: track how a brand or issue is framed for NPR's national audience
  • Academic researchers: study media-language evolution by analyzing NPR coverage across decades
  • Educators: generate classroom materials by pulling story + transcript + audio in one record

Quick Start

from apify_client import ApifyClient
# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")
# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/npr-scraper").call(run_input={
    "queries": ["Federal Reserve", "AI regulation"],
    "topics": ["politics", "economy"],
    "since": "2026-04-01",
    "maxStories": 200,
})
# Stream the stories from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["published_at"], item["headline"])
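
If you would rather keep a local CSV than print to the console, the items can be flattened with the standard library. This is a sketch under stated assumptions: the column selection, the `to_csv_rows` and `write_csv` helper names, and the "; " separator for list fields are illustrative choices, and the sample item is made up.

```python
import csv
import io
from typing import Any, Dict, Iterable, List, TextIO

# Illustrative subset of fields; adjust to whatever your workflow needs.
COLUMNS = ["published_at", "headline", "url", "show", "topics"]


def to_csv_rows(items: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Flatten dataset items into string-only rows suitable for CSV."""
    rows = []
    for item in items:
        row = {}
        for col in COLUMNS:
            value = item.get(col)
            if isinstance(value, list):
                # Join list fields such as topics[] into one cell.
                value = "; ".join(str(v) for v in value)
            row[col] = "" if value is None else str(value)
        rows.append(row)
    return rows


def write_csv(items: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Write a header row plus one row per item to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(to_csv_rows(items))


# Made-up sample item standing in for real dataset output.
items = [{
    "published_at": "2026-04-02T09:30:00Z",
    "headline": "Example",
    "url": "https://example.org/story",
    "topics": ["Politics", "Business"],
}]
buf = io.StringIO()
write_csv(items, buf)
print(buf.getvalue())
```

In a real run you would pass `client.dataset(run["defaultDatasetId"]).iterate_items()` as `items` and a file opened with `newline=""` as `out`.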

Pricing

Pay-per-event:

  • Actor Start: small fixed charge per run (memory-scaled)
  • Per story: $1 per 1,000 stories returned

No subscription, no minimum.
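
As a rough budgeting aid, the arithmetic can be sketched as below. The $1 per 1,000 rate is taken from the line above; the actor-start charge is not stated here, so `start_charge` is a placeholder you would fill in from your own billing, and the rate is a parameter so it can be checked against the store listing.

```python
def estimate_cost(stories: int, price_per_1000: float = 1.00,
                  start_charge: float = 0.0) -> float:
    """Estimate the cost of one run in USD.

    start_charge is an assumption: the actual actor-start fee is
    memory-scaled and not specified in this document.
    """
    if stories < 0:
        raise ValueError("stories must be non-negative")
    return round(start_charge + stories * price_per_1000 / 1000, 4)


print(estimate_cost(200))      # the Quick Start run with maxStories=200
print(estimate_cost(25_000))   # a larger backfill
```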

Related actors

| Use case | Actor |
| --- | --- |
| Podcast episode metadata + audio URLs | podcast-episodes-scraper |
| News content + sentiment MCP for AI | news-mcp-server |
| AI sentiment + theme analyzer | ai-sentiment-analyzer |
| Hacker News scraper | hacker-news-scraper |
| Google News / Search SERP | google-search-scraper |
| Reddit subreddit trend tracker | reddit-subreddit-trends |
| Crunchbase news scraper | crunchbase-news-scraper |
| YouTube channel + video metadata MCP | youtube-media-mcp-server |

FAQ

Are NPR member-station stories included? Yes. Pieces from member stations that publish to npr.org show up with the member_station field populated.

How deep does the archive go? NPR's web archive reaches back to roughly 2005; the actor will return any story that is still online.

Are podcast transcripts always present? No. They are present only where NPR publishes one, which covers most flagship shows (Morning Edition, All Things Considered, NPR Politics, Throughline, etc.).
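
Because transcripts are not guaranteed, downstream code should check for them before use. The sketch below assumes a missing transcript appears as an absent key, `None`, or an empty string; the actor's exact convention is not documented here, and `split_by_transcript` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List, Tuple

Item = Dict[str, Any]


def split_by_transcript(items: Iterable[Item]) -> Tuple[List[Item], List[Item]]:
    """Partition items into (with transcript, without transcript)."""
    with_text: List[Item] = []
    without_text: List[Item] = []
    for item in items:
        text = item.get("transcript_text")
        # Treat absent, None, and whitespace-only values as "no transcript".
        if isinstance(text, str) and text.strip():
            with_text.append(item)
        else:
            without_text.append(item)
    return with_text, without_text


# Made-up sample items covering each case.
sample_items = [
    {"headline": "A", "transcript_text": "HOST: Good morning."},
    {"headline": "B", "transcript_text": ""},
    {"headline": "C"},
    {"headline": "D", "transcript_text": None},
]
have, missing = split_by_transcript(sample_items)
print(len(have), "with transcript;", len(missing), "without")
```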

Output formats? JSON, CSV, Excel, and the Apify dataset API.

Is this legal? Yes. NPR publishes all this content for public consumption; the actor is a structured-extraction wrapper.

About NexGenData

NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: apart from the per-run start charge, you pay only for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result / item: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console: point-and-click run
  • Apify API: REST + webhooks
  • Apify Python / JS SDKs: programmatic batch
  • Zapier, Make.com, n8n: official integrations
  • MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules: built-in cron for daily / weekly / monthly runs
  • Webhooks: POST results to any HTTPS endpoint on dataset write
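
For the REST route, a run can be started with a plain HTTP POST to the Apify API's run-actor endpoint. The sketch below only builds the request, so it can be inspected without network access. It relies on the API's `username~actor-name` form of the actor ID; the `build_run_request` helper name and the input values are illustrative.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    # In URLs the API expects "username~actor-name" rather than "username/actor-name".
    actor_id = actor.replace("/", "~")
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_run_request(
    "nexgendata/npr-scraper",
    "YOUR_APIFY_TOKEN",
    {"queries": ["Federal Reserve"], "maxStories": 50},
)
print(req.get_method(), req.full_url)
# To actually start the run: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the call easy to log or unit-test before any charges are incurred.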

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata