Podcast Transcript Search
Pricing
Pay per usage
Podcast Transcript Search. Search and discover data across multiple sources with structured output. Fast, reliable, and cost-effective.
What does Podcast Transcript Search do?
Podcast Transcript Search is an Apify actor that crawls podcast transcript databases and directories to find episodes containing specific keywords and topics. It returns structured results with the podcast name, episode title, matched text excerpts, timestamps, speaker names, and direct links. This actor is perfect for researchers, journalists, content creators, and marketers who need to discover relevant podcast discussions without manually listening to hours of audio content. The actor leverages CheerioCrawler for efficient HTML parsing across multiple transcript sources.
Why use Podcast Transcript Search?
The podcast ecosystem has grown to millions of episodes across every conceivable topic, and finding the specific episodes that discuss your area of interest through manual browsing is impractical. This actor automates the search by crawling publicly available transcript databases and extracting relevant mentions. Instead of spending hours searching individual podcast platforms, you get a consolidated dataset of keyword mentions (up to the `maxResults` limit), along with contextual information to evaluate relevance. The structured output makes it easy to filter, sort, and analyze results in spreadsheets or data pipelines.
How to use Podcast Transcript Search
- Open the actor on the Apify platform.
- Enter your search keywords in the `searchQuery` field. Use specific phrases for targeted results.
- Set `maxResults` to control the volume of data returned.
- Click Start to begin the search.
- Review results in the Dataset tab once the run completes.
- Export to JSON, CSV, or Excel for further analysis.
Input Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `searchQuery` | string | Keywords or topics to search for | "artificial intelligence" |
| `maxResults` | integer | Maximum transcript matches to return | 30 |
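
If you prepare the input programmatically rather than in the web form, a small helper can apply the defaults from the table and catch obvious mistakes early. This is a minimal sketch: `build_input` is a hypothetical helper, and the rules it enforces (non-empty query, `maxResults` of at least 1) are assumptions, not documented constraints of the actor.

```python
from typing import Any


def build_input(search_query: str = "artificial intelligence",
                max_results: int = 30) -> dict[str, Any]:
    """Build the actor input, falling back to the defaults from the table."""
    query = search_query.strip()
    if not query:
        # Assumption: an empty query is never useful, so reject it locally.
        raise ValueError("searchQuery must not be empty")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise TypeError("maxResults must be an integer")
    if max_results < 1:
        # Assumption: the actor expects a positive result limit.
        raise ValueError("maxResults must be at least 1")
    return {"searchQuery": query, "maxResults": max_results}


run_input = build_input("machine learning ethics", max_results=50)
print(run_input)
```

The resulting dictionary uses the same keys as the input table, so it can be serialized to JSON and supplied as the run input.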
Output
Each result in the dataset contains:
| Field | Description |
|---|---|
| `podcastName` | Name of the podcast show |
| `episodeTitle` | Title of the specific episode |
| `matchedText` | Text excerpt containing the keyword match |
| `timestamp` | Timestamp within the episode (when available) |
| `speakerName` | Name of the speaker (when available) |
| `url` | Direct link to the transcript or episode |
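
When consuming results in code, it can help to map each dataset item onto a typed record that treats `timestamp` and `speakerName` as optional. The sketch below assumes that all fields are strings and that an unavailable field is either absent, null, or empty; the `TranscriptMatch` class, the `parse_match` helper, and the sample item are illustrative, not actual actor output.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class TranscriptMatch:
    podcast_name: str
    episode_title: str
    matched_text: str
    url: str
    timestamp: Optional[str] = None      # only set when the source provides it
    speaker_name: Optional[str] = None   # only set when the source provides it


def parse_match(item: dict[str, Any]) -> TranscriptMatch:
    """Convert one dataset item into a TranscriptMatch."""
    return TranscriptMatch(
        podcast_name=item["podcastName"],
        episode_title=item["episodeTitle"],
        matched_text=item["matchedText"],
        url=item["url"],
        # Normalize missing, null, and empty values to None.
        timestamp=item.get("timestamp") or None,
        speaker_name=item.get("speakerName") or None,
    )


# Made-up item for illustration only.
sample = {
    "podcastName": "Example Show",
    "episodeTitle": "Episode 1",
    "matchedText": "a discussion of machine learning ethics",
    "url": "https://example.com/episode-1",
    "timestamp": "00:12:34",
}
match = parse_match(sample)
print(match.podcast_name, match.timestamp, match.speaker_name)
```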
Cost Estimate
Running this actor on the Apify platform typically costs between $0.005 and $0.02 per run, depending on the number of results requested. The actor uses 1024 MB of memory by default and processes pages efficiently using Cheerio rather than a full browser. Most searches complete within 1 to 3 minutes. Increasing `maxResults` may increase run time and cost roughly in proportion.
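
For budgeting scheduled runs, the per-run range above can be scaled by the number of runs. This is a rough sketch: it assumes every run falls inside the quoted range and ignores any other platform charges, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Per-run bounds quoted above, in US dollars.
COST_PER_RUN_LOW = Decimal("0.005")
COST_PER_RUN_HIGH = Decimal("0.02")


def estimate_cost(runs: int) -> tuple[Decimal, Decimal]:
    """Return the (low, high) estimated cost in USD for a number of runs."""
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return COST_PER_RUN_LOW * runs, COST_PER_RUN_HIGH * runs


# Example: one scheduled run per day for 30 days.
low, high = estimate_cost(30)
print(f"${low:.2f} to ${high:.2f}")  # prints "$0.15 to $0.60"
```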
Tips and Best Practices
- Use specific multi-word phrases for more relevant results. For example, "machine learning ethics" will yield better matches than just "machine learning."
- Combine results from multiple runs with different keywords to build comprehensive topic research.
- The `matchedText` field provides context around each keyword hit so you can evaluate relevance without visiting every link.
- Schedule regular runs to discover new episodes as they are published and transcribed.
- For related content aggregation, check out the RSS Feed Aggregator actor to monitor podcast RSS feeds directly.
- Some transcript databases may have rate limits. The actor handles retries automatically, but very large searches may take longer.
- Filter your exported dataset by `podcastName` to focus on specific shows that consistently cover your topic.
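
Two of the tips above, combining runs and filtering by `podcastName`, can be sketched in a few lines once the datasets are exported as JSON and loaded as lists of dictionaries. The helper names and sample data are illustrative, and treating two items as duplicates when they share the same `url` and `matchedText` is an assumption rather than documented actor behavior.

```python
from collections import Counter
from typing import Any, Iterable

Item = dict[str, Any]


def merge_runs(*runs: Iterable[Item]) -> list[Item]:
    """Concatenate results from several runs, dropping repeated matches."""
    seen: set[tuple[Any, Any]] = set()
    merged: list[Item] = []
    for run in runs:
        for item in run:
            # Assumption: url plus matchedText identifies a unique match.
            key = (item.get("url"), item.get("matchedText"))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def filter_by_podcast(items: Iterable[Item], podcast_name: str) -> list[Item]:
    """Keep only items from one show, ignoring letter case."""
    wanted = podcast_name.casefold()
    return [i for i in items if str(i.get("podcastName", "")).casefold() == wanted]


# Made-up results from two runs with different keywords.
run_a = [
    {"podcastName": "Show A", "url": "https://example.com/a1", "matchedText": "ethics of ML"},
    {"podcastName": "Show B", "url": "https://example.com/b1", "matchedText": "ML in production"},
]
run_b = [
    {"podcastName": "Show A", "url": "https://example.com/a1", "matchedText": "ethics of ML"},
    {"podcastName": "Show A", "url": "https://example.com/a2", "matchedText": "fairness audits"},
]

merged = merge_runs(run_a, run_b)
# Count matches per show to see which ones cover the topic most often.
per_show = Counter(item["podcastName"] for item in merged)
print(per_show.most_common())
print(len(filter_by_podcast(merged, "show a")))
```

Counting matches per show is one way to spot the podcasts that consistently cover a topic before filtering down to them.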
