Podcast Episode Extractor
Deprecated

Pricing

from $2.00 / 1,000 results


🎙️ Extract comprehensive podcast episode data from RSS feeds, including metadata, audio information, and episode details. Perfect for podcast aggregation and content research.

Rating: 0.0 (0 ratings)

Developer: SimplifySME Toolbox (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 0
  • Last modified: 2 days ago

📺 What It Extracts

  • Podcast Metadata: Title, description, link, image, author, language
  • Episode Details: Title, description, link, publication dates
  • Audio Information: Audio URL, MIME type, and file size (read from HTTP headers when the feed omits it)
  • Episode Metadata: Episode number, season, explicit content flag
  • Content Organization: Categories and tags
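To show where these fields typically live in a feed, here is a minimal Python sketch using only the standard library's ElementTree. It is an illustration, not the actor's implementation, which the page says uses the rss-parser library. The helper names `extract_episodes` and `_int_or_none` are hypothetical. The sketch assumes an RSS 2.0 feed with Apple's iTunes namespace tags and does not handle Atom.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List, Optional

# Apple's iTunes podcast namespace (episode, season, explicit, duration tags).
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"


def _int_or_none(text: Optional[str]) -> Optional[int]:
    """Convert tag or attribute text to int; None when missing or non-numeric."""
    if text is None:
        return None
    try:
        return int(text.strip())
    except ValueError:
        return None


def extract_episodes(rss_xml: str) -> List[Dict[str, Any]]:
    """Pull episode fields from an RSS 2.0 string (Atom feeds are not handled here)."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = channel.findall("item") if channel is not None else []
    episodes = []
    for item in items:
        enclosure = item.find("enclosure")
        enc = enclosure.attrib if enclosure is not None else {}
        explicit = (item.findtext(f"{ITUNES}explicit") or "").strip().lower()
        episodes.append({
            "title": item.findtext("title"),
            "description": item.findtext("description"),
            "link": item.findtext("link"),
            "pubDate": item.findtext("pubDate"),
            "duration": item.findtext(f"{ITUNES}duration"),
            "episode": _int_or_none(item.findtext(f"{ITUNES}episode")),
            "season": _int_or_none(item.findtext(f"{ITUNES}season")),
            # Feeds use "yes"/"no" or "true"/"false"; "clean" counts as not explicit.
            "explicit": explicit in ("yes", "true"),
            "audioUrl": enc.get("url"),
            "audioType": enc.get("type"),
            # The RSS enclosure "length" attribute is the file size in bytes.
            "audioLength": _int_or_none(enc.get("length")),
            "categories": [c.text.strip() for c in item.findall("category") if c.text],
        })
    return episodes
```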

🚀 Key Features

Feature | Description
--- | ---
🎵 Audio Length Detection | Reads the audio file size (in bytes) from the RSS enclosure or from HTTP headers
⏱️ Duration Parsing | Converts duration strings (HH:MM:SS, MM:SS, or plain seconds) to seconds
📊 Structured Output | Clean JSON format with all episode data
🔄 Smart Fallbacks | Fetches the audio length from HTTP headers when the RSS feed omits it
⚡ Concurrent Processing | Runs audio-length fallback requests in parallel (up to 5 at a time)
📈 Complete Metadata | Extracts all available podcast and episode information
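The duration conversion can be sketched in a few lines of Python. The sketch treats each colon-separated field as one base-60 place, so "45:30" becomes 45 * 60 + 30 = 2730. It assumes fractional seconds are truncated and that unparseable values become None. `duration_to_seconds` is a hypothetical name, not the actor's code.

```python
from typing import Optional


def duration_to_seconds(value: Optional[str]) -> Optional[int]:
    """Convert "HH:MM:SS", "MM:SS", or plain seconds ("2730") to whole seconds.

    Returns None for missing or unparseable input.
    """
    if value is None:
        return None
    parts = value.strip().split(":")
    # At most three fields; each must be a non-negative number such as "30" or "12.5".
    if len(parts) > 3 or not all(p.replace(".", "", 1).isdecimal() for p in parts):
        return None
    total = 0.0
    for part in parts:
        total = total * 60 + float(part)  # shift previous fields one base-60 place
    return int(total)  # drop any fractional seconds
```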

📥 Input

Required

  • rssUrl (string): The podcast RSS feed URL
    • Example: "https://feeds.example.com/podcast.xml"
    • Supports standard RSS and Atom feed formats
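A client can catch malformed input before starting a run. This Python sketch is hypothetical: it assumes the feed is fetched over HTTP(S) and that the input is exactly the JSON object shown under Example Usage. `build_input` is an illustrative helper, not part of the actor.

```python
import json
from urllib.parse import urlparse


def build_input(rss_url: str) -> str:
    """Return the actor input as a JSON string, rejecting non-http(s) URLs early."""
    url = rss_url.strip()
    parsed = urlparse(url)
    # Assumption: only absolute http:// or https:// feed URLs are meaningful here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"rssUrl must be an absolute http(s) URL, got {rss_url!r}")
    return json.dumps({"rssUrl": url})
```

Because `rssUrl` takes a single feed, collecting several podcasts would presumably mean one input per feed, for example `[build_input(u) for u in feed_urls]`.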

📤 Output

Returns comprehensive podcast data:

Podcast Metadata

{
  "podcast": {
    "title": "Podcast Title",
    "description": "Podcast description...",
    "link": "https://example.com/podcast",
    "image": "https://example.com/podcast-image.jpg",
    "author": "Author Name",
    "language": "en"
  }
}

Episodes Array

{
  "episodes": [
    {
      "title": "Episode Title",
      "description": "Episode description...",
      "link": "https://example.com/episode",
      "pubDate": "Mon, 01 Jan 2024 10:00:00 GMT",
      "isoDate": "2024-01-01T10:00:00.000Z",
      "duration": "45:30",
      "durationSeconds": 2730,
      "episode": 1,
      "season": 1,
      "explicit": false,
      "author": "Author Name",
      "image": "https://example.com/episode-image.jpg",
      "audioUrl": "https://example.com/episode.mp3",
      "audioType": "audio/mpeg",
      "audioLength": 45678901,
      "audioLengthMB": 43.56,
      "categories": ["Technology", "Business"]
    }
  ],
  "totalEpisodes": 10
}
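The sample values are consistent with audioLengthMB being audioLength divided by 1024 * 1024 and rounded to two decimals (45678901 bytes gives 43.56). The Python sketch below reproduces that conversion and assembles a result shaped like the samples. Combining podcast, episodes, and totalEpisodes into one object is an assumption. `bytes_to_mb` and `build_output` are hypothetical helpers.

```python
from typing import Any, Dict, List, Optional


def bytes_to_mb(length: Optional[int]) -> Optional[float]:
    """Convert a byte count to MB (1 MB = 1024 * 1024 bytes), rounded to 2 decimals."""
    if length is None:
        return None
    return round(length / (1024 * 1024), 2)


def build_output(podcast: Dict[str, Any], episodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Add audioLengthMB to each episode and count episodes (assumed single-object shape)."""
    enriched = [{**ep, "audioLengthMB": bytes_to_mb(ep.get("audioLength"))} for ep in episodes]
    return {"podcast": podcast, "episodes": enriched, "totalEpisodes": len(enriched)}
```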

💡 Use Cases

  • ✅ Podcast Aggregation - Collect episodes from multiple podcasts
  • ✅ Content Research - Analyze podcast content and topics
  • ✅ Episode Tracking - Monitor new episodes and updates
  • ✅ Analytics - Track episode counts, durations, and publication dates
  • ✅ Content Discovery - Find relevant podcast episodes
  • ✅ Media Libraries - Build podcast databases and catalogs
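For episode tracking, one approach is to compare the latest run with identifiers saved from a previous run. The Python sketch below is hypothetical. It prefers audioUrl as the identifier because the output shown here has no GUID field, then falls back to link and title. `episode_key` and `find_new_episodes` are illustrative names.

```python
from typing import Any, Dict, Iterable, List, Set


def episode_key(ep: Dict[str, Any]) -> str:
    """Pick an identifier: audioUrl first, then link, then title (assumed heuristic)."""
    return ep.get("audioUrl") or ep.get("link") or ep.get("title") or ""


def find_new_episodes(current: List[Dict[str, Any]], seen_keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Return episodes from the latest run whose key was not seen in an earlier run."""
    seen: Set[str] = set(seen_keys)
    return [ep for ep in current if episode_key(ep) not in seen]
```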

⚙️ Technical Details

  • Feed Parser: Uses rss-parser library for robust feed parsing
  • Audio Length Detection:
    • Primary: Extracts from RSS <enclosure> tag
    • Fallback: Performs HTTP HEAD request to get Content-Length header
  • Duration Parsing: Handles multiple formats (HH:MM:SS, MM:SS, seconds)
  • Concurrency: Limits to 5 concurrent HTTP requests for audio length fallback
  • Timeout: 10-second timeout for each HTTP HEAD request
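The fallback described above can be sketched with Python's standard library: an HTTP HEAD request, the Content-Length header, at most 5 requests in flight, and a 10-second timeout. This is an illustration, not the actor's implementation. The helper names are hypothetical, and treating a missing or zero audioLength as "missing" is an assumption. The `fetch_length` parameter lets the concurrency logic run without network access.

```python
import http.client
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

MAX_CONCURRENT = 5    # documented limit on parallel fallback requests
TIMEOUT_SECONDS = 10  # documented per-request timeout


def head_content_length(url: str) -> Optional[int]:
    """Send an HTTP HEAD request and return Content-Length in bytes, or None on failure."""
    try:
        request = urllib.request.Request(url, method="HEAD")  # ValueError on bad URLs
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            value = response.headers.get("Content-Length")
            return int(value) if value and value.isdigit() else None
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return None


def fill_audio_lengths(
    episodes: List[Dict[str, Any]],
    fetch_length: Callable[[str], Optional[int]] = head_content_length,
) -> List[Dict[str, Any]]:
    """Fill missing audioLength values, running at most MAX_CONCURRENT lookups at once."""
    # Assumption: a length of 0 is a placeholder and is treated like a missing value.
    missing = [ep for ep in episodes if not ep.get("audioLength") and ep.get("audioUrl")]
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        lengths = pool.map(fetch_length, [ep["audioUrl"] for ep in missing])
        for ep, length in zip(missing, lengths):
            ep["audioLength"] = length
    return episodes
```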

📝 Example Usage

Basic Extraction

{
  "rssUrl": "https://feeds.example.com/podcast.xml"
}

Public Feed (NPR)

{
  "rssUrl": "https://feeds.npr.org/510289/podcast.xml"
}

⚠️ Important Notes

  • Audio Length: If not available in RSS feed, the actor will attempt to fetch it from the audio URL's HTTP headers
  • Duration Formats: Supports HH:MM:SS, MM:SS, and seconds-only formats
  • Concurrency: Audio length fallback requests are limited to 5 concurrent requests
  • Timeout: Each HTTP HEAD request has a 10-second timeout