Podcast Episode Extractor
Under maintenance

Pricing

from $2.00 / 1,000 results

πŸŽ™οΈ Extract comprehensive podcast episode data from RSS feeds including metadata, audio information, and episode details. Perfect for podcast aggregation and content research.

Rating

0.0 (0)

Developer

SimplifySME Toolbox

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 1
  • Monthly active users: 0
  • Last modified: 6 days ago

📺 What It Extracts

  • Podcast Metadata: Title, description, link, image, author, language
  • Episode Details: Title, description, link, publication dates
  • Audio Information: Audio URL, type, file size (with automatic fallback)
  • Episode Metadata: Episode number, season, explicit content flag
  • Content Organization: Categories and tags
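
To make the field groups above concrete, here is a minimal Python sketch that pulls a few of them out of a small inline RSS sample using only the standard library. It is an illustration under assumptions, not the actor's code: the actor is documented as using rss-parser, and the sample feed, the `extract` function, and the choice of iTunes tags are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix for the iTunes podcast tags (duration, episode, season, explicit).
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"

# Hypothetical one-episode feed, shaped to match the output sample in this README.
SAMPLE_FEED = """
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Podcast Title</title>
    <link>https://example.com/podcast</link>
    <language>en</language>
    <item>
      <title>Episode Title</title>
      <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
      <enclosure url="https://example.com/episode.mp3" type="audio/mpeg" length="45678901"/>
      <itunes:duration>45:30</itunes:duration>
      <itunes:episode>1</itunes:episode>
      <itunes:explicit>false</itunes:explicit>
      <category>Technology</category>
    </item>
  </channel>
</rss>
""".strip()


def text(node, tag):
    """Return the stripped text of a child element, or None if it is absent or empty."""
    child = node.find(tag)
    return child.text.strip() if child is not None and child.text else None


def extract(feed_xml):
    """Collect podcast metadata and per-episode fields from an RSS 2.0 document."""
    channel = ET.fromstring(feed_xml).find("channel")
    podcast = {key: text(channel, key) for key in ("title", "link", "language")}
    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        attrs = enclosure.attrib if enclosure is not None else {}
        length = attrs.get("length", "")
        number = text(item, ITUNES + "episode") or ""
        episodes.append({
            "title": text(item, "title"),
            "pubDate": text(item, "pubDate"),
            "duration": text(item, ITUNES + "duration"),
            "episode": int(number) if number.isdigit() else None,
            "explicit": (text(item, ITUNES + "explicit") or "").lower() in ("true", "yes"),
            "audioUrl": attrs.get("url"),
            "audioType": attrs.get("type"),
            # The enclosure length attribute is the file size in bytes.
            "audioLength": int(length) if length.isdigit() else None,
            "categories": [c.text for c in item.findall("category") if c.text],
        })
    return {"podcast": podcast, "episodes": episodes, "totalEpisodes": len(episodes)}


result = extract(SAMPLE_FEED)
```

The sketch handles RSS 2.0 only; Atom feeds use different element names and would need their own mapping.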

🚀 Key Features

| Feature | Description |
| --- | --- |
| 🎵 Audio Length Detection | Extracts audio file size from RSS or HTTP headers |
| ⏱️ Duration Parsing | Converts duration strings to seconds (HH:MM:SS, MM:SS, etc.) |
| 📊 Structured Output | Clean JSON format with all episode data |
| 🔄 Smart Fallbacks | Fetches audio length from HTTP headers if not in RSS |
| ⚡ Concurrent Processing | Efficient processing of multiple episodes |
| 📈 Complete Metadata | Extracts all available podcast and episode information |

📥 Input

Required

  • rssUrl (string): The podcast RSS feed URL
    • Example: "https://feeds.example.com/podcast.xml"
    • Supports standard RSS and Atom feed formats

📤 Output

Returns comprehensive podcast data:

Podcast Metadata

{
  "podcast": {
    "title": "Podcast Title",
    "description": "Podcast description...",
    "link": "https://example.com/podcast",
    "image": "https://example.com/podcast-image.jpg",
    "author": "Author Name",
    "language": "en"
  }
}

Episodes Array

{
  "episodes": [
    {
      "title": "Episode Title",
      "description": "Episode description...",
      "link": "https://example.com/episode",
      "pubDate": "Mon, 01 Jan 2024 10:00:00 GMT",
      "isoDate": "2024-01-01T10:00:00.000Z",
      "duration": "45:30",
      "durationSeconds": 2730,
      "episode": 1,
      "season": 1,
      "explicit": false,
      "author": "Author Name",
      "image": "https://example.com/episode-image.jpg",
      "audioUrl": "https://example.com/episode.mp3",
      "audioType": "audio/mpeg",
      "audioLength": 45678901,
      "audioLengthMB": 43.56,
      "categories": ["Technology", "Business"]
    }
  ],
  "totalEpisodes": 10
}
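
As a rough illustration of consuming this output, the Python sketch below loads a trimmed copy of the sample and totals listening time and download size. The `bytes_to_mb` and `summarize` helpers are hypothetical. The 1024 × 1024 divisor is an assumption inferred from the sample values (45678901 bytes ≈ 43.56 MB), not a documented rule.

```python
import json

# A trimmed copy of the sample output (one episode, fewer fields).
result = json.loads("""
{
  "episodes": [
    {
      "title": "Episode Title",
      "durationSeconds": 2730,
      "audioLength": 45678901,
      "audioLengthMB": 43.56
    }
  ],
  "totalEpisodes": 1
}
""")


def bytes_to_mb(size_bytes):
    """Assumed conversion: 1 MB = 1024 * 1024 bytes, rounded to two decimals."""
    return round(size_bytes / (1024 * 1024), 2)


def summarize(data):
    """Total listening time (hours) and download size (MB) across all episodes."""
    episodes = data["episodes"]
    # Missing or null values count as zero so incomplete episodes do not break the sum.
    total_seconds = sum(e.get("durationSeconds") or 0 for e in episodes)
    total_bytes = sum(e.get("audioLength") or 0 for e in episodes)
    return {
        "episodes": len(episodes),
        "hours": round(total_seconds / 3600, 2),
        "megabytes": bytes_to_mb(total_bytes),
    }


summary = summarize(result)
```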

💡 Use Cases

  • ✅ Podcast Aggregation - Collect episodes from multiple podcasts (one run per feed URL)
  • ✅ Content Research - Analyze podcast content and topics
  • ✅ Episode Tracking - Monitor new episodes and updates
  • ✅ Analytics - Track release cadence, episode durations, and file sizes
  • ✅ Content Discovery - Find relevant podcast episodes
  • ✅ Media Libraries - Build podcast databases and catalogs

βš™οΈ Technical Details

  • Feed Parser: Uses rss-parser library for robust feed parsing
  • Audio Length Detection:
    • Primary: Extracts from RSS <enclosure> tag
    • Fallback: Performs HTTP HEAD request to get Content-Length header
  • Duration Parsing: Handles multiple formats (HH:MM:SS, MM:SS, seconds)
  • Concurrency: Limits to 5 concurrent HTTP requests for audio length fallback
  • Timeout: 10-second timeout for each HTTP HEAD request
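
The duration formats listed above can be handled with one small loop. This Python sketch is one possible approach, not the actor's actual implementation; the `parse_duration` name and the choice to return None for unparseable input are assumptions.

```python
def parse_duration(value):
    """Convert 'HH:MM:SS', 'MM:SS', or plain seconds to a whole number of seconds.

    Returns None when the value is missing, negative, or not made of integers.
    Fractional seconds are not handled in this sketch.
    """
    if value is None:
        return None
    parts = str(value).strip().split(":")
    if len(parts) > 3:
        return None
    try:
        numbers = [int(part) for part in parts]
    except ValueError:
        return None
    if any(number < 0 for number in numbers):
        return None
    # Fold left to right: each step shifts the running total up by a factor of 60.
    seconds = 0
    for number in numbers:
        seconds = seconds * 60 + number
    return seconds
```

With the sample episode from the output section, `parse_duration("45:30")` gives 2730, which matches its `durationSeconds` value.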

πŸ“ Example Usage

Basic Extraction

{
"rssUrl": "https://feeds.example.com/podcast.xml"
}
{
"rssUrl": "https://feeds.npr.org/510289/podcast.xml"
}
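
For running the actor programmatically, a sketch with the official `apify-client` Python package might look like the following. Several things here are assumptions: `ACTOR_ID` is a placeholder (this page does not state the actor's ID), the token is expected in an `APIFY_TOKEN` environment variable, and results are assumed to land in the run's default dataset.

```python
# Hypothetical placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(rss_url):
    """Build the actor input; rssUrl is the only documented field."""
    return {"rssUrl": rss_url}


def run_extractor(rss_url, token):
    """Start a run, wait for it to finish, and return the stored items as a list."""
    # Imported here so the module loads even where apify-client is not installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(rss_url))
    # Assumption: the actor writes its results to the run's default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example call (needs network access and a valid token):
#   import os
#   items = run_extractor("https://feeds.npr.org/510289/podcast.xml", os.environ["APIFY_TOKEN"])
```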

⚠️ Important Notes

  • Audio Length: If not available in RSS feed, the actor will attempt to fetch it from the audio URL's HTTP headers
  • Duration Formats: Supports HH:MM:SS, MM:SS, and seconds-only formats
  • Concurrency: Audio length fallback requests are limited to 5 concurrent requests
  • Timeout: Each HTTP HEAD request has a 10-second timeout
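
The fallback behaviour described in these notes can be sketched in Python with a semaphore that caps the number of in-flight HEAD requests. This shows the general technique, not the actor's code. The function names are hypothetical, and the `fetch` parameter exists only so the lookup can be swapped out for testing.

```python
import asyncio
import http.client
import urllib.request

MAX_CONCURRENT = 5    # concurrency limit stated in the notes above
TIMEOUT_SECONDS = 10  # per-request timeout stated in the notes above


def head_content_length(url):
    """Blocking HEAD request; returns Content-Length as an int, or None on any failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            value = response.headers.get("Content-Length")
            return int(value) if value and value.isdigit() else None
    except (OSError, ValueError, http.client.HTTPException):
        # Covers network errors, timeouts, HTTP error statuses, and malformed URLs.
        return None


async def fill_audio_lengths(episodes, fetch=head_content_length):
    """Fill missing audioLength values, with at most MAX_CONCURRENT lookups at once."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)

    async def fill(episode):
        if episode.get("audioLength") or not episode.get("audioUrl"):
            return  # the feed already supplied a size, or there is no URL to probe
        async with semaphore:
            # Run the blocking request in a worker thread (Python 3.9+).
            episode["audioLength"] = await asyncio.to_thread(fetch, episode["audioUrl"])

    await asyncio.gather(*(fill(episode) for episode in episodes))
    return episodes
```

Episodes whose size cannot be determined end up with `audioLength` set to None, so downstream code should treat the field as optional.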