Podcast Episode Extractor
Pricing
from $2.00 / 1,000 results
Extract comprehensive podcast episode data from RSS feeds, including metadata, audio information, and episode details. Suited to podcast aggregation and content research.
Developer

SimplifySME Toolbox
Maintained by Community
What It Extracts
- Podcast Metadata: Title, description, link, image, author, language
- Episode Details: Title, description, link, publication dates
- Audio Information: Audio URL, type, file size (with automatic fallback)
- Episode Metadata: Episode number, season, explicit content flag
- Content Organization: Categories and tags
Key Features
| Feature | Description |
|---|---|
| Audio Length Detection | Extracts audio file size from RSS or HTTP headers |
| Duration Parsing | Converts duration strings to seconds (HH:MM:SS, MM:SS, etc.) |
| Structured Output | Clean JSON format with all episode data |
| Smart Fallbacks | Fetches audio length from HTTP headers if not in RSS |
| Concurrent Processing | Efficient processing of multiple episodes |
| Complete Metadata | Extracts all available podcast and episode information |
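The duration parsing listed above can be illustrated with a small helper. This is a minimal sketch, not the actor's actual source: the function name `parseDurationToSeconds` and the choice to return `null` for unparseable input are assumptions made for illustration.

```typescript
// Hypothetical helper: converts "HH:MM:SS", "MM:SS", or plain seconds to whole seconds.
// Returns null when the value is missing or not in one of those shapes (an assumption).
function parseDurationToSeconds(
  duration: string | number | null | undefined
): number | null {
  if (duration === undefined || duration === null) return null;

  const text = String(duration).trim();
  if (text === "") return null;

  const parts = text.split(":");
  if (parts.length > 3) return null; // more than HH:MM:SS is not handled here

  let total = 0;
  for (const part of parts) {
    // Each component must be a non-negative number such as "45" or "30.5".
    if (!/^\d+(\.\d+)?$/.test(part)) return null;
    // Shift the running total up one unit (seconds -> minutes -> hours).
    total = total * 60 + Number(part);
  }
  return Math.round(total);
}

console.log(parseDurationToSeconds("45:30"));    // MM:SS
console.log(parseDurationToSeconds("01:02:03")); // HH:MM:SS
console.log(parseDurationToSeconds("2730"));     // seconds only
```

Multiplying the running total by 60 for each component lets one loop cover all three formats without branching on the number of parts.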
Input
Required
- `rssUrl` (string): The podcast RSS feed URL
  - Example: "https://feeds.example.com/podcast.xml"
  - Supports standard RSS and Atom feed formats
Output
Returns comprehensive podcast data:
Podcast Metadata
{
  "podcast": {
    "title": "Podcast Title",
    "description": "Podcast description...",
    "link": "https://example.com/podcast",
    "image": "https://example.com/podcast-image.jpg",
    "author": "Author Name",
    "language": "en"
  }
}
Episodes Array
{
  "episodes": [
    {
      "title": "Episode Title",
      "description": "Episode description...",
      "link": "https://example.com/episode",
      "pubDate": "Mon, 01 Jan 2024 10:00:00 GMT",
      "isoDate": "2024-01-01T10:00:00.000Z",
      "duration": "45:30",
      "durationSeconds": 2730,
      "episode": 1,
      "season": 1,
      "explicit": false,
      "author": "Author Name",
      "image": "https://example.com/episode-image.jpg",
      "audioUrl": "https://example.com/episode.mp3",
      "audioType": "audio/mpeg",
      "audioLength": 45678901,
      "audioLengthMB": 43.56,
      "categories": ["Technology", "Business"]
    }
  ],
  "totalEpisodes": 10
}
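For consumers of this output, the sample above can be described with TypeScript types. This is a sketch inferred only from the sample: the interface names are hypothetical, and marking most episode fields as optional is an assumption (a feed may omit them). The `bytesToMegabytes` helper is also illustrative; it divides by 1,048,576, which is the divisor that reproduces the sample's `audioLength` and `audioLengthMB` pair.

```typescript
// Hypothetical types inferred from the sample output; not an official schema.
interface PodcastMetadata {
  title: string;
  description?: string;
  link?: string;
  image?: string;
  author?: string;
  language?: string;
}

interface Episode {
  title: string;
  description?: string;
  link?: string;
  pubDate?: string;
  isoDate?: string;
  duration?: string;
  durationSeconds?: number;
  episode?: number;
  season?: number;
  explicit?: boolean;
  author?: string;
  image?: string;
  audioUrl?: string;
  audioType?: string;
  audioLength?: number;   // bytes
  audioLengthMB?: number; // rounded to two decimals in the sample
  categories?: string[];
}

// Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes), rounded to two decimals.
function bytesToMegabytes(bytes: number): number {
  return Math.round((bytes / (1024 * 1024)) * 100) / 100;
}

// Sums the known episode durations, skipping episodes without durationSeconds.
function totalDurationSeconds(episodes: Episode[]): number {
  return episodes.reduce((sum, ep) => sum + (ep.durationSeconds ?? 0), 0);
}

const sample: Episode = {
  title: "Episode Title",
  durationSeconds: 2730,
  audioLength: 45678901,
};
console.log(bytesToMegabytes(sample.audioLength ?? 0)); // 43.56, matching the sample
console.log(totalDurationSeconds([sample]));
```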
Use Cases
- Podcast Aggregation: Collect episodes from multiple podcasts
- Content Research: Analyze podcast content and topics
- Episode Tracking: Monitor new episodes and updates
- Analytics: Track podcast performance and metrics
- Content Discovery: Find relevant podcast episodes
- Media Libraries: Build podcast databases and catalogs
Technical Details
- Feed Parser: Uses the `rss-parser` library for robust feed parsing
- Audio Length Detection:
  - Primary: Extracts from the RSS `<enclosure>` tag
  - Fallback: Performs an HTTP HEAD request to get the `Content-Length` header
- Duration Parsing: Handles multiple formats (HH:MM:SS, MM:SS, seconds)
- Concurrency: Limits to 5 concurrent HTTP requests for audio length fallback
- Timeout: 10-second timeout for each HTTP HEAD request
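The audio length detection described above (enclosure value first, HEAD request as fallback, at most 5 requests in flight, 10-second timeout) could be sketched roughly as follows. This is an illustration under stated assumptions, not the actor's code: `headContentLength`, `resolveAudioLengths`, and the `EnclosureInfo` shape are hypothetical names, and the sketch assumes a runtime with global `fetch` and `AbortSignal.timeout` (for example Node.js 18 or later).

```typescript
// Hypothetical shape: the audio URL plus whatever length the feed's <enclosure> reported.
interface EnclosureInfo {
  audioUrl: string;
  enclosureLength?: string | number;
}

type HeadFetcher = (url: string, timeoutMs: number) => Promise<number | null>;

// Fallback: ask the server for Content-Length via HEAD; any failure or timeout yields null.
async function headContentLength(url: string, timeoutMs: number): Promise<number | null> {
  try {
    const response = await fetch(url, {
      method: "HEAD",
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!response.ok) return null;
    const header = response.headers.get("content-length");
    const length = header === null ? NaN : Number(header);
    return Number.isFinite(length) && length > 0 ? length : null;
  } catch {
    return null;
  }
}

// Resolves a length for each item, running at most `concurrency` fallback requests at once.
async function resolveAudioLengths(
  items: EnclosureInfo[],
  fetchLength: HeadFetcher = headContentLength,
  concurrency = 5,
  timeoutMs = 10_000
): Promise<(number | null)[]> {
  const results: (number | null)[] = new Array(items.length).fill(null);
  let next = 0;

  // Each worker repeatedly claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      const item = items[index];
      const fromFeed = Number(item.enclosureLength);
      if (Number.isFinite(fromFeed) && fromFeed > 0) {
        results[index] = fromFeed; // primary: value from the feed
      } else {
        results[index] = await fetchLength(item.audioUrl, timeoutMs); // fallback
      }
    }
  }

  const workerCount = Math.min(concurrency, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Passing the fetcher as a parameter keeps the concurrency logic testable without network access; a fixed pool of workers is one of several ways to cap in-flight requests.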
Example Usage
Basic Extraction
{"rssUrl": "https://feeds.example.com/podcast.xml"}
Popular Podcast Feeds
{"rssUrl": "https://feeds.npr.org/510289/podcast.xml"}
Important Notes
- Audio Length: If not available in RSS feed, the actor will attempt to fetch it from the audio URL's HTTP headers
- Duration Formats: Supports HH:MM:SS, MM:SS, and seconds-only formats
- Concurrency: Audio length fallback requests are limited to 5 concurrent requests
- Timeout: Each HTTP HEAD request has a 10-second timeout