# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic

# Copy just package.json and package-lock.json first, since they should be
# the only files that affect "npm install" in the next step; this speeds up the build
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy source code to container
# Do this in the last step, so that builds stay fast when only the source code has changed
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.22.4"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');

Apify.main(async () => {
    // Apify.getInput() returns null when no input is provided, so fall back to an empty list.
    const input = await Apify.getInput();
    const links = (input && input.links) || [];

    const requestQueue = await Apify.openRequestQueue('google-podcasts');
    const dataset = await Apify.openDataset('google-podcasts');
    const output = [];

    // Append a timestamp to the uniqueKey so that a URL already present
    // in the named queue from an earlier run is enqueued again.
    for (const link of links) {
        await requestQueue.addRequest({
            url: link,
            uniqueKey: link + new Date().toString(),
        });
    }

    const crawler = new Apify.CheerioCrawler({
        requestQueue,

        // The crawler downloads and processes the web pages in parallel, with a concurrency
        // automatically managed based on the available system memory and CPU (see AutoscaledPool class).
        // Here we define some hard limits for the concurrency.
        minConcurrency: 10,
        maxConcurrency: 50,

        // On error, retry each page at most once.
        maxRequestRetries: 1,

        // Increase the timeout for processing of each page.
        handlePageTimeoutSecs: 30,

        // Limit each crawl to 10 requests.
        maxRequestsPerCrawl: 10,

        // This function will be called for each URL to crawl.
        // It accepts a single parameter, an object whose properties are described at:
        // https://sdk.apify.com/docs/typedefs/cheerio-crawler-options#handlepagefunction
        // For demonstration, we use only 2 of them:
        // - request: an instance of the Request class with information such as URL and HTTP method
        // - $: the cheerio object containing parsed HTML
        handlePageFunction: async ({ request, $ }) => {
            console.log('Handling ' + request.url);

            // Matches a "/feed/..." path up to and including the closing double quote.
            const pattern = /\/feed\/[\w\/?=&;]+"/;
            const match = $.html().match(pattern); // Only the first matching link is used

            if (match !== null) {
                // Take the matched text and remove the trailing quote symbol (")
                const url_podcast = match[0].replace('"', '');

                const out = {
                    url: request.url,
                    url_podcast,
                };

                await dataset.pushData(out);
                output.push(out);
            }
        },
    });

    await crawler.run();
    console.log(output);

    await Apify.setValue('OUTPUT', output);
});
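
The feed-link extraction inside handlePageFunction can be tried on its own, without the crawler. The following is a minimal sketch, not part of the actor: the helper name extractFeedPath and the sample markup are made up for illustration, and the pattern is assumed to be equivalent to the one in main.js.

```javascript
// Hypothetical helper (not part of the actor): returns the first "/feed/..." path
// that is immediately followed by a double quote in the given HTML, or null.
const FEED_PATTERN = /\/feed\/[\w\/?=&;]+"/;

function extractFeedPath(html) {
    const match = html.match(FEED_PATTERN);
    // The pattern uses the closing quote as a terminator, so drop it from the result.
    return match ? match[0].slice(0, -1) : null;
}

// Made-up sample markup, for illustration only.
const sampleHtml = '<a href="/feed/aHR0cHM6Ly9leGFtcGxl?sa=X&amp;ved=0">Show</a>';

console.log(extractFeedPath(sampleHtml));
console.log(extractFeedPath('<p>no feed link here</p>'));
```

Because the character class allows only word characters and / ? = & ; the match stops at the first character outside that set, so a link containing, for example, a hyphen or a percent sign would not be captured in full.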
