Podcasts

Deprecated
This Actor is deprecated

This Actor is unavailable because the developer has decided to deprecate it. Would you like to try a similar Actor instead?


zyberg/podcasts

Gets the URL of the first podcast feed link found on each Google Podcasts page.
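Judging from main.js below, the Actor expects an input object with a "links" array of page URLs to scan for feed links. A hypothetical input (the URL shown is illustrative only) might look like:

```json
{
    "links": [
        "https://podcasts.google.com/feed/aHR0cHM6Ly9leGFtcGxl"
    ]
}
```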

Dockerfile

# This is a template for a Dockerfile used to run Actors in the Apify platform.
# The base image name below is set during the Actor build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic

# Copy just package.json and package-lock.json first, since they are the only
# files that affect "npm install" in the next step; this speeds up repeated builds.
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much, and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code to the container.
# Do this in the last step, so that builds stay fast when only the source code changes.
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.22.4"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();
    const requestQueue = await Apify.openRequestQueue('google-podcasts');
    const dataset = await Apify.openDataset('google-podcasts');
    const output = [];

    // Enqueue every input link. The timestamp in uniqueKey forces the queue
    // to process a URL again even if it was crawled in a previous run.
    for (const link of input.links) {
        await requestQueue.addRequest({
            url: link,
            uniqueKey: link + (new Date()).toString(),
        });
    }

    const crawler = new Apify.CheerioCrawler({
        requestQueue,

        // The crawler downloads and processes the web pages in parallel, with concurrency
        // automatically managed based on the available system memory and CPU (see the AutoscaledPool class).
        // Here we define some hard limits for the concurrency.
        minConcurrency: 10,
        maxConcurrency: 50,

        // On error, retry each page at most once.
        maxRequestRetries: 1,

        // Timeout for processing each page.
        handlePageTimeoutSecs: 30,

        // Limit to 10 requests per crawl.
        maxRequestsPerCrawl: 10,

        // This function is called for each URL to crawl.
        // It accepts a single parameter, an object with options described at:
        // https://sdk.apify.com/docs/typedefs/cheerio-crawler-options#handlepagefunction
        // For demonstration we use only 2 of them:
        // - request: an instance of the Request class with information such as the URL and HTTP method
        // - $: the cheerio object containing the parsed HTML
        handlePageFunction: async ({ request, $ }) => {
            console.log(`Handling ${request.url}`);

            // Match the first "/feed/..." link in the raw HTML, up to and
            // including the closing double quote of the attribute.
            const pattern = /\/feed\/(\w|\/|\?|=|&|;)+"/;
            let urlPodcast = $.html().match(pattern); // the first match, or null

            if (urlPodcast !== null) {
                urlPodcast = urlPodcast[0].replace('"', ''); // take the matched text and strip the trailing quote (")

                const out = {
                    url: request.url,
                    url_podcast: urlPodcast,
                };

                await dataset.pushData(out);
                output.push(out);
            }
        },
    });

    await crawler.run();
    console.log(output);

    await Apify.setValue('OUTPUT', output);
});
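The heart of handlePageFunction is the regular-expression extraction. A minimal standalone sketch of that step is shown below; the sample HTML string is invented for illustration, since in the real Actor the pages are fetched by the crawler:

```javascript
// Extract the first "/feed/..." link from an HTML string, mirroring the
// pattern used in main.js. Returns null when no feed link is present.
function extractFeedPath(html) {
    // Matches "/feed/" followed by word characters and a few URL symbols,
    // up to and including the closing double quote of the href attribute.
    const pattern = /\/feed\/(\w|\/|\?|=|&|;)+"/;
    const match = html.match(pattern);
    return match === null ? null : match[0].replace('"', '');
}

// Invented sample markup, roughly what a podcast page's HTML might contain.
const sampleHtml = '<a href="https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVk?sa=X">Show</a>';

console.log(extractFeedPath(sampleHtml)); // "/feed/aHR0cHM6Ly9mZWVk?sa=X"
console.log(extractFeedPath('<p>no feeds here</p>')); // null
```

Note that the match returns only the path fragment starting at "/feed/", not the absolute URL, which matches the behavior of the Actor above.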