Actor picture

Product Hunt User Finder

jancurn/product-hunt-user-finder

Scrapes details on users on Product Hunt based on a provided list o Twitter usernames.

No credit card required

Author's avatarJan Čurn
  • Modified
  • Users21
  • Runs374
Actor picture
Product Hunt User Finder

Dockerfile

# Dockerfile contains instructions how to build a Docker image that
# will contain all the code and configuration needed to run your actor.
# For a full Dockerfile reference,
# see https://docs.docker.com/engine/reference/builder/

# First, specify the base Docker image. Apify provides the following
# base images for your convenience:
#  apify/actor-node-basic (Node.js on Alpine Linux, small and fast)
#  apify/actor-node-chrome (Node.js + Chrome on Debian)
#  apify/actor-node-chrome-xvfb (Node.js + Chrome + Xvfb on Debian)
# For more information, see https://docs.apify.com/actor/build#base-images
# Note that you can use any other image from Docker Hub.
FROM apify/actor-node-basic

# Second, copy just package.json since it should be the only file
# that affects "npm install" in the next step, to speed up the build
COPY package.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && npm list || true \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start

INPUT_SCHEMA.json

{
    "title": "Schema for the apify/phantomjs-extractor actor",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "twitterUsernames": {
            "title": "Twitter usernames",
            "type": "string",
            "description": "List of comma or whitespace-separated Twitter usernames to look for.",
            "editor": "textarea",
            "prefill": "jancurn elonmusk dummy_nonexistent_username"
        },
        "proxyConfiguration": {
            "title": "Proxy configuration",
            "type": "object",
            "nullable": false,
            "description": "Specifies the type of proxy servers that will be used by the crawler in order to hide its origin.",
            "editor": "proxy"
        }
    },
    "required": [
        "twitterUsernames"
    ]
}

README.md

# Find Product Hunt account details

This simple actor takes a list of Twitter usernames on input,
and finds and extracts details of the corresponding user accounts
on [Product Hunt](https://www.producthunt.com/).

For example, this is useful if you are launching on Product Hunt
and want to find out which of your existing users or Twitter followers
have an account on Product Hunt and how much of a following they have there.

main.js

const _ = require('underscore');
const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();

    const { proxyConfiguration, twitterUsernames } = input;

    const matches = [...twitterUsernames.matchAll(/[^ \n\t,]+/g)];
    if (matches.length === 0) {
        throw new Error('There are not Twitter usernames to process');
    }

    const sources = [];
    let i = 0;
    for (const match of matches) {
        const username = match[0];
        sources.push({
            url: `https://www.producthunt.com/frontend/users/${username}?id=${username}`,
            userData: {
                username,
                index: i,
            },
            uniqueKey: `${i}`,
        });
        i++;
    }

    const requestList = await Apify.openRequestList('my-list', sources);
    
    // Helper function to generate the URL
    const generateProxyUrl = () => {
        if (!proxyConfiguration) {
            return null;
        }
        if (proxyConfiguration.useApifyProxy) {
            return Apify.getApifyProxyUrl({
                session: _.random(1000000),
                ...proxyConfiguration,
            });
        }
        if (proxyConfiguration.proxyUrls && proxyConfiguration.proxyUrls.length > 0) {
            return proxyConfiguration.proxyUrls[_.random(proxyConfiguration.proxyUrls.length)];
        }
        return null;
    }
    
    const crawler = new Apify.BasicCrawler({
        requestList,
        // Avoid overloading of Product Hunt API
        maxConcurrency: 20,
        handleRequestFunction: async ({ request }) => {
            const proxyUrl = generateProxyUrl();
            console.log(`Processing ${request.url} via proxy ${proxyUrl}...`);

            const { body, statusCode } = await Apify.utils.requestAsBrowser({ url: request.url });

            let result;
            if (statusCode === 200) {
                result = {
                    original_index: request.userData.index,
                    original_twitter_username: request.userData.username,
                    response_status_code: statusCode,
                    ...JSON.parse(body).user,
                };
            } else if (statusCode === 404) {
                result = {
                    original_index: request.userData.index,
                    original_twitter_username: request.userData.username,
                    response_status_code: statusCode,
                };
            } else {
                throw new Error(`Received response status ${statusCode}: ${body}`);
            }

            await Apify.pushData(result);
        },
    });

    await crawler.run();
});

package.json

{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^0.20.4"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}