Example Hacker News
Deprecated

Pricing

Pay per usage

mtrunkat/example-hacker-news

Developed by

Marek Trunkát

Maintained by Community

Example crawler for news.ycombinator.com built using Apify SDK.

0.0 (0)


Monthly users: 8

1

Last modified: 2 years ago

Dockerfile

# Dockerfile used to build and run this Actor on the Apify platform.
# PuppeteerCrawler needs Chrome and Puppeteer, which this base image provides.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-puppeteer-chrome

# First, copy only package.json and package-lock.json: they are the only files
# that affect "npm install" in the next step, so Docker can cache that layer
# and speed up rebuilds.
COPY --chown=myuser:myuser package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid verbose logging and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --omit=dev --omit=optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code into the container.
# Do this last so that rebuilds are fast when only the source code changes.
COPY --chown=myuser:myuser . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local Node.js inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

main.js

const Apify = require('apify');

Apify.main(async () => {
    // Open the default request queue and enqueue the first URL.
    const requestQueue = await Apify.openRequestQueue();
    const enqueueUrl = async (url) => requestQueue.addRequest({ url });
    await enqueueUrl('https://news.ycombinator.com/');

    // Create the crawler.
    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,

        // To customize the browser, pass `launchContext` here; Apify SDK 1.x
        // replaced the older `launchPuppeteerOptions` option with it.

        // This function is executed for each request.
        // If it throws, the request is retried up to 3 times (the default maxRequestRetries).
        // `page` is Puppeteer's Page object with the requested URL already loaded.
        handlePageFunction: async ({ page, request }) => {
            console.log(`Request ${request.url} succeeded!`);

            // Inject jQuery into the page for easier data extraction.
            await Apify.utils.puppeteer.injectJQuery(page);

            // Extract all posts. This function runs inside the browser context,
            // where `$` is the jQuery instance injected above, so an editor
            // warning about an undefined `$` can be ignored.
            const data = await page.evaluate(() => {
                // Keep only the digits: "1,234 points" -> 1234, "1 point" -> 1, "discuss" -> 0.
                const toNumber = (text) => Number(text.replace(/[^0-9]/g, ''));
                const posts = [];
                $('.athing').each(function () {
                    const $row = $(this);
                    // Score, author, age and comment count live in the following row.
                    const $subtext = $row.next();
                    // Hacker News renders the story link as `.titleline > a` (formerly `.storylink`).
                    const $link = $row.find('.titleline > a');
                    posts.push({
                        rank: toNumber($row.find('.rank').text()),
                        title: $link.text().trim(),
                        link: $link.attr('href'),
                        domain: $row.find('.sitestr').text().trim(),
                        score: toNumber($subtext.find('.score').text()),
                        author: $subtext.find('.hnuser').text().trim(),
                        posted: $subtext.find('.age').text().trim(),
                        comments: toNumber($subtext.find('a:contains("comment")').text()),
                        url: window.location.href,
                    });
                });
                return posts;
            });

            // Save the extracted posts to the default dataset.
            await Apify.pushData(data);

            // Enqueue the next page. page.$eval() throws when `.morelink` is
            // missing, which means this is the last page.
            const nextHref = await page.$eval('.morelink', (el) => el.href).catch(() => null);
            if (nextHref) {
                await enqueueUrl(nextHref);
            } else {
                console.log(`Url ${request.url} is the last page!`);
            }
        },

        // Executed after a request has failed 4 times (1 attempt + 3 retries).
        handleFailedRequestFunction: async ({ request }) => {
            console.log(`Request ${request.url} failed 4 times`);

            await Apify.pushData({
                url: request.url,
                errors: request.errorMessages,
            });
        },
    });

    // Run the crawler and wait for it to finish.
    await crawler.run();
});
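main.js converts text labels such as "123 points", "1,234 points" or "42 comments" into numbers. Below is a minimal sketch of that cleanup as a standalone function that can be checked in plain Node.js without a browser. `parseCount` is a hypothetical helper, not part of the Actor or the Apify SDK, and it assumes that a label without digits (such as "discuss") means zero.

```javascript
// Hypothetical helper (not part of the Actor or the Apify SDK): turns a
// Hacker News label into a number by keeping only its digits.
function parseCount(label) {
    const digits = String(label).replace(/[^0-9]/g, '');
    // Labels without any digits, e.g. "discuss" or the empty score of a job post, count as 0.
    return digits === '' ? 0 : Number(digits);
}

console.log(parseCount('1,234 points'));     // 1234
console.log(parseCount('1 point'));          // 1
console.log(parseCount('42\u00a0comments')); // 42
console.log(parseCount('discuss'));          // 0
```

Unlike removing the word "points", stripping every non-digit also handles the singular "1 point". The trade-off is that a label containing two separate numbers would have its digits concatenated, so the helper only suits single-count labels like these.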

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "1.0.0"
    },
    "scripts": {
        "start": "node main.js"
    }
}
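Under the hood, main.js relies on a queue-driven loop: take a request, handle the page, enqueue the next page, and retry a failing request before handing it to the failure handler. The following SDK-free sketch illustrates that loop under stated assumptions. `crawl`, `handlePage` and `handleFailed` are hypothetical names, not Apify APIs, and the real PuppeteerCrawler additionally runs requests concurrently and persists its queue.

```javascript
// SDK-free sketch of a queue-driven crawl loop (hypothetical, not an Apify API).
// Assumptions: handlePage(url) resolves to the next page URL or null, and
// maxRetries = 3 mirrors the retry count described in main.js (4 attempts in total).
async function crawl(startUrl, handlePage, handleFailed, maxRetries = 3) {
    const queue = [{ url: startUrl, retryCount: 0, errorMessages: [] }];
    const seen = new Set([startUrl]); // skip URLs that were already enqueued

    while (queue.length > 0) {
        const request = queue.shift();
        try {
            const nextUrl = await handlePage(request.url);
            if (nextUrl && !seen.has(nextUrl)) {
                seen.add(nextUrl);
                queue.push({ url: nextUrl, retryCount: 0, errorMessages: [] });
            }
        } catch (err) {
            request.errorMessages.push(String(err.message));
            if (request.retryCount < maxRetries) {
                request.retryCount += 1;
                queue.push(request); // put the request back and retry it later
            } else {
                await handleFailed(request); // gave up after maxRetries + 1 attempts
            }
        }
    }
}

// Usage: three fake pages chained by "More" links.
const nextPage = { p1: 'p2', p2: 'p3', p3: null };
crawl('p1', async (url) => nextPage[url], async (req) => console.log(`failed: ${req.url}`))
    .then(() => console.log('done'));
```

Keeping a `seen` set is what stops the loop from revisiting a page if a "More" link ever points back to an earlier URL.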

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage.