Economist Category Scraper
This Actor is unavailable because the developer has decided to deprecate it.
mtrunkat/economist-category-scraper
An example implementation of an economist.com scraper built using the apify/web-scraper actor. It crawls the latest updates from a given Economist category.
Dockerfile
# Dockerfile contains instructions on how to build a Docker image that
# will contain all the code and configuration needed to run your actor.
# For a full Dockerfile reference,
# see https://docs.docker.com/engine/reference/builder/

# First, specify the base Docker image. Apify provides the following
# base images for your convenience:
#  apify/actor-node-basic (Node.js 10 on Alpine Linux, small and fast)
#  apify/actor-node-chrome (Node.js 10 + Chrome on Debian)
#  apify/actor-node-chrome-xvfb (Node.js 10 + Chrome + Xvfb on Debian)
# For more information, see https://apify.com/docs/actor#base-images
# Note that you can use any other image from Docker Hub.
FROM apify/actor-node-basic

# Second, copy just package.json, since it should be the only file
# that affects the NPM install in the next step.
COPY package.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much, and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && npm list \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Next, copy the remaining files and directories with the source code.
# Since we do this after the NPM install, quick builds will be really fast
# for most source file changes.
COPY . ./

# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
# CMD npm start
INPUT_SCHEMA.json
{
    "title": "My input schema",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "category": {
            "title": "Category",
            "type": "string",
            "description": "Economist.com category to be scraped",
            "editor": "textfield",
            "prefill": "briefing"
        }
    }
}
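main.js uses the `category` field to build both the start URL and a pseudo-URL that matches further listing pages of the same category. The minimal sketch below reproduces that mapping in plain Node.js so it can be checked without running the actor. The helper names `buildStartUrl`, `escapeRegExp` and `categoryPageRegex` are hypothetical, and the regular expression is only an approximation of how pseudo-URLs are matched; Apify's own `PseudoUrl` class may differ in details such as case sensitivity.

```javascript
// Hypothetical helpers (not part of the actor): they mirror how main.js
// turns the "category" input into the start URL and the pseudo-URL pattern.
const BASE = 'https://www.economist.com';

function buildStartUrl(category) {
    if (typeof category !== 'string' || category.trim() === '') {
        throw new Error('Input field "category" must be a non-empty string');
    }
    return `${BASE}/${category}/?page=1`;
}

// Escape regex metacharacters so the category and URL are matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Approximation of the pseudo-URL `${BASE}/${category}/?page=[\d+]`:
// everything outside the brackets is literal, the bracket part is `\d+`.
function categoryPageRegex(category) {
    return new RegExp(`^${escapeRegExp(`${BASE}/${category}/?page=`)}\\d+$`);
}

const re = categoryPageRegex('briefing');
console.log(buildStartUrl('briefing')); // https://www.economist.com/briefing/?page=1
console.log(re.test('https://www.economist.com/briefing/?page=7')); // true
console.log(re.test('https://www.economist.com/leaders/?page=7')); // false
```

Escaping the literal part matters because the URL contains `.` and `?`, which would otherwise act as regex operators.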
main.js
// This is the main Node.js source code file of your actor.
// It is referenced from the "scripts" section of the package.json file.

const Apify = require('apify');

Apify.main(async () => {
    // Get the actor's input. Input fields can be modified in the INPUT_SCHEMA.json file.
    // For more information, see https://apify.com/docs/actor/input-schema
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    // Fail early with a clear message if no category was provided.
    if (!input || !input.category) {
        throw new Error('Input must contain a non-empty "category" field.');
    }

    // Prepare the input for the apify/web-scraper actor. This input is based on
    // the actor task you used as the starting point.
    const metamorphInput = {
        "startUrls": [
            {
                "url": `https://www.economist.com/${input.category}/?page=1`,
                "method": "GET"
            }
        ],
        "useRequestQueue": true,
        "pseudoUrls": [
            {
                "purl": `https://www.economist.com/${input.category}/?page=[\\d+]`,
                "method": "GET"
            }
        ],
        "linkSelector": "a",
        "pageFunction": async function pageFunction(context) {
            // request is an instance of Apify.Request (https://sdk.apify.com/docs/api/request)
            // $ is an instance of jQuery (http://jquery.com/)
            const request = context.request;
            const $ = context.jQuery;
            const pageNum = parseInt(request.url.split('?page=').pop(), 10);

            context.log.info(`Scraping ${request.url}`);

            // Extract all articles.
            const articles = [];
            $('article').each((index, articleEl) => {
                const $articleEl = $(articleEl);

                // The H3 contains 2 child elements: the first is the topic, the second the article title.
                const $h3El = $articleEl.find('h3');

                // Guard against articles without a link so one bad element does not abort the page.
                const linkEl = $articleEl.find('a')[0];

                // Extract additional info and push it to the articles array.
                articles.push({
                    pageNum,
                    topic: $h3El.children().first().text(),
                    title: $h3El.children().last().text(),
                    url: linkEl ? linkEl.href : null,
                    teaser: $articleEl.find('.teaser__text').text(),
                });
            });

            // Return results.
            return articles;
        },
        "proxyConfiguration": {
            "useApifyProxy": true
        },
        "debugLog": false,
        "browserLog": false,
        "injectJQuery": true,
        "injectUnderscore": false,
        "downloadMedia": false,
        "downloadCss": false,
        "ignoreSslErrors": false
    };

    // Now metamorph into the apify/web-scraper actor using the created input.
    await Apify.metamorph('apify/web-scraper', metamorphInput);
});
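The pageFunction derives `pageNum` by splitting the URL on `'?page='`, which yields `NaN` when the parameter is absent. A minimal sketch of a more defensive alternative follows, assuming the WHATWG `URL` class is available where the page function runs (it is global in browsers and in Node.js 10+). `getPageNum` is a hypothetical helper, not part of the actor.

```javascript
// Hypothetical helper: reads the "page" query parameter through the
// WHATWG URL API instead of splitting the raw string on '?page='.
function getPageNum(url) {
    const raw = new URL(url).searchParams.get('page');
    if (raw === null) return null; // no ?page= parameter at all
    const pageNum = Number.parseInt(raw, 10);
    return Number.isNaN(pageNum) ? null : pageNum; // non-numeric value
}

console.log(getPageNum('https://www.economist.com/briefing/?page=3')); // 3
console.log(getPageNum('https://www.economist.com/briefing/')); // null
```

Returning `null` instead of `NaN` keeps the `pageNum` field of each dataset item JSON-friendly, since `JSON.stringify` turns `NaN` into `null` anyway but only after the fact.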
package.json
{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^0.14.5"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}
Developer
Maintained by Community