
Algolia Webcrawler
Crawls a website using one or more sitemaps and imports the data into an Algolia search index. The text content is extracted using simple CSS selectors.
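For reference, an actor input might look like the sketch below. The field layout follows the config.json format described in the algolia-webcrawler package README; the app name, credentials, index name, and URLs here are placeholders, not working values.

```json
{
    "app": "my-app",
    "cred": {
        "appid": "YourAlgoliaAppId",
        "apikey": "YourAlgoliaAdminApiKey"
    },
    "index": {
        "name": "my-index"
    },
    "sitemaps": [
        { "url": "https://example.com/sitemap.xml", "lang": "en" }
    ]
}
```

Note that, per the algolia-webcrawler docs quoted in main.js below, sub-sitemaps are not crawled, so every sitemap must be listed explicitly.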
Rating: 0.0 (0)
Pricing: Pay per usage
Total users: 4
Monthly users: 80
Runs succeeded: 0%
Last modified: 4 years ago
Dockerfile
FROM apify/actor-node-basic

# First, copy package.json since it affects NPM install
COPY package.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && npm list \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Lastly, copy remaining files and directories with the source code.
# This way, a quick build will not need to reinstall packages on a simple change.
COPY . ./

# Specify how to run the source code
CMD npm start
main.js
const fs = require('fs');
const tmp = require('tmp');
const Apify = require('apify');

// Hack to circumvent strange error exit code masking in algolia-webcrawler
// (see https://github.com/DeuxHuitHuit/algolia-webcrawler/blob/master/app.js#L29)
process.on('exit', (code) => {
    console.log('Exiting the process with code ' + code);
    process.exit(code);
});

(async function () {
    try {
        // Get input of your actor
        const input = await Apify.getValue('INPUT');
        console.log('Input fetched:');
        console.dir(input);

        // From algolia-webcrawler docs:
        // "At the bare minimum, you can edit config.json to set a values to the following options:
        // 'app', 'cred', 'indexname' and at least one 'sitemap' object. If you have multiple sitemaps,
        // please list them all: sub-sitemaps will not be crawled."
        if (!input || !input.app || !input.cred || !input.index || !input.sitemaps) {
            console.error('The input must be a JSON config file with fields as required by the algolia-webcrawler package.');
            console.error('For details, see https://www.npmjs.com/package/algolia-webcrawler');
            process.exit(33);
        }

        const tmpobj = tmp.fileSync({ prefix: 'algolia-input-', postfix: '.json' });
        console.log(`Writing input JSON to file ${tmpobj.name}`);
        fs.writeFileSync(tmpobj.name, JSON.stringify(input, null, 2));

        console.log(`Emulating command: node algolia-webcrawler --config ${tmpobj.name}`);
        process.argv[2] = '--config';
        process.argv[3] = tmpobj.name;
        const webcrawler = require('algolia-webcrawler');
    } catch (e) {
        console.error(e.stack || e);
        process.exit(34);
    }
})();
package.json
{
    "name": "my-actor",
    "version": "0.0.1",
    "dependencies": {
        "apify": "^0.14.3",
        "tmp": "^0.1.0",
        "algolia-webcrawler": "^3.2.0"
    },
    "scripts": {
        "start": "node main.js"
    },
    "author": "Me!"
}