Datasets Merge

petr_cermak/datasets-merge

This Actor is deprecated: the developer has decided to deprecate it, and it is no longer available to run.
An Actor for merging multiple datasets into one. If you don't provide outputDatasetId, the result is stored in the run's default dataset.
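A minimal example INPUT might look like the following (the dataset IDs and output name below are placeholders for illustration, not real datasets; outputDatasetId can be omitted to use the default dataset):

```json
{
    "datasetIds": ["dataset-id-1", "dataset-id-2"],
    "outputDatasetId": "my-merged-dataset"
}
```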

Dockerfile

# This is a template for a Dockerfile used to run acts in the Actor system.
# The base image name below is set during the act build, based on user settings.
# IMPORTANT: The base image must set a correct working directory, such as /usr/src/app or /home/user
FROM apify/actor-node-basic:v0.21.10

# Copy just package.json and package-lock.json first, since they are the only
# files that affect "npm install" in the next step; this speeds up the build
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much, and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version

# Copy the source code last, so the steps above stay cached
# when only the source code changes
COPY . ./

# NOTE: The CMD is already defined by the base image.
# Uncomment this for local node inspector debugging:
# CMD [ "node", "--inspect=0.0.0.0:9229", "main.js" ]

package.json

{
    "name": "apify-project",
    "version": "0.0.1",
    "description": "",
    "author": "It's not you it's me",
    "license": "ISC",
    "dependencies": {
        "apify": "0.21.10"
    },
    "scripts": {
        "start": "node main.js"
    }
}

main.js

const Apify = require('apify');

// Load all items from a dataset in pages of 10,000 and pass each page to the
// process() callback. The legacy client API may return either a plain array
// or a pagination object with an `items` property, so handle both shapes.
async function loadResults(datasetId, process, offset = 0) {
    const limit = 10000;
    const newItems = await Apify.client.datasets.getItems({
        datasetId,
        offset,
        limit,
    });
    if (newItems && (newItems.length || (newItems.items && newItems.items.length))) {
        if (newItems.length) { await process(newItems); }
        else { await process(newItems.items); }
        await loadResults(datasetId, process, offset + limit);
    }
}

Apify.main(async () => {
    const input = await Apify.getValue('INPUT');
    if (!input.datasetIds || !Array.isArray(input.datasetIds)) {
        throw new Error('Missing or invalid datasetIds attribute in INPUT!');
    }

    // If outputDatasetId is undefined, this opens the run's default dataset.
    const output = await Apify.openDataset(input.outputDatasetId);

    // Append the items of each source dataset to the output dataset in order.
    for (const datasetId of input.datasetIds) {
        await loadResults(datasetId, async (items) => {
            await output.pushData(items);
        });
    }

    console.log('finished');
});
Maintained by Community