Sort Dataset Items
- lukaskrivka/sort-dataset-items
- Users 3
- Runs 132
- Created by
Lukáš Křivka
Add this actor as a webhook to your scraper to sort its dataset by the index field.
.editorconfig
root = true
[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
.eslintrc
{
    "extends": "@apify"
}
.gitignore
# This file tells Git which files shouldn't be added to source control
.idea
node_modules
Dockerfile
# First, specify the base Docker image. You can read more about
# the available images at https://sdk.apify.com/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:16
# Second, copy just package.json and package-lock.json since they should be
# the only files that affect "npm install" in the next step, to speed up the build
COPY package*.json ./
# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
 && npm install --only=prod --no-optional \
 && echo "Installed NPM packages:" \
 && (npm list --only=prod --no-optional --all || true) \
 && echo "Node.js version:" \
 && node --version \
 && echo "NPM version:" \
 && npm --version
# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./
# Optionally, specify how to launch the source code of your actor.
# By default, Apify's base Docker images define the CMD instruction
# that runs the Node.js source code using the command specified
# in the "scripts.start" section of the package.json file.
# In short, the instruction looks something like this:
#
# CMD npm start
INPUT_SCHEMA.json
{
    "title": "Input schema for the apify_project actor.",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "datasetId": {
            "title": "Dataset Id",
            "type": "string",
            "description": "Dataset Id of dataset to sort. You don't need to provide this if you use a webhook.",
            "editor": "textfield"
        }
    },
    "required": []
}
README.md
# Sort dataset items
Dataset items are immutable, so this actor creates a new dataset with the items sorted by their `index` field. This is useful if you need the output to keep the same order as your Start URLs.
1. Make sure that items from your scraper in your original dataset have an `index` property, e.g.
```javascript
{
    index: 1,
    price: 2.35,
    title: 'my product',
    // your other data
}
```
2. Add this actor as a webhook to your scraper.
URL: https://api.apify.com/v2/acts/lukaskrivka~sort-dataset-items/runs?token=YOUR_TOKEN (replace with your real token)
3. After your scraper finishes, it will automatically launch this actor. Once this actor finishes, it will produce a new dataset with the sorted items.
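
The steps above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the actor's actual code path: `withIndexes`, `sortByIndex` and `resolveDatasetId` are hypothetical helper names, carrying the position in `userData` is one possible way for a scraper to remember it, and the sample items are made up. The comparator and the `resource.defaultDatasetId` lookup mirror what `main.js` does.

```javascript
// Hypothetical helper: tag every start URL with its position so the scraper
// can copy that position into the `index` field of the item it saves.
const withIndexes = (startUrls) => startUrls.map((url, index) => ({
    url,
    userData: { index },
}));

// Same comparator as in main.js, applied to a copy so the input is not mutated.
const sortByIndex = (items) => [...items].sort((a, b) => a.index - b.index);

// Reads the dataset ID from a webhook payload (`resource.defaultDatasetId`)
// or from a direct input (`datasetId`), mirroring main.js.
const resolveDatasetId = ({ resource, datasetId } = {}) => (
    resource ? resource.defaultDatasetId : datasetId
);

const requests = withIndexes(['https://example.com/a', 'https://example.com/b']);

// Pages finish at different times, so items are often stored out of order.
const scraped = [
    { index: 1, title: 'second product' },
    { index: 0, title: 'first product' },
];
const sorted = sortByIndex(scraped);

console.log(requests.map((request) => request.userData.index)); // [ 0, 1 ]
console.log(sorted.map((item) => item.title)); // [ 'first product', 'second product' ]
console.log(resolveDatasetId({ resource: { defaultDatasetId: 'abc123' } })); // abc123
```

Whether the indexes start at 0 or 1 does not matter for the result, as long as they are numbers that reflect the desired order.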
### Memory
If you need to sort a large dataset, you might need to increase the memory of this actor. You can change the default memory by creating a task from the actor and setting the memory there.
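
If you go the task route, the webhook should point at the task instead of the actor. The sketch below builds such a URL. It assumes the task is started through the `actor-tasks` run endpoint of the Apify API; `buildTaskWebhookUrl` is a hypothetical helper, and the task name and token are placeholders you would replace with your own.

```javascript
// Hypothetical helper: builds the webhook URL that starts a task instead of the actor.
// `taskId` is a placeholder such as 'YOUR_USERNAME~sort-large-dataset'.
const buildTaskWebhookUrl = (taskId, token) => {
    const url = new URL(`https://api.apify.com/v2/actor-tasks/${taskId}/runs`);
    url.searchParams.set('token', token);
    return url.toString();
};

const webhookUrl = buildTaskWebhookUrl('YOUR_USERNAME~sort-large-dataset', 'YOUR_TOKEN');
console.log(webhookUrl);
```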
apify.json
{
    "env": { "npm_config_loglevel": "silent" }
}
main.js
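const Apify = require('apify');

// Sorts the items in ascending order by their numeric `index` field.
// It is passed to the dedup-datasets actor as `preDedupTransformFunction`.
const transformFunction = (items) => {
    items.sort((a, b) => {
        return a.index - b.index;
    });
    return items;
};

Apify.main(async () => {
    // Get the actor input: either a webhook payload or a direct input with `datasetId`.
    const input = await Apify.getInput();
    console.log('Input:');
    console.dir(input);

    let {
        // either called from webhook or directly
        resource,
        datasetId,
    } = input;

    // When called from a webhook, use the default dataset of the finished run.
    if (resource) {
        datasetId = resource.defaultDatasetId;
    }

    const actorInput = {
        datasetIds: [datasetId],
        preDedupTransformFunction: transformFunction,
    };

    // Hand this run over to the dedup-datasets actor with the input built above.
    await Apify.metamorph(
        'lukaskrivka/dedup-datasets',
        actorInput,
    );
});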
package.json
{
    "name": "project-empty",
    "version": "0.0.1",
    "description": "This is a boilerplate of an Apify actor.",
    "dependencies": {
        "apify": "^2.0.7"
    },
    "devDependencies": {
        "@apify/eslint-config": "^0.1.3",
        "eslint": "^7.0.0"
    },
    "scripts": {
        "start": "node main.js",
        "lint": "./node_modules/.bin/eslint ./src --ext .js,.jsx",
        "lint:fix": "./node_modules/.bin/eslint ./src --ext .js,.jsx --fix",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}