Add appendDatasetIds to input. This is useful in transform functions when you need to check which dataset each item comes from.
Add a diff function to the transform function's second parameter object. It uses the fast-diff package and can be used to compare string fields between two datasets.
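As a rough illustration of the diff helper, the sketch below assumes a transform function of the form (items, { diff }) => items, and that diff follows the fast-diff convention of returning [operation, text] pairs, where the operation is -1 for a deletion, 0 for unchanged text, and 1 for an insertion. The field names oldTitle and newTitle, the function name flagChangedTitles, and the stand-in fakeDiff are hypothetical and exist only so the snippet runs on its own.

```javascript
// Hypothetical transform function: flags items whose two string fields differ.
// `diff` is assumed to behave like fast-diff: diff(a, b) -> [[op, text], ...]
// with op = -1 (delete), 0 (equal), 1 (insert).
const flagChangedTitles = (items, { diff }) => {
    return items.map((item) => {
        const changes = diff(item.oldTitle, item.newTitle)
            .filter(([op]) => op !== 0);
        return { ...item, titleChanged: changes.length > 0, changes };
    });
};

// Minimal stand-in for local testing only; it is NOT the real fast-diff algorithm.
// It reports the whole string as deleted and re-inserted when the inputs differ.
const fakeDiff = (a, b) => (a === b ? [[0, a]] : [[-1, a], [1, b]]);

const result = flagChangedTitles(
    [
        { oldTitle: 'Blue shirt', newTitle: 'Blue shirt' },
        { oldTitle: 'Blue shirt', newTitle: 'Red shirt' },
    ],
    { diff: fakeDiff },
);
console.log(result.map((item) => item.titleChanged)); // [ false, true ]
```

Inside the Actor, the diff provided in the second parameter would replace fakeDiff, so the changes array would contain finer-grained edits than this stand-in produces.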

2024-09-10

Features

Add persistedSharedObject to the transform functions' input object. This object is shared across all transform function calls and persists over Actor migrations. It is mainly useful for the dedup-as-loading mode, where the transform functions are called multiple times and each call processes only a chunk of the data.
Add nullAsUnique to input. If set to true, null and missing values are considered unique and are not deduplicated.
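A minimal sketch of how persistedSharedObject might be used to carry state between chunked calls. It assumes the transform function receives the object in its second parameter and that it is a plain mutable object. The function name countItemsAcrossChunks and the properties totalSeen and runningIndex are hypothetical.

```javascript
// Hypothetical transform function for the dedup-as-loading mode.
// Assumption: `persistedSharedObject` is a plain mutable object that the Actor
// passes to every call and restores after a migration.
const countItemsAcrossChunks = (items, { persistedSharedObject }) => {
    const state = persistedSharedObject;
    state.totalSeen = (state.totalSeen || 0) + items.length;
    // Tag each item with a running index that stays unique across chunks.
    const start = state.totalSeen - items.length;
    return items.map((item, i) => ({ ...item, runningIndex: start + i }));
};

// Local simulation: the same object is handed to two separate chunk calls.
const shared = {};
const chunkA = countItemsAcrossChunks(
    [{ id: 'a' }, { id: 'b' }],
    { persistedSharedObject: shared },
);
const chunkB = countItemsAcrossChunks(
    [{ id: 'c' }],
    { persistedSharedObject: shared },
);
console.log(shared.totalSeen, chunkB[0].runningIndex); // 3 2
```

Because the state lives on the shared object instead of in a local variable, the counter keeps growing across calls, which is the behavior the changelog entry describes for chunked processing.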

2024-07-03

Features

Enable merging all datasets for runs of an Actor or Task with actorOrTaskId, onlyRunsNewerThan, onlyRunsOlderThan input parameters.

2023-07-13

Features

Add customInputData object to input for easy passing of custom values into preDedupTransformFunction and postDedupTransformFunction. It is part of the 2nd parameter object.
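For illustration, a transform function could read a threshold from customInputData as sketched below. The sketch assumes the second parameter exposes the object under the name customInputData, as the entry above states. The function name keepExpensiveItems, the minPrice key, and the item fields are hypothetical.

```javascript
// Hypothetical postDedupTransformFunction that reads a threshold from customInputData.
// Assumption: the second parameter carries the `customInputData` object from input.
const keepExpensiveItems = (items, { customInputData }) => {
    // Fall back to 0 when no custom data or no minPrice was provided.
    const { minPrice = 0 } = customInputData || {};
    return items.filter((item) => item.price >= minPrice);
};

const filtered = keepExpensiveItems(
    [{ name: 'pen', price: 2 }, { name: 'bag', price: 40 }],
    { customInputData: { minPrice: 10 } },
);
console.log(filtered); // [ { name: 'bag', price: 40 } ]
```

Keeping values such as minPrice in customInputData means the same function body can be reused across runs while only the input changes.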

2021-01-24

Features

Added fieldsToLoad to input to increase speed and reduce memory usage if you don't need full items in the output.
Added limit and offset to input so that you can process only slices of a dataset.
Removed uploadSleepMs, as the platform can now handle a much higher upload load.
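The effect of offset, limit, and fieldsToLoad can be pictured with the small stand-alone sketch below. It only mimics the meaning of the three options on an in-memory array and is not the Actor's loading code. The helper name loadSlice and the sample items are hypothetical, and the order in which the Actor applies the options is an assumption.

```javascript
// Illustration only: mimics what offset, limit and fieldsToLoad mean for a list of items.
// This is not the Actor's loading code, which reads datasets from the platform.
const loadSlice = (items, { offset = 0, limit = Infinity, fieldsToLoad } = {}) => {
    // Skip `offset` items, then keep at most `limit` items.
    const slice = items.slice(offset, offset + limit);
    if (!fieldsToLoad) return slice;
    // Keep only the requested fields that are present on each item.
    return slice.map((item) => Object.fromEntries(
        fieldsToLoad.filter((field) => field in item).map((field) => [field, item[field]]),
    ));
};

const sampleItems = [
    { id: 1, name: 'a', html: '<p>long</p>' },
    { id: 2, name: 'b', html: '<p>long</p>' },
    { id: 3, name: 'c', html: '<p>long</p>' },
];
const page = loadSlice(sampleItems, { offset: 1, limit: 1, fieldsToLoad: ['id', 'name'] });
console.log(page); // [ { id: 2, name: 'b' } ]
```

Dropping a large field such as html before processing is where the speed and memory savings mentioned above would come from.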

2021-01-14

Features

outputDatasetId can now also be a dataset name. If a dataset with that name doesn't exist, a new dataset is created.

2020-07-10

Fixes

dedup-as-loading mode now works correctly with Actor migrations. This means that this Actor can finally be used for huge datasets with lower memory.

Features

fields is now optional, which means the Actor does not need to perform deduplication.

Previous updates

Previous updates were not tracked. If you need to find past changes, see the GitHub commits or ask in Issues or on Discord.
