
Actor Testing

pocesar/actor-testing


Test your actors with varying inputs and expected outputs, duplicates, bad output fields, or unexpected log messages using Jasmine

Apify actor testing

Test your actors and tasks with multiple inputs and expected outputs, integrated with the Results checker

Features
Testing
Expected consumption
Reasoning

Features

By leveraging Jasmine, its extensible expect, and the Apify SDK, you can test tasks and actors and check their output for consistency and duplicates.

It pairs well with the monitoring suite for your production runs, but for best results this actor should be run on a schedule.

You can run many tests in parallel or in series (as far as your account memory allows)
You can run tests locally while still accessing platform storage and actors
Abstracts access to two other public actors:
- Results checker
- Duplications checker

Testing

The testing interface will be familiar to anyone who has written Jasmine BDD tests, but it adds Apify-specific async matchers:

({
    it,
    run,
    expectAsync,
    input, // Object containing the current input; you can access customData here
    describe, // describe subsections
    expect, // default Jasmine expect
    _, // lodash, a helper to traverse array items and objects
    moment // Moment.js, to help with dates and time math
}) => {

  // describe is not needed, but it keeps everything tidy
  describe('sub', () => {

    it('should have preconfigured task working', async () => {
        const myTaskResult = await run({
            // actorId: 'actor/from-store', // can use an actorId directly
            taskId: 'myuser/my-task-name',
            input: {
                some: 'extra input' // optional overrides
            },
            options: {
                timeout: 15000 // optional call options
            },
            name: 'should have preconfigured task working'
        });

        // Sync assertions are of limited use here; most expectations belong inside the async assertions below
        expect(myTaskResult.runId).not.toBeEmptyString();

        /**
         * Async assertions call resources on the platform
         */

        // reads the OUTPUT key
        await expectAsync(myTaskResult).withOutput(async ({ contentType, value }) => {
            expect(contentType)
                // withContext gives more information about what you're testing
                .withContext(myTaskResult.format('Body should be utf-8 JSON'))
                .toEqual('application/json; charset=utf-8');

            expect(value).toEqual({ hello: 'world' }, myTaskResult.format('Output body'));
        });

        // reads any key, fails the test if not found
        await expectAsync(myTaskResult).withKeyValueStore(async ({ key, contentType, value }) => {
            expect(value).toEqual({ status: true });
        }, { keyName: 'INPUT' });

        // gets requestQueue information
        await expectAsync(myTaskResult).withRequestQueue(async ({
            // contains everything from RequestQueueInfo
            id, userId, createdAt,
            modifiedAt, accessedAt, expireAt,
            totalRequestCount, handledRequestCount, pendingRequestCount,
            actId, actRunId, hadMultipleClients
        }) => {
            expect(totalRequestCount).toBeGreaterThan(1);
        });

        // check log for errors
        await expectAsync(myTaskResult).withLog((log) => {
            expect(log).not.toContain('ReferenceError');
            expect(log).not.toContain('TypeError');
            expect(log).not.toContain('The function passed to Apify.main() threw an exception');
        });

        // Check for dataset consistency
        await expectAsync(myTaskResult).withChecker(({ runResult, output }) => {
            expect(output.badItemCount).toBe(0);
        }, {
            functionalChecker: () => ({
                myField: (field) => typeof field === 'string'
            })
        });

        // Check for duplicate items
        await expectAsync(myTaskResult).withDuplicates(({ runResult, output }) => {
            expect(output).toEqual({});
        }, {
            taskId: 'myTaskId'
        });
    });

  });
}

All extra Jasmine matchers from https://github.com/JamieMason/Jasmine-Matchers are supported, including the asymmetric matchers. To use the asymmetric ones without the platform's JS editor complaining, access them through global.any[asymmetricMatcher].

The special run parameter lets you run your tasks or actors and returns an accessor for their resources:

const result = await run({
  taskId: 'xxx',   // task, either by ID or as user/task-name
  actorId: 'xxx',  // actor, either by ID or as user/actor-name
  input: {},       // custom input override
  options: {},     // specific memory and timeout options
  nonce: '1',      // additional nonce for tasks running with the same input and options
  name: 'run name' // give the run a name to be able to distinguish between runs
});

Runs are idempotent: the same task with the same input and options runs only once per test. Specify a nonce to force a new run every time.
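One way to picture this idempotence is a cache keyed on the run parameters. The sketch below is hypothetical: stableStringify, runKey, runOnce and the in-memory Map are illustrations, not the actor's internals. It only shows why identical parameters reuse one run while a different nonce yields a new one.

```javascript
// Hypothetical sketch: key runs by their parameters so identical run() calls
// share one platform run, while a different nonce forces a new one.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return `[${value.map(stableStringify).join(',')}]`;
  }
  if (value !== null && typeof value === 'object') {
    // Sort keys so { a, b } and { b, a } produce the same string
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(value[k])}`).join(',')}}`;
  }
  return String(JSON.stringify(value)); // undefined becomes "undefined"
}

// The name parameter is deliberately left out of the key in this sketch
function runKey({ taskId, actorId, input = {}, options = {}, nonce = '' }) {
  return stableStringify({ taskId, actorId, input, options, nonce });
}

const startedRuns = new Map();

// startRun stands in for whatever actually calls the platform (hypothetical)
async function runOnce(params, startRun) {
  const key = runKey(params);
  if (!startedRuns.has(key)) {
    // Store the promise itself so concurrent callers share the same run
    startedRuns.set(key, startRun(params));
  }
  return startedRuns.get(key);
}
```

Storing the promise rather than the finished result means two tests that request the same run at the same moment still trigger only one platform call.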

Matchers

These async matchers are lazy and only evaluated when you use them. Call expectAsync() on the result returned by the run function. The matchers abstract many common platform API calls. Callbacks can be plain or async functions; they are awaited either way.
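Accepting both plain and async callbacks works because awaiting a non-promise value simply yields that value. A minimal sketch, where runCallback is a hypothetical helper rather than the actor's actual code:

```javascript
// Hypothetical helper: call a user callback and await whatever it returns.
// Works for plain and async callbacks alike, because `await` on a
// non-Promise value just yields that value.
async function runCallback(callback, payload) {
  // A synchronous throw (e.g. a failed expectation) inside an async function
  // becomes a rejected Promise, just like an async callback's rejection.
  return await callback(payload);
}
```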

You also have full access to the Apify variable inside your tests.

toHaveStatus(status: 'SUCCEEDED' | 'FAILED' | 'ABORTED' | 'TIMED-OUT')

Checks that the run finished with the given status

withLog((logContent: string) => void)

Runs expectations on the log content of the run
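Instead of one expect() per forbidden message, a withLog() callback could collect every offending pattern at once. findLogProblems below is a hypothetical helper, not part of this actor:

```javascript
// Hypothetical helper for a withLog() callback: return every pattern
// (plain string or RegExp) that occurs in the log content.
// Avoid the `g` flag on RegExps here: it makes test() stateful.
function findLogProblems(logContent, patterns) {
  return patterns.filter((pattern) => (
    pattern instanceof RegExp ? pattern.test(logContent) : logContent.includes(pattern)
  ));
}
```

Inside a test it could back an assertion such as `expect(findLogProblems(log, ['ReferenceError', 'TypeError'])).toEqual([]);`, which lists all offending patterns in a single failure message.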

withDuplicates((result: { runResult: Object, output: Object }) => void, input?: Object)

Ensures that no duplicates are found. You can provide a taskId pointing to a pre-configured Duplications checker task, or provide the full input manually as described in the Duplications checker documentation. By default, any value counted more than 2 times is considered a duplicate.

Returns the OUTPUT of the run, containing an object like this:

{
  // the keys here mean all the values that were found on the target dataset
  "$$": {
    "count": 4,
    "originalIndexes": [
      0,
      12,
      13,
      15
    ],
    "outputIndexes": [
      9,
      10,
      11,
      13
    ]
  },
  "MISSING!": { // this means it's missing or null value
    "count": 8,
    "originalIndexes": [
      1,
      3,
      4,
      6,
      10,
      14,
      16,
      17
    ],
    "outputIndexes": [
      0,
      1,
      2,
      5,
      8,
      12,
      14,
      15
    ]
  },
  "$$$": {
    "count": 4,
    "originalIndexes": [
      2,
      5,
      7,
      8
    ],
    "outputIndexes": [
      3,
      4,
      6,
      7
    ]
  }
}
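To turn such an object into a readable failure message, a withDuplicates() callback might rank the duplicated values. summarizeDuplicates is a hypothetical helper; the minCount default of 3 is an assumption mirroring the "more than 2" default described above.

```javascript
// Hypothetical helper for a withDuplicates() callback: list the values whose
// count reaches minCount, most frequent first, with their dataset indexes.
function summarizeDuplicates(output, minCount = 3) {
  return Object.entries(output)
    .filter(([, entry]) => entry.count >= minCount)
    .map(([value, entry]) => ({ value, count: entry.count, indexes: entry.originalIndexes }))
    .sort((a, b) => b.count - a.count);
}
```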

withChecker((result: { runResult: Object, output: Object }) => void, input: Object, options?: Object)

The input is required: pass at least a taskId pointing to a pre-configured Results checker task, or pass the complete checker input. See the Results checker documentation for details.

The options are passed to Apify.call/Apify.callTask. It returns the OUTPUT of the checker run, containing an object like this:

{
  "totalItemCount": 17,
  "badItemCount": 0,
  "identificationFields": [],
  "badFields": {},
  "extraFields": {},
  "totalFieldCounts": {
    "categories": 17,
    "info": 17,
    "likes": 17,
    "messenger": 17,
    "posts": 17,
    "priceRange": 10,
    "title": 17,
    "pageUrl": 17,
    "address": 17,
    "awards": 17,
    "email": 15,
    "impressum": 17,
    "instagram": 2,
    "phone": 15,
    "products": 17,
    "transit": 4,
    "twitter": 1,
    "website": 16,
    "youtube": 0,
    "mission": 17,
    "overview": 17,
    "payment": 2,
    "checkins": 12,
    "#startedAt": 17,
    "verified": 0,
    "#url": 17,
    "#ref": 17,
    "reviews": 14,
    "#version": 17,
    "#finishedAt": 17
  },
  "badItems": "https://api.apify.com/v2/key-value-stores/_/records/BAD-ITEMS?disableRedirect=true"
}
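Besides badItemCount, totalFieldCounts shows how often each field was filled. A withChecker() callback could flag fields that are rarely present; sparseFields and its 0.5 threshold are hypothetical choices for illustration.

```javascript
// Hypothetical helper: list fields present in fewer than `minRatio` of the items,
// using totalItemCount and totalFieldCounts from the Results checker OUTPUT.
// With totalItemCount = 0 every ratio is NaN, so nothing is reported.
function sparseFields({ totalItemCount, totalFieldCounts }, minRatio = 0.5) {
  return Object.entries(totalFieldCounts)
    .filter(([, count]) => count / totalItemCount < minRatio)
    .map(([field]) => field)
    .sort();
}
```

Comparing the result against a known list, for example `expect(sparseFields(output)).toEqual(['twitter', 'youtube'])`, would catch a field that suddenly stops being scraped.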

withDataset((result: { dataset: Object, info: Object }) => void, options?: Object)

Returns the dataset information and its items. You can optionally pass options to limit the number of items returned, use the unwind parameter, or set any other option available in Dataset getItems.

The dataset object contains:

{
    items: [ [Object] ],
    total: 1,
    offset: 0,
    count: 1,
    limit: 999999999999
}
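When a limit option is passed, the dataset object may hold only one page of items. Assuming the fields follow the usual Apify API pagination convention (total items in the dataset, offset of the page, count of items in the page), a hypothetical describePage helper could check the page:

```javascript
// Hypothetical helper for a withDataset() callback: check that the page is
// internally consistent and whether more items exist beyond it.
function describePage({ items, offset, count, total }) {
  return {
    consistent: items.length === count, // count should match the items returned
    hasMore: offset + count < total, // true when a limit cut the list short
  };
}
```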

The info object contains:

{
    id: '',
    userId: '',
    createdAt: 2020-12-05T18:44:45.041Z,
    modifiedAt: 2020-12-05T18:44:50.515Z,
    accessedAt: 2020-12-05T18:44:50.515Z,
    itemCount: 1,
    cleanItemCount: 1,
    actId: '',
    actRunId: '',
    stats: {
      uploadedBytes: 0,
      downloadedBytes: 0,
      deflatedBytes: 0,
      inflatedBytes: 21,
      s3PutCount: 0,
      s3GetCount: 0,
      s3DeleteCount: 0,
      readCount: 0,
      writeCount: 1
    }
}

N.B.: after the task/actor started with run has finished, this method waits at least 12 seconds to make sure the remote storage is ready to be read.

withOutput((output: { value: any, contentType: string }) => void)

Returns the OUTPUT record of the run. It can have any content type, so check contentType.

withStatistics((stats: Object) => void, options?: { index: number = 0 })

Returns the SDK_CRAWLER_STATISTICS_0 key of the run by default, unless provided with another index in the options.

Returns an object like this:

{
  "requestsFinished": 217,
  "requestsFailed": 99,
  "requestsRetries": 0,
  "requestsFailedPerMinute": 3,
  "requestsFinishedPerMinute": 8,
  "requestMinDurationMillis": 3071,
  "requestMaxDurationMillis": 41800,
  "requestTotalFailedDurationMillis": 686856,
  "requestTotalFinishedDurationMillis": 3161769,
  "crawlerStartedAt": "2020-12-07T05:06:44.107Z",
  "crawlerFinishedAt": null,
  "statsPersistedAt": "2020-12-07T05:34:04.209Z",
  "crawlerRuntimeMillis": 1640402,
  "crawlerLastStartTimestamp": 1607317603807,
  "requestRetryHistogram": [
    316
  ],
  "statsId": 0,
  "requestAvgFailedDurationMillis": 6938,
  "requestAvgFinishedDurationMillis": 14570,
  "requestTotalDurationMillis": 3848625,
  "requestsTotal": 316
}
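These counters make it easy to assert on crawler health. In the example above, 217 finished plus 99 failed requests add up to requestsTotal (316), so roughly 31% of requests failed. crawlerHealth is a hypothetical helper that derives such figures:

```javascript
// Hypothetical helper: derive failure and duration figures from an
// SDK_CRAWLER_STATISTICS_0 object like the one above.
function crawlerHealth(stats) {
  const total = stats.requestsFinished + stats.requestsFailed;
  return {
    total,
    failureRate: total === 0 ? 0 : stats.requestsFailed / total,
    avgFinishedSeconds: stats.requestAvgFinishedDurationMillis / 1000,
  };
}
```

Inside a withStatistics() callback this could back an assertion such as `expect(crawlerHealth(stats).failureRate).toBeLessThan(0.05)`; the threshold is an arbitrary example.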

withKeyValueStore((output: { value: any, contentType: string }) => void, options: { keyName: string })

Returns the content of the selected keyName. The test fails if the key doesn't exist. You can access the INPUT that was used for the run using { keyName: 'INPUT' }

withRequestQueue((requestQueue: Object) => void)

Gives access to the requestQueue object, which contains:

{
    id: '',
    userId: '',
    createdAt: 2020-12-05T18:44:45.048Z,
    modifiedAt: 2020-12-05T18:44:45.048Z,
    accessedAt: 2020-12-05T18:44:45.048Z,
    expireAt: 2021-02-03T18:44:45.048Z,
    totalRequestCount: 0,
    handledRequestCount: 0,
    pendingRequestCount: 0,
    actId: '',
    actRunId: '',
    hadMultipleClients: false
}

N.B.: all of these matchers exist only on expectAsync and must be awaited, for example:

await expectAsync(runResult).withDataset((something) => {
    expect(something).toEqual('here');
});

jasmine.any() and jasmine.anything() can be accessed using global.jasmine

Output

The test results are stored in the key-value store under the OUTPUT key, with the following structure:

{
  "suite2": {
    "id": "suite2",
    "description": "one",
    "fullName": "Actor tests one",
    "failedExpectations": [],
    "deprecationWarnings": [],
    "duration": 26484,
    "properties": null,
    "status": "passed",
    "specs": [
      {
        "id": "spec0",
        "description": "should work",
        "fullName": "Actor tests one should work",
        "failedExpectations": [],
        "passedExpectations": [
          {
            "matcherName": "toHaveStatus",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "toEqual",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "withDataset",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "withRequestQueue",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "withOutput",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "withKeyValueStore",
            "message": "Passed.",
            "stack": "",
            "passed": true
          },
          {
            "matcherName": "withChecker",
            "message": "Passed.",
            "stack": "",
            "passed": true
          }
        ],
        "deprecationWarnings": [],
        "pendingReason": "",
        "duration": 26480,
        "properties": null,
        "status": "passed"
      }
    ]
  },
  "suite3": {
    "id": "suite3",
    "description": "two",
    "fullName": "Actor tests two",
    "failedExpectations": [],
    "deprecationWarnings": [],
    "duration": 21,
    "properties": null,
    "status": "passed",
    "specs": [
      {
        "id": "spec1",
        "description": "works",
        "fullName": "Actor tests two works",
        "failedExpectations": [
          {
            "matcherName": "toBe",
            "message": "Expected true to be false.",
            "stack": "Error: Expected true to be false.\n    at <Jasmine>\n    at listOnTimeout (internal/timers.js:549:17)\n    at processTimers (internal/timers.js:492:7)",
            "passed": false,
            "expected": false,
            "actual": true
          }
        ],
        "passedExpectations": [],
        "deprecationWarnings": [],
        "pendingReason": "",
        "duration": 15,
        "properties": null,
        "status": "failed"
      }
    ]
  }
}
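A scheduled run is most useful when something downstream reads this report. Assuming the flat map of suites shown above, a hypothetical summarizeReport helper could count spec results and collect failure messages, for example to feed a notification:

```javascript
// Hypothetical helper: count spec results in the OUTPUT report
// (an object of suites, each with a `specs` array of Jasmine spec results).
function summarizeReport(report) {
  const summary = { passed: 0, failed: 0, other: 0, failures: [] };
  for (const suite of Object.values(report)) {
    for (const spec of suite.specs || []) {
      if (spec.status === 'passed') {
        summary.passed += 1;
      } else if (spec.status === 'failed') {
        summary.failed += 1;
        summary.failures.push({
          name: spec.fullName,
          messages: spec.failedExpectations.map((e) => e.message),
        });
      } else {
        summary.other += 1; // e.g. pending or excluded specs
      }
    }
  }
  return summary;
}
```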

Expected consumption

This is a very lightweight actor that only orchestrates other actor runs, so it can run with the lowest amount of memory, 128 MB. Running it for an hour should consume around 0.125 CUs. The actors and tasks it starts consume compute units separately.
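Apify measures compute units as memory in gigabytes multiplied by run time in hours, so the estimate works out as:

```latex
\text{CU} = \text{memory (GB)} \times \text{run time (h)} = \frac{128}{1024} \times 1 = 0.125
```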

Reasoning

Automated and integration tests are a must-have for any complex piece of software, and Apify actors are no different. An actor can map one input (or many inputs) to a single output, or it can produce many items through its dataset.

License

Apache 2.0

Developer

Paulo Cesar

Actor metrics

3 monthly users
86.4% runs succeeded
0.0 days response time
Created in Dec 2020
Modified about 1 month ago

Categories

Automation

You might also like these Actors

Content Checker

jakubbalada/content-checker

Monitor a website or web page for content changes. Automatically saves before and after screenshots and sends an email notification when content changes are detected.

Jakub Balada

AI Web Agent

apify/ai-web-agent

Use natural language prompts to browse the web, click on elements, fill and submit forms, extract data, and take screenshots using the OpenAI API.

Apify

345

Merge, Dedup & Transform Datasets

lukaskrivka/dedup-datasets

The ultimate dataset processor. Extremely fast merging, deduplications & transformations all in a single run.

Lukáš Křivka

1.5k

Send Email

apify/send-mail

The actor automatically sends an email to a specific address. This actor is useful for notifications and reporting. With only 3 lines of javascript code, you'll be on top of your scraping actors and never miss important results or issues.

Apify

2.4k

Instagram Followers Count Scraper

apify/instagram-followers-count-scraper

Scrape the number of followers & follows from any Instagram profile. Schedule the scraper to run regularly to monitor how the numbers change. You can also export scraped data, run the scraper via API, monitor runs or integrate with other tools.

Apify

1.9k

Anti Captcha Recaptcha

petr_cermak/anti-captcha-recaptcha

🧰 Actor for solving Google reCAPTCHA using the anti-captcha.com service. You need to have an anti-captcha subscription.

Petr Cermak

1.3k

Page Scraping Analyzer

apify/page-analyzer

Performs analysis of a webpage to figure out the best way how to scrape its data. Provide a URL and data points to find and get back a detailed dashboard showing how the data can be scraped. Works with initial and rendered HTML, JavaScript variables and dynamically loaded data.

Apify

Google Sheets Import & Export

lukaskrivka/google-sheets

Import data from datasets or JSON files to Google Sheets. Programmatically process data in Sheets. Easier and faster than the official Google Sheets API and perfect for importing data from scraping.

Lukáš Křivka

852

Website Checker

lukaskrivka/website-checker

Check any website you plan to scrape for expected Compute unit consumption, anti-scraping software, and reliability.

Lukáš Křivka

725

Dataset Image Downloader & Uploader

lukaskrivka/images-download-upload

Download image files from image URLs in your datasets and save them to a Zip file, Key-Value store, or directly your AWS S3 bucket.

Lukáš Křivka

280

Related articles

Playwright vs. Cypress

Connecting web scrapers: a guide to Actor-to-Actor integrations

Playwright testing: how to write and run E2E tests properly
