# Actor Testing
Test your actors and tasks with multiple inputs and expected outputs. Check for duplicates, bad output fields, or unexpected log messages using Jasmine, integrating with the results checker.
## Features
By leveraging Jasmine, the extensible `expect`, and the Apify SDK, you can test tasks and actors and check their output for consistency and/or duplicates. It goes well with the monitoring suite for your production runs, but for best results this actor should be run on a schedule.
- You can run many tests in parallel or in series (as your account memory allows)
- You can run tests locally while accessing platform storage and actors
- Abstracts access to two other public actors
## Testing
The testing interface will be familiar to anyone who has written Jasmine BDD tests, with Apify-specific async matchers added:
```js
({
    it,
    run,
    expectAsync,
    input, // Object containing the current input, you can access customData here
    describe, // describe subsections
    expect, // default Jasmine expect
    _, // lodash as a helper to traverse array items and objects
    moment, // Moment.js to help with dates and time math
    Apify, // Apify SDK v2
    apifyClient // Apify client v2
}) => {

    // describe is not needed, but it's good to keep everything tidy
    describe('sub', () => {

        it('should have preconfigured task working', async () => {
            const myTaskResult = await run({
                // actorId: 'actor/from-store', // can use an actorId directly
                taskId: 'myuser/my-task-name',
                input: {
                    some: 'extra input' // optional overrides
                },
                options: {
                    timeout: 15000 // optional call options
                },
                name: 'should have preconfigured task working'
            });

            // sync assertions are not very useful; expectations should go inside async assertions
            expect(myTaskResult.runId).not.toBeEmptyString();

            /**
             * Async assertions call resources on the platform
             */

            // reads the OUTPUT key
            await expectAsync(myTaskResult).withOutput(async ({ contentType, value }) => {
                expect(contentType)
                    // withContext gives more information about what you're testing
                    .withContext(myTaskResult.format('Body should be utf-8 JSON'))
                    .toEqual('application/json; charset=utf-8');

                expect(value).toEqual({ hello: 'world' }, myTaskResult.format('Output body'));
            });

            // reads any key, fails the test if not found
            await expectAsync(myTaskResult).withKeyValueStore(async ({ key, contentType, value }) => {
                expect(value).toEqual({ status: true });
            }, { keyName: 'INPUT' });

            // gets requestQueue information
            await expectAsync(myTaskResult).withRequestQueue(async ({
                // contains everything from RequestQueueInfo
                id, userId, createdAt,
                modifiedAt, accessedAt, expireAt,
                totalRequestCount, handledRequestCount, pendingRequestCount,
                actId, actRunId, hadMultipleClients
            }) => {
                expect(totalRequestCount).toBeGreaterThan(1);
            });

            // check log for errors
            await expectAsync(myTaskResult).withLog((log) => {
                expect(log).not.toContain('ReferenceError');
                expect(log).not.toContain('TypeError');
                expect(log).not.toContain('The function passed to Apify.main() threw an exception');
            });

            // check for dataset consistency
            await expectAsync(myTaskResult).withChecker(({ runResult, output }) => {
                expect(output.badItemCount).toBe(0);
            }, {
                functionalChecker: () => ({
                    myField: (field) => typeof field === 'string'
                })
            });

            // check for duplicate items
            await expectAsync(myTaskResult).withDuplicates(({ runResult, output }) => {
                expect(output).toEqual({});
            }, {
                taskId: 'myTaskId'
            });
        });

    });
}
```
Supports all extra Jasmine matchers, including asymmetric matchers from https://github.com/JamieMason/Jasmine-Matchers. To access `any` without the JS editor complaining on the platform, you need to use `global.any[asymmetricMatcher]`.
The special `run` parameter gives you the ability to run your tasks or actors, and returns an accessor for their resources:
```js
const result = await run({
    taskId: 'xxx', // task either by id or using user/task-name
    actorId: 'xxx', // actor either by id or using user/actor-name
    input: {}, // custom input override
    options: {}, // specific memory, timeout options
    nonce: '1', // additional nonce for tasks running with the same input and options
    name: 'run name' // give the run a name to be able to distinguish between them
});
```
The `run` function is idempotent and will run the same task once per test, but you can specify the `nonce` to force running it every time.
The `run` function returns an object with the standard API client run info plus extra data:
```js
runResult = {
    runInput, // actual input of the run with default fields filled
    maxResults, // attempt at parsing maxResults or a similar field from the input (use runInput to do this yourself)
    data: {
        ...runInfo,
        taskId,
        actorName,
        taskName,
        name: run.name,
    }
}
```
## Matchers
These async matchers are lazy and only evaluated when you use them. You should call `expectAsync()` on the result of the `run` function. They abstract many common platform API calls. All callbacks can be plain or async closures; they are awaited either way. You also have full access to the `Apify` variable inside your tests.
`toHaveStatus(status: 'SUCCEEDED' | 'FAILED' | 'ABORTED' | 'TIMED-OUT')`

Checks for the proper run status.
`withLog((logContent: string) => void)`

Run expectations on the log content.
`withDuplicates((result: { runResult: Object, output: Object }) => void, input?: Object)`

Ensures that no duplicates are found. You can provide a `taskId` pointing to a pre-configured task, or you can provide all the input manually according to the docs here. By default, any value counted more than twice is considered a duplicate.
Returns the `OUTPUT` of the run, containing an object like this:

```js
{
    // the keys here are all the values that were found on the target dataset
    "$$": {
        "count": 4,
        "originalIndexes": [0, 12, 13, 15],
        "outputIndexes": [9, 10, 11, 13]
    },
    "MISSING!": { // this means a missing or null value
        "count": 8,
        "originalIndexes": [1, 3, 4, 6, 10, 14, 16, 17],
        "outputIndexes": [0, 1, 2, 5, 8, 12, 14, 15]
    },
    "$$$": {
        "count": 4,
        "originalIndexes": [2, 5, 7, 8],
        "outputIndexes": [3, 4, 6, 7]
    }
}
```
`withChecker((result: { runResult: Object, output: Object }) => void, input: Object, options?: Object)`

The input is required and needs at least a `taskId` parameter pointing to a pre-configured results-checker task, or you can pass everything in the input. Check the docs here. The options are the `Apify.call`/`callTask` options.
Returns the `OUTPUT` of the run, containing an object like this:

```json
{
    "totalItemCount": 17,
    "badItemCount": 0,
    "identificationFields": [],
    "badFields": {},
    "extraFields": {},
    "totalFieldCounts": {
        "categories": 17,
        "info": 17,
        "likes": 17,
        "messenger": 17,
        "posts": 17,
        "priceRange": 10,
        "title": 17,
        "pageUrl": 17,
        "address": 17,
        "awards": 17,
        "email": 15,
        "impressum": 17,
        "instagram": 2,
        "phone": 15,
        "products": 17,
        "transit": 4,
        "twitter": 1,
        "website": 16,
        "youtube": 0,
        "mission": 17,
        "overview": 17,
        "payment": 2,
        "checkins": 12,
        "#startedAt": 17,
        "verified": 0,
        "#url": 17,
        "#ref": 17,
        "reviews": 14,
        "#version": 17,
        "#finishedAt": 17
    },
    "badItems": "https://api.apify.com/v2/key-value-stores/_/records/BAD-ITEMS?disableRedirect=true"
}
```
`withDataset((result: { dataset: Object, info: Object }) => void, options?: Object)`

Returns the dataset information and its items. Options can optionally be passed to limit the number of items returned, to use the `unwind` parameter, or to set any other option available here: Dataset getItems.

The dataset object contains:
```js
{
    items: [ [Object] ],
    total: 1,
    offset: 0,
    count: 1,
    limit: 999999999999
}
```
The info object contains:
```js
{
    id: '',
    userId: '',
    createdAt: 2020-12-05T18:44:45.041Z,
    modifiedAt: 2020-12-05T18:44:50.515Z,
    accessedAt: 2020-12-05T18:44:50.515Z,
    itemCount: 1,
    cleanItemCount: 1,
    actId: '',
    actRunId: '',
    stats: {
        uploadedBytes: 0,
        downloadedBytes: 0,
        deflatedBytes: 0,
        inflatedBytes: 21,
        s3PutCount: 0,
        s3GetCount: 0,
        s3DeleteCount: 0,
        readCount: 0,
        writeCount: 1
    }
}
```
N.B.: this method waits at least 12 seconds before reading from the remote storage, to make sure it's ready to be accessed after the task/actor started by `run` has finished.
`withOutput((output: { value: any, contentType: string }) => void)`

Returns the `OUTPUT` key of the run. It can have any content type; check the `contentType`.
`withStatistics((stats: Object) => void, options?: { index: number = 0 })`

Returns the `SDK_CRAWLER_STATISTICS_0` key of the run by default, unless another `index` is provided in the options. Returns an object like this:
```json
{
    "requestsFinished": 217,
    "requestsFailed": 99,
    "requestsRetries": 0,
    "requestsFailedPerMinute": 3,
    "requestsFinishedPerMinute": 8,
    "requestMinDurationMillis": 3071,
    "requestMaxDurationMillis": 41800,
    "requestTotalFailedDurationMillis": 686856,
    "requestTotalFinishedDurationMillis": 3161769,
    "crawlerStartedAt": "2020-12-07T05:06:44.107Z",
    "crawlerFinishedAt": null,
    "statsPersistedAt": "2020-12-07T05:34:04.209Z",
    "crawlerRuntimeMillis": 1640402,
    "crawlerLastStartTimestamp": 1607317603807,
    "requestRetryHistogram": [
        316
    ],
    "statsId": 0,
    "requestAvgFailedDurationMillis": 6938,
    "requestAvgFinishedDurationMillis": 14570,
    "requestTotalDurationMillis": 3848625,
    "requestsTotal": 316
}
```
`withKeyValueStore((output: { value: any, contentType: string }) => void, options: { keyName: string })`

Returns the content of the selected `keyName`. The test fails if the key doesn't exist. You can access the `INPUT` that was used for the run using `{ keyName: 'INPUT' }`.
`withRequestQueue((requestQueue: Object) => void)`

Accesses the request queue object, which contains:
```js
{
    id: '',
    userId: '',
    createdAt: 2020-12-05T18:44:45.048Z,
    modifiedAt: 2020-12-05T18:44:45.048Z,
    accessedAt: 2020-12-05T18:44:45.048Z,
    expireAt: 2021-02-03T18:44:45.048Z,
    totalRequestCount: 0,
    handledRequestCount: 0,
    pendingRequestCount: 0,
    actId: '',
    actRunId: '',
    hadMultipleClients: false
}
```
N.B.: all of these exist only on `expectAsync` and need to be awaited, as demonstrated above:

```js
await expectAsync(runResult).withDataset((something) => {
    expect(something).toEqual('here');
});
```
`jasmine.any()` and `jasmine.anything()` can be accessed using `global.jasmine`.
## Output
The test output is available in the key-value store under the `OUTPUT` key, with the following structure:
```json
{
    "suite2": {
        "id": "suite2",
        "description": "one",
        "fullName": "Actor tests one",
        "failedExpectations": [],
        "deprecationWarnings": [],
        "duration": 26484,
        "properties": null,
        "status": "passed",
        "specs": [
            {
                "id": "spec0",
                "description": "should work",
                "fullName": "Actor tests one should work",
                "failedExpectations": [],
                "passedExpectations": [
                    { "matcherName": "toHaveStatus", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "toEqual", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "withDataset", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "withRequestQueue", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "withOutput", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "withKeyValueStore", "message": "Passed.", "stack": "", "passed": true },
                    { "matcherName": "withChecker", "message": "Passed.", "stack": "", "passed": true }
                ],
                "deprecationWarnings": [],
                "pendingReason": "",
                "duration": 26480,
                "properties": null,
                "status": "passed"
            }
        ]
    },
    "suite3": {
        "id": "suite3",
        "description": "two",
        "fullName": "Actor tests two",
        "failedExpectations": [],
        "deprecationWarnings": [],
        "duration": 21,
        "properties": null,
        "status": "passed",
        "specs": [
            {
                "id": "spec1",
                "description": "works",
                "fullName": "Actor tests two works",
                "failedExpectations": [
                    {
                        "matcherName": "toBe",
                        "message": "Expected true to be false.",
                        "stack": "Error: Expected true to be false.\n    at <Jasmine>\n    at listOnTimeout (internal/timers.js:549:17)\n    at processTimers (internal/timers.js:492:7)",
                        "passed": false,
                        "expected": false,
                        "actual": true
                    }
                ],
                "passedExpectations": [],
                "deprecationWarnings": [],
                "pendingReason": "",
                "duration": 15,
                "properties": null,
                "status": "failed"
            }
        ]
    }
}
```
## Expected consumption
This is a very lightweight actor that only orchestrates other actor runs, so it can run with the lowest amount of memory available, 128 MB. Running for an hour should consume around 0.125 CUs.
## Reasoning
Automated and integration tests are a must-have for any complex piece of software, and Apify actors are no different. An actor can map one (or many) inputs to a single output, or produce many items through the dataset.
## License
Apache 2.0