Best Actor Finder
Pricing
Pay per usage
Rating: 5.0 (1)
Developer: Pranav Patel
Maintained by Community
Actor stats: 1 bookmarked, 2 total users, 1 monthly active user
Last modified: a day ago
# configurations
.idea
.vscode
.zed

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
node_modules

# git folder
.git

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

# This file tells Git which files shouldn't be added to source control

.DS_Store
.idea
.vscode
.zed
dist
node_modules
apify_storage
storage

# Added by Apify CLI
.venv
.prettierignore

{
    "printWidth": 120,
    "singleQuote": true,
    "tabWidth": 4
}

# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Replace it with the model you're currently using. This helps Apify monitor and improve AGENTS.md for specific LLM models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers.

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted

## Do

- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance

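The "retry strategies with exponential backoff" item can be sketched in plain TypeScript. This is only an illustration under stated assumptions: `backoffDelayMs` and `withRetries` are hypothetical helper names, not part of the Apify SDK or Crawlee, and the base delay, cap, and retry count are arbitrary. Crawlee crawlers already retry failed requests on their own (see `maxRequestRetries`), so a helper like this would mainly suit ad-hoc calls made outside a crawler.

```typescript
// Hypothetical helpers for illustration; not part of the Apify SDK or Crawlee.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

/** Runs `task`, retrying up to `maxRetries` times with a growing delay between attempts. */
async function withRetries<T>(
    task: () => Promise<T>,
    maxRetries = 3,
    sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
    let lastError: unknown;
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
        try {
            return await task();
        } catch (error) {
            lastError = error;
            // Wait only if another attempt is still going to happen.
            if (attempt < maxRetries) await sleep(backoffDelayMs(attempt));
        }
    }
    throw lastError;
}
```

With the defaults above, the waits between attempts are 500 ms, 1000 ms, and 2000 ms. Passing a custom `sleep` makes the helper easy to exercise in tests without real delays.
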
## Don't

- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead

## Commands

```bash
# Local development
apify run # Run Actor locally

# Authentication & deployment
apify login # Authenticate account
apify push # Deploy to Apify platform

# Help
apify help # List all commands
```

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.getValue()`
- push data with `Actor.pushData()`
- set values with `Actor.setValue()`
- enqueue requests to RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- apify push (deployment to cloud)
- proxy configuration changes (requires paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

.actor/
├── actor.json # Actor config: name, version, env vars, runtime settings
├── input_schema.json # Input validation & Console form definition
└── output_schema.json # Specifies where an Actor stores its output
src/
└── main.js # Actor entry point and orchestrator
storage/ # Local storage (mirrors Cloud during development)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition
AGENTS.md # AI agent instructions (this file)

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
    "title": "<INPUT-SCHEMA-TITLE>",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        /* define input fields here */
    },
    "required": []
}
```

### Example

```json
{
    "title": "E-commerce Product Scraper Input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start scraping from (category pages or product pages)",
            "editor": "requestListSources",
            "default": [{ "url": "https://example.com/category" }],
            "prefill": [{ "url": "https://example.com/category" }]
        },
        "followVariants": {
            "title": "Follow Product Variants",
            "type": "boolean",
            "description": "Whether to scrape product variants (different colors, sizes)",
            "default": true
        },
        "maxRequestsPerCrawl": {
            "title": "Max Requests per Crawl",
            "type": "integer",
            "description": "Maximum number of pages to scrape (0 = unlimited)",
            "default": 1000,
            "minimum": 0
        },
        "proxyConfiguration": {
            "title": "Proxy Configuration",
            "type": "object",
            "description": "Proxy settings for anti-bot protection",
            "editor": "proxy",
            "default": { "useApifyProxy": false }
        },
        "locale": {
            "title": "Locale",
            "type": "string",
            "description": "Language/country code for localized content",
            "default": "cs",
            "enum": ["cs", "en", "de", "sk"],
            "enumTitles": ["Czech", "English", "German", "Slovak"]
        }
    },
    "required": ["startUrls"]
}
```

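To make the roles of `default` and `required` concrete, here is a minimal sketch of how an Actor could apply schema defaults to raw input in its own code, for instance during local runs. The `applyDefaults` function and the `InputSchema` types are hypothetical names introduced for illustration; they are not Apify SDK APIs, and the sketch makes no claim about how the platform itself processes the schema. The sample schema is a trimmed-down copy of the example above, with the `startUrls` default left out so that the required-field check is visible.

```typescript
// Hypothetical types and helper for illustration; not part of the Apify SDK.
interface SchemaProperty {
    type: string;
    default?: unknown;
}

interface InputSchema {
    properties: Record<string, SchemaProperty>;
    required?: string[];
}

/** Fills missing fields from schema defaults, then checks the required fields. */
function applyDefaults(schema: InputSchema, input: Record<string, unknown>): Record<string, unknown> {
    const result: Record<string, unknown> = { ...input };
    for (const [name, property] of Object.entries(schema.properties)) {
        if (result[name] === undefined && property.default !== undefined) {
            result[name] = property.default;
        }
    }
    const missing = (schema.required ?? []).filter((name) => result[name] === undefined);
    if (missing.length > 0) {
        throw new Error(`Missing required input fields: ${missing.join(', ')}`);
    }
    return result;
}

// Trimmed-down version of the example schema above.
const exampleSchema: InputSchema = {
    properties: {
        startUrls: { type: 'array' },
        followVariants: { type: 'boolean', default: true },
        maxRequestsPerCrawl: { type: 'integer', default: 1000 },
        locale: { type: 'string', default: 'cs' },
    },
    required: ['startUrls'],
};

const filledInput = applyDefaults(exampleSchema, {
    startUrls: [{ url: 'https://example.com/category' }],
    locale: 'en',
});
```

Here `filledInput` keeps the caller's `locale` of `'en'` and picks up the defaults for `followVariants` and `maxRequestsPerCrawl`; calling `applyDefaults(exampleSchema, {})` throws because `startUrls` is missing.
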
## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "<OUTPUT-SCHEMA-TITLE>",
    "properties": {
        /* define your outputs here */
    }
}
```

### Example

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "files": {
            "type": "string",
            "title": "Files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
        },
        "dataset": {
            "type": "string",
            "title": "Dataset",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to most commonly used URLs
- `links.publicRunUrl` (string) - Public run url in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run url in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run url in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains information about the run same as it is returned from the `GET Run` API endpoint
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

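The template variables above are resolved by the platform, so an Actor never needs to do this itself. Purely to illustrate what a resolved template looks like, the sketch below substitutes `{{…}}` placeholders from a context object. `renderTemplate`, `TemplateContext`, and the dataset ID `abc123` are assumptions made up for this example; the platform's actual substitution logic is not described in this guide.

```typescript
// Hypothetical illustration of template substitution; the platform's real logic may differ.
interface TemplateContext {
    links: Record<string, string>;
    run: Record<string, string>;
}

/** Replaces each {{dotted.path}} placeholder with the matching string from the context. */
function renderTemplate(template: string, context: TemplateContext): string {
    return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (match: string, path: string) => {
        const value = path.split('.').reduce<unknown>(
            (current, key) =>
                current !== null && typeof current === 'object'
                    ? (current as Record<string, unknown>)[key]
                    : undefined,
            context,
        );
        // Unknown placeholders are left untouched rather than replaced with "undefined".
        return typeof value === 'string' ? value : match;
    });
}

// Made-up dataset ID, following the URL format listed above.
const sampleContext: TemplateContext = {
    links: { apiDefaultDatasetUrl: 'https://api.apify.com/v2/datasets/abc123' },
    run: { defaultDatasetId: 'abc123' },
};

const datasetItemsUrl = renderTemplate('{{links.apiDefaultDatasetUrl}}/items', sampleContext);
```
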
## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Example

Consider an example Actor that calls `Actor.pushData()` to store data into dataset:

```typescript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});

// Exit successfully
await Actor.exit();
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
```

### Structure

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

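Two of the rules above lend themselves to a quick local check before deploying: keys under `display.properties` should match entries in `transformation.fields`, and `format` must be one of the listed values. The following is a minimal sketch of such a check. `checkView`, `DatasetView`, and `ALLOWED_FORMATS` are hypothetical names for this illustration and do not correspond to any Apify tooling; the sample view mirrors the `title`/`url` overview used by this Actor's own dataset schema.

```typescript
// Hypothetical lint-style check; not an official Apify validator.
interface DatasetView {
    title: string;
    transformation: { fields: string[] };
    display: {
        component: 'table';
        properties?: Record<string, { label?: string; format?: string }>;
    };
}

const ALLOWED_FORMATS = ['text', 'number', 'date', 'link', 'boolean', 'image', 'array', 'object'];

/** Returns a list of human-readable problems; an empty list means the view looks consistent. */
function checkView(view: DatasetView): string[] {
    const problems: string[] = [];
    const fields = new Set(view.transformation.fields);
    for (const [name, property] of Object.entries(view.display.properties ?? {})) {
        if (!fields.has(name)) {
            problems.push(`display property "${name}" is not listed in transformation.fields`);
        }
        if (property.format !== undefined && !ALLOWED_FORMATS.includes(property.format)) {
            problems.push(`display property "${name}" has unknown format "${property.format}"`);
        }
    }
    return problems;
}

const overviewView: DatasetView = {
    title: 'Overview',
    transformation: { fields: ['title', 'url'] },
    display: {
        component: 'table',
        properties: {
            title: { label: 'Title', format: 'text' },
            url: { label: 'URL', format: 'link' },
        },
    },
};

const overviewProblems = checkView(overviewView);
```
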
## Key-Value Store Schema Specification

The key-value store schema organizes keys into logical groups called collections for easier data management.

### Example

Consider an example Actor that calls `Actor.setValue()` to save records into the key-value store:

```typescript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.setValue('document-1', 'my text data', { contentType: 'text/plain' });

await Actor.setValue(`image-${imageID}`, imageBuffer, { contentType: 'image/jpeg' });

// Exit successfully
await Actor.exit();
```

To configure the key-value store schema, reference a schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "data-collector",
    "title": "Data Collector",
    "version": "1.0.0",
    "storages": {
        "keyValueStore": "./key_value_store_schema.json"
    }
}
```

Then create the key-value store schema in `.actor/key_value_store_schema.json`:

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Key-Value Store Schema",
    "collections": {
        "documents": {
            "title": "Documents",
            "description": "Text documents stored by the Actor",
            "keyPrefix": "document-"
        },
        "images": {
            "title": "Images",
            "description": "Images stored by the Actor",
            "keyPrefix": "image-",
            "contentTypes": ["image/jpeg"]
        }
    }
}
```

### Structure

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "string (required)",
    "description": "string (optional)",
    "collections": {
        "<COLLECTION_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "key": "string (conditional - use key OR keyPrefix)",
            "keyPrefix": "string (conditional - use key OR keyPrefix)",
            "contentTypes": ["string (optional)"],
            "jsonSchema": "object (optional)"
        }
    }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional) - Single specific key for this collection
- `keyPrefix` (string, conditional) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

Either `key` or `keyPrefix` must be specified for each collection, but not both.

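The "either `key` or `keyPrefix`, but not both" rule and the way a record key falls into a collection can be sketched as follows. `validateCollection` and `findCollection` are hypothetical helpers written for this illustration. How the platform resolves a key that matches several collections is not covered in this guide, so returning the first match here is an assumption.

```typescript
// Hypothetical helpers for illustration; not part of the Apify SDK.
interface Collection {
    title: string;
    key?: string;
    keyPrefix?: string;
    contentTypes?: string[];
}

/** Throws unless the collection defines exactly one of `key` and `keyPrefix`. */
function validateCollection(id: string, collection: Collection): void {
    const hasKey = collection.key !== undefined;
    const hasPrefix = collection.keyPrefix !== undefined;
    if (hasKey === hasPrefix) {
        throw new Error(`Collection "${id}" must define exactly one of "key" or "keyPrefix"`);
    }
}

/** Returns the ID of the first collection whose key or prefix matches the record key. */
function findCollection(collections: Record<string, Collection>, recordKey: string): string | undefined {
    for (const [id, collection] of Object.entries(collections)) {
        if (collection.key === recordKey) return id;
        if (collection.keyPrefix !== undefined && recordKey.startsWith(collection.keyPrefix)) return id;
    }
    return undefined;
}

// Collections taken from the example schema above.
const exampleCollections: Record<string, Collection> = {
    documents: { title: 'Documents', keyPrefix: 'document-' },
    images: { title: 'Images', keyPrefix: 'image-', contentTypes: ['image/jpeg'] },
};

for (const [id, collection] of Object.entries(exampleCollections)) {
    validateCollection(id, collection);
}
```

With these collections, `findCollection(exampleCollections, 'document-1')` returns `'documents'`, while a key such as `'INPUT'` belongs to no collection and yields `undefined`.
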
## Apify MCP Tools

If MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:22 AS builder

# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Install all dependencies. Don't audit to speed up the installation.
RUN npm install --include=dev --audit=false

# Next, copy the source files using the user set
# in the base image.
COPY . ./

# Install all dependencies and build the project.
# Don't audit to speed up the installation.
RUN npm run build

# Create final image
FROM apify/actor-node:22

# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Copy built JS files from builder image
COPY --from=builder /usr/src/app/dist ./dist

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY . ./

# Run the image.
CMD npm run start:prod --silent

import prettier from 'eslint-config-prettier';

import apify from '@apify/eslint-config/ts.js';
import globals from 'globals';
import tsEslint from 'typescript-eslint';

// eslint-disable-next-line import/no-default-export
export default [
    { ignores: ['**/dist', 'eslint.config.mjs'] },
    ...apify,
    prettier,
    {
        languageOptions: {
            parser: tsEslint.parser,
            parserOptions: {
                project: 'tsconfig.json',
            },
            globals: {
                ...globals.node,
                ...globals.jest,
            },
        },
        plugins: {
            '@typescript-eslint': tsEslint.plugin,
        },
        rules: {
            'no-console': 0,
        },
    },
];

{
    "name": "best-actor-finder",
    "version": "0.1.0",
    "type": "module",
    "description": "Finds and tests the best actors for a specific task.",
    "engines": {
        "node": ">=18.0.0"
    },
    "dependencies": {
        "@ai-sdk/openai": "^2.0.77",
        "ai": "^5.0.108",
        "apify": "^3.4.2",
        "apify-client": "^2.12.1",
        "crawlee": "^3.13.8",
        "openai": "^4.77.0"
    },
    "devDependencies": {
        "@apify/eslint-config": "^1.0.0",
        "@apify/tsconfig": "^0.1.1",
        "@types/node": "^22.15.32",
        "eslint": "^9.29.0",
        "eslint-config-prettier": "^10.1.5",
        "globals": "^16.2.0",
        "prettier": "^3.5.3",
        "tsx": "^4.20.3",
        "typescript": "^5.8.3",
        "typescript-eslint": "^8.34.1"
    },
    "scripts": {
        "start": "npm run start:dev",
        "start:prod": "node dist/main.js",
        "start:dev": "tsx src/main.ts",
        "build": "tsc",
        "lint": "eslint",
        "lint:fix": "eslint --fix",
        "format": "prettier --write .",
        "format:check": "prettier --check .",
        "test": "echo \"Error: oops, the Actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}

{
    "extends": "@apify/tsconfig",
    "compilerOptions": {
        "module": "NodeNext",
        "moduleResolution": "NodeNext",
        "target": "ES2022",
        "outDir": "dist",
        "noUnusedLocals": false,
        "skipLibCheck": true,
        "lib": ["DOM"]
    },
    "include": ["./src/**/*"]
}

{
    "actorSpecification": 1,
    "name": "best-actor-finder",
    "title": "Project Cheerio Crawler Typescript",
    "description": "Crawlee and Cheerio project in typescript.",
    "version": "0.0",
    "buildTag": "latest",
    "meta": {
        "templateId": "ts-crawlee-cheerio",
        "generatedBy": "<FILL-IN-MODEL>"
    },
    "input": "./input_schema.json",
    "output": "./output_schema.json",
    "storages": {
        "dataset": "./dataset_schema.json"
    },
    "dockerfile": "../Dockerfile"
}

{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": ["title", "url"]
            },
            "display": {
                "component": "table",
                "properties": {
                    "title": {
                        "label": "Title",
                        "format": "text"
                    },
                    "url": {
                        "label": "URL",
                        "format": "link"
                    }
                }
            }
        }
    }
}

{
    "title": "Actor Scout v2",
    "description": "Find the best Apify Actor by actually testing them",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "query": {
            "title": "What do you need?",
            "type": "string",
            "description": "Describe your use case. Be specific about: what data, from which website, any special requirements.",
            "editor": "textarea",
            "prefill": "I need to scrape Google Maps for restaurant data in NYC, including phone numbers, ratings, and reviews"
        },
        "maxActorsToTest": {
            "title": "Number of Actors to test",
            "type": "integer",
            "description": "More tests = better comparison but higher cost (~$0.05-0.20 per actor)",
            "default": 3,
            "minimum": 2,
            "maximum": 5
        },
        "testTimeout": {
            "title": "Test timeout (seconds)",
            "type": "integer",
            "description": "Max time to wait for each actor's test run",
            "default": 60,
            "minimum": 30,
            "maximum": 120
        }
    },
    "required": ["query"]
}

{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "overview": {
            "type": "string",
            "title": "Overview",
            "template": "{{links.apiDefaultDatasetUrl}}/items?view=overview"
        }
    }
}

import { Actor, log, ApifyClient } from 'apify';
import { CheerioCrawler } from 'crawlee';
import OpenAI from 'openai';

// ============ TYPES ============
interface Input {
    query: string;
    maxActorsToTest?: number;
    testTimeout?: number;
}

interface ActorCandidate {
    id: string;
    name: string;
    username: string;
    title: string;
    description: string;
    url: string;
    actorId: string; // format: username/name
    stats: {
        totalUsers: number;
        totalRuns: number;
    };
    categories: string[];
    readme?: string;
    pricing?: string;
    inputSchema?: Record<string, any>;
}

interface PreliminaryScore {
    actor: ActorCandidate;
    score: number;
    reasoning: string;
}

interface TestRun {
    actorId: string;
    actorName: string;
    success: boolean;
    input: Record<string, any>;
    output: any[] | null;
    error?: string;
    duration: number;
    itemCount: number;
}

interface FinalResult {
    rank: number;
    actorId: string;
    actorName: string;
    actorUrl: string;
    username: string;
    preliminaryScore: number;
    testResult: {
        success: boolean;
        itemCount: number;
        duration: number;
        sampleOutput: any;
        error?: string;
    };
    finalScore: number;
    verdict: string;
    strengths: string[];
    weaknesses: string[];
}

// ============ CONSTANTS ============
const MODEL = 'google/gemini-2.5-flash';
const MAX_TEST_RESULTS = 3; // Limit results per test run to control cost

// ============ MAIN ============
await Actor.init();

const input = await Actor.getInput<Input>();
if (!input?.query) {
    throw new Error('Query is required');
}

const {
    query,
    maxActorsToTest = 3,
    testTimeout = 60,
} = input;

const startTime = Date.now();
const apifyClient = new ApifyClient({ token: process.env.APIFY_TOKEN });

log.info('🔍 Actor Scout v2 - Starting', { query, maxActorsToTest });

// ============ STEP 1: QUERY ANALYSIS ============
log.info('Step 1: Analyzing query...');
const { keywords, intent } = await analyzeQuery(query);
log.info('Query analyzed', { keywords, intent });

// ============ STEP 2: ACTOR DISCOVERY ============
log.info('Step 2: Searching Apify Store...');
const candidates = await searchApifyStore(keywords, 12);
log.info(`Found ${candidates.length} candidates`);

if (candidates.length === 0) {
    await Actor.setValue('OUTPUT', { error: 'No actors found for query', query });
    await Actor.exit();
}

// ============ STEP 3: PRELIMINARY RANKING ============
log.info('Step 3: Preliminary ranking (metadata + README)...');
const detailedCandidates = await fetchActorDetails(candidates, apifyClient);
const preliminaryRanking = await preliminaryRank(detailedCandidates, query, intent);
const topCandidates = preliminaryRanking.slice(0, maxActorsToTest);

log.info('Top candidates for testing:', {
    actors: topCandidates.map(p => p.actor.title),
});

// ============ STEP 4: GENERATE TEST INPUTS ============
log.info('Step 4: Generating test inputs for each actor...');
const testInputs = await generateTestInputs(topCandidates, query, intent);

// ============ STEP 5: PARALLEL TEST RUNS ============
log.info('Step 5: Running parallel tests...');
const testResults = await runParallelTests(
    testInputs,
    apifyClient,
    testTimeout,
);

// ============ STEP 6: COMPARE OUTPUTS ============
log.info('Step 6: Comparing outputs with LLM...');
const finalRanking = await compareOutputs(
    testResults,
    query,
    intent,
    preliminaryRanking,
);

// ============ STEP 7: FORMAT OUTPUT ============
const duration = ((Date.now() - startTime) / 1000).toFixed(1);
const output = formatFinalOutput(query, finalRanking, duration);

// Save results
await Actor.setValue('OUTPUT', {
    query,
    intent,
    results: finalRanking,
    formattedOutput: output,
    metadata: {
        candidatesFound: candidates.length,
        actorsTested: testResults.length,
        successfulTests: testResults.filter(t => t.success).length,
        duration: `${duration}s`,
        model: MODEL,
    },
});

await Actor.pushData(finalRanking);

console.log('\n' + output);

log.info('✅ Actor Scout complete!', {
    winner: finalRanking[0]?.actorName,
    duration: `${duration}s`,
});

await Actor.exit();

// ============ HELPER FUNCTIONS ============

function getOpenAI(): OpenAI {
    return new OpenAI({
        baseURL: 'https://openrouter.apify.actor/api/v1',
        apiKey: 'apify',
        defaultHeaders: {
            Authorization: `Bearer ${process.env.APIFY_TOKEN}`,
        },
    });
}

async function analyzeQuery(query: string): Promise<{ keywords: string[]; intent: string }> {
    const openai = getOpenAI();

    const response = await openai.chat.completions.create({
        model: MODEL,
        messages: [{
            role: 'user',
            content: `Analyze this request for finding an Apify Actor (web scraping tool).

Request: "${query}"

Return JSON:
{
  "keywords": ["keyword1", "keyword2", "keyword3"], // 2-4 search terms
  "intent": "Brief description of what data the user wants and from where"
}

Only return the JSON:`
        }],
        temperature: 0,
    });

    const content = response.choices[0]?.message?.content || '{}';
    try {
        return JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
    } catch {
        return {
            keywords: query.split(' ').filter(w => w.length > 3).slice(0, 4),
            intent: query,
        };
    }
}

async function searchApifyStore(keywords: string[], limit: number): Promise<ActorCandidate[]> {
    const searchQuery = keywords.join(' ');
    const url = `https://api.apify.com/v2/store?search=${encodeURIComponent(searchQuery)}&limit=${limit}`;

    const response = await fetch(url, {
        headers: { Authorization: `Bearer ${process.env.APIFY_TOKEN}` },
    });

    if (!response.ok) throw new Error(`Store API error: ${response.status}`);

    const data = await response.json();

    return data.data.items.map((item: any) => ({
        id: item.id,
        name: item.name,
        username: item.username,
        title: item.title || item.name,
        description: item.description || '',
        url: `https://apify.com/${item.username}/${item.name}`,
        actorId: `${item.username}/${item.name}`,
        stats: {
            totalUsers: item.stats?.totalUsers || 0,
            totalRuns: item.stats?.totalRuns || 0,
        },
        categories: item.categories || [],
    }));
}

async function fetchActorDetails(
    actors: ActorCandidate[],
    client: ApifyClient
): Promise<ActorCandidate[]> {
    const detailed: ActorCandidate[] = [];

    for (const actor of actors) {
        try {
            // Fetch actor info including input schema
            const actorInfo = await client.actor(actor.actorId).get();

            detailed.push({
                ...actor,
                readme: actorInfo?.description || actor.description,
                inputSchema: actorInfo?.defaultRunOptions?.build
                    ? undefined
                    : (actorInfo as any)?.inputSchema,
            });
        } catch (e) {
            // If API call fails, use what we have
            detailed.push(actor);
        }
    }

    // Also crawl README pages for more detail
    const readmeMap = new Map<string, string>();

    const crawler = new CheerioCrawler({
        maxRequestsPerCrawl: Math.min(actors.length, 10),
        maxConcurrency: 5,
        requestHandler: async ({ request, $ }) => {
            const readme = $('article').first().text() ||
                $('.markdown-body').first().text() || '';
            readmeMap.set(request.userData.actorId, readme.slice(0, 3000));
        },
        failedRequestHandler: async () => {},
    });

    await crawler.run(
        actors.slice(0, 10).map(a => ({
            url: a.url,
            userData: { actorId: a.actorId },
        }))
    );

    return detailed.map(actor => ({
        ...actor,
        readme: readmeMap.get(actor.actorId) || actor.readme || actor.description,
    }));
}

async function preliminaryRank(
    actors: ActorCandidate[],
    query: string,
    intent: string
): Promise<PreliminaryScore[]> {
    const openai = getOpenAI();

    const actorSummaries = actors.map((a, i) =>
        `${i + 1}. ${a.title} (${a.actorId})
   Users: ${a.stats.totalUsers}, Runs: ${a.stats.totalRuns}
   Description: ${a.description?.slice(0, 200)}
   README excerpt: ${a.readme?.slice(0, 300) || 'N/A'}`
    ).join('\n\n');

    const response = await openai.chat.completions.create({
        model: MODEL,
        messages: [{
            role: 'user',
            content: `You're evaluating Apify Actors for this user request.

USER REQUEST: "${query}"
USER INTENT: ${intent}

CANDIDATE ACTORS:
${actorSummaries}

Score each actor 1-10 based on:
- How well it matches the user's intent
- Documentation quality
- Popularity/trust (user count, runs)
- Likelihood of working correctly

Return JSON array sorted by score (highest first):
[
  {"index": 1, "score": 9, "reasoning": "Best match because..."},
  {"index": 3, "score": 7, "reasoning": "Good but..."},
  ...
]

Only return the JSON array:`
        }],
        temperature: 0.2,
    });

    const content = response.choices[0]?.message?.content || '[]';
    try {
        const rankings = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
        return rankings.map((r: any) => ({
            actor: actors[r.index - 1],
            score: r.score,
            reasoning: r.reasoning,
        })).filter((r: any) => r.actor); // Filter out any invalid indices
    } catch {
        // Fallback: return by user count (sort a copy so the caller's array is not mutated)
        return [...actors]
            .sort((a, b) => b.stats.totalUsers - a.stats.totalUsers)
            .map(actor => ({ actor, score: 5, reasoning: 'Fallback ranking by popularity' }));
    }
}

350async function generateTestInputs(351 candidates: PreliminaryScore[],352 query: string,353 intent: string354): Promise<Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }>> {355 const openai = getOpenAI();356 const results: Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }> = [];357 358 for (const prelim of candidates) {359 const actor = prelim.actor;360 361 // Try to fetch the actual input schema362 let schemaInfo = 'No schema available - generate reasonable defaults';363 try {364 const schemaUrl = `https://api.apify.com/v2/acts/${actor.actorId}/input-schema`;365 const schemaRes = await fetch(schemaUrl, {366 headers: { Authorization: `Bearer ${process.env.APIFY_TOKEN}` },367 });368 if (schemaRes.ok) {369 const schema = await schemaRes.json();370 schemaInfo = JSON.stringify(schema, null, 2).slice(0, 2000);371 }372 } catch {}373 374 const response = await openai.chat.completions.create({375 model: MODEL,376 messages: [{377 role: 'user',378 content: `Generate a minimal test input for this Apify Actor.379
USER WANTS: "${query}"
INTENT: ${intent}

ACTOR: ${actor.title} (${actor.actorId})
DESCRIPTION: ${actor.description}

INPUT SCHEMA:
${schemaInfo}

Generate a JSON input that:
1. Matches what the user is looking for
2. Uses MINIMAL settings (we want just 1-${MAX_TEST_RESULTS} results to test)
3. Is valid for this actor's schema

Return ONLY the JSON input object (no explanation):`
            }],
            temperature: 0.1,
        });

        const content = response.choices[0]?.message?.content || '{}';
        try {
            const input = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
            results.push({ actor, input, prelim });
        } catch {
            // Use a generic minimal input
            results.push({
                actor,
                input: { maxItems: MAX_TEST_RESULTS },
                prelim,
            });
        }
    }

    return results;
}

async function runParallelTests(
    testInputs: Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }>,
    client: ApifyClient,
    timeoutSecs: number
): Promise<TestRun[]> {
    const testPromises = testInputs.map(async ({ actor, input }) => {
        const startTime = Date.now();

        try {
            log.info(`Testing: ${actor.title}`, { input });

            // Run the actor
            const run = await client.actor(actor.actorId).call(input, {
                timeout: timeoutSecs,
                memory: 256, // Minimal memory
            });

            // Fetch results from dataset
            const { items } = await client.dataset(run.defaultDatasetId).listItems({
                limit: MAX_TEST_RESULTS,
            });

            const duration = (Date.now() - startTime) / 1000;

            return {
                actorId: actor.actorId,
                actorName: actor.title,
                success: true,
                input,
                output: items,
                duration,
                itemCount: items.length,
            };
        } catch (error: any) {
            const duration = (Date.now() - startTime) / 1000;

            return {
                actorId: actor.actorId,
                actorName: actor.title,
                success: false,
                input,
                output: null,
                error: error.message || 'Unknown error',
                duration,
                itemCount: 0,
            };
        }
    });

    return Promise.all(testPromises);
}

async function compareOutputs(
    testResults: TestRun[],
    query: string,
    intent: string,
    preliminaryRanking: PreliminaryScore[]
): Promise<FinalResult[]> {
    const openai = getOpenAI();

    // Build comparison prompt
    const resultSummaries = testResults.map((t, i) => {
        if (!t.success) {
            return `${i + 1}. ${t.actorName} - ❌ FAILED
   Error: ${t.error}
   Duration: ${t.duration.toFixed(1)}s`;
        }

        const sampleOutput = JSON.stringify(t.output?.[0] || {}, null, 2).slice(0, 1000);
        return `${i + 1}. ${t.actorName} - ✅ SUCCESS
   Items returned: ${t.itemCount}
   Duration: ${t.duration.toFixed(1)}s
   Sample output:
   ${sampleOutput}`;
    }).join('\n\n');

    const response = await openai.chat.completions.create({
        model: MODEL,
        messages: [{
            role: 'user',
            content: `Compare these Apify Actor test results for the user's request.

USER REQUEST: "${query}"
INTENT: ${intent}

TEST RESULTS:
${resultSummaries}

For each actor, evaluate:
1. Did it succeed?
2. Does the output contain what the user needs?
3. Is the data quality good?
4. Is it fast enough?

Return JSON array with final ranking:
[
  {
    "actorName": "Name",
    "finalScore": 9.5,
    "verdict": "Best choice because...",
    "strengths": ["strength1", "strength2"],
    "weaknesses": ["weakness1"]
  },
  ...
]

Sort by finalScore descending. Only return JSON:`
        }],
        temperature: 0.2,
    });

    const content = response.choices[0]?.message?.content || '[]';
    let llmRankings: any[] = [];

    try {
        const parsed = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
        // A non-array reply would break the merge below, so route it to the fallback
        if (!Array.isArray(parsed)) throw new Error('Expected a JSON array');
        llmRankings = parsed;
    } catch {
        llmRankings = testResults.map(t => ({
            actorName: t.actorName,
            finalScore: t.success ? 7 : 2,
            verdict: t.success ? 'Test passed' : 'Test failed',
            strengths: t.success ? ['Completed successfully'] : [],
            weaknesses: t.success ? [] : [t.error || 'Failed'],
        }));
    }

    // Merge test results with LLM rankings
    return llmRankings.map((llm, index) => {
        // Match by name first; fall back to position if the model altered the name
        const testResult = testResults.find(t => t.actorName === llm.actorName) || testResults[index];
        const prelim = preliminaryRanking.find(p => p.actor.title === llm.actorName);

        return {
            rank: index + 1,
            actorId: testResult?.actorId || '',
            actorName: llm.actorName,
            // Avoid emitting "https://apify.com/undefined" when no test result matched
            actorUrl: testResult ? `https://apify.com/${testResult.actorId}` : '',
            username: testResult?.actorId?.split('/')[0] || '',
            preliminaryScore: prelim?.score || 0,
            testResult: {
                success: testResult?.success || false,
                itemCount: testResult?.itemCount || 0,
                duration: testResult?.duration || 0,
                sampleOutput: testResult?.output?.[0] || null,
                error: testResult?.error,
            },
            finalScore: llm.finalScore,
            verdict: llm.verdict,
            strengths: llm.strengths || [],
            weaknesses: llm.weaknesses || [],
        };
    });
}

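The merge step above pairs each LLM ranking with a test run by exact `actorName` and otherwise falls back to array position, which can attach the wrong run if the model changes the name's case or spacing. Below is a minimal sketch of a more tolerant lookup. `normalizeName` and `findByActorName` are hypothetical helpers, not part of the original source, and the sample actor name is made up for illustration.

```typescript
// Hypothetical helpers (not in the original source).
interface Named {
    actorName: string;
}

// Lowercase, trim, and collapse runs of whitespace so small formatting
// differences in a model's reply do not prevent a match.
function normalizeName(name: string): string {
    return name.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Return the first result whose normalized name equals the normalized query.
function findByActorName<T extends Named>(results: T[], name: string): T | undefined {
    const target = normalizeName(name);
    return results.find((r) => normalizeName(r.actorName) === target);
}

// Usage with made-up sample data.
const sampleRuns = [{ actorName: 'Google Maps Scraper', success: true }];
const matchedRun = findByActorName(sampleRuns, '  google maps  scraper ');
```

Asking the model to echo a numeric index, as the preliminary ranking prompt already does, would be another way to avoid name matching entirely.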
function formatFinalOutput(
    query: string,
    results: FinalResult[],
    duration: string
): string {
    const line = '─'.repeat(70);
    const doubleLine = '═'.repeat(70);

    let output = `
${doubleLine}
  🔍 ACTOR SCOUT v2 - TEST RESULTS
${doubleLine}

Query: "${query}"
Tested: ${results.length} actors | Duration: ${duration}s

${line}
  🏆 WINNER
${line}
`;

    if (results.length > 0) {
        const winner = results[0];
        const statusIcon = winner.testResult.success ? '✅' : '❌';

        output += `
  ${winner.actorName}
  ${winner.actorUrl}

  Final Score: ${winner.finalScore}/10
  Test Status: ${statusIcon} ${winner.testResult.success ? 'PASSED' : 'FAILED'}
  Items Retrieved: ${winner.testResult.itemCount}
  Test Duration: ${winner.testResult.duration.toFixed(1)}s

  💡 Verdict: ${winner.verdict}

  ✓ Strengths:
${winner.strengths.map(s => `    • ${s}`).join('\n')}

  ⚠ Weaknesses:
${winner.weaknesses.map(w => `    • ${w}`).join('\n') || '    • None identified'}
`;

        if (winner.testResult.sampleOutput) {
            output += `
  📦 Sample Output:
${JSON.stringify(winner.testResult.sampleOutput, null, 2).split('\n').map(l => '    ' + l).join('\n').slice(0, 800)}
`;
        }
    }

    output += `
${line}
  📊 ALL RESULTS
${line}

  Rank  Actor                              Test     Score   Verdict
  ────  ─────────────────────────────────  ───────  ──────  ─────────────────
`;

    for (const r of results) {
        const name = r.actorName.length > 33 ? r.actorName.slice(0, 30) + '...' : r.actorName;
        const status = r.testResult.success ? '✅ Pass' : '❌ Fail';
        const verdictShort = r.verdict.slice(0, 20) + (r.verdict.length > 20 ? '...' : '');
        output += `  #${r.rank}    ${name.padEnd(33)}  ${status}  ${String(r.finalScore).padStart(4)}/10  ${verdictShort}\n`;
    }

    // Runner-up details
    if (results.length > 1) {
        output += `
${line}
  📝 RUNNER-UP DETAILS
${line}
`;
        for (const r of results.slice(1)) {
            const statusIcon = r.testResult.success ? '✅' : '❌';
            output += `
  #${r.rank} ${r.actorName} (${r.finalScore}/10)
  ${statusIcon} ${r.testResult.success ? `Returned ${r.testResult.itemCount} items in ${r.testResult.duration.toFixed(1)}s` : `Failed: ${r.testResult.error}`}
  ${r.verdict}
`;
        }
    }

    output += `
${doubleLine}
  Actor Scout v2 | Model: ${MODEL} | Tested ${results.length} actors
${doubleLine}
`;

    return output;
}
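The results table above truncates the actor name and the verdict with two slightly different inline expressions: the name is capped at 33 characters including the ellipsis, while the verdict can reach 23. A minimal sketch of a single column-fitting helper that could serve both follows. `fitColumn` is a hypothetical name that is not in the original source, and it assumes a column width of at least 3.

```typescript
// Hypothetical helper (not in the original source): fit text into a fixed-width column.
// Shorter text is right-padded with spaces; longer text is cut and ends with "...".
// Assumes width >= 3 so there is room for the ellipsis.
function fitColumn(text: string, width: number): string {
    if (text.length <= width) {
        return text.padEnd(width);
    }
    return text.slice(0, width - 3) + '...';
}

// Usage: both cells come out exactly as wide as their column.
const nameCell = fitColumn('A very long actor title that will not fit', 33);
const verdictCell = fitColumn('Best choice', 20);
```

Because every cell has exactly the requested width, the row template no longer needs a separate `padEnd` call per column.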