Best Actor Finder

Finds and tests the best actors for a specific task.

Pricing: Pay per usage

Rating: 5.0 (1)

Developer: Pranav Patel

Maintained by Community

Last modified: 22 days ago


# Apify Actors Development Guide

Important: Before you begin, fill in the `generatedBy` property in the meta section of `.actor/actor.json`. Replace it with the model you're currently using. This helps Apify monitor and improve AGENTS.md for specific LLM models.

## What are Apify Actors?

- Actors are serverless programs that run in the cloud. They're inspired by the UNIX philosophy - programs that do one thing well and can be easily combined to build complex systems.
- Actors are programs packaged as Docker images that run in isolated containers

## Core Concepts

- Accept well-defined JSON input
- Perform isolated tasks (web scraping, automation, data processing)
- Produce structured JSON output to datasets and/or store data in key-value stores
- Can run from seconds to hours or even indefinitely
- Persist state and can be restarted

## Do

- accept well-defined JSON input and produce structured JSON output
- use Apify SDK (`apify`) for code running ON Apify platform
- validate input early with proper error handling and fail gracefully
- use CheerioCrawler for static HTML content (10x faster than browsers)
- use PlaywrightCrawler only for JavaScript-heavy sites and dynamic content
- use router pattern (createCheerioRouter/createPlaywrightRouter) for complex crawls
- implement retry strategies with exponential backoff for failed requests
- use proper concurrency settings (HTTP: 10-50, Browser: 1-5)
- set sensible defaults in `.actor/input_schema.json` for all optional fields
- set up output schema in `.actor/output_schema.json`
- clean and validate data before pushing to dataset
- use semantic CSS selectors and fallback strategies for missing elements
- respect robots.txt, ToS, and implement rate limiting with delays
- check which tools (cheerio/playwright/crawlee) are installed before applying guidance

## Don't

- do not rely on `Dataset.getInfo()` for final counts on Cloud platform
- do not use browser crawlers when HTTP/Cheerio works (massive performance gains with HTTP)
- do not hard code values that should be in input schema or environment variables
- do not skip input validation or error handling
- do not overload servers - use appropriate concurrency and delays
- do not scrape prohibited content or ignore Terms of Service
- do not store personal/sensitive data unless explicitly permitted
- do not use deprecated options like `requestHandlerTimeoutMillis` on CheerioCrawler (v3.x)
- do not use `additionalHttpHeaders` - use `preNavigationHooks` instead

## Commands

```bash
# Local development
apify run # Run Actor locally

# Authentication & deployment
apify login # Authenticate account
apify push # Deploy to Apify platform

# Help
apify help # List all commands
```

## Safety and Permissions

Allowed without prompt:

- read files with `Actor.getValue()`
- push data with `Actor.pushData()`
- set values with `Actor.setValue()`
- enqueue requests to RequestQueue
- run locally with `apify run`

Ask first:

- npm/pip package installations
- apify push (deployment to cloud)
- proxy configuration changes (requires paid plan)
- Dockerfile changes affecting builds
- deleting datasets or key-value stores

## Project Structure

.actor/
├── actor.json # Actor config: name, version, env vars, runtime settings
├── input_schema.json # Input validation & Console form definition
└── output_schema.json # Specifies where an Actor stores its output
src/
└── main.js # Actor entry point and orchestrator
storage/ # Local storage (mirrors Cloud during development)
├── datasets/ # Output items (JSON objects)
├── key_value_stores/ # Files, config, INPUT
└── request_queues/ # Pending crawl requests
Dockerfile # Container image definition
AGENTS.md # AI agent instructions (this file)

## Actor Input Schema

The input schema defines the input parameters for an Actor. It's a JSON object comprising various field types supported by the Apify platform.

### Structure

```json
{
    "title": "<INPUT-SCHEMA-TITLE>",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        /* define input fields here */
    },
    "required": []
}
```

### Example

```json
{
    "title": "E-commerce Product Scraper Input",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start scraping from (category pages or product pages)",
            "editor": "requestListSources",
            "default": [{ "url": "https://example.com/category" }],
            "prefill": [{ "url": "https://example.com/category" }]
        },
        "followVariants": {
            "title": "Follow Product Variants",
            "type": "boolean",
            "description": "Whether to scrape product variants (different colors, sizes)",
            "default": true
        },
        "maxRequestsPerCrawl": {
            "title": "Max Requests per Crawl",
            "type": "integer",
            "description": "Maximum number of pages to scrape (0 = unlimited)",
            "default": 1000,
            "minimum": 0
        },
        "proxyConfiguration": {
            "title": "Proxy Configuration",
            "type": "object",
            "description": "Proxy settings for anti-bot protection",
            "editor": "proxy",
            "default": { "useApifyProxy": false }
        },
        "locale": {
            "title": "Locale",
            "type": "string",
            "description": "Language/country code for localized content",
            "default": "cs",
            "enum": ["cs", "en", "de", "sk"],
            "enumTitles": ["Czech", "English", "German", "Slovak"]
        }
    },
    "required": ["startUrls"]
}
```

## Actor Output Schema

The Actor output schema builds upon the schemas for the dataset and key-value store. It specifies where an Actor stores its output and defines templates for accessing that output. Apify Console uses these output definitions to display run results.

### Structure

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "<OUTPUT-SCHEMA-TITLE>",
    "properties": {
        /* define your outputs here */
    }
}
```

### Example

```json
{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "files": {
            "type": "string",
            "title": "Files",
            "template": "{{links.apiDefaultKeyValueStoreUrl}}/keys"
        },
        "dataset": {
            "type": "string",
            "title": "Dataset",
            "template": "{{links.apiDefaultDatasetUrl}}/items"
        }
    }
}
```

### Output Schema Template Variables

- `links` (object) - Contains quick links to most commonly used URLs
- `links.publicRunUrl` (string) - Public run url in format `https://console.apify.com/view/runs/:runId`
- `links.consoleRunUrl` (string) - Console run url in format `https://console.apify.com/actors/runs/:runId`
- `links.apiRunUrl` (string) - API run url in format `https://api.apify.com/v2/actor-runs/:runId`
- `links.apiDefaultDatasetUrl` (string) - API url of default dataset in format `https://api.apify.com/v2/datasets/:defaultDatasetId`
- `links.apiDefaultKeyValueStoreUrl` (string) - API url of default key-value store in format `https://api.apify.com/v2/key-value-stores/:defaultKeyValueStoreId`
- `links.containerRunUrl` (string) - URL of a webserver running inside the run in format `https://<containerId>.runs.apify.net/`
- `run` (object) - Contains information about the run same as it is returned from the `GET Run` API endpoint
- `run.defaultDatasetId` (string) - ID of the default dataset
- `run.defaultKeyValueStoreId` (string) - ID of the default key-value store

## Dataset Schema Specification

The dataset schema defines how your Actor's output data is structured, transformed, and displayed in the Output tab in the Apify Console.

### Example

Consider an example Actor that calls `Actor.pushData()` to store data into dataset:

```typescript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.pushData({
    numericField: 10,
    pictureUrl: 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_92x30dp.png',
    linkUrl: 'https://google.com',
    textField: 'Google',
    booleanField: true,
    dateField: new Date(),
    arrayField: ['#hello', '#world'],
    objectField: {},
});

// Exit successfully
await Actor.exit();
```

To set up the Actor's output tab UI, reference a dataset schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "book-library-scraper",
    "title": "Book Library Scraper",
    "version": "1.0.0",
    "storages": {
        "dataset": "./dataset_schema.json"
    }
}
```

Then create the dataset schema in `.actor/dataset_schema.json`:

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": [
                    "pictureUrl",
                    "linkUrl",
                    "textField",
                    "booleanField",
                    "arrayField",
                    "objectField",
                    "dateField",
                    "numericField"
                ]
            },
            "display": {
                "component": "table",
                "properties": {
                    "pictureUrl": {
                        "label": "Image",
                        "format": "image"
                    },
                    "linkUrl": {
                        "label": "Link",
                        "format": "link"
                    },
                    "textField": {
                        "label": "Text",
                        "format": "text"
                    },
                    "booleanField": {
                        "label": "Boolean",
                        "format": "boolean"
                    },
                    "arrayField": {
                        "label": "Array",
                        "format": "array"
                    },
                    "objectField": {
                        "label": "Object",
                        "format": "object"
                    },
                    "dateField": {
                        "label": "Date",
                        "format": "date"
                    },
                    "numericField": {
                        "label": "Number",
                        "format": "number"
                    }
                }
            }
        }
    }
}
```

### Structure

```json
{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "<VIEW_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "transformation": {
                "fields": ["string (required)"],
                "unwind": ["string (optional)"],
                "flatten": ["string (optional)"],
                "omit": ["string (optional)"],
                "limit": "integer (optional)",
                "desc": "boolean (optional)"
            },
            "display": {
                "component": "table (required)",
                "properties": {
                    "<FIELD_NAME>": {
                        "label": "string (optional)",
                        "format": "text|number|date|link|boolean|image|array|object (optional)"
                    }
                }
            }
        }
    }
}
```

**Dataset Schema Properties:**

- `actorSpecification` (integer, required) - Specifies the version of dataset schema structure document (currently only version 1)
- `fields` (JSONSchema object, required) - Schema of one dataset object (use JsonSchema Draft 2020-12 or compatible)
- `views` (DatasetView object, required) - Object with API and UI views description

**DatasetView Properties:**

- `title` (string, required) - Visible in UI Output tab and API
- `description` (string, optional) - Only available in API response
- `transformation` (ViewTransformation object, required) - Data transformation applied when loading from Dataset API
- `display` (ViewDisplay object, required) - Output tab UI visualization definition

**ViewTransformation Properties:**

- `fields` (string[], required) - Fields to present in output (order matches column order)
- `unwind` (string[], optional) - Deconstructs nested children into parent object
- `flatten` (string[], optional) - Transforms nested object into flat structure
- `omit` (string[], optional) - Removes specified fields from output
- `limit` (integer, optional) - Maximum number of results (default: all)
- `desc` (boolean, optional) - Sort order (true = newest first)

**ViewDisplay Properties:**

- `component` (string, required) - Only `table` is available
- `properties` (Object, optional) - Keys matching `transformation.fields` with ViewDisplayProperty values

**ViewDisplayProperty Properties:**

- `label` (string, optional) - Table column header
- `format` (string, optional) - One of: `text`, `number`, `date`, `link`, `boolean`, `image`, `array`, `object`

## Key-Value Store Schema Specification

The key-value store schema organizes keys into logical groups called collections for easier data management.

### Example

Consider an example Actor that calls `Actor.setValue()` to save records into the key-value store:

```typescript
import { Actor } from 'apify';
// Initialize the JavaScript SDK
await Actor.init();

/**
 * Actor code
 */
await Actor.setValue('document-1', 'my text data', { contentType: 'text/plain' });

await Actor.setValue(`image-${imageID}`, imageBuffer, { contentType: 'image/jpeg' });

// Exit successfully
await Actor.exit();
```

To configure the key-value store schema, reference a schema file in `.actor/actor.json`:

```json
{
    "actorSpecification": 1,
    "name": "data-collector",
    "title": "Data Collector",
    "version": "1.0.0",
    "storages": {
        "keyValueStore": "./key_value_store_schema.json"
    }
}
```

Then create the key-value store schema in `.actor/key_value_store_schema.json`:

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "Key-Value Store Schema",
    "collections": {
        "documents": {
            "title": "Documents",
            "description": "Text documents stored by the Actor",
            "keyPrefix": "document-"
        },
        "images": {
            "title": "Images",
            "description": "Images stored by the Actor",
            "keyPrefix": "image-",
            "contentTypes": ["image/jpeg"]
        }
    }
}
```

### Structure

```json
{
    "actorKeyValueStoreSchemaVersion": 1,
    "title": "string (required)",
    "description": "string (optional)",
    "collections": {
        "<COLLECTION_NAME>": {
            "title": "string (required)",
            "description": "string (optional)",
            "key": "string (conditional - use key OR keyPrefix)",
            "keyPrefix": "string (conditional - use key OR keyPrefix)",
            "contentTypes": ["string (optional)"],
            "jsonSchema": "object (optional)"
        }
    }
}
```

**Key-Value Store Schema Properties:**

- `actorKeyValueStoreSchemaVersion` (integer, required) - Version of key-value store schema structure document (currently only version 1)
- `title` (string, required) - Title of the schema
- `description` (string, optional) - Description of the schema
- `collections` (Object, required) - Object where each key is a collection ID and value is a Collection object

**Collection Properties:**

- `title` (string, required) - Collection title shown in UI tabs
- `description` (string, optional) - Description appearing in UI tooltips
- `key` (string, conditional) - Single specific key for this collection
- `keyPrefix` (string, conditional) - Prefix for keys included in this collection
- `contentTypes` (string[], optional) - Allowed content types for validation
- `jsonSchema` (object, optional) - JSON Schema Draft 07 format for `application/json` content type validation

Either `key` or `keyPrefix` must be specified for each collection, but not both.

## Apify MCP Tools

If MCP server is configured, use these tools for documentation:

- `search-apify-docs` - Search documentation
- `fetch-apify-docs` - Get full doc pages

Otherwise, reference: `@https://mcp.apify.com/`

## Resources

- [docs.apify.com/llms.txt](https://docs.apify.com/llms.txt) - Quick reference
- [docs.apify.com/llms-full.txt](https://docs.apify.com/llms-full.txt) - Complete docs
- [crawlee.dev](https://crawlee.dev) - Crawlee documentation
- [whitepaper.actor](https://raw.githubusercontent.com/apify/actor-whitepaper/refs/heads/master/README.md) - Complete Actor specification
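The key-value store rule in the guide above (each collection needs `key` or `keyPrefix`, but not both) can be checked before deployment. The following is a minimal sketch, not part of the Apify SDK: the `Collection` and `KeyValueStoreSchema` types and the `validateKeyValueStoreSchema` helper are hypothetical names that simply mirror the property lists in the guide.

```typescript
// Hypothetical types mirroring the Collection properties listed in the guide.
interface Collection {
    title: string;
    description?: string;
    key?: string;
    keyPrefix?: string;
    contentTypes?: string[];
    jsonSchema?: Record<string, unknown>;
}

interface KeyValueStoreSchema {
    actorKeyValueStoreSchemaVersion: number;
    title: string;
    description?: string;
    collections: Record<string, Collection>;
}

// Returns a list of human-readable problems; an empty list means these checks passed.
function validateKeyValueStoreSchema(schema: KeyValueStoreSchema): string[] {
    const problems: string[] = [];
    if (schema.actorKeyValueStoreSchemaVersion !== 1) {
        problems.push('actorKeyValueStoreSchemaVersion must be 1');
    }
    if (!schema.title) problems.push('title is required');

    for (const [id, collection] of Object.entries(schema.collections)) {
        if (!collection.title) problems.push(`${id}: title is required`);
        const hasKey = collection.key !== undefined;
        const hasPrefix = collection.keyPrefix !== undefined;
        // Exactly one of `key` / `keyPrefix` must be present.
        if (hasKey === hasPrefix) {
            problems.push(`${id}: specify either key or keyPrefix, but not both`);
        }
    }
    return problems;
}
```

This only covers the required fields and the `key`/`keyPrefix` rule; it does not validate `contentTypes` or `jsonSchema`.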

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:22 AS builder

# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser:myuser package*.json ./

# Install all dependencies. Don't audit to speed up the installation.
RUN npm install --include=dev --audit=false

# Next, copy the source files using the user set
# in the base image.
COPY --chown=myuser:myuser . ./

# Install all dependencies and build the project.
# Don't audit to speed up the installation.
RUN npm run build

# Create final image
FROM apify/actor-node:22

# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser:myuser package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Copy built JS files from builder image
COPY --from=builder --chown=myuser:myuser /usr/src/app/dist ./dist

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY --chown=myuser:myuser . ./

# Run the image.
CMD npm run start:prod --silent

import prettier from 'eslint-config-prettier';

import apify from '@apify/eslint-config/ts.js';
import globals from 'globals';
import tsEslint from 'typescript-eslint';

// eslint-disable-next-line import/no-default-export
export default [
    { ignores: ['**/dist', 'eslint.config.mjs'] },
    ...apify,
    prettier,
    {
        languageOptions: {
            parser: tsEslint.parser,
            parserOptions: {
                project: 'tsconfig.json',
            },
            globals: {
                ...globals.node,
                ...globals.jest,
            },
        },
        plugins: {
            '@typescript-eslint': tsEslint.plugin,
        },
        rules: {
            'no-console': 0,
        },
    },
];

{
    "name": "best-actor-finder",
    "version": "0.1.0",
    "type": "module",
    "description": "Finds and tests the best actors for a specific task.",
    "engines": {
        "node": ">=18.0.0"
    },
    "dependencies": {
        "@ai-sdk/openai": "^2.0.77",
        "ai": "^5.0.108",
        "apify": "^3.4.2",
        "apify-client": "^2.12.1",
        "crawlee": "^3.13.8",
        "openai": "^4.77.0"
    },
    "devDependencies": {
        "@apify/eslint-config": "^1.0.0",
        "@apify/tsconfig": "^0.1.1",
        "@types/node": "^22.15.32",
        "eslint": "^9.29.0",
        "eslint-config-prettier": "^10.1.5",
        "globals": "^16.2.0",
        "prettier": "^3.5.3",
        "tsx": "^4.20.3",
        "typescript": "^5.8.3",
        "typescript-eslint": "^8.34.1"
    },
    "scripts": {
        "start": "npm run start:dev",
        "start:prod": "node dist/main.js",
        "start:dev": "tsx src/main.ts",
        "build": "tsc",
        "lint": "eslint",
        "lint:fix": "eslint --fix",
        "format": "prettier --write .",
        "format:check": "prettier --check .",
        "test": "echo \"Error: oops, the Actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}

{
    "extends": "@apify/tsconfig",
    "compilerOptions": {
        "module": "NodeNext",
        "moduleResolution": "NodeNext",
        "target": "ES2022",
        "outDir": "dist",
        "noUnusedLocals": false,
        "skipLibCheck": true,
        "lib": ["DOM"]
    },
    "include": ["./src/**/*"]
}

{
    "actorSpecification": 1,
    "name": "best-actor-finder",
    "title": "Project Cheerio Crawler Typescript",
    "description": "Crawlee and Cheerio project in typescript.",
    "version": "0.0",
    "buildTag": "latest",
    "meta": {
        "templateId": "ts-crawlee-cheerio",
        "generatedBy": "<FILL-IN-MODEL>"
    },
    "input": "./input_schema.json",
    "output": "./output_schema.json",
    "storages": {
        "dataset": "./dataset_schema.json"
    },
    "dockerfile": "../Dockerfile"
}

{
    "actorSpecification": 1,
    "fields": {},
    "views": {
        "overview": {
            "title": "Overview",
            "transformation": {
                "fields": ["title", "url"]
            },
            "display": {
                "component": "table",
                "properties": {
                    "title": {
                        "label": "Title",
                        "format": "text"
                    },
                    "url": {
                        "label": "URL",
                        "format": "link"
                    }
                }
            }
        }
    }
}

{
    "title": "Actor Scout v2",
    "description": "Find the best Apify Actor by actually testing them",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "query": {
            "title": "What do you need?",
            "type": "string",
            "description": "Describe your use case. Be specific about: what data, from which website, any special requirements.",
            "editor": "textarea",
            "prefill": "I need to scrape Google Maps for restaurant data in NYC, including phone numbers, ratings, and reviews"
        },
        "maxActorsToTest": {
            "title": "Number of Actors to test",
            "type": "integer",
            "description": "More tests = better comparison but higher cost (~$0.05-0.20 per actor)",
            "default": 3,
            "minimum": 2,
            "maximum": 5
        },
        "testTimeout": {
            "title": "Test timeout (seconds)",
            "type": "integer",
            "description": "Max time to wait for each actor's test run",
            "default": 60,
            "minimum": 30,
            "maximum": 120
        }
    },
    "required": ["query"]
}
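The defaults and bounds declared in the input schema above can also be enforced in code, in case the Actor receives input that bypassed schema validation. This is a minimal sketch under that assumption; `normalizeInput` and `clampInt` are hypothetical helper names, and only the field names, defaults (3 and 60), and ranges (2-5 and 30-120) come from the schema.

```typescript
interface Input {
    query: string;
    maxActorsToTest?: number;
    testTimeout?: number;
}

interface NormalizedInput {
    query: string;
    maxActorsToTest: number;
    testTimeout: number;
}

// Falls back to the default when the value is missing or not finite,
// otherwise truncates to an integer and clamps it into [min, max].
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
    if (value === undefined || !Number.isFinite(value)) return fallback;
    return Math.min(max, Math.max(min, Math.trunc(value)));
}

// Hypothetical helper: validates the required field and applies schema defaults and bounds.
function normalizeInput(input: Input | null): NormalizedInput {
    const query = input?.query?.trim();
    if (!query) throw new Error('Query is required');
    return {
        query,
        maxActorsToTest: clampInt(input?.maxActorsToTest, 3, 2, 5),
        testTimeout: clampInt(input?.testTimeout, 60, 30, 120),
    };
}
```

Clamping rather than rejecting out-of-range values is a design choice made for this sketch; failing fast with an error would be equally reasonable.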

{
    "actorOutputSchemaVersion": 1,
    "title": "Output schema of the files scraper",
    "properties": {
        "overview": {
            "type": "string",
            "title": "Overview",
            "template": "{{links.apiDefaultDatasetUrl}}/items?view=overview"
        }
    }
}
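To illustrate what the `template` value above resolves to, here is a rough sketch of `{{...}}` placeholder substitution. It is not Apify's actual implementation: `renderTemplate` and `TemplateContext` are hypothetical, and the dataset ID `abc123` in the usage note is made up. Only the variable name `links.apiDefaultDatasetUrl` and its URL format come from the guide.

```typescript
// Nested string map, e.g. { links: { apiDefaultDatasetUrl: '...' } }.
type TemplateContext = { [key: string]: string | TemplateContext };

// Replaces each {{dotted.path}} with the matching string from the context.
// Placeholders that do not resolve to a string are left untouched.
function renderTemplate(template: string, context: TemplateContext): string {
    return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (placeholder: string, path: string) => {
        let current: string | TemplateContext | undefined = context;
        for (const segment of path.split('.')) {
            if (current === undefined || typeof current === 'string') return placeholder;
            current = current[segment];
        }
        return typeof current === 'string' ? current : placeholder;
    });
}
```

With a context of `{ links: { apiDefaultDatasetUrl: 'https://api.apify.com/v2/datasets/abc123' } }`, the template from the schema would render as `https://api.apify.com/v2/datasets/abc123/items?view=overview`.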

import { Actor, log, ApifyClient } from 'apify';
import { CheerioCrawler } from 'crawlee';
import OpenAI from 'openai';

// ============ TYPES ============
interface Input {
    query: string;
    maxActorsToTest?: number;
    testTimeout?: number;
}

interface ActorCandidate {
    id: string;
    name: string;
    username: string;
    title: string;
    description: string;
    url: string;
    actorId: string; // format: username/name
    stats: {
        totalUsers: number;
        totalRuns: number;
    };
    categories: string[];
    readme?: string;
    pricing?: string;
    inputSchema?: Record<string, any>;
}

interface PreliminaryScore {
    actor: ActorCandidate;
    score: number;
    reasoning: string;
}

interface TestRun {
    actorId: string;
    actorName: string;
    success: boolean;
    input: Record<string, any>;
    output: any[] | null;
    error?: string;
    duration: number;
    itemCount: number;
}

interface FinalResult {
    rank: number;
    actorId: string;
    actorName: string;
    actorUrl: string;
    username: string;
    preliminaryScore: number;
    testResult: {
        success: boolean;
        itemCount: number;
        duration: number;
        sampleOutput: any;
        error?: string;
    };
    finalScore: number;
    verdict: string;
    strengths: string[];
    weaknesses: string[];
}

// ============ CONSTANTS ============
const MODEL = 'google/gemini-2.5-flash';
const MAX_TEST_RESULTS = 3; // Limit results per test run to control cost

// ============ MAIN ============
await Actor.init();

const input = await Actor.getInput<Input>();
if (!input?.query) {
    throw new Error('Query is required');
}

const {
    query,
    maxActorsToTest = 3,
    testTimeout = 60
} = input;

const startTime = Date.now();
const apifyClient = new ApifyClient({ token: process.env.APIFY_TOKEN });

log.info('🔍 Actor Scout v2 - Starting', { query, maxActorsToTest });

// ============ STEP 1: QUERY ANALYSIS ============
log.info('Step 1: Analyzing query...');
const { keywords, intent } = await analyzeQuery(query);
log.info('Query analyzed', { keywords, intent });

// ============ STEP 2: ACTOR DISCOVERY ============
log.info('Step 2: Searching Apify Store...');
const candidates = await searchApifyStore(keywords, 12);
log.info(`Found ${candidates.length} candidates`);

if (candidates.length === 0) {
    await Actor.setValue('OUTPUT', { error: 'No actors found for query', query });
    await Actor.exit();
}

// ============ STEP 3: PRELIMINARY RANKING ============
log.info('Step 3: Preliminary ranking (metadata + README)...');
const detailedCandidates = await fetchActorDetails(candidates, apifyClient);
const preliminaryRanking = await preliminaryRank(detailedCandidates, query, intent);
const topCandidates = preliminaryRanking.slice(0, maxActorsToTest);

log.info('Top candidates for testing:', {
    actors: topCandidates.map(p => p.actor.title)
});

// ============ STEP 4: GENERATE TEST INPUTS ============
log.info('Step 4: Generating test inputs for each actor...');
const testInputs = await generateTestInputs(topCandidates, query, intent);

// ============ STEP 5: PARALLEL TEST RUNS ============
log.info('Step 5: Running parallel tests...');
const testResults = await runParallelTests(
    testInputs,
    apifyClient,
    testTimeout
);

// ============ STEP 6: COMPARE OUTPUTS ============
log.info('Step 6: Comparing outputs with LLM...');
const finalRanking = await compareOutputs(
    testResults,
    query,
    intent,
    preliminaryRanking
);

// ============ STEP 7: FORMAT OUTPUT ============
const duration = ((Date.now() - startTime) / 1000).toFixed(1);
const output = formatFinalOutput(query, finalRanking, duration);

// Save results
await Actor.setValue('OUTPUT', {
    query,
    intent,
    results: finalRanking,
    formattedOutput: output,
    metadata: {
        candidatesFound: candidates.length,
        actorsTested: testResults.length,
        successfulTests: testResults.filter(t => t.success).length,
        duration: `${duration}s`,
        model: MODEL,
    },
});

await Actor.pushData(finalRanking);

console.log('\n' + output);

log.info('✅ Actor Scout complete!', {
    winner: finalRanking[0]?.actorName,
    duration: `${duration}s`
});

await Actor.exit();

// ============ HELPER FUNCTIONS ============

function getOpenAI(): OpenAI {
    return new OpenAI({
        baseURL: 'https://openrouter.apify.actor/api/v1',
        apiKey: 'apify',
        defaultHeaders: {
            Authorization: `Bearer ${process.env.APIFY_TOKEN}`,
        },
    });
}

async function analyzeQuery(query: string): Promise<{ keywords: string[]; intent: string }> {
    const openai = getOpenAI();

    const response = await openai.chat.completions.create({
        model: MODEL,
        messages: [{
            role: 'user',
            content: `Analyze this request for finding an Apify Actor (web scraping tool).

Request: "${query}"

Return JSON:
{
  "keywords": ["keyword1", "keyword2", "keyword3"], // 2-4 search terms
  "intent": "Brief description of what data the user wants and from where"
}

Only return the JSON:`
        }],
        temperature: 0,
    });

    const content = response.choices[0]?.message?.content || '{}';
    try {
        return JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
    } catch {
        return {
            keywords: query.split(' ').filter(w => w.length > 3).slice(0, 4),
            intent: query
        };
    }
}

async function searchApifyStore(keywords: string[], limit: number): Promise<ActorCandidate[]> {
    const searchQuery = keywords.join(' ');
    const url = `https://api.apify.com/v2/store?search=${encodeURIComponent(searchQuery)}&limit=${limit}`;

    const response = await fetch(url, {
        headers: { Authorization: `Bearer ${process.env.APIFY_TOKEN}` },
    });

    if (!response.ok) throw new Error(`Store API error: ${response.status}`);

    const data = await response.json();

    return data.data.items.map((item: any) => ({
        id: item.id,
        name: item.name,
        username: item.username,
        title: item.title || item.name,
        description: item.description || '',
        url: `https://apify.com/${item.username}/${item.name}`,
        actorId: `${item.username}/${item.name}`,
        stats: {
            totalUsers: item.stats?.totalUsers || 0,
            totalRuns: item.stats?.totalRuns || 0,
        },
        categories: item.categories || [],
    }));
}

async function fetchActorDetails(
    actors: ActorCandidate[],
    client: ApifyClient
): Promise<ActorCandidate[]> {
    const detailed: ActorCandidate[] = [];

    for (const actor of actors) {
        try {
            // Fetch actor info including input schema
            const actorInfo = await client.actor(actor.actorId).get();

            detailed.push({
                ...actor,
                readme: actorInfo?.description || actor.description,
                inputSchema: actorInfo?.defaultRunOptions?.build
                    ? undefined
                    : (actorInfo as any)?.inputSchema,
            });
        } catch (e) {
            // If API call fails, use what we have
            detailed.push(actor);
        }
    }

    // Also crawl README pages for more detail
    const readmeMap = new Map<string, string>();

    const crawler = new CheerioCrawler({
        maxRequestsPerCrawl: Math.min(actors.length, 10),
        maxConcurrency: 5,
        requestHandler: async ({ request, $ }) => {
            const readme = $('article').first().text() ||
                $('.markdown-body').first().text() || '';
            readmeMap.set(request.userData.actorId, readme.slice(0, 3000));
        },
        failedRequestHandler: async () => {},
    });

    await crawler.run(
        actors.slice(0, 10).map(a => ({
            url: a.url,
            userData: { actorId: a.actorId },
        }))
    );

    return detailed.map(actor => ({
        ...actor,
        readme: readmeMap.get(actor.actorId) || actor.readme || actor.description,
    }));
}

async function preliminaryRank(
    actors: ActorCandidate[],
    query: string,
    intent: string
): Promise<PreliminaryScore[]> {
    const openai = getOpenAI();

    const actorSummaries = actors.map((a, i) =>
        `${i + 1}. ${a.title} (${a.actorId})
   Users: ${a.stats.totalUsers}, Runs: ${a.stats.totalRuns}
   Description: ${a.description?.slice(0, 200)}
   README excerpt: ${a.readme?.slice(0, 300) || 'N/A'}`
    ).join('\n\n');

    const response = await openai.chat.completions.create({
        model: MODEL,
        messages: [{
            role: 'user',
            content: `You're evaluating Apify Actors for this user request.

USER REQUEST: "${query}"
USER INTENT: ${intent}

CANDIDATE ACTORS:
${actorSummaries}

Score each actor 1-10 based on:
- How well it matches the user's intent
- Documentation quality
- Popularity/trust (user count, runs)
- Likelihood of working correctly

Return JSON array sorted by score (highest first):
[
  {"index": 1, "score": 9, "reasoning": "Best match because..."},
  {"index": 3, "score": 7, "reasoning": "Good but..."},
  ...
]

Only return the JSON array:`
        }],
        temperature: 0.2,
    });

    const content = response.choices[0]?.message?.content || '[]';
    try {
        const rankings = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
        return rankings.map((r: any) => ({
            actor: actors[r.index - 1],
            score: r.score,
            reasoning: r.reasoning,
        })).filter((r: any) => r.actor); // Filter out any invalid indices
    } catch {
        // Fallback: return by user count
        return actors
            .sort((a, b) => b.stats.totalUsers - a.stats.totalUsers)
            .map(actor => ({ actor, score: 5, reasoning: 'Fallback ranking by popularity' }));
    }
}

async function generateTestInputs(
    candidates: PreliminaryScore[],
    query: string,
    intent: string
): Promise<Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }>> {
    const openai = getOpenAI();
    const results: Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }> = [];

    for (const prelim of candidates) {
        const actor = prelim.actor;

        // Try to fetch the actual input schema
        let schemaInfo = 'No schema available - generate reasonable defaults';
        try {
            const schemaUrl = `https://api.apify.com/v2/acts/${actor.actorId}/input-schema`;
            const schemaRes = await fetch(schemaUrl, {
                headers: { Authorization: `Bearer ${process.env.APIFY_TOKEN}` },
            });
            if (schemaRes.ok) {
                const schema = await schemaRes.json();
                schemaInfo = JSON.stringify(schema, null, 2).slice(0, 2000);
            }
        } catch {}

        const response = await openai.chat.completions.create({
            model: MODEL,
            messages: [{
                role: 'user',
                content: `Generate a minimal test input for this Apify Actor.

USER WANTS: "${query}"
INTENT: ${intent}

ACTOR: ${actor.title} (${actor.actorId})
DESCRIPTION: ${actor.description}

INPUT SCHEMA:
${schemaInfo}

Generate a JSON input that:
1. Matches what the user is looking for
2. Uses MINIMAL settings (we want just 1-${MAX_TEST_RESULTS} results to test)
3. Is valid for this actor's schema

Return ONLY the JSON input object (no explanation):`
            }],
            temperature: 0.1,
        });

        const content = response.choices[0]?.message?.content || '{}';
        try {
            const input = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim());
            results.push({ actor, input, prelim });
        } catch {
            // Use a generic minimal input
            results.push({
                actor,
                input: { maxItems: MAX_TEST_RESULTS },
                prelim
            });
        }
    }

    return results;
}

async function runParallelTests(
    testInputs: Array<{ actor: ActorCandidate; input: Record<string, any>; prelim: PreliminaryScore }>,
    client: ApifyClient,
    timeoutSecs: number
): Promise<TestRun[]> {
    const testPromises = testInputs.map(async ({ actor, input }) => {
        const startTime = Date.now();

        try {
            log.info(`Testing: ${actor.title}`, { input });

            // Run the actor
            const run = await client.actor(actor.actorId).call(input, {
                timeout: timeoutSecs,
                memory: 256, // Minimal memory
            });

            // Fetch results from dataset
            const { items } = await client.dataset(run.defaultDatasetId).listItems({
                limit: MAX_TEST_RESULTS,
            });

            const duration = (Date.now() - startTime) / 1000;

            return {
                actorId: actor.actorId,
                actorName: actor.title,
                success: true,
                input,
                output:
items, 446 duration, 447 itemCount: items.length, 448 }; 449 } catch (error: any) { 450 const duration = (Date.now() - startTime) / 1000; 451 452 return { 453 actorId: actor.actorId, 454 actorName: actor.title, 455 success: false, 456 input, 457 output: null, 458 error: error.message || 'Unknown error', 459 duration, 460 itemCount: 0, 461 }; 462 } 463 }); 464 465 return Promise.all(testPromises); 466} 467 468async function compareOutputs( 469 testResults: TestRun[], 470 query: string, 471 intent: string, 472 preliminaryRanking: PreliminaryScore[] 473): Promise<FinalResult[]> { 474 const openai = getOpenAI(); 475 476 // Build comparison prompt 477 const resultSummaries = testResults.map((t, i) => { 478 if (!t.success) { 479 return `${i + 1}. ${t.actorName} - ❌ FAILED 480 Error: ${t.error} 481 Duration: ${t.duration.toFixed(1)}s`; 482 } 483 484 const sampleOutput = JSON.stringify(t.output?.[0] || {}, null, 2).slice(0, 1000); 485 return `${i + 1}. ${t.actorName} - ✅ SUCCESS 486 Items returned: ${t.itemCount} 487 Duration: ${t.duration.toFixed(1)}s 488 Sample output: 489 ${sampleOutput}`; 490 }).join('\n\n'); 491 492 const response = await openai.chat.completions.create({ 493 model: MODEL, 494 messages: [{ 495 role: 'user', 496 content: `Compare these Apify Actor test results for the user's request. 497 498USER REQUEST: "${query}" 499INTENT: ${intent} 500 501TEST RESULTS: 502${resultSummaries} 503 504For each actor, evaluate: 5051. Did it succeed? 5062. Does the output contain what the user needs? 5073. Is the data quality good? 5084. Is it fast enough? 509 510Return JSON array with final ranking: 511[ 512 { 513 "actorName": "Name", 514 "finalScore": 9.5, 515 "verdict": "Best choice because...", 516 "strengths": ["strength1", "strength2"], 517 "weaknesses": ["weakness1"] 518 }, 519 ... 520] 521 522Sort by finalScore descending. 
Only return JSON:` 523 }], 524 temperature: 0.2, 525 }); 526 527 const content = response.choices[0]?.message?.content || '[]'; 528 let llmRankings: any[] = []; 529 530 try { 531 llmRankings = JSON.parse(content.replace(/```json\n?|\n?```/g, '').trim()); 532 } catch { 533 llmRankings = testResults.map(t => ({ 534 actorName: t.actorName, 535 finalScore: t.success ? 7 : 2, 536 verdict: t.success ? 'Test passed' : 'Test failed', 537 strengths: t.success ? ['Completed successfully'] : [], 538 weaknesses: t.success ? [] : [t.error || 'Failed'], 539 })); 540 } 541 542 // Merge test results with LLM rankings 543 return llmRankings.map((llm, index) => { 544 const testResult = testResults.find(t => t.actorName === llm.actorName) || testResults[index]; 545 const prelim = preliminaryRanking.find(p => p.actor.title === llm.actorName); 546 547 return { 548 rank: index + 1, 549 actorId: testResult?.actorId || '', 550 actorName: llm.actorName, 551 actorUrl: `https://apify.com/${testResult?.actorId}`, 552 username: testResult?.actorId?.split('/')[0] || '', 553 preliminaryScore: prelim?.score || 0, 554 testResult: { 555 success: testResult?.success || false, 556 itemCount: testResult?.itemCount || 0, 557 duration: testResult?.duration || 0, 558 sampleOutput: testResult?.output?.[0] || null, 559 error: testResult?.error, 560 }, 561 finalScore: llm.finalScore, 562 verdict: llm.verdict, 563 strengths: llm.strengths || [], 564 weaknesses: llm.weaknesses || [], 565 }; 566 }); 567} 568 569function formatFinalOutput( 570 query: string, 571 results: FinalResult[], 572 duration: string 573): string { 574 const line = '─'.repeat(70); 575 const doubleLine = '═'.repeat(70); 576 577 let output = ` 578${doubleLine} 579 🔍 ACTOR SCOUT v2 - TEST RESULTS 580${doubleLine} 581 582Query: "${query}" 583Tested: ${results.length} actors | Duration: ${duration}s 584 585${line} 586 🏆 WINNER 587${line} 588`; 589 590 if (results.length > 0) { 591 const winner = results[0]; 592 const statusIcon = 
winner.testResult.success ? '✅' : '❌'; 593 594 output += ` 595 ${winner.actorName} 596 ${winner.actorUrl} 597 598 Final Score: ${winner.finalScore}/10 599 Test Status: ${statusIcon} ${winner.testResult.success ? 'PASSED' : 'FAILED'} 600 Items Retrieved: ${winner.testResult.itemCount} 601 Test Duration: ${winner.testResult.duration.toFixed(1)}s 602 603 💡 Verdict: ${winner.verdict} 604 605 ✓ Strengths: 606${winner.strengths.map(s => ` • ${s}`).join('\n')} 607 608 ⚠ Weaknesses: 609${winner.weaknesses.map(w => ` • ${w}`).join('\n') || ' • None identified'} 610`; 611 612 if (winner.testResult.sampleOutput) { 613 output += ` 614 📦 Sample Output: 615${JSON.stringify(winner.testResult.sampleOutput, null, 2).split('\n').map(l => ' ' + l).join('\n').slice(0, 800)} 616`; 617 } 618 } 619 620 output += ` 621${line} 622 📊 ALL RESULTS 623${line} 624 625 Rank Actor Test Score Verdict 626 ──── ───────────────────────────────── ─────── ────── ───────────────── 627`; 628 629 for (const r of results) { 630 const name = r.actorName.length > 33 ? r.actorName.slice(0, 30) + '...' : r.actorName; 631 const status = r.testResult.success ? '✅ Pass' : '❌ Fail'; 632 const verdictShort = r.verdict.slice(0, 20) + (r.verdict.length > 20 ? '...' : ''); 633 output += ` #${r.rank} ${name.padEnd(33)} ${status} ${String(r.finalScore).padStart(4)}/10 ${verdictShort}\n`; 634 } 635 636 // Runner-up details 637 if (results.length > 1) { 638 output += ` 639${line} 640 📝 RUNNER-UP DETAILS 641${line} 642`; 643 for (const r of results.slice(1)) { 644 const statusIcon = r.testResult.success ? '✅' : '❌'; 645 output += ` 646 #${r.rank} ${r.actorName} (${r.finalScore}/10) 647 ${statusIcon} ${r.testResult.success ? 
`Returned ${r.testResult.itemCount} items in ${r.testResult.duration.toFixed(1)}s` : `Failed: ${r.testResult.error}`} 648 ${r.verdict} 649`; 650 } 651 } 652 653 output += ` 654${doubleLine} 655 Actor Scout v2 | Model: ${MODEL} | Tested ${results.length} actors 656${doubleLine} 657`; 658 659 return output; 660}
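
The functions above that call the chat model each repeat the same pattern: strip Markdown code fences from the response, run JSON.parse, and fall back to a default when parsing fails. A minimal sketch of how that pattern could be pulled into one shared helper is shown here. The name parseLlmJson and the FENCE constants are hypothetical and do not appear in the original source; the fence string is built at runtime only so the example contains no literal fence.

```typescript
// Hypothetical helper, not part of the original file: one place for the
// "strip fences, parse JSON, else use a fallback" logic.
const FENCE = '`'.repeat(3);

// Matches an opening fence (optionally tagged "json") or a closing fence.
const FENCE_PATTERN = new RegExp(`${FENCE}(?:json)?\\n?|\\n?${FENCE}`, 'g');

function parseLlmJson<T>(content: string | null | undefined, fallback: T): T {
  // An empty or missing response goes straight to the fallback.
  if (!content) return fallback;
  try {
    return JSON.parse(content.replace(FENCE_PATTERN, '').trim()) as T;
  } catch {
    // The model returned something that is not valid JSON.
    return fallback;
  }
}

// Usage sketch: a fenced response and a malformed response.
const fencedReply = `${FENCE}json\n{"keywords": ["news", "scraper"]}\n${FENCE}`;
const keywordResult = parseLlmJson<{ keywords: string[] }>(fencedReply, { keywords: [] });
const brokenResult = parseLlmJson<unknown[]>('Sorry, I cannot help', []);
```

Each call site would then pass its own fallback value (for example the popularity-based ranking), which keeps the per-function fallback behaviour while removing the duplicated regular expression.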

Makerworld Models Search Scraper

Techcrunch Articles Listing By Keyword

Google News Scraper

Dataset Query Engine

California State Bar Attorney Scraper

Airbnb Experiences Scraper 🎯

IMDB 🎞️ Extractor

Website to RSS Feed Generator

FindLaw Scraper

USA Data.gov U.S. Government's Open Data Scrape

.dockerignore

.editorconfig

.gitignore

.prettierignore

.prettierrc

AGENTS.md

Dockerfile

eslint.config.mjs

package.json

tsconfig.json

.actor/actor.json

.actor/dataset_schema.json

.actor/input_schema.json

.actor/output_schema.json

src/main.ts
