AI Web Task Runner

Pricing

from $50.00 / 1,000 results

Run natural-language browser tasks with Playwright. Extract structured data, follow task-relevant links, capture screenshots, generate reports, and export reusable scripts.

Developer

Solutions Smart

Maintained by Community

What does AI Web Task Runner do?

AI Web Task Runner is an Apify Actor that turns natural-language browser tasks into controlled Playwright automation runs.

It can:

  • browse public websites
  • follow task-relevant links
  • extract structured results
  • capture screenshots
  • save raw HTML
  • generate a human-readable report
  • export a reusable Playwright Python script from the successful task trajectory

This Actor is designed for public-web automation, extraction, research, and script generation.

It is not a login bot, spam bot, comment bot, messaging bot, or anti-bot bypass tool.

How it differs from a fixed scraper

Most scrapers are built for one website and one output shape.

AI Web Task Runner is different:

  • you describe the task in natural language
  • the Actor opens one or more start URLs
  • it follows task-relevant public pages
  • it records an action trajectory
  • it extracts best-effort results even without an LLM
  • it can optionally use an LLM for safer planning, schema mapping, and summarization

This makes it useful for a wider class of public-web tasks than a single-purpose scraper, while remaining controlled and safety-constrained.

Main modes

run_task

Default mode.

Use this for general browser-task execution, such as:

  • finding features
  • locating pricing information
  • summarizing a product page
  • finding the correct public page for a business task

extract

Use this for structured extraction.

If you provide an extractionSchema, the Actor tries to map observed content into that schema.

research

Use this to browse task-relevant public pages and produce a summary with source URLs.

generate_script

Use this to run a task and export a reusable standalone Playwright Python script based on the successful action trajectory.

audit_lead

Optional compatibility mode.

This preserves the previous lead/contact-audit workflow and outputs company-profile style results for website contact and outreach auditing.

Input examples

Example 1: Pricing extraction

{
  "task": "Find the pricing plans and extract plan name, price, billing period, and main features.",
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "mode": "extract",
  "extractionSchema": {
    "plans": [
      {
        "name": "",
        "price": "",
        "billingPeriod": "",
        "features": []
      }
    ]
  },
  "maxPages": 5,
  "captureScreenshots": true
}

Example 2: Research task

{
  "task": "Find what services this company offers and summarize them with source URLs.",
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "mode": "research",
  "maxPages": 6,
  "maxDepth": 2,
  "sameDomainOnly": true
}
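The maxDepth and sameDomainOnly limits in the example above can be pictured with a small link filter. This is a minimal sketch, not the Actor's actual code: the function names, the rule that a leading "www." is ignored, and the rule that links found at the maximum depth are not followed are all assumptions made for illustration.

```python
from urllib.parse import urljoin, urlparse


def normalize_host(url: str) -> str:
    """Lower-case the hostname and drop a leading 'www.' (an assumed rule)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def allowed_links(start_url, page_url, hrefs, depth, max_depth=2, same_domain_only=True):
    """Return the absolute links a crawler with these limits might still follow."""
    if depth >= max_depth:
        return []  # assumed: links found at the deepest level are not followed
    start_host = normalize_host(start_url)
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, and similar
        if same_domain_only and normalize_host(absolute) != start_host:
            continue  # off-domain link
        if absolute not in kept:
            kept.append(absolute)  # de-duplicate while keeping order
    return kept
```

With a start URL of https://example.com, a relative link such as /services found on a depth-1 page would be kept, while a link to another domain or a mailto: address would be dropped.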

Example 3: Generate reusable Playwright script

{
  "task": "Open the website, navigate to the pricing page, and extract the pricing table.",
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "mode": "generate_script",
  "generateReusableScript": true,
  "maxPages": 5,
  "captureScreenshots": true
}

Example 4: Optional lead audit template

{
  "task": "Audit this website for contact and sales outreach readiness.",
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "mode": "audit_lead",
  "maxPages": 5,
  "captureScreenshots": true
}
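Inputs like the ones above can also be submitted from Python. The sketch below uses the apify-client package (ApifyClient, actor(...).call(run_input=...), and dataset(...).iterate_items()). The helper names are made up for this example, the token is assumed to be in an APIFY_TOKEN environment variable, and the Actor ID must be supplied by the caller because this page does not state it.

```python
import os


def build_run_input(task, start_url, mode="run_task", **options):
    """Assemble an input dict using the field names shown in the examples."""
    run_input = {
        "task": task,
        "startUrls": [{"url": start_url}],
        "mode": mode,
    }
    run_input.update(options)  # e.g. maxPages, maxDepth, captureScreenshots
    return run_input


def run_actor(actor_id, run_input):
    """Start the Actor, wait for it to finish, and return the dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env variable
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, build_run_input("Find the pricing plans", "https://example.com", mode="extract", maxPages=5) produces a dict that can be passed to run_actor together with the Actor's ID as shown in the Apify Console.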

Output record types

For all main modes except audit_lead, the default dataset contains only:

task_result

One final task-level record. This is the main export row and the recommended unit for pricing and CSV export.

Detailed page snapshots and extracted items are stored in the key-value store and referenced from the final task result.

The task_result record contains:

  • task
  • mode
  • final status
  • pages visited
  • steps executed
  • summary
  • result payload
  • confidence
  • screenshot keys
  • report key
  • trajectory key
  • generated script key if applicable
  • page snapshots key
  • extracted items key

audit_lead compatibility output

When mode = audit_lead, the default dataset contains:

  • company_profile

The older page records are preserved as compatibility artifacts in the key-value store, not as billable default dataset rows.

Key-value store artifacts

For the main task-runner modes, the Actor saves:

  • REPORT.html
  • TASK_RESULT.json
  • TASK_TRAJECTORY.json
  • PAGE_SNAPSHOTS.json
  • EXTRACTED_ITEMS.json
  • GENERATED_SCRIPT_RECORD.json if enabled
  • generated_script.py if enabled
  • generated_script_metadata.json if enabled
  • screenshots if enabled
  • raw HTML if saveHtml = true

This means the default dataset stays clean and export-friendly, while detailed execution artifacts remain available in the key-value store.
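The JSON artifacts listed above could be pulled from a finished run along these lines. This is a sketch under assumptions: load_artifacts is a hypothetical helper, and store is expected to behave like the apify-client key-value store client, whose get_record(key) returns a dict with a "value" entry or None when the key is missing.

```python
import json

# Artifact names taken from the list above.
ARTIFACT_KEYS = [
    "TASK_RESULT.json",
    "TASK_TRAJECTORY.json",
    "PAGE_SNAPSHOTS.json",
    "EXTRACTED_ITEMS.json",
]


def load_artifacts(store, keys=ARTIFACT_KEYS):
    """Fetch each artifact from the store; missing keys map to None."""
    artifacts = {}
    for key in keys:
        record = store.get_record(key)
        value = record["value"] if record else None
        if isinstance(value, (str, bytes)):
            value = json.loads(value)  # tolerate records stored as raw text
        artifacts[key] = value
    return artifacts
```

With apify-client, a suitable store object would typically come from ApifyClient(token).key_value_store(run["defaultKeyValueStoreId"]), where run is the run object returned when the Actor was started.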

For audit_lead, compatibility artifacts include:

  • COMPANY_PROFILES.json
  • PAGE_RECORDS.json
  • REPORT.html
  • run_log.json

Generated Playwright script

When generateReusableScript = true or mode = generate_script, the Actor saves:

  • generated_script.py
  • generated_script_metadata.json

The generated script:

  • is standalone Playwright Python
  • is based on the recorded safe action trajectory
  • includes comments for the reproduced browser steps
  • contains no secrets
  • does not rely on arbitrary LLM-generated executable code

Safety model

The Actor only allows a fixed safe action set:

  • visit_url
  • click_link_text
  • click_css_selector
  • type_text
  • press_key
  • select_option
  • wait
  • extract_current_page
  • collect_links
  • stop

The Actor does not allow:

  • arbitrary Python execution
  • shell execution
  • unrestricted JavaScript execution
  • login/cookie automation in V1
  • posting, commenting, or messaging automation
  • paywall bypass
  • CAPTCHA bypass
  • destructive actions
  • purchases or form submissions that change state
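A fixed action set like the one above is typically enforced with an allowlist check before anything reaches the browser. The following is a minimal sketch: the action names come from the list above, but the parameter names for each action and the validate_action helper are hypothetical.

```python
# Action names are from the safe action set; parameter names are assumed.
SAFE_ACTIONS = {
    "visit_url": {"url"},
    "click_link_text": {"text"},
    "click_css_selector": {"selector"},
    "type_text": {"selector", "text"},
    "press_key": {"key"},
    "select_option": {"selector", "value"},
    "wait": {"seconds"},
    "extract_current_page": set(),
    "collect_links": set(),
    "stop": set(),
}


def validate_action(action: dict) -> dict:
    """Return the action unchanged if it is allowed, else raise ValueError."""
    name = action.get("action")
    if name not in SAFE_ACTIONS:
        raise ValueError(f"action not allowed: {name!r}")
    params = set(action) - {"action"}
    if params != SAFE_ACTIONS[name]:
        raise ValueError(f"unexpected parameters for {name}: {sorted(params)}")
    return action
```

Anything outside the table, such as a shell command or a script-evaluation request, is rejected before execution instead of being filtered afterwards.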

Deterministic vs LLM-assisted mode

Deterministic mode

If llmProvider = none, the Actor still works.

It will:

  • open start URLs
  • collect page snapshots
  • infer task keywords
  • follow obvious task-relevant links within limits
  • extract visible text, headings, links, tables, prices, emails, phones, and structured candidates
  • produce a best-effort task result
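The deterministic steps above can be approximated with plain string heuristics. The sketch below is a simplified illustration rather than the Actor's implementation: the stopword list, the scoring rule, and the regular expressions are all assumptions chosen to keep the example short.

```python
import re

STOPWORDS = {"the", "and", "for", "with", "this", "that", "them", "find", "what"}
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:[.,]\d{2})?")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def infer_keywords(task):
    """Pick distinct lower-case words from the task, skipping short and stop words."""
    keywords = []
    for word in re.findall(r"[a-z]+", task.lower()):
        if len(word) > 2 and word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


def rank_links(links, keywords):
    """Order (text, url) pairs by how many keywords appear in the text or URL."""
    scored = []
    for text, url in links:
        haystack = f"{text} {url}".lower()
        score = sum(1 for kw in keywords if kw in haystack)
        if score:
            scored.append((score, url))
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep page order
    return [url for _, url in scored]


def extract_candidates(text):
    """Collect price-like and email-like strings from visible page text."""
    return {"prices": PRICE_RE.findall(text), "emails": EMAIL_RE.findall(text)}
```

Heuristics of this kind are cheap and predictable, which is also why the results are best-effort: a price written as "29 USD/mo" would slip past the pattern shown here.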

LLM-assisted mode

If an LLM provider is configured, the Actor may use the model for:

  • safe action planning
  • choosing the next allowed action
  • mapping page snapshots into extractionSchema
  • summarizing findings

The LLM's output is constrained to structured JSON and is validated before use.

If the LLM fails, the Actor falls back to deterministic behavior.
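The validate-then-fall-back behavior described above might look like the following. This is a sketch under assumptions: next_action is a hypothetical helper, the reply is assumed to be a JSON object with an "action" field, and the fallback is whatever step the deterministic planner would have chosen.

```python
import json

# Names from the Actor's safe action set.
ALLOWED = {
    "visit_url", "click_link_text", "click_css_selector", "type_text",
    "press_key", "select_option", "wait", "extract_current_page",
    "collect_links", "stop",
}


def next_action(llm_reply, fallback):
    """Use the LLM proposal only if it is valid JSON naming an allowed action."""
    try:
        proposal = json.loads(llm_reply)
    except (TypeError, ValueError):
        return fallback  # not JSON at all (or no reply)
    if not isinstance(proposal, dict) or proposal.get("action") not in ALLOWED:
        return fallback  # JSON, but not an allowed action
    return proposal
```

A free-text reply, a timeout that yields no reply, or a proposal for an action outside the allowed set all lead to the same deterministic step.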

Optional lead-audit template

The older lead/contact-auditor behavior is still available in:

"mode": "audit_lead"

That mode keeps:

  • lead-audit heuristics
  • page-level contact findings
  • company profile aggregation
  • compatibility outputs for outreach workflows

It is now an optional template mode, not the main identity of the Actor.

Example use cases

  • Extract pricing plans from a SaaS website
  • Extract a table from a public webpage
  • Research product features from a company website
  • Find contact or sales paths from a public website
  • Summarize same-domain public pages related to a topic
  • Generate a reusable Playwright script for a repeated browser task

Limitations

  • This Actor is intended for public websites.
  • It does not support login-based or state-changing automation in V1.
  • Deterministic extraction is heuristic and best-effort.
  • Some websites hide key data behind scripts, forms, or client-side UI patterns.
  • LLM mode can improve planning and summarization, but it remains limited to the safe action set and falls back to deterministic behavior if the LLM fails.
  • It is not an anti-bot bypass product.

Troubleshooting

I only got the homepage

Increase maxDepth or start from a more relevant public page.

The final result is incomplete

Increase:

  • maxPages
  • maxDepth
  • timeoutSeconds

and consider using extract mode with an extractionSchema.

The Actor did not visit the page I expected

Check:

  • TASK_TRAJECTORY.json
  • PAGE_SNAPSHOTS.json
  • EXTRACTED_ITEMS.json
  • REPORT.html

These artifacts show what the Actor actually saw and did.

I want the older contact-audit behavior

Use:

"mode": "audit_lead"

The Actor did not submit a form or log in

That is expected in V1. The safety model deliberately avoids state-changing automation.