AI Web Task Runner
Pricing
from $50.00 / 1,000 results
Run natural-language browser tasks with Playwright. Extract structured data, follow task-relevant links, capture screenshots, generate reports, and export reusable scripts.
Developer
Solutions Smart
Maintained by Community
What does AI Web Task Runner do?
AI Web Task Runner is an Apify Actor that turns natural-language browser tasks into controlled Playwright automation runs.
It can:
- browse public websites
- follow task-relevant links
- extract structured results
- capture screenshots
- save raw HTML
- generate a human-readable report
- export a reusable Playwright Python script from the successful task trajectory
This Actor is designed for public-web automation, extraction, research, and script generation.
It is not a login bot, spam bot, comment bot, messaging bot, or anti-bot bypass tool.
How it differs from a fixed scraper
Most scrapers are built for one website and one output shape.
AI Web Task Runner is different:
- you describe the task in natural language
- the Actor opens one or more start URLs
- it follows task-relevant public pages
- it records an action trajectory
- it extracts best-effort results even without an LLM
- it can optionally use an LLM for safer planning, schema mapping, and summarization
This makes it useful for a wider class of public-web tasks than a single-purpose scraper, while still staying controlled and safety-constrained.
Main modes
run_task
Default mode.
Use this for general browser-task execution, such as:
- finding features
- locating pricing information
- summarizing a product page
- finding the correct public page for a business task
extract
Use this for structured extraction.
If you provide an extractionSchema, the Actor tries to map observed content into that schema.
research
Use this to browse task-relevant public pages and produce a summary with source URLs.
generate_script
Use this to run a task and export a reusable standalone Playwright Python script based on the successful action trajectory.
audit_lead
Optional compatibility mode.
This preserves the previous lead/contact-audit workflow and outputs company-profile style results for website contact and outreach auditing.
Input examples
Example 1: Pricing extraction
{
  "task": "Find the pricing plans and extract plan name, price, billing period, and main features.",
  "startUrls": [{ "url": "https://example.com" }],
  "mode": "extract",
  "extractionSchema": {
    "plans": [
      {
        "name": "",
        "price": "",
        "billingPeriod": "",
        "features": []
      }
    ]
  },
  "maxPages": 5,
  "captureScreenshots": true
}
Example 2: Research task
{
  "task": "Find what services this company offers and summarize them with source URLs.",
  "startUrls": [{ "url": "https://example.com" }],
  "mode": "research",
  "maxPages": 6,
  "maxDepth": 2,
  "sameDomainOnly": true
}
Example 3: Generate reusable Playwright script
{
  "task": "Open the website, navigate to the pricing page, and extract the pricing table.",
  "startUrls": [{ "url": "https://example.com" }],
  "mode": "generate_script",
  "generateReusableScript": true,
  "maxPages": 5,
  "captureScreenshots": true
}
Example 4: Optional lead audit template
{
  "task": "Audit this website for contact and sales outreach readiness.",
  "startUrls": [{ "url": "https://example.com" }],
  "mode": "audit_lead",
  "maxPages": 5,
  "captureScreenshots": true
}
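The input examples above can also be submitted from code. The following is a minimal sketch using the apify-client Python package. The helper names build_extract_input and run_actor, the APIFY_TOKEN environment variable, and any Actor ID passed to run_actor are illustrative assumptions, not values confirmed by this page.

```python
import os


def build_extract_input(start_url: str, max_pages: int = 5) -> dict:
    """Build an input object equivalent to Example 1 (pricing extraction)."""
    return {
        "task": "Find the pricing plans and extract plan name, price, "
                "billing period, and main features.",
        "startUrls": [{"url": start_url}],
        "mode": "extract",
        "extractionSchema": {
            "plans": [
                {"name": "", "price": "", "billingPeriod": "", "features": []}
            ]
        },
        "maxPages": max_pages,
        "captureScreenshots": True,
    }


def run_actor(actor_id: str, run_input: dict) -> list:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient

    # Assumption: the API token is provided in the APIFY_TOKEN variable.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Calling run_actor with the Actor ID shown in Apify Console and build_extract_input("https://example.com") should return the default dataset rows for that run. Keeping the input builder separate from the network call makes the input easy to inspect before any paid run starts.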
Output record types
For all main modes except audit_lead, the default dataset contains only:
task_result
One final task-level record. This is the main export row and the recommended unit for pricing and CSV export.
Detailed page snapshots and extracted items are stored in the key-value store and referenced from the final task result.
It contains:
- task
- mode
- final status
- pages visited
- steps executed
- summary
- result payload
- confidence
- screenshot keys
- report key
- trajectory key
- generated script key if applicable
- page snapshots key
- extracted items key
audit_lead compatibility output
When mode = audit_lead, the default dataset contains:
company_profile
The older page records are preserved as compatibility artifacts in the key-value store, not as billable default dataset rows.
Key-value store artifacts
For the main task-runner modes, the Actor saves:
- REPORT.html
- TASK_RESULT.json
- TASK_TRAJECTORY.json
- PAGE_SNAPSHOTS.json
- EXTRACTED_ITEMS.json
- GENERATED_SCRIPT_RECORD.json, if enabled
- generated_script.py, if enabled
- generated_script_metadata.json, if enabled
- screenshots, if enabled
- raw HTML, if saveHtml = true
This means the default dataset stays clean and export-friendly, while detailed execution artifacts remain available in the key-value store.
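As a rough illustration of working with these artifacts, the sketch below loads the JSON artifacts from a key-value store client. It assumes the store object behaves like apify-client's KeyValueStoreClient, whose get_record(key) returns None or a dict with a "value" entry. The load_artifacts helper and the ARTIFACT_KEYS constant are hypothetical names; the internal shape of each artifact is not documented here, so the values are returned as-is.

```python
import json
from typing import Any

# Artifact keys listed in this README for the main task-runner modes.
ARTIFACT_KEYS = (
    "TASK_RESULT.json",
    "TASK_TRAJECTORY.json",
    "PAGE_SNAPSHOTS.json",
    "EXTRACTED_ITEMS.json",
)


def load_artifacts(store: Any, keys=ARTIFACT_KEYS) -> dict:
    """Fetch each artifact from the store, skipping keys that do not exist."""
    artifacts = {}
    for key in keys:
        record = store.get_record(key)
        if record is None:
            continue  # artifact was not produced in this run
        value = record["value"]
        # JSON records are usually parsed already; handle raw bytes/text too.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if isinstance(value, str):
            value = json.loads(value)
        artifacts[key] = value
    return artifacts
```

With apify-client, a suitable store object can be obtained as client.key_value_store(run["defaultKeyValueStoreId"]), where run is the dict returned by the Actor call.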
For audit_lead, compatibility artifacts include:
- COMPANY_PROFILES.json
- PAGE_RECORDS.json
- REPORT.html
- run_log.json
Generated Playwright script
When generateReusableScript = true or mode = generate_script, the Actor saves:
- generated_script.py
- generated_script_metadata.json
The generated script:
- is standalone Playwright Python
- is based on the recorded safe action trajectory
- includes comments for the reproduced browser steps
- contains no secrets
- does not rely on arbitrary LLM-generated executable code
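To make the idea of trajectory-based script export concrete, here is a minimal sketch of how a recorded trajectory could be rendered into a standalone Playwright Python script from fixed templates. This is not the Actor's actual generator. The trajectory format (a list of dicts with action, url, text, selector, value, key, and ms fields) and the render_script name are assumptions made for illustration.

```python
def render_script(trajectory: list) -> str:
    """Render a list of recorded steps into Playwright (sync API) source code."""
    lines = [
        "from playwright.sync_api import sync_playwright",
        "",
        "with sync_playwright() as p:",
        "    browser = p.chromium.launch()",
        "    page = browser.new_page()",
    ]
    for step in trajectory:
        action = step["action"]
        # Each action maps to one fixed template; values are inserted with
        # repr() so they are always emitted as quoted string literals.
        if action == "visit_url":
            code = f"page.goto({step['url']!r})"
        elif action == "click_link_text":
            code = f"page.get_by_role('link', name={step['text']!r}).first.click()"
        elif action == "click_css_selector":
            code = f"page.click({step['selector']!r})"
        elif action == "type_text":
            code = f"page.fill({step['selector']!r}, {step['text']!r})"
        elif action == "select_option":
            code = f"page.select_option({step['selector']!r}, {step['value']!r})"
        elif action == "press_key":
            code = f"page.keyboard.press({step['key']!r})"
        elif action == "wait":
            code = f"page.wait_for_timeout({int(step['ms'])})"
        else:
            continue  # e.g. extract_current_page, collect_links, stop
        lines.append(f"    # Step: {action}")
        lines.append(f"    {code}")
    lines.append("    browser.close()")
    return "\n".join(lines) + "\n"
```

Because every emitted line comes from a fixed template, the output contains no model-written code, which matches the constraint that the script does not rely on arbitrary LLM-generated executable code.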
Safety model
The Actor only allows a fixed safe action set:
- visit_url
- click_link_text
- click_css_selector
- type_text
- press_key
- select_option
- wait
- extract_current_page
- collect_links
- stop
The Actor does not allow:
- arbitrary Python execution
- shell execution
- unrestricted JavaScript execution
- login/cookie automation in V1
- posting, commenting, or messaging automation
- paywall bypass
- CAPTCHA bypass
- destructive actions
- purchases or form submissions that change state
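A fixed action set like this is typically enforced with an allow-list check before any step runs. The sketch below shows one way to do that. The action names come from this README, while the step format (a dict with an action field), the validate_action name, and the http(s)-only rule for visit_url are assumptions for illustration.

```python
from urllib.parse import urlparse

# Action names taken from the safe action set listed in this README.
ALLOWED_ACTIONS = frozenset({
    "visit_url", "click_link_text", "click_css_selector", "type_text",
    "press_key", "select_option", "wait", "extract_current_page",
    "collect_links", "stop",
})


def validate_action(step: dict) -> dict:
    """Return the step unchanged if it is allowed, otherwise raise ValueError."""
    action = step.get("action")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not allowed: {action!r}")
    if action == "visit_url":
        # Assumption: only http(s) URLs count as public-web navigation.
        scheme = urlparse(step.get("url", "")).scheme
        if scheme not in ("http", "https"):
            raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return step
```

Rejecting anything outside the allow-list, rather than trying to block known-bad actions, means new or unexpected action names fail closed.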
Deterministic vs LLM-assisted mode
Deterministic mode
If llmProvider = none, the Actor still works.
It will:
- open start URLs
- collect page snapshots
- infer task keywords
- follow obvious task-relevant links within limits
- extract visible text, headings, links, tables, prices, emails, phones, and structured candidates
- produce a best-effort task result
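The keyword-inference and link-following steps above can be sketched roughly as follows. This is a simplified illustration of the general technique, not the Actor's actual heuristics. The stopword list, the scoring rule, and the function names infer_keywords and rank_links are all assumptions.

```python
import re

# Assumption: a small hand-picked stopword list; a real one would be larger.
STOPWORDS = frozenset({
    "the", "and", "for", "find", "with", "this", "that", "them", "what",
    "from", "into", "extract",
})


def infer_keywords(task: str) -> set:
    """Pick lowercase task words that are longer than 2 letters and not stopwords."""
    words = re.findall(r"[a-z]+", task.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def rank_links(task: str, links: list, limit: int = 5) -> list:
    """Rank (text, url) links by how many task keywords they mention.

    Links with no keyword match are dropped; ties keep their page order.
    """
    keywords = infer_keywords(task)
    scored = []
    for text, url in links:
        haystack = f"{text} {url}".lower()
        score = sum(1 for kw in keywords if kw in haystack)
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: -pair[0])  # stable sort, best score first
    return [url for _, url in scored[:limit]]
```

The limit parameter plays the role of the "within limits" constraint: it caps how many candidate links are followed from a page regardless of how many match.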
LLM-assisted mode
If an LLM provider is configured, the Actor may use the model for:
- safe action planning
- choosing the next allowed action
- mapping page snapshots into extractionSchema
- summarizing findings
The LLM is constrained to structured JSON outputs, and each output is validated before use.
If the LLM fails, the Actor falls back to deterministic behavior.
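The validate-then-fall-back pattern described above can be sketched like this. The reply format (a JSON object with an action field) and the choose_next_action name are assumptions; the point is only that a malformed or disallowed reply never reaches the browser and the deterministic choice is used instead.

```python
import json

# Action names taken from the safe action set listed in this README.
ALLOWED_ACTIONS = frozenset({
    "visit_url", "click_link_text", "click_css_selector", "type_text",
    "press_key", "select_option", "wait", "extract_current_page",
    "collect_links", "stop",
})


def choose_next_action(llm_reply, fallback: dict) -> dict:
    """Use the LLM's proposed action only if it parses and is allowed.

    `fallback` is the action the deterministic logic would take anyway.
    """
    try:
        proposal = json.loads(llm_reply)
    except (json.JSONDecodeError, TypeError):
        return fallback  # not valid JSON, or no reply at all
    if not isinstance(proposal, dict):
        return fallback  # valid JSON, but not an object
    if proposal.get("action") not in ALLOWED_ACTIONS:
        return fallback  # unknown or disallowed action
    return proposal
```

For example, a reply of '{"action": "collect_links"}' would be accepted, while free-form text, a missing reply, or an action such as "run_shell" would all result in the fallback being returned.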
Optional lead-audit template
The older lead/contact-auditor behavior is still available in:
"mode": "audit_lead"
That mode keeps:
- lead-audit heuristics
- page-level contact findings
- company profile aggregation
- compatibility outputs for outreach workflows
It is now an optional template mode, not the main identity of the Actor.
Example use cases
- Extract pricing plans from a SaaS website
- Extract a table from a public webpage
- Research product features from a company website
- Find contact or sales paths from a public website
- Summarize same-domain public pages related to a topic
- Generate a reusable Playwright script for a repeated browser task
Limitations
- This Actor is intended for public websites.
- It does not support login-heavy or state-changing automation in V1.
- Deterministic extraction is heuristic and best-effort.
- Some websites hide key data behind scripts, forms, or client-side UI patterns.
- LLM mode can improve planning and summarization, but it is still constrained and fallback-safe.
- It is not an anti-bot bypass product.
Troubleshooting
I only got the homepage
Increase maxDepth or start from a more relevant public page.
The final result is incomplete
Increase:
- maxPages
- maxDepth
- timeoutSeconds
and consider using extract mode with an extractionSchema.
The Actor did not visit the page I expected
Check:
- TASK_TRAJECTORY.json
- PAGE_SNAPSHOTS.json
- EXTRACTED_ITEMS.json
- REPORT.html
These artifacts show what the Actor actually saw and did.
I want the older contact-audit behavior
Use:
"mode": "audit_lead"
The Actor does not submit forms or log in
That is expected in V1. The safety model deliberately avoids state-changing automation.