# OpenAI Web Automation (`dtrungtin/openai-web-automation`) Actor

Controls a real browser with an OpenAI model to interact with web pages and extract structured data — no CSS selectors or page-specific scraping code required.

- **URL**: https://apify.com/dtrungtin/openai-web-automation.md
- **Developed by:** [Tin](https://apify.com/dtrungtin) (community)
- **Categories:** AI, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

$60.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
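
As a rough budgeting aid, the listed rate works out to $0.06 per result. The sketch below is only an estimate: it assumes that each billable event corresponds to one result, which this page does not confirm, and `estimate_cost_usd` is a hypothetical helper name.

```python
from decimal import Decimal

# Listed rate from the Pricing section: $60.00 per 1,000 results.
PRICE_PER_1000_RESULTS_USD = Decimal("60.00")


def estimate_cost_usd(results: int) -> Decimal:
    """Estimate the charge for a number of results at the listed rate.

    Assumption: each billable event is one result. Check the Actor's
    pricing details for which events are actually charged.
    """
    if results < 0:
        raise ValueError("results must be non-negative")
    return PRICE_PER_1000_RESULTS_USD * results / 1000


print(estimate_cost_usd(1))    # estimated cost of a single result
print(estimate_cost_usd(250))  # estimated cost of 250 results
```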

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Automate Web Page Using OpenAI

An [Apify Actor](https://apify.com/actors) that controls a real browser with an OpenAI model to interact with web pages and extract structured data — no CSS selectors or page-specific scraping code required.

You give it a URL, tell it what to do (e.g. "search for xiaomi and click the first result"), and describe what data to extract. The agent takes screenshots at each step, sends them to OpenAI, performs the requested actions, and finally saves the extracted data to a dataset.

Built on [Crawlee](https://crawlee.dev/) + Puppeteer and the [Apify SDK](https://docs.apify.com/sdk/js/).

---

### How It Works

1. The browser opens the **Start URL**.
2. The **Interaction Prompt** is sent to the OpenAI model along with a screenshot.
3. The model issues browser actions (click, type, scroll, etc.) and the agent executes them — repeating until the task is done or **Max Interaction Steps** is reached.
4. A final screenshot and the cleaned page HTML are sent to the model.
5. The model extracts the data described in **Data to Extract** and saves it to the dataset.

Each intermediate screenshot is also saved to the key-value store so you can inspect every step the agent took.
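
The loop in steps 2 and 3 can be pictured with the following minimal sketch. It is not the Actor's source code: `run_agent`, `AgentRun`, and the `next_action` callback are hypothetical names, the model is replaced by a scripted stub, and using `None` as the "task is done" signal is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class AgentRun:
    screenshots: List[str] = field(default_factory=list)  # one entry per step
    actions: List[str] = field(default_factory=list)      # actions that were executed


def run_agent(
    next_action: Callable[[str, str], Optional[str]],
    prompt: str,
    max_steps: int = 10,
) -> AgentRun:
    """Ask the (stubbed) model for actions until it is done or max_steps is hit."""
    run = AgentRun()
    for step in range(max_steps):
        screenshot = f"screenshot_{step}"   # stand-in for a real browser capture
        run.screenshots.append(screenshot)
        action = next_action(prompt, screenshot)
        if action is None:                  # assumed "done" signal from the model
            break
        run.actions.append(action)          # stand-in for executing the action
    return run


# A scripted stand-in for the model: two actions, then "done".
script = iter(["type 'xiaomi'", "click first result", None])
result = run_agent(lambda prompt, shot: next(script), "search for xiaomi")
print(result.actions)
```

The `max_steps` cap mirrors the role of **Max Interaction Steps**: if the stub never returns `None`, the loop still stops after that many actions.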

---

### Input

| Field | Type | Required | Description |
|---|---|---|---|
| `startUrls` | array | ✅ | Web pages to open. Each item must have a `url` key. |
| `prompt` | string | ✅ | What the AI should do on the page (clicks, searches, form fills, etc.). |
| `expectedOutput` | string | ✅ | What data to extract from the final page state. |
| `outputSchema` | string (JS function) | — | Zod schema function, passed as a string, for typed, validated output. |
| `openAiModel` | string | — | OpenAI model to use. Default: `gpt-5.4`. |
| `maxSteps` | integer | — | Max browser actions before stopping. Default: `10`. |
| `countryCode` | string | — | Proxy country code (`US`, `DE`, `VN`, `FR`, `GB`). Default: `US`. |
| `proxyConfig` | object | — | Advanced proxy settings (Apify Residential by default). |

#### Basic Example — Gold Price

```json
{
    "startUrls": [{ "url": "https://tradingeconomics.com/" }],
    "prompt": "Find the gold price. Close any popup if it appears.",
    "expectedOutput": "Extract the gold price."
}
````

#### Advanced Example — eBay Search with Typed Schema

```json
{
    "startUrls": [{ "url": "https://www.ebay.com" }],
    "prompt": "Search by keyword 'xiaomi' and click on the first item in the search results. Close any popup if it appears.",
    "expectedOutput": "Extract the title, price and condition of the eBay item.",
    "outputSchema": "(z) => z.object({ title: z.string(), price: z.string(), condition: z.string() })",
    "maxSteps": 15,
    "countryCode": "US"
}
```
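
When inputs are generated programmatically, it can help to check them before starting a run. The helper below is a sketch, not part of the Actor: `build_input` is a hypothetical name, and the `maxSteps` range of 1 to 20 and the country list are taken from the Actor's input schema in the OpenAPI specification.

```python
ALLOWED_COUNTRY_CODES = {"US", "DE", "VN", "FR", "GB"}


def build_input(url, prompt, expected_output, max_steps=10, country_code="US"):
    """Assemble an input object and check it against the documented limits."""
    if not url or not prompt or not expected_output:
        raise ValueError("startUrls, prompt and expectedOutput are required")
    if not 1 <= max_steps <= 20:
        raise ValueError("maxSteps must be between 1 and 20")
    if country_code not in ALLOWED_COUNTRY_CODES:
        raise ValueError(f"countryCode must be one of {sorted(ALLOWED_COUNTRY_CODES)}")
    return {
        "startUrls": [{"url": url}],
        "prompt": prompt,
        "expectedOutput": expected_output,
        "maxSteps": max_steps,
        "countryCode": country_code,
    }


actor_input = build_input(
    "https://www.ebay.com",
    "Search by keyword 'xiaomi' and click on the first item in the search results.",
    "Extract the title, price and condition of the eBay item.",
    max_steps=15,
)
print(actor_input["maxSteps"], actor_input["countryCode"])
```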

### Output

The dataset receives **one record per agent step**, plus one final record with the extracted data.

**Intermediate step record** (written after every browser action):

```json
{
    "url": "https://www.ebay.com/sch/i.html?_nkw=xiaomi",
    "title": "xiaomi items for sale | eBay",
    "screenshotSentToOpenAiUrl": "https://api.apify.com/v2/key-value-stores/xxx/records/screenshot_<uuid>.png"
}
```

**Final record** (written after extraction is complete):

```json
{
    "url": "https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597",
    "title": "Samsung Galaxy S22 - 128 GB - Phantom Black (Unlocked)",
    "screenshotSentToOpenAiUrl": "https://api.apify.com/v2/key-value-stores/xxx/records/final_screenshot_<uuid>.png",
    "data": {
        "title": "Samsung Galaxy S22 - 128GB - Phantom Black (Unlocked)",
        "price": "$156.99",
        "condition": "Very Good – Refurbished"
    }
}
```

So a run with 5 interaction steps will produce 6 dataset records total (5 step records + 1 final record). The `screenshotSentToOpenAiUrl` in each record links directly to the screenshot the model saw at that step, letting you replay exactly what the agent did.
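
Because step records and final records share one dataset, a consumer usually needs to tell them apart. The sketch below does this by checking for the `data` key, which appears only in the final record in the examples above; `split_records` is a hypothetical helper, and treating the presence of `data` as the marker is an assumption based on those examples.

```python
def split_records(items):
    """Separate step records from final records (those carrying a "data" key)."""
    steps = [item for item in items if "data" not in item]
    finals = [item for item in items if "data" in item]
    return steps, finals


# Sample items shaped like the records shown above (screenshot URLs omitted).
items = [
    {
        "url": "https://www.ebay.com/sch/i.html?_nkw=xiaomi",
        "title": "xiaomi items for sale | eBay",
    },
    {
        "url": "https://www.ebay.com/p/3072579174?iid=186372216016&var=694422418597",
        "title": "Samsung Galaxy S22 - 128 GB - Phantom Black (Unlocked)",
        "data": {"price": "$156.99", "condition": "Very Good – Refurbished"},
    },
]

steps, finals = split_records(items)
print(len(steps), "step record(s),", len(finals), "final record(s)")
for record in finals:
    print(record["data"])
```

With several start URLs, a run may produce more than one final record, which is why the helper returns a list rather than a single item.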

### Tips

- **Start simple.** Use a direct product URL and a short prompt first, then add complexity.
- **Be specific in `expectedOutput`.** The more precise your description, the better the extraction (e.g. "Extract the price as a number without the currency symbol" vs "Extract the price").
- **Use `outputSchema`** when you need consistent field types across many pages — Zod will validate and coerce the model's output.
- **Increase `maxSteps`** for multi-page flows (search → click → detail page needs at least 5–10 steps).
- **Prefer fewer, longer runs.** Each Actor startup has overhead; one longer run is more efficient than many short ones.

#### Related Actors

OpenAI Web Scraper `dtrungtin/openai-web-scraper`

#### Epilogue

Thank you for trying my Actor. I would be glad to receive any feedback, which you can send to my email `dtrungtin@gmail.com`.

# Actor input Schema

## `startUrls` (type: `array`):

One or more web pages the browser will open before the AI agent begins interacting. Add multiple URLs to run the same task on several pages in sequence.

## `prompt` (type: `string`):

Instructions for what the AI agent should do on the page — e.g. search for a product, fill a form, or click through a checkout flow. The browser is already open; do not include navigation steps like 'go to URL'.

## `expectedOutput` (type: `string`):

Describe the data you want extracted from the final page state. The AI will read the screenshot and page HTML, then return this data as JSON. Example: 'Extract the product title, price, and stock status.'

## `outputSchema` (type: `string`):

Optional. A JavaScript function that receives the Zod library (`z`) and returns a Zod schema. When provided, the extracted JSON will be validated and shaped to match the schema. Leave as `(z) => null` to use free-form JSON output. See https://zod.dev/ for schema syntax.

## `maxSteps` (type: `integer`):

Maximum number of browser actions (clicks, typing, scrolling, etc.) the AI agent may perform before stopping. Increase for complex multi-step tasks; decrease to limit cost.

## `countryCode` (type: `string`):

Country for the residential proxy exit node. Use this when the target site serves different content by region.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://www.ebay.com"
    }
  ],
  "prompt": "Search by keyword 'xiaomi' and click on the first item in search results. Close any popup if it appears.",
  "expectedOutput": "Extract the title, price and condition of the eBay item.",
  "outputSchema": "(z) => { return null; }",
  "maxSteps": 10,
  "countryCode": "US"
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
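const input = {
    "startUrls": [
        {
            "url": "https://tradingeconomics.com/"
        }
    ],
    // Required fields, taken from the basic example in the README
    "prompt": "Find the gold price. Close any popup if it appears.",
    "expectedOutput": "Extract the gold price.",
    // The input schema types outputSchema as a string, so pass the function as source text
    "outputSchema": "(z) => { return null; }"
};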

// Run the Actor and wait for it to finish
const run = await client.actor("dtrungtin/openai-web-automation").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
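run_input = {
    "startUrls": [{ "url": "https://tradingeconomics.com/" }],
    # Required fields, taken from the basic example in the README
    "prompt": "Find the gold price. Close any popup if it appears.",
    "expectedOutput": "Extract the gold price.",
    "outputSchema": "(z) => { return null; }",
}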

# Run the Actor and wait for it to finish
run = client.actor("dtrungtin/openai-web-automation").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
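echo '{
  "startUrls": [
    {
      "url": "https://tradingeconomics.com/"
    }
  ],
  "prompt": "Find the gold price. Close any popup if it appears.",
  "expectedOutput": "Extract the gold price.",
  "outputSchema": "(z) => { return null; }"
}' |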
apify call dtrungtin/openai-web-automation --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dtrungtin/openai-web-automation",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OpenAI Web Automation",
        "description": "Controls a real browser with an OpenAI model to interact with web pages and extract structured data — no CSS selectors or page-specific scraping code required.",
        "version": "0.0",
        "x-build-id": "458ymbph4YUHm6h8H"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dtrungtin~openai-web-automation/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dtrungtin-openai-web-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dtrungtin~openai-web-automation/runs": {
            "post": {
                "operationId": "runs-sync-dtrungtin-openai-web-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dtrungtin~openai-web-automation/run-sync": {
            "post": {
                "operationId": "run-sync-dtrungtin-openai-web-automation",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls",
                    "prompt",
                    "expectedOutput"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "One or more web pages the browser will open before the AI agent begins interacting. Add multiple URLs to run the same task on several pages in sequence.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "prompt": {
                        "title": "Interaction Prompt",
                        "type": "string",
                        "description": "Instructions for what the AI agent should do on the page — e.g. search for a product, fill a form, or click through a checkout flow. The browser is already open; do not include navigation steps like 'go to URL'.",
                        "default": "Search by keyword 'xiaomi' and click on the first item in search results. Close any popup if it appears."
                    },
                    "expectedOutput": {
                        "title": "Data to Extract",
                        "type": "string",
                        "description": "Describe the data you want extracted from the final page state. The AI will read the screenshot and page HTML, then return this data as JSON. Example: 'Extract the product title, price, and stock status.'",
                        "default": "Extract the title, price and condition of the eBay item."
                    },
                    "outputSchema": {
                        "title": "Output Schema (Zod)",
                        "type": "string",
                        "description": "Optional. A JavaScript function that receives the Zod library (`z`) and returns a Zod schema. When provided, the extracted JSON will be validated and shaped to match the schema. Leave as `(z) => null` to use free-form JSON output. See https://zod.dev/ for schema syntax."
                    },
                    "maxSteps": {
                        "title": "Max Interaction Steps",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Maximum number of browser actions (clicks, typing, scrolling, etc.) the AI agent may perform before stopping. Increase for complex multi-step tasks; decrease to limit cost.",
                        "default": 10
                    },
                    "countryCode": {
                        "title": "Proxy Country",
                        "enum": [
                            "US",
                            "DE",
                            "VN",
                            "FR",
                            "GB"
                        ],
                        "type": "string",
                        "description": "Country for the residential proxy exit node. Use this when the target site serves different content by region.",
                        "default": "US"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
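
For projects that prefer plain HTTP over a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below only builds the request with the Python standard library and does not send it; `build_run_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/dtrungtin~openai-web-automation/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {
        "startUrls": [{"url": "https://tradingeconomics.com/"}],
        "prompt": "Find the gold price. Close any popup if it appears.",
        "expectedOutput": "Extract the gold price.",
    },
)
print(request.get_method(), request.full_url)

# To actually start a run, send the request, for example:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The synchronous endpoint keeps the connection open until the run finishes, so for long tasks the asynchronous `/runs` path may be the better fit.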
