# AI Web Scraper Test (`apify/ai-web-scraper-test`) Actor

AI-first web scraper that extracts structured data from any website using natural-language prompts.
No programming knowledge required.
No hard-coded logic that breaks when a website changes.

- **URL**: https://apify.com/apify/ai-web-scraper-test.md
- **Developed by:** [Apify](https://apify.com/apify) (community)
- **Categories:** AI
- **Stats:** 2 total users, 1 monthly users, 69.2% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $25.00 / 1,000 page extractions

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases with higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
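
As a rough illustration of the pay-per-event arithmetic, the sketch below estimates the charge for a given number of page extractions. The function name `estimate_cost_usd` is hypothetical, and the default rate is the listed starting price of $25.00 per 1,000 page extractions. The rate you actually pay depends on your subscription plan.

```python
from decimal import Decimal

# Listed starting price from this page; the rate on your plan may differ.
PRICE_PER_1000_PAGES_USD = Decimal("25.00")


def estimate_cost_usd(page_extractions: int,
                      price_per_1000: Decimal = PRICE_PER_1000_PAGES_USD) -> Decimal:
    """Estimate the charge for a number of page-extraction events."""
    if page_extractions < 0:
        raise ValueError("page_extractions must be non-negative")
    # Scale the per-1,000 price linearly and round to whole cents.
    return (price_per_1000 * page_extractions / 1000).quantize(Decimal("0.01"))
```

For example, `estimate_cost_usd(40)` gives `Decimal("1.00")`, so 40 pages at the listed rate cost about one dollar.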

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## AI Web Scraper

AI-first web scraper that extracts structured data from any website using natural-language prompts. 
No programming knowledge required.
No hard-coded logic that breaks when a website changes.

### What is AI Web Scraper?

This Actor combines web scraping with large language model (LLM) technologies. It visits all the URLs you add to the **Start URLs** list and uses the **Page extraction prompt** to extract the data you need from each page.

This scraper "sees" a website like a human does, so you can describe what you want in plain language. Using LLMs also makes the scraper resilient to website changes. While traditional scrapers rely on hard-coded logic, the AI Web Scraper adapts automatically.

While you focus on the prompt, the Actor handles the technical heavy lifting:

- **Browser emulation:** Full support for dynamic, JavaScript-heavy websites.
- **Smart anti-blocking:** Integrated proxy pools and browser fingerprinting to access any website.
- **LLM integration:** No external LLM subscription required. AI tokens are included in the Actor cost.

**Note:** If you don't provide a page extraction prompt, the Actor returns the content of each page as Markdown.

### How to use this Actor

1. Click **Try for free** in the top-right corner.
2. Set up the input (see below).
3. Click **Save & Start**.
4. Wait a few seconds and your data will be ready in the **Output** tab.

#### Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startUrls` | `array` | Yes | - | URLs to start from. |
| `prompt` | `string` | No | `""` | Extraction instruction in natural language. This prompt runs on every page. |
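
If you prepare the input in code, a small helper can turn plain URL strings into the structure described in the table. This is a minimal sketch: `build_actor_input` is a hypothetical name, and rejecting an empty URL list or non-HTTP(S) URLs is an assumption about what the Actor accepts, not documented behavior.

```python
from urllib.parse import urlparse


def build_actor_input(urls: list[str], prompt: str = "") -> dict:
    """Build the Actor input from plain URL strings and an optional prompt."""
    if not urls:
        # Assumption: a required `startUrls` field should hold at least one URL.
        raise ValueError("startUrls is required and must not be empty")
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute HTTP(S) URL: {url!r}")
    # Each start URL is an object with a `url` key; an empty prompt means
    # the Actor returns the page content as Markdown.
    return {"startUrls": [{"url": url} for url in urls], "prompt": prompt}
```

The returned dictionary can be passed as the run input to the Python client or serialized to JSON for the CLI.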

#### How to write a good prompt

A well-written prompt is key to getting good results with this Actor. The examples below are based on [Apify Store](https://apify.com/store).

**Be specific about what data you want:**

```
✅ Good: Extract all Apify Actors from this page. For each Actor, save its name and description.
❌ Bad: Extract all Actor information.
```

**Avoid using colors to describe elements:**

```
✅ Good: Get the link in the "Go to Console" button.
❌ Bad: Get the link in the black button.
```

**Be specific about element location - use "left", "right", "below", and "above":**

```
✅ Good: Get the list of Actors below the $1M Challenge picks section.
❌ Bad: Get the list of $1M Challenge picks Actors.
```

#### Schedule recurring scrapes

To schedule regular data extraction, use the Apify built-in scheduler.

[How to schedule an Actor?](https://www.youtube.com/watch?v=1jI7WcVQmwM)

#### Using low-code tools like n8n

You can embed this Actor in your automation workflow using low-code tools like n8n. The Apify platform integrates with Zapier, Make, n8n, Google Sheets, Google Drive, and many others.

You can also use [webhooks](https://docs.apify.com/platform/integrations/webhooks) to trigger actions automatically when a run finishes.

### Why use the AI Web Scraper?

#### Get structured data without custom development

You don't need to know what a CSS selector is. The AI handles that for you. Just provide a prompt in plain language.

#### Use one prompt for multiple websites

A traditional scraper requires custom code for every page. With AI Web Scraper, you can reuse the same prompt across multiple websites.

For example, to extract the title, author, and publication date of blog posts on different sites:

```json
{
  "startUrls": [
    { "url": "https://blog.apify.com/web-scraping-report-2026/" },
    { "url": "https://crawlee.dev/blog/crawlee-for-python-v1" }
  ],
  "prompt": "Return the blog post name, author name, and publication date."
}
```

Expected output:

```json
[
  {
    "url": "https://blog.apify.com/web-scraping-report-2026/",
    "data": {
      "blog_post_name": "State of web scraping report 2026",
      "author_name": "Theo Vasilis",
      "publication_date": "Jan 29, 2026"
    }
  },
  {
    "url": "https://crawlee.dev/blog/crawlee-for-python-v1",
    "data": {
      "blog_post_name": "Crawlee for Python v1",
      "author_name": "Vlada Dusek",
      "publication_date": "September 15, 2025"
    }
  }
]
```
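
To load results like these into a spreadsheet or table, you may want one flat row per page. The sketch below assumes each dataset item has the `url` and `data` keys shown in the output above; `flatten_results` is a hypothetical helper, and the extracted field names depend on your prompt.

```python
def flatten_results(items: list[dict]) -> list[dict]:
    """Merge each item's `data` fields with its `url` into one flat row."""
    rows = []
    for item in items:
        data = item.get("data")
        # Assumption: `data` is a JSON object. Items where it is missing or
        # has another type are skipped rather than guessed at.
        if not isinstance(data, dict):
            continue
        # Put the page URL last so it wins over any extracted `url` field.
        rows.append({**data, "url": item.get("url")})
    return rows
```

Because different sites can yield different field names for the same prompt, check the keys of the flattened rows before writing them to a fixed-column format such as CSV.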

#### Typical use cases

AI Web Scraper works best on websites with varied page structures, where building a traditional scraper would be too expensive:

- Blogs
- E-commerce websites
- Real estate listings
- Job boards

It's also a great fit for monitoring websites that update frequently. For example, you can track a competitor's pricing page that gets redesigned every few weeks.

### AI Web Scraper and an MCP server

With the Apify API, you can use almost any Actor with a Model Context Protocol (MCP) server. You can connect using clients like Claude Desktop and LibreChat, or build your own. Read more about how to [set up Apify Actors with MCP](https://blog.apify.com/how-to-use-mcp/).

For AI Web Scraper, go to the [MCP tab](https://apify.com/apify/ai-scraper/api/mcp) and follow these steps:

1. Start a Server-Sent Events (SSE) session to receive a `sessionId`.
2. Send an API message using that `sessionId` to trigger the scraper. The message starts AI Web Scraper with the provided input.
3. Confirm that you receive an `Accepted` response.

### FAQ

#### Why choose AI Web Scraper over a traditional scraper?

Here's a quick comparison with Cheerio Scraper and Playwright Scraper:

| | AI Web Scraper | [Cheerio Scraper](https://apify.com/apify/cheerio-scraper) | [Playwright Scraper](https://apify.com/apify/playwright-scraper) |
|---|---|---|---|
| Requires programming skills | No | Yes | Yes |
| Adapts to website changes | Yes | No | No |
| Reads JavaScript and dynamic content | Yes | No | Yes |
| Proxy pool and anti-blocking | Yes | Yes | Yes |
| Cost per run | $$$ | $ | $$ |

#### Can I control the crawling behavior?

AI Web Scraper doesn't currently support pagination logic. You can provide multiple start URLs instead.

**Pro tip:** Chain two Actors together - use one to extract links and a second to extract data from each page.
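
The hand-off between the two Actors can be sketched as a small conversion step: take the dataset items from the link-extracting run and turn them into `startUrls` for AI Web Scraper. The helper name `links_to_start_urls` and the `link_field` default are hypothetical; the real field name depends on which Actor you use to collect links.

```python
def links_to_start_urls(items: list[dict], link_field: str = "url") -> list[dict]:
    """Convert dataset items from a link-extracting run into `startUrls`."""
    seen = set()
    start_urls = []
    for item in items:
        link = item.get(link_field)
        # Skip items without an absolute HTTP(S) link.
        if not isinstance(link, str) or not link.startswith(("http://", "https://")):
            continue
        # Drop duplicates while keeping the original order.
        if link in seen:
            continue
        seen.add(link)
        start_urls.append({"url": link})
    return start_urls
```

Deduplicating before the second run matters here because each page extraction is a billed event.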

#### Do I need a ChatGPT subscription?

No. AI tokens are included in the Actor cost. No external setup needed.

#### Can I use proxies?

Yes. This Actor uses [Apify Proxy](https://apify.com/proxy) automatically.

#### How do I access and export the scraped data?

Scraped results are stored in a dataset. You can export it in JSON, XML, CSV, or Excel format.

Download results via the Apify API or Apify Console. You can also push data to tools like Make, n8n, or Zapier using the available integrations.
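
When downloading via the API, the export format is selected with the `format` query parameter of the dataset items endpoint. The sketch below only builds the download URL; `dataset_export_url` is a hypothetical helper, and it assumes you authenticate the request separately with your API token.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# The formats listed above, as `format` parameter values (Excel is `xlsx`).
EXPORT_FORMATS = ("json", "xml", "csv", "xlsx")


def dataset_export_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the URL for downloading a dataset's items in a given format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"Unsupported format {fmt!r}; use one of {EXPORT_FORMATS}")
    query = urlencode({"format": fmt})
    return f"{API_BASE}/datasets/{dataset_id}/items?{query}"
```

The dataset ID is the `defaultDatasetId` of the finished run.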

#### Which scraping tool is best for beginners?

If you don't have programming skills, an AI scraper is the best starting point. AI Web Scraper lets you extract structured data from any website using a plain-language prompt.

For a more technical introduction to web scraping, check out [Apify Academy](https://docs.apify.com/academy).

#### What is Stagehand?

Stagehand is the AI browser automation framework used by this Actor. It brings natural language to web scraping - instead of working with CSS selectors, you describe the web element you're looking for in plain language.

Stagehand is fully [compatible with Playwright](https://docs.stagehand.dev/v3/integrations/playwright#playwright), so you can add an AI layer to existing Playwright scripts. It's also integrated with the [Crawlee library](https://crawlee.dev/js/docs/guides/stagehand-crawler-guide), making it easy to deploy on the Apify platform.

# Actor input Schema

## `startUrls` (type: `array`):

URLs to scrape.

## `prompt` (type: `string`):

Natural language instruction for what data to extract from the page. If empty, the Actor returns page content as Markdown.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://apify.com/store"
    }
  ],
  "prompt": "Extract all Apify actors from this page, save their names, descriptions, number of users, and rating"
}
```

# Actor output Schema

## `results` (type: `string`):

No description

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://apify.com/store"
        }
    ],
    "prompt": "Extract all Apify actors from this page, save their names, descriptions, number of users, and rating"
};

// Run the Actor and wait for it to finish
const run = await client.actor("apify/ai-web-scraper-test").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://apify.com/store" }],
    "prompt": "Extract all Apify actors from this page, save their names, descriptions, number of users, and rating",
}

# Run the Actor and wait for it to finish
run = client.actor("apify/ai-web-scraper-test").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://apify.com/store"
    }
  ],
  "prompt": "Extract all Apify actors from this page, save their names, descriptions, number of users, and rating"
}' |
apify call apify/ai-web-scraper-test --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=apify/ai-web-scraper-test",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "AI Web Scraper Test",
        "description": "AI-first web scraper that extracts structured data from any website using natural-language prompts. \nNo programming knowledge required.\nNo hard-coded logic that breaks when a website changes.",
        "version": "0.0",
        "x-build-id": "uItADg8FBPTZfKp1n"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/apify~ai-web-scraper-test/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-ai-web-scraper-test",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/apify~ai-web-scraper-test/runs": {
            "post": {
                "operationId": "runs-sync-apify-ai-web-scraper-test",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/apify~ai-web-scraper-test/run-sync": {
            "post": {
                "operationId": "run-sync-apify-ai-web-scraper-test",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "URLs to scrape.",
                        "default": [
                            {
                                "url": "https://apify.com/store"
                            }
                        ],
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "prompt": {
                        "title": "Page extraction prompt",
                        "type": "string",
                        "description": "Natural language instruction for what data to extract from the page. If empty, the actor returns page content as Markdown.",
                        "default": ""
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
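
If you call the REST API directly instead of using a client library, the `run-sync-get-dataset-items` operation above can be sketched with the Python standard library. The function names are hypothetical, and the sketch assumes the endpoint returns the dataset items as a JSON array. The timeout default is an arbitrary choice; long runs may need a higher value or the asynchronous `/runs` endpoint.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/apify~ai-web-scraper-test/run-sync-get-dataset-items"


def build_run_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the specification."""
    # The specification passes the API token as the `token` query parameter.
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Run the Actor synchronously and return the parsed dataset items."""
    request = build_run_sync_request(token, actor_input)
    # Raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect or log the request without starting a billable run.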
