# Best AI Web Scraper (`hgservices/best-ai-web-scraper`) Actor

Extract any data from any website by simply describing what you want in plain English. AI-powered web scraping with no code, no selectors, and no per-site setup.

- **URL**: https://apify.com/hgservices/best-ai-web-scraper.md
- **Developed by:** [Harish Garg](https://apify.com/hgservices) (community)
- **Categories:** AI, Agents, MCP servers
- **Stats:** 1 total users, 0 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

From $21.00 / 1,000 web pages scraped

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Best AI Web Scraper

**Best AI Web Scraper extracts any data from any website — you just describe what you want in plain English.** No selectors, no code, no per-site setup. Point it at one or more URLs, type something like *"Extract the product name, price, and customer rating,"* and get back clean, structured data ready to download or send to your apps.

### What does Best AI Web Scraper do?

It reads the web pages you give it and pulls out exactly the information you ask for — as neat, structured data. Instead of building a custom scraper for every site, you describe the fields you want in everyday language and the AI does the rest.

You can use it to extract things like:

- **Product details** — name, price, rating, availability, SKU
- **Contact and lead info** — names, job titles, companies, emails
- **Articles and news** — title, author, date, summary
- **Listings** — real estate, jobs, events, directories
- **Anything else** on a page you can describe in a sentence

It handles modern, JavaScript-heavy websites automatically and can get past common blocking, so you spend your time using the data instead of fighting the page. Running on the Apify platform also gives you scheduling, an API, integrations, and run monitoring out of the box.

### Why use Best AI Web Scraper?

- **No scraper to maintain.** Because it reads pages with AI instead of fixed rules, it keeps working when a website changes its layout — no broken selectors, no rebuilds, no upkeep.
- **No technical skills required.** Describe the data once in plain English — no code and no technical setup.
- **One prompt, many sites.** The same prompt adapts across different page layouts, so you don't build a separate scraper for every website.
- **Handles tough pages.** Dynamic, JavaScript-heavy sites and common anti-bot blocks are dealt with automatically.
- **Ready-to-use output.** Clean, structured data you can download as JSON, CSV, or Excel, or pipe into databases, spreadsheets, and other Apify Actors.

Typical use cases: e-commerce price and review monitoring, lead and contact enrichment, news and article extraction, real-estate and job listings, and one-off research scrapes.

### How to use Best AI Web Scraper

1. Add one or more **Start URLs** — the pages you want to scrape.
2. Write an **Extraction Prompt** describing the fields you want (e.g. *"Extract the article title, author, publish date, and a one-sentence summary"*).
3. (Optional) Turn on **Proxy** for sites that block direct access, and adjust **Max URLs** and **Max Concurrency**.
4. Click **Start**, then download your results from the **Output** tab.

That's it — no API keys and no setup. Extraction works out of the box.

### How to write a good prompt

The prompt is where you tell the scraper exactly what to pull from each page. A clear prompt gives clean, consistent results — here's how to write one.

- **Name the exact fields you want.** *"Extract the product name, price, currency, and average star rating"* works far better than *"get the product info."*
- **Ask for a list when a page has many items.** For a directory or search-results page: *"Extract a list of all job postings, each with the job title, company, location, and salary."*
- **Say how you want values formatted.** For example: *"Return the price as a number without the currency symbol,"* or *"Return the date as YYYY-MM-DD."*
- **Keep it focused.** Only ask for the fields you actually need — fewer fields are faster, cheaper, and more accurate.

| What you're scraping | Example prompt |
| --- | --- |
| Product page | "Extract the product name, price, currency, availability, and average rating." |
| Article or blog post | "Extract the title, author, publish date, and a one-sentence summary." |
| Directory or listings | "Extract a list of all listings, each with the name, address, phone number, and website." |
| Company / profile page | "Extract the company name, industry, employee count, and headquarters location." |

### Input

| Field | Type | Description |
| --- | --- | --- |
| `startUrls` | array | The URLs you want to scrape. |
| `prompt` | string | A plain-English description of the data to extract. |
| `proxyEnabled` | boolean | Turn on proxy for sites that block direct access. Default `false`. |
| `maxRequestsPerCrawl` | integer | Maximum number of URLs to process. Default `10`. |
| `maxConcurrency` | integer | How many URLs to process at the same time. Default `3`. |

Example input:

```json
{
    "startUrls": [{ "url": "https://example.com/product/123" }],
    "prompt": "Extract the product name, price, and average customer rating.",
    "proxyEnabled": false,
    "maxRequestsPerCrawl": 10,
    "maxConcurrency": 3
}
````
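
The same input can be assembled in code before it is handed to the Actor. The sketch below is only an illustration: `build_input` is a hypothetical helper, not part of the Actor or the Apify client. The field names and defaults come from the input table; the validation rules (rejecting empty URLs and empty prompts) are an assumption.

```python
def build_input(urls, prompt, proxy_enabled=False, max_requests=10, max_concurrency=3):
    """Return a dict shaped like the Actor's documented input.

    Hypothetical helper: field names and defaults mirror the input table,
    the validation rules are an assumption made for this example.
    """
    # Drop blank entries and surrounding whitespace.
    cleaned = [u.strip() for u in urls if u and u.strip()]
    if not cleaned:
        raise ValueError("at least one start URL is required")
    if not prompt or not prompt.strip():
        raise ValueError("the extraction prompt must not be empty")
    return {
        "startUrls": [{"url": u} for u in cleaned],
        "prompt": prompt.strip(),
        "proxyEnabled": bool(proxy_enabled),
        "maxRequestsPerCrawl": int(max_requests),
        "maxConcurrency": int(max_concurrency),
    }


actor_input = build_input(
    ["https://example.com/product/123"],
    "Extract the product name, price, and average customer rating.",
)
```

The resulting `actor_input` matches the example input and can be passed as `run_input` to the Python client.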

### Output

Every result is a clean, flat row. A detail page (one product, one article) produces a single row; a **listing page** (search results, a directory, a press-release feed) produces **one row per item** — so a single URL can return hundreds of records. The fields are exactly the ones you described in your prompt.

The **Output** tab shows a formatted, shareable report table the moment your run finishes. You can also download the raw dataset as JSON, CSV, Excel, or XML, or pull it through the API.

Scraping a single product page:

```json
{
    "url": "https://example.com/product/123",
    "productName": "Acme Wireless Headphones",
    "price": 79.99,
    "rating": 4.6
}
```

Scraping a listing page — one row per item:

```json
[
    { "url": "https://example.com/news", "date": "May 07, 2026", "link": "Headline one" },
    { "url": "https://example.com/news", "date": "May 05, 2026", "link": "Headline two" }
]
```

#### Data table

| Field | Description |
| --- | --- |
| `url` | The source page the record came from. |
| *(your fields)* | One column per field you asked for in the prompt — e.g. `productName`, `price`, `date`, `link`. |
| `error` | Present only on rows for URLs that could not be scraped. |
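
When post-processing a dataset, it can help to separate failed URLs from usable rows and to regroup listing rows by source page. The following is a minimal sketch under the assumption that failed rows are identified only by the presence of the `error` key, as the table describes. The helper names and the sample `error` text are made up for illustration.

```python
def split_results(items):
    """Separate usable rows from rows that carry an `error` field."""
    rows, failures = [], []
    for item in items:
        (failures if "error" in item else rows).append(item)
    return rows, failures


def group_by_url(rows):
    """Group rows by source page; a listing page yields several rows per URL."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row.get("url"), []).append(row)
    return grouped


# Sample items shaped like the examples above; the error message is hypothetical.
items = [
    {"url": "https://example.com/news", "date": "May 07, 2026", "link": "Headline one"},
    {"url": "https://example.com/news", "date": "May 05, 2026", "link": "Headline two"},
    {"url": "https://example.com/blocked", "error": "could not be scraped"},
]

rows, failures = split_results(items)
by_url = group_by_url(rows)
print(len(by_url["https://example.com/news"]), len(failures))  # prints "2 1"
```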

### How much does it cost to scrape?

You pay per result, not per hour — there's no separate AI key or subscription to buy. Pricing is simple and predictable: a fixed amount for **each page scraped** and a fixed amount for **each row of data returned**. A heavy, JavaScript-rich page costs the same as a simple one, so your bill depends only on how many pages you scrape and how many records come back — never on how the page is built. See the **Pricing** section for the exact rates.

To keep runs cheap:

- **Scrape only the pages you need.** Each page you process is one page charge, so set a sensible **Max URLs**.
- **Mind listing pages.** A search-results or directory page can return hundreds of rows from a single URL — and each row is charged — so target the pages that hold just the records you want.
- **Write a focused prompt.** Asking only for the fields you need keeps results clean and avoids paying for rows you'll throw away.
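
To see how pages and rows combine into a bill, here is a rough estimate sketch. `estimate_cost` is a hypothetical helper. The per-page rate in the usage example is derived from the "from $21.00 / 1,000" figure in the Pricing section, while the per-row rate is a placeholder that must be replaced with the real rate from the Actor's pricing page.

```python
from decimal import Decimal


def estimate_cost(pages, rows, price_per_page, price_per_row):
    """Estimate the run cost in USD: a page charge plus a row charge.

    Rates are passed as strings or Decimals to avoid float rounding.
    """
    page_cost = Decimal(pages) * Decimal(str(price_per_page))
    row_cost = Decimal(rows) * Decimal(str(price_per_row))
    return page_cost + row_cost


# 10 listing pages returning 250 rows in total.
cost = estimate_cost(
    pages=10,
    rows=250,
    price_per_page="0.021",  # $21.00 / 1,000 pages, the starting price listed under Pricing
    price_per_row="0.001",   # placeholder only, NOT the Actor's real per-row rate
)
print(f"Estimated cost: ${cost:.2f}")
```

With these placeholder rates the row charge already exceeds the page charge, which is why listing pages deserve attention when budgeting.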

### Tips for best results

- **Be specific in your prompt.** Naming exact fields (*"price," "SKU," "rating"*) gives cleaner results than vague requests.
- **Start small.** Test with a couple of URLs before scaling up **Max URLs**.
- **Turn on proxy only when needed.** It helps with sites that block direct access, but runs without proxy are faster and cheaper.

### Automate and integrate

Because it runs on Apify, this scraper plugs into the tools you already use:

- **Schedule recurring runs.** Scrape on any schedule — hourly, daily, or weekly — and have fresh data waiting automatically.
- **Connect to low-code tools.** Send results straight into n8n, Make, Zapier, Google Sheets, Slack, and more through Apify's built-in integrations — no glue code required.
- **Call it from your own apps.** Trigger runs and pull data through the Apify API in any programming language.
- **Use it with AI agents.** The Actor is available through the Apify MCP server, so AI assistants and agents can run it and use the structured results directly.

### How it works (for the curious)

Under the hood, the scraper is built for speed and cost efficiency. It first tries a fast, lightweight fetch that handles most static sites instantly. If a page needs JavaScript or appears blocked, it automatically upgrades to a full stealth browser, and — if you've enabled proxy — to a browser routed through Apify Proxy for the hardest sites. The page is then cleaned up and passed to an AI model that returns your requested fields as structured data. You don't have to configure any of this; it happens automatically per page.
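
The escalation described above can be pictured as trying a list of fetchers in order until one returns a usable page. This is an illustrative sketch only, not the Actor's actual code: the tier names, the stub fetchers, and the `looks_usable` check are all hypothetical stand-ins for real network and browser logic.

```python
def fetch_with_escalation(url, tiers, looks_usable):
    """Try each (name, fetch) tier in order; return the first usable (name, html)."""
    for name, fetch in tiers:
        try:
            html = fetch(url)
        except Exception:
            # A failure at this tier: escalate to the next one.
            continue
        if looks_usable(html):
            return name, html
    return None, None


# Stub fetchers standing in for real network code (hypothetical).
def light_fetch(url):
    return "<html><body><div id='app'></div></body></html>"  # JavaScript shell, no content


def browser_fetch(url):
    return "<html><body><h1>Acme Wireless Headphones</h1></body></html>"


def proxy_browser_fetch(url):
    return browser_fetch(url)


def looks_usable(html):
    # Toy check: treat a page as usable once it contains a heading.
    return "<h1>" in html


def build_tiers(proxy_enabled):
    """The proxy tier is only available when the user enabled proxy."""
    tiers = [("light", light_fetch), ("browser", browser_fetch)]
    if proxy_enabled:
        tiers.append(("proxy-browser", proxy_browser_fetch))
    return tiers


tier, html = fetch_with_escalation(
    "https://example.com/product/123", build_tiers(proxy_enabled=False), looks_usable
)
print(tier)  # prints "browser"
```

Ordering the tiers from cheapest to most expensive is what keeps simple static pages fast while still covering harder sites.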

### FAQ, disclaimers, and support

**Is web scraping legal?** Scraping publicly available data is generally legal, but you are responsible for complying with each target site's Terms of Service, robots.txt, and applicable laws (including data-protection rules). Do not scrape personal or copyrighted data without a lawful basis.

**A page came back empty or failed — why?** Some sites are heavily protected and may block automated access, show a CAPTCHA, or have no extractable content. Turning on **Proxy** resolves many of these cases.

**Can I get a custom version?** Yes — for tailored fields, login-protected pages, or pagination, open the **Issues** tab to request changes or report problems.

# Actor input Schema

## `startUrls` (type: `array`):

URLs to scrape.

## `prompt` (type: `string`):

Describe what data to extract. E.g. 'Extract the product name, price, and customer rating'

## `proxyEnabled` (type: `boolean`):

Use Apify Proxy for Tier 3 escalation on blocked pages.

## `maxRequestsPerCrawl` (type: `integer`):

Maximum number of URLs to process.

## `maxConcurrency` (type: `integer`):

How many URLs to process in parallel.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://books.toscrape.com/"
    }
  ],
  "prompt": "Extract each book's title, price, star rating, and whether it's in stock.",
  "proxyEnabled": false,
  "maxRequestsPerCrawl": 10,
  "maxConcurrency": 3
}
```

# Actor output Schema

## `report` (type: `string`):

No description

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples for JavaScript, Python, and the CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://books.toscrape.com/"
        }
    ],
    "prompt": "Extract each book's title, price, star rating, and whether it's in stock."
};

// Run the Actor and wait for it to finish
const run = await client.actor("hgservices/best-ai-web-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://books.toscrape.com/" }],
    "prompt": "Extract each book's title, price, star rating, and whether it's in stock.",
}

# Run the Actor and wait for it to finish
run = client.actor("hgservices/best-ai-web-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://books.toscrape.com/"
    }
  ],
  "prompt": "Extract each book'\''s title, price, star rating, and whether it'\''s in stock."
}' |
apify call hgservices/best-ai-web-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=hgservices/best-ai-web-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Best AI Web Scraper",
        "description": "Extract any data from any website by simply describing what you want in plain English. AI-powered web scraping with no code, no selectors, and no per-site setup.",
        "version": "1.0",
        "x-build-id": "mdI4YOrGZPPtdXK0h"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/hgservices~best-ai-web-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-hgservices-best-ai-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/hgservices~best-ai-web-scraper/runs": {
            "post": {
                "operationId": "runs-sync-hgservices-best-ai-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/hgservices~best-ai-web-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-hgservices-best-ai-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "startUrls",
                    "prompt"
                ],
                "properties": {
                    "startUrls": {
                        "title": "Start URLs",
                        "type": "array",
                        "description": "URLs to scrape.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "prompt": {
                        "title": "Extraction Prompt",
                        "type": "string",
                        "description": "Describe what data to extract. E.g. 'Extract the product name, price, and customer rating'"
                    },
                    "proxyEnabled": {
                        "title": "Enable Proxy",
                        "type": "boolean",
                        "description": "Use Apify proxy for Tier 3 escalation on blocked pages.",
                        "default": false
                    },
                    "maxRequestsPerCrawl": {
                        "title": "Max URLs",
                        "type": "integer",
                        "description": "Maximum number of URLs to process.",
                        "default": 10
                    },
                    "maxConcurrency": {
                        "title": "Max Concurrency",
                        "type": "integer",
                        "description": "How many URLs to process in parallel.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
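
For projects that call the REST API directly, the specification above can be turned into a plain HTTP request. The sketch below builds a `POST` to the `run-sync-get-dataset-items` path using only the Python standard library. The server URL, path, `token` query parameter, and required input fields come from the specification; `build_request` itself is a hypothetical helper, and the commented-out sending step assumes the response body is JSON.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/hgservices~best-ai-web-scraper/run-sync-get-dataset-items"


def build_request(token, actor_input):
    """Build (but do not send) the POST request described by the specification."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(
    "<YOUR_API_TOKEN>",
    {
        "startUrls": [{"url": "https://books.toscrape.com/"}],
        "prompt": "Extract each book's title, price, star rating, and whether it's in stock.",
    },
)

# Sending the request runs the Actor and is billed, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```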
