Deprecated

Pricing

from $0.00005 / actor start

AI Web Crawler

Extract structured data from any website using AI. No custom selectors needed.

Rating

0.0

(0)

Developer

Angel Rojo

Last modified

2 months ago

🤖 AI Web Scraper — GPT-Powered Data Extraction

Extract structured data from any website using AI. No custom selectors needed — just a URL and natural language instructions. Supports OpenAI, OpenRouter, LM Studio, Ollama, Groq, and any OpenAI-compatible API.

🎯 What It Does

AI Web Scraper uses GPT-4o-mini (or GPT-4o/GPT-4.1) to intelligently extract structured data from any webpage. Unlike traditional scrapers that require specific CSS selectors or XPath expressions, this Actor understands natural language instructions and adapts to any website structure.

✨ Key Features

🧠 Natural Language Extraction — Describe what you want in plain English, GPT does the rest
🔄 Universal Compatibility — Works on any website without custom coding per site
📊 Structured JSON Output — Returns clean, parseable data pushed to Apify Dataset
📄 Multi-Page Support — Automatic pagination handling (up to 50 pages)
🚀 Fast Processing — Pages processed in seconds with headless Playwright
🔒 Anti-Detection — Blocks images/ads, uses realistic user-agent
⚡ Multiple AI Models — gpt-4o-mini, gpt-4o, gpt-4.1 (or any OpenAI-compatible API)

💡 Use Cases

| Industry | What to Extract |
| --- | --- |
| 🛒 E-commerce | Product names, prices, ratings, descriptions, reviews count |
| 🏠 Real Estate | Listings, prices, locations, agent info, property details |
| 📧 Lead Generation | Company names, emails, phone numbers, social profiles |
| 💼 Job Boards | Job titles, salaries, companies, locations, requirements |
| 📰 Research | Articles, papers, reviews, social media content |
| 🔍 SEO | Meta tags, headings, content structure, internal links |

📥 Input Schema

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | `string` | ✅ | — | Target URL to scrape |
| `prompt` | `string` | ✅ | — | What data to extract (natural language) |
| `apiKey` | `string` | ❌ | env `OPENAI_API_KEY` | OpenAI API key (`sk-...`) |
| `model` | `string` | ❌ | `gpt-4o-mini` | AI model: `gpt-4o-mini`, `gpt-4o`, `gpt-4.1` |
| `maxPages` | `integer` | ❌ | `1` | Max pages to process (1–50) |
| `waitForSelector` | `string` | ❌ | — | CSS selector to wait for before extracting |

Example Input

{
    "url": "https://www.example.com/products",
    "prompt": "Extract all product names, prices, ratings, and review counts",
    "model": "gpt-4o-mini",
    "maxPages": 3
}
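
Before starting a run, it can help to check an input object against the schema table above. The following is a minimal sketch, not the Actor's own validation code: the function name `normalize_input` is hypothetical, and it assumes only the field names, defaults, and the 1–50 range documented above.

```python
from typing import Any


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Check required fields and fill in the documented defaults.

    Hypothetical helper: it mirrors the schema table, not the Actor's internals.
    """
    # `url` and `prompt` are the only required fields.
    for field in ("url", "prompt"):
        value = raw.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' is required and must be a non-empty string")

    # `maxPages` defaults to 1 and must stay within 1-50 (bool is rejected explicitly).
    max_pages = raw.get("maxPages", 1)
    if isinstance(max_pages, bool) or not isinstance(max_pages, int) or not 1 <= max_pages <= 50:
        raise ValueError("'maxPages' must be an integer between 1 and 50")

    result: dict[str, Any] = {
        "url": raw["url"].strip(),
        "prompt": raw["prompt"].strip(),
        "model": raw.get("model", "gpt-4o-mini"),  # documented default model
        "maxPages": max_pages,
    }
    # Optional fields are passed through only when they are set.
    for optional in ("apiKey", "waitForSelector"):
        if raw.get(optional):
            result[optional] = raw[optional]
    return result
```

The model name is deliberately not restricted to the three listed values, since the description mentions other OpenAI-compatible providers.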

📤 Output

Each extracted item is pushed to the Apify Dataset as a separate record. For product and listing pages, records typically use these fields:

| Field | Type | Description |
| --- | --- | --- |
| `title` | `string` | Title or name of the extracted item |
| `description` | `string` | Description or summary |
| `price` | `string` | Price value if available |
| `url` | `string` | Source URL of the item |
| `image_url` | `string` | Image URL if available |
| `rating` | `number` | Rating score (0–5 scale) |
| `reviews_count` | `integer` | Number of reviews |
| `availability` | `string` | Availability status |
| `category` | `string` | Category or type |
| `source_page` | `string` | Page where item was found |
| `extracted_at` | `string` | ISO 8601 timestamp of extraction |

⚠️ Note: Field names are dynamic — GPT determines them based on your prompt. The schema above covers common extraction patterns for products/listings.

Example Output

[
    {
        "title": "Wireless Headphones Pro",
        "price": "$79.99",
        "rating": 4.5,
        "reviews_count": 1234,
        "url": "https://example.com/products/wireless-headphones-pro"
    },
    {
        "title": "Bluetooth Speaker",
        "price": "$49.99",
        "rating": 4.2,
        "reviews_count": 856,
        "url": "https://example.com/products/bluetooth-speaker"
    }
]
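
Because field names depend on the prompt, downstream code should not assume that every key is present. The sketch below shows one way to tidy records shaped like the example output. The helper names `parse_price` and `tidy` are hypothetical, and the price parsing assumes US-style strings such as "$79.99" or "$1,299.00".

```python
import re
from typing import Any, Optional


def parse_price(text: Optional[str]) -> Optional[float]:
    """Return the first number in a price string, or None if there is none.

    Assumes "," is a thousands separator and "." the decimal point.
    """
    if not text:
        return None
    match = re.search(r"\d+(?:\.\d+)?", text.replace(",", ""))
    return float(match.group()) if match else None


def tidy(record: dict[str, Any]) -> dict[str, Any]:
    """Pick common fields with .get() so missing keys become None instead of raising."""
    return {
        "title": record.get("title"),
        "price": parse_price(record.get("price")),
        "rating": record.get("rating"),
        "reviews_count": record.get("reviews_count"),
        "url": record.get("url"),
    }


if __name__ == "__main__":
    sample = {"title": "Bluetooth Speaker", "price": "$49.99", "rating": 4.2}
    print(tidy(sample))
```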

🧪 How to Use

Option 1: Run via Apify Console

1. Go to Apify Console
2. Find "AI Web Scraper" in the Store
3. Click "Try for free" or "Run Actor"
4. Enter your URL and extraction prompt
5. Click "Run". Results appear in the Dataset

Option 2: Run via API

curl -X POST "https://api.apify.com/v2/acts/gek0v~ai-web-scraper/runs" \
  -H "Authorization: Bearer YOUR_APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "prompt": "Extract product names and prices",
    "model": "gpt-4o-mini"
  }'

Option 3: Python SDK

from apify_client import ApifyClient

client = ApifyClient("your-apify-token")

run = client.actor("gek0v/ai-web-scraper").call(run_input={
    "url": "https://example.com",
    "prompt": "Extract all article titles and authors",
    "model": "gpt-4o-mini"
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

💰 Pricing

| Component | Cost |
| --- | --- |
| Actor Compute (Actor Start) | ~$0.000002/run (based on memory allocation) |
| Dataset Storage | ~$0.002 per stored item |
| Platform Fee | 20% of compute + storage costs |
| OpenAI GPT API | Passed directly to user at model pricing |

💡 Typical cost per run: Most extractions cost < $0.01 (with gpt-4o-mini) plus ~$0.002 per extracted item stored.
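
The pricing components can be combined into a rough estimate. This is a sketch under stated assumptions: it uses the approximate figures from the table, it assumes the 20% platform fee is added on top of compute and storage, and the model cost is left as a parameter because it depends on the provider. The function name `estimate_run_cost` is hypothetical.

```python
COMPUTE_PER_RUN = 0.000002  # USD per run, approximate figure from the pricing table
STORAGE_PER_ITEM = 0.002    # USD per stored item, approximate
PLATFORM_FEE_RATE = 0.20    # assumed to be added on top of compute + storage


def estimate_run_cost(items_stored: int, llm_cost: float = 0.0) -> float:
    """Rough cost of one run in USD.

    `llm_cost` is whatever the model provider charges for the run; it is
    billed separately, so no platform fee is applied to it here.
    """
    if items_stored < 0:
        raise ValueError("items_stored must not be negative")
    base = COMPUTE_PER_RUN + items_stored * STORAGE_PER_ITEM
    return base * (1 + PLATFORM_FEE_RATE) + llm_cost


if __name__ == "__main__":
    # 100 stored items plus an assumed one cent of model usage.
    print(f"${estimate_run_cost(100, llm_cost=0.01):.4f}")
```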

🔧 Local Development

# Clone
git clone https://github.com/gek0v/ai-web-scraper.git
cd ai-web-scraper

# Install dependencies
pip install -r requirements.txt

# Run locally
python src/main.py --input '{"url": "https://example.com", "prompt": "Extract all headings"}'

📝 Tips for Best Results

1. Be specific in your prompt: "Extract product name, price in USD, and star rating" works better than "extract product info".
2. Test with gpt-4o-mini first: it is roughly an order of magnitude cheaper than gpt-4o and often good enough. Upgrade to gpt-4o for complex pages.
3. Use `waitForSelector`: for dynamic SPAs (React, Vue, Angular), wait for the content container to render.
4. Limit `maxPages`: start with 1 page to test, then scale up.
5. Provide your API key: set the `OPENAI_API_KEY` environment variable or pass `apiKey` in the input.
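
The API key lookup described in the last tip can be sketched as follows. The input schema lists `OPENAI_API_KEY` as the default for `apiKey`, so this sketch assumes that a key in the input takes precedence over the environment variable. The function name `resolve_api_key` is hypothetical and is not taken from the Actor's source.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    run_input: Mapping[str, object],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the key from the input if present, else from OPENAI_API_KEY."""
    # Allow an explicit env mapping so the lookup is easy to test.
    env = os.environ if env is None else env
    key = run_input.get("apiKey") or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "No API key found: pass 'apiKey' in the input or set OPENAI_API_KEY"
        )
    return str(key)
```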

⚠️ Limitations

Very large pages (>100K chars) are truncated to fit GPT's context window
JavaScript-heavy SPAs may need waitForSelector for rendering
Some anti-bot protections (Cloudflare, etc.) may block access
GPT costs are passed through to the user (OpenAI/compatible API pricing applies)
Requires an OpenAI-compatible API key (not included)
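
The truncation mentioned in the first limitation can be illustrated with a simple character cut. The README does not describe how the Actor truncates, so this is only a sketch: the hard cut at 100,000 characters and the name `truncate_for_model` are assumptions.

```python
MAX_CHARS = 100_000  # limit quoted in the limitations list


def truncate_for_model(text: str, limit: int = MAX_CHARS) -> tuple[str, bool]:
    """Return the text cut to `limit` characters and a flag telling whether it was cut."""
    if len(text) <= limit:
        return text, False
    return text[:limit], True


if __name__ == "__main__":
    page_text = "x" * 150_000
    clipped, was_cut = truncate_for_model(page_text)
    print(len(clipped), was_cut)
```

Returning the flag lets a caller record that items near the end of a long page may be missing from the results.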

📄 License

MIT License — free to use and modify.

🏷️ Tags

web-scraping artificial-intelligence data-extraction playwright gpt automation developer-tools

AI Web Scraper

apify/ai-web-scraper

AI-first web scraper that extracts structured data from any website using natural-language prompts. No programming knowledge required. No hard-coded logic that breaks when a website changes.

Apify

8.2K

4.3

(12)

Ai Web Scraper - Extract Data With Ease

eloquent_mountain/ai-web-scraper-extract-data-with-ease

Ai Web Scraper enables scraping for everyone, including non-techies! It uses Google's Gemini LLM to scrape websites with natural language commands. It dynamically extracts data, no selector input needed, handles dynamic content and cookie consent, avoids bot detection, outputs JSON or other formats.

Paco

1.4K

1.0

(2)

AI-Ready Web Content Crawler (LLM/RAG Optimized)

brilliant_gum/web-content-crawler

Deep-crawl websites and extract LLM-ready Markdown with OG tags, JSON-LD, author, dates, token estimates, native RAG chunking, language filtering, content-hash dedup, and per-page error reporting. Enforced timeouts. Zero silent failures.

Yuliia Kulakova

AI Web Scraper - Powered by Crawl4AI

raizen/ai-web-scraper

A blazing-fast AI web scraper powered by Crawl4AI. Perfect for LLMs, AI agents, AI automation, model training, sentiment analysis, and content generation. Supports deep crawling, multiple extraction strategies and flexible output (Markdown/JSON). Seamlessly integrates with Make.com, n8n, and Zapier.

Raizen Technology

355

1.0

(1)

AI Web Crawler

hounderd/ai-web-crawler

Crawl websites and extract clean, LLM-ready markdown content with stealth browser rendering, anti-bot hardening, smart content filtering, and structured metadata extraction. Built for RAG pipelines, AI agents, and data workflows.

Hounderd

AI Web Scraper

crawlworks/ai-web-scraper

Scrape any webpage with a URL and a plain-English prompt. Get structured JSON output powered by AI — no coding, no selectors, no configuration.

Crawlworks

Scrape GPT - Universal AI Web Scraper Agent

paradox-analytics/scrape-gpt---universal-ai-web-scraper-agent

AI-powered universal web scraper that works on ANY website without configuration. Extract data from e-commerce, news sites, social media, and more using intelligent LLM-based field mapping. Features JSON-first extraction, automatic pagination, anti-bot bypass, and cost-effective caching.

Paradox Analytics

Smart AI Web Scraper

cockroachapi/smart-ai-web-scraper

Unlock the power of Smart AI Web Scraper! Efficiently scrape dynamic content, simulate browser behavior, and extract targeted data.

Cockroach API

5.0

(2)

Best AI Web Scraper

hgservices/Best-AI-Web-Scraper

Extract any data from any website by simply describing what you want in plain English. AI-powered web scraping with no code, no selectors, and no per-site setup.

Harish Garg

AI Web Scraper — Structured Data Extraction from Any Website

oneary/ai-powered-data-extractor

Extract structured data from any webpage using AI. Define your schema and the AI identifies relevant content — no selectors or coding needed. Handles products, reviews, contacts, and custom fields.

Luan M.

AI Web Content Crawler - Markdown for LLMs

intelscrape/ai-web-content-crawler

Crawl any website and extract clean Markdown optimized for LLM training, RAG pipelines, and AI knowledge bases - removes boilerplate and outputs structured JSON with URL, title, markdown, and metadata.