Deprecated

Pricing: from $0.00005 / actor start

AI Smart Scraper — Extract Data from Any Website

AI web scraper: describe the data you want in plain English, get clean JSON from any webpage. No CSS selectors needed. For lead gen, price monitoring, RAG, and AI agents. Powered by Gemini AI.

Rating: 5.0 (1)

Developer: 亲晖林

Last modified: 4 months ago

AI Smart Scraper — Extract Structured Data from Any Website

Extract structured JSON data from any webpage using plain English prompts. No CSS selectors, no XPath, no coding required. Just describe the data you want, and AI does the rest.

✨ Key Features

Natural language extraction — Describe what you want: "Get all product names, prices, and ratings"
Any website — Works on news sites, e-commerce, directories, job boards, real estate listings, and more
Structured JSON output — Clean, machine-readable data ready for your pipeline
Zero configuration — No CSS selectors or page structure knowledge needed
Custom schemas — Optionally define exact output structure with JSON Schema
Batch processing — Process multiple URLs in a single run
Built-in AI — Powered by Google Gemini 2.5 Flash. No API keys needed

🎯 Use Cases

Use Case	Example Prompt
Lead generation	"Extract company names, emails, phone numbers, and addresses"
Price monitoring	"Get all product names, current prices, and discount percentages"
Job scraping	"Extract job titles, companies, locations, salaries, and posting dates"
News aggregation	"Get article titles, authors, publish dates, and summaries"
Real estate	"Extract property addresses, prices, bedrooms, bathrooms, and square footage"
Restaurant data	"Get restaurant names, ratings, review counts, cuisine types, and price ranges"
Academic research	"Extract paper titles, authors, publication years, and citation counts"
Social media	"Get post text, like counts, comment counts, and timestamps"

📥 Input

Parameter	Type	Required	Description
`url`	String	Yes*	Target webpage URL
`urls`	Array	Yes*	List of URLs for batch processing
`prompt`	String	Yes	Natural language description of data to extract
`schema`	Object	No	Optional JSON Schema for output validation
`maxPages`	Integer	No	Maximum pages to process (default: 1, max: 100)
`openaiApiKey`	String	No	Optional: Use your own OpenAI key instead of built-in AI

*Provide either `url` or `urls` (or both).
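The rules in the table can be checked on the client before a run is started. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, and it assumes only the parameter names, the "either `url` or `urls`" rule, and the `maxPages` range listed above.

```python
from typing import Any, Optional


def build_input(
    prompt: str,
    url: Optional[str] = None,
    urls: Optional[list[str]] = None,
    schema: Optional[dict[str, Any]] = None,
    max_pages: int = 1,
) -> dict[str, Any]:
    """Assemble an Actor input dict, enforcing the rules from the input table."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt is required")
    if not url and not urls:
        raise ValueError("provide url, urls, or both")
    if not 1 <= max_pages <= 100:
        raise ValueError("maxPages must be between 1 and 100")

    actor_input: dict[str, Any] = {"prompt": prompt, "maxPages": max_pages}
    # Only include the optional keys that were actually supplied.
    if url:
        actor_input["url"] = url
    if urls:
        actor_input["urls"] = list(urls)
    if schema is not None:
        actor_input["schema"] = schema
    return actor_input


example = build_input(
    prompt="Extract the top 5 articles with their title, score, and comment count",
    url="https://news.ycombinator.com",
)
```

Failing early on a missing `prompt` or an out-of-range `maxPages` avoids paying for an Actor start that cannot produce results.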

📤 Output

Each result in the dataset contains:

{
    "url": "https://example.com/products",
    "data": [
        {
            "name": "Wireless Headphones",
            "price": 79.99,
            "rating": 4.5,
            "reviews": 2847
        }
    ],
    "metadata": {
        "tokensUsed": 1250,
        "model": "google/gemini-2.5-flash",
        "extractedAt": "2026-02-24T15:37:46.831Z",
        "contentLength": 15420,
        "status": "success"
    }
}
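A result item shaped like the one above can be flattened into one row per extracted record for use in a spreadsheet or database. This is a sketch under stated assumptions: `flatten_item` is a hypothetical helper, `data` is assumed to be a list of objects, and any `status` other than `"success"` is assumed to mean the item should be skipped (the source only shows the `"success"` value).

```python
import json
from typing import Any

# The sample item from the Output section, loaded as a Python dict.
SAMPLE_ITEM = json.loads("""
{
    "url": "https://example.com/products",
    "data": [
        {"name": "Wireless Headphones", "price": 79.99, "rating": 4.5, "reviews": 2847}
    ],
    "metadata": {
        "tokensUsed": 1250,
        "model": "google/gemini-2.5-flash",
        "extractedAt": "2026-02-24T15:37:46.831Z",
        "contentLength": 15420,
        "status": "success"
    }
}
""")


def flatten_item(item: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one row per extracted record, tagged with its source URL."""
    metadata = item.get("metadata", {})
    # Assumption: anything other than "success" means the data is unusable.
    if metadata.get("status") != "success":
        return []
    return [{"source_url": item.get("url"), **record} for record in item.get("data", [])]


rows = flatten_item(SAMPLE_ITEM)
print(rows[0]["name"], rows[0]["price"])  # Wireless Headphones 79.99
```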

💡 Examples

Example 1: Extract top articles from Hacker News

Input:

{
    "url": "https://news.ycombinator.com",
    "prompt": "Extract the top 5 articles with their title, score, and comment count"
}

Output (abbreviated; `url` and `metadata` omitted):

{
    "data": [
        { "title": "Show HN: I built a new tool", "score": 285, "comment_count": 63 },
        { "title": "Why AI agents need better tools", "score": 141, "comment_count": 45 }
    ]
}

Example 2: Scrape product listings with custom schema

Input:

{
    "url": "https://example-shop.com/laptops",
    "prompt": "Extract all laptop listings with name, price, specs, and availability",
    "schema": {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": { "type": "string" },
                "price": { "type": "number" },
                "cpu": { "type": "string" },
                "ram_gb": { "type": "integer" },
                "in_stock": { "type": "boolean" }
            }
        }
    }
}
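Records returned for a schema like this one can be spot-checked on the client side. The sketch below is a deliberately small, hand-rolled type check covering only the four primitive types used in Example 2; it is not how the Actor validates output (the source does not describe that), and `type_errors`, `TYPE_MAP`, and the two sample records are made up for illustration. It is also stricter than JSON Schema in one respect: a float such as `16.0` is rejected for an `integer` field. For full JSON Schema validation, a dedicated library such as `jsonschema` is the better choice.

```python
from typing import Any

# JSON Schema type names mapped to Python types (subset used in Example 2).
TYPE_MAP = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
}


def type_errors(record: dict[str, Any], item_schema: dict[str, Any]) -> list[str]:
    """List fields in `record` whose values do not match the declared type."""
    errors = []
    for field, spec in item_schema.get("properties", {}).items():
        if field not in record:
            continue  # Example 2 declares no "required" list, so absence is allowed.
        value = record[field]
        expected = TYPE_MAP[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric fields.
        is_bool_mismatch = isinstance(value, bool) and spec["type"] != "boolean"
        if is_bool_mismatch or not isinstance(value, expected):
            errors.append(f"{field}: expected {spec['type']}")
    return errors


# The "items" part of the schema from Example 2.
LAPTOP_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
        "cpu": {"type": "string"},
        "ram_gb": {"type": "integer"},
        "in_stock": {"type": "boolean"},
    },
}

# Made-up records for illustration only, not real Actor output.
good = {"name": "Laptop A", "price": 999.0, "cpu": "8-core", "ram_gb": 16, "in_stock": True}
bad = {"name": "Laptop B", "price": "999", "ram_gb": 16.5, "in_stock": "yes"}

print(type_errors(good, LAPTOP_SCHEMA))  # []
print(type_errors(bad, LAPTOP_SCHEMA))
```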

Example 3: Batch URL processing

Input:

{
    "urls": [
        "https://company-a.com/about",
        "https://company-b.com/about",
        "https://company-c.com/about"
    ],
    "prompt": "Extract the company name, founding year, number of employees, and headquarters location"
}

💰 Pricing

This Actor uses Pay Per Event pricing:

Event	Price
Page extracted	$0.01 per page
Actor start	$0.00005 per start

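Cost example: extracting data from 100 product pages in one run costs 100 × $0.01 = $1.00 in page events, plus $0.00005 for the Actor start and roughly $0.40 of platform usage, or about $1.40 in total.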

No monthly fees. No subscriptions. Pay only for what you use.
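The event charges in the table can be estimated with a few lines of arithmetic. This sketch uses only the two prices listed above; `event_cost` is a hypothetical helper, and platform usage is left out because it is billed separately and the source gives only a rough figure for it.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.01")      # "Page extracted" event
PRICE_PER_START = Decimal("0.00005")  # "Actor start" event


def event_cost(pages: int, runs: int = 1) -> Decimal:
    """Pay Per Event charges only; platform usage is billed separately."""
    return pages * PRICE_PER_PAGE + runs * PRICE_PER_START


cost = event_cost(pages=100)
print(cost)  # 1.00005
```

Adding the roughly $0.40 of platform usage from the cost example gives the quoted total of about $1.40. `Decimal` is used so that the tiny per-start charge is not lost to floating-point rounding.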

🔌 Integrations

This Actor works with:

Apify API — Call via REST API from any language
Apify MCP Server — Use directly from AI agents (Claude, ChatGPT, etc.)
Zapier / Make — Automate workflows with no-code tools
Python / JavaScript SDK — Native Apify client libraries
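As a rough illustration of the REST route, the sketch below prepares a request to the Apify API endpoint that runs an Actor synchronously and returns its dataset items. It only builds the request and does not send it. The Actor ID and token are placeholders (this page does not state the Actor's ID), `build_run_request` is a hypothetical helper, and because the Actor is marked as deprecated, a real call may no longer succeed.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"


def build_run_request(
    actor_id: str, token: str, actor_input: dict[str, Any]
) -> urllib.request.Request:
    """Prepare (but do not send) a POST that runs an Actor and returns its dataset items."""
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    actor_id="username~actor-name",  # placeholder; the real ID is not given on this page
    token="YOUR_APIFY_TOKEN",        # placeholder
    actor_input={
        "url": "https://news.ycombinator.com",
        "prompt": "Extract the top 5 articles with their title, score, and comment count",
    },
)

# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```

The official Apify client libraries wrap the same API and are usually more convenient; the standard-library version is shown here only to keep the example free of third-party dependencies.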

🤔 FAQ

Q: Do I need an API key?
A: No! The Actor uses a built-in AI model (Google Gemini). Optionally, you can provide your own OpenAI API key through `openaiApiKey` to use GPT-4o-mini instead.

Q: What websites does it work on?
A: Any publicly accessible webpage. The Actor parses pages with Cheerio, which reads the returned HTML without executing JavaScript, so JavaScript-heavy SPAs that render content client-side may need additional configuration.

Q: How accurate is the extraction?
A: Powered by Gemini 2.5 Flash, extraction accuracy is typically 90-95% for well-structured pages. Complex or unusual layouts may require more specific prompts.

Q: Can I use this for large-scale scraping?
A: Yes! Use the `urls` parameter for batch processing and `maxPages` to control scope. For very large jobs, consider running multiple Actor instances.
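One way to split a very large job across multiple runs is to chunk the URL list and build one input per chunk. This is a sketch, not documented Actor behavior: `batch_inputs` is a hypothetical helper, and it assumes that `maxPages` (maximum 100) caps how many URLs a single run will process, which the source does not state explicitly.

```python
from typing import Any, Iterator


def batch_inputs(
    urls: list[str], prompt: str, batch_size: int = 100
) -> Iterator[dict[str, Any]]:
    """Yield one Actor input per batch of at most `batch_size` URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        chunk = urls[start:start + batch_size]
        # Assumption: maxPages should match the number of URLs in the run.
        yield {"urls": chunk, "prompt": prompt, "maxPages": len(chunk)}


# Placeholder URLs for illustration.
all_urls = [f"https://example.com/products?page={n}" for n in range(1, 251)]
inputs = list(
    batch_inputs(all_urls, "Get all product names, current prices, and discount percentages")
)
print([len(i["urls"]) for i in inputs])  # [100, 100, 50]
```

Each yielded dict can then be used as the input for a separate Actor run.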

📋 Changelog

v0.1 — Initial release with Gemini 2.5 Flash, Cheerio crawler, PPE pricing
