Pricing

from $0.000035 / actor start

AI Web Scraper - Extract Any Website by Example

AI web scraper that extracts any website by example — paste a URL and a value you see on the page (a price, title, or name) and it learns the HTML pattern and pulls every similar item as structured rows. No CSS selectors, no API key. Export CSV/JSON/Excel.

Developer

Flash Scrape

Last modified

19 hours ago

How to scrape any website by example (3 steps)

Paste the page URL. Use a list, category, or search page that has repeating items (products, quotes, listings, search results).
Paste example values you can see on the page — one per line. Optionally label them as label: value (for example author: Albert Einstein) so your output columns get clean names.
Run it. The scraper finds each example in the HTML, learns the wrapping tag and class, extracts every element matching that pattern, and zips the fields into structured rows.

That is the whole workflow. No browser extension to install, no point-and-click recorder that breaks on the next layout change, and no selector knowledge required. You teach the scraper by example and it generalizes the rule across the entire page.
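
The three steps can be pictured with a small sketch. This is not the actor's actual implementation, only an illustration of the "find the example, learn tag and class, extract every match, zip into rows" idea using Python's standard library. The names TextNodeCollector, extract_by_example, and SAMPLE_HTML are hypothetical, and the sketch assumes each value sits in its own element.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextNodeCollector(HTMLParser):
    """Collect (tag, class, text) for each non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, class)
        self.nodes = []   # (tag, class, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.nodes.append((tag, cls, text))


def extract_by_example(html, examples):
    """examples maps a column label to one value copied from the page."""
    parser = TextNodeCollector()
    parser.feed(html)
    parser.close()

    columns = {}
    for label, example in examples.items():
        # Step 1: find the first element whose text contains the example.
        rule = next(((tag, cls) for tag, cls, text in parser.nodes
                     if example in text), None)
        # Step 2: collect the text of every element with the same tag and class.
        columns[label] = [text for tag, cls, text in parser.nodes
                          if rule is not None and (tag, cls) == rule]

    # Step 3: zip the columns into rows by position.
    length = max((len(values) for values in columns.values()), default=0)
    return [
        {label: (values[i] if i < len(values) else None)
         for label, values in columns.items()}
        for i in range(length)
    ]


SAMPLE_HTML = """
<div class="quote"><span class="text">It is a process of our thinking.</span>
  <small class="author">Albert Einstein</small></div>
<div class="quote"><span class="text">It is our choices that show what we are.</span>
  <small class="author">J.K. Rowling</small></div>
"""

rows = extract_by_example(SAMPLE_HTML, {"quote": "process of our thinking",
                                        "author": "Albert Einstein"})
```

Zipping by position is the simplest possible choice and misaligns rows when an item lacks a field; a real extractor would more likely group fields by their shared parent element.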

What makes this different

Most "by-example" scrapers give you values and leave you guessing whether they're right. This one shows its work:

Confidence score on every row — _confidence (0-1) plus _fieldsFilled tells you how reliable each extraction is, so you can trust or filter the output instead of eyeballing it.
The learned selector, exposed — the run saves an EXTRACTION_SCHEMA (and logs it) showing the exact tag.class selector, detected type, match count, and confidence it inferred for each field. Full transparency, easy debugging.
Type detection + optional normalization — it tags each field as number / price / percent / date / text and, with normalizeValues on, converts prices and numbers into real numbers in the output.
Multiple examples per field — give the same label on several lines and the scraper uses them together for a more robust pattern (and higher confidence).
Pagination follow — set maxPages and it follows rel="next" / "Next" links across pages automatically.
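
To make the type detection and normalization bullet concrete, here is a minimal sketch of how values could be tagged as number / price / percent / date / text and converted to real numbers. The regular expressions and the function names detect_type and normalize are assumptions for illustration; the actor's own rules are not documented here and may differ (for example in which currencies or date formats it recognizes).

```python
import re

# A plain or comma-grouped number with an optional decimal part.
_NUM = r"(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?"

# Checked in order; the first full match wins.
_PATTERNS = [
    ("price", re.compile(rf"[$€£]\s?{_NUM}")),
    ("percent", re.compile(rf"{_NUM}\s?%")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),   # ISO dates only in this sketch
    ("number", re.compile(_NUM)),
]


def detect_type(value: str) -> str:
    """Return one of: price, percent, date, number, text."""
    stripped = value.strip()
    for name, pattern in _PATTERNS:
        if pattern.fullmatch(stripped):
            return name
    return "text"


def normalize(value: str):
    """Convert prices, percents, and numbers to float; leave the rest as-is."""
    if detect_type(value) in ("price", "percent", "number"):
        return float(re.sub(r"[^\d.]", "", value))
    return value
```

Under these assumptions, normalize("$1,299.00") yields the float 1299.0, while a text value such as "Albert Einstein" passes through unchanged.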

What data you get

Every run returns one row per extracted item. Each row contains:

sourceUrl — the page the item was extracted from.
One column per example you provided, named by your label (e.g. quote, author, price, title).
With includeMeta on (the default): _confidence, _fieldsFilled, _types, and _page.

Because columns come from your labels, the output schema matches exactly what you asked for — no junk fields, no nested mess. Export the dataset to CSV, JSON, or Excel straight from the run.

Input

Field	Required	Description
`startUrls`	Yes	Pages to scrape — typically list / category / search pages with repeating items.
`examples`	Yes	Values visible on the page, one per line. Label them `author: Albert Einstein` to name the output columns. Repeat a label on multiple lines for a more robust pattern.
`maxItems`	No	Stop after this many rows across all URLs. Use `0` for no limit (default).
`maxPages`	No	Follow pagination ("Next" / `rel="next"`) up to this many pages per URL. Default `1`.
`normalizeValues`	No	Convert detected numbers / prices / percents into real numbers. Default `false`.
`includeMeta`	No	Add per-row `_confidence`, `_fieldsFilled`, `_types`, `_page`. Default `true`.
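
The maxPages behavior (follow "Next" / rel="next" up to a page limit) can be sketched as follows. This is an illustrative approximation, not the actor's code: find_next_url and crawl_pages are hypothetical names, and fetch is an assumed caller-supplied function that returns the HTML for a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Remember the first link marked rel="next" or labelled "Next"."""

    def __init__(self):
        super().__init__()
        self.next_href = None
        self._open_href = None   # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        href = attributes.get("href")
        if tag not in ("a", "link") or not href:
            return
        rels = (attributes.get("rel") or "").lower().split()
        if "next" in rels and self.next_href is None:
            self.next_href = href
        if tag == "a":
            self._open_href, self._text = href, []

    def handle_data(self, data):
        if self._open_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._open_href is not None:
            label = "".join(self._text).strip().lower()
            if label.startswith("next") and self.next_href is None:
                self.next_href = self._open_href
            self._open_href = None


def find_next_url(html, base_url):
    finder = NextLinkFinder()
    finder.feed(html)
    finder.close()
    return urljoin(base_url, finder.next_href) if finder.next_href else None


def crawl_pages(start_url, fetch, max_pages=1):
    """Yield (page_number, url, html), following next links up to max_pages."""
    url, page = start_url, 1
    while url and page <= max_pages:
        html = fetch(url)
        yield page, url, html
        url = find_next_url(html, url)
        page += 1
```

With max_pages left at 1, only the start URL is fetched, which matches the documented default.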

Example input

{
  "startUrls": [{ "url": "https://quotes.toscrape.com/" }],
  "examples": ["quote: process of our thinking", "author: Albert Einstein"],
  "maxItems": 0
}
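
If you generate the input programmatically, a small helper can turn labeled examples into the label: value lines shown above. This is a sketch under the assumption that the input has exactly the shape of the example; build_run_input is a hypothetical helper, not part of the actor or of any Apify library.

```python
import json


def build_run_input(urls, examples, max_items=0, max_pages=1):
    """Build the actor input from a {label: value or [values]} mapping."""
    lines = []
    for label, values in examples.items():
        if isinstance(values, str):
            values = [values]
        # Repeating a label on several lines gives the scraper more examples.
        lines.extend(f"{label}: {value}" for value in values)
    return {
        "startUrls": [{"url": url} for url in urls],
        "examples": lines,
        "maxItems": max_items,
        "maxPages": max_pages,
    }


run_input = build_run_input(
    ["https://quotes.toscrape.com/"],
    {"quote": "process of our thinking", "author": "Albert Einstein"},
)
print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the actor's JSON input editor or supplied as the run input when starting the actor through the Apify API.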

JSON output sample

For the input above, the scraper returns one row per quote on the page:

[
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
    "author": "Albert Einstein",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  },
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
    "author": "J.K. Rowling",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  }
]
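
Because every row carries _confidence and _fieldsFilled, downstream code can filter on them instead of inspecting rows by hand. The sketch below assumes rows shaped like the sample above, with _fieldsFilled formatted as "filled/total"; the 0.8 threshold and the name keep_reliable are arbitrary choices for illustration.

```python
def fields_filled_ratio(row):
    """Parse a "_fieldsFilled" value such as "2/2" into a 0-1 ratio."""
    filled, total = row["_fieldsFilled"].split("/")
    return int(filled) / int(total) if int(total) else 0.0


def keep_reliable(rows, min_confidence=0.8):
    """Keep rows that meet the confidence threshold and have every field filled."""
    return [
        row for row in rows
        if row.get("_confidence", 0) >= min_confidence
        and fields_filled_ratio(row) == 1.0
    ]


sample_rows = [
    {"quote": "A", "author": "Albert Einstein", "_confidence": 0.95, "_fieldsFilled": "2/2"},
    {"quote": "B", "author": None, "_confidence": 0.95, "_fieldsFilled": "1/2"},
    {"quote": "C", "author": "J.K. Rowling", "_confidence": 0.4, "_fieldsFilled": "2/2"},
]
reliable = keep_reliable(sample_rows)
```

Here only the first sample row survives: the second is missing a field and the third falls below the threshold.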

The run also saves an EXTRACTION_SCHEMA to the key-value store, e.g.:

{
  "learnedRules": [
    { "field": "quote", "selector": "span.text", "type": "text", "matches": 10, "confidence": 0.95 },
    { "field": "author", "selector": "small.author", "type": "text", "matches": 10, "confidence": 0.95 }
  ],
  "itemsExtracted": 10,
  "averageConfidence": 0.95
}
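
When debugging a run, the schema can be checked mechanically. The following sketch assumes the JSON shape shown above and flags any learned rule with low confidence or with fewer matches than the best-matching field, which often means a selector that misses some items. The function name suspicious_rules and the 0.8 threshold are assumptions.

```python
def suspicious_rules(schema, min_confidence=0.8):
    """Return (field, selector, reasons) for rules worth a second look."""
    rules = schema["learnedRules"]
    most_matches = max((rule["matches"] for rule in rules), default=0)
    flagged = []
    for rule in rules:
        reasons = []
        if rule["confidence"] < min_confidence:
            reasons.append("low confidence")
        if rule["matches"] < most_matches:
            reasons.append("fewer matches than other fields")
        if reasons:
            flagged.append((rule["field"], rule["selector"], reasons))
    return flagged


example_schema = {
    "learnedRules": [
        {"field": "quote", "selector": "span.text", "type": "text",
         "matches": 10, "confidence": 0.95},
        {"field": "author", "selector": "small.author", "type": "text",
         "matches": 7, "confidence": 0.6},
    ],
    "itemsExtracted": 10,
    "averageConfidence": 0.78,
}
flagged = suspicious_rules(example_schema)
```

In this made-up schema the author rule is flagged for both reasons; adding a second author example on another line would be the natural next step.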

Point it at a shop instead and label your examples title:, price:, and sku: — you get one row per product with exactly those columns plus sourceUrl.

Filters & options

Scrape multiple pages at once — add several entries to startUrls and the rows are combined into one dataset.
Name your own columns — label every example as label: value to control the output schema.
Cap your results — set maxItems to limit total rows (handy for quick test runs), or 0 for everything.
Mix field types on one page — give a title example and a price example together and they zip into the same rows.

Pricing

This actor uses pay-per-result: you are charged once per extracted row via the item event, so you only pay for data you actually get. Runs are free while monetization is unconfigured, and you can cap spend with maxItems. Check the actor's Apify Store page for the current per-item rate.

Use with AI agents & automation

The dataset is plain JSON, so it drops straight into your stack. Call this scraper from an MCP server to give AI agents live web-extraction-by-example, or wire it into Make, n8n, or Zapier to trigger runs and route rows to a CRM, database, or Google Sheets automatically. Schedule recurring runs to keep a sheet of prices, listings, or leads continuously fresh — no glue code needed.

Other Flash Scrape scrapers

Need a ready-made scraper for a specific platform? Try the rest of the Flash Scrape suite:

Google Maps Leads Scraper — Google Maps business leads
Yelp Leads Scraper — Yelp business leads
BBB + Yellow Pages Leads Scraper — BBB and Yellow Pages leads
Instagram Leads Scraper — Instagram profile leads
TikTok Leads Scraper — TikTok creator leads
YouTube Leads Scraper — YouTube creator leads

FAQ

Is it legal to scrape websites with this? The actor only reads publicly available web content — the same pages anyone can open in a browser. Scrape responsibly, respect each site's terms of service and robots rules, and avoid collecting personal or copyrighted data you are not entitled to use.

Do I need an API key or any code? No. There is no API key and no coding. You paste a URL and a few example values you can see on the page; the scraper learns the pattern for you.

How many results can I get? As many repeating items as the page contains across all your startUrls. Set maxItems to cap the total, or leave it at 0 for no limit.

Can I export to CSV, Excel, or Google Sheets? Yes. Every run produces a dataset you can download as CSV, JSON, or Excel, or push to Google Sheets via Make, n8n, or Zapier.

Why didn't my example match? Copy an exact value from the page's visible text — not from an image, a tooltip, or a dropdown. It also works best when each value sits in its own element (a <span> price, an <h2> or <a> title).

Can AI agents call this scraper? Yes. It exposes a standard Apify run interface, so MCP servers and agent frameworks can invoke it and read the structured rows directly.

Scrapes public web content. Use responsibly and within each site's terms.
