AI Web Scraper - Extract Any Website by Example

Pricing

from $0.000035 / actor start

AI web scraper that extracts any website by example — paste a URL and a value you see on the page (a price, title, or name) and it learns the HTML pattern and pulls every similar item as structured rows. No CSS selectors, no API key. Export CSV/JSON/Excel.

Developer

Flash Scrape

Maintained by Community

19 hours ago

Last modified

A no-code web scraper that turns any website into clean, structured data without a single CSS selector, XPath expression, or line of code. This is web scraping by example: paste a URL, paste one or two values you can see on the page (a price, a title, a name), and the scraper learns the surrounding HTML pattern and pulls every similar item into rows you can export to CSV, JSON, or Excel. No API key. No fragile selectors to maintain.

If you have ever wanted to scrape a website without coding, this is the simplest way to do it: show the actor what you want by example, and it figures out the rest.

How to scrape any website by example (3 steps)

  1. Paste the page URL. Use a list, category, or search page that has repeating items (products, quotes, listings, search results).
  2. Paste example values you can see on the page — one per line. Optionally label them as label: value (for example author: Albert Einstein) so your output columns get clean names.
  3. Run it. The scraper finds each example in the HTML, learns the wrapping tag and class, extracts every element matching that pattern, and zips the fields into structured rows.

That is the whole workflow. No browser extension to install, no point-and-click recorder that breaks on the next layout change, and no selector knowledge required. You teach the scraper by example and it generalizes the rule across the entire page.
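
The three steps above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the actor's actual implementation: it assumes each value sits alone in an element identified by its tag and class, it uses only the standard library's html.parser, and the names TextCollector, learn_selector, and scrape_by_example are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TextCollector(HTMLParser):
    """Records (tag, class, text) for every text node under its innermost open element."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements as (tag, class) pairs
        self.nodes = []  # collected (tag, class, text) triples

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs).get("class") or ""))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag name, if any.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, cls = self.stack[-1]
            self.nodes.append((tag, cls, text))


def collect(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_selector(nodes, example):
    """Step 3a: find the example text and remember its wrapping tag and class."""
    for tag, cls, text in nodes:
        if example in text:
            return tag, cls
    return None


def scrape_by_example(html, examples):
    """Steps 3b and 3c: extract every element matching each learned pattern, then zip into rows."""
    nodes = collect(html)
    columns = {}
    for label, example in examples.items():
        selector = learn_selector(nodes, example)
        columns[label] = [t for tag, cls, t in nodes if (tag, cls) == selector]
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


PAGE = """
<ul>
<li class="item"><h2 class="title">Blue Mug</h2><span class="price">$12.50</span></li>
<li class="item"><h2 class="title">Red Bowl</h2><span class="price">$8.00</span></li>
</ul>
"""

rows = scrape_by_example(PAGE, {"title": "Red Bowl", "price": "$12.50"})
# rows == [{"title": "Blue Mug", "price": "$12.50"},
#          {"title": "Red Bowl", "price": "$8.00"}]
```

Note that the two examples come from different items, yet both rows are recovered: one example per field is enough to generalize the pattern to every item in this toy page.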

What makes this different

Most "by-example" scrapers give you values and leave you guessing whether they're right. This one shows its work:

  • Confidence score on every row: _confidence (0-1) plus _fieldsFilled tell you how reliable each extraction is, so you can trust or filter the output instead of eyeballing it.
  • The learned selector, exposed — the run saves an EXTRACTION_SCHEMA (and logs it) showing the exact tag.class selector, detected type, match count, and confidence it inferred for each field. Full transparency, easy debugging.
  • Type detection + optional normalization — it tags each field as number / price / percent / date / text and, with normalizeValues on, converts prices and numbers into real numbers in the output.
  • Multiple examples per field — give the same label on several lines and the scraper uses them together for a more robust pattern (and higher confidence).
  • Pagination follow — set maxPages and it follows rel="next" / "Next" links across pages automatically.
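
To make the type detection and normalization bullet concrete, here is a rough sketch of how such tagging could work. The regular expressions, the accepted date formats, and the function names detect_type and normalize are assumptions for illustration; the actor's real rules are not documented here and may differ.

```python
import re
from datetime import datetime

# Assumed patterns; a real detector would likely cover more currencies and locales.
PRICE_RE = re.compile(r"^[$€£]\s?\d[\d,]*(\.\d+)?$")
PERCENT_RE = re.compile(r"^-?\d+(\.\d+)?\s?%$")
NUMBER_RE = re.compile(r"^-?\d[\d,]*(\.\d+)?$")
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def detect_type(value):
    """Tag a scraped string as price, percent, number, date, or text."""
    text = value.strip()
    if PRICE_RE.match(text):
        return "price"
    if PERCENT_RE.match(text):
        return "percent"
    if NUMBER_RE.match(text):
        return "number"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(text, fmt)
            return "date"
        except ValueError:
            continue
    return "text"


def normalize(value):
    """Convert prices, percents, and numbers to floats; leave everything else as text."""
    if detect_type(value) in ("price", "percent", "number"):
        return float(re.sub(r"[^\d.\-]", "", value))
    return value.strip()


print(detect_type("$1,299.00"), normalize("$1,299.00"))  # price 1299.0
print(detect_type("45%"), normalize("45%"))              # percent 45.0
print(detect_type("2024-03-01"))                         # date
```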

What data you get

Every run returns one row per extracted item. Each row contains:

  • sourceUrl — the page the item was extracted from.
  • One column per example you provided, named by your label (e.g. quote, author, price, title).
  • With metadata on (default): _confidence, _fieldsFilled, _types, and _page.

Because columns come from your labels, the output schema matches exactly what you asked for — no junk fields, no nested mess. Export the dataset to CSV, JSON, or Excel straight from the run.

Input

| Field | Required | Description |
| --- | --- | --- |
| startUrls | Yes | Pages to scrape, typically list / category / search pages with repeating items. |
| examples | Yes | Values visible on the page, one per line. Label them author: Albert Einstein to name the output columns. Repeat a label on multiple lines for a more robust pattern. |
| maxItems | No | Stop after this many rows across all URLs. Use 0 for no limit (default). |
| maxPages | No | Follow pagination ("Next" / rel="next") up to this many pages per URL. Default 1. |
| normalizeValues | No | Convert detected numbers / prices / percents into real numbers. Default false. |
| includeMeta | No | Add per-row _confidence, _fieldsFilled, _types, _page. Default true. |
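
The label: value convention for examples can be pictured as a small parsing step. The sketch below is an assumption about how such lines might be grouped, not the actor's code: parse_examples is a hypothetical helper, and the field1, field2 names given to unlabeled lines are invented for illustration.

```python
from collections import defaultdict


def parse_examples(lines):
    """Group example lines by label; repeated labels collect several examples."""
    fields = defaultdict(list)
    unnamed = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        label, sep, value = line.partition(":")
        if sep and label.strip() and value.strip():
            fields[label.strip()].append(value.strip())
        else:
            # No usable "label: value" shape, so fall back to a generated name (assumed).
            unnamed += 1
            fields[f"field{unnamed}"].append(line)
    return dict(fields)


parsed = parse_examples([
    "quote: process of our thinking",
    "author: Albert Einstein",
    "author: J.K. Rowling",
    "$12.50",
])
# parsed == {"quote": ["process of our thinking"],
#            "author": ["Albert Einstein", "J.K. Rowling"],
#            "field1": ["$12.50"]}
```

A naive split on the first colon would misread values that themselves contain one (a time such as 10:30, or a URL), so labeling those examples explicitly is the safer habit.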

Example input

{
  "startUrls": [{ "url": "https://quotes.toscrape.com/" }],
  "examples": ["quote: process of our thinking", "author: Albert Einstein"],
  "maxItems": 0
}

JSON output sample

For the input above, the scraper returns one row per quote on the page:

[
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
    "author": "Albert Einstein",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  },
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
    "author": "J.K. Rowling",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  }
]
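
Because every row carries _confidence and _fieldsFilled, downstream code can filter on them before exporting. The following is a small sketch using only the standard library; the thresholds, the helper names, and the low-confidence sample row are assumptions made up for the example.

```python
import csv
import io


def fields_filled_ratio(value):
    """Turn a string such as "2/2" into a fraction between 0 and 1."""
    filled, total = (int(part) for part in value.split("/"))
    return filled / total if total else 0.0


def keep_reliable(rows, min_confidence=0.8, min_filled=1.0):
    """Keep rows that meet both the confidence and the completeness threshold."""
    return [
        row for row in rows
        if row.get("_confidence", 0) >= min_confidence
        and fields_filled_ratio(row.get("_fieldsFilled", "0/1")) >= min_filled
    ]


def to_csv(rows, columns):
    """Write only the chosen columns; metadata keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    {"author": "Albert Einstein", "quote": "process of our thinking",
     "_confidence": 0.95, "_fieldsFilled": "2/2"},
    # Hypothetical incomplete row, invented to show the filter at work.
    {"author": "", "quote": "an orphaned quote",
     "_confidence": 0.4, "_fieldsFilled": "1/2"},
]

reliable = keep_reliable(sample)
print(to_csv(reliable, ["author", "quote"]))
```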

The run also saves an EXTRACTION_SCHEMA to the key-value store, e.g.:

{
  "learnedRules": [
    { "field": "quote", "selector": "span.text", "type": "text", "matches": 10, "confidence": 0.95 },
    { "field": "author", "selector": "small.author", "type": "text", "matches": 10, "confidence": 0.95 }
  ],
  "itemsExtracted": 10,
  "averageConfidence": 0.95
}

Point it at a shop instead and label your examples title:, price:, and sku: — you get one row per product with exactly those columns plus sourceUrl.
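
The EXTRACTION_SCHEMA record is also a convenient place to sanity-check a run programmatically. As a rough sketch, assuming the schema has the shape shown in the sample (learnedRules entries with field, matches, and confidence), a hypothetical review_schema helper could flag weak rules and fields whose match counts disagree, which would hint at misaligned rows:

```python
def review_schema(schema, min_confidence=0.8):
    """Return human-readable warnings for weak or mismatched learned rules."""
    rules = schema["learnedRules"]
    expected = max((rule["matches"] for rule in rules), default=0)
    warnings = []
    for rule in rules:
        if rule["confidence"] < min_confidence:
            warnings.append(f'{rule["field"]}: low confidence {rule["confidence"]}')
        if rule["matches"] != expected:
            warnings.append(
                f'{rule["field"]}: {rule["matches"]} matches, other fields have {expected}'
            )
    return warnings


schema = {
    "learnedRules": [
        {"field": "quote", "selector": "span.text", "type": "text",
         "matches": 10, "confidence": 0.95},
        {"field": "author", "selector": "small.author", "type": "text",
         "matches": 10, "confidence": 0.95},
    ],
    "itemsExtracted": 10,
    "averageConfidence": 0.95,
}

print(review_schema(schema))  # [] because both rules agree and are confident
```

The 0.8 threshold is arbitrary; pick whatever level suits how much manual checking you can tolerate.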

Filters & options

  • Scrape multiple pages at once — add several entries to startUrls and the rows are combined into one dataset.
  • Name your own columns — label every example as label: value to control the output schema.
  • Cap your results — set maxItems to limit total rows (handy for quick test runs), or 0 for everything.
  • Mix field types on one page — give a title example and a price example together and they zip into the same rows.

Pricing

This actor uses pay-per-result: you are charged once per extracted row via the item event, so you only pay for data you actually get. Runs are free while monetization is unconfigured, and you can cap spend with maxItems. Check the actor's Apify Store page for the current per-item rate.

Use with AI agents & automation

The dataset is plain JSON, so it drops straight into your stack. Call this scraper from an MCP server to give AI agents live web-extraction-by-example, or wire it into Make, n8n, or Zapier to trigger runs and route rows to a CRM, database, or Google Sheets automatically. Schedule recurring runs to keep a sheet of prices, listings, or leads continuously fresh — no glue code needed.
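
For teams that prefer to trigger runs from their own code, a call through the official apify-client Python package might look like the sketch below. Several things here are assumptions: the actor ID is a placeholder you would copy from the Store page, the token is read from an APIFY_TOKEN environment variable (programmatic access to Apify requires an account token even though the actor itself needs no third-party API key), and build_run_input and run_scraper are hypothetical helper names.

```python
import os

# Placeholder, not the real ID; copy the actual value from the actor's Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(url, examples, max_items=0, max_pages=1):
    """Assemble the input object using the field names from the Input table."""
    return {
        "startUrls": [{"url": url}],
        "examples": examples,
        "maxItems": max_items,
        "maxPages": max_pages,
    }


def run_scraper(run_input, actor_id=ACTOR_ID):
    """Start a run, wait for it to finish, and return the dataset rows."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    "https://quotes.toscrape.com/",
    ["quote: process of our thinking", "author: Albert Einstein"],
    max_items=20,
)
# rows = run_scraper(run_input)  # needs network access and a valid token
```

Setting max_items during testing keeps the per-row charges predictable while you check that the examples match.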

FAQ

Is it legal to scrape websites with this?
The actor only reads publicly available web content, the same pages anyone can open in a browser. Scrape responsibly, respect each site's terms of service and robots rules, and avoid collecting personal or copyrighted data you are not entitled to use.

Do I need an API key or any code?
No. There is no API key and no coding. You paste a URL and a few example values you can see on the page; the scraper learns the pattern for you.

How many results can I get?
As many repeating items as the page contains across all your startUrls. Set maxItems to cap the total, or leave it at 0 for no limit.

Can I export to CSV, Excel, or Google Sheets?
Yes. Every run produces a dataset you can download as CSV, JSON, or Excel, or push to Google Sheets via Make, n8n, or Zapier.

Why didn't my example match?
Copy an exact value from the page's visible text, not from an image, a tooltip, or a dropdown. It also works best when each value sits in its own element (a <span> price, an <h2> or <a> title).

Can AI agents call this scraper?
Yes. It exposes a standard Apify run interface, so MCP servers and agent frameworks can invoke it and read the structured rows directly.


Scrapes public web content. Use responsibly and within each site's terms.