AI Web Scraper - Extract Any Website by Example
Pricing
from $0.000035 / actor start
AI web scraper that extracts any website by example — paste a URL and a value you see on the page (a price, title, or name) and it learns the HTML pattern and pulls every similar item as structured rows. No CSS selectors, no API key. Export CSV/JSON/Excel.
Developer
Flash Scrape
Maintained by Community
A no-code web scraper that turns any website into clean, structured data without writing a single CSS selector, XPath expression, or line of code. This is web scraping by example: paste a URL, paste one or two values you can actually see on the page (a price, a title, a name), and the scraper learns the surrounding HTML pattern and pulls every similar item into rows you can export to CSV, JSON, or Excel. No API key. No fragile selectors to maintain.
If you have ever wanted to scrape a website without coding, this is the simplest way to do it: show the actor what you want by example, and it figures out the rest.
How to scrape any website by example (3 steps)
- Paste the page URL. Use a list, category, or search page that has repeating items (products, quotes, listings, search results).
- Paste example values you can see on the page, one per line. Optionally label them as `label: value` (for example `author: Albert Einstein`) so your output columns get clean names.
- Run it. The scraper finds each example in the HTML, learns the wrapping tag and class, extracts every element matching that pattern, and zips the fields into structured rows.
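The three steps above can be pictured with a minimal Python sketch. It is not the actor's code. It assumes only what this section describes: find the element whose text contains each example, take that element's tag and class as the rule, collect every element with the same tag and class, and zip the columns into rows. It uses nothing beyond the standard library's `html.parser`, and the helper names `ElementCollector`, `learn_rule`, and `scrape_by_example` are hypothetical.

```python
from html.parser import HTMLParser
from itertools import zip_longest

# Elements that never have a closing tag, so they are never left "open".
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


def squash(text):
    """Collapse runs of whitespace so matching ignores line breaks and indentation."""
    return " ".join(text.split())


class ElementCollector(HTMLParser):
    """Records [tag, class, inner_text] for every element, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self._open = []  # indices of elements whose end tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        self.elements.append([tag, dict(attrs).get("class") or "", ""])
        if tag not in VOID_TAGS:
            self._open.append(len(self.elements) - 1)

    def handle_endtag(self, tag):
        # Close back to the nearest open element with this tag name.
        for pos in range(len(self._open) - 1, -1, -1):
            if self.elements[self._open[pos]][0] == tag:
                del self._open[pos:]
                break

    def handle_data(self, data):
        # A text node counts toward every element that encloses it.
        for idx in self._open:
            self.elements[idx][2] += data


def learn_rule(elements, example):
    """Return (tag, class) of the innermost element whose text contains the example."""
    target = squash(example)
    hits = [i for i, el in enumerate(elements) if target in squash(el[2])]
    if not hits:
        return None
    # Shortest text wins. On a tie prefer the later element, which is the
    # inner one when a parent and its child hold the same text.
    best = min(hits, key=lambda i: (len(squash(elements[i][2])), -i))
    return elements[best][0], elements[best][1]


def scrape_by_example(html, examples):
    """examples maps an output column name to one value copied from the page."""
    parser = ElementCollector()
    parser.feed(html)
    parser.close()
    columns = {}
    for label, value in examples.items():
        rule = learn_rule(parser.elements, value)
        columns[label] = [squash(text) for tag, cls, text in parser.elements
                          if rule is not None and (tag, cls) == rule]
    # Zip the columns position by position. A missing value becomes None.
    labels = list(columns)
    return [dict(zip(labels, values)) for values in zip_longest(*columns.values())]
```

Picking the element with the shortest containing text is what turns "process of our thinking" into a `span.text` rule instead of the enclosing `div.quote`. A real scraper would also have to cope with several examples per field, classes shared by unrelated elements, and pagination, and this sketch leaves all of those out.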
That is the whole workflow. No browser extension to install, no point-and-click recorder that breaks on the next layout change, and no selector knowledge required. You teach the scraper by example and it generalizes the rule across the entire page.
What makes this different
Most "by-example" scrapers give you values and leave you guessing whether they're right. This one shows its work:
- Confidence score on every row: `_confidence` (0-1) plus `_fieldsFilled` tell you how reliable each extraction is, so you can trust or filter the output instead of eyeballing it.
- The learned selector, exposed: the run saves an `EXTRACTION_SCHEMA` (and logs it) showing the exact `tag.class` selector, detected type, match count, and confidence it inferred for each field. Full transparency, easy debugging.
- Type detection + optional normalization: it tags each field as number / price / percent / date / text and, with `normalizeValues` on, converts detected numbers, prices, and percents into real numbers in the output.
- Multiple examples per field: give the same label on several lines and the scraper uses them together for a more robust pattern (and higher confidence).
- Pagination follow: set `maxPages` and it follows `rel="next"` / "Next" links across pages automatically.
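The type tags and the `normalizeValues` conversion can be illustrated with a small Python sketch. The regular expressions, the currency symbols, and the `detect_type` / `normalize` names are assumptions made for illustration, not the actor's documented rules. This sketch recognizes only ISO `YYYY-MM-DD` dates and keeps a percent as its printed number (`12.5%` becomes `12.5`, not `0.125`).

```python
import re

# Illustrative patterns only; the actor's real detection rules are not documented here.
PRICE_RE = re.compile(r"^[$€£¥]\s?\d[\d,]*(\.\d+)?$")
PERCENT_RE = re.compile(r"^-?\d+(\.\d+)?\s?%$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # ISO dates only
NUMBER_RE = re.compile(r"^-?\d[\d,]*(\.\d+)?$")


def detect_type(value):
    """Classify a scraped string as price, percent, date, number, or text."""
    v = value.strip()
    if PRICE_RE.match(v):
        return "price"
    if PERCENT_RE.match(v):
        return "percent"
    if DATE_RE.match(v):
        return "date"
    if NUMBER_RE.match(v):
        return "number"
    return "text"


def normalize(value):
    """Turn prices, percents, and numbers into floats; return other values unchanged."""
    if detect_type(value) in ("price", "percent", "number"):
        # Drop currency symbols, thousands separators, spaces, and the % sign.
        return float(re.sub(r"[^\d.\-]", "", value))
    return value
```

For example, `normalize("$1,299.00")` gives `1299.0`, while `normalize("Albert Einstein")` returns the string unchanged.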
What data you get
Every run returns one row per extracted item. Each row contains:
- `sourceUrl`: the page the item was extracted from.
- One column per field you labeled in your examples (e.g. `quote`, `author`, `price`, `title`).
- With metadata on (default): `_confidence`, `_fieldsFilled`, `_types`, and `_page`.
Because columns come from your labels, the output schema matches exactly what you asked for, with no junk fields: just `sourceUrl`, your labeled fields, and the optional metadata. Export the dataset to CSV, JSON, or Excel straight from the run.
Input
| Field | Required | Description |
|---|---|---|
| `startUrls` | Yes | Pages to scrape, typically list / category / search pages with repeating items. |
| `examples` | Yes | Values visible on the page, one per line. Label them as `author: Albert Einstein` to name the output columns. Repeat a label on multiple lines for a more robust pattern. |
| `maxItems` | No | Stop after this many rows across all URLs. Use `0` for no limit (default). |
| `maxPages` | No | Follow pagination ("Next" / `rel="next"`) up to this many pages per URL. Default `1`. |
| `normalizeValues` | No | Convert detected numbers / prices / percents into real numbers. Default `false`. |
| `includeMeta` | No | Add per-row `_confidence`, `_fieldsFilled`, `_types`, `_page`. Default `true`. |
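The `examples` field is a list of strings. One plausible way to read it, sketched below in Python, is to split each line into a label and a value and to group lines that repeat a label, since a repeated label is described as adding examples to the same field. The label pattern (an identifier followed by a colon) and the `field_N` fallback for unlabeled lines are hypothetical; how the actor actually names unlabeled examples is not documented here. A value that happens to start with a word and a colon, such as `Note: sale ends soon`, would be read as labeled by this sketch.

```python
import re
from collections import defaultdict

# Hypothetical rule: a label is an identifier followed by a colon.
LABEL_RE = re.compile(r"^([A-Za-z_]\w*):\s*(.+)$")


def group_examples(lines):
    """Group 'label: value' lines by label; unlabeled lines get a placeholder field_N name."""
    groups = defaultdict(list)
    unlabeled = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = LABEL_RE.match(line)
        if match:
            label, value = match.group(1), match.group(2).strip()
        else:
            unlabeled += 1
            label, value = f"field_{unlabeled}", line
        groups[label].append(value)
    return dict(groups)
```

With this reading, `["author: Albert Einstein", "author: J.K. Rowling"]` becomes a single `author` field with two examples. That single field is what the multiple-examples option would use to learn one more robust pattern.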
Example input
{
  "startUrls": [{ "url": "https://quotes.toscrape.com/" }],
  "examples": ["quote: process of our thinking", "author: Albert Einstein"],
  "maxItems": 0
}
JSON output sample
For the input above, the scraper returns one row per quote on the page (the first two rows are shown):
[
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
    "author": "Albert Einstein",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  },
  {
    "sourceUrl": "https://quotes.toscrape.com/",
    "quote": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
    "author": "J.K. Rowling",
    "_confidence": 0.95,
    "_fieldsFilled": "2/2",
    "_types": { "quote": "text", "author": "text" },
    "_page": 1
  }
]
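Because every row carries `_confidence` and `_fieldsFilled`, low-quality rows can be filtered after a run. The Python sketch below works on rows shaped like the sample above. The 0.8 threshold and the `reliable_rows` name are arbitrary choices for illustration, and the sketch assumes `_fieldsFilled` always uses the `filled/total` form shown in the sample.

```python
import json


def reliable_rows(rows, min_confidence=0.8, require_all_fields=True):
    """Keep rows at or above min_confidence, optionally only those with every field filled."""
    kept = []
    for row in rows:
        if row.get("_confidence", 0.0) < min_confidence:
            continue
        # "_fieldsFilled" looks like "2/2": filled fields / expected fields.
        filled, total = (int(n) for n in row.get("_fieldsFilled", "0/0").split("/"))
        if require_all_fields and filled < total:
            continue
        kept.append(row)
    return kept


# Two hand-written rows in the sample's shape (the second one is invented for illustration).
exported = json.loads("""[
  {"author": "Albert Einstein", "_confidence": 0.95, "_fieldsFilled": "2/2"},
  {"author": "", "_confidence": 0.4, "_fieldsFilled": "1/2"}
]""")
print([row["author"] for row in reliable_rows(exported)])  # ['Albert Einstein']
```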
The run also saves an `EXTRACTION_SCHEMA` record to the key-value store, for example:
{
  "learnedRules": [
    { "field": "quote", "selector": "span.text", "type": "text", "matches": 10, "confidence": 0.95 },
    { "field": "author", "selector": "small.author", "type": "text", "matches": 10, "confidence": 0.95 }
  ],
  "itemsExtracted": 10,
  "averageConfidence": 0.95
}
Point it at a shop instead and label your examples `title:`, `price:`, and `sku:`. You get one row per product with exactly those columns plus `sourceUrl` (and the metadata columns, if enabled).
Filters & options
- Scrape multiple pages at once: add several entries to `startUrls` and the rows are combined into one dataset.
- Name your own columns: label every example as `label: value` to control the output schema.
- Cap your results: set `maxItems` to limit total rows (handy for quick test runs), or `0` for everything.
- Mix field types on one page: give a title example and a price example together and they zip into the same rows.
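How `maxItems` interacts with several `startUrls` can be expressed as a short Python sketch. Rows from each URL are appended to one combined list, collection stops once the cap is reached, and `0` means no limit. The `combine_rows` helper and its input shape (one list of rows per URL, in `startUrls` order) are assumptions. The actor may visit URLs in a different order or concurrently, so which rows survive the cap can differ.

```python
from itertools import chain, islice


def combine_rows(rows_per_url, max_items=0):
    """Flatten per-URL row lists into one dataset; max_items=0 means no limit."""
    combined = chain.from_iterable(rows_per_url)
    if max_items > 0:
        # Stop pulling rows as soon as the cap is reached.
        combined = islice(combined, max_items)
    return list(combined)
```

Because the cap applies to the combined stream and not to each URL, `maxItems: 4` over two pages of three rows each yields three rows from the first page and one from the second.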
Pricing
This actor uses pay-per-result pricing: you are charged once per extracted row via the `item` event, so you only pay for data you actually get. Runs are not charged while monetization is not configured, and you can cap spend with `maxItems`. Check the actor's Apify Store page for the current per-item rate.
Use with AI agents & automation
The dataset is plain JSON, so it drops straight into your stack. Call this scraper from an MCP server to give AI agents live web-extraction-by-example, or wire it into Make, n8n, or Zapier to trigger runs and route rows to a CRM, database, or Google Sheets automatically. Schedule recurring runs to keep a sheet of prices, listings, or leads continuously fresh — no glue code needed.
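For automation without an SDK, Apify's REST API can start a run and return the dataset items in one request through its `run-sync-get-dataset-items` endpoint. The Python sketch below only builds that request with the standard library. The actor ID `flash-scrape~ai-web-scraper` is a hypothetical placeholder (copy the real one from the actor's Store page), and reading your Apify API token from an `APIFY_TOKEN` environment variable is likewise an assumption.

```python
import json
import os
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "flash-scrape~ai-web-scraper"  # hypothetical placeholder: use the real actor ID


def build_run_request(run_input, token):
    """Build a POST that runs the actor synchronously and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


run_input = {
    "startUrls": [{"url": "https://quotes.toscrape.com/"}],
    "examples": ["quote: process of our thinking", "author: Albert Einstein"],
    "maxItems": 0,
}
request = build_run_request(run_input, os.environ.get("APIFY_TOKEN", ""))

# Sending it would return the extracted rows as a JSON array:
# with urllib.request.urlopen(request) as resp:
#     rows = json.load(resp)
```

Make, n8n, Zapier, and MCP tooling wrap this same run-and-fetch cycle. The sketch shows the raw request they ultimately send.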
Other Flash Scrape scrapers
Need a ready-made scraper for a specific platform? Try the rest of the Flash Scrape suite:
- Google Maps Leads Scraper — Google Maps business leads
- Yelp Leads Scraper — Yelp business leads
- BBB + Yellow Pages Leads Scraper — BBB and Yellow Pages leads
- Instagram Leads Scraper — Instagram profile leads
- TikTok Leads Scraper — TikTok creator leads
- YouTube Leads Scraper — YouTube creator leads
FAQ
Is it legal to scrape websites with this?
The actor only reads publicly available web content, the same pages anyone can open in a browser. Scrape responsibly, respect each site's terms of service and robots.txt rules, and avoid collecting personal or copyrighted data you are not entitled to use.
Do I need an API key or any code?
No. There is no API key and no coding. You paste a URL and a few example values you can see on the page; the scraper learns the pattern for you.
How many results can I get?
As many repeating items as the page contains across all your startUrls. Set maxItems to cap the total, or leave it at 0 for no limit.
Can I export to CSV, Excel, or Google Sheets?
Yes. Every run produces a dataset you can download as CSV, JSON, or Excel, or push to Google Sheets via Make, n8n, or Zapier.
Why didn't my example match?
Copy an exact value from the page's visible text, not from an image, a tooltip, or a dropdown. Matching also works best when each value sits in its own element (a `<span>` price, an `<h2>` or `<a>` title).
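A quick way to check an example before a run is to test whether it occurs in the page's visible text. The Python sketch below collects text nodes with the standard library's `html.parser`, skips `<script>` and `<style>` contents, and never looks at attribute values. That last point explains why a value that exists only in a `title` tooltip or an image's `alt` text is never found. The `example_is_visible` name is hypothetical, and the sketch collapses whitespace before comparing, which the actor may or may not do.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects text nodes, ignoring script/style contents and all attribute values."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._hidden_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.chunks.append(data)


def example_is_visible(html, example):
    """True if the example (whitespace-collapsed) appears in the page's visible text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.chunks).split())
    return " ".join(example.split()) in text
```

If this returns `False` for a value you copied, you most likely copied it from an attribute or an image rather than from the text.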
Can AI agents call this scraper?
Yes. It exposes a standard Apify run interface, so MCP servers and agent frameworks can invoke it and read the structured rows directly.
Scrapes public web content. Use responsibly and within each site's terms.