SimilarWeb Scraper - Traffic, AI & WHOIS
Pricing
$5.00 / 1,000 results
Extract website traffic analytics from SimilarWeb for any domain. Rankings, monthly visits, bounce rate, traffic sources, keywords, and AI chatbot traffic data. Plus a domain-analysis mode with RDAP WHOIS and 1-to-5-word keyword density from the homepage. $5 per 1,000 results. Fast, HTTP-only API.
Rating: 5.0 (1)
Developer: Sourabh Kumar
Actor stats
- Bookmarked: 2
- Total users: 55
- Monthly active users: 38
- Last modified: 5 days ago
Pull SimilarWeb traffic, AI chatbot referrals, WHOIS, and 1‑to‑5‑word keyword density for any domain. No login, no contract, no Partner-API tier required.
$5 per 1,000 results — no per-run fee, no platform-usage fee, and you only pay when we return data. Failed or empty lookups are free.
Two modes in one actor:
- `traffic`: SimilarWeb metrics + the most complete AI-traffic block in the segment (ranked AI sources, traffic tier, top prompts, GA-verified flag).
- `domainAnalysis`: RDAP WHOIS + 1‑to‑5‑word keyword density from the homepage HTML.
Why scrape SimilarWeb instead of using the official API
SimilarWeb sells an enterprise API that gates the deepest data behind sales calls. There are also a dozen other SimilarWeb scrapers on the Apify Store. Here's where this one wins.
| Concern | Other actors | This actor |
|---|---|---|
| AI chatbot referrals | Usually missing, or top‑3 only | Ranked AI sources, traffic tier, top prompts, 3‑month trend |
| `dataSource` (GA‑verified vs estimated) | Not exposed anywhere else | Returned per row |
| WHOIS + n‑gram on the same input list | Split across separate actors | One actor, second mode |
| WAF / CloudFront / CAPTCHA blocks | Often crash the run | Detected, proxy-rotated, retried; persistent blocks emit _error and the run keeps going |
| Per‑run fee | Some charge $0.10 start or $0.50 floor | None |
We're not the cheapest. We are the most complete on AI traffic.
Key features of this SimilarWeb scraper
- SimilarWeb traffic data — global rank, country rank, category rank, monthly visits, bounce rate, pages per visit, and average visit duration.
- AI chatbot referral tracking — see how much traffic ChatGPT, Claude, Perplexity, Gemini, Grok, and Copilot drive to a domain.
- GA-verified data source flag: `dataSource: "ga-verified"` when SimilarWeb's data is backed by a Google Analytics integration, `"estimated"` otherwise. Nobody else exposes this.
- AI traffic tier and 3-month chatbot trends: `aiTrafficTier` (e.g. `"<500M"`) plus a per-chatbot share history across the last three months.
- Top AI prompts driving traffic: the actual prompts users type into chatbots that surface the target domain.
- Bulk RDAP WHOIS lookup — registrar, creation/expiration/update dates, registrant org and country, nameservers — for any TLD, no API key required.
- 1-to-5-word keyword density analysis — n-gram phrase frequency on the target homepage with English stopword filtering.
- WAF, CloudFront, and CAPTCHA detection: automatic proxy rotation and retry; persistent blocks emit a row with `_error` instead of crashing the run.
- Multi-tier proxy fallback: direct → datacenter → US residential, with your supplied `proxyConfiguration` preferred when present.
- HTTP-only architecture: no headless browser, no Playwright, low compute cost per row.
- Bulk domain processing: feed any number of domains; the run streams rows as they're scraped and respects `maxItems`.
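
The proxy fallback and "emit `_error` instead of crashing" behavior in the list above can be pictured as a loop over tiers. This is a minimal sketch under assumptions, not the actor's code: `fetch_with_fallback`, the `fetch` callable, and the `Blocked` exception are hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

# Tier order taken from the feature list; "user" is tried first when supplied.
DEFAULT_TIERS = ["direct", "datacenter", "residential"]


class Blocked(Exception):
    """Hypothetical signal that a WAF, CloudFront, or CAPTCHA page came back."""


def fetch_with_fallback(
    domain: str,
    fetch: Callable[[str, str], dict],
    user_proxy: bool = False,
) -> dict:
    """Try each proxy tier in order; return a row, or an _error row if all are blocked."""
    tiers = (["user"] if user_proxy else []) + DEFAULT_TIERS
    last_reason: Optional[str] = None
    for tier in tiers:
        try:
            row = fetch(domain, tier)
            row["_error"] = None
            return row
        except Blocked as exc:
            # Remember why this tier failed, then rotate to the next one.
            last_reason = f"{tier}: {exc}"
    # Persistent block: emit a row instead of raising, so a bulk run keeps going.
    return {"domain": domain, "_error": last_reason}
```

Returning an error row rather than raising is what lets a bulk run continue past one stubborn domain.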
What data this SimilarWeb scraper returns
| 📊 Global / country / category rank | 📈 3‑month visit history | 🎯 Bounce rate, pages/visit, duration | 🔗 Traffic source split |
| 🌍 Top 5 countries | 🔑 Top 5 keywords + CPC | 🤖 Ranked AI chatbot referrals | 💬 Top AI prompts |
| ✅ GA‑verified flag | 📝 WHOIS via RDAP | 🧮 1‑to‑5‑word keyword density | 📸 Screenshot URL + isSmall |
Traffic mode fields
| Field | Type | What it is |
|---|---|---|
| `domain`, `siteName`, `title`, `description` | string | Identity |
| `globalRank`, `countryRank`, `categoryRank`, `globalCategoryRank` | int / object | Ranking signals |
| `category` | string | SimilarWeb category |
| `totalVisits`, `estimatedMonthlyVisits` | number / object | Visit volumes |
| `bounceRate`, `pagesPerVisit`, `avgVisitDuration`, `engagementMonth` | number / string | Engagement metrics |
| `trafficSources` | object | Direct, search, social, referrals, paid, mail (fractions ≈ 1.0) |
| `topCountries` | array (max 5) | Country code, country id, share |
| `topKeywords` | array (max 5) | Keyword, volume, CPC, estimated value |
| `aiTrafficDetails` | object | `totalAiVisits`, `aiReferralShare`, `aiTrafficTier`, `topChatbots`, `chatbotTrends`, `topPrompts`, `aiPromptsStatus` |
| `aiChatbotsRanked` | array | Full ranked AI source list (6–7 entries for popular sites) |
| `dataSource` | enum | `"ga-verified"` or `"estimated"` |
| `serverNotice`, `largeScreenshot`, `snapshotDate`, `isSmall` | misc | Side data |
| `_meta`, `_error` | object / string | Forward compatibility + failure reason |
Domain analysis mode fields
| Field | Type | What it is |
|---|---|---|
| `domain` | string | Normalized input |
| `whois` | object | `registrar`, `createdDate`, `updatedDate`, `expiresDate`, `registrantOrg`, `registrantCountry`, `nameServers` |
| `whoisError` | string | `"rdap_not_found"`, `"rdap_rate_limited"`, `"rdap_unreachable"` |
| `keywordDensity` | object | `{ "1": [...], "2": [...], ..., "5": [...] }`; each entry has `ngram`, `count`, `frequency` |
| `keywordDensityError` | string | `"html_fetch_failed"`, `"empty_body"`, `"cloudflare_blocked"`, etc. |
| `htmlFetchedBytes` | int | Bytes pulled (capped at 1 MB) |
| `htmlFetchProxyTier` | string | Which tier landed the body (`"user"`, `"direct"`, `"datacenter"`, `"residential"`) |
| `_error` | string | Set when every configured subtask failed |
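
For orientation, most of the `whois` fields above map fairly directly onto a standard RDAP domain response (RFC 9083). The sketch below is an assumption about how such a mapping could look, not the actor's implementation. `parse_rdap` is a hypothetical helper, and the registrant fields are left out because they sit in jCard data that registries often redact.

```python
from typing import Any, Dict, Optional


def parse_rdap(rdap: Dict[str, Any]) -> Dict[str, Any]:
    """Map an RDAP domain object onto the whois field names used in the table above."""
    # RDAP lists lifecycle dates as events keyed by eventAction.
    events = {e.get("eventAction"): e.get("eventDate") for e in rdap.get("events", [])}

    registrar: Optional[str] = None
    for entity in rdap.get("entities", []):
        if "registrar" in entity.get("roles", []):
            # vcardArray has the shape ["vcard", [[name, params, type, value], ...]].
            for prop in entity.get("vcardArray", ["vcard", []])[1]:
                if prop[0] == "fn":
                    registrar = prop[3]

    return {
        "registrar": registrar,
        "createdDate": events.get("registration"),
        "updatedDate": events.get("last changed"),
        "expiresDate": events.get("expiration"),
        "nameServers": [
            ns["ldhName"].lower() for ns in rdap.get("nameservers", []) if "ldhName" in ns
        ],
    }
```

Missing events simply come back as `None`, which mirrors the `null` values in the sample output further down.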
Track AI chatbot referrals — ChatGPT, Claude, Perplexity, Gemini
Traffic mode returns a per-domain breakdown of which AI chatbots refer traffic. Two fields cover it:
- `aiChatbotsRanked` is a compact ranked list of every chatbot SimilarWeb sees driving traffic to the domain (typically 6–7 sources for popular sites: chatgpt.com, claude.ai, perplexity.ai, gemini.google.com, grok.com, copilot.microsoft.com).
- `aiTrafficDetails` is the verbose block: total AI visits, AI referral share, AI traffic tier (e.g. `"<500M"`), top prompts users type, and a 3-month per-chatbot share history.
Set includeAiBreakdown: false to drop the verbose block but keep aiChatbotsRanked. Set includeIcons: true to add chatbot icon URLs.
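
As a rough illustration of consuming these two fields together, the sketch below joins the ranked list with the shares from the verbose block. `ai_sources` is a hypothetical helper, and it assumes `aiTrafficDetails` may be absent or `null` when `includeAiBreakdown` is `false`.

```python
from typing import Any, Dict, List, Optional, Tuple


def ai_sources(row: Dict[str, Any]) -> List[Tuple[int, str, Optional[float]]]:
    """Return (rank, name, share) per chatbot; share is None when not reported."""
    # Assumption: the verbose block can be missing when includeAiBreakdown is false.
    details = row.get("aiTrafficDetails") or {}
    shares = {c["name"]: c["share"] for c in details.get("topChatbots", [])}
    ranked = row.get("aiChatbotsRanked") or []
    joined = [(c["rank"], c["name"], shares.get(c["name"])) for c in ranked]
    return sorted(joined, key=lambda item: item[0])
```

Chatbots that appear in the ranked list but not in `topChatbots` keep their rank and get a share of `None`.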
Bulk WHOIS lookup and 1-to-5-word keyword density analysis
domainAnalysis mode covers two on-page SEO use cases against the same domain list you feed traffic mode.
- WHOIS via RDAP. The actor queries `rdap.org`, which auto-routes by TLD: it works for `.com`, `.io`, `.net`, country TLDs, and most others without an API key. Returns registrar, registration/update/expiration dates, registrant org and country, and nameservers.
- Keyword density. The actor fetches the target homepage (capped at 1 MB), strips scripts/styles via cheerio, tokenizes with a 25-word English stopword filter, and returns the top N n-grams per size. Defaults: sizes `[1,2,3,4,5]`, top 50 per size. Cloudflare-protected sites (e.g. `wsj.com`) emit `keywordDensityError: "cloudflare_blocked"` instead of HTML; the run continues.
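
The keyword-density step described above can be approximated in a few lines. This is a sketch under stated assumptions rather than the actor's code: the stopword set is an illustrative stand-in for the 25-word list, HTML stripping is skipped (the input is already plain text), and `frequency` is assumed to be the count divided by the number of filtered tokens.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Illustrative stopwords only; the actor's actual 25-word list is not shown in this document.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def keyword_density(
    text: str, sizes: Iterable[int] = (1, 2, 3, 4, 5), top_n: int = 50
) -> Dict[str, List[dict]]:
    """Count n-grams over stopword-filtered tokens and return the top_n per size."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    total = len(tokens) or 1  # avoid division by zero on empty pages
    result: Dict[str, List[dict]] = {}
    for n in sizes:
        # Slide a window of n tokens across the filtered token list.
        grams = Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        result[str(n)] = [
            {"ngram": gram, "count": count, "frequency": round(count / total, 4)}
            for gram, count in grams.most_common(top_n)
        ]
    return result
```

The output keys are strings (`"1"`, `"2"`, ...) to match the shape of the `keywordDensity` object in the field table.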
How to scrape SimilarWeb: step by step
- Create a free Apify account. 30 seconds, no card.
- Open the SimilarWeb Scraper in the Apify Console.
- Paste your domains. `https://`, `www.`, and trailing slashes are stripped for you.
- Click Start. A 3‑domain run finishes in under a minute; bulk runs scale roughly linearly.
- Export the dataset as JSON, CSV, or Excel — or fetch via API.
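
If you want to de-duplicate a domain list before pasting it in, the normalization mentioned in the steps above can be approximated locally. `normalize_domain` is a hypothetical helper; lowercasing and handling `http://` are extra assumptions beyond what the steps state.

```python
import re


def normalize_domain(raw: str) -> str:
    """Strip the scheme, a leading www., and trailing slashes from a pasted domain."""
    s = raw.strip().lower()           # lowercasing is an assumption, not documented behavior
    s = re.sub(r"^https?://", "", s)  # drop http:// or https://
    s = s.rstrip("/")                 # drop trailing slashes
    if s.startswith("www."):
        s = s[4:]
    return s


def unique_domains(raw_domains: list) -> list:
    """Normalize and de-duplicate while keeping first-seen order."""
    return list(dict.fromkeys(normalize_domain(d) for d in raw_domains))
```

De-duplicating first avoids paying for the same domain twice in one run.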
How much does it cost to scrape SimilarWeb?
Pay-per-result. $5 per 1,000 results ($0.005/result) — and that's all you pay. No per-run fee, no platform-usage fee, no charge for failed or empty results. We only bill when we successfully return data for a domain. Both modes bill the same flat rate.
- Apify Free plan ($5/month credit): around 1,000 results/month.
- Apify Starter plan ($29/month): about 5,800 results/month.
The actor is HTTP-only — no headless browser — and platform compute is on us, not on your bill.
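
The plan figures above follow directly from the flat rate. A small sketch of the arithmetic, using `Decimal` to avoid float rounding (the function names are illustrative):

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.005")  # $5 per 1,000 results


def results_for_budget(budget_usd: str) -> int:
    """Number of billable results a budget covers at the flat rate."""
    return int(Decimal(budget_usd) // PRICE_PER_RESULT)


def cost_for_results(n_results: int) -> Decimal:
    """Cost in USD for a number of successfully returned rows."""
    return Decimal(n_results) * PRICE_PER_RESULT
```

For example, `results_for_budget("29")` gives 5800, matching the Starter plan estimate, and `cost_for_results(250)` comes to $1.25.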
Input parameters for the SimilarWeb scraper
Both modes share domains + maxItems + proxyConfiguration. Mode-specific flags toggle the rest.

    {
      "mode": "traffic",
      "domains": ["google.com", "amazon.com", "github.com"],
      "maxItems": 100,
      "includeAiBreakdown": true,
      "includeIcons": false,
      "proxyConfiguration": { "useApifyProxy": true }
    }

For domainAnalysis:

    {
      "mode": "domainAnalysis",
      "domains": ["github.com", "stripe.com"],
      "includeWhois": true,
      "includeKeywordDensity": true,
      "keywordDensityNGrams": [1, 2, 3, 4, 5],
      "keywordDensityTopN": 50
    }

| Field | Type | Default | Note |
|---|---|---|---|
| `mode` | enum | `"traffic"` | `"traffic"` or `"domainAnalysis"`. Strict: typos fail at the gateway. |
| `domains` | string[] | prefilled sample | Each item ≤253 chars. Schemes and `www.` stripped automatically. |
| `maxItems` | int | none | Caps the number of rows processed. |
| `includeAiBreakdown` | bool | `true` | Traffic mode. Off keeps `aiChatbotsRanked` but drops the verbose `aiTrafficDetails` block. |
| `includeIcons` | bool | `false` | Traffic mode. Adds chatbot icon URLs. |
| `includeWhois` | bool | `true` | DomainAnalysis mode. RDAP lookup via rdap.org. |
| `includeKeywordDensity` | bool | `true` | DomainAnalysis mode. Fetches the homepage (≤1 MB) and tokenizes. |
| `keywordDensityNGrams` | int[] | `[1,2,3,4,5]` | Sizes to compute, each between 1 and 8. |
| `keywordDensityTopN` | int | `50` | Top N n‑grams returned per size. |
| `proxyConfiguration` | object | Apify Proxy | The actor falls back through datacenter and US residential when a tier gets blocked. Traffic mode also tries a direct connection first; domain‑analysis HTML fetch skips direct because most target sites bot‑detect it. |
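
Since a bad `mode` fails at the gateway, it can save a round trip to check a payload locally first. The sketch below encodes only the constraints listed in the table above; `validate_input` is a hypothetical helper and not part of the actor or the Apify SDK.

```python
from typing import Any, Dict, List

ALLOWED_MODES = {"traffic", "domainAnalysis"}


def validate_input(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload fits the table above."""
    problems: List[str] = []
    if payload.get("mode", "traffic") not in ALLOWED_MODES:
        problems.append("mode must be 'traffic' or 'domainAnalysis'")

    domains = payload.get("domains", [])
    if not isinstance(domains, list):
        problems.append("domains must be a list of strings")
        domains = []
    for d in domains:
        if not isinstance(d, str) or len(d) > 253:
            problems.append(f"invalid domain entry: {repr(d)[:40]}")

    for n in payload.get("keywordDensityNGrams", [1, 2, 3, 4, 5]):
        if not isinstance(n, int) or not 1 <= n <= 8:
            problems.append(f"n-gram size out of range 1..8: {n!r}")
    return problems
```

Collecting every problem in one pass, rather than raising on the first, makes it easier to fix a long input in a single edit.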
Output examples — traffic, AI referrals, WHOIS, and keyword density
You can download the dataset in JSON, HTML, CSV, or Excel — or stream it through the Apify API.
Traffic mode — sample row (google.com)

    {
      "domain": "google.com",
      "siteName": "google.com",
      "title": "Publishing Partner Program",
      "globalRank": 1,
      "countryRank": { "country": "US", "countryId": 840, "rank": 1 },
      "categoryRank": { "rank": 1, "category": "Computers_Electronics_and_Technology/Search_Engines" },
      "category": "computers_electronics_and_technology/search_engines",
      "totalVisits": 86850607710,
      "bounceRate": 0.282,
      "pagesPerVisit": 8.71,
      "avgVisitDuration": 614.32,
      "engagementMonth": "2026-03",
      "trafficSources": {
        "direct": 0.925, "search": 0.008, "social": 0.029,
        "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008
      },
      "topCountries": [
        { "countryCode": "US", "countryId": 840, "share": 0.244 },
        { "countryCode": "JP", "countryId": 392, "share": 0.056 }
      ],
      "topKeywords": [
        { "keyword": "gemini", "volume": 123107710, "cpc": 0.24, "estimatedValue": 185450780 }
      ],
      "aiTrafficDetails": {
        "totalAiVisits": 350694197,
        "aiReferralShare": 0.0041,
        "aiTrafficTier": "<500M",
        "topChatbots": [
          { "name": "chatgpt.com", "share": 51.11 },
          { "name": "claude.ai", "share": 35.76 },
          { "name": "perplexity.ai", "share": 6.75 }
        ],
        "topPrompts": [
          "What is the most popular search engine?",
          "How can I find information online?"
        ],
        "aiPromptsStatus": { "code": 0, "error": null }
      },
      "aiChatbotsRanked": [
        { "name": "chatgpt.com", "rank": 1 },
        { "name": "claude.ai", "rank": 2 },
        { "name": "perplexity.ai", "rank": 3 }
      ],
      "dataSource": "estimated",
      "snapshotDate": "2026-03-01T00:00:00+00:00",
      "isSmall": false,
      "_meta": { "schemaVersion": 1, "policy": 1 },
      "_error": null
    }

Domain analysis mode — sample row (github.com)

    {
      "domain": "github.com",
      "whois": {
        "registrar": "MarkMonitor Inc.",
        "createdDate": "2007-10-09T18:20:50Z",
        "updatedDate": "2024-09-07T09:16:32Z",
        "expiresDate": "2026-10-09T18:20:50Z",
        "registrantOrg": null,
        "registrantCountry": null,
        "nameServers": ["dns1.p08.nsone.net", "ns-421.awsdns-52.com"]
      },
      "whoisError": null,
      "keywordDensity": {
        "1": [
          { "ngram": "github", "count": 53, "frequency": 0.0525 },
          { "ngram": "code", "count": 25, "frequency": 0.0248 }
        ],
        "2": [
          { "ngram": "explore github", "count": 10, "frequency": 0.0099 },
          { "ngram": "github copilot", "count": 8, "frequency": 0.0079 }
        ],
        "3": [
          { "ngram": "github advanced security", "count": 3, "frequency": 0.003 }
        ]
      },
      "keywordDensityError": null,
      "htmlFetchedBytes": 566856,
      "htmlFetchProxyTier": "user",
      "_error": null
    }

Behavior change vs v0.1: aiTrafficDetails.totalAiVisits and aiReferralShare now return null (not 0) when SimilarWeb has no AI data for the domain. Update any === 0 checks to handle null.
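
One way to handle the `null` case explicitly in downstream code is sketched below; the helper names are illustrative, and the branch for `0` is kept only for rows produced by v0.1.

```python
from typing import Any, Dict, Optional


def ai_visits(row: Dict[str, Any]) -> Optional[int]:
    """Return totalAiVisits, preserving None for 'no AI data' instead of coercing to 0."""
    details = row.get("aiTrafficDetails") or {}
    return details.get("totalAiVisits")


def describe_ai_visits(row: Dict[str, Any]) -> str:
    visits = ai_visits(row)
    if visits is None:  # SimilarWeb has no AI data for this domain
        return "no AI data"
    if visits == 0:     # only expected in rows from v0.1
        return "zero AI visits"
    return f"{visits:,} AI visits"
```

Checking `is None` before any numeric comparison keeps "unknown" and "zero" from being merged in reports.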
Use cases — SEO audit, lead generation, AI traffic monitoring
- Competitive analysis and SEO audit — compare global rank, country rank, top keywords, and traffic sources across competitor domains.
- AI traffic monitoring — track how much referral traffic ChatGPT, Claude, Perplexity, Gemini, and other chatbots send to your site or your competitors.
- Lead generation and sales intelligence — enrich CRM records with traffic volume, top keywords, and WHOIS contact metadata.
- Domain investment research — pair WHOIS expiration dates with traffic trends to spot dropping or undervalued domains.
- Marketing budget allocation — break down traffic source share (direct, search, social, paid, referrals, mail) to decide where to spend.
- On-page content audit — use 1-to-5-word keyword density to check stuffing, content relevance, and phrase frequency on any homepage.
- Brand monitoring — see which AI prompts surface a domain and whether share is growing or shrinking month over month.
FAQ — pricing, legality, integrations, API access
How much does this SimilarWeb scraper cost?
Pay-per-result. You pay $5 for 1,000 results ($0.005/result) — and only when we actually return data. No per-run fee, no platform-usage fee, no charge for failed or empty lookups. The Apify Free plan gives you $5 in monthly credits — about 1,000 results. The $29/month Starter plan covers about 5,800 results.
No subscription lock-in. Pause whenever.
Is it legal to scrape SimilarWeb?
Scraping publicly accessible pages is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches public endpoints, but how you use the output is on you.
Apify's full breakdown: Is web scraping legal?
Can I integrate the SimilarWeb scraper with other tools?
Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.
Full list: Apify integrations.
Can I run the SimilarWeb scraper through the Apify API?
Yes. Every run is available via the Apify REST API:

    curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs?token=APIFY_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"mode":"traffic","domains":["google.com","amazon.com"]}'

Docs: Apify API reference.
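
The same call can be made from Python with only the standard library. This sketch mirrors the curl example (same endpoint, same `token` query parameter). `build_run_request` is an illustrative helper that builds the request without sending it, so you can inspect it first.

```python
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2/acts"
ACTOR_ID = "sourabhbgp~similarweb-scraper"  # taken from the curl example


def build_run_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts an actor run."""
    url = f"{API_BASE}/{ACTOR_ID}/runs?{urlencode({'token': token})}"
    return urllib.request.Request(
        url=url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To start the run, pass the result to `urllib.request.urlopen(...)` and read the JSON response body.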
Can I use this SimilarWeb scraper through an MCP Server?
Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call this scraper. Setup: Apify MCP docs.
Your feedback
Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.