SimilarWeb Scraper - Traffic, AI & WHOIS

Pricing

$5.00 / 1,000 results

Extract website traffic analytics from SimilarWeb for any domain. Rankings, monthly visits, bounce rate, traffic sources, keywords, and AI chatbot traffic data. Plus a domain-analysis mode with RDAP WHOIS and 1-to-5-word keyword density from the homepage. $5 per 1,000 results. Fast, HTTP-only API.

Rating: 5.0 (1)

Developer: Sourabh Kumar (maintained by Community)

Actor stats: 2 bookmarked, 55 total users, 38 monthly active users, last modified 5 days ago

Pull SimilarWeb traffic, AI chatbot referrals, WHOIS, and 1‑to‑5‑word keyword density for any domain. No login, no contract, no Partner-API tier required.

$5 per 1,000 results — no per-run fee, no platform-usage fee, and you only pay when we return data. Failed or empty lookups are free.

Two modes in one actor:

  • traffic — SimilarWeb metrics + the most complete AI-traffic block in the segment (ranked AI sources, traffic tier, top prompts, GA-verified flag).
  • domainAnalysis — RDAP WHOIS + 1‑to‑5‑word keyword density from the homepage HTML.

Why scrape SimilarWeb instead of using the official API

SimilarWeb sells an enterprise API that gates the deepest data behind sales calls. There are also a dozen other SimilarWeb scrapers on the Apify Store. Here's where this one wins.

| Concern | Other actors | This actor |
|---|---|---|
| AI chatbot referrals | Usually missing, or top‑3 only | Ranked AI sources, traffic tier, top prompts, 3‑month trend |
| dataSource (GA‑verified vs estimated) | Not exposed anywhere else | Returned per row |
| WHOIS + n‑gram on the same input list | Split across separate actors | One actor, second mode |
| WAF / CloudFront / CAPTCHA blocks | Often crash the run | Detected, proxy-rotated, retried; persistent blocks emit _error and the run keeps going |
| Per‑run fee | Some charge $0.10 start or $0.50 floor | None |

We're not the cheapest. We are the most complete on AI traffic.

Key features of this SimilarWeb scraper

  • SimilarWeb traffic data — global rank, country rank, category rank, monthly visits, bounce rate, pages per visit, and average visit duration.
  • AI chatbot referral tracking — see how much traffic ChatGPT, Claude, Perplexity, Gemini, Grok, and Copilot drive to a domain.
  • GA-verified data source flag: dataSource is "ga-verified" when SimilarWeb's data is backed by a Google Analytics integration, "estimated" otherwise. Nobody else exposes this.
  • AI traffic tier and 3-month chatbot trends: aiTrafficTier (e.g. "<500M") plus a per-chatbot share history across the last three months.
  • Top AI prompts driving traffic — the actual prompts users type into chatbots that surface the target domain.
  • Bulk RDAP WHOIS lookup — registrar, creation/expiration/update dates, registrant org and country, nameservers — for any TLD, no API key required.
  • 1-to-5-word keyword density analysis — n-gram phrase frequency on the target homepage with English stopword filtering.
  • WAF, CloudFront, and CAPTCHA detection — automatic proxy rotation and retry; persistent blocks emit a row with _error instead of crashing the run.
  • Multi-tier proxy fallback — direct → datacenter → US residential, with your supplied proxyConfiguration preferred when present.
  • HTTP-only architecture — no headless browser, no Playwright, low compute cost per row.
  • Bulk domain processing — feed any number of domains; the run streams rows as they're scraped and respects maxItems.

What data this SimilarWeb scraper returns

  • 📊 Global / country / category rank
  • 📈 3‑month visit history
  • 🎯 Bounce rate, pages/visit, duration
  • 🔗 Traffic source split
  • 🌍 Top 5 countries
  • 🔑 Top 5 keywords + CPC
  • 🤖 Ranked AI chatbot referrals
  • 💬 Top AI prompts
  • ✅ GA‑verified flag
  • 📝 WHOIS via RDAP
  • 🧮 1‑to‑5‑word keyword density
  • 📸 Screenshot URL + isSmall

Traffic mode fields

| Field | Type | What it is |
|---|---|---|
| domain, siteName, title, description | string | Identity |
| globalRank, countryRank, categoryRank, globalCategoryRank | int / object | Ranking signals |
| category | string | SimilarWeb category |
| totalVisits, estimatedMonthlyVisits | number / object | Visit volumes |
| bounceRate, pagesPerVisit, avgVisitDuration, engagementMonth | number / string | Engagement metrics |
| trafficSources | object | Direct, search, social, referrals, paid, mail (fractions ≈ 1.0) |
| topCountries | array (max 5) | Country code, country id, share |
| topKeywords | array (max 5) | Keyword, volume, CPC, estimated value |
| aiTrafficDetails | object | totalAiVisits, aiReferralShare, aiTrafficTier, topChatbots, chatbotTrends, topPrompts, aiPromptsStatus |
| aiChatbotsRanked | array | Full ranked AI source list (6–7 entries for popular sites) |
| dataSource | enum | "ga-verified" or "estimated" |
| serverNotice, largeScreenshot, snapshotDate, isSmall | misc | Side data |
| _meta, _error | object / string | Forward compatibility + failure reason |

Domain analysis mode fields

| Field | Type | What it is |
|---|---|---|
| domain | string | Normalized input |
| whois | object | registrar, createdDate, updatedDate, expiresDate, registrantOrg, registrantCountry, nameServers |
| whoisError | string | "rdap_not_found", "rdap_rate_limited", "rdap_unreachable" |
| keywordDensity | object | { "1": [...], "2": [...], ..., "5": [...] }; each entry has ngram, count, frequency |
| keywordDensityError | string | "html_fetch_failed", "empty_body", "cloudflare_blocked", etc. |
| htmlFetchedBytes | int | Bytes pulled (capped at 1 MB) |
| htmlFetchProxyTier | string | Which tier landed the body ("user", "direct", "datacenter", "residential") |
| _error | string | Set when every configured subtask failed |
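
For readers curious how an RDAP response could map onto the whois object above, here is a rough Python sketch. It relies on the standard RDAP JSON layout (events, entities with jCard vcardArray, nameservers); the function names are hypothetical, and the actor's real mapping, especially for registrantCountry, may differ.

```python
from typing import Any, Optional


def _event_date(doc: dict, action: str) -> Optional[str]:
    """Return the eventDate of the first RDAP event with the given eventAction."""
    for event in doc.get("events", []):
        if event.get("eventAction") == action:
            return event.get("eventDate")
    return None


def _vcard_value(entity: dict, name: str) -> Any:
    """Read one jCard property value; properties look like [name, params, type, value]."""
    vcard = entity.get("vcardArray") or []
    props = vcard[1] if len(vcard) > 1 else []
    for prop in props:
        if len(prop) >= 4 and prop[0] == name:
            return prop[3]
    return None


def parse_rdap(doc: dict) -> dict:
    """Map an RDAP domain response to the whois field names listed above (sketch)."""
    registrar = registrant_org = registrant_country = None
    for entity in doc.get("entities", []):
        roles = entity.get("roles", [])
        if "registrar" in roles and registrar is None:
            registrar = _vcard_value(entity, "fn")
        if "registrant" in roles:
            registrant_org = _vcard_value(entity, "org")
            adr = _vcard_value(entity, "adr")
            # Assumption: country is the last component of the structured address.
            if isinstance(adr, list) and adr and adr[-1]:
                registrant_country = adr[-1]
    return {
        "registrar": registrar,
        "createdDate": _event_date(doc, "registration"),
        "updatedDate": _event_date(doc, "last changed"),
        "expiresDate": _event_date(doc, "expiration"),
        "registrantOrg": registrant_org,
        "registrantCountry": registrant_country,
        "nameServers": [
            ns["ldhName"].lower() for ns in doc.get("nameservers", []) if ns.get("ldhName")
        ],
    }
```

Registrant details are often redacted by registries, which is why registrantOrg and registrantCountry can legitimately come back as null.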

Track AI chatbot referrals — ChatGPT, Claude, Perplexity, Gemini

Traffic mode returns a per-domain breakdown of which AI chatbots refer traffic. Two fields cover it:

  • aiChatbotsRanked is a compact ranked list of every chatbot SimilarWeb sees driving traffic to the domain (typically 6–7 sources for popular sites: chatgpt.com, claude.ai, perplexity.ai, gemini.google.com, grok.com, copilot.microsoft.com).
  • aiTrafficDetails is the verbose block: total AI visits, AI referral share, AI traffic tier (e.g. "<500M"), top prompts users type, and a 3-month per-chatbot share history.

Set includeAiBreakdown: false to drop the verbose block but keep aiChatbotsRanked. Set includeIcons: true to add chatbot icon URLs.
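
As an illustration of consuming these two fields, the Python sketch below flattens one traffic-mode row into a small summary. The helper name is hypothetical; the field names come from the field table and sample row in this README, and the code tolerates a missing aiTrafficDetails block (for example when includeAiBreakdown is false).

```python
def summarize_ai_referrals(row: dict) -> dict:
    """Condense the AI referral fields of one traffic-mode row (illustrative helper)."""
    # Sort defensively by rank in case rows are ever delivered unordered.
    ranked = sorted(row.get("aiChatbotsRanked") or [], key=lambda c: c["rank"])
    # The verbose block may be absent or null.
    details = row.get("aiTrafficDetails") or {}
    return {
        "domain": row.get("domain"),
        "rankedSources": [c["name"] for c in ranked],
        "tier": details.get("aiTrafficTier"),
        "totalAiVisits": details.get("totalAiVisits"),
        "topPrompts": details.get("topPrompts") or [],
    }
```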

Bulk WHOIS lookup and 1-to-5-word keyword density analysis

domainAnalysis mode covers two on-page SEO use cases against the same domain list you feed traffic mode.

  • WHOIS via RDAP. The actor queries rdap.org, which auto-routes by TLD — works for .com, .io, .net, country TLDs, and most others without an API key. Returns registrar, registration/update/expiration dates, registrant org and country, and nameservers.
  • Keyword density. The actor fetches the target homepage (capped at 1 MB), strips scripts/styles via cheerio, tokenizes with a 25-word English stopword filter, and returns the top N n-grams per size. Defaults: sizes [1,2,3,4,5], top 50 per size. Cloudflare-protected sites (e.g. wsj.com) emit keywordDensityError: "cloudflare_blocked" instead of HTML — the run continues.
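
The keyword-density step can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the actor's implementation: it uses the standard-library HTML parser instead of cheerio, a short illustrative stopword set rather than the actor's 25-word list, and it assumes frequency means count divided by the number of filtered tokens.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Illustrative subset only; the actor's actual 25-word list is not published here.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def keyword_density(html: str, sizes=(1, 2, 3, 4, 5), top_n: int = 50) -> dict:
    """Return {"1": [...], "2": [...], ...} with ngram, count, and frequency entries."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    tokens = [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]
    total = len(tokens)
    result = {}
    for n in sizes:
        # Slide a window of n tokens across the filtered token list.
        grams = Counter(" ".join(tokens[i:i + n]) for i in range(total - n + 1))
        result[str(n)] = [
            {"ngram": gram, "count": count, "frequency": round(count / total, 4)}
            for gram, count in grams.most_common(top_n)
        ]
    return result
```

Because stopwords are dropped before windowing, a phrase such as "code and review" would be counted as the 2-gram "code review"; whether the actor behaves the same way is not stated.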

How to scrape SimilarWeb: step by step

  1. Create a free Apify account. 30 seconds, no card.
  2. Open the SimilarWeb Scraper in the Apify Console.
  3. Paste your domains. https://, www., and trailing slashes are stripped for you.
  4. Click Start. A 3‑domain run finishes in under a minute; bulk runs scale roughly linearly.
  5. Export the dataset as JSON, CSV, or Excel — or fetch via API.
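
If you want to pre-clean a domain list before pasting it, the stripping described in step 3 can be mimicked with a small Python helper. The function name is hypothetical and the lowercasing is an added assumption; the actor performs its own normalization regardless.

```python
import re


def normalize_domain(raw: str) -> str:
    """Strip scheme, leading www., and trailing slashes from one input string."""
    value = raw.strip().lower()  # lowercasing is an assumption, not documented behavior
    value = re.sub(r"^https?://", "", value)  # drop http:// or https://
    if value.startswith("www."):
        value = value[len("www."):]
    return value.rstrip("/")


domains = [normalize_domain(d) for d in ["https://www.github.com/", "stripe.com"]]
```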

How much does it cost to scrape SimilarWeb?

Pay-per-result. $5 per 1,000 results ($0.005/result) — and that's all you pay. No per-run fee, no platform-usage fee, no charge for failed or empty results. We only bill when we successfully return data for a domain. Both modes bill the same flat rate.

  • Apify Free plan ($5/month credit): around 1,000 results/month.
  • Apify Starter plan ($29/month): about 5,800 results/month.

The actor is HTTP-only — no headless browser — and platform compute is on us, not on your bill.
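
The arithmetic behind those plan figures, as a small Python sketch. The helper names are made up for illustration; the only input taken from this page is the $5.00 per 1,000 results rate, and Decimal is used to avoid floating-point rounding surprises.

```python
from decimal import Decimal

PRICE_PER_1000_USD = Decimal("5.00")  # flat rate quoted on this page


def estimate_cost(successful_results: int) -> Decimal:
    """Cost in USD for a number of successfully returned rows."""
    return (PRICE_PER_1000_USD * successful_results / 1000).quantize(Decimal("0.001"))


def results_for_budget(budget_usd) -> int:
    """How many results a given USD budget covers at the flat rate."""
    return int(Decimal(str(budget_usd)) * 1000 // PRICE_PER_1000_USD)
```

For example, `results_for_budget(5)` gives 1000 and `results_for_budget(29)` gives 5800, matching the plan estimates above.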

Input parameters for the SimilarWeb scraper

Both modes share domains + maxItems + proxyConfiguration. Mode-specific flags toggle the rest.

{
  "mode": "traffic",
  "domains": ["google.com", "amazon.com", "github.com"],
  "maxItems": 100,
  "includeAiBreakdown": true,
  "includeIcons": false,
  "proxyConfiguration": { "useApifyProxy": true }
}

For domainAnalysis:

{
  "mode": "domainAnalysis",
  "domains": ["github.com", "stripe.com"],
  "includeWhois": true,
  "includeKeywordDensity": true,
  "keywordDensityNGrams": [1, 2, 3, 4, 5],
  "keywordDensityTopN": 50
}

| Field | Type | Default | Note |
|---|---|---|---|
| mode | enum | "traffic" | "traffic" or "domainAnalysis". Strict: typos fail at the gateway. |
| domains | string[] | prefilled sample | Each item ≤253 chars. Schemes and www. stripped automatically. |
| maxItems | int | none | Caps the number of rows processed. |
| includeAiBreakdown | bool | true | Traffic mode. Off keeps aiChatbotsRanked but drops the verbose aiTrafficDetails block. |
| includeIcons | bool | false | Traffic mode. Adds chatbot icon URLs. |
| includeWhois | bool | true | DomainAnalysis mode. RDAP lookup via rdap.org. |
| includeKeywordDensity | bool | true | DomainAnalysis mode. Fetches the homepage (≤1MB) and tokenizes. |
| keywordDensityNGrams | int[] | [1,2,3,4,5] | Sizes to compute, each between 1 and 8. |
| keywordDensityTopN | int | 50 | Top N n‑grams returned per size. |
| proxyConfiguration | object | Apify Proxy | The actor falls back through datacenter and US residential when a tier gets blocked. Traffic mode also tries a direct connection first; domain‑analysis HTML fetch skips direct because most target sites bot‑detect it. |
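
Since a mistyped mode fails at the gateway, it can help to validate input locally before starting a run. The Python sketch below builds an input object using the documented field names and limits; the function itself and its error messages are hypothetical, and it covers only a subset of the parameters.

```python
ALLOWED_MODES = {"traffic", "domainAnalysis"}


def build_input(mode, domains, max_items=None, ngram_sizes=(1, 2, 3, 4, 5), top_n=50):
    """Assemble an actor input dict, checking the documented constraints first."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    if not domains:
        raise ValueError("domains must contain at least one entry")
    for domain in domains:
        if len(domain) > 253:
            raise ValueError(f"domain longer than 253 characters: {domain[:30]}...")
    payload = {"mode": mode, "domains": list(domains)}
    if max_items is not None:
        payload["maxItems"] = max_items
    if mode == "domainAnalysis":
        if any(not 1 <= n <= 8 for n in ngram_sizes):
            raise ValueError("each n-gram size must be between 1 and 8")
        payload.update(
            includeWhois=True,
            includeKeywordDensity=True,
            keywordDensityNGrams=list(ngram_sizes),
            keywordDensityTopN=top_n,
        )
    return payload
```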

Output examples — traffic, AI referrals, WHOIS, and keyword density

You can download the dataset in JSON, HTML, CSV, or Excel — or stream it through the Apify API.

Traffic mode — sample row (google.com)

{
  "domain": "google.com",
  "siteName": "google.com",
  "title": "Publishing Partner Program",
  "globalRank": 1,
  "countryRank": { "country": "US", "countryId": 840, "rank": 1 },
  "categoryRank": { "rank": 1, "category": "Computers_Electronics_and_Technology/Search_Engines" },
  "category": "computers_electronics_and_technology/search_engines",
  "totalVisits": 86850607710,
  "bounceRate": 0.282,
  "pagesPerVisit": 8.71,
  "avgVisitDuration": 614.32,
  "engagementMonth": "2026-03",
  "trafficSources": {
    "direct": 0.925, "search": 0.008, "social": 0.029,
    "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008
  },
  "topCountries": [
    { "countryCode": "US", "countryId": 840, "share": 0.244 },
    { "countryCode": "JP", "countryId": 392, "share": 0.056 }
  ],
  "topKeywords": [
    { "keyword": "gemini", "volume": 123107710, "cpc": 0.24, "estimatedValue": 185450780 }
  ],
  "aiTrafficDetails": {
    "totalAiVisits": 350694197,
    "aiReferralShare": 0.0041,
    "aiTrafficTier": "<500M",
    "topChatbots": [
      { "name": "chatgpt.com", "share": 51.11 },
      { "name": "claude.ai", "share": 35.76 },
      { "name": "perplexity.ai", "share": 6.75 }
    ],
    "topPrompts": [
      "What is the most popular search engine?",
      "How can I find information online?"
    ],
    "aiPromptsStatus": { "code": 0, "error": null }
  },
  "aiChatbotsRanked": [
    { "name": "chatgpt.com", "rank": 1 },
    { "name": "claude.ai", "rank": 2 },
    { "name": "perplexity.ai", "rank": 3 }
  ],
  "dataSource": "estimated",
  "snapshotDate": "2026-03-01T00:00:00+00:00",
  "isSmall": false,
  "_meta": { "schemaVersion": 1, "policy": 1 },
  "_error": null
}

Domain analysis mode — sample row (github.com)

{
  "domain": "github.com",
  "whois": {
    "registrar": "MarkMonitor Inc.",
    "createdDate": "2007-10-09T18:20:50Z",
    "updatedDate": "2024-09-07T09:16:32Z",
    "expiresDate": "2026-10-09T18:20:50Z",
    "registrantOrg": null,
    "registrantCountry": null,
    "nameServers": ["dns1.p08.nsone.net", "ns-421.awsdns-52.com"]
  },
  "whoisError": null,
  "keywordDensity": {
    "1": [
      { "ngram": "github", "count": 53, "frequency": 0.0525 },
      { "ngram": "code", "count": 25, "frequency": 0.0248 }
    ],
    "2": [
      { "ngram": "explore github", "count": 10, "frequency": 0.0099 },
      { "ngram": "github copilot", "count": 8, "frequency": 0.0079 }
    ],
    "3": [
      { "ngram": "github advanced security", "count": 3, "frequency": 0.003 }
    ]
  },
  "keywordDensityError": null,
  "htmlFetchedBytes": 566856,
  "htmlFetchProxyTier": "user",
  "_error": null
}

Behavior change vs v0.1: aiTrafficDetails.totalAiVisits and aiReferralShare now return null (not 0) when SimilarWeb has no AI data for the domain. Update any === 0 checks to handle null.
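
A null-safe check might look like the Python sketch below (the same idea applies to a JavaScript `=== 0` comparison). The helper name and the three labels are illustrative only.

```python
def ai_visits_state(row: dict) -> str:
    """Classify a traffic-mode row by its totalAiVisits value, treating null separately."""
    details = row.get("aiTrafficDetails") or {}
    visits = details.get("totalAiVisits")
    if visits is None:
        return "no-data"  # SimilarWeb has no AI data for this domain
    if visits == 0:
        return "zero"  # an explicit zero, kept distinct from missing data
    return "has-traffic"
```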

Use cases — SEO audit, lead generation, AI traffic monitoring

  • Competitive analysis and SEO audit — compare global rank, country rank, top keywords, and traffic sources across competitor domains.
  • AI traffic monitoring — track how much referral traffic ChatGPT, Claude, Perplexity, Gemini, and other chatbots send to your site or your competitors.
  • Lead generation and sales intelligence — enrich CRM records with traffic volume, top keywords, and WHOIS contact metadata.
  • Domain investment research — pair WHOIS expiration dates with traffic trends to spot dropping or undervalued domains.
  • Marketing budget allocation — break down traffic source share (direct, search, social, paid, referrals, mail) to decide where to spend.
  • On-page content audit — use 1-to-5-word keyword density to check stuffing, content relevance, and phrase frequency on any homepage.
  • Brand monitoring — see which AI prompts surface a domain and whether share is growing or shrinking month over month.

FAQ — pricing, legality, integrations, API access

How much does this SimilarWeb scraper cost?

Pay-per-result. You pay $5 for 1,000 results ($0.005/result) — and only when we actually return data. No per-run fee, no platform-usage fee, no charge for failed or empty lookups. The Apify Free plan gives you $5 in monthly credits — about 1,000 results. The $29/month Starter plan covers about 5,800 results.

No subscription lock-in. Pause whenever.

Is it legal to scrape SimilarWeb?

Scraping publicly accessible pages is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches public endpoints, but how you use the output is on you.

Apify's full breakdown: Is web scraping legal?.

Can I integrate the SimilarWeb scraper with other tools?

Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I run the SimilarWeb scraper through the Apify API?

Yes. Every run is available via the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs?token=APIFY_TOKEN" \
-H "Content-Type: application/json" \
-d '{"mode":"traffic","domains":["google.com","amazon.com"]}'

Docs: Apify API reference.
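
The same call can be prepared from Python with only the standard library. This sketch mirrors the curl command above (same endpoint, token in the query string); `build_run_request` is a hypothetical helper name, and APIFY_TOKEN remains a placeholder for your own token.

```python
import json
import urllib.parse
import urllib.request

RUNS_URL = "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs"


def build_run_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    url = f"{RUNS_URL}?{urllib.parse.urlencode({'token': token})}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("APIFY_TOKEN", {"mode": "traffic", "domains": ["google.com", "amazon.com"]})
```

Passing `request` to `urllib.request.urlopen` sends it; the response body is JSON describing the started run.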

Can I use this SimilarWeb scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call this scraper. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.