SimilarWeb Website Scraper - AI Referral, WHOIS & Ranking

Pricing

from $1.00 / 1,000 results

Extract SimilarWeb traffic analytics for any domain: rankings, monthly visits, bounce rate, traffic sources, keywords, AI chatbot referrals. Plus RDAP WHOIS and 1-to-5-word keyword density. $1 per 1,000 results. 50 domains in ~10s, 1,000 in under 3 minutes.

Rating: 5.0 (1)

Developer: Sourabh Kumar

Maintained by Community

Actor stats

  • Bookmarked: 3
  • Total users: 89
  • Monthly active users: 48
  • Last modified: 3 days ago

SimilarWeb scraper — traffic, AI referrals, WHOIS & keywords · $1/1k

Pull SimilarWeb traffic, AI chatbot referrals, RDAP WHOIS, and 1‑to‑5‑word keyword density for any domain. No login, no contract, no Partner-API tier.

$1 per 1,000 results. No per-run fee, no platform-usage fee. Failed or empty lookups are free.

Lightning fast. Scrape 50 domains in ~10 seconds, 200 domains in under a minute, 1,000 domains in under 3 minutes.

Two modes in one actor:

  • traffic — SimilarWeb metrics + the most complete AI-traffic block in the segment.
  • domainAnalysis — RDAP WHOIS + keyword density from the homepage HTML.

Why this scraper

  • 💰 $1 per 1,000 results. Flat per-row pricing. Both modes bill the same. No per-run fee, no tiers.
  • Lightning fast. 50 domains in ~10 seconds. 200 in under a minute. 1,000 in under 3 minutes. Concurrent fetcher, maxConcurrency up to 15.
  • 🤖 The deepest AI-traffic block on the Store. Ranked AI sources, traffic tier, top prompts, and a 3‑month per‑chatbot share history.
  • GA-verified flag. dataSource: "ga-verified" when SimilarWeb's data is backed by a Google Analytics integration, "estimated" otherwise. Nobody else exposes this.
  • 📝 Bulk WHOIS without an API key. RDAP via rdap.org auto-routes by TLD — .com, .io, .net, country TLDs, and most others.
  • 🔍 1-to-5-word keyword density. N-gram phrase frequency on any homepage with English stopword filtering.
  • 🌐 BYO proxies. Paste your own proxyUrls and we route through them first, falling back to Apify Proxy only if blocked.
  • 🛡️ Block-resilient. WAF, CloudFront, and CAPTCHA are detected and the request retries on the next proxy tier. One bad domain never kills the batch.
  • 📦 HTTP-only. No headless browser, no Playwright, low compute per row.

What you get

  • 📊 Global / country / category rank
  • 📈 3‑month visit history
  • 🎯 Bounce rate, pages/visit, duration
  • 🔗 Traffic source split
  • 🌍 Top 5 countries
  • 🔑 Top 5 keywords + CPC
  • 🤖 Ranked AI chatbot referrals
  • 💬 Top AI prompts
  • ✅ GA‑verified flag
  • 📝 WHOIS via RDAP
  • 🧮 1‑to‑5‑word keyword density
  • 📸 Screenshot URL + isSmall

Traffic mode fields

| Field | Type | What it is |
| --- | --- | --- |
| domain, siteName, title, description | string | Identity |
| globalRank, countryRank, categoryRank, globalCategoryRank | int / object | Ranking signals |
| category | string | SimilarWeb category |
| totalVisits, estimatedMonthlyVisits | number / object | Visit volumes |
| bounceRate, pagesPerVisit, avgVisitDuration, engagementMonth | number / string | Engagement metrics |
| trafficSources | object | Direct, search, social, referrals, paid, mail (fractions ≈ 1.0) |
| topCountries | array (max 5) | Country code, country id, share |
| topKeywords | array (max 5) | Keyword, volume, CPC, estimated value |
| aiTrafficDetails | object | totalAiVisits, aiReferralShare, aiTrafficTier, topChatbots, chatbotTrends, topPrompts, aiPromptsStatus |
| aiChatbotsRanked | array | Full ranked AI source list (6–7 entries for popular sites) |
| dataSource | enum | "ga-verified" or "estimated" |
| serverNotice, largeScreenshot, snapshotDate, isSmall | misc | Side data |
| _meta, _error | object / string | Forward compatibility + failure reason |

Domain analysis mode fields

| Field | Type | What it is |
| --- | --- | --- |
| domain | string | Normalized input |
| whois | object | registrar, createdDate, updatedDate, expiresDate, registrantOrg, registrantCountry, nameServers |
| whoisError | string | "rdap_not_found", "rdap_rate_limited", "rdap_unreachable" |
| keywordDensity | object | { "1": [...], "2": [...], ..., "5": [...] }; each entry has ngram, count, frequency |
| keywordDensityError | string | "html_fetch_failed", "empty_body", "cloudflare_blocked", etc. |
| htmlFetchedBytes | int | Bytes pulled (capped at 1 MB) |
| htmlFetchProxyTier | string | Which tier landed the body ("user", "direct", "datacenter", "residential") |
| _error | string | Set when every configured subtask failed |
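
For readers who want to see how a keywordDensity object of this shape can be produced, here is a minimal sketch. It is not the actor's implementation: the stopword set is a tiny illustrative sample, stopwords are dropped before n-grams are built, and frequency is assumed to be the count divided by the number of n-grams of that size.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption; the actor's real list is not documented).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "with"}

def keyword_density(text: str, sizes=(1, 2, 3), top_n: int = 50) -> dict:
    """Return {"1": [...], "2": [...]} where each entry has ngram, count, frequency."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    result = {}
    for n in sizes:
        # Sliding window of n consecutive tokens joined into one phrase.
        grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total = len(grams)
        result[str(n)] = [
            {"ngram": gram, "count": count, "frequency": round(count / total, 4)}
            for gram, count in Counter(grams).most_common(top_n)
        ]
    return result

density = keyword_density("GitHub Copilot and GitHub Actions for the GitHub platform", sizes=(1, 2))
print(density["1"][0])
```

An empty page yields empty lists for every size, so no division by zero occurs.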

How to scrape SimilarWeb

  1. Create a free Apify account. 30 seconds, no card.
  2. Open the SimilarWeb Scraper in the Apify Console.
  3. Paste your domains. https://, www., and trailing slashes are stripped for you.
  4. Click Start. 50 domains finish in ~10 seconds, 200 in under a minute, 1,000 in under 3 minutes.
  5. Export the dataset as JSON, CSV, or Excel — or fetch via API.
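
If you want to deduplicate a domain list before pasting it in, the stripping described in step 3 can be approximated locally. This is a sketch under assumptions: the function name is hypothetical, and lowercasing and dropping any path are extra choices of this example, not documented actor behavior.

```python
def normalize_domain(raw: str) -> str:
    """Strip scheme, leading www., and trailing slash; lowercase the host."""
    value = raw.strip().lower()
    if "://" in value:
        value = value.split("://", 1)[1]   # drop http:// or https://
    value = value.split("/", 1)[0]         # drop trailing slash (and any path: assumption)
    if value.startswith("www."):
        value = value[4:]
    if not value or len(value) > 253:      # 253-char limit taken from the input table
        raise ValueError(f"invalid domain: {raw!r}")
    return value

raw_inputs = ["https://www.GitHub.com/", "github.com", "http://stripe.com"]
unique = sorted({normalize_domain(item) for item in raw_inputs})
print(unique)
```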

Proxy options — Apify Proxy or bring your own

By default the scraper uses Apify Proxy and you pay nothing extra for it. Two things make it work on tough domains with no setup from you.

  • Automatic fallback. Traffic mode tries a direct connection first (free, fast), then datacenter, then US residential; the first tier that returns clean data wins. The domain-analysis HTML fetch skips the direct attempt because most homepages bot-detect datacenter IPs.
  • Block detection. WAF, CloudFront, and CAPTCHA responses are recognised and the request retries on the next proxy tier. Persistent blocks return a row with an _error field instead of crashing the run.
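
The two behaviours above can be pictured with a short sketch. Everything here is illustrative: the fetcher callables, the block markers, and the "blocked_on_all_tiers" error string are hypothetical stand-ins, since the actor's real detection logic is not published.

```python
BLOCK_MARKERS = ("captcha", "access denied", "request blocked")  # hypothetical markers

def looks_blocked(status: int, body: str) -> bool:
    """Heuristic: a blocking status code or a known marker in the body."""
    if status in (403, 429, 503):
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

def fetch_with_fallback(domain: str, fetchers) -> dict:
    """Try each (tier, fetch) pair in order; return a row instead of raising."""
    for tier, fetch in fetchers:
        try:
            status, body = fetch(domain)
        except OSError:
            continue  # network failure: move on to the next tier
        if not looks_blocked(status, body):
            return {"domain": domain, "tier": tier, "body": body, "_error": None}
    # Every tier failed: report it in the row so the batch keeps going.
    return {"domain": domain, "tier": None, "body": None, "_error": "blocked_on_all_tiers"}

# Stub fetchers standing in for real HTTP calls through each proxy tier.
def direct(domain):
    return 403, "Access Denied"

def datacenter(domain):
    return 200, '{"ok": true}'

row = fetch_with_fallback("example.com", [("direct", direct), ("datacenter", datacenter)])
print(row["tier"])
```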

Bring your own proxies if you already have a residential or ISP plan:

"proxyConfiguration": {
  "useApifyProxy": false,
  "proxyUrls": [
    "http://user:pass@proxy-a.example.com:8080",
    "http://user:pass@proxy-b.example.com:8080"
  ]
}

Your URLs are tried before Apify's tiers, so you only pay for Apify bandwidth when your pool gets blocked. Multiple URLs are rotated per session.

How much does it cost

Pay-per-result. $1 per 1,000 results ($0.001/result). Both modes bill the same flat rate. No per-run fee, no platform-usage fee, no charge for failed or empty results.

  • Apify Free plan ($5/month credit): about 5,000 results/month.
  • Apify Starter plan ($29/month): about 29,000 results/month.

The actor is HTTP-only — no headless browser — and platform compute is on us, not your bill.
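
The plan figures above follow directly from the flat rate. A small helper (function names are illustrative) makes the arithmetic explicit for other budgets:

```python
RESULTS_PER_USD = 1000  # $1 per 1,000 results, i.e. $0.001 per result

def results_for_budget(budget_usd: float) -> int:
    """How many billed results a given budget covers at the flat rate."""
    return round(budget_usd * RESULTS_PER_USD)

def cost_for_results(n_results: int) -> float:
    """Cost in USD for n billed results; failed or empty lookups are not billed."""
    return n_results / RESULTS_PER_USD

print(results_for_budget(5), results_for_budget(29), cost_for_results(2500))
```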

Input

Both modes share domains, maxItems, maxConcurrency, and proxyConfiguration. Mode-specific flags toggle the rest.

{
  "mode": "traffic",
  "domains": ["google.com", "amazon.com", "github.com"],
  "maxItems": 100,
  "maxConcurrency": 8,
  "includeAiBreakdown": true,
  "includeIcons": false,
  "proxyConfiguration": { "useApifyProxy": true }
}

For domainAnalysis:

{
  "mode": "domainAnalysis",
  "domains": ["github.com", "stripe.com"],
  "includeWhois": true,
  "includeKeywordDensity": true,
  "keywordDensityNGrams": [1, 2, 3, 4, 5],
  "keywordDensityTopN": 50
}

| Field | Type | Default | Note |
| --- | --- | --- | --- |
| mode | enum | "traffic" | "traffic" or "domainAnalysis". Strict: typos fail at the gateway. |
| domains | string[] | prefilled sample | Each item ≤253 chars. Schemes and www. stripped automatically. |
| maxItems | int | none | Caps the number of rows processed. |
| maxConcurrency | int | 8 | How many domains to process in parallel (1–15). |
| includeAiBreakdown | bool | true | Traffic mode. Off keeps aiChatbotsRanked but drops the verbose aiTrafficDetails block. |
| includeIcons | bool | false | Traffic mode. Adds chatbot icon URLs. |
| includeWhois | bool | true | DomainAnalysis mode. RDAP lookup via rdap.org. |
| includeKeywordDensity | bool | true | DomainAnalysis mode. Fetches the homepage (≤1MB) and tokenizes. |
| keywordDensityNGrams | int[] | [1,2,3,4,5] | Sizes to compute, each between 1 and 8. |
| keywordDensityTopN | int | 50 | Top N n‑grams returned per size. |
| proxyConfiguration | object | Apify Proxy | Falls back through datacenter and US residential when a tier is blocked. Set useApifyProxy: false and pass proxyUrls to bring your own. |
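
Since invalid input is rejected before the run starts, it can help to check a config locally first. The sketch below encodes only the limits listed in the table; the function name and the wording of the messages are this example's own, and the actor's real schema may enforce more.

```python
def validate_input(cfg: dict) -> list:
    """Return a list of problems; an empty list means the input passes these checks."""
    problems = []
    mode = cfg.get("mode", "traffic")
    if mode not in ("traffic", "domainAnalysis"):
        problems.append(f"mode must be 'traffic' or 'domainAnalysis', got {mode!r}")
    for domain in cfg.get("domains", []):
        if not isinstance(domain, str) or len(domain) > 253:
            problems.append(f"domain must be a string of at most 253 chars: {str(domain)[:40]!r}")
    concurrency = cfg.get("maxConcurrency", 8)
    if not 1 <= concurrency <= 15:
        problems.append("maxConcurrency must be between 1 and 15")
    for size in cfg.get("keywordDensityNGrams", [1, 2, 3, 4, 5]):
        if not 1 <= size <= 8:
            problems.append(f"n-gram size {size} is outside 1-8")
    return problems

print(validate_input({"mode": "trafic", "domains": ["github.com"], "maxConcurrency": 20}))
```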

Output

You can download the dataset in JSON, HTML, CSV, or Excel — or stream it through the Apify API.

Traffic mode — sample row (google.com)

{
  "domain": "google.com",
  "siteName": "google.com",
  "title": "Publishing Partner Program",
  "globalRank": 1,
  "countryRank": { "country": "US", "countryId": 840, "rank": 1 },
  "categoryRank": { "rank": 1, "category": "Computers_Electronics_and_Technology/Search_Engines" },
  "category": "computers_electronics_and_technology/search_engines",
  "totalVisits": 86850607710,
  "bounceRate": 0.282,
  "pagesPerVisit": 8.71,
  "avgVisitDuration": 614.32,
  "engagementMonth": "2026-03",
  "trafficSources": {
    "direct": 0.925, "search": 0.008, "social": 0.029,
    "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008
  },
  "topCountries": [
    { "countryCode": "US", "countryId": 840, "share": 0.244 },
    { "countryCode": "JP", "countryId": 392, "share": 0.056 }
  ],
  "topKeywords": [
    { "keyword": "gemini", "volume": 123107710, "cpc": 0.24, "estimatedValue": 185450780 }
  ],
  "aiTrafficDetails": {
    "totalAiVisits": 350694197,
    "aiReferralShare": 0.0041,
    "aiTrafficTier": "<500M",
    "topChatbots": [
      { "name": "chatgpt.com", "share": 51.11 },
      { "name": "claude.ai", "share": 35.76 },
      { "name": "perplexity.ai", "share": 6.75 }
    ],
    "topPrompts": [
      "What is the most popular search engine?",
      "How can I find information online?"
    ],
    "aiPromptsStatus": { "code": 0, "error": null }
  },
  "aiChatbotsRanked": [
    { "name": "chatgpt.com", "rank": 1 },
    { "name": "claude.ai", "rank": 2 },
    { "name": "perplexity.ai", "rank": 3 }
  ],
  "dataSource": "estimated",
  "snapshotDate": "2026-03-01T00:00:00+00:00",
  "isSmall": false,
  "_meta": { "schemaVersion": 1, "policy": 1 },
  "_error": null
}
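
Once rows like the one above are exported, a few derived checks are easy to compute. The helper below is a sketch with a hypothetical name and hypothetical output keys; it relies only on fields shown in the sample row and tolerates missing blocks.

```python
def summarize_row(row: dict) -> dict:
    """Derive sanity checks and headline numbers from one traffic-mode row."""
    sources = row.get("trafficSources") or {}
    ai = row.get("aiTrafficDetails") or {}
    total = row.get("totalVisits") or 0
    ai_visits = ai.get("totalAiVisits") or 0
    return {
        "domain": row["domain"],
        "ok": row.get("_error") is None,
        # Source fractions should add up to roughly 1.0.
        "sourcesSum": round(sum(sources.values()), 3),
        "topSource": max(sources, key=sources.get) if sources else None,
        # AI visits relative to all visits, recomputed from the raw counts.
        "aiShareOfVisits": round(ai_visits / total, 4) if total else None,
        "gaVerified": row.get("dataSource") == "ga-verified",
    }

sample = {
    "domain": "google.com",
    "totalVisits": 86850607710,
    "trafficSources": {"direct": 0.925, "search": 0.008, "social": 0.029,
                       "referrals": 0.017, "paidReferrals": 0.008, "mail": 0.008},
    "aiTrafficDetails": {"totalAiVisits": 350694197},
    "dataSource": "estimated",
    "_error": None,
}
print(summarize_row(sample))
```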

Domain analysis mode — sample row (github.com)

{
  "domain": "github.com",
  "whois": {
    "registrar": "MarkMonitor Inc.",
    "createdDate": "2007-10-09T18:20:50Z",
    "updatedDate": "2024-09-07T09:16:32Z",
    "expiresDate": "2026-10-09T18:20:50Z",
    "registrantOrg": null,
    "registrantCountry": null,
    "nameServers": ["dns1.p08.nsone.net", "ns-421.awsdns-52.com"]
  },
  "whoisError": null,
  "keywordDensity": {
    "1": [
      { "ngram": "github", "count": 53, "frequency": 0.0525 },
      { "ngram": "code", "count": 25, "frequency": 0.0248 }
    ],
    "2": [
      { "ngram": "explore github", "count": 10, "frequency": 0.0099 },
      { "ngram": "github copilot", "count": 8, "frequency": 0.0079 }
    ],
    "3": [
      { "ngram": "github advanced security", "count": 3, "frequency": 0.003 }
    ]
  },
  "keywordDensityError": null,
  "htmlFetchedBytes": 566856,
  "htmlFetchProxyTier": "user",
  "_error": null
}
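
To show where a whois object like the one above can come from, here is a sketch that maps a standard RDAP domain response (events, nameservers, entities with a jCard vcardArray) onto the same field names. The mapping is an assumption of this example, not the actor's code, and registrantCountry is left out for brevity.

```python
def parse_rdap(rdap: dict) -> dict:
    """Map an RDAP domain response onto whois-style fields (illustrative mapping)."""
    events = {e.get("eventAction"): e.get("eventDate") for e in rdap.get("events", [])}

    def vcard_value(entity: dict, prop: str):
        # jCard layout: ["vcard", [[name, params, type, value], ...]]
        for item in (entity.get("vcardArray") or [None, []])[1]:
            if item[0] == prop:
                return item[3]
        return None

    registrar = registrant_org = None
    for entity in rdap.get("entities", []):
        roles = entity.get("roles", [])
        if "registrar" in roles:
            registrar = vcard_value(entity, "fn")
        if "registrant" in roles:
            registrant_org = vcard_value(entity, "org")  # often redacted, so None

    return {
        "registrar": registrar,
        "createdDate": events.get("registration"),
        "updatedDate": events.get("last changed"),
        "expiresDate": events.get("expiration"),
        "registrantOrg": registrant_org,
        "nameServers": [ns["ldhName"].lower() for ns in rdap.get("nameservers", []) if "ldhName" in ns],
    }

rdap_sample = {
    "events": [
        {"eventAction": "registration", "eventDate": "2007-10-09T18:20:50Z"},
        {"eventAction": "expiration", "eventDate": "2026-10-09T18:20:50Z"},
    ],
    "nameservers": [{"ldhName": "DNS1.P08.NSONE.NET"}],
    "entities": [{
        "roles": ["registrar"],
        "vcardArray": ["vcard", [["version", {}, "text", "4.0"],
                                 ["fn", {}, "text", "MarkMonitor Inc."]]],
    }],
}
print(parse_rdap(rdap_sample))
```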

Use cases

  • Competitive analysis and SEO audit — compare global rank, country rank, top keywords, and traffic sources across competitor domains.
  • AI traffic monitoring — track how much referral traffic ChatGPT, Claude, Perplexity, Gemini, and other chatbots send to your site or your competitors.
  • Lead generation and sales intelligence — enrich CRM records with traffic volume, top keywords, and WHOIS contact metadata.
  • Domain investment research — pair WHOIS expiration dates with traffic trends to spot dropping or undervalued domains.
  • Marketing budget allocation — break down traffic source share (direct, search, social, paid, referrals, mail) to decide where to spend.
  • On-page content audit — use 1-to-5-word keyword density to check stuffing, content relevance, and phrase frequency on any homepage.
  • Brand monitoring — see which AI prompts surface a domain and whether share is growing or shrinking month over month.

Limitations

  • No similar-sites discovery yet. Pulling a domain's competitors / alternatives is on the v0.3 roadmap.
  • Cloudflare-protected homepages block the HTML fetch. Keyword density returns keywordDensityError: "cloudflare_blocked" for sites like wsj.com; the WHOIS portion still works.
  • topCountries and topKeywords are capped at 5 by SimilarWeb's public payload.
  • Privacy-protected WHOIS records return null for registrantOrg and registrantCountry — that's the registrar redacting, not a scraper bug.
  • maxConcurrency is capped at 15. Beyond that, SimilarWeb's rate limits start dominating and total throughput drops.
  • Keyword density runs on the homepage only (max 1 MB of HTML). Sub-page audits aren't supported.
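
The maxConcurrency cap and the "one bad domain never kills the batch" behaviour can be illustrated with a bounded asyncio pool. This is a generic pattern, not the actor's source; process_all and the worker callable are hypothetical names.

```python
import asyncio

async def process_all(domains, worker, max_concurrency: int = 8):
    """Run worker(domain) for every domain, at most max_concurrency at a time."""
    limit = asyncio.Semaphore(min(max(max_concurrency, 1), 15))  # clamp to the 1-15 range

    async def guarded(domain):
        async with limit:
            try:
                return await worker(domain)
            except Exception as exc:
                # A failing domain becomes an error row instead of aborting the batch.
                return {"domain": domain, "_error": str(exc)}

    # gather preserves input order, so rows line up with the domain list.
    return await asyncio.gather(*(guarded(d) for d in domains))

async def fake_worker(domain):
    """Stand-in for a real lookup; fails for one domain to show error handling."""
    if domain == "bad.example":
        raise RuntimeError("boom")
    await asyncio.sleep(0)
    return {"domain": domain, "_error": None}

rows = asyncio.run(process_all(["a.example", "bad.example", "b.example"], fake_worker, 2))
print(rows)
```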

FAQ

How much does this SimilarWeb scraper cost?

Pay-per-result. You pay $1 for 1,000 results ($0.001/result) — and only when we actually return data. No per-run fee, no platform-usage fee, no charge for failed or empty lookups. The Apify Free plan ($5 monthly credit) covers about 5,000 results. The $29/month Starter plan covers about 29,000.

No subscription lock-in. Pause whenever.

Is it legal to scrape SimilarWeb?

Scraping publicly accessible pages is generally allowed in the US and most of the EU, as long as you don't collect personal data covered by GDPR or CCPA without a lawful basis. This actor only touches public endpoints, but how you use the output is on you.

Apify's full breakdown: Is web scraping legal?.

Can I integrate the SimilarWeb scraper with other tools?

Push results into Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. Apify treats every actor as a webhook source, so anything that consumes webhooks or pulls from an API works.

Full list: Apify integrations.

Can I run the SimilarWeb scraper through the Apify API?

Yes. You can start a run through the Apify REST API:

curl -X POST "https://api.apify.com/v2/acts/sourabhbgp~similarweb-scraper/runs?token=APIFY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"mode":"traffic","domains":["google.com","amazon.com"]}'

Docs: Apify API reference.
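
The same call can be made from Python with only the standard library. This sketch reuses the endpoint and actor ID from the curl example; the helper names are illustrative, and start_run is defined but not invoked here because it needs a real token.

```python
import json
import urllib.request

ACTOR_ID = "sourabhbgp~similarweb-scraper"  # taken from the curl example above

def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request that starts a run with the given input."""
    url = f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs?token={token}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def start_run(token: str, actor_input: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_run_request(token, actor_input), timeout=30) as resp:
        return json.load(resp)

request = build_run_request("APIFY_TOKEN", {"mode": "traffic", "domains": ["google.com", "amazon.com"]})
print(request.get_method(), request.full_url)
```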

Can I use this SimilarWeb scraper through an MCP Server?

Yes. Apify ships an MCP server that exposes every actor as a tool, so Claude Desktop, Cursor, and any other MCP-capable client can call this scraper. Setup: Apify MCP docs.

Your feedback

Bug, missing field, or odd behavior? Drop a note in the Issues tab. Reports go to a human and fixes usually ship the same week.