Similarweb Scraper

Pricing

from $1.00 / 1,000 domains analysed

⚡ Get Similarweb data for any domain in seconds — traffic, ranks, top keywords, AI traffic share (ChatGPT/Claude/Gemini), similar sites, company info & WHOIS. No login or API key. Bulk parallel scrape, captcha-resilient. Export to JSON/CSV/Excel. Perfect for SEO, lead gen & competitor analysis.

Developer

VortexData

Maintained by Community

🔍 Similarweb Scraper

📊 Website intelligence for any domain in seconds. Global and country rankings, engagement metrics, traffic sources, top keywords, AI traffic share, similar sites and WHOIS — all in one parallel run, exported as JSON, CSV, Excel or any other format Apify supports.

💎 What is Similarweb Scraper?

Similarweb Scraper is a fast, captcha-resilient web scraper that pulls the same data the Similarweb web app shows you — without requiring a Similarweb account, login, or API key. Behind the scenes it talks to Similarweb's own SPA data endpoint using a real Chrome TLS / JA3 fingerprint via curl_cffi and routes every request through a fresh Apify Residential proxy session, so you get reliable, production-grade data for any domain.

You give it a list of domains and pick which datasets you want. The Actor returns one merged record per domain (or one record per dataset per domain, your choice) ready to drop into a spreadsheet, BI tool, warehouse, or AI agent.

🚀 What can Similarweb Scraper do?

  • 🗂️ Scrape three independent datasets in a single run:
    • 📊 Base data — global / country / category ranks, monthly visits, bounce rate, pages per visit, time on site, traffic-source split (direct, search, referral, social, paid, mail), top organic keywords with volume and CPC, AI traffic share per LLM (ChatGPT / Claude / Gemini / Perplexity / Copilot).
    • 🪞 Similar sites — competitors and alternatives with their traffic, category and ranking.
    • 🆔 AITDK — WHOIS via RDAP (registrar, registration / expiration dates, name servers, EPP status, DNSSEC) plus on-page keyword density analysis of the domain's homepage.
  • Process domains in parallel — up to 10 concurrent fetches by default, each with isolated retry budgets so a flaky stream cannot drain attempts of another.
  • 🛡️ Captcha-resilient — uses Similarweb's open SPA endpoint that serves 200 OK to Chrome TLS fingerprints, no captcha solving required for base data and similar sites.
  • 🔄 Per-request IP rotation — every HTTP call gets a fresh Apify Residential proxy session, so a blocked address costs at most one attempt.
  • 🌐 Three input formats — accepts example.com, www.example.com, or https://example.com. The domain is extracted automatically.
  • 📤 Two output shapes: aggregated (one merged record per domain) or individual (one record per dataset).
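
The parallel fetching with isolated retry budgets described in the list above can be pictured with a small sketch. This is not the Actor's actual code: `fetch_with_budget`, `run_all` and the budget of 3 attempts are hypothetical, and only the default concurrency of 10 comes from the text.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

MAX_CONCURRENCY = 10  # default concurrency stated above
MAX_ATTEMPTS = 3      # assumed per-domain retry budget (not documented)


async def fetch_with_budget(
    domain: str,
    fetch: Callable[[str], Awaitable[dict]],
    semaphore: asyncio.Semaphore,
    max_attempts: int = MAX_ATTEMPTS,
) -> dict:
    """Fetch one domain; its attempts are counted separately from all others."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        async with semaphore:  # cap the number of in-flight requests
            try:
                data = await fetch(domain)
                return {"domain": domain, "attempts": attempt, "data": data}
            except Exception as exc:  # a real scraper would catch narrower errors
                last_error = exc
    return {"domain": domain, "attempts": max_attempts, "error": str(last_error)}


async def run_all(
    domains: Iterable[str],
    fetch: Callable[[str], Awaitable[dict]],
    concurrency: int = MAX_CONCURRENCY,
) -> list:
    """Run every domain concurrently; results keep the input order."""
    semaphore = asyncio.Semaphore(concurrency)
    tasks = [fetch_with_budget(d, fetch, semaphore) for d in domains]
    return await asyncio.gather(*tasks)
```

Because the attempt counter lives inside `fetch_with_budget`, a domain that keeps failing exhausts only its own budget and the other domains still get their full number of attempts.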

☁️ Remember the Apify platform

Running this Actor on Apify gives you everything that comes with the platform out of the box: managed Residential proxies with global exit IPs, scheduling (run hourly / daily / weekly), free storage in Apify Datasets with export to JSON / CSV / Excel / JSONL / XML / RSS, webhooks and integrations (Make, Zapier, n8n, Google Sheets, Slack, Airtable, Pipedream), and a REST API + Python / JavaScript SDKs to plug results into your own pipelines.

🗝️ What data can this Actor extract?

| Field group | Examples |
| --- | --- |
| Rankings | Global rank · country rank · category rank |
| Engagement | Total visits · monthly visits (3 months) · bounce rate · pages / visit · time on site |
| Traffic sources | Direct · search · referral · social · paid · mail (as shares) |
| AI traffic share | ChatGPT · Claude · Gemini · Perplexity · Copilot (current + 3-month history) |
| Top keywords | Keyword · estimated value · search volume · CPC |
| Country breakdown | Top countries with share + monthly visit estimates per country |
| Similar sites | Up to 20 related sites with traffic, ranks, category and thumbnails |
| WHOIS (via RDAP) | Registrar · IANA ID · registration / expiration / last-changed dates · name servers · EPP status |
| Keyword density | Top-20 non-stopword tokens from the homepage with count and density |
| Assets | Desktop / mobile screenshots · favicon |
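
To make the WHOIS (via RDAP) row more concrete, here is a minimal sketch of how a standard RDAP domain response (RFC 9083) could be flattened into `whois_*` fields. It is an illustration, not the Actor's parser: `parse_rdap_domain` is a hypothetical helper, and the names `whois_registrar_iana_id`, `whois_last_changed_date` and `whois_status` are assumptions. Only `whois_registrar`, `whois_registration_date`, `whois_expiration_date` and `whois_name_servers` appear in the example output further down.

```python
from typing import Any, Dict


def parse_rdap_domain(rdap: Dict[str, Any]) -> Dict[str, Any]:
    """Map an RDAP domain object onto flat whois_* fields."""
    # RDAP events carry the registration / expiration / last-changed timestamps.
    events = {e.get("eventAction"): e.get("eventDate") for e in rdap.get("events", [])}

    registrar_name = None
    iana_id = None
    for entity in rdap.get("entities", []):
        if "registrar" not in entity.get("roles", []):
            continue
        # jCard layout: ["vcard", [[name, params, type, value], ...]]
        vcard = entity.get("vcardArray", [None, []])
        for prop in vcard[1]:
            if prop[0] == "fn":
                registrar_name = prop[3]
        for public_id in entity.get("publicIds", []):
            if public_id.get("type") == "IANA Registrar ID":
                iana_id = public_id.get("identifier")

    return {
        "whois_registrar": registrar_name,
        "whois_registrar_iana_id": iana_id,          # assumed field name
        "whois_registration_date": events.get("registration"),
        "whois_expiration_date": events.get("expiration"),
        "whois_last_changed_date": events.get("last changed"),  # assumed field name
        "whois_name_servers": [
            ns.get("ldhName", "").lower() for ns in rdap.get("nameservers", [])
        ],
        "whois_status": rdap.get("status", []),      # EPP status, assumed field name
    }
```

Missing sections simply yield `None` or an empty list, so a sparse RDAP answer still produces a complete record.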

🎯 How to use Similarweb Scraper

  1. Click Try for free on the Actor's Apify Store page.
  2. In the Domains field, paste the list of websites you want to analyse — one per line, any format (example.com, www.example.com, or https://example.com).
  3. Pick exactly one Dataset to fetch. Start with base_data if you just want the standard Similarweb dashboard data.
  4. Choose an Output shape: Aggregated if you'll open the results in a spreadsheet, Individual if you'll join the streams downstream.
  5. Click Start. Each domain costs about half a second of compute on the base endpoint; AITDK takes longer because it scrapes each homepage.
  6. When the run finishes, open Storage → Dataset and export to JSON, CSV, Excel, JSONL, XML or RSS. Or pull the results through the API: https://api.apify.com/v2/datasets/{dataset_id}/items.
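
The API pull mentioned in step 6 could look like the following standard-library sketch. `dataset_items_url` and `fetch_items` are hypothetical helper names. The `format` query parameter is the one used in the curl example later in this README, and passing `token` for private datasets is an assumption about your setup.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build the items URL for a dataset in the requested export format."""
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: Optional[str] = None) -> list:
    """Download all items of a dataset as parsed JSON (performs a network call)."""
    with urlopen(dataset_items_url(dataset_id, "json", token)) as response:
        return json.load(response)
```

Calling `fetch_items("<DATASET_ID>")` after a finished run returns the same records you would see under Storage → Dataset.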

💰 Pricing

This Actor is billed by Apify compute units (CU) consumed and Apify Proxy traffic used — see Apify's platform pricing. Typical CU consumption per run:

| Datasets selected | Approx. CU per 100 domains |
| --- | --- |
| base_data only | ~0.01 CU |
| base_data + similar_sites | ~0.02 CU |
| base_data + aitdk | ~0.05 CU (homepage fetch) |
| All three datasets | ~0.06 CU |

Costs scale roughly linearly with the number of domains. The Apify free tier (5 USD / month) covers thousands of base-data lookups.
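
As a rough worked example of the linear scaling, the table above can be turned into a small estimator. The rates are the approximate figures from the table, `estimate_cu` is a hypothetical helper, and proxy traffic is not included.

```python
# Approximate CU per 100 domains, copied from the pricing table above.
CU_PER_100_DOMAINS = {
    frozenset({"base_data"}): 0.01,
    frozenset({"base_data", "similar_sites"}): 0.02,
    frozenset({"base_data", "aitdk"}): 0.05,
    frozenset({"base_data", "similar_sites", "aitdk"}): 0.06,
}


def estimate_cu(domain_count: int, datasets) -> float:
    """Estimate compute units, assuming cost grows linearly with the domain count."""
    key = frozenset(datasets)
    if key not in CU_PER_100_DOMAINS:
        raise ValueError(f"no published rate for dataset combination: {sorted(key)}")
    return CU_PER_100_DOMAINS[key] * domain_count / 100
```

For instance, `estimate_cu(5000, ["base_data"])` comes out at about 0.5 CU, and the same 5,000 domains with all three datasets at about 3 CU.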

📥 Input

The form has three fields only — everything else has sensible defaults:

| Field | Type | Default |
| --- | --- | --- |
| domains | array | required |
| datasets | enum | base_data (one of base_data, similar_sites, aitdk) |
| output_mode | enum | aggregated (or individual) |

Example input

{
  "domains": ["openai.com", "apple.com", "reddit.com"],
  "datasets": "base_data",
  "output_mode": "aggregated"
}
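
If you assemble this input in code, a small validator can catch mistakes before a run starts. The sketch below is an assumption about sensible client-side checks, not the Actor's own logic: `normalize_domain` and `build_input` are hypothetical helpers, and stripping a leading `www.` is a guess at how the domain is extracted.

```python
from urllib.parse import urlparse

ALLOWED_DATASETS = {"base_data", "similar_sites", "aitdk"}
ALLOWED_OUTPUT_MODES = {"aggregated", "individual"}


def normalize_domain(raw: str) -> str:
    """Reduce 'example.com', 'www.example.com' or 'https://example.com' to a bare host."""
    value = raw.strip().lower()
    # urlparse only fills .hostname when the network-location marker is present.
    if "://" not in value:
        value = "//" + value
    host = urlparse(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_input(domains, dataset: str = "base_data", output_mode: str = "aggregated") -> dict:
    """Return an input object shaped like the example above."""
    if dataset not in ALLOWED_DATASETS:
        raise ValueError(f"datasets must be one of {sorted(ALLOWED_DATASETS)}")
    if output_mode not in ALLOWED_OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {sorted(ALLOWED_OUTPUT_MODES)}")
    # dict.fromkeys removes duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(normalize_domain(d) for d in domains if d.strip()))
    if not cleaned:
        raise ValueError("domains is required and must not be empty")
    return {"domains": cleaned, "datasets": dataset, "output_mode": output_mode}
```

Passing `["https://openai.com", "www.apple.com", "reddit.com"]` to `build_input` yields the same object as the example input.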

📤 Output

Each item conforms to the dataset schema and is rendered in the Apify Console as five tables: 📊 Overview · 🚦 Traffic sources · 💫 Engagement · 🤖 AI traffic share · 🆔 AITDK (WHOIS + keywords).

Example item (aggregated, abridged)

{
  "domain": "openai.com",
  "types_included": ["base_data", "aitdk"],
  "rankGlobal": 207,
  "country": "US",
  "countryRank": 306,
  "category": "ai_chatbots_and_tools",
  "categoryRank": 6,
  "title": "OpenAI",
  "totalVisits": 195737812,
  "bounceRate": 0.5937,
  "pagesPerVisit": 2.59,
  "timeOnSite": 138.72,
  "socialTraffic": 0.0287,
  "searchTraffic": 0.2154,
  "directTraffic": 0.3840,
  "referralTraffic": 0.1038,
  "aiTrafficShareChatgpt": 0.8825,
  "aiTrafficShareClaude": 0.0029,
  "aiTrafficShareGemini": 0.0106,
  "topKeywords": [
    {"keyword": "chatgpt", "estimatedValue": 20907500.0, "searchVolume": 173339160.0, "cpc": 0.14},
    {"keyword": "chat gpt", "estimatedValue": 5688810.0, "searchVolume": 95011780.0, "cpc": 0.14}
  ],
  "aitdk_data": {
    "whois_registrar": "MarkMonitor Inc.",
    "whois_registration_date": "2007-01-19T19:28:24Z",
    "whois_expiration_date": "2029-01-19T19:28:24Z",
    "whois_name_servers": ["ns1-02.azure-dns.com", "ns2-02.azure-dns.net"],
    "keyword_density_total_words": 283,
    "keyword_density": [
      {"keyword": "chatgpt", "count": 14, "density": 0.0495},
      {"keyword": "research", "count": 13, "density": 0.0459}
    ]
  }
}
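
A record like the one above is straightforward to post-process. The sketch below uses two hypothetical helpers: one picks the largest traffic source among the share fields visible in the example, the other recomputes keyword density. The example values are consistent with density being `count / keyword_density_total_words` (14 / 283 ≈ 0.0495), but that formula is inferred from the sample, not documented.

```python
# Share fields visible in the abridged example; paid and mail are not shown there.
TRAFFIC_FIELDS = ("socialTraffic", "searchTraffic", "directTraffic", "referralTraffic")


def dominant_source(item: dict):
    """Return the name of the traffic-share field with the highest value, if any."""
    present = {field: item[field] for field in TRAFFIC_FIELDS if field in item}
    return max(present, key=present.get) if present else None


def recompute_density(item: dict) -> dict:
    """Recompute keyword density as count / total words, rounded to 4 decimals."""
    aitdk = item.get("aitdk_data", {})
    total = aitdk.get("keyword_density_total_words")
    if not total:
        return {}
    return {
        row["keyword"]: round(row["count"] / total, 4)
        for row in aitdk.get("keyword_density", [])
    }
```

Comparing `recompute_density(item)` against the `density` values in the record is a cheap sanity check when you load results into your own pipeline.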

🔗 Integrate Similarweb Scraper anywhere

Apify Actors run on a REST API — every run, dataset and webhook is addressable from your code:

# Trigger a run from anywhere ("datasets" takes a single value, as in the Input section)
curl -X POST "https://api.apify.com/v2/acts/<USER>~similarweb-scraper/runs?token=<API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"domains": ["openai.com"], "datasets": "base_data"}'

# Read results from the run's default dataset
curl "https://api.apify.com/v2/datasets/<DATASET_ID>/items?format=json"

Or use the official Python and JavaScript clients.
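
If you prefer not to install a client library, the same run-trigger call can be sketched with the Python standard library alone. `build_run_request` and `start_run` are hypothetical helper names. The endpoint matches the curl example above, and reading `defaultDatasetId` from the `data` object of the response follows the Apify API's run object. The input passes `datasets` as a single string, as in the Example input.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_run_request(actor_id: str, token: str, run_input: dict) -> Request:
    """Build the POST request that starts a run; actor_id looks like '<USER>~similarweb-scraper'."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/runs?token={quote(token, safe='')}"
    body = json.dumps(run_input).encode("utf-8")
    return Request(url, data=body, headers={"Content-Type": "application/json"}, method="POST")


def start_run(actor_id: str, token: str, run_input: dict) -> str:
    """Start a run and return the ID of its default dataset (performs a network call)."""
    with urlopen(build_run_request(actor_id, token, run_input)) as response:
        run = json.load(response)["data"]
    return run["defaultDatasetId"]
```

The returned dataset ID can then be used with the dataset items endpoint once the run has finished.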

❓ FAQ

🔑 Do I need a Similarweb account or API key? No. This Actor talks to Similarweb's public SPA endpoint directly. No login, no API key, and no scraping the captcha-gated in-depth pages.

🆕 Is the data fresh? Yes — it's the same JSON Similarweb's UI loads. The snapshotDate field on every record tells you exactly which month it represents. Similarweb refreshes its traffic data monthly.

⏰ Can I run this on a schedule? Yes — open the Actor in Apify Console, go to Schedules and pick hourly, daily, weekly or a custom cron. Combine with webhooks to push fresh data into Google Sheets, Slack, Make, Zapier or your own backend automatically.

⚖️ Is web scraping legal? Public web pages are generally legal to scrape, but you must respect copyright, terms of service, and personal-data protection laws (GDPR in the EU and similar regulations elsewhere). This Actor only extracts publicly visible data — no personal data is collected. See Apify's legal blog for details.

💬 Support

📝 Changelog

See CHANGELOG.md for the full release history. The Actor follows Semantic Versioning.