Similarweb Scraper
Pricing
from $1.00 / 1,000 domains analysed
⚡ Get Similarweb data for any domain in seconds — traffic, ranks, top keywords, AI traffic share (ChatGPT/Claude/Gemini), similar sites, company info & WHOIS. No login or API key. Bulk parallel scrape, captcha-resilient. Export to JSON/CSV/Excel. Perfect for SEO, lead gen & competitor analysis.
Developer: VortexData · Maintained by Community
🔍 Similarweb Scraper
📊 Website intelligence for any domain in seconds. Global and country rankings, engagement metrics, traffic sources, top keywords, AI traffic share, similar sites and WHOIS — all in one parallel run, exported as JSON, CSV, Excel or any other format Apify supports.
💎 What is Similarweb Scraper?
Similarweb Scraper is a fast, captcha-resilient web scraper that pulls the same data the Similarweb web app shows you, without requiring a Similarweb account, login, or API key. Behind the scenes it talks to Similarweb's own SPA data endpoint using a real Chrome TLS / JA3 fingerprint via curl_cffi and routes every request through a fresh Apify Residential proxy session, so you get reliable, production-grade data for any domain.
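As a rough illustration of the per-request rotation described above, the sketch below builds a new Apify Proxy URL for every call and fetches with a Chrome fingerprint. It assumes Apify Proxy's username format (`groups-RESIDENTIAL,session-<id>`) on `proxy.apify.com:8000` and a recent curl_cffi that accepts `impersonate="chrome"`. The helper names `residential_proxy_url` and `fetch` are hypothetical, and the Actor itself may configure its proxy differently (for example through the Apify SDK).

```python
import secrets
from typing import Optional
from urllib.parse import quote

APIFY_PROXY_HOST = "proxy.apify.com:8000"


def residential_proxy_url(password: str, country: Optional[str] = None) -> str:
    """Build an Apify Proxy URL with a fresh, random session ID.

    A new session ID per call asks for a new Residential exit IP, so a
    blocked address only costs the single request that used it.
    """
    # Session IDs may use letters, digits, '.', '_' and '~' (max 50 chars);
    # "s" + 16 hex characters stays well inside that limit.
    session_id = "s" + secrets.token_hex(8)
    username = f"groups-RESIDENTIAL,session-{session_id}"
    if country:
        username += f",country-{country}"
    return f"http://{username}:{quote(password, safe='')}@{APIFY_PROXY_HOST}"


def fetch(url: str, password: str):
    """Fetch a URL with a Chrome TLS/JA3 fingerprint through a fresh proxy session."""
    from curl_cffi import requests  # third-party: pip install curl_cffi

    proxy = residential_proxy_url(password)
    return requests.get(
        url,
        impersonate="chrome",
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
```

Because `fetch` imports curl_cffi lazily, `residential_proxy_url` can be used and tested on its own. Reusing one session ID would pin the same exit IP, which is the opposite of per-request rotation.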
You give it a list of domains and pick which datasets you want. The Actor returns one merged record per domain (or one record per dataset per domain, your choice) ready to drop into a spreadsheet, BI tool, warehouse, or AI agent.
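A sketch of how the two output shapes could be derived from per-dataset results, assuming each dataset fetch returns one flat dict per domain. `domain`, `types_included` and the nested `aitdk_data` key mirror the example item in the Output section; the `type` field on individual records, the `<name>_data` nesting for `similar_sites`, and the function names are hypothetical.

```python
from typing import Dict, List


def to_individual(domain: str, per_dataset: Dict[str, dict]) -> List[dict]:
    """Individual shape: one record per (domain, dataset) pair."""
    return [{"domain": domain, "type": name, **fields} for name, fields in per_dataset.items()]


def to_aggregated(domain: str, per_dataset: Dict[str, dict]) -> dict:
    """Aggregated shape: one merged record per domain."""
    record = {"domain": domain, "types_included": list(per_dataset)}
    for name, fields in per_dataset.items():
        if name == "base_data":
            record.update(fields)            # base fields sit at the top level
        else:
            record[f"{name}_data"] = fields  # e.g. "aitdk_data" in the example item
    return record
```

Aggregated records suit spreadsheets because every domain is one row; individual records keep each dataset's fields separate for joins.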
🚀 What can Similarweb Scraper do?
- 🗂️ Scrape three independent datasets in a single run:
  - 📊 Base data: global / country / category ranks, monthly visits, bounce rate, pages per visit, time on site, traffic-source split (direct, search, referral, social, paid, mail), top organic keywords with volume and CPC, and AI traffic share per LLM (ChatGPT / Claude / Gemini / Perplexity / Copilot).
  - 🪞 Similar sites: competitors and alternatives with their traffic, category and ranking.
  - 🆔 AITDK: WHOIS via RDAP (registrar, registration / expiration dates, name servers, EPP status, DNSSEC) plus on-page keyword density analysis of the domain's homepage.
- ⚡ Process domains in parallel: up to 10 concurrent fetches by default, each with its own retry budget, so one flaky fetch cannot use up another's attempts.
- 🛡️ Captcha-resilient: uses Similarweb's open SPA endpoint, which serves `200 OK` to Chrome TLS fingerprints, so no captcha solving is required for base data and similar sites.
- 🔄 Per-request IP rotation: every HTTP call gets a fresh Apify Residential proxy session, so a blocked address costs at most one attempt.
- 🌐 Three input formats: accepts `example.com`, `www.example.com` or `https://example.com`. The domain is extracted automatically.
- 📤 Two output shapes: aggregated (one merged record per domain) or individual (one record per dataset).
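The input normalization can be pictured as follows. This is a sketch under the assumption that "extracting the domain" means taking the URL host, lowercasing it and dropping a leading `www.`; `normalize_domain` is a hypothetical name, not the Actor's code.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce 'example.com', 'www.example.com' or 'https://example.com/...' to 'example.com'."""
    value = raw.strip()
    # urlsplit only recognises a host after a scheme or a leading '//'
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname  # lowercased; port, path and query are dropped
    if not host:
        raise ValueError(f"cannot extract a domain from {raw!r}")
    return host[len("www."):] if host.startswith("www.") else host
```

Normalizing before deduplicating means the three spellings of one site collapse to a single lookup, for example `sorted({normalize_domain(d) for d in inputs})`.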
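Parallel fetching with isolated retry budgets can be sketched with `asyncio`: a semaphore caps how many fetches are in flight (10 by default), and each task counts its own attempts, so a task that keeps failing exhausts only its own budget. `fetch_once` stands in for whatever single request the Actor makes; the function names, the 3-attempt default and the absence of backoff are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

Fetch = Callable[[str], Awaitable[dict]]


async def fetch_with_budget(domain: str, fetch_once: Fetch, max_attempts: int) -> dict:
    """Retry one fetch against its own attempt counter (nothing is shared)."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return await fetch_once(domain)
        except Exception as exc:  # a real scraper would catch narrower errors
            last_error = exc
    raise RuntimeError(f"{domain}: gave up after {max_attempts} attempts") from last_error


async def scrape_all(domains: List[str], fetch_once: Fetch,
                     concurrency: int = 10, max_attempts: int = 3) -> Dict[str, dict]:
    """Fetch all domains with at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(domain: str):
        async with semaphore:
            try:
                return domain, await fetch_with_budget(domain, fetch_once, max_attempts)
            except RuntimeError as exc:
                # A domain that exhausts its budget is reported, not fatal to the run
                return domain, {"error": str(exc)}

    pairs = await asyncio.gather(*(worker(d) for d in domains))
    return dict(pairs)
```

For example, `asyncio.run(scrape_all(domains, my_fetch))`, where `my_fetch` is any coroutine function that fetches one domain.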
☁️ Remember the Apify platform
Running this Actor on Apify gives you everything that comes with the platform out of the box: managed Residential proxies with global exit IPs, scheduling (run hourly / daily / weekly), free storage in Apify Datasets with export to JSON / CSV / Excel / JSONL / XML / RSS, webhooks and integrations (Make, Zapier, n8n, Google Sheets, Slack, Airtable, Pipedream), and a REST API + Python / JavaScript SDKs to plug results into your own pipelines.
🗝️ What data can this Actor extract?
| Field group | Examples |
|---|---|
| Rankings | Global rank · country rank · category rank |
| Engagement | Total visits · monthly visits (3 months) · bounce rate · pages / visit · time on site |
| Traffic sources | Direct · search · referral · social · paid · mail (as shares) |
| AI traffic share | ChatGPT · Claude · Gemini · Perplexity · Copilot — current + 3-month history |
| Top keywords | Keyword · estimated value · search volume · CPC |
| Country breakdown | Top countries with share + monthly visit estimates per country |
| Similar sites | Up to 20 related sites with traffic, ranks, category and thumbnails |
| WHOIS (via RDAP) | Registrar · IANA ID · registration / expiration / last-changed dates · name servers · EPP status |
| Keyword density | Top-20 non-stopword tokens from the homepage with count and density |
| Assets | Desktop / mobile screenshots · favicon |
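A minimal sketch of the keyword density calculation, assuming density is a token's count divided by the total word count of the homepage text. That assumption matches the example item in the Output section (14 / 283 ≈ 0.0495 for `chatgpt`), but the tokenizer and the stopword list below are illustrative placeholders, not the Actor's own.

```python
import re
from collections import Counter

# Illustrative stopword list only; the Actor's real list is not documented.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "to", "with", "you"}


def keyword_density(text: str, top_n: int = 20) -> dict:
    """Top-N non-stopword tokens with their count and count / total_words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    total = len(words)  # assumption: stopwords count towards the total
    counts = Counter(w for w in words if w not in STOPWORDS)
    top = [
        {"keyword": w, "count": c, "density": round(c / total, 4)}
        for w, c in counts.most_common(top_n)
    ]
    return {"keyword_density_total_words": total, "keyword_density": top}
```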
🎯 How to use Similarweb Scraper
- Click Try for free on the Actor's Apify Store page.
- In the Domains field, paste the list of websites you want to analyse, one per line, in any format (`example.com`, `www.example.com` or `https://example.com`).
- Pick exactly one Dataset to fetch. Start with `base_data` if you just want the standard Similarweb dashboard data.
- Choose an Output shape: Aggregated if you'll open the results in a spreadsheet, Individual if you'll join the streams downstream.
- Click Start. Each domain costs about half a second of compute on the base endpoint; AITDK takes longer because it scrapes each homepage.
- When the run finishes, open Storage → Dataset and export to JSON, CSV, Excel, JSONL, XML or RSS, or pull the results through the API: `https://api.apify.com/v2/datasets/{dataset_id}/items`.
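To pull results programmatically, the items URL accepts query parameters from Apify's dataset items endpoint such as `format`, `clean`, `offset`, `limit` and `token`. The sketch below builds such URLs and maps the export formats listed above to `format` values; `dataset_items_url` is a hypothetical helper and `abc123` is a made-up dataset ID.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
# Export formats named in the steps above, mapped to the API's `format` values
EXPORT_FORMATS = {"JSON": "json", "CSV": "csv", "Excel": "xlsx",
                  "JSONL": "jsonl", "XML": "xml", "RSS": "rss"}


def dataset_items_url(dataset_id: str, export: str = "JSON", limit: Optional[int] = None,
                      offset: int = 0, token: Optional[str] = None) -> str:
    """Build a dataset items URL; `clean=true` skips empty items and hidden fields."""
    params = {"format": EXPORT_FORMATS[export], "clean": "true"}
    if offset:
        params["offset"] = offset
    if limit is not None:
        params["limit"] = limit  # page through large datasets with offset + limit
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"
```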
💰 Pricing
This Actor is billed by Apify compute units (CU) consumed and Apify Proxy traffic used — see Apify's platform pricing. Typical CU consumption per run:
| Datasets selected | Approx. CU per 100 domains |
|---|---|
| `base_data` only | ~0.01 CU |
| `base_data` + `similar_sites` | ~0.02 CU |
| `base_data` + `aitdk` | ~0.05 CU (homepage fetch) |
| All three datasets | ~0.06 CU |
Costs scale roughly linearly with the number of domains. The Apify free tier (5 USD / month) covers thousands of base-data lookups.
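Since costs scale roughly linearly, a back-of-the-envelope estimate is the per-100 figure from the table times the number of domains divided by 100. The figures below are the table's approximations, not a quote, and converting CU to USD depends on your Apify plan, so the sketch stops at CU; `estimate_cu` is a hypothetical helper.

```python
# Approximate CU per 100 domains, taken from the pricing table (rough figures).
CU_PER_100_DOMAINS = {
    frozenset({"base_data"}): 0.01,
    frozenset({"base_data", "similar_sites"}): 0.02,
    frozenset({"base_data", "aitdk"}): 0.05,
    frozenset({"base_data", "similar_sites", "aitdk"}): 0.06,
}


def estimate_cu(datasets, n_domains: int) -> float:
    """Linear estimate: (CU per 100 domains) * n_domains / 100."""
    key = frozenset(datasets)  # pass an iterable of dataset names, e.g. ["base_data"]
    if key not in CU_PER_100_DOMAINS:
        raise KeyError(f"no estimate in the table for {sorted(key)}")
    return CU_PER_100_DOMAINS[key] * n_domains / 100
```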
📥 Input
The form has three fields only — everything else has sensible defaults:
| Field | Type | Default |
|---|---|---|
| `domains` | array of strings | required |
| `datasets` | enum: `base_data`, `similar_sites`, `aitdk` | `base_data` |
| `output_mode` | enum: `aggregated`, `individual` | `aggregated` |
Example input
```json
{
  "domains": ["openai.com", "apple.com", "reddit.com"],
  "datasets": "base_data",
  "output_mode": "aggregated"
}
```
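Before calling the Actor from code, it can help to check a payload against the three fields in the table above. A sketch, assuming `datasets` is a single string as in the example input (an array is rejected here); `validate_input` is a hypothetical helper, and the Actor's own input schema remains the authority.

```python
ALLOWED_DATASETS = {"base_data", "similar_sites", "aitdk"}
ALLOWED_OUTPUT_MODES = {"aggregated", "individual"}


def validate_input(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    domains = payload.get("domains")
    if not isinstance(domains, list) or not domains:
        problems.append("domains: required, must be a non-empty array")
    elif not all(isinstance(d, str) and d.strip() for d in domains):
        problems.append("domains: every entry must be a non-empty string")
    # Optional fields fall back to the documented defaults
    datasets = payload.get("datasets", "base_data")
    if not isinstance(datasets, str) or datasets not in ALLOWED_DATASETS:
        problems.append(f"datasets: expected one of {sorted(ALLOWED_DATASETS)}")
    mode = payload.get("output_mode", "aggregated")
    if not isinstance(mode, str) or mode not in ALLOWED_OUTPUT_MODES:
        problems.append(f"output_mode: expected one of {sorted(ALLOWED_OUTPUT_MODES)}")
    return problems
```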
📤 Output
Each item conforms to the dataset schema and is rendered in the Apify Console as five tables: 📊 Overview · 🚦 Traffic sources · 💫 Engagement · 🤖 AI traffic share · 🆔 AITDK (WHOIS + keywords).
Example item (aggregated, abridged)
```json
{
  "domain": "openai.com",
  "types_included": ["base_data", "aitdk"],
  "rankGlobal": 207,
  "country": "US",
  "countryRank": 306,
  "category": "ai_chatbots_and_tools",
  "categoryRank": 6,
  "title": "OpenAI",
  "totalVisits": 195737812,
  "bounceRate": 0.5937,
  "pagesPerVisit": 2.59,
  "timeOnSite": 138.72,
  "socialTraffic": 0.0287,
  "searchTraffic": 0.2154,
  "directTraffic": 0.3840,
  "referralTraffic": 0.1038,
  "aiTrafficShareChatgpt": 0.8825,
  "aiTrafficShareClaude": 0.0029,
  "aiTrafficShareGemini": 0.0106,
  "topKeywords": [
    {"keyword": "chatgpt", "estimatedValue": 20907500.0, "searchVolume": 173339160.0, "cpc": 0.14},
    {"keyword": "chat gpt", "estimatedValue": 5688810.0, "searchVolume": 95011780.0, "cpc": 0.14}
  ],
  "aitdk_data": {
    "whois_registrar": "MarkMonitor Inc.",
    "whois_registration_date": "2007-01-19T19:28:24Z",
    "whois_expiration_date": "2029-01-19T19:28:24Z",
    "whois_name_servers": ["ns1-02.azure-dns.com", "ns2-02.azure-dns.net"],
    "keyword_density_total_words": 283,
    "keyword_density": [
      {"keyword": "chatgpt", "count": 14, "density": 0.0495},
      {"keyword": "research", "count": 13, "density": 0.0459}
    ]
  }
}
```
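The traffic-source and AI fields are shares expressed as fractions of 1 (`0.3840` means 38.4%). A small sketch for turning an item into readable percentages; it only reads keys that are actually present, treating any numeric key ending in `Traffic` as a channel and any key starting with `aiTrafficShare` as an AI share. That naming rule is an assumption drawn from the abridged example, and the function names are hypothetical.

```python
def traffic_mix(item: dict) -> list:
    """(channel, percent) pairs from numeric '*Traffic' keys, largest first."""
    suffix = "Traffic"
    shares = {
        key[: -len(suffix)]: value
        for key, value in item.items()
        if key.endswith(suffix) and isinstance(value, (int, float))
    }
    return sorted(
        ((channel, round(share * 100, 1)) for channel, share in shares.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


def ai_traffic_mix(item: dict) -> dict:
    """Map 'aiTrafficShareChatgpt' to 'chatgpt' and so on, keeping the raw fraction."""
    prefix = "aiTrafficShare"
    return {key[len(prefix):].lower(): value for key, value in item.items() if key.startswith(prefix)}
```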
🔗 Integrate Similarweb Scraper anywhere
Apify Actors run on a REST API — every run, dataset and webhook is addressable from your code:
```bash
# Trigger a run from anywhere
curl -X POST "https://api.apify.com/v2/acts/<USER>~similarweb-scraper/runs?token=<API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"domains": ["openai.com"], "datasets": "base_data"}'

# Read results from the run's default dataset
# (<DATASET_ID> is data.defaultDatasetId in the JSON returned by the call above)
curl "https://api.apify.com/v2/datasets/<DATASET_ID>/items?format=json"
```
Or use the official Python and JavaScript clients.
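A minimal sketch with the official Python client (`apify-client` on PyPI): `.actor(...).call()` starts a run and waits for it to finish, and `.dataset(...).list_items()` reads the run's default dataset. `<USER>` stays a placeholder as in the curl example, the input follows the example input above, and `scrape_domains` is a hypothetical helper that takes the client as a parameter so it can be tested without network access.

```python
ACTOR_ID = "<USER>/similarweb-scraper"  # placeholder, as in the curl example


def scrape_domains(client, domains, datasets="base_data", output_mode="aggregated",
                   actor_id=ACTOR_ID):
    """Start the Actor, wait for it to finish and return its dataset items.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(actor_id).call(
        run_input={"domains": list(domains), "datasets": datasets, "output_mode": output_mode}
    )
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    # Each run writes its results to its own default dataset
    return client.dataset(run["defaultDatasetId"]).list_items().items
```

Typical use: `from apify_client import ApifyClient`, then `scrape_domains(ApifyClient("<API_TOKEN>"), ["openai.com", "apple.com"])`.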
❓ FAQ
🔑 Do I need a Similarweb account or API key? No. This Actor talks to Similarweb's public SPA endpoint directly: no login, no API key, and no scraping of the captcha-gated in-depth pages.
🆕 Is the data fresh? Yes. It's the same JSON Similarweb's UI loads, and the `snapshotDate` field on every record tells you exactly which month it represents. Similarweb refreshes its traffic data monthly.
⏰ Can I run this on a schedule? Yes — open the Actor in Apify Console, go to Schedules and pick hourly, daily, weekly or a custom cron. Combine with webhooks to push fresh data into Google Sheets, Slack, Make, Zapier or your own backend automatically.
⚖️ Is web scraping legal? Public web pages are generally legal to scrape, but you must respect copyright, terms of service, and personal-data protection laws (GDPR in the EU and similar regulations elsewhere). This Actor only extracts publicly visible data — no personal data is collected. See Apify's legal blog for details.
💬 Support
- Found a bug or have a feature request? Open an issue in the Actor's Issues tab on Apify Console.
- Questions about Apify itself? Visit docs.apify.com or the Apify Discord community.
📝 Changelog
See CHANGELOG.md for the full release history. The Actor follows Semantic Versioning.