Similarweb Scraper - Traffic, AI Traffic & WHOIS

Pricing

from $0.80 / 1,000 domains

🔍 Spy on any website in seconds: traffic, rankings, top keywords, AI traffic share (ChatGPT/Claude/Gemini), competitors, similar sites & WHOIS, all from Similarweb. No login or API key. Bulk parallel scrape, captcha-resilient. Export to JSON/CSV/Excel. SEO, lead gen, research.

Rating

5.0 (1)

Developer

VortexData (maintained by community)

Actor stats

3 bookmarks · 63 total users · 50 monthly active users · last modified a day ago

🔍 Similarweb Scraper

📊 Website intelligence for any domain in seconds. Start with one website, choose traffic/rankings, similar sites, or WHOIS + homepage keywords, then export the results as JSON, CSV, Excel or any other format Apify supports.

💎 What is Similarweb Scraper?

Similarweb Scraper is a fast, captcha-resilient web scraper that pulls public website intelligence without requiring a Similarweb account, login, or API key. Behind the scenes it uses Chrome TLS / JA3 fingerprints via curl_cffi and routes every request through a fresh Apify Residential proxy session. When Similarweb refuses a protected source, the Actor stops quickly and does not save guessed or fallback data as base_data.
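
The request pattern described above can be sketched roughly as follows. This is not the Actor's source code: `fresh_proxy_url` and `fetch` are hypothetical helpers, the proxy username layout (`groups-RESIDENTIAL,session-<id>` on `proxy.apify.com:8000`) follows Apify Proxy's documented connection format, and the target URL is left to the caller because the README does not name the endpoint.

```python
import uuid
from urllib.parse import quote


def fresh_proxy_url(password: str, group: str = "RESIDENTIAL") -> str:
    """Build an Apify Proxy URL with a new random session ID (hypothetical helper)."""
    session_id = uuid.uuid4().hex  # 32 hex characters, different on every call
    username = f"groups-{group},session-{session_id}"
    return f"http://{username}:{quote(password, safe='')}@proxy.apify.com:8000"


def fetch(url: str, proxy_password: str) -> str:
    """Fetch one URL with a Chrome-like TLS fingerprint through a fresh proxy session."""
    # Imported lazily so fresh_proxy_url() works without curl_cffi installed.
    from curl_cffi import requests

    proxy = fresh_proxy_url(proxy_password)
    response = requests.get(
        url,
        impersonate="chrome",  # older curl_cffi releases may need e.g. "chrome110"
        proxies={"http": proxy, "https": proxy},
        timeout=30,
    )
    response.raise_for_status()
    return response.text
```

Because a new session ID is generated for every call, a blocked exit IP affects only that one attempt, which is the behavior the paragraph above describes.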

You can start with a single domain, then scale to a whole list when you are ready. Pick one dataset mode per run and the Actor returns clean records ready to drop into a spreadsheet, BI tool, warehouse, or AI agent.

🚀 What can Similarweb Scraper do?

  • 🗂️ Choose one of three dataset modes for each run:
    • 📊 Base data: global / country / category ranks, monthly visits, bounce rate, pages per visit, time on site, traffic-source split (direct, search, referral, social, paid, mail), top organic keywords with volume and CPC, AI traffic share per LLM (ChatGPT / Claude / Gemini / Perplexity / Copilot).
    • 🪞 Similar sites: competitors and alternatives with their traffic, category and ranking.
    • 🆔 AITDK: WHOIS via RDAP (registrar, registration / expiration dates, name servers, EPP status, DNSSEC) plus on-page keyword density analysis of the domain's homepage.
  • ⚡ Start small or run in bulk: one domain is enough for a test run; larger batches process up to 10 domains concurrently by default.
  • 🛡️ Captcha-resilient: uses available public sources directly, without captcha solving. If a protected Similarweb source returns a challenge, the Actor skips that source instead of saving unreliable data.
  • 🔄 Per-request IP rotation: every HTTP call gets a fresh Apify Residential proxy session, so a blocked address costs at most one attempt.
  • 🌐 Three input formats: accepts example.com, www.example.com, or https://example.com. The domain is extracted automatically.

☁️ Remember the Apify platform

Running this Actor on Apify gives you everything that comes with the platform out of the box: managed Residential proxies with global exit IPs, scheduling (run hourly / daily / weekly), free storage in Apify Datasets with export to JSON / CSV / Excel / JSONL / XML / RSS, webhooks and integrations (Make, Zapier, n8n, Google Sheets, Slack, Airtable, Pipedream), and a REST API + Python / JavaScript SDKs to plug results into your own pipelines.

🏗️ What data can this Actor extract?

| Field group | Examples |
| --- | --- |
| Rankings | Global rank · country rank · category rank |
| Engagement | Total visits · monthly visits (3 months) · bounce rate · pages / visit · time on site |
| Traffic sources | Direct · search · referral · social · paid / affiliate · display ads · GenAI · mail (as shares) |
| AI traffic share | ChatGPT · Claude · Gemini · Perplexity · Copilot (current + 3-month history) |
| Top keywords | Keyword · estimated value · search volume · CPC |
| Country breakdown | Top countries with share + monthly visit estimates per country |
| Similar sites | Up to 20 related sites with traffic, ranks, category and thumbnails |
| WHOIS (via RDAP) | Registrar · IANA ID · registration / expiration / last-changed dates · name servers · EPP status |
| Keyword density | Top-20 non-stopword tokens from the homepage with count and density |
| Assets | Desktop / mobile screenshots · favicon |

🎯 How to use Similarweb Scraper

  1. Click Try for free on the Actor's Apify Store page.
  2. In the Domains field, enter one website to test, or paste a larger list later β€” one per line, any format (example.com, www.example.com, or https://example.com).
  3. Pick exactly one Dataset to fetch. Start with base_data for the main traffic, rank, engagement, keyword and AI-share overview, or choose another mode when needed.
  4. Click Start. A one-domain test run is fine; there is no 10-domain minimum. aitdk takes longer than the other modes because it also fetches RDAP and the homepage.
  5. When the run finishes, open Storage β†’ Dataset and export to JSON, CSV, Excel, JSONL, XML or RSS. Or pull the results through the API: https://api.apify.com/v2/datasets/{dataset_id}/items.
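
As a sketch of step 5, the dataset endpoint can be called from Python with only the standard library. `dataset_items_url` and `fetch_items` are hypothetical helper names; `format` and `token` are query parameters of the Apify dataset items endpoint, and the token is only needed if your dataset is not readable by ID alone.

```python
import json
from typing import List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      token: Optional[str] = None) -> str:
    """Build the items URL for a dataset in the requested export format."""
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: Optional[str] = None) -> List[dict]:
    """Download all items of a dataset as a list of dicts (performs a network call)."""
    with urlopen(dataset_items_url(dataset_id, "json", token), timeout=30) as resp:
        return json.load(resp)
```

Passing `fmt="csv"` to `dataset_items_url` yields a link that can be opened directly in a browser or handed to a spreadsheet import.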

📥 Input

The form has only two visible fields; everything else has sensible defaults:

| Field | Type | Default |
| --- | --- | --- |
| domains | array | 20 common domains are prefilled for a quick bulk test |
| datasetMode | enum | base_data (one of base_data, similar_sites, aitdk) |

Example input

{
  "domains": ["openai.com", "google.com", "booking.com"],
  "datasetMode": "base_data"
}

API callers may also pass domains through domainsText (newline, comma, or semicolon separated), legacy urls, or Apify-style startUrls request objects. At least one domain source must be provided.
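
A minimal sketch of how a `domainsText` value could be split on the separators listed above. `split_domains_text` is a hypothetical helper; dropping empty entries and duplicates is an assumption, not documented Actor behavior.

```python
import re
from typing import List


def split_domains_text(text: str) -> List[str]:
    """Split a newline-, comma- or semicolon-separated string into domains."""
    domains: List[str] = []
    for part in re.split(r"[\r\n,;]+", text):
        item = part.strip()
        # Assumption: blank entries and repeats are ignored, order is preserved.
        if item and item not in domains:
            domains.append(item)
    return domains
```

For example, `split_domains_text("openai.com, google.com;booking.com\nopenai.com")` returns `["openai.com", "google.com", "booking.com"]`.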

📤 Output

Each domain produces one dataset item. Each item conforms to the dataset schema and is rendered in the Apify Console views that match the selected dataset mode: 📊 Overview · 🚦 Traffic sources · 💫 Engagement · 🤖 AI traffic share · 🆔 AITDK (WHOIS and keywords).

Example item (abridged)

{
  "domain": "openai.com",
  "rankGlobal": 207,
  "country": "US",
  "countryRank": 306,
  "category": "ai_chatbots_and_tools",
  "categoryRank": 6,
  "title": "OpenAI",
  "totalVisits": 195737812,
  "bounceRate": 0.5937,
  "pagesPerVisit": 2.59,
  "timeOnSite": 138.72,
  "socialTraffic": 0.0287,
  "searchTraffic": 0.2154,
  "directTraffic": 0.3840,
  "referralTraffic": 0.1038,
  "displayAdsTraffic": 0.0013,
  "genAiTraffic": 0.2358,
  "aiTrafficShareChatgpt": 0.8825,
  "aiTrafficShareClaude": 0.0029,
  "aiTrafficShareGemini": 0.0106,
  "topKeywords": [
    {"keyword": "chatgpt", "estimatedValue": 20907500.0, "searchVolume": 173339160.0, "cpc": 0.14},
    {"keyword": "chat gpt", "estimatedValue": 5688810.0, "searchVolume": 95011780.0, "cpc": 0.14}
  ]
}
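
As one way to consume such an item, the sketch below ranks the traffic-source shares and formats them as percentages. The field names come from the abridged example above; `rank_traffic_sources` and `format_shares` are hypothetical helpers, and fields missing from a record are simply skipped.

```python
from typing import List, Tuple

# Traffic-share fields that appear in the abridged example item.
TRAFFIC_FIELDS = [
    "directTraffic", "searchTraffic", "referralTraffic",
    "socialTraffic", "displayAdsTraffic", "genAiTraffic",
]


def rank_traffic_sources(item: dict) -> List[Tuple[str, float]]:
    """Return (field, share) pairs sorted from largest to smallest share."""
    shares = [
        (name, item[name])
        for name in TRAFFIC_FIELDS
        if isinstance(item.get(name), (int, float))
    ]
    return sorted(shares, key=lambda pair: pair[1], reverse=True)


def format_shares(ranked: List[Tuple[str, float]]) -> List[str]:
    """Render each share as a percentage with one decimal place."""
    return [f"{name}: {share:.1%}" for name, share in ranked]
```

Applied to the example item, the first ranked entry is `directTraffic` (0.3840), followed by `genAiTraffic` (0.2358).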

🔗 Integrate Similarweb Scraper anywhere

Apify Actors are driven through a REST API, so every run, dataset and webhook is addressable from your code:

# Trigger a run from anywhere
curl -X POST "https://api.apify.com/v2/acts/<USER>~similarweb-scraper/runs?token=<API_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"domains": ["openai.com"], "datasetMode": "base_data"}'

# Read results from the run's default dataset
curl "https://api.apify.com/v2/datasets/<DATASET_ID>/items?format=json"

Or use the official Python and JavaScript clients.
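
A minimal sketch with the Python client (`pip install apify-client`), assuming the input fields documented above. `build_run_input` and `run_and_collect` are hypothetical helpers, and the Actor ID keeps the `<USER>` placeholder from the curl example because the README does not spell it out.

```python
from typing import Iterable, List

ALLOWED_MODES = ("base_data", "similar_sites", "aitdk")


def build_run_input(domains: Iterable[str], dataset_mode: str = "base_data") -> dict:
    """Validate and assemble the Actor input shown in the Input section."""
    if dataset_mode not in ALLOWED_MODES:
        raise ValueError(f"datasetMode must be one of {ALLOWED_MODES}")
    cleaned = [d.strip() for d in domains if d.strip()]
    if not cleaned:
        raise ValueError("at least one domain is required")
    return {"domains": cleaned, "datasetMode": dataset_mode}


def run_and_collect(token: str, actor_id: str, run_input: dict) -> List[dict]:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input() works without apify-client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (placeholders must be replaced with real values):
# items = run_and_collect("<API_TOKEN>", "<USER>~similarweb-scraper",
#                         build_run_input(["openai.com"]))
```

Validating the mode locally avoids spending a run on an input the Actor would reject.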

❓ FAQ

Can I test it with only one domain? Yes. The Actor needs at least one domain, not ten. The default UI example uses openai.com with base_data, so a new user can click Start immediately.

πŸ”‘ Do I need a Similarweb account or API key? No. This Actor talks to Similarweb's public SPA endpoint directly. No login, no API key, and no scraping the captcha-gated in-depth pages.

πŸ†• Is the data fresh? Yes β€” it's the same JSON Similarweb's UI loads. The snapshotDate field on every record tells you exactly which month it represents. Similarweb refreshes its traffic data monthly.

⏰ Can I run this on a schedule? Yes β€” open the Actor in Apify Console, go to Schedules and pick hourly, daily, weekly or a custom cron. Combine with webhooks to push fresh data into Google Sheets, Slack, Make, Zapier or your own backend automatically.

βš–οΈ Is web scraping legal? Public web pages are generally legal to scrape, but you must respect copyright, terms of service, and personal-data protection laws (GDPR in the EU and similar regulations elsewhere). This Actor only extracts publicly visible data β€” no personal data is collected. See Apify's legal blog for details.

πŸ’¬ Support

πŸ“ Changelog

See CHANGELOG.md for the full release history. The Actor follows Semantic Versioning.