Analyze Domains

Pricing

from $1.00 / 1,000 results

Bulk-analyse a list of domains. Probes 4 URL variants (http/https, with/without www) and extracts title, meta description, Open Graph + Twitter Cards, emails, social handles, and tech stack. Optional toggles for SSL cert info, DNS records (A/AAAA/MX/TXT/NS), and WHOIS.

Developer

Crawler Bros

Maintained by Community

Bulk-probe a list of domains and extract a structured profile for each — preferred URL, page title, meta description, contact emails, social-media handles, and a tech-stack fingerprint. HTTP-only, no proxy or browser required.

What it does

You supply a list of domains (bare hostnames or full URLs); the actor:

  1. Probes 4 URL variants per domain in parallel — http://<d>, https://<d>, http://www.<d>, https://www.<d>.
  2. Picks the canonical "preferred" URL (first variant returning a 200 with HTML).
  3. Extracts the landing page's title, meta description, canonical URL, Open Graph + Twitter Card metadata.
  4. Optionally pulls contact emails, social-media handles (Twitter/X, Facebook, Instagram, LinkedIn, YouTube, TikTok, GitHub), and a tech-stack fingerprint (CMS, frameworks, analytics, CDN).
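
Steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration, not the actor's code: the `Probe` class, the function names, and the probe order are assumptions, and the network fetch itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


def url_variants(domain: str) -> list[str]:
    """Return the four scheme/www combinations for a bare hostname."""
    return [
        f"http://{domain}",
        f"https://{domain}",
        f"http://www.{domain}",
        f"https://www.{domain}",
    ]


@dataclass
class Probe:
    # Hypothetical container for one variant's result.
    url: str
    status: Optional[int] = None
    content_type: str = ""
    final_url: Optional[str] = None


def pick_preferred(probes: list[Probe]) -> Optional[str]:
    """Return the URL of the first probe that answered 200 with HTML."""
    for probe in probes:
        is_html = probe.content_type.lower().startswith("text/html")
        if probe.status == 200 and is_html:
            # Prefer the post-redirect URL when one was recorded.
            return probe.final_url or probe.url
    return None
```

Given a list of finished probes, `pick_preferred` returns `None` when no variant served HTML with a 200, which would correspond to a record without a `preferredUrl`.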

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| domains | array of strings (required) | ["apify.com"] | List of domains. Accepts bare hostnames (example.com) or full URLs (https://example.com/page?q=1). |
| extractContacts | boolean | true | Pull mailto: links + plain-text email matches. Filters out boilerplate (user@example.com, noreply@…, etc.). |
| extractSocialLinks | boolean | true | Detect handles for Twitter/X, Facebook, Instagram, LinkedIn, YouTube, TikTok, GitHub. |
| extractTechStack | boolean | true | Lightweight Wappalyzer-style scan: CMS, frameworks, analytics, CDN. |
| extractSsl | boolean | false | Probe each domain on port 443 and emit SSL cert info (issuer, subject, expiry, daysUntilExpiry). |
| extractDns | boolean | false | Resolve A / AAAA / MX / TXT / NS DNS records for each domain. |
| extractWhois | boolean | false | Query WHOIS over port 43 for registrar, creation/expiration/updated dates, name servers, status, abuse emails. |
| concurrency | integer | 5 (1–20) | How many domains to probe in parallel. Each probe issues up to 4 URL-variant requests in parallel. |
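
To illustrate what extractContacts describes, here is a small sketch of email extraction with placeholder filtering. The regular expression and the placeholder lists are assumptions chosen for the example; the actor's real patterns and filter list are not documented here.

```python
import re

# A deliberately simple pattern; it also matches the address inside "mailto:...".
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Illustrative placeholder rules (assumed, not the actor's actual list).
PLACEHOLDER_LOCAL_PARTS = {"noreply", "no-reply", "donotreply"}
PLACEHOLDER_DOMAINS = {"example.com", "example.org"}


def extract_emails(html: str) -> list[str]:
    """Return lowercased, de-duplicated emails in order of first appearance."""
    found: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local, _, domain = email.partition("@")
        if local in PLACEHOLDER_LOCAL_PARTS or domain in PLACEHOLDER_DOMAINS:
            continue  # boilerplate address, skip it
        if email not in found:
            found.append(email)
    return found
```

A single pattern covers both mailto: links and plain-text matches because the address inside a mailto: link is itself plain text in the HTML source.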

Example input

{
    "domains": ["apify.com", "github.com", "vercel.com"],
    "extractContacts": true,
    "extractSocialLinks": true,
    "extractTechStack": true,
    "concurrency": 5
}
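
When building the input programmatically, a small helper can apply the documented defaults and check the concurrency range before a run is started. The helper below is a hypothetical sketch: only the JSON field names come from the input table, while the function name and keyword names are made up for illustration.

```python
from typing import Iterable


def build_run_input(
    domains: Iterable[str],
    *,
    extract_contacts: bool = True,
    extract_social_links: bool = True,
    extract_tech_stack: bool = True,
    extract_ssl: bool = False,
    extract_dns: bool = False,
    extract_whois: bool = False,
    concurrency: int = 5,
) -> dict:
    """Assemble the actor input object using the defaults from the table."""
    domain_list = [d.strip() for d in domains if d and d.strip()]
    if not domain_list:
        raise ValueError("at least one domain is required")
    if not 1 <= concurrency <= 20:
        raise ValueError("concurrency must be between 1 and 20")
    return {
        "domains": domain_list,
        "extractContacts": extract_contacts,
        "extractSocialLinks": extract_social_links,
        "extractTechStack": extract_tech_stack,
        "extractSsl": extract_ssl,
        "extractDns": extract_dns,
        "extractWhois": extract_whois,
        "concurrency": concurrency,
    }
```

The returned dict can be serialised with `json.dumps` and pasted into the actor's input editor, or passed as the run input through whichever Apify client you use.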

Output

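One record per input domain. Empty fields are omitted (no nulls). The sample record below was produced with extractSsl, extractDns, and extractWhois enabled, so it shows every optional block.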

{
    "domain": "apify.com",
    "isReachable": true,
    "preferredUrl": "https://apify.com/",
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Cloud platform for web scraping…",
    "canonical": "https://apify.com/",
    "ogTags": {
        "title": "Apify: Full-stack web scraping and data extraction platform",
        "type": "website",
        "url": "https://apify.com/",
        "image": "https://apify.com/og-image.jpg"
    },
    "emails": ["hello@apify.com", "support@apify.com"],
    "emailCount": 2,
    "socialLinks": {
        "twitter": "apify",
        "linkedin": "apify",
        "github": "apify",
        "youtube": "apify"
    },
    "socialLinkCount": 4,
    "techStack": ["Cloudflare", "Google Analytics", "Google Tag Manager", "Next.js", "React"],
    "ssl": {
        "issuer": "Let's Encrypt",
        "subject": "apify.com",
        "expires": "2027-01-15",
        "daysUntilExpiry": 264
    },
    "sslIsValid": true,
    "dns": {
        "A": ["104.21.x.x", "172.67.x.x"],
        "AAAA": ["2606:4700:..."],
        "MX": [{"preference": 10, "exchange": "mail.apify.com"}],
        "TXT": ["v=spf1 include:_spf.google.com ~all"],
        "NS": ["alma.ns.cloudflare.com", "lewis.ns.cloudflare.com"]
    },
    "whois": {
        "registrar": "GoDaddy.com, LLC",
        "creation_date": "2014-08-26",
        "expiration_date": "2027-08-26",
        "updated_date": "2024-07-29",
        "name_servers": ["alma.ns.cloudflare.com", "lewis.ns.cloudflare.com"],
        "status": ["clientTransferProhibited"],
        "emails": ["abuse@godaddy.com"]
    },
    "variantProbes": [
        {"url": "https://apify.com", "status": 200, "contentType": "text/html", "bodyLen": 78213},
        {"url": "https://www.apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0},
        {"url": "http://apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0},
        {"url": "http://www.apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0}
    ],
    "scrapedAt": "2026-04-26T14:23:11+00:00"
}

Output fields

  • domain — normalised hostname (scheme/path/port/www stripped).
  • isReachable — true if at least one of the 4 URL variants returned a parseable response.
  • preferredUrl — the canonical landing URL (first 200-with-HTML variant; final URL after redirects when applicable).
  • title / description — <title> and <meta name="description"> of the landing page.
  • canonical — <link rel="canonical"> href when present.
  • ogTags — flat map of og:* properties (title, type, url, image, description, site_name, …).
  • emails / emailCount — deduped, lowercased contact emails; placeholders (user@example.com, noreply@…) are filtered out.
  • socialLinks / socialLinkCount — {platform: handle} map.
  • techStack — array of detected technologies (CMS, frameworks, analytics, CDN).
  • ssl — (when extractSsl: true) issuer, subject, expires (YYYY-MM-DD), daysUntilExpiry. error key on lookup failure.
  • sslIsValid — derived boolean: true when SSL fetch succeeded AND not expired.
  • dns — (when extractDns: true) record types as keys: A, AAAA, NS, TXT are arrays of strings; MX is an array of {preference, exchange} objects.
  • whois — (when extractWhois: true) parsed registrar, dates, name servers, status flags, abuse emails. error key on lookup failure.
  • variantProbes — per-variant status report (url, status, optional finalUrl, contentType, bodyLen, errorReason).
  • scrapedAt — ISO-8601 UTC timestamp.
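
The normalisation behind the domain field (scheme, path, port, and www stripped) could look something like the sketch below. It relies on the standard library's `urllib.parse.urlsplit`; the function name is hypothetical and the actor may handle edge cases differently.

```python
from urllib.parse import urlsplit


def normalise_domain(raw: str) -> str:
    """Reduce a bare hostname or full URL to a lowercase host without www."""
    text = raw.strip().lower()
    if "://" not in text:
        # Prefix "//" so urlsplit treats the input as a network location.
        text = "//" + text
    host = urlsplit(text).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

An empty return value signals that no hostname could be parsed, which a caller might treat as an invalid input.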

Use cases

  • Lead enrichment — turn a list of company domains into a profile with contact emails + social handles in one pass.
  • Domain audits — detect which domains redirect to https://www., which are unreachable, which still serve plain HTTP.
  • Tech-stack survey — find every Shopify / WordPress / Next.js site in a list.
  • Competitive research — pull title + description + tech-stack for every competitor in a market.
  • Migration planning — confirm new domains have correct canonical + redirects in place after a launch.

FAQ

Does it need a proxy? No. The actor uses Chrome 131 TLS-fingerprint impersonation; almost all marketing / corporate sites accept the request straight from datacenter IPs. Aggressively bot-protected targets (e.g. Cloudflare with bot-fight on hard mode) may return a challenge — those domains will simply be marked isReachable: false rather than proxied.

What about cookies / login? None needed — only the public landing page is fetched.

How are invalid domains handled? Each invalid input emits a separate record of shape {type: "analyze_domains_error", reason: "invalid_domain", input: ..., scrapedAt: ...}, so every input is accounted for in the dataset. The valid domains in the same run are still processed.
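
Because error records share the dataset with normal results, downstream code usually needs to separate the two. A minimal sketch, assuming the dataset items have already been loaded as a list of dicts (the function name is illustrative):

```python
def split_records(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate normal domain records from analyze_domains_error records."""
    results: list[dict] = []
    errors: list[dict] = []
    for item in items:
        if item.get("type") == "analyze_domains_error":
            errors.append(item)
        else:
            results.append(item)
    return results, errors
```

The `errors` list can then be reported back to whoever supplied the inputs, using each record's `input` and `reason` fields.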

Why are some fields missing on a record? The actor follows an omit-empty contract: fields not populated on the source page (e.g. no og:image, no description) are simply absent — no nulls.
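
An omit-empty contract of this kind can be sketched as a recursive filter. This is an illustration under assumptions, not the actor's implementation: here `None`, empty strings, empty lists, and empty dicts are dropped, while `False` and `0` are kept.

```python
def omit_empty(record: dict) -> dict:
    """Drop keys whose value is None or an empty string, list, or dict."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, dict):
            value = omit_empty(value)  # clean nested maps such as ogTags first
        if value is None or value == "" or value == [] or value == {}:
            continue
        cleaned[key] = value
    return cleaned
```

On the consuming side, this means fields should be read defensively, for example `record.get("description", "")` rather than `record["description"]`.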

How many domains can it handle? There is no hard cap; the concurrency setting controls parallelism (1–20). At concurrency: 5, roughly 100 domains complete in under 90 seconds.

Can I extract internal pages too? This actor focuses on the landing page only. For sitemap-walking and per-page metadata at scale, see SiteResearcher.

What counts as a "tech stack" match? Body-content + response-header signatures (e.g. wp-content/, _next/static/, cf-ray header). It's a fast best-effort fingerprint, not a Wappalyzer-class deep scan.
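
Signature matching of this sort can be sketched with two small lookup tables. Only the three signatures named above are included; the table layout and function name are assumptions, and the actor's real signature set is certainly larger.

```python
# Substring in the response body -> technology name.
BODY_SIGNATURES = {
    "wp-content/": "WordPress",
    "_next/static/": "Next.js",
}

# Response header name (lowercase) -> technology name.
HEADER_SIGNATURES = {
    "cf-ray": "Cloudflare",
}


def detect_tech(body: str, headers: dict[str, str]) -> list[str]:
    """Return a sorted list of technologies whose signature is present."""
    found = set()
    for needle, name in BODY_SIGNATURES.items():
        if needle in body:
            found.add(name)
    header_names = {key.lower() for key in headers}  # header names are case-insensitive
    for header, name in HEADER_SIGNATURES.items():
        if header in header_names:
            found.add(name)
    return sorted(found)
```

Substring and header-presence checks keep the scan fast, at the cost of the occasional false positive or missed technology, which fits the best-effort framing above.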