Analyze Domains
Pricing
from $1.00 / 1,000 results
Bulk-analyse a list of domains. Probes 4 URL variants (http/https, with/without www) and extracts title, meta description, Open Graph + Twitter Cards, emails, social handles, and tech stack. Optional toggles for SSL cert info, DNS records (A/AAAA/MX/TXT/NS), and WHOIS.
Developer
Crawler Bros
Bulk-probe a list of domains and extract a structured profile for each — preferred URL, page title, meta description, contact emails, social-media handles, and a tech-stack fingerprint. HTTP-only, no proxy or browser required.
What it does
You supply a list of domains (bare hostnames or full URLs); the actor:
- Probes 4 URL variants per domain in parallel: `http://<d>`, `https://<d>`, `http://www.<d>`, `https://www.<d>`.
- Picks the canonical "preferred" URL (first variant returning a 200 with HTML).
- Extracts the landing page's title, meta description, canonical URL, Open Graph + Twitter Card metadata.
- Optionally pulls contact emails, social-media handles (Twitter/X, Facebook, Instagram, LinkedIn, YouTube, TikTok, GitHub), and a tech-stack fingerprint (CMS, frameworks, analytics, CDN).
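The variant-probing and preferred-URL steps above can be sketched as follows. This is a minimal illustration, not the actor's source: the `Probe` shape and the names `url_variants` and `pick_preferred` are hypothetical, and the order in which variants are tried is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    """Result of fetching one URL variant (hypothetical shape)."""
    url: str
    status: Optional[int] = None      # None when the request failed outright
    content_type: str = ""
    final_url: Optional[str] = None   # set when redirects were followed


def url_variants(domain: str) -> list[str]:
    """Build the four scheme / www combinations for a bare hostname."""
    return [
        f"http://{domain}",
        f"https://{domain}",
        f"http://www.{domain}",
        f"https://www.{domain}",
    ]


def pick_preferred(probes: list[Probe]) -> Optional[str]:
    """Return the landing URL of the first probe that answered 200 with HTML."""
    for probe in probes:
        if probe.status == 200 and "html" in probe.content_type.lower():
            # Prefer the post-redirect URL when one was recorded.
            return probe.final_url or probe.url
    return None
```

When no variant qualifies, `pick_preferred` returns `None`, which would correspond to a record without a `preferredUrl` field.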
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `domains` | array of strings (required) | `["apify.com"]` | List of domains. Accepts bare hostnames (`example.com`) or full URLs (`https://example.com/page?q=1`). |
| `extractContacts` | boolean | `true` | Pull `mailto:` links + plain-text email matches. Filters out boilerplate (`user@example.com`, `noreply@…`, etc). |
| `extractSocialLinks` | boolean | `true` | Detect handles for Twitter/X, Facebook, Instagram, LinkedIn, YouTube, TikTok, GitHub. |
| `extractTechStack` | boolean | `true` | Lightweight Wappalyzer-style scan: CMS, frameworks, analytics, CDN. |
| `extractSsl` | boolean | `false` | Probe each domain on port 443 and emit SSL cert info (`issuer`, `subject`, `expires`, `daysUntilExpiry`). |
| `extractDns` | boolean | `false` | Resolve A / AAAA / MX / TXT / NS DNS records for each domain. |
| `extractWhois` | boolean | `false` | Query WHOIS over port 43 for registrar, creation/expiration/updated dates, name servers, status, abuse emails. |
| `concurrency` | integer | `5` (1–20) | How many domains to probe in parallel. Each probe issues up to 4 URL-variant requests in parallel. |
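To make the `extractContacts` row concrete, here is a rough sketch of email extraction with placeholder filtering. The regular expression, the placeholder lists, and the name `extract_emails` are assumptions for illustration; the actor's real filter list is not documented beyond the two examples in the table.

```python
import re

# Deliberately simple pattern; it is not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed placeholder rules, based only on the examples in the table.
PLACEHOLDER_LOCAL_PARTS = {"noreply", "no-reply"}
PLACEHOLDER_DOMAINS = {"example.com"}


def extract_emails(html: str) -> list[str]:
    """Return lowercased, de-duplicated emails found in raw HTML, in page order."""
    seen: list[str] = []
    for match in EMAIL_RE.findall(html):
        email = match.lower()
        local, _, domain = email.partition("@")
        if local in PLACEHOLDER_LOCAL_PARTS or domain in PLACEHOLDER_DOMAINS:
            continue
        if email not in seen:
            seen.append(email)
    return seen
```

Because the pattern runs over the raw HTML, it picks up both `mailto:` targets and addresses written in plain text in a single pass.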
Example input

    {
      "domains": ["apify.com", "github.com", "vercel.com"],
      "extractContacts": true,
      "extractSocialLinks": true,
      "extractTechStack": true,
      "concurrency": 5
    }

Output
One record per input domain. Empty fields are omitted (no nulls).

    {
      "domain": "apify.com",
      "isReachable": true,
      "preferredUrl": "https://apify.com/",
      "title": "Apify: Full-stack web scraping and data extraction platform",
      "description": "Cloud platform for web scraping…",
      "canonical": "https://apify.com/",
      "ogTags": {
        "title": "Apify: Full-stack web scraping and data extraction platform",
        "type": "website",
        "url": "https://apify.com/",
        "image": "https://apify.com/og-image.jpg"
      },
      "emails": ["hello@apify.com", "support@apify.com"],
      "emailCount": 2,
      "socialLinks": {
        "twitter": "apify",
        "linkedin": "apify",
        "github": "apify",
        "youtube": "apify"
      },
      "socialLinkCount": 4,
      "techStack": ["Cloudflare", "Google Analytics", "Google Tag Manager", "Next.js", "React"],
      "ssl": {
        "issuer": "Let's Encrypt",
        "subject": "apify.com",
        "expires": "2027-01-15",
        "daysUntilExpiry": 270
      },
      "sslIsValid": true,
      "dns": {
        "A": ["104.21.x.x", "172.67.x.x"],
        "AAAA": ["2606:4700:..."],
        "MX": [{"preference": 10, "exchange": "mail.apify.com"}],
        "TXT": ["v=spf1 include:_spf.google.com ~all"],
        "NS": ["alma.ns.cloudflare.com", "lewis.ns.cloudflare.com"]
      },
      "whois": {
        "registrar": "GoDaddy.com, LLC",
        "creation_date": "2014-08-26",
        "expiration_date": "2027-08-26",
        "updated_date": "2024-07-29",
        "name_servers": ["alma.ns.cloudflare.com", "lewis.ns.cloudflare.com"],
        "status": ["clientTransferProhibited"],
        "emails": ["abuse@godaddy.com"]
      },
      "variantProbes": [
        {"url": "https://apify.com", "status": 200, "contentType": "text/html", "bodyLen": 78213},
        {"url": "https://www.apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0},
        {"url": "http://apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0},
        {"url": "http://www.apify.com", "status": 301, "finalUrl": "https://apify.com/", "bodyLen": 0}
      ],
      "scrapedAt": "2026-04-26T14:23:11+00:00"
    }

Output fields
- `domain`: normalised hostname (scheme/path/port/www stripped).
- `isReachable`: `true` if at least one of the 4 URL variants returned a parseable response.
- `preferredUrl`: the canonical landing URL (first 200-with-HTML variant; final URL after redirects when applicable).
- `title` / `description`: `<title>` and `<meta name="description">` of the landing page.
- `canonical`: `<link rel="canonical">` href when present.
- `ogTags`: flat map of `og:*` properties (`title`, `type`, `url`, `image`, `description`, `site_name`, …).
- `emails` / `emailCount`: deduped, lowercased contact emails with placeholders (`user@example.com`, `noreply@…`) filtered out.
- `socialLinks` / `socialLinkCount`: `{platform: handle}` map.
- `techStack`: array of detected technologies (CMS, frameworks, analytics, CDN).
- `ssl`: (when `extractSsl: true`) `issuer`, `subject`, `expires` (YYYY-MM-DD), `daysUntilExpiry`. Carries an `error` key on lookup failure.
- `sslIsValid`: derived boolean, `true` when the SSL fetch succeeded AND the certificate has not expired.
- `dns`: (when `extractDns: true`) record types as keys: `A`, `AAAA`, `NS`, `TXT` are arrays of strings; `MX` is an array of `{preference, exchange}` objects.
- `whois`: (when `extractWhois: true`) parsed registrar, dates, name servers, status flags, abuse emails. Carries an `error` key on lookup failure.
- `variantProbes`: per-variant status report (`url`, `status`, optional `finalUrl`, `contentType`, `bodyLen`, `errorReason`).
- `scrapedAt`: ISO-8601 UTC timestamp.
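The normalisation described for `domain` (scheme, path, port, and `www` stripped) could look roughly like the sketch below. The function name `normalise_domain` is hypothetical, and the actor may handle edge cases such as trailing dots or internationalised names differently.

```python
from urllib.parse import urlsplit


def normalise_domain(raw: str) -> str:
    """Reduce a bare hostname or full URL to a lowercase hostname without www."""
    raw = raw.strip()
    # urlsplit only populates .hostname when a "//" authority marker is present.
    if "://" not in raw:
        raw = "//" + raw
    host = urlsplit(raw).hostname or ""   # .hostname is lowercased, port removed
    host = host.rstrip(".")
    if host.startswith("www."):
        host = host[len("www."):]
    return host
```

For example, `normalise_domain("https://www.Example.com:8443/page?q=1")` yields `"example.com"`, the same value a bare `example.com` input produces.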
Use cases
- Lead enrichment — turn a list of company domains into a profile with contact emails + social handles in one pass.
- Domain audits: detect which domains redirect to `https://www.`, which are unreachable, which still serve plain HTTP.
- Tech-stack survey: find every Shopify / WordPress / Next.js site in a list.
- Competitive research — pull title + description + tech-stack for every competitor in a market.
- Migration planning — confirm new domains have correct canonical + redirects in place after a launch.
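As an illustration of the tech-stack survey use case, the sketch below groups output records by detected technology. It assumes the records have already been downloaded as a list of dicts with the `domain` and `techStack` fields described under Output fields; `group_by_technology` is a hypothetical helper name.

```python
from collections import defaultdict


def group_by_technology(records: list[dict]) -> dict[str, list[str]]:
    """Map each detected technology to the domains that use it."""
    by_tech: dict[str, list[str]] = defaultdict(list)
    for record in records:
        # techStack may be absent entirely (empty fields are omitted).
        for tech in record.get("techStack", []):
            by_tech[tech].append(record["domain"])
    return dict(by_tech)
```

Looking up `group_by_technology(records).get("WordPress", [])` then gives every WordPress site in the input list.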
FAQ
Does it need a proxy?
No. The actor uses Chrome 131 TLS-fingerprint impersonation, and almost all marketing and corporate sites accept its requests straight from datacenter IPs. Aggressively bot-protected targets (e.g. Cloudflare with bot-fight on hard mode) may return a challenge page instead; those domains are simply marked `isReachable: false` rather than retried through a proxy.
What about cookies / login?
None needed; only the public landing page is fetched.
How are invalid domains handled?
Each invalid input emits a separate record of shape {type: "analyze_domains_error", reason: "invalid_domain", input: ..., scrapedAt: ...} so the dataset audit stays honest. The valid domains in the same run are still processed.
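Since error records share the dataset with normal results, downstream code will usually want to separate them first. A minimal sketch, assuming the dataset items are available as a list of dicts (`split_records` is a hypothetical name):

```python
def split_records(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate normal domain records from invalid-input error records."""
    results: list[dict] = []
    errors: list[dict] = []
    for item in items:
        if item.get("type") == "analyze_domains_error":
            errors.append(item)
        else:
            results.append(item)
    return results, errors
```

The `errors` list can then be logged or reported by `input` value while `results` flows on to the rest of the pipeline.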
Why are some fields missing on a record?
The actor follows an omit-empty contract: fields not populated on the source page (e.g. no og:image, no description) are simply absent — no nulls.
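One way to implement an omit-empty contract is sketched below. It is an assumption about the approach, not the actor's code; in particular, keeping `0` and `false` while dropping `None`, empty strings, and empty containers is a guess based on the example output, which retains `"bodyLen": 0`.

```python
def drop_empty(record: dict) -> dict:
    """Recursively remove None, empty strings, and empty lists/dicts."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, dict):
            value = drop_empty(value)
        # 0 and False compare unequal to "", [] and {}, so they are kept.
        if value is None or value == "" or value == [] or value == {}:
            continue
        cleaned[key] = value
    return cleaned
```

On the consuming side this means reading optional fields with `record.get("description")` rather than indexing directly.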
How many domains can it handle?
There is no hard cap; the concurrency setting controls parallelism (1–20). At concurrency: 5, roughly 100 domains complete in under 90 seconds.
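Bounded parallelism of this kind is commonly built on a semaphore. The sketch below shows the general pattern with `asyncio`; `run_bounded` and the `probe` callback are hypothetical, and the actor's actual scheduling may differ.

```python
import asyncio
from typing import Awaitable, Callable


async def run_bounded(
    domains: list[str],
    probe: Callable[[str], Awaitable[object]],
    concurrency: int = 5,
) -> list[object]:
    """Run probe() for every domain with at most `concurrency` in flight."""
    concurrency = max(1, min(20, concurrency))   # clamp to the documented 1-20 range
    gate = asyncio.Semaphore(concurrency)

    async def guarded(domain: str) -> object:
        async with gate:
            return await probe(domain)

    # gather() preserves input order regardless of completion order.
    return await asyncio.gather(*(guarded(d) for d in domains))
```

With a pattern like this, total run time scales roughly with the number of domains divided by `concurrency`, as long as the slowest targets do not dominate.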
Can I extract internal pages too?
This actor focuses on the landing page only. For sitemap-walking and per-page metadata at scale, see SiteResearcher.
What counts as a "tech stack" match?
Body-content + response-header signatures (e.g. wp-content/, _next/static/, cf-ray header). It's a fast best-effort fingerprint, not a Wappalyzer-class deep scan.
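A signature scan of this kind can be sketched in a few lines. The three signatures below are the ones named in the answer above; the table layout and the name `detect_tech` are assumptions, and a real scan would carry many more entries.

```python
# Substring signatures searched for in the HTML body.
BODY_SIGNATURES = {
    "WordPress": "wp-content/",
    "Next.js": "_next/static/",
}

# Response-header names whose mere presence indicates a technology.
HEADER_SIGNATURES = {
    "Cloudflare": "cf-ray",
}


def detect_tech(body: str, headers: dict[str, str]) -> list[str]:
    """Return a sorted list of technologies whose signatures match."""
    header_names = {name.lower() for name in headers}   # header names are case-insensitive
    found = {tech for tech, needle in BODY_SIGNATURES.items() if needle in body}
    found |= {tech for tech, name in HEADER_SIGNATURES.items() if name in header_names}
    return sorted(found)
```

Plain substring checks keep the scan fast, at the cost of occasional false positives when a signature string appears in unrelated page content.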