Website Intelligence Extractor

Pricing

from $10.00 / 1,000 results

A powerful Apify actor that crawls websites to extract key intelligence, including emails, phone numbers, social media profiles, technology stack, SEO metadata, and structured data (JSON-LD). Ideal for lead generation, competitive analysis, marketing research, and SEO audits.

Rating: 0.0 (0)

Developer: Jamshaid Arif

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago

🔍 Website Intelligence Extractor

An Apify actor that crawls a website and extracts actionable intelligence: emails, phone numbers, social media profiles, technology stack, SEO metadata, and structured data (JSON-LD).

Perfect for lead generation, competitive analysis, marketing research, and SEO auditing.


✨ What It Extracts

| Category | Details |
|---|---|
| 📧 Emails | All email addresses found on pages + mailto: links, with junk filtering |
| 📞 Phones | Phone numbers from text + tel: links, international format support |
| 🔗 Social Media | Facebook, Twitter/X, LinkedIn, Instagram, YouTube, GitHub, TikTok, Reddit, Threads, Bluesky, and 15+ more platforms |
| ⚙️ Tech Stack | CMS (WordPress, Shopify, Webflow…), Frameworks (React, Next.js, Vue…), Analytics (GA, Mixpanel, PostHog…), Marketing tools (HubSpot, Intercom…), CDN, Hosting, Payments; 60+ technologies in total |
| 📊 SEO Data | Title, meta description, canonical URL, OG tags, Twitter cards, heading hierarchy, word count, image alt audit, internal/external links, and a computed SEO Score (0-100) |
| 📋 Structured Data | JSON-LD schemas (Organization, Product, Article, FAQ, etc.) |
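
The actor's own matching rules are not documented here. As a rough illustration of what "emails found on pages + mailto: links, with junk filtering" can look like, here is a minimal sketch. The function name `extract_emails`, both regular expressions, and the junk rule (dropping matches that end in an image extension, such as `logo@2x.png`) are assumptions made for this example, not the actor's implementation.

```python
import re
from urllib.parse import unquote

# Assumed patterns for illustration; the actor's real rules may differ.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)
MAILTO_RE = re.compile(r'''href=["']mailto:([^"'?]+)''', re.IGNORECASE)

# Assumed junk rule: asset names like "logo@2x.png" look like emails but are not.
JUNK_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased email addresses in first-seen order."""
    candidates = EMAIL_RE.findall(html)
    # mailto: targets may be percent-encoded, so decode them separately.
    candidates += [unquote(target) for target in MAILTO_RE.findall(html)]

    seen: set[str] = set()
    emails: list[str] = []
    for raw in candidates:
        email = raw.strip().lower()
        if not EMAIL_RE.fullmatch(email):
            continue  # decoded mailto target was not a single address
        if email.endswith(JUNK_SUFFIXES):
            continue  # filtered as junk
        if email not in seen:
            seen.add(email)
            emails.append(email)
    return emails
```

Lowercasing before deduplication means `Hello@Example.com` in a mailto: link and `hello@example.com` in body text count as one address.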

🚀 Quick Start

Input Example

{
  "startUrls": [
    { "url": "https://example.com" }
  ],
  "maxPages": 30,
  "maxDepth": 3,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocials": true,
  "detectTechStack": true,
  "extractSEO": true,
  "extractStructuredData": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | required | URLs to crawl |
| maxPages | integer | 20 | Max pages per run (1–500) |
| maxDepth | integer | 3 | Link depth to follow (0–10) |
| extractEmails | boolean | true | Find email addresses |
| extractPhones | boolean | true | Find phone numbers |
| extractSocials | boolean | true | Find social media links |
| detectTechStack | boolean | true | Identify technologies |
| extractSEO | boolean | true | Collect SEO metadata |
| extractStructuredData | boolean | true | Parse JSON-LD |
| proxyConfiguration | object | Apify proxy | Proxy settings |
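
If you build the input programmatically, it can help to apply the documented defaults and check the documented ranges before starting a run. The sketch below does that with the standard library only. The helper name `build_input` is hypothetical, and the `proxyConfiguration` default of `{"useApifyProxy": true}` is an assumption taken from the Input Example rather than from the actor's schema.

```python
import copy
from typing import Any

# Defaults as listed in the Input Parameters table.
DEFAULTS: dict[str, Any] = {
    "maxPages": 20,
    "maxDepth": 3,
    "extractEmails": True,
    "extractPhones": True,
    "extractSocials": True,
    "detectTechStack": True,
    "extractSEO": True,
    "extractStructuredData": True,
    # Assumed shape of the "Apify proxy" default (copied from the Input Example).
    "proxyConfiguration": {"useApifyProxy": True},
}


def build_input(start_urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Merge overrides into the defaults and validate the documented ranges."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")

    run_input = copy.deepcopy(DEFAULTS)
    run_input.update(overrides)
    run_input["startUrls"] = [{"url": url} for url in start_urls]

    if not 1 <= run_input["maxPages"] <= 500:
        raise ValueError("maxPages must be between 1 and 500")
    if not 0 <= run_input["maxDepth"] <= 10:
        raise ValueError("maxDepth must be between 0 and 10")
    return run_input
```

Calling `build_input(["https://example.com"], maxPages=30)` yields the same object as the Input Example above.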

📦 Output Format

Per-Page Dataset Record

{
  "url": "https://example.com/about",
  "statusCode": 200,
  "crawledAt": "2025-01-15T10:30:00.000Z",
  "title": "About Us — Example Corp",
  "metaDescription": "Learn about Example Corp...",
  "seoScore": 82,
  "wordCount": 1450,
  "emails": ["hello@example.com", "careers@example.com"],
  "phones": ["+1 (555) 123-4567"],
  "socialLinks": {
    "twitter": ["https://twitter.com/examplecorp"],
    "linkedin": ["https://linkedin.com/company/example"],
    "github": ["https://github.com/example"]
  },
  "techStack": [
    { "name": "Next.js", "category": "Framework" },
    { "name": "Vercel", "category": "Hosting" },
    { "name": "Google Analytics", "category": "Analytics" },
    { "name": "Stripe", "category": "Payments" }
  ],
  "seo": {
    "title": "About Us — Example Corp",
    "titleLength": 23,
    "metaDescription": "Learn about Example Corp...",
    "metaDescriptionLength": 145,
    "canonicalUrl": "https://example.com/about",
    "language": "en",
    "openGraph": { "title": "...", "image": "..." },
    "headings": {
      "h1": ["About Example Corp"],
      "h2": ["Our Mission", "Our Team", "Contact"]
    },
    "totalImages": 12,
    "imagesWithoutAlt": 2,
    "internalLinks": 34,
    "externalLinks": 8,
    "seoScore": 82
  },
  "structuredData": [
    {
      "@type": "Organization",
      "name": "Example Corp",
      "url": "https://example.com"
    }
  ]
}
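
For lead generation, the per-page records usually need flattening into one row per contact. The following is a minimal sketch that assumes the records have already been downloaded as a list of dictionaries shaped like the example above. The helper names `to_lead_rows` and `rows_to_csv` and the column layout are hypothetical choices for this example.

```python
import csv
import io


def to_lead_rows(records: list[dict]) -> list[dict]:
    """Produce one row per (page, email) pair from per-page records."""
    rows = []
    for record in records:
        # All phones found on the page are attached to each email row.
        phones = "; ".join(record.get("phones", []))
        for email in record.get("emails", []):
            rows.append(
                {"email": email, "phones": phones, "sourceUrl": record["url"]}
            )
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize lead rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["email", "phones", "sourceUrl"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Pages without emails contribute no rows, so the output contains only contacts that can be acted on.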

Domain Summary (Key-Value Store → DOMAIN_SUMMARY)

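After the crawl completes, a rolled-up summary of all crawled pages is saved to the run's key-value store under the key DOMAIN_SUMMARY. In the example, totalUniqueEmails and totalUniquePhones hold the deduplicated values themselves, not counts: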

{
  "totalPagesCrawled": 25,
  "totalUniqueEmails": ["hello@example.com", "sales@example.com"],
  "totalUniquePhones": ["+1 (555) 123-4567"],
  "socialProfiles": {
    "twitter": ["https://twitter.com/examplecorp"],
    "linkedin": ["https://linkedin.com/company/example"]
  },
  "technologiesDetected": [
    { "name": "Next.js", "category": "Framework" },
    { "name": "Stripe", "category": "Payments" }
  ]
}
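
To make the relationship between the per-page records and this summary concrete, here is a sketch of how such a roll-up could be computed. It is an illustration under the assumption that the summary is a plain deduplicated merge of the per-page fields. The function name `summarize` is hypothetical, and the actor's own aggregation logic may differ.

```python
def summarize(records: list[dict]) -> dict:
    """Merge per-page records into a domain-level summary, keeping first-seen order."""
    emails: list[str] = []
    phones: list[str] = []
    socials: dict[str, list[str]] = {}
    technologies: list[dict] = []
    seen_tech: set[tuple[str, str]] = set()

    for record in records:
        for email in record.get("emails", []):
            if email not in emails:
                emails.append(email)
        for phone in record.get("phones", []):
            if phone not in phones:
                phones.append(phone)
        # Merge social links platform by platform.
        for platform, links in record.get("socialLinks", {}).items():
            bucket = socials.setdefault(platform, [])
            for link in links:
                if link not in bucket:
                    bucket.append(link)
        # A technology is identified by its (name, category) pair.
        for tech in record.get("techStack", []):
            key = (tech["name"], tech["category"])
            if key not in seen_tech:
                seen_tech.add(key)
                technologies.append(tech)

    return {
        "totalPagesCrawled": len(records),
        "totalUniqueEmails": emails,
        "totalUniquePhones": phones,
        "socialProfiles": socials,
        "technologiesDetected": technologies,
    }
```

The same function can be used to re-aggregate a filtered subset of pages, for example only those under a particular path.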

🎯 Use Cases

  • Lead Generation — Crawl prospect websites to harvest contact emails and phone numbers
  • Competitive Analysis — Discover what tech stack competitors use
  • SEO Auditing — Bulk-audit SEO health across hundreds of pages
  • Market Research — Map social media presence across an industry
  • Sales Intelligence — Enrich CRM records with fresh website data
  • Content Analysis — Extract structured data and content metrics

📝 License

MIT — see LICENSE for details.