Website Intelligence Extractor
Pricing
from $10.00 / 1,000 results
A powerful Apify actor that crawls websites to extract key intelligence, including emails, phone numbers, social media profiles, technology stack, SEO metadata, and structured data (JSON-LD). Ideal for lead generation, competitive analysis, marketing research, and SEO audits.
Developer
Jamshaid Arif
🔍 Website Intelligence Extractor
A powerful Apify actor that crawls any website and extracts actionable intelligence — emails, phone numbers, social media profiles, technology stack, SEO metadata, and structured data (JSON-LD).
Perfect for lead generation, competitive analysis, marketing research, and SEO auditing.
✨ What It Extracts
| Category | Details |
|---|---|
| 📧 Emails | All email addresses found on pages + mailto: links, with junk filtering |
| 📞 Phones | Phone numbers from text + tel: links, international format support |
| 🔗 Social Media | Facebook, Twitter/X, LinkedIn, Instagram, YouTube, GitHub, TikTok, Reddit, Threads, Bluesky, and 15+ more platforms |
| ⚙️ Tech Stack | CMS (WordPress, Shopify, Webflow…), Frameworks (React, Next.js, Vue…), Analytics (GA, Mixpanel, PostHog…), Marketing tools (HubSpot, Intercom…), CDN, Hosting, Payments — 60+ technologies |
| 📊 SEO Data | Title, meta description, canonical URL, OG tags, Twitter cards, heading hierarchy, word count, image alt audit, internal/external links, and a computed SEO Score (0-100) |
| 📋 Structured Data | JSON-LD schemas (Organization, Product, Article, FAQ, etc.) |
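The actor's exact extraction rules are not documented here. As a rough illustration of what "emails from page text plus mailto: links, with junk filtering" could involve, here is a minimal Python sketch. The regular expressions, the `JUNK_SUFFIXES` list, and the `extract_emails` function are all assumptions made for this example, not the actor's actual implementation.

```python
import re
from urllib.parse import unquote

# Deliberately simple pattern; real-world address matching is messier.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MAILTO_RE = re.compile(r"""href=["']mailto:([^"'?]+)""", re.IGNORECASE)

# Hypothetical junk filter: asset names such as logo@2x.png match the
# email pattern but are not addresses.
JUNK_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp", ".css", ".js")


def extract_emails(html: str) -> list[str]:
    """Return unique, lowercased emails found in page text and mailto: links."""
    candidates = EMAIL_RE.findall(html)
    candidates += [unquote(match) for match in MAILTO_RE.findall(html)]
    unique: dict[str, None] = {}  # dict keeps first-seen order
    for email in candidates:
        email = email.strip().lower()
        if not EMAIL_RE.fullmatch(email) or email.endswith(JUNK_SUFFIXES):
            continue
        unique.setdefault(email, None)
    return list(unique)
```

Lowercasing before deduplication means `Hello@Example.com` in a mailto: link and `hello@example.com` in body text count as one address.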
🚀 Quick Start
Input Example
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxPages": 30,
  "maxDepth": 3,
  "extractEmails": true,
  "extractPhones": true,
  "extractSocials": true,
  "detectTechStack": true,
  "extractSEO": true,
  "extractStructuredData": true,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrls | array | required | URLs to crawl |
| maxPages | integer | 20 | Max pages per run (1–500) |
| maxDepth | integer | 3 | Link depth to follow (0–10) |
| extractEmails | boolean | true | Find email addresses |
| extractPhones | boolean | true | Find phone numbers |
| extractSocials | boolean | true | Find social media links |
| detectTechStack | boolean | true | Identify technologies |
| extractSEO | boolean | true | Collect SEO metadata |
| extractStructuredData | boolean | true | Parse JSON-LD |
| proxyConfiguration | object | Apify proxy | Proxy settings |
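If you generate inputs programmatically, it can help to check values against the documented ranges before starting a run. The sketch below builds an input object with the defaults from the table above. The `build_input` helper and its error messages are hypothetical, and the actor may validate its input differently.

```python
from typing import Any

# Toggle names taken from the parameter table; each defaults to true.
FEATURE_FLAGS = (
    "extractEmails", "extractPhones", "extractSocials",
    "detectTechStack", "extractSEO", "extractStructuredData",
)


def build_input(urls: list[str], max_pages: int = 20, max_depth: int = 3,
                **flags: bool) -> dict[str, Any]:
    """Build an input dict, rejecting values outside the documented ranges."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_pages <= 500:
        raise ValueError("maxPages must be between 1 and 500")
    if not 0 <= max_depth <= 10:
        raise ValueError("maxDepth must be between 0 and 10")
    unknown = set(flags) - set(FEATURE_FLAGS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    run_input: dict[str, Any] = {
        "startUrls": [{"url": url} for url in urls],
        "maxPages": max_pages,
        "maxDepth": max_depth,
        # Assumed to match the proxy setting shown in the input example.
        "proxyConfiguration": {"useApifyProxy": True},
    }
    for name in FEATURE_FLAGS:
        run_input[name] = flags.get(name, True)
    return run_input
```

For example, `build_input(["https://example.com"], max_pages=30, extractPhones=False)` yields an object shaped like the input example, with phone extraction switched off.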
📦 Output Format
Per-Page Dataset Record
{
  "url": "https://example.com/about",
  "statusCode": 200,
  "crawledAt": "2025-01-15T10:30:00.000Z",
  "title": "About Us — Example Corp",
  "metaDescription": "Learn about Example Corp...",
  "seoScore": 82,
  "wordCount": 1450,
  "emails": ["hello@example.com", "careers@example.com"],
  "phones": ["+1 (555) 123-4567"],
  "socialLinks": {
    "twitter": ["https://twitter.com/examplecorp"],
    "linkedin": ["https://linkedin.com/company/example"],
    "github": ["https://github.com/example"]
  },
  "techStack": [
    { "name": "Next.js", "category": "Framework" },
    { "name": "Vercel", "category": "Hosting" },
    { "name": "Google Analytics", "category": "Analytics" },
    { "name": "Stripe", "category": "Payments" }
  ],
  "seo": {
    "title": "About Us — Example Corp",
    "titleLength": 25,
    "metaDescription": "Learn about Example Corp...",
    "metaDescriptionLength": 145,
    "canonicalUrl": "https://example.com/about",
    "language": "en",
    "openGraph": { "title": "...", "image": "..." },
    "headings": {
      "h1": ["About Example Corp"],
      "h2": ["Our Mission", "Our Team", "Contact"]
    },
    "totalImages": 12,
    "imagesWithoutAlt": 2,
    "internalLinks": 34,
    "externalLinks": 8,
    "seoScore": 82
  },
  "structuredData": [
    {
      "@type": "Organization",
      "name": "Example Corp",
      "url": "https://example.com"
    }
  ]
}
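Once records like the one above are downloaded from the dataset, a simple filter can surface pages worth a closer look. This sketch assumes the records are already loaded as a list of dicts. The `pages_needing_attention` function and the threshold of 70 are arbitrary choices for illustration, not values defined by the actor.

```python
from typing import Any


def pages_needing_attention(records: list[dict[str, Any]],
                            min_score: int = 70) -> list[dict[str, Any]]:
    """Flag pages with a low seoScore or with images lacking alt text."""
    flagged = []
    for record in records:
        seo = record.get("seo", {})
        # seoScore appears both at the top level and inside "seo".
        score = record.get("seoScore", seo.get("seoScore"))
        missing_alt = seo.get("imagesWithoutAlt", 0)
        if (score is not None and score < min_score) or missing_alt > 0:
            flagged.append({
                "url": record.get("url"),
                "seoScore": score,
                "imagesWithoutAlt": missing_alt,
            })
    # Lowest scores first; pages without a score go last.
    return sorted(flagged, key=lambda row: (row["seoScore"] is None,
                                            row["seoScore"] or 0))
```

The sample record above would be flagged despite its score of 82, because two of its images have no alt text.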
Domain Summary (Key-Value Store → DOMAIN_SUMMARY)
After crawling completes, a rolled-up summary is saved:
{
  "totalPagesCrawled": 25,
  "totalUniqueEmails": ["hello@example.com", "sales@example.com"],
  "totalUniquePhones": ["+1 (555) 123-4567"],
  "socialProfiles": {
    "twitter": ["https://twitter.com/examplecorp"],
    "linkedin": ["https://linkedin.com/company/example"]
  },
  "technologiesDetected": [
    { "name": "Next.js", "category": "Framework" },
    { "name": "Stripe", "category": "Payments" }
  ]
}
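To make the relationship between per-page records and the summary concrete, here is one way such a roll-up could be computed from a list of page records. This is a sketch that reuses the field names shown above. The `summarize` and `_merge_unique` functions are hypothetical, and the actor's own aggregation logic may differ.

```python
from typing import Any


def _merge_unique(target: list, values: list) -> None:
    """Append values not already in target, preserving first-seen order."""
    for value in values:
        if value not in target:
            target.append(value)


def summarize(records: list[dict[str, Any]]) -> dict[str, Any]:
    """Roll per-page records up into a dict shaped like DOMAIN_SUMMARY."""
    summary: dict[str, Any] = {
        "totalPagesCrawled": len(records),
        "totalUniqueEmails": [],
        "totalUniquePhones": [],
        "socialProfiles": {},
        "technologiesDetected": [],
    }
    for record in records:
        _merge_unique(summary["totalUniqueEmails"], record.get("emails", []))
        _merge_unique(summary["totalUniquePhones"], record.get("phones", []))
        for platform, links in record.get("socialLinks", {}).items():
            profiles = summary["socialProfiles"].setdefault(platform, [])
            _merge_unique(profiles, links)
        # techStack entries are dicts, so equality compares name and category.
        _merge_unique(summary["technologiesDetected"], record.get("techStack", []))
    return summary
```

Lists are used instead of sets so the result stays JSON-serializable and keeps the order in which values were first seen.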
🎯 Use Cases
- Lead Generation — Crawl prospect websites to harvest contact emails and phone numbers
- Competitive Analysis — Discover what tech stack competitors use
- SEO Auditing — Bulk-audit SEO health across hundreds of pages
- Market Research — Map social media presence across an industry
- Sales Intelligence — Enrich CRM records with fresh website data
- Content Analysis — Extract structured data and content metrics
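For the lead generation and CRM enrichment cases, the per-page records usually need flattening into rows before import. The sketch below turns a list of records into CSV text with one row per unique contact. The `contacts_to_csv` function and the column names are assumptions for this example and should be adapted to whatever your CRM expects.

```python
import csv
import io
from typing import Any


def contacts_to_csv(records: list[dict[str, Any]]) -> str:
    """Return CSV text with one row per unique email or phone number.

    Each contact is attributed to the first page it was seen on.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "value", "source_url"])
    seen: set[tuple[str, str]] = set()
    for record in records:
        for kind, key in (("email", "emails"), ("phone", "phones")):
            for value in record.get(key, []):
                if (kind, value) in seen:
                    continue
                seen.add((kind, value))
                writer.writerow([kind, value, record.get("url", "")])
    return buffer.getvalue()
```

Keeping the source URL alongside each contact makes it easier to judge later whether an address came from a contact page or from an unrelated mention.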
📝 License
MIT — see LICENSE for details.