Company Info Scraper – Contacts, Tech Stack, Social Profiles

Pricing: $9.99/month + usage

Crawl any company website: extract emails (support/sales/decision-maker), phone numbers, addresses, social profiles, technologies, industry, size, lead score. Smart crawling + JS fallback. $9.99/month. Perfect for B2B leads & competitor intel.

Rating: 0.0 (0 ratings)

Developer: Scrape Pilot (maintained by Community)

Actor stats: 0 bookmarked, 4 total users, 1 monthly active user, last modified 3 days ago

🏢 Company Info Scraper – Extract Business Data, Contacts, Tech Stack, Social Profiles

Crawl any company website – extract company name, description, logo, favicon, email addresses (support/sales/decision‑maker), phone numbers, social media profiles, physical addresses, technologies used, industry, business model, and more.
Includes smart crawling (priority pages first), fallback browser rendering, checkpoint/resume, and lead scoring. Perfect for lead generation, sales intelligence, competitor research, and market analysis.


💡 What is the Company Info Scraper?

The Company Info Scraper is an intelligent Apify actor that analyzes any business website and returns a structured, comprehensive profile of that company. It goes far beyond simple scraping:

  • Intelligent crawling – prioritizes contact, about, careers, blog, pricing, and portal pages first.
  • Multi‑engine fetching – tries lightweight HTTP (curl_cffi) first, falls back to headless Chromium (Playwright) for JavaScript‑heavy sites.
  • Contact extraction – finds email addresses (categorised into support, sales, decision‑maker, general), phone numbers (E.164 format), physical addresses (with pattern matching).
  • Social media discovery – detects Facebook, Instagram, LinkedIn, Twitter/X, YouTube, TikTok, Pinterest, GitHub, Threads, Snapchat, Reddit.
  • Technology detection – identifies CMS (WordPress, Shopify, Webflow), analytics (GA, GTM), chat widgets (Intercom, Zendesk), payment providers (Stripe, PayPal), and more.
  • Business intelligence – infers industry, company size estimate, business model (B2B SaaS, E‑commerce, Agency, etc.), and lead scoring (High/Medium/Low).
  • Resume & checkpoint – saves progress after every page; survives interruptions.
  • Bulk processing – supply many start URLs, and the actor crawls each independently.
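The priority-first crawl order can be pictured as a keyword-ranked link queue. The sketch below assumes priority is decided only by keywords in the URL path. The keyword list comes from the page types named above, but `PRIORITY_KEYWORDS`, `priority_of` and `order_links` are hypothetical names and not the actor's source.

```python
from urllib.parse import urlparse

# Hypothetical ranking: lower index = crawled earlier (page types from the list above).
PRIORITY_KEYWORDS = ["contact", "about", "careers", "blog", "pricing", "portal"]


def priority_of(url: str) -> int:
    """Return the rank of the first priority keyword found in the URL path."""
    path = urlparse(url).path.lower()
    for rank, keyword in enumerate(PRIORITY_KEYWORDS):
        if keyword in path:
            return rank
    return len(PRIORITY_KEYWORDS)  # pages without a keyword go last


def order_links(links: list[str], max_pages: int) -> list[str]:
    """De-duplicate links, sort them by priority (stable for ties) and cap the count."""
    unique = list(dict.fromkeys(links))  # keeps first-seen order
    return sorted(unique, key=priority_of)[:max_pages]
```

Capping after sorting means that even a small page limit spends its budget on contact and about pages first.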

The output includes everything a sales or research team needs to qualify a lead – direct contact URLs, email buckets, phone numbers, social handles, and even a lead quality score.


🚀 Key Features

| Feature | Description |
|---|---|
| Smart priority crawling | Visits contact, about, careers, privacy, terms, blog, pricing, login, portal, and product pages early. |
| Dual fetch engine | Uses curl_cffi with Chrome impersonation, falling back to Playwright (full JS rendering). |
| Contact extraction | Emails (categorised into support, sales, decision-maker, general), phone numbers (E.164), physical addresses. |
| Social media discovery | Detects profiles on major social platforms (Facebook, Instagram, LinkedIn, Twitter/X, YouTube, TikTok, GitHub, and more); returns full profile URLs. |
| Technology stack detection | Recognises 30+ technologies (CMS, analytics, chat, payments, hosting). |
| Business insights | Industry (10 categories), company size (from size hints or employee counts), business model, lead score. |
| Checkpoint & resume | Saves state after every page; restarts without re-scanning visited URLs. |
| Bulk domains | Processes hundreds of websites in one run (each crawled independently). |
| Flat monthly pricing | $9.99/month with unlimited runs and no per-page fees; Apify platform usage (compute, proxies) is billed separately. |
| Clean JSON output | One comprehensive item per domain, plus intermediate items for each scanned page (distinguished by the status field). |
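Technology detection like this is commonly done by matching known fingerprints (script hosts, asset paths) in a page's HTML. The sketch below assumes simple substring fingerprints. `TECH_SIGNATURES` is a small hypothetical sample and not the actor's actual rule set of 30+ technologies.

```python
# Hypothetical fingerprint table: substring found in page HTML -> technology name.
TECH_SIGNATURES = {
    "wp-content/": "WordPress",
    "cdn.shopify.com": "Shopify",
    "googletagmanager.com/gtm.js": "Google Tag Manager",
    "googletagmanager.com/gtag/js": "Google Analytics",
    "widget.intercom.io": "Intercom",
    "js.stripe.com": "Stripe",
    "paypal.com/sdk/js": "PayPal",
}


def detect_technologies(html: str) -> list[str]:
    """Return the sorted, de-duplicated technologies whose fingerprint appears in the HTML."""
    lowered = html.lower()
    return sorted({tech for signature, tech in TECH_SIGNATURES.items() if signature in lowered})
```

For example, a page that loads `https://js.stripe.com/v3/` and a stylesheet under `/wp-content/themes/` would yield `["Stripe", "WordPress"]`.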

📥 Input Parameters

The actor accepts a JSON object with the following fields:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | array of objects | Yes | – | List of starting URLs (e.g., [{"url": "https://example.com"}]). |
| maxPagesPerDomain | integer | No | 20 | Maximum pages to crawl per domain (prevents runaway crawls). |
| concurrency | integer | No | 20 | Number of concurrent HTTP requests. |
| regionHint | string | No | "US" | Two-letter country code used when parsing phone numbers (e.g., "US", "BD"). |
| proxyConfiguration | object | No | – | Apify proxy configuration. Residential proxies are recommended. |
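Applying the documented defaults to a partial input might look like the sketch below. `normalize_input` is a hypothetical helper. Its validation rules (positive integers, a two-letter alphabetic region code) are assumptions drawn from the parameter descriptions and are not the actor's own checks.

```python
from typing import Any

# Defaults taken from the parameter table above.
DEFAULTS = {"maxPagesPerDomain": 20, "concurrency": 20, "regionHint": "US"}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Merge user input over the defaults and reject obviously invalid values."""
    if not raw.get("startUrls"):
        raise ValueError("startUrls is required and must not be empty")
    cfg = {**DEFAULTS, **raw}
    for key in ("maxPagesPerDomain", "concurrency"):
        value = cfg[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer")
    region = cfg["regionHint"]
    if not (isinstance(region, str) and len(region) == 2 and region.isalpha()):
        raise ValueError("regionHint must be a two-letter country code")
    cfg["regionHint"] = region.upper()
    return cfg
```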

Example Input

{
  "startUrls": [
    {"url": "https://stripe.com"},
    {"url": "https://shopify.com"},
    {"url": "https://airbnb.com"}
  ],
  "maxPagesPerDomain": 30,
  "concurrency": 15,
  "regionHint": "US",
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

📤 Output Fields

The actor pushes two types of items to the dataset:

  1. Page‑level items (one per crawled page, status: "scanned") – useful for intermediate debugging.
  2. One final item per domain (when crawling finishes or maxPagesPerDomain reached, status: "completed" or "partial").

Below is a sample of the final company item (most relevant for users):

{
  "domain": "stripe.com",
  "website": "https://stripe.com",
  "company_name": "Stripe",
  "website_title": "Stripe: Financial infrastructure for the internet",
  "website_description": "Stripe powers online and in-person payment processing and financial solutions for businesses of all sizes.",
  "industry": "Fintech",
  "country": "United States",
  "company_size_estimate": "1000+",
  "business_model": "B2B SaaS",
  "founded_year": "2010",
  "logo_url": "https://stripe.com/img/about/logos/logomark.png",
  "favicon_url": "https://stripe.com/favicon.ico",
  "phone_numbers": ["+18889638944"],
  "emails": ["support@stripe.com", "sales@stripe.com", "press@stripe.com"],
  "support_emails": ["support@stripe.com"],
  "sales_emails": ["sales@stripe.com"],
  "decision_maker_emails": [],
  "email_buckets": {
    "support": ["support@stripe.com"],
    "sales": ["sales@stripe.com"],
    "general": [],
    "other": ["press@stripe.com"]
  },
  "addresses": ["3180 18th St, San Francisco, CA 94110"],
  "social_profiles": {
    "twitter": ["https://twitter.com/stripe"],
    "linkedin": ["https://linkedin.com/company/stripe"],
    "github": ["https://github.com/stripe"]
  },
  "contact_url": "https://stripe.com/contact",
  "about_url": "https://stripe.com/about",
  "careers_url": "https://stripe.com/jobs",
  "privacy_url": "https://stripe.com/privacy",
  "terms_url": "https://stripe.com/legal",
  "blog_url": "https://stripe.com/blog",
  "pricing_url": "https://stripe.com/pricing",
  "login_url": "https://dashboard.stripe.com/login",
  "customer_portal_url": "https://dashboard.stripe.com",
  "product_url": "https://stripe.com/products",
  "technologies": ["Cloudflare", "Google Analytics", "React", "Stripe", "PayPal"],
  "has_contact_form": true,
  "has_live_chat": false,
  "newsletter_signup": true,
  "accepts_online_payments": true,
  "pages_scanned": 25,
  "status": "completed",
  "lead_score": 85,
  "lead_quality": "High",
  "scraped_at": "2026-06-01T12:30:00Z"
}
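The `social_profiles` object in the sample maps a platform name to a list of profile URLs. One way to build such a mapping is to classify each outbound link by its hostname. This is a minimal sketch: `SOCIAL_HOSTS` covers only some of the platforms named earlier, and it is an assumption rather than the actor's actual matching rules.

```python
from urllib.parse import urlparse

# Hypothetical host -> platform table (a subset of the platforms listed in this README).
SOCIAL_HOSTS = {
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "youtube.com": "youtube",
    "github.com": "github",
}


def social_profiles(links: list[str]) -> dict[str, list[str]]:
    """Group links by social platform, skipping non-social and duplicate URLs."""
    profiles: dict[str, list[str]] = {}
    for link in links:
        host = (urlparse(link).hostname or "").removeprefix("www.")
        platform = SOCIAL_HOSTS.get(host)
        if platform and link not in profiles.get(platform, []):
            profiles.setdefault(platform, []).append(link)
    return profiles
```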
| Field | Type | Description |
|---|---|---|
| domain | string | Company domain (e.g., stripe.com). |
| company_name | string | Best-guess company name (from OG tags, title, H1). |
| website_title | string | HTML title of the homepage or first meaningful page. |
| website_description | string | Meta description or OG description. |
| industry | string | Detected industry (Fintech, E-commerce, SaaS, etc.). |
| country | string | Country detected from the address or page text. |
| company_size_estimate | string | 1-10, 11-50, ..., 1000+, or Unknown. |
| business_model | string | B2B SaaS, E-commerce, Agency / Services, etc. |
| founded_year | string | Four-digit year from JSON-LD or page text. |
| logo_url | string | URL of the company logo (if found). |
| favicon_url | string | URL of the favicon. |
| phone_numbers | array | Phone numbers in E.164 format. |
| emails | array | All email addresses found. |
| support_emails | array | Emails matching support@, help@, etc. |
| sales_emails | array | Emails matching sales@, business@, etc. |
| decision_maker_emails | array | Emails matching ceo@, founder@, director@, etc. |
| email_buckets | object | Categorised emails for easy integration. |
| addresses | array | Physical address strings. |
| social_profiles | object | Dictionary mapping platform → array of URLs. |
| contact_url | string | First discovered contact page URL. |
| about_url | string | First discovered about page URL. |
| careers_url | string | Careers/jobs page URL. |
| privacy_url | string | Privacy policy URL. |
| terms_url | string | Terms of service URL. |
| blog_url | string | Blog/news page URL. |
| pricing_url | string | Pricing page URL. |
| login_url | string | Login page URL. |
| customer_portal_url | string | Customer portal/dashboard URL. |
| product_url | string | Product/solutions page URL. |
| technologies | array | Detected technology stack (CMS, analytics, etc.). |
| has_contact_form | boolean | Whether a contact form was detected. |
| has_live_chat | boolean | Whether a live chat widget (Intercom, etc.) was detected. |
| newsletter_signup | boolean | Whether a newsletter subscription form was detected. |
| accepts_online_payments | boolean | Whether payment providers (Stripe, PayPal, etc.) were detected. |
| pages_scanned | integer | Number of pages crawled for this domain. |
| status | string | completed: the crawl finished or reached maxPagesPerDomain; partial: the crawl stopped early with unvisited links remaining. |
| lead_score | integer | Score from 0 to 100 based on available data (emails, phones, social profiles, etc.). |
| lead_quality | string | High (≥70), Medium (35–69), Low (<35). |
| scraped_at | string | ISO 8601 timestamp. |
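In E.164 format, a phone number is written as `+`, then the country calling code, then the national number, with no spaces or punctuation. The stdlib-only sketch below shows how `regionHint` can supply a missing country code. It is deliberately simplified: `CALLING_CODES` is a hypothetical two-entry table and there is no per-country length validation. A production scraper would more likely rely on a full phone-number library.

```python
import re
from typing import Optional

# Hypothetical subset: region hint -> (country calling code, national trunk prefix).
CALLING_CODES = {"US": ("1", ""), "BD": ("880", "0")}


def to_e164(raw: str, region_hint: str = "US") -> Optional[str]:
    """Normalise a scraped phone string to E.164, or return None if that is not possible."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        number = digits  # already international: the digits include the country code
    else:
        if region_hint not in CALLING_CODES:
            return None
        code, trunk = CALLING_CODES[region_hint]
        if trunk and digits.startswith(trunk):
            digits = digits[len(trunk):]  # drop the national trunk prefix, e.g. Bangladesh "0"
        if code == "1" and len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]  # US numbers written with a leading 1
        number = code + digits
    # E.164 allows at most 15 digits; the floor of 8 is an arbitrary sanity check for this sketch.
    return "+" + number if 8 <= len(number) <= 15 else None
```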

💰 Pricing

| Plan | Price | Description |
|---|---|---|
| Monthly subscription | $9.99/month + usage | Unlimited runs and no per-page fees; Apify platform usage (compute units, proxy traffic) is billed separately. |

  • You can scrape as many domains as you want, up to maxPagesPerDomain pages per domain, as many times per month as you like.
  • The actor saves checkpoints automatically; if a run stops early, it resumes from where it left off.
  • There is no pay-per-event charge: the actor itself is a fixed monthly subscription.

🛠 How to Use on Apify

  1. Create a task with this actor.
  2. Provide startUrls – one or more company website URLs.
  3. Adjust maxPagesPerDomain (default 20) – higher values give more thorough results but take longer.
  4. Set concurrency (default 20) – increase for faster crawling (but may trigger blocking).
  5. Enable residential proxies – strongly recommended to avoid being blocked.
  6. Run – the actor will crawl each domain, extract all data, and push results to the Dataset.
  7. Export – download final company profiles as JSON, CSV, or Excel.
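Nested fields such as `email_buckets` and `social_profiles` need flattening before they fit in a spreadsheet. If you post-process the JSON export yourself, a minimal sketch using Python's standard `csv` module could look like this. The column naming (`social_profiles.twitter`) and the choice to join lists with `; ` are arbitrary decisions for this sketch. Apify's own CSV export may flatten nested fields differently.

```python
import csv
import io
from typing import Any


def _cell(value: Any) -> str:
    """Render a scalar or a list as a single CSV cell."""
    if isinstance(value, list):
        return "; ".join(str(v) for v in value)
    return "" if value is None else str(value)


def flatten(item: dict[str, Any]) -> dict[str, str]:
    """Flatten one company item: nested dicts become dotted column names."""
    row: dict[str, str] = {}
    for key, value in item.items():
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                row[f"{key}.{sub_key}"] = _cell(sub_value)
        else:
            row[key] = _cell(value)
    return row


def to_csv(items: list[dict[str, Any]]) -> str:
    """Write flattened items as CSV text; missing columns are left empty."""
    rows = [flatten(item) for item in items]
    fieldnames = sorted({name for row in rows for name in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```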

Tip: The actor produces both page‑level items (many) and a final summary item per domain. Filter by status: "completed" or status: "partial" to get only the final company profiles.
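You can also filter client-side after downloading the dataset items as JSON. This is a minimal sketch that assumes the items are already loaded into a list of dicts. Sorting by `lead_score` so the strongest leads come first is an optional extra.

```python
from typing import Any

FINAL_STATUSES = {"completed", "partial"}


def final_profiles(items: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only per-domain summary items (page-level items have status "scanned"), best leads first."""
    finals = [item for item in items if item.get("status") in FINAL_STATUSES]
    return sorted(finals, key=lambda item: item.get("lead_score", 0), reverse=True)
```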

Running via API

curl -X POST "https://api.apify.com/v2/acts/your-username~company-info-scraper/runs" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "startUrls": [{"url": "https://stripe.com"}],
    "maxPagesPerDomain": 15,
    "proxyConfiguration": {"useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"]}
  }'
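The same call can be made from Python using only the standard library. This sketch builds the POST request for the run endpoint shown above. `your-username~company-info-scraper` and `YOUR_API_TOKEN` are placeholders, just as in the curl example, and nothing is sent until `start_run` is called.

```python
import json
import urllib.request

# Placeholders: replace with the real actor ID and your Apify API token.
ACTOR_ID = "your-username~company-info-scraper"
API_TOKEN = "YOUR_API_TOKEN"


def build_run_request(run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an actor run."""
    return urllib.request.Request(
        url=f"https://api.apify.com/v2/acts/{ACTOR_ID}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {API_TOKEN}"},
        method="POST",
    )


def start_run(run_input: dict) -> dict:
    """Send the request and return the parsed JSON response (requires network access)."""
    with urllib.request.urlopen(build_run_request(run_input)) as response:
        return json.load(response)
```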

🎯 Use Cases

| Use case | How the Company Info Scraper helps |
|---|---|
| Lead generation | Extract contact emails, phones, social profiles, and decision-maker emails for outbound campaigns. |
| Sales intelligence | Score leads automatically (lead_quality) and identify the technologies a company uses (e.g., Shopify stores → pitch e-commerce solutions). |
| Competitor research | Gather tech stack, business model, and industry classification for benchmarking. |
| Market analysis | Batch-process hundreds of companies in a sector to identify common tools, locations, and sizes. |
| Mergers & acquisitions | Quickly collect a company overview, founding year, and contact channels for initial due diligence. |
| Partnership sourcing | Find potential partners by discovering their contact and social pages. |

❓ Frequently Asked Questions

1. Do I need a proxy?
Residential proxies are strongly recommended, especially for many domains or for sites using Cloudflare. Datacenter IPs may get blocked quickly.

2. How many pages does it crawl?
maxPagesPerDomain controls the limit (default 20). The actor prioritises important pages (contact, about, careers, etc.), so even a low limit gives good data.

3. What happens if the website requires JavaScript?
The actor tries HTTP first (fast). If that fails or returns minimal content, it falls back to a headless Chrome browser (Playwright) to render the page.
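The HTTP-first, browser-fallback decision can be sketched as follows. To keep the example self-contained, the fetchers are passed in as plain functions. They stand in for the actor's curl_cffi and Playwright calls, which are not shown. The "minimal content" test (fewer than 500 characters of visible text) is a hypothetical threshold.

```python
import re
from typing import Callable, Tuple

# A fetcher takes a URL and returns (HTTP status, HTML); it may raise on network errors.
Fetcher = Callable[[str], Tuple[int, str]]

MIN_TEXT_CHARS = 500  # hypothetical threshold for "minimal content"


def visible_text_length(html: str) -> int:
    """Rough visible-text length: drop scripts/styles and tags, then collapse whitespace."""
    html = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", html)
    return len(" ".join(text.split()))


def fetch_with_fallback(url: str, http_fetch: Fetcher, browser_fetch: Fetcher) -> Tuple[str, str]:
    """Try the cheap HTTP fetch first; use the browser on errors or thin content.

    Returns (engine used, HTML).
    """
    try:
        status, html = http_fetch(url)
        if status < 400 and visible_text_length(html) >= MIN_TEXT_CHARS:
            return "http", html
    except Exception:
        pass  # network or TLS errors also trigger the fallback
    _, html = browser_fetch(url)
    return "browser", html
```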

4. How does it categorise emails?

  • support_emails: starts with support@, help@, care@, service@, contact@, hello@.
  • sales_emails: starts with sales@, business@, partners@, bd@.
  • decision_maker_emails: starts with ceo@, founder@, owner@, president@, director@, vp@, chief@, head@, lead@.
  • general: info@, hello@, contact@ (if not already captured).
  • other: all remaining.
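The rules above can be read as an ordered list of buckets where the first match wins. That order is why hello@ and contact@ end up in support rather than general. This minimal sketch assumes matching on the whole local part (the text before @). `BUCKET_RULES` and `categorise_emails` are hypothetical names.

```python
# Prefix rules from the list above, checked in order; the first matching bucket wins.
BUCKET_RULES = [
    ("support", {"support", "help", "care", "service", "contact", "hello"}),
    ("sales", {"sales", "business", "partners", "bd"}),
    ("decision_maker", {"ceo", "founder", "owner", "president", "director", "vp", "chief", "head", "lead"}),
    ("general", {"info", "hello", "contact"}),
]


def categorise_emails(emails: list[str]) -> dict[str, list[str]]:
    """Sort de-duplicated, lower-cased emails into buckets by their local part."""
    buckets: dict[str, list[str]] = {name: [] for name, _ in BUCKET_RULES}
    buckets["other"] = []
    for email in dict.fromkeys(e.strip().lower() for e in emails):
        local = email.split("@", 1)[0]
        bucket = next((name for name, prefixes in BUCKET_RULES if local in prefixes), "other")
        buckets[bucket].append(email)
    return buckets
```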

5. How is the lead score calculated?
Points are added for:

  • Phone numbers (20), emails (20), addresses (10), social profiles (10), contact page (10), company name (5), technologies (5), description (5), logo/favicon (5), live chat (5), newsletter (5).
    Total ≥70 = High, 35–69 = Medium, <35 = Low.
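Taking the weights above at face value, the score is a sum of points for each populated field, followed by the quality thresholds. This sketch assumes that a criterion earns its full points whenever the corresponding output field is present and non-empty, since the FAQ does not say whether partial credit exists. It also assumes that each criterion maps to the field names from the output table.

```python
from typing import Any

# (points, fields): a criterion scores if any of its fields is non-empty or true.
SCORE_RULES = [
    (20, ("phone_numbers",)),
    (20, ("emails",)),
    (10, ("addresses",)),
    (10, ("social_profiles",)),
    (10, ("contact_url",)),
    (5, ("company_name",)),
    (5, ("technologies",)),
    (5, ("website_description",)),
    (5, ("logo_url", "favicon_url")),
    (5, ("has_live_chat",)),
    (5, ("newsletter_signup",)),
]


def lead_score(item: dict[str, Any]) -> tuple[int, str]:
    """Return (score from 0 to 100, quality label) for a final company item."""
    score = sum(points for points, fields in SCORE_RULES if any(item.get(f) for f in fields))
    quality = "High" if score >= 70 else "Medium" if score >= 35 else "Low"
    return score, quality
```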

6. Can I run this for thousands of domains?
Yes, but be mindful of proxy usage (residential proxies are metered). For very large lists, spread runs over time or use a dedicated proxy pool.
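One way to spread a very large list over several runs is to split it into fixed-size batches, each becoming the startUrls of a separate run. This is a minimal sketch. The default batch size of 100 is an arbitrary example and not a documented limit.

```python
from typing import Any, Iterator


def batched_inputs(urls: list[str], batch_size: int = 100, **settings: Any) -> Iterator[dict[str, Any]]:
    """Yield one run input per batch of URLs, copying shared settings into each input."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urls), batch_size):
        chunk = urls[start:start + batch_size]
        yield {"startUrls": [{"url": u} for u in chunk], **settings}
```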

7. Does it extract the full website content?
No – it focuses on metadata, contact info, and structured data. It does not archive entire pages.

8. What is the checkpoint feature?
If the actor stops (due to spending limit, timeout, or user interruption), it saves which pages have been visited. When you restart, it resumes without re‑scanning already processed URLs.
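The checkpoint idea amounts to persisting the set of visited URLs and skipping them on restart. The actor's real storage mechanism is not described here. This sketch uses a local JSON file as a stand-in, and the `Checkpoint` class and file path are hypothetical.

```python
import json
from pathlib import Path


class Checkpoint:
    """Persist visited URLs after every page so a restarted crawl can skip them."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self.visited: set[str] = set()
        if self.path.exists():
            self.visited = set(json.loads(self.path.read_text(encoding="utf-8")))

    def should_visit(self, url: str) -> bool:
        return url not in self.visited

    def mark_done(self, url: str) -> None:
        self.visited.add(url)
        # Write to a temporary file and then rename it, so a crash mid-write cannot corrupt the checkpoint.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(self.visited)), encoding="utf-8")
        tmp.replace(self.path)
```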

9. How do I get only the final company profile (not the page‑level items)?
After the run, filter the dataset by status: "completed" or "partial". Page‑level items have status: "scanned".

10. What if the website is not in English?
The actor works with any language: pattern matching for addresses and phone numbers is language-agnostic, but keyword detection (for industry and social media) uses a base set of English terms. To improve detection on non-English sites, extend INDUSTRY_HINTS and the other keyword lists in the source code.
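Keyword-based industry detection, and the effect of adding non-English terms to it, can be sketched as follows. The real structure of `INDUSTRY_HINTS` is not documented here. This sketch assumes a mapping from industry name to lowercase keywords, and the keywords, including the Spanish and German additions, are illustrative only. Returning "Unknown" when nothing matches is also a choice made for this sketch.

```python
import re

# Assumed shape: industry -> keywords (English plus illustrative non-English additions).
INDUSTRY_HINTS = {
    "Fintech": ["payments", "banking", "pagos", "zahlungen"],
    "E-commerce": ["online store", "shopping cart", "tienda online", "onlineshop"],
    "SaaS": ["software as a service", "subscription software", "cloud platform"],
}


def infer_industry(text: str) -> str:
    """Return the industry with the most whole-word keyword hits, or 'Unknown'."""
    lowered = text.lower()
    counts = {
        industry: sum(len(re.findall(r"\b" + re.escape(k) + r"\b", lowered)) for k in keywords)
        for industry, keywords in INDUSTRY_HINTS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "Unknown"
```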




Start extracting complete company intelligence for $9.99/month plus platform usage. Crawl any business website and get contacts, tech stack, social profiles, and lead scoring.