Pricing

from $50.00 / 1,000 companies researched

Company Research Scraper — Deep Company Intelligence Data

Extract deep company intelligence data from any domain. Get company name, description, employees, tech stack, social links, GitHub stats, Wikipedia summary, executive leadership, recent news, and contact emails — from 8 sources in parallel. $0.06 per company.

Developer

Scrape Pilot

Last modified

3 months ago

🔍 Company Research Scraper — Deep Company Intelligence Data

The most complete Company Research Scraper on Apify. Extract deep company intelligence data from any domain — company name, description, industry, headquarters, employee count, executive leadership, tech stack, social links, GitHub stats, Wikipedia summary, recent news, and contact emails — sourced simultaneously from 8 public data sources per company. No login. No API key. Pay only for results.

🔍 What Is This Actor?

Company Research Scraper is a production-ready Apify actor that extracts comprehensive company intelligence data from any business domain — pulling from 8 public data sources simultaneously per company and merging everything into one clean, structured record.

Provide one or many company domains — openai.com, stripe.com, shopify.com — and receive back a complete company intelligence profile: verified company name, full description, industry, company type, headquarters location, total employee count, executive leadership names, products and services, tech stack (frontend, analytics, CRM, infrastructure, payments), social media links across 6 platforms, GitHub repository stats, Wikipedia summary, recent news headlines, contact emails found on the website, and direct links to all source pages.

This company research scraper runs all 8 data source queries in parallel per company (website HTML, LinkedIn public page, Wikipedia, GitHub API, Google News, DuckDuckGo, OpenCorporates, and HTTP header analysis), so each record takes roughly as long as the slowest source rather than the sum of all eight, as it would in a sequential scraper.

🚀 Why Use This Company Research Scraper?

Feature	This Actor	Manual Research	ZoomInfo / Clearbit	Other Scrapers
8 sources per company — parallel	✅	❌ Hours	✅ Expensive	❌ 1–2 sources
Company description + Wikipedia	✅	✅ Slow	✅	⚠️
Tech stack detection	✅ 5 categories	❌	✅	❌
GitHub stats	✅	❌	❌	❌
Recent news (up to 8 headlines)	✅	❌	❌	❌
Executive leadership names	✅	✅	✅	❌
Contact email extraction	✅	❌	✅	⚠️
Social links (6 platforms)	✅	❌	⚠️	⚠️
Bulk domain input	✅	❌	✅	⚠️
Checkpoint resume on abort	✅	N/A	N/A	❌
No subscription or API key	✅	N/A	❌	✅

Bottom line: This company research scraper is the only actor that aggregates company intelligence data from 8 public sources in parallel — delivering tech stack, GitHub stats, recent news, executive leadership, and contact emails alongside the standard company profile fields.

📡 Data Sources

Every company record is built from 8 public sources queried simultaneously:

Source	Data Extracted
Company Website	Meta description, page title, JSON-LD schema, social links, emails, tech stack, products/services, copyright year, about page text
LinkedIn Public Page	Company name, employee count, founded year, headquarters, industry, about text
Wikipedia	Full company description, founded year, headquarters location, Wikipedia URL
GitHub API	Organization name, public repo count, followers, description, location, blog, email, creation date
Google News RSS	Up to 8 recent news headlines with publication date and URL
DuckDuckGo Instant Answer	Company abstract, revenue, employee count, industry, company type, leadership names
OpenCorporates	Incorporation date, registered address, company type, OpenCorporates URL
HTTP Headers	Server technology, CDN provider (Cloudflare, AWS, Vercel, Fastly)

All 8 sources are queried in parallel per company, not sequentially, so a complete profile takes about as long as the slowest single source rather than the sum of all eight.
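The effect of querying sources in parallel can be illustrated with a minimal sketch. It is not the actor's code: the `query_source` function, the source subset, and the latency values are hypothetical stand-ins for real network calls, chosen only to show that the total time tracks the slowest source.

```python
import asyncio
import time

# Hypothetical per-source latencies in seconds (illustrative values only).
SOURCE_LATENCY = {
    "website_html": 0.1,
    "wikipedia": 0.1,
    "github_api": 0.1,
    "google_news": 0.2,
}


async def query_source(name: str, domain: str) -> tuple[str, dict]:
    """Stand-in for a real network call; sleeps to simulate latency."""
    await asyncio.sleep(SOURCE_LATENCY[name])
    return name, {"domain": domain}


async def research(domain: str) -> dict:
    """Query every source concurrently and merge results by source name."""
    tasks = [query_source(name, domain) for name in SOURCE_LATENCY]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    # A failed source is skipped so one failure does not sink the whole record.
    return dict(r for r in results if not isinstance(r, BaseException))


def timed_research(domain: str) -> tuple[dict, float]:
    """Run the concurrent queries and report the elapsed wall-clock time."""
    start = time.perf_counter()
    merged = asyncio.run(research(domain))
    return merged, time.perf_counter() - start
```

With these assumed latencies, `timed_research("openai.com")` should finish in a little over 0.2 seconds (the slowest source) instead of the 0.5 seconds a sequential loop would need.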

🎯 Use Cases

💼 B2B Sales & Account-Based Marketing

Build enriched prospect lists by running this company research scraper against target domain lists
Automatically populate CRM records with company description, employee count, industry, and contact email
Identify tech stack usage across prospects — find companies using Salesforce, HubSpot, or Stripe
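As a rough illustration of the tech stack use case, the sketch below filters a list of output records for companies whose `Tech Stack` object mentions a given tool. The helper names `uses_tool` and `prospects_using` are hypothetical, and the case-insensitive substring match is an assumption about how you might want to compare labels such as `"React/Next.js"`.

```python
def uses_tool(record: dict, tool: str) -> bool:
    """Return True if any Tech Stack category lists the tool (case-insensitive)."""
    stack = record.get("Tech Stack") or {}
    wanted = tool.lower()
    return any(
        wanted in str(item).lower()
        for items in stack.values()
        for item in (items or [])
    )


def prospects_using(records: list[dict], tool: str) -> list[str]:
    """Return the domains of all records whose tech stack mentions the tool."""
    return [r["domain"] for r in records if uses_tool(r, tool)]
```

For example, `prospects_using(records, "Stripe")` would return the domains where Stripe appears under any category, whether that is `Payments` or another one.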

🔍 Competitive Intelligence & Market Mapping

Extract company intelligence data on competitors — description, employees, leadership, tech stack, recent news
Monitor competitor news automatically by scheduling regular runs on competitor domains
Map the technology landscape of an industry by comparing tech stacks across multiple companies

🤖 AI & Data Pipeline Integrations

Feed structured company intelligence data into AI research assistants, CRM enrichment pipelines, or RAG systems
Build automated company profiling workflows for investment screening, partnership evaluation, or vendor assessment
Use GitHub stats and tech stack data to qualify engineering-focused companies for developer tools outreach

📊 Investment Research & Due Diligence

Run rapid first-pass due diligence on potential investment targets using publicly available company data
Extract executive leadership names, employee counts, and industry classification for portfolio screening
Collect recent news for any company to surface material events before deeper research

🏢 Partnership & Vendor Evaluation

Research potential partners or vendors at scale before initiating contact
Compare company size, tech stack, and industry focus across a shortlist of candidates
Find contact emails from company websites for first outreach

🎓 Academic & Business Research

Build structured datasets of company profiles for market structure, innovation, or technology adoption research
Collect GitHub activity data alongside company metadata for studies on open-source software practices
Analyze tech stack adoption patterns across industries using structured company intelligence data

⚙️ Input Parameters

{
  "domains": [
    "openai.com",
    "stripe.com",
    "shopify.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy":    true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

Parameter	Type	Description
`domains`	array or string	Company domains to research — e.g. `"openai.com"`, `"stripe.com"`. Newline-separated string also accepted. `https://` prefix is optional — auto-handled
`proxyConfiguration`	object	Apify proxy config — residential proxy recommended for LinkedIn and high-volume runs

Tip: You can enter domains with or without https:// or www. — all formats are normalized automatically. Mix any combination of domain formats in the same run.
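If you want to pre-clean a domain list before submitting it, the normalization described above could look roughly like this. This is a sketch of the idea, not the actor's implementation: `normalize_domain` and `parse_domains` are hypothetical helpers, and the de-duplication step is an added assumption that the actor does not promise.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Strip scheme, www. prefix, path, and port from a domain-like string."""
    text = raw.strip().lower()
    if "://" not in text:
        text = "//" + text  # lets urlsplit treat the value as a network location
    host = urlsplit(text).hostname or ""
    return host[4:] if host.startswith("www.") else host


def parse_domains(value) -> list[str]:
    """Accept a list or a newline-separated string and return clean domains."""
    lines = value.splitlines() if isinstance(value, str) else list(value)
    seen: dict[str, None] = {}  # dict keeps insertion order while de-duplicating
    for line in lines:
        domain = normalize_domain(line)
        if domain:
            seen.setdefault(domain, None)
    return list(seen)
```

The result can be passed directly as the `domains` array in the input shown above.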

📋 Output Fields

Every record from this company research scraper contains the following fields. Fields marked ✅ (Always, High, or Medium) usually return data, with Always the most consistent and Medium the least. Fields marked ⚠️ Source-dependent are populated only when a public source makes the value available.

🏢 Core Company Fields

Field	Reliability	Description	Example
`domain`	✅ Always	Normalized company domain	`"openai.com"`
`Company Name`	✅ High	Company display name	`"OpenAI"`
`Company Description`	✅ High	Full description from Wikipedia or website (max 1500 chars)	`"OpenAI is an American AI research organization..."`
`Website URL`	✅ Always	Direct company website URL	`"https://openai.com"`
`LinkedIn Profile URL`	✅ High	LinkedIn company page URL	`"https://www.linkedin.com/company/openai"`
`Industry/Vertical`	✅ Medium	Industry classification from LinkedIn	`"Industry Research Services"`
`Company Type`	✅ Medium	Private / Public / Nonprofit	`"Private"`
`Year Founded`	⚠️ Source-dependent	Founding year when available	`2015`
`Headquarters Location`	✅ High	Primary office location	`"San Francisco"`
`Total Employees`	✅ High	Employee count from LinkedIn	`8644`
`Executive Leadership`	⚠️ Source-dependent	Founder and executive names	`"Founders: Sam Altman, Ilya Sutskever..."`
`Products & Services`	✅ Medium	Key products and services from website	`"Platform Overview, Solutions"`

💻 Tech Stack Fields

Field	Reliability	Description	Example
`Tech Stack.Frontend`	✅ Medium	Frontend frameworks detected	`["React/Next.js", "TypeScript"]`
`Tech Stack.Analytics`	✅ Medium	Analytics tools detected	`["Google Analytics", "Amplitude"]`
`Tech Stack.Marketing/CRM`	✅ Medium	CRM and marketing tools	`["HubSpot", "Salesforce"]`
`Tech Stack.Infrastructure/CDN`	✅ High	Cloud and CDN providers	`["Cloudflare", "AWS"]`
`Tech Stack.Payments`	⚠️ Source-dependent	Payment processors detected	`["Stripe"]`
`Tech Stack.Server`	✅ Medium	Server technology from headers	`["Next.js", "cloudflare"]`
`Tech Stack.CDN`	✅ High	CDN from HTTP headers	`["Cloudflare"]`

🔗 Social Links & Contact Fields

Field	Reliability	Description	Example
`Social Links.linkedin`	✅ High	LinkedIn company URL	`"https://www.linkedin.com/company/openai"`
`Social Links.twitter`	✅ High	Twitter/X profile URL	`"https://x.com/OpenAI"`
`Social Links.youtube`	✅ Medium	YouTube channel URL	`"https://www.youtube.com/OpenAI"`
`Social Links.github`	✅ Medium	GitHub organization URL	`"https://github.com/openai"`
`Social Links.facebook`	⚠️ Source-dependent	Facebook page URL	`"https://www.facebook.com/openai"`
`Social Links.instagram`	⚠️ Source-dependent	Instagram profile URL	`"https://www.instagram.com/openai"`
`Emails Found`	⚠️ Source-dependent	Emails found on website (up to 5)	`["press@company.com"]`
`Contact Email`	⚠️ Source-dependent	Best contact email identified	`"press@company.com"`

📊 GitHub Fields

Field	Reliability	Description	Example
`GitHub URL`	✅ Medium	GitHub org page URL	`"https://github.com/openai"`
`GitHub Public Repos`	✅ Medium	Number of public repositories	`246`
`GitHub Followers`	✅ Medium	GitHub organization followers	`120231`
`GitHub Description`	⚠️ Source-dependent	GitHub org bio	`"AI research and deployment company"`

📰 News & Reference Fields

Field	Reliability	Description	Example
`Wikipedia URL`	✅ High	Wikipedia article URL	`"https://en.wikipedia.org/wiki/OpenAI"`
`Wikipedia Summary`	✅ High	Full Wikipedia extract	`"OpenAI is an American AI..."`
`Recent News`	✅ High	Up to 8 recent news items with title, date, and URL	See example below
`OpenCorporates URL`	⚠️ Source-dependent	OpenCorporates registry link	`"https://opencorporates.com/..."`

⏱️ Meta Fields

Field	Description	Example
`data_sources`	List of all 8 sources queried	`["website_html", "linkedin_public", "wikipedia", ...]`
`data_freshness`	Data recency status	`"real-time"`
`scraped_at`	Extraction timestamp (ISO 8601 UTC)	`"2024-03-15T10:30:00Z"`

❌ Fields That Are Never Populated

The following fields appear in the record schema but are always null — they require paid data sources not included in this actor:

Company Stage, Additional Locations, Revenue Growth Rate, Revenue Per Employee, Funding Raised to Date, Latest Funding Round, Latest Funding Amount, Latest Funding Date, Major Investors, Annual Profitability, EBITDA, Burn Rate, Runway, Cash Position, Market Cap, Monthly Website Visits, Global Traffic Rank, Bounce Rate, Avg Visit Duration, Pages Per Visit, Direct Competitors, Indirect Competitors

Note: Total Annual Revenue is occasionally populated from DuckDuckGo (e.g. "US$13.1 billion (2025)"), but this is not guaranteed and depends on DuckDuckGo's knowledge graph coverage.

📦 Example Output

Input:

{ "domains": ["openai.com"] }

Output:

{
  "domain":             "openai.com",
  "Company Name":       "OpenAI",
  "Company Description":"OpenAI Global, LLC is an American artificial intelligence (AI) research organization consisting of a for-profit public benefit corporation (PBC) and a nonprofit foundation, headquartered in San Francisco...",
  "Website URL":        "https://openai.com",
  "LinkedIn Profile URL":"https://www.linkedin.com/company/openai",
  "Industry/Vertical":  "Industry Research Services",
  "Company Type":       "Private",
  "Year Founded":       null,
  "Headquarters Location":"San Francisco",
  "Total Employees":    8644,
  "Executive Leadership":"Founders: Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman...",
  "Products & Services":"Platform Overview, Solutions",
  "Total Annual Revenue":"US$13.1 billion (2025)",
  "Tech Stack": {
    "Frontend":          ["React/Next.js"],
    "Analytics":         ["Amplitude"],
    "Infrastructure/CDN":["Cloudflare", "AWS"],
    "Server":            ["Next.js", "cloudflare"],
    "CDN":               ["Cloudflare"]
  },
  "Social Links": {
    "twitter":   "https://x.com/OpenAI",
    "youtube":   "https://www.youtube.com/OpenAI",
    "linkedin":  "https://www.linkedin.com/company/openai",
    "github":    "https://github.com/openai",
    "instagram": "https://www.instagram.com/openai/"
  },
  "GitHub URL":          "https://github.com/openai",
  "GitHub Public Repos": 246,
  "GitHub Followers":    120231,
  "Wikipedia URL":       "https://en.wikipedia.org/wiki/OpenAI",
  "Wikipedia Summary":   "OpenAI Global, LLC is an American artificial intelligence (AI) research organization...",
  "Recent News": [
    {
      "title":     "OpenAI's AI Chip Deal With Broadcom Hits $18 Billion Financing Snag",
      "published": "Thu, 07 May 2026 18:09:00 GMT",
      "url":       "https://news.google.com/..."
    }
  ],
  "Emails Found":   [],
  "Contact Email":  null,
  "data_sources":   ["website_html","linkedin_public","wikipedia","github_api","google_news","duckduckgo","opencorporates","http_headers"],
  "data_freshness": "real-time",
  "scraped_at":     "2026-05-08T02:32:37.964072Z"
}
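The field tables use dotted names such as `Tech Stack.Frontend`, while the record itself nests those values under `Tech Stack` and `Social Links`. A small sketch of how a record might be flattened into the dotted form, for example before a custom CSV export, is shown here. The helpers `flatten` and `populated_fields` are hypothetical and not part of the actor.

```python
def flatten(record: dict, parent: str = "") -> dict:
    """Turn nested objects into dotted keys, e.g. 'Tech Stack.Frontend'."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}.{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def populated_fields(record: dict) -> list[str]:
    """List the flattened field names that actually carry a value."""
    return [k for k, v in flatten(record).items() if v not in (None, [], "")]
```

Running `populated_fields` on the example record above would omit `Year Founded`, `Emails Found`, and `Contact Email`, since those are null or empty in that record.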

💰 Pricing

This actor uses pay-per-event pricing: each run incurs a small start fee, and beyond that you pay only for successfully researched company records.

Event	Price
Actor start fee	$0.02 per run
Per company researched	$0.06 per result ($60 per 1,000 companies)

How billing works:

✅ The $0.02 start fee applies once per run regardless of how many companies are processed
✅ Each company domain that returns a successful result is charged at $0.06
✅ Domains that fail to return any data are not charged
✅ You control your total spend by setting a charge limit in your Apify account — the actor stops automatically when your limit is reached
✅ No free trial — pay only for what you use, starting from your first result

Example: Research 100 companies = $0.02 (start) + $6.00 (100 × $0.06) = $6.02 total
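The same arithmetic can be wrapped in a small helper for budgeting. This sketch assumes the two rates from the pricing table above ($0.02 per run, $0.06 per successful company); `estimate_cost` and `max_companies` are hypothetical names, and `Decimal` is used to avoid floating-point rounding on currency values.

```python
from decimal import Decimal

START_FEE = Decimal("0.02")    # charged once per run
PER_COMPANY = Decimal("0.06")  # charged per successfully researched company


def estimate_cost(successful_companies: int) -> Decimal:
    """Total cost of one run that returns the given number of results."""
    return START_FEE + PER_COMPANY * successful_companies


def max_companies(budget: str) -> int:
    """Largest number of successful results a single run can afford."""
    remaining = Decimal(budget) - START_FEE
    return max(0, int(remaining // PER_COMPANY))
```

Because failed domains are not charged, `estimate_cost` gives an upper bound when you pass the number of input domains rather than the number of successful results.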

⚡ Performance & Limits

Companies	Estimated Time
1	~30–90 seconds
10	~5–15 minutes
50	~25–60 minutes
100	~50–120 minutes

All 8 data sources queried in parallel per company — not sequentially
Results pushed to the dataset immediately after each company is processed
Checkpoint saved after every company — restart resumes from last completed domain automatically
Proxy rotated every 5 companies for high-volume runs
Processing time varies by company — well-known companies with Wikipedia and GitHub entries are faster than obscure domains
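The checkpoint behavior described above can be pictured with a minimal sketch. How the actor actually stores its checkpoint is not documented here, so this example makes assumptions: it uses a local JSON file as the store, and `run_with_checkpoint` and the `process` callback are hypothetical names.

```python
import json
from pathlib import Path


def run_with_checkpoint(domains, process, checkpoint_path):
    """Process domains in order, skipping any recorded in the checkpoint file."""
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    results = []
    for domain in domains:
        if domain in done:
            continue  # completed in an earlier run, so not processed again
        results.append(process(domain))
        done.add(domain)
        # Save after every company so an abort loses at most the current one.
        path.write_text(json.dumps(sorted(done)))
    return results
```

If a run is aborted partway through, calling the function again with the same domain list and checkpoint path processes only the domains that had not completed.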

❓ FAQ

Q: What is the minimum I need to spend to use this actor? A: There is no free trial. The minimum charge is $0.02 for starting the actor plus $0.06 for each successfully researched company. A single company costs $0.08 total.

Q: Why do some fields like Year Founded return null for well-known companies? A: Founding year data is sourced from Wikipedia, DuckDuckGo, LinkedIn, and the company website. Some sources may omit this field even for major companies. The actor only populates fields when the data is actually found — it never fabricates or estimates values.

Q: Which fields are guaranteed to be populated? A: domain, Website URL, data_sources, data_freshness, and scraped_at are always present. Company Name, Company Description, Headquarters Location, Total Employees, Wikipedia Summary, Tech Stack, Social Links, and Recent News are highly reliable for well-known companies with public profiles.

Q: Which fields will never have data? A: Fields requiring paid data sources — funding details, revenue growth rate, traffic metrics, market cap, burn rate, runway, and competitor lists — are always null. See the "Fields That Are Never Populated" section above for the complete list.

Q: What happens if a company domain fails to return data? A: The failure is logged and that domain is skipped. No charge is applied for failed domains. The actor continues processing all remaining domains in the batch.

Q: Does the checkpoint feature work if I hit my spending limit? A: Yes. If your Apify spending limit is reached mid-run, the actor saves a checkpoint. On your next run with the same input, it resumes with the first domain after the last successfully processed one, so you are not charged again for already-completed companies.

Q: Can I export results to Excel or CSV? A: Yes. All results are pushed to the Apify dataset, which can be exported to JSON, CSV, Excel, and more directly from the Apify Console after each run.

📜 Changelog

v2.0.0 (Current)

✅ 8 data sources queried in parallel per company
✅ Tech stack detection across 5 categories (Frontend, Analytics, CRM, Infrastructure, Payments)
✅ CDN and server detection from HTTP response headers
✅ GitHub API integration — repos, followers, description, location
✅ Recent news extraction — up to 8 headlines per company
✅ Executive leadership names from DuckDuckGo knowledge graph
✅ Contact email extraction from company website
✅ Checkpoint/resume — restarts continue from last completed domain
✅ Pay-per-event billing — charged per successfully researched company
✅ Spending limit respect — stops automatically when user charge limit reached
✅ Real-time dataset push per company

v1.0.0

Initial release with website and Wikipedia data sources

🏷️ Tags

company research scraper, company intelligence data, business data extractor, company profile scraper, tech stack detector, b2b data enrichment, company intelligence, crm enrichment, lead enrichment, company data scraper, firmographic data, business intelligence scraper

⚖️ Legal & Terms of Use

This actor retrieves publicly available company information from public websites, Wikipedia, GitHub, DuckDuckGo, OpenCorporates, Google News, and LinkedIn public pages — in the same way a regular user browses these platforms.

Please note:

Use extracted company intelligence data only for lawful purposes — sales prospecting, market research, CRM enrichment, investment screening, and academic study are common legitimate uses
Do not use this company research scraper to harvest personal data about individuals or facilitate harassment
Company information belongs to the respective organizations — always verify critical details directly with primary sources before making business decisions
Comply with applicable data protection laws — including GDPR and CCPA — when using extracted company data for outreach
The actor developer is not responsible for decisions made based on extracted company data

🤝 Support & Feedback

Bug report? Contact us via the Apify actor page
Feature request? Post in the Apify Community forum
Loving it? Please leave a ⭐ review — it helps other users find this actor!

Built with ❤️ on Apify
The most complete Company Research Scraper — 8 sources, 30+ fields, tech stack, GitHub, news

💰 $0.02 per run + $0.06 per company · No free trial · Pay only for results
