Company Research Scraper — Deep Company Intelligence Data

Pricing: from $50.00 / 1,000 companies researched

Extract deep company intelligence data from any domain. Get company name, description, employees, tech stack, social links, GitHub stats, Wikipedia summary, executive leadership, recent news, and contact emails — from 8 sources in parallel. $0.06 per company.


Developer: Scrape Pilot (maintained by Community)


🔍 Company Research Scraper — Deep Company Intelligence Data

The most complete Company Research Scraper on Apify. Extract deep company intelligence data from any domain — company name, description, industry, headquarters, employee count, executive leadership, tech stack, social links, GitHub stats, Wikipedia summary, recent news, and contact emails — sourced simultaneously from 8 public data sources per company. No login. No API key. Pay only for results.


🔍 What Is This Actor?

Company Research Scraper is a production-ready Apify actor that extracts comprehensive company intelligence data from any business domain — pulling from 8 public data sources simultaneously per company and merging everything into one clean, structured record.

Provide one or many company domains — openai.com, stripe.com, shopify.com — and receive back a complete company intelligence profile: verified company name, full description, industry, company type, headquarters location, total employee count, executive leadership names, products and services, tech stack (frontend, analytics, CRM, infrastructure, payments), social media links across 6 platforms, GitHub repository stats, Wikipedia summary, recent news headlines, contact emails found on the website, and direct links to all source pages.

This company research scraper runs all 8 data source queries in parallel per company — website HTML, LinkedIn public page, Wikipedia, GitHub API, Google News, DuckDuckGo, OpenCorporates, and HTTP header analysis — delivering the most complete company intelligence data record available on Apify, faster than any sequential scraper.


🚀 Why Use This Company Research Scraper?

Feature | This Actor | Manual Research | ZoomInfo / Clearbit | Other Scrapers
8 sources per company, in parallel | ❌ Hours | ✅ Expensive | ❌ 1–2 sources
Company description + Wikipedia | ✅ Slow | ⚠️
Tech stack detection | ✅ 5 categories
GitHub stats
Recent news (up to 8 headlines)
Executive leadership names
Contact email extraction | ⚠️
Social links (6 platforms) | ⚠️ | ⚠️
Bulk domain input | ⚠️
Checkpoint resume on abort | N/A | N/A
No subscription or API key | N/A

Bottom line: This company research scraper is the only actor that aggregates company intelligence data from 8 public sources in parallel — delivering tech stack, GitHub stats, recent news, executive leadership, and contact emails alongside the standard company profile fields.


📡 Data Sources

Every company record is built from 8 public sources queried simultaneously:

| Source | Data Extracted |
| --- | --- |
| Company Website | Meta description, page title, JSON-LD schema, social links, emails, tech stack, products/services, copyright year, about page text |
| LinkedIn Public Page | Company name, employee count, founded year, headquarters, industry, about text |
| Wikipedia | Full company description, founded year, headquarters location, Wikipedia URL |
| GitHub API | Organization name, public repo count, followers, description, location, blog, email, creation date |
| Google News RSS | Up to 8 recent news headlines with publication date and URL |
| DuckDuckGo Instant Answer | Company abstract, revenue, employee count, industry, company type, leadership names |
| OpenCorporates | Incorporation date, registered address, company type, OpenCorporates URL |
| HTTP Headers | Server technology, CDN provider (Cloudflare, AWS, Vercel, Fastly) |

All 8 sources are queried in parallel per company, not sequentially, so the time to build a profile is bounded by the slowest single source rather than by the sum of all eight.
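The fan-out-and-merge pattern described above can be sketched with `asyncio.gather`. This is an illustration under stated assumptions, not the actor's actual code: `fetch_source` is a hypothetical stand-in that sleeps instead of making a network request, and the "first non-null value wins" merge rule is an assumption. Here `data_sources` lists only the sources that responded, which may differ from how the actor fills that field.

```python
import asyncio


async def fetch_source(name: str, delay: float, fields) -> tuple:
    """Hypothetical fetcher: sleeps to mimic network latency, then returns data."""
    await asyncio.sleep(delay)
    if fields is None:
        raise ConnectionError(f"{name} unavailable")
    return name, fields


async def research_company(domain: str) -> dict:
    lookups = [
        fetch_source("wikipedia", 0.05, {"Company Name": "Example Inc", "Year Founded": 2015}),
        fetch_source("linkedin_public", 0.03, None),  # simulated failure
        fetch_source("github_api", 0.02, {"Company Name": "example", "GitHub Public Repos": 12}),
        fetch_source("http_headers", 0.01, {"Year Founded": None}),
    ]
    # All lookups run concurrently; return_exceptions=True keeps one failing
    # source from cancelling the others.
    results = await asyncio.gather(*lookups, return_exceptions=True)

    record = {"domain": domain, "data_sources": []}
    for result in results:
        if isinstance(result, Exception):
            continue
        name, fields = result
        record["data_sources"].append(name)
        for key, value in fields.items():
            # Assumed merge rule: the first non-null value wins, later
            # sources only fill gaps.
            if value is not None and record.get(key) is None:
                record[key] = value
    return record


record = asyncio.run(research_company("example.com"))
print(record)
```

Because the lookups overlap, the simulated run takes roughly as long as the slowest fetcher (0.05 s here) instead of the 0.11 s a sequential loop would need.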


🎯 Use Cases

💼 B2B Sales & Account-Based Marketing

  • Build enriched prospect lists by running this company research scraper against target domain lists
  • Automatically populate CRM records with company description, employee count, industry, and contact email
  • Identify tech stack usage across prospects — find companies using Salesforce, HubSpot, or Stripe
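As a rough illustration of the tech stack use case, the snippet below filters exported records for a given technology. It assumes each record carries a `Tech Stack` object whose values are lists of tool names, as documented under Output Fields; the helper name `uses_technology` and the sample records are hypothetical.

```python
def uses_technology(record: dict, technology: str) -> bool:
    """Return True if any Tech Stack category mentions the technology."""
    stack = record.get("Tech Stack") or {}
    needle = technology.lower()
    return any(
        needle in str(tool).lower()
        for tools in stack.values()
        for tool in (tools or [])
    )


# Hypothetical sample records shaped like the actor's output.
records = [
    {"domain": "a.example", "Tech Stack": {"Payments": ["Stripe"], "Frontend": ["React/Next.js"]}},
    {"domain": "b.example", "Tech Stack": {"Marketing/CRM": ["HubSpot"]}},
    {"domain": "c.example", "Tech Stack": None},
]

stripe_users = [r["domain"] for r in records if uses_technology(r, "stripe")]
print(stripe_users)  # ['a.example']
```

The match is a case-insensitive substring test, so `"react"` also matches the combined label `"React/Next.js"`.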

🔍 Competitive Intelligence & Market Mapping

  • Extract company intelligence data on competitors — description, employees, leadership, tech stack, recent news
  • Monitor competitor news automatically by scheduling regular runs on competitor domains
  • Map the technology landscape of an industry by comparing tech stacks across multiple companies

🤖 AI & Data Pipeline Integrations

  • Feed structured company intelligence data into AI research assistants, CRM enrichment pipelines, or RAG systems
  • Build automated company profiling workflows for investment screening, partnership evaluation, or vendor assessment
  • Use GitHub stats and tech stack data to qualify engineering-focused companies for developer tools outreach

📊 Investment Research & Due Diligence

  • Run rapid first-pass due diligence on potential investment targets using publicly available company data
  • Extract executive leadership names, employee counts, and industry classification for portfolio screening
  • Collect recent news for any company to surface material events before deeper research

🏢 Partnership & Vendor Evaluation

  • Research potential partners or vendors at scale before initiating contact
  • Compare company size, tech stack, and industry focus across a shortlist of candidates
  • Find contact emails from company websites for first outreach

🎓 Academic & Business Research

  • Build structured datasets of company profiles for market structure, innovation, or technology adoption research
  • Collect GitHub activity data alongside company metadata for studies on open-source software practices
  • Analyze tech stack adoption patterns across industries using structured company intelligence data

⚙️ Input Parameters

{
  "domains": [
    "openai.com",
    "stripe.com",
    "shopify.com"
  ],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}

| Parameter | Type | Description |
| --- | --- | --- |
| domains | array or string | Company domains to research, e.g. "openai.com", "stripe.com". A newline-separated string is also accepted. The https:// prefix is optional and handled automatically |
| proxyConfiguration | object | Apify proxy config; a residential proxy is recommended for LinkedIn and high-volume runs |

Tip: You can enter domains with or without https:// or www. — all formats are normalized automatically. Mix any combination of domain formats in the same run.
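If you want to clean a domain list before submitting it, the normalization described above can be approximated as follows. This is a sketch of one plausible approach, not the actor's own logic: `normalize_domain` and `parse_domains` are hypothetical helper names, and dropping duplicates is an added assumption.

```python
from urllib.parse import urlsplit


def normalize_domain(raw: str) -> str:
    """Reduce a URL or bare host to a lowercase domain without 'www.'."""
    value = raw.strip().lower()
    if "://" not in value:
        value = "//" + value  # lets urlsplit treat the text as a host
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def parse_domains(domains) -> list:
    """Accept a list or a newline-separated string, as the input schema allows."""
    items = domains.splitlines() if isinstance(domains, str) else list(domains)
    seen = {}
    for item in items:
        domain = normalize_domain(item)
        if domain:
            seen.setdefault(domain, None)  # keeps first-seen order, drops repeats
    return list(seen)


print(parse_domains("https://www.OpenAI.com/\nstripe.com\n\nhttp://shopify.com/pricing"))
```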


📋 Output Fields

Every record from this company research scraper contains the following fields, each with a reliability rating. Fields marked ✅ (Always, High, or Medium) return data consistently. Fields marked ⚠️ Source-dependent are populated only when a public source exposes the value.

🏢 Core Company Fields

| Field | Reliability | Description | Example |
| --- | --- | --- | --- |
| domain | ✅ Always | Normalized company domain | "openai.com" |
| Company Name | ✅ High | Company display name | "OpenAI" |
| Company Description | ✅ High | Full description from Wikipedia or website (max 1500 chars) | "OpenAI is an American AI research organization..." |
| Website URL | ✅ Always | Direct company website URL | "https://openai.com" |
| LinkedIn Profile URL | ✅ High | LinkedIn company page URL | "https://www.linkedin.com/company/openai" |
| Industry/Vertical | ✅ Medium | Industry classification from LinkedIn | "Industry Research Services" |
| Company Type | ✅ Medium | Private / Public / Nonprofit | "Private" |
| Year Founded | ⚠️ Source-dependent | Founding year when available | 2015 |
| Headquarters Location | ✅ High | Primary office location | "San Francisco" |
| Total Employees | ✅ High | Employee count from LinkedIn | 8644 |
| Executive Leadership | ⚠️ Source-dependent | Founder and executive names | "Founders: Sam Altman, Ilya Sutskever..." |
| Products & Services | ✅ Medium | Key products and services from website | "Platform Overview, Solutions" |

💻 Tech Stack Fields

| Field | Reliability | Description | Example |
| --- | --- | --- | --- |
| Tech Stack.Frontend | ✅ Medium | Frontend frameworks detected | ["React/Next.js", "TypeScript"] |
| Tech Stack.Analytics | ✅ Medium | Analytics tools detected | ["Google Analytics", "Amplitude"] |
| Tech Stack.Marketing/CRM | ✅ Medium | CRM and marketing tools | ["HubSpot", "Salesforce"] |
| Tech Stack.Infrastructure/CDN | ✅ High | Cloud and CDN providers | ["Cloudflare", "AWS"] |
| Tech Stack.Payments | ⚠️ Source-dependent | Payment processors detected | ["Stripe"] |
| Tech Stack.Server | ✅ Medium | Server technology from headers | ["Next.js", "cloudflare"] |
| Tech Stack.CDN | ✅ High | CDN from HTTP headers | ["Cloudflare"] |
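To give a sense of how CDN detection from HTTP response headers can work, here is a minimal heuristic. The header names below are ones these providers commonly emit, but the exact rules the actor applies are not documented here, so treat both the `detect_cdn` function and its rule set as assumptions.

```python
def detect_cdn(headers: dict) -> list:
    """Guess CDN providers from response headers (case-insensitive)."""
    h = {str(k).lower(): str(v).lower() for k, v in headers.items()}
    found = []
    if "cf-ray" in h or h.get("server") == "cloudflare":
        found.append("Cloudflare")
    if "x-amz-cf-id" in h or "cloudfront" in h.get("via", ""):
        found.append("AWS")  # Amazon CloudFront
    if "x-vercel-id" in h or h.get("server") == "vercel":
        found.append("Vercel")
    if "x-fastly-request-id" in h or h.get("x-served-by", "").startswith("cache-"):
        found.append("Fastly")
    return found


print(detect_cdn({"Server": "cloudflare", "CF-RAY": "8a1b2c3d4e5f-SJC"}))  # ['Cloudflare']
print(detect_cdn({"Via": "1.1 abc.cloudfront.net (CloudFront)"}))          # ['AWS']
```

Header-based detection only sees the outermost layer, so a site fronted by one CDN may hide other infrastructure behind it.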

🔗 Social & Contact Fields

| Field | Reliability | Description | Example |
| --- | --- | --- | --- |
| Social Links.linkedin | ✅ High | LinkedIn company URL | "https://www.linkedin.com/company/openai" |
| Social Links.twitter | ✅ High | Twitter/X profile URL | "https://x.com/OpenAI" |
| Social Links.youtube | ✅ Medium | YouTube channel URL | "https://www.youtube.com/OpenAI" |
| Social Links.github | ✅ Medium | GitHub organization URL | "https://github.com/openai" |
| Social Links.facebook | ⚠️ Source-dependent | Facebook page URL | "https://www.facebook.com/openai" |
| Social Links.instagram | ⚠️ Source-dependent | Instagram profile URL | "https://www.instagram.com/openai" |
| Emails Found | ⚠️ Source-dependent | Emails found on website (up to 5) | ["press@company.com"] |
| Contact Email | ⚠️ Source-dependent | Best contact email identified | "press@company.com" |

📊 GitHub Fields

| Field | Reliability | Description | Example |
| --- | --- | --- | --- |
| GitHub URL | ✅ Medium | GitHub org page URL | "https://github.com/openai" |
| GitHub Public Repos | ✅ Medium | Number of public repositories | 246 |
| GitHub Followers | ✅ Medium | GitHub organization followers | 120231 |
| GitHub Description | ⚠️ Source-dependent | GitHub org bio | "AI research and deployment company" |

📰 News & Reference Fields

| Field | Reliability | Description | Example |
| --- | --- | --- | --- |
| Wikipedia URL | ✅ High | Wikipedia article URL | "https://en.wikipedia.org/wiki/OpenAI" |
| Wikipedia Summary | ✅ High | Full Wikipedia extract | "OpenAI is an American AI..." |
| Recent News | ✅ High | Up to 8 recent news items with title, date, and URL | See example below |
| OpenCorporates URL | ⚠️ Source-dependent | OpenCorporates registry link | "https://opencorporates.com/..." |
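News items come from an RSS feed, where dates use the RFC 2822 style seen in the `published` field (for example "Thu, 07 May 2026 18:09:00 GMT"). The sketch below shows one way to turn such a feed into the title/published/url shape and keep the newest 8 items. The sample XML, the `parse_news` helper, and the newest-first ordering are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed used only for demonstration.
SAMPLE_RSS = (
    '<rss version="2.0"><channel>'
    "<item><title>Older story</title><link>https://news.example/1</link>"
    "<pubDate>Wed, 06 May 2026 09:00:00 GMT</pubDate></item>"
    "<item><title>Newer story</title><link>https://news.example/2</link>"
    "<pubDate>Thu, 07 May 2026 18:09:00 GMT</pubDate></item>"
    "</channel></rss>"
)


def parse_news(rss_xml: str, limit: int = 8) -> list:
    """Return up to `limit` news items, newest first."""
    news = []
    for item in ET.fromstring(rss_xml).iter("item"):
        published = item.findtext("pubDate")
        if not published:
            continue  # undated items cannot be ordered, so this sketch skips them
        news.append({
            "title": item.findtext("title", default=""),
            "published": published.strip(),
            "url": item.findtext("link", default=""),
        })
    news.sort(key=lambda n: parsedate_to_datetime(n["published"]), reverse=True)
    return news[:limit]


news = parse_news(SAMPLE_RSS)
print(news[0]["title"])
print(parsedate_to_datetime(news[0]["published"]).isoformat())
```

The same `parsedate_to_datetime` call is useful downstream if you want to convert `published` values to ISO 8601 so they sort alongside `scraped_at`.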

⏱️ Meta Fields

| Field | Description | Example |
| --- | --- | --- |
| data_sources | List of all 8 sources queried | ["website_html", "linkedin_public", "wikipedia", ...] |
| data_freshness | Data recency status | "real-time" |
| scraped_at | Extraction timestamp (ISO 8601 UTC) | "2024-03-15T10:30:00Z" |

❌ Fields That Are Never Populated

The following fields appear in the record schema but are always null — they require paid data sources not included in this actor:

Company Stage, Additional Locations, Revenue Growth Rate, Revenue Per Employee, Funding Raised to Date, Latest Funding Round, Latest Funding Amount, Latest Funding Date, Major Investors, Annual Profitability, EBITDA, Burn Rate, Runway, Cash Position, Market Cap, Monthly Website Visits, Global Traffic Rank, Bounce Rate, Avg Visit Duration, Pages Per Visit, Direct Competitors, Indirect Competitors

Note: Total Annual Revenue is occasionally populated from DuckDuckGo (e.g. "US$13.1 billion (2025)"), but this is not guaranteed and depends on DuckDuckGo's knowledge graph coverage.


📦 Example Output

Input:

{ "domains": ["openai.com"] }

Output:

{
  "domain": "openai.com",
  "Company Name": "OpenAI",
  "Company Description": "OpenAI Global, LLC is an American artificial intelligence (AI) research organization consisting of a for-profit public benefit corporation (PBC) and a nonprofit foundation, headquartered in San Francisco...",
  "Website URL": "https://openai.com",
  "LinkedIn Profile URL": "https://www.linkedin.com/company/openai",
  "Industry/Vertical": "Industry Research Services",
  "Company Type": "Private",
  "Year Founded": null,
  "Headquarters Location": "San Francisco",
  "Total Employees": 8644,
  "Executive Leadership": "Founders: Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman...",
  "Products & Services": "Platform Overview, Solutions",
  "Total Annual Revenue": "US$13.1 billion (2025)",
  "Tech Stack": {
    "Frontend": ["React/Next.js"],
    "Analytics": ["Amplitude"],
    "Infrastructure/CDN": ["Cloudflare", "AWS"],
    "Server": ["Next.js", "cloudflare"],
    "CDN": ["Cloudflare"]
  },
  "Social Links": {
    "twitter": "https://x.com/OpenAI",
    "youtube": "https://www.youtube.com/OpenAI",
    "linkedin": "https://www.linkedin.com/company/openai",
    "github": "https://github.com/openai",
    "instagram": "https://www.instagram.com/openai/"
  },
  "GitHub URL": "https://github.com/openai",
  "GitHub Public Repos": 246,
  "GitHub Followers": 120231,
  "Wikipedia URL": "https://en.wikipedia.org/wiki/OpenAI",
  "Wikipedia Summary": "OpenAI Global, LLC is an American artificial intelligence (AI) research organization...",
  "Recent News": [
    {
      "title": "OpenAI's AI Chip Deal With Broadcom Hits $18 Billion Financing Snag",
      "published": "Thu, 07 May 2026 18:09:00 GMT",
      "url": "https://news.google.com/..."
    }
  ],
  "Emails Found": [],
  "Contact Email": null,
  "data_sources": ["website_html", "linkedin_public", "wikipedia", "github_api", "google_news", "duckduckgo", "opencorporates", "http_headers"],
  "data_freshness": "real-time",
  "scraped_at": "2026-05-08T02:32:37.964072Z"
}
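Records like the one above are nested (`Tech Stack` and `Social Links` are objects, `Recent News` is a list of objects), which can be awkward in spreadsheets or CRM imports. The following sketch flattens a record into dotted column names matching the field tables (for example `Tech Stack.Frontend`). The `flatten` helper, the "; " list separator, and the trimmed sample record are assumptions for illustration.

```python
import csv
import io
import json


def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted keys; join simple lists into one cell."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            if all(isinstance(v, str) for v in value):
                flat[name] = "; ".join(value)
            else:
                flat[name] = json.dumps(value)  # keep structured lists as JSON text
        else:
            flat[name] = value
    return flat


# Trimmed, hypothetical record in the documented shape.
record = {
    "domain": "openai.com",
    "Year Founded": None,
    "Tech Stack": {"Frontend": ["React/Next.js"], "Infrastructure/CDN": ["Cloudflare", "AWS"]},
    "Social Links": {"github": "https://github.com/openai"},
    "Recent News": [{"title": "Example headline", "url": "https://news.example/1"}],
    "Emails Found": [],
}

row = flatten(record)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = buffer.getvalue()
print(csv_text)
```

Null values such as `Year Founded` are written as empty cells, which keeps "not found" distinguishable from a fabricated default.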

💰 Pricing

This actor uses pay-per-event pricing — you only pay for successfully researched company records.

| Event | Price |
| --- | --- |
| Actor start fee | $0.02 per run |
| Per company researched | $0.06 per result ($60 per 1,000 companies) |

How billing works:

  • ✅ The $0.02 start fee applies once per run regardless of how many companies are processed
  • ✅ Each company domain that returns a successful result is charged at $0.06
  • ✅ Domains that fail to return any data are not charged
  • ✅ You control your total spend by setting a charge limit in your Apify account — the actor stops automatically when your limit is reached
  • No free trial — pay only for what you use, starting from your first result

Example: Research 100 companies = $0.02 (start) + $6.00 (100 × $0.06) = $6.02 total
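The billing arithmetic can be expressed as a small helper for budgeting a run. The two prices come from the table above; the function names are hypothetical, and `Decimal` is used so the cents add up exactly.

```python
from decimal import Decimal

START_FEE = Decimal("0.02")    # charged once per run
PER_COMPANY = Decimal("0.06")  # charged per successfully researched company


def estimate_cost(successful_companies: int) -> Decimal:
    """Total charge for one run with the given number of successful results."""
    return START_FEE + PER_COMPANY * successful_companies


def max_companies(budget: Decimal) -> int:
    """Largest number of successful results that fits within a single-run budget."""
    return max(0, int((budget - START_FEE) // PER_COMPANY))


print(estimate_cost(1))               # 0.08
print(estimate_cost(100))             # 6.02
print(max_companies(Decimal("10")))   # 166
```

Since failed domains are not charged, `estimate_cost` gives an upper bound when you pass the full size of your input list.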


⚡ Performance & Limits

| Companies | Estimated Time |
| --- | --- |
| 1 | ~30–90 seconds |
| 10 | ~5–15 minutes |
| 50 | ~25–60 minutes |
| 100 | ~50–120 minutes |

  • All 8 data sources are queried in parallel per company, not sequentially
  • Results are pushed to the dataset immediately after each company is processed
  • A checkpoint is saved after every company, so a restart resumes from the last completed domain automatically
  • The proxy is rotated every 5 companies for high-volume runs
  • Processing time varies by company: well-known companies with Wikipedia and GitHub entries are faster than obscure domains
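The checkpoint-and-resume behaviour in the list above can be illustrated with a small loop that records each finished domain and skips it on the next run. This is only a sketch: it uses a local JSON file as a stand-in for whatever storage the actor uses, and `run_batch`, `fake_research`, and the `abort_after` parameter are hypothetical names added to simulate an interrupted run.

```python
import json
import tempfile
from pathlib import Path


def run_batch(domains, checkpoint: Path, research, abort_after=None):
    """Process domains in order, skipping any already listed in the checkpoint."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    results = []
    for domain in domains:
        if domain in done:
            continue  # finished in an earlier run, so not processed again
        if abort_after is not None and len(results) >= abort_after:
            break  # simulate the run being aborted mid-batch
        results.append(research(domain))
        done.append(domain)
        checkpoint.write_text(json.dumps(done))  # save after every company
    return results


def fake_research(domain: str) -> dict:
    # Stand-in for the real multi-source lookup.
    return {"domain": domain}


domains = ["openai.com", "stripe.com", "shopify.com"]
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"
    first_run = run_batch(domains, checkpoint, fake_research, abort_after=2)
    second_run = run_batch(domains, checkpoint, fake_research)

print([r["domain"] for r in first_run])   # ['openai.com', 'stripe.com']
print([r["domain"] for r in second_run])  # ['shopify.com']
```

Writing the checkpoint after every company, rather than at the end of the batch, is what limits the loss from an abort to at most one in-flight domain.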

❓ FAQ

Q: What is the minimum I need to spend to use this actor?
A: There is no free trial. The minimum charge is $0.02 for starting the actor plus $0.06 for each successfully researched company. A single company costs $0.08 total.

Q: Why do some fields like Year Founded return null for well-known companies?
A: Founding year data is sourced from Wikipedia, DuckDuckGo, LinkedIn, and the company website. Some sources may omit this field even for major companies. The actor only populates fields when the data is actually found; it never fabricates or estimates values.

Q: Which fields are guaranteed to be populated?
A: domain, Website URL, data_sources, data_freshness, and scraped_at are always present. Company Name, Company Description, Headquarters Location, Total Employees, Wikipedia Summary, Tech Stack, Social Links, and Recent News are highly reliable for well-known companies with public profiles.

Q: Which fields will never have data?
A: Fields requiring paid data sources (funding details, revenue growth rate, traffic metrics, market cap, burn rate, runway, and competitor lists) are always null. See the "Fields That Are Never Populated" section above for the complete list.

Q: What happens if a company domain fails to return data?
A: The failure is logged and that domain is skipped. No charge is applied for failed domains. The actor continues processing all remaining domains in the batch.

Q: Does the checkpoint feature work if I hit my spending limit?
A: Yes. If your Apify spending limit is reached mid-run, the actor saves a checkpoint. On your next run with the same input, it resumes from the last successfully processed domain, and you are not charged again for already-completed companies.

Q: Can I export results to Excel or CSV?
A: Yes. All results are pushed to the Apify dataset, which can be exported to JSON, CSV, Excel, and more directly from the Apify Console after each run.
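For pipelines that need results programmatically rather than through a Console export, the `apify-client` Python package can start a run and read its dataset. The sketch below is hedged accordingly: `ACTOR_ID` is a placeholder rather than this actor's real identifier (copy the real one from the actor's Apify Store page), `build_run_input` and `run_actor` are hypothetical helper names, and an Apify account token is required to call the platform.

```python
ACTOR_ID = "<username>/<actor-name>"  # placeholder, not the real actor ID


def build_run_input(domains: list, residential: bool = True) -> dict:
    """Build the input object described under Input Parameters."""
    proxy = {"useApifyProxy": True}
    if residential:
        proxy["apifyProxyGroups"] = ["RESIDENTIAL"]
    return {"domains": domains, "proxyConfiguration": proxy}


def run_actor(token: str, domains: list) -> list:
    """Start a run, wait for it to finish, and return all dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(domains))
    if run is None:
        raise RuntimeError("Actor run did not complete")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["openai.com", "stripe.com"])
print(run_input)
# records = run_actor("<your-apify-token>", ["openai.com", "stripe.com"])
```

The final call is left commented out because it needs a valid token and a real actor ID, and it incurs the charges described under Pricing.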


📜 Changelog

v2.0.0 (Current)

  • ✅ 8 data sources queried in parallel per company
  • ✅ Tech stack detection across 5 categories (Frontend, Analytics, CRM, Infrastructure, Payments)
  • ✅ CDN and server detection from HTTP response headers
  • ✅ GitHub API integration — repos, followers, description, location
  • ✅ Recent news extraction — up to 8 headlines per company
  • ✅ Executive leadership names from DuckDuckGo knowledge graph
  • ✅ Contact email extraction from company website
  • ✅ Checkpoint/resume — restarts continue from last completed domain
  • ✅ Pay-per-event billing — charged per successfully researched company
  • ✅ Spending limit respect — stops automatically when user charge limit reached
  • ✅ Real-time dataset push per company

v1.0.0

  • Initial release with website and Wikipedia data sources

🏷️ Tags

company research scraper, company intelligence data, business data extractor, company profile scraper, tech stack detector, b2b data enrichment, company intelligence, crm enrichment, lead enrichment, company data scraper, firmographic data, business intelligence scraper


This actor retrieves publicly available company information from public websites, Wikipedia, GitHub, DuckDuckGo, OpenCorporates, Google News, and LinkedIn public pages — in the same way a regular user browses these platforms.

Please note:

  • Use extracted company intelligence data only for lawful purposes — sales prospecting, market research, CRM enrichment, investment screening, and academic study are common legitimate uses
  • Do not use this company research scraper to harvest personal data about individuals or facilitate harassment
  • Company information belongs to the respective organizations — always verify critical details directly with primary sources before making business decisions
  • Comply with applicable data protection laws — including GDPR and CCPA — when using extracted company data for outreach
  • The actor developer is not responsible for decisions made based on extracted company data

🤝 Support & Feedback

  • Bug report? Contact us via the Apify actor page
  • Feature request? Post in the Apify Community forum
  • Loving it? Please leave a ⭐ review — it helps other users find this actor!

Built with ❤️ on Apify
The most complete Company Research Scraper — 8 sources, 30+ fields, tech stack, GitHub, news

💰 $0.02 per run + $0.06 per company · No free trial · Pay only for results