Company Research Scraper — Deep Company Intelligence Data
Pricing
from $50.00 / 1,000 companies researched
Extract deep company intelligence data from any domain. Get company name, description, employees, tech stack, social links, GitHub stats, Wikipedia summary, executive leadership, recent news, and contact emails — from 8 sources in parallel. $0.06 per company.
Developer
Scrape Pilot
🔍 Company Research Scraper — Deep Company Intelligence Data
The most complete Company Research Scraper on Apify. Extract deep company intelligence data from any domain — company name, description, industry, headquarters, employee count, executive leadership, tech stack, social links, GitHub stats, Wikipedia summary, recent news, and contact emails — sourced simultaneously from 8 public data sources per company. No login. No API key. Pay only for results.
📌 Table of Contents
- What Is This Actor?
- Why Use This Company Research Scraper?
- Data Sources
- Use Cases
- Input Parameters
- Output Fields
- Example Output
- Pricing
- Performance & Limits
- FAQ
- Changelog
- Legal & Terms of Use
🔍 What Is This Actor?
Company Research Scraper is a production-ready Apify actor that extracts comprehensive company intelligence data from any business domain — pulling from 8 public data sources simultaneously per company and merging everything into one clean, structured record.
Provide one or many company domains — openai.com, stripe.com, shopify.com — and receive back a complete company intelligence profile: verified company name, full description, industry, company type, headquarters location, total employee count, executive leadership names, products and services, tech stack (frontend, analytics, CRM, infrastructure, payments), social media links across 6 platforms, GitHub repository stats, Wikipedia summary, recent news headlines, contact emails found on the website, and direct links to all source pages.
This company research scraper runs all 8 data source queries in parallel per company — website HTML, LinkedIn public page, Wikipedia, GitHub API, Google News, DuckDuckGo, OpenCorporates, and HTTP header analysis — delivering the most complete company intelligence data record available on Apify, faster than any sequential scraper.
🚀 Why Use This Company Research Scraper?
| Feature | This Actor | Manual Research | ZoomInfo / Clearbit | Other Scrapers |
|---|---|---|---|---|
| 8 sources per company — parallel | ✅ | ❌ Hours | ✅ Expensive | ❌ 1–2 sources |
| Company description + Wikipedia | ✅ | ✅ Slow | ✅ | ⚠️ |
| Tech stack detection | ✅ 5 categories | ❌ | ✅ | ❌ |
| GitHub stats | ✅ | ❌ | ❌ | ❌ |
| Recent news (up to 8 headlines) | ✅ | ❌ | ❌ | ❌ |
| Executive leadership names | ✅ | ✅ | ✅ | ❌ |
| Contact email extraction | ✅ | ❌ | ✅ | ⚠️ |
| Social links (6 platforms) | ✅ | ❌ | ⚠️ | ⚠️ |
| Bulk domain input | ✅ | ❌ | ✅ | ⚠️ |
| Checkpoint resume on abort | ✅ | N/A | N/A | ❌ |
| No subscription or API key | ✅ | N/A | ❌ | ✅ |
Bottom line: This company research scraper is the only actor that aggregates company intelligence data from 8 public sources in parallel — delivering tech stack, GitHub stats, recent news, executive leadership, and contact emails alongside the standard company profile fields.
📡 Data Sources
Every company record is built from 8 public sources queried simultaneously:
| Source | Data Extracted |
|---|---|
| Company Website | Meta description, page title, JSON-LD schema, social links, emails, tech stack, products/services, copyright year, about page text |
| LinkedIn Public Page | Company name, employee count, founded year, headquarters, industry, about text |
| Wikipedia | Full company description, founded year, headquarters location, Wikipedia URL |
| GitHub API | Organization name, public repo count, followers, description, location, blog, email, creation date |
| Google News RSS | Up to 8 recent news headlines with publication date and URL |
| DuckDuckGo Instant Answer | Company abstract, revenue, employee count, industry, company type, leadership names |
| OpenCorporates | Incorporation date, registered address, company type, OpenCorporates URL |
| HTTP Headers | Server technology, CDN provider (Cloudflare, AWS, Vercel, Fastly) |
All 8 sources are queried in parallel per company — not sequentially — delivering a complete profile as fast as the slowest single source.
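The parallel pattern described above can be sketched with `asyncio.gather`. This is only an illustration, not the actor's code: `fetch_source` is a hypothetical stand-in that sleeps instead of calling a real source, and the "first source to supply a field wins" merge rule is an assumption.

```python
import asyncio
import time

# Hypothetical stand-in for a real source lookup: it sleeps, then returns a partial record.
async def fetch_source(name: str, delay: float, fields: dict) -> dict:
    await asyncio.sleep(delay)
    return {"source": name, "fields": fields}

async def research_company(domain: str) -> dict:
    tasks = [
        fetch_source("wikipedia", 0.05, {"Company Name": "Example Corp"}),
        fetch_source("github_api", 0.10, {"GitHub Public Repos": 12}),
        fetch_source("http_headers", 0.02, {"CDN": ["Cloudflare"]}),
    ]
    # return_exceptions=True keeps one failing source from discarding the others.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    record = {"domain": domain, "data_sources": []}
    for result in results:
        if isinstance(result, Exception):
            continue
        record["data_sources"].append(result["source"])
        for key, value in result["fields"].items():
            record.setdefault(key, value)  # assumed rule: first source to supply a field wins
    return record

start = time.perf_counter()
record = asyncio.run(research_company("example.com"))
elapsed = time.perf_counter() - start
print(record["data_sources"], f"{elapsed:.2f}s")
```

Because the three coroutines wait concurrently, the elapsed time tracks the slowest delay (0.10 s here) rather than the sum of all three.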
🎯 Use Cases
💼 B2B Sales & Account-Based Marketing
- Build enriched prospect lists by running this company research scraper against target domain lists
- Automatically populate CRM records with company description, employee count, industry, and contact email
- Identify tech stack usage across prospects — find companies using Salesforce, HubSpot, or Stripe
🔍 Competitive Intelligence & Market Mapping
- Extract company intelligence data on competitors — description, employees, leadership, tech stack, recent news
- Monitor competitor news automatically by scheduling regular runs on competitor domains
- Map the technology landscape of an industry by comparing tech stacks across multiple companies
🤖 AI & Data Pipeline Integrations
- Feed structured company intelligence data into AI research assistants, CRM enrichment pipelines, or RAG systems
- Build automated company profiling workflows for investment screening, partnership evaluation, or vendor assessment
- Use GitHub stats and tech stack data to qualify engineering-focused companies for developer tools outreach
📊 Investment Research & Due Diligence
- Run rapid first-pass due diligence on potential investment targets using publicly available company data
- Extract executive leadership names, employee counts, and industry classification for portfolio screening
- Collect recent news for any company to surface material events before deeper research
🏢 Partnership & Vendor Evaluation
- Research potential partners or vendors at scale before initiating contact
- Compare company size, tech stack, and industry focus across a shortlist of candidates
- Find contact emails from company websites for first outreach
🎓 Academic & Business Research
- Build structured datasets of company profiles for market structure, innovation, or technology adoption research
- Collect GitHub activity data alongside company metadata for studies on open-source software practices
- Analyze tech stack adoption patterns across industries using structured company intelligence data
⚙️ Input Parameters
{
  "domains": ["openai.com", "stripe.com", "shopify.com"],
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
| Parameter | Type | Description |
|---|---|---|
| `domains` | array or string | Company domains to research, e.g. "openai.com", "stripe.com". A newline-separated string is also accepted. The https:// prefix is optional and handled automatically. |
| `proxyConfiguration` | object | Apify proxy config. A residential proxy is recommended for LinkedIn and high-volume runs. |
Tip: You can enter domains with or without `https://` or `www.`; all formats are normalized automatically. You can mix any combination of domain formats in the same run.
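For readers who want to pre-clean a domain list before submitting it, here is a minimal sketch of that kind of normalization. The function name `normalize_domains` and its exact rules (lowercasing, dropping paths, de-duplicating) are assumptions for illustration; the actor's own normalization logic is not published in this README.

```python
from urllib.parse import urlparse

def normalize_domains(domains) -> list[str]:
    """Accept a list or a newline-separated string; return bare, de-duplicated domains."""
    if isinstance(domains, str):
        domains = domains.splitlines()
    cleaned = []
    for raw in domains:
        raw = raw.strip()
        if not raw:
            continue
        # urlparse only fills netloc when a scheme is present, so add one if it is missing.
        host = urlparse(raw if "://" in raw else f"https://{raw}").netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in cleaned:
            cleaned.append(host)
    return cleaned

print(normalize_domains("https://www.OpenAI.com/about\nstripe.com\nwww.shopify.com"))
# prints ['openai.com', 'stripe.com', 'shopify.com']
```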
📋 Output Fields
Every record from this company research scraper contains the following fields. Fields marked ✅ (Always, High, or Medium) usually return data, with consistency decreasing from Always to Medium. Fields marked ⚠️ Source-dependent are populated only when a public source makes the data available.
🏢 Core Company Fields
| Field | Reliability | Description | Example |
|---|---|---|---|
| `domain` | ✅ Always | Normalized company domain | "openai.com" |
| `Company Name` | ✅ High | Company display name | "OpenAI" |
| `Company Description` | ✅ High | Full description from Wikipedia or website (max 1500 chars) | "OpenAI is an American AI research organization..." |
| `Website URL` | ✅ Always | Direct company website URL | "https://openai.com" |
| `LinkedIn Profile URL` | ✅ High | LinkedIn company page URL | "https://www.linkedin.com/company/openai" |
| `Industry/Vertical` | ✅ Medium | Industry classification from LinkedIn | "Industry Research Services" |
| `Company Type` | ✅ Medium | Private / Public / Nonprofit | "Private" |
| `Year Founded` | ⚠️ Source-dependent | Founding year when available | 2015 |
| `Headquarters Location` | ✅ High | Primary office location | "San Francisco" |
| `Total Employees` | ✅ High | Employee count from LinkedIn | 8644 |
| `Executive Leadership` | ⚠️ Source-dependent | Founder and executive names | "Founders: Sam Altman, Ilya Sutskever..." |
| `Products & Services` | ✅ Medium | Key products and services from website | "Platform Overview, Solutions" |
💻 Tech Stack Fields
| Field | Reliability | Description | Example |
|---|---|---|---|
| `Tech Stack.Frontend` | ✅ Medium | Frontend frameworks detected | ["React/Next.js", "TypeScript"] |
| `Tech Stack.Analytics` | ✅ Medium | Analytics tools detected | ["Google Analytics", "Amplitude"] |
| `Tech Stack.Marketing/CRM` | ✅ Medium | CRM and marketing tools | ["HubSpot", "Salesforce"] |
| `Tech Stack.Infrastructure/CDN` | ✅ High | Cloud and CDN providers | ["Cloudflare", "AWS"] |
| `Tech Stack.Payments` | ⚠️ Source-dependent | Payment processors detected | ["Stripe"] |
| `Tech Stack.Server` | ✅ Medium | Server technology from headers | ["Next.js", "cloudflare"] |
| `Tech Stack.CDN` | ✅ High | CDN from HTTP headers | ["Cloudflare"] |
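To illustrate how a CDN can be inferred from HTTP response headers, here is a rough sketch. The header signals below (`cf-ray`, `x-amz-cf-id`, `x-vercel-id`, `x-fastly-request-id`, and so on) are commonly observed hints chosen for this example. They are assumptions, not the actor's actual detection rules, and `detect_cdn` is a hypothetical helper.

```python
def detect_cdn(headers: dict[str, str]) -> list[str]:
    """Guess CDN providers from response headers using a few common signals."""
    # Header names are case-insensitive, so compare everything in lowercase.
    h = {k.lower(): v.lower() for k, v in headers.items()}
    found = []
    if "cf-ray" in h or "cloudflare" in h.get("server", ""):
        found.append("Cloudflare")
    if "x-amz-cf-id" in h or "cloudfront" in h.get("via", ""):
        found.append("AWS")
    if "x-vercel-id" in h or h.get("server") == "vercel":
        found.append("Vercel")
    if "x-fastly-request-id" in h or "cache-" in h.get("x-served-by", ""):
        found.append("Fastly")
    return found

print(detect_cdn({"Server": "cloudflare", "CF-RAY": "abc123-SJC"}))
# prints ['Cloudflare']
```

Header-based detection is a heuristic: sites can strip or rewrite these headers, so an empty result does not prove that no CDN is in use.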
🔗 Social & Contact Fields
| Field | Reliability | Description | Example |
|---|---|---|---|
| `Social Links.linkedin` | ✅ High | LinkedIn company URL | "https://www.linkedin.com/company/openai" |
| `Social Links.twitter` | ✅ High | Twitter/X profile URL | "https://x.com/OpenAI" |
| `Social Links.youtube` | ✅ Medium | YouTube channel URL | "https://www.youtube.com/OpenAI" |
| `Social Links.github` | ✅ Medium | GitHub organization URL | "https://github.com/openai" |
| `Social Links.facebook` | ⚠️ Source-dependent | Facebook page URL | "https://www.facebook.com/openai" |
| `Social Links.instagram` | ⚠️ Source-dependent | Instagram profile URL | "https://www.instagram.com/openai" |
| `Emails Found` | ⚠️ Source-dependent | Emails found on website (up to 5) | ["press@company.com"] |
| `Contact Email` | ⚠️ Source-dependent | Best contact email identified | "press@company.com" |
📊 GitHub Fields
| Field | Reliability | Description | Example |
|---|---|---|---|
| `GitHub URL` | ✅ Medium | GitHub org page URL | "https://github.com/openai" |
| `GitHub Public Repos` | ✅ Medium | Number of public repositories | 246 |
| `GitHub Followers` | ✅ Medium | GitHub organization followers | 120231 |
| `GitHub Description` | ⚠️ Source-dependent | GitHub org bio | "AI research and deployment company" |
📰 News & Reference Fields
| Field | Reliability | Description | Example |
|---|---|---|---|
| `Wikipedia URL` | ✅ High | Wikipedia article URL | "https://en.wikipedia.org/wiki/OpenAI" |
| `Wikipedia Summary` | ✅ High | Full Wikipedia extract | "OpenAI is an American AI..." |
| `Recent News` | ✅ High | Up to 8 recent news items with title, date, and URL | See example below |
| `OpenCorporates URL` | ⚠️ Source-dependent | OpenCorporates registry link | "https://opencorporates.com/..." |
⏱️ Meta Fields
| Field | Description | Example |
|---|---|---|
| `data_sources` | List of all 8 sources queried | ["website_html", "linkedin_public", "wikipedia", ...] |
| `data_freshness` | Data recency status | "real-time" |
| `scraped_at` | Extraction timestamp (ISO 8601 UTC) | "2024-03-15T10:30:00Z" |
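Records carry two timestamp formats: `scraped_at` is ISO 8601, while the `published` value on news items in this README's example output is an RFC 2822 style date. A small sketch of parsing both with the standard library, using the values from that example record (the helper name `parse_scraped_at` is hypothetical):

```python
from datetime import datetime
from email.utils import parsedate_to_datetime

def parse_scraped_at(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so rewrite it as an explicit UTC offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

scraped_at = parse_scraped_at("2026-05-08T02:32:37.964072Z")
published = parsedate_to_datetime("Thu, 07 May 2026 18:09:00 GMT")

# Both values are timezone-aware, so they can be subtracted directly.
age_hours = (scraped_at - published).total_seconds() / 3600
print(f"News item was {age_hours:.1f} hours old at scrape time")
```

Converting both to timezone-aware datetimes makes it straightforward to filter news items by age before feeding them into a downstream pipeline.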
❌ Fields That Are Never Populated
The following fields appear in the record schema but are always null — they require paid data sources not included in this actor:
Company Stage, Additional Locations, Revenue Growth Rate, Revenue Per Employee, Funding Raised to Date, Latest Funding Round, Latest Funding Amount, Latest Funding Date, Major Investors, Annual Profitability, EBITDA, Burn Rate, Runway, Cash Position, Market Cap, Monthly Website Visits, Global Traffic Rank, Bounce Rate, Avg Visit Duration, Pages Per Visit, Direct Competitors, Indirect Competitors
Note: `Total Annual Revenue` is occasionally populated from DuckDuckGo (e.g. "US$13.1 billion (2025)"), but this is not guaranteed and depends on DuckDuckGo's knowledge graph coverage.
📦 Example Output
Input:
{ "domains": ["openai.com"] }
Output:
{
  "domain": "openai.com",
  "Company Name": "OpenAI",
  "Company Description": "OpenAI Global, LLC is an American artificial intelligence (AI) research organization consisting of a for-profit public benefit corporation (PBC) and a nonprofit foundation, headquartered in San Francisco...",
  "Website URL": "https://openai.com",
  "LinkedIn Profile URL": "https://www.linkedin.com/company/openai",
  "Industry/Vertical": "Industry Research Services",
  "Company Type": "Private",
  "Year Founded": null,
  "Headquarters Location": "San Francisco",
  "Total Employees": 8644,
  "Executive Leadership": "Founders: Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman...",
  "Products & Services": "Platform Overview, Solutions",
  "Total Annual Revenue": "US$13.1 billion (2025)",
  "Tech Stack": {
    "Frontend": ["React/Next.js"],
    "Analytics": ["Amplitude"],
    "Infrastructure/CDN": ["Cloudflare", "AWS"],
    "Server": ["Next.js", "cloudflare"],
    "CDN": ["Cloudflare"]
  },
  "Social Links": {
    "twitter": "https://x.com/OpenAI",
    "youtube": "https://www.youtube.com/OpenAI",
    "linkedin": "https://www.linkedin.com/company/openai",
    "github": "https://github.com/openai",
    "instagram": "https://www.instagram.com/openai/"
  },
  "GitHub URL": "https://github.com/openai",
  "GitHub Public Repos": 246,
  "GitHub Followers": 120231,
  "Wikipedia URL": "https://en.wikipedia.org/wiki/OpenAI",
  "Wikipedia Summary": "OpenAI Global, LLC is an American artificial intelligence (AI) research organization...",
  "Recent News": [
    {
      "title": "OpenAI's AI Chip Deal With Broadcom Hits $18 Billion Financing Snag",
      "published": "Thu, 07 May 2026 18:09:00 GMT",
      "url": "https://news.google.com/..."
    }
  ],
  "Emails Found": [],
  "Contact Email": null,
  "data_sources": ["website_html", "linkedin_public", "wikipedia", "github_api", "google_news", "duckduckgo", "opencorporates", "http_headers"],
  "data_freshness": "real-time",
  "scraped_at": "2026-05-08T02:32:37.964072Z"
}
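Since the record nests `Tech Stack` and `Social Links` as objects, it can help to flatten them into dotted keys (matching the `Tech Stack.Frontend` style used in the field tables) before loading into a spreadsheet or CRM. The helper below is a hypothetical sketch, and joining string lists with "; " is an arbitrary choice made for this example.

```python
def flatten_record(record: dict, sep: str = ".") -> dict:
    """Flatten nested objects into dotted keys and join lists of strings."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            for sub_key, sub_value in flatten_record(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            flat[key] = "; ".join(value)  # an empty list becomes an empty string
        else:
            flat[key] = value  # numbers, nulls, and lists of objects pass through unchanged
    return flat

sample = {
    "domain": "openai.com",
    "Tech Stack": {"Infrastructure/CDN": ["Cloudflare", "AWS"], "CDN": ["Cloudflare"]},
    "Social Links": {"github": "https://github.com/openai"},
    "Emails Found": [],
    "Contact Email": None,
}
flat = flatten_record(sample)
print(flat["Tech Stack.Infrastructure/CDN"])
# prints Cloudflare; AWS
```

Lists of objects such as `Recent News` are left as-is here; they usually need their own table or a separate export.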
💰 Pricing
This actor uses pay-per-event pricing — you only pay for successfully researched company records.
| Event | Price |
|---|---|
| Actor start fee | $0.02 per run |
| Per company researched | $0.06 per result ($60 per 1,000 companies) |
How billing works:
- ✅ The $0.02 start fee applies once per run regardless of how many companies are processed
- ✅ Each company domain that returns a successful result is charged at $0.06
- ✅ Domains that fail to return any data are not charged
- ✅ You control your total spend by setting a charge limit in your Apify account — the actor stops automatically when your limit is reached
- ✅ No free trial — pay only for what you use, starting from your first result
Example: Research 100 companies = $0.02 (start) + $6.00 (100 × $0.06) = $6.02 total
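The same arithmetic as a small helper for budgeting a run. The function name is hypothetical; the two prices are the ones listed in the table above, kept in whole cents to avoid floating-point drift.

```python
START_FEE_CENTS = 2      # $0.02 per run
PER_COMPANY_CENTS = 6    # $0.06 per successfully researched company

def estimate_cost(successful_companies: int) -> float:
    """Return the estimated run cost in dollars; failed domains are not billed."""
    cents = START_FEE_CENTS + PER_COMPANY_CENTS * successful_companies
    return cents / 100

print(estimate_cost(1))    # prints 0.08
print(estimate_cost(100))  # prints 6.02
```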
⚡ Performance & Limits
| Companies | Estimated Time |
|---|---|
| 1 | ~30–90 seconds |
| 10 | ~5–15 minutes |
| 50 | ~25–60 minutes |
| 100 | ~50–120 minutes |
- All 8 data sources queried in parallel per company — not sequentially
- Results pushed to the dataset immediately after each company is processed
- Checkpoint saved after every company — restart resumes from last completed domain automatically
- Proxy rotated every 5 companies for high-volume runs
- Processing time varies by company — well-known companies with Wikipedia and GitHub entries are faster than obscure domains
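The checkpoint and resume behaviour listed above can be pictured with the sketch below. It is an assumption about how such logic might look, not the actor's implementation: a local JSON file stands in for whatever storage the actor actually uses, and `run_with_checkpoint` and `lookup` are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

def run_with_checkpoint(domains, research, checkpoint_path):
    """Process domains in order, skipping any that a previous run already completed."""
    path = Path(checkpoint_path)
    done = set(json.loads(path.read_text())) if path.exists() else set()
    results = []
    for domain in domains:
        if domain in done:
            continue  # finished in an earlier run
        results.append(research(domain))
        done.add(domain)
        # Save after every company so an abort loses at most the one in progress.
        path.write_text(json.dumps(sorted(done)))
    return results

checkpoint = Path(tempfile.mkdtemp()) / "checkpoint.json"
lookup = lambda d: {"domain": d}  # placeholder for the real research step

first = run_with_checkpoint(["openai.com", "stripe.com"], lookup, checkpoint)
second = run_with_checkpoint(["openai.com", "stripe.com", "shopify.com"], lookup, checkpoint)
print([r["domain"] for r in second])
# prints ['shopify.com']
```

The second call only processes the domain that was not in the checkpoint, which mirrors the "resume from last completed domain" behaviour.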
❓ FAQ
Q: What is the minimum I need to spend to use this actor?
A: There is no free trial. The minimum charge is $0.02 for starting the actor plus $0.06 for each successfully researched company. A single company costs $0.08 total.
Q: Why do some fields like Year Founded return null for well-known companies?
A: Founding year data is sourced from Wikipedia, DuckDuckGo, LinkedIn, and the company website. Some sources may omit this field even for major companies. The actor only populates fields when the data is actually found — it never fabricates or estimates values.
Q: Which fields are guaranteed to be populated?
A: domain, Website URL, data_sources, data_freshness, and scraped_at are always present. Company Name, Company Description, Headquarters Location, Total Employees, Wikipedia Summary, Tech Stack, Social Links, and Recent News are highly reliable for well-known companies with public profiles.
Q: Which fields will never have data?
A: Fields requiring paid data sources (funding details, revenue growth rate, traffic metrics, market cap, burn rate, runway, and competitor lists) are always null. See the "Fields That Are Never Populated" section above for the complete list.
Q: What happens if a company domain fails to return data?
A: The failure is logged and that domain is skipped. No charge is applied for failed domains. The actor continues processing all remaining domains in the batch.
Q: Does the checkpoint feature work if I hit my spending limit?
A: Yes. If your Apify spending limit is reached mid-run, the actor saves a checkpoint. On your next run with the same input, it resumes from the last successfully processed domain, and you are not charged again for already-completed companies.
Q: Can I export results to Excel or CSV?
A: Yes. All results are pushed to the Apify dataset, which can be exported to JSON, CSV, Excel, and more directly from the Apify Console after each run.
📜 Changelog
v2.0.0 (Current)
- ✅ 8 data sources queried in parallel per company
- ✅ Tech stack detection across 5 categories (Frontend, Analytics, CRM, Infrastructure, Payments)
- ✅ CDN and server detection from HTTP response headers
- ✅ GitHub API integration — repos, followers, description, location
- ✅ Recent news extraction — up to 8 headlines per company
- ✅ Executive leadership names from DuckDuckGo knowledge graph
- ✅ Contact email extraction from company website
- ✅ Checkpoint/resume — restarts continue from last completed domain
- ✅ Pay-per-event billing — charged per successfully researched company
- ✅ Spending limit respect — stops automatically when user charge limit reached
- ✅ Real-time dataset push per company
v1.0.0
- Initial release with website and Wikipedia data sources
🏷️ Tags
company research scraper, company intelligence data, business data extractor, company profile scraper, tech stack detector, b2b data enrichment, company intelligence, crm enrichment, lead enrichment, company data scraper, firmographic data, business intelligence scraper
⚖️ Legal & Terms of Use
This actor retrieves publicly available company information from public websites, Wikipedia, GitHub, DuckDuckGo, OpenCorporates, Google News, and LinkedIn public pages — in the same way a regular user browses these platforms.
Please note:
- Use extracted company intelligence data only for lawful purposes — sales prospecting, market research, CRM enrichment, investment screening, and academic study are common legitimate uses
- Do not use this company research scraper to harvest personal data about individuals or facilitate harassment
- Company information belongs to the respective organizations — always verify critical details directly with primary sources before making business decisions
- Comply with applicable data protection laws — including GDPR and CCPA — when using extracted company data for outreach
- The actor developer is not responsible for decisions made based on extracted company data
🤝 Support & Feedback
- Bug report? Contact us via the Apify actor page
- Feature request? Post in the Apify Community forum
- Loving it? Please leave a ⭐ review — it helps other users find this actor!
Built with ❤️ on Apify
The most complete Company Research Scraper — 8 sources, 30+ fields, tech stack, GitHub, news
💰 $0.02 per run + $0.06 per company · No free trial · Pay only for results