Company Website Research

Pricing

from $0.001 / actor start

Extracts comprehensive data from a company website

Developer

Jian Lee

Maintained by Community

Company Research Actor

Apify Actor for researching a public company website and returning structured website evidence in one JSON result.

This Actor is built for company research, lead enrichment, and downstream automation. It can start from a direct website, a bare domain, or only a company name.

What This Actor Does

  • accepts website_url, domain, or company_name
  • discovers an official website when only the company name is provided
  • prefers Apify's Google Search Results Scraper for company-name discovery and uses the first valid Google organic website result directly
  • falls back to the internal heuristic search flow only when the nested Google search actor is unavailable or returns no usable website result
  • if discovery remains ambiguous after the fallback, returns candidate_websites instead of guessing
  • crawls a small set of high-value pages such as homepage, about, products/services, and contact
  • uses a hybrid crawl strategy:
    • http-first when HTML is enough
    • browser-fallback when the site is JS-heavy or the HTTP probe is not enough
  • fails fast on heavy block signals such as CAPTCHA, WAF, or explicit access denial instead of spending time on low-value salvage attempts
  • when running on Apify, prepares a standby Apify Proxy profile and can auto-escalate to proxy for suspicious blocked hosts even if use_proxy is left off
  • extracts:
    • company name
    • resolved website and domain
    • LinkedIn company URL when found
    • cleaned text from kept pages
    • public emails, phones, and an address candidate
    • rule-based summary, products, and market signals
  • returns crawl metadata including strategy, mode, confidence, failure_reason, timing breakdown, browser engine, and salvage usage
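
The hybrid strategy and fail-fast behavior listed above can be pictured as a decision made on the first HTTP response. The sketch below is only an illustration: the function name classify_probe, the marker strings, the status codes, and the 200-character threshold are all assumptions, not the Actor's actual rules.

```python
import re

# Hypothetical markers; the Actor's real block-detection rules are not documented here.
BLOCK_MARKERS = ("captcha", "access denied", "request blocked")


def classify_probe(status_code: int, html: str, min_text_chars: int = 200) -> str:
    """Return 'blocked', 'browser-fallback', or 'http-first' for one HTTP probe."""
    lowered = html.lower()
    if status_code in (403, 429) or any(marker in lowered for marker in BLOCK_MARKERS):
        return "blocked"  # fail fast rather than attempt low-value salvage
    # Strip scripts, styles, and tags to estimate how much readable text the HTML carries.
    without_code = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    visible = re.sub(r"(?s)<[^>]+>", " ", without_code)
    visible = " ".join(visible.split())
    if len(visible) < min_text_chars:
        return "browser-fallback"  # probably a JS-rendered shell with little static text
    return "http-first"
```

A page that is mostly script tags falls through to the browser path, while a page with enough static text stays on the cheaper HTTP path.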

Best Fit

Works best for:

  • company websites
  • manufacturer and industrial sites
  • B2B corporate sites
  • one-page company sites
  • public product/catalog websites with clear navigation

Less reliable for:

  • login-only sites
  • CAPTCHA or anti-bot protected sites
  • sites with very heavy client-side rendering
  • sites where key information is hidden behind forms, PDFs, or gated downloads

Input

Resolution order:

  1. website_url
  2. domain
  3. discovery from company_name
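
The resolution order above can be sketched as a small helper. The name resolve_start and the exact domain cleanup are assumptions made for illustration; the sketch also ignores discover_if_missing, whose default is not stated here.

```python
from typing import Optional, Tuple


def resolve_start(run_input: dict) -> Tuple[str, Optional[str]]:
    """Pick the crawl starting point using the documented priority order."""
    website_url = (run_input.get("website_url") or "").strip()
    if website_url:
        return "website_url", website_url
    domain = (run_input.get("domain") or "").strip()
    if domain:
        # Assumed normalization: drop any scheme or path, lowercase the host.
        host = domain.split("://", 1)[-1].split("/", 1)[0].lower()
        return "domain", f"https://{host}/"
    if (run_input.get("company_name") or "").strip():
        return "discovery", None  # the URL is only known after website discovery
    raise ValueError("Provide website_url, domain, or company_name")
```

When both website_url and domain are present, the first one wins and the domain is never consulted.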

Main input fields:

  • company_name: company name for website discovery or as a hint for extraction
  • website_url: full website URL; the highest-priority input
  • domain: bare domain, normalized to https://<domain>/
  • social_link: known company social URL, usually LinkedIn
  • country: optional discovery hint, available as a dropdown in the Apify input UI
  • mode: fast or deep
  • anti_block_mode: browser hardening level: off, basic, or aggressive
  • use_proxy: force Apify Proxy from the start for HTTP and browser crawling
  • proxy_groups: optional Apify Proxy groups such as RESIDENTIAL
  • salvage_if_blocked: try likely subpages if the homepage is blocked or unavailable; clearly heavy-blocked sites still fail fast
  • max_pages: max number of kept pages in output
  • max_text_chars: max total extracted text characters across kept pages
  • discover_if_missing: whether to discover a website when only the company name is given
  • extract_contacts: whether to extract emails, phones, and address
  • follow_subpages: whether to crawl internal pages beyond the first page
  • include_path_hints: preferred path fragments used to prioritize internal links
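
Before starting a run, it can help to check the enumerated fields locally. The following is a minimal sketch, assuming the allowed values listed above; build_run_input is a hypothetical helper, not part of the Actor or of any Apify library.

```python
ALLOWED_MODES = {"fast", "deep"}
ALLOWED_ANTI_BLOCK = {"off", "basic", "aggressive"}


def build_run_input(**fields) -> dict:
    """Validate a few documented fields and return a run input dict."""
    if not any(fields.get(key) for key in ("website_url", "domain", "company_name")):
        raise ValueError("One of website_url, domain, or company_name is required")
    mode = fields.get("mode")
    if mode is not None and mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    anti_block = fields.get("anti_block_mode")
    if anti_block is not None and anti_block not in ALLOWED_ANTI_BLOCK:
        raise ValueError(f"anti_block_mode must be one of {sorted(ALLOWED_ANTI_BLOCK)}")
    for key in ("max_pages", "max_text_chars"):
        value = fields.get(key)
        if value is not None and (not isinstance(value, int) or value < 1):
            raise ValueError(f"{key} must be a positive integer")
    # Drop unset fields so the Actor's own defaults apply.
    return {key: value for key, value in fields.items() if value is not None}
```

For example, build_run_input(domain="pny.com", mode="deep", max_pages=3) returns a dict that can be passed as the run input, while an unknown mode raises ValueError before any run is started.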

Mode

fast

  • lower latency
  • stops earlier once enough useful content is found
  • good for lead enrichment and bulk runs

deep

  • broader page coverage
  • better for contacts, products, and company profile quality
  • slower than fast

Anti-Block Mode

off

  • no browser hardening beyond the default crawler setup

basic

  • adds browser environment hardening and lightweight blocker dismissal
  • recommended default for most runs

aggressive

  • adds stronger popup/overlay removal and lightweight resource blocking
  • useful for difficult websites, but slightly riskier on fragile sites

Example Inputs

Direct website:

{
  "website_url": "https://vnsteel.vn/",
  "mode": "fast",
  "max_pages": 3,
  "max_text_chars": 7000,
  "extract_contacts": true,
  "follow_subpages": true
}

Bare domain:

{
  "domain": "pny.com",
  "mode": "deep",
  "max_pages": 3,
  "max_text_chars": 8000,
  "extract_contacts": true,
  "follow_subpages": true
}

Company name only:

{
  "company_name": "VNSTEEL",
  "country": "Vietnam",
  "mode": "deep",
  "max_pages": 3,
  "max_text_chars": 7000,
  "discover_if_missing": true,
  "extract_contacts": true,
  "follow_subpages": true
}

Company name discovery notes:

  • when only company_name is provided, this Actor first tries to call apify/google-search-scraper
  • if Google returns a usable organic website result, the Actor uses that website directly for crawling
  • the nested search run is executed under the current runner account, so the runner pays for that search usage
  • if the nested search run is unavailable or returns no usable website result, the Actor falls back to its internal discovery heuristic
  • if discovery is ambiguous, the Actor returns candidate_websites and stops instead of crawling the wrong website
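
A caller may want to branch on the discovery outcome described in these notes. The sketch below assumes that an ambiguous run leaves resolved_website_url empty and fills candidate_websites; the exact shape of an ambiguous result is not documented here, so treat both the helper name discovery_outcome and that condition as assumptions.

```python
def discovery_outcome(result: dict) -> str:
    """Classify a result as 'resolved', 'ambiguous', or 'not_found'."""
    if result.get("resolved_website_url"):
        return "resolved"
    if result.get("candidate_websites"):
        return "ambiguous"  # needs review, or a follow-up run with an explicit website_url
    return "not_found"
```

For an ambiguous result, one option is to pick a candidate manually and start a second run with it as website_url, which skips discovery entirely.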

Custom path hints:

{
  "website_url": "https://eup.vn/",
  "mode": "deep",
  "max_pages": 4,
  "max_text_chars": 10000,
  "extract_contacts": true,
  "follow_subpages": true,
  "include_path_hints": [
    "about",
    "products",
    "services",
    "contact",
    "gioi-thieu",
    "linh-vuc",
    "lien-he"
  ]
}
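
One way to picture how path hints could prioritize internal links is a simple ranking by the first matching hint. This is a sketch under assumptions: rank_links is a hypothetical helper, the idea that earlier hints rank higher is a guess, and the Actor's real scoring may differ.

```python
from urllib.parse import urlparse


def rank_links(links, hints):
    """Sort links so that paths containing earlier hints come first."""

    def score(link):
        path = urlparse(link).path.lower()
        for position, hint in enumerate(hints):
            if hint.lower() in path:
                return position
        return len(hints)  # no hint matched: lowest priority

    # sorted() is stable, so links with equal scores keep their input order.
    return sorted(links, key=score)
```

With the hints from the example above, a link whose path contains "products" would be visited before one containing "lien-he", and links matching no hint would come last.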

Output

The Actor writes one result object to:

  • the default dataset
  • the OUTPUT record in the default key-value store

Output Shape

{
  "company_name": "PNY Technologies Inc.",
  "resolved_website_url": "https://www.pny.com/",
  "resolved_domain": "pny.com",
  "resolved_social_link": "https://www.linkedin.com/company/pny-technologies/",
  "candidate_websites": [],
  "sources": [
    "https://www.pny.com/",
    "https://www.pny.com/professional/support/contact-us"
  ],
  "pages": [
    {
      "url": "https://www.pny.com/",
      "title": "PNY | NVIDIA Graphics, Storage, Networking & Memory Solutions",
      "page_type": "homepage",
      "text": "PNY delivers solutions in over 50 countries...",
      "text_chars": 3200
    }
  ],
  "contacts": {
    "emails": ["gopny@pny.com", "tsupport@pny.com"],
    "phones": ["19735159700"],
    "address": "100 Jefferson Road, Parsippany, New Jersey 07054 US"
  },
  "signals": {
    "about_summary": "PNY delivers solutions in over 50 countries...",
    "products": ["GeForce graphics cards", "Solid state drives", "PC memory"],
    "markets": ["Global"]
  },
  "metadata": {
    "discovery_used": false,
    "strategy": "http-first",
    "mode": "deep",
    "anti_block_mode": "basic",
    "browser_used": false,
    "browser_engine": null,
    "salvage_used": false,
    "pages_crawled": 3,
    "failure_reason": null,
    "confidence": {
      "website": 0.99,
      "contacts": 0.99,
      "summary": 0.85,
      "products": 0.63,
      "overall": 0.92
    },
    "timings": {
      "total_ms": 5472,
      "discovery_ms": 0,
      "crawl_ms": 5472,
      "http_probe_ms": 5472,
      "browser_crawl_ms": 0
    },
    "duration_ms": 5472
  }
}
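
Downstream automation will usually want to gate on failure_reason and the confidence scores before trusting a field. The following is a minimal sketch that reads the output shape shown above; usable_contacts and the 0.8 threshold are illustrative assumptions, not recommendations from the Actor.

```python
def usable_contacts(result: dict, min_confidence: float = 0.8) -> dict:
    """Return contacts only when the run succeeded and contact confidence is high enough."""
    metadata = result.get("metadata") or {}
    if metadata.get("failure_reason"):
        return {}
    confidence = (metadata.get("confidence") or {}).get("contacts", 0.0)
    if confidence < min_confidence:
        return {}
    contacts = result.get("contacts") or {}
    return {
        "emails": contacts.get("emails", []),
        "phones": contacts.get("phones", []),
        "address": contacts.get("address"),
    }
```

Applied to the sample result, where contacts confidence is 0.99 and failure_reason is null, the helper returns both emails, the phone number, and the address; a failed or low-confidence run yields an empty dict.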