Crunchbase Company Scraper - Funding, People & B2B Intel

Pricing

Pay per event

Scrape Crunchbase company profiles with funding rounds, key people, and B2B intel. Build investor and startup databases at scale.

Developer

George Kioko

Maintained by Community

Crunchbase scraper for direct company URLs, search URLs, and keyword discovery with a warmed Go-TLS VPS fetch layer, bounded /crawl fallback, explicit failure states, replayable proof artifacts, and verified-only billing.

What the actor does

  • Treats direct companyUrls as the promotion-grade path
  • Tries direct companyUrls through a bounded VPS ladder:
    • warmed Go-TLS homepage + company fetch on the same session and proxy identity
    • bounded VPS /crawl fallback only when Go-TLS yields timeout, challenge, or shell-page evidence
    • actor-side Playwright only as a dispute lane after the VPS ladder still fails
  • Uses actor-owned Apify residential proxy sessions and forwards them to the VPS
  • Extracts company identity, funding, people, and investor data from evidence bundles instead of assuming one page-state format
  • Saves replayable proof artifacts for direct-company VPS attempts
  • Returns truthful outcomes for every item:
    • verified_success
    • verified_partial
    • blocked
    • search_discovery_failed
    • failed

Billing contract

  • Only verified_success rows are billable
  • Event name: company-scraped
  • Price: $0.025 per verified company
  • verified_partial, blocked, search_discovery_failed, and failed rows are returned as free diagnostic outcomes

This is intentional. Challenge pages, proxy failures, and weak evidence should not be billed as verified company intelligence.
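
The billing contract above can be sketched as a small helper that estimates the charge for a run. This is an illustration, not the actor's code: the row shape (an object with an `outcome` field) and the `estimateCharge` name are assumptions; only the outcome names, the event name, and the price come from the contract.

```javascript
// Sketch only: the row shape ({ outcome }) and this helper are assumptions.
// The outcome names, event name, and price are taken from the billing contract.
const BILLABLE_OUTCOME = "verified_success";
const EVENT_NAME = "company-scraped";
const PRICE_MILLI_USD = 25; // $0.025 expressed in thousandths of a dollar

function estimateCharge(rows) {
  const billableCount = rows.filter((row) => row.outcome === BILLABLE_OUTCOME).length;
  return {
    eventName: EVENT_NAME,
    billableCount,
    // Every non-billable row is a free diagnostic outcome.
    freeCount: rows.length - billableCount,
    // Multiply in integer milli-dollars, then convert once, to limit float drift.
    totalUsd: (billableCount * PRICE_MILLI_USD) / 1000,
  };
}
```

For example, a run with two verified_success rows, one verified_partial row, and one blocked row would be charged for two company-scraped events.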

Cost policy

The actor is designed to minimize wasted Apify spend:

  • local tests and the local actor smoke test should run first
  • VPS /health, /version, and /proxy-preflight should pass before cloud validation
  • the exact direct company target must pass the proof ladder before a billed cloud canary is allowed
  • the default cloud validation budget is one billed canary per change iteration
  • repeated cloud retries must not serve as the debugging loop

Direct-company proof ladder

Before a billed cloud canary is allowed for a direct company URL, the same target should already have:

  1. passed local tests
  2. passed bounded local smoke
  3. passed VPS /health, /version, and /proxy-preflight
  4. produced VPS-only success twice with fresh sessions
  5. reproduced the same result through local artifact replay

This keeps validation centered on likely-success targets and avoids spending Apify credits to discover whether the ladder works.
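
One way to make the ladder mechanical is a gate function that reports which steps are still missing for a target. The sketch below is hypothetical: the `evidence` object and all of its field names are invented for illustration, and counting distinct session IDs is just one possible reading of "fresh sessions".

```javascript
// Sketch only: the evidence object and its field names are hypothetical.
// The five checks mirror the numbered proof ladder above.
function missingProofSteps(evidence) {
  // Distinct session IDs stand in for "fresh sessions" (an assumption).
  const freshSuccesses = new Set(evidence.vpsSuccessSessionIds ?? []).size;
  const checks = [
    ["local tests", evidence.localTestsPassed === true],
    ["bounded local smoke", evidence.localSmokePassed === true],
    ["VPS /health, /version, /proxy-preflight", evidence.vpsPreflightPassed === true],
    ["two VPS-only successes with fresh sessions", freshSuccesses >= 2],
    ["local artifact replay", evidence.artifactReplayMatched === true],
  ];
  return checks.filter(([, passed]) => !passed).map(([name]) => name);
}

// A billed cloud canary is allowed only when nothing is missing.
function canRunBilledCanary(evidence) {
  return missingProofSteps(evidence).length === 0;
}
```

Returning the list of missing steps, rather than a bare boolean, makes it clear which part of the ladder to finish locally before spending on a canary.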

Shared VPS wiring

The production actor expects these actor environment variables:

  • VPS_URL
  • VPS_API_KEY

When those secrets are present, direct companyUrls automatically use the shared Go-TLS VPS client. Input-level vpsGateway fields are overrides, not the primary production contract.
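
The precedence described here (environment variables as the production contract, input-level vpsGateway fields as overrides) might be resolved roughly as follows. The `url` and `apiKey` field names inside vpsGateway are assumptions, since the actual override fields are not listed here; `resolveVpsConfig` is likewise a hypothetical helper.

```javascript
// Sketch only: vpsGateway.url and vpsGateway.apiKey are assumed field names.
// VPS_URL and VPS_API_KEY are the environment variables named above.
function resolveVpsConfig(env, input = {}) {
  const override = input.vpsGateway ?? {};
  // An input-level override wins; otherwise fall back to the actor secret.
  const url = override.url ?? env.VPS_URL;
  const apiKey = override.apiKey ?? env.VPS_API_KEY;
  // Without both values the shared VPS client cannot be used.
  if (!url || !apiKey) return null;
  return {
    url,
    apiKey,
    overridden: override.url != null || override.apiKey != null,
  };
}
```

In a Node-based actor this would typically be called as `resolveVpsConfig(process.env, input)`.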

Current runtime behavior

  • If Crunchbase is reachable and evidence is strong enough, the actor returns a verified company row
  • If warmed Go-TLS times out or returns challenge/shell evidence, the actor retries once through VPS /crawl with the same upstream proxy identity
  • If both VPS lanes fail and evidence is still weak, the actor either escalates to the managed fallback path (when enabled) or returns a free terminal failed or blocked result
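
The lane ordering above can be expressed as a pure decision function over the attempts made so far. This is a sketch under assumptions: the lane labels, the status strings, and the `managedFallbackEnabled` option are hypothetical names, and treating any other Go-TLS failure as terminal is a guess that the text does not confirm.

```javascript
// Sketch only: lane labels, status strings, and the option name are hypothetical.
// Statuses that send a Go-TLS attempt to the /crawl fallback.
const FALLBACK_TRIGGERS = new Set(["timeout", "challenge", "shell"]);

// Returns the next lane to try, or null when the ladder is finished.
function nextLane(laneAttempts, options = {}) {
  if (laneAttempts.length === 0) return "go-tls";
  const last = laneAttempts[laneAttempts.length - 1];
  if (last.status === "ok") return null; // strong evidence, stop here
  if (last.lane === "go-tls") {
    // Assumption: any Go-TLS failure outside the trigger set is terminal.
    return FALLBACK_TRIGGERS.has(last.status) ? "crawl" : null;
  }
  if (last.lane === "crawl" && options.managedFallbackEnabled === true) {
    return "managed-fallback";
  }
  return null; // /crawl is tried once only; the result is failed or blocked
}
```

Keeping the decision separate from the network calls means the ladder can be replayed against saved laneAttempts without touching Crunchbase or the VPS.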

Best input mode

Highest confidence:

  • companyUrls

Lower confidence candidate generation:

  • searchUrls
  • keywords

Discovery inputs produce candidate company URLs that still need verified company extraction before they become billable outputs.

searchUrls and keywords are not part of this change's billed promotion-readiness validation.
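
To illustrate the difference between the input modes, the sketch below flattens an input object into a work list tagged with a confidence level. It assumes that companyUrls, searchUrls, and keywords are each arrays of strings, which is not confirmed here; the `planInput` helper and the example values are hypothetical.

```javascript
// Sketch only: assumes each input field is an array of strings.
function planInput(input) {
  const asList = (value) => (Array.isArray(value) ? value : []);
  const tag = (kind, confidence) => (value) => ({ kind, value, confidence });
  return [
    // Direct company URLs are the highest-confidence path.
    ...asList(input.companyUrls).map(tag("companyUrl", "high")),
    // Discovery inputs only produce candidates that still need verification.
    ...asList(input.searchUrls).map(tag("searchUrl", "lower")),
    ...asList(input.keywords).map(tag("keyword", "lower")),
  ];
}

// Hypothetical input; the URL is a placeholder, not a real company.
const plan = planInput({
  companyUrls: ["https://www.crunchbase.com/organization/example-company"],
  keywords: ["fintech"],
});
```

Here `plan` holds one high-confidence entry for the direct URL and one lower-confidence entry for the keyword.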

Example output signals

  • outcome
  • billingState
  • billingReasons
  • failureClass
  • laneAttempts
  • evidenceSummary
  • fieldProvenance
  • latencyMs

These are included so operators and users can distinguish verified company intelligence from degraded or blocked runs.
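
A consumer of the dataset could use these signals to summarize a run, for instance by counting rows per outcome and grouping non-verified rows by failureClass. The sketch below assumes that `outcome` and `failureClass` are plain strings on each row; the actual value formats are not documented here, and `summarizeRun` is a hypothetical helper.

```javascript
// Sketch only: assumes outcome and failureClass are plain strings on each row.
function summarizeRun(rows) {
  const byOutcome = {};
  const failureClasses = {};
  for (const row of rows) {
    byOutcome[row.outcome] = (byOutcome[row.outcome] ?? 0) + 1;
    // Only non-verified rows are grouped by their failure class.
    if (row.outcome !== "verified_success" && row.failureClass) {
      failureClasses[row.failureClass] = (failureClasses[row.failureClass] ?? 0) + 1;
    }
  }
  return { byOutcome, failureClasses };
}
```

The two maps give a quick view of how much of a run produced verified rows and which failure classes dominated the rest.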

Local development

npm install
npm test
npm run test:smoke

To run the heavier local actor path after the bounded smoke suite passes (the environment variable is set with PowerShell syntax):

$env:CRUNCHBASE_SMOKE_MODE='actor'
apify run --input-file=verify-input-vps.json

Deploy

apify push