Crunchbase Company Scraper - Funding, People & B2B Intel
Pricing
Pay per event
Scrape Crunchbase company profiles with funding rounds, key people, and B2B intel. Build investor and startup databases at scale.
Developer
George Kioko
Crunchbase scraper for direct company URLs, search URLs, and keyword discovery with a warmed Go-TLS VPS fetch layer, bounded /crawl fallback, explicit failure states, replayable proof artifacts, and verified-only billing.
What the actor does
- Treats direct `companyUrls` as the promotion-grade path
- Tries direct `companyUrls` through a bounded VPS ladder:
  - warmed Go-TLS homepage + company fetch on the same session and proxy identity
  - bounded VPS `/crawl` fallback only when Go-TLS yields timeout, challenge, or shell-page evidence
  - actor-side Playwright only as a dispute lane after the VPS ladder still fails
- Uses actor-owned Apify residential proxy sessions and forwards them to the VPS
- Extracts company identity, funding, people, and investor data from evidence bundles instead of assuming one page-state format
- Saves replayable proof artifacts for direct-company VPS attempts
- Returns truthful outcomes for every item:
  - `verified_success`
  - `verified_partial`
  - `blocked`
  - `search_discovery_failed`
  - `failed`
Billing contract
- Only `verified_success` rows are billable
- Event name: `company-scraped`
- Price: `$0.025` per verified company
- `verified_partial`, `blocked`, `search_discovery_failed`, and `failed` rows are returned as free diagnostic outcomes
This is intentional. Challenge pages, proxy failures, and weak evidence should not charge like verified company intelligence.
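The billing rule above can be sketched as a small function. This is an illustration only, not the actor's code: the helper name `billableCharge` and the row shape (an object with an `outcome` field) are assumptions, while the outcome names and the $0.025 price come from this page.

```javascript
// Hypothetical helper. Price per verified company is $0.025, stored as
// thousandths of a dollar so the arithmetic stays in integers.
const PRICE_MILLI_USD = 25;

// Only rows with outcome "verified_success" are billable; all others are free.
function billableCharge(rows) {
  const billableCount = rows.filter((row) => row.outcome === "verified_success").length;
  return { billableCount, chargeUsd: (billableCount * PRICE_MILLI_USD) / 1000 };
}

const example = billableCharge([
  { outcome: "verified_success" },
  { outcome: "verified_partial" },
  { outcome: "blocked" },
  { outcome: "verified_success" },
]);
console.log(example); // { billableCount: 2, chargeUsd: 0.05 }
```

In this sketch the partial and blocked rows still appear in the input but contribute nothing to the charge.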
Cost policy
The actor is designed to minimize wasted Apify spend:
- local tests and local actor smoke should run first
- VPS `/health`, `/version`, and `/proxy-preflight` should pass before cloud validation
- the exact direct company target must pass the proof ladder before a billed cloud canary is allowed
- the default cloud validation budget is one billed canary per change iteration
- repeated cloud retries are not the debugging loop
Direct-company proof ladder
Before a billed cloud canary is allowed for a direct company URL, the same target should already have:
- passed local tests
- passed bounded local smoke
- passed VPS `/health`, `/version`, and `/proxy-preflight`
- produced VPS-only success twice with fresh sessions
- reproduced the same result through local artifact replay
This keeps validation centered on likely-success targets and avoids spending Apify credits to discover whether the ladder works.
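One way to picture the proof ladder is as a gate that must pass before a billed canary runs. The sketch below is hypothetical: the function `canRunBilledCanary`, the evidence field names, and the use of session IDs to detect fresh sessions are all assumptions made for illustration, not part of the actor.

```javascript
// Hypothetical gate; check names mirror the proof-ladder bullets.
const REQUIRED_CHECKS = [
  "localTests",
  "localSmoke",
  "vpsPreflight", // /health, /version, and /proxy-preflight
  "localArtifactReplay",
];
const REQUIRED_VPS_SUCCESSES = 2; // VPS-only success twice with fresh sessions

function canRunBilledCanary(evidence) {
  const missing = REQUIRED_CHECKS.filter((name) => evidence[name] !== true);
  // Count distinct session IDs so a reused session is not counted twice.
  const freshSessions = new Set(evidence.vpsSuccessSessionIds ?? []).size;
  if (freshSessions < REQUIRED_VPS_SUCCESSES) missing.push("vpsOnlySuccessTwice");
  return { allowed: missing.length === 0, missing };
}

const gate = canRunBilledCanary({
  localTests: true,
  localSmoke: true,
  vpsPreflight: true,
  localArtifactReplay: false,
  vpsSuccessSessionIds: ["s1", "s1"],
});
console.log(gate.allowed, gate.missing);
```

Here the gate stays closed because artifact replay has not passed and both VPS successes came from the same session.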
Shared VPS wiring
The production actor expects these actor environment variables:
- `VPS_URL`
- `VPS_API_KEY`
When those secrets are present, direct `companyUrls` automatically use the shared Go-TLS VPS client. Input-level `vpsGateway` fields are overrides, not the primary production contract.
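The precedence described here could look roughly like the following. Only `VPS_URL`, `VPS_API_KEY`, and `vpsGateway` are named on this page; the function `resolveVpsConfig` and the `url` and `apiKey` field names on `vpsGateway` are assumptions for the sake of the example.

```javascript
// Hypothetical resolver: environment secrets are the primary contract,
// and input-level vpsGateway fields (assumed names) override them.
function resolveVpsConfig(env, input = {}) {
  const override = input.vpsGateway ?? {};
  const url = override.url ?? env.VPS_URL;
  const apiKey = override.apiKey ?? env.VPS_API_KEY;
  // Without both values the shared VPS client cannot be used.
  if (!url || !apiKey) return null;
  return { url, apiKey };
}

const config = resolveVpsConfig(process.env, {});
console.log(config ? "VPS client enabled" : "VPS client disabled");
```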
Current runtime behavior
- If Crunchbase is reachable and evidence is strong enough, the actor returns a verified company row
- If warmed Go-TLS times out or returns challenge/shell evidence, the actor retries once through VPS `/crawl` with the same upstream proxy identity
- If both VPS lanes fail and evidence is still weak, the actor returns a free terminal `failed` or `blocked` result or escalates to the managed fallback path when enabled
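The fallback order above can be sketched as a two-lane runner. This is a simplified illustration under several assumptions: the lane functions are caller-supplied stubs, the status vocabulary (`ok`, `timeout`, `challenge`, `shell`, `blocked`) is invented for the example, an `ok` status stands in for "evidence is strong enough", the mapping from final status to `blocked` or `failed` is a guess, and the managed fallback path is omitted.

```javascript
// Statuses from the Go-TLS lane that justify one /crawl retry.
const RETRYABLE = new Set(["timeout", "challenge", "shell"]);

// Hypothetical runner. lanes.goTls and lanes.crawl are async functions
// returning { status, row }.
async function runDirectCompany(url, proxySession, lanes) {
  const laneAttempts = [];

  // Lane 1: warmed Go-TLS fetch.
  const first = await lanes.goTls(url, proxySession);
  laneAttempts.push({ lane: "go-tls", status: first.status });
  if (first.status === "ok") {
    return { outcome: "verified_success", row: first.row, laneAttempts };
  }

  // Lane 2: a single /crawl retry that reuses the same proxy identity.
  if (RETRYABLE.has(first.status)) {
    const second = await lanes.crawl(url, proxySession);
    laneAttempts.push({ lane: "crawl", status: second.status });
    if (second.status === "ok") {
      return { outcome: "verified_success", row: second.row, laneAttempts };
    }
  }

  // Lanes exhausted: return a free terminal outcome (mapping is assumed).
  const lastStatus = laneAttempts[laneAttempts.length - 1].status;
  const outcome = lastStatus === "challenge" || lastStatus === "blocked" ? "blocked" : "failed";
  return { outcome, laneAttempts };
}

const stubLanes = {
  goTls: async () => ({ status: "timeout" }),
  crawl: async () => ({ status: "challenge" }),
};
runDirectCompany("https://www.crunchbase.com/organization/example", "session-1", stubLanes)
  .then((result) => console.log(result.outcome, result.laneAttempts.length)); // blocked 2
```

Keeping the retry count fixed at one is what makes the ladder bounded: a target that fails both lanes ends in a free terminal outcome rather than another attempt.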
Best input mode
Highest confidence:
- `companyUrls`

Lower confidence candidate generation:
- `searchUrls`
- `keywords`
Discovery inputs produce candidate company URLs that still need verified company extraction before they become billable outputs.
`searchUrls` and `keywords` are not part of this change's billed promotion-readiness validation.
Example output signals
- `outcome`
- `billingState`
- `billingReasons`
- `failureClass`
- `laneAttempts`
- `evidenceSummary`
- `fieldProvenance`
- `latencyMs`
These are included so operators and users can distinguish verified company intelligence from degraded or blocked runs.
Local development
    npm install
    npm test
    npm run test:smoke
If you want the heavier local actor path after the bounded smoke suite passes:
    $env:CRUNCHBASE_SMOKE_MODE='actor'
    apify run --input-file=verify-input-vps.json
Deploy
    apify push
