PitchBook Data Extractor
Pricing
from $2.99 / 1,000 results
PitchBook investor scraper that pulls firm profiles by investor ID, so you can build sourcing lists and keep your CRM current without clicking through profiles manually.
Developer: Kawsar (maintained by Community)
PitchBook Data Extractor scrapes public investor profile pages from PitchBook by investor ID or profile URL. Feed it a list of IDs and it returns structured JSON with firm details, deal counts, contact info, social links, and a sample of recent investments -- no manual browsing required.
What is PitchBook?
PitchBook is a financial data platform covering private equity, venture capital, and M&A activity. Each investor on the platform has a public profile page showing the firm's overview, investment history, portfolio companies, and contact details. This actor collects the data visible on those public pages.
What you get
Each scraped profile includes:
Identity and overview
- Firm name and logo URL
- Investor type (Venture Capital, Private Equity, Angel, Corporate, etc.)
- Active/Inactive status
- Investor status (e.g. Actively Seeking New Investments)
- Professionals count
- Total investments count
- Portfolio companies count
- Exits count
Company details
- Firm description / bio
- Website
- Year founded
- Trade association membership
- Primary and other investor types
- Full corporate office address (street, city, state, zip, country)
- LinkedIn profile link
- Twitter / X profile link
Recent investments (public sample)
- Up to 10 most recent deals showing: company name, PitchBook company URL, deal date, deal type, industry, company stage, and lead partner (where publicly available)
Metadata
- Profile URL
- Scraped timestamp (UTC ISO 8601)
- Error field (null on success, message on failure)
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `investorIds` | array of strings | Yes | One or more PitchBook investor IDs or full profile URLs |
| `maxItems` | integer | No | Max profiles to process (default: 100, max: 1000) |
| `requestTimeoutSecs` | integer | No | Per-request timeout in seconds (default: 30, max: 120) |
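The limits in the table can be checked on the caller's side before a run is started. The following is a minimal sketch, not part of the actor: `build_run_input` is a hypothetical helper, and the lower bound of 1 for both integers is an assumption, since the table only documents defaults and maximums.

```python
from typing import Any

# Defaults and maximums copied from the input table; the helper is hypothetical.
MAX_ITEMS_DEFAULT, MAX_ITEMS_LIMIT = 100, 1000
TIMEOUT_DEFAULT, TIMEOUT_LIMIT = 30, 120


def build_run_input(
    investor_ids: list[str],
    max_items: int = MAX_ITEMS_DEFAULT,
    request_timeout_secs: int = TIMEOUT_DEFAULT,
) -> dict[str, Any]:
    """Build an actor input dict, raising ValueError if a documented limit is exceeded."""
    # Drop empty or whitespace-only entries before checking the required field.
    ids = [value.strip() for value in investor_ids if value and value.strip()]
    if not ids:
        raise ValueError("investorIds is required and needs at least one entry")
    # Assumption: values below 1 are treated as invalid.
    if not 1 <= max_items <= MAX_ITEMS_LIMIT:
        raise ValueError(f"maxItems must be between 1 and {MAX_ITEMS_LIMIT}")
    if not 1 <= request_timeout_secs <= TIMEOUT_LIMIT:
        raise ValueError(f"requestTimeoutSecs must be between 1 and {TIMEOUT_LIMIT}")
    return {
        "investorIds": ids,
        "maxItems": max_items,
        "requestTimeoutSecs": request_timeout_secs,
    }
```

Calling `build_run_input(["41716-90"])` yields a dict with the documented defaults filled in, ready to be serialized as the actor input.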
How to find a PitchBook investor ID
Open any investor profile on PitchBook. The ID is the last segment of the URL path:
    https://pitchbook.com/profiles/investor/41716-90
                                            ^^^^^^^^ investor ID
You can paste the full URL or just the numeric ID into investorIds -- both work.
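If you keep IDs in your own systems, it can help to reduce URLs to bare IDs yourself. This is a rough sketch of the "last path segment" rule described above; `extract_investor_id` is a hypothetical helper, and the `digits-digits` pattern is an assumption inferred from the two example IDs in this README, not a documented PitchBook format.

```python
import re
from urllib.parse import urlparse

# Assumption: investor IDs look like "41716-90" (digits, hyphen, digits).
_ID_PATTERN = re.compile(r"^\d+-\d+$")


def extract_investor_id(value: str) -> str:
    """Return the investor ID from either a bare ID or a full profile URL."""
    # urlparse leaves a bare ID entirely in .path, so one code path covers both forms.
    path = urlparse(value.strip()).path
    # The ID is the last non-empty path segment; query strings are already excluded.
    candidate = path.rstrip("/").rsplit("/", 1)[-1]
    if not _ID_PATTERN.match(candidate):
        raise ValueError(f"not a recognizable investor ID: {value!r}")
    return candidate
```

Both `extract_investor_id("41716-90")` and `extract_investor_id("https://pitchbook.com/profiles/investor/41716-90")` return the same string.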
Known investor IDs (confirmed working)
| Investor ID | Firm Name | Type |
|---|---|---|
| 41716-90 | Andreessen Horowitz (a16z) | Venture Capital |
| 11295-73 | Sequoia Capital | Venture Capital |
To find IDs for other firms, open the firm's PitchBook profile page and copy the ID from the URL.
Example input -- minimal
{"investorIds": ["41716-90"]}
Example input -- batch with IDs only
{"investorIds": ["41716-90","11295-73"],"maxItems": 100}
Example input -- batch with full URLs
{"investorIds": ["https://pitchbook.com/profiles/investor/41716-90","https://pitchbook.com/profiles/investor/11295-73"],"maxItems": 100}
Example input -- mixed IDs and URLs
{"investorIds": ["41716-90","https://pitchbook.com/profiles/investor/11295-73"],"maxItems": 50,"requestTimeoutSecs": 60}
Output
Each item in the dataset looks like this:
    {
      "investorId": "41716-90",
      "profileUrl": "https://pitchbook.com/profiles/investor/41716-90",
      "name": "Andreessen Horowitz",
      "logoUrl": "https://image.pitchbook.com/KQfgZcIVkUmergLPYcA33weU7tH...",
      "investorType": "Venture Capital",
      "status": "Active",
      "investorStatus": "Actively Seeking New Investments",
      "professionalsCount": 159,
      "investmentsCount": 2702,
      "portfolioCount": 1154,
      "exitsCount": 564,
      "firmDescription": "Founded in 2009, Andreessen Horowitz is a venture capital firm based in Menlo Park, California. The firm prefers to invest in bio healthcare, artificial intelligence, consumer, crypto, enterprise, fintech, games, infrastructure, and American dynamism sectors.",
      "website": "https://www.a16z.com",
      "yearFounded": 2009,
      "tradeAssociation": "National Venture Capital Association (NVCA)",
      "primaryInvestorType": "Venture Capital",
      "otherInvestorTypes": "Accelerator/Incubator",
      "address": {
        "street": "2865 Sand Hill Road, Suite 101",
        "city": "Menlo Park",
        "stateRegion": "CA",
        "postalCode": "94025",
        "country": "United States"
      },
      "linkedinUrl": "https://www.linkedin.com/company/a16z",
      "twitterUrl": "https://twitter.com/a16z",
      "recentInvestments": [
        {
          "companyName": "Sparq",
          "companyProfileUrl": "https://pitchbook.com/profiles/company/1396426-69",
          "dealDate": "21-May-2026",
          "dealType": "Seed Round",
          "industry": "Software Development Applications",
          "companyStage": "Startup",
          "leadPartner": null
        },
        {
          "companyName": "Catena Labs",
          "companyProfileUrl": "https://pitchbook.com/profiles/company/531088-39",
          "dealDate": "20-May-2026",
          "dealType": null,
          "industry": "Other Financial Services",
          "companyStage": "Generating Revenue",
          "leadPartner": null
        }
      ],
      "scrapedAt": "2026-05-24T10:00:00+00:00",
      "error": null
    }
Output field reference
| Field | Type | Notes |
|---|---|---|
| `investorId` | string | Numeric ID extracted from the URL |
| `profileUrl` | string | Full PitchBook profile URL |
| `name` | string | Firm name |
| `logoUrl` | string | Absolute URL to the firm's logo image |
| `investorType` | string | e.g. Venture Capital, Private Equity, Angel |
| `status` | string | Active or Inactive |
| `investorStatus` | string | e.g. Actively Seeking New Investments |
| `professionalsCount` | integer | Number of listed professionals |
| `investmentsCount` | integer | Total investment count shown on profile |
| `portfolioCount` | integer | Active portfolio company count |
| `exitsCount` | integer | Total exit count |
| `firmDescription` | string | Company bio paragraph |
| `website` | string | Firm website URL |
| `yearFounded` | integer | Year the firm was founded |
| `tradeAssociation` | string | e.g. NVCA |
| `primaryInvestorType` | string | Main investor classification |
| `otherInvestorTypes` | string | Additional investor type labels |
| `address` | object | street, city, stateRegion, postalCode, country |
| `linkedinUrl` | string or null | LinkedIn company page URL |
| `twitterUrl` | string or null | Twitter/X profile URL |
| `recentInvestments` | array | Up to 10 recent deals (see below) |
| `scrapedAt` | string | UTC ISO 8601 timestamp |
| `error` | string or null | null on success, error message on failure |
recentInvestments item fields:
| Field | Type | Notes |
|---|---|---|
| `companyName` | string | Portfolio company name |
| `companyProfileUrl` | string or null | Link to the company's PitchBook profile |
| `dealDate` | string | e.g. 21-May-2026 |
| `dealType` | string or null | e.g. Seed Round, Series A. null if paywalled |
| `industry` | string or null | Industry classification |
| `companyStage` | string or null | e.g. Startup, Generating Revenue, Profitable |
| `leadPartner` | string or null | null if paywalled or not listed |
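Because `dealDate` arrives as a string such as 21-May-2026, it usually needs parsing before deals can be sorted or filtered. Below is a small sketch of one way to do that on a dataset item; `parse_deal_date` and `deals_since` are hypothetical helper names, and the `%d-%b-%Y` format is an assumption based on the sample values shown in this README.

```python
from datetime import date, datetime
from typing import Any, Optional


def parse_deal_date(value: Optional[str]) -> Optional[date]:
    """Parse a dealDate string like '21-May-2026'; return None for missing values."""
    if not value:
        return None
    # %b matches abbreviated month names; this assumes English names and the default C locale.
    return datetime.strptime(value, "%d-%b-%Y").date()


def deals_since(record: dict[str, Any], cutoff: date) -> list[dict[str, Any]]:
    """Return the recentInvestments entries dated on or after the cutoff."""
    recent = []
    # recentInvestments may be absent or None on failed profiles, so fall back to [].
    for deal in record.get("recentInvestments") or []:
        when = parse_deal_date(deal.get("dealDate"))
        if when is not None and when >= cutoff:
            recent.append(deal)
    return recent
```

The `scrapedAt` timestamp is ISO 8601, so `datetime.fromisoformat` from the standard library can read it directly.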
Use cases
Good for building VC and PE firm lists by sector, keeping CRM records fresh with current descriptions and social links, comparing portfolio sizes across funds, or pulling addresses and contact links for outreach in bulk. Anything that would otherwise mean clicking through dozens of profiles manually.
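For the CRM use case, the nested output usually has to be flattened into rows first. The sketch below shows one possible shape using only the standard library; the column selection in `CRM_COLUMNS` and the helper names are illustrative choices, not something the actor provides.

```python
import csv
import io
from typing import Any, Iterable

# Illustrative column choice; the first five come straight from the output fields.
CRM_COLUMNS = ["investorId", "name", "investorType", "website", "linkedinUrl", "city", "country"]


def to_crm_row(item: dict[str, Any]) -> dict[str, Any]:
    """Flatten one dataset item, lifting city and country out of the address object."""
    address = item.get("address") or {}
    row = {key: item.get(key) for key in CRM_COLUMNS[:5]}
    row["city"] = address.get("city")
    row["country"] = address.get("country")
    return row


def to_crm_csv(items: Iterable[dict[str, Any]]) -> str:
    """Render successful items as CSV text; items with a non-null error are skipped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CRM_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        if item.get("error"):
            continue
        writer.writerow(to_crm_row(item))
    return buffer.getvalue()
```

Missing values are written as empty cells, which most CRM importers treat as "no change" or blank.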
Limitations
- Only data visible on public PitchBook profile pages is collected. Subscription-gated content (deal sizes, fund performance metrics, full team rosters, LP data) is not available.
- PitchBook shows up to 10 recent investments on the public profile. Full investment history requires a PitchBook account.
- Some deal types, deal sizes, and lead partner names are behind a paywall and return `null`. This is expected.
- For large batch runs (500+ profiles), increase `requestTimeoutSecs` to 60 if you see timeout errors.
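Since `maxItems` is capped at 1000, a longer ID list has to be spread over several runs. The following sketch shows one way to plan those inputs; `plan_runs` is a hypothetical helper, and setting the 60-second timeout up front for batches of 500 or more is a design choice made here, whereas the advice above only suggests raising it once timeouts appear.

```python
from typing import Any


def plan_runs(investor_ids: list[str], max_per_run: int = 1000) -> list[dict[str, Any]]:
    """Split a long ID list into actor inputs that each respect the maxItems cap."""
    runs = []
    for start in range(0, len(investor_ids), max_per_run):
        batch = investor_ids[start:start + max_per_run]
        run_input: dict[str, Any] = {"investorIds": batch, "maxItems": len(batch)}
        # Choice made in this sketch: raise the timeout pre-emptively for large batches.
        if len(batch) >= 500:
            run_input["requestTimeoutSecs"] = 60
        runs.append(run_input)
    return runs
```

Each dict in the returned list can be submitted as a separate run; smaller batches keep the default timeout.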
Valid input formats
All of the following are accepted in investorIds:
- 41716-90
- 11295-73
- https://pitchbook.com/profiles/investor/41716-90
- https://pitchbook.com/profiles/investor/11295-73
Mixed formats in one run also work:
{"investorIds": ["41716-90","11295-73","https://pitchbook.com/profiles/investor/41716-90"]}
Duplicate IDs (same ID entered as both a raw ID and a full URL) are de-duplicated automatically.
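To preview how many unique profiles a mixed list will produce, the de-duplication can be approximated locally. This is only a sketch of the idea and not the actor's own code: `dedupe_investor_inputs` is a hypothetical name, and it assumes the ID is always the last path segment of a profile URL.

```python
def dedupe_investor_inputs(values: list[str]) -> list[str]:
    """Collapse raw IDs and profile URLs to unique IDs, keeping first-seen order."""
    seen: set[str] = set()
    unique: list[str] = []
    for value in values:
        # A bare ID has no "/", so rsplit returns it unchanged; a URL yields its last segment.
        investor_id = value.strip().rstrip("/").rsplit("/", 1)[-1]
        if investor_id and investor_id not in seen:
            seen.add(investor_id)
            unique.append(investor_id)
    return unique
```

Applied to the mixed-format example above, the three entries reduce to two IDs, so the run would process two profiles.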