# PitchBook Data Extractor (`kawsar/pitchbook-data-extractor`) Actor

PitchBook investor scraper that pulls firm profiles by investor ID, so you can build sourcing lists and keep your CRM current without clicking through profiles manually.

- **URL**: https://apify.com/kawsar/pitchbook-data-extractor.md
- **Developed by:** [Kawsar](https://apify.com/kawsar) (community)
- **Categories:** Automation, Lead generation, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $2.99 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
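
To get a rough feel for run costs, the listed rate can be turned into a quick estimate. This is a sketch only: it assumes the listed $2.99 per 1,000 results is the rate your plan actually gets and that each scraped profile is billed as one result. Neither is confirmed here, and `estimate_cost_usd` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Listed starting price from the Pricing section: $2.99 per 1,000 results.
# Assumption: this rate applies to your plan; the real rate may differ.
PRICE_PER_1000_RESULTS = Decimal("2.99")


def estimate_cost_usd(result_count: int) -> Decimal:
    """Estimate the event charge for a run, rounded to cents.

    Assumes one billed result per scraped profile (hypothetical).
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_cost_usd(250))   # estimate for 250 profiles
print(estimate_cost_usd(1000))  # estimate at the documented maxItems ceiling
```

Check the pricing page for the event prices on your plan before budgeting a large batch.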

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PitchBook Data Extractor

PitchBook Data Extractor scrapes public investor profile pages from PitchBook by investor ID or profile URL. Feed it a list of IDs and it returns structured JSON with firm details, deal counts, contact info, social links, and a sample of recent investments -- no manual browsing required.

---

### What is PitchBook?

PitchBook is a financial data platform covering private equity, venture capital, and M&A activity. Each investor on the platform has a public profile page showing the firm's overview, investment history, portfolio companies, and contact details. This Actor collects the data visible on those public pages.

---

### What you get

Each scraped profile includes:

**Identity and overview**
- Firm name and logo URL
- Investor type (Venture Capital, Private Equity, Angel, Corporate, etc.)
- Active/Inactive status
- Investor status (e.g. Actively Seeking New Investments)
- Professionals count
- Total investments count
- Portfolio companies count
- Exits count

**Company details**
- Firm description / bio
- Website
- Year founded
- Trade association membership
- Primary and other investor types
- Full corporate office address (street, city, state, zip, country)
- LinkedIn profile link
- Twitter / X profile link

**Recent investments (public sample)**
- Up to 10 most recent deals showing: company name, PitchBook company URL, deal date, deal type, industry, company stage, and lead partner (where publicly available)

**Metadata**
- Profile URL
- Scraped timestamp (UTC ISO 8601)
- Error field (null on success, message on failure)

---

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `investorIds` | array of strings | Yes | One or more PitchBook investor IDs or full profile URLs |
| `maxItems` | integer | No | Max profiles to process (default: 100, min: 1, max: 1000) |
| `requestTimeoutSecs` | integer | No | Per-request timeout in seconds (default: 30, min: 5, max: 120) |
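
If you build inputs in code, it can help to check them against these limits before starting a run. The sketch below is one way to do that; `build_input` is a hypothetical helper, and the lower bounds (1 and 5) are taken from the OpenAPI schema further down this page.

```python
def build_input(investor_ids, max_items=100, request_timeout_secs=30):
    """Return an Actor input dict, rejecting values outside the documented limits."""
    # Drop blank entries and surrounding whitespace before validating.
    ids = [str(value).strip() for value in investor_ids if str(value).strip()]
    if not ids:
        raise ValueError("investorIds is required and must not be empty")
    if not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be between 1 and 1000")
    if not 5 <= request_timeout_secs <= 120:
        raise ValueError("requestTimeoutSecs must be between 5 and 120")
    return {
        "investorIds": ids,
        "maxItems": max_items,
        "requestTimeoutSecs": request_timeout_secs,
    }


print(build_input(["41716-90", "11295-73"], max_items=50))
```

The returned dict can be passed as the run input in the client examples in the API section.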

#### How to find a PitchBook investor ID

Open any investor profile on PitchBook. The ID is the last segment of the URL path:

```
https://pitchbook.com/profiles/investor/41716-90
                                        ^^^^^^^^
                                        investor ID
```

You can paste the full URL or just the numeric ID into `investorIds` -- both work.
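
If you want to pull the ID out of a URL yourself (for example, to key CRM records by ID), a small helper is enough. This is a minimal sketch: the `digits-digits` pattern is inferred from the two example IDs in this README rather than from any PitchBook specification, and `extract_investor_id` is a hypothetical name.

```python
import re
from typing import Optional

# Assumption: investor IDs look like "41716-90" (digits, hyphen, digits).
_ID_RE = re.compile(r"^\d+-\d+$")
_URL_RE = re.compile(
    r"^https?://(?:www\.)?pitchbook\.com/profiles/investor/(\d+-\d+)/?$"
)


def extract_investor_id(value: str) -> Optional[str]:
    """Return the investor ID from a raw ID or a profile URL, or None if neither matches."""
    value = value.strip()
    if _ID_RE.match(value):
        return value
    match = _URL_RE.match(value)
    return match.group(1) if match else None


print(extract_investor_id("https://pitchbook.com/profiles/investor/41716-90"))
print(extract_investor_id("11295-73"))
```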

#### Known investor IDs (confirmed working)

| Investor ID | Firm Name | Type |
|-------------|-----------|------|
| `41716-90` | Andreessen Horowitz (a16z) | Venture Capital |
| `11295-73` | Sequoia Capital | Venture Capital |

To find IDs for other firms, open the firm's PitchBook profile page and copy the ID from the URL.

#### Example input -- minimal

```json
{
    "investorIds": ["41716-90"]
}
````

#### Example input -- batch with IDs only

```json
{
    "investorIds": [
        "41716-90",
        "11295-73"
    ],
    "maxItems": 100
}
```

#### Example input -- batch with full URLs

```json
{
    "investorIds": [
        "https://pitchbook.com/profiles/investor/41716-90",
        "https://pitchbook.com/profiles/investor/11295-73"
    ],
    "maxItems": 100
}
```

#### Example input -- mixed IDs and URLs

```json
{
    "investorIds": [
        "41716-90",
        "https://pitchbook.com/profiles/investor/11295-73"
    ],
    "maxItems": 50,
    "requestTimeoutSecs": 60
}
```

***

### Output

Each item in the dataset looks like this:

```json
{
    "investorId": "41716-90",
    "profileUrl": "https://pitchbook.com/profiles/investor/41716-90",
    "name": "Andreessen Horowitz",
    "logoUrl": "https://image.pitchbook.com/KQfgZcIVkUmergLPYcA33weU7tH...",
    "investorType": "Venture Capital",
    "status": "Active",
    "investorStatus": "Actively Seeking New Investments",
    "professionalsCount": 159,
    "investmentsCount": 2702,
    "portfolioCount": 1154,
    "exitsCount": 564,
    "firmDescription": "Founded in 2009, Andreessen Horowitz is a venture capital firm based in Menlo Park, California. The firm prefers to invest in bio healthcare, artificial intelligence, consumer, crypto, enterprise, fintech, games, infrastructure, and American dynamism sectors.",
    "website": "https://www.a16z.com",
    "yearFounded": 2009,
    "tradeAssociation": "National Venture Capital Association (NVCA)",
    "primaryInvestorType": "Venture Capital",
    "otherInvestorTypes": "Accelerator/Incubator",
    "address": {
        "street": "2865 Sand Hill Road, Suite 101",
        "city": "Menlo Park",
        "stateRegion": "CA",
        "postalCode": "94025",
        "country": "United States"
    },
    "linkedinUrl": "https://www.linkedin.com/company/a16z",
    "twitterUrl": "https://twitter.com/a16z",
    "recentInvestments": [
        {
            "companyName": "Sparq",
            "companyProfileUrl": "https://pitchbook.com/profiles/company/1396426-69",
            "dealDate": "21-May-2026",
            "dealType": "Seed Round",
            "industry": "Software Development Applications",
            "companyStage": "Startup",
            "leadPartner": null
        },
        {
            "companyName": "Catena Labs",
            "companyProfileUrl": "https://pitchbook.com/profiles/company/531088-39",
            "dealDate": "20-May-2026",
            "dealType": null,
            "industry": "Other Financial Services",
            "companyStage": "Generating Revenue",
            "leadPartner": null
        }
    ],
    "scrapedAt": "2026-05-24T10:00:00+00:00",
    "error": null
}
```

#### Output field reference

| Field | Type | Notes |
|-------|------|-------|
| `investorId` | string | Numeric ID extracted from the URL |
| `profileUrl` | string | Full PitchBook profile URL |
| `name` | string | Firm name |
| `logoUrl` | string | Absolute URL to the firm's logo image |
| `investorType` | string | e.g. Venture Capital, Private Equity, Angel |
| `status` | string | Active or Inactive |
| `investorStatus` | string | e.g. Actively Seeking New Investments |
| `professionalsCount` | integer | Number of listed professionals |
| `investmentsCount` | integer | Total investment count shown on profile |
| `portfolioCount` | integer | Active portfolio company count |
| `exitsCount` | integer | Total exit count |
| `firmDescription` | string | Company bio paragraph |
| `website` | string | Firm website URL |
| `yearFounded` | integer | Year the firm was founded |
| `tradeAssociation` | string | e.g. NVCA |
| `primaryInvestorType` | string | Main investor classification |
| `otherInvestorTypes` | string | Additional investor type labels |
| `address` | object | Street, city, stateRegion, postalCode, country |
| `linkedinUrl` | string or null | LinkedIn company page URL |
| `twitterUrl` | string or null | Twitter/X profile URL |
| `recentInvestments` | array | Up to 10 recent deals (see below) |
| `scrapedAt` | string | UTC ISO 8601 timestamp |
| `error` | string or null | null on success, error message on failure |

**`recentInvestments` item fields:**

| Field | Type | Notes |
|-------|------|-------|
| `companyName` | string | Portfolio company name |
| `companyProfileUrl` | string or null | Link to the company's PitchBook profile |
| `dealDate` | string | e.g. 21-May-2026 |
| `dealType` | string or null | e.g. Seed Round, Series A. null if paywalled |
| `industry` | string or null | Industry classification |
| `companyStage` | string or null | e.g. Startup, Generating Revenue, Profitable |
| `leadPartner` | string or null | null if paywalled or not listed |
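
Because `dealDate` arrives as text and several deal fields may be `null`, downstream code usually needs a small normalization step. The sketch below assumes the `DD-Mon-YYYY` shape of the single example above with English three-letter month names; `parse_deal_date` and `summarize_deal` are hypothetical helpers.

```python
from datetime import date
from typing import Optional

# Assumption: months are English three-letter abbreviations, as in "21-May-2026".
_MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_deal_date(value: Optional[str]) -> Optional[date]:
    """Parse a 'DD-Mon-YYYY' string; return None when the value is missing or malformed."""
    if not value:
        return None
    parts = value.strip().split("-")
    if len(parts) != 3 or parts[1] not in _MONTHS:
        return None
    try:
        return date(int(parts[2]), _MONTHS[parts[1]], int(parts[0]))
    except ValueError:
        return None


def summarize_deal(deal: dict) -> str:
    """Render one recentInvestments item, tolerating null dealType and dealDate."""
    when = parse_deal_date(deal.get("dealDate"))
    when_text = when.isoformat() if when else "unknown date"
    deal_type = deal.get("dealType") or "undisclosed"
    return f"{when_text}: {deal['companyName']} ({deal_type})"


print(summarize_deal(
    {"companyName": "Catena Labs", "dealDate": "20-May-2026", "dealType": None}
))
```

The month table is used instead of `datetime.strptime` with `%b` so the result does not depend on the machine's locale.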

***

### Use cases

Good for building VC and PE firm lists by sector, keeping CRM records fresh with current descriptions and social links, comparing portfolio sizes across funds, or pulling addresses and contact links for outreach in bulk. Anything that would otherwise mean clicking through dozens of profiles manually.

***

### Limitations

- Only data visible on public PitchBook profile pages is collected. Subscription-gated content (deal sizes, fund performance metrics, full team rosters, LP data) is not available.
- PitchBook shows up to 10 recent investments on the public profile. Full investment history requires a PitchBook account.
- Some deal types, deal sizes, and lead partner names are behind a paywall and return `null`. This is expected.
- For large batch runs (500+ profiles), increase `requestTimeoutSecs` to 60 if you see timeout errors.
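
One way to act on the timeout advice is to re-run only the profiles that failed, with a longer timeout. The sketch below builds that follow-up input from a finished run's dataset items. It assumes failed items still carry `investorId` or `profileUrl`, which this README does not confirm, and it does not try to tell timeouts apart from other errors because the error message format is not documented. `build_retry_input` is a hypothetical helper.

```python
from typing import Iterable, Optional


def build_retry_input(items: Iterable[dict], timeout_secs: int = 60) -> Optional[dict]:
    """Build an input that re-runs only the profiles whose `error` field is set."""
    failed = []
    for item in items:
        if not item.get("error"):
            continue  # `error` is null on success
        # Assumption: failed items still include investorId or profileUrl.
        ref = item.get("investorId") or item.get("profileUrl")
        if ref and ref not in failed:
            failed.append(ref)
    if not failed:
        return None
    return {
        "investorIds": failed,
        # Stay inside the documented 120-second ceiling.
        "requestTimeoutSecs": min(timeout_secs, 120),
    }


items = [
    {"investorId": "41716-90", "error": None},
    {"investorId": "11295-73", "error": "request timed out"},
]
print(build_retry_input(items))
```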

***

### Valid input formats

All of the following are accepted in `investorIds`:

```
41716-90
11295-73
https://pitchbook.com/profiles/investor/41716-90
https://pitchbook.com/profiles/investor/11295-73
```

Mixed formats in one run also work:

```json
{
    "investorIds": [
        "41716-90",
        "11295-73",
        "https://pitchbook.com/profiles/investor/41716-90"
    ]
}
```

Duplicate IDs (same ID entered as both a raw ID and a full URL) are de-duplicated automatically.
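
If you want to know in advance how many distinct profiles a list will produce, you can mirror that behaviour on the client side. This is only an approximation: the Actor's own de-duplication logic is not shown here, and `dedupe_investor_ids` is a hypothetical helper that treats the last path segment of a URL as the ID.

```python
def dedupe_investor_ids(values):
    """Collapse raw IDs and profile URLs that point at the same investor, keeping first-seen order."""
    seen = set()
    unique = []
    for value in values:
        # The last path segment is the ID, so a raw ID and its URL compare equal.
        key = value.strip().rstrip("/").rsplit("/", 1)[-1]
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


print(dedupe_investor_ids([
    "41716-90",
    "11295-73",
    "https://pitchbook.com/profiles/investor/41716-90",
]))
```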

# Actor input Schema

## `investorIds` (type: `array`):

One or more PitchBook investor IDs (e.g. 41716-90) or full profile URLs (e.g. https://pitchbook.com/profiles/investor/41716-90).

## `maxItems` (type: `integer`):

Maximum number of investor profiles to process per run.

## `requestTimeoutSecs` (type: `integer`):

Per-request timeout in seconds.

## Actor input object example

```json
{
  "investorIds": [
    "41716-90",
    "11295-73"
  ],
  "maxItems": 100,
  "requestTimeoutSecs": 30
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "investorIds": [
        "https://pitchbook.com/profiles/investor/52749-01"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("kawsar/pitchbook-data-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "investorIds": ["https://pitchbook.com/profiles/investor/52749-01"] }

# Run the Actor and wait for it to finish
run = client.actor("kawsar/pitchbook-data-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "investorIds": [
    "https://pitchbook.com/profiles/investor/52749-01"
  ]
}' |
apify call kawsar/pitchbook-data-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=kawsar/pitchbook-data-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PitchBook Data Extractor",
        "description": "PitchBook investor scraper that pulls firm profiles by investor ID, so you can build sourcing lists and keep your CRM current without clicking through profiles manually.",
        "version": "0.0",
        "x-build-id": "tHZnX8dH8xrD45p5a"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/kawsar~pitchbook-data-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-kawsar-pitchbook-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/kawsar~pitchbook-data-extractor/runs": {
            "post": {
                "operationId": "runs-sync-kawsar-pitchbook-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/kawsar~pitchbook-data-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-kawsar-pitchbook-data-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "investorIds"
                ],
                "properties": {
                    "investorIds": {
                        "title": "Investor IDs or URLs",
                        "type": "array",
                        "description": "One or more PitchBook investor IDs (e.g. 41716-90) or full profile URLs (e.g. https://pitchbook.com/profiles/investor/41716-90).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of investor profiles to process per run.",
                        "default": 100
                    },
                    "requestTimeoutSecs": {
                        "title": "Request timeout (seconds)",
                        "minimum": 5,
                        "maximum": 120,
                        "type": "integer",
                        "description": "Per-request timeout in seconds.",
                        "default": 30
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
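
For projects that cannot use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The following is a minimal standard-library sketch, not an official client: `build_request` and `run_actor` are hypothetical helpers, the 300-second client timeout is an arbitrary choice, and the response is assumed to be a JSON body of dataset items, since the specification only documents a `200 OK`.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/kawsar~pitchbook-data-extractor/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request described by the OpenAPI specification."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not contact the API; call run_actor() to execute it.
request = build_request("<YOUR_API_TOKEN>", {"investorIds": ["41716-90"]})
print(request.get_method(), request.full_url)
```

For long batches, the asynchronous `/runs` path may be a better fit than holding a single HTTP connection open until the run completes.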
