# PitchBook Fund Data Scraper (`crawlerbros/pitchbook-fund-scraper`) Actor

Scrape public fund profile metadata from PitchBook without a subscription. Supports text search, direct profile URLs, and bulk sitemap discovery. Returns fund name, strategy, size, vintage year, manager, status, location, and more.

- **URL**: https://apify.com/crawlerbros/pitchbook-fund-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Other, Developer tools, Lead generation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 7 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
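
As a rough illustration of the per-result arithmetic, the sketch below estimates the event charge for a given number of results at the listed starting rate of $1.00 per 1,000 results. The helper name `estimate_result_charge` and the flat-rate assumption are illustrative only; actual charges depend on your plan's discount, and platform usage is billed separately.

```python
from decimal import Decimal

# Listed starting rate from the Pricing section: $1.00 per 1,000 results.
RATE_PER_1000_RESULTS = Decimal("1.00")


def estimate_result_charge(result_count: int) -> Decimal:
    """Estimate the per-result event charge in USD (hypothetical helper).

    Assumes a flat rate; platform usage and plan discounts are not modeled.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    return RATE_PER_1000_RESULTS * result_count / 1000


if __name__ == "__main__":
    print(estimate_result_charge(2500))
```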

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PitchBook Fund Data Scraper

Extract publicly available fund profile data from [PitchBook](https://pitchbook.com) — no subscription or login required. Supports three modes: scrape specific fund profiles by URL, search by keyword, or bulk-collect thousands of funds via PitchBook's sitemap.

### What This Scraper Does

This Actor fetches fund profile pages from PitchBook and extracts all publicly visible metadata: fund name, strategy, size, vintage year, manager details, status, location, investment counts, and more. It does **not** require a PitchBook account.

### Input

| Field | Type | Description |
|---|---|---|
| **Profile URLs** | List of strings | Direct fund profile URLs (e.g. `https://pitchbook.com/profiles/fund/11295-73F`) or bare IDs (e.g. `11295-73F`). When provided, Search Query is ignored. |
| **Search Query** | String | Keyword search (e.g. `venture capital`, `buyout europe`). Used when no Profile URLs are given. |
| **Max Items** | Integer | Maximum number of fund records to return (1–100,000). Ignored in direct URL mode. Default: 3. |
| **Proxy Configuration** | Proxy object | Residential proxy is **required** for reliable scraping. PitchBook's Cloudflare protection rate-limits repeated requests from the same IP. |

#### Input Modes

1. **Direct** — Provide one or more fund profile URLs. All are processed regardless of Max Items.
2. **Search** — Provide a search query. The scraper paginates PitchBook search results and returns up to Max Items funds matching your query.
3. **Bulk** — Leave both fields empty. The scraper streams fund URLs from PitchBook's public sitemaps and returns up to Max Items funds.
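
The precedence between the three modes can be sketched as a small function. This is a minimal illustration of the rules listed above, not the Actor's actual code: the name `resolve_input_mode` is hypothetical, and treating a whitespace-only query as empty is an assumption.

```python
from typing import Any, Mapping


def resolve_input_mode(actor_input: Mapping[str, Any]) -> str:
    """Return 'direct', 'search', or 'bulk' following the documented precedence."""
    # Profile URLs take priority; the search query is ignored when they are present.
    if actor_input.get("profileUrls"):
        return "direct"
    # A non-blank search query selects search mode (blank handling is an assumption).
    query = actor_input.get("searchQuery") or ""
    if query.strip():
        return "search"
    # Both fields empty: sitemap-based bulk discovery.
    return "bulk"


if __name__ == "__main__":
    print(resolve_input_mode({"profileUrls": ["11295-73F"], "searchQuery": "buyout"}))
    print(resolve_input_mode({"searchQuery": "buyout europe"}))
    print(resolve_input_mode({}))
```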

### Output

Each record represents one fund profile. Fields marked with `?` are optional — they appear only when PitchBook makes them publicly available.

| Field | Type | Description |
|---|---|---|
| `name` | string | Fund name |
| `profileUrl` | string | Canonical PitchBook fund profile URL |
| `description` | string? | Fund description |
| `logoUrl` | string? | Fund manager logo URL |
| `socialLinks` | object? | Social profiles (twitter, linkedin, facebook) |
| `fundStrategy` | string? | Investment strategy (e.g. Buyout, Venture Capital) |
| `fundStatus` | string? | Fund lifecycle status (e.g. Active, Liquidated) |
| `fundSize` | string? | Total committed capital (e.g. $6.11B) |
| `vintageYear` | integer? | Year the fund was raised |
| `fundCategory` | string? | Asset class (e.g. Private Equity, Venture Capital) |
| `fundFamily` | string? | Fund family name |
| `fundManager` | string? | Managing firm name |
| `fundManagerUrl` | string? | PitchBook profile URL of the managing firm |
| `fundManagerWebsite` | string? | Managing firm's external website URL |
| `fundDomiciles` | string? | Domicile jurisdiction(s) of the fund (e.g. United States: Delaware) |
| `nativeCurrency` | string? | Fund's reporting currency (e.g. USD, EUR) |
| `totalInvestments` | integer? | Number of portfolio investments |
| `totalLimitedPartners` | integer? | Number of limited partners |
| `streetAddress` | string? | Manager street address |
| `postalCode` | string? | Manager postal code |
| `city` | string? | Manager city |
| `state` | string? | Manager state / region |
| `country` | string? | Fund domicile country |
| `scrapedAt` | string | ISO 8601 UTC timestamp of when the record was scraped |
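
Because `fundSize` is returned as a display string, downstream code may want a numeric value. The sketch below converts strings shaped like the documented example `$6.11B`. The helper name `parse_fund_size` is hypothetical, and every format other than that one example (the `K`, `M`, and `T` suffixes, thousands separators, other currency symbols) is an assumption rather than documented behavior.

```python
import re
from decimal import Decimal
from typing import Optional

# Only "B" appears in the documented example; the other suffixes are assumptions.
_SUFFIX_MULTIPLIERS = {"K": 10**3, "M": 10**6, "B": 10**9, "T": 10**12}
_FUND_SIZE_RE = re.compile(r"^\D*?(\d[\d,]*(?:\.\d+)?)\s*([KMBT])?$", re.IGNORECASE)


def parse_fund_size(fund_size: Optional[str]) -> Optional[int]:
    """Convert a display string such as '$6.11B' to whole currency units.

    Returns None when the value is missing or does not match the assumed format.
    """
    if not fund_size:
        return None
    match = _FUND_SIZE_RE.match(fund_size.strip())
    if match is None:
        return None
    # Decimal avoids binary floating-point rounding when scaling by the multiplier.
    number = Decimal(match.group(1).replace(",", ""))
    multiplier = _SUFFIX_MULTIPLIERS.get((match.group(2) or "").upper(), 1)
    return int(number * multiplier)


if __name__ == "__main__":
    print(parse_fund_size("$6.11B"))
```

The currency itself is not inferred here; pair the parsed number with `nativeCurrency` when that field is present.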

#### Error Records

If a profile cannot be fetched or parsed, the record will contain:

| Field | Description |
|---|---|
| `inputUrl` | The URL or ID that was attempted |
| `error` | Human-readable error message |
| `scrapedAt` | Timestamp |
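
Since fund records and error records share one dataset, a consumer will usually separate them first. A minimal sketch, assuming error records can be recognized by the presence of the `error` field; the helper name `split_records` and the sample values are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def split_records(items: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate fund records from error records by the presence of an 'error' key."""
    funds: List[Record] = []
    errors: List[Record] = []
    for item in items:
        (errors if "error" in item else funds).append(item)
    return funds, errors


if __name__ == "__main__":
    # Made-up sample items that only mirror the documented field names.
    sample = [
        {"name": "Example Fund I", "profileUrl": "https://pitchbook.com/profiles/fund/11295-73F"},
        {"inputUrl": "00000-00F", "error": "Profile could not be fetched"},
    ]
    funds, errors = split_records(sample)
    print(len(funds), "fund record(s),", len(errors), "error record(s)")
```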

### Frequently Asked Questions

**Do I need a PitchBook account?**
No. This scraper only extracts data from public PitchBook pages that are visible to any visitor without logging in.

**Why is a proxy required?**
PitchBook uses Cloudflare to protect its pages. Without rotating residential proxies, repeated requests from the same IP address will be blocked. The scraper uses Apify's residential proxy pool by default.

**What data is NOT available?**
Fields that require a PitchBook subscription are not included: IRR, DPI, RVPI, TVPI fund returns, dry powder, deal sizes, LP commitment amounts, investment strategy charts, fund terms & fees, and full contact phone numbers.

**How many funds are available in bulk mode?**
PitchBook's public sitemaps contain more than 150,000 fund profiles. Use Max Items to control how many are scraped per run.

**Can I search for funds by strategy or geography?**
Yes. Use Search Query with terms like `buyout europe`, `venture capital berlin`, or `growth equity asia`.

**What fund URL format does PitchBook use?**
Fund profile URLs follow the pattern `https://pitchbook.com/profiles/fund/{id}F` where the ID ends with a capital `F` (e.g. `11295-73F`). This distinguishes fund profiles from investor profiles (`/profiles/investor/11295-73`).
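
The URL pattern above can be used to normalize mixed input (bare IDs and full URLs) before passing it to the Actor. This is a sketch under stated assumptions: the ID shape `digits-digitsF` is inferred from the single documented example `11295-73F`, and the helper name `normalize_fund_reference` is hypothetical.

```python
import re

FUND_PROFILE_BASE = "https://pitchbook.com/profiles/fund/"
# Assumed ID shape, inferred from the example `11295-73F`:
# digits, a hyphen, digits, then a capital F.
_FUND_ID_RE = re.compile(r"^\d+-\d+F$")


def normalize_fund_reference(reference: str) -> str:
    """Turn a bare fund ID or a profile URL into a canonical fund profile URL."""
    candidate = reference.strip()
    if candidate.startswith(FUND_PROFILE_BASE):
        candidate = candidate[len(FUND_PROFILE_BASE):]
    # Drop any query string, fragment, or trailing slash left on the ID part.
    candidate = re.split(r"[?#]", candidate, maxsplit=1)[0].rstrip("/")
    if not _FUND_ID_RE.match(candidate):
        raise ValueError(f"not a fund profile reference: {reference!r}")
    return FUND_PROFILE_BASE + candidate


if __name__ == "__main__":
    print(normalize_fund_reference("11295-73F"))
```

Investor profile URLs such as `/profiles/investor/11295-73` are rejected with a `ValueError` because they lack the trailing `F`.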

# Actor input Schema

## `profileUrls` (type: `array`):

Direct PitchBook fund profile URLs or bare IDs (e.g. https://pitchbook.com/profiles/fund/11295-73F or 11295-73F). When provided, searchQuery is ignored.

## `searchQuery` (type: `string`):

Text search for funds (e.g. 'venture capital' or 'buyout europe'). Used when no profileUrls are provided.

## `maxItems` (type: `integer`):

Maximum number of fund records to return. Ignored when profileUrls are provided (all URLs are always processed).

## `proxyConfiguration` (type: `object`):

Apify proxy configuration. Residential proxy is required for reliable scraping because PitchBook's Cloudflare protection blocks repeated requests from the same IP.

## Actor input object example

```json
{
  "searchQuery": "venture capital",
  "maxItems": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
````
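
If you assemble the input object in code, it can help to validate it against the documented constraints before starting a run. The sketch below is one possible way to do that; `build_actor_input` is a hypothetical helper, and its defaults simply mirror the schema (a `maxItems` default of 3, a range of 1 to 100,000, and the residential proxy group).

```python
from typing import Any, Dict, List, Optional


def build_actor_input(
    profile_urls: Optional[List[str]] = None,
    search_query: Optional[str] = None,
    max_items: int = 3,
) -> Dict[str, Any]:
    """Assemble an input object using the documented field names (hypothetical helper)."""
    # The schema bounds maxItems to the range 1..100000.
    if not 1 <= max_items <= 100_000:
        raise ValueError("max_items must be between 1 and 100000")
    actor_input: Dict[str, Any] = {
        "maxItems": max_items,
        # Residential proxy group, as required by the schema description.
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }
    # Optional fields are only included when they carry a value.
    if profile_urls:
        actor_input["profileUrls"] = list(profile_urls)
    if search_query:
        actor_input["searchQuery"] = search_query
    return actor_input


if __name__ == "__main__":
    print(build_actor_input(search_query="venture capital"))
```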

# Actor output Schema

## `results` (type: `string`):

Dataset containing scraped fund profile records

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "venture capital",
    "maxItems": 3,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/pitchbook-fund-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "venture capital",
    "maxItems": 3,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/pitchbook-fund-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "venture capital",
  "maxItems": 3,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call crawlerbros/pitchbook-fund-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/pitchbook-fund-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PitchBook Fund Data Scraper",
        "description": "Scrape public fund profile metadata from PitchBook without a subscription. Supports text search, direct profile URLs, and bulk sitemap discovery. Returns fund name, strategy, size, vintage year, manager, status, location, and more.",
        "version": "1.0",
        "x-build-id": "uNseJq3CT4IYnOpdY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~pitchbook-fund-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-pitchbook-fund-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~pitchbook-fund-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-pitchbook-fund-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~pitchbook-fund-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-pitchbook-fund-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "profileUrls": {
                        "title": "Profile URLs",
                        "type": "array",
                        "description": "Direct PitchBook fund profile URLs or bare IDs (e.g. https://pitchbook.com/profiles/fund/11295-73F or 11295-73F). When provided, searchQuery is ignored.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Text search for funds (e.g. 'venture capital' or 'buyout europe'). Used when no profileUrls are provided.",
                        "default": "venture capital"
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Maximum number of fund records to return. Ignored when profileUrls are provided (all URLs are always processed).",
                        "default": 3
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Apify proxy configuration. Residential proxy is required for reliable scraping — PitchBook's Cloudflare protection blocks repeated requests from the same IP.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
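
For projects that call the REST API directly rather than through a client library, the `run-sync-get-dataset-items` path from the specification above can be wrapped as follows. This is a standard-library sketch: the helper name `build_sync_request` is hypothetical, and the request is only constructed here, not sent, so that the example runs without a token or network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~pitchbook-fund-scraper/run-sync-get-dataset-items"


def build_sync_request(token: str, actor_input: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST request for the synchronous dataset endpoint."""
    # The specification passes the API token as the `token` query parameter.
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_sync_request(
        "<YOUR_API_TOKEN>", {"searchQuery": "venture capital", "maxItems": 3}
    )
    # Sending requires a valid token and network access:
    # with urllib.request.urlopen(request) as response:
    #     items = json.load(response)
    print(request.get_method(), request.full_url)
```

Because this endpoint waits for the run to finish, long bulk runs may be better served by the asynchronous `/runs` path listed in the same specification.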
