# Y Combinator Companies Scraper (`crawlerbros/y-combinator-scraper`) Actor

Scrape the full Y Combinator company directory with company profiles, founders, open jobs, batch, industry, status, and social links. HTTP-only, no login required.

- **URL**: https://apify.com/crawlerbros/y-combinator-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Jobs, Lead generation, Developer tools
- **Stats:** 1 total users, 0 monthly users, 100.0% runs succeeded, 10 bookmarks
- **User rating**: 3.56 out of 5 stars

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Since this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
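
As a rough illustration of the per-result pricing above, the sketch below estimates the event charge for a run. It assumes the listed starting price of $1.00 per 1,000 results applies uniformly. Platform usage and plan discounts are not modelled, and `estimate_event_cost` is a hypothetical helper, not part of any Apify library.

```python
def estimate_event_cost(num_results: int, price_per_1000: float = 1.00) -> float:
    """Estimate the per-result charge in USD (platform usage excluded)."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return num_results / 1000 * price_per_1000


# A run capped at the documented maximum of 5,500 companies.
print(f"${estimate_event_cost(5500):.2f}")
```

Treat the figure as a lower bound, since platform usage is billed on top of the event price.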

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Y Combinator Companies Scraper

Scrape the complete [Y Combinator company directory](https://www.ycombinator.com/companies) — 5,000+ startups across every batch from 2005 to today. Get company profiles, founders, open jobs, industry tags, batch, status, funding stage, team size, and social links. HTTP-only; no login, no cookies, no proxy required.

### Output (per company)

- `type` = `yc_company`
- `id` (slug), `slug`, `url`, `name`, `kind` (startup / non-profit / etc.)
- `shortDescription`, `longDescription`, `pitch` (full pitch text when present)
- `batch` (e.g. `S24`), `industry`, `subIndustry`, `status` (Active / Inactive / Acquired / Public), `stage` (Seed / Series A / etc.)
- `location`, `allLocations`, `regions`, `foundingYear`, `launchedAt` (unix), `teamSize`
- `website`, `linkedin`, `twitter`, `facebook`, `crunchbase`, `wellfound`, `github`
- `logo`, `logoThumb`, `demoDayVideo`
- `isHiring`, `isCompanyHiring`, `jobCount`
- `tags` — when Algolia returns them
- `formerNames` — list, when the company has rebranded
- `topCompany`, `topCompanyBadge`, `ycdcBadgeName`, `nonprofit` — when flagged
- `questionsAndAnswers` — `[{ question, answer }]` short founder Q&A blocks
- `teamHighlights` — list of blurbs about the team
- `highlightBlackFounders`, `highlightWomenFounders`, `highlightHispanicFounders` — only when Algolia explicitly flags them
- `founders` — `[{ name, title, bio, email, linkedin, twitter, hackerNews, github, instagram }]` when `scrapeFounders = true`
- `openJobs` — `[{ title, url, applyUrl, location, remote, type, role, team, yearsExperience, salaryMin, salaryMax, compensationCurrency, equity, equityRange, skills, experience, visa, visaSupported, englishFluent }]` when `scrapeOpenJobs = true`
- `scrapedAt`
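
Because many of the fields above appear only when they have a value, downstream code should read them defensively. The sketch below flattens one company record into one row per founder. The `founder_rows` helper and the sample record are made up for illustration and are not real scraped data.

```python
from typing import Any


def founder_rows(company: dict[str, Any]) -> list[dict[str, str]]:
    """Flatten one company record into one row per founder.

    Fields the Actor omitted are left out of the row as well,
    so callers should read them with .get().
    """
    rows = []
    for founder in company.get("founders", []):
        row = {"company": company.get("name", ""), "batch": company.get("batch", "")}
        for key in ("name", "title", "linkedin", "twitter"):
            if key in founder:
                row[f"founder_{key}"] = founder[key]
        rows.append(row)
    return rows


# Made-up record for illustration only.
sample = {
    "type": "yc_company",
    "name": "Example Co",
    "batch": "S24",
    "founders": [
        {"name": "Ada Example", "title": "CEO", "linkedin": "https://www.linkedin.com/in/ada-example"},
        {"name": "Bob Example"},
    ],
}

for row in founder_rows(sample):
    print(row)
```

A record scraped with `scrapeFounders = false` has no `founders` key and yields no rows.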

If zero companies match the filters, a single `yc_company_blocked` sentinel record is emitted so runs always exit 0.
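
Downstream code will usually want to separate the sentinel from real results. The sketch below assumes the sentinel is identified by its `type` field being `yc_company_blocked`, mirroring how company records carry `type` = `yc_company`. `split_sentinel` is a hypothetical helper.

```python
def split_sentinel(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate real company records from the zero-match sentinel."""
    # Assumption: the sentinel is marked via the `type` field.
    companies = [item for item in items if item.get("type") == "yc_company"]
    blocked = [item for item in items if item.get("type") == "yc_company_blocked"]
    return companies, blocked


companies, blocked = split_sentinel([{"type": "yc_company_blocked"}])
if blocked and not companies:
    print("No companies matched the filters; check the input for typos.")
```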

### Input

| Field | Type | Description |
|---|---|---|
| `directoryUrl` | string | YC directory URL. Default: `https://www.ycombinator.com/companies`. |
| `query` | string | Optional free-text search (`?q=<query>`). |
| `batch` | enum | `any`, `S24`, `W24`, `F24`, `S23`, `W23`, `S22`, `W22`, `S21`, `W21`. |
| `industry` | string | Exact-match industry filter (e.g. `B2B`, `Consumer`, `Fintech`, `Healthcare`). |
| `status` | enum | `any`, `Active`, `Inactive`, `Acquired`, `Public`. |
| `scrapeFounders` | boolean | Fetch founder details per company. Default `true`. |
| `scrapeOpenJobs` | boolean | Fetch open job postings per company. Default `true`. |
| `regions` | string[] | Optional region filter (case-insensitive) — matched against `location`. |
| `tags` | string[] | Optional tag filter (case-insensitive) — matched against `industry`, `subIndustry`, `tags`. |
| `highlightBlackFounders` | boolean | Only include companies flagged with Black founders. |
| `highlightWomenFounders` | boolean | Only include companies flagged with women founders. |
| `highlightHispanicFounders` | boolean | Only include companies flagged with Hispanic / Latino founders. |
| `maxItems` | integer | Max companies per run (1–5500). Default 3. |
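
To catch typos before a run starts, the input can be validated locally against the values in the table above. This is a minimal sketch: `build_input` is a hypothetical helper, and it checks only the enum fields and the `maxItems` range.

```python
BATCHES = {"any", "S24", "W24", "F24", "S23", "W23", "S22", "W22", "S21", "W21"}
STATUSES = {"any", "Active", "Inactive", "Acquired", "Public"}


def build_input(batch: str = "any", status: str = "any", max_items: int = 3, **extra) -> dict:
    """Build an Actor input dict, rejecting values outside the documented ranges."""
    if batch not in BATCHES:
        raise ValueError(f"unsupported batch: {batch!r}")
    if status not in STATUSES:
        raise ValueError(f"unsupported status: {status!r}")
    if not 1 <= max_items <= 5500:
        raise ValueError("maxItems must be between 1 and 5500")
    return {
        "directoryUrl": "https://www.ycombinator.com/companies",
        "batch": batch,
        "status": status,
        "maxItems": max_items,
        **extra,  # any other documented fields, passed through unchecked
    }


run_input = build_input(batch="S24", status="Active", max_items=50, industry="Fintech")
print(run_input)
```

The resulting dict can be passed as `run_input` to the Python client call shown in the API section.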

### How it works

1. Query Y Combinator's public Algolia search index (`YCCompany_production`) for companies matching your filters. Pagination is handled transparently.
2. For each company, optionally fetch its detail page (`/companies/<slug>`) and parse the Inertia `data-page` JSON blob to get founders and open jobs.
3. Jobs expose salary range, equity, skills, visa policy, and the apply URL.
4. Output uses a strict no-nulls contract — every field present is non-empty.
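
Step 2 can be sketched with the standard library alone. Inertia embeds the page state as JSON in a `data-page` attribute, so the parser below captures the first such attribute and decodes it. The HTML and the prop names (`props.company.founders`) are made up for illustration. The real YC payload layout and the Actor's own parsing code are not documented here.

```python
import json
from html.parser import HTMLParser


class DataPageParser(HTMLParser):
    """Capture the first `data-page` attribute found in the document."""

    def __init__(self):
        super().__init__()
        self.page = None

    def handle_starttag(self, tag, attrs):
        if self.page is not None:
            return
        for name, value in attrs:
            if name == "data-page" and value:
                # HTMLParser has already unescaped entities such as &quot;
                self.page = json.loads(value)
                break


def extract_data_page(html_text: str):
    """Return the decoded Inertia page object, or None if absent."""
    parser = DataPageParser()
    parser.feed(html_text)
    return parser.page


# Made-up HTML; the prop names are hypothetical, not YC's real schema.
sample_html = (
    '<div id="app" data-page="{&quot;props&quot;:{&quot;company&quot;:'
    '{&quot;name&quot;:&quot;Example Co&quot;,&quot;founders&quot;:[]}}}"></div>'
)
page = extract_data_page(sample_html)
print(page["props"]["company"]["name"])
```

Reading the embedded JSON rather than the rendered markup tends to be less sensitive to layout changes, which fits the HTTP-only approach described above.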

### FAQ

**Do I need a proxy?** No. YC is publicly accessible from datacenter IPs.

**Does the scraper need YC credentials?** No. All data comes from public endpoints.

**How many companies are in the directory?** About 5,800 across all batches, growing with each cycle. `maxItems` caps a single run at 5,500.

**Are historical founders included?** Yes — every company's founder list is preserved on its public profile, including exits.

**Why does `jobCount` sometimes differ from the directory badge?** YC's directory badge counts only open job postings; we return the exact set embedded in the profile page.

**What's the `yc_company_blocked` record?** When your filter returns zero matches (e.g. a typo in `industry`), we emit one sentinel record so downstream pipelines never see an empty output.

# Actor input Schema

## `directoryUrl` (type: `string`):

Y Combinator company directory URL. Default scrapes the full directory. You can paste a filtered URL (e.g. https://www.ycombinator.com/companies?batch=S24&industry=AI).
## `query` (type: `string`):

Optional free-text query. When set, filters the directory using the ?q=<query> Algolia search (matches name, description, tags).
## `batch` (type: `string`):

Filter by YC batch code. 'any' returns every batch. Codes: S=Summer, W=Winter, F=Fall followed by two-digit year (e.g. S24 = Summer 2024).
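
A batch code can be decoded mechanically. The sketch below assumes every batch falls in the 2000s, which holds for a directory that starts in 2005. `decode_batch` is a hypothetical helper.

```python
SEASONS = {"S": "Summer", "W": "Winter", "F": "Fall"}


def decode_batch(code: str) -> tuple[str, int]:
    """Turn a batch code such as 'S24' into ('Summer', 2024)."""
    if len(code) != 3 or code[0] not in SEASONS or not code[1:].isdigit():
        raise ValueError(f"not a batch code: {code!r}")
    # Assumption: two-digit years always map to 20xx.
    return SEASONS[code[0]], 2000 + int(code[1:])


print(decode_batch("S24"))
```
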
## `industry` (type: `string`):

Filter by industry (e.g. 'B2B', 'Consumer', 'Fintech', 'Healthcare', 'Industrials', 'Real Estate and Construction'). Case-sensitive, exact match against YC's industry facets.
## `status` (type: `string`):

Filter by company operating status. 'any' returns all statuses.
## `scrapeFounders` (type: `boolean`):

Include each company's founders (name, title, bio, LinkedIn, Twitter). Requires an extra detail-page fetch per company.
## `scrapeOpenJobs` (type: `boolean`):

Include each company's open job postings (title, URL, location, salary, equity, skills, visa). Shares the same detail-page fetch as founders.
## `regions` (type: `array`):

Optional region filter (case-insensitive substring match against the company's location). Example: ["San Francisco", "New York"].
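
The matching rule can be approximated as follows. This is a sketch of the documented behavior, not the Actor's actual code. `matches_region` is a hypothetical helper and the sample locations are made up.

```python
def matches_region(company: dict, regions: list[str]) -> bool:
    """Case-insensitive substring match of any region against `location`."""
    if not regions:
        return True  # an empty filter lets every company through
    location = company.get("location", "").lower()
    return any(region.lower() in location for region in regions)


print(matches_region({"location": "San Francisco, CA, USA"}, ["san francisco"]))
print(matches_region({"location": "Berlin, Germany"}, ["San Francisco", "New York"]))
```
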
## `tags` (type: `array`):

Optional tag filter (case-insensitive). Matched against industry, subIndustry and any tag arrays Algolia returns. Example: ["B2B", "Fintech"].
## `highlightBlackFounders` (type: `boolean`):

If true, only return companies flagged as having Black founders. Requires the signal to be present on Algolia — companies without the flag are not fabricated.
## `highlightWomenFounders` (type: `boolean`):

If true, only return companies flagged as having women founders. Requires the signal to be present on Algolia.
## `highlightHispanicFounders` (type: `boolean`):

If true, only return companies flagged as having Hispanic / Latino founders. Requires the signal to be present on Algolia.
## `maxItems` (type: `integer`):

Maximum number of companies to return.

## Actor input object example

```json
{
  "directoryUrl": "https://www.ycombinator.com/companies",
  "batch": "any",
  "status": "any",
  "scrapeFounders": true,
  "scrapeOpenJobs": true,
  "regions": [],
  "tags": [],
  "highlightBlackFounders": false,
  "highlightWomenFounders": false,
  "highlightHispanicFounders": false,
  "maxItems": 3
}
````

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "directoryUrl": "https://www.ycombinator.com/companies",
    "scrapeFounders": true,
    "scrapeOpenJobs": true,
    "regions": [],
    "tags": [],
    "maxItems": 3
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/y-combinator-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "directoryUrl": "https://www.ycombinator.com/companies",
    "scrapeFounders": True,
    "scrapeOpenJobs": True,
    "regions": [],
    "tags": [],
    "maxItems": 3,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/y-combinator-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "directoryUrl": "https://www.ycombinator.com/companies",
  "scrapeFounders": true,
  "scrapeOpenJobs": true,
  "regions": [],
  "tags": [],
  "maxItems": 3
}' |
apify call crawlerbros/y-combinator-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/y-combinator-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Y Combinator Companies Scraper",
        "description": "Scrape the full Y Combinator company directory with company profiles, founders, open jobs, batch, industry, status, and social links. HTTP-only, no login required.",
        "version": "1.0",
        "x-build-id": "lAbbvjYGNVYWXVUWB"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~y-combinator-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-y-combinator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~y-combinator-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-y-combinator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~y-combinator-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-y-combinator-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "directoryUrl"
                ],
                "properties": {
                    "directoryUrl": {
                        "title": "Directory URL",
                        "type": "string",
                        "description": "Y Combinator company directory URL. Default scrapes the full directory. You can paste a filtered URL (e.g. https://www.ycombinator.com/companies?batch=S24&industry=AI).",
                        "default": "https://www.ycombinator.com/companies"
                    },
                    "query": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Optional free-text query. When set, filters the directory using the ?q=<query> Algolia search (matches name, description, tags)."
                    },
                    "batch": {
                        "title": "Batch",
                        "enum": [
                            "any",
                            "S24",
                            "W24",
                            "F24",
                            "S23",
                            "W23",
                            "S22",
                            "W22",
                            "S21",
                            "W21"
                        ],
                        "type": "string",
                        "description": "Filter by YC batch code. 'any' returns every batch. Codes: S=Summer, W=Winter, F=Fall followed by two-digit year (e.g. S24 = Summer 2024).",
                        "default": "any"
                    },
                    "industry": {
                        "title": "Industry",
                        "type": "string",
                        "description": "Filter by industry (e.g. 'B2B', 'Consumer', 'Fintech', 'Healthcare', 'Industrials', 'Real Estate and Construction'). Case-sensitive, exact match against YC's industry facets."
                    },
                    "status": {
                        "title": "Status",
                        "enum": [
                            "any",
                            "Active",
                            "Inactive",
                            "Acquired",
                            "Public"
                        ],
                        "type": "string",
                        "description": "Filter by company operating status. 'any' returns all statuses.",
                        "default": "any"
                    },
                    "scrapeFounders": {
                        "title": "Scrape Founders",
                        "type": "boolean",
                        "description": "Include each company's founders (name, title, bio, LinkedIn, Twitter). Requires an extra detail-page fetch per company.",
                        "default": true
                    },
                    "scrapeOpenJobs": {
                        "title": "Scrape Open Jobs",
                        "type": "boolean",
                        "description": "Include each company's open job postings (title, URL, location, salary, equity, skills, visa). Shares the same detail-page fetch as founders.",
                        "default": true
                    },
                    "regions": {
                        "title": "Regions",
                        "type": "array",
                        "description": "Optional region filter (case-insensitive substring match against the company's location). Example: [\"San Francisco\", \"New York\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "tags": {
                        "title": "Tags",
                        "type": "array",
                        "description": "Optional tag filter (case-insensitive). Matched against industry, subIndustry and any tag arrays Algolia returns. Example: [\"B2B\", \"Fintech\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "highlightBlackFounders": {
                        "title": "Highlight: Black Founders",
                        "type": "boolean",
                        "description": "If true, only return companies flagged as having Black founders. Requires the signal to be present on Algolia — companies without the flag are not fabricated.",
                        "default": false
                    },
                    "highlightWomenFounders": {
                        "title": "Highlight: Women Founders",
                        "type": "boolean",
                        "description": "If true, only return companies flagged as having women founders. Requires the signal to be present on Algolia.",
                        "default": false
                    },
                    "highlightHispanicFounders": {
                        "title": "Highlight: Hispanic / Latino Founders",
                        "type": "boolean",
                        "description": "If true, only return companies flagged as having Hispanic / Latino founders. Requires the signal to be present on Algolia.",
                        "default": false
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "minimum": 1,
                        "maximum": 5500,
                        "type": "integer",
                        "description": "Maximum number of companies to return.",
                        "default": 3
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
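
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used with the standard library. The sketch below only builds the request and does not send it. `build_run_request` is a hypothetical helper, and the response is assumed to be the dataset items as JSON, which the specification itself does not describe.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~y-combinator-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"directoryUrl": "https://www.ycombinator.com/companies", "maxItems": 3},
)
print(request.get_method(), request.full_url.split("?")[0])
# To run the Actor, pass `request` to urllib.request.urlopen() and decode the body.
```

Because the synchronous endpoint waits for the run to finish, it suits small `maxItems` values. For larger runs, the `/runs` path in the specification returns immediately with run metadata.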
