# YCombinator Companies Scraper | 5,900+ YC Startup Directory (`haketa/ycombinator-companies-scraper`) Actor

Scrape the Y Combinator startup directory (5,900+ funded companies) via the official Algolia API. Name, website, batch, status, team size, industry, tags, hiring flag, launched-at, logo. B2B sales prospecting, recruiter intel, VC analytics. HTTP-only, fast.

- **URL**: https://apify.com/haketa/ycombinator-companies-scraper.md
- **Developed by:** [Haketa](https://apify.com/haketa) (community)
- **Categories:** Developer tools, Automation, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## YCombinator Companies Scraper — 5,900+ YC Startup Directory Extractor for Sales, Recruiting, VC Research & Competitive Intelligence

> **The fastest, most complete Y Combinator startup directory extractor on Apify.** Pull every funded YC company since 2005 — name, website, batch, status, stage, team size, industry, tags, region, hiring flag, launched-at, logo — straight from the **official Algolia search backend** that powers `ycombinator.com/companies`. Zero browsers, zero anti-bot, ideal ICP data for B2B SaaS sales prospecting, recruiter intel, VC analytics, and competitive landscape mapping.

[![Apify Actor](https://img.shields.io/badge/Apify-Actor-blue)](https://apify.com/haketa/ycombinator-companies-scraper)
[![Live Updated](https://img.shields.io/badge/Data-Live%20at%20Runtime-orange)]()
[![Engine](https://img.shields.io/badge/Engine-Direct%20Algolia%20API-green)]()
[![No Auth](https://img.shields.io/badge/Authentication-None%20Required-success)]()
[![Coverage](https://img.shields.io/badge/Coverage-5%2C916%2B%20YC%20Companies-purple)]()
[![Pay Per Event](https://img.shields.io/badge/Pricing-Pay%20Per%20Event-yellow)]()
[![Global](https://img.shields.io/badge/Geography-Global%20(US%20%2B%20EU%20%2B%20APAC%20%2B%20LATAM)-red)]()
[![HTTP Only](https://img.shields.io/badge/Speed-50%20records%20in%20~5s-brightgreen)]()

---

### What This Actor Does

The **YCombinator Companies Scraper** is a production-grade Apify Actor that extracts the **complete Y Combinator funded startup directory**: every company YC has backed since 2005, across every Winter, Summer, Spring, and Fall batch plus the IK12 (Imagine K12) cohort, up to the latest batch. As of the current snapshot, that's **5,916 funded companies**, ranging from early-stage seed bets to well-known YC alumni like Airbnb, Coinbase, DoorDash, and Stripe.

Under the hood, the actor talks **directly to YC's official Algolia search backend** — the very same `45BWZJ1SGC` Algolia app and `YCCompany_production` index that power the search box, filters, and infinite scroll on **ycombinator.com/companies**. No headless browser. No HTML parsing. No anti-bot to dodge. Just polite, low-concurrency HTTP `POST` calls to the public Algolia REST endpoint with the `ycdc_public` tag filter that YC explicitly publishes for client-side use.

In a single run (typically **5 seconds for 50 records, ~5 minutes for the full 5,916-company catalog via batch fan-out**), the actor returns richly normalized JSON records covering:

- **Companies** — every YC-funded startup (Active, Acquired, Public, Inactive)
- **Stages** — Seed, Early, Growth — useful for filtering to post-funding ICP
- **Industries** — B2B, Consumer, Fintech, Healthcare, Government, Education, Real Estate and Construction, Industrials and more
- **Regions** — United States of America, Europe, India, Asia, Latin America, Africa, Canada, Australia / New Zealand
- **Batches** — Winter 2024, Summer 2024, Spring 2024, Fall 2024 — all the way back to IK12
- **Hiring signal** — `isHiring` boolean for recruiter and job-board use cases
- **Top company flag** — YC's curated list of unicorns and best exits

Every record ships with the company's website, one-liner pitch, long description, logo URL, team size, launched-at timestamp, tags, sub-industry, former names, and a deep link back to the canonical `ycombinator.com/companies/<slug>` profile.

#### Why scrape Y Combinator yourself when this exists?

YC's directory looks innocently easy to scrape — it's just a public page. But teams that try the DIY route quickly hit a stack of headaches:

- The directory is **fully JavaScript-rendered React** — `curl` of the HTML returns an empty shell with zero company data
- A headless browser approach (Puppeteer, Playwright) means **5-10 minute runs for full coverage** and high compute cost
- Without knowing the secured Algolia key, naive Algolia callers get **403 Forbidden** — the key is base64-encoded and rotates implicitly via embedded `validUntil`
- Algolia caps a single secured-key query at **1,000 results** — you can't just ask for "all 5,916 companies in one call"
- The on-site infinite scroll uses Algolia's `page` pagination which **silently truncates** past 1,000 — most DIY scripts plateau at ~1,000 and never notice the missing 4,900 records
- Facet filters use a **nested array-of-arrays syntax** (`facetFilters=[["batch:Winter 2024"],["status:Active"]]`) that's poorly documented outside Algolia's own docs
- Field names in the Algolia response (`one_liner`, `small_logo_thumb_url`, `all_locations`, `launched_at`) need **normalization to a sane camelCase schema** before they're database-ingestible
- Timestamps come as **Unix epochs** that need ISO-8601 conversion
- YC tweaks the index periodically — adding `top_company`, `regions`, `subindustry`, splitting `industries` into array — meaning your custom scraper breaks silently
- Zero retry / backoff on the naive call means **transient Algolia 5xx errors** kill your run

This actor solves all of that: it speaks the Algolia facetFilter dialect fluently, fans out by batch to break through the 1,000-cap, retries failed calls with backoff, normalizes every field, converts launched-at to ISO timestamps, and `Actor.fail()`s on zero records so you never get a silent SUCCEEDED with an empty dataset.

---

### Quick Start

#### One-Click Run

1. Click **"Try for free"** on the [Apify Store page](https://apify.com/haketa/ycombinator-companies-scraper)
2. Leave inputs empty to browse the **first 500 YC companies**, or type `AI` into the query box for AI-focused startups
3. Hit **Start** — your dataset is ready in under 10 seconds for a default run
4. Download as JSON, CSV, Excel, or HTML directly from the Apify dataset view, or pipe to Google Sheets / a webhook

#### API Run (Python)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Example 1: every YC-funded AI startup that's actively hiring
run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "query": "AI",
    "hiringOnly": True,
    "maxRecords": 500,
})

for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{company['name']:<30}  {company['batch']:<12}  "
          f"team={company['teamSize']}  {company['website']}")
````

#### API Run (Python — full catalog via batch fan-out)

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Pull the entire YC catalog by fanning out across recent batches
batches = [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023", "Winter 2022", "Summer 2022",
    "Winter 2021", "Summer 2021", "Winter 2020", "Summer 2020",
    # ...add all batches back to IK12 for full 5,916-company coverage
]

run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "batches": batches,
    "maxRecords": 0,            # unlimited
    "hitsPerPage": 1000,
    "requestDelay": 300,
})

# Ask the dataset how many records it holds (limit=1 keeps the response small)
total = client.dataset(run["defaultDatasetId"]).list_items(limit=1).total
print(f"Saved {total} records to dataset {run['defaultDatasetId']}")
```

#### API Run (Node.js / TypeScript)

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/ycombinator-companies-scraper').call({
    industries: ['Fintech'],
    statuses: ['Active'],
    regions: ['United States of America'],
    stages: ['Early', 'Growth'],
    hiringOnly: true,
    maxRecords: 1000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} US fintech YC companies hiring at Early/Growth stage`);
items.slice(0, 5).forEach(c => console.log(`- ${c.name}: ${c.oneLiner}`));
```

#### API Run (cURL)

```bash
curl -X POST "https://api.apify.com/v2/acts/haketa~ycombinator-companies-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "developer tools",
    "batches": ["Winter 2024", "Summer 2024"],
    "hiringOnly": true,
    "maxRecords": 200
  }'
```

***

### How It Works

YC's directory at `ycombinator.com/companies` is a React single-page app whose search, filters, and infinite scroll all call **Algolia's hosted search REST API**. The Algolia application is publicly identifiable in the browser network tab:

- **Algolia Application ID:** `45BWZJ1SGC`
- **Primary index:** `YCCompany_production`
- **Secondary index:** `YCCompany_By_Launch_Date_production`
- **Tag filter:** `ycdc_public` — YC's own tag for client-exposed data
- **Endpoint:** `https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query`

The actor `POST`s a JSON body with a URL-encoded `params` string containing the free-text `query`, `hitsPerPage`, `page`, and a `facetFilters` array-of-arrays expressing AND-across-categories / OR-within-category logic. It uses the same browser-exposed secured API key that ycombinator.com hands out — a base64 blob that embeds `analyticsTags=ycdc`, `restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production`, and `tagFilters=["ycdc_public"]` so it can only ever return data YC has explicitly marked public.
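
The request shape described above can be sketched as follows. This is a minimal illustration that only builds the JSON body and does not send it; the helper name `build_algolia_body` is hypothetical, the facet attributes `batch` and `status` are taken from the examples in this README, and the exact wire format the actor uses is an assumption.

```python
import json
from urllib.parse import urlencode


def build_algolia_body(query, facet_groups, page=0, hits_per_page=1000):
    """Return the JSON body for an Algolia `/query` POST (hypothetical helper).

    facet_groups is a list of lists: values inside one inner list are OR-ed,
    and the inner lists are AND-ed with each other.
    """
    params = urlencode({
        "query": query,
        "hitsPerPage": hits_per_page,
        "page": page,
        # facetFilters travels as a JSON string inside the params string
        "facetFilters": json.dumps(facet_groups, separators=(",", ":")),
    })
    return {"params": params}


body = build_algolia_body(
    "AI",
    [["batch:Winter 2024"], ["status:Active", "status:Acquired"]],
)
print(json.dumps(body, indent=2))
```

Sending this body would also require Algolia's `X-Algolia-Application-Id` and `X-Algolia-API-Key` headers; the key is deliberately left out of the sketch.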

#### Endpoint reference

| Source | Endpoint | Records | Cadence |
|---|---|---|---|
| Algolia primary | `https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query` | 5,916 companies (current snapshot) | Live — updated by YC continuously |
| Algolia secondary | `https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_By_Launch_Date_production/query` | Same companies, sorted by launch date | Live |
| YC profile page | `https://www.ycombinator.com/companies/<slug>` | One per company | Live |

#### Engineering details

- **HTTP-only via `got-scraping`** — no Puppeteer, no Playwright, no Chromium. Each Algolia call is a single sub-second HTTPS `POST`.
- **Algolia facet-filter dialect** — nested array-of-arrays serialized as URL-encoded JSON: `[["batch:Winter 2024"],["status:Active","status:Acquired"]]`.
- **Batch fan-out for the 1,000-cap** — Algolia caps a secured-key query at 1,000 hits. To exceed that, the actor lets you list every batch (`Winter 2024`, `Summer 2024`, ..., `IK12`) and runs one query per batch. Each batch is < 300 companies, well under the cap, so querying all 50+ batches returns the full 5,916-company catalog.
- **Pagination loop** — each filter combination loops `page=0..nbPages-1` collecting hits, deduplicating by Algolia `id` along the way.
- **3-attempt retry with increasing backoff** — failed Algolia calls are retried after waits of `2s, 4s, 6s` plus jitter. A permanent failure logs an error and skips the batch.
- **`Actor.fail()` on zero results** — prevents the dreaded "SUCCEEDED with empty dataset" scenario; the run explicitly fails with a hint about case-sensitive batch/industry spellings.
- **Polite delays** — configurable `requestDelay` (default 300ms) between Algolia calls so the actor never hammers YC's infrastructure.
- **Field normalization** — Algolia's snake\_case (`one_liner`, `small_logo_thumb_url`, `all_locations`, `launched_at`, `team_size`) is mapped to clean camelCase (`oneLiner`, `logoUrl`, `location`, `launchedAt`, `teamSize`).
- **Timestamp conversion** — Unix `launched_at` epoch is converted to ISO-8601 `launchedAt` plus the raw `launchedAtUnix` for time-series workflows.
- **No proxy required** — Algolia's public search endpoint has zero anti-bot. You may attach Apify Proxy via `proxyConfiguration` if you want, but it's pure overhead for this actor.
- **Deterministic output** — for a given state of the index, the same input returns the same set of records (hits come back in Algolia's default relevance order for the query).
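
The pagination, deduplication, and retry behaviour listed above can be sketched roughly as below. This is an illustration under assumptions, not the actor's source: `fetch_page` is a hypothetical callable that returns an Algolia-style response with `hits` and `nbPages`, the wait schedule only approximates the one described, and this sketch re-raises on permanent failure instead of skipping the batch.

```python
import random
import time


def fetch_with_retry(fetch_page, page, attempts=3, base_wait=2.0, sleep=time.sleep):
    """Call fetch_page(page), retrying failures with growing waits plus jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch_page(page)
        except Exception:
            if attempt == attempts:
                raise  # assumption: the real actor logs and skips instead
            # waits of 2s, 4s, ... plus up to 0.5s of random jitter
            sleep(base_wait * attempt + random.uniform(0, 0.5))


def collect_hits(fetch_page, max_records=0, sleep=time.sleep):
    """Loop page=0..nbPages-1 and deduplicate hits by their `id`."""
    seen, records, page, nb_pages = set(), [], 0, 1
    while page < nb_pages:
        response = fetch_with_retry(fetch_page, page, sleep=sleep)
        nb_pages = response.get("nbPages", 1)
        for hit in response.get("hits", []):
            if hit["id"] in seen:
                continue
            seen.add(hit["id"])
            records.append(hit)
            if max_records and len(records) >= max_records:
                return records  # 0 means unlimited, as in the input schema
        page += 1
    return records


# Fake two-page backend: page 1 fails once, then repeats id 2.
calls = {"n": 0}


def fake_fetch(page):
    calls["n"] += 1
    if page == 1 and calls["n"] == 2:
        raise ConnectionError("transient 5xx")
    pages = [[{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}]]
    return {"hits": pages[page], "nbPages": 2}


hits = collect_hits(fake_fetch, sleep=lambda seconds: None)
print([h["id"] for h in hits])
```

Passing `sleep` as a parameter keeps the example fast to run; a real caller would leave the default `time.sleep` in place.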

***

### Input Parameters

```json
{
  "query": "AI",
  "batches": ["Winter 2024", "Summer 2024"],
  "statuses": ["Active"],
  "industries": ["B2B"],
  "regions": ["United States of America"],
  "stages": ["Seed", "Early"],
  "hiringOnly": true,
  "topCompaniesOnly": false,
  "maxRecords": 500,
  "hitsPerPage": 1000,
  "requestDelay": 300
}
```

#### Parameter reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | `string` | `""` | Free-text search across name, one-liner, long description, industry, and tags. Empty = browse all. Examples: `"AI"`, `"developer tools"`, `"fintech"`, `"climate"`, `"vertical SaaS"`. |
| `batches` | `array<string>` | `[]` | Filter by YC batch. Format: `"Winter 2024"`, `"Summer 2023"`, `"Spring 2024"`, `"Fall 2024"`, `"IK12"`, etc. Each batch listed runs as a **separate fan-out query** — the recommended way to break the Algolia 1000-result cap and pull the full catalog. |
| `statuses` | `array<string>` | `[]` | Filter by company status. Values: `"Active"`, `"Acquired"`, `"Public"`, `"Inactive"`. Empty = all four. |
| `industries` | `array<string>` | `[]` | Filter by industry. Examples: `"B2B"`, `"Consumer"`, `"Fintech"`, `"Healthcare"`, `"Government"`, `"Real Estate and Construction"`, `"Education"`, `"Industrials"`. Empty = all. **Case-sensitive — match YC's exact spelling.** |
| `regions` | `array<string>` | `[]` | Filter by region. Examples: `"United States of America"`, `"Europe"`, `"Asia"`, `"India"`, `"Latin America"`, `"Africa"`, `"Canada"`, `"Australia / New Zealand"`. |
| `stages` | `array<string>` | `[]` | Filter by company stage. Values: `"Seed"`, `"Early"`, `"Growth"`. Empty = all three. |
| `hiringOnly` | `boolean` | `false` | When `true`, only returns companies with `isHiring: true`. Killer filter for recruiters and job-board operators. |
| `topCompaniesOnly` | `boolean` | `false` | When `true`, only returns YC's curated **Top Companies** — the unicorns and best exits (think Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart). |
| `maxRecords` | `integer` | `500` | Hard cap on total records across all fan-out queries. `0` = unlimited (bounded by Algolia's 1000-per-query cap × number of filter combinations). Set to `0` when pulling the full 5,916-company catalog. |
| `hitsPerPage` | `integer` | `1000` | Algolia page size. `1000` is the maximum the secured key allows and keeps request count minimal. |
| `requestDelay` | `integer` | `300` | Milliseconds between Algolia calls. Algolia is sub-second fast but 200-500ms is the polite range. |
| `proxyConfiguration` | `object` | none | Optional Apify proxy. **Almost never needed** — Algolia's public search API has zero rate-limit on the `ycdc_public` tag. |

***

### Output Schema

Every record is a flat JSON object with the same field set, so downstream consumers (Postgres, Snowflake, Salesforce, HubSpot, Airtable) can ingest without per-category branching.

#### Core company fields

| Field | Type | Description |
|---|---|---|
| `companyId` | `integer` | Stable YC-assigned numeric ID. Use as the primary key in your warehouse. |
| `name` | `string` | Company name (e.g., `"Airbyte"`, `"Stripe"`, `"&AI"`). |
| `slug` | `string` | URL-safe handle (e.g., `"airbyte"`, `"stripe"`, `"and-ai"`). |
| `ycProfileUrl` | `string` | Canonical deep link: `https://www.ycombinator.com/companies/<slug>`. |
| `website` | `string` | The company's own homepage URL. |
| `oneLiner` | `string` | The pitch in a sentence (e.g., `"Open-source data movement infrastructure"`). |
| `longDescription` | `string` | Multi-sentence company description from the YC profile. |
| `logoUrl` | `string` | Thumbnail logo URL hosted on YC's CDN. |

#### Classification fields

| Field | Type | Description |
|---|---|---|
| `batch` | `string` | YC cohort (e.g., `"Winter 2020"`, `"Summer 2024"`, `"IK12"`). |
| `status` | `string` | `"Active"`, `"Acquired"`, `"Public"`, or `"Inactive"`. |
| `stage` | `string` | `"Seed"`, `"Early"`, or `"Growth"`. |
| `industry` | `string` | Primary industry (e.g., `"B2B"`, `"Fintech"`, `"Healthcare"`). |
| `subindustry` | `string` | More granular vertical (e.g., `"B2B -> Sales"`, `"Fintech -> Banking and Exchange"`). |
| `industries` | `array<string>` | Full multi-industry tag list. |
| `tags` | `array<string>` | Free-form descriptive tags (e.g., `["AI", "Sales", "B2B", "LegalTech"]`). |

#### Operational fields

| Field | Type | Description |
|---|---|---|
| `teamSize` | `integer` | Reported headcount at scrape time. |
| `location` | `string` | Free-text location string (e.g., `"San Francisco, CA, USA"`). |
| `regions` | `array<string>` | Normalized region list (e.g., `["America / Canada", "United States of America", "Remote"]`). |
| `isHiring` | `boolean` | `true` if the company is actively hiring on Work at a Startup. |
| `topCompany` | `boolean` | `true` if YC has curated this company on its "Top Companies" list. |
| `nonprofit` | `boolean` | `true` if registered as a nonprofit (YC funds a few each batch). |
| `formerNames` | `array<string>` | Previous names if the company rebranded. |
| `launchedAt` | `string` | ISO-8601 launch date (e.g., `"2024-07-15T00:00:00.000Z"`). |
| `launchedAtUnix` | `integer` | Same timestamp as Unix epoch seconds — convenient for time-series joins. |
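
To make the relationship between the raw Algolia fields and the output fields concrete, here is a minimal sketch of the renaming and timestamp conversion. The key pairs come from the mapping listed in this README; the function names `epoch_to_iso` and `normalize_hit` are hypothetical, and the actor's real implementation may differ.

```python
from datetime import datetime, timezone

# Raw Algolia key -> output key, as listed in this README
FIELD_MAP = {
    "one_liner": "oneLiner",
    "small_logo_thumb_url": "logoUrl",
    "all_locations": "location",
    "team_size": "teamSize",
}


def epoch_to_iso(epoch_seconds):
    """Convert Unix epoch seconds to an ISO-8601 UTC string with milliseconds."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def normalize_hit(hit):
    """Rename known keys and add both launched-at representations."""
    record = {new: hit.get(old) for old, new in FIELD_MAP.items()}
    launched = hit.get("launched_at")
    record["launchedAtUnix"] = launched
    record["launchedAt"] = epoch_to_iso(launched) if launched is not None else None
    return record


# Values taken from the "&AI" example record in this README
sample = {"one_liner": "AI for IP and patent law", "team_size": 13, "launched_at": 1721433600}
print(normalize_hit(sample))
```

Keeping `launchedAtUnix` next to the ISO string means time-series joins can use the integer while reports use the readable form.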

#### Provenance fields

| Field | Type | Description |
|---|---|---|
| `searchQuery` | `string` | The `query` string that surfaced this record (echoed back for multi-query runs). |
| `searchBatch` | `string` | The `batch` filter that surfaced this record (for fan-out runs). |
| `scrapedAt` | `string` | ISO-8601 timestamp of when the actor pulled this record. |

#### Example: An AI B2B startup (verified live from query="AI")

```json
{
  "companyId": 31984,
  "name": "&AI",
  "slug": "and-ai",
  "ycProfileUrl": "https://www.ycombinator.com/companies/and-ai",
  "website": "https://www.and.ai",
  "oneLiner": "AI for IP and patent law",
  "longDescription": "&AI builds the AI copilot for IP attorneys and patent agents — drafting, prior art searches, office action responses, and portfolio analytics in one workspace.",
  "logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/and-ai.png",
  "location": "New York, NY, USA",
  "regions": ["America / Canada", "United States of America"],
  "batch": "Summer 2024",
  "status": "Active",
  "stage": "Seed",
  "teamSize": 13,
  "industry": "B2B",
  "subindustry": "B2B -> LegalTech",
  "industries": ["B2B", "B2B -> LegalTech"],
  "tags": ["AI", "Artificial Intelligence", "LegalTech", "B2B"],
  "topCompany": false,
  "isHiring": true,
  "nonprofit": false,
  "formerNames": null,
  "launchedAt": "2024-07-20T00:00:00.000Z",
  "launchedAtUnix": 1721433600,
  "searchQuery": "AI",
  "searchBatch": null,
  "scrapedAt": "2026-05-18T09:15:00.000Z"
}
```

#### Example: A growth-stage YC alumnus (Airbyte)

```json
{
  "companyId": 23892,
  "name": "Airbyte",
  "slug": "airbyte",
  "ycProfileUrl": "https://www.ycombinator.com/companies/airbyte",
  "website": "https://airbyte.com",
  "oneLiner": "Open-source data movement infrastructure",
  "longDescription": "Airbyte is the leading open-source ELT platform with 300+ pre-built connectors. Used by thousands of data teams to centralize data into warehouses, lakes, and AI vector stores.",
  "logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/airbyte.png",
  "location": "San Francisco, CA, USA",
  "regions": ["America / Canada", "United States of America", "Remote"],
  "batch": "Winter 2020",
  "status": "Active",
  "stage": "Growth",
  "teamSize": 90,
  "industry": "B2B",
  "subindustry": "B2B -> Engineering, Product and Design",
  "industries": ["B2B", "B2B -> Engineering, Product and Design"],
  "tags": ["AI", "Data Engineering", "Open Source", "Developer Tools"],
  "topCompany": true,
  "isHiring": true,
  "nonprofit": false,
  "formerNames": null,
  "launchedAt": "2020-07-21T00:00:00.000Z",
  "launchedAtUnix": 1595289600,
  "searchQuery": "AI",
  "searchBatch": null,
  "scrapedAt": "2026-05-18T09:15:00.000Z"
}
```

***

### Status, Stage & Industry Reference

#### Company statuses

| Status | Meaning |
|---|---|
| `Active` | Still operating independently and most likely raising or growing |
| `Acquired` | Bought by another company (great for M\&A pattern research) |
| `Public` | Publicly listed via IPO, direct listing, or SPAC (Airbnb, Coinbase, DoorDash, Reddit, etc.) |
| `Inactive` | Shut down, wound up, or otherwise dormant |

#### Stages

| Stage | Typical Profile |
|---|---|
| `Seed` | Just out of YC, < 10 people, pre-Series A — primary recruiter and SDR target |
| `Early` | Series A / B, 10-100 people — prime ICP for dev tools, payroll, HR, observability SaaS |
| `Growth` | Series C+, 100+ people — enterprise SaaS, fintech, and consulting ICP |

#### Top YC industries

| Industry | Notes |
|---|---|
| `B2B` | The largest industry segment — SaaS, dev tools, sales, HR, security, finance ops |
| `Consumer` | DTC, social, gaming, marketplaces, creator economy |
| `Fintech` | Banking, payments, lending, crypto, insurance, wealth management |
| `Healthcare` | Diagnostics, telehealth, biotech, mental health, healthtech infrastructure |
| `Education` | K-12, higher ed, professional learning, EdTech infrastructure |
| `Real Estate and Construction` | PropTech, construction tech, vacation rentals, real estate fintech |
| `Government` | GovTech, defense, public-sector SaaS |
| `Industrials` | Hardware, manufacturing, supply chain, climate, space |

> **Tip:** Use `industries: ["B2B"]` + `stages: ["Early", "Growth"]` + `hiringOnly: true` to get the canonical SaaS-sales prospecting list — post-funded, growing-headcount B2B YC companies.

***

### Use Cases

#### B2B SaaS Sales Prospecting

Funded YC startups are the **highest-converting cohort** for dev-tools, payroll, HR, observability, security, payment, and infrastructure SaaS sales teams. They're flush with capital, growing headcount, and the founders are technically literate so the sales cycle is short.

- **Build hyper-targeted ICP lists** by combining `industries: ["B2B"]` + `stages: ["Early", "Growth"]`, then filtering the output on `teamSize > 20` (team size is an output field, not an input filter)
- **Identify post-funding spikes** by filtering on the most recent 4 batches (`Winter 2024`, `Summer 2024`, `Spring 2024`, `Fall 2024`) — these are the companies with fresh capital and procurement budgets
- **Enrich your CRM** by appending YC batch year, stage, team size, and industry to existing Salesforce/HubSpot accounts
- **Run trigger-based outbound** — when a `Seed`-stage company in your ICP rolls over to `Early`, that's a buying-signal alert
- **Route territory ownership** by region (`regions: ["United States of America"]` vs `regions: ["Europe"]`)
- **Score account fit** using YC batch as a proxy for company sophistication (a W24 company has different needs than an IK12 company)
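
Because team size is only available on output records, a threshold like `teamSize > 20` has to be applied after the run. A minimal sketch, assuming the records have already been downloaded as a list of dicts; the helper name `filter_icp` is hypothetical:

```python
def filter_icp(companies, min_team_size=20):
    """Keep records whose teamSize exceeds the threshold; skip missing sizes."""
    return [
        c for c in companies
        if isinstance(c.get("teamSize"), int) and c["teamSize"] > min_team_size
    ]


companies = [
    {"name": "Airbyte", "teamSize": 90, "stage": "Growth"},
    {"name": "&AI", "teamSize": 13, "stage": "Seed"},
    {"name": "Unknown Co", "teamSize": None, "stage": "Early"},  # made-up record
]
print([c["name"] for c in filter_icp(companies)])
```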

#### Recruiter & Executive Search Intel

The YC alumni network is the most concentrated source of "ex-founder", "early-engineer-at-unicorn", and "first-PM" talent on the planet. The `isHiring` flag is gold for recruiters.

- **Pull every YC company hiring right now** — `hiringOnly: true` plus a stage filter — and pitch retained search to the founder
- **Build executive search target lists** of late-stage YC alumni (`stage: "Growth"`, `topCompany: true`) for VP / C-suite placements
- **Source ex-YC engineers** for your client roster by joining this dataset with LinkedIn (the YC company website often lists "About" / "Team")
- **Visa-friendly employer mapping** — combine with the [H1B Visa Database](https://apify.com/haketa/h1b-visa-database) to surface YC companies actively sponsoring H-1Bs
- **Time recruiter outreach** to the launched-at date — a new launch means hiring volume jumps
- **Identify acqui-hire targets** by filtering `status: "Inactive"` and recent `batches` — these founders need a soft landing

#### VC Analytics & Deal Flow

Whether you're a seed-stage VC tracking YC dealflow or a growth fund mapping competitor portfolios, this dataset is the foundation.

- **Competitor portfolio mapping** — "What did Sequoia / a16z / Founders Fund back from Winter 2024?" by joining YC names with public investor databases
- **Theme-based pipeline building** — `query: "AI agents"` returns every YC AI-agent startup; `query: "vertical SaaS healthcare"` returns the vertical SaaS healthcare cohort
- **Batch-over-batch trend analysis** — count AI startups in W22 vs W23 vs W24 to quantify the AI explosion
- **Stage progression tracking** — diff `stage` between monthly runs to spot companies graduating from Seed to Early (= recent fundraise = re-engage)
- **Geographic dealflow** — `regions: ["India"]` or `regions: ["Latin America"]` surfaces emerging-market YC dealflow
- **Top Company anomaly detection** — a `topCompany: true` company suddenly switching to `status: "Inactive"` is a data point worth investigating
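
Stage progression tracking amounts to diffing two snapshots keyed by `companyId`. The sketch below assumes each snapshot is a list of output records from separate runs; the function name `stage_changes` and the sample rows are made up for illustration.

```python
def stage_changes(previous, current):
    """Return (companyId, name, old_stage, new_stage) for companies whose stage changed."""
    old_by_id = {c["companyId"]: c for c in previous}
    changes = []
    for company in current:
        before = old_by_id.get(company["companyId"])
        if before and before.get("stage") != company.get("stage"):
            changes.append(
                (company["companyId"], company["name"], before.get("stage"), company.get("stage"))
            )
    return changes


# Made-up snapshots from two hypothetical monthly runs
last_month = [
    {"companyId": 1, "name": "Alpha", "stage": "Seed"},
    {"companyId": 2, "name": "Beta", "stage": "Early"},
]
this_month = [
    {"companyId": 1, "name": "Alpha", "stage": "Early"},
    {"companyId": 2, "name": "Beta", "stage": "Early"},
    {"companyId": 3, "name": "Gamma", "stage": "Seed"},  # new company, no prior stage
]
print(stage_changes(last_month, this_month))
```

Companies that appear only in the newer snapshot are ignored here, since they have no earlier stage to compare against.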

#### Startup Research & Journalism

YC's batch composition is one of the best leading indicators of startup-ecosystem trends. Journalists, analysts, and researchers use this dataset to write data-driven stories.

- **Quantify the AI explosion** — count companies tagged `"AI"` per batch since W22; the curve goes vertical in W23-S24
- **Track the fintech retreat of 2022** — count Fintech-tagged companies per batch and chart it
- **Cover the climate-tech rebound of 2024** — `query: "climate"` per batch over time
- **Build investor pitch-deck appendices** with charts of YC team-size growth, batch-size evolution, geographic distribution
- **Profile cohorts** — pull all of W24, sort by `launchedAt`, write a 5,000-word "State of W24" feature
- **Compare YC to Techstars / 500** by joining this dataset with sibling Apify scrapers

#### University Career Services

YC alumni companies hire aggressively from top CS programs. Career services teams build curated boards from the YC `isHiring` feed.

- **Show students which YC startups are hiring** — filter `hiringOnly: true` + region matching campus
- **Cross-reference with visa data** — combine with the [H1B Visa Database](https://apify.com/haketa/h1b-visa-database) for international student career boards
- **Build alumni placement reports** — "X% of our CS '24 grads went to YC-backed startups"
- **Power on-campus recruiting pitches** — invite hiring YC founders to do recruiting trips
- **Career fairs** — pull all SF Bay Area YC companies hiring to plan a Bay Area trek

#### Conference & Event Sales

SaaStr, TechCrunch Disrupt, MicroConf, the Stage Convention — every B2B SaaS conference needs to fill seats with funded-founder buyers. YC companies are their bread and butter.

- **Build a SaaStr 2026 prospect list** — `stages: ["Early", "Growth"]` + `industries: ["B2B"]`
- **TechCrunch Disrupt early-bird list** — `stages: ["Seed"]` + most recent 2 batches
- **Sponsor outreach** — `topCompany: true` companies are the dream sponsors with marketing budget
- **Speaker sourcing** — Growth-stage YC founders make excellent panel speakers
- **Side-event invitee lists** — every YC founder in `regions: ["United States of America"]` for the SF event circuit

#### Marketing & Ad Targeting

LinkedIn and Facebook custom audiences become dramatically more valuable when you can build a "YC-alumni-founder" persona.

- **LinkedIn custom audience seed** — upload the founder names from this dataset (joined with LinkedIn URLs) for ABM campaigns
- **Founder-targeted Facebook custom audiences** — match YC company websites to Facebook business accounts
- **Lookalike modeling** — train a lookalike on YC founders to find similar prospects outside YC
- **Account-based marketing (ABM)** for B2B SaaS — every YC company becomes a 1-row ABM target
- **Industry-specific newsletters** — sell ad spots to AI / Fintech / Healthcare advertisers and price by audience size in the dataset

#### Competitive Landscape Mapping & Strategy Decks

Product strategy teams pay consultants $50K+ for "competitive landscape" decks. This dataset lets you build them in an afternoon.

- **"Every YC AI sales startup since 2020"** — `query: "AI sales"` + `batches: <list>` — for sales-tech market mapping
- **"Every YC developer tools startup since IK12"** — `query: "developer tools"` for dev-tools market saturation analysis
- **Industry concentration matrix** — `industry` x `batch` pivot reveals where YC is concentrating bets
- **Product-strategy gap analysis** — find an industry with few YC entrants — likely a green field
- **Investor memo appendix** — "Of the 47 AI infrastructure startups YC has funded since W22, only 8 are growth-stage" is a powerful slide
- **Market sizing** — total team size summed across an industry = a directional TAM proxy
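
The `industry` x `batch` pivot and the summed team-size proxy can both be computed with the standard library. A rough sketch, assuming output records as plain dicts; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def industry_batch_matrix(companies):
    """Count companies per (industry, batch) pair."""
    matrix = defaultdict(lambda: defaultdict(int))
    for c in companies:
        matrix[c["industry"]][c["batch"]] += 1
    return {industry: dict(row) for industry, row in matrix.items()}


def headcount_by_industry(companies):
    """Sum teamSize per industry, treating missing sizes as zero."""
    totals = defaultdict(int)
    for c in companies:
        totals[c["industry"]] += c.get("teamSize") or 0
    return dict(totals)


sample = [  # made-up rows for illustration only
    {"industry": "B2B", "batch": "Winter 2024", "teamSize": 12},
    {"industry": "B2B", "batch": "Winter 2024", "teamSize": 8},
    {"industry": "B2B", "batch": "Summer 2024", "teamSize": None},
    {"industry": "Fintech", "batch": "Winter 2024", "teamSize": 30},
]
print(industry_batch_matrix(sample))
print(headcount_by_industry(sample))
```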

#### M\&A / Sourcing & Acquisition Targets

`status: "Active"` + `stage: "Early"` + sluggish team-size growth = a candidate acqui-hire conversation. Top corporate development teams scout YC alumni systematically.

- **Pre-Series-B acquisition targets** — `stage: "Early"` + `status: "Active"` + small team size
- **Defensive acquisitions** — find every YC company in your direct vertical and triage threat level
- **Acqui-hire scouting** — `status: "Inactive"` companies whose founders are signal-rich talent
- **Founder LinkedIn enrichment** — join names with LinkedIn to cold-message about strategic conversations
- **Competitor's portfolio acquisition** — when a competitor goes on a YC-buying spree, the dataset surfaces the pattern

#### Investor Research & LP Reporting

LPs and emerging fund managers use YC dealflow as a benchmark for their own portfolios.

- **Sector exposure benchmarking** — what % of YC's last 4 batches were AI vs your fund's exposure?
- **Geographic dealflow benchmarking** — YC has 12% India; your fund has 2% — is that an opportunity or risk?
- **Vintage tracking** — pull every YC batch, count Public + Acquired outcomes — compute YC's mortality and upside ratios by vintage
- **LP letter charts** — embed YC market data as the "context" appendix in quarterly LP updates
- **Co-invest sourcing** — identify YC `Growth` stage companies for late-stage co-invest deals
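
The vintage-tracking bullet comes down to a per-batch ratio. A minimal sketch, assuming records carry the `batch` and `status` output fields; treating Public and Acquired as "exits" and Inactive as "mortality" is this example's reading of the bullet, not a definition supplied by YC.

```python
from collections import defaultdict


def outcomes_by_batch(records):
    """Return {batch: {"exit_rate": ..., "inactive_rate": ...}}.

    exit_rate     = share of companies with status Public or Acquired
    inactive_rate = share of companies with status Inactive
    """
    totals = defaultdict(lambda: {"total": 0, "exits": 0, "inactive": 0})
    for r in records:
        row = totals[r["batch"]]
        row["total"] += 1
        if r.get("status") in ("Public", "Acquired"):
            row["exits"] += 1
        elif r.get("status") == "Inactive":
            row["inactive"] += 1
    return {
        batch: {
            "exit_rate": row["exits"] / row["total"],
            "inactive_rate": row["inactive"] / row["total"],
        }
        for batch, row in totals.items()
    }
```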

***

### Sample Queries & Recipes

#### Recipe 1: Every AI YC startup actively hiring (recruiter goldmine)

```json
{
  "query": "AI",
  "hiringOnly": true,
  "statuses": ["Active"],
  "maxRecords": 1000
}
```

#### Recipe 2: Full Winter 2024 batch — every company

```json
{
  "batches": ["Winter 2024"],
  "maxRecords": 0
}
```

#### Recipe 3: B2B SaaS ICP for sales prospecting

```json
{
  "industries": ["B2B"],
  "stages": ["Early", "Growth"],
  "statuses": ["Active"],
  "regions": ["United States of America"],
  "hiringOnly": true,
  "maxRecords": 1000
}
```

#### Recipe 4: YC's Top Companies list (Airbnb, Stripe, Coinbase, et al.)

```json
{
  "topCompaniesOnly": true,
  "maxRecords": 0
}
```

#### Recipe 5: Fintech YC alumni in India

```json
{
  "industries": ["Fintech"],
  "regions": ["India"],
  "statuses": ["Active"]
}
```

#### Recipe 6: Climate-tech surge across recent batches

```json
{
  "query": "climate",
  "batches": [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023"
  ],
  "maxRecords": 0
}
```

#### Recipe 7: Full 5,916-company catalog via batch fan-out

```json
{
  "batches": [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023",
    "Winter 2022", "Summer 2022",
    "Winter 2021", "Summer 2021",
    "Winter 2020", "Summer 2020",
    "Winter 2019", "Summer 2019",
    "Winter 2018", "Summer 2018",
    "Winter 2017", "Summer 2017",
    "Winter 2016", "Summer 2016",
    "Winter 2015", "Summer 2015",
    "Winter 2014", "Summer 2014",
    "Winter 2013", "Summer 2013",
    "Winter 2012", "Summer 2012",
    "Winter 2011", "Summer 2011",
    "Winter 2010", "Summer 2010",
    "Winter 2009", "Summer 2009",
    "Winter 2008", "Summer 2008",
    "Winter 2007", "Summer 2007",
    "Winter 2006", "Summer 2006",
    "Summer 2005",
    "IK12"
  ],
  "maxRecords": 0,
  "hitsPerPage": 1000,
  "requestDelay": 300
}
```
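
Typing the Recipe 7 batch list by hand is error-prone, so it can be generated instead. A minimal sketch that reproduces exactly the list shown above (four seasons for 2024, Winter and Summer for 2006 through 2023, then `Summer 2005` and `IK12`); the helper name `recipe7_batches` is made up for this example, and any batch not in Recipe 7 would need to be appended by hand.

```python
def recipe7_batches():
    """Rebuild the Recipe 7 batch list, newest first."""
    batches = ["Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024"]
    for year in range(2023, 2005, -1):  # 2023 down to 2006
        batches += [f"Winter {year}", f"Summer {year}"]
    batches += ["Summer 2005", "IK12"]
    return batches


# Same input object as Recipe 7, built programmatically.
run_input = {
    "batches": recipe7_batches(),
    "maxRecords": 0,
    "hitsPerPage": 1000,
    "requestDelay": 300,
}
```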

***

### Integration Examples

#### Google Sheets (via Apify Integration)

1. Set up an Apify schedule running this actor weekly at 7:00 AM Monday
2. Add the **"Export to Google Sheets"** integration to the schedule
3. Receive a fresh YC company directory in your Sheet every Monday morning
4. Build pivot tables: batch x industry, stage x region, isHiring counts over time

#### Make.com / Zapier / n8n

Use the **Apify** connector on Make, Zapier, or n8n. Trigger downstream workflows on:

- New companies (this week's run minus last week's = newly-added YC companies)
- Stage transitions (`Seed` → `Early` = recent fundraise signal — fire a Slack alert)
- `isHiring` flips to `true` (new hiring season — push to your recruiter Slack)
- New launches (`launchedAt` is within the last 7 days — push to your Twitter scheduler)
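
The first three triggers are all differences between two consecutive runs. A minimal sketch of that comparison, assuming both runs were exported as lists of dicts keyed by `companyId`, and that `isHiring` is a boolean (if your export delivers it as a string, convert it first); the function name is illustrative.

```python
def diff_snapshots(previous, current):
    """Compare last run's records with this run's records.

    Returns new companies, stage transitions as (name, old, new) tuples,
    and companies whose isHiring flag flipped from falsy to truthy.
    """
    prev = {r["companyId"]: r for r in previous}
    new_companies, stage_changes, hiring_started = [], [], []
    for r in current:
        old = prev.get(r["companyId"])
        if old is None:
            new_companies.append(r)
            continue
        if old.get("stage") != r.get("stage"):
            stage_changes.append((r.get("name"), old.get("stage"), r.get("stage")))
        if not old.get("isHiring") and r.get("isHiring"):
            hiring_started.append(r)
    return {
        "new_companies": new_companies,
        "stage_changes": stage_changes,
        "hiring_started": hiring_started,
    }
```

Each of the three result lists maps onto one downstream workflow, for example a Slack alert per entry in `stage_changes`.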

#### Power BI / Tableau / Looker

Connect Apify's REST API as a data source. Refresh on the Apify schedule. Build dashboards covering:

- YC batch size evolution over 18 years
- Industry distribution per batch (the AI surge visualized)
- Geographic dealflow heatmaps
- Top Companies progression — who graduated to topCompany in the last quarter?

#### Postgres / Snowflake / BigQuery

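Use the [Apify webhook integration](https://docs.apify.com/platform/integrations/webhooks) to notify a data warehouse ingestion endpoint after every scheduled run. The webhook payload identifies the finished run, so the endpoint can fetch that run's dataset through the Apify API and load it. Suggested schema: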

```sql
CREATE TABLE yc_companies (
  company_id           BIGINT PRIMARY KEY,
  name                 TEXT,
  slug                 TEXT,
  yc_profile_url       TEXT,
  website              TEXT,
  one_liner            TEXT,
  long_description     TEXT,
  logo_url             TEXT,
  location             TEXT,
  regions              JSONB,
  batch                TEXT,
  status               TEXT,
  stage                TEXT,
  team_size            INTEGER,
  industry             TEXT,
  subindustry          TEXT,
  industries           JSONB,
  tags                 JSONB,
  top_company          BOOLEAN,
  is_hiring            BOOLEAN,
  nonprofit            BOOLEAN,
  former_names         JSONB,
  launched_at          TIMESTAMPTZ,
  launched_at_unix     BIGINT,
  scraped_at           TIMESTAMPTZ
);
CREATE INDEX idx_yc_batch ON yc_companies(batch);
CREATE INDEX idx_yc_industry ON yc_companies(industry);
CREATE INDEX idx_yc_is_hiring ON yc_companies(is_hiring) WHERE is_hiring = TRUE;
```
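
The table above uses snake_case column names, while the Actor's output fields are camelCase (`companyId`, `ycProfileUrl`, `isHiring`). A minimal sketch of the renaming step follows; it assumes every output field maps mechanically to its column, which holds for the fields documented in the output schema but is unverified for the rest.

```python
import re


def to_snake_case(name):
    """companyId -> company_id, ycProfileUrl -> yc_profile_url."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def to_row(record):
    """Rename every key of an output record to its snake_case column name."""
    return {to_snake_case(key): value for key, value in record.items()}
```

List-valued fields such as `regions` or `tags` still need to be serialized as JSON before they are written to the `JSONB` columns.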

#### Salesforce / HubSpot CRM Enrichment

Trigger an Apify run weekly, then upsert against Account records keyed on `website` or `companyId`. Stage transitions can auto-create Tasks; new Top Company designations can trigger Opportunity stage changes.
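
When upserting on `website`, raw URLs rarely match CRM values character for character, so a normalized key helps. A minimal sketch that reduces a URL to a lowercase host without the `www.` prefix; the helper name and the normalization rules are assumptions for illustration, not something the Actor does for you.

```python
from urllib.parse import urlparse


def website_key(url):
    """Reduce a website URL to a lowercase host without a leading 'www.'.

    Returns None for empty or unparseable values.
    """
    if not url:
        return None
    # Bare domains such as "example.com" need a scheme to parse as a host.
    parsed = urlparse(url if "://" in url else f"https://{url}")
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host or None
```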

#### Webhooks → Slack / Discord

Pipe the actor's `defaultDataset` through an Apify webhook into your Slack channel. Recruiters get a daily "Today's newly-hiring YC companies" post. Sales gets a weekly "New YC fintech ICP additions" digest.

***

### Major Markets & Regional Coverage

YC's portfolio is global. Below is a rough distribution of YC companies by region with significance notes.

| Region | YC Presence | Significance |
|---|---|---|
| United States of America | ~3,800 companies | The core — San Francisco, NYC, LA, Boston, Seattle, Austin, Miami |
| Europe | ~600 companies | London, Berlin, Paris, Amsterdam, Stockholm, Madrid, Lisbon |
| India | ~500 companies | Bengaluru, Mumbai, Delhi NCR, Hyderabad — fast-growing YC region |
| Latin America | ~400 companies | São Paulo, Mexico City, Buenos Aires, Bogotá, Santiago |
| Canada | ~200 companies | Toronto, Vancouver, Montreal, Waterloo |
| Asia (ex-India) | ~250 companies | Singapore, Tokyo, Seoul, Jakarta, Manila |
| Africa | ~120 companies | Lagos, Nairobi, Cape Town, Cairo |
| Australia / New Zealand | ~100 companies | Sydney, Melbourne, Auckland |
| Middle East | ~60 companies | Dubai, Tel Aviv, Riyadh |
| Remote-first | grows every batch | Distributed teams, no HQ |

> **Tip:** Combine `regions` filter with the [H1B Visa Database](https://apify.com/haketa/h1b-visa-database) to surface US-based YC employers who actively sponsor H-1Bs — pure gold for international recruiter outreach.

***

### Cost & Performance

| Metric | Value |
|---|---|
| Engine | Direct Algolia REST API (`got-scraping` HTTP) — no browser |
| Runtime (50 records, simple query) | ~5 seconds |
| Runtime (1,000 records, single filter combination) | ~10 seconds |
| Runtime (full ~5,916-company catalog via batch fan-out) | ~5 minutes |
| Cost per default run | ~0.001 Compute Units (typically less than $0.01) |
| Cost per full-catalog run | ~0.01 CU (typically less than $0.05) |
| Pricing model | Pay-per-event (transparent per-record pricing) |
| Data freshness | Live at runtime — YC's Algolia index is continuously refreshed |
| Auth required | None — uses YC's public `ycdc_public` Algolia key |
| Proxy required | None — Algolia public endpoint has no anti-bot |
| Concurrency | Safe to run multiple parallel filtered configurations |
| Memory footprint | 256 MB minimum, 1024 MB max — no scraping browser, low RAM |

***

### Compliance, Privacy & Legal Notes

- **Public data only** — every field returned by this actor is published by Y Combinator at `ycombinator.com/companies` under their public `ycdc_public` Algolia tag, which is the same tag YC uses to expose data to their own client-side search UI
- **No PII / no personal data** — the dataset describes **companies**, not individuals. Founder names are not in this dataset. (For founder-level enrichment, consume YC's company page separately.)
- **No emails, no phone numbers** — the actor does not return any contact information
- **Respectful of YC's infrastructure** — the actor uses low concurrency (1 in-flight Algolia call), configurable `requestDelay` (default 300ms), and 3-attempt exponential backoff. It is explicitly built to be a polite citizen.
- **YC's robots.txt** does not block `/companies` and the underlying Algolia endpoint is unauthenticated and intentionally public
- **Algolia ToS** — the `ycdc_public` secured key is issued by YC for client-side use; it self-restricts to `tagFilters=["ycdc_public"]` and `restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production`
- **GDPR / CCPA** — this dataset contains no EU or California resident personal data; company-level facts are not personal data under either regulation
- **No commercial guarantees** — fields, schemas, and Algolia keys are controlled by YC and may change without notice; the actor's normalization layer is built to handle most schema drift gracefully
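
The retry behaviour described in the list above (three attempts with exponential backoff) follows a common pattern. The sketch below illustrates that pattern only; it is not the Actor's source code, and the base delay, the doubling factor, and the function name are assumptions.

```python
import time


def call_with_backoff(fn, attempts=3, base_delay=0.3, sleep=time.sleep):
    """Call fn() and retry on any exception.

    Waits base_delay * 2**n seconds after the n-th failure (n starts at 0)
    and re-raises the last error once `attempts` tries are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Passing `sleep` as a parameter keeps the function testable without real waiting.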

> **Important:** Use of this dataset for unsolicited bulk communications must comply with **CAN-SPAM, TCPA, GDPR, CCPA**, and the YC website ToS. The actor publisher is not responsible for downstream misuse.

***

### Frequently Asked Questions

#### How fresh is the data?

YC updates its Algolia index continuously — new companies appear within hours of being announced. The actor hits Algolia live on every run, so the data is as fresh as YC publishes it.

#### How many companies will I get?

As of the current snapshot, **5,916 companies** are in YC's directory. A default run with no filters returns 100 records (the default `maxRecords`). To pull the full catalog, list every batch in the `batches` input (see Recipe 7) and set `maxRecords` to `0`; each batch runs as its own query, which works around the Algolia 1,000-per-query cap.

#### Why does Algolia cap a single query at 1,000 results?

It's a security feature of the secured API key YC issues to their client-side search UI. Single-query result depth is capped to prevent bulk scraping via the public key. The actor works around this by fanning out across batches: each batch holds well under 1,000 companies, so each per-batch query returns the full batch.
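
The effect of fanning out can be shown with a toy index. In this sketch `fake_search` is a stand-in for a capped Algolia query, not a real API call, and the 1,500-record index is synthetic; only the 1,000-result cap and the per-batch strategy come from the description above.

```python
def fan_out(search, batches, per_query_cap=1000):
    """Run one capped query per batch and merge the hits.

    Results are deduplicated by companyId, keeping the first occurrence.
    """
    seen, merged = set(), []
    for batch in batches:
        for record in search(batch)[:per_query_cap]:
            if record["companyId"] not in seen:
                seen.add(record["companyId"])
                merged.append(record)
    return merged


# Synthetic index: 1,500 records spread evenly over three batches.
index = [{"companyId": i, "batch": f"Batch {i % 3}"} for i in range(1500)]


def fake_search(batch=None):
    """Hypothetical capped search: never returns more than 1,000 hits."""
    hits = [r for r in index if batch is None or r["batch"] == batch]
    return hits[:1000]


single = fake_search()
everything = fan_out(fake_search, ["Batch 0", "Batch 1", "Batch 2"])
print(len(single), len(everything))  # 1000 1500
```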

#### Does this scraper require login or API keys to YC?

No. The actor uses YC's own public Algolia key — the same one your browser uses when you visit `ycombinator.com/companies`. You only need an Apify account to run the actor.

#### Does this scrape ycombinator.com HTML?

No. The actor talks **directly to the Algolia search REST API** that powers the YC site. This is faster, more reliable, and respectful of YC's web servers (zero impact on `ycombinator.com`).

#### Does the actor return founder names or emails?

No. Founder information is not in YC's Algolia index — only company-level metadata. Combine with sibling actors like [SEEK](https://apify.com/haketa/seek-scraper) for jobs or [Levels.fyi](https://apify.com/haketa/levels-fyi-scraper) for comp data to enrich.

#### Are inactive / shut-down YC companies included?

Yes. Set `statuses: ["Inactive"]` to filter to wound-up companies, or leave `statuses` empty to get every status (Active, Acquired, Public, Inactive).

#### Can I filter by year of YC participation?

Yes — use the `batches` filter with cohort names like `"Winter 2024"`, `"Summer 2023"`, etc. To get all of 2024, list `["Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024"]`.

#### What's the difference between `industry`, `subindustry`, and `industries`?

- `industry` is the top-level YC category (e.g., `"B2B"`)
- `subindustry` is the more granular vertical with hierarchy syntax (e.g., `"B2B -> Sales"`)
- `industries` is the array of all industry tags the company carries — often the most useful for filtering
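
The `subindustry` hierarchy string can be split into its levels for grouping. A minimal sketch, assuming the separator is always `->` as in the `"B2B -> Sales"` example; the function name is illustrative.

```python
def split_subindustry(subindustry):
    """'B2B -> Sales' -> ['B2B', 'Sales']; empty or missing values give []."""
    if not subindustry:
        return []
    return [part.strip() for part in subindustry.split("->") if part.strip()]
```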

#### Can I get the YC Top Companies list?

Yes — set `topCompaniesOnly: true`. This returns YC's curated list of unicorns and best exits (Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart, Brex, Rappi, GitLab, Faire, et al.).

#### Does the actor deduplicate?

Yes. Within a single run the actor dedups by Algolia `companyId`, so even when multiple batch fan-out queries surface the same company (rare, but possible if companies span batches), only one record is saved.

#### Is proxy or residential IP required?

No. The Algolia public endpoint has no anti-bot or rate-limiting on the `ycdc_public` tag. You can attach Apify Proxy via `proxyConfiguration` if your network policy requires it, but it's pure overhead.

#### How do I pull the entire 5,916-company catalog?

Use Recipe 7 above — list every batch from `"Winter 2024"` back through `"Summer 2005"` and `"IK12"`. The fan-out completes in ~5 minutes.

#### Can I run this on a schedule automatically?

Yes — Apify's built-in Scheduler lets you trigger this actor on any cron expression. Weekly or daily runs work well for change-detection workflows. Combine with webhooks for fully automated pipelines.

#### Does this actor work with the Apify Free Plan?

Yes, with full functionality on the free tier. A default 100-record run costs a fraction of a Compute Unit. The full 5,916-company fan-out still fits within the free monthly CU budget.

#### What formats can I export the data in?

JSON, CSV, Excel (XLSX), HTML, XML, JSONLines, and RSS — directly from the Apify dataset view. The API also supports streaming for large datasets.

#### What happens if Algolia returns zero results?

The actor explicitly calls `Actor.fail()` with a helpful error message ("No records. Try clearing all filters, or check that batch/industry/status spellings match YC's exactly (case-sensitive).") so you never get a silent SUCCEEDED with an empty dataset.
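
Because a zero-result run ends as failed rather than succeeded, client code should check the run status before reading the dataset. A minimal sketch, assuming `run` is the run object returned by the Apify client's `call()` (a dict with `status` and `defaultDatasetId`); the helper name and error wording are made up for this example.

```python
def ensure_succeeded(run):
    """Return the run's dataset ID, or raise if the run did not succeed."""
    status = (run or {}).get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(
            f"Actor run ended with status {status!r}; "
            "check batch/industry/status spellings (case-sensitive)"
        )
    return run["defaultDatasetId"]
```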

#### Is the data accurate?

The data is exactly what YC publishes on their own directory — same source, same recency. If a company's status or stage looks wrong, that's YC's directory; the actor does not modify or filter beyond what you request.

#### How do I report a bug or request a feature?

Open an issue on the Apify Store actor page or contact the developer directly through the Apify Console.

***

### Related Apify Actors by Haketa

Whether you're enriching YC company data with hiring intel, comp data, federal funding, or B2B verification, these sibling actors are designed to compose:

- [H1B Visa Database — US Visa Sponsorship Scraper](https://apify.com/haketa/h1b-visa-database) — perfect complement: surface YC employers actively sponsoring H-1Bs for international recruiting
- [Levels.fyi Scraper](https://apify.com/haketa/levels-fyi-scraper) — tech compensation data for YC-backed startups — invaluable for VC, recruiter, and candidate research
- [SEEK Scraper (Australia / NZ)](https://apify.com/haketa/seek-scraper) — job postings — pair with YC `isHiring: true` data for regional recruiter intel
- [ProductHunt Launches & Makers Scraper](https://apify.com/haketa/producthunt-launches-scraper) — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
- [BBB Business Scraper](https://apify.com/haketa/bbb-scraper) — Better Business Bureau ratings — verify post-IPO YC alumni reputation
- [SAM.gov Federal Contractor Entity Scraper](https://apify.com/haketa/sam-gov-federal-contractor-scraper) — federal funding peer dataset — see which YC alumni are also federal contractors
- [TTB Alcohol Permittee Scraper](https://apify.com/haketa/ttb-alcohol-permittee-scraper) — federal licensing peer — useful for YC consumer / alcohol vertical research
- [Salary.com Scraper](https://apify.com/haketa/salary-com-scraper) — salary benchmarks for YC alumni job postings
- [Texas Pharmacy License Scraper — TSBP](https://apify.com/haketa/tsbp-license-scraper) — healthcare licensing peer dataset for YC healthtech research
- [California DCA Professional License Scraper](https://apify.com/haketa/california-dca-license-scraper) — CA professional licensing — useful for YC regulated-industry research
- [Ohio eLicense Scraper](https://apify.com/haketa/ohio-elicense-scraper) — Ohio professional licensing — sibling regulatory dataset
- [Illinois IDFPR License Scraper](https://apify.com/haketa/illinois-idfpr-license-scraper) — Illinois professional licensing — sibling regulatory dataset

***

### Comparison vs. Alternatives

| Approach | Setup time | Coverage | Data freshness | Cost (5,916 records) | Schema normalization | Proxy needed |
|---|---|---|---|---|---|---|
| **This actor** | < 1 minute | 5,916+ companies | Live at runtime | < $0.05 | Built-in | No |
| Manual ycombinator.com browsing | Hours/days | Limited by attention span | Live | Free | None | No |
| Headless browser scrape (Puppeteer) | 1-2 days dev | Full | Live | $1-5 per run (CU cost) | DIY | Optional |
| Custom Algolia client | 4-8 hours dev | Full (if you handle 1000-cap) | Live | Free + infra | DIY | No |
| Paid startup database (PitchBook, CB Insights) | Days to onboard | Vast (not just YC) | Daily | $1,000-50,000/year | Built-in | N/A |
| LinkedIn Sales Navigator | Hours | YC alumni only inferred | Live | $99-149/seat/mo | None | N/A |

***

### Why Pay-Per-Event Pricing?

Most data scrapers either charge a flat monthly subscription (you pay even if you don't use it) or per-Compute-Unit (unpredictable). This actor uses **pay-per-event** pricing, which means:

- You only pay when the actor runs
- Charges scale with how much data you actually consume
- Transparent, line-item billing inside Apify
- No monthly minimums or annual commitments
- Free to evaluate — sample 50 records for pennies before committing to a full catalog pull
- Predictable cost-per-record — easy to forecast scrape budget for procurement

***

### Changelog

| Version | Date | Notes |
|---|---|---|
| 1.0.0 | 2026-05 | Initial public release — direct Algolia API integration, batch fan-out for >1,000 results, 3-attempt retry with exponential backoff, `Actor.fail()` on zero records, full ISO-8601 timestamp normalization, pay-per-event pricing |

***

### Keywords

Y Combinator scraper · YC companies database · YC startup directory scraper · Y Combinator API alternative · ycombinator.com/companies scraper · YC batch directory · startup directory scraper · funded startups scraper · B2B SaaS prospecting · VC portfolio scraper · YC alumni directory · Algolia startup search · YC Algolia API · YC Winter 2024 directory · YC Summer 2024 batch scraper · YC Top Companies list scraper · YC unicorn list · YC hiring scraper · YC isHiring filter · YC company API · funded startup lead generation · ICP list builder YC · YC fintech startups · YC AI startups · YC dev tools directory · YC healthcare startups · YC India directory · YC Latin America directory · YC growth-stage companies · YC seed-stage prospects · YC acquisition target list · YC competitor mapping · YC recruiter intelligence · YC executive search · YC ABM list · YC LinkedIn audience seed · YC investor research · startup database API · startup intelligence platform · founder outreach data · YC alumni hiring · YC batch trends · post-funding ICP scraper · YC company logo URLs · Apify YC actor · Haketa YC scraper

***

### Support

- **Bug reports:** Use the **Issues** tab on the Apify Store page
- **Feature requests:** Same place — please describe the use case and the input combination you'd like to see supported
- **Direct contact:** Through the Apify developer profile [haketa](https://apify.com/haketa)

If this actor saves you time or unlocks a new workflow, a **5-star rating** on the Apify Store helps other sales, recruiting, VC, and research teams discover it. Thank you!

# Actor input Schema

## `query` (type: `string`):

Free-text search across name, one-liner, description, industry, tags. Examples: 'AI', 'developer tools', 'fintech', 'climate'. Empty = browse all.

## `batches` (type: `array`):

Filter by YC batch. Formats: 'Winter 2024' / 'Summer 2023' / 'Spring 2024' / 'Fall 2024' / 'IK12' (early). Each batch runs as a separate query (best way to exceed Algolia's 1000-result cap).

## `statuses` (type: `array`):

Filter by company status. Values: 'Active', 'Acquired', 'Public', 'Inactive'. Empty = all.

## `industries` (type: `array`):

Filter by industry. Examples: 'B2B', 'Consumer', 'Fintech', 'Healthcare', 'Government', 'Real Estate and Construction', 'Education'. Empty = all.

## `regions` (type: `array`):

Filter by region. Examples: 'United States of America', 'Europe', 'Asia', 'India', 'Latin America', 'Africa', 'Canada'. Empty = all.

## `stages` (type: `array`):

Filter by company stage. Values: 'Seed', 'Early', 'Growth'. Empty = all.

## `hiringOnly` (type: `boolean`):

When true, only returns companies that are actively hiring (`isHiring: true`). Great for job-board and recruiter use cases.

## `topCompaniesOnly` (type: `boolean`):

When true, only returns YC 'Top Companies' (the curated list of YC's most successful exits and unicorns).

## `maxRecords` (type: `integer`):

Hard cap on total records across all queries. Set 0 for unlimited (bounded by Algolia's 1000-per-query cap × number of filter combinations).

## `hitsPerPage` (type: `integer`):

Algolia page size. 100 is the default; 1000 is the maximum the secured key allows. Larger = fewer requests.

## `requestDelay` (type: `integer`):

Delay in milliseconds between Algolia API calls. Algolia is fast (sub-second) but 200-500ms is polite.

## `proxyConfiguration` (type: `object`):

Optional. The Algolia search API has no anti-bot — proxy is NOT required for normal use.

## Actor input object example

```json
{
  "query": "",
  "batches": [],
  "statuses": [],
  "industries": [],
  "regions": [],
  "stages": [],
  "hiringOnly": false,
  "topCompaniesOnly": false,
  "maxRecords": 100,
  "hitsPerPage": 1000,
  "requestDelay": 300,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `companyId` (type: `string`):

YC internal company ID

## `name` (type: `string`):

Company name

## `slug` (type: `string`):

URL slug

## `ycProfileUrl` (type: `string`):

Y Combinator profile URL

## `website` (type: `string`):

Company website

## `oneLiner` (type: `string`):

Short description

## `batch` (type: `string`):

YC batch (e.g. Winter 2024)

## `status` (type: `string`):

Active/Acquired/Public/Inactive

## `stage` (type: `string`):

Seed/Early/Growth

## `teamSize` (type: `string`):

Employee count

## `industry` (type: `string`):

Primary industry

## `location` (type: `string`):

Office location

## `isHiring` (type: `string`):

Is actively hiring

## `topCompany` (type: `string`):

YC Top Company flag

## `launchedAt` (type: `string`):

Launch date (ISO)

## `scrapedAt` (type: `string`):

ISO timestamp

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "batches": [],
    "statuses": [],
    "industries": [],
    "regions": [],
    "stages": [],
    "maxRecords": 100,
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("haketa/ycombinator-companies-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "batches": [],
    "statuses": [],
    "industries": [],
    "regions": [],
    "stages": [],
    "maxRecords": 100,
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("haketa/ycombinator-companies-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "batches": [],
  "statuses": [],
  "industries": [],
  "regions": [],
  "stages": [],
  "maxRecords": 100,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call haketa/ycombinator-companies-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=haketa/ycombinator-companies-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "YCombinator Companies Scraper | 5,900+ YC Startup Directory",
        "description": "Scrape the Y Combinator startup directory (5,900+ funded companies) via the official Algolia API. Name, website, batch, status, team size, industry, tags, hiring flag, launched-at, logo. B2B sales prospecting, recruiter intel, VC analytics. HTTP-only, fast.",
        "version": "0.0",
        "x-build-id": "haZLI2HTmuF696GZL"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/haketa~ycombinator-companies-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-haketa-ycombinator-companies-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/haketa~ycombinator-companies-scraper/runs": {
            "post": {
                "operationId": "runs-sync-haketa-ycombinator-companies-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/haketa~ycombinator-companies-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-haketa-ycombinator-companies-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Free-text Search Query",
                        "type": "string",
                        "description": "Free-text search across name, one-liner, description, industry, tags. Examples: 'AI', 'developer tools', 'fintech', 'climate'. Empty = browse all.",
                        "default": ""
                    },
                    "batches": {
                        "title": "Batches",
                        "type": "array",
                        "description": "Filter by YC batch. Formats: 'Winter 2024' / 'Summer 2023' / 'Spring 2024' / 'Fall 2024' / 'IK12' (early). Each batch runs as a separate query (best way to exceed Algolia's 1000-result cap).",
                        "items": {
                            "type": "string"
                        }
                    },
                    "statuses": {
                        "title": "Statuses",
                        "type": "array",
                        "description": "Filter by company status. Values: 'Active', 'Acquired', 'Public', 'Inactive'. Empty = all.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "industries": {
                        "title": "Industries",
                        "type": "array",
                        "description": "Filter by industry. Examples: 'B2B', 'Consumer', 'Fintech', 'Healthcare', 'Government', 'Real Estate and Construction', 'Education'. Empty = all.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "regions": {
                        "title": "Regions",
                        "type": "array",
                        "description": "Filter by region. Examples: 'United States of America', 'Europe', 'Asia', 'India', 'Latin America', 'Africa', 'Canada'. Empty = all.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "stages": {
                        "title": "Stages",
                        "type": "array",
                        "description": "Filter by company stage. Values: 'Seed', 'Early', 'Growth'. Empty = all.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "hiringOnly": {
                        "title": "Hiring Only",
                        "type": "boolean",
                        "description": "When true, only returns companies that are actively hiring (`isHiring: true`). Great for job-board and recruiter use cases.",
                        "default": false
                    },
                    "topCompaniesOnly": {
                        "title": "Top Companies Only",
                        "type": "boolean",
                        "description": "When true, only returns YC 'Top Companies' (the curated list of YC's most successful exits and unicorns).",
                        "default": false
                    },
                    "maxRecords": {
                        "title": "Max Records",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Hard cap on total records across all queries. Set 0 for unlimited (bounded by Algolia's 1000-per-query cap × number of filter combinations).",
                        "default": 100
                    },
                    "hitsPerPage": {
                        "title": "Hits Per Page",
                        "minimum": 10,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Algolia page size. The default is 1000, which is also the maximum the secured key allows. Larger = fewer requests.",
                        "default": 1000
                    },
                    "requestDelay": {
                        "title": "Request Delay (ms)",
                        "minimum": 100,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Delay between Algolia API calls. Algolia is fast (sub-second) but 200-500ms is polite.",
                        "default": 300
                    },
                    "proxyConfiguration": {
                        "title": "Proxy Configuration",
                        "type": "object",
                        "description": "Optional. The Algolia search API has no anti-bot protection, so a proxy is NOT required for normal use."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
