Pricing

from $1.00 / 1,000 results

YCombinator Companies Scraper | 5,900+ YC Startup Directory

Y Combinator companies scraper & API: export the YC startup directory by batch, industry & status, including company name, description, website, batch, team size, location, tags and YC profile URL. Startup intelligence, VC research and B2B lead lists: fast, no login.

Developer

Haketa

YCombinator Companies Scraper — 5,900+ YC Startup Directory Extractor for Sales, Recruiting, VC Research & Competitive Intelligence

The fastest, most complete Y Combinator startup directory extractor on Apify. Pull every funded YC company since 2005 — name, website, batch, status, stage, team size, industry, tags, region, hiring flag, launched-at, logo — straight from the official Algolia search backend that powers ycombinator.com/companies. Zero browsers, zero anti-bot, ideal ICP data for B2B SaaS sales prospecting, recruiter intel, VC analytics, and competitive landscape mapping.

What This Actor Does

The YCombinator Companies Scraper is a production-grade Apify Actor that extracts the complete Y Combinator funded-startup directory: every company YC has backed, from the first batch (Summer 2005) through every subsequent Winter, Summer, Spring, and Fall batch up to the latest cohort, plus the companies listed under IK12 (Imagine K12, the edtech accelerator that merged into YC). As of the current snapshot, that is 5,916 funded companies, from early-stage seed bets to publicly traded YC alumni like Airbnb, Coinbase, and DoorDash and late-stage private companies like Stripe.

Under the hood, the actor talks directly to YC's official Algolia search backend — the very same 45BWZJ1SGC Algolia app and YCCompany_production index that power the search box, filters, and infinite scroll on ycombinator.com/companies. No headless browser. No HTML parsing. No anti-bot to dodge. Just polite, low-concurrency HTTP POST calls to the public Algolia REST endpoint with the ycdc_public tag filter that YC explicitly publishes for client-side use.

In a single run (typically 5 seconds for 50 records, ~5 minutes for the full 5,916-company catalog via batch fan-out), the actor returns richly normalized JSON records covering:

Companies: every YC-funded startup (Active, Acquired, Public, Inactive)
Stages: Seed, Early, Growth, useful for filtering to a post-funding ICP
Industries: B2B, Consumer, Fintech, Healthcare, Government, Education, Real Estate and Construction, Industrials and more
Regions: United States of America, Europe, India, Asia, Latin America, Africa, Canada, Australia / New Zealand
Batches: Winter 2024, Summer 2024, Spring 2024, Fall 2024, and every earlier batch back to Summer 2005, plus IK12
Hiring signal: isHiring boolean for recruiter and job-board use cases
Top company flag: YC's curated list of unicorns and best exits

Every record ships with the company's website, one-liner pitch, long description, logo URL, team size, launched-at timestamp, tags, sub-industry, former names, and a deep link back to the canonical ycombinator.com/companies/<slug> profile.

Why scrape Y Combinator yourself when this exists?

YC's directory looks innocently easy to scrape — it's just a public page. But teams that try the DIY route quickly hit a stack of headaches:

The directory is fully JavaScript-rendered React — curl of the HTML returns an empty shell with zero company data
A headless browser approach (Puppeteer, Playwright) means 5-10 minute runs for full coverage and high compute cost
Without knowing the secured Algolia key, naive Algolia callers get 403 Forbidden — the key is base64-encoded and rotates implicitly via embedded validUntil
Algolia caps any single query at 1,000 reachable results (its default pagination limit), so you can't just ask for "all 5,916 companies in one call"
The on-site infinite scroll uses Algolia's page-based pagination, which silently stops at 1,000 hits; most DIY scripts plateau at ~1,000 and never notice the ~4,900 missing records
Facet filters use a nested array-of-arrays syntax (facetFilters=[["batch:Winter 2024"],["status:Active"]]) that is easy to get wrong unless you know Algolia's docs well
Field names in the Algolia response (one_liner, small_logo_thumb_url, all_locations, launched_at) need normalization to a sane camelCase schema before they're database-ingestible
Timestamps come as Unix epochs that need ISO-8601 conversion
YC tweaks the index periodically (adding top_company, regions, and subindustry, and splitting industries into an array), so a custom scraper can break silently
Zero retry / backoff on the naive call means transient Algolia 5xx errors kill your run

This actor solves all of that: it speaks the Algolia facetFilters dialect fluently, fans out by batch to get past the 1,000-result cap, retries failed calls with backoff, normalizes every field, converts launched-at to ISO-8601 timestamps, and calls Actor.fail() when a run finds zero records, so you never get a silent SUCCEEDED with an empty dataset.
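Retrying with backoff can be sketched as below. This is a minimal, hypothetical Python illustration, not the actor's own code: `TransientError` stands in for an Algolia 5xx or network failure, and the 2-second base delay, doubling schedule, and jitter width are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for an Algolia 5xx response or a dropped connection."""


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying TransientError with exponentially growing waits plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except TransientError:
            if attempt == attempts:
                raise  # give up; the caller can log the error and skip this batch
            # Wait 2s, 4s, 8s, ... (doubling each time) plus up to `jitter` seconds of noise.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, jitter))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the helper testable; in production you would leave the default `time.sleep`.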

Quick Start

One-Click Run

Click "Try for free" on the Apify Store page
Leave inputs empty to browse the first 500 YC companies, or type AI into the query box for AI-focused startups
Hit Start — your dataset is ready in under 10 seconds for a default run
Download as JSON, CSV, Excel, or HTML directly from the Apify dataset view, or pipe to Google Sheets / a webhook

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Example 1: every YC-funded AI startup that's actively hiring
run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "query": "AI",
    "hiringOnly": True,
    "maxRecords": 500,
})

for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{company['name']:<30}  {company['batch']:<12}  "
          f"team={company['teamSize']}  {company['website']}")

API Run (Python — full catalog via batch fan-out)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Pull the entire YC catalog by fanning out across recent batches
batches = [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023", "Winter 2022", "Summer 2022",
    "Winter 2021", "Summer 2021", "Winter 2020", "Summer 2020",
    # ...add every earlier batch back to Summer 2005, plus "IK12", for full 5,916-company coverage
]

run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "batches": batches,
    "maxRecords": 0,            # unlimited
    "hitsPerPage": 1000,
    "requestDelay": 300,
})

dataset_info = client.dataset(run["defaultDatasetId"]).get()
print(f"Saved {dataset_info['itemCount']} records to dataset {run['defaultDatasetId']}")

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/ycombinator-companies-scraper').call({
    industries: ['Fintech'],
    statuses: ['Active'],
    regions: ['United States of America'],
    stages: ['Early', 'Growth'],
    hiringOnly: true,
    maxRecords: 1000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} US fintech YC companies hiring at Early/Growth stage`);
items.slice(0, 5).forEach(c => console.log(`- ${c.name}: ${c.oneLiner}`));

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~ycombinator-companies-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "developer tools",
    "batches": ["Winter 2024", "Summer 2024"],
    "hiringOnly": true,
    "maxRecords": 200
  }'

How It Works

YC's directory at ycombinator.com/companies is a React single-page app whose search, filters, and infinite scroll all call Algolia's hosted search REST API. The Algolia application is publicly identifiable in the browser network tab:

Algolia Application ID: 45BWZJ1SGC
Primary index: YCCompany_production
Secondary index: YCCompany_By_Launch_Date_production
Tag filter: ycdc_public — YC's own tag for client-exposed data
Endpoint: https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query

The actor POSTs a JSON body with a URL-encoded params string containing the free-text query, hitsPerPage, page, and a facetFilters array-of-arrays expressing AND-across-categories / OR-within-category logic. It uses the same browser-exposed secured API key that ycombinator.com hands out — a base64 blob that embeds analyticsTags=ycdc, restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production, and tagFilters=["ycdc_public"] so it can only ever return data YC has explicitly marked public.
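The secured-key layout described above follows Algolia's documented scheme for secured API keys: the key is base64 of a hex HMAC-SHA256 digest followed by the URL-encoded restriction parameters. The sketch below builds a key from a dummy parent key and then decodes it, to show how you could read the restrictions embedded in a key copied from the browser. The parent key and helper names are illustrative assumptions; decoding does not verify the HMAC, and you cannot mint a valid YC key without YC's parent key.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def make_secured_key(parent_key: str, restrictions: dict) -> str:
    """Build an Algolia-style secured key: base64(hex HMAC-SHA256(params) + params)."""
    params = urlencode(restrictions)
    digest = hmac.new(parent_key.encode(), params.encode(), hashlib.sha256).hexdigest()
    return base64.b64encode((digest + params).encode()).decode()


def inspect_secured_key(key: str) -> dict:
    """Return the restriction parameters embedded in a secured key (signature not checked)."""
    raw = base64.b64decode(key).decode()
    # The first 64 characters are the hex-encoded SHA-256 HMAC; the rest is a query string.
    return parse_qs(raw[64:])


# Illustrative restrictions mirroring the ones listed above; "dummy-parent-key" is made up.
key = make_secured_key(
    "dummy-parent-key",
    {
        "analyticsTags": "ycdc",
        "restrictIndices": "YCCompany_production,YCCompany_By_Launch_Date_production",
        "tagFilters": '["ycdc_public"]',
    },
)
print(inspect_secured_key(key))
```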

Endpoint reference

Source	Endpoint	Records	Cadence
Algolia primary	`https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query`	5,916 companies (current snapshot)	Live — updated by YC continuously
Algolia secondary	`https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_By_Launch_Date_production/query`	Same companies, sorted by launch date	Live
YC profile page	`https://www.ycombinator.com/companies/<slug>`	One per company	Live
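A request to the primary index in the table above can be assembled with the standard library alone. This is a sketch under assumptions: `X-Algolia-Application-Id` and `X-Algolia-API-Key` are Algolia's standard REST authentication headers, `API_KEY` is a placeholder for the key you would copy from the browser's network tab, and the helper names are hypothetical. Only the body-building part runs offline; `send_query` performs a live HTTP call.

```python
import json
import urllib.request
from urllib.parse import urlencode

APP_ID = "45BWZJ1SGC"
INDEX = "YCCompany_production"
ENDPOINT = f"https://{APP_ID.lower()}-dsn.algolia.net/1/indexes/{INDEX}/query"


def build_query_body(query: str, facet_filters: list, page: int = 0, hits_per_page: int = 1000) -> dict:
    """Return the JSON body: a single URL-encoded `params` string."""
    params = urlencode({
        "query": query,
        "hitsPerPage": hits_per_page,
        "page": page,
        # Outer list = AND across categories; inner lists = OR within a category.
        "facetFilters": json.dumps(facet_filters),
    })
    return {"params": params}


def send_query(api_key: str, body: dict) -> dict:
    """POST the body to the Algolia endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode(),
        headers={
            "X-Algolia-Application-Id": APP_ID,
            "X-Algolia-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


body = build_query_body("AI", [["batch:Winter 2024"], ["status:Active", "status:Acquired"]])
# send_query("API_KEY", body)  # live call; needs the browser-exposed key
```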

Engineering details

HTTP-only via got-scraping — no Puppeteer, no Playwright, no Chromium. Each Algolia call is a single sub-second HTTPS POST.
Algolia facet-filter dialect — nested array-of-arrays serialized as URL-encoded JSON: [["batch:Winter 2024"],["status:Active","status:Acquired"]].
Batch fan-out for the 1,000 cap: a single Algolia query returns at most 1,000 hits. To go beyond that, the actor lets you list every batch (Winter 2024, Summer 2024, ..., IK12) and runs one query per batch. Every batch is well under 1,000 companies, so fanning out across all batches covers the full 5,916-company catalog.
Pagination loop — each filter combination loops page=0..nbPages-1 collecting hits, deduplicating by Algolia id along the way.
3-attempt retry with increasing backoff: failed Algolia calls are retried after waits of 2s, 4s, and 6s plus jitter. Permanent failure logs an error and skips the batch.
Actor.fail() on zero results — prevents the dreaded "SUCCEEDED with empty dataset" scenario; the run explicitly fails with a hint about case-sensitive batch/industry spellings.
Polite delays — configurable requestDelay (default 300ms) between Algolia calls so the actor never hammers YC's infrastructure.
Field normalization — Algolia's snake_case (one_liner, small_logo_thumb_url, all_locations, launched_at, team_size) is mapped to clean camelCase (oneLiner, logoUrl, location, launchedAt, teamSize).
Timestamp conversion — Unix launched_at epoch is converted to ISO-8601 launchedAt plus the raw launchedAtUnix for time-series workflows.
No proxy required — Algolia's public search endpoint has zero anti-bot. You may attach Apify Proxy via proxyConfiguration if you want, but it's pure overhead for this actor.
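Deterministic output: for a given state of YC's index, the same input returns the same records in the same order (Algolia ranks hits by its relevance score for the query). Because the index is live, results change as YC adds or updates companies.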
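The fan-out, pagination, dedup, and delay steps above combine into one loop. A minimal Python sketch, assuming a hypothetical `fetch_page(facet_filters, page)` callable that returns an Algolia-style response with `hits` and `nbPages`; deduplication here keys on `objectID`, Algolia's standard per-record identifier, and `max_records=0` means unlimited, as with the `maxRecords` input.

```python
import time
from typing import Callable, Iterable

# Hypothetical: fetch_page(facet_filters, page) -> {"hits": [...], "nbPages": int}
FetchPage = Callable[[list, int], dict]


def collect(
    combos: Iterable[list],
    fetch_page: FetchPage,
    max_records: int = 500,
    delay_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Walk every filter combination page by page, dropping duplicates, up to max_records."""
    seen = set()
    out = []
    for facet_filters in combos:  # one combination per fan-out batch
        page, nb_pages = 0, 1
        while page < nb_pages:
            resp = fetch_page(facet_filters, page)
            nb_pages = resp.get("nbPages", 0)
            for hit in resp.get("hits", []):
                if hit["objectID"] in seen:
                    continue  # already surfaced by an earlier combination or page
                seen.add(hit["objectID"])
                out.append(hit)
                if max_records and len(out) >= max_records:
                    return out  # hard cap reached
            page += 1
            sleep(delay_s)  # polite pause between Algolia calls (requestDelay)
    return out
```

Feeding it one combination per listed batch is what lets the total exceed the per-query cap.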

Input Parameters

{
  "query": "AI",
  "batches": ["Winter 2024", "Summer 2024"],
  "statuses": ["Active"],
  "industries": ["B2B"],
  "regions": ["United States of America"],
  "stages": ["Seed", "Early"],
  "hiringOnly": true,
  "topCompaniesOnly": false,
  "maxRecords": 500,
  "hitsPerPage": 1000,
  "requestDelay": 300
}

Parameter reference

Parameter	Type	Default	Description
`query`	`string`	`""`	Free-text search across name, one-liner, long description, industry, and tags. Empty = browse all. Examples: `"AI"`, `"developer tools"`, `"fintech"`, `"climate"`, `"vertical SaaS"`.
`batches`	`array<string>`	`[]`	Filter by YC batch. Format: `"Winter 2024"`, `"Summer 2023"`, `"Spring 2024"`, `"Fall 2024"`, `"IK12"`, etc. Each batch listed runs as a separate fan-out query — the recommended way to break the Algolia 1000-result cap and pull the full catalog.
`statuses`	`array<string>`	`[]`	Filter by company status. Values: `"Active"`, `"Acquired"`, `"Public"`, `"Inactive"`. Empty = all four.
`industries`	`array<string>`	`[]`	Filter by industry. Examples: `"B2B"`, `"Consumer"`, `"Fintech"`, `"Healthcare"`, `"Government"`, `"Real Estate and Construction"`, `"Education"`, `"Industrials"`. Empty = all. Case-sensitive — match YC's exact spelling.
`regions`	`array<string>`	`[]`	Filter by region. Examples: `"United States of America"`, `"Europe"`, `"Asia"`, `"India"`, `"Latin America"`, `"Africa"`, `"Canada"`, `"Australia / New Zealand"`.
`stages`	`array<string>`	`[]`	Filter by company stage. Values: `"Seed"`, `"Early"`, `"Growth"`. Empty = all three.
`hiringOnly`	`boolean`	`false`	When `true`, only returns companies with `isHiring: true`. Killer filter for recruiters and job-board operators.
`topCompaniesOnly`	`boolean`	`false`	When `true`, only returns YC's curated Top Companies — the unicorns and best exits (think Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart).
`maxRecords`	`integer`	`500`	Hard cap on total records across all fan-out queries. `0` = unlimited (bounded by Algolia's 1000-per-query cap × number of filter combinations). Set to `0` when pulling the full 5,916-company catalog.
`hitsPerPage`	`integer`	`1000`	Algolia page size. `1000` is the maximum the secured key allows and keeps request count minimal.
`requestDelay`	`integer`	`300`	Milliseconds between Algolia calls. Algolia is sub-second fast but 200-500ms is the polite range.
`proxyConfiguration`	`object`	none	Optional Apify proxy. Almost never needed — Algolia's public search API has zero rate-limit on the `ycdc_public` tag.
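The filter parameters above map naturally onto Algolia's facetFilters (AND across categories, OR within one). The sketch below shows one way to build one filter combination per listed batch. The facet attribute names (`batch`, `status`, `industries`, `regions`, `stage`, `isHiring`, `top_company`) and the helper names are assumptions for illustration; check the attribute names in a real Algolia response before relying on them.

```python
from typing import Optional

# Assumed mapping from actor input keys to Algolia facet attribute names.
FACET_FIELDS = {
    "statuses": "status",
    "industries": "industries",
    "regions": "regions",
    "stages": "stage",
}


def facet_filters_for(run_input: dict, batch: Optional[str] = None) -> list:
    """Build one facetFilters array: AND across categories, OR within each category."""
    filters = []
    if batch:
        filters.append([f"batch:{batch}"])
    for key, attr in FACET_FIELDS.items():
        values = run_input.get(key) or []
        if values:
            filters.append([f"{attr}:{v}" for v in values])
    if run_input.get("hiringOnly"):
        filters.append(["isHiring:true"])  # assumed attribute name
    if run_input.get("topCompaniesOnly"):
        filters.append(["top_company:true"])  # assumed attribute name
    return filters


def fan_out(run_input: dict) -> list:
    """One filter combination per listed batch, or a single combination if none are given."""
    batches = run_input.get("batches") or [None]
    return [facet_filters_for(run_input, b) for b in batches]
```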

Output Schema

Every record is a flat JSON object with the same field set, so downstream consumers (Postgres, Snowflake, Salesforce, HubSpot, Airtable) can ingest without per-category branching.

Core company fields

Field	Type	Description
`companyId`	`integer`	Stable YC-assigned numeric ID. Use as the primary key in your warehouse.
`name`	`string`	Company name (e.g., `"Airbyte"`, `"Stripe"`, `"&AI"`).
`slug`	`string`	URL-safe handle (e.g., `"airbyte"`, `"stripe"`, `"and-ai"`).
`ycProfileUrl`	`string`	Canonical deep link: `https://www.ycombinator.com/companies/<slug>`.
`website`	`string`	The company's own homepage URL.
`oneLiner`	`string`	The pitch in a sentence (e.g., `"Open-source data movement infrastructure"`).
`longDescription`	`string`	Multi-sentence company description from the YC profile.
`logoUrl`	`string`	Thumbnail logo URL hosted on YC's CDN.

Classification fields

Field	Type	Description
`batch`	`string`	YC cohort (e.g., `"Winter 2020"`, `"Summer 2024"`, `"IK12"`).
`status`	`string`	`"Active"`, `"Acquired"`, `"Public"`, or `"Inactive"`.
`stage`	`string`	`"Seed"`, `"Early"`, or `"Growth"`.
`industry`	`string`	Primary industry (e.g., `"B2B"`, `"Fintech"`, `"Healthcare"`).
`subindustry`	`string`	More granular vertical (e.g., `"B2B -> Sales"`, `"Fintech -> Banking and Exchange"`).
`industries`	`array<string>`	Full multi-industry tag list.
`tags`	`array<string>`	Free-form descriptive tags (e.g., `["AI", "Sales", "B2B", "LegalTech"]`).

Operational fields

Field	Type	Description
`teamSize`	`integer`	Reported headcount at scrape time.
`location`	`string`	Free-text location string (e.g., `"San Francisco, CA, USA"`).
`regions`	`array<string>`	Normalized region list (e.g., `["America / Canada", "United States of America", "Remote"]`).
`isHiring`	`boolean`	`true` if the company is actively hiring on Work at a Startup.
`topCompany`	`boolean`	`true` if YC has curated this company on its "Top Companies" list.
`nonprofit`	`boolean`	`true` if registered as a nonprofit (YC funds a few each batch).
`formerNames`	`array<string>` or `null`	Previous names if the company rebranded; `null` when there are none.
`launchedAt`	`string`	ISO-8601 launch date (e.g., `"2024-07-15T00:00:00.000Z"`).
`launchedAtUnix`	`integer`	Same timestamp as Unix epoch seconds — convenient for time-series joins.

Provenance fields

Field	Type	Description
`searchQuery`	`string`	The `query` string that surfaced this record (echoed back for multi-query runs).
`searchBatch`	`string`	The `batch` filter that surfaced this record (for fan-out runs).
`scrapedAt`	`string`	ISO-8601 timestamp of when the actor pulled this record.
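The snake_case-to-camelCase mapping and timestamp conversion behind this schema can be sketched as a single function. This is an illustrative Python version, not the actor's code, and it covers only a subset of fields; the raw Algolia key `id` (for `companyId`) is an assumption, while the other raw names are the ones this page lists.

```python
from datetime import datetime, timezone
from typing import Optional


def iso_from_unix(ts: Optional[int]) -> Optional[str]:
    """Convert Unix epoch seconds to ISO-8601 UTC with milliseconds."""
    if ts is None:
        return None
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def normalize(hit: dict) -> dict:
    """Map a raw Algolia hit onto a subset of the camelCase output schema."""
    slug = hit.get("slug")
    return {
        "companyId": hit.get("id"),  # assumed raw field name
        "name": hit.get("name"),
        "slug": slug,
        "ycProfileUrl": f"https://www.ycombinator.com/companies/{slug}" if slug else None,
        "oneLiner": hit.get("one_liner"),
        "logoUrl": hit.get("small_logo_thumb_url"),
        "location": hit.get("all_locations"),
        "teamSize": hit.get("team_size"),
        "launchedAt": iso_from_unix(hit.get("launched_at")),
        "launchedAtUnix": hit.get("launched_at"),
    }
```

For example, a raw `launched_at` of 1721433600 becomes `"2024-07-20T00:00:00.000Z"`, matching the &AI record below.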

Example: An AI B2B startup (verified live from query="AI")

{
  "companyId": 31984,
  "name": "&AI",
  "slug": "and-ai",
  "ycProfileUrl": "https://www.ycombinator.com/companies/and-ai",
  "website": "https://www.and.ai",
  "oneLiner": "AI for IP and patent law",
  "longDescription": "&AI builds the AI copilot for IP attorneys and patent agents — drafting, prior art searches, office action responses, and portfolio analytics in one workspace.",
  "logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/and-ai.png",
  "location": "New York, NY, USA",
  "regions": ["America / Canada", "United States of America"],
  "batch": "Summer 2024",
  "status": "Active",
  "stage": "Seed",
  "teamSize": 13,
  "industry": "B2B",
  "subindustry": "B2B -> LegalTech",
  "industries": ["B2B", "B2B -> LegalTech"],
  "tags": ["AI", "Artificial Intelligence", "LegalTech", "B2B"],
  "topCompany": false,
  "isHiring": true,
  "nonprofit": false,
  "formerNames": null,
  "launchedAt": "2024-07-20T00:00:00.000Z",
  "launchedAtUnix": 1721433600,
  "searchQuery": "AI",
  "searchBatch": null,
  "scrapedAt": "2026-05-18T09:15:00.000Z"
}

Example: A growth-stage YC alumnus (Airbyte)

{
  "companyId": 23892,
  "name": "Airbyte",
  "slug": "airbyte",
  "ycProfileUrl": "https://www.ycombinator.com/companies/airbyte",
  "website": "https://airbyte.com",
  "oneLiner": "Open-source data movement infrastructure",
  "longDescription": "Airbyte is the leading open-source ELT platform with 300+ pre-built connectors. Used by thousands of data teams to centralize data into warehouses, lakes, and AI vector stores.",
  "logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/airbyte.png",
  "location": "San Francisco, CA, USA",
  "regions": ["America / Canada", "United States of America", "Remote"],
  "batch": "Winter 2020",
  "status": "Active",
  "stage": "Growth",
  "teamSize": 90,
  "industry": "B2B",
  "subindustry": "B2B -> Engineering, Product and Design",
  "industries": ["B2B", "B2B -> Engineering, Product and Design"],
  "tags": ["AI", "Data Engineering", "Open Source", "Developer Tools"],
  "topCompany": true,
  "isHiring": true,
  "nonprofit": false,
  "formerNames": null,
  "launchedAt": "2020-07-21T00:00:00.000Z",
  "launchedAtUnix": 1595289600,
  "searchQuery": "AI",
  "searchBatch": null,
  "scrapedAt": "2026-05-18T09:15:00.000Z"
}

Status, Stage & Industry Reference

Company statuses

Status	Meaning
`Active`	Still operating independently and most likely raising or growing
`Acquired`	Bought by another company (great for M&A pattern research)
`Public`	Listed on a public market via IPO, direct listing, or SPAC (Airbnb, Coinbase, DoorDash, Reddit, etc.)
`Inactive`	Shut down, wound up, or otherwise dormant

Stages

Stage	Typical Profile
`Seed`	Just out of YC, < 10 people, pre-Series A — primary recruiter and SDR target
`Early`	Series A / B, 10-100 people — prime ICP for dev tools, payroll, HR, observability SaaS
`Growth`	Series C+, 100+ people — enterprise SaaS, fintech, and consulting ICP

Top YC industries

Industry	Notes
`B2B`	The largest industry segment — SaaS, dev tools, sales, HR, security, finance ops
`Consumer`	DTC, social, gaming, marketplaces, creator economy
`Fintech`	Banking, payments, lending, crypto, insurance, wealth management
`Healthcare`	Diagnostics, telehealth, biotech, mental health, healthtech infrastructure
`Education`	K-12, higher ed, professional learning, EdTech infrastructure
`Real Estate and Construction`	PropTech, construction tech, vacation rentals, real estate fintech
`Government`	GovTech, defense, public-sector SaaS
`Industrials`	Hardware, manufacturing, supply chain, climate, space

Tip: Use industries: ["B2B"] + stages: ["Early", "Growth"] + hiringOnly: true to get the canonical SaaS-sales prospecting list — post-funded, growing-headcount B2B YC companies.

Use Cases

B2B SaaS Sales Prospecting

Funded YC startups are the highest-converting cohort for dev-tools, payroll, HR, observability, security, payment, and infrastructure SaaS sales teams. They are flush with capital and growing headcount, and their founders are technically literate, so sales cycles tend to be short.

Build hyper-targeted ICP lists by combining industries: ["B2B"] + stages: ["Early", "Growth"] as inputs, then keeping teamSize > 20 in the exported data (team size is an output field, not an input filter)
Identify post-funding spikes by filtering on the most recent 4 batches (Winter 2024, Summer 2024, Spring 2024, Fall 2024) — these are the companies with fresh capital and procurement budgets
Enrich your CRM by appending YC batch year, stage, team size, and industry to existing Salesforce/HubSpot accounts
Run trigger-based outbound — when a Seed-stage company in your ICP rolls over to Early, that's a buying-signal alert
Route territory ownership by region (regions: ["United States of America"] vs regions: ["Europe"])
Score account fit using YC batch as a proxy for company sophistication (a W24 company has different needs than an IK12 company)
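Because team size is an output field rather than an input filter, the teamSize > 20 step from the first bullet happens on the exported records. A small sketch, assuming the dataset items are already loaded as a list of dicts (for example via the Python client's `iterate_items()` from Quick Start); `icp_accounts` is a hypothetical helper, and the recent-batch set reuses the four batches named above.

```python
from typing import Iterable

# The four most recent batches named in the bullet above.
RECENT_BATCHES = {"Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024"}


def icp_accounts(items: Iterable[dict], min_team: int = 20, recent_only: bool = False) -> list:
    """Keep companies with teamSize > min_team (optionally recent batches only), largest first."""
    picked = [
        c for c in items
        if (c.get("teamSize") or 0) > min_team
        and (not recent_only or c.get("batch") in RECENT_BATCHES)
    ]
    return sorted(picked, key=lambda c: c.get("teamSize") or 0, reverse=True)
```

Missing team sizes are treated as 0, so companies without a reported headcount are dropped rather than guessed at.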

Recruiter & Executive Search Intel

The YC alumni network is the most concentrated source of "ex-founder", "early-engineer-at-unicorn", and "first-PM" talent on the planet. The isHiring flag is gold for recruiters.

Pull every YC company hiring right now — hiringOnly: true plus a stage filter — and pitch retained search to the founder
Build executive search target lists of late-stage YC alumni (stage: "Growth", topCompany: true) for VP / C-suite placements
Source ex-YC engineers for your client roster by joining this dataset with LinkedIn (the YC company website often lists "About" / "Team")
Visa-friendly employer mapping — combine with the H1B Visa Database to surface YC companies actively sponsoring H-1Bs
Time recruiter outreach to the launched-at date — a new launch means hiring volume jumps
Identify acqui-hire targets by filtering status: "Inactive" and recent batches — these founders need a soft landing

VC Analytics & Deal Flow

Whether you're a seed-stage VC tracking YC dealflow or a growth fund mapping competitor portfolios, this dataset is the foundation.

Competitor portfolio mapping — "What did Sequoia / a16z / Founders Fund back from Winter 2024?" by joining YC names with public investor databases
Theme-based pipeline building — query: "AI agents" returns every YC AI-agent startup; query: "vertical SaaS healthcare" returns the vertical SaaS healthcare cohort
Batch-over-batch trend analysis — count AI startups in W22 vs W23 vs W24 to quantify the AI explosion
Stage progression tracking — diff stage between monthly runs to spot companies graduating from Seed to Early (= recent fundraise = re-engage)
Geographic dealflow — regions: ["India"] or regions: ["Latin America"] surfaces emerging-market YC dealflow
Top Company anomaly detection — a topCompany: true company suddenly switching to status: "Inactive" is a data point worth investigating

Startup Research & Journalism

YC's batch composition is one of the best leading indicators of startup-ecosystem trends. Journalists, analysts, and researchers use this dataset to write data-driven stories.

Quantify the AI explosion — count companies tagged "AI" per batch since W22; the curve goes vertical in W23-S24
Track the fintech retreat of 2022 — count Fintech-tagged companies per batch and chart it
Cover the climate-tech rebound of 2024 — query: "climate" per batch over time
Build investor pitch-deck appendices with charts of YC team-size growth, batch-size evolution, geographic distribution
Profile cohorts — pull all of W24, sort by launchedAt, write a 5,000-word "State of W24" feature
Compare YC to Techstars / 500 by joining this dataset with sibling Apify scrapers
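The per-batch counting behind trend stories like the first two bullets is a short aggregation over exported records. A Python sketch with hypothetical helper names; the chronological ordering assumes batch names follow the "Season YYYY" pattern used throughout this page, and anything else (such as IK12) is sorted after them.

```python
from collections import Counter
from typing import Iterable

SEASONS = ["Winter", "Spring", "Summer", "Fall"]  # order within a calendar year


def batch_sort_key(batch: str) -> tuple:
    """Sort "Season YYYY" batches chronologically; unrecognized names go last."""
    parts = (batch or "").split()
    if len(parts) == 2 and parts[0] in SEASONS and parts[1].isdigit():
        return (0, int(parts[1]), SEASONS.index(parts[0]))
    return (1, 0, batch or "")


def tag_trend(items: Iterable[dict], tag: str = "AI") -> list:
    """Count companies carrying `tag` in each batch, oldest batch first."""
    counts = Counter(c.get("batch") for c in items if tag in (c.get("tags") or []))
    return sorted(counts.items(), key=lambda kv: batch_sort_key(kv[0]))
```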

University Career Services

YC alumni companies hire aggressively from top CS programs. Career services teams build curated boards from the YC isHiring feed.

Show students which YC startups are hiring — filter hiringOnly: true + region matching campus
Cross-reference with visa data — combine with the H1B Visa Database for international student career boards
Build alumni placement reports — "X% of our CS '24 grads went to YC-backed startups"
Power on-campus recruiting pitches — invite hiring YC founders to do recruiting trips
Career fairs — pull all SF Bay Area YC companies hiring to plan a Bay Area trek

Conference & Event Sales

SaaStr, TechCrunch Disrupt, MicroConf, the Stage Convention — every B2B SaaS conference needs to fill seats with funded-founder buyers. YC companies are their bread and butter.

Build a SaaStr 2026 prospect list — stages: ["Early", "Growth"] + industries: ["B2B"]
TechCrunch Disrupt early-bird list — stages: ["Seed"] + most recent 2 batches
Sponsor outreach — topCompany: true companies are the dream sponsors with marketing budget
Speaker sourcing — Growth-stage YC founders make excellent panel speakers
Side-event invitee lists — every YC founder in regions: ["United States of America"] for the SF event circuit

Marketing & Ad Targeting

LinkedIn and Facebook custom audiences become dramatically more valuable when you can build a "YC-alumni-founder" persona.

LinkedIn custom audience seed: upload company names and websites from this dataset (joined with founder LinkedIn URLs from a separate source) for ABM campaigns
Founder-targeted Facebook custom audiences — match YC company websites to Facebook business accounts
Lookalike modeling — train a lookalike on YC founders to find similar prospects outside YC
Account-based marketing (ABM) for B2B SaaS — every YC company becomes a 1-row ABM target
Industry-specific newsletters — sell ad spots to AI / Fintech / Healthcare advertisers and price by audience size in the dataset

Competitive Landscape Mapping & Strategy Decks

Product strategy teams pay consultants $50K+ for "competitive landscape" decks. This dataset lets you build them in an afternoon.

"Every YC AI sales startup since 2020" — query: "AI sales" + batches: <list> — for sales-tech market mapping
"Every YC developer tools startup since IK12" — query: "developer tools" for dev-tools market saturation analysis
Industry concentration matrix — industry x batch pivot reveals where YC is concentrating bets
Product-strategy gap analysis — find an industry with few YC entrants — likely a green field
Investor memo appendix: a line such as "Of the N AI infrastructure startups YC has funded since W22, only M are growth-stage", computed from this dataset, makes a powerful slide
Market sizing — total team size summed across an industry = a directional TAM proxy

M&A / Sourcing & Acquisition Targets

status: "Active" + stage: "Early" + sluggish team-size growth = a candidate acqui-hire conversation. Top corporate development teams scout YC alumni systematically.

Pre-Series-B acquisition targets — stage: "Early" + status: "Active" + small team size
Defensive acquisitions — find every YC company in your direct vertical and triage threat level
Acqui-hire scouting — status: "Inactive" companies whose founders are signal-rich talent
Founder LinkedIn enrichment — join names with LinkedIn to cold-message about strategic conversations
Competitor's portfolio acquisition — when a competitor goes on a YC-buying spree, the dataset surfaces the pattern

Investor Research & LP Reporting

LPs and emerging fund managers use YC dealflow as a benchmark for their own portfolios.

Sector exposure benchmarking — what % of YC's last 4 batches were AI vs your fund's exposure?
Geographic dealflow benchmarking: roughly 8% of YC companies list India (~500 of 5,916); if your fund has 2%, is that an opportunity or a risk?
Vintage tracking — pull every YC batch, count Public + Acquired outcomes — compute YC's mortality and upside ratios by vintage
LP letter charts — embed YC market data as the "context" appendix in quarterly LP updates
Co-invest sourcing — identify YC Growth stage companies for late-stage co-invest deals
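The vintage-tracking bullet reduces to a per-batch tally of statuses. A minimal sketch with a hypothetical helper: here "upside" is defined as the share of a batch that is Public or Acquired and "mortality" as the share that is Inactive; both definitions are illustrative choices, not YC metrics.

```python
from collections import Counter, defaultdict
from typing import Iterable


def vintage_outcomes(items: Iterable[dict]) -> dict:
    """Per batch: company count, share Public/Acquired ("upside"), share Inactive ("mortality")."""
    by_batch = defaultdict(Counter)
    for c in items:
        by_batch[c.get("batch")][c.get("status")] += 1
    report = {}
    for batch, counts in by_batch.items():
        n = sum(counts.values())
        report[batch] = {
            "companies": n,
            "upside": (counts["Public"] + counts["Acquired"]) / n,
            "mortality": counts["Inactive"] / n,
        }
    return report
```

Recent batches will naturally show low upside and low mortality simply because their companies have had less time to exit or shut down.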

Sample Queries & Recipes

Recipe 1: Every AI YC startup actively hiring (recruiter goldmine)

{
  "query": "AI",
  "hiringOnly": true,
  "statuses": ["Active"],
  "maxRecords": 1000
}

Recipe 2: Full Winter 2024 batch — every company

{
  "batches": ["Winter 2024"],
  "maxRecords": 0
}

Recipe 3: B2B SaaS ICP for sales prospecting

{
  "industries": ["B2B"],
  "stages": ["Early", "Growth"],
  "statuses": ["Active"],
  "regions": ["United States of America"],
  "hiringOnly": true,
  "maxRecords": 1000
}

Recipe 4: YC's Top Companies list (Airbnb, Stripe, Coinbase, et al.)

{
  "topCompaniesOnly": true,
  "maxRecords": 0
}

Recipe 5: Fintech YC alumni in India

{
  "industries": ["Fintech"],
  "regions": ["India"],
  "statuses": ["Active"]
}

Recipe 6: Climate-tech surge across recent batches

{
  "query": "climate",
  "batches": [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023"
  ],
  "maxRecords": 0
}

Recipe 7: Full 5,916-company catalog via batch fan-out

{
  "batches": [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023",
    "Winter 2022", "Summer 2022",
    "Winter 2021", "Summer 2021",
    "Winter 2020", "Summer 2020",
    "Winter 2019", "Summer 2019",
    "Winter 2018", "Summer 2018",
    "Winter 2017", "Summer 2017",
    "Winter 2016", "Summer 2016",
    "Winter 2015", "Summer 2015",
    "Winter 2014", "Summer 2014",
    "Winter 2013", "Summer 2013",
    "Winter 2012", "Summer 2012",
    "Winter 2011", "Summer 2011",
    "Winter 2010", "Summer 2010",
    "Winter 2009", "Summer 2009",
    "Winter 2008", "Summer 2008",
    "Winter 2007", "Summer 2007",
    "Winter 2006", "Summer 2006",
    "Summer 2005",
    "IK12"
  ],
  "maxRecords": 0,
  "hitsPerPage": 1000,
  "requestDelay": 300
}

Integration Examples

Google Sheets (via Apify Integration)

Set up an Apify schedule that runs this actor every Monday at 7:00 AM
Add the "Export to Google Sheets" integration to the schedule
Receive a fresh YC company directory in your Sheet every Monday morning
Build pivot tables: batch x industry, stage x region, isHiring counts over time

Make.com / Zapier / n8n

Use the Apify connector on Make, Zapier, or n8n. Trigger downstream workflows on:

New companies (this week's run minus last week's = newly-added YC companies)
Stage transitions (Seed → Early = recent fundraise signal — fire a Slack alert)
isHiring flips to true (new hiring season — push to your recruiter Slack)
New launches (launchedAt is within the last 7 days — push to your Twitter scheduler)
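The first three triggers come from comparing two runs' records keyed on companyId. A sketch assuming both runs are loaded as lists of dicts with the output fields documented above; `diff_runs` and its event names are hypothetical.

```python
from typing import Iterable


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> dict:
    """Compare two runs by companyId and return new companies, stage changes, and hiring flips."""
    before = {c["companyId"]: c for c in previous}
    events = {"new": [], "stage_changed": [], "now_hiring": []}
    for c in current:
        old = before.get(c["companyId"])
        if old is None:
            events["new"].append(c["name"])  # not in last week's run
            continue
        if old.get("stage") != c.get("stage"):
            events["stage_changed"].append((c["name"], old.get("stage"), c.get("stage")))
        if not old.get("isHiring") and c.get("isHiring"):
            events["now_hiring"].append(c["name"])  # isHiring flipped to true
    return events
```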

Power BI / Tableau / Looker

Connect Apify's REST API as a data source. Refresh on the Apify schedule. Build dashboards covering:

YC batch size evolution since 2005
Industry distribution per batch (the AI surge visualized)
Geographic dealflow heatmaps
Top Companies progression — who graduated to topCompany in the last quarter?

Postgres / Snowflake / BigQuery

Use an Apify webhook to notify your data warehouse ingestion endpoint after every scheduled run; the webhook payload identifies the run, and the endpoint then pulls the records from the run's default dataset. Suggested schema:

CREATE TABLE yc_companies (
  company_id           BIGINT PRIMARY KEY,
  name                 TEXT,
  slug                 TEXT,
  yc_profile_url       TEXT,
  website              TEXT,
  one_liner            TEXT,
  long_description     TEXT,
  logo_url             TEXT,
  location             TEXT,
  regions              JSONB,
  batch                TEXT,
  status               TEXT,
  stage                TEXT,
  team_size            INTEGER,
  industry             TEXT,
  subindustry          TEXT,
  industries           JSONB,
  tags                 JSONB,
  top_company          BOOLEAN,
  is_hiring            BOOLEAN,
  nonprofit            BOOLEAN,
  former_names         JSONB,
  launched_at          TIMESTAMPTZ,
  launched_at_unix     BIGINT,
  scraped_at           TIMESTAMPTZ
);
CREATE INDEX idx_yc_batch ON yc_companies(batch);
CREATE INDEX idx_yc_industry ON yc_companies(industry);
CREATE INDEX idx_yc_is_hiring ON yc_companies(is_hiring) WHERE is_hiring = TRUE;
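Loading records into this schema is mostly a camelCase-to-snake_case column mapping plus an upsert on company_id. The sketch below only builds the SQL and parameter tuples; it assumes a Postgres driver that uses `%s` placeholders (for example psycopg's `cursor.executemany(UPSERT_SQL, rows)`), JSON-encodes the four JSONB columns, and leaves the ISO timestamp strings for Postgres to cast. The helper names are hypothetical.

```python
import json
import re

# Output fields in the same order as the CREATE TABLE columns above.
FIELDS = [
    "companyId", "name", "slug", "ycProfileUrl", "website", "oneLiner",
    "longDescription", "logoUrl", "location", "regions", "batch", "status",
    "stage", "teamSize", "industry", "subindustry", "industries", "tags",
    "topCompany", "isHiring", "nonprofit", "formerNames", "launchedAt",
    "launchedAtUnix", "scrapedAt",
]
JSON_FIELDS = {"regions", "industries", "tags", "formerNames"}  # JSONB columns


def snake(name: str) -> str:
    """camelCase -> snake_case, e.g. ycProfileUrl -> yc_profile_url."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


COLUMNS = [snake(f) for f in FIELDS]
UPSERT_SQL = (
    f"INSERT INTO yc_companies ({', '.join(COLUMNS)}) "
    f"VALUES ({', '.join(['%s'] * len(COLUMNS))}) "
    "ON CONFLICT (company_id) DO UPDATE SET "
    + ", ".join(f"{c} = EXCLUDED.{c}" for c in COLUMNS[1:])
)


def to_params(record: dict) -> tuple:
    """Order a record's values to match COLUMNS, JSON-encoding list fields."""
    values = []
    for f in FIELDS:
        v = record.get(f)
        values.append(json.dumps(v) if f in JSON_FIELDS and v is not None else v)
    return tuple(values)
```

Upserting rather than inserting lets weekly runs refresh stage, team size, and hiring status in place.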

Salesforce / HubSpot CRM Enrichment

Trigger an Apify run weekly, then upsert against Account records keyed on website or companyId. Stage transitions can auto-create Tasks; new Top Company designations can trigger Opportunity stage changes.
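Matching on website works best on a normalized domain rather than the raw URL, since "https://www.and.ai" and "and.ai/" should land on the same Account. A small sketch of one such normalization (lowercase host, `www.` prefix dropped); `domain_key` is a hypothetical helper, and how your CRM stores domains is an assumption to check.

```python
from typing import Optional
from urllib.parse import urlparse


def domain_key(url: Optional[str]) -> Optional[str]:
    """Reduce a website URL to a bare lowercase host for CRM matching."""
    if not url:
        return None
    if "://" not in url:
        url = "https://" + url  # urlparse needs a scheme to find the host
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host or None
```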

Webhooks → Slack / Discord

Attach an Apify webhook that triggers a small relay after each run; the relay reads the run's default dataset and posts to your Slack channel. Recruiters get a daily "Today's newly-hiring YC companies" post. Sales gets a weekly "New YC fintech ICP additions" digest.

Major Markets & Regional Coverage

YC's portfolio is global. Below is a rough distribution of YC companies by region with significance notes.

Region	YC Presence	Significance
United States of America	~3,800 companies	The core — San Francisco, NYC, LA, Boston, Seattle, Austin, Miami
Europe	~600 companies	London, Berlin, Paris, Amsterdam, Stockholm, Madrid, Lisbon
India	~500 companies	Bengaluru, Mumbai, Delhi NCR, Hyderabad — fast-growing YC region
Latin America	~400 companies	São Paulo, Mexico City, Buenos Aires, Bogotá, Santiago
Canada	~200 companies	Toronto, Vancouver, Montreal, Waterloo
Asia (ex-India)	~250 companies	Singapore, Tokyo, Seoul, Jakarta, Manila
Africa	~120 companies	Lagos, Nairobi, Cape Town, Cairo
Australia / New Zealand	~100 companies	Sydney, Melbourne, Auckland
Middle East	~60 companies	Dubai, Tel Aviv, Riyadh
Remote-first	grows every batch	Distributed teams, no HQ

Tip: Combine regions filter with the H1B Visa Database to surface US-based YC employers who actively sponsor H-1Bs — pure gold for international recruiter outreach.

Cost & Performance

Metric	Value
Engine	Direct Algolia REST API (`got-scraping` HTTP) — no browser
Runtime (50 records, simple query)	~5 seconds
Runtime (1,000 records, single filter combination)	~10 seconds
Runtime (full ~5,916-company catalog via batch fan-out)	~5 minutes
Compute usage per default run	~0.001 Compute Units (typically less than $0.01 of platform usage)
Compute usage per full-catalog run	~0.01 CU (typically less than $0.05 of platform usage)
Pricing model	Pay-per-event: charged per result (listed from $1.00 / 1,000 results)
Data freshness	Live at runtime — YC's Algolia index is continuously refreshed
Auth required	None — uses YC's public `ycdc_public` Algolia key
Proxy required	None — Algolia public endpoint has no anti-bot
Concurrency	Safe to run multiple parallel filtered configurations
Memory footprint	256 MB minimum, 1024 MB max — no scraping browser, low RAM

Compliance, Privacy & Legal Notes

Public data only — every field returned by this actor is published by Y Combinator at ycombinator.com/companies under their public ycdc_public Algolia tag, which is the same tag YC uses to expose data to their own client-side search UI
No PII / no personal data — the dataset describes companies, not individuals. Founder names are not in this dataset. (For founder-level enrichment, consume YC's company page separately.)
No emails, no phone numbers — the actor does not return any contact information
Respectful of YC's infrastructure — the actor uses low concurrency (1 in-flight Algolia call), configurable requestDelay (default 300ms), and 3-attempt exponential backoff. It is explicitly built to be a polite citizen.
YC's robots.txt does not block /companies and the underlying Algolia endpoint is unauthenticated and intentionally public
Algolia ToS — the ycdc_public secured key is issued by YC for client-side use; it self-restricts to tagFilters=["ycdc_public"] and restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production
GDPR / CCPA — this dataset contains no EU or California resident personal data; company-level facts are not personal data under either regulation
No commercial guarantees — fields, schemas, and Algolia keys are controlled by YC and may change without notice; the actor's normalization layer is built to handle most schema drift gracefully
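
The politeness settings listed above (one Algolia call in flight, a configurable requestDelay defaulting to 300 ms, three attempts with exponential backoff) follow a standard pattern. The sketch below illustrates that pattern; it is not the actor's source. `fetch` stands in for any zero-argument function that performs one Algolia call, and reusing 300 ms as the base backoff is an assumption.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def call_with_backoff(fetch: Callable[[], T], attempts: int = 3, base_delay: float = 0.3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry fetch up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.3 s, then 0.6 s, ...
    raise RuntimeError("attempts must be at least 1")


def run_sequentially(jobs: Iterable[Callable[[], T]], request_delay: float = 0.3,
                     sleep: Callable[[float], None] = time.sleep) -> list[T]:
    """Run jobs one at a time (a single call in flight) with a pause between them."""
    results = []
    for index, job in enumerate(jobs):
        if index > 0:
            sleep(request_delay)
        results.append(call_with_backoff(job, sleep=sleep))
    return results
```

Injecting `sleep` keeps the timing logic testable without real waits; in production the default `time.sleep` applies.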

Important: Use of this dataset for unsolicited bulk communications must comply with CAN-SPAM, TCPA, GDPR, CCPA, and the YC website ToS. The actor publisher is not responsible for downstream misuse.

Frequently Asked Questions

How fresh is the data?

YC updates its Algolia index continuously — new companies appear within hours of being announced. The actor hits Algolia live on every run, so the data is as fresh as YC publishes it.

How many companies will I get?

As of the current snapshot, YC's directory lists 5,916 companies. A default run with no filters returns 500 records (the default limit). To pull the full catalog, list every batch in the batches input (~50 batches, from Summer 2005 to the latest cohort, plus IK12); the actor then fans out into one query per batch, which works around Algolia's 1,000-results-per-query cap.

Why does Algolia cap a single query at 1,000 results?

Algolia limits how deep any single query can paginate: the paginationLimitedTo index setting defaults to 1,000 hits, which also keeps a public search key from paging through an entire index in one query. The actor works around this by fanning out across batches: each batch query returns fewer than 300 results, so each one comes back complete.
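
A minimal sketch of per-batch fan-out, assuming Algolia's single-index query endpoint (POST https://<APP_ID>-dsn.algolia.net/1/indexes/<INDEX>/query with a JSON body holding a URL-encoded params string) and a facet named batch on the YCCompany_production index. The facet name is an assumption, and the public search key is left as a placeholder. Only the requests are built here, so nothing is sent.

```python
from __future__ import annotations

import json
from urllib.parse import parse_qs, quote, urlencode

APP_ID = "45BWZJ1SGC"          # Algolia app named in this README
INDEX = "YCCompany_production"  # index named in this README
QUERY_URL = f"https://{APP_ID}-dsn.algolia.net/1/indexes/{INDEX}/query"

# Headers Algolia expects on every search call; the key below is a placeholder.
HEADERS = {
    "X-Algolia-Application-Id": APP_ID,
    "X-Algolia-API-Key": "<public-ycdc-search-key>",
    "Content-Type": "application/json",
}


def batch_requests(batches: list[str], hits_per_page: int = 1000) -> list[tuple[str, dict]]:
    """Return one (url, json_body) pair per YC batch, to be sent one at a time.

    Each body restricts results to a single batch via facetFilters, so no
    single query needs to page past Algolia's 1,000-hit limit. The facet
    name "batch" is an assumption about YC's index.
    """
    pairs = []
    for batch in batches:
        params = urlencode(
            {
                "query": "",
                "hitsPerPage": hits_per_page,
                "facetFilters": json.dumps([[f"batch:{batch}"]]),
            },
            quote_via=quote,  # encode spaces as %20 rather than '+'
        )
        pairs.append((QUERY_URL, {"params": params}))
    return pairs


if __name__ == "__main__":
    for url, body in batch_requests(["Winter 2024", "Summer 2005"]):
        print(url, parse_qs(body["params"])["facetFilters"])
```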

Do I need a YC account or an API key?

No. The actor uses YC's own public Algolia key — the same one your browser uses when you visit ycombinator.com/companies. You only need an Apify account to run the actor.

Does this scrape ycombinator.com HTML?

No. The actor talks directly to the Algolia search REST API that powers the YC site. This is faster, more reliable, and respectful of YC's web servers (zero impact on ycombinator.com).

Does the actor return founder names or emails?

No. Founder information is not in YC's Algolia index — only company-level metadata. Combine with sibling actors like SEEK for jobs or Levels.fyi for comp data to enrich.

Are inactive / shut-down YC companies included?

Yes. Set statuses: ["Inactive"] to filter to wound-up companies, or leave statuses empty to get every status (Active, Acquired, Public, Inactive).

Can I filter by year of YC participation?

Yes. Use the batches filter with cohort names such as "Winter 2024" or "Summer 2023". To get all of 2024, list ["Winter 2024", "Summer 2024", "Fall 2024"]; YC's first Spring batch was Spring 2025, so there is no "Spring 2024" cohort.

What's the difference between `industry`, `subindustry`, and `industries`?

industry is the top-level YC category (e.g., "B2B")
subindustry is the more granular vertical with hierarchy syntax (e.g., "B2B -> Sales")
industries is the array of all industry tags the company carries — often the most useful for filtering
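
Because subindustry encodes its hierarchy with "->" and industries is an array, filtering on a vertical is plain string work. A minimal sketch, assuming records shaped like the examples above; the helper names are hypothetical.

```python
from __future__ import annotations


def subindustry_path(record: dict) -> list[str]:
    """Split a value like 'B2B -> Sales' into ['B2B', 'Sales']."""
    value = record.get("subindustry") or ""
    return [part.strip() for part in value.split("->") if part.strip()]


def in_vertical(record: dict, wanted: str) -> bool:
    """True if `wanted` is in the industries array or anywhere in the subindustry path."""
    return wanted in (record.get("industries") or []) or wanted in subindustry_path(record)


if __name__ == "__main__":
    company = {"industry": "B2B", "subindustry": "B2B -> Sales", "industries": ["B2B", "Sales"]}
    print(subindustry_path(company), in_vertical(company, "Sales"))
```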

Can I get the YC Top Companies list?

Yes — set topCompaniesOnly: true. This returns YC's curated list of unicorns and best exits (Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart, Brex, Rappi, GitLab, Faire, et al.).

Does the actor deduplicate?

Yes. Within a single run the actor dedups by Algolia companyId, so even when multiple batch fan-out queries surface the same company (rare, but possible if companies span batches), only one record is saved.
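
Dedup by key across fan-out results takes only a few lines. This sketch illustrates the idea (first occurrence wins) and is not the actor's source; it assumes every hit carries companyId.

```python
from __future__ import annotations

from typing import Iterable


def merge_unique(result_pages: Iterable[list[dict]]) -> list[dict]:
    """Merge per-batch hit lists, keeping the first record seen for each companyId."""
    seen: set = set()
    merged = []
    for hits in result_pages:
        for hit in hits:
            company_id = hit["companyId"]
            if company_id in seen:
                continue  # already saved from an earlier batch query
            seen.add(company_id)
            merged.append(hit)
    return merged
```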

Is proxy or residential IP required?

No. The Algolia public endpoint has no anti-bot or rate-limiting on the ycdc_public tag. You can attach Apify Proxy via proxyConfiguration if your network policy requires it, but it's pure overhead.

How do I pull the entire 5,916-company catalog?

Use Recipe 7 above: list every batch from the latest cohort back through "Summer 2005", including "IK12". The fan-out completes in ~5 minutes.
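
Typing ~50 cohort names by hand is error-prone, so the batches list can be generated. A minimal sketch, assuming the "Season Year" naming shown in this FAQ and the batches input field; the helper names are hypothetical. Newer years with Spring or Fall cohorts need those seasons passed in explicitly, and "Winter 2005" is skipped because the directory starts at Summer 2005.

```python
from __future__ import annotations

import json


def batch_labels(first_year: int, last_year: int,
                 seasons: tuple[str, ...] = ("Winter", "Summer")) -> list[str]:
    """Expand a year range into YC-style cohort names, newest first."""
    return [f"{season} {year}"
            for year in range(last_year, first_year - 1, -1)
            for season in reversed(seasons)]


def catalog_input(first_year: int, last_year: int,
                  seasons: tuple[str, ...] = ("Winter", "Summer"),
                  extra: tuple[str, ...] = ("IK12",),
                  skip: tuple[str, ...] = ("Winter 2005",)) -> dict:
    """Build an actor input whose batches list covers a year range plus extra cohorts."""
    batches = [b for b in batch_labels(first_year, last_year, seasons) if b not in skip]
    return {"batches": batches + list(extra)}


if __name__ == "__main__":
    print(json.dumps(catalog_input(2005, 2024), indent=2))
```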

Can I run this on a schedule automatically?

Yes — Apify's built-in Scheduler lets you trigger this actor on any cron expression. Weekly or daily runs work well for change-detection workflows. Combine with webhooks for fully automated pipelines.

Does this actor work with the Apify Free Plan?

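Yes. The actor runs with full functionality on the free tier. Compute usage is tiny (a default 500-record run uses a fraction of a Compute Unit), but per-result charges also count against your monthly platform credit, so compare the current Store price with your remaining credit before a full 5,916-company fan-out.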

What formats can I export the data in?

JSON, CSV, Excel (XLSX), HTML, XML, JSONLines, and RSS — directly from the Apify dataset view. The API also supports streaming for large datasets.
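
The same exports are available over the Apify API's dataset-items endpoint, which takes format, clean, offset and limit query parameters; paging with offset and limit keeps large pulls manageable. A minimal sketch, assuming you already have the run's dataset ID (the run's defaultDatasetId) and an API token; "abc123" in the usage is a placeholder.

```python
from __future__ import annotations

import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_items_url(dataset_id: str, fmt: str = "json", offset: int = 0,
                      limit: int | None = None) -> str:
    """Build the dataset-items URL for one export format and page."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    # clean=true skips empty items and hidden fields.
    params = {"format": fmt, "clean": "true", "offset": offset}
    if limit is not None:
        params["limit"] = limit
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def download(url: str, token: str) -> bytes:
    """Fetch one page of items, sending the API token as a Bearer header."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


if __name__ == "__main__":
    print(dataset_items_url("abc123", "csv", offset=0, limit=1000))
```

Sending the token in a header rather than the query string keeps it out of logs and BI tool connection strings.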

What happens if Algolia returns zero results?

The actor explicitly calls Actor.fail() with a helpful error message ("No records. Try clearing all filters, or check that batch/industry/status spellings match YC's exactly (case-sensitive).") so you never get a silent SUCCEEDED with an empty dataset.

Is the data accurate?

The data is exactly what YC publishes on their own directory — same source, same recency. If a company's status or stage looks wrong, that's YC's directory; the actor does not modify or filter beyond what you request.

How do I report a bug or request a feature?

Open an issue on the Apify Store actor page or contact the developer directly through the Apify Console.

Related Actors

Whether you're enriching YC company data with hiring intel, comp data, federal funding data, or B2B verification, these sibling actors are designed to compose:

H1B Visa Database — US Visa Sponsorship Scraper — perfect complement: surface YC employers actively sponsoring H-1Bs for international recruiting
Levels.fyi Scraper — tech compensation data for YC-backed startups — invaluable for VC, recruiter, and candidate research
SEEK Scraper (Australia / NZ) — job postings — pair with YC isHiring: true data for regional recruiter intel
ProductHunt Launches & Makers Scraper — daily startup launches, makers, votes & reviews — VC/founder/recruiter intel
BBB Business Scraper — Better Business Bureau ratings — verify post-IPO YC alumni reputation
SAM.gov Federal Contractor Entity Scraper — federal funding peer dataset — see which YC alumni are also federal contractors
TTB Alcohol Permittee Scraper — federal licensing peer — useful for YC consumer / alcohol vertical research
Salary.com Scraper — salary benchmarks for YC alumni job postings
Texas Pharmacy License Scraper — TSBP — healthcare licensing peer dataset for YC healthtech research
California DCA Professional License Scraper — CA professional licensing — useful for YC regulated-industry research
Ohio eLicense Scraper — Ohio professional licensing — sibling regulatory dataset
Illinois IDFPR License Scraper — Illinois professional licensing — sibling regulatory dataset

Comparison vs. Alternatives

Approach	Setup time	Coverage	Data freshness	Cost (5,916 records)	Schema normalization	Proxy needed
This actor	< 1 minute	5,916+ companies	Live at runtime	~$5.92 at $1.00 / 1,000 results	Built-in	No
Manual ycombinator.com browsing	Hours/days	Limited by attention span	Live	Free	None	No
Headless browser scrape (Puppeteer)	1-2 days dev	Full	Live	$1-5 per run (CU cost)	DIY	Optional
Custom Algolia client	4-8 hours dev	Full (if you handle 1000-cap)	Live	Free + infra	DIY	No
Paid startup database (PitchBook, CB Insights)	Days to onboard	Vast (not just YC)	Daily	$1,000-50,000/year	Built-in	N/A
LinkedIn Sales Navigator	Hours	YC alumni only inferred	Live	$99-149/seat/mo	None	N/A

Why Pay-Per-Event Pricing?

Most data scrapers either charge a flat monthly subscription (you pay even if you don't use it) or per-Compute-Unit (unpredictable). This actor uses pay-per-event pricing, which means:

You only pay when the actor runs
Charges scale with how much data you actually consume
Transparent, line-item billing inside Apify
No monthly minimums or annual commitments
Cheap to evaluate: sample 50 records for pennies before committing to a full catalog pull
Predictable cost-per-record — easy to forecast scrape budget for procurement
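
Per-result pricing makes budgeting a single multiplication. A sketch using the listed starting price of $1.00 per 1,000 results; the rate you actually pay depends on your Apify plan, so treat the default as an assumption. Platform compute usage is not included here.

```python
from decimal import ROUND_HALF_UP, Decimal


def estimate_cost(records: int, price_per_1000: Decimal = Decimal("1.00")) -> Decimal:
    """Estimated pay-per-event charge in USD for `records` results, rounded to the cent."""
    raw = Decimal(records) * price_per_1000 / Decimal(1000)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for count in (50, 500, 5916):
        print(count, "records ->", estimate_cost(count), "USD")
```

Decimal avoids the binary floating-point rounding surprises that creep into currency math with plain floats.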

Changelog

Last updated: 2026-07-02. Actor verified and maintained: data pipeline tested for quality, structure and freshness; Algolia endpoint and field mapping confirmed against the live directory.

Version	Date	Notes
1.0.1	2026-07	Maintenance — verified Algolia endpoint, field mapping and data freshness against the live YC directory; confirmed batch fan-out and retry behavior
1.0.0	2026-05	Initial public release — direct Algolia API integration, batch fan-out for >1,000 results, 3-attempt retry with exponential backoff, `Actor.fail()` on zero records, full ISO-8601 timestamp normalization, pay-per-event pricing

Keywords

Y Combinator scraper · YC companies database · YC startup directory scraper · Y Combinator API alternative · ycombinator.com/companies scraper · YC batch directory · startup directory scraper · funded startups scraper · B2B SaaS prospecting · VC portfolio scraper · YC alumni directory · Algolia startup search · YC Algolia API · YC Winter 2024 directory · YC Summer 2024 batch scraper · YC Top Companies list scraper · YC unicorn list · YC hiring scraper · YC isHiring filter · YC company API · funded startup lead generation · ICP list builder YC · YC fintech startups · YC AI startups · YC dev tools directory · YC healthcare startups · YC India directory · YC Latin America directory · YC growth-stage companies · YC seed-stage prospects · YC acquisition target list · YC competitor mapping · YC recruiter intelligence · YC executive search · YC ABM list · YC LinkedIn audience seed · YC investor research · startup database API · startup intelligence platform · founder outreach data · YC alumni hiring · YC batch trends · post-funding ICP scraper · YC company logo URLs · Apify YC actor · Haketa YC scraper

Support

Bug reports: Use the Issues tab on the Apify Store page
Feature requests: Same place — please describe the use case and the input combination you'd like to see supported
Direct contact: Through the Apify developer profile haketa

If this actor saves you time or unlocks a new workflow, a 5-star rating on the Apify Store helps other sales, recruiting, VC, and research teams discover it. Thank you!
