YCombinator Companies Scraper | 5,900+ YC Startup Directory

Pricing

from $1.00 / 1,000 results

Scrape the Y Combinator startup directory (5,900+ funded companies) via the official Algolia API. Name, website, batch, status, team size, industry, tags, hiring flag, launched-at, logo. B2B sales prospecting, recruiter intel, VC analytics. HTTP-only, fast.

Developer

Haketa

Maintained by Community

YCombinator Companies Scraper — 5,900+ YC Startup Directory Extractor for Sales, Recruiting, VC Research & Competitive Intelligence

The fastest, most complete Y Combinator startup directory extractor on Apify. Pull every funded YC company since 2005 — name, website, batch, status, stage, team size, industry, tags, region, hiring flag, launched-at, logo — straight from the official Algolia search backend that powers ycombinator.com/companies. Zero browsers, zero anti-bot, ideal ICP data for B2B SaaS sales prospecting, recruiter intel, VC analytics, and competitive landscape mapping.


What This Actor Does

The YCombinator Companies Scraper is a production-grade Apify Actor that extracts the complete Y Combinator funded startup directory: every company YC has backed since 2005, including the IK12 (Imagine K12) cohort and every Winter, Summer, Spring, and Fall batch up to the latest. As of the current snapshot, that's 5,916 funded companies, ranging from early-stage seed bets to publicly traded YC alumni like Airbnb, Coinbase, and DoorDash.

Under the hood, the actor talks directly to YC's official Algolia search backend — the very same 45BWZJ1SGC Algolia app and YCCompany_production index that power the search box, filters, and infinite scroll on ycombinator.com/companies. No headless browser. No HTML parsing. No anti-bot to dodge. Just polite, low-concurrency HTTP POST calls to the public Algolia REST endpoint with the ycdc_public tag filter that YC explicitly publishes for client-side use.

In a single run (typically 5 seconds for 50 records, ~5 minutes for the full 5,916-company catalog via batch fan-out), the actor returns richly normalized JSON records covering:

  • Companies — every YC-funded startup (Active, Acquired, Public, Inactive)
  • Stages — Seed, Early, Growth — useful for filtering to post-funding ICP
  • Industries: B2B, Consumer, Fintech, Healthcare, Government, Education, Real Estate and Construction, Industrials and more
  • Regions: United States of America, Europe, India, Asia, Latin America, Africa, Canada, Australia / New Zealand
  • Batches: Winter 2024, Summer 2024, Spring 2024, Fall 2024, and every earlier batch back to Summer 2005, plus IK12
  • Hiring signal: isHiring boolean for recruiter and job-board use cases
  • Top company flag — YC's curated list of unicorns and best exits

Every record ships with the company's website, one-liner pitch, long description, logo URL, team size, launched-at timestamp, tags, sub-industry, former names, and a deep link back to the canonical ycombinator.com/companies/<slug> profile.

Why scrape Y Combinator yourself when this exists?

YC's directory looks innocently easy to scrape — it's just a public page. But teams that try the DIY route quickly hit a stack of headaches:

  • The directory is fully JavaScript-rendered React: a curl of the HTML returns an empty shell with zero company data
  • A headless browser approach (Puppeteer, Playwright) means 5-10 minute runs for full coverage and high compute cost
  • Without knowing the secured Algolia key, naive Algolia callers get 403 Forbidden — the key is base64-encoded and rotates implicitly via embedded validUntil
  • Algolia caps a single secured-key query at 1,000 results — you can't just ask for "all 5,916 companies in one call"
  • The on-site infinite scroll uses Algolia's page pagination which silently truncates past 1,000 — most DIY scripts plateau at ~1,000 and never notice the missing 4,900 records
  • Facet filters use a nested array-of-arrays syntax (facetFilters=[["batch:Winter 2024"],["status:Active"]]) that's poorly documented outside Algolia's own docs
  • Field names in the Algolia response (one_liner, small_logo_thumb_url, all_locations, launched_at) need normalization to a sane camelCase schema before they're database-ingestible
  • Timestamps come as Unix epochs that need ISO-8601 conversion
  • YC tweaks the index periodically — adding top_company, regions, subindustry, splitting industries into array — meaning your custom scraper breaks silently
  • Zero retry / backoff on the naive call means transient Algolia 5xx errors kill your run

This actor solves all of that: it speaks the Algolia facetFilter dialect fluently, fans out by batch to break through the 1,000-cap, retries with exponential backoff, normalizes every field, converts launched-at to ISO timestamps, and Actor.fail()s on zero records so you never get a silent SUCCEEDED with an empty dataset.


Quick Start

One-Click Run

  1. Click "Try for free" on the Apify Store page
  2. Leave inputs empty to browse the first 500 YC companies, or type AI into the query box for AI-focused startups
  3. Hit Start — your dataset is ready in under 10 seconds for a default run
  4. Download as JSON, CSV, Excel, or HTML directly from the Apify dataset view, or pipe to Google Sheets / a webhook

API Run (Python)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Example 1: every YC-funded AI startup that's actively hiring
run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "query": "AI",
    "hiringOnly": True,
    "maxRecords": 500,
})

for company in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{company['name']:<30} {company['batch']:<12} "
          f"team={company['teamSize']} {company['website']}")

API Run (Python — full catalog via batch fan-out)

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Pull the entire YC catalog by fanning out across batches
batches = [
    "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
    "Winter 2023", "Summer 2023", "Winter 2022", "Summer 2022",
    "Winter 2021", "Summer 2021", "Winter 2020", "Summer 2020",
    # ...add every remaining batch (back to Summer 2005, plus IK12) for full 5,916-company coverage
]

run = client.actor("haketa/ycombinator-companies-scraper").call(run_input={
    "batches": batches,
    "maxRecords": 0,  # unlimited
    "hitsPerPage": 1000,
    "requestDelay": 300,
})

# Report the record count from the dataset metadata
dataset_info = client.dataset(run["defaultDatasetId"]).get()
print(f"Saved {dataset_info['itemCount']} records to dataset {run['defaultDatasetId']}")

API Run (Node.js / TypeScript)

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

const run = await client.actor('haketa/ycombinator-companies-scraper').call({
    industries: ['Fintech'],
    statuses: ['Active'],
    regions: ['United States of America'],
    stages: ['Early', 'Growth'],
    hiringOnly: true,
    maxRecords: 1000,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Got ${items.length} US fintech YC companies hiring at Early/Growth stage`);
items.slice(0, 5).forEach(c => console.log(`- ${c.name}: ${c.oneLiner}`));

API Run (cURL)

curl -X POST "https://api.apify.com/v2/acts/haketa~ycombinator-companies-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "developer tools",
    "batches": ["Winter 2024", "Summer 2024"],
    "hiringOnly": true,
    "maxRecords": 200
  }'

How It Works

YC's directory at ycombinator.com/companies is a React single-page app whose search, filters, and infinite scroll all call Algolia's hosted search REST API. The Algolia application is publicly identifiable in the browser network tab:

  • Algolia Application ID: 45BWZJ1SGC
  • Primary index: YCCompany_production
  • Secondary index: YCCompany_By_Launch_Date_production
  • Tag filter: ycdc_public — YC's own tag for client-exposed data
  • Endpoint: https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query

The actor POSTs a JSON body with a URL-encoded params string containing the free-text query, hitsPerPage, page, and a facetFilters array-of-arrays expressing AND-across-categories / OR-within-category logic. It uses the same browser-exposed secured API key that ycombinator.com hands out — a base64 blob that embeds analyticsTags=ycdc, restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production, and tagFilters=["ycdc_public"] so it can only ever return data YC has explicitly marked public.
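
As a rough illustration of the request shape described above, the following sketch builds the JSON body with its URL-encoded params string. The function names are hypothetical, and the facet names batch and status are taken from the filter examples in this README; the authentication headers and the secured key are deliberately left out. It is a minimal sketch, not the actor's actual source.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_facet_filters(selections: dict[str, list[str]]) -> list[list[str]]:
    """Return Algolia facetFilters: OR inside each inner list, AND across lists."""
    return [
        [f"{facet}:{value}" for value in values]
        for facet, values in selections.items()
        if values  # a facet with no selected values adds no constraint
    ]


def build_query_body(query: str, selections: dict[str, list[str]],
                     page: int = 0, hits_per_page: int = 1000) -> dict[str, str]:
    """Build the JSON body: a single URL-encoded "params" string."""
    params = {
        "query": query,
        "page": page,
        "hitsPerPage": hits_per_page,
        "facetFilters": json.dumps(build_facet_filters(selections)),
    }
    return {"params": urlencode(params)}


body = build_query_body(
    "AI",
    {"batch": ["Winter 2024"], "status": ["Active", "Acquired"]},
)
# Decode the params string again to check the round trip
decoded = parse_qs(body["params"])
print(decoded["facetFilters"][0])
```

Keeping one inner list per facet is what gives the AND-across-categories / OR-within-category behaviour: here the query matches Winter 2024 companies that are either Active or Acquired.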

Endpoint reference

Source | Endpoint | Records | Cadence
Algolia primary | https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_production/query | 5,916 companies (current snapshot) | Live, updated by YC continuously
Algolia secondary | https://45bwzj1sgc-dsn.algolia.net/1/indexes/YCCompany_By_Launch_Date_production/query | Same companies, sorted by launch date | Live
YC profile page | https://www.ycombinator.com/companies/<slug> | One per company | Live

Engineering details

  • HTTP-only via got-scraping — no Puppeteer, no Playwright, no Chromium. Each Algolia call is a single sub-second HTTPS POST.
  • Algolia facet-filter dialect — nested array-of-arrays serialized as URL-encoded JSON: [["batch:Winter 2024"],["status:Active","status:Acquired"]].
  • Batch fan-out for the 1,000-cap: Algolia caps a secured-key query at 1,000 hits. To exceed that, the actor lets you list every batch (Winter 2024, Summer 2024, ..., IK12) and runs one query per batch. Each batch is well under 1,000 companies, so one query per batch across all batches covers the full 5,916-company catalog.
  • Pagination loop — each filter combination loops page=0..nbPages-1 collecting hits, deduplicating by Algolia id along the way.
  • 3-attempt retry with backoff: failed Algolia calls are retried with increasing waits (2s, 4s, 6s) plus jitter. Permanent failure logs an error and skips the batch.
  • Actor.fail() on zero results — prevents the dreaded "SUCCEEDED with empty dataset" scenario; the run explicitly fails with a hint about case-sensitive batch/industry spellings.
  • Polite delays — configurable requestDelay (default 300ms) between Algolia calls so the actor never hammers YC's infrastructure.
  • Field normalization — Algolia's snake_case (one_liner, small_logo_thumb_url, all_locations, launched_at, team_size) is mapped to clean camelCase (oneLiner, logoUrl, location, launchedAt, teamSize).
  • Timestamp conversion — Unix launched_at epoch is converted to ISO-8601 launchedAt plus the raw launchedAtUnix for time-series workflows.
  • No proxy required — Algolia's public search endpoint has zero anti-bot. You may attach Apify Proxy via proxyConfiguration if you want, but it's pure overhead for this actor.
  • Deterministic output: for a given state of the index, the same input produces the same set of records (Algolia sorts results by its default relevance score for the query).
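
The fan-out, pagination, deduplication, and retry steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the actor's code: the search callable is injected (here a fake, in-memory one) so the sketch runs without network access, the function names are hypothetical, and the retry waits grow as base delay times attempt number, which is one reading of the 2s/4s/6s schedule.

```python
import random
import time
from typing import Callable

# (batch, page) -> Algolia-style response with "hits" and "nbPages"
SearchFn = Callable[[str, int], dict]


def with_retry(call: Callable[[], dict], attempts: int = 3,
               base_delay: float = 2.0, sleep=time.sleep) -> dict:
    """Retry a call, waiting base_delay * attempt plus jitter between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * attempt + random.uniform(0, 0.5))
    raise RuntimeError("attempts must be at least 1")


def fetch_all(search: SearchFn, batches: list[str], sleep=time.sleep) -> list[dict]:
    """Run one paginated query per batch and deduplicate hits by id."""
    seen: dict[int, dict] = {}
    for batch in batches:
        page, nb_pages = 0, 1
        while page < nb_pages:
            try:
                result = with_retry(lambda: search(batch, page), sleep=sleep)
            except Exception as exc:
                print(f"skipping batch {batch!r}: {exc}")
                break
            nb_pages = result["nbPages"]
            for hit in result["hits"]:
                seen.setdefault(hit["id"], hit)  # keep the first copy only
            page += 1
    return list(seen.values())


def fake_search(batch: str, page: int) -> dict:
    """Hypothetical stand-in for the Algolia call, with one duplicate hit."""
    data = {
        "Winter 2024": [[{"id": 1}, {"id": 2}], [{"id": 2}, {"id": 3}]],
        "Summer 2024": [[{"id": 4}]],
    }
    pages = data[batch]
    return {"hits": pages[page], "nbPages": len(pages)}


companies = fetch_all(fake_search, ["Winter 2024", "Summer 2024"], sleep=lambda s: None)
print(sorted(c["id"] for c in companies))
```

Injecting the search and sleep functions keeps the control flow testable; a real run would replace fake_search with the HTTP POST and add the configured requestDelay between calls.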

Input Parameters

{
    "query": "AI",
    "batches": ["Winter 2024", "Summer 2024"],
    "statuses": ["Active"],
    "industries": ["B2B"],
    "regions": ["United States of America"],
    "stages": ["Seed", "Early"],
    "hiringOnly": true,
    "topCompaniesOnly": false,
    "maxRecords": 500,
    "hitsPerPage": 1000,
    "requestDelay": 300
}

Parameter reference

Parameter | Type | Default | Description
query | string | "" | Free-text search across name, one-liner, long description, industry, and tags. Empty = browse all. Examples: "AI", "developer tools", "fintech", "climate", "vertical SaaS".
batches | array<string> | [] | Filter by YC batch. Format: "Winter 2024", "Summer 2023", "Spring 2024", "Fall 2024", "IK12", etc. Each batch listed runs as a separate fan-out query. This is the recommended way to break the Algolia 1000-result cap and pull the full catalog.
statuses | array<string> | [] | Filter by company status. Values: "Active", "Acquired", "Public", "Inactive". Empty = all four.
industries | array<string> | [] | Filter by industry. Examples: "B2B", "Consumer", "Fintech", "Healthcare", "Government", "Real Estate and Construction", "Education", "Industrials". Empty = all. Case-sensitive: match YC's exact spelling.
regions | array<string> | [] | Filter by region. Examples: "United States of America", "Europe", "Asia", "India", "Latin America", "Africa", "Canada", "Australia / New Zealand".
stages | array<string> | [] | Filter by company stage. Values: "Seed", "Early", "Growth". Empty = all three.
hiringOnly | boolean | false | When true, only returns companies with isHiring: true. Killer filter for recruiters and job-board operators.
topCompaniesOnly | boolean | false | When true, only returns YC's curated Top Companies, the unicorns and best exits (think Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart).
maxRecords | integer | 500 | Hard cap on total records across all fan-out queries. 0 = unlimited (bounded by Algolia's 1000-per-query cap × number of filter combinations). Set to 0 when pulling the full 5,916-company catalog.
hitsPerPage | integer | 1000 | Algolia page size. 1000 is the maximum the secured key allows and keeps request count minimal.
requestDelay | integer | 300 | Milliseconds between Algolia calls. Algolia is sub-second fast but 200-500ms is the polite range.
proxyConfiguration | object | none | Optional Apify proxy. Almost never needed: Algolia's public search API has zero rate-limit on the ycdc_public tag.

Output Schema

Every record is a flat JSON object with the same field set, so downstream consumers (Postgres, Snowflake, Salesforce, HubSpot, Airtable) can ingest without per-category branching.

Core company fields

Field | Type | Description
companyId | integer | Stable YC-assigned numeric ID. Use as the primary key in your warehouse.
name | string | Company name (e.g., "Airbyte", "Stripe", "&AI").
slug | string | URL-safe handle (e.g., "airbyte", "stripe", "and-ai").
ycProfileUrl | string | Canonical deep link: https://www.ycombinator.com/companies/<slug>.
website | string | The company's own homepage URL.
oneLiner | string | The pitch in a sentence (e.g., "Open-source data movement infrastructure").
longDescription | string | Multi-sentence company description from the YC profile.
logoUrl | string | Thumbnail logo URL hosted on YC's CDN.

Classification fields

Field | Type | Description
batch | string | YC cohort (e.g., "Winter 2020", "Summer 2024", "IK12").
status | string | "Active", "Acquired", "Public", or "Inactive".
stage | string | "Seed", "Early", or "Growth".
industry | string | Primary industry (e.g., "B2B", "Fintech", "Healthcare").
subindustry | string | More granular vertical (e.g., "B2B -> Sales", "Fintech -> Banking and Exchange").
industries | array<string> | Full multi-industry tag list.
tags | array<string> | Free-form descriptive tags (e.g., ["AI", "Sales", "B2B", "LegalTech"]).

Operational fields

Field | Type | Description
teamSize | integer | Reported headcount at scrape time.
location | string | Free-text location string (e.g., "San Francisco, CA, USA").
regions | array<string> | Normalized region list (e.g., ["America / Canada", "United States of America", "Remote"]).
isHiring | boolean | true if the company is actively hiring on Work at a Startup.
topCompany | boolean | true if YC has curated this company on its "Top Companies" list.
nonprofit | boolean | true if registered as a nonprofit (YC funds a few each batch).
formerNames | array<string> | Previous names if the company rebranded.
launchedAt | string | ISO-8601 launch date (e.g., "2024-07-15T00:00:00.000Z").
launchedAtUnix | integer | Same timestamp as Unix epoch seconds, convenient for time-series joins.

Provenance fields

Field | Type | Description
searchQuery | string | The query string that surfaced this record (echoed back for multi-query runs).
searchBatch | string | The batch filter that surfaced this record (for fan-out runs).
scrapedAt | string | ISO-8601 timestamp of when the actor pulled this record.
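
To make the field normalization and timestamp conversion concrete, here is a minimal sketch of how a raw Algolia hit might be mapped to the schema above. Only one_liner, small_logo_thumb_url, all_locations, team_size, and launched_at are named in this README as raw fields; the other source keys (id, name, slug, and so on) and the function name are assumptions for illustration.

```python
from datetime import datetime, timezone

# Raw Algolia key -> output key. Keys other than one_liner,
# small_logo_thumb_url, all_locations and team_size are assumed names.
FIELD_MAP = {
    "id": "companyId",
    "name": "name",
    "slug": "slug",
    "website": "website",
    "one_liner": "oneLiner",
    "small_logo_thumb_url": "logoUrl",
    "all_locations": "location",
    "team_size": "teamSize",
    "batch": "batch",
    "status": "status",
    "stage": "stage",
}


def normalize(hit: dict) -> dict:
    """Map a raw hit to camelCase fields and add ISO-8601 launchedAt."""
    record = {out_key: hit.get(raw_key) for raw_key, out_key in FIELD_MAP.items()}
    record["ycProfileUrl"] = f"https://www.ycombinator.com/companies/{hit['slug']}"
    epoch = hit.get("launched_at")  # Unix epoch seconds, may be missing
    record["launchedAtUnix"] = epoch
    record["launchedAt"] = (
        datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        if epoch is not None
        else None
    )
    return record


raw_hit = {
    "id": 31984,
    "name": "&AI",
    "slug": "and-ai",
    "one_liner": "AI for IP and patent law",
    "team_size": 13,
    "launched_at": 1721433600,
}
record = normalize(raw_hit)
print(record["launchedAt"], record["ycProfileUrl"])
```

Missing raw fields come through as None rather than raising, which keeps every record on the same flat field set.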

Example: An AI B2B startup (verified live from query="AI")

{
"companyId": 31984,
"name": "&AI",
"slug": "and-ai",
"ycProfileUrl": "https://www.ycombinator.com/companies/and-ai",
"website": "https://www.and.ai",
"oneLiner": "AI for IP and patent law",
"longDescription": "&AI builds the AI copilot for IP attorneys and patent agents — drafting, prior art searches, office action responses, and portfolio analytics in one workspace.",
"logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/and-ai.png",
"location": "New York, NY, USA",
"regions": ["America / Canada", "United States of America"],
"batch": "Summer 2024",
"status": "Active",
"stage": "Seed",
"teamSize": 13,
"industry": "B2B",
"subindustry": "B2B -> LegalTech",
"industries": ["B2B", "B2B -> LegalTech"],
"tags": ["AI", "Artificial Intelligence", "LegalTech", "B2B"],
"topCompany": false,
"isHiring": true,
"nonprofit": false,
"formerNames": null,
"launchedAt": "2024-07-20T00:00:00.000Z",
"launchedAtUnix": 1721433600,
"searchQuery": "AI",
"searchBatch": null,
"scrapedAt": "2026-05-18T09:15:00.000Z"
}

Example: A growth-stage YC alumnus (Airbyte)

{
"companyId": 23892,
"name": "Airbyte",
"slug": "airbyte",
"ycProfileUrl": "https://www.ycombinator.com/companies/airbyte",
"website": "https://airbyte.com",
"oneLiner": "Open-source data movement infrastructure",
"longDescription": "Airbyte is the leading open-source ELT platform with 300+ pre-built connectors. Used by thousands of data teams to centralize data into warehouses, lakes, and AI vector stores.",
"logoUrl": "https://bookface-images.s3.amazonaws.com/small_logos/airbyte.png",
"location": "San Francisco, CA, USA",
"regions": ["America / Canada", "United States of America", "Remote"],
"batch": "Winter 2020",
"status": "Active",
"stage": "Growth",
"teamSize": 90,
"industry": "B2B",
"subindustry": "B2B -> Engineering, Product and Design",
"industries": ["B2B", "B2B -> Engineering, Product and Design"],
"tags": ["AI", "Data Engineering", "Open Source", "Developer Tools"],
"topCompany": true,
"isHiring": true,
"nonprofit": false,
"formerNames": null,
"launchedAt": "2020-07-21T00:00:00.000Z",
"launchedAtUnix": 1595289600,
"searchQuery": "AI",
"searchBatch": null,
"scrapedAt": "2026-05-18T09:15:00.000Z"
}

Status, Stage & Industry Reference

Company statuses

Status | Meaning
Active | Still operating independently and most likely raising or growing
Acquired | Bought by another company (great for M&A pattern research)
Public | IPO'd or listed via SPAC (Airbnb, Coinbase, DoorDash, Reddit, etc.)
Inactive | Shut down, wound up, or otherwise dormant

Stages

Stage | Typical Profile
Seed | Just out of YC, < 10 people, pre-Series A; primary recruiter and SDR target
Early | Series A / B, 10-100 people; prime ICP for dev tools, payroll, HR, observability SaaS
Growth | Series C+, 100+ people; enterprise SaaS, fintech, and consulting ICP

Top YC industries

Industry | Notes
B2B | The largest industry segment: SaaS, dev tools, sales, HR, security, finance ops
Consumer | DTC, social, gaming, marketplaces, creator economy
Fintech | Banking, payments, lending, crypto, insurance, wealth management
Healthcare | Diagnostics, telehealth, biotech, mental health, healthtech infrastructure
Education | K-12, higher ed, professional learning, EdTech infrastructure
Real Estate and Construction | PropTech, construction tech, vacation rentals, real estate fintech
Government | GovTech, defense, public-sector SaaS
Industrials | Hardware, manufacturing, supply chain, climate, space

Tip: Use industries: ["B2B"] + stages: ["Early", "Growth"] + hiringOnly: true to get the canonical SaaS-sales prospecting list — post-funded, growing-headcount B2B YC companies.


Use Cases

B2B SaaS Sales Prospecting

Funded YC startups are the highest-converting cohort for dev-tools, payroll, HR, observability, security, payment, and infrastructure SaaS sales teams. They're flush with capital, growing headcount, and the founders are technically literate so the sales cycle is short.

  • Build hyper-targeted ICP lists by combining industries: ["B2B"] + stages: ["Early", "Growth"] + teamSize > 20
  • Identify post-funding spikes by filtering on the most recent 4 batches (Winter 2024, Summer 2024, Spring 2024, Fall 2024) — these are the companies with fresh capital and procurement budgets
  • Enrich your CRM by appending YC batch year, stage, team size, and industry to existing Salesforce/HubSpot accounts
  • Run trigger-based outbound — when a Seed-stage company in your ICP rolls over to Early, that's a buying-signal alert
  • Route territory ownership by region (regions: ["United States of America"] vs regions: ["Europe"])
  • Score account fit using YC batch as a proxy for company sophistication (a W24 company has different needs than an IK12 company)
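
Note that teamSize is an output field, not an input filter, so the "teamSize > 20" condition above has to be applied to the dataset after the run. A minimal sketch, with a hypothetical helper name and made-up sample records:

```python
def filter_by_team_size(companies: list[dict], min_team_size: int = 20) -> list[dict]:
    """Keep companies whose reported headcount exceeds min_team_size."""
    return [c for c in companies if (c.get("teamSize") or 0) > min_team_size]


sample = [
    {"name": "Example A", "teamSize": 90},
    {"name": "Example B", "teamSize": 13},
    {"name": "Example C", "teamSize": None},  # headcount not reported
]
print([c["name"] for c in filter_by_team_size(sample)])
```

Records with no reported headcount are treated as zero here; depending on the campaign, it may be preferable to keep them in a separate "unknown size" bucket instead.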

Recruiter & Executive Search Intel

The YC alumni network is the most concentrated source of "ex-founder", "early-engineer-at-unicorn", and "first-PM" talent on the planet. The isHiring flag is gold for recruiters.

  • Pull every YC company hiring right now (hiringOnly: true plus a stage filter) and pitch retained search to the founder
  • Build executive search target lists of late-stage YC alumni (stage: "Growth", topCompany: true) for VP / C-suite placements
  • Source ex-YC engineers for your client roster by joining this dataset with LinkedIn (the YC company website often lists "About" / "Team")
  • Visa-friendly employer mapping — combine with the H1B Visa Database to surface YC companies actively sponsoring H-1Bs
  • Time recruiter outreach to the launched-at date — a new launch means hiring volume jumps
  • Identify acqui-hire targets by filtering status: "Inactive" and recent batches — these founders need a soft landing

VC Analytics & Deal Flow

Whether you're a seed-stage VC tracking YC dealflow or a growth fund mapping competitor portfolios, this dataset is the foundation.

  • Competitor portfolio mapping — "What did Sequoia / a16z / Founders Fund back from Winter 2024?" by joining YC names with public investor databases
  • Theme-based pipeline building: query: "AI agents" returns every YC AI-agent startup; query: "vertical SaaS healthcare" returns the vertical SaaS healthcare cohort
  • Batch-over-batch trend analysis — count AI startups in W22 vs W23 vs W24 to quantify the AI explosion
  • Stage progression tracking — diff stage between monthly runs to spot companies graduating from Seed to Early (= recent fundraise = re-engage)
  • Geographic dealflow: regions: ["India"] or regions: ["Latin America"] surfaces emerging-market YC dealflow
  • Top Company anomaly detection — a topCompany: true company suddenly switching to status: "Inactive" is a data point worth investigating
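
The batch-over-batch trend analysis mentioned above reduces to counting tagged records per batch. A minimal sketch over the actor's output records, with a hypothetical helper name and made-up sample data:

```python
from collections import Counter


def count_tag_by_batch(companies: list[dict], tag: str) -> Counter:
    """Count companies carrying the given tag, grouped by batch."""
    return Counter(
        c["batch"] for c in companies if tag in (c.get("tags") or [])
    )


sample = [
    {"batch": "Winter 2023", "tags": ["AI", "B2B"]},
    {"batch": "Winter 2024", "tags": ["AI"]},
    {"batch": "Winter 2024", "tags": ["Artificial Intelligence", "AI"]},
    {"batch": "Winter 2024", "tags": ["Fintech"]},
    {"batch": "Summer 2024", "tags": None},  # no tags on this record
]
counts = count_tag_by_batch(sample, "AI")
for batch, n in sorted(counts.items()):
    print(batch, n)
```

Tag matching here is exact and case-sensitive, so "AI" and "Artificial Intelligence" are counted as different tags; a real analysis would probably group synonyms first.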

Startup Research & Journalism

YC's batch composition is one of the best leading indicators of startup-ecosystem trends. Journalists, analysts, and researchers use this dataset to write data-driven stories.

  • Quantify the AI explosion — count companies tagged "AI" per batch since W22; the curve goes vertical in W23-S24
  • Track the fintech retreat of 2022 — count Fintech-tagged companies per batch and chart it
  • Cover the climate-tech rebound of 2024: query: "climate" per batch over time
  • Build investor pitch-deck appendices with charts of YC team-size growth, batch-size evolution, geographic distribution
  • Profile cohorts — pull all of W24, sort by launchedAt, write a 5,000-word "State of W24" feature
  • Compare YC to Techstars / 500 by joining this dataset with sibling Apify scrapers

University Career Services

YC alumni companies hire aggressively from top CS programs. Career services teams build curated boards from the YC isHiring feed.

  • Show students which YC startups are hiring — filter hiringOnly: true + region matching campus
  • Cross-reference with visa data — combine with the H1B Visa Database for international student career boards
  • Build alumni placement reports — "X% of our CS '24 grads went to YC-backed startups"
  • Power on-campus recruiting pitches — invite hiring YC founders to do recruiting trips
  • Career fairs — pull all SF Bay Area YC companies hiring to plan a Bay Area trek

Conference & Event Sales

SaaStr, TechCrunch Disrupt, MicroConf, the Stage Convention — every B2B SaaS conference needs to fill seats with funded-founder buyers. YC companies are their bread and butter.

  • Build a SaaStr 2026 prospect list: stages: ["Early", "Growth"] + industries: ["B2B"]
  • TechCrunch Disrupt early-bird list: stages: ["Seed"] + most recent 2 batches
  • Sponsor outreach: topCompany: true companies are the dream sponsors with marketing budget
  • Speaker sourcing — Growth-stage YC founders make excellent panel speakers
  • Side-event invitee lists — every YC founder in regions: ["United States of America"] for the SF event circuit

Marketing & Ad Targeting

LinkedIn and Facebook custom audiences become dramatically more valuable when you can build a "YC-alumni-founder" persona.

  • LinkedIn custom audience seed: upload the company names and websites from this dataset (joined with founder LinkedIn URLs from a separate source) for ABM campaigns
  • Founder-targeted Facebook custom audiences — match YC company websites to Facebook business accounts
  • Lookalike modeling — train a lookalike on YC founders to find similar prospects outside YC
  • Account-based marketing (ABM) for B2B SaaS — every YC company becomes a 1-row ABM target
  • Industry-specific newsletters — sell ad spots to AI / Fintech / Healthcare advertisers and price by audience size in the dataset

Competitive Landscape Mapping & Strategy Decks

Product strategy teams pay consultants $50K+ for "competitive landscape" decks. This dataset lets you build them in an afternoon.

  • "Every YC AI sales startup since 2020": query: "AI sales" + batches: <list>, for sales-tech market mapping
  • "Every YC developer tools startup since 2005": query: "developer tools" for dev-tools market saturation analysis
  • Industry concentration matrix: an industry x batch pivot reveals where YC is concentrating bets
  • Product-strategy gap analysis — find an industry with few YC entrants — likely a green field
  • Investor memo appendix — "Of the 47 AI infrastructure startups YC has funded since W22, only 8 are growth-stage" is a powerful slide
  • Market sizing — total team size summed across an industry = a directional TAM proxy

M&A / Sourcing & Acquisition Targets

status: "Active" + stage: "Early" + sluggish team-size growth = a candidate acqui-hire conversation. Top corporate development teams scout YC alumni systematically.

  • Pre-Series-B acquisition targets: stage: "Early" + status: "Active" + small team size
  • Defensive acquisitions — find every YC company in your direct vertical and triage threat level
  • Acqui-hire scouting: status: "Inactive" companies whose founders are signal-rich talent
  • Founder LinkedIn enrichment — join names with LinkedIn to cold-message about strategic conversations
  • Competitor's portfolio acquisition — when a competitor goes on a YC-buying spree, the dataset surfaces the pattern

Investor Research & LP Reporting

LPs and emerging fund managers use YC dealflow as a benchmark for their own portfolios.

  • Sector exposure benchmarking — what % of YC's last 4 batches were AI vs your fund's exposure?
  • Geographic dealflow benchmarking: if, say, YC is at 12% India and your fund is at 2%, is that an opportunity or a risk?
  • Vintage tracking — pull every YC batch, count Public + Acquired outcomes — compute YC's mortality and upside ratios by vintage
  • LP letter charts — embed YC market data as the "context" appendix in quarterly LP updates
  • Co-invest sourcing — identify YC Growth stage companies for late-stage co-invest deals

Sample Queries & Recipes

Recipe 1: Every AI YC startup actively hiring (recruiter goldmine)

{
    "query": "AI",
    "hiringOnly": true,
    "statuses": ["Active"],
    "maxRecords": 1000
}

Recipe 2: Full Winter 2024 batch — every company

{
    "batches": ["Winter 2024"],
    "maxRecords": 0
}

Recipe 3: B2B SaaS ICP for sales prospecting

{
    "industries": ["B2B"],
    "stages": ["Early", "Growth"],
    "statuses": ["Active"],
    "regions": ["United States of America"],
    "hiringOnly": true,
    "maxRecords": 1000
}

Recipe 4: YC's Top Companies list (Airbnb, Stripe, Coinbase, et al.)

{
    "topCompaniesOnly": true,
    "maxRecords": 0
}

Recipe 5: Fintech YC alumni in India

{
    "industries": ["Fintech"],
    "regions": ["India"],
    "statuses": ["Active"]
}

Recipe 6: Climate-tech surge across recent batches

{
    "query": "climate",
    "batches": [
        "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
        "Winter 2023", "Summer 2023"
    ],
    "maxRecords": 0
}

Recipe 7: Full 5,916-company catalog via batch fan-out

{
    "batches": [
        "Winter 2024", "Summer 2024", "Spring 2024", "Fall 2024",
        "Winter 2023", "Summer 2023",
        "Winter 2022", "Summer 2022",
        "Winter 2021", "Summer 2021",
        "Winter 2020", "Summer 2020",
        "Winter 2019", "Summer 2019",
        "Winter 2018", "Summer 2018",
        "Winter 2017", "Summer 2017",
        "Winter 2016", "Summer 2016",
        "Winter 2015", "Summer 2015",
        "Winter 2014", "Summer 2014",
        "Winter 2013", "Summer 2013",
        "Winter 2012", "Summer 2012",
        "Winter 2011", "Summer 2011",
        "Winter 2010", "Summer 2010",
        "Winter 2009", "Summer 2009",
        "Winter 2008", "Summer 2008",
        "Winter 2007", "Summer 2007",
        "Winter 2006", "Summer 2006",
        "Summer 2005",
        "IK12"
    ],
    "maxRecords": 0,
    "hitsPerPage": 1000,
    "requestDelay": 300
}

Integration Examples

Google Sheets (via Apify Integration)

  1. Set up an Apify schedule running this actor weekly at 7:00 AM Monday
  2. Add the "Export to Google Sheets" integration to the schedule
  3. Receive a fresh YC company directory in your Sheet every Monday morning
  4. Build pivot tables: batch x industry, stage x region, isHiring counts over time

Make.com / Zapier / n8n

Use the Apify connector on Make, Zapier, or n8n. Trigger downstream workflows on:

  • New companies (this week's run minus last week's = newly-added YC companies)
  • Stage transitions (Seed → Early = recent fundraise signal; fire a Slack alert)
  • isHiring flips to true (new hiring season — push to your recruiter Slack)
  • New launches (launchedAt is within the last 7 days — push to your Twitter scheduler)
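
The first three triggers above all come down to diffing two snapshots of the dataset keyed on companyId. A minimal sketch, assuming both snapshots are lists of the actor's output records; the function name and the sample data are hypothetical.

```python
def diff_snapshots(previous: list[dict], current: list[dict]) -> dict[str, list]:
    """Compare two runs and report new companies, stage changes, hiring flips."""
    prev_by_id = {c["companyId"]: c for c in previous}
    changes: dict[str, list] = {"new": [], "stage_changes": [], "now_hiring": []}
    for company in current:
        old = prev_by_id.get(company["companyId"])
        if old is None:
            changes["new"].append(company["name"])
            continue
        if old.get("stage") != company.get("stage"):
            changes["stage_changes"].append(
                (company["name"], old.get("stage"), company.get("stage"))
            )
        if not old.get("isHiring") and company.get("isHiring"):
            changes["now_hiring"].append(company["name"])
    return changes


last_week = [
    {"companyId": 1, "name": "Example A", "stage": "Seed", "isHiring": False},
    {"companyId": 2, "name": "Example B", "stage": "Early", "isHiring": True},
]
this_week = [
    {"companyId": 1, "name": "Example A", "stage": "Early", "isHiring": True},
    {"companyId": 2, "name": "Example B", "stage": "Early", "isHiring": True},
    {"companyId": 3, "name": "Example C", "stage": "Seed", "isHiring": False},
]
changes = diff_snapshots(last_week, this_week)
print(changes)
```

Each list in the result can then be routed to its own downstream step, for example one Slack message per stage change.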

Power BI / Tableau / Looker

Connect Apify's REST API as a data source. Refresh on the Apify schedule. Build dashboards covering:

  • YC batch size evolution since 2005
  • Industry distribution per batch (the AI surge visualized)
  • Geographic dealflow heatmaps
  • Top Companies progression — who graduated to topCompany in the last quarter?

Postgres / Snowflake / BigQuery

Use the Apify webhook integration to POST run results directly to a data warehouse ingestion endpoint after every scheduled run. Suggested schema:

CREATE TABLE yc_companies (
    company_id        BIGINT PRIMARY KEY,
    name              TEXT,
    slug              TEXT,
    yc_profile_url    TEXT,
    website           TEXT,
    one_liner         TEXT,
    long_description  TEXT,
    logo_url          TEXT,
    location          TEXT,
    regions           JSONB,
    batch             TEXT,
    status            TEXT,
    stage             TEXT,
    team_size         INTEGER,
    industry          TEXT,
    subindustry       TEXT,
    industries        JSONB,
    tags              JSONB,
    top_company       BOOLEAN,
    is_hiring         BOOLEAN,
    nonprofit         BOOLEAN,
    former_names      JSONB,
    launched_at       TIMESTAMPTZ,
    launched_at_unix  BIGINT,
    scraped_at        TIMESTAMPTZ
);

CREATE INDEX idx_yc_batch ON yc_companies(batch);
CREATE INDEX idx_yc_industry ON yc_companies(industry);
CREATE INDEX idx_yc_is_hiring ON yc_companies(is_hiring) WHERE is_hiring = TRUE;
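
One way to load records into this table is an upsert keyed on company_id. The sketch below only builds the SQL statement and the parameter tuple; it does not open a database connection. The helper names are hypothetical, and the %s placeholders assume a psycopg-style driver.

```python
import json

# Table column -> key in the actor's output record
COLUMNS = {
    "company_id": "companyId", "name": "name", "slug": "slug",
    "yc_profile_url": "ycProfileUrl", "website": "website",
    "one_liner": "oneLiner", "long_description": "longDescription",
    "logo_url": "logoUrl", "location": "location", "regions": "regions",
    "batch": "batch", "status": "status", "stage": "stage",
    "team_size": "teamSize", "industry": "industry",
    "subindustry": "subindustry", "industries": "industries", "tags": "tags",
    "top_company": "topCompany", "is_hiring": "isHiring",
    "nonprofit": "nonprofit", "former_names": "formerNames",
    "launched_at": "launchedAt", "launched_at_unix": "launchedAtUnix",
    "scraped_at": "scrapedAt",
}
JSON_COLUMNS = {"regions", "industries", "tags", "former_names"}


def to_row(record: dict) -> tuple:
    """Order record values by column; serialize JSONB columns to JSON text."""
    return tuple(
        json.dumps(record.get(key)) if column in JSON_COLUMNS else record.get(key)
        for column, key in COLUMNS.items()
    )


def upsert_sql() -> str:
    """Build INSERT ... ON CONFLICT so re-runs update existing rows."""
    columns = ", ".join(COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    updates = ", ".join(
        f"{col} = EXCLUDED.{col}" for col in COLUMNS if col != "company_id"
    )
    return (
        f"INSERT INTO yc_companies ({columns}) VALUES ({placeholders}) "
        f"ON CONFLICT (company_id) DO UPDATE SET {updates}"
    )


row = to_row({"companyId": 23892, "name": "Airbyte", "tags": ["AI", "Open Source"]})
print(upsert_sql()[:60])
print(row[:2])
```

With a driver such as psycopg, the statement and rows would typically be passed to cursor.executemany(upsert_sql(), rows).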

Salesforce / HubSpot CRM Enrichment

Trigger an Apify run weekly, then upsert against Account records keyed on website or companyId. Stage transitions can auto-create Tasks; new Top Company designations can trigger Opportunity stage changes.

Webhooks → Slack / Discord

Pipe the actor's defaultDataset through an Apify webhook into your Slack channel. Recruiters get a daily "Today's newly-hiring YC companies" post. Sales gets a weekly "New YC fintech ICP additions" digest.


Major Markets & Regional Coverage

YC's portfolio is global. Below is a rough distribution of YC companies by region with significance notes.

Region | YC Presence | Significance
United States of America | ~3,800 companies | The core: San Francisco, NYC, LA, Boston, Seattle, Austin, Miami
Europe | ~600 companies | London, Berlin, Paris, Amsterdam, Stockholm, Madrid, Lisbon
India | ~500 companies | Bengaluru, Mumbai, Delhi NCR, Hyderabad; fast-growing YC region
Latin America | ~400 companies | São Paulo, Mexico City, Buenos Aires, Bogotá, Santiago
Canada | ~200 companies | Toronto, Vancouver, Montreal, Waterloo
Asia (ex-India) | ~250 companies | Singapore, Tokyo, Seoul, Jakarta, Manila
Africa | ~120 companies | Lagos, Nairobi, Cape Town, Cairo
Australia / New Zealand | ~100 companies | Sydney, Melbourne, Auckland
Middle East | ~60 companies | Dubai, Tel Aviv, Riyadh
Remote-first | grows every batch | Distributed teams, no HQ

Tip: Combine regions filter with the H1B Visa Database to surface US-based YC employers who actively sponsor H-1Bs — pure gold for international recruiter outreach.


Cost & Performance

| Metric | Value |
| --- | --- |
| Engine | Direct Algolia REST API (got-scraping HTTP), no browser |
| Runtime (50 records, simple query) | ~5 seconds |
| Runtime (1,000 records, single filter combination) | ~10 seconds |
| Runtime (full ~5,916-company catalog via batch fan-out) | ~5 minutes |
| Cost per default run | ~0.001 Compute Units (typically less than $0.01) |
| Cost per full-catalog run | ~0.01 CU (typically less than $0.05) |
| Pricing model | Pay-per-event (transparent per-record pricing) |
| Data freshness | Live at runtime; YC's Algolia index is continuously refreshed |
| Auth required | None; uses YC's public ycdc_public Algolia key |
| Proxy required | None; Algolia public endpoint has no anti-bot |
| Concurrency | Safe to run multiple parallel filtered configurations |
| Memory footprint | 256 MB minimum, 1024 MB max; no scraping browser, low RAM |

  • Public data only — every field returned by this actor is published by Y Combinator at ycombinator.com/companies under their public ycdc_public Algolia tag, which is the same tag YC uses to expose data to their own client-side search UI
  • No PII / no personal data — the dataset describes companies, not individuals. Founder names are not in this dataset. (For founder-level enrichment, consume YC's company page separately.)
  • No emails, no phone numbers — the actor does not return any contact information
  • Respectful of YC's infrastructure — the actor uses low concurrency (1 in-flight Algolia call), configurable requestDelay (default 300ms), and 3-attempt exponential backoff. It is explicitly built to be a polite citizen.
  • YC's robots.txt does not block /companies and the underlying Algolia endpoint is unauthenticated and intentionally public
  • Algolia ToS — the ycdc_public secured key is issued by YC for client-side use; it self-restricts to tagFilters=["ycdc_public"] and restrictIndices=YCCompany_production,YCCompany_By_Launch_Date_production
  • GDPR / CCPA — this dataset contains no EU or California resident personal data; company-level facts are not personal data under either regulation
  • No commercial guarantees — fields, schemas, and Algolia keys are controlled by YC and may change without notice; the actor's normalization layer is built to handle most schema drift gracefully

Important: Use of this dataset for unsolicited bulk communications must comply with CAN-SPAM, TCPA, GDPR, CCPA, and the YC website ToS. The actor publisher is not responsible for downstream misuse.
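
For readers building their own client, the "polite citizen" behaviour described in the list above (one call in flight, a delay between requests, three attempts with exponential backoff) can be approximated as follows. This is a generic sketch, not the actor's source code: the function name is hypothetical, and using the 300 ms default requestDelay as the backoff base is an assumption made for illustration.

```python
import time


def call_with_retry(fn, attempts=3, base_delay=0.3, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**n seconds, then retry.

    Raises the last exception once `attempts` tries have failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo with a hypothetical flaky call that succeeds on the third try.
calls = {"n": 0}
waits = []


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Record the waits instead of actually sleeping so the demo runs instantly.
result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the helper easy to test; in real use the default `time.sleep` applies.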


Frequently Asked Questions

How fresh is the data?

YC updates its Algolia index continuously — new companies appear within hours of being announced. The actor hits Algolia live on every run, so the data is as fresh as YC publishes it.

How many companies will I get?

As of the current snapshot, 5,916 companies are in YC's directory. A default run with no filters returns 500 records (the per-run cap). To pull the full catalog, list every batch in the batches input (roughly 50 batches, from Summer 2005 onward plus IK12). The actor then issues one query per batch, so no single query runs into Algolia's 1,000-results-per-query cap.

Why does Algolia cap a single query at 1,000 results?

Algolia limits how deep a single search query can paginate: the index setting paginationLimitedTo defaults to 1,000 hits, which guards against bulk extraction through a public search key. The actor works around this by fanning out across batches. Every batch holds well under 1,000 companies, so each batch query returns its batch in full.
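
The fan-out idea can be sketched as one paginated query per batch. The app ID and index name below come from this README, and the URL follows Algolia's standard search endpoint; the `batch` facet name is an assumption, and the API key (sent as `X-Algolia-API-Key` alongside `X-Algolia-Application-Id`) is deliberately omitted. The `search` argument is injected so the loop can be demonstrated with a stub rather than live requests.

```python
import json

APP_ID = "45BWZJ1SGC"            # from this README
INDEX = "YCCompany_production"   # from this README


def build_query(batch, page=0, hits_per_page=1000):
    """Return (url, json_body) for one Algolia search request."""
    url = f"https://{APP_ID}-dsn.algolia.net/1/indexes/{INDEX}/query"
    body = {
        "query": "",
        "page": page,
        "hitsPerPage": hits_per_page,
        "facetFilters": [[f"batch:{batch}"]],  # facet name is an assumption
    }
    return url, json.dumps(body)


def fan_out(batches, search):
    """Run one paginated query per batch and collect every hit.

    `search(batch, page)` must return an Algolia-style dict with
    "hits" and "nbPages" keys.
    """
    hits = []
    for batch in batches:
        page = 0
        while True:
            resp = search(batch, page)
            hits.extend(resp["hits"])
            page += 1
            if page >= resp["nbPages"]:
                break
    return hits


def fake_search(batch, page, per_page=2):
    """Stub standing in for the HTTP call; batch sizes are made up."""
    sizes = {"Winter 2024": 3, "Summer 2023": 2}
    ids = [f"{batch}-{i}" for i in range(sizes[batch])]
    chunk = ids[page * per_page:(page + 1) * per_page]
    return {
        "hits": [{"objectID": i} for i in chunk],
        "nbPages": -(-len(ids) // per_page),  # ceiling division
    }


all_hits = fan_out(["Winter 2024", "Summer 2023"], fake_search)
print(len(all_hits))
```

Because each batch is far smaller than the cap, a real client would normally need only one page per batch at `hitsPerPage=1000`.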

Does this scraper require login or API keys to YC?

No. The actor uses YC's own public Algolia key — the same one your browser uses when you visit ycombinator.com/companies. You only need an Apify account to run the actor.

Does this scrape ycombinator.com HTML?

No. The actor talks directly to the Algolia search REST API that powers the YC site. This is faster, more reliable, and respectful of YC's web servers (zero impact on ycombinator.com).

Does the actor return founder names or emails?

No. Founder information is not in YC's Algolia index, which holds only company-level metadata. For other kinds of enrichment, combine the output with sibling actors such as SEEK (jobs) or Levels.fyi (compensation data).

Are inactive / shut-down YC companies included?

Yes. Set statuses: ["Inactive"] to filter to wound-up companies, or leave statuses empty to get every status (Active, Acquired, Public, Inactive).

Can I filter by year of YC participation?

Yes. Use the batches filter with cohort names like "Winter 2024", "Summer 2023", etc. To get all of 2024, list ["Winter 2024", "Summer 2024", "Fall 2024"].

What's the difference between industry, subindustry, and industries?

  • industry is the top-level YC category (e.g., "B2B")
  • subindustry is the more granular vertical with hierarchy syntax (e.g., "B2B -> Sales")
  • industries is the array of all industry tags the company carries — often the most useful for filtering
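
To make the three fields concrete, here is a small sketch that splits the hierarchy syntax in subindustry and filters on the industries array. The helper names and the sample record are hypothetical; the field semantics are those described in the list above.

```python
def parse_subindustry(value):
    """Split 'B2B -> Sales' into its hierarchy levels."""
    return [part.strip() for part in value.split("->") if part.strip()]


def has_industry(record, wanted):
    """True if `wanted` appears anywhere in the record's industries array."""
    return wanted in (record.get("industries") or [])


# Hypothetical record shaped like the actor output.
record = {
    "industry": "B2B",
    "subindustry": "B2B -> Sales",
    "industries": ["B2B", "Sales"],
}

print(parse_subindustry(record["subindustry"]))
print(has_industry(record, "Sales"))
```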

Can I get the YC Top Companies list?

Yes — set topCompaniesOnly: true. This returns YC's curated list of unicorns and best exits (Airbnb, Stripe, Coinbase, DoorDash, Reddit, Twitch, Instacart, Brex, Rappi, GitLab, Faire, et al.).

Does the actor deduplicate?

Yes. Within a single run the actor deduplicates by Algolia companyId, so when several fan-out queries surface the same company (rare, but possible if a company is listed under more than one batch), only one record is saved.
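
If you merge several runs yourself, the same rule is easy to reproduce. A minimal sketch, assuming every record carries a `companyId` and that the first occurrence should win:

```python
def dedupe_by_company_id(records):
    """Keep the first record seen for each companyId, preserving order."""
    seen = set()
    unique = []
    for rec in records:
        cid = rec["companyId"]
        if cid in seen:
            continue
        seen.add(cid)
        unique.append(rec)
    return unique


# Hypothetical merged output where company 1 appears under two batches.
merged = [
    {"companyId": 1, "batch": "Winter 2024"},
    {"companyId": 2, "batch": "Winter 2024"},
    {"companyId": 1, "batch": "Summer 2023"},
]

print(dedupe_by_company_id(merged))
```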

Is proxy or residential IP required?

No. The Algolia public endpoint has no anti-bot or rate-limiting on the ycdc_public tag. You can attach Apify Proxy via proxyConfiguration if your network policy requires it, but it's pure overhead.

How do I pull the entire 5,916-company catalog?

Use Recipe 7 above: list every batch from the latest cohort back through "Summer 2005", plus "IK12". The fan-out completes in ~5 minutes.
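
Typing out every batch name by hand is error-prone, so the list can be generated. The sketch below builds the Winter and Summer cohorts back to "Summer 2005" and appends "IK12"; the `batches` input name comes from this README, while the helper names are hypothetical. Spring and Fall cohorts exist only for recent years, so they are passed in explicitly rather than guessed.

```python
import json


def batch_names(first_year, last_year, seasons=("Winter", "Summer")):
    """Return batch names from last_year back to first_year."""
    return [
        f"{season} {year}"
        for year in range(last_year, first_year - 1, -1)
        for season in seasons
    ]


def full_catalog_input(last_year, extra=("IK12",)):
    """Build an actor input listing every Winter/Summer batch plus extras."""
    # The README's list ends at "Summer 2005", so "Winter 2005" is dropped.
    names = [n for n in batch_names(2005, last_year) if n != "Winter 2005"]
    return {"batches": names + list(extra)}


run_input = full_catalog_input(2024, extra=("Fall 2024", "IK12"))
print(json.dumps(run_input)[:80])
print(len(run_input["batches"]))
```

Batch names must match YC's spelling exactly, so it is worth checking the generated list against the filter values shown on ycombinator.com/companies before a full run.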

Can I run this on a schedule automatically?

Yes — Apify's built-in Scheduler lets you trigger this actor on any cron expression. Weekly or daily runs work well for change-detection workflows. Combine with webhooks for fully automated pipelines.

Does this actor work with the Apify Free Plan?

Yes — full functionality on the free tier. A default 500-record run costs a fraction of a Compute Unit. The full 5,916-company fan-out still fits within the free monthly CU budget.

What formats can I export the data in?

JSON, CSV, Excel (XLSX), HTML, XML, JSONLines, and RSS — directly from the Apify dataset view. The API also supports streaming for large datasets.
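
Outside the dataset view, the same formats are available from the Apify API's dataset items endpoint through its `format` query parameter. The helper below only builds the download URL; the function name is hypothetical and the dataset ID and token are placeholders.

```python
from urllib.parse import urlencode

# Format values accepted by the dataset items endpoint.
FORMATS = {"json", "jsonl", "csv", "xlsx", "html", "xml", "rss"}


def dataset_export_url(dataset_id, fmt="csv", token=None):
    """Build the download URL for a dataset in the requested format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # needed only for non-public datasets
    return (
        f"https://api.apify.com/v2/datasets/{dataset_id}/items?"
        f"{urlencode(params)}"
    )


print(dataset_export_url("YOUR_DATASET_ID", "xlsx"))
```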

What happens if Algolia returns zero results?

The actor explicitly calls Actor.fail() with a helpful error message ("No records. Try clearing all filters, or check that batch/industry/status spellings match YC's exactly (case-sensitive).") so you never get a silent SUCCEEDED with an empty dataset.

Is the data accurate?

The data is exactly what YC publishes on their own directory — same source, same recency. If a company's status or stage looks wrong, that's YC's directory; the actor does not modify or filter beyond what you request.

How do I report a bug or request a feature?

Open an issue on the Apify Store actor page or contact the developer directly through the Apify Console.


Whether you're enriching YC company data with hiring intel, comp data, federal funding, or B2B verification, these sibling actors are designed to compose:


Comparison vs. Alternatives

| Approach | Setup time | Coverage | Data freshness | Cost (5,916 records) | Schema normalization | Proxy needed |
| --- | --- | --- | --- | --- | --- | --- |
| This actor | < 1 minute | 5,916+ companies | Live at runtime | < $0.05 | Built-in | No |
| Manual ycombinator.com browsing | Hours/days | Limited by attention span | Live | Free | None | No |
| Headless browser scrape (Puppeteer) | 1-2 days dev | Full | Live | $1-5 per run (CU cost) | DIY | Optional |
| Custom Algolia client | 4-8 hours dev | Full (if you handle 1000-cap) | Live | Free + infra | DIY | No |
| Paid startup database (PitchBook, CB Insights) | Days to onboard | Vast (not just YC) | Daily | $1,000-50,000/year | Built-in | N/A |
| LinkedIn Sales Navigator | Hours | YC alumni only inferred | Live | $99-149/seat/mo | None | N/A |

Why Pay-Per-Event Pricing?

Most data scrapers either charge a flat monthly subscription (you pay even if you don't use it) or per-Compute-Unit (unpredictable). This actor uses pay-per-event pricing, which means:

  • You only pay when the actor runs
  • Charges scale with how much data you actually consume
  • Transparent, line-item billing inside Apify
  • No monthly minimums or annual commitments
  • Free to evaluate — sample 50 records for pennies before committing to a full catalog pull
  • Predictable cost-per-record — easy to forecast scrape budget for procurement
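
For budgeting, the per-record arithmetic is simple enough to script. The sketch below assumes the store listing's "from $1.00 / 1,000 results" price applies uniformly to every saved record; actual event prices can differ by plan, so treat the output as an estimate only.

```python
def estimate_cost(records, price_per_thousand=1.00):
    """Estimated pay-per-event charge in USD for a given record count."""
    return round(records * price_per_thousand / 1000, 2)


for label, count in [("sample", 50), ("default run", 500), ("full catalog", 5916)]:
    print(f"{label}: {count} records -> ${estimate_cost(count):.2f}")
```

This estimate covers per-record charges only; the Compute Unit figures in the Cost & Performance table describe platform usage, which is a separate number.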

Changelog

| Version | Date | Notes |
| --- | --- | --- |
| 1.0.0 | 2026-05 | Initial public release: direct Algolia API integration, batch fan-out for >1,000 results, 3-attempt retry with exponential backoff, Actor.fail() on zero records, full ISO-8601 timestamp normalization, pay-per-event pricing |



Support

  • Bug reports: Use the Issues tab on the Apify Store page
  • Feature requests: Same place — please describe the use case and the input combination you'd like to see supported
  • Direct contact: Through the Apify developer profile haketa

If this actor saves you time or unlocks a new workflow, a 5-star rating on the Apify Store helps other sales, recruiting, VC, and research teams discover it. Thank you!