🚀 YC Companies Directory: Y Combinator Alumni DB

Pricing

from $50.00 / 1,000 yc companies

Scrape the Y Combinator company directory, covering every batch from S05 to today. Returns id, name, slug, website, oneLiner, longDescription, batch + batchYear, status, team size, industries, regions, locations, hiring flag, top-company badge, and logo. Suited to VC sourcing, BD, recruiting, and journalism.

Developer

Stephan Corbeil (maintained by Community)

🚀 YC Companies Directory Scraper: Y Combinator alumni database, every batch, every status, every company

The YC Companies Directory Scraper pulls Y Combinator's complete alumni directory: every company that has been through YC, across every batch from S05 to the most recent. You get batch, status (Active / Acquired / Public / Inactive), location, team size, founders, vertical/industry tags, one-liner, demo-day link, website, and social links. It is ideal for VCs sourcing follow-on opportunities, sales reps building YC-only prospect lists, and researchers studying startup outcomes.

Why YC Companies Directory Scraper Beats the Official YC Directory, Crunchbase, and PitchBook

Source | Price | What you get
Y Combinator (official directory) | Free | Manual browse, no bulk export
Crunchbase Pro | $588/mo (Starter) | Broad coverage, but YC-specific filtering requires Pro+; throttled exports
Crunchbase (free) | Free | Heavily limited; no bulk export
PitchBook | $25K+/yr | Enterprise pricing, broad coverage
NexGenData YC Directory | PPE per company | YC-native fields: batch, status, location, team, vertical, demo-day link; bulk JSON

What You Get

  • Company name + canonical ycombinator.com/companies/{slug} URL
  • Batch (W23, S22, IK12, X25, etc.)
  • Status (Active / Acquired / Public / Inactive)
  • Company one-liner + long description
  • Location (city, state, country, region)
  • Team size (when published)
  • Vertical tags (Fintech, AI, Healthcare, Climate, etc.)
  • Founders: name + LinkedIn + Twitter (when published)
  • Company website URL
  • Demo-day video URL (when public)
  • Twitter / LinkedIn / Crunchbase URLs (when public)
  • Year founded (when published)
  • Has YC SUS / equity-only round flag (when present)
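
The fields above map naturally onto a small record type. The following is a minimal sketch, not the actor's confirmed schema: the key names (`name`, `slug`, `batch`, `status`, `teamSize`) and the `https://www.` prefix are assumptions made for illustration; only the `ycombinator.com/companies/{slug}` pattern comes from the list above. Check a real dataset row before relying on any of these names.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class YCCompany:
    # Key names are assumed for illustration; verify against a real dataset row.
    name: str
    slug: str
    batch: Optional[str] = None
    status: Optional[str] = None
    team_size: Optional[int] = None

    @property
    def canonical_url(self) -> str:
        # Pattern from the field list: ycombinator.com/companies/{slug}
        return f"https://www.ycombinator.com/companies/{self.slug}"


def company_from_row(row: dict[str, Any]) -> YCCompany:
    """Build a YCCompany from one dataset row, tolerating missing optional keys."""
    return YCCompany(
        name=row["name"],
        slug=row["slug"],
        batch=row.get("batch"),
        status=row.get("status"),
        team_size=row.get("teamSize"),
    )


example = company_from_row({"name": "Example Co", "slug": "example-co", "batch": "W23"})
print(example.canonical_url)
```

Fields marked "when published" in the list are modeled as optional, so a row that lacks them still parses.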

Use Cases

  • VC follow-on sourcing: every 3 months, pull all YC companies from the last 2 batches in your verticals, dedupe against your CRM, and route to investors.
  • Sales prospecting: build an ICP-matched YC-only list (e.g. 'Fintech YC companies with 5-25 employees and a website') for cold outreach.
  • Startup-outcome research: academics & journalists studying YC's hit-rate by batch, vertical, and founder demographics.
  • Acqui-hire pipeline: track companies whose status flips to 'Inactive' for talent acquisition opportunities.
  • Founder benchmarking: first-time founders studying which verticals their YC batch is concentrated in.
  • Service-business marketing: accountants, law firms, HR-tech, and payroll-tech providers can target their YC-batch ICP precisely.
  • Press research: journalists building YC anniversary or 'where are they now' coverage.
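
As an illustration of the sales-prospecting case, the sketch below filters rows down to 'Fintech YC companies with 5-25 employees and a website'. It assumes each row carries `industries` (a list of strings), `teamSize` (an integer or null), and `website`; these key names are hypothetical, and the sample rows are invented data.

```python
from typing import Any, Iterable


def matches_icp(row: dict[str, Any], industry: str, min_team: int, max_team: int) -> bool:
    """Return True when a row fits the example ICP from the sales-prospecting bullet."""
    # Assumed keys: "industries" (list of str), "teamSize" (int or None), "website".
    industries = row.get("industries") or []
    team_size = row.get("teamSize")
    has_website = bool(row.get("website"))
    return (
        industry in industries
        and team_size is not None
        and min_team <= team_size <= max_team
        and has_website
    )


def build_prospect_list(rows: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    # "Fintech YC companies with 5-25 employees and a website"
    return [r for r in rows if matches_icp(r, "Fintech", 5, 25)]


# Invented sample rows, for demonstration only.
sample_rows = [
    {"name": "A", "industries": ["Fintech"], "teamSize": 12, "website": "https://a.example"},
    {"name": "B", "industries": ["Fintech"], "teamSize": 80, "website": "https://b.example"},
    {"name": "C", "industries": ["AI"], "teamSize": 10, "website": "https://c.example"},
    {"name": "D", "industries": ["Fintech"], "teamSize": 6, "website": None},
]
prospects = build_prospect_list(sample_rows)
print([p["name"] for p in prospects])
```

Rows with an unpublished team size are excluded rather than guessed at, which keeps the list strict at the cost of some recall.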

Quick Start

from apify_client import ApifyClient

# Authenticate with your Apify API token
client = ApifyClient("YOUR_APIFY_TOKEN")

# Filter by batch, status, and industry
run_input = {
    "batches": ["W26", "S25"],
    "statuses": ["Active"],
    "industries": ["Fintech", "AI"],
}

# Start the actor and wait for the run to finish
run = client.actor("nexgendata/yc-companies-directory-scraper").call(run_input=run_input)

# Iterate results
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# Or fetch all in one go
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Got {len(items)} rows")

You can also run from the Apify CLI:

apify call nexgendata/yc-companies-directory-scraper --input='{
  "batches": ["W26", "S25"],
  "statuses": ["Active"],
  "industries": ["Fintech", "AI"]
}'

Or from the web console: open the actor page on Apify, click Try for free, paste the input JSON, and hit Run. Results stream into the dataset, which you can export as JSON, JSONL, CSV, Excel, or HTML.

Scheduling

This actor pairs cleanly with Apify Scheduler (built into the platform): schedule it hourly, daily, or cron-style, and dedupe results into your warehouse on a stable key field such as the company id or slug. Webhook outputs are supported, so you can fire a Slack / Zapier / Make / n8n / your-own-API call the moment new rows materialize.
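
The dedupe step for scheduled runs can be sketched as follows. This is a minimal in-memory illustration, assuming `slug` is a stable key present on every row (an assumption; any stable identifier would serve) and standing in a Python set for whatever key index your warehouse provides.

```python
from typing import Any, Iterable


def split_new_rows(
    rows: Iterable[dict[str, Any]], seen_keys: set[str], key_field: str = "slug"
) -> list[dict[str, Any]]:
    """Return rows whose key has not been seen yet, recording their keys in seen_keys."""
    fresh = []
    for row in rows:
        key = row.get(key_field)
        if key is None or key in seen_keys:
            continue  # skip rows without a key and rows already ingested
        seen_keys.add(key)
        fresh.append(row)
    return fresh


seen: set[str] = {"alpha"}  # keys already in the warehouse (sample data)
monday = [{"slug": "alpha"}, {"slug": "beta"}]
tuesday = [{"slug": "beta"}, {"slug": "gamma"}]

print(split_new_rows(monday, seen))   # only the row not seen before Monday
print(split_new_rows(tuesday, seen))  # only the row not seen before Tuesday
```

Only the rows returned by `split_new_rows` would then be appended to the warehouse or forwarded to a webhook.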

Integration patterns

  • CRM enrichment: pipe rows directly into HubSpot / Salesforce / Pipedrive via Zapier or Make
  • Warehouse: append to BigQuery / Snowflake / Postgres on a daily schedule via Apify → S3 → warehouse ingest
  • LLM-ready RAG: each row is already flat JSON; embed the plain-text description fields (such as oneLiner and longDescription) and store the vectors in pgvector / Pinecone / Weaviate
  • Slack alerts: filter by your trigger keyword and fire a Slack webhook for matches in real time
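
The Slack-alert pattern above could look roughly like this. It is a sketch under assumptions: the row keys `name`, `oneLiner`, and `longDescription` follow the field names in the listing description but are not a confirmed schema, and the helper names are hypothetical. The payload shape (`{"text": ...}`) is the standard body for a Slack incoming webhook.

```python
import json
import urllib.request
from typing import Any, Optional


def slack_payload(row: dict[str, Any], keyword: str) -> Optional[dict[str, str]]:
    """Return a Slack message payload if the row mentions keyword, else None."""
    # Assumed keys: "name", "oneLiner", "longDescription".
    text = " ".join(str(row.get(k) or "") for k in ("oneLiner", "longDescription"))
    if keyword.lower() not in text.lower():
        return None
    return {"text": f"New YC match for '{keyword}': {row.get('name', 'unknown')}"}


def post_to_slack(webhook_url: str, payload: dict[str, str]) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


row = {"name": "Example Co", "oneLiner": "Payments API for marketplaces"}
payload = slack_payload(row, "payments")
print(payload)
# To send it, call post_to_slack(YOUR_SLACK_WEBHOOK_URL, payload) with your own webhook URL.
```

Separating payload construction from the network call keeps the matching logic testable without contacting Slack.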

Pricing

This actor runs on Apify's pay-per-event (PPE) model, so you pay for results, not run time:

  • $0.05 per YC company row: the primary event (one charge per row pushed to the dataset)
  • $0.00005 per actor-start event, per GB of memory: the actor start cost (charged once per run, sub-cent at typical memory)

No subscriptions, no minimums, no per-CPU-second charges. Apify's $5/month free tier covers most experiments. Browse 200+ buyer-intent actors at https://apify.com/nexgendata?fpr=2ayu9b

Cost worked example

A daily scheduled run pulling 500 fresh rows costs roughly:

  • 500 rows × primary-event price (~$0.04-0.05) = $20-25
  • 1 actor start × ~$0.00005 = negligible

So ~$20-25 per 500-row daily run, or ~$0.04-0.05 per row all-in. There are no surprise compute, storage, or proxy add-ons; proxy rotation is bundled into the per-row price.
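
The arithmetic above can be reproduced with a small helper. This sketch uses the prices quoted in this section and assumes a single actor-start charge at 1 GB of memory; the function name is illustrative. `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal


def estimate_run_cost(
    rows: int,
    price_per_row: Decimal = Decimal("0.05"),
    actor_start_fee: Decimal = Decimal("0.00005"),
) -> Decimal:
    """Estimated cost of one run in USD: per-row events plus one actor start."""
    return rows * price_per_row + actor_start_fee


daily = estimate_run_cost(500)
print(daily)                                      # 25.00005
print(estimate_run_cost(500, Decimal("0.04")))    # 20.00005
```

The two results bracket the $20-25 range quoted for a 500-row run, with the actor-start fee contributing a negligible fraction of a cent.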

Why pay-per-event beats time-based pricing

  • Predictable: you know your cost from row count before the run starts
  • Failure-safe: if a target site changes its HTML and the actor returns 0 rows, you pay 0 (vs paying for the CPU-seconds anyway under time-based pricing)
  • Easy to attribute: 1 row = 1 unit cost, so per-customer / per-pipeline cost accounting is trivial

Sister Actors in the NexGenData Fleet

Use case | Actor
Crunchbase news & funding announcements | crunchbase-news-scraper
AngelList startup discovery | angellist-startup-search
Techstars alumni & cohorts | techstars-companies-directory
500 Global accelerator alumni | 500-global-companies-directory
SEC Form D private placement filings | sec-form-d-scraper
Indie Hackers solo & bootstrapped tracker | indie-hackers-products-tracker
Daily Product Hunt launch stream | product-hunt-launches-scraper
Show HN indie launch stream | hn-show-hn-tracker

(All sister actors share the same PPE billing and Apify-standard JSON output, so you can compose multi-step pipelines without rewriting input/output adapters.)

FAQ

Q: How fresh is the data?

A: The data is as fresh as YC's public directory. The actor pulls live data on each run, so results reflect whatever YC has published at that moment; we recommend a weekly schedule for monitoring batch additions.

Q: How many companies are there?

A: YC has funded 5,000+ companies. The actor returns all of them by default; filter by batch, status, or vertical to narrow.

Q: Can I get founder LinkedIn URLs?

A: When YC publishes them on the company page, yes. Not every company exposes them.

Q: Is scraping YC's directory legal?

A: The actor reads YC's public, unauthenticated directory pages, the same pages that third-party aggregators such as Crunchbase's free tier read. Use is at your discretion; most read-only competitive-intelligence use is widely accepted.

Q: Output format?

A: JSON, JSONL, CSV, Excel via Apify dataset export. Schema is stable.

Q: Can I monitor only new batches?

A: Yes. Filter by batch (e.g. 'W26', 'S26') and schedule the actor to run after each Demo Day to dedupe new entries into your CRM.

Schema Stability & Versioning

This actor follows NexGenData's additive-only schema contract:

  • New fields may be added at any time; they will simply appear as new keys in the JSON output, defaulting to null for older runs.
  • Existing fields are never renamed or removed without a major-version bump and an advance changelog notice.
  • Field semantics (units, timezones, value-sets) are never silently changed. If we need to change semantics, we add a new field with a new name and deprecate (but keep) the old one for at least 90 days.

This means you can build production pipelines on this actor without worrying that a Tuesday schema change will break Friday's ETL job. If you spot an unexpected change, reach out via the actor's Apify Issues tab and we'll look at it the same day.
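
On the consumer side, an additive-only contract is easiest to benefit from when the pipeline reads only the fields it knows and ignores everything else. The sketch below shows one way to do that; the field names in `KNOWN_FIELDS` and the sample rows are assumptions for illustration, not the actor's confirmed schema.

```python
from typing import Any

# Fields this pipeline depends on; names are assumed for illustration.
KNOWN_FIELDS = ("name", "slug", "batch", "status")


def project_row(row: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields the pipeline knows, filling absent ones with None."""
    return {field: row.get(field) for field in KNOWN_FIELDS}


# An older row lacking a later-added field, and a newer row with an extra key.
old_row = {"name": "Example Co", "slug": "example-co", "batch": "W23"}
new_row = {**old_row, "status": "Active", "someFutureField": 123}

print(project_row(old_row))
print(project_row(new_row))
```

Both rows project to the same set of keys, so newly added fields never change the shape your downstream tables see.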

Compliance

  • The actor reads public, unauthenticated pages the same way a logged-out browser does.
  • All requests route through Apify's compliant residential-proxy infrastructure with polite rate limiting.
  • You are responsible for ensuring your downstream use complies with the target site's Terms of Service, your jurisdiction's data-protection laws (GDPR, CCPA, UK DPA, etc.), and any sector-specific rules (HIPAA, PCI, etc.).
  • We do not collect, store, or transmit credentials for the target site.
  • Most read-only competitive-intelligence and lead-generation use is widely accepted. Consult counsel before bulk redistribution.

Support

Open an issue on the actor's Apify Issues tab; the NexGenData team responds within one business day. For feature requests (new fields, new input filters), include the use case so we can prioritize it.

About NexGenData

NexGenData publishes 200+ buyer-intent actors covering SEC filings, YC alumni, Delaware DOC, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, ATS job boards, real-estate marketplaces, and more. All actors are pay-per-result and share a stable, additive-only JSON schema. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b

