YC Companies Directory - Y Combinator Alumni DB
Pricing
from $50.00 / 1,000 yc companies
Scrape the Y Combinator company directory: every batch from S05 to today. Returns id, name, slug, website, oneLiner, longDescription, batch + batchYear, status, team size, industries, regions, locations, hiring flag, top-company badge, and logo. Useful for VC sourcing, BD, recruiting, and journalism.
Developer
Stephan Corbeil
Maintained by Community
YC Companies Directory Scraper - Y Combinator alumni database: every batch, every status, every company
The YC Companies Directory Scraper pulls Y Combinator's complete alumni directory: every company that has ever been through YC, across every batch from S05 to the most recent. You get batch, status (active / acquired / public / inactive), location, team size, founders, vertical/industry tags, one-liner, demo-day link, website, and social links. Ideal for VCs sourcing follow-on opportunities, sales reps building YC-only prospect lists, and researchers studying startup outcomes.
Why YC Companies Directory Scraper Beats Y Combinator, Crunchbase Pro, and Crunchbase
| Source | Price | What you get |
|---|---|---|
| Y Combinator (official directory) | Free | Manual browse, no bulk export |
| Crunchbase Pro | $588/mo (Starter) | Broad coverage, but YC-specific filtering requires Pro+; throttled exports |
| Crunchbase (free) | Free | Heavily limited; no bulk export |
| PitchBook | $25K+/yr | Enterprise pricing, broad coverage |
| NexGenData YC Directory | Pay-per-event (PPE), per company | YC-native fields: batch, status, location, team, vertical, demo-day link, delivered as bulk JSON |
What You Get
- Company name + canonical ycombinator.com/companies/{slug} URL
- Batch (W23, S22, IK12, X25, etc.)
- Status (Active / Acquired / Public / Inactive)
- Company one-liner + long description
- Location (city, state, country, region)
- Team size (when published)
- Vertical tags (Fintech, AI, Healthcare, Climate, etc.)
- Founders: name + LinkedIn + Twitter (when published)
- Company website URL
- Demo-day video URL (when public)
- Twitter / LinkedIn / Crunchbase URLs (when public)
- Year founded (when published)
- Has YC SUS / equity-only round flag (when present)
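As a rough sketch of how a row might be handled in code, the snippet below models a few of the fields as a `TypedDict` and rebuilds the canonical `ycombinator.com/companies/{slug}` URL from the slug. The key names (`slug`, `batch`, `status`, `industries`, `teamSize`) and the `Example Co` row are assumptions for illustration only; check a real dataset row for the actual keys.

```python
from typing import List, Optional, TypedDict


class YCCompanyRow(TypedDict, total=False):
    # Key names are assumed for illustration; verify against a real row.
    name: str
    slug: str
    batch: str
    status: str
    industries: List[str]
    teamSize: Optional[int]
    website: Optional[str]


def canonical_url(row: YCCompanyRow) -> Optional[str]:
    """Build the ycombinator.com/companies/{slug} URL, or None without a slug."""
    slug = row.get("slug")
    return f"https://www.ycombinator.com/companies/{slug}" if slug else None


# Made-up row, not real data.
example: YCCompanyRow = {"name": "Example Co", "slug": "example-co", "batch": "W23"}
print(canonical_url(example))
```

Declaring the type with `total=False` reflects that several fields are only present "when published".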
Use Cases
- VC follow-on sourcing: every 3 months, pull all YC companies from the last 2 batches in your verticals, dedupe against your CRM, and route to investors.
- Sales prospecting: build an ICP-matched YC-only list (e.g. 'Fintech YC companies with 5-25 employees and a website') for cold outreach.
- Startup-outcome research: academics and journalists studying YC's hit-rate by batch, vertical, and founder demographics.
- Acqui-hire pipeline: track companies whose status flips to 'Inactive' for talent acquisition opportunities.
- Founder benchmarking: first-time founders studying which verticals their YC batch is concentrated in.
- Service-business marketing: accountants, law firms, HR-tech, and payroll-tech providers can target their YC-batch ICP precisely.
- Press research: journalists building YC anniversary or 'where are they now' coverage.
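The sales-prospecting filter above ('Fintech YC companies with 5-25 employees and a website') could be sketched as a post-processing step on the dataset rows. The key names `industries`, `teamSize`, and `website` are assumptions, and the sample rows are made up.

```python
from typing import Any, Dict, Iterable, List


def matches_icp(row: Dict[str, Any], industry: str = "Fintech",
                min_team: int = 5, max_team: int = 25) -> bool:
    """True for rows in the industry, inside the team-size range, with a website."""
    team = row.get("teamSize")  # assumed key; absent when YC does not publish it
    return (
        industry in (row.get("industries") or [])
        and isinstance(team, int)
        and min_team <= team <= max_team
        and bool(row.get("website"))
    )


def build_prospect_list(rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the rows that match the ideal customer profile."""
    return [row for row in rows if matches_icp(row)]


# Made-up rows for illustration.
sample_rows = [
    {"name": "A", "industries": ["Fintech"], "teamSize": 12, "website": "https://a.example"},
    {"name": "B", "industries": ["Fintech"], "teamSize": 200, "website": "https://b.example"},
    {"name": "C", "industries": ["AI"], "teamSize": 10, "website": "https://c.example"},
    {"name": "D", "industries": ["Fintech"], "teamSize": None, "website": None},
]
print([row["name"] for row in build_prospect_list(sample_rows)])
```

Rows with an unpublished team size are excluded rather than guessed at, which keeps the list conservative.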
Quick Start

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")
    run_input = {
        "batches": ["W26", "S25"],
        "statuses": ["Active"],
        "industries": ["Fintech", "AI"],
    }
    run = client.actor("nexgendata/yc-companies-directory-scraper").call(run_input=run_input)

    # Iterate results
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item)

    # Or fetch all in one go
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    print(f"Got {len(items)} rows")

You can also run from the Apify CLI:
apify call nexgendata/yc-companies-directory-scraper --input='{"batches": ["W26", "S25"],"statuses": ["Active"],"industries": ["Fintech", "AI"]}'
Or from the web console: open the actor page on Apify, click Try for free, paste the input JSON, and hit Run. Results stream into the dataset, which you can export as JSON / JSONL / CSV / Excel / HTML.
Scheduling
This actor pairs cleanly with Apify Scheduler (built into the platform): schedule it hourly, daily, or cron-style and dedupe results into your warehouse on a stable key field such as id or slug. Webhook outputs are supported, so you can fire a Slack / Zapier / Make / n8n / your-own-API call the moment new rows materialize.
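One way to dedupe scheduled runs is to upsert rows into a store keyed by a stable field. This is a minimal in-memory sketch, assuming `slug` is a suitable key; a real pipeline would do the same merge against a warehouse table.

```python
from typing import Any, Dict, Iterable, List, Tuple


def upsert_rows(store: Dict[str, Dict[str, Any]],
                rows: Iterable[Dict[str, Any]],
                key: str = "slug") -> Tuple[List[str], List[str]]:
    """Merge rows into store by `key`; return (new_keys, changed_keys)."""
    new_keys: List[str] = []
    changed_keys: List[str] = []
    for row in rows:
        k = row.get(key)
        if k is None:
            continue  # rows without the key cannot be deduplicated
        if k not in store:
            new_keys.append(k)
        elif store[k] != row:
            changed_keys.append(k)
        store[k] = row
    return new_keys, changed_keys


# Made-up rows for illustration: the second run adds one company and
# flips the status of another.
store: Dict[str, Dict[str, Any]] = {}
upsert_rows(store, [{"slug": "a", "status": "Active"}])
new, changed = upsert_rows(store, [{"slug": "a", "status": "Inactive"},
                                   {"slug": "b", "status": "Active"}])
print(new, changed)
```

Returning the changed keys separately makes it easy to spot status flips between runs.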
Integration patterns
- CRM enrichment: pipe rows directly into HubSpot / Salesforce / Pipedrive via Zapier or Make
- Warehouse: append to BigQuery / Snowflake / Postgres on a daily schedule via Apify → S3 → warehouse ingest
- LLM-ready RAG: each row is already flat JSON; embed a plain-text field such as longDescription and store it in pgvector / Pinecone / Weaviate
- Slack alerts: filter by your trigger keyword and fire a Slack webhook for matches in real-time
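The Slack-alert pattern could look roughly like this: filter rows by a keyword, then post a `{"text": ...}` JSON body to a Slack incoming webhook. The key names `name`, `batch`, and `oneLiner` are assumptions, the webhook URL is supplied by you, and the sample rows are made up.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, List


def find_matches(rows: Iterable[Dict[str, Any]], keyword: str) -> List[Dict[str, Any]]:
    """Return rows whose one-liner contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [r for r in rows if needle in (r.get("oneLiner") or "").lower()]


def slack_payload(row: Dict[str, Any]) -> bytes:
    """Encode one row as the JSON body of a Slack incoming-webhook message."""
    text = f"New match: {row.get('name')} ({row.get('batch')}) - {row.get('oneLiner')}"
    return json.dumps({"text": text}).encode("utf-8")


def send_alert(webhook_url: str, row: Dict[str, Any]) -> None:
    """POST the payload to the webhook URL (not called in this sketch)."""
    request = urllib.request.Request(
        webhook_url,
        data=slack_payload(row),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()


# Made-up rows for illustration.
rows = [
    {"name": "A", "batch": "W26", "oneLiner": "Payroll for clinics"},
    {"name": "B", "batch": "W26", "oneLiner": "AI code review"},
]
matches = find_matches(rows, "payroll")
print([m["name"] for m in matches])
```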
Pricing
This actor runs on Apify's pay-per-event (PPE) model, so you pay only for results, not run-time:
- $0.05 per YC company row: the primary event (one charge per row pushed to the dataset)
- $0.00005 per actor-start GB-event: the actor start cost (one-time per run, sub-cent at typical memory)
No subscriptions, no minimums, no per-CPU-second charges. Apify's $5/month free tier covers most experiments. Browse 200+ buyer-intent actors at https://apify.com/nexgendata?fpr=2ayu9b
Cost worked example
A daily scheduled run pulling 500 fresh rows costs roughly:
- 500 rows × primary-event price (~$0.04-0.05) = $20-25
- 1 actor start × ~$0.00005 = negligible
So ~$20-25 per 500-row daily run, or ~$0.04-0.05 per row all-in. There are no surprise compute, storage, or proxy add-ons; proxy rotation is bundled into the per-row price.
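The arithmetic above can be checked with a small estimator. This sketch uses the $0.05 per-row price and treats the actor-start charge as a single flat $0.00005 fee per run, which is a simplification because that event is billed per GB of memory.

```python
from decimal import Decimal

PRICE_PER_ROW = Decimal("0.05")   # per-row event price from the pricing list
START_FEE = Decimal("0.00005")    # simplification: one flat start fee per run


def estimate_run_cost(rows: int, runs: int = 1) -> Decimal:
    """Estimated USD cost of `runs` runs that each push `rows` rows."""
    return runs * (rows * PRICE_PER_ROW + START_FEE)


print(estimate_run_cost(500))        # one 500-row run
print(estimate_run_cost(500, 30))    # thirty daily 500-row runs
```

`Decimal` avoids floating-point rounding in the per-row multiplication, which matters when the figures feed per-customer cost accounting.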
Why pay-per-event beats time-based pricing
- Predictable: you know your cost from row count before the run starts
- Failure-safe: if a target site changes its HTML and the actor returns 0 rows, you pay 0 (vs paying for the CPU-seconds anyway under time-based pricing)
- Easy to attribute: 1 row = 1 unit cost, so per-customer / per-pipeline cost accounting is trivial
Sister Actors in the NexGenData Fleet
| Use case | Actor |
|---|---|
| Crunchbase news & funding announcements | crunchbase-news-scraper |
| AngelList startup discovery | angellist-startup-search |
| Techstars alumni & cohorts | techstars-companies-directory |
| 500 Global accelerator alumni | 500-global-companies-directory |
| SEC Form D private placement filings | sec-form-d-scraper |
| Indie Hackers solo & bootstrapped tracker | indie-hackers-products-tracker |
| Daily Product Hunt launch stream | product-hunt-launches-scraper |
| Show HN indie launch stream | hn-show-hn-tracker |
(All sister actors share the same PPE billing and Apify-standard JSON output, so you can compose multi-step pipelines without rewriting input/output adapters.)
FAQ
Q: How fresh is the data?
A: The official YC directory updates when YC updates the public-facing site. The actor pulls live data each run; we recommend a weekly schedule for monitoring batch additions.
Q: How many companies are there?
A: YC has funded 5,000+ companies. The actor returns all of them by default; filter by batch, status, or vertical to narrow.
Q: Can I get founder LinkedIn URLs?
A: When YC publishes them on the company page, yes. Not every company exposes them.
Q: Is scraping YC's directory legal?
A: The actor reads YC's public, unauthenticated directory pages, the same pages that services such as Crunchbase's free tier read. Use is at your discretion; most read-only competitive-intelligence use is widely accepted.
Q: Output format?
A: JSON, JSONL, CSV, Excel via Apify dataset export. Schema is stable.
Q: Can I monitor only new batches?
A: Yes. Filter by batch (e.g. 'W26', 'S26') and schedule the actor to run after each Demo Day to dedupe new entries into your CRM.
Schema Stability & Versioning
This actor follows NexGenData's additive-only schema contract:
- New fields may be added at any time; they will simply appear as new keys in the JSON output, defaulting to null for older runs.
- Existing fields are never renamed or removed without a major-version bump and an advance changelog notice.
- Field semantics (units, timezones, value-sets) are never silently changed. If we need to change semantics, we add a new field with the new name and deprecate (but keep) the old one for at least 90 days.
This means you can build production pipelines on this actor without worrying that a change shipped on Tuesday will break Friday's ETL job. If you spot an unexpected change, reach out via the actor's Apify Issues tab and we'll look at it the same day.
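On the consumer side, an additive-only schema is easiest to rely on when the reader ignores unknown keys and tolerates missing ones. A minimal sketch, assuming the key names in `EXPECTED_FIELDS` (they are illustrative, not confirmed):

```python
from typing import Any, Dict, Optional, Tuple

EXPECTED_FIELDS: Tuple[str, ...] = ("name", "slug", "batch", "status")  # assumed keys


def normalize_row(row: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Keep only the fields the pipeline knows; absent fields become None."""
    return {field: row.get(field) for field in EXPECTED_FIELDS}


# Made-up rows: an older row missing a field and a newer row with an extra key.
old_row = {"name": "A", "slug": "a", "batch": "S22"}
new_row = {"name": "B", "slug": "b", "batch": "W26", "status": "Active", "newField": 1}
print(normalize_row(old_row))
print(normalize_row(new_row))
```

Because the projection is driven by an explicit field list, newly added keys never reach downstream tables until you opt in.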
Compliance & Legal
- The actor reads public, unauthenticated pages the same way a logged-out browser does.
- All requests route through Apify's compliant residential-proxy infrastructure with polite rate limiting.
- You are responsible for ensuring your downstream use complies with the target site's Terms of Service, your jurisdiction's data-protection laws (GDPR, CCPA, UK DPA, etc.), and any sector-specific rules (HIPAA, PCI, etc.).
- We do not collect, store, or transmit credentials for the target site.
- Most read-only competitive-intelligence and lead-generation use is widely accepted. Consult counsel before bulk redistribution.
Support
Open an issue on the actor's Apify Issues tab; the NexGenData team responds within one business day. For feature requests (new fields, new input filters), include the use case so we can prioritize it.
About NexGenData
NexGenData publishes 200+ buyer-intent actors covering SEC filings, YC alumni, Delaware DOC, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, ATS job boards, real-estate marketplaces, and more. All actors are pay-per-result and share a stable, additive-only JSON schema. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b