Y Combinator Scraper With Emails | $4.5 / 1K
Pricing
$4.49 / 1,000 companies
Scrape the Y Combinator directory and get rich company profiles with socials, founder details + emails, hiring status/job links, and news mentions. Perfect for lead gen, market mapping, recruiting, and competitor tracking.
Rating
4.6
(4)
Developer

Fatih Tahta
Actor stats
10
Bookmarked
63
Total users
4
Monthly active users
a day ago
Last modified
Y Combinator Directory Scraper with Founder Emails
Slug: fatihtahta/y-combinator-directory-scraper
Overview
This actor collects structured Y Combinator company profiles, founder details (including emails), and related public context such as news and jobs when present. It captures key company attributes like name, industry, batch, team size, locations, and status, along with social links and founder roles. The data is sourced from https://www.ycombinator.com/companies, a widely used directory for tracking YC-backed companies and market activity. Results are delivered as consistent JSON records, enabling repeatable analysis and reliable downstream use. The automation saves manual research time while keeping outputs structured and easy to process.
Why Use This Actor
- Market research / analytics: Build datasets of YC companies to analyze batches, industries, locations, and growth patterns.
- Product & content teams: Discover companies by keyword or category to inform content planning and product positioning.
- Developers / data engineering pipelines: Feed structured company records into analytics tools, warehouses, or internal directories.
- Lead gen / enrichment: Identify founders and company context to enrich outreach and qualification workflows.
- Monitoring / competitive tracking: Track changes in hiring status, public/private status, or category coverage over time.
Input Parameters
Provide any combination of URLs, queries, and filters…
| Parameter | Type | Description | Default |
|---|---|---|---|
| topCompanies | boolean | When enabled, return only YC’s top companies. | – |
| isHiring | boolean | When enabled, return only companies actively hiring. | – |
| nonprofit | boolean | When enabled, return only companies marked as nonprofit. | – |
| queries | string[] | Keywords to discover companies (e.g., product, market, or location). Use when you don’t already have URLs. | ["AI Assistant"] |
| batches | string[] | Filter by YC batch. Allowed values: All Batches, Spring 2026, Winter 2026, Fall 2025, Summer 2025, Spring 2025, Winter 2025, Fall 2024, Summer 2024, Winter 2024, Summer 2023, Winter 2023, Summer 2022, Winter 2022, Summer 2021, Winter 2021, Summer 2020, Winter 2020, Summer 2019, Winter 2019, Summer 2018, Winter 2018, Summer 2017, Winter 2017, Summer 2016, Winter 2016, Summer 2015, Winter 2015, Summer 2014, Winter 2014, Summer 2013, Winter 2013, Summer 2012, Winter 2012, Summer 2011, Winter 2011, Summer 2010, Winter 2010, Summer 2009, Winter 2009, Summer 2008, Winter 2008, Summer 2007, Winter 2007, Summer 2006, Winter 2006, Summer 2005. | ["All Batches"] |
| industries | string[] | Filter by industry. Allowed values: All industries, B2B, Consumer, Fintech, Healthcare, Education, Industrials, Real Estate and Construction, Government, Unspecified. | ["All industries"] |
| regions | string[] | Filter by company region. Allowed values: Anywhere, America / Canada, Remote, Europe, South Asia, Latin America, Southeast Asia, Africa, Middle East and North Africa, East Asia, Oceania. | ["Anywhere"] |
| minEmployeeSize | string | Minimum company size. Allowed values: 1+, 5+, 10+, 25+, 50+, 100+, 250+, 500+, 1000+. | "1+" |
| maxEmployeeSize | string | Maximum company size. Allowed values: 1+, 5, 10, 25, 50, 100, 250, 500, 1000+. | "1000+" |
| limit | integer | Maximum companies to save per query. Minimum: 10. | 50000 |
| getEmails | boolean | When enabled, include founder email permutations where available. | false |
| includeRiskyEmails | boolean | Include additional potential emails with lower confidence, labeled as verified or risky. | true |
| proxyConfiguration | object | Optional connection settings to improve reliability on larger runs. | { "useApifyProxy": true, "apifyProxyGroups": [] } |
Example Input
{
  "queries": ["climate", "fintech"],
  "batches": ["Summer 2024"],
  "industries": ["Fintech"],
  "regions": ["America / Canada"],
  "minEmployeeSize": "10+",
  "getEmails": true,
  "includeRiskyEmails": false,
  "limit": 250
}
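Since several parameters only accept fixed values, it can help to validate an input before starting a paid run. The following is a minimal sketch, not part of the actor: `build_input` and its argument names are hypothetical helpers, and only the dictionary keys, defaults, and allowed values are taken from the parameter table above.

```python
# Allowed values copied from the Input Parameters table.
MIN_SIZES = ("1+", "5+", "10+", "25+", "50+", "100+", "250+", "500+", "1000+")
MAX_SIZES = ("1+", "5", "10", "25", "50", "100", "250", "500", "1000+")
MIN_LIMIT = 10  # documented minimum for `limit`


def build_input(queries, batches=("All Batches",), industries=("All industries",),
                regions=("Anywhere",), min_size="1+", max_size="1000+",
                limit=50000, get_emails=False, include_risky_emails=True):
    """Build a run-input dict, rejecting values the parameter table rules out.

    Hypothetical helper: only the returned keys come from the documentation.
    """
    if min_size not in MIN_SIZES:
        raise ValueError(f"minEmployeeSize must be one of {MIN_SIZES}")
    if max_size not in MAX_SIZES:
        raise ValueError(f"maxEmployeeSize must be one of {MAX_SIZES}")
    if limit < MIN_LIMIT:
        raise ValueError(f"limit must be at least {MIN_LIMIT}")
    return {
        "queries": list(queries),
        "batches": list(batches),
        "industries": list(industries),
        "regions": list(regions),
        "minEmployeeSize": min_size,
        "maxEmployeeSize": max_size,
        "limit": limit,
        "getEmails": get_emails,
        "includeRiskyEmails": include_risky_emails,
    }


# Reproduces the example input, with documented defaults filled in.
run_input = build_input(["climate", "fintech"], batches=["Summer 2024"],
                        industries=["Fintech"], regions=["America / Canada"],
                        min_size="10+", get_emails=True,
                        include_risky_emails=False, limit=250)
```

The sketch checks only the size and limit fields; the batch, industry, and region lists could be validated the same way against their allowed values.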
Output
6.1 Output destination
The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.
6.2 Record envelope (all items)
Every record includes stable identifiers:
- type (string, required)
- id (number, required)
- url (string, required)
Recommended idempotency key: type + ":" + id.
Use this key to deduplicate and upsert records when the same company appears across multiple runs or inputs.
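As an illustration of the idempotency key, here is a minimal sketch of an in-memory upsert. The names `record_key` and `upsert` are hypothetical, and a plain dict stands in for whatever database or warehouse you actually use.

```python
def record_key(record):
    """Build the recommended idempotency key: type + ":" + id."""
    return f'{record["type"]}:{record["id"]}'


def upsert(store, records):
    """Insert or overwrite records in `store`, keyed by the idempotency key.

    Later records replace earlier ones with the same key, so feeding runs in
    chronological order keeps the most recent version of each company.
    """
    for record in records:
        store[record_key(record)] = record
    return store


# Two runs that both return company 731; the second run wins.
first_run = [{"type": "company", "id": 731, "url": "https://www.ycombinator.com/companies/oklo"}]
second_run = [
    {"type": "company", "id": 731, "url": "https://www.ycombinator.com/companies/oklo", "seen": 2},
    {"type": "company", "id": 1, "url": "https://www.ycombinator.com/companies/example"},
]
store = upsert(upsert({}, first_run), second_run)
```

The `seen` field and the second company are made-up illustration data, not actor output.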
6.3 Examples
Example: company (type = "company")
{
  "type": "company",
  "id": 731,
  "url": "https://www.ycombinator.com/companies/oklo",
  "Company General Info": {
    "Company Name": "Oklo",
    "One-liner Description": "Emission free, always on power from advanced fission power plants.",
    "Full Description": "About Oklo Inc.: \r\n\r\nOklo Inc. (Oklo) is developing advanced fission power plants to provide emission-free, reliable, and affordable energy. \r\n\r\nOklo received a Site Use Permit from the U.S Department of Energy, has performed successful prototypic fuel fabrication, was awarded fuel material from Idaho National Laboratory, developed the first advanced fission combined license application accepted and docketed by the U.S. Nuclear Regulatory Commission, and is developing advanced fuel recycling technologies in collaboration with the U.S. Department of Energy and national laboratories.\r\n\r\nOklo has been featured in Time, Newsweek, Wall Street Journal, CNBC, Popular Mechanics, Wired, Architectural Digest, Hyperallergic, POWER Magazine, has been the subject of a Harvard Business School case, and is featured in the Oliver Stone documentary Nuclear, among other features.",
    "Website": "http://oklo.com",
    "Year Founded": null,
    "Team Size": 50,
    "Company Stage": "Growth",
    "Industry": "Industrials",
    "Sub-industry": "Industrials -> Energy",
    "YC Batch": "Summer 2014",
    "Public / Private Status": "Public",
    "Hiring Status": false,
    "Top Company": true,
    "Nonprofit": false,
    "Regions": ["United States of America", "America / Canada", "Remote", "Partly Remote"],
    "Headquarters City": "Santa Clara",
    "Headquarters Country": "US",
    "All Known Locations": "Santa Clara, CA, USA; Sunnyvale, CA, USA",
    "Tags / Categories": ["Small Modular Reactors", "Climate"]
  },
  "Company Socials & External Links": {
    "Website": "http://oklo.com",
    "LinkedIn": "https://www.linkedin.com/company/oklo/",
    "Twitter / X": "http://www.twitter.com/oklo",
    "Facebook": "http://www.facebook.com/okloinc",
    "GitHub": null,
    "Crunchbase": "",
    "YC Company Page URL": "https://www.ycombinator.com/companies/oklo",
    "Jobs Page URL": "https://bookface.ycombinator.com/workatastartup",
    "News Page URL": "https://bookface.ycombinator.com/company/731/company_news"
  },
  "Founders": [
    {
      "Label": "Founder 1",
      "Founder Name": "Jacob DeWitte",
      "Title / Role": "Founder/CEO",
      "Bio": null,
      "LinkedIn URL": "http://linkedin.com/in/jacob-dewitte-90062132",
      "Twitter URL": "",
      "Email Available": true,
      "Email": "jacob@oklo.com",
      "Email Status": "verified",
      "Avatar Image URL": "https://bookface-images.s3.us-west-2.amazonaws.com/avatars/6147b6a370516ec8d93ef1234b50853dec23e50a.jpg",
      "Latest YC Company": "Oklo"
    },
    {
      "Label": "Founder 2",
      "Founder Name": "Caroline Cochran",
      "Title / Role": "Founder",
      "Bio": "Caroline Cochran is the Co-Founder and Chief Operating Officer of Oklo Inc., a company developing advanced fission cleantech for cleaner air and human development. She was one of the youngest recipients of the University of Oklahoma Regent's Alumni Award. Caroline received her S.M. in Nuclear Engineering from MIT, a B.A. in Economics and a B.S. in Mechanical Engineering from the University of Oklahoma.",
      "LinkedIn URL": "https://www.linkedin.com/in/caorilne",
      "Twitter URL": "https://www.twitter.com/caorilne",
      "Email Available": true,
      "Email": "caroline@oklo.com",
      "Email Status": "verified",
      "Avatar Image URL": "https://bookface-images.s3.us-west-2.amazonaws.com/avatars/db9d311bc6421042f8b6d63bd74ddf14772adb4f.jpg",
      "Latest YC Company": "Oklo"
    }
  ],
  "Jobs": [],
  "Company News": [
    {
      "Headline": "Oklo’s microreactor project pipeline jumps 93% ahead of 2027 planned deployment | Utility Dive",
      "Source": "Utility Dive",
      "Publication Date": "Aug 15, 2024",
      "Article URL": "https://www.utilitydive.com/news/oklo-advanced-nuclear-microreactor-project-pipeline-nrc/724343/"
    },
    {
      "Headline": "Oklo starts trading on NYSE",
      "Source": null,
      "Publication Date": "May 10, 2024",
      "Article URL": "https://www.cnbc.com/2024/05/10/sam-altman-takes-nuclear-startup-oklo-public-to-power-ai-ambitions.html"
    },
    {
      "Headline": "A Sam Altman-backed nuclear startup is going public via SPAC.",
      "Source": null,
      "Publication Date": "Jul 11, 2023",
      "Article URL": "https://www.axios.com/pro/climate-deals/2023/07/11/sam-altman-backed-nuclear-startup-to-go-public-via-500m-spac"
    }
  ],
  "Metadata": {
    "YC Company ID": 731,
    "Company Slug": "oklo",
    "Source URL": "https://www.ycombinator.com/companies?top_company=true",
    "Seed Type": "filters",
    "Seed Value": "default",
    "Scrape Timestamp": "2026-01-16T17:44:49+00:00",
    "Data Completeness Flags": ["missing founder bio", "missing jobs"]
  }
}
Field reference
Company record fields (type = "company")
- type (string, required): Record type.
- id (number, required): Company identifier.
- url (string, required): Canonical company page URL.
- Company General Info (object, optional): High-level company details.
- Company General Info.Company Name (string, optional): Company name.
- Company General Info.One-liner Description (string, optional): Short description.
- Company General Info.Full Description (string, optional): Extended description.
- Company General Info.Website (string, optional): Company website URL.
- Company General Info.Year Founded (number, optional): Year founded when available.
- Company General Info.Team Size (number, optional): Team size estimate.
- Company General Info.Company Stage (string, optional): Company stage.
- Company General Info.Industry (string, optional): Primary industry.
- Company General Info.Sub-industry (string, optional): More specific industry label.
- Company General Info.YC Batch (string, optional): YC batch.
- Company General Info.Public / Private Status (string, optional): Public or private status.
- Company General Info.Hiring Status (boolean, optional): Hiring status when listed.
- Company General Info.Top Company (boolean, optional): Top company flag.
- Company General Info.Nonprofit (boolean, optional): Nonprofit flag.
- Company General Info.Regions (array[string], optional): Regions list.
- Company General Info.Headquarters City (string, optional): HQ city.
- Company General Info.Headquarters Country (string, optional): HQ country code.
- Company General Info.All Known Locations (string, optional): Aggregated locations string.
- Company General Info.Tags / Categories (array[string], optional): Tags or categories.
- Company Socials & External Links (object, optional): External links and social profiles.
- Company Socials & External Links.Website (string, optional)
- Company Socials & External Links.LinkedIn (string, optional)
- Company Socials & External Links.Twitter / X (string, optional)
- Company Socials & External Links.Facebook (string, optional)
- Company Socials & External Links.GitHub (string, optional)
- Company Socials & External Links.Crunchbase (string, optional)
- Company Socials & External Links.YC Company Page URL (string, optional)
- Company Socials & External Links.Jobs Page URL (string, optional)
- Company Socials & External Links.News Page URL (string, optional)
- Founders (array[object], optional): Founder list.
- Founders.Label (string, optional): Founder label.
- Founders.Founder Name (string, optional): Founder name.
- Founders.Title / Role (string, optional): Title or role.
- Founders.Bio (string, optional): Founder bio.
- Founders.LinkedIn URL (string, optional)
- Founders.Twitter URL (string, optional)
- Founders.Email Available (boolean, optional): Email availability flag.
- Founders.Email (string, optional): Email address when present.
- Founders.Email Status (string, optional): Email confidence label.
- Founders.Avatar Image URL (string, optional)
- Founders.Latest YC Company (string, optional)
- Jobs (array[object], optional): Job listings when present.
- Company News (array[object], optional): News coverage.
- Company News.Headline (string, optional)
- Company News.Source (string, optional)
- Company News.Publication Date (string, optional)
- Company News.Article URL (string, optional)
- Metadata (object, optional): Run-level metadata.
- Metadata.YC Company ID (number, optional)
- Metadata.Company Slug (string, optional)
- Metadata.Source URL (string, optional)
- Metadata.Seed Type (string, optional)
- Metadata.Seed Value (string, optional)
- Metadata.Scrape Timestamp (string, optional)
- Metadata.Data Completeness Flags (array[string], optional)
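The date fields are documented as strings, so downstream code usually has to parse them. The sketch below assumes the formats seen in the example record ("Aug 15, 2024" for Publication Date and an ISO 8601 string for Scrape Timestamp); the README does not guarantee these formats, and the function names are hypothetical. Note that `%b` matches English month abbreviations only under an English or default C locale.

```python
from datetime import date, datetime


def parse_publication_date(value):
    """Parse a 'Publication Date' such as 'Aug 15, 2024'; return None if it does not fit."""
    if not value:
        return None
    try:
        return datetime.strptime(value, "%b %d, %Y").date()
    except ValueError:
        # Format differs from the example record; let the caller decide what to do.
        return None


def parse_scrape_timestamp(value):
    """Parse an ISO 8601 'Scrape Timestamp' such as '2026-01-16T17:44:49+00:00'."""
    if not value:
        return None
    return datetime.fromisoformat(value)


published = parse_publication_date("Aug 15, 2024")
scraped = parse_scrape_timestamp("2026-01-16T17:44:49+00:00")
```

Returning `None` on an unexpected format keeps one odd record from stopping a whole batch.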
Data guarantees & handling
- Best-effort extraction: fields may vary by region/session/availability/UI experiments.
- Optional fields: null-check in downstream code.
- Deduplication: the recommended key is type + ":" + id.
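Because most fields are optional and may be null or an empty string, downstream code should not index into records directly. A minimal sketch of null-safe access follows; `dig` and `founder_emails` are hypothetical helpers, and the "verified" status value is taken from the example record.

```python
def dig(record, *path):
    """Follow a chain of keys through nested dicts, returning None if any step is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def founder_emails(record, allowed_statuses=("verified",)):
    """Return (name, email) pairs for founders whose Email Status is allowed."""
    pairs = []
    for founder in record.get("Founders") or []:  # tolerate a missing or null list
        email = founder.get("Email")
        if not email:  # skip both null and empty-string emails
            continue
        if founder.get("Email Status") not in allowed_statuses:
            continue
        pairs.append((founder.get("Founder Name"), email))
    return pairs


sample = {
    "type": "company",
    "id": 731,
    "Company General Info": {"Team Size": 50, "Year Founded": None},
    "Founders": [
        {"Founder Name": "Jacob DeWitte", "Email": "jacob@oklo.com", "Email Status": "verified"},
        {"Founder Name": "No Email", "Email": None, "Email Status": None},
    ],
}
team_size = dig(sample, "Company General Info", "Team Size")
emails = founder_emails(sample)
```

The second founder in `sample` is made-up data that exercises the null path.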
How to Run on Apify
- Open the Actor in Apify Console.
- Configure your search parameters (e.g., queries, batches, industries, regions, and employee size).
- Set limit to cap the number of companies saved per query.
- Click Start and wait for the run to finish.
- Download results in JSON, CSV, Excel, or other supported formats.
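The same steps can be scripted instead of clicked through in the Console. The sketch below assumes the `apify-client` Python package, whose `ApifyClient.actor(...).call(run_input=...)` and `dataset(...).iterate_items()` methods are used here as I understand them; check the client documentation for your installed version. `run_and_collect` is a hypothetical helper, and the client is passed in as an argument so the function can be exercised without network access.

```python
ACTOR_ID = "fatihtahta/y-combinator-directory-scraper"  # slug from this README


def run_and_collect(client, run_input, actor_id=ACTOR_ID):
    """Start the actor, wait for it to finish, and return its dataset items as a list.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if not run:
        raise RuntimeError("Actor run did not return run details")
    dataset_id = run["defaultDatasetId"]
    return list(client.dataset(dataset_id).iterate_items())


# Usage (assumes `pip install apify-client` and a valid API token):
#   from apify_client import ApifyClient
#   items = run_and_collect(ApifyClient("<YOUR_APIFY_TOKEN>"),
#                           {"queries": ["climate"], "limit": 10})
```

For large runs, iterating over the dataset lazily rather than building a list would keep memory use lower.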
Scheduling & Automation
Scheduling
Automated Data Collection
Schedule runs to keep your YC company dataset fresh and consistent over time.
- Navigate to Schedules in Apify Console
- Create a new schedule (daily, weekly, or custom cron)
- Configure input parameters
- Enable notifications for run completion
- (Optional) Add webhooks for automated processing
Integration Options
- Webhooks: Trigger downstream actions when a run completes
- Zapier: Connect to 5,000+ apps without coding
- Make (Integromat): Build multi-step automation workflows
- Google Sheets: Export results to a spreadsheet
- Slack/Discord: Receive notifications and summaries
- Email: Send automated reports via email
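For the webhook option, a receiving service typically needs to find out which dataset a finished run wrote to. The sketch below assumes Apify's default webhook payload, in which `eventType` names the event and `resource` holds the run object with a `defaultDatasetId`; if you customize the payload template, adjust the keys accordingly. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json


def dataset_id_from_webhook(body):
    """Return the dataset ID from a run-succeeded webhook body, or None otherwise.

    Assumes the default payload shape: {"eventType": ..., "resource": {...}}.
    """
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    return (payload.get("resource") or {}).get("defaultDatasetId")


# Trimmed-down example body with a made-up dataset ID.
example_body = json.dumps({
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},
})
dataset_id = dataset_id_from_webhook(example_body)
```

The returned ID can then be used to fetch and process the run's records.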
Performance
Estimated run times:
- Small runs (< 1,000 outputs): ~5–10 minutes
- Medium runs (1,000–5,000 outputs): ~15–25 minutes
- Large runs (5,000+ outputs): ~25–90 minutes
Execution time varies based on filters, result volume, and how much information is returned per record.
Compliance & Ethics
Responsible Data Collection
This actor collects publicly available startup, company, and founder metadata from Y Combinator for legitimate business, research, and analytical purposes, including:
- Startup ecosystem research, trend analysis, and market mapping
- Venture intelligence, portfolio analysis, and competitive landscape monitoring
- Data enrichment workflows for internal databases, CRM systems, analytics dashboards, and research pipelines
The actor is designed to operate on non-authenticated, publicly accessible pages and does not attempt to bypass access controls.
This section is informational and not legal advice.
Best Practices
- Use collected data in accordance with applicable laws, regulations, and the target site’s terms
- Respect individual privacy and personal information
- Use data responsibly and avoid disruptive or excessive collection
- Do not use this actor for spamming, harassment, or other harmful purposes
- Follow relevant data protection requirements where applicable (e.g., GDPR, CCPA)
Support
For help or troubleshooting, open an issue on the actor page in Apify Console. Include the input you used (redacted), the run ID, expected vs. actual behavior, and a small output sample if possible.