Y Combinator Scraper with Founders & Emails

Pricing

from $1.00 / 1,000 startups

Scrape the Y Combinator directory and get rich company profiles with socials, founder details + emails, hiring status/job links, and news mentions. Perfect for lead gen, market mapping, recruiting, and competitor tracking.

Rating: 4.2 (3)

Developer: Fatih Tahta

Maintained by Community

Actor stats

  • Bookmarked: 16
  • Total users: 162
  • Monthly active users: 21
  • Issues response: 22 hours
  • Last modified: 20 days ago

Y Combinator Directory Scraper with Founder Emails

Slug: fatihtahta/y-combinator-directory-scraper

This actor is built upon the ValidatedMails.com architecture for email enrichment workflows.

Overview

This actor collects structured Y Combinator company profiles, founder details (including emails), and related public context such as news and jobs when present.

It captures key company attributes like name, industry, batch, team size, locations, and status, along with social links and founder roles. The data is sourced from https://www.ycombinator.com/companies, a widely used directory for tracking YC-backed companies and market activity. Results are delivered as consistent JSON records, enabling repeatable analysis and reliable downstream use. The automation saves manual research time while keeping outputs structured and easy to process.

For broad YC directory searches, the actor automatically partitions large Algolia result sets by batch and subindustry so it can keep collecting beyond the search API's 1000-hit window.
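
The partitioning idea can be sketched roughly as follows. This is not the actor's actual implementation: `plan_partitions`, `count_hits`, and the facet names are hypothetical, and the only fact taken from the text above is the 1000-hit window and the batch/subindustry split order.

```python
from typing import Callable, Dict, List, Optional, Tuple

HIT_WINDOW = 1000  # per-query cap described above

Filters = Dict[str, str]


def plan_partitions(
    count_hits: Callable[[Filters], int],  # hypothetical: returns total hits for a filter set
    facets: Dict[str, List[str]],          # hypothetical: known values per facet
    base: Optional[Filters] = None,
    order: Tuple[str, ...] = ("batch", "subindustry"),
) -> List[Filters]:
    """Split a broad query into filter sets that each fit inside the hit window."""
    base = dict(base or {})
    hits = count_hits(base)
    if hits == 0:
        return []  # nothing to collect for this slice
    if hits <= HIT_WINDOW or not order:
        # Small enough to page through directly. If no facet is left to split on,
        # the slice is returned anyway and may still be truncated by the window.
        return [base]
    facet, rest = order[0], order[1:]
    plans: List[Filters] = []
    for value in facets[facet]:
        plans.extend(plan_partitions(count_hits, facets, {**base, facet: value}, rest))
    return plans
```

Each returned filter set can then be queried separately, and the results merged by company id.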

Why Use This Actor

  • Market research / analytics: Build datasets of YC companies to analyze batches, industries, locations, and growth patterns.
  • Product & content teams: Discover companies by keyword or category to inform content planning and product positioning.
  • Developers / data engineering pipelines: Feed structured company records into analytics tools, warehouses, or internal directories.
  • Lead gen / enrichment: Identify founders and company context to enrich outreach and qualification workflows.
  • Monitoring / competitive tracking: Track changes in hiring status, public/private status, or category coverage over time.

Input Parameters

Provide any combination of URLs, queries, and filters…

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| topCompanies | boolean | When enabled, return only YC’s top companies. | |
| isHiring | boolean | When enabled, return only companies actively hiring. | |
| nonprofit | boolean | When enabled, return only companies marked as nonprofit. | |
| queries | string[] | Keywords to discover companies (e.g., product, market, or location). Use when you don’t already have URLs. | ["AI Assistant"] |
| batches | string[] | Filter by YC batch. Allowed values: All Batches, Fall 2026, Summer 2026, Spring 2026, Winter 2026, Fall 2025, Summer 2025, Spring 2025, Winter 2025, Fall 2024, Summer 2024, Winter 2024, Summer 2023, Winter 2023, Summer 2022, Winter 2022, Summer 2021, Winter 2021, Summer 2020, Winter 2020, Summer 2019, Winter 2019, Summer 2018, Winter 2018, Summer 2017, Winter 2017, Summer 2016, Winter 2016, Summer 2015, Winter 2015, Summer 2014, Winter 2014, Summer 2013, Winter 2013, Summer 2012, Winter 2012, Summer 2011, Winter 2011, Summer 2010, Winter 2010, Summer 2009, Winter 2009, Summer 2008, Winter 2008, Summer 2007, Winter 2007, Summer 2006, Winter 2006, Summer 2005. | ["All Batches"] |
| industries | string[] | Filter by industry. Allowed values: All industries, B2B, Consumer, Fintech, Healthcare, Education, Industrials, Real Estate and Construction, Government, Unspecified. | ["All industries"] |
| regions | string[] | Filter by company region. Allowed values: Anywhere, America / Canada, Remote, Europe, South Asia, Latin America, Southeast Asia, Africa, Middle East and North Africa, East Asia, Oceania. | ["Anywhere"] |
| minEmployeeSize | string | Minimum company size. Allowed values: 1+, 5+, 10+, 25+, 50+, 100+, 250+, 500+, 1000+. | "1+" |
| maxEmployeeSize | string | Maximum company size. Allowed values: 1+, 5, 10, 25, 50, 100, 250, 500, 1000+. | "1000+" |
| limit | integer | Maximum companies to save per query. Minimum: 10. Broad directory searches are internally partitioned to work past YC's 1000-hit search window. | 50000 |
| get_founders | boolean | When enabled, save founder details in each company record. When disabled, omit founders and skip founder pay-per-event charges. | true |
| getEmails | boolean | When enabled, include founder email permutations where available. | true |
| includeRiskyEmails | boolean | Include additional potential emails with lower confidence, labeled as verified or risky. | true |
| proxyConfiguration | object | Optional connection settings to improve reliability on larger runs. | { "useApifyProxy": true, "apifyProxyGroups": [] } |

Pricing

Pricing is based on saved founder and enrichment values. Founder records saved inside company records are charged at $1 per 1,000 founders. Founder email enrichment is charged at $2 per 1,000 discovered emails saved to the dataset.

For cost control, start with a small limit, inspect the dataset, and disable get_founders or getEmails when founder data or founder email discovery is not needed. If get_founders is disabled, founder details are omitted and founder/email PPE charges are skipped. If a run reaches its configured charge limit, company records can still be saved, but unpaid founder entries or email fields are omitted.
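
As a rough worked example of the two rates above, the following sketch estimates the enrichment charge for a run. The function name and the sample counts are illustrative assumptions; it covers only the founder and email rates stated in this section, not any other platform or per-startup charges.

```python
FOUNDER_USD_PER_1000 = 1  # $1 per 1,000 founders saved (from the text above)
EMAIL_USD_PER_1000 = 2    # $2 per 1,000 discovered emails saved (from the text above)


def estimate_enrichment_cost(founders_saved: int, emails_saved: int) -> float:
    """Estimated pay-per-event charge in USD for founder and email enrichment only."""
    total = founders_saved * FOUNDER_USD_PER_1000 + emails_saved * EMAIL_USD_PER_1000
    return total / 1000


# Illustrative numbers: 250 companies with about 2 founders each,
# and emails found for 400 of those 500 founders.
print(estimate_enrichment_cost(founders_saved=500, emails_saved=400))  # 1.3
```

With get_founders disabled both counts are zero, so the estimate drops to 0.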

Example Input

{
  "queries": ["climate", "fintech"],
  "batches": ["Summer 2024"],
  "industries": ["Fintech"],
  "regions": ["America / Canada"],
  "minEmployeeSize": "10+",
  "get_founders": true,
  "getEmails": true,
  "includeRiskyEmails": false,
  "limit": 250
}
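
The same input can also be submitted programmatically. The sketch below assumes the `apify-client` Python package is installed and that an API token is stored in an `APIFY_TOKEN` environment variable (the variable name is an assumption). `build_input` and `run_actor` are hypothetical helper names; the actor slug is the one listed on this page.

```python
import os

ACTOR_ID = "fatihtahta/y-combinator-directory-scraper"  # slug from this page


def build_input(limit: int = 250) -> dict:
    """Build the example input above, enforcing the documented minimum limit of 10."""
    if limit < 10:
        raise ValueError("limit must be at least 10")
    return {
        "queries": ["climate", "fintech"],
        "batches": ["Summer 2024"],
        "industries": ["Fintech"],
        "regions": ["America / Canada"],
        "minEmployeeSize": "10+",
        "get_founders": True,
        "getEmails": True,
        "includeRiskyEmails": False,
        "limit": limit,
    }


def run_actor(run_input: dict) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_input() works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires a valid token and will incur charges):
# items = run_actor(build_input(limit=10))
```

Starting with a small limit, as suggested in the Pricing section, keeps a first test run cheap.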

Output

Output destination

The actor writes results to an Apify dataset as JSON records. The dataset is designed for direct consumption by analytics tools, ETL pipelines, and downstream APIs without post-processing.

Record envelope (all items)

Every record includes stable identifiers:

  • type (string, required)
  • id (number, required)
  • url (string, required)

Recommended idempotency key: type + ":" + id. Use this key to deduplicate and upsert records when the same company appears across multiple runs or inputs.
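
A minimal sketch of that key in use, with an in-memory dict standing in for a real database table. The helper names `record_key` and `upsert` are hypothetical.

```python
from typing import Dict, Iterable


def record_key(record: dict) -> str:
    """Build the recommended idempotency key from the required envelope fields."""
    return f'{record["type"]}:{record["id"]}'


def upsert(store: Dict[str, dict], records: Iterable[dict]) -> Dict[str, dict]:
    """Insert new records and overwrite existing ones that share the same key."""
    for record in records:
        store[record_key(record)] = record
    return store


store = upsert({}, [
    {"type": "company", "id": 731, "url": "https://www.ycombinator.com/companies/oklo", "is_hiring": False},
    {"type": "company", "id": 731, "url": "https://www.ycombinator.com/companies/oklo", "is_hiring": True},
])
print(list(store))  # ['company:731']
```

Later records win in this sketch, so feeding runs in chronological order keeps the most recent version of each company.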

Examples

Example: company (type = "company")

{
  "type": "company",
  "id": 731,
  "url": "https://www.ycombinator.com/companies/oklo",
  "name": "Oklo",
  "one_liner": "Emission free, always on power from advanced fission power plants.",
  "description": "About Oklo Inc.: Oklo Inc. (Oklo) is developing advanced fission power plants to provide emission-free, reliable, and affordable energy.",
  "website": "http://oklo.com",
  "status": "Public",
  "is_hiring": false,
  "company_details": {
    "team_size": 50,
    "stage": "Growth",
    "industry": "Industrials",
    "subindustry": "Industrials -> Energy",
    "yc_batch": "Summer 2014",
    "is_top_company": true,
    "is_nonprofit": false,
    "tags": [
      "Small Modular Reactors",
      "Climate"
    ]
  },
  "location": {
    "city": "Santa Clara",
    "country": "US",
    "regions": [
      "United States of America",
      "America / Canada",
      "Remote",
      "Partly Remote"
    ],
    "all_locations": "Santa Clara, CA, USA; Sunnyvale, CA, USA"
  },
  "links": {
    "linkedin_url": "https://www.linkedin.com/company/oklo/",
    "twitter_url": "http://www.twitter.com/oklo",
    "facebook_url": "http://www.facebook.com/okloinc",
    "jobs_url": "https://bookface.ycombinator.com/workatastartup",
    "news_url": "https://bookface.ycombinator.com/company/731/company_news"
  },
  "founders": [
    {
      "name": "Jacob DeWitte",
      "title": "Founder/CEO",
      "linkedin_url": "http://linkedin.com/in/jacob-dewitte-90062132",
      "email_available": true,
      "email": "jacob@oklo.com",
      "email_status": "verified",
      "avatar_url": "https://bookface-images.s3.us-west-2.amazonaws.com/avatars/6147b6a370516ec8d93ef1234b50853dec23e50a.jpg",
      "latest_yc_company": {
        "name": "Oklo"
      }
    },
    {
      "name": "Caroline Cochran",
      "title": "Founder",
      "bio": "Caroline Cochran is the Co-Founder and Chief Operating Officer of Oklo Inc.",
      "linkedin_url": "https://www.linkedin.com/in/caorilne",
      "twitter_url": "https://www.twitter.com/caorilne",
      "email_available": true,
      "email": "caroline@oklo.com",
      "email_status": "verified",
      "avatar_url": "https://bookface-images.s3.us-west-2.amazonaws.com/avatars/db9d311bc6421042f8b6d63bd74ddf14772adb4f.jpg",
      "latest_yc_company": {
        "name": "Oklo"
      }
    }
  ],
  "jobs": [],
  "news": [
    {
      "headline": "Oklo’s microreactor project pipeline jumps 93% ahead of 2027 planned deployment | Utility Dive",
      "source": "Utility Dive",
      "publication_date": "Aug 15, 2024",
      "url": "https://www.utilitydive.com/news/oklo-advanced-nuclear-microreactor-project-pipeline-nrc/724343/"
    },
    {
      "headline": "Oklo starts trading on NYSE",
      "publication_date": "May 10, 2024",
      "url": "https://www.cnbc.com/2024/05/10/sam-altman-takes-nuclear-startup-oklo-public-to-power-ai-ambitions.html"
    }
  ],
  "source_context": {
    "source_url": "https://www.ycombinator.com/companies?top_company=true",
    "seed_type": "filters",
    "seed_value": "default",
    "company_slug": "oklo",
    "scraped_at": "2026-01-16T17:44:49+00:00"
  },
  "data_quality": {
    "completeness_flags": [
      "missing founder bio",
      "missing jobs"
    ]
  }
}

Field reference

Company record fields (type = "company")

  • type (string, required): Record type.
  • id (number, required): Company identifier.
  • url (string, required): Canonical company page URL.
  • name (string, optional): Company name.
  • one_liner (string, optional): Short description.
  • description (string, optional): Extended description with markup stripped.
  • website (string, optional): Company website URL.
  • status (string, optional): Public/private status.
  • is_hiring (boolean, optional): Hiring status when listed.
  • company_details (object, optional): Year founded, team size, stage, industry, subindustry, YC batch, top-company flag, nonprofit flag, and tags.
  • location (object, optional): Headquarters city/country, regions, and aggregated source locations.
  • links (object, optional): Social profile URLs, jobs URL, news URL, Crunchbase/GitHub URLs, and a YC company URL only when it differs from url.
  • founders (array[object], optional): Founder records with name, title, bio, profile URLs, optional email fields, avatar URL, and latest YC company detail.
  • jobs (array[object], optional): Public job postings with title, role, employment type, location, remote flag, salary/equity ranges, skills, visa sponsorship, apply URL, job URL, and source date/age text.
  • news (array[object], optional): News coverage with headline, inferred source, publication date, and article URL.
  • source_context (object, optional): Crawl/request metadata such as source_url, seed_type, seed_value, company_slug, and scraped_at.
  • data_quality (object, optional): Completeness flags for optional sections such as founders, jobs, news, website, and full description.

Data guarantees & handling

  • Best-effort extraction: field availability can vary by region, session, and UI experiments on the source site.
  • Optional fields: any field marked optional may be absent, so null-check it in downstream code.
  • Deduplication: use type + ":" + id as the record key.
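
One way to follow these guidelines is to read nested fields through a small null-safe accessor. The sketch below uses field names from the field reference; the helpers `dig` and `summarize` are hypothetical.

```python
from typing import Any


def dig(record: dict, *path: str, default: Any = None) -> Any:
    """Walk nested dict keys, returning `default` if any level is missing or null."""
    node: Any = record
    for key in path:
        if not isinstance(node, dict) or node.get(key) is None:
            return default
        node = node[key]
    return node


def summarize(record: dict) -> dict:
    """Flatten one company record into a row, tolerating absent optional fields."""
    return {
        "key": f'{record["type"]}:{record["id"]}',  # required envelope fields
        "name": dig(record, "name"),
        "batch": dig(record, "company_details", "yc_batch"),
        "city": dig(record, "location", "city"),
        "founder_count": len(dig(record, "founders", default=[])),
    }


# A record carrying only the required envelope still produces a valid row.
print(summarize({"type": "company", "id": 731, "url": "https://www.ycombinator.com/companies/oklo"}))
```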

How to Run on Apify

  1. Open the Actor in Apify Console.
  2. Configure your search parameters (e.g., queries, batches, industries, regions, and company size).
  3. Set limit to cap the number of companies saved per query.
  4. Click Start and wait for the run to finish.
  5. Download results in JSON, CSV, Excel, or other supported formats.

Scheduling & Automation

This actor is designed for continuous YC intelligence, not just one-off pulls.

Whether you’re tracking new batches, monitoring hiring companies, or running ongoing founder email enrichment, scheduling turns this into a live data pipeline.


Scheduling Recurring YC Snapshots

Use Apify Schedules to automatically re-run the actor with the same filters over time.

Common use cases:

  • 🗓 Track new companies in a specific batch (e.g., Summer 2026)
  • 📈 Monitor hiring YC companies in a region (e.g., America / Canada)
  • 🧠 Rebuild enriched founder-email datasets weekly
  • 🕵️ Watch specific industries (e.g., Fintech, Healthcare) for new entrants

How to set it up:

  1. Open the actor in Apify Console
  2. Go to Schedules → Create schedule
  3. Choose frequency (daily, weekly, or custom cron)
  4. Paste your production input JSON
  5. Enable notifications (optional)

Each run writes to a new dataset, allowing you to:

  • Compare snapshots over time
  • Track deltas (new companies, hiring changes, status changes)
  • Re-enrich previously missing emails
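
Comparing two snapshots can be sketched as below, assuming both datasets have already been loaded as lists of records. The function names and the choice of watched fields (status and is_hiring) are illustrative.

```python
from typing import Dict, Iterable

WATCHED_FIELDS = ("status", "is_hiring")  # fields to report changes for


def index_by_key(records: Iterable[dict]) -> Dict[str, dict]:
    """Index records by the type + ":" + id key."""
    return {f'{r["type"]}:{r["id"]}': r for r in records}


def diff_snapshots(previous: Iterable[dict], current: Iterable[dict]) -> dict:
    """Report added and removed keys plus (old, new) values for watched fields."""
    old, new = index_by_key(previous), index_by_key(current)
    changed = {}
    for key in old.keys() & new.keys():
        delta = {
            field: (old[key].get(field), new[key].get(field))
            for field in WATCHED_FIELDS
            if old[key].get(field) != new[key].get(field)
        }
        if delta:
            changed[key] = delta
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": changed,
    }
```

A key listed under "removed" only means the company was absent from the later run, which can also happen when filters or limits differ between runs.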

Automation & Downstream Workflows

This actor is structured for ETL, lead generation, and analytics pipelines.

Because each record includes stable identifiers (type, id), you can safely deduplicate and upsert into your database using:

type + ":" + id

Trigger automations immediately after a run finishes:

  • Push results into your CRM
  • Insert/update rows in a warehouse
  • Send founder emails to a verification pipeline
  • Trigger AI enrichment or scoring

Typical Integration Patterns

  • CRM sync: Upsert founders + companies into HubSpot, Salesforce, or internal tools
  • Lead generation: Extract founders with email_available = true and feed into outreach systems
  • Warehouse ingestion: Stream dataset into BigQuery / Snowflake for batch-level analytics
  • Hiring alerts: Notify Slack when new YC companies are marked is_hiring = true
  • Batch monitoring: Run per-batch schedules and compare growth patterns over time
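
As an illustration of the lead generation pattern, the sketch below pulls founders with a usable email out of company records and serializes them as CSV for an outreach or CRM import. The helper names and the column layout are assumptions; the field names and the verified/risky status values come from the output and input sections above.

```python
import csv
import io
from typing import Iterable, Iterator, Tuple

COLUMNS = ["company", "founder", "title", "email", "email_status"]


def founder_leads(
    records: Iterable[dict],
    allowed_statuses: Tuple[str, ...] = ("verified",),  # add "risky" to widen the net
) -> Iterator[dict]:
    """Yield one row per founder whose email is present and has an allowed status."""
    for company in records:
        for founder in company.get("founders") or []:
            if (
                founder.get("email_available")
                and founder.get("email")
                and founder.get("email_status") in allowed_statuses
            ):
                yield {
                    "company": company.get("name"),
                    "founder": founder.get("name"),
                    "title": founder.get("title"),
                    "email": founder["email"],
                    "email_status": founder["email_status"],
                }


def to_csv(rows: Iterable[dict]) -> str:
    """Serialize lead rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Filtering on email_status keeps lower-confidence addresses out of outreach by default, which fits the best practices listed under Compliance & Ethics.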

Performance

Estimated run times:

  • Small runs (< 1,000 outputs): ~5–10 minutes
  • Medium runs (1,000–5,000 outputs): ~15–25 minutes
  • Large runs (5,000+ outputs): ~25–90 minutes

Execution time varies based on filters, result volume, and how much information is returned per record.

Compliance & Ethics

Responsible Data Collection

This actor collects publicly available startup, company, and founder metadata from Y Combinator for legitimate business, research, and analytical purposes, including:

  • Startup ecosystem research, trend analysis, and market mapping
  • Venture intelligence, portfolio analysis, and competitive landscape monitoring
  • Data enrichment workflows for internal databases, CRM systems, analytics dashboards, and research pipelines

The actor is designed to operate on non-authenticated, publicly accessible pages and does not attempt to bypass access controls.

This section is informational and not legal advice.

Best Practices

  • Use collected data in accordance with applicable laws, regulations, and the target site’s terms
  • Respect individual privacy and personal information
  • Use data responsibly and avoid disruptive or excessive collection
  • Do not use this actor for spamming, harassment, or other harmful purposes
  • Follow relevant data protection requirements where applicable (e.g., GDPR, CCPA)

Support

For help or troubleshooting, open an issue on the actor page in Apify Console. Include the input you used (redacted), the run ID, expected vs. actual behavior, and a small output sample if possible.