Y Combinator Scraper

Pricing

from $1.20 / 1,000 results

Discover the Y Combinator Scraper, an efficient actor for scraping the YC companies directory. Easily extract detailed profiles, founder info, and batch data. Ideal for market research, lead generation, or investment analysis. For best results and to avoid blocks, use residential proxies.

Rating

3.5 (3 ratings)

Developer

Shahid Irfan

Maintained by Community

Actor stats

Bookmarked: 2
Total users: 23
Monthly active users: 7
Last modified: 3 days ago

Y Combinator Companies Scraper

Extract structured company data from the Y Combinator directory in a fast, reliable format. Collect startup profiles, locations, batches, hiring signals, and optional founder/job details for research and analysis.

Features

  • Dynamic listing extraction: Collect companies from YC listing pages with pagination support.
  • Richer company fields: Enrich records with Algolia-backed fields such as industries, regions, stage, and launch date.
  • Optional detail depth: Founder and open-jobs scraping can be enabled when needed (in the current simplified mode, founders and open_jobs are returned as empty arrays).
  • Flexible scope control: Use results_wanted, max_pages, or scrape_all_companies to control how much is scraped.
  • Clean dataset output: Ready for BI tools, spreadsheets, APIs, and automations.

Use Cases

Startup Market Research

Build a current dataset of YC companies to analyze sectors, company stages, and geography.

Talent and Hiring Intelligence

Track companies that are actively hiring and collect optional open role details.

Founder and Ecosystem Mapping

Enable founder extraction to map founder profiles and ecosystem relationships.

Batch and Cohort Analysis

Compare YC cohorts over time using batch, launch timing (launched_at), and growth indicators such as stage and team_size.
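The following sketch shows the cohort and hiring analysis described above. The functions are hypothetical and not part of the actor. They run over dataset items that have already been downloaded as a list of dicts, and they read only the documented batch, is_hiring, and industries fields. The three records in the example are made-up illustrations.

```python
from collections import Counter
from typing import Any, Iterable


def summarize_by_batch(items: Iterable[dict[str, Any]]) -> dict[str, dict[str, int]]:
    """Count companies and actively hiring companies per YC batch label."""
    totals: Counter = Counter()
    hiring: Counter = Counter()
    for item in items:
        batch = item.get("batch") or "Unknown"
        totals[batch] += 1
        if item.get("is_hiring"):
            hiring[batch] += 1
    # Keys are sorted alphabetically by label, which is not chronological order.
    return {b: {"companies": totals[b], "hiring": hiring[b]} for b in sorted(totals)}


def top_industries(items: Iterable[dict[str, Any]], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common entries across all industries arrays."""
    counts = Counter(ind for item in items for ind in (item.get("industries") or []))
    return counts.most_common(n)


# Illustrative records only; real items come from the actor's dataset.
companies = [
    {"batch": "Winter 2009", "is_hiring": False, "industries": ["Consumer", "Travel, Leisure and Tourism"]},
    {"batch": "Winter 2024", "is_hiring": True, "industries": ["B2B"]},
    {"batch": "Winter 2024", "is_hiring": False, "industries": ["B2B", "Consumer"]},
]
print(summarize_by_batch(companies))
print(top_industries(companies, n=2))
```

Sorting batch labels chronologically would require parsing the season and year out of each label. That step is left out here to keep the sketch minimal.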

Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | No | https://www.ycombinator.com/companies/industry/all | YC listing URL to start from. |
| scrape_all_companies | Boolean | No | false | Ignore the result limit and scrape all available listings (bounded by max_pages). |
| results_wanted | Integer | No | 20 | Maximum records to save when scrape-all mode is disabled. |
| max_pages | Integer | No | 5 | Safety cap for listing pagination. |
| proxyConfiguration | Object | No | Apify Proxy (Residential) | Proxy settings for reliability. |
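The interaction between these parameters can be sketched in Python. The helper below is hypothetical and not part of the actor. It applies the documented defaults and omits results_wanted when scrape_all_companies is true, because the actor ignores that limit in scrape-all mode. The proxyConfiguration shape (useApifyProxy plus apifyProxyGroups set to ["RESIDENTIAL"]) is an assumption based on the usual Apify proxy input format, so check it against the actor's input schema before relying on it.

```python
import json
from typing import Any, Optional

# Default start URL from the Input Parameters table.
DEFAULT_URL = "https://www.ycombinator.com/companies/industry/all"


def build_run_input(
    url: str = DEFAULT_URL,
    scrape_all_companies: bool = False,
    results_wanted: int = 20,
    max_pages: int = 5,
    proxy_configuration: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble an actor input object using the documented defaults (hypothetical helper)."""
    if results_wanted < 1:
        raise ValueError("results_wanted must be a positive integer")
    if max_pages < 1:
        raise ValueError("max_pages must be a positive integer")
    if proxy_configuration is None:
        # Assumed Apify proxy input shape for the residential proxy group.
        proxy_configuration = {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    run_input: dict[str, Any] = {
        "url": url,
        "scrape_all_companies": scrape_all_companies,
        "max_pages": max_pages,  # Always applies, even in scrape-all mode.
        "proxyConfiguration": proxy_configuration,
    }
    # results_wanted only limits output when scrape-all mode is disabled.
    if not scrape_all_companies:
        run_input["results_wanted"] = results_wanted
    return run_input


print(json.dumps(build_run_input(results_wanted=20, max_pages=1), indent=2))
```

The printed JSON can be pasted into the actor's JSON input editor. Dropping results_wanted in scrape-all mode is only a readability choice, since the actor ignores it there anyway.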

Output Data

Each dataset item includes:

| Field | Type | Description |
|---|---|---|
| company_id | String | YC company ID |
| slug | String | Company slug |
| company_name | String | Company name |
| url | String | Absolute YC company profile URL |
| short_description | String | One-liner description |
| long_description | String | Detailed company description |
| batch | String | YC batch label |
| status | String | Company status |
| company_location | String | Main location |
| website | String | Company website |
| company_linkedin | String | LinkedIn URL |
| company_x | String | X/Twitter URL |
| team_size | String | Team size |
| market | String | Market/industry |
| industries | Array | Industry taxonomy |
| regions | Array | Region taxonomy |
| stage | String | Company stage |
| top_company | Boolean | Top-company marker |
| is_hiring | Boolean | Hiring signal |
| nonprofit | Boolean | Nonprofit flag |
| launched_at | String | Launch timestamp (ISO) |
| founders | Array | Always an empty array in the current simplified mode |
| open_jobs | Array | Always an empty array in the current simplified mode |
| algolia_raw | Object/null | Internal field, currently set to null |

Usage Examples

Basic Run

{
  "results_wanted": 20,
  "max_pages": 1
}

Batch Filter Run

{
  "url": "https://www.ycombinator.com/companies/industry/all?batch=w2024",
  "results_wanted": 50,
  "max_pages": 3
}
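Filtered listing URLs like the one above can also be built in code. The sketch below is a hypothetical helper that merges query parameters into the default listing URL using Python's urllib.parse. This README only shows the batch=w2024 filter, so any other parameter name is an assumption to verify against the YC directory itself.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Default start URL from the Input Parameters table.
BASE_LISTING_URL = "https://www.ycombinator.com/companies/industry/all"


def listing_url(base: str = BASE_LISTING_URL, **params: str) -> str:
    """Return base with params merged into its query string (hypothetical helper)."""
    parts = urlsplit(base)
    # Keep any query parameters already present on the base URL.
    query = dict(parse_qsl(parts.query))
    query.update(params)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


print(listing_url(batch="w2024"))
# prints https://www.ycombinator.com/companies/industry/all?batch=w2024
```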

Full Directory Scan

{
  "scrape_all_companies": true,
  "max_pages": 50
}

Sample Output

{
  "company_id": "271",
  "slug": "airbnb",
  "company_name": "Airbnb",
  "url": "https://www.ycombinator.com/companies/airbnb",
  "short_description": "Book accommodations around the world.",
  "long_description": "Founded in August of 2008 and based in San Francisco...",
  "batch": "Winter 2009",
  "status": "Public",
  "company_location": "San Francisco, CA, USA",
  "website": "http://airbnb.com",
  "team_size": "6132",
  "market": "Consumer",
  "industries": ["Consumer", "Travel, Leisure and Tourism"],
  "regions": ["United States of America", "America / Canada"],
  "stage": "Growth",
  "top_company": true,
  "is_hiring": false,
  "nonprofit": false,
  "launched_at": "2012-01-17T22:34:16.000Z",
  "founders": [],
  "open_jobs": [],
  "algolia_raw": null
}
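In the output, team_size is a string and launched_at is an ISO 8601 timestamp, so a small normalization step is often useful before analysis. The helpers below are a hypothetical post-processing sketch and not part of the actor. They convert both fields, using the Airbnb values from the sample above. Any team_size that is not a plain digit string becomes None.

```python
from datetime import datetime
from typing import Any, Optional


def parse_team_size(value: Optional[str]) -> Optional[int]:
    """Convert a digit-only team_size string such as "6132" to an int; otherwise None."""
    if value is None:
        return None
    value = value.strip()
    return int(value) if value.isdecimal() else None


def parse_launched_at(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as "2012-01-17T22:34:16.000Z" into an aware datetime."""
    if not value:
        return None
    # Before Python 3.11, datetime.fromisoformat() rejects a trailing "Z", so use "+00:00".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def normalize_item(item: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a dataset item with typed team_size and launched_at values."""
    out = dict(item)
    out["team_size"] = parse_team_size(item.get("team_size"))
    out["launched_at"] = parse_launched_at(item.get("launched_at"))
    return out


sample = {"company_name": "Airbnb", "team_size": "6132", "launched_at": "2012-01-17T22:34:16.000Z"}
print(normalize_item(sample)["team_size"])  # 6132
```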

Tips for Best Results

Start Small

  • Test with results_wanted: 20 and max_pages: 1.

Use Proxy for Stability

  • A residential proxy helps keep extraction reliable on larger runs.

Integrations

  • Google Sheets — Export for quick analysis
  • Airtable — Build filtered company tables
  • Make / Zapier — Trigger downstream automations
  • Webhooks — Push fresh records into your systems
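For the spreadsheet-style integrations, a flat CSV file is often the simplest hand-off, since Google Sheets and Airtable can both import CSV. The sketch below uses Python's standard csv module. The column selection is an arbitrary example. Joining array fields such as industries into one cell is a formatting choice made here, not something the actor does.

```python
import csv
import io
from typing import Any, Iterable

# Example column selection; any field from the Output Data table can be listed here.
COLUMNS = ["company_name", "batch", "status", "company_location", "website", "is_hiring", "industries"]


def to_csv(items: Iterable[dict[str, Any]], columns: list[str] = COLUMNS) -> str:
    """Serialize dataset items to CSV text with one row per company."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for col in columns:
            value = item.get(col)
            if isinstance(value, list):
                # Join array fields (industries, regions) into a single cell.
                value = "; ".join(str(v) for v in value)
            row[col] = "" if value is None else value
        writer.writerow(row)
    return buf.getvalue()


airbnb = {
    "company_name": "Airbnb",
    "batch": "Winter 2009",
    "status": "Public",
    "company_location": "San Francisco, CA, USA",
    "website": "http://airbnb.com",
    "is_hiring": False,
    "industries": ["Consumer", "Travel, Leisure and Tourism"],
}
print(to_csv([airbnb]))
```

Values that contain commas, such as the location, are quoted automatically by the csv module.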

Frequently Asked Questions

Why are some fields null?

Not every company publishes every field, so some values, such as partner and founding metadata, may be missing for some records. In addition, algolia_raw is always null in the current simplified mode.
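A quick completeness check shows which fields are sparse in a given run. The hypothetical function below counts, for each field, how many items hold null, an empty string, or an empty array. Based on the Output Data table, founders, open_jobs, and algolia_raw will currently be flagged for every item. The two rows in the example are illustrative only.

```python
from collections import Counter
from typing import Any, Iterable


def missing_field_counts(items: Iterable[dict[str, Any]]) -> Counter:
    """Count null, empty-string, or empty-array values per field across dataset items.

    Keys absent from an item are not counted; only present-but-empty values are.
    """
    counts: Counter = Counter()
    for item in items:
        for field, value in item.items():
            if value is None or value == "" or value == []:
                counts[field] += 1
    return counts


rows = [
    {"company_name": "A", "website": None, "founders": [], "algolia_raw": None},
    {"company_name": "B", "website": "https://example.com", "founders": [], "algolia_raw": None},
]
print(missing_field_counts(rows).most_common())
```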

Is the Algolia key hardcoded?

No. On each run, the actor reads the current Algolia credentials from the bootstrap data embedded in YC pages.

Can I scrape all companies?

Yes. Enable scrape_all_companies and set max_pages high enough to cover the whole directory, because pagination still stops at the max_pages cap.

Does it support batch-filter URLs?

Yes. You can pass a YC listing URL with filter query parameters (for example, ?batch=w2024) as url, and the actor starts from that filtered listing.

Support

For bug reports or feature requests, use the issue/support channel on the Actor page.

Use this actor responsibly and make sure your usage complies with applicable laws and the source website's terms of service.