Y Combinator Scraper
Pricing
from $1.20 / 1,000 results
Discover the Y Combinator Scraper, an efficient actor for scraping the YC companies directory. Easily extract detailed profiles, founder info, and batch data. Ideal for market research, lead generation, or investment analysis. For best results and to avoid blocks, use residential proxies.
Rating: 3.5 (3 ratings)
Developer: Shahid Irfan
Actor stats: 2 bookmarked, 23 total users, 7 monthly active users
Last modified: 3 days ago
Y Combinator Companies Scraper
Extract structured company data from the Y Combinator directory quickly and reliably. Collect startup profiles, locations, batches, hiring signals, and optional founder/job details for research and analysis.
Features
- Dynamic listing extraction — Collect companies from YC listing pages with pagination support.
- Richer company fields — Enrich records with Algolia-backed fields such as industries, regions, stage, and launch date.
- Optional detail depth — Founder and open-jobs fields are part of the output schema, but in the current simplified mode they are always returned as empty arrays.
- Flexible scope control — Use results limit, max pages, or scrape-all mode.
- Clean dataset output — Ready for BI tools, spreadsheets, APIs, and automations.
Use Cases
Startup Market Research
Build a current dataset of YC companies to analyze sectors, company stages, and geography.
Talent and Hiring Intelligence
Track companies that are actively hiring and collect optional open role details.
Founder and Ecosystem Mapping
Use founder data to map founder profiles and ecosystem relationships; note that the founders field is always an empty array in the current simplified mode.
Batch and Cohort Analysis
Compare YC cohorts over time using batch, launch timing (launched_at), and growth indicators such as stage, status, and team_size.
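For batch and cohort analysis, a minimal post-processing sketch could count companies, and hiring companies, per batch in chronological order. It assumes batch labels shaped like "Winter 2009" (as in the sample output) and a hypothetical season order of Winter, Spring, Summer, Fall; labels that do not match that pattern sort last. The records below are toy values, not real YC data.

```python
from collections import Counter

# Hypothetical helper: assumed order of YC seasons within one calendar year.
SEASON_ORDER = {"Winter": 0, "Spring": 1, "Summer": 2, "Fall": 3}


def batch_sort_key(label):
    """Map "Winter 2009" to (2009, 0, label); unrecognized labels sort last."""
    parts = (label or "").split()
    if len(parts) == 2 and parts[0] in SEASON_ORDER and parts[1].isdigit():
        return (int(parts[1]), SEASON_ORDER[parts[0]], label)
    return (9999, 9, label or "")


def cohort_summary(items):
    """Return (batch, total companies, hiring companies) tuples, oldest batch first."""
    totals = Counter()
    hiring = Counter()
    for item in items:
        batch = item.get("batch") or "Unknown"
        totals[batch] += 1
        if item.get("is_hiring"):
            hiring[batch] += 1
    # Counter returns 0 for batches with no hiring companies.
    return [(batch, totals[batch], hiring[batch]) for batch in sorted(totals, key=batch_sort_key)]


# Toy records using two fields from the output schema.
toy_items = [
    {"batch": "Winter 2009", "is_hiring": False},
    {"batch": "Summer 2024", "is_hiring": True},
    {"batch": "Winter 2024", "is_hiring": True},
    {"batch": "Winter 2024", "is_hiring": False},
]
for batch, total, hiring_count in cohort_summary(toy_items):
    print(f"{batch}: {hiring_count} of {total} hiring")
```

Sorting on a parsed (year, season) key rather than the raw string keeps "Winter 2024" ahead of "Summer 2024", which plain alphabetical order would not.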
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | String | No | https://www.ycombinator.com/companies/industry/all | YC listing URL to start from. |
| scrape_all_companies | Boolean | No | false | Ignore result limit and scrape all available listings (bounded by max pages). |
| results_wanted | Integer | No | 20 | Maximum records to save when scrape-all is disabled. |
| max_pages | Integer | No | 5 | Safety cap for listing pagination. |
| proxyConfiguration | Object | No | Apify Proxy Residential | Proxy settings for reliability. |
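The parameters above can be assembled into a run input programmatically. This is a minimal sketch with a hypothetical helper, build_run_input, that merges overrides onto the defaults from the table and rejects unknown keys or non-positive limits. The proxy object uses Apify's usual useApifyProxy / apifyProxyGroups shape with the RESIDENTIAL group; that is an assumption, and the actor's real default object may differ.

```python
import json

# Defaults taken from the Input Parameters table.
DEFAULTS = {
    "url": "https://www.ycombinator.com/companies/industry/all",
    "scrape_all_companies": False,
    "results_wanted": 20,
    "max_pages": 5,
}


def residential_proxy():
    """Assumed proxy object in Apify's usual input shape (RESIDENTIAL group)."""
    return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}


def build_run_input(**overrides):
    """Hypothetical helper: merge overrides onto the documented defaults and check them."""
    allowed = set(DEFAULTS) | {"proxyConfiguration"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {**DEFAULTS, "proxyConfiguration": residential_proxy(), **overrides}
    for key in ("results_wanted", "max_pages"):
        value = run_input[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{key} must be a positive integer, got {value!r}")
    return run_input


# Mirrors the "Batch Filter Run" usage example.
print(json.dumps(build_run_input(
    url="https://www.ycombinator.com/companies/industry/all?batch=w2024",
    results_wanted=50,
    max_pages=3,
), indent=2))
```

Failing fast on a misspelled key such as max_page avoids a run that silently falls back to the default of 5 pages.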
Output Data
Each dataset item includes:
| Field | Type | Description |
|---|---|---|
| company_id | String | YC company ID |
| slug | String | Company slug |
| company_name | String | Company name |
| url | String | Absolute YC company profile URL |
| short_description | String | One-liner description |
| long_description | String | Detailed company description |
| batch | String | YC batch label |
| status | String | Company status |
| company_location | String | Main location |
| website | String | Company website |
| company_linkedin | String | LinkedIn URL |
| company_x | String | X/Twitter URL |
| team_size | String | Team size |
| market | String | Market/industry |
| industries | Array | Industry taxonomy |
| regions | Array | Region taxonomy |
| stage | String | Company stage |
| top_company | Boolean | Top-company marker |
| is_hiring | Boolean | Hiring signal |
| nonprofit | Boolean | Nonprofit flag |
| launched_at | String | Launch timestamp (ISO) |
| founders | Array | Always empty array in current simplified mode |
| open_jobs | Array | Always empty array in current simplified mode |
| algolia_raw | Object/null | Internal field currently set to null |
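Because some fields can be null and launched_at arrives as an ISO string, it can help to parse each item into a typed record. The sketch below covers only a subset of the fields in the table; the class name YCCompany and the choice to treat a missing is_hiring as "not hiring" are assumptions, not part of the actor.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class YCCompany:
    """Typed view of a subset of one dataset item (names follow the Output Data table)."""
    company_id: str
    slug: str
    company_name: str
    url: str
    batch: Optional[str] = None
    status: Optional[str] = None
    website: Optional[str] = None
    industries: list = field(default_factory=list)
    regions: list = field(default_factory=list)
    is_hiring: bool = False
    launched_at: Optional[datetime] = None

    @classmethod
    def from_item(cls, item: dict) -> "YCCompany":
        raw_launch = item.get("launched_at")
        launched = None
        if raw_launch:
            # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+,
            # so rewrite it as an explicit UTC offset first.
            launched = datetime.fromisoformat(raw_launch.replace("Z", "+00:00"))
        return cls(
            company_id=str(item["company_id"]),
            slug=item["slug"],
            company_name=item["company_name"],
            url=item["url"],
            batch=item.get("batch"),
            status=item.get("status"),
            website=item.get("website"),
            # Missing or null arrays become empty lists.
            industries=list(item.get("industries") or []),
            regions=list(item.get("regions") or []),
            # Assumption: a missing or null hiring flag is treated as "not hiring".
            is_hiring=bool(item.get("is_hiring")),
            launched_at=launched,
        )


sample = {
    "company_id": "271",
    "slug": "airbnb",
    "company_name": "Airbnb",
    "url": "https://www.ycombinator.com/companies/airbnb",
    "batch": "Winter 2009",
    "industries": ["Consumer", "Travel, Leisure and Tourism"],
    "is_hiring": False,
    "launched_at": "2012-01-17T22:34:16.000Z",
}
company = YCCompany.from_item(sample)
print(company.company_name, company.launched_at.year)  # Airbnb 2012
```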
Usage Examples
Basic Run
{"results_wanted": 20, "max_pages": 1}
Batch Filter Run
{"url": "https://www.ycombinator.com/companies/industry/all?batch=w2024", "results_wanted": 50, "max_pages": 3}
Full Directory Scan
{"scrape_all_companies": true, "max_pages": 50}
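To see how results_wanted, max_pages, and scrape_all_companies interact in the three runs above, here is a small hypothetical model of the limits as the Input Parameters table describes them. It is not the actor's code: it assumes collection stops once results_wanted records are reached (when scrape-all is off), that no more than max_pages pages are ever fetched, and it uses a made-up page size of 40 companies.

```python
def simulate_run(page_sizes, results_wanted=20, max_pages=5, scrape_all_companies=False):
    """Hypothetical model of the documented limits; returns (pages_fetched, records_saved).

    page_sizes lists how many companies each listing page would yield. The real
    page size is not documented, so any values passed here are assumptions.
    """
    pages_fetched = 0
    saved = 0
    for size in page_sizes[:max_pages]:  # max_pages caps pagination in every mode
        pages_fetched += 1
        saved += size
        if not scrape_all_companies and saved >= results_wanted:
            # results_wanted only applies when scrape-all is disabled.
            return pages_fetched, results_wanted
    return pages_fetched, saved


assumed_pages = [40] * 10  # assumption: 10 listing pages of 40 companies each
print(simulate_run(assumed_pages, results_wanted=20, max_pages=1))           # (1, 20)
print(simulate_run(assumed_pages, results_wanted=50, max_pages=3))           # (2, 50)
print(simulate_run(assumed_pages, scrape_all_companies=True, max_pages=50))  # (10, 400)
```

The third call shows why a full directory scan needs a generous max_pages: with scrape-all enabled, the page cap is the only limit left.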
Sample Output
{
  "company_id": "271",
  "slug": "airbnb",
  "company_name": "Airbnb",
  "url": "https://www.ycombinator.com/companies/airbnb",
  "short_description": "Book accommodations around the world.",
  "long_description": "Founded in August of 2008 and based in San Francisco...",
  "batch": "Winter 2009",
  "status": "Public",
  "company_location": "San Francisco, CA, USA",
  "website": "http://airbnb.com",
  "team_size": "6132",
  "market": "Consumer",
  "industries": ["Consumer", "Travel, Leisure and Tourism"],
  "regions": ["United States of America", "America / Canada"],
  "stage": "Growth",
  "top_company": true,
  "is_hiring": false,
  "nonprofit": false,
  "launched_at": "2012-01-17T22:34:16.000Z",
  "founders": [],
  "open_jobs": [],
  "algolia_raw": null
}
Tips for Best Results
Start Small
- Test with results_wanted: 20 and max_pages: 1.
Use Proxy for Stability
- A residential proxy helps keep extraction reliable on larger runs.
Integrations
- Google Sheets — Export for quick analysis
- Airtable — Build filtered company tables
- Make / Zapier — Trigger downstream automations
- Webhooks — Push fresh records into your systems
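For the Google Sheets and Airtable routes, a minimal sketch of flattening dataset items into CSV might look like this. It assumes you already have the items as a list of Python dicts (for example, from a JSON export of the dataset); joining array fields with "; " so each fits in one cell is an arbitrary choice, and items_to_csv is a hypothetical helper.

```python
import csv
import io

# Array-typed fields from the Output Data table, joined into one cell each.
ARRAY_FIELDS = {"industries", "regions", "founders", "open_jobs"}


def items_to_csv(items, columns):
    """Flatten dataset items (dicts) into CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for item in items:
        row = {}
        for column in columns:
            value = item.get(column)
            if column in ARRAY_FIELDS and isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            # Null or missing fields become empty cells.
            row[column] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()


airbnb = {
    "company_name": "Airbnb",
    "batch": "Winter 2009",
    "industries": ["Consumer", "Travel, Leisure and Tourism"],
    "website": "http://airbnb.com",
}
print(items_to_csv([airbnb], ["company_name", "batch", "industries", "website"]))
```

The csv module quotes the joined industries cell automatically because it contains a comma, so the file still imports as four columns.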
Frequently Asked Questions
Why are some fields null?
Some companies do not publish every field publicly. Partner and founding metadata may be missing for some records.
Is the Algolia key hardcoded?
No. The actor reads the current Algolia credentials from the YC page's bootstrap data on each run.
Can I scrape all companies?
Yes. Enable scrape_all_companies and set a suitable max_pages value.
Does it support batch-filter URLs?
Yes. You can pass YC listing URLs with query parameters and the actor will follow them.
Support
For bug reports or feature requests, use the issue/support channel on the Actor page.
Legal Notice
Use this actor responsibly and ensure your usage complies with applicable laws and the source website terms.