Y Combinator [Only $1💰] Jobs & Companies scraper
Pricing: from $1.00 / 1,000 results
💰 $1/1K. One actor for Y Combinator jobs (Work at a Startup) and companies (Startup Directory). Paste any YC URL (auto-routed) or use filters. Companies via Algolia: no proxy, clean schema. Optional founder enrichment: LinkedIn/Twitter URLs, company socials, open jobs. Full batch history back to 2005.
Developer: Muhamed Didovic
Last modified: 20 hours ago
Y Combinator Scraper
Y Combinator data, structured. Jobs, companies, founders, socials: one actor, no proxy needed for companies.
Scrape both surfaces of ycombinator.com from a single Apify actor: jobs (Work at a Startup) and companies (Startup Directory). The actor auto-routes any YC URL to the right scraper, or you can compose a query from filters.
How it works

✨ Why use this scraper?
- Two surfaces, one actor. Jobs (Work at a Startup) and Companies (Startup Directory) land in the same dataset, distinguishable by row shape (`jobId` vs `slug`).
- Companies are fetched via YC's public Algolia index: no HTML parsing, no proxy required, ~2 s for 100 rows. Clean, structured fields with no scraping artefacts (no leaked alt text, no concatenated values).
- Rich output schemas. ~33 fields per job (salary parsed into min/max/currency, equity, founders with bios, JSON-LD `datePosted`). ~26 fields per company (batch, industries, regions, team size, stage, status, top-company / hiring / nonprofit flags).
- Optional founder enrichment with proper `name`/`title` separation (e.g. `Brian Chesky` / `Founder/CEO`, rather than splitting on whitespace), plus LinkedIn and Twitter URLs per founder and company-level socials (linkedin, twitter, facebook, crunchbase, github).
- Optional open-jobs enrichment per company, with cleanly separated `title`/`salary`/`location`/`equity`/`experience` fields.
- Multi-keyword company discovery: pass several keywords; each runs as a separate Algolia search, and the results are merged and deduplicated by company id.
- Full YC batch history back to Summer 2005.
Overview
Built for recruiters, sourcers, BD/sales teams, investors, and anyone doing market research on YC-backed startups. The actor produces a heterogeneous dataset: each row is either a job posting or a company profile. You can scrape jobs and companies in the same run (mix URLs of both kinds) and tell them apart in downstream tooling by the presence of `jobId` (jobs) vs `slug` (companies).
Companies mode goes through YC's Algolia search API: fast, no proxy, no HTML parsing. Jobs mode uses Crawlee + Cheerio against YC's server-rendered job listing and detail pages.
Supported inputs
Jobs URLs
| Pattern | What it does |
|---|---|
| `/jobs` | YC's curated jobs index (~20 jobs) |
| `/jobs/role/{role}` | All jobs in a role (`software-engineer`, `designer`, `product-manager`, `operations`, `marketing`, `sales-manager`, `recruiting-hr`, `support`, `science`) |
| `/jobs/role/{role}/{location}` | Role + location (`san-francisco`, `new-york`, `los-angeles`, `seattle`, `austin`, `chicago`, `india`, `remote`); the location is applied locally because YC filters it client-side |
| `/jobs/location/{location}` | Location-only listing |
| `/companies/{co}/jobs/{job}` | A single job-detail page |
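The routing rules in the table above can be sketched as a small path parser. This is a hypothetical illustration, not the actor's code: the `JobsTarget` type and `parseJobsUrl` function are names invented here, and only the URL shapes come from the table.

```typescript
// Hypothetical result type: one variant per jobs URL pattern in the table.
type JobsTarget =
  | { kind: "jobs-index" }
  | { kind: "jobs-role"; role: string; location?: string }
  | { kind: "jobs-location"; location: string }
  | { kind: "job-detail"; companySlug: string; jobId: string };

// Classifies a ycombinator.com URL by its path; returns null for non-jobs URLs.
function parseJobsUrl(raw: string): JobsTarget | null {
  // Split the path and drop empty segments so a trailing slash is harmless.
  const parts = new URL(raw).pathname.split("/").filter((p) => p.length > 0);

  if (parts.length === 1 && parts[0] === "jobs") {
    return { kind: "jobs-index" };
  }
  if (parts[0] === "jobs" && parts[1] === "role" && parts.length >= 3) {
    return parts.length >= 4
      ? { kind: "jobs-role", role: parts[2], location: parts[3] }
      : { kind: "jobs-role", role: parts[2] };
  }
  if (parts[0] === "jobs" && parts[1] === "location" && parts.length >= 3) {
    return { kind: "jobs-location", location: parts[2] };
  }
  if (parts.length === 4 && parts[0] === "companies" && parts[2] === "jobs") {
    return { kind: "job-detail", companySlug: parts[1], jobId: parts[3] };
  }
  return null; // e.g. /companies or /companies/{slug}, which belong to the Companies path
}
```

Returning `null` rather than throwing lets a caller try the Companies rules next, which mirrors the "paste any YC URL" behaviour described above.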
Companies URLs
| Pattern | What it does |
|---|---|
| `/companies` | All companies (paginated through Algolia) |
| `/companies?batch=…&industry=…&query=…&isHiring=true&top_company=true&minEmployeeSize=10%2B&maxEmployeeSize=100` | Companies search with any combination of filters |
| `/companies/{slug}` | Single-company lookup (e.g. `…/companies/airbnb`) |
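A companies search URL carries its filters in the query string, so turning it into a filter object is mostly a matter of reading `URLSearchParams`. The sketch below is illustrative: `CompanyFilters` and `parseCompaniesSearchUrl` are hypothetical names, and encoding multi-valued filters as repeated keys (`?batch=A&batch=B`) is an assumption. One detail it does show reliably: a literal `+` in a query string decodes to a space, which is why the example URL encodes `10+` as `10%2B`.

```typescript
// Hypothetical filter shape; keys mirror the documented query parameters.
interface CompanyFilters {
  batch: string[];
  industry: string[];
  query?: string;
  isHiring?: boolean;
  topCompany?: boolean;
  minEmployeeSize?: string;
  maxEmployeeSize?: string;
}

function parseCompaniesSearchUrl(raw: string): CompanyFilters {
  // URLSearchParams percent-decodes values: "Spring%202026" -> "Spring 2026", "10%2B" -> "10+".
  const params = new URL(raw).searchParams;
  const optional = (name: string): string | undefined => params.get(name) ?? undefined;
  const flag = (name: string): boolean | undefined =>
    params.has(name) ? params.get(name) === "true" : undefined;

  return {
    // Assumption: multi-valued filters arrive as repeated keys.
    batch: params.getAll("batch"),
    industry: params.getAll("industry"),
    query: optional("query"),
    isHiring: flag("isHiring"),
    topCompany: flag("top_company"), // snake_case key in the URL
    minEmployeeSize: optional("minEmployeeSize"),
    maxEmployeeSize: optional("maxEmployeeSize"),
  };
}
```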
Filter form (when no URLs)
`mode` = `jobs` (default) or `companies`. Then the matching filter set:
- 💼 Jobs: `role`, `location`.
- 🏢 Companies: `queries[]`, `topCompany`, `isHiring`, `nonprofit`, `batch[]`, `industries[]`, `regions[]`, `minEmployeeSize`, `maxEmployeeSize`.
🎯 Use cases
| Team | Typical use |
|---|---|
| Recruiters / talent sourcing | Pull active YC job postings filtered by role + city, watch for new postings in monitoringMode |
| Investors / VCs | Track company batches, stages, and team sizes across every YC cohort back to 2005 |
| BD / sales | Build a target list of YC companies by industry, region, employee size; enrich with founder LinkedIns for outreach |
| Founder / market research | Discover companies by keyword across the full directory; find similar / competitive companies to map a market |
| Data engineering / pipelines | Schedule daily runs into a warehouse for YC startup intelligence; monitoringMode keeps datasets incremental |
How a run works, step by step:
- Input. You provide YC URLs (Option A) or filters (Option B). Mix Jobs and Companies URLs freely; the actor routes each one.
- Jobs URLs go to a Crawlee `CheerioCrawler`. The listing's inlined `jobPostings` JSON is parsed, and the location slug from the URL is applied as a local substring filter against each job's location string (YC's listings filter location client-side, so the actor compensates).
- Companies URLs go to YC's public Algolia index (`YCCompany_production`). Algolia filters compose with AND across attributes and OR within an attribute. Multi-keyword runs union the results and dedupe by `objectID`.
- Optional company enrichment. When `scrapeFounderDetails` or `scrapeOpenJobs` is on, the actor fetches `/companies/{slug}` and/or `/companies/{slug}/jobs` for each row and merges the parsed founders, socials, and open jobs into the company row. Concurrency = 5.
- Output to dataset. Job rows and company rows go into the same dataset; jobs are also exported as `data.csv` / `data.json`.
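The merge-and-dedupe step for multi-keyword runs can be sketched as a first-occurrence union keyed by `objectID`, with the shared `maxItems` budget applied while results are collected. This is an illustration, not the actor's code: `Hit` and `mergeKeywordResults` are hypothetical names, and the sketch works on already-fetched result lists, whereas the actor may simply skip searching later keywords once the budget is used up.

```typescript
// Minimal Algolia-style hit; only objectID is relied on here.
interface Hit {
  objectID: string;
  [field: string]: unknown;
}

// Union per-keyword result lists in keyword order, keeping the first occurrence of
// each objectID and stopping once maxItems unique hits have been collected.
function mergeKeywordResults(resultsPerKeyword: Hit[][], maxItems: number): Hit[] {
  const seen = new Set<string>();
  const merged: Hit[] = [];
  for (const hits of resultsPerKeyword) {
    for (const hit of hits) {
      if (merged.length >= maxItems) return merged; // shared budget exhausted
      if (seen.has(hit.objectID)) continue; // already returned by an earlier keyword
      seen.add(hit.objectID);
      merged.push(hit);
    }
  }
  return merged;
}
```

Because earlier keywords fill the budget first, a small `maxItems` can leave later keywords with no room, which matches the limitation noted further down.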
The banner source is `readme-stuff/how-it-works-yc-v1.svg`; edit the SVG and re-rasterize it when you change the copy. A 2× retina version is at `readme-stuff/how-it-works-yc-v1@2x.png` (this is the version hosted for the banner). The hosted PNG is served from a GitHub Pages repo so it renders in both GitHub and the Apify Console.
Quick start
```jsonc
// Recent SF software-engineering jobs
{ "mode": "jobs", "role": "software-engineer", "location": "san-francisco", "maxItems": 50 }

// Top hiring B2B companies from the last two batches, 100-500 employees
{
  "mode": "companies",
  "batch": ["Spring 2026", "Winter 2026"],
  "industries": ["B2B"],
  "minEmployeeSize": "100+",
  "maxEmployeeSize": "500",
  "isHiring": true,
  "topCompany": true,
  "maxItems": 100
}

// Multi-keyword company discovery (results merged + deduped by company id)
{ "mode": "companies", "queries": ["dev tools", "observability", "feature flags"], "maxItems": 50 }

// Companies + founder/social/jobs enrichment (one extra HTTP request per company per toggle)
{
  "mode": "companies",
  "batch": ["Spring 2026"],
  "isHiring": true,
  "scrapeFounderDetails": true,
  "scrapeOpenJobs": true,
  "maxItems": 25
}

// Or skip the form and paste any YC URL; the actor auto-routes
{
  "startUrls": [
    "https://www.ycombinator.com/jobs/role/software-engineer/san-francisco",
    "https://www.ycombinator.com/companies?batch=Spring%202026&industry=Healthcare",
    "https://www.ycombinator.com/companies/airbnb"
  ],
  "maxItems": 50
}
```
Input configuration
The form has three top-level sections. Option A and Option B are alternatives, not steps; Run options apply to both regardless of which you pick.
- Option A | Search by URL: Jobs or Companies (recommended). Uses `startUrls[]`; if it is non-empty, Option B is ignored.
- Option B | Configure with Filters: used only when Option A is empty. Field titles inside the panel are prefixed by category:
  - `Mode`: `"jobs"` (default) or `"companies"`.
  - `💼 Jobs · …`: `role`, `location`.
  - `🏢 Companies · …`: `queries[]`, `topCompany`, `isHiring`, `nonprofit`, `batch[]`, `industries[]`, `regions[]`, `minEmployeeSize`, `maxEmployeeSize`.
  - `Companies Enrich · …`: `scrapeFounderDetails`, `scrapeOpenJobs` (each adds one HTTP request per company; concurrency 5).
- Run options (apply to both): shared run-time settings:
  - `maxItems`: output cap (max records to return, default 100). Applies to both modes.
  - `maxPages`: pagination depth (max listing pages per URL, default 10). Jobs scraper only; Companies paginates Algolia automatically.
  - `monitoringMode`, `maxConcurrency`, `minConcurrency`, `maxRequestRetries`, `proxy`: Jobs scraper only. Companies hits Algolia directly and ignores them.
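The precedence rule (Option A wins whenever it has URLs) and the documented run-option defaults can be expressed as small pure functions. This is a sketch: `ActorInput`, `Plan`, `resolvePlan` and `runOptions` are hypothetical names, `startUrls` is modeled as plain strings as in the quick-start examples, and ignoring blank entries is an assumption.

```typescript
// Hypothetical input shape covering a subset of the documented fields.
interface ActorInput {
  startUrls?: string[];
  mode?: "jobs" | "companies";
  maxItems?: number;
  maxPages?: number;
}

type Plan =
  | { source: "urls"; urls: string[] }
  | { source: "filters"; mode: "jobs" | "companies" };

// Option A wins whenever it has at least one non-blank URL; otherwise use Option B.
function resolvePlan(input: ActorInput): Plan {
  const urls = (input.startUrls ?? []).map((u) => u.trim()).filter((u) => u.length > 0);
  if (urls.length > 0) return { source: "urls", urls };
  return { source: "filters", mode: input.mode ?? "jobs" }; // "jobs" is the default mode
}

// Shared run options with the documented defaults (maxItems 100, maxPages 10).
function runOptions(input: ActorInput): { maxItems: number; maxPages: number } {
  return { maxItems: input.maxItems ?? 100, maxPages: input.maxPages ?? 10 };
}
```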
Multi-select filters (`batch[]`, `industries[]`, `regions[]`) use AND across attributes and OR within an attribute. The "All …" sentinel values (the defaults) are stripped before the Algolia query, so leaving them selected means no filter.
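Algolia's `facetFilters` parameter expresses exactly this combination: entries of the outer array are ANDed, and values grouped in an inner array are ORed. A minimal builder, assuming the facet attribute names match the filter names and that every sentinel value starts with `"All "` (both are assumptions; `buildFacetFilters` is a hypothetical name):

```typescript
// Outer array = AND across attributes; each inner array = OR within one attribute.
type FacetFilters = string[][];

// Assumption: sentinel values look like "All batches", "All industries", ...
const isSentinel = (value: string): boolean => value.startsWith("All ");

function buildFacetFilters(filters: Record<string, string[]>): FacetFilters {
  const out: FacetFilters = [];
  for (const [attribute, values] of Object.entries(filters)) {
    const real = values.filter((v) => !isSentinel(v));
    // An empty group (nothing selected, or only the sentinel) adds no constraint.
    if (real.length > 0) out.push(real.map((v) => `${attribute}:${v}`));
  }
  return out;
}
```

For example, two batches plus an untouched industries field yield a single OR group for `batch` and nothing for `industries`.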
Non-paying users are capped at 100 items and have monitoringMode disabled.
Note on YC's location filtering
YC's listing pages include a location segment in the URL but apply that filter client-side: the server-rendered `jobPostings` JSON is filtered by role only. This actor compensates by parsing the location slug out of the URL and applying a substring filter against each job's location string locally. Special cases: `remote` matches anything containing "Remote"; `india` matches the country (`/\b(india|IN)\b/i`). Cities use a case-insensitive substring match against the slug, with `-` replaced by a space.
A job listed as "San Francisco, CA, US / Remote (US)" matches both the san-francisco and remote slugs.
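The matching rules above fit in a few lines. This sketch (`matchesLocationSlug` is a hypothetical name) follows the description with two labeled choices: the `remote` check is case-insensitive (an assumption), and the country code is matched case-sensitively on purpose. Because of the `i` flag, the documented pattern `/\b(india|IN)\b/i` also matches the standalone English word "in" (for example in "Remote in US"); the sketch avoids that, so it deliberately deviates from the documented regex.

```typescript
// Returns true if a job's location string matches a URL location slug.
function matchesLocationSlug(location: string, slug: string): boolean {
  if (slug === "remote") {
    // Assumption: case-insensitive, like the city match.
    return /remote/i.test(location);
  }
  if (slug === "india") {
    // Word "India" in any case, or the uppercase country code "IN" only.
    return /\bindia\b/i.test(location) || /\bIN\b/.test(location);
  }
  // Cities: "san-francisco" -> "san francisco", then a case-insensitive substring check.
  const needle = slug.replace(/-/g, " ").toLowerCase();
  return location.toLowerCase().includes(needle);
}
```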
Output overview
Heterogeneous dataset: job rows and company rows have different shapes but share the same dataset. Distinguish them by:
- the presence of `jobId` (jobs) vs `slug` (companies), or
- the `url` path (`/companies/{co}/jobs/{job}` vs `/companies/{slug}`).
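Downstream, the two checks above translate into a type guard and a URL test. The row interfaces here are hypothetical trimmed-down shapes (real rows carry many more fields), and `isJobRow`, `isJobUrl` and `splitRows` are illustrative names. Note that job rows carry `companySlug`, not `slug`, so the key check does not confuse the two.

```typescript
// Hypothetical minimal row shapes; real rows carry ~33 / ~26 fields.
interface JobRow { jobId: string; url: string; title: string }
interface CompanyRow { slug: string; url: string; name: string }
type Row = JobRow | CompanyRow;

// Primary check: only job rows have a jobId key.
const isJobRow = (row: Row): row is JobRow => "jobId" in row;

// Fallback check on the url path: /companies/{co}/jobs/{job} vs /companies/{slug}.
const isJobUrl = (url: string): boolean =>
  /^\/companies\/[^/]+\/jobs\/[^/]+\/?$/.test(new URL(url).pathname);

function splitRows(rows: Row[]): { jobs: JobRow[]; companies: CompanyRow[] } {
  const jobs: JobRow[] = [];
  const companies: CompanyRow[] = [];
  for (const row of rows) {
    if (isJobRow(row)) jobs.push(row);
    else companies.push(row); // narrowed to CompanyRow here
  }
  return { jobs, companies };
}
```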
Output samples
Job row (truncated for display)
Captured with startUrls=["https://www.ycombinator.com/jobs/role/software-engineer/san-francisco"], maxItems=1:
```json
{
  "jobId": "gD334As-systems-engineer",
  "title": "Systems Engineer",
  "url": "https://www.ycombinator.com/companies/substack/jobs/gD334As-systems-engineer",
  "companyName": "Substack",
  "companySlug": "substack",
  "companyUrl": "https://www.ycombinator.com/companies/substack",
  "ycBatch": "W18",
  "jobType": "Full-time",
  "roleCategory": "Engineering",
  "roleSubcategory": "Devops",
  "salaryRange": "$185K - $225K",
  "salaryMin": 185000,
  "salaryMax": 225000,
  "salaryCurrency": "USD",
  "equity": "",
  "location": "San Francisco, CA, US / New York, NY, US",
  "postedAgo": "2 days",
  "experience": "6+ years",
  "visaSponsorship": "",
  "description": "Substack is building a new economic engine for culture …",
  "descriptionHtml": "<p>Substack is building …</p>",
  "companyDescription": "Start a newsletter. Build your community. …",
  "companyFounded": 2017,
  "companyTeamSize": 90,
  "companyStatus": "Active",
  "companyLocation": "San Francisco",
  "founders": [
    { "name": "Chris Best", "role": "Co-founder & CEO at Substack…" },
    { "name": "Hamish McKenzie", "role": "COO" },
    { "name": "Jairaj Sethi", "role": "…" }
  ],
  "datePosted": "2025-09-11T18:28:34Z",
  "companyWebsite": "https://substack.com",
  "scrapedAt": "2026-05-02T10:49:39.558Z"
}
```
Company row, with both enrichment toggles on (truncated)
Captured with startUrls=["https://www.ycombinator.com/companies/airbnb"], scrapeFounderDetails=true, scrapeOpenJobs=true:
```json
{
  "id": 271,
  "slug": "airbnb",
  "name": "Airbnb",
  "url": "https://www.ycombinator.com/companies/airbnb",
  "batch": "Winter 2009",
  "industry": "Consumer",
  "subindustry": "Consumer -> Travel, Leisure and Tourism",
  "industries": ["Consumer", "Travel, Leisure and Tourism"],
  "regions": ["United States of America", "America / Canada"],
  "allLocations": "San Francisco, CA, USA",
  "oneLiner": "Book accommodations around the world.",
  "teamSize": 6132,
  "status": "Public",
  "stage": "Growth",
  "topCompany": true,
  "isHiring": false,
  "nonprofit": false,
  "launchedAt": "2012-01-17T09:00:56.000Z",
  "website": "http://airbnb.com",
  "tags": ["Marketplace", "Travel"],
  "formerNames": [],
  "appVideoPublic": false,
  "demoDayVideoPublic": false,
  "founders": [
    {
      "name": "Brian Chesky",
      "title": "Founder/CEO",
      "linkedinUrl": "https://www.linkedin.com/in/brianchesky/",
      "twitterUrl": "https://twitter.com/bchesky",
      "isActive": true,
      "hasEmail": true
    },
    { "name": "Nathan Blecharczyk", "title": "Founder/CTO", "linkedinUrl": "…", "twitterUrl": "…" },
    { "name": "Joe Gebbia", "title": "Founder/CPO", "linkedinUrl": "…", "twitterUrl": "…" }
  ],
  "socials": {
    "linkedin": "https://www.linkedin.com/company/airbnb/",
    "twitter": "https://twitter.com/Airbnb",
    "facebook": "https://www.facebook.com/airbnb/",
    "crunchbase": "https://www.crunchbase.com/organization/airbnb"
  },
  "openJobs": [],
  "scrapedAt": "2026-05-02T10:49:25.606Z"
}
```
Output fields
Jobs row (~33 fields)
| Field | Description |
|---|---|
| `jobId` | YC's job id, e.g. `gD334As-systems-engineer`. |
| `title`, `url` | Posting title and absolute URL (`https://www.ycombinator.com/companies/{co}/jobs/{id}`). |
| `companyName`, `companySlug`, `companyUrl`, `companyTagline` | Company display name, slug, profile URL, one-liner. |
| `ycBatch` | YC batch (e.g. `S21`, `W18`). |
| `jobType`, `roleCategory`, `roleSubcategory` | E.g. `Full-time`, `Engineering`, `Devops`. |
| `salaryRange` | Raw display string from the listing (`"$185K - $225K"`). |
| `salaryMin`, `salaryMax`, `salaryCurrency` | Parsed numeric range and currency (USD/GBP/EUR/INR). |
| `equity` | Equity range string from YC. |
| `location` | Job location string from YC. |
| `postedAgo` | Relative time from the listing (e.g. `"2 days"`). |
| `applyUrl` | Direct apply link (the YC OAuth bridge to workatastartup.com). |
| `experience` | Minimum experience requirement. |
| `visaSponsorship` | `"Will sponsor"` if offered, else empty. |
| `description`, `descriptionHtml` | Full description as text and as HTML (preserves newlines). |
| `companyDescription`, `companyFounded`, `companyTeamSize`, `companyStatus`, `companyLocation` | Company metadata. |
| `founders` | Array of `{ name, role }`; `role` falls back through `founder_bio` → `title` → `"Founder"`. |
| `datePosted` | ISO datetime from the embedded JobPosting JSON-LD. |
| `companyWebsite`, `companyLogo` | External website and YC small-logo URL. |
| `scrapedAt` | ISO timestamp of when the row was produced. |
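The relationship between `salaryRange` and the parsed `salaryMin` / `salaryMax` / `salaryCurrency` fields can be illustrated with a small parser. The rules here are assumptions, not the actor's parser: a leading symbol maps to one of the four documented currencies, a `K`/`M` suffix scales the number, and a single figure is treated as both ends of the range. `parseSalaryRange` and the constant names are hypothetical.

```typescript
interface ParsedSalary {
  min: number | null;
  max: number | null;
  currency: string | null;
}

// Assumed symbol map for the four documented currencies.
const CURRENCY_BY_SYMBOL: Record<string, string> = { $: "USD", "£": "GBP", "€": "EUR", "₹": "INR" };
// Suffix multipliers; "" means a plain number.
const MULTIPLIER: Record<string, number> = { "": 1, k: 1000, m: 1000000 };

function parseSalaryRange(range: string): ParsedSalary {
  const cleaned = range.replace(/,/g, ""); // "$185,000" -> "$185000"
  const amountRe = /(\d+(?:\.\d+)?)\s*([kKmM]?)/g;
  const amounts: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = amountRe.exec(cleaned)) !== null) {
    amounts.push(Math.round(parseFloat(match[1]) * (MULTIPLIER[match[2].toLowerCase()] ?? 1)));
  }
  const symbol = Object.keys(CURRENCY_BY_SYMBOL).find((s) => cleaned.includes(s));
  return {
    min: amounts.length > 0 ? amounts[0] : null,
    // A single figure ("$150K") is treated as both ends of the range.
    max: amounts.length > 1 ? amounts[1] : amounts.length === 1 ? amounts[0] : null,
    currency: symbol !== undefined ? CURRENCY_BY_SYMBOL[symbol] : null,
  };
}
```

With the sample row's `"$185K - $225K"`, this yields `185000`, `225000` and `"USD"`, consistent with the fields shown in the job-row output sample.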
Companies row (~26 fields, plus enrichment fields)
| Field | Description |
|---|---|
| `id`, `slug`, `name` | YC company id, slug, display name. |
| `url` | Profile URL (`https://www.ycombinator.com/companies/{slug}`). |
| `batch` | E.g. `"Winter 2009"`, `"Spring 2026"`. |
| `industry`, `subindustry`, `industries[]` | Top-level industry, second-level subindustry, full industries array. |
| `regions[]` | HQ region tags. |
| `allLocations` | Free-form location string. |
| `oneLiner`, `longDescription` | Short tagline and full description. |
| `teamSize` | Headcount, or `null`. |
| `status` | `"Active"` / `"Acquired"` / `"Public"` / `"Inactive"` / etc. |
| `stage` | YC's stage label: `"Early"` / `"Growth"` / `"Public"` / etc. |
| `topCompany`, `isHiring`, `nonprofit` | Booleans. |
| `launchedAt` | ISO datetime (converted from epoch). |
| `website` | External company website. |
| `logo` | Small-logo URL from YC's S3. |
| `tags[]` | Free-form YC tags. |
| `formerNames[]` | Past company names, if YC has them. |
| `appVideoPublic`, `demoDayVideoPublic` | Whether YC's videos are publicly available. |
| `scrapedAt` | ISO timestamp. |
With `scrapeFounderDetails: true`:

| Field | Description |
|---|---|
| `founders[]` | `{ name, title, bio, linkedinUrl, twitterUrl, avatarUrl, isActive, hasEmail }`. `name` and `title` are properly separated. |
| `socials` | `{ linkedin, twitter, facebook, crunchbase, github }` (non-empty values only). |
| `appVideoUrl`, `demoDayVideoUrl` | Direct video URLs when YC has them and they're public. |
With `scrapeOpenJobs: true`:

| Field | Description |
|---|---|
| `openJobs[]` | `{ jobId, title, url, type, roleCategory, roleSubcategory, salaryRange, equity, location, experience, applyUrl, postedAgo, visaSponsorship }`. |
Monitoring mode
When monitoringMode is enabled (jobs only), the actor only emits jobs whose numeric id has not been seen in previous runs by the same Apify user. Useful for:
- Tracking new YC job postings as they appear
- Building a historical archive without re-scraping
- Keeping downstream notifications free of duplicates
The actor maintains a per-user Key-Value store keyed `YC-JOBS-SEEN-{apifyUserId}`. On each run with `monitoringMode: true`, every listing job is checked against this store; new ids are added (with a small stub: `id`, `url`, `title`), and only those new jobs are enqueued for detail scraping. To reset, delete the corresponding KV store in the Apify Console.
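The dedupe step reduces to a membership check that also records what it lets through. In this sketch a `Map` stands in for the per-user store; a real actor would load and save that state through the Apify SDK (for example `Actor.openKeyValueStore()` with `getValue` / `setValue`). The record layout (numeric id mapped to the `{ id, url, title }` stub) and the names `filterNewJobs`, `seenStoreKey`, `ListingJob` and `SeenStore` are assumptions.

```typescript
// Stub stored per seen job, following the { id, url, title } description above.
interface SeenStub { id: number; url: string; title: string }
// In-memory stand-in for the persisted store (layout is an assumption).
type SeenStore = Map<number, SeenStub>;

interface ListingJob { id: number; url: string; title: string }

// Returns only never-seen jobs and records them, so the next run skips them.
function filterNewJobs(listing: ListingJob[], seen: SeenStore): ListingJob[] {
  const fresh: ListingJob[] = [];
  for (const job of listing) {
    if (seen.has(job.id)) continue; // seen in an earlier run (or earlier in this listing)
    seen.set(job.id, { id: job.id, url: job.url, title: job.title });
    fresh.push(job); // only these would be enqueued for detail scraping
  }
  return fresh;
}

// Store name per the README; userId would come from the Apify environment.
const seenStoreKey = (userId: string): string => `YC-JOBS-SEEN-${userId}`;
```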
monitoringMode is currently jobs-only. Companies-mode dedup is planned but not implemented.
Local development
```bash
npm install
npm run start:dev   # runs src/main.ts via tsx
npm run build       # tsc to dist/
npm run lint        # eslint src/
```
Local input is read from storage/key_value_stores/default/INPUT.json. For fully isolated runs set APIFY_LOCAL_STORAGE_DIR and CRAWLEE_STORAGE_DIR to a temp path.
Limitations / known gaps
- Single-company URL uses an Algolia full-text query plus a post-filter to the exact slug (slug isn't one of YC's filterable attributes). Returns at most 1 row, regardless of `maxItems`.
- Companies mode ignores `monitoringMode`; dedup is jobs-only for now.
- Mixed start URLs (jobs + companies): `maxItems` is shared. Companies are processed first; jobs get the remainder.
- Multi-`queries` budget is shared across keywords. If `maxItems` is small, later keywords may not fire.
- Companies enrichment cost: `scrapeFounderDetails` and `scrapeOpenJobs` each add one HTTP request per company (concurrency = 5). For 100 companies, expect roughly 10 to 20 seconds extra per toggle.
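The enrichment estimate above follows from a bounded pool: with 5 requests in flight, 100 companies take about 20 rounds, so 10 to 20 seconds per toggle corresponds to roughly 0.5 to 1 second per request (a back-of-envelope reading, not a measured figure). A minimal pool in that spirit, assuming nothing about how the actor actually schedules its requests (`mapWithConcurrency` is a hypothetical name):

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain. Claiming is safe
  // without locks because JavaScript runs this synchronous code on a single thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const poolSize = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: poolSize }, () => runner()));
  return results;
}
```

Usage would look like `await mapWithConcurrency(companies, 5, fetchCompanyPage)`, where `fetchCompanyPage` stands for whatever per-company request the enrichment performs.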
FAQ
Can I scrape both jobs and companies in the same run?
Yes. Mix any `/jobs/...` and `/companies...` URLs in `startUrls`; each one auto-routes. Because `maxItems` is shared, the companies path consumes its share first and jobs get the remainder.
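The shared cap amounts to simple arithmetic: companies take up to `maxItems`, and jobs receive whatever is left. A hypothetical helper (`splitBudget` is not part of the actor) makes the split explicit:

```typescript
// Hypothetical helper: companies are served first from the shared cap,
// jobs receive whatever remains of maxItems.
function splitBudget(maxItems: number, companiesAvailable: number): { companies: number; jobs: number } {
  const companies = Math.min(Math.max(0, companiesAvailable), maxItems);
  return { companies, jobs: maxItems - companies };
}
```

For instance, with `maxItems: 50` and 30 matching companies, jobs get the remaining 20; with 80 matching companies, jobs get none.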
Does the Companies path need a proxy?
No. It hits YC's public Algolia index directly, so there are no rate-limit issues or IP blocks to work around. The `proxy` setting only applies to the Jobs scraper.
What does the Jobs location slug actually filter on?
A case-insensitive substring of the slug (with `-` replaced by a space) is matched against each job's location string. `remote` matches anything containing "Remote"; `india` matches the country code `IN` or the word India. YC applies that filter client-side, so the actor replicates it locally.
Can I get founders' LinkedIn URLs?
Yes โ set scrapeFounderDetails: true. Each founder row includes linkedinUrl and twitterUrl parsed from the inlined company JSON on /companies/{slug}.
Does scrapeOpenJobs overlap with running Jobs mode?
Different surface. scrapeOpenJobs enriches a company row with that company's open postings. Jobs mode (or a /jobs/... URL) returns each job as its own row in the dataset.
What's monitoringMode and when should I use it?
Jobs-only flag that dedupes against a per-user KV store of seen job ids. Use it for scheduled runs that should only emit new postings.
Can I use the actor with a single-company URL like /companies/airbnb?
Yes. It returns at most one row regardless of `maxItems`, because slug isn't a filterable Algolia attribute: the actor full-text-searches the slug and post-filters to an exact match.
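The post-filter step can be sketched as follows; it also shows why the result is at most one row (zero if the search does not surface the exact slug). `CompanyHit` and `pickExactSlug` are hypothetical names, and the sketch assumes hits expose a `slug` field.

```typescript
interface CompanyHit { slug: string; name: string }

// Full-text search can return near matches (other companies mentioning the term),
// so keep only the hit whose slug is an exact match, and at most one of them.
function pickExactSlug(hits: CompanyHit[], slug: string): CompanyHit[] {
  const match = hits.find((h) => h.slug === slug);
  return match ? [match] : [];
}
```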
How do I control cost?
Set `maxItems` deliberately; it is the main cost lever. Companies mode pages through Algolia 100 hits at a time; jobs mode follows pagination up to `maxPages`. Disable both enrichment toggles if you don't need founders/socials/jobs, since each adds one HTTP request per row.
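For a rough upper bound on HTTP traffic in companies mode, the numbers above combine as one Algolia page per 100 rows plus one request per row for each enrichment toggle. This hypothetical `estimateCompanyRequests` helper ignores multi-keyword runs (one search per keyword), retries, and runs that return fewer rows than `maxItems`.

```typescript
// Back-of-envelope upper bound on HTTP requests for a companies-mode run.
function estimateCompanyRequests(
  maxItems: number,
  opts: { scrapeFounderDetails?: boolean; scrapeOpenJobs?: boolean } = {},
): number {
  const algoliaPages = Math.ceil(maxItems / 100); // 100 hits per Algolia page
  const togglesOn = Number(Boolean(opts.scrapeFounderDetails)) + Number(Boolean(opts.scrapeOpenJobs));
  return algoliaPages + maxItems * togglesOn; // one extra request per row per toggle
}
```

For example, 250 companies with both toggles on come to about 3 + 500 = 503 requests, while the same run without enrichment needs only 3.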
Support
- File an issue or a feature request via the Apify Console Issues tab on the actor page.
- Custom integrations (different output shape, additional filters, scheduled feeds into a warehouse) โ open an issue describing the use case.
- The repo source is open in `src/`: `main.ts` orchestrates dispatch, `lib/ycScrape.ts` handles jobs HTML parsing, and `lib/ycCompanies.ts` is the Algolia client + enrichment.
License
ISC. See package.json.