
๐ŸŒฑ Greenhouse Jobs Scraper

Pricing

from $1.50 / 1,000 job listing records


Scrape every open job at any Greenhouse company. Title, location, departments, remote flag, comp, seniority, posted date, hiring-velocity signal, drop-into-LLM card. Watchlist mode emits only new jobs. Export, run via API, schedule, or integrate with other tools.


Rating: 0.0 (0 reviews)

Developer: Skootle (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 5 days ago

Greenhouse Jobs Scraper

TL;DR

BD reps and recruiters waste 20 minutes a day clicking through Greenhouse boards at the 200 companies they are tracking. One run pulls every open job at every company on your list, normalized, with a hiring-velocity signal per company. Watchlist mode emits only jobs that appeared since your last run, so a daily schedule replaces a daily manual sweep. Export to your CRM or ATS in one call.

Try it on a small dataset, then let us know what you think in a review.


What does Greenhouse Jobs do?

Greenhouse Jobs pulls every open role from the public Greenhouse job board of any company you point it at and returns the data normalized and ready to use. You hand it a list of board slugs (the part after boards.greenhouse.io/ on the company's careers page, like stripe, airbnb, figma, discord, vercel), and it walks each board's full job catalog in one call per company.

Each open job comes back with the title, the resolved company name, the location, the departments and offices it ladders under, whether the role is remote, the seniority bucket parsed from the title, any salary range mentioned in the description, the original posted/updated timestamp, the full job description as plain text and HTML, and a compact drop-into-LLM card you can paste straight into an agent prompt or a Slack channel.

Alongside the job records you also get a one-row hiring-velocity summary per company: how many jobs are open right now, how many were posted or updated in the last 7 days, and a cold / steady / hot velocity hint so you can sort your watchlist by who is actually scaling. This is the signal sales-intel and recruitment-marketing platforms charge a five-figure annual contract for; you get it as part of every run.
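The velocity computation is simple enough to reproduce downstream. A minimal Python sketch, assuming job records shaped like the actor's output and the documented thresholds (cold < 5%, steady 5-14%, hot >= 15% of jobs updated in the last 7 days); function and field names mirror the output schema, not the actor's internal code:

```python
from datetime import datetime, timedelta, timezone

def velocity_hint(jobs, now=None, window_days=7):
    """Classify a company's hiring velocity from its open jobs.

    `jobs` is a list of dicts with an ISO 8601 `postedAt` field, as in
    the actor's output. Thresholds mirror the documented velocityHint
    buckets: cold < 5%, steady 5-14%, hot >= 15%.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    if not jobs:
        return {"totalCount": 0, "openedInLast7Days": 0, "velocityHint": "cold"}
    # `Z` suffix support for fromisoformat landed in 3.11; normalize for older Pythons
    recent = sum(
        1 for j in jobs
        if datetime.fromisoformat(j["postedAt"].replace("Z", "+00:00")) >= cutoff
    )
    share = recent / len(jobs)
    hint = "hot" if share >= 0.15 else "steady" if share >= 0.05 else "cold"
    return {"totalCount": len(jobs), "openedInLast7Days": recent, "velocityHint": hint}
```

Sort your watchlist by the returned hint (or by the raw share) to surface the companies that are actually scaling.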

Switch on watchlist mode and the actor stores a rolling 50,000-job-ID window in the key-value store and emits only the IDs it has not seen before. Schedule the run hourly or daily and the dataset becomes your real-time new-jobs feed for outbound, sourcing, or AI agents.
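The rolling-window idea is easy to picture in code. Here is a sketch of an equivalent in-memory dedupe (the class name and shape are illustrative; the actor itself persists this state in the key-value store between runs):

```python
from collections import OrderedDict

class RollingWatchlist:
    """Emit only job IDs not seen within the last `capacity` IDs."""

    def __init__(self, capacity=50_000, seen=None):
        self.capacity = capacity
        # OrderedDict doubles as an insertion-ordered set
        self.seen = OrderedDict((i, None) for i in (seen or []))

    def new_ids(self, job_ids):
        fresh = []
        for job_id in job_ids:
            if job_id in self.seen:
                self.seen.move_to_end(job_id)  # refresh recency
            else:
                fresh.append(job_id)
                self.seen[job_id] = None
                if len(self.seen) > self.capacity:
                    self.seen.popitem(last=False)  # evict the oldest ID
        return fresh
```

The first call returns everything (the baseline); later calls return only IDs that were not in the window, which is exactly the behavior a scheduled watchlist run gives you.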

Why scrape Greenhouse?

Greenhouse hosts the careers pages for most of the venture-backed, Series A+ companies you care about. The careers page is the single source of truth for what a company is hiring for right now: who they are looking for, where, at what level, and how fast they are growing the team. That signal is fresher than LinkedIn (where postings lag and duplicates dominate), more complete than any third-party aggregator (which samples only a slice of boards), and more accurate than press-release-style "we raised, we're hiring" announcements.

The problem: every Greenhouse board lives at a different URL, paginates differently in the browser, and ships descriptions as HTML-escaped markup that has to be decoded before use. Tracking 100 companies by hand is impossible. Tracking 1,000 with a script means rewriting the same boilerplate every week. We give you a single endpoint that returns the entire catalog, normalized, with the hiring-velocity signal already computed.
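For reference, decoding an escaped description takes two steps: unescape the entities, then strip the tags. A minimal Python sketch of the kind of normalization the actor does for you (a simplified approximation, not the actor's exact code):

```python
import html
import re

def clean_description(raw):
    """Decode an HTML-escaped job description and strip its tags."""
    decoded = html.unescape(raw)              # &lt;p&gt; -> <p>
    text = re.sub(r"<[^>]+>", " ", decoded)   # drop tags
    return re.sub(r"\s+", " ", text).strip()  # collapse whitespace
```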

Who needs this?

  • BD reps prospecting fast-growing startups based on hiring signal, who want a daily list of companies that just opened 5+ new roles.
  • Recruiters sourcing engineers, designers, or PMs at companies that just raised, with one click to see every open role across their target list.
  • VC scouts tracking team growth at portfolio companies and competitors, who need a weekly delta of new hires.
  • Competitive intel teams tracking what stack, regions, and roles competitors are hiring for, so they can predict product direction.
  • Sales-intel platforms enriching company records with hiring velocity, remote percentage, and department distribution.
  • Recruitment marketing teams measuring time-to-fill across a portfolio of brands.
  • AI agents that auto-apply on behalf of candidates, auto-recommend matched roles, or auto-route warm intros based on hiring signal.
  • Talent ops at engineering-heavy companies, who need to benchmark their open-req mix against direct competitors.

How to use Greenhouse Jobs

  1. Open the actor in Apify Console and click Try for free.
  2. Paste one or more Greenhouse board slugs into boardTokens. The slug is the last path segment on the company's Greenhouse careers page. Example: boards.greenhouse.io/stripe -> use stripe.
  3. Optionally narrow with keywords (title substrings), locations, departments, or remoteOnly.
  4. Optionally flip watchlistMode to true for monitoring schedules; the first run is the baseline, every later run emits only new jobs.
  5. Set maxItems, then click Start. Results appear in the dataset within seconds for most boards.

How much will this cost?

Pricing is per result (one record per job, plus one summary row per company) plus a small per-run start fee. The price per result drops as you upgrade your Apify plan.

Plan      Per-run start  Per result  500 results
FREE      $0.001         $0.003      $1.50
BRONZE    $0.001         $0.0025     $1.25
SILVER    $0.001         $0.002      $1.00
GOLD      $0.001         $0.0015     $0.75
PLATINUM  $0.001         $0.0015     $0.75
DIAMOND   $0.001         $0.0015     $0.75

Typical day, watching 50 companies (~3,000 open roles total, ~50 new per day with watchlist mode on): under $0.20 a day on FREE, well under a dollar a week.

Is it legal to scrape Greenhouse job boards?

The Greenhouse Job Board API returns the same public data that any visitor sees on the company's careers page. No authentication is required, no rate-limit bypasses are used, and the actor honors the published endpoint's response cadence. The data covered (open jobs, locations, departments) is public-by-design - companies publish it specifically so candidates and partners can see it.

For commercial redistribution (reselling the data, embedding it in a public product, or republishing job descriptions verbatim) consult your own counsel; the source companies retain copyright on job descriptions. For internal use - prospecting, sourcing, market research, AI agents working on your behalf - this is exactly the use case Greenhouse's public API was designed for.

Examples

Track every Stripe role:

{ "boardTokens": ["stripe"] }

Engineering-only across three AI-forward companies:

{
  "boardTokens": ["stripe", "anthropic", "scaleai"],
  "departments": ["Engineering"]
}

Remote-only senior roles across a watchlist:

{
  "boardTokens": ["airbnb", "figma", "vercel", "linear"],
  "remoteOnly": true,
  "keywords": ["senior", "staff", "principal"]
}

Daily monitor mode (new jobs only since last run):

{
  "boardTokens": ["stripe", "airbnb", "figma"],
  "watchlistMode": true,
  "maxItems": 1000
}

Location-targeted recruiter list:

{
  "boardTokens": ["stripe", "airbnb"],
  "locations": ["New York", "Brooklyn"],
  "keywords": ["engineer"]
}

VC scout sweep across 20 portfolio companies:

{
  "boardTokens": ["companyA", "companyB", "companyC"],
  "maxItems": 5000
}

Input parameters

Field               Type      Required  Default  Description
boardTokens         string[]  yes       -        Greenhouse board slugs to scrape.
keywords            string[]  no        []       Case-insensitive title substring match.
locations           string[]  no        []       Case-insensitive substring match on the location string.
departments         string[]  no        []       Case-insensitive substring match on department names.
remoteOnly          boolean   no        false    Keep only roles whose location/title/department mentions remote.
watchlistMode       boolean   no        false    Emit only jobs whose IDs are new since the previous run.
maxItems            integer   no        10       Maximum job records to save (max 5,000).
proxyConfiguration  object    no        {}       Not required for Greenhouse. Leave at default.

Greenhouse job output format

Two record types share the dataset. Filter on recordType to separate them.
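Splitting the dataset downstream is a one-pass filter on recordType. A minimal Python sketch (helper name is illustrative):

```python
from collections import defaultdict

def split_records(items):
    """Split a mixed actor dataset into per-company job lists and summaries."""
    jobs, summaries = defaultdict(list), {}
    for record in items:
        if record.get("recordType") == "greenhouse_job":
            jobs[record["boardToken"]].append(record)
        elif record.get("recordType") == "company_summary":
            summaries[record["boardToken"]] = record
    return jobs, summaries
```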

greenhouse_job

Field                   Type           Description
outputSchemaVersion     string         Literal 2026-05-11. Bumps on breaking changes.
recordType              string         Literal greenhouse_job.
jobId                   integer        Greenhouse internal job ID. Stable primary key.
boardToken              string         The board slug you provided (e.g. stripe).
company                 string         Resolved company name from the Greenhouse board root.
title                   string         Job title.
absoluteUrl             string         Direct link to the public posting.
requisitionId           string | null  Company-internal requisition ID, when shipped.
location                object         { name, normalized, isRemote }.
departments             string[]       Flat department names.
offices                 string[]       Flat office names.
metadata                object[]       { name, value, valueType } array as shipped by Greenhouse (custom fields).
descriptionText         string         Plain-text description, HTML stripped, max 12,000 chars.
descriptionHtml         string         Original HTML description, decoded, max 30,000 chars.
seniority               enum | null    Heuristic-parsed from title: intern, entry, mid, senior, staff, principal, lead, director, vp, executive.
compRange               object         { min, max, currency, period }. Heuristic-parsed from description.
postedAt                string         ISO 8601 timestamp from Greenhouse's updated_at.
scrapedAt               string         ISO 8601 timestamp of the run.
fieldCompletenessScore  integer        0-100. Self-filter sparse rows.
agentMarkdown           string         300-500 char drop-into-LLM card.
{
  "outputSchemaVersion": "2026-05-11",
  "recordType": "greenhouse_job",
  "jobId": 7532733,
  "boardToken": "stripe",
  "company": "Stripe",
  "title": "Account Executive, AI Sales",
  "absoluteUrl": "https://stripe.com/jobs/search?gh_jid=7532733",
  "requisitionId": "See Opening ID",
  "location": { "name": "San Francisco, CA", "normalized": "San Francisco, CA", "isRemote": false },
  "departments": ["Sales"],
  "offices": ["San Francisco"],
  "metadata": [],
  "descriptionText": "Who we are. About Stripe. Stripe is a financial infrastructure platform...",
  "descriptionHtml": "<h2>Who we are</h2>...",
  "seniority": null,
  "compRange": { "min": null, "max": null, "currency": null, "period": null },
  "postedAt": "2026-05-08T17:59:17-04:00",
  "scrapedAt": "2026-05-11T00:00:00.000Z",
  "fieldCompletenessScore": 82,
  "agentMarkdown": "**Account Executive, AI Sales** at Stripe\n- ๐Ÿ“ San Francisco, CA\n- ๐Ÿงญ Sales\n- ๐Ÿ“… Updated 2026-05-08\n- ๐Ÿ”— https://stripe.com/jobs/search?gh_jid=7532733"
}

company_summary

Field                Type      Description
outputSchemaVersion  string    Literal 2026-05-11. Bumps on breaking changes.
recordType           string    Literal company_summary.
boardToken           string    Board slug.
company              string    Company name.
totalCount           integer   Open jobs found this run.
openedInLast7Days    integer   Jobs whose updated_at is within the last 7 days.
velocityHint         enum      cold < 5%, steady 5-14%, hot >= 15% of jobs updated this week.
jobsByDepartment     object[]  [{ name, count }], ranked.
jobsByLocation       object[]  [{ name, count }], ranked.
scrapedAt            string    Run timestamp.
{
  "outputSchemaVersion": "2026-05-11",
  "recordType": "company_summary",
  "boardToken": "stripe",
  "company": "Stripe",
  "totalCount": 494,
  "openedInLast7Days": 38,
  "velocityHint": "steady",
  "jobsByDepartment": [{ "name": "Engineering", "count": 142 }, { "name": "Sales", "count": 76 }],
  "jobsByLocation": [{ "name": "San Francisco, CA", "count": 88 }, { "name": "New York, NY", "count": 71 }],
  "scrapedAt": "2026-05-11T00:00:00.000Z"
}

During the Actor run

The actor talks only to Greenhouse's public Job Board API. No proxy, no browser, no auth. Typical board returns in under 2 seconds and a 50-board sweep finishes in well under a minute. The agent briefing markdown lands at AGENT_BRIEFING in the default key-value store, and the rolling watchlist state lives at WATCHLIST_STATE so you can clear it from the Console anytime to start a new baseline.

FAQ

How is this different from Greenhouse's careers RSS feeds?

The RSS feed gives you titles and links. Greenhouse Jobs gives you the full description (text and HTML), the parsed seniority bucket, the comp range when shipped, the department and office structure, every metadata field, the hiring-velocity signal per company, and an LLM-ready summary card. RSS is fine for "show me titles"; this is fine for everything else.

Can I monitor only new jobs?

Yes. Set watchlistMode: true. The first run on a board records every current job ID as the baseline and emits zero new-job rows for that board. Every subsequent run emits only IDs not yet in the rolling 50,000-ID window. The state persists in the actor's key-value store between runs.

Will this work with Lever or Workable boards?

Not directly - those are different ATS systems with different APIs. A dedicated Lever scraper is shipping next from the same Skootle portfolio with the same record shape and watchlist pattern; Workable is on the roadmap. Watch the Other Skootle actors section below for links.

Why does this cost more than free Greenhouse scrapers I see?

The free ones we have audited return a thin slice (title, URL, maybe location) and break when the board ships a new metadata field or when Greenhouse changes the response shape. They do not give you the hiring-velocity signal, the seniority parsing, the comp parsing, or the watchlist mode. If you are using this for one-off curiosity, save the money. If you are feeding it into a CRM, an outbound sequence, or an AI agent, the per-result fee at GOLD ($0.0015) is a rounding error compared to the time you spend rebuilding your pipeline when the free one silently goes empty.

Can I use this with Python, n8n, or Make?

Yes. Apify exposes every actor as a REST endpoint and provides SDKs for Python, JavaScript, and CLI. n8n and Make have official Apify nodes; trigger a run, wait for completion, then read the dataset.
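A sketch with the official apify-client Python package; the actor ID string is a placeholder (use the ID shown in Apify Console), and the input fields come from the Input parameters table above:

```python
import os

# Placeholder -- replace with the actor ID from Apify Console.
ACTOR_ID = "<username>/<actor-name>"

def build_run_input(board_tokens, watchlist=False, max_items=1000):
    """Assemble the actor's run input from the documented fields."""
    return {
        "boardTokens": list(board_tokens),
        "watchlistMode": watchlist,
        "maxItems": max_items,
    }

def fetch_jobs(token, board_tokens):
    """Run the actor and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client
    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=build_run_input(board_tokens))
    return client.dataset(run["defaultDatasetId"]).list_items().items

if __name__ == "__main__":
    print(fetch_jobs(os.environ["APIFY_TOKEN"], ["stripe"]))
```

In n8n or Make, the equivalent flow is the official Apify node: run actor, wait for finish, read dataset.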

How many companies can I track in one run?

There is no hard cap on boardTokens. We have tested up to 100 boards in a single run; cost scales linearly with results, not with board count. For watchlist mode at scale, lean on maxItems to bound a single run and schedule frequently.

Why is the seniority field sometimes null?

Seniority is parsed heuristically from the title (Senior, Staff, Principal, Director, etc.). Titles like "Software Engineer" with no level word return null on purpose - we would rather give you null than a guess. Filter on seniority IS NOT NULL in your downstream query when you need only graded roles.
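A rough Python approximation of this kind of title heuristic. The word lists and precedence are illustrative, not the actor's exact rules, and this sketch never returns mid (that bucket needs cues beyond a level word in the title):

```python
import re

# Ordered so the more specific level wins when several words appear
# (e.g. "Senior Staff Engineer" -> staff). Illustrative word lists.
SENIORITY_PATTERNS = [
    ("executive", r"\b(chief|ceo|cto|cfo)\b"),
    ("vp", r"\b(vp|vice president)\b"),
    ("director", r"\bdirector\b"),
    ("principal", r"\bprincipal\b"),
    ("staff", r"\bstaff\b"),
    ("lead", r"\blead\b"),
    ("senior", r"\b(senior|sr)\b"),
    ("intern", r"\b(intern|internship)\b"),
    ("entry", r"\b(junior|jr|graduate|new grad|entry[- ]level)\b"),
]

def parse_seniority(title):
    """Heuristic title -> seniority bucket; None when no level word appears."""
    lowered = title.lower()
    for bucket, pattern in SENIORITY_PATTERNS:
        if re.search(pattern, lowered):
            return bucket
    return None
```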

Why is compRange sometimes empty?

Salary disclosure depends on the company and the job's location (US states like CA, CO, NY require it; many engineering job descriptions in other locations omit it entirely). We parse the most common patterns ($X - $Y per year, USD X-Y, $X-$Yk) but do not synthesize ranges when the description does not state one.
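A simplified Python sketch of dollar-denominated range parsing. It covers only the $X - $Y and $Xk-$Yk shapes (the actor's patterns are broader, e.g. USD X-Y), and it returns null fields rather than guessing when no range is stated:

```python
import re

def parse_comp_range(text):
    """Pull a {min, max, currency, period} comp range from description text."""
    match = re.search(
        r"\$\s*([\d,]+)(k?)\s*[-\u2013]\s*\$?\s*([\d,]+)(k?)",
        text,
        re.IGNORECASE,
    )
    if not match:
        return {"min": None, "max": None, "currency": None, "period": None}

    def to_num(digits, k_suffix):
        value = int(digits.replace(",", ""))
        return value * 1000 if k_suffix.lower() == "k" else value

    period = "year" if re.search(r"per year|annual|/yr", text, re.IGNORECASE) else None
    return {
        "min": to_num(match.group(1), match.group(2)),
        "max": to_num(match.group(3), match.group(4)),
        "currency": "USD",
        "period": period,
    }
```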

Does watchlist mode survive an actor restart?

Yes. The state is written to the default key-value store under the WATCHLIST_STATE key. To reset the baseline, delete that key from the Console.

Can I get historical jobs (closed/expired)?

The Greenhouse public API only exposes currently open postings. Historical jobs are not reachable through this endpoint. For longitudinal hiring history at a company, run this actor on a schedule and warehouse the dataset; we keep the jobId as a stable primary key so backfilling deduplicates cleanly.
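A minimal Python sketch of that warehouse upsert, keyed on jobId. It assumes all scrapedAt timestamps share one format (as in the actor's output), so plain string comparison orders them:

```python
def upsert_jobs(warehouse, new_records):
    """Merge a run's job records into a jobId-keyed store.

    `warehouse` is a dict {jobId: record}; a record overwrites an earlier
    snapshot only when its scrapedAt is at least as new, so out-of-order
    backfills deduplicate cleanly.
    """
    for record in new_records:
        key = record["jobId"]
        current = warehouse.get(key)
        # Same-format ISO 8601 strings compare correctly lexicographically
        if current is None or record["scrapedAt"] >= current["scrapedAt"]:
            warehouse[key] = record
    return warehouse
```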

What rate limit does Greenhouse enforce?

The public endpoint serves a cached board snapshot and is generous in practice - we routinely scrape 50+ boards back-to-back without being throttled. If you point this at 500+ boards in one run, we recommend splitting it into multiple scheduled runs to be polite.

Why choose Greenhouse Jobs

  • Hiring-velocity signal per company in every run - find who is actually scaling, not who has had the same 4 reqs open for 90 days.
  • Watchlist / monitor mode out of the box - schedule it daily and get a new-jobs delta feed with zero plumbing.
  • One record shape across companies - downstream code does not care whether you point it at Stripe, Vercel, or a 12-person seed startup; the schema is identical.
  • Sub-minute typical runtime for single-board sweeps; full 50-board watchlist refresh in under 2 minutes.
  • Seniority and comp parsing at the record level so you can filter without writing the regex.
  • Agent-ready output - agentMarkdown per job, AGENT_BRIEFING.md per run, both drop straight into an LLM context or a Slack channel.
  • Cross-ATS coverage roadmap - Lever, Workable, Ashby shipping next from the same portfolio with the same record shape, so your pipeline does not fork by ATS.
  • Versioned schema and idempotent primary keys - jobId is stable across runs, and the schema version is on every record so breaking changes never silently corrupt your warehouse.

Your feedback

Hit a bug or want a feature? Open an issue on the Issues tab rather than the reviews page, and we'll fix it fast (typically within 48 hours).

Other Skootle actors you might want to check

  • Wellfound Jobs Scraper - hiring data from Wellfound (formerly AngelList Talent), including org updates and team signals.
  • Lever Jobs Scraper - shipping soon, same record shape as Greenhouse Jobs.
  • SAM.gov Federal Contracts - federal contract opportunities for B2G prospecting, enriched with USAspending award history.
  • SEC EDGAR Filings - 8-K, 10-K, S-1 filings as a funding/scale signal for sales-intel pipelines.

Support and contact