Greenhouse Jobs Scraper — Company Job Boards
Extract live job postings from any company's Greenhouse board (boards.greenhouse.io) via the official public API. Pass one or more company slugs and get clean JSON: title, department, location, posting date and apply URL. No login, no proxies, no HTML parsing — pure API reliability.
Input
| Field | Type | Default | Description |
|---|---|---|---|
| companies | array (required) | — | Company slugs as used on boards.greenhouse.io/<slug> (full board URLs also accepted). |
| keyword | string | — | Case-insensitive substring match on the job title. |
| titleExclude | array | — | Drop postings whose title contains any of these substrings (case-insensitive). |
| locationFilter | string | — | Case-insensitive substring match on the location. |
| postedSince | integer | — | Keep only postings first published within this many days. Postings with no posting date are dropped when this is set. |
| remoteOnly | boolean | false | Keep only postings flagged remote by the source. Always returns nothing on this Actor (see FAQ). |
| includeDescription | boolean | true | Include a plain-text description snippet per posting. |
| maxItemsPerCompany | integer | 100 | Cap postings returned per company (0 = no cap). Each result is a billed event. |
| maxItems | integer | 200 | Hard cap on total postings returned (0 = no cap). Each result is a billed event. |
| onlyNewSinceLastRun | boolean | false | Delta/monitoring mode: only output postings not seen on a previous run made with this flag on (see "Delta mode / monitoring"). |
| aiEnrichment | boolean | false | Adds aiKeySkills/aiExperienceLevel/aiWorkArrangement/aiVisaSponsorship per posting via the Anthropic or Mistral API, BYOK (see "AI enrichment"). |
| aiProvider | string | anthropic | Which AI provider runs enrichment: anthropic (default, uses anthropicApiKey) or mistral (uses mistralApiKey). |
| anthropicApiKey | string (secret) | — | Your Anthropic API key. Only used when aiEnrichment is on and aiProvider is anthropic; billed separately by Anthropic, not by this Actor. |
| aiModel | string | claude-haiku-4-5-20251001 | Claude model for AI enrichment (when aiProvider is anthropic): claude-haiku-4-5-20251001 (fast/cheap) or claude-sonnet-4-5 (higher quality). |
| mistralApiKey | string (secret) | — | Your Mistral API key. Only used when aiEnrichment is on and aiProvider is mistral; billed separately by Mistral, not by this Actor. |
| mistralModel | string | mistral-small-latest | Mistral model for AI enrichment (when aiProvider is mistral): mistral-small-latest (default, fast/cheap; matches larger Mistral models on this task), mistral-medium-latest, or mistral-large-latest. |
| concurrency | integer | 8 | Companies fetched in parallel (advanced). |
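To make the filter semantics in the table concrete, here is a minimal local sketch of how keyword, titleExclude, locationFilter and postedSince could combine. It is not the Actor's actual code: the function name matches_filters, its parameters, and the inclusive cutoff for postedSince are assumptions made for illustration.

```python
from datetime import date, timedelta


def matches_filters(posting, keyword=None, title_exclude=(),
                    location_filter=None, posted_since=None, today=None):
    """Return True if a posting dict passes the filters (hypothetical helper)."""
    title = (posting.get("title") or "").lower()
    location = (posting.get("location") or "").lower()

    # keyword: case-insensitive substring match on the title
    if keyword and keyword.lower() not in title:
        return False
    # titleExclude: drop if the title contains any excluded substring
    if any(term.lower() in title for term in title_exclude):
        return False
    # locationFilter: case-insensitive substring match on the location
    if location_filter and location_filter.lower() not in location:
        return False
    # postedSince: postings without a date are dropped when this is set
    if posted_since is not None:
        posted_at = posting.get("postedAt")
        if not posted_at:
            return False
        # Assumption: the cutoff day itself still counts as "within" the window
        cutoff = (today or date.today()) - timedelta(days=posted_since)
        if date.fromisoformat(posted_at) < cutoff:
            return False
    return True
```

The same function can be applied to an exported dataset to re-filter results locally without paying for another run.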
What Greenhouse jobs data does this scraper extract?
One flat JSON record per live posting:
| Field | Meaning |
|---|---|
| ats | Which ATS served the posting (always "greenhouse" here) |
| company | Real company display name, resolved from Greenhouse's own data (falls back to the input slug if unresolvable) |
| id | Greenhouse's internal job ID |
| title | Job title as posted |
| department | Department or team where provided |
| location | Location text (may include remote hints) |
| url | Direct link to the posting |
| postedAt | First-published date (YYYY-MM-DD) where provided |
| snippet | Plain-text description excerpt (optional) |
| globalId | Stable composite id <ats>:<company-slug>:<id>; unique across the whole ATS-actor family, handy for merging with the Lever/Ashby/Workable Actors or the Company Careers Bundle |
| warnings | Array of data-quality notes for this record (e.g. ["postedAt missing"]); empty array when there's nothing to flag |
| isNew | Only present when onlyNewSinceLastRun is on; always true (already-seen postings are dropped, never emitted with isNew: false) |
| aiKeySkills | Only present when aiEnrichment is on; array of skills/technologies explicitly named in the posting text, never invented |
| aiExperienceLevel | Only present when aiEnrichment is on; one of entry/mid/senior/lead/unknown |
| aiWorkArrangement | Only present when aiEnrichment is on; one of onsite/hybrid/remote/unknown |
| aiVisaSponsorship | Only present when aiEnrichment is on; true/false only if the posting explicitly states a policy, otherwise null |
remote and employmentType are always null on this Actor. Both keys are still present on every record — kept for schema consistency with the Lever / Ashby / Workable / Company Careers Bundle Actors — but Greenhouse's public API exposes neither a remote/workplace-type signal nor an employment-type field on any endpoint, so this Actor never guesses. Use locationFilter (e.g. "remote") as the closest available proxy for remote roles.
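Since globalId follows the <ats>:<company-slug>:<id> pattern from the table above, it can serve as a merge key when combining output from several ATS Actors. The sketch below is illustrative only: merge_by_global_id and split_global_id are hypothetical helper names, and it assumes neither the ATS name nor the slug contains a colon.

```python
def merge_by_global_id(*datasets):
    """Merge record lists, keeping the first record seen for each globalId."""
    merged = {}
    for records in datasets:
        for record in records:
            merged.setdefault(record["globalId"], record)
    # dicts preserve insertion order, so first-seen order is kept
    return list(merged.values())


def split_global_id(global_id):
    """Split '<ats>:<company-slug>:<id>' into its three parts."""
    # maxsplit=2 keeps any colon inside the job id intact
    return tuple(global_id.split(":", 2))


print(split_global_id("greenhouse:stripe:7954688"))
# ('greenhouse', 'stripe', '7954688')
```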
How to scrape Greenhouse jobs with this Actor
- Enter one or more company slugs (stripe, gitlab, duolingo). Open the company's careers page and look for boards.greenhouse.io/
- Optionally set keyword / titleExclude / locationFilter / postedSince / caps.
- Run and export JSON, CSV or Excel, or call it over the API:
```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start the Actor and wait for the run to finish
run = client.actor("nomad-jobs/greenhouse-jobs-scraper").call(
    run_input={
        "companies": ["stripe", "gitlab", "duolingo"],
        "keyword": "engineer",
    }
)

# Read the postings from the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["company"], "|", item["title"], item["url"])
```
```bash
curl -X POST \
  "https://api.apify.com/v2/acts/nomad-jobs~greenhouse-jobs-scraper/run-sync-get-dataset-items?token=<YOUR_APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"companies": ["stripe", "gitlab", "duolingo"]}'
```
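The companies input accepts either a bare slug or a full board URL. If you want to normalize a mixed list yourself before calling the Actor, something like the following could work. It is a rough sketch, not the Actor's own logic: to_slug is a hypothetical helper, and it assumes the slug is the first path segment of the board URL.

```python
from urllib.parse import urlparse


def to_slug(value):
    """Reduce a board URL such as boards.greenhouse.io/<slug> to its slug."""
    value = value.strip()
    # A bare slug has no scheme and no path separator: return it unchanged
    if "://" not in value and "/" not in value:
        return value
    # Add a scheme when missing so urlparse fills in the path correctly
    parsed = urlparse(value if "://" in value else "https://" + value)
    parts = [segment for segment in parsed.path.split("/") if segment]
    if not parts:
        raise ValueError(f"no slug found in {value!r}")
    # Assumption: the first path segment is the company slug
    return parts[0]


print([to_slug(v) for v in ["stripe", "https://boards.greenhouse.io/gitlab"]])
# ['stripe', 'gitlab']
```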
Output example
```json
{
  "ats": "greenhouse",
  "company": "Stripe",
  "id": "7954688",
  "title": "Senior Software Engineer",
  "department": "Engineering",
  "location": "San Francisco, CA",
  "url": "https://stripe.com/jobs/search?gh_jid=7954688",
  "postedAt": "2026-06-25",
  "employmentType": null,
  "remote": null,
  "snippet": "We are hiring a Senior Software Engineer...",
  "globalId": "greenhouse:stripe:7954688",
  "warnings": []
}
```
Delta mode / monitoring
Set onlyNewSinceLastRun: true to turn this Actor into a "what's new" monitor. Postings already seen on a previous run made with this flag on are dropped before push — you are not billed for them, so pairing this with an Apify schedule (cron) means every run only returns, and only charges for, postings that showed up since the last flagged run.
How it works: seen postings are tracked in a dedicated key-value store, keyed by each posting's globalId and capped at roughly 50,000 entries (oldest evicted first). The first run made with the flag on has nothing to compare against yet, so it emits everything, all with isNew: true. Every emitted record gets isNew: true stamped on it; there is no isNew: false in the output, because already-seen postings are simply not included.
Runs made with the flag off never read or write this cache — turning it on and off between runs is safe and has no side effects on normal runs.
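The delta behaviour described above can be pictured as a capped, insertion-ordered set of globalId values. The sketch below keeps that set in memory purely for illustration; the Actor itself persists it in a key-value store, and the class name SeenCache and its methods are hypothetical.

```python
from collections import OrderedDict


class SeenCache:
    """In-memory stand-in for the seen-postings cache (illustrative only)."""

    def __init__(self, max_entries=50_000):
        self.max_entries = max_entries
        self._seen = OrderedDict()  # globalId -> True, oldest first

    def filter_new(self, postings):
        """Return only unseen postings, each stamped with isNew: True."""
        fresh = []
        for posting in postings:
            global_id = posting["globalId"]
            if global_id in self._seen:
                continue  # already seen: dropped, never emitted as isNew: False
            self._seen[global_id] = True
            if len(self._seen) > self.max_entries:
                self._seen.popitem(last=False)  # evict the oldest entry
            fresh.append({**posting, "isNew": True})
        return fresh
```

On a first call the cache is empty, so every posting comes back with isNew: True, which mirrors the first flagged run emitting everything.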
AI enrichment
Turn on aiEnrichment and supply your own anthropicApiKey (or mistralApiKey, with aiProvider: "mistral") to add four AI-extracted fields to every posting:
| Field | Meaning |
|---|---|
aiKeySkills | Specific skills/technologies/tools explicitly named in the title or description — the model is instructed to never invent one. |
aiExperienceLevel | One of entry / mid / senior / lead / unknown. |
aiWorkArrangement | One of onsite / hybrid / remote / unknown. |
aiVisaSponsorship | true / false only when the posting explicitly states a sponsorship policy, otherwise null. |
The extraction prompt is explicit about never guessing: when the text doesn't clearly support a value you get "unknown" / null / an empty array, not a fabricated answer. Pick the model with aiModel (Anthropic: claude-haiku-4-5-20251001, the default, fast and cheap, or claude-sonnet-4-5 for higher quality) or mistralModel (Mistral: mistral-small-latest, the default, which matches larger Mistral models on this task; mistral-medium-latest; or mistral-large-latest).
Postings are batched (~12 per call) through whichever provider's API you picked (aiProvider). Your Anthropic or Mistral API key is billed separately by that provider, not by this Actor. Rough cost with Haiku or Mistral Small: enriching 100 postings runs well under $0.05 in provider token spend (short prompts, small JSON replies); Sonnet/Mistral Large cost roughly 4-5x that for the same batch.
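To get a feel for how many provider calls a run makes, the batching can be approximated as fixed-size chunks. This is a sketch under the assumption that batches are exactly 12 postings; the text only says roughly 12, and chunk is a hypothetical helper name.

```python
def chunk(postings, size=12):
    """Split a list of postings into consecutive batches of at most `size`."""
    return [postings[i:i + size] for i in range(0, len(postings), size)]


batches = chunk(list(range(100)))
print(len(batches), len(batches[-1]))
# 9 4
```

Under that assumption, enriching 100 postings takes 9 provider calls: eight full batches of 12 and a final batch of 4.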
Enrichment needs a posting's description text even when you have includeDescription off — this Actor fetches it internally for enrichment either way, then still honors your includeDescription choice for what actually ends up in the output snippet field.
If aiEnrichment is on but no matching key is available (anthropicApiKey/ANTHROPIC_API_KEY for aiProvider: "anthropic", or mistralApiKey/MISTRAL_API_KEY for aiProvider: "mistral"), enrichment is skipped: you get one extra dataset row explaining why (warnings: ["aiEnrichment skipped: no anthropicApiKey or mistralApiKey provided"]), a run status message, and every other posting is still returned normally, just without the ai* fields.
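When consuming results downstream, you may want to separate that explanatory row from the real postings. A minimal sketch follows, assuming the notice can be recognised by a warning that starts with "aiEnrichment skipped"; the exact shape of the row is not specified here, and split_skip_notices is a hypothetical helper.

```python
SKIP_PREFIX = "aiEnrichment skipped"


def split_skip_notices(items):
    """Separate normal postings from rows carrying an enrichment-skipped warning."""
    postings, notices = [], []
    for item in items:
        warnings = item.get("warnings") or []
        if any(warning.startswith(SKIP_PREFIX) for warning in warnings):
            notices.append(item)
        else:
            postings.append(item)
    return postings, notices
```

A non-empty notices list is a quick signal that the run returned postings without the ai* fields.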
This is the same class of field fantastic-jobs' career-site-api prices a whole tier on (ai_key_skills, work arrangement, visa signals) — comparable output here, opt-in and BYOK instead of bundled into every row's price.
Integrations
Export results as JSON, CSV or Excel/XLSX, or pipe them straight into Make, Zapier or n8n. Call this Actor synchronously with run-sync-get-dataset-items, or plug it into any AI agent through the Apify MCP server.
Pricing
Pay per event: $0.05 per Actor start and $0.004 per posting returned. 100 postings ≈ $0.45. No subscription — pay only for what you fetch.
If you turn on aiEnrichment, your Anthropic or Mistral key is billed separately by that provider for the enrichment calls themselves — see "AI enrichment" above for a rough per-100-postings cost estimate. Delta mode (onlyNewSinceLastRun) only reduces cost: already-seen postings are dropped before the billed push step.
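The pay-per-event arithmetic above can be reproduced with a small helper. The rates are taken from this section; the function name estimate_cost and its parameters are hypothetical, and provider-side AI enrichment spend is not included.

```python
def estimate_cost(postings, runs=1, start_fee=0.05, per_posting=0.004):
    """Estimate Actor charges in USD: one start fee per run plus a fee per posting."""
    return round(runs * start_fee + postings * per_posting, 4)


print(estimate_cost(100))
# 0.45
```

With delta mode on, postings is the number of newly seen postings per run, so a daily schedule that finds 10 new postings costs about estimate_cost(10), or $0.09, per run.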
Use cases
- Track hiring at specific companies (competitors, targets, portfolio)
- Build company-careers pages and job boards without HTML scraping
- Recruiting intelligence: who opens which roles, where, how fast
- Feed AI matching agents with reliable ATS-direct data
FAQ
Is it legal to scrape Greenhouse jobs? The data comes from the ATS providers' official, public, unauthenticated JSON APIs — the same data any visitor sees on the company's careers page. Review the providers' terms for your use case.
Do I need an API key or login? No. These are public job-board APIs — no authentication of any kind.
What if a company isn't found? It is logged and skipped — the run continues with the other companies. Full board URLs are also accepted and reduced to slugs automatically.
Why is remote always null, and why does remoteOnly return nothing? Greenhouse's public API doesn't expose a remote/workplace-type field on any endpoint: not on the job list, not on the board root. Rather than guess from free-text location strings, this Actor reports null faithfully. Because remote is never true, remoteOnly has nothing to match and filters out every posting. Use locationFilter: "remote" to approximate it instead.
How fresh is the data? Every run hits the ATS APIs live. No caching layer in between.
Something broken or missing? Open an issue on the Actor's Issues tab — it is monitored and fixes ship fast.