Greenhouse Jobs Scraper

Pricing

from $0.79 / 1,000 results

Every job from any Greenhouse board in one run. Paste a boards.greenhouse.io URL or company token → title, requisition ID, full HTML description, location, departments, offices, salary, apply URL. No login, no limits, one flat row per job. Thousands of companies supported.

Developer

Muhamed Didovic

Maintained by Community

Greenhouse Jobs Scraper — Any Company

Scrape job postings from any company's Greenhouse board — give a board token (e.g. stripe) or a boards.greenhouse.io URL and get clean structured rows: title, requisition ID, full HTML description, location, departments, offices, salary band (when published), employer, first-published + updated dates, application deadline, language, and the apply URL. One flat row per job.

Why this actor

Greenhouse is one of the most widely used applicant tracking systems (ATSs); thousands of companies run their careers pages on it (Stripe, Airbnb, Databricks, Figma, GitLab, Robinhood, and many more). This actor turns any of those boards into structured data:

  • Any board, zero per-company code — paste a board token or URL; if the company is on Greenhouse, it works.
  • Greenhouse's own public board API — no HTML scraping, no anti-bot, no auth. The whole board (all jobs, full HTML descriptions) comes back in a single request — no pagination, no per-job detail fetch.
  • Full job content — the complete HTML job description, decoded from the API's entity-escaped form into clean markup.
  • Structured org data — departments and offices per job, plus company name, requisition ID, and the internal job ID.
  • Salary band (USD / GBP / EUR) — regex-parsed from the description for boards subject to pay-transparency rules (US states, EU).
  • Optional application-form questions — flip includeQuestions on to also pull each job's application questions (and compliance / demographic / location questions).
  • Mixed input — multiple board tokens, board URLs (boards.greenhouse.io/{token}, job-boards.greenhouse.io/{token}, embed ?for={token}), or a direct /jobs/{id} URL for a single posting.

Use cases

  • Talent / competitive intelligence — track which roles a company is hiring for, where, and in which departments; diff over time to spot ramp-ups or freezes.
  • Recruitment market data — aggregate openings across many Greenhouse companies into one structured feed.
  • Comp benchmarking — collect published salary bands by role and location (US/EU transparency).
  • Sales prospecting (HR-tech / staffing) — find companies hiring in your target functions + geographies.
  • Job-board / aggregator ingestion — clean rows for downstream pipelines without per-company scrapers.
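
The "diff over time" idea in the first use case can be sketched as a comparison of two dataset snapshots. This is a minimal illustration, assuming rows shaped like the output schema on this page; the function name diff_snapshots is hypothetical and not part of the actor.

```python
def diff_snapshots(old_rows, new_rows):
    """Compare two scraped snapshots and report opened and closed postings."""
    def key(row):
        # (boardToken, jobId) identifies one posting across runs.
        return (row["boardToken"], row["jobId"])

    old = {key(r): r for r in old_rows}
    new = {key(r): r for r in new_rows}
    # Sort the keys so the result order is stable between runs.
    opened = [new[k] for k in sorted(new.keys() - old.keys())]
    closed = [old[k] for k in sorted(old.keys() - new.keys())]
    return opened, closed


last_week = [
    {"boardToken": "figma", "jobId": "1", "title": "Designer"},
    {"boardToken": "figma", "jobId": "2", "title": "Engineer"},
]
today = [
    {"boardToken": "figma", "jobId": "2", "title": "Engineer"},
    {"boardToken": "figma", "jobId": "3", "title": "Recruiter"},
]
opened, closed = diff_snapshots(last_week, today)
print([r["title"] for r in opened], [r["title"] for r in closed])
```

A growing "opened" list for one department across successive runs would suggest a ramp-up; a run of "closed" rows with no new openings would suggest a freeze.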

Input

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| boardTokens | string[] | one of these* | Board tokens: the company id in boards.greenhouse.io/{token}. E.g. stripe, databricks, figma. |
| startUrls | string[] | one of these* | Board URLs (boards.greenhouse.io/stripe, job-boards.greenhouse.io/stripe), embed URLs (...?for=stripe), or a direct job URL (.../jobs/{id}). Token auto-extracted. |
| includeQuestions | boolean | no | Also fetch each job's application-form questions (1 extra API call per job). Default false. |
| maxItems | integer | no | Max job rows per board. 3 boards × maxItems: 200 → up to 600 rows. Each row = one paid dataset item. Default 1000. Free-tier users capped at 100 total. |
| maxConcurrency | integer | no | Parallel API calls (matters mainly with includeQuestions). Default 8. |
| maxRequestRetries | integer | no | Per-request retry budget. Default 5. |
| proxy | object | no | Apify Residential (any country) recommended; direct/datacenter also work. |

* Provide at least one of boardTokens or startUrls.

Example input

{
  "boardTokens": ["stripe", "figma"],
  "startUrls": ["https://boards.greenhouse.io/databricks"],
  "maxItems": 500,
  "includeQuestions": false,
  "proxy": { "useApifyProxy": true, "apifyProxyGroups": ["RESIDENTIAL"] }
}
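
An input like the one above could be built and submitted from Python roughly as follows. This is a sketch under assumptions: it uses the apify-client package (whose call signatures may differ between versions), the helper names build_run_input and run_actor are hypothetical, and the actor ID is a placeholder because this page does not state it.

```python
def build_run_input(board_tokens=(), start_urls=(), max_items=1000,
                    include_questions=False):
    """Build the actor input and enforce the 'at least one source' rule."""
    if not board_tokens and not start_urls:
        raise ValueError("Provide at least one of boardTokens or startUrls")
    run_input = {"maxItems": max_items, "includeQuestions": include_questions}
    if board_tokens:
        run_input["boardTokens"] = list(board_tokens)
    if start_urls:
        run_input["startUrls"] = list(start_urls)
    return run_input


def run_actor(api_token, actor_id, run_input):
    """Start a run, wait for it to finish, and return the dataset rows."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(api_token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(board_tokens=["stripe", "figma"], max_items=500)
print(run_input)
# rows = run_actor("<YOUR_APIFY_TOKEN>", "<username>/<actor-name>", run_input)
```

The commented-out last line is where a real API token and the actor's Store ID would go.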

How it works

  1. Resolve boards — your boardTokens + any tokens extracted from startUrls become the board set; a /jobs/{id} URL targets one posting.
  2. One call per board: GET /v1/boards/{token}/jobs?content=true returns every job with full content. No pagination.
  3. Map + dedupe + emit — each job → flat row; HTML entities decoded; salary regex-parsed; deduped by (board, jobId); pushed to the dataset.
  4. Optional enrichment — with includeQuestions, each job also gets a /jobs/{id}?questions=true call for its application-form fields.
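
Steps 2 and 3 can be sketched in a few lines of standard-library Python. This is not the actor's source code. The host comes from the disclaimer on this page, while the response field names used here (id, title, location.name, departments, offices, content, absolute_url) are assumptions about the board API's JSON shape and should be checked against a live response.

```python
import html
import json
import urllib.request

API = "https://boards-api.greenhouse.io/v1/boards/{token}/jobs?content=true"


def fetch_board(token):
    """One GET per board; the response is assumed to hold a 'jobs' list."""
    with urllib.request.urlopen(API.format(token=token), timeout=30) as resp:
        return json.load(resp)


def map_job(token, job):
    """Flatten one API job object into a dataset-style row."""
    return {
        "rowType": "job",
        "jobId": str(job["id"]),
        "boardToken": token,
        "title": job.get("title"),
        "location": (job.get("location") or {}).get("name"),
        "departments": [d["name"] for d in job.get("departments") or []],
        "offices": [o["name"] for o in job.get("offices") or []],
        # The API returns entity-escaped HTML; decode it once.
        "description": html.unescape(job.get("content") or ""),
        "jobUrl": job.get("absolute_url"),
    }


def rows_for_board(token, payload):
    """Map every job and drop duplicates by (board, jobId)."""
    seen, rows = set(), []
    for job in payload.get("jobs", []):
        row = map_job(token, job)
        key = (row["boardToken"], row["jobId"])
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


# Offline sample payload; rows_for_board(t, fetch_board(t)) would use live data.
sample = {"jobs": [
    {"id": 1, "title": "Engineer", "location": {"name": "Remote"},
     "departments": [{"name": "Eng"}], "offices": [],
     "content": "&lt;p&gt;Hi&lt;/p&gt;"},
    {"id": 1, "title": "Engineer", "content": ""},
]}
rows = rows_for_board("acme", sample)
print(len(rows), rows[0]["description"])
```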

Output schema

Every row has rowType: "job". Real sample (Figma board):

{
  "rowType": "job",
  "jobId": "5822886004",
  "internalJobId": "5061458004",
  "requisitionId": "2178",
  "boardToken": "figma",
  "company": "Figma",
  "title": "Account Executive, Enterprise",
  "location": "San Francisco, CA",
  "departments": ["Sales"],
  "offices": ["San Francisco, CA"],
  "description": "<div class=\"content-intro\"><p>Figma is growing…</p></div>", // decoded HTML
  "salary": { // regex-parsed; null when none published
    "rawText": "$165,000 - $231,000 USD",
    "min": 165000, "max": 231000, "currency": "USD", "period": null
  },
  "firstPublished": "2026-04-17T12:21:54-04:00",
  "updatedAt": "2026-04-17T12:25:57-04:00",
  "applicationDeadline": null,
  "language": "en",
  "jobUrl": "https://boards.greenhouse.io/figma/jobs/5822886004",
  "applyUrl": "https://boards.greenhouse.io/figma/jobs/5822886004",
  "applyType": "internal", // "internal" = greenhouse-hosted; "external" = company's own careers domain
  "metadata": null, // board custom fields, when present
  "questionCount": null, // number of application questions (when includeQuestions=true)
  "scrapedAt": "2026-06-02T…Z"
}

Key output fields

| Field | Meaning |
| --- | --- |
| jobId / internalJobId / requisitionId | Greenhouse public id / internal id / the company's own req number. |
| boardToken / company | The board it came from + the employer name. |
| departments / offices | Greenhouse's structured org tags for the role. |
| description | Full job description as clean HTML (entities decoded). |
| salary | Regex-parsed band (min/max/currency/period); null when the board doesn't publish pay. |
| applyUrl / applyType | The job/apply URL; internal = hosted on greenhouse.io, external = redirects to the company's own site. |
| questionCount | Application-form question count (only when includeQuestions is on). |

FAQ

What's a "board token"? It's the company identifier in the board URL — boards.greenhouse.io/{token} (e.g. stripe). You can paste the token directly, the full board URL, the job-boards.greenhouse.io form, an embed ?for={token} URL, or a direct /jobs/{id} link — the token is extracted automatically.
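
The extraction rules described in this answer could look roughly like the sketch below. It is an illustration rather than the actor's actual parser; parse_board_input is a hypothetical name, and the embed path used in the example (embed/job_board) is just one example of a URL carrying ?for={token}.

```python
import re
from urllib.parse import parse_qs, urlparse


def parse_board_input(value):
    """Return (token, job_id); job_id is None unless the URL names one posting."""
    value = value.strip()
    # A bare token such as "stripe" has no dots or slashes.
    if re.fullmatch(r"[A-Za-z0-9_-]+", value):
        return value, None
    if "://" not in value:
        value = "https://" + value
    url = urlparse(value)
    # Embed URLs carry the token in the ?for= query parameter.
    query = parse_qs(url.query)
    if "for" in query:
        return query["for"][0], None
    parts = [p for p in url.path.split("/") if p]
    if not parts:
        raise ValueError(f"No board token found in {value!r}")
    # /{token}/jobs/{id} targets a single posting.
    job_id = parts[2] if len(parts) >= 3 and parts[1] == "jobs" else None
    return parts[0], job_id


for example in [
    "stripe",
    "https://boards.greenhouse.io/stripe",
    "job-boards.greenhouse.io/stripe",
    "https://boards.greenhouse.io/embed/job_board?for=stripe",
    "https://boards.greenhouse.io/figma/jobs/5822886004",
]:
    print(example, "->", parse_board_input(example))
```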

How do I find a company's token? Open the company's careers page; if jobs link to boards.greenhouse.io/{something} or job-boards.greenhouse.io/{something}, that {something} is the token. Many company career sites embed Greenhouse and show ?for={token} in the board URL.

Does it get the full description? Yes — the board API's content=true mode returns the complete HTML description for every job in one call. We decode the entity-escaped HTML so description is clean markup.
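
To make the decoding step concrete, here is a small standard-library sketch of what "entity-escaped" means and how a consumer might go one step further to plain text. The helper names are hypothetical, and the plain-text step is a downstream convenience, not something the actor is stated to do.

```python
import html
from html.parser import HTMLParser


def decode_description(escaped):
    """Single-pass decode: '&lt;p&gt;' becomes '<p>', '&amp;amp;' becomes '&amp;'."""
    return html.unescape(escaped)


class _TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def to_plain_text(markup):
    """Strip tags and collapse whitespace for search or NLP use."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


escaped = ("&lt;div class=&quot;content-intro&quot;&gt;"
           "&lt;p&gt;Pay &amp;amp; perks&lt;/p&gt;&lt;/div&gt;")
markup = decode_description(escaped)
print(markup)
print(to_plain_text(markup))
```

Decoding exactly once matters: a second pass would turn the markup's own &amp; into a bare ampersand and could corrupt the HTML.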

Why is salary sometimes null? Greenhouse itself doesn't have a structured salary field on the public board API — pay appears in the description text only for boards subject to transparency laws (US states like CA/CO/NY/WA, EU). We regex-parse it when present; otherwise salary is null. We never synthesize.
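
For illustration, a simplified range parser in the same spirit is sketched below. The pattern is an assumption written for this example, not the actor's actual regex; it handles only "symbol amount, separator, amount" ranges and always leaves period as None.

```python
import re

_SYMBOLS = {"$": "USD", "£": "GBP", "€": "EUR"}
_AMOUNT = r"(\d{1,3}(?:,\d{3})+|\d+)"
_RANGE = re.compile(
    r"([$£€])\s?" + _AMOUNT          # currency symbol and lower bound
    + r"\s*(?:[-\u2013\u2014]|to)\s*"  # hyphen, en/em dash, or "to"
    + r"[$£€]?\s?" + _AMOUNT         # optional symbol and upper bound
    + r"(?:\s*(USD|GBP|EUR))?"       # optional explicit currency code
)


def parse_salary(text):
    """Return a salary dict for the first range found, else None."""
    match = _RANGE.search(text)
    if match is None:
        return None  # never synthesize a value
    symbol, low, high, code = match.groups()
    return {
        "rawText": match.group(0),
        "min": int(low.replace(",", "")),
        "max": int(high.replace(",", "")),
        "currency": code or _SYMBOLS[symbol],
        "period": None,
    }


print(parse_salary("The base range is $165,000 - $231,000 USD."))
print(parse_salary("Competitive compensation."))
```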

What does includeQuestions add? A per-job call to /jobs/{id}?questions=true, which returns the application-form questions plus compliance / demographic / location questions. It's off by default because it adds one HTTP call per job; turn it on if you need the application schema.
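
The extra cost of includeQuestions is easy to estimate. The sketch below builds the per-job URL from the path quoted above and counts requests; the host is taken from the disclaimer on this page, the helper names are hypothetical, and the "questions" key in the response is an assumption about the API's JSON shape.

```python
BASE = "https://boards-api.greenhouse.io/v1/boards"


def questions_url(token, job_id):
    """Per-job endpoint that includes the application-form questions."""
    return f"{BASE}/{token}/jobs/{job_id}?questions=true"


def count_questions(job_payload):
    """Count application questions; assumes they sit under a 'questions' key."""
    return len(job_payload.get("questions") or [])


def request_count(boards, jobs, include_questions):
    """One call per board, plus one per job when questions are requested."""
    return boards + (jobs if include_questions else 0)


print(questions_url("figma", "5822886004"))
print(request_count(boards=3, jobs=450, include_questions=False))
print(request_count(boards=3, jobs=450, include_questions=True))
```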

What does each dataset-item charge cover? One job row with all fields. maxItems is per board, so maxItems: 100 across 3 boards = up to 300 charges. The Apify Store pricing event is apify-default-dataset-item.
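
As a worked example of the per-board cap, the arithmetic can be written out as below. The board sizes are made-up numbers, and the default price is the "from $0.79 / 1,000 results" figure shown on this page, so the actual rate on a given plan may differ.

```python
def max_rows(board_sizes, max_items):
    """Each board contributes at most max_items rows."""
    return sum(min(size, max_items) for size in board_sizes)


def estimated_cost(rows, price_per_1000=0.79):
    """Cost in USD at a flat per-1,000-results price."""
    return rows * price_per_1000 / 1000


# Hypothetical boards with 250, 80, and 1,200 open jobs, capped at 100 each.
rows = max_rows([250, 80, 1200], max_items=100)
print(rows, round(estimated_cost(rows), 4))
```

Only boards with at least maxItems open jobs reach the cap, so "up to 300 charges" is an upper bound rather than the expected bill.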

My run returned fewer rows than the careers page shows — why? Usually the maxItems per-board cap, or the board genuinely has fewer open roles than a stale page suggests. The API's meta.total is the source of truth and the actor logs it per board.
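
If you want to check this yourself, a comparison along these lines may help. It is a sketch: explain_row_gap is a hypothetical helper, and the payload is assumed to be the board API response with its meta.total field.

```python
def explain_row_gap(payload, rows_emitted, max_items):
    """Classify why a run returned fewer rows than expected."""
    jobs = payload.get("jobs", [])
    # Fall back to the job count if meta.total is missing.
    total = payload.get("meta", {}).get("total", len(jobs))
    if rows_emitted >= total:
        return "complete"
    if rows_emitted >= max_items:
        return "capped by maxItems"
    return "unexpected gap"


print(explain_row_gap({"jobs": [], "meta": {"total": 250}}, 100, 100))
print(explain_row_gap({"jobs": [], "meta": {"total": 40}}, 40, 100))
```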

Support

  • Bugs / feature requests — open an issue on the GitHub repo.
  • Custom exports / extra fields (e.g. parsed pay from metadata, department filters) — drop a note via the Apify Store contact form.
  • Other actors — see my Apify Store profile for the rest of the catalog.

⚠️ Disclaimer

This Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by Greenhouse Software, Inc. or the companies whose boards it can read. All trademarks mentioned are the property of their respective owners.

The scraper reads only publicly available job postings served by Greenhouse's public job-board API (boards-api.greenhouse.io) — the same endpoint that powers each company's public careers page. No login, no CAPTCHA solving, no access to recruiter/admin APIs. It rate-limits via a concurrency cap (default 8).

Users are responsible for:

  • Complying with Greenhouse's and each company's terms of use
  • Following GDPR, CCPA, and your jurisdiction's data-protection laws when storing or processing scraped postings
  • Not contacting candidates or employees referenced in postings
  • Not republishing scraped data in a way that competes with Greenhouse or its customers
