Greenhouse Jobs Scraper & Intent Signals API

Pricing

from $4.90 / 1,000 results
Scrape jobs from any Greenhouse career page instantly. Extract clean, English-only job data with AI intent tagging (e.g., 'Data & AI') and days-active filters. Perfect for B2B sales leads, lead generation, and feeding LLMs.


Developer

Aether

Maintained by Community

Actor stats

Bookmarked: 0 · Total users: 2 · Monthly active users: 1 · Last modified: 3 days ago

Greenhouse Hiring Intent Signals — The SDR's Secret Weapon

This is NOT a generic job scraper. It's a high-speed, API-native extractor purpose-built to feed clean, actionable hiring signals directly into LLM prompt chains and CRM outreach workflows.

A 200-person Series B company posting 6 new "Sales Engineer" roles this week isn't just hiring — they just got budget approval for a tool your software replaces. That's the signal. This Actor surfaces it in under 60 seconds, with zero HTML bloat and zero wasted tokens.


Why Choose This Actor?

⏱ Time-Based Filtering (max_days_old)

Generic scrapers dump every job posted since 2019 into your pipeline — noise that burns API credits and wastes SDR time. This Actor's hard freshness gate drops any job last updated more than N days ago (default: 7). You only see signals from companies actively spending budget right now.
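The freshness gate can be pictured as a one-line age check. This is a hypothetical sketch (`isFresh` is an illustrative helper, not the Actor's actual source):

```typescript
// Hypothetical sketch of the freshness gate: drop any job whose updated_at
// timestamp is older than maxDaysOld days. The `now` parameter exists only
// to make the function testable; it defaults to the current time.
function isFresh(updatedAt: string, maxDaysOld: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - new Date(updatedAt).getTime();
  const ageDays = ageMs / (1000 * 60 * 60 * 24);
  return ageDays <= maxDaysOld;
}
```

With the default `max_days_old: 7`, a job last updated eight days ago never reaches your dataset.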

🏷 Auto Intent Categorization

Every job is classified into a high-level intent_category via zero-cost regex rules — no LLM call required:

| Category | Example Signals | Your Play |
| --- | --- | --- |
| Data & AI | Data Scientist, ML Engineer, AI Product Manager | Pitch your data infra / analytics tool |
| Revenue | Account Executive, SDR, VP of Sales | Pitch your CRM / sales engagement platform |
| Engineering | Backend Engineer, DevOps, Security Engineer | Pitch your developer tool / cloud service |
| Product | Product Manager, Program Manager | Pitch your project management / roadmapping tool |
| Marketing | Growth Marketer, Demand Gen, Content Strategist | Pitch your marketing automation / SEO tool |
| General | Everything else | Generic nurture |

Route each category to a different email sequence automatically — no extra AI processing needed.
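A zero-cost regex classifier of this kind can be sketched as an ordered rule list with a fallback bucket. The patterns below are illustrative assumptions, not the Actor's actual rules:

```typescript
// Hypothetical sketch of the regex-based intent classifier. Rules are checked
// in order; the first match wins, and anything unmatched falls into "General".
const INTENT_RULES: Array<[string, RegExp]> = [
  ["Data & AI", /\b(data scien|machine learning|ml engineer|\bai\b|analytics engineer)/i],
  ["Revenue", /\b(account executive|sales|sdr|bdr|revenue)\b/i],
  ["Engineering", /\b(engineer|developer|devops|sre|security)\b/i],
  ["Product", /\b(product manager|program manager)\b/i],
  ["Marketing", /\b(marketing|growth|demand gen|content)\b/i],
];

function classifyIntent(jobTitle: string): string {
  for (const [category, pattern] of INTENT_RULES) {
    if (pattern.test(jobTitle)) return category;
  }
  return "General"; // fallback bucket for everything else
}
```

Rule order matters: "ML Engineer" should land in Data & AI before the broader Engineering pattern gets a chance to match.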

🌐 English-Only Guarantee

Greenhouse boards for global companies often contain duplicate job postings in French, German, Japanese, and other APAC/EMEA languages. These localized listings add zero value to your English-language outbound campaigns and burn expensive LLM tokens. This Actor auto-detects and filters out non-English job descriptions using a fast Unicode character-range heuristic, keeping your output lean and your token costs flat.
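A character-range heuristic of this sort can be sketched in a few lines. This is an assumed illustration (function name and threshold are mine, not the Actor's):

```typescript
// Hypothetical sketch of the language heuristic: treat a description as
// non-English when too many of its characters fall outside the basic Latin
// and Latin Extended ranges (CJK, kana, Hangul, Cyrillic, etc. all score here).
function looksEnglish(text: string, threshold = 0.1): boolean {
  if (text.length === 0) return true;
  let nonLatin = 0;
  for (const ch of text) {
    const cp = ch.codePointAt(0)!;
    if (cp > 0x024f) nonLatin++; // past Latin Extended-B
  }
  return nonLatin / text.length < threshold;
}
```

This kind of check is orders of magnitude cheaper than a real language-detection model, which is why it keeps token costs flat.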

🧹 LLM-Ready Clean Text

Forget raw HTML with nested <div> tags, inline styles, and 5,000-character walls of text. This Actor strips all HTML via Cheerio and produces a clean job_summary_clean field — the first 400 characters of plain text, word-boundary truncated. Drop it directly into your GPT prompt:

"Write a cold email to the Head of Engineering at {company}, referencing this job opening: {job_summary_clean}"


Input Configuration

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `company_boards` | string[] | Yes | (none) | Greenhouse board tokens (the subdomain slug), e.g. `"datadog"` for https://boards.greenhouse.io/datadog |
| `max_days_old` | integer | No | `7` | Skip jobs last updated more than this many days ago. Set to `1` for today-only signals, `14` for a broader sweep. Max: `90`. |
| `target_departments` | string[] | No | `[]` (all) | Case-insensitive department filter. Pass `["Engineering"]` to only see engineering roles. Leave empty to scrape everything. |

Example input:

```json
{
  "company_boards": ["datadog", "stripe", "figma", "vercel"],
  "max_days_old": 3,
  "target_departments": ["Engineering", "Data Science"]
}
```

Sample Output

Here's exactly what you get — one flat, LLM-ready record per fresh signal:

```json
{
  "board_name": "datadog",
  "job_title": "Senior Software Engineer - Data Platform",
  "department": "Engineering",
  "location": "New York, NY, United States",
  "updated_at": "2026-05-11T09:15:00-04:00",
  "days_active": 2,
  "intent_category": "Engineering",
  "job_summary_clean": "As a Senior Software Engineer on the Data Platform team, you will design and build the next generation of Datadog's petabyte-scale analytics infrastructure. You'll work closely with product teams to deliver real-time observability features used by thousands of enterprise customers. Strong experience with distributed systems, Kafka, and…",
  "apply_url": "https://boards.greenhouse.io/datadog/jobs/9876543"
}
```

Every field earns its place in your workflow:

| Field | What it unlocks |
| --- | --- |
| `days_active` | Sort ascending. Hit 0–2 day signals first — those companies just opened budget. |
| `intent_category` | Route "Data & AI" signals to your data tool pitch. Route "Revenue" signals to your CRM pitch. Zero-runtime classification. |
| `job_summary_clean` | Drop into GPT-4o / Claude prompt. "Write a cold email referencing this opening…" No pre-processing needed. |
| `apply_url` | Cross-reference with your CRM. Existing customer? Deprioritize. Net-new logo? Gold. |
| `department` | Verify alignment with your ICP. You sell to Engineering leads? Filter `target_departments: ["Engineering"]`. |
| `location` | Geo-qualify. Only selling in North America? Filter location downstream in Clay or Airtable. |

How SDR Teams Deploy This

```
┌──────────────────┐     ┌──────────────────┐     ┌──────────────────────┐
│ Apify Scheduler  │────▶│   This Actor     │────▶│    Zapier / Make     │
│ (Daily 6 AM run) │     │ (Fresh signals)  │     │  (Webhook trigger)   │
└──────────────────┘     └──────────────────┘     └──────────┬───────────┘
                                                             │
                                                  ┌──────────▼───────────┐
                                                  │   Clay / Airtable    │
                                                  │  (Enrich + ICP tag)  │
                                                  └──────────┬───────────┘
                                                             │
                                                  ┌──────────▼───────────┐
                                                  │ Smartlead / Outreach │
                                                  │ "{first_name}, saw   │
                                                  │  {company} is hiring │
                                                  │  a {job_title}..."   │
                                                  └──────────────────────┘
```
  1. Schedule daily with max_days_old: 1 to catch every new posting.
  2. Webhook to your enrichment layer — tag each company (ICP fit? Existing customer? Competitor?).
  3. Auto-generate cold emails — pipe job_title + job_summary_clean + intent_category into your LLM prompt.
  4. Prioritize by days_active ascending — freshest signals get first contact.
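Step 3 above amounts to simple string assembly from one Actor record. A minimal sketch (the `Signal` interface and prompt wording are illustrative, not part of the Actor's output contract):

```typescript
// Hypothetical helper for step 3: build an LLM prompt from the fields of
// one dataset record. Field names match the Sample Output section above.
interface Signal {
  board_name: string;
  job_title: string;
  intent_category: string;
  job_summary_clean: string;
}

function buildColdEmailPrompt(s: Signal): string {
  return [
    `Write a three-sentence cold email to a leader at ${s.board_name}.`,
    `They are hiring: ${s.job_title} (intent: ${s.intent_category}).`,
    `Reference this job opening: ${s.job_summary_clean}`,
  ].join("\n");
}
```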

Local Development

```shell
git clone <repo-url> && cd greenhouse-intent-signals-scraper
npm install
npm start        # Run with tsx (no build required)
npm run build    # Compile TypeScript to dist/
```

When running locally without Apify input, the Actor auto-falls back to test boards (datadog, mailchimp) with max_days_old: 14 so you can verify end-to-end immediately.


Cost Efficiency

This Actor calls Greenhouse's public JSON API — one lightweight HTTP GET per company board. No headless browser. No Playwright. No expensive proxy rotation. Scraping 50 boards costs a fraction of what a single paginated website scrape would consume in Apify compute units.
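The per-board request can be sketched as a single `fetch` against what appears to be Greenhouse's public job-board endpoint (`boards-api.greenhouse.io`); treat the URL shape and response typing here as assumptions, not the Actor's verified source:

```typescript
// Minimal fields this sketch cares about; the real API returns more.
interface GreenhouseJob {
  title: string;
  updated_at: string;
  absolute_url: string;
}

// Assumed shape of Greenhouse's public job-board endpoint.
function boardJobsUrl(boardToken: string): string {
  return `https://boards-api.greenhouse.io/v1/boards/${boardToken}/jobs?content=true`;
}

// One lightweight HTTP GET per board: no headless browser, no proxies.
async function fetchBoardJobs(boardToken: string): Promise<GreenhouseJob[]> {
  const res = await fetch(boardJobsUrl(boardToken));
  if (!res.ok) throw new Error(`Board ${boardToken} returned ${res.status}`);
  const body = (await res.json()) as { jobs: GreenhouseJob[] };
  return body.jobs;
}
```

Because each board is one JSON GET, scraping 50 boards is 50 requests total, which is where the compute-unit savings come from.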


Built for SDRs. Optimized for LLMs. No bloat. Just signal.