# Greenhouse Jobs Extractor | Clean Hiring Data API (`gtgyani206/greenhouse-job-extractor`) Actor

Extract structured job listings from any Greenhouse-powered job board using a fast and reliable API-based approach.

This Actor converts Greenhouse job boards into clean, structured datasets that are ready for analysis, dashboards, or integration into your own applications.

- **URL**: https://apify.com/gtgyani206/greenhouse-job-extractor.md
- **Developed by:** [Gyanendra Thakur](https://apify.com/gtgyani206) (community)
- **Categories:** Jobs, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $4.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation, available as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

### What does Greenhouse Job Extractor do?

**Greenhouse Job Extractor** collects open job listings from public [Greenhouse job boards](https://www.greenhouse.io/) by calling the Greenhouse Boards API for one or more company boards. It returns structured job records with the board source URL, company slug, job ID, title, location, departments, job posting URL, update timestamp, and optionally the full description HTML. You can try it immediately by supplying public board URLs such as `https://job-boards.greenhouse.io/webflow`.

Because it runs on the **Apify platform**, you can launch it from the UI or API, schedule recurring runs, monitor errors, connect the output to tools like Make, Zapier, Google Sheets, or custom ETL pipelines, and keep job data flowing into downstream systems. For most public boards this Actor uses a lightweight JSON endpoint instead of browser automation, which keeps runs fast and cost-efficient.

### Why use Greenhouse Job Extractor?

Greenhouse is widely used for public career pages, but manually opening each board is slow and hard to automate. This Actor gives you a repeatable way to collect openings across many Greenhouse-powered companies without building your own scraper from scratch.

Common use cases include tracking new openings for hiring intelligence, building internal job aggregation feeds, enriching CRM or recruiting workflows, monitoring competitors, powering no-code automations, and exporting Greenhouse listings into analytics tools. If you need to watch dozens or hundreds of companies on a schedule, Apify gives you the API access, logging, run history, and integration surface to do it reliably.

### How to use Greenhouse Job Extractor

1. Open the Actor in Apify Console and go to the **Input** tab.
2. Add one or more public Greenhouse board URLs to the `sources` field.
3. Decide whether you want the full job description included in the output.
4. Optionally set `maxResults` if you want to test with a smaller sample.
5. Run the Actor manually or schedule it to run automatically.
6. Open the **Output** tab to inspect the dataset, or fetch the results through the Apify API.

If you are running locally, use `apify run` so the Apify environment and local storage are configured correctly.

### Input

Configure the Actor from the **Input** tab with the following fields:

- `sources`: Array of public Greenhouse board URLs such as `https://job-boards.greenhouse.io/webflow`.
- `includeDescription`: Boolean flag that controls whether the full job description HTML is returned. Default is `true`.
- `maxResults`: Optional integer limit for the total number of jobs returned across all sources.

Example input:

```json
{
  "sources": [
    "https://job-boards.greenhouse.io/webflow",
    "https://job-boards.greenhouse.io/stripe"
  ],
  "includeDescription": true,
  "maxResults": 100
}
````

This Actor is designed for public Greenhouse board URLs. If a URL does not match the expected board format, the Actor skips it and logs a warning.
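
If you want to catch malformed URLs before starting a run, a pre-check along these lines may help. This is a minimal sketch, not the Actor's actual validation logic: the accepted host and the "exactly one path segment" rule are assumptions based on the `https://job-boards.greenhouse.io/<company>` format shown above, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: only the host shown in this README is treated as a valid board host.
BOARD_HOST = "job-boards.greenhouse.io"


def extract_company_slug(url: str) -> Optional[str]:
    """Return the company slug from a board URL, or None if the shape is unexpected."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or parsed.hostname != BOARD_HOST:
        return None
    # A board URL is expected to have exactly one path segment: /<company>
    segments = [s for s in parsed.path.split("/") if s]
    if len(segments) != 1:
        return None
    return segments[0]


def split_sources(urls: list[str]) -> tuple[list[str], list[str]]:
    """Partition URLs into (accepted, skipped) before passing them as `sources`."""
    accepted: list[str] = []
    skipped: list[str] = []
    for url in urls:
        (accepted if extract_company_slug(url) else skipped).append(url)
    return accepted, skipped
```

Passing only the `accepted` list as `sources` keeps skipped-URL warnings out of the run log, while the `skipped` list can be reviewed separately.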

### Output

The Actor stores results in the default dataset as one item per job listing. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.

Simplified output example:

```json
[
  {
    "source": "https://job-boards.greenhouse.io/webflow",
    "company": "webflow",
    "id": 7483921,
    "title": "Senior Software Engineer",
    "location": "Remote, United States",
    "departments": ["Engineering"],
    "url": "https://job-boards.greenhouse.io/webflow/jobs/7483921",
    "updatedAt": "2026-04-20T18:42:11Z",
    "description": "<div><p>Job description HTML...</p></div>"
  }
]
```

If `includeDescription` is set to `false`, the `description` field will be `null`.

### Data table

| Field | Type | Description |
| --- | --- | --- |
| `source` | string | Original Greenhouse board URL used as input |
| `company` | string | Greenhouse company slug extracted from the board URL |
| `id` | number | Unique job ID from Greenhouse |
| `title` | string | Job title |
| `location` | string or null | Reported job location |
| `departments` | string[] | Department names attached to the job |
| `url` | string | Direct link to the public job posting |
| `updatedAt` | string | Last updated timestamp from Greenhouse |
| `description` | string or null | Full job description HTML when enabled |
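
When consuming the dataset in Python, the fields in the table can be mapped onto a typed record. The sketch below is one possible shape, not something the Actor ships: `JobRecord` and `parse_job` are hypothetical names, and the code assumes `updatedAt` is an ISO 8601 timestamp like the one in the output example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class JobRecord:
    source: str
    company: str
    id: int
    title: str
    location: Optional[str]
    departments: tuple[str, ...]
    url: str
    updated_at: datetime
    description: Optional[str]


def parse_job(item: dict[str, Any]) -> JobRecord:
    """Convert one dataset item (fields as in the table above) into a JobRecord."""
    # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    updated = datetime.fromisoformat(item["updatedAt"].replace("Z", "+00:00"))
    return JobRecord(
        source=item["source"],
        company=item["company"],
        id=int(item["id"]),
        title=item["title"],
        location=item.get("location"),
        departments=tuple(item.get("departments") or ()),
        url=item["url"],
        updated_at=updated,
        description=item.get("description"),
    )
```

Nullable fields (`location`, `description`) are read with `.get()` so that items fetched with `includeDescription` set to `false` parse without errors.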

### Pricing / Cost estimation

#### How much does it cost to scrape Greenhouse job boards?

This Actor is relatively inexpensive because it requests Greenhouse's public jobs API instead of rendering pages in a browser. Small runs that query a limited number of boards usually finish quickly and consume only a small amount of platform resources.

Actual cost depends on the number of source URLs, how many jobs each company has open, how often you schedule runs, and whether you include descriptions. If your Apify account includes free usage, smaller monitoring workloads may fit inside that allowance, but pricing and free-tier limits can change, so check the current Apify pricing page before relying on a specific number.
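
For a rough budget, the per-result arithmetic can be sketched as below. This assumes every returned job is billed as one result at the listed starting price of $4.00 per 1,000 results and that no other events are charged; both are assumptions, so check the Actor's pricing details for the actual event prices. The function name and parameters are illustrative.

```python
def estimate_monthly_cost(
    jobs_per_run: int,
    runs_per_day: int,
    price_per_1000: float = 4.00,  # assumption: the "from" price listed for this Actor
    days: int = 30,
) -> float:
    """Estimate the monthly cost in USD for a recurring schedule."""
    total_results = jobs_per_run * runs_per_day * days
    return total_results * price_per_1000 / 1000


# 250 jobs per run, once a day, for 30 days is 7,500 results.
print(estimate_monthly_cost(jobs_per_run=250, runs_per_day=1))  # 30.0
```

Setting `maxResults` lowers `jobs_per_run` directly, which makes it the simplest lever for keeping test runs cheap.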

### Tips or Advanced options

- Turn off `includeDescription` if you only need titles, locations, and links. This reduces payload size and speeds up downstream processing.
- Use `maxResults` when testing integrations so you do not pull a full dataset every time.
- Schedule recurring runs for daily or hourly job monitoring.
- Deduplicate records downstream using `company` plus `id` if you merge multiple runs together.
- Provide canonical public board URLs in the `https://job-boards.greenhouse.io/<company>` format for the most predictable results.
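
The deduplication tip above can be sketched as follows. It is a minimal example that assumes items are passed in chronological run order (oldest run first), so a later copy of the same job replaces an earlier one; `dedupe_jobs` is a hypothetical helper name.

```python
from typing import Any, Iterable


def dedupe_jobs(items: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep one record per (company, id); the last occurrence in the input wins."""
    latest: dict[tuple[str, int], dict[str, Any]] = {}
    for item in items:
        # `id` alone may not be unique across companies, so key on both fields.
        key = (item["company"], item["id"])
        latest[key] = item
    return list(latest.values())
```

The result keeps the order in which each job was first seen, which makes diffs between merged exports easier to read.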

### FAQ, disclaimers, and support

#### Does this work on every Greenhouse URL?

It is intended for public Greenhouse board URLs. Individual job post URLs or unusual custom URL shapes may not match the expected board pattern.

#### Does it bypass authentication or private pages?

No. This Actor only reads publicly available Greenhouse job data.

#### Is scraping Greenhouse legal?

You are responsible for making sure your use complies with the target site's terms, applicable laws, and your own internal policies. Only collect and use data you are permitted to process.

#### Where can I get help?

Use the Actor's **Issues** tab for bug reports, feature requests, and support questions. If you need custom fields, change tracking, downstream integrations, or a broader careers data pipeline, this project can be extended into a custom solution.

# Actor input Schema

## `sources` (type: `array`):

List of Greenhouse job board URLs

## `includeDescription` (type: `boolean`):

Whether to include full job descriptions

## `maxResults` (type: `integer`):

Maximum number of jobs to return

## Actor input object example

```json
{
  "sources": [
    "https://job-boards.greenhouse.io/webflow"
  ],
  "includeDescription": true
}
```

# Actor output Schema

## `results` (type: `string`):

Job listings extracted from the provided public Greenhouse job boards.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "sources": [
        "https://job-boards.greenhouse.io/webflow"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("gtgyani206/greenhouse-job-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "sources": ["https://job-boards.greenhouse.io/webflow"] }

# Run the Actor and wait for it to finish
run = client.actor("gtgyani206/greenhouse-job-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "sources": [
    "https://job-boards.greenhouse.io/webflow"
  ]
}' |
apify call gtgyani206/greenhouse-job-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=gtgyani206/greenhouse-job-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Greenhouse Jobs Extractor | Clean Hiring Data API",
        "description": "Extract structured job listings from any Greenhouse-powered job board using a fast and reliable API-based approach.\n\nThis Actor converts Greenhouse job boards into clean, structured datasets that are ready for analysis, dashboards, or integration into your own applications.",
        "version": "1.0",
        "x-build-id": "fOcCnBc5Xf6kedaVE"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/gtgyani206~greenhouse-job-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-gtgyani206-greenhouse-job-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/gtgyani206~greenhouse-job-extractor/runs": {
            "post": {
                "operationId": "runs-sync-gtgyani206-greenhouse-job-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/gtgyani206~greenhouse-job-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-gtgyani206-greenhouse-job-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "sources"
                ],
                "properties": {
                    "sources": {
                        "title": "Greenhouse Job Board URLs",
                        "type": "array",
                        "description": "List of Greenhouse job board URLs",
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeDescription": {
                        "title": "Include Job Description",
                        "type": "boolean",
                        "description": "Whether to include full job descriptions",
                        "default": true
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Maximum number of jobs to return"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
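
If you prefer not to install a client library, the `run-sync-get-dataset-items` path from the specification can be called with the Python standard library alone. The sketch below follows the spec's server URL, path, and `token` query parameter; the helper names, the 300-second timeout, and the assumption that the response body is a JSON array of dataset items are illustrative choices, not part of the spec.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/gtgyani206~greenhouse-job-extractor/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request described by the spec; nothing is sent here."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_jobs(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the dataset items from the response body."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a valid token and network access):
# jobs = fetch_jobs("<YOUR_API_TOKEN>", {
#     "sources": ["https://job-boards.greenhouse.io/webflow"],
#     "includeDescription": False,
# })
```

Because the synchronous endpoint holds the connection open until the run finishes, long runs over many boards may be better served by the `/runs` path, which returns run information that you can poll.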
