# 💼 HN Who's Hiring Scraper — Structured Tech Job Data & Trends (`nexgendata/hn-whos-hiring-scraper`) Actor

The best HN Who is Hiring parser on Apify — a cheaper alternative to manual LinkedIn scraping. Extract structured jobs: company, role, salary, location, tech stack, remote status & benefits. Filter & analyze 500+ listings/month. Tracker mode reveals hiring trends, top stacks & salary ranges.

- **URL**: https://apify.com/nexgendata/hn-whos-hiring-scraper.md
- **Developed by:** [Stephan Corbeil](https://apify.com/nexgendata) (community)
- **Categories:** Developer tools, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## HN Who's Hiring Scraper -- Structured Tech Job Listings from Hacker News

Automatically scrape and parse Hacker News "Ask HN: Who is Hiring?" monthly threads into clean, structured job data. Stop manually scanning hundreds of unformatted comments -- **get filterable, searchable job listings with extracted company names, salaries, tech stacks, remote status, benefits, and location data in seconds.** This is the only tool that transforms HN hiring threads into production-ready structured data.

### Key Features

- **Smart Comment Parsing** -- Extracts structured fields from free-form HN comments using pattern matching across 100+ technology keywords, salary formats, and location conventions.
- **Company & Role Extraction** -- Identifies company names, job titles, and seniority levels even when posters use inconsistent formatting.
- **Salary Detection** -- Captures salary ranges, equity mentions, and compensation details from text that uses dozens of different formats ($150K, $150,000/yr, 150-200k, etc.).
- **Tech Stack Recognition** -- Detects 100+ technologies including languages (Python, Rust, Go), frameworks (React, Django, Rails), infrastructure (AWS, GCP, Kubernetes), and databases (PostgreSQL, MongoDB, Redis).
- **Remote Status Classification** -- Classifies each listing as remote, hybrid, on-site, or remote-friendly based on contextual analysis of the full comment text.
- **Location Normalization** -- Parses locations like "SF", "San Francisco, CA", "NYC", "Berlin, Germany" into consistent, filterable formats.
- **Historical Thread Access** -- Scrape not just the current month but any past "Who is Hiring?" thread. Build a longitudinal dataset of tech hiring trends going back years.
- **Advanced Filtering** -- Filter by keyword (`filterKeywords`), location (`filterLocation`), or remote-only status (`filterRemoteOnly`) to get exactly the listings you need. Technology names such as "Python" can be passed as keywords.
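
To illustrate the kind of pattern matching that the Salary Detection feature describes, here is a minimal sketch that normalizes a few of the formats listed above into a numeric range. It is not the Actor's implementation: the regular expression and the `parse_salary` helper are assumptions made for illustration, and the function expects an already isolated salary fragment, since it simply takes the first number it finds.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an amount with optional "$" prefix and optional "k" suffix,
# optionally followed by "-" or "to" and a second amount.
_AMOUNT = r"\$?\s*(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s*([kK])?"
_RANGE = re.compile(_AMOUNT + r"(?:\s*(?:-|to)\s*" + _AMOUNT + r")?")


def _to_number(digits: str, k_suffix: Optional[str]) -> int:
    """Convert '150,000' or '150' (with an optional k suffix) to an integer."""
    value = float(digits.replace(",", ""))
    return int(value * 1000) if k_suffix else int(value)


def parse_salary(fragment: str) -> Optional[Tuple[int, int]]:
    """Return (low, high) for a salary fragment, or None if it has no number."""
    match = _RANGE.search(fragment)
    if match is None:
        return None
    low_digits, low_k, high_digits, high_k = match.groups()
    if high_digits is None:
        # Single amount: report it as both bounds.
        low = _to_number(low_digits, low_k)
        return (low, low)
    # In "150-200k" the suffix on the upper bound applies to the lower bound too.
    low = _to_number(low_digits, low_k or high_k)
    high = _to_number(high_digits, high_k)
    return (low, high)
```

A production parser would also need to handle currencies, hourly rates, and numbers that are not salaries at all, which this sketch deliberately ignores.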

### Output Example

Each parsed job listing contains structured fields extracted from the raw HN comment:

```json
{
  "company": "Stripe",
  "role": "Senior Backend Engineer",
  "salary": "$180,000 - $250,000",
  "equity": "0.01% - 0.05%",
  "location": "San Francisco, CA",
  "remote": "hybrid",
  "techStack": ["Ruby", "Go", "AWS", "PostgreSQL", "Kubernetes"],
  "benefits": ["Health insurance", "401k match", "Unlimited PTO"],
  "description": "Building the next generation of payment infrastructure...",
  "hnCommentUrl": "https://news.ycombinator.com/item?id=39562281",
  "threadDate": "2025-03-01",
  "postedBy": "stripe_recruiter"
}
````
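
Because listings are parsed from free-form comments, it seems prudent to assume that any field may be absent or empty. The sketch below loads an item shaped like the example above into a typed record with defaults. `JobListing` and `from_item` are hypothetical names, and treating missing fields as `None` or empty lists is an assumption rather than documented Actor behavior.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class JobListing:
    """Hypothetical typed view of one dataset item; not part of the Actor."""

    company: Optional[str] = None
    role: Optional[str] = None
    salary: Optional[str] = None
    location: Optional[str] = None
    remote: Optional[str] = None
    tech_stack: List[str] = field(default_factory=list)
    benefits: List[str] = field(default_factory=list)
    hn_comment_url: Optional[str] = None

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "JobListing":
        # Assumption: any field may be missing or null, so fall back to defaults.
        return cls(
            company=item.get("company"),
            role=item.get("role"),
            salary=item.get("salary"),
            location=item.get("location"),
            remote=item.get("remote"),
            tech_stack=list(item.get("techStack") or []),
            benefits=list(item.get("benefits") or []),
            hn_comment_url=item.get("hnCommentUrl"),
        )
```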

### How to Use

1. **Select the thread** -- Set `months` to the number of monthly "Who is Hiring?" threads to scrape (1-12, default 1). The Actor auto-detects the latest thread, so no thread URL is needed.
2. **Set filters** -- Optionally filter by keywords such as "machine learning" or "Python" (`filterKeywords`), by location such as "Remote" (`filterLocation`), or restrict results to remote roles (`filterRemoteOnly`). Leave the filters blank to get all listings.
3. **Run the Actor** -- Click "Start" or trigger via API. The Actor parses all top-level comments and extracts structured data, typically completing in under 60 seconds.
4. **Export and analyze** -- Download results as JSON/CSV, push to Google Sheets, or integrate with your ATS, job board, or recruitment pipeline via webhooks.
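
For step 4, if you prefer to build the CSV yourself from dataset items, list fields such as `techStack` need to be flattened first. The following is a minimal sketch using only the standard library. `items_to_csv`, the chosen columns, and the `"; "` separator are assumptions for illustration, not the platform's own export format.

```python
import csv
import io
from typing import Any, Dict, Iterable

# Hypothetical column selection, taken from the output example above.
COLUMNS = ["company", "role", "salary", "location", "remote", "techStack"]


def items_to_csv(items: Iterable[Dict[str, Any]]) -> str:
    """Serialize dataset items to CSV text, joining list values with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        row = {}
        for column in COLUMNS:
            value = item.get(column)
            if isinstance(value, list):
                value = "; ".join(str(entry) for entry in value)
            # Missing or null fields become empty cells.
            row[column] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()
```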

### Integration Examples

**Python client**

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Input keys follow the Actor input schema (filterKeywords, filterRemoteOnly, maxResults).
run = client.actor("nexgendata/hn-whos-hiring-scraper").call(
    run_input={
        "filterKeywords": ["machine learning"],
        "filterRemoteOnly": True,
        "maxResults": 100,
    }
)

# Use .get() in case a listing lacks one of these fields.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("company"), item.get("role"), item.get("salary"))
```

**cURL**

```bash
curl "https://api.apify.com/v2/acts/nexgendata~hn-whos-hiring-scraper/runs?token=YOUR_API_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"filterKeywords": ["machine learning"], "filterRemoteOnly": true}'
```

### Use Cases

- **Job Seekers** -- Find high-quality tech jobs from companies that post on Hacker News, typically well-funded startups and established tech companies. Filter by your preferred stack, location, and salary range.
- **Recruiters & Talent Teams** -- Monitor which companies are actively hiring and for what roles. Identify competing offers and salary benchmarks in your market.
- **Market Researchers** -- Track which technologies are in highest demand month over month. Build trend reports on the rise of AI/ML roles, Rust adoption, or remote work shifts.
- **Startup Analysts** -- Identify well-funded startups entering hiring mode. A burst of HN job posts often signals a recent funding round or product launch.
- **Job Board Operators** -- Enrich your job board with high-quality structured listings from HN threads. These are often exclusive postings not found on LinkedIn or Indeed.
- **Compensation Analysts** -- Aggregate salary data from hundreds of postings to build compensation benchmarks for specific roles, technologies, and locations in the tech industry.
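
As a rough illustration of the market research use case, the sketch below counts how many listings mention each technology, based on the `techStack` field shown in the output example. `top_technologies` is a hypothetical helper, and it assumes that `techStack` may be missing or null for some listings.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple


def top_technologies(items: Iterable[Dict[str, Any]], n: int = 5) -> List[Tuple[str, int]]:
    """Return the n most frequently listed technologies with their listing counts."""
    counts: Counter = Counter()
    for item in items:
        # Count each technology at most once per listing.
        counts.update(set(item.get("techStack") or []))
    # Sort by descending count, then by name, so ties are ordered deterministically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]
```

Running this over the datasets of several monthly runs and comparing the results is one simple way to approximate a month-over-month trend.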

### FAQ

**What are the rate limits?**
The Actor scrapes Hacker News respectfully with built-in rate limiting. A typical monthly thread with 500+ comments completes in under 60 seconds with no risk of being blocked.

**How fresh is the data?**
Data is scraped in real time from the live HN thread. Comments posted after a run has started are not included -- re-run the Actor to capture late additions. The "Who is Hiring?" thread is posted on the first of each month.

**What output formats are supported?**
JSON, CSV, Excel, and XML. You can also push results to Google Sheets, Slack, or any webhook endpoint. The structured schema makes it easy to import into any database or analytics tool.

**How much does it cost?**
This Actor uses pay-per-event pricing at $0.008 per parsed listing plus $0.01 per Actor start. A typical monthly thread with 400 listings costs approximately $3.21 (400 × $0.008 + $0.01). See the pricing table below.

**Is there a competing tool for this data?**
No. This is the only tool that provides structured, machine-readable data from HN "Who is Hiring?" threads. The alternative is manually reading hundreds of unformatted comments or writing your own parser from scratch.

**Can I access this via API?**
Yes. Full REST API access plus the official Python and JavaScript/TypeScript API clients. Schedule monthly runs to automatically capture each new "Who is Hiring?" thread as it is posted.

### Pricing

| Metric | Cost |
|---|---|
| Cost per parsed listing | $0.008 |
| Cost per Actor start | $0.01 |
| 100 listings | $0.81 |
| 400 listings (typical month) | $3.21 |
| 12 months of data | ~$38.52 |
| **Manual parsing (3 hrs @ $50/hr)** | **$150/month** |
| **Custom scraper development** | **$2,000+ one-time** |
| **Competing structured HN job data** | **Does not exist** |

A full year of monthly HN hiring data costs under $40 -- less than a single hour of manual research.
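
The figures in the table follow from the two per-event rates. A small sketch of the arithmetic, assuming the rates stated above and one Actor start per run (`estimate_cost` is a hypothetical helper, not part of any Apify library):

```python
from decimal import Decimal

# Rates taken from the pricing table above.
PRICE_PER_LISTING = Decimal("0.008")
PRICE_PER_START = Decimal("0.01")


def estimate_cost(listings: int, runs: int = 1) -> Decimal:
    """Estimated cost in USD: per-listing charges plus one start charge per run."""
    return PRICE_PER_LISTING * listings + PRICE_PER_START * runs
```

Under these assumptions, `estimate_cost(400)` comes to 3.21, and twelve such runs, `estimate_cost(4800, runs=12)`, come to 38.52, matching the table.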

### Why Choose This Actor

- **Unique data source** -- No other tool provides structured, queryable data from HN "Who is Hiring?" threads. This is exclusive intelligence you cannot get from LinkedIn, Indeed, or any job aggregator.
- **100+ technology keywords** -- The most comprehensive tech stack detection for job listings, recognizing languages, frameworks, infrastructure tools, and databases automatically.
- **Historical trend analysis** -- Build longitudinal datasets spanning months or years to track hiring trends, salary shifts, and technology adoption in the startup ecosystem.
- **Zero maintenance** -- The Actor handles HN's comment format variations, edge cases, and thread structure changes automatically. No custom parser to maintain.

### Get Started

New to Apify? [Sign up here](https://apify.com/?fpr=2ayu9b) to get started with $5 in free credits -- enough to parse several months of HN hiring threads for free.

### Related Hacker News Actors

| Actor | What It Does | Best For |
|-------|-------------|----------|
| [Hacker News Scraper & Trend Tracker](https://apify.com/nexgendata/hacker-news-scraper) | Front page stories, engagement velocity, topic classification | Content strategy, trend monitoring, market research |
| [HN Who's Hiring Scraper](https://apify.com/nexgendata/hn-whos-hiring-scraper) | Structured jobs from monthly hiring threads — salary, tech stack, remote status | Recruiting, job market analysis, hiring trends |

[Sign up for Apify](https://apify.com?fpr=2ayu9b) to get started with $5 in free credits every month.

# Actor input Schema

## `months` (type: `integer`):

Number of monthly 'Who is Hiring?' threads to scrape (1-12). Each month is a separate thread posted on the 1st.

## `filterKeywords` (type: `array`):

Only include job listings containing at least one of these keywords (case-insensitive). Leave empty to get all listings.

## `filterLocation` (type: `string`):

Only include listings mentioning this location (e.g., 'San Francisco', 'Remote', 'London'). Leave empty for all locations.

## `filterRemoteOnly` (type: `boolean`):

If enabled, only return job listings that explicitly mention remote work.

## `maxResults` (type: `integer`):

Maximum number of job listings to return. Use a lower number for faster runs.

## `outputMode` (type: `string`):

Choose 'raw' for just job listings, or 'tracker' to include a hiring analytics summary with top companies, tech stack trends, remote vs onsite breakdown, salary distribution, and monthly comparisons.

## Actor input object example

```json
{
  "months": 1,
  "filterKeywords": [],
  "filterRemoteOnly": false,
  "maxResults": 200,
  "outputMode": "raw"
}
```
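
Before starting a run, it can help to check an input object against the constraints listed in the OpenAPI specification further down (`months` 1-12, `maxResults` 1-5000, `outputMode` either `raw` or `tracker`). The sketch below is a client-side convenience under those constraints; `validate_input` is a hypothetical helper, and the platform performs its own validation regardless.

```python
from typing import Any, Dict, List

ALLOWED_OUTPUT_MODES = ("raw", "tracker")


def _is_int_in_range(value: Any, low: int, high: int) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and low <= value <= high


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in run_input; an empty list means it looks valid."""
    errors: List[str] = []
    if not _is_int_in_range(run_input.get("months", 1), 1, 12):
        errors.append("months must be an integer from 1 to 12")
    if not _is_int_in_range(run_input.get("maxResults", 200), 1, 5000):
        errors.append("maxResults must be an integer from 1 to 5000")
    keywords = run_input.get("filterKeywords", [])
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        errors.append("filterKeywords must be a list of strings")
    if not isinstance(run_input.get("filterRemoteOnly", False), bool):
        errors.append("filterRemoteOnly must be a boolean")
    if run_input.get("outputMode", "raw") not in ALLOWED_OUTPUT_MODES:
        errors.append("outputMode must be 'raw' or 'tracker'")
    return errors
```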

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "months": 1,
    "filterKeywords": [],
    "filterLocation": "",
    "filterRemoteOnly": false,
    "maxResults": 200,
    "outputMode": "raw"
};

// Run the Actor and wait for it to finish
const run = await client.actor("nexgendata/hn-whos-hiring-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "months": 1,
    "filterKeywords": [],
    "filterLocation": "",
    "filterRemoteOnly": False,
    "maxResults": 200,
    "outputMode": "raw",
}

# Run the Actor and wait for it to finish
run = client.actor("nexgendata/hn-whos-hiring-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "months": 1,
  "filterKeywords": [],
  "filterLocation": "",
  "filterRemoteOnly": false,
  "maxResults": 200,
  "outputMode": "raw"
}' |
apify call nexgendata/hn-whos-hiring-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=nexgendata/hn-whos-hiring-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "💼 HN Who's Hiring Scraper — Structured Tech Job Data & Trends",
        "description": "The best HN Who is Hiring parser on Apify — a cheaper alternative to manual LinkedIn scraping. Extract structured jobs: company, role, salary, location, tech stack, remote status & benefits. Filter & analyze 500+ listings/month. Tracker mode reveals hiring trends, top stacks & salary ranges.",
        "version": "0.0",
        "x-build-id": "erYU1gdLY2po5yPbz"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/nexgendata~hn-whos-hiring-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-nexgendata-hn-whos-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/nexgendata~hn-whos-hiring-scraper/runs": {
            "post": {
                "operationId": "runs-sync-nexgendata-hn-whos-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/nexgendata~hn-whos-hiring-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-nexgendata-hn-whos-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "months": {
                        "title": "Months to scrape",
                        "minimum": 1,
                        "maximum": 12,
                        "type": "integer",
                        "description": "Number of monthly 'Who is Hiring?' threads to scrape (1-12). Each month is a separate thread posted on the 1st.",
                        "default": 1
                    },
                    "filterKeywords": {
                        "title": "Filter keywords",
                        "type": "array",
                        "description": "Only include job listings containing at least one of these keywords (case-insensitive). Leave empty to get all listings.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "filterLocation": {
                        "title": "Filter by location",
                        "type": "string",
                        "description": "Only include listings mentioning this location (e.g., 'San Francisco', 'Remote', 'London'). Leave empty for all locations."
                    },
                    "filterRemoteOnly": {
                        "title": "Remote only",
                        "type": "boolean",
                        "description": "If enabled, only return job listings that explicitly mention remote work.",
                        "default": false
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Maximum number of job listings to return. Use a lower number for faster runs.",
                        "default": 200
                    },
                    "outputMode": {
                        "title": "Output mode",
                        "enum": [
                            "raw",
                            "tracker"
                        ],
                        "type": "string",
                        "description": "Choose 'raw' for just job listings, or 'tracker' to include a hiring analytics summary with top companies, tech stack trends, remote vs onsite breakdown, salary distribution, and monthly comparisons.",
                        "default": "raw"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
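
If you would rather not install a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library. The following is a minimal sketch under the assumption that the response body is a JSON array of dataset items. `build_request` and `fetch_items` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Server URL and path as given in the OpenAPI specification above.
BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/nexgendata~hn-whos-hiring-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request with the token as a query parameter and the input as JSON."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{BASE_URL}{ACTOR_PATH}?{query}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Run the Actor synchronously and return its dataset items.

    Assumes the response body is a JSON array; the call blocks until the run finishes.
    """
    with urllib.request.urlopen(build_request(token, run_input)) as response:
        return json.load(response)
```

For example, `fetch_items("<YOUR_API_TOKEN>", {"months": 1, "maxResults": 50})` would start a run and wait for its results. For long runs, the asynchronous `/runs` endpoint is likely a better fit.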
