# Hacker News Job Scraper: Who is Hiring Posts (`getascraper/hn-hiring-scraper`) Actor

Scrape Hacker News Who is Hiring job posts into structured JSON. Extract company, role, salary, remote status, tech stack, emails, and application URLs. Drop-in for Google Sheets, Airtable, and Zapier. Skip manual copy-paste. $0.02 per job.

- **URL**: https://apify.com/getascraper/hn-hiring-scraper.md
- **Developed by:** [GetAScraper](https://apify.com/getascraper) (community)
- **Categories:** Lead generation, AI, Social media
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $20.00 / 1,000 jobs

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## HN Who is Hiring Scraper

Extract structured job postings from Hacker News monthly "Who is Hiring?" threads. Parse company, role, location, remote status, salary, technologies, emails, and application URLs from the largest organic tech job board on the internet.

Built on the official Hacker News Firebase API and Algolia Search API for reliable, rate-limit-free access to job data.
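For orientation, the two public endpoints named above can be addressed roughly as follows. This is a sketch only: the Actor's actual requests are not documented here. The helper names are hypothetical, and locating the monthly threads by searching for stories from the `whoishiring` account is an assumption.

```python
from urllib.parse import urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"


def firebase_item_url(item_id: int) -> str:
    """URL of a single HN item (story or comment) in the Firebase API."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def algolia_hiring_threads_url(months_back: int = 1) -> str:
    """URL listing the newest stories posted by the `whoishiring` account.

    Assumption: the account posts about three threads per month, so the
    caller still has to keep only titles containing "Who is hiring?".
    """
    query = urlencode({
        "tags": "story,author_whoishiring",
        "hitsPerPage": months_back * 3,
    })
    return f"{ALGOLIA_BASE}/search_by_date?{query}"
```

Fetching `firebase_item_url(<thread id>)` returns a JSON object whose `kids` array lists the IDs of the top-level comments, which is where individual job posts live.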

### Why use it?

- **Structured Data**: Extracts company, role, location, remote status, salary, technologies, emails, and URLs from unstructured HN comments
- **Auto-Discovery**: Automatically finds the latest "Who is Hiring?" posts without needing specific URLs
- **Historical Data**: Scrape multiple months back for trend analysis
- **Tech-Focused**: Identifies technologies mentioned in each job post for filtering and analysis
- **Contact Extraction**: Automatically finds email addresses and application URLs

### How to use

1. Open the Actor in Apify Console.
2. Leave `startUrls` empty to auto-discover the latest hiring post, or provide specific HN post URLs.
3. Set `monthsBack` to scrape multiple months (max 12).
4. Set `maxJobsPerMonth` to limit results (0 = unlimited).
5. Optionally enable `includeReplies` to capture nested discussion threads.
6. Run the Actor and consume the output via Apify API, CSV, or JSON.
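If you fetch the items as JSON and want a spreadsheet-friendly file, a small converter along these lines may help. It is a minimal sketch: `items_to_csv` and the column selection are illustrative choices rather than part of the Actor, and list fields are joined with `; ` so each job stays on one row.

```python
import csv
import io

# Columns to export; adjust to the fields you need.
COLUMNS = ["company", "role", "location", "remoteStatus", "salary",
           "technologies", "emails", "urls", "hnUrl"]


def items_to_csv(items, columns=COLUMNS):
    """Serialize dataset items to CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for item in items:
        row = []
        for column in columns:
            value = item.get(column)
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            # Missing or null fields become empty cells.
            row.append("" if value is None else value)
        writer.writerow(row)
    return buffer.getvalue()
```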

### Input fields

- `startUrls` (array, optional): Specific HN "Who is Hiring?" post URLs. If empty, auto-discovers the latest posts.
- `monthsBack` (integer): How many months of hiring posts to scrape when auto-discovering. Default: 1, Max: 12.
- `maxJobsPerMonth` (integer): Maximum job postings to extract per month. Default: 0 (unlimited).
- `includeReplies` (boolean): Whether to include nested replies/discussion threads. Default: false.
- `proxyConfiguration` (object): Proxy configuration for API requests. Optional - HN APIs are generally open.
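The fields above can be assembled and sanity-checked before a run. The following is a minimal sketch, not part of the Actor: `build_input` is a hypothetical helper, and the bounds it enforces (1 to 12 for `monthsBack`, 0 or more for `maxJobsPerMonth`) are taken from the input schema in this document.

```python
def build_input(start_urls=(), months_back=1, max_jobs_per_month=0,
                include_replies=False):
    """Return an Actor input dict, rejecting values outside the documented limits."""
    if not 1 <= months_back <= 12:
        raise ValueError("monthsBack must be between 1 and 12")
    if max_jobs_per_month < 0:
        raise ValueError("maxJobsPerMonth must be 0 (unlimited) or positive")
    return {
        # Each start URL is wrapped in an object with a `url` key.
        "startUrls": [{"url": url} for url in start_urls],
        "monthsBack": months_back,
        "maxJobsPerMonth": max_jobs_per_month,
        "includeReplies": include_replies,
        "proxyConfiguration": {"useApifyProxy": False},
    }
```

For example, `build_input(months_back=3, max_jobs_per_month=200)` leaves `startUrls` empty, so the latest threads are auto-discovered.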

### Output schema

Each dataset item represents one job posting:

```json
{
  "commentId": 22666455,
  "hnUser": "kfx",
  "postedAt": 1584984975,
  "postedAtIso": "2020-03-23T17:36:15.000Z",
  "rawText": "PBS | Various Engineers | Full-Time | ONSITE...",
  "cleanText": "PBS | Various Engineers | Full-Time | ONSITE...",
  "company": "PBS",
  "role": "Various Engineers",
  "location": "Alexandria, VA",
  "remoteStatus": "ONSITE (Flexible WFH)",
  "employmentType": "Full-Time",
  "salary": null,
  "technologies": ["express", "iOS"],
  "emails": ["digitaljobs@pbs.org"],
  "urls": ["https://tinyurl.com/v7c8nb2"],
  "isTopLevel": true,
  "parentId": 22665398,
  "replyCount": 0,
  "hnUrl": "https://news.ycombinator.com/item?id=22666455"
}
````

### Data table

| Field | Type | Description |
|---|---|---|
| `commentId` | number | Unique HN comment ID |
| `company` | string | Company name (extracted from post) |
| `role` | string | Job role/title |
| `location` | string | Job location |
| `remoteStatus` | string | REMOTE, ONSITE, HYBRID, etc. |
| `employmentType` | string | Full-time, Contract, Intern, etc. |
| `salary` | string | Salary range if found in text |
| `technologies` | array | Technologies mentioned in the post |
| `emails` | array | Email addresses found |
| `urls` | array | Application/company URLs found |
| `hnUser` | string | HN username who posted the job |
| `postedAtIso` | string | ISO timestamp of the post |
| `replyCount` | number | Number of replies to this job post |
| `hnUrl` | string | Direct link to the comment on HN |
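When consuming items, two small helpers like these can be handy. They are a sketch with hypothetical names. Two assumptions apply: `postedAt` is treated as a Unix timestamp in seconds, which matches the sample item where it corresponds to `postedAtIso`, and any extracted field may be `null`, as `salary` is in the sample.

```python
from datetime import datetime, timezone


def posted_at_utc(item: dict) -> datetime:
    """Convert the `postedAt` Unix timestamp (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(item["postedAt"], tz=timezone.utc)


def one_line_summary(item: dict) -> str:
    """Render 'company | role | location', substituting '?' for missing fields."""
    parts = (item.get("company"), item.get("role"), item.get("location"))
    return " | ".join(part or "?" for part in parts)
```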

### Pricing / cost estimation

Priced at **$0.02 per job posting** (Pay-per-Result).

| Target Jobs | Estimated Cost |
|---|---|
| 100 | $2.00 |
| 500 | $10.00 |
| 1,000 | $20.00 |

HN APIs are open and free to access, so a proxy is typically not needed and no proxy costs apply.
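The table is a straight multiplication by the per-job price, which can be sketched as below. The function names are hypothetical; the only figure taken from this page is the $0.02 price, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

PRICE_PER_JOB_USD = Decimal("0.02")  # price stated above


def estimate_cost_usd(job_count: int) -> Decimal:
    """Estimated charge for `job_count` extracted job postings."""
    if job_count < 0:
        raise ValueError("job_count must not be negative")
    return PRICE_PER_JOB_USD * job_count


def max_jobs_for_budget(budget_usd: str) -> int:
    """Largest number of job postings that fits within `budget_usd`."""
    return int(Decimal(budget_usd) // PRICE_PER_JOB_USD)
```

A budget-derived job count can then be used to choose `maxJobsPerMonth` before starting a run.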

### Tips / Advanced

- **Auto-Discovery**: Leave `startUrls` empty and set `monthsBack` to 3-6 to get a rolling window of hiring posts
- **Focus on Remote**: Filter output by `remoteStatus` field containing "REMOTE"
- **Tech Filtering**: Use the `technologies` array to find jobs matching specific skills
- **Speed**: Each API call has a 50ms delay to be respectful to HN. Expect ~20 jobs/minute
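The remote and technology filters from the tips above could be applied to downloaded items roughly like this. It is a sketch with hypothetical helper names. It assumes `remoteStatus` is free text (for example `"ONSITE (Flexible WFH)"`), so it uses a case-insensitive substring check, and that `technologies` may be empty or missing.

```python
def is_remote(item: dict) -> bool:
    """True when `remoteStatus` contains "REMOTE" (case-insensitive)."""
    return "REMOTE" in (item.get("remoteStatus") or "").upper()


def mentions_any(item: dict, wanted: set) -> bool:
    """True when the item's `technologies` overlap `wanted`, ignoring case."""
    found = {tech.lower() for tech in item.get("technologies") or []}
    return bool(found & {tech.lower() for tech in wanted})


def filter_jobs(items, wanted_tech=(), remote_only=False):
    """Keep items matching the requested technologies and remote preference."""
    kept = []
    for item in items:
        if remote_only and not is_remote(item):
            continue
        if wanted_tech and not mentions_any(item, set(wanted_tech)):
            continue
        kept.append(item)
    return kept
```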

### FAQ

**Is scraping Hacker News legal?**
HN provides official APIs (Firebase and Algolia) for accessing this data. This Actor uses those APIs, not HTML scraping.

**Why did I get fewer results than expected?**
Some comments in hiring threads are discussion, not job posts. The parser attempts to filter these, but imperfectly. Set `includeReplies: true` to capture more.

**Can I scrape historical data?**
Yes. Set `monthsBack` up to 12 to scrape past hiring threads. Note that older posts may have fewer active listings.

### Support

For bug reports or feature requests, open a ticket in the Issues tab.

# Actor input Schema

## `startUrls` (type: `array`):

Specific HN 'Who is Hiring?' post URLs. If empty, auto-discovers the latest posts.

## `monthsBack` (type: `integer`):

How many months of hiring posts to scrape when auto-discovering. Max 12.

## `maxJobsPerMonth` (type: `integer`):

Maximum job postings to extract per month. 0 = unlimited.

## `includeReplies` (type: `boolean`):

Whether to include nested replies/discussion threads (not just top-level job posts).

## `proxyConfiguration` (type: `object`):

Proxy configuration for API requests. Optional - HN APIs are generally open.

## Actor input object example

```json
{
  "startUrls": [
    {
      "url": "https://news.ycombinator.com/item?id=47975571"
    }
  ],
  "monthsBack": 1,
  "maxJobsPerMonth": 0,
  "includeReplies": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "startUrls": [
        {
            "url": "https://news.ycombinator.com/item?id=47975571"
        }
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("getascraper/hn-hiring-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "startUrls": [{ "url": "https://news.ycombinator.com/item?id=47975571" }] }

# Run the Actor and wait for it to finish
run = client.actor("getascraper/hn-hiring-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "startUrls": [
    {
      "url": "https://news.ycombinator.com/item?id=47975571"
    }
  ]
}' |
apify call getascraper/hn-hiring-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=getascraper/hn-hiring-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Hacker News Job Scraper: Who is Hiring Posts",
        "description": "Scrape Hacker News Who is Hiring job posts into structured JSON. Extract company, role, salary, remote status, tech stack, emails, and application URLs. Drop-in for Google Sheets, Airtable, and Zapier. Skip manual copy-paste. $0.02 per job.",
        "version": "0.1",
        "x-build-id": "778d9YZHdjQRy2FMy"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/getascraper~hn-hiring-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-getascraper-hn-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/getascraper~hn-hiring-scraper/runs": {
            "post": {
                "operationId": "runs-sync-getascraper-hn-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/getascraper~hn-hiring-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-getascraper-hn-hiring-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "startUrls": {
                        "title": "Hiring Post URLs",
                        "type": "array",
                        "description": "Specific HN 'Who is Hiring?' post URLs. If empty, auto-discovers the latest posts.",
                        "items": {
                            "type": "object",
                            "required": [
                                "url"
                            ],
                            "properties": {
                                "url": {
                                    "type": "string",
                                    "title": "URL of a web page",
                                    "format": "uri"
                                }
                            }
                        }
                    },
                    "monthsBack": {
                        "title": "Months back",
                        "minimum": 1,
                        "maximum": 12,
                        "type": "integer",
                        "description": "How many months of hiring posts to scrape when auto-discovering. Max 12.",
                        "default": 1
                    },
                    "maxJobsPerMonth": {
                        "title": "Max jobs per month",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Maximum job postings to extract per month. 0 = unlimited.",
                        "default": 0
                    },
                    "includeReplies": {
                        "title": "Include replies",
                        "type": "boolean",
                        "description": "Whether to include nested replies/discussion threads (not just top-level job posts).",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Proxy configuration for API requests. Optional - HN APIs are generally open.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
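
For projects without an Apify client library, the `run-sync-get-dataset-items` path from the specification above can be called with the standard library alone. The sketch below only prepares the request: `build_sync_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "getascraper~hn-hiring-scraper"


def build_sync_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    query = urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_sync_request(token, {"monthsBack": 1}))` should block until the run finishes. The response body is expected to contain the dataset items, but check the status code and content type before parsing, since the specification only documents a generic `200 OK`.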
