Pricing

from $2.00 / 1,000 results

Hacker News Who Is Hiring Scraper – Jobs, Salary & Email

Scrape structured job listings from Hacker News 'Who is Hiring?' monthly threads. Extracts company, role, location, salary, remote policy and tech stack — no AI, no API key, no proxy needed.

Developer

Logiover

Hacker News Who Is Hiring Scraper — Jobs, Salary & Tech Stack Data

Scrape structured job listings from Hacker News "Ask HN: Who is Hiring?" monthly threads. Extracts company name, role, location, salary, remote policy, tech stack, visa sponsorship, apply URL, and contact email — automatically, from any month going back years. No AI, no API key, no proxy required.

What Is This Actor?

Every month, Hacker News hosts one of the internet's most trusted job boards: the "Ask HN: Who is Hiring?" thread. Thousands of startups, scale-ups, and tech companies post jobs there directly — no recruiter markup, no job board fees, straight from the hiring team. Each thread typically contains 400–900 real job postings.

This actor reads those threads via the Algolia HN API, parses every job comment into structured fields, and outputs a clean dataset ready for analysis, job alerts, or CRM import.

Built for:

👩‍💻 Job seekers — filter thousands of HN listings by tech stack, remote policy, or salary without reading every comment
📊 Recruiters & HR teams — monitor the HN talent market and track competing companies' hiring activity
🔬 Researchers & analysts — study tech hiring trends, salary ranges, and in-demand skills over time
🤖 Pipeline builders — feed structured job data into Notion, Airtable, or a custom job alert bot
📈 Investors & founders — understand who is scaling and what roles are in demand across the startup ecosystem
🗃️ Data engineers — build a historical job market dataset from months or years of HN hiring threads

Features

Three scrape modes — monthly hiring threads, specific thread by ID, or full-text keyword search across all of HN
Structured field parsing — extracts company, role, location, salary, remote policy, tech stack, visa info, apply URL, and contact email from free-form comment text
40+ tech stack keywords detected — Python, Go, Rust, React, Kubernetes, PostgreSQL, AWS, LLMs, and more
Remote policy classification — distinguishes Remote, Hybrid, and Onsite from natural language mentions
Salary range extraction — detects $120k–$160k, $200k/yr, and similar formats
Visa sponsorship detection — flags H1B mentions, "visa sponsorship available", and "no visa sponsorship"
Keyword include/exclude filters — narrow results to exactly the roles you want
Remote-only filter — one toggle to return only remote-friendly listings
Multi-month history — scrape up to 24 months of threads in a single run
No API key, no proxy, no login — uses the public Algolia HN API
Minimal dependencies — only the Apify SDK; no Playwright, no Cheerio, no browser

Output Data

Each record represents one parsed job posting (top-level comment) from a hiring thread.

| Field | Type | Description |
| --- | --- | --- |
| `commentId` | string | HN item ID of the comment |
| `threadId` | string | HN item ID of the parent thread |
| `threadTitle` | string | Full title of the Ask HN thread |
| `threadMonth` | string | Month and year of the thread (e.g. `"May 2025"`) |
| `author` | string | HN username of the commenter |
| `company` | string \| null | Company name parsed from the first line of the posting |
| `role` | string \| null | Job title or role parsed from the first line |
| `location` | string \| null | Office location(s) detected in the text |
| `remote` | string \| null | `"Remote"`, `"Hybrid"`, or `"Onsite"` |
| `salary` | string \| null | Salary range or figure if mentioned (raw string) |
| `techStack` | array \| null | List of detected technologies and languages |
| `visa` | string \| null | Visa sponsorship status if mentioned |
| `applyUrl` | string \| null | First URL found in the posting (apply link or company site) |
| `email` | string \| null | Contact email address if present |
| `fullText` | string | Complete plain-text content of the job posting |
| `postedAt` | string | ISO 8601 timestamp of when the comment was posted |
| `hnUrl` | string | Direct link to the comment on Hacker News |
| `scrapedAt` | string | ISO 8601 timestamp of when this record was scraped |

Sample Output Record

{
  "commentId": "43812345",
  "threadId": "43800001",
  "threadTitle": "Ask HN: Who is Hiring? (May 2025)",
  "threadMonth": "May 2025",
  "author": "jane_at_acme",
  "company": "Acme AI",
  "role": "Senior Backend Engineer",
  "location": "San Francisco, Remote",
  "remote": "Remote",
  "salary": "$160k–$200k",
  "techStack": ["Python", "Go", "PostgreSQL", "Kubernetes", "AWS"],
  "visa": "Visa sponsorship available",
  "applyUrl": "https://acmeai.io/careers",
  "email": "jobs@acmeai.io",
  "fullText": "Acme AI | Senior Backend Engineer | Remote | $160k–$200k\n\nWe're building the next generation of AI infrastructure...",
  "postedAt": "2025-05-01T10:22:05.000Z",
  "hnUrl": "https://news.ycombinator.com/item?id=43812345",
  "scrapedAt": "2025-05-15T14:00:00.000Z"
}
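
For downstream code, it can help to load records into a typed structure that makes the nullable fields explicit. The following is a minimal sketch in Python; the `JobRecord` class and `parse_record` helper are hypothetical names for illustration, not part of the actor, and only a subset of the fields is modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobRecord:
    """Hypothetical typed view over a subset of the output fields."""
    comment_id: str
    hn_url: str
    company: Optional[str] = None
    role: Optional[str] = None
    remote: Optional[str] = None
    salary: Optional[str] = None
    tech_stack: list = field(default_factory=list)


def parse_record(raw: dict) -> JobRecord:
    """Map one dataset item (a dict decoded from JSON) to a JobRecord."""
    return JobRecord(
        comment_id=raw["commentId"],
        hn_url=raw["hnUrl"],
        company=raw.get("company"),
        role=raw.get("role"),
        remote=raw.get("remote"),
        salary=raw.get("salary"),
        # techStack may be null in the output, so normalize it to an empty list
        tech_stack=raw.get("techStack") or [],
    )
```

Normalizing a null `techStack` to an empty list keeps later counting and filtering code free of null checks.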

Detected Tech Stack Keywords

The actor scans each posting for 40+ technology keywords across languages, frameworks, databases, cloud platforms, and AI/ML tools:

Languages: Python, JavaScript, TypeScript, Go / Golang, Rust, Java, Kotlin, Swift, C++, C#, Ruby, PHP, Scala, Elixir, Clojure, Haskell

Frontend: React, Vue, Angular, Next.js, Svelte

Backend: Node.js, Express, Django, FastAPI, Flask, Rails, Spring, Laravel

Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Cassandra, DynamoDB

Cloud & DevOps: AWS, GCP, Azure, Kubernetes, Docker, Terraform, Ansible

APIs & Messaging: GraphQL, REST, gRPC, Kafka, RabbitMQ, Celery

AI / ML: TensorFlow, PyTorch, LLM, OpenAI, ML, AI

Mobile: iOS, Android, React Native, Flutter
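
Keyword detection of this kind can be sketched with pre-compiled regular expressions. The example below is an illustration only: the keyword subset, the boundary rules, and the choice to match "Go" case-sensitively are assumptions, not the actor's actual patterns.

```python
import re

# Hypothetical subset of the keyword list above.
TECH_KEYWORDS = ["Python", "Go", "Golang", "Rust", "C++", "C#",
                 "Node.js", "React", "PostgreSQL", "AWS"]

# Assumption: short words that double as common English words are matched
# case-sensitively to reduce false positives ("go" vs "Go").
CASE_SENSITIVE = {"Go"}


def _compile(keyword: str) -> re.Pattern:
    flags = 0 if keyword in CASE_SENSITIVE else re.IGNORECASE
    # Lookarounds stand in for \b so that names ending in symbols
    # ("C++", "C#") still get a boundary check on both sides.
    return re.compile(r"(?<![\w+#])" + re.escape(keyword) + r"(?![\w+#])", flags)


PATTERNS = [(kw, _compile(kw)) for kw in TECH_KEYWORDS]


def detect_tech(text: str) -> list:
    """Return the keywords found in text, in keyword-list order."""
    return [kw for kw, pattern in PATTERNS if pattern.search(text)]
```

Even with these boundaries, a capitalized "Go" at the start of an ordinary sentence would still match, which is one reason results from short keywords deserve a manual spot check.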

Input Configuration

`mode` · string · default: `"hiring"`

Selects what the actor scrapes. Three modes are available:

| Mode | Value | Description |
| --- | --- | --- |
| Who is Hiring? threads | `"hiring"` | Scrapes recent monthly "Who is Hiring?" threads |
| Keyword search | `"search"` | Searches all HN posts and comments by keyword |
| Specific thread | `"thread"` | Scrapes all top-level comments from specific thread IDs |

`months` · integer · default: `1` · range: `1–24`

(Used when mode is "hiring")

How many recent "Who is Hiring?" threads to scrape. Each thread covers one calendar month and typically contains 400–900 job postings.

| Value | What you get |
| --- | --- |
| `1` | Latest month only (~400–900 jobs) |
| `3` | Last quarter of hiring data |
| `6` | Half-year trend dataset |
| `12` | Full year of HN job data |
| `24` | Two-year historical archive |

`threadIds` · array of strings · default: `[]`

(Used when mode is "thread")

List of specific HN thread item IDs to scrape. All top-level comments from each thread are parsed.

How to find the ID: Open any HN thread in your browser. The ID is the number in the URL:

https://news.ycombinator.com/item?id=43574497
                                         ↑
                                    threadId = "43574497"
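
If you collect thread URLs rather than IDs, the ID can be pulled out programmatically. This is a small sketch using the Python standard library; `thread_id_from_url` is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def thread_id_from_url(url: str) -> str:
    """Extract the numeric item ID from an HN item URL."""
    ids = parse_qs(urlparse(url).query).get("id", [])
    if len(ids) != 1 or not ids[0].isdigit():
        raise ValueError(f"No single numeric id parameter in {url!r}")
    # threadIds expects strings, so the ID is returned unconverted
    return ids[0]
```

The returned strings can be placed directly into the `threadIds` array.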

Useful for threads other than "Who is Hiring?", such as:

"Ask HN: Who wants to be hired?" — for job seekers posting their own profiles
"Ask HN: Freelancer? Seeking Freelancer?" — for freelance contracts
Any custom hiring thread from a specific community

`searchQuery` · string · default: `""`

(Used when mode is "search")

A keyword or phrase to search across all HN posts and comments via the Algolia HN index. Returns any matching HN content — not limited to hiring threads.

Examples:

"remote rust engineer" — find Rust job mentions anywhere on HN
"founding engineer Series A" — find early-stage company posts
"LLM inference hiring" — find AI infrastructure hiring discussions
"YC W25 hiring" — find YC Winter 2025 batch companies hiring

Results in search mode include fullText but often have fewer parsed structured fields (company, role, location), since posts outside hiring threads don't follow the standard comment format.

`filterKeywords` · array of strings · default: `[]`

Only keep postings whose full text contains at least one of these keywords. Case-insensitive. Applied after parsing, before saving.

Examples:

["Python", "Go", "Rust"] — only postings mentioning these languages
["San Francisco", "NYC", "Austin"] — only specific cities
["Series A", "Series B", "YC"] — only funded or accelerator-backed companies
["founding", "founding engineer"] — early-stage opportunities only

Leave empty to include all postings.

`excludeKeywords` · array of strings · default: `[]`

Remove postings whose full text contains any of these keywords. Case-insensitive.

Examples:

["cleared", "security clearance"] — exclude defense/government roles
["no remote", "onsite only", "in-office"] — exclude non-remote roles
["blockchain", "web3", "crypto"] — exclude crypto roles
["10+ years", "15+ years"] — exclude very senior requirements
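
The combined effect of `filterKeywords` and `excludeKeywords` can be sketched as follows. This is an illustration of the documented semantics (case-insensitive, include on any match, exclude on any match); plain substring matching is an assumption, and `passes_keyword_filters` is a hypothetical name.

```python
def passes_keyword_filters(full_text, filter_keywords=(), exclude_keywords=()):
    """Return True if a posting survives both keyword filters."""
    haystack = full_text.lower()
    # An empty include list keeps everything; otherwise one match is enough.
    if filter_keywords and not any(k.lower() in haystack for k in filter_keywords):
        return False
    # Any single excluded keyword removes the posting.
    return not any(k.lower() in haystack for k in exclude_keywords)
```

Under substring matching, a short keyword such as "Go" would also match "Google", so longer or more specific keywords tend to give cleaner results.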

`remoteOnly` · boolean · default: `false`

When enabled, only returns postings that explicitly mention remote work in any of these forms: REMOTE, FULL REMOTE, FULLY REMOTE, 100% REMOTE, REMOTE OK, REMOTE FRIENDLY, REMOTE FIRST, HYBRID.

Postings with only ONSITE or IN-OFFICE mentions are excluded.
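
A rough sketch of how such a classification and the `remoteOnly` check might look is shown below. The patterns and the precedence order (Hybrid, then Remote, then Onsite) are assumptions made for illustration; the actor's real rules may differ.

```python
import re

HYBRID_RE = re.compile(r"\bhybrid\b", re.IGNORECASE)
REMOTE_RE = re.compile(r"\bremote\b", re.IGNORECASE)
ONSITE_RE = re.compile(r"\b(on-?site|in-office)\b", re.IGNORECASE)


def classify_remote(text: str):
    """Return "Hybrid", "Remote", "Onsite", or None (assumed precedence)."""
    if HYBRID_RE.search(text):
        return "Hybrid"
    if REMOTE_RE.search(text):
        return "Remote"
    if ONSITE_RE.search(text):
        return "Onsite"
    return None


def passes_remote_only(text: str) -> bool:
    """Mirror the documented toggle: remote and hybrid mentions pass."""
    return classify_remote(text) in ("Remote", "Hybrid")
```

A keyword approach like this would also classify "no remote" as Remote, which is why pairing `remoteOnly` with `excludeKeywords` such as "no remote" can be useful.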

`maxResults` · integer · default: `0` (unlimited)

Maximum number of job records to save across all threads. Set to 0 for unlimited. A single month's thread typically yields 400–900 jobs after filtering out non-job comments.

Usage Examples

Example 1 — Latest "Who is Hiring?" thread, all jobs

{
  "mode": "hiring",
  "months": 1,
  "maxResults": 0,
  "remoteOnly": false
}

Returns every parsed job posting from the current month's thread (~400–900 results).

Example 2 — Remote Python or Go jobs from the last 3 months

{
  "mode": "hiring",
  "months": 3,
  "filterKeywords": ["Python", "Go", "Golang"],
  "remoteOnly": true,
  "maxResults": 200
}

Example 3 — Six-month trend dataset for salary research

{
  "mode": "hiring",
  "months": 6,
  "maxResults": 0
}

Export to CSV and analyze salary and techStack columns for compensation benchmarking across the HN startup ecosystem.

Example 4 — Specific "Who wants to be hired?" thread (candidate sourcing)

{
  "mode": "thread",
  "threadIds": ["43574497", "41822152"],
  "maxResults": 500
}

Scrapes all top-level comments — useful for finding candidates from "Who wants to be hired?" threads.

Example 5 — Full-text keyword search across all of HN

{
  "mode": "search",
  "searchQuery": "founding engineer Series A remote",
  "maxResults": 100
}

Example 6 — Curated frontend jobs, exclude noise

{
  "mode": "hiring",
  "months": 1,
  "filterKeywords": ["React", "TypeScript", "Next.js"],
  "excludeKeywords": ["blockchain", "web3", "crypto", "no remote", "onsite only"],
  "remoteOnly": true,
  "maxResults": 50
}

How It Works

Mode: `hiring`

Step 1 — Discover threads
Queries the Algolia HN API for "Ask HN: Who is Hiring?" threads by title pattern, sorted by date. The most recent N threads (per months) are selected.
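
As a rough illustration of this discovery step, the sketch below builds a date-sorted Algolia query and picks the newest matching threads from the returned hits. Restricting the query to stories by the `whoishiring` account is an assumption added here to narrow results; the function names are hypothetical.

```python
import re
from urllib.parse import urlencode

SEARCH_BY_DATE_URL = "https://hn.algolia.com/api/v1/search_by_date"
TITLE_RE = re.compile(r"^Ask HN: Who is hiring\?", re.IGNORECASE)


def discovery_url(hits_per_page: int = 50) -> str:
    """Build a date-sorted query for stories posted by the whoishiring account."""
    params = {"tags": "story,author_whoishiring", "hitsPerPage": hits_per_page}
    return f"{SEARCH_BY_DATE_URL}?{urlencode(params)}"


def pick_hiring_threads(hits: list, months: int) -> list:
    """Keep only 'Who is hiring?' titles and return the newest N thread IDs."""
    matching = [h for h in hits if TITLE_RE.match(h.get("title") or "")]
    matching.sort(key=lambda h: h["created_at_i"], reverse=True)
    return [h["objectID"] for h in matching[:months]]
```

The title check matters because the same account also posts the "Who wants to be hired?" and freelancer threads each month.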

Step 2 — Fetch thread comments
Each thread is fetched by item ID, returning all top-level comments.

GET https://hn.algolia.com/api/v1/items/{threadId}
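
The items endpoint returns the story with its comments nested under `children`, so the top-level comments are the story's direct children. A minimal sketch, assuming deleted comments arrive with an empty `text` and should be skipped (the helper names are hypothetical):

```python
import json
from urllib.request import urlopen

ITEMS_URL = "https://hn.algolia.com/api/v1/items/{thread_id}"


def fetch_item(thread_id: str) -> dict:
    """Fetch one thread, including its nested comment tree."""
    with urlopen(ITEMS_URL.format(thread_id=thread_id), timeout=30) as resp:
        return json.load(resp)


def top_level_comments(item: dict) -> list:
    """Return the story's direct children that still have text."""
    # Replies live one level deeper, under each child's own "children" key.
    return [c for c in item.get("children", []) if c.get("text")]
```

For example, `top_level_comments(fetch_item("43574497"))` would return the candidate job postings for that thread.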

Step 3 — Parse each comment
For every top-level comment:

HTML tags are stripped and entities decoded to clean plain text
isRealJobPosting() heuristics reject non-job comments (general replies, congratulations, bare links, very short texts)
The first line is parsed for Company | Role or Company / Role format
Full text is regex-scanned for location, remote policy, salary, tech stack, visa, apply URL, and email
filterKeywords, excludeKeywords, and remoteOnly filters are applied
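
The cleaning and first-line parsing steps above can be sketched roughly as follows. This is not the actor's code: the tag handling is deliberately simple, and requiring spaces around a `/` separator is an assumption made to avoid splitting terms like "CI/CD".

```python
import html
import re


def to_plain_text(comment_html: str) -> str:
    """Turn HN comment HTML into plain text with paragraph breaks."""
    text = re.sub(r"<p>", "\n\n", comment_html, flags=re.IGNORECASE)
    text = re.sub(r"<[^>]+>", "", text)  # drop remaining tags
    # Decode entities last so an escaped "&lt;" is never mistaken for a tag.
    return html.unescape(text).strip()


def parse_first_line(text: str):
    """Return (company, role) from a 'Company | Role | ...' header line."""
    first_line = text.split("\n", 1)[0]
    if "|" in first_line:
        parts = first_line.split("|")
    elif " / " in first_line:
        parts = first_line.split(" / ")
    else:
        return None, None
    parts = [p.strip() for p in parts if p.strip()]
    company = parts[0] if parts else None
    role = parts[1] if len(parts) > 1 else None
    return company, role
```

Postings without a recognizable separator fall through to `(None, None)`, matching the null `company` and `role` values described in the output notes.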

Step 4 — Save
Passing records are pushed to the dataset. A 500 ms courtesy delay is added between threads.

Mode: `thread`

Same as hiring mode but skips thread discovery — fetches the exact IDs you provide. Works for any Ask HN thread.

Mode: `search`

Queries the Algolia HN search API with your keyword, paginating through results (50 per page) until maxResults is reached or no more pages exist. Searches across all of HN history.
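
The pagination loop can be sketched as a generator that stops at `maxResults` or at the last page. The example injects the page-fetching function so the loop can be exercised without network access; `search_url` and `paginate` are hypothetical names, and the loop relies on the `hits` and `nbPages` fields of the Algolia response.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

PAGE_SIZE = 50


def search_url(query: str, page: int) -> str:
    """Build the search URL for one result page."""
    params = {"query": query, "page": page, "hitsPerPage": PAGE_SIZE}
    return "https://hn.algolia.com/api/v1/search?" + urlencode(params)


def paginate(fetch_page: Callable[[int], dict], max_results: int = 0) -> Iterator[dict]:
    """Yield hits page by page; max_results=0 means unlimited."""
    yielded = 0
    page = 0
    while True:
        data = fetch_page(page)
        for hit in data.get("hits", []):
            yield hit
            yielded += 1
            if max_results and yielded >= max_results:
                return
        page += 1
        if page >= data.get("nbPages", 0):
            return
```

In real use, `fetch_page` would request `search_url(query, page)` and decode the JSON body, ideally with a short delay between pages.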

Hiring Mode Flow:

Input (months=N)
      │
      ▼
Discover N "Who is Hiring?" threads via Algolia
      │
      ▼  (for each thread)
Fetch all top-level comments
      │
      ▼  (for each comment)
Strip HTML → isRealJobPosting? → Parse fields
      │
      ▼
Apply filters (keywords, remote, maxResults)
      │
      ▼
Push to Dataset

Job Posting Format on HN

The "Who is Hiring?" community follows an informal but consistent format:

Company Name | Role | Location | Remote | Salary
[Optional second line with more details]

Description paragraph...

Tech stack, requirements, what you'll work on...

Apply: https://company.io/jobs
Contact: hiring@company.io

First-line separators can be | (pipe) or / (slash). The actor parses both.

Comments rejected as non-job-postings:

Shorter than 30 characters
First line longer than 200 characters (likely a paragraph, not a header)
Starts with a bare URL
Matches generic reply phrases: "Congratulations", "Good luck", "Does anyone know...", "Interesting thread", etc.
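
The rejection rules listed above translate naturally into a small predicate. The sketch below follows the documented thresholds; the exact phrase list and the Python name `is_real_job_posting` are assumptions, and the actor's own `isRealJobPosting()` may apply additional checks.

```python
import re

# Assumed phrase list, based on the examples given above.
GENERIC_REPLY_RE = re.compile(
    r"^(congratulations|good luck|does anyone know|interesting thread)",
    re.IGNORECASE,
)
URL_START_RE = re.compile(r"^https?://", re.IGNORECASE)


def is_real_job_posting(text: str) -> bool:
    """Apply the four documented rejection rules to a plain-text comment."""
    text = text.strip()
    if len(text) < 30:
        return False  # too short to be a posting
    first_line = text.split("\n", 1)[0]
    if len(first_line) > 200:
        return False  # likely a paragraph, not a header line
    if URL_START_RE.match(text):
        return False  # starts with a bare URL
    if GENERIC_REPLY_RE.match(text):
        return False  # generic reply phrase
    return True
```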

Data Quality Notes

Company & Role: Extracted from the first line using the | or / separator convention. Postings that skip this format may have null for these fields; fullText always contains the complete raw posting.

Salary: Only captures explicitly stated salary figures. Many postings omit salary. null does not mean the role is low-paying.

Tech Stack: Detected via regex on 40+ known keywords. Technologies mentioned in unusual abbreviations or non-standard spellings may not be captured.

Remote policy: Classified from natural language keywords. Nuanced mentions ("we're a distributed team") may not be classified — use filterKeywords: ["remote"] for broader matching.

Location: Only detects a pre-defined list of major city names. Unusual city names or country-only mentions may not be captured.

Performance

| Scenario | Threads | Expected Jobs | Est. Time |
| --- | --- | --- | --- |
| 1 month, no filters | 1 | 400–900 | < 30 sec |
| 3 months, no filters | 3 | 1,200–2,700 | ~1–2 min |
| 12 months, no filters | 12 | 5,000–10,000 | ~5–10 min |
| 24 months, no filters | 24 | 10,000–20,000 | ~10–20 min |
| Search mode (100 results) | — | 100 | < 30 sec |

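Cost: Negligible compute. The actor uses only native fetch with the Apify SDK, with no browser, Playwright, or Cheerio, so expect under $0.01 of platform compute per full monthly thread scrape. The listed per-result price (from $2.00 / 1,000 results) is a separate figure.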

Export Formats

Download your results from the Apify Dataset in:

JSON — full structured output, techStack as a native array
CSV — flat table; techStack serialized as comma-joined string, ready for Excel or Google Sheets
Excel (.xlsx) — native spreadsheet for sharing with non-technical stakeholders
JSONL — one record per line for streaming into Notion, Airtable, job alert bots, or custom pipelines
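
A JSONL export can be consumed line by line without loading the whole file into memory. A minimal sketch, assuming the export is UTF-8 text with one JSON object per line (`iter_jsonl` is a hypothetical helper):

```python
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Passing an open file object works directly, for example `with open("jobs.jsonl", encoding="utf-8") as f: records = list(iter_jsonl(f))`, where `jobs.jsonl` is a placeholder filename.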

Tips & Recipes

Build a personal job alert:
Schedule this actor daily with mode: "hiring", months: 1, and your filterKeywords. Export to Airtable or a Google Sheet and watch matching jobs appear automatically.

Salary benchmarking:
Run months: 12 with no filters. Export to CSV. Filter salary != null and pivot by techStack. You now have a year of self-reported salary data from actual hiring managers — not aggregated survey estimates.

Track a company's hiring history:
Use mode: "search" with the company name as searchQuery. Returns mentions of that company across all of HN, including years of hiring threads.

Source candidates:
Use mode: "thread" with the ID of the latest "Who wants to be hired?" thread. Same parsing logic extracts structured profiles from candidates advertising themselves.

Identify trending technologies:
Run months: 6, export to CSV, and count frequency in the techStack column. Reveals what the HN startup ecosystem is actually building with right now — a more reliable signal than survey reports.
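
The same count can be done in a few lines of code on the JSON output. This sketch assumes records are dicts with a `techStack` list or null, as in the output schema; `tech_frequency` is a hypothetical name.

```python
from collections import Counter


def tech_frequency(records) -> Counter:
    """Count how many postings mention each technology."""
    counts = Counter()
    for record in records:
        # set() guards against a keyword being listed twice in one posting
        counts.update(set(record.get("techStack") or []))
    return counts
```

Calling `tech_frequency(records).most_common(10)` then gives the ten most frequently mentioned technologies in the scraped period.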

Exclude noise efficiently:
Combine excludeKeywords: ["blockchain", "web3", "NFT"] with filterKeywords: ["Python", "Go"] to get a focused, high-signal list without manual review.

Limitations

Free-form text parsing. HN job postings follow a convention, not a strict schema. Postings that don't use the Company | Role first-line format will have null for company and role. The fullText field always contains the full original text regardless.
Salary not normalized. Salary is extracted verbatim. $180k and $180,000 are stored as different strings. Normalize in post-processing if needed.
No cross-month deduplication. Companies that post in multiple consecutive months appear as separate records. To collapse repeats across months, deduplicate on a composite key such as company + role; a key that includes threadMonth only removes duplicates within a single month.
Search mode returns less structured data. Posts outside hiring threads don't follow the Company | Role convention, so company, role, location, and other parsed fields are often null in mode: "search" results.
Top-level comments only. The actor only processes top-level comments (one per job posting). Replies to job comments (e.g. "Is this role still open?") are not included.
Location detection is city-list based. Only a pre-defined set of major cities is matched. Uncommon city names or country-only mentions are not captured in the location field.
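
For the salary limitation in particular, post-processing can map the raw strings to numeric bounds. The following is a rough sketch that only handles dollar-prefixed amounts with an optional "k" suffix; currency handling, unprefixed numbers such as the second half of "$120k-160k", and the `salary_bounds` name are all outside what the actor provides.

```python
import re

AMOUNT_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(k)?", re.IGNORECASE)


def salary_bounds(raw: str):
    """Return (low, high) in whole dollars, or None if no amount is found."""
    amounts = []
    for number, k_suffix in AMOUNT_RE.findall(raw):
        value = float(number.replace(",", ""))
        if k_suffix:
            value *= 1000
        amounts.append(int(value))
    if not amounts:
        return None
    return min(amounts), max(amounts)
```

With this, "$180k" and "$180,000" both map to the same pair, so they can be grouped or compared directly.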

Frequently Asked Questions

Q: How often is the "Who is Hiring?" thread posted?
On the first weekday of every month, posted by the whoishiring account. It is one of the most consistent monthly events on the platform, running continuously for over a decade.

Q: How many job postings are in a typical thread?
Between 400 and 900 top-level comments, of which ~80–90% are genuine job postings after filtering out general replies and off-topic comments.

Q: Can I scrape the "Who wants to be hired?" thread too?
Yes — use mode: "thread" and provide that thread's item ID. The same parsing logic applies, extracting company, role, location, and tech stack from each commenter's profile post.

Q: Is this actor free to run?
The actor itself uses minimal Apify compute (under $0.01 per month of data); the per-result price listed under Pricing (from $2.00 / 1,000 results) is a separate figure. The HN Algolia API is free and requires no API key or registration.

Q: Do I need a proxy?
No. The Algolia HN API is public, rate-limit-generous, and does not require proxy usage for normal scraping volumes.

Q: Why are company and role null for some records?
Some commenters don't follow the standard Company | Role format and write a paragraph instead of a structured first line. The fullText field always contains the complete posting.

Q: Can I scrape older threads from 2020, 2021, or earlier?
Yes — use mode: "thread" with the specific thread IDs from those years. Find old thread IDs via HN search or the Algolia API. The months parameter only looks back from the current date via date-sorted discovery.

Q: What's the difference between filterKeywords and searchQuery?
searchQuery is used only in mode: "search" and queries the Algolia index server-side before any data is fetched. filterKeywords is a client-side filter applied after fetching and parsing, and works in all three modes on the fullText of already-downloaded comments.

Q: Can I run this on a schedule for continuous monitoring?
Yes — use the Apify Scheduler to run daily or weekly. With months: 1 and your keyword filters, you get a fresh filtered dataset of each month's new postings automatically.

Technical Details

| Property | Value |
| --- | --- |
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 |
| HTTP client | Native `fetch` |
| Data source | Algolia HN API (`hn.algolia.com/api/v1`) |
| Proxy required | ❌ No |
| API key required | ❌ No |
| Browser required | ❌ No |
| Dependencies | `apify` only |
| Tech keywords detected | 40+ |
| Delay between threads | 500 ms |
| Delay between search pages | 300 ms |
| Max redirect hops | N/A (JSON API) |

Changelog

2026-06-01 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

v1.0

Initial release
Three modes: hiring (monthly threads), thread (by ID), search (full-text Algolia)
Structured field extraction: company, role, location, remote, salary, tech stack, visa, apply URL, email
40+ tech keyword detection with pre-compiled regex
Remote policy classification: Remote / Hybrid / Onsite
Salary range extraction (various formats)
Visa sponsorship detection (H1B, sponsorship available/not available)
filterKeywords, excludeKeywords, and remoteOnly client-side filters
Up to 24 months of historical thread scraping
No proxy, no API key, no browser required

Support

If you encounter missing fields, unexpected empty results, or parsing issues, please open a support ticket via the Apify Console. Include the thread ID or search query, your full input configuration, and the actor run ID to help diagnose the issue quickly.

Changelog

2026-05-20 — Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.
