Hacker News Who Is Hiring Scraper – Jobs, Salary & Email

Pricing

from $2.00 / 1,000 results

Scrape structured job listings from Hacker News 'Who is Hiring?' monthly threads. Extracts company, role, location, salary, remote policy and tech stack — no AI, no API key, no proxy needed.

Developer

Logiover

Maintained by Community

Hacker News Who Is Hiring Scraper — Jobs, Salary & Tech Stack Data

Scrape structured job listings from Hacker News "Ask HN: Who is Hiring?" monthly threads. Extracts company name, role, location, salary, remote policy, tech stack, visa sponsorship, apply URL, and contact email — automatically, from any month going back years. No AI, no API key, no proxy required.


What Is This Actor?

Every month, Hacker News hosts one of the internet's most trusted job boards: the "Ask HN: Who is Hiring?" thread. Thousands of startups, scale-ups, and tech companies post jobs there directly — no recruiter markup, no job board fees, straight from the hiring team. Each thread typically contains 400–900 real job postings.

This actor reads those threads via the Algolia HN API, parses every job comment into structured fields, and outputs a clean dataset ready for analysis, job alerts, or CRM import.

Built for:

  • 👩‍💻 Job seekers — filter thousands of HN listings by tech stack, remote policy, or salary without reading every comment
  • 📊 Recruiters & HR teams — monitor the HN talent market and track competing companies' hiring activity
  • 🔬 Researchers & analysts — study tech hiring trends, salary ranges, and in-demand skills over time
  • 🤖 Pipeline builders — feed structured job data into Notion, Airtable, or a custom job alert bot
  • 📈 Investors & founders — understand who is scaling and what roles are in demand across the startup ecosystem
  • 🗃️ Data engineers — build a historical job market dataset from months or years of HN hiring threads

Features

  • Three scrape modes — monthly hiring threads, specific thread by ID, or full-text keyword search across all of HN
  • Structured field parsing — extracts company, role, location, salary, remote policy, tech stack, visa info, apply URL, and contact email from free-form comment text
  • 40+ tech stack keywords detected — Python, Go, Rust, React, Kubernetes, PostgreSQL, AWS, LLMs, and more
  • Remote policy classification — distinguishes Full Remote, Hybrid, and Onsite from natural language mentions
  • Salary range extraction — detects $120k–$160k, $200k/yr, and similar formats
  • Visa sponsorship detection — flags H1B mentions, "visa sponsorship available", and "no visa sponsorship"
  • Keyword include/exclude filters — narrow results to exactly the roles you want
  • Remote-only filter — one toggle to return only remote-friendly listings
  • Multi-month history — scrape up to 24 months of threads in a single run
  • No API key, no proxy, no login — uses the public Algolia HN API
  • Minimal dependencies — only the Apify SDK; no Playwright, no Cheerio, no browser

Output Data

Each record represents one parsed job posting (top-level comment) from a hiring thread.

| Field | Type | Description |
| --- | --- | --- |
| commentId | string | HN item ID of the comment |
| threadId | string | HN item ID of the parent thread |
| threadTitle | string | Full title of the Ask HN thread |
| threadMonth | string | Month and year of the thread (e.g. "May 2025") |
| author | string | HN username of the commenter |
| company | string or null | Company name parsed from the first line of the posting |
| role | string or null | Job title or role parsed from the first line |
| location | string or null | Office location(s) detected in the text |
| remote | string or null | "Remote", "Hybrid", or "Onsite" |
| salary | string or null | Salary range or figure if mentioned (raw string) |
| techStack | array or null | List of detected technologies and languages |
| visa | string or null | Visa sponsorship status if mentioned |
| applyUrl | string or null | First URL found in the posting (apply link or company site) |
| email | string or null | Contact email address if present |
| fullText | string | Complete plain-text content of the job posting |
| postedAt | string | ISO 8601 timestamp of when the comment was posted |
| hnUrl | string | Direct link to the comment on Hacker News |
| scrapedAt | string | ISO 8601 timestamp of when this record was scraped |

Sample Output Record

{
  "commentId": "43812345",
  "threadId": "43800001",
  "threadTitle": "Ask HN: Who is Hiring? (May 2025)",
  "threadMonth": "May 2025",
  "author": "jane_at_acme",
  "company": "Acme AI",
  "role": "Senior Backend Engineer",
  "location": "San Francisco, Remote",
  "remote": "Remote",
  "salary": "$160k–$200k",
  "techStack": ["Python", "Go", "PostgreSQL", "Kubernetes", "AWS"],
  "visa": "Visa sponsorship available",
  "applyUrl": "https://acmeai.io/careers",
  "email": "jobs@acmeai.io",
  "fullText": "Acme AI | Senior Backend Engineer | Remote | $160k–$200k\n\nWe're building the next generation of AI infrastructure...",
  "postedAt": "2025-05-01T10:22:05.000Z",
  "hnUrl": "https://news.ycombinator.com/item?id=43812345",
  "scrapedAt": "2025-05-15T14:00:00.000Z"
}

Detected Tech Stack Keywords

The actor scans each posting for 40+ technology keywords across languages, frameworks, databases, cloud platforms, and AI/ML tools:

Languages: Python, JavaScript, TypeScript, Go / Golang, Rust, Java, Kotlin, Swift, C++, C#, Ruby, PHP, Scala, Elixir, Clojure, Haskell

Frontend: React, Vue, Angular, Next.js, Svelte

Backend: Node.js, Express, Django, FastAPI, Flask, Rails, Spring, Laravel

Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Cassandra, DynamoDB

Cloud & DevOps: AWS, GCP, Azure, Kubernetes, Docker, Terraform, Ansible

APIs & Messaging: GraphQL, REST, gRPC, Kafka, RabbitMQ, Celery

AI / ML: TensorFlow, PyTorch, LLM, OpenAI, ML, AI

Mobile: iOS, Android, React Native, Flutter
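The keyword scan above could be approximated as follows. This is a minimal sketch, not the actor's actual matcher: the keyword subset, the alias spellings, and the choice to match "Go" case-sensitively are all assumptions made for illustration.

```python
import re

# Hypothetical subset of the keyword table: canonical name -> spellings to look for.
TECH_KEYWORDS = {
    "Python": ["Python"],
    "Go": ["Go", "Golang"],
    "C++": ["C++"],
    "Node.js": ["Node.js", "NodeJS"],
    "PostgreSQL": ["PostgreSQL", "Postgres"],
    "AWS": ["AWS"],
}

# Assumption: spellings that double as ordinary English words stay case-sensitive.
CASE_SENSITIVE = {"Go"}


def compile_keyword(aliases):
    """Build one pattern per keyword; longer aliases are tried first."""
    parts = [
        re.escape(a) if a in CASE_SENSITIVE else f"(?i:{re.escape(a)})"
        for a in sorted(aliases, key=len, reverse=True)
    ]
    body = "|".join(parts)
    # \b does not work next to "+" or "#", so use explicit lookarounds instead.
    return re.compile(r"(?<![\w+#.])(?:" + body + r")(?![\w+#])")


PATTERNS = {name: compile_keyword(aliases) for name, aliases in TECH_KEYWORDS.items()}


def detect_tech_stack(text):
    """Return the canonical name of every keyword found, in table order."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(text)]
```

Compiling the patterns once up front keeps the per-comment cost low, which matters when a run covers thousands of postings.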


Input Configuration

mode · string · default: "hiring"

Selects what the actor scrapes. Three modes are available:

| Mode | Value | Description |
| --- | --- | --- |
| Who is Hiring? threads | "hiring" | Scrapes recent monthly "Who is Hiring?" threads |
| Keyword search | "search" | Searches all HN posts and comments by keyword |
| Specific thread | "thread" | Scrapes all top-level comments from specific thread IDs |

months · integer · default: 1 · range: 1–24

(Used when mode is "hiring")

How many recent "Who is Hiring?" threads to scrape. Each thread covers one calendar month and typically contains 400–900 job postings.

| Value | What you get |
| --- | --- |
| 1 | Latest month only (~400–900 jobs) |
| 3 | Last quarter of hiring data |
| 6 | Half-year trend dataset |
| 12 | Full year of HN job data |
| 24 | Two-year historical archive |

threadIds · array of strings · default: []

(Used when mode is "thread")

List of specific HN thread item IDs to scrape. All top-level comments from each thread are parsed.

How to find the ID: Open any HN thread in your browser. The ID is the number in the URL:

https://news.ycombinator.com/item?id=43574497
threadId = "43574497"
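If you collect thread URLs rather than bare IDs, a small helper can pull the ID out before you build the input. This is a sketch; the function name `thread_id_from_url` is hypothetical and not part of the actor.

```python
from urllib.parse import parse_qs, urlparse


def thread_id_from_url(url):
    """Extract the numeric item ID from a Hacker News item URL."""
    ids = parse_qs(urlparse(url).query).get("id", [])
    if not ids or not ids[0].isdigit():
        raise ValueError(f"no numeric HN item id in {url!r}")
    return ids[0]


# Build the threadIds input from a list of copied URLs.
thread_ids = [thread_id_from_url("https://news.ycombinator.com/item?id=43574497")]
```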

Useful for threads other than "Who is Hiring?", such as:

  • "Ask HN: Who wants to be hired?" — for job seekers posting their own profiles
  • "Ask HN: Freelancer? Seeking Freelancer?" — for freelance contracts
  • Any custom hiring thread from a specific community

searchQuery · string · default: ""

(Used when mode is "search")

A keyword or phrase to search across all HN posts and comments via the Algolia HN index. Returns any matching HN content — not limited to hiring threads.

Examples:

  • "remote rust engineer" — find Rust job mentions anywhere on HN
  • "founding engineer Series A" — find early-stage company posts
  • "LLM inference hiring" — find AI infrastructure hiring discussions
  • "YC W25 hiring" — find YC Winter 2025 batch companies hiring

Results in search mode include fullText but often have fewer parsed structured fields (company, role, location), since posts outside hiring threads don't follow the standard comment format.


filterKeywords · array of strings · default: []

Only keep postings whose full text contains at least one of these keywords. Case-insensitive. Applied after parsing, before saving.

Examples:

  • ["Python", "Go", "Rust"] — only postings mentioning these languages
  • ["San Francisco", "NYC", "Austin"] — only specific cities
  • ["Series A", "Series B", "YC"] — only funded or accelerator-backed companies
  • ["founding", "founding engineer"] — early-stage opportunities only

Leave empty to include all postings.


excludeKeywords · array of strings · default: []

Remove postings whose full text contains any of these keywords. Case-insensitive.

Examples:

  • ["cleared", "security clearance"] — exclude defense/government roles
  • ["no remote", "onsite only", "in-office"] — exclude non-remote roles
  • ["blockchain", "web3", "crypto"] — exclude crypto roles
  • ["10+ years", "15+ years"] — exclude very senior requirements
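The include and exclude rules described above amount to two case-insensitive substring checks. The sketch below assumes plain substring matching, so "Go" would also match "Google"; the actor's real matching rule may differ.

```python
def passes_keyword_filters(full_text, filter_keywords, exclude_keywords):
    """Keep a posting if it matches any include keyword and no exclude keyword."""
    text = full_text.lower()
    # An empty include list means "keep everything".
    if filter_keywords and not any(k.lower() in text for k in filter_keywords):
        return False
    return not any(k.lower() in text for k in exclude_keywords)
```

For example, `passes_keyword_filters(text, ["Python"], ["web3"])` keeps Python postings unless they also mention web3.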

remoteOnly · boolean · default: false

When enabled, only returns postings that explicitly mention remote work in any of these forms: REMOTE, FULL REMOTE, FULLY REMOTE, 100% REMOTE, REMOTE OK, REMOTE FRIENDLY, REMOTE FIRST, HYBRID.

Postings with only ONSITE or IN-OFFICE mentions are excluded.
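One way to picture the remoteOnly rule is as a small classifier followed by a filter. This is a sketch under assumptions: the patterns and the priority order (Remote before Hybrid before Onsite) are guesses, and a phrase such as "no remote" would still count as a remote mention here.

```python
import re

REMOTE_RE = re.compile(r"\bremote\b", re.IGNORECASE)
HYBRID_RE = re.compile(r"\bhybrid\b", re.IGNORECASE)
ONSITE_RE = re.compile(r"\b(?:on-?site|in-office)\b", re.IGNORECASE)


def classify_remote(text):
    """Return "Remote", "Hybrid", "Onsite", or None if nothing is mentioned."""
    if REMOTE_RE.search(text):
        return "Remote"
    if HYBRID_RE.search(text):
        return "Hybrid"
    if ONSITE_RE.search(text):
        return "Onsite"
    return None


def passes_remote_only(text):
    """Mirror the documented rule: remote and hybrid mentions pass, the rest do not."""
    return classify_remote(text) in ("Remote", "Hybrid")
```

Every phrase in the list above ("FULLY REMOTE", "REMOTE OK", and so on) contains the word "remote", which is why a single word-boundary pattern covers them.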


maxResults · integer · default: 0 (unlimited)

Maximum number of job records to save across all threads. Set to 0 for unlimited. A single month's thread typically yields 400–900 jobs after filtering out non-job comments.
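If you generate inputs programmatically, it can help to apply the documented defaults and ranges before starting a run. The sketch below is a client-side check, not the actor's own validation; requiring threadIds in thread mode and a non-empty searchQuery in search mode are assumptions.

```python
VALID_MODES = ("hiring", "search", "thread")


def default_input():
    """Documented defaults, returned as a fresh dict on every call."""
    return {
        "mode": "hiring",
        "months": 1,
        "threadIds": [],
        "searchQuery": "",
        "filterKeywords": [],
        "excludeKeywords": [],
        "remoteOnly": False,
        "maxResults": 0,
    }


def normalize_input(raw):
    """Merge user input over the defaults and check the documented ranges."""
    cfg = {**default_input(), **raw}
    if cfg["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}")
    months = cfg["months"]
    if isinstance(months, bool) or not isinstance(months, int) or not 1 <= months <= 24:
        raise ValueError("months must be an integer from 1 to 24")
    max_results = cfg["maxResults"]
    if isinstance(max_results, bool) or not isinstance(max_results, int) or max_results < 0:
        raise ValueError("maxResults must be 0 (unlimited) or a positive integer")
    # Assumption: these two modes are not useful without their main parameter.
    if cfg["mode"] == "thread" and not cfg["threadIds"]:
        raise ValueError("thread mode needs at least one thread ID")
    if cfg["mode"] == "search" and not cfg["searchQuery"].strip():
        raise ValueError("search mode needs a searchQuery")
    return cfg
```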


Usage Examples

Example 1 — Latest "Who is Hiring?" thread, all jobs

{
  "mode": "hiring",
  "months": 1,
  "maxResults": 0,
  "remoteOnly": false
}

Returns every parsed job posting from the current month's thread (~400–900 results).


Example 2 — Remote Python or Go jobs from the last 3 months

{
  "mode": "hiring",
  "months": 3,
  "filterKeywords": ["Python", "Go", "Golang"],
  "remoteOnly": true,
  "maxResults": 200
}

Example 3 — Six-month trend dataset for salary research

{
  "mode": "hiring",
  "months": 6,
  "maxResults": 0
}

Export to CSV and analyze salary and techStack columns for compensation benchmarking across the HN startup ecosystem.


Example 4 — Specific "Who wants to be hired?" thread (candidate sourcing)

{
  "mode": "thread",
  "threadIds": ["43574497", "41822152"],
  "maxResults": 500
}

Scrapes all top-level comments — useful for finding candidates from "Who wants to be hired?" threads.


Example 5 — Full-text keyword search across all of HN

{
  "mode": "search",
  "searchQuery": "founding engineer Series A remote",
  "maxResults": 100
}

Example 6 — Curated frontend jobs, exclude noise

{
  "mode": "hiring",
  "months": 1,
  "filterKeywords": ["React", "TypeScript", "Next.js"],
  "excludeKeywords": ["blockchain", "web3", "crypto", "no remote", "onsite only"],
  "remoteOnly": true,
  "maxResults": 50
}

How It Works

Mode: hiring

Step 1 — Discover threads
Queries the Algolia HN API for "Ask HN: Who is Hiring?" threads by title pattern, sorted by date. The most recent N threads (per months) are selected.
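The discovery step might look roughly like this. It is a sketch, not the actor's code: the query string, the over-fetch factor of 3, and the title regex are assumptions, while `search_by_date`, `tags`, `hitsPerPage`, `objectID`, `title`, and `created_at` are part of the public Algolia HN API.

```python
import re
from urllib.parse import urlencode

SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"
TITLE_RE = re.compile(r"^Ask HN: Who is hiring\?", re.IGNORECASE)


def discovery_url(months):
    """Build a date-sorted story search; over-fetch because similar titles also match."""
    params = {
        "query": "Ask HN: Who is hiring?",
        "tags": "story",
        "hitsPerPage": months * 3,  # assumption: 3x leaves room for look-alike threads
    }
    return f"{SEARCH_BY_DATE}?{urlencode(params)}"


def pick_hiring_threads(hits, months):
    """Keep only real hiring threads and return the newest `months` of them."""
    matching = [h for h in hits if TITLE_RE.match(h.get("title") or "")]
    # ISO 8601 UTC timestamps sort correctly as plain strings.
    matching.sort(key=lambda h: h["created_at"], reverse=True)
    return [
        {"threadId": h["objectID"], "threadTitle": h["title"]}
        for h in matching[:months]
    ]
```

Filtering by title after the search matters because "Who wants to be hired?" and the freelancer thread are posted on the same day and match the same query words.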

Step 2 — Fetch thread comments
Each thread is fetched by item ID, returning all top-level comments.

GET https://hn.algolia.com/api/v1/items/{threadId}
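As an illustration of this step, the sketch below fetches one item and keeps only its direct children. The output field names follow the Output Data table; the helper names and the rule of skipping comments without text are assumptions.

```python
import json
from urllib.request import urlopen

ITEMS_URL = "https://hn.algolia.com/api/v1/items/{thread_id}"


def fetch_thread(thread_id, timeout=30.0):
    """Download the full comment tree for one HN item."""
    with urlopen(ITEMS_URL.format(thread_id=thread_id), timeout=timeout) as resp:
        return json.load(resp)


def top_level_comments(thread):
    """Return one record per direct child; nested replies are ignored."""
    records = []
    for child in thread.get("children") or []:
        if not child.get("text"):  # assumption: deleted or dead comments have no text
            continue
        records.append({
            "commentId": str(child["id"]),
            "threadId": str(thread["id"]),
            "threadTitle": thread.get("title"),
            "author": child.get("author"),
            "html": child["text"],
            "postedAt": child.get("created_at"),
            "hnUrl": f"https://news.ycombinator.com/item?id={child['id']}",
        })
    return records
```

A typical call would be `top_level_comments(fetch_thread("43574497"))`.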

Step 3 — Parse each comment
For every top-level comment:

  1. HTML tags are stripped and entities decoded to clean plain text
  2. isRealJobPosting() heuristics reject non-job comments (general replies, congratulations, bare links, very short texts)
  3. The first line is parsed for Company | Role or Company / Role format
  4. Full text is regex-scanned for location, remote policy, salary, tech stack, visa, apply URL, and email
  5. filterKeywords, excludeKeywords, and remoteOnly filters are applied
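Items 1, 3, and part of 4 in the list above can be sketched as follows. The regular expressions are illustrative assumptions rather than the actor's real patterns; in particular, a slash only counts as a separator here when it has spaces on both sides, so "$160k/yr" or "CI/CD" is not split.

```python
import html
import re

PARA_RE = re.compile(r"<p>|<br\s*/?>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
FIRST_LINE_SPLIT_RE = re.compile(r"\s*\|\s*|\s+/\s+")
URL_RE = re.compile(r"https?://[^\s<>\"')]+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def strip_html(comment_html):
    """Turn HN comment HTML into plain text with paragraph breaks."""
    text = PARA_RE.sub("\n\n", comment_html)
    text = TAG_RE.sub("", text)
    # Decode entities last so an escaped "&lt;" is never mistaken for a tag.
    return html.unescape(text).strip()


def parse_posting(comment_html):
    """Extract the header fields, first URL, and first email from one comment."""
    text = strip_html(comment_html)
    first_line = text.split("\n", 1)[0].strip()
    parts = [p for p in FIRST_LINE_SPLIT_RE.split(first_line) if p]
    has_header = len(parts) >= 2  # a lone segment is treated as free-form prose
    url = URL_RE.search(text)
    email = EMAIL_RE.search(text)
    return {
        "company": parts[0] if has_header else None,
        "role": parts[1] if has_header else None,
        "applyUrl": url.group(0).rstrip(".,;") if url else None,
        "email": email.group(0) if email else None,
        "fullText": text,
    }
```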

Step 4 — Save
Passing records are pushed to the dataset. A 500 ms courtesy delay is added between threads.

Mode: thread

Same as hiring mode but skips thread discovery — fetches the exact IDs you provide. Works for any Ask HN thread.

Mode: search

Queries the Algolia HN search API with your keyword, paginating through results (50 per page) until maxResults is reached or no more pages exist. Searches across all of HN history.
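A pagination loop of this kind could be sketched as below. The page size and the 300 ms pause come from this README; the function name and the injectable `fetch_json` and `sleep` parameters are assumptions added so the loop can be exercised without network access. `page`, `hitsPerPage`, `hits`, and `nbPages` are fields of the public Algolia HN search API.

```python
import time
from urllib.parse import urlencode

SEARCH_URL = "https://hn.algolia.com/api/v1/search"
PAGE_SIZE = 50
PAGE_DELAY_S = 0.3


def search_hits(query, max_results, fetch_json, sleep=time.sleep):
    """Collect hits page by page; max_results=0 means no limit."""
    hits = []
    page = 0
    while True:
        params = {"query": query, "page": page, "hitsPerPage": PAGE_SIZE}
        data = fetch_json(f"{SEARCH_URL}?{urlencode(params)}")
        page_hits = data.get("hits", [])
        hits.extend(page_hits)
        if max_results and len(hits) >= max_results:
            return hits[:max_results]
        page += 1
        # Stop when Algolia reports no further pages or a page comes back empty.
        if page >= data.get("nbPages", 0) or not page_hits:
            return hits
        sleep(PAGE_DELAY_S)
```

In real use, `fetch_json` would be any function that performs a GET request and returns the decoded JSON body.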

Hiring Mode Flow:

Input (months=N)
  ↓
Discover N "Who is Hiring?" threads via Algolia
  ↓ (for each thread)
Fetch all top-level comments
  ↓ (for each comment)
Strip HTML → isRealJobPosting? → Parse fields
  ↓
Apply filters (keywords, remote, maxResults)
  ↓
Push to Dataset

Job Posting Format on HN

The "Who is Hiring?" community follows an informal but consistent format:

Company Name | Role | Location | Remote | Salary
[Optional second line with more details]
Description paragraph...
Tech stack, requirements, what you'll work on...
Apply: https://company.io/jobs
Contact: hiring@company.io

First-line separators can be | (pipe) or / (slash). The actor parses both.

Comments rejected as non-job-postings:

  • Shorter than 30 characters
  • First line longer than 200 characters (likely a paragraph, not a header)
  • Starts with a bare URL
  • Matches generic reply phrases: "Congratulations", "Good luck", "Does anyone know...", "Interesting thread", etc.
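The rejection rules above translate into a short predicate. This is a sketch of the listed rules only; the phrase list is limited to the four examples named here, and the actor's real isRealJobPosting() may check more.

```python
import re

# Only the phrases named in the README; the real list is likely longer.
GENERIC_REPLY_RE = re.compile(
    r"^(?:congratulations|good luck|does anyone know|interesting thread)",
    re.IGNORECASE,
)
BARE_URL_RE = re.compile(r"^https?://", re.IGNORECASE)


def is_real_job_posting(text):
    """Apply the four documented rejection rules to plain comment text."""
    text = text.strip()
    if len(text) < 30:
        return False
    first_line = text.split("\n", 1)[0]
    if len(first_line) > 200:  # likely a paragraph, not a header
        return False
    if BARE_URL_RE.match(text):
        return False
    if GENERIC_REPLY_RE.match(text):
        return False
    return True
```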

Data Quality Notes

Company & Role: Extracted from the first line using the | or / separator convention. Postings that skip this format may have null for these fields; fullText always contains the complete raw posting.

Salary: Only captures explicitly stated salary figures. Many postings omit salary. null does not mean the role is low-paying.

Tech Stack: Detected via regex on 40+ known keywords. Technologies mentioned in unusual abbreviations or non-standard spellings may not be captured.

Remote policy: Classified from natural language keywords. Nuanced mentions ("we're a distributed team") may not be classified — use filterKeywords: ["remote"] for broader matching.

Location: Only detects a pre-defined list of major city names. Unusual city names or country-only mentions may not be captured.


Performance

| Scenario | Threads | Expected Jobs | Est. Time |
| --- | --- | --- | --- |
| 1 month, no filters | 1 | 400–900 | < 30 sec |
| 3 months, no filters | 3 | 1,200–2,700 | ~1–2 min |
| 12 months, no filters | 12 | 5,000–10,000 | ~5–10 min |
| 24 months, no filters | 24 | 10,000–20,000 | ~10–20 min |
| Search mode (100 results) | N/A | 100 | < 30 sec |

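Cost: Compute usage is negligible. The actor uses only native fetch with the Apify SDK (no browser, no Playwright, no Cheerio), so expect under $0.01 of Apify compute per full monthly thread scrape. This figure covers compute only, not the per-result price listed under Pricing.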


Export Formats

Download your results from the Apify Dataset in:

  • JSON — full structured output, techStack as a native array
  • CSV — flat table; techStack serialized as comma-joined string, ready for Excel or Google Sheets
  • Excel (.xlsx) — native spreadsheet for sharing with non-technical stakeholders
  • JSONL — one record per line for streaming into Notion, Airtable, job alert bots, or custom pipelines

Tips & Recipes

Build a personal job alert:
Schedule this actor daily with mode: "hiring", months: 1, and your filterKeywords. Export to Airtable or a Google Sheet and watch matching jobs appear automatically.

Salary benchmarking:
Run months: 12 with no filters. Export to CSV. Filter salary != null and pivot by techStack. You now have a year of self-reported salary data from actual hiring managers — not aggregated survey estimates.

Track a company's hiring history:
Use mode: "search" with the company name as searchQuery. Returns all mentions of that company across years of HN hiring threads.

Source candidates:
Use mode: "thread" with the ID of the latest "Who wants to be hired?" thread. Same parsing logic extracts structured profiles from candidates advertising themselves.

Identify trending technologies:
Run months: 6, export to CSV, and count frequency in the techStack column. Reveals what the HN startup ecosystem is actually building with right now — a more reliable signal than survey reports.

Exclude noise efficiently:
Combine excludeKeywords: ["blockchain", "web3", "NFT"] with filterKeywords: ["Python", "Go"] to get a focused, high-signal list without manual review.


Limitations

  • Free-form text parsing. HN job postings follow a convention, not a strict schema. Postings that don't use the Company | Role first-line format will have null for company and role. The fullText field always contains the full original text regardless.
  • Salary not normalized. Salary is extracted verbatim. $180k and $180,000 are stored as different strings. Normalize in post-processing if needed.
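  • No cross-month deduplication. Companies that post in multiple consecutive months appear as separate records. To keep one record per opening, deduplicate on company + role and keep the row with the latest threadMonth; a key of company + threadMonth only separates a company's postings by month.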
  • Search mode returns less structured data. Posts outside hiring threads don't follow the Company | Role convention, so company, role, location, and other parsed fields are often null in mode: "search" results.
  • Top-level comments only. The actor only processes top-level comments (one per job posting). Replies to job comments (e.g. "Is this role still open?") are not included.
  • Location detection is city-list based. Only a pre-defined set of major cities is matched. Uncommon city names or country-only mentions are not captured in the location field.
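For the salary limitation above, post-processing could normalize the raw string into numeric bounds. This is a sketch that assumes US-dollar figures with a leading "$"; ranges that carry the unit only once, such as "$120-160k", are not handled and would need extra rules.

```python
import re

# A dollar amount with optional thousands separators and an optional "k" suffix.
AMOUNT_RE = re.compile(
    r"\$\s?(\d{1,3}(?:,\d{3})+|\d+(?:\.\d+)?)\s?(k\b)?",
    re.IGNORECASE,
)


def salary_bounds(raw):
    """Return (low, high) in whole dollars, or None if no amount is found."""
    if not raw:
        return None
    amounts = []
    for number, k_suffix in AMOUNT_RE.findall(raw):
        value = float(number.replace(",", ""))
        if k_suffix:
            value *= 1000
        amounts.append(int(value))
    if not amounts:
        return None
    return min(amounts), max(amounts)
```

With this, "$180k" and "$180,000" both map to the same pair, so they group together in a pivot table.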

Frequently Asked Questions

Q: How often is the "Who is Hiring?" thread posted?
Monthly, on the first weekday of each month, posted by the whoishiring account. It is one of the most consistent monthly events on the platform and has run continuously for over a decade.

Q: How many job postings are in a typical thread?
Between 400 and 900 top-level comments, of which ~80–90% are genuine job postings after filtering out general replies and off-topic comments.

Q: Can I scrape the "Who wants to be hired?" thread too?
Yes — use mode: "thread" and provide that thread's item ID. The same parsing logic applies, extracting company, role, location, and tech stack from each commenter's profile post.

Q: Is this actor free to run?
The actor itself costs minimal Apify compute (under $0.01 per month of data). The HN Algolia API is completely free and requires no API key or registration.

Q: Do I need a proxy?
No. The Algolia HN API is public, rate-limit-generous, and does not require proxy usage for normal scraping volumes.

Q: Why are company and role null for some records?
Some commenters don't follow the standard Company | Role format and write a paragraph instead of a structured first line. The fullText field always contains the complete posting.

Q: Can I scrape older threads from 2020, 2021, or earlier?
Yes — use mode: "thread" with the specific thread IDs from those years. Find old thread IDs via HN search or the Algolia API. The months parameter only looks back from the current date via date-sorted discovery.

Q: What's the difference between filterKeywords and searchQuery?
searchQuery is used only in mode: "search" and queries the Algolia index server-side before any data is fetched. filterKeywords is a client-side filter applied after fetching and parsing, and works in all three modes on the fullText of already-downloaded comments.

Q: Can I run this on a schedule for continuous monitoring?
Yes — use the Apify Scheduler to run daily or weekly. With months: 1 and your keyword filters, you get a fresh filtered dataset of each month's new postings automatically.


Technical Details

| Property | Value |
| --- | --- |
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 |
| HTTP client | Native fetch |
| Data source | Algolia HN API (hn.algolia.com/api/v1) |
| Proxy required | ❌ No |
| API key required | ❌ No |
| Browser required | ❌ No |
| Dependencies | apify only |
| Tech keywords detected | 40+ |
| Delay between threads | 500 ms |
| Delay between search pages | 300 ms |
| Max redirect hops | N/A (JSON API) |

Changelog

  • 2026-06-01 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

v1.0

  • Initial release
  • Three modes: hiring (monthly threads), thread (by ID), search (full-text Algolia)
  • Structured field extraction: company, role, location, remote, salary, tech stack, visa, apply URL, email
  • 40+ tech keyword detection with pre-compiled regex
  • Remote policy classification: Remote / Hybrid / Onsite
  • Salary range extraction (various formats)
  • Visa sponsorship detection (H1B, sponsorship available/not available)
  • filterKeywords, excludeKeywords, and remoteOnly client-side filters
  • Up to 24 months of historical thread scraping
  • No proxy, no API key, no browser required

Support

If you encounter missing fields, unexpected empty results, or parsing issues, please open a support ticket via the Apify Console. Include the thread ID or search query, your full input configuration, and the actor run ID to help diagnose the issue quickly.


Changelog

  • 2026-05-20 — Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.