Hacker News Who Is Hiring Scraper – Jobs, Salary & Email
Pricing
from $2.00 / 1,000 results
Developer
Logiover
Maintained by Community
Hacker News Who Is Hiring Scraper — Jobs, Salary & Tech Stack Data

Scrape structured job listings from Hacker News "Ask HN: Who is Hiring?" monthly threads. Extracts company name, role, location, salary, remote policy, tech stack, visa sponsorship, apply URL, and contact email — automatically, from any month going back years. No AI, no API key, no proxy required.
What Is This Actor?
Every month, Hacker News hosts one of the internet's most trusted job boards: the "Ask HN: Who is Hiring?" thread. Thousands of startups, scale-ups, and tech companies post jobs there directly — no recruiter markup, no job board fees, straight from the hiring team. Each thread typically contains 400–900 real job postings.
This actor reads those threads via the Algolia HN API, parses every job comment into structured fields, and outputs a clean dataset ready for analysis, job alerts, or CRM import.
Built for:
- 👩💻 Job seekers — filter thousands of HN listings by tech stack, remote policy, or salary without reading every comment
- 📊 Recruiters & HR teams — monitor the HN talent market and track competing companies' hiring activity
- 🔬 Researchers & analysts — study tech hiring trends, salary ranges, and in-demand skills over time
- 🤖 Pipeline builders — feed structured job data into Notion, Airtable, or a custom job alert bot
- 📈 Investors & founders — understand who is scaling and what roles are in demand across the startup ecosystem
- 🗃️ Data engineers — build a historical job market dataset from months or years of HN hiring threads
Features
- Three scrape modes — monthly hiring threads, specific thread by ID, or full-text keyword search across all of HN
- Structured field parsing — extracts company, role, location, salary, remote policy, tech stack, visa info, apply URL, and contact email from free-form comment text
- 40+ tech stack keywords detected — Python, Go, Rust, React, Kubernetes, PostgreSQL, AWS, LLMs, and more
- Remote policy classification — distinguishes Full Remote, Hybrid, and Onsite from natural language mentions
- Salary range extraction: detects `$120k–$160k`, `$200k/yr`, and similar formats
- Visa sponsorship detection: flags H1B mentions, "visa sponsorship available", and "no visa sponsorship"
- Keyword include/exclude filters — narrow results to exactly the roles you want
- Remote-only filter — one toggle to return only remote-friendly listings
- Multi-month history — scrape up to 24 months of threads in a single run
- No API key, no proxy, no login — uses the public Algolia HN API
- Minimal dependencies — only the Apify SDK; no Playwright, no Cheerio, no browser
Output Data
Each record represents one parsed job posting (top-level comment) from a hiring thread.
| Field | Type | Description |
|---|---|---|
| commentId | string | HN item ID of the comment |
| threadId | string | HN item ID of the parent thread |
| threadTitle | string | Full title of the Ask HN thread |
| threadMonth | string | Month and year of the thread (e.g. "May 2025") |
| author | string | HN username of the commenter |
| company | string or null | Company name parsed from the first line of the posting |
| role | string or null | Job title or role parsed from the first line |
| location | string or null | Office location(s) detected in the text |
| remote | string or null | "Remote", "Hybrid", or "Onsite" |
| salary | string or null | Salary range or figure if mentioned (raw string) |
| techStack | array or null | List of detected technologies and languages |
| visa | string or null | Visa sponsorship status if mentioned |
| applyUrl | string or null | First URL found in the posting (apply link or company site) |
| email | string or null | Contact email address if present |
| fullText | string | Complete plain-text content of the job posting |
| postedAt | string | ISO 8601 timestamp of when the comment was posted |
| hnUrl | string | Direct link to the comment on Hacker News |
| scrapedAt | string | ISO 8601 timestamp of when this record was scraped |
Sample Output Record
```json
{
  "commentId": "43812345",
  "threadId": "43800001",
  "threadTitle": "Ask HN: Who is Hiring? (May 2025)",
  "threadMonth": "May 2025",
  "author": "jane_at_acme",
  "company": "Acme AI",
  "role": "Senior Backend Engineer",
  "location": "San Francisco, Remote",
  "remote": "Remote",
  "salary": "$160k–$200k",
  "techStack": ["Python", "Go", "PostgreSQL", "Kubernetes", "AWS"],
  "visa": "Visa sponsorship available",
  "applyUrl": "https://acmeai.io/careers",
  "email": "jobs@acmeai.io",
  "fullText": "Acme AI | Senior Backend Engineer | Remote | $160k–$200k\n\nWe're building the next generation of AI infrastructure...",
  "postedAt": "2025-05-01T10:22:05.000Z",
  "hnUrl": "https://news.ycombinator.com/item?id=43812345",
  "scrapedAt": "2025-05-15T14:00:00.000Z"
}
```
Detected Tech Stack Keywords
The actor scans each posting for 40+ technology keywords across languages, frameworks, databases, cloud platforms, and AI/ML tools:
Languages: Python, JavaScript, TypeScript, Go / Golang, Rust, Java, Kotlin, Swift, C++, C#, Ruby, PHP, Scala, Elixir, Clojure, Haskell
Frontend: React, Vue, Angular, Next.js, Svelte
Backend: Node.js, Express, Django, FastAPI, Flask, Rails, Spring, Laravel
Databases: PostgreSQL, MySQL, MongoDB, Redis, Elasticsearch, Cassandra, DynamoDB
Cloud & DevOps: AWS, GCP, Azure, Kubernetes, Docker, Terraform, Ansible
APIs & Messaging: GraphQL, REST, gRPC, Kafka, RabbitMQ, Celery
AI / ML: TensorFlow, PyTorch, LLM, OpenAI, ML, AI
Mobile: iOS, Android, React Native, Flutter
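As a rough illustration of how keyword detection like this can work, here is a minimal Python sketch. It is not the actor's source: the keyword subset, the `compile_keyword` and `detect_tech_stack` names, and the boundary rules are all assumptions chosen for the example.

```python
import re

# Hypothetical subset of the keyword list above; the actor's real list and patterns may differ.
TECH_KEYWORDS = ["Python", "Java", "JavaScript", "Go", "Rust", "React",
                 "Node.js", "C++", "C#", "PostgreSQL", "AWS"]

def compile_keyword(keyword):
    # Very short keywords ("Go", "C#") are matched case-sensitively to avoid hits on
    # ordinary words such as "go"; longer ones are matched case-insensitively.
    flags = 0 if len(keyword) <= 2 else re.IGNORECASE
    # Lookarounds are used instead of \b so that "C++" and "C#" can match, while
    # "Go" does not match inside "Google" and "Java" does not match inside "JavaScript".
    return re.compile(rf"(?<![A-Za-z0-9]){re.escape(keyword)}(?![A-Za-z0-9+#])", flags)

PATTERNS = {kw: compile_keyword(kw) for kw in TECH_KEYWORDS}

def detect_tech_stack(text):
    """Return the keywords found in `text`, in list order."""
    return [kw for kw, pattern in PATTERNS.items() if pattern.search(text)]
```

Compiling the patterns once up front keeps per-comment cost low when a run covers thousands of postings.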
Input Configuration
mode · string · default: "hiring"
Selects what the actor scrapes. Three modes are available:
| Mode | Value | Description |
|---|---|---|
| Who is Hiring? threads | "hiring" | Scrapes recent monthly "Who is Hiring?" threads |
| Keyword search | "search" | Searches all HN posts and comments by keyword |
| Specific thread | "thread" | Scrapes all top-level comments from specific thread IDs |
months · integer · default: 1 · range: 1–24
(Used when mode is "hiring")
How many recent "Who is Hiring?" threads to scrape. Each thread covers one calendar month and typically contains 400–900 job postings.
| Value | What you get |
|---|---|
| 1 | Latest month only (~400–900 jobs) |
| 3 | Last quarter of hiring data |
| 6 | Half-year trend dataset |
| 12 | Full year of HN job data |
| 24 | Two-year historical archive |
threadIds · array of strings · default: []
(Used when mode is "thread")
List of specific HN thread item IDs to scrape. All top-level comments from each thread are parsed.
How to find the ID: Open any HN thread in your browser. The ID is the number in the URL:
```
https://news.ycombinator.com/item?id=43574497
                                     ↑
                                     threadId = "43574497"
```
Useful for threads other than "Who is Hiring?", such as:
- "Ask HN: Who wants to be hired?" — for job seekers posting their own profiles
- "Ask HN: Freelancer? Seeking Freelancer?" — for freelance contracts
- Any custom hiring thread from a specific community
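If you collect thread links rather than IDs, a small helper can pull the ID out of each URL before you build the `threadIds` array. The sketch below uses only the Python standard library; the function name `thread_id_from_url` is made up for this example.

```python
from urllib.parse import urlparse, parse_qs

def thread_id_from_url(url):
    """Return the `id` query parameter of an HN item URL as a string."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id", [])
    if not ids or not ids[0].isdigit():
        raise ValueError(f"No numeric item id found in {url!r}")
    return ids[0]

# Build the threadIds input from a list of links.
thread_ids = [thread_id_from_url("https://news.ycombinator.com/item?id=43574497")]
```

The IDs are kept as strings because `threadIds` is documented as an array of strings.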
searchQuery · string · default: ""
(Used when mode is "search")
A keyword or phrase to search across all HN posts and comments via the Algolia HN index. Returns any matching HN content — not limited to hiring threads.
Examples:
- `"remote rust engineer"`: find Rust job mentions anywhere on HN
- `"founding engineer Series A"`: find early-stage company posts
- `"LLM inference hiring"`: find AI infrastructure hiring discussions
- `"YC W25 hiring"`: find YC Winter 2025 batch companies hiring

Results in search mode include `fullText` but often have fewer parsed structured fields (`company`, `role`, `location`), since posts outside hiring threads don't follow the standard comment format.
filterKeywords · array of strings · default: []
Only keep postings whose full text contains at least one of these keywords. Case-insensitive. Applied after parsing, before saving.
Examples:
- `["Python", "Go", "Rust"]`: only postings mentioning these languages
- `["San Francisco", "NYC", "Austin"]`: only specific cities
- `["Series A", "Series B", "YC"]`: only funded or accelerator-backed companies
- `["founding", "founding engineer"]`: early-stage opportunities only
Leave empty to include all postings.
excludeKeywords · array of strings · default: []
Remove postings whose full text contains any of these keywords. Case-insensitive.
Examples:
- `["cleared", "security clearance"]`: exclude defense/government roles
- `["no remote", "onsite only", "in-office"]`: exclude non-remote roles
- `["blockchain", "web3", "crypto"]`: exclude crypto roles
- `["10+ years", "15+ years"]`: exclude very senior requirements
remoteOnly · boolean · default: false
When enabled, only returns postings that explicitly mention remote work in any of these forms:
REMOTE, FULL REMOTE, FULLY REMOTE, 100% REMOTE, REMOTE OK, REMOTE FRIENDLY, REMOTE FIRST, HYBRID.
Postings with only ONSITE or IN-OFFICE mentions are excluded.
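The three filters described above can be pictured as one predicate over a posting's full text. The Python sketch below is an approximation under stated assumptions, not the actor's code: it treats keyword matching as a case-insensitive substring test (so "Go" would also match "Google"), and it reduces the remote phrases to the words "remote" and "hybrid", which means a phrase like "no remote" would still pass.

```python
import re

# Every remote form listed above contains one of these two words (assumed simplification).
REMOTE_PATTERN = re.compile(r"\b(?:remote|hybrid)\b", re.IGNORECASE)

def passes_filters(full_text, filter_keywords=(), exclude_keywords=(), remote_only=False):
    """Return True if a posting survives filterKeywords, excludeKeywords, and remoteOnly."""
    text = full_text.lower()
    # filterKeywords: keep only postings containing at least one keyword.
    if filter_keywords and not any(kw.lower() in text for kw in filter_keywords):
        return False
    # excludeKeywords: drop postings containing any keyword.
    if any(kw.lower() in text for kw in exclude_keywords):
        return False
    # remoteOnly: require an explicit remote or hybrid mention.
    if remote_only and not REMOTE_PATTERN.search(full_text):
        return False
    return True
```

An empty `filter_keywords` keeps everything, which mirrors the "leave empty to include all postings" behaviour.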
maxResults · integer · default: 0 (unlimited)
Maximum number of job records to save across all threads. Set to 0 for unlimited. A single month's thread typically yields 400–900 jobs after filtering out non-job comments.
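When inputs are generated by a script, it can help to check them against the documented defaults and ranges before starting a run. The following Python sketch does that. The `normalize_input` helper is hypothetical, and rejecting an empty `threadIds` or `searchQuery` is an assumption about sensible behaviour rather than something the actor is documented to do.

```python
import copy

VALID_MODES = {"hiring", "search", "thread"}

# Defaults as documented in the parameter descriptions above.
DEFAULTS = {
    "mode": "hiring", "months": 1, "threadIds": [], "searchQuery": "",
    "filterKeywords": [], "excludeKeywords": [], "remoteOnly": False, "maxResults": 0,
}

def normalize_input(raw):
    """Merge `raw` over the defaults and validate the documented constraints."""
    cfg = {**copy.deepcopy(DEFAULTS), **raw}
    if cfg["mode"] not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if not 1 <= cfg["months"] <= 24:
        raise ValueError("months must be between 1 and 24")
    if cfg["maxResults"] < 0:
        raise ValueError("maxResults must be 0 (unlimited) or a positive integer")
    # Assumed checks: the mode-specific parameter should not be empty.
    if cfg["mode"] == "thread" and not cfg["threadIds"]:
        raise ValueError("threadIds is required when mode is 'thread'")
    if cfg["mode"] == "search" and not cfg["searchQuery"].strip():
        raise ValueError("searchQuery is required when mode is 'search'")
    return cfg
```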
Usage Examples
Example 1 — Latest "Who is Hiring?" thread, all jobs
```json
{
  "mode": "hiring",
  "months": 1,
  "maxResults": 0,
  "remoteOnly": false
}
```
Returns every parsed job posting from the current month's thread (~400–900 results).
Example 2 — Remote Python or Go jobs from the last 3 months
```json
{
  "mode": "hiring",
  "months": 3,
  "filterKeywords": ["Python", "Go", "Golang"],
  "remoteOnly": true,
  "maxResults": 200
}
```
Example 3 — Six-month trend dataset for salary research
```json
{
  "mode": "hiring",
  "months": 6,
  "maxResults": 0
}
```
Export to CSV and analyze salary and techStack columns for compensation benchmarking across the HN startup ecosystem.
Example 4 — Specific "Who wants to be hired?" thread (candidate sourcing)
```json
{
  "mode": "thread",
  "threadIds": ["43574497", "41822152"],
  "maxResults": 500
}
```
Scrapes all top-level comments — useful for finding candidates from "Who wants to be hired?" threads.
Example 5 — Full-text keyword search across all of HN
```json
{
  "mode": "search",
  "searchQuery": "founding engineer Series A remote",
  "maxResults": 100
}
```
Example 6 — Curated frontend jobs, exclude noise
```json
{
  "mode": "hiring",
  "months": 1,
  "filterKeywords": ["React", "TypeScript", "Next.js"],
  "excludeKeywords": ["blockchain", "web3", "crypto", "no remote", "onsite only"],
  "remoteOnly": true,
  "maxResults": 50
}
```
How It Works
Mode: hiring
Step 1 — Discover threads
Queries the Algolia HN API for "Ask HN: Who is Hiring?" threads by title pattern, sorted by date. The most recent N threads (per months) are selected.
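A discovery step of this kind could look roughly like the Python sketch below. It builds a date-sorted query against the public Algolia HN `search_by_date` endpoint and then keeps the newest N stories whose title matches a pattern. The exact query string, title regex, and function names are assumptions; the actor's own request may differ.

```python
import re
from urllib.parse import urlencode

SEARCH_BY_DATE = "https://hn.algolia.com/api/v1/search_by_date"
# Assumed title pattern; it deliberately does not match "Who wants to be hired?".
TITLE_PATTERN = re.compile(r"^Ask HN: Who is hiring\?", re.IGNORECASE)

def discovery_url(hits_per_page=50):
    """Build a date-sorted story search URL for hiring threads."""
    params = {"query": "Ask HN: Who is hiring?", "tags": "story", "hitsPerPage": hits_per_page}
    return f"{SEARCH_BY_DATE}?{urlencode(params)}"

def select_hiring_threads(hits, months):
    """Keep the `months` most recent hits whose title matches the pattern."""
    matching = [h for h in hits if TITLE_PATTERN.match(h.get("title") or "")]
    matching.sort(key=lambda h: h["created_at_i"], reverse=True)
    return matching[:months]
```

Here `hits` is the `hits` array of the search response, where each entry carries `objectID`, `title`, and `created_at_i`.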
Step 2 — Fetch thread comments
Each thread is fetched by item ID, returning all top-level comments.
GET https://hn.algolia.com/api/v1/items/{threadId}
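For readers who want to reproduce this step outside the actor, the Python sketch below fetches one item and keeps only its direct children. The items endpoint returns a nested tree in which each node has `id`, `author`, `text`, `created_at`, and `children`; the helper names and the output keys chosen here are illustrative assumptions.

```python
import json
from urllib.request import urlopen

ITEMS_ENDPOINT = "https://hn.algolia.com/api/v1/items/{thread_id}"

def fetch_item(thread_id):
    """Download the full comment tree for one thread (performs a network request)."""
    with urlopen(ITEMS_ENDPOINT.format(thread_id=thread_id), timeout=30) as response:
        return json.load(response)

def top_level_comments(item):
    """Return the direct children of a thread, skipping deleted or empty comments."""
    comments = []
    for child in item.get("children", []):
        if not child.get("text"):  # deleted comments carry no text
            continue
        comments.append({
            "commentId": str(child["id"]),
            "threadId": str(item["id"]),
            "author": child.get("author"),
            "html": child["text"],  # still HTML at this stage
            "postedAt": child.get("created_at"),
            "hnUrl": f"https://news.ycombinator.com/item?id={child['id']}",
        })
    return comments
```

Replies nested under a job comment live in that comment's own `children` and are ignored, which matches the top-level-only behaviour described in this document.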
Step 3 — Parse each comment
For every top-level comment:
- HTML tags are stripped and entities decoded to clean plain text
- `isRealJobPosting()` heuristics reject non-job comments (general replies, congratulations, bare links, very short texts)
- The first line is parsed for `Company | Role` or `Company / Role` format
- Full text is regex-scanned for location, remote policy, salary, tech stack, visa, apply URL, and email
- `filterKeywords`, `excludeKeywords`, and `remoteOnly` filters are applied
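The HTML clean-up in this step can be approximated with the Python standard library. The sketch below is an illustration, not the actor's implementation: it assumes HN comment HTML uses `<p>` as a paragraph separator and encodes characters such as apostrophes and slashes as entities.

```python
import html
import re

def comment_html_to_text(raw):
    """Convert HN comment HTML into plain text with paragraph breaks preserved."""
    text = re.sub(r"(?i)<p>", "\n\n", raw)        # paragraph separators become blank lines
    text = re.sub(r"(?i)<br\s*/?>", "\n", text)   # explicit line breaks
    text = re.sub(r"<[^>]+>", "", text)           # drop remaining tags such as <a> and <i>
    text = html.unescape(text)                    # decode entities such as &amp; and &#x27;
    return text.strip()
```

Tags are stripped before entities are decoded so that an encoded `&lt;` in the comment text is not mistaken for markup.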
Step 4 — Save
Passing records are pushed to the dataset. A 500 ms courtesy delay is added between threads.
Mode: thread
Same as hiring mode but skips thread discovery — fetches the exact IDs you provide. Works for any Ask HN thread.
Mode: search
Queries the Algolia HN search API with your keyword, paginating through results (50 per page) until maxResults is reached or no more pages exist. Searches across all of HN history.
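The pagination loop described for search mode can be sketched in Python as a generator. The page fetcher is passed in as a function so the loop stays independent of any HTTP client; `fetch_page` and `paginate_search` are hypothetical names, and the sketch relies on the `hits` and `nbPages` fields of an Algolia search response.

```python
import time

def paginate_search(fetch_page, max_results=0, delay=0.3):
    """Yield hits page by page until max_results is reached or pages run out.

    `fetch_page(page)` must return a dict shaped like an Algolia response,
    with a "hits" list and an "nbPages" count. max_results=0 means unlimited.
    """
    page, yielded = 0, 0
    while True:
        payload = fetch_page(page)
        hits = payload.get("hits", [])
        for hit in hits:
            yield hit
            yielded += 1
            if max_results and yielded >= max_results:
                return
        page += 1
        if not hits or page >= payload.get("nbPages", 0):
            return
        time.sleep(delay)  # courtesy pause between pages
```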
Hiring Mode Flow:

```
Input (months=N)
      │
      ▼
Discover N "Who is Hiring?" threads via Algolia
      │
      ▼  (for each thread)
Fetch all top-level comments
      │
      ▼  (for each comment)
Strip HTML → isRealJobPosting? → Parse fields
      │
      ▼
Apply filters (keywords, remote, maxResults)
      │
      ▼
Push to Dataset
```
Job Posting Format on HN
The "Who is Hiring?" community follows an informal but consistent format:
```
Company Name | Role | Location | Remote | Salary
[Optional second line with more details]

Description paragraph...
Tech stack, requirements, what you'll work on...

Apply: https://company.io/jobs
Contact: hiring@company.io
```
First-line separators can be | (pipe) or / (slash). The actor parses both.
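A first-line parser for this convention might look like the Python sketch below. It is a simplified illustration: it assumes the company comes first and the role second, and it only treats a slash as a separator when it is surrounded by spaces, so that URLs and terms like "CI/CD" are not split. The actor's real rules may be more involved.

```python
def parse_header(full_text):
    """Extract company and role from the first line of a posting, if structured."""
    lines = full_text.strip().split("\n", 1)
    first_line = lines[0]
    # Prefer the pipe; fall back to a space-padded slash (assumed rule).
    separator = "|" if "|" in first_line else " / "
    parts = [part.strip() for part in first_line.split(separator) if part.strip()]
    if len(parts) < 2:
        # Free-form first line: leave both fields empty, as the actor does with null.
        return {"company": None, "role": None}
    return {"company": parts[0], "role": parts[1]}
```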
Comments rejected as non-job-postings:
- Shorter than 30 characters
- First line longer than 200 characters (likely a paragraph, not a header)
- Starts with a bare URL
- Matches generic reply phrases: "Congratulations", "Good luck", "Does anyone know...", "Interesting thread", etc.
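The rejection rules above translate into a short predicate. The Python sketch below mirrors the four listed checks; the phrase list is limited to the examples named in this document, so the actor's `isRealJobPosting()` likely covers more cases than this approximation does.

```python
import re

# Only the phrases quoted above; the real list is presumably longer.
GENERIC_REPLY = re.compile(
    r"^(congratulations|good luck|does anyone know|interesting thread)", re.IGNORECASE
)
BARE_URL = re.compile(r"^https?://\S+", re.IGNORECASE)

def is_real_job_posting(text):
    """Apply the documented heuristics to one plain-text comment."""
    text = text.strip()
    if len(text) < 30:                      # too short to be a posting
        return False
    first_line = text.split("\n", 1)[0]
    if len(first_line) > 200:               # a paragraph, not a header line
        return False
    if BARE_URL.match(text):                # starts with a bare URL
        return False
    if GENERIC_REPLY.match(text):           # generic reply phrase
        return False
    return True
```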
Data Quality Notes
Company & Role: Extracted from the first line using the | (pipe) or / (slash) separator convention. Postings that skip this format may have null for these fields; fullText always contains the complete raw posting.
Salary: Only captures explicitly stated salary figures. Many postings omit salary. null does not mean the role is low-paying.
Tech Stack: Detected via regex on 40+ known keywords. Technologies mentioned in unusual abbreviations or non-standard spellings may not be captured.
Remote policy: Classified from natural language keywords. Nuanced mentions ("we're a distributed team") may not be classified — use filterKeywords: ["remote"] for broader matching.
Location: Only detects a pre-defined list of major city names. Unusual city names or country-only mentions may not be captured.
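To make the salary note concrete, here is a minimal Python sketch of a pattern that captures explicitly stated dollar figures such as `$120k–$160k`, `$200k/yr`, and `$180,000`. The regex is an assumption written for this example, not the actor's pattern. It requires a `k` suffix or a thousands group so that funding amounts like "$5M" are not picked up.

```python
import re

# One amount: 2-3 digits followed by "k" or a ",000"-style thousands group.
AMOUNT = r"\d{2,3}(?:k|,\d{3})"

SALARY_PATTERN = re.compile(
    rf"\${AMOUNT}"                              # lower bound or single figure
    rf"(?:\s*(?:-|–|to)\s*\$?{AMOUNT})?"        # optional upper bound
    r"(?:\s*/\s*(?:yr|year))?",                 # optional per-year suffix
    re.IGNORECASE,
)

def extract_salary(text):
    """Return the first salary-looking substring verbatim, or None."""
    match = SALARY_PATTERN.search(text)
    return match.group(0) if match else None
```

The match is returned verbatim, in line with the `salary` field being a raw string.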
Performance
| Scenario | Threads | Expected Jobs | Est. Time |
|---|---|---|---|
| 1 month, no filters | 1 | 400–900 | < 30 sec |
| 3 months, no filters | 3 | 1,200–2,700 | ~1–2 min |
| 12 months, no filters | 12 | 5,000–10,000 | ~5–10 min |
| 24 months, no filters | 24 | 10,000–20,000 | ~10–20 min |
| Search mode (100 results) | — | 100 | < 30 sec |
Cost: Negligible. The actor uses only native fetch with the Apify SDK — no browser, no Playwright, no Cheerio. Expect under $0.01 per full monthly thread scrape.
Export Formats
Download your results from the Apify Dataset in:
- JSON: full structured output, `techStack` as a native array
- CSV: flat table; `techStack` serialized as a comma-joined string, ready for Excel or Google Sheets
- Excel (.xlsx): native spreadsheet for sharing with non-technical stakeholders
- JSONL: one record per line for streaming into Notion, Airtable, job alert bots, or custom pipelines
Tips & Recipes
Build a personal job alert:
Schedule this actor daily with mode: "hiring", months: 1, and your filterKeywords. Export to Airtable or a Google Sheet and watch matching jobs appear automatically.
Salary benchmarking:
Run months: 12 with no filters. Export to CSV. Filter salary != null and pivot by techStack. You now have a year of self-reported salary data from actual hiring managers — not aggregated survey estimates.
Track a company's hiring history:
Use mode: "search" with the company name as searchQuery. Returns all mentions of that company across years of HN hiring threads.
Source candidates:
Use mode: "thread" with the ID of the latest "Who wants to be hired?" thread. Same parsing logic extracts structured profiles from candidates advertising themselves.
Identify trending technologies:
Run months: 6, export to CSV, and count frequency in the techStack column. Reveals what the HN startup ecosystem is actually building with right now — a more reliable signal than survey reports.
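If you work from the JSON export rather than CSV, the frequency count takes a few lines of Python. This sketch assumes each record has the `techStack` field described in the output table, and it counts each technology once per posting.

```python
from collections import Counter

def tech_frequency(records):
    """Count how many postings mention each technology."""
    counts = Counter()
    for record in records:
        # techStack may be null; a set avoids double-counting within one posting.
        counts.update(set(record.get("techStack") or []))
    return counts

# Example with inline records; in practice, load them from the exported JSON file.
sample = [
    {"techStack": ["Python", "Go"]},
    {"techStack": ["Python", "PostgreSQL"]},
    {"techStack": None},
]
top = tech_frequency(sample).most_common(1)
```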
Exclude noise efficiently:
Combine excludeKeywords: ["blockchain", "web3", "NFT"] with filterKeywords: ["Python", "Go"] to get a focused, high-signal list without manual review.
Limitations
- Free-form text parsing. HN job postings follow a convention, not a strict schema. Postings that don't use the `Company | Role` first-line format will have `null` for `company` and `role`. The `fullText` field always contains the full original text regardless.
- Salary not normalized. Salary is extracted verbatim. `$180k` and `$180,000` are stored as different strings. Normalize in post-processing if needed.
- No cross-month deduplication. Companies that post in multiple consecutive months appear as separate records. Use `company` + `threadMonth` as a composite key if deduplication is needed.
- Search mode returns less structured data. Posts outside hiring threads don't follow the `Company | Role` convention, so `company`, `role`, `location`, and other parsed fields are often `null` in `mode: "search"` results.
- Top-level comments only. The actor only processes top-level comments (one per job posting). Replies to job comments (e.g. "Is this role still open?") are not included.
- Location detection is city-list based. Only a pre-defined set of major cities is matched. Uncommon city names or country-only mentions are not captured in the `location` field.
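Salary normalization in post-processing can be as simple as the Python sketch below, which turns the raw `salary` string into a numeric (low, high) pair. The `salary_bounds` helper is hypothetical and only handles the formats named above, where every figure carries its own `k` suffix or thousands separator. A mixed form such as "$120–160k" would need extra handling.

```python
import re

def salary_bounds(raw):
    """Convert a raw salary string into (low, high) in whole dollars, or None."""
    if not raw:
        return None
    amounts = []
    for number, suffix in re.findall(r"(\d+(?:,\d{3})*)(k)?", raw, re.IGNORECASE):
        value = int(number.replace(",", ""))
        if suffix:                 # "180k" means 180,000
            value *= 1000
        amounts.append(value)
    if not amounts:
        return None
    return min(amounts), max(amounts)
```

With this, `$180k` and `$180,000` map to the same pair and can be compared or aggregated directly.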
Frequently Asked Questions
Q: How often is the "Who is Hiring?" thread posted?
A new thread is posted on the first weekday of every month by the whoishiring account. It is one of the most consistent monthly events on the platform and has run continuously for over a decade.
Q: How many job postings are in a typical thread?
Between 400 and 900 top-level comments, of which ~80–90% are genuine job postings after filtering out general replies and off-topic comments.
Q: Can I scrape the "Who wants to be hired?" thread too?
Yes — use mode: "thread" and provide that thread's item ID. The same parsing logic applies, extracting company, role, location, and tech stack from each commenter's profile post.
Q: Is this actor free to run?
The actor itself costs minimal Apify compute (under $0.01 per month of data). The HN Algolia API is completely free and requires no API key or registration.
Q: Do I need a proxy?
No. The Algolia HN API is public, has generous rate limits, and does not require a proxy at normal scraping volumes.
Q: Why are company and role null for some records?
Some commenters don't follow the standard Company | Role format and write a paragraph instead of a structured first line. The fullText field always contains the complete posting.
Q: Can I scrape older threads from 2020, 2021, or earlier?
Yes — use mode: "thread" with the specific thread IDs from those years. Find old thread IDs via HN search or the Algolia API. The months parameter only looks back from the current date via date-sorted discovery.
Q: What's the difference between filterKeywords and searchQuery?
searchQuery is used only in mode: "search" and queries the Algolia index server-side before any data is fetched. filterKeywords is a client-side filter applied after fetching and parsing, and works in all three modes on the fullText of already-downloaded comments.
Q: Can I run this on a schedule for continuous monitoring?
Yes — use the Apify Scheduler to run daily or weekly. With months: 1 and your keyword filters, you get a fresh filtered dataset of each month's new postings automatically.
Technical Details
| Property | Value |
|---|---|
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 |
| HTTP client | Native fetch |
| Data source | Algolia HN API (hn.algolia.com/api/v1) |
| Proxy required | ❌ No |
| API key required | ❌ No |
| Browser required | ❌ No |
| Dependencies | apify only |
| Tech keywords detected | 40+ |
| Delay between threads | 500 ms |
| Delay between search pages | 300 ms |
| Max redirect hops | N/A (JSON API) |
Changelog
- 2026-06-01 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
- 2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
v1.0
- Initial release
- Three modes: `hiring` (monthly threads), `thread` (by ID), `search` (full-text Algolia)
- Structured field extraction: company, role, location, remote, salary, tech stack, visa, apply URL, email
- 40+ tech keyword detection with pre-compiled regex
- Remote policy classification: Remote / Hybrid / Onsite
- Salary range extraction (various formats)
- Visa sponsorship detection (H1B, sponsorship available/not available)
- `filterKeywords`, `excludeKeywords`, and `remoteOnly` client-side filters
- Up to 24 months of historical thread scraping
- No proxy, no API key, no browser required
Support
If you encounter missing fields, unexpected empty results, or parsing issues, please open a support ticket via the Apify Console. Include the thread ID or search query, your full input configuration, and the actor run ID to help diagnose the issue quickly.
Changelog
- 2026-05-20 — Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.
Last reviewed: 2026-06-01.