Hacker News Scraper
Scrapes Hacker News stories, comments, jobs, polls, and user profiles via the official Firebase and Algolia APIs. Supports full-text search, Who's Hiring thread extraction, author karma snapshots, and deep comment trees.
Pricing: from $0.40 / 1,000 data fetches
Developer: Omar Eldeeb
Scrape Hacker News stories, comments, jobs, polls, and user profiles with no proxy, no captcha, no headless browser. Built directly on the official Firebase and Algolia APIs, so it's fast, reliable, and cheap.
What does Hacker News Scraper do?
This actor extracts structured data from Hacker News — stories, comments, jobs, user profiles, and the monthly "Ask HN: Who is hiring?" threads — via HN's two official public APIs (the Firebase API and the Algolia search API). It produces a unified JSON output with titles, URLs, points, authors, timestamps, and optional full comment trees with author karma snapshots.
Unlike most HN scrapers on the Store that parse HTML and require proxies, this one hits the official APIs directly. You can schedule it, call it via REST/Webhook/MCP from the Apify platform, pipe its output to downstream actors, and never worry about rate limits, captchas, or IP blocks.
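For orientation, these are the two public APIs the actor is built on. A minimal sketch of calling them directly, with no Apify involvement (the URL shapes are from the official HN Firebase and Algolia APIs; neither needs authentication, a proxy, or a headless browser):

```python
# Illustrative only: the two public endpoints this actor wraps.
from urllib.parse import quote

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

def firebase_item_url(item_id: int) -> str:
    """URL of a single story/comment/job/poll by numeric ID."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"

def algolia_search_url(query: str, sort_by_date: bool = False) -> str:
    """Full-text search URL; the search_by_date endpoint returns newest-first."""
    endpoint = "search_by_date" if sort_by_date else "search"
    return f"{ALGOLIA_BASE}/{endpoint}?query={quote(query)}"

# Uncomment to fetch for real (requires the `requests` package):
# import requests
# top_ids = requests.get(f"{FIREBASE_BASE}/topstories.json", timeout=10).json()
# story = requests.get(firebase_item_url(top_ids[0]), timeout=10).json()
```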
Why use Hacker News Scraper?
- 🏢 Recruiters & talent teams — pull the latest "Who is hiring?" monthly thread as structured job rows, filter by role keywords, feed into your ATS.
- 📈 Market intelligence & PR teams — monitor mentions of your company, competitors, or product categories with full-text search.
- 🧠 AI/LLM data pipelines — build clean, deduplicated training or retrieval-augmented-generation datasets from HN discussions, with author karma signals.
- 📰 Newsletters & aggregators — daily digest of top / new / best stories above a score threshold, optionally filtered to specific domains.
- 🎯 Research & academia — reproducible HN corpora for discourse analysis, link-graph research, or longitudinal studies of tech discussion.
How to use Hacker News Scraper
- Click Try for free on this actor's page to open the Apify Console input form.
- Pick a mode — start with `topstories` to see how it works.
- Set Max items (default 30). For a first run, try 10.
- Click Save & Start.
- When the run finishes, open the Output tab to see the scraped items, or use Export to download as JSON, CSV, or Excel.
- For more advanced runs, enable Include comments to attach comment trees, or use `mode=search` to run a keyword query.
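The same run can also be triggered from code. A hypothetical sketch using the official `apify-client` Python package; the actor ID and token are placeholders, and the input field names come from the Input section below:

```python
# Hypothetical sketch: calling the actor programmatically via apify-client.
from typing import Any

def build_run_input(mode: str, max_items: int = 30, **options: Any) -> dict:
    """Assemble the actor's input JSON (field names from the Input section)."""
    run_input = {"mode": mode, "maxItems": max_items}
    run_input.update(options)
    return run_input

search_input = build_run_input(
    "search", max_items=100,
    searchQuery="Claude", sortSearchBy="date", minScore=50,
)

# Uncomment to run for real (requires `pip install apify-client` and a token):
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# run = client.actor("<ACTOR_ID>").call(run_input=search_input)
# items = client.dataset(run["defaultDatasetId"]).list_items().items
```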
Input
Every field is configurable via the input form. See the Input tab for live validation and tooltips. The only required field is `mode`.
Minimal examples (copy into the Console's "JSON" tab):
Top 30 front-page stories:
```json
{ "mode": "topstories", "maxItems": 30 }
```
Stories mentioning "Claude", newest first, ≥ 50 points:
```json
{
  "mode": "search",
  "searchQuery": "Claude",
  "sortSearchBy": "date",
  "minScore": 50,
  "maxItems": 100
}
```
Show HN from github.com or arxiv.org with heavy discussion:
```json
{
  "mode": "showstories",
  "minScore": 50,
  "minComments": 20,
  "domainFilter": ["github.com", "arxiv.org"],
  "maxItems": 50
}
```
Latest "Who is hiring?" thread as structured job rows:
```json
{ "mode": "hiring_threads", "maxItems": 500 }
```
A story + its full 3-level comment tree + author karma:
```json
{
  "mode": "topstories",
  "maxItems": 5,
  "includeComments": true,
  "maxCommentDepth": 3,
  "includeUserProfiles": true
}
```
Output
Every row — whether it's a story, comment, job, or poll — uses the same unified shape. You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
```json
{
  "id": 47822805,
  "type": "story",
  "title": "SPEAKE(a)R: Turn Speakers to Microphones for Fun and Profit",
  "url": "https://www.usenix.org/system/files/conference/woot17/woot17-paper-guri.pdf",
  "domain": "usenix.org",
  "hnUrl": "https://news.ycombinator.com/item?id=47822805",
  "by": "Eridanus2",
  "byUserKarma": 1847,
  "byUserCreated": 1625097600,
  "score": 67,
  "time": 1776588348,
  "createdAt": "2026-04-19T08:45:48.000Z",
  "descendants": 26,
  "comments": [
    {
      "id": 47823010,
      "by": "someuser",
      "text": "<p>Interesting approach...</p>",
      "depth": 1,
      "replies": [
        { "id": 47823201, "by": "replyer", "text": "...", "depth": 2 }
      ]
    }
  ],
  "scrapedAt": "2026-04-19T11:21:58.601Z",
  "source": "firebase"
}
```
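The nested comments array can be post-processed client-side as well. A minimal sketch of a depth-first flatten; the server-side `flattenComments=true` option is assumed to behave equivalently:

```python
# Sketch: flatten the nested `comments` tree from one output row
# into flat rows with a depth field, handy for CSV-style processing.

def flatten_comments(comments: list) -> list:
    """Depth-first walk yielding one flat dict per comment."""
    rows = []
    for c in comments:
        rows.append({"id": c["id"], "by": c.get("by"), "depth": c["depth"]})
        rows.extend(flatten_comments(c.get("replies", [])))
    return rows

row_comments = [
    {"id": 47823010, "by": "someuser", "depth": 1,
     "replies": [{"id": 47823201, "by": "replyer", "depth": 2}]},
]
flat = flatten_comments(row_comments)  # two rows, depths 1 and 2
```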
Data fields
| Field | Type | Description |
|---|---|---|
| `id` | number | Unique HN item ID |
| `type` | enum | `story` / `comment` / `job` / `poll` / `pollopt` |
| `title` | string | Story / job title (null for comments) |
| `url` | string | Outbound URL for stories (null for text-only) |
| `domain` | string | Extracted hostname (e.g., github.com) |
| `hnUrl` | string | Deep link to the item on news.ycombinator.com |
| `by` | string | Author username |
| `byUserKarma` | number | Author's karma (when `includeUserProfiles=true`) |
| `byUserCreated` | number | Unix timestamp of author account creation |
| `score` | number | Points (votes) |
| `time` | number | Unix timestamp of submission |
| `createdAt` | string | ISO 8601 timestamp |
| `text` | string | HTML body (comments, Ask HN, job descriptions) |
| `descendants` | number | Total comments on a story |
| `parent` | number | Parent ID for comments |
| `comments` | array | Nested comment tree (when `includeComments=true`) |
| `flatComments` | array | Flat list with depth (when `flattenComments=true`) |
| `deleted` / `dead` | boolean | Item status flags |
| `scrapedAt` | string | ISO timestamp when this row was fetched |
| `source` | enum | `firebase` or `algolia` — which API returned it |
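The `domain` field can be reproduced client-side if you need to re-derive it. Illustrative only; the actor's exact hostname normalisation rules are an assumption:

```python
# Sketch: hostname extraction matching the `domain` field (assumed rules:
# parse the URL, drop any leading "www.", return None for text-only items).
from typing import Optional
from urllib.parse import urlparse

def extract_domain(url: Optional[str]) -> Optional[str]:
    if not url:
        return None
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host or None
```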
How much does it cost to scrape Hacker News?
This actor uses pay-per-event pricing — you only pay for the rows you get, no monthly subscription.
| Event | Price |
|---|---|
| `story-fetched` | $0.00040 / item ($0.40 per 1,000 stories or jobs) |
| `comment-fetched` | $0.00015 / comment ($0.15 per 1,000 comments) |
| `user-profile-fetched` | $0.00030 / profile ($0.30 per 1,000 profiles — only when `includeUserProfiles=true`) |
The first 50 chargeable events in every run are free. That means any run with ≤ 50 total output rows + comments + profiles costs nothing beyond the platform's trivial startup fee. You only pay for events 51+ within the same run.
Typical run costs (after the 50-event trial is exhausted in the same run):
- 100 top stories (no comments): 100 stories → 50 free + 50 paid = $0.020
- 30 top stories + 10 comments each (≈ 330 events): $0.046
- Monthly "Who is hiring?" full extract (≈ 500 jobs): $0.180
- 1,000 search hits on a keyword: $0.380
- A 30-story smoke test with no comments: $0 (entirely free)
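The billing model above can be sketched as a small estimator. The per-event prices are the published ones; the order in which the 50 free events are consumed is not documented, so this assumes stories are counted first, making mixed-type totals approximate:

```python
# Sketch of the pay-per-event model: first 50 events free, then per-type prices.
# Prices are kept in units of $0.00001 so the arithmetic stays exact.

def estimate_cost(stories: int = 0, comments: int = 0, profiles: int = 0) -> float:
    event_types = [("story", 40, stories), ("comment", 15, comments),
                   ("profile", 30, profiles)]
    free_left, total = 50, 0
    for _name, unit_price, count in event_types:
        free = min(free_left, count)  # assumption: free tier applied in this order
        free_left -= free
        total += (count - free) * unit_price
    return total / 100_000
```

For example, `estimate_cost(stories=100)` gives 0.02, matching the first bullet above (50 free + 50 paid at $0.00040 each).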
Tips & advanced options
- Comment depth — `maxCommentDepth=3` is a good default. 5+ is only useful for megathreads; 1 gives you only top-level replies.
- Domain filter — combine with `topstories` or `showstories` for a recruiter-style feed of GitHub / Arxiv / company-blog stories.
- Date ranges — pass `dateFromUnix` / `dateToUnix` (Unix seconds) to scope search mode. Works best with `sortSearchBy=date`.
- User karma snapshot — `includeUserProfiles=true` adds `byUserKarma` + `byUserCreated` to every row. Useful for weighting by poster reputation.
- Flatten comments for spreadsheets — set `flattenComments=true` to get a flat array with a `depth` field, which exports cleanly to CSV/Excel.
- Schedule it — use Apify's scheduler to run `topstories` every hour for a live HN feed, or `hiring_threads` monthly.
- Chain it — pipe the output into another actor (e.g., a text classifier, Slack poster) via Apify's integrations.
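Under the hood, date-range and score options in search mode translate to an Algolia query. A sketch of what that request looks like; the `numericFilters` / `tags` syntax is the public hn.algolia.com API, while the exact mapping from this actor's input fields is an assumption:

```python
# Sketch: building the Algolia search_by_date URL that date/score
# filters correspond to. created_at_i and points are real Algolia fields.
from typing import Optional
from urllib.parse import urlencode

def algolia_query(query: str, date_from: Optional[int] = None,
                  date_to: Optional[int] = None,
                  min_points: Optional[int] = None,
                  tags: str = "story") -> str:
    filters = []
    if date_from is not None:
        filters.append(f"created_at_i>{date_from}")
    if date_to is not None:
        filters.append(f"created_at_i<{date_to}")
    if min_points is not None:
        filters.append(f"points>{min_points}")
    params = {"query": query, "tags": tags}
    if filters:
        params["numericFilters"] = ",".join(filters)
    return "https://hn.algolia.com/api/v1/search_by_date?" + urlencode(params)
```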
FAQ, disclaimers & support
Does this need a proxy? No. Both HN APIs are public and unmetered.
How fresh is the data? Firebase list endpoints update roughly every 5 minutes (HN's own cadence). `user` and `hiring_threads` are live.
What counts as a "comment" for billing? Only comments that actually appear in your output dataset — i.e. inside the depth limit you set. Comments skipped because they're deleted or beyond `maxCommentDepth` are free.
Can I search for comments by a specific author? Yes — set `mode=search` with `searchTags: ["comment", "author_dang"]`.
Why are some `by` fields null? Deleted items have no author. They're filtered out by default.
Legality. Hacker News data is publicly accessible and both APIs are officially sanctioned by Y Combinator. This actor respects those APIs' rate limits and does not bypass any access control. You are responsible for compliant use of the data under Y Combinator's Terms of Service and any applicable privacy laws (GDPR, CCPA, etc.) when processing personal data such as usernames.
Found a bug or want a feature? Open an issue on the actor's Issues tab. Custom extensions (e.g., Slack / Discord forwarding, semantic dedupe, dashboards) are available on request.