Hacker News Scraper
Pricing
from $1.00 / 1,000 results
Scrape Hacker News stories, comments, jobs, and user profiles. Modes: top/new/best/ask/show/jobs/past/item/user/search. Filters: minScore, domainAllowlist/domainBlocklist, dateRangeFrom/dateRangeTo, commentMinScore. No proxy, no auth.
Rating
5.0
(13)
Developer
Crawler Bros
Scrape Hacker News stories, comments, jobs, and user profiles via the official Firebase + Algolia public APIs. No login, no cookies, no proxy. Modes for top / new / best / Ask HN / Show HN / Jobs / Past — plus direct item lookup, user profile fetch, and Algolia full-text search.
What this actor does
- Walks the public Hacker News feeds (top, new, best, ask, show, jobs)
- Fetches individual stories or comment threads by ID
- Pulls user profiles (karma, account age, submitted item count)
- Runs full-text search across HN history via the Algolia HN API
- Builds nested comment trees on demand (enableCommentHierarchy=true) or emits flat comment rows otherwise
Modes
| Mode | What it does |
|---|---|
| topStories | Front-page top stories |
| newStories | Newest submissions |
| bestStories | Best-of recent submissions |
| askStories | Ask HN posts |
| showStories | Show HN posts |
| jobStories | YC company job ads |
| item | Direct lookup of specific item IDs |
| user | Direct lookup of user profiles |
| search | Algolia full-text search across all HN history |
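
The feed modes in the table line up with the list endpoints of the official Hacker News Firebase API. The following is a minimal sketch of that mapping, not the actor's actual code: the helper names (`feed_url`, `item_url`, `user_url`) are hypothetical, and only the endpoint paths come from the public API.

```python
FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"

# Feed modes from the table, mapped to Firebase list endpoints.
# The mapping itself is an assumption about how the actor routes modes.
FEED_ENDPOINTS = {
    "topStories": "topstories",
    "newStories": "newstories",
    "bestStories": "beststories",
    "askStories": "askstories",
    "showStories": "showstories",
    "jobStories": "jobstories",
}


def feed_url(mode: str) -> str:
    """Return the Firebase URL that lists item IDs for a feed mode."""
    if mode not in FEED_ENDPOINTS:
        raise ValueError(f"{mode!r} is not a feed mode")
    return f"{FIREBASE_BASE}/{FEED_ENDPOINTS[mode]}.json"


def item_url(item_id: int) -> str:
    """URL for a single story, comment, or job item (mode=item)."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def user_url(username: str) -> str:
    """URL for a user profile (mode=user)."""
    return f"{FIREBASE_BASE}/user/{username}.json"
```

The search mode is not covered here because it goes through the Algolia API rather than Firebase.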
Output per story
- id, hnUrl, type, title
- url, domain (parsed host)
- score, numComments, author
- createdAt (ISO-8601 UTC), createdAtEpoch, ageHours
- text (Ask/Show/Job posts only; HN-flavored markdown converted to plain text)
- kids (list of top-level comment IDs)
- comments[] (nested replies; only when enableCommentHierarchy=true)
- dead / deleted (boolean; only when true)
- recordType: "story", scrapedAt
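
Several story fields are derived rather than copied from the API. As a rough sketch, assuming the raw input is a Firebase item (with its `by`, `time`, `descendants`, and `url` keys), the derivation could look like this. The function name `normalize_story` and the exact ISO-8601 formatting are assumptions, and the text conversion step is left out.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def normalize_story(raw: dict, now_epoch: int) -> dict:
    """Map a raw Firebase item to a subset of the story fields listed above."""
    created = raw["time"]  # Unix seconds, as returned by the Firebase API
    url = raw.get("url")
    record = {
        "id": raw["id"],
        "hnUrl": f"https://news.ycombinator.com/item?id={raw['id']}",
        "type": raw.get("type"),
        "title": raw.get("title"),
        "url": url,
        "domain": urlparse(url).hostname if url else None,
        "score": raw.get("score"),
        "numComments": raw.get("descendants"),
        "author": raw.get("by"),
        "createdAt": datetime.fromtimestamp(created, tz=timezone.utc).isoformat(),
        "createdAtEpoch": created,
        "ageHours": round((now_epoch - created) / 3600, 1),
        "kids": raw.get("kids"),
        "recordType": "story",
    }
    # Missing values are dropped so the record carries no nulls.
    return {key: value for key, value in record.items() if value is not None}
```

Passing `now_epoch` explicitly keeps `ageHours` reproducible; a real run would presumably use the scrape time.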
Output per comment
- id, hnUrl, storyId, parentId, depth
- author, text (HTML → plain text)
- createdAt, createdAtEpoch, ageHours
- kids (list of reply IDs), replies[] (nested; only when enableCommentHierarchy=true)
- recordType: "comment", scrapedAt
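
When comments are emitted as flat rows, the id and parentId fields are enough to rebuild the thread downstream. A minimal sketch, assuming each row carries those two fields as listed above (the function name `nest_comments` is hypothetical):

```python
def nest_comments(rows: list[dict], story_id: int) -> list[dict]:
    """Rebuild a reply tree from flat comment rows using parentId links."""
    # Copy each row and give it a replies list to fill in.
    by_id = {row["id"]: {**row, "replies": []} for row in rows}
    roots = []
    for node in by_id.values():
        parent = by_id.get(node["parentId"])
        if parent is not None:
            parent["replies"].append(node)
        elif node["parentId"] == story_id:
            roots.append(node)
        # Rows whose parent was never fetched (for example, cut off by a cap)
        # are left out of the tree.
    return roots
```

Unlike the actor's output, this sketch keeps empty `replies` lists on leaf comments; strip them afterwards if the no-empty-fields convention matters for your pipeline.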
Output per user
- id, profileUrl
- karma, createdAt, createdAtEpoch, about
- submittedCount, submitted (capped at 50 most-recent)
- recordType: "user", scrapedAt
Empty fields are omitted from the output (no nulls).
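Consumers that merge these records with their own data may want to apply the same convention. This is a sketch only: what the actor counts as "empty" is not documented here, so the rule below (None, empty string, empty list or dict) is an assumption, and `omit_empty` is a hypothetical helper.

```python
def omit_empty(record: dict) -> dict:
    """Drop keys whose value is None, an empty string, or an empty list/dict.

    Zero and False are kept, so a score of 0 survives.
    """
    return {
        key: value
        for key, value in record.items()
        if value is not None and value != "" and value != [] and value != {}
    }
```

Because fields can be absent, downstream code should read them with `record.get("url")` rather than `record["url"]`.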
Input
| Field | Type | Default | Description |
|---|---|---|---|
| mode | enum | topStories | topStories / newStories / bestStories / askStories / showStories / jobStories / item / user / search |
| itemIds | array | [] | When mode=item: numeric HN item IDs |
| usernames | array | [] | When mode=user: HN usernames |
| searchQuery | string | – | When mode=search: full-text query |
| startUrls | array | [] | HN URLs (item?id=N or user?id=USER), auto-routed to itemIds / usernames |
| enableCommentHierarchy | boolean | false | When true, attach a nested comments[] array to each story instead of emitting flat sibling rows |
| maxItems | int | 100 | Hard cap on emitted records (1–5000) |
| maxComments | int | 0 | Cap on comments fetched per story. 0 = skip comments. |
| maxDepth | int | 5 | Maximum reply depth when maxComments > 0. |
| minScore | int | – | Drop stories below this score |
| domainAllowlist | array | [] | Only emit stories whose URL host contains one of these substrings |
| domainBlocklist | array | [] | Drop stories whose URL host contains one of these substrings |
| excludeDeadOrDeleted | boolean | true | Drop items flagged dead/deleted by HN |
| dateRangeFrom | string | – | Drop items posted before this ISO date (UTC) |
| dateRangeTo | string | – | Drop items posted after this ISO date (UTC) |
| commentMinScore | int | – | Drop comments below this score |
| commentAuthorFilter | array | [] | Only emit comments by these usernames |
| minCommentCount | int | – | Drop stories with fewer than this many comments |
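
The startUrls routing described in the table can be approximated as follows. This is a sketch under assumptions: the function name `route_start_urls` is hypothetical, and the actor may accept URL shapes that this version skips.

```python
from urllib.parse import parse_qs, urlparse


def route_start_urls(urls: list[str]) -> tuple[list[int], list[str]]:
    """Split HN URLs into item IDs and usernames; ignore anything else."""
    item_ids, usernames = [], []
    for url in urls:
        parsed = urlparse(url)
        if parsed.hostname != "news.ycombinator.com":
            continue
        ids = parse_qs(parsed.query).get("id", [])
        if not ids:
            continue
        if parsed.path == "/item" and ids[0].isdigit():
            item_ids.append(int(ids[0]))
        elif parsed.path == "/user":
            usernames.append(ids[0])
    return item_ids, usernames
```

For instance, `https://news.ycombinator.com/item?id=8863` would land in the item list and `https://news.ycombinator.com/user?id=pg` in the username list.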
Example: top stories with filters
{"mode": "topStories","maxItems": 50,"minScore": 100,"domainBlocklist": ["twitter.com", "x.com"],"excludeDeadOrDeleted": true}
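The filters in this input can be read as a per-story predicate. The sketch below follows the input table's wording literally (host "contains" a substring) and is an assumption about behavior, not the actor's code; `keep_story` is a hypothetical name, and the date-range and comment-count filters are omitted for brevity.

```python
from urllib.parse import urlparse


def keep_story(story: dict, cfg: dict) -> bool:
    """Return True if a story passes the score, domain, and dead/deleted filters."""
    host = (urlparse(story.get("url") or "").hostname or "").lower()
    if cfg.get("excludeDeadOrDeleted", True) and (story.get("dead") or story.get("deleted")):
        return False
    if cfg.get("minScore") is not None and story.get("score", 0) < cfg["minScore"]:
        return False
    allowlist = cfg.get("domainAllowlist") or []
    if allowlist and not any(part in host for part in allowlist):
        return False
    if any(part in host for part in cfg.get("domainBlocklist") or []):
        return False
    return True
```

Under plain substring matching, a blocklist entry of "x.com" also matches hosts such as netflix.com, so it is worth checking blocklist entries against the hosts you expect to keep.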
Example: full-text search
{"mode": "search","searchQuery": "rust async runtime","maxItems": 100,"minScore": 50}
Example: story with comment tree
{"mode": "item","itemIds": ["12345678"],"maxComments": 200,"maxDepth": 3,"enableCommentHierarchy": true}
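How maxComments and maxDepth interact in a run like this can be illustrated with a bounded traversal. This sketch makes several assumptions: it walks breadth-first, treats top-level comments as depth 0 (so maxDepth=3 allows depths 0 through 2), and takes a caller-supplied `fetch_item` function in place of real API calls. None of this is confirmed to match the actor's internals.

```python
from collections import deque
from typing import Callable, Optional


def collect_comments(story: dict, fetch_item: Callable[[int], Optional[dict]],
                     max_comments: int, max_depth: int) -> list[dict]:
    """Walk a story's comment IDs breadth-first with count and depth caps."""
    collected: list[dict] = []
    queue = deque((kid, 0) for kid in story.get("kids", []))
    while queue and len(collected) < max_comments:
        item_id, depth = queue.popleft()
        item = fetch_item(item_id)
        if item is None:  # missing or unavailable item
            continue
        collected.append({**item, "depth": depth})
        if depth + 1 < max_depth:
            queue.extend((kid, depth + 1) for kid in item.get("kids", []))
    return collected
```

With `max_comments=0` the loop never runs, which matches the documented "0 = skip comments" behavior.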
Use cases
- Trend monitoring: track which domains hit the front page each week
- Comment intelligence: pull every comment for an Ask HN thread to study reactions
- YC job-ads digest: weekly extract of jobStories for the careers newsletter
- User research: fetch a user's submission history + karma stats for outreach
- Search-driven enrichment: feed Algolia search results into a downstream tagger
FAQ
Does it require a login or cookies? No. Both Firebase and Algolia HN APIs are fully public.
Is a proxy needed? No. The actor works from datacenter IPs without any proxy.
Why are some stories missing a url? Ask HN posts and some Show HN posts are self-text: they have a text field instead of a url, and the omit-empty contract drops the empty url field.
Why does commentMinScore not filter much? The public Hacker News APIs rarely expose per-comment scores. When the score field is missing, the comment is kept.
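The keep-when-missing rule can be written as a small predicate. A sketch, assuming the comment's score (when present) sits under a `score` key; that key name and the function name are assumptions.

```python
from typing import Optional


def passes_comment_min_score(comment: dict, comment_min_score: Optional[int] = None) -> bool:
    """Keep a comment unless it has a known score below the threshold."""
    score = comment.get("score")
    if comment_min_score is None or score is None:
        return True  # no filter configured, or no score to compare against
    return score >= comment_min_score
```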
What's the difference between maxItems and maxComments? maxItems caps the total emitted records (stories + comments + users combined). maxComments caps how many comments the actor fetches for each story; maxDepth separately limits how deep it follows replies.
How fresh is the data? Real-time. Both APIs serve the live HN database.