Hacker News Scraper

Pricing: from $1.00 / 1,000 results

Scrape Hacker News stories, comments, jobs, and user profiles. Modes: top/new/best/ask/show/jobs/past/item/user/search. Filters include minScore, domainAllowlist / domainBlocklist, dateRangeFrom / dateRangeTo, and commentMinScore. No proxy, no auth.

Rating: 5.0 (13)

Developer: Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 7
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

Scrape Hacker News stories, comments, jobs, and user profiles via the official Firebase + Algolia public APIs. No login, no cookies, no proxy. Modes for top / new / best / Ask HN / Show HN / Jobs / Past — plus direct item lookup, user profile fetch, and Algolia full-text search.

What this actor does

  • Walks the public Hacker News feeds (top, new, best, ask, show, jobs)
  • Fetches individual stories or comment threads by ID
  • Pulls user profiles (karma, account age, submitted item count)
  • Runs full-text search across HN history via the Algolia HN API
  • Builds nested comment trees on demand (enableCommentHierarchy=true) or emits flat comment rows otherwise

Modes

| Mode | What it does |
| --- | --- |
| topStories | Front-page top stories |
| newStories | Newest submissions |
| bestStories | Best-of recent submissions |
| askStories | Ask HN posts |
| showStories | Show HN posts |
| jobStories | YC company job ads |
| item | Direct lookup of specific item IDs |
| user | Direct lookup of user profiles |
| search | Algolia full-text search across all HN history |
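
As a rough illustration of how these modes could map onto the public APIs, the sketch below builds the starting URLs for each mode. The Firebase and Algolia base URLs are the public HN endpoints; the mapping table and the `endpoint_urls` helper are assumptions for illustration, not the actor's actual code.

```python
from urllib.parse import urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"

# Assumed mapping from actor modes to Firebase feed names.
FEED_ENDPOINTS = {
    "topStories": "topstories",
    "newStories": "newstories",
    "bestStories": "beststories",
    "askStories": "askstories",
    "showStories": "showstories",
    "jobStories": "jobstories",
}


def endpoint_urls(mode, item_ids=(), usernames=(), search_query=""):
    """Return the public API URLs a run in the given mode would start from."""
    if mode in FEED_ENDPOINTS:
        # Feed endpoints return a JSON list of item IDs.
        return [f"{FIREBASE_BASE}/{FEED_ENDPOINTS[mode]}.json"]
    if mode == "item":
        return [f"{FIREBASE_BASE}/item/{item_id}.json" for item_id in item_ids]
    if mode == "user":
        return [f"{FIREBASE_BASE}/user/{name}.json" for name in usernames]
    if mode == "search":
        return [f"{ALGOLIA_BASE}/search?" + urlencode({"query": search_query})]
    raise ValueError(f"unknown mode: {mode}")
```

For example, `endpoint_urls("item", item_ids=[8863])` yields a single Firebase item URL, while the feed modes yield one URL whose response lists the item IDs to fetch next.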

Output per story

  • id, hnUrl, type, title
  • url, domain (parsed host)
  • score, numComments, author
  • createdAt (ISO-8601 UTC), createdAtEpoch, ageHours
  • text (Ask/Show/Job posts only — HN-flavored markdown converted to plain text)
  • kids (list of top-level comment IDs)
  • comments[] (nested replies — only when enableCommentHierarchy=true)
  • dead / deleted (boolean — only when true)
  • recordType: "story", scrapedAt
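
Several of the story fields above are derived rather than returned by the API directly. The sketch below shows one way `hnUrl`, `domain`, `createdAt`, and `ageHours` could be computed from a raw Firebase item (which carries `id`, `time`, and optionally `url`). The `derive_story_fields` helper, the `Z` suffix, and the two-decimal rounding are assumptions, not the actor's documented behavior.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def derive_story_fields(raw, now_epoch):
    """Compute derived output fields from a raw Firebase item dict."""
    created = datetime.fromtimestamp(raw["time"], tz=timezone.utc)
    record = {
        "id": raw["id"],
        "hnUrl": f"https://news.ycombinator.com/item?id={raw['id']}",
        # ISO-8601 in UTC; the trailing "Z" form is an assumed formatting choice.
        "createdAt": created.isoformat().replace("+00:00", "Z"),
        "createdAtEpoch": raw["time"],
        "ageHours": round((now_epoch - raw["time"]) / 3600, 2),
    }
    if raw.get("url"):
        record["url"] = raw["url"]
        # Parsed host; whether the actor strips a leading "www." is unknown.
        record["domain"] = urlparse(raw["url"]).hostname
    return record
```

Passing `now_epoch` explicitly keeps the function deterministic; a caller would typically supply `time.time()`.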

Output per comment

  • id, hnUrl, storyId, parentId, depth
  • author, text (HTML → plain text)
  • createdAt, createdAtEpoch, ageHours
  • kids (list of reply IDs), replies[] (nested — only when enableCommentHierarchy=true)
  • recordType: "comment", scrapedAt
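
When comments are emitted as flat rows, the `id` and `parentId` fields are enough to rebuild the tree downstream. The following is a minimal sketch of that reconstruction; the `nest_comments` helper is hypothetical and only mirrors the `replies[]` shape described above.

```python
def nest_comments(flat_rows):
    """Attach each comment to its parent's replies list; return top-level comments."""
    by_id = {row["id"]: {**row, "replies": []} for row in flat_rows}
    roots = []
    for row in flat_rows:
        node = by_id[row["id"]]
        parent = by_id.get(row["parentId"])
        if parent is not None:
            parent["replies"].append(node)
        else:
            # Parent is the story itself, or a comment that was not fetched.
            roots.append(node)
    # Drop empty replies lists so leaves match an omit-empty output style.
    for node in by_id.values():
        if not node["replies"]:
            del node["replies"]
    return roots
```

The input rows are copied rather than mutated, so the flat dataset can still be used as-is after nesting.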

Output per user

  • id, profileUrl
  • karma, createdAt, createdAtEpoch, about
  • submittedCount, submitted (capped at 50 most-recent)
  • recordType: "user", scrapedAt

Empty fields are omitted from the output (no nulls).
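
A sketch of what such an omit-empty rule might look like is shown below. The `omit_empty` helper and its exact definition of "empty" (`None`, empty string, empty list or dict) are assumptions; note that this version keeps `0` and `False`, whereas the actor additionally drops `dead` / `deleted` unless they are true.

```python
def omit_empty(record):
    """Drop keys whose value is None, an empty string, or an empty list/dict."""
    return {
        key: value
        for key, value in record.items()
        if value is not None and value != "" and value != [] and value != {}
    }
```

Consumers should therefore read optional fields with a default, for example `row.get("url")`, rather than indexing them directly.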

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | enum | topStories | topStories / newStories / bestStories / askStories / showStories / jobStories / item / user / search |
| itemIds | array | [] | When mode=item: numeric HN item IDs |
| usernames | array | [] | When mode=user: HN usernames |
| searchQuery | string | | When mode=search: full-text query |
| startUrls | array | [] | HN URLs (item?id=N or user?id=USER), auto-routed to itemIds / usernames |
| enableCommentHierarchy | boolean | false | When true, attach a nested comments[] array to each story instead of emitting flat sibling rows |
| maxItems | int | 100 | Hard cap on emitted records (1–5000) |
| maxComments | int | 0 | Cap on comments fetched per story. 0 = skip comments. |
| maxDepth | int | 5 | Maximum reply depth when maxComments > 0 |
| minScore | int | | Drop stories below this score |
| domainAllowlist | array | [] | Only emit stories whose URL host contains one of these substrings |
| domainBlocklist | array | [] | Drop stories whose URL host contains one of these substrings |
| excludeDeadOrDeleted | bool | true | Drop items flagged dead/deleted by HN |
| dateRangeFrom | string | | Drop items posted before this ISO date (UTC) |
| dateRangeTo | string | | Drop items posted after this ISO date (UTC) |
| commentMinScore | int | | Drop comments below this score |
| commentAuthorFilter | array | [] | Only emit comments by these usernames |
| minCommentCount | int | | Drop stories with fewer than this many comments |
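
To make the story filters concrete, here is a minimal sketch of how `minScore`, `minCommentCount`, and the two domain lists might be applied to an output row. The `keep_story` helper is hypothetical, and treating a missing `score` or `numComments` as 0 is an assumption.

```python
from urllib.parse import urlparse


def keep_story(story, min_score=None, min_comment_count=None,
               domain_allowlist=(), domain_blocklist=()):
    """Return True if the story passes the score, comment-count, and domain filters."""
    if min_score is not None and story.get("score", 0) < min_score:
        return False
    if min_comment_count is not None and story.get("numComments", 0) < min_comment_count:
        return False
    # Self-text posts have no url, so their host is the empty string.
    host = urlparse(story.get("url", "")).hostname or ""
    if domain_allowlist and not any(part in host for part in domain_allowlist):
        return False
    if any(part in host for part in domain_blocklist):
        return False
    return True
```

Because the match is a substring test on the host, a short entry such as `"x.com"` would also match hosts like `netflix.com`; a more specific entry may be needed if that is not intended.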

Example: top stories with filters

{
  "mode": "topStories",
  "maxItems": 50,
  "minScore": 100,
  "domainBlocklist": ["twitter.com", "x.com"],
  "excludeDeadOrDeleted": true
}

Example: full-text search

{
  "mode": "search",
  "searchQuery": "rust async runtime",
  "maxItems": 100,
  "minScore": 50
}

Example: story with comment tree

{
  "mode": "item",
  "itemIds": ["12345678"],
  "maxComments": 200,
  "maxDepth": 3,
  "enableCommentHierarchy": true
}
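
The same item could instead be supplied through `startUrls`, which the input table describes as auto-routed to `itemIds` / `usernames`. The sketch below shows one plausible way to do that routing; the `route_start_urls` helper is hypothetical and, for brevity, does not verify that the host is news.ycombinator.com.

```python
from urllib.parse import parse_qs, urlparse


def route_start_urls(start_urls):
    """Split HN URLs into (item_ids, usernames) based on their path and id parameter."""
    item_ids, usernames = [], []
    for url in start_urls:
        parsed = urlparse(url)
        ids = parse_qs(parsed.query).get("id", [])
        if not ids:
            continue  # not an item or user URL; ignore it
        kind = parsed.path.strip("/")
        if kind == "item":
            item_ids.append(ids[0])
        elif kind == "user":
            usernames.append(ids[0])
    return item_ids, usernames
```

IDs are returned as strings here, matching the quoted ID in the example input above.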

Use cases

  • Trend monitoring — track which domains hit the front page each week
  • Comment intelligence — pull every comment for an Ask HN thread to study reactions
  • YC job-ads digest — weekly extract of jobStories for the careers newsletter
  • User research — fetch a user's submission history + karma stats for outreach
  • Search-driven enrichment — feed Algolia search results into a downstream tagger

FAQ

Does it require a login or cookies? No. Both Firebase and Algolia HN APIs are fully public.

Is a proxy needed? No. The actor works from datacenter IPs without any proxy.

Why are some stories missing a url? Ask HN / Show HN posts are self-text — they have a text field instead of a url. The omit-empty contract drops the url field on these.

Why does commentMinScore not filter much? Hacker News rarely exposes per-comment scores. When the field is missing the comment is kept.
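
The keep-when-missing behavior can be sketched as follows. The `keep_comment` helper and the `score` field name on comment rows are assumptions for illustration; only the rule itself (a comment with no score is kept) comes from the answer above.

```python
def keep_comment(comment, comment_min_score=None, comment_author_filter=()):
    """Return True if the comment passes the author and score filters."""
    if comment_author_filter and comment.get("author") not in comment_author_filter:
        return False
    score = comment.get("score")  # usually absent for comments
    if comment_min_score is not None and score is not None and score < comment_min_score:
        return False
    return True
```

With this rule, setting `commentMinScore` only removes comments that actually carry a score below the threshold, which is why the filter often has little visible effect.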

What's the difference between maxItems and maxComments? maxItems caps the total number of emitted records (stories, comments, and users combined). maxComments caps how many comments the actor fetches for each story; reply depth is limited separately by maxDepth.

How fresh is the data? Real-time. Both APIs serve the live HN database.