Hacker News Scraper: Stories, Comments, Users & Search

Pricing

$1.00 / 1,000 result items

Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.

Developer

Perconey

Maintained by Community

What does Hacker News Scraper do?

Hacker News Scraper pulls structured data from Hacker News through two public APIs: the official Firebase API (the canonical source, serving the same data as the HN site in near real time) and the Algolia full-text search API (which powers HN's own search box). No browser, no proxies, no cookies, and no anti-bot measures to work around. The Firebase API needs no authentication and currently documents no rate limit or quota: it is a Firebase Realtime Database endpoint that the YC team keeps open to everyone.
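To make the two data sources concrete, here is a minimal Python sketch of the public endpoints involved. The URL patterns follow the public HN Firebase API and the HN Algolia search API; the helper names are hypothetical, and how the actor builds its requests internally is not documented here.

```python
from urllib.parse import urlencode

FIREBASE_BASE = "https://hacker-news.firebaseio.com/v0"
ALGOLIA_BASE = "https://hn.algolia.com/api/v1"


def firebase_list_url(list_name: str) -> str:
    """URL of a story-id list such as 'topstories' or 'newstories'."""
    return f"{FIREBASE_BASE}/{list_name}.json"


def firebase_item_url(item_id: int) -> str:
    """URL of a single item (story, comment, job, or poll)."""
    return f"{FIREBASE_BASE}/item/{item_id}.json"


def firebase_user_url(username: str) -> str:
    """URL of a user profile, keyed by username."""
    return f"{FIREBASE_BASE}/user/{username}.json"


def algolia_search_url(query: str, tags: str = "", sort_by_date: bool = False) -> str:
    """URL of a full-text search; 'search_by_date' ranks newest first."""
    endpoint = "search_by_date" if sort_by_date else "search"
    params = {"query": query}
    if tags:
        params["tags"] = tags
    return f"{ALGOLIA_BASE}/{endpoint}?{urlencode(params)}"
```

Fetching any of these URLs with a plain HTTP client returns JSON, which is why no browser or proxy is needed.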

Try it instantly: pick getTopStories, leave queries empty, click Start. You get the current HN frontpage with author, score, comment count, URL, and posting time in under 10 seconds for $0.03.

Why use Hacker News Scraper?

  • Tech recruiters: Find high-karma engineers on niche topics. searchStories with tags: comment,author_dhh returns the comments a specific developer has written (up to Algolia's 1,000-hit cap per query). getUserProfile with includeSubmissions: true returns the 30 most recent stories/comments by a user, a real signal of expertise.
  • Startup researchers: Track "Show HN" launches with getShowStories. Watch what's gaining traction in getBestStories (high-scoring recent submissions). Build a daily digest with Apify Scheduler.
  • DevRel teams: Monitor stories mentioning your product. searchStories with your project name as the query + sortByDate: true gives a newest-first feed. Pipe it into Slack/Discord via Apify Integrations.
  • Content marketers: Reverse-engineer what HN upvotes. getTopStories over time + URL clustering reveals which domains/topics get traction.
  • AI/ML pipelines: Hacker News comments are one of the highest-quality public conversation datasets. getItemDetail with includeComments: true returns full threaded discussions in clean JSON.
  • Hiring managers: Use getJobStories to track Y Combinator portfolio hiring activity.

How to use Hacker News Scraper

  1. Open the Input tab.
  2. Pick an action from the dropdown. getTopStories is the simplest starting point.
  3. Story-list actions (Top/New/Best/Ask/Show/Jobs) need no queries - leave that field empty.
  4. For getItemDetail or getUserProfile, enter the item id (e.g. 48106024) or HN username (e.g. pg) in queries.
  5. For searchStories, type the search query and optionally set tags (e.g. story,ask_hn) and sortByDate.
  6. Set maxItems to cap the run. Default 30.
  7. Toggle includeComments for getItemDetail or includeSubmissions for getUserProfile if you want the deep dive.
  8. Click Start. Results stream to the dataset.

Input

| Field | Required | Description |
| --- | --- | --- |
| action | yes | Which API call to make. Nine options. |
| queries | sometimes | Required for getItemDetail / getUserProfile / searchStories. Empty for story-list actions. |
| maxItems | no | Max items per query. Default 30. For getItemDetail+includeComments this caps the BFS comment walk. |
| includeComments | no | For getItemDetail: walk the full comment tree (each comment counts as a result-item). |
| includeSubmissions | no | For getUserProfile: also fetch the user's most recent submissions. |
| sortByDate | no | For searchStories: use Algolia's newest-first ranking instead of relevance. |
| tags | no | For searchStories: Algolia tags filter (e.g. story, comment, ask_hn, show_hn, author_USERNAME). |
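The rules in this table can be checked before a run is started. The sketch below builds an input object in Python. The field names come from the table; everything else is an assumption for illustration: the helper name, the shape of queries (a list of strings), tags as a comma-separated string, and the action name getAskStories, which is inferred from the Top/New/Best/Ask/Show/Jobs list.

```python
# Action names: getAskStories is inferred, the others appear in this README.
STORY_LIST_ACTIONS = {
    "getTopStories", "getNewStories", "getBestStories",
    "getAskStories", "getShowStories", "getJobStories",
}
QUERY_ACTIONS = {"getItemDetail", "getUserProfile", "searchStories"}
OPTIONAL_FIELDS = {"includeComments", "includeSubmissions", "sortByDate", "tags"}


def build_run_input(action, queries=None, max_items=30, **options):
    """Return a run-input dict, rejecting combinations the table rules out."""
    if action not in STORY_LIST_ACTIONS | QUERY_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    queries = list(queries or [])  # assumed shape: list of strings
    if action in QUERY_ACTIONS and not queries:
        raise ValueError(f"{action} requires at least one query")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    run_input = {"action": action, "queries": queries, "maxItems": max_items}
    run_input.update(options)
    return run_input


search_input = build_run_input(
    "searchStories", ["apify"], sortByDate=True, tags="story"
)
```

The resulting dict can be pasted into the Input tab's JSON editor or passed as the run input when starting the actor through the Apify API.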

Output

Every dataset item carries _type (story / comment / job / poll / user / error) plus _action for filtering.

{
  "_type": "story",
  "_action": "getTopStories",
  "id": 48106024,
  "type": "story",
  "title": "Learning Software Architecture",
  "url": "https://example.com/article",
  "score": 189,
  "descendants": 47,
  "by": "surprisetalk",
  "time": "2026-05-12T11:23:11.000Z",
  "kids": [48106101, 48106220, 48106305],
  "hn_url": "https://news.ycombinator.com/item?id=48106024"
}

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab or the Apify API.
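Once the dataset is downloaded as JSON, the _type field makes mixed results easy to split. The following is a small Python sketch under the assumption that items look like the example above; the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def count_by_type(items):
    """Count dataset items per _type (story, comment, user, error, ...)."""
    return Counter(item.get("_type", "unknown") for item in items)


def top_stories(items, n=5):
    """Return the n highest-scoring story items."""
    stories = [item for item in items if item.get("_type") == "story"]
    return sorted(stories, key=lambda s: s.get("score", 0), reverse=True)[:n]


def posted_at(item):
    """Parse the ISO 8601 time field into a timezone-aware datetime."""
    return datetime.fromisoformat(item["time"].replace("Z", "+00:00"))
```

In practice, items would be the list loaded with json.load from the exported file; error items can be filtered out the same way by checking for _type == "error".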

Data fields

| Type | Key fields |
| --- | --- |
| story / job / poll | id, type, title, url, text, score, descendants (comment count), by (author), time, kids (comment ids), hn_url |
| comment | id, type, text, by, time, parent, kids, hn_url, score |
| user | id (username), karma, about (HTML bio), created, submitted_count, submitted_ids, hn_url |
| error | _action, _query, error, status |
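Because comments arrive as flat dataset items, the parent field is what lets you rebuild the thread. Below is a minimal Python sketch that nests comments under a story id; it assumes every comment item carries id and parent as listed above, and the output shape (id, by, replies) is chosen only for illustration.

```python
def build_thread(items, root_id):
    """Nest flat comment items into a reply tree under root_id."""
    children = {}
    for item in items:
        if item.get("_type") == "comment":
            children.setdefault(item["parent"], []).append(item)

    def attach(node_id):
        # Recursion depth equals thread depth, which is small in practice.
        return [
            {"id": c["id"], "by": c.get("by"), "replies": attach(c["id"])}
            for c in children.get(node_id, [])
        ]

    return attach(root_id)
```

Comments whose parent is missing from the dataset (for example, because the run was capped by maxItems) are simply left out of the tree.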

Pricing

Pay-per-result: $0.001 per item. Each story = one event. Each comment in a tree = one event. Each user profile = one event. No flat monthly fee.

Cost examples:

  • Daily HN frontpage (30 stories): $0.03
  • Full comment tree of a 200-comment story: $0.20
  • 500 stories matching a search query: $0.50
  • 100 user profiles + 30 submissions each (100 + 3,000 = 3,100 items total): $3.10
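The arithmetic behind these examples is a single multiplication. Here is a small Python sketch using Decimal so that fractions of a cent do not pick up floating-point noise; the function name is hypothetical and the price is the $0.001 per item stated above.

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.001")  # USD per result item, from the pricing above


def estimate_cost(item_count: int) -> Decimal:
    """Estimated cost in USD for a run that returns item_count items."""
    if item_count < 0:
        raise ValueError("item_count must be non-negative")
    return PRICE_PER_ITEM * item_count


frontpage = estimate_cost(30)          # one frontpage run
monthly_digest = estimate_cost(30 * 30)  # the same run once a day for 30 days
```

A daily frontpage digest therefore comes to 900 items over a 30-day month.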

Tips

  • The Firebase API currently documents no rate limit, so parallel requests are not throttled. We still batch 5 concurrent fetches per page out of politeness.
  • Algolia returns up to 50 hits per page and at most 1,000 hits per query. For exhaustive search over a high-volume term, segment by date with numericFilters=created_at_i>... (advanced) or use sortByDate and paginate by date.
  • Use tags: author_USERNAME in searchStories to get a specific user's posts or comments, subject to the 1,000-hit cap. Compare with getUserProfile + includeSubmissions: true, which reaches the same items through the Firebase API instead of the search index.
  • HN item ids are global and immutable. An id is unique across stories, comments, jobs, and polls; users are keyed by username instead. The _type field tells you what shape the item is.
  • For real-time monitoring, set up an Apify Schedule running getNewStories every 5 minutes. Combine with an Apify Integration (Slack, Discord, webhook) to get an instant feed.

FAQ, disclaimers, support

Is this legal? The actor calls Hacker News's public APIs: the official Firebase API (hacker-news.firebaseio.com), published by Y Combinator, and the HN search API (hn.algolia.com), operated by Algolia. Both are offered for public use. No scraping of the HTML, no auth bypass, no rate-limit games. We send a clear User-Agent identifying the actor.

What about deleted/flagged content? The Firebase API surfaces deleted: true and dead: true flags. The comment-tree walker skips both to keep your dataset clean, but a getItemDetail call on a single item still returns it with the flags, so you can audit deletions if needed.
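The behavior described here, together with the maxItems cap on the comment walk, can be illustrated with a short breadth-first traversal in Python. This is a sketch under stated assumptions, not the actor's code: fetch_item is a hypothetical callable that returns the item dict for an id (or None), the cap counts comments only, and replies under a skipped comment are not visited.

```python
from collections import deque


def walk_comments(root, fetch_item, max_items=30):
    """Breadth-first walk of a story's comments, skipping deleted/dead ones."""
    collected = []
    queue = deque(root.get("kids", []))
    while queue and len(collected) < max_items:
        item = fetch_item(queue.popleft())
        if item is None or item.get("deleted") or item.get("dead"):
            continue  # assumption: replies under a skipped comment are dropped too
        collected.append(item)
        queue.extend(item.get("kids", []))
    return collected
```

Breadth-first order means a low maxItems yields the top-level comments first rather than one deep reply chain.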

Does it return downvotes? No. HN doesn't expose downvote counts publicly. Only net score is available.

Why no realtime stream? The Firebase API supports realtime subscriptions, but Apify Actors are batch workers. For a near-realtime feed, run this actor on an Apify Schedule with a tight interval (every 1-5 minutes).

Bug or feature request? Open an Issue on the actor's Issues tab. I usually respond within a day.

Need a scraper for Reddit, Lobsters, Lemmy, Tildes? See my other actors at https://apify.com/perconey, or open an Issue.