Hacker News Scraper: Stories, Comments, Users & Search
Pricing
$1.00 / 1,000 result items
Scrape Hacker News via the official Firebase API + Algolia search. Top/new/best/ask/show/jobs stories, full comment trees, user profiles with karma, free-text search. No browser, no proxies, no auth. Pay only per result item.
Developer
Perconey
Maintained by Community
What does Hacker News Scraper do?
Hacker News Scraper pulls structured data from Hacker News through two public APIs: the official Firebase API (the canonical data source, with no rate limit, no quota, and no auth) and the Algolia full-text search API, which powers HN's own search box. No browser, no proxies, no cookies, no anti-bot fight. The Firebase API is a Google Firebase Realtime Database mirror that the YC team keeps open for everyone.
Try it instantly: pick getTopStories, leave queries empty, click Start. You get the current HN frontpage with author, score, comment count, URL, and posting time in under 10 seconds for $0.03.
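For orientation, here is roughly what a story-list action amounts to at the API level. This is a minimal sketch, not the actor's actual code: it assumes the publicly documented `https://hacker-news.firebaseio.com/v0` endpoints, and the helper names `fetch_json` and `top_stories` are made up for illustration.

```python
import json
from urllib.request import urlopen

# Base URL of the public Hacker News Firebase API.
HN_API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(url):
    """Fetch one URL and decode its JSON body (no auth or cookies needed)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def top_stories(limit=30, fetch=fetch_json):
    """Return up to `limit` front-page items, in front-page order.

    `fetch` is injectable so the function can be exercised without network.
    """
    ids = fetch(f"{HN_API}/topstories.json")[:limit]
    items = (fetch(f"{HN_API}/item/{item_id}.json") for item_id in ids)
    # The API answers `null` for ids it cannot resolve; drop those.
    return [item for item in items if item is not None]
```

Calling `top_stories(limit=5)` performs one request for the id list and one per story. The raw API reports `time` as a Unix timestamp, whereas the actor's output shows ISO 8601 strings.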
Why use Hacker News Scraper?
- Tech recruiters: Find high-karma engineers on niche topics. `searchStories` with `tags: comment,author_dhh` returns every comment a specific developer has written. `getUserProfile` with `includeSubmissions: true` returns the 30 most recent stories/comments by a user - a real signal of expertise.
- Startup researchers: Track "Show HN" launches with `getShowStories`. Watch what's gaining traction in `getBestStories` (high-scoring recent submissions). Build a daily digest with Apify Scheduler.
- DevRel teams: Monitor stories mentioning your product. `searchStories` with your project name as the query plus `sortByDate: true` gives a newest-first chronological feed. Pipe into Slack/Discord via Apify Integrations.
- Content marketers: Reverse-engineer what HN upvotes. `getTopStories` over time plus URL clustering reveals which domains/topics get traction.
- AI/ML pipelines: Hacker News comments are one of the highest-quality public conversation datasets. `getItemDetail` with `includeComments: true` returns full threaded discussions in clean JSON.
- Hiring managers: Use `getJobStories` to track Y Combinator portfolio hiring activity.
How to use Hacker News Scraper
- Open the Input tab.
- Pick an action from the dropdown. `getTopStories` is the simplest starting point.
- Story-list actions (Top/New/Best/Ask/Show/Jobs) need no queries - leave that field empty.
- For `getItemDetail` or `getUserProfile`, enter the item id (e.g. `48106024`) or HN username (e.g. `pg`) in queries.
- For `searchStories`, type the search query and optionally set `tags` (e.g. `story`, `ask_hn`) and `sortByDate`.
- Set maxItems to cap the run. Default 30.
- Toggle `includeComments` for getItemDetail or `includeSubmissions` for getUserProfile if you want the deep dive.
- Click Start. Results stream to the dataset.
Input
| Field | Required | Description |
|---|---|---|
| action | yes | Which API call to make. Nine options. |
| queries | sometimes | Required for getItemDetail / getUserProfile / searchStories. Empty for story-list actions. |
| maxItems | no | Max items per query. Default 30. For getItemDetail+includeComments this caps the BFS comment walk. |
| includeComments | no | For getItemDetail: walk the full comment tree (each comment counts as a result-item). |
| includeSubmissions | no | For getUserProfile: also fetch the user's most recent submissions. |
| sortByDate | no | For searchStories: use Algolia's newest-first ranking instead of relevance. |
| tags | no | For searchStories: Algolia tags filter (e.g. story, comment, ask_hn, show_hn, author_USERNAME). |
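The rules in the table can be checked before a run is started. The sketch below is one possible way to do that and rests on assumptions: the helper `build_input` is hypothetical, `queries` is assumed to be a list of strings, and the action name `getAskStories` is inferred from the naming pattern of the other story-list actions rather than stated in this README.

```python
# Action names; "getAskStories" is an assumed name (see lead-in).
STORY_LIST_ACTIONS = {
    "getTopStories", "getNewStories", "getBestStories",
    "getAskStories", "getShowStories", "getJobStories",
}
QUERY_ACTIONS = {"getItemDetail", "getUserProfile", "searchStories"}
OPTIONAL_FIELDS = {"includeComments", "includeSubmissions", "sortByDate", "tags"}


def build_input(action, queries=(), max_items=30, **options):
    """Build an input object for the actor, rejecting obvious mistakes."""
    if action not in STORY_LIST_ACTIONS | QUERY_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if isinstance(queries, str):
        queries = [queries]  # accept a single query given as a plain string
    queries = [str(q) for q in queries]
    if action in QUERY_ACTIONS and not queries:
        raise ValueError(f"{action} needs at least one query")
    if action in STORY_LIST_ACTIONS and queries:
        raise ValueError(f"{action} takes no queries")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {"action": action, "queries": queries, "maxItems": max_items, **options}
```

For example, `build_input("getItemDetail", [48106024], includeComments=True)` produces an input that requests one item together with its comment tree.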
Output
Every dataset item carries _type (story / comment / job / poll / user / error) plus _action for filtering.

    {
        "_type": "story",
        "_action": "getTopStories",
        "id": 48106024,
        "type": "story",
        "title": "Learning Software Architecture",
        "url": "https://example.com/article",
        "score": 189,
        "descendants": 47,
        "by": "surprisetalk",
        "time": "2026-05-12T11:23:11.000Z",
        "kids": [48106101, 48106220, 48106305],
        "hn_url": "https://news.ycombinator.com/item?id=48106024"
    }

You can download the dataset in JSON, CSV, XML, Excel, RSS or HTML format from the Output tab or the Apify API.
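Because every item carries `_type`, a downloaded dataset can be split into per-type lists in a few lines. This is a sketch under the assumption that the dataset was exported as JSON and loaded into a list of dicts; `split_by_type` and `parse_time` are illustrative names, not part of the actor.

```python
from collections import defaultdict
from datetime import datetime, timezone


def split_by_type(items):
    """Group dataset items by their `_type` field (story, comment, user, ...)."""
    buckets = defaultdict(list)
    for item in items:
        buckets[item.get("_type", "unknown")].append(item)
    return dict(buckets)


def parse_time(value):
    """Parse an ISO 8601 `time` string such as "2026-05-12T11:23:11.000Z"."""
    # Older Python versions do not accept a trailing "Z", so normalise it.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)
```

Error items land in their own `error` bucket, which keeps failed queries out of downstream analysis.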
Data fields
| Type | Key fields |
|---|---|
| story / job / poll | id, type, title, url, text, score, descendants (comment count), by (author), time, kids (comment ids), hn_url |
| comment | id, type, text, by, time, parent, kids, hn_url, score |
| user | id (username), karma, about (HTML bio), created, submitted_count, submitted_ids, hn_url |
| error | _action, _query, error, status |
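Comment items arrive as a flat list, but the `parent` field is enough to rebuild the thread. The following is a minimal sketch, assuming the items come from a `getItemDetail` run with `includeComments` enabled; `build_thread` and the added `replies` key are hypothetical names chosen for this example.

```python
from collections import defaultdict


def build_thread(items, root_id):
    """Nest flat comment items under `root_id` using their `parent` field."""
    children = defaultdict(list)
    for item in items:
        if item.get("_type") == "comment":
            children[item["parent"]].append(item)

    def attach(node_id):
        # Keep the order in which comments appear in the dataset.
        return [
            {**comment, "replies": attach(comment["id"])}
            for comment in children.get(node_id, [])
        ]

    return attach(root_id)
```

The recursion depth equals the depth of the deepest reply chain, which is fine for typical threads; an iterative version would be safer for pathological nesting.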
Pricing
Pay-per-result: $0.001 per item. Each story = one event. Each comment in a tree = one event. Each user profile = one event. No flat monthly fee.
Cost examples:
- Daily HN frontpage (30 stories): $0.03
- Full comment tree of a 200-comment story: $0.20
- 500 stories matching a search query: $0.50
- 100 user profiles + 30 submissions each (3,100 items total): $3.10
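The arithmetic behind these figures is just the item count times the per-item price. A small sketch, assuming the $0.001 price stated above and that every profile and every fetched submission counts as one item; the function names are illustrative.

```python
from decimal import Decimal

# Price per result item, as stated in the Pricing section.
PRICE_PER_ITEM = Decimal("0.001")


def estimate_cost(item_count):
    """Cost in USD for a run that produces `item_count` result items."""
    return PRICE_PER_ITEM * item_count


def profile_run_items(profiles, submissions_each):
    """Items for a profile run: one per profile plus one per fetched submission."""
    return profiles * (1 + submissions_each)


def format_usd(amount):
    """Render a Decimal amount as dollars with two decimal places."""
    return f"${amount:.2f}"
```

For example, `format_usd(estimate_cost(30))` gives `$0.03`, which matches the frontpage example above. `Decimal` is used so that small per-item prices do not pick up floating-point noise.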
Tips
- Firebase API has NO rate limit. Run as many parallel requests as you want. We batch 5 concurrent fetches per page out of politeness.
- Algolia returns up to 50 hits per page, max 1000 hits per query. For exhaustive search over a high-volume term, segment by date with `numericFilters=created_at_i>...` (advanced) or use `sortByDate` and paginate by date.
- Use `tags: author_USERNAME` in searchStories to get every post or comment by a specific user. Compare with `getUserProfile` + `includeSubmissions: true` - same data, different angle.
- HN item ids are global and immutable. An id is unique across stories, comments, jobs, and polls; users are identified by username instead. The `_type` field tells you what shape the item is.
- For real-time monitoring, set up an Apify Schedule running `getNewStories` every 5 minutes. Combine with an Apify Integration (Slack, Discord, webhook) to get an instant feed.
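The date-segmentation tip can be illustrated directly against the Algolia HN Search API. This is a hedged sketch: it assumes the public `search_by_date` endpoint with its `tags`, `numericFilters`, `hitsPerPage` and `page` parameters, and the helpers `date_windows` and `search_url` are hypothetical. Whether the actor exposes these parameters directly is not covered here.

```python
from urllib.parse import urlencode

# Public Algolia endpoint that sorts hits newest-first.
ALGOLIA_API = "https://hn.algolia.com/api/v1/search_by_date"


def date_windows(start_ts, end_ts, step_seconds):
    """Split [start_ts, end_ts) into consecutive half-open windows."""
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    windows = []
    lo = start_ts
    while lo < end_ts:
        hi = min(lo + step_seconds, end_ts)
        windows.append((lo, hi))
        lo = hi
    return windows


def search_url(query, lo, hi, tags="story", hits_per_page=50, page=0):
    """Build a search URL restricted to items created in [lo, hi)."""
    params = {
        "query": query,
        "tags": tags,
        "numericFilters": f"created_at_i>={lo},created_at_i<{hi}",
        "hitsPerPage": hits_per_page,
        "page": page,
    }
    return f"{ALGOLIA_API}?{urlencode(params)}"
```

Choosing a window small enough that each one stays under the 1000-hit cap lets the windows be paged independently and then concatenated.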
FAQ, disclaimers, support
Is this legal? The actor calls Hacker News's public APIs: the official Firebase API (hacker-news.firebaseio.com), published by Y Combinator, and the search API at hn.algolia.com, operated by Algolia. No scraping of the HTML, no auth bypass, no rate-limit games. We send a clear User-Agent identifying the actor.
What about deleted/flagged content? The Firebase API surfaces deleted: true and dead: true flags. We skip both in the comment-tree walker to keep your dataset clean, but solo getItemDetail will still return them with the flags so you can audit deletions if needed.
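The skipping behaviour described above can be pictured as a breadth-first walk with a cap. The sketch below is an illustration only, not the actor's implementation: `walk_comments` and `fetch_item` are hypothetical names, and it assumes that replies underneath a skipped comment are not visited either.

```python
from collections import deque


def walk_comments(root, fetch_item, max_items=30):
    """Breadth-first walk over the comment tree below `root`.

    `fetch_item` maps an item id to its item dict (or None if missing).
    Comments flagged `deleted` or `dead` are skipped.
    """
    found = []
    queue = deque(root.get("kids", []))
    while queue and len(found) < max_items:
        item = fetch_item(queue.popleft())
        if not item or item.get("deleted") or item.get("dead"):
            continue  # assumption: replies of a skipped comment are dropped too
        found.append(item)
        queue.extend(item.get("kids", []))
    return found
```

Breadth-first order means a low `max_items` yields the top-level comments first rather than one deep reply chain.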
Does it return downvotes? No. HN doesn't expose downvote counts publicly. Only net score is available.
Why no realtime stream? The Firebase API supports realtime subscriptions, but Apify Actors are batch workers. For a near-realtime feed, run this actor on an Apify Schedule with a tight interval (every 1-5 minutes).
Bug or feature request? Open an Issue on the actor's Issues tab. Usually responded to within a day.
Need a scraper for Reddit, Lobsters, Lemmy, Tildes? See my other actors at https://apify.com/perconey, or open an Issue.