Pricing

from $0.005 / story scraped

HackerNews Insights Scraper — Stories, Comments & Velocity

Hacker News stories, full comment trees, user karma and contact info, story velocity tracking, history deltas. Search all 3.7M stories with filters for points, karma, domain, dates, keywords. For VCs hunting Show HN, recruiters mining talent, journalists tracking tech, and AI/RAG pipelines.

Developer

Yuliia Kulakova

Last modified

2 months ago

HackerNews Insights Scraper

Stories, comments, user karma, velocity tracking, and contact intelligence — turn Hacker News into a structured intelligence feed.

Why this scraper

Hacker News is the single highest-signal community in tech: where launches break, where engineers vent, where investors hunt. But the site itself gives you a ranked list and a thread view — no filters, no trends, no exports, no way to track a story's momentum or pull a list of senior commenters by domain expertise.

This scraper turns Hacker News into a structured data feed you can pipe straight into your CRM, dashboard, LLM, or spreadsheet. Search the entire 3.7M-story archive, filter by score / karma / domain / keywords, pull full comment trees with depth and reply analytics, enrich author profiles with contact info, and track how stories grow between runs.

What you get

Stories with rich context

Top, Best, New, Ask HN, Show HN, and Job listings
Title, URL, domain, body text, author, score, comment count, submission time
Auto-tagged: story, show_hn, ask_hn, job
Permalinks straight back to news.ycombinator.com
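
The auto-tagging above can be pictured as a check on the title prefix. The sketch below is only an illustration under that assumption, not the actor's actual code: `classify_tag` and its parameters are hypothetical names, and the real classifier also consults HN's own tags.

```python
def classify_tag(title: str, item_type: str = "story") -> str:
    """Return one of: story, show_hn, ask_hn, job (hypothetical helper)."""
    # Job listings are identified by item type, not by title.
    if item_type == "job":
        return "job"
    normalized = title.strip().lower()
    if normalized.startswith("show hn"):
        return "show_hn"
    if normalized.startswith("ask hn"):
        return "ask_hn"
    return "story"


print(classify_tag("Show HN: A tiny static site generator"))
print(classify_tag("Ask HN: How do you manage on-call?"))
```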

Full-text search across all 3.7M Hacker News stories

Search by keyword across the entire HN archive (2007 → today)
Filter by tag, date range, score threshold, and author karma
Sort by popularity or newest

Complete comment trees with analytics

Recursive thread fetch with configurable depth (1–10 tiers)
Hard cap per story so viral threads don't explode your budget
Per-comment: author, body text, parent, depth, reply count, timestamps
Per-story analytics: max depth, average replies per node, top 5 commenters by reply count
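
As a rough picture of how a depth limit and a per-story cap interact, here is a minimal breadth-first sketch. It assumes the thread is already available as an in-memory mapping; `walk_comments`, `tree_stats`, and the `children_of` structure are hypothetical, and ranking commenters by replies received is one possible reading of "by reply count".

```python
from collections import Counter, deque


def walk_comments(children_of, root_id, max_depth=3, max_comments=200):
    """Breadth-first walk of a thread, capped by depth and total count.

    children_of: dict mapping an item ID to a list of (comment_id, author).
    """
    collected = []
    queue = deque((cid, author, root_id, 1)
                  for cid, author in children_of.get(root_id, []))
    while queue and len(collected) < max_comments:
        cid, author, parent, depth = queue.popleft()
        replies = children_of.get(cid, [])
        collected.append({"id": cid, "parent": parent, "by": author,
                          "depth": depth, "replyCount": len(replies)})
        # Only descend while we are above the depth limit.
        if depth < max_depth:
            queue.extend((rid, rauthor, cid, depth + 1)
                         for rid, rauthor in replies)
    return collected


def tree_stats(comments, top_n=5):
    """Shape stats: max depth, average replies per node, top commenters."""
    if not comments:
        return {"maxDepth": 0, "avgReplies": 0.0, "topCommenters": []}
    replies_received = Counter()
    for c in comments:
        replies_received[c["by"]] += c["replyCount"]
    return {
        "maxDepth": max(c["depth"] for c in comments),
        "avgReplies": sum(c["replyCount"] for c in comments) / len(comments),
        "topCommenters": [n for n, _ in replies_received.most_common(top_n)],
    }


thread = {1: [(2, "a"), (3, "b")], 2: [(4, "c")], 4: [(5, "a")]}
print(tree_stats(walk_comments(thread, 1, max_depth=2)))
```

Because the walk is breadth-first, hitting the cap drops the deepest replies first rather than cutting off whole top-level branches.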

User intelligence

Karma, account age, total submission count
Recent submission IDs (last 50)
Dominant domains the user posts about (signal for "what they're into")
Average score on their last 20 stories
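
The two derived signals in this list (dominant domains and average recent score) are simple aggregates. The sketch below shows one way they might be computed from a user's story list; `summarize_activity` is a hypothetical helper, and stripping a leading `www.` is a simplification of real apex-domain extraction.

```python
from collections import Counter
from urllib.parse import urlparse


def summarize_activity(stories, top_domains=3, score_window=20):
    """stories: newest-first list of dicts with optional 'url' and 'score'."""
    domains = Counter()
    for story in stories:
        url = story.get("url")
        if not url:
            continue  # self-posts (Ask HN etc.) have no external domain
        host = urlparse(url).hostname or ""
        if host:
            domains[host.removeprefix("www.")] += 1
    recent = stories[:score_window]
    average = sum(s.get("score", 0) for s in recent) / len(recent) if recent else 0.0
    return {
        "dominantDomains": [d for d, _ in domains.most_common(top_domains)],
        "averageScore": average,
    }


sample = [
    {"url": "https://github.com/a/b", "score": 10},
    {"url": "https://www.github.com/c", "score": 30},
    {"url": None, "score": 20},
    {"url": "https://arxiv.org/abs/1", "score": 0},
]
print(summarize_activity(sample))
```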

Contact extraction from user bios

Emails, X/Twitter handles, GitHub, LinkedIn, Mastodon
Personal websites (auto-classified separately from social profiles)
Smart parsing that ignores false positives (an email like thomas@fly.io won't produce a fake @fly Twitter handle)
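
To make the false-positive rule concrete, here is a minimal regex sketch of strict handle parsing. It is an assumption about how such a parser could work, not the actor's implementation; the patterns and the `extract_contacts` helper are illustrative only.

```python
import re

# '@' only starts a handle when it is NOT glued to a preceding word character
# (that would be the domain part of an email) and is not followed by another
# '@' or a dotted suffix (Mastodon-style or domain-like text).
MENTION = re.compile(r"(?<![\w.@])@([A-Za-z0-9_]{1,15})\b(?!@|\.[A-Za-z])")
PROFILE_URL = re.compile(
    r"(?<![A-Za-z0-9-])(?:twitter|x)\.com/([A-Za-z0-9_]{1,15})\b", re.IGNORECASE
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_contacts(bio: str) -> dict:
    """Pull emails and Twitter/X handles out of free-form bio text."""
    handles = {h.lower() for h in MENTION.findall(bio)}
    handles |= {h.lower() for h in PROFILE_URL.findall(bio)}
    return {
        "emails": sorted(set(EMAIL.findall(bio))),
        "twitter": sorted(handles),
    }


print(extract_contacts("thomas@fly.io"))
print(extract_contacts("Reach me @username or at thomas@sockpuppet.org"))
```

The negative lookbehind is what keeps `thomas@fly.io` from yielding a `fly` handle: the `@` directly follows a word character, so it is treated as part of an email address.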

Story velocity tracking

Points per hour and comments per hour since submission
Points-to-comments ratio (viral vs. controversial signal)
Story age in hours
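
These metrics are plain ratios over the story's age. A minimal sketch, assuming timezone-aware timestamps and using the field names from the output schema (`velocity` itself is a hypothetical helper):

```python
from datetime import datetime, timezone


def velocity(score, comments, created_at, now):
    """Compute age and per-hour rates for one story."""
    age_hours = (now - created_at).total_seconds() / 3600
    if age_hours <= 0:
        raise ValueError("now must be later than created_at")
    return {
        "ageHours": age_hours,
        "pointsPerHour": score / age_hours,
        "commentsPerHour": comments / age_hours,
        # Undefined when there are no comments yet.
        "pointsToCommentsRatio": score / comments if comments else None,
    }


submitted = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
checked = datetime(2025, 1, 1, 4, 0, tzinfo=timezone.utc)
print(velocity(200, 50, submitted, checked))
```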

History delta — track stories across runs

Persistent snapshot store keyed by story ID
On every subsequent run, each story includes: scoreDelta, commentsDelta, scorePerHour, commentsPerHour, trend (up/down/flat)
See exactly how a Show HN gained traction overnight or how a controversial post peaked and stalled
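
The snapshot-and-compare idea can be sketched with a plain dictionary standing in for the persistent store. This is an illustration under stated assumptions: `record_snapshot` is hypothetical, time is passed in as hours for simplicity, and deriving `trend` from the sign of the score delta is a guess at the rule.

```python
def record_snapshot(store, story, now_hours):
    """Save the current snapshot; return the delta against the previous one.

    store: dict keyed by story ID (stand-in for a persistent store).
    Returns None the first time a story is seen.
    """
    previous = store.get(story["id"])
    store[story["id"]] = {"score": story["score"],
                          "descendants": story["descendants"],
                          "at": now_hours}
    if previous is None:
        return None
    elapsed = now_hours - previous["at"]
    score_delta = story["score"] - previous["score"]
    comments_delta = story["descendants"] - previous["descendants"]
    # Assumption: trend follows the sign of the score change.
    trend = "up" if score_delta > 0 else "down" if score_delta < 0 else "flat"
    return {
        "scoreDelta": score_delta,
        "commentsDelta": comments_delta,
        "scorePerHour": score_delta / elapsed if elapsed > 0 else 0.0,
        "commentsPerHour": comments_delta / elapsed if elapsed > 0 else 0.0,
        "trend": trend,
    }


store = {}
record_snapshot(store, {"id": 1, "score": 120, "descendants": 40}, now_hours=0)
print(record_snapshot(store, {"id": 1, "score": 180, "descendants": 55}, now_hours=12))
```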

Input-time filters (the headline differentiator)

Minimum score, minimum comments, minimum author karma
Date range (from / to)
Keyword include list (case-insensitive title + body match)
Domain whitelist (only stories pointing to github.com, arxiv.org, etc.)
Filters apply before expensive fetches — you only pay for records that pass
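
Conceptually, the filters above act as a single predicate applied to each candidate story before any comment or profile fetch. The sketch below is a hypothetical illustration (`passes_filters` is not part of the actor, and treating the keyword list as "match any" is an assumption):

```python
def passes_filters(story, min_points=0, min_comments=0, keywords=(), domains=()):
    """Return True when a story record survives every active filter."""
    if story.get("score", 0) < min_points:
        return False
    if story.get("descendants", 0) < min_comments:
        return False
    # With a domain whitelist, self-posts (no URL, no domain) never match.
    if domains and story.get("domain") not in domains:
        return False
    if keywords:
        haystack = f"{story.get('title', '')} {story.get('text') or ''}".lower()
        if not any(k.lower() in haystack for k in keywords):
            return False
    return True


candidates = [
    {"title": "Rust async explained", "score": 240, "descendants": 80,
     "domain": "github.com"},
    {"title": "Ask HN: Rust or Go?", "score": 300, "descendants": 150,
     "domain": None, "text": "Which one for a new service?"},
]
kept = [s for s in candidates
        if passes_filters(s, min_points=100, keywords=["rust"], domains=["github.com"])]
print([s["title"] for s in kept])
```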

Use cases

Who	What they pull
VCs & angel investors	Show HN deal flow — every product launch with > 100 points + maker contact + velocity since launch
Recruiters	High-karma authors who post about specific domains (Rust, ML, infrastructure) — with surfaced contact info
Tech journalists	Trending stories from arxiv.org, github.com, or competitor domains; sentiment via comment trees
PR & comms teams	Track when your company / product gets mentioned; full comment thread for response strategy
AI / RAG engineers	High-signal, opinion-rich training and retrieval data — full comments, not just titles
Startup founders	Competitor monitoring; see what users are saying about adjacent products in threads
Product managers	Pull all "Ask HN: how do you…" threads in your category for organic user research
Open-source maintainers	Find every HN discussion of your project across years; see which features users actually care about

Quick start

Drop this into the Input panel and run:

{
    "lists": ["top"],
    "maxStoriesPerList": 30
}

You'll get 30 top stories with velocity analytics and tag classification — typically in under 15 seconds.

Common input examples

All Show HN stories with at least 500 points since June 2025

{
    "tagSearchOnly": true,
    "tags": ["show_hn"],
    "minPoints": 500,
    "dateFrom": "2025-06-01",
    "maxStoriesPerQuery": 100,
    "sortBy": "popularity"
}

Track a topic and gather author contacts

{
    "queries": ["LLM observability", "rust async"],
    "minPoints": 50,
    "includeAuthorProfiles": true,
    "includeContactInfo": true,
    "maxStoriesPerQuery": 30
}

Pull a single story with the full comment tree

{
    "storyIds": ["48513806"],
    "includeComments": true,
    "commentDepth": 5,
    "maxCommentsPerStory": 500,
    "includeAuthorProfiles": true
}

Daily monitor with growth tracking

{
    "lists": ["best"],
    "maxStoriesPerList": 50,
    "enableHistory": true,
    "minPoints": 100
}

Run on a schedule. Every run after the first includes history.delta showing how each story has grown.

Domain-specific intelligence (high-scoring arXiv papers)

{
    "queries": ["AI", "machine learning"],
    "domains": ["arxiv.org"],
    "minPoints": 100,
    "dateFrom": "2025-01-01",
    "maxStoriesPerQuery": 50
}

Look up specific power users

{
    "userIds": ["pg", "tptacek", "patio11", "dang"],
    "includeContactInfo": true
}

Output overview

Three record types in the dataset:

Story

Field	Description
`type`	`"story"` or `"job"`
`id`	Hacker News story ID
`title`	Story title
`url`	External link (null for Ask HN / Tell HN self-posts)
`domain`	Apex domain of the URL
`text`	Body text for Ask HN / Show HN / Tell HN (HTML stripped)
`by`	Author username
`score`	Current points
`descendants`	Total comment count
`tag`	`story`, `show_hn`, `ask_hn`, or `job`
`createdAt`	ISO timestamp
`permalink`	Link to the story on news.ycombinator.com
`analytics`	`pointsPerHour`, `commentsPerHour`, `pointsToCommentsRatio`, `ageHours`, plus comment-tree shape stats when comments are fetched
`history`	`scoreDelta`, `commentsDelta`, `trend`, snapshot series — present when history tracking is on

Comment

Field	Description
`type`	`"comment"`
`id`, `parent`, `storyId`	Comment ID, immediate parent, root story
`by`, `text`	Author and full comment body (HTML stripped)
`depth`	1 = direct reply, 2 = reply-to-reply, etc.
`replyCount`	Number of direct child replies
`createdAt`, `permalink`	Timestamp and link
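
Comment records arrive flat, so a nested view has to be rebuilt from `id` and `parent`. A minimal sketch, assuming records shaped like the table above (`nest_comments` is a hypothetical helper):

```python
def nest_comments(comments, story_id):
    """Turn flat comment records into a nested list under the story."""
    by_parent = {}
    for comment in comments:
        by_parent.setdefault(comment["parent"], []).append(comment)

    def build(parent_id):
        # Each node is a copy of the record plus its nested replies.
        return [{**c, "replies": build(c["id"])}
                for c in by_parent.get(parent_id, [])]

    return build(story_id)


flat = [
    {"id": 11, "parent": 10, "by": "alice", "depth": 1},
    {"id": 12, "parent": 11, "by": "bob", "depth": 2},
    {"id": 13, "parent": 10, "by": "carol", "depth": 1},
]
tree = nest_comments(flat, story_id=10)
print([node["by"] for node in tree])
```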

User

Field	Description
`type`	`"user"`
`username`	HN handle
`karma`	Total karma
`about`, `aboutHtml`	Bio (cleaned and original)
`createdAt`	When the account was created
`submittedCount`	Lifetime submission count
`recentSubmittedIds`	Last 50 submission IDs
`contactInfo`	`emails`, `twitter`, `github`, `linkedin`, `mastodon`, `websites`
`activity`	Recent activity sample, dominant domains, average story score
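
Since all three record types share one dataset, downstream code usually splits on `type` first. The sketch below joins stories to their authors' contact info; it assumes items shaped like the tables above, and `author_leads` is a hypothetical helper.

```python
def author_leads(items):
    """Pair each story with its author's emails when a user record exists."""
    stories = [i for i in items if i.get("type") == "story"]
    users = {i["username"]: i for i in items if i.get("type") == "user"}
    leads = []
    for story in stories:
        user = users.get(story["by"])
        if user is None:
            continue  # profile not fetched for this author
        # contactInfo may be missing when contact extraction is off.
        emails = (user.get("contactInfo") or {}).get("emails", [])
        leads.append({"title": story["title"], "by": story["by"], "emails": emails})
    return leads


dataset = [
    {"type": "story", "title": "Show HN: X", "by": "alice"},
    {"type": "user", "username": "alice",
     "contactInfo": {"emails": ["alice@example.com"]}},
    {"type": "comment", "id": 1, "by": "bob"},
]
print(author_leads(dataset))
```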

Pricing

Charge	Cost
Actor start	$0.01 per run
Story scraped	$0.005 per story (or job listing)
Comment scraped	$0.001 per comment
User profile scraped	$0.005 per user

Records are only counted after filters pass: you don't pay for stories dropped by minPoints, domains, or the date-range filters. Comment trees and author profiles are opt-in.

Worked examples:

Scenario	Stories	Comments	Users	Cost
50 top stories, no comments	50	0	0	$0.26
100 Show HN historical search	100	0	0	$0.51
30 stories + full comment trees (~30 avg)	30	~900	0	$1.06
1 viral story + 500 comments + author profile	1	500	1	$0.52
Daily best-of-50 monitor with author profiles	50	0	50	$0.51
Deep weekly review: 100 stories + 5000 comments + 100 authors	100	5000	100	$6.01
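
The worked examples follow directly from the price list: one start fee per run plus the per-record charges. A small estimator, working in tenths of a cent to avoid floating-point drift (`estimate_cost` is a hypothetical helper, with prices copied from the table above):

```python
# Prices in tenths of a cent ("mills").
START_FEE = 10      # $0.01 per run
PER_STORY = 5       # $0.005 per story or job listing
PER_COMMENT = 1     # $0.001 per comment
PER_USER = 5        # $0.005 per user profile


def estimate_cost(stories=0, comments=0, users=0, runs=1):
    """Return the estimated charge in dollars."""
    mills = (runs * START_FEE + stories * PER_STORY
             + comments * PER_COMMENT + users * PER_USER)
    return mills / 1000


print(estimate_cost(stories=30, comments=900))
print(estimate_cost(stories=100, comments=5000, users=100))
```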

Comments are intentionally priced low so that comment-tree analytics and AI/RAG workloads stay affordable.

Proxies

Proxies are included and configured automatically. No setup required.

FAQ

Do I need to work with the Hacker News API directly? No. You don't need an API key or any setup: pass an input, get a dataset. The actor handles the upstream calls.

How far back does the data go? Comments and stories go back to HN's launch in 2007. Full-text search covers the entire archive.

Will this hit rate limits if I run it often? Hacker News exposes a generous public data surface for scrapers. Per-request throttling is built in. You can safely schedule this every 15 minutes for monitoring use cases.

Can I track sentiment? The scraper returns full comment text. Sentiment is something you'd run downstream (an LLM call, your own classifier, etc.). We don't bundle sentiment to keep pricing flat and the data unopinionated.

Why don't I see Twitter handles for tptacek even though his bio has email addresses? The contact parser is intentionally strict: it won't extract @sockpuppet from the email thomas@sockpuppet.org because that would be a false positive. Real Twitter handles (text like @username written as a standalone mention, or a twitter.com/username URL) are extracted reliably.

How does history tracking work? Turn on enableHistory: true and pick a historyStoreName. On every run, each story's current score and comment count are snapshotted under that name. From the second run onward, every story includes a history.delta block with the change since the previous run, expressed as raw deltas and as per-hour rates.

Why might a story I expected to see not appear in the output? Most often a filter dropped it. Check the log: it prints the active filters at the start of every run. Common gotchas: a domains filter applied to self-posts (Ask HN has no URL, so it is dropped automatically), a dateFrom cutoff that is too aggressive, or minAuthorKarma filtering out new accounts.

Does this fetch reply chains under deeply nested comments? Yes, up to commentDepth levels (default 3). HN threads sometimes go 8–10 levels deep; raise the limit if you need the full tree, but expect cost to scale with thread size.

Can I export to CSV / XML / RSS? Apify supports all of those formats out of the box — pick your format in the "Export results" panel after a run finishes.

What about private / dead / deleted content? Deleted comments and stories are skipped (you won't see hollow placeholder records). Reply chains beneath a deleted comment are still traversed when present.

Will this work on a free Apify plan? Yes. Typical runs cost cents, well within the free tier's monthly compute budget.

Limits (the honest list)

Show HN / Ask HN classification is taken from the title prefix and from HN's own tags. Stories that aren't formally tagged Show HN but include "Show HN" in casual text will be classified as Show HN; this matches HN's own behavior.
Comment trees are capped by maxCommentsPerStory (default 200). On the most viral threads (Anthropic-acquires-Bun-tier discussions with 1000+ comments) you'll get the first 200 comments in breadth-first (BFS) order, not every leaf.
Comment sentiment / topic extraction is not included. You get the raw text — sentiment is a downstream concern.
User contact extraction is best-effort. It scans the bio the user wrote about themselves; if they didn't put their email in there, we can't surface it.
Real-time push / streaming is not supported. This is a batch scraper. Schedule it on Apify's cron and pipe to a webhook for "almost-real-time" workflows.
No login-required content. Everything we return is public — HN doesn't gate content behind auth in any meaningful way, so this is rarely a problem.

Maintained by brilliant_gum on the Apify platform. Open an issue on the actor page for bugs, feature requests, or pricing questions.
