Hacker News Comments Scraper — Threads & Discussions

Scrape full comment trees from Hacker News stories. Extracts threaded discussions with author, text, points, depth, timestamp. Fetch by story ID, URL, or top/new stories. Ideal for sentiment analysis, tech discourse research, and developer community insights. Uses official Firebase API.

Pricing

Pay per usage

Developer

OpenClaw Mara

Maintained by Community

19 hours ago

Last modified

Hacker News Comments Scraper — Threads, Replies & User Profiles

$0.003 per comment · Extract full comment threads, nested replies, and optional user profiles from Hacker News — the canonical tech discussion community. Uses the official Firebase HN API. No key needed.

Built for sentiment analysis on tech launches, YC / startup intelligence, community research, trend detection, and RAG/LLM corpora on developer discussions.


What You Get

  • Full comment trees — every reply in a story, with parentId linkage preserved
  • Top stories scraping — current HN front page with all their comments
  • New stories scraping — newest submissions with comments
  • User profiles (opt-in) — karma, about text, account age for each commenter
  • Story metadata — title, URL, points, author, timestamp alongside every comment
  • Unlimited depth — traverses the full nested thread, no arbitrary cutoffs
  • Structured JSON — comment-per-row, ready for analytics / LLM ingestion
  • Public Firebase API — no authentication, no quota headaches

4 Use Cases (ready-to-run JSON inputs)

1. Sentiment analysis on a specific launch

{
  "storyIds": [41009141],
  "maxCommentsPerStory": 0,
  "includeUserProfiles": false
}

All comments under a single HN thread (0 = no cap). Feed into a sentiment/toxicity model to gauge reception of a product launch or announcement.
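Comment text arrives as HTML (the sample output further down contains a <p> tag), so most sentiment or toxicity models will want plain text first. The helper below is a rough, regex-based sketch; comment_text_to_plain is a hypothetical name and not part of the actor.

```python
import html
import re


def comment_text_to_plain(raw):
    """Convert HN comment HTML to plain text (rough, regex-based)."""
    if not raw:
        return ""
    # HN separates paragraphs with <p>; turn those into blank lines first.
    text = re.sub(r"<p>", "\n\n", raw, flags=re.IGNORECASE)
    # Drop any remaining tags such as <a>, <i>, <pre>, <code>.
    text = re.sub(r"<[^>]+>", "", text)
    # Decode entities such as &#x27; and &gt; after the tags are gone.
    return html.unescape(text).strip()
```

For anything beyond quick experiments, a real HTML parser is the safer choice, since regexes do not handle every edge case in markup.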

2. Daily "what's hot in tech" feed

{
  "scrapeTopStories": true,
  "maxStories": 10,
  "maxCommentsPerStory": 50,
  "includeUserProfiles": false
}

Top 10 HN stories with 50 comments each — perfect for a morning tech digest or daily Slack summary.

3. Community intelligence — commenter profiles

{
  "storyUrls": ["https://news.ycombinator.com/item?id=41009141"],
  "maxCommentsPerStory": 200,
  "includeUserProfiles": true
}

Comments + full profile (karma, bio, account age) for each commenter — great for mapping influential HN voices or finding experts in a niche.

4. Trend monitoring — new stories

{
  "scrapeNewStories": true,
  "maxStories": 30,
  "maxCommentsPerStory": 20,
  "includeUserProfiles": false
}

30 newest HN submissions with 20 comments each — useful for early-trend detection before stories hit the front page.


Input Schema

| Field | Type | Default | Description |
|---|---|---|---|
| storyIds | integer[] | [] | HN story IDs (from URLs like ?id=41009141) |
| storyUrls | string[] | [] | Full HN story URLs |
| scrapeTopStories | boolean | false | Scrape current top stories |
| scrapeNewStories | boolean | false | Scrape newest stories |
| maxStories | integer | 10 | Max top/new stories to process |
| maxCommentsPerStory | integer | 0 | Max comments per story (0 = all) |
| includeUserProfiles | boolean | false | Fetch commenter profile data |
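If you build inputs in code, a small helper can apply the documented defaults and catch obvious mistakes before a run starts. This is a sketch only: build_input is a hypothetical helper, and the rule that at least one story source must be set is an assumption, not documented actor behavior.

```python
def build_input(story_ids=None, story_urls=None, top=False, new=False,
                max_stories=10, max_comments_per_story=0,
                include_user_profiles=False):
    """Build an actor input dict using the defaults from the schema table."""
    # Assumption: a run with no story source has nothing to scrape.
    if not (story_ids or story_urls or top or new):
        raise ValueError("set storyIds, storyUrls, scrapeTopStories or scrapeNewStories")
    if max_stories < 1:
        raise ValueError("maxStories must be at least 1")
    if max_comments_per_story < 0:
        raise ValueError("maxCommentsPerStory must be 0 (all) or positive")
    return {
        "storyIds": list(story_ids or []),
        "storyUrls": list(story_urls or []),
        "scrapeTopStories": top,
        "scrapeNewStories": new,
        "maxStories": max_stories,
        "maxCommentsPerStory": max_comments_per_story,
        "includeUserProfiles": include_user_profiles,
    }
```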

Output (sample — one comment row)

{
  "commentId": 41010231,
  "storyId": 41009141,
  "storyTitle": "Show HN: My new thing",
  "storyUrl": "https://news.ycombinator.com/item?id=41009141",
  "storyPoints": 312,
  "parentId": 41009141,
  "author": "pg",
  "text": "Congrats — this is great. <p>One suggestion...",
  "time": 1720123456,
  "timeISO": "2024-07-04T20:04:16Z",
  "depth": 1,
  "kids": [41010345, 41010512],
  "userProfile": {
    "karma": 155342,
    "about": "Founder of YC. Writes essays at paulgraham.com.",
    "created": 1160418633
  }
}
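Because the output is one comment per row, the nested thread has to be reassembled from parentId when you need it. The sketch below assumes rows shaped like the sample above (commentId, parentId) and that top-level comments have the story ID as their parentId; build_thread and the replies key are hypothetical names.

```python
from collections import defaultdict


def build_thread(rows, story_id):
    """Nest flat comment rows under their parents, keeping input order."""
    children = defaultdict(list)
    for row in rows:
        children[row["parentId"]].append(row)

    def attach(parent_id):
        # Copy each row and add its replies recursively.
        return [
            {**row, "replies": attach(row["commentId"])}
            for row in children.get(parent_id, [])
        ]

    return attach(story_id)
```

The recursion goes one level per reply depth, so extremely deep threads could hit Python's default recursion limit; an explicit stack avoids that if it ever matters.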

Pricing & Performance

  • Pay-per-event: $0.003 per comment (cheaper than $0.005 — HN comments are lightweight)
  • Typical cost: $0.03 for 10 comments, $0.30 for 100, $3 for 1,000
  • Speed: ~50–80 comments/second (HN Firebase API is fast; 50 ms polite pacing)
  • Free Apify tier: $5/month credit = ~1,600 comments/month

HN itself is free to browse — you pay only for structured extraction, thread assembly, and profile enrichment.
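The figures above follow directly from the per-comment price. A minimal sketch of the arithmetic, assuming only comments are billed (whether profile lookups or platform usage add to the bill is not stated here); the function names are hypothetical.

```python
from decimal import Decimal

PRICE_PER_COMMENT_USD = Decimal("0.003")


def estimate_cost(comment_count):
    """Cost in USD for a number of comments at the listed price."""
    return PRICE_PER_COMMENT_USD * comment_count


def comments_for_budget(budget_usd):
    """Whole comments that fit in a budget, rounded down."""
    return int(Decimal(str(budget_usd)) / PRICE_PER_COMMENT_USD)
```

At this price, estimate_cost(1000) is 3 USD and comments_for_budget(5) is 1666, which the list above rounds down to ~1,600.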


Integrations

  • Zapier / Make / n8n — new comment on a keyword → Slack / Discord / Notion
  • LangChain / LlamaIndex — RAG over tech discussions and expert commentary
  • Vector DBs (Pinecone / Weaviate / Qdrant) — embed comments for semantic tech-discussion search
  • Neo4j / Graphiti — commenter → story → topic graph for community analytics
  • Sentiment / toxicity APIs — pipe text into Perspective API, OpenAI Moderations, or a local classifier
  • Analytics DBs (BigQuery / Snowflake / DuckDB) — HN comments as columnar tables for trend analysis
  • Python SDK
    from apify_client import ApifyClient

    client = ApifyClient("<APIFY_TOKEN>")
    run = client.actor("Helpermara/hackernews-comments-scraper").call(
        run_input={"scrapeTopStories": True, "maxStories": 5, "maxCommentsPerStory": 100, "includeUserProfiles": False}
    )
    # Each dataset item is one comment row.
    for c in client.dataset(run["defaultDatasetId"]).iterate_items():
        # author/text may be missing on deleted comments, so read them defensively.
        print(c["storyTitle"], "—", c.get("author"), ":", (c.get("text") or "")[:80])

FAQ

Do I need an HN API key? No. The HN Firebase API (hacker-news.firebaseio.com/v0) is public and unauthenticated.

How fresh is the data? Real-time. HN publishes updates to Firebase within seconds of user actions — you always get the current thread state.

How do I find a story ID? From the URL: news.ycombinator.com/item?id=41009141 → ID is 41009141. Pass the ID in storyIds, or pass the full URL in storyUrls.
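If you collect HN links elsewhere and want numeric IDs, the extraction can be sketched as below. story_id_from_url is a hypothetical helper, not something the actor exposes.

```python
from urllib.parse import parse_qs, urlparse


def story_id_from_url(url):
    """Return the numeric item ID from a news.ycombinator.com item URL."""
    # Accept URLs written without a scheme, as in the FAQ example.
    parsed = urlparse(url if "://" in url else "https://" + url)
    if parsed.hostname != "news.ycombinator.com":
        raise ValueError(f"not a Hacker News URL: {url}")
    values = parse_qs(parsed.query).get("id", [])
    if len(values) != 1 or not (values[0].isascii() and values[0].isdigit()):
        raise ValueError(f"no numeric id parameter in: {url}")
    return int(values[0])
```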

Does it handle deleted / dead comments? Yes — the API flags them (dead, deleted). The actor returns them with the flag so you can filter downstream.
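A downstream filter might look like the sketch below. The sample output row does not show the flag fields, so the names dead and deleted are an assumption that mirrors the HN API item fields; check a real dataset row before relying on them.

```python
def live_comments(rows):
    """Keep rows that are neither dead nor deleted.

    Assumes boolean 'dead' / 'deleted' keys that are absent on normal rows.
    """
    return [
        row for row in rows
        if not row.get("dead") and not row.get("deleted")
    ]
```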

What about Ask HN / Show HN threads? Those are regular HN stories — scrape them the same way. The storyTitle lets you filter by prefix (Ask HN:, Show HN:).
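The prefix check can be sketched as a small classifier over storyTitle. thread_kind and its return labels are hypothetical names chosen for this example.

```python
def thread_kind(story_title):
    """Classify a story as 'ask', 'show', or 'story' by its title prefix."""
    title = (story_title or "").lstrip().lower()
    for prefix, kind in (("ask hn:", "ask"), ("show hn:", "show")):
        if title.startswith(prefix):
            return kind
    return "story"
```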

Differences vs the main hackernews-scraper actor? This one focuses on comments (nested threads, profiles). The other focuses on stories (titles, URLs, points, ranking). Use together for full HN coverage.

Rate limits? The Firebase API is generous and CDN-backed. The actor paces at 50 ms between requests to stay polite and fast.
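For readers curious what a paced traversal of the public API involves, here is a minimal sketch. It is not the actor's actual implementation: walk_comments, fetch_item, and the row shape are assumptions for illustration. The item endpoint and the kids, parent, by, text, and time fields come from the public HN Firebase API.

```python
import json
import time
from urllib.request import urlopen

API_BASE = "https://hacker-news.firebaseio.com/v0"


def fetch_item(item_id):
    """Fetch one item (story or comment); the API returns null if it is missing."""
    with urlopen(f"{API_BASE}/item/{item_id}.json", timeout=10) as resp:
        return json.load(resp)


def walk_comments(story_id, fetch=fetch_item, delay_s=0.05, max_comments=0):
    """Depth-first walk of a story's comments with a pause between requests."""
    story = fetch(story_id) or {}
    # Stack of (item id, depth); reversed so kids come off in listed order.
    stack = [(kid, 1) for kid in reversed(story.get("kids", []))]
    rows = []
    while stack and (max_comments == 0 or len(rows) < max_comments):
        item_id, depth = stack.pop()
        time.sleep(delay_s)  # polite pacing, 50 ms by default
        item = fetch(item_id)
        if not item:
            continue
        rows.append({
            "commentId": item["id"],
            "parentId": item.get("parent"),
            "author": item.get("by"),
            "text": item.get("text"),
            "time": item.get("time"),
            "depth": depth,
        })
        stack.extend((kid, depth + 1) for kid in reversed(item.get("kids", [])))
    return rows
```

The fetch parameter is injectable so the traversal can be tested against an in-memory dict without touching the network. An explicit stack is used instead of recursion so thread depth is not bounded by Python's recursion limit.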


Keywords

hacker news scraper, hn scraper, hn comments, hacker news api, hn threads, tech discussions, yc news, ycombinator scraper, startup intelligence, launch feedback, tech sentiment, developer community, hn users, hn karma, nested comments, firebase hn api, tech trends, show hn, ask hn, tech discussion mining


Changelog

  • 2026-04-24 — Extended README with use cases, integrations, and FAQ
  • 2026-03 — Initial release: story IDs / URLs / top / new stories + optional user profiles