Pricing

Pay per event

Hacker News Stories & Comments Scraper

Extract trending tech discussions, nested comment hierarchies, and post scores from Hacker News directly into structured JSON for custom RAG pipelines.


Developer

naoki anzai

Last modified

2 months ago

📰 Hacker News Scraper

Feed your AI pipelines and custom RAG applications with vetted tech discussions extracted directly from Hacker News. This Hacker News scraper is built for AI researchers, data scientists, and developer teams who need well-structured conversational text, for example to train sentiment analysis models or to build search aggregators. Instead of parsing the site's HTML, which breaks whenever the page layout changes, the scraper queries the official Hacker News Firebase API directly, so runs are unaffected by front-end changes and results come back as consistently structured JSON.

Automate data collection by scheduling the scraper to run daily or weekly. A single run can collect, for example, the top 100 trending posts along with their nested comment threads. Setting a minimum score threshold keeps only stories that have gained real traction in the developer community. This targeted extraction suits teams building AI agents that summarize emerging GitHub repositories, track new developer tools, or analyze sentiment around newly released AI research papers.

Results are delivered as structured records with programmatic access to nested comment trees, author usernames, post scores, and external URLs. There is no need to scrape unstructured pages by hand or maintain brittle CSS selectors: the extractor captures Hacker News discussions reliably and plugs straight into your existing data pipeline.
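The approach described above (reading the public Firebase API instead of parsing HTML, then applying a score threshold) can be sketched in a few lines of Python. This is a minimal illustration, not the actor's source: the helper names `fetch_json`, `keep_story`, and `fetch_feed` are hypothetical, and the sketch assumes the item cap is applied to the feed's ID list before score filtering, so a run may return fewer stories than requested.

```python
import json
import urllib.request

HN_API = "https://hacker-news.firebaseio.com/v0"

# Public Firebase feed endpoints; each returns a JSON array of item IDs.
FEEDS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def fetch_json(path: str):
    """GET <HN_API>/<path>.json and decode it (one HTTP request per call)."""
    with urllib.request.urlopen(f"{HN_API}/{path}.json", timeout=10) as resp:
        return json.load(resp)


def keep_story(item, min_score: int) -> bool:
    """Hypothetical filter: drop missing, deleted, or dead items and low scores."""
    if not item or item.get("deleted") or item.get("dead"):
        return False
    return item.get("score", 0) >= min_score


def fetch_feed(mode: str = "top", max_items: int = 30, min_score: int = 0) -> list:
    """Fetch up to max_items IDs from a feed, then keep stories at or above min_score."""
    ids = fetch_json(FEEDS[mode])[:max_items]  # assumption: cap applied before filtering
    stories = []
    for item_id in ids:
        item = fetch_json(f"item/{item_id}")  # one request per story
        if keep_story(item, min_score):
            stories.append(item)
    return stories
```

Usage (requires network access): `fetch_feed("top", max_items=10, min_score=100)` returns the raw Firebase items for front-page stories with at least 100 points.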

Store Quickstart

Start with the Quickstart template (top stories, 20 items). For tech trend monitoring, use the Top Trends template with minScore=100 and domain analysis.

Key Features

🔥 Official Firebase API — hacker-news.firebaseio.com — 10+ year stable
📂 6 story modes — top, new, best, ask, show, job
⭐ Score filtering — Minimum score threshold for quality filtering
💬 Comment threads — Optional nested comment extraction
🏷️ Top domains analysis — Which domains dominate the front page
🔑 No API key needed — Public Firebase API
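The top domains analysis can be approximated from the output records. A sketch, assuming each record carries the `url` field listed in the Output table; the function name `top_domains` and the `www.` normalization are illustrative choices, not the actor's documented behavior.

```python
from collections import Counter
from urllib.parse import urlparse


def top_domains(stories: list, n: int = 10) -> list:
    """Count hostnames of the stories' external URLs; return the n most common.

    Stories without a `url` (e.g. Ask HN text posts) are skipped, and a
    leading "www." is stripped so www.example.com and example.com count together.
    """
    counts = Counter()
    for story in stories:
        url = story.get("url")
        if not url:
            continue
        host = urlparse(url).hostname or ""  # hostname is already lowercased
        if host.startswith("www."):
            host = host[4:]
        if host:
            counts[host] += 1
    return counts.most_common(n)
```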

Use Cases

Who	Why
Tech journalists	Daily Hacker News trend reports
Startup founders	Watch which tools/frameworks gain HN traction
VCs/Investors	Signal for emerging tech and founder announcements
Developer tool companies	Monitor HN sentiment on products and competitors
AI/ML researchers	Discover papers and repos trending in tech community

Input

Field	Type	Default	Description
mode	string	top	top, new, best, ask, show, job
maxItems	integer	30	Max stories (1-500)
minScore	integer	0	Minimum score filter
includeComments	boolean	false	Include comment threads
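Before calling the actor, it can help to check a payload against the table above. A minimal sketch; the `RunInput` class and its `validate` and `to_payload` methods are hypothetical client-side helpers, and the only constraints enforced are the ones the table states (the six modes and the 1-500 range for `maxItems`).

```python
from dataclasses import asdict, dataclass

VALID_MODES = {"top", "new", "best", "ask", "show", "job"}


@dataclass
class RunInput:
    """Mirror of the input table; camelCase names match the actor's JSON keys."""
    mode: str = "top"
    maxItems: int = 30
    minScore: int = 0
    includeComments: bool = False

    def validate(self) -> "RunInput":
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {self.mode!r}")
        if not 1 <= self.maxItems <= 500:
            raise ValueError("maxItems must be between 1 and 500")
        return self

    def to_payload(self) -> dict:
        """Return the dict to send as run input."""
        return asdict(self)
```

For example, `RunInput(minScore=100).validate().to_payload()` produces the same payload as the input example that follows, apart from `maxItems` keeping its default of 30.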

Input Example

{
  "mode": "top",
  "maxItems": 30,
  "minScore": 100,
  "includeComments": false
}

Input Examples

Example: Top stories snapshot

{
  "mode": "top",
  "maxItems": 30
}

Example: Ask HN threads with comments

{
  "mode": "ask",
  "maxItems": 20,
  "includeComments": true
}

Example: High-scoring best stories

{
  "mode": "best",
  "maxItems": 100,
  "minScore": 200
}

Output

Field	Type	Description
`id`	integer	HN story ID
`title`	string	Story title
`url`	string	External URL (if any)
`author`	string	HN username
`score`	integer	Upvote score
`numComments`	integer	Comment count
`createdAt`	string	ISO timestamp
`hnUrl`	string	Hacker News thread URL
`comments`	object[]	Top comments (if includeComments enabled)

Output Example

{
  "id": 12345678,
  "title": "Claude 4.5 released with new features",
  "url": "https://anthropic.com/news/claude-4-5",
  "author": "user123",
  "score": 523,
  "numComments": 142,
  "createdAt": "2024-04-05T19:34:38Z",
  "hnUrl": "https://news.ycombinator.com/item?id=12345678"
}
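The Output table's field names differ from the raw Firebase item fields (`by`, `time`, `descendants`). The sketch below shows one plausible mapping between the two; `to_output_record` is a hypothetical helper, and the exact `createdAt` format the actor emits is an assumption.

```python
from datetime import datetime, timezone


def to_output_record(item: dict) -> dict:
    """Map a raw Firebase item to the field names in the Output table.

    Assumed correspondence: by -> author, time (Unix seconds) -> createdAt,
    descendants (total comment count) -> numComments; hnUrl is built from the ID.
    """
    created = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "id": item["id"],
        "title": item.get("title", ""),
        "url": item.get("url"),  # absent for Ask HN and other text posts
        "author": item.get("by"),
        "score": item.get("score", 0),
        "numComments": item.get("descendants", 0),
        "createdAt": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "hnUrl": f"https://news.ycombinator.com/item?id={item['id']}",
    }
```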

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~hacker-news-intelligence/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "mode": "top", "maxItems": 30, "minScore": 100, "includeComments": false }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/hacker-news-intelligence").call(run_input={
    "mode": "top",
    "maxItems": 30,
    "minScore": 100,
    "includeComments": False,  # Python boolean, not JSON `false`
})

# Read the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/hacker-news-intelligence').call({
  "mode": "top",
  "maxItems": 30,
  "minScore": 100,
  "includeComments": false
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

Use mode: "top" for the front page, "new" for breaking submissions.
Set minScore: 50 to filter out noise and focus on signal.
Schedule daily to track trending dev/startup topics.
Combine with Article Content Extractor to fetch full content of linked stories.

FAQ

What does score mean?

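A story's score is its point total from upvotes (the `score` field in the HN API); stories cannot be downvoted, only flagged. As a rough rule of thumb, 100+ points is front-page quality and 500+ is viral.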

How often does the HN front page update?

Rapidly — rankings shift every few minutes. Scrape hourly for trend tracking.
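For hourly trend tracking, consecutive runs in `top` mode can be compared by rank. A minimal sketch, assuming each run's dataset preserves feed order so that its `id` values form a ranking; `diff_snapshots` is a hypothetical helper, not part of the actor.

```python
def diff_snapshots(previous: list, current: list) -> dict:
    """Compare two ordered lists of story IDs (rank 1 first)."""
    prev_rank = {sid: rank for rank, sid in enumerate(previous, start=1)}
    curr_rank = {sid: rank for rank, sid in enumerate(current, start=1)}
    return {
        # Stories that appeared since the previous snapshot.
        "entered": [sid for sid in current if sid not in prev_rank],
        # Stories that fell out of the captured range.
        "dropped": [sid for sid in previous if sid not in curr_rank],
        # Positive delta = moved up the page since the previous snapshot.
        "moved": {
            sid: prev_rank[sid] - curr_rank[sid]
            for sid in current
            if sid in prev_rank and prev_rank[sid] != curr_rank[sid]
        },
    }
```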

Can I get old/archived stories?

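Only recent ones. The `new` feed returns the most recent submissions (up to 500), newest first, and `best` returns up to 500 recent high-scoring stories. For older stories, use the Algolia HN Search API.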

What's the comment limit?

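The API exposes every comment under a story, but each item lists only the IDs of its direct replies, so every comment needs its own request. Comment-heavy posts therefore take noticeably longer to extract.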
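The cost of comment extraction comes from this one-request-per-item design: the API returns child IDs (`kids`) rather than nested comments. A sketch of that recursive walk with a depth cap; `build_comment_tree`, `get_item`, and `max_depth` are hypothetical names, and the actor's own depth handling is not documented here.

```python
from typing import Callable, Optional


def build_comment_tree(
    item_id: int,
    get_item: Callable[[int], Optional[dict]],
    max_depth: int = 3,
    depth: int = 0,
) -> Optional[dict]:
    """Recursively expand an item's `kids` into nested comment dicts.

    get_item stands in for one Firebase request (GET /v0/item/<id>.json),
    so the number of calls grows with the number of comments fetched.
    """
    item = get_item(item_id)
    if not item or item.get("deleted") or item.get("dead"):
        return None
    node = {"id": item["id"], "author": item.get("by"), "text": item.get("text", ""), "replies": []}
    if depth < max_depth:  # stop descending once the depth cap is reached
        for kid_id in item.get("kids", []):
            child = build_comment_tree(kid_id, get_item, max_depth, depth + 1)
            if child is not None:
                node["replies"].append(child)
    return node
```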

How is this different from calling the official HN API directly?

This actor handles pagination, deduplication, and comment threading, and writes the results to an Apify dataset, so no SDK is needed.

Can I search HN by keyword?

Use the Algolia HN search API for keyword search. This actor focuses on top/new/best feeds.
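For keyword search, a request URL for the Algolia HN Search API can be built as below. A sketch: the `/search` (relevance-sorted) and `/search_by_date` (newest first) endpoints and the `query`, `tags`, `page`, and `hitsPerPage` parameters belong to that public API, while `algolia_search_url` and its defaults are hypothetical.

```python
from urllib.parse import urlencode

ALGOLIA_HN = "https://hn.algolia.com/api/v1"


def algolia_search_url(query: str, by_date: bool = False, hits_per_page: int = 50, page: int = 0) -> str:
    """Build a story search URL for the Algolia HN Search API.

    /search is sorted by relevance; /search_by_date returns the newest matches first.
    """
    endpoint = "search_by_date" if by_date else "search"
    params = {"query": query, "tags": "story", "hitsPerPage": hits_per_page, "page": page}
    return f"{ALGOLIA_HN}/{endpoint}?{urlencode(params)}"
```

Fetching that URL returns JSON whose `hits` array holds the matching stories.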

News & Content cluster — explore related Apify tools:

📰 Google News Scraper — Scrape Google News articles for any search query via official RSS feed.
📰 Article Extractor — Extract clean article content with title, author, publish date, images from news and blog pages.
📄 Website Content Extractor — Extract clean main content from any webpage as text, markdown, or HTML.
📡 RSS Feed Aggregator — Aggregate multiple RSS and Atom feeds with keyword filtering and deduplication.
📡 Reddit All-in-One Scraper — Scrape Reddit subreddits, posts, comments, user profiles, and search results via public JSON endpoints.
🚨 Reddit Keyword Monitor Alerts — Focused Reddit keyword and subreddit monitor built for recurring alerts, snapshot diffing, and webhook handoff.

Cost

Pay Per Event:

actor-start: $0.01 (flat fee per run)
dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
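The same arithmetic, extended to a schedule of several runs, fits in a small helper. A sketch: the prices are the two events listed above, `Decimal` avoids floating-point rounding, and `estimate_cost` is a hypothetical name.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.01")  # flat fee per run
PER_ITEM = Decimal("0.003")    # fee per dataset item


def estimate_cost(items: int, runs: int = 1) -> Decimal:
    """Pay-per-event estimate: one start fee per run plus a fee per output item."""
    return runs * ACTOR_START + items * PER_ITEM
```

For example, a daily run returning 30 items costs $0.01 + (30 × $0.003) = $0.10, or $3.00 over 30 days.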

No subscription required — you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.

Hacker News Scraper - Stories & Comments

pear_fight/hackernews-scraper

Scrape Hacker News stories, comments & user profiles. Extract titles, URLs, scores, comment counts, timestamps, full comment threads. Monitor trending tech topics in real time. Pay per result. Export JSON/CSV.

Harald

Hacker News Scraper — Stories, Comments & Trends

oneary/hackernews-scraper

Scrape Hacker News — front page stories, newest posts, job listings, and comments. Track trending tech topics and discussions.

Luan M.

Hacker News Scraper - Stories, Comments & Trends

viralanalyzer/hackernews-intelligence

Scrape Hacker News stories, comments, and discussions. Track tech trends, startup news, and developer community sentiment.

viralanalyzer

5.0

Hacker News Scraper - Stories & Comments

spiky_pepperoni/hacker-news-scraper

Search Hacker News stories and comments by keyword. No login.

Arad S

Hacker News Scraper

sweet_rebel/hacker-news-scraper

Rajat Sharda

Hacker News Scraper

klondikeking/hacker-news-scraper

Pierrick McD0nald

Hacker News MCP Server

nyxar_dev/hackernews-mcp

Read top stories, comments, and search Hacker News via MCP. Get real-time tech news, discussions, and trending topics from the HN community.

Nyxar Dev

Hacker News Deep Scraper

fluxcurulin/hn-scraper

Extract Hacker News stories, points, authors, comment counts, and links with full metadata. Track tech trends, monitor startup discussions, and export structured data for market intelligence and competitive analysis.