🔶 Hacker News Scraper: Stories & Tech Trends

Pricing: from $8.00 / 1,000 results

Scrape Hacker News stories: top, new, best, ask, show, jobs. Engagement tracking, trend analysis, and tech topic monitoring.

Rating: 0.0 (0)

Developer: Stephan Corbeil

Maintained by Community

Actor stats: 0 bookmarked, 27 total users, 7 monthly active users

Last modified: a day ago

๐Ÿ” Hacker News Scraper & Tech Trend Tracker

Extract trending stories, comments, and metadata from Hacker News at scale. A drop-in alternative to the HN Algolia API and the official Firebase API, with bulk pagination, comment-thread expansion, structured JSON output, and no rate-limit headaches.

Why This HN Scraper Beats HN Algolia, Firebase API & Manual Polling

| Feature | NexGenData Hacker News Scraper | HN Algolia API | HN Official Firebase API | Manual cron + scraping |
|---|---|---|---|---|
| Cost | $5 / 1,000 results, pay-per-event | Free but rate-limited | Free but no batch | Engineering time + infra |
| Bulk pagination | Up to 500 stories per run | Plan-limited | One ID at a time | Build it yourself |
| Comment threads | Full nested comments per story | Separate calls | Walk descendants tree manually | Build it yourself |
| Story feeds | top, new, best, ask, show, job | Limited categories | Yes (one per call) | Build it yourself |
| Output format | JSON / CSV / Excel | JSON | JSON | Whatever you write |
| Schedule + webhook | Native cron + webhook on completion | None | None | Build it yourself |
| Time-to-first-row | < 60 seconds | Signup needed | Yes (slow per-call) | Days |
| Auth | Apify token | None (anon) | None | Your IP / proxy |
| Maintenance | We handle it | Algolia handles it | You handle Firebase quirks | You handle everything |

Most teams pick this scraper because it is faster than walking the Firebase descendants tree by hand and more flexible than Algolia's fixed search index, and it ships JSON straight to a dataset with no Firebase SDK required.

What This Actor Does

The Hacker News Scraper connects directly to Hacker News' Firebase API to extract stories, comments, and metadata in seconds. No parsing, no rate limits, no complex API documentation. Whether you're tracking tech trends, monitoring startup mentions, or feeding AI training data, this actor delivers structured JSON output you can use immediately.

Perfect for:

  • Startups building competitive intelligence systems
  • Data scientists gathering training datasets
  • Content strategists tracking industry discussions
  • Researchers analyzing tech community behavior
  • Automated news feeds and aggregators

Why Scrape Hacker News?

Hacker News data extraction powers decision-making across tech companies. HN discussions reveal product launches before major announcements, engineering challenges competitors are solving, investor and founder sentiment shifts, early signals for emerging technologies, and real-time feedback on industry trends.

Key Features

Search Multiple Story Types

Need top Hacker News stories? Use searchType: top. Want trending HN tech news? Try searchType: best. The actor supports all six story feeds: top (frontpage stories), new (recently submitted), best (ranked by score with visibility weighting), ask (Ask HN discussions), show (Show HN project submissions), and job (job postings and hiring).
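
The six feed names correspond to story-list endpoints in the public Hacker News Firebase API. A minimal sketch of that lookup follows; `FEED_ENDPOINTS` and `feed_url` are hypothetical names used for illustration, and the actor's internal mapping is an assumption rather than documented behavior.

```python
# Base URL of the public Hacker News Firebase API.
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"

# Hypothetical mapping from searchType values to HN story-list endpoints.
FEED_ENDPOINTS = {
    "top": "topstories",
    "new": "newstories",
    "best": "beststories",
    "ask": "askstories",
    "show": "showstories",
    "job": "jobstories",
}


def feed_url(search_type: str) -> str:
    """Return the story-ID list URL for a feed, or raise ValueError if unknown."""
    try:
        endpoint = FEED_ENDPOINTS[search_type]
    except KeyError:
        raise ValueError(f"unknown searchType: {search_type!r}") from None
    return f"{HN_API_BASE}/{endpoint}.json"


print(feed_url("best"))
```

The HN API documents these lists as capped (up to 500 IDs for top, new, and best; up to 200 for ask, show, and job), so if the actor reads these endpoints directly, the ask, show, and job feeds may yield fewer than 500 stories.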

Fetch Exact Result Counts

Set maxResults from 1 to 500. Whether you need the top 10 Hacker News articles for a daily brief or 500 HN stories for machine learning training data, get exactly what you specify.

Include Full Comment Threads

Set includeComments: true to attach every comment under each story. Extract sentiment, track discussions, build comment datasets. With includeComments: false, run faster and leaner when you only need stories.
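
In the Firebase API, each item lists its direct replies as IDs in a `kids` array, so attaching every comment means walking that tree. The sketch below shows one way to flatten it, assuming a caller-supplied `fetch_item` function (hypothetical) that returns the raw item for an ID; it runs offline against an in-memory stand-in and is not the actor's actual code.

```python
from typing import Callable, Optional

Item = dict  # a raw HN item, shaped like /v0/item/<id>.json


def collect_comments(story: Item, fetch_item: Callable[[int], Optional[Item]]) -> list[Item]:
    """Depth-first walk of a story's `kids`, returning comments as a flat list."""
    comments: list[Item] = []
    stack = list(reversed(story.get("kids", [])))  # reversed so pop() keeps HN's order
    while stack:
        item = fetch_item(stack.pop())
        # Skip missing, deleted, or dead items (their replies are skipped too).
        if not item or item.get("deleted") or item.get("dead"):
            continue
        comments.append(item)
        stack.extend(reversed(item.get("kids", [])))
    return comments


# Offline demo: a fake item store stands in for HTTP calls.
fake_items = {
    2: {"id": 2, "by": "alice", "text": "First", "kids": [4]},
    3: {"id": 3, "by": "bob", "text": "Second"},
    4: {"id": 4, "by": "carol", "text": "Reply to first"},
}
story = {"id": 1, "kids": [2, 3]}
flat = collect_comments(story, fake_items.get)
print([c["id"] for c in flat])
```

Each comment costs one extra lookup in this approach, which is why leaving includeComments off is the faster option.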

Fast Execution

Leverages HN's Firebase backend for speed. Most requests complete in under 30 seconds.

Real-World Use Cases

1. Competitive Intelligence Dashboard

Automatically surface mentions of competitors, their products, and industry discussions daily. Feed results into a dashboard that flags stories mentioning competitor names. Set it to run daily on searchType: new with maxResults: 100. Sales teams get alerts when competitors are discussed, what people like about them, and what criticism appears in comments.
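
The flagging step can be sketched as a keyword filter over each story's title and comment text. This is an illustrative example only: `find_mentions` is a hypothetical helper, the competitor names are placeholders, and the item shape is assumed to match the Sample Output section.

```python
import re


def find_mentions(items: list[dict], names: list[str]) -> list[tuple[dict, list[str]]]:
    """Return (item, matched_names) pairs for stories that mention any name."""
    # Whole-word, case-insensitive patterns; names ending in symbols (e.g. "C++")
    # would need a different boundary rule.
    patterns = {n: re.compile(rf"\b{re.escape(n)}\b", re.IGNORECASE) for n in names}
    hits = []
    for item in items:
        # Search the title plus any comment text attached to the story.
        texts = [item.get("title") or ""]
        texts += [c.get("text") or "" for c in item.get("comments") or []]
        blob = "\n".join(texts)
        matched = [n for n, p in patterns.items() if p.search(blob)]
        if matched:
            hits.append((item, matched))
    return hits


# Placeholder data for a dry run.
stories = [
    {"id": 1, "title": "Acme launches a new database", "comments": []},
    {"id": 2, "title": "Show HN: my weekend project",
     "comments": [{"text": "Reminds me of globex"}]},
    {"id": 3, "title": "Unrelated story"},
]
for story, matched in find_mentions(stories, ["Acme", "Globex"]):
    print(story["id"], matched)
```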

2. AI Training Dataset for Tech Sentiment Analysis

Build production-grade datasets for fine-tuning LLMs on real tech conversations. A 500-result scrape with includeComments: true gives you 10,000-50,000 comments across stories. Combined with story scores and timestamps, you have labeled sentiment data.
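
For dataset work, one common preparation step is flattening stories into one row per comment and stripping markup. The sketch below assumes comment text arrives as HN-style HTML (as it does from the Firebase API) and that items follow the Sample Output shape; `clean_text` and `to_jsonl` are hypothetical helper names.

```python
import html
import json
import re

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Turn HN-style HTML comment text into plain text."""
    text = raw.replace("<p>", "\n\n")   # HN separates paragraphs with <p>
    text = TAG_RE.sub("", text)         # drop remaining tags such as <i> or <a>
    return html.unescape(text).strip()  # decode entities like &#x27;


def to_jsonl(stories: list[dict]) -> str:
    """Emit one JSON line per comment, carrying its story's title and score."""
    lines = []
    for story in stories:
        for comment in story.get("comments") or []:
            lines.append(json.dumps({
                "story_id": story.get("id"),
                "story_title": story.get("title"),
                "story_score": story.get("score"),
                "comment_id": comment.get("id"),
                "text": clean_text(comment.get("text") or ""),
            }))
    return "\n".join(lines)


print(clean_text("It&#x27;s <i>fast</i><p>Agreed"))
```

Tags are stripped before entities are decoded so that escaped angle brackets in a comment survive as literal text.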

3. Automated Newsletter Content

Run the actor daily on searchType: top, extract titles and top-voted comments, feed into your newsletter template. Readers see what the HN community is discussing with context from real discussions.
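
A plain-text digest for a newsletter template might be assembled like this. It is a sketch under the assumption that items match the Sample Output shape; `build_digest` is a hypothetical helper, and stories without a `url` (such as Ask HN posts) fall back to their HN discussion page.

```python
def build_digest(items: list[dict], limit: int = 5) -> str:
    """Format the highest-scoring stories as a numbered plain-text list."""
    ranked = sorted(items, key=lambda s: s.get("score") or 0, reverse=True)[:limit]
    lines = []
    for rank, s in enumerate(ranked, start=1):
        # Text posts have no external URL, so link to the HN thread instead.
        link = s.get("url") or f"https://news.ycombinator.com/item?id={s['id']}"
        lines.append(
            f"{rank}. {s.get('title', '(untitled)')} "
            f"({s.get('score', 0)} points, {s.get('descendants', 0)} comments)\n"
            f"   {link}"
        )
    return "\n".join(lines)


sample = [
    {"id": 1, "title": "A", "score": 10, "descendants": 2, "url": "https://a.example"},
    {"id": 2, "title": "B", "score": 50, "descendants": 7},
]
print(build_digest(sample, limit=2))
```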

4. Job Board Aggregation

Set searchType: job and maxResults: 100 to scrape HN's job listings. Automatically notify candidates when companies in target cities are hiring. Extract company names and roles from structured output.
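
The Sample Output has no dedicated company or role field, so these would likely be parsed from the job title. The sketch below assumes the common "Company (YC W21) is hiring a role" title convention; the regex and `parse_job_title` are hypothetical, and titles that do not follow the convention return None.

```python
import re
from typing import Optional

# Assumed title convention: "<company> [(YC <batch>)] is hiring <role>".
HIRING_RE = re.compile(
    r"^(?P<company>.+?)\s*(?:\((?P<batch>YC\s+[A-Z]\d{2})\))?\s+is hiring\b\s*(?P<role>.*)$",
    re.IGNORECASE,
)


def parse_job_title(title: str) -> Optional[dict]:
    """Split a job title into company, YC batch, and role; None if it doesn't fit."""
    m = HIRING_RE.match(title.strip())
    if not m:
        return None
    return {
        "company": m["company"].strip(),
        "batch": m["batch"],                # None when no "(YC ...)" tag is present
        "role": m["role"].strip() or None,  # None when the title ends at "is hiring"
    }


print(parse_job_title("Acme (YC W21) is hiring a Senior Backend Engineer"))
```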

Input Parameters

| Parameter | Type | Range | Description |
|---|---|---|---|
| searchType | string | top, new, best, ask, show, job | Which HN feed to scrape. Default: top |
| maxResults | number | 1-500 | How many stories to extract. Default: 30 |
| includeComments | boolean | true/false | Attach all comments under each story. Default: false |
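
Validating input on the client side before starting a run can catch typos early. The following is a minimal sketch based only on the ranges and defaults listed above; `build_run_input` is a hypothetical helper, not part of the actor or the Apify client.

```python
VALID_SEARCH_TYPES = ("top", "new", "best", "ask", "show", "job")


def build_run_input(search_type: str = "top", max_results: int = 30,
                    include_comments: bool = False) -> dict:
    """Build a run_input dict, raising ValueError for out-of-range values."""
    if search_type not in VALID_SEARCH_TYPES:
        raise ValueError(f"searchType must be one of {VALID_SEARCH_TYPES}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= 500:
        raise ValueError("maxResults must be between 1 and 500")
    return {
        "searchType": search_type,
        "maxResults": max_results,
        "includeComments": bool(include_comments),
    }


print(build_run_input("ask", 50, include_comments=True))
```

The returned dict can be passed as `run_input` in the Quick Start call.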

Quick Start

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_APIFY_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("nexgendata/hacker-news-scraper").call(run_input={
    "searchType": "top",
    "maxResults": 100,
    "includeComments": False,
})

# Read the scraped stories from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item.get("title"), item.get("score"))

Sample Output

{
  "id": 42840302,
  "title": "Building a machine learning model for production",
  "url": "https://example.com/ml-guide",
  "score": 487,
  "descendants": 142,
  "time": 1711723200,
  "type": "story",
  "by": "techauthor",
  "comments": [
    {"id": 42840910, "text": "Great breakdown...", "score": 52, "by": "commentor1"}
  ]
}
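
Two fields in the sample are easy to misread: `time` is a Unix timestamp in seconds, and `descendants` is the story's total comment count, which can exceed the number of entries in `comments`. A small sketch for interpreting them, with `summarize` as a hypothetical helper:

```python
from datetime import datetime, timezone


def summarize(item: dict) -> dict:
    """Convert the Unix `time` to ISO 8601 UTC and compare comment counts."""
    posted = datetime.fromtimestamp(item["time"], tz=timezone.utc)
    return {
        "posted_at": posted.isoformat(),
        "comments_attached": len(item.get("comments") or []),
        "comments_total": item.get("descendants", 0),
    }


# Trimmed copy of the sample item above.
sample_item = {
    "id": 42840302,
    "time": 1711723200,
    "descendants": 142,
    "comments": [{"id": 42840910, "text": "Great breakdown..."}],
}
print(summarize(sample_item))
```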

Pricing: $5 per 1,000 Results

Cost breakdown: Scrape 30 stories = $0.15. Scrape 100 stories = $0.50. Scrape 500 stories = $2.50.
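
The breakdown above is simple per-result arithmetic, sketched below with `estimate_cost` as a hypothetical helper. The rate is a parameter because the store listing shows a different "from" price, and the estimate covers result charges only, not the per-run Actor Start event.

```python
from decimal import Decimal


def estimate_cost(results: int, price_per_1000: str = "5.00") -> Decimal:
    """Estimate result charges in USD, rounded to cents."""
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = Decimal(price_per_1000) * results / 1000
    return cost.quantize(Decimal("0.01"))


for n in (30, 100, 500):
    print(n, estimate_cost(n))
```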

Building it yourself costs more: 40+ hours to write, test, and deploy a reliable HN scraper (~$2,000 in dev time), plus 5-10 hours/month in maintenance when things change.

FAQ

Will this scraper get blocked or rate-limited? No. The actor uses Hacker News' own Firebase API, which is public and official. No rate limits, no blocking risk. HN publicly documents and allows automated access via this API.

How fresh is the data? Real-time. The actor pulls directly from HN's live database. Stories appear in your output within seconds of being posted.

Can I schedule this to run daily automatically? Yes. Apify handles scheduling natively. Set up a daily run on your preferred search type and let it populate your database automatically.

Is my data private? Completely. All data stays within your Apify account. nexgendata has no access to results, metadata, or usage patterns.

How is this different from the HN Algolia API? HN Algolia is a search index built on top of HN. It is great for full-text search across years of HN history, but the rate limit is real and the JSON shape doesn't include the comment tree. This actor walks the Firebase tree for you and ships flat comment arrays.

Agentic payments (AI agent buyers welcome)

This actor supports autonomous payment via Skyfire: AI agents (Claude Desktop, OpenCode, Cursor, Vercel AI SDK, OpenAI Agents SDK) can discover, fund, and invoke it without a human-in-the-loop credit card flow.

Agents using Apify's MCP server can find this actor by searching for Hacker News stories, YC News trend monitoring, or tech community signals and pay via a Skyfire PAY token (minimum $5 prefund). The existing pay-per-event pricing applies unchanged: the agent funds a token, runs the actor at the published per-result rate, and unused balance returns to the wallet on expiry.

Compatible agent frameworks:

  • Apify's official MCP server (mcp.apify.com)
  • Claude Desktop with Apify MCP integration
  • OpenCode + Apify MCP
  • OpenAI Agents SDK + Skyfire toolkit (via Composio)
  • Vercel AI SDK + Skyfire toolkit (via Composio)

No code changes are needed on the actor side; the integration is fully on Apify's infrastructure. AI agents discover the actor via the allowsAgenticUsers=true filter on Apify's store API.

| Use case | Actor |
|---|---|
| Show HN launch tracker | HN Show HN Tracker |
| Reddit subreddit trend & post tracker | Reddit Subreddit Trends |
| News & media monitoring for AI agents | News MCP Server |
| Indie Hackers product launches | Indie Hackers Products Tracker |
| Product Hunt launches tracker | Product Hunt Launches Scraper |
| Wikipedia structured-knowledge scraper | Wikipedia Scraper |
| Google Scholar paper search | Google Scholar Scraper |
| arXiv preprint search | arXiv Scraper |

About NexGenData

NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: you pay a start charge per run plus a charge for each result that actually lands in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform

If you only need the data once a quarter, you only pay once a quarter. If you scale to millions of records, the unit cost stays the same.

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console: point-and-click run
  • Apify API: REST + webhooks
  • Apify Python / JS SDKs: programmatic batch
  • Zapier, Make.com, n8n: official integrations
  • MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules: built-in cron for daily / weekly / monthly runs
  • Webhooks: POST results to any HTTPS endpoint on dataset write

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata